Thursday, September 4, 2008

Sorting Perl Lists And Removing Duplicates On Linux Or Unix

Hey there,

This may be the first time I've run a series of posts where the topic was different every step of the way. That's a good thing, because I get just as bored as anyone else reading about the same thing over and over ;) So, following in the footsteps of our posts on Number Pools And Guaranteed Combinations Within Fixed Lists and Perl List Permutations, we'll move straight ahead to the next logical step, which is almost an entirely different subject altogether. I will however, reiterate (repeat again... sorry, one of my idiosyncrasies ;) our Objective in this whole exercise (Skip to the next paragraph if you feel I've done this sentence to death ;) : Given a Number Pool of "x through y," create the maximum possible Fixed List Length variations of our Fixed List that contain some variation of our Guaranteed Combination, without any duplication (i.e. 1, 2, 3 is equal to 2, 3, 1 and would only count as one match), and return the results.

Of course, to revisit the definitions of our key components , please refer back to our previous post on Number Pools

Today, we're going to look a Perl list sorting. In this context, you can actually consider our lists as arrays if you prefer. I'm going to keep calling them lists because, in the end, it will most closely approximate what we will want to manipulate; a fixed list of numbers.

From yesterday's posts on list permutations you can basically understand, even if you didn't run the code, how very many different combinations of 6 numbers there can be, when duplication is not a factor (i.e. 1, 2, 3 is different than 3, 2, 1). In our case, given our Objective, we would want those number lists, in parentheses above, to be considered equal. So how do we go about that?

There are actually quite a number of ways you can go about doing this. They run the gamut from human-readable and computer-intensive to nearly-hieroglyphic and incredibly efficient. Some folks like to dump all their values into hashes and sort it out that way, since the Perl hash function is built to sort keys and values. I'm not against that at all, but feel like it might stray us farther off the path and I want to keep this series of posts somewhat accessible.

My answer is actually a fairly commonly used one, as well. I don't consider myself a Perl expert (or really even a Unix or Linux expert. Like in Eastern philosophy: Even the teacher is ever the student). If I ever do convince myself that I know it all, I'll probably have a fatal seizure from lack-of-interest within moments of my revelation ;)

I like to sort all of my permutations (in this case numerically) first, and then check for duplicates. This makes sense to me, for some reason, and (especially in this case) it can greatly reduce the pressure you feel when dealing with large numbers of numbers. For instance, our 720 permutations are all of the same 6 numbers, so, once they've all been sorted numerically, we're going to have 720 identical lists. We don't even need a computer to tell us what our Objective obliquely defines as a "unique list" in this situation. It's "1, 2, 3, 4, 5."

All of our actions are predicated on the output of the "public domain" code we put out yesterday, which has been revised slightly to suit our purposes (the original just printed out the answers and didn't do any variable assignation or a return) and, yes, I do use my lists as arrays from time to time (we all need time to relax and unshift ;) The "permute" subroutine is called with an array, or list, (@number_seeds) as its only argument (which contains "1, 2, 3, 4, 5"), like:

@s_perms = permute([@number_seeds], []);

Modified code below:

sub permute {
my @items = @{ $_[0] };
my @perms = @{ $_[1] };
unless (@items) {
@sortperms = join(" ", @perms);
unshift(@s_perms, "@sortperms" );
} else {
foreach $i (0 .. $#items) {
@newitems = @items;
@newperms = @perms;
unshift(@newperms, splice(@newitems, $i, 1));
permute([@newitems], [@newperms]);
return @s_perms;

And that's the lesson for today. I've attached code to the end of the post to illustrate the steps of sorting each list in our list of lists and checking for (and removing) duplicate lists from the pool. Note that we don't take into account multiple variable lists, as this won't be germane later (and adds some heft to the scriplet :) I tried to make this as easy to read as possible, so various steps, which could be combined, have been spelled out. If it has the opposite effect and makes things more confusing, I apologize profusely. And let me know! If there's interest, I'd be happy to write about this stuff at a much more granular level :)

Here's the mock output, to save you a screenshot:

host # ./program
1 2 3 4 5


Creative Commons License

This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License


# program - an unoriginal name for an unoriginal concept ;)
# 2008 - Mike Golvach -
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

foreach $unsorted (@s_perms) {
@temparray = split(" ", $unsorted);
@temparray = sort { $a <=> $b } (@temparray);
unshift(@sorted_perms, "@temparray");
#@sorted_perms = sort { $a <=> $b } (@s_perms);

foreach $s_p (@sorted_perms) {
@unik = split(" ", $s_p);
@unik = sort(@unik);
if ( $unik == $last_unik ) {
print "$s_p";
@last_unik = @unik;
} elsif ( @unik == @last_unik ) {
$total = @unik;
$zero = $nope = 0;
while ( $zero < $total ) {
if ( "$unik[$zero]" ne "$last_unik[$zero]" ) {
$nope = 1;
} else {
if ( $nope ) {
print "$s_p";
@last_unik = @unik;
} else {
print "$s_p";
@last_unik = @unik;

, Mike

Please note that this blog accepts comments via email only. See our Mission And Policy Statement for further details.