Thursday, September 18, 2008

Streamlined Perl Number Matching Script For Unix Or Linux

Hey There,

Well, this week's been completely out-of-sequence. Today's post was slated for the day before yesterday, and Tuesday's number pool script, which was supposed to come out yesterday, ended up getting published on Tuesday. If there's one thing we've learned this week, it's that my day job is starting to interfere with my hobbies ;)

On the bright side, we're finally getting around to wrapping up the two-parter we started on Monday with our uncondensed Number Matching Perl Script. The intent, with that post, was to show a working example of a simple Perl script (basically, matching numbers out of another file with a few option-based limitations imposed by the user) that was written off-the-cuff. Although it functioned, and worked pretty efficiently, it was a confusing mess to look at, and didn't include some options that it probably should have. The aim of the follow-up post (today's, at last) is to reproduce that script in a more condensed format. This condensation (that just doesn't sound right, does it? But I looked it up in the dictionary and the way in which I meant it was actually the first entry ;) includes extra code to handle some needed functionality, as well as a few helpful comments and a more logical ordering of all the parts. The end result, hopefully, is more scalable, so you won't have to dread all the extra typing and logical convolution involved in adding a seventh variable ;)

Again, we have the same fairly simple Perl script that will parse any file consisting of rows of numbers and print out matches meeting certain criteria. As noted before, you can visit the lamer version of this script from Monday and compare it with today's "better" version (or "less lame," depending on what kind of a person you are. Glass half full? Glass half empty? Glass too big? ;)

Again, the basic file format (for the input) should be a text file with lines like the following (note that this updated script will also take regular single digits and has an extra "-p" flag if you want to pad those with 0's):

host # cat file
01 02 03 04 05 09
5 18 19 45 33
33 55 666 88 23 12
...
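
To make the "-p" behavior a little more concrete, here's a minimal sketch of how the list of numbers to search for could be built, with and without padding. Note that candidate_numbers() is a made-up helper name for illustration only; the script at the end of this post does the same job with a plain for loop:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# candidate_numbers() is a hypothetical helper, not part of match.pl.
# It returns 1..$high as strings, zero-padding 1-9 when $pad is true.
sub candidate_numbers {
    my ($high, $pad) = @_;
    return map { $pad ? sprintf("%02d", $_) : "$_" } 1 .. $high;
}

my @plain  = candidate_numbers(12, 0);
my @padded = candidate_numbers(12, 1);

print "@plain\n";    # 1 2 3 4 5 6 7 8 9 10 11 12
print "@padded\n";   # 01 02 03 04 05 06 07 08 09 10 11 12
```

The padding only touches the numbers being searched for, so a file that writes "5" should be searched without "-p", and a file that writes "05" with it.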



Again, the script has 4 required arguments, and it will exit with a usage message if you forget to enter any of them (the additional "-p" flag is optional), like so:

host # ./match.pl
Error Encountered! Invalid or incomplete options!
Usage: ./match.pl -h highNumber -f statFile -n numberOfCombos -m minimumCombos
[-p padSingleDigitsWithZeros]
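
If you want to play with the option handling on its own, something like the following rough sketch should do. It uses the same Getopt::Std option string as the script, but parse_args() is a hypothetical wrapper (not part of match.pl) that returns undef when a required flag is missing, instead of printing the usage message and exiting:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# parse_args() is a hypothetical helper, not part of match.pl.
# It returns a hash reference of options, or undef when getopts
# fails or any of the four required flags is missing.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;    # getopts always reads from @ARGV
    my %opt;
    getopts("h:f:m:n:p", \%opt) or return;
    for my $required (qw(h f n m)) {
        return unless defined $opt{$required};
    }
    return \%opt;
}

my $ok  = parse_args(qw(-h 15 -f NUMFILE -n 3 -m 10 -p));
my $bad = parse_args(qw(-h 15 -f NUMFILE));

print defined $ok  ? "complete\n" : "incomplete\n";   # complete
print defined $bad ? "complete\n" : "incomplete\n";   # incomplete
```

Looping over the required flags keeps the check to a couple of lines no matter how many options get added later.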


A regular execution would look like this. We're now removing duplicate match elements with a simple hash, instead of building an ever-lengthening logic knot. Note that the matches are plain substring matches, so 13 will also match 130; depending upon how you want to implement the match, it should be trivial (adding \b or whatnot) to make it more exact:

host # ./match.pl -h 15 -f NUMFILE -n 3 -m 10 -p
12 matches 10 times:
7 8 9 10 11 12
109 110 111 112 113 114
115 116 117 118 119 120
121 122 123 124 125 126
127 128 129 130 131 132
211 212 213 214 215 216
307 308 309 310 311 312
409 410 411 412 413 414
511 512 513 514 515 516
607 608 609 610 611 612

13 matches 10 times:
13 14 15 16 17 18
109 110 111 112 113 114
127 128 129 130 131 132
133 134 135 136 137 138
139 140 141 142 143 144
211 212 213 214 215 216
313 314 315 316 317 318
409 410 411 412 413 414
511 512 513 514 515 516
613 614 615 616 617 618

15 matches 10 times:
13 14 15 16 17 18
115 116 117 118 119 120
145 146 147 148 149 150
151 152 153 154 155 156
157 158 159 160 161 162
211 212 213 214 215 216
313 314 315 316 317 318
415 416 417 418 419 420
511 512 513 514 515 516
613 614 615 616 617 618
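
For anyone wondering why rows like "127 128 129 130 131 132" show up under 13 above, here's a small sketch of the difference between the plain substring match and a \b-anchored one. This is only one possible way to tighten the match, and the two rows are lifted from the sample output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two rows taken from the sample output above
my @rows = ("13 14 15 16 17 18\n", "127 128 129 130 131 132\n");

my $num = 13;

# Substring match: 13 is found inside 130, 131 and 132 as well
my @loose = grep { /$num/ } @rows;

# Whole-number match: \b anchors both ends, \Q...\E quotes the value
my @exact = grep { /\b\Q$num\E\b/ } @rows;

print scalar(@loose), " loose, ", scalar(@exact), " exact\n";   # 2 loose, 1 exact
```

The same pattern works for padded values, since \b01\b will match a standalone "01" but not the "01" inside "101".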


And, when everything isn't equal, the output basically looks the same :)

host # ./match.pl -h 15 -f NUMFILE -n 3 -m 5 -p

01 02 matches 6 times:
97 98 99 100 101 102
199 200 201 202 203 204
301 302 303 304 305 306
397 398 399 400 401 402
499 500 501 502 503 504
601 602 603 604 605 606...


All in all, we only save about 50 lines of code (some of which is blank lines and comments), but the result should be a lot easier to manage and build upon.
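
If you'd rather not add a seventh nested loop by hand, one alternative (which is not what today's script does, since it deliberately stays away from array references) is to build the combinations in a loop and check each row against a whole list of numbers. Both tuples() and rows_matching() below are hypothetical helpers, the rows are made-up test data, and the match uses \b rather than the script's substring match:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# tuples() is a hypothetical helper: every ordered pick of $n items
# from @$pool, the same set the nested foreach loops walk through.
sub tuples {
    my ($n, $pool) = @_;
    my @result = ([]);
    for (1 .. $n) {
        @result = map { my $head = $_; map { [@$head, $_] } @$pool } @result;
    }
    return @result;
}

# rows_matching() is also hypothetical: it returns the rows that
# contain every number in @$wanted, however many numbers that is.
sub rows_matching {
    my ($wanted, $rows) = @_;
    return grep {
        my $row = $_;
        !grep { $row !~ /\b\Q$_\E\b/ } @$wanted;
    } @$rows;
}

# Made-up rows, just for this example
my @rows = ("01 02 03 04 05 09", "01 02 18 19 45 33", "33 55 666 88 23 12");

for my $tuple (tuples(2, ["01", "02", "33"])) {
    my @hits = rows_matching($tuple, \@rows);
    print "@$tuple matches ", scalar(@hits), " times\n" if @hits >= 2;
}
```

The number of tuples still grows as the pool size raised to the power of -n, so this only saves typing, not run time.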

Hope you enjoy the script and can find some use for it or its parts (like dealing with duplicates in multiple arrays, etc). You can never do enough matching ;)
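
Since the duplicate handling is probably the most reusable part, here it is pulled out on its own as a quick sketch. The %seen idiom is the same one the script uses; unique_in_order() is just a hypothetical wrapper name around it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# unique_in_order() is a hypothetical helper: it keeps the first
# occurrence of each value and preserves the original order.
sub unique_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @first  = (12, 13, 15);
my @second = (13, 12, 20);

# Arrays flatten into one list when passed to a sub
my @merged = unique_in_order(@first, @second);
print "@merged\n";                    # 12 13 15 20

my @same = unique_in_order(12, 12, 12);
print "@same\n";                      # 12
```

That second call is why a "-n 3" run can print "12 matches 10 times" rather than "12 12 12 matches 10 times".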

Cheers,


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/usr/bin/perl

#
# match.pl - Updated to last longer and taste better :)
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

use Getopt::Std;

%options = ();

# DIFF - Add -p option to pad 1-9
getopts("h:f:m:n:p", \%options);

if ( defined $options{m} && defined $options{h} && defined $options{f} && defined $options{n} ) {
    $highnum = $options{h};
    $combos = $options{n};
    $statfile = $options{f};
    $minmatch = $options{m};
    if ( ! -f $statfile ) {
        usage("File $options{f} Does Not Exist!");
    } elsif ( $combos > 6 ) {
        usage("Only Combos Up To 6 Please!");
    }
} else {
    usage("Invalid or incomplete options!");
}


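# Three-argument open, and bail out if the file can't be read
open(FILE, "<", $statfile) or die "Cannot open $statfile: $!\n";
@file = <FILE>;
close(FILE);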

# DIFF - with -p, now have to differentiate
# DIFF - between straight and padded 1-9
for ( $lownum = 1; $lownum <= $highnum; $lownum++ ) {
    if ( defined $options{p} && $lownum < 10 ) {
        $padded_num = "0$lownum";
    } else {
        $padded_num = $lownum;
    }
    push(@numbers, "$padded_num");
}

# DIFF - Still not using a subroutine for this.
# Trying to stay away from arrays of array references, also
# for ease of understanding

# Every slot starts out as a one-element list holding "". Slots beyond
# -n keep that placeholder, so their loops run once and add no constraint.
@numbers1 = @numbers2 = @numbers3 = @numbers4 = @numbers5 = @numbers6 = "";
$one = $two = $three = $four = $five = $six = 1;
$start = 1;
while ( $start <= $combos ) {
    if ( @numbers1 == 1 && $start == 1 ) {
        @numbers1 = @numbers;
        $one = 0;
    } elsif ( @numbers2 == 1 && $start == 2 ) {
        @numbers2 = @numbers;
        $two = 0;
    } elsif ( @numbers3 == 1 && $start == 3 ) {
        @numbers3 = @numbers;
        $three = 0;
    } elsif ( @numbers4 == 1 && $start == 4 ) {
        @numbers4 = @numbers;
        $four = 0;
    } elsif ( @numbers5 == 1 && $start == 5 ) {
        @numbers5 = @numbers;
        $five = 0;
    } elsif ( @numbers6 == 1 && $start == 6 ) {
        @numbers6 = @numbers;
        $six = 0;
    }
    $start++;
}

foreach $cnum1 (@numbers1) {
    foreach $cnum2 (@numbers2) {
        foreach $cnum3 (@numbers3) {
            foreach $cnum4 (@numbers4) {
                foreach $cnum5 (@numbers5) {
                    foreach $cnum6 (@numbers6) {
                        # Rows containing every number in this combination
                        @match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/ && /$cnum4/ && /$cnum5/ && /$cnum6/, @file);
                        $match = @match;
                        if ( $match >= $minmatch ) {
                            # Drop repeated numbers before printing
                            undef %seen;
                            @success = grep(!$seen{$_}++, ($cnum1, $cnum2, $cnum3, $cnum4, $cnum5, $cnum6));
                            print "@success matches $match times:\n @match\n";
                        }
                    }
                }
            }
        }
    }
}

sub usage {
    $message = shift;
    print "Error Encountered! $message\n";
    print "Usage: $0 -h highNumber -f statFile -n numberOfCombos -m minimumCombos\n";
    print "[-p padSingleDigitsWithZeros]\n";
    exit(1);
}


Mike



