Monday, September 15, 2008

Condensing Perl Scripts In Linux and Unix

Hey There,

Today, we're going to step back a bit (the final script from our number pool series goes out this Wednesday) and return to a subject we've visited before, in posts on making our Thesaurus script better and improving our webserver access log parser. We present a fairly simple Perl script that will parse any file consisting of rows of numbers and print out the combinations of numbers that show up together at least a given number of times. This is a fairly banal concept (and most probably well-overdone ;), so the spin we're going to put on it for this 2-parter is to present the Perl script we've written to do this (suitable for running on any Unix or Linux distro), first, in its completely "lame" format.

Now, when we say "lame," we don't mean that it doesn't do what it's supposed to; only that it is written in an overly cumbersome and confusing manner and will probably make any Perl enthusiast nauseated at the mere sight of it ;) The script is fairly simple and only requires that you have a file to parse with it. That file should be of the format:

host # cat file
01 02 03 04 05 09
05 18 19 45 33
33 55 666 88 23 12
...


etc, etc, etc... In this version of the script, single-digit numbers are "hard-coded" in their zero-padded form, so the script looks for "09" and a simple "9" in the file won't match. The script has 4 non-optional arguments, and it will exit with a usage message if you forget one, like so:

host # ./match.pl
Error Encountered! Invalid or incomplete options!
Usage: ./match.pl -h highNumber -f statFile -n numberOfCombos -m minimumCombos
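
As a quick aside on the zero-padding point mentioned before the usage message, here is a minimal sketch of why a row written with a plain "9" is missed. The value of $highnum and the two sample rows are made up for illustration, and sprintf is swapped in for the script's if/else padding:

```perl
use strict;
use warnings;

my $highnum = 12;   # illustrative value; match.pl takes this from -h

# Zero-pad 1 .. $highnum to two digits, the same list match.pl builds
# with its if/else on $lownum < 10.
my @numbers = map { sprintf "%02d", $_ } 1 .. $highnum;
print "@numbers\n";

my $padded_row = "01 02 03 04 05 09";   # hypothetical row, padded
my $plain_row  = "1 2 3 4 5 9";         # hypothetical row, not padded

# The script searches for the padded string "09".
my $num = $numbers[8];
print "padded row: ", ( $padded_row =~ /$num/ ? "match" : "no match" ), "\n";
print "plain row:  ", ( $plain_row  =~ /$num/ ? "match" : "no match" ), "\n";
```

The padded row matches and the plain one does not, so a file that mixes both styles would probably need to be normalized before match.pl sees it.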


A regular execution would look like this:

host # ./match.pl -h 39 -f MYFILE -n 3 -m 2
13 29 19 matches 2 times:
13 19 26 29 36
13 15 19 22 29

13 30 12 matches 2 times:
11 12 13 29 30
05 12 13 21 30

13 31 16 matches 2 times:
01 13 16 28 31
13 16 20 22 31
...


This command line tells "match.pl" that it should only look for numbers (and combinations of numbers) from 1 - 39, that the file to parse is called MYFILE, that we want to match 3-number combinations (like 13 19 29, from above) and that we only want to get output from the program if a given 3-number combination shows up in 2 or more rows.
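
To make the matching rule concrete, here is a rough sketch of the check for a single combination. It is not the script itself: the rows are copied from the sample output above, @combo and $minmatch are illustrative names, and the \b word boundaries are an assumption that goes beyond the plain /$cnum1/ matching match.pl uses (without them, 66 would also match inside 666).

```perl
use strict;
use warnings;

# Rows copied from the sample output above.
my @rows = (
    "13 19 26 29 36",
    "13 15 19 22 29",
    "11 12 13 29 30",
    "05 12 13 21 30",
);

my @combo    = qw(13 19 29);   # one 3-number combination
my $minmatch = 2;              # plays the role of -m

# Keep only the rows that contain every number in the combination.
my @match = grep {
    my $row     = $_;
    my $missing = grep { $row !~ /\b$_\b/ } @combo;   # count of absent numbers
    $missing == 0;
} @rows;

# Report the combination only if it was seen often enough.
if ( @match >= $minmatch ) {
    print "@combo matches ", scalar(@match), " times:\n";
    print "$_\n" for @match;
}
```

With these four rows, only the first two contain 13, 19 and 29 together, so the combination is reported with 2 matches.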

Tomorrow, we'll have this script stripped down and revamped and I think you'll be surprised at the difference (not just in length ;) In the meantime, feast your eyes on this monstrosity and feel free to use it. As ugly as it looks, it does actually work :)

Cheers,


This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/usr/bin/perl

#
# match.pl
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

use Getopt::Std;

%options = ();

getopts("h:f:n:m:", \%options);

if ( defined $options{m} && defined $options{h} && defined $options{f} && defined $options{n} ) {
$highnum = $options{h};
$combos = $options{n};
$statfile = $options{f};
$minmatch = $options{m};
if ( ! -f $statfile ) {
usage("File $options{f} Does Not Exist!");
} elsif ( $combos > 6 ) {
usage("Only Combos Up To 6 Please!");
}
} else {
usage("Invalid or incomplete options!");
}


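# Stop here if the file can't be read, rather than silently parsing nothing
open(FILE, "<", $statfile) or die "Cannot open $statfile: $!\n";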
@file = <FILE>;
close(FILE);

for ( $lownum = 1; $lownum <= $highnum; $lownum++ ) {
if ( $lownum < 10 ) {
$padded_num = "0$lownum";
} else {
$padded_num = $lownum;
}
push(@numbers, "$padded_num");
}

for ( $times = 1; $times <= $combos; $times++) {
if ( $combos == 1 ) {
foreach $cnum1 (@numbers) {
chomp($cnum);
@match = grep(/$cnum1/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 matches $match times:\n @match\n";
}
}
} elsif ( $combos == 2 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 ) {
@match = grep(/$cnum1/ && /$cnum2/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 matches $match times:\n @match\n";
}
}
}
}
} elsif ( $combos == 3 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
foreach $cnum3 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 && $cnum1 != $cnum3 && $cnum2 != $cnum3 ) {
@match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 $cnum3 matches $match times:\n @match\n";
}
}
}
}
}
} elsif ( $combos == 4 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
foreach $cnum3 (@numbers) {
foreach $cnum4 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 && $cnum1 != $cnum3 && $cnum1 != $cnum4 && $cnum2 != $cnum3 && $cnum2 != $cnum4 && $cnum3 != $cnum4 ) {
@match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/ && /$cnum4/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 $cnum3 $cnum4 matches $match times:\n @match\n";
}
}
}
}
}
}
} elsif ( $combos == 5 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
foreach $cnum3 (@numbers) {
foreach $cnum4 (@numbers) {
foreach $cnum5 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 && $cnum1 != $cnum3 && $cnum1 != $cnum4 && $cnum1 != $cnum5 && $cnum2 != $cnum3 && $cnum2 != $cnum4 && $cnum2 != $cnum5 && $cnum3 != $cnum4 && $cnum3 != $cnum5 && $cnum4 != $cnum5 ) {
@match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/ && /$cnum4/ && /$cnum5/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 $cnum3 $cnum4 $cnum5 matches $match times:\n @match\n";
}
}
}
}
}
}
}
} elsif ( $combos == 6 ) {
foreach $cnum1 (@numbers) {
foreach $cnum2 (@numbers) {
foreach $cnum3 (@numbers) {
foreach $cnum4 (@numbers) {
foreach $cnum5 (@numbers) {
foreach $cnum6 (@numbers) {
chomp($cnum);
if ( $cnum1 != $cnum2 && $cnum1 != $cnum3 && $cnum1 != $cnum4 && $cnum1 != $cnum5 && $cnum1 != $cnum6 && $cnum2 != $cnum3 && $cnum2 != $cnum4 && $cnum2 != $cnum5 && $cnum2 != $cnum6 && $cnum3 != $cnum4 && $cnum3 != $cnum5 && $cnum3 != $cnum6 && $cnum4 != $cnum5 && $cnum4 != $cnum6 && $cnum5 != $cnum6 ) {
@match = grep(/$cnum1/ && /$cnum2/ && /$cnum3/ && /$cnum4/ && /$cnum5/ && /$cnum6/, @file);
$match = @match;
if ( $match >= $minmatch ) {
print "$cnum1 $cnum2 $cnum3 $cnum4 $cnum5 $cnum6 matches $match times:\n @match\n";
}
}
}
}
}
}
}
}
}
}

sub usage {
$message = shift;
print "Error Encountered! $message\n";
print "Usage: $0 -h highNumber -f statFile -n numberOfCombos -m minimumCombos\n";
exit(1);
}
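
One thing worth noticing about the nested loops above: because every loop walks the full list and only checks that the numbers differ, each group of numbers is visited once per ordering (13 29 19, 13 19 29, and so on), and the outer $times loop repeats the whole search once per combo size. A minimal sketch of one way to visit each group only once follows. The combinations() helper is hypothetical, written just for this illustration, and is not necessarily how the follow-up post handles it:

```perl
use strict;
use warnings;

# Hypothetical helper: returns every $k-element combination of @items,
# each as an array reference, keeping the items in their original order.
sub combinations {
    my ( $k, @items ) = @_;
    return ( [] ) if $k == 0;    # exactly one way to pick nothing
    my @result;
    for my $i ( 0 .. $#items - $k + 1 ) {
        my $head = $items[$i];
        my @rest = @items[ $i + 1 .. $#items ];
        # Only items after $head are eligible, so no ordering repeats.
        push @result, [ $head, @$_ ] for combinations( $k - 1, @rest );
    }
    return @result;
}

my @numbers = map { sprintf "%02d", $_ } 1 .. 5;   # small illustrative pool
my @triples = combinations( 3, @numbers );

print scalar(@triples), " combinations\n";
print "@$_\n" for @triples;
```

For a pool of 5 numbers this yields 10 triples, where the nested-loop approach would walk through 60 orderings of the same groups.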


Mike



