Sunday, May 18, 2008

Doing Search And Replace In Multiple Files With Unix and Linux Perl - Easy Or Hard?

Hey There,

For this week's "Lazy Sunday" post we're going to take a look at the versatility of Perl on Linux or Unix. While I'm fairly certain there's little argument that it's the best tool for most "extraction" and "reporting" functions when used in complicated situations, it's always been more interesting to me because of the wide range of ways you can complete any sort of task.

Today we're going to take a look at two different ways to do search and replace in multiple files using strictly Perl. The first way will be obnoxiously long and the second way will be almost invisible ;)

For both situations, we'll assume that we have 15 files all in the same directory. We'll also assume that we're logged into our favorite flavour of Linux or Unix OS and, coincidentally, in the same directory as those files. All the files are text files and are humungous. And, finally, all of the files are stories where the main character's name is Waldo, they've never been published and the writer's had a change of heart and decided to name his main character Humphrey. It could happen ;)

1. The hard way (or, if you prefer, the long way):

We'll write a script to read in each file and scour it, line by line. On lines where the name Waldo appears, we'll replace it with Humphrey. We're also taking into account that Waldo may be named more than once on any particular line, and that the name may have accidentally been mistyped with a leading lowercase "w," which needs to be corrected. Since the match is case-insensitive, any other capitalization (WALDO, for instance) gets replaced, too. That script would look something like this:

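#!/usr/bin/perl

#
# replace_waldo.pl - change Waldo to Humphrey in all files.
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

use strict;
use warnings;

my $search_term  = "Waldo";
my $replace_term = "Humphrey";

# Every regular file in the current directory (glob avoids parsing ls output).
# Note: if this script lives in the same directory, it will rewrite itself, too.
my @all_files = grep { -f $_ } glob("*");

foreach my $file (@all_files) {
    my $got_one = 0;
    open(my $in, "<", $file) or die "Cannot read $file: $!\n";
    my @lines = <$in>;
    close($in);
    foreach my $line (@lines) {
        # s///gi is true only if at least one replacement was made on this line
        if ( $line =~ s/\Q$search_term\E/$replace_term/gi ) {
            $got_one = 1;
        }
    }
    # Only rewrite files that actually changed
    if ( $got_one ) {
        open(my $out, ">", "$file.new") or die "Cannot write $file.new: $!\n";
        print $out @lines;
        close($out) or die "Cannot close $file.new: $!\n";
        rename("$file.new", $file) or die "Cannot rename $file.new: $!\n";
    }
}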
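If you'd like to poke at the substitution on its own before letting it loose on 15 humungous files, here's a minimal sketch. The replace_name helper is made up for this illustration, and the \b word boundaries are my own assumption (the script above doesn't use them); they keep lookalikes such as "Waldorf" from turning into "Humphreyrf". The sketch leans on the fact that s///g returns the number of replacements it made:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of replace_waldo.pl): replace every
# case-insensitive, whole-word occurrence of $from in $line with $to.
# Returns the new line and the number of replacements made.
sub replace_name {
    my ($line, $from, $to) = @_;
    # \Q...\E quotes any regex metacharacters in the search term;
    # s///g returns the replacement count, or an empty string if none.
    my $count = ($line =~ s/\b\Q$from\E\b/$to/gi);
    return ($line, $count || 0);
}

my ($new, $n) = replace_name("waldo met Waldo at the Waldorf.\n", "Waldo", "Humphrey");
print "$n: $new";    # prints "2: Humphrey met Humphrey at the Waldorf."
```

Whether you want the word boundaries depends on the stories; if "Waldo's" and "Waldo-like" should change as well, they still will, since an apostrophe or hyphen counts as a boundary.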


2. The easy way (or, again, the short way):

Assuming the exact same convoluted situation, here's another way to do it (which we've covered in a bit more detail in this older post on using Perl like Sed or Awk):

From the command line we'll type:

host # perl -p -i -e 's/Waldo/Humphrey/gi' *

And we're done :)
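For the curious, -e supplies the code on the command line, -p wraps that code in a loop that reads each line and prints it back out, and -i edits the files in place. Here's a rough sketch of what that boils down to inside a script, using Perl's $^I variable (the in-script twin of -i). The replace_in_files name and the ".bak" backup suffix are assumptions I've picked for illustration; the one-liner above keeps no backups:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: do a case-insensitive search and replace in place
# on the given files, leaving a ".bak" copy of each original behind.
# Returns the total number of replacements made.
sub replace_in_files {
    my ($from, $to, @files) = @_;
    return 0 unless @files;    # with an empty @ARGV, <> would read STDIN
    local $^I   = '.bak';      # same effect as -i.bak on the command line
    local @ARGV = @files;      # <> walks through these files in turn
    my $changed = 0;
    while (my $line = <>) {
        $changed += ($line =~ s/\Q$from\E/$to/gi) || 0;
        print $line;           # goes to the file being edited, not the screen
    }
    return $changed;
}

# Usage: perl replace_in_files.pl story1.txt story2.txt ...
my $total = replace_in_files("Waldo", "Humphrey", grep { -f $_ } @ARGV);
print "$total replacement(s) made\n";
```

If you'd rather stay on the command line and still keep a safety net, swapping -i for -i.bak in the one-liner leaves the same kind of backup copies.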

Of course, the longer method is better suited to situations with extenuating circumstances or, perhaps, even more work to do. For the sort of limited situation we've laid out today, I will almost always go with the second method (Who wants pie? :)... unless I have lots of time on my hands ;)

Cheers,

, Mike