Friday, March 28, 2008

Easy Multiple File Patching And Patch Removal On Linux And Unix

Hello again,

Today we're going to continue with yesterday's post on manually mass patching files on Linux and Unix, but come at it from a different, and much simpler, angle.

First things first; you can delete the ed patch files from yesterday :) When you look at them, you can see that using "diff -e" to create an ed patch file creates a patch that contains absolutely no information that would allow you to reverse a change that you made, as opposed to a patch file made from diff straight-up, which records the old line as well as the new one. The small comparison below makes this clearer:

host # diff -e tmpdir1/file1 tmpdir2/file1
5c
BASEDIR="/usr/binky"
.
host # diff tmpdir1/file1 tmpdir2/file1
5c5
< HOMEDIR="/usr/binky"
---
> BASEDIR="/usr/binky"


The ed patch file type that we used yesterday, although simpler to understand, doesn't make it possible to reverse your changes without keeping backup copies of your files.
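
If you'd like to see that difference for yourself, here's a minimal sketch. The scratch directory and the three-line sample script are made up for the demonstration; only the HOMEDIR/BASEDIR change comes from our example. It simply counts how many times the old variable name shows up in each kind of patch:

```shell
# Scratch area with one "old" and one "new" copy of a hypothetical script.
work=$(mktemp -d)
mkdir "$work/tmpdir1" "$work/tmpdir2"
printf '#!/bin/sh\nHOMEDIR="/usr/binky"\necho done\n' > "$work/tmpdir1/file1"
printf '#!/bin/sh\nBASEDIR="/usr/binky"\necho done\n' > "$work/tmpdir2/file1"

# diff exits with status 1 when the files differ, so "|| true" keeps
# that from being mistaken for a failure.
ed_script=$(diff -e "$work/tmpdir1/file1" "$work/tmpdir2/file1" || true)
normal_diff=$(diff "$work/tmpdir1/file1" "$work/tmpdir2/file1" || true)

# Count the lines that mention the old variable name in each format.
old_in_ed=$(printf '%s\n' "$ed_script" | grep -c 'HOMEDIR' || true)
old_in_normal=$(printf '%s\n' "$normal_diff" | grep -c 'HOMEDIR' || true)

echo "old line recorded in ed script:   $old_in_ed"
echo "old line recorded in normal diff: $old_in_normal"
```

The ed script should come up with a count of 0 and the normal diff with a count of 1, which is exactly why only the second one can be undone later.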

The good news is that patch files created with diff can still be used to patch large numbers of files and to recover them even if the original, unpatched, files get lost or deleted. The method used to obtain these results is simpler than what we walked through yesterday and only requires the use of the "diff" and "patch" commands. Both of these commands should come standard with your OS. They've been on Linux for a long time and have been on Solaris Unix since, at least, release 2.6 (probably earlier).

So, let's get started patching lots of files and then backing out those patches. Since yesterday's post was so long (and this one has the same potential), I'm going to use only 2 files as the base number of files (although the number of files can be however large you want) and grep out the relevant information for display purposes. Hopefully it will save us all some eye strain ;)

The first thing we'll want to do is copy all the scripts we want to patch into a new directory to work on them. We'll also put the new scripts (that we'll need to create our diff patches) in yet another directory. We'll never work directly on the scripts in their native directory until we're sure they're patched correctly. This isn't absolutely necessary, but is generally good practice. No sense in letting a simple mistake cost you any more time, or grief, than it has to. Again, the only thing that is different between our existing scripts and the new scripts is that the HOMEDIR variable has been changed to BASEDIR.

host # cp scriptdir/* tmpdir1/
<--- This could be any number of files.
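
If you want to be sure the working copy really matches the originals before you touch anything, something along these lines should do it. This is only a sketch: the scratch area and the one-line sample files are invented for the demonstration, and "copy_status" is just a name I picked. Only the directory names come from our example:

```shell
# Hypothetical stand-in for the real script directory.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir scriptdir tmpdir1
printf 'HOMEDIR="/usr/binky"\n' > scriptdir/file1
printf 'HOMEDIR="/usr/binky"\n' > scriptdir/file2

# Copy the scripts, keeping modes and timestamps (-p).
cp -p scriptdir/* tmpdir1/

# diff -r exits 0 only when both directory trees are identical.
if diff -r scriptdir tmpdir1 >/dev/null; then
    copy_status="identical"
else
    copy_status="different"
fi
echo "working copy is $copy_status"
```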

The following is our setup, with only one difference between the files, as noted above:

host # ls -a tmpdir1
. .. file1 file2
host # ls -a tmpdir2
. .. file1 file2
host # grep DIR tmpdir1/*
tmpdir1/file1:HOMEDIR="/usr/binky"
tmpdir1/file2:HOMEDIR="/usr/binky"
host # grep DIR tmpdir2/*
tmpdir2/file1:BASEDIR="/usr/binky"
tmpdir2/file2:BASEDIR="/usr/binky"


Next, we'll use diff to create a patch file. Rather than just doing diff straight-up, though, we're going to run it in "context" mode. When you invoke "diff -c" it creates a context diff which, literally, means that it puts the diff output in context (so you can see the lines before and after the lines that differ). The main reason I like to use this option is that the output works with "patch" when patching multiple files from multiple directories. The output from a standard diff of multiple files in multiple directories doesn't work well for this, mostly because it puts all the file names on one line and "patch" attempts to find a file named "diff tmpdir1/file1 tmpdir2/file1", literally. And that file can never exist (I hope ;)

Now we'll create the multiple file patch and examine its contents, created at the directory level directly above tmpdir1 and tmpdir2, so we can get all the files in both directories (Note that only the lines beginning with the exclamation point (!) in the diff output are different. The lines above and below only serve to showcase the line in its context within the file):

host # diff -c tmpdir1 tmpdir2 >patchfile.patch
host # cat patchfile.patch
diff -c tmpdir1/file1 tmpdir2/file1
*** tmpdir1/file1 Wed Mar 26 14:36:05 2008
--- tmpdir2/file1 Wed Mar 26 14:54:10 2008
***************
*** 2,8 ****

COMMAND_ARGS="-d --takeforeverandaday"

! HOMEDIR="/usr/binky"

case $x in

--- 2,8 ----

COMMAND_ARGS="-d --takeforeverandaday"

! BASEDIR="/usr/binky"

case $x in

diff -c tmpdir1/file2 tmpdir2/file2
*** tmpdir1/file2 Wed Mar 26 14:36:05 2008
--- tmpdir2/file2 Wed Mar 26 14:54:10 2008
***************
*** 2,8 ****

COMMAND_ARGS="-d --takeforeverandaday"

! HOMEDIR="/usr/binky"

case $x in

--- 2,8 ----

COMMAND_ARGS="-d --takeforeverandaday"

! BASEDIR="/usr/binky"

case $x in
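
One thing to watch for if you script this step: diff exits with status 1 when it finds differences, which is exactly what we want here, and only a status above 1 means real trouble. The sketch below shows one way to tell the cases apart. The sample file contents are shortened stand-ins for our scripts, and the "diff_status" and "changed_files" names are my own:

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir tmpdir1 tmpdir2
for f in file1 file2; do
    printf 'COMMAND_ARGS="-d"\nHOMEDIR="/usr/binky"\ncase $x in\n' > "tmpdir1/$f"
    printf 'COMMAND_ARGS="-d"\nBASEDIR="/usr/binky"\ncase $x in\n' > "tmpdir2/$f"
done

# Capture diff's exit status without treating "files differ" as an error.
diff_status=0
diff -c tmpdir1 tmpdir2 > patchfile.patch || diff_status=$?

case $diff_status in
    0) echo "no differences; the patch file is empty" ;;
    1) echo "differences found; patchfile.patch is ready" ;;
    *) echo "diff reported trouble (status $diff_status)" >&2 ;;
esac

# Each patched file gets one "*** tmpdir1/NAME" header line in a context diff.
changed_files=$(grep -c '^\*\*\* tmpdir1/' patchfile.patch)
echo "files covered by the patch: $changed_files"
```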


Now we're ready to patch all of the files in tmpdir1 at once, using the simple form of the command (patch -i PATCHFILE), and receive an error for doing so. Note that we're running this from the directory above tmpdir1 and tmpdir2; exactly where we were when we used "diff -c" to create the patch file:

host # patch -i patchfile.patch
can't find file to patch at input line 4
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
|diff -c tmpdir1/file1 tmpdir2/file1
|*** tmpdir1/file1 Wed Mar 26 14:36:05 2008
|--- tmpdir2/file1 Wed Mar 26 14:54:10 2008
--------------------------
File to patch: ^C
<--- Type the control (ctl) key + C, or any other escape/control key combination to break out of this prompt

This error killed us before it actually made any changes because of the way "patch" works. When run without any options (other than -i, which we used to indicate the name of our patch file), "patch" parses the patch file and strips the file names down to the base, in much the same way the "basename" command does. So, even though the file name in the patch file is "tmpdir1/file1," the "patch" program is looking for a file named "file1" and it's looking for it in the directory we're in which, unfortunately, isn't where the file is.
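
A quick way to convince yourself of this is to run the same option-free command from inside tmpdir1, where a plain "file1" does exist. The sketch below assumes a made-up scratch area with a single three-line sample file, and "patched_line" is just a name I picked:

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir tmpdir1 tmpdir2
printf 'A=1\nHOMEDIR="/usr/binky"\nB=2\n' > tmpdir1/file1
printf 'A=1\nBASEDIR="/usr/binky"\nB=2\n' > tmpdir2/file1
diff -c tmpdir1 tmpdir2 > patchfile.patch || true   # status 1 just means "files differ"

# What the option-free lookup reduces the recorded name to.
basename tmpdir1/file1

# From inside tmpdir1 the bare name "file1" can be found, so no -p is needed.
( cd tmpdir1 && patch -i ../patchfile.patch )

patched_line=$(grep DIR tmpdir1/file1)
echo "$patched_line"
```

That works, but it only patches one directory's worth of files per run, and you have to remember where to stand when you run it.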

Luckily, this little setback is easy to remedy. Using the -p option to "patch" we can instruct "patch" how to interpret the file names in our patch file. As we noted, when the option isn't present, "patch" reverts to "basename" type behaviour. The number after -p is how many leading path components (each one being everything up to and including a slash) get stripped from the file name. If we used -p1, we would be instructing "patch" to remove the first component, which would turn "tmpdir1/file1" into "file1" (or, for an absolute path, just remove the leading slash). We're going to use -p0, which instructs "patch" to not strip anything and just take the file name as it is (in this case "tmpdir1/file1," which is relative, but just fine considering where we're running the command from).
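
To get a feel for what the different -p levels do to a name, here's a small stand-alone sketch. Note that "strip_components" is a hypothetical helper I made up purely for illustration. It isn't part of "patch", and it skips edge cases (like names with fewer components than you ask it to strip):

```shell
# strip_components N NAME: mimic how -pN shortens a recorded file name.
# Illustration only; plain sh has no local variables, so count and name
# are ordinary globals.
strip_components() {
    count=$1
    name=$2
    while [ "$count" -gt 0 ]; do
        case $name in
            */*) name=${name#*/} ;;   # drop everything through the first slash
            *)   break ;;             # nothing left to strip
        esac
        # Adjacent slashes count as one, so drop any that are left in front.
        while [ "${name#/}" != "$name" ]; do
            name=${name#/}
        done
        count=$((count - 1))
    done
    printf '%s\n' "$name"
}

strip_components 0 tmpdir1/file1      # prints tmpdir1/file1
strip_components 1 tmpdir1/file1      # prints file1
strip_components 1 /usr/binky/file1   # prints usr/binky/file1
```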

host # patch -p0 -i patchfile.patch
patching file tmpdir1/file1
patching file tmpdir1/file2


Success! Now, let's check that the patch actually took:

host # grep DIR tmpdir1/*
tmpdir1/file1:BASEDIR="/usr/binky"
tmpdir1/file2:BASEDIR="/usr/binky"


Excellent! HOMEDIR is now BASEDIR. Alas, as I intimated in our post yesterday on mass file updating, our boss has, only minutes later, decided that the BASEDIR variable really should be HOMEDIR after all. He's not going to explain why, we just need to switch everything back now ;)

And this is where our method of execution really pays off. Assuming we held on to that patch file, we can now use "patch" to put everything back the way it was in short order, by simply adding the -R option (to reverse the patch operations) to the command line and running it again, like so:

host # patch -p0 -R -i patchfile.patch
patching file tmpdir1/file1
patching file tmpdir1/file2


And, then, just to verify that the patch differences have been removed:

host # grep DIR tmpdir1/*
tmpdir1/file1:HOMEDIR="/usr/binky"
tmpdir1/file2:HOMEDIR="/usr/binky"
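
If you'd rather not trust grep alone, the whole round trip can be checked against an untouched copy. This is a sketch under a few assumptions: the scratch area, the shortened sample files and the extra "pristine" directory are all made up for the demonstration, and the "after_apply" and "after_reverse" names are mine:

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir tmpdir1 tmpdir2 pristine
for f in file1 file2; do
    printf 'A=1\nHOMEDIR="/usr/binky"\nB=2\n' > "tmpdir1/$f"
    printf 'A=1\nBASEDIR="/usr/binky"\nB=2\n' > "tmpdir2/$f"
done
cp -p tmpdir1/* pristine/          # reference copy, used only for the final check
diff -c tmpdir1 tmpdir2 > patchfile.patch || true   # status 1 just means "files differ"

patch -p0 -i patchfile.patch       # apply: HOMEDIR becomes BASEDIR
after_apply=$(cat tmpdir1/* | grep -c BASEDIR || true)

patch -p0 -R -i patchfile.patch    # reverse: BASEDIR becomes HOMEDIR again
after_reverse=$(cat tmpdir1/* | grep -c HOMEDIR || true)

echo "BASEDIR lines after applying:  $after_apply"
echo "HOMEDIR lines after reversing: $after_reverse"

# diff -r exits 0 only when the reversed files match the untouched copies.
if diff -r pristine tmpdir1 >/dev/null; then
    echo "tmpdir1 matches the pristine copy again"
fi
```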


And we're all set :) Hopefully this follow-up tutorial was easy enough to follow, and you can find some good use for it in your work routine. BTW, don't forget to copy the changed scripts back to the real script directory, but only after copying that directory off somewhere else, again. But only if you're as paranoid as I am ;)

Best Wishes,

, Mike