Tuesday, November 27, 2007

Using find and xargs to locate Windows Files

A lot of times, when you're asked to find something on a machine and you only have a rough idea of what you're looking for, you'll reach for the obvious command: find. find is a great command for this because you can use wildcards in your expression argument. So, if you know that you're looking for something like "theWordiestScriptEver," and you have no idea where it's located on your box, you could find it by typing just this:

find / -iname "*word*" -print

This will walk every file on the system (even on non-local mounts if you have them set up) and only print the results for files with the word "word" in the name. Note that the "-iname" option matches without regard to case, so w and W both match. This option isn't available in all versions of find. If you don't have it, you'll get an error when you run the above line (just use "-name" instead, keeping in mind that it's case sensitive). The standard Solaris find does not do "case insensitive" pattern matching, so your best bet is to find the smallest substring that you're sure of the case on, or use another attribute to search for the file (like -user for the owner or -atime for the last access time). Alternatively, you could spend hours stringing together a bunch of "or" conditions for every conceivable combination of upper and lower case letters in your expression.
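A less painful middle ground than chaining "or" conditions is to give -name a bracket expression for each letter, since -name takes ordinary shell glob patterns. The sketch below is only an illustration: the scratch directory and the file names in it are made up, and it assumes mktemp is available on your box.

```shell
# Sketch: case-insensitive matching with plain -name, for a find without -iname.
# The scratch directory and file names are hypothetical, for demonstration only.
demo=$(mktemp -d)
touch "$demo/theWordiestScriptEver" "$demo/WORD list" "$demo/unrelated"

# Each [Xx] pair matches either case of a single letter.
matches=$(cd "$demo" && find . -type f -name "*[Ww][Oo][Rr][Dd]*" -print | sort)
printf '%s\n' "$matches"
```

It gets ugly for long patterns, but for a short substring it's a lot quicker than typing out every combination by hand.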

Now suppose you needed to perform an action on a file you found. You could use find's built-in -exec option, like so:

find / -iname "*word*" -print -exec grep spider {} \;

This will run the command "grep spider" on every file that matches the expression. Which brings us around to the next predicament: what do you do if you have to find something simple, have no idea where it is on your box, "and" that box hosts file systems that Windows users are allowed to write files to? The above example should work just fine on those. My own advice is, if you can get away with just using find, do so, since it handles all of the rogue characters, tabs and spaces in Windows files on its own.
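To see why find on its own is safe here, a small test helps: -exec hands each path to the command as a single argument, so the shell never gets a chance to split or reinterpret it. This is just a sketch with made-up file names in a scratch directory (mktemp is assumed to be available), using grep -l so that only the names of matching files are printed.

```shell
# Sketch: find -exec with awkward, Windows-style file names.
# The directory and file names are hypothetical, for demonstration only.
demo=$(mktemp -d)
touch "$demo/word & file's" "$demo/word - file's"
echo "spider" > "$demo/word file"

# grep -l prints the name of each file that contains the pattern.
# No quoting or escaping of the names is needed anywhere.
hits=$(cd "$demo" && find . -type f -name "*word*" -exec grep -l spider {} \;)
printf '%s\n' "$hits"
```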

Now, if you have to do something much more complicated (or convoluted), you'll want to pipe to a program like xargs, which is where all those funny Windows file names and characters (some of which are special to your shell) start to cause issues. Again, this would return ok:

# find . -name "*word*" -print
./word - file's
./word file
./word & file's
./word file's

But this will become an issue if you pipe it to xargs, as shown below:

# find . -name "*word*" -print|xargs ls
xargs: Missing quote: files

Ouch! xargs doesn't deal with those spaces, tabs and special characters very well. You can fix the space/tab problem very simply by using xargs' "-i" option, which lets you name a placeholder (a "named variable," if you like) that stands in for each line of input. Normally, xargs splits the input it receives on whitespace and tacks the pieces onto the command (thus: "xargs ls," above, runs ls on the file names find sends it), but with "-i" it treats each input line as a single item and substitutes it wherever the placeholder appears, which takes care of the spacing issue. Example below:

# find . -name "*file" -print|xargs -ivar ls "var"
./word file
# find . -name "*word*" -print|xargs -ivar ls "var"
xargs: Missing quote: ./word - files
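A side note on spelling: "-i" is the older form of this option. POSIX specifies it as "-I replstr", and GNU xargs documents "-i" as deprecated in favor of "-I", so the capitalized version is the safer one to put in scripts. A minimal sketch of the same idea with "-I", using a made-up file in a scratch directory (mktemp assumed available):

```shell
# Sketch: the placeholder option in its POSIX spelling, -I.
# The directory and file name are hypothetical, for demonstration only.
demo=$(mktemp -d)
touch "$demo/word file"

# With -I, each input line is one item, so the space in the name survives.
# "var" is just the placeholder string; any name would do.
out=$(cd "$demo" && find . -type f -name "*file" -print | xargs -I var ls -d "var")
printf '%s\n' "$out"
```

Quotes and backslashes in the input are still treated as special, exactly as with "-i".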

But, in the second invocation above, you see that it still can't handle the "shell special" characters, like ' (single quote) or " (double quote), so it's time to step it up. I prefer to just sanitize everything that's not kosher, even though I know I don't technically have to worry about the \ / : * ? " < > characters, since Windows won't allow them as parts of file names. It seems easier just to act on anything that isn't a letter or number and pass it along with enough escapes (backslashes) so that xargs can parse it correctly and get you back good information. Here's how to do that, using sed (and a little grep, to filter out any "permission denied" errors and keep it neat):

# find . -name "*word*" 2>&1|grep -iv denied|sed "s/\([^A-Za-z0-9]\)/\\\\\1/g"|xargs -ivar ls "var"
./word - file's
./word file
./word & file's
./word file's
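If you're curious what xargs actually receives, you can run the sed step by itself. The sketch below uses the same expression, written in single quotes so that one less layer of backslashes is needed, against a single made-up file name in a scratch directory (mktemp assumed available), and then feeds the escaped name to xargs to show the round trip.

```shell
# Sketch: what the sed escaping step produces, and xargs undoing it.
# The directory and file name are hypothetical, for demonstration only.
demo=$(mktemp -d)
touch "$demo/word & file's"

# Prefix every character that is not a letter or digit with a backslash.
escaped=$(cd "$demo" && find . -type f -name "*word*" -print |
  sed 's/\([^A-Za-z0-9]\)/\\\1/g')
printf '%s\n' "$escaped"    # prints: \.\/word\ \&\ file\'s

# xargs strips the backslashes again and passes the original name to ls.
listed=$(cd "$demo" && printf '%s\n' "$escaped" | xargs -I var ls -d "var")
printf '%s\n' "$listed"     # prints: ./word & file's
```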

And now you can use find, combined with xargs, on all the files you have permission to see, no matter what goofy characters are in them :)
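One more option, if your tools support it: GNU and BSD versions of find and xargs can pass file names separated by NUL characters instead of newlines, using -print0 and -0. These options are not part of every find and xargs (check your man pages before counting on them), so treat this as a sketch under that assumption rather than a drop-in replacement for the sed approach. The file names and scratch directory are made up.

```shell
# Sketch: NUL-delimited hand-off, assuming find has -print0 and xargs has -0.
# The directory and file names are hypothetical, for demonstration only.
demo=$(mktemp -d)
touch "$demo/word file" "$demo/word & file's" "$demo/word - file's" "$demo/word \"quoted\""

# With -0, xargs splits only on NUL, so spaces, quotes and backslashes
# in the names are passed through untouched.
listed=$(cd "$demo" && find . -type f -name "*word*" -print0 | xargs -0 ls -d | sort)
printf '%s\n' "$listed"
count=$(printf '%s\n' "$listed" | wc -l | tr -d ' ')
```

Since no escaping is involved, this also copes with names that contain newlines, which any line-based pipeline will trip over.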

, Mike