Thursday, May 29, 2008

Simple Shell One-Liner To Enumerate File Types In Linux and Unix

Hey there,

Lately, we've been focusing a lot on Perl "one liners," from mass file time syncing to name and IP resolution and I thought it was only fair that we should write a post about a shell "one liner" for once. After all, most standard Unix and Linux shells are perfect for that purpose :)

Here's a very quick way to take an inventory of all the different file types (including directories, sockets, named pipes, etc) that exist in a given directory tree and provide a tally of each file type. I'm not entirely sure that I care if I have 15 ASCII text files and 2 Perl scripts in my current working directory, but this little piece of code must be able to help someone somewhere accomplish something, even if it is only to use as a smaller part of a larger organism ;)

This works in pretty much any standard shell on every flavour of Linux and/or Unix I've tested (ash, sh, bash, jsh, ksh, zsh, even csh and tcsh, which is huge for me since I never use those shells. One day, soon, I will redouble my efforts and just learn how to use them well, which, hopefully, won't result in my writing a whole bunch of posts about "cool" stuff that everyone has known about for the past few decades ;).

This one-liner could actually be written as a script, to make it more readable, like this:

#!/bin/sh
find . -print | xargs -I var file "var"|
awk '{
$1="";
x[$0]++;
}
END {
for (y in x) printf("%d\t%s\n", x[y], y);
}' | sort -nr


But, since I'm a big fan of brevity (which is about as far away from obvious as possible if you consider my writing style ;), I would run it like this:

host # find . -print|xargs file|awk '{$1="";x[$0]++;}END{for(y in x)printf("%d\t%s\n",x[y],y);}'|sort -nr

In my terminal, that all comes out on one line :) And here's the sort of output you can expect:

host # find . -print|xargs file|awk '{$1="";x[$0]++;}END{for(y in x)printf("%d\t%s\n",x[y],y);}'|sort -nr
23 Bourne-Again shell script text
11 ASCII text
10 perl script text
5 pkg Datastream (SVR4)
4 directory
4 ASCII English text
2 UTF-8 Unicode English text, with overstriking
2 Bourne shell script text
1 RPM v3 bin i386 m4-1.4.10-1


and, just to confirm the file count, we'll change the command slightly (to total up everything, instead of being particular -- another "one-liner" that's, admittedly, completely unnecessary unless you're obsessive/compulsive like me ;) and compare that with a straight-up find:

host # find . -print|xargs file|awk 'BEGIN{x=0}{x++}END{print x}'
62
host # find .|wc -l
62


Good news; everything appears to be in order! Hope this helps you out in some way, shape or form :)

Cheers,

, Mike