Thursday, May 15, 2008

Finding An "Invisible" Proc's Working Directory Without lsof On Linux Or Unix

Ahoy there,

Today, we're going to take a look at something that gets taken for granted a lot these days. lsof (a fine program, to be sure; no debate here) has become a staple for finding out what processes are up to, and where they're hanging out, on most Linux and Unix systems. Much like the command "top," it provides a simple and robust frontend that saves you a lot of the grunt-work you'd otherwise have to do to get the same results.

I find that, for the most part, lsof is used to find out where a process is, or what filesystems, etc, it's using, in order to troubleshoot issues. One of the most common is the "mysteriously full, yet empty, disk" phenomenon. Every once in a while that will turn out to be a case where all of the inodes in a partition have been used up before all of the blocks have. Since df, by default, reports only block usage, its output leads to the mistaken assumption that there is plenty of space left on a device even when no new files can be created.
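A quick way to rule the inode case in or out is to look at block usage and inode usage side by side. This is only a sketch: the function name fs_usage is made up for this post, and it assumes a df that understands both -P and -i (GNU and BSD df do; some older Unix df implementations spell the inode option differently).

```shell
#!/bin/sh
# Hypothetical helper (not a standard tool): show block usage and inode
# usage for the filesystem that holds DIR. Defaults to /tmp.
fs_usage() {
    dir=${1:-/tmp}
    echo "== blocks (1K) for $dir =="
    df -Pk "$dir"            # -P: portable one-line-per-filesystem format
    echo "== inodes for $dir =="
    df -i "$dir"             # assumption: this df supports -i
}
```

If "fs_usage /tmp" shows the inode side pegged at 100% while the block side has room to spare, you're looking at the inode problem and not a runaway process.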

However, many times, that empty-yet-full disk is the victim of a process that met an untimely demise and never cleaned up a lot of temporary space in memory (or virtual disk, to split hairs). Another issue that lsof is used for is to find out which dag-nabbed process is holding onto a mount-point that claims it's in use when no one is logged on and no user processes are running that would access it (for instance, a really specific, user-defined, mountpoint like /whereILikeToPutMyStuff. Hopefully the OS isn't depending on this to be around ;). Both problems are, essentially, the same: some process is still hanging on to something in that filesystem, and you need to find out which one.

However, should you find yourself in a situation where lsof doesn't come with your Operating System, or hasn't been installed, you can still boil these two separate issues down into one, and find the solution to your problems using the commonly available "pwdx" utility. (I'm limiting the post to these two particulars so I don't end up writing an embellished manpage ;)

At its best, pwdx prints out the current working directory of any given process (using the process ID as input), and that's all it does. But this is enough to get you to the answer you need.
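To see what that looks like for a single process, here's a minimal sketch. The function name proc_cwd is hypothetical, and the fallback branch assumes a Linux-style /proc where /proc/PID/cwd is a symlink to the working directory; on a system with neither pwdx nor that symlink it simply gives up.

```shell
#!/bin/sh
# Hypothetical helper: print only the working directory of one PID.
# Prefers pwdx; falls back to the Linux-only /proc/PID/cwd symlink.
proc_cwd() {
    pid=$1
    if command -v pwdx >/dev/null 2>&1; then
        # pwdx prints "PID: /path"; strip the "PID:" prefix and any
        # whitespace after it, and print nothing if no path came back.
        pwdx "$pid" 2>/dev/null | sed -n 's/^[0-9]*:[[:space:]]*\(\/.*\)$/\1/p'
    elif [ -e "/proc/$pid/cwd" ]; then
        readlink "/proc/$pid/cwd"
    else
        echo "proc_cwd: no pwdx and no /proc/$pid/cwd" >&2
        return 1
    fi
}
```

Calling "proc_cwd $$" should print the directory your current shell is sitting in. For processes you don't own, expect to need root.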

For instance, we'll take this common scenario: df -k reports /tmp as 100% full, but du -sk /tmp shows next to nothing actually in there. My thinking here almost immediately gravitates toward vi, or some other program that opened up a buffer in memory (using /tmp or /var/tmp), got clipped unexpectedly and never let the system know that it was done with the space it had allocated for itself. This would normally not be an issue but, since your Linux or Unix machine "thinks" /tmp is full, whether or not it looks full makes no difference. It won't let you use the free space :(

This command line could be used to figure out which process has its working directory in /tmp or /var/tmp, which makes it the likeliest suspect:

host # ps -ef|awk '{print $2}'|xargs pwdx 2>&1|grep -iv cannot|grep /tmp
2969: /tmp
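If you'd rather not lean on grep (which will also match paths like /home/you/tmpfiles), the same idea can be sketched as a small function. Treat this as an illustration under a couple of assumptions: the name cwd_under is made up, your ps accepts the POSIX "-e -o pid=" form, and pwdx prints "PID: /path" the way it does above.

```shell
#!/bin/sh
# Hypothetical helper: print "PID<TAB>cwd" for every process whose working
# directory is DIR itself or somewhere beneath it.
cwd_under() {
    dir=${1%/}               # drop one trailing slash: "/tmp/" acts like "/tmp"
    ps -e -o pid= | while read -r pid; do
        # Keep only the path from pwdx's "PID: /path" output. Processes we
        # can't inspect, or that have already exited, yield an empty string.
        cwd=$(pwdx "$pid" 2>/dev/null | sed -n 's/^[0-9]*:[[:space:]]*\(\/.*\)$/\1/p')
        [ -n "$cwd" ] || continue
        case "$cwd" in
            "$dir"|"$dir"/*) printf '%s\t%s\n' "$pid" "$cwd" ;;
        esac
    done
}
```

Running "cwd_under /tmp" as root would then list only processes sitting in /tmp or below it, and "cwd_under /var/tmp" covers the other usual hiding place.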


Taking it a step further (assuming we trust our own output), we could just skip right to the process in question by adding a bit more to the pipe-chain:

host # ps -ef|awk '{print $2}'|xargs pwdx 2>&1|grep -iv cannot|grep /tmp|sed 's/^\([^:]*\).*$/\1/'|xargs -n1 ps -fp
UID PID PPID C STIME TTY TIME CMD
root 2969 2966 0 Mar 21 ? 0:05 /bin/vi /home/george/myHumungousFile


Since it's May already, and this process was started back in March, we can fairly safely assume that this PID belongs to an abandoned process (especially since it has no TTY associated with it), and (double-checking, just to be sure) we can probably solve our problem by killing that PID. See our previous post on killing zombie processes if it won't seem to go away and "ps -el" shows it in a Z state.
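When it does come time to kill it, a polite TERM first and a KILL only as a last resort is the usual order of operations. Here's a rough sketch of that; the function name stop_pid and the 5-second default are my own choices for illustration, not anything standard. Note that a process already in the Z state is dead and won't respond to either signal, which is the case that other post covers.

```shell
#!/bin/sh
# Hypothetical helper: ask a process to exit, and only force it if it refuses.
# Usage: stop_pid PID [SECONDS_TO_WAIT]
stop_pid() {
    pid=$1
    wait_secs=${2:-5}
    # kill -0 sends no signal; it only checks that we can signal the PID.
    kill -0 "$pid" 2>/dev/null || { echo "no such process: $pid" >&2; return 1; }
    kill -TERM "$pid" 2>/dev/null
    while [ "$wait_secs" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # gone: nothing more to do
        sleep 1
        wait_secs=$((wait_secs - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    kill -KILL "$pid" 2>/dev/null                # still there: force it
}
```

With the example above, that would be "stop_pid 2969", after one last look at the ps output to make sure nobody is actually still using that vi session.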

Yes, that example was pretty simplistic, but the same methodology can be used to find other programs using up other filesystems. Much like "lsof +d," you'll be able to find out what processes are sitting in what filesystems and narrow down your list of suspects, if you don't nail the correct one right away. Keep in mind that pwdx only knows about working directories, not every open file. On the other hand, since pwdx comes with your Linux or Unix OS, it's actually statistically more likely than lsof to be around when you need it :)

Cheers,

Mike