Saturday, November 17, 2007

How to Find a Rogue Process That's Hogging a Port

Hey there,

Today's little tip can come in useful even if the information you're seeking isn't "mission critical" (which, by the way, ranks among my least favorite terms. If there's one positive thing I can say about where I work now, it's that they don't describe every problem, resolution or project as if we were engaged in war -- but that could be an entirely separate post ;).

I've actually been asked to figure out what process was running on what port more often for information's sake than to try and figure out why something was "wrong," but the same principles apply. The scenario is generally something like the following:

Internal customer Bob needs to start (or restart) an application, but it keeps crashing with errors about how it can't bind to a port. Which port is necessarily vague, since, in my experience, it's very common to be asked to figure something out with little or no information. I consider myself lucky if I have a somewhat-specific description of the problem at the outset. As we all know, folks will sometimes just complain that "the server is broken." What does that mean? ;)

The troubleshooting process here is pretty simple and linear (perhaps more detail and information in a future post regarding similar issues, as any problem or situation can be fluid and not always follow the rules). In order to try and fix Bob's problem, we'll do the following:

1. Double check that the port (we'll use 1647 as an arbitrary example) is actually in use by running netstat:

netstat -an|grep 1647|grep LIST

You can leave out the final "grep LIST" if you just want to know whether anything at all is happening on port 1647. The output to look for is in the local address column (the format is generally IP_ADDRESS:PORT, like 192.168.1.45:1647 or *:1647; depending on your OS the colon may be a dot). Keep in mind that grep will match 1647 anywhere on the line, so, whether or not you're checking for a LISTENing process, lines that only show a connection from some port on your machine to foreign port 1647 shouldn't concern you.
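If you'd rather not eyeball the columns, here is a minimal sketch of a filter that keeps only the lines where 1647 is the local port. The function name local_port_lines is made up for illustration. It assumes the local address is the first address-looking field on each line, which holds for the Linux and Solaris netstat layouts I'm aware of, and it uses awk -v, so on Solaris you would likely need nawk in place of awk.

```shell
# local_port_lines PORT: hypothetical helper, reads "netstat -an" output on
# stdin and prints only the lines whose LOCAL address ends in PORT.
local_port_lines() {
    port="$1"
    awk -v port="$port" '
        {
            for (i = 1; i <= NF; i++) {
                # The first field that looks like ADDRESS:PORT or ADDRESS.PORT
                # is assumed to be the local address.
                if ($i ~ /[.:][0-9*]+$/) {
                    if ($i ~ ("[.:]" port "$")) print
                    break
                }
            }
        }'
}

# Usage:
#   netstat -an | local_port_lines 1647 | grep LIST
```

This also avoids false hits on ports that merely contain the digits, such as 16470.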

2. We're going to assume that you actually found that the port is either LISTENing, or actively connected to, on your local machine (if it isn't, your troubleshooting would likely take a much different turn at this point). Now we'll try to figure out what process is using that port.

If you have lsof installed on your machine, figuring this out is fairly simple. Just type:

lsof -i :1647

and you should get a list of the processes (or single process) using port 1647, with each process ID shown in the PID column. The PIDs are probably all going to be the same, but, if not, take note of all of them.

3. Run something along the lines of:

ps -ef|grep PID

(where PID is the process ID you noted in step 2) and problem solved! You now know what process is listening on port 1647. If Bob has no idea why it won't let go of the port, and the standard methods of stopping whatever program it is don't work, you'll probably end up having to hard kill it.
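Steps 2 and 3 can be folded together. The sketch below assumes your lsof supports the -t option (terse output, bare PIDs only) and that your ps accepts the POSIX -p and -o options; describe_pids is a made-up helper name, not a standard command.

```shell
# describe_pids: hypothetical helper, reads PIDs on stdin (one per line)
# and prints the PID and full command line for each distinct PID.
describe_pids() {
    sort -un | while read -r pid
    do
        # "pid=" and "args=" suppress the header line; if ps can't report
        # on the PID (it may have exited already), say so instead.
        ps -p "$pid" -o pid= -o args= 2>/dev/null || echo "$pid (no longer running)"
    done
}

# Usage (assumes lsof is installed):
#   lsof -t -i :1647 | describe_pids
```

Matching on the exact PID with ps -p also sidesteps the usual grep noise, where "ps -ef|grep PID" matches the grep itself or any line that happens to contain the same digits.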

But, sometimes, the last part isn't that simple, so:

4. What's that? lsof isn't installed on your machine? My first inclination is to recommend that you download it ;) Seriously, it's a valuable tool that you'll find a million uses for. But you can find out the process ID another way, just in case you can't get your hands on it and/or time is of the essence, etc.

In this instance, and we'll just assume the worst, you can use two commands called "ptree" and "pfiles" (these are standard on Solaris in /usr/proc/bin - may be located elsewhere on your OS of choice and/or named somewhat differently). Use the following command to just grab all the information possible and weed it down to the process using port 1647:

for x in `ptree -a | grep -v ptree | awk '{print $1}'`
do
# pfiles lists each socket as "port: N"; report the PID only on a match
pfiles $x 2>/dev/null | grep "port: 1647" >/dev/null && echo "PID $x has port 1647 open"
done

and you'll get one line of output for each PID that has port 1647 open. The above is, admittedly, somewhat messy (not really messy, but it walks every process on the box and runs pfiles against each one ;) Feel free to tailor it to your needs and make it more general (I explicitly used port 1647, but that should also be a variable if you want to create a little script to keep in your war chest).

Run your ps, as above, and now you should know what process is hogging that port and, in the process, making Bob's life miserable. If you cleanly kill that process, Bob should have one less thing to worry about and his program should be able to bind to the now-free port :)
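If you do want that war chest version, here's a rough sketch with the port passed in as an argument. The function name pids_on_port is my own invention. It assumes ptree and pfiles are on your PATH and that pfiles reports sockets in the "port: N" form it uses on Solaris, so adjust the pattern if your OS prints something different.

```shell
# pids_on_port PORT: hypothetical helper for boxes without lsof.
# Prints the PID of every process that pfiles shows with a socket on PORT.
pids_on_port() {
    port="$1"
    for pid in $(ptree -a | grep -v ptree | awk '{print $1}')
    do
        # Anchor the match at end of line so that 1647 doesn't match 16470.
        if pfiles "$pid" 2>/dev/null | grep "port: ${port} *\$" >/dev/null
        then
            echo "$pid"
        fi
    done
}

# Usage:
#   pids_on_port 1647
```

You'll generally want to run it as root, since pfiles can't inspect processes you don't own, and those errors are silently thrown away here.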

, Mike