Tuesday, December 11, 2007

What To Do When nohup Hangs Up Anyway - Down and Dirty

nohup is one of those commands that pretty much everyone who uses Unix knows about, and for good reason. If you don't have access to a system console, or some other direct connection to a machine at work, you'll naturally want any utility that lets you start a command from your terminal software, shut down your PC and go home. Even people who've never thought about this before will find nohup almost immediately after typing something like "./sqlplus @extremelyLongDatabaseQuery" five minutes before quitting time ;)

nohup is the natural end point of most searches: it's been around so long that some shells (csh and tcsh, for example) even provide it as a built-in.

nohup stands for, literally, "no hangup." In plainer terms, any command run through nohup is supposed to ignore the hangup signal (SIGHUP), which is sent when a user disconnects from his or her pseudo-terminal (pty). That's what it's supposed to do, anyway. In reality, I'd give it about a 50% reliability rating. One of the complaints I hear most on the job is that "such and such" a command was run with nohup, the user disconnected, and the process never completed. In plain fact, it stopped running as soon as the user logged out and the hangup signal was sent, which is exactly what nohup is designed to prevent (nohup ./sqlplus @extremelyLongDatabaseQuery really does work, in theory ;)
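For reference, here is a minimal sketch of a typical nohup invocation, with "sleep 60" standing in for the real long-running job. The output redirection is my own addition, so nohup has no reason to create nohup.out. The second half assumes Linux: it reads the SigIgn bitmask from /proc/PID/status, where bit 0 corresponds to SIGHUP (signal 1), to confirm the process really is ignoring hangups. A program that installs its own SIGHUP handler after starting would clear that bit, which is one plausible way a nohup'd job can still die.

```bash
#!/usr/bin/env bash
# "sleep 60" is a stand-in for your real long-running command.
# Redirecting output ourselves means nohup has no reason to create nohup.out.
nohup sleep 60 > /tmp/nohup-demo.log 2>&1 < /dev/null &
pid=$!
sleep 1   # give nohup time to ignore SIGHUP and exec the real command

# Linux-only: SigIgn in /proc/PID/status is a hex bitmask of ignored
# signals, and bit 0 (value 1) corresponds to SIGHUP (signal 1).
mask=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status")
if (( 0x$mask & 1 )); then
    echo "PID $pid is ignoring SIGHUP"
else
    echo "PID $pid would still be killed by SIGHUP"
fi
```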

Alternatives to this command exist, of course. On my ratings scale, from worst to best, you've got (to my knowledge):

1. nohup <-- Included as an alternative only to hammer home how much heartache this command causes me on a regular basis.
2. disown -h JOBID <-- This is a bash built-in, and exists as a way to block SIGHUP for a job that's already running. You can find the job ID, while still in your shell, by typing: jobs (If your job is listed as number 2, you can then run "disown -h %2". In fact, you don't really need the "-h" flag. With -h, the job stays in the job table but is marked so bash won't send it SIGHUP; without it, the job is removed from the table entirely, so bash won't send it SIGHUP either. The outcome is pretty much the same.)
3. setsid <-- Not always available. I haven't used it often enough to know exactly which distros ship it as standard (on Linux it's part of util-linux), but I've run it enough to know that it runs commands in a new session, detached from your pty, which keeps your command running after you log out.
4. GNU Screen <--- This is "almost" the best because you can run a process in one screen, detach it, log out, and later reattach to that same screen, either yourself or as any other user with access, from a different pty at any point before the process finishes.
5. The Down and Dirty Method <--- Explained below.
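To make the disown and setsid entries concrete, here is a small sketch, again with "sleep 60" standing in for a real job. It assumes bash (disown and `jobs -p` are bash built-ins) and a Linux box with util-linux setsid and a procps-style ps. The first half shows the -h difference described above: the job is still listed after "disown -h", and gone after a plain "disown". The second half checks that setsid really made the job the leader of its own session (its SID equals its PID). Screen isn't scripted here.

```bash
#!/usr/bin/env bash
# --- disown (bash built-in) ---
# "sleep 60" stands in for a real long-running job.
sleep 60 &
dpid=$!
disown -h %1             # job stays in the table; bash just won't SIGHUP it
after_h=$(jobs -p)       # still prints the job's PID
disown %1                # without -h: removed from the job table entirely
after_plain=$(jobs -p)   # prints nothing
echo "after -h: '${after_h}'  after plain disown: '${after_plain}'"

# --- setsid (util-linux on Linux) ---
# Start the job as the leader of a brand-new session, detached from ours.
setsid sleep 60 > /tmp/setsid-demo.log 2>&1 < /dev/null &
spid=$!
sleep 1                  # let setsid create the session and exec sleep
ps -o pid=,sid=,comm= -p "$spid"   # the SID column should equal the PID
```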

The Down and Dirty Method isn't an industry term; it's just what I call doing what most of the programs above are built to do, in a way that, in my experience, is more consistently effective. To make sure my programs keep running after I log off (if I happen not to have access to a console), I background the command I want to run and execute that backgrounded command in a subshell. This probably sounds more complicated than it actually is. For example, to background a command, instead of typing:

./command

you would type

./command &

The ampersand character runs the job for you in the background. You can verify that it's still running by typing jobs at the command line.
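A quick sketch of plain backgrounding, with "sleep 30" as a stand-in for ./command. The only thing added beyond the text is $!, a standard shell parameter holding the PID of the most recent background job, which is handy later for ps or kill.

```bash
#!/usr/bin/env bash
# "sleep 30" is a stand-in for ./command.
sleep 30 &
bg_pid=$!          # $! holds the PID of the most recent background job
jobs               # the shell lists it as job [1], still running
echo "backgrounded as PID $bg_pid; the prompt came straight back"
```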

To run a command in a subshell (in sh, ksh and bash, for example), you just need to surround it with parentheses, like so:

(./command)

You won't see this job when you type jobs (and since it runs in the foreground, you won't get your prompt back until it finishes anyway), but, assuming it has a long enough run time, you can see it in the output of "ps -ef" from another terminal. Note that when you execute a command like this, ps shows your user id as the process owner (UID column) and your shell, or the subshell it spawned, as its parent (PPID column).
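Here's a sketch of that parent/child relationship, using ps itself as the stand-in for ./command so nothing else needs to be running. It assumes bash, since $BASHPID (the PID of the current subshell, unlike $$, which keeps reporting the original shell) is bash-specific, and a procps-style ps. Whether bash forks once or twice for the parentheses, the PPID it reports comes back to the shell that typed the command.

```bash
#!/usr/bin/env bash
echo "this shell: PID $$"
# The parentheses fork a child copy of the shell. $BASHPID (bash-specific)
# expands to that child's PID, so ps reports the child and its parent.
( ps -o pid=,ppid=,comm= -p "$BASHPID" ) > /tmp/subshell-ps.txt
cat /tmp/subshell-ps.txt   # second column (PPID) matches "this shell" above
```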

Now, to make use of the Down and Dirty Method, you just need to type the following amalgam of the above concepts:

(./command &)

And that's it. You've backgrounded the command, by following it with an ampersand, and executed it in a subshell, by surrounding it with parentheses. The key reason this almost always works is the way the shell processes your request. First, it starts the subshell in which to execute your command. The subshell then launches the command in the background and, with nothing left to do, exits right away. That leaves your command orphaned, so it gets adopted by init (PID 1). Since your login shell never started it as one of its own jobs, it isn't in the shell's job table, and it doesn't get the SIGHUP the shell passes along to its jobs when your session hangs up. You can verify this in "ps -ef" output: the UID column still shows you as the owner, but the PPID (parent process ID) column now shows 1, which is init. (The process does still have your pty as its controlling terminal, so it's a good idea to redirect its output to a file.) It's not going anywhere until it's finished, unless someone with the privilege to kill it, you or root, does so.
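A sketch of the method end to end, with "sleep 60" standing in for ./command. Two things here go beyond the original recipe and are my assumptions: the output redirection (for the controlling-terminal reason above), and writing $! to a hypothetical pidfile, /tmp/dd-demo.pid, from inside the subshell so the outer shell can find the job again. On newer Linux systems the orphan may be adopted by a "subreaper" process rather than PID 1, so the check below only confirms that the parent is no longer your shell.

```bash
#!/usr/bin/env bash
# "sleep 60" is a stand-in for ./command. Redirecting output is an addition
# to the method: the job keeps your pty as its controlling terminal, so
# writing there after you log out could fail.
( sleep 60 > /tmp/dd-demo.log 2>&1 < /dev/null &
  echo "$!" > /tmp/dd-demo.pid )    # inside the subshell, $! is the job's PID
pid=$(cat /tmp/dd-demo.pid)

jobs                                   # prints nothing: this shell never owned the job
ps -o pid=,ppid=,user=,comm= -p "$pid" # still your user, but the PPID is no longer this shell
```

Because the subshell has already exited by the time the next line runs, the job has already been reparented when ps looks at it.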

Hope this helps you out in some form or fashion. Even if we agree to disagree on what works best, it's always good to have alternatives :)

, Mike