Thursday, December 20, 2007

Getting Error Values From The Middle Of A Pipe Chain In Bash

This is something very interesting I found out not too long ago, while hashing out some work with a colleague. As most administrators (or users) who do a fair amount of shell scripting know, the exit status or return code of a process is a fairly common way to determine the process flow of a script. I'll loosely call it "errno" throughout this post, although the man pages call it the "exit status" and errno proper is the C library's error variable.

The one thing about the value of "errno" (or, literally, the variable "$?" in most shells) is that it's overwritten by each command that gets run. So if you were to run a series of command lines that echoed the return value of the grep command, the following example would be accurate (assuming the string "bob" can't be found in /home/myfile):

host # grep bob /home/myfile >/dev/null 2>&1
host # echo $?
1


while this one would give you misleading information:

host # grep bob /home/myfile >/dev/null 2>&1
host # touch /home/myfile
host # echo $?
0


So, on the first set of command lines, you're actually getting the return code of 1 from grep (indicating that it can't find the string "bob" in /home/myfile), while the second one gets you the return code of 0 from the touch command. "errno" always contains the return value of the last-executed command.
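
Here's a minimal sketch of the usual way around that: stash $? in a variable on the very next line. The scratch file and the variable names (scratch, grep_status, touch_status) are my own placeholders for illustration, not anything from a real script.

```shell
#!/usr/bin/env bash
set +e   # make sure a failing command doesn't abort this demo

# Hypothetical scratch file that deliberately does not contain "bob"
scratch=$(mktemp)
echo "alice" > "$scratch"

grep bob "$scratch" >/dev/null 2>&1
grep_status=$?      # capture right away, before anything else runs

touch "$scratch"    # this overwrites $? with touch's own status
touch_status=$?

echo "grep returned $grep_status, touch returned $touch_status"
rm -f "$scratch"
```

Since grep's status is saved before touch runs, it should still be available as 1 at the end, while touch reports 0.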

Which brings us around to the topic indicated in the title of this post (I promise to tie in the whole introduction about "errno" at the end; it wasn't a complete waste of your time ;). While it's easy enough to trap "errno" in any series of disconnected commands (for instance, in the second example above, if we'd echoed $? before running touch, it would have given us the correct output), I had always thought it was impossible to grab the correct value from a command in the middle of a pipe chain, like this:

host # grep bob /home/myfile 2>&1|Grep joe|xargs echo
host # echo $?
0


You'll note that I purposefully capitalized the G in grep so that it would return an error code that didn't indicate success. Yet, since this is a chain of commands all connected by pipes, "errno" holds the value from the xargs command, because $? only reflects the last command in the pipeline. Which means I've spent a lot of time jumping through hoops to "reword" any pipe chain so that I could extract the information I needed.

Now (and I'm almost positive this wasn't the case a few years back) the bash shell has actually taken on this predicament and come up with a nice workable solution for it (I'm waiting for it to pop up in sh and ksh, since they've been burned into my psyche over the last decade or so). In bash, if you run a series of piped-together commands, you can actually extract the value of "errno" from any command in the chain by using the shell's built-in PIPESTATUS array variable, like so:

host # grep bob /home/myfile 2>&1|Grep joe|xargs echo
host # echo ${PIPESTATUS[@]}
1 127 0


How nice is that? :) Now you can easily tell the return value of every process in a pipe-chain. The initial grep returns 1 because the string "bob" isn't in /home/myfile, the misspelled Grep returns 127 because the command can't be found and the final xargs returns 0. That solves a lot of problems and can potentially save you lines upon lines of convoluted code.
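
In a script, that might look something like the sketch below. It assumes bash, uses a made-up command name (no_such_command_xyz) in place of the misspelled Grep so the result doesn't depend on your filesystem, and swaps in cat as the last stage. The scratch file and the statuses array are hypothetical names too.

```shell
#!/usr/bin/env bash
set +e   # make sure a failing stage doesn't abort this demo

# Hypothetical scratch file that deliberately does not contain "bob"
scratch=$(mktemp)
echo "alice" > "$scratch"

# The middle stage is a made-up command name, so bash reports 127 for it
grep bob "$scratch" 2>&1 | no_such_command_xyz joe 2>/dev/null | cat
statuses=("${PIPESTATUS[@]}")   # copy every stage's status in one go

# Walk the saved statuses, one per stage of the pipe chain
for i in "${!statuses[@]}"; do
    echo "stage $i exited with ${statuses[$i]}"
done
rm -f "$scratch"
```

The loop should report 1 for grep, 127 for the missing command and 0 for cat, which is the same picture as the one-liner above, just in a form you can branch on.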

The one thing about it that can be frustrating is that it behaves in much the same way as "errno" (See, I told you I'd bring it back around ;). If you don't read the values immediately (or dish them off into another variable), the array gets overwritten as soon as you run your next command, and ends up holding only that command's return value, like so:

host # grep bob /home/myfile 2>&1|Grep joe|xargs echo
host # touch /home/myfile
host # echo ${PIPESTATUS[@]}
0


At this point, after we've executed the touch command, the PIPESTATUS array has been overwritten, just like "errno" gets written over, even though we haven't executed another pipe chain. Its behaviour is basically identical. Below, we show that, once the array has been written over, its size gets reduced to 1 (the single return value of the last executed command), and we further prove that the array really has been clipped down to one element by attempting to print the first and second values, the second of which doesn't exist. Continued from above:

host # echo ${#PIPESTATUS[@]}    <--- Here we ask bash for the size of the PIPESTATUS array
1
host # echo ${PIPESTATUS[0]}    <--- Here we check the first element in the PIPESTATUS array
0
host # echo ${PIPESTATUS[1]}    <--- Here we check the second element, which now doesn't exist
host #
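
The same check can be sketched non-interactively. This assumes bash and uses the false and true builtins as stand-ins for the real commands. All of the variable names are made up for the example.

```shell
#!/usr/bin/env bash
set +e   # make sure a failing stage doesn't abort this demo

false | true | true                # three stages, so three statuses are recorded
pipeline_count=${#PIPESTATUS[@]}   # read the size immediately

true                               # any simple command, standing in for touch
after_count=${#PIPESTATUS[@]}      # the array now describes that one command
after_first=${PIPESTATUS[0]}
after_second=${PIPESTATUS[1]:-missing}   # falls back because no second element exists

echo "size right after the pipe chain: $pipeline_count"
echo "size after one more command: $after_count"
echo "first element: $after_first, second element: $after_second"
```

The size should drop from 3 to 1, with the second element falling back to the "missing" placeholder.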

This is easy enough to get around, however, since - just like "errno" - you can assign that array to another array before you execute another command, like so:

host # grep bob /home/myfile 2>&1|Grep joe|xargs echo
host # new_array=("${PIPESTATUS[@]}")
host # touch /home/myfile
host # echo ${PIPESTATUS[@]}
0
host # echo ${new_array[@]}
1 127 0
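
One detail worth watching when you make the copy: the parentheses matter. Here's a small sketch, assuming bash and using made-up variable names, that contrasts a copy made without them (which gives you a single space-separated string) with one made with them (which gives you a real array you can index).

```shell
#!/usr/bin/env bash
set +e   # make sure a failing stage doesn't abort this demo

false | true | true
as_string=${PIPESTATUS[@]}       # no parentheses: one string holding "1 0 0"

false | true | true
as_array=("${PIPESTATUS[@]}")    # parentheses: a three-element array

echo "string copy has ${#as_string[@]} element(s): '${as_string[0]}'"
echo "array copy has ${#as_array[@]} element(s), the first is '${as_array[0]}'"
```

Both copies print the same thing when echoed whole, which hides the difference until you try to pick out a single stage by index.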


If you knew this already, I envy you the convenience you continue to enjoy. For the rest of us, a pleasant surprise :)

Best Wishes,

Mike