Friday, June 6, 2008

Piped Variable Scoping In The Linux Or Unix Shell

Hey There,

Today we're going to look at variable scoping within a piped-while-loop in a few Linux and Unix shells. We're actually almost at this point in our series of ongoing posts regarding bash, Perl and awk porting.

Probably the most interesting thing about zsh (and shells that share this characteristic) is that the scope of variables set on the far side of a pipe is slightly different than in other shells, like bash and ksh. (Note that not all vendors' versions are equal, even if they have the same name! For instance, HP-UX's sh is the POSIX shell, while Solaris' is not.) I'm taking care to separate the piping construct from the "is the while loop running in a subshell?" argument, as I don't want to get too far off course. And, given this material, that can happen pretty fast.

For a very simple demonstration of whether the scoping issue is a "problem" (defining problem as either a bug or a feature ;) with the while-loop or pipes, we'll look at a very simple "scriptlet" that sticks to using a while-loop, without any piping, like this:

while true
do
bob=joe
echo " $bob inside the while"
break
done
echo " $bob outside the while"


And we can see, easily, that the value of the "bob" variable stays the same, even after the while loop breaks, for all 3 of our test shells. If the while loop, alone, were the issue, bob wouldn't be defined after the while loop breaks:

host # zsh ./test.sh
joe inside the while
joe outside the while
host # ksh ./test.sh
joe inside the while
joe outside the while
host # bash ./test.sh
joe inside the while
joe outside the while



If we change this scriptlet slightly to make it "pipe" an echo to the while-loop, the behaviour changes dramatically:

echo a|while read a
do
bob=joe
echo " $bob inside the while"
break
done
echo " $bob outside the while"


Now, if we use zsh, the value assigned to the "bob" variable inside our while loop (which now sits on the other side of the pipe) actually maintains its state when coming out of the loop, like this:

host # zsh ./test.sh
joe inside the while
joe outside the while


On most other shells, because of variable scope issues with the pipe, an empty value of the "bob" variable is printed after they break out of the while loop, even though it does get correctly defined within the while loop. This is because (and here's where the technicality, and subtle differences between myriad shells, usually becomes a hotbed of raging debate ;) everything after the pipe, the while loop and its read command included, runs in a subshell, like so:

host # bash ./test.sh
joe inside the while
outside the while
host # ksh ./test.sh
joe inside the while
outside the while


Notice, again, that the "echo $bob outside the while" statement in these two executions prints an empty value once we're outside the while loop, even though bob was set within the while loop.
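
If it helps to see the same effect without any pipe involved, here's a minimal sketch (not part of the original test script; the variable names after_subshell and after_braces are just made up for illustration). It assumes that a shell which loses "bob" after the pipe is treating the loop the way it treats the parentheses below, which is what the bash and ksh output above suggests:

```shell
#!/bin/sh
# Hypothetical demo: names other than "bob" are illustrative only.
bob=unset

# Parentheses start a subshell: the assignment happens in a child
# process and is gone when that process exits.
( bob=joe; echo " $bob inside the subshell" )
after_subshell=$bob

# Braces group commands in the current shell, so the assignment sticks.
{ bob=joe; echo " $bob inside the braces"; }
after_braces=$bob

echo " $after_subshell after the subshell"
echo " $after_braces after the braces"
```

The first "after" line should still show the starting value, and the second should show joe.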

For most shells, this is easy to get around in one respect. The problem stems from the value being piped to the while loop, and is not a direct fault of the while loop itself, so taking the pipe out of the picture should fix it, and does. Unfortunately, when the input comes from a command (such as an echo statement), you won't be able to use a while-loop in many cases, and would have to substitute a for-loop, like so (in most shells, redirecting at the end of a while loop with << starts a here-document, which, without a properly terminated body, will either result in an error or clip the script at that line):

for x in 1
do
bob=joe
echo " $bob inside the while"
done
echo " $bob outside the while"


host # zsh ./test.sh
joe inside the while
joe outside the while
host # ksh ./test.sh
joe inside the while
joe outside the while
host # bash ./test.sh
joe inside the while
joe outside the while
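
The for-loop above only runs over a fixed list. As a rough sketch of how the same substitution might look when the values really do come from a command, the loop can iterate over a command substitution instead (the echo "a b c" command and the count/last variables are placeholders, not part of the original script). Keep in mind that this splits the output on whitespace, not on lines, so it isn't a drop-in replacement for "while read":

```shell
#!/bin/sh
# Hypothetical demo: the command and variable names are placeholders.
count=0
last=

# The command runs in a subshell, but the loop itself stays in the
# current shell, so assignments made in the body survive the loop.
for word in $(echo "a b c")
do
    count=$((count + 1))
    last=$word
done

echo " $count words seen, last was $last"
```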


This gets worse (usually hangs) if you try to get around the pipe by doing some inline subshelling with backticks, like:

while read `echo 1`

However, the following solution (awkward though it may be) does actually do the trick (substitute any other fancy i/o redirection you want, as long as you "avoid the pipe"):

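# Write the input first, with an ordinary redirect and a trailing newline
echo "a" > /tmp/bob
# Open the file for reading on descriptor 7, starting at the beginning
exec 7</tmp/bob
while read -r line <&7
do
bob=joe
echo " $bob inside the while"
done
echo " $bob outside the while"
# Close descriptor 7 and clean up
exec 7<&-
rm /tmp/bob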

host # zsh ./test.sh
joe inside the while
joe outside the while
host # ksh ./test.sh
joe inside the while
joe outside the while
host # bash ./test.sh
joe inside the while
joe outside the while


For more examples of input/output redirection, check out our older post on bash networking using file descriptors.

Now, when it comes to reading in files, the case is a bit easier to remedy. If you're in the habit of doing:

cat SOMEFILE |while read x
...


You'll run into the same scoping problem. This is also easily fixed by using i/o redirection, which would change our script to this:

while read x
do
bob=joe
echo " $bob inside the while"
done < SOMEFILE
echo " $bob outside the while"


Assuming the file SOMEFILE had one line of content, you'd get the same results as we got above with the for loop.
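
For a self-contained version you can run as-is, here's a small sketch along the same lines. The temporary file stands in for SOMEFILE, and the lines and last variables are made up for the example; the point is just that values built up across iterations are still there after "done":

```shell
#!/bin/sh
# Hypothetical demo: a temporary file plays the part of SOMEFILE.
somefile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$somefile"

lines=0
last=
# Redirecting the file into the loop keeps the loop in the current shell.
while read -r x
do
    lines=$((lines + 1))
    last=$x
done < "$somefile"

echo " $lines lines read, last was $last"
rm -f "$somefile"
```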

And that's about all there is to that (minus the highly probable ensuing arguments ;). There are, I'm sure, a couple more ways to do this, but using the methods that fixed the "problem" of variable scope in bash and ksh is probably better practice. zsh (and shells that share its distinction in this case) is a rare exception to the rule (even though zsh may very well be doing things the "proper" way), and the bash/ksh fix works in zsh, while the opposite is not true.

At long last, good evening :)

, Mike


Thanks for this comment from Douglas Huff, which helps to clarify the underbelly of the process:

A friend of mine pointed me to this article and the
previous one in the series that you wrote [on variable scoping]...

I had two comments on these articles but you seem to have
comments disabled, so I figured I'd email them to you.

First, calling it a "scoping" issue is a bit misleading.
While technically true, understanding the underlying
reasons why this doesn't work as "expected" is key to
understanding how you can work around it in POSIX sh or in
ksh without the zsh/bash syntactical sugar for doing so.

What's going on is that a process cannot modify the
environment of its parent.

When you do:

something | while read blah; do blah; done

What the shell is doing is first executing a subshell
(separate process) that runs the while with stdin
redirected to read from the unnamed pipe. Then in another
subshell it runs "something" with standard out redirected
to the unnamed pipe.

Knowing this it's quite easy to replicate the behaviour
from bash 2/3 and zsh in POSIX sh and ksh with a bit of
understanding of the underlying mechanics. The trick is to
keep the while inside of the original process (since it is
run by the interpreter and does not require a separate
process) and execute the other command in a subshell.
Which is exactly what the syntactical sugar does for you
behind the scenes in bash2&3/zsh.
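
Douglas didn't include code, so the following is only one possible reading of the trick he describes, sketched with a here-document wrapped around a command substitution. The printf command stands in for "something", and the total variable is invented for the example. The command runs in a subshell, while the while loop stays in the original process:

```shell
#!/bin/sh
# Hypothetical demo: printf stands in for "something".
total=0

# The command substitution runs printf in a subshell; its output
# becomes the body of the here-document that feeds the loop.
while read -r n
do
    total=$((total + n))
done <<EOF
$(printf '1\n2\n3\n')
EOF

echo " $total outside the while"
```

Since the loop never leaves the original shell, the running total should still be set when the loop finishes.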