
Thursday, August 14, 2008

Back To Basics: Avoiding Recursive Alias Disasters On Linux And Unix

Hey there,

Today, we're going to step into the wayback machine and take a look at something very basic that, if ignored or forgotten, can potentially have devastating side-effects. And, I'm not referring to the way I used to drink in college ;) This has more to do with an op-ed piece we ran a while back about paying attention to the small things. In fact, it has exactly to do with that.

The topic for today's blast of hot air is "infinite recursion." Specifically, we'll be looking at a kind of insidious way of getting stung by that issue through the use of shell aliases. They seem harmless, and they do make it so you can, for instance, type "sit-stay" instead of "while :;do print -n "*";sleep 15;done" when you want to walk away from your terminal and not disconnect your SSH session, but they do have a down side, which is fairly easy to exploit. In my experience, these cpu-killers happen most often "by accident" rather than with any malicious intent. Nevertheless, their very existence cries out for caution (Tomorrow I might write some fiction to get all these fifty cent phrases out of my system. Although some folks may contend that I've done enough of that on this blog already ;)
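
For the record, cooking up an alias like that is a one-liner. A minimal sketch, assuming ksh or another shell with the "print" built-in ("sit-stay" is just my pet name for it):

host # alias sit-stay='while :;do print -n "*";sleep 15;done'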

The infinite recursion problem with aliases closely parallels a very simple exploit that can be run in a simple shell script. The ends are the same, and the means are closely related. For instance, if you create a shell script called "ls," with the contents:

#!/bin/sh

# Because this script is named "ls" and sits earlier in the PATH than
# the real binary, the "ls" below invokes this script again, recursively.
ls;sleep 1500000000


and manage to get that in a user's PATH before the real /bin/ls (or /usr/bin/ls), it'll ratchet up the number of open processes and filehandles very quickly. If let sit, it will take down any machine of any size eventually. This end can actually be accomplished even more simply than that, but we're not trying to encourage reckless behaviour; just rubbernecking a little ;)
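As an aside, if you're ever suspicious, you can ask your shell what it's actually going to run before you commit to anything. A quick sketch, assuming bash (ksh users can get the same information from "whence -v"):

host # type -a ls    <--- lists every alias, function and PATH binary named ls, in lookup order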

The problem you can get into with simple shell aliases is when you have an alias for a command (like "ls") that actually already exists. I will usually make my aliases unique (like "sit-stay," above, which substitutes for an asterisk-printing while loop), but not everyone does. And, sometimes, an alias that is unique can become the opposite if you change OS's. For instance "ll" doesn't exist on Solaris, but it does on HP-UX.

It's very important, when you create an alias named the same as a system binary (or shell built-in), to compensate for that fact. It's actually made very easy to protect yourself, from yourself, by pretty much every shell I've ever worked with. At the most basic level, you can instruct your shell to not even attempt alias expansion of whatever you type on the command line by simply enclosing it in quotes (single or double should both work - if not, let me know what shell you're using. It would be interesting to have a list of shells that make this distinction). So, while:

host # cd /tmp

would (and I'm oversimplifying this by a preposterous degree) cause the shell to check, in order, whether "cd" was an alias, a function, a shell built-in, an actual binary in our PATH or an erroneous entry (stopping at the first match), typing:

host # "cd" /tmp

would remove the "alias check" from the equation. Problem solved and/or potential for damage neutralized. Just like the simple shell script above, if you created an alias for "cd" (which might differ from this shell to yours) like this one:

host # alias cd="cd $@"

you could have a problem on your hands. Lord knows why you'd want to do this, since you'd still be typing "cd wherever" either way, but I'm just being obtuse to make a point ;) In this situation, every time someone types something like:

host # cd /my/home/dir

The shell will determine that "cd" is an alias before it processes the command and, while it's processing the alias it will see that the actual command (to which the alias refers) contains an alias, which it will process, etc, etc, and, before you know it, system resources will become depleted to the point that no new processes can be forked. The machine, however, will be completely forked ;)

The preferred way (or so they tell me) is to encapsulate your alias in a function of the type _NAME (for the command NAME) (even though you, theoretically, only have to make sure your alias has the "real" command in quotes):

function _cd {
"cd" "$@"
}
alias cd="_cd"


If you're lazy, like me, you'll just do this, instead:

alias cd='"cd" $@'

And, of course, if you want to have your alias/function print out where you've cd'ed to, you'll need to keep in mind that the return code ($?) for cd will be overwritten when you do the print or echo. This is easy enough to handle by grabbing the value of $? before printing, and we use it in a lot of our scripts:

function _cd {
"cd" "$@"
return=$?
echo $PWD
return $return
}
alias cd="_cd"


and you shouldn't have to worry about recursive calling of the "cd" (or any) command ever again. Assuming you always follow these precautions when dreaming up your aliases :)
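
As a quick sanity check, with that function and alias in place, changing directories should echo your new location and still hand back cd's true exit status:

host # cd /tmp
/tmp
host # echo $?
0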

Cheers,

, Mike




Please note that this blog accepts comments via email only. See our Mission And Policy Statement for further details.

Wednesday, August 6, 2008

Using Bash To Feed Command Output To A While Loop Without Using Pipes!

Hey There,

Today's post regards something I just picked up. In fact, it's something that's been driving me nuts for a long, long time (Reference our earlier post on piped variable scoping in Linux or Unix). The issue I'm writing about is something I've been puzzling over for quite a while in my spare time. I haven't been mulling over how to "do it myself" creatively, but more about "why has this feature never existed when it seems so essential?" As it turns out, this feature "has" existed, although it was a little hard to find in bash 2.x. With bash 3.x, it's brought to the fore and given the attention it deserves (including its own name ;)

HUGE HELPFUL HINT: If you don't care about the process I went through to find the answer for bash 2.x, and just want to know how to do it, skip down to the PROBLEM SOLVED section which is named appropriately and in the same SCREAMING typeface ;)

The issue is that of command line output (or, if you prefer to think about it the other way, STDIN command line input) redirection and handling. For instance, if you wanted to avoid a problem with scoping in bash and you were reading your input from a file, you could change this block of code:

cat FILE|while read line
do
echo $line
done


to the more appropriate and efficient:

while read line
do
echo $line
done <FILE


On the other hand, if you were dealing with command output, you "couldn't" replace this block of code:

ls -1d *|while read line
do
echo $line
done


with this:

while read line
do
echo $line
done < ls -1d *


...but that's just common sense, since the input redirection operator would expect "ls" to be a file, and you'd get an error like:

./program: syntax error near unexpected token `-1d'
./program: line 4: `done < ls -1d *'


So, other than the file descriptor exec workaround (which is really just a fancy way of outputting your process's STDIN and STDOUT streams to a file and reading from it; completely contrary to the spirit of having this work naturally) the following might seem like a reasonable way to "feed" your while loop the command output (only showing the last lines of the following code blocks for brevity's sake as we roll through the error scenarios):

done < `ls -1d *`

but, this results in:

./program: `ls -1d *`: ambiguous redirect

and you'd get the same thing using the bash built-in's:

./program: $(ls -1d *): ambiguous redirect

Double redirect doesn't work either (<< ls -1d * - and << `ls -1d *` returns only a non-zero exit code with no output):

./program: line 4: syntax error near unexpected token `-1d'
./program: line 4: `done << ls -1d *'


And we've considered subshells, which don't work either:

./program: line 4: syntax error near unexpected token `*)'
./program: line 4: `done <(ls -1d *)'


PROBLEM SOLVED!

But here's a really neat trick for getting this to work in bash 2.x. If you change your program to be structured like so:

while read line
do
echo $line
done < <(ls -1d *)


Your outcome will result in success!! You've got the command output and you didn't have to use a pipe to feed it to the while loop!
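
And here's part of why it matters: with no pipe in play, the while loop isn't shunted off into a subshell, so any variables you set inside it survive the loop. A minimal sketch of the difference, using the same ls:

count=0
while read line
do
count=$((count+1))
done < <(ls -1d *)
# count keeps its value here; pipe it in via "ls -1d *|while ..." and it would be empty
echo "$count entries"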

NOTE: The two most important things to remember about doing this are that:

1. The space between the first < and second < is mandatory! Although, it should be noted that, between the two <'s, you can have as many spaces as you want. You can even use a tab between the two <'s, they just can't be directly connected.

2. The command, from which you want to use output as fodder for the while loop, needs to be run in a subshell (generally placed between parentheses, just like the ones surrounding this sentence) and the left parenthesis must immediately follow the second <, with "no" space in between!

We've already looked at what happens if you ignore rule number 1 and use << instead of < <. If you ignore rule number 2, you'll get:

./program: line 4: syntax error near unexpected token `<'
./program: line 4: `done < < (ls -1d *)'


And here's the "even better part" - In bash 3.x, you don't have to worry about all that spacing anymore, as they've added a new feature which does the same thing (or is it really just an old feature dressed up to make it seem fabulous? ;) In bash 3.x, you can use the triple-< operator. Actually, I believe the <<< syntax is referred to as a "here string," but that's purely academic. They could call it "fudge," as long as it works ;)

So, in bash 3.x, you could write a while loop that takes input from a command without using a pipe like so:

while read line
do
echo hi $line
done <<< "`ls -1d *`"


NOTE: The space between the <<< and your backticked (or otherwise expanded) command output is not necessary and you can have as much space as the shell can stand between those two parts of the "here string." Of course, the three <'s need to be all clumped together with no space in between them.
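
It's also worth noting that the here string isn't limited to backticked command output; any word or quoted string can be fed to STDIN this way, which makes for a handy one-shot read:

read first second <<< "hello world"    # first=hello, second=world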

I hope this has been helpful and/or enlightening for everyone out there, like me, who've been stumped by this issue for a while and always ended up doing some half-arsed workaround. It's a problem that's been bugging me forever. It turns out I was "this close" with bash 2.x, but I'm very happy to see that bash 3.x actually includes the functionality and makes finding it as simple as RTFMP ;)

Cheers,

, Mike


Thanks For This Comment From Richard Bos, which points out a flaw in this post that has been corrected per his remarks:

...
So your example should actually read:

while read line
do
echo hi $line
done <<< "`ls -1d *`"


One thing though I use: done <<< "$(ls -1d *)"
This construct is also used on this example page http://tldp.org/LDP/abs/html/x16712.html




Thanks, also, for this comment from Douglas Huff, which helps to clarify the underbelly of the process:

A friend of mine pointed me to this article and the
previous one in the series that you wrote [on variable scoping]...

I had two comments on these articles but you seem to have
comments disabled, so I figured I'd email them to you.

First, calling it a "scoping" issue is a bit misleading.
While technically true, understanding the underlying
reasons why this doesn't work as "expected" is key to
understanding how you can work around it in POSIX sh or in
ksh without the zsh/bash syntactical sugar for doing so.

What's going on is that a process cannot modify the
environment of its parent.

When you do:

something | while read blah; do blah; done

What the shell is doing is first executing a subshell
(separate process) that runs the while with stdin
redirected to read from the unnamed pipe. Then in another
subshell it runs "something" with standard out redirected
to the unnamed pipe.

Knowing this it's quite easy to replicate the behaviour
from bash 2/3 and zsh in POSIX sh and ksh with a bit of
understanding of the underlying mechanics. The trick is to
keep the while inside of the original process (since it is
run by the interpreter and does not require a separate
process) and execute the other command in a subshell.
Which is exactly what the syntactical sugar does for you
behind the scenes in bash2&3/zsh.




Thanks, also, to Vincenzo Di Massa, for shedding even more light on the subject:

Hi,
the reason why there is the space between < and < in
done < <(ls *.txt)

is the following.

<(ls *.txt) gets expanded into a filename

for example try:
$ echo <(ls *.txt)


it will print something like /dev/fd/63

the meaning is that <(ls *.txt) gets replaced by the filename
of a special file attached to the output of the ls command.

thus < <( ls *.txt) gets replaced by
< /dev/fd/63
and thus the standard input redirection takes place.

Best Regards Vincenzo

Monday, June 9, 2008

Finding The Number Of Open File Descriptors Per Process On Linux And Unix

Hey There,

Today, we're going to take a look at a fairly simple process (no pun intended), but one that (perhaps) doesn't come up often enough in our workaday environments for the answer to come to mind as readily as it should. How does one find the number of open file descriptors being used by any given process?

The question is a bit of a trick, in and of itself, since some folks define "open file descriptors" as the number of files any given process has open at any given time. For our purposes, we'll be very strict, and make the (usually fairly large) distinction between "files open" and "open file descriptors."

Generally, the two easiest ways to find out how many "open files" a process has, at any given point in time, are to use the same utilities you'd use to find a process that's using a network port. On most Linux flavours, you can do this easily with lsof, and on most Unix flavours you can find it with a proc command, such as pfiles for Solaris.
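
On Solaris, for example, that's as simple as pointing pfiles at a process ID (2034 here, matching the dummied-up PID we'll pick on below):

host # pfiles 2034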

This is where the difference in definitions makes a huge difference in outcome. Both pfiles and lsof report information on "open files," rather than exclusively on "open file descriptors." So, if, for instance, we were running lsof on Linux against a simple shell process we might see output like this (all output dummied-up to a certain degree, to protect the innocent ;)

host # lsof -p 2034
CMD PID USER FD TYPE DEVICE SIZE NODE NAME
process 2034 user1 cwd DIR 3,5 4096 49430 /tmp/r (deleted)
process 2034 user1 rtd DIR 3,7 1024 2 /
process 2034 user1 txt REG 3,5 201840 49439 /tmp/r/process (deleted)
process 2034 user1 mem REG 3,7 340771 40255 /lib/ld-2.1.3.so
process 2034 user1 mem REG 3,7 4101836 40258 /lib/libc-2.1.3.so
process 2034 user1 0u CHR 136,9 29484 /dev/pts/9
process 2034 user1 1u CHR 136,9 29484 /dev/pts/9
process 2034 user1 2u CHR 136,9 29484 /dev/pts/9
process 2034 user1 4r CHR 5,0 29477 /dev/tty


However, if we check this same output by interrogating the /proc filesystem, we get much different results:

host # ls -l /proc/2034/fd/
total 0
lrwx------ 1 user1 user1 64 Jul 30 15:16 0 -> /dev/pts/9
lrwx------ 1 user1 user1 64 Jul 30 15:16 1 -> /dev/pts/9
lrwx------ 1 user1 user1 64 Jul 30 15:16 2 -> /dev/pts/9
lrwx------ 1 user1 user1 64 Jul 30 15:16 4 -> /dev/tty


So, we see that, although this one particular process has more than 4 "open files," it actually only has 4 "open file descriptors."

An easy way to iterate through each process's open file descriptors is to just run a simple shell loop, substituting your particular version of ps's arguments, like:

host # for x in `ps -ef| awk '{ print $2 }'`;do ls /proc/$x/fd;done

If you're only interested in the number of open file descriptors per process, you can shorten that output up even more:

host # for x in `ps -ef| awk '{ print $2 }'`;do ls /proc/$x/fd|wc -l;done
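
One caveat with those quick loops: ps's header line will hand awk the literal word "PID", and processes can exit between the ps and the ls. A slightly sturdier sketch of the same idea, skipping the header and labeling each count:

host # for x in `ps -ef|awk 'NR > 1 { print $2 }'`;do echo "$x: `ls /proc/$x/fd 2>/dev/null|wc -l`";done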

Here's to being able to over-answer that seemingly simple question in the future ;)

, Mike

Friday, June 6, 2008

Piped Variable Scoping In The Linux Or Unix Shell

Hey There,

Today we're going to look at variable scoping within a piped-while-loop in a few Linux and Unix shells. We're actually almost at this point in our series of ongoing posts regarding bash, Perl and awk porting.

Probably the most interesting thing about zsh (and shells that share its characteristic, in this sense) is that the scope of variables passed through a pipe is slightly different than in other shells, like bash and ksh (note that not all vendors' versions are equal, even if they have the same name! For instance, HP-UX's sh is the POSIX shell, while Solaris' is not). I'm taking care to separate the piping construct from the "is the while loop running in a subshell?" argument, as I don't want to get too far off course. And, given this material, that can happen pretty fast.

For a very simple demonstration of whether the scoping issue is a "problem" (defining problem as either a bug or a feature ;) with the while-loop or pipes, we'll look at a very simple "scriptlet" that sticks to using a while-loop, without any piping, like this:

while true
do
bob=joe
echo " $bob inside the while"
break
done
echo " $bob outside the while"


And we can see, easily, that the value of the "bob" variable stays the same, even after the while loop breaks, for all 3 of our test shells. If the while loop, alone, were the issue, bob shouldn't be defined when the while loop breaks:

host # zsh ./test.sh
joe inside the while
joe outside the while
host # ksh ./test.sh
joe inside the while
joe outside the while
host # bash ./test.sh
joe inside the while
joe outside the while



If we change this scriptlet slightly to make it "pipe" an echo to the while-loop, the behaviour changes dramatically:

echo a|while read a
do
bob=joe
echo " $bob inside the while"
break
done
echo " $bob outside the while"


Now, if we use zsh, the value assigned to the "bob" variable inside our while loop (which has been created on the other side of the pipe) actually maintains its state when coming out of the loop, like this:

host # zsh ./test.sh
joe inside the while
joe outside the while


On most other shells, because of variable scope issues with the pipe, an empty value of the "bob" variable is printed after they break out of the while loop, even though it does get correctly defined within the while loop. This is because (and here's where the technicality, and subtle differences between myriad shells, usually becomes a hotbed of raging debate ;) after the pipe, the while loop itself (as the receiving end of the pipeline) runs in a subshell, like so:

host # bash ./test.sh
joe inside the while
outside the while
host # ksh ./test.sh
joe inside the while
outside the while


Notice, again, that the "echo $bob outside the while" statement in these two executions prints an empty value outside the while loop, even though bob is set within the while loop.

For most shells, this is easy to get around in one aspect. The main problem stems from the fact that the value is being piped to the while loop; it's not a direct fault of the while loop itself. Therefore, a fix like the following should work, and does. Unfortunately, with the command-pipe (such as an echo statement), you won't be able to use a while-loop in many cases, and would have to substitute a for-loop, like so (in most shells, redirecting at the end of a while loop with << will either result in an error or clip the script at that line):

for x in 1
do
bob=joe
echo " $bob inside the while"
done
echo " $bob outside the while"


host # zsh ./test.sh
joe inside the while
joe outside the while
host # ksh ./test.sh
joe inside the while
joe outside the while
host # bash ./test.sh
joe inside the while
joe outside the while


This gets worse (usually hangs) if you try to get around the pipe by doing some inline subshelling with backticks, like:

while read `echo 1`

However, the following solution (awkward though it may be) does actually do the trick (substitute any other fancy i/o redirection you want, as long as you "avoid the pipe"):

exec 7<>/tmp/bob
echo "a" >&7
# re-open fd 7 read-only so the read starts back at the top of the file
# (without this, the shared file offset is already past what we just wrote)
exec 7</tmp/bob
while read -r line <&7
do
bob=joe
echo " $bob inside the while"
done
echo " $bob outside the while"
exec 7<&-
rm /tmp/bob

host # zsh ./test.sh
joe inside the while
joe outside the while
host # ksh ./test.sh
joe inside the while
joe outside the while
host # bash ./test.sh
joe inside the while
joe outside the while


For more examples of input/output redirection, check out our older post on bash networking using file descriptors.

Now, when it comes to reading in files, the case is a bit easier to remedy. If you're in the habit of doing:

cat SOMEFILE |while read x
...


You'll run into the same scoping problem. This is also easily fixed by using i/o redirection, which would change our script to this:

while read x
do
bob=joe
echo " $bob inside the while"
done < SOMEFILE
echo " $bob outside the while"


Assuming the file SOMEFILE had one line of content, you'd get the same results as we got above with the for loop.

And that's about all there is to that (minus the highly-probable ensuing arguments ;). There are, I'm sure, a couple more ways to do this, but using the methods that fixed the "problem" of variable scope in bash and ksh is probably better practice, since zsh (and shells that share its distinction in this case) is a rare exception to the rule (even though zsh may very well be doing things the "proper" way) and the bash/ksh fix works in zsh, while the opposite is not true.

At long last, good evening :)

, Mike


Thanks for this comment from Douglas Huff, which helps to clarify the underbelly of the process:

A friend of mine pointed me to this article and the
previous one in the series that you wrote [on variable scoping]...

I had two comments on these articles but you seem to have
comments disabled, so I figured I'd email them to you.

First, calling it a "scoping" issue is a bit misleading.
While technically true, understanding the underlying
reasons why this doesn't work as "expected" is key to
understanding how you can work around it in POSIX sh or in
ksh without the zsh/bash syntactical sugar for doing so.

What's going on is that a process cannot modify the
environment of its parent.

When you do:

something | while read blah; do blah; done

What the shell is doing is first executing a subshell
(separate process) that runs the while with stdin
redirected to read from the unnamed pipe. Then in another
subshell it runs "something" with standard out redirected
to the unnamed pipe.

Knowing this it's quite easy to replicate the behaviour
from bash 2/3 and zsh in POSIX sh and ksh with a bit of
understanding of the underlying mechanics. The trick is to
keep the while inside of the original process (since it is
run by the interpreter and does not require a separate
process) and execute the other command in a subshell.
Which is exactly what the syntactical sugar does for you
behind the scenes in bash2&3/zsh.

Saturday, December 15, 2007

A Simple Trick To Keep Your SSH Session From Timing Out

Hey there,

I thought I'd write a simple something this Saturday; just a little trick that's served me well over the years.

This has to do with any login session in a terminal window (Telnet, SSH, etc). Although this may not always be the case, most places I've ever worked at have had, at least, a few machines where they enforced time-outs on your login. This can be frustrating if you're in the middle of doing something, get called away to do something else, and come back to a "Disconnected!" dialogue box.

Usually, you should be able to get away with just putting a line like this in your .profile or .bash_profile:

TMOUT=0;export TMOUT

But, I've found that a lot of setups don't honor this shell setting. Instead, they take a measure of your activity and log you out if you don't produce enough in your session. So, when you type:

sqlplus @database_query

Even if that takes all day to run, you'll still get disconnected in 5 minutes.

This simple shell loop has almost always worked for me:

while :;do print -n "* ";sleep 15;done

Substitute either
echo -n "* "
or
echo "* \c" <--- \c suppresses the trailing newline; it is not a space.

for the print statement in that little one-line loop, depending on your shell (sh doesn't support the "print" statement) and its implementation of echo (you might be using the built-in or the system binary). In any event, one of these three varieties should do the trick for you.
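
If you'd rather sidestep the print/echo differences entirely, printf is another option that behaves the same way in just about every modern shell (assuming yours has it, which almost all do these days):

while :;do printf "* ";sleep 15;done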

I leave the carriage-return out on purpose so I can come back later and look at:

*******_

instead of pages of single *'s along the side of my screen, forcing me to scroll back forever to remember what I was doing ;)

This trick basically works because, even though you're not directly interacting with your terminal, you're sending packets back and forth every time that print (or echo) statement executes.

Enjoy a little less stress. Your terminal windows should now be waiting for you instead of threatening to leave :)

, Mike