Friday, December 7, 2007

The Shebang Line: An Introduction to Porting Shell to Perl

This is actually going to be, over time, a series of posts, because it would be impossible (especially with my writing style) to cover every aspect of porting shell script to Perl in one posting. There may be some unusual or vague aspects that won't be covered at all, but by that point you will understand enough to find the answers through trial and error, or judicious use of various search engines ;)

In this post we're going to cover one of the very basics. By "very basic," I mean a brief discourse on the "shebang" line. There's so much to mine in that seemingly obvious topic that it'll take far too long for me to crank out anything more today. In a future post, we'll follow up with how to properly format a standard line of code, and with the three common variable types and how they are created and accessed in either language.

The beginning is complicated enough, when you really look at it. The "shebang" line is called that because the exclamation point is referred to as a "bang," for reasons which I don't completely understand, and the pound symbol is... I honestly can't come up with any intelligent reason for the other part of the name ;) I still call it the "pound bang" line.

In shell and in Perl, the "shebang" line serves the same purpose. In fact, the "shebang" line is an entity unto itself. Its use extends beyond shell and Perl scripting, and it's a construct used for many purposes. The main thing the "shebang" line does is name the "command" your script should be run with.

Strictly speaking, it isn't the shell you're logged into (csh, tcsh, ksh, sh, zsh or whatever shell you like to work in) that reads it. When you execute the script directly, your shell asks the kernel to run the file. The kernel sees the "#!" and starts the program named after it, handing it your script.

It is customary to make this the full path to the shell in which you are writing your script, or the full path to the Perl binary, but it doesn't have to be. Write the path right after the "#!", as in "#!/bin/sh". Most systems will skip a blank between the two for you, but leaving it out is the traditional, safe form. For instance, our two basic examples are these:

#!/bin/sh <--- Tells the system to run the script with "/bin/sh" as its interpreter, in a new process, which then runs all the commands in the script.
#!/usr/bin/perl <--- Tells the system to run the script with "/usr/bin/perl" as its interpreter, in a new process, which then runs all the commands in the script.
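If you'd like to check this for yourself, here's a small sketch. It assumes a POSIX-style system with /bin/sh and mktemp, and the file name hello.sh is made up for the demo. The sketch writes a two-line script with a "#!/bin/sh" line and runs it two ways. First it runs the script directly, so the system reads the "shebang" and starts /bin/sh for you. Then it names /bin/sh itself, in which case the first line is just a comment.

```shell
#!/bin/sh
# Minimal sketch: running a script directly vs. handing it to /bin/sh yourself.
# Assumes /bin/sh and mktemp exist; "hello.sh" is a made-up throwaway name.

shebang_demo() (
    # Subshell body: the cd and variables don't leak into the caller.
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1

    # A two-line script whose "shebang" names /bin/sh as its interpreter.
    printf '%s\n' '#!/bin/sh' 'echo "hello from $0"' > hello.sh
    chmod 700 hello.sh

    ./hello.sh            # the system reads "#!/bin/sh" and starts /bin/sh for us
    /bin/sh ./hello.sh    # the same thing by hand; line 1 is now just a comment

    cd / && rm -rf "$dir" # clean up the throwaway directory
)

shebang_demo
```

Both runs should print the same line, "hello from ./hello.sh". That's because in both cases /bin/sh ends up running the script with ./hello.sh as its name.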

This doesn't mean you can only use one shell or even one version of Perl within the same script, but that's outside the scope of today's post. When I say the "shebang" line names a specific "command" to run, I mean just that: any command. If you wrote a script, in the editor of your choice, and made the first line:

#!/bin/rm

the system would run "/bin/rm" with the path to your script added as its argument, and the script would delete itself. That's all that would happen.

An interpreter like /bin/sh or /usr/bin/perl goes on to read your script and run its commands (in both shell and Perl, left to right, from top to bottom) until the script ends. rm never reads the file at all. It just removes the file it was given and exits, so none of the commands after the first line ever run.

If this seems counterintuitive, which it did to me at one point, that's normal. This is a good example of what goes on "under the covers" when you execute a script in your Unix or Linux shell. When your script with "#!/bin/rm" at the top is executed, the following happens:

1. Your shell asks the kernel to execute the script (./MyScript.sh, say).
2. The kernel reads the first line ("shebang") and finds "/bin/rm" named after the "#!".
3. The kernel runs "/bin/rm ./MyScript.sh", that is, the "shebang" command with the script's path added as an argument.
4. rm deletes the script and exits.
5. That's it. Nothing after the first line is ever executed, because rm never treats the file as a list of commands, and your script is simply gone.

For example, let's look at the following script:


#!/bin/rm
# /export/home/user/MyScript.sh (c) 2007 MT - xyz.com
#
# Lines starting with a pound sign are comments in both shell and Perl.
# You can include interesting notes or whatever you like behind
# the pound sign and the script won't execute it. Even the "bang"
# ! has no special significance after the first line of the script;
# only the very first line can be a "shebang."
# Blank lines are also not acted upon in either shell or Perl.

sleep 500000000000000000


Now, suppose we make that script executable (a chmod to 700, so that we, the user, have full read, write and execute permissions on it) and run it at the command line. We get our prompt right back. The sleep command never runs, and the script is gone.

This is an example of what that terminal session would look like:

user# cd /export/home/user
user# ls -a
. .. .profile MyScript.sh
user# chmod 700 MyScript.sh
user# ./MyScript.sh
user# ls -a
. .. .profile
user#


And ps confirms that nothing is left running:

user# ps -ef | grep "[M]yScript.sh"
user#


The "shebang" only comes into play when the script itself is executed, though. If you hand the script to a shell yourself, as in "/bin/bash ./MyScript.sh", bash reads "#!/bin/rm" as an ordinary comment and runs the sleep command instead. Then we really would end up sitting at a hung prompt, waiting a very long time for the sleep command to finish. If we logged in to the same machine again, on a different terminal, we'd find the script still right where we left it:

user# cd /export/home/user
user# ls -a
. .. .profile MyScript.sh
user#
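If you'd like to see exactly what the "shebang" command is handed, without risking anything, one harmless trick is to put /bin/echo where /bin/rm was. echo just prints the arguments it receives. This is a sketch assuming /bin/echo and mktemp exist, and the file name echo-demo.sh is made up for the example.

```shell
#!/bin/sh
# Minimal sketch: put the harmless /bin/echo where /bin/rm was, to see exactly
# what the "shebang" command is handed. "echo-demo.sh" is a made-up name.

echo_demo() (
    dir=$(mktemp -d) || exit 1   # throwaway directory to play in
    cd "$dir" || exit 1

    # Same shape as MyScript.sh, but with /bin/echo as the "shebang" command.
    printf '%s\n' '#!/bin/echo' 'echo "you should never see this line"' > echo-demo.sh
    chmod 700 echo-demo.sh

    # The system runs "/bin/echo ./echo-demo.sh": echo prints its argument
    # and exits, and the second line of the file is never run.
    ./echo-demo.sh

    cd / && rm -rf "$dir"        # clean up
)

echo_demo
```

Run it and you should see only "./echo-demo.sh" printed. That's the script's own path, which is what rm would be handed when MyScript.sh is run directly.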


And that explains it. You can use the "shebang" line to run any command you want. Try it out, but be careful, because you can do some incredibly awful things this way. Don't count on the system to protect you from a line like this:

#!/bin/rm -rf /export/home/user

How the words after the command get passed along depends on the system, not on your shell. Linux hands everything after "/bin/rm" to rm as one single argument, so rm rejects "-rf /export/home/user" as an invalid option and deletes nothing. Other systems split that text up differently, and on some of them the line would do exactly what it says. That's why it's safest to stick to at most one switch after the command, the way Perl users write "#!/usr/bin/perl -w".

If you quote the entire command to try and get around it, the system takes everything up to the first blank, quote mark and all, as the name of the program to run. It can't find that program, and your shell shoots you an error like:

ksh: ./MyScript.sh: not found
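Here's a sketch of that quoting mistake, with the harmless /bin/echo standing in for rm. It assumes a Linux-like system. The exact error text and exit status depend on which shell runs it, so the sketch only checks that the run fails and the file is left alone. The file name quoted-demo.sh is made up for the example.

```shell
#!/bin/sh
# Minimal sketch: quoting the whole "shebang" command makes it unfindable.
# Uses /bin/echo instead of rm to stay harmless; "quoted-demo.sh" is made up.

quoted_demo() (
    dir=$(mktemp -d) || exit 1   # throwaway directory to play in
    cd "$dir" || exit 1

    # The program name becomes everything up to the first blank, quote mark
    # and all: a relative path starting with a quote, which doesn't exist.
    printf '%s\n' '#!"/bin/echo hello"' 'echo "never reached"' > quoted-demo.sh
    chmod 700 quoted-demo.sh

    status=0
    ./quoted-demo.sh || status=$?   # fails; the error text depends on your shell

    # The run failed, and the script is still sitting there untouched.
    [ -f quoted-demo.sh ] && echo "still here; exit status was $status"

    cd / && rm -rf "$dir"        # clean up
    [ "$status" -ne 0 ]          # the function succeeds only if the run failed
)

quoted_demo
```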

Hopefully, this has been a somewhat informative and easy-to-understand introduction to the relationship between the shell and Perl. I understand that the bedrock we've discussed today isn't really specific to either of them, although it applies to both. But, as an old saying goes: It's usually best to start with the first floor (or the basement) when you're putting up a building ;)

Best Wishes,

Mike



Thanks to Michael Pelletier of Merrimack, NH, for this perfect explanation of what that pound symbol means!


It's a "sharp" symbol in music notation:

http://en.wikipedia.org/wiki/Sharp_(music)

That's why the "C#" language is pronounced "C sharp," not "C plus plus plus plus:"

http://en.wikipedia.org/wiki/C_Sharp_(programming_language)

Enjoy this little tidbit of mostly useless knowledge. I pronounced it "pound bang" for maybe 20 years.