Friday, May 16, 2008

Porting Associative Arrays Between Perl, Shell and Awk

Hey There,

And, for the last time (this week) we're back to porting :) One thing you might have noticed, in the title of today's post, is that the C programming language has been dropped. We received some feedback, and also generally agree, that throwing C in with Perl, Shell and Awk creates a layer of confusion that distracts from the point of these posts. Which is, of course, to demonstrate how simply one can use concepts, and apply them easily, with a little translation, between three very useful Unix and Linux programming (or scripting) languages.

We'll leave the original posts alone, so if you go back to revisit our original post regarding simple string variables and its follow-up that dealt with simple arrays, please just ignore all that stuff about C. If there's sufficient interest at a later time, we'll write up a series of posts on porting between (for instance) Perl and C and try to contain the amount of collateral damage ;)

Today, we're going to look at associative arrays (which are generally referred to as "Hashes" in Perl and as "lookup tables" from time to time) and how we can define them, add values to them, and extract values from them in Perl, bash and awk.

The main difference between a "regular" array and an associative array is, at its most basic, found in the way that the array is indexed and the way in which you can add values to, and extract values from, the array. This is why I noted that, in bash, the only type of array available (at this point) is the "regular" or "one dimensional" array, which we looked at in our last post on array porting. (Please note that I'm stuck using version "2.05b.0(1)" and am unaware of any newer release (up to bash 3.2) which directly supports this functionality. I "am" looking forward to it, though :)

The good news is that there is a hack to get around this, and emulate associative arrays, if you want to drive yourself crazy. We will definitely follow up this post with a post regarding how to do that (it'll be a post in and of itself, so we'll not belabour that point any longer, here). Also, in awk, there's, technically, no such thing as a "regular array." If you dump an Awk array, there's no guarantee that the contents will come out in the order you expect (which is also true of the Perl hash) and you can also use keys with them, instead of being limited to indexes. This will begin to make sense, if it doesn't already, as we go along. Terminology being what it is, I may have upset a lot of people already ;)

And, now that I've dug myself a deep enough hole, let's get started with associative arrays ;)

The associative array is sometimes defined as a "lookup table" for good reason. Compared with "associative array" and "hash," it's a name that's more descriptive of how the associative array works. Basically, on the left hand side of any assignment (the variable before the value) you have the name of the associative array along with a "key," like: $bob{"key"} - On the right hand side of the equation you have a simple value. You can see how the term "lookup table" makes the most sense as, in this case, if you wanted to look up the value of "key" you could find it in the associative array named "bob" simply by naming the key. The associative array can also be thought of as having "keys" and "values," rather than "indexes" and "values," like regular, one dimensional, arrays have.

Since Bash only supports one-dimensional arrays up to (at least) version 2.05b.0(1), none of today's lesson applies. We will, however, definitely follow up with a post dedicated to "faking" associative arrays with bash.

1. Defining, Initializing or Declaring an associative array. As was the case with simple variables and arrays, no explicit declaration of an associative array is absolutely necessary in either of our (now) two languages:

Ex: We need to define an associative array called MySimpleHash. Even though it's unnecessary, we could do so like this:

In Perl: Just type "my %MySimpleHash;" - The % symbol indicates an associative array (or hash) in Perl, as opposed to the $ sign, which indicates a scalar (or simple, or string) variable and the @ symbol which indicates an array. Perl hashes can also be created by simply defining their elements on-the-fly.

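In Awk: There's nothing to type. Awk has no declaration statement for arrays; an associative array comes into existence the first time you reference one of its elements. If you really want an empty array to exist up front, one common idiom is "split("", MySimpleHash)" which leaves MySimpleHash as an array with nothing in it. Again, every array in Awk is an associative array, so the "regular" arrays we looked at previously work exactly the same way.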

2. Assigning values to the associative array. This is very straightforward in both of our two remaining languages:

Ex: We want to assign the keys "MySimpleKey0," "MySimpleKey1" and "MySimpleKey2" (with values, in order, of "MySimpleValue0", "MySimpleValue1," and "MySimpleValue2") to the associative array named MySimpleHash (Note that any values that contain spaces should be quoted - it's actually good practice to quote any string that is being used as a key or a value in an associative array. This is generally not necessary for integer values).

In Perl: Just type "$MySimpleHash{"MySimpleKey0"} = "MySimpleValue0"; $MySimpleHash{"MySimpleKey1"} = "MySimpleValue1"; $MySimpleHash{"MySimpleKey2"} = "MySimpleValue2";" - Spaces between the variable, "=" sign and value are optional. Perl will let you get away with leaving the quotes off of simple one-word values, as long as you're not running with "use strict," but quoting them is the safer habit. Note that we have to use the $ symbol when referring to an element of a Perl hash, while we use the % symbol to refer to the entire hash (or associative array. Same thing. But you know that by now ;) Also, last thing, notice that the square brackets, used to refer to regular array indexes, are replaced by curly brackets.

In Awk: Just type "MySimpleHash["MySimpleKey0"] = "MySimpleValue0"; MySimpleHash["MySimpleKey1"] = "MySimpleValue1"; MySimpleHash["MySimpleKey2"] = "MySimpleValue2";" - Spaces between the variable, "=" sign and value are not, technically, necessary, but recommended. Also, note that all of the "MySimpleKey" keys, and "MySimpleValue" values, are placed within double quotes in the assignment. In awk, this is always necessary for string values (an unquoted word is treated as a variable name, and an unset variable is empty), but it's not necessary for numeric values.

3. Extracting the value from your simple associative array. This is no longer "trivial," but still not too terribly difficult to do. Let's look some stuff up in these tables :)

Ex: We want to print the value of all of the keys of the MySimpleHash associative array. This is also fairly simple in our dwindling nation of two languages ;) Note that we will be iterating over the array key/value pairs in order to print them all out. This is where you'll notice one very specific characteristic of the associative array (or hash). The associative array does its own indexing based on an internal algorithm and may not spit out the values in the order in which you entered them when you dump the contents. This is done, by the programming language, for maximum efficiency with regard to information storage and retrieval (with the assumption that you'll be looking for a particular "key"'s value, I suppose).

In Perl: Just type "while (($Key, $Value) = each %MySimpleHash) { print "$Key equals $MySimpleHash{$Key}\n";}"
- Note that the $ character needs to precede the variable name when you want to retrieve any of an associative array's key or value elements. Also, when iterating over the hash, you need to refer to the hash directly with the % prefix (printing %MySimpleHash would print out all of that hash's elements - generally all squished together with no separating space) - The \n, indicating a carriage-return, line-feed or new-line isn't necessary, but is nice if you don't want your output on the same line as your next command prompt:

host # perl -e '$MySimpleHash{"MySimpleKey0"} = "MySimpleValue0"; $MySimpleHash{"MySimpleKey1"} = "MySimpleValue1"; $MySimpleHash{"MySimpleKey2"} = "MySimpleValue2";while( ($key, $value) = each %MySimpleHash ) { print "SimpleKey: $key, SimpleValue: $value.\n";}'
SimpleKey: MySimpleKey0, SimpleValue: MySimpleValue0.
SimpleKey: MySimpleKey2, SimpleValue: MySimpleValue2.
SimpleKey: MySimpleKey1, SimpleValue: MySimpleValue1.


For a goof, let's print out the entire hash at once, so you can see how the indexing is done automatically by Perl (i.e. it might not come out in the same order that the data was logically entered by us) :

host # perl -e '$MySimpleHash{"MySimpleKey0"} = "MySimpleValue0"; $MySimpleHash{"MySimpleKey1"} = "MySimpleValue1"; $MySimpleHash{"MySimpleKey2"} = "MySimpleValue2";print %MySimpleHash;print "\n";'
MySimpleKey0MySimpleValue0MySimpleKey2MySimpleValue2MySimpleKey1MySimpleValue1


And, if you can see in the crunch there, it has indeed ordered our key/value pairs as 0, 2, 1, instead of the 0, 1, 2 that you would expect from a regular array!

In Awk: Just type "for (x in MySimpleHash) { print x "=" MySimpleHash[x]; }" - Note that the $ or % symbol "must not" precede the variable name, or key name, when you want to get the value. Note that, just like regular arrays, awk associative arrays need to be iterated over to be entirely printed out. Again, here, we should see how awk has decided to index our key/value pairs, which might be different from the order in which we entered them. You'll generally see this more if you mix numeric and alpha keys in your awk associative array. Here, we seem to get lucky :)

host # echo |awk '{MySimpleHash["MySimpleKey0"] = "MySimpleValue0"; MySimpleHash["MySimpleKey1"] = "MySimpleValue1"; MySimpleHash["MySimpleKey2"] = "MySimpleValue2";for (x in MySimpleHash) { print x "=" MySimpleHash[x]; }}'
MySimpleKey0=MySimpleValue0
MySimpleKey1=MySimpleValue1
MySimpleKey2=MySimpleValue2
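
Since we can't count on getting lucky every time, here's one hedged way to get a predictable order out of awk: print the key/value pairs in whatever order awk likes and let the standard "sort" command put the lines in order afterward. This is only a sketch; the shell variable name is an assumption for this example:

```shell
# Print every key=value pair from awk, then sort the lines in the shell,
# so the final order does not depend on how awk walks the array.
awk_sorted=$(awk 'BEGIN {
    MySimpleHash["MySimpleKey2"] = "MySimpleValue2"
    MySimpleHash["MySimpleKey0"] = "MySimpleValue0"
    MySimpleHash["MySimpleKey1"] = "MySimpleValue1"
    for (x in MySimpleHash) { print x "=" MySimpleHash[x] }
}' | sort)

echo "$awk_sorted"
```

The keys are deliberately entered out of order here, to show that the sorting is being done by "sort" and not by awk.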


An interesting side fact about awk associative arrays is that, even if they come out in the order you expect, they aren't stored by position. For instance, when you enter the first, second and third values, you might expect them to occupy indexes 0, 1 and 2, as they would in a regular array. In an awk associative array, each value is stored under its own key and nothing else; whatever slots awk uses internally aren't visible to you as numeric indexes. Let's see what happens if we try to pull keys 0, 1 and 2 out of MySimpleHash:

host # echo |awk '{MySimpleHash["MySimpleKey0"] = "MySimpleValue0"; MySimpleHash["MySimpleKey1"] = "MySimpleValue1"; MySimpleHash["MySimpleKey2"] = "MySimpleValue2";print MySimpleHash[0] MySimpleHash[1] MySimpleHash[2];}'

host #


Nothing. Goose eggs; as expected. Since awk stores each value under the key you gave it, simple numeric indexing won't work to retrieve the value for, say, "MySimpleKey2." If you want numeric indexing, you just need to make sure that all of your indexes are numbers (which is how we emulated a regular array in our previous post on working with regular arrays). Note, also, that awk converts every index to a string behind the scenes, so MySimpleHash[1] and MySimpleHash["1"] point at the same element. Enclosing a number in "double quotes" only makes a difference when you need to keep its exact spelling, like "01" (which, unquoted, would be treated as plain old 1).
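
Here's a short sketch of how awk treats numeric and quoted indexes, using awk's "in" operator to test for a key. The value "one" and the shell variable name are placeholders for this example. One thing worth knowing: merely referencing an element that doesn't exist, like MySimpleHash[0] above, quietly creates it with an empty value, while the "in" test does not:

```shell
key_checks=$(awk 'BEGIN {
    MySimpleHash["MySimpleKey2"] = "MySimpleValue2"
    MySimpleHash[1] = "one"

    # Test for a key without creating it as a side effect.
    if (!(0 in MySimpleHash)) print "no key 0"

    # 1 and "1" are the same index once awk converts the number to a string.
    print MySimpleHash["1"]

    # "01" is a different string, so it is a different key.
    if (!("01" in MySimpleHash)) print "no key 01"
}')

echo "$key_checks"
```

So the quotes don't change how a plain number like 1 is stored; they only matter when the text of the key isn't what awk would produce from the number on its own.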

Sadly, even though this article ran a bit long, there are still a lot of "little things" about associative arrays (or hashes, lookup tables, or whatever you prefer to call them) that can be explored. Hopefully, you're encouraged to get down to the nitty-gritty and rise above the attempted basic-ness of this post (Now I'm starting to make up words ;)

In our next addition to this threaded series of posts, we'll start looking at programming constructs (loops, conditionals, etc) and explore how we can use the basic variable knowledge we've acquired so far to get some real work done now (after all, that's less work for us to do later ;)

Cheers,

, Mike