Wednesday, May 14, 2008

Working With Arrays - Porting Between Linux Or Unix Using Bash, Perl, C and Awk

Greetings,

Back to porting some more :) Building on our earlier posts, which started with the shebang line and moved on, somewhat logically, to working with simple variables, today we're going to take the next step: defining, populating and extracting the values from simple array variables.

The array variable is another basic building block of most shell scripts or code. Put simply, an array is just a collection of the simple variables we looked at in our last post on porting.

The technical definitions, especially as they apply to each of our four languages (bash, Perl, awk and C) are beyond the scope of this series of posts (for now). The details are important, but they're useless if you don't know the basics. To put it in another light, when you learn a foreign language (German, for example), it's more important that you understand basic concepts of the language, and how to use those simple phrases, than it is for you to be able to break down the subtle differences between the gender and tense of each word within the context of a sentence. The folks in Germany will know you need to go to the bathroom no matter how poor your grammar is, as long as you can spit out a few key words that indicate your need to find a restroom immediately ;)

So, let's get started with arrays:

Arrays, as we mentioned, are simply collections of simple variables. For instance, taking from our previous example, if we have a variable x that has a value of y (x=y), then we have a simple variable. Arrays provide a way to group a number of values under one variable name: the array name. Each value gets a numbered slot, so a single array could hold y in slot 0, b in slot 1, d in slot 2 and f in slot 3 (collections of name/value pairs like x=y, a=b, c=d and e=f are a slightly different animal, which we'll get to in a later post). This is more cryptic than the simple variable explanation but, even if it doesn't seem so right now, it's simple to understand once you get the hang of it.

1. Defining, Initializing or Declaring an array. As was the case with simple variables, no explicit declaration of a simple array is strictly necessary, except (of course) in C:

Ex: We need to define an array called MySimpleArray. This is trivial in all four languages:

In Bash: Just type "declare -a MySimpleArray" (again, you can also use "typeset -a"). This is not absolutely necessary, as you can create an array simply by defining a part of it (e.g. MySimpleArray[0]="bob" would create the MySimpleArray array with one value)
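To make the two Bash options above concrete, here is a minimal sketch. It assumes bash (not plain sh) and uses a second, hypothetical array name (AnotherArray) purely to show implicit creation; the ${#name[@]} expansion reports how many elements an array currently holds.

```shell
#!/usr/bin/env bash
# Explicit declaration: creates an empty indexed array.
declare -a MySimpleArray
echo "Elements after declare: ${#MySimpleArray[@]}"

# Implicit creation: assigning to one element of an undeclared name
# creates the array. AnotherArray is a hypothetical name for illustration.
AnotherArray[0]="bob"
echo "Elements after assignment: ${#AnotherArray[@]}"
```

Either route leaves you with a usable array; the explicit declare mostly serves to document your intent.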

In Perl: Just type "my @MySimpleArray;" - The @ sign indicates an array in Perl, as opposed to the $ sign, which indicates a scalar (or simple, or string) variable. Unless you're running under "use strict", even this declaration is optional, since Perl arrays can also be created by defining their elements.

In Awk: There's nothing to type. Awk has no declaration statement for arrays; an array comes into existence the first time you reference one of its elements (e.g. MySimpleArray[0] = "bob").

In C: You "need" to declare/initialize your array (and its size) before you can use it. As noted in our last post, a simple string variable, in C, is actually an array of the type "char."

So, just like when you declared the "simple variable" MySimpleVariable, you'll use the exact same syntax, since that was, technically, an array: "char *MySimpleArray;" (This, again, generally needs to be followed by an allocation of memory for the string, like "MySimpleArray = (char *)malloc(8*sizeof(char));" for an 8 character array, which has room for 7 characters plus the terminating null byte).

Also, in C, if you want to declare an integer array, you would do it in this fashion (although we're not going to drill too far into this since it pulls away from the commonality of all the other examples): "int MySimpleArray[8];" for an eight integer array.

2. Assigning values to the simple array. This is very straightforward in all of our four languages:

Ex: We want to assign the values "MySimpleValue0", "MySimpleValue1," and "MySimpleValue2" to the simple array named MySimpleArray (Note that any values that contain spaces should be quoted - it's actually good practice to quote any string that is being used as a value in an array. This is generally not necessary for integer values). Note that our instructions for creation here today are based on simplicity, and not efficiency. There are quicker ways to define arrays all at once (and print them all at once, when we extract the values from the array variables), but we'll leave that for another time. Also note that, in most languages, the first element of an array is numbered 0, rather than 1.

In Bash: Just type "MySimpleArray[0]=MySimpleValue0; MySimpleArray[1]=MySimpleValue1; MySimpleArray[2]=MySimpleValue2" - Spaces between the variable, "=" sign and value are not permitted.
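The Bash assignments above, together with the advice about quoting, can be sketched as follows. This assumes bash; the fourth element and its value are hypothetical additions that show why quoting matters once a value contains spaces.

```shell
#!/usr/bin/env bash
# One assignment per element, with no spaces around the "=" sign.
MySimpleArray[0]="MySimpleValue0"
MySimpleArray[1]="MySimpleValue1"
MySimpleArray[2]="MySimpleValue2"

# A value containing spaces must be quoted; unquoted, the shell would
# treat the words after the first one as a command to run.
# (This fourth element is hypothetical, added for illustration.)
MySimpleArray[3]="a value with spaces"

echo "Number of elements: ${#MySimpleArray[@]}"
echo "Last element: ${MySimpleArray[3]}"
```

Quoting every string value, even the ones that don't strictly need it, keeps the habit consistent and costs nothing.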

In Perl: Just type "$MySimpleArray[0] = "MySimpleValue0"; $MySimpleArray[1] = "MySimpleValue1"; $MySimpleArray[2] = "MySimpleValue2";" - Spaces between the variable, "=" sign and value are optional. String values should be quoted; unquoted barewords only work when strict mode is off. Note that we have to use the $ symbol when referring to an element of an array, while we use the @ symbol to refer to the entire array.

In Awk: Just type "MySimpleArray[0] = "MySimpleValue0"; MySimpleArray[1] = "MySimpleValue1"; MySimpleArray[2] = "MySimpleValue2"" - Spaces between the variable, "=" sign and value are not, technically, necessary, but recommended. Also, note that "MySimpleValue0," "MySimpleValue1," and "MySimpleValue2" are placed within double quotes in the assignment. This is always necessary for string values, since awk treats an unquoted word as a variable name (which will be empty if you've never set it). It is not necessary for numeric values.

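In C: Just type: "strcpy(MySimpleArray, "MySimpleValue");" (after including string.h). Note that "MySimpleValue" is 13 characters plus a terminating null byte, so the memory you allocated must be at least 14 bytes; the 8 character allocation shown above would be too small. Writing "MySimpleArray = "MySimpleValue";" also compiles, but it repoints your pointer at a read-only string literal rather than copying the value into your allocated memory. If your array is not a char array (as we're using in our example today), you do not need to use quotes. For an integer array, you can supply the values when you declare it, like this: "int MySimpleArray[] = {0,1,2};", or assign them one element at a time afterward (e.g. "MySimpleArray[0] = 0;"). <--- Again, apologies if these C integer array side notes are distracting. Just ignore them ;)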

3. Extracting the value from your simple array. It's time to collect :)

Ex: We want to print the value of the MySimpleArray elements. This is also fairly simple in all four languages:

In Bash: Just type "echo ${MySimpleArray[0]};echo ${MySimpleArray[1]};echo ${MySimpleArray[2]}" - Note that the $ character needs to precede the variable name when you want to get the value and that the {} braces around the array name and subscript (in [] brackets) are required. Printing ${MySimpleArray[@]} would print out all elements.

host # echo ${MySimpleArray[0]};echo ${MySimpleArray[1]};echo ${MySimpleArray[2]}
MySimpleValue0
MySimpleValue1
MySimpleValue2
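The points made about Bash extraction can be sketched a little further. This assumes bash; the loop over "${!MySimpleArray[@]}" (the list of indices) is an addition that goes slightly beyond the one-element-at-a-time approach above, and the middle line shows what happens when the braces are left out.

```shell
#!/usr/bin/env bash
MySimpleArray[0]="MySimpleValue0"
MySimpleArray[1]="MySimpleValue1"
MySimpleArray[2]="MySimpleValue2"

# All elements at once; quoting "${...[@]}" keeps each element a separate word.
echo "${MySimpleArray[@]}"

# Without the braces, bash expands $MySimpleArray (which is element 0)
# and then appends the literal text "[1]".
echo "$MySimpleArray[1]"

# Loop over the array's indices to print each element on its own line.
for i in "${!MySimpleArray[@]}"; do
    echo "$i: ${MySimpleArray[$i]}"
done
```

The second echo prints MySimpleValue0[1] rather than MySimpleValue1, which is the usual symptom of forgetting the braces.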


In Perl: Just type "print "$MySimpleArray[0] $MySimpleArray[1] $MySimpleArray[2]\n";" - Note that the $ character needs to precede the variable name when you want to get the individual value of an array element (printing "@MySimpleArray" inside double quotes would print out all elements, separated by spaces) - The \n, indicating a new-line, isn't necessary, but is nice if you don't want your output on the same line as your next command prompt:

host # perl -e '$MySimpleArray[0] = "MySimpleValue0"; $MySimpleArray[1] = "MySimpleValue1"; $MySimpleArray[2] = "MySimpleValue2"; print "$MySimpleArray[0] $MySimpleArray[1] $MySimpleArray[2]\n";'
MySimpleValue0 MySimpleValue1 MySimpleValue2


In Awk: Just type "print MySimpleArray[0],MySimpleArray[1],MySimpleArray[2]" - Note that the $ or @ symbol "must not" precede the variable name when you want to get the value. The comma in between the values tells awk to print its output field separator (a space, by default) between them for clarity's sake. Note that awk arrays need to be iterated over to be entirely printed out, and then extra care has to be taken if you want to get the variables out in the correct sequence (for another day) :

host # echo|awk '{MySimpleArray[0] = "MySimpleValue0"; MySimpleArray[1] = "MySimpleValue1"; MySimpleArray[2] = "MySimpleValue2";print MySimpleArray[0],MySimpleArray[1],MySimpleArray[2]}'
MySimpleValue0 MySimpleValue1 MySimpleValue2


In C: Just type "printf("%s\n", MySimpleArray);" to get the value for your character array. Note, again, that, for these posts, we're not going to get into the compilation part of creating a working C program:

host # ./c_program
MySimpleValue


And now we've got two out of the three "basics" covered. In our next post on this subject, we'll take a look at the third most common variable/value type: The hash or associative array (which, as chance would have it, is technically what all Awk arrays are) :)

Best Wishes,

, Mike