Monday, June 30, 2008

Simple Cumulative Math Using Awk On Unix or Linux

Hey There,

I'm off to start a new work assignment today (after sleeping in late for a week), so, as you're reading this, I may be trying not to fall asleep on the train or wishing I had slept more and/or was dead ;) In any event, I thought we'd start getting back to that interstitial series on porting between shell, awk and Perl on Linux or Unix this week. By way of getting back into it, this post will (at least a little bit) resemble the last post, which dealt with simple arithmetic.

One of the questions I see floating around the boards a lot has to do with using awk to calculate cumulative (or incremental) sums within a file. So, for a simple example, if a user had a file called "numbers" which contained the following:

host # cat numbers
a 4
b 2
c 8
d 57
e 12
f 8967
g 3
h 58
i 3


they'd want awk to read through that file and give the total of all the numbers in column 2. For kicks, they may also want to print each line in the file as it gets processed (at least until they're satisfied that it's doing what they want; after that, it just eats up screen space for no reason).

Our attached shell script (which only runs awk inside it and accepts a filename as an argument) does just that. Note that it doesn't do any error checking, and assumes you know that your input file should be of the form: one pair per line of "name" [space or tab] "number," like above. It can be run very simply from the command line, and should provide output, like so:

host # ./sum.sh numbers
a numvalue: 4
b numvalue: 2
c numvalue: 8
d numvalue: 57
e numvalue: 12
f numvalue: 8967
g numvalue: 3
h numvalue: 58
i numvalue: 3
Sum total: 9114


Of course, this script can also be written as a one-liner, as long as you give awk the input file it needs, like so:

host # awk '{print $1 " numvalue: " $2; finalsum += $2} END {print "Sum total: " finalsum}' numbers
a numvalue: 4
b numvalue: 2
c numvalue: 8
d numvalue: 57
e numvalue: 12
f numvalue: 8967
g numvalue: 3
h numvalue: 58
i numvalue: 3
Sum total: 9114


Basically, all we're doing is going through each line of the file and adding the numeric value on that line to the running total until the input file ends. Of course, you can modify this slightly to make it do any sort of arithmetic operation you require: multiplication, division, subtraction, etc. At least, as much as awk will allow ;)
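To make that a bit more concrete, here's a minimal sketch (not part of the original script) of two such variations: a true running total printed on every line, and an average, which throws a little division into the mix. The function names running_total and average are made up for this illustration, and both fall back to reading standard input if you don't hand them a file.

```shell
#!/bin/sh

# running_total: hypothetical helper. Prints each name/value pair along
# with the cumulative sum of column 2 up to and including that line.
running_total() {
    awk '{
        sum += $2
        print $1 " numvalue: " $2 " running total: " sum
    }
    END { print "Sum total: " sum }' "$@"
}

# average: hypothetical helper. Divides the column 2 total by the number
# of lines read, guarding against empty input (division by zero).
average() {
    awk '{ sum += $2; n++ }
    END { if (n > 0) printf "Average: %.2f\n", sum / n }' "$@"
}

# Small demonstration on the first three lines of the sample data.
running_total <<'EOF'
a 4
b 2
c 8
EOF
```

Run against the full "numbers" file from above, running_total numbers should end with the same "Sum total: 9114" line as sum.sh, and average numbers should print "Average: 1012.67".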

Hope you're enjoying this more than I'm probably enjoying my morning. And, in both cases, there's probably a better and more efficient way to do this ;) Here's to that!

Cheers,


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/bin/sh

#
# sum.sh
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

file=$1
# Print each name/value pair as it is read and add column 2 to the total.
awk '{
    print $1 " numvalue: " $2
    finalsum += $2
}
# Once the whole file has been read, print the grand total.
END { print "Sum total: " finalsum }' "$file"
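
Since sum.sh doesn't do any error checking, here's one hedged sketch of what a more defensive take might look like. This is not the original script: the function name sum_checked, the usage message and the decision to skip (rather than abort on) malformed lines are all assumptions made for illustration.

```shell
#!/bin/sh

# sum_checked: hypothetical, more defensive variant of sum.sh.
# Usage: sum_checked file
sum_checked() {
    # Require exactly one argument, and it must be a readable file.
    if [ "$#" -ne 1 ] || [ ! -r "$1" ]; then
        echo "Usage: sum_checked file" >&2
        return 1
    fi
    awk '
        # Ignore blank lines silently.
        NF == 0 { next }
        # Accept only "name number" lines where the number is a plain
        # integer or decimal, optionally negative.
        NF == 2 && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
            print $1 " numvalue: " $2
            finalsum += $2
            next
        }
        # Anything else is reported on stderr and left out of the total.
        { print "Skipping line " NR ": " $0 | "cat 1>&2" }
        # Adding 0 forces a numeric 0 when no valid lines were seen.
        END { print "Sum total: " (finalsum + 0) }
    ' "$1"
}
```

With the sample "numbers" file, this should print exactly what sum.sh prints, since every line there is well-formed.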


Mike