Tuesday, March 31, 2009

Measuring Heavy CPU Usage Over Time On Linux And Unix

Hey There,

Today's bash script is going to be somewhat related to our previous script which tracked idle process time on Linux and Unix, insofar as it deals with trying to rid your system of troubling processes as automatically as possible. Of course, there's no substitute for an eyeball-inspection (of the system, I mean. Unless your eyeballs are hurting ;) but, once you've got a few things down and feel reasonably safe, the more you can take off of your daily plate, the better. ...Just don't make your job so incredibly simple that a machine (or a pre-schooler from a third-world country) could do it ;)

This script is, like most of the stuff we put out here, incredibly easy to run (especially since you set all the variables inside it - change as you see fit), like so:

host # ./munchies

And you're off. In the screenshots below, we'll walk through some basic examples of simple usage, assuming the script's built-in parameters. Any process consuming more than 10 percent of the CPU gets added to the blacklist, and any process that shows up in the blacklist 10 times consecutively will get killed (No screwin' around here ;)

In the first screenshot, we've isolated process id 499 (which happens to be the X server), since it's the only process on the box that meets the "CPU percentage" criterion. Once the script finds that process, it adds it to the default temporary file (the simple way to maintain state ;). We then populate the /tmp/munchiestats file with a whole bunch of other PIDs (some real, some non-existent) and multiple instances of PID 499 (but fewer than 9, so we don't trigger the kill on the next execution) and cat that so you can see the contents:

[Screenshot: munchies script output 1]
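
The state file is nothing fancier than one PID per line, with a line added for every run in which that PID was over the limit, so counting strikes is just a whole-word grep. Here's a minimal sketch of the idea, assuming a throwaway file from mktemp instead of /tmp/munchiestats and a made-up helper name (count_strikes) that isn't in the script:

```bash
#!/bin/bash

# count_strikes is a hypothetical helper, not part of munchies itself.
# It prints how many times a PID appears in the state file.
count_strikes() {
    local pid="$1" file="$2"
    # -w matches whole words only, so PID 49 is never counted as PID 499;
    # -c prints the number of matching lines (0 if there are none).
    grep -cw -- "$pid" "$file"
}

# Throwaway state file, standing in for /tmp/munchiestats.
state_file=$(mktemp)
printf '%s\n' 499 1234 499 49 499 4990 > "$state_file"

echo "Strikes on file for PID 499: $(count_strikes 499 "$state_file")"
echo "Strikes on file for PID 49: $(count_strikes 49 "$state_file")"

rm -f "$state_file"
```

The whole-word match is the important bit: without it, short PIDs would pick up strikes that belong to longer ones.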

In the second screenshot, we run munchies again and see it clear all the PIDs in the temp file that are legitimate, but aren't using over 10% of the CPU anymore. It also clears any PIDs in the temp file that don't exist any more (possibly, from a process exiting, but - in this case - because we just made them up ;). The final run executes the kill of PID 499 and removes it from the temp file:

[Screenshot: munchies script output 2]
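
Clearing a PID out of the temp file boils down to filtering every matching line into a scratch file and moving that back over the original. A rough sketch, assuming a made-up helper name (forget_pid) and mktemp for the scratch file rather than the script's own $$ suffix:

```bash
#!/bin/bash

# forget_pid is a hypothetical helper: drop every line for one PID
# from the state file, leaving all other PIDs (and their counts) alone.
forget_pid() {
    local pid="$1" file="$2" scratch
    scratch=$(mktemp "${file}.XXXXXX") || return 1
    # grep -v exits 1 when nothing is left to print; that only means the
    # file ends up empty, so treat just exit codes above 1 as errors.
    grep -vw -- "$pid" "$file" > "$scratch"
    if [ $? -gt 1 ]; then
        rm -f "$scratch"
        return 1
    fi
    mv "$scratch" "$file"
}

state_file=$(mktemp)
printf '%s\n' 499 1234 499 5678 > "$state_file"
forget_pid 499 "$state_file"
cat "$state_file"    # only 1234 and 5678 remain
rm -f "$state_file"
```

Since every strike is wiped at once, a process that behaves for even one run starts again from zero, which is what makes the count "consecutive".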

Of course, the script has its faults. The most blatant pain in the arse (to our thinking, at this point - with very little QA'ing done ;) is that we've hardcoded the percentage of CPU (10%) and the number of times a PID is allowed to use that much (10 times) and not made them command line arguments or variables at the top of the script. If you want to change them in the script, just modify these lines:

For the CPU percentage limit:

if [[ $cpu_percentage_integer -gt 10 ]]

And for the number of consecutive times you'll allow the offending PID to get away with it before you murder (I mean, kill... ;) it:

if [[ $chronic_muncher -gt 8 ]] <-- This is set to 8 since, if the pre-existing number of additions of a certain PID is over 8, it's (at least) 9, and this go 'round will put it at the limit of 10!
elif [[ $chronic_muncher -lt 10 ]]
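
One way to get rid of the hardcoding, sketched under the assumption that you'd want optional command line arguments with the current values as defaults. The names cpu_limit, max_strikes and verdict are made up for this example and don't appear in the script:

```bash
#!/bin/bash

# Hypothetical settings: first and second command line arguments,
# defaulting to the values munchies currently hardcodes.
cpu_limit="${1:-10}"      # percent of CPU a process may use
max_strikes="${2:-10}"    # consecutive runs over the limit before the kill

# verdict prints what a run would do with a PID, given its integer
# CPU percentage and the number of strikes already on file.
verdict() {
    local cpu_percentage_integer="$1" chronic_muncher="$2"
    if [[ $cpu_percentage_integer -gt $cpu_limit ]]; then
        # The current run counts as a strike too, hence the "+ 1".
        if [[ $((chronic_muncher + 1)) -ge $max_strikes ]]; then
            echo "kill"
        else
            echo "add"
        fi
    else
        echo "clear"
    fi
}

verdict 42 9    # nine strikes on file, this run makes ten
verdict 42 3    # over the limit, but not a chronic muncher yet
verdict 5 3     # behaving again
```

Writing the comparison as "strikes on file plus this run" does away with the magic 8, so changing max_strikes is all it takes to move the limit.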

Another maybe-flaw is that we don't have it set to run backgrounded, or as a daemon. In other words, you need to run it on your own schedule. We have it running in cron every 5 minutes, so a process can abuse the CPU for about 50 minutes before we kill it. If you run it every minute, you can kill it in 10. Of course, all of this is "variable" and you can change it to suit your needs.
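
The arithmetic there is just the cron interval multiplied by the strike limit. A quick sketch, with a made-up function name (minutes_until_kill). The crontab line in the comment assumes a hypothetical install path and spells the minutes out because, as far as we know, the stock Solaris 10 cron doesn't understand the */5 shorthand:

```bash
#!/bin/bash

# Example crontab entry (the path is an assumption, adjust to taste):
#   0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/munchies

# minutes_until_kill is a hypothetical helper: roughly how long a process
# can hog the CPU before its strikes add up to a kill.
minutes_until_kill() {
    local interval_minutes="$1" max_strikes="$2"
    echo $(( interval_minutes * max_strikes ))
}

echo "Every 5 minutes: about $(minutes_until_kill 5 10) minutes"
echo "Every minute: about $(minutes_until_kill 1 10) minutes"
```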

And, if you consider this a flaw, the script was written in bash on Solaris 10, but should be easily portable to other Unixes and Linux distros. Let us know if you'd like to see a version for RedHat or Ubuntu!
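
For the curious, the main Solaris-ism is the use of /usr/ucb/ps. Here's a rough sketch of how the listing might be gathered elsewhere, assuming a ps that accepts the POSIX -e and -o options (procps on Linux does) and using a made-up filter function, find_munchers, that isn't in the script:

```bash
#!/bin/bash

# find_munchers is a hypothetical helper. It reads "pid pcpu command"
# lines on stdin and prints the PID and integer CPU percentage of every
# process above the limit given as its first argument.
find_munchers() {
    local cpu_limit="$1" pid pcpu rest
    while read -r pid pcpu rest; do
        # Strip the decimals, as munchies does with sed (12.7 becomes 12).
        pcpu="${pcpu%%.*}"
        # Skip blank or non-numeric percentages.
        [[ $pcpu =~ ^[0-9]+$ ]] || continue
        if [[ $pcpu -gt $cpu_limit ]]; then
            echo "$pid $pcpu"
        fi
    done
}

# Canned sample input so the filter can be tried without a busy box.
printf '%s\n' '499 12.7 Xorg' '612 0.3 sshd' '733 10.0 bash' | find_munchers 10

# Against live processes (the "=" after each column name drops the header):
#   ps -e -o pid= -o pcpu= -o comm= | find_munchers 10
```

Asking ps for exactly the columns you want also sidesteps any guessing about which field is which.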

Here's hoping this helps you out in some way, shape or form. It's probably translatable to a lot of other work-type performance-tuning situations, as well.

Cheers!


Creative Commons License


This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/bin/bash

#
# munchies - eat up processes using over 10 percent of the cpu over 10 iterations...
#
# 2009 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

cpu_munchers_file="/tmp/munchiestats"
sed=`which sed`
awk=`which awk`
ps="/usr/ucb/ps" # Using /usr/ucb/ps on purpose for %CPU stats
grep=`which grep`
mv=`which mv`
wc=`which wc`
sort=`which sort`
xargs=`which xargs`
kill=`which kill`
# kill_signal="-9" # Don't set this if "kill -TERM"/"kill -15" - i.e. "plain vanilla" kill - is acceptable

while read a b c d
do
    munch_pid="$b"
    cpu_percentage="$c"

    if [[ -z "$cpu_percentage" ]]
    then
        echo "munch_pid $munch_pid is either non-existent or is using less than zero percent of the cpu!"
        continue
    else
        # Keep only the integer part of the percentage (12.7 becomes 12)
        cpu_percentage_integer=$(echo "$cpu_percentage"|$sed 's/^\([^\.]*\)\..*$/\1/')
    fi

    if [[ $cpu_percentage_integer -gt 10 ]]
    then
        echo "Got A Bad One Here - munch_pid $munch_pid Is Using $cpu_percentage_integer Percent Of Our Cpu"
        if [[ -f $cpu_munchers_file ]]
        then
            echo "Checking cpu_munchers_file $cpu_munchers_file For munch_pid $munch_pid"
            chronic_muncher=$(echo `$grep -w $munch_pid $cpu_munchers_file|$wc -l`)
            if [[ $chronic_muncher -gt 8 ]]
            then
                echo "munch_pid $munch_pid Count Is $chronic_muncher - This Will Put It At 10 Or Higher"
                echo "Issuing \"$kill $kill_signal $munch_pid\" And Removing From $cpu_munchers_file now!"
                temp_variable=$$
                # The kill itself ships commented out - uncomment the next line to go live
                ### $kill $kill_signal $munch_pid
                $grep -vw $munch_pid $cpu_munchers_file >${cpu_munchers_file}.$temp_variable
                $mv ${cpu_munchers_file}.$temp_variable $cpu_munchers_file
            elif [[ $chronic_muncher -lt 10 ]]
            then
                echo "munch_pid $munch_pid, with $cpu_percentage_integer cpu usage, Being Added, Possibly Again, To cpu_munchers_file $cpu_munchers_file"
                echo "$munch_pid" >>$cpu_munchers_file
            fi
        else
            echo "No Cpu-Munchers Exist. Creating cpu_munchers_file $cpu_munchers_file And Adding munch_pid $munch_pid"
            echo "$munch_pid" >>$cpu_munchers_file
        fi
    else
        if [[ -f $cpu_munchers_file ]]
        then
            chronic_muncher=$(echo `$grep -w $munch_pid $cpu_munchers_file|$wc -l`)
            if [[ $chronic_muncher -gt 0 ]]
            then
                echo "munch_pid $munch_pid Is Ok And Is In $cpu_munchers_file - Removing"
                temp_variable=$$
                $grep -vw $munch_pid $cpu_munchers_file >${cpu_munchers_file}.$temp_variable
                $mv ${cpu_munchers_file}.$temp_variable $cpu_munchers_file
            else
                :
            fi
        else
            :
        fi
    fi
done <<< "`$ps -aux|$awk '{print $1,$2,$3,$NF}'|$sed 1d`"

echo "Checking $cpu_munchers_file For Non-Existent munch_pids"
if [[ -f $cpu_munchers_file ]]
then
    muncher_array=( $($sort -u $cpu_munchers_file|$xargs echo) )
    for possible_lost_pid in "${muncher_array[@]}"
    do
        # Compare against the PID column only, so a matching number elsewhere in the ps line can't fool us
        is_this_muncher_real=$(echo `$ps -aux|$awk '{print $2}'|$grep -w $possible_lost_pid|$wc -l`)
        if [[ $is_this_muncher_real -eq 0 ]]
        then
            echo "Lost munch_pid $possible_lost_pid Is No Longer Running. Removing From $cpu_munchers_file"
            temp_variable=$$
            $grep -vw $possible_lost_pid $cpu_munchers_file >${cpu_munchers_file}.$temp_variable
            $mv ${cpu_munchers_file}.$temp_variable $cpu_munchers_file
        fi
    done
    echo "All Possible Injustices Have Been Remedied"
fi



, Mike






