Thursday, March 26, 2009

Simple, But Effective. Echo Debugging On Linux And Unix

Hey there,

I took some time and did some simple "echo debugging" and found that the warning I issued about yesterday's script to find a process's idle time was completely backward. Fortunately, it turns out that my mistaken judgement meant that I had a lot less to worry about, in terms of damage control, from the flaw I perceived in my script (I'm not saying there aren't others, of course ;)

It turns out that the problem was not that the script would sometimes consider an active process, that didn't have an idle column value in the "w -s" output, to be idle and worth terminating. The actual problem was that it would consider processes that had been up for more than a day to be active and not worth terminating. This was a much better situation. At least I wouldn't be killing off active processes!

A sample of just using the DEBUG statements I put in yesterday's script pointed out the error very obviously, like so:

DEBUG::::: PIDTTY 6892 pts/119
DEBUG::::: W user1 119 2days ./program
DEBUG::::: LONGTIME TIME
PID 6892 is OK - Not Idle At All - Remove this message!
------------------------
DEBUG::::: PIDTTY 581 pts/232
DEBUG::::: W user1 232 2days ./program
DEBUG::::: LONGTIME TIME
PID 581 is OK - Not Idle At All - Remove this message!
-----------------------------------


And, yes, I felt like a complete moron when I finally took a second to actually look at the output ;) It's a amazing what a few simple echo statements in a script can tell about what problem's it has :)

From that point, I found several other issues and worked on them accordingly:

1. ISSUE WITH IDLE ALPHA DAYS NOTATION = FIX BY CHECKING FOR NON-NUMERIC TYPES

2. ISSUE WITH NO-IDLE MISSING COLUMN = FIX BY SETTING EMPTY VALUE TO NULL PADDED

3. ISSUE WITH MISSING COLUMN ERROR OUTPUT = FIX BY CHECKING COLUMN COUNT IN TIME

4. MUCH BETTER - "NOT IDLE AT ALL" EXCEPTION NEVER CAUGHT - UNNECESSARY NOW - REMOVED

5. REWORKED TIME HANDLING AND SET TO AMBIGUOUS ALPHA MATCH


Pardon my hysterical notes ;) Most of my problem stemmed from the fact that I switched from full-fledged "w" to "w -s" and made some mistakes in updating the relevant columns that I needed to assign to variables.

I should note that I also considered using "who -T" to get around the one time-stealer in this script. Although it did bring the script down to under a second (processing approximately 100 records), "who" only reports on the "user process." This is a huge consideration, since the "user process" can (and usually is) the parent process of the process you want to check the idle time on. I ultimately decided to stick with "w" since using "who" would mean I'd have to check the parent process, cross reference that with the grep output associated with the pty and then end up back at "w" again to get the process's idle time. A lot of extra work for a lot of extra uncertainty. I didn't want to end up in a situation where the "user process" was idle because the user kicked off a script that ran for 6 hours and then terminate the user's main process (which would kill the kids) based on the idle time of the user's session. Sometimes, lack of precision like that can cause you headaches you never imagined you could have ;)

As you can see below, the updates weren't all that impressive, but I did get the execution time down to 30 seconds from 2 minutes. The only way I could get it lower (that I've figure out so far ;) was to compromise the integrity of the script and remove the one awk statement that was holding it back. Notice the last step I took, just to see what would happen, that proved the awk if/else conditional in the script was responsible for a majority of the execution time:

TRIMMED CODE - REMOVED DEBUG AND UNNECESSARY ECHO STATEMENTS - USING BASH TEST AND OPERATORS
OLD SCRIPT EXECUTION TIME FOR 178 PROCS = 1m27.430s
NEW SCRIPT EXECUTION TIME FOR 179 PROCS = 0m56.517s
NEW SCRIPT EXECUTION TIME FOR 110 PROCS = 0m48.940s
SELF-CONTAINED SCRIPT EXECUTION TIME FOR 111 PROCS = 0m51.048s
ADDED TTY TO PS SCRIPT EXECUTION TIME FOR 101 PROCS = 0m29.703s
REMOVING AWK TTY STATEMENT SCRIPT EXECUTION TIME FOR 100 PROCS = 0m29.991s
ADDED ?, console and "continue" SCRIPT EXECUTION TIME FOR 102 PROCS = 0m33.018s
REMOVED W HEADING (REM SED) AND EXPLICIT USER SCRIPT EXECUTION TIME FOR 101 PROCS = 0m33.382s
TEST - HARDCODED UPTIME AND REMOVED AWK STATEMENT - SCRIPT EXECUTION TIME FOR 170 PROCS = 0m8.512s!!!!!!!!!!!


I'm going to work on it some more, because I believe it can be improved tremendously, but - to satisfy any curiosity, here's some of the mid-work that fixed that issue and made the bash script report correctly. I'll post the one with the fixes noted above (and more, I'm sure ;) once I've thoroughly tested them and removed a lot of the redundancy in this script. Redundancy really gets under my skin. I mean it; redundancy really irritates me. Plus, I don't much care for redundancy ;)

Cheers,


Creative Commons License


This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/bin/bash

#
# rip - Kill any processes that we know have been idle for more than 45 minutes - v2-alpha
#
# 2009 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

if [[ $# -lt 1 ]]
then
echo "Usage: $0 PID [user]"
echo "User defaults to the value"
echo "of \$LOGNAME if not specified"
exit 1
fi

PID=$1
ISITAPID=$(echo $PID | grep [A-z])

if [[ ! -z $ISITAPID ]]
then
echo "PID $1 contains non-numeric characters!"
echo "-----------------------------------"
exit 2
fi

PID="$1"
USER=${2:-$LOGNAME}

PIDTTY=$(/usr/bin/ps -fu $USER -o pid,tty |/usr/bin/grep -w $PID|/usr/bin/grep -v grep)

echo DEBUG::::: PIDTTY $PIDTTY

if [[ -z "$PIDTTY" ]]
then
echo "PID $PID is either non-existent, not owned by \"$USER\" or not attached to a p/tty!"
echo "-----------------------------------"
exit 3
else
TTYNUMBER=$(echo "$PIDTTY"|/usr/bin/sed '/TT/d'|/usr/bin/awk -F"/" '{print $2}')
fi

if [[ -z "$TTYNUMBER" ]]
then
echo "PID $PID is not attached to a p/tty!"
echo "KILL OR NOT-----------------------------------"
exit 4
fi

echo DEBUG::::: W $(w -s|/usr/bin/sed 1d|/usr/bin//awk '{if ( $2 == '"$TTYNUMBER"' ) print $0}')

TIME=$(w -s|/usr/bin/sed 1d|/usr/bin/awk '{if ( $2 == '"$TTYNUMBER"' && NF == 4 ) print $3;else if ( $2 == '"$TTYNUMBER"' && NF == 3) print "0"}')
#TIME=$(w -s|/usr/bin/sed 1d|/usr/bin/awk '{if ( $2 == '"$TTYNUMBER"' ) print $3}')
#WCOLUMNS=$(w -s|/usr/bin/sed 1d|/usr/bin/awk '{if ( NF == 4 ) print "4";else print "3"}')

ISITANUMBER=$(echo $TIME | grep [A-z])
if [[ ! -z $ISITANUMBER ]]
then
unset TIME
fi

LONGTIME=$(echo $ISITANUMBER | grep [A-z])

echo DEBUG::::: LONGTIME $LONGTIME TIME $TIME

if [[ ! -z "$LONGTIME" && -z "$TIME" ]]
then
echo "PID $PID is ancient - Idle for $LONGTIME... Killing $PID"
# KILLKILLKILL
elif [[ "$TIME" = "0" ]]
then
echo "PID $PID is OK - Not Idle At All - Remove this message!"
else
TIMEIDLE=$(echo $TIME|grep -v "[:]")
echo DEBUG::::: TIME $TIME
if [[ -z $TIMEIDLE ]]
then
echo "PID $PID has been idle way too long - $LONGTIME $TIME so far... Killing $PID"
# KILLKILLKILL
elif [[ $TIMEIDLE -gt 45 ]]
then
echo "PID $PID has been idle too long - $TIMEIDLE minutes so far... Killing $PID"
# KILLKILLKILL
else
echo "PID $PID is OK - Only idle for $TIME minute(s) - Remove this message!"
fi
fi
echo "-----------------------------------"


, Mike




Discover the Free Ebook that shows you how to make 100% commissions on ClickBank!



Please note that this blog accepts comments via email only. See our Mission And Policy Statement for further details.