Hey There,
Hopefully yesterday's rant on the simplicity of complexity wasn't too much of a bitter pill. If it was, here's hoping you didn't swallow it ;)
Today, I finally found some time to make a little headway on this project (which should be a lot simpler than it is). Basically, what I'm looking to do is create a way to track a specific process's idle time at any given point in time on any given Linux or Unix system. As I mentioned yesterday, there are C structures in Solaris' /proc/PID/status data files (for one example), but that's just another thing that ended up frustrating me more. As I noted, parts of the OS that are included should be available for use. The structure is used by the OS, in some shape or fashion, to determine idle times (as we'll see below), but no specific "tool" exists to do what I wanted. Of course, this is limited "to my knowledge." If anyone out there knows of a standard program or command that's managed to elude me, please feel free to email me and tell me all about it. I promise not to get offended if you feel the need to belittle me for not having the common sense to look for it where it was in the first place ;)
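One possible angle on the "no specific tool" problem: as far as I understand it, who and w work out a terminal's idle time from the last-access time of the tty device file, so you can peek at that timestamp directly. What follows is only a minimal bash sketch, and it assumes GNU coreutils (stat -c and date +%s), which means it won't run as-is on a stock Solaris 10 box. The function names tty_idle_seconds and pid_tty are made up for this illustration.

```shell
# Sketch only: assumes GNU stat and GNU date (not stock Solaris 10).

# tty_idle_seconds DEVICE   (hypothetical helper)
# Prints the number of seconds since DEVICE was last accessed.
tty_idle_seconds() {
    local dev=$1 atime now
    atime=$(stat -c %X "$dev") || return 1   # last access time, epoch seconds
    now=$(date +%s)
    echo $(( now - atime ))
}

# pid_tty PID   (hypothetical helper)
# Prints the controlling terminal of PID as ps reports it, for example pts/3.
# Processes without a terminal usually show up as "?" or "-".
pid_tty() {
    ps -o tty= -p "$1" | tr -d ' '
}
```

With those two in place, something like tty_idle_seconds "/dev/$(pid_tty 17787)" would report the idle time in seconds, provided the process actually has a terminal.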
Attached to today's post is a rough-draft bash script that attempts to grab a process's idle time. It won't work in all instances, although I've tried to capture as many of those instances as possible. The one big gotcha in this whole mess is that you can't take the output of ps and directly retrieve the idle time for a process from the listing, even if you do your own formatting. I wrote this on Solaris 10 and looked at SUSE Linux 9, but found no love :) Instead, I found that I needed to run ps, extract the pty associated with the process from that (if it existed, which is an exception the script catches) and then use either who or w to retrieve the idle time associated with the pty.
See what I mean? Shouldn't it be a little bit less of a hassle than that?
Okay. I'll admit, if it was, I wouldn't be having half the fun I'm having now trying to script it all out for myself ;) So far, what I've put together works fairly well, although I'm not 100% certain that it's bullet-proof so I would recommend that you leave the "business end" of the code commented out (The stuff that performs unforgivable actions, like killing ;). I have a hard time reproducing it, but I can swear that this code will (every once in a good while) determine that a process that hasn't been idle at all (which removes a column from the "w -s" output) has been idle too long. I'm still working on that part and welcome any suggestions regarding the script, how to make it better, why I'm doing everything the wrong (and/or hard) way when I don't need to and any other constructive criticism :)
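Since the disappearing idle column is the likely troublemaker, one way to guard against it might be to classify whatever lands in the idle field before trusting it. This is a minimal bash sketch, not what rip does today. It assumes the idle formats the script already expects: blank for not idle, a bare number for minutes, H:MM for hours and minutes, and Ndays for days. Other versions of w (Linux procps, for one) print idle times differently, so the patterns would need adjusting there. The function name idle_to_minutes is hypothetical.

```shell
# idle_to_minutes FIELD   (hypothetical helper)
# Prints the idle time in minutes, or returns 1 if FIELD doesn't look like
# an idle time at all (for example, because it is really the "what" column).
idle_to_minutes() {
    local f=$1
    if [[ -z $f ]]; then
        echo 0                                   # blank column: not idle
    elif [[ $f =~ ^[0-9]+$ ]]; then
        echo $(( 10#$f ))                        # plain minutes
    elif [[ $f =~ ^([0-9]+):([0-9]+)$ ]]; then
        # hours:minutes (10# keeps values like 08 from being read as octal)
        echo $(( 10#${BASH_REMATCH[1]} * 60 + 10#${BASH_REMATCH[2]} ))
    elif [[ $f =~ ^([0-9]+)days?$ ]]; then
        echo $(( 10#${BASH_REMATCH[1]} * 1440 )) # days
    else
        return 1                                 # not an idle time
    fi
}
```

Here idle_to_minutes 1:30 prints 90, while a stray field like -bash prints nothing and returns a non-zero status, so the caller can treat it as "not idle" instead of guessing.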
The script runs very simply, and you only need to supply it with a PID. You can optionally supply a username as a second argument:
host # ./rip 17787
If you just run it with no arguments, you'll get a usage screen, which may or may not help ;)
host # ./rip
Usage: ./rip PID [user]
User defaults to the value
of $LOGNAME if not specified
And the following is a sample of the output you might get on a specific run. Here, I've written a command line while loop from a pipe to barbarically hammer out multiple instances at a time ;)
host # time ps -ef|grep "[b]ash"|awk '{print $2}'|while read x;do ./rip $x;done
PID 2664 is not attached to a p/tty!
-----------------------------------
PID 10700 is either non-existent, not owned by "root" or not attached to a p/tty!
-----------------------------------
PID 10855 is OK - Not Idle At All - Remove this message!
-----------------------------------
PID 23217 is OK - Not Idle At All - Remove this message!
-----------------------------------
PID 14730 is either non-existent, not owned by "root" or not attached to a p/tty!
-----------------------------------
Here's another example. This time you'll see what you get if you try to run the script specifying a user other than the user that owns the processes or, in this case, a completely bogus user. This "test" in the script really isn't necessary and I only included it as a feeble attempt at damage control. Feel free to remove it if you like:
host # time ps -ef|grep "[b]ash"|awk '{print $2}'|while read x;do ./rip $x joeUser;done
ps: unknown user joeUser
PID 2664 is either non-existent, not owned by "joeUser" or not attached to a p/tty!
-----------------------------------
ps: unknown user joeUser
PID 23633 is either non-existent, not owned by "joeUser" or not attached to a p/tty!
-----------------------------------
ps: unknown user joeUser
PID 10700 is either non-existent, not owned by "joeUser" or not attached to a p/tty!
-----------------------------------
ps: unknown user joeUser
PID 10855 is either non-existent, not owned by "joeUser" or not attached to a p/tty!
-----------------------------------
ps: unknown user joeUser
PID 14730 is either non-existent, not owned by "joeUser" or not attached to a p/tty!
-----------------------------------
I hope you find some good use for this script!
NOTE: Please keep in mind the caveat noted above regarding the sometimes-false-positive I believe this script returns under certain circumstances when it decides a non-idle process (with nothing displayed in the idle column from "w -s" output) has been idle too long! It may never happen again and I may have been seeing spots. Just want to keep you in a "safe" mindset, just in case I'm not completely insane ;)
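In that same "safe" spirit, one way to fill in the DO_WHAT_YOU_HAVE_TO_DO_TO_THE_PID_HERE placeholders without risking anything might be a small wrapper that only reports what it would do unless you explicitly tell it otherwise. This is just a sketch: the function name act_on_pid and the RIP_DRY_RUN variable are hypothetical and not part of the script below.

```shell
# act_on_pid PID   (hypothetical helper)
# Dry run by default: only sends a signal when RIP_DRY_RUN is set to
# something other than 1.
act_on_pid() {
    local pid=$1
    if [ "${RIP_DRY_RUN:-1}" = "1" ]; then
        echo "DRY RUN: would send SIGTERM to PID $pid"
    else
        kill -TERM "$pid"
    fi
}
```

Left at its default, act_on_pid 17787 just prints a message. Running with RIP_DRY_RUN=0 in the environment turns it into the real thing, so only do that once you trust the idle detection.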
Cheers,
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#!/bin/bash
#
# rip - Kill any processes that we know have been idle for more than 45 minutes
#
# 2009 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#
if [ $# -lt 1 ]
then
    echo "Usage: $0 PID [user]"
    echo "User defaults to the value"
    echo "of \$LOGNAME if not specified"
    exit 1
fi
PID=$1
# Reject anything that isn't made up purely of digits
ISITAPID=$(echo "$PID" | grep '[^0-9]')
if [ -n "$ISITAPID" ]
then
    echo "PID $1 contains non-numeric characters!"
    echo "-----------------------------------"
    exit 2
fi
USER=${2:-$LOGNAME}
# Compare against the PID column only, so PID 12 can't match a tty like pts/12
PIDTTY=$(/usr/bin/ps -fu "$USER" -o pid,tty | /usr/bin/awk '$1 == '"$PID"' {print}')
#echo DEBUG::::: PIDTTY $PIDTTY
if [ -z "$PIDTTY" ]
then
    echo "PID $PID is either non-existent, not owned by \"$USER\" or not attached to a p/tty!"
    echo "-----------------------------------"
    exit 3
else
    TTYNUMBER=$(echo "$PIDTTY" | /usr/bin/sed '/TT/d' | /usr/bin/awk -F"/" '{print $2}')
fi
if [ -z "$TTYNUMBER" ]
then
    echo "PID $PID is not attached to a p/tty!"
    echo "-----------------------------------"
    exit 4
fi
#echo DEBUG::::: W $(w -s|/usr/bin/sed 1d|/usr/bin/awk '{if ( $2 == '"$TTYNUMBER"' ) print $0}')
TIME=$(w -s | /usr/bin/sed 1d | /usr/bin/awk '{if ( $2 == '"$TTYNUMBER"' ) print $3}')
# When the idle column is blank, $3 is really the next column over, so
# discard anything that contains letters
ISITANUMBER=$(echo "$TIME" | grep '[A-Za-z]')
if [ -n "$ISITANUMBER" ]
then
    unset TIME
fi
# NOTE: TIME can't contain letters past this point, so LONGTIME always ends
# up empty and the "ancient" branch below is never reached as written
LONGTIME=$(echo "$TIME" | grep '[A-Za-z]')
#echo DEBUG::::: LONGTIME $LONGTIME TIME $TIME
if [ -z "$LONGTIME" -a -z "$TIME" ]
then
    echo "PID $PID is OK - Not Idle At All $TIME - Remove this message!"
elif [ -n "$LONGTIME" ]
then
    echo "PID $PID is ancient - Idle for $TIME... Killing $PID"
    # DO_WHAT_YOU_HAVE_TO_DO_TO_THE_PID_HERE
else
    # A colon means the idle time is no longer given in plain minutes
    TIMEIDLE=$(echo "$TIME" | grep -v "[:]")
    # echo DEBUG::::: TIME $TIME
    if [ -z "$TIMEIDLE" ]
    then
        echo "PID $PID has been idle way too long - $LONGTIME $TIME so far... Killing $PID"
        # DO_WHAT_YOU_HAVE_TO_DO_TO_THE_PID_HERE
    elif [ "$TIMEIDLE" -gt 45 ]
    then
        echo "PID $PID has been idle too long - $TIMEIDLE minutes so far... Killing $PID"
        # DO_WHAT_YOU_HAVE_TO_DO_TO_THE_PID_HERE
    else
        echo "PID $PID is OK - Only idle for $TIME minute(s) - Remove this message!"
    fi
fi
echo "-----------------------------------"
Mike