Monday, November 12, 2007

Determining The Real Space Hogs On a Multiple-Overlay Mounted File System

This happens a lot to me at work, and probably to most system administrators. All the machines (the "important" ones, anyway) are monitored 24x7. Of course, that's good: it's our job to make sure the machines keep running optimally, and it would be a real drag to have to log in and check them all manually.

However, no matter how sophisticated your monitoring system is, it's prone to giving you the usual headaches. A computer's only as smart as a computer, after all. At one end of the spectrum, you'll get notified if the CPU so much as dips below 10% idle for one minute of the day; at the other end, you won't find out that your machine is melting until you smell the smoke. In any event, some sort of automation is better than none!

Disk watching is becoming more and more critical to performance assurance as time marches on. What we used to store in megabytes, we're now storing in petabytes, and it's still not enough space for all that important music and video... I mean, customer data :P

The issue I'm addressing with today's bit of work is how to get a clear assessment of disk space usage when you've got a multiple-overlay mounted file system. This may not be the proper industry term, so let me explain: by it I mean a partition that has many other partitions mounted over it. For instance (all separate mount points):

/stuff
/stuff/IBM/aa
/stuff/IBM/bb
/stuff/oracle/db_files


...and so on. Often, the overlain mounts number in the hundreds. So, in this situation, what do we do (in our role as administrators and blame-takers) to determine who's really using up all that space? Obviously, running a disk usage command like "du -sk" on the /stuff directory would be very time-consuming, and it wouldn't accurately reflect which files and directories are eating up the space on /stuff, since most of the directories beneath /stuff are on entirely separate partitions!
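To see how many separate filesystems are actually stacked on top of /stuff, you can pull the mount points out of df. Here's a minimal sketch; the function name mounts_under is hypothetical (it's not part of the script below), and it assumes POSIX "df -P -k" output, where each filesystem sits on a single line with its mount point as the last field (so mount points containing spaces aren't handled).

```shell
#!/bin/sh
# mounts_under DIR: read df-style output on stdin and print every mount
# point that sits strictly beneath DIR. Hypothetical helper for illustration.
# Assumes POSIX "df -P -k" layout: one header line, then one filesystem per
# line with the mount point in the last field (no spaces in mount points).
mounts_under() {
    # index(...) == 1 means the mount point starts with "DIR/", so DIR itself
    # and look-alikes such as /stuffing are not reported
    awk -v dir="$1" 'NR > 1 && index($NF, dir "/") == 1 { print $NF }'
}
```

Running "df -P -k | mounts_under /stuff | wc -l" counts the overlain partitions; every one of them is space that a plain "du -sk /stuff" would charge to /stuff.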

The only way to really do it accurately is to ensure that you're only measuring the disk usage of files and directories that are on /stuff and not any of the overlain partitions. The little script below does just that. For your consideration:


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

#!/bin/ksh

#
# 2007 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

# Clean up and quit on HUP, INT, QUIT or TERM (signal 9, KILL, can't be trapped)
trap 'rm -f tmpfile.$pid; echo "Caught Signal. Cleaning Up And Quitting..."; exit 3' 1 2 3 15

pid=$$
nicedate=`date "+%m/%d/%y"`
date=`date +%m%d%y`

hostname=`hostname`

print ""
print "$hostname StuffCheck Output for $nicedate"
echo "---------------------------------------------------------------------"
print ""
printf "%-15s %-10s %-20s %-10s %-10s\n" Hostname Size Partition User Group
echo "---------------------------------------------------------------------"

if [ -d /stuff ]
then
    cd /stuff
    dir_array=""
    for x in *
    do
        # df -k lists the mount point last; appending "/" to each one means
        # the pattern for IBM matches /stuff/IBM/aa but not /stuff/IBMX
        if df -k | awk '{ print $NF "/" }' | grep "^/stuff/$x/" >/dev/null 2>&1
        then
            :   # /stuff/$x is, or holds, a separate partition: skip it
        else
            dir_array="$dir_array $x"
        fi
    done
    # du -d (Solaris) won't cross into other filesystems; GNU du calls this -x.
    # Errors go to /dev/null so they can't end up in the sorted sizes.
    for y in $dir_array
    do
        du -dsk "$y" 2>/dev/null
    done | sort -rn | head -5 | while read size partition
    do
        owner=`ls -ld "/stuff/$partition" | awk '{ print $3 }'`
        group=`ls -ld "/stuff/$partition" | awk '{ print $4 }'`
        printf "%-15s %-10s %-20s %-10s %-10s\n" "$hostname:" "$size" "$partition" "$owner" "$group"
    done
else
    print "No /stuff Directory found!!!"
fi


Note that the real time-saver here is that we only run du on files and directories that we've already determined (using df -k) are not, and do not contain, any overlay mount.
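Here's a minimal sketch of that filter as a standalone function that compares whole paths: a top-level entry is skipped if it is a mount point or has one somewhere beneath it. The name local_entries is hypothetical, and it assumes the mount points arrive on stdin one per line (for example from df -P -k | awk 'NR > 1 { print $NF }') and contain no whitespace.

```shell
#!/bin/sh
# local_entries DIR: print the names of DIR's top-level entries that are
# neither mount points nor contain one. Hypothetical helper, not from the
# original script. Mount points are read on stdin, one per line, and must
# not contain whitespace (they are word-split below).
local_entries() (
    dir=$1
    mounts=$(cat)
    for path in "$dir"/*; do
        [ -e "$path" ] || continue      # DIR is empty: the glob stayed literal
        covered=no
        for m in $mounts; do
            case $m in
                # exact match, or a mount point somewhere below this entry
                "$path" | "$path"/*) covered=yes; break ;;
            esac
        done
        if [ "$covered" = no ]; then
            printf '%s\n' "${path##*/}"
        fi
    done
)
```

Each name it prints can then be handed to du, just as the script does with dir_array.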

All in all, it's a fairly simple script to decipher, since I wrote it quickly and hardcoded a lot of things that could just as well be variables (the partition name, for instance). I also tipped my hand as to what I'd wanted to accomplish before the timer ran out by adding a "hostname" field to the output. But I'm sure I'll end up making this more flexible as soon as the /things, /odds and /ends directories start getting overloaded :)
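Making the partition name a variable is mostly a matter of taking it as an argument. Here's a rough sketch of what that could look like as a POSIX sh function; its name, top_hogs, and its defaults of /stuff and five entries are assumptions of mine, not part of the script above. Instead of grepping df for each entry, it asks df which filesystem each entry lives on and skips anything that isn't on DIR's own partition, then sizes the rest with "du -x", the POSIX spelling of the "stay on this filesystem" option that Solaris du calls -d. Unlike the script, an entry that merely contains a mount point is still measured, since -x keeps du from descending into it.

```shell
#!/bin/sh
# top_hogs [DIR] [COUNT]: print the COUNT largest top-level entries of DIR
# that live on DIR's own partition (defaults: /stuff and 5).
# Hypothetical generalisation for illustration, not the original script.
# Assumes POSIX "df -P" output and no spaces in mount points.
top_hogs() (
    dir=${1:-/stuff}
    count=${2:-5}
    if [ ! -d "$dir" ]; then
        printf 'No %s directory found!!!\n' "$dir" >&2
        exit 1
    fi
    cd "$dir" || exit 1
    # Mount point of the filesystem that holds DIR itself
    home_fs=$(df -P -k . | awk 'NR == 2 { print $NF }')
    for entry in *; do
        [ -e "$entry" ] || continue
        # Skip entries that are themselves mount points of another partition
        entry_fs=$(df -P -k "$entry" | awk 'NR == 2 { print $NF }')
        [ "$entry_fs" = "$home_fs" ] || continue
        # -x: don't descend into partitions mounted further down
        du -x -s -k "$entry" 2>/dev/null
    done | sort -rn | head -n "$count"
)
```

With that in place, "top_hogs /odds 10" would report the ten biggest local entries under /odds.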

Take care,

, Mike