Showing posts with label suse. Show all posts

Tuesday, May 5, 2009

Wtmpx Log Rolling On Unix or Linux: Practical Application Of Fwtmp

Hey There,

In yesterday's post on getting the year from wtmpx, we took a look at a great built-in program called fwtmp (that I somehow managed to not notice for several years ;) and examined some uses of it from a high level perspective.

Today we're going to look at it from the opposite angle and focus on something very specific that it can help you do. And, just to dot the i's that I can (and this list is, of course, incomplete), you can find fwtmp in /usr/lib/acct on Solaris and in /usr/sbin/acct on RedHat Linux (AS 5.2) and on SUSE Linux 9. Of course, all of the operating systems require you to have the correct pkg/rpm/dpkg files installed in order for the command to exist on your system at all :)

Below is a really simple shell script to illustrate the functionality of fwtmp. It's basically a log rotation script written specifically to highlight the use of fwtmp to rotate your wtmpx/wtmp/btmp file. It's meant to be run in cron and is simple to execute since (as it stands) it takes no arguments. Feel free to embellish for your own environment or to make it more accessible across a wide variety of different OS's. The basic cron entry I would add would be something like:

58 23 * * * /usr/local/bin/wtmp_rotate >/dev/null 2>&1


which basically just tells the cron daemon to run /usr/local/bin/wtmp_rotate (the place I like to put all my custom scripts) at 11:58pm every day and to dump any output from the command into the bit-bucket (redirecting both STDOUT and STDERR to /dev/null)

Hope this script helps you out some. You may want to test it by making a temporary directory and copying your wtmpx file into there first. I've included some commented lines to indicate the parts of the script you'd want to modify to ensure that your testing "doesn't" use the real system file.

And to answer the question of why I compress the files after converting them back to binary; I found, in my testing, that the opposite of what seemed logical was true. The binary files compacted to a much greater degree than the fwtmp-generated ASCII files. I didn't investigate it much further since it is what it is, but, if I had to throw out a possible reason it may be that fwtmp pads that ASCII file with a lot of extra bits that can't be stripped (That brush-off has middle-management written all over it ;)

Enjoy and cheers :)


Creative Commons License


This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/bin/bash

#
# wtmpx_rotate - rotate your user login logs... wheee :)
#
# 2009 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

# COMMENTED OUT SECTIONS SHOULD BE SWAPPED WITH THEIR UNCOMMENTED COUNTERPARTS TO DO LOCAL-DIRECTORY TESTING WITH A COPY OF WTMPX
# THESE SWAPPABLE SECTIONS WILL BE SEPARATED FROM LIKE PARTS OF THE SCRIPT BY A SINGLE BLANK LINE

if [[ ! -d /var/adm/backup_log_dir ]]
then
    mkdir /var/adm/backup_log_dir
fi


#if [[ ! -d backup_log_dir ]]
#then
# mkdir backup_log_dir
#fi


wtmpx="/var/adm/wtmpx" # This may be /var/log/wtmpx or /var/log/wtmp depending on your setup (lastlog uses a different record format, so don't point this at it)
#wtmpx="wtmpx"


fwtmp="/usr/lib/acct/fwtmp" # This may be /usr/sbin/fwtmp depending on your setup. Not using "which" since /usr/lib/acct isn't a standard directory.
sed=`which sed`
rm=`which rm`
compress=`which compress` # Or gzip, bzip2, whatever you prefer

grep_date=$(date "+%a %b %e")
grep_date_ext=$(date "+%a %b %e"|$sed 's/ //g')
grep_year=$(date +%Y)
variable_ext1=$(echo ${RANDOM}`date "+%S"`)
variable_ext2=$(echo ${RANDOM}`date "+%S"`)
variable_ext3=$(echo ${RANDOM}`date "+%S"`)
wtmpx_plus_variable_ext1=${wtmpx}.$variable_ext1
wtmpx_plus_variable_ext2=${wtmpx}.$variable_ext2
wtmpx_plus_variable_ext3=${wtmpx}.$variable_ext3
backup_log_dir_file=${wtmpx}.${grep_date_ext}.$grep_year


backup_log_dir_dir="/var/adm/backup_log_dir"
#backup_log_dir_dir="backup_log_dir"


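# Dump the binary log to ASCII so it can be filtered with sed
$fwtmp < "$wtmpx" > "$wtmpx_plus_variable_ext1"

# Today's entries stay in the live log; everything older goes to the archive
$sed -n "/$grep_date.*$grep_year$/p" "$wtmpx_plus_variable_ext1" > "$wtmpx_plus_variable_ext2"
$sed "/$grep_date.*$grep_year$/d" "$wtmpx_plus_variable_ext1" > "$wtmpx_plus_variable_ext3"

$rm "$wtmpx" "$wtmpx_plus_variable_ext1"

# Convert both halves back to binary
$fwtmp -ic < "$wtmpx_plus_variable_ext2" > "$wtmpx"
$fwtmp -ic < "$wtmpx_plus_variable_ext3" > "$backup_log_dir_file"

$rm "$wtmpx_plus_variable_ext2" "$wtmpx_plus_variable_ext3"
# The .Z suffix below assumes compress; adjust it if you swap in gzip or bzip2
$compress "$backup_log_dir_file"
mv "${backup_log_dir_file}.Z" "$backup_log_dir_dir"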


, Mike







Please note that this blog accepts comments via email only. See our Mission And Policy Statement for further details.

Tuesday, November 25, 2008

Quick And Easy Local Filesystem Troubleshooting For SUSE Linux

Hey There,

Today we're going to take a look at some quick and easy ways to determine if you have a problem with your local filesystem on SUSE Linux (tested on 8.x and 9.x). Of course, we're assuming that you have some sort of an i/o wait issue and the users are blaming it on the local disk. While it's not always the case (i/o wait can occur because of CPU, memory and even network latency), it never hurts to be able to put out a fire when you need to. And, when the mob's pounding on your door with lit torches, that analogy is never more appropriate ;)

Just as in previous "quick troubleshooting" posts, like the week before last's on basic VCS troubleshooting, we'll be running through this with quick bullets. This won't be too in-depth, but it should cover the basics.

1. Figure out where you are and what OS you're on:

Generally, something as simple as:

host # uname -a

will get you the info you need. For instance, with SUSE Linux (and most others), you'll get output like:

Linux “hostname” kernel-version blah..blah Date Architecture... yada yada

The kernel version in that string is your best indicator. Generally, a kernel-version starting with 2.4.x will be for SUSE 8.x and 2.6.x will be for SUSE 9.x. Of course, also avail yourselves of the, possibly available, /etc/SuSE-release, /etc/release, /etc/issue, /etc/motd, /etc/issue.net files and others like them. It's important that you know what you're working with when you get started. Even if it doesn't matter now, it might later :)
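
As a rough sketch of that rule of thumb, the kernel-to-release guess could be wrapped up like this. The function name guess_suse_release is hypothetical, and the mapping is only the loose one described above (later SUSE releases also ship 2.6.x kernels), so treat the answer as a hint and not a verdict.

```shell
#!/bin/bash
# guess_suse_release [KERNEL_VERSION]
# Hypothetical helper: maps a kernel version string to a rough SUSE guess.
# Defaults to the running kernel as reported by uname -r.
guess_suse_release() {
    local kernel=${1:-$(uname -r)}
    case "$kernel" in
        2.4.*) echo "kernel $kernel: probably SUSE 8.x" ;;
        2.6.*) echo "kernel $kernel: probably SUSE 9.x or later" ;;
        *)     echo "kernel $kernel: no guess, check the release files" ;;
    esac
}
```

Running it with no arguments checks the box you're on; passing a version string lets you check one you copied out of somebody's ticket.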

2. Figure out how many local disks and/or volume groups you have active and running on your system:

Determine your server model, number of disks and volume groups. Since you're on SUSE, you may as well use the "hwinfo" command. I never know how much I'm going to need to know about the system when I first tackle a problem, so I'll generally dump it all into a file and then extract it from there as needed. See our really old post for a script that Lists out hardware information on SUSE Linux in a more pleasant format:

host # hwinfo >/var/tmp/hwinfo.out
host # grep system.product /var/tmp/hwinfo.out
system.product = 'ProLiant DL380 G4'


Now, I know what I'm working with. If this specific of a grep doesn't work for you, try "grep -i product" - you'll get a lot more information than you need, but your machine's model and number will be in there and much easier to find than if you looked through the entire output file.

Then, go ahead and check out /proc/partitions. This will give you the layout of your disk:

host # cat /proc/partitions
major minor #blocks name

104 0 35561280 cciss/c0d0
104 1 265041 cciss/c0d0p1
104 2 35294805 cciss/c0d0p2
104 16 35561280 cciss/c0d1
104 17 35559846 cciss/c0d1p1
253 0 6291456 dm-0
253 1 6291456 dm-1
253 2 2097152 dm-2
253 3 6291456 dm-3
253 4 10485760 dm-4
253 5 3145728 dm-5
253 6 2097152 dm-6



"cciss/c0d0" and "cciss/c0d1" show you that you have two disks (most probably mirrored, which we can infer from the dm-x output). Depending upon how your local disk is managed, you may see lines that indicate, clearly, that LVM is being used to manage the disk (because the lines contain hints like "lvma," "lvmb" and so forth ;)

58 0 6291456 lvma 0 0 0 0 0 0 0 0 0 0 0
58 1 6291456 lvmb 0 0 0 0 0 0 0 0 0 0 0
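
If you'd rather not count lines by eye, here's a small sketch that tallies whole disks and device-mapper devices from /proc/partitions-style input. The function name summarize_partitions is made up for this example, and the naming patterns it looks for (cciss/cXdY, hdX, sdX and dm-N) are an assumption that covers the output above but certainly not every controller out there.

```shell
#!/bin/bash
# summarize_partitions [FILE]
# Hypothetical helper: counts whole disks and dm-N devices in a
# /proc/partitions-style file (the device name is the 4th column).
summarize_partitions() {
    local file=${1:-/proc/partitions}
    awk '
        NF >= 4 {
            name = $4
            if (name ~ "^dm-[0-9]+$")
                dm++                                   # device-mapper device
            else if (name ~ "^cciss/c[0-9]+d[0-9]+$" || name ~ "^[hs]d[a-z]+$")
                disks++                                # whole disk, not a partition
        }
        END { printf "disks=%d dm=%d\n", disks, dm }
    ' "$file"
}
```

Against the listing above, this should report two disks and seven dm devices.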


3. Check out your local filesystems and fix anything you find that's broken:

Although it's boring, and manual, it's a good idea to take the output of:

host # df -l

and compare that with the contents of your /etc/fstab. This will clear up any obvious errors like mounts that are supposed to be up but aren't or mounts that aren't supposed to be up but are, etc... You can whittle down your output from /etc/fstab to show (mostly) only local filesystems by doing a reverse grep on the colon character (:) - This is generally found in remote mounts and almost never found in local filesystem listings.

host # grep -v ":" /etc/fstab
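
The boring, manual comparison can be sketched out in a few lines of awk, if you'd like the machine to do the squinting. The function name missing_mounts is hypothetical, and the filtering rules (skip comments, remote mounts with a colon, swap, and anything marked noauto) are my assumptions about what "supposed to be mounted" means on a typical box.

```shell
#!/bin/bash
# missing_mounts FSTAB MOUNT_TABLE
# Hypothetical helper: prints mount points that FSTAB says should be mounted
# locally but that do not appear in MOUNT_TABLE (for instance /proc/mounts).
missing_mounts() {
    local fstab=$1 mounts=$2
    awk '
        FILENAME == ARGV[1] { mounted[$2] = 1; next }   # first file: what is mounted now
        /^[ \t]*(#|$)/      { next }                    # skip comments and blank lines
        $1 ~ /:/            { next }                    # skip remote (host:/path) mounts
        $3 == "swap" || substr($2, 1, 1) != "/" { next }
        $4 ~ /(^|,)noauto(,|$)/ { next }                # not meant to be mounted at boot
        !($2 in mounted)    { print $2 }
    ' "$mounts" "$fstab"
}
```

On a live system, "missing_mounts /etc/fstab /proc/mounts" would be the obvious way to call it. No output means nothing obvious is missing.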

4. Keep hammering away at the obvious:

Check the Use% column in the output of your "df -l" command. If any filesystems are at 100%, some cleanup is in order. It may seem silly, but usually the simplest problems get missed when one too many managers begin breathing down your neck ;) Also, check inode usage (run "df -li" to get the IUse% column, since plain "df -l" doesn't show inodes) and ensure that those aren't all being used up either.
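
Here's a minimal sketch of that check as a filter. The function name full_filesystems and the default threshold of 90 are my own choices, and it assumes POSIX-format df output (the -P flag), where the percentage is the fifth column and the mount point is the sixth. Mount points with spaces in them would confuse it.

```shell
#!/bin/bash
# full_filesystems [THRESHOLD]
# Hypothetical helper: reads "df -P" style output on stdin and prints the
# mount point and percentage for anything at or above THRESHOLD percent.
full_filesystems() {
    local limit=${1:-90}
    awk -v limit="$limit" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%/, "", pct)         # strip the percent sign
            if (pct + 0 >= limit)
                print $6, $5
        }
    '
}
```

Usage would be along the lines of "df -lP | full_filesystems 95" for space, and "df -liP | full_filesystems 95" for inodes, since the inode report puts its percentage in the same column.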

Mount any filesystems that are supposed to be mounted but aren't, and unmount any filesystems that are mounted but (according to /etc/fstab) shouldn't be. Someone will complain about the latter at some point (almost guaranteed), which will put you in a perfect position to request that it either be put in the /etc/fstab file or not mounted at all.

You're most likely to have an issue here with mounting the unmounted filesystem that's supposed to be mounted. If you try to mount and get an error that indicates the mountpoint can't be found in /etc/fstab or /etc/mtab, the mount probably isn't listed in /etc/fstab or there is an issue with the syntax of that particular line (could even be a "ghost" control character). You should also check to make sure the mount point being referenced actually exists, although you should get an entirely different (and very self-explanatory) error message in the event that you have that problem.

If you still can't mount, after correcting any of these errors (of course, you could always avoid the previous step and mount from the command line using logical device names instead of paths from /etc/fstab, but it's always nice to know that what you fix will probably stay fixed for a while ;), you may need to "fix" the disk. This will range in complexity from the very simple to the moderately un-simple ;) The simple (Note: If you're running ReiserFS, use reiserfsck instead of plain fsck for all the following examples. I'm just trying to save myself some typing):

host # umount /uselessFileSystem
host # fsck -y /uselessFileSystem
....
host # mount /


which, you may note, would be impossible to do (or, I should say, I'd highly recommend you DON'T do) on used-and-mounted filesystems or any special filesystems, like root "/" - In cases like that, if you need to fsck the filesystem, you should optimally do it when booted up off of a cdrom or, at the very least, in single user mode (although you still run a risk if you run fsck against a mounted root filesystem).

For the moderately un-simple, we'll assume a "managed file system," like one under LVM control. In this case you could check a volume that refuses to mount (assuming you tried most of the other stuff above and it didn't do you any good) by first scanning all of them (just in case):

host # vgscan
Reading all physical volumes. This may take a while...
Found volume group "usvol" using metadata type lvm2
Found volume group "themvol" using metadata type lvm2


If "usvol" (or any of them) is showing up as inactive, or is completely missing from your output, you can try the following:

host # vgchange -a y

to use the brute-force method of trying to activate all volume groups that are either missing or inactive. If this command gives you errors, or it doesn't and vgscan still gives you errors, you most likely have a hardware related problem. Time to walk over to the server room and check out the situation more closely. Look for amber lights on most servers. I've yet to work on one where "green" meant trouble ;)

If doing the above sorts you out and fixes you up, you just need to scan for logical volumes within the volume group, like so:

host # lvscan
ACTIVE '/dev/usvol/usfs02' [32.00 GB] inherit
....


And (is this starting to sound familiar or am I just repeating myself ;), if this gives you errors, try:

host # lvchange -a y /dev/usvol/usfs02

If the logical volume throws you into an error loop, or it doesn't complain but a repeated run of "lvscan" fails, you've got a problem outside the scope of this post. But, at least you know pretty much where it is!
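
To pick out the logical volumes that need attention, a sketch like the following can filter lvscan's output. The function name inactive_lvs is hypothetical, and it assumes the format shown above: a status word first (anything other than ACTIVE counts as trouble here), followed by the device path in single quotes.

```shell
#!/bin/bash
# inactive_lvs
# Hypothetical helper: reads lvscan output on stdin and prints the device
# path of every logical volume whose status is not ACTIVE.
inactive_lvs() {
    # Split on single quotes: field 1 is the status, field 2 is the path
    awk -F"'" '$1 !~ /ACTIVE/ && NF >= 3 { print $2 }'
}
```

You'd run it as "lvscan | inactive_lvs" and feed whatever comes out to lvchange, one path at a time.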

If you manage to make it through the logical volume scan, and everything seems okay, you just need to remount the filesystem as you normally would. Of course, that could also fail... (Does the misery never end? ;)

At that point, give fsck (or reiserfsck) another shot and, if it doesn't do any good, you'll have to dig deeper and look at possible filesystem corruption so awful you may as well restore the server from a backup or server image (ghost).

And, that's that! Hopefully it wasn't too much or too little, and helps you out in one way or another :)

Cheers,

, Mike





Thursday, November 13, 2008

Getting CPU Information From Various Flavours Of Linux And Unix

Hey there,

Today, in keeping with yesterday's theme of covering a fairly specific topic and trolling it around the seamy underbelly of the world of Unix and Linux, we're going to take a look at how to grab CPU information from your Unix or Linux box. I try to cover the major distros here, but (of course) my resources are limited and (except for the HP-UX example below) I've personally run and verified all of the results of the commands put out in this post today. If you crave additional information about your particular Operating System and the CPU(s) on it, check out some of our earlier posts on CPU usage, monitoring, etc, for all sorts of Unix/Linux varieties. Hopefully, you'll find some useful information in the growing archives of this sprawling morass of letters and numbers (by which I, of course, mean this blog ;)

On to today's business! For each OS, we'll be listing the OS name, followed by a practical line of code that you should, hopefully, be able to cut-and-paste onto your own system to get your specific information. Also, for each, we'll list the actual output that my trials produced, except for the HP-UX exception already noted.

1. Listing out CPU information on Solaris (Note that these "virtual" CPU's are actually real):

host # psrinfo -v| awk 'BEGIN{cpua=0;cpub=0}{if ( $0 ~ /Status.*processor/ ) {print $0;cpua=1} else if ( cpua == 1 ) {printf"%s ", $1;cpua=0;cpub=1} else if ( cpub == 1 ) {penult=NF-1;print $penult,$NF;cpub=0}}'|sed 's/,$//'
Status of virtual processor 0 as of: 11/12/2008 13:53:47
on-line 650 MHz
Status of virtual processor 1 as of: 11/12/2008 13:53:47
on-line 650 MHz


2. Listing out CPU information on SUSE and/or RedHat Linux:

host # cat /proc/cpuinfo|egrep 'processor|model name'|awk -F":" 'BEGIN{i=0}{if ( i%2 == 0 ) {printf"%s %d: ", $1,$2;i++} else {print $2;i=0}}'

processor 0: Intel(R) Xeon(TM) CPU 3.40GHz
processor 1: Intel(R) Xeon(TM) CPU 3.40GHz
processor 2: Intel(R) Xeon(TM) CPU 3.40GHz
processor 3: Intel(R) Xeon(TM) CPU 3.40GHz


3. Listing out CPU information on AIX 5.x:

host # pmcycles -m
CPU 0 runs at 1200 MHz
CPU 1 runs at 1200 MHz
CPU 2 runs at 1200 MHz
CPU 3 runs at 1200 MHz
CPU 4 runs at 1200 MHz
CPU 5 runs at 1200 MHz
CPU 6 runs at 1200 MHz
CPU 7 runs at 1200 MHz


4. Listing out CPU information on HP-UX (as generally as possible). This bit of code is sewn together from exemplary commands on SysDigg's HP-UX CPU Info Page, since I can't get my hands on an HP-UX box right now, and the information gathering process is more convoluted than any of the preceding distro's (We're not even going to begin to talk about the new Itanium processors and/or differentiate between virtual and physical CPU's/Cores, etc. This is for a HP9000/800 model box):

host # pc=`ioscan -k |grep processor |wc -l`
host # pt=$(grep -i $(model |tr "/" " " |awk '{print $NF}') /usr/sam/lib/mo/sched.models |awk '{print $NF}')
host # ps=`echo "itick_per_usec/D" | adb /stand/vmunix /dev/mem | tail -1`
host # echo "Processor count: $pc - CPU Type: $pt - Speed: ${ps}Mhz"
Processor count: 2 - CPU Type: PA8700 - Speed: 750Mhz


Phew... Hopefully, the HP-UX information is correct (if not, please write in as I'd be glad to credit anyone with the correct answer if they can get to an HP-UX machine before I can :)

And, that's that. Quick and painless today ;) I promise to chew your virtual ear off tomorrow as I rant and rage for no apparent reason other than to possibly foment passive-aggressive revolution amongst the working class. Actually, whatever I have to write about tomorrow probably won't be "that" moving ;)

Cheers,

, Mike





Thursday, November 6, 2008

More Quick Ways To Find CPU Bottlenecks On Linux

Hey there,

Yesterday, we took a look at some useful commands to help identify memory bottlenecks in Linux. More specifically, we were looking at SUSE 9.x. We're going to use the same Linux version today (for our examples), although - again - much of this stuff translates fairly simply to other distro's. This post will be different from yesterday's in that we'll be focusing on specific CPU-related commands you can use, in a time of crisis (or, perhaps, just a long, drawn-out eternity of soul-crushing boredom ;), to determine if the CPU(s) on your machine are at the fore of whatever problems your system is having.

On a completely unrelated note, last night's election was, indeed, historic and satisfying. Of course, I didn't get to sleep until 3am because my wife was waiting for Indiana to finish counting its votes and hoping for an Obama landslide. When I snapped out of the temporary stupor that substituted for sleep this morning, I noticed that Indiana was still a "partially yellow" state. Hopefully, they'll get the votes counted before I publish this post. If not... I'll just thank God that my wife doesn't read this blog. She's a wonderful woman, but (like most people who've known me for a long time) probably more than willing to snatch the life right out from under me ;)

And here we go. Today's hit list for CPU testing on SUSE:

1. top. This command comes in first again. Explaining it again isn't necessary. When you looked it over to find your memory bottleneck you, no doubt, noticed the %CPU column and summary at the top. About the only thing special, with regards to CPU reporting on top, is how it deals with a multiple-CPU system. On the procps version of top, pressing "1" generally flips the summary between the regular output (all CPUs' statistics combined) and per-CPU stats, while the capital "I" toggles "Irix mode," which changes whether a process's %CPU is reported relative to a single CPU or divided across all of them (You may see a message indicating the "Irix mode" is either off or on. On some builds, I've seen this work but give no verbose indication of the change).

2. host # more /proc/cpuinfo

General output will look something like below. Generally, on most newer (and just slightly older) machines, you'll be dealing with CPU's that list out in /proc/cpuinfo as more than they "physically" are. That is to say that hyperthreading/multiple-core CPU's will not appear in this file as the single physical entity that they are. Of course, your situation may vary, but this file should (at the very least) give you a feel for whether you have a bad CPU problem. In a situation where you have 4 physical CPU's (hyperthreading to simulate 8 CPU's) you can get a good indication of whether the problem you're facing (we'll just assume you're facing a problem ;) is of a physical nature. If 2 virtual CPU's are down (in proper sequence), you probably need some new parts :) The "physical id" line value, when compared with the "processor" line value, is usually a good indication of whether or not your system is using hyperthreading or any other virtual enhancements. Odds are, you'll probably know this information before you ever have to look at this file.

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) MP CPU 2.5GHz
stepping : 5
cpu MHz : 2495.259
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4980.73
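
The "physical id" versus "processor" comparison can be sketched as a quick count. The function name cpu_topology is made up for this example, and it assumes your /proc/cpuinfo actually has "physical id" lines (some kernels and architectures leave them out, in which case the physical count will come back as 0).

```shell
#!/bin/bash
# cpu_topology [FILE]
# Hypothetical helper: counts "processor" entries (logical CPUs) and distinct
# "physical id" values (physical packages) in a cpuinfo-style file.
cpu_topology() {
    local file=${1:-/proc/cpuinfo}
    awk -F'[ \t]*:[ \t]*' '
        $1 == "processor"   { logical++ }
        $1 == "physical id" { if (!($2 in seen)) { seen[$2] = 1; physical++ } }
        END { printf "logical=%d physical=%d\n", logical, physical }
    ' "$file"
}
```

If the logical count is higher than the physical one, you're looking at hyperthreading, multiple cores, or both.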


3. vmstat 2 - another return visitor from yesterday's post on memory mangle-ment ;)

This will run vmstat every 2 seconds, ad infinitum. You should check the "id" (idle percentage), "us" (percentage of CPU resources dedicated to user processes) and "sy" (percentage of CPU resources dedicated to kernel processes) columns first. If the idle percentage is low, knowing whether user processes (like a program running on your system) or the kernel (basically, the operating system and all its built-in facilities) are taking up all the CPU resources can get you pointed in the right direction early on.

You'll also want to consider the "wa" (I/O wait - although this does not necessarily mean that you're experiencing CPU-related I/O wait), "in" (kernel interrupts) and "cs" (kernel context switches) columns as well. High activity in any of these columns could indicate overuse of the CPU (Note that vmstat, although it can tell you a lot about what's going on with your system, cannot pinpoint the particular application or system setting that may be causing the events it reports!)
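
Since staring at scrolling vmstat output gets old fast, here's a sketch that only prints the samples where idle CPU drops to a threshold or below. The function name low_idle_samples and the default threshold of 10 are my own inventions, and the filter finds the columns by reading vmstat's header row (the one that starts with "r"), so it assumes your vmstat prints "us," "sy," "id" and "wa" headers.

```shell
#!/bin/bash
# low_idle_samples [THRESHOLD]
# Hypothetical helper: reads vmstat output on stdin and prints the CPU columns
# for every sample whose idle percentage is at or below THRESHOLD.
low_idle_samples() {
    local limit=${1:-10}
    awk -v limit="$limit" '
        # Header row: remember which position each column name sits in
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number (the run queue length)
        ("id" in col) && $1 ~ /^[0-9]+$/ {
            if ($(col["id"]) + 0 <= limit)
                printf "idle=%s us=%s sy=%s wa=%s\n",
                       $(col["id"]), $(col["us"]), $(col["sy"]), $(col["wa"])
        }
    '
}
```

Something like "vmstat 2 30 | low_idle_samples 10" gives you a minute's worth of samples with only the ugly ones shown, and the us/sy split tells you which direction to start digging.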

4. If you do notice a high number of CPU interrupts in your vmstat output, be sure to check out the contents of /proc/interrupts. Check it, for instance, every 10 seconds for a few minutes. Within that amount of time, the contents of the /proc/interrupts file may point you directly to the culprit. This may not be the answer, but should provide you some relief while you find the real problem and need to verify it doubly :)

Note that, as a rule, lots of kernel interrupts and CPU context switches (especially in the thousands) are a fairly good indicator of CPU load reaching maximum capacity.
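
Checking /proc/interrupts every 10 seconds is easier if something does the subtraction for you. The sketch below compares two saved snapshots and lists the interrupt sources that grew, biggest first. The function name interrupt_deltas is hypothetical, and it assumes the usual layout: an IRQ label ending in a colon, one counter per CPU, then a description whose last word names the device.

```shell
#!/bin/bash
# interrupt_deltas BEFORE AFTER
# Hypothetical helper: given two snapshots of /proc/interrupts, prints
# "increase irq-label device" for each source that grew, largest first.
interrupt_deltas() {
    awk '
        # Sum the per-CPU counters that follow the IRQ label
        function total(   i, sum) {
            sum = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
            return sum
        }
        $1 !~ /:$/ { next }                              # skip the CPU header line
        FILENAME == ARGV[1] { before[$1] = total(); next }
        {
            d = total() - before[$1]
            if (d > 0) printf "%d %s %s\n", d, $1, $NF
        }
    ' "$1" "$2" | sort -rn
}
```

A typical run would be "cat /proc/interrupts > /tmp/irq.1; sleep 10; cat /proc/interrupts > /tmp/irq.2; interrupt_deltas /tmp/irq.1 /tmp/irq.2 | head" (the /tmp file names are just placeholders).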

5. Check your standard log files in /var/log. If you find a ton of messages there (or even just a few), they can provide invaluable clues. Combining this additional information with the output of vmstat, top and (possibly) the contents of /proc/cpuinfo and /proc/interrupts, should paint a fairly vivid picture and allow you to assess, quickly, whether or not you need to focus more effort on reducing CPU load or, possibly, replacing a bad CPU or two.

Once again, I wish you a good night and hope this little introduction to CPU bottleneck troubleshooting has been "accessible" or, at least, somewhat helpful to you :)

Cheers,

, Mike





Wednesday, November 5, 2008

A Few Ways To Gauge Possible Memory Bottlenecks In SUSE Linux

Hey there,

Today we’re going to take a look at a few (well, maybe more than a few) ways to check your Linux box to find out if the reason it’s beginning to perform poorly is related to memory or memory-management issues. For today’s examples, in particular, we’ll be using SUSE Linux 9.x, but most of these examples translate just as easily to RedHat, Ubuntu and other widely-used distro’s, with some minor modifications.

First things second, I’d like to apologize for the lack of animation in my “voice” in this post. As you probably well know (Even if you don’t live in the USA), we’re (hopefully) going to find out who our next President is by this evening (I write these a day in advance) and I’m somewhat distracted, even though I know I shouldn't start caring until it's late enough in the game for any of the numbers to make a legitimate difference. My wife is hanging on the 1% from here and 3% from there ;) I won’t reveal whom I voted for, since this blog doesn’t take any particular political stance. I’ll just let it be said that I’m really hoping my candidate comes in. It’s going to be a farce if the other contender somehow manages to steal this thing….

And, secondly, we’ll walk through the many different (and easily accessible) ways you can check up on your Linux system’s memory usage to make sure that all is well. Note that for some commands, you need root privilege to get information of any value. Most of these commands don’t require that level of access, but (as a rule) if you need to interface with the kernel (or access its symbol/memory tables, etc) you’ll probably just end up with a big old raspberry when you try to run a few of these commands. Not to worry, though: You’ll be able to get more than enough information from this mass of commands to make a reasonably accurate estimation, no matter what level of access you have.

And here’s a quick look at assessing your system’s (possible) memory problem, quickly:

1. top.

This command should be in everyone’s arsenal. If your Linux distro doesn’t include it (as part of a standard pkg) you can always build it from source. It’s been my experience that this tool comes standard with everything nowadays. Pay special attention to the physical and virtual memory sections. top should give you a fairly accurate count of free vs. used memory for both types.

2. ps aux |grep AprocessYouSuspect

This will provide you with similar information. Pay special attention to the %MEM, VSZ and RSS columns.

3. vmstat 2

This will provide you with memory statistics at 2 second intervals (close to the default refresh rate for top, which is typically 3 seconds).

Watch the following columns (note that the “inact” and “active” columns only show up when you invoke vmstat with the “-a” flag, where they take the place of “buff” and “cache”):

swpd: To check on the amount of virtual memory in use
free: To check on the amount of free memory
buff: To check on the amount of memory being set aside for buffering
cache: To check on the amount of memory set aside for caching.
inact: To check on the amount of inactive memory
active: To check on the amount of active memory
si: To check on the amount of memory swapped in from disk
so: To check on the amount of memory swapped out to disk. <-- For more on the difference between swapping, paging, etc, check out our older posts on swapping and Paging on Linux and Unix and its inevitable follow-up

4. ps -o vsz,rss,tsiz,dsiz,majflt,minflt,pmem,cmd 9999

This will format your ps output to spit out virtually all the memory information ps can get you for a specified process ID (In this case: 9999)

5. cat /proc/9999/status

This will provide you with a lot more detailed information on the 9999 process.

6. swapon -s

This command will list out all the system swap partitions

7. free

This will show you the amount of used and free memory in terms of straight-up memory, buffers and cache

8. cat /proc/meminfo

This will show you more detailed information on what the system thinks its memory is doing and/or how it's being used.

9. sar -r

This will show you, from another perspective, memory usage defined in terms of memory, buffers and cache

10. ipcs -u

This command will list out a status summary for shared memory segments, semaphores and message queues.

11. ipcs -p

This will list out the creator and last-operator process IDs, along with the owner, for each shared memory segment and message queue.

12. Finally, check the /etc/sysctl.conf (for shared memory values, etc) and /etc/sysconfig/kernel (for tmpfs/shmfs filesystem size figures).
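
To boil the free and /proc/meminfo numbers down to the two figures that usually matter, here's a minimal sketch. The function name mem_summary is made up, and the arithmetic (total minus free minus buffers minus cache) is the common rule of thumb for "memory that applications are really using," not anything the kernel reports directly.

```shell
#!/bin/bash
# mem_summary [FILE]
# Hypothetical helper: reads a meminfo-style file and prints memory in use
# once buffers and cache are excluded, plus swap in use (both in kB).
mem_summary() {
    local file=${1:-/proc/meminfo}
    awk '
        $1 == "MemTotal:"  { total = $2 }
        $1 == "MemFree:"   { free = $2 }
        $1 == "Buffers:"   { buffers = $2 }
        $1 == "Cached:"    { cached = $2 }
        $1 == "SwapTotal:" { swap_total = $2 }
        $1 == "SwapFree:"  { swap_free = $2 }
        END {
            printf "used_excluding_cache_kB=%d swap_used_kB=%d\n",
                   total - free - buffers - cached, swap_total - swap_free
        }
    ' "$file"
}
```

If the first number is close to your total RAM and the second keeps climbing between runs, memory is a reasonable suspect.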

And, hopefully, that should be enough to get you started and/or, at least, give you some idea if your bottleneck has anything to do with a shortage (or overuse) of memory on your system.

Here’s hoping your vote counts ;)

, Mike




Olivier Berger had this excellent suggestion for a GUI tool to add!


Hi.

I'd like to suggest that you mention gmemusage, which I find a really
useful tool to visualize what's eating memory if a X display is
available.

Hope this helps.




Friday, May 23, 2008

Using Who To Find What And When On Linux and Unix

Hello again,

Today's post is yet another in a somewhat disjointed series of posts on "stuff you might not know and you might find interesting" regarding very common commands. And they don't get much more common than the "who" command.

Generally, "who" is used like the last command that we looked at in our previous post. It's generally issued at the command line to determine who (yes, it's not just a clever name ;) is logged on "right now," if anyone is at all.

Unlike "last," however, the "who" command has quite a number of options that make it a great troubleshooting, and statistics gathering, command. And, as luck would have it, the four options that we're going to look at today are exactly the same on SUSE Linux 9.x, Solaris 9 Unix and even Solaris 10 :) We'll go through the options from most to least used (in my experience). Not that it matters. We're only looking at four options, so it's going to be hard to get lost ;) All example output will be from SUSE Linux 9.x

1. "who -r" - Prints the current runlevel. This is somewhat similar to the functionality of the last command that we posted about before, but it gives more limited information. This command is excellent for a quick overview of the system's current runlevel, previous state and last state-transition time. For instance, take the following example:

host # who -r
run-level 3 Feb 27 16:06 last=S


This shows us that our system is currently at "run level 3," was in "Single User" mode (S) previous to that, and that the transition from "Single User" to "run level 3" occurred approximately February 27th at 16:06. I say approximately because, if we look at last's output (as we did in our previous post on using last to its full potential), we can see that this was actually a reboot.

The last state will usually appear as "S" on a reboot, since it's the last recorded state the system is at before it switches to "run level 3" (Of course run level 2 is executed on a normal boot to run level 3). All the information about switching from "run level 3" to "run level 6," and from "run level 6" to "run level S", and all the reboot and shutdown commands are not reported. Again, we don't know the year, but, since this command reads from wtmpx, you can check out a few older posts on user deletion with Perl and the relevant mods for Linux if you want to use Perl to grab that information, as well.

2. "who -b" - Prints the system boot time. Didn't I just get through a really long-winded explanation of all the information missing from "who -r"? ;) Well, here's some of that. This invocation of "who" prints out the last time the system was booted. Note that this doesn't differentiate between a reboot and a power-cycle:

host # who -b
system boot Feb 27 16:06


3. "who -d" - Prints out a list of all the dead processes on your system. This invocation of the who command is really only useful if you're looking for a problem process and can't seem to find it. Generally, you'd use either lsof or ptree/pfiles to find the rogue process, but, if you don't have those (or find them too messy), this command can sometimes help. Mostly though, it's just a listing of processes which are no longer running and still in memory. Note that, for our example below, all of these processes aren't even in the process table anymore!

host # who -d
Feb 27 16:06 2134 id=si term=0 exit=0
Feb 27 16:07 4410 id=l3 term=0 exit=0
pts/2 Apr 14 10:40 24532 id=ts/2 term=0 exit=0
pts/1 May 2 20:29 20407 id=ts/1 term=0 exit=0


4. "who -t" - Prints out the last time the System Clock was changed. Like I mentioned, I saved the least used, and/or obvious, invocation of who for last. You may never have to run the who command with this argument. Still, it's nice to know it's there. As far as I can tell, this setting is not affected by NTP or any similar software you might have running on your machine (xntpd, etc) to keep the OS clock set correctly. If someone with root (or equivalent) privilege decides to run the "date" command on the server to set an incorrect (or correct) time, this command's output will note it. Unfortunately, it's been a while on the machine I'm using as a test case, and the default output (assuming no change) is nothing. On the bright side, we can be reasonably certain that no one's been goofing with the system clock :)

host # who -t
host #


Enjoy the rest of your day, and have a great Memorial Day weekend ;)

Cheers,

, Mike

Tuesday, May 13, 2008

Killing Zombie Processes In Linux And Unix

Greetings,

Today's post is going to deal with "zombie" processes. These are processes to which the definition of a process only loosely applies.

A zombie process is most often generated when a child process exits but its parent never collects the exit status. The parent process, which would generally run some sort of "wait()" call to receive notification that the child process has exited, loses track of the child process and never receives that information. The child process exits normally, but its entry lingers in the process table because the parent thinks it's still running, and thus is a zombie process born :)

There are a number of steps to take, from simplest to most obscure, to get rid of zombie processes. And then there's what to do if none of that seems to work. Here we go :)

1. First, identify the fact that you have zombie processes running on your system (you may not notice, and there's a good reason why, which we'll address near the end of this post). You can do this on most major brands of Unix and Linux by running:

host # ps -el|grep Z <--- The -l flag to ps will include the "state" column. The zombie state is represented by a capital Z.

On Solaris 9:

host # ps -el|grep Z

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 Z 0 3038 1 0 0 - - 0 - ? 0:00
0 Z 0 19769 2966 0 0 - - 0 - ? 0:00


On SUSE Linux 9:

/home/ymdg001# ps -el|grep Z
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 Z 0 476 9874 0 0 - - 0 - ? 0:00
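
One thing worth noticing in the output above: "grep Z" matches any line with a capital Z in it, which is why the header (with its "SZ" column) comes along for the ride. If you'd rather match only on the state column and get the PID and parent PID back, something like the following sketch should do it. The function name list_zombies is made up, and it assumes the "ps -el" column order shown above (state in column 2, PID in column 4, PPID in column 5):

```shell
# list_zombies (hypothetical name): read "ps -el" output on stdin and print
# "PID PPID" for every process whose state column (field 2) is exactly Z.
list_zombies() {
    awk '$2 == "Z" { print $4, $5 }'
}

# Typical use on a live system (left commented out here):
#   ps -el | list_zombies
```

Having the PPID right next to the PID is handy, since the parent is usually the process you end up having to deal with.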


2. Now that you know you have zombie processes running, the first, and easiest, thing you can do to kill them is to try and "assassinate" the actual defunct (zombie) process. For the process in the SUSE example above, you can always try:

host # kill -9 476

It probably won't work, but it's worth a shot. Sometimes it does and your troubles are over just like that :)

3. Next, you should try to kill the parent process. This may or may not be possible for you. For instance, the parent process may be a process that you "need" to have running. It may also be a process that the Operating System "needs" to have running (like "init" - process 1, shown in the Solaris example above, under the PPID column).

Killing the parent process (if possible) will almost always work to get rid of a zombie process.

Please "never" try to kill init (process 1). If you're successful, your machine will go down hard and fast!

4. Assuming none of the above worked, some common wisdom says you should just give up (and for good reason, which we'll get to very soon ;). However, you can try killing the zombie process (and/or the parent process) using signals other than SIGKILL (or -9). I've seen it work more than a few times. Different programs trap, and/or handle, different signals in different ways. If your zombie doesn't go away when you execute a "kill -9" against it, try a simple "kill" (which is, technically, "kill -15" or SIGTERM). You can try to kill the process with any signal you want. I generally try signals 1 - 15 and then SIGUSR1 and SIGUSR2, just in case they're handled differently by that particular program on that particular system. You'd be surprised how many zombies you can whack with a SIGHUP or SIGINT. Sending SIGCHLD (or SIGCLD, which is the same as SIGCHLD on System V) is a good one to try, as well. Sometimes your chosen method won't make "textbook sense" but it will work from time to time :)

You can find a handy list of signals to try in our old post on translating signal names to numbers and vice versa.
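
If you find yourself walking through signals by hand a lot, the routine can be wrapped up in a small loop. This is only a sketch: the function name try_signals is made up, the signal list is whatever you pass in, and "kill -0" (which sends nothing and only checks whether the PID can still be signalled) is used as the test for whether the target is still around. A zombie that hasn't been reaped still answers to "kill -0", so the function keeps reporting it as present until it's really gone:

```shell
# try_signals (hypothetical name): send each listed signal to a PID in turn,
# stopping as soon as the PID can no longer be signalled.
# Returns 0 once the process is gone, 1 if it survived every signal.
# Note: "kill -0" also fails for processes you aren't allowed to signal,
# so run this as the process owner or as root.
try_signals() {
    pid=$1
    shift
    for sig in "$@"; do
        # kill -0 sends nothing; it only tests whether the PID can be signalled
        kill -0 "$pid" 2>/dev/null || return 0
        echo "Trying SIG$sig on $pid"
        kill -s "$sig" "$pid" 2>/dev/null || true
        # Give the target a moment to react before checking again
        sleep 1
    done
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Typical use (left commented out here), against the zombie or its parent:
#   try_signals 476 HUP INT TERM USR1 USR2 CHLD
```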

5. And the point I've been alluding to throughout this entire post.

What to do if your zombie process just won't die, you can't kill the parent and/or you're otherwise stuck?

The answer is: nothing.

Here's a brief explanation why: Even though zombie processes alarm most casual users of Unix and/or Linux, and they can make the process table look ugly with all those "defunct" messages scattered in between everything else, a zombie process lives up to its name in more ways than the sense defined above. It literally is like the somewhat-living dead. Although the proc table (and filesystem) have space reserved to record it, the process has already exited and is not consuming any of your system resources. Apart from that one process table slot, it takes up none of your kernel or system space and is only a minor nuisance since "times" keeps track of its time (If you're a fly, you'll notice the 0:00 slow-down ;)

6. But WAIT!

There's more... (I'm starting to sound like a pitch man ;). Here's one last thing you can do if that ps entry for your zombie process is really bugging you: Once the zombie has totally disconnected from its parent process, you can just use the "wait" command to make it go away. For example:

host # ps -el|grep Z
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 Z 0 3038 1 0 0 - - 0 - ? 0:00
host # id
uid=0(root) gid=0(root)
host # wait 3038


...and when that returns (I'd recommend that you run this with "&" to background it - e.g. "wait 3038 &")

host # ps -el|grep Z
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD


It's gone :)

In any event, hopefully, after reading this, you'll no longer worry about zombies :)

Cheers,

, Mike

Saturday, March 15, 2008

Converting Linux RPM's Into Solaris Pkg Files

Hey there,

As this week wraps up, I thought I'd put out the last few things I've been toying around with. Today, we've got a Perl script for you that will take an RPM (from Linux) and convert it to a Solaris datastream pkg file. Of course, we've got the opposite, which we'll post in the next few days (just like we have the opposite of our post on creating Solaris pkg files from already installed content). That might be a more useful script than this one (you might actually "need" to recreate a Linux RPM from what's on your box more often), but I thought this script was kind of cool :)

Basically, you can take any Linux RPM (I tested against RedHat AS and SUSE 9), feed it to this script, like so:

host # ./rpm2pkg PROGRAM-3.2-1.rpm

and end up with your own valid Solaris pkg file named:

PROGRAM-3.2-1.pkg

My thought was that this might be useful, since one of the commands used inside it (rpm2cpio) is already included in Solaris 9 and 10. For architecture-independent, and binary compatible, programs, there obviously exists a need to convert Linux RPM's to a cpio archive that can then be extracted to the local filesystem on a Solaris machine. In my mind, the logical next step would be to skip ahead and create a valid Solaris datastream pkg file from that cpio output. This way, the process could be completed once and then distributed easily as a single pkg installation file to all of your Solaris servers :)

The script works, as mentioned above, by using rpm2cpio to extract the Linux RPM's contents and then using the "strings" command on the actual RPM to extract all the header information that we require to seed the "pkginfo" file. The prototype file is the easiest necessary pkgmk file to create since you just have to use find and pkgproto on the extracted rpm2cpio output.
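
To give a feel for how little a pkginfo file actually needs, here's the same idea boiled down to a shell function. This is a sketch, not a replacement for the Perl script: the function name make_pkginfo is made up, the name, version and architecture are passed in by hand instead of being dug out of the RPM with "strings", and only a handful of the fields are written:

```shell
# make_pkginfo (hypothetical name): print a minimal pkginfo to stdout.
# Arguments: RPM file name, package description, version, architecture.
make_pkginfo() {
    rpm_file=$1
    name=$2
    version=$3
    arch=$4
    # The package name is the RPM file name without directory or ".rpm"
    pkg=$(basename "$rpm_file" .rpm)
    cat <<EOF
PKG="$pkg"
NAME="$name"
VERSION="$version"
ARCH="$arch"
CATEGORY="application"
BASEDIR=/
EOF
}

# Typical use (left commented out here):
#   make_pkginfo PROGRAM-3.2-1.rpm "Example program" 3.2 "$(uname -p)" > pkginfo
```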

If you want to be able to tweak this more to your liking, there's a lot more information about the Solaris pkg making process in our previous posts on building Solaris pkg files quickly and, more theoretically, what you need to know to create your own Solaris pkg files.

Enjoy the script and have fun trying to get Linux binaries to run on your Solaris box (Hint: Your success rate will be much greater on Solaris 10, since they're finally buying into "open source" :)

Best wishes,


Creative Commons License


This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/usr/bin/perl

#
# rpm2pkg - creating Solaris pkg files from Linux rpm's
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

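# The RPM must sit in the current directory, since it's read back as ../$rpm
$rpm=$ARGV[0];
die "Usage: $0 PROGRAM.rpm\n" unless defined $rpm;
$tmp_dir="dir.$$";

# Work in a scratch directory so the extracted files stay separate
mkdir("$tmp_dir") or die "Cannot create $tmp_dir: $!\n";
chdir("$tmp_dir") or die "Cannot chdir to $tmp_dir: $!\n";
# Extract the RPM payload into the scratch directory
system("rpm2cpio ../$rpm|cpio -dim");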
$proto_list=`find . -print|pkgproto|grep -v prototype`;
open(TMP_FILE, ">>prototype");
print TMP_FILE "i pkginfo\n";
print TMP_FILE $proto_list;
close(TMP_FILE);
@pkg_info=`strings ../$rpm`;
$count = 0;
foreach (@pkg_info) {
	if ( $_ =~ /\# \@\(\#\)BegWS/ ) {
		push(@rpm_info, $_);
		$count++;
	} elsif ( $_ =~ /\# \@\(\#\)EndWS/ ) {
		last;
	} elsif ( $count > 0 ) {
		push(@rpm_info, $_);
	}
}
$uname_s = `uname -s`;
$uname_r = `uname -r`;
$uname_p = `uname -p`;
$pkg_name = $rpm;
# Strip only a trailing ".rpm", not one in the middle of the name
$pkg_name =~ s/\.rpm$//;
chomp $uname_s;
chomp $uname_r;
chomp $uname_p;
chomp $rpm_info[1];
chomp $rpm_info[2];
chomp $rpm_info[5];
chomp $rpm_info[12];
open(PKGINFO, ">>pkginfo");
print PKGINFO "SUNW_PRODNAME=\"$uname_s\"\n";
print PKGINFO "SUNW_PRODVERS=\"$uname_r\"\n";
print PKGINFO "SUNW_PKGTYPE=\"usr\"\n";
print PKGINFO "PKG=\"$pkg_name\"\n";
print PKGINFO "NAME=\"$rpm_info[2]\"\n";
print PKGINFO "VERSION=\"$rpm_info[5]\"\n";
print PKGINFO "VENDOR=\"$rpm_info[1]\"\n";
print PKGINFO "ARCH=\"$uname_p\"\n";
print PKGINFO "EMAIL=\"me@xyz.com\"\n";
print PKGINFO "CATEGORY=\"application\"\n";
print PKGINFO "BASEDIR=/\n";
print PKGINFO "DESC=\"$rpm_info[12]\"\n";
print PKGINFO "PSTAMP=\"Your Name Here\"\n";
print PKGINFO "CLASSES=\"none\"\n";
close(PKGINFO);
system("pkgmk -o -b `pwd` -d /tmp");
system("pkgtrans -o -s /tmp `pwd`/${pkg_name}.pkg $pkg_name");
system("mv ${pkg_name}.pkg ../");
system("cd ../;pwd;rm -r $tmp_dir");


, Mike




Monday, March 10, 2008

Shell Script To Report Linux Server Hardware Information

Hey There,

Well, I guess it's about time we started putting some more shell scripts out there. The last 3 or 4 posts have all been how-to's (except the last one, which I suppose you could trim all the surrounding text from and make a script out of ;) and it's high time to start hitting the shell again.

Today's offering is something we cooked up to walk the fine line between producing what a manager wants to see and what an administrator wants to see in a quick system profile. This has been tested on RedHat Linux and SUSE (only up to release 9.x). The only major difference is some extra output in the "SERVER - MEMORY" section (mostly when run on x86_64 architecture machines) that some of you may find useful.

If you're interested in something more basic, or generic, check out our previous posts on gathering system information on Solaris and gathering system information on RedHat Linux.

This is a pretty straightforward shell script offering that basically parses the output of the hwinfo command. We run it in "--short" mode for most options, but leave it long for parts where the shortening process removed vital information (Like the brand name of the server). It's formatted loosely, but is fairly easy to read. One of the things I like most about it (and the main reason I started writing it in the first place) is that it highlights the Manufacturer, Model and Serial number of the machine your Linux OS is running on. This generally isn't an issue when you're, say, running Solaris on your Sun box ;) Then, of course, I couldn't get away from putting in all the basic information about CPU, Memory, Disks, etc.

If you want to know more about your system than this little shell script will show you, the hwinfo command has a variety of options I chose not to include (Neither my manager nor I want to know about every little "debug" detail of the PCI controller unless we have to ;), but you can access just about any hardware related information using that command. Just run it as:

host # hwinfo --help

Assuming, of course, that you've run this script already as:

host # ./server_info.sh

and found it lacking.

If hwinfo isn't available on your machine, there are a number of other options available to you on SUSE, RedHat and other flavors of Linux. (Oh, yes: be sure you're "root" when you run this script, or you might not have the access required to pull some of the information hwinfo tries to get for you!) Off the top of my head, you can always give these commands a shot (assuming they exist ;) --> kudzu, lspci, lsusb, dmidecode and a great project (which even has a GUI now) called lshw. You should check that out if you or your manager dig this little shell script :)
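
If you're not sure which of those tools a given box has, a quick check saves some guessing. The function name first_available below is made up; it simply walks a list of command names and prints the first one found on your PATH (remember that root's PATH usually includes the sbin directories where most of these live):

```shell
# first_available (hypothetical name): print the first command from the
# argument list that exists on this machine; return 1 if none do.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

# Typical use (left commented out here):
#   tool=$(first_available hwinfo lshw dmidecode lspci) && echo "Using $tool"
```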

Cheers,


Creative Commons License


This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/bin/bash

#
# server_info.sh - display server hardware info
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

hwinfo="/usr/sbin/hwinfo --short"
hostname=`hostname`
separator="----------------------------------------"
echo $separator
echo "System Information For $hostname"
echo $separator
echo $separator
echo SERVER - MEMORY
echo $separator
/usr/sbin/hwinfo --bios|egrep 'OEM id:|Product id:|CPUs|Product:|Serial:|Physical Memory Array:|Max. Size:|Memory Device:|Location:|Size:|Speed:|Location:'|sed -e 's/"//g' -e '/^ *Speed: */s/Memory Device:/\n Memory Device:/' -e 's/\(Max. Speed:\)/CPU \1 MHz/' -e 's/\(Current Speed\)/CPU \1 MHz/'
echo $separator
echo SMP
echo $separator
$hwinfo --smp
echo $separator
echo CPU
echo $separator
$hwinfo --cpu
echo $separator
echo CD_ROM
echo $separator
/usr/sbin/hwinfo --cdrom|egrep '24:|Device File:|Driver:'|awk -F":" '{ if ( $1 ~ /[0-9][0-9]*/ ) print $0; else print " " $2}'|sed -e 's/^.*[0-9] //' -e 's/ //' -e 's/"//g'
echo $separator
echo DISK
echo $separator
$hwinfo --disk
echo $separator
echo PARTITION
echo $separator
$hwinfo --partition
echo $separator
echo NETWORK
echo $separator
$hwinfo --network
echo $separator
echo NETCARD
echo $separator
$hwinfo --netcard
echo $separator
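
If you end up adding more sections to the script above, the repeated separator/title/separator pattern can be folded into one small helper. This is just one way to do it: the function name section is made up, and it runs whatever command you hand it right after printing the banner:

```shell
separator="----------------------------------------"

# section (hypothetical name): print a titled banner, then run the command
# given in the remaining arguments.
section() {
    title=$1
    shift
    echo "$separator"
    echo "$title"
    echo "$separator"
    "$@"
}

# Typical use (left commented out here):
#   section CPU /usr/sbin/hwinfo --short --cpu
#   section DISK /usr/sbin/hwinfo --short --disk
```

The sections that pipe hwinfo through egrep and sed would still need their own lines, since a pipeline can't be passed in as a simple argument list.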


, Mike




Tuesday, March 4, 2008

Creating Your Own Linux RPM's - The Initial Software Build.

Greetings,

A while back, we took a look at creating your own pkg files for Solaris Unix. Today we're going to continue in that tradition, but take a look at the (some would say) simpler process of creating your own RPM's for Linux. These builds have been tested on both RedHat and SUSE, since they seem like polar opposites to me no matter how many similarities they have ;)

The process of building RPM's is much simpler than creating packages for Solaris in that the post-software build portion only consists of creating one specification file and then running one command. Fewer steps, and the ability to add all of your software information into one specification file, makes for a much tighter (and easier to modify or reproduce) software packaging system.

Even though the process is simpler, I've split this post up into a few parts so that each aspect of RPM package creation could be given its fair share of attention.

The first step in creating a Linux package (or RPM, which technically stands for RedHat Package Manager, although the format is used on many flavors of Unix) is to actually compile (or build) the software you're going to be packaging. It's important to log your output (or, at least, the commands you execute) during the build process, as that information is going to be needed by the "rpmbuild" command that we'll ultimately use to create the finished product.

For the purposes of this "how to," we'll assume that you've downloaded the source for PACKAGE-3.2-1.tar.gz already and have "gcc" (or a suitable compiler) and "make" on your system. Also, we'll assume that you have the user privilege required to build and install software on your system.

Now, we'll get going, step by step:

1. First copy off your PACKAGE-3.2-1.tar.gz file into an appropriate location (I usually put them in a place like /usr/src/packages/SOURCES, since that file will be where it needs to be later, but you can copy it off to anywhere you like):

host # pwd
/users/me/softbuilds
host # ls
PACKAGE-3.2-1.tar.gz
host # cp PACKAGE-3.2-1.tar.gz /usr/src/packages/SOURCES/.
<--- Note that /usr/src/packages may be a completely different location depending on the flavor of Linux you're running, but the subfolder SOURCES should always be there. The same note will apply to all other instances where I mention the /usr/src/packages directory.

2. Now use gzip and tar to unpack your gzipped source (or use tar for both operations, if possible). For instance, GNU tar has a -z flag that you can use to avoid calling "gunzip" (or "gzip") altogether:

host # gunzip PACKAGE-3.2-1.tar.gz
host # tar xpf PACKAGE-3.2-1.tar


or

host # gzip -d -c PACKAGE-3.2-1.tar.gz|tar xpf -

or

host # tar xzpf PACKAGE-3.2-1.tar.gz <--- GNU tar required for this (Probably the default for your Linux OS)
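
Before unpacking, it can be worth checking what the archive is going to create, since not every tarball is polite enough to keep everything under a single top-level directory. The sketch below lists the top-level names without extracting anything; the function name archive_top_entries is made up, and it assumes a gzipped tar file like the one in our example:

```shell
# archive_top_entries (hypothetical name): print the unique top-level names
# found in a gzipped tar archive, without extracting anything.
archive_top_entries() {
    gzip -d -c "$1" | tar tf - |
        awk -F/ '{ top = ($1 == "." ? $2 : $1); if (top != "") print top }' |
        sort -u
}

# Typical use (left commented out here):
#   archive_top_entries PACKAGE-3.2-1.tar.gz
```

A single line of output means everything unpacks into one directory; several lines mean you may want to unpack inside a scratch directory instead.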

3. Change directories into the directory created by unpacking your gzipped PACKAGE-3.2-1.tar.gz file and be sure to read the INSTALL and/or README file(s). One (or both) of these will almost invariably include the specific commands you need to run in order to build and install your software.

host # cd PACKAGE-3.2-1

4. Follow the instructions to build your software. Below, I've run down what a typical install of a generic software package would look like (assuming no errors). The one important thing to note below is the use of the "--prefix=" argument to the "configure" command. We want to be sure to build our package into a completely separate directory from the one we actually intend for the RPM to install into later. This may seem counter-intuitive, but it's actually the easiest way to complete some of the upcoming "rpmbuild" steps and avoid utter confusion or complication ;)

host # ./configure --prefix=/usr/local/PACKAGE-3.2-1
...
<--- Probably lots of output. Generally only helpful if you have errors. There may be any number of other options, aside from "--prefix=," that you'll need to pass to configure, but that should be explained in the INSTALL and/or README file(s). Worst case, you can run "./configure --help" to see a list of all available options for configuring the build of the software you're installing.
host # make <--- This is the command that will run through the compile/build of your software package.
host # make check <--- Sometimes "make test," although this option may not even be available in your software's Makefile.
host # make install <--- This will complete your installation.

The specifics of your build may be more or less complicated, but (from the above, assuming all went well) we should have noted the following successfully run commands, in order:

host # ./configure --prefix=/usr/local/PACKAGE-3.2-1
host # make
host # make check/test
host # make install


You actually won't be using the --prefix flag directly in your specification file later, but you'll need to know the prefix, so it's best just to jot it down.
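
Since the exact commands matter later, one low-effort way to "jot them down" is to let the shell do it. The wrapper below is a sketch: the function name run_logged and the BUILD_LOG variable are made up, and all it does is append each command line to a log file before running it:

```shell
# Where the command history goes; BUILD_LOG is a made-up variable name.
BUILD_LOG=${BUILD_LOG:-./build-commands.log}

# run_logged (hypothetical name): record the command line, then run it and
# hand back its exit status.
run_logged() {
    printf '%s\n' "$*" >> "$BUILD_LOG"
    "$@"
}

# Typical use (left commented out here):
#   run_logged ./configure --prefix=/usr/local/PACKAGE-3.2-1
#   run_logged make
#   run_logged make install
```

If you also want the build output itself, piping each command through "tee" works alongside this.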

5. Now that your build is complete, generate a list of all the files that got created when you did your build and keep this for future use (you'll need it for the specification file later). A quick and easy way to do this is:

host # find /usr/local/PACKAGE-3.2-1 >FILELIST <--- Redirect all your output to FILELIST.

or

host # find /usr/local/PACKAGE-3.2-1 >FILELIST 2>&1 <--- Only if, for some bizarre reason, find sends any output to STDERR that's important.
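
The plain find above lists directories along with the files. If you'd rather have a sorted list of just the files and symbolic links, a variation like the one below may be handy. The function name list_installed_files is made up, and whether you also want the directories listed depends on how you end up writing the specification file:

```shell
# list_installed_files (hypothetical name): print every regular file and
# symbolic link under the given directory, sorted, one per line.
list_installed_files() {
    find "$1" \( -type f -o -type l \) -print | sort
}

# Typical use (left commented out here):
#   list_installed_files /usr/local/PACKAGE-3.2-1 >FILELIST
```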

Now you've got your software package built and are ready to move on to the next step in building your RPM package. We'll pick up there tomorrow!

Until then,

, Mike