Monday, January 28, 2008

A Few Linux RAID Disk Monitoring Tips

Hey there,

For today's post, I thought I'd put together a few Linux RAID disk monitoring tips. This list isn't meant to be exhaustive; it's more of a catch-all of stuff you probably have to do over and over again, but sometimes not often enough for it to burn into your brain ;)

At the very basic level, you can manage your disk partitions using fdisk, much as you would in Windows, although the tool itself is completely different (type "m" to get a menu listing of available commands once you bring this little utility up). You can use the "-l" option to just print out the partition table without entering interactive mode:

host # fdisk -l /dev/hda

Disk /dev/hda: 16 heads, 63 sectors, 77520 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1      6095   3071848+  fd  Linux raid autodetect
/dev/hda2          6096     67047  30719808   fd  Linux raid autodetect
/dev/hda3         67048     73142   3071880   fd  Linux raid autodetect
/dev/hda4         75175     77206   1024096+  82  Linux swap
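If you only care about which partitions are flagged for RAID, you can filter that listing. Here's a minimal sketch, assuming the output looks like the example above; the function name raid_partitions is my own invention, and it only filters text, so it never touches a disk:

```shell
#!/bin/sh
# raid_partitions: read `fdisk -l` output on stdin and print the device
# name of every partition whose type is "Linux raid autodetect" (Id fd).
# Hypothetical helper name, not part of fdisk.
raid_partitions() {
    awk '/^\/dev\// && /Linux raid autodetect/ { print $1 }'
}
```

With the function loaded into your shell, you'd pipe fdisk into it, like so:

host # fdisk -l /dev/hda | raid_partitions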


If you want to display the status of your RAID devices, much the way metastat does on Solaris, it's even simpler. One of the great things about Linux is the way they've taken the /proc filesystem and turned it into a really useful tool.

Entering the following will display the information you need. Note that this is a two-way mirror with only one active half: [2/1] means that only one of the two members is active, and the underscore in [U_] marks the missing one (I take the examples I can find on our network ;). Ideally, you'd want to have a [2/2] configuration for better HA.

host # cat /proc/mdstat

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[0]
62912 blocks [2/1] [U_]

md1 : active raid1 hda2[0]
153152 blocks [2/1] [U_]

md0 : active raid1 hda1[0]
307328 blocks [2/1] [U_]
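If you have more than a handful of arrays, eyeballing those counters gets old fast. The following is a rough sketch, assuming the mdstat layout shown above (array name on one line, the [total/active] counter on the next); the function name degraded_arrays is made up for illustration:

```shell
#!/bin/sh
# degraded_arrays: read /proc/mdstat-style text on stdin and print each
# array that has fewer active members than it is configured for.
# Hypothetical helper name; assumes the layout shown in this post.
degraded_arrays() {
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { name = $1 }
        # The [total/active] counter, e.g. [2/1].
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name, n[2] " of " n[1]
            name = ""
        }
    '
}
```

You'd feed it the live file, and each degraded array comes back on a line of its own (for the listing above, md2, md1 and md0 would each report "1 of 2"):

host # degraded_arrays < /proc/mdstat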


If you're syncing up your RAID devices and want to check on the progress, just use the same command, like so:

host # cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hdc3[1]
1020032 blocks [2/1] [_U]

md0 : active raid1 hdc1[1]
3068288 blocks [2/1] [_U]

md1 : active raid1 hda2[2] hdc2[1]
3068288 blocks [2/1] [_U]
[==>..................] recovery = 13.7% (894298/6136576) finish=10.2min speed=7833K/sec
unused devices: <none>
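If all you want is the percentage and the estimated time left, you can pull those two fields out of the recovery line. This is just a sketch under the assumption that the progress line looks like the one above ("recovery = NN%" or "resync = NN%", followed by "finish=..."); sync_progress is a name I picked for the example:

```shell
#!/bin/sh
# sync_progress: read /proc/mdstat-style text on stdin and print
# "<array> <percent> <time left>" for every array that is rebuilding.
# Hypothetical helper name; assumes the progress line format shown above.
sync_progress() {
    awk '
        # Remember which array the progress line belongs to.
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = substr($i, 8)
            }
            print name, pct, eta
        }
    '
}
```

Run it by hand whenever you're curious, or just keep the raw file refreshing on screen with watch:

host # sync_progress < /proc/mdstat
host # watch -n 5 cat /proc/mdstat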


Now, assuming all that is taken care of and you want to check how your arrays are defined, you'll just need to read your RAID configuration table (/etc/raidtab):

host # cat /etc/raidtab

raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda1
    raid-disk               0
    device                  /dev/hdc1
    raid-disk               1

raiddev /dev/md1
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdc2
    raid-disk               1

raiddev /dev/md2
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda3
    raid-disk               0
    device                  /dev/hdc3
    raid-disk               1
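When the table gets long, a quick way to see which partitions are supposed to sit under which array is to boil it down to pairs. Here's a small sketch, assuming a raidtab laid out like the one above (a "raiddev" line followed by its "device" lines); the function name raidtab_members is hypothetical:

```shell
#!/bin/sh
# raidtab_members: read raidtab-style text on stdin and print one
# "<array> <member device>" pair per line.
# Hypothetical helper name; assumes the raiddev/device keywords shown above.
raidtab_members() {
    awk '
        $1 == "raiddev" { array = $2 }
        $1 == "device" && array != "" { print array, $2 }
    '
}
```

For the file above, the first two lines of output would be "/dev/md0 /dev/hda1" and "/dev/md0 /dev/hdc1":

host # raidtab_members < /etc/raidtab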


If you want to get more detailed with your analysis, there are (of course) built-in commands that let you do so. lsraid is one of my favorites :)

To display a short listing of the md0 device, type the following:

# lsraid -A -a /dev/md0
[dev 9, 0] /dev/md0 8F88ACB0.7D06B4C4.FA677344.C3448700 online
[dev 3, 3] /dev/hda1 8F88ACB0.7D06B4C4.FA677344.C3448700 good
[dev 22, 3] /dev/hdc1 8F88ACB0.7D06B4C4.FA677344.C3448700 good


If you'd like to display a short listing of the RAID array that the disk hda1 belongs to (essentially the same output in this case), just type:

# lsraid -A -d /dev/hda1
[dev 9, 0] /dev/md0 8F88ACB0.7D06B4C4.FA677344.C3448700 online
[dev 3, 3] /dev/hda1 8F88ACB0.7D06B4C4.FA677344.C3448700 good
[dev 22, 3] /dev/hdc1 8F88ACB0.7D06B4C4.FA677344.C3448700 good


lsraid also comes in handy if you want to list out any faulty devices. This is something to consider putting in cron, or running regularly, so that you get notified of any failures in a timely fashion. No sense in setting up RAID for HA if you wait for all the components to fail before you replace any of them ;)

For a short listing, go with:

# lsraid -A -f -a /dev/md0
[dev 9, 0] /dev/md0 8F88ACB0.7D06B4C4.FA677344.C3448700 online
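As for the cron idea, here's one way it could look. Note that this sketch checks /proc/mdstat for the underscore marker (as in [U_]) rather than parsing lsraid, since this post doesn't show what lsraid prints for a failed disk. The function name check_raid and the script path in the crontab line are assumptions for the sake of the example:

```shell
#!/bin/sh
# check_raid: return 0 when no array in the given mdstat file shows a
# missing member, 1 when any status field such as [U_] or [_U] contains
# an underscore. Defaults to /proc/mdstat.
# Hypothetical helper name; swaps in mdstat parsing for lsraid.
check_raid() {
    mdstat=${1:-/proc/mdstat}
    if grep -Eq '\[[U_]*_[U_]*\]' "$mdstat"; then
        return 1
    fi
    return 0
}
```

If you save that as a script and end it with a line like "check_raid || cat /proc/mdstat", it stays silent while everything is healthy and prints the full status when something is down. Since cron mails any output to the crontab's owner, an entry along these lines would be enough to get notified:

*/15 * * * * /usr/local/sbin/check_raid.sh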


If you want, or need, all the gruesome details, lsraid will be happy to oblige. Just invoke it like this:

# lsraid -D -l -a /dev/md0
[dev 3, 3] /dev/hda1:
md version = 0.90.0
superblock uuid = 8F88ACB0.7D06B4C4.FA677344.C3448700
md minor number = 0
created = 1169715493 (Fri Jan 25 08:58:13 2007)
last updated = 1169933481 (Sun Jan 27 21:31:21 2007)
raid level = 1
chunk size = 64 KB
apparent disk size = 3068288 KB
disks in array = 2
required disks = 2
active disks = 2
working disks = 2
failed disks = 0
spare disks = 0
position in disk list = 0
position in md device = 0
state = good

[dev 22, 3] /dev/hdc1:
...


In an upcoming post (maybe tomorrow), we'll follow up with some tips on setting up your RAID arrays in Linux and walk through that process together.

Hope this helps you out some in the meantime :)

Cheers,

, Mike