Tuesday, January 29, 2008

Creating RAID disk sets on Linux

Today's post is a somewhat-continuation of yesterday's post on RAID disk monitoring. We looked at monitoring then, and (the reason I call this a somewhat-continuation) now we're going to take a step back and look at setting up those RAID disk sets (a simple mirror set in this example) in the first place ;) Note that all the examples I'm using today are taken from RedHat Linux AS and can be run in bash or any native shell. Scripting some of this out is nearly impossible, but someday soon I may give it a go.

In any event, once we've covered all this ground, we'll definitely look at how a handy shell script or two can save a whole lot of time on these processes :) So, off we go:

The first thing you'll want to do is create your partitions on each physical disk, using the fdisk utility. I personally prefer working from the command line, so if you use Disk Druid or any other GUI interface, it should be easy enough to translate these steps to your tool of choice without my having to over-explain.

The first thing we'll do is create the disk partitions and flag them for use by RAID. The steps should be roughly these (a condensed fdisk session follows the list):

1. Invoke fdisk against your disk at the command line (e.g. fdisk /dev/hda).
2. Select "n" to add a new partition (anything from a slice of disk for mirroring two disks, to an entire disk for higher level RAID and mega-storage).
3. Once added, select "t" to change the partition type and choose "Linux raid autodetect" (hex code "fd" on my console, possibly different on yours).
4. Select "a" to add a bootable flag to the partition, if necessary.
5. Save the new disk layout, and quit, by selecting "w".
6. Repeat as necessary
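
If it helps to see it laid out, here's roughly what that exchange looks like (condensed - fdisk will also ask whether the partition is primary or extended, and for starting and ending cylinders, which will obviously depend on your disk):

host # fdisk /dev/hda
Command (m for help): n     <-- add a new partition
Partition number (1-4): 1
Command (m for help): t     <-- change the partition type
Hex code (type L to list codes): fd     <-- Linux raid autodetect
Command (m for help): a     <-- toggle the bootable flag, if needed
Partition number (1-4): 1
Command (m for help): w     <-- write the layout and quit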

It's important, since we're doing this down-and-dirty from the command line, to make sure that all partitions on both disks are exactly the same size. The exception is if your secondary mirror disk is larger, in which case you just need to make sure that the partitions you mirror to are at least as large as the originals (it only makes sense: you can't copy a larger filesystem to a smaller one without losing data, and the RAID utilities will discourage you from trying, anyway).
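
A quick way to double-check that the sizes line up is to list both disks and compare the block counts for each partition:

host # fdisk -l /dev/hda /dev/hdc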

Now, you'll need to put filesystems on all of those partitions. Generally, you should be able to take care of this using mkfs:

host # mkfs -t ext3 /dev/hda1

In the bizarre event that this fails, or you're using an older or different OS than I am, this should also work (an ext2 filesystem with a journal is ext3 by another name):

host # mkfs -t ext2 -j /dev/hda1
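
And, since we'll eventually be scripting this sort of thing anyway, if you have a pile of partitions to cover, a quick shell loop saves some typing. This is just a sketch using the two mirror halves from our example; substitute your own partition list:

host # for part in /dev/hda1 /dev/hdc1
> do
>     mkfs -t ext3 $part
> done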

Now we'll set up our /etc/raidtab file, with the following (assuming the same for all slices):

raiddev /dev/md0
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
chunk-size 64
persistent-superblock 1
device /dev/hda1
raid-disk 0
device /dev/hdc1
raid-disk 1
...


Then we'll initialize the RAID device, like so:

host # mkraid /dev/md0
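
One note here: since we put filesystems on those partitions already, mkraid may notice them and refuse to run unless you force it. If (and only if) you're sure your /etc/raidtab is correct, you can tell it to go ahead anyway - the flag is deliberately awkward, because this will happily destroy data on the wrong device, so triple-check first:

host # mkraid --really-force /dev/md0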

Once again, per yesterday's post on RAID disk monitoring, we can check on the RAID device's progress by simply doing the following:

host # cat /proc/mdstat

and we'll know instantly if the RAID device is in a good or bad state, and how far along it is if it's still synching. You can use the above command to check status whenever you make a change to your RAID configuration for any valid RAID md devices.
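
If you'd rather not re-run that by hand while a big sync grinds along, the watch utility will refresh it for you every few seconds:

host # watch -n 5 cat /proc/mdstat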

If you ever want, or need, to remove any RAID partitions, you can do that like this (in this example, the hda1 slice is having issues on the md0 device):

host # raidsetfaulty /dev/md0 /dev/hda1
host # raidhotremove /dev/md0 /dev/hda1
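
And, since I keep promising scripts, here's a tiny sketch of a wrapper for that two-step dance (failpart.sh is my own invention, not a standard utility - just an example of how little it takes):

host # cat failpart.sh
#!/bin/sh
# Hypothetical helper: mark a RAID member faulty, then hot-remove it.
# Usage: ./failpart.sh /dev/md0 /dev/hda1
md=$1
part=$2
raidsetfaulty "$md" "$part" && raidhotremove "$md" "$part"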


Sometimes, as a failsafe (this depends on your situation and which device you're removing), you may want to reinstall GRUB on your disk device, like so:

host # grub-install /dev/hda
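
If both halves of the mirror are meant to be bootable (so the box can still come up if the primary disk dies outright), it's worth running the same command against the second disk as well:

host # grub-install /dev/hdc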

And, finally for today, if you do end up having to remove a RAID mirror partition and it can be replaced in-line (a whole disk in itself, for instance, or possibly just a partition on a disk, if bizarre filesystem corruption was the only issue and you've recreated the filesystem), do the following:

1. Add the new RAID devices as before (create the partitions, set the filesystem type, assign the same partition names if possible, etc.) and check the status of the RAID groups. [2/2] means that the RAID group is okay; [2/1] means that the RAID group needs to be resynced. You'll note from yesterday that the RAID group we're working on is in a [2/1] state, with only one active mirror. If you noticed that this could, and should, be fixed - and that this is exactly the condition that would drop that RAID partition - you're sprinting ahead of my tutorial skills ;)

host # cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hda1[0]
30716160 blocks [2/1] [U_]
...
unused devices: <none>


2. Determine your RAID group and mirrors from /etc/raidtab:

host # cat /etc/raidtab
...
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
chunk-size 64
persistent-superblock 1
nr-spare-disks 0
device /dev/hda1
raid-disk 0
device /dev/hdc1
raid-disk 1
...


3. Now, you'll want to resync the RAID mirror sets that got goofed up when we had to replace the RAID partitions:

host # raidhotadd /dev/md0 /dev/hda1

4. Assuming you've lost more than one mirror, just repeat the procedure above as many times as necessary (or wrap it in a quick loop, like the sketch below).
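
The md/partition pairings here are purely illustrative (yours will be whatever your /etc/raidtab says they are), but a loop like this handles the repeats - note that each quoted pair word-splits into the two arguments raidhotadd expects:

host # for pair in "/dev/md0 /dev/hda1" "/dev/md1 /dev/hda2"
> do
>     raidhotadd $pair
> done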

5. Lastly, assuming all went well, kick back and enjoy the view ;) - Note that Linux will normally queue resyncs and do them one at a time, so don't panic if you only see one out of five disk groups being worked on when you check this out!

host # cat /proc/mdstat
...
md3 : active raid1 hda1[2] hdc1[1]
30716160 blocks [2/1] [_U]
[===========>..........] recovery = 45.9% (34790000/61432320) finish=98.7min speed=8452K/sec
...
unused devices: <none>


And that's it for today. In another near future post (probably tomorrow) we'll look at the fun involved with RAID disk failure ;)

Best Wishes,