Monday, February 25, 2008

Booting A Known-Good Solaris Disk On A Different Server Type

Hey There,

If you work on Sun (or any other kind of) servers routinely, then when really bad things happen (system crashes, etc.), part of the recovery process may ultimately include parts replacement. At its worst, the process may include a complete replacement machine (This has actually happened to me many times, after 15 or 16 parts-replacement visits ;)

Generally, this isn't an issue, as Sun will want to replace, for instance, a v490 with a new v490 and you can just pop your old operating system disks in there and boot right up.

Occasionally, however, you'll run into a situation where the server you need replaced is no longer supported. In these instances (and in instances where your company has just decided to upgrade its servers), you find yourself in a little bit of a pickle. Nothing terrible, but you can't just pop the old disks in the new box and boot up without issue. (We'll assume, for this post, that the hardware itself is compatible. In a lot of instances, new machines have different physical connection interfaces, which adds another layer of complication to the issue, but I digress...)

So, now that you've got your new box (which has OS compatible parts, compatible hardware, etc) and your OS disk from your old Sun server, you have to address how to get that new box up and running with your old OS disk. A lot of times, this can be a life saver, especially if you keep peripheral data on your boot disk!

The good news is, you only need to do three things to get yourself back up and running and not have to hassle with re-installing your OS, copying back from the original disk and hoping you get everything you need :)

First, after inserting your hard drive in the new server, you'll need to boot off of either cdrom or the network (Check out this post on extended boot options for tips on doing that). Whatever media you decide to boot off of, local or remote, it's very important that you boot from an image that's as close to the OS on your original disk as possible. In a perfect world, you'll have a CD or netboot image that is the exact release of the OS you have on your original disk. At the very least, it should be the same major version. For instance, trying to complete the following steps using a Solaris 10 image to get your Solaris 8 disk to boot will seem to work, but will ultimately result in disappointment.

At the "ok>" prompt (or PROM) level on your new box, type something similar to the following (Basically, boot into single user mode):

ok> boot cdrom -s

Second, you'll want to mount your old disk on the mini Solaris environment you end up in after booting into single user mode, like so:

host # mkdir /tmp/a <--- You'll probably have to make your mount point in /tmp as all the other filesystems should be read-only in single user mode. /mnt may also be available to mount your physical hard drive on.

host # mount /dev/dsk/c0t0d0s0 /tmp/a <--- Note that the exact disk device might be different. You'll know as soon as you try to mount it and run an "ls" on "/tmp/a" whether or not you've gotten the right one. You may even find out sooner if the system outputs errors like it should.
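If an "ls" alone doesn't convince you that you've mounted the right slice, a quick sanity check along these lines may help. This is only a sketch: the function name looks_like_root is made up for illustration, and the handful of files and directories it checks for is my assumption about what a typical Solaris root filesystem carries, not an exhaustive test. It sticks to plain Bourne shell syntax, since that's what you usually get in single user mode.

```shell
# Hypothetical helper: returns 0 if the directory passed in looks like
# a Solaris root filesystem, 1 otherwise. The list of files and
# directories checked here is an assumption, not a complete test.
looks_like_root() {
    root="$1"
    # Files a root filesystem would normally carry
    for f in etc/vfstab etc/path_to_inst; do
        [ -f "$root/$f" ] || return 1
    done
    # Directories holding the device links and the device tree
    for d in dev devices; do
        [ -d "$root/$d" ] || return 1
    done
    return 0
}
```

After defining it at the prompt, you could run something like: looks_like_root /tmp/a && echo "this looks like the old root disk"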

Third, you'll need to execute two crucial commands. These will make the difference between your system booting to a bizarre error (and a resulting crash) and a successful start up.

From the single user command prompt, type:

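host # devfsadm -C -r /tmp/a -p /tmp/a/etc/path_to_inst <--- This command will rebuild the device tree on your boot disk and create a new /etc/path_to_inst file. The "-C" cleans up dangling /dev links left over from the old hardware, "-r" points devfsadm at your mounted root instead of the running one, and "-p" tells it which path_to_inst file to write.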

host # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0 <--- This will install a new boot block on your old hard drive. This is the one step where it really matters what version of the OS your boot CD is. As referenced above, if you try to install a Solaris 10 boot block on a Solaris 8 OS disk, the command will succeed, but the following boot attempt will fail!
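Since the installboot step is the one that bites you when the versions don't match, it may be worth comparing the boot image against the old disk before you run it. The following is a rough sketch, assuming both the boot image and the old disk carry an /etc/release file whose first line reads something like "Solaris 8 2/04 ..."; the function names solaris_major and same_major are made up for illustration.

```shell
# Hypothetical helper: print the major Solaris version named on the
# first line of a release file (for example, 8 or 10).
# Prints nothing if that line doesn't contain "Solaris <number>".
solaris_major() {
    sed -n '1s/.*Solaris \([0-9][0-9]*\).*/\1/p' "$1"
}

# Hypothetical helper: returns 0 only if both release files name the
# same, non-empty major version.
same_major() {
    a=`solaris_major "$1"`
    b=`solaris_major "$2"`
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

With the old disk mounted on /tmp/a, you could then try: same_major /etc/release /tmp/a/etc/release || echo "boot image and disk don't match"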

Once you've completed these three steps, you should be all set to bring the machine back down to run level 0 and boot up successfully! In the odd case, you may need to make sure STDIN, STDOUT and STDERR are set up correctly. A simple fix for this is to just create the symlinks in your mounted root's dev directory:

host # cd /tmp/a/dev
host # ln -s fd/0 stdin
host # ln -s fd/1 stdout
host # ln -s fd/2 stderr
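If you'd rather not get an error from ln when some of those links are already there, the same fix can be wrapped up so it only creates what's missing. Again, just a sketch in plain Bourne shell: the function name link_std_fds is made up, and it only checks for existing symlinks, nothing fancier.

```shell
# Hypothetical helper: create dev/stdin, dev/stdout and dev/stderr
# under the given root, skipping any that are already symlinks.
link_std_fds() {
    devdir="$1/dev"
    n=0
    for name in stdin stdout stderr; do
        # -h is true when the name already exists as a symlink
        if [ ! -h "$devdir/$name" ]; then
            ln -s "fd/$n" "$devdir/$name" || return 1
        fi
        n=`expr $n + 1`
    done
    return 0
}
```

For the disk mounted above, that would be run as: link_std_fds /tmp/a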

Then boot (most of the time, you can skip the symlink step entirely):

host # init 0
ok> boot

And all should be well again :)

Best wishes,

Mike