Wednesday, July 2, 2008

Dealing With ZFS-Rooted Zones on Solaris 10 Unix

Hey there,

Today, we're going to take a look at a problem that's been haunting Solaris 10 (and, to a degree, OpenSolaris) for almost 3 years now. It ties back pretty closely to earlier posts we put out on migrating zones, patching local and global zones and working with ZFS filesystems, since it concerns ZFS, Solaris 10 zones and one specific way in which zones can be created.

Ironically, the one setup that's causing the most problems is exactly the one that should, in theory, be the most desirable way to configure your system (???)

Here's a little something to think about when considering creating zones on a ZFS filesystem. The quote below is taken out of context, but it comes directly from Sun, where it was put out there as a selling point (in its own context, of course):

zones are integrated into the operating system, providing seamless functionality and a smooth upgrade path.

However, as many of you may be aware by now (it being July 2nd, 2008, with version 5/08 of Solaris 10 officially on the market), although creating bootable ZFS zones and zones with ZFS root filesystems is now finally possible (I believe it was originally introduced, in a small way, back in 6/05, right before the official 6/06 release), it still suffers from some severe issues that may not be evident at first. That is to say, if you didn't do your homework before taking advantage of this seemingly great feature, you've probably gotten burned in one fashion or another when it came time to upgrade. Per Sun, again:

Solaris 10 6/06 supports the use of ZFS file systems. It is possible to install a zone into a ZFS fs, but the installer/upgrader program does not yet understand ZFS well enough to upgrade zones that "live" on a ZFS file system.

Because of this (and, yes, I'm repeating it ;) upgrading a system that has a zone installed on a ZFS file system is not yet supported. To this day (to my knowledge), the problem still hasn't been completely resolved. Again, from Sun's bug list (and I know I sound like I'm Sun-bashing here, but I am coming to a positive point. I swear :):

zoneadm attach Command Might Fail (6550154)
When you attach a zone, if the original host and the new host have packages at the same patch level but at different intermediate patch histories, the zone attach might fail. Various error messages are displayed. The error message depends on the patch histories of the two hosts.
Workaround: Ensure that the original host and the new host machines have had the same sequence of patch versions applied for each patch.


Basically, the way things stand now, if you have a zone built on a ZFS root filesystem (rather than, say, UFS) and you need to upgrade, you officially have 3 options:

1. Be proactive and "Don't do it!"

2. Go ahead and do it, but be sure to uninstall your zones before upgrading to a new release of Solaris 10, and then reinstall them when your upgrade to the new release is completed.

3. Go ahead and do it, but instead of following the more traditional upgrade-path, completely reinstall the system in order to perform the upgrade. This option makes the least sense, since reinstallation and upgrading aren't synonymous.

Now, for the rainbow after the storm. Yes, rainbows are somewhat illusory, and their beauty doesn't necessarily make up for the storm you had to sit through to see one, but it's a lot better than nothing, right ;)

The situation, as you may have guessed, is still pretty much up in the air, but there is hope, and in more than one area. For x86 (and possibly SPARC), this initiative is being fast-tracked by Sun for OpenSolaris/Solaris 10 (note that it's dated June 27th, 2008 :). It basically proposes a -b flag to zoneadm attach, to be used in conjunction with the -u flag, to allow for backing patches out of a zone before an OS update. The full discussion, to date, is located here on opensolaris.org.

Why is this important?

As we noted above, the biggest problem Solaris 10 has with upgrading the OS on machines whose zones have ZFS roots is that every single patch and package must match after the upgrade for it to be considered successful. Solaris 10's update software doesn't understand ZFS well enough to guarantee that a patch installed in one zone will also get installed in another (global vs. local, ZFS vs. UFS/VxFS, even ZFS vs. ZFS). If we were allowed to ignore certain patches and/or packages in our upgrades, the likelihood of failure might drop dramatically!
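Since so much hinges on matching patches, one cheap pre-flight check before moving or reattaching a zone is to compare the patch revisions installed on the two hosts. Below is a minimal sketch, assuming you've saved the output of showrev -p from each host into a file (the file names are hypothetical) and that each patch appears on a line beginning with "Patch: NNNNNN-NN". It only compares the revisions currently installed; it can't see the intermediate patch histories that bug 6550154 is about, so a clean result doesn't guarantee the attach will succeed. It sticks to backticks rather than $(...) so it also runs under Solaris 10's old /bin/sh.

```shell
#!/bin/sh
# Minimal sketch: compare patch revisions between two hosts using saved
# `showrev -p` output. The file names you pass in are up to you.

# Print the sorted, unique patch IDs (e.g. 120011-14) from lines that
# look like "Patch: 120011-14 Obsoletes: ... Packages: ...".
patch_ids() {
    awk '$1 == "Patch:" { print $2 }' "$1" | LC_ALL=C sort -u
}

# Print patches found on only one side. Returns 0 if both files list
# exactly the same patch revisions, 1 if they differ.
compare_patches() {
    tmp_a=${TMPDIR:-/tmp}/patches.$$.a
    tmp_b=${TMPDIR:-/tmp}/patches.$$.b
    patch_ids "$1" > "$tmp_a"
    patch_ids "$2" > "$tmp_b"
    only_a=`LC_ALL=C comm -23 "$tmp_a" "$tmp_b"`
    only_b=`LC_ALL=C comm -13 "$tmp_a" "$tmp_b"`
    rm -f "$tmp_a" "$tmp_b"
    if [ -n "$only_a" ]; then printf 'only in %s:\n%s\n' "$1" "$only_a"; fi
    if [ -n "$only_b" ]; then printf 'only in %s:\n%s\n' "$2" "$only_b"; fi
    [ -z "$only_a" ] && [ -z "$only_b" ]
}
```

For example, run showrev -p > /var/tmp/hostA.patches on one host and the equivalent on the other, copy both files to one place, and then run compare_patches hostA.patches hostB.patches && echo "patch sets match".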

And here's one more ray of hope (which might be even better by the time you need it). Here's how to upgrade your OS, assuming it has zones on ZFS roots, and (possibly) get away without going to the extremes Sun is obligated to recommend. It's actually fairly simple, mostly because it isn't guaranteed to work ;) The one good thing is that, if it doesn't work, you'll still have your data saved off, so you can do it the hard way and lose nothing but the time you spent trying :)

We'll assume you've read our previous posts on migrating and patching both local and global zones and that you understand the basic system-down commands that would necessarily precede these steps. Then do the following:

1. Halt and detach each of your zones that sits on a ZFS root (I'd personally do this for all of my zones), like so:

host # zoneadm -z ZONENAME1 halt
host # zoneadm -z ZONENAME1 detach


2. Export all of your zfs pools:

host # zpool export ZONENAME1 <--Make a note of these pool names for the reverse process, just in case!
host # zpool export ZONENAME2

3. Perform your upgrade however you prefer.

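4. Import all of your ZFS pools:

host # zpool import <--With no arguments, this only lists the pools available for import.
host # zpool import -a <--This imports all of them. If it doesn't work, use the pool names you noted during the export.

or

host # zpool import ZONENAME1
host # zpool import ZONENAME2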


5. Reattach your zones, using the -F flag on attach to force the issue, and then boot them. (Of course, you can leave -F out if you want, just to see what happens. Sometimes forcing makes things work that are flagged as errors but aren't really. You can also use the -n flag to do a dry run.)

host # zoneadm -z ZONENAME1 attach -F
host # zoneadm -z ZONENAME1 boot
<--Note that, for the zoneadm command, if you don't specify a zone name with the -z flag, the subcommand (halt, detach, attach, boot, etc.) would apply to all zones!

And you should either be all set or have a more limited set of issues to deal with (probably mostly patch related).
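The steps above can be sketched as a small shell script. This is a minimal sketch, not a tested tool: the zone and pool names are placeholders (the defaults just reuse the ZONENAME1/ZONENAME2 names from the examples), it uses -F on attach as suggested in step 5, and by default it only prints the commands it would run. The upgrade itself (step 3) is left to you.

```shell
#!/bin/sh
# Minimal sketch of steps 1-2 and 4-5 above; step 3 (the upgrade) is
# not included. ZONES and POOLS are placeholders: set them to your own
# zone and pool names. By default it only PRINTS the commands; set
# DRY_RUN=0 to actually run them.

ZONES=${ZONES:-"ZONENAME1 ZONENAME2"}
POOLS=${POOLS:-"ZONENAME1 ZONENAME2"}
DRY_RUN=${DRY_RUN:-1}

# Echo a command, then run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 0 ]; then
        return 0
    fi
    "$@"
}

# Steps 1 and 2: halt and detach every zone, then export the pools.
# Stops at the first failure so you can sort it out by hand.
pre_upgrade() {
    for z in $ZONES; do
        run zoneadm -z "$z" halt || return 1
        run zoneadm -z "$z" detach || return 1
    done
    for p in $POOLS; do
        run zpool export "$p" || return 1
    done
}

# Steps 4 and 5: import the pools, then force-attach and boot each zone.
# Drop -F from the attach line if you'd rather see what attach complains about.
post_upgrade() {
    for p in $POOLS; do
        run zpool import "$p" || return 1
    done
    for z in $ZONES; do
        run zoneadm -z "$z" attach -F || return 1
        run zoneadm -z "$z" boot || return 1
    done
}
```

To use it, source it from a root shell in the global zone, run pre_upgrade once with the default dry run to check the command list, then set DRY_RUN=0 and run pre_upgrade for real. After the upgrade, source it again, set DRY_RUN=0 and run post_upgrade. Keeping the two halves as separate functions mirrors the fact that the upgrade (and probably a reboot) happens in between.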

Later on in the week (if we don't run out of screen space ;) we'll look at ways to troubleshoot a Solaris 10 zfs-root zone upgrade gone bad.

Until that bright and sunny day :)

Mike