Today we're going to take a look at some Solaris 10-specific stuff (at least, I hope it's only the OS ;) that's been driving me nuts lately. My problem may have to do with the new patch update/management setup, but I've run down enough dead ends on that hunt that I'm fairly sure it comes down to the implementation of the Solaris 10 "boot archive." It may also be confined to release 10/08, although I've read complaints from users running earlier versions (actually, strangely enough *** heavy sarcasm *** most of the complaints seem to come from users of the more recent releases ;)
From a fresh install of 10/08, I thought (for once in my life) I'd do the convenient thing on my SunBlade and set up patch notifications. Usually, for my personal boxes, I leave everything alone and never fix anything until I notice that it's causing me a problem (which is generally never. Not that I've ever run any Solaris version that was bullet-proof, just that I never noticed any problems I couldn't live with. Like they say: if it ain't broke... ;)
Now that I've gone through my 4th or 5th update using Sun's update manager (which basically just downloads all the patches I need and then runs patchadd in the correct order), I've tried doing the same thing manually - thinking that might be the issue - but ended up in the same quandary. The problem is starting to irritate me. I'm not so much worried about the fact that this issue occurs at all, just that it occurs on my workstation which doesn't have any sort of console connection to it. Ergo, if I run patch updates from home, I have to wait until I get back in the office to get past the single-user-mode hang-up.
The basic issue plays out like this (assuming a simple one disk system with no mirroring, etc):
1. Patches are added in the correct order, patch installs are validated and the system reboots.
2. After cruising past the ok> prompt and starting to boot back up, the system inevitably fails and stops at the dreaded "control-D-or-enter-root-password-for-maintenance" prompt.
3. The error message is always the same, with slight variations denoted by asterisks:
Warning: The following files in / differ from the boot archive:
Fixing the issue on the spot (assuming you're at the console) is very simple, and (to their credit) Sun does include the exact steps you need to go through to take care of it, right after the error message.
Those steps would be:
1. Bring your system down to the PROM level after entering the root password. Either of these will do:
host # init 0
host # halt (or the stop+a keys, for those of you who like to get as much bang for the buck from your keystrokes as possible ;)
2. Bring it up in failsafe mode and fix the boot archive problem (which it will, basically, fix for you):
ok> boot -F failsafe <-- with "-Z zpool_dataset" if you're booting a ZFS Root Pool ("boot -L" will list the bootable datasets out for you at the ok> prompt)
blah, blah, blah
Do you wish to automatically update this boot archive? [y,n,?] y
More often than not, you'll then have to run fsck against your root partition and either exit from single user mode to continue the boot process or do another "init 0" followed by a straight-up "ok> boot".
Another option, after your system has failed to boot up properly following patching, is to just clear the boot-archive service out of maintenance. This works well, too (even when your system is live), but is frowned upon in some academic circles. Just enter the root password to get into single user mode and run:
host # svcadm clear system/boot-archive
host # exit
and your machine will come up fine. I've tried applying some basic logic to the problem by running that command both before and after patching (but before rebooting), and I still end up in the same boat (??? Why, God? Why!!!!??? ;)
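For what it's worth, as I understand it "svcadm clear" only tells SMF to carry on past the complaint, while "bootadm update-archive" is the command that actually rebuilds the archive when it's out of date. Below is a minimal sketch of a pre-reboot step built on that. The function name refresh_boot_archive and the BOOTADM variable are made up for illustration, /sbin/bootadm is assumed to be where the binary lives, and I make no promise that this cures the hang described above.

```shell
#!/bin/sh
# Sketch only: rebuild the boot archive after patching, before rebooting.
# BOOTADM is a hypothetical override; /sbin/bootadm is an assumed default.
BOOTADM=${BOOTADM:-/sbin/bootadm}

refresh_boot_archive() {
    # Bail out on hosts that have no bootadm (e.g. anything pre-Solaris 10).
    if [ ! -x "$BOOTADM" ]; then
        echo "bootadm not found at $BOOTADM; nothing done" >&2
        return 2
    fi
    # update-archive rebuilds the archive only if it is out of date;
    # -v makes it say what it found.
    "$BOOTADM" update-archive -v
}
```

The idea would be to call refresh_boot_archive as root once patchadd has finished, and only reboot if it returns 0.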
In any event, I'm sorry that we still don't have this site on new hosting. I would "love" for this to be a post that had comments enabled. Someone out there must know the answer (and not any of the regular ones about "known bugs that may never get resolved" ;)
Just as a "maybe/possibly" in closing, I noticed this on my SPARC workstation. At this point I have a sneaking suspicion it's a kernel patch revision issue (based on bootadm's output, tacked on to the end of the post ;), so (if what I've been reading on the message boards is any indication) I'll have this problem for anywhere from "a while" to forever, unless I decide to put my head on the chopping block and try to patchrm my kernel back to a state it's never been in ;)
Hope this post helps you out if you get stuck in the same situation.
host # bootadm list-archive
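If you want to see which of those files have changed since the archive was last built, something like the following rough sketch might help. It assumes that list-archive prints paths relative to the root directory, and that you're running a POSIX shell (ksh or bash rather than the old /bin/sh). The function name newer_than_archive and its arguments are made up for illustration.

```shell
#!/bin/sh
# Sketch only: print files (named on stdin, relative to ROOT) that are
# newer than the boot archive.
# Usage: bootadm list-archive | newer_than_archive ARCHIVE [ROOT]
# ROOT defaults to the real root and should have no trailing slash.
newer_than_archive() {
    archive=$1
    root=${2:-}
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 2
    fi
    while read -r entry; do
        # Skip blank lines and entries that do not exist on this system.
        [ -n "$entry" ] || continue
        [ -f "$root/$entry" ] || [ -d "$root/$entry" ] || continue
        # Entries may be directories, so let find walk them.
        find "$root/$entry" -type f -newer "$archive" -print
    done
    return 0
}
```

On my box I'd expect the archive to be /platform/`uname -m`/boot_archive, but treat that path as an assumption and check where yours lives before pointing the function at it.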