Sunday, December 2, 2007

Sifting Through the PROM

Every once in a while, like most admins (I hope ;), I'll find myself working on a downed Solaris box and stuck at the PROM level. For the purposes of this post, we'll assume that diagnostics beyond the PROM level are impossible (which is sometimes true).

The issue here isn't that I'm stuck at the PROM. That's no big deal and happens often. The problem is that, every so often, I'll find myself stuck at the PROM on a Sun system type I've never worked on before. Depending on how different the PROM interface and commands are on that system, digging into my old bag of tricks may have me drawing blanks left and right. The PROM "is" always very courteous about letting you know you got the command wrong, but it doesn't offer much in the way of help. For instance:

ok> show-me-the-money
show-me-the-money ?
ok>


The PROM's way of telling you it doesn't know what you're talking about can also vary from system to system.

One of the most useful commands I've ever run across at the PROM level is called "sifting." Since you really have no help immediately available, and searching for particular commands in manuals or on the net can suck up huge amounts of time, this command is a lifesaver. Some folks refer to it as a "sifting dump" (the technical name), but the end result and usage are the same.

Best of all, the sifting command is incredibly easy to use (and the "one" command you should remember if you forget everything else ;). The command itself is very simple. All it does is find every OBP/PROM command whose name contains the string you specify as its only argument, and print those names to the screen. That's it!
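To make that behavior concrete, here's a minimal Python sketch of what sifting does, run off the box: a plain, case-sensitive substring match over a list of command names. The `SAMPLE_COMMANDS` list and the `sifting` function are hypothetical stand-ins for illustration only. The real PROM searches its own command set, and this sketch makes no attempt to reproduce its output format.

```python
from typing import List

# Hypothetical sample of PROM command names (illustration only,
# not a real PROM's command set).
SAMPLE_COMMANDS: List[str] = [
    "boot",
    "reset-all",
    "setenv",
    "printenv",
    "probe-scsi",
    "probe-scsi-all",
    "show-nets",
    "test-net",
    "copy-clock-tod-to-io-boards",
    "copy-io-board-tod-to-clock-tod",
]


def sifting(needle: str, commands: List[str]) -> List[str]:
    """Return every command name containing `needle`, in list order.

    Plain case-sensitive substring matching; an assumption made for
    this sketch, not a description of the PROM's implementation.
    """
    return [name for name in commands if needle in name]


if __name__ == "__main__":
    # Search the sample list for anything mentioning "clock".
    for name in sifting("clock", SAMPLE_COMMANDS):
        print(name)
```

Against this sample list, searching for "clock" turns up the two copy-*-tod commands from the clock example.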

So, even if you've got your PROM chops down, when life throws you that curveball, you can still find the solution easily. Suppose you're working on a 4500-series server that hasn't been rebooted in 3 years. On bootup, everything seems to be going fine and, suddenly, you're presented with a poorly worded error about clocks being out of sync. What??? Assuming, still, that you have no access to reference material of any kind, on any medium, you can still figure this one out using sifting.

To continue: once that error pops up, you get dropped back to the PROM. Subsequent attempts to just boot again and hope the problem goes away produce the same result. At this point, the error itself tells you what you need to know: one clock on the system is out of sync with another. But how do you synchronize them, since that seems to be all the boot process wants you to do? This is where sifting becomes invaluable. As with grep, the shorter the search string you provide, the more results you'll get. In this instance, we could run:

sifting clock

and we'd get back, among a few other things that can easily be dismissed:

copy-clock-tod-to-io-boards
copy-io-board-tod-to-clock-tod


I usually opt to copy the system clock to the I/O boards (copy-clock-tod-to-io-boards). My idea of best practices dictates that I'd go about this like so:

setenv auto-boot? false
reset-all
copy-clock-tod-to-io-boards
setenv auto-boot? true
boot


And, magically, the "clocks out of sync" error is gone!

The little example above is just the beginning. You can use sifting to find any command whose exact name you need (the PROM is so unforgiving ;). Try "sifting probe" if you can't remember which probe command you needed to run, or "sifting net" if you've forgotten the exact name of "show-nets" or "test-net" (notice the seemingly arbitrary difference: one is plural and the other is singular). You can even run sifting against a single character if you want to. Just be ready for plenty of options ;)
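The grep-like tradeoff is easy to see in a similar hypothetical Python sketch. It again assumes a made-up `SAMPLE_COMMANDS` list and plain substring matching, not the PROM's actual implementation. Each search string below is a substring of the one before it, so each search can only return the same names or more.

```python
from typing import List

# Hypothetical sample of PROM command names (illustration only).
SAMPLE_COMMANDS: List[str] = [
    "boot",
    "reset-all",
    "setenv",
    "printenv",
    "probe-scsi",
    "probe-scsi-all",
    "show-nets",
    "test-net",
]


def sifting(needle: str, commands: List[str]) -> List[str]:
    """Plain substring match, preserving the order of `commands`."""
    return [name for name in commands if needle in name]


if __name__ == "__main__":
    # Each needle is a substring of the previous one, so any name that
    # matches a longer needle also matches every shorter one after it.
    for needle in ("test-net", "net", "n"):
        matches = sifting(needle, SAMPLE_COMMANDS)
        print(f"{needle!r}: {len(matches)} match(es): {', '.join(matches)}")
```

On this sample, "net" finds both "show-nets" and "test-net", while the single character "n" also drags in "setenv" and "printenv".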

Enjoy your exploration of the PROM commands available on your system of choice, and remember: sifting through the PROM will eventually lead you to the answer :)

, Mike