Showing posts with label reboot. Show all posts

Tuesday, December 30, 2008

Unix And Linux Easter Eggs For The Wrong Holiday

Hey there,

Today, since it's just past Christmas and almost New Year's, I figured this would be a great time to trot out some Linux and/or Unix Easter eggs. Actually, it doesn't make sense at all, but if you can put aside your burnt-in sense of the chronological order of the holidays, these can still be fun ;)

I found all of the Easter Eggs for today at a site with the very strange name Eeggs.com. I don't know what an eegg is, and I'm not sure that I want to know, but they have a great collection of Easter Eggs for all manner of OSes ;) I spent most of my time in their Linux section, but you could spend hours on other sections of their site and only occasionally be reminded that you're still at work. Of course, in all seriousness, if you're at work, the thought of driving home as soon as possible is keeping you aware of your location at all times ;)

The following are a few of the cooler ones I ran across (AND could personally verify). If you get a chance, drop by Eeggs.com and submit a support email asking why "eeggs" isn't in the dictionary when "ain't" is ;)

1. Fun with PHP. This has worked with every site I've tested it against. The key here is just to find a PHP-enabled site and navigate to a PHP page. Then, all you need to do is append one of a few query strings to the URL in your browser's address bar to find these four gems.

For a working example, we'll look at linuxandunixupdates.com's index.php page. Using that URL, we can add the following four strings and get the following four Easter eggs. All of the links in this section are set to open up in new windows, so you can click on the link above and add the strings manually, or you can just click on any of the links below. I've also included a picture of the outcome below each "magic string" just in case you're worried that I might be luring you into clicking on a redirected link or something else I don't have the time to invest in doing properly right now ;) You should be able to replicate this on any PHP page on any site whose server has expose_php turned on in php.ini (it's on by default). I haven't been able to fully test the veracity of that claim, but it appears to be true so far!

a. Add ?=PHPE9568F34-D428-11d2-A769-00AA001ACF42 to the end of your URL to see this picture:

php logo

b. Add ?=PHPE9568F35-D428-11d2-A769-00AA001ACF42 to the end of your URL to see this picture:

zend engine 2 logo

c. Add ?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 to the end of your URL to see this picture:

squiggly php logo

d. Add ?=PHPB8B5F2A0-3C92-11d3-A3A9-4C7B08C10000 to the end of your URL to see the PHP Credits. This page looks exactly like the standard info.php page, but lists all the developers who worked on each component. I haven't included it here because it's incredibly long and there are more Easter Eggs to get to before we all forget why we're here :)
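
If you'd rather generate the four URLs than type them, here's a minimal Python sketch. It only builds the strings and doesn't fetch anything. The "http://" scheme on the example URL and the names in the dictionary are my own assumptions for illustration; the query strings themselves are the four listed above.

```python
from urllib.parse import urlsplit, urlunsplit

# The four "magic strings" from the list above, keyed by made-up labels.
PHP_EGGS = {
    "php_logo": "=PHPE9568F34-D428-11d2-A769-00AA001ACF42",
    "zend_logo": "=PHPE9568F35-D428-11d2-A769-00AA001ACF42",
    "squiggly_logo": "=PHPE9568F36-D428-11d2-A769-00AA001ACF42",
    "credits": "=PHPB8B5F2A0-3C92-11d3-A3A9-4C7B08C10000",
}


def egg_urls(page_url):
    """Return a dict of label -> page_url with the egg query string attached."""
    parts = urlsplit(page_url)
    return {
        label: urlunsplit(parts._replace(query=query))
        for label, query in PHP_EGGS.items()
    }


if __name__ == "__main__":
    # Assumed scheme; the post only names the host and page.
    for label, url in egg_urls("http://linuxandunixupdates.com/index.php").items():
        print(label, url)
```

Any existing query string on the page URL gets replaced, which matches the "add it to the end of a plain index.php URL" recipe.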

2. MAGIC reboot times in the Linux Kernel. This one is interesting, and a bit of a puzzle, since the original entry only gives the answer to the first time (they're all significant to Linux in some way). In any event, you can find these times by looking in /usr/include/linux/*.h and grepping for LINUX_REBOOT_MAGIC. As you can see, below, in our includes, they're all in reboot.h:

host # grep LINUX_REBOOT_MAGIC /usr/include/linux/*.h
/usr/include/linux/reboot.h:#define LINUX_REBOOT_MAGIC1 0xfee1dead
/usr/include/linux/reboot.h:#define LINUX_REBOOT_MAGIC2 672274793
/usr/include/linux/reboot.h:#define LINUX_REBOOT_MAGIC2A 85072278
/usr/include/linux/reboot.h:#define LINUX_REBOOT_MAGIC2B 369367448
/usr/include/linux/reboot.h:#define LINUX_REBOOT_MAGIC2C 537993216


MAGIC2 (as well as the MAGIC2A, B and C) is where you'll find the Easter Egg. If you take any of those values and convert them into regular time (using Perl, for instance), they resolve to an important date in Linux history.

host # perl -e 'print localtime(672274793). "\n";'
Sun Apr 21 17:59:53 1991
host # perl -e 'print localtime(85072278). "\n";'
Mon Sep 11 10:11:18 1972
host # perl -e 'print localtime(369367448). "\n";'
Mon Sep 14 21:04:08 1981
host # perl -e 'print localtime(537993216). "\n";'
Sun Jan 18 12:33:36 1987


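Sun Apr 21 17:59:53 1991 is supposedly the date Linus Torvalds first began writing Linux. (I'm not using the word "supposedly" to cast any more doubt than any reasonable human being would have. I'm not sure if that's true, so I can only "suppose" that the folks who submitted these Easter Eggs aren't just prepping a new Wikipedia page. Just kidding, of course. Everything in Wikipedia is true ;) The rest is left up to us to figure out. Something tells me the answers are all somewhere in this Linux Online Timeline. Keep in mind that localtime prints in your machine's time zone, so the hours you see may differ from mine. It's also worth knowing that the more commonly cited explanation has nothing to do with timestamps: written in hexadecimal, 672274793 is 0x28121969, which reads as 28 December 1969, Linus Torvalds' birthday, and the other three values (0x05121996, 0x16041998 and 0x20112000) are usually said to be his daughters' birthdays.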
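
For anyone who wants to poke at these constants outside of Perl, here's a minimal Python sketch (not from the original Eeggs.com entry) that prints each value both as a UTC timestamp and as an eight-digit hexadecimal number. The hex column is where the day-month-year reading of the constants comes from. The helper names are my own.

```python
from datetime import datetime, timezone

# Values copied from the grep output of /usr/include/linux/reboot.h above.
MAGIC = {
    "LINUX_REBOOT_MAGIC2": 672274793,
    "LINUX_REBOOT_MAGIC2A": 85072278,
    "LINUX_REBOOT_MAGIC2B": 369367448,
    "LINUX_REBOOT_MAGIC2C": 537993216,
}


def as_utc(value):
    """Interpret the constant as seconds since the epoch, in UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()


def as_hex(value):
    """Format the constant as eight hex digits (reads as DDMMYYYY)."""
    return format(value, "08x")


if __name__ == "__main__":
    for name, value in MAGIC.items():
        print(f"{name}: timestamp {as_utc(value)}, hex 0x{as_hex(value)}")
```

Using UTC here keeps the output the same on every machine, unlike the localtime calls above.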

3. And lastly, so there's plenty more left for you to check out at Eeggs.com, I really enjoyed this last one (actually there were a few others along the same lines that I'm dying to try, but I don't have the proper OSes to validate them right now), since I'm a "huge" fan of Douglas Adams, even beyond the HitchHiker's series. Lots and lots of people got really upset over Mostly Harmless, when he chose to wrap up the HitchHiker's Trilogy (with the 5th book in the series) in a manner that, apparently, was extremely dissatisfying to ardent fans of the series. I don't begrudge them their opinions. I dug it. I'm only sorry that he passed away and that we'll never know if The Salmon of Doubt was going to be the sixth HitchHiker's book (answering the fans' complaints, at worst) or the next Dirk Gently novel.

Back to planet earth ;) If you open up vim, and type the following:

host # vim
[esc]:help 42


with the [esc]: being the actual "escape" or "esc" key, followed by the colon (:)

You'll, sadly, not get an explanation of the answer to the meaning of life, the universe and everything, but the payoff's just as pleasant :)

What is the meaning of life, the universe and everything? *42*
Douglas Adams, the only person who knew what this question really was about is
now dead, unfortunately. So now you might wonder what the meaning of death
is...

==============================================================================

Next chapter: |usr_43.txt| Using filetypes
...


Hope you all enjoyed those Easter Eggs and, should you decide to look for more, happy hunting :)

Cheers,

, Mike




Please note that this blog accepts comments via email only. See our Mission And Policy Statement for further details.

Thursday, October 9, 2008

Shutdown, Reboot and Init Process Flow On Solaris Unix

Hey There,

Today's post harkens back to an earlier post we did on clearing up some common misconceptions about Solaris run levels. That post went through the ins and outs of the differences between boot, reboot, init, shutdown, etc., and has a lot of good practical examples to demonstrate each point.

Branching off from that post, today we're going to add something of an appendix to it. If you're interested in some of the nittier gritty of the Solaris run level changing process, definitely give that older post another look. Maybe it'll go down easier the second time ;)

Below, we've got a nice collection of commands that would (or could) normally be executed by a root (or privileged) user to switch run levels, put together with what run level scripts in what run level script directories get run when those commands get ...executed (almost typed "run" again ;) And, just for flavor, the command and the script/directory execution sequence are further segmented into what run level the Solaris Unix OS switches to at each salient point. Note that, when you see a run level listed in the "run level" column, it's at this point that the run level has become the run level listed. Any points without run levels listed are executed at the run level of the previous entry in the example. Just trying to write a thousand words on how to not have to explain subtle differences in understanding of the blatantly obvious. I'm well on my way to being even more confused than I was when I woke up this morning... ;)

Hope you enjoy this chart and find it useful. Print it out and stick it to your refrigerator with lots of magnets from various pizza parlours and real estate agents. You never know when you'll have an epiphany in the kitchen. ...it's getting late in the day (both technically and figuratively) when you're all alone telling in-jokes ;)

Note: This chart assumes that you're starting from run level 0 (except for the reboot, init 6 and "shutdown -i6" sections). Generally, you can reverse the order of execution if you're starting from, say, run level 3, or combine multiple actions (or their opposites) when transitioning from one particular run level to the next. The stop/start lines for identical run level directories are also dependent on your understanding of which direction you're heading (down to 0 or up to 3, for instance). If this is too confuggling, just let me know :) Again, our previous post on unconfusing Solaris run levels may be of help in conjunction with this little map. Switching to run level 5 has been left out on purpose. Since that level is "power down," everything's going the way of the K ;)

Cheers and pardon the formatting,



command          run level directories/scripts    current run level (retrieved from "who -r" if it seems to not make sense)

boot -s          /etc/rcS.d/S* start              0

boot -S          /etc/rcS.d/S* start              0

reboot -- -s     /etc/rcS.d/S* start              3

shutdown -is     /etc/rcS.d/K* stop               S

shutdown -i0     /etc/rc0.d/K* stop               0
                 /etc/rc0.d/S* start

shutdown -i6     /etc/rc0.d/K* stop               6
                 /etc/rc0.d/S* start
                 /etc/rcS.d/S* start
                 /etc/rc2.d/S* start              3
                 /etc/rc3.d/K* stop
                 /etc/rc3.d/S* start

init 0           /etc/rc0.d/K* stop               0
                 /etc/rc0.d/S* start

init s           /etc/rcS.d/K* stop               S
                 /etc/rcS.d/S* start

init S           /etc/rcS.d/K* stop               S
                 /etc/rcS.d/S* start

init 1           /etc/rc1.d/K* stop               1
                 /etc/rc1.d/S* start

init 2           /etc/rc2.d/K* stop               2
                 /etc/rc2.d/S* start

init 3           /etc/rc3.d/K* stop               3
                 /etc/rc3.d/S* start

init 6           /etc/rc0.d/K* stop               6
                 /etc/rc0.d/S* start
                 /etc/rcS.d/S* start
                 /etc/rc2.d/S* start              3
                 /etc/rc3.d/K* stop
                 /etc/rc3.d/S* start
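
To make the "K* stop, then S* start" pattern in the chart a little more concrete, here's a minimal Python sketch of the ordering within a single rc directory. It assumes the usual convention that the K scripts run first with "stop" and the S scripts afterward with "start", each group in sorted name order. The script names in the example are invented and it never touches a real /etc directory.

```python
def rc_sequence(directory, script_names):
    """Return the calls for one rc directory: K* stop first, then S* start."""
    kills = sorted(name for name in script_names if name.startswith("K"))
    starts = sorted(name for name in script_names if name.startswith("S"))
    return (
        [f"{directory}/{name} stop" for name in kills]
        + [f"{directory}/{name} start" for name in starts]
    )


if __name__ == "__main__":
    # Hypothetical directory listing, for illustration only.
    listing = ["S10demo", "K40demo", "K05demo", "README"]
    for call in rc_sequence("/etc/rc0.d", listing):
        print(call)
```

Files that start with neither K nor S (like the README in the example) are simply skipped.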




, Mike





Saturday, August 16, 2008

Linux and Unix Admin Humor - The Web Site Is Down!

Hey There,

This weekend's comedic gold is quite a bit older than I would have thought. I feel like I've missed out, although I'm glad that someone finally mailed this link to me :) This is a simple (although a bit long at around 10 minutes) video that is absolutely hilarious. If you've ever done any kind of computer administration (and maybe if you've been on the other end of this debacle) it's entertaining stuff. The voices have been "chipmunked," but not to protect the innocent/guilty, since there's a cast list during the credits at the end. I'm thinking this thing was probably twice as long and the distorted voices are from a speedup of the entire video. No matter how it plays, this is killer humor :)

Below is my "ripped" version from the Web Site itself. I'm keeping in line with their no-rip policy by including the credits. They don't mind if their product gets distributed, as long as people like me don't claim to have produced it themselves. That's only fair. I'm sure some work went into this one:



Once you're done watching, check out The Web Site Is Down, where you can see a much better quality version of this, and goof around a bit. They have one page where you can log on to a fake computer and it gives you no end of wisecracking command line returns.

NOTE: When you find the online Linux box and login (click the "desktop" link at the top right of the main page), be sure to "Reboot it!" 3 times. You'll catch some video and there are a few easter eggs you can find directly from there.

I've also included a link to this mini slowed down version of a part of the video, so you can hear their real voices.

I hope you enjoyed this as much as I did. Have a great Saturday :)

, Mike





Monday, August 11, 2008

Changing Solaris Run Levels And Clearing Up Some Common Misconceptions

Hey there,

Today's post hearkens back to an earlier time, when I was administrating Solaris 2.4 and figured Sun had big plans for run level 4. It was, after all "reserved for future use." I don't know about you, but (to this day) when I see terminology like that, I tend to shy away from getting too comfortable with working in a cheap hack. My thinking was that, at some point (given Sun's policy at the time) they'd come up with a unique use for run level 4 and I'd have a whole lot of stuff set up to launch from there and my world would be thrown into chaos ;)

I no longer worry too much about the chaos (something, somewhere is going to break or I'll eventually be out of a job ;), but I still like to keep tabs on Solaris' "level 4" status updates and their command run-level policies. So, today, we'll take a brief look at "run level 4" and then look at a few commands, used to change run levels, that may not act as you would expect.

1. Run level 4: It is still, technically, reserved for future use on Solaris. However, the subtle difference in meaning over time has been stated more clearly with the last few releases. Now, about half of the places you look (on Sun's Documentation Site) list "run level 4" as being "user-defined." What this means, aside from the fact that any admin can make use of it if he or she deems it worthwhile, is that I spent a lot of time worrying about nothing. Of course, I'm willing to forgive, since I never had any use for "run level 4" anyway.

On most Linux and Unix distros, run level 4 is considered open and user-definable. In a few instances, on HP-UX and Slackware, among others, run level 4 is the default run level for the X Window display manager. Generally, that's found on run level 5. But, before I go crazy over-explaining, check out Wikipedia's entry on run levels for virtually every system known to man :)

The only thing you have to make sure of, for the most part, if you want to use Solaris' run level 4 is that you create an entry for it in /etc/inittab. Something like:

s4:4:wait:/sbin/rc4 >/dev/msglog 2<>/dev/msglog </dev/console



would probably suffice for distributions up to, and including Solaris 9. If there are minor differences in your distribution, you can simply copy the line for "s3," paste it below and change all the 3's to 4's. Fairly simplistic, but without it, Solaris' init will never know to look there and will not know what you mean when you try to change to run level 4.
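
The "copy the s3 line and change the 3's to 4's" advice can be sketched in a few lines of Python. The s3 entry below is an assumption (it just mirrors the s4 line shown above), so check it against your own /etc/inittab before trusting it. The function only returns a string and doesn't edit any file.

```python
def clone_inittab_entry(line, old_level="3", new_level="4"):
    """Return a copy of an inittab entry with every old_level digit swapped.

    This is a blunt text replacement, exactly like doing it by hand, so it
    would also change any unrelated "3" in the line.
    """
    return line.replace(old_level, new_level)


# Assumed s3 entry, modeled on the s4 example in this post.
S3_ENTRY = "s3:3:wait:/sbin/rc3 >/dev/msglog 2<>/dev/msglog </dev/console"

if __name__ == "__main__":
    print(clone_inittab_entry(S3_ENTRY))
```

If your s3 line happens to contain other 3's (a path or a file descriptor, say), do the edit by hand instead.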

On Solaris 10, things are a little goofier, but in a good way. You can mess around with the inittab and set things up that way, but if you use the "out-of-the-box" configuration and stick with the Service Management Facility (SMF), you won't need to worry about that. SMF has no concept of run levels and doesn't need /etc/inittab. Since SMF handles the order of starting services (with dependencies), you don't need to have a "separate" run level to accomplish whatever end you were shooting for. You can add new services with SMF and then goof around, to your heart's content, with svcadm, svccfg, svc.configd, etc. You may find yourself longing for the "old ways" until you get used to it.

2. Some run level changing commands and what they're limited to: Some of this stuff still surprises me, since it seems counter-intuitive, but I guess I've always been lucky in the way I start stuff up and shut it down (my apologies to Clint Eastwood for bastardizing his famous quote from "Unforgiven" ;) Plus, I'm a big fan of "init" :) We're going to forget run level 4 exists for most of this part since it rarely matters... technically...

a. init: This command can be used to bring Solaris to any run level you specify (0, 1, s, S, 2, 3, 4, 5 and 6). This is a bone of contention, if you ever have to get certified, but Sun says there are only 8 run levels, of which only 7 are used. If you count "s" and "S" as separate (even though they both do the same thing) I would back you up if your answer was 9.

b. reboot: This command can only bring the system to run level 6. However, you can make reboot take you to other places by terminating the getopts routine and adding a flag afterward, so that:

host # reboot -- -s

would allow you to use the reboot command to get into single user mode. Also, another reason this perplexes me as much as it does is that, technically, bringing the machine to run level 6 entails the system going through a series of run levels leading all the way to the default run level (which is 3, if you left it alone). Sun's answer is technically true, but somewhat misleading. The only thing you really "can't" do with reboot is stop at run levels 0, 1, 2 and 5 (unless you make one of them your default run level in /etc/inittab - not a good idea, especially for 0 and 5 ;)

c. shutdown: This command can only bring the system to run levels 0, 1, s/S, 5 or 6. This makes sense to a degree, since run level 6 is reboot, 5 is power-off and 0 is PROM. I don't understand why it can shut the system down to run level 1 (single user with all local filesystems mounted read-write) and s/S (single user with only / mounted read-only) and not shut it down to run level 2 (basic multi-user with network) from run level 3 (same multi-user with nfs and some additional network services) straight up. As noted above, regarding reboot, you can actually make this command take you to any run level except 2 (unless you set that as your default run level in /etc/inittab)

d. poweroff: This command can only bring the system to run level 5. Perfectly sensible.

e. halt: This command can only bring the system to run level 0. This makes sense, too.

f. uadmin: This command can only bring the system to run levels 0, 5 and 6 (see above regarding why run level 6 means you can also use this command to get to run level 3, or your system default run level, as well). This command is restricted to super-user access by default. For the longest time I thought it could get you to any run level straight off, although I never needed to use it except to dump the system. An interesting fact about uadmin is that it can accept arguments. Which means, if you're feeling moderately clever, you can make it go places Mother won't let it go, by fudging it a little. Consider the following, understanding that uadmin converts arguments into integers as such, and is invoked as "uadmin command function optarg":

commands:

1 = No Disk Sync
2 = Sync Disks

functions:

1 = A_REBOOT <-- Run level 6
2 = A_SHUTDOWN <-- Run level 0
5 = A_DUMP <-- Run level 5

optional arguments:

Try whatever you want out, if you can afford to. Be sure to enclose an argument with spaces in double quotes.

host # uadmin 1 1 "-s kernel/unix" <-- Basically any option you can pass to the PROM "boot" command, you can sneak in here!

Also, there are multiple versions of uadmin available, depending on your Solaris release, so you may be looking at this set of options, which will get you to run levels 0, s/S, 1, 2, 5 and 6 (which gives you run level 3) as well as allowing you the optional argument:

uadmin 2 0: sync the filesystems and drops system to ok prompt
uadmin 2 1: sync the filesystems and reboots to multi-user mode
uadmin 2 2: sync the filesystems and reboots interactively
uadmin 2 3: sync the filesystems and reboots to single-user mode
uadmin 2 6: sync the filesystems and powers off the system

uadmin 1 0: do not sync filesystems and drops system to ok prompt
uadmin 1 1: do not sync filesystems and reboots to multi-user mode
uadmin 1 2: do not sync filesystems and reboots interactively
uadmin 1 3: do not sync filesystems and reboots to single-user mode
uadmin 1 6: do not sync filesystems and powers off the system
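
As a quick recap of items a through f, here's a minimal Python sketch of which command can take you directly to which run level, going strictly by the limits stated in this post. It leaves out the workarounds (like "reboot -- -s" or extra uadmin arguments) and it isn't an authoritative table for any particular Solaris release.

```python
# Direct targets per command, as listed in items a-f of this post.
REACHABLE = {
    "init": {"0", "1", "s", "S", "2", "3", "4", "5", "6"},
    "reboot": {"6"},
    "shutdown": {"0", "1", "s", "S", "5", "6"},
    "poweroff": {"5"},
    "halt": {"0"},
    "uadmin": {"0", "5", "6"},
}


def commands_for(level):
    """Return the sorted list of commands that can target the given run level."""
    return sorted(cmd for cmd, levels in REACHABLE.items() if level in levels)


if __name__ == "__main__":
    for level in ["0", "2", "5", "6"]:
        print(level, commands_for(level))
```

Run level 2 is the odd one out: in this table only init gets you there directly.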

Now, let's kick back and let the raging debate about OS version-and-release accuracy begin ;)

Cheers,

, Mike





Tuesday, August 5, 2008

Using Grub To Change RedHat Linux's Root Password

Hey There,

I'm back from the dead and no longer cursing Vista ( I've always thought it was strange that most people, including myself, feel disappointment or sorrow when they lose something they didn't have in the first place. The opposite seems perfectly logical. I feel very relieved that I no longer have something that I never had before ;)

Today, we're going to take a look at a quick way to get back your root password (assuming it's yours) if, for some reason, you get locked out of your machine. It happens... more than I'd care to admit ;) The course of action I see followed most often is booting from CD, temporarily mounting the physical root drive, and editing /etc/passwd. This is a time-tested solution, and works on pretty much any version of Linux or Unix I've worked with, but I think this way is more fun (and slightly less dangerous). Plus, it saves you a little time (not a lot; just a little). We haven't taken a look at grub since our post on recovering failed RAID disks, so I guess this post is about due.

The trick in question has only been tested on RedHat Linux ES, so I can't speak for whether it works on, say, CentOS or Fedora (although I imagine it would work on any system that uses grub for boot loading). Basically, what we're going to do is use grub's boot options to allow us to obtain root access. And, if your machine is properly secured (the /boot directory in particular), you shouldn't be able to edit /etc/grub.conf as a normal user, so physical access (or console/ALOM-type access) to the machine in question is required. It's a pretty simple procedure and goes something like the following:

1. Log in to the console on the machine and type "reboot" or "shutdown -r," etc, if you have an account with privileges to initiate such an action. If you don't have an account with suitable privilege, try control-alt-delete (and power off) or hard power-down your machine (you may need to fsck later, but that's a given), take a deep breath and count to 11 ;)

2. Power on the server and wait for the grub boot screen to come up. You may not need the GUI for this to work, but it's the only way I've done it. When the grub boot menu comes up, hit the up or down arrow key at least once to stop the automatic boot countdown timer. If you have multiple boot options, choose the one you know (or believe) is the one currently in use (actually, this shouldn't matter, but loading up an older kernel might cause issues) and press the "e" key to enter edit mode.

3. After you enter edit mode, you'll be presented with a few lines of text (dependent on how you have your grub.conf populated). Using your arrow key again, navigate to the line that starts with "kernel." Press the "e" key again, and your cursor should show up at the end of the "kernel" line (if it doesn't, you can move it to where you need it by using the left and/or right arrow keys as necessary).

4. Now that you're in edit mode, and your cursor is in the correct position, type a "space" character followed by "single." So if your boot command line was:

kernel /boot/vmlinuz-2.6.9-34.ELsmp ro root=/dev/sda1
it would now be:
kernel /boot/vmlinuz-2.6.9-34.ELsmp ro root=/dev/sda1 single

5. Now type "b" to continue the boot process and you'll be dumped into a limited shell, as root, passwordlessly. Sometimes this has seemed not to work for me if I changed my edit-focus to a line other than the "kernel" line before typing "b", but that could just be superstition on my part. Thankfully, I don't have to do this all that often :)

6. The rest is gravy. You're root, so all you have to do is type "passwd", set the root password to whatever you like and reboot using your preferred method (reboot, shutdown -r, init 6, whatever works, etc). Since you're in a single-user shell, you can also initiate a reboot by just typing "exit."
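
The edit in step 4 is nothing more than tacking one word onto the end of the "kernel" line. Here's a minimal Python sketch of that text change, purely to show the before and after. The function name is made up, and in real life you make this edit by hand at the grub menu, not with a script.

```python
def add_kernel_arg(kernel_line, arg="single"):
    """Append a boot argument to a grub "kernel" line unless it's already there."""
    if arg in kernel_line.split():
        return kernel_line
    return kernel_line.rstrip() + " " + arg


if __name__ == "__main__":
    before = "kernel /boot/vmlinuz-2.6.9-34.ELsmp ro root=/dev/sda1"
    print(add_kernel_arg(before))
```

Changes made in grub's edit mode apply to that one boot only, so there's nothing to undo afterward.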

And you should be all set. Now you no longer have an excuse to avoid fixing problems on that machine (the downside ;)

Cheers,

, Mike



08/06/2008 - Thanks for this Additional Useful Information From zcat:

On many distros the 'single' or 'rescue' boot will still ask for a
password. You can get around this by starting linux without starting
initd, just launch a shell instead; and it's blindingly fast.

'e' to edit the boot entry, select the kernel line and press 'e'
again, then type "init=/bin/bash", enter, press 'b' to boot it. You
end up at a root prompt with / mounted read-only. (depending on the
distro, you might need /bin/sh instead)

# mount / -o remount,rw
# passwd
<change your root password here>
# mount / -o remount,ro
<three-finger salute or hit the reset button>



It's also useful for fixing up boot problems, if you're silly enough
to have put commands in various init scripts that don't actually exit
or daemonize...




Thanks for these comments from Laurent regarding an alternate way to get to single user:

Nice to see that method that I was used to apply. Especially with some servers that have been hardened with password aging implementation. And when it is stable you don't need to log on for more than 60 days sometimes....

You could also add that grub can (should?) be password protected.

Cheers

Laurent

Thursday, May 22, 2008

Using Last To Its Full Potential On Linux

Hey There,

This probably comes as no surprise to most Unix or Linux administrators out there (at least this first thing), but I find it's always interesting how rarely the "last" command is used to determine anything other than the users logged in "now" and the "last" time a user logged in.

Granted; the last command doesn't offer too much in the extra-functionality department, but it does have one very useful feature. Normally, if you were to run last, you'd see output like the following:

reboot system boot 2.6.5-7.283-smp Thu Jan 25 18:06 (00:21)
user1 pts/1 host.xyz.com Thu Jan 25 08:03 - down (00:27)
reboot system boot 2.6.5-7.283-smp Thu Jan 25 08:01 (00:29)
user1 pts/1 host.xyz.com Thu Jan 25 07:50 - down (00:06)


But, if you add the "-x" switch to the "last" command, it gives you a lot more detailed information about system run-level changes, which makes it a more accurate way to determine what happened if, and when, your system ever goes down unexpectedly! Here's output from that same swatch of time using "last -x":

runlevel (to lvl 3) 2.6.5-7.283-smp Thu Jan 25 18:06 - 18:28 (00:21)
reboot system boot 2.6.5-7.283-smp Thu Jan 25 18:06 (00:21)
shutdown system down 2.6.5-7.283-smp Thu Jan 25 08:31 - 18:28 (09:56)
runlevel (to lvl 6) 2.6.5-7.283-smp Thu Jan 25 08:31 - 08:31 (00:00)
user1 pts/1 host.xyz.com Thu Jan 25 08:03 - down (00:27)
runlevel (to lvl 3) 2.6.5-7.283-smp Thu Jan 25 08:01 - 08:31 (00:29)
reboot system boot 2.6.5-7.283-smp Thu Jan 25 08:01 (00:29)
shutdown system down 2.6.5-7.283-smp Thu Jan 25 07:57 - 08:31 (00:33)
runlevel (to lvl 6) 2.6.5-7.283-smp Thu Jan 25 07:57 - 07:57 (00:00)
user1 pts/1 host.xyz.com Thu Jan 25 07:50 - down (00:06)


Interestingly enough, the "-x" flag still isn't available in Solaris, even in all the versions of the 10.x strain that I've checked out. There are other methods to get the information, but they are more tedious and require the user, or admin, to do enough work that they may as well script it out (or write a wrapper for "last" that allows for a "-x" flag ;)

Generally, you'll notice that these extra lines carry your "kernel" revision in the column where a login's remote host would normally appear (usually the value of "uname -r"; 2.6.5-7.283-smp, in our case) so you can run:

last -x|grep `uname -r`


to restrict your output to this system information and ignore all the user logins/logouts :)
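
If you'd rather do that filtering in a script, here's a minimal Python sketch that picks the runlevel, shutdown and reboot records out of "last -x" output and converts the duration in parentheses into minutes. It works on a pasted sample (a few lines from the output above) and doesn't run "last" itself. The function name and the tuple layout are my own choices.

```python
import re

# Pseudo-users that "last -x" uses for system records.
SYSTEM_RECORDS = {"runlevel", "shutdown", "reboot"}

# Trailing duration such as (00:29) or, for long spans, (1+02:03).
DURATION = re.compile(r"\((?:(\d+)\+)?(\d{2}):(\d{2})\)\s*$")


def system_events(last_output):
    """Return (record_type, minutes, raw_line) for each system record."""
    events = []
    for line in last_output.splitlines():
        fields = line.split()
        if not fields or fields[0] not in SYSTEM_RECORDS:
            continue
        minutes = None
        match = DURATION.search(line)
        if match:
            days, hours, mins = match.groups()
            minutes = (int(days or 0) * 24 + int(hours)) * 60 + int(mins)
        events.append((fields[0], minutes, line.strip()))
    return events


SAMPLE = """\
runlevel (to lvl 3) 2.6.5-7.283-smp Thu Jan 25 08:01 - 08:31 (00:29)
reboot system boot 2.6.5-7.283-smp Thu Jan 25 08:01 (00:29)
shutdown system down 2.6.5-7.283-smp Thu Jan 25 07:57 - 08:31 (00:33)
runlevel (to lvl 6) 2.6.5-7.283-smp Thu Jan 25 07:57 - 07:57 (00:00)
user1 pts/1 host.xyz.com Thu Jan 25 07:50 - down (00:06)
"""

if __name__ == "__main__":
    for record, minutes, _ in system_events(SAMPLE):
        print(record, minutes)
```

Matching on the first field instead of the kernel revision means the filter keeps working after a kernel upgrade.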

While the information that "last -x" provides may seem extraneous and not generally worthwhile, I'd say that it's exactly the opposite. For instance, in our first, straight-up, last command, we only get the reboot time of (we'll take the last one) January 25th at 8:01 a.m. ( The year is 2008 since we're taking this from the top of the output).

Interestingly enough, again, last does not print the year, although you can get that information if you really want it. For more info on that, check out our previous posts on scripting out user deletion on Unix and the modifications for Linux, which both include Perl routines for tearing open wtmpx so you "can" get the "year" data if you want it :)

With "last -x," for that very same reboot, we know that the reboot command was issued by the system on January 25th at 8:01 a.m. (this helps put into perspective what last, without arguments, is "really" reporting. The "beginning" of the reboot process). We can then see that (and, just as a reminder, we're reading from the bottom of the output up!) the request to switch to "run level 6" (which is "reboot") was actually issued at 7:57 a.m.

The "shutdown" information on the next line is an all-encompassing time. It should always match the entire amount of time spent in all of the states we're looking at. It starts with the switch to "run level 6" at 7:57 a.m. and ends with the switch to "run level 3" (this system's default run level) at 8:31 a.m. Finally, after the "reboot" line, we see the switch to "run level 3" which happens from 8:01 a.m. (the time the "reboot" was called) until 8:31 a.m. (the time the system fully got back to "run level 3").

As you can see, just knowing the "reboot" time doesn't give a very accurate report of the time involved in the reboot, at a glance. We just know that it happened at 8:01 a.m. If we wanted more information, we might need to go look at system logs.

"last -x," however, makes it so that we can, just by reviewing that output, see that the reboot process actually began at 7:57 a.m. and didn't complete until 8:31 a.m. That may not be a long time for this machine (If it is, you'll know to look at the system logs, now :), but the length of time required for a normal reboot is very system-dependent and also depends on what sorts of scripts and programs are run on a controlled reboot, etc.

And that's the last I have to say about that ;)

Best wishes,

, Mike

Friday, December 14, 2007

Why Horrible Sun Boot Problems Aren't Always All That Bad

I had an experience at work recently that had me shaking my head (and wishing I'd left for home a few minutes earlier ;) One of our v490 servers, that was already racked, cabled up and ready to have the OS built and put on the network the following day, decided it just wasn't going to boot up; not even to an ok> prompt!

Without the keyswitch set to run extended diagnostics, the situation looked pretty severe. This is about all I saw before it would power back down to nothing:

1:0>Waiting for master in slave_spin() CPU=0:0, timeout in 29 seconds...
2:0>Waiting for master in slave_spin() CPU=0:0, timeout in 29 seconds...
3:0>Waiting for master in slave_spin() CPU=0:0, timeout in 29 seconds...
1:0>
1:0>ERROR: TEST = Slave Spin
1:0>H/W under test = CPU, Motherboard/Centerplane, I/O board, (system init)
1:0>Repair Instructions: Replace items in order listed by 'H/W under test' above.
1:0>MSG = ERROR :Timeout waiting for master, doing re-config reset.
1:0>END_ERROR


And "nothing!" Anyway, as is our company's policy, I placed a call to Sun Support and their suggestion, as is suggested plainly by the error above, was to have a Field Engineer come out and replace the CPU boards (including the CPUs and memory - which is actually faster), and if that didn't work, replace the motherboard, the centerplane and the I/O board, progressively, until the error went away. You can see why I wasn't too happy, right? We're talking about a potential 10 extra hours of work doing parts replacement, followed by diagnostics, followed by possible extra parts orders, replacements, diagnostics, ad infinitum (if not ad nauseam ;)

Here's the kicker. After hooking up a laptop to the ALOM port, we started the system up with extended diagnostics. It wasn't looking much better. In fact, it gave a lot of confusing errors, like (and I'm paraphrasing here, because I stopped logging my diag output after a while):

FATAL ERRORS:
This version of v490/890 servers only support Ultra IV Processors
CPU's Online:
cpu #0 - Ultra IV 1500
cpu #2 - Ultra IV 1500


What?? That seemed contradictory to me. So we did what isn't generally a good idea (unless your machine appears to be in a state of complete ruination anyway) and pulled the plug, let it idle and powered it back on with the diagnostic keyswitch set. This time it gave us a little more information, and - lo and behold - in between the thousands of diagnostic messages (in between the FATAL ERRORS and the "slave_spin" errors) this line popped up:

OBP/Flash version 4.16.4 does not support part number ##### (Which happened to be the part number of both of our CPU boards).

This was great news! But how to fix it? Of course, replacing the centerplane (which, if you've ever done it - or even watched it being done - you know can be a painstaking and extended process) would fix the problem. On the v490 server, the OBP resides on the centerplane, so that was one option (If we'd followed Sun's advice, of course, we would have already gone through replacing both CPU boards and, possibly, the motherboard before getting to that point!)

Our system OBP/Flash version was 4.16.4, and for the 1500 CPU - Ultra IV CPU boards, we needed to be up to OBP/Flash version 4.18.1. Clearly the CPU boards had been put in the v490 without regard to whether or not they were actually compatible ;)

Our next step was to take an old CPU board and replace the two new ones with it (just to test) and, magically, the machine booted perfectly. None of the system components listed were in a state of failure, or on their way to failing. The 1350 CPU board we put in only required OBP/Flash version 4.15.6 to be supported, and our centerplane OBP exceeded that level.
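The compatibility check we ended up doing in our heads can be sketched in a few lines of shell. This is only an illustration, not anything Sun ships: the version_ge function and the "speed:minimum" pairs are names made up for this example, and the version numbers are simply the ones from this post. On a system that actually boots Solaris, "prtconf -V" will report the installed OBP version.

```shell
#!/bin/sh
# Hypothetical helper (not a Sun tool): succeeds when dotted version $1 >= $2.
# Compares up to three numeric fields, e.g. 4.16.4 against 4.18.1.
version_ge() {
    have=$1
    need=$2
    old_ifs=$IFS
    IFS=.
    set -- $have; h1=${1:-0}; h2=${2:-0}; h3=${3:-0}
    set -- $need; n1=${1:-0}; n2=${2:-0}; n3=${3:-0}
    IFS=$old_ifs
    # The first field that differs decides the comparison.
    [ "$h1" -ne "$n1" ] && { [ "$h1" -gt "$n1" ]; return; }
    [ "$h2" -ne "$n2" ] && { [ "$h2" -gt "$n2" ]; return; }
    [ "$h3" -ge "$n3" ]
}

# Values taken from this post: installed OBP, and the minimum each board needs.
installed=4.16.4
for board in "1350:4.15.6" "1500:4.18.1"; do
    speed=${board%%:*}
    need=${board##*:}
    if version_ge "$installed" "$need"; then
        echo "$speed board: OBP $installed is new enough (needs $need)"
    else
        echo "$speed board: OBP $installed is too old (needs $need)"
    fi
done
```

Run as-is, it reports the 1350 board as supported and the 1500 boards as needing a newer OBP, which matches what the diagnostics were trying to tell us.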

Our options boiled down to, as we saw it then, installing the OS on disk while we had the one 1350 CPU board installed, downloading the latest OBP/Flash and installing it, and then shutting down and booting up with the two new 1500 CPU Ultra IV boards (While this was a perfectly workable solution, it seemed like there must be a faster way to do it). Net booting was also an option, but that would require modifying our net boot server and might also cause other unforeseen complications. We also didn't want to have to have Sun replace the centerplane, as this wasn't any more guaranteed to work than our system-install method.

We eventually ended up bringing a Sun FE on site and got the surprise of our lives (or at least our present days ;) Luckily, Sun FEs have a CD/DVD (so far as I know, it's been around for about a year and is only available to Sun personnel) called SUE (which stands for Sun Utility Environment - or something like that - I was sneaking peeks). This is a tool whose time came a long while ago. With it, the FE was able to boot us to the ok> prompt (using the 1350 CPU board) and run the OBP/Flash upgrade directly from CD!

That's stretching the truth somewhat - SUE actually creates a mini-boot environment in on-board memory and sets up a temporary alias so that you can reboot and upgrade the OBP/Flash. Without it, we would have had to install the OS, boot the machine into network mode, download the latest OBP/Flash and then reboot with the new flash file, like so (somewhat abbreviated):

init 0
ok> boot disk /flash-update-v490
<--- or whatever the OBP/Flash upgrade file was called.

We were able to update the OBP/Flash by just booting off of the SUE CD, picking the OBP/Flash upgrade from the list available on the CD and letting it do a:

reboot -- cdrom /flash-update-v490

That was a "huge" time savings! Hopefully, Sun will make this CD, or a CD utility like it, available to users (or, at least, contract holders) in the near future.

So, as it turned out, that absolutely horrible boot problem wasn't really all that bad. Rather than replacing every single piece of hardware on the system until we found the one that was bad, all we had to do was upgrade the OBP/Flash on the system!

Sometimes the most complicated problems have the simplest solutions :)

Best wishes,

, Mike