Wednesday, March 11, 2009

Locating New Backup Hardware Using Veritas NetBackup On The Solaris Unix Command Line

Hey there,

Yesterday we took a look at the basics of using Veritas/Symantec NetBackup to add a new TLD and drives to your existing machine. Today, we're going to go just one step beyond and assume a fairly commonplace situation, which has somehow, inexplicably, arisen from THE SITUATION we found ourselves in yesterday. For some reason (and this hardly ever happens ...not sure which word to emphasize to obtain the maximum sarcastic drippage), after we connected our new Tape Library DLT (TLD, or tape robot) and the two drives it contains to our backup server, NetBackup (and, possibly, the server itself) is failing to recognize the new device(s). Again, we're going to assume that the server, TLD, drives and all other hardware are absolutely fine and that all required connections between the devices are set up properly.

NOTE: Today's post is going to assume that some tried and true methods will get you to "good." Tomorrow's post will look at some other ways to make NetBackup recognize and work with your "known good" (and compliant) setup.

If we take the same direct route to initial discovery that we did yesterday, we'd run the same sgscan (which is, as one reader noted, shorthand for "sgscan all") command initially, like so (pardon the error output. I can't afford to create the situation I want to display so I'm doing it from memory):

host # /usr/openv/volmgr/bin/sgscan
/dev/sg/c0t0l0: Disk (/dev/rdsk/c0t0d0): "SUZUKI MBB2147RCSUN146G"
/dev/sg/c0t1l0: Disk (/dev/rdsk/c0t1d0): "SUZUKI MBB2147RCSUN146G"
/dev/sg/c0t2l0: Tape (???): "Unknown"
/dev/sg/c0t3l0: Cdrom: "Hyundai DV-W28E-R"
/dev/sg/c1t0l0: Changer: "Unknown"
/dev/sg/c1t1l0: Tape (???): "Unknown"


Basically, every line where it says "Unknown" is where we're interested in looking. The system can't find our TLD or its drives, so now we have to try to discover them ourselves (with and/or without NetBackup) and then come back around and use NetBackup to verify that we're okay. These steps are pretty dry, but I think if you follow them in a somewhat linear order (skipping some or doing some before others, if you're comfortable) they should get you to where you want to be. Fat, happy and with a TLD your backup server recognizes. Okay, maybe not happy ;)
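
If your sgscan listing runs longer than ours, a quick filter can pull out just the device paths that came back "Unknown." This is only a sketch: the function name unknown_devices is made up for this example, and the sample input is the from-memory listing above rather than live output. On a real server you'd pipe /usr/openv/volmgr/bin/sgscan into it instead.

```shell
#!/bin/sh
# Print the /dev/sg path of every sgscan line that reports "Unknown".
# Reads sgscan-style output on stdin; the path is everything before the first ": ".
unknown_devices() {
    sed -n '/"Unknown"/s/: .*//p'
}

# Sample input copied from the listing above (not live output).
sample_sgscan() {
cat <<'EOF'
/dev/sg/c0t0l0: Disk (/dev/rdsk/c0t0d0): "SUZUKI MBB2147RCSUN146G"
/dev/sg/c0t1l0: Disk (/dev/rdsk/c0t1d0): "SUZUKI MBB2147RCSUN146G"
/dev/sg/c0t2l0: Tape (???): "Unknown"
/dev/sg/c0t3l0: Cdrom: "Hyundai DV-W28E-R"
/dev/sg/c1t0l0: Changer: "Unknown"
/dev/sg/c1t1l0: Tape (???): "Unknown"
EOF
}

# Prints the three /dev/sg paths flagged as Unknown, one per line.
sample_sgscan | unknown_devices
```

Those paths are the ones worth carrying through the rest of the checks.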

Note:
If you feel uncomfortable about running any of the commands below, please enlist the assistance of someone who can provide guidance (since each case is unique) and/or will get in trouble instead of you if things go to Hell ;) j.k.

And, here we go. These steps won't be numbered, so I can't possibly screw that aspect up, but should be easy to follow since each command will be separated by space and begin with the "host # " prompt. Some of these commands, as the title of today's post suggests, may not exist on a flavour of Unix or Linux that isn't Solaris.

First, we'll take a look at our device tree. Do the device links listed in sgscan exist? Also, is /dev/rmt populated at all?

host # ls /dev/sg/c0t2l0 /dev/sg/c1t1l0 /dev/sg/c1t0l0 /dev/rmt
/dev/sg/c0t2l0 /dev/sg/c1t0l0 /dev/sg/c1t1l0

/dev/rmt:
0 0cb 0hb 0lb 0mb 0u 1 1cb 1hb 1lb 1mb 1u
0b 0cbn 0hbn 0lbn 0mbn 0ub 1b 1cbn 1hbn 1lbn 1mbn 1ub
0bn 0cn 0hn 0ln 0mn 0ubn 1bn 1cn 1hn 1ln 1mn 1ubn
0c 0h 0l 0m 0n 0un 1c 1h 1l 1m 1n 1un


They appear to be there, but they're probably bad. Let's try devfsadm, all on its lonesome and check sgscan again (From now on we'll just assume the output is the same as the train-wreck we witnessed above, until we get to the end. Hopefully, your journey will come to a close sooner!):

host # devfsadm

If this fails to produce results, you can try to run the same command with the "-C" option to remove stale links that no longer point to a valid physical device path:

host # devfsadm -C
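
If you're curious which links are actually dangling (the kind of thing "-C" is meant to clean up), here's a small sketch that lists symbolic links whose targets are gone. The function name dangling_links is hypothetical, and it assumes a POSIX shell (ksh or /usr/xpg4/bin/sh on Solaris, since the legacy /bin/sh may not understand these tests). It only catches links whose target is missing, not links that point at the wrong device.

```shell
#!/bin/sh
# List symbolic links in a directory whose targets no longer exist.
# [ -L ] is true for a symlink; [ -e ] follows the link, so it fails
# when the thing the link points to is gone.
dangling_links() {
    dir=$1
    for link in "$dir"/*; do
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "$link"
        fi
    done
}

# Example (on the backup server): dangling_links /dev/rmt
```

No output means every link in the directory still resolves to something.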

Of course, if you know that you only had two tape drives before (/dev/rmt/0 and 1) and believe sgscan when it says it can't recognize the paths we listed, you can delete all of that stuff and try those two steps again. Sometimes it helps to force Solaris to recreate the dev links:

host # rm /dev/rmt/*
host # devfsadm -C


should be enough, but you can almost certainly do this, as well:

host # rm /dev/rmt/* /dev/sg/c0t2l0 /dev/sg/c1t1l0 /dev/sg/c1t0l0
host # devfsadm -C


Running the "ls /dev/sg/c0t2l0 /dev/sg/c1t1l0 /dev/sg/c1t0l0 /dev/rmt" listed above will, almost always, give you the same results once you've completed these steps.

You might also run this command if you have the drivers installed:

host # cfgadm -al

Every section of the output starts with a controller number and a colon (in our setup, the entries look like "c2::xxxx"). If you find a section that lists your tape devices (rmt/0 and rmt/1) and the path to your Changer, and one or more of them show as unconfigured, you can either configure individual entries, using the entire device name shown beside your rmt and changer devices, or you can just configure the whole shebang. Why not?

host # cfgadm -c configure c2

Listing it again with "cfgadm -al" should show all the appropriate devices as "configured." If it doesn't, don't worry. It probably doesn't matter, but it was worth a shot.
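
If you'd rather configure only the entries that need it, something like the following sketch can pick them out of the "cfgadm -al" listing. The function name unconfigured_ap_ids is invented for this example, and the sample listing is made up in the usual five-column layout (Ap_Id, Type, Receptacle, Occupant, Condition); it was not captured from a real host, so your Ap_Ids will differ. The loop only prints the commands it would run.

```shell
#!/bin/sh
# Print the Ap_Id of every attachment point whose Occupant column
# (4th field) reads "unconfigured". Expects "cfgadm -al" style output on stdin.
unconfigured_ap_ids() {
    awk 'NR > 1 && $4 == "unconfigured" { print $1 }'
}

# Invented sample listing, for illustration only.
sample_cfgadm() {
cat <<'EOF'
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    configured   unknown
c2::rmt/0                      tape         connected    unconfigured unknown
c2::rmt/1                      tape         connected    unconfigured unknown
EOF
}

# Dry run: show the configure command for each unconfigured entry.
sample_cfgadm | unconfigured_ap_ids | while read ap_id; do
    echo "cfgadm -c configure $ap_id"
done
```

On a live system you'd swap sample_cfgadm for the real "cfgadm -al" and, once the printed commands look sane, run them by hand.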

Both "tpconfig -d" and "tpconfig -dl" will give you back the same results as sgscan (although formatted differently and limited to the tape and TLD information) if the problem still hasn't been resolved. To save space and prevent carpal-knuckle syndrome, full versions of the output of these commands, as run against a working setup, are located at the bottom of yesterday's post as a series of in-page hyperlinks. The only things that will be different in your execution of:

host # tpconfig -d

and

host # tpconfig -dl

output will be that the drives will usually either show up as DOWN (possibly with an identifier, hcart2 for us, and a path like /dev/rmt/0) or you will get virtually no output at all ...yeah, I guess that's a "huge" difference :) If tpconfig returns a listing for you, this is positive, even if it shows your drives as "down." We won't go crazy yet, since we were going to run the next command, regardless:

host # vmoprcmd

Now we may get results that show "HOST STATUS" as <NONE> or, hopefully, ACTIVE (good to go!), ACTIVE-DISK (can do local disk backups), ACTIVE-TAPE (can back up to tape but, for some reason, can't back up to local disk) or even DEACTIVATED (either it's off or NetBackup thinks it is) or OFFLINE (same as the last, except substitute offline for off ;) Your drives will also show as either non-existent, UP, UP-TLD, RESTART or DOWN (perhaps a few others, but all of them self-explanatory). As long as the tape drive type (hcart2 for us) is shown, you're on the way.

And the final things we'll try today will be to react to the output produced for the tape drives. If your TLD is still not showing, that's something for tomorrow. If you see your drives in a DOWN state, but correctly identified as the type of drive they are, this will probably do the trick for you:

host # vmoprcmd -up 0
host # vmoprcmd -up 1


for the first (0) and second (1) instance of the drive, listed in the first "Id" column of "tpconfig -d". You can also do this, which is easier (at least for me) to remember, since you can directly map it from the vmoprcmd output without squinting ;)

host # vmoprcmd -upbyname Drive000
host # vmoprcmd -upbyname Drive001


from the vmoprcmd output in the "Drive Name" column, which also happens to be the first column in the "vmoprcmd" output.
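
If you have more than a couple of drives, a little loop saves some typing. This is a hedged sketch, not anything NetBackup ships: the function name up_drives and the DRY_RUN and VMOPRCMD variables are made up for this example, and the path /usr/openv/volmgr/bin/vmoprcmd is an assumption about where the binary lives on your server. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Assumed location of vmoprcmd; override VMOPRCMD if yours lives elsewhere.
VMOPRCMD=${VMOPRCMD:-/usr/openv/volmgr/bin/vmoprcmd}

# For each drive name given, print (DRY_RUN=1, the default) or run
# (DRY_RUN=0) one "vmoprcmd -upbyname" call.
up_drives() {
    for name in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$VMOPRCMD -upbyname $name"
        else
            "$VMOPRCMD" -upbyname "$name"
        fi
    done
}

# Drive names as they appear in our vmoprcmd "Drive Name" column.
up_drives Drive000 Drive001
```

Printing first and running second is just a way to double-check the drive names before anything changes state.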

When you're done with that, or if your tape drives show as RESTART, do yourself a favor and stop and start NetBackup. You may not get a chance once you let everyone know it's fixed. If you don't have other startup scripts set up, you can use:

host # /usr/openv/netbackup/bin/goodies/netbackup stop

then run:

host # /usr/openv/netbackup/bin/bpps -a

and, if everything is gone (unless you're running the GUI, in which case it's okay to not kill those PIDs), start 'er up again, like so:

host # /usr/openv/netbackup/bin/goodies/netbackup start

and do another "bpps -a" to make sure all of the appropriate daemons are running. Then, just to make yourself feel better, and so you're absolutely sure, do one more "sgscan." All should look as it did in yesterday's post (see link-back above) and you should be all set. At least, you'll be ready to test some backups and pray that your troubles are over ;)

We'll be back tomorrow to look at some ways to deal with really pernicious and aggravating software and OS failures. Until then,

Cheers,

, Mike






