Showing posts with label switch. Show all posts

Wednesday, October 1, 2008

How To Easily Find The WWN's Of A QLogic HBA On RedHat Linux

Hey there,

Today's post should be nice and simple. Maybe even short... yeah ;) This is a bit of a follow-up on a post we did a long long time ago regarding Linux networking tips. It's quite a bit more specific, but remains true to the spirit of that post (from December 2007, which, still, seems like it was just last year ;)

Today, we're going to take a look at a really simple way to figure out the World Wide Name (WWN) of both your Fibre NIC's port(s) and the switch it's connected to. To add a bit of clarity here, a lot of times you'll hear a QLogic (which is what we have on the menu today) Fibre NIC referred to as an HBA (Host Bus Adapter). It's become common to interchange the meaning of the two, although, technically, the HBA is an I/O adapter that resides "between" the computer's Bus (A collection of wires that data gets transmitted through. Not a clever acronym as far as I know ;) and the Fibre Channel Loop, and deals with the overhead associated with the transfer of information between the two. This becomes even more confusing when you consider that some Fibre NIC cards either have an HBA on board or act in the capacity of an HBA. But, in the end, you have to ask yourself just one question: If you can fix one a' them doohickies when it breaks, who cares? ;)

Anyway, back to the topic of the day: Finding the port and switch WWN's for a QLogic Fibre NIC on RedHat Linux. The version we're testing on today is:

hostess # cat /etc/issue
Red Hat Enterprise Linux ES release 4 (Nahant Update 6)
Kernel \r on an \m

hostess # uname -a
Linux hostess1 2.6.9-67.0.1.ELsmp #1 SMP Fri Nov 30 11:51:05 EST 2007 i686 i686 i386 GNU/Linux


The setup on our machine is a very simple QLogic Fibre Channel Card (which comes with its own on-board HBA), referred to by the operating system as qla2xxx (this translates to another meta-name in /etc/modprobe.conf, but since that name is an alias for this name, we'll use this one instead). In order to find out the WWN's for this card, all we need to do is follow this simple process:

1. Find the instance of the card's name in the /proc filesystem:

hostess # find /proc -type d -name qla2xxx
/proc/scsi/qla2xxx
<-- Note that we referenced a directory here. Normally, you wouldn't know this, but we did, so it made the search go faster by .001 seconds (according to the "time" command which is notorious for being off by a few thousandths of a second every now and again ;)

2. cd into the qla2xxx directory and do an "ls -l" and a "file" on "*" (not really necessary, but fun :)

hostess # ls -l *
-rw-r--r-- 1 root root 0 Sep 30 12:25 1
-rw-r--r-- 1 root root 0 Sep 30 12:25 2
-rw-r--r-- 1 root root 0 Sep 30 12:25 3
-rw-r--r-- 1 root root 0 Sep 30 12:25 4
hostess # file *
1: empty
2: empty
3: empty
4: empty


3. Now, you've probably noticed that the file sizes are all 0 and the output of the file command says "empty" for all of them. This is an illusion the /proc filesystem plays. You can prove this (the long way) by copying any file into, say, /tmp and running the same commands again:

hostess # cp 1 /tmp/DELETEME
hostess # ls -l /tmp/DELETEME
-rw-r--r-- 1 root root 1033 Sep 30 12:27 /tmp/DELETEME
hostess # file /tmp/DELETEME
/tmp/DELETEME: ASCII English text


...it's interesting to note, also, that running the "stat" command doesn't even get you the correct information within the /proc filesystem. The results are the same as above: The file's empty if it's in /proc, but has mass when it's copied or moved out.
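If you'd rather not copy anything, you can compare the size the directory listing reports with the number of bytes you can actually read. This is only a rough sketch: the function name apparent_vs_actual is made up for illustration, and it assumes a POSIX shell with ls, awk, wc and tr available.

```shell
#!/bin/sh
# Hypothetical helper (not part of any standard tool set):
# print "<size reported by ls> <bytes actually read>" for one file.
# For a regular file the two numbers match. For the /proc entries
# shown above, the first number was 0 while the content was not empty.
apparent_vs_actual() {
    f=$1
    # Field 5 of "ls -ln" is the size; -n keeps owner/group numeric.
    apparent=$(ls -ln "$f" | awk '{print $5}')
    # Read the file and count the bytes; strip padding some wc versions add.
    actual=$(wc -c < "$f" | tr -d ' ')
    echo "$apparent $actual"
}

# Example call (path taken from this post; adjust for your system):
#   apparent_vs_actual /proc/scsi/qla2xxx/1
```

The point of reading through wc is that it forces the kernel to generate the file's content, which is the only time a /proc entry like this really "has" a size.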

4. Now we can get our info. This egrep command pulls out all the salient information. There's a lot more output in each file and, depending upon your interests (or how bored you are right now ;), you might find a lot of it very helpful in a troubleshooting or performance-evaluation situation. First the command, then the explanation:

hostess # egrep 'QLogic|scsi-' *
1:QLogic PCI to Fibre Channel Host Adapter for QLA2342:
1:scsi-qla0-adapter-node=200000e08b12f98d;
1:scsi-qla0-adapter-port=210000e08b12f98d;
2:QLogic PCI to Fibre Channel Host Adapter for QLA2342:
2:scsi-qla1-adapter-node=200100e08b32f98d;
2:scsi-qla1-adapter-port=210100e08b32f98d;
2:scsi-qla1-target-0=50060e80039cab0a;
2:scsi-qla1-port-0=50060e80039cab0a:50060e80039cab0a:612c13:81;
3:QLogic PCI to Fibre Channel Host Adapter for QLA2342:
3:scsi-qla2-adapter-node=200000e08b18e575;
3:scsi-qla2-adapter-port=210000e08b18e575;
3:scsi-qla2-target-0=50060e80039cab1a;
3:scsi-qla2-port-0=50060e80039cab1a:50060e80039cab1a:612c13:81;
4:QLogic PCI to Fibre Channel Host Adapter for QLA2342:
4:scsi-qla3-adapter-node=200100e08b38e575;
4:scsi-qla3-adapter-port=210100e08b38e575;


The thing to notice above is that only two of the four files have a "scsi-qlaX-target-0" value set. This is because the "QLA2342" (qla2xxx's actual name, shown in the output) only has 2 ports (despite the four descriptors). You can actually read up more on the technical specs at QLogic's Official QLA2342 Spec Page. Notice that even "they" refer to the QLA2342 as an HBA, rather than a Fibre Channel NIC...

5. And, now, here's the information you're ultimately looking for. To find the WWN for each of the two "ports," check the egrep results for the files that came back with the word "target" in them. All four use the "port" keyword. You can grab this more succinctly off of the command line with:

hostess # grep -l "target-" *|xargs egrep -i 'QLogic|scsi' /dev/null
2:QLogic PCI to Fibre Channel Host Adapter for QLA2342:
2:Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
2:SCSI Device Information:
2:scsi-qla1-adapter-node=200100e08b32f98d;
2:scsi-qla1-adapter-port=210100e08b32f98d;
2:scsi-qla1-target-0=50060e80039cab0a;
2:scsi-qla1-port-0=50060e80039cab0a:50060e80039cab0a:612c13:81;
2:SCSI LUN Information:
3:QLogic PCI to Fibre Channel Host Adapter for QLA2342:
3:Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
3:SCSI Device Information:
3:scsi-qla2-adapter-node=200000e08b18e575;
3:scsi-qla2-adapter-port=210000e08b18e575;
3:scsi-qla2-target-0=50060e80039cab1a;
3:scsi-qla2-port-0=50060e80039cab1a:50060e80039cab1a:612c13:81;
3:SCSI LUN Information:


From the above, you'll know that the "adapter-port" line is the WWN of the port itself, and the "target-0" line is the WWN of the Fibre Switch it's connected to.
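If you need the port WWNs in the colon-separated form that switch and storage tools usually display, the steps above can be sketched as a couple of small shell functions. The names format_wwn and list_port_wwns are made up for this example, and the sketch assumes the files look like the egrep output shown in this post (one "scsi-qlaX-adapter-port=...;" line per file). Other driver versions may lay things out differently.

```shell
#!/bin/sh
# Hypothetical helper: put a colon between each pair of hex digits.
format_wwn() {
    echo "$1" | sed 's/../&:/g; s/:$//'
}

# Hypothetical helper: print "<instance> <port WWN>" for every file in
# the given directory. The default is the directory used in this post.
list_port_wwns() {
    dir=${1:-/proc/scsi/qla2xxx}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Expected line format: scsi-qla0-adapter-port=210000e08b12f98d;
        wwn=$(sed -n 's/^scsi-qla[0-9]*-adapter-port=\([0-9a-fA-F]*\);.*$/\1/p' "$f")
        if [ -n "$wwn" ]; then
            echo "$(basename "$f") $(format_wwn "$wwn")"
        fi
    done
}

# Example calls:
#   format_wwn 210000e08b12f98d
#   list_port_wwns
```

Taking the directory as an optional argument makes it easy to try the function against copies of the files (like the /tmp/DELETEME copy from step 3) before pointing it at /proc.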

Simple as binary PI ;)

Cheers,

, Mike





Tuesday, July 1, 2008

Using Strings To Safely Get Program Usage Information On Linux And Unix

Hey There,

We've posted quite a bit about the "strings" command in various past-posts running the gamut from using strings to extract RPM header information to using the basic strings construct in C to make running shells on network sockets possible. Today we're going to take a look at the "strings" command in an entirely new light.

Imagine that you were tasked with running a particular command named, for the sake of argument, BLARG. Unfortunately, in our manufactured situation, BLARG has no man page, and searches for it in Google and other search engines turn up no useful information. Also, your boss just said that you needed to run it, and left it at that, with no further instruction (he also can't be reached. What's wrong with this guy? ;) BLARG is also a compiled binary.

Your basic inclination might be to just run it without any arguments, as many commands (like "mkdir") will give you the usage information you need if you use this method, like so:

host # mkdir
usage: mkdir [-p] [-m mode] dirname ...


However, lots of other programs don't, so it's not the wisest choice. Remember that BLARG could potentially be a very harmful program. Running it without arguments may destroy things you can't afford to lose.

Other options you have include (but are not limited to) the following, coupled with their undesirable possible outcomes:

1. You could give the command a bogus switch line, like "BLARG -xKECVDSLdlske" : Assuming that that command line is indeed bogus, lots of programs silently ignore bogus switches and run their default instructions anyway.

2. You could cat the command : This will probably just turn your terminal output into Chinese. Even if you redirect standard error to /dev/null, odds are standard output is going to include a lot of funky characters that might cause more harm than good. You might also note that, a lot of the time, the usage message is printed to standard error and not standard output!

3. You could use eval to run the program, like "eval BLARG" : Unfortunately, even though it may seem counterintuitive, eval doesn't inspect anything. It just assembles its arguments into a command line and has the shell execute it, so BLARG gets run all the same.

4. You could use commands like crash to get the information : This can be a great way to find out the information you need. By typing "crash -h BLARG" you should, theoretically, get a dump of all the help information you need. Unfortunately, not all distros of Linux and Unix include it by default and not all distros' versions of crash operate the same. Some require you to be proficient in running a debugger against a dump file, afterward. Way too much hassle.

So far, we've gone through about 5 options, going from worse to better. There are probably a lot more than I'm thinking up here as I type (email them to me at eggi@comcast.net with comments if you'd like, as I'd love to do a follow-up to this post with more of that kind of information).

One way I've found that is virtually foolproof, and works in every distro I've tested, is to use the "strings" command to extract usage information. If you've ever used strings before, you know that distilling what it spits out when you run it against a command to a universally acceptable output of help information for any and/or all binaries is next to impossible. The Linux version of the crash command comes much closer to doing this, and doing it better. But, for the rest of us (even those without the privilege to run "crash"), we can still get the information we need using "strings", like so:

host # strings BLARG 2>/dev/null|egrep -i 'usage|help' <-- Note that strings generally requires the fully qualified name of the binary, like /bin/BLARG or ./BLARG
usage: %s [-abcdefGHIJKv] [file ...]

and you can even add the common "%s" printf format specifier to your egrep if you want to get all the lines that might contain useful help information, if you're not sure that the usage message is limited to a single line of output. This has the side effect of, sometimes, making the output a little messy, although (as some of you may have noted) the above usage display (while better than nothing) doesn't really help you. You'll probably be right 99% of the time if you guess the -v flag stands for verbose or version, but you never know. Using strings and grabbing all the lines with %s can provide more insight, if not a more distracting view of the binary's guts (of course, this output is from another command entirely ;)

host # strings BLARG 2>/dev/null|egrep -i 'usage|help|%s'
%s: %s
%s: directory causes a cycle
%s %*u %-*s %-*s
ls: %s: %s
%s/%s
usage: %s [-abcdefGHIJKv] [file ...]
%ld%s-blocks
%s: unknown blocksize
%s: minimum blocksize is 512
%s:
%s: %m
netgroup: Cycle in group `%s'
%s.%s
(%s,%s,%s)
option requires an argument -- %s
unknown option -- %s
stack overflow in function %s
%.3s %.3s%3d %2.2d:%2.2d:%2.2d %s
%H:%M:%S
%a %b %e %H:%M:%S %Z %Y
%I:%M:%S %p
%s/%s.%d
YP server for domain %s not responding, still trying
<; errno = %s
%s: %s - %s
%s/bt.XXXXXX
%s/_hash.XXXXXX


Worst case, you can just run something like:

host # strings BLARG >OUTPUT 2>&1

and safely cruise the lines of text in the OUTPUT file to manually find what you need. You may have to ;)
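If you find yourself doing this a lot, the strings-and-egrep approach can be wrapped in a small function. This is only a sketch: the name find_usage is made up for this example, the optional second argument is our own addition for passing an extra pattern such as '%s', and it assumes your strings supports the -a flag (the GNU and BSD versions both do) to scan the whole file.

```shell
#!/bin/sh
# Hypothetical helper: show strings in a binary that look like usage or
# help text, without ever executing the binary itself.
#   $1 = path to the binary (for example ./BLARG)
#   $2 = optional extra egrep pattern to match as well (for example '%s')
find_usage() {
    bin=$1
    pattern="usage|help${2:+|$2}"
    # -a tells strings to scan the entire file, not just some sections.
    strings -a "$bin" 2>/dev/null | grep -E -i "$pattern"
}

# Example calls:
#   find_usage ./BLARG
#   find_usage ./BLARG '%s'
```

Since nothing here runs BLARG, the worst that can happen is an empty result, in which case the manual trawl through the OUTPUT file is still your fallback.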

In any event, you've got a great tool at your disposal to find out what you need to know the hard way. And, sometimes, that's the only way to be absolutely sure :)

Cheers,

, Mike

Tuesday, February 26, 2008

Offlining, Failing Over And Switching in VCS

Hey there,

Today we're going to address a question that's asked commonly enough by folks who use Veritas Cluster Server: What's the difference between offlining, failing over and switching when dealing with service groups across multiple nodes? That doesn't seem like a very common question. I'm sure it's probably hardly ever phrased that way ;)

Anyway, to the point, all three options above are useful (hence your ability to avail yourself of them) and sufficiently distinct that you should be sure you're using the one you want to, depending on what ends you wish to achieve. All of this information is fairly general and should work on VCS for Linux as well as Unix.

We'll deal with them one by one, outlining what they basically do, with a short example command line to demonstrate, where applicable. None of this stuff is hard to pick up and run with, as long as all the components of your VCS cluster are set up correctly. Occasionally, you may see errors if things aren't exactly perfect, like we noted in this post on recovering indefinitely hung VCS resources.

1. Offlining. The distinction to be made here, most plainly, is that, when you offline a resource or service group, you are "only" doing that. This differs from failover and switching in that the service group you offline with this particular option is not brought online anywhere else as a result. So, when you execute, for instance:

host # hagrp -offline YOUR_SERVICE_GROUP -sys host1

that service group, and its resources, generally, are taken offline on host1, and nothing else happens. If you're operating in an environment where systems don't run service groups concurrently (active/active), you will have effectively "shut down" that service group, and any services it provides, for the entire cluster.

2. Failing Over. This is more of a concept than any particular command. When you have your cluster set up to fail over, if a resource or service group, etc., goes offline on one node (host1, for instance) and it wasn't brought offline on purpose, VCS will naturally attempt to bring it online on the next available node (listed in your main.cf configuration file). Needless to say, if only one host is listed for a particular service group, its failure on one host will mean the failure of the entire service group. It also obviates the use of VCS in the first place ;)

3. Switching. This is what most folks want to do when they "offline" a service group, as in point 1. Although, since VCS automatically switches unexpectedly offlined resources on its own (when it's set up to), it's reasonable for someone new to the product to assume that offlining a service group would engage VCS in a switching activity. Unfortunately, this isn't the case. If you want to switch a service group from host1 to host2, for example, these would be the command options you'd want to give to hagrp:

host # hagrp -switch YOUR_SERVICE_GROUP -to host2 <--- Assuming you're running this from host1.
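Since the offline/switch mix-up is so easy to make, one option is a tiny wrapper that prints the hagrp command it would run before you commit to it. This is just a sketch under our own assumptions: the function name vcs_group and the DRY_RUN variable are invented for this example, and only the two hagrp invocations shown above are used.

```shell
#!/bin/sh
# Hypothetical wrapper around the two explicit actions described above.
#   vcs_group offline GROUP HOST  -> take GROUP offline on HOST only
#   vcs_group switch  GROUP HOST  -> move GROUP over to HOST
# By default the command is only printed. Set DRY_RUN=0 to really run it.
vcs_group() {
    action=$1
    group=$2
    host=$3
    case $action in
        offline) cmd="hagrp -offline $group -sys $host" ;;
        switch)  cmd="hagrp -switch $group -to $host" ;;
        *)
            echo "usage: vcs_group offline|switch GROUP HOST" >&2
            return 2
            ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# Example calls:
#   vcs_group switch YOUR_SERVICE_GROUP host2
#   DRY_RUN=0 vcs_group offline YOUR_SERVICE_GROUP host1
```

Defaulting to a dry run is a deliberate choice here: seeing "-offline ... -sys" when you meant "-switch ... -to" is a lot cheaper on the screen than on the cluster.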

Hopefully this little guided mini-FAQ helped out with differentiating between the concepts. If you find the command line examples valuable, even better :)

Happy failing! ;) Over.


, Mike