Wednesday, February 13, 2008

Prtdiag And "Lane Width Failed" Errors On Solaris 10

A lot of admins who work with Solaris 10 Zones may have run into this situation already, but I'm just starting to hear about it now from users who like to do their own diagnostics before coming to me with a possible system issue. Don't get me wrong, folks; I love it when users show this kind of initiative (as long as they don't go into the data center and start pressing buttons ;)

This problem is similar to the problem Solaris 10 has with old style /usr/ucb/ps, but it isn't quite as prevalent.

The situation is this: a user, trying to gather information on a system that may or may not be having a hardware or software issue, runs a pretty standard command called "prtdiag," probably like this:

host # /usr/platform/`uname -i`/sbin/prtdiag

Now, in Solaris 10, this doesn't always cause an error. The reason for this, and how it differs from the old style ps error, is that it only occurs when a non-root user runs prtdiag on Solaris 10 with Zones enabled.

Another thing that makes this unusual error so rare is that, on most Zone-enabled Solaris 10 setups, user accounts are all set up in the non-global Zone and prtdiag will only run in the global Zone. Obviously, the "rules" aren't followed all the time, insofar as system setup and access are concerned. Like they say: the customer is always right, even if he's doing something he probably shouldn't be (or something like that ;)
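If you're not sure which Zone you (or your user) are actually sitting in, a quick check before running prtdiag can save some head-scratching. The following is only a rough sketch: prtdiag_hint is a made-up helper name, and it assumes the zonename command that comes with Solaris 10 Zones is on your PATH.

```shell
# Hypothetical helper: turn a zone name into a hint about where prtdiag will run.
prtdiag_hint() {
    case "$1" in
        global) echo "global zone: prtdiag can run here" ;;
        "")     echo "zone unknown: zonename gave no output" ;;
        *)      echo "non-global zone: run prtdiag from the global zone" ;;
    esac
}

# zonename prints the name of the current zone; any error output is discarded.
prtdiag_hint "`zonename 2>/dev/null`"
```

If this reports the global Zone and you aren't root, you're in the spot where the error described below can show up.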

So far, I've only seen this error on the Mx000 Series Servers from Sun, but that's probably because we use those the most. Generally, the error will present itself in some way similar to the following (it looks scarier than it is, and the actual error may vary):

host # /usr/platform/`uname -i`/sbin/prtdiag <-- Stripping the 100 lines preceding the error in my ongoing effort to fight eye-strain ;)

...
IO Lane/Frq
LSB Type LPID RvID,DvID,VnID BDF State Act, Max Name Model
--- ----- ---- ------------------ --------- ----- ----------- ------------------------ ------------------
Logical Path
------------
Getting lane width failed for path /pci@3,800000/SUNW,emlxs@0


And, again, just like our ps error on Solaris 10, running the command as root makes everything work just fine:

root@host # /usr/platform/`uname -i`/sbin/prtdiag

...
IO Lane/Frq
LSB Type LPID RvID,DvID,VnID BDF State Act, Max Name Model
--- ----- ---- ------------------ --------- ----- ----------- ------------------------ ------------------
Logical Path
------------
00 PCIe 3 1, fc21, 10af 1, 0, 0 okay 4, 4 SUNW,emlxs-pci10af,fc21 LPe110094-S
/pci@3,800000/SUNW,emlxs@0


Sun's stock answer, for now, is to change the permissions of the prtdiag command so that it runs setuid root. (For those of us who are reading this and don't know what that means, a very small portion of the audience that bothered to read this far, I'm sure: when a program is setuid "username," it will run as that user, with that user's privileges, no matter what user actually executes it.)

root@host # chmod 4755 /usr/platform/`uname -i`/sbin/prtdiag

or, if you prefer to change your file modes symbolically:

root@host # chmod u+s /usr/platform/`uname -i`/sbin/prtdiag
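If you want to convince yourself that the two forms really do the same thing before touching prtdiag, you can try them on a throwaway file. This is just a sketch; it assumes mktemp is available on your system, and the variable names are only for illustration.

```shell
# Use a scratch file so nothing on the real system is changed.
scratch=`mktemp`

# Symbolic form: start from 755, then add the setuid bit.
chmod 755 "$scratch"
chmod u+s "$scratch"
symbolic_mode=`ls -l "$scratch" | cut -c1-10`

# Octal form: clear the setuid bit first, then set everything in one go.
chmod u-s "$scratch"
chmod 4755 "$scratch"
octal_mode=`ls -l "$scratch" | cut -c1-10`

# Both lines should show an "s" in the owner's execute position.
echo "symbolic: $symbolic_mode"
echo "octal:    $octal_mode"

rm -f "$scratch"
```

The same "chmod u-s" used in the middle of the sketch is how you would take the setuid bit back off prtdiag later, should you decide against leaving it on.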

Probably the best way to work around this, and keep with Sun's basic security requirement of not running programs like this setuid root, is to make use of a program called sudo (available for Solaris 10, though you may have to install it separately). Just include a rule like the one below, so that users can only run "/usr/platform/`uname -i`/sbin/prtdiag" straight up and can't run it with any additional flags or switches. One caveat: the sudoers file does not do shell command substitution, so in the real rule you need to replace `uname -i` with the literal platform name that command prints on your host. This will allow users to get the information they want and safeguard you, the admin, against any unforeseen issues with this workaround. Example rule below:

ALL ALL = (root) /usr/platform/`uname -i`/sbin/prtdiag ""

Another promise from me that, eventually, we will get to a post devoted entirely to sudo. For now, here's a quick rundown on how this rule reads:

ALL <--- All users can use this sudo rule
ALL <--- This command can be used on any host.
= <--- The cement that connects the preceding user and host restrictions with the commands and options to follow
(root) <--- This command will be run as the user root
/usr/platform/`uname -i`/sbin/prtdiag "" <--- This is the only allowed command. This command specifies that /usr/platform/`uname -i`/sbin/prtdiag can only be run with no additional switches or flags (like "-v"). The "" (double-double quotes) indicate that no switches are allowed after the command.
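Since sudoers won't expand the backticks for you, one way to get a literal rule is to let the shell build the line. The sketch below is hedged accordingly: prtdiag_sudoers_rule is a hypothetical helper, and it assumes that a comma in the path has to be written as "\," in sudoers (commas normally separate list entries there), so check the result with "visudo -c -f" on a scratch copy before trusting it.

```shell
# Hypothetical helper: print a sudoers line for prtdiag on the given platform name.
# Assumption: commas in the path must be escaped as "\," for sudoers.
prtdiag_sudoers_rule() {
    printf '%s\n' "$1" |
        sed -e 's/,/\\,/g' \
            -e 's|.*|ALL ALL = (root) /usr/platform/&/sbin/prtdiag ""|'
}

# Build the rule for the host this is run on.
prtdiag_sudoers_rule "`uname -i`"
```

With the rule in place, a user would run prtdiag as "sudo /usr/platform/`uname -i`/sbin/prtdiag"; on the command line the shell does expand the backticks, so nothing changes there.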

Hopefully, and in all likelihood, this solution should keep everyone happy. You've kept the security flaw from being exploited, retained the original permissions on /usr/platform/`uname -i`/sbin/prtdiag and allowed users to be able to get their diagnostic output.

All that's left to do is thank your user base for helping make your job easier :)

, Mike