Hey there,
I thought I'd start this week out with a yawn ...I mean a bang ;) This is a topic we've never touched on, but one that is used very often in most large computer networks: IP multi-pathing. Simply put, it means giving the "outside world" more than one path to your networked server. The closest we've come, to date, would be our post on SunCluster monitoring. I suppose that could be chalked up to the fact that we're concentrating on a particular flavour of Unix, while most of my Linux postings are (or attempt to be) broad-based and satisfy as many distros as possible.
In any event, here's a quick primer on getting IPMP set up on your Solaris host with the minimum amount of hardware and hassle. Quick and easy (I think ;)
1. Why should I set up IPMP? You don't have to. It's not a requirement of anything except SunCluster (assuming you want to get your setup officially certified - otherwise you can hack your way around that, too). The main benefit to you is that you'll have the comfort of knowing that users will still be able to connect to your host over the network even if one of your network cards goes to pot. And, of course, you won't have to lift a finger to make things go back to the way they were once the disaster is over. Simply put: You'll have network failover working for you in case of a network card failure. The user will never know you had an issue, as they will always be accessing your host via the same IP address.
2. What is required to use IPMP? Generally, you'll want to have as many failover points as possible (or reduce the number of network single-points-of-failure). This would mean having two separate network adapters. This can be done with one (with failover happening between virtual interfaces on the same NIC), but to avoid a single point of failure, you should have two different NICs, each residing on a different physical bus. It's good practice to actually have those network cards hooked up to the network and the links verified before proceeding. Also, you should do most of this through a serial console or ALOM connection. Since you're dealing with networking failure, if you connect via a regular network connection to any IP on the host, there's a good chance you'll be dropped unexpectedly at some point during the process.
SPECIAL NOTE: IPMP, on its own, is not meant to protect you from an entire network segment going down. It will handle failover between NICs, but they all need to be on the same network segment, or subnet, so if the network goes down (10.10.10.0/24 in our example), you're still going to be offline.
3. Is it easy to set up IPMP on Solaris? Yeah :) Here's how:
4. At the PROM level, be sure to set the local-mac-address? variable to true
ok > setenv local-mac-address? true
You can also set this at the OS level, using the eeprom command:
host # eeprom local-mac-address?=true
If you choose to make this change using eeprom at the OS level, you should reboot your box before proceeding.
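Either way, it's worth confirming the setting took before you move on. Here's a minimal sketch, assuming eeprom prints the variable in its usual name=value form. The helper name local_mac_is_true is made up for this example:

```shell
# Hypothetical helper: succeeds only if the line passed in is exactly
# the "name=value" form that eeprom prints when the variable is true.
local_mac_is_true() {
    [ "$1" = "local-mac-address?=true" ]
}

# On the Solaris host (eeprom is not available elsewhere):
#   local_mac_is_true "`eeprom 'local-mac-address?'`" && echo "good to go"
```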
5. Install the required pkg files: This step is really easy, since the in.mpathd binary comes in the SUNWcsr (Core Solaris) pkg file, which your system won't run without. Hopefully it's already installed ;)
6. Alter the FAILURE_DETECTION_TIME value in the /etc/default/mpathd configuration file from 10000 milliseconds to 3000 or 4000 milliseconds. This isn't necessary, but it will drop the failure detection time well below 10 seconds, which might save you from having to answer any questions if you experience an unexplainable split-second network "burp" - of course, use this to your taste. Setting it too low may cause your virtual interfaces to flap back and forth constantly!
7. In your /etc/hosts file, include information for the "floating IP" (the one everyone else will use to connect to your system) and the two physical IPs. This isn't absolutely necessary, but it can be helpful later on if you happen to forget what's what on any given machine.
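If you'd rather script that change than open an editor, something like the following should do it. This is only a sketch: set_detection_time is a made-up helper name, and it assumes the file holds a plain FAILURE_DETECTION_TIME=value line, as the stock file does.

```shell
# Hypothetical helper: rewrite FAILURE_DETECTION_TIME in an mpathd-style
# config file. $1 = path to the file, $2 = new value in milliseconds.
set_detection_time() {
    conf=$1
    ms=$2
    tmp="$conf.new.$$"
    # Solaris sed has no in-place option, so write a copy and swap it in.
    sed "s/^FAILURE_DETECTION_TIME=.*/FAILURE_DETECTION_TIME=$ms/" "$conf" > "$tmp" &&
        cp "$tmp" "$conf" &&
        rm -f "$tmp"
}

# On the host, after backing up the original:
#   cp /etc/default/mpathd /etc/default/mpathd.orig
#   set_detection_time /etc/default/mpathd 3000
```

If in.mpathd is already running, it should pick up the new value after a "pkill -HUP in.mpathd"; a reboot works too.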
Ex:
10.10.10.1 hostname-phys1
10.10.10.2 hostname-phys2
10.10.10.3 virtualhost
8. Modify the /etc/hostname.* files so that they include the proper information (this is where the meat of the configuration is done, in my opinion. If it's even up for a vote ;)
Ex:
/etc/hostname.hme0 (contents)
hostname-phys1 group ipmpgroup netmask + broadcast + deprecated -failover up
addif virtualhost netmask + broadcast + up
/etc/hostname.qfe0 (contents)
hostname-phys2 group ipmpgroup netmask + broadcast + deprecated -failover standby up
NOTE: You can, in our example, set the /etc/hostname.qfe0 file to be exactly the same as /etc/hostname.hme0 (with the only difference being the "hostname-phys2" at the beginning of the entry) and it will work just as well. If you are using Veritas Cluster Server to do the failing over for you, it will not work unless you have one of your failover NICs set to "standby" rather than "up".
9. Add both of your physical interfaces to an IPMP group (named ipmpgroup in this case; it can be anything you want) using ifconfig. You could also have added the NICs to the group before creating your /etc/hostname.* files, but, as long as we haven't started this baby up yet, the exact order doesn't matter:
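Since the /etc/hostname.* files refer to names rather than addresses, a typo there won't show itself until the interface fails to come up. Here's a rough sanity check, assuming the file layout from the examples above (a host name as the first word of a line, or as the word right after "addif"). The function name check_ipmp_names is made up, and on Solaris you'd want nawk or /usr/xpg4/bin/awk, since the old /usr/bin/awk doesn't understand -v:

```shell
# Hypothetical helper: confirm every host name used in the hostname.*
# files also appears in the hosts file.
# $1 = hosts file, remaining arguments = hostname.* files
check_ipmp_names() {
    hosts=$1
    shift
    rc=0
    for f in "$@"; do
        # The name is the first word, or the second when the line
        # starts with "addif".
        for name in `awk '{ if ($1 == "addif") print $2; else print $1 }' "$f"`; do
            # Look for the name in any column after the IP address,
            # skipping comment lines.
            if awk -v n="$name" '/^#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$hosts"; then
                echo "ok: $name"
            else
                echo "MISSING from $hosts: $name"
                rc=1
            fi
        done
    done
    return $rc
}

# On the host:
#   check_ipmp_names /etc/hosts /etc/hostname.hme0 /etc/hostname.qfe0
```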
host # ifconfig hme0 group ipmpgroup
host # ifconfig qfe0 group ipmpgroup
10. Activate: You can do this one of two basic ways (I'm sure there are more if you're creative about it ;) Reboot your machine or make the interfaces active manually. To activate manually, all you really need to do is feed the contents of the two /etc/hostname.* files to ifconfig as arguments for the matching interface (hme0 or qfe0, plumbed first if it isn't already), one after the other. If you have two lines of input in either file, try to fit them all onto one line when executing from the command line.
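To take the guesswork out of the manual route, here's a small dry-run sketch. It only prints the ifconfig commands built from a hostname.* file; it doesn't run them. The function name print_activation is made up, and joining everything onto one line relies on the shell's word splitting, so it assumes the files contain no wildcard characters:

```shell
# Hypothetical helper: print (do not run) the commands that would
# activate one interface from its hostname.* file.
# $1 = interface name, $2 = path to that interface's hostname.* file
print_activation() {
    ifname=$1
    file=$2
    echo "ifconfig $ifname plumb"
    # The unquoted backticks collapse every line of the file into a
    # single space-separated argument list.
    echo "ifconfig $ifname" `cat "$file"`
}

# On the host, review the output before pasting it at the console:
#   print_activation hme0 /etc/hostname.hme0
#   print_activation qfe0 /etc/hostname.qfe0
```

If an interface is already plumbed, the plumb command will just complain and can be skipped.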
11. Test: Run a continuous ping against your "floating IP" (10.10.10.3 in our case), and start pulling cables and resetting them. Do this one at a time, of course. If you remove both physical network connections at once, your machine will be off the net :)
You can also use the if_mpadm command to help with testing, if you prefer to "virtually" pull the plug on your physical interfaces. For instance "if_mpadm -d INTERFACE_NAME" will disable an interface, just as if you pulled the cable out. "if_mpadm -r INTERFACE_NAME" will reset the interface to its "natural" state (however you have it set up; even if that's wrong ;) Check out the if_mpadm man page for more information on this command, although there's not much more to it.
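If you want to rehearse the "virtual" cable pull from the console, the sequence can be wrapped up roughly like this. It's a sketch with made-up names (failover_test, and the IF_MPADM and PING variables), and it pings outward from the host to some other machine on the subnet, such as your router, because pinging the floating IP from the host itself proves nothing. It complements the continuous ping from another box; it doesn't replace it:

```shell
# The commands are held in variables so the sequence can be rehearsed
# with stand-ins before touching a real interface.
IF_MPADM=${IF_MPADM:-if_mpadm}
PING=${PING:-ping}

# Hypothetical helper.
# $1 = interface to disable, $2 = address of another host on the subnet
failover_test() {
    ifname=$1
    addr=$2
    # "Pull the plug" on the interface; give up if that fails.
    $IF_MPADM -d "$ifname" || return 1
    # The host should still reach its neighbour through the other NIC.
    if $PING "$addr" > /dev/null 2>&1; then
        result=0
    else
        result=1
    fi
    # Put the interface back no matter how the ping went.
    $IF_MPADM -r "$ifname"
    return $result
}

# From the serial console, one interface at a time:
#   failover_test hme0 ROUTER_IP && echo "survived losing hme0"
#   failover_test qfe0 ROUTER_IP && echo "survived losing qfe0"
```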
If you're interested in jumping forward and getting into more advanced IPMP configurations, you can also check out the Sun Documentation Site and read the IP Network Multipathing Administration Guide at your leisure.
Cheers,
Mike