Tuesday, January 27, 2009

Adding NFS Management To A VCS Cluster With No Downtime

Hey there,

Today we're following up on yesterday's post regarding adding NFS management to an existing VCS cluster. In that post, we took care of adding the resource quickly, although the method did require us to take down VCS on all of the nodes in the cluster. The distinction to be made is that we did it by bringing down VCS only (using "hastop -force") and not VCS and every resource it managed (using "hastop" straight up). In this way, although we temporarily lost HA/Fault Tolerance/Failover, none of our managed applications ever stopped, so, theoretically, no one else was the wiser.

Today we're going to take a look at the very same subject (before my aging mind wanders and I forget that I promised to follow up and/or change my Depends. I still remember that I owe you a few "Waldos," so I'm not completely gone yet ;), although, this time, we won't be bringing down VCS "and" we won't be bringing down the services and resources that it manages. That's assuming we don't screw up :)

To reiterate the basic assumptions set in yesterday's post, for today's purposes, we're going to assume (per the title) that we're going to add an NFS resource to an existing VCS cluster on Linux or Unix (If you need extra help with SMF on Solaris 10, check yesterday's NFS-VCS post). We're also going to assume that we're dealing with a two-node cluster since that's easier to write about and cuts down on my over-explanation. It also means that I basically just had to slightly rewrite this exact same paragraph today ;) Also, we've disabled all NFS services on the host OS (meaning that we've stopped them and removed them from init scripts so that the OS won't try to manage them itself! A simple "ps -ef|grep nfs" check before you proceed should show you any remaining running NFS programs that you should kill).
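If you'd rather not have grep match itself in that check, here's a minimal sketch of a filter you could pipe "ps -ef" through. The function name (nfs_leftovers) is made up for this example, and the list of daemon names is an assumption about what your particular OS runs, so adjust to taste:

```shell
#!/bin/sh
# Filter a process listing (read from stdin) down to likely NFS daemons.
# The daemon names in the pattern are an assumption; adjust them for your OS.
nfs_leftovers() {
    grep -E '(nfsd|mountd|statd|lockd)' | grep -v grep
}

# Typical use on a live host:
#   ps -ef | nfs_leftovers
```

No output means nothing matching the pattern is still running, which is what we want before handing NFS over to VCS.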

Yesterday, we took care of phase 1 (The simple way according to me). Today, we'll forge boldly ahead and turn the technical-writing world on its ear by starting a numbered list with 2 ;)

2. The somewhat-less-simple way (according to me :)

a. We won't be bringing down VCS on any nodes today, but, as with yesterday, it's good practice to do all of your modifications on whatever node you choose to be your primary for the exercise and to make backups of the main.cf file (even though, using today's method, VCS should do this for you). Opening up the configuration file read/write on one node and running the VCS commands to modify its contents on another is highly discouraged. Luckily, doing that sort of thing contradicts common sense, so it's an easy mistake to avoid ;) It's also best to do this work on your primary node (the node that all the others have built their main.cf's from), although it isn't absolutely necessary (Read: could cause issues that don't need to possibly exist ;)
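For the manual backup, something like the following would do. This is only a sketch: the function name (backup_maincf) is hypothetical, and the default path is the usual home of main.cf, which you should treat as an assumption and confirm on your own nodes:

```shell
#!/bin/sh
# Copy a config file to a timestamped backup beside it and print the backup's path.
# /etc/VRTSvcs/conf/config/main.cf is the usual location; verify it on your system.
backup_maincf() {
    src="${1:-/etc/VRTSvcs/conf/config/main.cf}"
    dest="${src}.$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dest" && echo "$dest"
}

# On the primary node:
#   backup_maincf
```

Passing a path as the first argument overrides the default, which is handy if you want to try it on a scratch copy first.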

So, to add NFS management to our running VCS cluster, on the command line, we'd do the following on the primary node (as a special side-note, I like to run "hastatus" in another window while I do this. Since that command, with no arguments, gives out real-time broad diagnostic information, it can be a great help if you make a mistake, since you'll clearly see the ONLINE/OFFLINE messages, etc). First we'll freeze the service group (SG1) to ensure that no faults occur while we mess with it, and open up the configuration file:

host # hagrp -freeze SG1 -persistent <-- the -persistent option is only necessary if you want the freeze to survive a reboot, which we do, just in case.
host # haconf -makerw

b. Now, we'll run all the commands necessary to add the NFS, NFSRestart and Share resources to our configuration, and set them up in the exact same fashion as we did yesterday. Note that we're not actually going to run commands to create the "types" since these are all already available and included in types.cf, but (perhaps in a future post) we can do this quite simply if we ever need to using "hatype" and "haattr." For now, it's getting in the way of me finishing this post before you fall asleep ;) For each step, I'll also preface the command line with the specific part of the main.cf file that we're building, for back-reference.

c. First, we'll create the NFS resource:

NFS NFS_server_host1 (
Critical = 0
)

from the CLI:

host # hares -add NFS_server_host1 NFS SG_host1 <-- SG_host1 is just the name of host1 in our service group SG1. hares needs to know what service group it's adding a resource to.
host # hares -modify NFS_server_host1 Critical 0
host # hares -modify NFS_server_host1 Enabled 1

d. Then, we'll create the Share resource:

Share SHR_sharename_host1 (
Critical = 0
PathName = "/your/nfs/shared/directory"
Options = "rw,anon=0"
)

from the CLI:

host # hares -add SHR_sharename_host1 Share SG_host1
host # hares -modify SHR_sharename_host1 Critical 0
host # hares -modify SHR_sharename_host1 PathName "/your/nfs/shared/directory"
host # hares -modify SHR_sharename_host1 Options "rw,anon=0"
host # hares -modify SHR_sharename_host1 Enabled 1

e. And then we'll add the NFSRestart resource:

NFSRestart NFS_nfsrestart_host1 (
Critical = 0
NFSRes = NFS_server_host1
LocksPathName = "/opt/VRTSvcs/lock"
NFSLockFailover = 1
)

from the CLI:

host # hares -add NFS_nfsrestart_host1 NFSRestart SG_host1
host # hares -modify NFS_nfsrestart_host1 Critical 0
host # hares -modify NFS_nfsrestart_host1 NFSRes NFS_server_host1
host # hares -modify NFS_nfsrestart_host1 LocksPathName "/opt/VRTSvcs/lock"
host # hares -modify NFS_nfsrestart_host1 NFSLockFailover 1
host # hares -modify NFS_nfsrestart_host1 Enabled 1
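Since all three resources follow the same add/modify/enable pattern, you could generate the commands instead of typing them. The helper below (gen_resource, a name invented for this sketch) only prints the hares commands so you can review them first; it doesn't touch the cluster:

```shell
#!/bin/sh
# Print (dry run) the hares commands needed to add one resource.
# Usage: gen_resource NAME TYPE GROUP [Attr=Value ...]
gen_resource() {
    name=$1; rtype=$2; group=$3
    shift 3
    echo "hares -add $name $rtype $group"
    for pair in "$@"; do
        attr=${pair%%=*}      # everything before the first "="
        value=${pair#*=}      # everything after the first "="
        echo "hares -modify $name $attr \"$value\""
    done
    # Enable last, after the attributes are set, as in the steps above.
    echo "hares -modify $name Enabled 1"
}

gen_resource SHR_sharename_host1 Share SG_host1 \
    Critical=0 PathName=/your/nfs/shared/directory Options=rw,anon=0
```

If the printed commands look right, you can paste them in, or pipe the output to "sh" once you trust it.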

f. Almost last, but possibly least, we'll set up the dependencies for the new resources:

NFS_nfsrestart_host1 requires hostip1
SHR_sharename_host1 requires MNT_mount1_host1
SHR_sharename_host1 requires NFS_server_host1

from the CLI:

host # hares -link NFS_nfsrestart_host1 hostip1
host # hares -link SHR_sharename_host1 MNT_mount1_host1
host # hares -link SHR_sharename_host1 NFS_server_host1
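The mapping from main.cf "requires" lines to "hares -link" commands is mechanical (parent first, child second), so here's a small sketch that does the translation for you. The function name (requires_to_links) is hypothetical, and it only prints the commands rather than running them:

```shell
#!/bin/sh
# Convert main.cf style "PARENT requires CHILD" lines on stdin
# into the matching "hares -link PARENT CHILD" commands (printed, not run).
requires_to_links() {
    awk 'NF == 3 && $2 == "requires" { print "hares -link " $1 " " $3 }'
}

requires_to_links <<'EOF'
NFS_nfsrestart_host1 requires hostip1
SHR_sharename_host1 requires MNT_mount1_host1
SHR_sharename_host1 requires NFS_server_host1
EOF
```

Lines that don't look like a three-word "requires" statement are silently skipped, so you can feed it a larger chunk of main.cf without cleaning it up first.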

You'll note that I set all of the new entries to non-critical (Critical = 0), since VCS's default is to make the resource critical, which would mean that, if we made a mistake, the NFS resource could cause a failover that we don't want if it doesn't work as expected. We'll reverse this condition once we know everything is good. One could argue that having a critical resource in a frozen service group is no big deal. Let the polemic begin ;)

g. Then we'll dump the configuration file (without locking it) so that the other nodes in the cluster will update as well, like so:

host # haconf -dump

Syntax checking isn't necessary using this method, since (if you mistype) you'll get your error messages right away. This makes running "hacf -verify" unnecessary. And, once we're done testing the basic failover to our satisfaction, by forced onlining and offlining using "hares -online/-offline" (we can't do normal failover since the service group is frozen, but it's better this way for reasons mentioned above), we're ready to put everything back to normal, set the resources as "Critical" (if we want to) and thaw VCS back out ;) Again, if you get an error about the "lock" file, just touch (create yourself) the file you named as your lockfile when you ran the "hares -modify NFS_nfsrestart_host1 LocksPathName" command:

host # touch /opt/VRTSvcs/lock <-- (optional)
host # hares -modify NFS_server_host1 Critical 1
host # hares -modify NFS_nfsrestart_host1 Critical 1
host # hares -modify SHR_sharename_host1 Critical 1
host # haconf -dump -makero
host # hagrp -unfreeze SG1 -persistent
<-- note that we need to use the "-persistent" option to unfreeze the service group here, since it was frozen that way. Failure to do this can result in interesting calls in the middle of the night ;)
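Before you walk away, it doesn't hurt to eyeball the new resources one more time. Here's a rough sketch using "hares -state" and "hares -value"; the wrapper function (check_resources) is a name I made up, and it assumes the hares binary is in your PATH on a cluster node:

```shell
#!/bin/sh
# Print the state and Critical flag for each resource named on the command line.
# Assumes the real hares command is available in PATH on a cluster node.
check_resources() {
    for res in "$@"; do
        echo "$res state: $(hares -state "$res")"
        echo "$res Critical: $(hares -value "$res" Critical)"
    done
}

# On a cluster node:
#   check_resources NFS_server_host1 NFS_nfsrestart_host1 SHR_sharename_host1
```

Anything that isn't online where you expect it, or a Critical value that doesn't match what you meant to set, is worth sorting out now rather than at 3 a.m.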

And, as infomercial-superstar Billy Mays might bellow like a wild African jungle beast jacked to the ceiling on Meth, "KABOOM!!! YOU'RE ALL SET!!!" Seriously, though, that guy really needs to calm down. The cleaning solution cleans; I get it. Please quit yelling at me ;) ...click on Billy's hyperlink if you need a good laugh and check out the "Gangsta Remix." :)

Believe it or not, there's an even simpler way to do this. I'll give it a rest for a while before I touch on this subject again.

You're welcome ;)


, Mike

