
Tuesday, April 28, 2009

DiskSuite/VolumeManager or Zpool Mirroring On Solaris: Pros and Cons

Hey There,

Today we're going to look at two different ways to mirror disks on Solaris (both free, but distinguished from freeware in that they're distributed for use with Solaris' commercial, and often proprietary, filesystems and OS).

The old way, probably every Solaris Sysadmin knows backward and forward: the Solaris DiskSuite set of tools (meta-whathaveyou ;), which was, at one point, renamed Solaris Volume Manager. The rename introduced some feature enhancements, but not the kind I was expecting; the name Volume Manager has a direct connection in my brain to Veritas, and the improvements weren't about coming closer to working seamlessly with that product.

The somewhat-new way (using the zpool command) won't work - to my knowledge - on any OS prior to Solaris 10, but with Solaris 8 and 9 reaching end of life in the not-too-distant future, every Solaris Sysadmin will have some measure of choice.

With that in mind let's take a look at a simple two disk mirror. We'll look at how to create one and review it in terms of ease-of-implementation and cost (insofar as work is considered expensive if it takes a long time... which leaves one to wonder why I'm not comparing the two methods in terms of time ;)

Both setups will assume that you've already installed your operating system and all required packages, and that the only task before you is to create a mirror of your root disk and have it available for failover (which it should be by default).

The DiskSuite/VolumeManager Way:

1. Since you just installed your OS, you wouldn't need to check if your disks were mirrored. In the event that you're picking up where someone else left off (and it isn't blatantly obvious - I mean "as usual" ;), you can check the status of your mirror using the metastat command:

host # metastat -p

You'll get errors because nothing is set up. Cool :)

2. The first thing you'll want to do is to ensure that both disks have exactly the same partition table. The same-ness has to be "exact," as in down to the cylinder. If you're off even slightly, you could be causing yourself major headaches. Luckily, it's very easy to make your second disk's layout (soon to be your mirror) exactly the same as your base OS disk's. You actually have at least two options:

a. You can run format, select the disk you have the OS installed on and type label (if format tells you the disk isn't labeled). Then select your second disk, type partition, type select and pick the number of the label of your original disk. A lot of times these labels will be very generic (especially if you just typed "y" when format asked you to label the disk, or format already did it for you during the install) and you may have more than one to choose from. It's simple enough to figure out which one is the right one, though (as long as you remember your partition map from the original disk and have made it sufficiently different from the default 2 or 3 partition layout): just choose select, pick one, then choose print. If you've got the right one, then type label. Otherwise, repeat until you've gone through all of your selections. One of them has to be it, unless you never labeled your primary disk. (There's a rough sketch of this menu walk just after option b, below.)

b. You can use two commands (prtvtoc and fmthard) and just get it over with:

host # prtvtoc /dev/rdsk/c0t0d0s2 |fmthard -s - /dev/rdsk/c1t0d0s2
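
And, going back to option (a) for a second, the menu walk looks roughly like this (prompts trimmed down; the disk number is just a placeholder):

host # format
Specify disk (enter its number): 1 <--- your second (soon-to-be-mirror) disk
format> partition
partition> select <--- pick the table number that matches your OS disk's layout
partition> print <--- double-check it, down to the cylinder
partition> label
partition> quit
format> quit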

3. Then you'll want to mirror all of your "slices" (or partitions; whatever you want to call them). We'll assume you have 6 slices set up (s0, s1, s3, s4, s5 and s6) for use and slice 7 (s7) partitioned with about 5 MB of space. You can probably get away with less; you just need that small slice so DiskSuite/VolumeManager has somewhere to keep track of itself.

First, you'll need to initialize the minimum number of state "databases" (metadbs), set up the mirror devices and add the primary disk's slices as the first submirrors in each mirror-set (even though, at this point, they're not mirroring anything, nor are they mirrors of anything ;) Note that it's considered best practice not to attach the secondary mirror slices to the mirror devices just yet, even though you could do it for some of your slices. You'll have to reboot to get root onto its metadevice anyway, so you may as well attach them all at once afterward and be as efficient as possible:

host # metadb -a -f /dev/rdsk/c0t0d0s7
host # metadb -a /dev/rdsk/c1t0d0s7
host # metainit -f d10 1 1 c0t0d0s0
host # metainit -f d20 1 1 c1t0d0s0
host # metainit d0 -m d10
host # metainit -f d11 1 1 c0t0d0s1
host # metainit -f d21 1 1 c1t0d0s1
host # metainit d1 -m d11
host # metainit -f d13 1 1 c0t0d0s3
host # metainit -f d23 1 1 c1t0d0s3
host # metainit d3 -m d13
host # metainit -f d14 1 1 c0t0d0s4
host # metainit -f d24 1 1 c1t0d0s4
host # metainit d4 -m d14
host # metainit -f d15 1 1 c0t0d0s5
host # metainit -f d25 1 1 c1t0d0s5
host # metainit d5 -m d15
host # metainit -f d16 1 1 c0t0d0s6
host # metainit -f d26 1 1 c1t0d0s6
host # metainit d6 -m d16


4. Now you'll run the "metaroot" command, which will add some lines to your /etc/system file and modify your /etc/vfstab to list the metadevice for your root slice, rather than the plain old slice (/dev/dsk/c0t0d0s0, /dev/rdsk/c0t0d0s0):

host # metaroot d0

5. Then, you'll need to manually edit /etc/vfstab to replace all of the other slices' regular logical device entries with the new metadevice entries. You can use the root line (done for you) as an example. For instance, this line:

/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /users ufs 1 yes -


would need to be changed to:

/dev/md/dsk/d6 /dev/md/rdsk/d6 /users ufs 1 yes -


and, once that's done you can reboot. If you didn't make any mistakes, everything will come up normally.

6. Once you're back up and logged in, you need to attach the secondary mirror slices. This is fairly simple and where the actual syncing up of the disk begins. Continuing from our example above, you'd just need to type:

host # metattach d0 d20
host # metattach d1 d21
host # metattach d3 d23
host # metattach d4 d24
host # metattach d5 d25
host # metattach d6 d26


The syncing work will go on in the background, and may take some time depending upon how large your hard drives and slices are. Note that, if you reboot during a sync, that sync will start over from 0% after the reboot; the affected primary mirror slices remain intact and the secondary mirror slices automatically resync. You can use the "metastat" command to check out the progress of your syncing slices.
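
If you're curious how far along a given sync is, a quick check like this will show you (each syncing submirror reports a "Resync in progress" percentage):

host # metastat | grep -i "resync in progress"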

And, oh yeah... I almost forgot this part of the post:

The Zpool way:

1. First you'll want to do exactly what you did with DiskSuite/VolumeManager (since both disks have to be exactly the same). We'll assume you're insanely practical, and will just use this command to make sure your disks are both formatted exactly the same (just like above):

host # prtvtoc /dev/rdsk/c0t0d0s2 |fmthard -s - /dev/rdsk/c1t0d0s2

2. Now we'll need to create a pool, add your disks to it (whole disks, rather than individual slices) and mirror them:

host # zpool create mypool mirror c0t0d0 c1t0d0

3. Wait for the mirror to sync up all the slices. You can check the progress with "zpool status POOLNAME" - like:

host # zpool status mypool
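
If a resilver is running (say, after you replace one half of the mirror down the road), the scrub line in that output is the one to watch, so you can narrow things down like this:

host # zpool status mypool | grep scrub <--- the scrub line reports resilver progress and completion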

And that's that. The choice is yours, unless you're still using Solaris 9 or older. This post isn't meant to condemn the SDS/SVM way. It works reliably and is really easy to script out (and when both of these methods are scripted out, they're just as easy to run and the only hassle the old way gets you is the forced reboot).
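
For what it's worth, here's the kind of scripting I mean for the SDS/SVM side - a bare-bones sketch that assumes the same d1X/d2X/dX metadevice naming and slice list from the example above, and does absolutely no error checking:

host # for s in 0 1 3 4 5 6
> do
> metainit -f d1${s} 1 1 c0t0d0s${s}
> metainit -f d2${s} 1 1 c1t0d0s${s}
> metainit d${s} -m d1${s}
> done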

It's good to see that things are getting easier and more efficient. Although, hopefully, that won't make today's Sysadmins tomorrow's bathroom attendants ;)

Cheers,

, Mike








Wednesday, July 2, 2008

Dealing With ZFS-Rooted Zones on Solaris 10 Unix

Hey there,

Today, we're going to take a look at a problem that's been haunting Solaris 10 (and, to a degree, OpenSolaris) for almost 3 years now. This ties back pretty closely to earlier posts we put out on migrating zones, patching local and global zones and working with ZFS filesystems, since it deals directly with a problem involving ZFS, Solaris 10 zones and one specific way in which they can be created.

Ironically, it would seem that the one way causing the most problems is the very way that should be the most desirable way to configure your setup.

Here's a little something to think about when considering creating zones with a ZFS filesystem. Although this quote, taken directly from Sun, is out of context here, it was actually put out there as a selling point (in its own context):

zones are integrated into the operating system, providing seamless functionality and a smooth upgrade path.

However, as many of you may be aware by now (it being July 2nd, 2008, with version 5/08 of Solaris 10 officially on the market), although creating bootable ZFS zones and zones with ZFS root filesystems is now finally possible (I believe it was originally introduced, in a small way, shortly before the official 6/06 release), it still suffers from some severe issues that may not be initially evident. That is to say, if you didn't do your homework before you took advantage of this seemingly great feature, you've probably gotten burned in one fashion or another with regard to the upgrade process. Per Sun, again:

Solaris 10 6/06 supports the use of ZFS file systems. It is possible to install a zone into a ZFS fs, but the installer/upgrader program does not yet understand ZFS well enough to upgrade zones that "live" on a ZFS file system.

Because of this (and repeating this ;) upgrading a system that has a zone installed on a ZFS file system is not yet supported. To this day (to my knowledge) the problem still hasn't been completely resolved. Again, from Sun's bug list (And, I know I sound like I'm Sun-bashing here, but I am coming to a positive point. I swear :)

zoneadm attach Command Might Fail (6550154)
When you attach a zone, if the original host and the new host have packages at the same patch level but at different intermediate patch histories, the zone attach might fail. Various error messages are displayed. The error message depends on the patch histories of the two hosts.
Workaround: Ensure that the original host and the new host machines have had the same sequence of patch versions applied for each patch.


Basically, the way things stand now, if you have a zone built on a ZFS root filesystem (rather than, say, UFS) and you need to upgrade, you officially have 3 options:

1. Be pro-active and "Don't do it!"

2. Go ahead and do it, but be sure to uninstall your zones before upgrading to a new release of Solaris 10, and then reinstall them when your upgrade to the new release is completed.

3. Go ahead and do it, but instead of following the more traditional upgrade-path, completely reinstall the system in order to perform the upgrade. This option makes the least sense, since reinstallation and upgrading aren't synonymous.

Now, for the rainbow after the storm. Yes, rainbows are somewhat illusory and their beauty isn't necessarily the matched-opposite of the horrors of nature you may have had to endure in order for it to bring you to that phenomenon, but it's a lot better than nothing, right ;)

The situation, as you may have guessed, is still pretty much up in the air, but there is hope, and in more than one area. For x86 (and possibly SPARC), an initiative is being fast-tracked by Sun for OpenSolaris/Solaris 10 (note that it's dated June 27th, 2008 :) It basically proposes a -b flag for zoneadm attach, to be used in conjunction with the -u flag, to allow patches to be backed out of a zone before an OS update. The full discussion, to date, is located here on opensolaris.org.

Why is this important?

As we noted above, the biggest problem Solaris 10 has with upgrading the OS on machines that have zones with ZFS roots is that every single patch and package must be the same after the upgrade for it to be considered successful, and Solaris 10's update software doesn't understand ZFS well enough to guarantee that patches installed in one zone will necessarily get installed in another (global vs. local, ZFS vs. UFS/VxFS, even ZFS vs. ZFS). If we were allowed to ignore certain patches and/or packages during our upgrades, the likelihood of failure might drop dramatically!
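
For what it's worth, if the proposal lands roughly as described, the invocation would presumably end up looking something like this (the zone name is made up, and the flag usage is my guess from the proposal - not a shipping feature as of this writing):

host # zoneadm -z myzone attach -u <--- bring the zone's packages/patches up to the new global zone's level on attach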

And here's one more ray of hope (which might be even better by the time you need to apply it). Here's how to upgrade your OS, assuming it has zones mounted on zfs roots, and (possibly) get away with not having to go to the extremes Sun is obligated to recommend. It's actually fairly simple. Mostly because it isn't guaranteed to work ;) The one good thing is that, if it doesn't work, you'll have your data saved off, so (if this procedure fails) you can still do it the hard way and not lose anything, except time, by trying :)

Do the following. We'll assume you've read our previous posts on migrating and patching both local and global zones and understand the basic system-down commands that would necessarily precede the following:

1. Halt and detach each of your zones that sits on a ZFS root (I'd personally do this for all of my zones; there's a quick loop for that after step 5, below), like so:

host # zoneadm -z ZONENAME1 halt
host # zoneadm -z ZONENAME1 detach


2. Export all of your zfs pools:

host # zpool export ZONENAME1 <--- Make sure that you note the names here, for the reverse process, just in case!
host # zpool export ZONENAME2

3. Perform your upgrade however you prefer.

4. Import all of your zfs pools

host # zpool import <--- If this doesn't work, use the names you specified during the export previously.

or

host # zpool import ZONENAME1
host # zpool import ZONENAME2


5. Reattach your zones using the -F flag to force the issue (of course, you can leave it out if you want, just to see what happens - sometimes forcing makes things work that get flagged as errors but aren't really problems; you can also use the -n flag to do a dry run), then boot them:

host # zoneadm -z ZONENAME1 attach -F
host # zoneadm -z ZONENAME1 boot
<--- Note that zoneadm's subcommands (halt, detach, attach, boot, etc.) operate on the zone you name with the -z flag, so repeat these for each of your zones.
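
Also, going back to step 1 for a second: if you want to halt and detach every non-global zone in one shot, a quick loop like this works (a rough sketch only - it assumes every zone listed really should come down, and does no error checking):

host # for z in `zoneadm list -vi | awk 'NR>1 && $2!="global" {print $2}'`
> do
> zoneadm -z $z halt
> zoneadm -z $z detach
> done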

And you should either be all set or have a more limited set of issues to deal with (probably mostly patch related).

Later on in the week (if we don't run out of screen space ;) we'll look at ways to troubleshoot a Solaris 10 zfs-root zone upgrade gone bad.

Until that bright and sunny day:)

, Mike

Friday, May 9, 2008

Destroying Storage Pools And File Systems On Solaris 10 Unix ZFS

Hey once more,

Today we're finally ready to wrap up our series of posts on using ZFS and storage pools on Solaris 10 Unix. We started out with a post on ZFS and storage pool creation commands, followed that up with a two-parter on maintenance and usage commands and even more helpful usage commands, and have finally arrived here, today.

This post marks the end in this little series, and, befittingly, it has to deal with destroying ZFS file systems and storage pools. It's wonderful how that worked out so appropriately ;)

I hope this series has been as much a help to you as it has been in crippling my fingers ;) Seriously, here's hoping we all got something out of it :)

Let the end begin:

1. If you decide you want to get rid of your storage pool (none of my business... ;), you can do so like this:

host # zpool destroy zvdevs
host # zpool list zvdevs
cannot open 'zvdevs': no such pool


2. You can also easily detach a device from a mirror in a storage pool by doing the following:

host # zpool detach zvdevs /vdev3
host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
mirror ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0

errors: No known data errors


3. In a Solaris 10 storage pool, you can't remove any active disks using "zpool remove." If you attempt to, it's not a big deal, because you'll just get an error, like this:

host # zpool remove zvdevs /vdev3
cannot remove /vdev3: only inactive hot spares can be removed


If you do need to pull an active disk out of the pool (as long as doing so isn't detrimental...), the polite way is to "detach" the virtual device, in the exact same way you'd detach it from a mirror, like so:

host # zpool detach zvdevs /vdev3
host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: resilver completed with 0 errors on Mon May 5 10:05:33 2008
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
mirror ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0

errors: No known data errors


4. Of course, you can easily remove vdevs from a storage pool if they're "hot spares" and they're "inactive." Here, we'll add a spare and then remove it:

host # zpool add zvdevs spare /vdev3
host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: resilver completed with 0 errors on Mon May 5 10:05:33 2008
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
mirror ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0
spares
/vdev3 AVAIL

errors: No known data errors
host # zpool remove zvdevs /vdev3
host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: resilver completed with 0 errors on Mon May 5 10:05:33 2008
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
mirror ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0

errors: No known data errors


5. If you want to remove ZFS file systems, just use the "destroy" option, shown below:

host # zfs destroy zvdevs/vusers4
host # zfs list|grep zvdevs
zvdevs 200M 784M 28.5K /zvdevs
zvdevs/vusers 24.5K 200M 24.5K /zvdevs/vusers
zvdevs/vusers@backup 0 - 24.5K -
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3


6. Of course, there's a catch to destroying ZFS file systems. It won't work straight off if the file system has "children" (Unix has always been a family-oriented OS ;)

host # zfs destroy zvdevs/vusers
cannot destroy 'zvdevs/vusers': filesystem has children
use '-r' to destroy the following datasets:
zvdevs/vusers@backup


In this instance, we either need to recursively delete the file system (with the -r flag), like so:

host # zfs destroy -r zvdevs/vusers
host # zfs list|grep zvdevs
zvdevs 100M 884M 27.5K /zvdevs
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3


or promote the filesystem so that it's no longer a clone and no longer dependent on its origin snapshot, at which point we can destroy it like this (If you recall, many moons ago - this gets confusing - we took a snapshot of zvdevs/vusers. Then we cloned the snapshot to a new file system, zvdevs/vusers4, in order to do a rollback... That's why we have to promote zvdevs/vusers4 to remove zvdevs/vusers):

host # zfs promote zvdevs/vusers4
host # zfs list|grep zvdevs
zvdevs 100M 884M 29.5K /zvdevs
zvdevs/vusers 0 884M 24.5K /zvdevs/vusers
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3
zvdevs/vusers4 24.5K 884M 24.5K /zvdevs/vusers4
zvdevs/vusers4@backup 0 - 24.5K -
host # zfs destroy zvdevs/vusers
host # zfs list|grep zvdevs
zvdevs 100M 884M 28.5K /zvdevs
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3
zvdevs/vusers4 24.5K 884M 24.5K /zvdevs/vusers4
zvdevs/vusers4@backup 0 - 24.5K -


7. If you've had enough and you want to destroy your storage pool, that's your call. Just remember the hierarchical nature of ZFS and don't get discouraged ;)

host # zpool destroy zvdevs

The only time this won't work is if your pool still contains file systems (the "not empty" error) or has file systems mounted. Simple variations on regular UFS-style commands (zfs mount and zfs unmount) can fix the mounting/unmounting issue for you. Destroying the individual ZFS filesystems in your pool will take care of the other issue (so will the "-f" flag, if you just want to use force and be done with it):

host # zpool destroy zvdevs
cannot destroy 'zvdevs': pool is not empty
use '-f' to force destruction anyway
Can't destroy a pool with active filesystems.
host # zfs mount zvdevs/vusers2
host # zfs unmount zvdevs/vusers2


Note that you may have to unmount filesystems overlay-mounted on top of the main file system. No problem. Again, you can use "-f" to save you the hassle.
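
For instance, reusing one of the file system names from the listings above:

host # zfs unmount -f zvdevs/vusers2 <--- the -f unmounts it even if it's busy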

And that's all she wrote about this subject... for now...

Have a great weekend :)

, Mike

Thursday, May 8, 2008

More Helpful Usage Commands For Solaris 10 Unix ZFS

Hey there,

Today we're here with part three of our four-part mini-series on ZFS and storage pool commands and tips. We started out with ZFS and storage pool creation commands, and followed that up with yesterday's post on maintenance and usage commands. Today's post is actually a continuation of yesterday's post, which was originally going to be part two of three. The inmates must be running the asylum ;)

For any notes pertaining to this entire four-post series, please refer to the first ZFS and storage pool post. I'm getting tired of hearing my own voice in my head ;)

Today, we'll keep on going with usage and maintenance tips. Enjoy and don't be afraid to come back for seconds :)

1. You can run statistics specifically against your ZFS storage pools (which can help in identifying a bottleneck on a system with a lot of virtual devices, zones and pools). These stats aren't too exciting since I just created the pool for this little cheat-sheet:

host # zpool iostat zvdevs 1 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zvdevs        92K  1016M      0      0     83    633   <--- As with regular iostat, this first line is a summary
zvdevs        92K  1016M      0      0      0      0
zvdevs        92K  1016M      0      0      0      0
zvdevs        92K  1016M      0      0      0      0
zvdevs        92K  1016M      0      0      0      0


2. If you ever need to list out your ZFS file systems, just use the "list" option, like this:

host # zfs list |grep zvdevs
zvdevs 120K 984M 26.5K /zvdevs
zvdevs/vusers 24.5K 984M 24.5K /zvdevs/vusers


3. One problem you might note with new ZFS file systems, when listing them out, is that they all have access to the entire storage pool. So, theoretically, any one of them could fill up the entire storage pool and leave the other two with no disk space. This can be fixed by setting "reservations" to ensure that each ZFS file system gets to keep a certain amount of space for itself no matter what (here, vusers gets at least 100 Megabytes, and vusers2 and 3 will get at least 50 Megabytes each):

host # zfs set reservation=100m zvdevs/vusers
host # zfs set reservation=50m zvdevs/vusers2
host # zfs set reservation=50m zvdevs/vusers3
host # zfs list -o name,reservation|grep zvdevs
zvdevs none
zvdevs/vusers 100M
zvdevs/vusers2 50M
zvdevs/vusers3 50M


4. On the other hand, you may want to restrict space-hogs from taking up too much space. You can do this with quotas, almost just like on older versions of Solaris Unix:

host # zfs set quota=200m zvdevs/vusers
host # zfs set quota=100m zvdevs/vusers2
host # zfs set quota=100m zvdevs/vusers3
host # zfs list -o name,quota|grep zvdevs
zvdevs none
zvdevs/vusers 200M
zvdevs/vusers2 100M
zvdevs/vusers3 100M


5. You can also set up file systems to use compression, selectively, like so (Whether or not this is worthwhile is debatable):

host # zfs set compression=on zvdevs/vusers2
host # zfs list -o name,compression|grep zvdevs
zvdevs off
zvdevs/vusers off
zvdevs/vusers2 on
zvdevs/vusers3 off


6. ZFS also has built-in snapshot capabilities, so you can prep yourself (like, before you're about to try something crazy ;) and be able to rollback to an earlier point in time (We'll call our snapshot "backup"), like this:

host # zfs snapshot zvdevs/vusers@backup
host # zfs list|grep zvdevs
zvdevs 200M 784M 28.5K /zvdevs
zvdevs/vusers 24.5K 200M 24.5K /zvdevs/vusers
zvdevs/vusers@backup 0 - 24.5K -
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3


7. Now, assuming that you screwed everything up so badly that you need to back out your changes (but not so bad that you can't access Solaris ;), you can use ZFS's built-in rollback capability. Here we'll do a rollback, based on our snapshot, and clone that to a separate ZFS file system, so we can move back whatever files we need to their original locations (Unfortunately, the extra cloning steps are necessary, as snapshots are not directly accessible and you also can't clone to a dataset that already exists!):

host # zfs rollback zvdevs/vusers@backup
host # zfs clone zvdevs/vusers@backup zvdevs/vusers
cannot create 'zvdevs/vusers': dataset already exists
<--- Didn't I just remind myself that I can't do this ;)
host # zfs clone zvdevs/vusers@backup zvdevs/vusers4
host # zfs list|grep zvdevs
zvdevs 200M 784M 29.5K /zvdevs
zvdevs/vusers 24.5K 200M 24.5K /zvdevs/vusers
zvdevs/vusers@backup 0 - 24.5K -
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3
zvdevs/vusers4 0 784M 24.5K /zvdevs/vusers4


8. Another cool thing you can do is to rename your file systems, like this (note that any attached snapshots will be renamed as well):

host # zfs rename zvdevs/vusers4 zvdevs/garbage
host # zfs list|grep zvdevs
zvdevs 100M 884M 28.5K /zvdevs
zvdevs/garbage 24.5K 884M 24.5K /zvdevs/garbage
zvdevs/garbage@backup 0 - 24.5K -
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3


9. You can also rename snapshots, if you don't want anybody (including yourself in a few months ;) to be able to easily tell what file system is a backup of what other file system, like this:

host # zfs rename zvdevs/garbage@backup zvdevs/garbage@rollbacksys
host # zfs list|grep zvdevs
zvdevs 100M 884M 28.5K /zvdevs
zvdevs/garbage 24.5K 884M 24.5K /zvdevs/garbage
zvdevs/garbage@rollbacksys 0 - 24.5K -
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3


10. And, just in case you need to have all the information you could ever want about your ZFS file systems, you can "get" it "all" :

host # zfs get all zvdevs
NAME PROPERTY VALUE SOURCE
zvdevs type filesystem -
zvdevs creation Mon May 5 10:00 2008 -
zvdevs used 100M -
zvdevs available 884M -
zvdevs referenced 28.5K -
zvdevs compressratio 1.00x -
zvdevs mounted yes -
zvdevs quota none default
zvdevs reservation none default
zvdevs recordsize 128K default
zvdevs mountpoint /zvdevs default
zvdevs sharenfs off default
zvdevs checksum on default
zvdevs compression off default
zvdevs atime on default
zvdevs devices on default
zvdevs exec on default
zvdevs setuid on default
zvdevs readonly off default
zvdevs zoned off default
zvdevs snapdir hidden default
zvdevs aclmode groupmask default
zvdevs aclinherit secure default
zvdevs canmount on default
zvdevs shareiscsi off default
zvdevs xattr on default


11. It's better to send than to receive :) Maybe not... In any event, you can use the ZFS send and receive commands to send snapshots to other filesystems, locally or remotely (much in the same way you can pipe tar output through ssh to move content around), like so:

host # zfs send zvdevs/vusers@garbage | ssh localhost zfs receive zvdevs/newbackup
host # zfs list|grep zvdevs
zvdevs 100M 884M 28.5K /zvdevs
zvdevs/garbage 24.5K 884M 24.5K /zvdevs/garbage
zvdevs/vusers@backup 0 - 24.5K -
zvdevs/vusers2 24.5K 100M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 100M 24.5K /zvdevs/vusers3
zvdevs/newbackup 24.5K 100M 24.5K /zvdevs/newbackup


12. If you get nostalgic from time to time, or you need to figure out why something bad happened, you can use the ZFS filesystem history command, like so (Note that no history will be available after you destroy a pool -- I did that already, to zvdevs, so this output is for another storage pool on the same system):

# zpool history
History for 'datadg':
2008-02-27.20:25:32 zpool create -f -m legacy datadg c0t1d0
2008-02-27.20:27:57 zfs set mountpoint=legacy datadg
2008-02-27.20:27:58 zfs create -o mountpoint -o quota=5m datadg/data
2008-02-27.20:27:59 zfs create -V 20gb datadg/swap
2008-02-27.20:31:47 zfs create -o mountpoint -o quota=200m datadg/oracleclient


Cheers,

, Mike

Wednesday, May 7, 2008

Maintenance And Usage Commands For ZFS On Solaris 10 Unix

Hey again,

Today, we're back with part two of what was going to be a three part series on working with ZFS and storage pools. Actually, this was originally going to be one post, but (luckily ?) it's grown into four gigantic ones ;) This one, and tomorrow's, are going to be the "big daddies" of the bunch.

Yesterday we looked at ZFS storage pool and file system creation and today we're going to move on to commands that we can use to manipulate those storage pools and file systems that we've made.

Please note, again, that for all commands where I specifically name a virtual device or storage pool, you can get a full listing of all available devices by simply not specifying any storage pool at all.

And, without a moment to spare, here come those maintenance/usage commands (please enjoy responsibly ;)

1. If you need to know as much as possible about your storage pools, you can use this command:

host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0
/vdev3 ONLINE 0 0 0

errors: No known data errors


2. In Solaris 10 storage pool management, you can also "online" and "offline" virtual devices. You might need to do this from time to time if you need to replace an "actual" device that may be faulty. Here's an example of offlining and then onlining a vdev. Note that, if you use the "-t" flag when offlining, the device will only be temporarily disabled. Normal offlining is persistent and the storage pool will maintain the vdev in an offline state even after a reboot:

host # zpool offline zvdevs /vdev2
Bringing device /vdev2 offline
host # zpool status -v zvdevs
pool: zvdevs
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scrub: resilver completed with 0 errors on Mon May 5 10:05:33 2008
config:

NAME STATE READ WRITE CKSUM
zvdevs DEGRADED 0 0 0
mirror DEGRADED 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 OFFLINE 0 0 0

errors: No known data errors
host # zpool online zvdevs /vdev2
Bringing device /vdev2 online
host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: resilver completed with 0 errors on Mon May 5 10:18:34 2008
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
mirror ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0

errors: No known data errors


3. If you want to attach another disk to your storage pool mirror, it's just as simple. This process will turn a single-device pool into a two-way mirror, or create a triplicate, quadruplicate, etc., mirror if you already have a simple mirror set up:

host # zpool attach zvdevs /vdev1 /vdev3 <--- Note that we're specifically saying we want to mirror /vdev1 and not /vdev2. It doesn't really matter, since they're both mirrors of each other, but you can't just attach a device without naming a device to mirror!
host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: resilver completed with 0 errors on Mon May 5 10:05:33 2008
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
mirror ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0
/vdev3 ONLINE 0 0 0

errors: No known data errors


4. For a little shell-game, if you ever need to replace a vdev in your storage pool (say, with a hot spare), you can do it easily, like this:

host # zpool add zvdevs spare /vdev3 <--- This may not be necessary, but I removed the spare and am re-adding it.
host # zpool replace zvdevs /vdev1 /vdev3
host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: resilver completed with 0 errors on Mon May 5 10:20:58 2008
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
mirror ONLINE 0 0 0
spare ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev3 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0
spares
/vdev3 INUSE currently in use

errors: No known data errors


5. If you suspect file system damage, you can "scrub" your storage pool. zpool will verify that everything is okay (if it is ;) and will auto-repair any problems on mirrored or RAID-Z pools :)

host # zpool scrub zvdevs <--- Depending on how much disk you have and how full it is, this can take a while and chew up I/O cycles like nuts.
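
And if you want to see how the scrub is coming along (or what it found once it's finished), the scrub line in the status output is, once again, the place to look:

host # zpool status -v zvdevs | grep scrub <--- shows scrub progress, or the results of the last scrub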

6. If you want to share storage pools between systems (real or virtual), you can "export" your storage pool like so (Note that this will make the storage pool not show up as being "owned" by your system anymore, although you can reimport it):

host # zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
datadg 33.8G 191M 33.6G 0% ONLINE -
rootdg 9.75G 5.11G 4.64G 52% ONLINE -
zvdevs 1016M 127K 1016M 0% ONLINE -
host # zpool export zvdevs
host # zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
datadg 33.8G 191M 33.6G 0% ONLINE -
rootdg 9.75G 5.11G 4.64G 52% ONLINE -


7. In order to "import" an exported storage pool, you just need to have permission on the system to do so (being root in the global zone is the perfect place to be when you try this. Just be careful ;)

host # zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
datadg 33.8G 191M 33.6G 0% ONLINE -
rootdg 9.75G 5.11G 4.64G 52% ONLINE -
host # zpool import -d / zvdevs
host # zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
datadg 33.8G 191M 33.6G 0% ONLINE -
rootdg 9.75G 5.11G 4.64G 52% ONLINE -
zvdevs 1016M 92K 1016M 0% ONLINE -


You can run this command without -d (which tells zpool the base directory to search for the pool's devices), in which case zpool will search /dev/dsk for something to import. In our case it won't find our file-based vdevs there, like this:

host # zpool import zvdevs
cannot import 'zvdevs': no such pool available


8. If you need to know what version of ZFS your system is using, you can use zpool's "upgrade" option. Don't worry. Without the proper flags, it just lets you know what version is running and some other information, like this:

host # zpool upgrade
This system is currently running ZFS version 4.

All pools are formatted using this version.


9. If you want to see what features your version of ZFS has, you can use the "upgrade" option with the -v flag. Versions are cumulative, so (in our case), since we're running version 4, we have all the capabilities of versions 1, 2 and 3, but nothing from version 5 or higher (at the time of this post, ZFS is up to version 10):

host # zpool upgrade -v
This system is currently running ZFS version 4.

The following versions are supported:

VER DESCRIPTION
--- --------------------------------------------------------
1 Initial ZFS version
2 Ditto blocks (replicated metadata)
3 Hot spares and double parity RAID-Z
4 zpool history
...


Now, go and have fun with those commands :)

Until we meet again,

, Mike

Tuesday, May 6, 2008

ZFS Command Sheet For Solaris 10 Unix - Pool And File System Creation

Hey There,

Today, we're going back to the Solaris 10 Unix well and slapping together a few useful commands (or, at least, a few commands that you'll probably use a lot ;). We've already covered ZFS, and Solaris 10 zones, in our previous posts on creating storage pools for ZFS and patching Solaris 10 Unix zones, but those were more specific, while this post is meant to be a little quick-stop command repository (and only part one, today). This series also is going to focus more on ZFS and less on the "zone" aspect of the Solaris 10 OS.

Apologies if the explanations aren't as long as my normal posts are. ...Then again, some of you may be thanking me for the very same thing ;)

So, without further ado, some Solaris 10-specific commands that will hopefully help you in a pinch :) Note that for all commands where I specify a virtual device or storage pool, you can get a full listing of all available devices/pools by "not specifying" any storage pool. I'm just trying to keep the output to the point so this doesn't get out of hand.

Today we're going to take storage pools and ZFS file systems and look at creation-based commands, tomorrow we'll look at maintenance/usage commands, and then we'll dig on destructive commands and cleaning up the mess :)

1. To create virtual devices (vdevs), which can be plain files (like we're using here), slices of real disk, or whole "real" disks if you have them available to you, you can do this:

host # mkfile 1g vdev1 vdev2 vdev3
host # ls -l vdev[123]
-rw------T 1 root root 1073741824 May 5 09:47 vdev1
-rw------T 1 root root 1073741824 May 5 09:47 vdev2
-rw------T 1 root root 1073741824 May 5 09:48 vdev3


2. To create a storage pool, and check it out, you can do the following:

host # zpool create zvdevs /vdev1 /vdev2 /vdev3
host # zpool list zvdevs <--- Don't specify the name of the pool if you want to get a listing of all storage pools!
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
zvdevs 2.98G 90K 2.98G 0% ONLINE -


3. If you want to create a mirror of two vdevs of different sizes, this can be done, but you'll be stuck with the smallest possible mirror (as it would be physically impossible to put more information on a disk than it can contain... which seems like common sense ;)

host # zpool create -f zvdevs mirror /vdev1 /smaller_vdev <--- The mirrored storage pool will be the size of the "smaller_vdev"

4. If you want to create a mirror, with all the disks (or vdevs) the same size (like they should be :), you can do it like this:

host # zpool create zvdevs mirror /vdev1 /vdev2 /vdev3 /vdevn... <--- I haven't hit the max yet, but I know you can create a "lot" of mirrors in the same set. Of course, you'd be wasting a lot of disk and it would probably make data access slower...

host # zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
zvdevs 1016M 92K 1016M 0% ONLINE -
host # zpool status -v zvdevs
pool: zvdevs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
zvdevs ONLINE 0 0 0
mirror ONLINE 0 0 0
/vdev1 ONLINE 0 0 0
/vdev2 ONLINE 0 0 0
/vdev3 ONLINE 0 0 0

errors: No known data errors


5. You can create new ZFS file systems within your storage pool, have their mount point directories created and have them mounted, all very easily. All you need to do is "create" them with the "zfs" command. Three tasks in one! (as easy as creating a pool with the zpool command):

host # zfs create zvdevs/vusers
host # df -h zvdevs/vusers
Filesystem size used avail capacity Mounted on
zvdevs/vusers 984M 24K 984M 1% /zvdevs/vusers


6. If you need to create additional ZFS file systems, the command is the same, just lather rinse and repeat ;)

host # zfs create zvdevs/vusers2
host # zfs create zvdevs/vusers3
host # zfs list |grep zvdevs
zvdevs 182K 984M 27.5K /zvdevs
zvdevs/vusers 24.5K 984M 24.5K /zvdevs/vusers
zvdevs/vusers2 24.5K 984M 24.5K /zvdevs/vusers2
zvdevs/vusers3 24.5K 984M 24.5K /zvdevs/vusers3


See you tomorrow, for more fun with Solaris 10 ZFS/Storage Pool maintenance/usage commands :)

Cheers,

, Mike

Thursday, April 24, 2008

Creating Storage Pools For Solaris ZFS

Hey there,

As a way of getting on to addressing Solaris 10 Unix issues beyond patching zones and migrating zones, today we're going to put together a slam-bang setup of a Solaris ZFS storage pool. For those of you unfamiliar with ZFS (and I only mention this because I still barely ever use it -- Lots of folks are stuck on either Solaris 8 or 9 and are having a hard time letting go ;), it simply stands for "Zettabyte File System."

A Zettabyte = 1024 Exabytes.
An Exabyte = 1024 Petabytes.
A Petabyte = 1024 Terabytes.
A Terabyte = 1024 Gigabytes.
A Gigabyte = 1024 Megabytes.


And, at about this point the number scale makes sense to most of us. And, even though a Zettabyte seems like it's an unbelievably large amount of space, the next generation of operating systems will require twice as much RAM and disk space in order to respond to your key presses in under a minute ;)

Anyway, now that we have that out of the way, let's set up a ZFS pool.

First we'll create the pool, in a pretty straightforward fashion, using the zpool command and adding the disks (which can be given in cXtXdX notation, or specified by slice with cXtXdXsX notation; you can also use full logical path notation if you want, e.g. /dev/dsk/cXtXdXsX). We'll assume two entire disks (unmirrored), create the pool and have it mounted for use:

host # zpool create MyPoolName c0t0d0 c0t1d0

Note that you can run "zpool create -n" to do a dry-run. This will allow you to find any mistakes you may be making in your command line as zpool doesn't "really" create the pool when you run it with the "-n" flag.
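
For instance, using the same pool and disks as above:

host # zpool create -n MyPoolName c0t0d0 c0t1d0 <--- prints what would be created, without actually creating anything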

We also won't have to do anything extra to create the mount point, as it defaults to the name of the pool. So, in this case, we would have a storage pool with a default mount point of /MyPoolName. This directory has to either not exist (in which case it will be automatically created) or exist and be empty (in which case the root dataset will be mounted over the existing directory). If you want to specify a different mount point, you can use the "-m" flag for zpool, like so:

host # zpool create -m /pools/MyPool MyPoolName c0t0d0 c0t1d0

And your ZFS storage pool is created! You can destroy it just as easily by running:

host # zpool destroy MyPoolName

or, if the pool is in a faulted state (or you experience any other error, but know you want to get rid of it), you can always force the issue ;)

host # zpool destroy -f MyPoolName

And you're just about ready to use your new "device." All you need to do is (just like you would when mounting a regular disk device) create a filesystem on your storage pool. You can do this pretty quickly with the zfs command, like so:

host # zfs create MyPoolName/MyFileSystem

Note that the pool name is the base of the storage pool and your file system is actually created one level below it. Basically, MyPoolName isn't your file system, but MyPoolName/MyFileSystem is (mounted at /MyPoolName/MyFileSystem by default). And you can manage this file system, in much the same way you manage a regular UFS file system, using the zfs command.
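
For example, a few of the day-to-day basics look like this (the quota value and the alternate mount point are just examples):

host # zfs list MyPoolName/MyFileSystem
host # zfs set quota=10g MyPoolName/MyFileSystem
host # zfs get quota MyPoolName/MyFileSystem
host # zfs set mountpoint=/export/myfs MyPoolName/MyFileSystem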

Again, you can always opt-out and destroy it if you don't like it (here, as well, you have the option to "force" if you want):

host # zfs destroy MyPoolName/MyFileSystem

Aside from "-f" (to force execution), the most common flags to use with "zfs destroy" are "-r" to do a recursive destroy or "-R" to recursively destroy all the regular descendants in the file system tree, plus all cloned file systems outside of your target directory (Be careful with that switch!)
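
In command form, using our example file system (and exercising the care that parenthetical note suggests):

host # zfs destroy -r MyPoolName/MyFileSystem <--- also destroys any child file systems and snapshots underneath it
host # zfs destroy -R MyPoolName/MyFileSystem <--- same, plus any clones that depend on those snapshots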

We've only just brushed on getting started with zfs, but, hopefully, it's been somewhat helpful and, at the very least, readable ;)

Enjoy,

, Mike