AppleRAID 2 in Depth
Way back in the day I mentioned that AppleRAID 2 had received a fair bit of attention on Tiger. Then we proceeded to never say anything about it again. Today that changes as our in-depth coverage of one of the most underutilized bits of Mac OS X, AppleRAID, hits the site.
Read on for more...
There are few parts of Mac OS X that are understood, or appreciated, by fewer people than the built-in RAID software. There is a perception out there that it is flaky, inflexible, and unreliable. Over the course of hundreds of Mac OS X Server installs, though, we have found these perceptions to be false or at least ill-applied. By the end of this article I hope that you will have a greater respect for AppleRAID 2.
A bit of history
Apple has had a long history of RAID software. In the classic Mac OS days, they bundled a re-branded copy of SoftRAID as AppleRAID with some installs of AppleShare or ASIP. RAID disappeared with the advent of Mac OS X, reappearing in Mac OS X 10.2. At the time it offered device-level stripes and mirrors and nothing else. Rebuilding a RAID mirror before 10.2.4 was tricky and often failed. With the coming of Mac OS X 10.3 we received a few nice updates to AppleRAID, the main ones being the ability to convert an existing drive into a mirror and the ability to rebuild mirrors online. All of this was just a tease for the launch of 10.4 and AppleRAID 2.
What's new in AppleRAID 2
AppleRAID 2 brings a bunch of new features to the party:
All of this means that AppleRAID 2 is now a very flexible tool. In particular the new mirror tools are a boon to sysadmins.
RAID Primer
Before we get too far into the Mac OS X Server specific details we should take a moment to review the basics of RAID.
RAID stands for Redundant Array of Inexpensive (or Independent) Disks. The idea behind it is to take multiple cheap drives and band them together to achieve performance, or reliability, that is beyond the reach of a single drive. The different types of RAID arrays are designated by a level number. AppleRAID supports two levels of RAID:
AppleRAID does not support any other common levels such as 3 or 5. If you want those you need to go with a hardware-based solution for the best performance. Let's look a bit closer at each of these RAID levels.
RAID level 0 stripes multiple volumes together in an effort to maximize speed. It offers no data protection at all and the loss of a single member results in the loss of the entire array. (Astute readers will note that this means RAID 0 really isn't a RAID at all, as it lacks the "Redundant" part.) In general a stripe is not often used in a server as it only offers a speed increase, and every member you add to the RAID multiplies your odds of a volume failure. It is possible to mirror two stripes, but with only 3 drive bays in an Xserve it is an option that not many will use. In general an Xserve RAID configured for RAID 5 is a better option if you want a single large, and protected, volume. While stripes find little use on Mac OS X Server outside of specific Xsan applications, they are often used on workstations for things like video capture volumes.
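To put a rough number on how quickly those odds stack up, here is a minimal Python sketch. It assumes drive failures are independent and that every drive has the same chance of failing over some period; the 3% figure is purely illustrative and not a measured rate. It also ignores rebuild windows, so treat it as back-of-the-envelope arithmetic and nothing more.

```python
def stripe_failure_probability(p_drive: float, members: int) -> float:
    """Chance that a RAID 0 stripe is lost: it dies if ANY member fails."""
    return 1.0 - (1.0 - p_drive) ** members


def mirror_failure_probability(p_drive: float, members: int = 2) -> float:
    """Chance that a RAID 1 mirror is lost: it dies only if ALL members fail."""
    return p_drive ** members


if __name__ == "__main__":
    p = 0.03  # hypothetical per-drive failure chance, for illustration only
    for n in (1, 2, 3):
        print(f"{n}-drive stripe: {stripe_failure_probability(p, n):.4f}")
    print(f"2-drive mirror: {mirror_failure_probability(p):.4f}")
```

Under these assumptions a two-drive stripe is nearly twice as likely to be lost as a single drive, while a two-drive mirror is far less likely to be lost than either.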
RAID level 1 mirrors two volumes to provide redundancy. This way if a drive were to fail there would be no sudden failure of services. In general this is the level of RAID that most Mac OS X Server sysadmins will apply when hardening their servers against failure. RAID mirrors give the administrator a great deal of flexibility to replace troublesome hardware without downtime. Another common task is to split, or remove, a member from a RAID mirror to create an instant backup of that volume. AppleRAID 2 adds the ability to cleanly split a member from a live RAID under Mac OS X.
Nested RAID levels such as 0+1 and 1+0 are an attempt to add flexibility to levels 0 and 1. For the most part 0+1 has all the problems of 0, with only slightly higher reliability. Since the underlying stripes are so fragile, the loss of a single drive in each stripe will result in total data loss. RAID 1+0, or 10, is more robust as each member of the greater stripe is itself a mirror. You could lose a single drive from each mirror member of the stripe and not lose any data. As noted above these options are not often used on Mac OS X Server due to the relatively small number of drive bays in any shipping Mac.
A concatenated disk is sometimes also referred to as JBOD (Just a Bunch of Disks). This is very similar to a RAID 0 in that it combines volumes to form one larger volume. Where it is different on most OSes is that the failure of one volume won't result in the loss of all data, but rather just the data that was located on that particular disk in the set. This has an effect very similar to a large patch of bad blocks on a single device. Unfortunately, Apple's implementation of concatenated disks doesn't follow this model. The loss of a single disk will result in a wholly inaccessible volume, though if you restore the missing member it will spring back to life. The advantage to a concat versus a stripe on Mac OS X is that you can dynamically add volumes to expand the set without taking the volume offline. If you administer Windows servers you know these disk sets as a spanning dynamic disk.
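The difference between the two nested layouts is easy to check by brute force. The Python sketch below assumes a hypothetical four-disk set (A through D) split into two inner pairs, and simply counts which two-drive failures each layout survives. It models the generic RAID levels, not anything specific to Apple's code.

```python
from itertools import combinations

DISKS = ("A", "B", "C", "D")
PAIRS = (("A", "B"), ("C", "D"))  # the two inner sets in either layout


def raid10_survives(failed: set) -> bool:
    """RAID 1+0, a stripe of mirrors: every mirror needs one live disk."""
    return all(any(d not in failed for d in pair) for pair in PAIRS)


def raid01_survives(failed: set) -> bool:
    """RAID 0+1, a mirror of stripes: one stripe must be fully intact."""
    return any(all(d not in failed for d in pair) for pair in PAIRS)


def count_survivable(survives, failures: int = 2) -> int:
    """Count the failure combinations of the given size that lose no data."""
    return sum(
        1 for combo in combinations(DISKS, failures) if survives(set(combo))
    )


if __name__ == "__main__":
    print("RAID 10 survives", count_survivable(raid10_survives), "of 6 double failures")
    print("RAID 0+1 survives", count_survivable(raid01_survives), "of 6 double failures")
```

Of the six possible two-drive failures, the stripe of mirrors survives four and the mirror of stripes survives only two, which is why RAID 10 is the one you would normally reach for.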
One of the key things to remember about any RAID level is that a RAID set is not a backup! A RAID is either a performance aid or a guard against unexpected downtime. Nothing more.
So now that we have been refreshed on what the levels of RAID are we can dive in.
AppleRAID Specifics
So how does Mac OS X know that a disk is a member of a RAID set? The basics of it are pretty simple. Take a look at the partition table for a non-RAID device.
GPT formatted disk:
josh$ diskutil list disk3
/dev/disk3
   #:  type                    name    size       identifier
   0:  GUID_partition_scheme           *27.9 GB   disk3
   1:  EFI                             200.0 MB   disk3s1
   2:  Apple_HFS               Client  13.8 GB    disk3s2
   3:  Apple_HFS               disk3   13.7 GB    disk3s3
APT formatted disk:
josh$ diskutil list disk1
/dev/disk1
   #:  type                    name    size       identifier
   0:  Apple_partition_scheme          *17.0 GB   disk1
   1:  Apple_partition_map             31.5 KB    disk1s1
   2:  Apple_Boot                      128.0 MB   disk1s2
   3:  Apple_HFS               disk1   16.8 GB    disk1s3
Now let's look at the same disks once they are part of a RAID set.
GPT formatted disk:
josh$ diskutil list disk3
/dev/disk3
   #:  type                    name    size       identifier
   0:  GUID_partition_scheme           *27.9 GB   disk3
   1:  EFI                             200.0 MB   disk3s1
   2:  Apple_HFS               Client  13.8 GB    disk3s2
   3:  Apple_RAID                      13.7 GB    disk3s3
APT formatted disk:
josh$ diskutil list disk1
/dev/disk1
   #:  type                    name    size       identifier
   0:  Apple_partition_scheme          *17.0 GB   disk1
   1:  Apple_partition_map             31.5 KB    disk1s1
   2:  Apple_Boot                      128.0 MB   disk1s2
   3:  Apple_RAID                      16.8 GB    disk1s3
Notice the data partition changed from a type of "Apple_HFS" to "Apple_RAID" on the disks. What you can't see is that the partition was shrunk by about 8 KB and new header info was written. This header is used to store the information about the RAID set and this particular drive's membership and status in that set. Because the info is stored on the disk, you can move the set to a different Mac and it should come up fine. For more detailed info on the RAID header you should take a look at AppleRAIDMember.cpp and AppleRAIDUserLib.h. (You must have an Apple ID to access the Darwin source repository.)
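If you want to spot RAID members from a script rather than by eye, the listing is simple enough to parse. This is a minimal Python sketch that assumes the Tiger-era column layout shown above (index, type, optional name, size, identifier); other releases of diskutil may print things differently. The function name and sample text are my own, made up for illustration.

```python
import re

# Sample listing modelled on the APT example above.
SAMPLE = """\
/dev/disk1
   #:  type                    name    size       identifier
   0:  Apple_partition_scheme          *17.0 GB   disk1
   1:  Apple_partition_map             31.5 KB    disk1s1
   2:  Apple_Boot                      128.0 MB   disk1s2
   3:  Apple_RAID                      16.8 GB    disk1s3
"""

# index, type, optional name, size (optionally starred), identifier
ROW = re.compile(
    r"^\s*(?P<index>\d+):\s+(?P<type>\S+)\s+(?P<name>.*?)\s*"
    r"\*?(?P<size>[\d.]+ [KMGT]B)\s+(?P<ident>\S+)\s*$"
)


def raid_members(listing: str) -> list:
    """Return identifiers of slices whose partition type is Apple_RAID."""
    members = []
    for line in listing.splitlines():
        match = ROW.match(line)
        if match and match.group("type") == "Apple_RAID":
            members.append(match.group("ident"))
    return members


if __name__ == "__main__":
    print(raid_members(SAMPLE))
```

In practice you would feed it the text captured from diskutil list instead of the canned sample.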
If we were to look at the newly created RAID as a disk we would see that it only has one partition:
josh$ diskutil list disk4
/dev/disk4
   #:  type       name                 size       identifier
   0:  Apple_HFS  Untitled RAID Set 2  *16.8 GB   disk4
This is because this logical disk only contains the actual RAID set volume in an Apple_HFS partition. The physical disks retain all of the partitions needed for the disk to function.
For all practical purposes the Apple_RAID volumes of a mirror set and the Apple_HFS RAID volume itself are functionally identical. Using Amit Singh's hfsdebug tool we can examine the volume headers of all the disks in a set. For example:
josh$ diskutil checkRAID
RAID SETS
---------
Name: Untitled RAID Set 7
Unique ID: 3E2B3BDE-345C-4060-9D61-F56D05AB6DF3
Type: Mirror
Status: Online
Device Node: disk4
Apple RAID Version: 2
----------------------------------------------------------------------
# Device Node UUID Status
----------------------------------------------------------------------
0 disk1s3 F07F8405-3F4F-42B3-A443-55C30C0188D2 Online
1 disk2s3 AE62FC67-5F6F-4252-B7F3-720B91653A0B Online
----------------------------------------------------------------------
josh$ sudo ./hfsdebug -d /dev/rdisk4 -v | grep UUID
# File System Boot UUID
UUID = FE926987-7868-34C7-8E05-0B06AB3F366E
josh$ sudo ./hfsdebug -d /dev/rdisk1s3 -v | grep UUID
# File System Boot UUID
UUID = FE926987-7868-34C7-8E05-0B06AB3F366E
josh$ sudo ./hfsdebug -d /dev/rdisk2s3 -v | grep UUID
# File System Boot UUID
UUID = FE926987-7868-34C7-8E05-0B06AB3F366E
Note that the devices remain discrete and maintain their original UUIDs. The fact that the HFS volumes are identical is what allows the mirror to seamlessly function even when missing a member. If you were to add a new member to the array then it would assume the HFS properties of the mirror as a whole.
Things are very different when considering a stripe or concat set of disks. Here only the first member of the array carries any HFS information with the other volumes simply providing additional blocks for the set. Let's examine the same information on a stripe.
josh$ diskutil checkRAID
RAID SETS
---------
Name: Untitled RAID Set 16
Unique ID: 410492D8-E48D-4B3E-99CA-75FD70B08354
Type: Stripe
Status: Online
Device Node: disk4
Apple RAID Version: 2
----------------------------------------------------------------------
# Device Node UUID Status
----------------------------------------------------------------------
0 disk1s3 D9866327-6AF8-49D1-982F-BC40B5AEE3EB Online
1 disk2s3 DD5EA9D5-041D-4BB2-9F0C-D4EED21931C9 Online
----------------------------------------------------------------------
josh$ sudo ./hfsdebug -d /dev/rdisk4 -v | grep UUID
# File System Boot UUID
UUID = 06E430B8-F1A4-3156-ADB4-5361D6B15031
josh$ sudo ./hfsdebug -d /dev/rdisk1s3 -v | grep UUID
# File System Boot UUID
UUID = 06E430B8-F1A4-3156-ADB4-5361D6B15031
josh$ sudo ./hfsdebug -d /dev/rdisk2s3 -v | grep UUID
This is neither an HFS+ nor an HFSX volume.
hfsdebug: failed to access the Volume Header.
On Mac OS X a concat set will return similar results.
Now that we understand the relationship between an Apple_RAID partition and an Apple_HFS one, there are two different ways to get this new partition onto the disk. One is to create a new set, essentially re-partitioning the device or volume and destroying all data on the disks. The other way is to use the diskutil enableRAID command. This will shrink the data partition slightly (by around 8 KB) and then create the Apple_RAID header and partition. This operation depends on several things:
Really the only two situations in which I have seen the enableRAID command fail are when the disk is slap full of data or when the Mac OS 9 drivers are installed. In these cases you will get errors like the following:
josh$ sudo diskutil enableRAID mirror disk2s10
Changing filesystem size on disk 'disk2s10'...
Attempting to change filesystem size from 18232721408 to 18234343424 bytes
Filesystem grow failed, 1
Disk Management could not shrink the filesystem to fit the new RAID headers
Error enabling disk to RAID
Invalid request (-9998)
What the enableRAID command gives you is very powerful: the ability to convert an existing volume into the first disk of a mirror or concatenated RAID set. We will explore these options further in just a bit.
On Mac OS X there are two main interfaces for dealing with disks, Disk Utility and its command line counterpart diskutil. As was the case on Mac OS X 10.3, the diskutil tool can perform a few more functions than Disk Utility. The gap has narrowed though, and for many tasks it is far simpler to use Disk Utility's GUI. For the examples in this article we will use both tools where appropriate. A safe rule of thumb when dealing with the differences between the two tools though is that the non-destructive creation and removal tools for dealing with RAID sets and members exist only in diskutil. Additionally, there are some cases where Disk Utility will become confused and out of sync with the status of RAID sets. In those instances you can typically clear things up by restarting Disk Utility.
To write this article I grabbed an ancient AGP G4 500, threw three old 10K SCSI drives in it and grabbed a bus-powered FireWire drive. I'm running 10.4.7 PPC and the internal disks are APT formatted while the FireWire drive is GPT formatted. Really you can use just about any combination of devices to build RAID sets, as AppleRAID is very flexible.
Mirrors
Since this is what most of you are after I'll cover it first. Simply put, it is my opinion that almost no server should be running if it isn't booted from a mirror of some sort. Drive failures are one of the most common failures on servers and a RAID mirror can, and will, save your butt.
The simplest way to create a mirror is with Disk Utility. Simply select a volume, click the RAID tab, and then drag the volumes you want to mirror into the window. Before you click the "Create" button though let's take a look at the "Options..." one.
When you click on the Options button an options panel appears that has two settings. The first one is an option to change the RAID block size. This is a performance tuning parameter that can assist you in tuning your RAID for speed when you have a specific sort of data that will be stored on it. If your array will host a MySQL database with tiny records you can lower the block size to improve access. If you are creating a volume that will store large video files then a larger block size will probably help speed access to those files. For general use the default of 32K is fine but we will take a closer look at block size tuning and determination later. Take note though that you can not change the block size after the initial creation.
The second option is the RAID Mirror AutoRebuild setting and it, quite obviously, is only for mirror sets. What this setting does is to allow a mirror to automatically rebuild onto a spare drive in case of a failure. It will not grab just any drive, only those that are included in the set as a spare. Once a spare drive is activated the member of the set that failed is marked as a spare so that it will not get in the way if it happens to come back online during the rebuild. The simple act of adding a spare drive will activate the AutoRebuild option for the RAID and in my testing I have not been able to turn it off. Luckily I have not been able to cause the rebuild process to fail by reintroducing the missing member to the set either. Let's take a closer look at spare drives now.
To add a spare, just drag another volume of equal or larger size into the RAID mirror and then select "Spare" from the type pick list. You can also add a spare from the command line using sudo diskutil addToRAID spare newMember existingRAID where newMember and existingRAID are disk identifiers of the volumes you are working with. This will remove the volume from general use and put it on standby for an issue with the mirror. Check out the spiffy video (QuickTime 7 required) of a warm spare and AutoRebuild in action as I yank the cable on a member of the RAID. Notice that the DataRAID volume never drops from the desktop. You can have multiple spares if you wish, although you will probably begin to run out of drive bays before it becomes practical. In a standard Xserve configuration a boot mirror with a spare would be a very bulletproof setup.
To create your RAID now just click the "Create" button and let Disk Utility do its thing. If we wanted to create our mirror from the command line then the syntax is simple: sudo diskutil createRAID mirror RAIDname FilesystemName member disks. To see an example I'll create a mirror from disk1 and disk3s3.
josh$ sudo diskutil createRAID mirror DataMirror "Journaled HFS+" disk1 disk3s3
Preparing partition 'disk1s3' for RAID
Adding disk 'disk1s3' to new RAID set
Preparing partition 'disk3s3' for RAID
Adding disk 'disk3s3' to new RAID set
Creating RAID Set (disk1 , disk3s3 data1)
Bringing RAID partitions online
Waiting for new RAID to come online "2CFEC74D-5C0C-49CB-940E-6099AE3C2B97"
Creating file system on RAID volume "disk4 "
The RAID has been created successfully
Another common use of RAID 1 is to split a member out of the mirror as a backup. This provides an instant snapshot of the volume as a whole. Before 10.4 Apple had no easy way to do this and the best you could get away with was to unmount the volume and remove a drive. This was a pretty nasty trick and not that easy if the OS was on the mirror, as it required you to shut the server down. With AppleRAID 2 though we have a much easier option in the form of the removeFromRAID command. Let's take a look...
For this example I've created a mirror that has an external FireWire drive as one of its members. A quick look with diskutil checkRAID will show us the mirror, its status, and its member devices.
josh$ diskutil checkRAID
RAID SETS
---------
Name: DataMirror
Unique ID: 2CFEC74D-5C0C-49CB-940E-6099AE3C2B97
Type: Mirror
Status: Online
Device Node: disk4
Apple RAID Version: 2
----------------------------------------------------------------------
# Device Node UUID Status
----------------------------------------------------------------------
0 disk1s3 EF3F0D87-0E9A-4ED4-BB2B-F2A7EBFD0306 Online
1 disk3s3 B1EC4F22-9C0D-4CDB-90FE-D49347195834 Online
----------------------------------------------------------------------
In this case the FireWire disk is the second member of the array. To cleanly remove this disk from the array I simply call sudo diskutil removeFromRAID disk3s3 disk4. After a few seconds I'll be notified that the RAID headers have been removed from the drive and it will mount on the Desktop.
It looks like this when it happens.
josh$ sudo diskutil removeFromRAID disk3s3 disk4
Password:
appleraid Headers removed from disk 'disk3s3'
Changing filesystem size on disk 'disk3s3'...
Attempting to change filesystem size from 14658928640 to 14658936832 bytes
The disk has been removed from the RAID
Now I can eject the external drive and tuck it away with all of its data intact. If my mirror had a spare it would have begun the process of rebuilding onto it; otherwise I can add a new member to the array and rebuild onto that.
Keep in mind that this operation actually removes the drive from the RAID. The headers are deleted and it is removed from the mirror's member list. In this way it's not an exact replica, but since the data is exactly preserved it serves our purposes.
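The removeFromRAID transcript above also gives us a handy way to check the earlier claim that the RAID header costs about 8 KB. This little Python snippet just subtracts the two filesystem sizes that diskutil reported; the variable names are mine.

```python
# Sizes reported by diskutil removeFromRAID in the transcript above, in bytes.
size_as_raid_member = 14658928640
size_after_removal = 14658936832

reclaimed_bytes = size_after_removal - size_as_raid_member
reclaimed_kb = reclaimed_bytes / 1024

print(f"{reclaimed_bytes} bytes = {reclaimed_kb:g} KB")
```

The difference works out to 8192 bytes, or exactly 8 KB, which is the space the filesystem gets back once the RAID headers are gone.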
As mentioned earlier, you can turn any existing volume into a degraded mirror pair, then rebuild the mirror onto a second volume. We have an older document on the process here but I'll quickly run through the steps for you now. In this example we will turn a boot volume into a mirror.
1. MAKE A BACKUP! You are about to live edit the partition table of a volume. To borrow a phrase from diskutil, "Enabling RAID is an inherently dangerous operation."
2. Boot from something else. An external HD with the latest version of Mac OS X is probably your best bet.
3. Identify the disk slice or volume mount point that you want to enable the RAID on. Let's say that the disk I want to enable is mounted at /Volumes/BootDisk
4. Continuing with our example, fire off 'sudo diskutil enableRAID mirror /Volumes/BootDisk'. The volume will unmount and then reappear a few moments later.
5. Re-select the freshly RAID enabled volume in the Startup Disk System Preference and reboot.
6. Now you can add additional volumes to the mirror and rebuild it in the background. This may take a long time.
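If you do this often, it can help to write the commands down as a dry run before touching a disk. The Python sketch below only builds and prints the command lines for steps 4 and 6; it never executes anything. The function name and the identifiers passed in are hypothetical, and the repairMirror verb used for step 6 is the one covered a little further down.

```python
import shlex


def mirror_conversion_plan(volume: str, raid_device: str, new_member: str) -> list:
    """Build (but do not run) the diskutil commands for steps 4 and 6."""
    return [
        # Step 4: turn the existing volume into a degraded mirror.
        ["sudo", "diskutil", "enableRAID", "mirror", volume],
        # Step 6: add a second member and rebuild onto it.
        ["sudo", "diskutil", "repairMirror", raid_device, new_member],
        # Keep an eye on the rebuild.
        ["diskutil", "checkRAID"],
    ]


if __name__ == "__main__":
    # Hypothetical identifiers; confirm yours with `diskutil list` first.
    for argv in mirror_conversion_plan("/Volumes/BootDisk", "disk4", "disk3s3"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping each command as a list of arguments means volume names with spaces get quoted correctly when printed, and the same lists could be handed to a process runner later if you decide to automate the real thing.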
Our example here can vary in a few different ways. If you are working with a data volume that can be safely unmounted then you can skip the external media boot. Likewise you can always pass the volume as a device node to diskutil. You can get the disk and slice number with diskutil list. Executing that command generates output like this:
josh$ diskutil list
/dev/disk1
   #:  type                    name      size       identifier
   0:  Apple_partition_scheme            *17.0 GB   disk1
   1:  Apple_partition_map               31.5 KB    disk1s1
   2:  Apple_Boot                        128.0 MB   disk1s2
   3:  Apple_HFS               BootDisk  16.8 GB    disk1s3
In this case the data partition we would want to name in the enableRAID command is disk1s3. Remember that the disk identifiers on Mac OS X are dynamic and you never know what disk will get which number. So always verify the disk number before doing anything!
What we have done here is to turn a single device into a degraded mirror set. All we need to do now is add a second volume and rebuild the mirror onto it. You can do this in Disk Utility with a simple drag and drop or with sudo diskutil repairMirror mirrorSet newMember where mirrorSet and newMember are disk identifiers like we just looked at. Once this process is started you can check it with Disk Utility or diskutil checkRAID. From a Terminal it looks like this:
josh$ sudo diskutil repairMirror disk4 disk3s3
Password:
Note: Syncing data between mirror partitions can take a very long time.
Note: The mirror should now be repairing itself
You can check it's status using 'diskutil checkRAID'.
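Since checking on a rebuild means re-running diskutil checkRAID, it is tempting to script the check. Here is a minimal Python sketch that parses output in the format shown earlier in this article. It assumes a single RAID set in the output and treats any status other than "Online" as unhealthy; I have not catalogued the other status strings diskutil can print, so that part is an assumption.

```python
import re

SAMPLE = """\
RAID SETS
---------
Name: DataMirror
Unique ID: 2CFEC74D-5C0C-49CB-940E-6099AE3C2B97
Type: Mirror
Status: Online
Device Node: disk4
Apple RAID Version: 2
----------------------------------------------------------------------
# Device Node UUID Status
----------------------------------------------------------------------
0 disk1s3 EF3F0D87-0E9A-4ED4-BB2B-F2A7EBFD0306 Online
1 disk3s3 B1EC4F22-9C0D-4CDB-90FE-D49347195834 Online
----------------------------------------------------------------------
"""

# Member rows: index, device node, 36-character UUID, status text.
MEMBER = re.compile(r"^\s*(\d+)\s+(\S+)\s+([0-9A-Fa-f-]{36})\s+(.+?)\s*$")


def parse_check_raid(text: str) -> dict:
    """Parse a single RAID set from Tiger-era `diskutil checkRAID` output."""
    info = {"members": []}
    for line in text.splitlines():
        member = MEMBER.match(line)
        if member:
            info["members"].append(
                {"device": member.group(2), "status": member.group(4)}
            )
        elif ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info


def is_healthy(info: dict) -> bool:
    """True when the set and every member report Online."""
    return info.get("Status") == "Online" and all(
        m["status"] == "Online" for m in info["members"]
    )


if __name__ == "__main__":
    print(is_healthy(parse_check_raid(SAMPLE)))
```

A cron job wrapped around something like this could mail you when a mirror stops reporting Online, which beats finding out from Disk Utility weeks later.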
After all the fun and games with mirror sets, the other RAID types will seem simple by comparison.
Stripes
Due to their relative fragility, there are not a lot of times that you will be using a stripe on your server. There are cases though, like a RAID 50 from an Xserve RAID, where it is common so we should take a look at it.
As with the mirrors, the easiest way to create a stripe is with Disk Utility. The process is the same, but you would define that the array created should be a stripe. Just drag in the volumes to stripe, set your options, and click "Create". The process of creating a stripe from the command line is the same as well, with the only change being to define a stripe rather than a mirror: sudo diskutil createRAID stripe RAIDname FilesystemName member disks. The big difference with a stripe though is that you cannot create one from an existing data disk. On Mac OS X the process of stripe creation will destroy any data on the disks.
Stripes are also different than the other supported RAID levels on Mac OS X in that they are static. Once you create a stripe you are stuck with it. You can't add space, you can't remove members, and you can't change the block size. For this reason it's important to figure out how you want to set it up before you do it.
When building your RAID stripe for performance the biggest factor is the number of spindles that you can stripe across. In general the more discrete devices in any given stripe, the faster it will go.
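To see why both the number of members and the block size matter, it helps to look at how a stripe spreads data around. The Python sketch below models a textbook RAID 0 layout where blocks rotate across the members in order; it is not taken from the AppleRAID source, so treat the exact layout as an assumption.

```python
def stripe_location(offset: int, block_size: int, members: int) -> tuple:
    """Map a byte offset in the stripe to (member index, byte offset on member)."""
    block = offset // block_size   # which stripe block holds the byte
    member = block % members       # blocks rotate across the members
    row = block // members         # full rows already laid down
    return member, row * block_size + offset % block_size


def members_touched(offset: int, length: int, block_size: int, members: int) -> set:
    """Which members a single read or write of `length` bytes has to visit."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return {block % members for block in range(first, last + 1)}


if __name__ == "__main__":
    KB = 1024
    # A 128 KB transfer on a two-drive stripe with 32 KB and 128 KB blocks.
    print(members_touched(0, 128 * KB, 32 * KB, 2))
    print(members_touched(0, 128 * KB, 128 * KB, 2))
```

With 32 KB blocks a single 128 KB transfer is split across both drives, while with 128 KB blocks it lands on one drive and leaves the other free to service a different request. Which of those is better depends entirely on your workload.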
Concatenated Disks
New to AppleRAID 2 is the concatenated, or concat for short, disk set. A concat set takes multiple volumes and combines them into a larger one. In this way it is similar to a stripe, but there are three key differences.
Concat sets are cool in that they enable you to span multiple volumes dynamically. When you add more storage space you can now stretch an existing volume to use it. Keep in mind though that, just like a mirror, any volumes you add to the set after its creation are formatted.
It is important to note that Apple's concat sets are a bit different than your typical one. On most OSes a concat set will survive the loss of a member but will act as if it just has a large section of bad blocks. On Mac OS X this is not the case and the loss of any member will result in the loss of the volume as a whole. Along these same lines, you cannot remove a concat member once it is in the array. If you read the source code it appears that Apple wanted to let you remove the last member but that functionality is not present. It's probably just as well, as removing a member of a concat could lead to catastrophic data loss if anything went wrong.
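For contrast with a stripe, here is how a plain concatenation maps an offset to a member. Again this is a generic Python sketch of the technique, not Apple's code, and the member sizes are made-up numbers. The point is that each member holds one contiguous range, so tacking a new member onto the end does not move any existing data.

```python
from bisect import bisect_right
from itertools import accumulate


def concat_location(offset: int, member_sizes: list) -> tuple:
    """Map a byte offset in a concatenated set to (member index, offset on member)."""
    ends = list(accumulate(member_sizes))  # cumulative end offset of each member
    if not 0 <= offset < ends[-1]:
        raise ValueError("offset is outside the set")
    member = bisect_right(ends, offset)    # first member whose end is past offset
    start = ends[member - 1] if member else 0
    return member, offset - start


if __name__ == "__main__":
    sizes = [100, 50]                  # hypothetical member sizes
    print(concat_location(120, sizes))
    sizes.append(200)                  # grow the set with a third member
    print(concat_location(120, sizes)) # same answer, existing data has not moved
```

That contiguous layout is what makes growing a concat set cheap, and it is also why a stripe, whose blocks are interleaved across every member, cannot be grown the same way.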
Mixed RAID Types
A big change in AppleRAID 2 is that you can now nest RAIDs within a larger RAID. As noted earlier, RAID 0+1 and 1+0 (aka RAID 10) allow you to mix the features of the different array types to create a more robust system. Of the two, RAID 10 is probably the one that you would be looking at. It provides the speed and large volume of a stripe while using mirrors for each of the stripe members.
You create nested RAIDs just like any other in Mac OS X, either with drag and drop in Disk Utility or with the diskutil command line tool. In this case it's probably easier with diskutil.
In Disk Utility you need to create your base RAIDs, then select the first one and go to the RAID tab. There you can click the + button to create a new set and drag the existing RAID sets into it. Click "Create" and you are done.
From the command line it's a bit simpler. Using the diskutil checkRAID command, determine the disk identifiers of your existing RAID volumes. Then use the createRAID verb to build a new set using them. For example, if I wanted to create a RAID 10 from mirrors with disk IDs of disk4 and disk6 I would execute sudo diskutil createRAID stripe NestedDisk HFS+ disk4 disk6. So while the creation of a nested set is a bit different in the GUI tools, it is exactly the same when using the CLI. Disks is disks.
This all sounds great until you realize how unlikely it is that this will help you at all. To achieve any sort of gains from a nested RAID you need at least four disks. Currently the only shipping Apple hardware with four disk bays is the Mac Pro, and the new Xeon Xserve models continue the three bay configuration of the G5 Xserve. If you had a large eSATA or FireWire 800 drive enclosure this might be of use, but if you are going to go the external storage route your money is better spent on an Xserve RAID.
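Because the createRAID syntax is identical for plain and nested sets, a tiny helper can cover both. This Python sketch only assembles the argument list and never runs it. The helper name is mine, and the set of accepted types (mirror, stripe, concat) is my reading of the verbs discussed in this article, so double-check it against your own diskutil before relying on it.

```python
def create_raid(kind: str, name: str, filesystem: str, members: list) -> list:
    """Build the argv for `diskutil createRAID`; nothing is executed here."""
    if kind not in ("mirror", "stripe", "concat"):  # assumed set of type names
        raise ValueError(f"unsupported RAID type: {kind}")
    if not members:
        raise ValueError("at least one member disk is required")
    return ["sudo", "diskutil", "createRAID", kind, name, filesystem, *members]


if __name__ == "__main__":
    # The RAID 10 example from the text: a stripe across two existing mirrors.
    print(create_raid("stripe", "NestedDisk", "HFS+", ["disk4", "disk6"]))
    # A filesystem name with a space stays one argument in the list.
    print(create_raid("mirror", "DataMirror", "Journaled HFS+", ["disk1", "disk3s3"]))
```

Remember that createRAID is destructive, so printing the command and reading it twice before running it by hand is not a bad habit.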
Other Options...
There are a few more things to cover before we can wrap up.
If you are upgrading a server from Panther to Tiger you will also need to upgrade any RAID sets to take advantage of the new features. This is pretty easy to do with diskutil. Let's say we have an AppleRAID 1 volume at /Volumes/PantherRAID that we need to update. The command would simply be sudo diskutil convertRAID /Volumes/PantherRAID, or you could use the disk identifier instead. When performing this operation all the same caveats apply as when you use the enableRAID command, the most important being MAKE A BACKUP FIRST!
Another, seemingly useless, command is the updateRAID verb for diskutil. In theory you should be able to use this to adjust the settings of a RAID after it is created. In practice I've not been able to get it to change anything. The example shows "diskutil updateRAID AppleRAID-AutoRebuild 1 disk5" but the changes never seem to take when you try it yourself. Even more frustrating is the fact that the supposed settings aren't published anywhere and you need to dig in the source to figure them out. It seems that AutoRebuild is the only setting that this should be able to work with, but it doesn't.
One of the most confusing settings is the RAID block size. There was a time when using tools like tunefs (which has a fun man page, by the way) was a standard practice to get the most out of your server's disks. Now IO is so fast that it doesn't really come up much, except in the case of RAIDs. As a result, tuning a file system is a bit of a lost art these days. This isn't an exhaustive look at it, but rather a quick summary to give you an idea of how to proceed if you really want to get into it.
The first, and most obvious, step is to look at the data access patterns of the applications that will be accessing the disk. If it's a SQL database doing tiny queries then you may benefit from small blocks. If you are serving large video files then a larger size might provide a boost. An easy thing to do in many cases is to ask the person responsible for the application to tell you what the average size of a read or write is. You can't just make the assumption that a database is going to need small blocks.
The next step is to take some measurements using iostat -w 1 on your server. This will tell you how many transactions per second are happening on each disk and the average size in KB of each transaction. Let's take a look at some examples...
Here is the output from one of my Xserves at work. At the time I took this sample disk0 was working on active print queues and disk1 had a radmind client in the middle of a large lapply run.
macxmv2:~ mv2admin$ iostat -w 1
iostat: sysctl(kern.tty_nin) failed: No such file or directory
iostat: disabling TTY statistics
disk0 disk1 cpu
KB/t tps MB/s KB/t tps MB/s us sy id
33.94 3 0.12 13.76 5 0.06 7 5 88
0.00 0 0.00 73.12 24 1.71 4 2 94
0.50 1 0.00 72.29 24 1.69 5 2 92
80.14 7 0.55 73.84 16 1.15 46 16 38
52.67 6 0.31 52.46 36 1.84 52 14 34
6.08 6 0.04 67.82 25 1.65 26 20 53
128.00 80 9.98 54.25 38 2.01 5 28 66
126.56 89 10.98 58.48 25 1.43 5 28 67
122.48 92 10.98 54.62 29 1.54 6 34 60
118.44 32 3.69 39.10 25 0.95 42 16 42
64.50 8 0.50 76.06 27 2.00 50 32 18
70.62 8 0.55 54.40 20 1.06 43 16 40
122.20 100 11.92 4.46 57 0.25 37 30 33
120.45 202 23.70 5.06 67 0.33 6 35 59
0.00 0 0.00 15.60 52 0.79 4 4 92
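Eyeballing columns of numbers gets old fast, so here is a minimal Python sketch that boils a sample like the one above down to an average transfer size per disk. It assumes the three-columns-per-disk layout shown (KB/t, tps, MB/s) followed by three CPU columns, and it weights each row by its tps so that idle seconds don't drag the average around. The function names are my own.

```python
SAMPLE = """\
          disk0           disk1       cpu
  KB/t tps  MB/s   KB/t tps  MB/s  us sy id
 33.94   3  0.12  13.76   5  0.06   7  5 88
  0.00   0  0.00  73.12  24  1.71   4  2 94
  0.50   1  0.00  72.29  24  1.69   5  2 92
 80.14   7  0.55  73.84  16  1.15  46 16 38
 52.67   6  0.31  52.46  36  1.84  52 14 34
  6.08   6  0.04  67.82  25  1.65  26 20 53
128.00  80  9.98  54.25  38  2.01   5 28 66
126.56  89 10.98  58.48  25  1.43   5 28 67
122.48  92 10.98  54.62  29  1.54   6 34 60
118.44  32  3.69  39.10  25  0.95  42 16 42
 64.50   8  0.50  76.06  27  2.00  50 32 18
 70.62   8  0.55  54.40  20  1.06  43 16 40
122.20 100 11.92   4.46  57  0.25  37 30 33
120.45 202 23.70   5.06  67  0.33   6 35 59
  0.00   0  0.00  15.60  52  0.79   4  4 92
"""


def parse_iostat(text: str) -> list:
    """Return one entry per sample row: a list of (KB/t, tps) tuples, one per disk."""
    rows = []
    for line in text.splitlines():
        try:
            numbers = [float(field) for field in line.split()]
        except ValueError:
            continue  # header or warning line
        if len(numbers) < 6 or len(numbers) % 3:
            continue
        disks = numbers[:-3]  # the last three columns are CPU us/sy/id
        rows.append([(disks[i], disks[i + 1]) for i in range(0, len(disks), 3)])
    return rows


def weighted_kb_per_transfer(samples: list) -> float:
    """Average transfer size in KB, weighted by transfers per second."""
    total_tps = sum(tps for _, tps in samples)
    if total_tps == 0:
        return 0.0
    return sum(kb * tps for kb, tps in samples) / total_tps


if __name__ == "__main__":
    rows = parse_iostat(SAMPLE)
    for index, name in enumerate(("disk0", "disk1")):
        average = weighted_kb_per_transfer([row[index] for row in rows])
        print(f"{name}: {average:.1f} KB per transfer")
```

A fifteen second sample is far too short to base a block size on, but the same approach works just as well on an hour of captured output.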
There are a few things we can glean from this info. The first is that radmind doesn't make very large filesystem transactions, with most of them between 50 and 75 KB. You can see the print spools on the boot drive make much larger reads and writes, but that is to be expected as it is sending and receiving large PostScript files. Note that the largest transaction you see on this Xserve is 128 KB. That's because it's a G4 and the biggest transaction a PATA drive can muster is 128 KB. Other drive types are capable of larger IO. In this case I might want to tune a file system that has lots of radmind traffic on it to a 64K block size as that more closely matches the average transaction size. Really what you are trying to do here is reduce the number of transactions per second needed to get the job done. I would hold off on changing the boot drive in this case though as the print spool is not the only thing it does. In general more analysis would be in order before trying to tune anything. Since block size isn't a setting you can change later you need to be sure what you are setting.
Another way to help determine your block size is to use a tool like Bonnie++ to measure read/write performance as well as latency. A good example of this is the fact that my SCSI based G4 test rig is completely bus constrained. If I stripe 2 SCSI drives together I can create a noticeable improvement in some areas, but not all, over that of a single drive. By tuning the block size to 128KB I can sacrifice some overall read/write performance in exchange for better latency. It's up to you to determine if tuning your blocksize is worth it. You should see more dramatic results with less, um, stately hardware.
Two Bonnie++ processes against a single device.
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
dhcp172-21s10n 300M 53 50 3844 7 1917 2 65 53 4007 2 598.3 22
Latency 746ms 952ms 1199ms 278ms 157ms 2166ms
Version 1.93c ------Sequential Create------ --------Random Create--------
dhcp172-21s10n88.09 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 398 20 2603 44 282 16 78 6 4248 71 56 5
Latency 317ms 234ms 419ms 426ms 55717us 1230ms
Two Bonnie++ processes against a two drive stripe and default, 32K, blocksize.
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
dhcp172-21s10n 300M 54 51 5617 10 2374 3 67 54 5278 3 358.1 25
Latency 470ms 1316ms 1446ms 474ms 148ms 658ms
Version 1.93c ------Sequential Create------ --------Random Create--------
dhcp172-21s10n88.09 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 433 22 2480 40 324 17 120 11 2957 50 80 8
Latency 197ms 175ms 258ms 317ms 80632us 585ms
Two Bonnie++ processes against a two drive stripe and 128K blocksize.
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
dhcp172-21s10n 300M 50 47 5107 9 2370 3 63 52 4972 3 845.1 40
Latency 516ms 791ms 780ms 246ms 148ms 952ms
Version 1.93c ------Sequential Create------ --------Random Create--------
dhcp172-21s10n88.09 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 419 21 2274 37 316 17 109 9 3219 51 88 9
Latency 210ms 197ms 272ms 597ms 78131us 518ms
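To make the trade-off between those three runs a little easier to see, this Python sketch computes the percentage change between any two of them. The figures are copied from the block output, block input, and random seek columns of the tables above; the dictionary keys and function names are just labels I made up.

```python
# K/sec for sequential block writes and reads, and random seeks per second,
# copied from the three Bonnie++ runs above.
RUNS = {
    "single drive": {"block_write": 3844, "block_read": 4007, "seeks": 598.3},
    "stripe, 32K": {"block_write": 5617, "block_read": 5278, "seeks": 358.1},
    "stripe, 128K": {"block_write": 5107, "block_read": 4972, "seeks": 845.1},
}


def percent_change(baseline: float, value: float) -> float:
    """Change from baseline to value, as a percentage of baseline."""
    return (value - baseline) / baseline * 100.0


def compare(baseline_name: str, other_name: str) -> dict:
    """Percentage change for every metric between two named runs."""
    baseline, other = RUNS[baseline_name], RUNS[other_name]
    return {metric: percent_change(baseline[metric], other[metric]) for metric in baseline}


if __name__ == "__main__":
    for metric, change in compare("stripe, 32K", "stripe, 128K").items():
        print(f"{metric}: {change:+.1f}%")
```

Going from the default block size to 128K on this rig costs roughly 9% in sequential block writes, while random seeks per second more than double. Two runs on one old G4 hardly make a benchmark suite, so measure on your own hardware before drawing conclusions.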
(Note that bonnie++ has a bunch of options you can use in the latest dev builds in order to multithread the tests. For these tests I created 2 semaphores and then used those to synchronize the tests. This is where you should see improvement on a RAID stripe over a single disk as there are more spindles and more heads to move the data around. Check the bonnie++ documentation on how to get the best results out of it. Bonnie++ can also output some nice HTML tables, but those results don't include latency.)
A Cautionary Warning
Currently the status of all of this on Intel-based Macs is a bit up in the air. As of 10.4.8 it should all work, but Apple's related KB articles are so vague that it's not really clear. If anyone has some Intel Macs to test with please do, and post your results in the comments below.
Wrapping up
So there it is. The most comprehensive collection of information on AppleRAID anywhere. While I was writing this, Apple posted a few KB notes on the subject. They have pretty pictures, but in general are pretty surface level. That's not to say you shouldn't read them though, as I'm a firm believer in reading all the documentation from the ReadMe to the source code.
With the apparent lack of an internal hardware RAID option in the new Xeon Xserve you can bet that Apple is going to be pushing to further improve AppleRAID and I am excited to see where they take the technology. If you read the comments in the source it's plain to see that RAID 5 and better concat sets are something they would like to implement.
A critical thing to remember in all of this is that RAID is not a backup. RAID, in the server context, is uptime assurance. Even if something gets hosed up in the rebuild process, and you need to start over from a backup, the RAID did its job by keeping services up and letting YOU pick the downtime needed to repair the server. When dealing with servers in general I belong to the school that availability is goal number one. AppleRAID 2 can help you achieve this.
As always, have fun and read the man pages...
