Xsan has been out for a few months now so it seems like a good time to go over some Storage Area Network (SAN) stuff and then go through a typical Xsan setup.

Before Xsan
A SAN is, at its most simple, a collection of devices (servers, storage, tape libraries) connected together on a Fibre Channel (FC) network with a FC switch. A SAN enables us to get more out of our storage by making it accessible to multiple hosts.
However, in the simple setup above, each server will see the storage as directly attached. There is no file-locking, and soon enough we’ll have file corruption and major problems.
To get around this, the storage needs to be provisioned out to the servers (each getting its own piece of the storage), either through LUN masking or zoning on the switch. No matter how well we plan this, we can guarantee that eventually we’ll be stuck with one storage device at 98% capacity while others sit half empty.
This is where Xsan comes in.
Xsan is a cluster file system for OS X. A SAN filesystem allows us to open up all our storage to all our devices. Xsan will control access to the files to make sure everyone is happy – and it does all this through metadata which is controlled, surprisingly enough, by what is called a metadata controller.
While setting up our SAN with Xsan is made amazingly simple in Xsan Admin (the GUI application), please remember that this is Xsan and not iSan. The design, configuration and setup of our SAN environment should not be taken lightly or approached without proper planning and knowledge.
So, here is a quick guide to the key elements: Metadata Controllers, Clients, Storage, and the FC switch.
As I said earlier, all access to files stored on the SAN is controlled by a metadata controller. This machine tells the clients where to find files and what permissions apply to them.
Clients are any machines directly connected to the FC network – they can be G4 or G5 towers or Xserves running OS X client or server.
Storage is where all our data sits, i.e. Xserve RAIDs.
The FC switch is possibly the most important element when designing a SAN. Apple has certified switches from a number of vendors to use with Xsan. To get the best possible performance you want a full fabric switch. Much like Ethernet, Fibre Channel has both “hubs”, which create arbitrated loop connections, and full fabric switches, similar to an Ethernet switch. Don’t be confused by the terminology: unlike Ethernet, where you’d call one a hub and the other a switch, both types of Fibre Channel device are called a “switch”. A rule of thumb here is that you get more if you pay more. The Emulex models on the Apple Store are various versions of a Fibre Channel “hub”, as their price would suggest. While these devices are great for basic Fibre Channel connectivity, they probably aren’t what you’re looking for if you are getting Xsan for speed.
So, head over to Apple’s Xsan compatibility page for a full list of approved devices. Switches come in 8 port up to 64 port models. Some manufacturers allow you to purchase based on number of active ports and you can activate the remaining ports by purchasing license keys when you need the extra capacity.
Measure Twice, Buy Once
The initial design of an Xsan solution is so important, it’s almost impossible to stress it enough. Mistakes in planning your SAN can literally cost you thousands of dollars.
The best place to start your design with Xsan is at the end! Figure out what your end-user requirements are; this will give you a target for how much capacity you’re going to need. From there you’ll be able to plan the storage and FC networking that you’ll need to support that capacity.
And now an Example
Most Xsans are being used for video production. While Xsan can be useful in a number of situations, video is certainly one of Apple’s target markets for this product. Plus the need for very high bandwidth in a video environment really brings home how to architect a SAN solution.
Our example will focus on a reasonable video production network of 4 edit bays:
3 edit bays that will be using uncompressed SD 8-bit
1 edit bay doing uncompressed HD 8-bit
From this we can begin our initial bandwidth requirement calculations by multiplying each video format, HD and SD in this case, by the necessary data rate for that type.
3 SD feeds at 20MB/s + 1 HD feed at 125MB/s = 185MB/s
Note that that’s a big “B” in MB for megabyte, not megabit.
Now let’s assume our SD edit bays will be using 4 streams of video in their projects. Throw this extra traffic into the mix and calculate again.
3 SD bays with 4 feeds each at 20MB/s + 1 HD feed at 125MB/s = 365MB/s
The extra feeds per bay have pretty much doubled our requirements and begin taking up a significant amount of bandwidth.
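As a sanity check, the bandwidth arithmetic above can be sketched in a few lines of Python (the per-stream data rates are the article’s figures for uncompressed 8-bit SD and HD, not universal constants):

```python
# Per-stream data rates for uncompressed 8-bit video (big "B": megabytes).
SD_RATE_MBS = 20    # MB/s per uncompressed SD 8-bit stream
HD_RATE_MBS = 125   # MB/s per uncompressed HD 8-bit stream

sd_bays, streams_per_sd_bay = 3, 4   # each SD bay works with 4 streams
hd_bays, streams_per_hd_bay = 1, 1

total_mbs = (sd_bays * streams_per_sd_bay * SD_RATE_MBS
             + hd_bays * streams_per_hd_bay * HD_RATE_MBS)
print(total_mbs)  # 365 (MB/s)
```

Scaling the stream counts per bay is where requirements balloon, so it pays to ask editors how many simultaneous streams their projects really use before settling on a number.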
So, now that we know we need 365MB/s from our storage we can start putting the storage together that will get us that speed.
Ed. Note: While Xsan can use pretty much any Fibre Channel storage, it has certainly been designed with the Xserve RAID in mind. Because of this most Xsan setups are using Xserve RAIDs and thus we have the most real-world knowledge of what to expect from them.
Each Xserve RAID (XSR) has 2 controllers. Each controller is capable of delivering something in the region of 80-100MB/s. It’s very important to note that this is less than what the XSR will do if it’s directly attached to the host machine. This discrepancy is caused by the greater overhead required by Xsan’s cluster file system.
So, in our example we’ll need 5 controllers, or 2 and a half RAIDs to get us our bandwidth.
5 controllers x 80MB/s = 400MB/s which is greater than our needs of 365MB/s
Sadly Apple doesn’t offer the option of buying half an XSR (even though it would make them much easier to lift), but this works out perfectly for us as we’ll be needing that other controller. So 3 XSR units it is.
Hang on, shouldn’t the calculation be 4 controllers at 100MB/s to get 400MB/s? Why do we need 5 controllers? Well, on a good day you’ll get 100MB/s, but on a bad day – and this depends on the type of video, the codec used to compress it and a number of other factors – you’ll only get 80MB/s, which would leave you short by about 45MB/s if you only had 4 controllers. With this in mind it is much better to build in padding to account for changes in demand than to have dropped frames during a capture.
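The controller sizing above can be sketched the same way. The 80MB/s “bad day” and 100MB/s “good day” figures are the article’s rough estimates for an XSR controller under Xsan, so treat the result as a planning number, not a guarantee:

```python
import math

REQUIRED_MBS = 365        # from the earlier bandwidth calculation
WORST_CASE_MBS = 80       # conservative per-controller throughput under Xsan
CONTROLLERS_PER_XSR = 2   # each Xserve RAID has two controllers

# Size against the worst case, then round up to whole RAID units.
controllers = math.ceil(REQUIRED_MBS / WORST_CASE_MBS)
xsrs = math.ceil(controllers / CONTROLLERS_PER_XSR)
print(controllers, xsrs)  # 5 controllers -> 3 Xserve RAIDs

# Sizing against the optimistic 100MB/s figure gives only 4 controllers,
# which on a bad day leaves the SAN short:
shortfall_mbs = REQUIRED_MBS - 4 * WORST_CASE_MBS
print(shortfall_mbs)      # 45 (MB/s short)
```

Rounding up to whole controllers, and then to whole RAID chassis, is exactly the padding the article recommends.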
It’s imperative when planning an Xsan that you be realistic about your needs and abilities. Yes it’s expensive to put one of these together and management’s natural tendency is to keep the price low. However scraping by on the requirements is going to kill you in the long run when your production is brought to a standstill because your bays can’t be fed. Saving a few thousand on the storage is going to make the $10,000 you spent on the extra edit bay worthless.
When configuring your XSRs for Xsan it’s usually best to use one of the drives as a hot spare, which will automatically be added to the array if one of the other drives fails. Then take the remaining 6 drives and make a RAID 5 array out of them. The XSR has been optimized for RAID 5 and is faster in that configuration than in any other.
An Xsan volume, being essentially a RAID 0 stripe across multiple XSRs, is best if all of the storage is the same size. So keep all of your XSRs configured with the same number and size of drives.
Metadata and Latency
Earlier on I talked about metadata. This needs to be stored on a volume on the SAN too. The metadata is key to our SAN. If we lose access to our metadata, we lose access to our data, which is usually not a good thing.
Xsan metadata can easily be stored on a LUN that is also used for file storage. However, this will drive down your SAN’s total speed. Every extra hit to the MD controller introduces more latency into the SAN as the MD reads/writes have to wait on other operations. This will slow down your SAN, so bite the bullet and resist the urge to fill an entire XSR half with drives just to get more storage. Typically a 2-disk RAID 1 array is used for MD storage. You won’t need more than 250GB for this, and the RAID mirror will prevent a drive failure from stopping the SAN. So put two modules in one half of an XSR and duct tape over the other 5 slots to prevent them from being used.
The metadata controllers (MDCs) are crucial to keeping the SAN running. If we lose a controller, we lose access to our data. Our primary MDC should be an Xserve dedicated to this task. Again, this is another area where you’re going to think you’re wasting a huge amount of capacity just to do metadata. This thought will be reinforced when you look at the MDC under load and realize it’s not working nearly as hard as an Xserve could be.
When you have this thought, put Server Admin away. Leave the MDC alone. Every other service that you put on the MDC will add latency. Sure, your MDC could be serving webpages or files in its spare time, but every MD request that has to wait because the processor is busy with something else means reduced speed for the SAN. Get enough of that and you start dropping frames again.
You need to have at least one secondary MDC in case of failure on the primary. The failover process is fully automated and fast, so no worries there. The failover also makes Xsan upgrades a breeze – you can fail-over between controllers when installing with no down-time. Ideally the backup MDC is dedicated to this and not just another edit bay doing double-duty.
As long as we are talking about MDCs we need to talk about the MD network. Xsan wants a Gigabit Ethernet network dedicated to MD between all of the Xsan machines. This isn’t to move the files around, the FC does that, it’s for the Xsan nodes to query the MDC on where to find the files on the FC network.
Your MD will most likely not require 1000Mb/s of pipe. However, Gigabit has lower latency than 100Mb Ethernet. For this reason you want to go with Gigabit. A slight silver lining for the bean counters: you probably want the dumbest Gigabit Ethernet switch that you can get for your MD network. The smarter the switch, the more it looks into the Ethernet frames and the more… wait for it… latency you get. We’ve seen VLANs, Spanning Tree/PortFast and other managed switch technologies become a slight hindrance for Xsan MD networks.
Again this is an architectural decision that you’re going to need to make. Many network administrators really want to use the equipment and cabling that’s already in place for the MD network. And don’t get me wrong, the existing pieces will work, but you won’t be as optimized as you could be.
And as you’ve probably already surmised, it’s best not to use the MD network for other things like filesharing and surfing the web. Keep the MD network separate from the rest. This means that in the Network Preference pane on the Xsan nodes, the MD network is secondary to your “regular” network. For your G5 desktops you’re going to have to get another Ethernet card, or in a pinch use wireless, for this regular network. Since your machines are probably dedicated editing machines you might not be using the primary network connection that much, and wireless is certainly good enough for surfing the web and reading e-mail while you’re waiting for the client to make up their mind about the size of the logo for the intro piece.
There are a couple of limitations in Panther right now that will most likely be removed in Tiger. A single LUN cannot exceed 2TB and an Xsan volume cannot exceed 16TB. Of course, a 6-disk (400GB) RAID 5 array with a hot spare doesn’t exceed the 2TB limit on LUNs, and Xsan allows multiple 16TB volumes. Plus putting together a 16TB volume to begin with is beyond most organizations’ budgets, so these really aren’t a hindrance with Panther.
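A quick check that the recommended array layout fits under the Panther limit, assuming the 400GB drives from the example (drive sizes and the 6-data-drives-plus-hot-spare layout are taken from the article):

```python
DRIVE_GB = 400        # drive size used in the example
RAID5_DRIVES = 6      # drives per XSR half after setting aside the hot spare
LUN_LIMIT_GB = 2000   # Panther's 2TB per-LUN ceiling (decimal GB)

# RAID 5 gives up one drive's worth of capacity to parity.
usable_gb = (RAID5_DRIVES - 1) * DRIVE_GB
print(usable_gb)      # 2000 -- right at, but not over, the limit
assert usable_gb <= LUN_LIMIT_GB
```

So the standard “hot spare plus 6-drive RAID 5” layout happens to land right at the per-LUN ceiling, which is part of why it’s the default recommendation.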
Time for a diagram! Xsan Fibre Channel setup
Another diagram – Xsan with metadata network
Now that we have all the components connected together over FC and Ethernet, it’s time to look at completing the picture. We’re going to need a backup MDC, we’re going to need a DNS server and we’re going to need a way to manage permissions to all our files. Trying to manage permissions to files on the SAN without a central directory service gives me nightmares, so we’re going to use Open Directory. The idea of managing and synchronizing UIDs across all those machines is asking for trouble.
As I said earlier, you really want to leave your primary MDC just doing that – being a controller. Adding in extra servers gives us excellent redundancy and performance benefits, just as adding in extra RAIDs gives both increased storage and increased performance. We can add a second Xserve running DNS and acting as an Open Directory master, as well as setting it up as a backup MDC. We can add a third Xserve as a file server for the regular LAN (via AFP or NFS) for Ethernet clients that need access without the performance of FC. This server can be an OD replica and a slave DNS server.
And a final diagram – Xsan with all the other bits and pieces.
Things to remember:
Xsan supports 64 clients on the FC network (an XSR is not a client but a metadata controller is).
Apple’s FC cards have 2 ports – this gives increased performance and redundancy, but takes up more ports on your switch. Make sure you configure the Xsan client to “rotate” through the FC ports to use as much of both as possible for throughput.
Don’t fill up every port on your switch – if you need to add a client or two for a rush project you won’t be able to – always leave some breathing room.
Firmware! Make sure your firmware is up to date and compatible. The FC switch makers sometimes have specific firmware for specific manufacturers.
FC switch – make sure you have it configured properly. Most will work straight out of the box – but that doesn’t always mean they’re performing optimally. Some switches require RSCN suppression, others need some zoning to improve performance.
Xsan can easily scale in capacity and performance as you add Xserves and Xserve RAIDs, up to the limits stated above.
Don’t try to cut corners – corners are sharp and you will bleed bandwidth!