Tips April 19, 2005 at 11:08 am

Xsan Deployment Advice

Xsan has been out for a few months now, so it seems like a good time to go over some Storage Area Network (SAN) basics and then go through a typical Xsan setup.

Before Xsan

A SAN is, at its most simple, a collection of devices (servers, storage, tape libraries) connected together on a Fibre Channel (FC) network with a FC switch. A SAN enables us to get more out of our storage by making it accessible to multiple hosts.

However, in the simple setup above, each server will see the storage as directly attached: there is no file locking, and soon enough we’ll have file corruption and major problems.

To get round this, the storage needs to be provisioned out to the servers (each getting their own piece of the storage), either through LUN masking or zoning on the switch. No matter how well we plan this, we can guarantee that eventually we’ll be stuck with one storage device at 98% capacity while others sit half empty.

This is where Xsan comes in.

Xsan is a cluster file system for OS X. A SAN filesystem allows us to open up all our storage to all our devices. Xsan will control access to the files to make sure everyone is happy – and it does all this through metadata which is controlled, surprisingly enough, by what is called a metadata controller.

While setting up our SAN with Xsan is made amazingly simple in Xsan Admin (the GUI application), please remember that this is Xsan and not iSan. The design, configuration and setup of our SAN environment should not be taken lightly or approached without proper planning and knowledge.

So, here is a quick guide to the key elements: Metadata Controllers, Clients, Storage, and the FC switch.

As I said earlier, all access to files stored on the SAN is controlled by a metadata controller. This machine tells the clients where to find files and what permissions apply to those files.

Clients are any machines directly connected to the FC network – they can be G4 or G5 towers or Xserves running OS X client or server.

Storage is where all our data sits – typically Xserve RAIDs.

The FC switch is possibly the most important element when designing a SAN. Apple has certified switches from a number of vendors for use with Xsan. To get the best possible performance you want a full fabric switch. Much like Ethernet, Fibre Channel has both “hubs”, which create arbitrated loop connections, and full fabric switches, similar to an Ethernet switch. Don’t be confused by the fact that both types of Fibre Channel device are sold as a “switch”, unlike Ethernet where you’d call them a hub and a switch. A rule of thumb here is that you get more if you pay more. The Emulex models on the Apple Store are various versions of a Fibre Channel “hub”, as their price would suggest. While these devices are great for basic Fibre Channel connectivity, they probably aren’t what you’re looking for if you are buying Xsan for speed.

So, head over to Apple’s Xsan compatibility page for a full list of approved devices. Switches come in models from 8 ports up to 64 ports. Some manufacturers let you purchase a certain number of active ports and activate the remainder with license keys when you need the extra capacity.

Measure Twice, Buy Once

The initial design of an Xsan solution is so important that it’s almost impossible to stress it enough. Mistakes in planning your SAN can literally cost you thousands of dollars.

The best place to start your design with Xsan is at the end! Figure out what your end-user requirements are; this will give you a target for how much capacity you’re going to need. From there you’ll be able to plan the storage and FC networking that you’ll need to support that capacity.

And Now an Example

Most Xsans are being used for video production. While Xsan can be useful in a number of situations, video is certainly one of Apple’s target markets for this product. Plus, the need for very high bandwidth in a video environment really brings home how to architect a SAN solution.

Our example will focus on a reasonable video production network of 4 edit bays:

3 edit bays that will be using uncompressed SD 8-bit
1 edit bay doing uncompressed HD 8-bit

From this we can begin our initial bandwidth requirement calculations by multiplying the number of streams of each video format, HD and SD in this case, by the necessary data rate for that format.

3 SD feeds at 20MB/s + 1 HD feed at 125MB/s = 185MB/s

Note that that’s a big “B” in MB for megabyte, not megabit.

Now let’s assume our SD edit bays will be using 4 streams of video in their projects. Throw this extra traffic into the mix and calculate again.

3 SD bays with 4 feeds each at 20MB/s + 1 HD feed at 125MB/s = 365MB/s

The extra feeds per bay have pretty much doubled our requirements and begin taking up a significant amount of bandwidth.

So, now that we know we need 365MB/s from our storage, we can start putting together the storage that will deliver that speed.
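
If you want to sanity-check the arithmetic, here’s a minimal sketch in Python of the calculation above. The rates and names are just the ballpark figures from this example – substitute measured rates for your actual codecs:

    # Ballpark per-stream data rates used in this example; substitute
    # measured rates for your actual codecs and frame sizes.
    SD_RATE = 20    # uncompressed SD 8-bit, MB/s per stream
    HD_RATE = 125   # uncompressed HD 8-bit, MB/s per stream

    def required_bandwidth(sd_streams, hd_streams):
        """Total sustained throughput (MB/s) the storage must deliver."""
        return sd_streams * SD_RATE + hd_streams * HD_RATE

    print(required_bandwidth(3, 1))       # 185 MB/s: one stream per bay
    print(required_bandwidth(3 * 4, 1))   # 365 MB/s: four streams per SD bay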

Storage Calculations

Ed. Note: While Xsan can use pretty much any Fibre Channel storage, it has certainly been designed with the Xserve RAID in mind. Because of this, most Xsan setups use Xserve RAIDs, and thus we have the most real-world knowledge of what to expect from them.

Each Xserve RAID (XSR) has 2 controllers. Each controller is capable of delivering something in the region of 80-100MB/s. It’s very important to note that this is less than what the XSR will do if it’s directly attached to the host machine. This discrepancy is caused by the greater overhead required by Xsan’s clustered file system.

So, in our example we’ll need 5 controllers, or 2 and a half RAIDs to get us our bandwidth.

5 controllers x 80MB/s = 400MB/s which is greater than our needs of 365MB/s

Sadly Apple doesn’t offer the option of buying half an XSR (even though it would make them much easier to lift), but this works out perfectly for us as we’ll be needing that other controller. So 3 XSR units it is.

Hang on, shouldn’t the calculation be 4 controllers at 100MB/s to get 400MB/s? Why do we need 5 controllers? Well, on a good day you’ll get 100MB/s, but on a bad day – and this depends on the type of video you get, the codec used to compress it, and a number of other factors – you’ll only get 80MB/s, which would leave you about 45MB/s short if you only had 4 controllers. With this in mind, it is much better to build in padding to account for changes in demand than to have dropped frames during a capture.
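
Here’s a quick sizing sketch of that reasoning, working from the conservative bad-day figure with two controllers per chassis:

    import math

    CONTROLLER_RATE = 80      # conservative (bad-day) MB/s per controller
    CONTROLLERS_PER_XSR = 2   # two controllers per Xserve RAID chassis

    def controllers_needed(required_mbs):
        """Smallest controller count that covers the requirement."""
        return int(math.ceil(float(required_mbs) / CONTROLLER_RATE))

    needed = controllers_needed(365)                             # 5 controllers
    raids = int(math.ceil(float(needed) / CONTROLLERS_PER_XSR))  # 3 XSRs
    print(needed, raids, needed * CONTROLLER_RATE)               # 5 3 400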

It’s imperative when planning an Xsan that you be realistic about your needs and abilities. Yes, it’s expensive to put one of these together, and management’s natural tendency is to keep the price low. However, scraping by on the requirements is going to kill you in the long run when your production is brought to a standstill because your bays can’t be fed. Saving a few thousand on the storage is going to make the $10,000 you spent on the extra edit bay worthless.

When configuring your XSRs for Xsan, it’s usually best to use one of the drives as a hot spare, which will automatically get added into the array if one of the other drives fails. Then take the remaining 6 drives and make a RAID 5 array out of them. The XSR has been optimized for RAID 5 and is faster doing that than any other RAID configuration.

An Xsan volume, being essentially a RAID 0 stripe across multiple XSRs, is best if all of the storage is the same size. So keep all of your XSRs configured with the same number and size of drives.
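
Here’s a quick capacity sketch for that layout, assuming the 7 drive bays on each controller side of an XSR and, purely for illustration, 400GB drive modules:

    DRIVE_GB = 400       # per-module capacity -- adjust to your drives
    BAYS_PER_SIDE = 7    # drive bays behind each Xserve RAID controller

    def raid5_lun_gb(hot_spares=1):
        """Usable GB of one side: the bays minus hot spare(s), minus one
        drive's worth of RAID 5 parity."""
        data_drives = BAYS_PER_SIDE - hot_spares   # 6-drive RAID 5 set
        return (data_drives - 1) * DRIVE_GB        # parity costs one drive

    print(raid5_lun_gb())      # 2000 GB per LUN
    # Three XSRs give six sides; reserve one side for metadata and the
    # remaining five data LUNs stripe into one Xsan volume:
    print(5 * raid5_lun_gb())  # 10000 GB of user storage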

Metadata and Latency

Earlier on I talked about metadata. This needs to be stored on a volume on the SAN too. The metadata is key to our SAN. If we lose access to our metadata, we lose access to our data, which is usually not a good thing.

Xsan metadata can easily be stored on a LUN that is also used for file storage. However, this will drive down your SAN’s total speed. Every extra hit to the LUN holding the metadata introduces more latency into the SAN, as the MD reads/writes have to wait on other operations. This will slow down your SAN, so bite the bullet and resist the urge to fill the rest of that XSR half with drives just to get more storage. Typically a 2-disk RAID 1 array is used for MD storage. You won’t need more than 250GB for this, and the RAID mirror will prevent a drive failure from stopping the SAN. So put two modules in one half of an XSR and duct tape over the other 5 slots to prevent them from being used.

The metadata controllers (MDCs) are crucial to keeping the SAN running. If we lose a controller, we lose access to our data. Our primary MDC should be an Xserve dedicated to this task. Again, this is another area where you’re going to think you’re wasting a huge amount of capacity just to do metadata. This thought will be reinforced when you look at the MDC under load and realize it’s not working nearly as hard as an Xserve could be.

When you have this thought, put Server Admin away. Leave the MDC alone. Every other service that you put on the MDC will cause additional latency. Sure, your MDC could be serving webpages or files in its spare time, but every MD request that has to wait because the processor is thinking about something else means reduced speed for the SAN. Get enough of that and you start dropping frames again.

You need to have at least one secondary MDC in case of failure on the primary. The failover process is fully automated and fast, so no worries there. The failover also makes Xsan upgrades a breeze – you can fail over between controllers when installing, with no downtime. Ideally the backup MDC is dedicated to this and not just another edit bay doing double-duty.

Metadata Network

As long as we are talking about MDCs, we need to talk about the MD network. Xsan wants a Gigabit Ethernet network dedicated to MD between all of the Xsan machines. This isn’t to move the files around – the FC does that – it’s for the Xsan nodes to query the MDC about where to find the files on the FC network.

Your MD will most likely not require 1000Mb/s of pipe. However, Gigabit has lower latency than 100Mb Ethernet, so you want to go with Gigabit. A slight silver lining for the bean counters here is that you probably want the dumbest Gigabit Ethernet switch you can get for your MD network. The smarter the switch, the more it looks into the Ethernet frames and the more… wait for it… latency you get. We’ve seen VLANs, Spanning Tree/PortFast and other managed switch technologies become a slight hindrance for Xsan MD networks.

Again this is an architectural decision that you’re going to need to make. Many network administrators really want to use the equipment and cabling that’s already in place for the MD network. And don’t get me wrong, the existing pieces will work, but you won’t be as optimized as you could be.

And as you’ve probably already surmised, it’s best not to use the MD network for other things like file sharing and surfing the web. Keep the MD network separate from the rest. This means that in the Network Preference pane on the Xsan nodes, the MD network is secondary to your “regular” network. For your G5 desktops you’re going to have to get another Ethernet card, or in a pinch use wireless, for this regular network. Since your machines are probably dedicated editing machines, you might not be using the primary network connection that much, and wireless is certainly good enough for surfing the web and reading e-mail while you’re waiting for the client to make up their mind about the size of the logo for the intro piece.

There are a couple of limitations in Panther right now that will most likely be removed in Tiger. A single LUN cannot exceed 2TB and an Xsan volume cannot exceed 16TB. Of course, a 6-disk (400GB) RAID 5 array with a hot spare doesn’t exceed the 2TB limit on LUNs, and Xsan allows multiple 16TB volumes. Plus, putting together a 16TB volume to begin with is beyond most organizations’ budgets, so these limits really aren’t a hindrance with Panther.
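
A quick back-of-the-envelope check of our example against those limits (treating 2TB as 2000GB for round numbers):

    LUN_LIMIT_GB = 2000       # Panther-era Xsan LUN limit (2 TB)
    VOLUME_LIMIT_GB = 16000   # Panther-era Xsan volume limit (16 TB)

    lun_gb = 5 * 400                  # the 6-drive RAID 5 LUN from above
    assert lun_gb <= LUN_LIMIT_GB     # 2000 GB: fits, just barely

    # How many such LUNs fit in a single volume before you need another:
    print(VOLUME_LIMIT_GB // lun_gb)  # 8 LUNs, i.e. four fully-striped XSRs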

Time for a diagram! [Diagram: Xsan Fibre Channel setup]

Another diagram: [Diagram: Xsan with metadata network]

Now that we have all the components connected together over FC and Ethernet, it’s time to look at completing the picture. We’re going to need a backup MDC, we’re going to need a DNS server, and we’re going to need a way to manage permissions to all our files. Trying to manage permissions to files on the SAN without a central directory service gives me nightmares, so we’re going to use Open Directory. The idea of managing and synchronizing UIDs across all those machines is asking for trouble.

As I said earlier, you really want to leave your primary MDC just doing that – being a controller. Adding in extra servers gives us excellent redundancy and performance benefits, just as adding in extra RAIDs gives both increased storage and increased performance. We can add a second Xserve running DNS and acting as an Open Directory master, as well as serving as the backup MDC. We can add a third Xserve as a file server (via AFP or NFS) for Ethernet clients on the regular LAN that need access without the performance of FC. This server can be an OD replica and a slave DNS server.

And a final diagram: [Diagram: Xsan with all the other bits and pieces]

Things to remember:

Xsan supports 64 clients on the FC network (an XSR is not a client but a metadata controller is).

Apple’s FC cards have 2 ports – this gives increased performance and redundancy, but takes up more ports on your switch. Make sure you configure the Xsan client to “rotate” through the FC ports to use as much of both as possible for throughput.

Don’t fill up every port on your switch – if you need to add a client or two for a rush project you won’t be able to – always leave some breathing room.

Firmware! Make sure your firmware is up to date and compatible. The FC switch makers sometimes have specific firmware for specific storage manufacturers.

FC switch – make sure you have it configured properly. Most will work straight out of the box – but that doesn’t always mean they’re performing optimally. Some switches require RSCN suppression, others need some zoning to improve performance.

Xsan can easily scale in capacity and performance as you add Xserves and Xserve RAIDs, up to the limits stated above.

Don’t try to cut corners – corners are sharp and you will bleed bandwidth!

Comments

  • Hi, thanks for the article! It’s good to see someone with real-world
    experience talking on the topic of expected throughput and latency, as
    this is always hard to judge and documentation is not always as
    real-world as we would like...

    I have a couple of things to ask of someone working with Xsan and
    associated hardware (as buying one to test is not really an option):
    First of all, it appears (as far as I am aware) that there is a potential
    single point of failure at the controller (and the cable connected to it)
    on the Xserve RAID. All the documentation I have found points to this
    (page 24 of the admin guide), and various posts on the discussion forum
    seem to confirm it.

    I have no problem with this if you are selecting Xsan as a lower-tier SAN
    solution (not mission critical), however I wanted to ask if anyone has
    seen (or simulated) a controller failure, or a cable failure to a RAID
    controller.

    From a support perspective I would want to know how this would affect a
    SAN volume, and the best procedures for recovering ASAP. Obviously this
    will to some degree depend on how you define SAN volumes and affinities –
    perhaps someone could suggest the best way of minimising risk in
    different situations without losing too many benefits of the SAN
    architecture.

    Secondly, I wanted to suggest that it may be worth considering mirrored
    hardware RAID internal drives for the metadata controller, instead of
    taking up a side of an Xserve RAID just for this purpose – unless you
    considered and rejected this idea for reasons not mentioned in the
    article.

    thanks again

    Pete

    • Metadata has to reside on a SAN volume – *if* you could store it on the
      MDC’s internal drives (which you can’t), how would your backup MDC get
      to it if your primary MDC failed?
      With it on a SAN volume, all MDCs can access the metadata pretty much
      instantly when failing over.

    • Regarding the single point of failure at the Xserve RAID controller, and how to
      recover ASAP: simply add an AppleCare Service Parts Kit to your initial
      shopping list. This kit includes a replacement controller (as well as a power
      supply, a cooling fan, and a drive module), and the replacement installs in
      just a couple minutes.

      Two other items I’d consider for my shopping list: an Xsan Certification
      course, available at train.apple.com, and AppleCare Xsan Support, for a year
      of 24-7 phone and email assistance. The course will be invaluable during
      your setup, maintenance, and day-to-day troubleshooting, and the
      AppleCare assistance could save you a considerable consultant’s fee if you
      get in over your head.

      • Don’t forget about buying maintenance if you want all software upgrades
        covered for the next 3 years too:
        http://www.apple.com/server/maintenance/

        • Thanks for the advice – sorry, I did not pick up that the metadata
          was required to be on the SAN.

          One question though, since Xsan stripes RAID arrays at a higher
          level: if you have, say, three (RAID 5 based) LUNs that are all
          connected to separate controllers, creating one storage pool from
          two LUNs (striped together by Xsan) with one dedicated LUN for
          metadata, would you lose the entire storage pool if any one
          controller fails? Can you then take the system offline, replace the
          controller, reboot, and recover the last state within a short time,
          without going to backup?

    • What happens if a LUN dies — say a controller goes out. What happens to the
      volume? Is data lost?

      • You shouldn’t lose any data, but you will lose access to it until you replace the
        controller (or repair whatever the cause of the LUN going down was). This is
        where parts kits (and hot spares in your LUNs) are invaluable – you’re back up
        and running before the service technician has even got their engine started.

  • Hi,

    I’ve seen a few articles discussing Xsan in video production suites. I’ve never seen anyone mention how well Xsan works with simple web files.

    We have a few WebObjects applications (running on multiple Xserves) that allow users to upload images to a volume that must be able to be read by multiple web servers. The files are small images anywhere between 1024 bytes and 768KB. Right now what we do is write the images to the application servers, then scp them over to the web servers.

    I was hoping that we could remove this step by installing an Xsan environment with a couple Xserve RAIDs. That way, the application servers could all write to the Xsan volume and all the web servers would have instant access to those files for serving up to the users’ web browsers.

    Does anyone see any problems with this? Is Xsan suitable for this kind of task? There will obviously be very frequent access to LOTS of small files. From reading this thread, Xsan seems to be a very complex system. I’m not afraid of complexity. I just want to make sure that I’m not getting myself into anything that could be better solved using a simpler solution.

    Essentially, I want multiple web servers and application servers to be able to access the same Xserve RAIDs at the same time.

    Is there a better solution for this than Xsan?

    Thanks,

    Brendan
