Hooray for new toys!

Last week was probably my craziest here so far. On Monday, we were off because there was some kind of blizzard thing, so it was a shorter week to begin with. On top of that, there were some things piled up from my week off with my gall bladder surgery, so I was trying to get those done, and then the icing on the cake was that we were evaluating a new storage solution for the VMware environment that we’ve got, so I had to get things going there, too. Of everything, that last one turned out the best.

So, the situation that we’re in right now is that our primary storage is a dual-controller NetApp 3210. We’re doing the whole Active/Active thing, and our disk aggregates are broken up across controllers.

Until 4 or 5 months ago, we were running roughly the same number of disks but in a single aggregate, attached to a single-headed FAS3140, so while it was probably a less sophisticated setup, IO was spread over a much larger number of spindles.

If you’re curious about the actual disk count, on the head that deals with virtualization, we’ve got 14 spindles, and on the other head, we’ve got 42. I’m not the storage admin, so I can’t tell you exactly why it’s set up like this, but there are reasons.

The 14 spindles that deal with virtualization just can’t keep up with the load that our classroom virtual environment is putting on it, so we’ve been looking at other options, from putting everything back on the same aggregate to getting a specific solution in place to offload the IO. We talked to Cambridge Computing, and they suggested that we take a look at Nimble Storage, who had been getting rave reviews from Cambridge’s customers.

I was familiar with Nimble because they’ve presented at Tech Field Day a few times. Architecturally speaking, their arrays are hybrid SSD/spinning disk, but they have a few interesting techniques that sound pretty decent. We talked to the local Nimble team, and they agreed to let us borrow their CS260 demo unit while they went to the company’s annual meeting in California, which was very cool and which we really appreciated.

So this past week, I needed to do some amount of stress testing of the array, which of course involved running cables and configuring the network and ESXi boxes. I’ve never played with iSCSI before (my former arrays were all fibre channel, and our NetApps are using NFS to present the datastores), so that was a fun new experience.

All in all, we were happy with the performance of the array. We were network bound basically the entire time, but even network bound, we were seeing far better performance than we were getting with the 14 spindles on the NetApp. I wish I’d had more time (or a smaller workload) to give it a more thorough beating, but even so, we decided to purchase an array. In a few weeks (as soon as University paperwork goes through), we’ll get our brand new CS220. Awesome!

I can’t wait, and I’ll definitely let you know what it’s like to get going with it. In the meantime, I’ve got to get the vSphere cluster ready to roll, so that means plenty of work. It’s going to be fun!

  • Paul

    Hi Matt,

    How did you determine that you were network bound only? Were you using some kind of IO meter software?

  • Paul: Actually, the Nimble’s administrative interface has a really nice real-time IO meter that you can watch for activity.

    Here’s a graph of when I did a 50-VM deployment to the Nimble machine from a linked clone:

    Although Nimble supports VAAI, they don’t have the clone primitive supported yet. It’s being planned for this summer, so I can get it in place for the Fall semester, which will be nice.

    If they did support the clone primitive, you would see the network traffic almost go away, while the IOPS would spike much higher, because the array would be taking care of the heavy lifting, since the source template was on the same device.

    As it was, this machine had a single 1Gb/s Ethernet link to the storage (which is woefully inadequate; if I’d had time, I would have used a more appropriate system with at least four 1Gb/s links to test). You can see the network traffic topping out around 85 megabytes per second, which Google tells us is 680Mb/s.

    The link should be able to move data faster than 680Mb/s, but there are a lot of variables, particularly in the wiring of the server room, that I think play a bigger part in the traffic than the Nimble not pushing data fast enough.
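
For anyone who wants to check the arithmetic, here is a rough sketch of the conversion and of how much of the link that throughput represents. The function names are made up for illustration, and it assumes decimal units (8 megabits per megabyte, 1000Mb/s per 1Gb/s) and ignores protocol overhead entirely:

```python
# Rough sketch: convert observed throughput to megabits and compare it
# with the nominal link speed. Assumes decimal units and no protocol
# overhead; the function names are hypothetical.

BITS_PER_BYTE = 8


def megabytes_to_megabits(megabytes_per_second):
    """Convert a rate in megabytes/s to megabits/s."""
    return megabytes_per_second * BITS_PER_BYTE


def link_utilization(observed_megabits, link_megabits):
    """Fraction of the nominal link speed actually in use."""
    return observed_megabits / link_megabits


observed = megabytes_to_megabits(85)            # peak seen on the graph
utilization = link_utilization(observed, 1000)  # single 1Gb/s link

print(f"{observed} Mb/s is {utilization:.0%} of a 1Gb/s link")
```

So the peak on the graph works out to roughly two thirds of what the single link can nominally carry.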

    I’m working on the interface grouping now for the production installation, but essentially, without 10Gb/s links, we’re not going to be IO bound on most things.