One of the larger differences between a position in Academia and a commercial business is that after-hours or weekend work is far less frequent. Every once in a while, though, there’s something that you need to do that can’t be done during the day. This weekend was one of those times.
Our central fileserver duties were, up until this weekend, handled by a NetApp FAS3140 filer. It provided all of the NFS shares and iSCSI LUNs to all the machines around the college. It had a few issues that made us want to replace it, namely that it was getting long in the tooth and had only a single head/controller (I’m still not all that familiar with the NetApp nomenclature).
We replaced it with another filer yesterday, a 3210 with two controllers.
Part of the support agreement with our NetApp reseller was the actual installation, testing, and turn-up of the new filer, so our checklist basically had two major parts: prepare for the outage and recover from it. As you know or can imagine, when the central file server goes down, there’s a non-trivial amount of work done to prepare the infrastructure.
You could approach this problem one of two ways. You could actually do it both ways in order to self-check for correctness.
The first way, and the way that I started, was to say, “alright, what are the least-important machines that rely on the filer?”. Those need to turn off first. Then the next most important, then the next, and so on. Importance is kind of an arbitrary judgement though; what I was really asking was, “what relies on this, but has nothing that relies on it?”. These were things like desktop machines. Because desktop machines have nothing that relies on them (except users, and the users had been warned several times beforehand), they were to be the first to get shut off.
The other way, which is probably more correct, is to start at the center and say, “what relies on the NetApp directly?”. Create a list, then iterate through that list, asking the same question, “what relies on this?”, and repeat until you’re out of dependent systems. I didn’t take this route because it generally takes more time and I started late. Next time, I imagine we’ll make the checklist further in advance, something I’ll bring up at the post-mortem.
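To make the two approaches concrete, here is a minimal Python sketch of how they fit together. It is not what we actually ran; the machine names and the dependency map are made up for illustration. It walks outward from the filer asking “what relies on this?”, then produces a shutdown order in which a system is only turned off once everything that relies on it is already down.

```python
from collections import deque

# Hypothetical dependency map: each key is a system, each value lists the
# systems that rely on it directly. All names are invented for illustration.
DEPENDENTS = {
    "filer": ["nfs-homedirs", "vm-host"],
    "nfs-homedirs": ["desktop-01", "login-server"],
    "vm-host": ["web-vm"],
    "login-server": ["desktop-01"],
    "desktop-01": [],
    "web-vm": [],
}


def affected_systems(root, dependents):
    """Start at the center and collect everything that relies on root."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def shutdown_order(root, dependents):
    """Order systems so nothing is shut down before the things relying on it."""
    affected = sorted(affected_systems(root, dependents))
    # How many dependents of each system are still running.
    remaining = {n: len(dependents.get(n, [])) for n in affected}
    # Reverse map: which systems does each node rely on?
    relies_on = {n: [] for n in affected}
    for n in affected:
        for d in dependents.get(n, []):
            relies_on[d].append(n)
    # Systems with no dependents (like desktops) can go first.
    ready = deque(n for n in affected if remaining[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for parent in relies_on[node]:
            remaining[parent] -= 1
            if remaining[parent] == 0:
                ready.append(parent)
    if len(order) != len(affected):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    for step, name in enumerate(shutdown_order("filer", DEPENDENTS), start=1):
        print(step, name)
```

Reversing the resulting list gives a reasonable starting point for the power-up order, since the filer then comes back first and the desktops last.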
Overall, things went relatively smoothly. Of course, shutting things down almost always goes smoothly. It’s the whole “bringing it back up” part that creates wrinkles, but it honestly didn’t go badly. There were a couple of undocumented places on really old Solaris boxes that referenced the previous filer by name rather than by CNAME. Each of the major shares now has a CNAME that the clients point at (something like homedirs.domain.tld), but since this wasn’t fully documented, we had to fix those references manually.
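For next time, a check along these lines could catch hard-coded filer names before the outage. This is only a sketch: the hostname `oldfiler` and the sample lines are hypothetical, and it assumes a mount table where the first field of an NFS entry looks like `host:/path`.

```python
def find_hardcoded_mounts(lines, old_host):
    """Return (line number, text) for mount entries that name old_host directly."""
    old_host = old_host.lower()
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        source = text.split()[0]
        host, sep, _path = source.partition(":")
        host = host.lower()
        # Match the bare name or the fully qualified name of the old filer.
        if sep and (host == old_host or host.startswith(old_host + ".")):
            hits.append((number, text))
    return hits


if __name__ == "__main__":
    # Hypothetical mount table contents, for illustration only.
    sample = [
        "# device  fsck  mount  type  pass  boot  options",
        "oldfiler:/vol/home - /home nfs - yes rw",
        "homedirs.domain.tld:/vol/home - /home2 nfs - yes rw",
    ]
    for number, text in find_hardcoded_mounts(sample, "oldfiler"):
        print(f"line {number}: {text}")
```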
Overall, I’m pretty happy, and now we’ve got a shiny new filer, and still have a disk shelf on the old one, so I can get a little more familiar with NetApp without breaking production ;-)
If you have any questions or suggestions of things that we could fix, please let me know by commenting below. Be aware that this purchase was planned before I got here (in fact, they showed me the boxed-up filer during my interview months ago), so I won’t be able to answer any “why did you pick this” type questions.
I’m in Silicon Valley for Tech Field Day 8, and I just got out of a very cool session.
The company is Nasuni, and the two-hour presentation was delivered by their founder/CEO Andres Rodriguez.
Don’t stop reading this blog post after this sentence, but Nasuni provides cloud storage. I know, that’s like buzzword-ese for “stop reading because this is boring”, but really, hang in there. They do it in an interesting way.
Most of the solutions I’ve seen that are “cloud storage” treat the cloud like a storage tier. You’re familiar with storage tiering, right? Here’s a diagram:
The idea is that data people use most, or most recently (aka hot data, either blocks or files, depending on the solution) is on the fastest storage, and as the data “cools”, it moves to slower storage.
Some cloud storage providers use “the cloud” as another tier, something like this:
It kind of makes sense, since you can store files there, and the latency is high, plus there’s a LOT of capacity.
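If it helps to see the tiering idea as code, here is a toy Python sketch. The tier names and age cutoffs are invented for illustration and don’t come from any particular product; the point is only that data is placed by how recently it was used, with “the cloud” bolted on as the coldest tier.

```python
from datetime import datetime, timedelta

# Hypothetical tiers, fastest first. The cutoffs are made-up numbers.
TIERS = [
    ("ssd", timedelta(days=7)),
    ("fast-disk", timedelta(days=90)),
    ("slow-disk", timedelta(days=365)),
    ("cloud", None),  # None means no upper bound: the coldest tier
]


def pick_tier(last_access, now, tiers=TIERS):
    """Choose a tier from how long ago the data was last touched."""
    age = now - last_access
    for name, max_age in tiers:
        if max_age is None or age <= max_age:
            return name
    return tiers[-1][0]  # fall back to the coldest tier


if __name__ == "__main__":
    now = datetime(2011, 10, 1)
    for days in (1, 30, 200, 800):
        print(days, "days old ->", pick_tier(now - timedelta(days=days), now))
```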
In the case of Nasuni, though, it’s a different idea altogether. What they do is provide a device (or a VM) that in essence acts as a file server. It has a certain amount of diskspace locally, but the real power behind it is illustrated by the following diagram:
To highlight the “important” part, you have clients that are speaking normal NAS protocols, like NFS or SMB/CIFS to the Nasuni box, and on the other end, it’s talking natively to a cloud provider, storing your files there.
As you can see in the diagram, there is a local disk cache, which acts like any other cache. When a file is changed, it’s written to the local disk cache, which then propagates to the cloud storage.
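As a rough mental model only, the cache behaviour described above looks something like the following Python sketch. This is emphatically not Nasuni’s implementation: the class and method names are my own, a plain dict stands in for the cloud provider, and there is no eviction, encryption, or error handling.

```python
class WriteBackCache:
    """Toy file cache: writes land locally first, then propagate to the cloud."""

    def __init__(self, cloud):
        self.cloud = cloud   # dict standing in for a cloud object store
        self.local = {}      # stands in for the local disk cache
        self.dirty = set()   # paths changed locally but not yet pushed

    def write(self, path, data):
        """A client changes a file: store it locally and mark it dirty."""
        self.local[path] = data
        self.dirty.add(path)

    def read(self, path):
        """Serve from the local cache, fetching from the cloud on a miss."""
        if path not in self.local:
            self.local[path] = self.cloud[path]
        return self.local[path]

    def flush(self):
        """Propagate every dirty file to the cloud; return what was pushed."""
        pushed = sorted(self.dirty)
        for path in pushed:
            self.cloud[path] = self.local[path]
        self.dirty.clear()
        return pushed


if __name__ == "__main__":
    cloud = {"/share/old.txt": b"already in the cloud"}
    box = WriteBackCache(cloud)
    box.write("/share/new.txt", b"hello")
    print(box.read("/share/old.txt"))  # cache miss, fetched from the cloud
    print(box.flush())                 # pushes the one dirty file
```

A second box pointed at the same cloud dict would see `/share/new.txt` after the flush, which is the gist of the multi-site idea.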
There are a lot of cool things going on with this, but essentially, it allows you to have multiple “Nasuni boxes”, which gives you something like a unified file share in the cloud without needing to deal with replication of file servers or data yourself. It kind of cuts out a lot of the PITA part of the process.
That being said, it’s not for everyone. It’s file-level only right now, and it’s not what I would call cheap when you consider the amount of space you get in their cloud (right now, it’s several thousand dollars per terabyte per year). I’m also not sure that the claims on their webpage (specifically the 100% uptime) hold up, especially when they’re relying on a third party’s cloud offering. Right now that’s Amazon’s S3 (and we all know how sterling Amazon’s reputation is right now), although the product is provider-agnostic, given the right feature set on the server side.
The best part is that there is a free trial via virtual machine (and they have an OVF format, which means it runs great on VirtualBox). I actually installed it on my laptop during the Tech Field Day presentation and had other delegates talking to it, sharing files, and playing on the administration interface before the end of the session.
I liked it a lot. If nothing else, check out the free trial. It’s pretty awesome looking.
If you’ve got some time, watch the video I’ve embedded below. It’s the CEO giving the presentation I just saw. You can tell that he really digs the solution he’s made and believes in the technology.