SoCal LOPSA Chapter. Who’s doing it?

I have heard from a ton of people in Southern California (or SoCal for you peeps not hip to the jive) who want to attend a LOPSA meeting in Southern California. Enough people that they’d actually start out with a full chapter.

This is silly. I don’t live in SoCal, so I can’t start that chapter, but I can help someone else do it. So who’s in? Who wants to organize a LOPSA chapter there? Who would attend one? (And remember, you typically don’t need to be a LOPSA member to attend chapter meetings.)

A while back, I wrote a LOPSA Locals Chapter Guide designed to make it easier for organizers to spin up LOPSA chapters. Take a look through if you’re interested.

The biggest single thing that everyone is afraid of is, “I can’t lead a group of sysadmins”. As I say in the guide, don’t think of yourself as a leader, think of yourself as an organizer. You’re already an amazing organizer of things, otherwise you’d never have made it as a sysadmin. Organizing a LOPSA chapter is just one more type of logistics. You’re good at logistics; you can do this.

I will personally help anyone willing to start working on a LOPSA chapter in Southern California. LISA12 is in San Diego this year. It would be AWESOME to have a local chapter represent.

If you are interested in Organizing OR Attending a meeting somewhere in SoCal, drop a comment here or shoot me an email and let me know you’re interested.

Weekend Fun: New NetApp Installed

One of the larger differences between a position in Academia and a commercial business is that after-hours or weekend work is far less frequent. Every once in a while, though, there’s something that you need to do that can’t be done during the day. This weekend was one of those times.

Our central fileserver duties were, up until this weekend, run by a NetApp FAS3140 filer. It provided all of the NFS shares and iSCSI LUNs to all the machines around the college. It had a few different issues that made us want to replace it, namely being long in the tooth and having only a single head / controller (I’m still not all that familiar with the NetApp nomenclature).

We replaced it with another filer yesterday, a 3210 with two controllers.

I’m really glad that I presented the Infrastructure Migration talk recently, because I used a lot of the things I talked about in those slides.

Part of the support agreement with our NetApp reseller was the actual installation, testing, and turn-up of the new filer, so our checklist basically had two major parts: prepare for the outage and recover from it. As you know or can imagine, when the central file server goes down, there’s a non-trivial amount of work done to prepare the infrastructure.

You could approach this problem one of two ways. You could actually do it both ways in order to self-check for correctness.

The first way, and the way that I started, was to say, “alright, what are the least-important machines that rely on the filer?”. Those need to turn off first. Then the next most important, then the next, and so on. Importance is kind of an arbitrary judgement though; what I was really asking was, “what relies on this, but has nothing that relies on it?”. These were things like desktop machines. Because desktop machines have nothing that relies on them (except users, and the users had been warned several times beforehand), they were to be the first to get shut off.
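For the curious, that "what relies on this, but has nothing that relies on it?" question can be sketched in a few lines of Python. This is only an illustration, not the checklist we actually used: the hostnames and the `relies_on` map are made up, and the function name `leaf_first_order` is mine.

```python
def leaf_first_order(relies_on):
    """Group machines into shutdown waves, least-depended-upon first.

    relies_on maps a machine name to the set of things it relies on.
    """
    # Collect every name mentioned, either as a key or as a dependency.
    remaining = set(relies_on)
    for deps in relies_on.values():
        remaining |= set(deps)

    waves = []
    while remaining:
        # A machine is still needed if something not yet shut off relies on it.
        needed = set()
        for machine in remaining:
            needed |= set(relies_on.get(machine, ())) & remaining
        wave = sorted(remaining - needed)
        if not wave:
            raise ValueError("dependency cycle among: %s" % sorted(remaining))
        waves.append(wave)
        remaining -= set(wave)
    return waves


# Hypothetical inventory, for illustration only.
example = {
    "desktop01": {"filer", "mailserver"},
    "desktop02": {"filer"},
    "mailserver": {"filer"},
    "buildhost": {"filer"},
}

for number, wave in enumerate(leaf_first_order(example), start=1):
    print(number, wave)
```

Each wave contains only machines that nothing still running relies on, so the desktops land in the first wave and the filer itself comes last.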

The other way, which is probably more correct, is to start at the center and say, “what relies on the NetApp directly?”. Create a list, then iterate through that list, asking the same question, “what relies on this?”, and repeat until you’re out of dependent systems. I didn’t take this route because it generally takes more time and I started late. Next time, I imagine we’ll make the checklist further ahead of time, something I’ll bring up at the post-mortem.
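Starting from the center and repeatedly asking "what relies on this?" is a breadth-first walk outward from the filer. Here is a rough Python sketch of it, again with an invented inventory and a function name (`everything_that_relies_on`) of my own choosing:

```python
from collections import deque


def everything_that_relies_on(root, relies_on):
    """List every system that relies on root, directly or indirectly."""
    # Invert the map: for each system, which machines rely on it directly?
    dependents = {}
    for machine, deps in relies_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(machine)

    found = []
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        # Ask "what relies on this?" for each system found so far.
        for machine in sorted(dependents.get(current, ())):
            if machine not in seen:
                seen.add(machine)
                found.append(machine)
                queue.append(machine)
    return found


# Hypothetical inventory, for illustration only.
inventory = {
    "desktop01": {"filer", "mailserver"},
    "desktop02": {"filer"},
    "mailserver": {"filer"},
    "buildhost": {"filer"},
    "webmail": {"mailserver"},
}

print(everything_that_relies_on("filer", inventory))
```

Note that `webmail` shows up even though it never touches the filer directly, which is exactly the kind of system that is easy to miss. The result is the list of affected systems in discovery order, not a shutdown order; ordering still needs the "nothing relies on it" check.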

Overall, things went relatively smoothly. Of course, shutting things down almost always goes smoothly. It’s the whole “bringing it back up” that creates wrinkles, but it honestly didn’t go badly. There were a couple of undocumented places on really old Solaris boxes which referenced the previous filer by name, as opposed to by CNAME (each of the major shares now has a CNAME that the clients point at…something like homedirs.domain.tld), but since this wasn’t exclusively documented, we had to fix it manually.
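One way to catch those stragglers ahead of time would be to scan mount tables and other config files for the old filer's hostname. A minimal Python sketch, assuming the old filer was called `oldfiler` (a made-up name) and using an invented vfstab-style sample:

```python
import re


def find_hardcoded_refs(config_text, old_name):
    """Return (line_number, line) pairs that mention old_name as a hostname."""
    # Match old_name only as a whole host label, not inside a longer name.
    pattern = re.compile(r"(?<![\w.-])" + re.escape(old_name) + r"(?![\w-])")
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if pattern.search(line):
            hits.append((number, stripped))
    return hits


# Hypothetical vfstab-style contents, for illustration only.
sample = """\
# device-to-mount  device-to-fsck  mount-point  FS-type  fsck-pass  mount-at-boot  options
oldfiler:/vol/home - /home nfs - yes rw
homedirs.domain.tld:/vol/home - /export/home nfs - yes rw
oldfiler.domain.tld:/vol/apps - /apps nfs - yes ro
"""

for number, line in find_hardcoded_refs(sample, "oldfiler"):
    print(number, line)
```

Lines that already point at a CNAME are left alone; only the ones naming the old filer directly, short or fully qualified, get reported.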

Overall, I’m pretty happy, and now we’ve got a shiny new filer, and still have a disk shelf on the old one, so I can get a little more familiar with NetApp without breaking production ;-)

If you have any questions or suggestions of things that we could fix, please let me know by commenting below. Beware that this purchase was planned before I got here (in fact, they showed me the boxed-up filer during my interview months ago), so I won’t be able to answer any “why did you pick this” type questions.