February 16, 2011
Coconuts might be non-migratory, but my servers sure aren’t!
One of the larger projects on my docket has been migrating our production stack at our primary site from one rack into two. Making this difficult was the fact that the datacenter floor we were on was full, so we couldn’t just wheel the rack over and move stuff around. Oh no, we’d have to unrack everything, move it a few hundred yards, then re-rack it.
As much fun as that sounds, doing it with only two people would be unbearable (and take the entire weekend, to boot!), so I hopped on the trusty local LOPSA mailing lists (LOPSA-NJ and LOPSA-NYC) and asked for some help:
This coming Saturday, I’m going to be migrating my production stack
from one datacenter in a facility in Carlstadt, NJ into another one in
the same building (maybe 300 yards apart total). My junior admin can
handle taking the stack apart, for the most part, and I can handle
re-racking and reconnecting, but each of us could use a hand, and we’d
love to have someone to actually move the parts from one site to the
other. This is where you come in…
I’m thinking 6 people would be the perfect number for this, and Ryan
and I are already going to be there, so that leaves 4 more people. My
company is offering $100 for the day, and I’ll even pitch in and buy
dinner afterwards for everyone.
I’m estimating that we’ll start around 12:30-1pm, and if things go
well, our part should be done by 5 or 6. You would meet me in
Please let me know before Thursday if you can do it, and it’s first
come, first served.
Frankly, I was surprised by the number of people responding! I had four people sign up in short order, and several others as backups. That many people in the community willing to give up their Saturday to help a fellow sysadmin (sure, and make a quick buck, but not a ton) was really great to see.
So on Saturday at around 12:30pm, Ryan (my junior admin), hired server wranglers Michael, Jesse, John, and Jared, and I got acquainted with each other and the task at hand.
Walking into someone else’s infrastructure is never easy, so I wanted to get them as comfortable with the task as possible. To that end, Ryan and I created several sheets which outlined exactly what was going to happen, and when. This wasn’t just for their benefit, though; it was for Ryan and me as well.
We had one sheet with a checklist of everything that had to be done, and in what order. By going down this list, we could guarantee that each step was complete before the following steps were started (useful for things such as “Verify production run complete with head of operations”, and “verify network cutover is prepared”).
I also produced a very simple spreadsheet which functioned as a diagram of what the racks would look like when finished. Each cell was 1U, and each server took up as many cells as it had rack units, so we could verify correctness at a glance. As it turns out, I made the spreadsheet backwards, so the left rack was the right rack, and vice versa, but it looked perfect from the back, so I’ll just say that’s what I was going for…
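The spreadsheet itself isn’t reproduced here, but the idea behind it (one cell per rack unit, each server filling as many cells as it is tall) is easy to check mechanically. Below is a minimal sketch, assuming standard 42U racks; the `Placement` type, the device names, and the `build_elevation`/`render` helpers are all hypothetical, made up for illustration rather than taken from my actual sheet.

```python
from dataclasses import dataclass

RACK_UNITS = 42  # assumption: standard 42U racks


@dataclass(frozen=True)
class Placement:
    name: str       # hypothetical device name
    bottom_u: int   # lowest rack unit occupied (U1 = bottom of the rack)
    height_u: int   # how many 1U cells the device takes up


def build_elevation(placements, rack_units=RACK_UNITS):
    """Map each occupied U to a device name; raise ValueError on overlap or overflow."""
    slots = {}
    for p in placements:
        top_u = p.bottom_u + p.height_u - 1
        if p.height_u < 1 or p.bottom_u < 1 or top_u > rack_units:
            raise ValueError(f"{p.name} does not fit in U1-U{rack_units}")
        for u in range(p.bottom_u, top_u + 1):
            if u in slots:
                raise ValueError(f"{p.name} overlaps {slots[u]} at U{u}")
            slots[u] = p.name
    return slots


def render(slots, rack_units=RACK_UNITS):
    """Text version of the spreadsheet: one line per U, top of the rack first."""
    return "\n".join(
        f"U{u:02d} {slots.get(u, '')}".rstrip() for u in range(rack_units, 0, -1)
    )


if __name__ == "__main__":
    # Hypothetical layout for one rack.
    left_rack = [Placement("san", 1, 3), Placement("esx01", 4, 2)]
    print(render(build_elevation(left_rack)))
```

Raising on overlap is the programmatic equivalent of spotting two servers fighting over the same cell at a glance. It still won’t catch the left/right mix-up I made, since that’s a question of which rack you’re looking at, not of what’s inside it.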
Because a total infrastructure shutdown is a rarity, I also had a checklist of the order in which servers could be shut down and brought back up (and in the case of VMs, which physical hosts they resided on). This was useful in the heat of the moment so that we didn’t mistakenly shut down the server running, say, the domain controller, as that could cause difficulties later on.
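My checklist was written by hand, but the same ordering can be derived from a list of what depends on what: a VM depends on the physical host it lives on, and most services depend on the domain controller. Bring-up order is then a topological sort of those dependencies, and shutdown order is just that list reversed. This is a minimal sketch, assuming Python 3.9+ for the standard library’s `graphlib`; the machine names and the dependency map are hypothetical, not my real inventory.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each machine maps to the machines that must be
# running before it can come up (and that must stay up until it goes down).
DEPENDS_ON = {
    "san": set(),
    "esx01": {"san"},                   # VM host keeps its VMs on the SAN
    "esx02": {"san"},
    "dc01": {"esx01"},                  # domain controller is a VM on esx01
    "sql01": {"esx02", "dc01"},         # VM on esx02 that needs the DC
    "app01": {"esx02", "dc01", "sql01"},
}


def startup_order(depends_on):
    """Dependencies first; raises graphlib.CycleError if the map has a loop."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Reverse of startup: nothing goes down while something still needs it."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    print("Shut down:", " -> ".join(shutdown_order(DEPENDS_ON)))
    print("Bring up: ", " -> ".join(startup_order(DEPENDS_ON)))
```

With this map, the domain controller can only be shut down after everything that needs it is already off, which is exactly the mistake the paper checklist was there to prevent.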
Here are some before pictures:
It looks all nice and neat, doesn’t it?
…but the truth comes out!
Here are some after pics:
The air-flow panels will be installed shortly
Not perfect, but a damned sight better than it was before!
Overall, the move went very smoothly. There was an hour-long wrinkle at the beginning, when our new cross-connects turned out to be plugged into the wrong ports, and at the end, my storage didn’t want to enable the SPS (which caused a near-panic on my part, because this was not the first time), but eventually everything came up, and right now my servers are chugging along happily.
I owe John, Jesse, Jacob, and Michael a huge debt of gratitude (and beer). We were going to go out to the pub afterwards, but with the near-tragedy of the SAN not coming back up until 11:30pm, we thought it best to skip it. Hopefully we’ll make it there in the near future!
My weekend was pretty stressful, but it was definitely productive. A lot got done, and I feel very confident that the infrastructure can expand comfortably without growing pains.