Adding switchports doesn’t suck anymore!

Working in a college is such a different environment from what I was used to. Some examples:

We don’t have “on call”. Seriously. The most critical core router on my section of the network can go down at 3am, and I won’t find out about it until the morning when I wake up.

When I suggested that, you know, maybe we might want to know about it at 3am, the question I got back was, “Who is it going to affect?” Well… umm… uh, our college’s external website will be down, but other than that… I can’t think of much.

The fact that business isn’t being conducted 24/7 and that I’m no longer dealing with untold amounts of financial data from companies around the world means that certain things are more relaxed. And I like it.

Another example: one morning last week, I took the early train in and added a new blade to one of our switches. I originally asked if I should come in at midnight to do it, but the answer was the same: “If you do it when you get in before everyone else, then no one will notice if the entire switch reboots.” Well, um, yeah, I guess you’re right. This is going to take some getting used to.

And while I’m on the subject of switches, can I just mention that it’s really nice going to a chassis switch? Until now, all of my switches have looked a lot like these:

[picture of a stack of well-worn rack-mount switches]

They were decent switches (well, ok, those switches in the picture weren’t decent, but there are decent rack-mount switches), and relatively inexpensive. But when you ran out of ports, you had a problem… namely, that you needed another switch. And when you went to buy one, you had a choice: get another of the same size and run it alongside the one you already had, or get a bigger one and replace the original entirely.

The biggest issue with adding a switch was that co-location rack space was tight, and large chunks of computers had to be moved to make way for a new switch. Blargh. I’ve hated it each and every time I’ve had to add a new switch to a production rack.

But there is another way, particularly if you’ve got a lot of switch ports lit. It’s called a chassis switch. I knew about them before, but it just wasn’t cost effective for me to buy one. Here, though…I’ve got over a thousand lit switch ports, and I currently administer 5 chassis switches. And they’re beautiful. Cisco has a lot of promotional material devoted to why they’re so great, but the bottom line is that you add more switch ports and you don’t cry about it. Plus, you get to manage a single device rather than a ton of individual switches.
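Just to make the “single device” point concrete, here’s a minimal sketch (in Python, using the netmiko library) of what bringing up a freshly installed blade might look like: one session to one switch, and every port on the new card gets its baseline config. The slot number, VLAN, hostname, and credentials are all made-up values for illustration, not my actual setup.

```python
# Hypothetical sketch: push a baseline config to every port on a newly
# inserted blade, in one session to the one chassis switch.
from netmiko import ConnectHandler

switch = {
    "device_type": "cisco_ios",
    "host": "core-switch.example.edu",   # hypothetical hostname
    "username": "admin",
    "password": "not-a-real-password",   # placeholder credentials
}

# On a chassis, ports are addressed as slot/port, so the new blade's
# 48 ports all hang off the same slot number (slot 3 is made up here).
baseline = [
    "interface range GigabitEthernet3/1 - 48",
    "switchport mode access",
    "switchport access vlan 100",        # hypothetical access VLAN
    "spanning-tree portfast",
    "no shutdown",
]

conn = ConnectHandler(**switch)
print(conn.send_config_set(baseline))
# Sanity check: every port on the new blade should now show up under slot 3.
print(conn.send_command("show interfaces status"))
conn.disconnect()
```

Compare that with doing the same thing across a pile of individually managed rack-mount switches, each with its own uplinks, its own config, and its own spot in the rack.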

Interestingly, the idea of chassis switches does scale down. You can get a 3-slot 6503-E for under a couple grand. That usually includes the “supervisor module”, which functions as the brain of the switch. It does not include any switch ports; those come on separate line cards, which are extra.

When you look at the economics, the price of a 48-port gigabit switch module isn’t really much different from that of a 48-port rack-mount switch. In essence, you’re paying for the chassis, and the chassis pays for itself in convenience. As the chassis gets bigger, the benefits increase (and in my opinion, they increase faster than the costs).
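If you want the back-of-the-envelope version of that argument, here’s a quick sketch. The prices are made-up placeholders (swap in real quotes from your reseller); the point is just that when a line card costs about the same as a fixed switch, the only real premium is the chassis and supervisor, and that premium gets amortized across every blade you add.

```python
# Hypothetical numbers only -- replace with real quotes.
chassis_and_supervisor = 2000   # one-time cost for chassis + supervisor (placeholder)
card_or_switch = 4000           # 48-port module ~= 48-port fixed switch (placeholder)
ports_per_blade = 48

for blades in (1, 2, 4, 8):
    ports = blades * ports_per_blade
    premium_per_port = chassis_and_supervisor / ports
    total_per_port = (chassis_and_supervisor + blades * card_or_switch) / ports
    print(f"{blades} blade(s), {ports:>3} ports: "
          f"chassis premium ${premium_per_port:,.2f}/port, "
          f"total ${total_per_port:,.2f}/port")
```

The chassis premium per port drops from around $40 at one blade to single digits once the box is full, which is roughly what I mean by the benefits growing faster than the costs.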

Oh, and just so you know, you can work with any good reseller to get refurbished models for a LOT cheaper than list prices. I’ve worked with both Network Hardware Resellers and World Data Products, but there are tons of other refurb dealers out there.

Anyway, I’m very happy that I get to administer chassis switches. I’ll be doing some re-working of the “core” infrastructure in the coming year, and I’ll make sure to post a nice entry about what my plans are.

As a note, Cisco certainly doesn’t have a monopoly on this stuff. Juniper, Brocade, and heck, even Netgear(!) all have chassis switch lines.

New LOPSA Chapters starting in DC and Austin, TX

LOPSA’s new local chapter drive is rolling right along. In the past month, LOPSA-LA became a full-fledged chapter, and LOPSA-OC (Orange County) is working on starting one. It’s time to start thinking about which chapters come next.

And LOPSA Board Member Evan Pettrey has been doing that! He’s got a new chapter that he’s working on in Washington, DC! He has teamed up with Greg Riedesel (of SysAdmin1138 and ServerFault renown) and they appear to be doing very well.

In the short term, they’re looking for people interested in attending, and they really need some speakers. If either of those sounds like you, just comment on this blog entry and I’ll get you in touch.

The second “new” chapter I wanted to let you know about is the Austin, TX chapter. It’s not exactly new, but it has been down for a while. All of this new activity is getting people interested in revitalizing Austin, though, and I think that’s perfect. Austin has an amazing culture, and there are more than enough IT admins in the area to support it.

I spoke with Travis Campbell, one of the Austin organizers, and he told me that the single most helpful thing someone in Austin could do would be to join the mailing list and let everyone know that we want to do meetings there again. I’m on the mailing list, too, so I can check out what’s going on. I’ve only spent a couple of weeks in Austin, but I loved it. I’ve got a lot of friends there, too, so I really want this one to succeed. I know they need people interested in attending meetings as well as speaking, so make sure to join the list and let Travis know!

Incidentally, speaking at one of these meetings is way easier than you think. It’s certainly a lot easier than speaking at a conference – you’re surrounded by friends, and it’s really just you talking about something you’re interested in. When I did my Infrastructure Migration talk, I literally just sat down for an hour or so and thought about what all of my moves had in common, and I made a list. That brain dump, after just a little massaging, turned into the slideshow. It really is easy, and you can do it. Consider it, at least.

And if every Fortune 50 company jumped off a bridge?

Oh, look! Microsoft is eliminating backup generators! That’s awesome. I’m going to save so much on generator maintenance and fuel! Lemme call to cancel my contract.
–Some Idiot, somewhere on the Internet

Yes, it’s true. In some of its datacenters, Microsoft is eliminating backup generators and switching to alternative methods of providing backup power; in others, it’s eliminating backup power entirely.

And that’s fine. For Microsoft. It’s not for you (unless you work at Microsoft – or one of maybe a couple dozen other companies that could do something this bizarre and get away with it). A while back, they also ran a test program with servers outside in tents. You shouldn’t do that either.

A lot of people read news stories like this and take them the wrong way. They want to get in on the action and try cool, crazy things with their machines. “Microsoft ran servers outside in the summer just fine. I bet we can run our server room at 85 degrees and it’ll work out.” And if they’re crazy enough to do it, sure, it’ll work out. For a while… but because they don’t understand the underlying mechanisms of what’s happening and why, their server room won’t be able to withstand a cooling loss for nearly as long as it would have. Or one of a dozen other gotchas will trip them up.

From hacker lore:
A novice was trying to fix a broken Lisp machine by turning the power off and on.

Knight, seeing what the student was doing, spoke sternly:

“You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”

Knight turned the machine off and on.

The machine worked.

When you do something just because someone else is doing it, but without understanding why, you start to practice Cargo Cult System Administration.

Just because Microsoft does something doesn’t make it a good idea for everyone. The same goes for Google, which builds its own custom servers for its datacenters. That doesn’t make it a good idea for you. In fact, it’s the opposite of a good idea. I don’t know what to call that… maybe a bad idea? Yeah, let’s go with that.

Completely aside from not knowing why they do what they do, there’s the issue of scale.

See, Microsoft and Google have a certain advantage on their side, and that’s economy of scale. Because a company that huge has so many resources and does the same thing so many times, the cost per item drops to almost nothing, and the advantages of a bizarre move like building their own servers begin to outweigh the cost.

In fact, I’m going to go further. When you look at gargantuan companies such as Google, Microsoft, Apple, Amazon, etc, my view is that you don’t actually get many useful pointers from examining the technology that they use. You’re not in the business of trying to be Google, and you’re certainly not able to take advantage of their scale, so why would you spend time and money trying to emulate them technologically?

If you want to learn something from the Googles of the world, look at the broad strokes. They treat entire datacenters like we treat servers. The loss of a single entity is inconvenient, but it isn’t a tragedy. That’s why Microsoft can afford to not have some of their datacenters on generator backup. The infrastructures are designed to be up for as long as possible, and when they fail, the service continues gracefully by routing around those problems. That’s exactly how you should develop your services. Just not necessarily on the datacenter level.
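If you want a toy version of that idea one level down from the datacenter, here’s a sketch of a client that treats individual replicas the way the big players treat datacenters: try one, and if it’s gone, route around it. The endpoint URLs and health-check layout are made up for illustration; this is a sketch of the principle, not anyone’s actual architecture.

```python
# Minimal "route around the failure" sketch: losing one replica is an
# inconvenience, not a tragedy.
import urllib.request
import urllib.error

REPLICAS = [
    "https://app-east.example.edu/health",   # hypothetical endpoints
    "https://app-west.example.edu/health",
    "https://app-dr.example.edu/health",
]

def fetch_from_any(replicas, timeout=2):
    """Return the response body from the first replica that answers."""
    last_error = None
    for url in replicas:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError) as err:
            # This replica is down or unreachable: note it and move on.
            last_error = err
    raise RuntimeError(f"all replicas unavailable: {last_error}")

if __name__ == "__main__":
    print(fetch_from_any(REPLICAS))
```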

You can actually learn a lot more from well-run medium-sized companies. Read the blogs from companies like Etsy, Joyent, or even Zoosk. These are medium-sized companies dealing with technology similar to yours, and they’re doing it very well. And what’s more, they tell you how they do it.

Learn from companies who are doing what you’re doing, but better than you are. Not from companies who do insane, overwrought things with technology that you couldn’t afford on a scale you can’t imagine.