And if every Fortune 50 company jumped off a bridge?

Oh, look! Microsoft is eliminating backup generators! That’s awesome. I’m going to save so much on generator maintenance and fuel! Lemme call to cancel my contract.
–Some Idiot, somewhere on the Internet

Yes, it’s true. Microsoft is eliminating backup generators and switching to alternative methods of providing backup power (in some cases; in others, they’re eliminating backup power entirely).

And that’s fine. For Microsoft. It’s not for you (unless you work at Microsoft – or one of maybe a couple dozen other companies that could do something this bizarre and get away with it). A while back, they also ran a test program with servers outside in tents. You shouldn’t do that either.

A lot of people read news stories like this and take them the wrong way. They want to get in on the action and try cool, crazy things with their machines. “Microsoft ran servers outside in the summer just fine. I bet we can run our server room at 85°F and it’ll work out.” And if they’re crazy enough to do it, sure, it’ll work out. For a while…but because they don’t understand the underlying mechanisms of what’s happening and why, they miss that running hotter leaves less thermal headroom, so their server room won’t be able to withstand a cooling loss for nearly as long as it would have at a lower temperature. Or one of a dozen other gotchas will catch them out.

From hacker lore:
A novice was trying to fix a broken Lisp machine by turning the power off and on.

Knight, seeing what the student was doing, spoke sternly:

“You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”

Knight turned the machine off and on.

The machine worked.

When you do something just because someone else is doing it, but without understanding why, you start to practice Cargo Cult System Administration.

Just because Microsoft does something doesn’t make it a good idea for everyone. The same goes for Google, which runs custom servers built in-house in its datacenters. That doesn’t make it a good idea for you. In fact, it’s the opposite of a good idea. I don’t know what to call that…maybe a bad idea? Yeah, let’s go with that.

Completely aside from not knowing why they do what they do, there’s the issue of scale.

See, Microsoft and Google have a certain advantage on their side: economies of scale. Because these companies are so huge and have so many resources, they do the same thing so many times that the cost per item drops to almost nothing, and the advantages of something as bizarre as building their own servers begin to outweigh the cost.

In fact, I’m going to go further. When you look at gargantuan companies such as Google, Microsoft, Apple, Amazon, etc., my view is that you don’t actually get many useful pointers from examining the technology that they use. You’re not in the business of trying to be Google, and you’re certainly not able to take advantage of their scale, so why would you spend time and money trying to emulate them technologically?

If you want to learn something from the Googles of the world, look at the broad strokes. They treat entire datacenters the way we treat servers. The loss of a single one is inconvenient, but it isn’t a tragedy. That’s why Microsoft can afford not to have generator backup at some of its datacenters. The infrastructure is designed to stay up for as long as possible, and when part of it fails, the service continues gracefully by routing around the problem. That’s exactly how you should develop your services. Just not necessarily at the datacenter level.
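To make “routing around the problem” concrete at the server level, here is a minimal Python sketch of client-side failover across redundant replicas. Everything in it is hypothetical and for illustration only: the function `call_with_failover`, the stand-in transport `fake_send`, and the replica names are made up, and it assumes a failed replica shows up as a `ConnectionError`. A real service would also want timeouts, health checks, and retry limits.

```python
import random
from typing import Callable, Dict, Sequence


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request."""


def call_with_failover(replicas: Sequence[str],
                       send: Callable[[str], str],
                       shuffle: bool = True) -> str:
    """Try each replica in turn; one failure is routed around, not fatal."""
    order = list(replicas)
    if shuffle:
        random.shuffle(order)  # spread load so the same replica isn't always first
    errors: Dict[str, Exception] = {}
    for replica in order:
        try:
            return send(replica)
        except ConnectionError as exc:
            errors[replica] = exc  # record the failure and move on to the next replica
    raise AllReplicasFailed(errors)


# Hypothetical stand-in for a network call: "dc-east" is simulated as down.
DOWN = {"dc-east"}


def fake_send(replica: str) -> str:
    if replica in DOWN:
        raise ConnectionError(f"{replica} unreachable")
    return f"served by {replica}"


if __name__ == "__main__":
    # With shuffle disabled, dc-east fails first and dc-west picks up the request.
    print(call_with_failover(["dc-east", "dc-west", "dc-central"], fake_send, shuffle=False))
    # prints: served by dc-west
```

The losing replica is an inconvenience, not an outage: the caller only sees an error when every replica is gone.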

You can actually learn a lot more from well-run medium-sized companies. Read the blogs from companies like Etsy, Joyent, or even Zoosk. These are medium-sized companies dealing with technology similar to yours, and they’re doing it very well. What’s more, they tell you how they do it.

Learn from companies that are doing what you’re doing, but better than you are. Not from companies that do insane, overwrought things with technology you couldn’t afford, on a scale you can’t imagine.

  • It’s interesting to read about what the big providers are doing because they often pave the way for new technologies and methodologies over the coming years. E.g. Hadoop grew out of Google’s MapReduce and GFS papers (and HBase out of the BigTable paper). There is academic value there, but it takes time for the best practices to filter down far enough to make them worth adopting in your own infrastructure.

    These companies have the resources (budget and people) to try new approaches and figure out what works, and it makes sense for them to do that. At their scale they can make significant efficiency improvements from small tweaks which simply don’t make sense at smaller scale.

    Perhaps this could be called premature optimisation…good for the big guys but not appropriate for smaller players (yet).

  • Excellent post, Matt. I see far too many people in our industry who do things without understanding what they’re doing, and this puts businesses in grave danger.

    I often work with my team to try to help them understand not just what they’re doing, but also why they’re doing it. However, I didn’t realize this had a name before now (Cargo Cult System Administration). Thanks for that!

  • Thanks, Evan!

    And yeah, the term comes from “Cargo Cult Programming”, which is in turn based on Cargo Cult Science, coined by none other than Richard Feynman.