Please schedule your unplanned emergencies at a more opportune time

It’s been one of those weeks.

We recently ordered a couple of pretty hefty (for us, anyway) machines from Penguin Computing. They’re 12-core machines with 32GB of RAM. They’re for heavy math, and we’ve planned to put one in the primary site and one in the backup site.

Well, one arrived DOA. I’ve RMA’d it, but since we needed to get these machines into production, I installed and configured the working machine and delivered it to the rack at the primary site. And there was much rejoicing.

Until the next day, when we started getting odd, flapping reports of packet loss on the production database machine. It resolved itself, and we didn’t think any more of it, until it happened again the next day. Of course, in retrospect, we should have seen the relationship immediately, but it took some investigation. It seems that certain processes on the new machine, using 10 of its 12 cores, cause issues with the database machine when run concurrently with another database-heavy task on another machine. Maybe.

See, I’d love to say conclusively, but signs are pointing to “yes”. Before I can assure anyone that it is absolutely the issue, I’ve got to recreate the occurrence at the backup stack. But we don’t have a machine there capable of that kind of performance. At least, we didn’t before yesterday.

It was decided at 3:30pm that the machine needed to be physically migrated from the primary site to the secondary site. Like, right now.

So by 6 last night, Ryan and I were at the backup site installing the machine in the rack and getting it configured. Hopefully by the end of the week, we’ll be able to conclusively prove that I need to find a way to make the database server magically better. Or not. I’m not sure which to hope for.

Thank goodness that the LISA conference is next week. It sure won’t be a vacation, but it’ll be nice to get away for a while.

Please allow us to automate your virtual cloud environment…

After that title, I should put “SEO Expert” on my business card…

Two weeks ago, Brandon Burton, over at “Story of a Sysadmin”, wrote a great article about “the cloud”, and I’ve been wanting to reply to it ever since. It’s called “Automation is the cloud”, and it starts with a very laudable, though difficult, goal.

I think it is important that as technologists and sysadmins, we do what we can to bring clarity to what “The Cloud” is and how it affects and benefits you.

Well, let me tell you, I’m on board for that. I hear the phrase “cloud computing” and the mental image I get is anything but clear.

Brandon concentrates on the automation aspect of cloud computing, and that certainly is a large part of it: the automation of virtualization, of networking, of deployment, and of pretty much everything else. A cloud simply won’t scale without it. If you’ve got to manually deploy and configure an arbitrary number of virtual machines, you’re going to go insane before you even build up a good mist, let alone a whole cloud.
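To make that concrete, here’s a minimal sketch of what I mean by automated deployment. It’s entirely hypothetical: provision_vm() and register_dns() aren’t real APIs, just stand-ins for whatever your hypervisor and DNS/config-management layers actually expose. The point is only that deploying 3 machines or 300 is the same command.

```python
#!/usr/bin/env python
# Hypothetical sketch: stamp out N virtual machines from a template and
# register them, with no human in the loop.

def provision_vm(name, template, ram_mb, vcpus):
    """Stand-in for a call into your hypervisor/cloud API (not a real library)."""
    print("cloning %s -> %s (%d MB RAM, %d vCPUs)" % (template, name, ram_mb, vcpus))
    return {"name": name, "ip": "10.0.0.%d" % (hash(name) % 200 + 10)}

def register_dns(name, ip):
    """Stand-in for a call into your DNS or config-management layer."""
    print("registering %s -> %s" % (name, ip))

def deploy_pool(prefix, count, template="web-template", ram_mb=2048, vcpus=2):
    """Deploy 'count' identical VMs; the loop is the whole point."""
    vms = []
    for i in range(1, count + 1):
        vm = provision_vm("%s%02d" % (prefix, i), template, ram_mb, vcpus)
        register_dns(vm["name"], vm["ip"])
        vms.append(vm)
    return vms

if __name__ == "__main__":
    # Deploying 3 or 300 machines is the same command; only the number changes.
    deploy_pool("web", 3)
```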

Although Brandon does a great job talking about the automation that goes into making a cloud, one thing outside the scope of his article that someone should talk about is abstraction. Not only does the cloud automate vast amounts of virtual infrastructure, it also abstracts it.

To me, the concept of “the cloud” is apathy. What server does your machine run on? What storage array(s) does the drive and your data live on? Who cares?

The cloud is not the sum of its parts. The cloud is the layer above that. It’s the platform upon which your machines (and everyone else’s) run. And as Brandon wrote, that platform is automated from top to bottom. It has to be, otherwise it doesn’t work.

Dedupe to tape: Are you crazy?

W. Curtis Preston, over at Backup Central, has posted an interesting entry, “Is Dedupe to tape crazy?”. Even he admits in the first sentence that, yes, dedupe to tape is crazy. But then he bumps the crazy-fest up a notch by asking whether it’s crazy-bad or crazy-good.

You should read the article, but let me jump ahead to the end. He says it’s good, at least in certain cases. I say it’s bad, in any case where you’d like to actually retrieve your data, rather than minimize your costs.

Here’s why…

To understand why this is a bad idea, you’ve got to know what deduplication is first. As Wikipedia succinctly puts it, deduplication is the elimination of redundant data. The data is stored in one place, and every reference to it is stored as a much shorter index that points back at the deduplicated data. Think of it like symlinks in your filesystem, if you’d like, except the symlinks are block-level.
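If it helps, here’s a toy sketch of the idea in Python. This is nothing like how a real dedupe appliance is implemented (real systems use variable-length chunking, collision handling, compression, and so on), but the pointer-plus-block-store structure is the part that matters.

```python
import hashlib

BLOCK_SIZE = 4096   # fixed-size blocks for simplicity; real systems are smarter

block_store = {}    # hash -> actual block data, stored exactly once

def dedupe(data):
    """Split raw bytes into blocks; store each unique block once, return pointers."""
    pointers = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in block_store:   # only never-before-seen blocks consume space
            block_store[digest] = block
        pointers.append(digest)         # the "file" becomes just a list of pointers
    return pointers

def rehydrate(pointers):
    """Reassemble the original data from the pointer list."""
    return b"".join(block_store[p] for p in pointers)

if __name__ == "__main__":
    original = b"hello world " * 10000          # highly redundant input
    ptrs = dedupe(original)
    assert rehydrate(ptrs) == original
    print("blocks referenced: %d, unique blocks stored: %d"
          % (len(ptrs), len(block_store)))
```

Store the same directory tree 50 times and the block store only grows once; the other 49 copies are just lists of hashes.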

When you’re storing things on disk, this leads to near-miraculous disk savings. Want to store 5 years worth of full backups, but only use the disk space of the equivalent incremental backups? No problem! Store 50 copies of the same directory tree in various differing hierarchies, but only use the disk space of one? Done!

Now, what happens when the time comes to back those 50 different hierarchies up to tape? Well, in the time-honored, tape-expensive version, you use the tape equivalent of 50 copies of the data.

What Mr. Preston is suggesting is that, for long-term storage, instead of storing 50 copies of the same data, you store the actual data once and back up the pointers to it in each of the various backup sets. The argument is that if a full backup of the data takes 10 tapes, then rather than 50 * 10 tapes, you can do roughly 50 * 1, where the 50 different backup sets all point back at the single copy of the deduplicated data. That is a massive cost savings by any measure.
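Just to put that arithmetic in one place (treat these numbers as the shape of the argument rather than anything precise, since the exact counts depend on how you tally the tapes holding the deduplicated data itself):

```python
backup_sets    = 50   # how many full backup sets you want to keep around
tapes_per_full = 10   # Mr. Preston's example: one full backup spans 10 tapes

# Traditional: every backup set carries its own complete copy of the data.
traditional_tapes = backup_sets * tapes_per_full        # 500 tapes

# Deduped: the data is written once, and each backup set is little more
# than a tape's worth of pointers back at it.
deduped_tapes = tapes_per_full + backup_sets * 1        # roughly 60 tapes

print("traditional: %d tapes, deduped: ~%d tapes"
      % (traditional_tapes, deduped_tapes))
```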

My problem with this is tape failure. If one of the 50 individual backup tapes fails, it’s no problem. Sure, you lose that particular arrangement of the data, but it’s not that big of an issue. Unfortunate, sure, but not tragic. If you lose the 1 tape that contains the deduplicated data, though, then you immediately have a Bad Day(tm).

Essentially, you are betting on one tape not failing over the course of (in Mr. Preston’s argument) 7+ years. And if something does happen in those 7 years, whether it’s degaussing, loss, theft, fire, water, or aliens, you don’t lose one backup set. You lose every backup that referenced that set of data.
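If you want to put even rough numbers on that bet, here’s the back-of-the-envelope version. The 2% annual loss probability is completely invented; plug in whatever you believe about your own tapes and your own offsite handling. Whatever that number is, the loss doesn’t cost you one backup set, it costs you all of them.

```python
# Back-of-the-envelope only: the annual probability of losing a given tape
# (failure, theft, fire, aliens) is an invented figure, not a vendor spec.
annual_loss_prob = 0.02
years            = 7
backup_sets      = 50

# Chance the single deduplicated-data tape set survives the retention period.
survives = (1 - annual_loss_prob) ** years      # about 0.87
lost     = 1 - survives                         # about 0.13

print("~%.0f%% chance of losing it, and with it all %d backup sets"
      % (lost * 100, backup_sets))
```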

So I would, if I could afford one, buy a deduplicated storage array in a heartbeat for my backup needs. But I would not trust a deduplicated archival system at all. The odds of loss are too great, and it’s not worth the savings. I’d rather cut the frequency of my backups than save money by making my archives co-dependent.

But I could be wrong. Feel free to comment and let me know if I am.

It should probably be noted that Preston wrote about this too. The difference is, of course, that he knows what he’s talking about… :-)