Tag Archives: backup

Dedupe to tape: Are you crazy?

W. Curtis Preston, over at Backup Central, has posted an interesting entry, “Is Dedupe to tape crazy?” Even he admits in the first sentence that, yes, dedupe to tape is crazy. But then he bumps the crazy-fest up a notch by asking whether it’s crazy-bad or crazy-good.

You should read the article, but let me jump ahead to the end. He says it’s good, at least in certain cases. I say it’s bad, in any case where you’d like to actually retrieve your data, rather than minimize your costs.

Here’s why…

To understand why this is a bad idea, you’ve got to know what deduplication is first. As Wikipedia succinctly puts it, deduplication is the elimination of redundant data. The data is stored in one place, and every reference to that data is stored as a shorter index number which points back to the deduplicated data. Think of it like symlinks in your filesystem, if you’d like, except the symlinks operate at the block level.
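If it helps to see the idea in code, here is a minimal sketch of block-level deduplication in Python. The block size, the function names, and the use of SHA-256 hashes as pointers are all my own illustration, not how any particular product implements it:

import hashlib

BLOCK_SIZE = 4096  # illustrative block size

def dedupe(data):
    """Split data into fixed-size blocks, store each unique block once,
    and keep an ordered list of hashes that act as pointers to the blocks."""
    store = {}     # hash -> block contents (the single stored copy)
    pointers = []  # ordered hashes standing in for the original stream
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        key = hashlib.sha256(block).hexdigest()
        store.setdefault(key, block)  # a duplicate block costs no extra space
        pointers.append(key)
    return store, pointers

def rehydrate(store, pointers):
    """Rebuild the original data by following the pointers back to the
    stored blocks; this only works if every referenced block still exists."""
    return b"".join(store[key] for key in pointers)

The thing to notice is that rehydrate() only works if every block it points at is still in the store. Keep that in mind for the rest of this post.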

When you’re storing things on disk, this leads to near-miraculous disk savings. Want to store 5 years worth of full backups, but only use the disk space of the equivalent incremental backups? No problem! Store 50 copies of the same directory tree in various differing hierarchies, but only use the disk space of one? Done!

Now, what happens when the time comes to back up those 50 different hierarchies? Well, in the time-honored, tape-expensive version, you use the tape equivalent of 50 copies of the data.

What Mr Preston is suggesting is that for long-term storage, instead of storing 50 copies of the same data, you store the actual data once and back up only the pointers to that data in the various backup sets. The argument is that if a full backup of the data takes 10 tapes, then rather than 50 * 10 tapes, you can get away with roughly 50 * 1, where each of the 50 backup sets is a tape of pointers referencing the single deduplicated copy of the data. That is a massive cost savings by any measure.
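Running the numbers from that example makes the appeal obvious. The figures below are just the ones from the paragraph above, plugged into a trivial bit of Python, not anyone’s real tape counts:

tapes_per_full = 10   # tapes needed for one full copy of the data
backup_sets = 50      # number of backup sets being retained

conventional = backup_sets * tapes_per_full       # every set carries its own copy: 500 tapes
deduplicated = tapes_per_full + backup_sets * 1   # one shared copy plus a pointer tape per set: 60 tapes

print(conventional, deduplicated)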

My problem with this is tape failure. If one of the 50 individual backup tapes fails, it’s no problem. Sure, you lose that particular arrangement of the data, but it’s not that big of an issue. Unfortunate, sure, but not tragic. If you lose the 1 tape that contains the deduplicated data, though, then you immediately have a Bad Day(tm).

Essentially, you are betting on one tape not failing over the course of (in the argument of Mr Preston) 7+ years. And if something does happen in that 7 years, whether it’s degaussing, loss, theft, fire, water, or aliens, you don’t lose one backup set. You lose every backup that referenced that set of data.
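To put a rough number on that bet, here is a back-of-the-envelope sketch. The 2% annual failure rate is purely an assumption I’m making for illustration; the point is what gets lost when that one tape goes, not the exact odds:

annual_failure_rate = 0.02  # assumed for illustration, not a measured figure
years = 7                   # the retention period from the argument above

# Probability that the one shared dedupe tape is lost at some point
# during the retention period, assuming independent yearly odds.
p_lost = 1 - (1 - annual_failure_rate) ** years
print(f"Chance the shared tape is lost within {years} years: {p_lost:.1%}")  # roughly 13%

# In the conventional layout, that loss costs you one self-contained backup set.
# In the deduplicated layout, the same loss takes out all 50 sets at once,
# because every set's pointers reference the missing data.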

So I would, if I could afford one, buy a deduplicated storage array in a heartbeat for my backup needs. But I would not trust a deduplicated archival system at all. The odds of loss are too great, and it’s not worth the savings. I’d rather cut the frequency of my backups than save money by making my archives co-dependent.

But I could be wrong. Feel free to comment and let me know if I am.

It should probably be noted that Preston wrote about this too. The difference is, of course, that he knows what he’s talking about… :-)

Quick blurb on backups and Amanda

I’ve mentioned AMANDA a couple of times, and I just wanted to give an update on where I stood.

I hate to say it, but I gave up. I didn’t have the time to work with the configurations and craft them into the backup plan I needed. On the other hand, I wasn’t willing to give up a vendor-neutral backup solution that used off-the-shelf tools like Amanda does.

So I bought support. For $100 per client, it was a net win when you consider the time I had spent and would have continued to spend. Zmanda gave me the upgrade to the 3.0 code base, and using the web-based GUI is a breeze. There were a couple of minor issues, since the interface code is new, but Zmanda has been very good about issuing me patches and making sure that the issues I experienced were taken care of.

I’ve been very happy. The only remaining issues I’ve got in my backup solution are very close to being taken care of, and even those are just issues of logistics on my end.

Overall, buying support from Zmanda has been a big win so far.

The Admin Arsenal blog is talking about backups today, too. May want to check that out.

The Backup Policy: Databases

It’s getting time to revisit my old friend, the backup policy. My boss and I reviewed it last week before he left, and I’m going to spend some time refining the implementation of it.

Essentially, our company, like most, operates on data. The backup policy is designed to ensure that no piece of essential data is lost or unusable, and we try to accomplish that through various backups and archives (read Michael Janke’s excellent guest blog entry, “Backups Suck“, for more information).

The first thing listed in our backup policy is our Oracle database. It’s our primary data store, and at 350GB, a real pain in the butt to transfer around. We’ve got our primary Oracle instance at the primary site (duh?), and it’s producing archive logs. That means any time there’s a change in the database, that change gets written out to a log file. We then ship those logs to three machines that are running in “standby mode”, where they are replayed to bring the database up to date.
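For anyone unfamiliar with log shipping, the mechanics boil down to copying each new archive log to every standby and letting the standbys replay them. A real setup would lean on Oracle’s own tooling or a hardened script; the sketch below, with made-up hostnames and paths, is only meant to show the shape of it:

import subprocess

# Hypothetical paths and hostnames, purely for illustration.
ARCHIVE_LOG_DIR = "/u01/oradata/arch/"
STANDBYS = [
    "standby1.example.com",   # same site as the primary
    "standby2.example.com",   # backup site
    "backupsrv.example.com",  # third site, with the tape library
]

def ship_logs():
    """Copy new archive logs to every standby so each one can replay them."""
    for host in STANDBYS:
        subprocess.run(
            ["rsync", "-a", ARCHIVE_LOG_DIR, host + ":" + ARCHIVE_LOG_DIR],
            check=True,
        )

if __name__ == "__main__":
    ship_logs()  # typically driven by cron every few minutes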

The first standby database is also at the primary site. This enables us to switch over to another database server in an instant if the primary machine crashes from an OS problem, a hardware problem, or something similar that hasn’t been silently corrupting the database for a significant period of time.

The second standby database is at the backup site. We would move to it in the event that both database machines crash at the primary site (not likely), or if the primary site is rendered unusable for some other reason (slightly more likely). Ideally, we’d have a very fast link (100Mb/s+) between the two sites, but this isn’t the case currently, although a link like that is planned in the future.

The third standby database is on the backup server. The backup server is at a third location and has a 16-tape library attached to it. In addition to lots of other data that I’ll cover in later articles, the Oracle database and historic transaction logs get spooled here so that we can create archives of the database.

These archives would be useful if we found out that, several months ago, an unnoticed change went through the database, like a table getting dropped, or some kind of slight corruption that wouldn’t call attention to itself. With archives, we can go back and find out how long it has been that way, or even recover data from before the table was dropped.

Every Sunday, the second standby database is shut down and copied to a test database. After it is copied, it’s activated on the test database machine, so that our operations people can test experimental software and data on it.

In addition, a second testing environment is going to be launched at the third site, home of the backup machine. This testing environment will be fed in a similar manner from the third standby database.

Being able to activate these backups helps to ensure that our standby databases are a viable recovery mechanism.

The policy states that every Sunday an image will be created from the standby instance. This image will be paired with the archive logs from the next week (Mon-Sat) and written to tape the following Sunday, after which a new image will be created. Two images will be kept live on disk, and another two will be kept in compressed form (it’s faster to uncompress an image on disk than to read it from tape).
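As a rough sketch, that retention rule looks something like the Python below. The directory, the file names, and the date-stamped naming that makes a plain sort work are all hypothetical:

import gzip
import os
import shutil

IMAGE_DIR = "/backup/oracle/images"  # hypothetical location of the weekly images
KEEP_LIVE = 2        # newest two images stay uncompressed, ready to activate
KEEP_COMPRESSED = 2  # the next two stay on disk, but gzipped

def rotate_images():
    """Apply the retention rule once the weekly image has gone to tape:
    keep two images live, compress the next two, and drop anything older."""
    images = sorted(f for f in os.listdir(IMAGE_DIR) if f.endswith(".img"))
    images.reverse()  # newest first, assuming date-stamped file names
    for i, name in enumerate(images):
        path = os.path.join(IMAGE_DIR, name)
        if i < KEEP_LIVE:
            continue  # leave the newest images alone
        elif i < KEEP_LIVE + KEEP_COMPRESSED:
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)  # write a compressed copy
            os.remove(path)  # then drop the uncompressed original
        else:
            os.remove(path)  # anything older is already on tape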

In the future, I’d like to build in a method to regularly restore a DB image from tape, activate it, and run queries against it, similar to the testing environments. This would extend our “known good” area from the database images to include our backup media.

So that’s what I’m doing to prevent data loss from the Oracle DB. I welcome any questions, and I especially welcome any suggestions that would improve the policy. Thanks!