If you can’t script it, use a checklist

Date July 9, 2009

In yesterday’s entry on my failure as a new-user-machine-setter-upper, I mentioned offhand one of my maxims, “if you can’t script it, use a checklist”. Others are “Documentation shall set you free”, and “Never go against a Sicilian when death is on the line”.

As it turns out, I have had several people tell me that they really like this idea, and this morning, Kaput asked where I got it. Well, like all really good ideas, it was stolen from other people, just not in its current form. I took the individual ideas from two sources and merged them into one logical statement.

It was Michael Janke (who runs the excellent blog, Last In, First Out) who gave me the idea that for truly structured management, things should be scripted. My favorite article to draw inspiration from is his Ad Hoc vs Structured System Management (and that link is soon going to be full of nothing but trackbacks from here. I linked to it when I was talking about Opsview vs Nagios, too). Anyway, the article is great.

The very first thing I heard about sysadmins using checklists was from Chris Siebenmann. After reading that article, I wrote an article about preemptive troubleshooting. (Incidentally, that post was also the beginning of the famous Rubber Ducky Debugging that led to my favicon.)

The idea of checklists was also influenced by an article very much like this one. Afterwards, I found out that many very prominent sysadmins (such as Tom Limoncelli) advocate checklists.

The article that I read originally devoted the first third or so to explaining why pilots have checklists at all, and it was fascinating. Planes started out relatively simple: a few controls, an engine to make sure was started, and that was pretty much it. As airplanes became more complex, the number of things needing to be done increased, and pilots had to remember more of them. Eventually, the complexity reached a point (maybe in the 4-6 tasks range?) where the pilots were making mistakes, and when you’re flying a plane, mistakes are A Bad Thing(tm).

As it turns out, the plane that tipped the scale was the Boeing Model 299, which crashed during a test flight due to pilot error, killing two people. Rather than completely scrap the plane, the Army ordered more aircraft for testing, and tried to figure out how to make the thing safer. The pilots came up with the idea of the checklist, which enabled the 299 to go into production as a plane you might have heard of: the B-17 Flying Fortress.

Nowadays, the newest thing is surgeons using checklists, an idea cribbed from the pilots. I just figured, if it’s good enough for the people trying to keep me alive, it’s good enough for my systems.

If you work from Michael’s idea that, since humans make mistakes, the best way to ensure uniformity is to use a script to make the changes, then you want to script everything. Sadly, not everything can feasibly be scripted. For the things that can’t, we want to use the most reliable means at our disposal. For a small group (or a single person) doing something complex that has several steps, it looks like our answer is a checklist.

In the end, the maxim is just the previous paragraph shortened. “If you can’t script it, use a checklist.”
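To make the maxim a little more concrete, here is a minimal sketch in Python of one way the two halves could live in the same procedure. It is my own illustration, not something taken from Michael’s or Chris’s articles. The names `Step`, `run_checklist`, `ask_operator`, and the example new-user steps are all hypothetical. Steps that can be scripted carry a function; steps that can’t are put in front of a human, who has to confirm them before the run continues.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    description: str
    # Scripted steps carry a callable; manual steps leave it as None.
    action: Optional[Callable[[], None]] = None


def ask_operator(description: str) -> bool:
    """Ask a human to confirm that a manual step has been done."""
    answer = input(f"[manual] {description}: done? [y/N] ")
    return answer.strip().lower() == "y"


def run_checklist(steps: List[Step],
                  confirm: Callable[[str], bool] = ask_operator) -> List[str]:
    """Run steps in order, stopping at the first unconfirmed manual step.

    Returns a log with one line per step attempted.
    """
    log = []
    for number, step in enumerate(steps, start=1):
        if step.action is not None:
            # If you can script it, script it.
            step.action()
            log.append(f"{number}. scripted: {step.description}")
        elif confirm(step.description):
            # If you can't, it goes on the checklist.
            log.append(f"{number}. confirmed: {step.description}")
        else:
            log.append(f"{number}. NOT DONE: {step.description}")
            break
    return log


def create_account() -> None:
    # Placeholder only: a real version would call your own provisioning script.
    print("creating account (placeholder)")


# Hypothetical new-user-machine procedure mixing both kinds of step.
NEW_USER_MACHINE = [
    Step("Create the user account", action=create_account),
    Step("Put the asset tag on the case"),
    Step("Walk the user through their first login"),
]
```

Calling `run_checklist(NEW_USER_MACHINE)` would run the scripted step, then prompt for each manual one. The returned log is the sort of thing you could file away afterwards as a record of what was actually done.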

Many thanks to Michael, Chris, various news reporters, surgeons, and the test pilots of the Model 299 for this post.



7 Responses to “If you can’t script it, use a checklist”

  1. ToddT said:

    First, I love your blog. Second, I have to add one more maxim to your arsenal.

    “Document what you do.” and “Do what you Document.”

    I learned this one when I transitioned from sysadmin to IT auditor. Many of the shops that I audit maintain the documentation for the auditors only. No one else updates or reads it. Half of the issues I find are personnel not following the procedures outlined in the documentation.

  2. dusty said:

    Hi. I am browsing your “Redhat Clustering” write-ups — and notice a problem:

    Whenever I click a link in your blog to take my browser to a story, such as this link, it then redirects me away from that page to the latest and greatest entry.

    I would really love to read that link, but alas, cannot seem to get there and stay. Bugger.

  3. Matt Simmons said:

    @ToddT: That’s really good. I like that, and I will gladly steal it :-) Thanks!

    @Dusty: Yeah, that’s a problem with the blogspot redirect I’ve got going on. I need to spend some more time making it work, but I don’t have a ton to devote to the blog right now. In the meantime, please accept my apologies, and feel free to search at the top. Alternatively, the archives are fully browsable at the right, if you use the URL to determine the date. Sorry it’s so difficult right now.

  4. John McGrath said:

    When I write a script/Setup Document/Instruction document, I will screenshot the step, and make notes about the step in our template.

    In this way, whoever comes behind me can use the document to install and configure the software/hardware/OS the same way, and if the process requires validation, we have a known good procedure.

  5. Steve said:

    “That’s really good. I like that, and I will gladly steal it :-) Thanks!”

    ‘Tis not theft, but a long term loan. :)

  6. Michael T said:

    I created (as opposed to caused) a check list to be created in a support environment where the one sheet checklist was completed and initialed by the next support character in a pager rotation and became part of the audit material available to customers.

    It had some of the dumbest little things on it such as “Have spare battery.” It seems stupid but I can assure you that it made a HUGE difference to the person who had to initial it that they knew, without a doubt, that if they initialed it and didn’t have the battery that everyone would know and no one could say “I forgot.” And you also knew that the customer, during a post-mortem, could ask for proof that we did the right things and hadn’t screwed up and that this documentation would be produced with our names on it. ’Tweren’t any way of getting around it.

    On another note, I learned this about documentation. I always griped/wished/complained that the jerk before me thought of the next guy and provided documentation. It was then that I realized that while I was “the next guy” there would always be a next guy and I was the current jerk and it was my responsibility to document things for the person after me!

  7. Matt Simmons said:

    Hey Michael, great comment. You’re very right in that we’re ALL the “current jerk”, and that we need to document for those people who come after us.

    It sounds like the checklists really worked for you in that position. I’m hoping to get some more going here, too. They’re too handy to ignore.
