3 Activities That Will Make You a Better SysAdmin

As a number of my blog entries will attest, I really like drawing inspiration from sources outside of the typical system administration skill set. I think that the technical aspects of our job are far easier than the “soft” stuff, or at least the “big picture” things.

When we start as inexperienced admins, we focus mostly on technical work. Ideally, we perform this work at the behest of our more experienced superiors; less ideally, we do it because there’s no one else to do it and the company’s owner is telling us that the mail server is broken and that it needs to be fixed yesterday.

'Server Rack' photo (c) 2008, Jamison Judd - license: http://creativecommons.org/licenses/by/2.0/

I still recall very well back when I was a first-year administrator and Brett, our senior admin, would have me do some small bit of technical work. Usually it was to build a new service or rebuild an old one. I vividly remember wondering to myself why he was having me do these kinds of things. There was no outward indication that something was wrong, and I hadn’t heard anyone suggest these new services. It just seemed like Brett was randomly having me do things.

It took some time before I got to the point where I could really see the forest for the trees. Brett wasn’t being reactive, he was being proactive. He could see impending problems and build solutions before they became emergencies. It took a long time of dealing with individual components of the network, then small subsets of services, then the entire infrastructure before I could see why he did what he did.

The valuable addition to my skill set wasn’t technical in nature. It was the ability to see the relationships of systems to each other. It was gained through perspective. I was looking at trees when I was building solutions, but Brett was looking at the forest when he’d direct me to do them.

Your goal as a System Administrator should be to get to the point where you see the forest. It’s not an overnight kind of thing, and it takes a lot of working on trees before you can see it, but you should be working toward that goal while you’re doing the grunt work.

I’ve thought about it, and I’ve identified three things that I think can help you to move into this architectural-type role. These aren’t the only things you should be doing (by far!), but if you do these three things, you will advance your analytic skills above and beyond someone who doesn’t.

Also, they might seem a little out of left field, but trust me, the skills that you gain from them will carry over into your eventual role as someone who deals with forests instead of trees.

  1. Read Airline Accident Reports

    This sounds macabre, but these reports are ex post facto reviews of incidents involving aircraft, written by people who are extremely well trained and who have a very large incentive to determine the various causes of the incidents.

    The best resources in the world are probably from the FAA. There are two major data sets available: the preliminary data, which is a brief report that covers the apparent facts and situations (but may be subject to change through the course of the investigation), and the final reports, which are the results of what is sometimes a multi-year investigation.

    To give you some idea of the thoroughness behind these reports, here is an almost 100-page document detailing the crash of a 13-passenger turboprop airplane. From the abstract:

    The safety issues discussed in the report address fuel system limitations, requirements for fuel filler placards, and guidance on fuel system icing prevention. Safety recommendations concerning these issues are addressed to the Federal Aviation Administration (FAA) and the European Aviation Safety Agency. Previous safety recommendations concerning crash protection for airplane occupants and flight recorder systems were addressed to the FAA.

    These people are not screwing around.

    One thing to keep in mind while reading these incident reports is that, for the purposes of fact finding, some investigators will assign a single root cause (which, given the redundancy of modern aviation, is typically assigned to “pilot error”), while others cite contributory causes and situations. An incident investigator named Sidney Dekker wrote a game-changing book about this called The Field Guide to Understanding Human Error, in which he argues that simplistically assigning blame isn’t just wrong, it’s actually counterproductive, because the problems don’t get fixed. I picked up the paperback ($20 used), and I loved it.

  2. Read Medical Case Reports

    Speaking of macabre… I think it’s important to read medical case reports. Like I said, it sounds like it’s from left field, but it’s actually very relevant.

    The best site that I’ve found for this is The Journal of Medical Case Reports. It’s a publication dedicated to just that: case reports. They’re available for free, and they’re all written by practicing doctors who have documented interesting or unusual cases that they’ve encountered and that they feel would be useful to their fellow medical professionals.

    Here’s an example: “Low back pain during pregnancy caused by a sacral stress fracture: a case report” (pdf). The medical jargon isn’t as important to us as the form and the function. Check out this abstract:

    Introduction: Sacral stress fractures are a rare but well known cause of low back pain. This type of fracture has also been observed as a postpartum complication. To date, no cases of intrapartum sacral stress fractures have been described in the literature.
    Case presentation: We report the case of a 26-year-old Caucasian European primigravid patient (30 weeks and two days of gestation) who presented to our outpatient clinic with severe low back pain that had started after a downhill walk 14 days previously. She had no history of trauma. A magnetic resonance imaging scan revealed a non-displaced stress fracture of the right lateral mass of her sacrum. Following her decision to opt for nonoperative treatment, our patient received an epidural catheter for pain control. The remaining course of her pregnancy was uneventful and our patient gave birth to a healthy child by normal vaginal delivery.
    Conclusions: We conclude that a sacral stress fracture must be considered as a possible cause of low back pain during pregnancy.

    Read through the rest of the paper. The frank and open discussion of the case, the evaluation, all of it reads as an assessment by knowledgeable professionals giving their opinions to their peers.

    The doctors who authored this piece are doing exactly what we should be doing. How many times have you encountered some strange condition or a case that seemed bizarre, but that, once you discovered the underlying cause, turned out to be preventable through an easy check early in the process, if only you had thought to do it?

  3. Write Reports of Your Own

    You had to see this coming.

    Now that you have two excellent examples of reports generated by professionals in their respective fields, you should see that you need to start writing reports on the things you experience, too.

    Part of being a professional is taking part in the professional community and sharing your knowledge with your fellow practitioners. The strange things that you notice can end up being very helpful to other system administrators. By publishing your findings, you are legitimizing your experiences and your professional status, plus documenting your experiences for posterity, plus encouraging others to do the same. There is absolutely no downside.

    So where to publish? Every conference that I know of is actively seeking practice and experience reports, which would cover either of the above types of reports. If you had a failure that you ran a post-mortem on, or you encountered an issue during the otherwise routine operation of your infrastructure that others could learn from, then by publishing a practice and experience report, you help everyone.

    There is also a journal for system administrators called ;login:, and they are always looking for case studies, which is what this kind of report would be.

Doing these things will not turn you into an amazing system administrator overnight, but taken together, they’ll provide excellent models of professional behavior that we should all work to emulate.

You don’t need to do any of these things to be good at your job, and I don’t want to give that impression. What I would like you to consider is that, given two equally proficient administrators, I will always prefer the one who has tried to share what they have learned rather than keeping it to themselves.

  • Hi Matt, great post as always!

    I do a lot of this type of analysis on any problem I encounter, and I think your ‘seeing the forest’ on the documenting and sharing of detailed trouble reports is spot on.

    I have only one minor issue: you say, “There is absolutely no downside.” Unfortunately, there can be. If you are seen as someone who has the time, drive, and passion to document and share information, some people may pigeonhole you as not_a_sysadmin. This shouldn’t be the case, but it does happen.

    I believe that the additional analysis, troubleshooting, and communication skills required to write a decent trouble report, the courage and grace to describe human errors in neutral, non-blaming terms, and the drive to share information rather than hide it away are all extremely valuable professional behaviors.

    I encourage all of you to try writing reports of your own!

  • Hi Pam, thanks!

    I suppose it’s possible that some non-sysadmins will make the mistake of misinterpreting our work when we go to these lengths to be professional, but I see it as a temporary problem, though it is unfortunate.

    As this kind of activity becomes normalized and then expected, I hope that at some point, a common interview question will be “What have you published?”, because I feel like it’s that important.

  • Patrick Cable

    I find court opinions a good source of analysis, too.

    They may not be as easy to link to the SA world, but I think that they also do a good job of methodically analyzing a complaint. And, complaints can combine facts and opinions. All good stuff!

    Time to get readin’ :)

  • Another book that goes into quite some detail is Normal Accidents: Living with High-Risk Technologies (ISBN-13 978-0691004129). This is a close look at nuclear accidents, focusing on Three Mile Island. There are a lot of parallels between that and highly complex IT infrastructures, though the consequences of failures are less flashy (usually). A friend who works as an Air Traffic Controller recommended it to me, since the US ATC system is just such a highly automated, failures-only-happen-when-multiple-things-go-wrong kind of system.

    One of the key things from that book is that the more automated, cross-checked, and integrated a system is, the more likely it is that any failures will be critical failures. Fault-management doesn’t always take into account the many permutations of little failures into a whacking great one, and in fact such analysis is extremely hard. And we need to accept this as normal, and plan for it.

    Good reading.

  • Hey, thanks a lot for the suggestion! I just picked it up for my Kindle, so I’ll let you know what I think.
