Eventual regulation of system administration?

Date November 12, 2011

I was asked recently whether I thought that, eventually, System Administration would require regulation, similar to how engineering and medicine require regulation.

This isn't an easy question to answer, even though I think about it quite a lot. I think the right answer (as much as any answer can be "right") is that, yes, some of us will eventually hold positions that need to be regulated, and in my opinion, that's for the best. Here's my answer:

Yes, some regulation is absolutely necessary in certain segments of the industry.

There is a very good (but very hard to read) book called Risk Society, written by Ulrich Beck, that caused something of a paradigm shift in the engineering mindset in the 1990s.

To oversimplify, society (and the world it exists in) has become so complex that you cannot engineer risk out of the equation.

This idea is supported by the findings of people like Sidney Dekker, author of The Field Guide to Understanding Human Error, who performs what could be considered root cause analysis of surgical and aeronautical accidents. The systems he deals with have become so complex that there is no single root cause, because failure is an inherent operational condition of the environment. In other words, asking why something failed is exactly like asking why something didn't fail: either outcome is the end result of an impossibly complex web of interrelationships, all of which culminated in the eventual success (or failure) of the system.

There are a lot of scenarios where the tasks undertaken by system administrators do have life-or-death consequences, and architecting those infrastructures with adequate resiliency requires a lot of education.

The path many system administrators take from amateur to professional resembles that of a child who is exceptionally gifted with Erector sets being hired to construct a pedestrian bridge. Then, if the bridge doesn't fall, the kid gets to build bridges designed to handle interstate traffic.

I don't write this to disparage the upwardly mobile system administrator who has learned on the job, acquired a high skill level, and is successful in the systems that they engineer. Someone who does that should be justly proud.

When you consider the potential loss of human life in such a system, however, you start to realize that "best effort" learning isn't enough, particularly when there is no test to establish a safe knowledge level.

Why should you require a degree in civil engineering to design and implement a traffic control system, yet not require the slightest test of the people who administer the IT infrastructure it runs on?

So, I anticipate that in the future, "critical infrastructure" administrators will have certain requirements laid on them for the benefit of everyone who uses the system. The difficult decision will be where to draw the line.

What are your thoughts? Can you see the need to pass a test (or series of tests) to become a "Critical Infrastructure Administrator"?

  • Furicle

    It sounds like a reasonable requirement, but it will have to start with the developers before it can work down the chain. And I have no clue how you'll ever regulate it when most of the work can be done from a chair in any country in the world. Lastly, defining what counts as critical might be the hardest step of all...

  • http://gurubert.de Robert Sander

    When you are working in a regulated environment (e.g. healthcare), you already have to follow rules (FDA 21 CFR Part 11, ISO 13485, and GAMP, for example).

    AFAIK there is no formal "personal certification", but your whole organization needs to be audited, and if that fails, your product will not be accepted.

  • http://serverfault.com/users/7200 Evan Anderson

    Touching on what @Robert Sander says, I believe that any certification should lie with the organization providing a product or service and not with the individual people who provide that service. Certify the design and the organization's ops procedures, not the individual people.

    There are too many interlocking "moving parts" in any complex IT infrastructure for ultimate responsibility for failures to rest with any one individual or small group of individuals. I think it becomes really thorny when you consider bugs in off-the-shelf software (and firmware). At that point, the responsibility crosses organizational lines. Are all the developers on the firmware in the failed router "certified", too? How about the developers of the RTOS kernel the router runs on?

    A systems administrator / architect might "design" a solution but the level of integration in the "parts" that make up that solution is much higher than in the components that make up a bridge, for example.

    Certifications like SAS 70 and PCI are steps in the right direction. They can, however, be abused by organizations that just need a marketing check-box, and they also enrich the consulting industry and create a culture of FUD.

  • http://arsedout.net Ian

    Yeah, the likelihood of material failure in, say, a bridge is a lot lower than that of firmware/software/hardware failure. Most competent IT people plan around failure and what they deem appropriate levels of risk. Those differ from industry to industry, org to org, and application to application. I wouldn't want to be on the committee that has to figure out a curriculum for all those nuances. And beyond that, by the time the committee figures it out, it will probably be antiquated information.

    As for the life-or-death thing, I believe that the vast majority of IT work is not life and death. If you're certified to engineer a building or a bridge, you are dealing with risk where death is a very real possibility.

    As an aside, I always find it interesting to take a step back and see how much failure we work with on a day-to-day basis and how readily it's accepted. Why do we pay for support? While configuration help is part of it, it's mostly because we expect that hard drive to die, that server motherboard to eat it, that initial software release to have massive problems and need patching, and that core switch to suddenly stop switching. It's funny how the consumer market doesn't accept failure, yet we stand with credit cards and purchase orders ready to pay vendors more for their poor products just to be allowed to patch their poorly designed software or replace their low-quality hardware. What if Apple charged for iOS 5.0.1? There would be blood in the streets!

  • http://www.antoinebenkemoun.fr antoine

    In the field in which I work (Air Traffic Control), every system administrator must be certified as Air Traffic Safety Electronics Personnel with a rating in system administration. This is actually not far removed from the training curriculum for Air Traffic Controllers.

    The mid-air collision over Überlingen, Germany was, as you mentioned, due to an incredible number of failures down the line. One of them was non-certified personnel working on Air Traffic related systems. There have been similar cases in Germany where non-certified personnel took down infrastructure critical to air safety, resulting in near-misses. This happened around the time when they were planning to make the certification optional.

    The underlying problem is that unless it is immediately obvious that a fault will cause human casualties, people will most likely not bother. Does this make sense? In a way, yes. Is it the best situation? Definitely not!

  • John McGrath

    This is a very interesting topic.

    In the Pharma field (in which I work), we have validation documentation that we process to verify that our systems are working as specified, and that our IT personnel are compliant in their training on those systems and on the infrastructure that uses them.

    More directly, aren't the "industry standard" certifications (Cisco, VMware(?), Microsoft, etc.) relevant in this regard? These certifications are supposed to show that the people who pass the tests are knowledgeable within the scope of the certification, and most new hires must pass the crucible of "certification" before they are even considered.

    In the Healthcare, Pharma, and Financial sectors, I can see this as a step towards more regulation of the IT industry. Eventually this will expand to most of the sector, as companies will want to emulate a validated environment (see ITIL, ISO 20000, etc.).

  • Anthony

    I don't think there can ever be a 'System Administration' certification. The main reason is that no single certification is possible.

    Looking through the comments above - would a single certification be adequate for an Air Traffic control systems administrator and a healthcare systems administrator?

    The need isn't for a 'generic' Systems Administration certification but for specific certification on the specific systems that a person would be working on in any given environment. Being an expert in Linux isn't going to help you repair a mainframe, even if there are common principles and practices.

    Perhaps what would be useful is something to help organizations build certification guidelines for their specific environments: a recognized standard for defining the knowledge required to maintain a system, and a standard for assessing that knowledge.

  • Pingback: Learning from other disciplines | Standalone Sysadmin