March 8, 2010
Or as (ir)regular as they normally are. I really hope that you enjoyed the flashback week, and got something useful from it. I’m going to try to do it again next year on the first full week of March.
Now it’s just back to the daily grind for me. I’ve been rehashing some Nagios configuration and I’ve unearthed an ancient relic! How fun! Configuration archaeology is a hobby of mine, and to find a gem that hasn’t (as far as I can tell) been mentioned on the official site since 2002? That’s GREAT! I’ve still got to go through the source code to make sure that it doesn’t do anything interesting, but it’s out of my config now.
As it turns out, my recent attention to Nagios is multifaceted. I’m cleaning up the config and tightening up the alert rules, but I’m also going to be giving a 45-minute talk at the Professional IT Community Conference in May. If you’re in the northeast US, you should definitely make it! And you should hurry and register while the early bird special is going!
Posted in General
3 Comments »
March 5, 2010
You are probably a human. At least, the statistical odds are in your favor. As a human, you experience stress, and how you react to it plays a large part in determining how happy you are. System administrators deal with stress particularly poorly, in general. We assume the role of hero and that’s that. Do what it takes, bask in whatever glory accompanies the successful completion of our task.
There is no downtime in that equation. Immediately following those emergencies, most of us drink depressants to bring ourselves down. On normal days, we require morning stimulants to bring ourselves up. I highly suspect that some of us are so-called “adrenaline junkies” from the relative high that we get when there’s an immediate problem that no one can solve but ourselves.
This is unhealthy.
What we really need is to be able to step back, look at the pattern in our lives, and say, “I don’t want to live with this stress.”
When it first hit me that stress is probably the biggest single microproblem for admins, I wrote the following. I hope you find it relevant.
Jack Hughes, over at the Tech Teapot, mentions a very appropriate subject for too many systems administrators: burnout.
As sysadmins, we’re nearly always the go-to person for whatever happens. After a while, we start to get used to it, and lots of times, we can develop a hero complex, carrying the weight of the world on our shoulders, at least in our minds. This isn’t healthy for a lot of reasons, the most important of which is your health.
Here’s an example of what taking your job too seriously can do to you:
Part One
Part Two
Not to ruin the ending, but the most disgusting part is that, while the guy was taking medical leave, his company fired him. To be completely honest, he’s much better off without a company like that, and if your company would do the same thing, then so are you.
To quote Peter Gibbons, “We don’t have a lot of time on this earth. We weren’t meant to spend it this way. Human beings were not meant to sit in little cubicles staring at computer screens all day…”
Even one of the most preeminent systems administrators around, Tom Limoncelli, advocates leaving the pressure at work when you head home. For those of us on call 24/7/365, that can be a little hard, but it’s important to try.
March 3, 2010
It’s the end of a long day. You lean back in your chair, sigh, and you’re glad it’s time to go home. Someone asks you what you did all day. You just sort of shake your head and say “fought fires”.
Fire fighting, as a sysadmin, means you don’t make any progress. You only work very hard to stay where you are. Working against entropy is difficult, and it can take a lot out of you. Some days are harder than others.
One day in early June, not long after I started this blog, I experienced a major setback. Also, a major power outage. Our entire backup facility lost power, and what’s worse, the generator refused to kick on. Our secondary site was down hard for days, until the power was restored to the downtown area of the village we were located in.
During the problem, though, we were able to turn a major issue into a net gain. Read on for the rest of the story…
It’s funny, sometimes, how we tolerate suboptimal or downright malproductive arrangements in our infrastructures, just because it’s inconvenient or inopportune to do it the “right way”. It seems like “the right way” either never comes, due to projects getting phased out, or it gets fixed during a cataclysmic upheaval, when it has become an immediate concern.
The case in point is my mail server. We have an A and a B MX record. Originally, the B MX just stored mail until A came back up, and then the mail would get delivered. Everyone checks mail on A, so it can’t really be down during the day. About 6 months ago, the office that B was at relocated, and B was never set up again. This left us with just A. To make matters worse, A was old enough that it was physically located in our backup site, which used to be our primary site. This was suboptimal. Of course there was talk about moving it to the primary site, but when could a maintenance window be created? And we’d risk the entire period of non-connectivity while it was being moved. No, management said, let’s just leave it where it is.
Great strategy. It actually worked fine though, until this weekend.
I came in on Saturday, ready to do some major work on the blade systems I’m building for our new site. I sat down at my desk, ready to dive into work. Since I was alone, Raiders of the Lost Ark was playing on the laptop. I had just logged into the first server when the lights went off, and the telltale screech and whine from the server room told me that we’d lost main power.
In Granville, OH, that’s not a strange thing. We’ve got backup AC and a backup generator, so I wasn’t worried. It does have to be manually started, so I jogged into the server room and turned on the CFL floor lamp. At least I tried to. I looked at the generator control panel and it confirmed my fears. No generator power.
I tried for several minutes to start it, but nothing gave me the impression that anything would change, so I called my boss to let him know the situation, and that I was going to start shutting down machines. Since the only critical thing was mail, I suggested that he change DNS to point to an as-yet unassigned IP at the colocation, and that I could set up a Postfix process there to queue the mail. He said that it would work, but he suggested an alternative approach.
Why not relocate the physical mail server to the colocation? A lightbulb went off. Of course, not only could I take care of that long standing problem, but because there was no power at all in the datacenter, the normal policy of no-downtime-for-repairs-and-upgrades was out the window.
The next morning, I left work to go home at 5am. The previous 15 hours had been spent completely overhauling the backup datacenter. With the mail relocated to the primary facility, once the power came on in the backup, I had free rein to cull everything unnecessary that had been accumulating.
There is now a pile of power, ethernet, and copper/fiber cables covering a square yard or so, around 6 inches deep. I took out something like 96 ports’ worth of switches, multiple servers, KVMs, fiber switches, and general cruft. The servers are also arranged so that no half-depth servers are hiding between full-depth ones. That was always a pet peeve of mine.
I thought about it while I was doing this, and if fighting normal issues is considered firefighting, then what I went through should have been considered forest fire fighting. And just like a forest fire, good can come from it. It takes the massive heat of a forest fire to crack open some pine cones. It also takes massive infrastructure downtime to make significant changes.
March 2, 2010
This is a short bit that I wrote when I was considering overhauling the internal naming scheme at my company. We used to use an odd mishmash of names, and we used to have multiple invented internal DNS names that referred to the physical location. And I don’t mean things like “location.example.com” (that might make sense!). I mean it would be as if General Motors had “boston.gm” and “tijuana.gm” and “tokyo.gm”. Nonsensical in a lot of ways (particularly now that the TLDs can be bought for a song (well, an expensive song)).
Anyway, I was curious how other people did it, so I asked. As it turns out, this post originally aired in July of 2008. I would guess that I had a couple of hundred readers. That’s a good range of experience to draw from, but I wanted a broader view, so I submitted it to Slashdot. And it got on the front page.
Thanks to Slashdot, this entry originally received 43 comments, which is right around 30 more than the next most popular story at that point. I’ve had a lot of people tell me that they found me because of that front page article. I didn’t submit it to drive people to the blog; I really did want to hear what people were doing with their own networks. Driving people to the blog was a completely satisfactory side effect, though.
Before you leave this page, make sure to check out the original and read the comments. There are a lot of funny (and interesting) ideas!
Enjoy!
Bob Plankers, over at The Lone Sysadmin, wrote a couple days ago about getting busted while reading the wiki page on X-Men. He tried to cover it up by claiming to be researching future host names. Quick thinking, Bob. Good job!
It does bring up a good point, though. Internal naming schemes are something that everyone has an opinion on, and a load of suggestions.
At various places, I’ve used Greek/Roman gods, Simpsons characters, beer companies, wine labels, and fish.
At my current company, we used the beer and wine names. We absorbed another company that used fish. It worked fine for a while, but we grew in terms of servers and locations until it got unwieldy to remember A) all the names, and B) what each name did. You’d also start to get very similar names after a while. We’ve now got 4 physical locations, soon to be 5, and something like 50-60 servers (not counting network devices). No one would be able to keep them all straight (including the admin).
To improve the situation, we’re in the process of changing to location-based hostnames with a flat internal domain structure. For example, the secondary application server in Ohio is oh-app2, with the fake internal domain name trailing. The alpha site’s primary fileserver is a-fs1.
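If you want to script against a scheme like this, here is a rough Python sketch of pulling the site, role, and number back out of a name. The pattern (site letters, a dash, role letters, a trailing number) is an assumption generalized from the two examples above, and `HostName` and `parse_hostname` are made-up names for illustration, not part of any tool.

```python
import re
from typing import NamedTuple


class HostName(NamedTuple):
    site: str   # e.g. "oh" for Ohio, "a" for the alpha site
    role: str   # e.g. "app" or "fs"
    index: int  # 1 = primary, 2 = secondary, ...


# Assumed shape: <site>-<role><number>, all lowercase.
_PATTERN = re.compile(r"^(?P<site>[a-z]+)-(?P<role>[a-z]+)(?P<index>\d+)$")


def parse_hostname(name: str) -> HostName:
    """Split a location-based host name into its parts.

    Anything after the first dot (the trailing internal domain) is ignored.
    """
    short = name.split(".", 1)[0].lower()
    match = _PATTERN.match(short)
    if match is None:
        raise ValueError(f"{name!r} does not follow the site-role-number scheme")
    return HostName(match["site"], match["role"], int(match["index"]))


if __name__ == "__main__":
    print(parse_hostname("oh-app2"))
    print(parse_hostname("a-fs1"))
```

A check like this is handy for catching the odd leftover name that doesn’t fit the scheme, since it raises an error instead of guessing.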
It’s nowhere near as fun as “wolverine.internal.com” but it certainly does tell you where you’re connecting to and what the machine does. What makes it interesting is when you go changing things like CVS repositories on people’s machines, mail servers, etc. The policy we’ve taken is to alias the old information to the new, and slowly phase out the old method.
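One way to picture the aliasing step is a small script that turns an old-name-to-new-name table into CNAME lines in the usual zone-file style. This is only a sketch: the old names below (`stout`, `trout`) are invented for illustration, and `cname_records` is a hypothetical helper, not something from our actual setup.

```python
from typing import Dict, List


def cname_records(old_to_new: Dict[str, str]) -> List[str]:
    """Return one zone-file style CNAME line per retired host name."""
    for new in old_to_new.values():
        if new in old_to_new:
            # An alias pointing at another alias makes the phase-out harder to follow.
            raise ValueError(f"{new!r} is itself an old name; aliases should not chain")
    return [f"{old}\tIN\tCNAME\t{new}" for old, new in sorted(old_to_new.items())]


# Hypothetical mapping: these old names are made up for the example.
ALIASES = {"stout": "oh-app2", "trout": "a-fs1"}

if __name__ == "__main__":
    print("\n".join(cname_records(ALIASES)))
```

Keeping the mapping in one table also gives you a checklist for the phase-out: when nothing refers to an old name any more, delete its line.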
What do you use as internal naming systems? What do you think would make an excellent scheme? Make sure to check the list to make sure it hasn’t been done before!
March 1, 2010
Today’s flashback is also going to be a HOWTO, and vaguely related to yesterday’s Rackmount HOWTO. Today I’m including a HOWTO for understanding how building wiring works.
When I first looked at house wiring in a moderately complex 8-story building, I was sort of mystified. It was only after literally tracing wires and numbers around the various wiring closets that I understood what was happening.
This howto deals specifically with 66-blocks, but 110 blocks are also becoming common, just not in my neck of the woods.
Enjoy!
Punch down blocks are used when you need to run wires long distances, typically between distribution points (things like the MDF, or Main Distribution Facility, otherwise known as the main telco room on the primary floor of the building), comms closets, and the like. They are used, rather than normal RJ45 jacks, because they are simpler, less prone to breaking, and don’t introduce much, if any, extraneous electrical interference to the wire.
For the next few paragraphs, please refer to the following picture, which is a clear, understandable example of a punch down block:

Those grey things in the middle are termination points for single wires. When you deal with punch down blocks, you deal in pairs of wires, and you get one wire to one grey clip. Pretend there is an imaginary line down the middle of that patch panel, because the pairs on the left are separate from the pairs on the right. If you number a line going across, 1 2 3 4, 1 and 2 are a pair, 3 and 4 are an entirely different pair. In the picture, you can tell, because of the numbering scheme they’ve used. There are 4 clips for a pair, but only 2 clips are wired in the beginning. Each pair is numbered, so the phone company can say “Turn on pair 3044”.
Now, at our building, the phone company’s lines are on the right side of the wall. On the left side of the wall is another huge array of punchdown blocks. These are for the “house wiring”. When they built the building, they pre-ran hundreds of pairs of wires to each floor from this room, so that they wouldn’t have to redo it every time someone ordered a T1. Each wire to each floor is terminated to a pair on the left side of the wall in exactly the same manner as the one on the right (including leaving one set empty).
To connect the two sides, you run a twisted pair of wires (it looks like you took a section of cat5, stripped off the sheathing, and just used one set of wires) from the right side of the wall (from the pair of grey clips we left open on #3044) to the left side of the wall (say #514, the 14th pair to the 5th floor, again using the empty grey clips). If you look again at that picture, you can see 3043 has been wired across, because all 4 wires are clipped in, but 3044 has not, since only the rightmost clips have wires.
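The house-side numbering in that example (#514 meaning the 14th pair to the 5th floor) can be decoded mechanically. Here is a minimal Python sketch, assuming the number is simply the floor times 100 plus the pair; that rule is generalized from the single example above and would not hold for a floor with more than 99 pairs. `HousePair` and `decode_house_pair` are hypothetical names.

```python
from typing import NamedTuple


class HousePair(NamedTuple):
    floor: int
    pair: int


def decode_house_pair(number: int) -> HousePair:
    """Split a house-wiring pair number such as 514 into (floor, pair).

    Assumes floor * 100 + pair, which is a guess based on one example.
    """
    floor, pair = divmod(number, 100)
    if floor < 1 or pair < 1:
        raise ValueError(f"{number} is not a valid floor/pair number under this scheme")
    return HousePair(floor, pair)


if __name__ == "__main__":
    print(decode_house_pair(514))
```

Your building may well number things differently, so check a few known pairs against the blocks before trusting anything like this.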
At this point, you have two wires coming in from the phone company to the punch down blocks on the right. Then you’ve got wires connecting those punch down blocks to the “house wiring” punchdown blocks on the left. Then you’ve got vertical wiring up to the floor that the wire ends at.
In the comms closet on that floor (also known as the IDF, or intermediate distribution facility), you have a very similar situation. On the right hand side, you’ve got the punch down block where the vertical cabling from the MDF terminates, and on the left, you’ve got a punch down block where the actual wires that end up in your office are terminated. You use another twisted pair of wires to connect the two sides, and at that point, the wire that ends up in your office is connected directly to the phone company, albeit through several punch down blocks and lots of wire.
Now, when it comes into your office, hopefully someone has had the courtesy to install a patch panel for you. The patch panel looks like this on the front:

and this on the back:

As you can tell from the photo on the back, wires are typically matched up color for color when it comes to straight CAT5 cables. When it comes to things like wiring T1s, you’re only using two wires, so as long as you remember which one goes to what wire, you’re ok.
So, in review, we’ve got phone company wires coming in, and terminated in the MDF. They’re connected across to the house wiring, which is run vertically to the IDF, and from the IDF, it goes to your space. All of this is accomplished with those magic little grey clips.
Now, if only the wires would go in there. It turns out that there’s a trick. Or a tool, really, called a punch down tool (creative, eh?). The cheapest punch down tool I’ve ever seen is a buck. It’ll work in a pinch, but the one you want is here:

The way you use it is to arrange the wire you want to punch down against the metal clip. There’s a very thin slit in the clip where the wire will end up. Press the tip of the punch down tool against the clip, and push. The spring loaded mechanism (in the expensive tool) or your elbow grease (in the $.99 model) will push the wire to the bottom of the slit, and in the process, scrape away the plastic or teflon sheathing on the wire, allowing the metals to make contact. The expensive model will then use the spring action to slice the extra wiring off the end, eliminating extraneous electrical interference (when you’re dealing with hundreds of feet worth of cable, this is a good thing). In the cheap model, I’d recommend an Xacto knife to do the job.
As for maintenance, there’s not really much that can go wrong in a patch panel, as long as no one comes in and starts pulling on wires. Typically there’s a plastic case that goes over the entire block to prevent accidental snags from pulling wires loose.
The best advice is to document everything you can. Leave a hard copy of the documentation in the comms closet so that you can see what’s been done. Lots of times, the telephone tech will “tag” the lines that he’s installed on the right hand side. The tag usually has the numbers of the pairs that are activated on the telco side, and the phone numbers (or circuit IDs) that match those pairs.
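As one possible shape for that documentation, here is a small Python sketch that keeps each cross-connect as a record and renders a plain-text sheet you could print and leave in the closet. The field names, the `render_sheet` helper, and the circuit ID shown are all invented for illustration; only the pair numbers 3044 and 514 come from the example earlier in this howto.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CrossConnect:
    telco_pair: int   # pair number on the phone company side
    house_pair: int   # pair number on the house wiring side
    circuit_id: str   # phone number or circuit ID from the tech's tag
    destination: str  # where the house wiring ends up


def render_sheet(records: List[CrossConnect]) -> str:
    """Render the records as a fixed-width table, sorted by telco pair."""
    header = f"{'TELCO':>6}  {'HOUSE':>6}  {'CIRCUIT':<12}  DESTINATION"
    lines = [header, "-" * len(header)]
    for r in sorted(records, key=lambda rec: rec.telco_pair):
        lines.append(
            f"{r.telco_pair:>6}  {r.house_pair:>6}  {r.circuit_id:<12}  {r.destination}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # The circuit ID below is a placeholder, not a real one.
    print(render_sheet([CrossConnect(3044, 514, "EXAMPLE-0001", "5th floor IDF")]))
```

Sorting by telco pair means the sheet reads in the same order as the blocks on the wall, which makes it quicker to check against what is actually punched down.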
(Photos courtesy of lil 1/2 pint, techmsg, dmitrybarsky)
March 1, 2010
(PICC is a regional sysadmin conference to be held in central NJ on May 7-8, 2010. I’m on the planning committee. http://picconf.org)
Today is the deadline for proposals for papers, talks, and such.
We’re a little low on submissions so I’d like to make one more “beg”. We’d love to have a talk about PHP for sysadmins, something fun you’ve done with Arduino, your favorite JS library, a walk-through on setting up Google Apps. Demo your favorite open source project, or propose a panel of people to talk about something you find interesting (I can help find others for your panel). It is an excellent way to spread the word about a project you are involved with.
We’ve tried to make the proposal process really easy. Just send your contact info and topic plus a 1-2 paragraph description to submissions@lopsanj.org
For more info, contact me and/or view:
http://lopsanj.org/events/picc10/cfp
BTW, today is the deadline, but we can grant extensions to anyone who writes and asks.