Monitoring (old) Zimbra

Date September 6, 2014

It's September, and in universities, that means tons of new people. New staff, new faculty, new students. Lots and lots of new people.

Here at The College of Computer and Information Science at Northeastern University, we've got a banner crop of incoming CS students. So many, in fact, that we bumped up against one of those things that we don't think about a lot. Email licenses.

Every year, we pay for a lot of licenses. We've never monitored the number used vs the number bought, but we buy many thousand seats. Well, we ran out last week. Oops.

After calling our reseller, who hooked us up with a temporary emergency bump, we made it through the day until we could buy more. I decided that it was time to start monitoring that sort of thing, so I started working on learning the Zimbra back-end.

Before you follow along with anything in this article, you should know - my version of Zimbra is old. Like, antique:

Zimbra was very cool about this and issued us some emergency licenses so that we could do what we needed until our new license block purchase went through. Thanks Zimbra!

In light of the whole "running out of licenses" surprise, I decided that the first thing I should start monitoring is license usage. In fact, I instrumented it so well that I can pinpoint the exact moment that we went over the number of emergency licenses we got:


Cool, right?

Well, except for the whole "now we're out of licenses again" thing. Sigh.

I mentioned a while back that I was going to be concentrating on instrumenting my infrastructure this year, and although I got a late start, it's going reasonably well. In that blog entry, I linked to a GitHub repo where I built a Vagrant-based Graphite installation. I used that work as the basis for the work I did when creating a production Graphite installation, using the echocat graphite module.

After getting Graphite up and running, I started gathering metrics in an automated fashion from the rest of the puppetized infrastructure using the pdxcat CollectD puppet module, and I wrote a little bit about how similar that was with my Kerbal Space Administration blog entry.

But my Zimbra install is old. Really old, and the server it's on isn't puppetized, and I don't even want to think about compiling collectd on the version of Ubuntu this machine runs. So I was going to need something else.

As it turns out, I've been working in Python for a little while, and I'd written a relatively short program that serves both as a standalone command that can send a single metric to Carbon or can function as a library, if you need to send a lot of metrics at a time. I'm sure there are probably a dozen tools to do this, but it was relatively easy, so I just figured I'd make my own. You can check it out on GitHub if you're interested.
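For the curious, here's a minimal sketch of what a tool like that has to do under the hood. This is not the script from the GitHub repo. It assumes Carbon's plaintext listener on its default TCP port 2003, and the names format_metric, send_metric, and graphite.example.com are invented for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Carbon's plaintext protocol: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path.strip(), str(value).strip(), int(timestamp))


def send_metric(path, value, host="graphite.example.com", port=2003, timestamp=None):
    """Open a TCP connection to Carbon, send a single metric, then close.

    The host name is a placeholder; point it at your own Graphite box.
    """
    line = format_metric(path, value, timestamp)
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()


# Formatting alone needs no network connection:
print(format_metric("zimbra.mtaqueue.kbytes", " 4215", 1410048000), end="")
```

Keeping the formatting separate from the socket code means the string handling can be checked without a Carbon server around.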

So that's the script I'm using, but a script needs data. If you log in to the Zimbra admin interface (which I try not to do, because it requires Firefox in the old version we're using), you can actually see most of the stats you're interested in. It's possible to scrape that page and get the information, but it's much nicer to get to the source data itself. Fortunately, Zimbra makes that (relatively) easy:

In the Zimbra home directory (/opt/zimbra in my case), there is a "zmstat/" subdirectory, and in there you'll find a BUNCH of directories with dates as names, and some CSV files:

... snip ...
drwxr-x--- 2 zimbra zimbra 4096 2014-09-04 00:00 2014-09-03/
drwxr-x--- 2 zimbra zimbra 4096 2014-09-05 00:00 2014-09-04/
drwxr-x--- 2 zimbra zimbra 4096 2014-09-06 00:00 2014-09-05/
-rw-r----- 1 zimbra zimbra 499471 2014-09-06 20:11 cpu.csv
-rw-r----- 1 zimbra zimbra 63018 2014-09-06 20:11 fd.csv
-rw-r----- 1 zimbra zimbra 726108 2014-09-06 20:12 imap.csv
-rw-r----- 1 zimbra zimbra 142226 2014-09-06 20:11 io.csv
-rw-r----- 1 zimbra zimbra 278966 2014-09-06 20:11 io-x.csv
-rw-r----- 1 zimbra zimbra 406240 2014-09-06 20:12 mailboxd.csv
-rw-r----- 1 zimbra zimbra 72780 2014-09-06 20:12 mtaqueue.csv
-rw-r----- 1 zimbra zimbra 2559697 2014-09-06 20:12 mysql.csv
drwxr-x--- 2 zimbra zimbra 4096 2014-06-15 22:13 pid/
-rw-r----- 1 zimbra zimbra 259389 2014-09-06 20:12 pop3.csv
-rw-r----- 1 zimbra zimbra 893333 2014-09-06 20:12 proc.csv
-rw-r----- 1 zimbra zimbra 291123 2014-09-06 20:12 soap.csv
-rw-r----- 1 zimbra zimbra 64545 2014-09-06 20:12 threads.csv
-rw-r----- 1 zimbra zimbra 691469 2014-09-06 20:11 vm.csv
-rw-r----- 1 zimbra zimbra 105 2014-09-06 19:08 zmstat.out
-rw-r----- 1 zimbra zimbra 151 2014-09-06 06:28 zmstat.out.1.gz
-rw-r----- 1 zimbra zimbra 89 2014-09-04 21:15 zmstat.out.2.gz
-rw-r----- 1 zimbra zimbra 98 2014-09-04 01:41 zmstat.out.3.gz

Each of those CSV files contains the information you want, in one of a couple of formats. Most are really easy.

sudo head mtaqueue.csv
timestamp, KBytes, requests
09/06/2014 00:00:00, 4215, 17
09/06/2014 00:00:30, 4257, 17
09/06/2014 00:01:00, 4254, 17
09/06/2014 00:01:30, 4210, 16
... snip ...

In this case, there are three columns, which include the timestamp, the number of kilobytes in queue, and the number of requests. Most CSV files have (many) more columns, but this works pretty simply. That file is updated every minute, so if you have a cronjob run, grab the last line of that file, parse it, and send it into Graphite, then your work is basically done:

zimbra$ crontab -l
... snip ...
* * * * * /opt/zimbra/zimbra-stats/

And looking at that file, it's super-easy:


import pyGraphite as graphite
import sys
import resource

# Read the whole file; only the most recent (last) sample is needed.
CSV = open('/opt/zimbra/zmstat/mtaqueue.csv', "r")
lineList = CSV.readlines()
CSV.close()
GraphiteString = "MY.GRAPHITE.BASE"

rawLine = lineList[-1]
listVals = rawLine.split(',')

# Column 0 is the timestamp; strip the padding and the trailing newline.
values = {
	'kbytes': listVals[1].strip(),
	'items':  listVals[2].strip(),
}

for value in values:
	graphite.sendData(GraphiteString + "." + value + " ", values[value])


So there you go. My python isn't awesome, but it gets the job done. Any includes not used here are there because some of my other scripts needed them, and by the time I got to this one, I was just copying and pasting my code for the most part. #LazySysAdmin
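Since most of the other CSV files have many more columns, a more generic take on the same idea might be useful. The sketch below assumes the first row of the file is the header (as it is in the mtaqueue.csv sample above) and that the first column is always the timestamp. The function name last_row_metrics and the "zimbra.mtaqueue" base path are made up for the example.

```python
import csv
import io


def last_row_metrics(csv_text, base):
    """Map each column of the final CSV row to a 'base.column' metric name."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if row]
    header, last = rows[0], rows[-1]
    metrics = {}
    # Skip column 0, which holds the timestamp rather than a value.
    for name, value in zip(header[1:], last[1:]):
        metrics[base + "." + name.strip()] = value.strip()
    return metrics


sample = (
    "timestamp, KBytes, requests\n"
    "09/06/2014 00:00:00, 4215, 17\n"
    "09/06/2014 00:00:30, 4257, 17\n"
)
print(last_row_metrics(sample, "zimbra.mtaqueue"))
```

Each key in the returned dictionary can then be handed to whatever is sending data to Graphite.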

The only CSV file that took me a while to figure out was imap.csv. The format of that one is more interesting:

msimmons@zimbra:/opt/zimbra/zmstat$ sudo head imap.csv
09/06/2014 00:00:13,ID,11,0
09/06/2014 00:00:13,FETCH,2,0
09/06/2014 00:00:13,CAPABILITY,19,0

So you get the timestamp, the IMAP command, the number of times that command is being executed, and how long, on average, it took, so you can watch latency. But the trick is that you only get one command per line, so the previous tactic of only grabbing the final line won't work. Instead, you have to grab the last line, figure out the timestamp, and then grab all of the lines that match the timestamp. Also, I've found that not all IMAP commands will show up every time, so make sure that your XFilesFactor is set right for the metrics you'll be dealing with.

The code is only a little more complicated, but still isn't too bad:


import pyGraphite as graphite
import sys
import resource

imapCSV = open('/opt/zimbra/zmstat/imap.csv', "r")
lineList = imapCSV.readlines()
imapCSV.close()
GraphiteString = "MY.GRAPHITE.PATH"

class imapCommand:
	name = ""
	count = ""
	avgres = ""

	def __init__(self, name, count, avgres):
		self.name = name
		self.count = count
		self.avgres = avgres

IMAPcmds = list()

# The last line carries the timestamp of the most recent sample.
datestamp = lineList[-1].split(',')[0]

record = len(lineList)

# Walk backwards from the end, collecting every line with that timestamp,
# and stop at the first line that belongs to an earlier sample.
while record > 0:
	fields = lineList[record-1].split(',')
	if ( fields[0] != datestamp ):
		break
	IMAPcmds.append(imapCommand(fields[1], fields[2], fields[3].strip()))
	record = record - 1


for command in IMAPcmds:
	graphite.sendData(GraphiteString + "." + command.name + ".count ", command.count)
	graphite.sendData(GraphiteString + "." + command.name + ".avgres ", command.avgres)
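
The "grab everything that shares the last timestamp" step is the only tricky part, so here it is pulled out into a function that can be tested without a Zimbra server. This is a sketch of the same idea as the script above, not a replacement for it. The function name latest_imap_samples and the sample rows are invented, shaped like the imap.csv excerpt shown earlier.

```python
def latest_imap_samples(lines):
    """Return (command, count, avg) tuples for rows sharing the final timestamp."""
    rows = [line.strip().split(",") for line in lines if line.strip()]
    if not rows:
        return []
    stamp = rows[-1][0]
    samples = []
    for row in reversed(rows):
        if row[0] != stamp:
            break  # reached the previous sampling interval
        samples.append((row[1], row[2], row[3]))
    return samples


# Made-up rows: two sampling intervals, two commands each.
sample_lines = [
    "09/06/2014 00:00:13,ID,11,0\n",
    "09/06/2014 00:00:13,FETCH,2,0\n",
    "09/06/2014 00:00:43,CAPABILITY,19,0\n",
    "09/06/2014 00:00:43,NOOP,5,1\n",
]
print(latest_imap_samples(sample_lines))
```

Only the two rows from the 00:00:43 interval come back, newest line first.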


You can read much more about all of the metrics in the online documents, Monitoring Zimbra.

Now, so far, this has been the runtime metrics, which is helpful, but doesn't actually give me account information. To get that, we're going to use some of the built-in Zimbra tools. zmaccts lists all accounts, and then prints a summary at the end. We can just grab the summary and learn the number of accounts. We can also use the zmlicense -p command to get the number of licensed accounts we have.

The shell script is pretty easy:

$ cat zimbra-stats/

. /opt/zimbra/zimbra-stats/

OUTPUT="`/opt/zimbra/bin/zmaccts | tail -n 1`"

ACTIVE=`echo $OUTPUT | awk '{print $2}'`
CLOSED=`echo $OUTPUT | awk '{print $3}'`
LOCKED=`echo $OUTPUT | awk '{print $4}'`
MAINT=`echo $OUTPUT | awk '{print $5}'`
TOTAL=`echo $OUTPUT | awk '{print $6}'`
NEVERLOGGEDIN=`/opt/zimbra/bin/zmaccts | grep "never$" | wc -l`

MAX="`/opt/zimbra/bin/zmlicense -p | grep ^AccountsLimit= | cut -d \= -f 2`"


/opt/zimbra/zimbra-stats/ ${STATPATH}.active ${ACTIVE} 
/opt/zimbra/zimbra-stats/ ${STATPATH}.closed ${CLOSED}
/opt/zimbra/zimbra-stats/ ${STATPATH}.locked ${LOCKED} 
/opt/zimbra/zimbra-stats/ ${STATPATH}.maintenance ${MAINT} 
/opt/zimbra/zimbra-stats/ ${STATPATH}.total ${TOTAL} 
/opt/zimbra/zimbra-stats/ ${STATPATH}.neverloggedin ${NEVERLOGGEDIN} 
/opt/zimbra/zimbra-stats/ ${STATPATH}.max ${MAX}  


Forgive all of the shortcuts taken in the above. Things aren't quoted when they should be and so on. Use at your own risk. Warranty void in Canada. Etc etc.
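
If shell quoting makes you nervous, the same parsing could be sketched in Python. This assumes exactly what the shell script above assumes: that the last line of zmaccts output carries the active, closed, locked, maintenance, and total counts in fields 2 through 6, and that zmlicense -p prints a line starting with AccountsLimit=. The function names and the sample values are made up.

```python
def parse_summary(summary_line):
    """Split the final zmaccts line the same way the awk calls do."""
    fields = summary_line.split()
    names = ["active", "closed", "locked", "maintenance", "total"]
    # awk's $2..$6 are fields[1] through fields[5] here.
    return dict(zip(names, (int(field) for field in fields[1:6])))


def parse_accounts_limit(license_text):
    """Find the AccountsLimit=N line in `zmlicense -p` output."""
    for line in license_text.splitlines():
        if line.startswith("AccountsLimit="):
            return int(line.split("=", 1)[1])
    return None


# Invented sample text, only shaped like what the script expects.
counts = parse_summary("example.edu 4000 50 3 2 4055")
counts["max"] = parse_accounts_limit("AccountsLimit=5000\n")
print(counts)
```

In real use the two input strings would come from running the Zimbra commands (for example via subprocess) as the zimbra user.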

Overall, it's nice to get that additional transparency into the mail server. Even after we get the server upgraded and on a modern OS, this kind of information is a welcome addition.

Oh, and for the record?

$ find ./ -name "*wsp" | wc -l

Over 8,500 metrics coming in. Sweet. Most of that is coming from collectd, but that's another blog entry...

Ubuntu and SNMP

Date August 21, 2014

After running Ubuntu for about two years now, I have a laundry list of complaints. Whether Ubuntu is automagically starting daemons that I install, or the relative difficulty of running an internal repo, or (and I'm heartily agreeing with my coworker Nick here) that it doesn't actually include a firewall out of the box....there are very basic issues I have with running and managing Ubuntu machines.

The one that inspired this entry, though, is like, super dumb and annoying.

Suppose I'm trying to do something like snmpwalk on a switch:

$ snmpwalk -v 2c -c public myswitch.mydomain
-bash: /usr/bin/snmpwalk: No such file or directory

Of course, I need snmp. Let's install that:

~$ sudo apt-get install snmp
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
libperl5.18 libsensors4 libsnmp-base libsnmp30
Suggested packages:
lm-sensors snmp-mibs-downloader
The following NEW packages will be installed:
libperl5.18 libsensors4 libsnmp-base libsnmp30 snmp
0 upgraded, 5 newly installed, 0 to remove and 36 not upgraded.
Need to get 1,168 kB of archives.
After this operation, 4,674 kB of additional disk space will be used.

and try it again:

$ snmpwalk -v 2c -c public myswitch.mydomain
iso. = STRING: "Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 6.0(2)N2(3), RELEASE SOFTWARE Copyright (c) 2002-2012 by Cisco Systems, Inc. Device Manager Version 6.2(1), Compiled 12/17/2013 2:00:00"
iso. = OID: iso.
iso. = Timeticks: (575495258) 66 days, 14:35:52.58
iso. = STRING: "me@here"
iso. = STRING: "myswitch"
iso. = STRING: "snmplocation"
iso. = INTEGER: 70
iso. = Timeticks: (4294966977) 497 days, 2:27:49.77
iso. = OID: iso.
iso. = OID: iso.
iso. = OID: iso.

Well, we're close, but all we have are a bunch of OIDs. I'd really like names. If you've read my introduction to SNMP, you know that it's not loading the MIBs. Weird. On RHEL/CentOS, that's kind of automatic. Maybe there's another package?

Well, that snmp-mibs-downloader that was listed as a suggested package above sounds pretty promising. Let's install that.

$ sudo apt-get install snmp-mibs-downloader
...snip lots of installing MIBS...

So basically, 300+ MIBs were just installed into /var/lib/mibs/ - this is awesome. Let's run that command again:

$ snmpwalk -v 2c -c public myswitch.mydomain
iso. = STRING: "Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 6.0(2)N2(3), RELEASE SOFTWARE Copyright (c) 2002-2012 by Cisco Systems, Inc. Device Manager Version 6.2(1), Compiled 12/17/2013 2:00:00"
iso. = OID: iso.
iso. = Timeticks: (575577418) 66 days, 14:49:34.18
iso. = STRING: "me@here"
iso. = STRING: "myswitch"
iso. = STRING: "snmplocation"
iso. = INTEGER: 70
iso. = Timeticks: (4294966977) 497 days, 2:27:49.77
iso. = OID: iso.
iso. = OID: iso.
iso. = OID: iso.

That's strange. As it turns out, though, Ubuntu has yet another trick up its sleeve to screw with you. Check out /etc/snmp/snmp.conf:

msimmons@nagios:/var/log$ cat /etc/snmp/snmp.conf
# As the snmp packages come without MIB files due to license reasons, loading
# of MIBs is disabled by default. If you added the MIBs you can reenable
# loaging them by commenting out the following line.
mibs :

This file's entire purpose in life is to stop you from having MIBs out of the box.

Obviously, you can comment out that line and then things work:

$ snmpwalk -v 2c -c public myswitch.mydomain
SNMPv2-MIB::sysDescr.0 = STRING: Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 6.0(2)N2(3), RELEASE SOFTWARE Copyright (c) 2002-2012 by Cisco Systems, Inc. Device Manager Version 6.2(1), Compiled 12/17/2013 2:00:00
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (575602144) 66 days, 14:53:41.44
SNMPv2-MIB::sysContact.0 = STRING: me@here
SNMPv2-MIB::sysName.0 = STRING: myswitch
SNMPv2-MIB::sysLocation.0 = STRING: snmplocation
SNMPv2-MIB::sysServices.0 = INTEGER: 70
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (4294966977) 497 days, 2:27:49.77
SNMPv2-MIB::sysORID.1 = OID: SNMPv2-MIB::snmpMIB
SNMPv2-MIB::sysORID.3 = OID: SNMP-FRAMEWORK-MIB::snmpFrameworkMIBCompliance
SNMPv2-MIB::sysORID.4 = OID: SNMP-MPD-MIB::snmpMPDCompliance
SNMPv2-MIB::sysORDescr.1 = STRING: The MIB module for SNMPv2 entities
SNMPv2-MIB::sysORDescr.2 = STRING: View-based Access Control Model for SNMP.
SNMPv2-MIB::sysORDescr.3 = STRING: The SNMP Management Architecture MIB.
SNMPv2-MIB::sysORDescr.4 = STRING: The MIB for Message Processing and Dispatching.
SNMPv2-MIB::sysORDescr.5 = STRING: The management information definitions for the SNMP User-based Security Model.
SNMPv2-MIB::sysORUpTime.1 = Timeticks: (4294966977) 497 days, 2:27:49.77
SNMPv2-MIB::sysORUpTime.2 = Timeticks: (4294966977) 497 days, 2:27:49.77

But...if you're not actually running an snmp server, and you just want to use snmp for querying...if you get rid of that file entirely, it ALSO fixes the problem.
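
If you'd rather script the "comment out that line" fix across a bunch of machines, here's a rough sketch. It only transforms text; reading and writing /etc/snmp/snmp.conf (as root) is left to you. The function name enable_mib_loading is invented for the example.

```python
def enable_mib_loading(conf_text):
    """Comment out any active 'mibs :' line so the MIBs get loaded again."""
    fixed = []
    for line in conf_text.splitlines():
        if line.split() == ["mibs", ":"]:
            fixed.append("#" + line)  # disable the directive, keep it visible
        else:
            fixed.append(line)  # comments and other settings pass through
    return "\n".join(fixed) + "\n"


sample_conf = "# loading of MIBs is disabled by default.\nmibs :\n"
print(enable_mib_loading(sample_conf), end="")
```

Lines that are already commented out are left alone, so running it twice doesn't stack up extra # characters.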

Anyway, just another annoying thing I've found that I thought I'd share.

Adam Moskowitz's video: The Future of System Administration

Date August 6, 2014

My friend Adam Moskowitz presented a topic at LOPSA-East this past year that is near and dear to my heart - the Future of System Administration.

I've written a blog entry with something very close to that title twice:

A while back, I started to realize something...and I haven't written about this before, but I'm firmly coming to see that a system administrator is not just something someone is, system administration is something someone does. Regardless of whether someone's title is SysAdmin, IT Operations, Systems Engineering, or Sparkly DevOps Prince(ess), you might not be a system administrator, but you are doing systems administration.

And in the future, a lot of the people who are going to be doing systems administration are developers.

<voice="Levar Burton">But you don't have to take my word for it...</voice>

The Future of System Administration from Adam Moskowitz on Vimeo.

DRACs and Macs and Java Hacks

Date August 5, 2014

I'm inherently lazy. Like, "I want a bottle of water, but it's all the way over there, so I guess I'll just dehydrate" style lazy.

Our server room is maybe, MAYBE, 50 feet from my desk. But I've got to go around a wall and unlock two doors, disable an alarm system, fight a dragon, and then find the machine I want to administer. Ain't nobody got time for that.

To make sure that I can maintain my standards of laziness, we have remote management available for all of our servers. Since we're a Dell shop, that consists of DRAC, the Dell Remote Access Controller (with the Enterprise license for remote console).

In the past, these have been of questionable quality and perhaps I have, on occasion, called into question the marital status of the DRAC's mother, but we seem to have settled into an uneasy truce where I try to do things, and it continually tells me no until I ask nicely enough.

The first case in point is the DRAC's remote console and the security surrounding it. If you ever take a look at the jnlp file, you'll see that the application pulls down a platform-specific jar file that is the actual KVM application.

When you run jarsigner -verbose -certs -verify on the jar file, here's what you get:

$ jarsigner -verbose -certs -verify ./avctVMLinux32.jar

135 Thu Sep 08 13:44:00 EDT 2011 META-INF/MANIFEST.MF
259 Thu Sep 08 13:44:00 EDT 2011 META-INF/DELL.SF
5406 Thu Sep 08 13:44:00 EDT 2011 META-INF/DELL.RSA
0 Thu Sep 08 13:42:48 EDT 2011 META-INF/
sm 160904 Thu Dec 11 19:54:40 EST 2008

[entry was signed on 9/8/11 9:44 AM]
X.509, CN=Dell Inc., OU=PG Software Development, OU=Digital ID Class 3 - Java Object Signing, O=Dell Inc., L=Round Rock, ST=Texas, C=US
[certificate expired on 8/30/13 7:59 PM]

X.509, CN=VeriSign Class 3 Code Signing 2009-2 CA, OU=Terms of use at (c)09, OU=VeriSign Trust Network, O="VeriSign, Inc.", C=US
[certificate is valid from 5/20/09 8:00 PM to 5/20/19 7:59 PM]
[KeyUsage extension does not support code signing]
X.509, OU=Class 3 Public Primary Certification Authority, O="VeriSign, Inc.", C=US
[certificate is valid from 1/28/96 7:00 PM to 8/1/28 7:59 PM]

s = signature was verified
m = entry is listed in manifest
k = at least one certificate was found in keystore
i = at least one certificate was found in identity scope

jar verified.

This jar contains entries whose signer certificate has expired.

Emphasis mine.

As you can see via the bolded line, the certificate used to sign this jar file expired in August of 2013. This creates an awkward situation, because Java 1.7 got all gung-ho about security and was like, "sorry dude, no can do. If you want to run this, you've got to turn your security into swiss cheese. Vaya con Dios". That might be a loose quote.
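
If you have a pile of jar files to check, spotting that line by eye gets old. Here's a small sketch that picks the expiry notices out of jarsigner's text output. It assumes the notice looks like the "[certificate expired on ...]" line in the output above, which may well differ between JDK versions. The function name expired_certificates is made up.

```python
import re

# Matches notices like: [certificate expired on 8/30/13 7:59 PM]
EXPIRED = re.compile(r"\[certificate expired on (.+?)\]")


def expired_certificates(jarsigner_output):
    """Return the expiry timestamps mentioned in jarsigner's verbose output."""
    return EXPIRED.findall(jarsigner_output)


sample_output = (
    "[entry was signed on 9/8/11 9:44 AM]\n"
    "[certificate expired on 8/30/13 7:59 PM]\n"
    "[certificate is valid from 5/20/09 8:00 PM to 5/20/19 7:59 PM]\n"
)
print(expired_certificates(sample_output))
```

An empty list means no expired-certificate notice was found in the text you fed it.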

Of course, that's assuming you can actually run the thing. If you're on a Mac, your computer is conspiring against you. If you run Chrome, for reasons that have never been apparent to me, the jnlp file doesn't actually get downloaded as viewer.jnlp. It gets downloaded as viewer.jnlp(servername@0@idrac-6C61TS1,+PowerEdge+R410,+User-USERNAME@RANDOMNUMBERS).YOURTLDHERE@0@idrac-6C61TS1,+PowerEdge+R410,+User-USERNAME@MORERANDOMNUMBERS), and at least on a Mac, the OS is like, "Sorry, I've got no idea. Would you like to open this with like, text edit, or maybe XMLedit? Because I can't do anything with this".

So once you rename it to .jnlp (complete with scary "changing the file extension may lead to wormholes destroying the universe" warnings), then you double click the file and your computer goes full-on HAL9000. "I'm sorry, Matt, I'm afraid I can't let you do that".


The workaround is, of course, to right click the file then select Open. This lets the computer know that you're totally serial about opening the malware.

Once you do that (then click Open again), Java actually pulls down the jar file and runs it. If you haven't replaced the self-signed DRAC certificate (you lout), then you get an SSL warning when it pulls the jar file, of course. But then you have the jar file, and it runs it, and immediately pukes because of the aforementioned signing issue.

The way to trick it into running is to go into your System Preferences, launch the Java Control Panel, go to Security, say "I solemnly swear that I am up to no good", and set your security slider to "High" or, if you live dangerously, "Medium". I'd say "Low", but much like the pizza place by my house, there is no "Small", only "Medium", "Large", and "Extra Large". You know what? That makes Medium the Small, you morons! Gah!

Anyway, where were we? Right, change your security level to High or Medium, then in the Exception Site List section, click the Edit Site List, then add the https:// url to your DRAC. Yes, all of them. No, you can't use wildcards. Awesome, right?

Anyway, add your DRACs to the list (or just edit the list once and distribute it) and then you'll be able to connect.
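
For the "edit the list once and distribute it" route, a sketch like this could generate the file contents from a list of DRAC host names. It assumes the exception site list is a plain text file with one URL per line, which as far as I know on a Mac lives at ~/Library/Application Support/Oracle/Java/Deployment/exception.sites, so double-check that against your Java version. The function name and host names are invented.

```python
def exception_site_list(hosts):
    """Build exception site list contents: one https:// URL per line."""
    urls = []
    for host in hosts:
        url = "https://" + host.strip()
        if url not in urls:  # skip duplicates, keep the original order
            urls.append(url)
    return "\n".join(urls) + "\n"


# Hypothetical DRAC host names.
dracs = ["drac-web01.example.edu", "drac-db01.example.edu", "drac-web01.example.edu"]
print(exception_site_list(dracs), end="")
```

Writing the result to the file and pushing it out to other machines is left to whatever you already use for that.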

Of course...if you're trying to manage ESXi machines like I am, and you're logging in to the console to, say, reset the management agents, you're going to run into the fact that VMware's ESXi console makes extensive use of the F11 key, something Macs have claimed as their own, as pressing F11 launches Expose. So basically, by default, you can't possibly press F11 in the virtual terminal because it makes all of your windows scatter like cockroaches.

You can also fix that by going into the control panel, clicking on Mission Control, and changing the "Show Desktop" shortcut to something that your enterprise consoles haven't used for critical functionality.

So there. If you use a Mac to manage things, hopefully you can at least remove those particular hurdles. What other things do you run into with Macs (or just stupid platforms in general)? Comment and let me know!

Xenophobia Revisited

Date July 30, 2014

Way back in 2010, I wrote an article called Xenophobia and Elitism in the Community. The article talked mostly about xenophobia and elitism in the community through a technical lens - about the tendencies that we all sometimes get toward mocking some technology or technique, and treating people who practice using it as less intelligent than ourselves, or less worthy than ourselves. It's not a bad read. You might want to check it out.

A recent post on Reddit, titled I've gone off the deep end, brought that same feeling of revulsion back to me in full force. It's rare to see a display of open racism so blatant - regardless of what the original poster calls it. Here's the text, and you should be forewarned - this will probably upset you if you're easily upset. It pissed me off, for what that's worth:

Did any of you see the video on the front page of reddit a few days ago of a saudi guy beating the shit out of an Indian with a belt? It was pretty horrible.

However it made me laugh, and for that I'm ashamed. Working in IT I've developed a deep hatred for Indians, and I'm not even a bad or racist person. I'm just so sick of them as a group thinking they know so much more than everyone else when they often know nothing, and that damn accent, and the weird sense of superiority.

My initial thought was "I wish I could do that to the useless indian IT guys at my company" when I saw that saudi kid swinging the belt and beating him senseless.
I never used to be like that.

"I'm not even a bad or racist person"

I don't even really know how to properly respond to that, when it's immediately followed by "I'm just so sick of them as a group thinking they know so much more than everyone else when they often know nothing, and that damn accent, and the weird sense of superiority".

I'm not going to say, "this should make you mad", but if it doesn't, you might want to think about why it doesn't, and maybe reconsider some things.

What is just as depressing to me is the sheer number of comments of supporters in that thread. People who take the side of the person who wrote that he wishes he could take a belt and beat people in his company senseless.

Look, people can be pretty horrible. People can be monsters, and people can be awful to their fellow people, but I can't sit by and not comment on this blatant ...racist...xenophobic....asshole behavior. These aren't the words of someone who is frustrated. These are the words of someone who really needs to step back and understand that he is everything he claims to not be. He's racist and I don't know whether he's a bad person, but he definitely has some anger issues he should deal with.

Racism is real, and it's more than just "That person is Indian so I'm biased", it's deeply cultural, and it might be deeply biological, but regardless of where the xenophobic, racist distrust originates, we need to see it and we need to understand that it is a bias, and to compensate for it.

The temple of Apollo at Delphi, in ancient Greece, had a stone carved with the words, "γνῶθι σεαυτόν". We're more familiar with the Latin translation, "temet nosce". It means "know thyself", or literally, "get to know yourself", and that's the advice I would give to everyone, because it's something that I struggle with, too, but I work toward because it's important. It might be the most important thing that any of us can do.

We each see the world through a series of lenses handed to us by our parents, our teachers, our religious leaders, and others who we encounter through life. What we end up with is a very customized version of the world that is unique to each individual, and your experiences on this Earth are very different than mine, so we see different things. If we're ever to truly understand each other, and work together effectively, I need to understand that my vision is impacted by my experiences, and I need to account for that, and you need to do the same. I can't do that unless I frankly consider how I look at things, and why, and neither can you.

If you talk with another admin or a support person, and you hear the words, "Do the needful", or "please advise", and you feel something negative, ask yourself why that is. Is a negative reaction helping your predicament? Are you better off for automatically having that kind of response? If not, work toward correcting for the response, however you can make that happen. Maybe you need to identify where in your life you got that reaction and isolate what makes you respond that way, so that you can separate that from your current experience. Or maybe you should just use the phrase occasionally yourself. Taken literally, there's nothing wrong with the phrase, assuming both sides of the conversation know what needs to be done.

In the end, I would just encourage you to 'temet nosce', and to work toward a better version of you in the future. I'm trying, and I know that it's an uphill battle, but I think it's worth fighting for. And hopefully you do, too.

Kerbal Space System Administration

Date July 28, 2014

I came to an interesting cross-pollination of ideas yesterday while talking to my wife about what I'd been doing lately, and I thought you might find it interesting too.

I've been spending some time lately playing video games. In particular, I'm especially fond of Kerbal Space Program, a space simulation game where you play the role of spaceflight director of Kerbin, a planet populated by small, green, mostly dumb (but courageous) people known as Kerbals.

Initially the game was a pure sandbox, as in, "You're in a planetary system. Here are some parts. Go knock yourself out", but recent additions to the game include a career mode in which you explore the star system and collect "science" points for doing sensor scans, taking surface samples, and so on. It adds a nice "reason" to go do things, and I've been working on building out more efficient ways to collect science and get it back to Kerbin.

Part of the problem is that when you use your sensors, whether they detect gravity, temperature, or materials science, you often lose a large percentage of the data when you transmit it back, rather than deliver it in ships - and delivering things in ships is expensive.

There is an advanced science lab called the MPL-LG-2 which allows greater fidelity in transmitted data, so my recent work in the game has been to build science ships which consist of a "mothership" with a lab, and a smaller lightweight lander craft which can go around whatever body I'm orbiting and collect data to bring to the mothership. It's working pretty well.

At the same time, I'm working on building out a collectd infrastructure that can talk to my graphite installation. It's not as easy as I'd like because we're standardized on Ubuntu Precise, which only has collectd 4.x, and the write_graphite plugin began with collectd 5.1.

To give you background, collectd is a daemon that runs and collects information, usually from the local machine, but there is an array of plugins to collect data from any number of local or remote sources. You configure collectd to collect data, and you use a write_* plugin to get that data to somewhere that can do something with it.

It was in the middle of explaining these two things - KSP's science missions and collectd - that I saw the amusing parity between them. In essence, I'm deploying science ships around my infrastructure to make it easier to get science back to my central repository so that I advance my own technology. I really like how analogous they are.

I talked about doing the collectd work on twitter, and Martijn Heemels expressed interest in what I was doing, since he would also like write_graphite on Precise, so I figured that other people probably might want to get in on the action, so to speak. I could give you the package I made, or I could show you how I made it. That sounds more fun.

Like all good things, this project involves software from Jordan Sissel - namely fpm, effing package management. Ever had to make packages and deal with spec files, control files, or esoteric rulesets that made you go into therapy? Not anymore!

So first we need to install it, which is easy, because it's a gem:

$ sudo gem install fpm

Now, let's make a place to stage files before they get packaged:

$ mkdir ~/collectd-package

And grab the source tarball and untar it:

$ wget
$ tar zxvf collectd-5.4.1.tar.gz
$ cd collectd-5.4.1/

(if you're reading this, make sure to go to the collectd download page and get the new one, not the version I have listed here.)

Configure the Makefile, just like you did when you were a kid:

$ ./configure --enable-debug --enable-static --with-perl-bindings="PREFIX=/usr"

Hat tip to Mike Julian who let me know that you can't actually enable debugging in the collectd tool unless you actually use the flag here, so save yourself some heartbreak by turning that on. Also, if I'm going to be shipping this around, I want to make sure that it's compiled statically, and for whatever reason, I found that the perl bindings were sad unless I added that flag.

Now we compile:

$ make

Now we "install":

make DESTDIR="/home/YOURUSERNAME/collectd-package" install

I've found that the install script is very grumpy about relative directory names, so I appeased it by giving it the full path to where the things would be dropped (the directory we created earlier).

We're going to be using a slightly customized init script. I took this from the version that comes with the precise 4.x collectd installation and added a prefix variable that can be changed. We didn't change the installation directories above, so by default, everything is going to eventually wind up in /opt/collectd/ and the init script needs to know about that:

$ cd ~
$ mkdir -p collectd-package/etc/init.d/
$ wget --no-check-certificate -O collectd-package/etc/init.d/collectd
$ chmod +x collectd-package/etc/init.d/collectd

This is pulling in the file from this gist.

Now, we're finally ready to create the package:

fakeroot fpm -t deb -C collectd-package/ --name collectd \
--version 5.4.1 --iteration 1 --depends libltdl7 -s dir opt/ usr/ etc/

Since you may not be familiar with fpm, some of the options are obvious, but for the ones that aren't: -C changes directory to the given argument, and --version is the version of the software, as opposed to --iteration, which is the version of the package. If you package this, deploy it, then find a bug in the packaging, when you package it again after fixing the problem, you increment the iteration flag, and your package management can treat it as an upgrade. The --depends is a library that collectd needs on the end systems. -s sets the source type to "directory", and then we give it a list of directories to include (remembering that we've changed directories with the -C flag).

Also, this was my first foray into the world of fakeroot, which you should probably read about if you run Debian-based systems.

At this point, in the current directory, there should be "collectd_5.4.1-1.deb", a package file that works for installing using 'dpkg -i' or in a PPA or in a repo, if you have one of those.
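
If you end up rebuilding the package a few times, it can help to wrap the command so that bumping the iteration is a one-argument change. This sketch only assembles the same fpm command line used above. The function name fpm_command is made up, and nothing runs until you hand the list to something like subprocess.check_call.

```python
def fpm_command(version, iteration, stage_dir="collectd-package/"):
    """Assemble the fpm invocation from above as an argument list."""
    return [
        "fakeroot", "fpm",
        "-t", "deb",                    # target package type
        "-C", stage_dir,                # change into the staging directory
        "--name", "collectd",
        "--version", version,           # version of the software itself
        "--iteration", str(iteration),  # version of the packaging
        "--depends", "libltdl7",
        "-s", "dir",                    # source type: plain directories
        "opt/", "usr/", "etc/",
    ]


# Same software version, second attempt at packaging it:
print(" ".join(fpm_command("5.4.1", 2)))
```

Building the command as a list rather than one long string avoids shell quoting surprises when it is eventually executed.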

Once collectd is installed, you'll probably want to configure it to talk to your graphite host. Just edit the config in /opt/collectd/etc/collectd.conf. Make sure to uncomment the write_graphite plugin line, and change the write_graphite section. Here's mine:

    Port "2003"
    Protocol "tcp"
    LogSendErrors true
    # remember the trailing period in prefix
    #    otherwise you get
    #    You'll probably want to change it anyway, because 
    #    this one is mine. ;-) 
    Prefix ""
    StoreRates true
    AlwaysAppendDS false
    EscapeCharacter "_"
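
To see why the trailing period in the prefix matters, here's a simplified sketch of how write_graphite assembles a metric name. It assumes the name is the prefix, then the host name with its dots replaced by the escape character, then the plugin and type. It ignores plugin instances, type instances, and data source names, so treat it as an illustration rather than the plugin's exact logic. The prefix and host name are made up.

```python
def graphite_name(prefix, host, plugin, type_name, escape="_", postfix=""):
    """Simplified model of a write_graphite metric name."""
    # Dots in the host name would create extra path levels, so they get
    # replaced by the escape character.
    safe_host = host.replace(".", escape)
    return prefix + safe_host + postfix + "." + plugin + "." + type_name


# With the trailing period, the prefix is its own level in the tree:
print(graphite_name("servers.", "web01.example.edu", "load", "load"))
# Without it, the prefix gets glued onto the host name:
print(graphite_name("servers", "web01.example.edu", "load", "load"))
```

The first call gives servers.web01_example_edu.load.load, while the second gives serversweb01_example_edu.load.load, which is almost certainly not what you wanted.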

Anyway, hopefully this helped you in some way. Building a puppet module is left as an exercise to the reader. I think I could do a simplistic one in about 5 minutes, but as soon as you want to intelligently decide which modules to enable and configure, then it gets significantly harder. Hey, knock yourself out! (and let me know if you come up with anything cool!)