Nagios Config Howto Followup

Date September 16, 2014

One of the most widely read stories I've ever posted on this blog is my Nagios Configuration HowTo, where I explained how I set up my Nagios config at a former employer. I still think that it's a good layout to use if you're manually building Nagios configs. In my current position, we have a small manual setup and a humongous automated monitoring setup. We're moving toward a completely automated monitoring config using Puppet and Nagios, but until everything is puppetized, some of it needs to be hand-crafted, bespoke monitoring.

For people who don't have a single 'source of truth' in their infrastructure that they can draw monitoring config from, hand-crafting is still the way to go, and if you're going to do it, you might as well not drive yourself insane. For that, you need to take advantage of the layers of abstraction in Nagios and the built-in object inheritance that it offers.

Every once in a while, new content gets posted that refers back to my Config HowTo, and I get a bump in visits, which is cool. Occasionally, I'll get someone who is interested and asks questions, which is what happened in this thread on Reddit. /u/sfrazer pointed to my config as something that he references when making Nagios configs (Thanks!), and the original submitter replied:

I've read that write up a couple of times. My configuration of Nagios doesn't have an objects, this is what it looks like


And to understand what you are saying, just by putting them in the file structure you have in your HowTo that will create an inheritance?

I wanted to help him understand how Nagios inheritance works, so I wrote a relatively long response, and I thought that it might also help other people who still need to do this kind of thing:

No, the directories are just to help remember what is what, and so you don't have a single directory with hundreds of files.
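
For illustration, the layout used in this post looks something like this (a sketch - the exact names are up to you, since Nagios only cares about the cfg_file/cfg_dir entries in nagios.cfg, not the directory structure itself):

/usr/local/nagios/etc/objects/
|-- generic-host.cfg
|-- check-ssh.cfg
|-- services/
|   `-- generic-service.cfg
`-- linux/
    |-- generic-linux.cfg
    |-- linux-ssh.cfg
    `-- mylinuxserver.mycompany.com.cfg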

What creates the inheritance is this:

You start out with a host template:

msimmons@nagios:/usr/local/nagios/etc/objects$ cat generic-host.cfg
define host {
    name generic-host
    notifications_enabled   1
    event_handler_enabled   1
    flap_detection_enabled  1
    failure_prediction_enabled  1
    process_perf_data   1
    retain_status_information   1
    retain_nonstatus_information    1
    max_check_attempts 3
    notification_period 24x7
    contact_groups systems
    check_command check-host-alive
    register 0
}
# EOF

So, what you can see there is that I have a host named "generic-host" with a bunch of settings, and "register 0". The reason I have this is that I don't want to have to set all of those settings for every other host I make. That's WAY too much redundancy. Those settings will almost never change (and if we do have a specific host that needs to have the setting changed, we can do it on that host).

Once we have generic-host, let's make a 'generic-linux' host that the Linux machines can use:

msimmons@monitoring:/usr/local/nagios/etc/objects/linux$ cat generic-linux.cfg 
define host { 
    name     linux-server
    use generic-host
    check_period    24x7
    check_interval  5
    retry_interval  1
    max_check_attempts  5
    check_command   check-host-alive
    notification_interval 1440
    contact_groups  systems
    hostgroups linux-servers
    register 0
}

define hostgroup {
    hostgroup_name linux-servers
    alias Linux Servers
}
# EOF

Alright, so you see we have two things there. A host, named 'linux-server', and you can see that it inherits from 'generic-host'. I then set some of the settings specific to the monitoring host that I'm using (for instance, you probably don't want notification_interval 1440, because that's WAY too long for most people - a whole day would pass between Nagios notifications!). The point is that I set a bunch of default host settings in 'generic-host', then did more specific things in 'linux-server', which inherited the settings from 'generic-host'. And we made it 'register 0', which means it's not a "real" host; it's a template. Also, and this is important, you'll see that we set 'hostgroups linux-servers'. This means that any host we make that inherits from 'linux-server' will automatically be added to the 'linux-servers' hostgroup.

Right below that, we create the linux-servers hostgroup. We aren't listing any machines. We're creating an empty hostgroup because, remember, every host that inherits from 'linux-server' will automatically become a member of this group.

Alright, you'll notice that we don't have any "real" hosts yet. We're not going to add any yet, either. Let's do some services first.

msimmons@monitoring:/usr/local/nagios/etc/objects$ cat check-ssh.cfg
define command{
   command_name   check_ssh
   command_line   $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
   }
# EOF

This is a short file which creates a command called "check_ssh". This isn't specific to Linux or anything else; it could be used by anything that needs to verify that SSH is running. Now, let's build a service that uses it:

msimmons@monitoring:/usr/local/nagios/etc/objects/services$ cat generic-service.cfg 
define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            2
        contact_groups                  systems
        notification_options            w,u,c,r
        notification_interval           1440
        notification_period             24x7
        register                        0
}
# EOF

This is just a generic service template with sane settings for my environment. Again, you'll want to use something good for yours. Now, something that will inherit from generic-service:

msimmons@monitoring:/usr/local/nagios/etc/objects/linux$ cat linux-ssh.cfg
define service { 
    use generic-service
    service_description Linux SSH Enabled
    hostgroup_name linux-servers
    check_command check_ssh 
}
# EOF

Now we have a service "Linux SSH Enabled". This uses check_ssh, and (importantly), 'hostgroup_name linux-servers' means "Every machine that is a member of the hostgroup 'linux-servers' automatically gets this service check".

Let's do the same thing with ping:

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

define service {
    use generic-service
    service_description Linux Ping
    hostgroup_name linux-servers
    check_command check_ping!3000.0,80%!5000.0,100%
}

Sweet. (If you're wondering about the exclamation marks on the check_ping line in the Linux Ping service: we're passing those arguments to the command, which, as you can see in the command definition, uses them to set the warning and critical thresholds.)

Now, let's add our first host:

msimmons@monitoring:/usr/local/nagios/etc/objects/linux$ cat mylinuxserver.mycompany.com.cfg 
define host{
       use linux-server
       host_name myLinuxServer.mycompany.com
       address my.ip.address.here
}

That's it! I set the host name, I set the IP address, and I say "use linux-server" so that it automatically gets all of the "linux-server" settings, including belonging to the linux host group, which makes sure that it automatically gets assigned all of the Linux service checks. Ta-Da!
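
Once everything is saved, it's worth letting Nagios check your work before you reload it. A quick sanity check, assuming a source install laid out like the prompts above:

msimmons@monitoring:~$ sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

The -v flag parses and verifies the entire configuration, so typos, missing templates, and broken hostgroup references show up here instead of when you restart the service.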

Hopefully this can help people see the value in arranging configs like this. If you have any questions, please let me know via comments. I'll be happy to explain! Thanks!


Mount NFS share to multiple hosts in vSphere 5.5

Date September 14, 2014

One of the annoying parts of making sure that you can successfully migrate virtual resources across a vSphere datacenter is ensuring that networks and datastores are not only available everywhere, but also named identically.

I inherited a system that was pretty much manually administered, without scripts. I've built a small PowerShell script to make sure that vSwitches can be provisioned identically when spinning up a new vHost, and there's really no excuse for not doing it for storage, except that not all of my hosts should have all of the same NFS datastores that another host has. I could do some kind of complicated menu system or long command-line options, but that's hardly better than doing it individually.
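
(For what it's worth, scripting the mount itself is only a few lines of PowerCLI. A sketch, with hypothetical host and filer names - adjust to taste:)

# Mount one NFS datastore on several hosts at once
$vmhosts = Get-VMHost esx01.mycompany.com, esx02.mycompany.com
foreach ($h in $vmhosts) {
    New-Datastore -Nfs -VMHost $h -Name "my-nfs-datastore" -NfsHost filer.mycompany.com -Path "/vol/my_volume"
}

The hard part, as mentioned, is deciding which hosts should get which datastores, not the mounting itself.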

Tonight, I learned about a really nice feature of the vSphere 5.5 web interface (which I am generally not fond of) - the ability to take a specific datastore and mount it on multiple hosts. (Thanks to Genroo on #vmware on Freenode for letting me know that it was a thing).


Log into the vSphere web interface and select "vCenter".

Select "Datastores".

Right-click the datastore you want to mount on other hosts, select "All vCenter Actions", then "Mount Datastore to Additional Host".

Pick the hosts you want to mount the datastore to.

The mount attempt will show up in the task list. Make sure that the hosts you select have access to NFS mount the datastore, otherwise it will fail. (The failed task marked with an X was my earlier attempt to rename a datastore after I created it using the wrong network for the filer. I'll clean that up shortly.)

Anyway, hopefully this helps someone else in the future.

Impossible problems are the best

Date September 11, 2014

"I can't believe that!" said Alice.
"Can't you?" the Queen said in a pitying tone. "Try again: draw a long breath, and shut your eyes."
Alice laughed. "There's no use trying," she said: "one can't believe impossible things."
"I daresay you haven't had much practice," said the Queen. "When I was your age, I always did it for half-an-hour a day. Why, sometimes I've believed as many as six impossible things before breakfast."

Impossible problems are fun. It's nice to occasionally encounter the kind of thing that requires a suspension of disbelief to deal with. I had a user give me one of these this morning, and I really enjoyed the mental cartwheels.

Imagine for a second, that we have the following situation:

jdoe@login:/course/cs101$ ls -al
total 52
drwxrwsr-x   4 msimmons cs101-staff  4096 Sep 11 09:51 ./
drwxr-sr-x 309 root     course      28672 Sep 11 09:59 ../
drwxrws---   7 msimmons cs101-staff  4096 Sep 19  2013 svnrepos/
drwxrwsr-x   4 msimmons cs101-staff  8192 Sep  9 22:21 .www/
jdoe@login:/course/cs101$

The issue reported was that the user, here being played by 'Jane Doe' (username jdoe), cannot checkout from the svn repository in svnrepos there. She is a member of the group cs101-staff, as indicated by getent group cs101-staff, by running groups as her user on the machine, and by the ypgroup jdoe command on the NetApp. However, when trying to checkout the repository, she gets a permission denied error on a file in svnrepos/, and initial investigations show this:

jdoe@login:/course/cs101$ cd svnrepos
-bash: cd: svnrepos: Permission denied

You'll notice that the x in the group permissions is set to 's', which indicates that the setgid bit is set. This is a red herring; the problem happened regardless of any fiddling with that bit.

I'm not going to walk you through the hour of debugging that my coworkers and I performed, but I'm willing to bet you would have done something similar, if not the same. Clearly, something was wrong. A user who should have been able to change directory was not able to. This is not new software - there's no bug in 'cd'. We were dealing with the most ancient part of the stack, and as it turns out, that had something to do with it.

The key was discovered (and initially overlooked) while verifying that the user was a member of the group:


jdoe@login:/course/cs101$ groups
faculty 101prof cs101f14-prof cs201sp14-prof cs301su14 2101ta cs301f14-staff cs101sp14-prof cs101sp14-ta 101staff cs121f14 cs101f14-ta cs101sp14-staff cs121sp14 cs301f14-ta cs201sp14-staff cs221f14-ta cs101-staff

If you're going through all of those groups thinking there are too many similarities, that's also a red herring (and although the course numbers have been changed, they really were this similar - the proper group IS in there, though).

If you're thinking, "wow, that's a lot of groups", then you're right. That IS a lot of groups. 18 of them, but Linux has no problem with up to 32 groups.

So what is the problem? Well, it's the group count. Even though it's OK for Linux, you might have caught that I said the NetApp 'ypgroup' command showed her as a group member. That's because she absolutely is! When you query NIS (which is what the YP - formerly 'yellow pages' - in ypgroup means), NIS says "yes, she is a member of that group". However, all is not peachy, and in this well-written blog entry from 2005(!), Mike Eisler explains why NFS still so often has a 16-group limit for users.
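
If you want to check whether a user is flirting with that limit, counting their groups is a one-liner (a quick sketch; jdoe is our example user):

$ id -G jdoe | wc -w
18

Anything over 16 means that AUTH_SYS-based NFS mounts may silently ignore some of the user's group memberships, which is exactly what we were seeing.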

Removing Jane Doe from a few groups made her instantly able to change directory into the subversion repo, and she was immediately able to complete her svn checkout.

That was a really fun little excursion into the realm of impossibility. Keep that one in mind if you're at the sort of place where group memberships tend to accumulate and you use NFS.

Monitoring (old) Zimbra

Date September 6, 2014

It's September, and in universities, that means tons of new people. New staff, new faculty, new students. Lots and lots of new people.

Here at The College of Computer and Information Science at Northeastern University, we've got a banner crop of incoming CS students. So many, in fact, that we bumped up against one of those things that we don't think about a lot. Email licenses.

Every year, we pay for a lot of licenses. We've never monitored the number used vs. the number bought, but we buy many thousands of seats. Well, we ran out last week. Oops.

After calling our reseller, who hooked us up with a temporary emergency bump, we made it through the day until we could buy more. I decided that it was time to start monitoring that sort of thing, so I started working on learning the Zimbra back-end.

Before you follow along with anything in this article, you should know - my version of Zimbra is old. Like, antique.

Zimbra was very cool about this and issued us some emergency licenses so that we could do what we needed until our new license block purchase went through. Thanks Zimbra!

In light of the whole "running out of licenses" surprise, I decided that the first thing I should start monitoring is license usage. In fact, I instrumented it so well that I can pinpoint the exact moment that we went over the number of emergency licenses we got:

[Graph: CCIS mail accounts over time, pinpointing the moment we exceeded the emergency license count]

Cool, right?

Well, except for the whole "now we're out of licenses" again thing. Sigh.

I mentioned a while back that I was going to be concentrating on instrumenting my infrastructure this year, and although I got a late start, it's going reasonably well. In that blog entry, I linked to a GitHub repo where I built a Vagrant-based Graphite installation. I used that work as the basis for the work I did when creating a production Graphite installation, using the echocat graphite module.

After getting Graphite up and running, I started gathering metrics in an automated fashion from the rest of the puppetized infrastructure using the pdxcat CollectD puppet module, and I wrote a little bit about how similar that process was in my Kerbal Space Administration blog entry.

But my Zimbra install is old. Really old, and the server it's on isn't puppetized, and I don't even want to think about compiling collectd on the version of Ubuntu this machine runs. So I was going to need something else.

As it turns out, I've been working in Python for a little while, and I'd written a relatively short program that can serve either as a standalone command for sending a single metric to Carbon, or as a library if you need to send a lot of metrics at a time. I'm sure there are probably a dozen tools to do this, but it was relatively easy, so I just figured I'd make my own. You can check it out on GitHub if you're interested.

So that's the script I'm using, but a script needs data. If you log in to the Zimbra admin interface (which I try not to do, because it requires Firefox in the old version we're using), you can actually see most of the stats you're interested in. It's possible to scrape that page and get the information, but it's much nicer to get to the source data itself. Fortunately, Zimbra makes that (relatively) easy:

In the Zimbra home directory (/opt/zimbra in my case), there is a "zmstat/" subdirectory, and in there you'll find a BUNCH of directories with dates as names, plus some CSV files:


... snip ...
drwxr-x--- 2 zimbra zimbra 4096 2014-09-04 00:00 2014-09-03/
drwxr-x--- 2 zimbra zimbra 4096 2014-09-05 00:00 2014-09-04/
drwxr-x--- 2 zimbra zimbra 4096 2014-09-06 00:00 2014-09-05/
-rw-r----- 1 zimbra zimbra 499471 2014-09-06 20:11 cpu.csv
-rw-r----- 1 zimbra zimbra 63018 2014-09-06 20:11 fd.csv
-rw-r----- 1 zimbra zimbra 726108 2014-09-06 20:12 imap.csv
-rw-r----- 1 zimbra zimbra 142226 2014-09-06 20:11 io.csv
-rw-r----- 1 zimbra zimbra 278966 2014-09-06 20:11 io-x.csv
-rw-r----- 1 zimbra zimbra 406240 2014-09-06 20:12 mailboxd.csv
-rw-r----- 1 zimbra zimbra 72780 2014-09-06 20:12 mtaqueue.csv
-rw-r----- 1 zimbra zimbra 2559697 2014-09-06 20:12 mysql.csv
drwxr-x--- 2 zimbra zimbra 4096 2014-06-15 22:13 pid/
-rw-r----- 1 zimbra zimbra 259389 2014-09-06 20:12 pop3.csv
-rw-r----- 1 zimbra zimbra 893333 2014-09-06 20:12 proc.csv
-rw-r----- 1 zimbra zimbra 291123 2014-09-06 20:12 soap.csv
-rw-r----- 1 zimbra zimbra 64545 2014-09-06 20:12 threads.csv
-rw-r----- 1 zimbra zimbra 691469 2014-09-06 20:11 vm.csv
-rw-r----- 1 zimbra zimbra 105 2014-09-06 19:08 zmstat.out
-rw-r----- 1 zimbra zimbra 151 2014-09-06 06:28 zmstat.out.1.gz
-rw-r----- 1 zimbra zimbra 89 2014-09-04 21:15 zmstat.out.2.gz
-rw-r----- 1 zimbra zimbra 98 2014-09-04 01:41 zmstat.out.3.gz

Each of those CSV files contains the information you want, in one of a couple of formats. Most are really easy.


sudo head mtaqueue.csv
Password:
timestamp, KBytes, requests
09/06/2014 00:00:00, 4215, 17
09/06/2014 00:00:30, 4257, 17
09/06/2014 00:01:00, 4254, 17
09/06/2014 00:01:30, 4210, 16
... snip ...

In this case, there are three columns: the timestamp, the number of kilobytes in queue, and the number of requests. Most CSV files have (many) more columns, but this one works pretty simply. The file is updated every minute, so if you have a cronjob that runs every minute, grabs the last line of the file, parses it, and sends it to Graphite, your work is basically done:


zimbra$ crontab -l
... snip ...
* * * * * /opt/zimbra/zimbra-stats/zimbraMTAqueue.py

And looking at that file, it's super-easy:


#!/usr/bin/python

import pyGraphite as graphite
import sys
import resource

# Read the whole stats file; the current values are always on the last line
CSV = open('/opt/zimbra/zmstat/mtaqueue.csv', "r")
lineList = CSV.readlines()
CSV.close()
GraphiteString = "MY.GRAPHITE.BASE"

rawLine = lineList[-1]
listVals = rawLine.split(',')

# Columns: timestamp, KBytes, requests
values = {
    'kbytes': listVals[1],
    'items':  listVals[2],
    }

graphite.connect()

# Send each metric as MY.GRAPHITE.BASE.<metric name>
for value in values:
    graphite.sendData(GraphiteString + "." + value + " ", values[value])

graphite.disconnect()

So there you go. My Python isn't awesome, but it gets the job done. Any imports not used here are there because some of the other scripts needed them, and by the time I got to this one, I was mostly copying and pasting my own code. #LazySysAdmin

The only CSV file that took me a while to figure out was imap.csv. The format of that one is more interesting:

msimmons@zimbra:/opt/zimbra/zmstat$ sudo head imap.csv
timestamp,command,exec_count,exec_ms_avg
09/06/2014 00:00:13,ID,11,0
09/06/2014 00:00:13,FETCH,2,0
09/06/2014 00:00:13,CAPABILITY,19,0
...snip...

So you get the timestamp, the IMAP command, the number of times that command is being executed, and how long, on average, it took, so you can watch latency. But the trick is that you only get one command per line, so the previous tactic of only grabbing the final line won't work. Instead, you have to grab the last line, figure out the timestamp, and then grab all of the lines that match the timestamp. Also, I've found that not all IMAP commands will show up every time, so make sure that your XFilesFactor is set right for the metrics you'll be dealing with.
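
(As a sketch of what I mean - the section name and pattern here are made up - a stanza like this in Graphite's storage-aggregation.conf says that even a single received data point is enough to make an aggregated interval valid, which suits metrics that don't show up on every run:)

[zimbra_imap]
pattern = ^MY\.GRAPHITE\.PATH\.
xFilesFactor = 0
aggregationMethod = average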

The code is only a little more complicated, but still isn't too bad:

#!/usr/bin/python

import pyGraphite as graphite
import sys
import resource

imapCSV = open('/opt/zimbra/zmstat/imap.csv', "r")
lineList = imapCSV.readlines()
imapCSV.close()
GraphiteString = "MY.GRAPHITE.PATH"

# Container for one IMAP command's stats at a single timestamp
class imapCommand:
    def __init__(self, name, count, avgres):
        self.name = name
        self.count = count
        self.avgres = avgres

IMAPcmds = list()

# The timestamp we want is whatever the last line of the file has
datestamp = lineList[-1].split(',')[0]

record = len(lineList)

# Walk backward through the file, collecting every line that shares
# the final timestamp; stop as soon as we hit an older one
while True:
    if ( lineList[record-1].split(',')[0] == datestamp ):
        CMD = lineList[record-1].split(',')[1]
        COUNT = lineList[record-1].split(',')[2]
        AVGRES = lineList[record-1].split(',')[3].strip()
        IMAPcmds.append(imapCommand(CMD, COUNT, AVGRES))
    else:
        break
    record = record - 1

graphite.connect()

# Each command gets two metrics: an execution count and an average response time
for command in IMAPcmds:
    graphite.sendData(GraphiteString + "." + command.name + ".count ", command.count)
    graphite.sendData(GraphiteString + "." + command.name + ".avgres ", command.avgres)

graphite.disconnect()

You can read much more about all of the metrics in the online documents, Monitoring Zimbra.

Now, so far, these have all been runtime metrics, which is helpful, but doesn't actually give me account information. To get that, we're going to use some of the built-in Zimbra tools. zmaccts lists all accounts and then prints a summary at the end. We can just grab the summary and learn the number of accounts. We can also use the zmlicense -p command to get the number of licensed accounts we have.

The shell script is pretty easy:

$ cat zimbra-stats/zimbraAccountStatuses.sh
#!/bin/bash

# Creates $GRAPHITESERVER and $GRAPHITEPORT
. /opt/zimbra/zimbra-stats/graphite.sh

OUTPUT="`/opt/zimbra/bin/zmaccts | tail -n 1`"

ACTIVE=`echo $OUTPUT | awk '{print $2}'`
CLOSED=`echo $OUTPUT | awk '{print $3}'`
LOCKED=`echo $OUTPUT | awk '{print $4}'`
MAINT=`echo $OUTPUT | awk '{print $5}'`
TOTAL=`echo $OUTPUT | awk '{print $6}'`
NEVERLOGGEDIN=`/opt/zimbra/bin/zmaccts | grep "never$" | wc -l`

MAX="`/opt/zimbra/bin/zmlicense -p | grep ^AccountsLimit= | cut -d \= -f 2`"

STATPATH="MY.GRAPHITE.PATH"

/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.active ${ACTIVE} 
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.closed ${CLOSED}
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.locked ${LOCKED} 
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.maintenance ${MAINT} 
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.total ${TOTAL} 
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.neverloggedin ${NEVERLOGGEDIN} 
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.max ${MAX}  


Forgive all of the shortcuts taken in the above. Things aren't quoted when they should be and so on. Use at your own risk. Warranty void in Canada. Etc etc.

Overall, the point is to get that additional transparency into the mail server. Even after we get the server upgraded and onto a modern OS, this kind of information will be a welcome addition.

Oh, and for the record?

$ find ./ -name "*wsp" | wc -l
8783

Over 8,500 metrics coming in. Sweet. Most of that is coming from collectd, but that's another blog entry...

Ubuntu and SNMP

Date August 21, 2014

After running Ubuntu for about two years now, I have a laundry list of complaints. Whether it's Ubuntu automagically starting daemons that I install, the relative difficulty of running an internal repo, or (and I'm heartily agreeing with my coworker Nick here) the fact that it doesn't actually include a firewall out of the box... there are very basic issues I have with running and managing Ubuntu machines.

The one that inspired this entry, though, is like, super dumb and annoying.

Suppose I'm trying to do something like snmpwalk on a switch:

$ snmpwalk -v 2c -c public myswitch.mydomain
-bash: /usr/bin/snmpwalk: No such file or directory

Of course, I need snmp. Let's install that:

~$ sudo apt-get install snmp
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
libperl5.18 libsensors4 libsnmp-base libsnmp30
Suggested packages:
lm-sensors snmp-mibs-downloader
The following NEW packages will be installed:
libperl5.18 libsensors4 libsnmp-base libsnmp30 snmp
0 upgraded, 5 newly installed, 0 to remove and 36 not upgraded.
Need to get 1,168 kB of archives.
After this operation, 4,674 kB of additional disk space will be used.

and try it again:

$ snmpwalk -v 2c -c public myswitch.mydomain
iso.3.6.1.2.1.1.1.0 = STRING: "Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 6.0(2)N2(3), RELEASE SOFTWARE Copyright (c) 2002-2012 by Cisco Systems, Inc. Device Manager Version 6.2(1), Compiled 12/17/2013 2:00:00"
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.9.12.3.1.3.1084
iso.3.6.1.2.1.1.3.0 = Timeticks: (575495258) 66 days, 14:35:52.58
iso.3.6.1.2.1.1.4.0 = STRING: "me@here"
iso.3.6.1.2.1.1.5.0 = STRING: "myswitch"
iso.3.6.1.2.1.1.6.0 = STRING: "snmplocation"
iso.3.6.1.2.1.1.7.0 = INTEGER: 70
iso.3.6.1.2.1.1.8.0 = Timeticks: (4294966977) 497 days, 2:27:49.77
iso.3.6.1.2.1.1.9.1.2.1 = OID: iso.3.6.1.6.3.1
iso.3.6.1.2.1.1.9.1.2.2 = OID: iso.3.6.1.6.3.16.2.2.1
iso.3.6.1.2.1.1.9.1.2.3 = OID: iso.3.6.1.6.3.10.3.1.1
...snip...

Well, we're close, but all we have are a bunch of OIDs. I'd really like names. If you've read my introduction to SNMP, you know that it's not loading the MIBs. Weird. On RHEL/CentOS, that's kind of automatic. Maybe there's another package?

Well, that snmp-mibs-downloader that was listed as a suggested package above sounds pretty promising. Let's install that.

$ sudo apt-get install snmp-mibs-downloader
...snip lots of installing MIBS...

So basically, 300+ MIBs were just installed into /var/lib/mibs/ - this is awesome. Let's run that command again:

$ snmpwalk -v 2c -c public myswitch.mydomain
iso.3.6.1.2.1.1.1.0 = STRING: "Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 6.0(2)N2(3), RELEASE SOFTWARE Copyright (c) 2002-2012 by Cisco Systems, Inc. Device Manager Version 6.2(1), Compiled 12/17/2013 2:00:00"
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.9.12.3.1.3.1084
iso.3.6.1.2.1.1.3.0 = Timeticks: (575577418) 66 days, 14:49:34.18
iso.3.6.1.2.1.1.4.0 = STRING: "me@here"
iso.3.6.1.2.1.1.5.0 = STRING: "myswitch"
iso.3.6.1.2.1.1.6.0 = STRING: "snmplocation"
iso.3.6.1.2.1.1.7.0 = INTEGER: 70
iso.3.6.1.2.1.1.8.0 = Timeticks: (4294966977) 497 days, 2:27:49.77
iso.3.6.1.2.1.1.9.1.2.1 = OID: iso.3.6.1.6.3.1
iso.3.6.1.2.1.1.9.1.2.2 = OID: iso.3.6.1.6.3.16.2.2.1
iso.3.6.1.2.1.1.9.1.2.3 = OID: iso.3.6.1.6.3.10.3.1.1
...snip...

That's strange. As it turns out, though, Ubuntu has yet another trick up its sleeve to screw with you. Check out /etc/snmp/snmp.conf:

msimmons@nagios:/var/log$ cat /etc/snmp/snmp.conf
#
# As the snmp packages come without MIB files due to license reasons, loading
# of MIBs is disabled by default. If you added the MIBs you can reenable
# loading them by commenting out the following line.
mibs :

This file's entire purpose in life is to stop you from having MIBs out of the box.
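
If you're fixing this from config management rather than by hand, a one-liner does it. A sketch of one way:

$ sudo sed -i 's/^mibs :/# mibs :/' /etc/snmp/snmp.conf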

Obviously, you can comment out that line and then things work:

$ snmpwalk -v 2c -c public myswitch.mydomain
SNMPv2-MIB::sysDescr.0 = STRING: Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 6.0(2)N2(3), RELEASE SOFTWARE Copyright (c) 2002-2012 by Cisco Systems, Inc. Device Manager Version 6.2(1), Compiled 12/17/2013 2:00:00
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.9.12.3.1.3.1084
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (575602144) 66 days, 14:53:41.44
SNMPv2-MIB::sysContact.0 = STRING: me@here
SNMPv2-MIB::sysName.0 = STRING: myswitch
SNMPv2-MIB::sysLocation.0 = STRING: snmplocation
SNMPv2-MIB::sysServices.0 = INTEGER: 70
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (4294966977) 497 days, 2:27:49.77
SNMPv2-MIB::sysORID.1 = OID: SNMPv2-MIB::snmpMIB
SNMPv2-MIB::sysORID.2 = OID: SNMP-VIEW-BASED-ACM-MIB::vacmBasicGroup
SNMPv2-MIB::sysORID.3 = OID: SNMP-FRAMEWORK-MIB::snmpFrameworkMIBCompliance
SNMPv2-MIB::sysORID.4 = OID: SNMP-MPD-MIB::snmpMPDCompliance
SNMPv2-MIB::sysORID.5 = OID: SNMP-USER-BASED-SM-MIB::usmMIBCompliance
SNMPv2-MIB::sysORDescr.1 = STRING: The MIB module for SNMPv2 entities
SNMPv2-MIB::sysORDescr.2 = STRING: View-based Access Control Model for SNMP.
SNMPv2-MIB::sysORDescr.3 = STRING: The SNMP Management Architecture MIB.
SNMPv2-MIB::sysORDescr.4 = STRING: The MIB for Message Processing and Dispatching.
SNMPv2-MIB::sysORDescr.5 = STRING: The management information definitions for the SNMP User-based Security Model.
SNMPv2-MIB::sysORUpTime.1 = Timeticks: (4294966977) 497 days, 2:27:49.77
SNMPv2-MIB::sysORUpTime.2 = Timeticks: (4294966977) 497 days, 2:27:49.77
...snip...

But... if you're not actually running an SNMP server, and you just want to use SNMP for querying, getting rid of that file entirely ALSO fixes the problem.

Anyway, just another annoying thing I've found that I thought I'd share.

Adam Moskowitz's video: The Future of System Administration

Date August 6, 2014

My friend Adam Moskowitz presented a topic at LOPSA-East this past year that is one near and dear to my heart - the Future of System Administration.

I've written blog entries with something very close to that title twice.

A while back, I started to realize something... and I haven't written about this before, but I'm firmly coming to see that a system administrator is not just something someone is; system administration is something someone does. Regardless of whether someone's title is SysAdmin, IT Operations, Systems Engineering, or Sparkly DevOps Prince(ess), you might not be a system administrator, but you are doing systems administration.

And in the future, a lot of the people who are going to be doing systems administration are developers.

<voice="LeVar Burton">But you don't have to take my word for it...</voice>

The Future of System Administration from Adam Moskowitz on Vimeo.