July 10, 2010
Introduction to what? This isn't going to be a "how to configure SNMP for your server" kind of introduction. I'm no great expert there, but if there's call for it, I can share my configuration bits to help. This is more of a "what the heck is SNMP" introduction. Hopefully it'll be more valuable, since there are reams of existing documentation on how to actually configure the services, and not so many on why you should care.
Really, the system administration world is divided into two camps: those of us who want to monitor our servers and network gear and collect performance metrics so that we can trend future usage, and those of us who don't yet know that we want those things. The former group uses SNMP. The latter group will probably get something out of this post.
If you're new to the idea of SNMP, bear with me for a second. Suppose that it would be handy to remotely query all of your network devices and retrieve stats from them. If you're familiar with the concept of logging into, say, a router, you know that you can get information that way. If you buy intelligent switches, you know that you can telnet, ssh, or web-browse to the switch interface and check out what's going on that way. Likewise, you can log into your servers and check out the stats there, but overall, there are as many different ways of getting this information as there are devices that you want it from. That's no good, because no one wants to script that many possible interactive sessions.
This is the problem that SNMP was meant to address. SNMP stands for Simple Network Management Protocol, and it is just a well-agreed-upon language (protocol) that almost all network devices speak. By using SNMP, you can effectively move beyond the normal administrative interface of your network device and just query it for information. Sounds great, right? It is!
Well, ok, it can be. From this pristine dream of a one-ness of network devices, we muddy the waters a bit when it comes to the specifics. As of right now, there are three different versions of SNMP (v2 in practice usually means v2c, the community-string flavor), with the primary differences between v1 and v2 being capabilities, and the primary differences between v2 and v3 being security.
Starting with v1, and continuing with v2, SNMP didn't actually have "usernames" and "passwords", so much. Instead, it had "community strings", which function as passwords, but without all of those messy account details getting in the way. Typically speaking, there was a community string for reading data (the default was "public"), and a community string for writing settings, when the device supported that (with a default of, yep, you guessed it, "private"). It's hard to imagine why anyone thought this was an insecure protocol, but apparently some people were uncomfortable with the idea of all of their machines being monitored remotely with no accountability whatsoever. Weird, I know.
That brought about the idea of SNMP v3, which packs as many security features into it as the previous versions lacked. In fact, that's pretty much all it does. The actual requests look a lot like v2, but with extra security layers wrapped around them. SNMP v3 uses real accounts with passwords, and at its highest security level the transmission itself is encrypted (DES in the original spec, though some vendors support better encryption like 3DES or AES) to protect the data. In addition, each of the transmissions is signed (using MD5 or SHA-1) to guarantee that it wasn't altered in transit. Because yeah, that's not overkill for me querying the number of bits transmitted since the last time I asked.
Anyway, to use the universal car analogy, you can either have the jeep with no roll cage (v1/2) or the armored tank (v3).
Honestly, I use SNMP v2, and as much as I hate to admit it, I have a nearly universal read-only community string that I use for it. It's not "public", and I disable the write-access community string, but I run old hardware. A lot of it doesn't work with v3. In fact, some of it doesn't even work with v2, but for everything that does, I use v2. It is noticeably faster, and as far as security is concerned, 99% of my things are internal on a private IP-based switched network. If someone is sniffing my packets, I have bigger issues than my read-only community string being compromised. You, on the other hand, may want to check things over the internet. In that case, use SNMP v3. The encryption will be worth the time you invest.
So that's an introduction to what the versions are, but that's not much of an explanation of what SNMP *IS*. SNMP is a logical tree.
Imagine that you're an SNMP server in the mid 1990s. You don't have a lot of RAM, but you have a lot of data to keep track of. Strange remote machines will be querying you to access this data. What method do you use to keep track of the data that they want?
In the case of SNMP, they used a tree. Every branch of the tree has a number, and is separated from its parent and child branches by a period. Taken together, this string of numbers is called an OID, or Object IDentifier. The very top of the tree (or very bottom, depending on how you look at it) is the most abstracted...and you're almost always going to see it start with a 1, which has been assigned to the International Organization for Standardization, or ISO. In fact, a lot of the OIDs that you run into will start with 1.3.6.1, which maps to ISO.identified-organization.dod.internet. You can browse the entire registered OID tree at http://www.oid-info.com, if you're really bored.
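To make the "tree with dots" idea a little more concrete, here is a minimal Python sketch that treats an OID as a path of branch numbers and checks whether one OID lives underneath another. The helper names (parse_oid, is_under) are made up for this illustration and aren't part of any SNMP library; the second OID is one from the standard IF-MIB.

```python
def parse_oid(text):
    """Turn a dotted OID string into a tuple of integers, one per branch."""
    return tuple(int(part) for part in text.strip(".").split("."))


def is_under(oid, subtree):
    """Return True if `oid` sits at or below `subtree` in the tree."""
    oid_path, subtree_path = parse_oid(oid), parse_oid(subtree)
    return oid_path[:len(subtree_path)] == subtree_path


internet = "1.3.6.1"                     # iso.identified-organization.dod.internet
if_out_octets = "1.3.6.1.2.1.2.2.1.16"   # ifOutOctets, from IF-MIB

print(parse_oid(if_out_octets))               # (1, 3, 6, 1, 2, 1, 2, 2, 1, 16)
print(is_under(if_out_octets, internet))      # True
print(is_under("1.3.6.1.4.1", "1.3.6.1.2"))   # False
```

Comparing tuples of integers rather than raw strings avoids the trap where "1.3.6.10" looks like it starts with "1.3.6.1".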
Alright, so imagine that you've browsed all the way down to 1.3.6.1.2.1.2.2.1.16. Great. What the heck does that mean, though?
The other great tree of numbers strung together with dots, IP addresses, had the same problem a long time ago, and so DNS was invented to map IP addresses to names. For a very similar reason, there is a Management Information Base, or MIB, that maps OIDs to useful names. That 1.3.6.1.2.1.2.2.1.16 monstrosity above? Yeah, it actually means ifOutOctets, shorthand for interface output octets. It's a 32-bit counter that shows the number of octets which have been output by each interface. When I query it (more on that shortly) on a machine with 5 interfaces, I get the following output:
IF-MIB::ifOutOctets.1 = Counter32: 2766014067
IF-MIB::ifOutOctets.2 = Counter32: 3209623655
IF-MIB::ifOutOctets.3 = Counter32: 3606918534
IF-MIB::ifOutOctets.4 = Counter32: 2521574893
IF-MIB::ifOutOctets.5 = Counter32: 0
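If you ever want to feed output like this to a script, the lines are regular enough to pick apart. Here's a rough Python sketch that parses them into a dictionary keyed by name and interface number. The regular expression only covers the line shape shown above, and the function name parse_walk is my own invention, so treat it as a starting point rather than a general-purpose parser.

```python
import re

# Sample lines copied from the output above.
SAMPLE = """\
IF-MIB::ifOutOctets.1 = Counter32: 2766014067
IF-MIB::ifOutOctets.2 = Counter32: 3209623655
IF-MIB::ifOutOctets.3 = Counter32: 3606918534
IF-MIB::ifOutOctets.4 = Counter32: 2521574893
IF-MIB::ifOutOctets.5 = Counter32: 0
"""

# MIB::name.index = TYPE: value
LINE = re.compile(
    r"^(?P<mib>[\w-]+)::(?P<name>\w+)\.(?P<index>\d+)"
    r" = (?P<type>\w+): (?P<value>.*)$"
)


def parse_walk(text):
    """Map (name, index) to value; counters become ints, the rest stay strings."""
    rows = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip anything that doesn't look like a result line
        value = match["value"]
        if match["type"].startswith("Counter"):
            value = int(value)
        rows[(match["name"], int(match["index"]))] = value
    return rows


octets = parse_walk(SAMPLE)
print(octets[("ifOutOctets", 1)])   # 2766014067
```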
There are some very standard OIDs that are universal across pretty much all devices. On the other hand, many devices have specialized OIDs that you probably wouldn't otherwise find (and certainly wouldn't know what they meant!) unless you had the specific MIB for that device. For this reason, many manufacturers have made their MIBs available for download, but there are also websites that archive MIBs and make them searchable by the public. This can be a huge help if you want to know how many VPN users are currently logged in, or really anything else that is non-standard or hard to find.
Think of the MIB files as a map to the information you want to look for.
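As a toy illustration of that "map" idea, the Python sketch below keeps a tiny hand-typed table of three IF-MIB entries and uses it to turn a numeric OID into a readable name. A real MIB file holds far more than this (types, descriptions, and so on), and real tools such as net-snmp's snmptranslate do the lookup properly; the TINY_MIB table and the translate function here are just stand-ins I made up.

```python
# A hand-picked slice of IF-MIB, standing in for a real MIB file.
TINY_MIB = {
    "1.3.6.1.2.1.2.2": "ifTable",
    "1.3.6.1.2.1.2.2.1.2": "ifDescr",
    "1.3.6.1.2.1.2.2.1.16": "ifOutOctets",
}


def translate(oid):
    """Replace the longest known prefix of `oid` with its name."""
    parts = oid.strip(".").split(".")
    # Try the whole OID first, then drop one trailing number at a time.
    for cut in range(len(parts), 0, -1):
        prefix = ".".join(parts[:cut])
        if prefix in TINY_MIB:
            return ".".join([TINY_MIB[prefix]] + parts[cut:])
    return oid  # nothing in our tiny map matches, so leave it numeric


print(translate("1.3.6.1.2.1.2.2.1.16.1"))   # ifOutOctets.1
print(translate("1.3.6.1.2.1.2.2.1.2.2"))    # ifDescr.2
print(translate("1.3.6.1.4.1.9"))            # 1.3.6.1.4.1.9
```

Without the right MIB loaded, you still get the data back; you just get it labeled with numbers instead of names.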
Now, how to actually get that information out of the device...
If you want to query by hand (certainly only a temporary measure), in the Unix/Linux world, I recommend net-snmp. It includes a suite of tools to poke and prod SNMP-enabled devices, but the two things that I use the most are snmpwalk and snmpget.
The block of results above was retrieved using snmpwalk. What I did was issue the following command:
snmpwalk -v 2c -c CommunityString servername 1.3.6.1.2.1.2.2.1.16
If you notice, the output from that command returned 5 lines, with the first field of each line ending in "ifOutOctets.#", where # is the number of the interface. That's because the actual OID of each of those values was 1.3.6.1.2.1.2.2.1.16.#! If I try to use 'snmpget' (which, unlike snmpwalk, only returns one result), it fails:
snmpget -v 2c -c CommunityString servername 1.3.6.1.2.1.2.2.1.16
IF-MIB::ifOutOctets = No Such Instance currently exists at this OID
However, specifying the correct OID does the trick:
snmpget -v 2c -c CommunityString servername 1.3.6.1.2.1.2.2.1.16.1
IF-MIB::ifOutOctets.1 = Counter32: 2766027795
What 'snmpwalk' actually does is walk the tree. I specified '1.3.6.1.2.1.2.2.1.16', so it said "alright, I'm going to start at that OID and ask the device for whatever comes next, which gets me '.1', then ask again and get '.2', etc etc", and it keeps going until the device hands back something outside of the branch I asked for (or says that there's nothing left). By this method, you can actually query a huge part (or even all) of the tree.
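In slightly more detail, a walk is built out of repeated "get next" requests: give the device an OID, and it answers with the next OID it actually has a value for. The Python sketch below fakes a device with an in-memory table so you can see the loop without any network involved. ToyAgent and walk are names I invented for the illustration, the counter values are the ones from the output earlier, and the final row (an OID just past ifOutOctets) is made-up data that exists only to give the walk somewhere to stop.

```python
from bisect import bisect_right


def oid_key(text):
    """Dotted OID string -> tuple of ints, so OIDs sort in tree order."""
    return tuple(int(part) for part in text.split("."))


class ToyAgent:
    """An in-memory stand-in for an SNMP-enabled device."""

    def __init__(self, values):
        self._values = {oid_key(oid): value for oid, value in values.items()}
        self._oids = sorted(self._values)

    def get(self, oid):
        """Exact lookup, like snmpget; None plays the part of 'No Such Instance'."""
        return self._values.get(oid_key(oid))

    def get_next(self, oid):
        """Return the first (oid, value) strictly after `oid`, or None at the end."""
        position = bisect_right(self._oids, oid_key(oid))
        if position == len(self._oids):
            return None
        found = self._oids[position]
        return ".".join(map(str, found)), self._values[found]


def walk(agent, root):
    """Collect everything below `root` by calling get_next over and over."""
    root_key = oid_key(root)
    results, current = [], root
    while True:
        step = agent.get_next(current)
        if step is None:
            break  # fell off the end of the device's tree
        oid, value = step
        if oid_key(oid)[:len(root_key)] != root_key:
            break  # the next OID belongs to a different branch
        results.append((oid, value))
        current = oid
    return results


agent = ToyAgent({
    "1.3.6.1.2.1.2.2.1.16.1": 2766014067,
    "1.3.6.1.2.1.2.2.1.16.2": 3209623655,
    "1.3.6.1.2.1.2.2.1.16.3": 3606918534,
    "1.3.6.1.2.1.2.2.1.16.4": 2521574893,
    "1.3.6.1.2.1.2.2.1.16.5": 0,
    "1.3.6.1.2.1.2.2.1.17.1": 0,   # made-up neighbour, outside the walked branch
})

print(agent.get("1.3.6.1.2.1.2.2.1.16"))         # None
print(len(walk(agent, "1.3.6.1.2.1.2.2.1.16")))  # 5
```

This also shows why the bare snmpget failed: there is no value stored at the branch itself, only at the numbered leaves underneath it.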
In this case, I knew I had 5 interfaces, numbered 1-5 (according to the OID results from snmpwalk), but I didn't know which interface was registered as which number...I did know, however, that one of the interfaces was called 'eth0', so I shaved some numbers off of the OID, and executed this snmpwalk:
snmpwalk -v 2c -c CommunityString servername 1.3.6.1.2.1.2.2 | grep eth0
IF-MIB::ifDescr.2 = STRING: eth0
Excellent. At this point, I know that ifDescr is the name (registered in the MIB) that holds the interface descriptions. So I just execute an snmpwalk against that:
snmpwalk -v 2c -c CommunityString servername ifDescr
IF-MIB::ifDescr.1 = STRING: lo
IF-MIB::ifDescr.2 = STRING: eth0
IF-MIB::ifDescr.3 = STRING: eth1
IF-MIB::ifDescr.4 = STRING: bond0
IF-MIB::ifDescr.5 = STRING: sit0
Easy as pie.
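Once you have both walks, lining them up is just a matter of matching the trailing index numbers. Here's a small Python sketch that does that with the two blocks of output from this post pasted in as strings; by_index is a throwaway helper of mine, and the string splitting assumes the exact "name.index = TYPE: value" shape shown above.

```python
IF_DESCR = """\
IF-MIB::ifDescr.1 = STRING: lo
IF-MIB::ifDescr.2 = STRING: eth0
IF-MIB::ifDescr.3 = STRING: eth1
IF-MIB::ifDescr.4 = STRING: bond0
IF-MIB::ifDescr.5 = STRING: sit0
"""

IF_OUT_OCTETS = """\
IF-MIB::ifOutOctets.1 = Counter32: 2766014067
IF-MIB::ifOutOctets.2 = Counter32: 3209623655
IF-MIB::ifOutOctets.3 = Counter32: 3606918534
IF-MIB::ifOutOctets.4 = Counter32: 2521574893
IF-MIB::ifOutOctets.5 = Counter32: 0
"""


def by_index(text):
    """Map the trailing interface index of each line to its value (as a string)."""
    table = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # not a result line
        left, _, right = line.partition(" = ")
        index = left.rpartition(".")[2]
        value = right.partition(": ")[2]
        table[int(index)] = value
    return table


names = by_index(IF_DESCR)
octets = by_index(IF_OUT_OCTETS)

# Join the two tables on the interface index.
out_by_name = {names[i]: int(octets[i]) for i in names}

print(out_by_name["eth0"])   # 3209623655
```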
Of course, you don't always want to query by hand...in fact, it's probably the exception, rather than the rule. You want monitoring software to do all that stuff for you. Pretty much every monitoring package known to man can query SNMP directly (and if it can't, you know how to query it via the command line now, so you can write a script to do it, if it's absolutely necessary). Graphing solutions like Cacti and MRTG include code to query it, and even Nagios has a check_snmp plugin (which I highly recommend using, rather than creatively solving the problem yourself).
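If you do end up scripting it yourself, the main thing to know is that a counter like ifOutOctets only ever goes up, so what you graph is the difference between two readings, and a 32-bit counter rolls over to zero once it passes 4294967295. The Python sketch below shows that arithmetic using the two ifOutOctets.1 readings from earlier in this post. The 60 second gap between them is purely an assumption for the example (I didn't time it), and the function names are my own.

```python
COUNTER32_MODULUS = 2 ** 32   # a Counter32 wraps back to 0 after 4294967295


def counter32_delta(earlier, later):
    """Octets counted between two readings, allowing for a single wrap."""
    return (later - earlier) % COUNTER32_MODULUS


def bits_per_second(earlier, later, seconds):
    """Average rate over the polling interval; an octet is 8 bits."""
    return counter32_delta(earlier, later) * 8 / seconds


first, second = 2766014067, 2766027795   # the two ifOutOctets.1 readings above
interval = 60                            # assumed seconds between them

print(counter32_delta(first, second))    # 13728
print(bits_per_second(first, second, interval))

# A reading taken just after the counter rolled over still works out.
print(counter32_delta(4294967000, 500))  # 796
```

This only holds if the counter wrapped at most once between polls and the device didn't reboot in the meantime, which is one more reason to let real monitoring software handle it.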
This really only leaves one stone unturned: SNMP traps. Essentially, SNMP traps are a way of letting the SNMP server stop being passively queried and start actively letting someone know that something is wrong. Configuring a trap involves specifying a remote server (or servers) to alert when something goes horribly awry.
The remote server specified needs to be listening for SNMP traps. In Unix/Linux, it's not too difficult to get net-snmp to listen for them, and on Windows, there is software available to do the same thing. Here's one I found with a quick search. I'm sure there are more, so if you have a favorite, please let us know what it is in the comments.
The only thing left is to tie your notification system into the trap server, but I'll leave that as an exercise for the reader.
Thanks for reading, and hopefully you got something out of it. If you have a favorite SNMP tip or trick (or I screwed something up), let us know in the comments!