Are you monitoring your switchports the right way?

Graphite might be the best thing I’ve rolled out here in my position at CCIS.

One of our graduate students has been working on a really interesting paper for a while. I can’t go into details, because he’s going to publish before too long, but he has been making good use of my network diagrams. Since he has a lot riding on the accuracy of the data, he’s been asking me very specific questions about how the data was obtained, and how the graphs are produced, and so on.

One of the questions he asked me had to do with a bandwidth graph, much like this one:

His question revolved around the actual amount of traffic each datapoint represented. I explained briefly that we were looking at Megabytes per second, and he asked for clarification – specifically, whether each point was the sum total of data sent per second between updates, or whether it was the average bandwidth used over the interval.

We did some calculations, and decided that if it were, in fact, the total number of bytes received since the previous data point, it would mean my network had basically no traffic, and I know that not to be the case. But still, these things need verified, so I dug in and re-determined the entire path that the metrics take.

These metrics are coming from a pair of Cisco Nexus Switches via SNMP. The data being pulled is a per-interface ifInOctets and ifOutOctets. As you can see from the linked pages, each of those are 32 bit counters, with “The total number of octets transmitted [in|out] of the interface, including framing characters”.

Practically speaking, what this gives you is an ever-increasing number. The idea behind this counter is that you query it, and receive a number of bytes (say, 100). This indicates that at the time you queried it, the interface has sent (in the case of ifOutOctets) 100 bytes. If you query it again ten seconds later, and you get 150, then you know that in the intervening ten seconds, the interface has sent 50 bytes, and since you queried it ten seconds apart, you determine that the interface has transmitted 5 bytes per second.

Having the counter work like this means that, in theory, you don’t have to worry about how frequently you query it. You could query it tomorrow, and if it went from 100 to 100000000, you could be able to figure out how many seconds it was since you asked before, divide the byte difference, and figure out the average bytes per second that way. Granted, the resolution on those stats isn’t stellar at that frequency, but it would still be a number.

Incidentally, you might wonder, “wait, didn’t you say it was 32 bits? That’s not huge. How big can it get?”. The answer is found in RFC 1155:

3.2.3.3. Counter

This application-wide type represents a non-negative integer which monotonically increases until it reaches a maximum value, when it wraps around and starts increasing again from zero. This memo specifies a maximum value of 2^32-1 (4294967295 decimal) for counters.

In other words, 4.29 gigabytes (or just over 34 gigabits). It turns out that this is actually kind of an important facet to the whole “monitoring bandwith” thing, because in our modern networks, switch interfaces are routinely 1Gb/s, often 10Gb/s, and sometimes even more. If our standard network interfaces can transfer one gigabits per second, then a fully utilized network interface can roll over an entire counter in 35 seconds. If we’re only querying that interface once a minute, then we’re potentially losing a lot of data. Consider, then, a 10Gb/s interface. Are you pulling metrics more often than once every 4 seconds? If not, you may be losing data.

Fortunately, there’s an easy fix. Instead of ifInOctets and ifOutOctets, query ifHCInOctets and ifHCOutOctets.  They are 64 bit counters, and only roll over once every 18 exabytes. Even with a 100% utilized 100Gb/s interface, you’ll still only roll over a counter every 5.8 years or so.

I made this change to my collectd configuration as soon as I figured out what I was doing wrong, and fortunately, none of my metrics jumped, so I’m going to say I got lucky. Don’t be me – start out doing it the right way, and save yourself confusion and embarrassment later.  Use 64-bit counters from the start!

(Also, there are the equivalent HC versions for all of the other interface counters you’re interested in, like the UCast, Multicast, and broadcast packet stats – make sure to use the 64-bit version of all of them).

Thanks, I hope I managed to help someone!

New Blog Theme is Up

The other day, I got an email from a loyal reader who let me know that, when you find Standalone SysAdmin on Google, you get a little warning that says, “This site may be hacked”. Well, that’s not good. It was clearly related to some adware that got installed a while back by accident (where “accident” means that WordPress allowed someone to upload a .htaccess file into my blog’s uploads directory, and then used that as a springboard for advertising Casino sites). Somehow, the vulnerability was severe enough that the wp-footer.php site was able to be modified as well. Ugh.

After cleaning up the uploads directory and removing the extraneous plugins that I wasn’t using, or didn’t like, or whatever, I went to clean up the footer, but I thought, “I wonder why my security scans didn’t pick this up?”. The blog runs Wordfence thanks to recommendations from several other bloggers, and it’s really good software. It was able to help me with cleaning up most of the modified files because it can compare the originals with what’s installed, and revert to the reference copy. But it didn’t do that with my theme.

As it turns out, the theme I was running was an antique one named Cleaker. I really liked how it looked at the time I installed it, which was back in 2009 when I migrated from Blogger. Well, you can’t even download Cleaker anymore, at least from the author. So why not use this as a chance to put something new in place?

Funny, that “Welcome to the Future” post mentions RSS, but practically no one reads blogs through RSS anymore. Oh, there are still some, but when Google killed Reader, RSS readers dropped almost off the radar, so you’re almost certainly reading this on the actual site, and that means you’re almost certainly seeing the new site theme. It’s a fairly standard one called Twenty Fourteen, which I’m sure will feel hilariously old when I get around to changing it in 2020. The banner image at the top rotates occasionally between several images that I’ve taken, or taken from NASA.gov, or from some other Free-Without-Attribution source. There are seriously tons of free image sources out there, if you’re looking.

Anyway, I just wanted to let you know that the look of the site has changed, so when I inevitably get around to writing more content, this is what you’ll be seeing. Unless, of course, everyone comments below to let me know that it sucks, which, if you think so, please do tell me. There are comments below for that kind of thing. If you like it, don’t like it, or are so ambivalent about it that you want to comment and let me know that too, then go for it. Thanks!

Reminder (to self, too): Use Python virtualenv!

I’m really not much of a programmer, but I dabble at times, in order to make tools for myself and my colleagues. Or toys, like the time I wrote an entire MBTA library because I wanted to build a Slack integration for the local train service.

One of the things that I want to learn better, because it seems so gosh-darned helpful, is Python. I’m completely fluent (though non-expert level) in both Bash and PHP, so I’m decent at writing systems scripts and web back-ends, but I’m only passingly familiar with Perl. The way I see it, the two “modern” languages that get the most use in systems scripts are Python and Ruby, and it’s basically a toss-up for me as to which to pick.

Python seems a little more pervasive, although ruby has the benefit of basically being the language of our systems stack. Puppet, Foreman, logstash, and several other tools are all written in Ruby, and there’s a lot to be said for being fluent in the language of your tools. That being said, I’m going to learn Python because it seems easier and honestly, flying sounds so cool.

 

One of the things that a lot of intro-to-Python tutorials don’t give you is the concept of virtual environments. These are actually pretty important in a lot of ways. You don’t absolutely need them, but you’re going to make your life a lot better if you use them. There’s a really great bit on why you should use them on the Python Guide, but basically, they create an entire custom python environment for your code, segregated away from the rest of the OS. You can use a specific version of python, a specific set of modules, and so on (with no need for root access, since they’re being installed locally).

Installing virtualenv is pretty easy. You may be able to install it with your system’s package manager, or you may need to use pip. Or you could use easy_install. Python, all by itself, has several package managers. Because of course it does.

Setting up a virtual environment is straight forward, if a little kudgy-feeling. If you find that you’re going to be moving it around, maybe from machine to machine or whatever, you should probably know about the —relocatable flag.

By default, the workflow is basically, create a virtual environment, “activate” the virtual environment (which mangles lots of environmental variables and paths, so that python-specific stuff runs local to the environment rather than across the entire server), configuring it by installing the modules you need, write/execute your code as normal, and then deactivate your environment when you’re done, which restores all of your original environmental settings.

There is also a piece of software called virtualenvwrapper that is supposed to make all of this easier. I haven’t used it, but it looks interesting. If you find yourself really offended by the aforementioned workflow, give it a shot and let me know what you think.

Also as a reminder, make sure to put your virtual environment directory in your .gitignore file, because you’re definitely using version control, right? (Right?) Right.

Here’s how I use virtual environments in my workflow:


msimmons@bullpup:~/tmp > mkdir mycode
msimmons@bullpup:~/tmp > cd mycode
msimmons@bullpup:~/tmp/mycode > git init
Initialized empty Git repository in /home/msimmons/tmp/mycode/.git/
msimmons@bullpup:~/tmp/mycode > virtualenv env
New python executable in env/bin/python
Installing setuptools, pip...done.
msimmons@bullpup:~/tmp/mycode > echo "env" > .gitignore
msimmons@bullpup:~/tmp/mycode > git add .gitignore # I always forget this!
msimmons@bullpup:~/tmp/mycode > source env/bin/activate
(env)msimmons@bullpup:~/tmp/mycode >
(env)msimmons@bullpup:~/tmp/mycode > which python
/home/msimmons/tmp/mycode/env/bin/python
(env)msimmons@bullpup:~/tmp/mycode > deactivate
msimmons@bullpup:~/tmp/mycode > which python
/usr/bin/python

A blog for IT Admins who do everything by an IT Admin who does everything