Category Archives: System Administration

Posts generally related to system administration. This will be the primary RSS feed, and will include the vast majority of the posts, excluding those in Administrivia and some of those from Tech Field Day.

Are you monitoring your switchports the right way?

Graphite might be the best thing I’ve rolled out here in my position at CCIS.

One of our graduate students has been working on a really interesting paper for a while. I can’t go into details, because he’s going to publish before too long, but he has been making good use of my network diagrams. Since he has a lot riding on the accuracy of the data, he’s been asking me very specific questions about how the data was obtained, and how the graphs are produced, and so on.

One of the questions he asked me had to do with a bandwidth graph, much like this one:

His question revolved around the actual amount of traffic each datapoint represented. I explained briefly that we were looking at Megabytes per second, and he asked for clarification – specifically, whether each point was the sum total of data sent per second between updates, or whether it was the average bandwidth used over the interval.

We did some calculations, and decided that if it were, in fact, the total number of bytes received since the previous data point, it would mean my network had basically no traffic, and I know that not to be the case. But still, these things need to be verified, so I dug in and re-traced the entire path that the metrics take.

These metrics are coming from a pair of Cisco Nexus switches via SNMP. The data being pulled is the per-interface ifInOctets and ifOutOctets. As you can see from the linked pages, each of those is a 32-bit counter, with “The total number of octets transmitted [in|out] of the interface, including framing characters”.

Practically speaking, what this gives you is an ever-increasing number. The idea behind this counter is that you query it, and receive a number of bytes (say, 100). This indicates that at the time you queried it, the interface has sent (in the case of ifOutOctets) 100 bytes. If you query it again ten seconds later, and you get 150, then you know that in the intervening ten seconds, the interface has sent 50 bytes, and since you queried it ten seconds apart, you determine that the interface has transmitted 5 bytes per second.
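The arithmetic above can be sketched in a few lines of Python. The function name and the sample values are purely illustrative; this is not what collectd or Graphite actually runs:

```python
def average_rate(prev_count, curr_count, seconds_elapsed):
    """Average bytes per second between two readings of an octet counter."""
    if seconds_elapsed <= 0:
        raise ValueError("samples must be taken at different times")
    return (curr_count - prev_count) / seconds_elapsed


# The example from the text: 100 bytes, then 150 bytes ten seconds later.
print(average_rate(100, 150, 10))  # 5.0
```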

Having the counter work like this means that, in theory, you don’t have to worry about how frequently you query it. You could query it tomorrow, and if it went from 100 to 100000000, you would be able to figure out how many seconds it was since you asked before, divide the byte difference by that, and figure out the average bytes per second that way. Granted, the resolution on those stats isn’t stellar at that frequency, but it would still be a number.

Incidentally, you might wonder, “wait, didn’t you say it was 32 bits? That’s not huge. How big can it get?” The answer is found in RFC 1155:

3.2.3.3. Counter

This application-wide type represents a non-negative integer which monotonically increases until it reaches a maximum value, when it wraps around and starts increasing again from zero. This memo specifies a maximum value of 2^32-1 (4294967295 decimal) for counters.

In other words, 4.29 gigabytes (or just over 34 gigabits). It turns out that this is actually kind of an important facet to the whole “monitoring bandwidth” thing, because in our modern networks, switch interfaces are routinely 1Gb/s, often 10Gb/s, and sometimes even more. If our standard network interfaces can transfer one gigabit per second, then a fully utilized network interface can roll over an entire counter in about 35 seconds. If we’re only querying that interface once a minute, then we’re potentially losing a lot of data. Consider, then, a 10Gb/s interface, which can roll over in about 3.4 seconds. Are you pulling metrics more often than that? If not, you may be losing data.
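To make the "losing data" part concrete, here is a minimal sketch of how a poller might handle a wrapped counter. The helper name and the sample readings are hypothetical. The point is that one wrap between polls can be corrected, but a second wrap is invisible:

```python
COUNTER32_MODULUS = 2**32  # a Counter32 wraps to zero after 2**32 - 1


def counter_delta(prev_count, curr_count, modulus=COUNTER32_MODULUS):
    """Bytes counted between two readings, assuming at most one wrap."""
    return (curr_count - prev_count) % modulus


# A single wrap is recoverable: going from 4294967000 to 200 is really 496 bytes.
print(counter_delta(4294967000, 200))  # 496

# Two wraps look exactly like one, so a full 2**32 bytes silently goes missing.
true_bytes = 2**32 + 496
second_reading = (4294967000 + true_bytes) % COUNTER32_MODULUS
observed = counter_delta(4294967000, second_reading)
print(true_bytes - observed)  # 4294967296
```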

Fortunately, there’s an easy fix. Instead of ifInOctets and ifOutOctets, query ifHCInOctets and ifHCOutOctets. They are 64-bit counters, and only roll over once every 18 exabytes. Even with a 100% utilized 100Gb/s interface, you’ll still only roll over a counter every 47 years or so.
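If you want to check rollover times for your own link speeds, a rough calculation like the following works. It assumes a fully saturated link and decimal units (1Gb/s = 10^9 bits per second), and the function name is made up for this sketch:

```python
def seconds_to_wrap(counter_bits, link_bits_per_second):
    """Seconds for an octet counter to wrap on a fully saturated link."""
    counter_bytes = 2**counter_bits
    bytes_per_second = link_bits_per_second / 8
    return counter_bytes / bytes_per_second


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# 32-bit counters, in seconds
print(seconds_to_wrap(32, 1e9))
print(seconds_to_wrap(32, 10e9))

# 64-bit counter on a 100Gb/s link, in years
print(seconds_to_wrap(64, 100e9) / SECONDS_PER_YEAR)
```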

I made this change to my collectd configuration as soon as I figured out what I was doing wrong, and fortunately, none of my metrics jumped, so I’m going to say I got lucky. Don’t be me – start out doing it the right way, and save yourself confusion and embarrassment later.  Use 64-bit counters from the start!

(Also, there are the equivalent HC versions for all of the other interface counters you’re interested in, like the UCast, Multicast, and broadcast packet stats – make sure to use the 64-bit version of all of them).
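As a quick reference, this sketch maps the common 32-bit IF-MIB counter names to their 64-bit equivalents. The lookup table and helper are my own illustration, not part of any SNMP tool. Counters without an HC version, such as ifInErrors, are passed through unchanged:

```python
# 32-bit IF-MIB counters and their 64-bit "HC" (high capacity) equivalents.
HC_EQUIVALENTS = {
    "ifInOctets": "ifHCInOctets",
    "ifOutOctets": "ifHCOutOctets",
    "ifInUcastPkts": "ifHCInUcastPkts",
    "ifOutUcastPkts": "ifHCOutUcastPkts",
    "ifInMulticastPkts": "ifHCInMulticastPkts",
    "ifOutMulticastPkts": "ifHCOutMulticastPkts",
    "ifInBroadcastPkts": "ifHCInBroadcastPkts",
    "ifOutBroadcastPkts": "ifHCOutBroadcastPkts",
}


def prefer_hc(names):
    """Swap each counter name for its 64-bit version where one exists."""
    return [HC_EQUIVALENTS.get(name, name) for name in names]


print(prefer_hc(["ifInOctets", "ifOutUcastPkts", "ifInErrors"]))
# ['ifHCInOctets', 'ifHCOutUcastPkts', 'ifInErrors']
```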

Thanks, I hope I managed to help someone!

Reminder (to self, too): Use Python virtualenv!

I’m really not much of a programmer, but I dabble at times, in order to make tools for myself and my colleagues. Or toys, like the time I wrote an entire MBTA library because I wanted to build a Slack integration for the local train service.

One of the things that I want to learn better, because it seems so gosh-darned helpful, is Python. I’m completely fluent (though non-expert level) in both Bash and PHP, so I’m decent at writing systems scripts and web back-ends, but I’m only passingly familiar with Perl. The way I see it, the two “modern” languages that get the most use in systems scripts are Python and Ruby, and it’s basically a toss-up for me as to which to pick.

Python seems a little more pervasive, although Ruby has the benefit of basically being the language of our systems stack. Puppet, Foreman, logstash, and several other tools are all written in Ruby, and there’s a lot to be said for being fluent in the language of your tools. That being said, I’m going to learn Python because it seems easier and honestly, flying sounds so cool.

 

One of the things that a lot of intro-to-Python tutorials don’t give you is the concept of virtual environments. These are actually pretty important in a lot of ways. You don’t absolutely need them, but you’re going to make your life a lot better if you use them. There’s a really great bit on why you should use them on the Python Guide, but basically, they create an entire custom python environment for your code, segregated away from the rest of the OS. You can use a specific version of python, a specific set of modules, and so on (with no need for root access, since they’re being installed locally).
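One way to see that segregation from inside Python is to ask the interpreter where it lives. This is a small sketch, with a helper name I made up, that reports whether the running interpreter belongs to a virtual environment:

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    # Python 3 environments: sys.prefix differs from sys.base_prefix.
    # Older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print(sys.prefix)
print(in_virtual_environment())
```

Run it once with the system Python and once after activating an environment, and the two answers should differ.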

Installing virtualenv is pretty easy. You may be able to install it with your system’s package manager, or you may need to use pip. Or you could use easy_install. Python, all by itself, has several package managers. Because of course it does.

Setting up a virtual environment is straightforward, if a little kludgy-feeling. If you find that you’re going to be moving it around, maybe from machine to machine or whatever, you should probably know about the --relocatable flag.

By default, the workflow is basically: create a virtual environment, “activate” the virtual environment (which mangles lots of environment variables and paths, so that Python-specific stuff runs local to the environment rather than across the entire server), configure it by installing the modules you need, write and execute your code as normal, and then deactivate your environment when you’re done, which restores all of your original environment settings.
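The "create" step can also be done from Python itself. Note that this sketch swaps in the standard library's venv module (Python 3.3 and later) rather than the virtualenv tool this post is about, and it builds the environment in a throwaway temporary directory:

```python
import os
import tempfile
import venv

# Create a throwaway environment in a temporary directory.
# with_pip=False keeps the sketch fast and offline; real use usually wants pip.
project_dir = tempfile.mkdtemp()
env_dir = os.path.join(project_dir, "env")
venv.create(env_dir, with_pip=False)

# Every environment gets a pyvenv.cfg marker file plus its own bin/ (or Scripts/).
print(os.path.exists(os.path.join(env_dir, "pyvenv.cfg")))
print(sorted(os.listdir(env_dir)))
```

Activation is still a shell-level step (sourcing the activate script), since it works by changing the environment of your current shell.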

There is also a piece of software called virtualenvwrapper that is supposed to make all of this easier. I haven’t used it, but it looks interesting. If you find yourself really offended by the aforementioned workflow, give it a shot and let me know what you think.

Also as a reminder, make sure to put your virtual environment directory in your .gitignore file, because you’re definitely using version control, right? (Right?) Right.

Here’s how I use virtual environments in my workflow:


msimmons@bullpup:~/tmp > mkdir mycode
msimmons@bullpup:~/tmp > cd mycode
msimmons@bullpup:~/tmp/mycode > git init
Initialized empty Git repository in /home/msimmons/tmp/mycode/.git/
msimmons@bullpup:~/tmp/mycode > virtualenv env
New python executable in env/bin/python
Installing setuptools, pip...done.
msimmons@bullpup:~/tmp/mycode > echo "env" > .gitignore
msimmons@bullpup:~/tmp/mycode > git add .gitignore # I always forget this!
msimmons@bullpup:~/tmp/mycode > source env/bin/activate
(env)msimmons@bullpup:~/tmp/mycode >
(env)msimmons@bullpup:~/tmp/mycode > which python
/home/msimmons/tmp/mycode/env/bin/python
(env)msimmons@bullpup:~/tmp/mycode > deactivate
msimmons@bullpup:~/tmp/mycode > which python
/usr/bin/python

Spinning up a quick cloud instance with Digital Ocean

This is another in a short series of blog posts that will be brought together like Voltron to make something even cooler, but it’s useful on its own. 

I’ve written about using a couple other cloud providers before, like AWS and the HP cloud, but I haven’t actually mentioned Digital Ocean yet, which is strange, because they’ve been my go-to cloud provider for the past year or so. As you can see on their technology page, all of their instances are SSD backed, they’re virtualized with KVM, they’ve got IPv6 support, and there’s an API for when you need to automate instance creation.

To be honest, I’m not automating any of it. What I use it for is one-off tests. Spinning up a new “droplet” takes less than a minute, and unlike AWS, where there are a ton of choices, I click about three buttons and get a usable machine for whatever I’m doing.

To get the most out of it, the first thing you need to do is generate an SSH key if you don’t have one already. If you don’t set up key-based authentication, you’ll get the root password for your instance in your email, but ain’t nobody got time for that, so create the key using ssh-keygen (or if you’re on Windows, I conveniently covered setting up key-based authentication using pageant the other day – it’s almost like I’d planned this out).

Next, sign up for Digital Ocean. You can do this at DigitalOcean.com or you can get $10 for free by using my referral link (and I’ll get $25 in credit eventually).  Once you’re logged in, you can create a droplet by clicking the big friendly button:

This takes you to a relatively limited number of options – but limited in this case isn’t bad. It means you can spin up what you want without fussing about most of the details. You’ll be asked for your droplet’s hostname (which will be used both to refer to the instance in the Digital Ocean interface and as the actual hostname of the created machine), and you’ll need to specify the size of the machine you want (and at the current moment, here are the prices:)

The $10/mo option is conveniently highlighted, but honestly, most of my test stuff runs perfectly fine on the $5/mo, and most of my test stuff never runs for more than an hour, and 7/1000 of a dollar seems like a good deal to me. Even if you screw up and forget about it, it’s $5/mo. Just don’t set up a 64GB monster and leave that bad boy running.
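Here's the back-of-the-envelope math behind that, as a sketch. It assumes hourly billing capped at the monthly price, with the hourly rate derived from a 672-hour (28-day) month. That figure is my assumption, so check the current pricing page before relying on it:

```python
HOURS_PER_BILLING_MONTH = 672  # assumption: 28 days x 24 hours


def hourly_price(monthly_price):
    """Hourly rate in dollars derived from the monthly price."""
    return monthly_price / HOURS_PER_BILLING_MONTH


def cost_of_test(monthly_price, hours_running):
    """Cost in dollars for a droplet, never more than the monthly price."""
    return min(hourly_price(monthly_price) * hours_running, monthly_price)


# A one-hour test on the $5/mo size.
print(round(cost_of_test(5, 1), 3))

# The same droplet forgotten for a 31-day month hits the monthly cap.
print(cost_of_test(5, 24 * 31))
```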

Next there are several regions. For me, New York 3 is automatically selected, but I can override that default choice if I want. I just leave it, because I don’t care. You might care, especially if you’re going to be providing a service to someone in Europe or Asia.

The next options are for settings like Private Networking, IPv6, backups, and user data. Keep in mind that backups cost money (duh?), so don’t enable that feature for anything you don’t want to spend 20% of your monthly fee on.

The next option is honestly why I love Digital Ocean so much. The image selection is so painless and easy that it puts AWS to shame. Here:

You can see that the choice defaults to Ubuntu current stable, but look at the other choices! Plus, see that Applications tab? Check this out:

I literally have a GitLab install running permanently in Digital Ocean, and the sum total of my effort was 5 seconds of clicking that button, and $10/mo (it requires a gig of RAM to run the software stack). So easy.

It doesn’t matter what you pick for spinning up a test instance, so you can go with the Ubuntu default or pick CentOS, or whatever you’d like. Below that selection, you’ll see the option for adding SSH keys. By default, you won’t have any listed, but you have a link to add a key, which pops open a text box where you can paste your public key text. The key(s) that you select will be added to the root user’s ~/.ssh/authorized_keys file, so that you can connect in without knowing the password. The machine can then be configured however you want. (Alternately, when selecting which image to spin up, you can spin up a previously-saved snapshot, backup, or old droplet which can be pre-configured (by you) to do what you need).

Click Create Droplet, and around a minute later, you’ll have a new entry in your droplet list that gives you the public IP to connect to. If you spun up a vanilla OS, SSH into it as the root user with one of the keys you specified, and if you selected one of the apps from the menu, try connecting to it over HTTP or HTTPS.
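I do all of this by clicking, but the same choices (name, region, size, image, SSH keys) map onto a single API call if you ever want to script it. The sketch below only builds the request against the v2 droplets endpoint and does not send it. The token, slugs, and key ID are placeholders, not real values:

```python
import json
import urllib.request

API_URL = "https://api.digitalocean.com/v2/droplets"


def build_create_request(token, name, region, size, image, ssh_key_ids):
    """Build (but do not send) a POST request that would create a droplet."""
    payload = {
        "name": name,
        "region": region,
        "size": size,
        "image": image,
        "ssh_keys": ssh_key_ids,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders; look up current slugs in the API docs.
request = build_create_request(
    token="YOUR_API_TOKEN",
    name="test-droplet",
    region="nyc3",
    size="EXAMPLE-SIZE-SLUG",
    image="EXAMPLE-IMAGE-SLUG",
    ssh_key_ids=[12345],
)
print(request.get_method(), request.full_url)
# Actually sending it would be: urllib.request.urlopen(request)
```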

That’s really about it. In an upcoming entry, we’ll be playing with a Digital Ocean droplet to do some cool stuff, but I wanted to get this out here so that you could start playing with it, if you don’t already. Make sure to remember, though, whenever you’re done with your machine, you need to destroy it, rather than just shut it down. Shutting it down makes it unavailable, but keeps the data around, and that means you’ll keep getting billed for it. Destroying it erases the data and removes the instance, and the existence of the instance is what causes you to be billed.
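If you script your test machines, the destroy step is worth scripting too, so it doesn't get forgotten. As a sketch, destroying a droplet through the v2 API is a DELETE on the droplet's ID. The token and ID here are placeholders, and the request is built but not sent:

```python
import urllib.request


def build_destroy_request(token, droplet_id):
    """Build (but do not send) a DELETE request that would destroy a droplet."""
    url = "https://api.digitalocean.com/v2/droplets/{}".format(droplet_id)
    return urllib.request.Request(
        url,
        headers={"Authorization": "Bearer " + token},
        method="DELETE",
    )


# Placeholder token and droplet ID.
destroy = build_destroy_request("YOUR_API_TOKEN", 3164494)
print(destroy.get_method(), destroy.full_url)
# Actually sending it would be: urllib.request.urlopen(destroy)
```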

Have fun, and let me know if you have any questions!