How I approach a new Python project

It’s been a long time since I’ve written a post, but that’s just because I haven’t had time. For real I haven’t had time, not like, “making excuses, I haven’t had time”. I stay busy, and I love it.

I’m in what passes for a lull starting today (and I’ve been sick to boot), so I answered a question on Reddit about Raspberry Pi development, and I got a PM from someone who read it, who was interested in learning how I approached a new project with Python. I wrote a long answer, so I figured I’d share it here. This totally counts as a blog entry ;-)

Hey there,

I saw your post on programming or rasp pi for someone learning python. My knowledge base is probably under theirs, however im still interested in learning python. Im actually in the process of making a magic mirror.

I was wondering where would be a good starting point to learn CLI and scraping data and putting together a gui. I would like to do as much as i personally could.

Cheers,

X

Here’s my answer:

Hi X!,

This is a really good guide:

http://docs.python-guide.org/en/latest/scenarios/scrape/

But there are lots of ways to do it. Since Python is so easy to develop in, there are tons of modules out there for it. In general, after I figure out what I want to work on, I look at what libraries already exist for dealing with it and use one of them if I can. If I can’t, I go back to first principles and look at what methods there are for interacting with the service I want to interface with.

For instance, let’s say that I live in Los Angeles (which I do), and I want to monitor earthquakes so that my magic mirror can let me know when I wake up if anything happened while I was asleep. The first thing I’d do is Google “python earthquake library”, and what’s the first hit? A very in-depth guide about how to use Python to monitor for earthquakes, including maps and stuff way beyond what I’m looking for.

Since that’s a bit overwhelming and I’m not quite ready to get into matplotlib (which is the graphical stuff they’re using), let’s see what the next few links are:

So we have our choice. Since I really just care about quakes affecting the LA basin, I’ll look through the libraries and see which ones have the best interface to allow me to geographically select things, and then move on from there.
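
To make “geographically select things” concrete, here’s a minimal sketch of the kind of filter I’d want, whichever library ends up supplying the data. It assumes each quake has already been boiled down to a dict with lat, lon, and magnitude keys (my own made-up shape, not any particular library’s), and the LA coordinates, radius, and magnitude cutoff are just illustrative defaults.

```python
import math

# Assumed coordinates for downtown Los Angeles (latitude, longitude).
LA = (34.05, -118.25)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def quakes_near(quakes, center=LA, radius_km=150.0, min_magnitude=2.5):
    """Keep quakes within radius_km of center and at or above min_magnitude.

    Each quake is a dict with "lat", "lon", and "magnitude" keys; that
    shape is a hypothetical one chosen for this sketch.
    """
    lat0, lon0 = center
    return [
        q for q in quakes
        if q["magnitude"] >= min_magnitude
        and haversine_km(lat0, lon0, q["lat"], q["lon"]) <= radius_km
    ]
```

If a library already offers a radius or bounding-box query, I’d use that instead and skip the math entirely.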

So that’s an existing library, but suppose we need to screen-scrape things? Like, what if you wanted to generate a number for the average rental prices in a zip code? Well, first search “python library rental prices”, and there’s nothing, so we’re going to have to do it the hard way.

There are almost certainly better ways, but just as an example, let’s take a look at ApartmentGuide.com (note, what we’re doing is probably against the terms of service, so you should probably not offer this as a public solution – just keep it for your personal use). That allows us to specify a zip code, a couple of quick filters, and when we search, it gives us most of that in the URL. Easy-peasy!

When I search for two bedroom apartments in the zipcode of my job, I get this link:

http://www.apartmentguide.com/zip/90250-Apartments-For-Rent/2-beds-1z141y8/

As a human, it’s easy for me to look at that URL and figure out what everything is, with the exception of the 1z141y8. I’m worried that it might be a unique ID, like a cookie almost, that makes sure that I was the one that loaded the previous page. When I look at the original page’s (very ugly) source, though, I see that it shows up here:

<meta class="pageInfo" content="2-beds-1z141y8" name="refinements">
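
If you’d rather pull that value out with code than by squinting at the source, here’s a small sketch using only the standard library’s html.parser. The class and function names are mine, and it only assumes the page still carries a meta tag named “refinements” like the one above.

```python
from html.parser import HTMLParser


class RefinementsFinder(HTMLParser):
    """Collect the content attribute of every <meta name="refinements"> tag."""

    def __init__(self):
        super().__init__()
        self.refinements = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "refinements":
            self.refinements.append(attributes.get("content"))


def find_refinements(html):
    """Return the refinements strings found in a page's HTML."""
    finder = RefinementsFinder()
    finder.feed(html)
    return finder.refinements
```

Feeding it the meta tag above should hand back the same 2-beds-1z141y8 string that appears in the URL.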

On a hunch, I go to the results page again, change the zip code in the URL to one in Columbus, Ohio, where I lived for a while, and hit enter. The results for Columbus pop up, so the 1z141y8 isn’t tied to my session, and I think we’re good to go.
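
In code, that hunch boils down to treating the URL as a template. This is a sketch based on the single URL above, so the pattern is inferred rather than documented, and the site may well have changed it since. The function name is made up.

```python
BASE_URL = "http://www.apartmentguide.com"


def search_url(zip_code, refinements="2-beds-1z141y8"):
    """Build a results URL for a zip code, reusing the refinements string.

    The path layout is inferred from one observed URL, not from any
    published API.
    """
    return f"{BASE_URL}/zip/{zip_code}-Apartments-For-Rent/{refinements}/"
```

Calling search_url("90250") reproduces the link above, and swapping in another zip code gives you the equivalent search somewhere else.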

So, given the very first link I mentioned (remember?), I’d go through the web page results with the HTML/XML scraper from that guide and get the information I needed, then do whatever I needed with it: average it, or whatever.
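
The “average, or whatever” step might look something like this. It’s a rough sketch that assumes prices show up in the page text as dollar amounts like $1,450. I haven’t mapped out the real markup, so the regex is a guess at the general shape and the function name is hypothetical.

```python
import re
from statistics import mean

# Matches "$1,450" or "$1450" and captures the digits (with commas).
PRICE_RE = re.compile(r"\$(\d{1,3}(?:,\d{3})+|\d+)")


def average_price(text):
    """Average every dollar amount found in text, or None if there are none."""
    prices = [int(amount.replace(",", "")) for amount in PRICE_RE.findall(text)]
    return mean(prices) if prices else None
```

One caveat: a listing that shows a range (say, a low and a high price) would get counted twice, so for anything serious you’d want to pick out the right element with the scraper first and only then parse the number.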

Does that make sense? Thanks!

–Matt

So, generally, that’s how I approach the problem, regardless of the language. Sometimes, if I’m really into a project, I’ll actually write the library myself (like I did with libMBTA), but the world is gradually becoming one big API, and if you can program, you can take advantage of it!

I hope you enjoyed it. If so, let me know in the comments. Thanks!

Debian Jessie and Puppet

Please correct me if this blog entry is wrong, because I really, truthfully hope it is.

It seems that Debian Jessie is not going to be receiving backports to Puppet 3 (client, anyway). The way forward is through Puppet 4 (which most of you have probably known about for a while). Like me, you probably hoped to not have to go there so soon. Well, maybe not, but I had hoped to not go there so soon, anyway.

So the first thing I started to do was build out a Puppet 4 testing Vagrant box, which was surprisingly hard. Basically, Puppet 3 supported the “--manifestdir” argument, but Puppet 4 doesn’t, so if you install the puppet-agent package (which is how you install Puppet 4 rather than Puppet 3), it dies with “Could not parse application options: invalid option: --manifestdir”. Not cool.

The solution is to redesign your Vagrant puppet directory. Instead of “puppet/init.pp” and “puppet/modules/”, you need to make “puppet/environments/environmentname/manifests/site.pp” and “puppet/environments/environmentname/modules/”, respectively. The puppet config for my box looks like this:

config.vm.provision :puppet do |puppet|
  puppet.hiera_config_path = "hiera.yaml"
  puppet.environment_path = "puppet/environments"
  puppet.environment = "vagrant"
end

The directory structure looks like this:

├── bootstrap.sh
├── hiera
│   ├── common.yaml
│   └── nodes
├── hiera.yaml
├── puppet
│   ├── environments
│   │   └── vagrant
│   │       ├── manifests
│   │       │   └── site.pp
│   │       ├── modules
│   │       │   ├── collectd
│   │       │   ├── concat
│   │       │   ├── redis
│   │       │   ├── stdlib
│   │       │   └── sysctl
│   │       └── Puppetfile
│   └── Puppetfile.lock
├── README.md
└── Vagrantfile
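
If you have more than one box to convert, a few lines of Python can stamp out the skeleton of that layout. This is only a sketch: the function name is made up, it creates the directories and an empty site.pp, and moving your existing init.pp contents and modules into place is still on you.

```python
from pathlib import Path


def scaffold_environment(project_root, environment="vagrant"):
    """Create puppet/environments/<environment>/{manifests,modules}.

    Existing directories and an existing site.pp are left untouched.
    Returns the path to the environment directory.
    """
    env_dir = Path(project_root) / "puppet" / "environments" / environment
    (env_dir / "manifests").mkdir(parents=True, exist_ok=True)
    (env_dir / "modules").mkdir(parents=True, exist_ok=True)
    # An empty site.pp placeholder; touch() does not truncate an existing file.
    (env_dir / "manifests" / "site.pp").touch(exist_ok=True)
    return env_dir
```

The environment name needs to match whatever you set puppet.environment to in the Vagrantfile.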

Now, you should know… a lot of the puppet code will work without changing, but there were some not-small changes, including a few real head scratchers. For instance:

Empty Strings in Boolean Context are true

In previous versions of Puppet, an empty string was evaluated as a false boolean value. You would see this in variable and parameter default values where conditional checks would be used to determine if someone passed in a value or left it blank.

class empty_string_defaults (
  $parameter_to_check = ''
) {
  if $parameter_to_check {
    $parameter_to_check_real = $parameter_to_check
  } else {
    $parameter_to_check_real = 'default value'
  }
}

Puppet’s old behavior of evaluating the empty string as false would allow you to set the default based on a simple if-statement. In Puppet 4.x, this behavior is flipped and $parameter_to_check_real will be set to an empty string.

You can check your existing codebase for this behavior with a puppet-lint plugin.

See the language page on boolean values for more info.

I… um… I don’t even really know what to say to that. I mean, they completely flipped the entire logic of the test around. That’s pretty unnecessary, in my book. I understand what they’re trying to say, that even an empty value isn’t False, but it’s hard to think of a more common shortcut when first writing code. I’m not going to argue that checking against an empty string is a good way to determine true or false, but wow, to change the behavior of the language entirely like that? That’s intense.
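
If you just want a quick feel for how exposed a codebase is before setting up the puppet-lint plugin, a crude scan for empty-string assignments gets you a list of places to look. To be clear, this is my own rough stand-in, not the plugin: it’s a regex over lines, it doesn’t understand Puppet syntax, and it flags any variable set to an empty string whether or not it is later used in an if.

```python
import re

# A variable assigned '' or "", followed by a comma, a closing paren,
# or the end of the line.
EMPTY_DEFAULT_RE = re.compile(r"""\$(\w+)\s*=\s*(?:''|"")(?=\s*(?:[,)]|$))""")


def find_empty_string_defaults(manifest_text):
    """Return (line_number, variable_name) pairs for empty-string assignments."""
    hits = []
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crudely drop trailing comments
        for match in EMPTY_DEFAULT_RE.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits
```

Run against the example class above, it should point at the $parameter_to_check default on the second line. Every hit still needs a human to decide whether the surrounding conditional relied on the old behavior.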

Another one I found that will probably kill code:

Regular Expressions Against Non-Strings

Matching a value that is not a string with a regular expression now raises an error. In 3.x, other data types were converted to string form before matching (often with surprising and undefined results).

$securitylevel = 2

case $securitylevel {
  /[1-3]/: { notify { 'security low': } }
  /[4-7]/: { notify { 'security medium': } }
  default: { notify { 'security high': } }
}

Prior to Puppet 4.0, the first regex would match, and the notify { 'security low': } resource would be put into the catalog.

Now, in Puppet 4.0, neither of the regexes would match because the value of $securitylevel is an integer, not a string, and so the default condition would match, resulting in the inclusion of notify { 'security high': } in the catalog.

So, I can kind of see that. A 2 is clearly a number. But on the other hand, the puppet code will happily cast the string '30' as a number because that’s valid. But the number 30 can easily be turned into a string, too: “30”. See?

I actually tested whether the regex would pick it up if you explicitly declared the “2” as a string. Nope, not at all:

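# Note: this assigns $securityLevel (capital L), but the case statement
# below reads $securitylevel; Puppet variable names are case-sensitive.
$securityLevel = "2"

case $securitylevel {
  /[1-3]/: { notify { 'security low': } }
  /[4-5]/: { notify { 'security medium': } }
  default: { notify { 'security high': } }
}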

/tmp/vagrant-puppet/environments/vagrant/manifests$ sudo puppet apply --modulepath=../modules/ ./site.pp
Warning: Facter: timeout option is not supported for custom facts and will be ignored.
Notice: Compiled catalog for test-monitoring-client.spacex.corp in environment production in 4.16 seconds
Notice: security high

Because of course not.

Anyway, I’m currently dealing with this, so I figured I’d write my first blog entry in a while so you could share in the fun. Good luck! And if you have a good technique for testing existing code against new puppet builds, let me know! I’m considering my CI options, but I’ve got a lot of new tests to write, it seems like.

Great Open Positions at Northeastern CCIS

I’ve landed in Los Angeles, and I’m getting settled in temporary housing until I find my own place, but it’s been a really busy couple of weeks, and I just realized that I didn’t get a chance to post about the open positions that my (now former) team has.

First, and most obviously, there’s my old position, that of the Networking & Virtualization Administrator. The position is officially posted on Northeastern’s Careers page, but I can tell you that you’d be responsible for a medium-sized, relatively flat network infrastructure. There are a few dozen VLANs, all statically routed from the core switches, and around a thousand lit switchports. The hardware is mostly Cisco Catalyst, with the core being Cisco Nexus 5548s, although there are some virtual pfSense boxes running around too. You would be working with the (pretty friendly and competent) central ITS network admin to coordinate staff and faculty moves around the infrastructure, and with the university’s security officer (who is also surprisingly friendly, given his line of work) whenever something weird pops up.

The role is also responsible for the VMware cluster, which currently consists of around 15 ESXi nodes and two vCenter instances: one for “production” use, which has the vSphere Essentials Plus license, and one for the educational cluster, built out using VMware Academic licenses for classroom and academic use. They’re backed by NetApp and Nimble storage, and it’s this part of the job responsibilities that gives you a little more creativity to solve problems, since professors usually want interesting things. I’ve built some useful stuff in PowerShell, but there’s no reason you have to use that long-term, if you want to solve the problems yourself.

Anyway, I really enjoyed my time in this position, and to be honest, I really miss the other staff members and students there.

In addition, the CCIS staff is growing. We got a new dean a little over a year ago, and one of the things she wants to do is to offer management of researchers’ clusters in a more active manner, so we are looking for another Linux sysadmin (pretty much all of the researchers do work on Linux).

This position will involve a lot of work with our current Linux admin to bring over the technology he has built for our “managed” machines and apply it to our “unmanaged” or “soon to be managed” researcher-owned machines. Basically, there’s nothing like this right now, so you would be inventing the role as you go. Exciting! Challenging! Rewarding!

Anyway, please, if you’re looking for a position in Boston somewhere, take a look at Northeastern. It’s easy to get to, there’s free tuition for you, your spouse, and your children, and I feel like the staff that I worked with there are my family, and I miss them :-)

If you have any questions, please drop me an email and I’ll be happy to help. Thanks!

A blog for IT Admins who do everything by an IT Admin who does everything