It’s been a long time since I’ve written a post, but that’s just because I haven’t had time. For real I haven’t had time, not like, “making excuses, I haven’t had time”. I stay busy, and I love it.
I’m in what passes for a lull starting today (and I’ve been sick to boot), so I answered a question on Reddit about Raspberry Pi development. Someone who read it sent me a PM asking how I approach a new project with Python. I wrote a long answer, so I figured I’d share it here. This totally counts as a blog entry ;-)
I saw your post on programming or rasp pi for someone learning python. My knowledge base is probably under theirs, however im still interested in learning python. Im actually in the process of making a magic mirror.
I was wondering where would be a good starting point to learn CLI and scraping data and putting together a gui. I would like to do as much as i personally could.
Here’s my answer:
This is a really good guide:
But there are lots of ways to do it. Since Python is so easy to develop on, there are tons of modules out there for almost anything. After I figure out what I want to work on, I look at the existing libraries for dealing with it and use one of them if I can. If I can’t, I go back to first principles and look at what methods exist for interacting with the service I want to interface with.
For instance, let’s say that I live in Los Angeles (which I do), and I want to monitor earthquakes so that my magic mirror can tell me when I wake up whether anything happened while I was asleep. The first thing I’d do is Google “python earthquake library”, and what’s the first hit? A very in-depth guide about how to use Python to monitor for earthquakes, including maps and stuff way beyond what I’m looking for.
Since that’s a bit overwhelming and I’m not quite ready to get into matplotlib (which is the graphical stuff they’re using), let’s see what the next few links are:
- Quakefeeds – https://pypi.python.org/pypi/quakefeeds/0.2
- Earthquakes library – https://github.com/RealTimeWeb/earthquakes/tree/master/python
- Obspy – https://github.com/obspy/obspy/wiki
So we have our choice. Since I really just care about quakes affecting the LA basin, I’ll look through the libraries and see which ones have the best interface to allow me to geographically select things, and then move on from there.
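Whichever library wins, the geographic selection itself is simple. Here’s a minimal sketch, assuming the library hands you each quake as a dict with a latitude, longitude, and magnitude. The field names, the `quakes_near` helper, and the 100 km radius are all mine for illustration, not the API of any of the libraries above:

```python
import math

# Approximate coordinates for downtown Los Angeles.
LA_LAT, LA_LON = 34.05, -118.25


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def quakes_near(quakes, lat=LA_LAT, lon=LA_LON, max_km=100.0):
    """Keep only the quakes within max_km of the given point."""
    return [q for q in quakes
            if haversine_km(lat, lon, q["lat"], q["lon"]) <= max_km]


# Made-up sample data, just to show the shape of the input (hypothetical).
sample = [
    {"place": "near Pasadena", "lat": 34.15, "lon": -118.14, "mag": 2.9},
    {"place": "near San Francisco", "lat": 37.77, "lon": -122.42, "mag": 3.4},
]
nearby = quakes_near(sample)
```

With the sample above, only the Pasadena entry survives the filter. In a real project you’d swap the sample list for whatever the library returns and adapt the field names.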
So that’s an existing library, but suppose we need to screen-scrape things? Like, what if you wanted to generate a number for the average rental price in a zip code? Well, first search for “python library rental prices”, and there’s nothing, so we’re going to have to do it the hard way.
There are almost certainly better ways, but just as an example, let’s take a look at ApartmentGuide.com (note: what we’re doing is probably against the terms of service, so you should probably not offer this as a public solution; just keep it for your personal use). The site lets us specify a zip code and a couple of quick filters, and when we search, it puts most of that in the URL. Easy-peasy!
When I search for two bedroom apartments in the zipcode of my job, I get this link:
As a human, it’s easy for me to look at that URL and figure out what everything is, with the exception of the 1z141y8. I’m worried that it might be a unique ID, like a cookie almost, that makes sure that I was the one that loaded the previous page. When I look at the original page’s (very ugly) source, though, I see that it shows up here:
<meta class="pageInfo" content="2-beds-1z141y8" name="refinements">
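If you’d rather not squint at the page source, a few lines of Python can pull that value out. This is a rough sketch using only the standard library’s `html.parser`; the `RefinementsParser` class and `find_refinements` helper are names I made up, and the page string is just the meta tag from above wrapped in a bare-bones document:

```python
from html.parser import HTMLParser


class RefinementsParser(HTMLParser):
    """Collect the content of every <meta name="refinements"> tag."""

    def __init__(self):
        super().__init__()
        self.refinements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "refinements":
            self.refinements.append(attrs.get("content"))


def find_refinements(html):
    """Return the refinements values found in an HTML string."""
    parser = RefinementsParser()
    parser.feed(html)
    return parser.refinements


# Stand-in for the real page source (hypothetical wrapper around the tag).
page = (
    '<html><head>'
    '<meta class="pageInfo" content="2-beds-1z141y8" name="refinements">'
    '</head></html>'
)
found = find_refinements(page)
```

Running the same check against pages for a couple of different searches is a quick way to see whether the odd-looking token changes per visitor or only per filter.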
On a hunch, I go to the results page again, change the zip code to one in Columbus, Ohio, where I lived for a while, and hit enter. The results for Columbus pop up, so the token isn’t tied to my session, and I think we’re good to go.
So, given the very first link I mentioned (remember?), I’d go through the web page results with the XML scraper and get the information I needed, then do whatever I needed with it: average it, or whatever.
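Just to make the averaging step concrete, here’s a small sketch. It cheats by using a regular expression instead of a proper scraper, and the HTML snippet and its class names are invented, since I’m not reproducing the real page’s markup here. The `extract_prices` and `average_price` helpers are likewise hypothetical names:

```python
import re
from statistics import mean

# Matches dollar amounts such as $950, $1850, or $1,850.
PRICE_RE = re.compile(r"\$(\d{1,3}(?:,\d{3})+|\d+)")


def extract_prices(text):
    """Return every dollar amount in the text as an integer."""
    return [int(amount.replace(",", "")) for amount in PRICE_RE.findall(text)]


def average_price(text):
    """Average the dollar amounts in the text, or None if there are none."""
    prices = extract_prices(text)
    return mean(prices) if prices else None


# Invented stand-in for a results page (hypothetical markup).
results_html = """
<div class="listing"><span class="price">$1,850</span></div>
<div class="listing"><span class="price">$2,100</span></div>
<div class="listing"><span class="price">$1,975</span></div>
"""
average = average_price(results_html)
```

A regex like this grabs every dollar figure on the page, including things like deposits or price ranges, so on a real page you’d want to narrow it to the listing elements first.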
Does that make sense? Thanks!
So, generally, that’s how I approach the problem, regardless of the language. Sometimes, if I’m really into a project, I’ll actually write the library myself (like I did with libMBTA), but the world is gradually becoming one big API, and if you can program, you can take advantage of it!
I hope you enjoyed it. If so, let me know in the comments. Thanks!