July 22, 2009

beautifulsoup, bonktown and growl

Here’s a little script that combines two of my favorite pastimes: Python programming and cycling. bonktown.com is a great site that offers steep discounts on road cycling gear. They sell only one item at a time, and they typically sell that item until it is gone. I’ve gotten some great deals on clothing and other stuff on that site. Bonktown helps you know what’s currently for sale in a number of ways, including a nice dashboard widget that pops up a notifier when something new comes on sale. The problem is that over time I’ve started to ignore the Growl notifiers for bonktown, because I’m not interested in lots of the stuff they sell.

So, I wrote this Python script that lets me look for just the stuff I am interested in buying. It works from a file of regular expressions, one per line, that I use to search the item titles when something goes on sale at bonktown. If the item matches something I’m looking for, I get a Growl notification. If not, I don’t hear about it.

Here’s the code:


#!/usr/bin/env python2.6

import re
import urllib
from BeautifulSoup import BeautifulSoup
import Growl

# Register this script with Growl so it can post notifications
name = "MyBonk"  # was BonkMe
notifications = ["search_hit"]
notifier = Growl.GrowlNotifier(name, notifications)
notifier.register()

# Read file of search terms: one regular expression per line, blank lines skipped
myTerms = [t.strip() for t in open("/Users/bmiller/lib/bonk_items.txt") if t.strip()]


# Get the latest page (urlopen needs the scheme, not just the host name)
bt = urllib.urlopen("http://www.bonktown.com")

doc = BeautifulSoup(bt.read())

# Collect the elements whose id attributes contain these names
itemlist = doc.findAll(id=re.compile("item_title"))
price = doc.findAll(id=re.compile("price"))
desc = doc.findAll(id=re.compile("item_description"))

# Notify once for every (term, item) pair where the term matches the item title
for term in myTerms:
    for i in range(len(itemlist)):
        if itemlist[i] and re.search(term, itemlist[i].contents[0], re.IGNORECASE):
            notifier.notify("search_hit",
                            itemlist[i].contents[0],
                            desc[i].contents[7].contents[0],
                            sticky=False)
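The matching loop in the script can also be pulled out into a small function, which makes it easy to try out new search terms without touching Growl or the network. This is only a sketch: `load_terms` and `matching_titles` are names I made up for illustration, and the terms and titles below are invented sample data, not real bonktown listings.

```python
import re


def load_terms(lines):
    """Strip whitespace from each line and drop the blank ones."""
    return [line.strip() for line in lines if line.strip()]


def matching_titles(terms, titles):
    """Return the titles that match at least one term, ignoring case.

    Each title appears at most once, in its original order.
    """
    patterns = [re.compile(term, re.IGNORECASE) for term in terms]
    return [title for title in titles
            if any(p.search(title) for p in patterns)]


# Hypothetical lines, as they might appear in the search terms file
terms = load_terms(["bib shorts\n", "jersey\n", "\n", "carbon.*wheel\n"])

# Hypothetical item titles
titles = ["Road Helmet", "Mens Bib Shorts", "Carbon Clincher Wheelset"]

hits = matching_titles(terms, titles)
print(hits)
```

One difference from the script: here an item that matches several terms is reported once, whereas the nested loops send one notification per matching term.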



This script makes use of several modules:

  • Growl

  • BeautifulSoup

  • urllib

  • re



I would have liked to use one of the standard library HTML/XML parsers, but I could not find one that was as convenient or easy to use as BeautifulSoup. If you can tell me how to parse messy HTML with one of the standard library XML modules, please let me know.
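For what it’s worth, here is one possible starting point using only the standard library. The event-driven `HTMLParser` class tolerates unclosed and stray tags, though it takes noticeably more code than BeautifulSoup. Treat this as a rough sketch rather than a drop-in replacement: `IdTextCollector` is a hypothetical helper, the sample markup is invented, and the import shown is the Python 3 spelling (on Python 2 the class lives in the `HTMLParser` module). It assumes each matching element has a proper closing tag.

```python
import re
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class IdTextCollector(HTMLParser):
    """Collect the text inside every element whose id matches a pattern."""

    def __init__(self, id_pattern):
        HTMLParser.__init__(self)
        self.id_pattern = re.compile(id_pattern)
        self.results = []   # one text string per matching element
        self._tag = None    # tag name of the element being captured
        self._depth = 0     # nesting depth of same-named tags inside it
        self._chunks = []   # text pieces seen so far in that element

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            elem_id = dict(attrs).get("id") or ""
            if self.id_pattern.search(elem_id):
                self._tag, self._depth, self._chunks = tag, 1, []
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and collapse runs of whitespace
                self.results.append(" ".join("".join(self._chunks).split()))
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._chunks.append(data)


# Invented, deliberately messy markup: a stray <br> and an unclosed <p>
html = ('<div id="item_title_1">Bib <b>Shorts</b><br></div>'
        '<p>unclosed<div id="price_1">$49.99</div>')

collector = IdTextCollector("item_title|price")
collector.feed(html)
collector.close()
print(collector.results)
```

The collector keeps the matched text in document order, so titles and prices come back interleaved. Running one collector per pattern would give separate lists like the `itemlist` and `price` variables in the script.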