Quick thoughts on IT, fun projects, and the singularities I come across.

Web Scraping, PHP, and Wheel of Fortune

My fiancee loves Wheel of Fortune and watches whenever she can. The clever folks over at Sony (producers of Wheel of Fortune) introduced a loyalty program called “Wheel Watchers.” People who sign up get a “Spin ID,” and if your Spin ID is chosen for a given episode, you win one of the prizes given away on that show. The only catch is that you have to watch every night.

This is a lot of random background information, but there’s a reason. My fiancee asked me to write an application that would check the Spin ID every day and notify us if we won. This seemed like a great reason to learn some web scraping with PHP (although you could probably do this in just about any language).

Pulling Raw Data

First things first, you need a decent website to scrape the information from! Strangely, the official Wheel of Fortune site doesn’t offer up the winning spin IDs. Luckily, there are a handful of websites that do report the winning numbers. For my application, I’ll be using http://wheeloffortuneclub.blogspot.com

We’ll pull the entire web page first and then parse for the Spin ID later. In PHP, this is quite simple:
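A minimal sketch of that fetch step, assuming PHP's built-in file_get_contents and a server with allow_url_fopen enabled. The helper name fetchPage is hypothetical and only here for illustration:

```php
<?php
// Hypothetical helper: download a page and return its HTML as one string.
// Assumes allow_url_fopen is enabled so file_get_contents() can read URLs.
function fetchPage(string $url): string
{
    // @ suppresses the PHP warning; failure is reported via the exception below.
    $html = @file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}
```

Calling fetchPage('http://wheeloffortuneclub.blogspot.com') would then hand back the page's HTML as a string, ready for parsing.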

Keep in mind that many site admins will not take kindly to you scraping their site, especially if you’re doing it frequently. In this example, we’ll only need to scrape the information once a day, so it shouldn’t be a problem.

Parsing for the Spin ID

When scraping web pages, regular expressions come in handy. To pull the specific data you’re looking for, you may need to match on a combination of the content itself and the HTML tags and attributes around it. In the case of the Spin ID, it’s two capital letters followed by six or seven digits. This is a pretty specific format, so it’ll be pretty easy to pull using regex.

Now, regex syntax can be tough if you don’t use it on a regular basis. Regex 101 is an awesome site to use as a reference for regex syntax or to test your expressions.

For the Spin ID, our regular expression is “[A-Z][A-Z]\d{6,}”. This translates to two capital letters ([A-Z][A-Z]) followed by 6 or more digits (\d{6,}); the quantifier sets no upper limit. We’ll create a variable for our regular expression and search our previously fetched web page for it using preg_match, which captures only the first match:
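A minimal sketch of that preg_match call, wrapped in a hypothetical helper named findSpinId. Note that PHP's preg functions expect the pattern to be wrapped in delimiters, here forward slashes:

```php
<?php
// Hypothetical helper: return the first Spin ID found in $html, or null if none.
function findSpinId(string $html): ?string
{
    // Two capital letters followed by six or more digits, inside / delimiters.
    $pattern = '/[A-Z][A-Z]\d{6,}/';

    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    if (preg_match($pattern, $html, $match) === 1) {
        return $match[0];
    }
    return null;
}
```

Returning null when nothing matches keeps a missing Spin ID distinguishable from a real one.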

We’re passing three parameters to preg_match – the pattern we’re seeking, the string subject, and an output variable ($match in this case).

The $match variable is actually an array, so we’ll refer to its first element ($match[0]), which holds the full matched text. For now, we’ll just echo that value out to the page to confirm that everything’s working!
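To make the shape of $match concrete, here is a small sketch that runs the pattern against a made-up sample string (an assumption standing in for the fetched page) and echoes the result:

```php
<?php
// Sample input standing in for the fetched page (made up for illustration).
$page = '<p>Tonight\'s winning Spin ID is KW6426861. Last week: AB1234567.</p>';

preg_match('/[A-Z][A-Z]\d{6,}/', $page, $match);

// $match[0] holds the text matched by the whole pattern.
// Only the first match is captured, so the second ID is ignored.
$spinId = $match[0];

echo $spinId, PHP_EOL; // prints KW6426861
```

Because the pattern has no capture groups, $match contains exactly one element.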

So the complete code looks like this:
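The original listing is not reproduced here, so the following is only a sketch of how the fetch and parse steps might fit together. The function name latestSpinId is hypothetical, and it again assumes allow_url_fopen is enabled:

```php
<?php
// Hypothetical end-to-end helper: fetch $source and return the first Spin ID in it.
function latestSpinId(string $source): ?string
{
    $page = @file_get_contents($source);
    if ($page === false) {
        return null; // fetch failed
    }
    if (preg_match('/[A-Z][A-Z]\d{6,}/', $page, $match) !== 1) {
        return null; // no Spin ID on the page
    }
    return $match[0];
}
```

Echoing latestSpinId('http://wheeloffortuneclub.blogspot.com') on a page would then display the first Spin ID found there, or nothing if the fetch or the match fails.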

You can check out the page here.

Automation with Cron and Email

So we’ve got a PHP page that will parse and return the most recent winning Spin ID. So what? I could have just browsed to the Spin ID website and gotten the same information. We need to automate the parsing, compare the result to our own Spin ID (to see if we’re a winner), and send ourselves an email if it’s a match. Here’s the code in its entirety:
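The original listing is not reproduced here, so this is a sketch of the compare-and-notify step under a few assumptions: the helper name checkSpinId is hypothetical, the notification is passed in as a callback so the logic can be tried without sending mail, and the email address in the commented wiring is a placeholder.

```php
<?php
// Hypothetical helper: look for a winning Spin ID in $page and, if it equals
// $mySpinId, call $notify with a subject and message. Returns true on a win.
function checkSpinId(string $page, string $mySpinId, callable $notify): bool
{
    if (preg_match('/[A-Z][A-Z]\d{6,}/', $page, $match) !== 1) {
        return false; // no Spin ID found on the page
    }
    if ($match[0] !== $mySpinId) {
        return false; // someone else won
    }
    $notify('Wheel of Fortune winner!', "Spin ID {$match[0]} was drawn.");
    return true;
}

// Example wiring (not run here): send the notification with PHP's mail().
// The address is a placeholder.
// $page = file_get_contents('http://wheeloffortuneclub.blogspot.com');
// checkSpinId($page, 'KW6426861', function (string $subject, string $body): void {
//     mail('you@example.com', $subject, $body);
// });
```

Like the approach described above, this only compares against the first Spin ID on the page, so it relies on the most recent winner being listed first.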

The Spin ID “KW6426861” was the most recent winning ID at the time I wrote this script, so the check resolved to true and sent me a convenient notification email. Awesome. Now, to finish our project, we just need to run the PHP script regularly with a cron job. If you’re hosting on your own home server, you can write a crontab entry that runs php -f /path/to/your/php/script.php at whatever interval you want.

If you are using external hosting, most cPanel setups offer a cron feature. Again, you just need to provide the command (“php -f” in this case), the path to your PHP script, and your interval. I used “0 */12 * * *” to check every 12 hours (at minute 0 of every twelfth hour, so midnight and noon).

That’s it! A very simple but powerful PHP script in just 14 lines of code!

