Runescape Market Analysis (Part 1): Gathering the Data
I have a soft spot in my heart for Runescape. I spent a lot of time in my Junior High's computer lab casting steely glances around the room as my friends and I endeavored to play without the teacher seeing what we were up to, instead of practicing our touch typing (snore!).
It was that fuzzy feeling, coupled with my more recent interest in teaching myself Machine Learning and Python, that inspired me to make a market analysis bot for the Runescape Grand Exchange. The bot gathers the day's data, adds it to a 3+ year dataset scraped from the web, analyzes the last 20 days, and tweets the items that are projected to increase in price at some point in the next 20 days.
I'm not sure if this has been done before. I'm sure it has. But it was certainly a fun learning experience either way.
Before I begin, I want to give a very big shout out to Harrison Kinsley (who has no idea who I am) and his YouTube channel. His video series on pattern analysis with Python makes up the bulk of the code for the bot, and although I'm hoping to improve its performance someday, it does not by any means run on an original algorithm. Check out Harrison's videos on pattern recognition for a more in-depth explanation of how this all works.
Let's start with how I gathered the data.
There are a number of websites out there that provide historical information on the Grand Exchange. I chose Grand Exchange Watch, which presents its data as a table of dates and the prices recorded on those dates.
Unfortunately, as far as I know there's no way to export data from Grand Exchange Watch, so I had to scrape the data myself using BeautifulSoup.
I grabbed a list of item IDs from the Runescape website and headed on over to Grand Exchange Watch, where I ran into my first problem. Unlike the official Runescape website, Grand Exchange Watch includes both the item name and the item ID in the URL.
If I'm going to programmatically grab all the data I need, I have to be able to go to an arbitrary item page on this website. But all I have is a list of item IDs, so I wrote a quick function that uses requests and the Runescape API to look up item names from their ID numbers.
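A minimal sketch of that lookup might look like the following. It assumes the public Grand Exchange item-detail endpoint (`catalogue/detail.json`), which returns JSON with the item's name nested under an `"item"` key; the exact response shape is worth double-checking against the live API:

```python
import requests

# Public RuneScape Grand Exchange item-detail endpoint. It returns JSON
# shaped roughly like {"item": {"id": 4151, "name": "Abyssal whip", ...}}.
API = "https://services.runescape.com/m=itemdb_rs/api/catalogue/detail.json"

def detail_url(item_id):
    # Build the lookup URL for a numeric item ID.
    return f"{API}?item={item_id}"

def name_from_id(item_id):
    # One network round-trip per item -- slow, but fine for a one-off scrape.
    data = requests.get(detail_url(item_id), timeout=10).json()
    return data["item"]["name"]
```

Calling `name_from_id` for each ID in the list yields the name/ID pairs needed to build the Grand Exchange Watch URLs.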
So now I have a list of item names and IDs to append to my URLs, and it's just a matter of grabbing the appropriate dates and prices from the table on each page. Note that if I want more than the most recent 20 days, I'll have to go through every page of the table, which is simply a matter of appending a page number to the URL.
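Building those paginated URLs is a couple of lines of string formatting. The path layout below (`/item/<id>/<name>?page=<n>`) is illustrative only, not Grand Exchange Watch's documented scheme:

```python
# Hypothetical URL layout -- the real Grand Exchange Watch paths may differ.
BASE = "http://www.grandexchangewatch.com/item"

def slugify(name):
    # "Abyssal whip" -> "abyssal-whip", for use in the URL path.
    return name.lower().replace(" ", "-")

def page_url(item_id, item_name, page=1):
    # Page 2 of an item's table is the same URL with ?page=2 appended.
    return f"{BASE}/{item_id}/{slugify(item_name)}?page={page}"
```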
I used requests to grab each page's source and stuck the next part into two big for loops. I'm not sure this is the most Pythonic way to do what I'm trying to do, but it works for me, and that's all I really care about.
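Roughly, the two loops look like this: the outer loop walks the table pages, the inner loop walks the rows on each page. The table-parsing assumes plain `<tr><td>date</td><td>price</td></tr>` rows and the URL layout is again an assumption, so both would need adjusting to the site's real markup:

```python
import requests
from bs4 import BeautifulSoup

def parse_price_rows(html):
    # Pull (date, price) pairs out of one page of the price table.
    # Assumes simple two-cell <td> rows; header rows (<th>) are skipped
    # automatically because they contain no <td> cells.
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            pairs.append((cells[0], cells[1]))
    return pairs

def scrape_item(item_id, item_name, pages=60):
    # Outer loop: one request per table page. Inner loop: one row per day.
    history = []
    for page in range(1, pages + 1):
        url = (f"http://www.grandexchangewatch.com/item/"
               f"{item_id}/{item_name.lower().replace(' ', '-')}?page={page}")
        html = requests.get(url, timeout=10).text
        for row in parse_price_rows(html):
            history.append(row)
    return history
```

A polite scraper would also sleep between requests (e.g. `time.sleep(1)` inside the page loop) so as not to hammer the site.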
And there we have it. This will grab 60 pages of table data for a list of items, which is a little over 3 years of prices, every day, for each item. In Part 2 I'll go over how I analyze this historical data.