I have a soft spot in my heart for Runescape. I spent a lot of time in my Junior High's computer lab casting steely glances around the room as my friends and I endeavored to play without the teacher seeing what we were up to, instead of practicing our touch typing (snore!).

It was that fuzzy feeling, coupled with my more recent interest in teaching myself Machine Learning and Python, that gave me the inspiration to make a market analysis bot for the Runescape Grand Exchange. The bot gathers the day's data, adds it to a 3+ year long dataset scraped from the web, analyzes the last 20 days, and tweets the items that are projected to increase in price some time in the next 20 days.
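At a high level, a single run of the bot looks something like the sketch below. Every function name in it is just a hypothetical stand-in for the pieces covered in this post and in Part 2, not the actual code:

#a rough outline of the bot's daily run; every function here is a
#hypothetical stand-in for code described in this series
def daily_run():
    #Part 1: scrape today's Grand Exchange prices and add them
    #to the 3+ year historical dataset
    todays_prices = scrape_todays_prices()
    dataset = add_to_dataset(todays_prices)

    #Part 2: compare the last 20 days against historical patterns
    #and tweet the items projected to rise in price
    risers = find_projected_risers(dataset)
    tweet_items(risers)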

I'm not sure if this has been done before. I'm sure it has. But it was certainly a fun learning experience either way.

Before I begin I want to give a very big shout out to Harrison Kinsley (who has no idea who I am) and his YouTube channel. His video series on pattern analysis with Python makes up the bulk of the code for the bot, and although I'm hoping to improve its performance someday, it does not by any means run on an original algorithm. Check out Harrison's videos on pattern recognition for a more in-depth explanation of how this all works.


Let's start with how I gathered the data.

There are a number of websites out there that provide historical information on the Grand Exchange. I chose Grand Exchange Watch, which provides data in the form of a table of dates and the prices recorded on those dates:

[Screenshot: Grand Exchange Watch price history table for an item]

Unfortunately there's no way to export data from Grand Exchange Watch as far as I know, so I had to scrape the data myself using BeautifulSoup.

I grabbed a list of item IDs from the Runescape website and headed on over to Grand Exchange Watch, where I ran into my first problem. Unlike the official Runescape website, Grand Exchange Watch includes both the item name and the item ID in the URL.

[Screenshot: a Grand Exchange Watch URL containing both the item name and the item ID]

If I'm going to programmatically grab all the data I need, I need to be able to arbitrarily go to any item page on this website. But all I have is a list of item IDs, so I wrote a quick function to use requests and the Runescape API to grab the item names from their ID numbers:

import json
import requests
from bs4 import BeautifulSoup
import re
import os
import time
import datetime


item_ids = ['1944', '556', '314']
item_names = {}

#the function accepts my list of ID numbers as strings or ints
def getItemNames(item_numbers):
    #for every item ID in my list, grab the appropriate JSON from the Runescape API
    for item_id in item_numbers:
        item_url = 'http://services.runescape.com/m=itemdb_rs/api/catalogue/detail.json?item=' + str(item_id)
        item_response = requests.get(item_url)
        item_json = item_response.json()

        #stick the item name and ID number in the item_names dictionary
        item_names[str(item_json['item']['id'])] = item_json['item']['name']

So now I have a way to refer to the item's ID and the item's name.

url_names = []

getItemNames(item_ids)

for item_id in item_names:
    #item_id is the ID number key from our item_names dictionary,
    #and item_names[item_id] is the name of the item
    url_names.append(item_id + "-" + item_names[item_id].replace(" ", "-"))

I now have a list of names and IDs to append to my URLs. From here it's just a matter of grabbing the appropriate dates and prices from the table on each page. Also notice that if I want more than just the most recent 20 days, I'm going to have to go through each page of the table, which is just a matter of appending a page number to the URL. For example, page 2 of the table is:

[Screenshot: the URL for page 2 of an item's price history table]
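Judging by the URL I build in the scraping loop below, the page number is passed in the start query parameter, so page 2 of an item's table should look something like this (with a real item ID and name filled in for the placeholders):

http://www.grandexchangewatch.com/item/<item-id>-<Item-Name>?range=360&start=2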

I used Requests to grab each page's source, and I stuck the next part into two big for loops. I'm not sure if this is the most Pythonic way to do what I'm trying to do, but it works for me and that's all I really care about.

dates = []
prices = []

pagenums = range(1, 61)   #pages 1 through 60 of the price table

for item_url in url_names:
    item_id = re.split(r'([0-9]*)', item_url)[1]
    for pagenum in pagenums:
        url = "http://www.grandexchangewatch.com/item/" + item_url + "?range=360&start=" + str(pagenum)
        r = requests.get(url)
        soup = BeautifulSoup(r.content, "html.parser")

        #finds the table of data
        cal = soup.find_all("div", {"id": "calendar-container"})

        #finds the individual entries in the table, reading from right to left
        tds = cal[0].contents[5].find_all("td")

        #These lists are the indexes of the appropriate entries in the
        #table (Date, Price): each group of six cells holds two
        #date/price pairs, with each price sitting right after its date
        date_id_1 = [x*6 for x in range(10)]
        date_id_2 = [x+3 for x in date_id_1]

        price_id_1 = [1+(x*6) for x in range(10)]
        price_id_2 = [x+3 for x in price_id_1]

        for i in date_id_1:
            dates.append(tds[i].text)
        for y in date_id_2:
            dates.append(tds[y].text)

        for i in price_id_1:
            prices.append(tds[i].text.replace(",", "").replace("gp", ""))
        for y in price_id_2:
            prices.append(tds[y].text.replace(",", "").replace("gp", ""))

    #Want the lists so they are from oldest to newest, so reverse them
    #once after all of the item's pages (newest first) have been collected
    dates.reverse()
    prices.reverse()


    #Saving a csv file of what we've gathered
    f = open(item_names[item_id] + '.csv', 'a+')
    for ind, date in enumerate(dates):
        #make the string formatted date into a unix timestamp
        entry = str(time.mktime(time.strptime(date, "%B %d, %Y"))) + "," + prices[ind] + "\n"
        f.write(entry)
    f.close()

    #Clearing the date and price lists for the next item in our item_id list
    dates[:] = []
    prices[:] = []

And there we have it. This will grab 60 pages of table data for each item in the list, which works out to a little over three years of daily prices per item. In Part 2 I'll go over how I analyze this historical data.
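For reference, each CSV that comes out of this is just plain timestamp,price rows, so reading one back in for the analysis in Part 2 can be as simple as something like the sketch below (the filename is just a hypothetical example):

import csv

#a minimal sketch of reading one of the generated files back in;
#"Mithril ore.csv" is just a hypothetical example filename
timestamps = []
loaded_prices = []

with open('Mithril ore.csv') as f:
    for row in csv.reader(f):
        #row[0] is the unix timestamp written by the scraper,
        #row[1] is the price in gp with commas and "gp" already stripped
        timestamps.append(float(row[0]))
        loaded_prices.append(int(row[1]))

print(str(len(loaded_prices)) + " data points loaded")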