And in case you didn't see it in part 1, here's the end result of this entire endeavor: a Twitter bot that tweets about items projected to increase in price on the Grand Exchange within the next 20 days.
At the end of part 1 I gathered 3 years' worth of historical data by scraping Grand Exchange Watch and stuck it in a set of CSV files. Each row of a CSV file is one Unix timestamp and one price of the item on that date. Like so:
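The rows look something like this (illustrative values, not real Grand Exchange data):

```
1420070400,204
1420156800,209
1420243200,207
```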
Simple enough. So I have this rather long data file, now what?
I need a way to go through this collection and collect patterns of price changes. Twenty days seems like enough time for a price movement to occur, so I'll make all the patterns 20 points long. I'll start with an "anchor" point and iterate through the next 20 price points, taking the percentage change between the anchor point and each point after it. Then I'll move the anchor one point down the list and do the same. In the end there will be as many patterns as there are data points, minus 20 (I don't want the most recent 20 days to be included in the pattern collection).
First things first, I need a percentage change function:
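Something like this (the name `percent_change` is my placeholder):

```python
def percent_change(start, current):
    """Percent change from start to current."""
    try:
        change = ((float(current) - start) / abs(start)) * 100.0
        if change == 0.0:
            # Avoid returning exactly 0; it causes problems downstream.
            return 0.00000000001
        return change
    except ZeroDivisionError:
        return 0.00000000001
```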
If there is no change between the two points I'm working with I don't want the function to return exactly 0, as that ends up causing the code to return some errors later on (I'm not entirely sure why). So 0.00000000001 is close enough to 0.0 to work. Getting exactly 0 is rare enough that it's not really an issue anyway.
Okay, so now on to collecting the patterns. The following code does exactly as I described above.
A note about the above code: Yes. It should be written with a loop or otherwise shortened. The primary reason it is written so repetitively is for the sake of learning and for my own understanding. Writing it like this makes it abundantly clear what the patterns consist of.
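For reference, the same collection step can be sketched compactly with a loop. The function names and the outcome horizon here are my assumptions (I read "the next 20 days" as: each pattern's outcome is the percent change 20 days after the pattern ends):

```python
def percent_change(start, current):
    change = ((float(current) - start) / abs(start)) * 100.0
    return change if change != 0.0 else 0.00000000001

def collect_patterns(prices):
    """Build 20-point patterns of percent changes from each anchor point."""
    patterns, outcomes = [], []
    # Stop early enough that the most recent days stay out of the
    # collection and every pattern still has a future point to serve
    # as its outcome.
    for anchor in range(len(prices) - 40):
        base = prices[anchor]
        pattern = [percent_change(base, prices[anchor + i]) for i in range(1, 21)]
        # Outcome: percent change 20 days after the pattern ends
        # (one plausible reading of the prediction horizon).
        outcomes.append(percent_change(base, prices[anchor + 40]))
        patterns.append(pattern)
    return patterns, outcomes
```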
I'm also going to need to do the same thing for the most recent pattern (the last 20 items in the CSV). Luckily I only need to do this once, so no for loop this time.
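A sketch of that most-recent-pattern step, anchored 21 points back so the last 20 prices each get a percent change (names are my assumptions):

```python
def percent_change(start, current):
    change = ((float(current) - start) / abs(start)) * 100.0
    return change if change != 0.0 else 0.00000000001

def current_pattern(prices):
    """The most recent pattern: percent change from the anchor
    (21 points back) to each of the last 20 prices."""
    anchor = prices[-21]
    return [percent_change(anchor, p) for p in prices[-20:]]
```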
In case you're wondering why I take the percent change of everything in both of these functions: It's because I have to normalize the patterns. Otherwise if an item used to fluctuate around 100gp, but now fluctuates around 200gp, this won't pick up on those patterns.
Next up, I have to figure out how similar the current pattern is to all the other patterns in my dataset.
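One way to score similarity is 100 minus the mean absolute difference between corresponding percent-change points; that metric and all names here are my assumptions:

```python
def how_similar(pattern, current):
    """Similarity score: 100 minus the mean absolute difference
    between corresponding percent-change points (assumed metric)."""
    diffs = [abs(a - b) for a, b in zip(pattern, current)]
    return 100.0 - sum(diffs) / len(diffs)

def find_similar_patterns(patterns, outcomes, current):
    """Keep every historical pattern (and its outcome) that scores
    more than 75% similar to the current pattern."""
    similar_patterns = []
    for pattern, outcome in zip(patterns, outcomes):
        if how_similar(pattern, current) > 75:
            similar_patterns.append((pattern, outcome))
    return similar_patterns
```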
This function iterates through all of the patterns from the CSV and compares each of them with the most recent pattern. Every pattern that is more than 75% similar gets added to a list called similarPatterns.
I then average the historical outcomes of each entry in similarPatterns, and if the average is greater than the current price of the item, I label the item as having a net positive prediction. Additionally, I count the number of positive outcomes from the similarPatterns list. This gives me a couple of variables: the total number of similar positive outcomes, and whether or not the average outcome is predicted to be positive.
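A sketch of that summary step. Here I store outcomes as percent changes from the pattern's anchor, so "average greater than the current price" becomes "average percent change greater than zero" (an equivalent reading; names are mine):

```python
def evaluate_outcomes(similar_patterns):
    """Summarize the outcomes (percent changes) of the similar patterns."""
    outcomes = [outcome for _, outcome in similar_patterns]
    if not outcomes:
        return 0, False
    avg_outcome = sum(outcomes) / len(outcomes)
    # A positive average percent change means the average predicted
    # price sits above the current price.
    net_positive = avg_outcome > 0
    positive_count = sum(1 for o in outcomes if o > 0)
    return positive_count, net_positive
```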
And that pretty much completes the analysis. Now I just have to decide if I want to count this particular item as "interesting" or not. And of course I have to call the previous 3 functions in the proper order to do so.
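One way the pieces might fit together, compressed into a single function. Everything here — the names, the similarity metric, and the 20-days-after-the-pattern outcome horizon — is my reconstruction, not the author's exact code:

```python
def percent_change(start, current):
    change = ((float(current) - start) / abs(start)) * 100.0
    return change if change != 0.0 else 0.00000000001

def analyze(prices):
    """Full analysis for one item: collect historical patterns, build
    the current pattern, find similar ones, and summarize outcomes."""
    # Historical 20-point patterns with their outcomes.
    patterns, outcomes = [], []
    for anchor in range(len(prices) - 40):
        base = prices[anchor]
        patterns.append(
            [percent_change(base, prices[anchor + i]) for i in range(1, 21)]
        )
        outcomes.append(percent_change(base, prices[anchor + 40]))
    # The most recent 20-day pattern.
    anchor = prices[-21]
    current = [percent_change(anchor, p) for p in prices[-20:]]
    # Keep outcomes of patterns that are >75% similar to the current one.
    similar = []
    for pattern, outcome in zip(patterns, outcomes):
        diffs = [abs(a - b) for a, b in zip(pattern, current)]
        if 100.0 - sum(diffs) / len(diffs) > 75:
            similar.append(outcome)
    if not similar:
        return 0, False
    positive_count = sum(1 for o in similar if o > 0)
    net_positive = sum(similar) / len(similar) > 0
    return positive_count, net_positive
```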
This Analyze function will do everything I need to do for each item. I made a function out of it since I'm going to be calling it for every CSV file I gathered in part 1. Like so:
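A sketch of that per-file loop. The `data/*.csv` layout is a guess at how the part 1 files are stored, and the `analyze` stub stands in for the Analyze function described above:

```python
import csv
import glob

def analyze(prices):
    """Placeholder for the per-item Analyze function described above."""
    ...

def parse_prices(rows):
    """Pull the price column out of (timestamp, price) CSV rows."""
    return [float(row[1]) for row in rows if row]

# Hypothetical layout: one CSV per item, gathered in part 1.
results = {}
for filename in glob.glob("data/*.csv"):
    with open(filename, newline="") as f:
        prices = parse_prices(csv.reader(f))
    results[filename] = analyze(prices)
```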
And finally I create a CSV file that contains all of the items of interest based on the analysis.
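That save step might look like this, assuming the analysis loop produced a list of (item name, positive-outcome count) pairs — both the structure and the filename are my assumptions:

```python
import csv

def save_items_of_interest(interesting_items, path="items_of_interest.csv"):
    """Write the flagged items to a CSV file, one item per row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["item", "positive_outcomes"])
        writer.writerows(interesting_items)
```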
And there we have it, a (messy) script that will scan for and save items that might be increasing in price in the next 20 days.
Again, if you have suggestions for improving the code, or questions about how it works, go ahead and leave a comment or shoot me an email.
In part 3 I discuss how I update the CSV with the current day's price, and how I set up the twitter bot.