This blog has been focused mostly on descriptive statistics, more than inferential or predictive models. Having said that, I thought it would be fun to create a model to predict the ratings provided by some media outlets after every race. Surely nothing can go wrong … right?
I will explain the model-building process step by step since that is in fact the whole purpose of this article. Before I begin, I will say that these results are not necessarily optimal. Building predictive models is not easy, and unfortunately, it takes quite a bit of time. Since my time is limited, I decided to just try 3 different types of models and see what kind of results I would get. It is important to note that all of these machine learning models are purely predictive, so predictive accuracy was the main objective of all of them.
The main idea is quite simple. I got the ratings given to the drivers after every race for the first 10 races of the season. These ratings come from 6 main media sources, which are: AMuS, Autosport, Crash, F1i, PlanetF1 and The Race.
I built three different models. The first one is a single-layer neural network, which I will just call “multinomial classification”. The second model is a traditional random forest, which is a non-linear type of model that can be very powerful for classification analyses. The third and final model used a gradient boosting decision tree algorithm, which I will shorten to just “boosted trees”. This final model is in a way an extension of a decision tree, but with more complexity and, in many cases, better predictive accuracy.
The first step was to decide which data to use for the model. In data science, the concept of garbage in, garbage out, refers to the idea that if the data that the model takes as an input is useless, then the output of the model will be useless as well. In this case, I decided to give the model a limited amount of data, but data that I thought would be meaningful to help the model obtain better predictions.
The models were trained on 80% of the ratings that have already been posted for the first 10 races of the season. The remaining 20% of the data was held out as a testing set. Holding data back does not make the model any better by itself, but it lets us check whether the model predicts unseen data about as well as the data it was trained on.
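As a rough illustration of the 80/20 split described above, here is a minimal sketch in plain Python. The function name, the seed and the made-up records are my own assumptions for the example; they are not the code or the data used for this article.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)                  # copy, so the original list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Made-up records in the form (race, driver, source, rating)
ratings = [
    ("Race %d" % (i // 10 + 1), "Driver %d" % (i % 10), "AMuS", 5.0 + (i % 10) * 0.5)
    for i in range(100)
]

train, test = train_test_split(ratings)
print(len(train), len(test))  # 80 20
```

Because the rows are shuffled before splitting, a given driver and race can land in either part, which is why some drivers do not show up in the testing data for a particular race.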
These models have a variety of parameters that can be tuned to get better predictive accuracy. The parameters are tuned by fitting multiple candidate models on subsets of the training data. These candidates are then assessed to see which one gives the best results.
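To make the tuning idea more concrete, here is a small sketch of one common way to do it, k-fold cross-validation. This is an assumption on my part for illustration: the exact resampling scheme is not spelled out above, the nearest-neighbour model is only a stand-in (it is not one of the three models in this article), and the data is invented.

```python
import random
from collections import Counter

def k_fold_indices(n_rows, k=5, seed=1):
    """Yield (train_idx, holdout_idx) pairs; each row is held out exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, holdout in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, holdout

def cross_validated_accuracy(fit, rows, labels, k=5):
    """Average share of exact hits over the k held-out folds."""
    scores = []
    for train_idx, holdout_idx in k_fold_indices(len(rows), k):
        predict = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(rows[i]) == labels[i] for i in holdout_idx)
        scores.append(hits / len(holdout_idx))
    return sum(scores) / len(scores)

def make_knn(n_neighbors):
    """Stand-in model: majority vote among the nearest training rows."""
    def fit(train_rows, train_labels):
        def predict(row):
            order = sorted(range(len(train_rows)), key=lambda i: abs(train_rows[i] - row))
            votes = Counter(train_labels[i] for i in order[:n_neighbors])
            return votes.most_common(1)[0][0]
        return predict
    return fit

# Toy data (made up): finishing position -> rating on a 1-10 scale
rng = random.Random(0)
positions = [rng.randint(1, 20) for _ in range(200)]
ratings = [max(1, min(10, 10 - p // 2 + rng.choice([-1, 0, 0, 1]))) for p in positions]

# Try a few values of the tuning parameter and keep the best-scoring one
candidates = [1, 5, 15]
scores = {n: cross_validated_accuracy(make_knn(n), positions, ratings) for n in candidates}
best = max(scores, key=scores.get)
print(scores, best)
```

The point of the sketch is the loop at the bottom: every candidate setting gets scored only on rows it was not fitted on, and the setting with the highest average accuracy wins.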
All the models that were created by using the subsets of the data are displayed in the chart above. In total, I created 60 models (20 per model type). As you can see, the best accuracy, which refers to the proportion of correct predictions, was obtained by the random forest models.
The multinomial classification models seemed to struggle to get good predictive accuracy regardless of the parameters that were tuned. The boosted trees models were better than the neural network models, but they still couldn’t match the predictive accuracy of the random forest models.
In the end, the best random forest model had a predictive accuracy of around 33%, which doesn’t sound so great but it’s still not as bad as it seems.
Comparison of predictive models
Just to be sure that the random forest models were better, I ran a Bayesian simulation to generate a distribution from the data. I won’t get into the details of Bayesian simulation, but it’s basically a model that produces multiple predictions, and these predictions produce a distribution. With these distributions, I can compare with more granularity all of the multiple models that I created.
As you can see, the random forest still came out on top, with a prediction accuracy ranging from around 31 to 34%.
We can also get a simulation of the average difference between the models. In this case, we are comparing the random forest model with the boosted trees model.
We can see that on average we would expect the random forest’s accuracy to be between 1 and 1.5 percentage points higher than that of the boosted trees model. While this difference is not massive, I considered it enough to select one random forest model as the best one of the lot.
Once the best model was selected, it was time to evaluate it on the testing data. Remember that the testing data comprises 20% of the original data. This data has not been seen by the model at all, but based on the sub-models that we created, we should expect a similar predictive accuracy of 31-34%.
In our case, the model had a predictive accuracy of almost 34%, which is on the upper limit of our expectations. That still sounds low, but the model is not the complete garbage you may think it is at first.
To see if this model was usable, I calculated the error for every prediction. This means that I got the difference between the original rating and the rating predicted by the model. So for example, if a media source, let’s call it The Race, gave George Russell a 10 in a particular race, and our model predicted a 9, then the error would be 1 (10 – 9 = 1).
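The error calculation is simple enough to write down in a couple of lines. I am assuming here that the error is taken as an absolute difference, so that it does not matter whether the model was too high or too low; the function name is only for illustration.

```python
def prediction_error(actual, predicted):
    """Absolute difference between the published rating and the predicted one."""
    return abs(actual - predicted)

# The example from the text: the outlet gave a 10 and the model predicted a 9
print(prediction_error(10, 9))   # 1

# Half-point ratings work the same way
print(prediction_error(3.5, 6))  # 2.5
```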
In the chart shown above, we can see the empirical cumulative distribution of the errors. This is a cumulative chart, meaning that each point on the curve includes everything to its left: as we move towards the right side of the chart, we keep adding to the percentage we already had.
As I’ve said, our model correctly predicted 34% of the ratings, meaning an error of 0. Then, we can see that almost 50% of our predictions had an error of 0.5 points or less. The data then makes a big jump, showing that 83% of our predictions had an error of 1 rating point or less. This is not bad when you think about it. It means that most of the time we had either a) a correct prediction or b) a prediction with an error of 1 point or less.
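For anyone who wants to read the cumulative chart as numbers, the following sketch shows the calculation behind each point on the curve: the share of predictions with an error at or below a given threshold. The list of errors is invented for the example and is not the article’s data.

```python
def share_within(errors, threshold):
    """Fraction of predictions whose absolute error is at or below the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)

# Made-up absolute errors for ten predictions
errors = [0, 0, 0, 0.5, 0.5, 1, 1, 1, 1.5, 2.5]

for threshold in (0, 0.5, 1, 2):
    print(threshold, share_within(errors, threshold))
# 0 0.3
# 0.5 0.5
# 1 0.8
# 2 0.9
```

The value at a threshold of 0 is the plain accuracy, and the values can only stay flat or go up as the threshold grows, which is what gives the chart its staircase shape.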
Model evaluation by media source
We can get more detail by separating the predictions by media source. I added the mean (average) error to each panel to see whether the predictions were better, the same, or worse depending on the media outlet.
We can see that for sources like F1i and AMuS, the average error was just 0.63/0.64, which means that on average our predictions were off by a little over half a point. For AMuS, almost 95% of our predictions had an error of 1 point or less. This is quite acceptable, if not perfect.
The predictions for the ratings provided by The Race were the worst of the lot. We only correctly predicted 18% of the ratings, and only 62% of them had an error of 1 point or less.
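The per-outlet numbers above (mean error, share of exact hits, share within 1 point) can be computed with a simple group-by. Here is a minimal sketch; the function name, the source labels and the records are placeholders that I made up, not the real predictions.

```python
from collections import defaultdict

def summarize_by_source(records):
    """Group absolute errors by source and summarize each group."""
    grouped = defaultdict(list)
    for source, actual, predicted in records:
        grouped[source].append(abs(actual - predicted))
    return {
        source: {
            "mean_error": sum(errs) / len(errs),
            "exact": sum(e == 0 for e in errs) / len(errs),
            "within_1": sum(e <= 1 for e in errs) / len(errs),
        }
        for source, errs in grouped.items()
    }

# Made-up records in the form (source, actual rating, predicted rating)
records = [
    ("Source A", 8, 8), ("Source A", 7, 7.5), ("Source A", 6, 7), ("Source A", 9, 9),
    ("Source B", 4, 9), ("Source B", 3.5, 6), ("Source B", 7, 7), ("Source B", 6, 6.5),
]

summary = summarize_by_source(records)
print(summary["Source A"])  # {'mean_error': 0.375, 'exact': 0.5, 'within_1': 1.0}
print(summary["Source B"])  # {'mean_error': 2.0, 'exact': 0.25, 'within_1': 0.5}
```

A couple of large misses are enough to pull the mean error of one source well above the others, which is the pattern described for The Race.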
After evaluating all the models, we can finally see all of our predictions. The predictions shown here were made on the testing data set, which comprises 20% of the data. The split is randomized to avoid biases, so in some cases, we will see predictions from different media sources for the same drivers. In some other cases, however, we won’t even see a driver in a particular race.
After looking at the numbers, I honestly don’t feel as disappointed as I felt when I saw the overall predictive accuracy of 34%. Most of our predictions are quite good. I think that the predictions that the model created for The Race are not the best, and that brings down the overall predictive capacity of the entire model.
It’s easy to see that the largest errors also come from the predictions made for that same outlet. For example, the model predicted a 6 for Fernando Alonso at the Emilia Romagna GP, but the real value was 3.5. Something similar happened with the Sergio Perez rating at the Monaco GP. The model predicted a 9 after a good recovery performance from the Mexican, but The Race gave him a 4.
The next step is to make some predictions on ratings that have not been given yet. So for the upcoming Hungarian Grand Prix, I will use this model to make some predictions.
I do not know at what time these 6 media outlets post their ratings. I usually don’t get time to work on my analyses until hours after the race, and I never check the news related to the race until I get home from work. So most likely this means that I will post the results of this prediction at around 3 or 4 AM Eastern Time. Hopefully, this means that you will get the chance to see the results on Monday morning after the race.
If you’re asking yourself something like “how do I know you won’t cheat?”, well, you can’t. I will give you my word though that the predictions are 100% produced by the model and will not be altered by me in any way. If the model predicts a 10 and the rating is a 1, so be it, that’s how machine learning works sometimes.
First of all, thank you for taking the time to read this article. I hope you stay tuned for the follow-up of this post, which will be the actual prediction of the ratings provided by the 6 media outlets after the Hungarian Grand Prix.
If you enjoyed this analysis, please share it with friends and family. If you want to support me even more, there are multiple ways you can donate some money to help me keep this project alive. You can find all the options on either the about tab or the my supporters tab in the main menu.