# 2019 Bahrain GP: FP2 race simulation pace

**Key points about this graph**:

- The graph shows all the lap times set by each driver during a single stint.
- No laps were removed from the graph. I explain why in the longer explanation below.
- The horizontal line inside each box represents the median. Why not the mean? See the longer explanation.
- Each box represents the interquartile range (IQR), from the 25th to the 75th percentile. You know where to go.

**Short explanation:**

If we take the median as the most representative measure of the data, then, interestingly, **Max Verstappen** had the best session of the day. Most of his laps were very consistent, with very few that could be considered outliers. **Leclerc** had the worst median of the top 6 drivers. Although he set the fastest lap of all of them, he also set the slowest one. His inconsistent laps unfortunately prevent us from doing a very reliable analysis of his data.

**Long(er) explanation:**

I know that most of us are used to checking lap times, computing averages, and saying “Hey, he had a lower average lap time, so he was faster!”, and that may be true on some occasions. However, in other cases, like this one, it may be better to use the median. Unlike the mean (or average, whatever you prefer to call it), the median is not very sensitive to outliers (very fast or very slow laps that do not fit with the rest of the data). Since these stints are so short, a single outlier can shift the mean noticeably, so I decided to use the median to get a more accurate picture of what happened.
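
To make the point concrete, here is a minimal sketch in Python (the author worked in R; Python is used here only for illustration). The lap times are invented for the example, not taken from the FP2 data. Adding a single slow lap to a five-lap stint moves the mean by about 0.73 s, while the median moves by only 0.05 s.

```python
from statistics import mean, median

# Hypothetical lap times in seconds (invented for illustration, not real FP2 data).
clean_stint = [95.2, 95.4, 95.3, 95.6, 95.5]
# The same stint plus one slow outlier lap, e.g. traffic or a cool-down lap.
stint_with_outlier = clean_stint + [99.8]

for label, laps in [("clean", clean_stint), ("with outlier", stint_with_outlier)]:
    # The mean uses every value, so one slow lap drags it upwards;
    # the median only depends on the middle of the sorted laps.
    print(f"{label:>12}: mean = {mean(laps):.3f} s, median = {median(laps):.3f} s")
```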

Why not remove the laps that are very fast or very slow? Well, because removing data points just because they look unusual is not statistically sound. Simple as that. Just because it makes the data easier to analyze does not mean that it is correct to do it. What if a driver had a very slow lap that helped cool his increasingly overheated tires, and then on the next lap set a blistering time because the tires were at the optimal temperature? If you remove the slow lap but keep the fast one that followed, your data is already biased and not representative of reality.

In cases like this, I believe it is better to visualize the data and properly understand each driver's lap distribution, instead of focusing on a single number like the average lap time. The interquartile range helps us do that. The box contains the middle 50% of all the laps done by a particular driver. Anything below the box represents his fastest 25% of laps, while everything above the box represents his slowest 25%. Another benefit of presenting data this way is that it shows not only how fast each driver's lap times were, but also how consistent they were.
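
The quartiles behind each box can be computed as in the sketch below, again in Python and on an invented stint. It uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates the same way as R's default `quantile()` (type 7); the tool used to draw the original plot may compute the box edges slightly differently, so treat this as an approximation of the figure rather than a reproduction of it.

```python
from statistics import quantiles

# Hypothetical stint in seconds (illustrative only, not real FP2 data).
laps = [95.2, 95.4, 95.3, 95.6, 95.5, 99.8, 95.1, 95.7]

# Q1, median and Q3 using linear interpolation between sorted laps.
q1, q2, q3 = quantiles(laps, n=4, method="inclusive")
iqr = q3 - q1  # height of the box: the spread of the middle 50% of laps

fastest_quarter = [t for t in laps if t < q1]  # laps below the box
slowest_quarter = [t for t in laps if t > q3]  # laps above the box

print(f"Q1 = {q1:.3f} s, median = {q2:.3f} s, Q3 = {q3:.3f} s, IQR = {iqr:.3f} s")
print("fastest quarter:", sorted(fastest_quarter))
print("slowest quarter:", sorted(slowest_quarter))
```

Note how the 99.8 s lap lands in the slowest quarter without stretching the box: the median stays at 95.45 s and the IQR at 0.35 s.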

## Race Simulation Model

**Key points about this graph**:

- The graph shows all the lap times set by each driver during a single stint.
- The line represents a linear model fitted to each driver’s data.
- Some data points were removed from this data set to obtain a more accurate model.

For this figure I adjusted the lap numbering so that lap times could be compared between drivers in a more or less fair way. Also, unlike in the previous graph, I did remove some data points. “Hey, you just said that you should not remove data points, what is going on?” Well, in this case I am not just presenting real raw data; I am building a model that has to make inferences. Because a linear model like this one is extremely sensitive to outliers, keeping them would send the model all over the place. Since the purpose of this model is to predict, not to state what REALLY happened, it is acceptable to remove a couple of data points in order to obtain a more accurate representation.
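
The article does not say how the outliers were chosen, so the following is only a sketch of one common convention: Tukey's fences, which drop any lap more than `k` times the IQR outside the box. The function name `drop_outliers`, the default `k=1.5` and the stint data are all hypothetical.

```python
from statistics import quantiles


def drop_outliers(stint, k=1.5):
    """Return the (lap_number, lap_time) pairs whose time lies inside Tukey's fences.

    Assumption: the article does not describe its outlier rule; the
    k * IQR fences used here are only one common convention.
    """
    times = [t for _, t in stint]
    q1, _, q3 = quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep each time paired with its lap number so the trend stays aligned.
    return [(lap, t) for lap, t in stint if low <= t <= high]


# Hypothetical stint (seconds); lap 4 is a slow outlier. Not real FP2 data.
stint = list(enumerate([95.2, 95.4, 95.3, 99.8, 95.6, 95.5, 95.7], start=1))
kept = drop_outliers(stint)
removed = [pair for pair in stint if pair not in kept]
print("removed:", removed)
```

One caveat with this choice: tire degradation makes lap times drift upwards over a stint, so on a long stint fences computed on raw times could flag genuine late laps. Filtering on the residuals of a first fit is an alternative.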

**NOTE: This model is a rough way of presenting data.** It is by no means a sophisticated model that allows incredibly accurate predictions, but it does help to understand, for example, how consistent a driver was.

The lines are created by taking the lap times kept for each driver and fitting a linear model to them. The model tries to predict how the times would change over the next few laps. If the laps (represented as squares or circles) sit close to the line, the model fits that driver's data well. If the laps are far from the line, the model may not be as accurate as we would like it to be.
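
A straight-line fit like this is usually an ordinary least-squares regression of lap time on lap number. The Python sketch below is an illustration of that technique, not the author's R code; it assumes lap numbers have already been re-indexed so each stint starts at lap 1, and the stint data are invented. The residuals measure how far each lap sits from the line.

```python
def fit_line(stint):
    """Ordinary least-squares fit of lap_time = intercept + slope * lap_number."""
    n = len(stint)
    xs = [lap for lap, _ in stint]
    ys = [t for _, t in stint]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # seconds lost (positive) or gained (negative) per lap
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(stint, slope, intercept):
    """Distance of each lap from the fitted line; small values mean a close fit."""
    return [t - (intercept + slope * lap) for lap, t in stint]


# Hypothetical stint (seconds), lap numbers re-indexed to start at 1. Not real data.
stint = [(1, 95.1), (2, 95.3), (3, 95.2), (4, 95.4), (5, 95.5), (6, 95.6)]
slope, intercept = fit_line(stint)
res = residuals(stint, slope, intercept)
print(f"slope = {slope:.3f} s/lap, intercept = {intercept:.3f} s")
print(f"largest residual = {max(abs(r) for r in res):.3f} s")
```

The slope is the model's estimate of the time lost per lap, which is exactly what the steepness of each line shows.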

A shallow line means that lap times increase slowly. Maybe the driver is just very consistent, and the only reason he gets slower is tire degradation.

A steep line means that lap times increase rapidly as the driver completes more and more laps. Perhaps the tires have a high degradation rate.

In this case, I would argue that only the data from **Leclerc**, **Verstappen** and **Vettel** can be compared, since they are the only three drivers with 10 or more data points. The more data points, the better the model.
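
One way to quantify “the more data points, the better the model” is the standard error of the fitted slope, which shrinks as the stint gets longer. The sketch below is a hypothetical illustration: the stints are invented (0.1 s/lap of degradation plus an alternating ±0.2 s of lap-to-lap noise), and the 10-lap threshold above remains the author's rule of thumb, not something this code derives.

```python
from math import sqrt


def slope_with_standard_error(stint):
    """Least-squares slope of lap time vs lap number, plus its standard error."""
    n = len(stint)
    if n < 3:
        raise ValueError("need at least 3 laps to estimate the standard error")
    xs = [lap for lap, _ in stint]
    ys = [t for _, t in stint]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, with n - 2 degrees of freedom for a straight line.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, sqrt(rss / (n - 2) / sxx)


def hypothetical_stint(n_laps):
    """Invented stint: 0.1 s/lap degradation plus +/-0.2 s alternating noise."""
    return [(i, 95.0 + 0.1 * i + (0.2 if i % 2 else -0.2)) for i in range(1, n_laps + 1)]


for n in (5, 12):
    slope, se = slope_with_standard_error(hypothetical_stint(n))
    print(f"{n:>2} laps: slope = {slope:.3f} +/- {se:.3f} s/lap")
```

With the same kind of noise, the 12-lap stint pins down the degradation rate several times more tightly than the 5-lap one.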

Here, I would say that Verstappen is once again showing how much potential the **Red Bull** car has. His laps were very consistent, with each lap getting slower by only a small margin. While Vettel and his **SF90** started strong, his laps were slower than Verstappen’s by the end of his stint. This is surely something a strong team like Ferrari is already monitoring.

**Conclusion**

Free practice sessions do not give us the most meaningful information, mostly because we do not have access to key details such as engine mode and fuel load, among others. However, we have to assume that teams are trying to run setups that are representative of the ones they will use during the race.

Overall, Verstappen and his **Red Bull** look strong, posting the lowest median and mean of the top 6 drivers, while also showing that they can sustain their pace for several laps in a row.

**Ferrari** looks a little weaker than Red Bull on race pace; however, we must remember that there is not a great amount of data to draw conclusions from. From what we have, Ferrari appears to be stronger than at the last race, and I believe the Tifosi will be happier at this race than they were in Australia.

**Mercedes** is a bit of an enigma. I did not even mention them previously because they did not run many laps. While Hamilton posted some very fast laps and a very competitive median, he also set some very slow laps during his stint that prevent us from getting an accurate picture of his race pace. Maybe Mercedes is always sandbagging? We will know pretty soon.

I hope this article was interesting and that it helps you draw your own conclusions from FP2 at the 2019 Bahrain Grand Prix.

Looking forward to more content! Where are you getting FP times from?

Hey Chris, thanks for the nice comments. You can get the data from the official FIA website. Just look up each GP, and in the timing information you will find everything, including lap times, stewards' decisions, technical information, etc.

Cool, I didn’t know you could get all the data there. Thanks for sharing!

Which software do you use for your graphs?

I work with R, a statistical programming language that is very versatile for data analysis.