For years, drag racing has talked about reaction time averages. You see it in publications, news articles, broadcasts, and even in the practice trees we use at home.
But when I examined 10,000 runs, some interesting patterns emerged, and I believe we’ve been doing it all wrong this entire time.
Density curves – symmetric and skewed
In statistics, we can count how many times something happened, then draw a tall bar if it happened a lot, or a short bar if it didn’t happen a lot.
When we get a lot of data on something, this lets us create something called a density curve. Some people call them bell curves, and they’ve always been one of my favorite types of charts.
A density curve doesn’t always have to look like a bell though. Sometimes, for different reasons, the bell gets pushed a little to one side or the other. This is called a skewed distribution, because it’s, well, skewed.
Here are two examples: the top one is a symmetric distribution, and the bottom one is a “right skewed” (aka “positively skewed”) distribution.
Central tendency is what you call stats like average, median, and mode. When you look at these three AND the density curves together, you can see which measure of central tendency is best to use.
- Mean: the average of a set of numbers.
- Mode: the most common number in a set of numbers.
- Median: the middle number in a set of numbers ordered low to high.
For example, if you look at life expectancy charts, they are skewed. If you took the average life expectancy of everyone who has ever lived, it wouldn’t even be 20 years because of so many unfortunate deaths in youth. But does that mean we should expect to live for only 20 years?
In this case, because it’s skewed (“left skewed,” technically), the median is a great way to convey the proper central tendency of life expectancy. 74 years? Sounds much better!
Put another way, 50% of all people ever fall below the median age, and 50% of all people ever fall above it. So it’s not an average; it’s saying the halfway point of human life expectancy is that value.
Drag racing reaction times
I looked at about 10,000 runs, and here is the density curve of how many times a certain reaction time happened:
It starts at about -.500 because LB3 (left before 3rd light) data is rarely given, and it extends well past 2 seconds for some reaction times. Sometimes people are really late due to issues, which stretches the chart a long way to the right.
The average RT using all the values in this chart is 0.046. So is it safe to think I’m better than most people if I get a 0.046?
If we zoom in and cut out everything outside -.200 to .200 reaction times:
A lot of lights just slower than green, a couple red, and then it sort of tails off.
The average with this group becomes 0.026. But why cut it off outside -.2 and .2? I have no idea, I just chose it at random.
And random is a terrible, terrible way of analyzing reaction times. If I cut it off at -.010 and .030 (because I’m never slower than .030, right?), I get yet another completely different average.
But guess what?
The median in both charts is 0.021. Why? Because the median is what we call “robust to outliers.”
That means it doesn’t matter if your spread is amazing and you had one goof that was a 0.095 or a 15.42 reaction time. Your median won’t change, because all it’s doing is finding the exact middle occurrence.
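You can see this for yourself in a spreadsheet. Here’s a quick sketch with made-up reaction times: four decent lights plus one 15.42 disaster.

```
=AVERAGE(0.009, 0.015, 0.021, 0.027, 15.42)
=MEDIAN(0.009, 0.015, 0.021, 0.027, 15.42)
```

The average balloons to about 3.098, while the median sits right at 0.021. One bad run dragged the average up by three full seconds and didn’t move the median at all.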
What do you notice?
How do we tell if a curve is symmetrical? The median and average will be the same.
In our reaction time distribution above, taking all runs, the average is 0.046 and the median is 0.021.
In the second trimmed curve, the average becomes 0.026 and the median is still 0.021.
Now what if I overlay our RT curve on the example curve from the beginning?
It becomes clear, then, that reaction times are skewed data. We are trying to get close to 0.000 without going under, which creates this effect. When we miss, we miss to the slow side far more often than to the quick side, because with a green light you’re still in the game.
Statistically speaking, for skewed data, median is a better measure of central tendency than average. You’ll notice that in both cases, the median did not change. The average changes based on where we “trim” the data.
The downside and IER
The bad part about this whole thing is that a lot of statistical analysis tools and calculations use average and standard deviation. We’ve already shown we can’t (or shouldn’t, anyway) use average and standard deviation for reaction times.
Using the median allows the use of something like IQR, or “interquartile range.” For drag racers, this means that if we order all our reaction times fast to slow, then cut them into four equal sections, we now have four parts that each hold 25% of our RTs.
IQR answers the question: what is the range of the middle two sections? Put another way, I know I have a “spread” of reaction times. Lots of people talk about spread. But if we admit we sometimes miss, is there a “bulk” to that spread? If I ignored my slowest 25% and fastest 25%, what’s my inner 50% bulk spread?
Personally, I don’t believe the inner 50% is wide enough on its own, so I created my own similar statistic, IER, or “inner eightieth range.” It measures the spread of the middle 80% of reaction times.
So basically, chop off the fastest 10% and the slowest 10%, and you get a nice measure of the bulk of your reaction times. I chose the inner 80% because: a) in general, about 10-15% of people go red, which means if you know your IER, you know what your slowest light would usually be if you were okay with the typical 10% red range; and b) racers don’t “miss” all that often, so throwing out 10% on each end feels fair.
If you have your reaction times in a spreadsheet, here’s how you can find your spreads.
For total spread, be sure to exchange “array” for the entire range of cells for your data:
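Total spread is just the slowest light minus the fastest. Assuming Google Sheets or Excel, something like this should do it:

```
=MAX(array) - MIN(array)
```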
To find your IQR:
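IQR is quartile 3 minus quartile 1; assuming Sheets or Excel, that looks like:

```
=QUARTILE(array, 3) - QUARTILE(array, 1)
```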
To find your IER:
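Same idea, but using the 90th and 10th percentiles instead of the quartiles (again assuming Sheets or Excel):

```
=PERCENTILE(array, 0.9) - PERCENTILE(array, 0.1)
```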
You’ll notice the pattern, I’m sure. Again, just be sure you’re swapping out “array” for your range, which usually looks something like a2:a (if they’re all in the same column).
A visual interpretation of a reaction time set
Here we have 40-something runs placed on a chart with a visualization of the density curve.
The total spread is 0.045 (-0.006 to 0.039).
The median is expressed as the dashed line and is 0.009.
The IQR is expressed as blue lines and is 0.010. So half of all reaction times are only 0.010 apart.
The IER is expressed as green lines and is 0.014. So 80% of all reaction times are only 0.014 apart.
Then you’ll notice there are actually some statistical outliers we could ignore (and therefore not feel bad about). I discuss how to find those below.
If it makes sense to use the average for your data, that means you can use standard deviation. In that case, anything outside 3 standard deviations (outside the middle 99.73%) can be considered an outlier and effectively ignored.
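If you did want those 3-standard-deviation limits in a spreadsheet, something like this would mark them (STDEV here is the sample standard deviation, assuming Sheets or Excel):

```
=AVERAGE(array) - 3 * STDEV(array)
=AVERAGE(array) + 3 * STDEV(array)
```

The first cell is the lower limit and the second is the upper limit.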
If you’re using median, it’s also possible to actually calculate statistical outliers using these formulas:
Finding the lower limit:
Q1 - (1.5 * IQR)
Finding the upper limit:
Q3 + (1.5 * IQR)
Where Q1 is the value at the 25th percentile, and Q3 is the value at the 75th percentile. There are calculators online if you want to easily find the Q1 & Q3 values. Or if you’re using a spreadsheet, you can use:
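Assuming Sheets or Excel, the PERCENTILE function covers this:

```
=PERCENTILE(range, k)
```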
Where “range” is all the spreadsheet cells you want to use and looks something like a2:a, and k is the percentile you want, as a decimal (so .25 for Q1 and .75 for Q3).
So to find outliers in a spreadsheet, you could use each of these in a cell to find the upper limit (where anything higher can be ignored) and lower limit (where anything lower can be ignored). Be sure to change “array” to where the data is.
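Putting the Q1, Q3, and 1.5 × IQR pieces together, the two limit cells would look something like this in Sheets or Excel:

```
=QUARTILE(array, 1) - 1.5 * (QUARTILE(array, 3) - QUARTILE(array, 1))
=QUARTILE(array, 3) + 1.5 * (QUARTILE(array, 3) - QUARTILE(array, 1))
```

The first is the lower limit; the second is the upper limit.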
Truthfully, this doesn’t come into play that often, but if you’re extra worried about knowing whether you actually should leave out a reaction time, this is how you can know.