by Andrie de Vries

A week ago my high school friend, @XLRunner, sent me a link to the article "How Zach Bitter Ran 100 Miles in Less Than 12 Hours". Zach's effort earned him the American record for the 100-mile event.

This reminded me of some analysis I did, many years ago, of the world record speeds for various running distances. The International Association of Athletics Federations (IAAF) keeps track of world records for distances from 100m up to the marathon (42.2km). Distances longer than the marathon are not on the IAAF event list, but records for them are tracked by various other organisations.

You can find a list of IAAF world records on Wikipedia, and a list of ultramarathon world best times on Wikipedia as well.

I extracted only the men's running events from these lists, and used R to plot the average running speed for each record:

You can immediately see that speed declines very rapidly from the sprint events onwards. Perhaps it would be better to plot this using a logarithmic x-scale, adding some labels at the same time. I also added some colour for what I call standard events, where "standard" is the type of distance you would see regularly at a World Championships or Olympic Games. Thus the mile is "standard", but the 2,000m race is not.
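For readers who want to reproduce this kind of plot, here is a minimal base-R sketch. It assumes a data frame with one row per event, the distance in metres, the record time in seconds and a logical `standard` flag; all column names and numbers are made up for illustration and are not the record data used in this post.

```r
# Hypothetical example data: NOT the actual world records
records <- data.frame(
  event    = c("200m", "800m", "2000m", "10000m", "Marathon"),
  distance = c(200, 800, 2000, 10000, 42195),   # metres
  time     = c(20, 105, 290, 1600, 7500),       # seconds (made-up values)
  standard = c(TRUE, TRUE, FALSE, TRUE, TRUE)   # regularly run at championships?
)

# Average speed in km/h: (metres / seconds) * 3.6
records$speed <- records$distance / records$time * 3.6

# Logarithmic x-axis, "standard" events in red, each point labelled
plot(records$distance / 1000, records$speed,
     log = "x", pch = 19,
     col = ifelse(records$standard, "red", "grey40"),
     xlab = "Distance (km, log scale)", ylab = "Average speed (km/h)")
text(records$distance / 1000, records$speed,
     labels = records$event, pos = 4, cex = 0.8)
```

The `log = "x"` argument is what gives the logarithmic distance axis; the colour is simply chosen per point from the `standard` flag.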

Now the data points lie closer to a straight line, which suggests we could consider fitting a linear regression.

However, it seems that there might be two kinks in the line:

- The first kink occurs somewhere between the 800m distance and the mile. It seems that the sprint distances (the 800m is sometimes called a long sprint) have different dynamics from the events up to the marathon.
- Then there is another kink for the ultramarathon distances. The standard marathon is 42.2km, and distances longer than this are called ultramarathons.

Also, note that the speed for the 100m is actually slower than for the 200m. This reflects the time lost accelerating from a standing start, which clearly plays a large role in the very short sprint distance.

## Subsetting the data

For the analysis below, I excluded the data for:

- The 100m sprint (transition effects play too large a role)
- The ultramarathon distances (they are raced less frequently, and something strange seems to be happening in the data, for the 50km race in particular).
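A sketch of this subsetting step, again on a small made-up data frame (the column names, speeds and the `analysis` object are hypothetical). It drops the 100m and everything longer than the marathon, then adds a `logDistance` column. Using log10 of the distance in km is an assumption, inferred from the breakpoint arithmetic later in the post.

```r
# Hypothetical data frame: distance in metres, made-up speeds in km/h
records <- data.frame(
  event    = c("100m", "200m", "1500m", "Marathon", "50km", "100km"),
  distance = c(100, 200, 1500, 42195, 50000, 100000),
  speed    = c(37, 37.5, 26, 20.5, 17.5, 15.5),
  stringsAsFactors = FALSE
)

marathon <- 42195  # marathon distance in metres

# Exclude the 100m (standing-start effect) and all ultramarathon distances
keep     <- records$event != "100m" & records$distance <= marathon
analysis <- records[keep, ]

# Predictor for the regression: log10 of distance in km
analysis$logDistance <- log10(analysis$distance / 1000)
analysis
```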

## Using the segmented package

To fit a regression line with kinks, more properly known as a segmented regression (sometimes also called piecewise regression), you can use the segmented package, available on CRAN.

The `segmented()` function allows you to modify a fitted object of class `lm` or `glm`, specifying which of the independent variables should have segments (kinks). In my case, I fitted a linear model with a single variable (log of distance), and allowed `segmented()` to find a single kink point.
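To make the model concrete: with one breakpoint `psi`, a segmented fit of speed on `x = logDistance` has the form `speed = b0 + b1*x + b2*max(x - psi, 0)`, so `b2` is the change in slope at the kink. The sketch below simulates made-up data with a known kink, fits the straight line first, and calls `segmented()` only if the package is installed, mirroring the call shown in the summary output below. As a package-free cross-check it also estimates `psi` by a simple grid search over candidate breakpoints; that grid search is a swapped-in technique for illustration, not the estimation algorithm segmented itself uses.

```r
set.seed(42)

# Simulated (made-up) data with a true kink at x = 0.05
x   <- seq(-0.7, 1.6, length.out = 15)   # plays the role of log10(distance in km)
psi <- 0.05
y   <- 27 - 15 * x + 11 * pmax(x - psi, 0) + rnorm(length(x), sd = 0.2)
dat <- data.frame(logDistance = x, speed = y)

# Step 1: an ordinary straight-line fit
lfit <- lm(speed ~ logDistance, data = dat)

# Step 2: let segmented() add one breakpoint in logDistance, if available
if (requireNamespace("segmented", quietly = TRUE)) {
  sfit <- segmented::segmented(lfit, seg.Z = ~logDistance)
  print(sfit$psi)  # initial value, estimated breakpoint and standard error
}

# Cross-check without the package: residual sum of squares of the
# broken-line model for a fixed candidate breakpoint p
rss_at <- function(p) {
  fit <- lm(speed ~ logDistance + pmax(logDistance - p, 0), data = dat)
  sum(residuals(fit)^2)
}

# Keep the candidate with the smallest RSS
grid    <- seq(min(x) + 0.1, max(x) - 0.1, by = 0.005)
psi_hat <- grid[which.min(sapply(grid, rss_at))]
psi_hat
```

The grid search is easy to follow but scales poorly and gives no standard error, which is one reason to prefer the package in practice.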

My analysis indicates that there is a kink point at 1.13km (10^0.055 = 1.13), i.e. just above 1km, between the 1,000m and 1,500m events.
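The back-transformation is easy to check in R. The estimate and standard error below are copied from the summary output that follows; the interval is only a rough normal-approximation sketch on the log scale, an assumption of this example rather than something reported by the fit.

```r
# Breakpoint and standard error from summary(sfit), on the log10(km) scale
psi_log <- 0.055
psi_se  <- 0.021

psi_km <- 10^psi_log   # breakpoint in km

# Rough 95% interval: normal approximation on the log scale, back-transformed
# (confint() on the fitted segmented object may give a different interval)
ci_km <- 10^(psi_log + c(-1, 1) * qnorm(0.975) * psi_se)

round(c(estimate = psi_km, lower = ci_km[1], upper = ci_km[2]), 3)
```

Under this approximation the interval runs from roughly 1.03km to 1.25km.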

```
> summary(sfit)

	***Regression Model with Segmented Relationship(s)***

Call: 
segmented.lm(obj = lfit, seg.Z = ~logDistance)

Estimated Break-Point(s):
  Est. St.Err 
 0.055  0.021 

Meaningful coefficients of the linear terms:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     27.2064     0.1755  155.04  < 2e-16 ***
logDistance    -15.1305     0.4332  -34.93 1.94e-13 ***
U1.logDistance  11.2046     0.4536   24.70       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2373 on 12 degrees of freedom
Multiple R-Squared: 0.9981,  Adjusted R-squared: 0.9976 

Convergence attained in 4 iterations with relative change -4.922372e-16 
```

The final plot shows the same data, but this time with the segmented regression line also displayed.
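In the summary output, `U1.logDistance` is the change in slope at the breakpoint, so the slope beyond the kink is the sum of the two slope estimates. The sketch below rebuilds the broken line from the printed estimates; `broken_line` is a hypothetical helper written for this example, not a function from the segmented package.

```r
# Estimates copied from the summary(sfit) output
b0  <- 27.2064    # intercept
b1  <- -15.1305   # slope in logDistance to the left of the breakpoint
d1  <- 11.2046    # U1.logDistance: change in slope at the breakpoint
psi <- 0.055      # breakpoint on the log10(km) scale

slope_left  <- b1        # sprint side of the kink
slope_right <- b1 + d1   # middle-distance side of the kink

# Hypothetical helper: fitted speed for a distance given in km
broken_line <- function(km) {
  x <- log10(km)
  b0 + b1 * x + d1 * pmax(x - psi, 0)
}

c(left = slope_left, right = slope_right)
broken_line(c(0.4, 1.5, 10, 42.195))

# Draw the fitted broken line on a logarithmic distance axis
curve(broken_line(x), from = 0.2, to = 42.195, log = "x",
      xlab = "Distance (km, log scale)", ylab = "Fitted average speed")
```

Because the `pmax()` term is zero at the breakpoint, the two segments join there, which is what makes this a continuous "kink" rather than a jump.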

## Conclusion

I conclude:

- It is really easy to fit a segmented linear regression model using the segmented package
- There seems to be a different physiological process for the sprint events and the middle distance events. The segmented regression finds this kink point between the 1,000m and 1,500m events
- The ultramarathon distances have a completely different dynamic. However, it's not clear to me whether this is due to inherent physiological constraints, or vastly reduced competition in these "non-standard" events.
- The 50km world record seems too "slow". Perhaps the competition for this event is less intense than for the marathon?

## The code

Here is my code for the analysis:

Thanks for the shoutout!

In addition to the physiology, psychology definitely comes into play in the ultra distances. Also, in a sport with Kenyan and Ethiopian runners at the top of most marathon events, we have not seen many of these elites at ultra races. Perhaps that will change over time.

Posted by: XLrunner | December 31, 2015 at 06:49

It may not seem obvious at first, but in the 100m dash there is a long acceleration phase of about 55-60 meters before top speed is attained. In the 200m, by contrast, the first 100m is slower than the 100m dash, but the second 100m is much faster because the sprinter has already attained top cruising speed. Examine the splits (speed of race segments) and it becomes clear.

Posted by: Gerald Cheves | January 01, 2016 at 22:37

Fun! I think that the biggest outlier is the super-fast marathon. I'd say that is due to the large amounts of money to be won by winning one of the world's big marathons. So - stiffer competition, and perhaps more drug taking, but that is speculation.

From my limited knowledge of sports physiology, having three regimes: anaerobic, aerobic and fat-burning seems a reasonable hypothesis.

Posted by: Alan Parker | January 02, 2016 at 06:08