Bayesian model for in-race marathon finish time predictions.
Code for the paper Quantifying Uncertainty in Live Marathon Finish Time Predictions. View the Shiny app here (code found in app.py).
Abstract: During a marathon, a runner’s expected finish time is commonly estimated by extrapolating the average pace so far, assuming it is held constant for the rest of the race. Two problems arise when predicting finish times this way: the estimates do not consider in-race context that can determine whether a runner is likely to finish faster or slower than expected, and the prediction is a single point estimate with no information about uncertainty. To address these issues, we implement a hierarchical Bayesian linear regression model that incorporates information from all splits in a race and allows quantification of uncertainty around the predicted finish times. Data from three marathons (Boston, New York, and Chicago) across four years (2021-2024) are used to establish the improved performance of this Bayesian approach over the traditional baseline method. Finally, we develop an app for runners to visualize their estimated finish distribution in real time.
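For reference, the traditional baseline described in the abstract (constant-pace extrapolation) can be sketched in a few lines of Python. This is only an illustration of the baseline idea, not code from this repository: the function names `constant_pace_finish` and `format_hms` and the constant `MARATHON_KM` are hypothetical, and the Bayesian model itself is not shown here.

```python
# Illustrative sketch of the constant-pace baseline (not the repository's code).
# All names below are hypothetical and chosen for this example only.

MARATHON_KM = 42.195  # standard marathon distance in kilometres


def constant_pace_finish(distance_km: float, elapsed_s: float) -> float:
    """Extrapolate a finish time in seconds, assuming the average pace
    so far is held constant for the rest of the race."""
    if not 0 < distance_km <= MARATHON_KM:
        raise ValueError("distance_km must be in (0, 42.195]")
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    pace_s_per_km = elapsed_s / distance_km  # average pace so far
    return pace_s_per_km * MARATHON_KM


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Example: a runner reaches the half-marathon split (21.0975 km) in 1:45:00.
half_split_s = 1 * 3600 + 45 * 60
estimate_s = constant_pace_finish(21.0975, half_split_s)
print(format_hms(estimate_s))
```

The result is a single point estimate that depends only on the most recent split, which is exactly the limitation the abstract points to: it carries no uncertainty and ignores the pattern of earlier splits.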
Data: Scraped from the websites of the Boston, New York, and Chicago Marathons and stored in \raw_data. Processed data are in \processed_data.