In this project I leverage the XGBoost machine learning library and publicly available weather data to forecast the date of the last spring freeze (LDSF).
In Figure 1, I show an example of minimum temperature and temperature fluctuation forecasts after Jan. 31, 2010. Generally, the average and variation of the true and predicted values are similar. We can also see from the purple vertical lines that the predicted and true last day of spring freeze (LDSF) are only 3 days apart. Farmers and gardeners often use the Farmers' Almanac, but its average-based predictions do not capture year-to-year variation. For example, using the average, the absolute difference from the true LDSF in 2010 is 7 days, more than twice as large as my prediction's error.
Figure 1. The true minimum temperature and temperature fluctuation distributions are shown in blue. The XGBoost model was trained on data prior to "today"; everything after "today" is forecast (except the true distribution, obviously). The data are noisy, so it is not surprising that the predictions are not perfect, but since we have prior information about the seasonality (a sine function), we can get pretty close. The machine learning model can then focus on the temperature fluctuations rather than the total temperature. `T_flucs(t+1)` is the predicted temperature fluctuation one day into the future; `T_flucs(t+7)` is the prediction seven days into the future. LDSF marks the last day of spring freeze for both the prediction and the truth.
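As a rough illustration of the decomposition described in the caption, the sketch below fits a sine curve to the daily minimum temperature and trains XGBoost only on the residual fluctuations. The file name, column names, lag features, and hyperparameters are assumptions for illustration, not the exact pipeline used here.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from xgboost import XGBRegressor

# Assumed input: one row per day with columns
#   "doy"  - day of year (1-366)
#   "tmin" - daily minimum temperature
df = pd.read_csv("daily_weather.csv")  # hypothetical file name

# 1. Fit the seasonal component with a sine function of the day of year.
def seasonal(doy, amp, phase, offset):
    return amp * np.sin(2 * np.pi * doy / 365.25 + phase) + offset

params, _ = curve_fit(seasonal, df["doy"], df["tmin"], p0=[10.0, 0.0, 5.0])
df["t_seasonal"] = seasonal(df["doy"], *params)

# 2. The model only has to learn the fluctuations around the seasonal curve.
df["t_flucs"] = df["tmin"] - df["t_seasonal"]

# 3. Simple lag features: the previous seven days' fluctuations.
for lag in range(1, 8):
    df[f"t_flucs_lag{lag}"] = df["t_flucs"].shift(lag)
df = df.dropna()

features = [f"t_flucs_lag{lag}" for lag in range(1, 8)]

# 4. One regressor per horizon; here the 1-day-ahead case, i.e. predicting
#    today's fluctuation from the previous days. The t+7 model would shift
#    the target by seven days instead.
X = df[features]
y = df["t_flucs"]
model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.05)
model.fit(X, y)

# Forecast: seasonal curve + predicted fluctuation gives the minimum temperature.
df["tmin_pred"] = df["t_seasonal"] + model.predict(X)
```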
Here I show the mean absolute error (MAE) between the predicted and true LDSFs.
MAE(avg. LDSF over previous years, truth) | MAE(sine model, truth) | MAE(XGBoost 1-day forecast, truth)
---|---|---
10.2 days | 8.9 days | 7.6 days
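For reference, these MAE values are simply the average absolute difference, in days, between each year's predicted and true LDSF. A minimal sketch with made-up dates:

```python
import pandas as pd

# Hypothetical predicted vs. true last day of spring freeze (LDSF) per year.
pred_ldsf = pd.Series(pd.to_datetime(["2010-04-18", "2011-04-25", "2012-04-10"]))
true_ldsf = pd.Series(pd.to_datetime(["2010-04-15", "2011-05-02", "2012-04-20"]))

# MAE is the mean absolute difference in days between prediction and truth.
mae_days = (pred_ldsf - true_ldsf).dt.days.abs().mean()
print(f"MAE: {mae_days:.1f} days")  # -> 6.7 days for these made-up dates
```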
Using the XGBoost model, the MAE decreases by roughly 25% relative to the average-over-previous-years baseline ((10.2 - 7.6) / 10.2 ≈ 25%).