Time-series based 7-days ahead forecasts of (known) confirmed cases of the novel Corona virus COVID-19 (2019-nCoV)
Archived since 2021-01-12: This repo is archived but can still be forked.
Statement: This is just a "hobby" project, and the forecast results should not be taken too seriously, even though forecast performance may be reasonable for some countries and some periods. Don't take the forecast numbers for granted! This project is mainly meant to show what can be achieved (without too much effort) with the open-source statistics and econometrics software gretl (URL: http://gretl.sourceforge.net/).
Minimum gretl version: At least gretl 2020b.
An automated job retrieves the latest data at 3am (CET), trains a new model, computes the forecasts and uploads the new forecast plots to my GitHub repo. The whole job finishes in about 15 seconds on a Raspberry Pi 4, which is just an amazing piece of hardware ;-)
Data provided by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) can be found here: https://github.com/CSSEGISandData/COVID-19
I want to keep it simple. If you fancy, you can use the current state of the code as a starting point for developing and evaluating more complex approaches. However, the ARIMA type of model applied may already provide a reasonable approach for modelling and computing short-term contagion dynamics.
I recently (2020-07-26) switched from simple pre-defined ARIMA models to my official auto_arima package for gretl, which automatically searches for the "best" model. Details on the auto_arima package can be found here: https://github.com/atecon/auto_arima
The code executes a brute-force search for the 'best' ARIMA model specification by optimizing the corrected Akaike information criterion ("aicc"). The following parameter space (implying 120 different ARIMA models in total) is evaluated:
Default ARIMA parameter space:
string ARIMA_OPTS.INFO_CRIT = "aicc" # information criteria to optimize
scalar ARIMA_OPTS.min_p = 0 # autoregressive (AR) order
scalar ARIMA_OPTS.max_p = 4 # autoregressive (AR) order
scalar ARIMA_OPTS.min_d = 0 # differencing order
scalar ARIMA_OPTS.max_d = 2 # differencing order
scalar ARIMA_OPTS.min_q = 0 # moving average (MA) order
scalar ARIMA_OPTS.max_q = 1 # moving average (MA) order
scalar ARIMA_OPTS.min_P = 0 # seasonal autoregressive (AR) order
scalar ARIMA_OPTS.max_P = 1 # seasonal autoregressive (AR) order
scalar ARIMA_OPTS.min_D = 0 # seasonal differencing order
scalar ARIMA_OPTS.max_D = 0 # seasonal differencing order
scalar ARIMA_OPTS.min_Q = 0 # seasonal MA order
scalar ARIMA_OPTS.max_Q = 1 # seasonal MA order
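The figure of 120 models follows from multiplying the number of admissible values per order. A minimal hansl sketch of that arithmetic is given below; the scalar names (`n_p`, `n_models`, and so on) are made up for this illustration and are not part of the auto_arima package.

```hansl
# Number of admissible values per order: max - min + 1
scalar n_p = 4 - 0 + 1    # AR order p = 0,...,4
scalar n_d = 2 - 0 + 1    # differencing order d = 0,...,2
scalar n_q = 1 - 0 + 1    # MA order q = 0,1
scalar n_P = 1 - 0 + 1    # seasonal AR order P = 0,1
scalar n_D = 0 - 0 + 1    # seasonal differencing order D = 0
scalar n_Q = 1 - 0 + 1    # seasonal MA order Q = 0,1

# Size of the full grid: 5 * 3 * 2 * 2 * 1 * 2 = 120
scalar n_models = n_p * n_d * n_q * n_P * n_D * n_Q
printf "Number of candidate models: %d\n", n_models
```

Widening any single range scales the grid multiplicatively, so the run time of the brute-force search grows accordingly.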
We compute out-of-sample multi-period interval forecasts. The multi-period forecast is computed recursively. By default, the 90% (Gaussian) forecast interval is shown as well.
The gretl script for setting up relevant things and executing the analysis is ./script/run.inp.
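To illustrate what a recursive multi-period forecast with a Gaussian interval means, here is a minimal hansl sketch for a stationary AR(1) model. It is a simplified stand-in, not the code used by this project or by auto_arima: the parameter values (`a0`, `phi`, `sigma`, `y_last`) are invented, and the selected ARIMA model will usually be more complex (for example, differenced).

```hansl
# Made-up AR(1) parameters, for illustration only
scalar a0 = 2.0       # intercept
scalar phi = 0.8      # AR(1) coefficient
scalar sigma = 1.5    # standard deviation of the innovations
scalar y_last = 20    # last observed value
scalar H = 7          # forecast horizon

# Critical value for a two-sided 90% interval (5% in each tail)
scalar zcrit = critical(z, 0.05)

matrix fc = zeros(H, 3)   # columns: point forecast, lower bound, upper bound
scalar yhat = y_last
scalar v = 0              # forecast-error variance
loop h = 1..H
    # each step feeds the previous forecast back into the model
    yhat = a0 + phi * yhat
    # the h-step error variance accumulates sigma^2 * phi^(2j), j = 0,...,h-1
    v += sigma^2 * phi^(2 * (h - 1))
    fc[h,1] = yhat
    fc[h,2] = yhat - zcrit * sqrt(v)
    fc[h,3] = yhat + zcrit * sqrt(v)
endloop
print fc
```

The interval widens with the horizon because each additional step adds a further term to the forecast-error variance.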
At the beginning of the script, the user can specify the following parameters:
string DIR_WORK = "" # <SET_PATH_HERE> e.g. "/home/git_project"
string DATA_URL = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv" # Data source
string INITIAL_DATE = "2020-01-22" # 1st available observation of CSSE dataset
scalar MINIMUM_CASES = 30 # Minimal number of confirmed cases at latest observation for consideration
string SAVE_PLOT_AS = "png" # graphic format: "png", "pdf", "eps"
# ARIMA model settings -- see auto_arima package: http://ricardo.ecn.wfu.edu/gretl/cgi-bin/current_fnfiles/auto_arima.gfn
bundle ARIMA_OPTS = null
scalar ARIMA_OPTS.MAX_HORIZON = 7 # max. multi-step OoS forecast horizon
# optimize by information criteria, either aic, aicc, bic or hqc
string ARIMA_OPTS.INFO_CRIT = "aicc"
scalar ARIMA_OPTS.min_p = 0 # autoregressive (AR) order
scalar ARIMA_OPTS.max_p = 4 # autoregressive (AR) order
scalar ARIMA_OPTS.min_d = 0 # differencing order
scalar ARIMA_OPTS.max_d = 2 # differencing order
scalar ARIMA_OPTS.min_q = 0 # moving average (MA) order
scalar ARIMA_OPTS.max_q = 1 # moving average (MA) order
scalar ARIMA_OPTS.min_P = 0 # seasonal autoregressive (AR) order
scalar ARIMA_OPTS.max_P = 1 # seasonal autoregressive (AR) order
scalar ARIMA_OPTS.min_D = 0 # seasonal differencing order
scalar ARIMA_OPTS.max_D = 0 # seasonal differencing order
scalar ARIMA_OPTS.min_Q = 0 # seasonal MA order
scalar ARIMA_OPTS.max_Q = 1 # seasonal MA order
This script will also load the functions doing the main work in the background, which are stored in ./src/helper.inp, as well as the 3rd party library auto_arima.
The gretl script can be executed in the following ways:
Option A:
1) Clone the repo by means of ```git clone```
2) Open the script "./script/run.inp"
3) Set your project path by setting the variable "DIR_WORK" accordingly.
4) Execute and enjoy.
Option B (works for linux):
1) Clone the repo by means of ```git clone```
2) Execute the shell-script run.sh
The script downloads the latest available CSSE data and processes the raw data to obtain a clean panel data set. Next, for each country-province combination, two exercises are conducted:
1) If the parameter ```RUN_EXPOST_ANALYSIS``` is set to '1', an **ex-post** forecasting analysis is done. For this, the training set comprises <CURRENT_DATE - MAX_HORIZON> observations, where "CURRENT_DATE" refers to the latest date for which data is available and ```MAX_HORIZON``` is the multi-step forecast horizon (default: 7 days).
2) Out-of-sample interval forecasts for the forthcoming ```MAX_HORIZON``` days are computed.
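The sample split behind the ex-post analysis can be sketched in hansl as follows. This is only an illustration under stated assumptions: the case counts are made up, the horizon is shortened to 3 so that the toy data suffice, and the names `cases`, `train` and `holdout` do not come from the project's code.

```hansl
# Made-up cumulative case counts (10 daily observations), for illustration only
matrix cases = {30; 34; 41; 50; 58; 69; 80; 95; 110; 128}
scalar MAX_HORIZON = 3    # shortened from the default of 7 to fit the toy data

scalar n_obs = rows(cases)
scalar train_end = n_obs - MAX_HORIZON
scalar test_start = train_end + 1

# The model is estimated on the training part only ...
matrix train = cases[1:train_end, 1]
# ... and its forecasts are compared with the held-out realizations
matrix holdout = cases[test_start:n_obs, 1]

printf "training observations: %d\n", rows(train)
printf "hold-out observations: %d\n", rows(holdout)
```

Because the hold-out observations are never used for estimation, comparing them with the forecasts gives an honest impression of recent forecast accuracy.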
The left panel shows forecasts made with the information available 7 days ago, together with the realizations of 'confirmed cases' during this period. This may give you an idea of how well the forecasts matched the actual dynamics.
The middle panel shows forecasts made on the latest data for the forthcoming 7 days. The right panel depicts both the historic and predicted day-to-day changes of confirmed cases.
australia - australian_capital_territory
australia - northern_territory
canada - newfoundland_and_labrador
netherlands - bonaire_sint_eustatius_and_saba
saint_vincent_and_the_grenadines -
united_kingdom - british_virgin_islands
united_kingdom - cayman_islands
united_kingdom - channel_islands