This repository implements a Recurrent Neural Network that predicts the spread of COVID-19 across the world. To accomplish this, a model is trained for each country, taking into account its nearest neighbors in terms of growth (explained later).
Clone this repo by running the following commands:
$ git clone https://github.com/rdbch/COVID-19-Forecast/
$ cd COVID-19-Forecast
Install PyTorch 1.0+ and the other dependencies (pandas, numpy, seaborn, jupyter, etc.).
- pip
$ pip install -r requirements.txt
- conda
$ conda create --name COVID-19-Forecast --file req.txt
Note: development was done using the GPU-accelerated version of PyTorch, so the requirement files include CUDA dependencies.
To have access to the latest data, please run
$ python scripts/fetch_new_data.py
This will download the latest global data from the Johns Hopkins University GitHub repo and convert it to a more convenient format (the one used in Kaggle's COVID-19 spread). Johns Hopkins updates the data on a daily basis.
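As an illustration of what such a conversion involves, here is a minimal sketch that reshapes a wide time-series table (one column per date) into long rows (one row per region and date). It is not the code of scripts/fetch_new_data.py. The function name `wide_to_long`, the input column names (`Province/State`, `Country/Region`, `Lat`, `Long`, then one `m/d/yy` column per day) and the output column names are assumptions made for this example.

```python
import csv
import io
from datetime import datetime

# Assumed metadata columns of the wide input table.
META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def wide_to_long(wide_csv_text, value_name):
    """Turn one-column-per-date CSV text into one row per (region, date)."""
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    rows = []
    for record in reader:
        for column, value in record.items():
            if column in META_COLUMNS:
                continue
            # Date columns are assumed to look like "1/22/20".
            date = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            rows.append({
                "Province_State": record["Province/State"],
                "Country_Region": record["Country/Region"],
                "Date": date,
                value_name: int(value),
            })
    return rows


sample = (
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    ",Romania,45.9,24.9,0,3\n"
)
for row in wide_to_long(sample, "ConfirmedCases"):
    print(row)
```

The sample values are made up; running the same function once per feature (confirmed cases, fatalities) and joining on region and date would give one table holding both.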
To run the notebooks, please start the Jupyter server in ./COVID-19-Forecast (the parent of ./notebooks):
$ jupyter notebook
Notebook: link
Rather than training a single model for all countries, it is better to train a model for each individual country, using only its nearest neighbours in terms of growth. Please check this notebook for more details. By doing this, we improve the predictions for the majority of countries.
Below it is explained how the nearest neighbors of a source country S are obtained:
First, for every country (S included), we discard the entries (days) that fall below a specified alignment threshold Ta, i.e. days with fewer than a specified number of cases. Then, we take a candidate country, Cn. Cn must be more evolved than S, meaning that it reached Ta earlier. We slide S over Cn, starting at the first day Cn reached the threshold and continuing until Cn ends. At each step, an error is computed, and the smallest of these errors becomes the error associated with Cn. We do this for all countries available in the dataset, taking one feature f, f in {confirmedCases, fatalities}, at a time. During training, the neighbours are filtered by applying an error threshold Terror(f).
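The alignment procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the error is taken to be the mean squared error on raw counts (the README does not name the metric or any scaling), and all series are assumed to end on the same date, so a longer thresholded series means the country reached Ta earlier.

```python
def trim_to_threshold(series, threshold):
    """Drop the leading days whose value is below the alignment threshold."""
    for i, value in enumerate(series):
        if value >= threshold:
            return series[i:]
    return []


def alignment_error(source, candidate):
    """Slide `source` along `candidate`; return (smallest error, best offset)."""
    n = len(source)
    best_error, best_offset = float("inf"), None
    for offset in range(len(candidate) - n + 1):
        window = candidate[offset:offset + n]
        # Assumption: mean squared error between the aligned windows.
        error = sum((s - c) ** 2 for s, c in zip(source, window)) / n
        if error < best_error:
            best_error, best_offset = error, offset
    return best_error, best_offset


def nearest_neighbours(source, candidates, align_threshold, error_threshold):
    """Return (name, error, offset) for accepted candidates, best first."""
    src = trim_to_threshold(source, align_threshold)
    accepted = []
    for name, series in candidates.items():
        cand = trim_to_threshold(series, align_threshold)
        # Keep only candidates that are more evolved than the source.
        if not src or len(cand) <= len(src):
            continue
        error, offset = alignment_error(src, cand)
        if error <= error_threshold:
            accepted.append((name, error, offset))
    return sorted(accepted, key=lambda item: item[1])


# Made-up series for one feature; real inputs would be per-country counts.
source = [0, 5, 10, 20, 40]
candidates = {
    "A": [5, 10, 20, 40, 80, 160],
    "B": [0, 0, 0, 1, 2, 3],
    "C": [10, 50, 300, 900, 2000, 5000, 9000],
}
print(nearest_neighbours(source, candidates, align_threshold=10, error_threshold=100))
```

In this toy example, B never reaches the threshold and C grows too differently from the source, so only A is kept. The procedure would be repeated separately for each feature f, with its own Terror(f).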
Below is a sample of the first 3 neighbours for Romania. The data used for this was last updated on 03 May 2020.
Notebook: link
A naive model based on recurrent cells is employed. The predictor was trained only on the nearest neighbours. To limit the growth, an unsupervised loss is used to smooth out the long-term prediction. Please check this notebook for more details.
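The README does not give the form of the unsupervised loss, so the sketch below shows only one common way to build such a term: penalising the squared second difference of the predicted sequence, which needs no ground truth and discourages abrupt changes in growth. The function names, the choice of penalty and the `smooth_weight` value are all assumptions. It uses plain Python lists so that it runs without dependencies; in a PyTorch training loop the same expression would be computed on tensors.

```python
def smoothness_penalty(prediction):
    """Mean squared second difference of a predicted sequence (assumed form)."""
    if len(prediction) < 3:
        return 0.0
    second_diffs = [
        prediction[t + 1] - 2 * prediction[t] + prediction[t - 1]
        for t in range(1, len(prediction) - 1)
    ]
    return sum(d * d for d in second_diffs) / len(second_diffs)


def total_loss(prediction, target, smooth_weight=0.1):
    """Supervised MSE on the known days plus the weighted smoothness term."""
    n = len(target)
    # Only the first n predicted days have ground truth.
    supervised = sum((p - y) ** 2 for p, y in zip(prediction[:n], target)) / n
    # The penalty covers the whole horizon, including days without labels.
    return supervised + smooth_weight * smoothness_penalty(prediction)


# Made-up numbers: two labelled days followed by two forecast days.
print(total_loss([1, 2, 3, 5], [1, 2], smooth_weight=0.5))
```

A sequence that grows linearly has zero penalty under this choice, so the term only acts on changes in the growth rate; the weight controls how strongly the long-term forecast is flattened.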
Below are the graphs for the results obtained for confirmed cases (left) and fatalities (right) of a country with an advanced disease spread and another with an average one. The predicted output represents a period of 60 days.
The data used for this task was last updated on 26.04.2020.