Checking your submission using Python or R

This entry is adapted from materials provided by the US COVID-19 Forecast Hub under the MIT license (https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/R_forecast_file_validation.md).

Upon submission, your forecast files will be checked automatically using the Python script code/validation/test-formatting.py.

In addition to checking formal correctness, we strongly encourage you to look at a visualization of your final forecasts and do some visual plausibility checks. We created a Shiny app to help you do so:

Local Checks with Python

Clone the repository

git clone https://github.com/KITmetricslab/covid19-forecast-hub-de.git

Add your files to data-processed/<TeamName>-<ModelName>
Run these commands for one-time setup

pip install pandas pymmwr click requests urllib3 selenium webdriver-manager python-dateutil pyyaml numpy
pip install git+https://github.com/reichlab/zoltpy/

Validate with python code/validation/test-formatting.py

Local Checks with R

For those familiar with R (but not Python), functions_plausibility.R contains a separate set of tests that may help diagnose data formatting issues. We have tried to keep these in sync with the Python checks that run automatically during a pull request, but cannot guarantee perfect agreement. If you discover discrepancies, let us know and we will try to address them.

To use these tests, first run the following from the base of the repository:

source("code/validation/functions_plausibility.R")

To check a single file, run

validate_file("data-processed/UMass-MechBayes/2020-04-26-UMass-MechBayes.csv")

To check a directory, run

validate_directory("data-processed/UMass-MechBayes/")

Any "ERROR" is likely to result in a failed pull request. "Warning"s and "Message"s are informational, but may help prevent unwanted or incomplete forecasts from getting pushed to the repository.

In addition to purely technical sanity checks, three plausibility checks are performed:

avoid quantile crossing: quantiles should be non-decreasing, e.g. it does not make sense to have a median of 500, but a 75% quantile of 400 in the same forecast.
avoid temporal inconsistencies: quantiles of cumulative forecasts should be non-decreasing over time. It does not make sense to predict a median of 500 one week ahead and a median of 400 two weeks ahead (for the same cumulative target and if both forecasts are issued at the same time).
avoid inconsistencies between incidence and cumulative forecasts: quantiles of cumulative deaths should not be below those of incident deaths for the same forecast horizon (as incident deaths are a subset of cumulative deaths).
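
The three checks above can be sketched in plain Python. This is only an illustration of the logic, not the code in functions_plausibility.R or the hub's Python validation script. The row layout (target, horizon in weeks, quantile level, value), the target labels "cum death" and "inc death", and all function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical row layout: (target, horizon_weeks, quantile_level, value).
# The layout and the target labels are assumptions for this sketch only.
rows = [
    ("cum death", 1, 0.5, 500.0),
    ("cum death", 1, 0.75, 400.0),  # crosses the median at horizon 1
    ("cum death", 2, 0.5, 400.0),   # median decreases from horizon 1 to 2
    ("cum death", 2, 0.75, 450.0),
    ("inc death", 1, 0.5, 50.0),
    ("inc death", 2, 0.5, 450.0),   # exceeds cumulative median at horizon 2
]


def find_quantile_crossings(rows):
    """Flag adjacent quantile levels whose values decrease within one forecast."""
    by_forecast = defaultdict(list)
    for target, horizon, level, value in rows:
        by_forecast[(target, horizon)].append((level, value))
    problems = []
    for (target, horizon), pairs in by_forecast.items():
        pairs.sort()  # order by quantile level
        for (lo_level, lo_val), (hi_level, hi_val) in zip(pairs, pairs[1:]):
            if hi_val < lo_val:
                problems.append((target, horizon, lo_level, hi_level))
    return problems


def find_temporal_decreases(rows, cumulative_target="cum death"):
    """Flag cumulative quantiles that decrease from one horizon to the next."""
    by_level = defaultdict(list)
    for target, horizon, level, value in rows:
        if target == cumulative_target:
            by_level[level].append((horizon, value))
    problems = []
    for level, pairs in by_level.items():
        pairs.sort()  # order by horizon
        for (h1, v1), (h2, v2) in zip(pairs, pairs[1:]):
            if v2 < v1:
                problems.append((level, h1, h2))
    return problems


def find_cum_below_inc(rows, cumulative_target="cum death",
                       incident_target="inc death"):
    """Flag cumulative quantiles below the matching incident quantile."""
    values = {(t, h, q): v for t, h, q, v in rows}
    problems = []
    for (target, horizon, level), value in values.items():
        if target != cumulative_target:
            continue
        incident = values.get((incident_target, horizon, level))
        if incident is not None and value < incident:
            problems.append((horizon, level))
    return problems


print(find_quantile_crossings(rows))
print(find_temporal_decreases(rows))
print(find_cum_below_inc(rows))
```

With the sample rows, each function reports exactly one problem, matching the three commented lines in the data. Returning the offending keys rather than a plain pass/fail makes it easier to locate the rows that need fixing.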

If inconsistencies are found here, the list returned by validate_file contains a table pointing you to the respective parts of your file which caused the problem.
