Checking your submission using Python or R

Johannes Bracher edited this page Oct 27, 2020 · 5 revisions

This entry is an adapted version of materials provided by the US COVID-19 Forecast Hub under the MIT license (https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/R_forecast_file_validation.md)

Upon submission, your forecast files will be checked automatically using the Python script test-formatting.py.

In addition to checking formal correctness, we strongly encourage you to look at a visualization of your final forecasts and do some visual plausibility checks. We created a Shiny app to help you do so.

Local Checks with Python

  1. Clone the repository
     git clone https://github.com/KITmetricslab/covid19-forecast-hub-de.git
  2. Add your files to data-processed/<TeamName>-<ModelName>
  3. Run these commands for one-time setup
     pip install pandas pymmwr click requests urllib3 selenium webdriver-manager python-dateutil pyyaml numpy
     pip install git+https://github.com/reichlab/zoltpy/
  4. Validate with python code/validation/test-formatting.py <folder name>, where <folder name> could for instance be KIT-baseline. This will check all files in the respective subfolder <folder name> of data-processed. If you provide no folder name, all folders will be checked.
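
If you maintain more than one model folder, the validation step can be wrapped in a small helper that calls the script once per folder. The following is a minimal sketch, not part of the repository: the function names build_validation_command and validate_folders are hypothetical, and it assumes that a non-zero return code signals a failed check, which this page does not document.

```python
import subprocess
import sys

# Path taken from the validation step above, relative to the repository root.
SCRIPT = "code/validation/test-formatting.py"


def build_validation_command(folder=None):
    """Return the argument list for one validation run.

    With folder=None the script is called without a folder name, which
    according to the step above checks every folder in data-processed.
    """
    command = [sys.executable, SCRIPT]
    if folder is not None:
        command.append(folder)
    return command


def validate_folders(folders, repo_root="."):
    """Run the validation script once per folder and collect return codes.

    Assumption: a non-zero return code means the check failed; the
    script's exit codes are not documented on this page.
    """
    results = {}
    for folder in folders:
        completed = subprocess.run(build_validation_command(folder), cwd=repo_root)
        results[folder] = completed.returncode
    return results
```

Calling validate_folders(["KIT-baseline"]) from the base of the repository would then run the same command as in step 4 and return a dictionary mapping each folder name to the return code of its run.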

Local Checks with R

For those familiar with R (but not Python), functions_plausibility.R contains a separate set of tests that may be useful for diagnosing data formatting issues. We have tried to keep these in sync with the Python checks that run automatically during a pull request, but cannot guarantee perfect agreement. If you discover discrepancies, let us know and we will try to address them.

To use these tests, run the following from the base of the repository:

source("code/validation/functions_plausibility.R")

To check a single file, run

validate_file("data-processed/UMass-MechBayes/2020-04-26-UMass-MechBayes.csv")

To check a directory, run

validate_directory("data-processed/UMass-MechBayes/")

Any "ERROR" is likely to result in a failed pull request. "Warning"s and "Message"s are informational, but may help prevent unwanted or incomplete forecasts from getting pushed to the repository.

In addition to purely technical sanity checks, three plausibility checks are performed:

  • avoid quantile crossing: quantiles should be non-decreasing, e.g. it does not make sense to have a median of 500, but a 75% quantile of 400 in the same forecast.
  • avoid temporal inconsistencies: quantiles of cumulative forecasts should be non-decreasing over time. It does not make sense to predict a median of 500 one week ahead and a median of 400 two weeks ahead (for the same cumulative target and if both forecasts are issued at the same time).
  • avoid inconsistencies between incidence and cumulative forecasts: quantiles of cumulative deaths should not be below those of incident deaths for the same forecast horizon (as incident deaths are a subset of cumulative deaths).
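
To make the three checks above concrete, here is a minimal sketch in plain Python. It is not the code in functions_plausibility.R or in the Python validation script. The row layout (dictionaries with the hypothetical keys horizon, kind, quantile and value) is an assumption chosen for illustration and is simpler than the actual submission format.

```python
from collections import defaultdict


def find_quantile_crossings(rows):
    """Return (horizon, kind, lower_q, upper_q) where a higher quantile level has a smaller value."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["horizon"], r["kind"])].append((r["quantile"], r["value"]))
    problems = []
    for (horizon, kind), pairs in sorted(groups.items()):
        pairs.sort()  # order by quantile level
        for (q_lo, v_lo), (q_hi, v_hi) in zip(pairs, pairs[1:]):
            if v_hi < v_lo:
                problems.append((horizon, kind, q_lo, q_hi))
    return problems


def find_decreasing_cumulative(rows):
    """Return (quantile, earlier_horizon, later_horizon) where a cumulative quantile drops over time."""
    groups = defaultdict(list)
    for r in rows:
        if r["kind"] == "cum":
            groups[r["quantile"]].append((r["horizon"], r["value"]))
    problems = []
    for q, pairs in sorted(groups.items()):
        pairs.sort()  # order by forecast horizon
        for (h1, v1), (h2, v2) in zip(pairs, pairs[1:]):
            if v2 < v1:
                problems.append((q, h1, h2))
    return problems


def find_cum_below_inc(rows):
    """Return (horizon, quantile) where the cumulative value is below the incident value."""
    values = {(r["horizon"], r["kind"], r["quantile"]): r["value"] for r in rows}
    problems = []
    for (horizon, kind, q), inc_value in sorted(values.items()):
        if kind != "inc":
            continue
        cum_value = values.get((horizon, "cum", q))
        if cum_value is not None and cum_value < inc_value:
            problems.append((horizon, q))
    return problems


# Made-up example values; horizons are in weeks ahead.
rows = [
    {"horizon": 1, "kind": "cum", "quantile": 0.5, "value": 500},
    {"horizon": 1, "kind": "cum", "quantile": 0.75, "value": 400},  # below the median
    {"horizon": 2, "kind": "cum", "quantile": 0.5, "value": 400},   # below the week 1 median
    {"horizon": 2, "kind": "cum", "quantile": 0.75, "value": 600},
    {"horizon": 2, "kind": "inc", "quantile": 0.5, "value": 100},
    {"horizon": 2, "kind": "inc", "quantile": 0.75, "value": 700},  # above cumulative 600
]

print(find_quantile_crossings(rows))     # [(1, 'cum', 0.5, 0.75)]
print(find_decreasing_cumulative(rows))  # [(0.5, 1, 2)]
print(find_cum_below_inc(rows))          # [(2, 0.75)]
```

Each function returns a list of the offending positions rather than raising an error, so all problems in a file can be reported at once.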

If any of these checks finds an inconsistency, the list returned by validate_file contains a table pointing you to the parts of your file that caused the problem.