Checking your submission using Python or R
This entry is an adapted version of materials provided by the US COVID-19 Forecast Hub under the MIT license (https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/R_forecast_file_validation.md)
Upon submission, your forecast files will be checked automatically using the Python script code/validation/test-formatting.py.
In addition to checking formal correctness, we strongly encourage you to take a look at a visualization of your final forecasts and do some visual plausibility checks. We created a Shiny app to help you do so. To run the Python formatting checks locally before submitting:
- Clone the repository
git clone https://github.com/KITmetricslab/covid19-forecast-hub-de.git
- Add your files to
data-processed/<TeamName>-<ModelName>
- Run these commands for one-time setup
pip install pandas pymmwr click requests urllib3 selenium webdriver-manager python-dateutil pyyaml numpy
pip install git+https://github.com/reichlab/zoltpy/
- Validate with
python code/validation/test-formatting.py
For those familiar with R (but not Python), there is a separate set of tests in functions_plausibility.R that may be useful for diagnosing data formatting issues. We have tried to keep these in sync with the Python checks that run automatically during a pull request, but cannot guarantee perfect agreement. If you discover discrepancies, let us know and we will try to address them.
To use these tests, run the following from the base of the repository:
source("code/validation/functions_plausibility.R")
To check a single file, run
validate_file("data-processed/UMass-MechBayes/2020-04-26-UMass-MechBayes.csv")
To check a directory, run
validate_directory("data-processed/UMass-MechBayes/")
Any "ERROR" is likely to result in a failed pull request. "Warning"s and "Message"s are informational, but may help prevent unwanted or incomplete forecasts from getting pushed to the repository.
In addition to purely technical sanity checks, three plausibility checks are performed:
- avoid quantile crossing: quantiles should be non-decreasing, e.g. it does not make sense to have a median of 500, but a 75% quantile of 400 in the same forecast.
- avoid temporal inconsistencies: quantiles of cumulative forecasts should be non-decreasing over time. It does not make sense to predict a median of 500 one week ahead and a median of 400 two weeks ahead (for the same cumulative target and if both forecasts are issued at the same time).
- avoid inconsistencies between incidence and cumulative forecasts: quantiles of cumulative deaths should not be below those of incident deaths for the same forecast horizon (as incident deaths are a subset of cumulative deaths).
If inconsistencies are found here, the list returned by validate_file contains a table pointing you to the respective parts of your file which caused the problem.
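To make the three plausibility checks more concrete, here is a minimal Python sketch of the logic. It is not the code in functions_plausibility.R or test-formatting.py. The row layout (dicts with the keys kind, horizon, quantile and value) and all function names are assumptions made for illustration only, and a real forecast file would first need to be parsed into this shape.

```python
from collections import defaultdict

# Hypothetical row layout (an assumption, not the hub's file format):
# "kind" is "inc" or "cum", "horizon" is weeks ahead,
# "quantile" is the quantile level, "value" is the predicted count.


def find_quantile_crossings(rows):
    """Return (kind, horizon, lower_q, higher_q) where a higher quantile has a smaller value."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["kind"], r["horizon"])].append((r["quantile"], r["value"]))
    problems = []
    for (kind, horizon), pairs in sorted(groups.items()):
        pairs.sort()  # order by quantile level
        for (q_lo, v_lo), (q_hi, v_hi) in zip(pairs, pairs[1:]):
            if v_hi < v_lo:
                problems.append((kind, horizon, q_lo, q_hi))
    return problems


def find_cumulative_decreases(rows):
    """Return (quantile, earlier_h, later_h) where a cumulative quantile drops over time."""
    groups = defaultdict(list)
    for r in rows:
        if r["kind"] == "cum":
            groups[r["quantile"]].append((r["horizon"], r["value"]))
    problems = []
    for q, pairs in sorted(groups.items()):
        pairs.sort()  # order by horizon
        for (h1, v1), (h2, v2) in zip(pairs, pairs[1:]):
            if v2 < v1:
                problems.append((q, h1, h2))
    return problems


def find_incident_above_cumulative(rows):
    """Return (horizon, quantile) where the incident value exceeds the cumulative one."""
    cum = {(r["horizon"], r["quantile"]): r["value"] for r in rows if r["kind"] == "cum"}
    problems = []
    for r in rows:
        key = (r["horizon"], r["quantile"])
        if r["kind"] == "inc" and key in cum and r["value"] > cum[key]:
            problems.append(key)
    return sorted(problems)


# Made-up rows that trigger each check once.
example = [
    {"kind": "cum", "horizon": 1, "quantile": 0.5, "value": 500},
    {"kind": "cum", "horizon": 1, "quantile": 0.75, "value": 400},  # below the median
    {"kind": "cum", "horizon": 2, "quantile": 0.5, "value": 400},   # drops from 500
    {"kind": "cum", "horizon": 2, "quantile": 0.75, "value": 450},
    {"kind": "inc", "horizon": 1, "quantile": 0.5, "value": 600},   # above cumulative 500
]

print(find_quantile_crossings(example))
print(find_cumulative_decreases(example))
print(find_incident_above_cumulative(example))
```

Each function returns an empty list when the corresponding check passes, so a forecast is plausible in this sense only if all three lists are empty.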