Validation Checks
Each pull request to our repository triggers an automated validation of the formatting requirements for forecast files in the data-processed folder. These checks are implemented in the script test-formatting.py. We use a somewhat shortened form of the procedure implemented by the US Forecast Hub; see here for their documentation. You can also run these checks locally or apply a similar set of checks in R, see here. Specifically, the following checks are performed:
- validates file name
  - checks that the format of the file name is <date>-<country>-<team>-<model><possibly -ICU or -case>.csv
  - checks that the <team> part of the file name is the same as the name of the containing folder
- validates header
  - checks for required columns: location, target, type, quantile, value, forecast_date, target_end_date
- validates csv rows at the row level:
  - checks that each row has the same number of columns as the header
  - validates that forecast_date and target_end_date are dates of format YYYY-MM-DD
  - validates that the "__ day ahead" or "__ week ahead" increments in target are integers
  - validates that values of quantile are int/float and from these 23 values: [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99]
  - checks that value is an int or float
- validates quantiles and values (i.e., at the prediction level):
  - checks that entries in value are non-decreasing as quantiles increase
  - checks that elements in quantile are unique (per target)
- validates quantiles as a group:
  - there must be exactly one point prediction for each location/target pair
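Some of the checks listed above can be sketched in a few lines of Python. This is a minimal illustration and not the code in test-formatting.py: the names FILE_NAME_RE, parse_file_name, missing_columns and check_prediction are hypothetical, the pattern assumes that team and model names contain no hyphens, and the type values "point" and "quantile" are an assumption.

```python
import re
from datetime import datetime

REQUIRED_COLUMNS = {"location", "target", "type", "quantile", "value",
                    "forecast_date", "target_end_date"}

# Hypothetical pattern; assumes team and model names contain no hyphens.
FILE_NAME_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})-(?P<country>[A-Za-z]+)-"
    r"(?P<team>[A-Za-z0-9_]+)-(?P<model>[A-Za-z0-9_]+)"
    r"(?P<tag>-ICU|-case)?\.csv"
)


def parse_file_name(name):
    """Return the parts of a forecast file name, or None if it is malformed."""
    match = FILE_NAME_RE.fullmatch(name)
    if match is None:
        return None
    try:
        # Reject strings such as 2020-13-40 that only look like dates.
        datetime.strptime(match.group("date"), "%Y-%m-%d")
    except ValueError:
        return None
    return match.groupdict()


def missing_columns(header):
    """Return the required columns that are absent from the header."""
    return sorted(REQUIRED_COLUMNS - set(header))


def check_prediction(rows):
    """Check the rows (dicts) of one location/target pair; return error messages."""
    errors = []
    # Assumption: the type column holds "point" or "quantile".
    points = [r for r in rows if r["type"] == "point"]
    if len(points) != 1:
        errors.append("expected exactly one point prediction")
    pairs = sorted((float(r["quantile"]), float(r["value"]))
                   for r in rows if r["type"] == "quantile")
    levels = [q for q, _ in pairs]
    if len(levels) != len(set(levels)):
        errors.append("quantile levels are not unique")
    values = [v for _, v in pairs]
    if any(a > b for a, b in zip(values, values[1:])):
        errors.append("values decrease as quantiles increase")
    return errors
```

For example, parse_file_name("2020-10-12-Germany-KIT-baseline-ICU.csv") returns a dictionary whose "tag" entry is "-ICU", while a name that does not follow the pattern yields None.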
The following checks are only applied to death forecast files, i.e. files without the -ICU or -case tag at the end of the file name.
- validates that target is one of the following (in R notation): paste(-1:130, "day ahead inc death"), paste(-1:130, "day ahead cum death"), paste(-1:20, "wk ahead inc death"), paste(-1:20, "wk ahead cum death")
The following checks are only applied to case forecast files, i.e. files containing the -case tag at the end of the file name.
- validates that target is one of the following (in R notation): paste(-1:130, "day ahead inc case"), paste(-1:130, "day ahead cum case"), paste(-1:20, "wk ahead inc case"), paste(-1:20, "wk ahead cum case")
The following checks are only applied to ICU forecast files, i.e. files containing the -ICU tag at the end of the file name.
- validates that target is one of the following (in R notation): paste(-1:130, "day ahead curr ICU"), paste(-1:130, "day ahead curr ventilated"), paste(-1:20, "wk ahead curr ICU"), paste(-1:20, "wk ahead curr ventilated")
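The R paste() expressions for the allowed targets expand to one string per horizon, e.g. "-1 day ahead inc death" up to "130 day ahead inc death". A rough Python equivalent is sketched below; the function name allowed_targets and its kind argument are hypothetical and only serve to illustrate the expansion.

```python
def allowed_targets(kind):
    """Return the allowed target strings for 'death', 'case' or 'ICU' files."""
    if kind in ("death", "case"):
        suffixes = [f"inc {kind}", f"cum {kind}"]
    elif kind == "ICU":
        suffixes = ["curr ICU", "curr ventilated"]
    else:
        raise ValueError(f"unknown forecast kind: {kind}")
    targets = set()
    for suffix in suffixes:
        # Mirrors paste(-1:130, "day ahead ..."): horizons -1 to 130 inclusive.
        targets.update(f"{h} day ahead {suffix}" for h in range(-1, 131))
        # Mirrors paste(-1:20, "wk ahead ..."): horizons -1 to 20 inclusive.
        targets.update(f"{h} wk ahead {suffix}" for h in range(-1, 21))
    return targets
```

A target value can then be checked with a simple membership test such as "1 wk ahead inc case" in allowed_targets("case").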
The allowed entries for the variable location depend on the country indicated in the file name. For files containing -Germany- the following locations are allowed:
["GM", "GM01", "GM02", "GM03", "GM04", "GM05", "GM06", "GM07", "GM08", "GM09", "GM10", "GM11", "GM12", "GM13", "GM14", "GM15", "GM16"]
The FIPS code - "Bundesland" mapping can be found here and on Wikipedia.
For files containing -Poland- the following locations are allowed:
["PL", "PL72", "PL73", "PL74", "PL75", "PL76", "PL77", "PL78", "PL79", "PL80", "PL81", "PL82", "PL83", "PL84", "PL85", "PL86", "PL87"]
The FIPS code - "Voivodeship" mapping can be found here and on Wikipedia.
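The country-dependent location check amounts to a lookup against the two lists above. The following sketch shows one way to express it; the names ALLOWED_LOCATIONS and invalid_locations are hypothetical and not taken from test-formatting.py.

```python
# Location codes per country: the national code plus the regional codes
# GM01 to GM16 and PL72 to PL87 listed above.
ALLOWED_LOCATIONS = {
    "Germany": ["GM"] + [f"GM{i:02d}" for i in range(1, 17)],
    "Poland": ["PL"] + [f"PL{i}" for i in range(72, 88)],
}


def invalid_locations(country, locations):
    """Return the location codes that are not allowed for the given country."""
    allowed = set(ALLOWED_LOCATIONS[country])
    return sorted(set(locations) - allowed)
```

An empty result means that all entries of the location column are valid for the country named in the file name.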
- validates metadata (In progress)
  - proper yaml format
  - includes: team_name, team_abbr, model_name, model_abbr, methods
  - methods is under 200 characters
  - forecast_startdate is a date
  - this_model_is_an_ensemble and this_model_is_unconditional are boolean
  - model_name needs to be distinct from any already existing model_name
  - model_abbr needs to be distinct from any already existing model_abbr
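Since the metadata validation is still in progress, the following is only a sketch of how the listed rules could be applied to a metadata file that has already been parsed into a dictionary. The function check_metadata and its arguments are hypothetical; it further assumes that forecast_startdate and the two boolean fields are checked only when present, and that a date given as a string uses ISO format.

```python
from datetime import date

REQUIRED_FIELDS = ["team_name", "team_abbr", "model_name", "model_abbr", "methods"]
BOOLEAN_FIELDS = ["this_model_is_an_ensemble", "this_model_is_unconditional"]


def _is_date(value):
    """Accept date objects (as produced by YAML parsers) or ISO date strings."""
    if isinstance(value, date):
        return True
    if isinstance(value, str):
        try:
            date.fromisoformat(value)
            return True
        except ValueError:
            return False
    return False


def check_metadata(meta, existing_names=(), existing_abbrs=()):
    """Return a list of error messages for a parsed metadata dictionary."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in meta]
    if len(str(meta.get("methods", ""))) >= 200:
        errors.append("methods must be under 200 characters")
    if "forecast_startdate" in meta and not _is_date(meta["forecast_startdate"]):
        errors.append("forecast_startdate is not a date")
    for field in BOOLEAN_FIELDS:
        if field in meta and not isinstance(meta[field], bool):
            errors.append(f"{field} must be a boolean")
    if meta.get("model_name") in existing_names:
        errors.append("model_name already exists")
    if meta.get("model_abbr") in existing_abbrs:
        errors.append("model_abbr already exists")
    return errors
```

The lists of existing model names and abbreviations would have to be collected from the metadata files already in the repository before calling the function.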
To make debugging easier, the automated validation of the formatting requirements that is applied to each new submission can also be executed manually. To do so, run the test-formatting.py script located in code/validation from the covid19-forecast-hub-de directory and pass your model name (similar to the one used in the data-processed folder) as the first command line argument.
...\covid19-forecast-hub-de>python code/validation/test-formatting.py "kit-baseline"