
Validation Checks

Jannik Deuschel edited this page Oct 12, 2020 · 18 revisions

Current Validation checks

Each pull request to our repository triggers an automated validation of the formatting requirements for forecast files in the data-processed folder. These checks are implemented in the script test-formatting.py. We use a somewhat shortened form of the procedure implemented by the US Forecast Hub; see here for their documentation. You can also run these checks locally or apply a similar set of checks in R, see here. Specifically, the following checks are performed:

Checks applied to all files

  • validates file name

    • checks that the file name has the format <date>-<country>-<team>-<model>.csv, with an optional -ICU or -case tag directly before .csv
    • checks that the <team> part of the file name is the same as the name of the containing folder
  • validates header

    • checks for required columns: location, target, type, quantile, value, forecast_date, target_end_date
  • validates csv rows at the row level:

    • checks each row has same number of columns as header

    • validates that forecast_date and target_end_date are dates in the format YYYY-MM-DD

    • validates that the horizon in "__ day ahead" or "__ wk ahead" targets is an integer

    • validates that values of quantile are int/float and taken from these 23 values:

      [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99]
    • checks that value is an int or float

  • validates quantiles and values (i.e., at the prediction level):

    • checks that entries in value are non-decreasing as quantiles increase
    • checks that elements in quantile are unique (per target)
  • validates quantiles as a group:

    • there must be exactly one point prediction for each location/target pair
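The file name and header checks can be sketched in a few lines of Python. This is a minimal illustration, not the code in test-formatting.py: the regular expression, the function names check_file_name and check_header, and the assumption that team and model names contain no hyphens are all hypothetical. The folder comparison follows the rule above literally (folder name equals the <team> part).

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Hypothetical regex for <date>-<country>-<team>-<model>[-ICU|-case].csv.
# Assumption: the team and model parts contain no hyphens themselves.
FILENAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"-(?P<country>[A-Za-z]+)"
    r"-(?P<team>[^-]+)"
    r"-(?P<model>[^-]+)"
    r"(?P<tag>-ICU|-case)?\.csv$"
)

# Required header columns, as listed on this page.
REQUIRED_COLUMNS = ["location", "target", "type", "quantile", "value",
                    "forecast_date", "target_end_date"]


def check_file_name(path: str) -> list[str]:
    """Return error messages for the file name (an empty list means valid)."""
    p = PurePosixPath(path)
    match = FILENAME_RE.match(p.name)
    if match is None:
        return [f"unexpected file name format: {p.name}"]
    # Rule taken literally from the list above: folder name == <team> part.
    if p.parent.name != match.group("team"):
        return [f"team '{match.group('team')}' does not match folder '{p.parent.name}'"]
    return []


def check_header(header: list[str]) -> list[str]:
    """Return one error message per required column missing from the header."""
    return [f"missing column: {col}" for col in REQUIRED_COLUMNS if col not in header]
```

For example, check_file_name("data-processed/KIT/2020-10-12-Germany-KIT-baseline-ICU.csv") returns an empty list, while a file placed in a folder with a different name yields one error.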
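The row-, prediction- and group-level checks listed above could look roughly like the following sketch. It assumes the CSV has already been split into a header and a list of rows (for instance with Python's csv.reader), that point rows are marked with type "point", and that quantile rows carry any other type; the function names is_date and check_rows are hypothetical and not taken from test-formatting.py.

```python
from __future__ import annotations

import re
from collections import defaultdict
from datetime import datetime

# The 23 allowed quantile levels listed above.
QUANTILES = {0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
             0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
# The horizon must be a (possibly negative) integer: "-1 day ahead ...", "2 wk ahead ...".
HORIZON_RE = re.compile(r"^-?\d+ (day|wk) ahead ")


def is_date(text: str) -> bool:
    """True if text is a valid calendar date written as YYYY-MM-DD."""
    if not DATE_RE.match(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_rows(header: list[str], rows: list[list[str]]) -> list[str]:
    """Row-, prediction- and group-level checks on already-split CSV rows."""
    errors = []
    quantiles = defaultdict(list)  # (location, target) -> [(quantile, value), ...]
    points = defaultdict(int)      # (location, target) -> number of point rows
    for i, raw in enumerate(rows, start=1):
        if len(raw) != len(header):
            errors.append(f"row {i}: {len(raw)} columns, header has {len(header)}")
            continue
        row = dict(zip(header, raw))
        for col in ("forecast_date", "target_end_date"):
            if not is_date(row[col]):
                errors.append(f"row {i}: {col} is not a YYYY-MM-DD date")
        if not HORIZON_RE.match(row["target"]):
            errors.append(f"row {i}: target horizon is not an integer")
        try:
            value = float(row["value"])
        except ValueError:
            errors.append(f"row {i}: value is not numeric")
            continue
        key = (row["location"], row["target"])
        if row["type"] == "point":
            points[key] += 1
            continue
        try:
            q = float(row["quantile"])
        except ValueError:
            errors.append(f"row {i}: quantile is not numeric")
            continue
        if q not in QUANTILES:
            errors.append(f"row {i}: quantile {q} is not an allowed level")
        quantiles[key].append((q, value))

    # Prediction level: unique quantiles and non-decreasing values per target.
    for key, pairs in sorted(quantiles.items()):
        levels = [q for q, _ in pairs]
        if len(levels) != len(set(levels)):
            errors.append(f"{key}: duplicate quantile levels")
        values = [v for _, v in sorted(pairs)]
        if any(lo > hi for lo, hi in zip(values, values[1:])):
            errors.append(f"{key}: values decrease as the quantile increases")
    # Group level: exactly one point prediction per location/target pair.
    for key in sorted(set(quantiles) | set(points)):
        if points[key] != 1:
            errors.append(f"{key}: expected exactly one point prediction, found {points[key]}")
    return errors
```

Grouping by (location, target) before comparing values keeps the monotonicity check independent of the row order in the file.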

Checks applied to death forecast files

The following checks are only applied to death forecast files, i.e. files without an -ICU or -case tag in the file name.

  • validates that target is one of the following:
    paste(-1:130, "day ahead inc death")
    paste(-1:130, "day ahead cum death")
    paste(-1:20,  "wk ahead inc death")
    paste(-1:20,  "wk ahead cum death")

Checks applied to case forecast files

The following checks are only applied to case forecast files, i.e. files with the -case tag at the end of the file name.

  • validates that target is one of the following:
    paste(-1:130, "day ahead inc case")
    paste(-1:130, "day ahead cum case")
    paste(-1:20, "wk ahead inc case")
    paste(-1:20, "wk ahead cum case")

Checks applied to ICU forecast files

The following checks are only applied to ICU forecast files, i.e. files with the -ICU tag at the end of the file name.

  • validates that target is one of the following:
    paste(-1:130, "day ahead curr ICU")
    paste(-1:130, "day ahead curr ventilated")
    paste(-1:20, "wk ahead curr ICU")
    paste(-1:20, "wk ahead curr ventilated")
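The R expressions above can be translated to Python to build the set of allowed targets for each file type. The sketch below is a hypothetical illustration: paste mimics R's paste with its default separator " ", the file type is inferred from the -ICU or -case tag at the end of the file name, and the helper names are made up for this example.

```python
from __future__ import annotations

# Horizons as in the R expressions above: -1:130 days and -1:20 weeks (inclusive).
DAY_HORIZONS = range(-1, 131)
WEEK_HORIZONS = range(-1, 21)

# Target labels per file type, copied from the lists above.
LABELS = {
    "death": ["inc death", "cum death"],
    "case": ["inc case", "cum case"],
    "ICU": ["curr ICU", "curr ventilated"],
}


def paste(horizons, text):
    """Python equivalent of R's paste(horizons, text) with its default sep = " "."""
    return [f"{h} {text}" for h in horizons]


def file_type(file_name: str) -> str:
    """Infer the forecast type from the tag at the end of the file name."""
    if file_name.endswith("-ICU.csv"):
        return "ICU"
    if file_name.endswith("-case.csv"):
        return "case"
    return "death"


def allowed_targets(kind: str) -> set[str]:
    """All targets allowed for one file type (day and week horizons)."""
    targets = set()
    for label in LABELS[kind]:
        targets.update(paste(DAY_HORIZONS, f"day ahead {label}"))
        targets.update(paste(WEEK_HORIZONS, f"wk ahead {label}"))
    return targets


def invalid_targets(file_name: str, targets: list[str]) -> list[str]:
    """Return the targets that are not allowed for this file's type."""
    allowed = allowed_targets(file_type(file_name))
    return sorted(set(targets) - allowed)
```

Each file type ends up with 2 × (132 + 22) = 308 allowed targets, so a case target in a death file, for example, is reported as invalid.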

Checks of location variable

The allowed entries for the variable location depend on the country indicated in the file name. For files containing -Germany-, the following locations are allowed: ["GM", "GM01", "GM02", "GM03", "GM04", "GM05", "GM06", "GM07", "GM08", "GM09", "GM10", "GM11", "GM12", "GM13", "GM14", "GM15", "GM16"]. The FIPS code to "Bundesland" mapping can be found here and on Wikipedia.

For files containing -Poland-, the following locations are allowed: ["PL", "PL72", "PL73", "PL74", "PL75", "PL76", "PL77", "PL78", "PL79", "PL80", "PL81", "PL82", "PL83", "PL84", "PL85", "PL86", "PL87"]. The FIPS code to "Voivodeship" mapping can be found here and on Wikipedia.
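Both location lists follow a simple pattern (a country code, plus 16 numbered regions), so they can be generated rather than typed out. The following is a hypothetical sketch, not the repository's implementation; it selects the list by looking for -Germany- or -Poland- in the file name, as described above.

```python
from __future__ import annotations

# Allowed location codes per country, as listed above.
ALLOWED_LOCATIONS = {
    "Germany": {"GM"} | {f"GM{i:02d}" for i in range(1, 17)},  # GM01 ... GM16
    "Poland": {"PL"} | {f"PL{i}" for i in range(72, 88)},      # PL72 ... PL87
}


def allowed_locations(file_name: str) -> set[str]:
    """Pick the location set from the -<country>- part of the file name."""
    for country, codes in ALLOWED_LOCATIONS.items():
        if f"-{country}-" in file_name:
            return codes
    raise ValueError(f"no supported country in file name: {file_name}")


def invalid_locations(file_name: str, locations: list[str]) -> list[str]:
    """Return the location codes that are not allowed for the file's country."""
    return sorted(set(locations) - allowed_locations(file_name))
```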

Metadata checks

  • validates metadata (In progress)
    • proper YAML format
    • includes the fields team_name, team_abbr, model_name, model_abbr and methods
    • methods is under 200 characters
    • forecast_startdate is a date
    • this_model_is_an_ensemble and this_model_is_unconditional are boolean
    • model_name needs to be distinct from any already existing model_name
    • model_abbr needs to be distinct from any already existing model_abbr
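Since the metadata checks are still in progress, the sketch below only illustrates how the listed rules could be expressed. It assumes the metadata file has already been parsed into a dict, for instance with PyYAML's yaml.safe_load (an assumption; the page does not say which parser is used), in which case a parse error would cover the "proper YAML format" rule. The function check_metadata and its existing argument, a list of metadata dicts of the other registered models, are hypothetical.

```python
from __future__ import annotations

import datetime

REQUIRED_FIELDS = ["team_name", "team_abbr", "model_name", "model_abbr", "methods"]
BOOLEAN_FIELDS = ["this_model_is_an_ensemble", "this_model_is_unconditional"]


def is_date(value) -> bool:
    """Accept date objects (what a YAML loader yields for an unquoted date)
    as well as strings in YYYY-MM-DD format."""
    if isinstance(value, datetime.date):
        return True
    if isinstance(value, str):
        try:
            datetime.datetime.strptime(value, "%Y-%m-%d")
            return True
        except ValueError:
            return False
    return False


def check_metadata(meta: dict, existing: list[dict]) -> list[str]:
    """Validate an already-parsed metadata dict against the rules above.

    existing: metadata of the other, already registered models (hypothetical
    input; how it is collected is not shown here).
    """
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in meta]
    methods = meta.get("methods")
    if isinstance(methods, str) and len(methods) >= 200:
        errors.append("methods must be under 200 characters")
    if "forecast_startdate" in meta and not is_date(meta["forecast_startdate"]):
        errors.append("forecast_startdate is not a date")
    for field in BOOLEAN_FIELDS:
        if field in meta and not isinstance(meta[field], bool):
            errors.append(f"{field} must be true or false")
    # model_name and model_abbr must not be used by any other model.
    for field in ("model_name", "model_abbr"):
        taken = {other.get(field) for other in existing}
        if field in meta and meta[field] in taken:
            errors.append(f"{field} '{meta[field]}' is already used by another model")
    return errors
```

When re-validating an updated metadata file, the model's own entry would need to be left out of existing, otherwise its name would be reported as taken.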

Execute checks locally

To make debugging easier, the automated validation of the formatting requirements applied to each new submission can also be run manually. To do so, run the script test-formatting.py, located in code/validation, from the root of the covid19-forecast-hub-de directory and pass your model name (similar to the one used in the data-processed folder) as the first command-line argument:

   ...\covid19-forecast-hub-de>python code/validation/test-formatting.py "kit-baseline"