update readme to include parquet files and clarification about samples #29

Merged
merged 6 commits into from
Nov 12, 2024
45 changes: 25 additions & 20 deletions model-output/README.md
@@ -75,10 +75,11 @@ The combination of `team` and `model` should be unique from any other model in t

### Metadata

The metadata file will be saved within the model-metdata directory in the Hub's GitHub repository, and should have the following naming convention:
The metadata file will be saved within the model-metadata directory in the Hub's GitHub repository. It should be a YAML file with base name `{team}-{model}` and extension `.yml` or `.yaml`, e.g.

exampleteam-examplemodel.yml
otherteam-othermodel.yaml

team-model.yml

Details on the content and formatting of metadata files are provided in the [model-metadata README](https://github.com/hubverse-org/hubTemplate/blob/master/model-metadata/README.md).

@@ -90,22 +91,30 @@ Details on the content and formatting of metadata files are provided in the [mod
Each forecast file should have the following
format

YYYY-MM-DD-team-model.csv
{YYYY-MM-DD}-{team}-{model}.csv

or

{YYYY-MM-DD}-{team}-{model}.parquet

depending on whether the team is submitting forecasts as `.csv` files or as `.parquet` files.

where

- `YYYY` is the 4 digit year,
- `MM` is the 2 digit month,
- `DD` is the 2 digit day,
- `team` is the team name, and
- `model` is the name of your model.
- `team` is the abbreviated team name, and
- `model` is the abbreviated name of your model.

The date YYYY-MM-DD is the [`reference_date`](#reference_date). This should be the Saturday following the submission date. For example, a submission from the team above for a reference date of November 2, 2024 will be named:

The date YYYY-MM-DD is the [`reference_date`](#reference_date). This should be the Saturday following the submission date.
2024-11-02-exampleteam-examplemodel.csv
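
The naming rule above can be sketched in Python. This is a minimal illustration rather than Hub tooling: the function names `reference_date_for` and `forecast_file_name` are hypothetical, and it assumes that a submission made on a Saturday maps to the following Saturday.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday() numbers Monday as 0, so Saturday is 5


def reference_date_for(submission_date: date) -> date:
    """Return the first Saturday strictly after submission_date."""
    days_ahead = (SATURDAY - submission_date.weekday()) % 7
    if days_ahead == 0:
        # Assumption: a Saturday submission rolls over to the next Saturday.
        days_ahead = 7
    return submission_date + timedelta(days=days_ahead)


def forecast_file_name(submission_date: date, team: str, model: str, ext: str = "csv") -> str:
    """Build {YYYY-MM-DD}-{team}-{model}.{ext} using the reference date."""
    ref = reference_date_for(submission_date)
    return f"{ref.isoformat()}-{team}-{model}.{ext}"


# A submission on Wednesday, October 30, 2024 has reference date November 2, 2024.
print(forecast_file_name(date(2024, 10, 30), "exampleteam", "examplemodel"))
# prints: 2024-11-02-exampleteam-examplemodel.csv
```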

The `team` and `model` in this file must match the `team` and `model` in
the directory this file is in. Both `team` and `model` should be less
than 15 characters, alpha-numeric and underscores only, with no spaces
or hyphens.
or hyphens. Quantile and sample predictions must both be submitted in the same weekly csv or parquet submission file.
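
A team could check a file name against these rules before opening a pull request. The sketch below is not the Hub's own validation; `check_file_name` is a hypothetical helper, and it assumes the model directory is named `{team}-{model}`.

```python
import re
from datetime import date

# Assumed pattern: ISO date, then team and model made of letters, digits, and
# underscores (1 to 14 characters each, i.e. fewer than 15), then the extension.
FILE_NAME_RE = re.compile(
    r"(?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2})"
    r"-(?P<team>[A-Za-z0-9_]{1,14})"
    r"-(?P<model>[A-Za-z0-9_]{1,14})"
    r"\.(?P<ext>csv|parquet)"
)


def check_file_name(file_name: str, directory_name: str) -> list[str]:
    """Return a list of problems found; an empty list means the name passes."""
    match = FILE_NAME_RE.fullmatch(file_name)
    if match is None:
        return ["name does not follow {YYYY-MM-DD}-{team}-{model}.csv or .parquet"]
    try:
        ref = date.fromisoformat(match["date"])
    except ValueError:
        return ["reference date is not a valid calendar date"]
    problems = []
    if ref.weekday() != 5:  # Monday is 0, so Saturday is 5
        problems.append("reference date is not a Saturday")
    # Assumption: the directory is named {team}-{model}.
    if f"{match['team']}-{match['model']}" != directory_name:
        problems.append("team and model do not match the directory name")
    return problems


print(check_file_name("2024-11-02-exampleteam-examplemodel.csv", "exampleteam-examplemodel"))
```

Returning a list of problems instead of raising on the first one lets a submitter see every naming issue in a single pass.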

## Forecast file format

@@ -123,7 +132,7 @@ columns (in any order):

No additional columns are allowed.

The value in each row of the file is a quantile for a particular combination of location, date, and horizon.
The value in each row of the file is either a quantile or a sample for a particular combination of location, date, and horizon.

### `reference_date`

@@ -137,7 +146,7 @@ This is the date from which all forecasts should be considered. This date is the

Values in the `target` column must be a character (string) and be the following specific target:

- *`wk inc covid hosp`*
- `wk inc covid hosp`


### `horizon`
@@ -164,12 +173,12 @@ Values in the `location` column must be one of the "locations" in this [file](..

### `output_type`

Values in the `output_type` column are either
Values in the `output_type` column should be one of

- "quantile" or
- "samples".
- `quantile`
- `samples`

This value indicates whether that row corresponds to a quantile forecast or sample trajectories for weekly incident hospital admissions. Samples can be submitted either for individual modeling tasks, where each `horizon` and `location` is treated independently, or as a part of a compound modeling task that encodes predictive statistical dependency across forecast `horizon`s and/or `location`s.
This value indicates whether that row corresponds to a quantile forecast or to sample trajectories for weekly incident hospital admissions. Samples can either encode both temporal and spatial dependency across forecast `horizon`s and `location`s, or encode only temporal dependency across `horizon`s while treating each `location` independently.
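
One way to picture the two dependency structures is through how sample indexes in `output_type_id` are shared between rows. The sketch below is illustrative only: `sample_rows` is a hypothetical helper, the location codes are placeholders, and the numbering scheme (a separate block of ids per location when locations are independent) is an assumption, not a Hub requirement.

```python
from itertools import product


def sample_rows(locations, horizons, n_samples, joint_across_locations):
    """Return (location, horizon, output_type, output_type_id) tuples.

    Rows that share an output_type_id are read as one joint draw.
    """
    rows = []
    for loc_index, location in enumerate(locations):
        # Assumption: independent locations each get their own block of ids,
        # so no id is shared between two locations.
        offset = 0 if joint_across_locations else loc_index * n_samples
        for sample, horizon in product(range(n_samples), horizons):
            rows.append((location, horizon, "samples", offset + sample))
    return rows


# Temporal dependency only: ids are shared across horizons within a location.
for row in sample_rows(["loc_a", "loc_b"], [0, 1], 2, joint_across_locations=False):
    print(row)
```

With `joint_across_locations=True`, the same ids appear under every location, so one id identifies a trajectory across both `horizon`s and `location`s.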

### `output_type_id`
Values in the `output_type_id` column specify identifying information for the output type.
@@ -180,7 +189,7 @@ When the predictions are quantiles, values in the `output_type_id` column are a
```
0.###
```
This value indicates the quantile probability level for for the `value` in this row.
This value indicates the quantile probability level for the `value` in this row.

Teams must provide the following 23 quantiles:

@@ -217,11 +226,7 @@ Teams must provide the following 23 quantiles:

#### sample output

When the predictions are samples, values in the `output_type_id` column are indexes for the samples.
The `output_type_id` is used to indicate the dependence across multiple task id variables when samples
come from a joint predictive distribution. For example, samples from a joint predictive distribution
across `horizon`s for a given `location`, will share `output_type_id` for predictions for different
`horizon`s within a same `location`, as shown in the table below:
When the predictions are samples, values in the `output_type_id` column are indexes for the samples. The `output_type_id` is used to indicate the dependence across multiple task id variables when samples come from a joint predictive distribution. For example, samples from a joint predictive distribution across `horizon`s for a given `location` will share an `output_type_id` across predictions for different `horizon`s within the same `location`, as shown in the table below:

| origin_date|horizon| location | output_type| output_type_id | value |
|:---------- |:-----:|:-----:| :-------- | :------------ | :---- |
@@ -294,4 +299,4 @@ Every Thursday morning, we will generate a CovidHub ensemble hospital admissio
In order to ensure that forecasting is done in real-time, all forecasts are required to be submitted to this repository by 11 PM ET on Wednesdays each week. We do not accept late forecasts.

## Evaluation criteria
Forecasts will be evaluated using a variety of metrics, including the weighted interval score (WIS).
Forecasts will be evaluated using a variety of metrics, including the weighted interval score (WIS).