This repository has been archived by the owner on Dec 13, 2022. It is now read-only.

⚠️ IMPORTANT NOTICE ⚠️

THIS DATASET HAS BEEN REPLACED WITH A NEW DATASET: CovidTimelineCanada

⚠️ Please use the new dataset from now on. The dataset in this repository will no longer be updated as of May 4, 2022.

🚨 Vaccine-related datasets have also been added to the new repository (vaccine coverage, vaccine administration) or will be added in the near future (vaccine distribution).

❗ To ease the transition to the new dataset, case and death datasets using the old column names, date format, province/territory names and health region names are being offered for download as CSV files. These files should be more-or-less drop-in replacements for the old case and death datasets. However, we encourage users to switch to the new dataset format, as this legacy format will not be supported indefinitely. Download links for the CSV files:

Epidemiological Data from the COVID-19 Outbreak in Canada

The COVID-19 Canada Open Data Working Group collects daily time series data on COVID-19 cases, deaths, recoveries, testing and vaccinations at the health region and province levels. Data are collected from publicly available sources such as government datasets and news releases. Updates are made nightly at 22:00 ET. See data_notes.txt for notes regarding the latest data update. Our data collection is mostly automated; see Covid19CanadaETL for details.

Our data dashboard is available at the following URL: https://art-bd.shinyapps.io/covid19canada/.

Accessing the data

❗ Before using our datasets, please read the Datasets section below. ⚠️

Our datasets are available in three different formats:

  • CSV format from this GitHub repository (to download all the latest data, select the green "Code" button and click "Download ZIP")
  • JSON format from our API
  • Google Drive

Note that retired datasets (retired_datasets) are only available on GitHub.

Datasets

Usage notes and caveats

The dataset in this repository was launched in March 2020 and has been maintained ever since. As a legacy dataset, it preserves many oddities in the data introduced by changes to COVID-19 reporting over time (see details below). A new, definitive COVID-19 dataset for Canada is currently being developed as CovidTimelineCanada, a part of the What Happened? COVID-19 in Canada project. While the new CovidTimelineCanada dataset is not yet stable (and thus should not be relied upon), it fixes many of the aforementioned oddities present in the legacy dataset in this repository.

  • ℹ️ See data_notes.txt for notes regarding issues affecting the dataset.
  • ℹ️ Ontario case, mortality and recovered data are retrieved from individual public health units (exceptions are listed here) and differ from values reported in the Ontario Ministry of Health dataset. For most public health units, we limit cases to confirmed cases (excluding probable cases).
  • ⚠️ Impossible values, such as negative case or death counts
    • Our dataset preserves some "impossible" values such as negative daily case or death counts. This is because our dataset reports primarily the cumulative value reported each day by the public health authority. Since historical data are sometimes revised (e.g., cases reassigned to different regions, fixing data quality issues, etc.), this sometimes results in negative values reported for a particular date.
  • ⚠️ Testing numbers are unreliable
    • For continuity, we generally report the first testing number that was reported by the province. For some provinces this was the number of tests performed; for others it was the number of unique people tested. For the purposes of calculating percent positivity, the number of tests performed should generally be used. The Public Health Agency of Canada provides a province-level time series of the number of tests performed. We supply a compatible version of this dataset in the official_datasets directory as phac_n_tests_performed_timeseries_prov.csv. This dataset should be used over our dataset for inter-provincial comparisons.
    • Additionally, some provinces have stopped directly reporting their COVID-19 testing numbers.
  • ⚠️ Recovered/active case counts are unreliable
    • The definition of "recovered" has changed over time and differs between provinces. For example, Quebec changed its definition of recovered on July 17, 2020, which created a massive spike on that date. For this reason, these data should be interpreted with caution.
    • Recovered and active case numbers for Ontario (and thus Canada) are incorrectly estimated prior to 2021-09-07 and should not be considered reliable.
    • Recovered and active case numbers for British Columbia are no longer available as of 2021-02-10. Values for this province (and thus Canada) should be discarded after this date. Several other provinces have also stopped reporting these values, including Saskatchewan, Nova Scotia and Newfoundland & Labrador.
  • ⚠️ Vaccine dose numbers are unreliable
    • Many provinces have stopped reporting vaccine dose data like they did previously. The most reliable vaccine numbers are available weekly from the PHAC vaccine coverage map.
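As a rough illustration of how the negative daily values described above arise, the sketch below differences a series of cumulative totals. The numbers are made up and the function name `daily_from_cumulative` is hypothetical; it is not taken from this repository's scripts.

```python
def daily_from_cumulative(cumulative):
    """Difference consecutive cumulative totals to obtain daily counts.

    The first value is kept as-is because there is no earlier total.
    """
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


# Made-up cumulative case totals. On the fourth day the reported total is
# revised downward (e.g., cases reassigned to another region).
cumulative_cases = [100, 112, 130, 125, 140]
daily_cases = daily_from_cumulative(cumulative_cases)
print(daily_cases)  # [100, 12, 18, -5, 15]
```

The negative value on the fourth day is not an error in the differencing: the daily values still sum to the latest cumulative total, which is why such values are preserved rather than clipped to zero.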

The update date and time for our dataset is given in update_time.txt.

The following time series data are available at the health region level (as well as at the level of province and Canada-wide):

  • cases (confirmed and probable COVID-19 cases)
  • mortality (confirmed and probable COVID-19 deaths)

The following time series data are available at the province level (as well as Canada-wide):

  • recovered (COVID-19 cases considered resolved that did not end in death)
  • testing (definitions vary, see our technical report)
  • active cases (we use the formula active cases = confirmed cases - recovered - deaths, which explains the discrepancies between our active case numbers and those reported by official sources)
  • vaccine distribution (total doses distributed)
  • vaccine administration (total doses administered)
  • vaccine completion (second doses administered)
  • vaccine additional doses (third doses administered)

Note that definitions for each of these values differ between provinces. See our technical report for more details.
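The active case formula listed above can be sketched as follows. This is a minimal example with made-up numbers; the function name `active_cases` and the sample values are assumptions for illustration and do not reflect the column names used in the CSV files.

```python
def active_cases(cumulative_cases, cumulative_recovered, cumulative_deaths):
    """Apply: active cases = confirmed cases - recovered - deaths."""
    return cumulative_cases - cumulative_recovered - cumulative_deaths


# Made-up cumulative totals for three consecutive days.
cases = [1000, 1050, 1100]
recovered = [850, 880, 990]
deaths = [30, 31, 31]

active = [active_cases(c, r, d) for c, r, d in zip(cases, recovered, deaths)]
print(active)  # [120, 139, 79]
```

Because the result depends entirely on the recovered series, any change in a province's definition of "recovered" flows directly into the active case numbers.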

Several other important files are also available in the other folder:

  • Correspondence between health region names used in our dataset and HRUID values given in Esri Canada's health region map, with 2019 population values: other/hr_map.csv
  • Correspondence between province names used in our dataset and full province names and two-letter abbreviations, with 2019 population values: other/prov_map.csv
  • Correspondence between province names used in our dataset and full province names and two-letter abbreviations, with 2019 population values and new Saskatchewan health regions: other/prov_map_sk_new.csv
    • The new Saskatchewan health regions (13 health regions versus 6 in the original data) use unofficial estimates of 2020 population values provided by Statistics Canada and may differ from official data released by Statistics Canada at a later date
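One possible use of the population values in these mapping files is computing per-capita rates. The sketch below assumes a simplified, hypothetical file layout: the column names `province` and `pop`, the province labels and all numbers are placeholders, so check the actual headers in other/prov_map.csv before adapting it.

```python
import csv
import io

# Hypothetical stand-in for a province map file; real column names may differ.
PROV_MAP_CSV = """province,pop
A,1000000
B,250000
"""


def load_population(csv_text):
    """Read a province-to-population lookup from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["province"]: int(row["pop"]) for row in reader}


def rate_per_100k(count, population):
    """Express a count as a rate per 100,000 population."""
    return count * 100_000 / population


populations = load_population(PROV_MAP_CSV)
cumulative_cases = {"A": 5000, "B": 500}  # made-up counts
rates = {
    prov: rate_per_100k(count, populations[prov])
    for prov, count in cumulative_cases.items()
}
print(rates)  # {'A': 500.0, 'B': 200.0}
```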

We also provide case and mortality datasets in the hr_sk_new folder that combine our dataset with the official SK provincial dataset using the new 13 reporting zones (our main dataset continues to use the old 6 reporting zones). Data for SK are only available from August 4, 2020 onward in these datasets.

Our individual-level case and mortality datasets are retired as of June 1, 2021 (see retired_datasets).

Recommended citation

Below is the current citation for the dataset:

  • Berry, I., O’Neill, M., Sturrock, S. L., Wright, J. E., Acharya, K., Brankston, G., Harish, V., Kornas, K., Maani, N., Naganathan, T., Obress, L., Rossi, T., Simmons, A. E., Van Camp, M., Xie, X., Tuite, A. R., Greer, A. L., Fisman, D. N., & Soucy, J.-P. R. (2021). A sub-national real-time epidemiological and vaccination database for the COVID-19 pandemic in Canada. Scientific Data, 8(1). doi: https://doi.org/10.1038/s41597-021-00955-2

Below is the previous citation for the dataset:

  • Berry, I., Soucy, J.-P. R., Tuite, A., & Fisman, D. (2020). Open access epidemiologic data and an interactive dashboard to monitor the COVID-19 outbreak in Canada. Canadian Medical Association Journal, 192(15), E420. doi: https://doi.org/10.1503/cmaj.75262

Methodology & data notes

Detailed information about our data collection methodology and sources, answers to frequently asked data questions and the technical report for our dataset are available on our website. Note that some of this information is out-of-date and will eventually be updated. Information on automated data collection is available in the Covid19CanadaETL GitHub repository.

The scripts used to prepare, update and validate the datasets in this repository are available in the scripts folder.

Acknowledgements

We would like to thank all individuals and organizations across Canada who have worked tirelessly to provide data to the public during this pandemic.

Additionally, we thank the following organizations/individuals for their support:

Public Health Agency of Canada / Joe Murray (JMA Consulting)

Contact us

You can learn more about the COVID-19 Canada Open Data Working Group at our website and reach out to us via our contact page.