The purpose of this project is to crawl various government websites and collect COVID-19 testing data. This will hopefully aid in quantifying and comparing responses between countries.
Currently, spiders are available for the following countries/states. This is not necessarily the same as the above table, because some data is collected manually.
Country | Region | Source URL | Notes | Additional URL | Scrape Method |
---|---|---|---|---|---|
Bahrain | Asia | Bahrain Ministry of Health | Table is in an easily accessible format, but translation isn't easy. | | Scrapy |
Japan | Asia | covid19japan GitHub | Collects data from a number of government sources which I can't read/parse. | | GitHub |
Malaysia | Asia | Malaysia Ministry of Health | | | Scrapy |
Pakistan | Asia | Pakistan National Institute of Health | A PDF including testing data from the provinces is listed here. | | Manual |
Palestine | Asia | Corona Virus (COVID-19) in Palestine | Provides an API for grabbing the current data. Unclear where to get historic data. | | Scrapy |
South Korea | Asia | Coronavirus-Dataset GitHub | Has data for tests performed and positive cases up to March 20th. Unclear if it is still being updated. | | GitHub |
Vietnam | Asia | Vietnam Ministry of Health | Easily parsable table. Also provides testing information at the state-wide level, which isn't utilized at the moment. | | Scrapy |
Costa Rica | Central America | Costa Rica Ministry of Health | | | Manual |
Austria | Europe | Austria Ministry of Public Affairs | Lists number of tests performed and positive cases for the entire country as well as all federal states. | | Scrapy |
Czech Republic (Czechia) | Europe | Czech Republic Ministry of Health | Current cases are at the link, which can be scraped. Past data pulled from Wikipedia. | 2020 coronavirus pandemic in the Czech Republic | Scrapy |
Estonia | Europe | Estonia Government | Tests performed and positive tests provided, but historic data and deaths grabbed manually from interactive application. | CoronaCard | Scrapy |
Finland | Europe | Finland Public Health Institute | Current data is presented on this webpage; historical data is probably available in daily press releases. | | Scrapy |
Greece | Europe | 2020 coronavirus pandemic in Greece | Information is released by the Greek government in daily PDFs. Will take values from Wikipedia. | | Wikipedia |
Hungary | Europe | Hungary Government | Lists total cases and positive cases. Past cases are retrieved through the Wayback Machine. | | Scrapy |
Iceland | Europe | Iceland Government | Positive cases can be parsed, but total tested is only available in the interactive graphs. A data download option is provided, though. | | Manual |
Italy | Europe | COVID-19 GitHub | Presidenza del Consiglio dei Ministri is publishing all data in a GitHub repository. | | GitHub |
Latvia | Europe | Latvia Center for Disease Prevention and Control | Official Twitter account uploads daily test results. Haven't found a source for deaths. | | Scrapy |
Lithuania | Europe | Lithuania Ministry of Health | Can parse directly from daily news releases. Historical values were collected from an interactive map. | | Scrapy |
Poland | Europe | @micalrg's Google Doc | Polish government is tweeting out daily data, which is being recorded by @micalrg. | | Manual |
Portugal | Europe | Portugal Ministry of Health | Releases number of tests performed and positive tests in an interactive table. Can't parse with Scrapy, but will pull manually. | | Manual |
Romania | Europe | Romania Ministry of Health | Data taken from daily afternoon press briefings. Has to be translated, so there might be errors. | | Manual |
United Kingdom | Europe | UK Government | Cumulative test counts are released daily. Data for Northern Ireland and Scotland are also being recorded by @Tomwhite on GitHub. | covid-19-uk-data GitHub | Scrapy/GitHub |
Alberta, Canada | North America | Alberta Provincial Government | Collated test data can be found on the website provided. Unable to parse, but can be added manually. | | Manual |
British Columbia, Canada | North America | British Columbia Center for Disease Control | | | Scrapy |
Manitoba, Canada | North America | Manitoba Government | | | Scrapy |
Canada National Lab | North America | Canada Government | Total number of cases doesn't match negative + positive, so the difference is recorded as pending. | | Scrapy |
New Brunswick, Canada | North America | New Brunswick Provincial Government | | | Scrapy |
NL, Canada | North America | Newfoundland and Labrador Government | | | Scrapy |
Nova Scotia, Canada | North America | Nova Scotia Provincial Government | | | Scrapy |
NWT, Canada | North America | Northwest Territories Health and Social Services | | | Scrapy |
Ontario, Canada | North America | Ontario Provincial Government | | | Scrapy |
Quebec, Canada | North America | Quebec Ministry of Health and Social Services | | | Scrapy |
Saskatchewan, Canada | North America | Saskatchewan Government | | | Scrapy |
Yukon, Canada | North America | Yukon Government | | | Scrapy |
USA | North America | Covid Tracking Project | Official sources aren't very good. Will pull from The COVID Tracking Project. Spiders are available for a number of states as a backup. | | GitHub |
Australian Capital Territory | Oceania | Australia Capital Health Department | | | Scrapy |
New South Wales, Australia | Oceania | NSW Health Department | Press briefings are available at the link and are individually grabbed and parsed. | | Scrapy |
Philippines | Asia | Philippines Department of Health | Can find negative and positive test results, but not deaths. Need an additional source besides interactive maps. | | Scrapy |
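
The Canada National Lab note records the gap between the reported total and the sum of negative and positive results as pending. A minimal sketch of that arithmetic follows; the function name and the numbers in the usage note are hypothetical and not taken from the project's spiders.

```python
def pending_tests(total, positive, negative):
    """Return the number of tests not yet reported as positive or negative.

    Hypothetical helper: the real spider may compute this differently.
    """
    pending = total - positive - negative
    if pending < 0:
        # A negative gap means the source figures are inconsistent.
        raise ValueError("positive + negative exceeds the reported total")
    return pending
```

For example, `pending_tests(100, 10, 85)` leaves 5 tests counted as pending.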
Refer to the diagram below to see which dates are available for which countries. Black indicates unavailable and white indicates available. Updated data is typically added at 9 PM PST.
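
As a rough illustration of how such an availability diagram could be derived, the sketch below builds a country-by-date grid from a set of collected dates. The function names, the input shape, and the text rendering are assumptions made for this example, not the project's actual plotting code.

```python
from datetime import date, timedelta


def availability_grid(available, start, end):
    """Map each country to one boolean per day from start to end (inclusive).

    `available` is assumed to be a dict of country name -> set of dates
    for which data was collected.
    """
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    grid = {
        country: [day in dates for day in days]
        for country, dates in available.items()
    }
    return days, grid


def render(grid):
    """Render each row as text: '#' for unavailable (black), '.' for available (white)."""
    return {
        country: "".join("." if ok else "#" for ok in row)
        for country, row in grid.items()
    }
```

The same boolean grid could be passed to a plotting library to produce a black-and-white image instead of text.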
- Modify the pipeline (covid19/pipeline.py) to specify where to save data.
- Run the pipeline with the covid19/update_all.py script. Countries specified for manual scraping in the table must be updated separately.
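
The contents of covid19/pipeline.py are not shown here, so the following is only a sketch of what a Scrapy item pipeline that controls the save location might look like. The class name `CsvExportPipeline`, the `OUTPUT_DIR` constant, and the one-CSV-per-spider layout are all assumptions for illustration; the `open_spider`, `process_item`, and `close_spider` methods are the standard hooks Scrapy calls on an item pipeline.

```python
import csv
from pathlib import Path

# Hypothetical setting: change this to choose where data is saved.
OUTPUT_DIR = Path("data")


class CsvExportPipeline:
    """Hypothetical pipeline that appends each scraped item to <spider name>.csv."""

    def __init__(self, output_dir=OUTPUT_DIR):
        self.output_dir = Path(output_dir)
        self._file = None
        self._writer = None
        self._needs_header = False

    def open_spider(self, spider):
        # Called once when the spider starts: create the folder and open the file.
        self.output_dir.mkdir(parents=True, exist_ok=True)
        path = self.output_dir / f"{spider.name}.csv"
        self._needs_header = not path.exists() or path.stat().st_size == 0
        self._file = path.open("a", newline="", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        row = dict(item)
        if self._writer is None:
            self._writer = csv.DictWriter(self._file, fieldnames=list(row))
            if self._needs_header:
                self._writer.writeheader()
        self._writer.writerow(row)
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        if self._file is not None:
            self._file.close()
```

A pipeline like this would be enabled through Scrapy's `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"covid19.pipeline.CsvExportPipeline": 300}`, where the module path is again an assumption about the project layout.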