NOTICE: This repo is automatically generated by apd-core. Please DO NOT modify this file directly. We have provided a new way to contribute to Awesome Public Datasets. Direct pull requests against this repo are permanently closed.
This is a list of topic-centric, high-quality public data sources. They are collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free; however, some are not. Other amazingly awesome lists can be found in sindresorhus's awesome list.
Table of Contents
- Agriculture
- Biology
- Climate+Weather
- ComplexNetworks
- ComputerNetworks
- DataChallenges
- EarthScience
- Economics
- Education
- Energy
- Finance
- GIS
- Government
- Healthcare
- ImageProcessing
- MachineLearning
- Museums
- NaturalLanguage
- Neuroscience
- Physics
- Psychology+Cognition
- PublicDomains
- SearchEngines
- SocialNetworks
- SocialSciences
- Software
- Sports
- TimeSeries
- Transportation
- Complementary Collections
1000 Genomes
American Gut (Microbiome Project)
Broad Bioimage Benchmark Collection (BBBC)
Broad Cancer Cell Line Encyclopedia (CCLE)
Cell Image Library
Complete Genomics Public Data
EBI ArrayExpress
EBI Protein Data Bank in Europe
ENCODE project
Electron Microscopy Pilot Image Archive (EMPIAR)
Ensembl Genomes
Gene Expression Omnibus (GEO)
Gene Ontology (GO)
Global Biotic Interactions (GloBI)
Harvard Medical School (HMS) LINCS Project
Human Genome Diversity Project
Human Microbiome Project (HMP)
ICOS PSP Benchmark
International HapMap Project
Journal of Cell Biology DataViewer
KEGG - KEGG is a database resource for understanding high-level functions [...]
MIT Cancer Genomics Data
NCBI Proteins
NCBI Taxonomy
NCI Genomic Data Commons
NIH Microarray data [fixme]
OpenSNP genotypes data
Pathguide - Protein-Protein Interactions Catalog
Protein Data Bank
Psychiatric Genomics Consortium
PubChem Project
PubGene (now Coremine Medical)
Sanger Catalogue of Somatic Mutations in Cancer (COSMIC)
Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC)
Sequence Read Archive (SRA)
Stanford Microarray Data [fixme]
Stowers Institute Original Data Repository
Systems Science of Biological Dynamics (SSBD) Database
The Cancer Genome Atlas (TCGA), available via Broad GDAC
The Catalogue of Life
The Personal Genome Project
UCSC Public Data
UniGene
Universal Protein Resource (UniProt)
Actuaries Climate Index
Australian Weather
Aviation Weather Center - Consistent, timely and accurate weather [...]
Brazilian Weather - Historical data (In Portuguese)
Canadian Meteorological Centre
Climate Data from UEA (updated monthly)
European Climate Assessment & Dataset [fixme]
Global Climate Data Since 1929
NASA Global Imagery Browse Services
NOAA Bering Sea Climate
NOAA Climate Datasets
NOAA Realtime Weather Models
NOAA SURFRAD Meteorology and Radiation Datasets
The World Bank Open Data Resources for Climate Change
UEA Climatic Research Unit
WU Historical Weather Worldwide
WorldClim - Global Climate Data
AMiner Citation Network Dataset
CrossRef DOI URLs
DBLP Citation dataset [fixme]
DIMACS Road Networks Collection
NBER Patent Citations
NIST complex networks data collection
Network Repository with Interactive Exploratory Analysis Tools
Protein-protein interaction network
PyPI and Maven Dependency Network
Scopus Citation Database
Small Network Data
Stanford GraphBase
Stanford Large Network Dataset Collection
Stanford Longitudinal Network Data Sources
The Koblenz Network Collection
The Laboratory for Web Algorithmics (UNIMI)
The Nexus Network Repository [fixme]
UCI Network Data Repository
UFL sparse matrix collection
WSU Graph Database
3.5B Web Pages from CommonCrawl 2012
53.5B Web clicks of 100K users in Indiana Univ.
CAIDA Internet Datasets
CRAWDAD Wireless datasets from Dartmouth Univ.
ClueWeb09 - 1B web pages
ClueWeb12 - 733M web pages
CommonCrawl Web Data over 7 years
Criteo click-through data
Internet-Wide Scan Data Repository
OONI: Open Observatory of Network Interference - Internet censorship data
Open Mobile Data by MobiPerf
Rapid7 Sonar Internet Scans
UCSD Network Telescope, IPv4 /8 net
Bruteforce Database
Challenges in Machine Learning
CrowdANALYTIX dataX
D4D Challenge of Orange [fixme]
DrivenData Competitions for Social Good
ICWSM Data Challenge (since 2009) [fixme]
KDD Cup by Tencent 2012
Kaggle Competition Data
Localytics Data Visualization Challenge
Netflix Prize
Space Apps Challenge
Telecom Italia Big Data Challenge
TravisTorrent Dataset - MSR'2017 Mining Challenge
TunedIT - Data mining & machine learning data sets, algorithms, challenges
Yelp Dataset Challenge
AQUASTAT - Global water resources and uses
BODC - marine data of ~22K vars
EOSDIS - NASA's earth observing system data
Earth Models
Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements
Marinexplore - Open Oceanographic Data
Smithsonian Institution Global Volcano and Eruption Database
USGS Earthquake Archives
American Economic Association (AEA)
EconData from UMD
Economic Freedom of the World Data [fixme]
Historical MacroEconomic Statistics
INFORUM - Interindustry Forecasting at the University of Maryland
International Economics Database
International Trade Statistics
Internet Product Code Database
Joint External Debt Data Hub
Jon Haveman International Trade Data Links
OpenCorporates Database of Companies in the World
Our World in Data
SciencesPo World Trade Gravity Datasets
The Atlas of Economic Complexity
The Center for International Data
The Observatory of Economic Complexity
UN Commodity Trade Statistics
UN Human Development Reports
AMPds
BLUEd
COMBED
DRED
ECO
EIA
Global Power Plant Database - The Global Power Plant Database is a [...]
HES - Household Electricity Study, UK
HFED
PLAID - The Plug Load Appliance Identification Dataset
REDD
Tracebase
UK-DALE - UK Domestic Appliance-Level Electricity
WHITED
iAWE
CBOE Futures Exchange [fixme]
Google Finance
Google Trends
NASDAQ
NYSE Market Data
OANDA
OSU Financial data
Quandl
St Louis Federal
Yahoo Finance
ArcGIS Open Data portal
Cambridge, MA, US, GIS data on GitHub
Factual Global Location Data [fixme]
Geo Maps - High Quality GeoJSON maps programmatically generated
Geo Spatial Data from ASU
Geo Wiki Project - Citizen-driven Environmental Monitoring
GeoFabrik - OSM data extracted to a variety of formats and areas
GeoNames Worldwide
Global Administrative Areas Database (GADM) [fixme]
Homeland Infrastructure Foundation-Level Data
Landsat 8 on AWS
List of all countries in all languages
National Weather Service GIS Data Portal
Natural Earth - vectors and rasters of the world
OpenAddresses
OpenStreetMap (OSM)
Pleiades - Gazetteer and graph of ancient places
Reverse Geocoder using OSM data
TIGER/Line - U.S. boundaries and roads [fixme]
TZ Timezones shapefiles
TwoFishes - Foursquare's coarse geocoder
UN Environmental Data
World boundaries from the U.S. Department of State [fixme]
World countries in multiple formats
Alberta, Province of Canada
Antwerp, Belgium
Argentina (unofficial)
Datos Argentina - Open data portal of the Argentine Republic. [...]
Austin, TX, US
Australia (abs.gov.au)
Australia (data.gov.au)
Austria (data.gv.at)
Baton Rouge, LA, US
Belgium
Brazil
Buenos Aires, Argentina
Calgary, AB, Canada [fixme]
Cambridge, MA, US
Canada
Chicago
Chile
Dallas Open Data
DataBC - data from the Province of British Columbia
Denver Open Data
Durham, NC Open Data
Edmonton, AB, Canada
England LGInform
EuroStat
EveryPolitician - Ongoing project collating and sharing data on every [...]
FedStats
Finland
France
Fredericton, NB, Canada
Gatineau, QC, Canada
Germany
Ghent, Belgium
Glasgow, Scotland, UK
Greece
Guardian world governments
Halifax, NS, Canada [fixme]
Helsinki Region, Finland
Hong Kong, China
Houston Open Data [fixme]
Indian Government Data
Indonesian Data Portal
Ireland's Open Data Portal
Italy - The dati.gov.it portal is the national catalog of metadata [...]
Japan
Laval, QC, Canada
Lexington, KY
London Datastore, UK
London, ON, Canada
Los Angeles Open Data
Luxembourg - Luxembourgish Open Data Portal
MassGIS, Massachusetts, U.S.
Metropolitan Transportation Commission (MTC), California, US
Mexico
Mississauga, ON, Canada
Moldova
Moncton, NB, Canada
Montreal, QC, Canada
Mountain View, California, US (GIS)
NYC Open Data [fixme]
NYC betanyc
Netherlands
New Zealand
OECD
Oakland, California, US
Oklahoma
Open Data for Africa
Open Government Data (OGD) Platform India
OpenDataSoft's list of 1,600 open data
Oregon
Ottawa, ON, Canada
Palo Alto, California, US
OpenDataPhilly - OpenDataPhilly is a catalog of open data in the [...]
Portland, Oregon
Portugal - Pordata organization
Puerto Rico Government
Quebec City, QC, Canada
Quebec Province of Canada [fixme]
Regina, SK, Canada
Rio de Janeiro, Brazil [fixme]
Romania
Russia
San Antonio, TX - Community Information Now - CI:Now is a nonprofit [...]
San Francisco Data sets
San Jose, California, US
San Mateo County, California, US
Saskatchewan, Province of Canada
Seattle
Singapore Government Data
South Africa Trade Statistics
South Africa
State of Utah, US
Switzerland
Taiwan g0v
Taiwan
Tel-Aviv Open Data
Texas Open Data
The World Bank
Toronto, ON, Canada
Tunisia
U.K. Government Data
U.S. American Community Survey
U.S. CDC Public Health datasets
U.S. Census Bureau
U.S. Department of Housing and Urban Development (HUD)
U.S. Federal Government Agencies
U.S. Federal Government Data Catalog
U.S. Food and Drug Administration (FDA)
U.S. National Center for Education Statistics (NCES)
U.S. Open Government
UK 2011 Census Open Atlas Project [fixme]
U.S. Patent and Trademark Office (USPTO) Bulk Data Products
Uganda Bureau of Statistics
United Nations
Uruguay
Valley Transportation Authority (VTA), California, US
Vancouver, BC Open Data Catalog
Victoria, BC, Canada [fixme]
Vienna, Austria
Composition of Foods Raw, Processed, Prepared USDA National Nutrient Database for Standard [...]
EHDP Large Health Data Sets
GDC - GDC supports several cancer genome programs for CCG, TCGA, TARGET etc.
Gapminder World demographic databases
MeSH, the vocabulary thesaurus used for indexing articles for PubMed
Medicare Coverage Database (MCD), U.S.
Medicare Data Engine of medicare.gov Data
Medicare Data File
Number of Ebola Cases and Deaths in Affected Countries (2014) [fixme]
Open-ODS (structure of the UK NHS)
OpenPaymentsData, Healthcare financial relationship data
PhysioBank Databases - A large and growing archive of physiological data.
The Cancer Imaging Archive (TCIA)
The Cancer Genome Atlas project (TCGA)
World Health Organization Global Health Observatory
10k US Adult Faces Database
2GB of Photos of Cats [fixme]
Adience Unfiltered faces for gender and age classification
Affective Image Classification
Animals with attributes
Caltech Pedestrian Detection Benchmark
Chars74K dataset - Character Recognition in Natural Images (both English [...]
Face Recognition Benchmark
Flickr: 32 Class Brand Logos
GDXray - X-ray images for X-ray testing and Computer Vision
ImageNet (in WordNet hierarchy)
Indoor Scene Recognition
International Affective Picture System, UFL
MNIST database of handwritten digits, 70,000 examples
Massive Visual Memory Stimuli, MIT
SUN database, MIT
Several Shape-from-Silhouette Datasets [fixme]
Stanford Dogs Dataset
The Action Similarity Labeling (ASLAN) Challenge
The Oxford-IIIT Pet Dataset
Violent-Flows - Crowd Violence / Non-violence Database and benchmark
Visual genome
YouTube Faces Database
Context-aware data sets from five domains
Delve Datasets for classification and regression
Discogs Monthly Data
Free Music Archive
IMDb Database
Keel Repository for classification, regression and time series
Labeled Faces in the Wild (LFW)
Lending Club Loan Data
Machine Learning Data Set Repository
Million Song Dataset
More Song Datasets
MovieLens Data Sets
New Yorker caption contest ratings
RDataMining - "R and Data Mining" ebook data
Registered Meteorites on Earth
Restaurants Health Score Data in San Francisco [fixme]
UCI Machine Learning Repository
Yahoo! Ratings and Classification Data
YouTube-BoundingBoxes
YouTube-8M
eBay Online Auctions (2012)
Canada Science and Technology Museums Corporation's Open Data
Cooper-Hewitt's Collection Database
Minneapolis Institute of Arts metadata
Natural History Museum (London) Data Portal
Rijksmuseum Historical Art Collection
Tate Collection metadata
The Getty vocabularies
Automatic Keyphrase Extraction
Blogger Corpus
CLiPS Stylometry Investigation Corpus
ClueWeb09 FACC
ClueWeb12 FACC
DBpedia - 4.58M things with 583M facts
Flickr Personal Taxonomies
Freebase of people, places, and things
Google Books Ngrams (2.2TB)
Google MC-AFP - Generated based on the public available Gigaword dataset [...]
Google Web 5gram (1TB, 2006)
Gutenberg eBooks List
Hansards text chunks of Canadian Parliament
Microsoft MAchine Reading COmprehension Dataset (or MS MARCO)
Machine Comprehension Test (MCTest) of text from Microsoft Research
Machine Translation of European languages
Making Sense of Microposts 2013 - Concept Extraction [fixme]
Making Sense of Microposts 2016 - Named Entity rEcognition and Linking
Multi-Domain Sentiment Dataset (version 2.0)
Open Multilingual Wordnet
POS/NER/Chunk annotated data
Personae Corpus
SMS Spam Collection in English
SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles)
Stanford Question Answering Dataset (SQuAD)
USENET postings corpus of 2005~2011
Universal Dependencies
Webhose - News/Blogs in multiple languages
Wikidata - Wikipedia databases
Wikipedia Links data - 40 Million Entities in Context
WordNet databases and tools [fixme]
Allen Institute Datasets
Brain Catalogue
Brainomics
CodeNeuro Datasets [fixme]
Collaborative Research in Computational Neuroscience (CRCNS)
FCP-INDI
Human Connectome Project
NDAR
NIMH Data Archive
NeuroData
Neuroelectro
OASIS
OpenfMRI
Study Forrest
CERN Open Data Portal
Crystallography Open Database
IceCube - South Pole Neutrino Observatory
NASA Exoplanet Archive
NSSDC (NASA) data of 550 spacecraft
Sloan Digital Sky Survey (SDSS) - Mapping the Universe
Amazon
Archive.org Datasets
Archive-it from Internet Archive
CMU JASA data archive
CMU StatLab collections
Data.World
Data360
Enigma Public
Google
Infochimps [fixme]
KDNuggets Data Collections
Microsoft Azure Data Market Free DataSets [fixme]
Microsoft Data Science for Research
Numbrary [fixme]
Open Library Data Dumps
Reddit Datasets
RevolutionAnalytics Collection
Sample R data sets
StatSci.org
Stats4Stem R data sets [fixme]
The Washington Post List
UCLA SOCR data collection
UFO Reports
Wikileaks 911 pager intercepts
Yahoo Webscope
Academic Torrents of data sharing from UMB
DataMarket (Qlik)
Datahub.io
Harvard Dataverse Network of scientific data
ICPSR (UMICH)
Institute of Education Sciences
National Technical Reports Library [fixme]
Open Data Certificates (beta)
OpenDataNetwork - A search engine of all Socrata powered data portals
Statista.com - Statistics and studies
Zenodo - An open dependable home for the long-tail of science
72 hours #gamergate Twitter Scrape
Ancestry.com Forum Dataset over 10 years
CMU Enron Email of 150 users
Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape
EDRM Enron EMail of 151 users, hosted on S3
Facebook Data Scrape (2005)
Facebook Social Networks from LAW (since 2007)
Foursquare from UMN/Sarwat (2013)
GitHub Collaboration Archive
Google Scholar citation relations
High-Resolution Contact Networks from Wearable Sensors
Indie Map: social graph and crawl of top IndieWeb sites
Mobile Social Networks from UMASS [fixme]
Network Twitter Data
Reddit Comments
Skytrax' Air Travel Reviews Dataset
Social Twitter Data
SourceForge.net Research Data
Twitter Data for Online Reputation Management
Twitter Data for Sentiment Analysis
Twitter Graph of entire Twitter site
Twitter Scrape Calufa May 2011 [fixme]
UNIMI/LAW Social Network Datasets
Yahoo! Graph and Social Data
YouTube Video Social Graph in 2007, 2008
ACLED (Armed Conflict Location & Event Data Project)
Canadian Legal Information Institute [fixme]
Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc
Correlates of War Project
Cryptome Conspiracy Theory Items
Datacards [fixme]
European Social Survey
FBI Hate Crime 2013 - aggregated data
Fragile States Index [fixme]
GDELT Global Events Database
General Social Survey (GSS) since 1972
German Social Survey
Global Religious Futures Project
Gun Violence Data - A comprehensive, accessible database that contains [...]
Humanitarian Data Exchange [fixme]
INFORM Index for Risk Management
Institute for Demographic Studies
International Networks Archive
International Social Survey Program ISSP
International Studies Compendium Project
James McGuire Cross National Data
MIT Reality Mining Dataset
MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste
Minnesota Population Center
Notre Dame Global Adaptation Index (ND-GAIN)
Open Crime and Policing Data in England, Wales and Northern Ireland
OpenSanctions - A global database of persons and companies of political, [...]
Paul Hensel General International Data Page
PewResearch Internet Survey Project [fixme]
PewResearch Society Data Collection
Political Polarity Data
StackExchange Data Explorer
Terrorism Research and Analysis Consortium
Texas Inmates Executed Since 1984
Titanic Survival Data Set
UCB's Archive of Social Science Data (D-Lab)
UCLA Social Sciences Data Archive [fixme]
UN Civil Society Database
UPJOHN for Labor Employment Research
Universities Worldwide
Uppsala Conflict Data Program
World Bank Open Data
WorldPop project - Worldwide human population distributions
FLOSSmole data about free, libre, and open source software development
Libraries.io Open Source Repository and Dependency Metadata
Betfair Historical Exchange Data
Cricsheet Matches (cricket)
Ergast Formula 1, from 1950 up to date (API)
Football/Soccer resources (data and APIs)
Lahman's Baseball Database [fixme]
Pinhooker: Thoroughbred Bloodstock Sale Data
Retrosheet Baseball Statistics
Tennis database of rankings, results, and stats for ATP
Tennis database of rankings, results, and stats for WTA
Databanks International Cross National Time Series Data Archive
Hard Drive Failure Rates
Heart Rate Time Series from MIT
Time Series Data Library (TSDL) from MU
UC Riverside Time Series Dataset
Airlines OD Data 1987-2008
Bay Area Bike Share Data
Bike Share Systems (BSS) collection
GeoLife GPS Trajectory from Microsoft Research
German train system by Deutsche Bahn
Hubway Million Rides in MA
Montreal BIXI Bike Share
NYC Taxi Trip Data 2009-
NYC Taxi Trip Data 2013 (FOIA/FOILed)
NYC Uber trip data April 2014 to September 2014
Open Traffic collection
OpenFlights - airport, airline and route data
Philadelphia Bike Share Stations (JSON) [fixme]
Plane Crash Database, since 1920
RITA Airline On-Time Performance data
RITA/BTS transport data collection (TranStat)
Toronto Bike Share Stations (XML file) [fixme]
Transport for London (TFL)
Travel Tracker Survey (TTS) for Chicago
U.S. Bureau of Transportation Statistics (BTS)
U.S. Domestic Flights 1990 to 2009
U.S. Freight Analysis Framework since 2007
- Data Packaged Core Datasets
- Database of Scientific Code Contributions
- A growing collection of public datasets: CoolDatasets.
- DataWrangling: Some Datasets Available on the Web
- Inside-r: Finding Data on the Internet
- OpenDataMonitor: An overview of available open data resources in Europe
- Quora: Where can I find large datasets open to the public?
- RS.io: 100+ Interesting Data Sets for Statistics
- StaTrek: Leveraging open data to understand urban lives