# Finding Data
## Introduction
Now that we know what data is and which questions we’re interested in, we’re ready to go out and hunt for it online.
In this tutorial, you will learn where to start looking for data. Later in the course, we will look at different ways of getting hold of data, before setting you loose to find data yourselves!
## Data Sources
There are three basic ways of getting hold of data:
1. **Finding data** – this involves searching for data that has already been released.
2. **Getting hold of more data** – asking for ‘new’ data from official sources, e.g. through Freedom of Information requests. Sometimes data is public on a website but there is no download link to get hold of it in bulk. Don’t give up! This data can be liberated with what data wranglers call [_scraping_](http://schoolofdata.org/handbook/courses/appendix/glossary/#term-scraping).
3. **Collecting data yourself** – This means gathering data and entering it into a database or a spreadsheet – whether you work alone or collaboratively.
In this tutorial we’ll focus on finding data that has already been released. We will deal with getting more data and collecting data yourself in future courses.
### Step 1: Identify your Data Source
Many sources frequently release data for public use. Some examples:
1. **Government** In recent years governments have begun to release some of their data to the public. Many governments host dedicated open government data platforms for the data they create. For example, the UK government started [data.gov.uk](http://data.gov.uk) to release its datasets. Similar data portals exist in the [US](http://www.data.gov), [Brazil](http://dados.gov.br/) and [Kenya](https://opendata.go.ke/), and in many other countries! Does your country have an open data portal? [Datacatalogs.org](http://datacatalogs.org) is a good starting point for finding out.
2. **Organisations** Other sources of data are large organisations. The [World Bank](http://data.WorldBank.org) and the [World Health Organization](http://www.who.int/research/en/) for example regularly release reports and data sets.
3. **Science** Scientific projects and institutions release data to the scientific community and the general public. [NASA](http://data.nasa.gov/), for example, produces open data, and many disciplines have their own data repositories, some of which are open. A growing number of initiatives aim to provide access to already published data (e.g. [Dryad](http://datadryad.org/)).
To help people find data, projects like the Open Access Directory’s [data repository list](http://oad.simmons.edu/oadwiki/Data_repositories) and the Open Knowledge Foundation’s [datahub.io](http://datahub.io) have been started. They aim either to list data sources or to bring together datasets from various sources.
### Step 2: Getting data in the format you need
In the “What is Data” course, we talked briefly about the importance of [_machine-readable_](http://schoolofdata.org/handbook/courses/appendix/glossary/#term-machine-readable) data. You’ll save yourself a lot of time and trouble if you get hold of data in the right format from the start. To tell Google which format you are looking for, add the filetype: operator to your search. For example, searching for "South Africa filetype:csv" returns CSV files that mention South Africa. You can try other file types as well, such as "xls" for Excel spreadsheets or "pdf".
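If you want to try several topics and file types in a row, the search described above can be sketched in a few lines of Python. This is only an illustration: the function names `build_filetype_query` and `build_search_url` are made up for this example, and the sketch assumes Google’s public search address `https://www.google.com/search` with its `q` parameter. It builds the link but does not fetch any results.

```python
from urllib.parse import urlencode

# Assumption: Google's public search page, which takes the query in "q".
SEARCH_BASE = "https://www.google.com/search"


def build_filetype_query(terms: str, filetype: str = "csv") -> str:
    """Combine search terms with a filetype: operator (hypothetical helper)."""
    return f"{terms} filetype:{filetype}"


def build_search_url(terms: str, filetype: str = "csv") -> str:
    """Return a search URL you can paste into a browser (hypothetical helper)."""
    query = build_filetype_query(terms, filetype)
    # urlencode escapes spaces and the colon so the URL is valid.
    return SEARCH_BASE + "?" + urlencode({"q": query})


if __name__ == "__main__":
    # Print one link per file type for the same topic.
    for ext in ("csv", "xls", "pdf"):
        print(build_search_url("South Africa", ext))
```

Opening each printed link in a browser runs the same search you would otherwise type by hand, once per file type.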
## Using data to answer your question
Now that you have an overview of some of the key concepts related to data, it’s time to start hunting for your own! Over the next courses in the Data Fundamentals series, we will further explore the question we posed in the “What is Data” course: **How does healthcare spending influence life expectancy?** To get the data for this course, please see our recipe on Getting Data from the World Bank.
**Task:** If you found your own alternative data to answer this question, congratulations! Take a moment to upload it to the [DataHub](http://datahub.io) – and have a browse to see what other School of Data learners have found.
**Extension Task:** Explore the web, and see what [_open data_](http://schoolofdata.org/handbook/courses/appendix/glossary/#term-open-data) you can find. If you find something really interesting and think of an exciting question it could help to address, tweet it to @SchoolofData – or write a short post for the School of Data blog.
## Summary
In this tutorial we discussed how to get the data to answer our question. We explored different ways of accessing data sources and introduced several resources that list data portals and search engines.
At the beginning of Data Fundamentals, we posed ourselves a question: ‘How does healthcare spending influence life expectancy?’ By following the recipe, we have found a dataset from the World Bank that will help us to answer that question.
## Extra Reading
1. [How to get data from the World Bank data portal](http://schoolofdata.org/handbook/recipes/getting-data-from-world-bank/)
2. How to upload data to datahub.io [http://vimeo.com/45913395](http://vimeo.com/45913395)
3. The [Data Journalism Handbook](http://datajournalismhandbook.org/1.0/en/getting_data_0.html) has lots of handy tips for finding useful data sources in its “Five Minute Field Guide”.