Data Carpentry's aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop uses a tabular ecology dataset and teaches data cleaning, management, analysis, and visualization. There are no prerequisites, and the materials assume no prior knowledge of the tools.
The workshop uses a single tabular dataset that contains observations of adorable small mammals collected over a long period of time in Arizona. See data.md for more information about this dataset, including the download location.
The workshop can be taught using R or Python as the base language.
Overview of the lessons:
- Data organization in spreadsheets and data cleaning with OpenRefine
- Introduction to R or Python
- Data analysis and visualization in R or Python
- SQL for data management
An example of the ecology materials in the wild is this Data Carpentry workshop at Caltech in 2015.
There are two lessons in this section. The first is a spreadsheet lesson that teaches good data organization, and some data cleaning and quality control checking in a spreadsheet program.
The second lesson uses a spreadsheet-like program called OpenRefine to teach data cleaning and filtering, and to introduce scripting, regular expressions and APIs (application programming interfaces).
These lessons include a basic introduction to R or Python syntax, importing CSV data, and subsetting and merging data. They finish with calculating summary statistics and creating simple plots.
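To give a feel for the import, subset, merge, and summarize steps, here is a minimal Python sketch using only the standard library. The inline data, the column names (`record_id`, `species_id`, `weight`, `genus`), and the helper `read_rows` are illustrative assumptions, not the workshop's actual files or code, and the lessons themselves may use different libraries.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Small made-up tables standing in for CSV files (assumed column names).
SURVEYS_CSV = """record_id,species_id,weight
1,DM,40
2,DM,44
3,PF,8
4,PF,
5,NL,180
"""

SPECIES_CSV = """species_id,genus
DM,Dipodomys
PF,Perognathus
NL,Neotoma
"""


def read_rows(text):
    """Import CSV text as a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


surveys = read_rows(SURVEYS_CSV)
species = read_rows(SPECIES_CSV)

# Subset: keep only the rows that have a recorded weight.
weighed = [row for row in surveys if row["weight"] != ""]

# Merge: attach the genus to each survey row via the shared species_id key.
genus_by_id = {row["species_id"]: row["genus"] for row in species}
merged = [dict(row, genus=genus_by_id[row["species_id"]]) for row in weighed]

# Summarize: mean weight per genus.
weights = defaultdict(list)
for row in merged:
    weights[row["genus"]].append(float(row["weight"]))
mean_weight = {genus: mean(values) for genus, values in weights.items()}

print(mean_weight)
```

With real data, the inline strings would be replaced by files opened with `open(path, newline="")`.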
This lesson introduces the concept of a database using SQLite, how to structure data for easy database import, and how to import tabular data into SQLite. It then teaches basic queries, combining results, and querying across multiple tables.
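As a rough sketch of what importing tabular data into SQLite and querying across tables can look like, the example below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and rows are made up for illustration and are not taken from the lesson.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Assumed schema: one table of species, one table of survey records.
conn.execute("CREATE TABLE species (species_id TEXT PRIMARY KEY, genus TEXT)")
conn.execute(
    "CREATE TABLE surveys ("
    "record_id INTEGER PRIMARY KEY, species_id TEXT, weight REAL)"
)

# Import tabular data: each tuple is one row (made-up values).
conn.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [("DM", "Dipodomys"), ("PF", "Perognathus")],
)
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?, ?)",
    [(1, "DM", 40.0), (2, "DM", 44.0), (3, "PF", 8.0), (4, "PF", None)],
)

# Basic query: filter and sort rows from a single table.
heavy = conn.execute(
    "SELECT record_id, weight FROM surveys WHERE weight > 10 ORDER BY record_id"
).fetchall()

# Query across tables: join on species_id, then aggregate per genus.
# AVG skips NULL weights.
avg_by_genus = conn.execute(
    """
    SELECT species.genus, AVG(surveys.weight)
    FROM surveys
    JOIN species ON surveys.species_id = species.species_id
    GROUP BY species.genus
    ORDER BY species.genus
    """
).fetchall()

print(heavy)
print(avg_by_genus)
conn.close()
```

Keeping the species information in its own table and linking it by `species_id` is one way to structure data so that each fact is stored once and recombined with a join when needed.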
There are a number of other ecology lessons that are not part of the base workshop. Some of these are no longer taught, and some are only taught at extended workshops.