The lessons posted here are from an in-person course I taught in the fall of 2015. I have developed these materials into something more conducive to self-study at www.episkills.com.
## Getting started
I recommend using Enthought Canopy to install Python. Canopy is free, works on Windows, Mac and Linux, and does not require administrative rights to install. It also appears on some lists of software approved for installation by government employees.
After installation, open Canopy and click the Package Manager button. Search for pandas and numpy to make sure they are installed, and click Install if needed. Then return to the welcome screen and click Editor. From there, go to File -> New -> IPython Notebook. The Notebook relies heavily on keyboard shortcuts. Shift+Enter is one you cannot do without: it executes the code in the current cell. This tutorial can help you learn more.
## Lessons
1. Basic data skills using pandas: indexing and selection, filtering, cleaning data, basic summary statistics.
2. More advanced data skills using pandas: stratified analysis, binning ages, pivot tables, comparing data sets.
3. Writing control flows and functions: for loops, if-then statements, writing functions, dictionaries.
4. Using epipy, a Python package for epidemiology.
5. Visualization using pandas, seaborn and matplotlib.
6. Automatically generated regular reports.
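As a rough illustration of the pandas skills named in the first two lessons, the sketch below selects, filters, cleans and summarizes a small table, then bins ages and builds a stratified pivot table. The line list, its column names (`case_id`, `age`, `sex`, `county`) and the age cut points are hypothetical, invented for this example rather than taken from the course materials.

```python
import pandas as pd

# Hypothetical line list; every column and value here is invented for illustration.
cases = pd.DataFrame({
    "case_id": [1, 2, 3, 4, 5, 6],
    "age": [4, 17, 35, 52, 68, None],
    "sex": ["F", "M", "F", "F", "M", "M"],
    "county": ["North", "South", "North", "South", "North", "South"],
})

# Indexing and selection with .loc: rows by condition, columns by name
north = cases.loc[cases["county"] == "North", ["case_id", "age"]]

# Filtering with a boolean mask (a missing age compares as False)
adults = cases[cases["age"] >= 18]

# Cleaning: drop rows with a missing age, then summarize what is left
cleaned = cases.dropna(subset=["age"])
print(cleaned["age"].describe())

# Binning ages; right=False makes each bin include its left edge, e.g. [18, 65)
cleaned = cleaned.assign(
    age_group=pd.cut(cleaned["age"], bins=[0, 18, 65, 120],
                     labels=["0-17", "18-64", "65+"], right=False)
)

# Stratified analysis: count cases by age group and sex
counts = cleaned.pivot_table(index="age_group", columns="sex", values="case_id",
                             aggfunc="count", fill_value=0, observed=False)
print(counts)
```

Passing `observed=False` keeps every age group in the table even when a group has no cases, which is usually what you want in a stratified summary.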
## Homework

More details are in the homework folder.
1. Practice by yourself. Apply the skills learned in lesson 1 to your own data, and practice using the IPython Notebook.
2. Congressional age analysis. Analyze a data set of every member of Congress and their age at election.
3. Generate a fake line list. Pretend you are teaching an epidemiology class, and use control flows and functions to prepare to teach.
4. No really, do homework 3.
5. Produce the graphics for a report or analysis that you perform regularly.
6. Put the finishing aesthetic touches on homework 5, and add any additional analyses needed. Then distribute!
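Homework 3 asks for a fake line list built with control flows and functions. One possible sketch is below. The symptoms, their probabilities, the age groups and the function names `make_case` and `make_line_list` are all hypothetical choices for illustration, not the course's solution.

```python
import random

import pandas as pd

# Hypothetical symptom probabilities for a teaching exercise (not real data).
SYMPTOM_PROBS = {"fever": 0.8, "cough": 0.5, "rash": 0.1}


def make_case(case_id, rng):
    """Build one fake case as a dictionary of column name -> value."""
    age = rng.randint(0, 90)  # randint includes both endpoints
    case = {"case_id": case_id, "age": age, "sex": rng.choice(["F", "M"])}

    # If-then logic: assign an age group from the age
    if age < 18:
        case["age_group"] = "child"
    elif age < 65:
        case["age_group"] = "adult"
    else:
        case["age_group"] = "older adult"

    # For loop over a dictionary: each symptom is present with its own probability
    for symptom, prob in SYMPTOM_PROBS.items():
        case[symptom] = rng.random() < prob
    return case


def make_line_list(n_cases, seed=2015):
    """Return a reproducible fake line list as a pandas DataFrame."""
    rng = random.Random(seed)  # a seeded generator gives the same data every run
    return pd.DataFrame([make_case(i, rng) for i in range(1, n_cases + 1)])


line_list = make_line_list(20)
print(line_list.head())
print(line_list["age_group"].value_counts())
```

Seeding `random.Random` means every run, and every student, gets the same line list, which makes it easier to check answers in class.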
## Resources

- Pandas documentation - everything you could ever want to know about pandas
- Seaborn documentation - beautiful plots, made easy
- Greg Reda's pandas tutorial - a gentle introduction to data analysis using pandas
- Learn Python the Hard Way - a long and thorough tutorial of more "traditional" programming
- Python for Data Analysis - a book by the creator of pandas, available for purchase
- Epipy - a Python package for epidemiological analysis. Documentation is also available.
## License

Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)
Reuse and distribution -> yes. Commercial reuse -> no. Attribution -> please.