This repository contains the material (notebooks, data) for the pandas tutorial.
To follow this tutorial you need to have the following packages installed:
- Python version 2.6-2.7 or 3.3-3.5
- pandas version 0.18.0 or later: http://pandas.pydata.org/ (previous versions will work for most examples as well)
- numpy version 1.7 or later: http://www.numpy.org/
- matplotlib version 1.3 or later: http://matplotlib.org/
- ipython version 3.x with notebook support, or ipython 4.x combined with jupyter: http://ipython.org
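To confirm that your environment meets the minimum versions listed above, a small check script like the following can help. It is a sketch, not part of the tutorial: the names `MIN_VERSIONS`, `parse_version` and `check_requirements` are hypothetical, ipython/jupyter are left out because their requirement is not a single minimum version, and the version parsing is deliberately simple (it compares only the leading numbers, so a pre-release such as `2.2.0rc1` counts as `2.2.0`).

```python
import importlib
import sys

# Hypothetical table of the minimum versions listed in this README.
MIN_VERSIONS = {
    "pandas": (0, 18, 0),
    "numpy": (1, 7, 0),
    "matplotlib": (1, 3, 0),
}


def parse_version(text):
    """Turn '1.26.4' into (1, 26, 4); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions like '1.3' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(min_versions=MIN_VERSIONS):
    """Map each package name to None (not installed) or (version string, meets minimum)."""
    report = {}
    for name, minimum in min_versions.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Modules without __version__ are treated as version '0' (i.e. too old).
        version = getattr(module, "__version__", "0")
        report[name] = (version, parse_version(version) >= minimum)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, result in sorted(check_requirements().items()):
        if result is None:
            print(name, "is not installed")
        else:
            version, ok = result
            print(name, version, "OK" if ok else "is older than required")
```

Running the script prints one line per package, so a missing or outdated package is easy to spot before opening the notebooks.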
You can install each of these packages with pip:
pip install package_name
Or you can install them all at once from the requirements file:
pip install -r requirements.txt
If you use the conda package manager (for example, through the Anaconda distribution), the following command will install all required packages in your Python environment:
conda install pandas jupyter seaborn
Of course, using another distribution (e.g. Enthought Canopy) or pip works as well, as long as you have the above packages installed.
If you have git installed, you can get the material in this tutorial by cloning this repo:
git clone https://github.com/jorisvandenbossche/pandas-tutorial.git
As an alternative, you can download it as a zip file: https://github.com/jorisvandenbossche/pandas-tutorial/archive/master.zip.
All data files are included in the repository, so make sure you download it.
To view the content on nbviewer:
- Index
- 1.Introduction to pandas
- 2.Analysis Using Pandas
- 3.Adding to DataFrames
- 4.Boolean indexing
- 5.Categorical Data
- 6.Computation Tools in pandas
- 7.Creating DataFrames
- 8.Cross sections of different axes with MultiIndex
- 9.Pandas Data types
- 10.get_dummies
- 11.Handling Duplicate data
- 12.More on Dataframes
- 13.NaN values
- 14.Data Visualization
- 15.Grouping columns in pandas
- 16.Grouping Time Series Data
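For a quick taste of several of the topics above (creating DataFrames, boolean indexing, NaN values, duplicate data, get_dummies and grouping), here is a minimal sketch on a small made-up DataFrame. The data and variable names are illustrative only and are not taken from the tutorial notebooks or its data files; it assumes just that pandas is installed.

```python
import pandas as pd

# A small, made-up dataset (not one of the tutorial's data files).
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Brussels", "Brussels", "Ghent"],
    "kind": ["a", "b", "a", "a", "b"],
    "value": [1.0, 2.0, None, 4.0, 5.0],
})

# Boolean indexing: keep rows where value is greater than 1 (NaN compares as False).
large = df[df["value"] > 1]

# NaN values: count missing entries per column.
missing = df.isnull().sum()

# Duplicate data: drop repeated (city, kind) combinations, keeping the first one.
unique_pairs = df.drop_duplicates(subset=["city", "kind"])

# get_dummies: one indicator column per category in "kind".
dummies = pd.get_dummies(df["kind"], prefix="kind")

# Grouping: mean value per city (NaN is skipped by default).
mean_per_city = df.groupby("city")["value"].mean()

print(mean_per_city)
```

The notebooks cover each of these operations in much more depth; this snippet only shows what a typical call looks like.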