A tutorial on using pandas, a popular Python data analysis library, to perform analysis tasks on structured data.
This repository is structured as follows:
`contents/`: Jupyter notebooks used for the instructor's demonstrations during the training session, plus practice sections for participants
`solutions/`: Jupyter notebooks with solutions to the practice sections
The class contents introduce the core functionality of pandas to participants step by step. A short description of the subject covered by each notebook follows (a brief taste of these APIs is sketched after the list):
- `01_intro.ipynb`: Introduces `Series` and `DataFrame`, the two core data structures of `pandas`
- `02_load_and_save.ipynb`: Importing and exporting files/datasets
- `03_data_manipulation.ipynb`: Indexing and slicing data, the `loc` and `iloc` methods
- `04_EDA.ipynb`: General methods for performing exploratory data analysis
- `05_data_cleaning.ipynb`: Identifying and imputing missing data
- `06_data_analysis.ipynb`: More in-depth data analysis using rolling windows, grouping methods, and more
- `07_data_visualization.ipynb`: Depicting data in various visualizations
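As a quick taste of what these notebooks cover, here is a minimal, self-contained sketch of the main APIs. The toy `sales` DataFrame is invented purely for illustration and is not one of the repository's datasets.

```python
import pandas as pd

# Series and DataFrame: the two core data structures (01_intro)
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "day": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]),
    "revenue": [100.0, None, 80.0, 95.0],  # None becomes NaN (missing data)
})

# Label- and position-based indexing (03_data_manipulation)
first_row = sales.loc[0]       # select a row by index label
first_cell = sales.iloc[0, 0]  # select a cell by integer position

# Quick exploratory summary of the numeric columns (04_EDA)
print(sales.describe())

# Identify and impute missing data (05_data_cleaning)
print(sales["revenue"].isna().sum())
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# Grouping and rolling windows (06_data_analysis)
print(sales.groupby("store")["revenue"].sum())
print(sales.set_index("day")["revenue"].rolling(2).mean())
```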
An introductory slide deck to `pandas` is available here.
A Google Colaboratory URL is embedded in the button beside each notebook; a single click opens the notebook in Colab, ready to run. If you want to access the notebooks locally, follow the steps below.
Isolating the environment for each project is best practice. One way to create a virtual environment is with Python's built-in `venv` module. At this directory's root, execute the following to create a virtual environment:
python3 -m venv venv
The command to activate the virtual environment varies according to your OS. Use the following for Linux:
source venv/bin/activate
Use the following for Windows:
venv\Scripts\activate
A `(venv)` prefix appearing in front of your prompt indicates that the virtual environment has been activated successfully, as shown below:
(venv) C:\Users\User\pandas-tutorial>
Finally, execute the following to install the dependencies for this lab into your activated virtual environment:
pip install -r requirements.txt
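After installation completes, you can optionally verify that `pandas` is importable from the activated environment:
python -c "import pandas as pd; print(pd.__version__)"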
The datasets used in this repository do not belong to the authors; they are open-source datasets found online. The URL for each dataset is listed below: