Skip to content

Tutorial on using Pandas, a popular data analysis framework in Python, to perform data analysis tasks on structured data.

License

Notifications You must be signed in to change notification settings

KianYang-Lee/pandas-tutorial

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pandas-tutorial

Tutorial on using Pandas, a popular data analysis framework in Python, to perform data analysis tasks on structured data.

Contents

Repository Structure

This repository is structured as follow:

contents/   : contains ipython notebooks for instructor's demonstration purposes during training session and practice sections for participants
solutions/  : contains ipython notebooks with solutions for the practice sections

Class contents are structured in such a way that core functionalities of pandas are exposed to participants in a step-by-step and functional manner. Following contains short description of subject covered for each scripts:

  • 01_intro.ipynb: Introduces Series and DataFrame, two core data structures of pandas Open In Colab
  • 02_load_and_save.ipynb: Importing and exporting files/datasets Open In Colab
  • 03_data_manipulation.ipynb: Indexing and slicing data, loc and iloc methods Open In Colab
  • 04_EDA.ipynb: General methods to perform exploratory data analysis Open In Colab
  • 05_data_cleaning.ipynb: Performing missing data identification and imputation Open In Colab
  • 06_data_analysis.ipynb: Performing more in-depth data analysis using rolling window, grouping method and more Open In Colab
  • 07_data_visualization.ipynb: Depicting data in various visualizations Open In Colab

Slide

An introductory slide deck to pandas is available here.

Dependencies

A Google Colaboratory URL is embedded in the button beside each notebook. One click on it and you are ready to check out the contents. If you want to access the notebooks locally, follow the steps below.

Isolating each environment for different projects is the best practice. One of the way you can create virtual environment is by using Python's native virtualenv module. At this directory's root, execute the following to create a virtual environment:

python3 -m venv venv

Commands to activate virtual environment varies according to your OS. Use the following for Linux:

source venv/bin/activate

Use the following for Windows:

venv\Scripts\activate

A (venv) appearing in front of your system path indicates that the virtual environment is successfully activated, as shown below:

(venv) C:\Users\User\pandas-tutorial>

Finally, execute the following to install the dependencies for this lab into your activated virtual environment:

pip install -r requirements.txt

Disclaimer

All the dataset used in this repository does not belong to the authors but rather are open-sourced datasets found online. Attached are the URLs for each and every dataset:

About

Tutorial on using Pandas, a popular data analysis framework in Python, to perform data analysis tasks on structured data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published