
Pair Programming – NumPy & Pandas Project – collaboratively applied NumPy and Pandas for EDA, data cleaning, null handling, and transformations, including groupby/apply operations on real datasets.


ana-nobre/Pair-Programming-NumPy-and-Pandas-Project


Pair Programming — NumPy and Pandas Project

1) Overview

This project was developed as a pair programming exercise to practice data analysis with NumPy and Pandas.
The goal is to explore, clean, transform, and integrate real datasets, covering the core steps of a data analysis workflow from exploratory analysis to visualization.

2) Repository structure

PAIR-PROGRAMMING-NUMPY-AND-PANDAS-PROJECT/
├─ EDA.ipynb                      # Exploratory Data Analysis
├─ Group-By-and-Apply.ipynb       # Aggregations and custom functions
├─ Nulls-management.ipynb         # Handling missing values
├─ Numpy.ipynb                     # NumPy fundamentals
├─ Pandas.ipynb                    # Pandas basics
├─ Merge-and-Data-Cleaning.ipynb   # Data merging and cleaning
├─ vis_world_data.ipynb            # Data visualization
├─ medallas.csv                    # Olympics medal dataset (input)
├─ world_data_full_apply.csv       # Processed world data (output)
└─ README.md

3) Learning objectives

  • NumPy
    • Array creation, slicing, reshaping, broadcasting, and vectorized operations.
  • Pandas
    • DataFrame and Series manipulation, indexing, and selection.
  • EDA (Exploratory Data Analysis)
    • Descriptive statistics, distributions, and correlations.
  • GroupBy & Apply
    • Aggregations, transformations, and custom apply functions.
  • Null management
    • Identifying, imputing, and dropping missing values.
  • Merging & Cleaning
    • Joining multiple datasets and ensuring consistency.
  • Visualization
    • Plotting insights with Matplotlib/Seaborn.
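
The NumPy and GroupBy & Apply objectives above can be illustrated with a minimal sketch. It is not taken from the notebooks: the DataFrame, its columns (country, year, medals), and the helper medal_band are hypothetical and invented for illustration, not read from medallas.csv. It shows reshaping, slicing, broadcasting, descriptive statistics, a groupby aggregation, a groupby transform, and a custom function applied with apply.

```python
import numpy as np
import pandas as pd

# --- NumPy: creation, reshaping, slicing, broadcasting ---
arr = np.arange(12).reshape(3, 4)   # 3x4 array holding 0..11, row by row
first_two_rows = arr[:2, :]         # slicing: rows 0 and 1, all columns
col_means = arr.mean(axis=0)        # mean of each column, shape (4,)
centered = arr - col_means          # broadcasting: (3, 4) - (4,) -> (3, 4)

# --- Pandas: hypothetical toy data (not from medallas.csv) ---
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "year": [2016, 2020, 2016, 2020, 2020],
    "medals": [10, 14, 7, 9, 3],
})

# EDA: count, mean, std, min, quartiles, max of one column
summary = df["medals"].describe()

# GroupBy aggregation: one row per country with total and average medals
per_country = df.groupby("country")["medals"].agg(["sum", "mean"])

# GroupBy transform: the group total is aligned back onto every original row
df["share_of_country_total"] = (
    df["medals"] / df.groupby("country")["medals"].transform("sum")
)

# Custom function applied element-wise (hypothetical threshold of 10)
def medal_band(n):
    return "high" if n >= 10 else "low"

df["band"] = df["medals"].apply(medal_band)

print(per_country)
print(df)
```

The design difference worth noticing: agg collapses each group to a single row, while transform returns a result with the same index as the original rows, which is why it can be assigned directly as a new column.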

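4) How to run

  1. Install Python together with the libraries used in the notebooks (NumPy, Pandas, Matplotlib, Seaborn) and Jupyter, for example: pip install numpy pandas matplotlib seaborn jupyter
  2. Open the notebooks in Jupyter Notebook or VS Code (with the Jupyter extension).
  3. Run the cells in each notebook in order, from top to bottom, since later cells usually depend on variables defined in earlier ones.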

5) Skills demonstrated

  • Efficient use of NumPy arrays for numerical computation.
  • Data wrangling and manipulation with Pandas.
  • Handling real-world data quality issues (nulls, duplicates).
  • Combining multiple datasets into a unified view.
  • Generating insights through exploratory data analysis and visualization.
  • Collaborative coding through pair programming practices (shared design, debugging, and review).
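
The data quality and merging skills listed above can be sketched on two tiny hypothetical tables. The table names, columns, and values below are invented for illustration and do not come from the project's CSV files. Filling missing medal counts with 0 is an assumption made for this example only; whether zero, a statistic such as the mean, or dropping the row is appropriate depends on what a missing value means in the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables; country codes and values are illustrative only.
medals = pd.DataFrame({
    "country": ["A", "B", "B", "C", "D"],
    "gold": [3, 1, 1, np.nan, 10],        # "B" row is an exact duplicate
})
population = pd.DataFrame({
    "country": ["A", "B", "C"],
    "population_m": [1.5, 2.0, np.nan],   # "D" has no row at all
})

# 1) Identify missing values per column
missing_per_column = medals.isna().sum()

# 2) Drop exact duplicate rows
medals = medals.drop_duplicates()

# 3) Impute: treat a missing medal count as 0 (assumption for this example)
medals["gold"] = medals["gold"].fillna(0)

# 4) Left join keeps every country from `medals`;
#    indicator=True adds a _merge column showing where each key was found
combined = medals.merge(population, on="country", how="left", indicator=True)

# 5) Drop rows that still lack population data after the merge
clean = combined.dropna(subset=["population_m"])

print(missing_per_column)
print(combined)
print(clean)
```

The _merge column takes the values both, left_only, or right_only, which makes it easy to spot keys that failed to match before deciding whether to impute or drop them.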

Contributors 2