This project was developed as a pair-programming exercise to practice data analysis with NumPy and Pandas.
The goal is to explore, clean, transform, and integrate datasets while applying core data analytics concepts, from exploratory analysis to visualization.
PAIR-PROGRAMMING-NUMPY-AND-PANDAS-PROJECT/
├─ EDA.ipynb # Exploratory Data Analysis
├─ Group-By-and-Apply.ipynb # Aggregations and custom functions
├─ Nulls-management.ipynb # Handling missing values
├─ Numpy.ipynb # NumPy fundamentals
├─ Pandas.ipynb # Pandas basics
├─ Merge-and-Data-Cleaning.ipynb # Data merging and cleaning
├─ vis_world_data.ipynb # Data visualization
├─ medallas.csv # Olympics medal dataset (input)
├─ world_data_full_apply.csv # Processed world data (output)
└─ README.md
- NumPy
  - Array creation, slicing, reshaping, broadcasting, and vectorized operations.
- Pandas
  - DataFrame and Series manipulation, indexing, and selection.
- EDA (Exploratory Data Analysis)
  - Descriptive statistics, distributions, correlations.
- GroupBy & Apply
  - Aggregations, transformations, and custom apply functions.
- Null management
  - Identifying, imputing, and dropping missing values.
- Merging & Cleaning
  - Joining multiple datasets and ensuring consistency.
- Visualization
  - Plotting insights with Matplotlib/Seaborn.
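
As a rough illustration of a few of the topics above (reshaping, broadcasting, descriptive statistics, and GroupBy with a custom function), here is a minimal sketch. It does not reproduce the notebooks: the values are toy data, and the column names `country` and `medals` are assumptions made up for this example, not the actual schema of `medallas.csv`.

```python
import numpy as np
import pandas as pd

# NumPy: create, reshape, and broadcast (toy values, not project data)
arr = np.arange(6).reshape(2, 3)      # 2 rows x 3 columns: [[0, 1, 2], [3, 4, 5]]
col_means = arr.mean(axis=0)          # one mean per column, shape (3,)
centered = arr - col_means            # broadcasting: (2, 3) minus (3,)

# Pandas: hypothetical frame; column names are invented for illustration
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "medals": [3, 5, 2, 8],
})

# EDA: descriptive statistics for one numeric column
summary = df["medals"].describe()

# GroupBy with built-in aggregations
totals = df.groupby("country")["medals"].agg(["sum", "mean"])

# GroupBy with a custom function: range (max - min) within each group
spread = df.groupby("country")["medals"].apply(lambda s: s.max() - s.min())

print(centered)
print(summary)
print(totals)
print(spread)
```

Subtracting `col_means` relies on broadcasting rather than an explicit loop, which is the vectorized style the NumPy notebook is meant to practice.
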
- Open the notebooks in Jupyter Notebook or VS Code (Jupyter extension).
- Run cells sequentially in each notebook.
- Efficient use of NumPy arrays for numerical computation.
- Data wrangling and manipulation with Pandas.
- Handling real-world data quality issues (nulls, duplicates).
- Combining multiple datasets into a unified view.
- Generating insights through exploratory data analysis and visualization.
- Collaborative coding through pair programming practices (shared design, debugging, and review).
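
The data quality and merging outcomes above can be sketched as follows. This is only an illustrative example under stated assumptions: both tables, their column names, and the choice of median imputation are hypothetical and are not taken from the project's notebooks or datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical tables; names and values are invented for illustration
people = pd.DataFrame({
    "country": ["A", "B", "B", "C"],
    "population": [10.0, 20.0, 20.0, np.nan],
})
medals = pd.DataFrame({"country": ["A", "B", "D"], "medals": [4, 7, 1]})

# Identify: count missing values per column
null_counts = people.isna().sum()

# Drop exact duplicate rows (the two "B" rows are identical)
deduped = people.drop_duplicates()

# Impute: fill missing population with the column median (one possible strategy)
filled = deduped.fillna({"population": deduped["population"].median()})

# Merge: an outer join keeps unmatched rows from both sides;
# indicator=True adds a "_merge" column showing where each row came from
merged = filled.merge(medals, on="country", how="outer", indicator=True)

print(null_counts)
print(merged)
```

An outer join with `indicator=True` is used here because it makes unmatched keys visible, which helps when checking consistency between datasets. An inner or left join may be more appropriate depending on the analysis.
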