Titanic dataset analysis using NumPy and Matplotlib, includes preprocessing, feature engineering, statistical analysis, and visualizations to explore survival patterns.

rabeehakamran/Preprocessing-Numpy-only

Titanic Survival Analysis (NumPy + Matplotlib)

📌 Project Overview

This project analyzes the Titanic dataset using NumPy and Matplotlib.
It covers data preprocessing, feature engineering, statistical analysis, and visualization to uncover survival patterns.


⚙️ Data Processing Steps

1️⃣ Data Loading & Cleaning

  • Merged Name columns
  • Handled missing values (Age, Fare → mean | Embarked → mode)
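The imputation step above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper names `fill_with_mean` and `fill_with_mode`, the use of `NaN` and an empty string as missing-value markers, and the toy arrays are all assumptions.

```python
import numpy as np

def fill_with_mean(values):
    """Return a copy where NaNs are replaced by the mean of the observed values."""
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    filled[np.isnan(filled)] = np.nanmean(values)
    return filled

def fill_with_mode(labels, missing=""):
    """Return a copy where `missing` entries are replaced by the most frequent label."""
    labels = np.asarray(labels, dtype=str)
    observed = labels[labels != missing]
    uniques, counts = np.unique(observed, return_counts=True)
    mode = uniques[np.argmax(counts)]
    return np.where(labels == missing, mode, labels)

# Toy values for illustration only, not rows from the real dataset.
age = np.array([22.0, np.nan, 26.0, 35.0])
embarked = np.array(["S", "C", "", "S"])

age_filled = fill_with_mean(age)            # NaN replaced by the mean of 22, 26, 35
embarked_filled = fill_with_mode(embarked)  # "" replaced by the most common port
```

The same `fill_with_mean` helper would apply to Fare, since both columns are numeric.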

2️⃣ Encoding

  • Sex encoded (female=0, male=1)
  • Embarked encoded (S=0, C=1, Q=2)
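One possible way to apply these mappings with NumPy alone is shown below. The mappings come from the list above; the `encode` helper and the choice of `-1` for unknown labels are assumptions made for this sketch.

```python
import numpy as np

# Mappings as documented in this README.
SEX_CODES = {"female": 0, "male": 1}
EMBARKED_CODES = {"S": 0, "C": 1, "Q": 2}

def encode(labels, codes):
    """Map string labels to integer codes; labels missing from `codes` become -1."""
    labels = np.asarray(labels, dtype=str)
    encoded = np.full(labels.shape, -1, dtype=int)
    for label, code in codes.items():
        encoded[labels == label] = code
    return encoded

# Toy inputs for illustration only.
sex_encoded = encode(["male", "female", "female"], SEX_CODES)
embarked_encoded = encode(["S", "Q", "C"], EMBARKED_CODES)
```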

3️⃣ Feature Engineering

  • Dropped Name, Ticket, Cabin
  • Added FamilySize & IsAlone features
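The README does not state how FamilySize and IsAlone are defined. A common convention, assumed here, is FamilySize = SibSp + Parch + 1 (the passenger plus relatives aboard) and IsAlone = 1 when FamilySize equals 1. A small sketch under that assumption:

```python
import numpy as np

# Toy values for illustration only; SibSp and Parch are columns of the Titanic dataset.
sibsp = np.array([1, 0, 3])  # siblings/spouses aboard
parch = np.array([0, 0, 2])  # parents/children aboard

# Assumed definition: the passenger plus all relatives aboard.
family_size = sibsp + parch + 1

# Assumed definition: 1 if the passenger travels with no relatives, else 0.
is_alone = (family_size == 1).astype(int)
```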

4️⃣ Normalization

  • Applied Z-score scaling to Age & Fare
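Z-score scaling subtracts the column mean and divides by the standard deviation, so the scaled column has mean 0 and standard deviation 1. The sketch below assumes the population standard deviation (NumPy's default, `ddof=0`); the `zscore` helper and the toy fares are illustrative, not taken from the repository.

```python
import numpy as np

def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return np.zeros_like(values)
    return (values - values.mean()) / std

# Toy values for illustration only.
fare = np.array([7.25, 71.28, 8.05, 53.10])
fare_scaled = zscore(fare)
```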

5️⃣ Statistical Analysis

  • Computed mean, median, std for key features
  • Calculated survival rates by gender & class
  • Computed the correlation matrix of numerical features
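These statistics map directly onto NumPy calls. The following sketch uses toy arrays and a hypothetical `survival_rate_by` helper; it assumes Survived is coded 0/1 and that the grouping column is already encoded as integers.

```python
import numpy as np

def survival_rate_by(group, survived):
    """Return {group value: mean of survived} for each distinct group value."""
    group = np.asarray(group)
    survived = np.asarray(survived, dtype=float)
    return {g.item(): float(survived[group == g].mean()) for g in np.unique(group)}

# Toy values for illustration only, not rows from the real dataset.
sex = np.array([0, 0, 1, 1, 1])        # female=0, male=1
pclass = np.array([1, 2, 3, 3, 2])
fare = np.array([80.0, 20.0, 8.0, 7.5, 13.0])
survived = np.array([1, 1, 0, 1, 0])

# Summary statistics for one feature.
fare_stats = {"mean": fare.mean(), "median": np.median(fare), "std": fare.std()}

# Survival rate per group.
rate_by_sex = survival_rate_by(sex, survived)
rate_by_class = survival_rate_by(pclass, survived)

# Pearson correlation matrix; np.corrcoef treats each row as one variable.
corr = np.corrcoef(np.vstack([pclass, fare, survived]))
```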

6️⃣ Visualizations

  • Survival Rate by Gender (bar chart)
  • Fare Distribution (histogram)
  • Correlation Heatmap
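A rough sketch of how the three plots could be drawn with Matplotlib is shown below. The `plot_overview` function, the figure layout, the colormap, and the toy inputs are all assumptions; the `Agg` backend is selected only so the example runs without a display and can be dropped for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for on-screen plots
import matplotlib.pyplot as plt
import numpy as np

def plot_overview(rate_by_gender, fare, corr, feature_names):
    """Draw a survival-rate bar chart, a fare histogram, and a correlation heatmap."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Bar chart: one bar per gender label.
    axes[0].bar(list(rate_by_gender.keys()), list(rate_by_gender.values()))
    axes[0].set_title("Survival Rate by Gender")
    axes[0].set_ylim(0, 1)

    # Histogram of fares.
    axes[1].hist(fare, bins=20)
    axes[1].set_title("Fare Distribution")

    # Heatmap of the correlation matrix, with the color scale fixed to [-1, 1].
    image = axes[2].imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
    axes[2].set_xticks(range(len(feature_names)))
    axes[2].set_xticklabels(feature_names, rotation=45)
    axes[2].set_yticks(range(len(feature_names)))
    axes[2].set_yticklabels(feature_names)
    axes[2].set_title("Correlation Heatmap")
    fig.colorbar(image, ax=axes[2])

    fig.tight_layout()
    return fig

# Toy inputs for illustration only.
rates = {"female": 0.7, "male": 0.2}
fare = np.array([7.25, 71.28, 8.05, 53.10, 8.46, 13.00])
corr = np.array([[1.0, -0.5], [-0.5, 1.0]])
fig = plot_overview(rates, fare, corr, ["Pclass", "Fare"])
```

Calling `fig.savefig("overview.png")` afterwards would write the figure to disk; the file name is arbitrary.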

7️⃣ Train/Test Split

  • Random shuffle
  • 80% training, 20% testing
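A shuffled 80/20 split needs only a random permutation of row indices. The sketch below is one way to do it; the `train_test_split` name, the `seed` parameter, and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle rows and return (train, test) with about `test_fraction` of rows in test."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)       # seeded for reproducibility
    order = rng.permutation(len(data))      # random ordering of row indices
    n_test = int(round(len(data) * test_fraction))
    test = data[order[:n_test]]
    train = data[order[n_test:]]
    return train, test

# Toy matrix with 10 rows and 2 columns, for illustration only.
data = np.arange(20).reshape(10, 2)
train, test = train_test_split(data)
```

Because every row index appears exactly once in the permutation, the two subsets are disjoint and together cover the whole dataset.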
