This project analyzes the Titanic dataset using NumPy & Matplotlib.
We perform data preprocessing, feature engineering, statistical analysis, and visualization to uncover survival patterns.
1️⃣ Data Loading & Cleaning
- Merged the name columns into a single Name column
- Imputed missing values: Age and Fare with the column mean, Embarked with the mode
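
A minimal sketch of the imputation step, assuming missing numbers are stored as NaN and missing ports as empty strings. The arrays are hypothetical toy values, not rows from the dataset.

```python
import numpy as np

# Toy columns (hypothetical values); NaN and "" mark missing entries.
age = np.array([22.0, np.nan, 26.0, 35.0, np.nan])
embarked = np.array(["S", "C", "", "S", "Q"])

# Mean imputation: np.nanmean ignores NaN when computing the mean.
age_filled = np.where(np.isnan(age), np.nanmean(age), age)

# Mode imputation: pick the most frequent non-missing label.
labels, counts = np.unique(embarked[embarked != ""], return_counts=True)
mode = labels[np.argmax(counts)]
embarked_filled = np.where(embarked == "", mode, embarked)
```

Fare would be handled the same way as Age.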
2️⃣ Encoding
- Encoded Sex as integers (female=0, male=1)
- Encoded Embarked as integers (S=0, C=1, Q=2)
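
The mappings above could be applied along these lines. This is a sketch on hypothetical toy arrays, and the variable names are illustrative.

```python
import numpy as np

# Toy columns (hypothetical values).
sex = np.array(["male", "female", "female", "male"])
embarked = np.array(["S", "C", "Q", "S"])

# Boolean comparison cast to int gives female=0, male=1.
sex_code = (sex == "male").astype(int)

# Explicit lookup table for the three ports.
embarked_map = {"S": 0, "C": 1, "Q": 2}
embarked_code = np.array([embarked_map[port] for port in embarked])
```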
3️⃣ Feature Engineering
- Dropped Name, Ticket, Cabin
- Added FamilySize & IsAlone features
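
The two new features are not defined here, so the sketch below assumes the common convention: FamilySize = SibSp + Parch + 1 (the passenger counts as a family member) and IsAlone = 1 when FamilySize is 1. The input values are hypothetical.

```python
import numpy as np

# Toy columns (hypothetical values): siblings/spouses and parents/children aboard.
sibsp = np.array([1, 0, 0, 3])
parch = np.array([0, 0, 2, 1])

# Assumed definition: relatives aboard plus the passenger.
family_size = sibsp + parch + 1

# Assumed definition: 1 if the passenger travels with no relatives, else 0.
is_alone = (family_size == 1).astype(int)
```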
4️⃣ Normalization
- Applied Z-score scaling on Age & Fare
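
Z-score scaling subtracts the column mean and divides by the standard deviation. The sketch below assumes the population standard deviation (NumPy's default, `ddof=0`). The helper name `zscore` and the fare values are illustrative.

```python
import numpy as np

def zscore(x):
    """Scale x to zero mean and unit variance (population std, ddof=0)."""
    return (x - x.mean()) / x.std()

# Toy column (hypothetical values).
fare = np.array([7.25, 71.28, 8.05, 53.10])
fare_z = zscore(fare)
```

After scaling, `fare_z` has mean 0 and standard deviation 1 up to floating-point error.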
5️⃣ Statistical Analysis
- Computed mean, median, std for key features
- Calculated survival rates by gender & class
- Computed the correlation matrix of the numerical features
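
A sketch of how these statistics might be computed with plain NumPy, assuming the columns are already encoded as numbers. All values are hypothetical toy data, so the numbers it produces are not results from the dataset.

```python
import numpy as np

# Toy encoded columns (hypothetical values).
survived = np.array([0, 1, 1, 1, 0, 0])
sex = np.array([1, 0, 0, 0, 1, 1])        # female=0, male=1
pclass = np.array([3, 1, 3, 1, 3, 2])
fare = np.array([7.25, 71.28, 7.92, 53.10, 8.05, 13.00])

# Summary statistics for one feature.
fare_stats = {
    "mean": np.mean(fare),
    "median": np.median(fare),
    "std": np.std(fare),
}

# Survival rate per group: the mean of a 0/1 column is a proportion.
rates_by_sex = {
    label: survived[sex == code].mean()
    for code, label in ((0, "female"), (1, "male"))
}
rates_by_class = {int(c): survived[pclass == c].mean() for c in np.unique(pclass)}

# Pearson correlation matrix; rowvar=False treats each column as a variable.
features = np.column_stack([survived, sex, pclass, fare])
corr = np.corrcoef(features, rowvar=False)
```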
6️⃣ Visualizations
- Survival Rate by Gender (bar chart)
- Fare Distribution (histogram)
- Correlation Heatmap
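
The three plots could be drawn with Matplotlib roughly as follows. This is a sketch on hypothetical toy data. The single-figure layout, bin count, and color map are illustrative choices rather than the project's actual settings.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy encoded columns (hypothetical values).
survived = np.array([0, 1, 1, 1, 0, 0])
sex = np.array([1, 0, 0, 0, 1, 1])        # female=0, male=1
fare = np.array([7.25, 71.28, 7.92, 53.10, 8.05, 13.00])

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Bar chart: survival rate by gender.
rates = [survived[sex == 0].mean(), survived[sex == 1].mean()]
axes[0].bar(["female", "male"], rates)
axes[0].set_title("Survival Rate by Gender")

# Histogram: fare distribution.
axes[1].hist(fare, bins=5)
axes[1].set_title("Fare Distribution")

# Heatmap: correlation matrix drawn as an image.
names = ["Survived", "Sex", "Fare"]
corr = np.corrcoef(np.column_stack([survived, sex, fare]), rowvar=False)
image = axes[2].imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
axes[2].set_xticks(range(len(names)))
axes[2].set_xticklabels(names)
axes[2].set_yticks(range(len(names)))
axes[2].set_yticklabels(names)
axes[2].set_title("Correlation Heatmap")
fig.colorbar(image, ax=axes[2])

fig.tight_layout()
```

Call `plt.show()` to display the figure, or `fig.savefig(...)` to write it to a file.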
7️⃣ Train/Test Split
- Shuffled the rows randomly
- Split the data into 80% training and 20% testing
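
A shuffle-then-split sketch in NumPy. The seed, the array names, and the toy data are assumptions made for illustration.

```python
import numpy as np

# Toy feature matrix and labels (hypothetical values): 10 rows, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2

# Shuffle row indices with a seeded generator so the split is reproducible.
rng = np.random.default_rng(42)
indices = rng.permutation(len(X))

# The first 80% of the shuffled indices go to training, the rest to testing.
split = int(0.8 * len(X))
train_idx, test_idx = indices[:split], indices[split:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```

Indexing `X` and `y` with the same index arrays keeps each row paired with its label.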