LAB 3: Titanic Data Processing & Modeling
Welcome to the Titanic Data Preprocessing & Modeling project! It shows how to preprocess data (handling missing values and outliers, engineering features) and how to build Logistic Regression and Random Forest models.
Tools & Libraries Used
This project is built using Python and the following libraries:
Pandas: For data manipulation.
NumPy: For numerical operations.
Matplotlib & Seaborn: For visualizations.
Scikit-learn: For machine learning and model evaluation.
Project Overview
The main objective of this project is to preprocess the Titanic dataset and build models to predict passenger survival. Here's a quick breakdown of the steps taken:
Data Collection
Used the Titanic dataset from seaborn
Data Cleaning
Handled missing values, outliers, and invalid data
Outlier Handling
Capped the extreme values in the age and fare columns
Normalization
Scaled the numerical features using Min-Max scaling or Z-score normalization
Feature Engineering
Created new features such as family_size and extracted titles from names
Feature Selection
Selected important features using correlation and feature importance analysis
Model Building
Built Logistic Regression and Random Forest models for classification
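The cleaning, capping, scaling, and feature engineering steps above might look roughly like the sketch below. It is not the project's actual code. The tiny hand-made frame (with seaborn-style column names) stands in for `seaborn.load_dataset("titanic")` so the snippet runs offline. Median/mode imputation, the 1.5 × IQR capping rule, the helper names `cap_iqr`, `min_max`, and `preprocess`, and the sample `name` strings are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for the dataset (assumption, for offline use);
# the project itself loads the data from seaborn.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "age": [22.0, 38.0, np.nan, 35.0, 80.0, 2.0],
    "fare": [7.25, 71.28, 7.92, 8.05, 512.33, 21.08],
    "sibsp": [1, 1, 0, 0, 0, 3],
    "parch": [0, 0, 0, 0, 0, 1],
    "embarked": ["S", "C", "S", None, "C", "S"],
})


def cap_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed capping rule)."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def min_max(s: pd.Series) -> pd.Series:
    """Rescale a column to the [0, 1] range."""
    return (s - s.min()) / (s.max() - s.min())


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    out = raw.copy()  # leave the input frame untouched
    # Missing values: median for age, most frequent value for embarked.
    out["age"] = out["age"].fillna(out["age"].median())
    out["embarked"] = out["embarked"].fillna(out["embarked"].mode()[0])
    # Outliers: cap the extremes in age and fare.
    for col in ["age", "fare"]:
        out[col] = cap_iqr(out[col])
    # Feature engineering: the passenger plus siblings/spouses and parents/children.
    out["family_size"] = out["sibsp"] + out["parch"] + 1
    # Normalization: Min-Max scale the numeric features.
    for col in ["age", "fare", "family_size"]:
        out[col] = min_max(out[col])
    return out


clean = preprocess(df)

# Title extraction needs a name column in "Surname, Title. Given names" form;
# both the column and the format are assumptions here.
names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"])
titles = names.str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
```

Capping is done before scaling so that a single extreme fare does not squash every other value toward zero. Swapping `min_max` for a Z-score helper such as `(s - s.mean()) / s.std()` would give the other normalization option mentioned above.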
Steps to Run the Project
Clone this repository:
Results
After training our models, here's what we found:
Logistic Regression Results
Accuracy: 0.79
Precision: 0.72
Recall: 0.72
F1 Score: 0.72
Random Forest Results
Accuracy: 0.78
Precision: 0.74
Recall: 0.65
F1 Score: 0.69
Logistic Regression had the higher recall (0.72 vs. 0.65) and F1 score (0.72 vs. 0.69), and marginally higher accuracy (0.79 vs. 0.78), while Random Forest had slightly higher precision (0.74 vs. 0.72).
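For reference, the four metrics reported above can be computed with scikit-learn along the following lines. This is a minimal sketch and not the project's training script. The synthetic `X` and `y` stand in for the preprocessed features and the survived label, and the 80/20 stratified split, `max_iter=1000`, `n_estimators=100`, and the random seeds are assumptions. Its scores will not match the numbers in this README.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 4 numeric features, binary label.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Hold out 20% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 2) for metric, value in scores.items()})

# The fitted forest also exposes per-feature importances, one possible basis
# for the feature importance analysis mentioned in the overview.
importances = models["Random Forest"].feature_importances_
```

Evaluating both models on the same held-out split keeps the comparison fair: any difference in precision or recall then comes from the models and not from the rows they happened to be tested on.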
Why Titanic Dataset?
The Titanic dataset is famous for demonstrating basic machine learning tasks such as classification. It's easy to understand yet provides a challenging problem with both categorical and numerical data.
Features of this Project
Easy-to-follow data preprocessing pipeline
Intuitive visualizations for understanding the data and its outliers
Fun feature engineering to get the most out of the dataset
Two powerful models: Logistic Regression and Random Forest
How to Contribute
Fork this repository
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request
Fun Facts
Did you know? The Titanic was built in Belfast, Northern Ireland.
The real story of the Titanic is both tragic and heroic, inspiring numerous books and movies.
Meet the Team
Yves Dylane: Project Lead
Contact
Feel free to contact us with any questions or suggestions!
Enjoy coding and have fun!