GitHub - yvesdylane/data_processing

🚢 LAB 3: Titanic Data Processing & Modeling 🧑‍💻

Welcome to the Titanic Data Preprocessing & Modeling project! 🌊 This project shows how to preprocess data (handling missing values and outliers, engineering features) and build models such as Logistic Regression and Random Forest! 🔥

🛠️ Tools & Libraries Used

This project is built using Python 🐍 and the following libraries:

Pandas 🐼: For data manipulation.

NumPy 🔢: For numerical operations.

Matplotlib 📊 & Seaborn 🎨: For visualizations.

Scikit-learn 🤖: For machine learning and model evaluation.

🔍 Project Overview

The main objective of this project is to preprocess the Titanic dataset and build models to predict passenger survival 🛳️. Here's a quick breakdown of the steps taken:

Data Collection 📦

Used the Titanic dataset from seaborn 🎯

Data Cleaning 🧹

Handled missing values, outliers, and invalid data 🚮
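As an illustration of the cleaning step, here is a minimal sketch of filling missing values with pandas. The rows are made up, and the imputation choices (median for `age`, most frequent value for `embarked`) are assumptions rather than a record of what `Lab3.py` does.

```python
import numpy as np
import pandas as pd

# Toy rows using column names from seaborn's titanic dataset (values are made up).
df = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05],
    "embarked": ["S", "C", None, "S", "S"],
})

# Numeric column: fill gaps with the median, which is less sensitive to outliers than the mean.
df["age"] = df["age"].fillna(df["age"].median())

# Categorical column: fill gaps with the most frequent value.
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])

# Count the remaining missing values per column.
print(df.isna().sum())
```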

Outliers Handling 🚫

Capped the extreme values in age and fare columns ✂️
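The capping rule is not stated here, so the sketch below assumes one common choice: clipping to the 1.5 × IQR fences. The helper name `cap_outliers_iqr` and the sample fares are hypothetical.

```python
import pandas as pd

def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Made-up fares with one extreme value at the end.
fare = pd.Series([7.25, 8.05, 13.0, 26.0, 31.0, 512.33])
capped = cap_outliers_iqr(fare)

print(capped)
```

Capping keeps every row, which matters on a small dataset; dropping outlier rows would be an alternative with a different trade-off.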

Normalization 📐

Scaled the numerical features using Min-Max scaling or Z-score normalization 🎚️
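Both scaling options mentioned above can be written in a few lines of pandas. This is a sketch with hypothetical helper names and made-up ages; scikit-learn's `MinMaxScaler` and `StandardScaler` provide the same transformations.

```python
import pandas as pd

def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale values to the [0, 1] range."""
    return (series - series.min()) / (series.max() - series.min())

def z_score(series: pd.Series) -> pd.Series:
    """Center to mean 0 and scale to unit (population) standard deviation."""
    return (series - series.mean()) / series.std(ddof=0)

# Made-up ages for illustration.
age = pd.Series([2.0, 22.0, 38.0, 54.0, 80.0])

print(min_max_scale(age))
print(z_score(age))
```

Both helpers divide by zero on a constant column, so such columns should be dropped or handled separately first.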

Feature Engineering 🛠️

Created new features like family_size and extracted title from names 📛
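A sketch of the two features named above. It assumes `family_size` counts siblings/spouses plus parents/children plus the passenger, and that a `name` column in the "Surname, Title. Given names" format is available. Note that the DataFrame returned by seaborn's `load_dataset("titanic")` has no `name` column, so the title step presumes a source such as the Kaggle CSV.

```python
import pandas as pd

# Sample rows in the Kaggle Titanic naming format.
df = pd.DataFrame({
    "name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ],
    "sibsp": [1, 1, 0],
    "parch": [0, 0, 0],
})

# Family size: siblings/spouses + parents/children + the passenger (assumed convention).
df["family_size"] = df["sibsp"] + df["parch"] + 1

# Title: the text between the comma and the first period.
df["title"] = df["name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

print(df[["family_size", "title"]])
```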

Feature Selection 📑

Selected important features using correlation and feature importance analysis 🔍
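One way to do the correlation part of this step is to rank features by the absolute value of their correlation with the target. The data, the `is_female` and `noise` columns, and the 0.3 cutoff below are all made up for illustration.

```python
import pandas as pd

# Made-up rows; "noise" is constructed to be uncorrelated with the target.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "is_female": [0, 1, 1, 0, 1, 0, 1, 1],
    "pclass": [3, 1, 2, 3, 1, 3, 3, 2],
    "noise": [2, 2, 1, 1, 2, 2, 1, 1],
})

# Pearson correlation of every feature with the target column.
corr = df.corr()["survived"].drop("survived")

# Rank by strength, ignoring the sign of the correlation.
ranked = corr.abs().sort_values(ascending=False)

# Keep features above an arbitrary, illustrative threshold.
selected = ranked[ranked >= 0.3].index.tolist()

print(ranked)
print(selected)
```

Pearson correlation only captures linear relationships, which is one reason to pair it with model-based feature importance.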

Model Building 🧑‍🔬

Built Logistic Regression & Random Forest models for classification 🤖
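A minimal sketch of training and scoring both model types with scikit-learn. It runs on synthetic data from `make_classification` rather than the Titanic features, and the hyperparameters and split are assumptions, so the printed scores will not match the results reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the preprocessed features.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    print(name, {metric: round(value, 2) for metric, value in scores[name].items()})

# Tree ensembles also expose per-feature importances, useful for feature selection.
importances = models["Random Forest"].feature_importances_
print(importances)
```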

🚀 Steps to Run the Project

Clone this repository:

📊 Results

After training our models, here's what we found:

Logistic Regression Results 🚀

Accuracy: 0.79

Precision: 0.72

Recall: 0.72

F1 Score: 0.72

Random Forest Results 🌲

Accuracy: 0.78

Precision: 0.74

Recall: 0.65

F1 Score: 0.69

👀 The Logistic Regression model had higher recall (0.72 vs. 0.65) and F1 score (0.72 vs. 0.69), while Random Forest had slightly higher precision (0.74 vs. 0.72). Accuracy was nearly identical (0.79 vs. 0.78).
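The F1 scores above follow from the reported precision and recall, since F1 is their harmonic mean. A quick check (the function name is hypothetical):

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported in the results above.
reported = {
    "Logistic Regression": (0.72, 0.72),
    "Random Forest": (0.74, 0.65),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_from(p, r):.2f}")
# Logistic Regression: F1 = 0.72
# Random Forest: F1 = 0.69
```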

🤔 Why Titanic Dataset?

The Titanic dataset is famous for demonstrating basic machine learning tasks such as classification 📚. It's easy to understand yet provides a challenging problem with both categorical and numerical data 🧠.

🔥 Features of this Project

Easy-to-follow data preprocessing pipeline 📋

Intuitive visualizations 📈 to understand the data and outliers

Fun feature engineering to get the most out of the dataset ⚙️

Two powerful models: Logistic Regression and Random Forest 🎯

🙌 How to Contribute

Fork this repository 🍴

Create your feature branch: git checkout -b my-new-feature 🌵

Commit your changes: git commit -am 'Add some feature' 💾

Push to the branch: git push origin my-new-feature 🚀

Submit a pull request 🎉

💡 Fun Facts

Did you know? The Titanic was built in Belfast, in what is now Northern Ireland.

The real story of the Titanic is both tragic and heroic, inspiring numerous books and movies 🎬.

🤩 Meet the Team

Yves Dylane 💻 Project Lead

📬 Contact

Feel free to contact us with any questions or suggestions! 📧

Enjoy coding and have fun! 🎉
