- 1- Loading Packages
- 3- Univariate Analysis
- 4- Bivariate Analysis
- 5- Conclusion
- 6- Data Preprocessing
  - 6.1 Quick Pipeline
  - 6.2 Full Pipeline
- 7- BEAST Model
- 8- Error Analysis
- 9- Voting Classification
- 10- Final Submission
The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, the task is to build a predictive model that answers the question “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).
The Titanic dataset is a classic dataset in the field of data science and machine learning. It contains information about passengers aboard the ill-fated Titanic, including details such as their demographics, ticket information, and survival status.
- PassengerId: Unique identifier for each passenger.
- Survived: Binary variable indicating whether the passenger survived (1) or not (0). This is the target variable we aim to predict.
- Pclass: Ticket class (1st, 2nd, or 3rd class).
- Name: Passenger's name.
- Sex: Gender of the passenger.
- Age: Age of the passenger.
- SibSp: Number of siblings/spouses aboard.
- Parch: Number of parents/children aboard.
- Ticket: Ticket number.
- Fare: Passenger fare.
- Cabin: Cabin number.
- Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
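Before any analysis, the data can be loaded and inspected with pandas. This is a minimal sketch assuming the standard train.csv/test.csv files from the Kaggle competition; the file paths are an assumption.

```python
import pandas as pd

# Assumed paths for the standard Kaggle Titanic files.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

print(train.shape)         # (891, 12) in the standard competition split
print(train.dtypes)        # column types match the field list above
print(train.isna().sum())  # Age, Cabin, and Embarked contain missing values
```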
Exploring the Titanic dataset reveals striking details about the passengers and their survival. More than 60% of passengers did not survive, and most of the victims were men. Most passengers embarked at Southampton, and over 50% travelled in 3rd class, showing a clear class divide. The age distribution followed a roughly bell-shaped curve with more younger passengers, which may have influenced who survived.
When we looked at cabins, some, such as Cabins E, D, and B, had higher survival rates. We also found a link between passenger class and fare, suggesting that where people stayed on the ship was related to what they paid for their tickets. Comparing the passenger classes showed clear differences in survival rates, with 1st and 2nd class faring better. Examining factors such as gender, family size, embarkation port, and titles gave us a more detailed picture. Surprisingly, passengers with titles in the 'Dr/Military/Noble/Clergy' group had lower survival rates, prompting further questions.
Taking a closer look at passengers who paid high fares, we discovered a group that was mostly female, travelling 1st class, and embarked at Cherbourg, with notably higher survival rates. This raises the question of whether evacuation involved preferential treatment along socio-economic lines. Overall, the Titanic dataset goes beyond numbers: it tells a story of social dynamics, spatial influences, and unexpected connections, offering insight into the tragic events that unfolded.
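As an illustrative sketch (not the notebook's actual plotting code), the headline figures above can be reproduced with a few groupby calls, assuming the `train` DataFrame loaded earlier:

```python
print(1 - train["Survived"].mean())                    # share of passengers who did not survive
print(train.groupby("Sex")["Survived"].mean())         # survival rate by gender
print(train.groupby("Pclass")["Survived"].mean())      # survival rate by ticket class
print(train["Embarked"].value_counts(normalize=True))  # embarkation-port shares
```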
- Cabin Letter Class Pipeline: Categorizes cabins based on their starting letters.
- Name (Title) Extraction Pipeline: Extracts titles from names and groups them into broader categories.
- Surname Pipeline: Extracts surnames from names.
- Family Size Pipeline: Calculates family sizes and categorizes them.
- Ticket Groups Pipeline: Groups tickets based on counts.
- Married Women Pipeline: Identifies married women from names.
- Age Class Pipeline: Categorizes ages based on passenger class, sex, and title.
- Fare Pipeline: Categorizes fares into classes.
- Category Pipelines: Handles categorical variables by encoding them.
- Sex Pipeline: Handles sex data by encoding it.
- Full Preprocessing Pipeline: Combines all preprocessing steps into one.
- Target Encoding Processing: Applies target encoding to categorical features.
- Target Encoding Processing 2: Further processes target-encoded features.
- Full Pipeline: Integrates all preprocessing and target-encoding steps into one cohesive pipeline for use with machine learning models (a minimal sketch follows this list).
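As a hedged illustration of how such steps fit together in scikit-learn (not the notebook's actual code), here is a minimal sketch of the Name (Title) Extraction Pipeline and a ColumnTransformer combining per-column steps. The function name, the title regex, and the grouping map are assumptions for illustration; only the 'Dr/Military/Noble/Clergy' group appears in the analysis above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

RARE = "Dr/Military/Noble/Clergy"  # grouping used in the analysis above

def extract_title(X: pd.DataFrame) -> pd.DataFrame:
    """Pull the title out of 'Surname, Title. Given names' and group rare ones."""
    titles = X["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    grouped = titles.replace({
        "Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs",
        "Dr": RARE, "Rev": RARE, "Col": RARE, "Major": RARE,
    })
    return grouped.to_frame("Title")

# Name (Title) Extraction Pipeline: extract, group, then one-hot encode.
title_pipeline = Pipeline([
    ("extract", FunctionTransformer(extract_title)),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Full preprocessing sketch: route each raw column (group) to its pipeline.
full_preprocessing = ColumnTransformer([
    ("title", title_pipeline, ["Name"]),
    ("sex", OneHotEncoder(drop="if_binary"), ["Sex"]),
    ("embarked", OneHotEncoder(handle_unknown="ignore"), ["Embarked"]),
])
```

In the Full Pipeline described above, a transformer like this would be followed by the target-encoding steps and the final estimator inside a single Pipeline object.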
On the Kaggle submission, a Ridge classifier scored 79.1%, while a voting ensemble of Ridge, Logistic Regression, LinearSVC, and a calibrated classifier reached 79.9%, putting us in the top 5% of the leaderboard.
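For reference, here is a hedged sketch of what such a voting ensemble can look like in scikit-learn; the estimator settings and the X_train/y_train/X_test names are assumptions, not the notebook's exact configuration. Because RidgeClassifier and LinearSVC expose no predict_proba, hard voting is used, with CalibratedClassifierCV standing in for the calibrated classifier mentioned above.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.svm import LinearSVC

# Hard voting: each estimator casts a class vote and the majority wins.
voting_clf = VotingClassifier(
    estimators=[
        ("ridge", RidgeClassifier()),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("svc", LinearSVC()),
        ("calibrated_svc", CalibratedClassifierCV(LinearSVC())),
    ],
    voting="hard",
)

voting_clf.fit(X_train, y_train)            # preprocessed features and labels (assumed names)
submission_preds = voting_clf.predict(X_test)
```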