For this project, we worked with the Titanic dataset from Kaggle.
The goal is to predict whether each passenger survived or died in the shipwreck, which makes this a binary classification task.
The model uses features describing each passenger (such as Age, Sex, Pclass, and Embarked) to predict that survival outcome.
Steps of the project:
- Import all important libraries
- Read the titanic_train.csv file into a pandas DataFrame
- View the top few rows of the dataframe
- Exploratory Data Analysis to visualize the data
- Check for missing data
- Data Cleaning: Impute missing values in Age based on Pclass (use the average age within each Pclass)
- Data Cleaning: Drop the Cabin column
- Data Cleaning: Drop the rows where Embarked is NaN
- Feature Engineering
- Convert categorical features (Sex, Embarked) to dummy variables using get_dummies
- Build a logistic regression model (splitting the data 70:30 into train and test sets)
- Predict and evaluate the model
- Analyse Confusion Matrix and Classification Report
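
The cleaning, encoding, and modelling steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code. Several things here are assumptions: the helper names (`make_synthetic_titanic`, `clean_titanic`, `fit_and_evaluate`) are hypothetical, the synthetic data stands in for `titanic_train.csv` so the sketch runs without the file, and the columns `Survived`, `PassengerId`, `Name`, `Ticket`, and `Fare` are taken from the usual Kaggle layout. The `drop_first=True`, `max_iter=1000`, and `random_state=101` settings are illustrative choices as well.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def make_synthetic_titanic(n=200, seed=0):
    """Hypothetical stand-in for pd.read_csv("titanic_train.csv")."""
    rng = np.random.default_rng(seed)
    sex = rng.choice(["male", "female"], size=n)
    age = rng.normal(30, 12, size=n).clip(1, 80)
    age[rng.random(n) < 0.2] = np.nan           # some ages are missing
    embarked = rng.choice(["S", "C", "Q"], size=n).astype(object)
    embarked[:2] = np.nan                       # two rows lack Embarked
    # Synthetic label: the survival probability depends on Sex only.
    survived = (rng.random(n) < np.where(sex == "female", 0.7, 0.2)).astype(int)
    return pd.DataFrame({
        "Survived": survived,
        "Pclass": rng.integers(1, 4, size=n),
        "Sex": sex,
        "Age": age,
        "Fare": rng.gamma(2.0, 15.0, size=n),
        "Cabin": np.nan,
        "Embarked": embarked,
    })


def clean_titanic(df):
    """Apply the data-cleaning and dummy-encoding steps from the list."""
    df = df.copy()
    # Impute missing Age with the mean Age of the passenger's Pclass.
    df["Age"] = df["Age"].fillna(df.groupby("Pclass")["Age"].transform("mean"))
    # Drop the Cabin column.
    df = df.drop(columns=["Cabin"], errors="ignore")
    # Drop the rows where Embarked is NaN.
    df = df.dropna(subset=["Embarked"])
    # Convert Sex and Embarked to dummy variables (one level dropped each).
    df = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True, dtype=int)
    # Assumption: identifier and free-text columns are not used as features.
    return df.drop(columns=["PassengerId", "Name", "Ticket"], errors="ignore")


def fit_and_evaluate(df, seed=101):
    """Fit logistic regression on a 70:30 split and report test metrics."""
    X = df.drop(columns=["Survived"])
    y = df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    cm = confusion_matrix(y_test, predictions)
    report = classification_report(y_test, predictions, zero_division=0)
    return model, cm, report


if __name__ == "__main__":
    raw = make_synthetic_titanic()
    print(raw.isnull().sum())                   # check for missing data
    model, cm, report = fit_and_evaluate(clean_titanic(raw))
    print(cm)
    print(report)
```

With the real data, `make_synthetic_titanic()` would be replaced by `pd.read_csv("titanic_train.csv")`; the numbers printed for the synthetic frame say nothing about the model's performance on the actual dataset.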