Description: This code analyzes the Titanic dataset with logistic regression to predict passenger survival. It covers exploratory data analysis, data preprocessing, model training and evaluation, and interpretation of the model's coefficients. The goal is to identify the factors most strongly associated with survival and to assess how well the model predicts it.
Sections:
Exploratory Data Analysis:
- Visualize survival patterns and demographics.
- Plot a heatmap of missing values to locate gaps in the data.
- Visualize survival counts by gender and by passenger class.
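A minimal sketch of the numbers behind these plots, assuming pandas and a handful of hypothetical rows that reuse the column names of the public Kaggle Titanic CSV (Survived, Pclass, Sex, Age, Cabin, Embarked). It computes the per-column missing counts that a missing-data heatmap displays and the survival rates by sex and by class. It is an illustration, not the original analysis code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using Kaggle Titanic column names; NOT the real dataset.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 1, 0, 0],
    "Pclass":   [1, 3, 2, 3, 1, 3, 2, 1],
    "Sex":      ["female", "male", "female", "male",
                 "male", "female", "male", "female"],
    "Age":      [29.0, np.nan, 24.0, 40.0, np.nan, 18.0, 35.0, 50.0],
    "Cabin":    ["B5", np.nan, np.nan, np.nan, "C23", np.nan, np.nan, "E12"],
    "Embarked": ["S", "S", "C", np.nan, "S", "Q", "S", "C"],
})

# What a missing-data heatmap shows, as numbers: NaN count per column.
missing = df.isnull().sum()
print(missing)

# Survival rate (share of Survived == 1) by sex and by passenger class.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()
print(rate_by_sex)
print(rate_by_class)
```

If seaborn is installed, `sns.heatmap(df.isnull(), cbar=False)` and `sns.countplot(x="Survived", hue="Sex", data=df)` draw the corresponding plots.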
Data Cleaning:
- Impute missing ages with the average age of each passenger class.
- Drop the Cabin column and any rows with a missing Embarked value.
- Convert categorical features to numerical dummy variables.
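One way to express these cleaning steps with pandas, again on hypothetical toy rows. Here the per-class average is computed from the data with `groupby(...).transform("mean")`; the original analysis may instead use fixed per-class values, so treat that choice as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using Kaggle Titanic column names; NOT the real dataset.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 1, 0, 0],
    "Pclass":   [1, 3, 2, 3, 1, 3, 2, 1],
    "Sex":      ["female", "male", "female", "male",
                 "male", "female", "male", "female"],
    "Age":      [29.0, np.nan, 24.0, 40.0, np.nan, 18.0, 35.0, 50.0],
    "Cabin":    ["B5", np.nan, np.nan, np.nan, "C23", np.nan, np.nan, "E12"],
    "Embarked": ["S", "S", "C", np.nan, "S", "Q", "S", "C"],
})

# Impute each missing Age with the mean Age of that passenger's class.
# transform("mean") returns a per-row Series aligned with df's index.
class_mean_age = df.groupby("Pclass")["Age"].transform("mean")
df["Age"] = df["Age"].fillna(class_mean_age)

# Cabin is mostly missing, so drop the whole column.
df = df.drop(columns="Cabin")

# Only a few rows lack Embarked, so drop those rows instead of imputing.
df = df.dropna(subset=["Embarked"])

# One-hot encode the categorical columns; drop_first=True removes one
# redundant dummy per feature (e.g. Sex_female, since Sex_male implies it).
df = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True)
print(df)
```

Dropping the first level of each category (here `Sex_female` and `Embarked_C`) keeps the remaining dummies from being perfectly collinear with the model's intercept.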
Building a Logistic Regression Model:
- Split the data into training and test sets.
- Scale the features so they share a common range and their coefficients can be compared.
- Train a logistic regression model, predict on the test set, and evaluate the predictions.
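A sketch of the split, scale, train, and evaluate steps, assuming scikit-learn's `train_test_split`, `StandardScaler`, and `LogisticRegression`. The feature matrix is synthetic (random values with an assumed survival signal) and only stands in for the cleaned Titanic data; the 30% test size and the `random_state` values are illustrative choices, not taken from the document.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the cleaned Titanic features (NOT real data).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),          # 1, 2 or 3
    "Age": rng.normal(30, 12, size=n).clip(1, 80),
    "Fare": rng.exponential(30, size=n),
    "Sex_male": rng.integers(0, 2, size=n),        # dummy: 1 = male
})
# Assumed signal: women and first-class passengers survive more often.
logit = 2.0 - 2.5 * X["Sex_male"] - 0.8 * (X["Pclass"] - 1) - 0.01 * X["Age"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit.to_numpy()))).astype(int)

# Hold out 30% of rows for testing; stratify keeps the survival ratio similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits
# so that no test-set statistics leak into training.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Train the model and predict hard 0/1 labels for the test rows.
model = LogisticRegression()
model.fit(X_train_s, y_train)
predictions = model.predict(X_test_s)

# Evaluate: confusion matrix, per-class precision/recall/F1, and accuracy.
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
acc = accuracy_score(y_test, predictions)
print(f"accuracy: {acc:.3f}")
```

Fitting the scaler on the training split alone is the design choice that matters here: fitting it on all rows would let the test set's mean and variance influence training.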
Coefficient Analysis:
- Identify and interpret the features with the largest coefficients.
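A sketch of ranking coefficients by magnitude, assuming scikit-learn and a model fitted on standardized features; the data are synthetic (hypothetical) so the block runs on its own. With standardized inputs, each coefficient is the change in the log-odds of survival for a one-standard-deviation increase in that feature, and its exponential is the corresponding multiplier on the odds. These are associations learned from the data, not causal effects.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the cleaned Titanic features (NOT real data).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Age": rng.normal(30, 12, size=n).clip(1, 80),
    "Fare": rng.exponential(30, size=n),
    "Sex_male": rng.integers(0, 2, size=n),
})
# Assumed signal: women and first-class passengers survive more often.
logit = 2.0 - 2.5 * X["Sex_male"] - 0.8 * (X["Pclass"] - 1) - 0.01 * X["Age"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit.to_numpy()))).astype(int)

# For brevity this fits on all rows; in the full pipeline you would use the
# model trained on the scaled training split.
X_s = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_s, y)

# One coefficient per feature, labelled with the column names.
coefs = pd.Series(model.coef_[0], index=X.columns)

# Rank by absolute size: the sign gives direction, the magnitude gives strength
# (per one standard deviation, since the features are standardized).
top = coefs.reindex(coefs.abs().sort_values(ascending=False).index)

# exp(coef) = multiplicative change in the odds of survival per 1 SD increase.
odds_ratios = np.exp(top)
print(pd.DataFrame({"coef": top, "odds_ratio": odds_ratios}))
```

A negative coefficient (for example on `Sex_male`) means higher values of that feature go with lower odds of survival; an odds ratio below 1 says the same thing on the odds scale.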
ROC Curve and AUC:
- Calculate and visualize the ROC curve and AUC score.
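A sketch using scikit-learn's `roc_curve` and `roc_auc_score` on the predicted probability of the positive class (Survived = 1), with synthetic stand-in data as an assumption. The curve is built from probabilities rather than hard 0/1 predictions because it sweeps over every decision threshold.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the cleaned Titanic features (NOT real data).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Age": rng.normal(30, 12, size=n).clip(1, 80),
    "Fare": rng.exponential(30, size=n),
    "Sex_male": rng.integers(0, 2, size=n),
})
# Assumed signal: women and first-class passengers survive more often.
logit = 2.0 - 2.5 * X["Sex_male"] - 0.8 * (X["Pclass"] - 1) - 0.01 * X["Age"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit.to_numpy()))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Probability of the positive class (Survived = 1) for each test passenger.
probs = model.predict_proba(scaler.transform(X_test))[:, 1]

# The ROC curve sweeps the decision threshold: each point is (FPR, TPR).
fpr, tpr, thresholds = roc_curve(y_test, probs)

# AUC: area under that curve; 0.5 = random ranking, 1.0 = perfect ranking.
auc_score = roc_auc_score(y_test, probs)
print(f"AUC: {auc_score:.3f}")
```

With matplotlib, `plt.plot(fpr, tpr)` together with the chance diagonal `plt.plot([0, 1], [0, 1], "--")` draws the curve.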
Together, these steps show which factors are most strongly associated with survival and measure how well the model predicts it.