This project predicts survival on the Titanic with logistic regression. It explores how passenger characteristics relate to survival outcomes through data cleaning, exploratory data analysis (EDA), and model training. The dataset is sourced from Kaggle.
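Logistic regression maps a weighted sum of passenger features to a survival probability through the logistic (sigmoid) function. The sketch below fits that model with plain batch gradient descent in pure Python to show the mechanics. It is not how the notebook trains its model (the library list points to Statsmodels and/or Scikit-Learn, whose solvers differ), and the toy data, learning rate and epoch count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps any real score to (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Per-row gradient of the log loss is (predicted prob - label) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Toy stand-in data (not Titanic rows): one scaled feature, 1 = survived.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
preds = [1 if p >= 0.5 else 0 for p in probs]
print(preds)  # [0, 0, 0, 1, 1, 1]
```

The 0.5 threshold here is only a default; choosing a better cutoff is what the precision-recall analysis in section 8 is for.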
Objective: predict Titanic survival with a logistic regression model built on the dataset's features, aiming for accuracy, precision, and recall that are as high as possible.
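Accuracy, precision and recall all come from the four cells of the confusion matrix (sections 8.1 and 8.2). The minimal pure-Python sketch below treats 1 as "survived" and uses made-up labels for illustration. In the notebook these metrics are presumably computed with `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`), but that is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = survived."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision: of the passengers predicted to survive, the share who did.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the passengers who survived, the share predicted to survive.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
print(confusion_counts(y_true, y_pred))        # (3, 2, 1, 2)
print(classification_metrics(y_true, y_pred))  # (0.625, 0.6, 0.75)
```

Raising recall (catching more survivors) usually costs precision, which is why a cutoff is chosen from the precision-recall curve instead of simply using 0.5.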
Libraries used:

- NumPy (for numerical computation)
- Pandas (for data manipulation)
- Matplotlib (for data visualization)
- Seaborn (for data visualization)
- Statsmodels (for data modeling)
- Scikit-Learn (for data modeling)
- collections (for counting occurrences)
- imblearn / imbalanced-learn (for oversampling)
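Section 6.3 corrects class imbalance by oversampling, and `Counter` from `collections` is a quick way to check class counts before and after. The sketch below is a hand-rolled version of random oversampling (duplicating minority-class rows at random) so the idea is visible without dependencies. imbalanced-learn's `RandomOverSampler` and its `fit_resample` method do this, but which sampler the notebook actually uses (random oversampling, SMOTE, or another) is not stated, so treat that as an assumption. The data are made-up toy values.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=42):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            # Copy the chosen row so later in-place edits don't alias it.
            X_out.append(list(rng.choice(rows)))
            y_out.append(label)
    return X_out, y_out

# Toy training split (made-up values): 4 non-survivors, 2 survivors.
X_train = [[0.1], [0.4], [0.35], [0.8], [0.7], [0.9]]
y_train = [0, 0, 0, 0, 1, 1]
print(Counter(y_train))  # Counter({0: 4, 1: 2})
X_res, y_res = random_oversample(X_train, y_train)
print(Counter(y_res))    # Counter({0: 4, 1: 4})
```

Because the data are split (6.2) before resampling (6.3), the duplicated rows stay in the training set and never leak into the evaluation data.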
Table of contents:

1. Importing Libraries
2. Importing Dataset
3. Data Understanding
4. Handling Missing Values
    - 4.1 Handling missing values - Dropping
    - 4.2 Handling missing values - Imputing
        - 4.2.1 Feature engineering
5. Exploratory Data Analysis
    - 5.1 Bivariate Analysis
    - 5.2 Multivariate Analysis
6. Data Preparation for Modeling
    - 6.1 Binary Encoding
    - 6.2 Splitting Train and Test Data
    - 6.3 Resampling to Correct Class Imbalance
    - 6.4 Feature Scaling
7. Training the Model
    - 7.1 Model Creation
    - 7.2 VIF (Variance Inflation Factor)
8. Precision-Recall Analysis
    - 8.1 Confusion Matrix
    - 8.2 Precision and Recall
    - 8.3 Optimal Cutoff - Precision-Recall Curve
9. Predicting on the Test Data
10. Preparation and Submission
    - 10.1 Missing Values
        - 10.1.1 Handling missing values - Imputing
    - 10.2 Binary Encoding
    - 10.3 Scaling the Features
    - 10.4 Dropping the Unnecessary Columns
    - 10.5 Prediction and Submission
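Section 8.3 chooses the probability cutoff from the precision-recall curve. One common convention, assumed here rather than confirmed by the notebook, is to pick the cutoff where precision and recall cross. The sketch below sweeps cutoffs from 0 to 1 in steps of 0.01 over made-up predicted probabilities. `sklearn.metrics.precision_recall_curve` computes precision and recall at every distinct score threshold and could replace the manual sweep; following that function's end point, precision is taken as 1.0 when no passenger is predicted to survive.

```python
def precision_recall_at(y_true, probs, cutoff):
    """Precision and recall when passengers with probability >= cutoff
    are predicted to survive."""
    preds = [1 if p >= cutoff else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    # Convention: precision = 1.0 when there are no positive predictions.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def crossover_cutoff(y_true, probs, step=0.01):
    """Return the smallest grid cutoff where |precision - recall| is minimal."""
    best_cutoff, best_gap = 0.0, float("inf")
    for i in range(int(round(1 / step)) + 1):
        cutoff = round(i * step, 4)  # avoid values like 0.41000000000000003
        precision, recall = precision_recall_at(y_true, probs, cutoff)
        gap = abs(precision - recall)
        if gap < best_gap:  # strict '<' keeps the first (smallest) cutoff on ties
            best_cutoff, best_gap = cutoff, gap
    return best_cutoff

# Made-up validation labels and predicted survival probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.75, 0.90]
cutoff = crossover_cutoff(y_true, probs)
print(cutoff)                                      # 0.41
print(precision_recall_at(y_true, probs, cutoff))  # (0.75, 0.75)
```

Other criteria, such as maximizing F1, can select a different cutoff on the same data. Whichever cutoff is chosen on the training data would then turn predicted probabilities on the Kaggle test set into the 0/1 survival predictions for submission.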