Auto Insurance Claim Prediction ✨ 🚗📈 ✨

This project focuses on predicting two essential outcomes for auto insurance claims:

The probability of a car crash (TARGET_FLAG) - Binary Logistic Regression
The potential claim amount if a crash occurs (TARGET_AMT) - Multiple Linear Regression
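
One common way to use the two outputs together is to multiply the crash probability by the predicted amount, giving an expected claim cost per policy. The README does not say whether the project combines them this way, so the following is only a minimal sketch in Python (the project's own code is in R); the function name and the numbers are hypothetical.

```python
def expected_claim_cost(p_crash: float, amount_if_crash: float) -> float:
    """Expected cost = P(crash) * predicted claim amount given a crash."""
    if not 0.0 <= p_crash <= 1.0:
        raise ValueError("p_crash must be a probability in [0, 1]")
    return p_crash * amount_if_crash


# Hypothetical policy: 25% crash probability, predicted claim amount of 4000.
print(expected_claim_cost(0.25, 4000.0))  # 1000.0
```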

Why This Matters

Accurate predictions of accident probability and claim amounts help insurance providers assess risk, set fair premiums, and handle claims efficiently. This project pairs careful data preparation with regression modeling so that the final models hold up on unseen data with high variability.

🛠️ Project Workflow

The project follows a systematic data preparation and modeling workflow: cleaning inconsistencies in a dataset with many predictors, addressing class imbalance, and reducing multicollinearity. The workflow is summarized in a flowchart:

[Figure: project workflow flowchart]

🚀 Models Built and Evaluated

1. Multiple Linear Regression (MLR)

Full Model: Includes all predictors to assess the overall feature impact.
Stepwise Model: Refined using stepwise selection to improve simplicity and interpretability.

2. Binary Logistic Regression (BLR)

Null Model: Serves as a baseline.
Full Model: Includes all predictors to explore all possible risk factors.
Stepwise Model: Adds preprocessing steps (removing near-zero-variance predictors and highly correlated features) for a leaner, more focused model.
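
The two preprocessing filters mentioned for the Stepwise BLR model can be illustrated with a simplified Python sketch. This is not the project's R code: the function names, the 0.95 and 0.9 thresholds, and the toy columns are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / sqrt(sxx * syy)


def drop_near_zero_variance(columns, max_dominant_share=0.95):
    """Keep columns whose most common value covers at most the given share of rows."""
    kept = {}
    for name, values in columns.items():
        dominant = max(values.count(v) for v in set(values))
        if dominant / len(values) <= max_dominant_share:
            kept[name] = values
    return kept


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a column only if |r| with every already-kept column is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical toy predictors (not the project's data).
data = {
    "age": [25, 40, 33, 51, 29, 46],
    "age_months": [300, 480, 396, 612, 348, 552],  # exact multiple of age
    "flag": [0, 0, 0, 0, 0, 0],                    # constant column
    "mvr_pts": [3, 0, 1, 0, 5, 2],
}
filtered = drop_correlated(drop_near_zero_variance(data))
print(sorted(filtered))  # ['age', 'mvr_pts']
```

The correlation filter is order-dependent: the first column of a correlated pair is kept and the later one dropped, which is one of several reasonable tie-breaking choices.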

📊 Key Results and Visualizations

Each model was evaluated on a set of important metrics to identify the best-performing approach.

Multiple Linear Regression (MLR) Metrics

MLR models were evaluated on Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared, Adjusted R-squared, and F-statistic:

MSE & RMSE: MSE is the average squared difference between actual and predicted values; RMSE is its square root, which expresses the typical error in the same units as the claim amount.
R-squared & Adjusted R-squared: R-squared is the proportion of variance in the target variable explained by the model. Adjusted R-squared penalizes additional predictors, so it does not rise automatically as predictors are added and is the fairer basis for comparing models of different sizes.
F-statistic: Tests the overall significance of the model against an intercept-only model; a larger value is stronger evidence that at least one predictor is related to the target.
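
As a rough illustration of how these metrics relate to one another, here is a minimal Python sketch (the project itself uses R). The function name and the toy data are hypothetical; the formulas for Adjusted R-squared and the F-statistic assume the predictions come from an ordinary least squares fit with an intercept, and MSE is divided by n rather than by the residual degrees of freedom.

```python
from math import sqrt


def regression_metrics(actual, predicted, n_predictors):
    """Return MSE, RMSE, R-squared, adjusted R-squared and the overall F-statistic."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    sst = sum((a - mean_y) ** 2 for a in actual)                # total sum of squares
    df_resid = n - n_predictors - 1                             # residual degrees of freedom
    mse = sse / n
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = (r2 / n_predictors) / ((1 - r2) / df_resid)
    return {"mse": mse, "rmse": sqrt(mse), "r2": r2, "adj_r2": adj_r2, "f": f_stat}


# Toy data: the predictions are the least-squares line y = 0.6 + 0.8 * x
# fitted to x = 1..5 and the actual values below.
actual = [1, 3, 2, 5, 4]
predicted = [1.4, 2.2, 3.0, 3.8, 4.6]
for name, value in regression_metrics(actual, predicted, n_predictors=1).items():
    print(f"{name}: {value:.4f}")
```

On this toy fit R-squared is 0.64 while Adjusted R-squared is 0.52, which shows how strongly the penalty for predictors bites when the sample is small.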

The Stepwise MLR model slightly outperformed the Full Model with a higher Adjusted R-squared and F-statistic, indicating a more parsimonious model with similar predictive power.

Binary Logistic Regression (BLR) Metrics

BLR models were assessed with Accuracy, Error Rate, Kappa, Precision, Sensitivity, Specificity, F1 Score, and AUC (Area Under the Curve):

Accuracy & Error Rate: Accuracy is the share of cases classified correctly; the error rate is its complement (1 - accuracy). Together they give a straightforward performance overview.
Kappa: Measures agreement between predicted and actual classes after adjusting for the agreement expected by chance, which makes it a fairer metric than accuracy on imbalanced datasets.
Precision & Sensitivity: Precision is the share of predicted crashes that were actual crashes; sensitivity (recall) is the share of actual crashes the model detected. Both are essential in risk prediction.
Specificity: The share of non-crash cases the model classified correctly.
F1 Score & AUC: F1 is the harmonic mean of precision and sensitivity, while AUC reflects the model's overall ability to discriminate between crash and non-crash cases across all classification thresholds.
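
The definitions above can be made concrete with a small Python sketch (again, not the project's R code). The function names, the 0.5 cutoff, and the toy labels and scores are hypothetical; the sketch assumes both classes appear among the actual and the predicted labels, so no denominator is zero. AUC is computed as the share of crash/non-crash pairs in which the crash case receives the higher score, with ties counted as half.

```python
def classification_metrics(actual, predicted):
    """Confusion-matrix metrics for binary labels (1 = crash, 0 = no crash)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    n = len(pairs)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy, "error_rate": 1 - accuracy, "kappa": kappa,
        "precision": precision, "sensitivity": sensitivity,
        "specificity": specificity, "f1": f1,
    }


def auc(actual, scores):
    """Share of (crash, non-crash) pairs ranked correctly; ties count as half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 crashes among 8 policies.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

for name, value in classification_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
print(f"auc: {auc(actual, scores):.4f}")
```

On this toy data accuracy is 0.75 but Kappa is only about 0.47, because a model that always predicts "no crash" would already be right for 5 of the 8 policies.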

The Stepwise BLR model achieved the best AUC, Kappa, and F1 scores, demonstrating balanced predictive power with reduced predictor redundancy.

By analyzing these metrics, the Stepwise models for both MLR and BLR were chosen for their ability to balance predictive power with simplicity. These models were then retrained on the full dataset to produce robust final models for prediction on unseen data.

📂 File Structure

Data: Source data files.
Resources: Supporting images and charts for reference.
Code: R scripts for data preparation, model building, and evaluation.
Final Predictions: Exported predictions on test data for easy access.

Happy analyzing! ✨

Repository: yinaS1234/Auto-Insurance-Regression
