TalhaMemisoglu/Claim_Prediction

Car Insurance Claim Prediction with Machine Learning

This project focuses on building a machine learning pipeline to predict whether a customer will file a car insurance claim based on policy, demographic, and vehicle-related features.

The motivation behind this project is to assist insurance companies in:

  • Risk Assessment: Identifying high-risk customers more accurately.
  • Operational Efficiency: Reducing manual effort in claim risk analysis.
  • Customer Retention: Offering personalized services and pricing strategies.

Dataset

Source: Kaggle Car Insurance Claim Dataset
Size: ~40,000 policyholder records
Target: Claim Status (1 = Claim, 0 = No Claim)
Features include: Policy tenure, age of car, vehicle type, population density, premium amount, income group, region type, etc.

Methodology

1. Exploratory Data Analysis (EDA)

  • Data cleaning (missing values, duplicates, outliers)
  • Feature distributions and correlations
  • Class imbalance detection
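The EDA checks above can be sketched with pandas as follows. This is a minimal illustration, not the project's notebook code. The function name `eda_summary`, the column names (`policy_tenure`, `age_of_car`, `claim_status`), and the 1.5 × IQR outlier rule are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def eda_summary(df: pd.DataFrame, target: str) -> dict:
    """Basic data-quality, correlation and imbalance checks (hypothetical helper)."""
    # Missing values per column, keeping only columns with at least one gap.
    missing = df.isna().sum()
    missing = missing[missing > 0]
    # Number of fully duplicated rows.
    n_duplicates = int(df.duplicated().sum())
    # Share of each target class, e.g. {0: 0.83, 1: 0.17}.
    class_share = df[target].value_counts(normalize=True).sort_index()
    # Numeric features only, excluding the target itself.
    numeric = df.select_dtypes(include="number").drop(columns=[target])
    # Pearson correlation of each numeric feature with the target.
    target_corr = numeric.corrwith(df[target])
    # Outliers under the 1.5 * IQR rule (one common convention, assumed here).
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outliers = ((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).sum()
    return {
        "missing": missing.to_dict(),
        "n_duplicates": n_duplicates,
        "class_share": class_share.to_dict(),
        "target_corr": target_corr.to_dict(),
        "outliers": outliers.to_dict(),
    }


# Toy frame with hypothetical column names: one gap, one duplicate row,
# one extreme policy_tenure value and a 5:1 class imbalance.
example = pd.DataFrame({
    "policy_tenure": [0.5, 1.2, 1.2, 0.8, np.nan, 30.0],
    "age_of_car": [0.1, 0.3, 0.3, 0.2, 0.4, 0.5],
    "claim_status": [0, 0, 0, 0, 0, 1],
})
print(eda_summary(example, "claim_status"))
```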

2. Feature Engineering

  • Encoding categorical features
  • Scaling numerical features
  • Handling class imbalance with oversampling (SMOTE), random undersampling, and class weights
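A rough sketch of these feature-engineering steps using scikit-learn is shown below. The column lists and the helper names (`build_preprocessor`, `random_undersample`, `balanced_weights`) are hypothetical. SMOTE itself lives in the separate imbalanced-learn package (`imblearn.over_sampling.SMOTE`, applied via `fit_resample`) and is left out here to keep the sketch dependency-light; random undersampling is written by hand instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical column names; adapt them to the actual dataset schema.
NUMERIC = ["policy_tenure", "age_of_car", "population_density"]
CATEGORICAL = ["segment", "fuel_type"]


def build_preprocessor() -> ColumnTransformer:
    """Standardize numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        # Categories unseen during fit are encoded as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])


def random_undersample(X: pd.DataFrame, y: pd.Series, seed: int = 0):
    """Randomly drop rows so that every class keeps as many rows as the rarest one."""
    rng = np.random.default_rng(seed)
    counts = y.value_counts()
    n_min = counts.min()
    keep = []
    for label in counts.index:
        idx = np.flatnonzero(y.to_numpy() == label)
        keep.extend(rng.choice(idx, size=n_min, replace=False))
    keep = np.sort(np.array(keep))
    return X.iloc[keep], y.iloc[keep]


def balanced_weights(y: pd.Series) -> dict:
    """'balanced' weights: n_samples / (n_classes * count(class))."""
    classes = np.unique(y)
    weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
    return dict(zip(classes, weights))


# Toy data: 10 rows, 8 "no claim" and 2 "claim" labels.
toy = pd.DataFrame({
    "policy_tenure": np.linspace(0.1, 1.0, 10),
    "age_of_car": np.linspace(0.0, 0.9, 10),
    "population_density": np.arange(10) * 1000.0,
    "segment": ["A", "B"] * 5,
    "fuel_type": ["Petrol", "Diesel", "CNG", "Petrol", "Diesel"] * 2,
})
labels = pd.Series([0] * 8 + [1] * 2)

features = build_preprocessor().fit_transform(toy)  # 3 scaled + 2 + 3 one-hot columns
X_bal, y_bal = random_undersample(toy, labels)      # 2 rows per class
print(features.shape, y_bal.value_counts().to_dict(), balanced_weights(labels))
```

In a real pipeline the preprocessor should be fitted on the training split only, and resampling (SMOTE or undersampling) applied only to training data, so that the test set keeps its natural class ratio and no information leaks from it.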

3. Model Training

Trained and compared multiple models:

  • Random Forest
  • Logistic Regression
  • CatBoost
  • XGBoost
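One way to compare such models fairly is stratified cross-validation with a macro-F1 score, sketched below on synthetic data. This is an assumed setup, not the repository's training script; the 94% / 6% class split is chosen only for illustration. `catboost.CatBoostClassifier` and `xgboost.XGBClassifier` follow the same scikit-learn `fit`/`predict` interface and could be added to the `models` dictionary; they are omitted so the sketch runs with scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def compare_models(X, y, seed: int = 42) -> dict:
    """Return the mean cross-validated macro F1 for each candidate model."""
    models = {
        # class_weight="balanced" reweights classes inversely to their frequency.
        "random_forest": RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=seed
        ),
        "logistic_regression": LogisticRegression(
            class_weight="balanced", max_iter=1000
        ),
    }
    # Stratified folds keep the claim / no-claim ratio similar in every fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        fold_scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
        scores[name] = float(fold_scores.mean())
    return scores


# Synthetic stand-in for the claims data: roughly 94% "no claim", 6% "claim".
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.94], random_state=0
)
scores = compare_models(X, y)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: macro F1 = {score:.3f}")
```

With real tabular data, the preprocessing step would be wrapped together with each model in a `sklearn.pipeline.Pipeline` so that scaling and encoding are re-fitted inside every fold.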

4. Model Evaluation

  • Metrics: Accuracy, Precision, Recall, F1-score
  • Compared models on the same test set
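The listed metrics can be computed with `sklearn.metrics`, as in the small sketch below. The helper `evaluate` and the toy labels are illustrative. The example shows why accuracy alone is misleading on imbalanced claim data: a classifier that always predicts "no claim" reaches the same 0.8 accuracy as one that catches half of the claims, but its macro F1 drops to about 0.44 versus 0.6875.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def evaluate(y_true, y_pred) -> dict:
    """Accuracy plus macro-averaged precision, recall and F1."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # average="macro" takes the unweighted mean over classes, so the
        # minority "claim" class counts as much as the majority class.
        # zero_division=0 scores undefined ratios (no positive predictions) as 0.
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }


# Toy test set: 8 "no claim" (0) and 2 "claim" (1) labels.
y_test = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # catches one of the two claims
baseline_pred = [0] * 10                     # always predicts "no claim"

for name, pred in [("model", model_pred), ("always_no_claim", baseline_pred)]:
    print(name, evaluate(y_test, pred))
```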

Results

  • Best-performing model: Random Forest with an adjusted class-weight parameter
  • Macro-averaged F1-score: 51%
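For reference, the standard definition of the macro-averaged F1-score over the two classes (claim and no claim) is given below, where P_c and R_c are the precision and recall computed with class c treated as the positive class. Because each class contributes equally to the mean, the reported score weights the rare claim class as heavily as the common no-claim class.

```latex
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2\,P_c R_c}{P_c + R_c}

\text{macro-}F1 = \frac{1}{|C|} \sum_{c \in C} F1_c, \qquad C = \{0, 1\}
```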