Use GLM to predict insurance claims. Frequency Modelling, Severity Modelling, PurePremium Modelling, Xgboost Tweedie Regression, pygam linear modelling.

bhishanpdl/Project_French_Motor_Claims

Project Introduction

Project : French Motor Claims
Author : Bhishan Poudel, Ph.D Physics
Goal : Implement frequency modelling, severity modelling, and pure premium modelling
Tools : pandas, scikit-learn, xgboost, pygam
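
The three modelling targets can be derived from per-policy claim counts, claim amounts and exposure. The snippet below is a minimal sketch on a hypothetical four-row table. The column names `Frequency`, `AvgClaimAmount`, `PurePremium`, `ClaimNb` and `Exposure` match those used in the Results table, while `ClaimAmount` and the toy values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy policies; the values are illustrative only.
df = pd.DataFrame({
    "ClaimNb": [0, 1, 2, 0],                     # number of claims
    "ClaimAmount": [0.0, 1200.0, 3000.0, 0.0],   # total claim amount (assumed column)
    "Exposure": [1.0, 0.5, 1.0, 0.25],           # insured time in years
})

# Frequency: claims per unit of exposure (Poisson target).
df["Frequency"] = df["ClaimNb"] / df["Exposure"]

# Severity: average amount per claim (Gamma target); set to 0 where there are no claims.
df["AvgClaimAmount"] = (df["ClaimAmount"] / df["ClaimNb"]).where(df["ClaimNb"] > 0, 0.0)

# Pure premium: claim amount per unit of exposure (Tweedie target).
df["PurePremium"] = df["ClaimAmount"] / df["Exposure"]

print(df)
```

By construction, `PurePremium` equals `Frequency * AvgClaimAmount`, which is why the pure premium can be modelled either directly or as the product of a frequency model and a severity model.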

Project Notebooks

| Notebook | Rendered | Description | Author |
|---|---|---|---|
| a01_data_cleaning.ipynb | ipynb, rendered | one-hot encoding, k-bins discretization, log scaling | Bhishan Poudel |
| b01_freq_modelling.ipynb | ipynb, rendered | Poisson | Bhishan Poudel |
| b02_severity_modelling.ipynb | ipynb, rendered | Gamma | Bhishan Poudel |
| b03_pure_premium_modelling.ipynb | ipynb, rendered | Poisson * Gamma and Tweedie | Bhishan Poudel |
| b04_tweedie_vs_freqSev.ipynb | ipynb, rendered | comparison | Bhishan Poudel |
| b05_lorentz_curves_comparison.ipynb | ipynb, rendered | Lorenz curve | Bhishan Poudel |
| c01_xgboost_tweedie.ipynb | ipynb, rendered | `'objective':'reg:tweedie'` | Bhishan Poudel |
| d01_gam_linear.ipynb | ipynb, rendered | `n_splines=10`, grid_search | Bhishan Poudel |
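
For readers unfamiliar with the Lorenz curve comparison in `b05_lorentz_curves_comparison.ipynb`, the idea can be sketched with NumPy alone. Policies are sorted from lowest to highest predicted risk, and the cumulative share of claims is plotted against the cumulative share of exposure. The function names `lorenz_curve` and `gini_index` and the toy arrays are hypothetical, and the notebook's own implementation may differ.

```python
import numpy as np

def lorenz_curve(y_true, y_pred, exposure):
    """Cumulative share of claims vs. cumulative share of exposure,
    with policies ordered from lowest to highest predicted risk."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    order = np.argsort(y_pred, kind="stable")
    claims = y_true[order] * exposure[order]  # claim amount per policy
    cum_exposure = np.concatenate([[0.0], np.cumsum(exposure[order])])
    cum_claims = np.concatenate([[0.0], np.cumsum(claims)])
    return cum_exposure / cum_exposure[-1], cum_claims / cum_claims[-1]

def gini_index(x, y):
    """1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    return 1.0 - 2.0 * area

# Toy example: the model ranks the single claimant as the riskiest policy.
pure_premium = np.array([0.0, 0.0, 0.0, 100.0])
predicted = np.array([1.0, 2.0, 3.0, 4.0])
exposure = np.ones(4)

x, y = lorenz_curve(pure_premium, predicted, exposure)
gini = gini_index(x, y)
print(gini)
```

A model whose ranking is no better than random gives a curve close to the diagonal and a Gini index near 0. A curve that bends further below the diagonal means the model separates low-risk from high-risk policies better.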

Data

Data Cleaning

The following features are selected for modelling, grouped by the preprocessing applied to each:

one-hot encoding = ["VehBrand", "VehPower", "VehGas", "Region", "Area"]
k-bins discretizer = ["VehAge", "DrivAge"]
log transform and scaling = ["Density"]
pass-through = ["BonusMalus"]
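
The feature groups above map naturally onto a scikit-learn `ColumnTransformer`. The following is a minimal sketch on synthetic data, not the notebook's exact code. The number of bins, the category values and the random toy table are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (
    FunctionTransformer,
    KBinsDiscretizer,
    OneHotEncoder,
    StandardScaler,
)

cols_ohe = ["VehBrand", "VehPower", "VehGas", "Region", "Area"]
cols_kbin = ["VehAge", "DrivAge"]
cols_log = ["Density"]
cols_pass = ["BonusMalus"]

preprocessor = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), cols_ohe),
        # n_bins=5 is an assumed value, chosen small for the toy data.
        ("kbins", KBinsDiscretizer(n_bins=5, encode="onehot", strategy="quantile"), cols_kbin),
        # Log-transform first, then standardize.
        ("log_scaled", make_pipeline(FunctionTransformer(np.log), StandardScaler()), cols_log),
        ("passthrough", "passthrough", cols_pass),
    ],
    remainder="drop",
)

# Synthetic stand-in for the cleaned policy table (values are made up).
rng = np.random.default_rng(0)
n = 200
df_toy = pd.DataFrame({
    "VehBrand": rng.choice(["B1", "B2", "B12"], n),
    "VehPower": rng.integers(4, 8, n),
    "VehGas": rng.choice(["Regular", "Diesel"], n),
    "Region": rng.choice(["R24", "R82"], n),
    "Area": rng.choice(["A", "B", "C"], n),
    "VehAge": rng.integers(0, 20, n),
    "DrivAge": rng.integers(18, 80, n),
    "Density": rng.integers(1, 27000, n),
    "BonusMalus": rng.integers(50, 120, n),
})

X = preprocessor.fit_transform(df_toy)
print(X.shape)
```

Keeping all steps in one transformer means the same fitted encodings, bin edges and scaling parameters are reused on the test split.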

Results

| Module | Distribution | y_train | sample_weight | train D2 | test D2 | train MAE | test MAE | train MSE | test MSE |
|---|---|---|---|---|---|---|---|---|---|
| sklearn | Frequency Modelling (Poisson Distribution) | `df_train['Frequency']` | `df_train['Exposure']` | 0.051384 | 0.048138 | 0.232085 | 0.224547 | 4.738399 | 2.407906 |
| sklearn | Severity Modelling (Gamma Distribution) | `df_train.loc[mask_train, 'AvgClaimAmount']` | `df_train.loc[mask_train, 'ClaimNb']` | 3.638157e-03 | -4.747382e-04 | 1.859814e+03 | 1.856312e+03 | 4.959565e+06 | |
| sklearn | Pure Premium Modelling (TweedieRegressor) | `df_train['PurePremium']` | `df_train['Exposure']` | 2.018645e-02 | 1.353285e-02 | 6.580440e+02 | 4.927505e+02 | 1.478259e+09 | 1.622053e+08 |
| xgboost | Xgboost Tweedie Regression | `dtrain.set_base_margin(np.log(df_train['Exposure'].to_numpy()))` | `dtest.set_base_margin(np.log(df_test['Exposure'].to_numpy()))` | - | - | 1.760538e+03 | 1.588351e+03 | 1.481952e+09 | 1.659363e+08 |
| pygam | GAM Linear Model | `df_train["AvgClaimAmount"].values` | N/A | - | - | 1.686438e+02 | 1.655408e+02 | 1.785332e+06 | 1.647533e+06 |
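
As an illustration of how the D2, MAE and MSE columns can be produced for the frequency model, here is a minimal sketch using scikit-learn's `PoissonRegressor` with exposure as the sample weight. The data is synthetic, and the `report` helper, the `alpha` value and the choice to weight MAE and MSE by exposure are assumptions. The numbers it prints are unrelated to the table above.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic policies: claim counts follow a Poisson law scaled by exposure.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
exposure = rng.uniform(0.1, 1.0, size=n)
true_rate = np.exp(-2.0 + 0.5 * X[:, 0])  # claims per unit of exposure
claim_nb = rng.poisson(true_rate * exposure)
frequency = claim_nb / exposure

X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(
    X, frequency, exposure, random_state=0
)

# Frequency target with exposure as sample weight, as in the first table row.
glm = PoissonRegressor(alpha=1e-4, max_iter=300)
glm.fit(X_tr, y_tr, sample_weight=w_tr)

def report(model, X, y, w):
    """Hypothetical helper: D2 (fraction of deviance explained), MAE and MSE."""
    pred = model.predict(X)
    return {
        "D2": model.score(X, y, sample_weight=w),
        "MAE": mean_absolute_error(y, pred, sample_weight=w),
        "MSE": mean_squared_error(y, pred, sample_weight=w),
    }

scores = {
    "train": report(glm, X_tr, y_tr, w_tr),
    "test": report(glm, X_te, y_te, w_te),
}
print(scores)
```

For scikit-learn GLMs, `score` returns D2 computed with the model's own deviance, so low values such as those in the table are common for sparse claim data, where most policies have zero claims.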
