- Comparing different sampling techniques in combination with classification models to determine which combinations best predict credit risk. The data is derived from a sample set, which can be accessed here: (see Resources/.LoanStats_2019Q1.csv.icloud)
- As credit risk is an inherently imbalanced classification problem (i.e., good loans drastically outnumber bad ones), more sophisticated sampling techniques are needed to obtain reliable predictions.
The first four models paired logistic regression with a different sampling technique each. The models and their respective statistics are as follows (classification-report images can be referenced below each model, derived from Analysis/credit_risk_resampling.ipynb and Analysis/credit_risk_ensemble.ipynb respectively):
- Model 1:
  - accuracy: 0.64
  - high-risk precision: 0.01
  - high-risk recall: 0.66
- Model 2:
  - accuracy: 0.65
  - high-risk precision: 0.01
  - high-risk recall: 0.61
- Model 3:
  - accuracy: 0.54
  - high-risk precision: 0.01
  - high-risk recall: 0.69
- Model 4:
  - accuracy: 0.64
  - high-risk precision: 0.01
  - high-risk recall: 0.71
The final two models leverage tree-based ensembles, aiming to take advantage of their robustness to overfitting and to reduce bias in the predictions:
- Model 5:
  - accuracy: 0.79
  - high-risk precision: 0.03
  - high-risk recall: 0.70
- Model 6 (Easy Ensemble AdaBoost Classifier):
  - accuracy: 0.93
  - high-risk precision: 0.09
  - high-risk recall: 0.92
The first four models focused on different sampling techniques combined with logistic regression to account for the inherent class imbalance in credit risk categorization, while the final two used tree-based models to reduce bias. Oversampling techniques scale up the under-represented class in the training data, while undersampling techniques take the opposite approach, reducing the scale of the over-represented class.
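As a minimal, dependency-free sketch of the two sampling ideas (the actual notebooks presumably use a resampling library, which is an assumption here, and the function names below are illustrative only):

```python
import random

def random_oversample(X, y, minority_label, seed=0):
    """Duplicate minority-class rows at random until the classes are balanced."""
    rng = random.Random(seed)
    minority = [(x, lab) for x, lab in zip(X, y) if lab == minority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab != minority_label]
    resampled = list(minority)
    while len(resampled) < len(majority):
        resampled.append(rng.choice(minority))
    combined = majority + resampled
    rng.shuffle(combined)
    xs, ys = zip(*combined)
    return list(xs), list(ys)

def random_undersample(X, y, majority_label, seed=0):
    """Randomly discard majority-class rows until the classes are balanced."""
    rng = random.Random(seed)
    majority = [(x, lab) for x, lab in zip(X, y) if lab == majority_label]
    minority = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    kept = rng.sample(majority, len(minority))
    combined = minority + kept
    rng.shuffle(combined)
    xs, ys = zip(*combined)
    return list(xs), list(ys)

# Toy data: 6 low-risk rows and 2 high-risk rows
X = [[i] for i in range(8)]
y = ["low"] * 6 + ["high"] * 2

Xo, yo = random_oversample(X, y, "high")
Xu, yu = random_undersample(X, y, "low")
print(yo.count("high"), yo.count("low"))  # 6 6
print(yu.count("high"), yu.count("low"))  # 2 2
```

Either way, the classifier then trains on a balanced set, so it cannot minimize its loss simply by predicting "low risk" for everything.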
Given the above statistics, the metric of greatest importance in this circumstance is the recall (sensitivity) score. Although precision and accuracy also matter, their values are not necessarily indicative of the model's reliability in correctly judging all high-risk circumstances. To elaborate, a high precision score means that the model is right most of the time about the cases it flags. However, given that credit risk is inherently imbalanced (i.e., there are far more low-risk circumstances than high-risk ones), high accuracy could simply mean that the model detects each low-risk instance correctly but never detects a high-risk one. In addition, a good high-risk precision score only indicates reliability on the instances already flagged as high risk; it doesn't account for false negatives (i.e., instances labelled low risk when they really are high).
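A toy calculation with hypothetical counts (not taken from the notebooks) makes the precision/recall distinction concrete:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical imbalanced test set: 100 high-risk loans among thousands.
# Model A flags only one sure case as high risk: perfect precision,
# but it misses 99% of the truly high-risk loans.
p_a, r_a = precision_recall(tp=1, fp=0, fn=99)     # precision 1.00, recall 0.01
# Model B casts a wide net, catching 92 of the 100 high-risk loans
# at the cost of 900 false alarms.
p_b, r_b = precision_recall(tp=92, fp=900, fn=8)   # precision ~0.09, recall 0.92
```

Model A looks flawless by precision yet is useless for flagging risk, while Model B, despite its low precision, is the one a lender would actually want.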
Thus, sensitivity is the best metric when dealing with credit risk: the aim is to find a model that can catch all, or almost all, of the high-risk circumstances that exist, even at the cost of a high false-positive rate. Given the above stats, the Easy Ensemble AdaBoost Classifier yields the highest recall at 0.92, meaning that 92% of high-risk cases were actually categorized as high risk. Although its high-risk precision is quite low (0.09), this simply means there is a large rate of false positives (i.e., many circumstances were labelled high risk when they were actually low). This is not ideal, but the ramifications of a false high-risk categorization are much less detrimental to a peer-to-peer lending services company than a false low-risk categorization. Furthermore, even with an objectively low precision score, the Easy Ensemble model still scored highest on every metric compared with all other models.
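The core idea behind the Easy Ensemble approach is to train each base learner on all of the minority rows plus a fresh random undersample of the majority class, then combine the learners' votes. A dependency-free sketch of the subset construction (function name and toy data are illustrative, not from the notebooks):

```python
import random

def balanced_subsets(X, y, minority_label, n_subsets=10, seed=0):
    """Easy Ensemble idea: build one balanced training set per base learner,
    pairing every minority row with a fresh random undersample of the majority."""
    rng = random.Random(seed)
    minority = [(x, lab) for x, lab in zip(X, y) if lab == minority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab != minority_label]
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority, len(minority))
        subsets.append(minority + sampled)
    return subsets

# Toy data: 3 high-risk rows among 20
X = [[i] for i in range(20)]
y = ["high"] * 3 + ["low"] * 17

subsets = balanced_subsets(X, y, "high", n_subsets=5)
counts = [[lab for _, lab in s].count("low") for s in subsets]
print(counts)  # [3, 3, 3, 3, 3]
```

Because each subset draws a different slice of the majority class, the ensemble sees far more of the low-risk data than a single undersampled model would, which is one intuition for why it scores well on both recall and accuracy.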