Given a dataset of consumer credit profiles, this project aims to build and test several different types of machine learning models to analyze and predict credit risk. Six different models/sampling methods were employed, and then compared based on balanced accuracy score, precision score, and recall score.
- Results:
- Summary
Source Data: LoanStats_2019Q1.csv
Software: Visual Studio Code v1.64.2, Jupyter Notebook 6.4.6, Python 3.7.11, Conda 4.11.0
- Balanced Accuracy Score: 64.8%
- Precision Score (high risk / low risk): 1% / 100%
- Recall Score (high risk / low risk): 63% / 66%
- Balanced Accuracy Score: 62.4%
- Precision Score (high risk / low risk): 1% / 100%
- Recall Score (high risk / low risk): 62% / 63%
- Balanced Accuracy Score: 51.3%
- Precision Score (high risk / low risk): 1% / 100%
- Recall Score (high risk / low risk): 59% / 44%
- Balanced Accuracy Score: 62.2%
- Precision Score (high risk / low risk): 1% / 100%
- Recall Score (high risk / low risk): 70% / 54%
- Balanced Accuracy Score: 78.8%
- Precision Score (high risk / low risk): 4% / 100%
- Recall Score (high risk / low risk): 67% / 91%
- Balanced Accuracy Score: 92.5%
- Precision Score (high risk / low risk): 7% / 100%
- Recall Score (high risk / low risk): 91% / 94%
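
Balanced accuracy is the unweighted mean of the per-class recall values, so the figures above can be cross-checked by hand. The sketch below does this in plain Python using the rounded recall percentages from the list. The labels `model_1` through `model_6` are placeholders for the listing order, not names taken from the project, and small differences are expected because the recall values are rounded to whole percentages.

```python
# Reported values in listing order:
# (balanced accuracy %, high-risk recall %, low-risk recall %).
# The keys are placeholder labels, not the project's model names.
reported = {
    "model_1": (64.8, 63, 66),
    "model_2": (62.4, 62, 63),
    "model_3": (51.3, 59, 44),
    "model_4": (62.2, 70, 54),
    "model_5": (78.8, 67, 91),
    "model_6": (92.5, 91, 94),
}


def balanced_accuracy(recalls):
    """Return the unweighted mean of the per-class recall values."""
    return sum(recalls) / len(recalls)


# For each group, store the recomputed score and its gap to the reported one.
checks = {}
for name, (bal_acc, rec_high, rec_low) in reported.items():
    estimate = balanced_accuracy([rec_high, rec_low])
    checks[name] = (estimate, abs(estimate - bal_acc))
    print(f"{name}: reported {bal_acc:.1f}%, mean of recalls {estimate:.1f}%")
```

Every recomputed value should land within about half a percentage point of the reported score, which is the most that rounding the two recall values can account for.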
All six models show very weak precision in detecting high-risk credit (between 1% and 7%), though the ensemble classifiers showed much higher sensitivity in that category. With precision this low, our client is likely to see far more false positives than false negatives: most loans flagged as high risk are in fact low risk. While that means that the bank is very unlikely to end up financing an account that it shouldn't have, the number of business opportunities that it misses out on is far greater. Overall, the strongest model seems to be the EasyEnsemble AdaBoost classifier, which had slightly higher precision than the rest (7% for high_risk) and much higher sensitivity than most (91% for high_risk).
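
To see why precision for the high-risk class can stay so low even when recall is high, it helps to work through a confusion matrix. In the sketch below, the class counts (50 high-risk and 10,000 low-risk loans) are hypothetical, chosen only to illustrate heavy class imbalance; they are not taken from LoanStats_2019Q1.csv. The recall values are the rounded 91% / 94% reported for the strongest model.

```python
# Hypothetical class counts, chosen only to illustrate heavy imbalance.
# They are NOT taken from the project's dataset.
n_high_risk = 50
n_low_risk = 10_000

# Rounded recall values reported for the strongest model.
recall_high = 0.91
recall_low = 0.94

# Treat high risk as the positive class and fill in the confusion matrix.
tp = recall_high * n_high_risk  # high-risk loans correctly flagged
fn = n_high_risk - tp           # high-risk loans missed
tn = recall_low * n_low_risk    # low-risk loans correctly approved
fp = n_low_risk - tn            # low-risk loans wrongly flagged as high risk

# Precision: of everything predicted as a class, the share that truly is.
precision_high = tp / (tp + fp)
precision_low = tn / (tn + fn)

print(f"false positives: {fp:.1f}, false negatives: {fn:.1f}")
print(f"high-risk precision: {precision_high:.1%}")
print(f"low-risk precision: {precision_low:.1%}")
```

Under these assumed counts, loans wrongly flagged as high risk (about 600) far outnumber high-risk loans that slip through (about 5), so high-risk precision comes out near 7% while low-risk precision rounds to 100%.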
On the whole, I wouldn't recommend that our client use any of these models until they can be better trained, and I believe that effort should be focused on the EasyEnsemble AdaBoost classifier, since it has produced the most promising results thus far.