FastLending is a peer-to-peer lending services company that wants to use machine learning to predict credit risk. Management believes this will provide a quicker and more reliable loan experience, and that identifying good loan candidates more accurately will lower default rates. The purpose of this project is to assist FastLending's lead data scientist in implementing this plan by building and evaluating several machine learning models that predict credit risk, using resampling and boosting techniques. Once the models are built, we evaluate their performance and make a written recommendation on whether they should be used to predict credit risk.
The code for the machine learning algorithms can be found in the Jupyter notebook files credit_risk_resampling and credit_risk_ensemble.
In the course of the project we developed the following machine learning models:
- Naive Random Oversampling;
- SMOTE Oversampling;
- Undersampling;
- SMOTEENN algorithm - Combination (Over and Under) Sampling;
- Ensemble Classifier - Balanced Random Forest;
- Ensemble Classifier - Easy Ensemble.
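To illustrate the idea behind the first technique in the list, here is a minimal standard-library sketch of naive random oversampling: minority-class rows are duplicated at random until every class is as large as the majority class. The function name `random_oversample`, the toy data, and the `low_risk` / `high_risk` labels are assumptions made for illustration; this is not the notebook code, which more likely relies on a library resampler.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=1):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_features, out_labels = list(features), list(labels)
    for cls, count in counts.items():
        rows = [x for x, y in zip(features, labels) if y == cls]
        # Sample with replacement to close the gap to the majority class size.
        for _ in range(target - count):
            out_features.append(rng.choice(rows))
            out_labels.append(cls)
    return out_features, out_labels


# Hypothetical toy data: 6 low-risk loans and 2 high-risk loans.
X = [[i] for i in range(8)]
y = ["low_risk"] * 6 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

In practice only the training split would be resampled, so that the test split keeps the original class balance.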
For each of these models, we split the data into training and testing datasets, then calculated the accuracy score, generated a confusion matrix, and produced an imbalanced classification report.
The results of our analysis are summarized in the table above. Based on our findings, the Easy Ensemble AdaBoost Classifier has the highest accuracy score, 93%, which means that the model predicts the correct class 93% of the time.
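As a rough illustration of what an accuracy score and a confusion matrix measure, the sketch below counts the four outcomes for a binary problem and derives accuracy from them. The helper names, the `high_risk` positive label, and the toy predictions are hypothetical and are not taken from the project data.

```python
def confusion_counts(y_true, y_pred, positive="high_risk"):
    """Count true/false positives and negatives for one positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of all predictions that matched the actual label."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Hypothetical labels for eight loans (not project results).
y_true = ["high_risk", "high_risk"] + ["low_risk"] * 6
y_pred = ["high_risk", "low_risk", "low_risk", "low_risk",
          "high_risk", "low_risk", "low_risk", "low_risk"]

counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(counts))
```

Here six of the eight toy predictions match the actual label, so the accuracy is 0.75, even though one of the two high-risk loans was missed.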
Overall, the regression models trail the ensemble classifiers, with accuracy scores of 66% or lower. Based on accuracy alone, the Easy Ensemble AdaBoost Classifier is preferable.
Additionally, the F1 score, which is the harmonic mean of precision and sensitivity (recall), is much higher for the ensemble classifiers. The Easy Ensemble AdaBoost Classifier in particular had the highest F1 score, 0.97, indicating the best balance between sensitivity and precision.
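For reference, F1 is conventionally defined as the harmonic mean of precision and sensitivity (recall), so it is high only when both are high. The short sketch below shows that behavior; the input values are made up for illustration and are not project results.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only (not taken from the project results).
balanced = f1_score(0.80, 0.80)   # both metrics moderately high
lopsided = f1_score(0.99, 0.10)   # very high precision, very low recall

print(round(balanced, 2), round(lopsided, 2))
```

The lopsided case scores far lower than the balanced one even though its precision is higher, which is why F1 is a useful complement to accuracy on imbalanced data such as loan defaults.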
To summarize, based on the results of our analysis, we recommend adopting the Easy Ensemble AdaBoost Classifier to predict credit risk.