Apply machine learning to predict credit card risk using imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling.

Cryptotwister/Credit_Risk_Analysis

Credit_Risk_Analysis

Overview of the analysis:

FastLending is a peer-to-peer lending services company that wants to use machine learning to predict credit risk. Management believes this will provide a quicker and more reliable loan experience, and that machine learning will identify good loan candidates more accurately, which should lower default rates. The purpose of this project is to assist FastLending's lead data scientist in implementing this plan by building and evaluating several machine learning models to predict credit risk. The techniques used include resampling and boosting. Once the models are built, we evaluate their performance and make a written recommendation on whether they should be used to predict credit risk.

Results:

The code for the machine learning algorithms can be found in the Jupyter notebook files credit_risk_resampling and credit_risk_ensemble.

In the course of the project we developed the following machine learning models:

  • Naive Random Oversampling;
  • SMOTE Oversampling;
  • Undersampling;
  • SMOTEENN algorithm - Combination (Over and Under) Sampling;
  • Ensemble Classifier - Balanced Random Forest;
  • Ensemble Classifier - Easy Ensemble.

For each of these models, we split the data into training and testing datasets, calculated the accuracy score, and generated a confusion matrix and an imbalanced classification report.
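To make the evaluation step concrete, here is a minimal plain-Python sketch of a binary confusion matrix and two accuracy measures. It is an illustration only, not the notebooks' code (which, per the repository description, relies on scikit-learn and imbalanced-learn). The class labels `high_risk` and `low_risk` and the toy predictions are invented for the example.

```python
def confusion_counts(y_true, y_pred, positive="high_risk"):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of the per-class recalls; less flattering on imbalanced data."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Invented toy labels: 2 high-risk loans, 8 low-risk loans.
y_true = ["high_risk"] * 2 + ["low_risk"] * 8
y_pred = ["high_risk", "low_risk"] + ["low_risk"] * 7 + ["high_risk"]

counts = confusion_counts(y_true, y_pred)
plain_acc = accuracy(*counts)
balanced_acc = balanced_accuracy(*counts)
```

On imbalanced data the two measures can diverge sharply, which is why a plain accuracy figure is usually read alongside the confusion matrix.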

1. Naive Random Oversampling

Naive Random Oversampling
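The idea behind naive random oversampling can be sketched in a few lines of plain Python: rows of the smaller class are duplicated at random until every class has as many rows as the largest one. This is a hedged illustration of the technique, not the notebook's implementation; the function name `random_oversample`, the labels, and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class up to the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_res.append(rng.choice(rows))  # sampling with replacement
            y_res.append(label)
    return X_res, y_res


# Invented toy data: five low-risk rows, one high-risk row.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = ["low_risk"] * 5 + ["high_risk"]

X_res, y_res = random_oversample(X, y)
oversampled_counts = Counter(y_res)
```

Only the training split should be resampled; the test split keeps its original class balance so that the scores reflect real-world proportions.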

2. SMOTE Oversampling

SMOTE Oversampling
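Unlike naive oversampling, SMOTE creates new minority-class points by interpolating between an existing minority point and one of its nearest minority neighbours. The sketch below shows only that interpolation step in plain Python, under simplifying assumptions: the function name `smote_sample`, its parameters, and the toy points are hypothetical, and a library implementation also decides how many points to generate per class.

```python
import math
import random


def smote_sample(minority, k=2, n_new=3, seed=0):
    """Create n_new synthetic points between minority points and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Invented minority-class points in two dimensions.
minority_points = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
new_points = smote_sample(minority_points)
```

Because each synthetic point lies on a segment between two real minority points, it always falls inside the region those points already occupy.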

3. Undersampling

Undersampling
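The README does not say which undersampling method the notebook uses, so the sketch below assumes the simplest variant, random undersampling: rows of the larger class are dropped at random until every class matches the size of the smallest one. The function name `random_undersample`, the labels, and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_res, y_res = [], []
    for label in counts:
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for row in rng.sample(rows, target):  # sampling without replacement
            X_res.append(row)
            y_res.append(label)
    return X_res, y_res


# Invented toy data: six low-risk rows, two high-risk rows.
X_toy = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y_toy = ["low_risk"] * 6 + ["high_risk"] * 2

X_under, y_under = random_undersample(X_toy, y_toy)
undersampled_counts = Counter(y_under)
```

The trade-off is that undersampling discards real observations, which can hurt a model when the minority class is very small.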

4. SMOTEENN algorithm - Combination (Over and Under) Sampling

SMOTEENN - combination
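SMOTEENN first oversamples with SMOTE and then cleans the result with Edited Nearest Neighbours (ENN). The plain-Python sketch below illustrates only the cleaning step, using the classic majority-vote rule: a point is dropped when most of its k nearest neighbours carry a different label. The function name, the value of k, and the toy data are assumptions made for illustration, and library implementations offer other selection rules.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Drop every point whose label disagrees with the majority of its k neighbours."""
    keep_X, keep_y = [], []
    for i, (point, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(point, X[j]))[:k]
        majority_label = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority_label == label:
            keep_X.append(point)
            keep_y.append(label)
    return keep_X, keep_y


# Invented toy data: two clean clusters plus one high-risk point
# sitting in the middle of the low-risk cluster.
X_noisy = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3], [0.15]]
y_noisy = ["low_risk"] * 4 + ["high_risk"] * 4 + ["high_risk"]

X_clean, y_clean = edited_nearest_neighbours(X_noisy, y_noisy)
```

In this toy run the stray point at 0.15 is removed while both clusters survive intact, which is the kind of boundary cleanup the combination method aims for.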

5. Ensemble Classifier - Balanced Random Forest

Balanced Random Forest Classifier

6. Ensemble Classifier - Easy Ensemble

Easy Ensemble AdaBoost Classifier

Summary:

ML Models Summary

The results of our analysis are summarized in the table above. Based on our findings, the Easy Ensemble AdaBoost Classifier has the highest accuracy score, 93%, meaning that it predicts the correct class 93% of the time.

Overall, the regression models trail their ensemble classifier counterparts, with accuracy scores of 66% and lower. Based on accuracy alone, the Easy Ensemble AdaBoost Classifier is preferable.

Additionally, the F1 score, which is the harmonic mean of sensitivity (recall) and precision, is much higher for the classifier models. The Easy Ensemble AdaBoost Classifier in particular had the highest F1 score, 0.97, indicating the best combined sensitivity and precision among the models tested.
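For reference, the F1 score is the harmonic mean of precision and sensitivity (recall), so it is high only when both are high. The short sketch below computes it from confusion-matrix counts; the counts are invented for illustration and are not the project's results.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were found (sensitivity)."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Invented counts: 30 true positives, 10 false positives, 10 false negatives.
p_even = precision(tp=30, fp=10)   # 0.75
r_even = recall(tp=30, fn=10)      # 0.75
f1_even = f1_score(p_even, r_even)

# Perfect precision but only half of the positives found.
f1_lopsided = f1_score(1.0, 0.5)
```

With equal precision and recall the F1 score equals both; when they diverge, the harmonic mean is pulled toward the lower of the two, so `f1_lopsided` comes out near 0.67 rather than the arithmetic mean of 0.75.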

To summarize, based on the results of our analysis, we recommend the Easy Ensemble AdaBoost Classifier machine learning model be adopted for the purpose of predicting credit risk.
