Assist the client in predicting credit risk using a variety of resampling techniques and machine learning algorithms.
Tools/Programs/Languages used:
- imbalanced-learn library
- scikit-learn library
- ETL
- Machine Learning
I oversampled the data with the RandomOverSampler and SMOTE algorithms, using a credit card credit dataset from LendingClub. The ClusterCentroids algorithm was used to undersample the data, and the SMOTEENN algorithm was used for combined over- and undersampling. I then used two machine learning models, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk.
- Accuracy score of Random Oversampling using RandomOverSampler
  - Accuracy score was 0.657, which means that the model was correct 65.7% of the time.
- Accuracy score of SMOTE Oversampling using SMOTE
  - Accuracy score was 0.662, which means that the model was correct 66.2% of the time.
- Accuracy score of Undersampling using ClusterCentroids
  - Accuracy score was 0.544, which means that the model was correct 54.4% of the time.
- Accuracy score of Combination (Over and Under) Sampling using SMOTEENN
  - Accuracy score was 0.644, which means that the model was correct 64.4% of the time.
- Accuracy score of ML model using BalancedRandomForestClassifier
  - Accuracy score was 0.778, which means that the model was correct 77.8% of the time.
- Accuracy score of ML model using EasyEnsembleClassifier
  - Accuracy score was 0.920, which means that the model was correct 92.0% of the time.
- Overall, most of the models had middling accuracy scores, ranging from 0.544 to 0.778. The clear winner was the ML model using EasyEnsembleClassifier.
  - Its accuracy score of 0.920 was much higher than that of any other model.