Credit_Risk_Analysis

Overview of the analysis:

FastLending is a peer-to-peer lending services company that wants to use machine learning to predict credit risk. Management believes this will provide a quicker and more reliable loan experience, and that more accurate identification of good loan candidates will lead to lower default rates. The purpose of this project is to assist FastLending's lead data scientist in implementing this plan by building and evaluating several machine learning models to predict credit risk. The techniques used include resampling and boosting. Once the models are built, we evaluate their performance and make a written recommendation on whether they should be used to predict credit risk.

Results:

The code for the machine learning algorithms can be found in the Jupyter notebook files credit_risk_resampling.ipynb and credit_risk_ensemble.ipynb.

In the course of the project we developed the following machine learning models:

Naive Random Oversampling;
SMOTE Oversampling;
Undersampling;
SMOTEENN algorithm - Combination (Over and Under) Sampling;
Ensemble Classifier - Balanced Random Forest;
Ensemble Classifier - Easy Ensemble.
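The idea behind the first and third entries in the list above (naive random oversampling and undersampling) can be sketched in a few lines of plain Python. This is only an illustration using the standard library, not the code from the notebooks; the function names, the toy rows, and the "low_risk" / "high_risk" labels are assumptions made for the example.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=1):
    """Duplicate randomly chosen rows until every class is as large as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=1):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy imbalanced data (hypothetical): 8 low-risk loans, 2 high-risk loans.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

Oversampling keeps every original row and repeats minority rows, while undersampling discards majority rows. In both cases only the training split should be resampled, so the test set keeps the original class balance.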

For each of these models, we split the data into training and testing datasets, then calculated accuracy scores and generated confusion matrices and imbalanced classification reports.
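As a rough sketch of what these evaluation steps measure, the confusion matrix counts and accuracy can be computed by hand as follows. The labels and predictions are made-up toy values, not project results, and the helper names are hypothetical. Balanced accuracy is included because plain accuracy can look good on imbalanced data; whether the notebooks report plain or balanced accuracy is not stated here, so treat that part as an assumption.

```python
def confusion_counts(y_true, y_pred, positive="high_risk"):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy example (hypothetical): 2 high-risk and 8 low-risk loans.
y_true = ["high_risk"] * 2 + ["low_risk"] * 8
y_pred = ["high_risk", "low_risk"] + ["low_risk"] * 7 + ["high_risk"]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(accuracy(*counts), balanced_accuracy(*counts))
```

In this toy case the model misses half of the high-risk loans, which plain accuracy largely hides and balanced accuracy exposes.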

1. Naive Random Oversampling

2. SMOTE Oversampling

3. Undersampling

4. SMOTEENN algorithm - Combination (Over and Under) Sampling

5. Ensemble Classifier - Balanced Random Forest

6. Ensemble Classifier - Easy Ensemble

Summary:

The results of our analysis are summarized in the table above. Based on our findings, the Easy Ensemble AdaBoost Classifier has the highest accuracy, 93%, meaning that it predicts the correct class 93% of the time.

Overall, the regression models fall noticeably behind their classifier counterparts, with accuracy scores of 66% and lower. Based on accuracy alone, the Easy Ensemble AdaBoost Classifier is preferable.

Additionally, the F1 score, which is the harmonic mean of precision and sensitivity (recall) and is high only when both are high, is much higher for the classifier models. The Easy Ensemble AdaBoost Classifier in particular had the largest F1 score, 0.97, indicating the best balance between sensitivity and precision.
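For reference, F1 is computed from precision and sensitivity (recall) as F1 = 2 * precision * recall / (precision + recall). The short sketch below uses invented counts, not the project's results, to show how a gap between the two pulls F1 toward the lower value.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall (sensitivity), and their harmonic mean (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: precision and recall are both 0.8.
balanced = precision_recall_f1(tp=80, fp=20, fn=20)

# Hypothetical counts: recall is 1.0 but precision is only 0.1.
lopsided = precision_recall_f1(tp=10, fp=90, fn=0)

print(balanced)
print(lopsided)
```

In the second case the arithmetic mean of precision and recall would be 0.55, while F1 is only about 0.18, which is why F1 is a stricter summary than either value alone.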

To summarize, based on the results of our analysis, we recommend the Easy Ensemble AdaBoost Classifier machine learning model be adopted for the purpose of predicting credit risk.
