AI-Credit-Risk

The use of AI across industries is broader than many people realize. One example is the financial industry and the many intricacies of the risks it manages, such as credit risk. By using metrics such as income, payment time, schedule, and other important factors, financial companies can determine the risk an individual poses when requesting a loan. In this module, we use different models to determine which is the most effective at predicting credit risk.

Results

Four similar resampling techniques from the imblearn library were used: two forms of oversampling, one undersampling, and one combination sampling (if you want more information on these, visit this link). In summary, because of the large differences between the class populations in our data set, we randomly duplicate or delete examples within the respective class to create a more even distribution for better analysis. Many variations of these algorithms exist, but in this module we will not go over the differences, only the results.
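To make the duplicating and deleting concrete, here is a minimal from-scratch sketch of random oversampling and random undersampling. It uses only the Python standard library and is not the imblearn implementation used in the notebooks; the function names and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has
    as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

# Toy data (made up): 8 low-risk rows and 2 high-risk rows.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 8 rows
print(Counter(y_under))  # both classes now have 2 rows
```

Oversampling keeps every original row but repeats minority rows, while undersampling discards majority rows, which is why the two can give quite different results on the same data.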

Aside from resampling, we also used ensemble learners, which combine multiple algorithms. In this module we use random forest and easy ensemble.
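The idea of combining several learners can be sketched in a few lines. The example below trains a handful of very simple one-threshold classifiers, each on its own balanced random subset, and lets them vote. This is only a simplified stand-in: the easy ensemble in imblearn combines boosted learners, and a random forest combines decision trees. All names and data here are hypothetical.

```python
import random
from collections import Counter

def balanced_subset(X, y, rng):
    """Draw the same number of rows from every class (size of the smallest class)."""
    counts = Counter(y)
    n = min(counts.values())
    rows = []
    for label in counts:
        members = [(x, lab) for x, lab in zip(X, y) if lab == label]
        rows.extend(rng.sample(members, n))
    return rows

def fit_stump(rows, positive):
    """Choose the threshold on feature 0 that classifies the most rows
    correctly, predicting `positive` when x[0] <= threshold."""
    best_threshold, best_correct = None, -1
    for candidate, _ in rows:
        threshold = candidate[0]
        correct = sum((x[0] <= threshold) == (lab == positive) for x, lab in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def fit_ensemble(X, y, positive, n_estimators=5, seed=0):
    """Train one stump per balanced subset."""
    rng = random.Random(seed)
    return [fit_stump(balanced_subset(X, y, rng), positive) for _ in range(n_estimators)]

def predict(thresholds, x, positive, negative):
    """Majority vote across all stumps."""
    votes = sum(x[0] <= t for t in thresholds)
    return positive if votes > len(thresholds) / 2 else negative

# Toy data (made up): one feature, income in thousands; low incomes are high risk.
incomes = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 80, 90, 95, 100]
X = [[v] for v in incomes]
y = ["high_risk"] * 3 + ["low_risk"] * 12

model = fit_ensemble(X, y, positive="high_risk")
print(predict(model, [28], "high_risk", "low_risk"))
print(predict(model, [60], "high_risk", "low_risk"))
```

Because every learner sees a balanced subset, no single learner is dominated by the majority class, and the vote smooths out the randomness of any one subset.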

The general code is relatively straightforward and consistent. Strings in the data were converted to numbers to make the information more "computer friendly". From there, we used the train_test_split function to create our training and testing data, instantiated the algorithms we planned to use, and finally fit the model and summarized the results.
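The preparation steps described above can be sketched as follows. This is a standard-library stand-in written for illustration: the notebooks use scikit-learn's train_test_split, and the helper names, column names, and values below are assumptions, not the project's data.

```python
import random

def encode_strings(column):
    """Map each distinct string to an integer code, assigned in sorted order."""
    codes = {value: i for i, value in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes

def split_train_test(X, y, test_size=0.25, seed=1):
    """Shuffle the row indices and hold out a fraction of rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy loan data (made up): a string column and a numeric column.
home = ["RENT", "OWN", "MORTGAGE", "RENT", "OWN", "RENT", "MORTGAGE", "OWN"]
income = [40, 85, 60, 32, 90, 45, 70, 55]
y = ["high_risk", "low_risk", "low_risk", "high_risk",
     "low_risk", "low_risk", "low_risk", "low_risk"]

home_codes, mapping = encode_strings(home)
X = [[h, inc] for h, inc in zip(home_codes, income)]
X_train, X_test, y_train, y_test = split_train_test(X, y)

print(mapping)                    # {'MORTGAGE': 0, 'OWN': 1, 'RENT': 2}
print(len(X_train), len(X_test))  # 6 2
```

Resampling is normally applied to the training portion only, so that the testing data still reflects the original class balance.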

The summaries are provided below in the order of accuracy, classification report, and imbalanced matrix.
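For readers unfamiliar with these measures, the sketch below shows how accuracy, precision, and recall can be computed from counts of correct and incorrect predictions. The labels and predictions are invented for illustration and are not results from this project; the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def summarize(y_true, y_pred, positive):
    """Return accuracy, precision, and recall for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Made-up example: 2 high-risk loans and 8 low-risk loans.
y_true = ["high_risk"] * 2 + ["low_risk"] * 8
y_pred = ["high_risk", "low_risk",               # one high-risk loan caught, one missed
          "high_risk", "high_risk", "high_risk", # three low-risk loans flagged by mistake
          "low_risk", "low_risk", "low_risk", "low_risk", "low_risk"]

accuracy, precision, recall = summarize(y_true, y_pred, positive="high_risk")
print(accuracy, precision, recall)  # 0.6 0.25 0.5
```

On imbalanced data, accuracy alone can look acceptable while precision or recall for the rare class is poor, which is why all three are reported together.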

Naive Random Oversample

SMOTE Oversample

Random Undersample

Combination

Random Forest Ensemble

Easy Ensemble

Summary

Although each method provided a prediction, not all of them produced the desired results. Of the six machine learning models, the ensemble methods were the most consistent across all three measures: accuracy, precision, and recall. The sampling techniques had accuracy scores of approximately 60%, which is quite low. Furthermore, their poor precision and recall (except for combination sampling, with 99% precision) make the results far too inconsistent. The ensemble methods score 70% on each of the metrics and provide a level of consistency that is valuable for banks wanting to assess risk. Of the six models, considering they all require the same effort, easy ensemble is the best option to use.
