
Applying various sampling methods and ML to analyze credit risk


YourOriginal/AI-Credit-Risk


AI-Credit-Risk

The applications of AI across industries go further than we might expect. One such industry is finance, with its many intricacies surrounding risks such as credit risk. By using metrics such as income, payment time, schedule, and other important factors, financial companies can determine the risk an individual poses when requesting a loan. In this module, we will use different models to determine which is the most effective at determining credit risk.

Results

Four resampling techniques from the imblearn library were used: two forms of oversampling, one of undersampling, and one combination of the two (for more information on these, visit this link). In summary, because the classes in our data set differ greatly in size, we randomly duplicate or delete examples within the respective class (or, with SMOTE, synthesize new minority examples) to create a more even distribution for better analysis. Many variants and implementations exist for each of these algorithms, but in this module we will not go over the differences, only the results.
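To make the duplicate/delete idea concrete, here is a minimal sketch in plain Python. The helper names `random_oversample` and `random_undersample`, the class labels, and the toy data are all assumptions made for illustration; they are not taken from this repository. In imblearn, the corresponding classes are `RandomOverSampler` and `RandomUnderSampler`.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [row for row, lab in zip(X, y) if lab == label]
        for row in rng.sample(pool, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Toy data (made up): 8 "low_risk" rows and 2 "high_risk" rows.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 8 rows
print(Counter(y_under))  # both classes now have 2 rows
```

Oversampling keeps every original row but repeats minority rows, while undersampling throws majority rows away, which is why the two can lead to different model behavior on the same data.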

Aside from resampling, we also used ensemble learners, which combine multiple models. In this module we use random forest and easy ensemble.

The general code is relatively straightforward and consistent across models. Strings were converted to numbers to give the models more "computer-friendly" input, and the data was split into training and testing sets with the train_test_split function. From there, we instantiated the algorithm we planned to use, fit the model, and summarized the results.

The summaries are provided below in the order of accuracy score, classification report, and confusion matrix.
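For readers unfamiliar with these three summaries, here is a small sketch of how they can be computed with scikit-learn. The labels and predictions are made up, and the use of `balanced_accuracy_score` as the accuracy measure is an assumption. imblearn also offers `classification_report_imbalanced` in `imblearn.metrics` as an imbalance-aware alternative to the standard report.

```python
from sklearn.metrics import (
    balanced_accuracy_score,
    classification_report,
    confusion_matrix,
)

# Made-up true labels and predictions for ten loans.
y_true = ["high_risk"] * 4 + ["low_risk"] * 6
y_pred = ["high_risk"] * 3 + ["low_risk"] * 5 + ["high_risk"] * 2

# Balanced accuracy is the mean of per-class recall: (3/4 + 4/6) / 2.
acc = balanced_accuracy_score(y_true, y_pred)
print(f"balanced accuracy: {acc:.3f}")

# Per-class precision, recall, and F1.
print(classification_report(y_true, y_pred))

# Rows are true classes, columns are predicted classes, in the order given.
cm = confusion_matrix(y_true, y_pred, labels=["high_risk", "low_risk"])
print(cm)  # [[3 1], [2 4]]
```

On imbalanced data, plain accuracy can look high even when the minority class is mostly missed, which is why per-class recall and the confusion matrix are worth reading alongside it.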

Naive Random Oversample

[Images: accuracy score, classification report, and matrix for the naive random oversampling run]

SMOTE Oversample

[Images: accuracy score, classification report, and matrix for the SMOTE oversampling run]

Random Undersample

[Images: accuracy score, classification report, and matrix for the random undersampling run]

Combination

[Images: accuracy score, classification report, and matrix for the combination sampling run]

Random Forest Ensemble

[Images: accuracy score, classification report, and matrix for the random forest ensemble run]

Easy Ensemble

[Images: accuracy score, classification report, and matrix for the easy ensemble run]

Summary

Although each method gave us a prediction, not all of them had the desired results. Of the six machine learning models, the ensemble methods were the most consistent across all three measures: accuracy, precision, and recall. The sampling techniques had accuracy scores of approximately 60%, which is quite low, and their poor precision and recall (except for combination, with 99% precision) make the results far too inconsistent. The ensemble methods score 70% on each of the metrics and provide a level of consistency that is well suited to banks wanting to assess risk. Since all six models require the same effort, easy ensemble is the best option to use.
