The data is available in Google Drive.
-
The prediction model is built with a Gradient Boosting classifier. The idea behind this classifier is to start from a simple base learner and improve it step by step: each round adds a new weak learner that corrects the errors of the current ensemble, while regularization schemes (such as a learning rate) limit how much each round contributes. The model's learning ability therefore improves in a gradual and additive fashion. This classifier is particularly effective on complex datasets such as those found in banking and finance.
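To make the additive idea concrete, here is a minimal from-scratch sketch of gradient boosting for binary classification. It is not the project's actual implementation: the source does not name a library or hyperparameters, so the single feature (account balance), the toy data, the one-split "stump" learner, and the values of `n_rounds` and `learning_rate` are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]  # (threshold, left_value, right_value)

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Boost stumps on the log loss; ys must contain both 0 and 1."""
    positive_rate = sum(ys) / len(ys)
    base_score = math.log(positive_rate / (1 - positive_rate))  # initial log-odds
    scores = [base_score] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the score
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate (regularization)
        scores = [s + learning_rate * stump_predict(stump, x)
                  for s, x in zip(scores, xs)]
    return {"base_score": base_score, "learning_rate": learning_rate, "stumps": stumps}

def predict_proba(model, x):
    """Probability of the positive class (here: default) for one sample."""
    score = model["base_score"] + model["learning_rate"] * sum(
        stump_predict(stump, x) for stump in model["stumps"])
    return sigmoid(score)

# Invented toy data: account balance and whether the customer defaulted (1) or repaid (0)
balances = [50, 80, 120, 200, 900, 1500, 2200, 3000]
defaulted = [1, 1, 1, 1, 0, 0, 0, 0]
model = fit_gradient_boosting(balances, defaulted)
```

Each round fits a new stump to what the current ensemble still gets wrong and adds a scaled-down version of it to the running score, which is the "gradual and additive" behavior described above. A production model would typically use deeper trees over many features, for example through an established library.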
-
The model's predictions of whether a customer will default on their loan repayment are correct about 76% of the time, as validated against a pre-defined validation dataset fed into the model during the train/test phase.
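As a small illustration of how an accuracy figure like this is computed, the sketch below compares predictions with true labels on a validation set. The labels and predictions are invented for the example and are not the project's data; the function name `accuracy` is likewise hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty validation set")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)

# Hypothetical validation labels (1 = defaulted, 0 = repaid) and model predictions
validation_labels = [1, 0, 0, 1, 0, 1, 0, 0]
validation_predictions = [1, 0, 1, 1, 0, 0, 0, 0]

validation_accuracy = accuracy(validation_labels, validation_predictions)
print(f"{validation_accuracy:.2%}")  # prints 75.00% (6 of 8 correct)
```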
-
The model identified a set of customers who maintain a significant account balance and receive credit transactions at regular intervals; these customers have a high probability of repaying their loans. Customers who make few account transactions and maintain a very low account balance for a significant time before the loan request date have a high probability of defaulting. The model can also adjust the prediction parameters of its features as future datasets are fed into it.
-
In total, 21 features are extracted from the provided dataset; this set yields the maximum accuracy. The features include:
- Total number of credit transactions
- Average amount credited to the account
- Total number of debit transactions
- Average amount debited from the account
- Requested loan amount
- Total amount of all credit transactions
- Days between loan request and account transactions
- Current balance in the customer’s account
- Loan request date
- Total amount of all debit transactions
These features were chosen to train the prediction model on the transactional behavior of the bank's existing customers and their loan outcomes, so that potential loan seekers can be evaluated and risk-assessed primarily on the basis of their transactional history.
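The listed features could be derived from raw transaction records roughly as follows. This is a sketch under stated assumptions, not the project's extraction code: the record layout (`date`, `amount`, `type`), the function name `extract_features`, and the feature names are hypothetical. "Days between loan request and account transactions" is interpreted here as days since the most recent transaction, and the loan request date is encoded as an ordinal day number; both choices are assumptions.

```python
from datetime import date

def extract_features(transactions, loan_amount, loan_request_date, current_balance):
    """Build one feature row for a customer from their transaction history."""
    credits = [t["amount"] for t in transactions if t["type"] == "credit"]
    debits = [t["amount"] for t in transactions if t["type"] == "debit"]
    last_date = max((t["date"] for t in transactions), default=None)
    return {
        "n_credit_transactions": len(credits),
        "avg_credit_amount": sum(credits) / len(credits) if credits else 0.0,
        "n_debit_transactions": len(debits),
        "avg_debit_amount": sum(debits) / len(debits) if debits else 0.0,
        "requested_loan_amount": loan_amount,
        "total_credit_amount": sum(credits),
        # Assumption: measured from the most recent transaction
        "days_since_last_transaction": (
            (loan_request_date - last_date).days if last_date else None),
        "current_balance": current_balance,
        # Assumption: date encoded as a proleptic Gregorian ordinal
        "loan_request_ordinal": loan_request_date.toordinal(),
        "total_debit_amount": sum(debits),
    }

# Invented example customer
example_transactions = [
    {"date": date(2023, 1, 5), "amount": 1000.0, "type": "credit"},
    {"date": date(2023, 1, 20), "amount": 300.0, "type": "debit"},
    {"date": date(2023, 2, 5), "amount": 1200.0, "type": "credit"},
    {"date": date(2023, 2, 18), "amount": 500.0, "type": "debit"},
]
features = extract_features(
    example_transactions,
    loan_amount=5000.0,
    loan_request_date=date(2023, 3, 1),
    current_balance=1400.0,
)
```

For this example customer, the row contains two credit transactions averaging 1100.0, two debit transactions averaging 400.0, and 11 days between the last transaction and the loan request.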
The accuracy of the model is approximately 76%.