--------------THIS ANALYSIS IS DONE IN R USING JUPYTER NOTEBOOK----------------
This project is an analysis to identify customers who might default on their first payment. Through it, I wanted to identify the important factors that indicate an applicant is likely to default.
The data is from a company called Net Pay Advance, which makes small loans to customers. Each row represents one loan application.
Mean distribution
Correlation matrix
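As a rough illustration of the two exploratory steps named above, the sketch below computes per-class means and a correlation matrix in base R. The data is synthetic and the column names (`monthly_net_income`, `months_at_residence`, `loan_amount`, `first_payment_default`) are hypothetical stand-ins, not the real dataset's fields.

```r
# Synthetic stand-in data; column names are hypothetical, not from the real dataset.
set.seed(42)
n <- 500
loans <- data.frame(
  monthly_net_income  = rnorm(n, mean = 3000, sd = 800),
  months_at_residence = rpois(n, lambda = 36),
  loan_amount         = runif(n, min = 100, max = 1000)
)
loans$first_payment_default <- rbinom(n, 1, 0.15)

# Mean of each predictor, split by default status.
class_means <- aggregate(. ~ first_payment_default, data = loans, FUN = mean)
print(class_means)

# Pearson correlation matrix of the numeric predictors.
predictors <- loans[, c("monthly_net_income", "months_at_residence", "loan_amount")]
cor_matrix <- cor(predictors)
print(round(cor_matrix, 2))
```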
- Logistic Regression
- Decision Tree
- Random Forest
- Ada-boost
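To make the first model in the list concrete, here is a minimal sketch of a logistic regression baseline using base R's `glm`. The data, the column names, and the 70/30 split are assumptions for illustration only; the tree-based models would need additional packages and are not shown.

```r
# Synthetic data; variable names and coefficients are illustrative assumptions.
set.seed(1)
n <- 1000
loans <- data.frame(
  monthly_net_income = rnorm(n, 3000, 800),
  loan_amount        = runif(n, 100, 1000)
)
# Simulated outcome: lower income and larger loans raise the default probability.
lin <- -1.5 - 0.001 * (loans$monthly_net_income - 3000) +
  0.002 * (loans$loan_amount - 550)
loans$default <- rbinom(n, 1, plogis(lin))

# Hold out 30% of the rows for evaluation.
train_idx <- sample(n, size = 700)
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]

# Logistic regression: binomial family with the default logit link.
fit <- glm(default ~ monthly_net_income + loan_amount,
           data = train, family = binomial)

# Predicted default probabilities on the held-out rows.
pred <- predict(fit, newdata = test, type = "response")
print(summary(fit)$coefficients)
```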
The models were evaluated using ROC curves. Initially the data was used as is, with its imbalanced class distribution. Later, SMOTE and up-sampling were used to balance the classes, and the models were fit again on the rebalanced data.
Random Forest was chosen, and its hyperparameters were tuned.
Due to a very limited training set and weak predictor variables, only a low AUC was achieved. However, variables such as Monthly Net Income, Months lived at residence, having a Bank Account for a long duration, time left until the due date, and Loan amount were very important factors indicating default on the first payment.
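The evaluation and rebalancing steps can be sketched in base R as follows. This is an illustration under assumptions, not the notebook's code: `auc` and `upsample` are hypothetical helper names, AUC is computed through the rank-sum identity rather than by plotting the curve, and only random up-sampling is shown (SMOTE needs an extra package and is omitted).

```r
# AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks.
auc <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Random up-sampling: resample minority rows with replacement until classes match.
upsample <- function(df, target) {
  counts <- table(df[[target]])
  minority <- names(counts)[which.min(counts)]
  n_extra <- max(counts) - min(counts)
  minority_rows <- which(df[[target]] == minority)
  extra <- minority_rows[sample.int(length(minority_rows), n_extra, replace = TRUE)]
  df[c(seq_len(nrow(df)), extra), , drop = FALSE]
}

# Toy usage: 170 non-defaults and 30 defaults (made-up proportions).
set.seed(7)
toy <- data.frame(x = rnorm(200), default = rep(c(0, 1), times = c(170, 30)))
balanced <- upsample(toy, "default")
print(table(balanced$default))

# Small hand-checkable AUC example.
print(auc(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8)))  # 0.75
```

Rebalancing is normally applied to the training split only, so that the held-out AUC still reflects the original class distribution.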
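One way such tuning could look is a small grid search scored by out-of-bag error, sketched below with the `randomForest` package. The grid values, the synthetic data, and the choice of OOB error as the selection criterion are all assumptions; the project does not state which parameters were tuned or how, and selecting on AUC would be an equally plausible choice.

```r
library(randomForest)  # assumes the randomForest package is installed

# Synthetic data with hypothetical column names.
set.seed(123)
n <- 600
toy <- data.frame(
  monthly_net_income  = rnorm(n, 3000, 800),
  months_at_residence = rpois(n, 36),
  loan_amount         = runif(n, 100, 1000)
)
lin <- -1 - 0.001 * (toy$monthly_net_income - 3000) +
  0.002 * (toy$loan_amount - 550)
toy$default <- factor(rbinom(n, 1, plogis(lin)), levels = c(0, 1))

# Hypothetical grid; the values actually tuned in the project are not stated.
grid <- expand.grid(mtry = 1:3, nodesize = c(1, 5, 10))
grid$oob_error <- NA_real_

for (i in seq_len(nrow(grid))) {
  rf <- randomForest(default ~ ., data = toy, ntree = 200,
                     mtry = grid$mtry[i], nodesize = grid$nodesize[i])
  # The last row of err.rate holds the OOB error after all trees are grown.
  grid$oob_error[i] <- rf$err.rate[rf$ntree, "OOB"]
}

# Keep the combination with the lowest out-of-bag error.
best <- grid[which.min(grid$oob_error), ]
print(best)
```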