-
The goal is to create a Machine Learning model using historcial lending activity to predict credit worthiness of borrowers.
-
Following data points are needed to build this model - loan_size,interest_rate,borrower_income,debt_to_income,num_of_accounts,derogatory_marks,total_debt,loan_status
-
'loan_status' provided in the given data set indicates whether the loan is a 'Healthy loan (0)' or a 'High-risk Loan (1)'
-
The data provided in the csv file was loaded into a Pandas DataFrame.
-
The data was then split into training and test set.
- The target column 'loan_status' was pulled into a seried named 'y'
- All the other features except 'loan_status' were pulled into a dataframe named 'X'
- Scikit-Learn 'train_test_split' function was used to split X and y to train and test data sets named X_train, y_train and X_test, y_test.
-
Logistic Regression model from Scikit Learn was chosen to train and predict the loan status
- LogisticRegression model was created using solver named 'lbfgs' and random_state=1
- LogisticRegression Model created was then fitted using training dataset - X_train and y_train.
- The model is then used to predict loan_status of test dataset 'X_test'
- Logistic Regression Model :
- Classification Report
- Accuracy (0.99),
- Loan Status 0 (Healthy Loan) - Precision (1.00), and Recall (1.00).
- Loan Status 1 (High-Risk Loan) - Precision (0.87), and Recall (0.89).
- Classification Report
-
The Logistic Regression Model has performed well to predict the Loan Status with an accuracy score of 0.99 as reported in the classification report.
-
The precision and recall score for Loan Status '0' are both 1, which means the model was able to correctly predict 'Healty Loan' labels.
-
The precision and recall score for Loan Status '1' are .87 & .89 respectievely so the model was not able to predict 'High-Risk Loan' as accurate as 'healthy loan' but it still has a good success rate.
Recommends Logistics Regression Model for predicting credit worthiness of borrowers