Develop a model that can assess and predict loan risk using supervised machine learning.

jaezak/credit_risk_sklearn

Credit Risk Prediction

Welcome to the Credit Risk Classification repository! In this project, you'll embark on an exciting journey to develop a model that can assess and predict loan risk. Your mission is to leverage various data science techniques to analyze historical lending activity from a peer-to-peer lending services company. Ultimately, you'll build a powerful model capable of identifying the creditworthiness of borrowers.

Project Overview

Lending institutions face the ongoing challenge of assessing loan applicants' creditworthiness to minimize financial risk. In this challenge, you will work with a rich dataset containing historical lending data. Your primary objectives include:

  • Data Exploration: Dive into the dataset to gain a deep understanding of its structure, features, and any potential data quality issues.

  • Feature Engineering: Create relevant features that can help the model make accurate predictions about loan risk.

  • Model Training: Implement and train machine learning models using a variety of techniques, such as classification algorithms, to predict loan risk.

  • Model Evaluation: Assess the performance of your models using appropriate evaluation metrics and techniques, such as cross-validation and hyperparameter tuning.

  • Interpretability: Aim to create models that provide insights into why a particular loan application is deemed risky or not.

Getting Started

The instructions are divided into the following subsections:

  • Split the Data into Training and Testing Sets

  • Create a Logistic Regression Model with the Original Data

  • Write a Credit Risk Analysis Report

Split the Data into Training and Testing Sets

Open the starter code notebook and use it to complete the following steps:

  1. Read the lending_data.csv data from the Resources folder into a Pandas DataFrame.

  2. Create the labels set (y) from the “loan_status” column, and then create the features (X) DataFrame from the remaining columns. A value of 0 in the “loan_status” column means that the loan is healthy. A value of 1 means that the loan has a high risk of defaulting.

  3. Split the data into training and testing datasets by using train_test_split.
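The three steps above can be sketched roughly as follows. The helper name `split_lending_data`, the `random_state=1` seed, and the use of `stratify` are assumptions made for illustration; the instructions only require `train_test_split`, and the file path should be adjusted to wherever the notebook lives.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def split_lending_data(df, label_col="loan_status", random_state=1):
    """Separate labels from features, then split into train and test sets.

    Hypothetical helper, not part of the starter code.
    """
    y = df[label_col]                 # 0 = healthy loan, 1 = high-risk loan
    X = df.drop(columns=[label_col])  # every remaining column is a feature
    # stratify=y keeps the healthy/high-risk ratio similar in both splits.
    # This is an assumption; the instructions do not ask for it.
    return train_test_split(X, y, random_state=random_state, stratify=y)


# Path taken from the instructions; only read the file if it is present.
csv_path = Path("Resources/lending_data.csv")
if csv_path.exists():
    lending_df = pd.read_csv(csv_path)
    X_train, X_test, y_train, y_test = split_lending_data(lending_df)
    print(X_train.shape, X_test.shape)
```

By default `train_test_split` holds out 25% of the rows for testing, so no `test_size` argument is needed unless a different ratio is wanted.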

Create a Logistic Regression Model with the Original Data

Use your knowledge of logistic regression to complete the following steps:

  1. Fit a logistic regression model by using the training data (X_train and y_train).

  2. Save the predictions for the testing data labels by using the testing feature data (X_test) and the fitted model.

  3. Evaluate the model’s performance by doing the following:

  • Generate a confusion matrix.

  • Print the classification report.
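A minimal sketch of the fit, predict, and evaluate steps is shown below. The helper name `fit_and_evaluate`, the `max_iter=1000` setting, and the synthetic data from `make_classification` are assumptions used so the example runs on its own; in the project, the training and testing sets come from `lending_data.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def fit_and_evaluate(X_train, X_test, y_train, y_test, random_state=1):
    """Fit logistic regression, predict on the test set, and score it.

    Hypothetical helper, not part of the starter code.
    """
    # max_iter is raised above the default as a precaution against
    # convergence warnings on unscaled features (an assumption).
    model = LogisticRegression(random_state=random_state, max_iter=1000)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Rows of the confusion matrix are true labels, columns are predictions.
    cm = confusion_matrix(y_test, predictions)
    report = classification_report(y_test, predictions)
    return model, predictions, cm, report


# Synthetic, imbalanced stand-in data; NOT the lending dataset.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, weights=[0.9, 0.1], random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=1, stratify=y_demo
)
model, preds, cm, report = fit_and_evaluate(X_tr, X_te, y_tr, y_te)
print(cm)
print(report)
```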

Credit Risk Analysis Report

  1. Analysis Overview: The purpose of this analysis was to classify borrowers as high-risk or safe. This was done by building a model from a dataset of historical lending activity. In the classification report, 0 represents a healthy loan (the borrower is expected to pay back the loan), while 1 represents a high-risk loan.

  2. The results: The accuracy, precision, and recall scores of the machine learning model were as follows.

  • Accuracy score: The accuracy score was 0.99, or 99%, meaning the model's predictions were correct for about 99% of the loans it was evaluated on.

  • Precision score: The proportion of predictions for a class that were correct. Out of all the borrowers the model predicted to be high-risk, 85% were actually high-risk. Out of all the borrowers the model predicted to be safe, 100% were actually safe.

  • Recall score: Out of all the borrowers that were labeled high-risk, the model predicted the outcome correctly for 91% of them. Out of all the borrowers labeled safe, the model predicted the outcome correctly for 99% of them.

  3. A summary: I would recommend that the company use the model because it predicted the outcome of repayment with 99% accuracy.
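To make the definitions of accuracy, precision, and recall concrete, the sketch below derives them from a confusion matrix whose rows are true labels and whose columns are predicted labels (the scikit-learn convention). The helper name and the counts in `toy_cm` are made up purely for illustration and are not the results of this project.

```python
import numpy as np


def per_class_precision_recall(cm):
    """Return per-class precision, per-class recall, and overall accuracy.

    cm[i, j] is the number of samples with true class i predicted as class j.
    Hypothetical helper for illustration.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)                 # correctly classified, per class
    precision = correct / cm.sum(axis=0)  # column sums: predicted per class
    recall = correct / cm.sum(axis=1)     # row sums: actual per class
    accuracy = correct.sum() / cm.sum()
    return precision, recall, accuracy


# Made-up counts for illustration only; NOT the project's results.
toy_cm = [[90, 10],   # true class 0: 90 predicted 0, 10 predicted 1
          [5, 45]]    # true class 1: 5 predicted 0, 45 predicted 1
precision, recall, accuracy = per_class_precision_recall(toy_cm)
print(precision, recall, accuracy)
```

Because precision and recall are reported per class, they can differ noticeably between the healthy and high-risk classes even when overall accuracy is high, which is why the report lists them separately.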

References

The dataset was generated by edX Boot Camps LLC and is intended for educational purposes only.
