
minhtrang4078/Personal-Loan-Status-Prediction-App


Personal Loan Status Prediction App

First look of the app

Contents

Short description

  • The Jupyter notebook includes detailed comments showing how the pieces work together.

Problem statement

LendingClub provides peer-to-peer (P2P) lending, which is becoming increasingly popular. Investors are responsible for their own lending decisions. Even though the company screens borrowers against minimum requirements, there is still a substantial risk of non-repayment. This project addresses that problem by predicting the loan status of these borrowers.

The final product of this project is a localhost web app that presents an explanation dashboard for a predictive model that classifies customers as 'Fully Paid' or 'Charged Off'. End users can explore how the model works and obtain a prediction for a new customer.

In brief, this project was created with the purpose of:

  • improving profitability for lenders
  • providing a lightweight tool to support decision making
  • offering an app with an easy-to-use interface

Choice of algorithm

XGBoost was chosen for this project because of several important characteristics (Chen & Guestrin, 2016):

  • For large datasets:

    • A sparsity-aware split-finding algorithm that handles missing values in the data
    • Fast training speed
  • In practice:

    • An XGBoost Python package is available
    • Parallelized tree construction
    • Out-of-core computing
    • Cache optimization
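To make the sparsity-aware split finding mentioned above more concrete, here is a minimal pure-Python sketch of the idea for a single feature. It is a simplified illustration, not XGBoost's actual implementation and not code from this project: only rows with a value present are scanned, and at each candidate threshold the rows with missing values are sent, as a group, to whichever side yields the higher gain. The function name `best_split_with_missing`, the use of `None` for missing values, the regularization parameter `lam`, and the toy gradients are all assumptions made for illustration. The gain here also omits the constant factor and the complexity penalty used in the paper.

```python
def best_split_with_missing(values, grads, hess, lam=1.0):
    """Find the best threshold and default direction for one feature.

    values: feature values, with None marking a missing entry (assumption).
    grads, hess: per-row first- and second-order gradients of the loss.
    Returns (gain, threshold, default_direction); threshold is None when
    no split has positive gain.
    """
    def score(g, h):
        # Structure score of a leaf with gradient sum g and hessian sum h.
        return g * g / (h + lam)

    g_total, h_total = sum(grads), sum(hess)
    # Only rows where the feature is present are sorted and scanned.
    present = sorted(
        ((v, g, h) for v, g, h in zip(values, grads, hess) if v is not None),
        key=lambda row: row[0],
    )
    g_present = sum(row[1] for row in present)
    h_present = sum(row[2] for row in present)

    best = (0.0, None, None)
    g_seen = h_seen = 0.0
    for i in range(len(present) - 1):
        value, g, h = present[i]
        g_seen += g
        h_seen += h
        next_value = present[i + 1][0]
        if next_value == value:
            continue  # no threshold fits between equal values
        threshold = (value + next_value) / 2

        # Option A: missing rows default to the right child.
        gain_right = (score(g_seen, h_seen)
                      + score(g_total - g_seen, h_total - h_seen)
                      - score(g_total, h_total))
        # Option B: missing rows default to the left child.
        g_r, h_r = g_present - g_seen, h_present - h_seen
        gain_left = (score(g_total - g_r, h_total - h_r)
                     + score(g_r, h_r)
                     - score(g_total, h_total))

        for gain, direction in ((gain_right, "right"), (gain_left, "left")):
            if gain > best[0]:
                best = (gain, threshold, direction)
    return best


# Toy data: the two rows with a missing value have gradients that
# resemble the rows with high feature values.
values = [1.0, 2.0, None, 3.0, 4.0, None]
grads = [-1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
hess = [1.0] * 6

gain, threshold, direction = best_split_with_missing(values, grads, hess)
print(threshold, direction)
```

On this toy data the missing rows are routed to the same side as the high-value rows, because that grouping gives the larger gain. In the XGBoost Python package this handling is built in, so missing entries (NaN by default) can be passed to the model without imputation.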

The data

Demo video

Project roadmap - Slides

This project follows the Cross-Industry Standard Process for Data Mining (CRISP-DM), a widely used data mining methodology (Wirth et al., 2000, p. 6). The framework is as follows:

1. Problem statement/Business understanding

2. Data understanding (exploratory data analysis)

3. Data preparation

4. Model development

5. Evaluation/Interpretation

6. Communication/Deployment

Model efficiency

  • Overall prediction accuracy is ~70%
  • This project was completed within 3 months, and there is still room for improvement.
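As a reminder of what the accuracy figure measures, the sketch below computes overall accuracy and a small confusion count for the two classes. The labels are made-up illustrative data, not results from this project, and the helper name `accuracy` is hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels only (assumption), chosen so that 7 of 10 match.
y_true = ["Fully Paid"] * 6 + ["Charged Off"] * 4
y_pred = (["Fully Paid"] * 5 + ["Charged Off"]
          + ["Charged Off"] * 2 + ["Fully Paid"] * 2)

acc = accuracy(y_true, y_pred)
# Counts of (true label, predicted label) pairs.
confusion = Counter(zip(y_true, y_pred))

print(f"accuracy = {acc:.2f}")
for (true_label, pred_label), count in sorted(confusion.items()):
    print(f"true={true_label!r} predicted={pred_label!r}: {count}")
```

The confusion counts show which class the errors fall in, which overall accuracy alone does not reveal.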