LENDING_CLUB_DEPLOYMENT

Capstone Project- Lending Club

Introduction

"LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. LendingClub is the world's largest peer-to-peer lending platform." (Wikipedia, LendingClub)

On the LendingClub platform, people invest in other people's loans through a secure online system. On these platforms, in most cases, the main criterion for granting loans to customers is solely their credit score: a customer with a lower credit score (more risky) gets a higher interest rate, and a customer with a higher credit score (less risky) gets a lower interest rate on their loan. From the investor's point of view, loans with higher interest rates are more attractive because of their higher return on investment. However, they also carry a higher risk of not being repaid at all. Investing in "bad loans" (charged-off loans), which means losing your money, is worse than missing an opportunity to earn more profit. A machine learning or deep learning model that could predict which of the high-interest loans are likely to be repaid would add value by minimizing the associated risks. Using other factors alongside the credit score may also help us identify high-risk loans more accurately and minimize investors' losses.

Problem statement

In the last few years, applying for different types of loans through online peer-to-peer lending platforms such as LendingClub has been on the rise.

Objective

The first goal is to find a better prediction model that helps investors avoid "bad loans". To do that, we first apply data engineering and preprocessing to the LendingClub dataset to prepare the data for analysis and modeling.
Second, we apply exploratory data analysis (EDA) to investigate the features.
Third, we use the preprocessed LendingClub loans, labeled by whether or not the borrower defaulted (charged off), to develop a model that predicts whether a borrower will pay back their loan. This way, when a new potential customer is assigned a higher-interest loan in the future, we can assess whether or not they are likely to repay it.

Dataset

We used the LendingClub dataset, which contains almost all features, including FICO scores. The dataset contains several million loans, and because we analyzed and modeled it on an ordinary laptop, we selected only the loans issued in 2018 (almost 0.5 million loans) to reduce processing time.

Moreover, some of the features in this dataset are only relevant after a loan is issued and are therefore not available at the moment of investing. For this reason, we used the features list from here, which contains only the features that are available and visible to investors before a loan is issued. To match this feature list to the main dataset, we did some simple preliminary cleaning (dropping columns, removing whitespace, and correcting spelling using regular expressions). We also checked the unmatched features to see whether some of them could be matched manually.
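The name-matching step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes that a dataset column and a listed feature "match" when their names agree after lower-casing, trimming whitespace, and collapsing punctuation or spaces into underscores. The function names and sample column names are hypothetical; anything left in `unmatched` is what would still need the manual check described above.

```python
import re


def normalize_name(name: str) -> str:
    """Lower-case a column name, trim whitespace, and collapse any run of
    non-alphanumeric characters (spaces, dashes, dots) into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def match_features(dataset_columns, wanted_features):
    """Map each wanted feature to the dataset column with the same normalized
    name; return the matches and the features left for manual review."""
    lookup = {normalize_name(col): col for col in dataset_columns}
    matched, unmatched = {}, []
    for feature in wanted_features:
        key = normalize_name(feature)
        if key in lookup:
            matched[feature] = lookup[key]
        else:
            unmatched.append(feature)
    return matched, unmatched


# Hypothetical column names, for illustration only.
matched, unmatched = match_features(
    ["loan_amnt", "int_rate", " Annual Inc "],
    ["Loan Amnt", "annual_inc", "fico range"],
)
```

Here "fico range" stays unmatched because no dataset column normalizes to `fico_range`.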

DATA SEGMENTATION AND DATA CLEANING

• In this project, we prepared a processed dataset from the data available at https://www.kaggle.com/wordsforthewise/lending-club.
• Using a pandas DataFrame, we calculated the percentage of non-null values in each column.
• We then removed the columns that had a large number of null values or were irrelevant to the analysis.
• We then separated the columns with numerical values from those with categorical values.
• Using fillna, we filled the missing values in the remaining numerical columns with the mean of each column.
• Using fillna, we filled the missing values in the remaining categorical columns with the mode of each column.
• After handling the null values, we dealt with the skewness of the data.
• After checking the skewness, all highly skewed columns were dropped using the drop function.
• We then checked for outliers using kurtosis and capped some of the outliers.
• With the outliers handled, we created the target variable, which in this case is based on Loan Status.
• Loan Status had too many unique values, so instead we created a new variable with default as 1 and the rest as 0.
• We converted date object columns to integer years or months so that we could easily encode the other categorical features without exhausting our resources.
• To fill missing dates, we used the most frequent date in that feature.
• This completed the data cleaning process.
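The cleaning steps above can be sketched with pandas. This is a minimal sketch under stated assumptions, not the notebook's code: every threshold (`min_non_null`, `max_abs_skew`, `kurtosis_cap`, the capping quantiles) is an illustrative guess, capping at the 1st/99th percentiles is one possible way to "cap outliers", and the set of statuses counted as default (`DEFAULT_STATUSES`) is hypothetical. Date conversion is left out.

```python
import pandas as pd

# Hypothetical labels treated as "default" (target = 1); every other status is 0.
DEFAULT_STATUSES = {"Charged Off", "Default"}


def clean_loans(df, min_non_null=0.7, max_abs_skew=10.0,
                kurtosis_cap=3.0, lower_q=0.01, upper_q=0.99):
    """Sketch of the cleaning steps above; all thresholds are illustrative."""
    # 1. Keep only columns whose share of non-null values reaches the threshold.
    non_null_share = df.notna().mean()
    df = df.loc[:, non_null_share >= min_non_null].copy()

    # 2. Split numerical and categorical columns (loan_status stays categorical).
    num_cols = df.select_dtypes(include="number").columns
    cat_cols = df.select_dtypes(exclude="number").columns

    # 3. Fill numerical gaps with the column mean, categorical gaps with the mode.
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
    for col in cat_cols:
        mode = df[col].mode()
        if not mode.empty:
            df[col] = df[col].fillna(mode.iloc[0])

    # 4. Drop highly skewed numerical columns.
    skew = df[num_cols].skew()
    too_skewed = skew[skew.abs() > max_abs_skew].index
    df = df.drop(columns=too_skewed)
    num_cols = num_cols.difference(too_skewed)

    # 5. Cap outliers in heavy-tailed columns (high excess kurtosis) at quantiles.
    kurt = df[num_cols].kurt()
    for col in kurt[kurt > kurtosis_cap].index:
        low, high = df[col].quantile([lower_q, upper_q])
        df[col] = df[col].clip(lower=low, upper=high)

    # 6. Binary target: 1 for default statuses, 0 for everything else.
    df["target"] = df["loan_status"].isin(DEFAULT_STATUSES).astype(int)
    return df
```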

EXPLORATORY DATA ANALYSIS:

Loan Status vs Loan Amount:

• Fully Paid and Charged-off loans have higher loan amounts than Default loans.
• Median: the median lies in the lower part of the interquartile range for both the Fully Paid and Default categories, while for Charged-off it lies roughly in the middle of the range.

Average Loan Amount vs Grade:

• The graph shows a pattern between loan amount and grade: as the grade goes down from A to G, the average loan amount generally increases.
• However, the average loan amount of grade B loans is the lowest of all grades.
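The quantity plotted in this graph is a per-grade mean, which can be computed with a pandas `groupby`. A minimal sketch, assuming a DataFrame with a `grade` column and a loan amount column named `loan_amnt` (the raw LendingClub column name); the sample values are made up. Calling `.plot.bar()` on the result would draw the bar chart if matplotlib is installed.

```python
import pandas as pd


def average_loan_by_grade(df: pd.DataFrame) -> pd.Series:
    """Mean loan amount for each grade, ordered A to G."""
    return df.groupby("grade")["loan_amnt"].mean().sort_index()


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "grade": ["A", "A", "B", "G"],
    "loan_amnt": [1000, 3000, 500, 20000],
})
avg_by_grade = average_loan_by_grade(sample)
```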

Loan Status Distribution:

• There are more Fully Paid loans than Charged-off loans.
• The dataset is therefore slightly imbalanced, so it was balanced before the model training phase.
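The README does not say how the classes were balanced. One common option is random undersampling of the majority class, sketched below with pandas; the function name, the `target` column name, and the choice of undersampling (rather than, for example, oversampling or class weights) are assumptions.

```python
import pandas as pd


def undersample_majority(df: pd.DataFrame, target: str = "target",
                         random_state: int = 0) -> pd.DataFrame:
    """Randomly drop rows so every class has as many rows as the smallest class."""
    n_min = df[target].value_counts().min()
    parts = [group.sample(n=n_min, random_state=random_state)
             for _, group in df.groupby(target)]
    # Shuffle so the classes are interleaved, then reset the index.
    return pd.concat(parts).sample(frac=1, random_state=random_state).reset_index(drop=True)
```

Undersampling discards data; with roughly 0.5 million loans that is usually affordable, but the other options keep every row.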

Box Plot for Loan Amount with Grades:

• Larger loans generally appear to be given at lower grades: the median loan amount for a grade G loan is almost 5000 higher than that of a grade A, B, or C loan.
• Grade B, however, remains the lowest in the range.

Bar Graph for Loan Amount vs the Loan Status:

• Higher loan amounts are verified more often. We already know that larger loans are fewer in number but have a higher charge-off rate.
• So the higher charge-off rate is not caused by the verification status itself; rather, higher loan amounts are riskier and are also verified more often by LendingClub.

Box Plot Grouped By Terms:

• The graph shows the interest rate on the Y axis and the term (number of months) on the X axis.
• On the X axis, 0 = 36 months and 1 = 60 months.
• Interest rates depend on the term: larger amounts were given for the longer term, and the associated interest rates are also higher.

Loan Amount with respect to Grades:

• The graph shows loan amount by grade (A to G) for the two terms (36 months and 60 months).
• On the X axis, 0 = 36 months and 1 = 60 months.
• Within each grade, higher loan amounts are associated with the longer term.
• For the same term, the loan amounts are roughly equal across all grades.

Loan Amount vs Term:

• The graph shows loan amount against term (number of months) for three loan statuses: Fully Paid, Charged-off, and Default.
• Higher loan amounts are associated with longer terms and see more charge-offs.

Training The Model:

From the accepted-loans data frame, 22 columns (21 features plus the target) were selected for training. The target variable indicates whether the loan defaulted, and it is later used to decide whether a loan should be approved. Columns selected for training the model:

loan_amt, zip_code, earliest_cr_line, open_acc, fico_score, total_acc, grade, revol_util, sub_grade, target, home_ownership, issue_d, installment, credit_history, int_rate, credit_ratio, mort_acc, installment_ratio, term, annual_inc, verification_status, dti

Models Used:

1. Random Forest:

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
• Parameters used:
  o n_estimators = 10: the number of trees in the forest.
  o random_state = 10: controls both the randomness of the bootstrapping of the samples used when building trees and the sampling of the features.
  o max_depth = 6: the maximum depth of each tree.
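With scikit-learn, the configuration above corresponds to `RandomForestClassifier(n_estimators=10, random_state=10, max_depth=6)`. The sketch below fits it on synthetic data from `make_classification` as a stand-in for the 21 LendingClub features (an assumption; the real features need encoding and scaling first), so the resulting accuracy says nothing about the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 21 training features (assumption).
X, y = make_classification(n_samples=1000, n_features=21, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Parameters as listed above.
rf = RandomForestClassifier(n_estimators=10, random_state=10, max_depth=6)
rf.fit(X_train, y_train)

rf_accuracy = accuracy_score(y_test, rf.predict(X_test))
# Probability of class 1 (default) for each test loan.
rf_default_proba = rf.predict_proba(X_test)[:, 1]
```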

Individual Account Result Table:

Joint Account Result Table:

2. Regularized Logistic Regression:

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the ‘multi_class’ option is set to ‘ovr’, and uses the cross-entropy loss if the ‘multi_class’ option is set to ‘multinomial’. This class implements regularized logistic regression using the ‘liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ solvers.

• Parameters used:
  o penalty = 'l2': Ridge (L2) regularization, which adds the squared magnitude of the coefficients as a penalty term to the loss function.
  o C = 100: the inverse of the regularization strength (a larger C means weaker regularization).
  o random_state = 0.
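In scikit-learn this model is `LogisticRegression(penalty="l2", C=100, random_state=0)`. The sketch below uses synthetic data as a stand-in for the LendingClub features; `max_iter=1000` is an addition (not from the original settings) to give the default lbfgs solver room to converge.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 21 training features (assumption).
X, y = make_classification(n_samples=1000, n_features=21, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# L2-regularized logistic regression; C=100 means weak regularization.
reg_lr = LogisticRegression(penalty="l2", C=100, random_state=0, max_iter=1000)
reg_lr.fit(X_train, y_train)
reg_lr_accuracy = accuracy_score(y_test, reg_lr.predict(X_test))
```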

Individual Account Result Table:

Joint Account Result Table:

3. Logistic Regression:

This is the same logistic regression model as above, but without regularization: when the penalty is set to 'none', no penalty term is added to the loss function.

• Parameters used:
  o C = 1 (ignored when no penalty is applied);
  o penalty = 'none'.

4. KNeighborsClassifier:

In k-NN, the output is a class membership: an object is assigned to the class most common among its k nearest neighbours.
• Parameters used:
  o n_neighbors = 13: the number of nearest neighbours considered.
  o p = 2: the power parameter of the Minkowski metric; p = 2 corresponds to Euclidean distance.
  o metric = 'euclidean'.
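The corresponding scikit-learn call is `KNeighborsClassifier(n_neighbors=13, p=2, metric="euclidean")`. The sketch fits it on synthetic data (an assumption). Because k-NN compares raw distances, features on very different scales (for example annual income versus DTI) would normally be standardized first.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 21 training features (assumption).
X, y = make_classification(n_samples=1000, n_features=21, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Each test loan gets the majority class of its 13 nearest training loans.
knn = KNeighborsClassifier(n_neighbors=13, p=2, metric="euclidean")
knn.fit(X_train, y_train)
knn_accuracy = accuracy_score(y_test, knn.predict(X_test))
```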

Accuracy for Individual Account:

Results:

Table 1. Accuracy For Different Models:

Conclusion:

Based on the results and the summary tables, Regularized Logistic Regression was selected for making predictions.

Now, we use the Flask framework to create a web app for our model.

We created a Python file named "app.py" in the root folder of the project and now start working on the web app.

Step-1

Initialize the Flask object with the following command and import the required libraries.
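The command itself is not reproduced here. A minimal sketch of the usual start of a Flask `app.py` follows; the exact import list in the original project is an assumption. The development server is not started in this snippet; it would normally be launched with `app.run()` under an `if __name__ == "__main__":` guard, while on Heroku gunicorn serves the `app` object instead.

```python
# Standard library: used later to load the pickled models (assumption).
import pickle

# Flask pieces used by the pages and the form handling (assumption).
from flask import Flask, render_template, request

# Create the Flask application object. Passing __name__ lets Flask locate the
# templates/ and static/ folders relative to app.py.
app = Flask(__name__)
```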

Step-2

Next, we load the pickle files for the Individual, Joint, and sub-grade models.
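Loading the three models can be sketched with the standard `pickle` module. The file names under `model/` are hypothetical; the repository does have a `model` folder, but its actual file names are not shown here. `pickle.load` can execute arbitrary code, so only files you produced yourself should be loaded this way.

```python
import pickle
from pathlib import Path

# Hypothetical file names; the repository's model/ folder may use different ones.
MODEL_FILES = {
    "individual": "model/individual_model.pkl",
    "joint": "model/joint_model.pkl",
    "sub_grade": "model/sub_grade_model.pkl",
}


def load_models(root="."):
    """Unpickle each model file relative to the project root."""
    models = {}
    for key, rel_path in MODEL_FILES.items():
        with open(Path(root) / rel_path, "rb") as fh:
            models[key] = pickle.load(fh)
    return models
```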

Step-3

After loading all three files, we start creating the web app.

We created the web pages in HTML and styled them with CSS. The four HTML pages are connected to the web app using Flask's 'app.route()'.
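A minimal sketch of wiring four pages with `app.route()`. The URL paths and template file names are hypothetical; Flask's `render_template` looks the files up in the project's `templates/` folder, and the CSS would live in `static/`.

```python
from flask import Flask, render_template

app = Flask(__name__)


# Hypothetical paths and template names for the four pages described below.
@app.route("/")
def home():
    return render_template("home.html")


@app.route("/loan-prediction")
def loan_prediction():
    return render_template("loan_prediction.html")


@app.route("/individual")
def individual_application():
    return render_template("individual.html")


@app.route("/joint")
def joint_application():
    return render_template("joint.html")
```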

Our web pages are now ready to be integrated with the Flask app.

1. Main Page:

- Our home page is pretty simple. 
- It has 4 buttons at the top-right corner to move to different pages of the app mentioned below. 
- It has two more buttons at the center to choose the type of loan you are interested in for prediction.

2. Loan Prediction:

 - This page has two buttons at the center to choose the type of loan you want a prediction for:
   - Joint Loan or
   - Individual Loan

3. Individual Application:

 - The Individual Application page consists of a form that asks the user for the features of the individual loan to be predicted.

4. Joint Application:

 - The Joint Application page consists of a form that asks the user for the features of the joint loan to be predicted.

Step-4

Now, we take the inputs from the user. This step is the same for both the Individual and Joint types. The inputs we ask the user for are:

- Annual income
- FICO score
- Earliest credit line
- Zip code
- Total accounts
- Verification status
- Term
- DTI (debt-to-income ratio)
- Loan amount
- Mortgage accounts
- Open accounts
- Revol Util (revolving credit utilization)
- Home ownership
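Reading these inputs in Flask can be sketched as a POST handler over `request.form`, where every value arrives as a string. The route path and form field names are hypothetical. The handler only checks that nothing is missing and echoes the raw values back as JSON; in the app they would go on to the feature-engineering step.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical form field names for the inputs listed above.
REQUIRED_FIELDS = [
    "annual_inc", "fico_score", "earliest_cr_line", "zip_code", "total_acc",
    "verification_status", "term", "dti", "loan_amnt", "mort_acc",
    "open_acc", "revol_util", "home_ownership",
]


@app.route("/predict/individual", methods=["POST"])
def predict_individual():
    # request.form holds the submitted HTML form fields as strings.
    raw = {name: request.form.get(name, "").strip() for name in REQUIRED_FIELDS}
    missing = [name for name, value in raw.items() if not value]
    if missing:
        return jsonify({"error": "missing fields", "fields": missing}), 400
    # Placeholder: the raw strings would be passed to feature engineering here.
    return jsonify(raw)
```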

After getting all the inputs, we do feature engineering in the next step.

Step-5

Feature Engineering

- We convert all input features to the data types required by the following steps.
- We calculate the credit ratio: Credit Ratio = Open accounts / Total accounts
- We calculate the applicant's credit history: Credit History = Issue date - Earliest credit line
- We predict the sub grade from the FICO score using a KNN model.
- We derive the grade from the sub grade.
- We calculate the interest rate and the EMI (equated monthly installment) from the grade and the loan amount.
- We calculate the installment ratio: Installment Ratio = EMI / Loan amount
- Finally, we merge all the features into one final array.
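The feature-engineering steps can be sketched in plain Python. Several things here are assumptions: the input key names, that dates are compared as years, that the KNN model is wrapped in a callable returning a sub grade such as "B3", and that the interest rate comes from a caller-supplied grade-to-rate mapping (no real LendingClub rates are given here). The EMI uses the standard amortization formula EMI = P·r·(1 + r)^n / ((1 + r)^n − 1), where r is the monthly rate and n the number of months. Encoding the categorical inputs into the final array is left out.

```python
def monthly_installment(principal, annual_rate_pct, term_months):
    """Standard amortized EMI: P * r * (1 + r)^n / ((1 + r)^n - 1),
    with r the monthly interest rate and n the number of monthly payments."""
    r = annual_rate_pct / 100 / 12
    if r == 0:
        return principal / term_months
    growth = (1 + r) ** term_months
    return principal * r * growth / (growth - 1)


def engineer_features(inputs, issue_year, predict_sub_grade, rate_by_grade):
    """Derive the engineered features from already type-converted inputs.

    inputs            -- dict with open_acc, total_acc, earliest_cr_line (year),
                         fico_score, loan_amnt and term (months); names are hypothetical
    issue_year        -- year used as the issue date (assumption: years, not months)
    predict_sub_grade -- callable mapping a FICO score to a sub grade such as "B3",
                         e.g. a wrapper around the KNN model
    rate_by_grade     -- dict mapping grade letters to annual interest rates in percent
    """
    credit_ratio = inputs["open_acc"] / inputs["total_acc"]
    credit_history = issue_year - inputs["earliest_cr_line"]
    sub_grade = predict_sub_grade(inputs["fico_score"])
    grade = sub_grade[0]  # a sub grade is the grade letter followed by a digit
    int_rate = rate_by_grade[grade]
    installment = monthly_installment(inputs["loan_amnt"], int_rate, inputs["term"])
    return {
        "credit_ratio": credit_ratio,
        "credit_history": credit_history,
        "sub_grade": sub_grade,
        "grade": grade,
        "int_rate": int_rate,
        "installment": installment,
        "installment_ratio": installment / inputs["loan_amnt"],
    }
```

For example, a 10000 loan at 12% a year over 36 months gives an EMI of about 332.14.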

Feature Scaling: we scale the final array using 'StandardScaler()'.
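A sketch of the scaling step with scikit-learn's `StandardScaler`, using a toy two-column training matrix (illustrative values, not project data). The scaler is fitted on training data and then applied to the applicant's array with `transform`. If a scaler were instead fitted on the single input row, every scaled value would be 0, because the row's own mean is subtracted and its zero variance is replaced by a scale of 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training matrix (columns: annual income, FICO score); illustrative values only.
X_train = np.array([[50000.0, 700.0], [80000.0, 720.0], [65000.0, 680.0]])
scaler = StandardScaler().fit(X_train)

# A single applicant's final array, scaled with the training mean and std.
final_array = np.array([[70000.0, 710.0]])
final_scaled = scaler.transform(final_array)
```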

Step-6

Model Building: we use the Random Forest algorithm for our model and train it on the final array.

Step-7

Model Prediction: we now use the model to predict the output values.

We predict the loan status using three parameters:

- DTI
- Revol Util
- Output (the model's predicted risk probability)

Based on these, the loan status is decided as follows:

- If dti > 40.0, the loan is denied because the debt-to-income ratio is too high.
- If revol_util > 89, the loan is denied because credit usage is too high.
- If output > 0.43, the loan is denied because the predicted loan risk probability is high.

If none of these conditions is met, the loan is approved.
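The decision rules can be written as a small function. The thresholds (40.0, 89, 0.43) come from the rules above. The assumption is that "output" is the model's predicted probability of default, for example `model.predict_proba(final_scaled)[0, 1]`. The function name and message wording are hypothetical. All comparisons are strict, so a value exactly at a threshold does not cause a denial.

```python
def loan_decision(dti, revol_util, default_probability):
    """Apply the three denial rules in order; approve only if none fires."""
    if dti > 40.0:
        return "denied", "debt-to-income ratio is too high"
    if revol_util > 89:
        return "denied", "credit usage is too high"
    if default_probability > 0.43:
        return "denied", "predicted loan risk probability is high"
    return "approved", ""
```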

HEROKU DEPLOYMENT:

Requirements:

Below are the requirements and prerequisites for the deployment:

• Create a Heroku account
• Install Git
• Install the Heroku CLI
• Log in to the Heroku account
• Install gunicorn
• Declare app dependencies
• Create a Procfile
• Initialize git inside the project
• Create a Heroku app
• Add files to the Git repository and deploy
• Browse the deployed URL
• Download the full project

How to create a Heroku account:

Step 1: Sign up for a Heroku account at https://signup.heroku.com/

Step 2: Fill in the mandatory details, as shown in the screenshot below.

Step 3: Once you have verified your email address, you will be able to access the platform.

How to deploy a model on Heroku platform:

1. Log in to Heroku from the command prompt

You will be asked to enter your email ID and password. After a successful login, the next screen will look like the one below:

2. Install gunicorn

Gunicorn is a Python WSGI HTTP server for UNIX. It allows you to run a Python application concurrently by running multiple Python processes within a single dyno. The Gunicorn server is broadly compatible with various web frameworks, simply implemented, light on server resources, and fast. Install it with 'pip install gunicorn'.

3. Declare app dependencies

Create a requirements.txt file in the root directory of the project with 'pip freeze > requirements.txt'. The requirements.txt file lists all of the app's dependencies. When the app is deployed, Heroku reads this file and installs the listed Python dependencies using 'pip install -r requirements.txt'.

4. Create Procfile

The Procfile is a plain text file named Procfile, with no file extension, placed in the root directory of the project. It explicitly declares the command that should be executed to start your app.

F:\python-projects\flask-projects\flask-app\Procfile

A Heroku app's web process type is special: it is the only process type that can receive external HTTP traffic from Heroku's routers. If your app includes a web server, you should declare it as your app's web process. For this app, the Procfile entry is 'web: gunicorn app:app': the first app refers to the file name app.py, and the second app refers to the Flask instance inside the app.py file.

Repository structure (BHAVI2803/LENDING_CLUB_DEPLOYMENT):

- Images
- data
- model
- notebook
- static
- templates
- Procfile
- README.md
- app.py
- requirements.txt
