Lending Club is the largest online loan marketplace, facilitating personal loans, business loans, and financing of medical procedures. The aim of this case study is to analyze the data and identify the driving factors behind loan default.
- To derive regions from states for the analysis, we used an open-source dataset available on Kaggle.
- Link to the dataset: https://www.kaggle.com/omer2040/usa-states-to-region
- The dataset is also available in this GitHub repo under the name 'states.csv'.
- This project includes EDA performed on the loan lending company data set to derive insights that the business can use for decision making.
- The business problem is to identify the driving factors that result in loan defaults.
- The data set comes from the financial domain and contains information on approved loans that were either fully paid or charged off, along with many customer demographics and other attributes.
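The preparation described above (keeping only fully paid or charged-off loans and deriving a region from each state) could be sketched as follows. This is a minimal sketch, not the project's actual notebook code: the column names `loan_status`, `addr_state`, `State Code`, and `Region`, the toy rows, and the helper name `add_region` are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the loan data; column names and values are assumptions.
loans = pd.DataFrame({
    "loan_amnt": [5000, 12000, 8000, 20000],
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Fully Paid"],
    "addr_state": ["CA", "TX", "NY", "WA"],
})

# Toy stand-in for states.csv; in the project this would presumably be
# loaded with pd.read_csv("states.csv"). Column names are assumptions.
states = pd.DataFrame({
    "State Code": ["CA", "TX", "NY", "WA"],
    "Region": ["West", "South", "Northeast", "West"],
})


def add_region(loans_df, states_df):
    """Keep closed loans and attach a Region column (hypothetical helper)."""
    # Only loans with a final outcome are useful for default analysis.
    closed = loans_df[loans_df["loan_status"].isin(["Fully Paid", "Charged Off"])]
    # Left join so loans with an unmatched state are kept (Region becomes NaN).
    merged = closed.merge(
        states_df[["State Code", "Region"]],
        left_on="addr_state",
        right_on="State Code",
        how="left",
    )
    # The join key from the lookup table is redundant after the merge.
    return merged.drop(columns="State Code")


prepared = add_region(loans, states)
print(prepared)
```

A left join is used here so that a state missing from the lookup table shows up as a missing region instead of silently dropping the loan.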
EDA Results:
- We can expect the number of loan applications and approvals to increase every year.
- The largest number of defaulters comes from the Western region, while the largest number of loans was approved for the Southern region.
- Lending to a borrower with credit grade G could be risky, as 50% of the borrowers in this group default.
- People with low income tend to have less work experience, and the low-income group defaults the most.
- dti (debt-to-income ratio) is negatively correlated with the borrower's annual income; in the data, people with high income have the best (lowest) dti.
- Most loan applications come from people with a medium annual income.
- People who live in a mortgaged or rented home tend to default more often than homeowners.
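Findings like the default share per credit grade and the income/dti relationship can be reproduced with a group-wise mean and a correlation. The sketch below assumes columns named `grade`, `loan_status`, `annual_inc`, and `dti`, uses made-up toy rows, and introduces a hypothetical helper `default_rate_by`; none of these are taken from the project's notebook, and the toy numbers are not the study's results.

```python
import pandas as pd

# Toy data for illustration only; column names and values are assumptions.
loans = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "G", "G", "B", "B"],
    "loan_status": [
        "Fully Paid", "Fully Paid", "Fully Paid", "Charged Off",
        "Charged Off", "Fully Paid", "Fully Paid", "Fully Paid",
    ],
    "annual_inc": [30000, 45000, 60000, 90000, 25000, 40000, 75000, 120000],
    "dti": [22, 18, 15, 9, 25, 20, 12, 6],
})


def default_rate_by(df, column):
    """Share of charged-off loans per category of `column` (hypothetical helper)."""
    # True for defaulted loans; the mean of a boolean series is a proportion.
    is_default = df["loan_status"].eq("Charged Off")
    return is_default.groupby(df[column]).mean().sort_values(ascending=False)


# Default rate per credit grade, riskiest grade first.
rates = default_rate_by(loans, "grade")
print(rates)

# Pearson correlation between annual income and dti.
inc_dti_corr = loans["annual_inc"].corr(loans["dti"])
print(inc_dti_corr)
```

The same helper can be pointed at other categorical columns (for example a region or home-ownership column, if present) to compare default shares across groups.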
Libraries Used:
- pandas - latest version, for data manipulation
- matplotlib.pyplot - latest version, for plotting graphs
- seaborn - latest version, for plotting graphs
- numpy - latest version, for calculations
- missingno - latest version, for visualizing missing (NaN) values per column