Credit risk modeling using Python
Credit risk modeling sits at the intersection of data science and finance, and it is one of the most important activities conducted by banks. Having drawn increased attention since the recession, it plays a critical role in informing lending decisions and minimizing financial institutions' exposure to risk. The project uses the "Lending Club" data from Kaggle, which consists of two large datasets with over 400 thousand observations and around 40 variables in their initial form.
This project covers the complete process of credit risk modeling, from data preprocessing to calculating expected loss (EL). It focuses on both parametric and non-parametric machine learning methods for modeling probability of default (PD), loss given default (LGD), and exposure at default (EAD). It brings together several concepts from credit risk and machine learning, including Weight of Evidence, Information Value, Logistic Regression, K-Nearest Neighbours, and Support Vector Machines.
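As a rough illustration of two of the concepts named above, the sketch below computes Weight of Evidence (WoE) per category and the Information Value (IV) of a single categorical feature. It is a minimal example, not the project's actual code: the function name `woe_iv`, the toy columns `grade` and `good`, and the coding of the target as 1 = good (non-default) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def woe_iv(df, feature, target):
    """Return a per-category WoE table and the feature's Information Value.

    Assumption: `target` is coded 1 = good (non-default), 0 = bad (default).
    """
    # Count observations and goods per category (named aggregation).
    table = df.groupby(feature)[target].agg(n_obs="count", n_good="sum")
    table["n_bad"] = table["n_obs"] - table["n_good"]

    # Share of all goods / all bads that fall into each category.
    table["pct_good"] = table["n_good"] / table["n_good"].sum()
    table["pct_bad"] = table["n_bad"] / table["n_bad"].sum()

    # WoE = ln(share of goods / share of bads), one value per category.
    table["woe"] = np.log(table["pct_good"] / table["pct_bad"])

    # IV = sum over categories of (share of goods - share of bads) * WoE.
    iv = ((table["pct_good"] - table["pct_bad"]) * table["woe"]).sum()
    return table, iv


# Hypothetical toy data: grade A has 8 goods and 2 bads, grade B has 4 and 6.
toy = pd.DataFrame({
    "grade": ["A"] * 10 + ["B"] * 10,
    "good": [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6,
})

woe_table, iv = woe_iv(toy, "grade", "good")
print(woe_table[["n_good", "n_bad", "woe"]])
print("Information Value:", iv)
```

Note that a category containing no goods or no bads yields an infinite WoE in this sketch; in practice such categories are usually merged with neighbours before WoE is computed.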
- How to prepare data for credit risk modeling
- The mathematical foundations of credit risk modeling
- How to build and evaluate parametric and non-parametric models for PD, LGD, and EAD
- How to calculate expected loss using the modeled values
- How to compare the results obtained from different modeling methods
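The expected loss step in the list above can be sketched with the standard relation EL = PD × LGD × EAD, applied per loan and then summed over the portfolio. This is a minimal sketch under stated assumptions: the function `expected_loss`, the column names, and the numbers are hypothetical placeholders, not outputs of the project's models.

```python
import pandas as pd


def expected_loss(pd_hat, lgd_hat, ead_hat):
    """Per-loan expected loss: EL = PD * LGD * EAD.

    Assumes PD and LGD are proportions in [0, 1] and EAD is a currency amount.
    """
    return pd_hat * lgd_hat * ead_hat


# Hypothetical model outputs for three loans (illustrative values only).
loans = pd.DataFrame({
    "pd": [0.02, 0.10, 0.25],                # predicted probability of default
    "lgd": [0.40, 0.60, 0.80],               # predicted loss given default
    "ead": [10_000.0, 5_000.0, 2_000.0],     # predicted exposure at default
})

loans["el"] = expected_loss(loans["pd"], loans["lgd"], loans["ead"])
portfolio_el = loans["el"].sum()

print(loans)
print("Portfolio expected loss:", portfolio_el)
```

Dividing the portfolio expected loss by the total funded amount gives a loss ratio that makes results from different modeling methods easier to compare.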