Default Prediction for Loan

Business problem:

Banks face significant losses when customers default on their loans, which in turn harms the country's economic growth. Addressing this problem requires in-depth analysis of loan data to identify the factors that influence defaults, together with continuous review of loan status. The objective of this project is to develop a risk mitigation plan that lets the institution keep ongoing control over the status of the credits it has granted, so that banks can contact borrowers and minimize losses. To achieve this goal, we use several data science techniques: logistic regression, decision trees, random forests, and XGBoost. The project is treated as a binary classification problem, in which we intend to identify whether or not a client will default on their loan. The results of this analysis will be used to inform decision making and mitigate bank and investor losses, while promoting economic growth.

The objective is to identify the models with the best ACCURACY, that is, the proportion of all cases that are classified correctly: true positives plus true negatives, divided by the total number of cases.
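As a minimal sketch of this definition, accuracy can be computed by counting the predictions that match the true labels. The function name and the sample labels below are illustrative assumptions and do not come from the project's data.

```python
def accuracy(y_true, y_pred):
    """Return the proportion of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    # A correct prediction is either a true positive or a true negative.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = default, 0 = no default.
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]

acc = accuracy(y_true, y_pred)
print(f"accuracy = {acc:.2f}")
```

In this made-up example, 7 of the 10 predictions are correct, so the accuracy is 0.70.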

Finally, as a reminder, and as we will observe in this project, accuracy is a general performance metric that measures the overall correctness of the model. To maximize true positives and minimize false negatives, it is necessary to consider other metrics as well, such as sensitivity and specificity. Choosing the right metric depends on the goal and context of the classification problem.
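The following sketch illustrates why accuracy alone can mislead when defaults are rare. It uses hypothetical labels (not the project's data) and a hypothetical model that never predicts a default. The helper names are assumptions introduced for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1  # actual default that the model missed
        elif p == positive:
            fp += 1  # non-default flagged as default
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


# Hypothetical imbalanced data: 2 defaults (1) among 10 loans.
y_true = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
# Hypothetical model that always predicts "no default".
y_pred = [0] * 10

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = safe_ratio(tp + tn, tp + fp + tn + fn)
sensitivity = safe_ratio(tp, tp + fn)  # share of actual defaults caught
specificity = safe_ratio(tn, tn + fp)  # share of non-defaults recognized

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Here the accuracy is 0.80 and the specificity is 1.00, yet the sensitivity is 0.00: the model misses every default, which is exactly the case a lender most needs to catch.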