dinhdat07/credit_scoring_model

💳 Credit Default Risk Prediction

A production-ready machine learning web application to predict whether a credit card client will default in the next month.

This project uses a real-world dataset from the UCI Machine Learning Repository and combines feature engineering, model tuning, and SHAP explainability in an app deployed with Streamlit.


Demo

[Demo image]

Project Structure


credit_scoring_model/
├── app.py                 # main Streamlit app
├── requirements.txt       # python dependencies
├── models/                # trained model (.pkl)
├── data/                  
├── notebook/              # notebook for training & EDA

Problem Statement

Predict whether a customer will default on their credit card payment next month based on demographic data, bill/payment history, and credit behavior.

Dataset Description

  • Source: UCI Credit Card dataset ("Default of Credit Card Clients"), obtained from Kaggle
  • Rows: 30,000
  • Columns: 24 original + 5 engineered features
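Defaulters are the minority class here (about 22% of the 30,000 clients in the UCI data default next month), which is why the project uses SMOTE. SMOTE oversamples the minority class by creating synthetic points on the line segment between a minority sample and one of its k nearest minority neighbours. The stdlib sketch below illustrates only that interpolation step; `smote` is a hypothetical, simplified helper, not the imbalanced-learn implementation the project presumably relies on.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Simplified illustration only; hypothetical helper, not imbalanced-learn's SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself), Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Toy minority class: four 2-D points on the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=5, k=2)
print(new_points)
```

Whatever the implementation, resampling should touch only the training data. With imbalanced-learn this is usually done by placing `SMOTE` inside an `imblearn.pipeline.Pipeline`, so that cross-validation resamples each training fold separately; the README does not say how this project handles it.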

Input Features:

Column                   Description
LIMIT_BAL                Credit limit (NT dollars)
SEX                      Gender (1 = Male, 2 = Female)
EDUCATION                Education level (1 = Graduate school, 2 = University, ...)
MARRIAGE                 Marital status
AGE                      Age in years
PAY_0, PAY_2 to PAY_6    Repayment status for each of the last 6 months (the dataset has no PAY_1)
BILL_AMT1 to BILL_AMT6   Bill statement amount for each of the last 6 months (NT dollars)
PAY_AMT1 to PAY_AMT6     Amount paid in each of the last 6 months (NT dollars)

Engineered Features:

  • TOTAL_PAY_AMT: Total amount paid in 6 months
  • TOTAL_BILL_AMT: Total bill in 6 months
  • NUM_LATE_PAYMENTS: Number of late payments
  • MAX_DELAY: Longest delay (in months)
  • LONGEST_LATE_STREAK: Longest continuous months of delay
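The README does not show how the five engineered features are computed. The sketch below is one plausible reading, assuming that a repayment-status value greater than 0 means a payment delayed by that many months (as in the UCI dataset description) and that each client is a plain dict keyed by the original column names. The helpers `engineer_features` and `longest_late_streak` are hypothetical names, not the author's code.

```python
# Hypothetical sketch: the README does not show the author's exact formulas.
PAY_STATUS_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
BILL_COLS = [f"BILL_AMT{i}" for i in range(1, 7)]
PAY_AMT_COLS = [f"PAY_AMT{i}" for i in range(1, 7)]


def longest_late_streak(statuses):
    """Longest run of consecutive months with a positive (late) repayment status."""
    best = current = 0
    for status in statuses:
        current = current + 1 if status > 0 else 0
        best = max(best, current)
    return best


def engineer_features(row):
    """Compute the five engineered features for one client (row: column -> value)."""
    statuses = [row[c] for c in PAY_STATUS_COLS]
    return {
        "TOTAL_PAY_AMT": sum(row[c] for c in PAY_AMT_COLS),
        "TOTAL_BILL_AMT": sum(row[c] for c in BILL_COLS),
        "NUM_LATE_PAYMENTS": sum(1 for s in statuses if s > 0),
        "MAX_DELAY": max(0, max(statuses)),  # 0 if the client was never late
        "LONGEST_LATE_STREAK": longest_late_streak(statuses),
    }


# Toy client: late by 1, 2 and 3 months in consecutive months, then on time, then late again.
row = {"PAY_0": 0, "PAY_2": 1, "PAY_3": 2, "PAY_4": 3, "PAY_5": -1, "PAY_6": 1}
row.update({c: 1000 for c in BILL_COLS})
row.update({c: 100 * (i + 1) for i, c in enumerate(PAY_AMT_COLS)})
print(engineer_features(row))
```

With pandas, the same function could be applied row by row, and the two totals vectorised with `df[BILL_COLS].sum(axis=1)` and `df[PAY_AMT_COLS].sum(axis=1)`.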

Tech Stack

  • Language: Python 3.12
  • ML Model: XGBoost Classifier
  • Tuning: GridSearchCV
  • Imbalance Handling: SMOTE
  • Explainability: SHAP (Waterfall plot)
  • Deployment: Streamlit
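GridSearchCV scores every combination in a parameter grid by cross-validation and keeps the best one. The stdlib sketch below shows only the exhaustive-search part. The grid values are hypothetical (the README does not list the grid that was searched), although `max_depth`, `learning_rate` and `n_estimators` are real XGBClassifier parameters. `toy_score` is a made-up stand-in for what would in practice be a mean cross-validated ROC-AUC.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # in practice: mean cross-validated ROC-AUC
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid using real XGBClassifier parameter names.
param_grid = {
    "max_depth": [3, 4, 5],
    "learning_rate": [0.05, 0.1, 0.2],
    "n_estimators": [100, 200],
}


def toy_score(params):
    # Stand-in for cross-validation: peaks at max_depth=4, learning_rate=0.1
    # and slightly rewards more trees. Not a real model evaluation.
    return (-abs(params["max_depth"] - 4)
            - abs(params["learning_rate"] - 0.1)
            + params["n_estimators"] / 1000)


best, score = grid_search(param_grid, toy_score)
print(best, score)
```

In scikit-learn the equivalent would be along the lines of `GridSearchCV(XGBClassifier(), param_grid, scoring="roc_auc", cv=5)`. The cost grows multiplicatively with every parameter added: this grid already has 3 × 3 × 2 = 18 candidates, each fitted once per fold.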

How to Run Locally

  1. Clone the repository

    git clone https://github.com/dinhdat07/credit_scoring_model.git
    cd credit_scoring_model
  2. Create and activate a virtual environment (run only the activation line for your OS)

    python -m venv venv
    venv\Scripts\activate  # Windows
    source venv/bin/activate  # Mac/Linux
  3. Install dependencies

    pip install -r requirements.txt
  4. Run the app

    streamlit run app.py

Model Performance

Metric      Score
ROC-AUC     0.7815
Accuracy    82%

Tuning: GridSearchCV. Explainability: SHAP waterfall plots.
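The two scores measure different things. ROC-AUC is threshold-free: it is the probability that a randomly chosen defaulter gets a higher predicted score than a randomly chosen non-defaulter (ties count half). Accuracy depends on a decision threshold, assumed here to be 0.5. Because only about 22% of clients in the UCI data default, a model that always predicts "no default" already reaches roughly 78% accuracy, so the 82% figure is best read alongside ROC-AUC. A minimal stdlib sketch of both definitions on toy data (the helper names are hypothetical):

```python
def roc_auc(y_true, scores):
    """P(random positive scored above random negative); ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the more frequent class."""
    positive_rate = sum(y_true) / len(y_true)
    return max(positive_rate, 1 - positive_rate)


# Toy labels (1 = default) and predicted default probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores), accuracy(y_true, scores), majority_baseline(y_true))
```

The pairwise loop is O(n_pos × n_neg) and only meant to show the definition; `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.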

SHAP Visualization

We use SHAP waterfall plots to explain the prediction for each individual customer:

[SHAP waterfall plot images]
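A SHAP waterfall plot starts from the explainer's base value (the model's expected output) and adds one SHAP value per feature until it reaches the model's output for that customer; this additivity is SHAP's local-accuracy property. For an XGBoost classifier explained with `shap.TreeExplainer`, these values are by default in log-odds units. The sketch below reproduces the running totals with hypothetical numbers, ordering features by magnitude, and converts the final log-odds to a probability. It does not call the shap library, and `waterfall_steps` is a hypothetical helper.

```python
import math


def waterfall_steps(base_value, shap_values):
    """Running totals from the base value, adding SHAP values largest-magnitude first."""
    steps = []
    total = base_value
    for feature, value in sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True):
        total += value
        steps.append((feature, value, total))
    return steps


def log_odds_to_probability(z):
    """Logistic (sigmoid) transform from log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical base value and SHAP values (in log-odds) for one customer.
base_value = -1.2
shap_values = {"PAY_0": 1.0, "LIMIT_BAL": -0.3, "MAX_DELAY": 0.5}

steps = waterfall_steps(base_value, shap_values)
for feature, value, running in steps:
    print(f"{feature:>10} {value:+.2f} -> {running:+.2f}")
final_log_odds = steps[-1][2]
print(f"P(default) = {log_odds_to_probability(final_log_odds):.2f}")
```

Reading the plot this way makes clear that a feature's bar shifts the log-odds, not the probability directly, so equal-sized bars can move the final probability by different amounts.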

Future Improvements

  • Add login/authentication for secure access
  • Store prediction history in a database
  • Retrain the model incrementally on real-time data

🧑‍💻 Author

Đạt Đình
2nd Year @ UET, Data Engineering Track
Email: dinhdatnguyen0710@example.com

📜 License

MIT License: free to use and modify.

🙌 Acknowledgments

  • UCI Credit Card Dataset
  • Streamlit community
  • SHAP and XGBoost developers

About

A Streamlit-based machine learning app that predicts the likelihood of a credit card customer defaulting next month.
