A production-ready machine learning web application to predict whether a credit card client will default in the next month.
This project uses a real-world dataset from the UCI Machine Learning Repository, combined with feature engineering, model tuning, and SHAP explainability — all deployed via Streamlit.
credit_scoring_model/
├── app.py # main Streamlit app
├── requirements.txt # python dependencies
├── models/ # trained model (.pkl)
├── data/
├── notebook/ # notebook for training & EDA
Predict whether a customer will default on their credit card payment next month based on demographic data, bill/payment history, and credit behavior.
- Source: UCI Machine Learning Repository (Default of Credit Card Clients), also available on Kaggle
- Rows: 30,000
- Columns: 24 original + 5 engineered features
| Column | Description |
|---|---|
| LIMIT_BAL | Credit limit (NT dollars) |
| SEX | Gender (1 = Male, 2 = Female) |
| EDUCATION | Education level (1 = Graduate, 2 = University, ...) |
| MARRIAGE | Marital status |
| AGE | Age in years |
| PAY_0 to PAY_6 | Repayment status (last 6 months) |
| BILL_AMT1 to BILL_AMT6 | Monthly bill amounts |
| PAY_AMT1 to PAY_AMT6 | Monthly payment amounts |
- TOTAL_PAY_AMT: Total amount paid in 6 months
- TOTAL_BILL_AMT: Total bill in 6 months
- NUM_LATE_PAYMENTS: Number of late payments
- MAX_DELAY: Longest delay (in months)
- LONGEST_LATE_STREAK: Longest continuous months of delay
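
The five engineered features could be derived roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the column names of the public UCI/Kaggle file (which has `PAY_0` and `PAY_2` to `PAY_6`, with no `PAY_1`) and assumes that a repayment status greater than 0 means a late payment. The helper names `longest_streak` and `add_engineered_features` are hypothetical.

```python
import pandas as pd

# Assumed column names from the public UCI/Kaggle file (note: no PAY_1).
PAY_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
BILL_COLS = [f"BILL_AMT{i}" for i in range(1, 7)]
PAY_AMT_COLS = [f"PAY_AMT{i}" for i in range(1, 7)]


def longest_streak(flags) -> int:
    """Return the length of the longest run of consecutive True values."""
    best = current = 0
    for late in flags:
        current = current + 1 if late else 0
        best = max(best, current)
    return best


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the five engineered columns appended."""
    out = df.copy()
    # Assumption: repayment status > 0 means "late by that many months".
    late = out[PAY_COLS] > 0
    out["TOTAL_PAY_AMT"] = out[PAY_AMT_COLS].sum(axis=1)
    out["TOTAL_BILL_AMT"] = out[BILL_COLS].sum(axis=1)
    out["NUM_LATE_PAYMENTS"] = late.sum(axis=1)
    out["MAX_DELAY"] = out[PAY_COLS].clip(lower=0).max(axis=1)
    out["LONGEST_LATE_STREAK"] = late.apply(longest_streak, axis=1)
    return out


# One made-up customer row, for illustration only.
sample = pd.DataFrame([{
    "PAY_0": 2, "PAY_2": 1, "PAY_3": 0, "PAY_4": -1, "PAY_5": 1, "PAY_6": 0,
    **{c: 1000 for c in BILL_COLS},
    **{c: 200 for c in PAY_AMT_COLS},
}])
features = add_engineered_features(sample)
```

For the sample row, three of the six months are late, the worst delay is 2 months, and the longest late streak is 2 months.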
- Language: Python 3.12
- ML Model: XGBoost Classifier
- Tuning: GridSearchCV
- Imbalance Handling: SMOTE
- Explainability: SHAP (Waterfall plot)
- Deployment: Streamlit
1. Clone the repository

       git clone https://github.com/your-username/credit-scoring-model.git
       cd credit-scoring-model

2. Create virtual environment

       python -m venv venv
       venv\Scripts\activate      # Windows
       source venv/bin/activate   # Mac/Linux

3. Install dependencies

       pip install -r requirements.txt

4. Run the app

       streamlit run app.py

| Metric | Score |
|---|---|
| ROC-AUC | 0.7815 |
| Accuracy | 82% |
| Tuning | GridSearchCV |
| Explainability | SHAP waterfall |
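
The two scores in the table measure different things: ROC-AUC is computed from predicted probabilities, while accuracy is computed from hard labels after thresholding. The toy example below shows the distinction with scikit-learn; the numbers are made up and the 0.5 threshold is an assumption, not necessarily the one used in this project.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up labels and predicted default probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

# ROC-AUC ranks the probabilities directly.
auc = roc_auc_score(y_true, y_prob)  # 0.75

# Accuracy needs hard labels; 0.5 is an assumed threshold.
y_pred = [int(p >= 0.5) for p in y_prob]
acc = accuracy_score(y_true, y_pred)  # 0.75
```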
We use SHAP waterfall plots to explain the prediction for each individual customer.

- Add login/auth for secure access
- Store prediction history to database
- Train model incrementally with real-time data
Đạt Đình
2nd Year @ UET, Data Engineering Track
Email: dinhdatnguyen0710@example.com
MIT License — Free to use and modify
- UCI Credit Card Dataset
- Streamlit community
- SHAP and XGBoost developers