Customer churn is a major challenge in the banking industry because every customer who leaves directly reduces revenue.
This project focuses on building a Machine Learning-based churn prediction system using XGBoost, along with an interactive Streamlit dashboard to visualize model performance and insights.
The system I developed uses historical banking data to identify customers who are likely to churn.
By surfacing churn patterns and customer behavior, the model helps businesses take proactive retention measures.
- Programming Language: Python
- Libraries: Pandas, NumPy, Matplotlib
- Machine Learning: Scikit-learn, XGBoost Classifier
- Web Framework: Streamlit
- Model Persistence: Pickle
- Data preprocessing using Label Encoding and One-Hot Encoding
- Splitting the dataset into training and testing sets
- Training an XGBoost Classification model
- Model evaluation using:
  - Accuracy Score
  - Confusion Matrix
- Performance analysis using:
  - ROC Curve
  - AUC Score
- Bias vs Accuracy analysis to check overfitting
- K-Fold Cross Validation for model stability and reliability
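
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`Geography`, `Gender`, `Age`, `Balance`, `Exited`) and hyperparameters are hypothetical, and "Bias vs Accuracy" is assumed here to mean comparing training accuracy with test accuracy. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier` so it still runs.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder

try:
    from xgboost import XGBClassifier as Classifier
except ImportError:  # fallback (assumption) so the sketch runs without xgboost
    from sklearn.ensemble import GradientBoostingClassifier as Classifier

# Synthetic stand-in for the banking dataset; all columns are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Geography": rng.choice(["France", "Spain", "Germany"], size=n),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 80, size=n),
    "Balance": rng.uniform(0, 200_000, size=n).round(2),
})
# Illustrative target only: older customers churn more often.
churn_prob = 1 / (1 + np.exp(-(df["Age"] - 50) / 8))
df["Exited"] = (rng.random(n) < churn_prob).astype(int)

# Label Encoding for the binary column, One-Hot Encoding for the multi-class one.
df["Gender"] = LabelEncoder().fit_transform(df["Gender"])
df = pd.get_dummies(df, columns=["Geography"], dtype=int)

# Train/test split, stratified so both sets keep the same churn rate.
X, y = df.drop(columns="Exited"), df["Exited"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

def make_model():
    # Hypothetical hyperparameters, chosen only to keep the example fast.
    return Classifier(n_estimators=100, max_depth=3, learning_rate=0.1, random_state=0)

model = make_model().fit(X_train, y_train)

# Accuracy and confusion matrix on the held-out test set.
y_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# ROC curve and AUC from the predicted churn probabilities.
y_score = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

# Overfitting check: a large gap between these two values is a warning sign.
train_acc = accuracy_score(y_train, model.predict(X_train))

# K-Fold cross validation, written as an explicit loop to show each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = []
for train_idx, val_idx in cv.split(X, y):
    fold_model = make_model().fit(X.iloc[train_idx], y.iloc[train_idx])
    fold_pred = fold_model.predict(X.iloc[val_idx])
    cv_scores.append(accuracy_score(y.iloc[val_idx], fold_pred))

print(f"train accuracy: {train_acc:.3f}  test accuracy: {test_acc:.3f}  AUC: {auc:.3f}")
print(f"5-fold accuracy: {np.mean(cv_scores):.3f} +/- {np.std(cv_scores):.3f}")
print(cm)
```

A small spread across the fold scores suggests the model's accuracy does not depend heavily on one particular split.
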
- Clean and modern user interface
- Interactive dataset preview
- Confusion Matrix visualization
- ROC Curve with AUC score
- Actual vs Predicted churn comparison
- Bias vs Accuracy visualization
- Gained hands-on experience with end-to-end Machine Learning pipelines
- Learned effective feature encoding and preprocessing techniques
- Improved understanding of model evaluation and validation
- Built an interactive ML dashboard using Streamlit
- Understood the importance of interpretability in real-world business problems
- Tuning hyperparameters with GridSearchCV
- Adding a feature importance visualization
- Deploying the application to a cloud platform
- Supporting real-time predictions
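
As a rough idea of how the GridSearchCV and feature importance items might look, here is a small sketch. It is not part of the current project: the data is synthetic, the feature names (`Age`, `Balance`, `NumOfProducts`) and the parameter grid are hypothetical, and scikit-learn's `GradientBoostingClassifier` is used as a fallback when `xgboost` is unavailable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV

try:
    from xgboost import XGBClassifier as Classifier
except ImportError:  # fallback (assumption) so the sketch runs without xgboost
    from sklearn.ensemble import GradientBoostingClassifier as Classifier

# Synthetic, purely illustrative data with hypothetical feature names.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Age": rng.integers(18, 80, size=300),
    "Balance": rng.uniform(0, 200_000, size=300),
    "NumOfProducts": rng.integers(1, 5, size=300),
})
y = (rng.random(300) < 1 / (1 + np.exp(-(X["Age"] - 50) / 8))).astype(int)

# A deliberately tiny grid; a real search would cover more values.
param_grid = {"max_depth": [2, 4], "learning_rate": [0.05, 0.2]}
search = GridSearchCV(
    Classifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X, y)

# Feature importances of the best model, sorted for plotting as a bar chart.
importances = pd.Series(
    search.best_estimator_.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(search.best_params_)
print(importances)
```
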