This project is part of my AI/ML Engineering Internship at DevelopersHub Corporation, Islamabad.
- Company: DevelopersHub Corporation, Islamabad 🇵🇰
- Internship Period: July - September 2025
This project implements an end-to-end machine learning pipeline for predicting customer churn. The goal is to identify customers who are likely to discontinue using a service, enabling proactive retention strategies.
- App screenshots: see the `app-screenshots/` directory
- Live app: follow the link in the repository's About section
- Data preprocessing and feature engineering
- Model training with hyperparameter tuning
- Web-based interface for predictions
- Input validation and error handling
- Model performance visualization
- Clone the repository:
  `git clone <repository-url>`
  `cd customer-churn-predictor`
- Create and activate a virtual environment:
  `python -m venv churn-env`
  `source churn-env/bin/activate` (on Windows: `churn-env\Scripts\activate`)
- Install the required packages:
  `pip install -r requirements.txt`
- Run the web app:
  `streamlit run app.py`
  Then open your browser and navigate to http://localhost:8501
- Dataset source: Kaggle
- Dataset: The model is trained on a dataset containing 50,000 customer records with the following features:
  - `tenure_months`: Number of months as a customer
  - `monthly_usage_hours`: Average monthly usage hours
  - `has_multiple_devices`: Whether the customer uses multiple devices (0/1)
  - `customer_support_calls`: Number of support calls made
  - `payment_failures`: Number of payment failures
  - `is_premium_plan`: Whether the customer is on a premium plan (0/1)
  - `churn`: Target variable (0 = No churn, 1 = Churn)
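As a quick sanity check on the feature list above, the sketch below verifies that a loaded DataFrame has the expected columns and that the 0/1 flags really are binary. The helper name `check_schema` is hypothetical and not part of the repository; only the column names come from the list above.

```python
import pandas as pd

# Column names taken from the feature list above.
EXPECTED_COLUMNS = [
    "tenure_months",
    "monthly_usage_hours",
    "has_multiple_devices",
    "customer_support_calls",
    "payment_failures",
    "is_premium_plan",
    "churn",
]
BINARY_COLUMNS = ["has_multiple_devices", "is_premium_plan", "churn"]


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col in BINARY_COLUMNS:
        if col in df.columns:
            # Ignore NaN here; missing values are a separate preprocessing concern.
            not_binary = ~df[col].dropna().isin([0, 1])
            if not_binary.any():
                problems.append(f"{col} has values other than 0/1")
    return problems
```

A typical call, assuming the CSV sits where the project structure shows it, would be `check_schema(pd.read_csv("dataset/customer_churn_dataset.csv"))`.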
- Handling missing values
- Feature scaling
- Encoding categorical variables
- Train-test split (80-20)
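The preprocessing steps above might be wired together roughly as follows. This is a sketch under assumptions, not the notebook's actual code: the imputation strategies, the stratified split, and the `random_state` are illustrative choices. Because the listed features are already numeric or 0/1, no one-hot encoder appears here; a `OneHotEncoder` branch could be added if the raw data has string categories.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

NUMERIC = ["tenure_months", "monthly_usage_hours",
           "customer_support_calls", "payment_failures"]
BINARY = ["has_multiple_devices", "is_premium_plan"]


def build_preprocessor() -> ColumnTransformer:
    """Impute and scale numeric columns; impute (but do not scale) 0/1 flags."""
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # assumed strategy
        ("scale", StandardScaler()),
    ])
    binary_steps = SimpleImputer(strategy="most_frequent")  # assumed strategy
    return ColumnTransformer([
        ("num", numeric_steps, NUMERIC),
        ("bin", binary_steps, BINARY),
    ])


def split(df: pd.DataFrame, seed: int = 42):
    """80-20 train-test split, stratified on the churn label (an assumption)."""
    X = df[NUMERIC + BINARY]
    y = df["churn"]
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)
```

Fitting the preprocessor on the training split only, then applying it to the test split, keeps test-set statistics out of the scaler and imputers.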
- Implemented using scikit-learn's Pipeline API
- Models evaluated:
  - Logistic Regression
  - Random Forest Classifier
- Hyperparameter tuning using GridSearchCV
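To illustrate how the two models could be compared with `Pipeline` and `GridSearchCV`, here is a minimal sketch. The parameter grids, the `f1` scoring choice, the step names, and the helper names `tune_all` and `pick_best` are all assumptions made for the example; the notebook's real search space is not given in this README.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical search spaces, kept small so the sketch runs quickly.
CANDIDATES = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 10]},
    ),
}


def tune_all(X, y, cv=5, scoring="f1"):
    """Grid-search every candidate; return {name: fitted GridSearchCV}."""
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        # Scaling lives inside the pipeline so each CV fold is scaled
        # using only its own training portion.
        pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
        search = GridSearchCV(pipe, grid, cv=cv, scoring=scoring)
        results[name] = search.fit(X, y)
    return results


def pick_best(results):
    """Return (name, search) for the highest mean cross-validated score."""
    return max(results.items(), key=lambda kv: kv[1].best_score_)
```

After `name, search = pick_best(tune_all(X_train, y_train))`, the refit pipeline is available as `search.best_estimator_`.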
- Model evaluation using:
  - Accuracy
  - Precision
  - Recall
  - F1-Score
  - AUC-ROC
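The five metrics listed above can all be computed with `sklearn.metrics`. The sketch below is one way to gather them; the helper name `evaluate` is hypothetical. Note that AUC-ROC takes predicted probabilities (or scores), while the other four take hard 0/1 predictions.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def evaluate(y_true, y_pred, y_score):
    """Compute the README's five metrics.

    y_pred  : hard 0/1 predictions
    y_score : predicted probability of the positive (churn) class
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "auc_roc": roc_auc_score(y_true, y_score),
    }
```

With a fitted classifier this would typically be called as `evaluate(y_test, model.predict(X_test), model.predict_proba(X_test)[:, 1])`.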
- Built with Streamlit
- Input validation
- Real-time predictions
- Model performance visualization
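Input validation for the prediction form could look something like the sketch below. It is deliberately free of Streamlit calls so it can be tested on its own. The numeric bounds in `RULES` are placeholders invented for this example; the project's real validation rules are stored in `model_info.pkl` and are not reproduced here.

```python
import math

# Hypothetical bounds for illustration only; the real rules live in model_info.pkl.
RULES = {
    "tenure_months": {"min": 0, "max": 120, "integer": True},
    "monthly_usage_hours": {"min": 0.0, "max": 744.0, "integer": False},
    "has_multiple_devices": {"min": 0, "max": 1, "integer": True},
    "customer_support_calls": {"min": 0, "max": 100, "integer": True},
    "payment_failures": {"min": 0, "max": 100, "integer": True},
    "is_premium_plan": {"min": 0, "max": 1, "integer": True},
}


def validate_inputs(values: dict, rules: dict = RULES) -> list:
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    for name, rule in rules.items():
        if name not in values:
            errors.append(f"{name}: missing")
            continue
        value = values[name]
        # Reject bools explicitly: bool is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: must be a number")
            continue
        if not math.isfinite(value):
            errors.append(f"{name}: must be finite")
            continue
        if rule["integer"] and float(value) != int(value):
            errors.append(f"{name}: must be a whole number")
        if not rule["min"] <= value <= rule["max"]:
            errors.append(
                f"{name}: must be between {rule['min']} and {rule['max']}"
            )
    return errors
```

In a Streamlit app, each returned message could be displayed with `st.error(...)`, and the model would only be called when the list is empty.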
| Metric | Score |
|---|---|
| Accuracy | 0.984 |
| Precision | 0.683 |
| Recall | 0.406 |
| F1-Score | 0.509 |
| AUC-ROC | 0.974 |
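The table's figures can be cross-checked with a little arithmetic. The snippet below recomputes F1 from the reported precision and recall, and then, assuming all metrics were measured on the same set of predictions, backs out the churn rate those numbers imply. That last step is a derivation from the table, not a figure stated by the project.

```python
accuracy, precision, recall = 0.984, 0.683, 0.406

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.509, matching the table

# Share of actual churners the model fails to flag.
missed_share = 1 - recall

# Let p be the fraction of positives (churners). Then, as fractions of all rows:
#   false negatives = p * (1 - recall)
#   false positives = p * recall * (1 / precision - 1)
# and the error rate (1 - accuracy) is their sum, which can be solved for p.
errors_per_positive = (1 - recall) + recall * (1 / precision - 1)
implied_churn_rate = (1 - accuracy) / errors_per_positive
print(f"implied churn rate: {implied_churn_rate:.1%}")
```

Under that assumption the implied churn rate is roughly 2%, which would explain how accuracy can be very high while recall stays low: with so few churners, predicting "no churn" is right most of the time.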
- The Random Forest model outperformed Logistic Regression in terms of overall accuracy and AUC-ROC score.
- The most important features for predicting churn were found to be:
  - `tenure_months`
  - `monthly_usage_hours`
  - `payment_failures`
- The model shows consistent performance on the test set, though a recall of 0.406 means that roughly 59% of actual churners are not flagged.
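A feature ranking like the one above is commonly read from a fitted random forest's `feature_importances_` attribute. The sketch below shows one way to do that; the helper name `rank_features` is hypothetical, and whether the notebook used this impurity-based measure or another method (such as permutation importance) is not stated here.

```python
import pandas as pd


def rank_features(fitted_forest, feature_names):
    """Return impurity-based importances as a Series, largest first."""
    importances = pd.Series(
        fitted_forest.feature_importances_, index=list(feature_names)
    )
    return importances.sort_values(ascending=False)
```

If the forest is the last step of a scikit-learn `Pipeline`, it can be retrieved with `pipeline.named_steps[...]` using whatever step name the pipeline was built with.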
ChurnPredictor/
├── app-screenshots/ # Snapshots of the App
├── dataset/ # Dataset directory
│ └── customer_churn_dataset.csv
├── .gitignore # Git ignore file
├── README.md # This file
├── app.py # Streamlit web application
├── customer_churn_pipeline.ipynb # Jupyter notebook for model development
├── best_churn_model_pipeline.pkl # Trained model pipeline
├── model_info.pkl # Model metadata and validation rules
└── requirements.txt # Dependencies
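Given the files listed above, the saved pipeline could be loaded and used outside the web app roughly as follows. This sketch assumes that `best_churn_model_pipeline.pkl` was written with `joblib`, that it accepts a DataFrame with the six feature columns in this order, and that it exposes `predict_proba`; none of this is confirmed by the README. The helper names are hypothetical.

```python
import joblib
import pandas as pd

# Assumed input columns, taken from the dataset description.
FEATURES = [
    "tenure_months",
    "monthly_usage_hours",
    "has_multiple_devices",
    "customer_support_calls",
    "payment_failures",
    "is_premium_plan",
]


def load_pipeline(path="best_churn_model_pipeline.pkl"):
    """Load a pipeline previously saved with joblib.dump."""
    return joblib.load(path)


def predict_churn(pipeline, customer: dict) -> float:
    """Return the predicted churn probability for one customer record."""
    # Build a one-row frame with columns in the assumed training order.
    row = pd.DataFrame([customer], columns=FEATURES)
    return float(pipeline.predict_proba(row)[0, 1])
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.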
- Python 3.8+
- pandas
- numpy
- scikit-learn
- streamlit
- matplotlib
- seaborn
- joblib