Using machine learning to predict which customers are likely to leave.
Analyzes telecommunications customer data to understand churn patterns and build a predictive model.
- 7,043 customers with service, contract, and billing data
- 26.5% churn rate identified
- 80% model accuracy achieved with Random Forest
- $139,000+ monthly revenue loss from churned customers
- Month-to-month contracts: 42.7% churn rate
- One-year contracts: 11.3% churn rate
- Two-year contracts: 2.8% churn rate
- Electronic check payments: 45.3% churn rate (highest risk)
- Credit card payments: 15.2% churn rate
- Bank transfer payments: 16.2% churn rate
- Senior citizens: 41.7% churn rate
- Customers without partners: 32.9% churn rate
- New customers: highest risk group
- Model: Random Forest Classifier
- Accuracy: 80%
- AUC Score: 0.8406
- High-risk customers identified: 109 customers
- Top predictive factors: Tenure, Total Charges, Monthly Charges, Contract Type
- Focus on contracts: Encourage longer-term contracts
- Fix payment issues: Migrate electronic check users to automatic payments
- Improve fiber service: Address quality problems (30.9% churn rate)
- Support new customers: Extra attention in first few months
- Target high-risk customers: Use ML model for proactive retention
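The modeling and high-risk targeting steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it runs on synthetic data, and the column names, the hyperparameters, and the `HIGH_RISK_THRESHOLD` cutoff are all assumptions made for the example, so it will not reproduce the reported 80% accuracy or 0.8406 AUC.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption: illustrative columns only).
rng = np.random.default_rng(42)
n = 2000
tenure = rng.integers(1, 73, size=n)
monthly = rng.uniform(20, 120, size=n)
contract = rng.choice(
    ["Month-to-month", "One year", "Two year"], size=n, p=[0.55, 0.21, 0.24]
)
# Simulated churn: more likely for short tenure and month-to-month contracts.
logit = -1.0 - 0.04 * tenure + 0.01 * monthly + 1.2 * (contract == "Month-to-month")
churn = rng.random(n) < 1 / (1 + np.exp(-logit))

df = pd.DataFrame({
    "tenure": tenure,
    "MonthlyCharges": monthly,
    "TotalCharges": tenure * monthly,
    "Contract": contract,
    "Churn": churn.astype(int),
})

# One-hot encode the categorical contract type; numeric columns pass through.
X = pd.get_dummies(df.drop(columns="Churn"), columns=["Contract"])
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Predicted churn probability for each held-out customer.
proba = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, proba)

# Hypothetical cutoff for flagging customers for proactive retention.
HIGH_RISK_THRESHOLD = 0.7
high_risk = X_test[proba >= HIGH_RISK_THRESHOLD]

# Rank features by the forest's impurity-based importance.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"accuracy={accuracy:.3f}  auc={auc:.3f}  high-risk={len(high_risk)}")
print(importances.head())
```

The threshold trades coverage against precision: a lower cutoff flags more customers for retention outreach but includes more who would have stayed anyway.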
- Setup environment:

      python3 -m venv .venv
      source .venv/bin/activate
      pip install -r requirements.txt

- Run analysis:

      jupyter notebook Churn_analysis.ipynb

- View results:
  - Charts saved to the `figures/` folder
  - Model identifies high-risk customers
  - Financial impact calculated
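The churn-rate and financial-impact figures can be approximated with a few lines of pandas. The sketch below uses a tiny inline table rather than the real data, and the column names (`Contract`, `MonthlyCharges`, `Churn`) and the "Yes"/"No" encoding are assumptions about the dataset's layout. As a rough sanity check on the reported numbers, 26.5% of 7,043 customers is about 1,866 churned customers, so a $139,000 monthly loss implies roughly $74 in monthly charges per churned customer.

```python
import pandas as pd

# Tiny illustrative table; the real analysis would load Customer-Churn.csv instead.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "MonthlyCharges": [70.0, 90.0, 60.0, 40.0, 80.0, 100.0],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Boolean mask of churned customers.
churned = df["Churn"].eq("Yes")

# Overall churn rate: share of customers who left.
churn_rate = churned.mean()

# Monthly revenue loss: sum of monthly charges for churned customers.
monthly_loss = df.loc[churned, "MonthlyCharges"].sum()

# Churn rate per contract type, highest first.
rate_by_contract = churned.groupby(df["Contract"]).mean().sort_values(ascending=False)

print(f"Churn rate: {churn_rate:.1%}")
print(f"Monthly revenue loss: ${monthly_loss:,.0f}")
print(rate_by_contract)
```

The same `groupby` pattern applies to any other segment column, such as payment method or senior-citizen status.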
- `Churn_analysis.ipynb` - Main analysis notebook
- `Customer-Churn.csv` - Dataset (7,043 records)
- `requirements.txt` - Dependencies (pandas, numpy, matplotlib, sklearn)
- `figures/` - Generated visualizations
Competition Entry: Data Analytics with AI - Contest #1 | Date: September 30, 2025