- Customer Churn Rate (also known as attrition rate) refers to the percentage of customers who stop doing business with a company over a given period. It is a key metric used to measure customer retention and business performance.
- The customer churn project aims to predict customer churn in advance using machine learning algorithms. By analyzing historical customer data and the factors that influence churn, the model helps businesses take preventive action to reduce it.
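As a quick illustration of the definition above, the churn rate for a period can be computed as the number of customers lost divided by the number of customers at the start of the period. This is a minimal sketch; the function name and the figures are hypothetical and are not taken from the project.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the churn rate for a period as a percentage."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical example: 150 of 2000 customers left during the period.
rate = churn_rate(customers_at_start=2000, customers_lost=150)
print(f"{rate:.1f}%")  # 7.5%
```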
- Developed a machine learning model to predict whether a customer of a telecommunication company will churn.
- Followed a modular structure for the entire project.
- Utilized data of over 7000 records to train and develop the model.
- Cleaned and preprocessed the raw data.
- Performed feature transformation, scaled the numerical features and handled imbalance in the dataset.
- Trained the model using various ML algorithms and selected the one with the highest accuracy.
- Deployed the model using a Flask web application for real-time predictions.
- Used the company's historical data of over 7000 records, which includes demographic details, subscribed services, and account information.
- For each customer the following information is available:
- Gender
- Senior Citizen
- Partner
- Dependents
- Tenure
- Phone Service
- Multiple Lines
- Internet Service
- Online Security
- Online Backup
- Device Protection
- Tech Support
- Streaming TV
- Streaming Movies
- Contract Type
- Paperless Billing
- Payment Method
- Monthly Charges
- Total Charges
- Cleaned and preprocessed the raw data:
- Handled missing values.
- Removed duplicate records.
- Removed outliers using z-scores to reduce the risk of overfitting.
- Replaced boolean values with numerical values.
- Grouped the values of the tenure column into 12-month bins so that tenure is easier to interpret.
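The outlier removal and tenure binning steps above could look roughly like the following. This is a sketch under assumptions, not the project's actual code: the column names (`tenure`, `MonthlyCharges`), the z-score threshold of 3, the helper names, and the demo data are all hypothetical.

```python
import numpy as np
import pandas as pd


def remove_outliers_zscore(df, columns, threshold=3.0):
    """Drop rows whose absolute z-score exceeds the threshold in any listed column."""
    z = (df[columns] - df[columns].mean()) / df[columns].std(ddof=0)
    # Constant columns give NaN z-scores; treat them as non-outliers.
    # Missing values are assumed to have been handled in an earlier step.
    keep = (z.fillna(0.0).abs() <= threshold).all(axis=1)
    return df[keep].copy()


def bin_tenure(df, column="tenure", width=12):
    """Add a categorical column that groups tenure (in months) into fixed-width bins."""
    upper = max(int(np.ceil(df[column].max() / width)) * width, width)
    edges = list(range(0, upper + width, width))
    labels = [f"{lo + 1}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    out = df.copy()
    # include_lowest=True places a tenure of 0 in the first bin.
    out[f"{column}_group"] = pd.cut(
        out[column], bins=edges, labels=labels, include_lowest=True
    )
    return out


# Hypothetical demo data: 19 ordinary rows and one extreme monthly charge.
demo = pd.DataFrame({
    "tenure": [1, 4, 8, 12, 13, 18, 24, 25, 30, 36,
               37, 42, 48, 49, 54, 60, 61, 66, 70, 72],
    "MonthlyCharges": [50.0 + i for i in range(19)] + [500.0],
})

cleaned = remove_outliers_zscore(demo, ["tenure", "MonthlyCharges"])
binned = bin_tenure(cleaned)
print(binned[["tenure", "tenure_group"]].head())
```

Binning after outlier removal keeps the bin edges from being stretched by rows that are about to be dropped.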
- Once the data was cleaned and preprocessed, I analyzed it to identify hidden patterns and relationships between features.
- Performed both single-feature and cross-feature analysis to find relationships between features.
- Analyzed and visualized each feature to understand its values and value counts and to gauge its overall importance.
- Some of the major findings:
- Around 16% of all customers are senior citizens.
- Customers who are more likely to churn have lower monthly and total charges.
- Senior citizens have a higher churn rate than other customers.
- The longer a customer stays with the business, the lower the chances of churning.
- Customers with a tenure of one year or less are about equally likely to churn or to stay.
- Customers with a contract type of month-to-month have left the business more often.
- Visualizations:
- Distribution of tenure (`eda_images/tenure.png`)
- Imbalance in churn (`eda_images/churn.png`)
- Monthly and total charges by churn (`eda_images/charges by churn.png`)
- Used different classification algorithms to train the model.
- Logistic Regression
- Naive Bayes
- KNN Classifier
- Decision Tree
- Random Forest
- AdaBoost Classifier
- XGBoost Classifier
- Support Vector Classifier
- Performed hyperparameter tuning using GridSearchCV to improve model performance.
- Evaluated the models using accuracy score and the confusion matrix (precision, recall, F1 score), and selected the model with the highest accuracy.
- Of all the algorithms used, the XGBoost classifier had the highest accuracy, at 81%.
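The tuning and evaluation steps above can be sketched with scikit-learn as follows. To keep the example self-contained, it uses synthetic data and logistic regression as a stand-in; the parameter grid, split sizes, and class weights are assumptions and do not reflect the project's actual configuration or its XGBoost model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the churn data (not the project's dataset).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.73, 0.27], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical grid: try several regularization strengths with 5-fold CV.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, scoring="accuracy", cv=5
)
search.fit(X_train, y_train)

# Evaluate the refitted best estimator on the held-out test set.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)

print("Best parameters:", search.best_params_)
print("Test accuracy:", accuracy)
print(matrix)
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
```

The same pattern applies to any of the listed classifiers: swap the estimator and its parameter grid, then compare the resulting test scores.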
- Developed a Flask web application to deploy the model for real-time predictions.
- Built both front-end and back-end components for the web app.
- Created a custom website where users can enter customer data and receive predictions from the model.
- Deployed the Flask app on a localhost server for easy access.
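A prediction endpoint of the kind described above might be sketched like this. It is an illustration only: the `/predict` route, the form field names (`Contract`, `tenure`), and the rule inside `predict_churn` are hypothetical. The real app would load `artifacts/model.pkl` and `artifacts/preprocessor.pkl` and render HTML templates instead of returning JSON.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_churn(features: dict) -> int:
    """Hypothetical stand-in for the trained model and preprocessor.

    Returns 1 (churn) for month-to-month customers with under 12 months of tenure.
    """
    is_monthly = features.get("Contract") == "Month-to-month"
    return int(is_monthly and float(features.get("tenure", 0)) < 12)


@app.route("/predict", methods=["POST"])
def predict():
    # Form fields arrive as strings, as they would from the HTML dropdowns.
    features = request.form.to_dict()
    try:
        label = predict_churn(features)
    except ValueError:
        return jsonify(error="tenure must be a number"), 400
    return jsonify(churn=bool(label))


# To serve locally: app.run(host="127.0.0.1", port=5000)
```

Keeping the prediction logic in its own function lets it be tested without starting the web server.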
Technology | Description |
---|---|
Python | Programming language used |
Flask | Web framework for UI and API integration |
HTML & CSS | Frontend design and styling |
Pandas | Cleaning and preprocessing the data |
NumPy | Performing numerical operations |
Matplotlib | Visualization of the data |
/Customer-Churn-Project
├── /artifacts                      # CSV and pickle files
│   ├── data_cleaned.csv
│   ├── test.csv
│   ├── train.csv
│   ├── model.pkl
│   └── preprocessor.pkl
├── /Data
│   ├── data.csv                    # Raw data
│   └── data_eda.csv                # Cleaned, preprocessed data
├── /eda_images                     # Images of exploratory analysis
│   ├── tenure.png
│   ├── churn.png
│   └── charges by churn.png
├── /notebook                       # Research ipynb notebook
├── /src                            # Source files (core files of the project)
│   ├── exception_handling.py       # Custom exception handling
│   ├── logger.py                   # Logging messages
│   ├── utils.py                    # Helper and utility functions
│   ├── /components                 # Main component files
│   │   ├── data_cleaning.py
│   │   ├── data_ingestion.py
│   │   └── data_transformation.py
│   └── /pipelines                  # Pipeline files
│       ├── predict_pipeline.py
│       └── train_pipeline.py
├── /static                         # Static folder
│   ├── /css                        # CSS files
│   │   ├── hp_style.css            # Home page styles
│   │   └── pp_style.css            # Predict page styles
│   └── /images                     # Website images
├── /templates                      # Templates (HTML files)
│   ├── home_page.html
│   └── predict_page.html
├── .gitignore
├── README.md
├── app.py                          # Flask backend
├── requirements.txt                # Python dependencies
└── setup.py                        # Setup
git clone https://github.com/Dhanush-Raj1/Customer-Churn-Project.git
cd Customer-Churn-Project
conda create -p envi python==3.9 -y
conda activate ./envi # Activates the environment created above (same command on Windows, macOS, and Linux)
pip install -r requirements.txt
python app.py
The app will be available at: http://127.0.0.1:5000/
1. Open the web app in your browser.
2. Click the predict button on the home page of the web app.
3. Enter the customer details in the respective dropdowns.
4. Click the predict button, and the predicted result will appear.
- Improve the accuracy of the model with advanced fine-tuning.
- Real-time prediction system.
- Automated retraining pipeline.
- Improve the UI with a more interactive design.
- Customer retention strategy recommender.
- Anomaly detection for unexpected churn.
Contributions, issues, and pull requests are welcome! Feel free to open an issue or submit a PR to improve this project.
This project is licensed under the MIT License; see the LICENSE file for details.