DEV_Projects

Telco Customer Churn Prediction

Overview

This project aims to predict customer churn for a telecommunications company using machine learning techniques. Churn, also known as customer attrition, refers to the phenomenon where customers stop doing business with a company. By accurately predicting which customers are likely to churn, the company can proactively implement retention strategies to reduce churn rates and improve profitability.

Dataset

The project uses the "Telco Customer Churn" dataset available on Kaggle: https://www.kaggle.com/datasets/blastchar/telco-customer-churn

The dataset contains information about customers of a telecom company, including demographic data, services subscribed to, account information, and whether or not the customer churned.

Project Structure

- churn_prediction.py: The main Python script containing the code for data preprocessing, model training, hyperparameter tuning, evaluation, and feature importance analysis.
- WA_Fn-UseC_-Telco-Customer-Churn.csv: The original dataset file.
- requirements.txt: Lists the Python libraries required to run the code.
- Initial_Model_roc_curve.png: ROC curve for the initial logistic regression model.
- Best_Model_roc_curve.png: ROC curve for the logistic regression model with tuned hyperparameters.
- feature_importance.png: Feature importance plot for the best model.

Approach

The project follows these main steps:

Data Loading and Preprocessing:
- Load the dataset into a Pandas DataFrame.
- Handle missing values in TotalCharges (stored as blank strings in the raw CSV) by converting the column to numeric and imputing with the mean.
- Convert relevant features (e.g., SeniorCitizen) to categorical data types.
- Perform one-hot encoding of categorical features.
- Split the data into training and testing sets.
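The preprocessing steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the code in churn_prediction.py: the eight-row SAMPLE_CSV is made up (it only borrows a subset of the dataset's column names), and the stratified 75/25 split with random_state=42 is an assumption, since the README does not state the split ratio.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows using a subset of the Telco dataset's column names.
# Customer 0004-D has tenure 0 and a blank TotalCharges, as in the raw CSV.
SAMPLE_CSV = """customerID,gender,SeniorCitizen,Contract,tenure,MonthlyCharges,TotalCharges,Churn
0001-A,Female,0,Month-to-month,1,29.85,29.85,No
0002-B,Male,0,One year,34,56.95,1889.5,No
0003-C,Male,1,Month-to-month,2,53.85,108.15,Yes
0004-D,Male,0,Two year,0,42.30, ,No
0005-E,Female,1,Month-to-month,2,70.70,151.65,Yes
0006-F,Female,0,Month-to-month,8,99.65,820.5,Yes
0007-G,Male,0,Two year,22,89.10,1949.4,No
0008-H,Female,0,Month-to-month,10,29.75,301.9,Yes
"""


def preprocess(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    df = df.copy()
    # TotalCharges is read as text because of the blank entries; coerce to
    # numbers (blanks become NaN) and impute with the column mean.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].mean())
    # SeniorCitizen is stored as 0/1 but is a categorical flag.
    df["SeniorCitizen"] = df["SeniorCitizen"].astype("category")
    # Target: 1 if the customer churned, else 0.
    y = (df.pop("Churn") == "Yes").astype(int)
    df = df.drop(columns="customerID")  # identifier, not a predictor
    # One-hot encode every object/category column as 0/1 integer columns.
    X = pd.get_dummies(df, dtype=int)
    return X, y


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
X, y = preprocess(df)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X.columns.tolist())
```

Stratifying on the target keeps the churn rate similar in the training and test sets, which matters when churners are the minority class.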
Model Training and Evaluation (Logistic Regression):
- Train an initial logistic regression model on the training set.
- Evaluate the model on the test set using precision, recall, F1 score, accuracy, and AUC-ROC.
- Visualize the ROC curve to assess the model's ability to discriminate between classes.
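A minimal sketch of the initial training and evaluation step, assuming scikit-learn. To stay self-contained it trains on synthetic data from make_classification (roughly a quarter positives, as a stand-in for the encoded churn features) rather than the real dataset, so its scores will not match the Results table. AUC-ROC and the ROC curve are computed from predicted probabilities, not from the 0/1 predictions used for precision, recall, F1, and accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the encoded churn features (~74% class 0).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           weights=[0.74], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000)  # default hyperparameters
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard labels at the 0.5 threshold
y_proba = model.predict_proba(X_test)[:, 1]  # P(positive class), for ROC/AUC

metrics = {
    "Precision": precision_score(y_test, y_pred),
    "Recall": recall_score(y_test, y_pred),
    "F1 Score": f1_score(y_test, y_pred),
    "Accuracy": accuracy_score(y_test, y_pred),
    "AUC-ROC": roc_auc_score(y_test, y_proba),
}
# Points of the ROC curve: false positive rate vs. true positive rate.
fpr, tpr, thresholds = roc_curve(y_test, y_proba)

for name, value in metrics.items():
    print(f"{name}\t{value:.2f}")
```

To draw the curve, RocCurveDisplay.from_predictions(y_test, y_proba) from sklearn.metrics plots the same fpr/tpr points with matplotlib.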
Hyperparameter Tuning (Logistic Regression):
- Use GridSearchCV to find the optimal hyperparameters for the logistic regression model, including C, penalty, solver, and class_weight.
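One way to set up the search, sketched with scikit-learn on synthetic data. The grid values and scoring="roc_auc" are assumptions for illustration; the README names the tuned parameters but not their ranges or the selection metric. Because not every solver supports every penalty (lbfgs, for example, does not support L1), the grid is written as a list of compatible sub-grids rather than one Cartesian product.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the encoded churn features.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.74], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

C_values = [0.01, 0.1, 1, 10]          # assumed range for inverse regularization strength
class_weights = [None, "balanced"]     # "balanced" upweights the minority class

# Each dict is searched separately, so only valid solver/penalty pairs are tried.
param_grid = [
    {"solver": ["liblinear"], "penalty": ["l1", "l2"],
     "C": C_values, "class_weight": class_weights},
    {"solver": ["lbfgs"], "penalty": ["l2"],
     "C": C_values, "class_weight": class_weights},
]

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="roc_auc",  # assumption: the README does not state the metric
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```

With GridSearchCV's default refit=True, search.best_estimator_ has already been refit on the whole training set with the best parameters, so it can be evaluated on X_test directly. If catching churners matters more than overall ranking quality, scoring="recall" or scoring="f1" is a drop-in alternative.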
Retraining and Final Evaluation (Logistic Regression):
- Retrain the logistic regression model using the best hyperparameters found in the previous step.
- Evaluate the model's performance on the test set.
- Analyze feature importance to identify the strongest predictors of churn.
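For logistic regression, a common way to rank features is by the absolute value of the fitted coefficients, each of which is the change in log-odds of the positive class per unit increase in that feature. The sketch below assumes that approach (the README does not say how importance was computed) and uses synthetic data, hypothetical feature names, and a hypothetical best_params dict standing in for search.best_params_.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
X = pd.DataFrame(X, columns=feature_names)

# Hypothetical tuned hyperparameters, e.g. copied from search.best_params_.
best_params = {"C": 1.0, "penalty": "l2", "solver": "lbfgs",
               "class_weight": "balanced"}
model = LogisticRegression(max_iter=1000, **best_params).fit(X, y)

# One coefficient per feature; sign gives direction, magnitude gives strength.
coef = pd.Series(model.coef_[0], index=X.columns)
importance = coef.reindex(coef.abs().sort_values(ascending=False).index)
print(importance)
```

Coefficient magnitudes are only comparable when features are on similar scales; with raw tenure and charges alongside 0/1 dummy columns, standardizing the numeric columns first makes the ranking more meaningful.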

Results

The best logistic regression model achieved the following performance on the test set:

Metric	Score
Precision	0.62
Recall	0.51
F1 Score	0.56
Accuracy	0.79
AUC-ROC	0.83

Exact scores may vary slightly depending on the random_state used for the train/test split.
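As a consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported precision and recall should reproduce the reported F1. The sketch also computes the three metrics from confusion-matrix counts; the counts (62 true positives, 38 false positives, 60 false negatives) are hypothetical, chosen only to land on the same rounded scores.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted churners who churned
    recall = tp / (tp + fn)     # share of actual churners that were caught
    return precision, recall, f1(precision, recall)


print(round(f1(0.62, 0.51), 2))  # 0.56, matching the table
p, r, f = from_counts(tp=62, fp=38, fn=60)  # hypothetical counts
print(round(p, 2), round(r, 2), round(f, 2))  # 0.62 0.51 0.56
```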

ROC Curves

[Figures: ROC curves for the Initial Model and the Best Model (Tuned)]

Feature Importance

Limitations and Future Directions

Recall is the model's weakest metric (0.51): it misses roughly half of the customers who actually churn, and each missed churner is a lost chance to intervene before they leave. Possible improvements include incorporating more features, experimenting with other algorithms (e.g., XGBoost), and addressing class imbalance with techniques such as SMOTE.
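A minimal NumPy sketch of the SMOTE idea: each synthetic churner is placed at a random point on the line segment between a real minority-class sample and one of its k nearest minority-class neighbours. It is illustrative only; in practice imbalanced-learn's SMOTE (or SMOTENC, which handles categorical columns such as the one-hot encoded Telco features) would be the usual choice. Oversampling should be applied to the training split only, so the test set keeps the real class balance. The random data below is a placeholder for the encoded training features.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # indices of k nearest neighbours
    base = rng.integers(0, n, size=n_new)         # samples to start from
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Placeholder training data: ~25% positives.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = (rng.random(100) < 0.25).astype(int)

X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())  # samples needed to balance
X_syn = smote(X_min, n_new, k=5, seed=0)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])
print(np.bincount(y_bal))
```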

How to Run

Clone the Repository:

git clone https://github.com/DevasivaBA/DEV_Projects.git
