Telco Customer Churn Prediction

This project aims to predict customer churn for a fictional telecommunications company using an Artificial Neural Network (ANN). By identifying customers who are likely to churn, the company can proactively engage in targeted retention campaigns to reduce revenue loss. The model is a binary classifier built with TensorFlow and Keras.

📋 Table of Contents

  • 🚀 Introduction
  • 📁 Project Structure
  • 📊 Dataset
  • 🛠️ Methodology
  • 📈 Results
  • ▶️ How to Run
  • 📦 Dependencies

🚀 Introduction

Customer churn is a critical metric for subscription-based businesses. This project addresses the challenge by building a deep learning model to predict which customers are most likely to cancel their service. The model analyzes various customer attributes, including their subscribed services, contract type, and billing information, to generate a churn prediction.

The solution is implemented as a Python script (src/churn_model.py) that handles the complete machine learning pipeline, from data preprocessing to model training, evaluation, and artifact storage.

📁 Project Structure

TELECOMCHURNANNMODEL/
│
├── data/
│   └── Telco_customer_churn.csv      # Raw dataset
│
├── outputs/v1/
│   ├── metrics/
│   │   └── metrics.json              # Model performance metrics
│   ├── model/
│   │   └── telecom_churn_model_v1.keras # Saved Keras model
│   ├── plots/
│   │   └── ... (image plots)         # Generated charts
│   ├── model_config.json             # Model architecture configuration
│   └── model_summary.txt             # Text summary of the model layers
│
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py         # Module for cleaning and feature engineering
│   └── churn_model.py                # Main Python script for training and evaluation
│
├── .gitignore                        # Git ignore file
├── churn_model.ipynb                 # Jupyter notebook for EDA and prototyping
├── README.md                         # Project documentation
└── requirements.txt                  # Python dependencies

📊 Dataset

The dataset is from a fictional telco company that provided home phone and Internet services to 7,043 customers in California during Q3. It contains 33 columns, including customer demographics, services, account information, and the churn status.

  • Source: IBM Community Telco Customer Churn Dataset
  • Target Variable: ChurnValue (1 = Churn, 0 = No Churn)
  • Key Features Used: Gender, SeniorCitizen, TenureMonths, Contract, PaymentMethod, MonthlyCharges, TotalCharges, CLTV, etc.
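As a sketch of how this dataset is consumed, the snippet below loads a CSV with a subset of the columns above and separates the ChurnValue target. The two inline rows are hypothetical stand-ins for the real file at data/Telco_customer_churn.csv:

```python
import io
import pandas as pd

# Inline sample standing in for data/Telco_customer_churn.csv
csv_text = """Gender,SeniorCitizen,TenureMonths,MonthlyCharges,ChurnValue
Male,0,12,29.85,0
Female,1,2,70.70,1
"""

df = pd.read_csv(io.StringIO(csv_text))
X = df.drop(columns=["ChurnValue"])  # feature matrix
y = df["ChurnValue"]                 # target: 1 = Churn, 0 = No Churn
```

In the repository itself, `pd.read_csv("data/Telco_customer_churn.csv")` replaces the inline sample.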

🛠️ Methodology

1. Data Cleaning and Preprocessing (src/data_preprocessing.py)

The raw data is loaded and cleaned by handling missing values in the TotalCharges column. Relevant features are then selected, and categorical data is converted into a numerical format using binary and one-hot encoding.
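A minimal sketch of these cleaning steps, assuming the column names above (the repository's data_preprocessing.py may differ in its details): blank TotalCharges strings are coerced to NaN and dropped, a two-level column is binary-encoded, and Contract is one-hot encoded.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # TotalCharges is read as strings because blank values appear for
    # zero-tenure customers; coerce to numeric and drop the resulting NaNs
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df = df.dropna(subset=["TotalCharges"])

    # Binary encoding for a two-level categorical column
    df["Gender"] = (df["Gender"] == "Male").astype(int)

    # One-hot encoding for multi-level categoricals such as Contract
    df = pd.get_dummies(df, columns=["Contract"], drop_first=True)
    return df

# Hypothetical three-row sample; the middle row has a blank TotalCharges
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "TotalCharges": ["29.85", " ", "1889.5"],
    "Contract": ["Month-to-month", "Two year", "One year"],
})
clean = preprocess(sample)  # the blank-TotalCharges row is dropped
```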

2. Model Building, Training and Evaluation (src/churn_model.py)

  • Model Building: An Artificial Neural Network (ANN) is constructed with Keras. The model consists of an input layer, three hidden dense layers with ReLU activation, and a sigmoid output layer for binary classification. It's compiled using the Adam optimizer and binary_crossentropy loss function.
  • Training and Evaluation: The data is split into 80% for training and 20% for testing. Numerical features are standardized using StandardScaler. The model is trained for 100 epochs, and its performance is evaluated using metrics like accuracy, loss, confusion matrix, ROC-AUC, and Average Precision.
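The split, scaling, model definition, and training described above can be sketched as follows. The layer widths, the random stand-in data, and the short 5-epoch run are illustrative assumptions, not the repository's exact configuration (which trains for 100 epochs on the encoded Telco features):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Stand-in data: 500 samples, 20 features (shapes are assumptions)
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20)).astype("float32")
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype("float32")

# 80/20 train/test split, then standardize (fit the scaler on train only)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Three ReLU hidden layers and a sigmoid output, compiled with Adam and
# binary cross-entropy, as described above
model = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# 5 epochs keeps the demo fast; the project itself uses 100
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
```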

📈 Results

The model demonstrated strong performance in identifying customers likely to churn. All generated charts and metrics are saved automatically in the outputs/v1/ directory.

  • Training Time: 0h:0m:7s (on a MacBook Pro with an M3 Pro chip)
  • Training Accuracy: 93.19%
  • Validation Accuracy: 91.22%
  • Training Loss: 0.1686
  • Validation Loss: 0.2163

Classification Report

              precision    recall  f1-score   support

           0       0.95      0.93      0.94      1059
           1       0.79      0.84      0.81       350

    accuracy                           0.90      1409
   macro avg       0.87      0.88      0.88      1409
weighted avg       0.91      0.90      0.91      1409

The learning curves below show that the model trained effectively without significant overfitting, as the training and validation metrics converged well.

The ROC and Precision-Recall curves confirm the model's excellent discriminative ability.
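For reference, metrics like these can be produced with scikit-learn. The labels and scores below are hypothetical stand-ins for the model's test-set predictions, not the project's actual results:

```python
import numpy as np
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, average_precision_score)

# Hypothetical ground-truth labels and predicted churn probabilities
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.8, 0.35, 0.2, 0.9])
y_pred = (y_score >= 0.5).astype(int)  # threshold probabilities at 0.5

print(classification_report(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
auc = roc_auc_score(y_true, y_score)
ap = average_precision_score(y_true, y_score)
print(f"ROC-AUC: {auc:.3f}  Average precision: {ap:.3f}")
```

Note that ROC-AUC and average precision are computed from the raw probabilities, while the classification report and confusion matrix use the thresholded predictions.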

▶️ How to Run

1. Prerequisites

  • Python 3.8+

2. Clone the Repository

git clone <repository_url>
cd <repository_directory>

3. Set Up a Virtual Environment (Recommended)

  • macOS / Linux:
    python3 -m venv <virtual environment name>
    source <virtual environment name>/bin/activate
  • Windows:
    python -m venv <virtual environment name>
    .\<virtual environment name>\Scripts\activate

4. Install Dependencies

pip install -r requirements.txt

5. Place the Dataset

  • Download the dataset from the source link.
  • Place it inside the data/ directory.

6. Execute the Training Script

python src/churn_model.py

7. Review the Outputs

After the script finishes, the trained model, performance plots, metrics, and configuration files will be saved in the outputs/v1/ directory.

📦 Dependencies

The main libraries used in this project are listed below. See requirements.txt for a full list of dependencies.

  • tensorflow
  • pandas
  • numpy
  • scikit-learn
  • matplotlib

Tip: Installing the main libraries also pulls in additional transitive dependencies. To capture all of them in requirements.txt, run this command after installing the main libraries:

pip freeze > requirements.txt

This will update requirements.txt with all the dependencies (including transitive ones) installed in your environment.