This project aims to predict customer churn for a fictional telecommunications company using an Artificial Neural Network (ANN). By identifying customers who are likely to churn, the company can proactively engage in targeted retention campaigns to reduce revenue loss. The model is a binary classifier built with TensorFlow and Keras.
Customer churn is a critical metric for subscription-based businesses. This project addresses the challenge by building a deep learning model to predict which customers are most likely to cancel their service. The model analyzes various customer attributes, including their subscribed services, contract type, and billing information, to generate a churn prediction.
The solution is implemented as a Python script (`src/churn_model.py`) that handles the complete machine learning pipeline from data preprocessing to model training, evaluation, and artifact storage.
TELECOMCHURNANNMODEL/
│
├── data/
│   └── Telco_customer_churn.csv            # Raw dataset
│
├── outputs/v1/
│   ├── metrics/
│   │   └── metrics.json                    # Model performance metrics
│   ├── model/
│   │   └── telecom_churn_model_v1.keras    # Saved Keras model
│   ├── plots/
│   │   └── ... (image plots)               # Generated charts
│   ├── model_config.json                   # Model architecture configuration
│   └── model_summary.txt                   # Text summary of the model layers
│
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py               # Module for cleaning and feature engineering
│   └── churn_model.py                      # Main Python script for training and evaluation
│
├── .gitignore                              # Git ignore file
├── churn_model.ipynb                       # Jupyter notebook for EDA and prototyping
├── README.md                               # Project documentation
└── requirements.txt                        # Python dependencies
The dataset is from a fictional telco company that provided home phone and Internet services to 7,043 customers in California during Q3. It contains 33 columns, including customer demographics, services, account information, and the churn status.
- Source: IBM Community Telco Customer Churn Dataset
- Target Variable: `ChurnValue` (1 = Churn, 0 = No Churn)
- Key Features Used: `Gender`, `SeniorCitizen`, `TenureMonths`, `Contract`, `PaymentMethod`, `MonthlyCharges`, `TotalCharges`, `CLTV`, etc.
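For a quick sanity check before any modeling, the raw file can be loaded and the churn balance inspected. A minimal sketch, assuming the raw IBM headers (the exact target column name, e.g. "Churn Value", may differ in your copy of the file):

```python
import pandas as pd

# Load the raw dataset from the path shown in the project structure.
df = pd.read_csv("data/Telco_customer_churn.csv")

# Basic sanity checks: row/column counts and churn class balance.
# "Churn Value" is the raw IBM header; adjust if your copy differs.
print(df.shape)  # expected: (7043, 33)
print(df["Churn Value"].value_counts(normalize=True))
```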
1. Data Cleaning and Preprocessing (`src/data_preprocessing.py`)
The raw data is loaded and cleaned by handling missing values in the `TotalCharges` column. Relevant features are then selected, and categorical data is converted into a numerical format using binary and one-hot encoding.
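As an illustration of what this step does, here is a minimal preprocessing sketch. The function name and exact column handling are assumptions for illustration, not the actual API of `src/data_preprocessing.py`:

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning and encoding, not the project's actual code."""
    # TotalCharges arrives as text with blanks for brand-new customers;
    # coerce to numeric and drop the handful of resulting NaN rows.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df = df.dropna(subset=["TotalCharges"])

    # Binary encoding for two-level categories.
    df["Gender"] = (df["Gender"] == "Male").astype(int)

    # One-hot encoding for multi-level categories.
    df = pd.get_dummies(df, columns=["Contract", "PaymentMethod"], drop_first=True)
    return df
```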
2. Model Building, Training and Evaluation (`src/churn_model.py`)
- Model Building: An Artificial Neural Network (ANN) is constructed with Keras. The model consists of an input layer, three hidden dense layers with ReLU activation, and a sigmoid output layer for binary classification. It is compiled using the Adam optimizer and the `binary_crossentropy` loss function.
- Training and Evaluation: The data is split into 80% for training and 20% for testing. Numerical features are standardized using `StandardScaler`. The model is trained for 100 epochs, and its performance is evaluated using accuracy, loss, the confusion matrix, ROC-AUC, and Average Precision.
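Below is a condensed sketch of the architecture and training loop described above. The hidden-layer widths (64/32/16) are placeholder assumptions (the real values are recorded in `outputs/v1/model_config.json`); the rest follows the text: three ReLU hidden layers, a sigmoid output, the Adam optimizer, `binary_crossentropy`, an 80/20 split, `StandardScaler`, and 100 epochs:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Features and target, assuming `df` is the preprocessed DataFrame from
# the previous step and "ChurnValue" is the encoded target column.
X = df.drop(columns=["ChurnValue"]).values
y = df["ChurnValue"].values

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize numerical features; fit the scaler on the training set only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# ANN: three hidden ReLU layers and a sigmoid output for binary classification.
# The widths 64/32/16 are placeholders, not the project's actual configuration.
model = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train for 100 epochs, tracking validation performance on the held-out split.
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100)
```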
The model demonstrated strong performance in identifying customers likely to churn. All generated charts and metrics are saved automatically in the `outputs/v1/` directory.
- Time for Training: 0h:0m:7s (System: MacBook Pro with Apple M3 Pro chip)
- Training Accuracy: 93.19%
- Validation Accuracy: 91.22%
- Training Loss: 16.86%
- Validation Loss: 21.63%
Classification Report

| Class        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 0 (No Churn) | 0.95      | 0.93   | 0.94     | 1059    |
| 1 (Churn)    | 0.79      | 0.84   | 0.81     | 350     |
| accuracy     |           |        | 0.90     | 1409    |
| macro avg    | 0.87      | 0.88   | 0.88     | 1409    |
| weighted avg | 0.91      | 0.90   | 0.91     | 1409    |
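A report like the one above, plus the other saved metrics, can be reproduced with scikit-learn. A sketch, assuming `model`, `X_test`, and `y_test` from the training step:

```python
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)

# Predicted probabilities, thresholded at 0.5 for hard labels.
y_prob = model.predict(X_test).ravel()
y_pred = (y_prob >= 0.5).astype(int)

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_prob))
print("Average Precision:", average_precision_score(y_test, y_prob))
```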
The learning curves below show that the model trained effectively without significant overfitting, as the training and validation metrics converged well.
The ROC and Precision-Recall curves confirm the model's excellent discriminative ability.
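Curves of this kind can be drawn with matplotlib via scikit-learn's display helpers. A sketch, again assuming `y_test` and `y_prob` from the evaluation step (the output filename is hypothetical):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import PrecisionRecallDisplay, RocCurveDisplay

fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(10, 4))
RocCurveDisplay.from_predictions(y_test, y_prob, ax=ax_roc)
PrecisionRecallDisplay.from_predictions(y_test, y_prob, ax=ax_pr)
fig.savefig("outputs/v1/plots/roc_pr_curves.png")  # hypothetical filename
```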
1. Prerequisites
- Python 3.8+
2. Clone the Repository
git clone <repository_url>
cd <repository_directory>
3. Set Up a Virtual Environment (Recommended)
- macOS / Linux:
python3 -m venv <virtual environment name>
source <virtual environment name>/bin/activate
- Windows:
python -m venv <virtual environment name>
.\<virtual environment name>\Scripts\activate
4. Install Dependencies
pip install -r requirements.txt
5. Place the Dataset
- Download the dataset from the source link.
- Place it inside the `data/` directory.
6. Execute the Training Script
python src/churn_model.py
7. Review the Outputs
After the script finishes, the trained model, performance plots, metrics, and configuration files will be saved in the `outputs/v1/` directory.
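To reuse the trained model without retraining, it can be loaded back from the saved `.keras` file. A minimal sketch (`X_new` is a placeholder; new inputs must be preprocessed and scaled exactly like the training data):

```python
from tensorflow import keras

# Load the serialized model saved by the training script.
model = keras.models.load_model("outputs/v1/model/telecom_churn_model_v1.keras")

# X_new: placeholder for new customer features, preprocessed and scaled
# the same way as the training data.
churn_probabilities = model.predict(X_new)
```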
The main libraries used in this project are listed below. See requirements.txt for a full list of dependencies.
- `tensorflow`
- `pandas`
- `numpy`
- `scikit-learn`
- `matplotlib`
Tip: Installing these main libraries will also pull in additional transitive dependencies. To capture all of them in requirements.txt, run this command after installing the main libraries:
pip freeze > requirements.txt
This will update requirements.txt with all the dependencies (including transitive ones) installed in your environment.