Customer Churn Prediction
Overview
--------
This project predicts customer churn with a machine learning model. Customer churn is when a customer stops using a service or cancels a subscription; the model estimates how likely each customer is to churn. Identifying likely churners early lets businesses target them with retention strategies and reduce revenue loss.
Features
--------
- Data Preprocessing: Handled missing values, scaled numerical features, and encoded categorical variables.
- Exploratory Data Analysis (EDA): Visualized data distributions and relationships.
- Modeling: Built a Random Forest Classifier to predict customer churn.
- Evaluation: Assessed model performance with ROC-AUC, a confusion matrix, and a classification report.
- Feature Importance: Identified the most influential features in predicting churn.
- Optimization: Performed hyperparameter tuning using GridSearchCV.
- Deployment: Developed a Streamlit application for user-friendly predictions.
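The preprocessing, modeling, and tuning steps above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the project's actual code: it assumes the column names listed in the Dataset section, median and most-frequent imputation, and a small hypothetical hyperparameter grid. `build_search` is an illustrative name.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column groups taken from the Dataset section; the real CSV may have more columns.
CATEGORICAL = ["gender", "Partner", "Dependents"]
NUMERICAL = ["tenure", "MonthlyCharges", "TotalCharges"]


def build_search(cv=3, random_state=42):
    """Preprocessing + Random Forest wrapped in a grid search (illustrative grid)."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing numbers
        ("scale", StandardScaler()),                    # scale numerical features
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # one-hot encode categories
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERICAL),
        ("cat", categorical, CATEGORICAL),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("rf", RandomForestClassifier(random_state=random_state)),
    ])
    # Hypothetical search space; the project's actual grid is not documented here.
    param_grid = {
        "rf__n_estimators": [50, 100],
        "rf__max_depth": [None, 5],
    }
    return GridSearchCV(model, param_grid, scoring="roc_auc", cv=cv)
```

Keeping the preprocessing inside the pipeline means GridSearchCV refits the imputers, scaler, and encoder on each training fold, so no statistics leak in from the validation folds.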
Dataset
-------
The dataset is `Telco-Customer-Churn.csv`, which contains customer demographics, subscription details, and whether each customer churned.
Key columns in the dataset:
- gender, Partner, Dependents: Categorical demographic information.
- tenure: Number of months a customer has been with the company.
- MonthlyCharges, TotalCharges: Numerical subscription metrics.
- Churn: Target variable indicating if the customer churned.
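A sketch of loading and cleaning the file follows. It assumes the commonly distributed version of this dataset, where `TotalCharges` is stored as text with a few blank values, `Churn` is encoded as `Yes`/`No`, and a `customerID` column identifies each row. `load_churn_data` is a hypothetical helper name.

```python
import pandas as pd


def load_churn_data(source):
    """Read the Telco churn CSV and return (features, target)."""
    df = pd.read_csv(source)
    # Assumption: TotalCharges may hold blank strings; coerce to float (blanks become NaN).
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # Assumption: Churn is "Yes"/"No"; map it to 1/0 for the classifier.
    y = df["Churn"].map({"Yes": 1, "No": 0})
    # Drop the target and the row identifier (if present) from the features.
    X = df.drop(columns=["Churn", "customerID"], errors="ignore")
    return X, y
```

The resulting NaNs in `TotalCharges` are left for the imputation step rather than dropped here.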
Requirements
------------
Python Libraries:
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- joblib
- imbalanced-learn
- streamlit
To install dependencies:
pip install -r requirements.txt
How to Run
----------
1. Clone the Repository
git clone https://github.com/n-liyana/customer-churn-prediction
cd customer-churn-prediction
2. Run the Jupyter Notebook or Python Script
Open the Jupyter Notebook, or run the main script (churn_analysis.py) from your preferred IDE or the command line:
python churn_analysis.py
3. Launch the Streamlit App
streamlit run app.py
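The Streamlit app has to turn user input into a prediction. A minimal, hypothetical helper for that step could look like the following; it assumes `churn_model.pkl` holds a fitted scikit-learn model or pipeline saved with joblib that accepts a DataFrame with the training column names. `predict_churn` is an illustrative name, not a function from app.py.

```python
import joblib
import pandas as pd


def predict_churn(customer, model_path="churn_model.pkl"):
    """Return the predicted churn probability for one customer (hypothetical helper)."""
    model = joblib.load(model_path)  # assumed: a fitted scikit-learn model or pipeline
    row = pd.DataFrame([customer])   # one-row frame using the training column names
    return float(model.predict_proba(row)[0, 1])  # probability of class 1 (churn)
```

Inside a Streamlit app, the loading step would typically be cached (for example with `st.cache_resource`) so the model is not reloaded on every widget interaction.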
Results
-------
Evaluation Metrics:
- ROC-AUC Score: Measures the model's ability to distinguish between classes.
- Confusion Matrix: Visualizes the number of correct and incorrect predictions.
- Feature Importance: Identifies key factors influencing customer churn.
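The metrics above can be computed with scikit-learn as sketched below. This assumes a fitted binary classifier that exposes `predict_proba`, with class 1 meaning churn; `evaluate` and `top_features` are hypothetical helper names.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score


def evaluate(model, X_test, y_test):
    """Compute ROC-AUC, the confusion matrix, and a classification report."""
    proba = model.predict_proba(X_test)[:, 1]  # scores for the positive (churn) class
    preds = model.predict(X_test)              # hard labels at the default threshold
    return {
        "roc_auc": roc_auc_score(y_test, proba),
        "confusion_matrix": confusion_matrix(y_test, preds),  # rows: true, cols: predicted
        "report": classification_report(y_test, preds),
    }


def top_features(forest, feature_names, k=5):
    """Rank features by a fitted Random Forest's impurity-based importances."""
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1][:k]  # indices of the k largest importances
    return [(feature_names[i], float(importances[i])) for i in order]
```

When the forest sits inside a preprocessing pipeline, the names matching `feature_importances_` can be taken from the fitted ColumnTransformer's `get_feature_names_out()`.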
Outputs:
- Churn Distribution: Visual representation of churn vs. non-churn customers.
- Correlation Matrix: Heatmap showing relationships between features.
- Confusion Matrix: Plot of correct and incorrect predictions for each class (churn and no churn).
- ROC Curve: True positive rate plotted against false positive rate across classification thresholds.
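As an illustration of how a figure such as the ROC curve could be written to the outputs folder, here is a minimal matplotlib sketch. It assumes the non-interactive Agg backend and reuses the file name from the project structure; `save_roc_curve` is a hypothetical helper, not the project's plotting code.

```python
import os

import matplotlib
matplotlib.use("Agg")  # render straight to files, no display needed
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve


def save_roc_curve(y_true, scores, path="outputs/ROC Curve.png"):
    """Plot TPR against FPR across thresholds, save it as a PNG, and return the AUC."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)  # create outputs/ if missing
    fpr, tpr, _ = roc_curve(y_true, scores)
    auc = roc_auc_score(y_true, scores)
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.plot(fpr, tpr, label=f"ROC (AUC = {auc:.3f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Chance")  # random baseline
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)  # free the figure's memory
    return auc
```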
Project Structure
-----------------
project/
+-- data/
|   +-- Telco-Customer-Churn.csv
+-- outputs/
|   +-- Churn Distribution.png
|   +-- Correlation Matrix.png
|   +-- Confusion Matrix.png
|   +-- ROC Curve.png
+-- churn_model.pkl      # Trained model
+-- app.py               # Streamlit app
+-- requirements.txt     # Required Python libraries
+-- README.txt           # Project description
+-- churn_analysis.py    # Main script
License
-------
This project is open source and free to use under the MIT License.