Author: Kumaresan K
Techniques: BG/NBD, Gamma-Gamma, RFM Segmentation
Goal: Predict future customer value and enable value-based marketing strategy
Understanding which customers drive future revenue is critical for effective marketing and retention strategy. This project applies probabilistic Customer Lifetime Value (CLV) modeling to transactional e-commerce data to estimate future purchasing behavior and monetary contribution at the individual customer level.
Using the BG/NBD model for purchase frequency and the Gamma-Gamma model for monetary value, we predict expected future revenue and segment customers into value tiers (Low, Mid, High, VIP).
The analysis demonstrates how raw transaction data can be transformed into forward-looking customer intelligence for strategic decision-making.
- Model repeat purchasing behavior and customer survival probability
- Estimate future customer lifetime value (CLV)
- Identify high-value customer segments
- Analyze revenue concentration across segments
- Derive actionable marketing strategies
Online Retail transactional dataset containing:
- Invoice date
- Quantity and price
- Customer ID
- Transaction revenue
Each row represents a product-level transaction.
- Removed invalid and anonymous transactions
- Filtered returns and negative quantities
- Computed transaction revenue
- Converted timestamps
Customer retention evaluated by acquisition month cohorts to validate repeat purchase behavior and CLV modeling assumptions.
Customer behavior summarized using:
- Recency — time between first and last purchase
- Frequency — repeat purchase count
- Monetary — average transaction value
Estimated:
- Expected future purchase frequency
- Probability customer is still active
Estimated expected monetary value per future purchase.
Combined models to predict 6-month Customer Lifetime Value for each customer.
Customers segmented into quartiles:
- Low
- Mid
- High
- VIP
- Customer value is highly skewed with strong heterogeneity
- A small VIP segment drives the majority of predicted revenue
- Recent purchasing activity is the strongest predictor of future value
- Historical purchase volume alone does not indicate future profitability
Example insight:
Customers with high historical frequency but long inactivity exhibit near-zero predicted CLV, while recently active customers show substantially higher future value.
The project includes:
- Cohort retention heatmap
- RFM distributions
- BG/NBD frequency-recency matrix
- CLV distribution
- CLV segment comparison
- Revenue contribution by segment
CLV segmentation enables value-based targeting:
- VIP: retention & loyalty programs
- High: upsell and cross-sell
- Mid: engagement nurturing
- Low: cost-efficient reactivation
Revenue concentration analysis confirms that prioritizing high-value segments maximizes marketing ROI.
- Python
- Pandas / NumPy
- Matplotlib / Seaborn
- Lifetimes (BG/NBD & Gamma-Gamma)
- Jupyter Notebook
CLV Prediction/
│
├── notebook/
│ └── CLV_analysis.ipynb
│
├── data/
│ ├── raw/
│ │ └── data.csv
│ └── cleaned/
│ └── cleaned_data.csv
│
├── images/
│
├── requirements.txt.txt
└── README.md
pip install -r requirements.txt.txtOpen notebook:
notebook/CLV_analysis.ipynb
This project demonstrates how probabilistic data science models can:
- Forecast customer revenue
- Identify high-value customers
- Optimize marketing allocation
- Support retention strategy
Kumaresan K
Data Science | Machine Learning | Customer Analytics
- LinkedIn: https://www.linkedin.com/in/kumaresan-k-4289452b4
- GitHub: https://github.com/kuman002
- Probabilistic modeling
- Customer lifetime value analytics
- Behavioral segmentation
- Marketing data science
- Business insight generation