Unlocking $10.9M in Revenue Opportunities through Advanced Analytics & Machine Learning
This project demonstrates an end-to-end data solution that analyzes 1M+ transactions to address three core retail challenges: churn, pricing inefficiency, and revenue forecasting. By combining robust ETL pipelines with machine learning models, we identified actionable strategies projected to increase revenue by 10.3%.
| Metric | Impact |
|---|---|
| Revenue Opportunity | $10.9M identified via pricing & retention strategies |
| Forecast Accuracy | 98.03% (100% minus a MAPE of 1.97%), supporting reliable planning |
| Pricing Optimization | $6M unlocked through elasticity-based price optimization |
| Customer Retention | $7.8M in potential revenue loss addressed by proactively targeting at-risk segments |
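
The forecast-accuracy figure above is expressed as 100% minus the mean absolute percentage error (MAPE). A minimal sketch of that relationship follows; the function names and sample values are hypothetical and are not taken from the project notebooks.

```python
import numpy as np


def mape(actual, forecast) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case raises an error.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)


def forecast_accuracy(actual, forecast) -> float:
    """Accuracy reported as 100% minus MAPE."""
    return 100.0 - mape(actual, forecast)


# Illustrative values only: per-point errors are 2%, 2%, and 1%.
actual = [100.0, 200.0, 400.0]
forecast = [98.0, 204.0, 396.0]
print(round(mape(actual, forecast), 2))               # 1.67
print(round(forecast_accuracy(actual, forecast), 2))  # 98.33
```

By the same definition, a MAPE of 1.97% corresponds to the 98.03% accuracy in the table.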
This project culminates in a comprehensive analysis dashboard. The key views below show how the analysis supports data-driven decision making.
Real-time visibility into business health, providing a consolidated view of KPIs, revenue trends, and churn risk.
We moved beyond simple demographics to behavioral segmentation. Using K-Means clustering, we identified 3 distinct personas.
Insight: The "At Risk" segment makes up only 9% of customers but accounts for a disproportionate $7.8M in potential revenue loss. Targeted retention campaigns for this group are therefore expected to yield the highest ROI.
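The segmentation step can be sketched with scikit-learn's `KMeans` on standardized behavioral features. The README does not say which features the project clustered on, so the recency/frequency/monetary (RFM) columns, the helper names, and the synthetic customers below are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical behavioral features; the project's actual feature set is not stated.
FEATURES = ["recency_days", "frequency", "monetary"]


def segment_customers(rfm: pd.DataFrame, n_clusters: int = 3, seed: int = 42) -> pd.DataFrame:
    """Assign each customer to a K-Means cluster based on standardized RFM features."""
    # Standardize so that monetary value (the largest scale) does not dominate distances.
    scaled = StandardScaler().fit_transform(rfm[FEATURES])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    segmented = rfm.copy()
    segmented["segment"] = model.fit_predict(scaled)
    return segmented


def segment_summary(segmented: pd.DataFrame) -> pd.DataFrame:
    """Per-segment customer count and revenue, plus their shares of the total."""
    summary = segmented.groupby("segment").agg(
        customers=("monetary", "size"),
        revenue=("monetary", "sum"),
        mean_recency=("recency_days", "mean"),
    )
    summary["customer_share"] = summary["customers"] / summary["customers"].sum()
    summary["revenue_share"] = summary["revenue"] / summary["revenue"].sum()
    return summary


# Synthetic demo data with three behavioral groups (illustrative, not project data).
rng = np.random.default_rng(0)
groups = [
    # (mean recency in days, mean purchases, mean spend, number of customers)
    (20, 12, 900.0, 600),
    (60, 5, 300.0, 300),
    (200, 2, 150.0, 100),
]
customers = pd.concat(
    [
        pd.DataFrame({
            "recency_days": np.clip(rng.normal(r, 5, n), 0, None),
            "frequency": rng.poisson(f, n) + 1,
            "monetary": np.clip(rng.normal(m, 0.1 * m, n), 0, None),
        })
        for r, f, m, n in groups
    ],
    ignore_index=True,
)

summary = segment_summary(segment_customers(customers))
print(summary.sort_values("mean_recency"))
```

In a summary like this, a small segment with high mean recency (customers who have not purchased lately) is the kind of group an "At Risk" persona describes; how the project actually labeled its personas is not shown here.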
Using Price Elasticity of Demand (PED) analysis, we determined optimal price points for each product category.
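Insight: High-volume items like "Coffee K-Cups" showed inelastic demand (PED of -0.8), suggesting that a price increase would grow both revenue and margin: each 1% price increase reduces volume by only about 0.8%, so the volume loss is proportionally smaller than the price gain.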
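One common way to estimate PED is a log-log regression, where the slope of log(quantity) on log(price) is the elasticity. The README does not name its estimator, so this approach, the constant-elasticity assumption, the helper names, and the synthetic demand curve are assumptions. The sketch also shows why a PED of -0.8 raises revenue while still reducing volume.

```python
import numpy as np


def estimate_elasticity(prices, quantities) -> float:
    """Estimate a constant price elasticity of demand.

    Fits log(quantity) = a + b * log(price) by least squares; the slope b is the
    elasticity (percent change in quantity per 1% change in price).
    """
    log_p = np.log(np.asarray(prices, dtype=float))
    log_q = np.log(np.asarray(quantities, dtype=float))
    slope, _intercept = np.polyfit(log_p, log_q, deg=1)
    return float(slope)


def revenue_change(elasticity: float, price_change: float) -> float:
    """Relative revenue change for a relative price change under constant elasticity.

    price_change=0.10 means a 10% price increase; quantity scales by
    (1 + price_change) ** elasticity.
    """
    price_ratio = 1.0 + price_change
    quantity_ratio = price_ratio ** elasticity
    return price_ratio * quantity_ratio - 1.0


# Synthetic demand curve with elasticity -0.8 (illustrative, not project data).
prices = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
quantities = 500 * prices ** -0.8

ped = estimate_elasticity(prices, quantities)
print(round(ped, 3))                              # -0.8
print(round((1.10 ** ped - 1) * 100, 1))          # volume change for +10% price, %: -7.3
print(round(revenue_change(ped, 0.10) * 100, 2))  # revenue change for +10% price, %: 1.92
```

With |PED| < 1 the revenue change is positive, at exactly -1 it is zero, and below -1 a price increase loses revenue.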
The system is built on a modular "Lakehouse" architecture, validating data integrity at every stage from raw CSVs to the final serving layer.
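The kind of integrity check that could run between pipeline stages is sketched below with pandas. The column names, the rules (no nulls, unique transaction IDs, positive quantities, non-negative prices), and the function name are hypothetical; the real schema lives in the project's CSVs and notebooks.

```python
import pandas as pd

# Hypothetical schema for a transactions table; adjust to the real CSV columns.
REQUIRED_COLUMNS = ["transaction_id", "customer_id", "date", "quantity", "unit_price"]


def validate_transactions(df: pd.DataFrame) -> list[str]:
    """Return a list of integrity problems found in `df` (an empty list means it passed)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # Later checks depend on these columns, so stop early.
        return [f"missing columns: {missing}"]

    problems = []
    null_counts = df[REQUIRED_COLUMNS].isna().sum()
    for column, count in null_counts[null_counts > 0].items():
        problems.append(f"{column}: {count} null values")

    duplicates = int(df["transaction_id"].duplicated().sum())
    if duplicates:
        problems.append(f"transaction_id: {duplicates} duplicate ids")

    # Assumes returns are not stored as negative quantities in this table.
    if (df["quantity"] <= 0).any():
        problems.append("quantity: non-positive values")
    if (df["unit_price"] < 0).any():
        problems.append("unit_price: negative values")
    return problems


raw = pd.DataFrame({
    "transaction_id": [1, 2, 2],
    "customer_id": ["a", "b", None],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02"]),
    "quantity": [1, 3, 0],
    "unit_price": [4.5, 2.0, 2.0],
})
for problem in validate_transactions(raw):
    print(problem)
```

Running the same function on the raw load and again after each transformation is one way to catch integrity regressions close to the step that introduced them.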
```mermaid
graph LR
    subgraph Data_Pipeline
        Raw[Raw Data CSV] -->|Pandas NumPy| Clean[Processed Data]
        Clean -->|Feature Engineering| Features[ML Features]
    end
    subgraph Machine_Learning
        Features -->|Random Forest XGBoost| Forecast[Revenue Forecast Model]
        Features -->|K Means| Segments[Customer Clusters]
        Features -->|Elasticity Algorithm| Pricing[Pricing Model]
    end
    subgraph Insights
        Forecast -->|KPIs| Report[Business Report]
        Segments -->|Cohorts| Report
        Pricing -->|Strategy| Report
    end
```
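The feature-engineering and forecasting path in the diagram can be sketched with lag features and scikit-learn's `RandomForestRegressor`. This is only an illustration: the lags, rolling window, helper names, and synthetic series are assumptions, XGBoost is left out to avoid an extra dependency, and the evaluation is one-step-ahead (each test day sees the actual previous days), which is easier than a true multi-step forecast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def make_lag_features(revenue: pd.Series, lags=(1, 7, 14)) -> pd.DataFrame:
    """Build lag and calendar features from a daily revenue series with a DatetimeIndex."""
    frame = pd.DataFrame({"revenue": revenue})
    for lag in lags:
        frame[f"lag_{lag}"] = revenue.shift(lag)
    # Mean of the previous 7 days; shift(1) keeps the current day's value out of its own feature.
    frame["rolling_mean_7"] = revenue.shift(1).rolling(7).mean()
    frame["day_of_week"] = revenue.index.dayofweek
    return frame.dropna()


def fit_and_evaluate(revenue: pd.Series, test_days: int = 28, seed: int = 42):
    """Train on earlier days, test on the last `test_days`; return the model and test MAPE (%)."""
    data = make_lag_features(revenue)
    X, y = data.drop(columns="revenue"), data["revenue"]
    # Time-ordered split: no shuffling, so the model never trains on future days.
    X_train, X_test = X.iloc[:-test_days], X.iloc[-test_days:]
    y_train, y_test = y.iloc[:-test_days], y.iloc[-test_days:]
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    actual = y_test.to_numpy()
    mape = float(np.mean(np.abs((actual - pred) / actual)) * 100)
    return model, mape


# Synthetic daily revenue with weekly seasonality and noise (illustrative only).
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
weekly = 1 + 0.2 * np.sin(2 * np.pi * dates.dayofweek.to_numpy() / 7)
revenue = pd.Series(10_000 * weekly + rng.normal(0, 300, len(dates)), index=dates)

model, test_mape = fit_and_evaluate(revenue)
print(f"hold-out MAPE: {test_mape:.2f}%")
```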
- Data Processing: Python, Pandas, NumPy
- Machine Learning: Scikit-learn, Statsmodels (ARIMA/SARIMA)
- Visualization: Plotly Interactive Charts, Matplotlib
- Environment: Jupyter Notebooks
```text
├── notebooks/                      # 8-step analysis pipeline
│   ├── 00_Setup_Data_Overview.ipynb
│   ├── 01_EDA.ipynb
│   ├── 03_Forecasting.ipynb        # Revenue prediction models
│   ├── 04_Pricing.ipynb            # Elasticity analysis
│   └── 05_Segmentation.ipynb       # Clustering & CLV
├── reports/                        # Generated assets & visualizations
├── data/                           # Data storage (Raw & Processed)
└── models/                         # Serialized ML models
```
To replicate the analysis or explore the notebooks:
- Clone the repository
  ```bash
  git clone https://github.com/stevenlagadapati/retail-analytics-project.git
  cd retail-analytics-project
  ```
- Install dependencies
  ```bash
  pip install -r requirements.txt
  ```
- Run Jupyter Notebooks
  ```bash
  jupyter notebook notebooks/
  ```
Steven Lagadapati
Data Scientist & Analytics Engineer
Email | GitHub
Made with ❤️ and Python


