This project is a comprehensive SQL + Python case study using a real-world e-commerce dataset to showcase business intelligence, data analysis, and storytelling skills. It is designed to demonstrate end-to-end capabilities for potential clients on UpWork or hiring managers.
- Source: Brazilian E-Commerce Public Dataset by Olist
- Description: Includes orders, customers, payments, reviews, products, sellers, and geolocation data from a multi-vendor marketplace.
- Analyze growth trends: revenue, order volume, and average order value
- Segment customers using RFM analysis
- Assess retention and churn behavior over time
- Deliver visual insights and business-ready metrics
- PostgreSQL
- Python (Pandas, Matplotlib, Seaborn)
- Jupyter Notebook
- VSCode (SQLTools Extension)
- Tracked total revenue and order volume by month
- Calculated and visualized average order value over time
- Output saved as
monthly_revenue.csv,monthly_order_count.csv, andmonthly_avg_order_value.csv
SELECT
TO_CHAR(o.order_purchase_timestamp, 'YYYY-MM') AS order_year_month,
ROUND(SUM(oi.price + oi.freight_value), 2) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_status = 'delivered'
GROUP BY order_year_month
ORDER BY order_year_month;
Figure 1: This line chart displays the total revenue generated from delivered orders over time. It highlights seasonal fluctuations, growth patterns, and potential dips in sales performance.
- Recency, Frequency, and Monetary scores calculated using PostgreSQL
- Customers classified into business segments:
| RFM Profile | Segment |
|---|---|
| R β₯ 3, F β₯ 3, M β₯ 3 | Champions |
| R β₯ 3, F β₯ 2 | Loyal |
| R = 4, F = 1 | New Customers |
| R β€ 2, F β₯ 3 | At Risk |
| M β€ 2, F β€ 2 | Low Value |
| else | Other |
-
Segment-level metrics saved and visualized
-
Outputs:
customer_segments.csv,segment_metrics.csv, andsegment_analysis.png
Figure 2: The chart illustrates how different customer groups contribute to business value β helping identify high-volume vs. high-value targets like Champions and Loyal Customers, while spotlighting low-revenue or at-risk groups for strategic re-engagement.
This visual combines:
- π· Blue bars: total number of customers in each segment
- π§ Orange bars: total revenue (in thousands of BRL) generated by that segment
- π’ Green line: average spend per customer in the segment
- Built cohort-based retention analysis by month
WITH customer_orders AS (
SELECT
c.customer_unique_id,
o.order_id,
TO_CHAR(o.order_purchase_timestamp, 'YYYY-MM') AS order_month
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_status = 'delivered'
),
cohorts AS (
SELECT
customer_unique_id,
MIN(order_month) AS cohort_month
FROM customer_orders
GROUP BY customer_unique_id
)
SELECT
c.customer_unique_id,
ch.cohort_month,
cu.order_month
FROM customer_orders cu
JOIN cohorts ch ON cu.customer_unique_id = ch.customer_unique_id
JOIN customers c ON c.customer_unique_id = cu.customer_unique_id;- Created a retention heatmap showing repeat behavior over time
Figure 3: This heatmap visualizes customer retention by monthly cohorts. The low-intensity values across most cohorts confirm that a majority of customers made only a single purchase and did not return in subsequent months. This insight highlights a key business challenge: limited repeat engagement, suggesting potential for targeted retention strategies or loyalty initiatives.
All key trends and patterns are visualized using matplotlib and seaborn for easy interpretation and presentation.
project-root/
β
βββ data/ # Raw CSV files from Kaggle
βββ sql/
β βββ create_tables.sql # SQL schema creation
β βββ query_tables/ # Output CSVs and visual assets
βββ notebooks/
β βββ ecommerce_analysis.ipynb # Full end-to-end notebook
βββ README.md
β
βββ figs/ # Saved plotted figures
- SQL-based data modeling and analytics
- Python data wrangling and visualization
- RFM segmentation and cohort analysis
- Business insights communication
- Reproducible project structure for clients or employers
π§ Use This Project To:
- Showcase data storytelling on UpWork
- Submit a polished example in data analyst interviews
- Customize for future marketplace or CRM analytics projects
Created with πΌ job-readiness in mind.