Skip to content

zanyarsalami/ecommerce-analytic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

E-Commerce Analytics Case Study (Brazilian Marketplace Dataset)

This project is a comprehensive SQL + Python case study using a real-world e-commerce dataset to showcase business intelligence, data analysis, and storytelling skills. It is designed to demonstrate end-to-end capabilities for potential clients on UpWork or hiring managers.

πŸ“¦ Dataset

🎯 Goals

  • Analyze growth trends: revenue, order volume, and average order value
  • Segment customers using RFM analysis
  • Assess retention and churn behavior over time
  • Deliver visual insights and business-ready metrics

🧰 Tech Stack

  • PostgreSQL
  • Python (Pandas, Matplotlib, Seaborn)
  • Jupyter Notebook
  • VSCode (SQLTools Extension)

πŸ” Key Analyses

1. Growth Trends

  • Tracked total revenue and order volume by month
  • Calculated and visualized average order value over time
  • Output saved as monthly_revenue.csv, monthly_order_count.csv, and monthly_avg_order_value.csv
SELECT
    TO_CHAR(o.order_purchase_timestamp, 'YYYY-MM') AS order_year_month,
    ROUND(SUM(oi.price + oi.freight_value), 2) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_status = 'delivered'
GROUP BY order_year_month
ORDER BY order_year_month;

Monthly RevenueFigure 1: This line chart displays the total revenue generated from delivered orders over time. It highlights seasonal fluctuations, growth patterns, and potential dips in sales performance.

2. Customer Segmentation (RFM)

  • Recency, Frequency, and Monetary scores calculated using PostgreSQL
  • Customers classified into business segments:
RFM Profile Segment
R β‰₯ 3, F β‰₯ 3, M β‰₯ 3 Champions
R β‰₯ 3, F β‰₯ 2 Loyal
R = 4, F = 1 New Customers
R ≀ 2, F β‰₯ 3 At Risk
M ≀ 2, F ≀ 2 Low Value
else Other
  • Segment-level metrics saved and visualized

  • Outputs: customer_segments.csv, segment_metrics.csv, and segment_analysis.png

Customer SegmentationFigure 2: The chart illustrates how different customer groups contribute to business value β€” helping identify high-volume vs. high-value targets like Champions and Loyal Customers, while spotlighting low-revenue or at-risk groups for strategic re-engagement.

This visual combines:

  • πŸ”· Blue bars: total number of customers in each segment
  • 🟧 Orange bars: total revenue (in thousands of BRL) generated by that segment
  • 🟒 Green line: average spend per customer in the segment

3. Churn & Retention

  • Built cohort-based retention analysis by month
WITH customer_orders AS (
    SELECT
        c.customer_unique_id,
        o.order_id,
        TO_CHAR(o.order_purchase_timestamp, 'YYYY-MM') AS order_month
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.order_status = 'delivered'
),

cohorts AS (
    SELECT
        customer_unique_id,
        MIN(order_month) AS cohort_month
    FROM customer_orders
    GROUP BY customer_unique_id
)

SELECT
    c.customer_unique_id,
    ch.cohort_month,
    cu.order_month
FROM customer_orders cu
JOIN cohorts ch ON cu.customer_unique_id = ch.customer_unique_id
JOIN customers c ON c.customer_unique_id = cu.customer_unique_id;
  • Created a retention heatmap showing repeat behavior over time

HeatmapFigure 3: This heatmap visualizes customer retention by monthly cohorts. The low-intensity values across most cohorts confirm that a majority of customers made only a single purchase and did not return in subsequent months. This insight highlights a key business challenge: limited repeat engagement, suggesting potential for targeted retention strategies or loyalty initiatives.

πŸ“ˆ Visualizations

All key trends and patterns are visualized using matplotlib and seaborn for easy interpretation and presentation.

πŸ“‚ Folder Structure

project-root/
β”‚
β”œβ”€β”€ data/                     # Raw CSV files from Kaggle
β”œβ”€β”€ sql/
β”‚   β”œβ”€β”€ create_tables.sql     # SQL schema creation
β”‚   └── query_tables/         # Output CSVs and visual assets
β”œβ”€β”€ notebooks/
β”‚   └── ecommerce_analysis.ipynb  # Full end-to-end notebook
└── README.md
β”‚
└── figs/                     # Saved plotted figures

βœ… Key Skills Demonstrated

  • SQL-based data modeling and analytics
  • Python data wrangling and visualization
  • RFM segmentation and cohort analysis
  • Business insights communication
  • Reproducible project structure for clients or employers

🧠 Use This Project To:

  • Showcase data storytelling on UpWork
  • Submit a polished example in data analyst interviews
  • Customize for future marketplace or CRM analytics projects

Created with πŸ’Ό job-readiness in mind.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published