Product Range Analysis

Masterschool's capstone project integrating skills and tools for data analysis.

Project Goal

  • Identify the top performers: top customer, top product, and top category
  • Identify customer segments based on RFM (recency, frequency, monetary) metrics
  • Determine whether higher-priced products contributed more to sales than lower-priced products
  • Determine whether sales were higher during the Christmas season (December) than in other months
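As an illustration of the December question, here is a minimal sketch that compares average daily revenue in December with the other months. It is a descriptive comparison only, not the statistical test used in the notebooks. The `daily_revenue` values and both function names are hypothetical, and daily averages are used on the assumption that months may not all be fully covered by the data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily revenue totals, not taken from the real dataset.
daily_revenue = {
    date(2019, 10, 1): 100.0, date(2019, 10, 2): 140.0,
    date(2019, 11, 1): 150.0, date(2019, 11, 2): 170.0,
    date(2019, 12, 1): 200.0, date(2019, 12, 2): 260.0,
}


def mean_daily_revenue_by_month(daily):
    """Average revenue per day, grouped by (year, month)."""
    by_month = defaultdict(list)
    for day, total in daily.items():
        by_month[(day.year, day.month)].append(total)
    return {month: mean(totals) for month, totals in by_month.items()}


def december_vs_rest(monthly):
    """Return (mean over December months, mean over all other months)."""
    december = [v for (_, m), v in monthly.items() if m == 12]
    rest = [v for (_, m), v in monthly.items() if m != 12]
    return mean(december), mean(rest)


monthly = mean_daily_revenue_by_month(daily_revenue)
dec_mean, rest_mean = december_vs_rest(monthly)
print(f"December: {dec_mean:.2f} per day, other months: {rest_mean:.2f} per day")
```

A formal answer would still need a hypothesis test on top of this comparison.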

Project Scope

  • Data Preprocessing
  • Exploratory Data Analysis
  • Customer Segmentation based on RFM Metrics (using percentile ranking and K-means clustering)
  • Product Categorization & Product Category Analysis
  • Statistical Hypotheses
  • Insights
  • Dashboard (Tableau)
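The percentile-ranking part of the RFM segmentation can be sketched as follows. This is a minimal illustration using only the standard library, not the code from the notebooks (which may well use pandas), and it does not cover the K-means step. The sample `purchases`, the `snapshot` date, the four-bin scoring, and the function names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample: (customer_id, invoice_no, invoice_date, revenue).
purchases = [
    ("C1", "1001", date(2019, 12, 1), 120.0),
    ("C1", "1002", date(2019, 12, 5), 80.0),
    ("C2", "1003", date(2019, 6, 10), 40.0),
    ("C3", "1004", date(2019, 10, 20), 300.0),
    ("C3", "1005", date(2019, 11, 2), 50.0),
    ("C3", "1006", date(2019, 11, 30), 25.0),
    ("C4", "1007", date(2019, 1, 15), 10.0),
]
snapshot = date(2019, 12, 8)  # assumed reference date for recency


def rfm_table(rows, snapshot_date):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_seen = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for cust, inv, day, revenue in rows:
        last_seen[cust] = max(day, last_seen.get(cust, day))
        invoices[cust].add(inv)  # frequency counts distinct invoices
        spend[cust] += revenue
    return {
        c: ((snapshot_date - last_seen[c]).days, len(invoices[c]), spend[c])
        for c in last_seen
    }


def percentile_scores(values, bins=4, higher_is_better=True):
    """Score each customer 1..bins by percentile rank; ties share a score.

    Quadratic in the number of customers, which is fine for a sketch.
    """
    n = len(values)
    scores = {}
    for cust, v in values.items():
        if higher_is_better:
            k = sum(1 for other in values.values() if other <= v)
        else:
            k = sum(1 for other in values.values() if other >= v)
        scores[cust] = (bins * k + n - 1) // n  # integer ceiling of bins*k/n
    return scores


rfm = rfm_table(purchases, snapshot)
r = percentile_scores({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
f = percentile_scores({c: v[1] for c, v in rfm.items()})
m = percentile_scores({c: v[2] for c, v in rfm.items()})
rfm_scores = {c: (r[c], f[c], m[c]) for c in rfm}
print(rfm_scores)
```

Recency is scored in reverse because a smaller number of days since the last purchase is better, while higher frequency and monetary values earn higher scores.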

The project is composed of five Jupyter notebooks:

  • 1_data_prep_and_EDA.ipynb
  • 2_customer_rfm_segmentation.ipynb
  • 3_product_categorization.ipynb
  • 4_product_category_analysis.ipynb
  • 5_statistical_hypotheses.ipynb

A summary of the observations, insights, conclusions, and resources can be found in the 5_statistical_hypotheses.ipynb notebook.

Dataset Overview

The dataset is a modified version of the online retail dataset sourced from the UCI Machine Learning Repository. It contains 541,909 transaction records dated from 2018-11-29 to 2019-12-07, each with seven attributes:

  • InvoiceNo: Invoice reference number uniquely assigned to each transaction. An InvoiceNo starting with 'C' indicates a cancellation
  • StockCode: Product or item code uniquely assigned to each distinct product
  • Description: Product or item name
  • Quantity: Quantity of each product or item per transaction
  • InvoiceDate: Invoice date and time, i.e. when the transaction was generated by the system
  • UnitPrice: Product price per unit
  • CustomerID: Customer reference number uniquely assigned to each customer
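To make the attribute list concrete, the sketch below models one record and applies the cancellation rule described above. It is only an illustration under stated assumptions: the `Transaction` class, the helper names, and the sample rows are hypothetical. Line revenue as Quantity times UnitPrice is a derived value, not a dataset column, and the negative quantity on the cancelled row is likewise assumed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """One row of the dataset, using the seven attributes listed above."""
    invoice_no: str
    stock_code: str
    description: str
    quantity: int
    invoice_date: datetime
    unit_price: float
    customer_id: str


def is_cancellation(row: Transaction) -> bool:
    # An InvoiceNo starting with 'C' marks a cancellation.
    return row.invoice_no.startswith("C")


def line_revenue(row: Transaction) -> float:
    # Derived value (an assumption here): quantity times price per unit.
    return row.quantity * row.unit_price


# Hypothetical sample rows, not taken from the real data.
rows = [
    Transaction("536365", "A100", "MUG", 6, datetime(2018, 11, 29, 8, 26), 2.5, "17850"),
    Transaction("C536379", "A100", "MUG", -1, datetime(2018, 11, 29, 9, 41), 2.5, "14527"),
    Transaction("536380", "B200", "LAMP", 2, datetime(2018, 11, 29, 9, 45), 12.0, "17850"),
]

# Keep only completed (non-cancelled) lines before summing revenue.
completed = [r for r in rows if not is_cancellation(r)]
total_revenue = sum(line_revenue(r) for r in completed)
print(len(completed), total_revenue)
```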