DSKunth/Product-Range-Analysis

Product Range Analysis

Masterschool's capstone project integrating skills and tools for data analysis.

Project Goal

  • Identify the top performers: top customer, top product, and top category
  • Identify the customer segments based on RFM
  • Determine whether higher-priced products contributed more to sales than lower-priced products
  • Determine whether sales were higher during the Christmas season (December) than in other months
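
As a rough illustration of the December question, the sketch below totals sales per calendar month and compares the December average with the average of the remaining months. It is a descriptive comparison only, not the statistical test used in the notebooks. The function names and the sample rows are hypothetical, and plain standard-library Python is used so the example runs anywhere.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_sales(rows):
    """Sum sales amounts per (year, month).

    `rows` is an iterable of (purchase_date, amount) pairs.
    """
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


def december_vs_rest(totals):
    """Return (mean December total, mean total of all other months)."""
    december = [v for (_, month), v in totals.items() if month == 12]
    rest = [v for (_, month), v in totals.items() if month != 12]
    return mean(december), mean(rest)


# Hypothetical sample rows, not taken from the dataset.
rows = [
    (date(2018, 12, 3), 120.0),
    (date(2018, 12, 20), 80.0),
    (date(2019, 1, 15), 60.0),
    (date(2019, 2, 10), 90.0),
    (date(2019, 2, 11), 30.0),
]

totals = monthly_sales(rows)
dec_mean, rest_mean = december_vs_rest(totals)
print(f"December: {dec_mean:.2f}, other months: {rest_mean:.2f}")
```

Because the records end on 2019-12-07, December 2019 covers only about one week, so a comparison on the real data would need to account for that partial month.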

Project Scope

  • Data Preprocessing
  • Exploratory Data Analysis
  • Customer Segmentation based on RFM Metrics (using percentile ranking and K-means clustering)
  • Product Categorization & Product Category Analysis
  • Statistical Hypotheses
  • Insights
  • Dashboard (Tableau)
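
To make the RFM step more concrete, here is a minimal sketch of the percentile-ranking variant only (K-means is not shown). It assumes recency is measured in days before a chosen snapshot date, frequency is the number of distinct invoices, and monetary is the summed purchase amount. These definitions, the function names, and the sample purchases are assumptions for illustration and may differ from what `2_customer_rfm_segmentation.ipynb` does.

```python
from collections import defaultdict
from datetime import date


def rfm_table(purchases, snapshot):
    """Build {customer: (recency_days, frequency, monetary)}.

    `purchases` is an iterable of
    (customer_id, invoice_no, purchase_date, amount) tuples.
    """
    last_purchase = {}
    invoices = defaultdict(set)
    monetary = defaultdict(float)
    for customer, invoice, day, amount in purchases:
        last_purchase[customer] = max(last_purchase.get(customer, day), day)
        invoices[customer].add(invoice)
        monetary[customer] += amount
    return {
        c: ((snapshot - last_purchase[c]).days, len(invoices[c]), monetary[c])
        for c in last_purchase
    }


def percentile_rank(values):
    """Map each value to the share of values that are <= it."""
    n = len(values)
    return {v: sum(1 for other in values if other <= v) / n for v in set(values)}


def rfm_scores(table):
    """Score each customer in (0, 1] per metric; higher is better."""
    # Recency is negated so that a more recent purchase ranks higher.
    r_rank = percentile_rank([-r for r, _, _ in table.values()])
    f_rank = percentile_rank([f for _, f, _ in table.values()])
    m_rank = percentile_rank([m for _, _, m in table.values()])
    return {c: (r_rank[-r], f_rank[f], m_rank[m]) for c, (r, f, m) in table.items()}


# Hypothetical purchases, not taken from the dataset.
purchases = [
    ("A", "1001", date(2019, 12, 1), 50.0),
    ("A", "1002", date(2019, 12, 5), 30.0),
    ("B", "1003", date(2019, 6, 1), 20.0),
    ("C", "1004", date(2019, 11, 1), 200.0),
]

table = rfm_table(purchases, snapshot=date(2019, 12, 8))
scores = rfm_scores(table)
for customer, score in sorted(scores.items()):
    print(customer, table[customer], score)
```

The per-metric ranks could then be binned (for example into quartiles) and combined into segment labels; the cut-offs are a design choice left open here.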

The project is composed of five Jupyter notebooks:

  • 1_data_prep_and_EDA.ipynb
  • 2_customer_rfm_segmentation.ipynb
  • 3_product_categorization.ipynb
  • 4_product_category_analysis.ipynb
  • 5_statistical_hypotheses.ipynb

The summary of observations, insights, conclusions, and resources can be found in the 5_statistical_hypotheses.ipynb notebook.

Dataset Overview

The dataset is a modified version of the Online Retail dataset from the UCI Machine Learning Repository. It contains 541,909 transaction records dated from 2018-11-29 to 2019-12-07, each with seven attributes:

  • InvoiceNo: Invoice reference number uniquely assigned to each transaction. An InvoiceNo starting with 'C' indicates a cancellation
  • StockCode: Product or item code uniquely assigned to each distinct product
  • Description: Product or item name
  • Quantity: Quantity of each product or item per transaction
  • InvoiceDate: Invoice date and time, that is, when the system generated the transaction
  • UnitPrice: Product price per unit
  • CustomerID: Customer reference number uniquely assigned to each customer
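
The attributes above can be modeled as a small record type. The sketch below shows one way to flag cancellations by the 'C' prefix and to derive a per-line sales amount as Quantity × UnitPrice. The `Transaction` class, the helper functions, and the sample values are hypothetical; the derived amount is an assumption about how sales would be computed, not a column in the dataset.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Transaction:
    """One record with the seven attributes listed above."""
    invoice_no: str
    stock_code: str
    description: str
    quantity: int
    invoice_date: datetime
    unit_price: float
    customer_id: str


def is_cancellation(t: Transaction) -> bool:
    """An InvoiceNo starting with 'C' marks a cancellation."""
    return t.invoice_no.startswith("C")


def line_amount(t: Transaction) -> float:
    """Sales amount for one line (assumed: Quantity * UnitPrice)."""
    return t.quantity * t.unit_price


# Hypothetical sample records, not taken from the dataset.
sale = Transaction("536365", "10001", "SAMPLE ITEM", 6,
                   datetime(2018, 11, 29, 8, 26), 2.50, "17850")
refund = Transaction("C536379", "10001", "SAMPLE ITEM", -1,
                     datetime(2018, 11, 29, 9, 41), 2.50, "17850")

for t in (sale, refund):
    print(t.invoice_no, is_cancellation(t), line_amount(t))
```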