Market Basket Analysis (MBA) also known as association rule learning or affinity analysis, is a data mining technique that can be used in various fields, such as marketing, bioinformatics, education field, nuclear science, etc. The main aim of MBA in marketing is to provide information to the retailer to understand the purchase behavior of the buyer, which can help the retailer in correct decision-making (Kaur & Kang, 2016).
The objective of this analysis is to determine when customers have 2 certain products in their shopping cart or basket, which items they will be likely to purchase with.
- The AdventureWorksDW21019 sample database of Microsoft SQL Server includes the following tables:
- FactInternetSales table, recording sales through internet channel, will be mainly used to analyze our basket with two attributes: SalesOrderNumber and ProductKey.
SELECT TOP 10
SalesOrderNumber
, ProductKey
FROM FactInternetSales;
SELECT
SalesOrderNumber
, COUNT(ProductKey) AS NumberofProducts
FROM FactInternetSales
GROUP BY SalesOrderNumber
HAVING COUNT(ProductKey) >= 3;
WITH OrderList AS
(
SELECT
SalesOrderNumber
, COUNT(ProductKey) AS NumberofProducts
FROM FactInternetSales
GROUP BY SalesOrderNumber
HAVING COUNT(ProductKey) >= 3
)
SELECT
OrderList.SalesOrderNumber
, InSales.ProductKey
FROM OrderList
JOIN FactInternetSales AS InSales
ON Orderlist.SalesOrderNumber = InSales.SalesOrderNumber;
WITH OrderList AS
(
SELECT
SalesOrderNumber
, COUNT(ProductKey) AS NumberofProducts
FROM FactInternetSales
GROUP BY SalesOrderNumber
HAVING COUNT(ProductKey) >= 3
),
OrderInfo AS
(
SELECT
OrderList.SalesOrderNumber
, InSales.ProductKey
FROM OrderList
JOIN FactInternetSales AS InSales
ON Orderlist.SalesOrderNumber = InSales.SalesOrderNumber
)
SELECT
OrderInfo1.SalesOrderNumber
, OrderInfo1.ProductKey AS Product1
, OrderInfo2.ProductKey AS Product2
, OrderInfo3.ProductKey AS Product3
FROM OrderInfo AS OrderInfo1
JOIN OrderInfo AS OrderInfo2
ON OrderInfo2.SalesOrderNumber = OrderInfo1.SalesOrderNumber
JOIN OrderInfo AS OrderInfo3
ON OrderInfo3.SalesOrderNumber = OrderInfo1.SalesOrderNumber
WHERE OrderInfo1.ProductKey < OrderInfo2.ProductKey
AND OrderInfo2.ProductKey < OrderInfo3.ProductKey;
- We self-join 3 CTE tables (tables of Sales Order Number and Product Key) to create a new table with an Order Number and 3 different products in that order.*
- Condition to avoid duplicate - WHERE:
OrderInfo1.ProductKey < OrderInfo2.ProductKey < OrderInfo3.ProductKey. - We have the following result:
WITH OrderList AS
(
SELECT
SalesOrderNumber
, COUNT(ProductKey) AS NumberofProducts
FROM FactInternetSales
GROUP BY SalesOrderNumber
HAVING COUNT(ProductKey) >= 3
),
OrderInfo AS
(
SELECT
OrderList.SalesOrderNumber
, InSales.ProductKey
FROM OrderList
JOIN FactInternetSales AS InSales
ON Orderlist.SalesOrderNumber = InSales.SalesOrderNumber
)
SELECT DISTINCT
OrderInfo1.ProductKey AS Product1
, OrderInfo2.ProductKey AS Product2
, OrderInfo3.ProductKey AS Product3
, COUNT(OrderInfo3.ProductKey) OVER(PARTITION BY OrderInfo1.ProductKey, OrderInfo2.ProductKey, OrderInfo3.ProductKey) AS Frequency
FROM OrderInfo AS OrderInfo1
JOIN OrderInfo AS OrderInfo2
ON OrderInfo2.SalesOrderNumber = OrderInfo1.SalesOrderNumber
JOIN OrderInfo AS OrderInfo3
ON OrderInfo3.SalesOrderNumber = OrderInfo1.SalesOrderNumber
WHERE OrderInfo1.ProductKey < OrderInfo2.ProductKey
AND OrderInfo2.ProductKey < OrderInfo3.ProductKey
ORDER BY Frequency DESC;
- From the above result, we can recognize that when there are products number 477 and 478 in customers' baskets, they are most likely to buy products number 485.
- We can apply MBA in purchasing suggestions for customers or product arrangements in supermarkets to improve overall sales.
SQL can help us to query data needed for Market Basket Analysis by suggesting them the most relevant product besides what they have chose to optimize revenue for the organizations or businesses and make the right business decisions.
- Kaur, M. and Kang, S. (2016) “Market basket analysis: Identify the changing trends of Market Data Using Association Rule Mining,” Procedia Computer Science, 85, pp. 78–85. Available at: https://doi.org/10.1016/j.procs.2016.05.180.