The growth of supermarkets in most populated cities are increasing and market competitions are also high. The dataset is one of the historical sales of supermarket company which has recorded in 3 different branches for 3 months data.
I utilize the 'groupby' function in pandas to identify trends and patterns throughout my analysis. This approach will make informed decisions aimed at enhancing the business.
The primary dataset used for this project was downloaded here https://www.kaggle.com/datasets/lovishbansal123/sales-of-a-supermarket
Below is a detailed breakdown of each column descriptions below:
Invoice id: Computer generated sales slip invoice identification number.
Branch: Branch of supercenter (3 branches are available identified by A, B and C).
City: Location of supercenters.
Customer type: Type of customers, recorded by Members for customers using member card and Normal for without member card.
Gender: Gender type of customer.
Product line: General item categorization groups - Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and lifestyle, Sports and travel.
Unit price: Price of each product in $.
Quantity: Number of products purchased by customer.
Tax: 5% tax fee for customer buying.
Total: Total price including tax.
Date: Date of purchase (Record available from January 2019 to March 2019).
Time: Purchase time (10am to 9pm).
Payment: Payment used by customer for purchase (3 methods are available – Cash, Credit card and Ewallet).
COGS: Cost of goods sold.
Gross margin percentage: Gross margin percentage.
Gross income: Gross income.
I analyzed a dataset of 1000 records using Python, leveraging libraries such as pandas for data manipulation, NumPy for statistical insights, and seaborn and matplotlib for data visualization. This approach allowed me to interpret and communicate insights effectively through visual representations.
In the initial data preparation, I performed the following tasks below:
- Data loading and inspection.
- Identify and handling for missing values.
- Checking for duplicate values.
- Standardizing datatypes
- Visualized column quantity with boxplot to check if there were outliers across the dataset.
EDA involved exploring the supermarket dataset to answer key questions such as:
- What was the distribution of sales across different branches?
- How does customer type (member/normal) sales revenue varies across the dataset?
- What was the average gross income for each product line?
- What month has the best revenue income?
- Find payment type with the most revenue.
- Find the correlation relationship between cogs, gross income, Rating, Unit price, Quantity and Tax 5%
- What was the percentage gender demographic in the dataset?
- What was the average rating for each product line?
- What city has the best order quantity?
- Which product lines have the highest and lowest percentage sales volumes?
- How does weekly sales vary across the dataframe?
- Branches "C" and "B" tend to generate high and low distribution sales of $110,568.71 and $106,197.67 across different branches.
- Member and Normal payment types generate $164,223.44 and $158,743.40 respectively.
- Home and lifestyle tend to have the highest average income of $16.03, while fashion and accessories tend to have the lowest average income.
- January tends to generate the best revenue income of $116,291.87 compared to other months.
- Cash payment type tends to generate the most revenue of $112,206.57.
- "COGS" and "gross income" have a perfect positive correlation of 1.0, indicating they move together perfectly.
- "Rating" has a weak negative correlation with "COGS" and "gross income" (-0.03), suggesting that as "Rating" decreases slightly, "COGS" and "gross income" may also decrease slightly.
- "Unit price" has a moderately positive correlation with "COGS" and "gross income" (0.63), indicating that as the unit price increases, both "COGS" and "gross income" tend to increase.
- "Quantity" has a strong positive correlation with "COGS" and "gross income" (0.70), indicating that as the quantity sold increases, both "COGS" and "gross income" tend to increase.
- "Tax 5%" has a perfect positive correlation with "COGS" and "gross income" (1.00), suggesting they move together perfectly, possibly implying that the tax is directly proportional to both "COGS" and "gross income".
- There was a high percentage of 50.10% female and 49.90% male across the dataset.
- Food beverage and home lifestyle tend to have the highest and lowest average ratings of 7.11 and 6.84 respectively.
- Yangon city tends to have the highest order quantity.
- Product lines with the highest and lowest percentages are food and beverages and health and beauty, with 17.38% and 15.23% respectively.
- Branch Performance Analysis: Identifying what makes Branch "C" successful and implementing similar strategies in Branch "B" could help improve its performance.
- Payment Type Analysis: Since member payment types generate higher revenue compared to normal payment types, efforts should be made to encourage more customers to become members or to promote member-exclusive offers to increase revenue.
- Product Category Analysis: Home and lifestyle products seem to be performing well compared to fashion and accessories. This suggests a potential opportunity to optimize the product mix or marketing strategies for fashion and accessories to increase sales and average income.
- Monthly Revenue Analysis: January is identified as the month generating the best revenue income. Understanding the factors contributing to this peak performance, such as seasonal trends or promotional activities, can help in planning future marketing campaigns and sales strategies.
- Payment Method Impact: Cash payment type generates the most revenue, indicating that offering discounts or incentives for cash payments could be beneficial. However, exploring reasons behind this preference and ensuring convenient payment options for all customers is essential.
- Correlation Analysis: Understanding the correlations between variables like "cogs," "gross income," and others could help in predicting and managing financial performance. Monitoring these correlations over time could provide insights into changing business dynamics.
- Gender Demographics: The nearly equal distribution of male and female customers suggests that marketing strategies should cater to both demographics equally, ensuring inclusivity and broad appeal in advertising and product offerings.
- Product Rating Analysis: Analyzing product ratings across different categories could provide insights into customer preferences and satisfaction levels. Improving the quality or variety of products with lower ratings could enhance overall customer experience and loyalty.
- Location-Based Analysis: Yangon city stands out for having the highest order quantity. Further investigation into the reasons behind this trend could uncover opportunities for targeted marketing or expansion strategies in similar locations.
- Product Line Performance: Understanding the performance of different product lines, such as food and beverages versus health and beauty, could help in optimizing inventory management, marketing efforts, and overall business focus. Identifying the reasons behind the success of certain product lines could inform strategic decision-making.