This repository demonstrates the process of extracting, cleaning, transforming, and analyzing retail orders data. The project uses Python and MySQL to perform data extraction from Kaggle, data cleaning, and exploratory data analysis (EDA) to derive business insights.
In this project, I processed and analyzed a dataset containing over 10,000 retail orders. The process involved:
- Extracting the Dataset: The retail orders dataset was downloaded from Kaggle using the Kaggle API in Python.
- Data Cleaning: Null values were treated, columns were renamed, date columns were updated, and unnecessary columns were dropped.
- Data Transformation: The cleaned data was loaded into a MySQL database for further analysis.
- Exploratory Data Analysis (EDA): SQL queries were used to answer various business questions about product performance, revenue generation, sales growth, and more.
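The cleaning steps above can be sketched with pandas on a tiny synthetic sample. The column names, placeholder strings, and derived sale-price formula here are assumptions for illustration, not the dataset's actual schema:

```python
import pandas as pd

# Tiny synthetic sample standing in for the real orders file.
raw = pd.DataFrame({
    "Order Id": [1, 2, 3],
    "Order Date": ["2023-03-01", "2023-03-02", "2023-03-03"],
    "Ship Mode": ["Standard Class", "Not Available", "unknown"],
    "List Price": [100, 200, 150],
    "Discount Percent": [10, 0, 5],
})

# Treat placeholder strings as proper nulls.
df = raw.replace({"Ship Mode": {"Not Available": None, "unknown": None}})

# Rename columns to snake_case for easier SQL access.
df.columns = df.columns.str.lower().str.replace(" ", "_")

# Convert the date column to a real datetime type.
df["order_date"] = pd.to_datetime(df["order_date"], format="%Y-%m-%d")

# Derive a sale price, then drop columns no longer needed.
df["sale_price"] = df["list_price"] * (1 - df["discount_percent"] / 100)
df = df.drop(columns=["list_price", "discount_percent"])
```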
The following key business queries were analyzed using the dataset:
- Top 10 Highest Revenue Generating Products
- Top 5 Highest Selling Products in Each Region
- Month-over-Month Growth Comparison for Sales (2022 vs. 2023)
- Month with Highest Sales in Each Category
- Highest Growth by Profit in Sub-Categories (2023 vs. 2022)
- Revenue Contribution of Each Customer Segment
- Products or Regions with the Highest Profit Margins
- Impact of Ship Mode on Sales and Profitability
- Comparison of Sales Across Seasons
- Seasonal Sales Growth and Regional Performance Analysis
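As a flavor of the first question, the top-revenue products can be computed with a pandas group-by; the toy data and column names below are illustrative only:

```python
import pandas as pd

# Toy order lines standing in for the real dataset.
orders = pd.DataFrame({
    "product_id": ["A", "B", "A", "C", "B", "A"],
    "sale_price": [10.0, 5.0, 12.0, 8.0, 7.0, 11.0],
    "quantity":   [2, 1, 1, 3, 2, 1],
})

orders["revenue"] = orders["sale_price"] * orders["quantity"]

# Equivalent of: SELECT product_id, SUM(sale_price * quantity) AS revenue
#                FROM orders GROUP BY product_id ORDER BY revenue DESC LIMIT 10;
top10 = (orders.groupby("product_id")["revenue"]
               .sum()
               .sort_values(ascending=False)
               .head(10))
```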
Directory structure:
└── anshulkansal121-Retail_Orders_ETL/
├── Order_Data_Queries.sql # EDA SQL Script
    ├── Retail Orders Data Cleaning and Loading.ipynb # Jupyter notebook for data extraction, cleaning, and loading
├── orders.csv # Original Data
├── orders.csv.zip
└── .ipynb_checkpoints/
└── Retail Order Data Cleaning and Loading-checkpoint.ipynb
To run this project, you need the following libraries installed:
pandas
numpy
mysql-connector-python
pymysql
kaggle
You can install the required dependencies using:
pip install -r requirements.txt
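Note that the directory listing above does not include a requirements.txt; if it is missing in your clone, a minimal one matching the libraries listed above would be:

```text
pandas
numpy
mysql-connector-python
pymysql
kaggle
```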
- Set up the Kaggle API: Ensure you have a Kaggle API key configured; you can follow the official Kaggle API guide to do so. The kaggle.json file needs to be placed at ~/.kaggle/kaggle.json.
- Run the Python Script: Download and run Retail Orders Data Cleaning and Loading.ipynb. This notebook performs the data extraction and cleaning, then loads the result into the MySQL server.
- Run EDA and Analysis: Open and run Order_Data_Queries.sql, or try answering the analysis questions listed above yourself.
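The per-region ranking questions typically use a window function. The sketch below runs against an in-memory SQLite database for portability (the syntax matches MySQL 8+); the table and column names are illustrative, and it keeps the top 2 per region rather than the top 5:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE df_orders (region TEXT, product_id TEXT, revenue REAL);
INSERT INTO df_orders VALUES
  ('East', 'A', 50), ('East', 'B', 30), ('East', 'C', 20),
  ('West', 'A', 10), ('West', 'B', 40);
""")

# Rank products within each region by total revenue, keep the top 2.
query = """
WITH ranked AS (
  SELECT region, product_id, SUM(revenue) AS total,
         ROW_NUMBER() OVER (
           PARTITION BY region ORDER BY SUM(revenue) DESC
         ) AS rn
  FROM df_orders
  GROUP BY region, product_id
)
SELECT region, product_id, total FROM ranked WHERE rn <= 2
ORDER BY region, total DESC;
"""
rows = list(con.execute(query))
```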
To contribute to this project, please follow these guidelines:
Steps to Create a PR:
- Fork the Repository
- Create a New Branch for your changes:
  git checkout -b feature/your-feature-name
- Implement your feature, bug fix, or relevant insight, and follow the code style used in the repository.
- Write clear and concise commit messages describing the changes you made:
  git commit -m "Added feature or fixed bug"
- Push your changes to your forked repository:
  git push origin feature/your-feature-name
- Open a PR from your branch to the main repository. Ensure your PR follows these rules:
- The PR should be clear and well-documented.
- Ensure the PR only contains relevant changes.
- Add detailed descriptions of the changes in the PR comments.