This project showcases a complete data analysis pipeline, utilizing SQL for data extraction and Python for data visualization and insights generation. The dataset comes from a large-scale Ecommerce platform, and the goal was to derive key business insights through structured queries and visual analysis.
By blending the power of SQL Workbench for database management and Python (Jupyter Notebook) for analysis and visualizations, this project highlights my ability to work across different tools and technologies in data science. The insights drawn from this dataset could inform critical business decisions, such as customer retention strategies, sales optimization, and revenue growth forecasting.
- Source: A comprehensive Ecommerce dataset, including customer orders, product details, and payment information.
- Format: CSV files stored locally and ingested into a MySQL database for structured querying.
- Dataset-link: https://www.kaggle.com/datasets/devarajv88/target-dataset?select=products.csv
- SQL Workbench: Used to run queries and extract insights from the dataset.
- Python (Jupyter Notebook): Employed for connecting to the SQL database, performing advanced analysis, and visualizing the results.
- MySQL: Database management for storing and querying the Ecommerce data.
- Libraries:
pandas
for data manipulation and cleaning.matplotlib
andseaborn
for visualizations.mysql-connector-python
for MySQL interaction.
-
SQL Querying:
- Extracted detailed insights with multiple levels of complexity (Basic, Intermediate, and Advanced queries).
- Queries include data on customer location, sales performance, product category revenue, customer retention, and more.
-
Data Analysis:
- Used Python to compute additional metrics like moving averages, year-over-year growth, and customer retention rates.
-
Visualizations:
- Presented key business insights through compelling visualizations (bar plots, line graphs, heatmaps) to enhance data understanding and decision-making.
- Identified cities with the most active customers.
- Counted the number of orders placed in a specific year.
- Calculated sales by category to identify top-performing products.
- Analyzed the number of orders per month to identify sales trends.
- Found the average number of products per order based on customer location.
- Determined the percentage of revenue generated by each product category.
- Computed the moving average of order values for each customer, giving insights into purchase trends.
- Calculated the retention rate of customers, helping to develop customer retention strategies.
- Identified the top 3 customers who contributed the most revenue per year, crucial for loyalty programs.
- Sales trends and monthly order patterns.
- Product category performance breakdown.
- Year-over-year growth visualized to track business progress.
- Customer behavior patterns to inform marketing strategies.
- How to integrate SQL with Python for large-scale data analysis.
- Techniques for querying and extracting data from a MySQL database.
- Advanced data analysis skills in Jupyter Notebook.
- Hands-on experience with key Ecommerce KPIs like sales performance, customer retention, and product popularity.
- Adding machine learning models to predict customer lifetime value (CLV) and churn.
- Implementing real-time data pipelines for more dynamic analysis.
Feel free to explore the project, and don't hesitate to get in touch if you’d like to discuss my work!