This project demonstrates an end-to-end data analytics workflow, from raw customer transaction data to business insights. It covers data loading and cleaning in Python, exploratory data analysis (EDA), SQL-based analysis, and interactive visualization in Power BI.
The objective is to analyze customer shopping behavior and highlight key trends in spending, subscriptions, and product performance.
The dataset contains customer shopping and transaction data, including:
- Customer demographics (age, gender, location)
- Purchase details (product, category, amount, season)
- Shopping behavior (discounts, reviews, frequency, subscriptions)
The dataset is used to study purchasing patterns and customer segmentation.
- Python – Pandas, NumPy, Matplotlib, Seaborn
- SQL – PostgreSQL / MySQL / SQL Server
- Power BI – Dashboard and data visualization
- Excel – Supporting analysis
- Jupyter Notebook / VS Code – Development environment
- Loaded dataset using Pandas
- Handled missing values and inconsistent data
- Renamed columns for clarity
- Performed feature engineering (age groups, purchase frequency)
- Prepared cleaned data for database storage
- Distribution of purchase amounts
- Customer segmentation analysis
- Category-wise and product-wise sales analysis
- Subscription behavior comparison
- Review rating trends
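
The cleaning and exploration steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names, the age-group boundaries, and the choice to fill missing ratings with the median are all assumptions made for the example, and a small in-memory DataFrame stands in for the real dataset.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the raw file; these column names are assumptions,
# not the confirmed schema of the project's dataset.
raw = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4, 5],
    "Age": [22, 37, 58, 45, 31],
    "Category": ["Clothing", "Footwear", "Clothing", "Accessories", "Footwear"],
    "Purchase Amount (USD)": [50.0, 80.0, 30.0, 20.0, 40.0],
    "Review Rating": [3.0, np.nan, 4.0, np.nan, 5.0],
    "Subscription Status": ["Yes", "No", "No", "Yes", "Yes"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, fill missing ratings, and add an age_group feature."""
    out = df.copy()
    # Rename columns to snake_case, e.g. "Purchase Amount (USD)" -> "purchase_amount_usd".
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    # Assumed strategy: fill missing review ratings with the median rating.
    out["review_rating"] = out["review_rating"].fillna(out["review_rating"].median())
    # Assumed age-group boundaries; adjust to whatever the analysis needs.
    out["age_group"] = pd.cut(
        out["age"],
        bins=[0, 25, 40, 55, 120],
        labels=["<=25", "26-40", "41-55", "56+"],
    )
    return out


cleaned = clean(raw)

# Category-wise revenue, highest first.
revenue_by_category = (
    cleaned.groupby("category")["purchase_amount_usd"].sum().sort_values(ascending=False)
)
# Average spend for subscribers vs non-subscribers.
avg_spend_by_subscription = cleaned.groupby("subscription_status")["purchase_amount_usd"].mean()
# Revenue contributed by each age group.
revenue_by_age_group = cleaned.groupby("age_group", observed=True)["purchase_amount_usd"].sum()

print(revenue_by_category)
print(avg_spend_by_subscription)
print(revenue_by_age_group)
```

Keeping the cleaning in one function makes it easy to rerun on a fresh export before loading the result into the database.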
The cleaned data was loaded into a SQL database and analyzed using queries such as:
- Revenue by gender
- High-spending customers
- Top-rated products
- Shipping type comparison
- Subscribers vs non-subscribers analysis
- Repeat customer behavior
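
As an illustration of the kind of queries listed above, here is a small sketch that runs two of them (revenue by gender, subscribers vs non-subscribers) from Python. It uses an in-memory SQLite database purely so the example is self-contained; the project itself targets PostgreSQL, MySQL, or SQL Server, and the table name `customer` and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the project's real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER, gender TEXT, purchase_amount REAL, subscription_status TEXT)"
)
# Made-up rows, only for demonstrating the queries.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Male", 50.0, "Yes"),
        (2, "Female", 80.0, "No"),
        (3, "Male", 30.0, "No"),
        (4, "Female", 20.0, "Yes"),
    ],
)

# Revenue by gender, highest first.
revenue_by_gender = conn.execute(
    """
    SELECT gender, SUM(purchase_amount) AS revenue
    FROM customer
    GROUP BY gender
    ORDER BY revenue DESC
    """
).fetchall()

# Subscribers vs non-subscribers: customer count and average spend.
subscriber_comparison = conn.execute(
    """
    SELECT subscription_status, COUNT(*) AS customers, AVG(purchase_amount) AS avg_spend
    FROM customer
    GROUP BY subscription_status
    ORDER BY subscription_status
    """
).fetchall()

conn.close()

print(revenue_by_gender)
print(subscriber_comparison)
```

The SQL here is plain `GROUP BY` aggregation and should carry over to the other database engines with little or no change.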
An interactive Power BI dashboard was created to visualize:
- Total customers and average purchase value
- Revenue by category and age group
- Subscription distribution
- Sales trends and customer segments
Users can filter insights by category, gender, shipping type, and subscription status.
- Identified high-value customer segments
- Highlighted top-performing and top-rated products
- Compared subscriber and non-subscriber purchasing behavior
- Found age groups contributing most to total revenue
- Identified discount-dependent products
- Python notebooks for EDA and data cleaning
- SQL queries for analysis
- Power BI dashboard (.pbix)
- Project report
- Presentation (PPT)
git clone https://github.com/your-username/customer-shopping-behavior-analysis.git
cd customer-shopping-behavior-analysis
pip install pandas numpy matplotlib seaborn sqlalchemy psycopg2
Open Jupyter Notebook or VS Code and run:
customer_behavior_analysis.ipynb
- Create a database in PostgreSQL/MySQL/SQL Server
- Run the provided SQL schema
- Use the Python script to insert the cleaned data
- Open the .pbix file in Power BI Desktop
- Update database credentials if needed
- Refresh data
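
For the database setup steps above, the "insert the cleaned data" step might look something like the sketch below. It is not the project's actual script: the table name `customer` and the sample columns are hypothetical, and an in-memory SQLite connection is used so the example runs without a database server. For PostgreSQL, the installed `sqlalchemy` and `psycopg2` packages suggest passing a SQLAlchemy engine (from `sqlalchemy.create_engine`) to `to_sql` instead, with a connection URL specific to your setup.

```python
import sqlite3

import pandas as pd

# Stand-in for the cleaned DataFrame produced by the notebook (hypothetical columns).
cleaned = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "category": ["Clothing", "Footwear", "Accessories"],
    "purchase_amount": [50.0, 80.0, 20.0],
})

# SQLite in memory keeps the sketch self-contained; swap in a SQLAlchemy
# engine for PostgreSQL/MySQL/SQL Server.
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table, replacing it if it already exists.
cleaned.to_sql("customer", conn, if_exists="replace", index=False)

# Sanity check: confirm the row count and read the data back.
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
loaded = pd.read_sql_query("SELECT * FROM customer ORDER BY customer_id", conn)
conn.close()

print(row_count)
print(loaded)
```

Checking the row count after the load is a cheap way to catch a partial insert before refreshing the Power BI dashboard.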
