This project analyzes ShopEasy’s marketing performance to address:
- Declining customer engagement
- Low conversion rates
- High marketing spend with weak ROI
- Decreasing customer satisfaction based on reviews
The workflow covers data profiling (EDA), data cleaning, and building a reporting-ready layer (SQL views) that feeds a Power BI dashboard. A Python notebook is included for optional/bonus analysis.
- SQL Server (database restore, EDA, cleaning, reporting views)
- SSMS (query execution)
- Power BI Desktop (dashboard & KPIs)
- Python (Jupyter Notebook) (optional exploratory analysis)
PortfolioProject_MarketingAnalytics/
│
├── Query.sql # EDA + cleaning + view creation scripts
├── Dashboard.pbix # Power BI dashboard
├── Notebook.ipynb # Optional Python analysis (bonus)
└── Ref/ # Materials provided by Orange (inputs/templates)
Place the project inputs sent by Orange inside Ref/, for example:
- `DA_Marketing_Project.pdf` (project brief / requirements)
- `MarketingAnalyticsProject.bak` (SQL Server database backup)
- Any additional notes, templates, or supporting assets shared with the assignment

Tip: Keeping provided materials in `Ref/` makes the project clean and easy to review.
Database: PortfolioProject_MarketingAnalytics
Core tables:
- `customer_journey`: customer funnel activity (homepage/product page/checkout)
- `engagement_data`: content engagement by campaign/product
- `customer_reviews`: ratings + free-text reviews
- `customers`: customer demographics + geography key
- `products`: product catalog and pricing
- `geography`: country/city dimension
| Table | Rows | Columns |
|---|---|---|
| customer_journey | 4011 | 7 |
| engagement_data | 4623 | 8 |
| customer_reviews | 1363 | 6 |
| customers | 100 | 7 |
| products | 20 | 4 |
| geography | 10 | 3 |
- `customer_journey.Duration`: 613 NULL values
- `customer_journey`: 79 duplicate rows
- All other tables: 0 duplicates
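The two checks above can be reproduced in the optional notebook with a few lines of plain Python. This is a minimal sketch, not the logic in `Query.sql`: the sample rows are invented, and treating "every column except `JourneyID`" as the duplicate key is an assumption.

```python
from collections import Counter

# Hypothetical sample rows shaped like customer_journey (values are invented).
rows = [
    {"JourneyID": 1, "CustomerID": 10, "ProductID": 5, "VisitDate": "2024-01-03",
     "Stage": "Checkout", "Action": "Purchase", "Duration": 120.0},
    {"JourneyID": 2, "CustomerID": 10, "ProductID": 5, "VisitDate": "2024-01-03",
     "Stage": "Checkout", "Action": "Purchase", "Duration": 120.0},
    {"JourneyID": 3, "CustomerID": 11, "ProductID": 7, "VisitDate": "2024-01-04",
     "Stage": "homepage", "Action": "View", "Duration": None},
]


def count_nulls(rows, column):
    """Count rows where `column` is missing (None)."""
    return sum(1 for r in rows if r[column] is None)


def count_duplicates(rows, key_columns):
    """Count surplus rows: every repeat of a key beyond its first occurrence."""
    seen = Counter(tuple(r[c] for c in key_columns) for r in rows)
    return sum(n - 1 for n in seen.values() if n > 1)


# Assumption: a duplicate matches on every column except the surrogate JourneyID.
KEY = ["CustomerID", "ProductID", "VisitDate", "Stage", "Action", "Duration"]

null_durations = count_nulls(rows, "Duration")
duplicate_rows = count_duplicates(rows, KEY)
print(null_durations, duplicate_rows)
```

Changing `KEY` changes the duplicate count, so it is worth stating the chosen key next to any reported number.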
Stage values (examples):
- `Homepage` vs `homepage`
- `ProductPage` vs `productpage`
- `Checkout` vs `checkout`
ContentType values (examples):
- `Blog` vs `blog`
- `Socialmedia` vs `socialmedia`
- `Newsletter` vs `newsletter`
- `Video` vs `video`
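One way to surface such casing variants during profiling is to group raw values by a normalized form and report groups with more than one spelling. The helper name `case_variants` is hypothetical; the sketch uses the Stage examples listed above.

```python
from collections import defaultdict


def case_variants(values):
    """Group raw values by their trimmed, upper-cased form and keep only
    the groups that contain more than one distinct spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[value.strip().upper()].add(value)
    return {key: sorted(spellings) for key, spellings in groups.items()
            if len(spellings) > 1}


stages = ["Homepage", "homepage", "ProductPage", "productpage",
          "Checkout", "checkout"]
variants = case_variants(stages)
for normalized, spellings in variants.items():
    print(normalized, spellings)
```

An empty result after cleaning is a quick confirmation that standardization worked.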
Within the checked range (10–300):
- Avg: 138.18
- Min: 26.21
- Max: 275.43
The cleaning pipeline is implemented as SQL views so Power BI can consume clean, consistent data.
Cleans customer_journey by:
- Standardizing `Stage` → `CleanStage` using `TRIM` + `UPPER`
- Filling NULL `Duration` values with the average duration (imputation)
- Exposing `CleanDuration` for reporting
Output columns:
JourneyID, CustomerID, ProductID, VisitDate, CleanStage, Action, CleanDuration
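The same cleaning steps can be sketched in Python for cross-checking the view output in the notebook. This is an illustration under assumptions: the rows are invented, only three columns are carried through, and the average is taken over all non-NULL durations (SQL `AVG` also ignores NULLs, but the view may scope the average differently).

```python
from statistics import mean


def clean_journey(rows):
    """Return rows with CleanStage and CleanDuration, mirroring the view's idea."""
    known = [r["Duration"] for r in rows if r["Duration"] is not None]
    # If every Duration is missing there is nothing to impute from.
    avg_duration = mean(known) if known else None
    cleaned = []
    for r in rows:
        duration = r["Duration"]
        cleaned.append({
            "JourneyID": r["JourneyID"],
            # Rough equivalent of UPPER(TRIM(Stage)).
            "CleanStage": r["Stage"].strip().upper(),
            # Mean imputation for missing durations.
            "CleanDuration": avg_duration if duration is None else duration,
        })
    return cleaned


# Hypothetical sample rows (values are invented).
sample = [
    {"JourneyID": 1, "Stage": " Checkout", "Duration": 100.0},
    {"JourneyID": 2, "Stage": "checkout", "Duration": None},
    {"JourneyID": 3, "Stage": "ProductPage", "Duration": 200.0},
]
cleaned = clean_journey(sample)
for row in cleaned:
    print(row)
```

Mean imputation keeps row counts intact but flattens variance, so duration-based charts should be read with the 613 filled values in mind.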
Cleans `engagement_data` by:
- Standardizing `ContentType` → `CleanContentType` using `TRIM` + `UPPER`
Output columns:
EngagementID, ContentID, CleanContentType, Likes, EngagementDate, CampaignID, ProductID, ViewsClicksCombined
Creates an initial combined view by joining journey + engagement:
- `LEFT JOIN` on `ProductID`
Why this matters: a product may have multiple engagement rows (different content/campaign/date), so this join can multiply rows.
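The row multiplication is easy to see on a toy example. The sketch below emulates a left join on `ProductID` in plain Python; the data is invented and only `EngagementID` is carried over from the engagement side.

```python
def left_join(journey, engagement, key="ProductID"):
    """Emulate a LEFT JOIN: every journey row is kept, and it is repeated
    once per matching engagement row (None when there is no match)."""
    by_key = {}
    for e in engagement:
        by_key.setdefault(e[key], []).append(e)
    joined = []
    for j in journey:
        matches = by_key.get(j[key])
        if not matches:
            joined.append({**j, "EngagementID": None})
        else:
            for e in matches:
                joined.append({**j, "EngagementID": e["EngagementID"]})
    return joined


# Hypothetical data: product 5 has three engagement rows, product 9 has none.
journey = [
    {"JourneyID": 1, "ProductID": 5},
    {"JourneyID": 2, "ProductID": 9},
]
engagement = [
    {"EngagementID": 101, "ProductID": 5},
    {"EngagementID": 102, "ProductID": 5},
    {"EngagementID": 103, "ProductID": 5},
]
joined = left_join(journey, engagement)
print(len(journey), "journey rows ->", len(joined), "joined rows")
```

Two journey rows become four joined rows here, which is why any count or sum taken directly on the joined view can be inflated.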
Final reporting-ready view that removes duplicates after the join using:
- `ROW_NUMBER() OVER (PARTITION BY CustomerID, ProductID, VisitDate ORDER BY JourneyID)`
- Keeps `rn = 1` to return one representative row per `(CustomerID, ProductID, VisitDate)`
Output columns include: Journey fields + Engagement fields (IDs, content type, likes, campaign, etc.)
Note: If you want detailed engagement analysis (multiple engagement rows per product), use `vw_clean_engagement_data` as a separate fact table in Power BI instead of relying on the joined view.
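The `ROW_NUMBER()` ... `rn = 1` filter can be mimicked in Python to sanity-check row counts in the notebook. This is a sketch with invented rows, not the view definition. Rows that tie on `JourneyID` (as fan-out rows from the join do) have no guaranteed order in SQL Server, so which engagement row survives there may vary; this version keeps the first one encountered.

```python
def keep_first_per_key(rows,
                       key_columns=("CustomerID", "ProductID", "VisitDate"),
                       order_column="JourneyID"):
    """Keep one row per key: the one with the lowest `order_column`.
    Ties are resolved in favor of the row seen first."""
    best = {}
    for r in rows:
        key = tuple(r[c] for c in key_columns)
        if key not in best or r[order_column] < best[key][order_column]:
            best[key] = r
    return sorted(best.values(), key=lambda r: r[order_column])


# Hypothetical joined rows (values are invented).
joined = [
    {"JourneyID": 2, "CustomerID": 10, "ProductID": 5, "VisitDate": "2024-01-03", "EngagementID": 101},
    {"JourneyID": 1, "CustomerID": 10, "ProductID": 5, "VisitDate": "2024-01-03", "EngagementID": 102},
    {"JourneyID": 1, "CustomerID": 10, "ProductID": 5, "VisitDate": "2024-01-03", "EngagementID": 103},
    {"JourneyID": 3, "CustomerID": 11, "ProductID": 7, "VisitDate": "2024-01-04", "EngagementID": None},
]
deduped = keep_first_per_key(joined)
for row in deduped:
    print(row["JourneyID"], row["EngagementID"])
```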
Simple pass-through views for consistent naming and modeling:
- `vw_customers`
- `vw_products`
- `vw_customer_reviews`
- `vw_geography`
- Copy `MarketingAnalyticsProject.bak` into your SQL Server backup directory
- Restore via SSMS:
  - Right-click Databases → Restore Database…
  - Select Device → choose the `.bak` file
- Confirm the restored DB name matches your setup (e.g., `PortfolioProject_MarketingAnalytics`)
Open and run:
Query.sql
This script:
- Performs EDA (row counts, null checks, duplicates, standardization checks)
- Creates/updates the cleaning views (`CREATE OR ALTER VIEW ...`)
Open:
`Dashboard.pbix`

Then:
- Update the SQL Server connection (if needed)
- Refresh the dataset
Open:
`Notebook.ipynb`

Run EDA/visualizations and any bonus analysis.
For a clean star schema, consider:
- Fact tables:
  - `vw_clean_journey` (journey events)
  - `vw_clean_engagement_data` (engagement events)
  - `vw_customer_reviews` (review events)
- Dimensions:
  - `vw_customers` + `vw_geography`
  - `vw_products`
Relationships (typical):
- Customers → Journey (CustomerID)
- Products → Journey (ProductID)
- Products → Engagement (ProductID)
- Customers → Reviews (CustomerID)
- Products → Reviews (ProductID)
- Geography → Customers (GeographyID)
Depending on how “conversion” is defined in your analysis:
- Conversion Rate: visits that reach checkout / product page visits (or homepage visits)
- Engagement Rate: likes per view/click (after parsing `ViewsClicksCombined`)
- Customer Feedback Score: average rating + rating distribution + review themes
- Drop-off Rate: drop-offs by stage and over time
- Content Performance: engagement by content type and campaign
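Two of these KPIs can be prototyped in the notebook before building DAX measures. The sketch rests on assumptions that should be checked against the data: `ViewsClicksCombined` is taken to hold two integers separated by a hyphen (for example `"1883-671"`), engagement rate is read as likes divided by views plus clicks, and conversion rate is checkout visits divided by product page visits. All function names are hypothetical.

```python
def split_views_clicks(combined, sep="-"):
    """Parse a 'views<sep>clicks' string into two integers.
    The hyphen-separated format is an assumption about the column."""
    views, clicks = combined.split(sep, 1)
    return int(views), int(clicks)


def engagement_rate(likes, views, clicks):
    """Likes per view-or-click; None when there was no exposure at all."""
    exposure = views + clicks
    return likes / exposure if exposure else None


def conversion_rate(clean_stages, reached="CHECKOUT", base="PRODUCTPAGE"):
    """Visits at the `reached` stage divided by visits at the `base` stage."""
    base_visits = sum(1 for s in clean_stages if s == base)
    reached_visits = sum(1 for s in clean_stages if s == reached)
    return reached_visits / base_visits if base_visits else None


views, clicks = split_views_clicks("1883-671")
print(views, clicks)
print(engagement_rate(likes=50, views=150, clicks=50))
print(conversion_rate(["HOMEPAGE", "PRODUCTPAGE", "PRODUCTPAGE", "CHECKOUT"]))
```

Passing `base="HOMEPAGE"` gives the alternative conversion definition mentioned above, which makes it easy to compare both before settling on one for the dashboard.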