This project uses Propensity Score Matching (PSM), a causal inference technique, to isolate the true impact of competitor entry on Rossmann drugstore sales. Instead of simple before/after comparisons, we rigorously control for confounding factors to answer: "Did competitor entry actually hurt sales, or would sales have changed anyway?"
Organizations struggle to understand the true impact of market changes (competitor entry, policy changes, etc.) versus natural market trends. A store's sales might drop, but was it the competitor's fault, or just seasonal decline? This project tackles that challenge.
Business Question: How much did competitor entry impact Rossmann store sales?
Why PSM?
- Handles selection bias: Stores that got competitors might be naturally different from stores without competitors
- Works with observational data: No randomized experiment needed
- Creates comparable groups: Matches similar stores (one with competitor, one without)
- Isolates causal effect: Compares matched pairs to measure true impact
- Calculate Propensity Scores
  - Build a logistic regression model to estimate: "Given store characteristics, what's the probability it gets a competitor?"
  - Output: Score 0-1 for each store
- Match Treated & Control Stores
  - Match each treated store (got competitor) with a control store (no competitor) that has a similar propensity score
  - Create 357 matched pairs (limited by number of control stores)
- Calculate Treatment Effect
  - Compare sales: Treated Store Sales - Control Store Sales
  - Average Treatment Effect (ATE) = mean difference across all pairs
  - Confidence interval tells us if effect is statistically significant
- Validate & Check Robustness
  - Verify matching quality (are matched stores actually similar?)
  - Calculate 95% confidence interval
  - Assess statistical significance
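The steps above can be sketched end to end on synthetic data. This is a minimal illustration, not the code from analysis.ipynb: the covariates, sample size, and the choice of 1-nearest-neighbour matching with replacement are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Synthetic stand-in for store-level data (NOT the Rossmann data):
# two covariates, a treatment flag that depends on them, and an outcome
# that depends on the covariates but not on treatment.
n = 500
X = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
treated = rng.random(n) < p_true
sales = 5000 + 300 * X[:, 0] + rng.normal(scale=200, size=n)

# Step 1: propensity score = estimated P(treated | covariates)
model = LogisticRegression().fit(X, treated.astype(int))
ps = model.predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest score
# (assumption: matching with replacement, so a control can be reused)
treated_idx = np.flatnonzero(treated)
control_idx = np.flatnonzero(~treated)
nn = NearestNeighbors(n_neighbors=1).fit(ps[control_idx].reshape(-1, 1))
_, pos = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
matched_control_idx = control_idx[pos.ravel()]

# Step 3: paired differences and their mean
diffs = sales[treated_idx] - sales[matched_control_idx]
effect = diffs.mean()
print(f"pairs={len(diffs)}, mean difference={effect:.2f}, sd={diffs.std(ddof=1):.2f}")
```

Matching without replacement, which caps the number of pairs at the size of the smaller group, would need a different matching step than the one shown here.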
| Metric | Value |
|---|---|
| Average Treatment Effect (ATE) | +60.92 sales |
| Standard Deviation | 2,162.02 |
| 95% Confidence Interval | [-164.11, +285.96] |
| Statistical Significance | NOT SIGNIFICANT |
Competitor entry had NO statistically significant effect on Rossmann sales.
- ATE = +60.92 (small, positive)
- CI includes zero → effect could be negative, positive, or zero
- We cannot confidently say whether competitor entry hurt or helped sales
- The 95% confidence interval puts the effect between -164 and +286 sales per store
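As a sanity check, the interval can be recomputed from the summary numbers in the table. The sketch below assumes the reported standard deviation is the sample standard deviation of the 357 paired differences and that a t-interval was used; neither is stated explicitly, but under those assumptions the result comes out close to the reported interval.

```python
import math
from scipy import stats

ate = 60.92      # reported mean paired difference
sd = 2162.02     # reported standard deviation (assumed: of the paired differences)
n_pairs = 357    # reported number of matched pairs

se = sd / math.sqrt(n_pairs)                 # standard error of the mean difference
t_crit = stats.t.ppf(0.975, df=n_pairs - 1)  # two-sided 95% critical value
lower, upper = ate - t_crit * se, ate + t_crit * se

# Paired t-test of "mean difference = 0"
t_stat = ate / se
p_value = 2 * stats.t.sf(abs(t_stat), df=n_pairs - 1)

print(f"SE={se:.2f}, CI=[{lower:.2f}, {upper:.2f}], t={t_stat:.2f}, p={p_value:.2f}")
```

Because the standard error is roughly twice the size of the estimate, the interval straddles zero and the p-value sits well above 0.05.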
- Rossmann Store Sales (1,017,209 daily store records, 1,115 stores)
- Time Period: 2013-2015
- Key Variables:
  - `Sales`: Daily store revenue
  - `CompetitionOpenSinceYear`/`Month`: When competitor entered
  - `HasCompetition`: Treatment indicator (1 = competitor, 0 = no competitor)
  - `Promo`, `StateHoliday`, `SchoolHoliday`: Confounders to control
- Promotions (boost sales independently)
- State/School Holidays (affect shopping behavior)
- Day of Week (weekends differ from weekdays)
- Store Type (different formats sell differently)
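One plausible way to turn the daily rows into store-level inputs for matching is sketched below. The rule used to define `HasCompetition` (a recorded competitor opening year) and the aggregate column names (`mean_sales`, `promo_share`, `school_holiday_share`) are assumptions for illustration; the notebook may define them differently.

```python
import pandas as pd

# Tiny hypothetical frame using the same column names as the merged data.
df = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Sales": [5263, 5020, 6064, 0],
    "Promo": [1, 0, 1, 0],
    "SchoolHoliday": [0, 0, 1, 1],
    "CompetitionOpenSinceYear": [2008.0, 2008.0, None, None],
})

# Assumption: a store is "treated" when a competitor opening year is recorded.
df["HasCompetition"] = df["CompetitionOpenSinceYear"].notna().astype(int)

# Collapse daily rows to one row per store so stores can be matched.
store_level = df.groupby("Store").agg(
    mean_sales=("Sales", "mean"),
    promo_share=("Promo", "mean"),
    school_holiday_share=("SchoolHoliday", "mean"),
    HasCompetition=("HasCompetition", "max"),
).reset_index()

print(store_level)
```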
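- Propensity score distribution (psm_distribution.html): Shows treated vs control store propensity scores. Good overlap between the two groups supports the matching approach.
- Matched pairs (matched_pairs.html): Compares treated vs control sales for each pair. Points near the diagonal line mean the paired stores had similar sales; the spread shows variability in effects.
- Effect distribution (effect_distribution.html): Histogram of individual pair effects with the confidence interval. It is centered near zero and the interval includes zero, so there is no significant effect.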
- Limited Time Window (2013-2015)
  - Data only covers 3 years; competitors entered gradually
  - No clear "pre-competitor" vs "post-competitor" periods
  - Limits causal identification
- Unconfoundedness Assumption
  - Assumes all confounders are measured (promotion, holidays, store type)
  - Unmeasured confounders (store reputation, manager quality) could bias results
  - Cannot be tested directly; only sensitivity analysis is available
- High Variability
  - Standard deviation (2,162) is much larger than the ATE (60.92)
  - Large differences between individual stores
  - Effect is imprecisely estimated
- Selection Bias
  - Stores that got competitors might be in growing markets (confounding)
  - Even with matching, baseline differences may remain
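Whether baseline differences remain after matching is commonly checked with the standardized mean difference (SMD) of each covariate across the matched groups. The sketch below uses made-up covariate values; a common rule of thumb treats an absolute SMD below about 0.1 as adequately balanced, though that threshold is a convention rather than a test.

```python
import numpy as np

def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    if pooled_sd == 0:
        return 0.0  # both groups are constant; treat as perfectly balanced
    return (x_treated.mean() - x_control.mean()) / pooled_sd

# Hypothetical covariate (e.g. share of promo days) for matched stores.
promo_treated = [0.38, 0.41, 0.36, 0.40, 0.39]
promo_control = [0.37, 0.40, 0.38, 0.41, 0.37]

smd = standardized_mean_difference(promo_treated, promo_control)
print(f"SMD = {smd:.3f}")
```

Running the same function on every covariate, before and after matching, shows whether matching actually made the groups more alike.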
What This Means:
- Competitor entry did NOT significantly hurt Rossmann sales in this analysis
- Natural variation and confounders may explain most of the observed change
- Any impact is too small or uncertain to detect with this data
Why This Matters:
- Suggests competitors are less threatening than feared, OR
- The data quality or timeframe is insufficient to detect the true effect, OR
- Other factors (price, location, loyalty) matter more than competitor presence
Impact-Analysis/
├── data/
│   ├── train.csv
│   └── store.csv
├── analysis.ipynb
├── README.md
└── plots/
    ├── psm_distribution.html
    ├── matched_pairs.html
    └── effect_distribution.html
- Install dependencies:

      pip install pandas numpy scikit-learn scipy plotly

- Load data:

      import pandas as pd
      # Merge daily sales with store metadata on the Store key
      df = pd.read_csv('data/train.csv').merge(pd.read_csv('data/store.csv'), on='Store')

- Run analysis:
  - Open `analysis.ipynb` in Jupyter
  - Execute cells sequentially
  - View interactive plots
Confounding: Other factors change alongside treatment. We controlled for promotions, holidays, and store type to isolate the competitor effect.
Propensity Score: Probability of treatment based on pre-treatment characteristics. Used to match comparable stores.
Matching: Paired treated (competitor) and control (no competitor) stores with similar propensity scores, making comparison fair.
Statistical Significance: 95% confidence interval includes zero → effect not significantly different from no effect.