Causal Impact Analysis: Competitor Entry Effects on Rossmann Store Sales

Project Overview

This project uses Propensity Score Matching (PSM), a causal inference technique, to estimate the impact of competitor entry on Rossmann drugstore sales. Instead of simple before/after comparisons, we control for observed confounding factors to answer: "Did competitor entry actually hurt sales, or would sales have changed anyway?"

Problem Statement

Organizations struggle to understand the true impact of market changes (competitor entry, policy changes, etc.) versus natural market trends. A store's sales might drop, but was it the competitor's fault, or just seasonal decline? This project tackles that challenge.

Business Question: How much did competitor entry impact Rossmann store sales?

Methodology

Technique: Propensity Score Matching (PSM)

Why PSM?

- Handles selection bias: Stores that got competitors might be naturally different from stores without competitors
- Works with observational data: No randomized experiment needed
- Creates comparable groups: Matches similar stores (one with competitor, one without)
- Estimates the causal effect: Compares matched pairs to measure impact, provided all relevant confounders are observed

How It Works

1. Calculate Propensity Scores
- Build a logistic regression model to estimate: "Given store characteristics, what's the probability it gets a competitor?"
- Output: a score between 0 and 1 for each store
2. Match Treated & Control Stores
- Match each treated store (got competitor) with a control store (no competitor) that has a similar propensity score
- This yields 357 matched pairs (limited by the number of control stores)
3. Calculate Treatment Effect
- For each pair, compute the difference: Treated Store Sales - Control Store Sales
- Average Treatment Effect (ATE) = mean difference across all pairs. Because the estimate comes only from matched pairs, it describes the matched sample rather than all 1,115 stores
- The confidence interval tells us whether the effect is statistically significant
4. Validate & Check Robustness
- Verify matching quality (are matched stores actually similar?)
- Calculate 95% confidence interval
- Assess statistical significance
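
The steps above can be sketched end to end on synthetic data. This is a minimal illustration, not the notebook's code: the covariates, the simulated sales, the greedy matching rule, and the 0.05 caliper are all assumptions made for the example, and `match_pairs` is a hypothetical helper.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic store-level data (hypothetical; NOT the Rossmann data).
n = 600
X = rng.normal(size=(n, 3))                          # three made-up store covariates
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
treated = rng.random(n) < p_true                     # True = store has a competitor
sales = 5000 + 400 * X[:, 0] + rng.normal(scale=800, size=n)  # no true effect built in

# Step 1: propensity scores from a logistic regression.
model = LogisticRegression(max_iter=1000).fit(X, treated)
ps = model.predict_proba(X)[:, 1]


def match_pairs(ps, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    controls = list(np.flatnonzero(~treated))
    pairs = []
    for t in np.flatnonzero(treated):
        if not controls:
            break                                    # ran out of control stores
        dist = np.abs(ps[controls] - ps[t])
        j = int(np.argmin(dist))
        if dist[j] <= caliper:                       # skip treated stores with no close control
            pairs.append((t, controls.pop(j)))
    return np.array(pairs)


# Step 2: build matched pairs (column 0 = treated index, column 1 = control index).
pairs = match_pairs(ps, treated)

# Step 3: mean pair difference with a t-based 95% confidence interval.
diff = sales[pairs[:, 0]] - sales[pairs[:, 1]]
effect = diff.mean()
se = diff.std(ddof=1) / np.sqrt(len(diff))
t_crit = stats.t.ppf(0.975, df=len(diff) - 1)
ci = (effect - t_crit * se, effect + t_crit * se)

print(f"{len(pairs)} pairs, effect = {effect:.1f}, 95% CI = [{ci[0]:.1f}, {ci[1]:.1f}]")
```

Matching without replacement is what caps the number of pairs at the size of the smaller group. Matching with replacement would keep more treated stores at the cost of reusing controls.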

Key Findings

| Metric | Value |
| --- | --- |
| Average Treatment Effect (ATE) | +60.92 sales |
| Standard Deviation | 2,162.02 |
| 95% Confidence Interval | [-164.11, +285.96] |
| Statistical Significance | NOT SIGNIFICANT |

Interpretation

The analysis found no statistically significant effect of competitor entry on Rossmann sales.

- ATE = +60.92 (small, positive)
- The CI includes zero → the effect could be negative, positive, or zero
- We cannot confidently say competitors hurt or helped
- The 95% confidence interval puts the effect between -164 and +286 sales per store
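
The reported interval can be checked from the summary numbers alone. The sketch below assumes that the reported standard deviation is the standard deviation of the 357 pair differences and that a t-based interval was used. Neither is stated explicitly, but under those assumptions the arithmetic reproduces the reported interval to within rounding.

```python
import math
from scipy import stats

ate = 60.92       # reported mean pair difference
sd = 2162.02      # reported standard deviation (assumed: SD of pair differences)
n_pairs = 357     # reported number of matched pairs

se = sd / math.sqrt(n_pairs)                   # standard error of the mean difference
t_crit = stats.t.ppf(0.975, df=n_pairs - 1)    # two-sided 95% critical value
lower, upper = ate - t_crit * se, ate + t_crit * se

t_stat = ate / se
p_value = 2 * stats.t.sf(abs(t_stat), df=n_pairs - 1)

print(f"SE = {se:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```

The standard error comes out at roughly 114 sales, almost twice the estimate itself, which is why the interval comfortably spans zero.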

Data & Features

Dataset

Rossmann Store Sales (1,017,209 daily store records, 1,115 stores)
Time Period: 2013-2015
Key Variables:
- Sales: Daily store revenue
- CompetitionOpenSinceYear/Month: Approximate year and month the nearest competitor opened
- HasCompetition: Treatment indicator derived for this analysis (1 = competitor, 0 = no competitor)
- Promo, StateHoliday, SchoolHoliday: Confounders to control

Confounders Controlled

- Promotions (boost sales independently)
- State/School Holidays (affect shopping behavior)
- Day of Week (weekends differ from weekdays)
- Store Type (different formats sell differently)
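
Most of these confounders are recorded per day, while matching happens per store, so they have to be collapsed to one row per store before a propensity model can use them. How the notebook does this is not described here. The sketch below shows one plausible approach on a tiny made-up sample that reuses the dataset's column names. The aggregated feature names (`promo_share` and so on) are hypothetical, and day of week is left out for brevity (it could be summarized the same way).

```python
import pandas as pd

# Tiny made-up sample with the same column names as train.csv / store.csv.
train = pd.DataFrame({
    "Store":         [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Sales":         [5200, 4800, 0, 6100, 5900, 0, 4300, 4100, 0],
    "Promo":         [1, 0, 0, 1, 0, 0, 0, 0, 0],
    "StateHoliday":  ["0", "0", "0", "0", "0", "a", "0", "0", "0"],
    "SchoolHoliday": [0, 0, 0, 1, 1, 1, 0, 0, 0],
})
store = pd.DataFrame({"Store": [1, 2, 3], "StoreType": ["a", "c", "a"]})

# Flag state holidays ("0" means no holiday), then collapse to one row per store.
daily = train.assign(is_state_holiday=train["StateHoliday"].ne("0"))
features = daily.groupby("Store").agg(
    promo_share=("Promo", "mean"),
    state_holiday_share=("is_state_holiday", "mean"),
    school_holiday_share=("SchoolHoliday", "mean"),
).reset_index()

# Attach store type and one-hot encode it for the logistic regression.
features = features.merge(store, on="Store")
X = pd.get_dummies(features.drop(columns="Store"), columns=["StoreType"], drop_first=True)

print(X)
```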

Visualizations

Plot 1: Propensity Score Distribution

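Shows the propensity score distributions of treated and control stores. Good overlap between the two supports the matching approach, because each treated store then has control stores with comparable scores.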
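
Overlap can also be checked numerically alongside the plot. The sketch below uses made-up scores and a hypothetical `common_support` helper. It computes the score range covered by both groups and counts the stores that fall outside it.

```python
import numpy as np


def common_support(ps, treated):
    """Return the overlapping propensity-score range and a mask of stores inside it."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    low = max(ps[treated].min(), ps[~treated].min())
    high = min(ps[treated].max(), ps[~treated].max())
    inside = (ps >= low) & (ps <= high)
    return low, high, inside


# Made-up scores for illustration only: four controls, then four treated stores.
ps = np.array([0.10, 0.25, 0.40, 0.55, 0.30, 0.45, 0.60, 0.90])
treated = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=bool)

low, high, inside = common_support(ps, treated)
n_treated_out = int((treated & ~inside).sum())
n_control_out = int((~treated & ~inside).sum())

print(f"common support: [{low:.2f}, {high:.2f}]")
print(f"treated outside support: {n_treated_out}, controls outside support: {n_control_out}")
```

Stores outside the common range have no comparable counterpart, so any match made for them is an extrapolation.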

Plot 2: Matched Pairs Scatter

Compares treated vs control sales for each pair. Points near the diagonal line are pairs with similar sales (little estimated effect); the spread around the diagonal shows how much the pair-level differences vary.
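
Matching quality itself is usually judged on the covariates rather than on the outcome. A common diagnostic is the standardized mean difference (SMD) of each covariate between matched treated and control stores, with absolute values below about 0.1 often taken as a sign of good balance. The sketch below uses made-up promo shares, and the helper function is hypothetical.

```python
import numpy as np


def standardized_mean_difference(x_treated, x_control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd


# Made-up promo shares for five matched pairs (illustration only).
promo_treated = [0.38, 0.41, 0.35, 0.40, 0.37]
promo_control = [0.37, 0.42, 0.36, 0.39, 0.38]

smd = standardized_mean_difference(promo_treated, promo_control)
print(f"SMD for promo share: {smd:.3f}")
```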

Plot 3: Treatment Effect Distribution

Histogram of pair-level sales differences, with the confidence interval for the mean marked. The distribution is centered near zero and the interval includes zero → no significant effect.

Limitations & Caveats

Limited Time Window (2013-2015)
- Data covers only 2013-2015; competitors entered gradually
- No clear "pre-competitor" vs "post-competitor" periods
- Limits causal identification
Unconfoundedness Assumption
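- Assumes all confounders are measured (promotions, holidays, day of week, store type)
- Unmeasured confounders (store reputation, manager quality) could bias results
- Cannot be tested directly; a sensitivity analysis can only gauge how strong a hidden confounder would need to be to change the conclusion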
High Variability
- Standard deviation (2,162) much larger than ATE (60.92)
- Large differences between individual stores
- Effect is imprecisely estimated
Selection Bias
- Stores that got competitors might be in growing markets (confounding)
- Even with matching, baseline differences may remain
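
The high-variability point can be made concrete with a rough minimum detectable effect calculation. This is a back-of-the-envelope sketch under stated assumptions: the reported standard deviation is treated as the SD of pair differences, a normal approximation is used, and the 80% power target is a conventional choice rather than something taken from the analysis.

```python
import math
from statistics import NormalDist

sd = 2162.02      # reported SD (assumed: SD of matched-pair differences)
n_pairs = 357     # reported number of matched pairs
alpha, power = 0.05, 0.80

se = sd / math.sqrt(n_pairs)                      # standard error of the mean difference
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided 5% critical value
z_power = NormalDist().inv_cdf(power)             # quantile for the power target
mde = (z_alpha + z_power) * se                    # normal-approximation MDE

print(f"SE = {se:.1f}, minimum detectable effect = {mde:.0f}")
```

Under these assumptions the design could reliably detect only effects of roughly 320 sales per store, about five times the estimated 60.92, so a modest true effect in either direction would go unnoticed.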

Interpretation & Business Implications

What This Means:

- Competitor entry did NOT significantly hurt Rossmann sales in this analysis
- The observed difference between matched stores is consistent with natural variation
- Any impact is too small or uncertain to detect with this data

Why This Matters:

- Suggests competitors are less threatening than feared, OR
- Data quality/timeframe is insufficient to detect the true effect, OR
- Other factors (price, location, loyalty) matter more than competitor presence

Code Structure

Impact-Analysis/
├── data/
│   ├── train.csv           
│   └── store.csv           
├── analysis.ipynb         
├── README.md           
└── plots/
    ├── psm_distribution.html
    ├── matched_pairs.html
    └── effect_distribution.html

How to Run

Install dependencies:

pip install pandas numpy scikit-learn scipy plotly

Load data:

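import pandas as pd
# Paths follow the data/ folder shown in Code Structure; StateHoliday mixes 0 and letter codes, so read it as text
train = pd.read_csv('data/train.csv', dtype={'StateHoliday': str})
df = train.merge(pd.read_csv('data/store.csv'), on='Store')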

Run analysis:
- Open analysis.ipynb in Jupyter
- Execute cells sequentially
- View interactive plots

Key Learnings: Causal Inference Concepts

Confounding: Other factors change alongside the treatment. We controlled for promotions, holidays, and store type to isolate the competitor effect.

Propensity Score: Probability of treatment based on pre-treatment characteristics. Used to match comparable stores.

Matching: Paired treated (competitor) and control (no competitor) stores with similar propensity scores, making comparison fair.

Statistical Significance: 95% confidence interval includes zero → effect not significantly different from no effect.

kaverikb/Causal-Impact-Analysis
