Causal inference analysis using Propensity Score Matching to measure competitor impact on store sales. Isolates true treatment effect from confounding factors with statistical rigor.

kaverikb/Causal-Impact-Analysis

Causal Impact Analysis: Competitor Entry Effects on Rossmann Store Sales

Project Overview

This project uses Propensity Score Matching (PSM), a causal inference technique, to isolate the true impact of competitor entry on Rossmann drugstore sales. Instead of simple before/after comparisons, we rigorously control for confounding factors to answer: "Did competitor entry actually hurt sales, or would sales have changed anyway?"


Problem Statement

Organizations struggle to understand the true impact of market changes (competitor entry, policy changes, etc.) versus natural market trends. A store's sales might drop, but was it the competitor's fault, or just seasonal decline? This project tackles that challenge.

Business Question: How much did competitor entry impact Rossmann store sales?


Methodology

Technique: Propensity Score Matching (PSM)

Why PSM?

  • Handles selection bias: Stores that got competitors might be naturally different from stores without competitors
  • Works with observational data: No randomized experiment needed
  • Creates comparable groups: Matches similar stores (one with competitor, one without)
  • Isolates causal effect: Compares matched pairs to measure true impact

How It Works

  1. Calculate Propensity Scores

    • Build a logistic regression model to estimate: "Given store characteristics, what's the probability it gets a competitor?"
    • Output: Score 0-1 for each store
  2. Match Treated & Control Stores

    • Match each treated store (got competitor) with a control store (no competitor) that has a similar propensity score
    • Create 357 matched pairs (limited by number of control stores)
  3. Calculate Treatment Effect

    • Compare sales: Treated Store Sales - Control Store Sales
    • Average Treatment Effect (ATE) = mean difference across all pairs
    • A 95% confidence interval that excludes zero indicates a statistically significant effect; one that includes zero does not
  4. Validate & Check Robustness

    • Verify matching quality (are matched stores actually similar?)
    • Calculate 95% confidence interval
    • Assess statistical significance
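The four steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the function name `psm_match`, the toy columns (`x1`, `x2`, `treated`, `sales`), and the choice of nearest-neighbor matching with replacement are all assumptions made for the example. The project itself reports 357 pairs limited by the number of control stores, which suggests it matched without replacement.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors


def psm_match(df, covariates, treatment_col, outcome_col):
    """Return treated-minus-control outcome differences for 1:1 matched pairs.

    Hypothetical helper: nearest-neighbor matching on the propensity score,
    with replacement (a control can be reused for several treated units).
    """
    # Step 1: propensity score = P(treated | covariates) from logistic regression
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df[treatment_col])
    scored = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    treated = scored[scored[treatment_col] == 1]
    control = scored[scored[treatment_col] == 0]

    # Step 2: for each treated unit, find the control with the closest score
    nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
    _, idx = nn.kneighbors(treated[["pscore"]])
    matched_control = control.iloc[idx.ravel()]

    # Step 3: pairwise outcome differences (treated - matched control)
    return treated[outcome_col].to_numpy() - matched_control[outcome_col].to_numpy()


# Synthetic data: x1 drives both treatment and outcome (a confounder)
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
treat = rng.binomial(1, 1 / (1 + np.exp(-x1)))
true_effect = 5.0
y = 2.0 * x1 + true_effect * treat + rng.normal(size=n)
toy = pd.DataFrame({"x1": x1, "x2": x2, "treated": treat, "sales": y})

diffs = psm_match(toy, ["x1", "x2"], "treated", "sales")
effect_estimate = diffs.mean()  # mean difference across matched pairs
```

Because every treated unit is paired with a control, the mean of `diffs` is, strictly speaking, an estimate of the effect on the treated units. Step 4 (validation) is not covered here; the matched covariates still need a balance check.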

Key Findings

| Metric | Value |
| --- | --- |
| Average Treatment Effect (ATE) | +60.92 sales |
| Standard Deviation | 2,162.02 |
| 95% Confidence Interval | [-164.11, +285.96] |
| Statistical Significance | NOT SIGNIFICANT |

Interpretation

Competitor entry had NO statistically significant effect on Rossmann sales.

  • ATE = +60.92 (small, positive)
  • CI includes zero → effect could be negative, positive, or zero
  • We cannot confidently say competitor hurt or helped
  • With 95% confidence, true effect is between -164 and +286 sales per store
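The reported interval can be checked from the summary statistics alone. The sketch below assumes the standard deviation is that of the 357 paired differences and that the interval is a two-sided t-interval; neither is stated explicitly in this README, but under those assumptions the numbers agree with the reported bounds to within rounding.

```python
import math
from scipy import stats

ate = 60.92      # mean paired difference (from Key Findings)
sd = 2162.02     # standard deviation (assumed: of the paired differences)
n_pairs = 357    # number of matched pairs

# Standard error of the mean difference
se = sd / math.sqrt(n_pairs)

# Two-sided 95% t-interval with n_pairs - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n_pairs - 1)
ci_low = ate - t_crit * se
ci_high = ate + t_crit * se

# The effect is "significant" at the 5% level only if the interval excludes zero
significant = not (ci_low <= 0 <= ci_high)

print(f"SE = {se:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], significant = {significant}")
```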

Data & Features

Dataset

  • Rossmann Store Sales (1,017,209 daily store records, 1,115 stores)
  • Time Period: 2013-2015
  • Key Variables:
    • Sales: Daily store revenue
    • CompetitionOpenSinceYear/Month: When competitor entered
    • HasCompetition: Treatment indicator (1 = competitor, 0 = no competitor)
    • Promo, StateHoliday, SchoolHoliday: Confounders to control
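Since matching is done between stores, the daily records have to be collapsed to one row per store and joined to the store table. The sketch below shows one way to do that on a tiny in-memory example. The rule used for `HasCompetition` (a missing `CompetitionOpenSinceYear` means no competitor) and the aggregate names `mean_sales` and `promo_share` are assumptions for illustration; the notebook may define them differently.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for train.csv (daily rows) and store.csv (one row per store)
daily = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 3],
    "Sales": [5000, 5200, 6100, 5900, 4300, 4500],
    "Promo": [1, 0, 1, 1, 0, 0],
})
stores = pd.DataFrame({
    "Store": [1, 2, 3],
    "StoreType": ["a", "c", "a"],
    "CompetitionOpenSinceYear": [2008.0, np.nan, 2014.0],
})

# ASSUMPTION: a missing opening year is treated as "no competitor"
stores["HasCompetition"] = stores["CompetitionOpenSinceYear"].notna().astype(int)

# One row per store: average daily sales and share of days on promotion
store_level = (
    daily.groupby("Store", as_index=False)
    .agg(mean_sales=("Sales", "mean"), promo_share=("Promo", "mean"))
    .merge(stores, on="Store", how="left")
)
```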

Confounders Controlled

  • Promotions (boost sales independently)
  • State/School Holidays (affect shopping behavior)
  • Day of Week (weekends differ from weekdays)
  • Store Type (different formats sell differently)

Visualizations

Plot 1: Propensity Score Distribution

Shows treated vs control store propensity scores. Good overlap validates matching approach.

Plot 2: Matched Pairs Scatter

Compares treated vs control sales for each pair. Points near the diagonal are pairs with similar sales (little estimated effect); the spread around the diagonal shows how much the effect varies between pairs.

Plot 3: Treatment Effect Distribution

Histogram of individual effects with confidence interval. Centered near zero, includes zero → no significant effect.


Limitations & Caveats

  1. Limited Time Window (2013-2015)

    • Data only covers 3 years; competitors entered gradually
    • No clear "pre-competitor" vs "post-competitor" periods
    • Limits causal identification
  2. Unconfoundedness Assumption

    • Assumes all confounders measured (promotion, holidays, store type)
    • Unmeasured confounders (store reputation, manager quality) could bias results
    • Cannot be tested; only sensitivity analysis available
  3. High Variability

    • Standard deviation (2,162) much larger than ATE (60.92)
    • Large differences between individual stores
    • Effect is imprecisely estimated
  4. Selection Bias

    • Stores that got competitors might be in growing markets (confounding)
    • Even with matching, baseline differences may remain
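One common way to check whether baseline differences remain after matching is the standardized mean difference (SMD) of each covariate between the matched groups. The helper below is a hypothetical sketch, not part of the notebook; values close to zero suggest good balance, and an absolute SMD below about 0.1 is a frequently used rule of thumb rather than a hard threshold.

```python
import numpy as np


def standardized_mean_difference(treated, control):
    """SMD = (mean_treated - mean_control) / pooled standard deviation."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    # Pooled SD: square root of the average of the two sample variances
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    if pooled_sd == 0:
        return 0.0  # both groups constant: report no difference
    return (treated.mean() - control.mean()) / pooled_sd


# Made-up promotion shares for four matched pairs of stores
promo_treated = [0.40, 0.45, 0.50, 0.55]
promo_control = [0.38, 0.46, 0.52, 0.54]
smd_promo = standardized_mean_difference(promo_treated, promo_control)
```

Running this for every covariate, before and after matching, shows whether the matching actually reduced the imbalance.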

Interpretation & Business Implications

What This Means:

  • Competitor entry did NOT significantly hurt Rossmann sales in this analysis
  • Differences between matched stores are dominated by store-to-store variation rather than by competitor presence
  • Any impact is too small or uncertain to detect with this data
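To put "too small to detect" in numbers, a rough minimum detectable effect can be computed from the reported standard deviation and pair count. This is a back-of-the-envelope sketch using a normal approximation, a 5% two-sided test, and 80% power; those settings are assumptions, not choices documented in the analysis.

```python
import math
from scipy import stats

sd = 2162.02     # standard deviation from Key Findings
n_pairs = 357    # number of matched pairs
alpha = 0.05     # two-sided significance level (assumed)
power = 0.80     # desired power (assumed)

se = sd / math.sqrt(n_pairs)

# Minimum detectable effect under a normal approximation
mde = (stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)) * se

print(f"Minimum detectable effect: about {mde:.0f} sales per store")
```

Under these assumptions the design could reliably detect only effects of roughly 320 sales per store or more, several times larger than the estimated +60.92.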

Why This Matters:

  • Suggests competitors less threatening than feared, OR
  • Data quality/timeframe insufficient to detect true effect, OR
  • Other factors (price, location, loyalty) matter more than competitor presence

Code Structure

Impact-Analysis/
├── data/
│   ├── train.csv           
│   └── store.csv           
├── analysis.ipynb         
├── README.md           
└── plots/
    ├── psm_distribution.html
    ├── matched_pairs.html
    └── effect_distribution.html

How to Run

  1. Install dependencies:

    pip install pandas numpy scikit-learn scipy plotly
  2. Load data:

    import pandas as pd
    # Paths follow the layout under Code Structure; adjust if the CSVs live elsewhere
    df = pd.read_csv('data/train.csv').merge(pd.read_csv('data/store.csv'), on='Store')
  3. Run analysis:

    • Open analysis.ipynb in Jupyter
    • Execute cells sequentially
    • View interactive plots

Key Learnings: Causal Inference Concepts

Confounding: Other factors change alongside treatment. We controlled for promotions, holidays, store type to isolate competitor effect.

Propensity Score: Probability of treatment based on pre-treatment characteristics. Used to match comparable stores.

Matching: Paired treated (competitor) and control (no competitor) stores with similar propensity scores, making comparison fair.

Statistical Significance: 95% confidence interval includes zero → effect not significantly different from no effect.
