
Project Experiment

A Comprehensive Knowledge Base for Modern Experimentation in Marketing & Analytics

Project Experiment is a definitive, 25-module knowledge base synthesizing academic research, industry best practices, and real-world case studies from leading technology and financial services companies. Developed through collaborative synthesis by multiple expert systems and refined through rigorous editorial consolidation, this resource provides both theoretical foundations and practical implementation guidance for analytics professionals, data scientists, and business leaders.


📖 How to Navigate This Knowledge Base

This knowledge base is organized into five thematic groups that represent the natural progression of experimentation maturity, from foundational concepts through advanced techniques to organizational transformation:

  1. Foundations & Statistical Methods (Topics 01-05): Core concepts, decision frameworks, and fundamental statistical techniques
  2. Advanced Statistical Techniques (Topics 06-10): Variance reduction, specialized metrics, and channel-specific applications
  3. Analytical Sophistication (Topics 11-15): Heterogeneous effects, personalization, and operational excellence
  4. Stakeholder Management & Governance (Topics 16-20): Communication, compliance, unified measurement, and future trends
  5. Organizational Maturity (Topics 21-25): Implementation roadmaps, common pitfalls, and cultural transformation

⭐ Highlights: Start Here

Essential Foundations

Advanced Statistical Methods

Email & Marketing Specialization

Critical Organizational Topics

For Regulated Industries


Group 1: Foundations & Statistical Methods

Governance, compliance, and decision science in high-stakes environments

A comprehensive analysis of how experimentation functions within regulated industries (financial services, healthcare, insurance). Covers Model Risk Management (SR 11-7), Conduct Risk frameworks, Fair Lending regulations (ECOA), and the governance structures that distinguish regulated experimentation from Silicon Valley's "move fast and break things" culture. Essential reading for anyone working in regulated sectors or building experimentation programs that require rigorous oversight.

Key Topics: Three Lines of Defense, Model Risk Management, Conduct Risk, Fair Lending (ECOA), SR 11-7 compliance, capital planning integration, gamification risks


Sequential testing, Bayesian methods, and sophisticated stopping criteria

Consolidated from 3 expert sources: Claude, Gemini, and Main synthesis

The definitive guide to moving beyond arbitrary 95% confidence thresholds and fixed experiment durations. Covers group sequential testing (GST), always-valid inference (AVI), Bayesian decision rules, expected loss frameworks, and risk-adjusted thresholds by metric hierarchy. Includes detailed implementation guidance from Spotify, Netflix, Airbnb, Microsoft, Booking.com, and other elite experimentation organizations.

Key Frameworks: O'Brien-Fleming boundaries, Pocock spending functions, mSPRT, GSPRT, Expected Loss minimization, Probability to Be Best (P2BB), threshold of caring
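
As a minimal illustration of two of these quantities, the sketch below computes Probability to Be Best (P2BB) and expected loss for a two-variant conversion test by Monte Carlo, assuming flat Beta(1, 1) priors; the function name and priors are illustrative choices, not the module's prescriptions.

```python
import numpy as np

def bayes_decision_summary(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """P2BB and expected loss for two Bernoulli variants under Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    post_a = rng.beta(conv_a + 1, n_a - conv_a + 1, draws)  # posterior draws for A
    post_b = rng.beta(conv_b + 1, n_b - conv_b + 1, draws)  # posterior draws for B
    p2bb = (post_b > post_a).mean()                   # P(B beats A)
    exp_loss = np.maximum(post_a - post_b, 0).mean()  # expected regret of shipping B
    return p2bb, exp_loss
```

A typical decision rule ships B once its expected loss falls below a pre-registered "threshold of caring" rather than waiting for an arbitrary confidence level.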

📚 Research Appendix (10 documents):


Building a culture that values learning from failure

Consolidated from 3 expert sources: Claude, Gemini, and Main synthesis

Why 70-90% of experiments "fail" at elite companies, and why this is a sign of maturity, not dysfunction. Explores the organizational culture, knowledge management systems, and decision frameworks needed to extract value from null results. Includes the "Ship Flat" decision matrix, Edmondson's failure typology, and detailed case studies from Booking.com, Netflix, Microsoft, Amazon, and Airbnb.

Key Frameworks: Edmondson's failure framework (preventable/complex/intelligent), "Ship Flat" decision matrix, Sample Ratio Mismatch diagnostics, A/A testing protocols, knowledge repository design

📚 Research Appendix (4 documents):


Statistical power analysis for channels with limited traffic

The essential guide to running experiments in low-velocity environments: direct mail, B2B sales, premium segments, annual purchase cycles, and regulatory-limited channels. Covers power calculations, Minimum Detectable Effect (MDE) tradeoffs, duration estimation, and practical strategies when traditional A/B testing isn't feasible.

Key Topics: Power analysis, MDE calculation, sample size requirements, variance estimation, cluster randomization, quasi-experimental designs, difference-in-differences
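
As a concrete reference, here is a minimal power calculation for a two-proportion test using the standard normal approximation; the function name is ours, and alpha = 0.05 / power = 0.80 are just conventional defaults.

```python
import math
from scipy.stats import norm

def sample_size_per_arm(p_base, mde_abs, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_alt = p_base + mde_abs
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# A 2.0% baseline with a 0.2pp absolute MDE needs roughly 80,000 users per arm,
# which makes the MDE-vs-duration tradeoff in low-traffic channels very tangible.
n = sample_size_per_arm(p_base=0.02, mde_abs=0.002)
```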


Rigorous frameworks for early stopping without inflating false positives

A comprehensive guide to sequential testing frameworks that allow "peeking" at results without alpha inflation. Covers Wald's Sequential Probability Ratio Test (SPRT), alpha spending functions, group sequential designs, and practical implementations from Netflix, Optimizely, and VWO.

Key Frameworks: SPRT, alpha spending functions, O'Brien-Fleming boundaries, Lan-DeMets approach, futility monitoring, conditional power
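
To make the mechanics concrete, here is a minimal Wald SPRT for a binary metric; the simple-vs-simple hypotheses (p0 vs. p1) and error rates must be pre-specified, and the boundaries are Wald's classical approximations.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.20):
    """Wald's SPRT: test H0: p = p0 vs. H1: p = p1 on a stream of 0/1 outcomes."""
    upper = math.log((1 - beta) / alpha)  # crossing above accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing below accepts H0
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", len(observations)
```

Because the boundaries are checked after every observation, the analyst can "peek" continuously without inflating the false positive rate beyond alpha.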


Group 2: Advanced Statistical Techniques

Advanced methods to reduce sample size requirements and duration

Detailed coverage of variance reduction techniques that can reduce sample size requirements by 30-50%. Covers CUPED (Controlled-experiments Using Pre-Experiment Data), CUPAC (CUPED using Predictions as Covariates), stratification, regression adjustment, and modern machine learning approaches. Includes Microsoft and Netflix implementations.

Key Techniques: CUPED, CUPAC, stratified randomization, post-stratification, regression adjustment, doubly robust estimation, machine learning-based covariate adjustment
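
The core CUPED adjustment is only a few lines; this sketch (names ours) residualizes the experiment metric on a pre-experiment covariate, cutting variance by roughly the squared correlation between the two.

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED: subtract theta * (x - mean(x)) from y, where x is pre-period data."""
    theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(100, 20, 10_000)           # simulated pre-experiment spend per user
y = 0.8 * x + rng.normal(0, 10, 10_000)   # simulated in-experiment spend per user
print(np.var(cuped_adjust(y, x)) / np.var(y))  # ratio is about 1 - corr(y, x)**2
```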


Statistical complexities of ratio metrics and user-level dependencies

Addresses the unique challenges of ratio metrics (revenue per user, sessions per visit, CTR) and correlated user behavior. Covers the Delta method, bootstrap approaches, clustered standard errors, and when simple t-tests fail. Critical for anyone analyzing engagement, revenue, or behavioral metrics.

Key Topics: Delta method, bootstrap estimation, Taylor expansion, clustered standard errors, intra-cluster correlation, ratio estimators
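
For reference, a minimal Delta-method variance estimate for a ratio-of-sums metric such as CTR, computed from per-user numerators and denominators (the helper name is hypothetical):

```python
import numpy as np

def delta_ratio_variance(num, den):
    """Delta-method variance of mean(num)/mean(den) from per-user totals.

    Respects user-level dependence that a naive per-event t-test ignores.
    """
    num, den = np.asarray(num, float), np.asarray(den, float)
    n = len(num)
    mu_n, mu_d = num.mean(), den.mean()
    cov = np.cov(num, den)  # 2x2 covariance matrix of (num, den)
    return (cov[0, 0] / mu_d**2
            - 2 * mu_n * cov[0, 1] / mu_d**3
            + mu_n**2 * cov[1, 1] / mu_d**4) / n
```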


Adapting email analytics after Apple's tracking pixel blocking

Comprehensive analysis of how Apple's Mail Privacy Protection (MPP) fundamentally changed email analytics. Covers the technical mechanisms of pixel blocking, alternative metrics (clicks, conversions, list hygiene), proxy metric validation, and new experimental approaches that don't rely on open rates.

Key Topics: Apple MPP mechanics, click-based metrics, conversion tracking, list hygiene indicators, engagement scoring, proxy metric validation


Holdout designs and causal inference for marketing campaigns

The gold standard for measuring true incremental value of email campaigns using holdout groups. Covers holdout design, attribution modeling, long-term effects, cannibalization detection, and strategic frameworks for building incrementality testing programs.

Key Frameworks: Global holdouts, rolling holdouts, stratified holdouts, ghost ads, synthetic controls, attribution modeling integration
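
A minimal sketch of the core readout, assuming a properly randomized holdout: incremental lift is the difference in conversion rates between mailed and held-out groups, with a normal-approximation interval (names and the 95% level are illustrative).

```python
import math
from scipy.stats import norm

def incremental_lift(conv_treated, n_treated, conv_holdout, n_holdout):
    """Absolute incremental lift of a campaign vs. a randomized holdout."""
    p_t = conv_treated / n_treated
    p_h = conv_holdout / n_holdout
    lift = p_t - p_h
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_h * (1 - p_h) / n_holdout)
    ci_95 = (lift - 1.96 * se, lift + 1.96 * se)
    p_value = 2 * norm.sf(abs(lift / se))
    return lift, ci_95, p_value
```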


Balancing short-term engagement with long-term subscriber value

Analyzes the complex relationship between email frequency and long-term subscriber value. Covers frequency optimization, fatigue detection, lifetime value modeling, recency-frequency-monetary (RFM) analysis, and how to balance short-term engagement metrics with long-term list health.

Key Topics: Fatigue curves, optimal frequency estimation, suppression list management, reactivation strategies, LTV modeling for email


Group 3: Analytical Sophistication & Operational Excellence

Heterogeneous treatment effects and conditional average treatment effects

When an experiment shows no detectable overall effect (ATE ≈ 0), Conditional Average Treatment Effects (CATE) analysis can uncover heterogeneous treatment effects across segments. Covers causal forests, meta-learners (S-learner, T-learner, X-learner), honest causal trees, and practical implementation strategies for finding value in "failed" experiments.

Key Methods: Causal forests, generalized random forests, meta-learners, honest splitting, double machine learning, targeted maximum likelihood estimation
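
As a small illustration of the meta-learner idea, here is a T-learner sketch with scikit-learn gradient boosting as the base learner; the base model and function name are our choices, and honest validation on held-out data is still required before acting on the scores.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def t_learner_cate(X, y, treated):
    """T-learner: fit separate outcome models for treated and control units.

    X: (n, d) covariate matrix; y: (n,) outcomes; treated: (n,) boolean assignment.
    """
    mu1 = GradientBoostingRegressor(random_state=0).fit(X[treated], y[treated])
    mu0 = GradientBoostingRegressor(random_state=0).fit(X[~treated], y[~treated])
    return mu1.predict(X) - mu0.predict(X)  # per-unit CATE estimates
```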


Decision frameworks for segment-level personalization

A rigorous framework for deciding when segment-level personalization is justified versus when it introduces unnecessary complexity or overfitting. Covers the bias-variance tradeoff in personalization, multiple testing corrections, false discovery rates, validation frameworks, and organizational readiness assessment.

Key Frameworks: Bias-variance tradeoff, cross-validation for personalization rules, Bonferroni correction, FDR control, effect size thresholds
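
When scoring many candidate segments, the Benjamini-Hochberg step-up procedure is the standard FDR control; a minimal numpy version (helper name ours) looks like this:

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.10):
    """Benjamini-Hochberg: boolean mask of discoveries at FDR level q."""
    p = np.asarray(p_values, float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0  # largest passing rank
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```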


Adaptive allocation algorithms and exploration-exploitation tradeoffs

Comprehensive comparison of traditional A/B testing versus adaptive algorithms. Covers multi-armed bandits (epsilon-greedy, UCB, Thompson Sampling), contextual bandits, regret minimization, and practical guidance on when each approach is appropriate.

Key Algorithms: Epsilon-greedy, Upper Confidence Bound (UCB), Thompson Sampling, LinUCB, contextual bandits, reinforcement learning
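
The core Thompson Sampling loop for Bernoulli rewards fits in a few lines; this sketch assumes three arms and flat Beta(1, 1) priors, both illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
successes = np.zeros(3)  # per-arm conversions observed so far
failures = np.zeros(3)   # per-arm non-conversions observed so far

def choose_arm():
    """Sample each arm's posterior conversion rate and play the argmax."""
    draws = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(draws))

def record(arm, converted):
    """Update the chosen arm's Beta posterior with the observed outcome."""
    if converted:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Arms that look better get sampled more often, which is the exploration-exploitation tradeoff that regret minimization formalizes.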


Organizational design for scaled experimentation capabilities

Strategic guide for building organization-wide experimentation capabilities. Covers governance structures, Centers of Excellence (CoE) models, federated vs. centralized architectures, tool selection, platform architecture, velocity metrics, and change management for embedding experimentation into company culture.

Key Topics: CoE design, platform selection, experimentation velocity, democratization vs. governance, training programs, stakeholder alignment


Metric hierarchies and Overall Evaluation Criteria

Frameworks for designing primary, secondary, and guardrail metrics that align with strategic business objectives. Covers metric selection criteria, leading vs. lagging indicators, Overall Evaluation Criteria (OEC) design, composite metrics, and how to avoid "metric gaming."

Key Frameworks: Metric hierarchies, OEC design, guardrail specification, leading indicator validation, proxy metric assessment


Group 4: Stakeholder Management & Governance

Translating statistical rigor into executive decision-making

Practical guidance on presenting statistical results to non-technical executives. Covers narrative structure, visualization best practices, confidence interval communication, effect size interpretation, and how to discuss statistical nuance without losing business context.

Key Skills: Executive storytelling, data visualization, p-value translation, confidence intervals for business audiences, decision memo structure


Ethical frameworks and reputational risk management

Explores the ethical and reputational dimensions of experimentation. When do customers perceive experiments as innovation vs. manipulation? Covers transparency frameworks, opt-in/opt-out considerations, informed consent, and case studies of experimentation that damaged trust (Facebook emotional contagion study, OkCupid compatibility experiments).

Key Topics: Research ethics, informed consent, deceptive practices, transparency obligations, reputational risk, customer trust recovery


Navigating GDPR, CCPA, and sector-specific regulations

Comprehensive legal framework covering GDPR Article 6(1)(f) (legitimate interests), CCPA opt-out rights, CAN-SPAM compliance, TCPA restrictions, and sector-specific regulations. Includes guidance on consent management, data retention, cross-border data transfers, and when experiments require legal review.

Key Regulations: GDPR, CCPA, CAN-SPAM, TCPA, HIPAA (healthcare), GLBA (financial services), data localization requirements


Integrating experimentation with Marketing Mix Modeling and attribution

Explains how controlled experiments complement (rather than replace) Marketing Mix Modeling and multi-touch attribution. Covers the strengths and limitations of each approach, integration strategies, calibration techniques, and how leading companies use all three methods together.

Key Topics: MMM-experiment integration, attribution model validation through experiments, calibration techniques, incrementality vs. attribution


Emerging trends and technological frontiers

Forward-looking analysis of emerging trends: AI-powered experiment design, automated variance reduction, privacy-preserving experimentation techniques (differential privacy, federated learning, secure multi-party computation), synthetic controls at scale, and the evolution from "tests" to "continuous optimization systems."

Emerging Trends: Automated experiment design, privacy-preserving techniques, synthetic controls, federated learning, continuous optimization


Group 5: Organizational Maturity & Implementation

Organizational transformation roadmap

Consolidated from 3 expert sources: Claude, Gemini, and Main synthesis

Strategic roadmap for organizations transitioning from basic holdout testing to mature experimentation programs. Covers the causality gap, defensive foundation (holdout methodology), catalyst strategy (the single win), tactical playbook (First Five experiments), four-phase maturity model, and common mistakes. Includes detailed case studies from eBay ($50M revelation), True Classic (DTC transformation), Microsoft (experimentation flywheel), and Booking.com (cultural sustainability).

Key Frameworks: Four-phase maturity model (Crawl/Walk/Run/Fly), Microsoft's experimentation flywheel, CATS hypothesis framework, First Five experiments, global holdout methodology

📚 Research Appendix (6 documents):


Building organizational confidence through low-risk initial tests

How to design early experiments that build organizational confidence rather than generate political resistance. Covers stakeholder engagement, risk assessment frameworks, choosing low-risk initial tests, success criteria definition, and building credibility through transparency.

Key Strategies: Stakeholder mapping, risk assessment matrix, pilot experiment selection, transparent reporting, confidence building


Diagnostics, validation, and early decision-making

Consolidated from 3 expert sources: Claude, Gemini, and Main synthesis

Comprehensive guide to what early experiment data can and cannot tell you. Covers Sample Ratio Mismatch (SRM) detection, instrumentation verification, baseline covariate balance, temporal dynamics (email lifecycle, novelty effects), leading vs. lagging metrics, statistical frameworks for early monitoring (SPRT, Bayesian P2BB), and the critical 48-hour checklist.

Key Frameworks: SRM severity levels, email engagement timelines, leading-lagging metric hierarchy, sequential monitoring protocols, 48-hour validation checklist
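
SRM detection itself is a one-line chi-square goodness-of-fit test; in this sketch the function name and the 0.001 alert threshold are common conventions rather than mandates.

```python
from scipy.stats import chisquare

def srm_check(observed_counts, planned_ratios, threshold=0.001):
    """Chi-square test of delivered assignment counts vs. the planned split."""
    total = sum(observed_counts)
    expected = [total * r for r in planned_ratios]
    _, p_value = chisquare(observed_counts, f_exp=expected)
    return p_value, p_value < threshold  # True -> investigate before trusting results

# A planned 50/50 split that delivered 50,000 vs. 51,500 users flags SRM:
p, is_srm = srm_check([50_000, 51_500], [0.5, 0.5])
```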

📚 Research Appendix:


Understanding how experiment effects evolve over time

Comprehensive analysis of how experiment effects evolve over time. Covers novelty effects, primacy effects, user learning curves, habituation, long-term equilibrium effects, and how to design experiments that capture both immediate and sustained impacts.

Key Concepts: Novelty effects, primacy bias, habituation, learning curves, long-run equilibrium, carryover effects


Data-driven experiment duration decisions

Consolidated from 3 expert sources: Claude, Gemini, and Main synthesis

Critical analysis of the widespread but unfounded "30-day minimum" rule. Covers the statistical and organizational origins of this myth, the 4-quadrant experimentation matrix (signal velocity × risk), economic tradeoffs (opportunity cost analysis), when shorter or longer durations are actually appropriate, and practical duration calculation frameworks. Includes special considerations for financial services and regulated industries.

Key Frameworks: 4-quadrant matrix (signal velocity × risk), 6-question pre-commitment framework, duration calculation methodology, risk-based decision logic
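
In place of the 30-day rule, a minimal duration estimate divides the powered sample size by eligible daily traffic and rounds up to whole weeks to cover day-of-week seasonality; this helper is an illustrative sketch, not the module's prescribed formula.

```python
import math

def duration_days(n_per_arm, n_arms, daily_eligible_users):
    """Days needed to reach the powered sample size, rounded up to full weeks."""
    raw_days = math.ceil(n_per_arm * n_arms / daily_eligible_users)
    return math.ceil(raw_days / 7) * 7

# e.g. 80,000 per arm, two arms, 20,000 eligible users/day -> 8 raw days -> 14 days
print(duration_days(80_000, 2, 20_000))
```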

📚 Research Appendix (4 documents):


🎯 Learning Pathways by Role

For Analytics Professionals

Start here: 04 → 05 → 06 → 07 → 02 → 03
Then explore: 08 → 09 → 10 (email specialization) or 11 → 12 → 13 (advanced methods)
Complete with: 14 → 15 → 23

For Business Stakeholders & Executives

Start here: 21 → 01 → 16 → 22
Then explore: 02 → 03 → 15 → 17 → 18
Complete with: 19 → 20

For Data Scientists & ML Engineers

Start here: 05 → 06 → 07 → 11 → 13
Deep dive: 02 (research appendix), 03 (research appendix)
Advanced topics: 12 → 20

For Experimentation Program Managers

Start here: 21 → 14 → 22 → 15
Then explore: 02 → 03 → 16 → 17 → 18
Complete with: 23 → 24 → 25 → 19 → 20


📊 Key Frameworks & Methodologies

Statistical Foundations:

  • Sequential testing (GST, AVI, SPRT)
  • Bayesian methods (Expected Loss, P2BB, Threshold of Caring)
  • Variance reduction (CUPED, CUPAC, stratification)
  • Power analysis and sample size calculation
  • Multiple testing corrections (Bonferroni, FDR)

Causal Inference:

  • Heterogeneous treatment effects (CATE, causal forests)
  • Meta-learners (S-learner, T-learner, X-learner)
  • Instrumental variables and quasi-experiments
  • Synthetic controls and difference-in-differences
  • Doubly robust estimation

Organizational Design:

  • Centers of Excellence (CoE) architecture
  • Three Lines of Defense (regulated environments)
  • Maturity models and transformation roadmaps
  • Metric hierarchies and Overall Evaluation Criteria
  • Knowledge management and failure libraries

Regulatory & Compliance:

  • Model Risk Management (SR 11-7)
  • Fair Lending (ECOA)
  • Data privacy (GDPR, CCPA)
  • Marketing regulations (CAN-SPAM, TCPA)

Industry Case Studies: Netflix, Airbnb, Spotify, Microsoft, Amazon, Booking.com, LinkedIn, eBay, Facebook, Uber, Google, Intuit, Vanguard, Capital One, JPMorgan Chase, True Classic, Robinhood, Optimizely, GrowthBook


🔗 Supplementary Materials

Slide Decks


πŸ“ Editorial Note

Five critical topics (02, 03, 21, 23, 25) represent consolidated authoritative documents synthesized from multiple expert AI systems (Claude, Gemini, and custom synthesis agents). These consolidations integrate the strongest insights, frameworks, and case studies from each source while eliminating redundancy and maintaining rigorous McKinsey/HBR editorial standards. Original source documents and extensive research appendices remain accessible through the linked materials.


This knowledge base represents a comprehensive synthesis of academic research, industry publications, regulatory guidance, and practical implementation experience in experimentation methodology. All content is designed for analytics professionals, data scientists, and business leaders working in marketing, product, and strategy roles.

Last updated: February 2026 | Maintained by expert editorial synthesis

