This repository provides open-source best practices for for conducting geographic randomized controlled trials (Geo RCTs) for measuring incremental sales effect of advertising cammpaigns. It includes details on one design type in particular, a multi-armed stepped experimental design that has particular advantages in terms of statistical strength and balance in using geographic regions, namely designated market areas (DMAs), as units of the experiment. It also covers other techniques for achieving statistical strength and balance with simpler experimental designs, including two-celled tests (test and control), including stratefied randomization, matched-pair randomization, covariate-constrained randomization and more.
These approaches ars designed to help advertisers, agencies, and researchers measure the causal impact of advertising across large geographic regions with improved statistical power and transparency.
The goal of publishing this methodology openly is to make it easier for practitioners to understand, evaluate, and apply Geo RCTs in their own work. The repository includes:
- A detailed write-up of the methodology (
README.md) - Code examples (
/code/) for running simulations and analyses - Supporting tables and reference materials (
/tables/) - A PDF version of the original whitepaper (
/Central_Control_Geo_RCT_Whitepaper.pdf)
We welcome feedback, discussion, and contributions from the measurement community.
- INTRODUCTION
- FIRST STEPS
- SELECT DESIGN TYPE
- DESIGN THE EXPERIMENT
- CONDUCT POWER ANALYSIS
- LOCK THE DESIGN FRAMEWORK
- EXECUTE THE EXPERIMENT
- CONDUCT ANALYSIS
- INTERPRET AND ACT
- COMMON PITFALLS AND HOW TO AVOID THEM
- CONCLUSION
- APPENDICES
- ABOUT CENTRAL CONTROL
Incremental return on ad spend (iROAS) by using large-scale geographic randomized controlled trials (RCT). This document provides a detailed explanation of how to design and execute such experiments.
EXPERIMENT DESIGN → STATISTICAL PRE-TESTING → DESIGN DOCUMENTATION → EXPERIMENT EXECUTION → ANALYSIS & REPORTING → BENCHMARKS & META-ANALYSIS
DESIGN EXPERIMENTS TO ANSWER SPECIFIC QUESTIONS. PICK THE AMONG MULTIPLE TEST DESIGNS TO BEST SUIT REQUIREMENTS. → CONDUCT POWER ANALYSIS TO PRE-TEST DESIGN'S STATISTICS AND FIX ON PARAMETER VARIABLES. → PRODUCE AND SHARE EXPERIMENT DESIGN DOCUMENT WITH KEY STAKEHOLDERS. → MANAGE AND MONITOR EACH STAGE OF THE EXPERIMENTS. → UNDERSTAND TEST RESULTS CLEARLY, TAKE ACTION ON BUSINESS DECISION. → ANALYZE A LARGE BODY OF RCT RESULTS ACROSS CAMPAIGNS FOR BEST EVIDENCE OF WHAT DRIVES OUTCOMES.
The workflow above summarizes the end-to-end process for running advertising experiments. The final stage — benchmarking — compiles results from all experiments into a growing evidence base. As advertisers, agencies, and media companies accumulate dozens, then hundreds, or even thousands of tests, these benchmarks become a powerful source of insight. They reveal what truly works in advertising and provide expected effect sizes by media channel, ad format, partner, product category, and other key dimensions.
Before designing the experiment, several foundational questions should be addressed.
STATE YOUR BUSINESS OBJECTIVE AND THE DECISION TO BE SUPPORTED
Experiments are not as difficult or expensive as many marketers perceive or portray them, but they do require resources and calendar time, so run them only when the answer will drive a meaningful business decision. Geo experiments are almost always "intent to treat" (ITT) designs, where treatment is tracked at the geographic level rather than the individual level. You know which geographic populations were intended to receive treatment, not which individuals actually saw the ads. This means your media investment must be substantial enough to detectably move KPIs at the geographic level. This technique isn't suited for relatively small campaigns or very rare conversions.
Start by measuring sales effects for your largest media channels to validate and calibrate coefficient assumptions in your marketing mix model. From there, having laid a baseline of channel-level validation and gotten a grounding in best practices for conducting experiments, researchers can branch out into nuances of the mix, such as the effectiveness of particular media partners, subchannel tactics, media formats and so forth.
Most importantly, identify the specific business decision that will be made based on the results, with marketing, analytics and finance working in concert. A well-constructed geo experiment that doesn't inform action is a waste of resources.
Example business objectives:
- Validate that the impact of a marketing channel, such as Search (or Television, Social, Retail Media, etc.) matches our marketing mix model's (MMM) coefficients, and if not, investigate (and experiment) further with a view to calibrate the MMM accordingly and reevaluate investment levels
- Measure iROAS of a media partner, such as Meta (or Google, Spotify, Disney, etc.) as a channel partner and renegotiate or reallocate if results are unfavorable
- Evaluate iROAS of a tactic, such as Retargeting (or Lookalike Targeting, Branded Search Keywords, Geo-fencing, etc.) and adjust investment accordingly
Next, we define the key parameters that will structure the experiment.
INDEPENDENT VARIABLE: THE MEDIA INVESTMENT
In an experiment, the independent variable is what you modify between test and control groups to measure its effect on the outcome variable. For advertising experiments, this means the ad campaign or media channel being tested, whether branded search, social media, television, or another tactic.
Since we're designing geographic experiments, it's crucial to verify that your chosen medium can reliably target different geographic units sufficiently for your testing objectives. This capability varies significantly across media types and platforms. Examples, as of this writing: Linear TV can be tested effectively using DMA targeting through providers like Ampersand or Nexstar, but individual networks like CBS or AMC may lack sufficient scale or targeting granularity. Spotify enables geo-targeting for streaming music and run-of-network podcasts but not for host-read spots on specific shows. Amazon supports geo-targeting for display and video on-site and their DSP but not for sponsored product keywords that comprise most retail media budgets. ZIP codes are not a reliably accurate testing unit for most media.
DEPENDENT VARIABLE: KEY PERFORMANCE INDICATOR (KPI)
The dependent variable is the outcome you're measuring, ideally sales. The fundamental assumption is that this outcome is influenced by the independent variable: running advertising increases sales. The experiment aims to quantify this causal relationship by comparing against the counterfactual baseline sales rate in the control group absent the advertising stimulus.
Sales is the preferred KPI as it directly measures iROAS, customer acquisition cost (CAC), and cost per acquisition (CPA), all metrics that should reflect true incrementality. Other outcomes such as foot traffic, search activity, web visitation, and brand lift can also be measured, but should be measured with RCT to know if the ads are having a true causal effect.
Critically, ensure you have reliable KPI data that aligns with the experimental structure. For geographic experiments, postal codes in customer databases are ideal — no device graphs, tracking pixels, or clean rooms required. Whether from first-party CRM systems or partner panels, ZIP codes easily roll up to most standard geographic units, allowing straightforward determination of whether each customer belonged to test or control groups at the time of conversion. That doesn't work with non-standard geographic units such as lat/long radial or hexagonal zones.
EXPERIMENT UNIT: GEOGRAPHIC REGIONS
In RCTs, "experiment units" are the smallest entities to which treatment is independently applied and which are subject to randomization. For advertising, these could be individuals, devices, households, or geographic regions.
As explained in Central Control's article "Advertisers seeking accurate ROAS should use large-scale, randomized geo tests," user- and household-level targeting provides false precision. User-level tracking accuracy has generally been overstated by vendors, and Apple's App Tracking Transparency (ATT) and Safari browser have further degraded online tracking capabilities.
The internet's promise of 1:1 targeting was never fully realized. Cookies are inherently unstable, regularly deleted and lacking persistent identity. Industry responded with complex identity graphs and clean rooms. Users pushed back with ad blockers, while governments enacted privacy laws such as GDPR. As a result, when marketers attempt to match campaign-exposed audiences to first-party outcome data, match rates typically range from 50-60% or less.
Even at 80%, this introduces too much noise to reliably detect typical advertising effects of 1-5% with statistical confidence.
This explains the rise of geographic experiments. Using geo regions as experimental units forms what's known as cluster-randomized trials (CRT), where geographic clusters of individuals are randomly assigned to treatment conditions.
Geographic experiments offer multiple advantages. They sidestep privacy concerns by eliminating the need for personally identifiable information. Sales and other KPIs align naturally with this structure through ZIP-coded transaction data from CRM systems, without privacy implications or technical overhead.
Most media can accurately target large geographic regions, notably Nielsen's designated market areas (DMAs). While postal codes, cable zones, core based statistical areas (CBSAs), and states are alternatives, each presents challenges. Postal codes would be ideal, but, as discussed in the AdExchanger article "ZIP Codes: The Simple Fix For Advertising ROI Measurement," media companies currently infer ZIPs from IP addresses with high error rates or accumulate multiple ZIPs per user account, making them unreliable for experimentation without additional system enhancements.
Geographic regions suitable for experimentation must be mutually exclusive and sufficiently isolated to minimize spillover between test and control groups, upholding the experimentation best practice known as Stable Unit Treatment Value Assumption (SUTVA). Ideally, units should also be collectively exhaustive (MECE: mutually exclusive and collectively exhaustive).
DMAs meet these criteria well. They're mutually exclusive and collectively exhaustive across the US. Most people don't commute between DMAs — while NYC and Philadelphia DMAs border each other, their populations generally live and work within one or the other. With 210 US DMAs, randomization (as discussed further below) can balance the many factors affecting regional sales: income levels, education, competitive mixes, weather, and more. Critically, many media types can be targeted reliably by DMA in the US, a practice established since the 1950s.
Outside the US, metro areas or postal-code clusters typically work best for the same purpose of geo experiments.
Bear in mind, we're talking here about the RCT subclass of cluster randomized trials, so don't confuse this approach with matched-market tests (MMT) or synthetic control method (SCM). Those are forms of quasi-experiments, a scientific designation that means "not as good as real experiments," defined by a lack of randomization of assignments of test and control arms. This paper focuses on how to conduct geo RCTs for iROAS measurement and doesn't get into why CRT provides categorically better evidence than MMT or SCM. For more on that discussion, refer to articles including "Advertisers Seeking Cccurate ROAS Should Use Large-Scale, Randomized Geo Tests," "Recession-Proofing Your Ad Mix: Measure Twice, Cut Once" and "Gain Market Share by Disrupting Bad Ad Measurement" on Central Control's blog or LinkedIn page.
For this document, we focus on large-scale randomized geographic experiments using U.S. DMAs as the experimental unit, though these techniques can also be applied in other countries by targeting metro areas or postal code clusters.
There are myriad types of designs for running experiments. We have already narrowed to some key parameters, such as that we are focusing on using geographic regions for the experiment units and also that we are using an intent-to-treat (ITT) design, meaning that we do not need to track who in the test group was actually treated with the campaign exposure, and that treatment is withheld from everyone in the control group. We will briefly discuss here a few other more design details, but settle on one for the rest of this paper.
Different business questions give rise to different experimental design choices.
Those include:
- ** Multi-armed, stepped CRT:** This design, nicknamed "Rolling Thunder," offers greater statistical power than a traditional two-celled test by generating many independent contrasts across time and treatment arms. Each group enters treatment in a staggered sequence and remains in for a fixed duration, creating repeated observations both within and across DMAs. This structure increases the precision of the causal estimate while reducing sensitivity to outliers or time-based shocks. The design is well-suited for cessation testing: by phasing off the media gradually, advertisers can closely monitor sales impact, minimize the risk of revenue loss, and halt the test early if needed. It is just as applicable for introduction testing.
-
Targeting Test: To see if the iROAS of a targeting technique pays off, this three-armed design includes one test group for targeting, one test group for minimally targeted media using the same creative, and one unexposed control group.
-
Media Bake-Off Trial: When comparing two media channels against one another for the superior ROI, a three-armed test, one for medium A, one for medium B and a control group.
-
Media Bake-Off Plus Interaction Effect: A four-armed test compares iROAS of two media properties plus their interaction effect, with one exposed to medium A, one medium B, one both, and one unexposed control group, using factorial design and analysis.
-
Intensity Trial: For determining optimal media weight and calibrating diminishing returns, a multi-armed test where different groups receive varying media levels (e.g., 50%, 100%, 200% of baseline spend) plus an unexposed control. Unlike cessation tests which measure what's lost when media stops, intensity trials identify the point of maximum efficiency by testing increased investment levels. This design reveals whether doubling spend doubles impact or if returns diminish, critical for optimizing budgets rather than just validating them.
-
Parallel Two-celled Test: This is the most basic design for a lift test, answering the question of how much incremental effect an ad campaign/channel/publisher/tactic or other ad strategy as on a desired outcome, such as sales, with one test group and one control. Statistically weaker than Rolling Thunder design: recommended only for estimated effect size great than 5%.
Practitioners may opt to have different ratios of the various arms of the experiments, but our recommendation is to use equal probability in assigning arms whether that's 50/50 for two-armed tests or likewise for any other number of splits. Uneven ratios between test and control weaken the statistical power of the test and risk skewing the sample between the arms.
The analysis for each of the designs mentioned above functions similarly.
With the business problem defined and the experimental approach selected we can now proceed to the detailed design.
EFFECT SIZE
Specify the effect size you intend to detect, typically expressed as a percent lift in the primary KPI (e.g., sales). This value should reflect what the business would consider meaningful, based on past campaign benchmarks, modeling outputs (e.g., marketing mix models or multitouch attribution models), or practical judgment. For example, an expected lift of ~3% may be a reasonable planning target for most large consumer brands.
It may feel odd to have to estimate this quantity, since the point of the experiment is to determine its value. Don't think of this as a forecast. Instead, think of this as the minimum lift that would be worth detecting based on your business. A point of departure for this, if lacking any better guidance, is what would be the minimum lift required to generate positive iROAS for the campaign. The lift estimate is an input to power analysis that anchors the design in real-world relevance.
Power analysis will assess whether the proposed design can detect this effect with sufficient probability and may show that the setup is sensitive enough to detect even smaller effects. Avoid designing the experiment to detect only the absolute minimum effect you think might occur, as this can overfit assumptions and raise the risk of an underpowered test.
Where feasible, simulate power across a range of plausible effect sizes to assess robustness and trade-offs in detectable lift. Given that this is a how-to paper concerning experimental design, this material gets at least a bit more technical as it proceeds. We assume the reader has some familiarity with modern statistical techniques, but we trust that non-technical readers should be able to follow the basic concepts as we continue.
CONFIDENCE LEVEL
Specify the confidence level that will be used to test for statistical significance. The standard is 95%, corresponding to a significance threshold (α) of 0.05, although 90% confidence is often acceptable for marketing purposes.
This determines the risk of a false positive ("Type I error"), i.e., concluding there was an effect when there wasn't. A 95% confidence level implies that, under the null hypothesis (the opposite of the hypothesis), there's a 5% chance of incorrectly declaring a statistically significant result.
TEMPORAL WINDOWS
Pre-Period (Baseline) Length
Set a baseline window — eight weeks is usually sufficient, depending on sales cycles — to normalize each DMA's trend before treatment. This period is used for trend stabilization and covariate adjustment. Justify that it is long enough to smooth out short-term noise but recent enough to reflect current conditions. Avoid choosing periods that include holidays or anomalies unrepresentative of the test window.
Exposure Period
Determine the active test duration based on expected consumer response. For high-reach media (e.g., TV or digital display), a 4-6 week exposure is often sufficient. For lower-latency channels (e.g., search, retail media), a shorter window may suffice. Ensure media weight and pacing can be reliably sustained during this period. This can be informed during the simulations of the power analysis.
Post-Period (Observation Window)
Decide whether to measure lift only during exposure or include a tail period to capture delayed effects. Include a wash-out buffer if you expect lingering media influence that could contaminate post-test analysis. Pre-specify whether post-period sales will be included in the primary lift calculation, analyzed separately as a decay effect, or excluded entirely as a buffer period. This prevents post-hoc decisions that could bias results.
Seasonality Alignment
Ensure that the pre-period and test period reflect comparable seasonal patterns. Avoid scheduling the experiment so that only one period includes major events like holidays, back-to-school, or Black Friday, unless both periods are expected to show similar seasonal dynamics. Misaligned timing can introduce artificial differences that may be mistaken for treatment effects. Marketing mix models can be used to create seasonally adjusted conversions in situations where bridging a "seasonal change" is unavoidable.
GEOGRAPHIC CLUSTER SPECIFICATION
Unit Definition
In our example, we have determined to use DMAs as the cluster unit. They are mutually exclusive, cover the full U.S. population, and reflect media-buying logic with minimal spillover across boundaries. Their scale makes them well-suited for randomization and aggregate KPI tracking.
Sampling Frame
Use all 210 US DMAs rather than a subset to ensure generalizability to national campaigns. Full-market inclusion maximizes statistical power and captures regional heterogeneity, while limited-DMA designs introduce selection bias and restrict external validity of results.
Assignment Probability
Apply equal probability randomization (typically 50/50 for two arms, or 33/33/33 for three arms, etc.) across all included DMAs. This ensures each geographic unit has the same chance of receiving treatment, preserving the statistical properties needed for unbiased causal inference.
Exclusion Criteria
Define up front which DMAs may be excluded, and why. Valid reasons include things like low store count, data sparsity, regulatory constraints, or known structural anomalies. Aim to keep the full set of 210 DMAs unless exclusions are methodologically necessary. Excluding DMAs simply because they are large, account for a large portion of sales, or are competitive are not ideal exclusion criteria, as their inclusion inherently makes the results of the experiment more generalizable to real-world conditions. Forcing them into the test group is also a bad idea, moving out of the realm of being a randomized trial and into quasi-experiment territory.
Minimum Cluster Size
Set a clear inclusion threshold (e.g., "DMAs with fewer than 500 transactions per week may be excluded") to avoid excess variance from underpowered regions. Document the rationale in the design phase to avoid ad hoc decisions post hoc.
Spillover/Contamination Safeguards
To prevent treatment effects from leaking into control DMAs, implement geo-targeting controls (e.g., frequency caps, DMA-level pacing). Set up monitoring protocols to track media delivery and investigate outliers.
OUTCOME DATA REQUIREMENTS
Data Source and Format
The experiment requires comprehensive sales (or other KPI) data at the geographic level for both historical power analysis (ideally 2 years) and the test period. Two primary sources are available:
First-Party CRM Data (Preferred):
- Extract daily or weekly transaction counts or sales volumes by ZIP code (daily preferred)
- Requires: buyer ZIP code and transaction date (daily or weekly) for aggregated sales counts
- Provides maximum flexibility and granularity
- No additional licensing costs
Third-Party Syndicated Data:
- Sources: NielsenIQ, Circana (formerly IRI), or similar providers
- Often pre-aggregated to DMA level
- Weekly granularity typical (daily may be cost-prohibitive but provides greater statistical power)
- Necessary for CPG and other industries without direct sales data
KPI Data Structure
Outcome data for a geo test is typically simple, with three columns: date, sales count (or volume), and geography (ZIP code or DMA). If the advertiser has comprehensive first-party transaction data with ZIP codes in the customer record, outcomes should be aggregated by ZIP. If using a third-party panel such as NielsenIQ or Circana, DMA-level aggregation is sufficient. ZIP codes can be rolled up to DMAs using standard match tables from Nielsen or other providers.
While sales is the preferred outcome metric, other KPIs can be substituted in this structure, such as web form completions, brand search volume (from Google Trends, reported by DMA), or foot traffic (via third-party mobility data providers). Metrics can be reported daily or weekly. Sales can be expressed as counts or continuous dollar values.
WHERE POSSIBLE, USE DAILY KPI AGGREGATIONS, WHICH PROVIDE MORE STATISTICAL POWER THAN WEEKLY. AVOID MONTHLY ROLL-UPS.
A good experimental hypothesis is a clear, testable statement about the expected causal effect of a treatment. It should specify:
- The direction and magnitude of expected impact (e.g., a lift of at least X%)
- The outcome metric (e.g., sales revenue)
- The comparison groups (treatment vs. control DMAs)
- The timeframe
- The confidence level required for the test
The hypothesis makes explicit what you believe the intervention will accomplish and provides stakeholders with a shared understanding of success criteria.
UNDERSTANDING HYPOTHESIS TESTING
Statistical testing doesn't "prove" a hypothesis true. Instead, it attempts to reject the opposite — the null hypothesis. The null hypothesis typically states there is no effect or that the effect is less than the pre-specified minimum. If the data are sufficiently inconsistent with the null hypothesis at your chosen confidence level, you reject it in favor of your research hypothesis.
Example Based on Business Objectives:
For the objective: "Measure iROAS of Meta as a channel partner and renegotiate or reallocate if results are unfavorable," the following hypothesis and null hypothesis could apply:
Research Hypothesis (H₁): Meta advertising will generate a statistically significant lift in sales revenue of at least 3% in treatment DMAs compared to control DMAs during the 5-week test period, measured at the 95% confidence level.
Null Hypothesis (H₀): Meta advertising will not generate a statistically significant lift in sales revenue of at least 3% in treatment DMAs compared to control DMAs during the 5-week test period at the 95% confidence level.
If we reject the null hypothesis, we have evidence supporting Meta's effectiveness. If we fail to reject it, either the true effect is smaller than 3%, the test lacked sufficient power, or Meta genuinely doesn't drive meaningful incremental sales at current spend levels.
Note that "failing to reject" the null hypothesis doesn't prove Meta ineffective; it may indicate the need for a longer test, higher spend, or acceptance that the channel drives modest but still positive returns below our detection threshold. Likewise, rejecting the null doesn't "prove" our hypothesis true; it provides evidence that the observed effect is unlikely to have occurred by chance alone. At a 95% confidence level, there's a 5% probability of incorrectly rejecting a true null hypothesis (Type I error). While we cannot prove hypotheses through statistical testing, we can accumulate evidence that makes certain conclusions more or less plausible given our data and assumptions.
SIMPLE RANDOMIZATION AS THE DEFAULT
At its core, randomization is straightforward: we flip a fair coin (metaphorically) for each DMA to determine whether it receives treatment or serves as control. This simple approach, assigning each of the 210 DMAs to treatment or control with equal probability, forms the foundation of our experimental design.
For the complete implementation, see simple_randomization.py.
This approach might seem too simple — won't random chance sometimes create imbalanced groups? The key insight is that with 210 units and our comprehensive power analysis methodology, simple randomization works remarkably well, particularly in the multi-armed, stepped design dubbed Rolling Thunder.
OUR SIMULATION-BASED VALIDATION PROCESS
We run thousands of Monte Carlo simulations using ideally two years of historical weekly sales data. Each simulation:
-
Randomly samples multiple test windows from the historical period, preserving the calendar structure of pre- and post-periods. This ensures we test across different seasonal patterns and market conditions.
-
Randomly assigns DMAs to treatment and control groups using simple equal odds randomization, exactly as we would in the actual experiment.
-
Normalizes each DMA to its own pre-period trend using an 8-week baseline. This normalization process is crucial: it reduces baseline heterogeneity by accounting for each DMA's unique growth trajectory, seasonal patterns, and market dynamics before the test begins.
-
Simulates a treatment effect by artificially inflating sales in treatment DMAs during the test period by the hypothesized effect size (e.g., 3%, 5%).
-
Estimates treatment effects using the planned analysis method (typically DMA-level t-test or ANCOVA).
-
Tests for statistical significance at the pre-specified confidence level (e.g., 95%).
This process is repeated hundreds or thousands of times with different test windows and treatment assignments. If more than 85% of simulations consistently show high power (P-val < 0.05) to detect target effects, we have empirical evidence that chance imbalances from simple randomization don't systematically undermine our design.
The pre-period normalization is particularly important. By normalizing each DMA to its own 8-week trend before treatment, we are effectively controlling for many potential confounders without having to explicitly model them. The Rolling Thunder design further enhances balance by incorporating the pre-treatment weeks from each test group into the control pool. This staggered entry means that test groups B through F contribute their early weeks as additional control observations, naturally increasing the effective control sample size and improving the representativeness of the control group across different time periods. This addresses the practical concern that with only 210 units, chance alone might produce treatment and control groups that differed systematically on important variables.
ALTERNATIVE RANDOMIZATION APPROACHES
Our simulation analyses demonstrate that the Rolling Thunder design inherently improves balance without requiring other such complex techniques. By incorporating the pre-treatment periods from all test groups (except the first) into the control group, this approach naturally enhances balance by leveraging observations from the same DMAs that will later receive treatment. The enhanced control group contains data from nearly all DMAs, more accurately reflecting the full population variance and making the framework robust against initial randomization imbalances. This natural balancing, combined with DMA-level normalization to pre-period means, typically provides sufficient statistical power for detecting modest lift effects. Certain situations may still benefit from more sophisticated approaches, such as these:
STRATIFIED RANDOMIZATION
Groups DMAs by key characteristics before randomizing within strata.
For the stratified randomization implementation, see stratified_randomization.py.
This guarantees balance on stratification variables but requires careful selection of stratifying covariates. Best when you can identify 2-3 clearly important variables (i.e., with strong correlation to KPI) and have sufficient DMAs within each stratum.
Example: Stratify by sales quartile to ensure treatment and control arms have equal representation of high-, medium-, and low-volume markets.
Power analysis is a critical step in designing a geographic RCT. It evaluates whether the experiment design, given its parameters, can reliably detect a meaningful effect size.
Rather than relying solely on theoretical variance assumptions, we use a simulation-based power analysis grounded in real-world sales data. This approach provides an empirical check on whether the design is likely to produce statistically significant results, and helps quantify the test's sensitivity under realistic conditions.
The simulation requires the same data format as the actual experiment.
For the required data structure format, see power_analysis_data_structure.py.
For first-party data, this means:
- Two years of historical ZIP-level transactions
- Same aggregation logic as planned for test period
- Same ZIP-to-DMA mapping to ensure consistency
For syndicated data:
- Negotiate access to historical data (ideally 2 years)
- Verify no methodology changes during this period
- Account for any markets with incomplete history
The power simulation will use this exact data structure, preserving real-world variance patterns, seasonality, and DMA-specific trends that theoretical calculations would miss.
We simulate the experiment using ideally two years of historical weekly sales data to reflect true patterns of variance, seasonality, and DMA-level idiosyncrasies. The simulation process mirrors the structure of the planned experiment:
Step-by-Step Simulation Process:
For the situation-based power estimation implementation, see situation_based_power_estimation.py.
-
Randomly sample multiple possible test windows from the historical period. For a planned 6-week test, we would sample dozens of different 6-week windows across the two years, preserving the calendar structure of pre- and post-periods. This captures different seasonal contexts — a test in Q4 may behave differently than Q2.
-
Vary the treatment duration systematically. Run parallel simulations at 3, 4, 5, 6, 8, 10, and 12 weeks. While 4-6 weeks is often the sweet spot for balancing cost and statistical power, only simulation reveals the actual trade-offs for your specific sales patterns.
-
Apply the DMA normalization process that mirrors our actual analysis. For each simulation:
- Calculate each DMA's average weekly sales during its 8-week pre-period
- Compute the week-over-week growth rate during this baseline
- Project expected sales during the test period based on this trend
- Express actual test-period sales as an index relative to this projection.
This normalization is crucial: it transforms raw sales (which vary greatly by DMA size) into comparable lift indices, dramatically reducing variance.
-
Randomly assign DMAs to treatment and control groups using the same randomization logic planned for the real experiment (typically equal odds of assignment to all experiment arms).
-
Simulate a treatment effect by inflating normalized sales values in the treatment group during the test period by the hypothesized effect size (e.g., 3%, 5%). This creates the "signal" we're trying to detect against the "noise" of natural variation.
-
Estimate treatment effects using the planned analysis method (e.g., DMA-level t-test on the normalized values).
-
Test for statistical significance at the pre-specified confidence level (e.g., 95%).
-
Repeat this process thousands of times, with different test windows, treatment assignments, and durations to build a robust empirical distribution of power outcomes.
KEY SIMULATION OUTPUTS
The Monte Carlo simulation yields several critical insights:
- Power curves by duration: Shows probability of detecting effects at 3%, 5%, 7% lift for each test length
- Minimum detectable effect (MDE): The smallest lift where power exceeds 80%
- Optimal test configuration: The duration that minimizes MDE while keeping the test practical
For example, simulations might reveal:
- 3-week test: 80% power only at 7%+ lift (too insensitive)
- 5-week test: 80% power at 3.5% lift (good balance)
- 8-week test: 80% power at 2.8% lift (marginal improvement for 60% more media cost)
This section demonstrates ICC estimation and application. The complete implementation is in icc_estimation.py.
ESTIMATING AND APPLYING INTRACLASS CORRELATION COEFFICIENT (ICC)
Within the simulation process, we also estimate the Intraclass Correlation Coefficient (ICC). This measures how similar sales values are within each DMA over time relative to between DMAs, and informs how much clustering is reducing the effective number of observations.
Understanding ICC Impact:
- Low ICC (< 0.05): DMAs behave relatively independently week-to-week. Each week adds substantial information.
- High ICC (> 0.20): Strong within-DMA correlation. Adding weeks provides diminishing returns.
A high ICC indicates that data within each DMA are highly correlated — meaning that more weeks (or switching to daily data) are needed to compensate for the lack of independence. Because we treat the number of DMAs as fixed (typically all 210), we must adjust for ICC through test duration and data granularity:
Practical Adjustments for High ICC:
- Extend the test window to observe more time points. If ICC = 0.15, you might need 6 weeks instead of 4 to achieve target power.
- Switch to daily data, if day-to-day variation adds usable signal. Daily data often shows lower ICC than weekly aggregates, effectively increasing sample size.
- Run parallel simulations with different granularities and durations to identify the most power-efficient combination. Sometimes 4 weeks of daily data outperforms 6 weeks of weekly data.
Once power analysis confirms feasibility, document every parameter in a formal experiment design document. This should be circulated to all stakeholders (advertiser, agency, media partners, analytics team) before any media is trafficked.
GEOGRAPHIC RCT DESIGN SPECIFICATION EXAMPLE
| Component | Specification |
|---|---|
| Business Question | Does Social Media drive ≥3% lift in sales revenue? |
| Decision Rule | If lift ≥3%, increase budget 20%. If <3%, reallocate 50%. |
| Primary KPI | Weekly sales revenue by DMA (from CRM, ZIP→DMA mapping) |
| Experimental Unit | 210 US DMAs (Nielsen 2024 definitions) |
| Design Type | Multi-armed stepped trial ("Rolling Thunder") |
| Randomization | Simple random assignment, equal probability |
| Random Seed | 42 (stored in repo: geoRCT_Q2_2025_assignments.csv) |
| Pre-Period | 8 weeks (Mar 1 - Apr 25, 2025) |
| Treatment Period | 5 weeks (Apr 26 - May 30, 2025) |
| Post-Period | 2 weeks (May 31 - Jun 13, 2025) [washout buffer] |
| Treatment Dosage | 10M impressions per week @ $10 CPM (≈$500,000 total) |
| Media Channels | Facebook, Instagram |
| Confidence Level | 95% (α = 0.05) |
| Power Target | ≥80% |
| Simulation Results | 84% power to detect 3% lift (via 2 years historical data) |
| MDE = 2.7% (minimum detectable effect at 80% power) | |
| Normalization Method | 8-week pre-period trend per DMA |
| Analysis Method | T-test on normalized sales indices |
| Compliance Threshold | Treatment DMAs must receive 90-110% of planned spend |
| Exclusions | None (all 210 DMAs included) |
| Secondary Analyses | None pre-specified |
| Governance | CMO approves changes; Analytics owns methodology |
Before launching the test campaign run through this checklist:
Media Setup Verification:
- ☐ Treatment DMA lists uploaded to all platforms
- ☐ Control DMA exclusions properly configured, if necessary for a given medium (otherwise, pure holdout will suffice)
- ☐ Geo-targeting set to "exact" (no radius expansion)
- ☐ Frequency caps consistent across platforms
- ☐ Creative assets identical in all DMAs
- ☐ Budget distributed across Treatment DMAs proportional to pre-test KPI rate
Data Pipeline Verification:
- ☐ Historical sales data passes integrity checks
- ☐ ZIP→DMA mapping uses current DMA definitions consistent with the medium's targeting
- ☐ Sales reporting latency documented and acceptable
- ☐ Compliance monitoring dashboard live
- ☐ Backup data extraction plan in place
WEEK 1 - SOFT LAUNCH:
- Daily compliance checks: spend by DMA vs. plan
- Verify targeting accuracy
- Investigate any Control DMA with >0% spend
- Verify Treatment DMAs receiving impressions
WEEKS 2-5 - FULL FLIGHT:
Establish a monitoring cadence:
- Monday AM: Pull weekend delivery data
- Monday PM: Calculate compliance metrics by DMA
- Tuesday AM: Flag DMAs outside 90-110% of plan
- Tuesday PM: Adjust bids/budgets (not targeting)
- Wednesday: Spot-check creative rotation
- Thursday: Review week-to-date pacing
- Friday: Send compliance report to stakeholders
CRITICAL RULES:
- Never change DMA assignments - This breaks randomization
- Document all adjustments - Include timestamp and rationale
- Don't peek at results - Avoid the temptation to check lift mid-flight unless conditional stopping was pre-established
- Maintain spend ratios - If cutting budget is necessary, due to business conditions, cut proportionally across all DMAs
Track these metrics weekly:
| Metric | On Target | Yellow Flag | Red Flag |
|---|---|---|---|
| % Treatment DMAs Compliant | 95%+ | 90-94% | <90% |
| % Control DMAs with Spillover | <5% | 5-10% | >10% |
| Spend Variance Across Treatment | <20% | 20-30% | >30% |
| Platform Delivery Discrepancies | <10% | 10-20% | >20% |
For CRM Data:
After the test period ends, follow the same data extraction process used for historical data.
For the data extraction implementation, see extract_raw_transactions.py.
Then apply ZIP-to-DMA mapping.
For the ZIP-to-DMA mapping implementation, see zip_to_dma_mapping.py.
For Syndicated Data:
Request data pull with these specifications:
- Markets: All 210 US DMAs (or subset used in test)
- Metrics: [Total sales volume / unit sales / transactions]
- Time period: [Full date range including pre and post periods]
- Granularity: Daily (preferred, more statistical power) or weekly
Allow 5-7 business days after period close for transaction settlement and data processing, then freeze the dataset for analysis.
If using sales volume (continuous), check distribution of normalized values.
For the data quality checks implementation, see data_quality_checks.py.
For count data, standard t-test typically suffices unless counts are very low (< 5 per DMA-week), in which case consider Poisson regression.
For the primary analysis t-test implementation, see primary_analysis_ttest.py.
The core analysis follows these steps:
- Calculate pre-period baselines for each DMA
- Normalize test period sales to account for DMA heterogeneity
- Compare normalized values between Treatment and Control
- Test for statistical significance
INTERPRETING RESULTS:
The coefficient on assignment Treatment represents the estimated lift. For example:
Estimate: 0.0342
Std. Error: 0.0156
t value: 2.19
p-value: 0.029
95% CI: [0.0036, 0.0648]
This indicates a 3.42% lift (p = 0.029), statistically significant at the 95% confidence level.
While randomized controlled trials (RCTs) provide the strongest basis for causal inference, additional diagnostic checks can help validate that the estimated treatment effect is not an artifact of pre-existing differences or instability in the data. These checks are not required in every case but may be useful for stakeholder reassurance or internal QA in higher-stakes tests.
For sample code, see:
- For the difference-in-differences cross-check implementation, see difference_in_differences_crosscheck.py.
- For the leave-one-out sensitivity analysis implementation, see leave_one_out_sensitivity.py.
- For the pre-period balance check implementation, see pre_period_balance_check.py.
SUMMARY TABLE: DIAGNOSTIC CHECKS ON SIMULATED RCT RESULTS
| Check | Estimate | P-Value | Interpretation |
|---|---|---|---|
| Difference-in-Differences | 0.97 | - | Confirms treatment effect remains when comparing pre/post trends across groups. |
| Pre-Period Balance Check (T-Test) | 1.0 | 1.0000 | No significant differences before treatment; groups were well balanced at baseline. |
| Leave-One-Out (Mean Effect) | 3.61 | - | Treatment effect is stable across geographies; no single DMA drives the result. |
DESCRIPTION OF CHECKS
-
Difference-in-Differences (DiD): Compares the change in outcome between Treatment and Control groups from pre- to post-period. This provides a cross-check that the estimated lift is not merely due to different group trajectories over time.
-
Pre-Period Balance Check: A simple t-test comparing average pre-treatment outcomes between Treatment and Control groups. A high p-value (e.g., >0.1) suggests the groups were balanced before treatment, reducing the risk of confounding.
-
Leave-One-Out Sensitivity: Runs the treatment effect estimation multiple times, each time excluding one DMA from the Treatment group. If the estimated lift remains stable across iterations, the result is not overly influenced by any single region.
Based on the results, follow your pre-committed decision rule:
| Result | Statistical Significance | Business Action |
|---|---|---|
| Lift ≥ Target | Yes | Scale spend per plan |
| Lift ≥ Target | No | Rerun test with extended time or spend |
| 0 < Lift < Target | Yes | Optimize efficiency (creative, targeting) |
| 0 < Lift < Target | No | Consider reducing/reallocating budget |
| Lift ≤ 0 | Any | Pause budget and investigate further |
"We detected a 3.4% lift (p=0.029)"
- The channel works. Incremental revenue exceeds cost.
- Update MMM coefficients with this experimental prior
- Plan follow-up tests on sub-tactics (creative variants, audiences)
"Lift was 2.1% but not significant (p=0.18)"
- Effect may exist but test was underpowered
- Calculate: Would 2.1% lift be profitable? If yes, consider extending
- Review: Was compliance good? Any unusual events?
"No detectable lift (-0.5%, p=0.72)"
- Channel isn't moving the needle at current spend levels
- Before cutting: Verify test execution was clean
- Consider: Different creative? Different targeting? Or truly ineffective?
UPDATE MARKETING MIX MODELS:
Use the experimental estimate as a Bayesian prior.
For the MMM experimental prior update implementation, see update_mmm_experimental_prior.py.
Plan Next Tests:
Successful experiments often reveal new questions:
- If social works, which platform drives most lift?
- Does the effect vary by creative theme?
- Is there a frequency threshold?
Design follow-up experiments using the same rigorous framework.
PITFALL: EXCLUDING "IMPORTANT" DMAS
Example: "Let's exclude NYC and LA because they're too big to risk"
Why it's wrong: Scope of sales effect estimate now only applies to smaller markets
Solution: Include all DMAs; let randomization handle heterogeneity.
PITFALL: PEEKING AT ASSIGNMENTS
Example: "Treatment got more large DMAs, let's re-randomize"
Why it's wrong: Not truly a randomized trial when treatment groups are cherry-picked
Solution: Design test ahead of time with stratification or re-randomization with pre-specified criteria
PITFALL: VAGUE SUCCESS CRITERIA
Example: "We'll see if it works"
Why it's wrong: Invites post-hoc rationalization
Solution: Define specific lift threshold and decision rule upfront
PITFALL: MID-FLIGHT ASSIGNMENT CHANGES
Example: "Dallas is underdelivering, let's swap it to Control"
Why it's wrong: Destroys randomization and causal inference
Solution: Fix delivery issues with bid/budget changes only
PITFALL: CREATIVE VARIATIONS BY REGION
Example: "Let's test the new creative in Treatment DMAs"
Why it's wrong: Confounds media effect with creative effect
Solution: Keep all non-media variables constant; Conduct creative testing or versioning as a separate experiment
PITFALL: CHERRY-PICKING METRICS
Example: "Revenue was flat but look at this segment!"
Why it's wrong: Multiple testing without pre-specified design intent
Solution: Pre-specify primary KPI; treat others as exploratory
PITFALL: IGNORING FAILED RANDOMIZATION
Example: "Treatment had 20% higher pre-period sales but that's OK"
Why it's wrong: Imbalance can masquerade as treatment effect
Solution: Use normalized metrics or covariate adjustment
- Scientifically sound: It is a true Randomized Control Trial
- Omnichannel: It works for cable TV, CTV, digital video, programmatic, social, search, out-of-home, and more
- Advertiser-friendly: It is proven to work for advertisers, both large and small
- Extensible: Works for various KPIs including sales, foot traffic, TV tune-in and more
- Independent: Requires nothing but plan compliance from media firms, DSPs, agencies or other partners
- Low Tech: No special technology required (no clean rooms, user IDs, cookies, etc.)
- Privacy assured: No PII, only ZIP codes
- Fraud proof: Performance cannot be gamed: we defy anyone to hack it
- Immediate results: Final report in minutes of last KPI data availability
- Simple: Easily deployed in any media channel with DMA, ZIP code, or other geo targeting capabilities
- Generalizable: Projectable to national campaigns because it uses all DMAs in the country in the experiment
- Transparent, replicable, explainable
A well-designed geographic randomized controlled trial provides the gold standard for measuring advertising incrementality. By following this playbook, you can transform your organization's approach to marketing measurement and unlock significant competitive advantages.
The framework presented here — built on hundreds of real-world experiments — provides more than just a methodology. It represents a fundamental shift in how organizations should approach marketing ROI. While competitors continue relying on correlation-based attribution models and quasi-experimental methods that systematically overstate digital performance, companies that embrace geographic experimentation gain a decisive edge. In markets where share is zero-sum and growth comes at competitors' expense, understanding what truly drives incremental sales becomes your secret weapon, demonstrably changing perception of marketing from a cost center into a growth accelerator.
WHILE COMPETITORS CONTINUE RELYING ON CORRELATION-BASED ATTRIBUTION MODELS AND QUASI-EXPERIMENTAL METHODS THAT SYSTEMATICALLY OVERSTATE DIGITAL PERFORMANCE, COMPANIES THAT EMBRACE GEOGRAPHIC EXPERIMENTATION GAIN A DECISIVE EDGE.
Start your experimentation journey with your largest spend categories, where even modest improvements in efficiency drive material business impact. A 5% improvement in a $100M channel delivers far more value than a 50% improvement in a $1M channel. But don't stop there — systematically work through your entire marketing mix:
- Channel Validation: Test all major channels to establish true incremental contribution
- Risk Mitigation: Never cut substantial spending without rigorous testing. Channels that appear ineffective in attribution models or with matched-market testing may be driving significant incremental sales
- Opportunity Discovery: Investigate underinvested channels like audio, outdoor, and linear TV where attention may be high and CPMs low, potentially offering better returns than oversaturated digital channels
- Tactical Optimization: After channel-level validation, test media partners, creative variants, audience segments, and messaging strategies
- Continuous Calibration: Track whether mix changes deliver expected sales impact
As your organization builds a repository of experimental results — tens, then hundreds, eventually thousands of tests — you develop the industry's clearest picture of what drives incremental sales.
THIS EXPERIMENTAL BENCHMARK BECOMES YOUR COMPETITIVE MOAT. MAKE BOLD MOVES BACKED BY EXPERIMENTAL EVIDENCE WHILE COMPETITORS HESITATE.
Enabling:
- Predictive Modeling: Build AI models grounded in causal evidence rather than correlations
- Strategic Confidence: Make bold moves backed by experimental evidence while competitors hesitate
- Compound Learning: Each experiment builds on previous insights, accelerating your pace of discovery
Geographic RCTs offer simplicity in design but profound impact in results. By randomly assigning regions, delivering media, and measuring outcomes, you cut through the noise of modern marketing analytics. The careful attention to design details, statistical power, operational excellence, and analytical rigor pays dividends in decision quality.
This whitepaper demonstrates that while running experiments requires rigor and attention to detail, it's not as complicated or expensive as many marketers fear. This isn't rocket science, it's marketing science.
The real cost isn't in the resources required or the temporary sales suspension in control markets. The real cost is making million-dollar media decisions based on flawed measurement while billions in revenue opportunity hang in the balance.
Every day spent optimizing against inaccurate signals is a day your competitors might be using better evidence to steal your customers.
That said, executing geographic experiments well requires expertise, infrastructure, and sustained commitment. Not every organization needs to build this capability in-house.
Remember: every failed experiment that prevents wasted spend is as valuable as a successful test that identifies a winning strategy. In the words often attributed to Thomas Edison, "I have not failed. I've just found 10,000 ways that won't work."
Start today. Pick your largest channel. Design a test. Let real science, not assumptions or statistical mysticism, guide your next million-dollar decision. Soon, you'll be in the company of leading marketing experimentation experts like Netflix, Uber, Airbnb and other market leaders and wonder how you ever made decisions without this level of evidence.
All code examples have been extracted to separate Python files in the /code directory for easy reference and implementation:
- simple_randomization.py - Simple randomization for geographic RCT
- power_analysis_data_structure.py - Data structure for power analysis
- icc_estimation.py - ICC estimation and application
- extract_raw_transactions.py - SQL to Python data extraction
- zip_to_dma_mapping.py - ZIP to DMA mapping and aggregation
- data_quality_checks.py - Data quality validation
- primary_analysis_ttest.py - Primary analysis t-test
- difference_in_differences_crosscheck.py - DiD cross-check
- leave_one_out_sensitivity.py - Leave-one-out sensitivity analysis
- pre_period_balance_check.py - Pre-period balance check
- update_mmm_experimental_prior.py - MMM updates with experimental priors
- stratified_randomization.py - Stratified randomization
- situation_based_power_estimation.py - Power simulation framework
ANCOVA: Analysis of Covariance; regression model that includes treatment indicator and continuous covariates
Cluster-RCT: Cluster Randomized Controlled Trial; experimental design where groups (clusters) rather than individuals are randomly assigned to treatment conditions
DMA: Designated Market Area; geographic regions defined by Nielsen for television viewership measurement, commonly used as experimental units in geo tests
ICC: Intraclass Correlation Coefficient; measures the similarity of observations within the same cluster relative to observations between clusters
iROAS: Incremental Return on Ad Spend; the causal revenue impact per dollar spent, measured through experimentation rather than correlation
ITT: Intent-to-Treat; analysis based on initial treatment assignment regardless of actual exposure, preserving the benefits of randomization
MDE: Minimum Detectable Effect; the smallest true effect size that an experiment has adequate power to detect as statistically significant
MECE: Mutually Exclusive and Collectively Exhaustive; property where categories don't overlap and cover all possibilities
MMM: Marketing Mix Model; statistical model that decomposes sales into contributions from various marketing channels and external factors
SMD: Standardized Mean Difference; difference in means divided by pooled standard deviation, used to assess balance between groups
SUTVA: Stable Unit Treatment Value Assumption; assumption that treatment of one unit doesn't affect outcomes of other units (no spillover)
Business Alignment
- ☐ Business objective clearly stated
- ☐ Decision rule documented (if X then Y)
- ☐ Budget approved and ring-fenced
- ☐ Stakeholders signed off on design
Technical Setup
- ☐ Historical data validated (2+ years)
- ☐ Power analysis completed
- ☐ Randomization code peer-reviewed
- ☐ DMA assignments generated and locked
- ☐ Version control established
Media Readiness
- ☐ Platform targeting lists created
- ☐ Creative assets approved and identical
- ☐ Trafficking instructions documented
- ☐ Compliance monitoring dashboard live
- ☐ Pacing alerts configured
Data Infrastructure
- ☐ KPI data pipeline tested
- ☐ ZIP→DMA mapping current
- ☐ Latency documented and acceptable
- ☐ Analysis code templates ready
- ☐ Results template prepared
Data Quality
- ☐ Sales data completeness >99%
- ☐ No DMA assignment violations
- ☐ Spillover quantified and <5%
- ☐ Outliers investigated
- ☐ Time series plots reviewed
Analysis Completeness
- ☐ Primary t-test completed
- ☐ DiD cross-check performed
- ☐ Leave-one-out sensitivity done
- ☐ Pre-period placebo tested
- ☐ Confidence intervals calculated
Documentation
- ☐ Results summary drafted
- ☐ Technical appendix complete
- ☐ Visualizations created
- ☐ Recommendations clear
- ☐ Lessons learned captured
Action Items
- ☐ Decision rule executed
- ☐ Budget changes implemented
- ☐ MMM coefficients updated
- ☐ Next test planned
- ☐ Results socialized
Central Control helps their partners measure the sales impact of advertising using rigorous, science-based methods that are independent, transparent, unbiased, replicable, and explainable. We offer a platform, consulting, and training to support high-quality advertising experiments that accurately measure incremental return on ad spend (iROAS).
Founded in 2020, the company works with brands, media companies, agencies, and others to build a clearer understanding of advertising's true effect on sales and other business outcomes. We specialize in helping organizations reframe how they assess advertising effectiveness for competitive advantage, offering executive advising, measurement retooling, and calibration of marketing mix models.
The team includes veterans from Google, DoubleClick, Microsoft, Amazon, and Adobe, with deep expertise in experimental design, media measurement, data science, marketing analytics, econometrics, and applied research.
Our team has supported 500+ experiments for Fortune 500 advertisers, optimizing billions of dollars in business impact.
CONTACT:
- Email: info@centralcontrol.com
- Web: www.centralcontrol.com
- LinkedIn: /company/centralcontrol
ABOUT THE AUTHOR:
Rick Bruner is CEO and founder of Central Control. He has spent 25+ years at the intersection of advertising and technology, previously running research and product departments at DoubleClick, Google, Viacom, Marketing Evolution, Viant and Guideline. He is Vice Chair for the marketing sciences industry body I-COM and founder and moderator of the influential Research Wonks online forum.
ACKNOWLEDGEMENTS:
Various contributors provided valuable input to the production of this paper, namely these individuals: John Chandler, PhD, Assistant Professor of Data Science at the St. Thomas University, for help in designing many of these experimental techniques and technical review of the paper; Kumi Harischandra, research scientist, for technical review of the paper; Campbell Foster, Chief Commercial Officer, and Jason Wertheimer, Head of Client Success, for editing, and Ben Munday, Creative Director of Munday Design, for graphic design.
© 2025 by the authors. This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt this material, provided appropriate credit is given. See https://creativecommons.org/licenses/by/4.0/
For questions, please contact info@centralcontrol.com








