Variance reduction combining pre-experiment and in-experiment data
Online controlled experiments (A/B testing) are fundamental to data-driven decision-making in many companies. Improving the sensitivity of these experiments under fixed sample size constraints requires reducing the variance of the average treatment effect (ATE) estimator. Existing variance reduction techniques such as CUPED and CUPAC use pre-experiment data, but their effectiveness depends on how predictive those data are for outcomes measured during the experiment. In-experiment data are often more strongly correlated with the outcome, but using arbitrary post-treatment variables can introduce bias. In this paper, we propose a general, robust, and scalable framework that combines both pre-experiment and in-experiment data to achieve variance reduction. Our framework is simple, interpretable, and computationally efficient, making it practical for real-world deployment. We develop the asymptotic theory of the proposed estimator and provide consistent variance estimators. Empirical results from multiple online experiments conducted at Etsy demonstrate substantial additional variance reduction over the current pipeline, even when incorporating only a few post-treatment covariates. These findings underscore the effectiveness of our framework in improving experimental sensitivity and accelerating data-driven decision-making.
💡 Research Summary
The paper addresses a core challenge in online controlled experiments—how to increase the statistical power of A/B tests without enlarging sample sizes. Traditional variance‑reduction techniques such as CUPED (which uses linear regression on pre‑experiment covariates) and its extension CUPAC (which employs flexible machine‑learning models on the same pre‑experiment data) can only achieve modest gains because the pre‑experiment variables often have limited predictive power for outcomes observed during the experiment.
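To make the baseline concrete, here is a minimal sketch of the CUPED idea mentioned above: residualize the in-experiment outcome on a pre-experiment covariate via its OLS slope. This is an illustration of the standard technique, not the paper's production implementation; the simulated data and function name are ours.

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED: residualize the in-experiment outcome y on a
    pre-experiment covariate x using the OLS slope theta."""
    theta = np.cov(y, x)[0, 1] / np.var(x)
    return y - theta * (x - x.mean())

# Simulated illustration: a pre-period metric x predictive of the outcome y.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)              # pre-experiment covariate
y = 2.0 * x + rng.normal(size=10_000)    # in-experiment outcome
y_adj = cuped_adjust(y, x)
print(np.var(y), np.var(y_adj))          # variance shrinks by ~corr(y, x)**2
```

The adjustment leaves the mean of `y` unchanged, so treatment-vs-control differences in means are preserved while their variance shrinks.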
Recognizing that many variables collected during the experiment (e.g., number of product-detail views, session duration, add-to-cart actions) are strongly correlated with the final metric yet remain unaffected by typical UI-level interventions, the authors propose a unified framework that safely incorporates such post-treatment covariates. The key insight is to adjust only on in-experiment variables that are treatment-insensitive, i.e., balanced across treatment arms with no causal pathway from the treatment to the covariate itself. This balance can be empirically checked with simple statistical tests (e.g., t-tests comparing covariate means between groups).
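The balance screen described above can be sketched as a Welch two-sample t-statistic on each candidate covariate; the function name and the fixed critical value below are illustrative choices, not the paper's exact procedure, and passing the test is a heuristic screen rather than proof of no causal effect.

```python
import numpy as np

def is_balanced(z_treat, z_ctrl, z_crit=1.96):
    """Welch two-sample t-statistic on a candidate in-experiment
    covariate; |t| below the critical value -> treat as balanced
    across arms (a heuristic screen, not proof of insensitivity)."""
    m1, m0 = z_treat.mean(), z_ctrl.mean()
    v1, v0 = z_treat.var(ddof=1), z_ctrl.var(ddof=1)
    t = (m1 - m0) / np.sqrt(v1 / len(z_treat) + v0 / len(z_ctrl))
    return abs(t) < z_crit

rng = np.random.default_rng(1)
z_t = rng.normal(size=5_000)   # e.g., session duration in the treatment arm
z_c = rng.normal(size=5_000)   # the same covariate in the control arm
print(is_balanced(z_t, z_c))
```

Covariates that fail this screen (i.e., show a significant mean shift between arms) would be excluded from the adjustment set, since adjusting on them could bias the ATE estimate.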
The proposed estimator works in two stages. First, a CUPAC-style predictor $\hat f(X)$ is trained on historical pre-experiment data (the model is fixed before the current experiment runs). Second, a linear adjustment $\hat\gamma^\top Z$ is added, where $Z$ denotes the selected treatment-insensitive in-experiment covariates. The final ATE estimator is the difference in arm means of the adjusted outcomes,

$$
\hat\tau = \frac{1}{n_1}\sum_{i:\,T_i=1}\left(Y_i - \hat f(X_i) - \hat\gamma^\top Z_i\right) - \frac{1}{n_0}\sum_{i:\,T_i=0}\left(Y_i - \hat f(X_i) - \hat\gamma^\top Z_i\right),
$$

where $n_1$ and $n_0$ are the treatment and control sample sizes.
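A hypothetical sketch of the two-stage adjustment described above: subtract a frozen pre-experiment prediction, then a pooled OLS adjustment on the centered in-experiment covariates, and difference the arm means of the residuals. All names (`combined_ate`, the simulated data) are illustrative, and the paper's actual fitting of $\hat\gamma$ may differ.

```python
import numpy as np

def combined_ate(y, fx, z, treat):
    """Two-stage adjustment: subtract a frozen pre-experiment prediction
    f_hat(X), then a pooled OLS adjustment on centered in-experiment
    covariates Z, and difference the arm means of the residuals."""
    resid = y - fx                                       # CUPAC-style stage one
    zc = z - z.mean(axis=0)                              # center covariates
    gamma, *_ = np.linalg.lstsq(zc, resid, rcond=None)   # pooled OLS slope
    adj = resid - zc @ gamma                             # stage-two adjustment
    return adj[treat == 1].mean() - adj[treat == 0].mean()

# Simulated check: known ATE of 1.0, a perfect pre-period model 2*x, and
# two treatment-insensitive in-experiment covariates z.
rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
z = rng.normal(size=(n, 2))
treat = rng.integers(0, 2, size=n)
y = treat + 2.0 * x + z @ np.array([3.0, -1.0]) + rng.normal(size=n)
print(combined_ate(y, 2.0 * x, z, treat))
```

Because $Z$ is balanced across arms, the stage-two subtraction shrinks the residual variance without shifting the treatment-control difference, which is exactly the safety property the treatment-insensitivity screen is meant to guarantee.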