Balancing Covariates in Survey Experiments

Balancing Covariates in Survey Experiments
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The survey experiment is widely used in economics and social sciences to evaluate the effects of treatments or programs. In a standard population-based survey experiment, the experimenter randomly draws experimental units from a target population of interest and then randomly assigns the sampled units to treatment or control conditions to explore the treatment effect of an intervention. Simple random sampling and treatment assignment can balance covariates on average. However, covariate imbalance often exists in finite samples. To address the imbalance issue, we study a stratified approach to balance covariates in a survey experiment. A stratified rejective sampling and rerandomization design is further proposed to enhance the covariate balance. We develop a design-based asymptotic theory for the widely used stratified difference-in-means estimator of the average treatment effect under the proposed design. In particular, we show that it is consistent and asymptotically a convolution of a normal distribution and two truncated normal distributions. This limiting distribution is more concentrated at the true average treatment effect than that under the existing experimental designs. Moreover, we propose a covariate adjustment method in the analysis stage, which can further improve the estimation efficiency. Numerical studies demonstrate the validity and improved efficiency of the proposed method.


💡 Research Summary

This paper addresses the persistent problem of covariate imbalance in population‑based survey experiments, which consist of two sequential stages: (i) drawing a sample from a target population and (ii) randomly assigning the sampled units to treatment and control groups. Although simple random sampling and complete randomization balance covariates on average, finite‑sample realizations often exhibit substantial imbalance, both between the sample and the population and between treatment arms.

The authors first develop a fully stratified design. The target population is partitioned into K strata based on important categorical covariates (e.g., race, gender). Within each stratum, a predetermined number of units is drawn without replacement (stratified random sampling) and then randomly allocated to treatment and control (stratified randomization). This conventional stratified randomized survey experiment (SRSE) already reduces variance relative to a completely randomized design, and the authors derive the optimal stratum‑specific sampling fractions and treatment allocations that minimize the asymptotic variance of the stratified difference‑in‑means estimator. Their asymptotic framework allows the overall sampling fraction to vanish, the number of strata to grow, and treatment proportions to vary across strata, thereby covering a wide range of practical scenarios.

To further improve balance, the paper introduces a novel “Stratified Rejective Sampling and Rerandomization” (SRSRR) design that augments SRSE with two additional steps:

  1. Stratified Rejective Sampling – After a candidate stratified sample is drawn, the authors compute a Mahalanobis distance (M_S) between the sample covariate means (\bar W_S) and the known population means (\bar W). If (M_S) exceeds a pre‑specified threshold (a_S), the sample is discarded and the sampling procedure is repeated until the distance falls below the threshold. This procedure forces the selected sample to be more representative of the population with respect to auxiliary covariates (W_i) that may be observed only in a prior survey.

  2. Stratified Rerandomization – Given an accepted sample, treatment assignment is performed repeatedly until the Mahalanobis distance (M_T) between the treatment‑group and control‑group covariate means (\bar X_1) and (\bar X_0) (based on covariates (X_i) collected after sampling) falls below another threshold (a_T). This step guarantees that the two arms are balanced on the post‑sampling covariates.

Both thresholds can be calibrated to achieve a desired acceptance probability (e.g., 0.01 or 0.001), following the recommendations of Morgan and Rubin (2012).

The core theoretical contribution is a design‑based asymptotic analysis of the stratified difference‑in‑means estimator (\hat\tau) under SRSRR. Unlike the classic result that (\hat\tau) is asymptotically normal, the authors prove that under SRSRR the limiting distribution is a convolution of a normal distribution with two truncated normal distributions—one arising from the rejective sampling constraint and the other from the rerandomization constraint. This “truncated‑normal convolution” concentrates more probability mass around the true average treatment effect (\tau), yielding tighter confidence intervals and higher power for the same sample size. The paper also provides a conservative variance estimator that consistently over‑covers the true asymptotic variance, facilitating valid inference without needing to simulate the exact truncation distribution.

In addition, the authors propose a covariate‑adjustment procedure for the analysis stage. By regressing the observed outcomes on the covariates (X_i) (or a linear combination of (W_i) and (X_i)) within each stratum and using the residualized outcomes, they construct an adjusted estimator (\tilde\tau) that further reduces variance. The adjustment is shown to be design‑consistent and to retain the improved concentration properties of the SRSRR estimator.

Numerical simulations based on a synthetic population of 10,000 units with four strata (racial categories) and five continuous covariates illustrate the practical gains. Across a range of sampling fractions, thresholds, and covariate‑outcome correlations, SRSRR reduces the absolute bias by roughly 30 % and the mean‑squared error by about 25 % relative to SRSE. Incorporating the covariate‑adjustment step yields an additional ≈10 % efficiency gain. The authors also demonstrate that the optimal stratum‑specific allocation derived analytically achieves the lowest possible asymptotic variance among all feasible designs.

Overall, the paper makes three substantive contributions: (1) a rigorous design‑based asymptotic theory for stratified survey experiments with vanishing sampling fractions; (2) a practically implementable SRSRR design that simultaneously balances covariates at the sampling and treatment stages, with a novel limiting distribution that is more concentrated around the true effect; and (3) an analysis‑stage covariate adjustment that further enhances efficiency. The methodology is directly applicable to policy evaluation, public‑opinion polling, and any field where survey experiments are used, offering researchers a concrete tool to obtain more reliable causal estimates while preserving the external validity afforded by probability sampling.


Comments & Academic Discussion

Loading comments...

Leave a Comment