Fast Rerandomization for Balancing Covariates in Randomized Experiments: A Metropolis-Hastings Framework
Balancing covariates is critical for credible and efficient randomized experiments. Rerandomization addresses this by repeatedly generating treatment assignments until covariate balance meets a prespecified threshold. By shrinking this threshold, it can achieve arbitrarily strong balance, with established results guaranteeing optimal estimation and valid inference, both finite-sample and asymptotic, across diverse and complex experimental designs. Despite its rigorous theoretical foundations, practical use is limited by the extreme inefficiency of rejection sampling, which becomes prohibitively slow under small thresholds and often forces practitioners to adopt suboptimal settings, degrading performance. Existing work on acceleration typically fails to maintain uniformity over the acceptable assignment space, thus losing the theoretical guarantees of classical rerandomization. Building on a Metropolis-Hastings framework, we address this challenge by introducing an additional sampling-importance resampling step, which restores uniformity and preserves the statistical guarantees. Our proposed algorithm, PSRSRR, achieves speedups ranging from 10 to 10,000 times while maintaining exact and asymptotic validity, as demonstrated by simulations and two real-data applications.
💡 Research Summary
This paper tackles a long‑standing computational bottleneck in rerandomization (RR), a technique that repeatedly draws treatment assignments until a pre‑specified covariate‑balance criterion—typically the Mahalanobis distance M(W) ≤ a—is satisfied. Classical RR relies on naïve rejection sampling, which becomes infeasible when the acceptance threshold a is small or the number of covariates is large; acceptance probabilities can drop below 10⁻¹⁵, forcing practitioners to use sub‑optimal thresholds and sacrificing statistical efficiency.
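To make the baseline concrete, here is a minimal sketch of classical rerandomization by rejection sampling, assuming the standard Mahalanobis criterion M(W) = (n₁n₀/n)(X̄_t − X̄_c)ᵀS⁻¹(X̄_t − X̄_c) with S the sample covariance of the covariates; the function names and defaults are illustrative, not from the paper:

```python
import numpy as np

def mahalanobis_balance(X, W):
    """M(W): Mahalanobis distance between treated and control covariate means.

    X is an (n, p) covariate matrix and W a length-n 0/1 assignment vector.
    Uses the common scaling n1*n0/n and the sample covariance of X.
    """
    n = X.shape[0]
    n1 = int(W.sum())
    n0 = n - n1
    diff = X[W == 1].mean(axis=0) - X[W == 0].mean(axis=0)
    S = np.cov(X, rowvar=False)               # p x p covariate covariance
    return float((n1 * n0 / n) * diff @ np.linalg.solve(S, diff))

def rejection_rerandomize(X, n1, a, rng, max_tries=100_000):
    """Classical rerandomization: redraw assignments until M(W) <= a."""
    n = X.shape[0]
    base = np.array([1] * n1 + [0] * (n - n1))
    for _ in range(max_tries):
        W = rng.permutation(base)
        if mahalanobis_balance(X, W) <= a:
            return W
    raise RuntimeError("threshold too strict for plain rejection sampling")
```

Under complete randomization M(W) is approximately χ²-distributed with p degrees of freedom, so the expected number of draws is roughly 1/P(χ²ₚ ≤ a), which explodes as a → 0 — exactly the inefficiency the paper targets.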
The authors propose a novel algorithm, Pair‑Switching Rejection Sampling Rerandomization (PSRSRR), built on a Metropolis‑Hastings (MH) framework. The method proceeds in two stages. First, a truncated pair‑switching Markov chain is constructed: starting from a random assignment, at each iteration a treated unit and a control unit are swapped, producing a candidate assignment W*. The candidate is accepted with probability
α = min{ (M(W^(t)) / M(W*))^{1/T}, 1 },
where T > 0 is a temperature parameter that controls the willingness to accept worse balance and thus helps the chain escape local minima. After N iterations the chain is approximately distributed according to its stationary distribution π(W) ∝ M(W)^{‑1/T}, which is explicitly non‑uniform: assignments with smaller Mahalanobis distance are more likely.
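The truncated pair-switching chain can be sketched as follows; `balance(X, W)` is assumed to compute the Mahalanobis distance M(W), and the parameter defaults are illustrative rather than values recommended by the authors:

```python
import numpy as np

def pair_switch_chain(X, W0, balance, T=0.1, n_iter=2000, rng=None):
    """Truncated pair-switching Metropolis-Hastings chain (sketch).

    Each step swaps one treated unit with one control unit and accepts the
    candidate W* with probability min{(M(W)/M(W*))**(1/T), 1}, so the chain
    targets pi(W) proportional to M(W)**(-1/T).
    """
    rng = np.random.default_rng() if rng is None else rng
    W = W0.copy()
    M = balance(X, W)
    for _ in range(n_iter):
        i = rng.choice(np.flatnonzero(W == 1))   # random treated unit
        j = rng.choice(np.flatnonzero(W == 0))   # random control unit
        W_cand = W.copy()
        W_cand[i], W_cand[j] = 0, 1              # pair switch
        M_cand = balance(X, W_cand)
        ratio = M / M_cand
        if ratio >= 1.0 or rng.random() < ratio ** (1.0 / T):
            W, M = W_cand, M_cand                # accept the candidate
    return W, M
```

Smaller T makes the chain greedier toward small M(W); in the paper, the fixed `n_iter` would be replaced by the adaptive stopping rule described below.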
Second, the authors apply an “inverse‑back” importance‑resampling step. Treating π as a proposal distribution, they reject any candidate that fails the balance threshold (M(W) > a). For candidates that satisfy the threshold, they accept with probability
p(W) = (M(W)/a)^{1/T}.
Because the proposal bias is proportional to M(W)^{‑1/T}, this acceptance probability exactly cancels the bias, yielding a uniform distribution over the acceptable set 𝑊ₐ. Theorem 3.2 formally proves uniformity, restoring the key assumption underlying the classical rerandomization theory (e.g., unbiasedness of the difference‑in‑means estimator and the asymptotic variance formulas of Li et al., 2018).
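A minimal sketch of this resampling filter, applied to a single chain output (the function name is illustrative):

```python
import numpy as np

def uniformize_draw(M_W, a, T, rng):
    """Sampling-importance-resampling filter on one chain output (sketch).

    A draw from pi(W) proportional to M(W)**(-1/T) is rejected outright if
    M(W) > a and otherwise kept with probability (M(W)/a)**(1/T).  Because
    the proposal over-represents small M(W) by the factor M(W)**(-1/T),
    this acceptance probability cancels the bias, leaving accepted draws
    uniform over the acceptable set.
    """
    if M_W > a:
        return False
    return rng.random() < (M_W / a) ** (1.0 / T)
```

In the full procedure, a rejected draw would simply trigger another run of the pair-switching chain, repeating until some draw passes the filter.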
To make the approach practical, the authors devise a stopping rule that monitors convergence of the Markov chain and adaptively adjusts T and N. The final PSRSRR algorithm therefore interleaves a limited number of pair‑switching MH steps with the importance‑resampling filter, generating a single uniformly balanced assignment at a fraction of the cost of naïve RR.
Theoretical contributions include: (1) a rigorous derivation of the stationary distribution of the pair‑switching MH chain; (2) a proof that the two‑step procedure yields exact uniformity over 𝑊ₐ; (3) an explicit stopping rule guaranteeing that the chain is “close enough” to stationarity for practical use.
Empirical evaluation comprises extensive simulations and two real‑world applications. In simulations with p = 5, 10, and 20 covariates, PSRSRR achieves speed‑ups ranging from 10× to 10 000× relative to classical RR while preserving the same estimation bias and variance. In a clinical trial with 150 participants and three biomarkers, and in a large‑scale A/B test with 20 000 users and 15 covariates, PSRSRR dramatically reduces the runtime of Fisher randomization tests (FRT) and improves covariate balance without sacrificing the validity of confidence intervals.
Limitations are acknowledged: the choice of temperature T and iteration count N can be data‑dependent, and high‑dimensional covariate spaces may require longer mixing times. Future work is suggested on adaptive temperature schedules, parallel multiple‑chain implementations, and extensions to non‑linear balance criteria (e.g., machine‑learning‑derived scores).
In summary, the paper delivers the first Metropolis‑Hastings‑based rerandomization method that is both theoretically exact and computationally scalable. PSRSRR bridges the gap between the elegant statistical guarantees of classical rerandomization and the practical demands of modern experimental designs, offering a powerful tool for researchers who need strong covariate balance without prohibitive computational cost.