Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
A/B testing on platforms often faces challenges from network interference, where a unit's outcome depends not only on its own treatment but also on the treatments of its network neighbors. To address this, cluster-level randomization has become standard, enabling the use of network-aware estimators. These estimators typically trim the data to retain only a subset of informative units, achieving low bias under suitable conditions but often suffering from high variance. In this paper, we first demonstrate that interior nodes (units whose neighbors all lie within the same cluster) constitute the vast majority of the post-trimming subpopulation. In light of this, we propose directly averaging over the interior nodes to construct the mean-in-interior (MII) estimator, which circumvents the delicate reweighting required by existing network-aware estimators and substantially reduces variance in classical settings. However, we show that interior nodes are often not representative of the full population, particularly in terms of network-dependent covariates, leading to notable bias. We then augment the MII estimator with a counterfactual predictor trained on the entire network, allowing us to adjust for covariate distribution shifts between the interior nodes and the full population. By rearranging the expression, we reveal that our augmented MII estimator embodies an analytical form of the point estimator within the prediction-powered inference framework. This insight motivates a semi-supervised lens, wherein interior nodes are treated as labeled data subject to selection bias. Extensive and challenging simulation studies demonstrate the outstanding performance of our augmented MII estimator across various settings.
💡 Research Summary
The paper tackles a fundamental problem in modern online platforms: estimating the global average treatment effect (GATE) when outcomes are subject to network interference. Traditional A/B testing assumes the Stable Unit Treatment Value Assumption (SUTVA), which is violated when a user’s outcome depends on the treatments of neighboring users. The authors focus on the common practical setting of cluster‑level randomization, where a pre‑computed partition of the social graph is used and all members of a cluster receive the same treatment. Under the neighborhood interference assumption (NIA) – interference is limited to a unit’s 1‑hop neighbors – the classic Horvitz–Thompson (HT) estimator with exposure indicators is unbiased but suffers from astronomically large variance because it weights each “clean” unit by the inverse probability of the exposure event, which scales as (1/p)^c where c is the number of clusters a node touches. In dense real‑world graphs, c can be large, making the HT estimator practically unusable.
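To make the variance issue concrete, here is a minimal numerical sketch (the function name and toy inputs are illustrative, not from the paper's code): the HT estimator weights each cleanly exposed unit by the inverse of its exposure probability, and under cluster randomization with per-cluster treatment probability p, a node touching c clusters carries a weight of (1/p)^c.

```python
import numpy as np

def ht_estimate(y, exposed_t, exposed_c, prob_t, prob_c):
    """Horvitz-Thompson difference with inverse exposure-probability weights.

    exposed_t / exposed_c are 0-1 indicators that a unit's whole 1-hop
    neighborhood received treatment / control; prob_t / prob_c are the
    probabilities of those exposure events.
    """
    n = len(y)
    return (y * exposed_t / prob_t).sum() / n - (y * exposed_c / prob_c).sum() / n

# The weights explode with c, which is what drives the variance:
p = 0.5
weights = [(1 / p) ** c for c in (1, 5, 20)]
print(weights)  # [2.0, 32.0, 1048576.0]
```

With p = 0.5 and a node touching 20 clusters, a single clean unit is upweighted by 2^20 ≈ 10^6, which is why the HT estimator becomes practically unusable on dense graphs.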
The authors observe that after the usual trimming (keeping only units that satisfy the exposure condition), the overwhelming majority of retained units are interior nodes – nodes whose entire 1‑hop neighborhood lies inside the same cluster. Empirically, on a Facebook network clustered with the Louvain algorithm, interior nodes constitute only about 8% of the whole population but more than 90% of the trimmed sample. This motivates a radically simpler estimator: the Mean‑in‑Interior (MII) estimator, which simply computes the difference in average outcomes between treated and control interior nodes, i.e. a standard difference‑in‑means applied to the interior sub‑population.
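A minimal sketch of the MII construction, assuming the graph is given as adjacency lists and a cluster partition is precomputed (all names here are our own, not the paper's):

```python
import numpy as np

def interior_mask(neighbors, cluster):
    """True for nodes whose entire 1-hop neighborhood stays in their own cluster."""
    return np.array([all(cluster[j] == cluster[i] for j in neighbors[i])
                     for i in range(len(cluster))])

def mii(y, z_cluster, cluster, neighbors):
    """Difference in mean outcomes between treated and control interior nodes."""
    interior = interior_mask(neighbors, cluster)
    z = z_cluster[cluster]          # each node inherits its cluster's assignment
    return y[interior & (z == 1)].mean() - y[interior & (z == 0)].mean()

# Toy example: 6 nodes in a path, 2 clusters; nodes 2 and 3 span the boundary.
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
cluster = np.array([0, 0, 0, 1, 1, 1])
z_cluster = np.array([1, 0])        # cluster 0 treated, cluster 1 control
y = np.array([3.0, 3.0, 9.0, 9.0, 1.0, 1.0])
print(mii(y, z_cluster, cluster, neighbors))  # 2.0
```

Note that the boundary nodes (2 and 3) are simply dropped; no inverse-probability weighting is applied to the interior averages.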
Because MII assigns equal weight to each interior observation, it eliminates the explosive inverse‑probability weights of HT and the two‑level averaging of the Cluster‑Adaptive Estimator (CAE). The authors prove that, under two technical conditions – (1) the proportion of interior nodes becomes asymptotically uniform across clusters, and (2) interior node means approximate the full‑cluster means – the MII estimator is consistent for the GATE τ. These assumptions are considerably weaker than those required for CAE's central limit theorem.
However, interior nodes are not a random sample of the whole graph; they tend to have different degree distributions, centralities, and other covariates compared with boundary nodes. This selection bias can introduce non‑negligible bias in the MII estimate, especially when covariates are strongly linked to outcomes. To correct this, the authors train a counterfactual predictor μ̂(z, X) on the entire network (including boundary nodes) using any suitable supervised model (e.g., graph neural networks). The predictor estimates the expected outcome under global treatment (z = 1) or control (z = 0) given node features X.
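The paper allows any supervised model for μ̂(z, X) (e.g., a graph neural network); the sketch below uses a per-arm least-squares fit purely as a hypothetical minimal stand-in for that learner:

```python
import numpy as np

def fit_counterfactual(X, z, y):
    """Fit mu_hat(z, .) with one least-squares model per arm — a linear
    stand-in for the flexible learner (e.g. a GNN) the paper permits."""
    models = {}
    for arm in (0, 1):
        idx = z == arm
        A = np.column_stack([np.ones(idx.sum()), X[idx]])
        models[arm], *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return models

def predict(models, arm, X):
    """Predicted outcome under global treatment (arm=1) or control (arm=0)."""
    return np.column_stack([np.ones(len(X)), X]) @ models[arm]

# Noiseless check: y = 1 + 2x + 3z, so the predicted global gap should be 3.
X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
z = np.array([0, 1] * 5)
y = 1.0 + 2.0 * X[:, 0] + 3.0 * z
models = fit_counterfactual(X, z, y)
gap = predict(models, 1, X).mean() - predict(models, 0, X).mean()
```

The key point is only that the predictor is trained on the whole graph, boundary nodes included, so its predictions are available for the full population as well as for the interior subset.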
The augmented MII estimator adds a bias‑correction term that adjusts the interior averages by the difference between the population‑level predicted means and the interior‑only predicted means:
τ̂_aug = τ̂_MII + [ (1/n) Σᵢ₌₁ⁿ μ̂(1, Xᵢ) − (1/|I₁|) Σ_{i∈I₁} μ̂(1, Xᵢ) ] − [ (1/n) Σᵢ₌₁ⁿ μ̂(0, Xᵢ) − (1/|I₀|) Σ_{i∈I₀} μ̂(0, Xᵢ) ],
where I₁ and I₀ denote the treated and control interior nodes, respectively, and the sums over i = 1, …, n run over the full population.
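The bias correction described above can be computed directly from the fitted predictions; this is an illustrative numpy sketch (names are our own), shifting each interior arm average by the gap between the population-wide and interior-only predicted means:

```python
import numpy as np

def augmented_mii(y, z, interior, mu1, mu0):
    """MII plus a PPI-style correction: each interior arm average is shifted
    by the gap between population-wide and interior-only predicted means."""
    it = interior & (z == 1)    # treated interior nodes
    ic = interior & (z == 0)    # control interior nodes
    tau_mii = y[it].mean() - y[ic].mean()
    return tau_mii + (mu1.mean() - mu1[it].mean()) - (mu0.mean() - mu0[ic].mean())

# Toy numbers: mu1, mu0 are predictions from any fitted counterfactual model,
# evaluated on every node (boundary nodes included).
z = np.array([1, 1, 1, 0, 0, 0])
interior = np.array([True, True, False, False, True, True])
y = np.array([3.0, 3.0, 9.0, 9.0, 1.0, 1.0])
mu1 = np.array([4.0, 4.0, 10.0, 10.0, 2.0, 2.0])
mu0 = np.array([1.0, 1.0, 7.0, 7.0, -1.0, -1.0])
tau_aug = augmented_mii(y, z, interior, mu1, mu0)
```

In this toy example the interior difference-in-means is 2.0, and the correction terms shift it by how much the predictor says the interior nodes differ from the population under each arm; when μ̂ is accurate, this removes the selection bias of the interior subsample.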