Sensitivity analysis for contamination in egocentric-network randomized trials with interference

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Egocentric-Network Randomized Trials (ENRTs) are increasingly used to estimate causal effects under interference when measuring complete sociocentric network data is infeasible. ENRTs rely on egocentric network sampling, where a set of egos is first sampled, and each ego recruits a subset of its neighbors as alters. Treatments are then randomized across egos. While the observed ego-networks are disjoint by design, the underlying population network may contain edges connecting them, leading to contamination. Under a design-based framework, we show that the Horvitz-Thompson estimators of direct and indirect effects are biased whenever contamination is present. To address this, we derive bias-corrected estimators and propose a novel sensitivity analysis framework based on sensitivity parameters representing the probability or expected number of missing edges. This framework is implemented via both grid sensitivity analysis and probabilistic bias analysis, providing researchers with a flexible tool to assess the robustness of the causal estimators to contamination. We apply our methodology to the HIV Prevention Trials Network 037 study, finding that ignoring contamination may lead to underestimation of indirect effects and overestimation of direct effects.


💡 Research Summary

This paper addresses a critical but often overlooked source of bias in Egocentric-Network Randomized Trials (ENRTs): contamination arising from unobserved edges that connect ego-ego, ego-alter, or alter-alter pairs in the underlying population network. ENRTs rely on a two-stage sampling design in which a set of egos (index participants) is sampled, each ego recruits a subset of its neighbors as alters, and treatment is randomized only among egos. By construction the observed ego-networks are disjoint, yet the true population network may contain additional links between them that are not captured in the observed data. The authors show, under a design-based inference framework where the only source of randomness is the treatment assignment, that the standard Horvitz-Thompson (HT) estimators for the direct effect (DE) on egos and the indirect effect (IE) on alters become biased whenever such contamination exists.
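As a rough illustration of the naive estimators discussed above, the sketch below computes Horvitz-Thompson-style contrasts for DE and IE under Bernoulli randomization of egos with probability `p_z`, treating each alter's observed exposure as the treatment of its recruiting ego. The function name `ht_contrast`, the toy data, and the exact form of the contrast are assumptions made for illustration; the paper's estimators may differ in detail.

```python
def ht_contrast(outcomes, exposed, p_z):
    """Inverse-probability-weighted total of exposed units minus that of
    unexposed units, divided by the number of units (HT-style contrast)."""
    n = len(outcomes)
    exposed_term = sum(y for y, f in zip(outcomes, exposed) if f) / p_z
    unexposed_term = sum(y for y, f in zip(outcomes, exposed) if not f) / (1 - p_z)
    return (exposed_term - unexposed_term) / n

# Hypothetical toy data: four egos, six alters.
p_z = 0.5
ego_treatment = [1, 0, 1, 0]
ego_outcome = [0.9, 0.4, 0.8, 0.5]
# Each alter is (index of recruiting ego, outcome).
alters = [(0, 0.6), (0, 0.7), (1, 0.3), (2, 0.5), (3, 0.4), (3, 0.2)]

# Direct effect: compare treated and untreated egos.
de_hat = ht_contrast(ego_outcome, ego_treatment, p_z)

# Indirect effect: compare alters by their OBSERVED exposure, i.e. the
# treatment of the ego who recruited them (this ignores contamination).
alter_outcome = [y for _, y in alters]
alter_exposed = [ego_treatment[ego] for ego, _ in alters]
ie_hat = ht_contrast(alter_outcome, alter_exposed, p_z)

print(f"DE estimate: {de_hat:.3f}, IE estimate: {ie_hat:.3f}")
```

Because the alter exposure here is read off the observed ego-networks only, any unobserved edge to a treated ego would be invisible to this calculation, which is the situation the paper analyzes.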

The bias stems from a mismatch between the true exposure indicator \(F_i = I\{\sum_j Z_j A_{ij} > 0\}\), which depends on the full adjacency matrix \(A\), and the observed exposure \(\tilde F_i = I\{\sum_j Z_j \tilde A_{ij} > 0\}\), computed on the sampled adjacency matrix \(\tilde A\). The authors formalize this discrepancy through exposure probabilities \(\pi_i^a = \Pr(F_i = 1 \mid i \in R_a)\) for alters and \(\pi_i^e = \Pr(F_i = 1 \mid i \in R_e)\) for egos, and prove that the expectation of the HT estimators includes additional terms proportional to \(\pi_i^a - p_z\) and \(\pi_i^e\), where \(p_z\) is the randomization probability for egos.
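To make the two exposure indicators concrete, the following minimal sketch evaluates the same formula on an observed adjacency matrix and on a hypothetical true adjacency matrix that contains one extra, unobserved edge. The four-unit network and all variable names are invented for illustration.

```python
def exposure(adjacency, z):
    """Return F_i = I{sum_j Z_j * A_ij > 0} for every unit i."""
    return [int(sum(z_j * a_ij for z_j, a_ij in zip(z, row)) > 0)
            for row in adjacency]

# Units 0 and 1 are egos; unit 2 is the alter of ego 0, unit 3 the alter of ego 1.
z = [1, 0, 0, 0]  # only ego 0 is treated

# Observed (sampled) adjacency: two disjoint ego-networks.
a_observed = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]

# Hypothetical true adjacency: adds an unobserved edge between ego 0 and alter 3.
a_true = [row[:] for row in a_observed]
a_true[0][3] = 1
a_true[3][0] = 1

f_observed = exposure(a_observed, z)
f_true = exposure(a_true, z)

# Units whose true exposure differs from what the sampled network suggests.
contaminated = [i for i, (fo, ft) in enumerate(zip(f_observed, f_true)) if fo != ft]
print("observed exposure:", f_observed)
print("true exposure:    ", f_true)
print("contaminated units:", contaminated)
```

In this toy network, alter 3 is recorded as unexposed because its own ego is untreated, yet it is in fact exposed through the unobserved edge to the treated ego 0.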

To correct the bias, the paper introduces a set of sensitivity parameters that quantify the degree of missing edges:

  • \(\phi_1\) – the expected proportion (or probability) of missing ego-alter edges,
  • \(\phi_2\) – the expected proportion (or probability) of missing ego-ego edges, and
  • \(\phi_3\) – a parameter capturing the magnitude of a treatment-exposure interaction effect needed for unbiased DE estimation.

These parameters can be informed by external data, expert elicitation, or covariate models. Using \(\phi_1\) and \(\phi_2\), the authors derive bias-corrected versions of the HT estimators that replace the unknown exposure probabilities \(\pi_i^a\) and \(\pi_i^e\) with functions of the sensitivity parameters and the known randomization probability. The corrected IE estimator, for example, adjusts the HT weight by a factor \(\frac{\phi_1}{p_z(1-p_z)}\) that accounts for the expected number of unobserved treated neighbors.

The methodological contribution is complemented by two practical implementation strategies:

  1. Grid Sensitivity Analysis – Researchers specify plausible ranges for the sensitivity parameters, construct a multidimensional grid, and compute the corrected estimators at each grid point. This yields a visual map of how the causal estimates vary with different contamination scenarios.

  2. Probabilistic Bias Analysis – A Bayesian approach in which prior distributions are placed on \(\phi_1\), \(\phi_2\), and \(\phi_3\). Monte Carlo simulation draws from these priors and propagates the uncertainty through the bias-corrected estimators, producing posterior distributions for DE and IE that reflect both sampling variability and uncertainty about contamination.
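The grid strategy in item 1 can be sketched as a loop over all combinations of sensitivity-parameter values. In the example below, `grid_sensitivity` is a hypothetical helper and `toy_corrected_ie` is a placeholder correction chosen only so the loop has something to evaluate; it is not the paper's bias-corrected estimator, and the naive estimate of 0.10 is invented.

```python
from itertools import product

def grid_sensitivity(estimator, grids):
    """Evaluate estimator(**params) at every combination of grid values."""
    names = list(grids)
    results = []
    for values in product(*(grids[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, estimator(**params)))
    return results

def toy_corrected_ie(phi_1, phi_2, ie_naive=0.10):
    # PLACEHOLDER correction (an assumption, NOT the paper's formula):
    # inflate the naive estimate as more edges are assumed missing.
    if phi_1 + phi_2 >= 1.0:
        raise ValueError("phi_1 + phi_2 must be below 1 in this toy correction")
    return ie_naive / (1.0 - phi_1 - phi_2)

# Plausible ranges for the sensitivity parameters (hypothetical values).
grids = {"phi_1": [0.0, 0.1, 0.2], "phi_2": [0.0, 0.1]}
results = grid_sensitivity(toy_corrected_ie, grids)

for params, estimate in results:
    print(f"phi_1={params['phi_1']:.2f}  phi_2={params['phi_2']:.2f}  "
          f"corrected IE={estimate:.3f}")
```

Passing the estimator as an argument keeps the grid logic separate from the correction formula, so the placeholder could be swapped for a real bias-corrected estimator without changing the loop.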

Both approaches are implemented in the R package ENRTsensitivity, which the authors make publicly available.
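Independently of that package, the Monte Carlo step of a probabilistic bias analysis can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the helper `probabilistic_bias_analysis`, the Beta priors, the naive estimate of 0.10, and the placeholder correction `toy_corrected_ie`, which is not the paper's estimator. The sketch propagates only prior uncertainty about the sensitivity parameters and omits sampling variability.

```python
import random
import statistics

def probabilistic_bias_analysis(estimator, priors, n_draws=5000, seed=0):
    """Draw sensitivity parameters from their priors, push each draw through
    the estimator, and summarize the resulting distribution."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        params = {name: sampler(rng) for name, sampler in priors.items()}
        draws.append(estimator(**params))
    draws.sort()
    lower = draws[int(0.025 * (n_draws - 1))]   # approximate 2.5th percentile
    upper = draws[int(0.975 * (n_draws - 1))]   # approximate 97.5th percentile
    return statistics.median(draws), (lower, upper)

def toy_corrected_ie(phi_1, phi_2, ie_naive=0.10):
    # PLACEHOLDER correction (an assumption, NOT the paper's formula).
    if phi_1 + phi_2 >= 1.0:
        raise ValueError("phi_1 + phi_2 must be below 1 in this toy correction")
    return ie_naive / (1.0 - phi_1 - phi_2)

# Hypothetical Beta priors: Beta(3, 17) has mean 0.15, Beta(2, 18) has mean 0.10.
priors = {
    "phi_1": lambda rng: rng.betavariate(3, 17),
    "phi_2": lambda rng: rng.betavariate(2, 18),
}

median, (lower, upper) = probabilistic_bias_analysis(toy_corrected_ie, priors)
print(f"median corrected IE: {median:.3f}, 95% interval: ({lower:.3f}, {upper:.3f})")
```

Fixing the seed makes the summary reproducible; a fuller analysis would also resample or otherwise account for the variability of the naive estimate itself.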

Simulation studies illustrate the impact of contamination. Networks generated from both Erdős-Rényi and realistic social-network models are subjected to varying levels of missing edges (e.g., \(\phi_1, \phi_2 \in \{0, 0.05, 0.10, 0.20\}\)). Results consistently show that the naïve HT estimators underestimate IE and overestimate DE as contamination increases, with bias becoming substantial even when only 10% of ego-alter edges are missing. The bias-corrected estimators recover the true effects when the sensitivity parameters are correctly specified, and the probabilistic bias analysis yields credible intervals that contain the true values across a wide range of plausible priors.

The authors then apply their framework to the HIV Prevention Trials Network 037 (HPTN 037) study, a landmark ENRT evaluating a peer-education intervention among people who inject drugs. The original analysis assumed completely disjoint ego-alter networks. Using auxiliary information and expert opinion, the authors set prior distributions centered at \(\phi_1 = 0.15\) (≈15% of ego-alter edges missing) and \(\phi_2 = 0.10\) (≈10% of ego-ego edges missing), with a modest interaction parameter \(\phi_3 = 0.05\). Grid and probabilistic bias analyses reveal that accounting for contamination raises the estimated indirect effect on alters from roughly 0.12 to 0.18 (a 50% increase) and lowers the direct effect on egos from 0.35 to 0.27. These findings suggest that prior conclusions may have under-appreciated the spill-over benefits of the intervention while overstating its direct impact.

In conclusion, the paper makes three substantive contributions: (i) it formally demonstrates that standard HT estimators are biased under realistic network contamination; (ii) it provides bias‑corrected estimators that depend on interpretable sensitivity parameters; and (iii) it offers a flexible, software‑enabled sensitivity‑analysis toolkit that can be applied to any ENRT or, more broadly, to studies with sampled networks and potential cross‑cluster interference. The authors discuss extensions to multi‑level exposure mappings, alternative sampling designs (e.g., snowball sampling), and data‑driven estimation of the sensitivity parameters as promising avenues for future research.
