A Diagnostic to Find and Help Combat Stochastic Positivity Issues -- with a Focus on Continuous Treatments
The positivity assumption is central to the identification of causal effects. Its stochastic variant, in particular, is an issue many applied researchers face, yet it is rarely discussed, especially in conjunction with continuous treatments or Modified Treatment Policies (MTPs). One common recommendation for dealing with a violation is to change the estimand. An applied researcher, however, faces two problems: first, how can she tell whether there is a stochastic positivity violation for her estimand of interest, preferably without having to fit a model first? Second, if she finds such a violation, how should she change her estimand to arrive at one that does not face the same issue? We propose a novel diagnostic that answers both questions by indicating, for each observation, how well the estimand of interest can be estimated from the data at hand. We provide a simulation study of the behaviour of different MTPs at varying levels of stochastic positivity violation and show how the diagnostic helps anticipate where bias is to be expected. Finally, we illustrate the proposed diagnostic in a pharmacoepidemiological study based on data from CHAPAS-3, a trial comparing different treatment regimens for children living with HIV.
💡 Research Summary
The paper addresses a pervasive yet under‑discussed problem in causal inference: stochastic (or practical) positivity violations when the treatment is continuous and when researchers employ Modified Treatment Policies (MTPs). Positivity—requiring that every unit have a non‑zero probability of receiving any treatment level of interest—is a cornerstone for identifying causal effects. Whereas deterministic (structural) violations occur when certain treatment levels are impossible for some sub‑populations, stochastic violations arise when the data simply contain few or no observations at certain treatment values for some sub‑populations. In continuous settings this sparsity is inevitable, but it can lead to severe finite‑sample bias, especially for estimators that rely on inverse‑probability weighting or g‑computation.
Existing diagnostics either depend on a specific estimator (e.g., Petersen et al.’s bootstrap bias estimator), require high‑dimensional density estimation (generalized propensity scores), or need user‑defined hyper‑parameters that are difficult to calibrate (Positivity Regression Trees). Moreover, they do not guide the analyst on how to modify the estimand or the intervention policy to avoid the identified problems.
The authors propose a novel, estimator‑independent, data‑driven diagnostic based on kernel density estimation. The diagnostic is explicitly “estimand‑specific”: for a chosen causal estimand ψ (for example, the expected outcome under a particular intervention), it evaluates, for each observation i, whether the conditional density f_{A|L}(a_i^{int} | l_i) that the intervention requires is sufficiently large. Two complementary scores are introduced:
- Local Support Score (LSS) – a kernel‑smoothed estimate of the amount of data surrounding the (L, A^{int}) pair required for observation i. A low LSS indicates that the required treatment value for that unit lies in a sparsely populated region, signalling a potential positivity violation for that unit.
- Global Sparsity Index (GSI) – an aggregate measure that scans the entire support of the intervention function d(l, a^{obs}) and records the minimal local support across all (l, a^{int}) pairs that appear in the estimand. The GSI highlights overall regions of the covariate‑treatment space that are insufficiently covered by the observed data.
Both scores rely on a kernel function K_h with bandwidth h; the authors provide practical guidance for selecting h and a threshold τ for “acceptable” support, using cross‑validation, sensitivity analyses, or substantive knowledge.
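To make the Local Support Score concrete, the sketch below estimates kernel-smoothed support around each intervention point (l_i, a_i^{int}) with a Gaussian kernel. This is an illustration under assumed choices (Gaussian kernel, a single shared bandwidth `h`, a one-dimensional covariate), not the paper's exact estimator; the function and variable names are ours.

```python
import numpy as np

def gaussian_kernel(u):
    """Multivariate standard Gaussian kernel evaluated row-wise on scaled distances."""
    return np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2 * np.pi) ** (u.shape[1] / 2)

def local_support_score(L, A, l_i, a_int_i, h):
    """Kernel-smoothed mass of observed (L, A) pairs near the intervention
    point (l_i, a_int_i) -- an illustrative LSS, not the paper's estimator."""
    X = np.column_stack([L, A])                       # observed covariate-treatment pairs
    x0 = np.append(np.atleast_1d(l_i), a_int_i)       # point whose support we probe
    u = (X - x0) / h                                  # bandwidth-scaled distances
    return gaussian_kernel(u).mean() / h ** X.shape[1]

# Toy data: one covariate, treatment correlated with it; the
# intervention shifts every observed treatment down by 0.5.
rng = np.random.default_rng(0)
L_obs = rng.normal(size=500)
A_obs = L_obs + rng.normal(size=500)
lss = [local_support_score(L_obs, A_obs, l, a - 0.5, h=0.5)
       for l, a in zip(L_obs, A_obs)]
print(min(lss), max(lss))
```

Units whose shifted treatment value lands in the tails of the observed (L, A) cloud receive the smallest scores, which is exactly the flag the diagnostic raises; in practice both `h` and the acceptability threshold τ would be chosen via the cross-validation or sensitivity analyses the authors recommend.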
The paper proceeds with a thorough simulation study. Data are generated with a continuous treatment A and multivariate covariates L. Three representative intervention schemes are examined: (i) a static policy (A = a₀), (ii) a dynamic policy (A = d(L)), and (iii) a modified policy (A = d(L, A^{obs})). The degree of stochastic positivity violation is manipulated by thinning the density of A conditional on L in selected sub‑populations. Results show that observations with low LSS produce large bias for both inverse‑probability‑of‑treatment weighting (IPTW) and g‑computation estimators, confirming the diagnostic’s ability to flag high‑bias regions. Moreover, when the analyst uses the diagnostic to replace a highly sparse policy with a less sparse MTP (e.g., limiting the magnitude of the modification), the average bias drops dramatically, illustrating how the diagnostic can guide the selection of an estimand that is both scientifically relevant and statistically feasible.
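The three intervention schemes can be written as simple functions of covariates and observed treatment. The functional forms below are assumptions for illustration, not the paper's simulation design:

```python
import numpy as np

def static_policy(l, a_obs, a0=1.0):
    """Static: every unit receives the fixed dose a0, regardless of L."""
    return np.full_like(a_obs, a0)

def dynamic_policy(l, a_obs):
    """Dynamic: the dose depends on covariates only, e.g. d(L) = 0.5 * L (assumed form)."""
    return 0.5 * l

def modified_policy(l, a_obs, delta=0.5):
    """MTP: shift each unit's observed dose, d(L, A_obs) = A_obs - delta (assumed form)."""
    return a_obs - delta

l = np.array([0.0, 1.0])
a = np.array([2.0, 3.0])
print(static_policy(l, a))    # fixed dose for everyone
print(dynamic_policy(l, a))   # dose determined by L alone
print(modified_policy(l, a))  # observed dose shifted by delta
```

The contrast clarifies why the policies stress positivity differently: the static policy needs support at a0 for every covariate pattern, while the MTP only needs support in a δ-neighbourhood of each observed treatment, so shrinking the shift is a natural way to relax a violation.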
The authors then apply the method to real data from the CHAPAS‑3 trial, a multi‑arm study of antiretroviral regimens for HIV‑infected children. The original analysis used a fixed‑dose policy, but the data revealed that certain age‑weight combinations rarely received some dose levels, leading to low LSS in those strata. By constructing a dynamic MTP that adjusts dose according to age and weight, the LSS across the sample increased, the estimated treatment effect became more precise, and the diagnostic confirmed that stochastic positivity violations were largely eliminated.
In the discussion, the authors highlight several strengths of their approach: (1) it is estimator‑agnostic, so any consistent estimator can be used once the diagnostic confirms feasibility; (2) it directly links the positivity assumption to the specific causal estimand under consideration, which is crucial for MTPs where the intervention function itself determines the required support; (3) it provides actionable feedback for analysts, suggesting whether to trim the sample, exclude certain covariates, or redesign the intervention policy. Limitations include the sensitivity of kernel bandwidth choice, computational cost in high‑dimensional covariate spaces, and the need for substantive judgment when deciding how to handle low‑support regions (e.g., trimming vs. extrapolation). Future work could explore adaptive kernels, dimensionality‑reduction techniques, and extensions to longitudinal settings.
Overall, the paper makes a substantive contribution to causal inference methodology by delivering a practical, theoretically grounded tool for diagnosing stochastic positivity violations in continuous‑treatment contexts and for steering analysts toward estimands and intervention policies that are both scientifically meaningful and statistically estimable.