State policy heterogeneity analyses: considerations and proposals
State-level policy studies often conduct heterogeneity analyses that quantify how treatment effects vary across state characteristics. These analyses may be used to inform state-specific policy decisions, or to infer how the effect of a policy changes in combination with other state characteristics. However, in state-level settings with varied contexts and policy landscapes, multiple versions of similar policies, and differential policy implementation, the causal quantities targeted by these analyses may not align with the inferential goals. This paper clarifies these issues by distinguishing several causal estimands relevant to heterogeneity analyses in state-policy settings, including state-specific treatment effects (ITE), conditional average treatment effects (CATE), and controlled direct effects (CDE). We argue that the CATE is often the easiest to identify and estimate, but may not be the most policy-relevant target of inference. Moreover, the widespread practice of coarsening distinct policies or implementations into a single indicator further complicates the interpretation of these analyses. Motivated by these limitations, we propose bounding ITEs as an alternative inferential goal, yielding ranges for each state’s policy effect under explicit assumptions that quantify deviations from the ideal identifying conditions. These bounds target a well-defined and policy-relevant quantity: the effect for specific states. We develop this approach within a difference-in-differences framework and discuss how sensitivity parameters may be informed using pre-treatment data. Through simulations, we demonstrate that bounding state-specific effects can more reliably determine the sign of the ITEs than CATE estimates can. We then apply this method to examine the effect of the Affordable Care Act Medicaid expansion on high-volume buprenorphine prescribing.
💡 Research Summary
The paper addresses a pervasive problem in state‑level policy evaluation: heterogeneity analyses that are routinely performed by regressing outcomes on policy indicators and their interactions with observed state characteristics often fail to answer the causal questions policymakers care about. The authors first catalogue five broad sources of effect heterogeneity—variation in population composition, contextual factors, co‑occurring policies, policy grouping (or “coarsening”), and differences in implementation—and argue that standard regression‑based heterogeneity studies conflate these sources, yielding merely associative statements rather than causal ones.
To clarify the causal targets, three estimands are formally defined: (1) the state‑specific treatment effect (ITE), the effect of a well‑specified policy version in a particular state; (2) the conditional average treatment effect (CATE), the average effect among states sharing a given covariate value; and (3) the controlled direct effect (CDE), the effect when a covariate is held fixed. The ITE is the most policy‑relevant but also the hardest to identify because real‑world policies are rarely a single, uniform intervention; they exist in multiple versions and are implemented with varying intensity. CATE is easier to estimate under standard identification assumptions but does not directly answer “what would happen in this state if the policy were adopted.” CDE further abstracts away from realistic implementation by fixing covariates that may themselves be policy levers.
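In hedged notation (the symbols below are illustrative; the paper's own definitions may differ in detail), the three estimands can be sketched for a state $s$ with potential outcomes $Y_s(\cdot)$, covariate $X_s$, and a covariate level $m$ that could be held fixed:

```latex
% ITE: effect of adopting a well-specified policy version in state s
\tau_s = Y_s(1) - Y_s(0)

% CATE: average effect among states whose covariate X equals x
\mathrm{CATE}(x) = \mathbb{E}\left[\, Y_s(1) - Y_s(0) \mid X_s = x \,\right]

% CDE: effect of the policy when the covariate is held fixed at m
\mathrm{CDE}(m) = \mathbb{E}\left[\, Y_s(1, m) - Y_s(0, m) \,\right]
```

The contrast is in what each quantity holds fixed: the ITE fixes attention on one state, the CATE averages over all states sharing observed characteristics $x$, and the CDE intervenes on the covariate itself, which is why it abstracts away from realistic implementation.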
A key insight is that the common practice of collapsing distinct policy versions into a single binary indicator (treatment coarsening) obscures the very distinctions that generate heterogeneity, making any regression‑based interaction term difficult to interpret causally.
In response, the authors propose bounding the ITE rather than point‑estimating it. Within a difference‑in‑differences (DiD) framework, they introduce sensitivity parameters that capture deviations from the ideal parallel‑trends assumption, such as pre‑treatment trend differences, staggered adoption timing, and unobserved implementation intensity. These parameters can be calibrated using pre‑policy outcome data, yielding upper and lower bounds for each state’s treatment effect under explicit, transparent assumptions.
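As a minimal sketch of this idea (not the paper's exact parameterization), one can bound a state's DiD effect by allowing the post-period violation of parallel trends to be as large as the worst period-to-period trend gap observed before treatment. All names and the calibration rule below are illustrative assumptions:

```python
import numpy as np

def ite_bounds(y_treated, y_controls, t0):
    """Bound one state's treatment effect in a simple DiD design.

    y_treated  : outcomes for the treated state, shape (T,)
    y_controls : outcomes for never-treated states, shape (S, T)
    t0         : index of the first post-treatment period

    The point estimate assumes parallel trends; the bounds allow the
    post-period trend gap to be as large as the worst pre-period gap
    (a simple calibration choice, not the paper's exact procedure).
    """
    ctrl_mean = y_controls.mean(axis=0)  # average control trajectory
    # DiD point estimate: change in the treated state minus change in controls
    did = (y_treated[t0:].mean() - y_treated[:t0].mean()) \
        - (ctrl_mean[t0:].mean() - ctrl_mean[:t0].mean())
    # Sensitivity parameter M: largest pre-period deviation between the
    # state's period-to-period trend and the control average's trend
    pre_trend_gap = np.diff(y_treated[:t0]) - np.diff(ctrl_mean[:t0])
    M = np.abs(pre_trend_gap).max()
    return did - M, did + M

# Toy example: treated state's outcome jumps after period 3, while
# ten control states continue a common gentle upward trend.
rng = np.random.default_rng(0)
y_t = np.array([1.0, 1.1, 1.2, 6.2, 6.3, 6.4])
y_c = np.tile(np.array([1.0, 1.1, 1.2, 1.3, 1.4, 1.5]), (10, 1)) \
    + rng.normal(0.0, 0.01, size=(10, 6))
lo, hi = ite_bounds(y_t, y_c, t0=3)
print(lo, hi)  # both bounds positive: the sign of the effect is robust here
```

Because the pre-period trends of the treated state and the controls nearly coincide in this toy example, the calibrated sensitivity parameter is small and the interval stays away from zero; a state with erratic pre-trends would instead get a wide interval that may include zero, mirroring the sign-identification logic described above.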
Simulation studies compare traditional CATE estimates to the proposed ITE bounds. The results show that CATE point estimates often mis‑identify the sign of the true effect, especially when heterogeneity stems from unobserved implementation differences. By contrast, the bounded approach correctly identifies the sign in a far larger proportion of simulations and provides informative intervals that shrink as the sensitivity parameters become tighter.
The empirical illustration examines the impact of the Affordable Care Act Medicaid expansion on high‑volume buprenorphine prescribing across U.S. states. Using state‑level DiD models, the authors find that the expansion increased prescribing in some states while decreasing it in others. The bounded ITE estimates reveal that, for several states, the sign of the effect is robustly positive or negative, whereas for a few states the bounds include zero, indicating uncertainty. This nuanced picture would be invisible under a single average effect or a naïve CATE interaction analysis.
Overall, the paper makes three contributions: (1) a conceptual taxonomy of why state‑level effects may differ; (2) a clear distinction among ITE, CATE, and CDE, highlighting the pitfalls of treatment coarsening; and (3) a practical bounding methodology that aligns the inferential target with policymakers’ needs while remaining transparent about the assumptions required. The approach offers a way to communicate effect uncertainty at the state level, facilitating more informed, context‑specific policy decisions. Future work could extend the bounding framework to settings with multiple treatment periods, dynamic effects, or continuous policy intensity measures.