A Weighting Framework for Clusters as Confounders in Observational Studies

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

When units in observational studies are clustered in groups, such as students in schools or patients in hospitals, researchers often address confounding by adjusting for cluster-level covariates or cluster membership. In this paper, we develop a unified weighting framework that clarifies how different estimation methods control two distinct sources of imbalance: global balance (differences between treated and control units across clusters) and local balance (differences within clusters). We show that inverse propensity score weighting (IPW) with a random-effects propensity score model, the current standard in the literature, targets only global balance and constant level shifts across clusters, but imposes no constraints on local balance. We then present two approaches that target both forms of balance. First, hierarchical balancing weights directly control global and local balance through a constrained optimization problem. Second, building on the recently proposed Generalized Mundlak approach, we develop a novel Mundlak balancing weights estimator that adjusts for cluster-level sufficient statistics rather than cluster indicators; this approach can accommodate small clusters where all units are treated or untreated. Critically, these approaches rest on different assumptions: hierarchical balancing weights require only that treatment is ignorable given covariates and cluster membership, while Mundlak methods additionally require an exponential family structure. Finally, we compare these methods in a simulation study and in two applications in education and health services research that exhibit very different cluster structures.

💡 Research Summary

This paper tackles the problem of confounding in observational studies where units are nested within clusters (e.g., students in schools, patients in hospitals). The authors argue that the current standard approach, inverse propensity score weighting (IPW) with a random-effects propensity score model, targets only “global balance” (differences in covariate means between treated and control units pooled across clusters) and constant level shifts across clusters, while imposing no constraint on “local balance” (covariate differences between treated and control units within each cluster). Ignoring local imbalance can lead to biased causal estimates, especially when clusters are small or when some clusters contain only treated or only control units.
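To make the global/local distinction concrete, here is a minimal Python sketch (the function name `global_local_imbalance` is hypothetical, not from the paper) that computes the weighted treated-minus-control difference in a covariate's mean, once pooled over all clusters and once within each cluster. The toy data are constructed so that global balance holds exactly while every cluster is imbalanced, which is the situation a method targeting only global balance cannot detect.

```python
import numpy as np


def global_local_imbalance(x, treat, cluster, w=None):
    """Weighted treated-minus-control mean difference for one covariate,
    pooled over all clusters (global) and within each cluster (local).

    Clusters without both treated and control units get NaN, since local
    balance is undefined there.
    """
    x = np.asarray(x, dtype=float)
    treat = np.asarray(treat, dtype=bool)
    cluster = np.asarray(cluster)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)

    def diff(mask):
        t, c = mask & treat, mask & ~treat
        if not t.any() or not c.any():
            return float("nan")
        return float(np.average(x[t], weights=w[t]) - np.average(x[c], weights=w[c]))

    global_diff = diff(np.ones_like(treat))
    local_diff = {j.item(): diff(cluster == j) for j in np.unique(cluster)}
    return global_diff, local_diff


# Toy data: balanced overall, imbalanced inside every cluster.
x = [0.0, 2.0, 2.0, 0.0]
treat = [1, 0, 1, 0]
cluster = ["A", "A", "B", "B"]
g, loc = global_local_imbalance(x, treat, cluster)
print(g, loc)  # 0.0 {'A': -2.0, 'B': 2.0}
```

The optional `w` argument lets the same check be applied to any set of estimated weights, such as IPW or balancing weights.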

To remedy this, the authors present two weighting strategies that target global and local balance simultaneously.

Hierarchical Balancing Weights – These are obtained by solving a constrained optimization problem that imposes moment‑matching constraints both at the overall sample level (global) and within each cluster (local). The resulting weights are non‑negative and can be interpreted as the solution to a minimum‑entropy problem that respects the two sets of balance constraints. This method requires only the standard ignorability assumption conditional on individual covariates and cluster membership; it does not rely on any parametric form for the propensity score. However, achieving both sets of constraints may be infeasible when data are sparse, and in such cases the method may need to drop clusters that have no treatment variation.
Mundlak Balancing Weights – Building on the Generalized Mundlak framework (Arkhangelsky & Imbens, 2024), this approach replaces explicit cluster indicators with cluster-level sufficient statistics (e.g., cluster means of covariates, treatment prevalence). The key assumption is an exponential-family structure, under which the chosen sufficient statistics capture all cluster-level information relevant for treatment assignment. Under this assumption, treatment is ignorable given individual covariates and the cluster-level sufficient statistics, which allows information to be pooled across clusters even when some clusters lack treatment variation. The Mundlak balancing weights are then obtained from a balancing-weight optimization that adjusts for these sufficient statistics rather than for cluster indicators. This method is attractive for settings with many small clusters, but its validity hinges on the exponential-family structure and on choosing the correct sufficient statistics.

The paper systematically compares three classes of estimators: (i) standard IPW with a random‑intercept propensity model (global only), (ii) hierarchical balancing weights (global + local), and (iii) Mundlak balancing weights (global + local under exponential‑family assumptions). Through extensive simulations varying cluster size, number of clusters, and the degree of treatment heterogeneity within clusters, the authors find:

When clusters are moderately large and contain sufficient treated and control units, hierarchical balancing weights achieve the lowest bias and mean‑squared error because they directly target both balance dimensions.
When many clusters are tiny or have no treatment variation, Mundlak balancing weights remain stable and produce unbiased estimates, whereas standard IPW and hierarchical weights either become biased or must discard a substantial fraction of data.
Random‑coefficients IPW (allowing random slopes) is computationally demanding and often fails to converge, offering little practical advantage.
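Why sufficient statistics help with clusters that lack treatment variation can be seen in a minimal sketch. It is hypothetical and not the paper's estimator: it replaces cluster indicators with each unit's cluster means of the covariates (one of the statistics listed above; treating them as sufficient is an assumption that depends on the exponential-family model) and computes minimum-norm ATT weights that balance the individual covariates and these means exactly, without a sign constraint. Because no cluster indicators enter, treated units in a cluster with no controls still contribute to the balance targets instead of being dropped.

```python
import numpy as np


def mundlak_features(X, cluster):
    """Append each unit's cluster mean of every covariate (Mundlak-style)."""
    X = np.asarray(X, dtype=float)
    _, idx, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    sums = np.zeros((counts.size, X.shape[1]))
    np.add.at(sums, idx, X)               # per-cluster covariate sums
    xbar = (sums / counts[:, None])[idx]  # broadcast cluster means back to units
    return np.column_stack([X, xbar])


def min_norm_att_weights(F, treat):
    """Control weights minimizing ||w||^2 such that weighted control means of F
    equal the treated means and the weights sum to one (no sign constraint)."""
    F = np.asarray(F, dtype=float)
    treat = np.asarray(treat, dtype=bool)
    G = np.column_stack([np.ones((~treat).sum()), F[~treat]]).T
    g = np.concatenate([[1.0], F[treat].mean(axis=0)])
    return G.T @ np.linalg.solve(G @ G.T, g)


rng = np.random.default_rng(1)
n = 400
cluster = rng.integers(0, 80, size=n)     # many small clusters
cluster[:5] = 0                           # guarantee cluster 0 has several units
X = rng.normal(size=(n, 2)) + 0.05 * cluster[:, None]
treat = rng.random(n) < 0.4
treat[cluster == 0] = True                # cluster 0 has no treatment variation

F = mundlak_features(X, cluster)
w = min_norm_att_weights(F, treat)
```

A cluster-indicator approach would have to discard cluster 0 entirely; here it stays in the analysis. The price is that balancing on cluster means removes cluster-level confounding only if those means really summarize the cluster effect.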

Two empirical applications illustrate these trade-offs. The first examines the effect of reduced class size on student test scores using data from thousands of schools, many of which have fewer than 30 students. Here, Mundlak balancing weights allow even the smallest schools to be retained, while hierarchical weights require excluding schools with no class-size variation. The second application studies emergency general surgery outcomes across a network of hospitals, where clusters are large and treatment variation is abundant; hierarchical balancing weights yield the most precise estimate of the average treatment effect on the treated (ATT), and Mundlak weights perform similarly but rely on the additional exponential-family assumption.

The authors conclude with practical guidance:

Use hierarchical balancing weights when cluster sizes are sufficient and treatment variation exists within most clusters.
Opt for Mundlak balancing weights when the data contain many small clusters or clusters lacking treatment variation, provided the researcher can plausibly justify the exponential‑family and sufficient‑statistic assumptions.
Standard IPW with random intercepts should be viewed as a baseline that targets only global balance (and constant level shifts across clusters); researchers should assess whether local imbalance might threaten validity.
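The first two recommendations hinge on how many clusters actually contain both treated and control units. A quick check such as the following hypothetical helper (not from the paper) reports that share, by cluster and by unit, before an estimator is chosen; what counts as "most clusters" remains a judgment call.

```python
import numpy as np


def treatment_variation_summary(treat, cluster):
    """Share of clusters, and of units, in clusters with both treated and controls."""
    treat = np.asarray(treat, dtype=bool)
    _, idx, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    n_treated = np.bincount(idx, weights=treat.astype(float))  # treated per cluster
    has_variation = (n_treated > 0) & (n_treated < counts)
    return {
        "clusters_with_variation": float(has_variation.mean()),
        "units_in_such_clusters": float(has_variation[idx].mean()),
    }


# Cluster "a" is mixed, "b" is all treated, "c" is a single control unit.
summary = treatment_variation_summary(
    treat=[1, 0, 0, 1, 1, 0],
    cluster=["a", "a", "a", "b", "b", "c"],
)
print(summary)  # {'clusters_with_variation': 0.333..., 'units_in_such_clusters': 0.5}
```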

Overall, the paper introduces a unified conceptual framework for thinking about global versus local balance in clustered observational studies, proposes two concrete weighting estimators that address both sources of imbalance, and offers clear recommendations for applied researchers across education, health services, and other fields where clustered data are common.
