Spatial Confounding in Multivariate Areal Data Analysis
We investigate spatial confounding in the presence of multivariate disease dependence. In the “analysis model perspective” of spatial confounding, adding a spatially dependent random effect can substantially inflate the variance of the posterior distribution of the fixed effects. The “data generation perspective” treats covariates as stochastic and correlated with an unobserved spatial confounder, which can degrade statistical inference across repeated realizations of the data. Although multiple methods have been proposed for adjusting statistical models to mitigate spatial confounding in estimating regression coefficients, results on the interaction between spatial confounding and multivariate dependence remain very limited. We contribute to this domain by investigating spatial confounding from both the analysis and data generation perspectives in a Bayesian coregionalized areal regression model. We derive novel results that distinguish variance inflation due to spatial confounding from inflation due to multicollinearity between predictors, and we provide insights into the estimation efficiency of a spatial estimator under a spatially confounded data generation model. Our simulation experiments show favorable performance of spatial analysis compared with a non-spatial model, even in the presence of spatial confounding and a misspecified spatial structure. In this regard, we align with several other authors in defending traditional hierarchical spatial models (Gilbert et al., 2025; Khan and Berrett, 2023; Zimmerman and Ver Hoef, 2022) and extend this defense to multivariate areal models. We analyze US county-level data on obesity/diabetes prevalence and diabetes-related cancer mortality, comparing the results with and without spatial random effects.
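The abstract does not spell out the coregionalized structure. A common construction, assumed here rather than taken from the paper, is the linear model of coregionalization: q independent CAR-distributed fields are mixed by a lower-triangular matrix \(A\), so the column-stacked effects have covariance \((AA^\top) \otimes V_\phi\). The minimal NumPy sketch below uses a hypothetical 5 × 6 lattice, a single \(\phi\) shared by all components, and a made-up \(A\); the paper's model may allow component-specific spatial parameters.

```python
import numpy as np

def lattice_adjacency(rows, cols):
    """Rook-adjacency matrix for a rows x cols lattice of areal units (assumed geometry)."""
    n = rows * cols
    adj = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < rows:
                adj[i, i + cols] = adj[i + cols, i] = 1.0
    return adj

def car_covariance(adj, phi):
    """Proper CAR covariance V_phi = (D - phi * W)^{-1}; positive definite for |phi| < 1."""
    D = np.diag(adj.sum(axis=1))
    return np.linalg.inv(D - phi * adj)

def coregionalized_effects(V, A, rng):
    """Draw an n x q matrix of spatial effects Phi A^T, where the columns of Phi
    are independent N(0, V) fields and A (q x q, lower triangular) mixes them."""
    L = np.linalg.cholesky(V)
    Phi = L @ rng.standard_normal((V.shape[0], A.shape[0]))
    return Phi @ A.T

def implied_covariance(V, A):
    """Covariance of the column-stacked effects: (A A^T) kron V."""
    return np.kron(A @ A.T, V)

rng = np.random.default_rng(1)
V = car_covariance(lattice_adjacency(5, 6), phi=0.9)
A = np.array([[1.0, 0.0], [0.6, 0.8]])     # hypothetical mixing matrix for q = 2 outcomes
W_eff = coregionalized_effects(V, A, rng)  # one draw, shape (30, 2)
```

The off-diagonal block of the implied covariance is \((AA^\top)_{12} V_\phi\), so a single matrix \(A\) controls how strongly the outcome-specific spatial effects co-vary across the map.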
💡 Research Summary
The paper investigates spatial confounding in multivariate areal data from two complementary viewpoints: the analysis‑model perspective, where adding a spatially structured random effect can inflate the posterior variance of fixed‑effect coefficients, and the data‑generation perspective, where covariates are themselves stochastic, spatially correlated, and linked to an unobserved spatial confounder. While prior work has largely focused on univariate outcomes or independent multivariate settings, the interaction between spatial confounding and multivariate dependence has been under‑explored.
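The analysis-model mechanism can be seen in its simplest form by holding the variance components fixed and placing a flat prior on \(\beta\): the conditional posterior covariance is \((X^\top\Sigma^{-1}X)^{-1}\) with \(\Sigma = \sigma^2 I + \tau^2 V_\phi\), versus \(\sigma^2 (X^\top X)^{-1}\) without the random effect. If the covariate \(x\) is a unit-norm eigenvector of \(V_\phi\) with eigenvalue \(\lambda\), the variance ratio is exactly \(1 + \tau^2\lambda/\sigma^2\), so spatially smooth covariates (large \(\lambda\)) are inflated the most. This is a generic sketch, not the paper's derivation; the path-graph geometry, the intercept-free design, and the values of \(\phi\), \(\sigma^2\), \(\tau^2\) are assumptions.

```python
import numpy as np

def path_car_covariance(n, phi):
    """Proper CAR covariance V_phi = (D - phi * W)^{-1} on a path graph of n areal units."""
    adj = np.zeros((n, n))
    i = np.arange(n - 1)
    adj[i, i + 1] = adj[i + 1, i] = 1.0
    return np.linalg.inv(np.diag(adj.sum(axis=1)) - phi * adj)

def fixed_effect_variance(X, sigma2, tau2=0.0, V=None):
    """Conditional posterior covariance of beta under a flat prior,
    (X^T Sigma^{-1} X)^{-1}, with Sigma = sigma2 * I (+ tau2 * V if V is given)."""
    Sigma = sigma2 * np.eye(X.shape[0])
    if V is not None:
        Sigma = Sigma + tau2 * V
    return np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))

n, phi, sigma2, tau2 = 50, 0.95, 1.0, 4.0   # illustrative values only
V = path_car_covariance(n, phi)
lam, vecs = np.linalg.eigh(V)               # eigenvalues in ascending order
covariates = {"smooth": vecs[:, [-1]],      # largest-variance (smoothest) direction
              "rough": vecs[:, [0]]}        # smallest-variance (roughest) direction

ratios = {}
for name, x in covariates.items():
    spatial = fixed_effect_variance(x, sigma2, tau2, V)[0, 0]
    nonspatial = fixed_effect_variance(x, sigma2)[0, 0]
    ratios[name] = spatial / nonspatial     # equals 1 + tau2 * lambda / sigma2
```

The ratio for the smooth covariate is far larger than for the rough one, which is the sense in which a spatial random effect "competes" with spatially smooth predictors. The sketch concerns variance only; it says nothing about bias.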
To fill this gap, the authors develop a Bayesian coregionalized areal regression framework. In the data-generation stage, each outcome vector \(Y_i\) follows
\[ Y_i = \beta_{0i}\mathbf{1}_n + X_1\beta_{1i} + Z_i + \varepsilon_i, \]
where the latent confounder matrix \(Z\) and the covariate matrix \(X_1\) are generated by linear transformations of spatially correlated error terms \(\zeta_i\) and \(\epsilon_j\). The errors are modeled as \(\zeta_i \sim N(0, \rho_i V_\phi)\) and \(\epsilon_j \sim N(0, V_\phi)\), with \(V_\phi\) a spatial covariance matrix induced by a CAR-type precision structure and \(\rho_i\in
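A stripped-down simulation of this generator makes the confounding concrete. The summary does not give the linear transformations, so the sketch assumes one outcome and one covariate with hypothetical coefficients `gamma` and `delta`: \(x = \gamma\epsilon\) and \(z = \delta\epsilon + \zeta\) share \(\epsilon\), which ties the covariate to the confounder. Because \(E[y \mid x] = \beta_0 + (\beta_1 + \delta/\gamma)x\) under this assumption, a non-spatial OLS fit targets \(\beta_1 + \delta/\gamma\) rather than \(\beta_1\). The path graph, \(\rho = 0.5\), the Gaussian outcome noise, and all other parameter values are illustrative only.

```python
import numpy as np

def car_covariance(n, phi):
    """Proper CAR covariance V_phi = (D - phi * W)^{-1} on a path graph (assumed geometry)."""
    adj = np.zeros((n, n))
    i = np.arange(n - 1)
    adj[i, i + 1] = adj[i + 1, i] = 1.0
    return np.linalg.inv(np.diag(adj.sum(axis=1)) - phi * adj)

def simulate(n_reps, n=100, phi=0.9, rho=0.5, gamma=1.0, delta=0.8,
             beta0=2.0, beta1=1.5, sigma=0.5, seed=0):
    """Hypothetical single-outcome, single-covariate version of the generator.
    Each column of the returned arrays is one independent realization."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(car_covariance(n, phi))
    eps = L @ rng.standard_normal((n, n_reps))                    # eps  ~ N(0, V_phi)
    zeta = np.sqrt(rho) * (L @ rng.standard_normal((n, n_reps)))  # zeta ~ N(0, rho * V_phi)
    x = gamma * eps                                               # covariate built from eps
    z = delta * eps + zeta                                        # confounder shares eps
    y = beta0 + beta1 * x + z + sigma * rng.standard_normal((n, n_reps))
    return x, y

def ols_slopes(x, y):
    """Non-spatial OLS slope of y on [1, x], computed column by column."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    return (xc * yc).sum(axis=0) / (xc ** 2).sum(axis=0)

x, y = simulate(n_reps=2000)
mean_slope = ols_slopes(x, y).mean()
# Under this generator the non-spatial slope targets beta1 + delta / gamma = 1.5 + 0.8,
# not beta1 = 1.5, so its Monte Carlo average sits near 2.3.
```

Averaging over realizations mirrors the data-generation perspective, which judges an estimator by its behavior across repeated draws of both the covariate and the confounder.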