Bayesian Inference for Non-Gaussian Simultaneous Autoregressive Models with Missing Data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Standard simultaneous autoregressive (SAR) models typically assume normally distributed errors, an assumption often violated in real-world datasets that frequently exhibit non-normal, skewed, or heavy-tailed characteristics. New SAR models are proposed to capture these non-Gaussian features. The spatial error model (SEM), a widely used SAR-type model, is considered. Three novel SEMs are introduced, extending the standard Gaussian SEM. These extensions incorporate Student’s $t$-distributed errors to accommodate heavy-tailed behaviour, one-to-one transformations of the response variable to address skewness, or a combination of both. Variational Bayes (VB) estimation methods are developed for these models, and the framework is further extended to handle missing response data under the missing not at random (MNAR) mechanism. Standard VB methods perform well with complete datasets; however, handling missing data requires a hybrid VB (HVB) approach, which integrates a Markov chain Monte Carlo (MCMC) sampler to generate missing values. The proposed VB methods are evaluated using both simulated and real-world datasets, demonstrating their robustness and effectiveness in dealing with non-Gaussian data and missing data in spatial models. Although the method is demonstrated using SAR models, the proposed model specifications and estimation approaches are widely applicable to various types of models for handling non-Gaussian data with missing values.

💡 Research Summary

This paper addresses two pervasive shortcomings of conventional simultaneous autoregressive (SAR) models: the reliance on Gaussian error assumptions and the difficulty of handling missing response data, especially under a missing‑not‑at‑random (MNAR) mechanism. Using the spatial error model (SEM) as a vehicle, the authors introduce three novel non‑Gaussian SEM variants. The first, SEM‑t, replaces the normal error term with a Student’s t distribution, thereby accommodating heavy‑tailed residuals. The second, YJ‑SEM‑Gau, applies a one‑to‑one Yeo‑Johnson (YJ) transformation to the response variable to mitigate skewness while retaining Gaussian errors. The third, YJ‑SEM‑t, combines both ideas, using a YJ‑transformed response together with t‑distributed errors, thus simultaneously addressing skewness and heavy tails. All three models share the same structural equation y* = Xβ + (I − ρW)⁻¹e, where y* is the response (YJ‑transformed in the two YJ variants), ρ is the spatial autocorrelation parameter, and W is a pre‑specified spatial weight matrix; the variants differ in whether the response is transformed and in the distribution of the latent error vector e.
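To make the structural equation concrete, the following is a minimal simulation sketch of the YJ‑SEM‑t data‑generating process, not the authors' code. Several choices are assumptions made for illustration: the rook‑neighbour grid used to build W, the restriction of the YJ parameter to the open interval (0, 2), scaling standard t draws by the square root of `sigma2`, and all function names.

```python
import numpy as np


def yeo_johnson(y, gam):
    """Yeo-Johnson transform; gam is assumed to lie strictly inside (0, 2)."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    out[pos] = ((y[pos] + 1.0) ** gam - 1.0) / gam
    out[~pos] = -((1.0 - y[~pos]) ** (2.0 - gam) - 1.0) / (2.0 - gam)
    return out


def yeo_johnson_inverse(z, gam):
    """Inverse of yeo_johnson for gam strictly inside (0, 2)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = (gam * z[pos] + 1.0) ** (1.0 / gam) - 1.0
    out[~pos] = 1.0 - (1.0 - (2.0 - gam) * z[~pos]) ** (1.0 / (2.0 - gam))
    return out


def rook_weights(k):
    """Row-normalised rook-neighbour weight matrix on a k x k grid (k >= 2)."""
    n = k * k
    W = np.zeros((n, n))
    for r in range(k):
        for c in range(k):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < k and 0 <= cc < k:
                    W[r * k + c, rr * k + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def simulate_yj_sem_t(k, beta, rho, sigma2, nu, gam, rng):
    """Draw y* = X beta + (I - rho W)^{-1} e with t errors, then undo the YJ map."""
    n = k * k
    W = rook_weights(k)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    e = np.sqrt(sigma2) * rng.standard_t(nu, size=n)  # heavy-tailed errors
    y_star = X @ beta + np.linalg.solve(np.eye(n) - rho * W, e)
    y = yeo_johnson_inverse(y_star, gam)  # response on the original, skewed scale
    return y, y_star, X, W


rng = np.random.default_rng(0)
y, y_star, X, W = simulate_yj_sem_t(
    k=5, beta=np.array([1.0, 0.5]), rho=0.6, sigma2=1.0, nu=4.0, gam=0.5, rng=rng
)
```

Setting `gam=1.0` makes the transformation the identity, which recovers SEM‑t; replacing the t draws with normal draws would give the Gaussian‑error variants.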

To enable Bayesian inference for these models, the authors develop variational Bayes (VB) algorithms. For complete data, a mean‑field VB approximation yields closed‑form updates for β, σ²_e, ρ, the degrees‑of‑freedom ν (in the t‑case), and the YJ transformation parameter γ, delivering fast convergence and accurate posterior approximations. However, when responses are missing and the missingness depends on the unobserved values (MNAR), the likelihood cannot be factorised into a product of observed‑data terms, and standard VB becomes inadequate.
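For readers less familiar with VB, the generic objective behind a mean‑field approximation can be written as follows. This is the standard textbook form, not a transcription of the paper's derivation; the symbol $\theta$ for the full parameter vector and the factor index $j$ are notation assumed here.

$$
\log p(y) = \mathcal{L}(q) + \mathrm{KL}\big(q(\theta)\,\|\,p(\theta \mid y)\big),
\qquad
\mathcal{L}(q) = \mathbb{E}_{q}\big[\log p(y,\theta) - \log q(\theta)\big],
\qquad
q(\theta) = \prod_{j} q_j(\theta_j).
$$

Because the KL term is non‑negative, maximising the lower bound $\mathcal{L}(q)$ over the factors is equivalent to minimising the divergence from the exact posterior. Under the mean‑field factorisation, each optimal factor satisfies $\log q_j^{*}(\theta_j) = \mathbb{E}_{q_{-j}}\big[\log p(y,\theta)\big] + \text{const}$, which is what coordinate‑wise updates iterate.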

The paper therefore adopts a selection‑model factorisation: p(y,m | ξ,ψ) = p(m | y,ψ) p(y | ξ), where ξ denotes the SEM parameters and ψ governs a logistic missingness model p(m_i = 1 | y_i,x_i,ψ) = exp(x_iᵀψ_x + y_iψ_y) / (1 + exp(x_iᵀψ_x + y_iψ_y)). Because the missingness probability depends on y_i itself, this joint model requires simultaneous inference on the SEM parameters, the missingness parameters, and the latent missing responses y_u.
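The logistic missingness component can be sketched in a few lines. This is an illustrative implementation of the formula above, assuming that m_i = 1 marks a missing response; the function names are hypothetical, and `np.logaddexp` is used only for numerical stability.

```python
import numpy as np


def missing_prob(X, y, psi_x, psi_y):
    """p(m_i = 1 | y_i, x_i, psi) under the logistic missingness model."""
    eta = X @ psi_x + y * psi_y
    # sigmoid(eta) written as exp(-log(1 + exp(-eta))) to avoid overflow
    return np.exp(-np.logaddexp(0.0, -eta))


def log_missingness_lik(m, X, y, psi_x, psi_y):
    """log p(m | y, psi) = sum_i [ m_i * eta_i - log(1 + exp(eta_i)) ]."""
    eta = X @ psi_x + y * psi_y
    return float(np.sum(m * eta - np.logaddexp(0.0, eta)))


# Toy inputs: an intercept-only design and three response values.
X_demo = np.ones((3, 1))
y_demo = np.array([-1.0, 0.0, 1.0])
p_demo = missing_prob(X_demo, y_demo, psi_x=np.array([0.0]), psi_y=2.0)
```

With a positive `psi_y`, larger responses are more likely to be missing, which is exactly the dependence on unobserved values that makes the mechanism MNAR; setting `psi_y = 0` removes that dependence.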

Because directly applying VB to the full joint posterior would still require an intractable integration over y_u, the authors propose a hybrid variational Bayes (HVB) algorithm. HVB alternates between (i) an MCMC step that draws samples of the missing responses y_u conditional on the current variational distributions of ξ and ψ, and (ii) a VB step that updates the variational factors for ξ and ψ given the sampled y_u. This hybrid scheme preserves the accuracy of MCMC for the most problematic latent variables while retaining the computational speed of VB for the remaining parameters.
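As a rough illustration of step (i), the sketch below performs one Metropolis update of the missing responses for a Gaussian SEM with the parameters held fixed. It is not the authors' sampler: the random‑walk proposal, the single fixed parameter set `theta` (standing in for a draw from the current variational distribution), the convention that m_i = 1 marks a missing response, and all names are assumptions made for this example.

```python
import numpy as np


def log_target(y, m, X, W, theta):
    """log p(y | xi) + log p(m | y, psi), dropping terms that do not depend on y."""
    n = y.size
    A = np.eye(n) - theta["rho"] * W
    r = A @ (y - X @ theta["beta"])          # Gaussian SEM residuals
    log_sem = -0.5 * (r @ r) / theta["sigma2"]
    eta = X @ theta["psi_x"] + y * theta["psi_y"]
    log_miss = np.sum(m * eta - np.logaddexp(0.0, eta))
    return log_sem + log_miss


def mh_update_missing(y, m, X, W, theta, step, rng):
    """One random-walk Metropolis step that moves only the missing entries of y."""
    miss = m == 1
    prop = y.copy()
    prop[miss] = y[miss] + step * rng.normal(size=int(miss.sum()))
    log_ratio = log_target(prop, m, X, W, theta) - log_target(y, m, X, W, theta)
    if np.log(rng.uniform()) < log_ratio:
        return prop, True
    return y, False


# Toy setup: six locations on a ring, two of them missing.
rng = np.random.default_rng(1)
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta = {"beta": np.array([1.0, 0.5]), "rho": 0.4, "sigma2": 1.0,
         "psi_x": np.array([-0.5, 0.0]), "psi_y": 0.8}
m = np.array([1, 0, 0, 1, 0, 0])
y_init = rng.normal(size=n)                  # missing slots hold starting values
y_curr = y_init.copy()
n_accept = 0
for _ in range(200):
    y_curr, accepted = mh_update_missing(y_curr, m, X, W, theta, step=0.5, rng=rng)
    n_accept += int(accepted)
```

In a full HVB loop, each such draw of y_u would be followed by a VB update of the factors for ξ and ψ using the completed response vector; that second step is omitted here.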

Extensive simulation studies demonstrate the advantages of HVB. With n = 625 observations and 50 % missingness, HVB achieves estimation bias, root‑mean‑square error, and credible‑interval coverage comparable to Hamiltonian Monte Carlo (HMC) implemented in Stan, yet it runs more than ten times faster. When the data size is increased to tens of thousands of locations, HMC becomes computationally infeasible, whereas HVB still converges within minutes on a standard workstation.

A real‑world application to an environmental spatial dataset from coastal Australia illustrates the practical benefits. The YJ‑SEM‑t model captures both the heavy‑tailed residual structure and the pronounced skewness of the pollutant measurements, leading to superior predictive performance and more realistic uncertainty quantification compared with the traditional Gaussian SEM.

In summary, the paper makes four key contributions: (1) formulation of three flexible non‑Gaussian SAR models that handle heavy tails, skewness, or both; (2) development of efficient VB estimators for these models under complete data; (3) extension to MNAR missingness via a selection‑model framework and the introduction of a hybrid VB‑MCMC algorithm that scales to large spatial datasets; and (4) empirical validation through simulation and a substantive case study, showing that the proposed methods are both accurate and computationally attractive. The authors note that the methodology is readily extensible to other SAR variants (e.g., spatial lag, spatial Durbin), multivariate responses, and alternative non‑Gaussian error families, opening avenues for future research in robust spatial modelling.
