LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In many applications, we wish to fit a parametric statistical model to a small ensemble of spatially distributed random variables (‘fields’). However, parameter inference using maximum likelihood estimation (MLE) is computationally prohibitive, especially for large, non-stationary fields. Thus, many recent works train neural networks to estimate parameters given spatial fields as input, sidestepping MLE completely. In this work we focus on a popular class of parametric, spatially autoregressive (SAR) models. We make a simple yet impactful observation: because the SAR parameters can be arranged on a regular grid, both inputs (spatial fields) and outputs (model parameters) can be viewed as images. Using this insight, we demonstrate that image-to-image (I2I) networks enable faster and more accurate parameter estimation for a class of non-stationary SAR models with unprecedented complexity.


💡 Research Summary

The paper introduces LatticeVision, a novel framework for efficiently estimating spatially varying parameters of non‑stationary spatial autoregressive (SAR) models using image‑to‑image (I2I) neural networks. Traditional maximum‑likelihood estimation (MLE) becomes computationally infeasible for large, non‑stationary fields because it scales cubically with the number of locations and requires repeated factorisations of massive covariance matrices. Recent work has replaced MLE with neural networks that map a spatial field to local parameter estimates, but these approaches typically divide the domain into small stationary patches, estimate parameters independently for each patch, and thus lose global context while incurring a computational cost that grows linearly with the number of patches.

LatticeVision’s key insight is that SAR parameters—specifically the correlation‑range field κ²(s), the anisotropy ratio ρ(s), and the orientation field θ(s)—can be arranged on the same regular grid as the observed field. Consequently, both inputs (M replicated realizations of the field) and outputs (the three‑channel parameter image) share the same spatial dimensions, allowing the problem to be cast as an image‑to‑image translation task.
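This shape correspondence can be made concrete with a minimal sketch; the array sizes below are illustrative choices, not values taken from the paper:

```python
import numpy as np

M, H, W = 30, 64, 64              # replicates and grid size (illustrative)

# Input: M realizations of the field on the same grid, stacked like channels.
Y = np.random.randn(M, H, W)

# Output: the three parameter fields kappa^2, rho, theta as a 3-channel image.
Phi = np.zeros((3, H, W))

# Inputs and outputs share the same spatial dimensions, so estimating Phi
# from Y can be cast as an image-to-image translation problem.
assert Y.shape[1:] == Phi.shape[1:]
```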

To train such a system, the authors devise a synthetic data pipeline that reflects realistic geophysical variability. They define eight spatial “building‑block” patterns (coastlines, jet‑stream corridors, oceanic circulations, etc.) and sample hyper‑parameters from uniform priors to generate diverse κ², ρ, and θ fields. Each generated parameter field is encoded into a sparse SAR matrix B via a finite‑difference discretisation of the SPDE (∇·D(s)∇ – κ²(s))f(s)=W(s). Solving B y = e with white‑noise e yields a random field y having the desired covariance structure. By drawing M independent white‑noise vectors but using the same B, they create an input tensor Y ∈ ℝ^{M×H×W} that shares a common underlying parameter image Φ ∈ ℝ^{3×H×W}. This construction pairs multiple stochastic realizations of the same process with a single global parameter map, which is exactly the mapping the network is trained to recover.
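The mechanics of this pipeline can be sketched with a deliberately simplified stand‑in: a constant κ² and isotropic diffusion (D = I) on a small grid, in place of the paper’s spatially varying anisotropic operator. The discretisation and variable names here are assumptions for illustration only.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

H = W = 32
kappa2 = 0.5                                   # illustrative constant kappa^2

def lap1d(n):
    # 1-D second-difference operator with Dirichlet boundaries.
    return sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))

# 5-point Laplacian on the H x W grid, minus kappa^2 on the diagonal:
# a constant-coefficient finite-difference stand-in for the SPDE operator.
L = sp.kronsum(lap1d(H), lap1d(W))
B = (L - kappa2 * sp.identity(H * W)).tocsc()

rng = np.random.default_rng(0)
e = rng.standard_normal(H * W)                 # white-noise forcing
y = spsolve(B, e).reshape(H, W)                # one correlated realization
```

Drawing M independent vectors e while keeping B fixed yields the M replicates that share one parameter image.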

Three I2I architectures are evaluated: (1) a standard U‑Net with GELU activations and group normalisation, (2) a Vision Transformer (ViT) adapted for dense prediction (no classification token, 2‑D positional embeddings), and (3) a hybrid Conv‑Transformer model called STUN, derived from TransUNet but using the symmetric U‑Net encoder/decoder. For comparison, several local CNN estimators are implemented, varying receptive‑field sizes (9, 17, 25) and parameter counts (≈0.5–2.5 M). All networks are trained with batch size 64, using M ∈ {1, 5, 15, 30} replicates per sample; the main results use M = 30. Data augmentation is limited to translations and sign flips to preserve anisotropy orientation.
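The augmentation scheme mentioned above can be sketched as follows; the function name and details are illustrative assumptions. The same circular translation is applied to the field replicates and the parameter image, a sign flip of the field values leaves a zero‑mean Gaussian field’s distribution unchanged, and mirror flips are avoided because they would alter the orientation field θ.

```python
import numpy as np

def augment(Y, Phi, rng):
    """Y: (M, H, W) field replicates; Phi: (3, H, W) parameter image."""
    # Circular translation, applied identically to fields and parameters.
    dy = int(rng.integers(0, Y.shape[1]))
    dx = int(rng.integers(0, Y.shape[2]))
    Y = np.roll(Y, shift=(dy, dx), axis=(1, 2))
    Phi = np.roll(Phi, shift=(dy, dx), axis=(1, 2))
    # Sign flip of the field values only; the parameters are unaffected.
    if rng.random() < 0.5:
        Y = -Y
    return Y, Phi

rng = np.random.default_rng(0)
Y_aug, Phi_aug = augment(np.random.randn(5, 16, 16), np.zeros((3, 16, 16)), rng)
```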

Empirical results on the synthetic test set (160 samples) show that I2I models dramatically outperform local CNNs in both speed and accuracy. Because the whole field is processed in a single forward pass, inference time is roughly five times faster than the best local CNN, and the mean absolute error (MAE) on κ², ρ, and θ drops by 30 % or more. The hybrid STUN architecture consistently yields the lowest MAE, indicating that modest attention mechanisms improve performance without the heavy parameter overhead of a pure transformer. Importantly, I2I models remain robust when the number of replicates is reduced (e.g., M = 5), a scenario relevant for expensive Earth System Model (ESM) simulations where only a handful of runs are available.

The authors then apply LatticeVision to real ESM output fields. After estimating the non‑stationary SAR parameters with the trained I2I network, they feed these parameters into the LatticeKrig SAR simulator, generating thousands of synthetic climate fields in seconds—a stark contrast to the millions of core‑hours required for full ESM ensembles. The synthetic ensembles reproduce long‑range anisotropic correlations more faithfully than ensembles built from locally estimated parameters, demonstrating the practical advantage of global, image‑based estimation for downstream tasks such as uncertainty quantification and data assimilation.
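The speed of this simulation step comes from factoring the sparse SAR matrix once and reusing the factorization across all ensemble members. A hedged sketch, again using a constant‑coefficient operator as a stand‑in for the estimated non‑stationary one:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

H = W = 32

def lap1d(n):
    return sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))

# Stand-in for the SAR matrix built from the estimated parameter fields.
B = (sp.kronsum(lap1d(H), lap1d(W)) - 0.5 * sp.identity(H * W)).tocsc()

lu = splu(B)                                    # sparse LU: factor once
rng = np.random.default_rng(1)
n_members = 100
E = rng.standard_normal((H * W, n_members))     # independent white-noise draws
ensemble = lu.solve(E).reshape(H, W, n_members) # 100 correlated fields
```

Each additional member costs only a triangular solve, which is why thousands of synthetic fields can be generated in seconds.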

In summary, LatticeVision reframes non‑stationary SAR parameter inference as an image‑to‑image translation problem, leverages synthetic data that encode realistic geophysical priors, and shows that both fully convolutional and hybrid Conv‑Transformer I2I networks can learn to infer spatially varying parameters with superior speed and accuracy compared to existing local neural estimators. The framework opens the door to fast, physics‑informed emulation of large‑scale geoscientific fields, enabling scalable ensemble generation and improved statistical modeling of complex, non‑stationary spatial processes.

