Scalable non-separable spatio-temporal Gaussian process models for large-scale short-term weather prediction
Monitoring daily weather fields is critical for climate science, agriculture, and environmental planning, yet fully probabilistic spatio-temporal models become computationally prohibitive at continental scale. We present a case study on short-term forecasting of daily maximum temperature and precipitation across the conterminous United States using novel scalable spatio-temporal Gaussian process methodology. Building on three approximation families (inducing-point methods such as FITC, Vecchia approximations, and a hybrid Vecchia-inducing-point full-scale (VIF) approach), we introduce three extensions that address key bottlenecks in large space-time settings: (i) a scalable correlation-based neighbor selection strategy for Vecchia approximations with point-referenced data, enabling accurate conditioning under complex dependence structures, (ii) a space-time kMeans++ inducing-point selection algorithm, and (iii) GPU-accelerated implementations of computationally expensive operations, including matrix operations and neighbor searches. Using both synthetic experiments and a large NOAA station dataset containing more than one million space-time observations, we analyze the models with respect to predictive performance, parameter estimation, and computational efficiency. Our results demonstrate that scalable Gaussian process models can yield accurate continental-scale forecasts while remaining computationally feasible, offering practical tools for weather applications.
💡 Research Summary
The paper tackles the challenge of applying fully probabilistic, non‑separable spatio‑temporal Gaussian process (GP) models to continental‑scale short‑term weather forecasting. Using daily maximum temperature and precipitation observations from roughly 3,000 NOAA stations across the conterminous United States (about 1.7 million space‑time points over 608 days), the authors evaluate three scalable GP approximation families: (1) the Fully Independent Training Conditional (FITC) low‑rank inducing‑point method, (2) the Vecchia approximation that yields a sparse precision matrix by conditioning each observation on a small set of neighbors, and (3) a hybrid Vecchia‑Inducing‑Point Full‑scale (VIF) approach that combines low‑rank and sparse components.
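The core idea behind the Vecchia approximation described above is that the joint Gaussian density factors into a product of conditionals, each conditioning only on a small neighbor set. The sketch below illustrates this on 1-D locations with a simple exponential kernel (a stand-in for the paper's non-separable space-time kernel); the function names and the distance-based neighbor rule are illustrative, not the authors' code. With `m = n - 1` neighbors the approximation recovers the exact Gaussian log-likelihood.

```python
import numpy as np

def exp_cov(x, y, range_=1.0):
    """Exponential covariance on 1-D locations (illustrative stand-in
    for the paper's non-separable space-time kernel)."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / range_)

def vecchia_loglik(y, locs, m, cov_fn):
    """Vecchia approximation: factor the joint density into univariate
    conditionals, each conditioning on at most m previously ordered
    neighbors. Cost is O(n * m^3) instead of O(n^3)."""
    n = len(y)
    ll = 0.0
    for i in range(n):
        # m nearest previous points in the ordering (distance-based here)
        nb = np.argsort(np.abs(locs[:i] - locs[i]))[:m]
        if len(nb) == 0:
            mean, var = 0.0, cov_fn(locs[[i]], locs[[i]])[0, 0]
        else:
            c_in = cov_fn(locs[nb], locs[[i]])[:, 0]
            w = np.linalg.solve(cov_fn(locs[nb], locs[nb]), c_in)
            mean = w @ y[nb]                      # conditional mean
            var = cov_fn(locs[[i]], locs[[i]])[0, 0] - w @ c_in
        ll += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll
```

Because each conditional only involves an `m x m` solve, the factorization yields the sparse precision matrix mentioned above while keeping per-observation cost constant in `n`.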
Two methodological innovations address the specific bottlenecks of non‑separable space‑time modeling. First, a correlation‑based neighbor selection strategy replaces the traditional distance‑based nearest‑neighbor rule in Vecchia. For each observation, candidate neighbors are ranked by the actual covariance value (computed from the full non‑separable kernel) rather than Euclidean distance, ensuring that the conditioning set captures the most informative dependencies even when spatial proximity does not align with temporal correlation. Second, a space‑time separated kMeans++ algorithm (sts‑kMeans++) is introduced for inducing‑point placement. The algorithm first clusters locations and times separately, then applies the kMeans++ seeding scheme within each combined cluster, producing a set of inducing points that reflects both spatial density and temporal variability without manual tuning.
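The first innovation can be sketched concretely: under a non-separable kernel, the candidate with the highest covariance is not necessarily the one closest in Euclidean space-time distance. The example below uses a simplified Gneiting-type covariance (parameter names `a`, `c`, `alpha`, `beta` are illustrative, not the paper's fitted values) and ranks candidates by covariance with the target point.

```python
import numpy as np

def gneiting_cov(dh, du, a=10.0, c=1.0, alpha=0.8, beta=0.9, sigma2=1.0):
    """Simplified Gneiting-type non-separable covariance as a function of
    spatial distance dh and temporal lag du (illustrative parameters)."""
    psi = a * np.abs(du) ** (2.0 * alpha) + 1.0
    return sigma2 / psi * np.exp(-c * np.asarray(dh) / psi ** (beta / 2.0))

def correlation_neighbors(target, candidates, m, cov=gneiting_cov):
    """Rank candidate conditioning points (columns: x, y, t) by their
    covariance with the target rather than Euclidean space-time
    distance, and keep the top m."""
    dh = np.linalg.norm(candidates[:, :2] - target[:2], axis=1)
    du = candidates[:, 2] - target[2]
    return np.argsort(-cov(dh, du))[:m]
```

For instance, with a strong temporal decay a spatially distant but time-aligned point can carry more covariance than a co-located point one day away, so the covariance-based rule and the distance-based rule select different neighbors, which is exactly the situation the correlation-based strategy is designed to handle.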
All computationally intensive steps—construction of the B and D matrices in the Vecchia factorization, neighbor searches, and large matrix multiplications—are implemented on CUDA‑enabled GPUs. This yields order‑of‑magnitude speedups (10×–15×) compared with optimized CPU code, reducing the total training time for the full dataset to under two hours and enabling prediction at new points in a few seconds.
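To make the B and D matrices mentioned above concrete: Vecchia yields a factorization of the precision matrix as Sigma^{-1} ≈ B^T D^{-1} B, with B unit lower triangular and D diagonal. The sketch below assembles both from a covariance matrix and per-row conditioning sets; on a GPU the independent per-row solves are batched, while the CPU loop here keeps the logic explicit. This is an illustrative reconstruction, not the authors' CUDA implementation.

```python
import numpy as np

def vecchia_B_D(C, neighbors):
    """Assemble the unit lower-triangular B and diagonal D of the
    Vecchia factorization  Sigma^{-1} ~ B^T D^{-1} B  from a covariance
    matrix C and per-row conditioning sets. Each row requires only a
    small |neighbors[i]| x |neighbors[i]| solve; these solves are
    independent and therefore trivially batched on a GPU."""
    n = C.shape[0]
    B = np.eye(n)
    D = np.empty(n)
    for i, nb in enumerate(neighbors):
        if len(nb) == 0:
            D[i] = C[i, i]
            continue
        w = np.linalg.solve(C[np.ix_(nb, nb)], C[nb, i])
        B[i, nb] = -w                       # regression weights
        D[i] = C[i, i] - w @ C[nb, i]       # conditional variance
    return B, D
```

When every row conditions on all previous points, the factorization is exact, which provides a convenient correctness check for the batched GPU kernels.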
The statistical model comprises a structured mean component built from geographic covariates (elevation, distance to coast, slope, aspect, seasonal sine‑cosine bases, Köppen climate zones, etc.) and a latent GP residual with a Gneiting‑type non‑separable covariance function. Temperature is modeled with a Gaussian likelihood, while precipitation uses a zero‑censored, power‑transformed normal (ZC‑PTN) likelihood to handle the mixed discrete‑continuous distribution; a Laplace approximation is employed for inference.
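The zero-censored, power-transformed normal likelihood handles precipitation's point mass at zero plus a skewed continuous part. A minimal sketch, assuming the illustrative parameterization y = max(z, 0)^(1/lambda) for a latent Gaussian z (the paper's exact parameterization may differ): zeros contribute the censoring probability P(z <= 0), and positive values contribute a normal density with the change-of-variables Jacobian.

```python
import math

def zc_ptn_loglik(y, mu, sigma, lam):
    """Zero-censored power-transformed normal (ZC-PTN) log-likelihood
    for one observation, under the illustrative parameterization
    y = max(z, 0) ** (1 / lam) with latent z ~ N(mu, sigma^2)."""
    if y == 0.0:
        # point mass at zero: P(z <= 0) = Phi(-mu / sigma)
        return math.log(0.5 * math.erfc(mu / (sigma * math.sqrt(2.0))))
    z = y ** lam                                   # back-transform
    log_normal = (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                  - (z - mu) ** 2 / (2.0 * sigma ** 2))
    log_jacobian = math.log(lam) + (lam - 1.0) * math.log(y)  # |dz/dy|
    return log_normal + log_jacobian
```

The discrete mass at zero and the continuous density on y > 0 together integrate to one, which makes the mixed distribution a proper likelihood for the Laplace-approximation inference described above.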
Synthetic experiments confirm that all three approximations can recover the true non‑separable parameters (β, α, δ, etc.) with negligible bias, but VIF consistently yields the lowest mean‑squared error. In the real NOAA case study, predictive performance is assessed via RMSE for temperature, CRPS for both variables, log‑likelihood, and calibration plots. VIF outperforms FITC and pure Vecchia, achieving a 5‑7 % reduction in temperature RMSE and an 8 % improvement in precipitation CRPS, while maintaining well‑calibrated predictive distributions. Memory footprints are also favorable: VIF stays below 30 GB, whereas FITC and pure Vecchia require 45 GB and 38 GB respectively.
The authors conclude that scalable non‑separable GP modeling for large weather datasets is feasible when (i) neighbor sets are chosen based on actual covariance rather than distance, (ii) inducing points are placed using a data‑driven space‑time kMeans++ scheme, and (iii) GPU acceleration is leveraged for the core linear‑algebra operations. The hybrid VIF approach delivers the best trade‑off between statistical fidelity and computational efficiency, making it a practical tool for applications where numerical weather prediction outputs are unavailable, too coarse, or too costly to run, such as rapid decision support, exploratory climate analysis, and uncertainty quantification in environmental planning.