Uncertainty-aware data assimilation through variational inference
Data assimilation, which combines a dynamical model with a set of noisy and incomplete observations to infer the state of a system over time, involves uncertainty in most settings. Building upon an existing deterministic machine learning approach, we propose a variational inference-based extension in which the predicted state follows a multivariate Gaussian distribution. Using the chaotic Lorenz-96 dynamics as a testing ground, we show that our new model yields nearly perfectly calibrated predictions, and that it can be integrated into a wider variational data assimilation pipeline to achieve greater benefit from increasing lengths of data assimilation windows. Our code is available at https://github.com/anthony-frion/Stochastic_CODA.
💡 Research Summary
This paper introduces a probabilistic extension of the deterministic CODA (Combined Optimization of Dynamics and Assimilation) framework for data assimilation, enabling the model to output a full multivariate Gaussian posterior for the system state rather than a single point estimate. The authors replace the original CODA network Gθ with a variational version that returns a mean vector μₜ and a diagonal covariance Σₜ (parameterized by σₜ). To train this stochastic model, they adapt the original unsupervised loss (which combined observation reconstruction and self‑consistency) into a variational objective (Equation 6). The first term remains a mean‑squared error between the observations and model‑propagated samples drawn from qₜ = N(μₜ, Σₜ). The second term approximates the Kullback‑Leibler divergence between the distribution obtained by propagating qₜ forward through the dynamics for h steps (qₜ→ₜ₊ₕ) and the variational posterior at the future time qₜ₊ₕ. Because the density of qₜ→ₜ₊ₕ cannot be evaluated, the authors add the entropy of the forward‑propagated distribution as a surrogate, yielding a loss that matches the future posterior while controlling the sharpness of the predicted distribution. A hyper‑parameter λ weights this entropy term: λ = 0 collapses the variance to zero (recovering deterministic behavior), while larger λ values increase the predicted spread and can lead to under‑confident predictions if not tuned.
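The objective described above can be sketched as follows. This is a minimal NumPy reconstruction from the textual description, not the authors' implementation: the exact form of Equation 6 (sample counts, masking convention, and how the entropy surrogate enters) is assumed, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_entropy(log_var):
    # Entropy of a diagonal Gaussian: 0.5 * sum(log(2*pi*e) + log sigma^2)
    return 0.5 * np.sum(np.log(2 * np.pi * np.e) + log_var)

def gaussian_log_density(x, mu, log_var):
    # Log-density of a diagonal Gaussian N(mu, diag(exp(log_var))) at x
    return -0.5 * np.sum(np.log(2 * np.pi) + log_var
                         + (x - mu) ** 2 / np.exp(log_var))

def variational_loss(y_t, mask_t, mu_t, log_var_t, mu_th, log_var_th,
                     dynamics, h, lam, n_samples=8):
    """Sketch of the unsupervised variational objective:
    masked observation misfit on reparameterized samples, plus a
    cross-entropy consistency term against the future posterior,
    minus lam times an entropy surrogate."""
    sigma_t = np.exp(0.5 * log_var_t)
    recon, consistency = 0.0, 0.0
    for _ in range(n_samples):
        x = mu_t + sigma_t * rng.standard_normal(mu_t.shape)  # reparameterized sample
        recon += np.mean(mask_t * (x - y_t) ** 2)             # masked misfit to observations
        for _ in range(h):                                    # propagate h steps forward
            x = dynamics(x)
        # Cross-entropy part of KL(q_{t->t+h} || q_{t+h})
        consistency += -gaussian_log_density(x, mu_th, log_var_th)
    recon /= n_samples
    consistency /= n_samples
    # Entropy surrogate: with lam = 0 the variance collapses to zero;
    # larger lam encourages more spread.
    return recon + consistency - lam * gaussian_entropy(log_var_t)
```

In practice the gradients of such a loss flow through the reparameterized samples, so the same structure carries over directly to an autodiff framework.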
Experiments are conducted on the chaotic Lorenz‑96 model (n = 40, forcing F = 8). Observations are generated by randomly masking 75 % of the variables at each time step and adding standard Gaussian noise to the remaining variables. Three dataset sizes are used: 10⁴, 3 × 10⁵, and 3 × 10⁶ time steps. The authors evaluate predictive performance with the Continuous Ranked Probability Score (CRPS), the spread‑skill ratio (SSRA_T), and the spread‑skill reliability (SSREL). Results show that the variational CODA achieves the lowest CRPS (down to 0.168) and near‑perfect calibration (SSRA_T ≈ 1.00, SSREL ≈ 0.01) when trained on the largest dataset. Dropout‑augmented CODA and an ensemble of five dropout models perform competitively on small datasets but lag behind the variational approach on larger data, exhibiting higher SSREL and SSRA_T values, indicating poorer uncertainty calibration.
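The experimental setup above can be reproduced in a few lines. The sketch below assumes a standard RK4 integration of Lorenz‑96 with a typical step size of dt = 0.05 (the paper's actual step size is not stated in this summary) and includes the common ensemble estimator of the CRPS; function names are illustrative.

```python
import numpy as np

def lorenz96_tendency(x, F=8.0):
    # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with cyclic indices
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    # One fourth-order Runge-Kutta step of the Lorenz-96 dynamics
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def make_observations(traj, rng, obs_frac=0.25, noise_std=1.0):
    # Keep ~25% of variables per step (75% masked) and add unit Gaussian noise
    mask = rng.random(traj.shape) < obs_frac
    noisy = traj + noise_std * rng.standard_normal(traj.shape)
    return np.where(mask, noisy, np.nan), mask

def crps_ensemble(ens, y):
    # Ensemble CRPS estimator for a scalar target: E|X - y| - 0.5 E|X - X'|
    term1 = np.mean(np.abs(ens - y))
    term2 = np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - 0.5 * term2
```

A perfect deterministic forecast has CRPS 0, and for a Gaussian predictive distribution the spread‑skill ratio compares the ensemble spread to the RMSE, which is what the SSRAT ≈ 1.00 figure above reflects.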
Beyond standalone forecasting, the trained variational CODA is incorporated into a weak‑constraint 4D‑Var assimilation scheme. The standard 4D‑Var cost function J includes observation misfit, model error, and optional background (β) and foreground (γ) prior terms that penalize deviations from the CODA‑provided mean and covariance at the start and end of the assimilation window. Four variants are tested: (i) nearest‑observation initialization without priors, (ii) CODA mean initialization without priors, (iii) CODA initialization with a background prior (β = 1), and (iv) CODA initialization with both background and foreground priors (β = γ = 1). Across assimilation windows ranging from 1 000 to 100 000 time steps, the variants that use CODA priors (especially the combined background‑foreground case) consistently achieve lower mean‑squared error than the baseline and than pure 4D‑Var without CODA information. This demonstrates that a well‑calibrated probabilistic forecast can serve as an effective prior, improving the convergence and accuracy of variational data assimilation, particularly for long windows (in these experiments the dynamical model is assumed to be perfectly known).
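A weak‑constraint 4D‑Var cost of the kind described above can be sketched as follows. This is a generic reconstruction under assumed error variances, not the paper's exact cost; the `mu_b`/`var_b` and `mu_f`/`var_f` arguments stand in for the CODA‑provided mean and variance at the window endpoints, and all names are hypothetical.

```python
import numpy as np

def wc4dvar_cost(states, obs, mask, dynamics, obs_var=1.0, model_var=0.01,
                 mu_b=None, var_b=None, beta=0.0,
                 mu_f=None, var_f=None, gamma=0.0):
    """Weak-constraint 4D-Var cost over a window of states with shape (T, n).
    beta / gamma weight optional background / foreground prior terms built
    from a probabilistic forecast at the first and last window states."""
    J = 0.0
    # Observation misfit (nansum skips masked-out entries stored as NaN)
    J += 0.5 * np.nansum(mask * (states - obs) ** 2) / obs_var
    # Model error: mismatch between each state and the propagated previous one
    pred = np.array([dynamics(s) for s in states[:-1]])
    J += 0.5 * np.sum((states[1:] - pred) ** 2) / model_var
    # Background prior at the start of the window (variant iii and iv)
    if beta > 0 and mu_b is not None:
        J += 0.5 * beta * np.sum((states[0] - mu_b) ** 2 / var_b)
    # Foreground prior at the end of the window (variant iv)
    if gamma > 0 and mu_f is not None:
        J += 0.5 * gamma * np.sum((states[-1] - mu_f) ** 2 / var_f)
    return J
```

In a full pipeline this cost would be minimized over the whole window of states with a gradient‑based optimizer, with the priors anchoring the otherwise weakly constrained endpoints.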
In summary, the paper makes three key contributions: (1) a novel variational loss that fuses observation reconstruction with forward‑propagation consistency, enabling unsupervised training of a stochastic data‑assimilation network; (2) empirical evidence that the resulting model provides calibrated uncertainty estimates on a chaotic benchmark, outperforming dropout‑based stochastic methods and ensembles; and (3) a demonstration that the learned probabilistic model can be seamlessly integrated as a prior in a classical 4D‑Var framework, yielding superior assimilation performance for long observation windows. Suggested future work includes extending beyond diagonal covariances, exploring non‑Gaussian posterior families, and applying the methodology to real oceanic or atmospheric observation datasets.