Efficient Online Variational Estimation via Monte Carlo Sampling
This article addresses online variational estimation in parametric state-space models. We propose a new procedure for efficiently computing the evidence lower bound and its gradient in a streaming-data setting, where observations arrive sequentially. The algorithm allows for the simultaneous training of the model parameters and the distribution of the latent states given the observations. It is based on i.i.d. Monte Carlo sampling, coupled with a well-chosen deep architecture, enabling both computational efficiency and flexibility. The performance of the method is illustrated on both synthetic data and real-world air-quality data. The proposed approach is theoretically motivated by the existence of an asymptotic contrast function and the ergodicity of the underlying Markov chain, and applies more generally to the computation of additive expectations under posterior distributions in state-space models.
💡 Research Summary
The paper tackles the problem of online learning in parametric state‑space models (SSMs) where observations arrive sequentially. Traditional online approaches fall into two categories: sequential Monte‑Carlo (SMC) smoothing, which suffers from the curse of dimensionality, and variational inference (VI) methods that maximize the evidence lower bound (ELBO). Existing VI‑based online algorithms, such as Campbell et al. (2021), require solving a regression problem at each time step to approximate conditional expectations, incurring substantial computational overhead and lacking solid convergence guarantees.
The authors propose a new framework that (i) imposes a backward-factorized structure on the variational family, mirroring the true backward kernels of the smoothing distribution, and (ii) shares a set of deep neural-network parameters across time to define the variational distributions recursively. Specifically, each marginal variational density q_t^ϕ belongs to an exponential family (Gaussian in the experiments) with natural parameter η_t = f_ϕ(a_t), where the auxiliary state a_t evolves deterministically via a_t = A_ϕ(a_{t-1}, y_t). The backward kernels are defined as q_{t-1|t}^ϕ(x_t, x_{t-1}) ∝ q_{t-1}^ϕ(x_{t-1}) ψ_t^ϕ(x_{t-1}, x_t), with ψ_t^ϕ taking an exponential-family form that yields analytically tractable normalising constants. This construction enables straightforward i.i.d. sampling from q_t^ϕ and importance-weight computation without any inner optimisation loop.
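The recursive construction above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the linear maps standing in for the deep networks A_ϕ and f_ϕ, the dimensions, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: latent dim d, observation dim p, auxiliary state dim k.
d, p, k = 2, 2, 8

# Shared parameters ϕ (simple linear maps standing in for deep networks).
W_a = rng.normal(scale=0.1, size=(k, k))   # recurrence on a_{t-1}
W_y = rng.normal(scale=0.1, size=(k, p))   # input map for y_t
W_mu = rng.normal(scale=0.1, size=(d, k))  # a_t -> Gaussian mean
W_ls = rng.normal(scale=0.1, size=(d, k))  # a_t -> Gaussian log-std

def update_a(a_prev, y_t):
    """Deterministic auxiliary-state recursion a_t = A_phi(a_{t-1}, y_t)."""
    return np.tanh(W_a @ a_prev + W_y @ y_t)

def marginal_params(a_t):
    """Parameters eta_t = f_phi(a_t) of the Gaussian marginal q_t."""
    return W_mu @ a_t, np.exp(W_ls @ a_t)  # mean, std

def sample_marginal(a_t, n):
    """I.i.d. sampling from q_t -- no inner optimisation loop needed."""
    mu, sigma = marginal_params(a_t)
    return mu + sigma * rng.normal(size=(n, d))

a = np.zeros(k)
for y_t in rng.normal(size=(5, p)):   # a toy observation stream
    a = update_a(a, y_t)
    x = sample_marginal(a, n=100)     # N i.i.d. draws from the current marginal
```

Because a_t is a deterministic function of a_{t-1} and y_t, each new observation updates the whole variational marginal in O(1) network evaluations.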
From a theoretical standpoint, the paper introduces the Contrast Lower Bound (COLBO) ℓ(θ, ϕ) = lim_{t→∞} t⁻¹ L_{θ,ϕ,t}, where L_{θ,ϕ,t} is the ELBO at time t. The authors prove that ℓ(θ, ϕ) ≤ λ(θ) almost surely, where λ(θ) is the asymptotic per-time-step log-likelihood, and that ℓ is differentiable with a well-defined gradient. By interpreting the long-run average as an expectation under a suitably constructed Markov chain, they justify the use of Robbins-Monro stochastic approximation to solve ∇_{θ,ϕ} ℓ(θ, ϕ) = 0.
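In display form, the two quantities compared above are (writing λ(θ) as the asymptotic per-time-step log-likelihood, consistent with the summary):

```latex
\ell(\theta,\phi) \;=\; \lim_{t\to\infty} \frac{1}{t}\,\mathcal{L}_{\theta,\phi,t},
\qquad
\ell(\theta,\phi) \;\le\; \lambda(\theta)
\;=\; \lim_{t\to\infty} \frac{1}{t}\,\log p_\theta(y_{0:t})
\quad \text{a.s.}
```

Maximising ℓ over (θ, ϕ) therefore tightens a per-time-step lower bound on the asymptotic log-likelihood, in direct analogy with ELBO maximisation in the batch setting.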
The resulting algorithm, named Recursive Monte-Carlo Variational Inference (RMCVI), proceeds as follows at each time step t: (1) draw N i.i.d. samples {x_t^{(i)}} from the current variational marginal q_t^ϕ; (2) compute importance weights w_i ∝ p_θ(x_t^{(i)}, y_t) / q_t^ϕ(x_t^{(i)}), where p_θ denotes the product of the transition density m_t and the observation density g_t; (3) estimate the additive contribution h_t(x_{t-1}, x_t) = log [ m_{t-1}(x_{t-1}, x_t) g_t(x_t, y_t) / q_{t-1|t}^ϕ(x_t, x_{t-1}) ] via weighted averages; (4) update the ELBO and its gradients using the recursive formulas of Proposition 4.1; (5) perform a Robbins-Monro step with a diminishing step size γ_t to update both θ and ϕ. No regression or functional approximation is required, so the computational cost per iteration is O(N) and the method is amenable to GPU parallelisation.
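The five steps above can be sketched for a toy one-dimensional linear-Gaussian model. This is a hedged illustration of the sampling and weighting pattern only, not the paper's algorithm: the model, the densities, the crude parameter update, and all names (`rmcvi_step`, `phi_coef`, etc.) are assumptions for the example, and the recursive gradient formulas of Proposition 4.1 are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D linear-Gaussian SSM standing in for m_t and g_t; values illustrative.
phi_coef, sigma_x, sigma_y = 0.9, 1.0, 0.5
LOG_2PI = np.log(2 * np.pi)

def log_m(x_prev, x):   # transition density m_t(x_{t-1}, x_t)
    return -0.5 * (((x - phi_coef * x_prev) / sigma_x) ** 2 + LOG_2PI) - np.log(sigma_x)

def log_g(x, y):        # observation density g_t(x_t, y_t)
    return -0.5 * (((y - x) / sigma_y) ** 2 + LOG_2PI) - np.log(sigma_y)

def log_q(x, mu, s):    # current Gaussian variational marginal q_t
    return -0.5 * (((x - mu) / s) ** 2 + LOG_2PI) - np.log(s)

def rmcvi_step(mu_t, s_t, x_prev, y_t, gamma, n=500):
    # (1) draw N i.i.d. samples from the current variational marginal
    x = mu_t + s_t * rng.normal(size=n)
    # (2) self-normalised importance weights w_i ~ p(x_i, y_t) / q(x_i)
    log_w = log_m(x_prev, x) + log_g(x, y_t) - log_q(x, mu_t, s_t)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # (3) plain Monte Carlo estimate of the additive ELBO increment E_q[h_t];
    #     the weights w would feed the posterior expectations in the gradients
    elbo_inc = log_w.mean()
    # (5) a crude Robbins-Monro-style update with diminishing step gamma
    #     (here: pull mu toward the self-normalised posterior mean)
    mu_new = mu_t + gamma * (np.sum(w * x) - mu_t)
    return elbo_inc, w, mu_new

elbo_inc, w, mu_new = rmcvi_step(mu_t=0.0, s_t=1.0, x_prev=0.2, y_t=0.5, gamma=0.1)
```

Each call touches every sample exactly once, which is the O(N) per-iteration cost noted above; the elementwise operations map directly onto GPU kernels.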
The authors provide rigorous convergence results: under standard diminishing-step conditions (∑_t γ_t = ∞, ∑_t γ_t² < ∞) and assuming finite variance of the importance weights, the parameter sequence {(θ_t, ϕ_t)} converges almost surely to a stationary point of ℓ. Moreover, the Monte-Carlo estimators of the ELBO and its gradients are unbiased in the limit N → ∞.
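For concreteness, a standard family of step sizes satisfying both conditions (an illustrative choice, not one prescribed by the paper) is the polynomially decaying schedule:

```latex
\gamma_t \;=\; \gamma_0\, t^{-\alpha},
\qquad \alpha \in \left(\tfrac{1}{2},\, 1\right],
```

since \(\sum_t t^{-\alpha}\) diverges for \(\alpha \le 1\) while \(\sum_t t^{-2\alpha}\) converges for \(\alpha > \tfrac{1}{2}\).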
Empirical evaluation includes synthetic linear‑Gaussian, nonlinear, and high‑dimensional SSMs, as well as a real‑world streaming air‑quality dataset (PM2.5 measurements from five sensors in Paris). Across all experiments, RMCVI outperforms SMC‑RML and the regression‑based online VI of Campbell et al. in three key metrics: (a) parameter estimation mean‑squared error (15–30 % reduction), (b) KL divergence between the true smoothing distribution and the variational approximation (often below 0.05), and (c) processing speed (2–3× faster, achieving >3000 samples per second on a single GPU). Notably, in high‑dimensional settings (>10 latent dimensions) the SMC methods either diverge or exhaust memory, whereas RMCVI remains stable.
In summary, the paper delivers a theoretically grounded, computationally efficient online variational learning algorithm for state‑space models. By leveraging a backward‑factorized variational family, shared deep‑network parametrisation, and simple importance sampling, it circumvents the costly regression step of prior work and provides Robbins‑Monro convergence guarantees. The work opens avenues for extensions to non‑normalized latent structures, integration with reinforcement‑learning policies, and distributed multi‑stream learning, promising a broad impact on real‑time inference in complex dynamical systems.