Online Bernstein-von Mises theorem


Online learning is an inferential paradigm in which parameters are updated incrementally from sequentially available data, in contrast to batch learning, where the entire dataset is processed at once. In this paper, we assume that mini-batches from the full dataset become available sequentially. The Bayesian framework, which updates beliefs about unknown parameters after observing each mini-batch, is naturally suited for online learning. At each step, we update the posterior distribution using the current prior and new observations, with the updated posterior serving as the prior for the next step. However, this recursive Bayesian updating is rarely computationally tractable unless the model and prior are conjugate. When the model is regular, the updated posterior can be approximated by a normal distribution, as justified by the Bernstein-von Mises theorem. We adopt a variational approximation at each step and investigate the frequentist properties of the final posterior obtained through this sequential procedure. Under mild assumptions, we show that the accumulated approximation error becomes negligible once the mini-batch size exceeds a threshold depending on the parameter dimension. As a result, the sequentially updated posterior is asymptotically indistinguishable from the full posterior.


💡 Research Summary

Problem Setting and Motivation
The paper addresses a fundamental challenge in Bayesian online learning: how to update posterior distributions sequentially when data arrive in mini‑batches, while keeping the computation tractable. Exact Bayesian updating quickly becomes infeasible for non‑conjugate models because the posterior does not belong to a simple family. A common remedy is to approximate the updated posterior at each step with a variational distribution (typically a multivariate Gaussian) by minimizing the Kullback‑Leibler (KL) divergence to the true posterior. However, repeated approximations raise the question of error accumulation: does the final variational posterior remain close to the true posterior that would be obtained by processing the whole dataset at once?
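As a concrete illustration of this scheme (not the paper's exact algorithm), one common choice of Gaussian approximation is a Laplace-style update: at each step, the mini-batch posterior is approximated by a Gaussian centered at its mode with precision equal to the Hessian of the negative log posterior, and that Gaussian serves as the prior for the next mini-batch. A minimal NumPy sketch for logistic regression, with the model, prior, and all tuning choices assumed purely for illustration:

```python
import numpy as np

def laplace_update(prior_mean, prior_prec, X, y, n_newton=25):
    """One sequential step: a Gaussian prior N(prior_mean, prior_prec^{-1})
    is combined with a logistic-regression mini-batch (X, y), and the updated
    posterior is re-approximated by a Gaussian (Laplace approximation),
    which becomes the prior for the next step."""
    theta = prior_mean.astype(float).copy()
    for _ in range(n_newton):
        p = 1.0 / (1.0 + np.exp(-X @ theta))                  # fitted probabilities
        grad = X.T @ (p - y) + prior_prec @ (theta - prior_mean)
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + prior_prec
        theta -= np.linalg.solve(hess, grad)                  # Newton step toward the MAP
    # Gaussian approximation: mean = MAP, precision = Hessian at the MAP
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    post_prec = X.T @ (X * (p * (1.0 - p))[:, None]) + prior_prec
    return theta, post_prec

# Sequential pass over T mini-batches, each approximation feeding the next
rng = np.random.default_rng(0)
d, T, n = 3, 5, 200
theta_true = np.array([1.0, -0.5, 0.25])                      # illustrative truth
mean, prec = np.zeros(d), np.eye(d)                           # N(0, I) initial prior
for _ in range(T):
    X = rng.normal(size=(n, d))
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)
    mean, prec = laplace_update(mean, prec, X, y)
```

Because the negative log posterior here is strictly convex (the prior precision regularizes the Hessian), the Newton iteration converges rapidly; the question the paper studies is how the approximation error of each such Gaussian step accumulates over many updates.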

Model and Notation

  • The full dataset \(D = (Y_1,\dots,Y_N)\) is partitioned into \(T\) mini-batches of equal size \(n\) (so \(N = nT\)).
  • The parameter space is \(\Theta \subset \mathbb{R}^p\) with true value \(\theta_0\).
  • At step \(t\), the exact posterior after observing batch \(D_t\) is
    \[
    \pi_t(\theta) \propto p(D_t \mid \theta)\, \pi_{t-1}(\theta),
    \]
    where \(\pi_0\) denotes the prior.
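When the model and prior are conjugate, this recursion is available in closed form and sequential updating reproduces the full-data posterior exactly. A toy Gaussian-mean example (assumed here purely for illustration, not taken from the paper) makes this concrete:

```python
import numpy as np

# Conjugate sanity check: for Y_i ~ N(theta, sigma^2) with known sigma^2
# and a Gaussian prior on theta, the recursion pi_t ∝ p(D_t | theta) pi_{t-1}
# stays Gaussian, so sequential and batch posteriors coincide.
def gaussian_mean_update(prior_mean, prior_prec, batch, sigma2=1.0):
    post_prec = prior_prec + batch.size / sigma2
    post_mean = (prior_prec * prior_mean + batch.sum() / sigma2) / post_prec
    return post_mean, post_prec

rng = np.random.default_rng(1)
batches = [rng.normal(loc=2.0, size=50) for _ in range(4)]   # T = 4, n = 50

# Sequential: each posterior becomes the next prior (pi_0 = N(0, 1))
m, P = 0.0, 1.0
for D_t in batches:
    m, P = gaussian_mean_update(m, P, D_t)

# Batch: process all N = 200 observations at once
m_full, P_full = gaussian_mean_update(0.0, 1.0, np.concatenate(batches))
```

Outside the conjugate case no such closed form exists, which is why each step must instead be approximated, here by a Gaussian variational distribution, and why the accumulation of those per-step errors becomes the central question.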
