Convergence Bounds for Sequential Monte Carlo on Multimodal Distributions using Soft Decomposition
We prove bounds on the variance of a function $f$ under the empirical measure of the samples obtained by the Sequential Monte Carlo (SMC) algorithm, with time complexity depending on local rather than global Markov chain mixing dynamics. SMC is a Markov Chain Monte Carlo (MCMC) method, which starts by drawing $N$ particles from a known distribution, and then, through a sequence of distributions, re-weights and re-samples the particles, at each instance applying a Markov chain for smoothing. In principle, SMC tries to alleviate problems from multi-modality. However, most theoretical guarantees for SMC are obtained by assuming global mixing time bounds, which are only efficient in the uni-modal setting. We show that bounds can be obtained in the truly multi-modal setting, with mixing times that depend only on local MCMC dynamics.
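The draw–reweight–resample–smooth loop described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's algorithm or its constants: the geometric tempering path, the bimodal 1-D target, and all tuning choices (particle count, number of temperatures, Metropolis step size) are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_ref(x):
    # Log-density (up to a constant) of the reference N(0, 4^2).
    return -0.5 * (x / 4.0) ** 2

def log_target(x):
    # Log-density (up to a constant) of a bimodal mixture of
    # N(-3, 0.5^2) and N(3, 0.5^2) with equal weights.
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    return np.logaddexp(a, b)

def smc(n=2000, n_temps=21, mh_iters=10, step=0.4):
    # SMC along the geometric path pi_b proportional to ref^(1-b) * target^b.
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.normal(0.0, 4.0, size=n)  # i.i.d. draws from the reference
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Re-weight by the incremental change in the tempered density.
        logw = (b - b_prev) * (log_target(x) - log_ref(x))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Multinomial resampling.
        x = x[rng.choice(n, size=n, p=w)]
        # Smoothing: a few random-walk Metropolis steps targeting pi_b.
        logpi = lambda y: (1.0 - b) * log_ref(y) + b * log_target(y)
        for _ in range(mh_iters):
            prop = x + step * rng.normal(size=n)
            accept = np.log(rng.uniform(size=n)) < logpi(prop) - logpi(x)
            x = np.where(accept, prop, x)
    return x

samples = smc()
```

Because resampling copies particles in proportion to their weights, both modes of the target retain a particle population even though a single random-walk chain would take a very long time to cross between them.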
💡 Research Summary
The paper addresses a fundamental limitation of existing theoretical guarantees for Sequential Monte Carlo (SMC) methods when applied to multimodal target distributions. Traditional analyses rely on global mixing time bounds of the underlying Markov chain, which become prohibitively large when the chain must traverse low‑probability regions separating distinct modes. Consequently, the theoretical advantage of SMC over plain MCMC is obscured, despite empirical evidence that SMC can preserve particles in each mode through re‑weighting and resampling.
To overcome this, the authors introduce a “soft decomposition” framework. At each intermediate distribution µ_k in the SMC sequence, they assume that µ_k can be expressed as a finite mixture
µ_k = Σ_{i=1}^{M_k} w_i^{(k)} µ_i^{(k)} ,
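As a concrete illustration (our example, not one from the paper), a symmetric bimodal Gaussian target fits this template with two components: each component is strongly log-concave and hence satisfies a Poincaré/log-Sobolev inequality with a good constant, even though the mixture as a whole mixes slowly when the modes are far apart:

```latex
\mu_k \;=\; \tfrac{1}{2}\,\mathcal{N}(-a,\sigma^2) \;+\; \tfrac{1}{2}\,\mathcal{N}(a,\sigma^2),
\qquad M_k = 2,\quad w_1^{(k)} = w_2^{(k)} = \tfrac{1}{2}.
```

A Markov chain started in one mode equilibrates quickly within it (local mixing), while crossing to the other mode takes time that grows rapidly with the separation $a^2/\sigma^2$ — exactly the gap between local and global mixing that the framework exploits.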
with positive weights summing to one. Moreover, the generator L_k of the Markov kernel P_k used for smoothing satisfies a weighted Dirichlet‑form inequality:
⟨f, L_k f⟩_{µ_k} ≤ Σ_i w_i^{(k)} ⟨f, L_{k,i} f⟩_{µ_i^{(k)}} ,
where L_{k,i} is the generator of a Markov process that is natural for the i‑th component (e.g., Langevin dynamics for a Gaussian component, Metropolis–Hastings for a discrete component). This condition holds for many standard kernels, including Langevin diffusion, Glauber dynamics, and Metropolis random walk, and thus does not restrict the algorithm to “hard” partitions of the state space.
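To see why such an inequality is natural, consider Langevin diffusion: its Dirichlet form $-\langle f, Lf\rangle$ is a plain integral against the measure, so it splits exactly over the mixture components. This is a sketch under the assumption that each $L_{k,i}$ is the Langevin generator associated with $\mu_i^{(k)}$:

```latex
-\langle f, L_k f\rangle_{\mu_k}
= \int \|\nabla f\|^2 \, d\mu_k
= \sum_{i=1}^{M_k} w_i^{(k)} \int \|\nabla f\|^2 \, d\mu_i^{(k)}
= \sum_{i=1}^{M_k} w_i^{(k)} \Big( -\langle f, L_{k,i} f\rangle_{\mu_i^{(k)}} \Big).
```

Here the condition even holds with equality; for kernels such as Glauber dynamics or Metropolis random walk one gets an inequality rather than an identity.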
The core technical contribution is a variance decomposition that separates the total variance of the empirical estimator η_N^n(f) into intra‑mode and inter‑mode components. The intra‑mode part is bounded using local Poincaré (or log‑Sobolev) inequalities satisfied by each component µ_i^{(k)}: if each component obeys a log‑Sobolev inequality with constant c_{k,i}, then the intra‑mode variance decays at a rate governed by the local spectral gap λ_{k,i} and the local log‑Sobolev constant c_{k,i}. The inter‑mode part is controlled by the smallest mixture weight w_* = min_{k,i} w_i^{(k)}; intuitively, modes with tiny weight require more particles to achieve a given accuracy, which is reflected in a factor w_*^{-1} in the sample‑size bound.
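The intra-/inter-mode split can be read as the law of total variance for a mixture: conditioning on which component a sample comes from gives, in the notation above,

```latex
\mathrm{Var}_{\mu_k}(f)
= \underbrace{\sum_{i=1}^{M_k} w_i^{(k)} \,\mathrm{Var}_{\mu_i^{(k)}}(f)}_{\text{intra-mode}}
\;+\;
\underbrace{\sum_{i=1}^{M_k} w_i^{(k)} \Big(\mathbb{E}_{\mu_i^{(k)}}[f] - \mathbb{E}_{\mu_k}[f]\Big)^2}_{\text{inter-mode}}.
```

The first term is the one controlled by the local functional inequalities; the second depends on how well the particle population tracks the mixture weights, which is where w_* enters the bound.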
Putting these pieces together, the authors prove a non‑asymptotic bound (Theorem 3) stating that for any ε>0, if the number of particles N satisfies
N ≥ n·max{ 4γM/(w_* ε), 128γ/(35·8·M^{7/4}·w_*^{15/8}) }
and each smoothing step runs for a time
t_k ≥ 2 C_k^* γ / 7 ,
then the mean‑squared error of the empirical estimate η_N^n(f) is at most ε.