Diffusion differentiable resampling
This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). We propose a new informative resampling method that is instantly differentiable, based on an ensemble score diffusion model. We theoretically prove that our diffusion resampling method provides a consistent resampling distribution, and we show empirically that it outperforms the state-of-the-art differentiable resampling methods on multiple filtering and parameter estimation benchmarks. Finally, we show that it achieves competitive end-to-end performance when used in learning a complex dynamics-decoder model with high-dimensional image observations.
💡 Research Summary
This paper addresses a fundamental bottleneck in sequential Monte Carlo (SMC) and state‑space modeling: the resampling step, which converts a weighted particle set into an unweighted one, is inherently discrete and thus obstructs gradient‑based learning. Existing differentiable resampling approaches either rely on expectation‑level gradients (e.g., score‑function/REINFORCE estimators), which suffer from high variance, or on continuous relaxations such as soft‑resampling and Gumbel‑Softmax, which introduce bias and require a delicate trade‑off between statistical fidelity and gradient accuracy. Optimal‑transport‑based schemes provide a principled alternative but are computationally expensive (quadratic in the number of particles) and depend sensitively on the entropy‑regularisation strength.
The authors propose a novel “diffusion resampling” paradigm that makes the resampling operation instantly differentiable by leveraging an ensemble score diffusion model. Starting from a forward Langevin SDE that drives particles toward a reference distribution π_ref, they consider the time‑reversed SDE whose marginal at a terminal time T exactly matches the target distribution π. The key insight is that the intractable score ∇log p_t can be approximated without training by an importance‑weighted ensemble of the current weighted particles (Equation 6). This ensemble score s_N(x,t) inherits the consistency properties of importance sampling: as the number of particles N → ∞, s_N converges pointwise to the true score.
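To make the ensemble-score idea concrete, here is a minimal sketch of what such an importance-weighted score could look like. It assumes an Ornstein–Uhlenbeck forward SDE dX = -X dt + √2 dW toward a standard-normal reference, under which the forward marginal p_t is a weighted Gaussian mixture centred at the shrunk particles; the function name and this specific choice of forward process are illustrative assumptions, not the paper's exact Equation 6.

```python
import numpy as np

def ensemble_score(x, t, particles, weights):
    """Importance-weighted ensemble approximation of the score grad log p_t.

    Illustrative sketch: assumes the OU forward SDE dX = -X dt + sqrt(2) dW,
    so p_t is a mixture sum_i w_i N(x; exp(-t) X_i, (1 - exp(-2t)) I).
    `particles` is (N, d), `weights` is (N,) and sums to one.
    """
    a = np.exp(-t)                       # OU shrinkage factor
    var = 1.0 - np.exp(-2.0 * t)         # OU marginal variance
    means = a * particles                # (N, d) mixture centres
    sq = np.sum((x - means) ** 2, axis=1)
    logw = np.log(weights) - 0.5 * sq / var
    r = np.exp(logw - logw.max())
    r /= r.sum()                         # mixture responsibilities at x
    # Score of a Gaussian mixture: sum_i r_i * (means_i - x) / var
    return (r @ means - x) / var
```

As N grows, the weighted mixture converges to the true forward marginal of the target, which is the consistency property the summary describes.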
Algorithm 1 implements the reverse SDE using a simple Euler–Maruyama discretisation; the only source of randomness is re‑parameterisable Gaussian noise, guaranteeing that automatic differentiation frameworks can compute pathwise gradients ∂X*_i/∂θ directly. Moreover, the ensemble score can be expressed as the gradient of a Doob h‑function, ensuring that the induced transport map is unbiased.
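A single Euler–Maruyama step of such a reverse SDE can be sketched as follows. The reverse drift shown here is the one induced by the OU forward SDE dX = -X dt + √2 dW (an assumption for illustration, not necessarily the paper's exact discretisation); the key point is that the Gaussian noise `eps` is supplied explicitly, so the step is a deterministic, differentiable function of its inputs and pathwise gradients flow through it under automatic differentiation.

```python
import numpy as np

def reverse_em_step(x, t, dt, score_fn, eps):
    """One Euler-Maruyama step of the time-reversed OU SDE (sketch).

    For the forward SDE dX = -X dt + sqrt(2) dW, the reverse process
    follows dY = [Y + 2 * score(Y, t)] ds + sqrt(2) dW, where t is the
    remaining forward time. `eps ~ N(0, I)` is passed in explicitly
    (reparameterisation), so the step is differentiable end to end.
    """
    drift = x + 2.0 * score_fn(x, t)
    return x + drift * dt + np.sqrt(2.0 * dt) * eps
```

Iterating this step from samples of the reference distribution down to t = 0 yields the unweighted resampled particles.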
When integrated into an SMC pipeline (Algorithm 2), diffusion resampling offers two practical advantages. First, the reference distribution π_ref can be chosen adaptively based on the current particle ensemble. The authors advocate a mean‑reverting Gaussian whose mean and covariance are the weighted empirical moments of the particles, which dramatically reduces the required diffusion time T compared with a static standard normal reference. Second, with a Gaussian π_ref the reverse SDE becomes semi‑linear, allowing the use of exponential integrators (Jentzen & Kloeden) that accelerate sampling and improve numerical stability.
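The adaptive Gaussian reference described above only requires the weighted empirical moments of the current ensemble. A minimal sketch (the function name and the jitter term are illustrative assumptions):

```python
import numpy as np

def weighted_reference_moments(particles, weights):
    """Weighted empirical mean and covariance of the particle ensemble.

    These moments define a mean-reverting Gaussian reference pi_ref
    adapted to the current particles; a small diagonal jitter (an
    assumption here) keeps the covariance positive definite.
    """
    mu = weights @ particles                      # (d,) weighted mean
    centred = particles - mu
    cov = centred.T @ (weights[:, None] * centred)
    cov += 1e-6 * np.eye(particles.shape[1])      # numerical jitter
    return mu, cov
```

Because the reference is already close to the particle cloud, the forward diffusion has little work to do, which is why a short diffusion horizon T suffices compared with a static standard-normal reference.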
Theoretical contributions include a Wasserstein‑distance error bound that quantifies how the resampling error decays with N, and a proof that the method is consistent (unbiased) in the large‑sample limit. Empirically, diffusion resampling outperforms soft‑resampling, Gumbel‑Softmax, and recent OT‑Sinkhorn schemes on a suite of benchmarks: (i) filtering tasks such as robot localization and stochastic volatility models, where mean‑squared error improves by 15–30%; (ii) gradient‑based parameter estimation, where log‑likelihood estimates are more accurate by 0.2–0.5 dB; and (iii) high‑dimensional image‑sequence decoding, where reconstruction loss drops by more than 10% and runtime scales linearly with N, achieving a 5× speed‑up over OT‑based methods.
Computationally, the ensemble score evaluation costs O(N) work per particle (with O(log N) depth under a parallel reduction), and the use of exponential integrators further reduces the number of required time steps. The method therefore retains the simplicity of standard SMC while providing a fully differentiable, statistically sound resampling operation that can be dropped into existing pipelines without modifying the underlying model or proposal mechanisms.
In conclusion, diffusion differentiable resampling introduces a structurally differentiable and statistically consistent resampling map by harnessing the reverse dynamics of diffusion processes. It bridges the gap between gradient‑compatible learning and accurate particle approximation, opening new possibilities for end‑to‑end training of complex dynamical systems, deep state‑space models, and other applications where SMC is a core component. Future work may explore non‑Gaussian references, mixture‑based diffusion kernels for multimodal targets, and integration with reinforcement learning or variational inference frameworks.