Sampling From Multiscale Densities With Delayed Rejection Generalized Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC) is the mainstay of applied Bayesian inference for differentiable models. However, HMC still struggles to sample from hierarchical models that induce densities with multiscale geometry: a large step size is needed to efficiently explore low-curvature regions, while a small step size is needed to accurately explore high-curvature regions. We introduce the delayed rejection generalized HMC (DR-G-HMC) sampler, which overcomes this challenge with dynamic step size selection inspired by differential equation solvers. In generalized HMC, each iteration performs a single leapfrog step; DR-G-HMC sequentially makes proposals with geometrically decreasing step sizes upon rejection of earlier proposals. This simulates Hamiltonian dynamics whose step size can adjust along a (stochastic) Hamiltonian trajectory to handle regions of high curvature. DR-G-HMC makes generalized HMC competitive by decreasing the number of rejections, which otherwise cause inefficient backtracking and prevent directed movement. We present experiments demonstrating that DR-G-HMC (1) correctly samples from multiscale densities, (2) makes generalized HMC methods competitive with the state-of-the-art No-U-Turn sampler, and (3) is robust to tuning parameters.
💡 Research Summary
The paper addresses a fundamental difficulty in Bayesian computation: sampling efficiently from posterior distributions that exhibit multiscale geometry, a situation commonly arising in hierarchical models where some directions are flat while others are sharply curved. Traditional Hamiltonian Monte Carlo (HMC) and its adaptive variant, the No‑U‑Turn Sampler (NUTS), rely on a single global step size (paired with an adaptively tuned trajectory length), which makes them either unstable in high‑curvature regions or wastefully slow in low‑curvature regions.
To overcome this, the authors propose Delayed‑Rejection Generalized Hamiltonian Monte Carlo (DR‑G‑HMC). The method builds on two prior ideas. First, delayed‑rejection (DR) techniques allow a sampler to make a second (or third, etc.) proposal after a rejection, using a different proposal kernel. Second, Generalized HMC (G‑HMC) replaces the full momentum resampling of standard HMC with a partial refresh and typically performs only a single leapfrog step per iteration. By marrying DR with G‑HMC, DR‑G‑HMC can adapt the step size within a single iteration: after a rejection, it retries from the same state with a smaller step size, rather than restarting an entire trajectory as in earlier DR‑HMC approaches.
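The partial momentum refresh that distinguishes G‑HMC from standard HMC can be sketched in a few lines. This is an illustrative sketch assuming an identity mass matrix M; the damping parameter γ follows the notation of this summary:

```python
import numpy as np

def partial_momentum_refresh(rho, gamma, rng):
    """Partial momentum refresh of generalized HMC (illustrative sketch).

    Draws rho' ~ N(sqrt(1 - gamma) * rho, gamma * M), here with an
    identity mass matrix M. gamma = 1 recovers the full refresh of
    standard HMC; small gamma retains most of the previous momentum,
    allowing directed motion to persist across iterations.
    """
    noise = rng.standard_normal(np.shape(rho))
    return np.sqrt(1.0 - gamma) * rho + np.sqrt(gamma) * noise
```

Because (√(1−γ))² + (√γ)² = 1, the refresh leaves the N(0, M) momentum distribution invariant for any γ ∈ (0, 1].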
Algorithmically, each iteration proceeds as follows: (1) partially refresh the momentum, ρ′ ∼ N(ρ√(1−γ), γM); (2) propose up to K candidates, where the k‑th candidate takes a single leapfrog step with step size ϵ_k = ϵ / r^{k−1} (geometric decay), followed by a momentum flip; (3) compute an acceptance probability α_k that extends the usual Metropolis‑Hastings ratio with a product of “ghost‑state” terms accounting for the probability of having rejected the previous k−1 proposals, evaluated from both the current state and the proposed one; (4) accept the first candidate that passes its test, or retain the current state if none do. Finally, the momentum is flipped to preserve detailed balance.
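The iteration above can be sketched compactly in Python. This is my own illustrative rendering for a unit mass matrix, not the authors' reference implementation: names and bookkeeping are mine, the ghost‑state product is computed by recursion (whose cost grows exponentially in k, tolerable for small K), and since each proposal is a deterministic involution (leapfrog step plus momentum flip), the Hastings ratio reduces to a density ratio times the ghost‑state factors:

```python
import numpy as np

def leapfrog(theta, rho, eps, grad_logp):
    # One leapfrog step of Hamiltonian dynamics (unit mass matrix).
    rho = rho + 0.5 * eps * grad_logp(theta)
    theta = theta + eps * rho
    rho = rho + 0.5 * eps * grad_logp(theta)
    return theta, rho

def accept_prob(theta, rho, k, eps0, r, logp, grad_logp):
    """Acceptance probability of the (k+1)-th delayed-rejection proposal.

    Multiplies the Metropolis ratio by ghost-state factors (1 - alpha_j)
    for the k hypothetical earlier rejections, evaluated from both the
    current and the proposed state. Returns (alpha, theta_p, rho_p).
    """
    joint = lambda t, p: logp(t) - 0.5 * p @ p
    theta_p, rho_p = leapfrog(theta, rho, eps0 / r**k, grad_logp)
    rho_p = -rho_p  # momentum flip makes the proposal an involution
    log_num = joint(theta_p, rho_p)
    log_den = joint(theta, rho)
    for j in range(k):  # ghost-state corrections
        a_fwd, _, _ = accept_prob(theta_p, rho_p, j, eps0, r, logp, grad_logp)
        a_bwd, _, _ = accept_prob(theta, rho, j, eps0, r, logp, grad_logp)
        if a_fwd >= 1.0:  # reverse chain would have accepted earlier
            return 0.0, theta_p, rho_p
        log_num += np.log1p(-a_fwd)
        log_den += np.log1p(-a_bwd)
    ratio = log_num - log_den
    return (1.0 if ratio >= 0 else float(np.exp(ratio))), theta_p, rho_p

def drghmc_step(theta, rho, K, eps0, r, gamma, logp, grad_logp, rng):
    """One DR-G-HMC iteration: partial refresh, up to K proposals with
    geometrically shrinking step sizes, and a final momentum flip."""
    rho = np.sqrt(1 - gamma) * rho + np.sqrt(gamma) * rng.standard_normal(rho.shape)
    for k in range(K):
        alpha, theta_p, rho_p = accept_prob(theta, rho, k, eps0, r, logp, grad_logp)
        if rng.uniform() < alpha:
            theta, rho = theta_p, rho_p
            break
    # Final flip: forward motion after an accept (undoing the proposal's
    # flip), momentum reversal (backtracking) after K rejections.
    return theta, -rho
```

Note that rejections remain cheap when the first proposal succeeds, since ghost‑state recursion is only entered for k ≥ 1.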
The geometric decay of step sizes is crucial: a large ϵ explores flat, high‑volume regions quickly, while progressively smaller ϵ’s provide accurate integration in narrow, high‑curvature “neck” regions. This dynamic per‑iteration step‑size selection mimics adaptive ODE solvers but incurs only modest overhead, because ghost‑state calculations are needed only when a rejection occurs, which is rare outside high‑curvature regions.
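The step‑size stability issue driving this design is easy to demonstrate on a one‑dimensional Gaussian with a small scale, a toy stand‑in for a funnel neck (the numbers below are illustrative, not from the paper). Leapfrog on a Gaussian with scale σ is stable only for step sizes below roughly 2σ:

```python
# Leapfrog on a "stiff" 1D Gaussian target with scale sigma = 0.01.
# The integrator is stable only for step sizes below roughly 2 * sigma;
# above that threshold the energy explodes -- exactly the regime in
# which DR-G-HMC retries with a geometrically smaller step.
sigma = 0.01

def energy_after(eps, n_steps=10, x=0.005, p=1.0):
    grad_logp = lambda x: -x / sigma**2
    for _ in range(n_steps):
        p += 0.5 * eps * grad_logp(x)
        x += eps * p
        p += 0.5 * eps * grad_logp(x)
    return 0.5 * p**2 + 0.5 * x**2 / sigma**2

# energy_after(0.05) blows up; energy_after(0.05 / 4**2) stays bounded,
# illustrating two rounds of step-size reduction with factor r = 4.
```

A step size of 0.05 would be fine in a flat region of a multiscale density but is 5σ here; two geometric reductions by r = 4 bring it safely under the 2σ stability limit.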
Empirical evaluation focuses on three samplers: DR‑G‑HMC, the earlier DR‑HMC, and NUTS. All experiments are budgeted by a fixed number of gradient evaluations (10⁶), reflecting the dominant computational cost. Test problems include Neal’s funnel (the canonical multiscale benchmark), hierarchical linear regression, and a stiff ODE‑based model. Performance is measured by absolute standardized error rather than effective sample size, because ESS can be misleading when chains fail to explore the target fully. Across all tasks, DR‑G‑HMC achieves lower error than both baselines, demonstrating (i) superior handling of multiscale curvature, (ii) fewer trajectory reversals than G‑HMC (which suffers from momentum flips on rejection), and (iii) robustness to hyperparameter choices. Sensitivity analyses show that varying K (3–5), the reduction factor r (2–3), and the damping γ (0.1–0.9) have minimal impact on accuracy, indicating that the method does not require delicate tuning.
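To make the choice of metric concrete, the sketch below (my own toy example, not from the paper; the paper's exact convention may differ in details) computes an absolute standardized error, |Ê[θ] − E[θ]| / sd[θ], and shows that a chain that never enters half of the target is flagged immediately, whereas autocorrelation‑based ESS need not reveal the missing mass:

```python
import numpy as np

def abs_standardized_error(draws, true_mean, true_sd):
    """Absolute standardized error of the posterior-mean estimate,
    |mean(draws) - true_mean| / true_sd, computed per parameter.
    (One common definition; details may differ from the paper's.)"""
    est = np.asarray(draws).mean(axis=0)
    return np.abs(est - np.asarray(true_mean)) / np.asarray(true_sd)

rng = np.random.default_rng(0)
good = rng.standard_normal((10_000, 1))  # explores all of N(0, 1)
bad = np.abs(good)                       # never enters theta < 0
err_good = abs_standardized_error(good, [0.0], [1.0])  # near zero
err_bad = abs_standardized_error(bad, [0.0], [1.0])    # near E|Z| = sqrt(2/pi)
```

The biased chain's error sits near √(2/π) ≈ 0.8 standard deviations no matter how many draws it produces, which is the failure mode that raw ESS can hide.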
Beyond performance, the paper contributes a rigorous proof of detailed balance for the delayed‑rejection scheme in the G‑HMC context, including derivations of the acceptance probability and discussion of the exponential growth of ghost‑state terms. The authors argue that, because additional proposals are only generated in difficult regions, the overall computational cost remains comparable to NUTS while delivering higher acceptance rates and better mixing.
The authors release open‑source code and outline future extensions: per‑dimension adaptive step sizes, integration with Riemannian metrics for even richer curvature adaptation, and scalable parallel implementations. In summary, DR‑G‑HMC offers a practical, theoretically sound, and empirically validated solution for sampling from multiscale densities, bridging the gap between the robustness of HMC and the flexibility of adaptive integrators without incurring prohibitive computational overhead.