Inference-Time Alignment for Diffusion Models via Variationally Stable Doob's Matching

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Inference-time alignment for diffusion models aims to adapt a pre-trained reference diffusion model toward a target distribution without retraining the reference score network, thereby preserving the generative capacity of the reference model while enforcing desired properties at inference time. A central mechanism for achieving such alignment is guidance, which modifies the sampling dynamics through an additional drift term. In this work, we introduce variationally stable Doob’s matching, a novel framework for provable guidance estimation grounded in Doob’s $h$-transform. Our approach formulates guidance as the gradient of the logarithm of an underlying Doob’s $h$-function and employs gradient-regularized regression to estimate the $h$-function and its gradient simultaneously, resulting in a consistent estimator of the guidance. Theoretically, we establish non-asymptotic convergence rates for the estimated guidance. Moreover, we analyze the resulting controllable diffusion processes and prove non-asymptotic convergence guarantees for the generated distributions in the 2-Wasserstein distance. Finally, we show that variationally stable guidance estimators are adaptive to unknown low dimensionality, effectively mitigating the curse of dimensionality under low-dimensional subspace assumptions.


💡 Research Summary

This paper addresses the problem of adapting a pre‑trained diffusion model to a new target distribution at inference time, without any fine‑tuning of the underlying score network. The authors focus on “guidance”, an additional drift term that modifies the reverse‑time stochastic differential equation (SDE) of the diffusion process. Building on Doob’s h‑transform, they observe that the score of the tilted target distribution q₀(x) = w(x)p₀(x)/Z can be decomposed as ∇log q₀(x) = ∇log p₀(x) + ∇log h(x), where the h‑function h(x) = E_{X_T∼p_T}
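
The score decomposition for the tilted target can be checked numerically on a toy case. The sketch below is illustrative only and is not taken from the paper: it assumes a one-dimensional reference p₀ = N(0, 1) and an exponential tilt w(x) = exp(a·x), for which q₀ = w·p₀/Z is exactly N(a, 1). It works at time zero only, where the guidance term is ∇log w (under the usual Doob's h-transform convention the h-function reduces to w at time zero; this is an assumption here, since the definition above is cut off). The names `a`, `log_p0`, `log_w`, `log_q0`, and `grad` are hypothetical helpers introduced for this example.

```python
import numpy as np

# Assumed toy setup (not from the paper): reference p0 = N(0, 1) and an
# exponential tilt w(x) = exp(a * x), so the tilted target q0 is N(a, 1).
a = 1.5


def log_p0(x):
    # Log-density of the standard normal reference distribution.
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)


def log_w(x):
    # Log of the (assumed) tilting weight w(x) = exp(a * x).
    return a * x


def log_q0(x):
    # log q0 = log w + log p0 - log Z, with Z = E_{p0}[w] = exp(a^2 / 2).
    return log_w(x) + log_p0(x) - 0.5 * a**2


def grad(f, x, eps=1e-5):
    # Central finite difference; adequate for these smooth 1-D functions.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


xs = np.linspace(-3.0, 3.0, 13)
score_q0 = grad(log_q0, xs)   # score of the tilted target
score_p0 = grad(log_p0, xs)   # score of the reference
guidance = grad(log_w, xs)    # extra drift term: gradient of log w

# Gap between the target score and (reference score + guidance).
max_gap = np.max(np.abs(score_q0 - (score_p0 + guidance)))
# Gap between the target score and the closed-form score of N(a, 1).
closed_form_gap = np.max(np.abs(score_q0 - (-(xs - a))))
print(max_gap, closed_form_gap)
```

Both gaps should be at the level of floating-point rounding. The normalizing constant Z drops out because it does not depend on x, which is why the guidance can be written purely as a log-gradient. In this toy case the guidance is the constant a; the general setting described above requires estimating it, which this sketch does not attempt.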

