EVODiff: Entropy-aware Variance Optimized Diffusion Inference


Diffusion models (DMs) excel in image generation but suffer from slow inference and training-inference discrepancies. Although gradient-based solvers for DMs accelerate denoising inference, they often lack theoretical foundations in information transmission efficiency. In this work, we introduce an information-theoretic perspective on the inference processes of DMs, revealing that successful denoising fundamentally reduces conditional entropy in reverse transitions. This principle leads to our key insights into the inference processes: (1) data prediction parameterization outperforms its noise counterpart, and (2) optimizing conditional variance offers a reference-free way to minimize both transition and reconstruction errors. Based on these insights, we propose an entropy-aware variance optimized method for the generative process of DMs, called EVODiff, which systematically reduces uncertainty by optimizing conditional entropy during denoising. Extensive experiments on DMs validate our insights and demonstrate that our method significantly and consistently outperforms state-of-the-art (SOTA) gradient-based solvers. For example, compared to the DPM-Solver++, EVODiff reduces the reconstruction error by up to 45.5% (FID improves from 5.10 to 2.78) at 10 function evaluations (NFE) on CIFAR-10, cuts the NFE cost by 25% (from 20 to 15 NFE) for high-quality samples on ImageNet-256, and improves text-to-image generation while reducing artifacts. Code is available at https://github.com/ShiguiLi/EVODiff.


💡 Research Summary

EVODiff introduces an information‑theoretic framework for diffusion model (DM) inference that explicitly targets the reduction of conditional entropy during the reverse denoising process. The authors first observe that each reverse transition p(x_t | x_{t+1}) can be approximated as a Gaussian distribution, whose conditional entropy H(x_t | x_{t+1}) is proportional to the log‑determinant of the conditional covariance matrix. Consequently, minimizing the conditional variance directly minimizes conditional entropy, which in turn is equivalent to minimizing the mean‑squared reconstruction error.
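The entropy–variance link can be checked numerically: for a d-dimensional Gaussian, H = ½ log((2πe)^d det Σ), so shrinking the conditional covariance strictly lowers the conditional entropy. A minimal NumPy sketch (the covariance values here are illustrative, not taken from the paper):

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a multivariate Gaussian:
    H = 0.5 * log((2*pi*e)^d * det(cov))."""
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

d = 4
base = np.eye(d)            # unit isotropic covariance
shrunk = 0.5 * np.eye(d)    # "optimized" (smaller) conditional variance

# Smaller conditional variance -> smaller log-determinant -> lower entropy.
print(gaussian_entropy(base) > gaussian_entropy(shrunk))  # True
```

Because entropy depends on the covariance only through its log-determinant, any variance reduction translates directly into an entropy reduction, which is the quantity EVODiff optimizes.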

Two key theoretical insights emerge from this formulation. First, data‑prediction parameterization (predicting x₀ directly) is provably more effective than the traditional noise‑prediction (ε‑prediction) for reducing both conditional entropy and reconstruction error. Theorem 3.4 demonstrates that, under the usual independence assumptions of diffusion training, data‑prediction bypasses the error‑prone conversion from the predicted noise ε to the data estimate x₀ and thus yields a tighter bound on the expected error. Second, optimizing the conditional variance provides a reference‑free mechanism to simultaneously lower transition error (the error incurred by the numerical ODE/SDE solver) and reconstruction error (the distance between the generated sample and the true data point). This insight explains why recent gradient‑based solvers, which implicitly adjust step sizes, achieve better performance than first‑order methods such as DDIM.
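The advantage of data prediction at high‑noise timesteps can be illustrated with the forward relation x_t = α_t x₀ + σ_t ε: recovering x₀ from a noise estimate amplifies any estimation error by σ_t/α_t, whereas an error in a direct data prediction enters x̂₀ unamplified. A small sketch with illustrative α_t, σ_t values (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

alpha_t, sigma_t = 0.30, 0.95        # a high-noise timestep (sigma/alpha >> 1)
x0 = rng.normal(size=1000)
eps = rng.normal(size=1000)
x_t = alpha_t * x0 + sigma_t * eps   # forward diffusion sample

# Noise-prediction route: a small error in eps-hat is amplified by sigma/alpha
# when converting back via x0-hat = (x_t - sigma_t * eps-hat) / alpha_t.
eps_err = 0.01
x0_from_eps = (x_t - sigma_t * (eps + eps_err)) / alpha_t

# Data-prediction route: the same-magnitude error enters x0-hat directly.
x0_direct = x0 + 0.01

amp = np.abs(x0_from_eps - x0).mean() / np.abs(x0_direct - x0).mean()
print(round(amp, 2))   # 3.17, i.e. sigma_t / alpha_t
```

The amplification factor σ_t/α_t grows with the noise level, which is exactly where most sampling steps of a fast solver are spent.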

Building on these insights, the paper proposes EVODiff, an entropy‑aware variance‑optimized inference algorithm. For each timestep i, the method computes a step size h_i = κ(t_{i‑1}) − κ(t_i) based on the log‑signal‑to‑noise ratio κ(t). It then updates the latent state using a data‑prediction ODE term g(·) together with a variance‑driven correction term that incorporates a dynamically estimated conditional variance B_θ(t_i). Two scaling coefficients, ζ_i and η_i, are derived from a closed‑form optimality condition (Eq. 25) and modulate the strength of the variance correction. The update rule can be written as:

 x_{t_{i‑1}} = g(x_{t_i}) + (σ_{t_{i‑1}}/σ_{t_i}) h_i² ζ_i B_θ(t_i)

where g(x_{t_i}) = (σ_{t_{i‑1}}/σ_{t_i}) x_{t_i} + (σ_{t_{i‑1}}/σ_{t_i}) h_i x_θ(x_{t_i}, t_i) corresponds to the standard Euler or DDIM step for data prediction. By reducing the step size |h_i| relative to the single‑step case, the algorithm achieves a finer discretization of the reverse diffusion trajectory, which, according to Proposition 3.2, yields a larger reduction in conditional entropy. The authors further show that popular solvers such as DPM‑Solver and the Heun method in EDM can be interpreted as specific instances of this entropy‑reduction framework (Proposition 3.3).
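One update of this kind might be sketched as follows. This is a minimal illustration of the structure described above, not the paper's implementation: `log_snr`, `x_theta`, `B_theta`, `zeta`, and `sigma` are placeholders for the model's noise schedule, the pretrained data‑prediction network, the conditional variance estimate, the coefficient from Eq. 25, and the noise scale, and the exact coefficients of the paper's update may differ.

```python
import numpy as np

def log_snr(t):
    """Placeholder log-SNR schedule kappa(t); a real sampler would use
    the model's own schedule, e.g. kappa(t) = log(alpha_t / sigma_t)."""
    return -np.log(t)

def evodiff_step(x, t_cur, t_prev, x_theta, B_theta, zeta, sigma):
    """One hypothetical EVODiff-style update: a data-prediction ODE step
    g(.) plus a variance-driven correction term."""
    h = log_snr(t_prev) - log_snr(t_cur)                  # step size h_i
    ratio = sigma(t_prev) / sigma(t_cur)
    # Base DDIM-like step g(x_{t_i}) under data-prediction parameterization.
    g = ratio * x + ratio * h * x_theta(x, t_cur)
    # Entropy-aware correction scaled by the conditional variance estimate.
    return g + ratio * (h ** 2) * zeta * B_theta(x, t_cur)
```

Note that `B_theta` is called once per step, matching the paper's remark that estimating the conditional variance costs one extra network evaluation.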

Extensive experiments validate the theoretical claims. On CIFAR‑10, EVODiff attains an FID of 2.78 with only 10 function evaluations (NFE), a 45.5 % improvement over the baseline DPM‑Solver++ (FID = 5.10). On ImageNet‑256, the method reduces the required NFE for high‑quality samples from 20 to 15, cutting inference cost by 25 % while preserving comparable FID scores. Similar gains are observed on LSUN‑Bedrooms (FID = 7.91 vs. 13.97 at 5 NFE) and CelebA‑64. Ablation studies comparing the entropy‑reduction (RE) variant against a traditional finite‑difference (FD) gradient approach demonstrate consistently lower FID and higher mutual information between successive latent states, confirming that RE more effectively drives information flow. In text‑to‑image generation using a pretrained Stable Diffusion model, EVODiff reduces visual artifacts and improves text‑image alignment, showcasing its applicability beyond unconditional image synthesis.
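The mutual-information comparison in the ablation has a simple closed form in the jointly Gaussian case, I(X; Y) = −½ log(1 − ρ²), so more strongly correlated successive latents carry more information per transition. A toy check (the correlation values are illustrative, not measurements from the paper):

```python
import numpy as np

def gaussian_mi(rho):
    """Mutual information (in nats) between two jointly Gaussian scalars
    with correlation rho: I(X; Y) = -0.5 * log(1 - rho^2)."""
    return -0.5 * np.log(1.0 - rho ** 2)

# A sampler that preserves more signal between successive latent states
# (higher correlation) transmits more information per denoising step.
print(gaussian_mi(0.9) > gaussian_mi(0.5))   # True
```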

The paper also discusses practical considerations. Estimating the conditional variance B_θ(t_i) requires an additional forward pass through the diffusion network, incurring modest computational overhead. However, the overall inference speed remains favorable because the reduced NFE compensates for the extra cost. The authors acknowledge that the current variance estimator assumes isotropic Gaussian noise and that extending the framework to anisotropic or learned covariance structures is an open direction.

In summary, EVODiff provides a principled, entropy‑driven alternative to existing diffusion solvers. By explicitly minimizing conditional entropy through variance optimization and by leveraging data‑prediction parameterization, it achieves faster convergence and higher sample fidelity across a range of benchmarks. This work bridges a gap between information theory and diffusion inference, offering a new lens through which future generative modeling research can be viewed and improved.

