Lookahead Path Likelihood Optimization for Diffusion LLMs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Diffusion Large Language Models (dLLMs) support arbitrary-order generation, yet their inference performance critically depends on the unmasking order. Existing strategies rely on heuristics that greedily optimize local confidence, offering limited guidance for identifying unmasking paths that are globally consistent and accurate. To bridge this gap, we introduce path log-likelihood (Path LL), a trajectory-conditioned objective that strongly correlates with downstream accuracy and enables principled selection of unmasking paths. To optimize Path LL at inference time, we propose POKE, an efficient value estimator that predicts the expected future Path LL of a partial decoding trajectory. We then integrate this lookahead signal into POKE-SMC, a Sequential Monte Carlo-based search framework for dynamically identifying optimal unmasking paths. Extensive experiments across 6 reasoning tasks show that POKE-SMC consistently improves accuracy, achieving 2%–3% average gains over strong decoding-time scaling baselines at comparable inference overhead on LLaDA models and advancing the accuracy–compute Pareto frontier.


💡 Research Summary

Diffusion Large Language Models (dLLMs) break the left‑to‑right constraint of traditional autoregressive generators by framing text generation as a discrete denoising process. While this enables parallel decoding and arbitrary‑order token unmasking, the actual unmasking order dramatically influences downstream performance. Existing inference strategies rely on simple heuristics—such as selecting the most confident tokens, the lowest‑entropy tokens, or those with the largest margin between top‑k probabilities—to decide which positions to unmask at each step. Empirical evidence in the paper shows that these static heuristics perform inconsistently across domains (code, math, planning), indicating that a one‑size‑fits‑all rule is insufficient.
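The three local heuristics mentioned above can be sketched as scoring rules over the model's per-position token distributions. The following is a minimal illustration, not the paper's implementation; the function names and the greedy single-position selection are assumptions for clarity.

```python
import numpy as np

def heuristic_scores(probs, rule="confidence"):
    """Score each masked position under a local unmasking heuristic.

    probs: (num_masked, vocab_size) array of token distributions,
           one row per still-masked position.
    Higher score = unmask earlier.
    """
    if rule == "confidence":      # top-1 probability
        return probs.max(axis=-1)
    if rule == "neg_entropy":     # lowest-entropy position first
        return (probs * np.log(probs + 1e-12)).sum(axis=-1)
    if rule == "margin":          # gap between the top-2 probabilities
        top2 = np.sort(probs, axis=-1)[:, -2:]
        return top2[:, 1] - top2[:, 0]
    raise ValueError(f"unknown rule: {rule}")

def pick_position(probs, rule="confidence"):
    """Greedy step: unmask the single highest-scoring position."""
    return int(np.argmax(heuristic_scores(probs, rule)))
```

Because each rule looks only at the current step's distributions, none of them can trade a locally confident token for a globally better unmasking order, which is the gap Path LL is meant to address.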

The authors introduce Path Log‑Likelihood (Path LL) as a new, trajectory‑conditioned objective. For a given unmasking trajectory τ = (Q_T,…,Q_1) and a completed token sequence x, Path LL is defined as the sum of log‑probabilities of the tokens revealed at each step:
Path LL(τ, x) = ∑_{t=T}^{1} ∑_{i ∈ Q_t} log p_θ(x_i ∣ x_{Q_{>t}})

where Q_{>t} = Q_T ∪ ⋯ ∪ Q_{t+1} is the set of positions already unmasked before step t, and x_{Q_{>t}} is the corresponding partially revealed sequence on which the model conditions.
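The trajectory-conditioned sum described above can be accumulated step by step. Below is a minimal sketch: `logprob_fn` is a hypothetical stand-in for a dLLM forward pass that returns the model's log-probability of one token given the tokens revealed so far, and the trajectory is listed first-revealed first.

```python
import math

def path_log_likelihood(trajectory, logprob_fn):
    """Accumulate Path LL over an unmasking trajectory.

    trajectory: list of steps; each step is a list of (position, token)
      pairs revealed at that step, in the order they are unmasked.
    logprob_fn(revealed, position, token): log-probability the model
      assigns to `token` at `position` given the dict of already-revealed
      tokens (hypothetical interface for illustration).
    """
    revealed = {}  # position -> token revealed in earlier steps
    total = 0.0
    for step in trajectory:
        # every token in this step is conditioned on the same partial sequence
        for pos, tok in step:
            total += logprob_fn(dict(revealed), pos, tok)
        # only after scoring do the step's tokens join the context
        for pos, tok in step:
            revealed[pos] = tok
    return total
```

Because the conditioning context grows as the trajectory unfolds, two trajectories that reveal the same final sequence x can receive different Path LL values, which is what makes the quantity a usable signal for ranking unmasking orders.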

