Hierarchical Diffusion Motion Planning with Task-Conditioned Uncertainty-Aware Priors

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

We propose a novel hierarchical diffusion planner that embeds task and motion structure directly into the noise model. Unlike standard diffusion-based planners that rely on zero-mean, isotropic Gaussian corruption, we introduce task-conditioned structured Gaussians whose means and covariances are derived from Gaussian Process Motion Planning (GPMP), explicitly encoding trajectory smoothness and task semantics in the prior. We first generalize the standard diffusion process to biased, non-isotropic corruption with closed-form forward and posterior expressions. Building on this formulation, our hierarchical design separates prior instantiation from trajectory denoising. At the upper level, the model predicts sparse, task-centric key states and their associated timings, which instantiate a structured Gaussian prior (mean and covariance). At the lower level, the full trajectory is denoised under this fixed prior, treating the upper-level outputs as noisy observations. Experiments on Maze2D goal-reaching and KUKA block stacking show consistently higher success rates and smoother trajectories than isotropic baselines, reaching dataset-level smoothness substantially earlier during training. Ablation studies further show that explicitly structuring the corruption process provides benefits beyond conditioning the denoising network alone. Overall, our approach concentrates the prior’s probability mass near feasible and semantically meaningful trajectories. Our project page is available at https://hta-diffusion.github.io.


💡 Research Summary

The paper introduces a hierarchical diffusion planner that embeds task and motion structure directly into the noise model, moving beyond the conventional zero‑mean isotropic Gaussian corruption used in most diffusion‑based motion planners. The authors first generalize the forward diffusion process to a biased, non‑isotropic Gaussian with task‑conditioned mean ξ and covariance K, preserving closed‑form forward marginals and reverse posteriors. This formulation ensures that, as the number of diffusion steps increases, the terminal distribution converges to N(ξ, K), a prior that already reflects task‑specific information.
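To make the biased, non‑isotropic corruption concrete, the sketch below samples from a forward marginal of the assumed form q(x_t | x_0) = N(√ᾱ_t x_0 + (1 − √ᾱ_t) ξ, (1 − ᾱ_t) K), which interpolates between the data at t = 0 and N(ξ, K) as ᾱ_t → 0. The function name, the `alpha_bar` schedule, and this exact parameterization are illustrative assumptions consistent with the summary, not the paper's verbatim equations.

```python
import numpy as np

def structured_forward_sample(x0, t, alpha_bar, xi, K_chol, rng):
    """Sample x_t ~ q(x_t | x_0) under a biased, non-isotropic forward process.

    Assumed marginal (a sketch, not the paper's exact formula):
        x_t = sqrt(ab)*x0 + (1 - sqrt(ab))*xi + sqrt(1 - ab) * L @ eps,
    where K = L L^T (K_chol = L) and ab = alpha_bar[t]. As ab -> 0 the
    terminal distribution converges to N(xi, K); at ab = 1 we recover x0.
    """
    ab = alpha_bar[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + (1.0 - np.sqrt(ab)) * xi + np.sqrt(1.0 - ab) * (K_chol @ eps)
```

Passing the Cholesky factor of K (rather than K itself) keeps sampling at O(d²) per draw and avoids repeated factorizations inside the training loop.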

The hierarchical architecture consists of two levels. The upper level predicts a sparse set of key states (e.g., start, goal, waypoints, contact events) together with a binary selection matrix that encodes their timestamps. These key states are treated as soft observations with an associated observation covariance K_y, allowing the model to express confidence without imposing hard constraints. Starting from Gaussian Process Motion Planning (GPMP), the authors construct an unconditioned linear‑time‑varying (LTV) GP prior over the full trajectory. Conditioning this GP on the soft key‑state observations yields a task‑conditioned posterior with closed‑form mean ξ and covariance K (Equations 8‑9). The resulting prior has low variance near the key states and higher variance elsewhere, encouraging smooth interpolation while respecting the uncertainty of each anchor.
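The conditioning step above is standard Gaussian conditioning, and a minimal sketch is shown below. The function name and argument layout are assumptions for illustration; the paper's Equations 8‑9 are taken to have this Kalman‑gain form, with S the binary selection matrix and K_y the upper level's observation covariance.

```python
import numpy as np

def condition_gp_on_keystates(mu, K_prior, S, y, K_y):
    """Condition an LTV-GP trajectory prior N(mu, K_prior) on soft key-state
    observations y = S x + v, with v ~ N(0, K_y).

    S is a binary selection matrix picking the key-state timesteps; K_y
    encodes how confident the upper level is in each anchor. Returns the
    task-conditioned posterior mean xi and covariance K.
    """
    # Kalman-style gain: trades off prior smoothness against anchor confidence.
    G = K_prior @ S.T @ np.linalg.inv(S @ K_prior @ S.T + K_y)
    xi = mu + G @ (y - S @ mu)          # posterior mean pulled toward key states
    K = K_prior - G @ S @ K_prior       # posterior covariance shrinks near anchors
    return xi, K
```

As K_y → 0 the anchors behave like hard constraints (posterior variance collapses at those timesteps); as K_y grows the posterior reverts to the unconditioned prior, which is exactly the soft-anchoring behavior the ablations motivate.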

The lower level performs reverse diffusion starting from the structured prior N(ξ, K). The denoising network is trained with a Mahalanobis loss ‖μ̂_i − μ_θ‖_{K^{-1}}², which weights errors according to the inverse of the task‑conditioned covariance. This loss naturally incorporates the temporal correlations encoded by the GP and the confidence in the key states. During training, ground‑truth key states are perturbed to simulate prediction errors from the upper level, ensuring robustness. Computationally expensive matrix inversions are mitigated by pre‑computing gain terms for the set of timing configurations encountered during training.
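A hedged sketch of that loss is below. Computing r⊤K⁻¹r via a Cholesky solve (rather than forming K⁻¹ explicitly) is a standard numerical choice, assumed here; the function name is illustrative.

```python
import numpy as np

def mahalanobis_loss(mu_hat, mu_theta, K):
    """Mahalanobis loss ||mu_hat - mu_theta||^2_{K^{-1}}.

    Uses a Cholesky solve instead of an explicit inverse: with K = L L^T,
    solving L z = r gives ||z||^2 = r^T K^{-1} r. Errors along low-variance
    directions of the task-conditioned covariance (i.e., near key states)
    are penalized more heavily.
    """
    L = np.linalg.cholesky(K)
    r = mu_hat - mu_theta
    z = np.linalg.solve(L, r)
    return float(z @ z)
```

This also suggests why pre-computing factorizations (or gain terms) per timing configuration pays off: K depends only on the key-state timings, not on the trajectory being denoised, so the expensive factorization can be cached across training batches.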

Experiments on two benchmarks—Maze2D (2‑D navigation) and a KUKA arm block‑stacking task—compare four variants: (i) a single‑layer isotropic diffusion model, (ii) a hierarchical model with only conditioning, (iii) a diffusion model with cost‑guidance, and (iv) the proposed structured‑prior hierarchical model. The full model achieves higher success rates (≈10–15 % absolute improvement) and produces significantly smoother trajectories, as measured by jerk and acceleration variance. Moreover, the structured prior accelerates learning: dataset‑level smoothness metrics rise earlier in training, indicating that the inductive bias concentrates probability mass around feasible, task‑consistent trajectories.

Ablation studies reveal that (1) treating key states as hard constraints harms performance, confirming the benefit of soft, uncertainty‑aware anchoring, and (2) replacing the non‑isotropic covariance with an isotropic one degrades both success rate and smoothness, demonstrating that the structured noise itself is a crucial source of improvement beyond mere conditioning or guidance.

In summary, the work shows that designing the diffusion noise model to reflect task semantics and motion smoothness—via a GPMP‑based, task‑conditioned Gaussian—provides a powerful inductive bias for robotic motion planning. The hierarchical design cleanly separates high‑level task reasoning (key‑state prediction) from low‑level trajectory refinement, and the Mahalanobis training objective aligns the denoising process with the underlying physics of the problem. This approach opens avenues for extending diffusion planners to more complex manipulation, multi‑robot coordination, and real‑time deployment while maintaining strong guarantees of feasibility and smoothness.

