Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We propose a hierarchical entity-centric framework for offline Goal-Conditioned Reinforcement Learning (GCRL) that combines subgoal decomposition with factored structure to solve long-horizon tasks in domains with multiple entities. Achieving long-horizon goals in complex environments remains a core challenge in Reinforcement Learning (RL). Domains with multiple entities are particularly difficult due to their combinatorial complexity. GCRL facilitates generalization across goals and the use of subgoal structure, but struggles with high-dimensional observations and combinatorial state-spaces, especially under sparse reward. We employ a two-level hierarchy composed of a value-based GCRL agent and a factored subgoal-generating conditional diffusion model. The RL agent and subgoal generator are trained independently and composed post hoc through selective subgoal generation based on the value function, making the approach modular and compatible with existing GCRL algorithms. We introduce new variations to benchmark tasks that highlight the challenges of multi-entity domains, and show that our method consistently boosts performance of the underlying RL agent on image-based long-horizon tasks with sparse rewards, achieving over 150% higher success rates on the hardest task in our suite and generalizing to increasing horizons and numbers of entities. Rollout videos are provided at: https://sites.google.com/view/hecrl


💡 Research Summary

The paper tackles two fundamental challenges that impede offline goal‑conditioned reinforcement learning (GCRL) in complex, multi‑entity domains: (1) the propagation of sparse‑reward signals over long horizons, which leads to high temporal‑difference (TD) error and a low value‑signal‑to‑noise ratio, and (2) the combinatorial explosion of the state space when the environment is naturally factored into many interacting entities. To address these, the authors propose Hierarchical Entity‑Centric Reinforcement Learning (HECRL), a modular two‑level architecture that couples a value‑based GCRL agent with a conditional diffusion model that generates factored subgoals.

The low‑level component is any value‑based offline GCRL algorithm (the paper uses HIQL as a representative). It learns a policy π(s,g) and a value function V(s,g) that estimates the discounted distance to a goal g. The authors formalize the notion of a “policy competence radius” R, the region around a state where the value function provides a reliable learning signal. Subgoals must lie within this radius for the low‑level policy to be able to reach them.
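Since V(s, g) estimates a discounted distance, a reachability threshold Ř can be read as a step budget. A minimal sketch, assuming the common sparse-reward GCRL convention of −1 reward per step until the goal (an illustrative convention, not necessarily the paper's exact parameterization):

```python
import math

def steps_from_value(v, gamma=0.99):
    """Convert a value estimate into the implied number of steps to the goal.

    Assumes V(s, g) = -(1 - gamma**d) / (1 - gamma) for a goal d steps away,
    i.e. the discounted sum of -1 rewards. Inverting gives
    d = log(1 + v * (1 - gamma)) / log(gamma).
    """
    return math.log(1.0 + v * (1.0 - gamma)) / math.log(gamma)
```

Under this convention, choosing a threshold Ř amounts to choosing a radius in steps: subgoals with V(s, ĝ) > Ř are those the policy is expected to reach within roughly `steps_from_value(Ř)` steps.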

The high‑level component, called the Subgoal Diffuser, is a conditional diffusion model trained on the offline dataset. For each state–goal pair (s, g) drawn from the dataset, the model learns the distribution p(ĝ|s,g) over states that lie at most K steps ahead of s on the same trajectory as g. Because the dataset is not required to contain goal‑directed behavior, p(ĝ|s,g) can be highly multimodal; diffusion is chosen precisely to capture such complex distributions. The diffusion model does not enforce optimality—it merely imitates the observed dynamics.
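The data-sampling scheme this implies can be sketched as follows; the exact sampling of the goal index is an assumption (here, any later state on the same trajectory), as the summary only pins down the K-step constraint on the subgoal:

```python
import random

def sample_training_triple(trajectory, k=8):
    """Draw one (state, subgoal, goal) training example from a trajectory.

    The subgoal g_hat is a state at most k steps ahead of s; the goal g is
    a later state on the same trajectory. The diffusion model would then be
    trained to denoise g_hat conditioned on (s, g). Names are illustrative.
    """
    T = len(trajectory)
    t = random.randrange(T - 1)                           # current state index
    t_sg = random.randrange(t + 1, min(t + k, T - 1) + 1)  # at most k steps ahead
    t_g = random.randrange(t_sg, T)                        # later state as goal
    return trajectory[t], trajectory[t_sg], trajectory[t_g]
```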

At test time, HECRL samples N candidate subgoals from the diffusion model, filters them by the value‑based reachability criterion V(s,ĝ) > Ř (an estimate of R), and selects the candidate with the highest value relative to the ultimate goal, i.e., argmax V(ĝ,g). If the current state is already closer to the goal than the selected subgoal, the algorithm directly switches to the goal. The chosen subgoal is then fed to the low‑level policy for a fixed horizon Tsg, after which the subgoal generation process repeats. This loop can be interpreted as a receding‑horizon planner operating in state space rather than action space, effectively performing model‑predictive control with the learned value function as a distance metric.
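The selection loop above can be sketched in a few lines. Here `sample_candidates`, `value_fn`, and `r_hat` are hypothetical stand-ins for the Subgoal Diffuser, the learned value function V, and the competence-radius estimate Ř; the toy 1-D setup at the bottom exists only to make the sketch runnable:

```python
import random

def select_subgoal(state, goal, sample_candidates, value_fn,
                   n_candidates=32, r_hat=-5.0):
    """Test-time subgoal selection: sample, filter by reachability, pick best."""
    # 1. Sample N candidate subgoals from the conditional generator.
    candidates = sample_candidates(state, goal, n_candidates)

    # 2. Reachability filter: keep candidates the low-level policy can
    #    plausibly reach from the current state.
    reachable = [c for c in candidates if value_fn(state, c) > r_hat]
    if not reachable:
        return goal  # fall back to the final goal

    # 3. Pick the candidate that makes the most progress toward the goal.
    best = max(reachable, key=lambda c: value_fn(c, goal))

    # 4. If the current state is already at least as close to the goal as
    #    the chosen subgoal, switch to the goal directly.
    return goal if value_fn(state, goal) >= value_fn(best, goal) else best


# Toy 1-D illustration: states are floats, V(s, g) = -|g - s| (negative
# distance), and the "diffuser" proposes states within a few steps of s.
value_fn = lambda s, g: -abs(g - s)
sample_near = lambda s, g, n: [s + random.uniform(-3.0, 3.0) for _ in range(n)]
```

In the full method the chosen subgoal would then be handed to the low-level policy for Tsg steps before the loop repeats.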

A crucial contribution is the use of unsupervised object‑centric representations (Deep Latent Particles, DLP) to factor the high‑dimensional image observations into entity‑specific latent variables. The diffusion model is conditioned on these factors, encouraging it to generate subgoals that modify only a small subset of entities—so‑called “entity‑factored subgoals.” Because many multi‑entity tasks allow independent control of individual objects, such sparse subgoals are easier for the low‑level policy to achieve, leading to more efficient planning and better scalability as the number of entities grows.
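To make the notion of an entity-factored subgoal concrete, here is one way to impose such sparsity by masking per-entity latents. This is purely illustrative: in the paper the Subgoal Diffuser learns this sparsity from the factored conditioning rather than applying an explicit mask, and the array shapes stand in for DLP latents:

```python
import numpy as np

def entity_factored_subgoal(state_latents, target_latents, max_entities=1):
    """Sparsify a candidate subgoal: change only the entities that move most.

    `state_latents` and `target_latents` have shape (n_entities, d) and
    stand in for per-entity latents of the current state and a candidate
    subgoal. Only the `max_entities` entities with the largest latent
    change are carried over; all others stay where they are.
    """
    deltas = np.linalg.norm(target_latents - state_latents, axis=1)
    keep = np.argsort(deltas)[-max_entities:]  # entities with largest change
    subgoal = state_latents.copy()
    subgoal[keep] = target_latents[keep]       # move only the selected entities
    return subgoal
```

A subgoal that moves one object at a time stays within the low-level policy's competence radius far more often than one that rearranges every entity at once, which is the intuition behind the factored design.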

Empirical evaluation spans two domains: a robotic manipulation suite with visual inputs (arm, multiple objects, and target locations) and a video‑game‑style multi‑agent navigation task. The authors introduce new benchmark variations that increase horizon length and entity count, exposing the combinatorial difficulty. HECRL consistently outperforms baseline offline GCRL methods (HIQL, CQL, BCQ) and recent hierarchical diffusion planners. On the hardest task, success rates improve by more than 150% relative to the strongest baseline. Ablation studies confirm that (i) the value‑based reachability filter is essential for avoiding unreachable subgoals, (ii) entity‑factored diffusion yields higher‑quality subgoals than naïve pixel‑space diffusion, and (iii) the modular design allows swapping the low‑level algorithm without retraining the diffusion model.

The paper’s contributions can be summarized as follows:

  1. A modular hierarchical framework that separates subgoal generation from policy learning, enabling plug‑and‑play integration with any value‑based offline GCRL algorithm.
  2. A conditional diffusion model that learns a multimodal distribution over near‑future states, combined with a simple yet effective test‑time filter based on the learned value function.
  3. Exploitation of unsupervised entity‑centric representations to produce factored subgoals, dramatically reducing the effective planning horizon in multi‑entity environments.

Overall, HECRL demonstrates that marrying entity‑centric perception, diffusion‑based generative modeling, and value‑function‑guided planning yields a powerful recipe for tackling long‑horizon, sparse‑reward tasks in high‑dimensional, combinatorial settings. The approach opens avenues for applying offline GCRL to real‑world robotics, multi‑robot coordination, and complex video‑game AI where visual inputs and many interacting objects are the norm.

