Joint Learning of Hierarchical Neural Options and Abstract World Model
Building agents that can perform new skills by composing existing skills is a long-standing goal of AI agent research. To this end, we investigate how to efficiently acquire a sequence of skills, formalized as hierarchical neural options. Existing model-free hierarchical reinforcement learning algorithms, however, require large amounts of data. We propose a novel method, which we call AgentOWL (Option and World model Learning Agent), that jointly learns, in a sample-efficient way, an abstract world model (abstracting across both states and time) and a set of hierarchical neural options. We show, on a subset of Object-Centric Atari games, that our method can learn more skills using far less data than baseline methods.
💡 Research Summary
The paper introduces AgentOWL (Option and World model Learning Agent), a novel framework that jointly learns hierarchical neural options and an abstract world model to achieve high sample efficiency in skill acquisition. Traditional model‑free hierarchical reinforcement learning suffers from a rapidly expanding action space as more options are added, leading to prohibitive data requirements. AgentOWL addresses this by integrating model‑based reinforcement learning: it builds an abstract world model that predicts the effects of options over temporally extended horizons, thereby allowing the agent to plan and prune unlikely option sequences before executing them in the environment.
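The pruning idea above can be sketched as follows: roll candidate option sequences forward through the abstract model and discard those unlikely to reach the goal, so that only promising sequences are ever executed in the real environment. This is a minimal illustrative sketch, not the paper's implementation; `abstract_model`, `predict`, and the threshold are assumptions.

```python
import itertools

def prune_option_sequences(abstract_model, options, start_state, goal,
                           horizon=3, threshold=0.1):
    """Keep only option sequences the abstract model predicts can reach the goal.

    `abstract_model.predict(state, option)` is a hypothetical interface
    returning (predicted_next_abstract_state, success_probability).
    """
    promising = []
    for seq in itertools.product(options, repeat=horizon):
        state, prob = start_state, 1.0
        for option in seq:
            state, p = abstract_model.predict(state, option)  # abstract rollout step
            prob *= p
        if goal(state) and prob >= threshold:
            promising.append((seq, prob))
    # Most promising sequences first; only these would be tried for real.
    return sorted(promising, key=lambda pair: -pair[1])
```

Because the rollout happens in the low-dimensional abstract space, scoring many candidate sequences is cheap compared with executing them in the environment.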
The abstract world model is constructed using PoE‑World, a product‑of‑experts approach where each expert is a short symbolic program generated by a large language model (LLM). Experts model independent causal mechanisms; their weights are learned via maximum a posteriori estimation with a “frame axiom” prior that encourages an option to affect only its own goal predicate while leaving other predicates unchanged. State abstraction consists solely of the set of goal predicates, f(s) = (g₁(s), …, gₘ(s)), ensuring that the model operates on a low‑dimensional symbolic space.
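The two ingredients of this paragraph, the product-of-experts scoring rule and the predicate-based state abstraction, can be sketched in a few lines. This is only a hedged illustration of the general technique: the expert programs, weights, and scoring interface here are stand-ins, not PoE‑World's actual formulation.

```python
import math

def poe_score(experts, weights, state, option, next_state):
    """Unnormalized log-probability of a transition under a product of experts.

    Each expert is a callable returning a positive score; a product of
    experts multiplies expert probabilities, i.e. sums weighted log-scores.
    """
    return sum(w * math.log(e(state, option, next_state))
               for e, w in zip(experts, weights))

def abstract_state(goal_predicates, s):
    """State abstraction f(s) = (g1(s), ..., gm(s)): truth values of the goal predicates."""
    return tuple(g(s) for g in goal_predicates)
```

Because f(s) is just a tuple of predicate values, the abstract transition model only needs to predict flips of a handful of booleans rather than raw observations, which is what keeps planning tractable.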
Option learning proceeds iteratively. For a target goal g, the algorithm first selects stable, high‑performing sub‑options already in the set Ω. If none are sufficient, an LLM is prompted with the current goal and existing sub‑goals to hypothesize a new precondition h, yielding a new sub‑option o_{h→g}. This hypothesized option and its corresponding abstract transition model are immediately added to Ω and the world model T. The agent then learns a model‑based policy π_wm in the abstract world using DQN, which serves as an exploration guide for the real‑world policy π_real. Exploration is governed by a mixing coefficient ε that anneals from 1 to 0, gradually shifting reliance from the possibly imperfect world model to pure model‑free learning.
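The ε-mixed exploration scheme described above can be written down directly: with probability ε the agent follows the model-based policy π_wm, otherwise the model-free policy π_real, and ε anneals from 1 to 0 over training. The linear annealing schedule here is an assumption for illustration; the paper does not specify the exact schedule.

```python
import random

def mixed_action(pi_wm, pi_real, state, eps, rng=random):
    """Follow the world-model policy with probability eps, else the real-world policy."""
    return pi_wm(state) if rng.random() < eps else pi_real(state)

def annealed_eps(step, total_steps):
    """Linear anneal of eps from 1 to 0 (schedule is an illustrative assumption)."""
    return max(0.0, 1.0 - step / total_steps)
```

Early in training (ε ≈ 1) the possibly imperfect world model drives exploration; as ε decays, control shifts entirely to the model-free policy learned from real transitions.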
Training updates the world model with newly collected option transitions via PoE‑World, and the process repeats until the target option is mastered. The architecture creates a positive feedback loop: better options improve the world model, and a more accurate world model accelerates option learning.
Empirical evaluation on three object‑centric Atari games—Montezuma’s Revenge, Pitfall, and Private Eye—demonstrates that AgentOWL learns a larger number of hierarchical skills with far fewer environment interactions than baselines such as HIRO or Option‑Critic. The LLM‑driven hypothesis generation effectively reduces the planning horizon, and the abstract world model successfully filters out unpromising option sequences, yielding substantial gains in sample efficiency. The results validate the central claim that jointly learning options and an abstract, symbolic world model can overcome the sample‑inefficiency bottleneck of traditional hierarchical reinforcement learning.