R-HTN: Rebellious Online HTN Planning for Safety and Game AI

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We introduce online Hierarchical Task Network (HTN) agents whose behaviors are governed by a set of built-in directives D. Like other agents capable of rebellion (i.e., *intelligent disobedience*), our agents will, under some conditions, decline to perform a user-assigned task and instead act in ways that do not meet the user's expectations. Our work combines three concepts: HTN planning, online planning, and the directives D, which must be considered when performing user-assigned tasks. We investigate two agent variants: (1) a Nonadaptive agent that stops execution if it finds itself in violation of D, and (2) an Adaptive agent that, in the same situation, instead modifies its HTN plan to search for alternative ways to achieve its given task. We present R-HTN (for Rebellious-HTN), a general algorithm for online HTN planning under directives D. We evaluate R-HTN in two task domains where the agent must not violate certain directives, either for safety reasons or as dictated by its personality traits. We found that R-HTN agents never violate directives and still aim to achieve the user-given goals when feasible, though not necessarily in the way the user expected.


💡 Research Summary

The paper introduces R‑HTN, a novel framework for online Hierarchical Task Network (HTN) planning that incorporates a set of built‑in directives D representing safety, ethical, or personality constraints. Traditional HTN planners generate a sequence of primitive actions by recursively decomposing high‑level goals, but they assume the agent will always follow the user’s commands. In many real‑world scenarios—autonomous drones navigating restricted airspace, game NPCs with distinct personalities, or any system that may be asked to perform unsafe actions—this assumption is problematic.

To address this, the authors define a “directive” as a Boolean function δ(s) that evaluates to true when the current state s violates a constraint (e.g., the agent is inside a red hazard zone). They further categorize discrepancies into three types: (1) Immediate discrepancy – the violation is already present in the current state; (2) Projected discrepancy – the violation would occur after executing the next n actions; (3) Adaptive discrepancy – the planner can modify the upcoming plan to avoid any projected violation while still achieving the user’s goal.
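As a minimal Python sketch of these definitions (all names here are illustrative; the paper does not prescribe an implementation), a directive is simply a Boolean predicate over states, and a discrepancy check simulates the pending actions forward to see whether a violation is current or merely projected:

```python
def in_red_zone(state):
    """Example directive δ(s): True when the agent's position lies in a hazard zone."""
    return state["pos"] in state["red_zones"]

def classify_discrepancy(state, pending_actions, delta, apply_action):
    """Return 'immediate', 'projected', or None for the given directive delta."""
    if delta(state):
        return "immediate"            # the violation already holds in s
    projected = state
    for action in pending_actions:    # simulate the next n actions
        projected = apply_action(projected, action)
        if delta(projected):
            return "projected"        # a violation would occur if execution proceeds
    return None                       # no discrepancy under this directive
```

Whether a discrepancy is "adaptive" in the paper's sense is then a property of the planner (can it rewrite the pending task list to avoid the projected violation?) rather than of this check alone.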

Based on this taxonomy, two agent variants are explored:

  • Non‑adaptive agent – Upon detecting any discrepancy, it halts execution and waits for a new user command.
  • Adaptive agent – When a discrepancy is detected, it invokes a domain‑specific repair routine that rewrites the pending task list, thereby seeking an alternative plan that respects all directives.

The core algorithm, R‑HTN, builds on the classic SHOP planner. The recursive procedure RSeekPlan decomposes the task list t̃ as usual, but before executing a primitive action it calls RepairTasksIfNeeded. This routine first checks for immediate violations (using δ(s)) and, if none, checks for projected violations after the action (using δ(a₀(s))). If a violation is found, domain‑specific repair functions repairTaskListState or repairTaskListEffect return a modified task list. For the non‑adaptive agent, any modification triggers a failure (returning an empty plan). For the adaptive agent, the modified list is fed back into RSeekPlan, allowing the planner to continue with a repaired plan.
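The control flow described above can be sketched as follows (a hypothetical rendering, not the paper's code; the snake_case names stand in for the paper's repairTaskListState and repairTaskListEffect, and states/actions are abstracted behind `delta` and `apply_action`):

```python
def repair_tasks_if_needed(state, tasks, delta, apply_action, adaptive,
                           repair_state, repair_effect):
    """Check-and-repair step run before executing the next primitive action a0.

    Returns the (possibly repaired) task list, or None to signal failure
    (the non-adaptive agent's halt, i.e., the empty plan).
    """
    if delta(state):                                      # immediate: δ(s) holds
        repaired = repair_state(state, tasks)
    elif tasks and delta(apply_action(state, tasks[0])):  # projected: δ(a0(s)) holds
        repaired = repair_effect(state, tasks)
    else:
        return tasks                                      # no violation: keep the plan
    if not adaptive:
        return None                                       # non-adaptive agent fails here
    return repaired  # adaptive agent feeds this back into the planner (RSeekPlan)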

The authors evaluate the approach in two simulated domains:

  1. O‑RESCHU – A multi‑UAV scenario where agents must reach assigned destinations while avoiding dynamically appearing red zones that increase energy consumption. The user randomly assigns destinations, and red zones appear/disappear each tick.
  2. MONSTER – A game‑style environment where NPCs must interact with a player while obeying personality‑based directives (e.g., “never attack,” “always greet”).
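Under this framing, the personality constraints in the MONSTER domain can also be expressed as state predicates. The following is a speculative illustration (the paper does not give concrete encodings; the state fields here are invented for the example):

```python
def never_attack(state):
    """Hypothetical 'pacifist' directive: violated if the NPC just attacked."""
    return state.get("last_action") == "attack"

def always_greet(state):
    """Hypothetical 'friendly' directive: violated if the player is adjacent
    and the NPC has not yet greeted them."""
    return bool(state.get("player_adjacent")) and not state.get("greeted")
```

Encoding both safety zones and personality traits as the same kind of Boolean predicate is what lets a single R‑HTN mechanism serve both domains.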

Three agents are compared: a standard compliant HTN agent (which ignores D), the non‑adaptive R‑HTN agent, and the adaptive R‑HTN agent. The hypotheses are that the adaptive agent will achieve more goals and that both R‑HTN agents will never violate directives. Results confirm these hypotheses: the adaptive agent attains the highest goal‑completion rate in both domains, and both R‑HTN agents exhibit zero directive violations. The non‑adaptive agent frequently halts, leading to longer overall execution times and more user intervention, whereas the adaptive agent autonomously repairs plans and proceeds without interruption.

Key contributions of the paper include:

  • Formal integration of constraint directives D into HTN planning, with a clear mathematical definition of discrepancies.
  • A taxonomy distinguishing immediate, projected, and adaptive discrepancies, providing a systematic way to reason about when and how to intervene.
  • The R‑HTN algorithm that augments an existing HTN planner with minimal changes, enabling online detection and repair of constraint violations.
  • Empirical validation in two distinct domains, demonstrating that the approach can guarantee safety/personality compliance while still pursuing user goals.

The authors discuss future work such as learning directives automatically from data, handling multi‑agent conflict resolution when different agents have overlapping or contradictory directives, and deploying the framework on real robotic platforms or commercial game engines. By introducing “rebellious” behavior—intelligent disobedience—into planning, the paper opens a new research direction that balances autonomy, safety, and user intent in dynamic environments.

