GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning


Preference-Conditioned Policy Learning (PCPL) in Multi-Objective Reinforcement Learning (MORL) aims to approximate diverse Pareto-optimal solutions by conditioning policies on user-specified preferences over objectives. This enables a single model to flexibly adapt to arbitrary trade-offs at run-time by producing a policy on or near the Pareto front. However, existing benchmarks for PCPL are largely restricted to toy tasks and fixed environments, limiting their realism and scalability. To address this gap, we introduce GraphAllocBench, a flexible benchmark built on a novel graph-based resource allocation sandbox environment inspired by city management, which we call CityPlannerEnv. GraphAllocBench provides a rich suite of problems with diverse objective functions, varying preference conditions, and high-dimensional scalability. We also propose two new evaluation metrics – Proportion of Non-Dominated Solutions (PNDS) and Ordering Score (OS) – that directly capture preference consistency while complementing the widely used hypervolume metric. Through experiments with Multi-Layer Perceptrons (MLPs) and graph-aware models, we show that GraphAllocBench exposes the limitations of existing MORL approaches and paves the way for using graph-based methods such as Graph Neural Networks (GNNs) in complex, high-dimensional combinatorial allocation tasks. Beyond its predefined problem set, GraphAllocBench enables users to flexibly vary objectives, preferences, and allocation rules, establishing it as a versatile and extensible benchmark for advancing PCPL. Code: https://github.com/jzh001/GraphAllocBench


💡 Research Summary

This paper addresses a critical gap in the evaluation of Preference‑Conditioned Policy Learning (PCPL) for Multi‑Objective Reinforcement Learning (MORL). Existing benchmarks are limited to toy problems, low‑dimensional observation spaces, and simple Pareto fronts, which hampers progress toward realistic, scalable PCPL methods. To overcome these limitations, the authors introduce CityPlannerEnv, a Gymnasium‑based sandbox that models a city‑scale resource allocation problem as a bipartite graph between resources and demands. The environment allows users to freely configure the number of resource and demand types, the dependency graph topology, the quantity of each resource, and the mathematical form of objective functions (e.g., polynomial, sinusoidal, logarithmic). This flexibility enables the creation of problems with high‑dimensional graph observations, non‑convex and multi‑modal Pareto fronts, and complex, discrete allocation dynamics.
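To make the setup concrete, here is a minimal, hypothetical sketch of such a bipartite allocation environment in plain Python. It is not the actual CityPlannerEnv API (class and method names here are invented for illustration, and the Gym-style interface is simplified): resource nodes hold fixed quantities, demand nodes accumulate allocations, and a boolean adjacency matrix encodes the dependency graph.

```python
import numpy as np

class ToyAllocEnv:
    """Illustrative stand-in for a CityPlannerEnv-style sandbox.

    `quantities[r]` is the stock of resource r, `adjacency[r, d]` says
    whether resource r may serve demand d, and each objective function
    maps the current allocation matrix to a scalar.
    """

    def __init__(self, quantities, adjacency, objective_fns):
        self.quantities = np.asarray(quantities, dtype=int)
        self.adjacency = np.asarray(adjacency, dtype=bool)  # (n_res, n_dem)
        self.objective_fns = objective_fns                  # list of f(alloc) -> float

    def reset(self):
        self.remaining = self.quantities.copy()
        self.alloc = np.zeros(self.adjacency.shape, dtype=int)
        return self._obs()

    def _obs(self):
        # flat observation: remaining stock concatenated with the allocation matrix
        return np.concatenate([self.remaining, self.alloc.ravel()])

    def step(self, action):
        r, d = action  # allocate one unit of resource r to demand d
        assert self.adjacency[r, d] and self.remaining[r] > 0, "illegal action"
        self.remaining[r] -= 1
        self.alloc[r, d] += 1
        rewards = np.array([f(self.alloc) for f in self.objective_fns])
        done = not self.remaining.any()  # episode ends when stock is exhausted
        return self._obs(), rewards, done
```

Varying `quantities`, `adjacency`, and the list of objective functions (e.g. polynomial vs. logarithmic) corresponds to the configurability described above, though the real environment exposes richer graph observations.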

Built on CityPlannerEnv, GraphAllocBench is a curated benchmark suite comprising diverse problem categories that stress‑test PCPL algorithms along several axes: objective complexity, preference dimensionality, graph sparsity, and scalability. In addition to the widely used hypervolume (HV) metric, the authors propose two novel evaluation measures. Proportion of Non‑Dominated Solutions (PNDS) quantifies the fraction of solutions that remain non‑dominated for a set of sampled preference vectors, directly reflecting how well a policy stays near the Pareto front. Ordering Score (OS) assesses whether the ranking of produced solutions aligns with the supplied preference weights, thereby measuring preference consistency and robustness. Both metrics address shortcomings of HV, which can obscure fine‑grained preference‑related performance differences.
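The two metrics can be sketched as follows. These are plausible instantiations reconstructed from the descriptions above, not the paper's exact formulas: PNDS counts the fraction of returned objective vectors that no other returned vector dominates (maximization convention), and OS is scored here as a Kendall-tau-style pairwise agreement between each objective's preference weights and its achieved values.

```python
import numpy as np

def pnds(points):
    """Proportion of Non-Dominated Solutions (sketch, maximization).

    points: (n, k) array, one objective vector per sampled preference.
    A point is dominated if some other point is >= in every objective
    and strictly > in at least one.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    nondom = sum(
        not any(np.all(pts[j] >= pts[i]) and np.any(pts[j] > pts[i])
                for j in range(n) if j != i)
        for i in range(n)
    )
    return nondom / n

def ordering_score(weights, values):
    """Ordering Score (sketch): fraction of concordant pairs.

    For each objective, a pair of preference vectors is concordant when
    the preference that weights the objective more heavily also achieves
    the higher value on it.
    """
    w, v = np.asarray(weights, float), np.asarray(values, float)
    n, k = w.shape
    concordant = total = 0
    for obj in range(k):
        for i in range(n):
            for j in range(i + 1, n):
                if w[i, obj] == w[j, obj]:
                    continue  # tied weights give no ordering information
                total += 1
                concordant += (w[i, obj] - w[j, obj]) * (v[i, obj] - v[j, obj]) > 0
    return concordant / total if total else 1.0
```

Under this reading, a perfectly preference-consistent policy scores 1.0 on both metrics, while HV alone could remain high even when individual preferences map to dominated or mis-ordered solutions.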

Experimental evaluation combines a standard PPO algorithm with two policy architectures: a Multi‑Layer Perceptron (MLP) and a Heterogeneous Graph Neural Network (HGNN). The HGNN processes the bipartite resource‑demand graph and captures heterogeneous node attributes, allowing it to learn the intricate dependency structure more effectively than the MLP. The authors also employ Smooth Tchebycheff scalarization during training, which, unlike linear scalarization, can reach solutions on non‑convex regions of the Pareto front. Results show that the HGNN‑based policy consistently outperforms the MLP on both PNDS and OS across all benchmark problems, especially when preferences shift abruptly. This demonstrates that graph‑aware models can better maintain preference‑consistent behavior in high‑dimensional combinatorial allocation tasks.
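For reference, a standard smooth Tchebycheff form (following Lin et al.'s smooth Tchebycheff scalarization; the paper's exact variant may differ) replaces the hard max over weighted deviations with a temperature-scaled log-sum-exp, making the scalarized objective differentiable:

```python
import numpy as np

def smooth_tchebycheff(f, w, z_ref, mu=0.1):
    """Smooth Tchebycheff scalarization (sketch, minimization form).

    Classic Tchebycheff scalarization minimizes max_i w_i * (f_i - z_i)
    for objective vector f, preference weights w, and reference point z.
    Smoothing parameter `mu` > 0 trades off smoothness against fidelity
    to the hard max (mu -> 0 recovers it).
    """
    f, w, z = (np.asarray(x, dtype=float) for x in (f, w, z_ref))
    t = w * (f - z)        # weighted deviations from the reference point
    t_max = t.max()
    # numerically stable log-sum-exp: mu * log(sum(exp(t / mu)))
    return t_max + mu * np.log(np.exp((t - t_max) / mu).sum())
```

Because the log-sum-exp upper-bounds the max and is smooth in `f`, a preference-conditioned policy can be trained with gradient-based RL on this scalar while still targeting non-convex Pareto fronts.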

The paper’s contributions are threefold: (1) the introduction of a flexible, graph‑based sandbox (CityPlannerEnv) that can generate realistic, scalable multi‑objective allocation problems; (2) the definition of two new evaluation metrics (PNDS and OS) that directly capture preference consistency and solution quality beyond hypervolume; and (3) empirical evidence that graph neural networks improve PCPL performance on complex, graph‑structured environments. By releasing the code and benchmark suite, the authors provide the community with a powerful tool for advancing PCPL research, and they outline future directions such as dynamic graph topologies, multi‑agent cooperation, and integration with real‑world city data.

