Approximate Linear Programming for First-order MDPs
We introduce a new approximate solution technique for first-order Markov decision processes (FOMDPs). Representing the value function linearly w.r.t. a set of first-order basis functions, we compute suitable weights by casting the corresponding optimization as a first-order linear program and show how off-the-shelf theorem prover and LP software can be effectively used. This technique allows one to solve FOMDPs independent of a specific domain instantiation; furthermore, it allows one to determine bounds on approximation error that apply equally to all domain instantiations. We apply this solution technique to the task of elevator scheduling with a rich feature space and multi-criteria additive reward, and demonstrate that it outperforms a number of intuitive, heuristically guided policies.
💡 Research Summary
The paper presents a novel framework for solving First‑Order Markov Decision Processes (FOMDPs) by extending Approximate Linear Programming (ALP) to the first‑order logical level. Instead of enumerating concrete states and actions, the authors represent the value function as a weighted sum of first‑order basis functions, each defined by a logical formula. Using the Situation Calculus and successor‑state axioms, they perform regression of value functions through stochastic actions, producing logical constraints that capture the Bellman backup. These constraints are fed to a theorem prover to test satisfiability; only satisfiable constraints are included in a linear program whose variables are the basis weights.
Because the constraints are expressed symbolically, the resulting LP is independent of the number of objects in any domain instantiation, avoiding the exponential blow‑up typical of propositional ALP. The authors also derive domain‑size‑independent error bounds for the approximation.
The method is evaluated on a complex elevator‑scheduling problem with multiple criteria and a rich set of logical features. With about twenty basis functions, the first‑order ALP outperforms several intuitive heuristic policies, achieving roughly 15% lower average passenger waiting time and improved energy efficiency, while scaling gracefully as the number of elevators, floors, and passengers increases. The work demonstrates that first‑order reasoning combined with linear programming can produce scalable, provably bounded approximations for relational decision‑making problems, though it relies on expert‑crafted basis functions and the efficiency of the underlying theorem prover.
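For orientation, the weight-fitting step described in the summary can be related to the standard approximate LP for a ground (propositional) MDP. The sketch below is that textbook form, not the paper's first-order program: the symbols $b_i$, $w_i$, $R$, $P$, and $\gamma$, the reward depending on both state and action, and the uniform weighting of states in the objective are all notational assumptions introduced here for illustration.

```latex
% Linear value function over k basis functions (notation assumed):
V_w(s) = \sum_{i=1}^{k} w_i \, b_i(s)

% Approximate LP: the weights w_i are the only variables.
\begin{aligned}
\min_{w} \quad & \sum_{s} V_w(s) \\
\text{s.t.} \quad & V_w(s) \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_w(s')
  \qquad \forall s,\ \forall a
\end{aligned}
```

In this ground form there is one constraint per state-action pair, which is the source of the blow-up mentioned above. As the summary describes it, the first-order version instead derives its constraints symbolically by regression and keeps only those the theorem prover finds satisfiable, so the number of constraints does not depend on the number of domain objects.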