The FF Planning System: Fast Plan Generation Through Heuristic Search

We describe and evaluate the algorithmic techniques that are used in the FF planning system. Like the HSP system, FF relies on forward state space search, using a heuristic that estimates goal distances by ignoring delete lists. Unlike HSP's heuristi…

Authors: J. Hoffmann, B. Nebel

Journal of Artificial Intelligence Research 14 (2001) 253-302. Submitted 2/01; published 5/01.

Jörg Hoffmann (hoffmann@informatik.uni-freiburg.de)
Bernhard Nebel (nebel@informatik.uni-freiburg.de)
Georges-Köhler-Allee, Geb. 52, 79110 Freiburg, Germany

Abstract

We describe and evaluate the algorithmic techniques that are used in the FF planning system. Like the HSP system, FF relies on forward state space search, using a heuristic that estimates goal distances by ignoring delete lists. Unlike HSP's heuristic, our method does not assume facts to be independent. We introduce a novel search strategy that combines hill-climbing with systematic search, and we show how other powerful heuristic information can be extracted and used to prune the search space. FF was the most successful automatic planner at the recent AIPS-2000 planning competition. We review the results of the competition, give data for other benchmark domains, and investigate the reasons for the run time performance of FF compared to HSP.

1. Introduction

Over the last few years we have seen a significant increase of the efficiency of planning systems. This increase is mainly due to three new approaches in plan generation. The first approach was developed by Blum and Furst (1995, 1997). In their seminal paper on the GRAPHPLAN system (Blum & Furst, 1995), they described a new plan generation technique based on planning graphs, which was much faster than any other technique known at this time.
Their paper started a whole series of research efforts that refined this approach by making it even more efficient (Fox & Long, 1998; Kambhampati, Parker, & Lambrecht, 1997) and by extending it to cope with more expressive planning languages (Koehler, Nebel, Hoffmann, & Dimopoulos, 1997; Gazen & Knoblock, 1997; Anderson, Smith, & Weld, 1998; Nebel, 2000). The second approach is the planning as satisfiability method, which translates planning to propositional satisfiability (Kautz & Selman, 1996). In particular there is the hope that advances in the state of the art of propositional reasoning systems carry directly over to planning systems relying on this technology. In fact, Kautz and Selman (1999) predicted that research on planning methods will become superfluous because the state of the art in propositional reasoning systems will advance much faster than in planning systems. A third new approach is heuristic-search planning as proposed by Bonet and Geffner (1998, 1999). In this approach a heuristic function is derived from the specification of the planning instance and used for guiding the search through the state space. As demonstrated by the system FF (short for Fast-Forward) at the planning competition at AIPS-2000, this approach proved to be competitive. In fact, FF outperformed all the other fully automatic systems and was nominated Group A Distinguished Performance Planning System at the competition.

© 2001 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

In HSP (Bonet & Geffner, 1998), goal distances are estimated by approximating solution length to a relaxation of the planning task (Bonet, Loerincs, & Geffner, 1997). While FF uses the same relaxation for deriving its heuristics, it differs from HSP in a number of important details. Its base heuristic technique can be seen as an application of GRAPHPLAN to the relaxation.
This yields goal distance estimates that, in difference to HSP's estimates, do not rely on an independence assumption. FF uses a different search technique than HSP, namely an enforced form of hill-climbing, combining local and systematic search. Finally, it employs a powerful pruning technique that selects a set of promising successors to each search node, and another pruning technique that cuts out branches where it appears that some goal has been achieved too early. Both techniques are obtained as a side effect of the base heuristic method.

Concerning the research strategy that FF is based on, we remark the following. A lot of classical planning approaches, like partial-order planning (McAllester & Rosenblitt, 1991) or planning graph analysis (Blum & Furst, 1997), are generic problem solving methods, developed following some theoretical concept, and tested on examples from the literature afterwards. In our approach, exploring the idea of heuristic search, there is no such clear distinction between development and testing. The search strategy, as well as the pruning techniques, are generic methods that have been motivated by observing examples. Also, design decisions were made on the basis of careful experimentation. This introduces into the system a bias towards the examples used for testing during development. We were testing our algorithms on a range of domains often used in the planning literature. Throughout the paper, we will refer to domains that are frequently used in the literature, and to tasks from such domains, as benchmarks. In the development phase, we used benchmark examples from the Assembly, Blocksworld, Grid, Gripper, Logistics, Mystery, Mprime, and Tireworld domains. When describing our algorithms in the paper, we indicate the points where those testing examples played a role for design decision making.
Planning is known to be PSPACE-complete even in its simplest form (Bylander, 1994). Thus, in the general case, there is no efficient algorithmic method. It is therefore worthwhile to look for algorithms that are efficient at least on restricted subclasses. To some extent, this idea has been pursued by posing severe syntactical restrictions to the planning task specifications (Bylander, 1994). Our approach is complementary to this. Examining the existing benchmarks, one finds that they, indeed, do not exploit the full expressivity of the underlying planning formalism. Though they do not fulfill any obvious rigid syntactical restrictions, almost none of them is particularly hard. In almost all of the existing benchmark domains, a non-optimal plan can, in principle, be generated in polynomial time. Using the benchmarks for inspiration during development, we have been able to come up with a heuristic method that is not provably efficient, but does work well empirically on a large class of planning tasks. This class includes almost all of the current planning benchmarks. Intuitively, the algorithms exploit the simple structure underlying these tasks. Our ongoing work is concerned with finding a formal characterization of that "simple" structure, and thereby formalizing the class of planning tasks that FF works well on.

Section 2 gives a schematic view on FF's system architecture, and Section 3 introduces our notational conventions for STRIPS domains. Sections 4 to 6 describe the base heuristic technique, search algorithm, and pruning methods, respectively. Section 7 shows how the algorithms are extended to deal with ADL domains. System performance is evaluated in Section 8, demonstrating that FF generates solutions extremely fast in a large range of planning benchmark domains.
In order to illustrate our intuitions on the kind of structure that FF can exploit successfully, the section also gives examples of domains where the method is less appropriate. Finally, to clarify the performance differences between FF and HSP, the section describes a number of experiments we made in order to estimate which of the new algorithmic techniques is most useful. We show connections to related work at the points in the text where they apply, and overview other connections in Section 9. Section 10 outlines our current avenue of research.

2. System Architecture

To give the reader an overview of FF's system architecture, Figure 1 shows how FF's most fundamental techniques are arranged.

[Figure 1: FF's base system architecture. A task specification is handed to enforced hill-climbing, which exchanges states for goal distance estimates and helpful actions with relaxed GRAPHPLAN, and outputs a solution or "Fail".]

The fundamental heuristic technique in FF is relaxed GRAPHPLAN, which we will describe in Section 4. The technique gets called on every search state by enforced hill-climbing, our search algorithm. This is a forward searching engine, to be described in Section 5. Given a state, relaxed GRAPHPLAN informs the search with a goal distance estimate, and additionally with a set of promising successors for the state, the helpful actions, to be described in Section 6. Upon termination, enforced hill-climbing either outputs a solution plan, or reports that it has failed.

On top of the base architecture shown in Figure 1, we have integrated a few optimizations to cope with special cases that arose during testing:

- If a planning task contains states from which the goal is unreachable (dead ends, defined in Section 5.2), then enforced hill-climbing can fail to find a solution. In that case, a complete heuristic search engine is invoked to solve the task from scratch.
- In the presence of goal orderings, enforced hill-climbing sometimes wastes a lot of time achieving goals that need to be cared for later on. Two techniques trying to avoid this are integrated:
  - Added goal deletion, introduced in Section 6.2, cuts out branches where some goal has apparently been achieved too early.
  - The goal agenda technique, adapted from work by Jana Koehler (1998), feeds the goals to the planner in an order determined as a pre-process (Section 6.2.2).

3. Notational Conventions

For introducing FF's basic techniques, we consider simple STRIPS planning tasks, as were introduced by Fikes and Nilsson (1971). Our notations are as follows.

Definition 1 (State) A state S is a finite set of logical atoms.

We assume that all operator schemata are grounded, i.e., we only talk about actions.

Definition 2 (STRIPS Action) A STRIPS action o is a triple o = (pre(o), add(o), del(o)) where pre(o) are the preconditions of o, add(o) is the add list of o and del(o) is the delete list of the action, each being a set of atoms. For an atom f ∈ add(o), we say that o achieves f. The result of applying a single STRIPS action to a state is defined as follows:

  Result(S, <o>) = (S ∪ add(o)) \ del(o)   if pre(o) ⊆ S
                   undefined                otherwise

In the first case, where pre(o) ⊆ S, the action is said to be applicable in S. The result of applying a sequence of more than one action to a state is recursively defined as

  Result(S, <o_1, ..., o_n>) = Result(Result(S, <o_1, ..., o_{n-1}>), <o_n>).

Definition 3 (Planning Task) A planning task P = (O, I, G) is a triple where O is the set of actions, and I (the initial state) and G (the goals) are sets of atoms.

Our heuristic method is based on relaxed planning tasks, which are defined as follows.
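As a concrete rendering of Definitions 1 to 3, here is a minimal Python sketch (the names and the toy task are our own illustration, not FF's code; actions are (pre, add, del) triples of atom sets, and None stands in for "undefined"):

```python
def result(state, plan):
    """Definition 2: apply a sequence of STRIPS actions left to right.
    Returns None when some action is not applicable (result undefined)."""
    for pre, add, delete in plan:
        if not pre <= state:          # pre(o) is not a subset of S
            return None
        state = (state | add) - delete
    return state

# A hypothetical two-action logistics-style task: load a package, then drive.
load  = ({"at-pkg", "at-truck"}, {"in-truck"}, {"at-pkg"})
drive = ({"at-truck"}, {"at-dest"}, {"at-truck"})
print(result({"at-pkg", "at-truck"}, [load, drive]))
# a set containing exactly 'in-truck' and 'at-dest'
```

Applying `drive` before `load` would return None, since `load`'s precondition `at-truck` has been deleted.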
Denition 4 (Relaxed Planning T ask) Given a planning task P = ( O ; I ; G ) . The re- laxation P 0 of P is dene d as P 0 = ( O 0 ; I ; G ) , with O 0 = f ( pr e ( o ) ; add ( o ) ; ; ) j ( pr e ( o ) ; add ( o ) ; del ( o )) 2 O g In w ords, one obtains the relaxed planning task b y ignoring the delete lists of all actions. Plans are simple sequences of actions in our framew ork. Denition 5 (Plan) Given a planning task P = ( O ; I ; G ) . A plan is a se quenc e P = h o 1 ; : : : ; o n i of actions in O that solves the task, i.e., for which G  R esul t ( I ; P ) holds. A n action se quenc e is c al le d a relaxed plan to P , i it solves the r elaxation P 0 of P . 256 F ast Plan Genera tion Thr ough Heuristic Sear ch 4. GRAPHPLAN as a Heuristic Estimator In this section, w e in tro duce the base heuristic metho d used in FF. It is deriv ed b y applying GRAPHPLAN to relaxed planning tasks. The resulting goal distance estimates do not, lik e HSP's estimates, rely on an indep endence assumption. W e pro v e that the heuristic com- putation is p olynomial, giv e some notions on ho w distance estimates can b e k ept cautious, and describ e ho w the metho d can b e implemen ted ecien tly . Consider the heuristic metho d that is used in HSP (Bonet & Gener, 1998). Giv en a planning task P = ( O ; I ; G ), HSP estimates for eac h state S that is reac hed in a forw ard searc h the solution length of the task P 0 S = ( O 0 ; S; G ), i.e., the length of a relaxed plan that ac hiev es the goals starting out from S . As computing the optimal solution length to P 0 S | whic h w ould mak e an admissible heuristic|is NP -hard (Bylander, 1994), the HSP estimate is a rough appro ximation based on computing the follo wing w eigh t v alues. 
  weight_S(f) :=  0   if f ∈ S
                  i   if [min_{o ∈ O, f ∈ add(o)} Σ_{p ∈ pre(o)} weight_S(p)] = i - 1
                  ∞   otherwise                                                    (1)

HSP assumes facts to be achieved independently in the sense that the weight of a set of facts (an action's preconditions) is estimated as the sum of the individual weights. The state's heuristic estimate is

  h(S) := weight_S(G) = Σ_{g ∈ G} weight_S(g)                                      (2)

Assuming facts to be achieved independently, this heuristic ignores positive interactions that can occur. Consider the following short example planning task, where the initial state is empty, the goals are {G1, G2}, and there are the following three actions:

  name    (pre, add, del)
  opG1 = ({P}, {G1}, ∅)
  opG2 = ({P}, {G2}, ∅)
  opP  = (∅, {P}, ∅)

HSP's weight value computation results in P having weight one, and each goal having weight two. Assuming facts to be achieved independently, the distance of the initial state to a goal state is therefore estimated to be four. Obviously, however, the task is solvable in only three steps, as opG1 and opG2 share the precondition P.

In order to take account of such positive interactions, our idea is to start GRAPHPLAN on the tasks (O', S, G), and extract an explicit solution, i.e., a relaxed plan. One can then use this plan for heuristic evaluation. We will see in the next section that this approach is feasible: GRAPHPLAN can be proven to solve relaxed tasks in polynomial time.

4.1 Planning Graphs for Relaxed Tasks

Let us examine how GRAPHPLAN behaves when it is started on a planning task that does not contain any delete lists. We briefly review the basic notations of the GRAPHPLAN algorithm (Blum & Furst, 1997).

A planning graph is a directed, layered graph that contains two kinds of nodes: fact nodes and action nodes.
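Before continuing with planning graphs, the weight computation of Equations 1 and 2 can be made concrete with an illustrative fixpoint sketch on the example task above (our own rendering, not HSP's actual code; unreachable facts simply never enter the weight table, which stands in for weight ∞):

```python
def hsp_weights(state, actions):
    """Equation 1: weight is 0 for facts in S, otherwise 1 plus the minimum,
    over achievers, of the summed precondition weights."""
    w = {f: 0 for f in state}
    changed = True
    while changed:                       # iterate to the fixpoint
        changed = False
        for pre, add, _ in actions:
            if all(p in w for p in pre):                 # all precs reachable
                cost = 1 + sum(w[p] for p in pre)
                for f in add:
                    if cost < w.get(f, float("inf")):
                        w[f] = cost
                        changed = True
    return w

# Empty initial state; opG1, opG2, opP as in the text.
ops = [({"P"}, {"G1"}, set()), ({"P"}, {"G2"}, set()), (set(), {"P"}, set())]
w = hsp_weights(set(), ops)
print(w["P"], w["G1"], w["G2"])   # 1 2 2
print(w["G1"] + w["G2"])          # Equation 2: the estimate is 4
```

The sum over the goals overcounts because both goals charge separately for achieving P, which is exactly the independence assumption the text criticizes.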
The layers alternate between fact and action layers, where one fact and action layer together make up a time step. In the first time step, number 0, we have the fact layer corresponding to the initial state and the action layer corresponding to all actions that are applicable in the initial state. In each subsequent time step i, we have the layer of all facts that can possibly be made true in i time steps, and the layer of all actions that are possibly applicable given those facts.

One crucial thing that GRAPHPLAN does when building the planning graph is the inference of mutual exclusion relations. A pair of actions (o, o') at time step 0 is marked mutually exclusive if o and o' interfere, i.e., if one action deletes a precondition or an add effect of the other. A pair of facts (f, f') at a time step i > 0 is marked mutually exclusive if each action at level i - 1 that achieves f is exclusive of each action at level i - 1 that achieves f'. A pair of actions (o, o') at a time step i > 0 is marked mutually exclusive if the actions interfere, or if they have competing needs, i.e., if some precondition of o is exclusive of some precondition of o'.

The planning graph of a relaxed task does not contain any exclusion relations at all.

Proposition 1 Let P' = (O', I, G) be a relaxed STRIPS task. Started on P', GRAPHPLAN will not mark any pair of facts or actions as mutually exclusive.

Proof: The Proposition is easily proven by induction over the depth of the planning graph. Base case: time step 0. Only interfering actions are marked mutually exclusive at time step 0. As there are no delete effects, no pair of actions interferes. Inductive case: time step i to time step i + 1. Per induction hypothesis, the facts are not exclusive as their achievers one time step ahead are not. From this it follows that no pair of actions has competing needs.
They do not interfere either. □

When started on a planning task, GRAPHPLAN extends the planning graph layer by layer until a fact layer is reached that contains all goal facts, and in which no two goal facts are marked exclusive.[1] Starting from that layer, a recursive backward search algorithm is invoked. To find a plan for a set of facts at layer i > 0, initialize the set of selected actions at layer i - 1 to the empty set. Then, for each fact, consider all achieving actions at layer i - 1 one after the other and select the first one that is not exclusive of any action that has already been selected. If there exists such an action, proceed with the next fact. If not, backtrack to the last fact and try to achieve it with a different action. If an achieving action has been selected for each fact, then collect the preconditions of all these actions to make up a new set of facts one time step earlier. Succeed when fact layer 0, the initial state, is reached, where no achieving actions need to be selected.

On relaxed tasks, no backtracking occurs in GRAPHPLAN's search algorithm.

Proposition 2 Let P' = (O', I, G) be a relaxed STRIPS task. Started on P', GRAPHPLAN will never backtrack.

Proof: Backtracking only occurs if all achievers for a fact f are exclusive of some already selected action. With Proposition 1, we know that no exclusions exist, and thus, that this does not happen. Also, if f is in graph layer i, then there is at least one achiever in layer i - 1 supporting it. □

[1] If no such fact layer can be reached, then the task is proven to be unsolvable (Blum & Furst, 1997).

While the above argumentation is sufficient for showing Proposition 2, it does not tell us much about what is actually going on when one starts GRAPHPLAN on a task without delete lists. What happens is this.
Given the task is solvable, the planning graph gets extended until some fact layer is reached that contains all the goals. Then the recursive search starts by selecting achievers for the goals at this level. The first attempt succeeds, and new goals are set up one time step earlier. Again, the first selection of achievers succeeds, and so forth, until the initial state is reached. Thus, search performs only a single sweep over the graph, starting from the top layer going down to the initial layer, and collects a relaxed plan on its way. In particular, the procedure takes only polynomial time in the size of the task.

Theorem 1 Let P' = (O', I, G) be a solvable relaxed STRIPS task, where the length of the longest add list of any action is l. Then GRAPHPLAN will find a solution to P' in time polynomial in l, |O'| and |I|.

Proof: Building the planning graph is polynomial in l, |O'|, |I| and t, where t is the number of time steps built (Blum & Furst, 1997). Now, in our case the total number |O'| of actions is an upper limit to the number of time steps. This is just because after this number of time steps has been built, all actions appear at some layer in the graph. Otherwise, there is a layer i where no new action comes in, i.e., action layer i - 1 is identical to action layer i. As the task is solvable, this implies that all goals are contained in fact layer i, which would have made the process stop right away. Similarly, action layer |O'| would be identical to action layer |O'| - 1, implying termination. The graph building phase is thus polynomial in l, |O'| and |I|.

Concerning the plan extraction phase: with Proposition 2, search traverses the graph from top to bottom, collecting a set of achieving actions at each layer.
Selecting achievers for a set of facts is O(l · |O'| + |I|): a set of facts has at most size l · |O'| + |I|, the maximal number of distinct facts in the graph. An achieving action can be found for each fact in constant time using the planning graph. As the number of layers to be looked at is O(|O'|), search is polynomial in the desired parameters. □

Starting GRAPHPLAN on a solvable search state task (O', S, G) yields, in polynomial time with Theorem 1, a relaxed solution <O_0, ..., O_{m-1}>, where each O_i is the set of actions selected in parallel at time step i, and m is the number of the first fact layer containing all goals. As we are interested in an estimation of sequential solution length, we define our heuristic as follows.

  h(S) := Σ_{i = 0, ..., m-1} |O_i|                                                (3)

The estimation values obtained this way are, on our testing examples, usually lower than HSP's estimates (Equations 1 and 2), as extracting a plan takes account of positive interactions between facts. Consider again the short example from the beginning of this section: empty initial state, two goals {G1, G2}, and three actions:

  name    (pre, add, del)
  opG1 = ({P}, {G1}, ∅)
  opG2 = ({P}, {G2}, ∅)
  opP  = (∅, {P}, ∅)

Starting GRAPHPLAN on the initial state, the goals are contained in fact layer two, causing selection of opG1 and opG2 in action layer one. This yields the new goal P at fact layer one, which is achieved with opP. The resulting plan is <{opP}, {opG1, opG2}>, giving us the correct goal distance estimate three, as distinct from HSP's estimate four.

4.2 Solution Length Optimization

We use GRAPHPLAN's heuristic estimates, Equation 3, in a greedy strategy, to be introduced in Section 5.1, that does not take its decisions back once it has made them.
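Equation 3 itself is just a count over the layered relaxed plan; as a one-line sketch under illustrative naming:

```python
def h(relaxed_plan):
    """Equation 3: sum the sizes of the parallel action sets O_0, ..., O_{m-1}."""
    return sum(len(step) for step in relaxed_plan)

# The worked example: <{opP}, {opG1, opG2}> gives estimate 3, not HSP's 4.
print(h([{"opP"}, {"opG1", "opG2"}]))   # 3
```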
From our experience with running this strategy on our testing examples, this works best when distance estimates are cautious, i.e., as low as possible. As already said, an optimal sequential solution cannot be synthesized efficiently. What one can do is apply some techniques to make GRAPHPLAN return as short solutions as possible. Below, we describe some ways of doing that. The first technique is a built-in feature of GRAPHPLAN and ensures a minimality criterion for the relaxed plan. The two other techniques are heuristic optimizations.

4.2.1 NOOPs-First

The original GRAPHPLAN algorithm makes extensive use of so-called NOOPs. These are dummy actions that simply propagate facts from one fact layer to the next. For each fact f that gets inserted into some fact layer, a NOOP corresponding to that fact is inserted into the action layer at the same time step. This NOOP has no other effect than adding f, and no other precondition than f. When performing backward search, the NOOPs are considered just like any other achiever, i.e., one way of making a fact true at time i > 0 is to simply keep it true from time i - 1.

In GRAPHPLAN, the implementation uses as a default the NOOPs-first heuristic, i.e., if there is a NOOP present for achieving a fact f, then this NOOP is considered first, before the planner tries selecting other "real" actions that achieve f. On relaxed tasks, the NOOPs-first heuristic ensures a minimality criterion for the returned plan as follows.

Proposition 3 Let (O', I, G) be a relaxed STRIPS task, which is solvable. Using the NOOPs-first strategy, the plan that GRAPHPLAN returns will contain each action at most once.

Proof: Let us assume the opposite, i.e., one action o occurs twice in the plan <O_0, ..., O_{m-1}> that GRAPHPLAN finds. We have o ∈ O_i and o ∈ O_j for some layers i, j with i < j.
Now, the action o has been selected at layer j to achieve some fact f at layer j + 1. As the algorithm is using the NOOPs-first strategy, this implies that there is no NOOP for fact f contained in action layer j: otherwise, the NOOP, not action o, would have been selected for achieving f. In contradiction to this, action layer j does indeed contain a NOOP for fact f. This is because action o already appears in action layer i < j. As f gets added by o, it appears in fact layer i + 1 ≤ j. Therefore, a NOOP for f is inserted in action layer i + 1 ≤ j, and, in turn, will be inserted into each action layer i' ≥ i + 1. □

4.2.2 Difficulty Heuristic

With the above argumentation, if we can achieve a fact by using a NOOP, we should do that. The question is, which achiever should we choose when no NOOP is available? It is certainly a good idea to select an achiever whose preconditions seem to be "easy". From the graph building phase, we can obtain a simple measure for the difficulty of an action's preconditions as follows.

  difficulty(o) := Σ_{p ∈ pre(o)} min {i | p is member of the fact layer at time step i}   (4)

The difficulty of each action can be set when it is first inserted into the graph. During plan extraction, facing a fact for which no NOOP is available, we then simply select an achieving action with minimal difficulty. This heuristic works well in situations where there are several ways to achieve one fact, but some ways need less effort than others.

4.2.3 Action Set Linearization

Assume GRAPHPLAN has settled for a parallel set O_i of achievers at a time step i, i.e., achieving actions have been selected for all goals at time step i + 1. As we are only interested in sequential solution length, we still have a choice on how to linearize the actions.
Some linearizations can lead to shorter plans than others. If an action o ∈ O_i adds a precondition p of another action o' ∈ O_i, then we do not need to include p in the new set of facts to be achieved one time step earlier, given that we restrict ourselves to execute o before o'. The question now is, how do we find a linearization of the actions that minimizes our new fact set? The corresponding decision problem is NP-complete.

Definition 6 Let OPTIMAL ACTION LINEARIZATION denote the following problem. Given a set O of relaxed STRIPS actions and a positive integer K, is there a one-to-one function f : O → {1, 2, ..., |O|} such that the number of unsatisfied preconditions when executing the sequence <f^{-1}(1), ..., f^{-1}(|O|)> is at most K?

Theorem 2 Deciding OPTIMAL ACTION LINEARIZATION is NP-complete.

Proof: Membership is obvious. Hardness is proven by transformation from DIRECTED OPTIMAL LINEAR ARRANGEMENT (Even & Shiloach, 1975). Given a directed graph G = (V, A) and a positive integer K, the question is, does there exist a one-to-one function f : V → {1, 2, ..., |V|} such that f(u) < f(v) whenever (u, v) ∈ A and such that Σ_{(u,v) ∈ A} (f(v) - f(u)) ≤ K?

To a given directed graph, we define a set of actions as follows. For each node w in the graph, we define an action in our set O. For simplicity of presentation, we identify the actions with their corresponding nodes. To begin with, we set pre(w) = add(w) = ∅ for all w ∈ V. Then, for each edge (u, v) ∈ A, we create new logical facts P^{(u,v)}_w and R^{(u,v)}_w for w ∈ V.

Using these new logical facts, we now adjust all precondition and add lists to express the constraint that is given by the edge (u, v). Say action u is ordered before action v in a linearization. We need to simulate the difference between the positions of v and u.
T o do this, w e dene our actions in a w a y suc h that the bigger this dierence is, the more unsatised preconditions there are when executing the linearization. First, w e \punish" all actions that are ordered b efore v , b y giving them an unsatised precondition. pre ( w ) := pre( w ) [ P ( u;v ) w for w 2 V ; add ( v ) := add ( v ) [ f P ( u;v ) w j w 2 V g With this denition, the actions w ordered b efore v |and v itself|will ha v e the unsatised precondition P ( u;v ) w , while those ordered after will get this precondition added b y v . Th us, the n um b er of unsatised preconditions w e get here is exactly f ( v ). Secondly , w e \giv e a rew ard" to eac h action that is ordered b efor e u . W e simply do this b y letting those actions add a precondition of u , whic h w ould otherwise go unsatised. add( w ) := add ( w ) [ R ( u;v ) w for w 2 V ; pre( u ) := pre ( u ) [ f R ( u;v ) w j w 2 V g That w a y , w e will ha v e exactly j V j  ( f ( u )  1) unsatised preconditions, namely the R ( u;v ) w facts for all actions except those that are ordered b efore u . Summing up the n um b er of unsatised preconditions w e get for a linearization f , w e arriv e at X ( u;v ) 2 A ( f ( v ) + j V j  ( f ( u )  1)) = X ( u;v ) 2 A ( f ( v )  f ( u )) + j A j  ( j V j + 1) W e th us dene our new p ositiv e in teger K 0 := K + j A j  ( j V j + 1). Finally , w e mak e sure that actions u get ordered b efore actions v for ( u; v ) 2 A . W e do this b y inserting new logical \safet y" facts S ( u;v ) 1 ; : : : ; S ( u;v ) K 0 +1 in to v 's precondition- and u 's add list. pre( v ) := pre( v ) [ f S ( u;v ) 1 ; : : : ; S ( u;v ) K 0 +1 g ; add( u ) := add ( u ) [ f S ( u;v ) 1 ; : : : ; S ( u;v ) K 0 +1 g Altogether, a linearization f of our actions leads to at most K 0 unsatised preconditions if and only if f satises the requiremen ts for a directed optimal linear arrangemen t. Ob viously , the action set and K 0 can b e computed in p olynomial time. 
□

Our sole purpose with linearizing an action set in a certain order is to achieve a smaller number of unsatisfied preconditions, which, in turn, might lead to a shorter relaxed solution.[2] Thus, we are certainly not willing to pay the price that finding an optimal linearization of the actions is likely to cost, according to Theorem 2. There are a few methods how one can approximate such a linearization, like introducing an ordering constraint o < o' for each action o that adds a precondition of another action o', and trying to linearize the actions such that many of these constraints are met. During our experimentations, we found that parallel actions adding each other's preconditions occur so rarely in our testing tasks that even approximating is not worth the effort. We thus simply linearize all actions in the order they get selected, causing almost no computational overhead at all.

[2] It should be noted here that using optimal action linearizations at each time step does not guarantee the resulting relaxed solution to be optimal, which would give us an admissible heuristic.

4.3 Efficient Implementation

We have implemented our own version of GRAPHPLAN, highly optimized for solving relaxed planning tasks. It exploits the fact that the planning graph of a relaxed task does not contain any exclusion relations (Proposition 1). Our implementation is also highly optimized for repeatedly solving planning tasks which all share the same set of actions: the tasks P'_S = (O', S, G) as described at the beginning of this section.

Planning task specifications usually contain some operator schemata, and a set of constants. Instantiating the schemata with the constants yields the actions to the task. Our system instantiates all operator schemata in a way such that all, and only, reachable actions are built.
Reachability of an action here means that, when successively applying operators to the initial state, all of the action's preconditions appear eventually. We then build what we call the connectivity graph. This graph consists of two layers, one containing all (reachable) actions, and the other all (reachable) facts. From each action, there are pointers to all preconditions, add effects and delete effects. All of FF's computations are efficiently implemented using this graph structure. For the subsequently described implementation of relaxed GRAPHPLAN, we only need the information about preconditions and add effects.

As a relaxed planning graph does not contain any exclusion relations, the only information one needs to represent it are what we call the layer memberships, i.e., for each fact or action, the number of the first layer at which it appears in the graph. Called on an intermediate task P′_S = (O′, S, G), our version of GRAPHPLAN computes these layer memberships by using the following fixpoint computation. The layer memberships of all facts and actions are initialized to ∞. For each action, there is also a counter, which is initialized to 0. Then, fact layer 0 is built implicitly by setting the layer membership of all facts f ∈ S to 0. Each time a fact f gets its layer membership set, all actions of which f is a precondition get their counter incremented. As soon as the counter for an action o reaches the total number of o's preconditions, o is put on a list of scheduled actions for the current layer. After a fact layer i is finished, all actions scheduled for step i have their layer membership set to i, and their adds, if not already present, are put on the list of scheduled facts for the next fact layer at time step i + 1. Having finished with action layer i, all scheduled facts at step i + 1 have their membership set, and so on.
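The counter-based fixpoint just described can be sketched as follows. This is a minimal Python sketch; the data structures and names are ours, not FF's, and actions are given as (name, preconditions, add effects) triples.

```python
# Counter-based layer-membership computation for a relaxed planning graph.
# Facts absent from the returned dictionaries are unreachable (membership ∞).
def layer_memberships(actions, state, goals):
    fact_layer = {f: 0 for f in state}            # fact layer memberships
    action_layer = {}                              # action layer memberships
    counters = {name: 0 for name, _, _ in actions}
    scheduled_facts = list(state)                  # facts set at current layer
    layer = 0
    while scheduled_facts and not all(g in fact_layer for g in goals):
        scheduled_actions = []
        for f in scheduled_facts:
            for name, pre, add in actions:
                if f in pre:
                    counters[name] += 1
                    # all preconditions present: schedule at current layer
                    if counters[name] == len(pre):
                        scheduled_actions.append((name, add))
        if layer == 0:
            # actions with empty preconditions belong to layer 0
            scheduled_actions += [(n, a) for n, p, a in actions if not p]
        scheduled_facts = []
        for name, add in scheduled_actions:
            action_layer[name] = layer
            for f in add:
                if f not in fact_layer:
                    fact_layer[f] = layer + 1
                    scheduled_facts.append(f)
        layer += 1
    return fact_layer, action_layer
```

On a chain such as op1: a → b, op2: b → c started from {a}, this yields fact layers a:0, b:1, c:2 and action layers op1:0, op2:1.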
The process continues until all goals have a layer membership lower than ∞. It should be noticed here that this view of planning graph building corresponds closely to the computation of the weight values in HSP. Those can be computed by applying the actions in layers as above, updating weight values and propagating the changes each time an action comes in, and stopping when no changes occur in a layer.

Having finished the relaxed version of planning graph building, a similarly trivial version of GRAPHPLAN's solution extraction mechanism is invoked. See Figure 2. Instead of putting all goals into the top layer in GRAPHPLAN style, and then propagating them down by using NOOPs-first, each goal g is simply put into a goal set G_i located at g's first layer i. Then, there is a for-next loop down from the top to the initial layer. At each layer i, an achieving action with layer membership i − 1 gets selected for each fact in the corresponding goal set. If there is more than one such achiever, a best one is picked according to the difficulty heuristic. The preconditions are put into their corresponding goal sets. Each time an action is selected, all of its adds are marked true at times i and i − 1. The marker at time i prevents achievers from being selected for facts that are already true anyway.

for i := 1, ..., m do
    G_i := {g ∈ G | layer-membership(g) = i}
endfor
for i := m, ..., 1 do
    for all g ∈ G_i, g not marked true at time i do
        select an action o with g ∈ add(o) and layer membership i − 1,
            o's difficulty being minimal
        for all f ∈ pre(o), layer-membership(f) ≠ 0, f not marked true at time i − 1 do
            G_{layer-membership(f)} := G_{layer-membership(f)} ∪ {f}
        endfor
        for all f ∈ add(o) do
            mark f as true at times i − 1 and i
        endfor
    endfor
endfor

Figure 2: Relaxed plan extraction
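The extraction procedure of Figure 2 can be sketched in Python. The data structures are our own; in particular, the difficulty-based tie-breaking among achievers is replaced here by simply taking the first achiever found.

```python
# Relaxed plan extraction over precomputed layer memberships.
# actions: (name, preconditions, add effects) triples.
def extract_relaxed_plan(actions, fact_layer, action_layer, goals):
    m = max(fact_layer[g] for g in goals)
    goal_sets = {i: set() for i in range(1, m + 1)}
    for g in goals:
        if fact_layer[g] > 0:                     # goals true in S need no achiever
            goal_sets[fact_layer[g]].add(g)
    marked = {i: set() for i in range(0, m + 1)}  # facts marked true at time i
    plan = []
    for i in range(m, 0, -1):
        for g in list(goal_sets[i]):
            if g in marked[i]:
                continue                           # already achieved as a side effect
            # pick an achiever with layer membership i-1 (FF would take the
            # one with minimal difficulty; we take the first one found)
            name, pre, add = next((n, p, a) for n, p, a in actions
                                  if g in a and action_layer.get(n) == i - 1)
            plan.append(name)
            for f in pre:
                if fact_layer[f] != 0 and f not in marked[i - 1]:
                    goal_sets[fact_layer[f]].add(f)
            marked[i] |= add
            marked[i - 1] |= add
    return plan
```

On the chain op1: a → b, op2: b → c with goal {c}, the procedure walks down from layer 2 and returns the relaxed plan [op2, op1], i.e., op2 selected at the top layer and op1 selected to achieve its precondition b.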
Marking at time i − 1 assumes that actions are linearized in the order they get selected: a precondition that was achieved by an action ahead is not considered as a new goal.

5. A Novel Variation of Hill-climbing

In this section, we introduce FF's base search algorithm. We discuss the algorithm's theoretical properties regarding completeness, and derive FF's overall search strategy.

In the first HSP version (Bonet & Geffner, 1998), HSP1, as it was used in the AIPS-1998 competition, the search strategy is a variation of hill-climbing, always selecting one best successor to the state it is currently facing. Because state evaluations are costly, we also chose to use local search, in the hope of reaching goal states with as few evaluations as possible. We settled for a different search algorithm, an "enforced" form of hill-climbing, which combines local and systematic search. The strategy is motivated by the simple structure that the search spaces of our testing benchmarks tend to have.

5.1 Enforced Hill-climbing

Doing planning by heuristic forward search, the search space is the space of all reachable states, together with their heuristic evaluation. Now, evaluating states in our testing benchmarks with the heuristic defined by Equation 3, one often finds that the resulting search spaces are simple in structure; specifically, local minima and plateaus tend to be small. For any search state, the next state with strictly better heuristic evaluation is usually only a few steps away (an example for this is the Logistics domain described in Section 8.1.1). Our idea is to perform exhaustive search for the better states. The algorithm is shown in Figure 3. Like hill-climbing, it starts out in the initial state.
Then, facing an intermediate search state S, a complete breadth first search starting out from S is invoked.

initialize the current plan to the empty plan <>
S := I
while h(S) ≠ 0 do
    perform breadth first search for a state S' with h(S') < h(S)
    if no such state can be found then
        output "Fail", stop
    endif
    add the actions on the path to S' at the end of the current plan
    S := S'
endwhile

Figure 3: The enforced hill-climbing algorithm.

This finds the closest better successor, i.e., the nearest state S' with strictly better evaluation, or fails. In the latter case, the whole algorithm fails; in the former case, the path from S to S' is added to the current plan, and search is iterated. When a goal state (a state with evaluation 0) is reached, search stops.

Our implementation of breadth first search starting out from S is standard, where states are kept in a queue. One search iteration removes the first state S' from the queue, and evaluates it by running GRAPHPLAN. If the evaluation is better than that of S, search succeeds. Otherwise, the successors of S' are put at the end of the queue. Repeated states are avoided by keeping a hash table of visited states in memory. If no new states can be reached anymore, breadth first search fails.

5.2 Completeness

If in one iteration breadth first search for a better state fails, then enforced hill-climbing stops without finding a solution. This can happen because once enforced hill-climbing has chosen to include an action in the plan, it never takes this decision back. The method is therefore only complete on tasks where no fatally wrong decisions can be made. These are the tasks that do not contain "dead ends".

Definition 7 (Dead End) Let (O, I, G) be a planning task.
A state S is called a dead end iff it is reachable and no sequence of actions achieves the goal from it, i.e., iff ∃P: S = Result(I, P) and ¬∃P': G ⊆ Result(S, P').

Naturally, a task is called dead-end free if it does not contain any dead end states. We remark that being dead-end free implies solvability, as otherwise the initial state itself would already be a dead end.

Proposition 4 Let P = (O, I, G) be a planning task. If P is dead-end free, then enforced hill-climbing will find a solution.

Proof: Assume enforced hill-climbing does not reach the goal. Then we have some intermediate state S = Result(I, P), P being the current plan, where breadth first search cannot improve on the situation. Now, h(S) > 0 as search has not stopped yet. If there was a path from S to some goal state S', then complete breadth first search would find that path, obtain h(S') = 0 < h(S), and terminate positively. Such a path can therefore not exist, showing that S is a dead end state in contradiction to the assumption. □

We remark that Proposition 4 holds only when h is a function from states to natural numbers including 0, where h(S) = 0 iff G ⊆ S. The proposition identifies a class of planning tasks where we can safely apply enforced hill-climbing. Unfortunately, it is PSPACE-hard to decide whether a given planning task belongs to that class.

Definition 8 Let DEADEND-FREE denote the following problem: given a planning task P = (O, I, G), is P dead-end free?

Theorem 3 Deciding DEADEND-FREE is PSPACE-complete.

Proof: Hardness is proven by polynomially reducing PLANSAT (Bylander, 1994), the decision problem of whether P is solvable, to the problem of deciding DEADEND-FREE. We simply add an operator to O that is executable in all states, and re-establishes the initial state.
O_1 := O ∪ { o_I := ⟨∅, I, ⋃_{o∈O} add(o) \ I⟩ }

Applying o_I to any state reachable in P leads back to the initial state: all facts that can ever become true are removed, and those in the initial state are added. Now, the modified problem P_1 = (O_1, I, G) is dead-end free iff P is solvable. From left to right, if P_1 is dead-end free, then it is solvable, which implies that P is solvable, as we have not added any new possibility of reaching the goal. From right to left, if P is solvable, then so is P_1, by the same solution plan P. One can then, from all states in P_1, achieve the goal by going back to the initial state with the new operator, and executing P thereafter.

Membership in PSPACE follows from the fact that PLANSAT and its complement are both in PSPACE. A non-deterministic algorithm that decides the complement of DEADEND-FREE and that needs only polynomial space can be specified as follows. Guess a state S. Verify in polynomial space that S is reachable from the initial state. Further, verify that the goal cannot be reached from S. If this algorithm succeeds, it follows that the instance is not dead-end free, since S constitutes a dead end. This implies that DEADEND-FREE is in NPSPACE, and hence in PSPACE. □

Though we cannot efficiently decide whether a given task is dead-end free, there are easily testable sufficient criteria in the literature. Johnsson et al. (2000) define a notion of symmetric planning tasks, which is sufficient for dead-end freeness, but co-NP-complete. They also give a polynomial sufficient criterion for symmetry. This criterion is, however, very trivial; hardly any of the current benchmarks fulfills it. Koehler and Hoffmann (2000a) have defined notions of invertible planning tasks, sufficient for dead-end freeness, and inverse actions, sufficient for invertibility, under certain restrictions.
The existence of inverse actions, and sufficient criteria for the additional restrictions, can be decided in polynomial time. Many benchmark tasks do, in fact, fulfill those criteria and can thus efficiently be proven dead-end free.

One could adopt Koehler and Hoffmann's methodology, and use the existence of inverse actions to recognize dead-end free tasks. If the test fails, one could then employ a different search strategy than enforced hill-climbing. We have two reasons for not going this way:

- Even amongst our benchmarks, there are tasks that do not contain inverse actions, but are nevertheless dead-end free. An example is the Tireworld domain, where enforced hill-climbing leads to excellent results.

- Enforced hill-climbing can often quite successfully solve tasks that do contain dead ends, as it does not necessarily get caught in one. Examples for that are contained in the Mystery and Mprime domains, which we will look at in Section 8.2.1.

The observation that forms the basis for our way of dealing with completeness is the following. If enforced hill-climbing cannot solve a planning task, it usually fails very quickly. One can then simply switch to a different search algorithm. We have experimented with randomizing enforced hill-climbing, and doing a restart when one attempt failed. This did not lead to convincing results. Though we tried a large variety of randomization strategies, we did not find a planning task in our testing domains where one randomized restart did significantly better than the previous one, i.e., all attempts suffered from the same problems. The tasks that enforced hill-climbing does not solve right away are apparently so full of dead ends that one cannot avoid those dead ends at random. We have therefore arranged our overall search strategy in FF as follows:

1.
Do enforced hill-climbing until the goal is reached or the algorithm fails.

2. If enforced hill-climbing failed, skip everything done so far and try to solve the task by a complete heuristic search algorithm. In the current implementation, this is what Russell and Norvig (1995) term greedy best-first search. This strategy simply expands all search nodes by increasing order of goal distance estimation.

To summarize, FF uses enforced hill-climbing as the base search method, and a complete best-first algorithm to deal with those special cases where enforced hill-climbing has run into a dead end and failed.

6. Pruning Techniques

In this section, we introduce two heuristic techniques that can, in principle, be used to prune the search space in any forward state space search algorithm:

1. Helpful actions pruning selects a set of promising successors to a search state. As we will demonstrate in Section 8.3, the heuristic is crucial for FF's performance on many domains.

2. Added goal deletion cuts out branches where some goal has apparently been achieved too early. Testing the heuristic, we found that it can yield savings on tasks that contain goal orderings, and has no effect on tasks that do not.

Both techniques are obtained as a side effect of using GRAPHPLAN as a heuristic estimator in the manner described in Section 4. Also, neither of them preserves completeness of any hypothetical forward search. In the context of our search algorithm, we integrate them such that they prune the search space in the single enforced hill-climbing try, which is not complete in general anyway, and we completely turn them off during best-first search, if enforced hill-climbing failed.

6.1 Helpful Actions

To a state S, we define a set H(S) of actions that seem to be most promising among the actions applicable in S.
The technique is derived by having a closer look at the relaxed plans that GRAPHPLAN extracts on search states in our testing tasks. Consider the Gripper domain, as it was used in the 1998 AIPS planning competition. There are two rooms, A and B, and a certain number of balls, which are all in room A initially and shall be moved into room B. The planner controls a robot, which changes rooms via the move operator, and which has two grippers to pick or drop balls. Each gripper can hold only one ball at a time. We look at a small task where 2 balls must be moved into room B. Say the robot has already picked up both balls, i.e., in the current search state, the robot is in room A, and each gripper holds one ball. There are three applicable actions in this state: move to room B, or drop one of the balls back into room A. The relaxed solution that our heuristic extracts is the following.

<{move A B}, {drop ball1 B left, drop ball2 B right}>

This is a parallel relaxed plan consisting of two time steps. The action set selected at the first time step contains the only action that makes sense in the state at hand, move to room B. We therefore pursue the idea of restricting the action choice in any planning state to only those actions that are selected in the first time step of the relaxed plan. We call these the actions that seem to be helpful. In the above example state, this strategy cuts down the branching factor from three to one.

Sometimes, restricting oneself to only the actions that are selected by the relaxed planner can be too much. Consider the following Blocksworld example. Say we use the well known representation with four operators: stack, unstack, pickup and putdown.
The planner controls a single robot arm, and the operators can be used to stack one block on top of another one, unstack a block from another one, pick up a block from the table, or put a block that the arm is holding down onto the table. Initially, the arm is holding block C, and blocks A and B are on the table. The goal is to stack A onto B. Started on this state, relaxed GRAPHPLAN will return one of the following three time step optimal solutions.

<{putdown C}, {pickup A}, {stack A B}> or
<{stack C A}, {pickup A}, {stack A B}> or
<{stack C B}, {pickup A}, {stack A B}>

All of these are valid relaxed solutions, as in the relaxation it does not matter that stacking C onto A or B deletes facts that we still need. If C is on A, we cannot pick up A anymore, and if C is on B, we cannot stack A onto B anymore. The first action in each relaxed plan is only inserted to get rid of C, i.e., free the robot arm, and from the point of view of the relaxed planner, all three starting actions do the job. Thus the relaxed solution extracted might be any of the three above. If it happens to be the second or third one, then we lose the path to an optimal solution by restricting ourselves to the corresponding actions, stack C A or stack C B. Therefore, we define the set H(S) of helpful actions to a state S as follows.

H(S) := {o | pre(o) ⊆ S, add(o) ∩ G_1(S) ≠ ∅}    (5)

Here, G_1(S) denotes the set of goals that is constructed by relaxed GRAPHPLAN at time step 1, one level ahead of the initial layer, when started on the task (O′, S, G). In words, we consider as helpful actions all those applicable ones which add at least one goal at the first time step.
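Equation (5) translates into a one-line filter. In the sketch below, actions are (name, pre, add) triples of our own devising, and G1 stands for the goal set G_1(S) that relaxed plan extraction builds at time step 1; computing G1 itself is the relaxed planner's job.

```python
# Helpful actions: applicable actions adding at least one goal at step 1.
def helpful_actions(actions, state, G1):
    return [name for name, pre, add in actions
            if pre <= state and add & G1]
```

In a Gripper-like state where the robot is in room A and G1 contains "at B", only the move action is helpful, while dropping a ball back into room A is filtered out:

```python
acts = [
    ("move A B", {"at A"}, {"at B"}),
    ("drop b1 A left", {"at A", "hold b1 left"}, {"b1 at A", "left free"}),
]
helpful_actions(acts, {"at A", "hold b1 left"}, {"at B"})  # only "move A B"
```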
In the above Blocksworld example, freeing the robot arm is among these goals, which causes all three starting actions to be helpful in the initial state, i.e., to be elements of H(I). In the above Gripper example, the modification does not change anything.

The notion of helpful actions shares some similarities with what Drew McDermott calls the favored actions (McDermott, 1996, 1999), in the context of computing greedy regression graphs for heuristic estimation. In a nutshell, greedy regression graphs back-chain from the goals until facts are reached that are contained in the current state. Amongst other things, the graphs provide an estimation of which actions might be useful in getting closer to the goal: those applicable ones which are members of the effective subgraph, which is the minimal cost subgraph achieving the goals.

There is also a similarity between the helpful actions heuristic and what is known as relevance from the literature (Nebel, Dimopoulos, & Koehler, 1997). Consider a Blocksworld task where hundreds of blocks are on the table initially, but the goal is only to stack one block A on top of another block B. The set H(I) will in this case contain only the single action pickup A, throwing away all those applicable actions moving around blocks that are not mentioned in the goal, i.e., throwing away all those actions that are irrelevant. The main difference between the helpful actions heuristic and the concept of relevance is that relevance in the usual sense refers to what is useful for solving the whole task. Being helpful, on the other hand, refers to something that is useful in the next step. This has the disadvantage that the helpful things need to be recomputed for each search state, but the advantage that possibly far fewer things are helpful than are relevant.
In our sp ecic setting, w e get the helpful actions for free an yw a y , as a side eect of running relaxed GRAPHPLAN. W e conclude this subsection with an example sho wing that helpful actions pruning do es not preserv e completeness, and a few remarks on the curren t in tegration of the tec hnique in to our searc h algorithm. 6.1.1 Completeness In the follo wing short example, the helpful actions heuristic prunes out all solutions from the state space. Sa y the initial state is f B g , the goals are f A; B g , and there are the follo wing actions: name (pre ; add ; del) op A 1 = ( ; ; f A g ; f B g ) op A 2 = ( f P A g ; f A g ; ; ) op P A = ( ; ; f P A g ; ; ) op B 1 = ( ; ; f B g ; f A g ) op B 2 = ( f P B g ; f B g ; ; ) op P B = ( ; ; f P B g ; ; ) In this planning task, there are t w o w a ys of ac hieving the missing goal A . One of these, op A 1 , deletes the other goal B . The other one, op A 2 , needs the precondition P A to b e ac hiev ed rst b y op P A , and th us in v olv es using two planning actions instead of one in the rst case. Relaxed GRAPHPLAN recognizes only the rst alternativ e, as it's the only time step optimal one. The set of goals at the single time step created b y graph construction is G 1 ( I ) = f A; B g This giv es us t w o helpful actions, namely H ( I ) = f op A 1 ; op B 1 g One of these, op B 1 , do es not cause an y state transition in the initial state. The other one, op A 1 , leads to the state where only A is true. T o this state, w e obtain the same set of helpful actions, con taining, again, op A 1 and op B 1 . This time, the rst action causes no state transition, while the second one leads us bac k to the initial state. Helpful actions th us cuts out the solutions from the state space of this example task. W e remark that the task is dead-end free|one can alw a ys reac h A and B b y applying op P A , op A 2 , op P B , and op B 2 |and that one can easily mak e the task in v ertible without c hanging the b eha vior. 
In STRIPS domains, one could theoretically overcome the incompleteness of helpful actions pruning by considering not only the first relaxed plan that GRAPHPLAN finds, but computing a kind of union over all relaxed plans that GRAPHPLAN could possibly find when allowing non time step optimal plans. More precisely, in a search state S, consider the relaxed task (O′, S, G). Extend the relaxed planning graph until fact level |O′| is reached. Set a goal set G_{|O′|} at the top fact level to G_{|O′|} := G. Then, proceed from fact level |O′| − 1 down to fact level 1, where, at each level i, a set G_i of goals is generated as the union of G_{i+1} with the preconditions of all actions in level i that add at least one fact in G_{i+1}. Upon termination, define as helpful all actions that add at least one fact in G_1. It can be proven that, this way, the starting actions of all optimal solutions from S are considered helpful. However, in all our STRIPS testing domains, this complete method always selects all applicable actions as helpful.

6.1.2 Integration into Search

As has already been noted at the very beginning of this section, we integrate helpful actions pruning into our search algorithm by only applying it during the single enforced hill-climbing try, leaving the complete best-first search algorithm unchanged (see Section 5). Facing a state S during breadth first search for a better state in enforced hill-climbing, we look only at those successors generated by H(S). This renders our implementation of enforced hill-climbing incomplete even on invertible planning tasks. However, in all our testing domains, the tasks that cannot be solved by enforced hill-climbing using helpful actions pruning are exactly those that cannot be solved by enforced hill-climbing anyway.
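Enforced hill-climbing with this pruning can be sketched as follows. Here `h` maps a state to its heuristic value and `helpful` maps a state to its pruned list of (action, successor) pairs; both stand in for FF's relaxed-GRAPHPLAN machinery, and all names are ours.

```python
from collections import deque

# Enforced hill-climbing (Figure 3) with helpful-action successors only.
def enforced_hill_climbing(initial, h, helpful):
    plan, S = [], initial
    while h(S) != 0:
        # breadth first search for a state strictly better than S,
        # expanding only helpful-action successors
        queue = deque([(S, [])])
        visited = {S}
        better = None
        while queue and better is None:
            T, path = queue.popleft()
            for action, T2 in helpful(T):
                if T2 in visited:
                    continue
                visited.add(T2)
                if h(T2) < h(S):
                    better = (T2, path + [action])
                    break
                queue.append((T2, path + [action]))
        if better is None:
            return None  # failure; FF then falls back to best-first search
        S, path = better
        plan += path
    return plan
```

On a toy "line" space, with states 0..3, h(s) = 3 − s, and a single helpful successor per state, the search chains three one-step improvements into the plan [inc0, inc1, inc2].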
6.2 Added Goal Deletion

The second pruning technique that we introduce in this section is motivated by the observation that in some planning domains there are goal ordering constraints, as has been recognized by quite a number of researchers in the past (Irani & Cheng, 1987; Drummond & Currie, 1989; Joslin & Roach, 1990). In our experiments on tasks with goal ordering constraints, FF's base architecture sometimes wasted a lot of time achieving goals that needed to be cared for later on. We therefore developed a heuristic to inform search about goal orderings.

The classical example for a planning domain with goal ordering constraints is the well known Blocksworld. Say we have three blocks A, B and C on the table initially, and want to stack them such that we have B on top of C, and A on top of B. Obviously, there is not much point in stacking A on B first. Now, imagine a forward searching planner confronted with a search state S where some goal G has just been achieved, i.e., S resulted from some other state by applying an action o with G ∈ add(o). What one can ask in a situation like this is: was it a good idea to achieve G right now? Or should some other goal be achieved first? Our answer is inspired by recent work of Koehler and Hoffmann (2000a), which argues that achieving G should be postponed if the remaining goals cannot be achieved without destroying G again. Of course, finding out about this involves solving the remaining planning task. However, we can arrive at a very simple but, in our testing domains, surprisingly accurate approximation by using the relaxed plan that GRAPHPLAN generates for the state S.
The method we are using is as simple as this: if the relaxed solution plan P that GRAPHPLAN generates for S contains an action o, o ∈ P, that deletes G (G ∈ del(o) in o's non-relaxed version), then we remove S from the search space, i.e., do not generate any successors to S. We call this method the added goal deletion heuristic.

Let us exemplify the heuristic with the above Blocksworld example. Say the planner has just achieved on(A,B), but with on(B,C) still being false, i.e., we are in the situation where A is on top of B, and B and C are standing on the table. The relaxed solution that GRAPHPLAN finds for this situation is the following.

<{unstack A B}, {pickup B}, {stack B C}>

The goal on(A,B), which has just been achieved, gets deleted by the first action, unstack A B. Consequently, we realize that stacking A onto B right now was probably a bad idea, and prune this possibility from the search space, which results in a solution plan that stacks B onto C first.

Like in the preceding subsection, we conclude with an example showing that pruning search states in the manner described above does not preserve completeness, and with a few remarks on our current search algorithm implementation.

6.2.1 Completeness

In the following small example, one of the goals must be destroyed temporarily in order to achieve the other goal. This renders the planning task unsolvable when one is using the added goal deletion heuristic. Say the initial state is empty, the goals are {A, B}, and there are the following actions:

name    (pre, add, del)
op_A  =  (∅, {A}, ∅)
op_B  =  ({A}, {B}, {A})

All solutions to this task need to apply op_A, use op_B thereafter, and re-establish A. The crucial point here is that A must be temporarily destroyed. The added goal deletion heuristic is not adequate for such planning tasks.
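The test itself is a one-line scan over the relaxed plan. The Blocksworld delete lists below are our own abbreviated illustration of the standard four-operator encoding, not FF's internal representation:

```python
# Added goal deletion: prune state S if the relaxed plan extracted for S
# contains an action whose real (non-relaxed) delete list removes the goal
# that was just achieved in S.
def prune_added_goal(relaxed_plan, deletes, just_achieved_goal):
    """relaxed_plan: action names; deletes: name -> real delete list."""
    return any(just_achieved_goal in deletes[o] for o in relaxed_plan)

# The text's situation: on(A,B) just achieved, relaxed plan
# <unstack A B, pickup B, stack B C> starts by deleting on(A,B).
deletes = {
    "unstack A B": {"on(A,B)", "clear(A)", "arm-empty"},
    "pickup B": {"on-table(B)", "clear(B)", "arm-empty"},
    "stack B C": {"holding(B)", "clear(C)"},
}
plan = ["unstack A B", "pickup B", "stack B C"]
```

Here `prune_added_goal(plan, deletes, "on(A,B)")` fires, so the state where on(A,B) was stacked too early gets no successors; for a goal such as on(B,C) that no plan action deletes, the state survives.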
The example is dead-end free, and one can easily make the scenario invertible without changing the behavior of the heuristic. Unlike for helpful actions, completeness cannot be regained by somehow enumerating all relaxed plans for a situation. In the above example, when A has been achieved but B is still false, then all relaxed plans contain op_B, deleting A.

6.2.2 Integration into Search

We use the added goal deletion heuristic in a way similar to the integration of the helpful actions heuristic. As indicated at the very beginning of the section, it is integrated into the single enforced hill-climbing try that search does, and completely turned off during best-first search, in case enforced hill-climbing did not make it to the goal.

We also use another goal ordering technique, taken from the literature. One of the most common approaches to dealing with goal orderings is trying to recognize them in a preprocessing phase, and then use them to prune fractions of the search space during planning (Irani & Cheng, 1987; Cheng & Irani, 1989; Joslin & Roach, 1990). This is also the basic principle underlying the so-called "goal agenda" approach (Koehler, 1998). For our system, we have implemented a slightly simplified version of the goal agenda algorithm, and use it to further enhance performance. A very short summary of what happens is this. In a preprocessing phase, the planner looks at all pairs of goals and decides heuristically whether there is an ordering constraint between them. Afterwards, the goal set is split into a totally ordered series of subsets respecting these orderings. These are then fed to enforced hill-climbing in an incremental manner. Precisely, if G_1, ..., G_n is the ordered series of subsets, enforced hill-climbing gets first started on the original initial state and G_1.
If that works out, search ends in some state S satisfying the goals in G_1. Enforced hill-climbing is then called again on the new starting state S and the larger goal set G_1 ∪ G_2. From a state satisfying this, search gets started for the goals G_1 ∪ G_2 ∪ G_3, and so on. The incremental, or agenda-driven, planning process can be applied to any planner, in principle, and preserves completeness only on dead-end free tasks (Koehler & Hoffmann, 2000a), i.e., again, we have an enhancement that loses completeness in general. Thus, we use the goal agenda only in enforced hill-climbing, leaving the complete best-first search phase unchanged.

The goal agenda technique yields run time savings in domains where there are ordering constraints between the goals. In our testing suite, these are the Blocksworld and the Tireworld. In planning tasks without ordering constraints, the series of subsets collapses into a single entry, such that the agenda mechanism does not change anything there. The run time taken for the preprocess itself was negligible in all our experiments.

7. Extension to ADL

So far, we have restricted ourselves to planning tasks specified in the simple STRIPS language. We will now show how our approach can be extended to deal with ADL (Pednault, 1989) tasks, more precisely, with the ADL subset of PDDL (McDermott et al., 1998) that was used in the 2nd international planning systems competition (Bacchus, 2000). This involves dealing with arbitrary function-symbol free first order logic formulae, and with conditional effects. Our extension work is divided into the following four subareas:

1. Apply a preprocessing approach to the ADL domain and task description, compiling the specified task down into a propositional normal form.

2. Extend the heuristic evaluation of planning states to deal with these normal form constructs.

3. Adjust the pruning techniques.

4.
Adjust the search mechanisms.

7.1 Preprocessing an ADL Planning Task

FF's preprocessing phase is almost identical to the methodology that has been developed for the IPP planning system. For details, we refer the reader to the work that has been done there (Koehler & Hoffmann, 2000b), and give only the basic principles here.

The planner starts with a planning task specification given in the subset of PDDL defined for the AIPS-2000 planning competition (Bacchus, 2000). The input is a set of operator schemata, the initial state, and a goal formula. The initial state is simply a set of ground atoms, and the goal formula is an arbitrary first order logical formula using the relational symbols defined for the planning task. Any operator schema o is defined by a list of parameters, a precondition, and a list of effects. Instantiating the parameters yields, just like STRIPS tasks are usually specified, the actions to the schema. The precondition is an arbitrary (first order) formula. For an action to be applicable in a given state S, its instantiation of this formula must be satisfied in S. Each effect i in the list has the form

    ∀ y_{i,0}, ..., y_{i,n_i} : (φ_i(o), add_i(o), del_i(o))

Here, y_{i,0}, ..., y_{i,n_i} are the effect parameters, φ_i(o) is the effect condition (again, an arbitrary formula), and add_i(o) and del_i(o) are the atomic add and delete effects, respectively. The atomic effects are sets of uninstantiated atoms, i.e., relational symbols containing variables. The semantics are that, if an instantiated action is executed, then, for each single effect i in the list, and for each instantiation of its parameters, the condition φ_i(o) is evaluated. If φ_i(o) holds in the current state, then the corresponding instantiations of the atoms in add_i(o) are added to the state, and the instantiations of atoms in del_i(o) are removed from the state.
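As an illustration of these semantics, here is a minimal sketch in Python. The representation is our own choice for the example (atoms as tuples; an effect as a tuple of parameter names, condition atoms, add atoms, and delete atoms), not FF's actual data structure, and conflicts between adds and deletes are resolved as in the Res function of Section 7.4, with deletes taking precedence.

```python
from itertools import product

def apply_action(state, precondition, effects, objects):
    """Apply an action with universally quantified conditional effects.

    Returns the successor state, or None if the action is inapplicable.
    Effects are (params, cond, add, delete) tuples over tuple-atoms; the
    effect parameters range over all objects of the task.
    """
    if not precondition <= state:
        return None  # precondition not satisfied in the current state
    adds, dels = set(), set()
    for params, cond, add, dele in effects:
        # evaluate the effect for every instantiation of its parameters
        for binding in product(objects, repeat=len(params)):
            sub = dict(zip(params, binding))
            ground = lambda atom: tuple(sub.get(t, t) for t in atom)
            if {ground(a) for a in cond} <= state:
                adds |= {ground(a) for a in add}
                dels |= {ground(a) for a in dele}
    return (state | adds) - dels

# Example in the spirit of the Schedule domain: painting o1 red adds the
# new color and, via a quantified conditional effect, removes any color
# the object currently has.
effects = [((), set(), {("color", "o1", "red")}, set()),
           (("c",), {("color", "o1", "c")}, set(), {("color", "o1", "c")})]
```

For instance, `apply_action({("color", "o1", "blue")}, set(), effects, ["red", "blue", "o1"])` yields the state in which o1 is red and no longer blue.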
In FF's heuristic method, each single state evaluation can involve thousands of operator applications: building the relaxed planning graph, one needs to determine all applicable actions at each single fact layer. We therefore invest the effort to compile the operator descriptions down into a much simpler propositional normal form, such that heuristic evaluation can be implemented efficiently. Our final normal form actions o have the following format.

    Precondition: pre(o)
    Effects: (pre_0(o), add_0(o), del_0(o)) ∧ (pre_1(o), add_1(o), del_1(o)) ∧ ... ∧ (pre_m(o), add_m(o), del_m(o))

The precondition is a set of ground atoms. Likewise, the effect conditions pre_i(o) of the single effects are restricted to be ground atoms. We also represent the goal state as a set of atoms. Thus, we compile away everything except the conditional effects. Compiling away the logical formulae involves transforming them into DNF, which causes an exponential blowup in general. In our testing domains, however, we found that this transformation can be done in reasonable time. Concerning the conditional effects, those cannot be compiled away without another exponential blowup, given that we want to preserve solution length. This was proven by Nebel (2000). As we will see, conditional effects can efficiently be integrated into our algorithmic framework, so there is no need for compiling them away. The compilation process proceeds as follows:

1. Determine predicates that are static, in the sense that no operator has an effect on them. Such predicates are a common phenomenon in benchmark tasks. An example is given by the (in-city ?l ?c) facts in Logistics tasks: any location ?l stays, of course, located within the same city ?c throughout the whole planning process. We recognize static predicates by a simple sweep over all operator schemata.
2. Transform all formulae into quantifier-free DNF. This is subdivided into three steps:

(a) Pre-normalize all logical formulae. Following Gazen and Knoblock (1997), this process expands all quantifiers and translates negations. We end up with formulae that are made up of conjunctions, disjunctions, and atoms containing variables.

(b) Instantiate all parameters. This is simply done by instantiating all operator and effect parameters with all type consistent constants, one after the other. The process makes use of knowledge about static predicates, in the sense that the instantiated formulae can often be simplified (Koehler & Hoffmann, 2000b). For example, if an instantiated static predicate (p ā) occurs in a formula, and that instantiation is not contained in the initial state, then (p ā) can be replaced with false.

(c) Transform formulae into DNF. This is postponed until after instantiation, because it can be costly, so it should be applied to formulae that are as small as possible. In a fully instantiated formula, it is likely that many static or one-way predicate occurrences can be replaced by true or false, resulting in a much simpler formula structure.

3. Finally, if the DNF of any formula contains more than one disjunct, then the corresponding effect, operator, or goal condition gets split up in the manner proposed by Gazen and Knoblock (1997).

7.2 Relaxed GRAPHPLAN with Conditional Effects

We now show how our specialized GRAPHPLAN implementation, as described in Section 4.3, is changed to deal with ADL constructs. Building on our normalized task representation, it suffices to take care of conditional effects.
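Step 2(c) and step 3 of the compilation above can be sketched as follows. The nested-tuple formula representation (atoms as strings, connectives tagged "and"/"or") and the helper names are our own illustration, not FF's implementation; as the text notes, the number of disjuncts can blow up exponentially.

```python
from itertools import product

def to_dnf(f):
    """Return the DNF of a pre-normalized formula as a list of
    conjunctions, each a frozenset of atoms.

    Formulae are nested tuples like ("and", f1, f2) or ("or", f1, f2);
    anything else is treated as an atom (strings, in this sketch).
    """
    if isinstance(f, tuple) and f and f[0] == "or":
        # DNF of a disjunction: concatenate the subformulae's disjuncts
        return [c for sub in f[1:] for c in to_dnf(sub)]
    if isinstance(f, tuple) and f and f[0] == "and":
        # DNF of a conjunction: cross product of the subformulae's disjuncts
        return [frozenset().union(*combo)
                for combo in product(*(to_dnf(sub) for sub in f[1:]))]
    return [frozenset([f])]  # a single atom

def split_operator(precondition):
    """Split a condition on its DNF disjuncts: one operator copy per
    disjunct, in the manner proposed by Gazen and Knoblock (1997)."""
    return [set(disjunct) for disjunct in to_dnf(precondition)]
```

For example, `split_operator(("and", ("or", "a", "b"), "c"))` yields the two conjunctive preconditions {a, c} and {b, c}.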
7.2.1 Relaxed Planning Graphs with Conditional Effects

Our encoding of planning graph building for relaxed tasks almost immediately carries over to ADL actions in the above propositional normal form. One simply needs to keep an additional layer membership value for all effects of an action. The layer membership of an effect indicates the first layer where all its effect conditions plus the corresponding action's preconditions are present. To compute these membership integers in an efficient manner, we keep a counter for each effect i of an action o, which gets incremented each time a condition c ∈ pre_i(o) becomes present, and each time a precondition p ∈ pre(o) of the corresponding action becomes present. The effect gets its layer membership set as soon as its counter reaches |pre_i(o)| + |pre(o)|. The effect's add effects add_i(o) are then scheduled for the next layer. The process is iterated until all goals are reached for the first time.

7.2.2 Relaxed Plan Extraction with Conditional Effects

The relaxed plan extraction mechanism for ADL differs from its STRIPS counterpart in merely two little details. Instead of selecting achieving actions, the extraction mechanism selects achieving effects. Once an effect i of action o is selected, all of its effect conditions plus o's preconditions need to be put into their corresponding goal sets. Afterwards, not only the effect's own add effects add_i(o) are marked true at the time being, but also the added facts of all effects that are implied, i.e., those effects j of o with pre_j(o) ⊆ pre_i(o) (in particular, this will be the unconditional effects of o, which have an empty effect condition).

7.3 ADL Pruning Techniques

Both pruning techniques from Section 6 easily carry over to actions with conditional effects.
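The counter-based graph construction of Section 7.2.1 can be sketched as follows, for ground actions in a hypothetical (precondition, effects) format of our own; delete lists are ignored, as in the relaxation, and we assume every effect has at least one condition or precondition.

```python
def build_relaxed_graph(init, goals, actions):
    """Return {fact: first layer it appears in}, or None if some goal
    is unreachable even in the relaxation.

    actions: list of (pre, effects) with pre a set of facts and
    effects a list of (eff_pre, eff_add) pairs (deletes are ignored).
    """
    # for every fact, the effect counters it feeds when it first appears
    watchers, counters = {}, {}
    for ai, (pre, effs) in enumerate(actions):
        for ei, (eff_pre, eff_add) in enumerate(effs):
            counters[(ai, ei)] = 0
            for f in set(pre) | set(eff_pre):
                watchers.setdefault(f, []).append((ai, ei))
    layer_of = {}
    current, layer = set(init), 0
    while current:
        scheduled = set()
        for f in current:
            layer_of[f] = layer
            for ai, ei in watchers.get(f, ()):
                counters[(ai, ei)] += 1
                pre, effs = actions[ai]
                eff_pre, eff_add = effs[ei]
                # the effect fires once all of pre(o) and pre_i(o) are present
                if counters[(ai, ei)] == len(set(pre) | set(eff_pre)):
                    scheduled |= set(eff_add)  # adds appear in the next layer
        if all(g in layer_of for g in goals):
            return layer_of
        current, layer = scheduled - set(layer_of), layer + 1
    return None  # fixpoint reached without the goals
```

For instance, with init {p}, actions p→q and q→r, and goal {r}, the facts p, q, r receive layers 0, 1, 2 respectively.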
7.3.1 Helpful Actions

For STRIPS, we defined as helpful all applicable actions achieving at least one goal at time step 1, cf. Section 6.1. For our ADL normal form, we simply change this to all applicable actions having an appearing effect that achieves a goal at time step 1, where an effect appears iff its effect condition is satisfied in the current state.

    H(S) := { o | pre(o) ⊆ S, ∃i: pre_i(o) ⊆ S ∧ add_i(o) ∩ G_1(S) ≠ ∅ }    (6)

7.3.2 Added Goal Deletion

Originally, we cut off a state S if one of the actions selected for the relaxed plan to S deleted a goal A that had just been achieved, cf. Section 6.2. We now simply take as criterion the effects that are selected for the relaxed plan, i.e., a state is cut off if one of the effects selected for its relaxed solution deletes a goal A that has just been achieved.

7.4 ADL State Transitions

Finally, for enabling the search algorithms to handle our propositional ADL normal form, it is sufficient to redefine the state transition function. Forward search, no matter whether it does hill-climbing, best-first search, or whatsoever, always faces a completely specified search state.^3 It can therefore compute exactly the effects of executing a context dependent action. Following Koehler et al. (1997), we define our ADL state transition function Res, mapping states and ADL normal form actions to states, as follows.

    Res(S, o) = (S ∪ A(S, o)) \ D(S, o)    if pre(o) ⊆ S
                undefined                  otherwise

with

    A(S, o) = ∪_{i: pre_i(o) ⊆ S} add_i(o)    and    D(S, o) = ∪_{i: pre_i(o) ⊆ S} del_i(o).

3. This holds if the initial state is completely specified and all actions are deterministic, both of which we assume.

8. Performance Evaluation

We have implemented the methodology presented in the preceding sections in C.^4 In this section, we evaluate the performance of the resulting planning system.
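Before turning to the evaluation, the state transition function Res of Section 7.4 can be summarized in a short sketch; the (precondition, effects) action format, with effects as (eff_pre, eff_add, eff_del) triples of ground-atom sets, is our own illustration rather than FF's internal representation.

```python
def res(state, action):
    """Res(S, o): successor state, or None where Res is undefined
    (i.e., where o is not applicable in S)."""
    pre, effects = action
    if not pre <= state:
        return None
    # union the add and delete lists of exactly the appearing effects
    adds = {a for eff_pre, add, _ in effects if eff_pre <= state for a in add}
    dels = {d for eff_pre, _, dele in effects if eff_pre <= state for d in dele}
    return (state | adds) - dels

# An action with one unconditional add, one conditional add, and one
# unconditional delete (names are purely illustrative):
a = ({"p"}, [(set(), {"q"}, set()),
             ({"q"}, {"x"}, set()),   # does not appear when q is absent
             (set(), set(), {"p"})])
```

Applied to the state {p}, the action yields {q}: the conditional effect on q does not appear, since q does not hold yet when the action is executed.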
Empirical data is divided into three subareas:

1. The FF system took part in the fully automated track of the 2nd international planning systems competition, carried out alongside AIPS-2000 in Breckenridge, Colorado. We review the results, demonstrating FF's good run time and solution length behavior in the competition. We also give some intuitions on why FF behaves the way it does.

2. From our own experiments, we present some of the results that we have obtained in domains that were not used in the AIPS-2000 competition. First, we briefly summarize our findings in some more domains where FF works well. Then, to illustrate our intuitions on the reasons for FF's performance, we give a few examples of domains where the approach is less appropriate.

3. We finally present a detailed comparison of FF's performance to that of HSP, in the sense that we investigate which differences between FF and HSP lead to which performance results.

8.1 The AIPS-2000 Planning Systems Competition

From March to April 2000, the 2nd international planning systems competition, organized by Fahiem Bacchus, was carried out in the general setting of the AIPS-2000 conference in Breckenridge, Colorado. There were two main tracks, one for fully-automated planners and one for hand-tailored planners. Both tracks were divided into five parts, each one concerned with a different planning domain. Our FF system took part in the fully automated track. In the competition, FF demonstrated run time behavior superior to that of the other fully automatic planners and was therefore granted "Group A distinguished performance Planning System" (Bacchus & Nau, 2001). It also won the Schindler Award for the first place in the Miconic 10 Elevator domain, ADL track.
In this section, w e briey presen t the data collected in the fully automated trac k, and giv e, for eac h domain, some in tuitions on the reasons for FF's b eha vior. The reader should b e a w are that the comp etition made no distinction b et w een optimal and sub optimal planners, putting together the run time curv es for b oth groups. In the text to eac h domain, w e state whic h planners found optimal solutions, and whic h didn't. P er planning task, all planners w ere giv en half an hour running time on a 500 MHz P en tium I I I with 1GB main memory . If no solution w as found within these resource b ounds, the planner w as declared to ha v e failed on the resp ectiv e task. 8.1.1 The Logistics Domain The rst t w o domains that w ere used in the comp etition w ere the Logistics and Blo c ksw orld domains. W e rst lo ok at the former. This is a classical domain, in v olving the transp ortation 4. The source co de is a v ailable in an online app endix, and can b e do wnloaded from the FF Homepage at h ttp://www.informatik.uni-freiburg.de/~ homann/.h tml. 277 Hoffmann & Nebel of pac k ets via truc ks and airplanes. Figure 4 sho ws the run time curv es of those planners that w ere able to scale to bigger instances in the comp etition. 0.1 1 10 100 1000 10000 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 sec. problem size FF HSP2 System-R GRT Mips STAN Figure 4: Run time curv es on large Logistics instances for those six planners that could scale up to them. Time is sho wn on a logarithmic scale. The Logistics tasks w ere sub divided in to t w o sets of instances, the easy and the harder ones. Those planners that did w ell on all of the easy instances w ere also run on the harder set. These planners w ere FF, HSP2 (Bonet & Gener, 1998, 1999), System-R, GR T (Re- fanidis & Vlaha v as, 1999), Mips (Edelk amp, 2000), and ST AN (Long & F o x, 1999; F o x & Long, 2001). Tw o observ ations can b e made: 1. 
System-R do es signican tly w orse than the other planners. 2. The b etter planners all b eha v e quite similar, with FF and Mips tending to b e the fastest. Note also that times are sho wn on a logarithmic scale, so w e are not lo oking at linear time Logistics planners. Concerning solution plan length, w e do not sho w a gure here. None of the sho wn planners guaran tees the returned plans to b e optimal. It turns out that ST AN nds the shortest plans on most instances. System-R nds signican tly longer plans than the others, ranging from 178% to 261% of ST AN's plan lengths, with an a v erage of 224%. The lengths of FF's plans are within 97% to 115% of ST AN's plan lengths, with an a v erage length of 105%. Concerning FF's go o d run time b eha vior, w e think that there are mainly t w o reasons for that: 278 F ast Plan Genera tion Thr ough Heuristic Sear ch 1. In all iterations of enforced hill-clim bing, breadth rst searc h nds a state with b etter ev aluation at v ery small depths (motiv ating our searc h algorithm, cf. Section 5.1). In most cases, the next b etter successor is at depth 1, i.e., a direct one. There are some cases where the shallo w est b etter successor is at depth 2, and only v ery rarely breadth rst needs to go do wn to depth 3. These observ ations are indep enden t of task size. 2. The helpful actions heuristic prunes large fractions of the searc h space. Lo oking at the states that FF encoun ters during searc h, only b et w een 40 and 5 p ercen t of all of a state's successors w ere considered helpful in our exp erimen ts, with the tendency that the larger the task, the less helpful successors there are. There is a theoretical note to b e made on the rst observ ation. With the common represen tation of Logistics tasks, the follo wing can b e pro v en. Let d b e the maximal distance b et w een t w o lo cations, i.e., the n um b er of mo v e actions a mobile needs to tak e to get from one lo cation to another. 
Using a heuristic function that assigns to each state the length of an optimal relaxed solution as the heuristic value, the distance of each state to the next better evaluated state is at most d + 1. Thus, an algorithm that used enforced hill-climbing with an oracle function returning the length of an optimal relaxed solution would be polynomial on standard Logistics representations, given an upper limit to d. In the benchmarks available, mobiles can reach any location accessible to them in just one step, i.e., the maximal distance in those tasks is constantly d = 1. Also, FF's heuristic usually does find optimal, or close to optimal, relaxed solutions there, such that enforced hill-climbing almost never needs to look more than d + 1 = 2 steps ahead.

8.1.2 The Blocksworld Domain

The Blocksworld is one of the best known benchmark planning domains, where the planner needs to rearrange a bunch of blocks into a specified goal position, using a robot arm. Just like the Logistics tasks, the competition instances were divided into a set of easier ones, and a set of harder ones. Figure 5 shows the run time curves of the planners that scaled to the harder ones. System-R scales most steadily to the Blocksworld tasks used in the competition. In particular, it is the only planner that can solve all of those tasks. HSP2 solves some of the smaller instances, and FF solves about two thirds of the set. If FF succeeds on an instance, then it does so quite fast. For example, FF solves one of the size-50 tasks in 1.27 seconds, where System-R needs 892.31 seconds. None of the three planners finds optimal plans. On the tasks that HSP2 manages to solve, its plans are within 97% to 177% of System-R's plan lengths, with an average of 153%. On the tasks that FF manages to solve, its plans are within 83% to 108% of System-R's plan lengths, average 96%.
By experimenting with different configurations of FF, we found that the behavior of FF on these tasks is largely due to the goal ordering heuristics from Section 6.2. Goal distance estimates are not so good (the planner grabs a whole bunch of blocks with its single arm), and neither is the helpful actions heuristic (when the arm holds a block, all positions where the arm could possibly put the block are usually considered helpful). The goal agenda (Section 6.2.2), on the other hand, divides the tasks into small subtasks, and added goal deletion (Section 6.2) prevents the planner from putting blocks onto stacks where some block beneath still needs to be moved.

[Figure 5: Run time curves on large Blocksworld instances for those three planners that could scale up to them: FF, HSP2, and System-R. Time is shown on a logarithmic scale.]

However, in some cases achieving the goals from earlier entries in the goal agenda cuts off goals that are still ahead. Not aware of the blocks that it will need to stack for achieving goals ahead, the planner might put the current blocks onto stacks that need to be disassembled later on. If that happens with too many blocks (which depends more or less randomly on the specific task and the actions that the planner chooses), then the planner cannot find its way out of the situation again. These are probably the instances that FF couldn't solve in the competition.

8.1.3 The Schedule Domain

In the Schedule domain, the planner is facing a bunch of objects to be worked on with a set of machines, i.e., the planner is required to create a job schedule in which the objects shall be assigned to the machines. The competition representation makes use of a simple form of quantified conditional effects.
For example, if an object gets painted red, then that is its new color, and for all colors that it is currently painted in, it is not of that color anymore. Only a subset of the planners in the competition could handle this kind of conditional effects. Their run time curves are shown in Figure 6. Apart from those planners already seen, we have run time curves in Figure 6 for IPP (Koehler et al., 1997), PropPlan, and BDDPlan (Hölldobler & Störr, 2000). FF outperforms the other planners by many orders of magnitude (remember that time is shown on a logarithmic scale). Concerning solution length, FF's plans tend to be slightly longer than the plans returned by the other planners on the smaller instances. Optimal plans are found by Mips, PropPlan, and BDDPlan. FF's plan lengths are within 175% of the optimal lengths, with an average of 116%. Only HSP sometimes finds longer plans than FF, being in a range from 62% to 117% of FF's plan lengths, 94% on average.

[Figure 6: Run time curves on Schedule instances for those planners that could handle conditional effects. Time is shown on a logarithmic scale.]

Responsible for the outstanding run time behavior of FF on the Schedule domain is, apparently, the helpful actions heuristic. Measuring, for some example states, the percentage of successors that were considered helpful, we usually found it was close to 2 percent, i.e., only two out of a hundred applicable actions were considered by the planner. For example, all of the 637 states that FF looks at for solving one of the size-50 tasks have 523030 successors altogether, where the sum of all helpful successors is only 7663. Also, the better successors, similar to the Logistics domain, lie at shallow depths.
Breadth rst searc h nev er go es deep er than three steps on the Sc hedule tasks in the comp etition suite. Finally , in a few exp erimen ts w e ran for testing that, the goal agenda help ed b y ab out a factor 2 in terms of running time. 8.1.4 The F reecell Domain The F reecell domain formalizes a solitaire card game that comes with Microsoft Windo ws. The largest tasks en tered in the comp etition (size 13 in Figure 7) corresp ond directly to some real-w orld sized tasks, while in the smaller tasks, there are less cards to b e considered. Figure 7 sho ws the run time curv es of the four b est p erforming planners. 281 Hoffmann & Nebel 0.01 0.1 1 10 100 1000 10000 2 3 4 5 6 7 8 9 10 11 12 13 sec. problem size FF HSP2 Mips STAN Figure 7: Run time curv es on F reecell tasks for those planners that scaled to bigger instances. Time is sho wn on a logarithmic scale. F rom the group of the four b est-scaling planners sho wn in Figure 7, HSP2 is the slo w est, while ST AN is the fastest planner. FF is generally second place, and has a lot of v ariation in its running times. On the other hand, FF is the only planner that is capable of solving the real-w orld tasks, size 13. It solv es four out of v e suc h tasks. None of the sho wn planners guaran tees the found plans to b e optimal, and none of the sho wn planners demonstrates sup erior p erformance concerning solution length. ST AN pro duces unnecessarily long plans in a few cases. Precisely , on the tasks that b oth HSP and FF manage to solv e, HSP's plan lengths are within a range of 74% to 126% of FF's plan lengths, a v erage 95%. On tasks solv ed b y b oth Mips and FF, plan lengths of Mips are within 69% to 128% of FF's lengths, a v erage 101%. F or ST AN, the range is 65% to 318%, with 112% on a v erage. Concerning FF's run time b eha vior, the big v ariation in running time as w ell as its capa- bilit y of solving larger tasks b oth seem to result from the w a y the o v erall searc h algorithm is arranged. 
We observed the following. Those tasks that get solved by enforced hill-climbing are those that are solved fast. Sometimes, however, especially on the larger tasks, enforced hill-climbing runs into a dead end situation (no cards can be moved anymore). Then, the planner starts from scratch with complete best-first search, which takes more time, but can solve big instances quite reliably, as can be seen on the tasks of size 13. Helpful actions works moderately well, selecting around 70% of the available actions, and the better successors are usually close, but sometimes lie at depths of more than 5 steps.

8.1.5 The Miconic Domain

The final domain used in the competition comes from a real-world application, where moving sequences of elevators need to be planned. The sequences are subject to all kinds of restrictions, like that the VIPs need to be served first. To formulate all of these restrictions, complex first order preconditions are used in the representation (Koehler & Schuster, 2000). As only a few planners could handle the full ADL representation, the domain was subdivided into the easier STRIPS and SIMPLE (conditional effects) classes, the full ADL class, and an even more expressive class where numerical constraints (the number of passengers in the elevator at a time) needed to be considered. We show the run time curves for the participants in the full ADL class in Figure 8. Unlike the previous domains, the Miconic domain was run on site at AIPS-2000, using 450 MHz Pentium III machines with 256 MB main memory.

[Figure 8: Run time curves on Elevator tasks for those planners which handled the full ADL Miconic 10 domain representation. Time is shown on a logarithmic scale.]
FF outperforms the two other full ADL planners in terms of solution time. It must be noticed, however, that IPP and PropPlan generate provably optimal plans here, such that one needs to be careful when directly comparing those running times. On the other hand, FF's plans are quite close to optimal on these instances, being within a range of at most 133% of the optimal solution lengths on the instances solved by PropPlan, 111% on average. The large variation of FF's running times is apparently due to the same phenomenon as the variation in Freecell: sometimes, as we observed, enforced hill-climbing runs into a dead end, which causes a switch to best-first search, solving the task in more time, but reliably. The helpful actions percentage takes very low values on average, around 15%, and breadth first search rarely goes deeper than four or five steps, where the large majority of the better successors lie at depth 1.

8.2 Some More Examples

In this section, we present some of the results that we have obtained in domains that were not used in the AIPS-2000 competition. We give some more examples of domains where FF works well, and, to illustrate our intuitions on the reasons for FF's behavior, also some examples of domains where FF is less appropriate. For evaluation, we ran FF on a collection of 20 benchmark planning domains, including all domains from the AIPS-1998 and AIPS-2000 competitions, and seven more domains from the literature. Precisely, the domains in our suite were Assembly, two Blocksworlds (three- and four-operator representation), Briefcaseworld, Bulldozer, Freecell, Fridge, Grid, Gripper, Hanoi, Logistics, Miconic-ADL, Miconic-SIMPLE, Miconic-STRIPS, Movie, Mprime, Mystery, Schedule, Tireworld, and Tsp. Instances were either taken from published distributions, from the literature, or modified to show scaling behavior.
Times for FF were measured on a Sparc Ultra 10 running at 350 MHz, with a main memory of 256 MB. Running times that we show for other planners were taken on the same machine, if not otherwise indicated in the text. We found that FF shows extremely competitive performance on 16 of the 20 domains listed above. On the two Blocksworlds, Mprime, and Mystery, it still shows satisfying behavior. Some examples that have not been used in the AIPS-2000 competition are:

- The Assembly domain. FF solves 25 of the 30 tasks in the AIPS-1998 test suite in less than five seconds, where the five others are either unsolvable or have specification errors. The only other planner we know of that can solve any of the Assembly tasks is IPP. The latest version, IPP4.0, solves only four of the very small instances, taking up to 12 hours running time. FF's plan lengths are, in terms of the number of actions, shorter than IPP's time step optimal ones, ranging from 90% to 96%.

- The Briefcaseworld domain. This is a classical domain, where n objects need to be transported using a briefcase. Whenever the briefcase is moved, a conditional effect forces all objects inside the briefcase to move with it. From our suite, IPP4.0 easily solves the tasks with n ≤ 5 objects, but fails to solve any task where n ≥ 7. FF, on the other hand, solves even the 11-object tasks in less than a second. On the tasks that IPP solves, plan lengths of FF are within 84% to 111% of IPP's lengths, 99% on average.

- The Grid domain. The 1998 competition featured five instances. For these tasks, the fastest planning mechanism we know of from the literature is a version of GRT that is enhanced with a simple kind of domain dependent knowledge, supplied by the user. It solves the tasks in 1.04, 6.63, 21.35, 19.92 and 118.65 seconds on a 300 MHz Pentium Celeron machine with 64 MB main memory (Refanidis & Vlahavas, 2000). FF solves the same tasks within 0.15, 0.47, 2.11, 1.93 and 19.54 seconds, respectively. Plan lengths of FF are within 89% to 139% of GRT's lengths, 112% on average.

- The Gripper domain, used in the 1998 competition. The number of states that FF evaluates before returning an optimal sequential solution is linear in the size of the task. The biggest AIPS-1998 example gets solved in 0.16 seconds.

- The Tireworld domain. The original task formulated by Stuart Russell asks the planner to find out how to replace a flat tire. Koehler and Hoffmann (2000a) modified the task such that an arbitrary number n of tires need to be replaced. IPP3.2, using the goal agenda technique, solves the 1, 2, and 3-tire tasks in 0.08, 0.21, and 1.33 seconds, respectively, but exhausts memory resources as soon as n ≥ 4. FF scales to much larger tasks, taking less than a tenth of a second when n ≤ 6, and still solving the 10-tire task in 0.33 seconds. FF's plan lengths are, on the tasks that IPP manages to solve, equally long in terms of the number of actions.

As was already said, our intuition is that the majority of the currently available benchmark planning domains (at least those represented by our domain collection) are "simple" in structure, and that it is this simplicity which makes them solvable so easily by a greedy algorithm such as FF. To illustrate our intuitions, we now give data for a few domains that have a less simple structure. They are therefore challenging for FF.

5. All PDDL files, and random instance generators for all domains, are available in an online appendix. The generators, together with descriptions of our randomization strategies, are also available at http://www.informatik.uni-freiburg.de/~hoffmann/ff-domains.html.
8.2.1 The Mystery and Mprime Domains

The Mystery and Mprime domains were used in the AIPS-1998 competition. Both are variations of the Logistics domain, where there are additional constraints on the capacity of each vehicle, and, in particular, on the amount of fuel that is available. Both domains are closely related, the only difference being that in Mprime, fuel items can be transported between two locations, if one of those has more than one such item. In Figure 9, we compare FF's results on both domains to those reported by Drew McDermott for the Unpop system (McDermott, 1999). Instances are the same for both domains in Figure 9. Results for Unpop have been taken by McDermott on a 300 MHz Pentium-II workstation (McDermott, 1999). A dash indicates that the task couldn't be solved by the corresponding planner. One needs to be careful when comparing the running times in Figure 9: unlike FF, coded in C, Unpop is written in Lisp. Thus, the apparent run time superiority of FF in Figure 9 is not significant. On the contrary, Unpop seems to solve these task collections more reliably than FF: it finds solutions to four Mystery and three Mprime instances which FF does not manage to solve. None of the planners is superior in terms of solution lengths: on Mystery, FF ranges within 55% to 185% of Unpop's lengths, 103% on average; on Mprime, FF ranges within 45% to 150%, 93% on average. We think that FF's behavior on these two domains is due to the large amount of dead ends in the corresponding state spaces. We tried to randomize FF's search strategy, running it on the Mystery and Mprime suites. Regardless of the randomization strategy we tried, on the tasks that original FF couldn't solve, search ended up being stuck in a dead end.
                     Mystery                      Mprime
             Unpop        FF             Unpop        FF
task        time steps  time   steps    time steps  time    steps
prob-01      0.3    5    0.04     5      0.4    5     0.04     5
prob-02      3.3    8    0.25    10     13.5    8     0.27    10
prob-03      2.1    4    0.08     4      5.9    4     0.09     4
prob-04       -     -     -       -      3.9    9     0.04    10
prob-05       -     -     -       -     19.2   17      -       -
prob-06       -     -     -       -       -     -      -       -
prob-07       -     -     -       -       -     -      -       -
prob-08       -     -     -       -     52.5   10     0.40    10
prob-09      3.3    8     -       -     13.5    8     0.16    10
prob-10       -     -     -       -     79.0   19      -       -
prob-11      1.4   11    0.05     9      2.9   11     0.06     9
prob-12       -     -     -       -      8.0   12     0.20    10
prob-13    370.1   16     -       -     89.3   15     0.16    10
prob-14    162.1   18     -       -       -     -      -       -
prob-15     17.3    6    0.98     8     14.6    6     3.39     8
prob-16       -     -     -       -     25.2   13     0.28     7
prob-17     13.1    5    0.70     4      4.0    5     0.92     4
prob-18       -     -     -       -       -     -      -       -
prob-19     11.8    6     -       -     24.7    6     0.99     9
prob-20     22.5    7    0.41    13     62.8   17     3.11    13
prob-21       -     -     -       -     22.1   11      -       -
prob-22       -     -     -       -    135.7   16   643.19    23
prob-23       -     -     -       -     55.0   18     3.09    14
prob-24       -     -     -       -     24.8   15     2.7      9
prob-25      0.4    4    0.02     4      0.5    4     0.02     4
prob-26      6.0    6    0.85     7     16.4   14     0.16    10
prob-27      3.8    9    0.05     5      2.8    7     0.78     5
prob-28      1.4    9    0.01     7      1.6   11     0.08     5
prob-29      0.9    4    0.06     4      1.5    4     0.30     4
prob-30     20.8   14    0.23    11     17.7   12     1.86    11

Figure 9: Running times and solution length results on the AIPS-1998 Mystery and Mprime suites.

Dead ends are a frequent phenomenon in the Mystery and Mprime domains, where, for example, an important vehicle can run out of fuel. In that sense, the tasks in these domains have a more complex structure than those in a lot of other benchmark domains, where the tasks are dead-end free. Depending more or less randomly on task structure and selected actions, FF can either solve Mystery and Mprime tasks quite fast, or fail, i.e., encounter a dead end state with enforced hill-climbing. Trying to solve the tasks with complete best-first search exhausts memory resources for larger instances.

8.2.2 Random SAT Instances

Our last example domain is not a classical planning benchmark.
To give an example of a planning task collection where FF really encounters difficulties, we created a planning domain containing hard random SAT instances. Figure 10 shows run time curves for FF, IPP4.0, and BLACKBOX3.6.

The tasks in Figure 10 are solvable SAT instances that were randomly generated according to the fixed clause-length model with 4.3 times as many clauses as variables (Mitchell, Selman, & Levesque, 1992). Random instance generation and translation software to PDDL have both been provided by Jussi Rintanen. Our figure shows running times for SAT instances with 5, 10, 15, 20, 25, and 30 variables, five tasks of each size. Values for tasks of the same size are displayed in turn, i.e., all data points below 10 on the x-axis show running times for 5-variable tasks, and so on. Though the data set is small, the observation to be made is clear: FF can only solve the small instances, and two of the bigger ones. IPP and BLACKBOX scale much better, with the tendency that BLACKBOX is fastest.

Figure 10: Run time curves for FF, IPP, and BLACKBOX, when run on hard random SAT instances with an increasing number of variables. (The plot shows seconds, on a logarithmic scale from 0.01 to 1000, over the number of variables from 5 to 30.)

The encoding of the SAT instances is the following. An operator corresponds to assigning a truth value to a variable, which makes all clauses true that contain the respective literal. Once a variable has been assigned, its value is fixed. The goal is having all clauses true. It is not surprising that BLACKBOX does best: after all, this planner uses SAT technology for solving the tasks.^6 For IPP and FF, the search space is the space of all partial truth assignments. Due to exclusion relations, IPP can rule out quite many such assignments early, when it finds they can't be completed.
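To make the encoding concrete, the translation can be sketched as a small generator. This is an illustrative reconstruction only, not the actual PDDL translator provided by Jussi Rintanen; the fact and operator names are our own invention. An operator fixes one variable's value and adds a "satisfied" fact for every clause that contains the corresponding literal:

```python
# Sketch of the SAT-to-STRIPS encoding described in the text (hypothetical
# naming). Variables are 1..num_vars; a literal is +v or -v; a clause is a
# list of literals.

def encode(clauses, num_vars):
    """Return STRIPS-like operators and the goal for a CNF formula."""
    ops = []
    for v in range(1, num_vars + 1):
        for value, lit in ((True, v), (False, -v)):
            ops.append({
                "name": f"set-x{v}-{'true' if value else 'false'}",
                # a variable may only be assigned once: its value is fixed
                "pre": [f"unassigned-x{v}"],
                "add": [f"assigned-x{v}"] + [
                    f"satisfied-c{i}"  # clause i becomes true
                    for i, clause in enumerate(clauses) if lit in clause
                ],
                "del": [f"unassigned-x{v}"],
            })
    # the goal is having all clauses true
    goal = [f"satisfied-c{i}" for i in range(len(clauses))]
    return ops, goal

# tiny formula (x1 or not x2) and (x2): two variables, four operators
ops, goal = encode([[1, -2], [2]], num_vars=2)
print(len(ops), goal)  # 4 ['satisfied-c0', 'satisfied-c1']
```

A plan for the resulting task is exactly a satisfying assignment, which is why finding even a non-optimal plan here is NP-hard.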
FF, on the other hand, does no such reasoning, and gets lost in the exponential search space, using a heuristic that merely tells it how many variables it will still need to assign truth values to, unaware of the interactions that might, and most likely will, occur. In contrast to most of the current benchmark planning domains, finding a non-optimal solution to the planning tasks used here is NP-hard. FF's behavior on these tasks supports our intuition that FF's efficiency is due to the inherent simplicity of the planning benchmarks.

6. In these experiments, we ran BLACKBOX with the default parameters. Most likely, one can boost the performance by parameter tuning.

8.3 What Makes the Difference to HSP?

One of the questions that the authors have been asked most frequently at the AIPS-2000 planning competition is this: if FF is so closely related to HSP, then why does it perform so much better? FF uses the same basic ideas as classical HSP: forward search in state space, and heuristic evaluation by ignoring delete lists (Bonet & Geffner, 1998). The differences lie in the way FF estimates goal distances, the search strategy, and FF's pruning techniques. To obtain a picture of which new technique yields which performance results, we conducted a number of experiments where those techniques could be turned on and off independently of each other. Using all combinations of techniques, we measured run time and solution length performance on a large set of planning benchmark tasks. In this section, we describe the experimental setup and summarize our findings. The raw data and detailed graphical representations of the results are available in an online appendix.
8.3.1 Experimental Setup

We focused our investigation on FF's key features, i.e., we restricted our experiments to the FF base architecture, rather than taking into account all of FF's new techniques. Remember that FF's base architecture (cf. Section 2) is the enforced hill-climbing algorithm, using FF's goal distance estimates, and pruning the search space with the helpful actions heuristic. The additional techniques integrated deal with special cases, i.e., the added goal deletion heuristic and the goal agenda are concerned with goal orderings, and the complete best-first search serves as a kind of safety net when local search has run into a dead end. Considering all techniques independently would give us 2^6 = 64 different planner configurations. As each of the special case techniques yields savings only in a small subset (between 4 and 6) of our 20 domains, large groups of those 64 configurations would behave exactly the same on the majority of our domains. We decided to concentrate on FF's more fundamental techniques. The differences between classical HSP and FF's base architecture are the following:

1. Goal distance estimates: while HSP approximates relaxed solution lengths by computing certain weight values, FF extracts explicit relaxed solutions, cf. Section 4.

2. Search strategy: while classical HSP employs a variation of standard hill-climbing, FF uses enforced hill-climbing as was introduced in Section 5.

3. Pruning technique: while HSP expands all children of any search node, FF expands only those children that are considered helpful, cf. Section 6.1.

We have implemented experimental code where each of these algorithmic differences is attached to a switch, turning the new technique on or off. The eight different configurations of the switches yield eight different heuristic planners. When all switches are on, the resulting planner is exactly FF's base architecture.
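For concreteness, the enforced hill-climbing core of the base architecture can be recalled with a minimal sketch. This is a Python illustration, not FF's actual C implementation; the heuristic `h`, the `successors` function, and the goal test are assumed to be supplied, and helpful actions pruning would simply restrict which successors the `successors` function returns:

```python
from collections import deque

def enforced_hill_climbing(init, h, successors, is_goal):
    """From the current state, breadth-first search for a state with
    strictly better heuristic value, append the connecting path to the
    plan, and repeat. Returns None if no better state is reachable,
    e.g. when the search has run into a dead end."""
    state, plan = init, []
    while not is_goal(state):
        best = h(state)
        frontier = deque([(state, [])])
        seen = {state}
        found = None
        while frontier and found is None:
            s, path = frontier.popleft()
            for action, t in successors(s):
                if t in seen:
                    continue
                seen.add(t)
                if h(t) < best:  # strictly better: commit to this path
                    found = (t, path + [action])
                    break
                frontier.append((t, path + [action]))
        if found is None:
            return None  # enforced hill-climbing fails
        state, new_path = found
        plan += new_path
    return plan

# toy domain: integer states, goal at 5, h is the distance to 5
plan = enforced_hill_climbing(
    0, h=lambda s: abs(5 - s),
    successors=lambda s: [("inc", s + 1), ("dec", s - 1)],
    is_goal=lambda s: s == 5)
print(plan)  # ['inc', 'inc', 'inc', 'inc', 'inc']
```

Because the breadth-first episodes restart from each newly committed state, cutting down the branching factor via pruning pays off exponentially in the search depth.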
With all switches off, our intention was to imitate classical HSP, i.e., HSP1 as it was used in the AIPS-1998 competition. Concerning the goal distance estimates switch and the pruning techniques switch, we implemented the original methods. Concerning the search strategy, we used the following simple hill-climbing design:

- Always select one best evaluated successor randomly.

- Keep a memory of past states to avoid cycles in the hill-climbing path.

- Count the number of consecutive times in which the child of a node does not improve the heuristic estimate. If that counter exceeds a threshold, then restart, where the threshold is 2 times the initial state's goal distance estimate.

- Keep visited nodes in memory across restart trials in order to avoid multiple computation of the heuristic for the same state.

In HSP1, some more variations of restart techniques are implemented. In personal communication with Blai Bonet and Hector Geffner, we decided not to imitate those variations, which affect behavior only in a few special cases, and to use the simplest possible design instead. We compared the performance of our implementation with all switches turned off to the performance of HSP1, running the planners on 12 untyped STRIPS domains (the input required for HSP1). Except in four domains, the tasks solved were the same for both planners. In Freecell and Logistics, our planner solved more tasks, apparently due to implementation details: though HSP1 did not visit more states than our planner on the smaller tasks, it ran out of memory on the larger tasks.
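The design just described can be sketched in a few lines. This is a minimal Python illustration under our own naming, not HSP1's actual code; `h`, `successors`, and `is_goal` are assumed, and HSP1's additional restart variations are not modeled:

```python
import random

def restarting_hill_climbing(init, h, successors, is_goal, max_restarts=50):
    """Simple hill-climbing with restarts: select a random best-evaluated
    successor, avoid cycles via a memory of states on the current path,
    and restart when the count of consecutive non-improving steps exceeds
    twice the initial state's goal distance estimate. Heuristic values are
    cached across restart trials."""
    cache = {}  # visited-state heuristic values, shared across restarts

    def hv(s):
        if s not in cache:
            cache[s] = h(s)
        return cache[s]

    threshold = 2 * hv(init)
    for _ in range(max_restarts):
        state, plan, path, fails = init, [], {init}, 0
        while not is_goal(state):
            options = [(a, t) for a, t in successors(state) if t not in path]
            if not options:
                break  # dead end on this trial: restart
            best = min(hv(t) for _, t in options)
            action, nxt = random.choice(
                [(a, t) for a, t in options if hv(t) == best])
            fails = 0 if best < hv(state) else fails + 1
            if fails > threshold:
                break  # too many non-improving steps in a row: restart
            state = nxt
            path.add(state)
            plan.append(action)
        if is_goal(state):
            return plan
    return None

# same toy domain as before: h decreases in every step, so no restarts occur
plan = restarting_hill_climbing(
    0, h=lambda s: abs(5 - s),
    successors=lambda s: [("inc", s + 1), ("dec", s - 1)],
    is_goal=lambda s: s == 5)
print(plan)  # ['inc', 'inc', 'inc', 'inc', 'inc']
```

Unlike enforced hill-climbing, every action chosen on the way, including random strolls across plateaus, ends up in the returned plan.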
In Tireworld and Hanoi, the restarting techniques seem to make a difference. In Tireworld, HSP1 cannot solve tasks with more than one tire because it always restarts before getting close to the goal (our planner solves tasks with up to 3 tires), whereas in Hanoi our implementation cannot cope with more than 5 discs for the same reason (HSP1 solves tasks with up to 7 discs). Altogether, in most cases there is a close correspondence between the behavior of HSP1 and our configuration with all switches turned off. In any case, our experiments provide useful insights into the performance of enforced hill-climbing compared to a simple straightforward hill-climbing strategy.

To obtain data, we set up a large example suite, containing a total of 939 planning tasks from our 20 benchmark domains. As said at the beginning of Section 8.2, our domains were Assembly, two Blocksworlds (three- and four-operator representation), Briefcaseworld, Bulldozer, Freecell, Fridge, Grid, Gripper, Hanoi, Logistics, Miconic-ADL, Miconic-SIMPLE, Miconic-STRIPS, Movie, Mprime, Mystery, Schedule, Tireworld, and Tsp. In Hanoi, there were 8 tasks (3 to 10 discs to be moved); in the other domains, we used from 30 to 69 different instances. As very small instances are likely to produce noisy data, we tried to avoid those by rejecting tasks that were solved by FF in less than 0.2 seconds. This was possible in all domains but Movie, where all tasks in the AIPS-1998 suite get solved in at most 0.03 seconds. In the two Blocksworld representations, we randomly generated tasks with 7 to 17 blocks, using the state generator provided by John Slaney and Sylvie Thiebaux (2001). In Assembly and Grid, we used the AIPS-1998 instances, plus a number of randomly generated ones similar in size to the biggest examples in the competition suites.
In Gripper, our tasks contained from 10 to 59 balls to be transported. In the remaining 9 competition domains, we used the larger instances of the respective competition suites. In Briefcaseworld and Bulldozer, we randomly generated around 50 large tasks, with 10 to 20 objects and 14 to 24 locations, respectively. In Fridge, from 1 to 14 compressors had to be exchanged; in Tireworld, 1 to 30 wheels needed to be replaced; and in Tsp, 10 to 59 locations needed to be visited.^7

For each of the eight configurations of switches, we ran the respective planner on each of the tasks in our example suite. Those configurations using randomized hill-climbing were run five times on each task, and the results averaged afterwards. Though five trials might sound like a small number here (way too small if we were to compare different hill-climbing strategies for SAT problems, for example), the number seemed reasonable to us: remember that, in the planning framework, all hill-climbing trials start from the same state. The variance that we found between different trials was usually low in our testing runs. To complete the experiments in a reasonable time, we restricted memory consumption to 128 MByte, and time consumption to 150 seconds; usually, if FF needs more time or memory on a planning task of reasonable size, then it doesn't manage to solve it at all.

As said at the beginning of the section, the raw data is available in an online appendix, accompanied by detailed graphical representations. Here, we summarize the results and discuss the most interesting observations. We examined the data separately for each domain, as our algorithmic techniques typically show similar behavior for all tasks within a domain. In contrast, there can be essential differences in the behavior of the same technique when it is applied to tasks from different domains.
8.3.2 Running Time

For our running time investigation, if a configuration did not find a solution plan to a given task, we set the respective running time value to the time limit of 150 seconds (sometimes, a configuration can terminate faster without finding a plan, for example an enforced hill-climbing planner running into a dead end). In the following, we designate each switch configuration by 3 letters: "H" stands for helpful actions on, "E" stands for enforced hill-climbing on, "F" stands for FF estimates on. If a switch is turned off, the respective letter is replaced by a "-": FF's base architecture is configuration "HEF", our HSP1 imitation is "---", and "H--", for example, is hill-climbing with HSP goal distances and helpful actions pruning.

For a first impression of our running time results, see the averaged values per domain in Figure 11. Figure 11 shows, for each domain and each configuration, the averaged running time over all instances in that domain. As the instances in each domain are not all the same size, but typically scale from smaller to very large tasks, averaging over all running times is, of course, a very crude approximation of run time behavior. The data in Figure 11 provides a general impression of our run time results per domain, and gives a few hints on the phenomena that might be present in the data. Compare, for example, the values on the right hand side (those planners using helpful actions) to those on the left hand side (those planners expanding all sons of search nodes). In Briefcaseworld and Bulldozer, the right hand side values are higher, but in almost all other domains, they are considerably lower. This is especially true for the two rightmost columns, showing values for planners using helpful actions and enforced hill-climbing. This indicates that the main sources of performance lie

7.
All PDDL les, and the source co de of all instance generators w e used, are a v ailable in an online app endix. The generators, together with descriptions of the randomization strategies, are also a v ailable at h ttp://www.informatik.uni-freiburg.de/~ homann/-domains.h tml. 290 F ast Plan Genera tion Thr ough Heuristic Sear ch     F  E   EF H  H  F HE  HEF Assem bly 117.39 31.75 92.95 61.10 47.81 20.25 20.34 16.94 Blo c ksw orld-3ops 4.06 2.53 8.37 30.11 1.41 0.83 0.27 6.11 Blo c ksw orld-4ops 0.60 8.81 80.02 56.20 1.21 10.13 25.19 40.65 Briefcasew orld 16.35 5.84 66.51 116.24 150.00 150.00 150.00 150.00 Bulldozer 4.47 3.24 31.02 15.74 81.90 126.50 128.40 141.04 F reecell 65.73 46.05 54.15 51.27 57.35 42.68 43.99 41.44 F ridge 28.52 53.58 31.89 52.60 0.85 0.69 1.88 2.77 Grid 138.06 119.53 115.05 99.18 115.00 95.10 18.73 11.73 Gripp er 2.75 1.21 15.16 1.00 1.17 0.48 0.17 0.11 Hanoi 93.76 75.05 6.29 3.91 150.00 78.82 4.47 2.70 Logistics 79.27 102.09 79.77 111.47 36.88 39.69 10.18 11.94 Miconic-ADL 150.00 150.00 102.54 54.23 142.51 128.28 95.45 59.00 Miconic-SIMPLE 2.61 2.01 2.47 1.93 1.35 0.86 0.55 0.56 Miconic-STRIPS 2.71 2.32 4.84 1.53 1.44 1.01 0.64 0.36 Mo vie 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 Mprime 73.09 69.27 82.89 81.43 47.09 58.45 18.56 26.62 Mystery 78.54 90.55 71.60 86.01 75.73 95.24 85.13 86.21 Sc hedule 135.50 131.12 143.59 141.42 77.58 38.23 12.23 13.77 Tirew orld 135.30 110.38 119.22 121.34 121.13 105.67 97.41 85.64 Tsp 4.11 0.82 2.45 0.75 2.48 0.57 0.15 0.07 Figure 11: Av eraged running time p er domain for all eigh t congurations of switc hes. in the pruning tec hnique and the searc h strategy|lo oking at the righ tmost \HE  " and \HEF" columns, whic h only dier in the goal distance estimate, those t w o conguration v alues are usually close to eac h other, compared to the other congurations in the same domain. 
To put our observations on a solid basis, we looked, for each domain, at each pair of configurations in turn, amounting to 20 * (8 * 7)/2 = 560 pairs of planner performances. For each such pair, we decided whether one configuration performed significantly better than the other one. To decide significance, we counted the number of tasks that one configuration solved faster. We found this to be a more reliable criterion than things like the difference between running times for each task. As tasks grow in size, rather than being taken from a population with finite mean size, parametric statistical procedures, like computing confidence intervals for run time differences, make questionable assumptions about the distribution of data. We thus used the following non-parametric statistical test, known as the two-tailed sign test (Siegel & Castellan, 1988).

Assume that both planners, A and B, perform equally on a given domain. Then, given a random instance from the domain, the probability that B is faster than A should be equal to the probability that A is faster than B. Take this as the null hypothesis. Under that hypothesis, if A and B behave differently on an instance, then B is faster than A with probability 1/2. Thus, the tasks where B is faster are distributed over the tasks with different behavior according to a binomial distribution with p = 1/2. Compute the probability of the observed outcome under the null hypothesis, i.e., if there are n tasks where A and B behave differently, and k tasks where B is faster, then compute the probability that, according to a binomial distribution with p = 1/2, at least k positive outcomes are obtained in n trials. If that probability is less than or equal to 0.01, then reject the null hypothesis and say that B performs significantly better than A. Symmetrically, decide whether A performs significantly better than B.
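The test can be spelled out in a few lines (a sketch with hypothetical counts; `n` is the number of tasks where the planners behave differently, `k` the number of those where B is faster):

```python
from math import comb

def b_significantly_better(n, k, alpha=0.01):
    """Sign test as described in the text: under the null hypothesis,
    B is faster on a differing task with probability 1/2, so compute the
    binomial probability of at least k successes in n trials and reject
    the null hypothesis at level alpha."""
    p_at_least_k = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return p_at_least_k <= alpha

print(b_significantly_better(10, 10))  # True:  P = 1/1024 <= 0.01
print(b_significantly_better(10, 8))   # False: P = 56/1024 > 0.01
```

The symmetric decision for A is the same call with k replaced by n - k.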
We remark that in all domains except Movie, the tasks where two configurations behaved equally were exactly those that could not be solved by either of the configurations. In 60% of the cases where we found that one configuration B performed significantly better than another configuration, B was faster on all instances with different behavior. In 71%, B was faster on all but one such instance.

We are particularly interested in pairs A and B of configurations where B results from A by turning one of the switches on, leaving the two others unchanged. Deciding about significant improvement in such cases tells us about the effect that the respective technique has on performance in a domain. There are 12 pairs of configurations where one switch is turned on. Figure 12 shows our findings in these cases.

Figure 12: The effect of turning on a single switch, keeping the others unchanged. Summarized in terms of significantly improved or degraded running time performance per domain, and per switch configuration.

Figure 12 is to be understood as follows. It shows our results for the "F", "E", and "H" switches, which become active in turn from left to right.
For each of these switches, there are four configurations of the two other, background, switches, displayed by four columns in the table. In each column, the behavior of the respective background configuration with the active switch turned off is compared to the behavior with the active switch turned on. If performance is improved significantly, the table shows a "+"; if it is significantly degraded, the table shows a "-"; otherwise the respective table entry is empty. For example, consider the top left corner, where the "F" switch is active, and the background configuration is "--", i.e., hill-climbing without helpful actions. Planner A is "---", using HSP distances, and planner B is "--F", using FF distances. B's performance is significantly better than A's, indicated by a "+".

The leftmost four columns in Figure 12 show our results for HSP distance estimates versus FF distance estimates. Clearly, the latter estimates are superior in our domains, in the sense that, for each background configuration, the behavior gets significantly improved in 8 to 10 domains. In contrast, there are only 5 cases altogether where performance gets worse. The significances are quite scattered over the domains and background configurations, indicating that a lot of the significances result from interactions between the techniques that occur only in the context of certain domains. For example, performance is improved in Bulldozer when the background configuration does not use helpful actions, but degraded when the background configuration uses hill-climbing with helpful actions. This kind of behavior cannot be observed in any other domain. There are 4 domains where performance is improved in all but one background configuration. Apparently, in these cases some interaction between the techniques occurs only in one specific configuration.
We remark that often running times with FF's estimates are only a little better than with HSP's estimates, i.e., behavior gets improved reliably over all instances, but only by a small factor (to get an idea of that, compare the differences between average running times in Figure 11, for configurations where only the distance estimate changes). In 5 domains, FF's estimates improve performance consistently over all background configurations, indicating a real advantage of the different distance estimates. In Gripper (described in Section 6.1), for example, we found the following. If the robot is in room A, and holds only one ball, FF's heuristic prefers picking up another ball over moving to room B, i.e., the picking action leads to a state with better evaluation. Now, if there are n balls left in room A, then HSP's heuristic estimate of picking up another ball is 4n - 2, while the estimate of moving to room B is 3n + 1. Thus, if there are at least 4 balls left in room A, moving to room B gets a better evaluation. Summing up weights, HSP overestimates the usefulness of the moving action.

Comparing hill-climbing versus enforced hill-climbing, i.e., looking at the four columns in the middle of Figure 12, the observation is this. The different search technique is a bit questionable when the background configuration does not use helpful actions, but otherwise, enforced hill-climbing yields excellent results. Without helpful actions, performance gets degraded almost as many times as it gets improved, whereas, with helpful actions, enforced hill-climbing improves performance significantly in 16 of our 20 domains, being degraded only in Fridge. We draw two conclusions. First, whether one or the other search strategy is adequate depends very much on the domain.
A simple example for that is the Hanoi domain, where hill-climbing always restarts before it can reach the goal: on all paths to the goal, there are exponentially many state transitions where the son has no better evaluation than the father. Second, there is an interaction between enforced hill-climbing and helpful actions pruning that occurs consistently across almost all of our planning domains. This can be explained by the effect that the pruning technique has on the different search strategies. In hill-climbing, helpful actions pruning prevents the planner from looking at too many superfluous successors of each single state that a path goes through. This saves time proportional to the length of the path. The effects on enforced hill-climbing are much more drastic. There, helpful actions pruning removes unnecessary successors of each state during a breadth-first search, i.e., it cuts down the branching factor, yielding performance speedups exponential in the depths that are encountered.

We finally compare consideration of all actions versus consideration of only the helpful ones. Look at the rightmost four columns of Figure 12. The observation is simply that helpful actions are really helpful: they improve performance significantly in almost all of our planning domains. This is especially true for those background configurations using enforced hill-climbing, due to the same interaction that we have outlined above. In some domains, helpful actions pruning imposes a very rigid restriction on the search space: in Schedule, as said in Section 8.1.3, we found that states can have hundreds of successors, where only about 2% of those are considered helpful. In other domains, only a few actions are pruned, like in Hanoi, where at most three actions are applicable in each state, which are all considered helpful in most of the cases.
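A back-of-the-envelope calculation illustrates the exponential effect (the concrete numbers are hypothetical, merely in the spirit of the Schedule observation that only about 2% of hundreds of successors are helpful):

```python
def bfs_nodes(branching, depth):
    """Nodes generated by a complete breadth-first search to the given
    depth: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** i for i in range(depth + 1))

# hill-climbing visits one state per step, so pruning saves a constant
# factor per step; an enforced hill-climbing episode runs breadth-first
# search, so cutting the branching factor pays off exponentially:
full, pruned = bfs_nodes(200, 3), bfs_nodes(4, 3)  # 200 successors vs. 2% helpful
print(full, pruned)  # 8040201 85
```

Even at the modest depth of 3, the pruned search touches five orders of magnitude fewer nodes.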
Even a small degree of restriction usually leads to a significant improvement in performance. In two domains, Briefcaseworld and Bulldozer, helpful actions can prune out too many possibilities, i.e., they cut away solution paths. This happens because, there, the relaxed plan can ignore things that are crucial for solving the real task. Consider Briefcaseworld, briefly described in Section 8.2, where objects need to be moved using a briefcase. Whenever the briefcase is moved, all objects inside it are moved with it by a conditional effect. Now, the relaxed planner never needs to take any object out of the briefcase: the delete effects say that moving an object means the object is no longer at the start location. Ignoring this, keeping objects inside the briefcase never hurts.

8.3.3 Solution Length

We also investigated the effects that FF's new techniques have on solution length. Comparing two configurations A and B, we took as the data set the respective solution lengths for those tasks that both A and B managed to solve; obviously, there is not much point in comparing solution length when one planner cannot find a solution at all. We then counted the number n of tasks where A and B behaved differently, and the number k where B's solution was shorter, and decided about significance as described in the last section. Figure 13 shows our results in those cases where a single switch is turned. The data in Figure 13 are organized in the obvious manner, analogous to Figure 12.

A first glance at the table tells us that FF's new techniques are also useful for shortening solution length in comparison to HSP1, but not as useful as they are for improving run time behavior. Let us focus on the leftmost four columns, HSP distance estimates versus FF distance estimates.
The observations are that, with enforced hill-climbing in the background, FF estimates often result in shorter plans, and that there are two domains where solution lengths are improved across all background configurations.

Figure 13: The effect of turning on a single switch, keeping the others unchanged. Summarized in terms of significantly improved or degraded solution length performance per domain, and per switch configuration.

Concerning the second observation, this is due to properties of the domain that FF's heuristic recognizes, but HSP's doesn't. Recall what we observed about the Gripper domain in the preceding section. With the robot standing in room A, holding only one ball, the FF heuristic gives picking up the ball a better evaluation than moving to room B. The HSP heuristic doesn't do this. Therefore, using the HSP heuristic results in longer plans. Concerning the first observation, improved solution lengths when enforced hill-climbing is in the background, we do not have a good explanation for this. It seems that the greedy way in which enforced hill-climbing builds its plans is just better suited when distance estimates are cautious, i.e., low.

Consider the four columns in the middle of Figure 13, hill-climbing versus enforced hill-climbing. There are many cases where the different search strategy results in shorter plans.
W e gure that this is due to the dieren t plateau b eha vior that the searc h metho ds exhibit, i.e., their b eha vior in at regions of the searc h space. Enforced hill-clim bing en ters a plateau somewhere, p erforms complete searc h for a state with b etter ev aluation, and adds the shortest path to that state to its curren t plan prex. When hill-clim bing en ters a plateau, it strolls around more or less randomly , un til it hits a state with b etter ev aluation, or has enough of it and restarts. All the actions on its journey to the b etter state are k ept in the nal plan. In Mo vie , the phenomenon is this. If a planner c ho oses to reset the coun ter on the V CR b efore it c ho oses to rewind the mo vie (initially , neither heuristic mak es a distinction b et w een these t w o actions), then it has to reset the coun ter again. The enforced hill-clim bing planners alw a ys reset the coun ter rst. The hill-clim bing planners, 295 Hoffmann & Nebel on the other hand, randomly c ho ose either ordering with equal probabilit y . As said in Section 8.3.1, hill-clim bing w as giv en v e tries on eac h task, and results a v eraged. In v e tries, around half of the solutions use the correct ordering, suc h that, for all tasks, the a v erage v alue is lo w er than the corresp onding v alue for the enforced hill-clim bing planners. Finally , w e compare consideration of all actions v ersus consideration of only the helpful ones, results depicted in the righ tmost four columns of Figure 12. Coming a bit unexp ected, there is only one single case where solution length p erformance is degraded b y turning on helpful actions. This indicates that the actions on the shortest path to the goal are, in fact, usually considered helpful|unless al l solution paths are thro wn a w a y , as is sometimes the case only in the Briefcasew orld and Bulldozer domains. 
Quite the other way around from what one would expect, pruning the search space with helpful actions sometimes leads to significantly shorter solution plans, especially when the underlying search method is hill-climbing. Though this may sound paradoxical, there is a simple explanation. Consider what we said above about the plateau behavior of hill-climbing, randomly adding actions to the current plan in the search for a better state. If such a search engine is armed with the helpful actions successor choice, focusing it in the direction of the goals, it may well take fewer steps to find its way off a plateau.

9. Related Work

The most important connections of the FF approach to methodologies reported in the literature are the following:

- HSP's basic idea of forward state space search and heuristic evaluation by ignoring delete lists (Bonet & Geffner, 1998).
- The view of our heuristic as a special case of GRAPHPLAN (Blum & Furst, 1995), and its connection to HSP's heuristic method.
- The similarity of the helpful actions heuristic to McDermott's favored actions (1996), and to irrelevance detection mechanisms (Nebel et al., 1997).
- The inspiration of the added goal deletion heuristic by work done by Koehler and Hoffmann (2000a), and the adaption of the goal agenda approach (Koehler, 1998).
- The adaption of IPP's ADL preprocessing phase (Koehler & Hoffmann, 2000b), inspired by ideas from Gazen and Knoblock (1997).

We have discussed all of these connections in the respective sections already, so let us focus on a connection that has not yet been mentioned. It has been recognized, after the first planning competition at AIPS-1998, that the main bottleneck in HSP1 is the recomputation of the heuristic on each single search state.
Two recent approaches are based on the observation that the repeated recomputation is necessary because HSP1 does forward search with a forward heuristic, i.e., the directions of search and heuristic are the same. The authors of HSP themselves stick to their heuristic, but change the search direction, going backwards from the goal in HSP-r (Bonet & Geffner, 1999). This way, they need to compute weight values only once, estimating each fact's distance to the initial state, and only sum the weights up for a state later during search.^8 Refanidis and Vlahavas (1999) invert the direction of the HSP heuristic instead. While HSP computes distances going from the current state towards the goal, GRT goes from the goal to each fact. The function that then extracts, for each state during forward search, the state's heuristic estimate uses the precomputed distances as well as some information on which facts will probably be achieved simultaneously. Interestingly, FF recomputes, like HSP, the heuristic from scratch on each search state, but nevertheless outperforms the other approaches. As we have seen in Section 8.3, this is for the most part due to FF's search strategy and the helpful actions pruning technique.

10. Conclusion and Outlook

We have presented an approach to domain independent planning that, at the time being, outperforms all existing technology on the majority of the currently available benchmark domains. Just like the well known HSP1 system, it relies completely on forward state space search and heuristic evaluation of states by ignoring delete lists. Unlike HSP, the method uses a GRAPHPLAN-style algorithm to find an explicit relaxed solution to each search state. Those solutions give a more careful estimation of a state's difficulty.
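The relaxed-solution idea can be sketched as follows. This is a minimal illustration, not FF's implementation: a STRIPS task is assumed to be given as (name, precondition, add-list) triples with delete lists already dropped, and the backward extraction is simplified (FF additionally prefers NOOPs and reuses actions across goals, which this sketch does not).

```python
def relaxed_plan_length(init, goals, actions):
    """Estimate goal distance as the number of actions in a relaxed
    plan: build fact layers with deletes ignored until all goals
    appear, then extract supporting actions backwards."""
    goals = frozenset(goals)
    layers = [frozenset(init)]
    while not goals <= layers[-1]:
        nxt = set(layers[-1])
        for _name, pre, add in actions:
            if pre <= layers[-1]:
                nxt |= add
        if frozenset(nxt) == layers[-1]:
            return None  # goals unreachable even in the relaxation
        layers.append(frozenset(nxt))

    def first(fact):  # index of the layer where a fact first appears
        return min(i for i, layer in enumerate(layers) if fact in layer)

    goal_at = {i: set() for i in range(len(layers))}
    for g in goals:
        goal_at[first(g)].add(g)
    count = 0
    for i in range(len(layers) - 1, 0, -1):
        for g in goal_at[i]:
            # pick some action that achieves g and is applicable one layer earlier
            _name, pre, _add = next(a for a in actions
                                    if g in a[2] and a[1] <= layers[i - 1])
            count += 1
            for p in pre:  # its preconditions become subgoals at earlier layers
                if first(p) > 0:
                    goal_at[first(p)].add(p)
    return count
```

Counting the actions of such a relaxed plan, rather than summing independently estimated fact costs as HSP's additive heuristic does, is what yields the more careful estimate when subgoals share achieving actions.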
As a second major difference to HSP, our system employs a novel local search strategy, combining hill-climbing with complete search. Finally, the method makes use of powerful heuristic pruning techniques, which are based on examining relaxed solutions.

As we have mentioned earlier, our intuition is that the reasons for FF's efficiency lie in structural properties that the current planning benchmarks tend to have. As a matter of fact, the simplicity of the benchmarks quite immediately meets the eye, once one tries to look for it. It should be clear that the Gripper tasks, where some balls need to be transported from one room to another, exhibit a totally different search space structure than, for example, hard random SAT instances. It is therefore intuitively unsurprising that different search methods are appropriate for the former tasks than are traditionally used for the latter. The efficiency of FF on many of the benchmarks can be seen as bringing that observation to the surface.

To make explicit the hypotheses stated above, we have investigated the state spaces of the planning benchmarks. Following Frank et al. (1997), we have collected empirical data, identifying characteristic parameters for different kinds of planning tasks, like the density and size of local minima and plateaus in the search space. This has led us to a taxonomy for planning domains, dividing them by the degree of complexity that the respective tasks' state spaces exhibit with respect to relaxed goal distances. Most of the current benchmark domains apparently belong to the "simpler" parts of that taxonomy (Hoffmann, 2001). We also approach our hypotheses from a theoretical point of view, where we measure the degree of interaction that facts in a planning task exhibit, and draw conclusions on the search space structure from that.
Our goal in that research is to devise a method that automatically decides which part of the taxonomy a given planning task belongs to.

In that context, there are some remarks to be made on where AI planning research is heading. Our point of view is that the goal in the field should not be to develop a technology that works well on all kinds of tasks one can express with planning languages. This will hardly be possible, as even simple languages such as STRIPS can express NP-hard problems like SAT. What might be possible, however, is to devise a technology that works well on those tasks that can be solved efficiently. In particular, if a planning task does not constitute much of a problem to an uninformed human solver, then it should not do so to our planning algorithms either. With the FF system, we already seem to have a method that accomplishes this quite well, at least for sequential planning in STRIPS and ADL. While FF is not particularly well suited for solving random SAT instances, it easily solves intuitively simple tasks like the Gripper and Logistics ones, and is well suited for a number of other domains where finding a non-optimal solution is not NP-hard. This sheds a critical light on the predictions of Kautz and Selman (1999), who suspected that planning technology will become superfluous because of the fast advance of the state of the art in propositional reasoning systems. The methods developed there are surely useful for solving SAT. They might, however, not be appropriate for the typical structures of tasks that AI planning should be interested in.

8. HSP-r is integrated into HSP2 as an option for configuring the search process (Bonet & Geffner, 2001).

Acknowledgments

This article is an extended and revised version of a paper (Hoffmann, 2000) that was published at ISMIS-00.
The authors wish to thank Blai Bonet and Hector Geffner for their help in setting up the experiments on the comparison of FF with HSP. We thank Jussi Rintanen for providing us with software to create random SAT instances in the PDDL language, and acknowledge the anonymous reviewers' comments, which helped improve the paper.

References

Anderson, C. R., Smith, D. E., & Weld, D. S. (1998). Conditional effects in Graphplan. In Simmons, R., Veloso, M., & Smith, S. (Eds.), Proceedings of the 4th International Conference on Artificial Intelligence Planning Systems (AIPS-98), pp. 44-53. AAAI Press, Menlo Park.

Bacchus, F. (2000). Subset of PDDL for the AIPS2000 Planning Competition. The AIPS-00 Planning Competition Committee.

Bacchus, F., & Nau, D. (2001). The 2000 AI planning systems competition. The AI Magazine. Forthcoming.

Blum, A. L., & Furst, M. L. (1995). Fast planning through planning graph analysis. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), pp. 1636-1642, Montreal, Canada. Morgan Kaufmann.

Blum, A. L., & Furst, M. L. (1997). Fast planning through planning graph analysis. Artificial Intelligence, 90(1-2), 279-298.

Bonet, B., & Geffner, H. (1998). HSP: Heuristic search planner. In AIPS-98 Planning Competition, Pittsburgh, PA.

Bonet, B., & Geffner, H. (1999). Planning as heuristic search: New results. In Biundo, S., & Fox, M. (Eds.), Recent Advances in AI Planning. 5th European Conference on Planning (ECP'99), Durham, UK. Springer-Verlag.

Bonet, B., & Geffner, H. (2001). Planning as heuristic search. Artificial Intelligence. Forthcoming.

Bonet, B., Loerincs, G., & Geffner, H. (1997). A robust and fast action selection mechanism for planning. In Proceedings of the 14th National Conference of the American Association for Artificial Intelligence (AAAI-97), pp. 714-719. MIT Press.

Bylander, T. (1994). The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69(1-2), 165-204.

Cheng, J., & Irani, K. B. (1989). Ordering problem subgoals. In Sridharan, N. S. (Ed.), Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI-89), pp. 931-936, Detroit, MI. Morgan Kaufmann.

Drummond, M., & Currie, K. (1989). Goal ordering in partially ordered plans. In Sridharan, N. S. (Ed.), Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI-89), pp. 960-965, Detroit, MI. Morgan Kaufmann.

Edelkamp, S. (2000). Heuristic search planning with BDDs. In ECAI-Workshop: PuK.

Even, S., & Shiloach, Y. (1975). NP-completeness of several arrangement problems. Tech. rep. 43, Department of Computer Science, Haifa, Israel.

Fikes, R. E., & Nilsson, N. (1971). STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2, 189-208.

Fox, M., & Long, D. (1998). The automatic inference of state invariants in TIM. Journal of Artificial Intelligence Research, 9, 367-421.

Fox, M., & Long, D. (2001). Hybrid STAN: Identifying and managing combinatorial optimisation sub-problems in planning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, Washington, USA. Morgan Kaufmann. Accepted for publication.

Frank, J., Cheeseman, P., & Stutz, J. (1997). When gravity fails: Local search topology. Journal of Artificial Intelligence Research, 7, 249-281.

Gazen, B. C., & Knoblock, C. (1997). Combining the expressiveness of UCPOP with the efficiency of Graphplan. In Steel, S., & Alami, R. (Eds.), Recent Advances in AI Planning. 4th European Conference on Planning (ECP'97), Vol. 1348 of Lecture Notes in Artificial Intelligence, pp. 221-233, Toulouse, France. Springer-Verlag.

Hoffmann, J. (2000). A heuristic for domain independent planning and its use in an enforced hill-climbing algorithm. In Proceedings of the 12th International Symposium on Methodologies for Intelligent Systems (ISMIS-00), pp. 216-227. Springer-Verlag.

Hoffmann, J. (2001). Local search topology in planning benchmarks: An empirical analysis. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, Washington, USA. Morgan Kaufmann. Accepted for publication.

Hölldobler, S., & Störr, H.-P. (2000). Solving the entailment problem in the fluent calculus using binary decision diagrams. In Proceedings of the First International Conference on Computational Logic (CL). To appear.

Irani, K. B., & Cheng, J. (1987). Subgoal ordering and goal augmentation for heuristic problem solving. In McDermott, J. (Ed.), Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87), pp. 1018-1024, Milan, Italy. Morgan Kaufmann.

Jonsson, P., Haslum, P., & Bäckström, C. (2000). Planning - a randomized approach. Artificial Intelligence, 117(1), 1-29.

Joslin, D., & Roach, J. W. (1990). A theoretical analysis of conjunctive-goal problems. Artificial Intelligence, 41, 97-106.

Kambhampati, S., Parker, E., & Lambrecht, E. (1997). Understanding and extending Graphplan. In Steel, S., & Alami, R. (Eds.), Recent Advances in AI Planning. 4th European Conference on Planning (ECP'97), Vol. 1348 of Lecture Notes in Artificial Intelligence, pp. 260-272, Toulouse, France. Springer-Verlag.

Kautz, H., & Selman, B. (1999). Unifying SAT-based and graph-based planning. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99), pp. 318-325, Stockholm, Sweden. Morgan Kaufmann.

Kautz, H. A., & Selman, B. (1996). Pushing the envelope: Planning, propositional logic, and stochastic search. In Proceedings of the 13th National Conference of the American Association for Artificial Intelligence (AAAI-96), pp. 1194-1201. MIT Press.

Koehler, J. (1998). Solving complex planning tasks through extraction of subproblems. In Simmons, R., Veloso, M., & Smith, S. (Eds.), Proceedings of the 4th International Conference on Artificial Intelligence Planning Systems (AIPS-98), pp. 62-69. AAAI Press, Menlo Park.

Koehler, J., & Hoffmann, J. (2000a). On reasonable and forced goal orderings and their use in an agenda-driven planning algorithm. Journal of Artificial Intelligence Research, 12, 338-386.

Koehler, J., & Hoffmann, J. (2000b). On the instantiation of ADL operators involving arbitrary first-order formulas. In Proceedings ECAI-00 Workshop on New Results in Planning, Scheduling and Design.

Koehler, J., Nebel, B., Hoffmann, J., & Dimopoulos, Y. (1997). Extending planning graphs to an ADL subset. In Steel, S., & Alami, R. (Eds.), Recent Advances in AI Planning. 4th European Conference on Planning (ECP'97), Vol. 1348 of Lecture Notes in Artificial Intelligence, pp. 273-285, Toulouse, France. Springer-Verlag.

Koehler, J., & Schuster, K. (2000). Elevator control as a planning problem. In Chien, S., Kambhampati, R., & Knoblock, C. (Eds.), Proceedings of the 5th International Conference on Artificial Intelligence Planning Systems (AIPS-00). AAAI Press, Menlo Park.

Long, D., & Fox, M. (1999). Efficient implementation of the plan graph in STAN. Journal of Artificial Intelligence Research, 10, 87-115.

McAllester, D. A., & Rosenblitt, D. (1991). Systematic nonlinear planning. In Proceedings of the 9th National Conference of the American Association for Artificial Intelligence (AAAI-91), pp. 634-639, Anaheim, CA. MIT Press.

McDermott, D. (1996). A heuristic estimator for means-ends analysis in planning. In Proceedings of the 3rd International Conference on Artificial Intelligence Planning Systems (AIPS-96), pp. 142-149. AAAI Press, Menlo Park.

McDermott, D., et al. (1998). The PDDL Planning Domain Definition Language. The AIPS-98 Planning Competition Committee.

McDermott, D. V. (1999). Using regression-match graphs to control search in planning. Artificial Intelligence, 109(1-2), 111-159.

Mitchell, D., Selman, B., & Levesque, H. J. (1992). Hard and easy distributions of SAT problems. In Proceedings of the 10th National Conference of the American Association for Artificial Intelligence (AAAI-92), pp. 459-465, San Jose, CA. MIT Press.

Nebel, B. (2000). On the compilability and expressive power of propositional planning formalisms. Journal of Artificial Intelligence Research, 12, 271-315.

Nebel, B., Dimopoulos, Y., & Koehler, J. (1997). Ignoring irrelevant facts and operators in plan generation. In Steel, S., & Alami, R. (Eds.), Recent Advances in AI Planning. 4th European Conference on Planning (ECP'97), Vol. 1348 of Lecture Notes in Artificial Intelligence, pp. 338-350, Toulouse, France. Springer-Verlag.

Pednault, E. P. (1989). ADL: Exploring the middle ground between STRIPS and the situation calculus. In Brachman, R., Levesque, H. J., & Reiter, R. (Eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the 1st International Conference (KR-89), pp. 324-331, Toronto, ON. Morgan Kaufmann.

Refanidis, I., & Vlahavas, I. (1999). GRT: A domain independent heuristic for STRIPS worlds based on greedy regression tables. In Biundo, S., & Fox, M. (Eds.), Recent Advances in AI Planning. 5th European Conference on Planning (ECP'99), Durham, UK. Springer-Verlag.

Refanidis, I., & Vlahavas, I. (2000). Exploiting state constraints in heuristic state-space planning. In Chien, S., Kambhampati, R., & Knoblock, C. (Eds.), Proceedings of the 5th International Conference on Artificial Intelligence Planning Systems (AIPS-00), pp. 363-370. AAAI Press, Menlo Park.

Russell, S., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ.

Siegel, S., & Castellan, N. J., Jr. (1988). Nonparametric Statistics for the Behavioral Sciences (2nd edition). McGraw-Hill.

Slaney, J., & Thiebaux, S. (2001). Blocks world revisited. Artificial Intelligence, 125, 119-153.
