Efficient Implementation of the Plan Graph in STAN

Journal of Artificial Intelligence Research 10 (1999) 87-115. Submitted 9/98; published 2/99.

Derek Long, Maria Fox
Department of Computer Science, University of Durham, UK
d.p.long@dur.ac.uk, maria.fox@dur.ac.uk

Abstract

Stan is a Graphplan-based planner, so-called because it uses a variety of STate ANalysis techniques to enhance its performance. Stan competed in the AIPS-98 planning competition where it compared well with the other competitors in terms of speed, finding solutions fastest to many of the problems posed. Although the domain analysis techniques Stan exploits are an important factor in its overall performance, we believe that the speed at which Stan solved the competition problems is largely due to the implementation of its plan graph. The implementation is based on two insights: that many of the graph construction operations can be implemented as bit-level logical operations on bit vectors, and that the graph should not be explicitly constructed beyond the fix point. This paper describes the implementation of Stan's plan graph and provides experimental results which demonstrate the circumstances under which advantages can be obtained from using this implementation.

1. Introduction

Stan is a domain-independent planner for STRIPS domains, based on the graph construction and search method of Graphplan (Blum & Furst, 1997). Its name is derived from the fact that it performs a number of preprocessing analyses, or STate ANalyses, on the domain before planning, using the Type Inference Module Tim described by Fox and Long (1998). Stan competed in the AIPS-98 planning competition and achieved an excellent overall performance in both rounds.
The results of the competition, which can be found at the URL given in Appendix A, show that Stan was able to solve some problems notably quickly and that it could find optimal parallel solutions to some problems which could not be solved optimally by any other planner in the competition, for example in the Gripper domain. The problems posed in the competition did not give Stan much opportunity to exploit its domain analysis techniques, so this performance is due mainly to the underlying implementation of the plan graph that Stan constructs and searches. A more detailed discussion of the competition, from the competitors' point of view, is in preparation.

© 1999 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

The design of Stan's plan graph is based on two insights. First, we observe that action pre- and post-conditions can be represented using bit vectors. Checking for mutual exclusion between pairs of actions which directly interact can be implemented using logical operations on these bit vectors. Mutual exclusion (mutex relations) between facts can be implemented in a similar way. In order to best exploit the bit vector representation of information we construct a two-layer graph called a spike which avoids unnecessary copying of data and allows layer-dependent information about a node to be clearly separated from layer-independent information about that node. The spike allows us to record mutex relations using bit vectors, making mutex testing for indirect interaction much more efficient (we distinguish between direct and indirect interaction in Section 2.1). Second, we observe that there is no advantage in explicit construction of the graph beyond the stage at which the fix point is reached.
Our plan graph maintains a wave front which keeps track of all of the goal sets remaining to be considered during search. Since no new facts, actions or mutex relations are added beyond the fix point these goal sets can be considered without explicit copying of the fact and action layers. The wave front mechanism allows Stan to solve very large problem instances using a fraction of the time and space consumed by Graphplan and Ipp (Koehler, Nebel, & Dimopoulos, 1997). For example, using a heuristic discussed in Section 5.1, Stan can solve the 10-disc Towers of Hanoi problem (a 1023 step plan) in less than 9 minutes.

In this paper we describe the spike and wave front mechanisms and provide experimental results indicating the performance advantages obtained.

2. The Spike Graph Structure

Graphplan (Blum & Furst, 1997) uses constraint satisfaction techniques to search a layered graph which represents a compressed reachability analysis of a domain. The layers correspond to snapshots of possible states at instants on a time line from the initial to the goal state. Each layer in the graph comprises a set of facts that represents the union of states reachable from the preceding layer. This compression guarantees that the plan graph can be constructed in time polynomial in the number of action instances in the domain. The expansion of the graph, from which solutions can be extracted, is partially encoded in binary mutex relations computed during the construction of each layer. Stan implements an efficient representation of the graph in which a wave front, discussed in Section 4, further supports its compression.

In Graphplan-style planners the search for a plan, from layer k, involves the selection and exploration of a collection of action choices to see whether a plan can be constructed, using those actions at the kth time step.
If no plan is found the planner backtracks over the action choices.

Two important landmarks arise during the construction of the plan graph. The first is the point at which the graph opens in the sense that the problem becomes, in principle, solvable. This is the layer at which all of the top level goals first become pairwise non-mutex and is referred to here as the opening layer. The second is the fix point, referred to as level off by Blum and Furst (1997), the layer after which no further changes can be made to either the action, fact or mutex information recorded in the graph at each layer.

In the original implementation of Graphplan the graph was implemented as an alternating sequence of layers of fact nodes and action nodes, with arcs connecting actions to their preconditions in the previous layer and their postconditions in the subsequent layer. The layers were constructed explicitly, involving the repeated copying of large portions of the graph at each stage in maintaining the graph structure. This copying was due to two features of the graph. First, since actions with satisfied preconditions in one layer continue to have satisfied preconditions in all subsequent layers, actions that have once been added to a layer will appear in every successive action layer with the same name and the same pre- and post-conditions. Second, since facts that have once been achieved by the effects of an action will always be achieved by that action, they will continue to appear in every successive fact layer after the layer in which they first appeared. Although the layers can get deeper at every successive stage they each duplicate information present in the previous layer, so there is only a small amount of new information added at every stage.
The proportion of new material, relative to copied material, decreases progressively as the graph develops.

In the original Graphplan, mutex relations were checked for by maintaining lists of facts, corresponding to the pre- and post-conditions of actions, and checking for membership of facts within these lists. Because of the need to copy information at each new layer, the pre- and post-conditions of actions were duplicated even though this information did not vary from layer to layer (it can be determined once and for all at the point of instantiation of the schema). It is possible to identify, for each node in the graph, layer-independent information which can be stored just once using a different representation of the graph structure.

The spike representation reimplements the graph as a single fact array, called the fact spike, and a single action array, called the action spike, each divided into ranks corresponding to the layers in the original Graphplan graph structure. The observations leading to this compressed implementation of the plan graph were made independently by Smith and Weld (1998).

In Stan, a fact rank is a consecutive sequence of fact headers storing the layer-independent information associated with the facts in the corresponding fact layer. Similarly, an action rank is a consecutive sequence of action headers storing layer-independent information about the actions in the corresponding action layer. Each header is a tuple containing, amongst other things, the name of the fact or action it is associated with and a structure which stores the layer-dependent information relevant to that fact or action. In the case of fact headers this structure is called a fact level package and in the case of action headers it is an action level package. Figure 1 shows how a simple graph structure can be viewed as a spike.
In the spike the positions of all fact and action headers are fixed and can be referred to by indexing into the appropriate array. At any point, the sizes of the arrays are referred to using the constant MaxSize, a large number setting an upper bound on the size of the spike. All of the vectors allocated are also initialised to this size, although they are used in word-sized increments. This saves the effort of re-allocating and copying vectors as the spike increases in size towards MaxSize. We now define the data types so far introduced.

Definition 1. A spike vector is a bit vector of size MaxSize.

Definition 2. A fact header is a tuple of six components: a name, which is the predicate and arguments that comprise the fact itself; an index, i, giving the position of the fact in the fact array; a bit mask, which is a spike vector in which the ith bit is set and all other bits are unset; a reference identifying its achieving no-op; a spike vector, consumers, with bits set for all the actions which use this fact as a precondition; and a fact level package storing the layer-dependent information about that fact.

Definition 3. An action header is a tuple of eight components: the name of the action; an index, i, giving the position of the action in the action array; a bit mask, which is a spike vector in which the ith bit is set and all other bits are unset; a flag indicating whether the action is a no-op; three spike vectors, called precs, adds and dels; and an action level package storing the layer-dependent information about that action.
Each bit in precs, adds and dels corresponds to an index into the fact array and is set in precs if the fact at that index is a precondition (and unset otherwise), in adds if the fact at that index is an add list element (and unset otherwise) and in dels if the fact at that index is a delete list element of the action (and unset otherwise).

[Figure 1: Representation of a plan graph as a spike. In the fact spike, ranks 0, 1 and 2 correspond to fact layers 0, 1 and 2 respectively. In the action spike, ranks 1 and 2 correspond to action layers 1 and 2 respectively.]

Definition 4. A fact mutex vector (FMV) for a fact, f, is a spike vector in which the bits correspond to the indices into the fact array and a bit is set if the corresponding fact is mutex with f.

Definition 5. An action mutex vector (AMV) for an action, a, is a spike vector in which the bits correspond to the indices into the action array and a bit is set if the corresponding action is mutex with a.

Definition 6. A fact level package for a fact, f, is an array of pairs, one for each rank in the spike, each containing a fact mutex vector for f and a vector of achievers, called the achievement vector (AV), in the previous action rank.

Definition 7. An action level package for an action, a, is an array of triples, one for each rank in the spike, each containing an action mutex vector for a and a list of actions mutex with a (MAs).
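As an illustration of Definitions 1 to 7, the headers and level packages might be laid out as in the following Python sketch. It is not Stan's actual code: Python integers stand in for fixed-size spike vectors, bit i is taken to be 1 << i, and the names MAX_SIZE, FactLevel, ActionLevel, make_fact and make_noop are hypothetical. The list of mutex actions is kept here as a list of indices.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_SIZE = 64  # assumed bound on the spike size (MaxSize in the text)


def bit(i: int) -> int:
    """Spike vector with only bit i set; a Python int stands in for the vector."""
    if not 0 <= i < MAX_SIZE:
        raise IndexError("index outside the spike")
    return 1 << i


@dataclass
class FactLevel:
    """Layer-dependent data for one fact at one rank (Definition 6)."""
    fmv: int = 0  # fact mutex vector: bit j set if fact j is mutex with this fact
    av: int = 0   # achievement vector: bit k set if action k achieves this fact


@dataclass
class ActionLevel:
    """Layer-dependent data for one action at one rank (Definition 7)."""
    amv: int = 0  # action mutex vector
    mutex_actions: List[int] = field(default_factory=list)  # MAs, as indices


@dataclass
class FactHeader:
    """Layer-independent data for a fact (Definition 2)."""
    name: str
    index: int
    mask: int
    noop: Optional[int] = None  # index of the achieving no-op, once created
    consumers: int = 0          # actions using this fact as a precondition
    levels: List[FactLevel] = field(default_factory=list)


@dataclass
class ActionHeader:
    """Layer-independent data for an action (Definition 3)."""
    name: str
    index: int
    mask: int
    is_noop: bool
    precs: int
    adds: int
    dels: int
    levels: List[ActionLevel] = field(default_factory=list)


def make_fact(name: str, index: int) -> FactHeader:
    """Create a fact header whose bit mask records its position in the spike."""
    return FactHeader(name, index, bit(index))


def make_noop(fact: FactHeader, index: int) -> ActionHeader:
    """Create the no-op for a fact and link the fact header back to it."""
    fact.noop = index
    fact.consumers |= bit(index)
    return ActionHeader(fact.name, index, bit(index), True,
                        precs=fact.mask, adds=fact.mask, dels=0)


p = make_fact("P", 0)
noop_p = make_noop(p, 0)
```

Keeping the level packages as per-rank lists inside each header mirrors the separation of layer-independent from layer-dependent information; a fixed-width machine representation would replace the Python integers with arrays of words.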
Using these definitions we can now provide a detailed description of the spike construction process.

2.1 The Spike Construction Process

We will make use of these header access functions in the following discussion:

\[
\begin{aligned}
\mathit{mvec} &: \mathit{fact} \to \mathit{fact\ mutex\ vector} \\
\mathit{precs\_of} &: \mathit{action} \to \mathit{precs} \\
\mathit{adds\_of} &: \mathit{action} \to \mathit{adds} \\
\mathit{dels\_of} &: \mathit{action} \to \mathit{dels}
\end{aligned}
\]

The spike construction process takes place within a loop which stops when all goals are pairwise achievable, and thereafter alternates with search until the fix point is reached and the wave front mechanism takes over. The use of the wave front is discussed in Section 4. The key component of the process is the rank construction algorithm which builds a fact rank and an action rank by extending the previous fact and action ranks in the spike.

The action rank is started by adding no-ops for each of the fact headers in the previous fact rank. As soon as these are added, the fact headers can be updated to refer, by index into the action rank, to their achieving no-ops. This information allows Stan to give preference, when searching, to plans that use the no-op to achieve a goal rather than some other achiever. In Graphplan this preference was ensured by keeping all of the no-ops at the top of the graph layers and considering the achievers in order during search. All possible action instances are then considered. All applicable action instances are enacted and then removed so that they will never be reconsidered for enactment. We then identify mutex relations between the actions in the new action rank, and between facts in the new fact rank.

As in Graphplan, an action instance is applicable in a rank if all of its preconditions are present and non-mutex in the previous rank. The way in which preconditions are tested for mutual exclusion in Stan is based on our bit vector representation of fact mutex relations.
We take the logical OR of all of the fact mutex vectors of the preconditions, and logically AND the result with the precondition vector of the action. If the result is non-zero then there are mutex preconditions and the action is not applicable. This test corresponds to checking whether the action being considered is mutex with itself, a condition we define as being self-mutex.

Definition 8. An action a, with preconditions $a_{p_1} \ldots a_{p_n}$, is self-mutex if
\[
(\mathit{mvec}(a_{p_1}) \lor \mathit{mvec}(a_{p_2}) \lor \ldots \lor \mathit{mvec}(a_{p_n})) \land \mathit{precs\_of}(a)
\]
is non-zero.

An applicable action is enacted by adding an action header to the new rank and setting its name to the name of the action and its bit mask to record its position in the spike. In Figures 2 and 3 no-ops are given the names of the facts they achieve and are identified as no-ops by the flag components of their headers. We allocate space for the action level package and create and set its precs, adds and dels vectors. We then add any new facts on the add list of the action to the corresponding new fact rank. The addition of new facts requires new fact headers to be initialised.

We then identify mutex actions and mutex facts in the new ranks. Mutex actions are identified in two phases. Actions which were non-mutex in the previous rank remain non-mutex and are not considered at this stage. First, existing action mutex relations are checked to see whether they hold in the new rank. Second, new action mutex relations must be deduced from the addition of new actions in the construction of this rank.

We first consider the existing action mutex relations. Two actions are mutex, as in Graphplan, if they have conflicting add and delete lists, conflicting precondition and delete lists or mutex preconditions.
In the first two cases the actions are directly, or permanently, mutex and never need to be re-tested although their mutex relationship must be recorded at each rank. In the third case the actions are indirectly, or temporarily, mutex and must be retried at subsequent ranks. We keep track of which actions to retry in order to avoid unnecessary retesting.

We confirm that two actions, a and b, which were temporarily mutex in the previous rank are still temporarily mutex using the following logical operations on the fact mutex vectors for the action preconditions. We first logically OR together the mutex vectors for a's preconditions, then AND the result with the precondition vector for b. If the result is non-zero then a and b are mutex. This procedure, which is expressed concisely in Definition 9, is identical to that for checking whether an action is self-mutex except that, in this case, the result of ORing the fact mutex vectors of the preconditions of one action is ANDed with the precondition vector of the other action. Since mutex relations are symmetric it is irrelevant which action plays which role in the test.

Definition 9. Two actions a (with n preconditions $a_{p_1} \ldots a_{p_n}$) and b are temporarily mutex if
\[
(\mathit{mvec}(a_{p_1}) \lor \mathit{mvec}(a_{p_2}) \lor \ldots \lor \mathit{mvec}(a_{p_n})) \land \mathit{precs\_of}(b)
\]
is non-zero.

We now consider what new mutex relations can be inferred from the introduction of the new actions. It is necessary to check all new actions against all actions in the spike. This check is done in only one direction (low-indexed actions against high-indexed actions) so that the test is done only once for each pair. We check for both permanent and temporary mutex relations.
The permanent mutex test is done first, because if two actions are permanently mutex it is of no interest to find that they are also temporarily mutex. Definition 10 provides the logical operation used to confirm that two actions are permanently mutex. Temporary mutex relations are checked for using the logical operation defined in Definition 9.

Definition 10. Two actions a and b are permanently mutex if the result of
\[
((\mathit{precs\_of}(a) \lor \mathit{adds\_of}(a)) \land \mathit{dels\_of}(b)) \lor ((\mathit{precs\_of}(b) \lor \mathit{adds\_of}(b)) \land \mathit{dels\_of}(a))
\]
is non-zero.

We add these mutex relations by setting the appropriate bits in the mutex vectors of each of the new actions. This is done by ORing the mutex vector of the first action with the bit mask of the second action, and vice versa. A list of mutex actions is also maintained for use during search of the spike.

A refinement of the action mutex checking done by Stan is the use of a record of actions whose preconditions have lost mutex relations since the last layer of the graph. This record enables Stan to avoid retesting temporary mutex relations between actions when the mutex relations between their preconditions cannot have changed. We use a bit vector called changedActs to record this information. Each fact which loses mutex relations between layers adds its consumers to changedActs. The impact of this refinement on efficiency is discussed in Section 3.

This concludes the construction of the new action rank. The new fact rank has already been partially constructed by the addition to the spike of fact headers for any add list elements, of the new actions, that were not already present. Now it is necessary to determine mutex relations between all pairs of facts in the spike. To do this we must first complete the achievement vectors for all of the fact headers in the new rank.
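Before turning to facts, the action mutex tests of Definitions 8 to 10 and the changedActs record can be sketched in Python. This is a minimal illustration under assumptions rather than Stan's code: Python integers stand in for spike vectors with bit i equal to 1 << i, mvec is a list holding the fact mutex vector of each fact in the previous rank, and the names Action, or_of_mvecs, record_mutex and changed_acts are hypothetical.

```python
from typing import List, NamedTuple


class Action(NamedTuple):
    precs: int  # bit f set if fact f is a precondition
    adds: int   # bit f set if fact f is on the add list
    dels: int   # bit f set if fact f is on the delete list


def or_of_mvecs(precs: int, mvec: List[int]) -> int:
    """OR together the fact mutex vectors of every fact whose bit is set in precs."""
    result, f = 0, 0
    while precs:
        if precs & 1:
            result |= mvec[f]
        precs >>= 1
        f += 1
    return result


def self_mutex(a: Action, mvec: List[int]) -> bool:
    """Definition 8: two of a's own preconditions are mutex."""
    return (or_of_mvecs(a.precs, mvec) & a.precs) != 0


def temporarily_mutex(a: Action, b: Action, mvec: List[int]) -> bool:
    """Definition 9: some precondition of a is mutex with some precondition of b."""
    return (or_of_mvecs(a.precs, mvec) & b.precs) != 0


def permanently_mutex(a: Action, b: Action) -> bool:
    """Definition 10: one action deletes a precondition or add effect of the other."""
    return (((a.precs | a.adds) & b.dels) | ((b.precs | b.adds) & a.dels)) != 0


def record_mutex(amv: List[int], i: int, j: int) -> None:
    """Set each action's bit in the other's action mutex vector."""
    amv[i] |= 1 << j
    amv[j] |= 1 << i


def changed_acts(old_mvec: List[int], new_mvec: List[int],
                 consumers: List[int]) -> int:
    """Collect the consumers of every fact that lost a mutex relation."""
    changed = 0
    for f, (old, new) in enumerate(zip(old_mvec, new_mvec)):
        if old & ~new:  # fact f was mutex with something it is no longer mutex with
            changed |= consumers[f]
    return changed


# Facts P, Q, R at indices 0, 1, 2; P and Q are mutex in the previous rank.
mvec = [0b010, 0b001, 0b000]
use_p = Action(precs=0b001, adds=0b100, dels=0b001)
use_q = Action(precs=0b010, adds=0b100, dels=0b000)
use_pq = Action(precs=0b011, adds=0b100, dels=0b000)
noop_p = Action(precs=0b001, adds=0b001, dels=0b000)
```

In this toy setting use_pq is self-mutex, use_p and use_q are temporarily mutex, and use_p is permanently mutex with the no-op for P because it deletes P. One plausible use of changed_acts, which is an assumption about how the record is consulted, is to retest a temporarily mutex pair only when at least one of the two actions has its bit set in the returned vector.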
Any non-mutex pairs remain non-mutex, as with actions, so effort is focussed on deciding whether previously mutex facts are still mutex following the addition of the new actions, and whether new facts induce new mutex relations. Two facts are mutex if the only way of achieving both of them involves the use of mutex actions. We therefore consider every new fact with every other fact, in only one direction. The pair f, g is mutex in the new rank if every possible achiever of f is mutex with every possible achiever for g. The test for this exclusion is done using g's achievement vector and the result of logically ANDing the action mutex vectors for all possible achievers of f. The following definition gives the details:

Definition 11. Two facts, f and g, are mutex if
\[
\mathit{vec}_g \land \mathit{all\_mutex}_f = \mathit{vec}_g
\]
where $\mathit{vec}_g$ is g's achievement vector and $\mathit{all\_mutex}_f$ is the consequence of ANDing all of the action mutex vectors of all of f's possible achievers.

It does not matter in which order f and g are treated. The computation of the above condition corresponds to testing the truth of
\[
\forall a \cdot \forall b \cdot (\mathit{achiever}(a, f) \land \mathit{achiever}(b, g) \to \mathit{mutex}(a, b))
\]
Since mutex relations are symmetric and the quantifiers can be freely reordered the expression equally corresponds to
\[
\mathit{vec}_f \land \mathit{all\_mutex}_g = \mathit{vec}_f
\]
If f and g are found to be mutex then we set the fact mutex vector of f by ORing it with g's bit mask and the fact mutex vector of g is set conversely. This concludes the rank construction process and one iteration of the spike construction process.

2.2 Subset Memoization in Stan

Most of the search machinery used in Stan is essentially identical to that of Graphplan. That is, a goal set is considered by identifying appropriate achieving actions in the previous layer and propagating their preconditions back through the graph.
The use of the spike and bit vector representations does not impact on the search algorithm. We experimented with using bit vector representations of bad goal sets in the memoization process, in order to exploit logical bit operations to test for subset relations between sets of goals, but this proved too expensive and we now rely upon a trie data structure. This benefits marginally from the spike because goal sets do not need to be sorted for subset testing. The order in which the goals are generated in the spike can be taken as the canonical ordering since goal sets are formed by a simple sweep through the spike at each successive layer.

Stan implements an improvement on the goal set memoization of Graphplan. In the original Graphplan, when a goal set could not be achieved at a particular layer the entire set was memoized as a bad set for that layer. In Stan version 2, only the subset of goals that have been satisfied at the point of failure, within a layer, is actually memoized. More goal sets are likely to contain the smaller memoized subset than would be likely to contain the complete original failing goal set. This therefore allows us to prune search branches earlier. This method is a weak version of Kambhampati's (1998, 1999) EBL (Explanation-Based Learning) modifications. EBL allows the identification of the subset of a goal set that is really responsible for its failure to yield a plan. Memoization of smaller sets increases the efficiency of the planner by reducing the overhead necessary in identifying failing goal sets. DDB (Dependency-Directed Backtracking) improves the search performance by ensuring that backtracking returns to the point at which the last choice responsible for failure was made.
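The trie-based memoization of bad goal sets described earlier in this section can be sketched as follows. This is an illustrative Python sketch under assumptions, not Stan's data structure: goals are represented by their spike indices, one such trie is assumed to be kept per layer, and the class name GoalTrie and its methods are hypothetical. The sketch sorts its inputs defensively; when goal sets are produced by a sweep through the spike they already arrive in canonical order and the sort could be dropped.

```python
class GoalTrie:
    """Trie of memoized bad goal sets, each stored as an ascending index sequence."""

    def __init__(self):
        self.children = {}     # goal index -> child node
        self.terminal = False  # True if a memoized set ends at this node

    def insert(self, goals):
        """Memoize a bad goal set."""
        node = self
        for g in sorted(goals):
            node = node.children.setdefault(g, GoalTrie())
        node.terminal = True

    def contains_subset_of(self, goals):
        """True if some memoized set is a subset of the given goal set."""
        return self._search(sorted(goals), 0)

    def _search(self, goals, start):
        if self.terminal:
            return True
        # Each remaining goal may either be skipped or followed down the trie.
        for i in range(start, len(goals)):
            child = self.children.get(goals[i])
            if child is not None and child._search(goals, i + 1):
                return True
        return False


bad = GoalTrie()
bad.insert([2, 5])
bad.insert([7])
```

With these two memoized sets, a goal set with indices 1, 2, 3 and 5 would be rejected because it contains the memoized set of 2 and 5, whereas a goal set with indices 2 and 3 would still be explored. Memoizing smaller sets makes such hits more frequent, which is the pruning effect the text describes.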
The EBL and DDB modifications result in smaller sets being memoized and a more efficient search behaviour which, in combination with the trie, ensure that a higher proportion of failing search paths are terminated early.

We have experimented with an implementation of the full EBL/DDB modifications proposed by Kambhampati, but there is an interaction between the EBL/DDB machinery and the wave front of Stan which we are currently attempting to resolve. Our experiments so far indicate that both the wave front and EBL/DDB have significant beneficial impact on search, but not consistently across the same problems. We believe that we can enhance the advantages of the wave front by full integration with EBL/DDB, but this remains to be demonstrated.

2.3 A Worked Example

We now demonstrate the spike construction process in action on a simple blocks world example in which there are two blocks and two table positions. In the initial state, both blocks are on the table, one in each of the two positions. Consequently there are no clear table positions. The initial spike consists of a fact rank containing fact headers for the four facts that describe the initial state. There is a single operator schema, puton(Block, To, From), as follows:

puton(X,Y,Z)
  Pre: on(X,Z), clear(X), clear(Y)
  Add: on(X,Y), clear(Z)
  Del: on(X,Z), clear(Y)

The action rank is initially empty. On the first iteration of the loop the first action rank is constructed by creating no-ops for every fact in the zeroth fact rank. Two further actions are applicable and are enacted, and the facts on their add lists are used to create a new fact rank. This results in the partially developed spike shown in Figure 2.
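The first iteration just described, together with the mutex identification of Definitions 10 and 11, can be traced with a short Python sketch. It is an illustration under assumptions, not Stan's implementation: all eight facts are given fixed indices up front, in the order on(a,t1), on(b,t2), clear(a), clear(b), on(a,b), clear(t1), clear(t2), on(b,a); Python integers stand in for spike vectors; and the helper names vec, show, applicable and all_mutex are hypothetical.

```python
# Rank 0 holds fact indices 0..3; indices 4..7 are the facts added at rank 1.
FACTS = ["on(a,t1)", "on(b,t2)", "clear(a)", "clear(b)",
         "on(a,b)", "clear(t1)", "clear(t2)", "on(b,a)"]
IDX = {name: i for i, name in enumerate(FACTS)}


def vec(names):
    """Bit vector over fact indices (bit i is 1 << i)."""
    v = 0
    for name in names:
        v |= 1 << IDX[name]
    return v


def show(v, width=8):
    """Render a vector as a bit string with index 0 leftmost."""
    return "".join("1" if (v >> i) & 1 else "0" for i in range(width))


def puton(x, y, z):
    """Instantiate puton(X,Y,Z) as precs/adds/dels bit vectors."""
    return {"name": f"puton({x},{y},{z})",
            "precs": vec([f"on({x},{z})", f"clear({x})", f"clear({y})"]),
            "adds": vec([f"on({x},{y})", f"clear({z})"]),
            "dels": vec([f"on({x},{z})", f"clear({y})"])}


def noop(i):
    return {"name": FACTS[i], "precs": 1 << i, "adds": 1 << i, "dels": 0}


rank0 = vec(FACTS[:4])    # facts present in fact rank 0
fmv0 = [0] * len(FACTS)   # no fact mutexes hold at rank 0


def applicable(act):
    """Preconditions all present in rank 0 and not self-mutex (Definition 8)."""
    present = (act["precs"] & ~rank0) == 0
    mutexes = 0
    for f in range(len(FACTS)):
        if (act["precs"] >> f) & 1:
            mutexes |= fmv0[f]
    return present and (mutexes & act["precs"]) == 0


# Action rank 1: no-ops first, then each applicable instance is enacted.
actions = [noop(i) for i in range(4)]
actions += [c for c in (puton("a", "b", "t1"), puton("b", "a", "t2")) if applicable(c)]
n = len(actions)

# Permanent mutex test (Definition 10), low index against high index only.
amv = [0] * n
for i in range(n):
    for j in range(i + 1, n):
        a, b = actions[i], actions[j]
        if ((a["precs"] | a["adds"]) & b["dels"]) | ((b["precs"] | b["adds"]) & a["dels"]):
            amv[i] |= 1 << j
            amv[j] |= 1 << i

# Achievement vectors for fact rank 1: bit k set if action k adds the fact.
av = [0] * len(FACTS)
for k, act in enumerate(actions):
    for f in range(len(FACTS)):
        if (act["adds"] >> f) & 1:
            av[f] |= 1 << k


def all_mutex(f):
    """AND of the action mutex vectors of every achiever of fact f."""
    result = (1 << n) - 1
    for k in range(n):
        if (av[f] >> k) & 1:
            result &= amv[k]
    return result


# Fact mutex test (Definition 11), again in one direction only.
fmv1 = [0] * len(FACTS)
for f in range(len(FACTS)):
    for g in range(f + 1, len(FACTS)):
        if av[f] and av[g] and (av[g] & all_mutex(f)) == av[g]:
            fmv1[f] |= 1 << g
            fmv1[g] |= 1 << f

print(show(actions[4]["precs"]), show(amv[4]), show(fmv1[0]))
# prints: 10110000 10010100 00001100
```

The three printed vectors are the preconditions of puton(a,b,t1), its action mutex vector at rank 1 (it is mutex with actions 0, 3 and 5) and the rank 1 fact mutex vector of on(a,t1), which is mutex with on(a,b) and clear(t1). No temporary mutex test is needed at this rank because every fact mutex vector at rank 0 is zero.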
It can be observed from Figure 2 that, following enactment, the fact headers associated with the newly added facts are incomplete, and although the new fact level and action level packages have been allocated they do not yet contain any values. The new fact headers are missing references to the no-ops that will be used to achieve them in the next action rank. The new fact level packages are blank because their corresponding fact headers will have no level information for rank 0.

After identification of mutex actions and mutex facts, the picture is as shown in Figure 3. In the action level packages, the lists of mutex actions are given as lists of indices for the sake of clarity. In fact they are lists of pointers to actions, in order to avoid the indirection involved in the use of indices. None of the action pairs are temporarily mutex at rank 1 because all of the fact mutex vectors from rank 0 are zero-valued.

3. Empirical Results

In this section we present results demonstrating the efficiency of the spike and vector representation of the plan graph used by Stan. We consider graph construction only in this section; the efficiency of search in Stan will be demonstrated in Section 4. We show the efficiency of graph construction in Stan by showing relative performance figures for Stan and the competition version of Ipp in several of the competition domains and two further standard benchmark domains. These are the Graphplan version of the Travelling Salesman domain (Blum & Furst, 1997), which uses a complete graph and is referred to here as the Complete-Graph Travelling Salesman domain, and the Ferry domain available in the PDDL release. We compare Stan with Ipp because, to the best of our knowledge, Ipp is the only other fast Graphplan-based planner currently publicly available.
We use the competition version of Ipp because this is the most up to date version available from the Freiburg webpage at the time of writing.

[Figure 2: The spike after enactment of the rank 1 actions. Panels: Fact Spike; Fact Level Packages (rank 0); Action Spike; Action Level Packages (rank 1, as yet uninstantiated).]

[Figure 3: The spike at the end of the rank 1 construction phase. Panels: Fact Spike; Fact Level Packages (ranks 0 and 1); Action Spike; Action Level Packages (rank 1).]

[Figure 4: Graph construction in the logistics domain: Stan shows a constant factor improvement over the performance of Ipp. Axes: IPP, STAN.]

[Figure 5: Graph construction in the Gripper domain. Axes: IPP, STAN.]

In order to focus on the graph construction phase, and eliminate the search phase from both planners, we have constructed versions of Stan and Ipp which terminate once the graph has opened. We have removed from Stan all of the unnecessary pre-processing, domain analysis and additional features that contribute to later search efficiency. However, since Ipp is designed to build one more layer before opening than is strictly necessary, to include a dummy goal corresponding to the achievement of the conjunction of the top level goal set, we make Stan build one extra layer too so that the two systems are comparable. We have removed all of the meta-strategy control from Ipp, forcing Ipp directly into its graph construction. It is possible that a more streamlined graph constructor could be built from Ipp by elimination of further processing, but we observed, during experimentation with Ipp, that pre-processing accounts for insignificant proportions of the timings reported below. We are therefore confident that any further streamlining would have minimal effects on our results.

In order to compare Stan and Ipp accurately it was necessary to modify the timing mechanisms to ensure that precisely the same elements are timed. A Unix/Linux diff file is available at the Stan website, and in Online Appendix 1, for anyone interested in reconstructing the Ipp graph construction system we have used. The domains and problems used, and our graph construction version of Stan, can also be found at these locations.

All experiments reported in this paper were carried out on a P300 Linux PC, with 128Mb of RAM and 128Mb swap space. All of the timings in the data sets reported are in milliseconds.
All the graphs are log-log scaled. This was necessary to combat the long scales caused by very large timings associated with a few instances in each domain. The graphs show IPP's construction performance compared with STAN's construction performance measured on the same problems in each of six domains. The straight line shows where equal performance would be. Points above the line indicate superior performance by STAN and points below the line indicate superior performance by IPP. In all of the first five data sets, STAN clearly out-performs IPP. In the last data set (Figure 9), IPP convincingly out-performs STAN and we now consider a more detailed analysis of the characteristics of the domains and instances which explain these data sets.

The first four data sets reveal a very similar performance. The points are broadly parallel to the equal performance line, indicating that STAN performs at a constant multiple of the performance of IPP. Despite the trend that these data sets reveal, occasional data points deviate significantly from this behaviour. This reflects the fact that different structures of particular problems exercise different components of the graph construction system. Components include instantiation of operators, application of individual operator instances and the corresponding extension of fact layers, and checking and re-checking mutex relations between facts and between actions. We observed that in some problem instances, 50 per cent or more of the construction time was spent in action mutex checking, whilst in others instantiation dominated. The density of permanent mutex relations between actions, and the degree of persistence of temporary mutex relations between actions, are both very significant in determining efficiency of performance.
For example, in problem 8 in the Mystery domain, where 21 layers are constructed before the graph opens, only 9 per cent of the action pairs were discarded as permanently mutex and, of the temporary mutex pairs, the average number of re-tests across the entire graph construction was over 7. The use of the changedActs mechanism described in Section 2.1, to avoid retesting actions when their precondition mutex relations had not changed from the previous layer, gave us a 50 per cent improvement in performance and accounts for a more than 40 second advantage over IPP in the construction phase of this problem.

[Figure 6: Graph construction in the Mystery domain. STAN's performance in this domain is consistently better than that of IPP, but shows more marked variation revealing that the benefits of the spike are problem-dependent. Axes: IPP, STAN.]

[Figure 7: Graph construction in the Mprime domain. Axes: IPP, STAN.]

[Figure 8: Graph construction in the Ferry domain. STAN shows polynomially better graph construction performance than IPP. Axes: IPP, STAN.]

In other problems a much higher percentage of action pairs are permanently mutex, allowing early elimination of many action pairs from further retesting. Where mutex relations are not highly persistent a similar elimination rate is possible. This allows much faster construction for STAN. IPP does not benefit in the same way, because it does not distinguish between temporary and permanent mutex and does not try to identify which pairs of actions should be retested.

In the Ferry domain, Figure 8, 7 layers are constructed to open the graph regardless of instance size.
Analysis reveals that approximately 25 per cent of action pairs are permanently mutex and the average persistence of temporary mutex relations is slightly more than 2 layers. Since IPP does not intelligently eliminate actions from retesting, the implication of this is that IPP unnecessarily re-checks mutex relations for a polynomially increasing number of pairs of actions. This explains the polynomial advantage obtained by STAN in this domain.

[Figure 9: Graph construction in the Complete-Graph Travelling Salesman domain. STAN displays a polynomially deteriorating graph construction performance. This is further discussed in the text. Axes: IPP, STAN.]

The last data set shows a rather different pattern of performance from that of the others. The Complete-Graph Travelling Salesman domain used to produce the data set for Figure 9 is a simplified version, in which the graph is fully connected, of the well known NP-hard TSP. It is, in principle, efficiently solvable. In Figure 9 IPP's performance appears to be polynomially better than that of STAN. Analysis of the graph structure built for different instances reveals that, on all instance sizes, the graph opens at layer 3. In these graphs an interesting pattern can be observed in the mutex relations between actions: the vast majority of action pairs are mutex after their first application at layer 2 (because the salesman can only ever be in one place). These mutex relations are considered, by both STAN and IPP, to be temporary although they in fact persist. The consequence is that both STAN and IPP retest all pairs at the next layer. STAN obtains no advantage from the use of changedActs or the distinction between temporary and permanent mutex relations in this domain.
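The distinction between permanent and temporary mutex relations, and the selective retesting discussed in this section, can be illustrated with a minimal Python sketch. It is not STAN's C++ code: the class and function names are hypothetical, bit i of each integer stands for fact i (the figures in the paper write masks most-significant-bit first), and permanent mutex is assumed to mean the usual Graphplan notion of direct interference.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Action:
    name: str
    precs: int  # bit vector of precondition facts (bit i = fact i; an assumption)
    adds: int   # bit vector of add effects
    dels: int   # bit vector of delete effects


def permanently_mutex(a: Action, b: Action) -> bool:
    """Direct interaction: one action deletes a precondition or add effect of the other."""
    return bool(a.dels & (b.precs | b.adds)) or bool(b.dels & (a.precs | a.adds))


def temporarily_mutex(a: Action, b: Action, fact_mutex: dict) -> bool:
    """Indirect interaction: a precondition of a is mutex with a precondition of b.

    fact_mutex maps a fact index to the bit vector of facts that are mutex
    with it at the current layer (hypothetical representation).
    """
    fact, remaining = 0, a.precs
    while remaining:
        if remaining & 1 and fact_mutex.get(fact, 0) & b.precs:
            return True
        remaining >>= 1
        fact += 1
    return False


def split_pairs(actions):
    """Separate permanently mutex pairs from pairs that may need retesting."""
    permanent, candidates = set(), set()
    for a, b in combinations(actions, 2):
        pair = frozenset((a.name, b.name))
        if permanently_mutex(a, b):
            permanent.add(pair)
        else:
            candidates.add(pair)
    return permanent, candidates


def pairs_to_retest(temporary_pairs, changed_acts):
    """Keep only temporary pairs involving an action whose precondition mutexes changed."""
    return {pair for pair in temporary_pairs if pair & changed_acts}
```

In a domain where almost every pair is temporarily mutex and every action counts as changed, `pairs_to_retest` returns nearly all pairs, which corresponds to the situation described for the Complete-Graph Travelling Salesman domain.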
The number of mutex pairs to be checked increases quadratically with increase in instance size, which is in line with STAN's performance. IPP clearly pays much less for this retesting, despite the fact that it does the same amount of work. This fact, together with profiling of both systems, leads us to believe that the disadvantage suffered by STAN is due to the overhead in supporting object member applications in its C++ implementation.

It is worth pointing out that in the Complete-Graph Travelling Salesman domain, as well as in Gripper and Ferry, the construction time for both planners is under 1 second for all instances tested so the discrepancies in performance in these three domains are insignificant compared with the discrepancies measured in seconds (for large instances) in the other domains.

[Figure 10: The wave front in STAN. Diagram labels: Fix Point, Buffer, goal sets G and G1 to G5.]

4. The Wave Front

When a layer is reached in which all of the top level goals are pairwise non-mutex, Graphplan-based planners begin searching for a plan. If no plan can be found, new layers are constructed alternately with search until the fix point of the graph is reached. In Graphplan and IPP the graph continues to be explicitly constructed beyond the fix point, even though the layers which can be built beyond this point are sterile (contain no new facts, actions or mutex relations). Their construction is necessary to allow the conditions for achievement of goal sets to be established, between the fix point and the current layer. However, this constitutes significant computational effort in copying existing structures and in unnecessary searching of these duplicate structures.
Instead of building these sterile layers explicitly, STAN maintains a single layer, called the buffer, beyond the fix point together with a queue of goal sets remaining to be considered. Each time a goal set is removed from this queue, to be considered in the buffer, those goal sets it generates in the fix point layer, which have not been previously marked as unsolvable, are added to the queue. The goal sets in this queue are considered in order, always for achievement in the buffer layer. Thus, rather than constructing a new layer each time the top level goal set proves unsolvable, and then reconsidering all of the same achievers in the new layer, the goal sets in the queue are simply considered in the buffer layer. We call this mechanism a wave front because it pushes goal sets forward from the fix point layer into the buffer, and then recedes to consider another goal set from the fix point layer. The goal sets generated at the fix point, which join the queue for propagation, are referred to as candidate goal sets.

The wave front is depicted in Figure 10. The underlying implementation of the plan graph remains based on the spike, but the figure depicts the graph in the traditional way for simplicity. In the picture, G represents the top level goal set and when it is used to initiate a plan search from the buffer layer it generates the sequence of goal sets G1, G2 and G3 at the fix point layer. Assuming that these all fail, the first set in this queue, G1, is propagated forward to the buffer leading to the generation of goal sets G4 and G5 in the fix point layer. These are added to the end of the queue and G2 will be the next goal set selected from the queue to propagate forward.

In order to demonstrate that the wave front machinery maintains an appropriate behaviour there are three questions to be considered.

1. Is every goal set that would have been considered in the buffer layer, had the graph been constructed explicitly, still considered using the wave front? This question concerns completeness of the search process.

2. Does every plan generated to achieve a goal set that is considered in the buffer layer correspond to a plan that would have been generated had the graph been explicitly constructed? This question concerns soundness.

3. The final question concerns whether the termination properties of Graphplan are maintained.

Definition 12 A k-level goal tree for goal set G at layer n in a plan graph, $GT_{k,G,n}$, is a general tree of depth k in which the nodes are goal sets and the parent-child relationship is defined as follows. If the goal set x is in the tree at level i then the goal set y is a child of x if y is a minimal goal set containing no mutex goal pairs such that achievement of y at layer $n - i - 1$ in the plan graph enables the achievement of x at layer $n - i$ in that graph. We take the root to be at level 0 of the tree and the leaves to be at level $k - 1$.

Lemma 1 If $n - k \geq FP$ then $GT_{k,G,n} = GT_{k,G,n+1}$, where FP is the number of the fix point layer in the plan graph.

Proof By definition of the fix point, all layers in a plan graph beyond the fix point contain an exact replica of the information contained at the fix point layer. Since, by definition of the goal tree, the parent-child relationship depends exclusively upon the relationship between two consecutive layers in the plan graph, and layers cannot change after the fix point, it follows that if x is the parent of y at some layer beyond the fix point then the parent-child relationship between x and y must hold at any pair of consecutive layers beyond the fix point. Further, no new parent-child relationships can arise beyond the fix point.
The restriction that $n - k \geq FP$ ensures that all layers in both goal trees lie in the region beyond the fix point. □

The completeness of STAN follows from the completeness of Graphplan provided that all of the goal sets that would appear in the layer after the fix point in the explicit graph arise as candidates to be considered in the buffer layer using the wave front. We now prove that this condition is satisfied by first proving that the leaves of goal trees generated at successive layers of a plan graph are all used to generate candidates in STAN. Since the goal sets considered by Graphplan are always subsets of the leaves of goal trees it will be shown that the completeness of STAN follows.

Theorem 1 Given a goal set, G, and a plan graph of n layers, containing no plan for G of length $n - 1$, with fix point at layer FP ($n > FP$), all of the leaves of $GT_{n-FP,G,n}$ are generated as candidates by STAN.

Proof The proof is by induction on n, with base case $n = FP + 1$. In the base case the result follows trivially because the only leaf in $GT_{1,G,FP+1}$ is the top level goal set G and this is generated as the initial candidate by STAN. Suppose $n > FP + 1$. The inductive hypothesis states that all of the leaves of the tree $GT_{n-1-FP,G,n-1}$ are generated as candidates by STAN. Since the plan graph constructed by STAN is identical to that of Graphplan up to layer $FP + 1$, and all candidates are used to initiate search from layer $FP + 1$, the leaves of $GT_{n-FP,G,n-1}$ will also be generated as goal sets in layer FP by STAN. These goal sets are then used by STAN to construct candidates. STAN will not generate multiple copies of candidates, but each new goal set will generate a new candidate.
By Lemma 1, $GT_{n-FP,G,n} = GT_{n-FP,G,n-1}$, so that the leaves of $GT_{n-FP,G,n}$ are generated as candidates by STAN. □

The definition of goal trees captures precisely the relationship between goal sets and the search paths considered by Graphplan. However, because Graphplan memoizes failed goal sets it can prune parts of a goal tree as it regresses through the explicit plan graph during search. Whenever a goal set contains a memoized goal set, search terminates along this branch and none of its children will be generated. It can now be seen that Graphplan will generate at layer $FP + 1$ a subset of the leaves of $GT_{n-FP,G,n}$, when searching from layer n with goal set G, whereas Theorem 1 demonstrates that STAN will construct all of these leaves as candidates.

This argument might suggest that STAN engages in unnecessary search by generating candidates that Graphplan can prune, using memos, in layers that are not constructed explicitly by STAN. In fact, STAN generates no more candidates than Graphplan generates goal sets at layer $FP + 1$. Indeed, STAN achieves a dramatic reduction in search by exploiting the correspondence between the goal trees generated at layers n and $n - 1$, demonstrated by Lemma 1. Because of this correspondence there is no need to construct the layers between $FP + 1$ and n explicitly, and undertake all of the concomitant search from those layers.

[Figure 11: The sliding window of layers between FP + 1 and n. Diagram labels: Layer 0, FP, FP+1, n-1, n; goal sets G, G1, G2, G3; L1, L2, Ln.]

Graphplan rebuilds the sliding window, shown in Figure 11, of layers between FP and $n - 1$ as layers $FP + 1$ through to n. STAN simply promotes the leaves of the tree, generated at layer FP in $GT_{n-FP,G,n-1}$, into layer $FP + 1$.

It is straightforward to show that the wave front maintains soundness.
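Before turning to the soundness argument, the queue discipline of the wave front can be sketched in a few lines of Python. This is an illustrative model only, not STAN's implementation: `wave_front` and `search_from_buffer` are hypothetical names, goal sets are modelled as frozensets, and the `seen` set stands in for the requirement that candidates are not generated more than once. The toy search function mirrors the example of Figure 10, with the added assumption that G4 turns out to be solvable.

```python
from collections import deque


def wave_front(top_goals, search_from_buffer):
    """Consider candidate goal sets in the buffer layer in first-in, first-out order.

    search_from_buffer(candidate) is assumed to return a pair
    (plan_or_None, goal_sets_generated_at_the_fix_point_layer).
    """
    start = frozenset(top_goals)
    queue = deque([start])
    seen = {start}  # candidates generated so far, to avoid duplicates
    while queue:
        candidate = queue.popleft()
        plan, failed_sets = search_from_buffer(candidate)
        if plan is not None:
            return plan
        for goal_set in failed_sets:
            goal_set = frozenset(goal_set)
            if goal_set not in seen:
                seen.add(goal_set)
                queue.append(goal_set)
    return None  # queue exhausted: no new candidates, so the planner terminates


# Toy model of Figure 10: G generates G1, G2, G3; G1 generates G4, G5.
CHILDREN = {"G": ["G1", "G2", "G3"], "G1": ["G4", "G5"]}
order = []


def toy_search(candidate):
    (name,) = candidate  # each toy goal set holds a single label
    order.append(name)
    if name == "G4":  # assumption for the demonstration only
        return ["plan reached through G4"], []
    return None, [{child} for child in CHILDREN.get(name, [])]


plan = wave_front({"G"}, toy_search)
print(order, plan)
```

The order in which the toy candidates are considered matches the description of Figure 10: after G fails, G1 is propagated first, G4 and G5 join the end of the queue, and G2 is the next candidate selected.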
The search that Graphplan performs generates a goal tree of goal sets, as defined in Definition 12. In the example in Figure 10, the tree is rooted at G, with G1, G2 and G3 its children and G4 and G5 the children of G1. It can be seen from the picture that the tree structure generated by Graphplan, in which each successive layer would be embedded in a separate layer of the explicitly constructed graph, appears in a spiral of related goal sets between the fix point and buffer layers. All of the candidate goal sets lie in this same search tree and therefore no additional goal sets are generated. Graphplan constructs the final plan by reading off the sequence of action choices at each layer in the final graph. In STAN, the plan is obtained by reading off the initial fragment of the plan in the same way, from the layers preceding the fix point. The rest of the plan is extracted from the spiral. This extraction process yields the same path of action choices from the top level goal set to the candidate goal set as would be recorded explicitly in the Graphplan plan graph.

The only question remaining to be considered is whether the wave front has the same termination properties as Graphplan. It can be seen that it does since, if no new unsolvable goal sets are generated at the fix point, the queue will become empty and the planner terminates. This corresponds exactly to the termination conditions of Graphplan.

A subtlety concerns the interaction between the wave front and the subset memoization discussed in Section 2.2. In principle, subset memoization could cause the loss of all three of the desired properties of the graph. The way that STAN generates candidate goal sets is by simultaneously generating a candidate set whenever a goal set is memoized at the fix point.
If the candidate set and the memoized set are one and the same, then the memoization of a subset of a goal set will lead to the propagation of only a subset of the actual candidate goals into the buffer and soundness might be undermined. If we use subset memoization at the wave front then the question arises whether sets that contain a memoized subset should be propagated forward as candidates. If they are not, then completeness is potentially lost, since there might be action sequences that could have been constructed following propagation that will not now be found. If they are, then termination is potentially lost, since the set that led to the construction of the memoized subset might itself be generated as a candidate. This could happen, for example in Figure 10, if G1 is unsolvable at the fix point but is generated again by consideration of a later candidate at the buffer.

To avoid these problems we have restored full subset memoization at the wave front. An alternative solution, which we are currently exploring, is to separate the subsets of goals memoized from the identification of the candidate sets. Both solutions avoid the loss of soundness because candidates are constructed from entire goal sets rather than from subsets. In the first solution, termination is preserved because memoizing full goal sets ensures that repeated candidates can be correctly identified as they recur. In the second solution, we would separately memoize candidates as they were generated to avoid repeated generation, thereby maintaining termination. In both cases, completeness is preserved by propagating goal sets forward as new candidates provided only that they do not contain previously encountered candidates as subsets.
If a potential candidate is a superset of an entire memoized candidate then it is correct not to propagate that potential candidate into the buffer because if the memoized candidate cannot be solved at the buffer then no superset of it can be solved there either.

5. Experimental Results

The results presented here use STAN version 2 (available at our website). We have performed experiments comparing STAN with and without the wave front in order to demonstrate the advantages obtained by the use of the wave front. We have performed further experiments to compare STAN with the competition version of IPP. There are some minor discrepancies in the timing mechanisms of these two systems. STAN measures elapsed time for the entire execution, whereas IPP measures user+system time for graph construction and search but not for parsing of the problem domain and instance. On a single user machine as used for these experiments the discrepancy is negligible.

The problem domains used in this section have been selected to emphasise the benefits offered by the wave front. The important characteristic is that there should be an early fix point relative to the length of plan as instances grow. In the comparisons with IPP the wave front accounts for the trends in performance, although STAN employs a range of other mechanisms which give it some minor advantages. Amongst these is the TIM machinery, which we have not decoupled as the problem domains used are the standard typed ones so that no significant advantage is obtained from inferring type structures automatically. Only the resource invariants inferred by TIM are exploited by STAN version 2, and we have indicated where this gives us an advantage over IPP.
Our ablation data sets confirm that the wave front is the most significant component in the performance of STAN in these experiments.

[Figure 12: STAN compared with IPP: solving Towers of Hanoi problems of 3-7 discs. Axes: IPP, STAN.]

STAN is capable of efficiently solving larger Towers of Hanoi instances than are presented in the graph in Figure 12, which accounts for the additional point in Figure 13. STAN with the wave front found the 511-step plan for the 9-disc problem in less than 7 minutes using about 15Mb of memory. During the experiments reported here, IPP was terminated after 15 minutes having reached layer 179 out of 255 layers in the 8-disc problem. We observe that on a machine with 1Gb of RAM, IPP has solved this problem in 8 minutes.

The results for the Gripper domain demonstrate only a small advantage for STAN. The reason is that the search space grows exponentially in the size of the graph in the Gripper domain, so that the cost of searching dominates everything else. Although the search spaces for Towers of Hanoi instances also grow exponentially, they grow as $2^x$ whereas Gripper instances grow as $x^x$ (where x is the number of discs or balls respectively). Although the wave front helps under these conditions, the size of the search space dwarfs the benefits it offers. The Ferry domain is a less rapidly growing version of the Gripper domain since only one vehicle can be carried on each journey, reducing the number of choices at each layer.

The difference in benefits obtained in the Towers of Hanoi domain relative to the Gripper and Ferry domains can be explained by consideration of the table in Figure 16. The benefits of the wave front are proportional to the number of layers which exist implicitly between the buffer and the layer from which the plan is ultimately found.
In the Towers of Hanoi the number of implicit layers is exponential in the number of discs whereas the number of layers between the initial layer and the buffer is linear in the number of discs. Therefore the benefits offered by the use of the wave front are magnified exponentially as the problem instance grows. On the other hand, in both Gripper and Ferry there is only a linear growth in the difference between plan length and fix point layer, so benefits are magnified only linearly. This analysis can be confirmed by observation of Figures 12, 14 and 17.

[Figure 13: STAN with and without the wave front: solving Towers of Hanoi problems of 3-8 discs. Axes: STAN no wf, STAN wf.]

[Figure 14: STAN compared with IPP: solving Gripper problems of 4-10 balls. Axes: IPP, STAN.]

[Figure 15: STAN with and without the wave front: solving Gripper problems of 4-10 balls. Axes: STAN no wf, STAN wf.]

Domain               Parameter n     Plan Length    Buffer
Towers of Hanoi      no. discs       $2^n - 1$      $n + 3$
Gripper              no. balls       $2n - 1$       5
Ferry                no. vehicles    $4n - 1$       7
Complete-Graph TSP   no. cities      $n$            4

Figure 16: Relative values of plan length and number of layers to buffer for four domains.

The benefit of the wave front is measured not only in terms of the cost of construction that is avoided by not explicitly building the layers beyond the buffer, but also in terms of the search that is avoided in those layers. Crudely, the benefits can be measured as the number of layers not constructed multiplied by the search effort avoided at each of those layers. Thus, the number of layers not constructed magnifies the benefits obtained by not searching amongst them.
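As a rough worked example of this crude measure, the following Python sketch computes the number of layers left implicit, taken here as plan length minus buffer layer, from the formulas in Figure 16. The function and table names are hypothetical, and treating the plan length as the index of the layer at which the plan is found is an assumption made for illustration.

```python
# (plan length, buffer layer) as functions of the instance parameter n,
# transcribed from Figure 16.
FIGURE_16 = {
    "Towers of Hanoi": lambda n: (2 ** n - 1, n + 3),
    "Gripper": lambda n: (2 * n - 1, 5),
    "Ferry": lambda n: (4 * n - 1, 7),
    "Complete-Graph TSP": lambda n: (n, 4),
}


def implicit_layers(domain: str, n: int) -> int:
    """Layers between the buffer and the plan layer that are never built explicitly."""
    plan_length, buffer_layer = FIGURE_16[domain](n)
    return max(plan_length - buffer_layer, 0)


for domain, n in [("Towers of Hanoi", 9), ("Gripper", 10),
                  ("Ferry", 12), ("Complete-Graph TSP", 20)]:
    print(f"{domain} (n={n}): {implicit_layers(domain, n)} implicit layers")
```

For the 9-disc Towers of Hanoi problem this gives 511 - 12 = 499 implicit layers, against 14 for Gripper with 10 balls and 40 for Ferry with 12 vehicles, which is consistent with the exponential versus linear magnification described above.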
This is a simplification, since the search effort avoided at successive layers increases as they get further away from the fix point, but it gives a guide to the kind of benefits that can be expected from the wave front.

[Figure 17: STAN compared with IPP: solving Ferry problems of 2-12 cars. Axes: IPP, STAN.]

[Figure 18: STAN with and without the wave front: solving Ferry problems of 2-12 cars. Axes: STAN no wf, STAN with wf.]

[Figure 19: STAN compared with IPP: solving Complete-Graph Travelling Salesman problems of 10-20 cities. Axes: IPP, STAN.]

STAN obtains significant advantages over IPP in the Complete-Graph Travelling Salesman domain, as Figure 19 demonstrates. Some of these advantages are obtained by exploiting the resource analysis techniques of TIM (Fox & Long, 1998), whilst a significant proportion of the advantage is obtained from the use of the wave front, as Figure 20 shows. Resource analysis allows a lower bound to be determined on the number of layers that must be built in a plan graph before it is worth searching for a plan. In the Complete-Graph Travelling Salesman domain this is very powerful, as the calculated bound is n, the number of cities in the instance, which is precisely the correct plan length. In this domain, if no search is done until n layers are constructed, no search needs to be done at all since it doesn't matter in what order the cities are visited. This would allow the problem to be solved in polynomial time (of course, this only makes sense because the Complete-Graph TSP used here is simpler than the NP-hard TSP).
However, when the wave front is used, the buffer is at layer 4 and the only way of finding the plan is to generate all of the candidate goal sets at layer 4, of which there are an exponential number. The use of the wave front in this domain therefore forces STAN to take exponential time in the size of the instances. Despite this the wave front offers great advantages. The benefits increase exponentially as instance sizes grow, even though the magnification of these benefits at each layer is only linear (see Figure 16) and the benefits are offset by the exponential growth in the number of candidates. It must be observed that in Figure 19, the figures are extrapolated for IPP for instances in which n is greater than 14. The extrapolation was based on IPP's performance on instance sizes between 2 and 14, which demonstrates a clear exponential growth.

[Figure 20: STAN with and without the wave front: solving Complete-Graph TSP problems. Axes: STAN no wf, STAN with wf.]

It appears that we could allow the resource analysis to over-ride the wave front when a domain is encountered in which it can be guaranteed that explicit construction of the graph will be more efficient. In practice the Complete-Graph Travelling Salesman domain seems exceptional, since search is eliminated if the graph is constructed to layer n, and if this were not the case the explicit construction and subsequent search would be more costly than the use of the wave front.

5.1 The Wave Front Heuristic

The queue of candidate goal sets considered in the buffer can be implemented as an unordered structure in which goal sets are selected for consideration according to more sophisticated criteria than the order in which they were stored.
In principle, this could save much searching effort since it could avoid costly consideration of goal sets which turn out to be unsolvable before meeting a solvable goal set. We have experimented with a number of goal set selection heuristics which favour goal sets for which the search progresses deepest into the graph structure. These sets are considered to be closer to being solvable than sets which fail in a layer very close to the buffer. Candidates are evaluated by considering the length of the plan fragment associated with the candidate and the extent to which the failed search penetrated into the graph when initiated from the fix point layer when the candidate was first generated. The search penetration should be maximized while the plan fragment length should be minimized. Considering the goal sets in some order other than that in which they are generated does not affect any of the formal properties of the planner other than the optimality of the plans generated. Non-optimal plans can be favoured because the balance between fragment length and penetration can cause candidates with shorter fragments to be overlooked.

[Figure 21: Towers of Hanoi with (STAN h) and without the heuristic: 3-9 discs. Axes: STAN h, STAN.]

Using the heuristic STAN is able to solve Towers of Hanoi problems very efficiently, as Figure 21 shows. As previously, the graph is log-log scaled. The line indicates at least a polynomial improvement in the size of instances. The heuristic was originally developed by consideration of blocks world problems, in which it also performs well. However, it does not provide a reliable advantage so it is not used in STAN version 2. It was used on all problems in the competition but often represented a heavy overhead for STAN.
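One way to picture such a selection heuristic is as a priority queue over candidates. The sketch below is purely illustrative: the paper does not give the formula used to balance penetration against fragment length, so the linear score and the `length_weight` parameter are assumptions, and `CandidatePool` is a hypothetical name.

```python
import heapq
from itertools import count


class CandidatePool:
    """Select candidates by a hypothetical score instead of arrival order."""

    def __init__(self, length_weight: float = 1.0):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier candidates first
        self._length_weight = length_weight

    def add(self, goals, penetration: int, fragment_length: int) -> None:
        # Higher penetration is better, longer plan fragments are worse.
        # This linear combination is an assumption, not the paper's formula.
        score = penetration - self._length_weight * fragment_length
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, (-score, next(self._arrival), frozenset(goals)))

    def pop_best(self) -> frozenset:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


pool = CandidatePool()
pool.add({"A"}, penetration=2, fragment_length=3)
pool.add({"B"}, penetration=5, fragment_length=3)
pool.add({"C"}, penetration=5, fragment_length=6)
first = pool.pop_best()
print(sorted(first))
```

Because candidates are no longer taken in generation order, a candidate with a longer plan fragment can be solved before one with a shorter fragment, which is how such a scheme can give up optimality.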
We are continuing to experiment with alternative domain-independent evaluation criteria.

6. Conclusion

This paper presents two improvements on the representation of the plan graph exploited by Graphplan-based planners. These are: the representation of the graph as a single pair of layers, called a spike, built around bit vectors and logical operations, and the use of a wave front which avoids the explicit construction of the graph beyond the fix point. We describe a highly efficient procedure for checking mutex relations between actions and explain what characteristics of problems allow its full exploitation.

The spike and the wave front have both been implemented in STAN, a Graphplan-based planner, version 1 of which competed successfully in the AIPS-98 planning competition (see footnote 1). We have presented empirical evidence to support both improvements. The first set of data demonstrates the increase in graph construction efficiency obtained by the use of the spike. The second set of data shows the advantages obtained during the search of the plan graph by using the wave front.

STAN also employs the state invariant inference machinery of TIM (Fox & Long, 1998), but in version 2 the integration of the invariants into the graph construction process is still only partial. We observe that the mutex relations generated in the Complete-Graph TSP, in particular, are almost entirely domain invariants of the kind inferred by TIM.

Footnote 1. Version 1 contained implementations of both the spike and the wave front. Version 2 enhances both of these mechanisms with improved implementation and the addition of the changedActs mechanism discussed in Section 2.1.
Integration of these inferred invariants into the graph would allow these mutex relations to be identified immediately as permanent and eliminate them from retesting, dramatically enhancing STAN's graph construction performance in this domain. A similar advantage would be obtained across other domains since many of the mutex relations inferred during graph construction correspond to invariants of the various forms inferred efficiently by TIM during a preprocessing stage.

Appendix A. Website Addresses

Online Appendix 1 contains a complete collection of the domains and problems used in this paper, executables (Linux and Sparc-Solaris binaries) for STAN and the reduced version of STAN for graph construction, and a diff file showing how the graph constructing version of IPP was generated.

The results of the AIPS-98 planning competition can be found at: http://ftp.cs.yale.edu/pub/mcdermott/aipscomp-results.html

The STAN website can be found at: http://www.dur.ac.uk/~dcs0www/research/stanstuff/planpage.html

References

Blum, A., & Furst, M. (1997). Fast Planning through Planning Graph Analysis. Artificial Intelligence, 90, 281-300.

Fox, M., & Long, D. (1998). The Automatic Inference of State Invariants in TIM. JAIR, 9, 317-371.

Kambhampati, S. (1998). EBL and DDB for Graphplan. Tech. rep. ASU CSE TR 98-008, Arizona State University.

Kambhampati, S. (1999). On the Relations Between Intelligent Backtracking and Explanation Based Learning in Planning and CSP. Artificial Intelligence, 105(1-2).

Koehler, J., Nebel, B., & Dimopoulos, Y. (1997). Extending Planning Graphs to an ADL Subset. In Proceedings of 4th European Conference on Planning.

Smith, D., & Weld, D. (1998). Incremental Graphplan. Tech. rep. TR 98-09-06, University of Washington.
