New Implementation Framework for Saturation-Based Reasoning
Author: Alexandre Riazanov
Alexandre Riazanov
alexandre.riazanov@gmail.com
December 12, 2006

Abstract. The saturation-based reasoning methods are among the most theoretically developed ones and are used by most of the state-of-the-art first-order logic reasoners. In the last decade there was a sharp increase in the performance of such systems, which I attribute to the use of advanced calculi and the intensified research in implementation techniques. However, nowadays we are witnessing a slowdown in performance progress, which may be considered a sign that the saturation-based technology is reaching its inherent limits. The position I am trying to put forward in this paper is that such scepticism is premature, and that a sharp improvement in performance may potentially be reached by adopting new architectural principles for saturation. The top-level algorithms and corresponding designs used in the state-of-the-art saturation-based theorem provers have (at least) two inherent drawbacks: the insufficient flexibility of the inference selection mechanisms used, and the lack of means for intelligent prioritising of search directions. In this position paper I analyse these drawbacks and present two ideas on how they could be overcome. In particular, I propose a flexible low-cost high-precision mechanism for inference selection, intended to overcome problems associated with the currently used instances of clause selection-based procedures. I also outline a method for intelligent prioritising of search directions, based on probing the search space by exploring generalised search directions. I discuss some technical issues related to the implementation of the proposed architectural principles and outline possible solutions.
1 Introduction

An automatic theorem prover for first-order logic (FOL) is a software system that can be used to show that some conjectures formulated in the language of FOL are implied by some theory. The expressiveness of FOL and its relative mechanisability make automated theorem proving in FOL a useful instrument for such applications as verification [5,4,1,6] and synthesis [19] of hardware and software, knowledge representation [18], Semantic Web [16], assisting human mathematicians [21,3], background reasoning in interactive theorem provers [23], and others.

This paper is concerned with the theorem proving method based on the concept of saturation. Given an input set of formulas, the prover tries to saturate it under all inferences in the inference system of the prover. In order to deal with syntactic objects which allow efficient calculi, the input set of formulas is usually converted into a set of formulas of a special form, called clauses¹. Demonstrating validity of a first-order formula is thereby reduced to demonstrating unsatisfiability of the corresponding set of clauses². The calculi working with clauses are usually designed in such a way that inferences can only produce clauses (see, e.g., [2,26]). There are three possible outcomes of the saturation process on clauses: (1) an empty clause is derived, which means that the input set of clauses is unsatisfiable; (2) saturation terminates without producing an empty clause, in which case the input set of clauses is satisfiable (provided that a complete inference system is used); (3) the prover runs out of resources. The saturation method is well-studied theoretically ([2,26])

¹ Universally quantified disjuncts of literals. A literal is either an atomic formula (possibly depending on some variables), or a negation of such an atomic formula.
² Sometimes problems coming from applications are already represented in the clausal form or require only minor transformation.

and is implemented in a significant number of modern provers, e.g., E [34], E-SETHEO (the E component), Gandalf [38], Otter [22], SNARK [36], Spass [39], Vampire [32,30], and Waldmeister [14]. In the last decade there has been a sharp increase in the performance of such systems³, which I attribute to the use of advanced calculi and inference systems (primarily, complete variants of resolution [2] and paramodulation [26] with ordering restrictions, and a number of compatible redundancy detection and simplification techniques), and to intensified research on efficient implementation techniques, such as term indexing (see [12] and the more recent survey [35]), heuristic methods for guiding proof search (see, e.g., [34]) and top-level saturation algorithms (see, e.g., [13] and [33]).

Unfortunately, the initial momentum created by such work seems to have diminished, and nowadays we are witnessing a slowdown in performance progress⁴. Some researchers consider this to be a sign that the saturation-based reasoning technology is reaching its inherent limits. The position I am trying to defend in this paper is that such scepticism is premature. My argumentation is based on the thesis that potential opportunities for a new breakthrough in performance have not been exhausted. Namely, the possibility of adopting new implementation frameworks for saturation, i.e., top-level designs and algorithms, has not been fully explored. To support this claim, I will pinpoint some major weaknesses in the organisation of proof search in the standard approaches to implementing saturation, and propose two concrete ideas on how to overcome these problems.
First, I will analyse some inherent problems with the standard procedures for saturation, based on the implementation of inference selection via clause selection. In particular, I consider the two main procedures based on clause selection, the OTTER algorithm and the DISCOUNT algorithm. The main problem with the former procedure is the coarseness of inference selection, which translates into insufficient productivity of heuristics and restricts the choice of possible heuristics. The latter procedure implements very fine selection of inferences, but at a high cost in terms of computational resources. I will propose a new procedure based on a flexible high-precision inference selection mechanism with acceptable overhead. A concrete implementation scheme will be outlined.

Second, I will highlight the inadequacy of the popular approaches to prioritising proof search directions, based on syntactic characteristics of separate clauses. As a possible remedy, I propose a method for intelligent prioritising of search directions, based on probing the search space by exploring generalised search directions. I also propose a concrete implementation scheme for the method.

This criticism of the current state of affairs in saturation architectures originates in my hands-on experience with implementing the saturation-based kernel of Vampire [32,30], and numerous experiments with the system. In fact, I consider the observations related to proof search effectiveness, on which this paper is based, the most valuable lessons learned from the Vampire kernel implementation. However, this paper is only a position paper. As such, it does not present any complete results, either theoretical or experimental. Its aim is to provide a basis and an inspiration for new implementations and experiments.

The rest of this paper is structured as follows.
Each of the remaining two sections introduces a new architectural principle. In the beginning of each section the relevant aspects of the state-of-the-art designs are criticised. Then, ideas for a possible remedy are formulated, followed by a discussion of related work and a tentative research programme.

Concluding this introduction, I would like to ask the reader to be tolerant of some presentational problems with this text. I am trying to keep this paper informative for experts in the implementation of saturation-based provers and, at the same time, acceptable for a superficial reading by a broader audience. Some negative consequences of such a conflict of intentions seem to be inevitable.

³ A good benchmark is Otter, which has not changed much since 1996. Compare its relative performance in CASC-13 (http://www.cs.miami.edu/~tptp/CASC/13/) and CASC-20 (http://www.cs.miami.edu/~tptp/CASC/20/).
⁴ Compare the performance of the best provers in CASC-20 (http://www.cs.miami.edu/~tptp/CASC/20/) with the previous year's winners.

2 General preliminaries

For the sake of self-containedness, I will reproduce a number of standard definitions here. I am assuming that the reader is familiar with the syntax and semantics of first-order predicate logic with equality. In what follows, ordinary predicate symbols will be denoted by p, q and r, the equality predicate will be denoted by ≃, function symbols will be denoted by f, g and h, individual constants will be denoted by a, b and c, variables will be denoted by x, y and z, possibly with subscripts, and the letters s and t, possibly with subscripts, will denote terms.

We are mostly interested in a special kind of first-order formulas called clauses. A clause is a disjunction L1 ∨ . . . ∨ Ln, where all Li are literals, i.e. atoms (positive literals) or negated atoms (negative literals).
The order of the literals in a clause is usually irrelevant, so I will often refer to clauses as finite multisets of literals. The empty multiset of literals will also be considered a clause; it is false in any interpretation.

Substitutions are total functions that map variables to terms. They will be denoted by θ and σ, possibly with subscripts. Substitution application is extended to complex expressions, such as terms, atoms, literals and clauses, in an obvious way: if E is an expression, Eθ is obtained by replacing each variable x in E by xθ. A substitution θ is a unifier for two expressions E1 and E2 if E1θ = E2θ. It is a most general unifier if, for any other unifier θ1, there exists a substitution θ2 such that E1θ1 = (E1θ)θ2. We will say that a clause C subsumes a clause D if there is a substitution θ such that (the multiset of literals) Cθ is a submultiset of D.

We are interested in the implementation of calculi based on resolution and paramodulation (see, e.g., [2,26]). (Unrestricted) binary resolution is the following deduction rule:

    C ∨ A    D ∨ ¬B
    ---------------
       (C ∨ D)θ

where θ is the most general unifier of the atoms A and B. Paramodulation is the following rule:

    C ∨ s ≃ t    D[u]
    -----------------
       (C ∨ D[t])θ

where θ is the most general unifier of the terms s and u, and u is not a variable.

A resolution-based reasoner usually applies restricted variants of these rules, together with some auxiliary rules, to demonstrate unsatisfiability of an input clause set by deriving an empty clause from it. Such derivations are called refutations of the corresponding clause sets. Saturation-based reasoners are called so because of the way they search for refutations. In an attempt to derive an empty clause, a reasoner tries to saturate the initial set with all clauses derivable from it.
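To make the role of the most general unifier concrete, here is a minimal runnable sketch. The term representation is my own toy convention, not from the paper: a term is a string (variable or constant) or a tuple (functor, arg1, ...), and strings beginning with x, y or z are assumed to be variables.

```python
def is_var(t):
    """Variables are strings starting with x, y or z (a convention of this sketch)."""
    return isinstance(t, str) and t[0] in "xyz"

def apply_subst(term, theta):
    """E·theta: replace every variable in the term by its image under theta."""
    if isinstance(term, str):
        return theta.get(term, term)
    return (term[0],) + tuple(apply_subst(a, theta) for a in term[1:])

def occurs(v, t):
    """Does variable v occur anywhere in term t?"""
    return t == v if isinstance(t, str) else any(occurs(v, a) for a in t[1:])

def unify(s, t):
    """Robinson-style unification: return an mgu as a dict, or None if none exists."""
    theta, stack = {}, [(s, t)]
    while stack:
        a, b = stack.pop()
        a, b = apply_subst(a, theta), apply_subst(b, theta)
        if a == b:
            continue
        if is_var(a):
            if occurs(a, b):
                return None                       # occurs check: x vs f(x) fails
            theta = {v: apply_subst(u, {a: b}) for v, u in theta.items()}
            theta[a] = b                          # keep theta idempotent
        elif is_var(b):
            stack.append((b, a))                  # orient: variable on the left
        elif isinstance(a, tuple) and isinstance(b, tuple) \
                and a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))       # decompose f(...) = f(...)
        else:
            return None                           # functor/constant clash
    return theta

# f(x, b) and f(a, y) unify with the mgu {x -> a, y -> b}:
print(unify(("f", "x", "b"), ("f", "a", "y")))
```

With such an mgu in hand, the resolution rule above simply applies it to the disjunction of the remaining literals of the two premises.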
Roughly speaking, at some steps of the saturation process the reasoner selects a possible inference between some clauses in the current clause set, applies the inference and adds the resulting clause to the current clause set. Other steps of the process usually prune the search space by removing redundant clauses, i.e. clauses that are not strictly necessary to find a refutation. For details on the concept of saturation modulo redundancy, the reader is referred to [2].

3 Fine inference selection at affordable cost

3.1 Background: inference selection via clause selection

When one has to search in an indefinitely large space, the ability to explore more promising search directions before the less promising ones is a key to success. In saturation-based reasoning the mechanism responsible for deciding which direction to promote first is known as inference selection. Ideally, inference selection should be able to name one single inference to be deployed at every step of saturation, and the decision should be based on the (heuristically evaluated) quality of the resulting clause. In practice, most of the working saturation-based systems adopt a simpler but coarser mechanism known as clause selection. Instead of selecting a single inference at a time, we select a clause and oblige ourselves to deploy immediately all possible inferences between the clause and all active (previously selected) clauses. Clauses of better heuristically evaluated quality are given higher priority for selection, in the hope that they will produce heuristically good inferences. The algorithm realising inference selection via clause selection is known as the given-clause algorithm. Its variants have been used in provers since as early as 1974 [27] (see also [20]), although its current monopoly seems to be mostly due to the success of Otter [22].
Other provers based on variants of the given-clause algorithm include E, Gandalf, SNARK, Spass, Vampire and Waldmeister, i.e., practically all modern saturation-based systems.

In order to illustrate the main idea behind the given-clause algorithm, namely the implementation of inference selection via clause selection, it is sufficient to consider only deduction inferences. So, the algorithm presented in Figure 1 performs no simplification steps.

    procedure GivenClause(input: set of clauses)
      var new, passive, active: sets of clauses
      var current: clause
      active := ∅
      passive := input
      while passive ≠ ∅ do
        current := select(passive)
        passive := passive − {current}
        active := active ∪ {current}
        new := infer(current, active)
        if new contains empty clause then return refutable
        passive := passive ∪ new
      od
      return failure to refute

Fig. 1. Given-clause algorithm (without simplifications)

It is also convenient to represent the algorithm with a more abstract dataflow diagram, as in Figure 2. In this picture, the boxes denote operations performed on clauses. The rounded boxes denote sets of clauses. The shallow ones correspond to the sets that typically contain very few clauses, while the deep ones correspond to the sets that can grow large. The arrows reflect the information flow for different operations. Arrows labeled with the same number belong to the same operation/processing phase. In Figure 2, label 1 corresponds to the line passive := input in the pseudocode from Figure 1, phase 2 is clause selection (current := select(passive) and passive := passive − {current}), and 3 corresponds

[Figure 2: dataflow diagram with the clause sets input clauses, passive, current, active, new, the operation deduction inf., and processing phases labeled 1–5.]

Fig. 2.
Dataflow in the given-clause algorithm

to active := active ∪ {current}, 4 is the generation of deduction inferences between current and active (new := infer(current, active)), and 5 is the integration of newly derived clauses into passive (passive := passive ∪ new). The thin solid arrows show the movements of clauses between clause sets and from operations to the sets. A dashed arrow from a set to an operation indicates that the operation depends on the clauses from the set.

My experience with Vampire and, to some extent, with other provers allows me to see a number of soft spots of the given-clause algorithm:

– The selection is based on the properties of eligible clauses, which are only vaguely related to the properties of the enabled inferences. A "good" clause may interact with many previously selected "not-so-good" clauses and produce many "not-so-good" inferences. The set of selected clauses often contains such heuristically bad clauses for a number of reasons. In particular, we cannot completely avoid selecting bad clauses because in general that leads to incompleteness. Moreover, in practice we often cannot even significantly restrict the selection of heuristically bad clauses, since such a strategy easily leads to a loss of solutions (in the practical sense, i.e., solutions that can be obtained with given resources). Another reason why bad clauses get into the set of selected clauses is the relativity of the heuristic estimation of clause quality: a clause selected as relatively good in the beginning of the proof search can become relatively bad later, if many better clauses have been derived. Another problem with clause property-based selection is that even two "good" clauses can easily have "not-so-good" inferences between them.
This happens when the clause quality criteria do not sufficiently penalise clauses containing "bad" parts available for inferences. If our quality criteria are too strict with respect to clauses with "bad" parts, the prover also postpones the inferences involving "good" parts of such clauses.

– The newly selected clause may, and often does, interact with very many parts of very many active clauses. This often leads to pathological situations of the following kind: a prolific clause is selected, and the processing of inferences between this clause and many active ones takes all available time, whereas a few inferences with other clauses would lead to a solution.

In sum, the coarseness of the clause selection principle deprives us of control over the proof search process to a great extent, which translates into poor productivity of heuristics, restricts the choice of heuristics that can be implemented, and leads to littering the search state with too many "undesirable" clauses.

There are two main variants of the given-clause algorithm: the Otter algorithm⁵ and the DISCOUNT algorithm⁶, which differ in the way the passive (waiting to be selected) clauses are treated.

[Figure 3: dataflow diagram of the Otter algorithm, with the clause sets input clauses, passive, current, active, new, retained, the operations deduction inf., forward simpl., backward simpl., and processing phases labeled 1–8.]

Fig. 3. Otter algorithm

In the Otter algorithm, presented as a dataflow diagram in Figure 3, the passive clauses are subject to simplification by the newly derived clauses, can be discarded as redundant with the help of the newly derived clauses, and themselves can be used to simplify/discard the newly derived clauses.
Newly derived clauses are subject to forward simplification, which may transform them or even discard them completely⁷. Note that in the Otter algorithm forward simplification uses both passive and active clauses as simplifiers (see the dashed arrows labeled with 5 in the diagram). Backward simplification also affects passive clauses as well as the active ones (see the broad arrows labeled with 7).

In the DISCOUNT algorithm (see Figure 4), only active clauses can be simplified/discarded, or used to simplify/discard new clauses. So, there are no dashed lines between the box passive and the forward and backward simplification boxes. Note also that the clause in current is subject to forward simplification (arrows labeled with 3), and it is used to simplify the active clauses (arrows labeled with 4). This is done to keep the set of active clauses as simple⁸ as possible.

In the DISCOUNT algorithm passive clauses are constructed practically exclusively for evaluation of their properties, which have to be known for controlling the inference selection. One may argue that the set of passive clauses is just a representation of all (potentially non-redundant) one-step inferences from the active clauses, and from this point of view the DISCOUNT algorithm implements the idealistic notion of inference selection described in the beginning of Section 3.1. In other words, the DISCOUNT algorithm allows the prover to observe the space of all possible one-step inferences between active clauses, which is a good thing by itself. However, the algorithm also obliges the system to do so by explicitly making all such inferences and storing the resulting clauses as passive. The cost of good inference selection becomes very high. Typically, a thousand active clauses may generate hundreds of thousands of inferences, and a great

⁵ Implemented, in particular, in Gandalf, Otter, SNARK, Spass and Vampire.
⁶ Implemented, in particular, in E, Vampire and Waldmeister.
⁷ In the diagrams in Figures 3 and 4, a broad arrow from an operation to a set indicates that the operation modifies the set by removing or replacing some clauses.
⁸ Simplicity here is, of course, relative to the features of the used inference system, in particular, the redundancy criteria.

[Figure 4: dataflow diagram of the DISCOUNT algorithm, with the clause sets input clauses, passive, current, active, new, the operations deduction inf., forward simpl., backward simpl., and processing phases labeled 1–8.]

Fig. 4. DISCOUNT algorithm

deal of the resulting clauses may be non-redundant with respect to the active ones and, as such, have to be stored as passive. Since the passive clauses are not used for anything but selection, the work spent on constructing a clause may be frozen for a long time while the clause remains passive, and this work is lost if the prover exhausts a given time or memory limit and terminates. Storing huge numbers of passive clauses may additionally require a lot of memory. The Otter algorithm is not completely immune to these problems either. In addition, the cost of simplification operations grows with the growth of the set of passive clauses.

Recently there have been (at least) two attempts to address some of these issues. Vampire implements the Limited Resource Strategy [33], which is intended to minimise the amount of work on generating, processing and keeping passive clauses in the Otter algorithm that is wasted when the time limit is reached. This is done by discarding some non-redundant but heuristically bad clauses and inferences. Waldmeister implements a sophisticated scheme to reduce the memory requirements of the DISCOUNT algorithm [13,8]. In both cases, the adjustments of the top-level algorithms led to a great improvement in the effectiveness of the systems.
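The loop of Figure 1 and the difference in where the two variants look for simplifiers can be illustrated with a small runnable sketch. This is my own propositional simplification, not the paper's code: clauses are frozensets of DIMACS-style integer literals, infer is unrestricted binary resolution, select picks a smallest clause (mimicking a crude weight heuristic), and forward simplification degenerates to discarding subsumed clauses, where propositional subsumption is just the subset test.

```python
def resolvents(current, others):
    """All propositional binary resolvents between `current` and `others`."""
    out = set()
    for other in others:
        for lit in current:
            if -lit in other:
                out.add((current - {lit}) | (other - {-lit}))
    return out

def subsumed(clause, simplifiers):
    """Propositional subsumption: some simplifier is a subset of the clause."""
    return any(d <= clause for d in simplifiers)

def given_clause(input_clauses, style="otter"):
    active, passive = set(), set(map(frozenset, input_clauses))
    while passive:
        current = min(passive, key=len)          # select(): smallest clause first
        passive.discard(current)
        active.add(current)
        for clause in resolvents(current, active):
            if not clause:
                return "refutable"               # empty clause derived
            # Forward simplification: the Otter variant checks active AND
            # passive clauses as simplifiers, the DISCOUNT variant only active.
            simplifiers = active | passive if style == "otter" else active
            if not subsumed(clause, simplifiers):
                passive.add(clause)
    return "failure to refute"

# {p}, {~p or q}, {~q} is unsatisfiable:
print(given_clause([{1}, {-1, 2}, {-2}]))        # refutable
```

Note that discarding a new clause subsumed only by a passive clause (the Otter behaviour) saves storing it, at the price of consulting the potentially huge passive set, which is exactly the cost trade-off discussed above.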
This gives me hope that a radically different approach to inference selection may result in a real performance breakthrough.

3.2 Finer selection units with graded activeness

Finer selection units. The inherent problems with the given-clause algorithm motivated me to look for a scheme that can facilitate better control of search at an affordable cost. Instead of selecting clauses, we are going to select some particular parts (literals or subterms) of clauses and make them available for some particular kinds of inferences. Such triples (clause + clause part + inference rule) will be the new selection units. This will help us to avoid premature invocation of less promising clause parts. Stronger heuristics become available for evaluating the quality of selection units, since such evaluation can take into account more than just integral characteristics of a whole clause. For example, a selection unit with a generally good clause, but with a bad literal or subterm intended for a prolific inference rule, may now be given a low priority. On the one hand, this allows us to delay inferences with a bad part of the clause. On the other hand, we don't have to delay all inferences with the clause simply because one of its parts is bad.

As an illustration, consider the unit clause p(f(x, y), f(a, b)). If some form of paramodulation is allowed, the subterm f(x, y) is available for paramodulation into. This selection unit is extremely prolific, since f(x, y) unifies with all terms starting with f, and it makes good sense to delay paramodulations into this term without postponing other inferences with the clause, e.g., paramodulations into f(a, b). Another example of a highly promising heuristic enabled by the proposed approach is to give higher priority to binary resolution than to paramodulation, since the latter is often much more prolific than the former.
This heuristic already works very well (at least) in Vampire. The prover never enables inferences with positive equalities in a clause if there are literals of other kinds. Although generally successful, this strategy often fails if all the other literals are relatively bad, e.g., if they can generate many inferences. Consider the clause p(x, y) ∨ q(a, b) ∨ f(a, b) ≃ a. The literal p(x, y) is likely to be more prolific than f(a, b) ≃ a, since p(x, y) unifies with any atom starting with p. The proposed scheme allows us to give very high priority to q(a, b), lower priority to f(a, b) ≃ a, since it is a positive equality, and a very low priority to the overly prolific literal p(x, y).

Also, simplification inferences and redundancy tests can be treated in the same way as deduction inferences. In the example above, we could make the term f(a, b) available for rewriting immediately, and postpone the integration of f(x, y) into the corresponding indexes until much later. By delaying simplification inferences on stored clauses in a controlled manner we can achieve behaviours combining the properties of the Otter and DISCOUNT algorithms. If simplification inferences are given higher priority, the behaviour of our procedure will be closer to that of the Otter algorithm. If simplification inferences have priority comparable to the priority of deduction inferences such as resolution and paramodulation, we can expect the new procedure to behave similarly to the DISCOUNT algorithm.

Graded activeness. Apart from changing the subject of selection, I propose to change the notion of selection itself. The given-clause algorithm divides the search state into two parts. One part contains active clauses, and the other one contains passive clauses that are not yet available for deduction inferences.
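Returning to the clause p(x, y) ∨ q(a, b) ∨ f(a, b) ≃ a, the finer ranking could look as follows. The numeric weights, the rule penalties and the crude prolificness test are my own invented illustration of the idea, not values from the paper; a selection unit is modelled as a triple (clause id, clause part, inference rule).

```python
# Paramodulation is usually more prolific than resolution, so it is penalised
# (these particular numbers are invented for the illustration).
RULE_PENALTY = {"resolution": 1.0, "paramodulation": 2.0}

def has_top_variables(lit):
    """Crude prolificness test: a top-level variable argument makes a literal
    unify with many atoms (variables are the strings x, y, z in this sketch)."""
    return any(a in ("x", "y", "z") for a in lit[1:] if isinstance(a, str))

def quality(unit):
    """Quality coefficient of a (clause id, part, rule) selection unit."""
    _clause_id, part, rule = unit
    q = 10.0 / RULE_PENALTY[rule]
    if has_top_variables(part):
        q /= 10.0                                # heavily penalise prolific parts
    return q

clause = [("p", "x", "y"), ("q", "a", "b"), ("eq", ("f", "a", "b"), "a")]
units = [(0, clause[0], "resolution"),           # p(x, y): overly prolific
         (0, clause[1], "resolution"),           # q(a, b): ground and cheap
         (0, clause[2], "paramodulation")]       # f(a, b) = a: positive equality
ranked = sorted(units, key=quality, reverse=True)
print([u[1][0] for u in ranked])                 # ['q', 'eq', 'p']
```

The point is only that the ranking is per clause part and per rule, something the clause-level given-clause selection cannot express.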
If a clause gets into the active set, it becomes available for all future inferences regardless of its quality. To overcome this problem, I propose to use a finer gradation of selection unit activeness. Intuitively, all selection units would become potentially available for inferences almost immediately, but some would be "more available" than others. Less active selection units would be available for inferences with more active ones. A high degree of activeness of a selection unit would indicate a higher priority of this unit for proof search. In the new procedure, the units containing parts of newly generated clauses initially receive the minimal degree of activeness, but later are gradually promoted to higher degrees of activeness. When a promotion step takes place, the selection unit becomes available for new inferences with some units which have not been eligible so far due to insufficient activeness. To give higher priority to inferences with heuristically better selection units, the promotion frequency for different units should vary according to their quality. Thus, we will be able to delay inferences between heuristically bad inference units.

To illustrate this rather general scheme, I will outline a simple implementation scheme. For this implementation the nature of the used selection units is irrelevant, i.e. they can be clauses as well as the finer selection units proposed above. However, the implementation relies on the assumption that the quality of selected units is reflected by a special real-valued coefficient which takes positive values. If υ is a selection unit, the corresponding coefficient will be denoted as quality(υ). The intuitive meaning of the quality coefficient is the relative frequency of promotion.
If υ1 and υ2 are two selection units, at each promotion step the probability of selecting υ1 for promotion relates to the probability of selecting υ2 as quality(υ1) relates to quality(υ2). Practically, we can select units for promotion randomly, according to the distribution explicitly specified by the quality coefficients. This selection discipline is known in the area of Genetic Algorithms as roulette-wheel selection [11].

To realise the idea of graded activeness, I propose to partition the set of all available selection units into n + 1 sets Υ0, Υ1, . . . , Υn. The indexes of the sets reflect the activeness of the selection units contained in them: units in Υi+1 are more active than units in Υi. More specifically, for i > 0, υ ∈ Υi implies that all possible inferences between υ and units from Υn−i+1, . . . , Υn have been made, and no inferences between υ and units in Υ0, . . . , Υn−i have been considered yet. Υ0 contains absolutely passive selection units, i.e. units that have not participated in any inferences yet.

This invariant, illustrated in Figure 5, is maintained by the following procedure. As soon as some selection unit is constructed, it is placed in Υ0. At each macrostep of the procedure some selection unit υ is selected for promotion, as outlined above. If υ happens to be in Υi, where i < n, its promotion means that υ is removed from Υi, all possible inferences between υ and selection units from Υn−i are made, and υ is placed in Υi+1. Selection units from Υn are not promoted; they have the maximal activeness.

[Figure 5: the levels Υ0, Υ1, . . . , Υn in order of increasing activeness; units in Υ0 have had no inferences yet, and units in each Υi can have inferences with units in Υn−i+1, . . . , Υn.]

Fig. 5.
Graded activeness implementation

Special arrangements may have to be made if we admit selection units that need not interact with other selection units to produce inferences. For example, we may decide that selection units intended for binary factoring ([2]) with a particular literal in a particular clause do not need a counterpart unit, i.e. if we decide to deploy such a unit, we will have to make all possible factoring inferences with the specified literal within the specified clause. One possibility for dealing with such selection units is to designate some activeness i > 0 as a threshold, so that when a selection unit reaches Υi, all inferences requiring this unit alone are immediately made.

As a whole, the proposed inference selection scheme allows for much better control over inference selection, which may translate into higher productivity of heuristics and enables the use of new heuristics which could not be used with the given-clause algorithm. Apart from other things, the extra flexibility⁹ of inference selection will enhance the diversity of available strategies¹⁰. These advantages come at an affordable cost. The only involved overhead, caused by the need to store large numbers of selection units, is compensated by lower numbers of heuristically bad clauses which have to be created and stored only to maintain completeness.

I would like to add one final consideration here. The calculi used in the state-of-the-art saturation-based provers are designed with the aim of reducing the search space. Partially, they do this by restricting the applicability of the resolution and paramodulation rules. Often this is done by prohibiting inferences with

⁹ The proposed design is strictly more flexible than the standard ones, since it is possible to implement it in such a way that both the Otter and DISCOUNT algorithms can be simulated by appropriate parameter settings.
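The promotion discipline described above, roulette-wheel selection over quality coefficients combined with the levels Υ0, . . . , Υn, can be sketched as follows. The data shapes, and recording inferences as mere pairs, are my own simplifications for the sketch.

```python
import random

def roulette(units, quality):
    """Roulette-wheel selection: pick a unit with probability proportional
    to its quality coefficient."""
    return random.choices(units, weights=[quality[u] for u in units])[0]

def promote(levels, u, made_inferences):
    """Promote u from level i to i+1. Per the invariant, the inferences that
    become possible are those with the units currently at level n - i."""
    n = len(levels) - 1
    i = next(j for j in range(n) if u in levels[j])   # units in level n stay put
    levels[i].remove(u)
    for v in levels[n - i]:
        if v != u:
            made_inferences.add(frozenset({u, v}))    # record the new inference
    levels[i + 1].add(u)

# Three levels (n = 2): "u" is absolutely passive, "v" maximally active.
levels = [{"u"}, set(), {"v"}]
made = set()
promote(levels, "u", made)        # u: level 0 -> 1, inferring with level 2
print(levels, made)
```

A driver loop would simply alternate roulette over all promotable units with promote, so that high-quality units climb the levels, and thus meet inference partners, more frequently than low-quality ones.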
¹⁰ My experience suggests that this is a very important factor, as in 2002–2005 the multitude of strategies supported by the Vampire kernel has been a major, if not the main, contributor to the growth of performance of the whole system.

For example, ordered resolution with literal selection (see [2]) prohibits resolving non-maximal positive literals. However, restricting the shape of eligible derivations also means restricting the number of eligible solutions, and simple solutions are often thrown away if they do not satisfy the restrictions. It is possible, in principle, to relax the restrictions by allowing some redundant inferences with some heuristically good parts of clauses. For example, we may want to resolve large (i.e., containing many symbols) positive non-maximal literals with the aim of obtaining smaller resolvents. However, adjusting prover architectures based on the standard variants of the given-clause algorithm makes very desirable the introduction of new ad hoc mechanisms for regulating the proportion of redundant and non-redundant inferences. The scheme proposed in this paper seems to have sufficient flexibility to accommodate such control mechanisms for free. For example, we can allow selection units with large positive non-maximal literals and assign to them a higher quality measure than to non-redundant selection units, if we are eager to derive small clauses earlier. If we choose to be more conservative and want to avoid most redundant inferences except a small number of heuristically very promising ones, we can always assign a higher quality measure to non-redundant selection units.

3.3 Methodological considerations

The proposed scheme for finer inference selection is completely compatible with the modern theory of resolution and paramodulation, and requires no theoretical analysis.
The difficult part of the job is to find an adequate design and to do the actual implementation. One of the implementation options is to adjust an existing system. To investigate this possibility, I looked through the code of the kernel of Vampire v7.0 with the purpose of estimating the amount of work required to adjust it to the new scheme. This investigation has convinced me that at least one third of the code would have to be rewritten completely, and at least another third of it would have to be heavily adjusted to accommodate the new code. This is hardly surprising, taking into account that the proposed changes target the top-level design as well as some key data representations and some mid-level functionality such as indexing. The main conclusion of my inspection of the Vampire kernel code is that the amount of work required for a transition to the new scheme is likely to exceed the cost of creating a rather advanced brand new prototype. An implementation from scratch can also be better tailored to the new design. Considering this additional advantage, my preference is clear. However, I do not dismiss the possibility of implementing the new scheme on the base of other advanced saturation-based provers.

The nature of the proposed architectural principles is such that their advantages can only be fully demonstrated if a significant effort is invested in the design and assessment of search heuristics. Indeed, the main advantages of the new inference selection approach are the higher productivity of existing heuristics and the possibility of using new heuristics. This extra flexibility in directing proof search can only be fully exploited by means of tuning.
Therefore, very extensive experimentation will be necessary to find generally good combinations of parameters of heuristics, as well as strategies specialised for important classes of problems¹¹. A strong tuning infrastructure to support such experimentation seems highly desirable. Developing such an infrastructure may itself be an interesting research problem.

¹¹ For experiments one can use the TPTP library [37], which is at the moment the largest and most diverse collection of first-order proof problems. It would also be very useful to look at more specialised large problem sets coming from applications in order to demonstrate the tunability of the proposed architecture.

Finally, I would like to add a note about term indexing. The finer gradation of activeness divides clauses and their parts into many logically separate sets. An initial implementation may adapt existing techniques to index these sets separately. However, better specialised indexing solutions may exist, and, if the proposed design proves viable in the initial experiments, it may give rise to a new line of research in term indexing.

4 Generalisation-based prioritising of search directions

4.1 Background: local syntactic relevancy estimation

Blind search in indefinitely large spaces is usually not effective enough for most applications, so all modern saturation-based provers try to predict the relevancy of particular search directions by using various heuristics. In a saturation process state, the available search directions are identified by the accumulated clauses (e.g., the contents of the sets passive and active in the pick-given clause algorithm presented in Figure 1). The most common heuristics prioritise search directions by giving some clauses higher priority for participating in inferences than the others.
The estimation of relevancy of a clause is based on such characteristics of the clause as its structural complexity (e.g., simpler clauses get higher priority) or its potential for participating in inferences (e.g., very prolific clauses get very low priority).

Such approaches have natural limitations. The syntactic characteristics of a clause, used in the estimation, often fail to reflect the usefulness of the clause adequately. For example, a structurally complex clause may be absolutely indispensable for any solution of the problem at hand, but it will be suspended for a long time. Another problem is that the estimation is done locally, i.e. only one clause is analysed and global properties of the current search state are not taken into account. For example, an absolutely irrelevant clause, i.e., one participating in no minimal unsatisfiable subset of the current clause set, may be given high priority because of its simplicity.

4.2 Generalisation-based prioritising of proof-search directions

To address the issues raised above, I propose a method for intelligent prioritising of search directions. The idea is as follows. We will estimate the potential of a clause to participate in solutions of the whole problem at hand by interacting with other currently available clauses. Precise estimation is impossible since it would require finding all, or at least some, solutions of the problem, so we are looking for a good approximation.

General method. I suggest to probe the search space by exploring a substantially simpler search space. The latter is obtained from the former by generalising some search directions. This is done by replacing (preferably large) clusters of similar clauses with their common generalisations.
If we find a solution of the simplified problem which involves the generalisation of a particular cluster, this is a good indication that at least some of the clauses in the cluster can be relevant. More importantly, the clauses whose generalisations have not yet proved useful can be suspended as potentially irrelevant. Additionally, the closer a resolved generalisation is to a particular clause in its cluster, the better chances the clause has to participate in a solution and the bigger priority it should be given.

Generalisations can be defined semantically: a clause C can be called a generalisation of clause D if C logically implies D. For our purposes, however, it is convenient to use a simpler, syntactically defined notion of generalisation, based on subsumption. In what follows, we will call C a generalisation of D if C subsumes D, i.e. Cθ ⊆ D¹², where Cθ and D are viewed as sets of literals.

¹² A more restrictive multiset-based variant of subsumption, where Cθ is required to be a submultiset of D, can also be used.

Implementation with naming and folding. Technically, the general approach described above can be realised by means of a combination of dynamic naming and folding. This combination is called the decomposition rule in [17], but for the purposes of this paper it is convenient to consider the rules separately.

The idea should be clear from the following example. Suppose we have a clause C1 = p(f(a, b)) ∨ p(g(b, a)) ∨ q(a). We decide that this clause is too specific and its generalisation Γ1(x1, x2) = p(f(x1, x2)) ∨ p(g(x2, x1)) should be explored first. To this end, we introduce a new binary (according to the number of variables in Γ1) predicate γ1 and make it the name for Γ1. Logically, this can be viewed as introduction of the definition ∀x1, x2. γ1(x1, x2) ⇔ Γ1(x1, x2).
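The syntactic notion of generalisation just defined (C subsumes D iff Cθ ⊆ D for some substitution θ) can be illustrated with a small sketch. The encoding is an assumption of mine for illustration only: literals and terms are nested tuples, variables are strings starting with an uppercase letter.

```python
# Literals are nested tuples, e.g. ('p', ('f', 'a', 'b'));
# variables are uppercase strings such as 'X1' (my own convention).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, term, subst):
    """One-way matching: extend subst so that pattern instantiated
    by subst equals term; return the extension, or None on failure."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple) \
            and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def subsumes(c, d, subst=None):
    """True iff some substitution θ maps every literal of clause c
    (a list of literals) onto a literal of clause d, i.e. Cθ ⊆ D."""
    if subst is None:
        subst = {}
    if not c:
        return True
    first, rest = c[0], c[1:]
    for lit in d:  # try each candidate target literal, with backtracking
        s = match(first, lit, subst)
        if s is not None and subsumes(rest, d, s):
            return True
    return False
```

With C = p(f(x1, x2)) ∨ p(g(x2, x1)) and D = C1 from the running example, `subsumes` succeeds via θ = {x1 → a, x2 → b}.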
We immediately transform C1 by folding this definition into the following clause C′1 = γ1(a, b) ∨ q(a). Moreover, if there are other clauses, currently stored or derived in the future, which are instances of the generalisation Γ1, we can apply folding to them as well, thus recognising that the clauses are covered by the generalisation Γ1. For example, if the clause C2 = p(f(h(a), b)) ∨ p(g(b, h(a))) ∨ r(b) is derived, it will be replaced by the clause C′2 = γ1(h(a), b) ∨ r(b). The generalisation Γ1 is injected into the search space in the form of the clause Γ1(x1, x2) ∨ ¬γ1(x1, x2), which is a logical consequence of the definition for γ1.

In order to obtain the behaviour prescribed by the general scheme, clauses containing γ-predicates (i.e., predicates which are generalisation names) are given special treatment. Namely, if a clause contains negatively the name γ1 for the generalisation Γ1, it means that the clause was derived from the clause Γ1(x1, x2) ∨ ¬γ1(x1, x2) representing the generalisation Γ1 in the search space. In such clauses, we prohibit all inferences involving negative γ-literals (i.e., literals with γ-predicates) if there is at least one literal of a different kind. Roughly, in the clause Γ1(x1, x2) ∨ ¬γ1(x1, x2) we want to resolve the generalisation part Γ1(x1, x2) before we touch the literal ¬γ1(x1, x2). Until this happens, the literal ¬γ1(x1, x2) only accumulates the substitution which solves Γ1(x1, x2). When a clause containing only negative literals with generalisation names is derived, this indicates that some generalisations “fired”, i.e. they contradict each other and some ordinary input clauses.
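The folding step, replacing the matched instance of a generalisation by its naming literal, can be sketched as follows. The encoding is again an assumption of mine (nested tuples, uppercase variables, γ1 spelled `g1`), and the sketch uses set semantics for clauses.

```python
# Folding sketch: literals/terms are nested tuples, variables are
# uppercase strings; the name γ1 is spelled 'g1' for portability.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pat, term, subst):
    """One-way matching: extend subst so that pat·subst == term."""
    if is_var(pat):
        if pat in subst:
            return subst if subst[pat] == term else None
        return {**subst, pat: term}
    if isinstance(pat, tuple) and isinstance(term, tuple) and len(pat) == len(term):
        for p, t in zip(pat, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pat == term else None

def apply_subst(subst, t):
    if is_var(t):
        return subst.get(t, t)
    if isinstance(t, tuple):
        return tuple(apply_subst(subst, x) for x in t)
    return t

def fold(clause, gen_lits, gen_vars, name):
    """If the generalisation (gen_lits over gen_vars) subsumes `clause`,
    replace the covered literals by the naming literal name(x1θ, ..., xnθ);
    otherwise return the clause unchanged."""
    def search(lits, subst):
        if not lits:
            return subst
        for cand in clause:
            s = match(lits[0], cand, subst)
            if s is not None:
                found = search(lits[1:], s)
                if found is not None:
                    return found
        return None

    theta = search(gen_lits, {})
    if theta is None:
        return clause
    covered = {apply_subst(theta, l) for l in gen_lits}
    naming_literal = (name,) + tuple(apply_subst(theta, v) for v in gen_vars)
    return [naming_literal] + [l for l in clause if l not in covered]
```

On the running example, folding Γ1 into C1 yields γ1(a, b) ∨ q(a), and into C2 yields γ1(h(a), b) ∨ r(b).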
The derivation of such a clause can be viewed as a representation of a refutation of the clause set consisting of the involved generalisations and input clauses. We will call such clauses γ-contradictions and their inferences γ-refutations.

Clauses containing positive γ-literals are suspended (temporarily removed from the search state) until all of the corresponding generalisations have proved useful, i.e. every participating γ-predicate belongs to at least one γ-contradiction. When we can no longer suspend such a clause, we still block any inferences involving its non-γ literals. A resolution inference between such a clause and a γ-contradiction indicates that the clause is compatible with the corresponding γ-refutation, and it represents an attempt to (gradually) refine the γ-refutation into a solution for the original problem. If some form of paramodulation is used, we have to allow paramodulation into the positive γ-literals in an attempt to make them compatible with available γ-contradictions.

To illustrate this, I continue the example. Suppose we have derived the γ-contradiction ¬γ1(a, b). The clauses C′1 = γ1(a, b) ∨ q(a) and C′2 = γ1(h(a), b) ∨ r(b) can no longer be suspended. The clause C′1 is directly compatible with the γ-refutation, which results in a derivation of the clause q(a). The clause C′2 is not compatible with the γ-refutation since γ1(h(a), b) is not unifiable with γ1(a, b). However, in the presence of the unit equality clause h(a) ≃ a, we can rewrite C′2 into γ1(a, b) ∨ r(b), which is compatible with the γ-refutation, and then derive r(b). Note that the work spent on refuting the generalisation Γ1 (modulo some ordinary input clauses) is utilised: we do not repeat the same inferences with the generalised literals from the original clause C1.
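The final steps of the example (deriving q(a) from C′1 and the γ-contradiction, and r(b) after rewriting C′2) are ordinary resolution steps on complementary ground literals. A toy sketch, under a clause encoding of my own choosing (γ1 spelled `g1`, negation as a `'not'` wrapper):

```python
# Toy ground resolution: a clause is a list of literals; a literal is a
# term tuple or ('not', term). The encoding is mine, chosen only to
# replay the example.

def complement(lit):
    return lit[1] if lit[0] == 'not' else ('not', lit)

def ground_resolvents(c1, c2):
    """All resolvents of two ground clauses on one complementary pair."""
    out = []
    for lit in c1:
        if complement(lit) in c2:
            rest1 = [l for l in c1 if l != lit]
            rest2 = [l for l in c2 if l != complement(lit)]
            out.append(rest1 + rest2)
    return out

# C'1 = γ1(a, b) ∨ q(a); the γ-contradiction is ¬γ1(a, b).
c1_folded = [('g1', 'a', 'b'), ('q', 'a')]
contradiction = [('not', ('g1', 'a', 'b'))]
print(ground_resolvents(c1_folded, contradiction))  # [[('q', 'a')]]
```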
Moreover, the results of this work are shared with another clause, C2, and, potentially, with many other clauses covered by the generalisation Γ1. Such sharing of work on similar parts of potentially very many different clauses can be an additional advantage.

Note that the proposed naming- and folding-based scheme is rather flexible. It allows many variants which may differ, e.g., in the way suspended clauses are treated, how selection of inferences is done with the γ-literals, how generalisations are chosen, how many generalisations can be applied to a single clause and whether they can be overlapping¹³, etc. The description above is only intended to provide a general framework for formulating such variants. Moreover, it is obviously not the only possible framework for implementing the general scheme presented in the beginning of this section.

The proposed implementation scheme offers another advantage for free. The user gets an additional means of controlling proof search by specifying in the input which clauses he would like to make named generalisations from the start. This can be viewed as a way of hinting at useful lemmas (of a restricted kind, since only clauses are named rather than arbitrary formulas) or suppressing search directions which do not seem promising to the user. For example, by analysing some previous proof attempts the user may conclude that many clauses of the form ¬p(g(a, b)) ∨ C are generated. If the user has reasons to believe that the literal ¬p(g(a, b)) can be solved, i.e. p(g(a, b)) is logically implied by the input clauses, he may want to try proving p(g(a, b)) as a lemma, and later use it to resolve with the clauses ¬p(g(a, b)) ∨ C. Practically, this can be done by making ¬p(g(a, b)) a generalisation and giving it some name, e.g. γ3.
Refuting ¬p(g(a, b)) ∨ ¬γ3 corresponds to proving the lemma p(g(a, b)), and resolutions between the γ-contradiction ¬γ3 and clauses of the form γ3 ∨ C correspond to applications of the lemma. Such lemma hinting may be beneficial because it allows us to share the work on solving the literals ¬p(g(a, b)) in many different clauses instead of solving them separately.

If the user has reasons to believe that the literal ¬p(g(a, b)) cannot be solved, and thus all the clauses ¬p(g(a, b)) ∨ C are redundant, it still makes sense to make ¬p(g(a, b)) a generalisation. This will keep the generalised clauses ¬p(g(a, b)) ∨ C away from inferences without completely discarding them. Only if the user's intuition was incorrect, i.e. ¬p(g(a, b)) can actually be solved, are the generalised clauses reintroduced into the search space.

4.3 Related work

Static relevancy prediction. My original idea was to use some sort of clause abstractions for dynamic suppressing of potentially irrelevant search directions in the framework of saturation-based reasoning. This idea was inspired by [7], where the authors propose to use various clause abstractions for statically identifying input clauses which are practically irrelevant, i.e. cannot be useful in a proof attempt of acceptable complexity. Roughly, this is done by applying abstractions to an input clause set, exploring the space of all proofs of restricted complexity with the abstracted clause set, and throwing away the input clauses whose abstractions do not participate in any of the obtained proofs with the abstracted set.

Iterative generalisation-refinement. Some time ago [29] drew my attention to the simplest kind of clause abstractions, generalisations, which seems convenient for our purposes. The method works roughly as follows. A resolution prover is parameterised by a generalisation function on clauses, i.e.
a function which computes several, possibly overlapping, generalisations for a given clause. When the prover is run on a problem, the generalisation mechanism replaces suitable clauses by their generalisations. The whole scheme works as iteration through levels of generalisation strength. First, the prover is run with a strong generalisation function to enumerate all refutations with depth below a certain limit. Then the generalisation function is weakened¹⁴ and the prover uses the previously found refutations to guide the enumeration of refutations with the new generalisation function. The key idea is that the refutations with the weaker generalisation function are in a certain (strict) sense refinements of the refutations obtained with the stronger generalisation function. Such refinement is performed repeatedly, and at some point the prover tries to refine a refutation from the previous step into a refutation which uses no generalisation.

¹³ Intuitively, two generalisations of a clause C overlap if they cover some common literals in C. For example, C1 = p(f(a, b)) ∨ p(g(b, a)) ∨ q(a) has overlapping generalisations p(f(x1, x2)) ∨ p(g(x2, x1)) and p(g(x1, x2)) ∨ q(x2), because the literals p(g(x2, x1)) and p(g(x1, x2)) both generalise the literal p(g(b, a)).

¹⁴ Roughly, a weaker generalisation function produces more specific generalisations of a given clause.

Octopus approach. The Octopus system [25] runs a large number of sessions of the prover Theo [24] distributed over a cluster of computers. Each Theo session first runs on a weakening of the original problem, obtained by replacing one of the clauses with one of its generalisations.
If one of the sessions succeeds in solving the weakened problem, the solution is used to direct the search for a solution of the original problem in two ways:

– The unmodified clauses from the original problem formulation which participate in the solution of the weakened problem are considered to be heuristically more relevant. In the future searches for solutions of the original problem, these clauses are given higher priority.
– Some clauses in the obtained refutation of the generalised clause set, which were derived from unmodified clauses, are added as lemmas to the problem formulation.

The main difference between my approach and the static relevancy prediction approach of [7], and also the Octopus approach [25], is that our clause generalisations are introduced dynamically, and can be used on derived clauses. This allows a good degree of adaptivity.

My approach is closer to, and can be viewed as an attempt to revive, the line of work presented in [29]. I hope to improve on this approach mainly by enumerating generalised refutations lazily, thus avoiding any artificial limits on the complexity of refutations and the need to enumerate a whole, potentially large, set of generalised refutations before we try to use these refutations. Also, my approach is more semantic in its nature, since we do not try to refine generalised refutations by following their structure. We are interested in the existence of γ-refutations rather than their shape. This allows much easier integration with various variants of resolution- and superposition-based inference systems. Additionally, my approach imposes no restrictions on how the generalisation functions are specified and implemented. In particular, the generalisation mechanism can be adaptive.
For example, the strength of generalisation may depend on various properties of the clauses being generalised, or even on some global properties of the current search state.

The general method is also partially inspired by, and shares some philosophical ideas with, [28] and [10]. The use of naming and folding is a natural continuation of our joint work with Andrei Voronkov on implementing splitting without backtracking [31], and also partially stems from an unfinished attempt by the author to mimic tableaux without backtracking [9] in the context of saturation. Recently I have discovered that [17] proposes to use exactly the same combination of naming and folding, under the name of decomposition rule, for deciding two description logics and query answering in one of them.

Semantic guidance in the style of SCOTT. To conclude the overview of relevant work, I would like to mention another approach which is technically unrelated to the one proposed here, but which also provides an alternative to local syntactic relevancy estimation. The semantic guidance approach, developed within the SCOTT project [15], is roughly as follows. The prover tries to establish satisfiability of several sets of stored clauses (in SCOTT this is done with the help of an external model builder). Ideally, these sets must approximate their maximal satisfiable supersets as closely as possible. The sets are used for guiding clause selection roughly as follows: clauses participating in fewer such satisfiable sets are given higher priority for selection. The intuition behind this approach is that a clause is more likely to be redundant if it participates in many satisfiable sets. This heuristic is supported by the fact that if a clause is in every maximal consistent subset, then it is definitely redundant.
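The SCOTT-style clause scoring just described can be sketched as follows. The external model builder is abstracted away as a given list of satisfiable subsets; the function name and data representation are hypothetical, not taken from [15].

```python
def scott_style_priority(clauses, satisfiable_sets):
    """Rank clauses for selection: a clause occurring in fewer of the
    maintained satisfiable sets comes first, i.e. gets higher priority."""
    counts = {c: sum(c in s for s in satisfiable_sets) for c in clauses}
    return sorted(clauses, key=lambda c: counts[c])

# Hypothetical example: 'c3' lies in no satisfiable set, so it is
# selected first; 'c1' lies in two, so it is deemed likely redundant.
clauses = ['c1', 'c2', 'c3']
sat_sets = [{'c1', 'c2'}, {'c1'}]
print(scott_style_priority(clauses, sat_sets))  # ['c3', 'c2', 'c1']
```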
The applicability of the semantic guidance approach seems limited because it relies on the costly operation of establishing satisfiability of large clause sets. This overhead may be acceptable in solving very hard problems, when the user can afford to run a prover for hours or even days. Many applications, however, require solving large numbers of simpler problems and a much quicker response. I hope that generalisation-based guidance can be more useful for this kind of application, because the associated overhead seems more manageable due to the flexibility of generalisation function choice. Anyway, a meaningful comparison of the two approaches can only be done experimentally, when at least one variant of the generalisation-based method is implemented.

4.4 Methodological considerations

Certain theoretical effort is required to formulate the method in full detail. It makes sense to consider a number of variants of the method and try to predict their strengths and weaknesses. It is also essential to have a clear picture of how the proposed use of generalisations will interact with the popular inference systems based on resolution, paramodulation and standard simplification techniques. In particular, it is necessary to consider the search completeness issues.

The effectiveness of the method is likely to depend strongly on the choice of generalisation functions and, therefore, a significant effort to find adequate heuristics would be well justified. In particular, anybody implementing the method is very likely to encounter the problem of overgeneralisation. Working with too strong generalisations of clauses may potentially lead to numerous γ-refutations that are not compatible with any of the covered clauses.
For example, if we fold the definition ∀x. γ(x) ⇔ p(x) into the clause p(f(a)), transforming it into γ(f(a)), we may later derive some γ-refutation ¬γ(b) which is incompatible with γ(f(a)). The work on deriving ¬γ(b) is potentially wasted, unless, of course, there are other clauses compatible with ¬γ(b). Another problem with overgeneralisation is that γ-refutations compatible with many clauses may be found quickly and will activate the corresponding clauses. In such cases the work spent on creating the generalisations themselves, and on their application to clauses, is wasted, because the generalisations do not fulfil their mission of suspending clauses. On the other hand, too weak generalisations may also be bad, e.g., because they cover too small sets of clauses, in which case their construction is not properly amortised. I hope these considerations illustrate the thesis about the importance of searching for heuristics for choosing effective generalisation functions.

In contrast with the fine inference selection scheme, which essentially requires creating a new implementation, the generalisation-based search guidance can be relatively easily integrated into some existing provers, especially if it is implemented with naming and folding as outlined earlier. My experience with implementing splitting-without-backtracking [31] (see also Chapter 5 in [30]) in the Vampire kernel suggests that only a moderate effort is required to implement naming and folding on the base of a reasonably manageable implementation of forward subsumption, which is a standard feature in advanced saturation-based provers.
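The compatibility test in this example amounts to checking unifiability of a clause's positive γ-literal with the γ-contradiction. A minimal sketch, under my own term encoding (nested tuples, uppercase variables); the occurs check is omitted for brevity:

```python
# Terms are nested tuples; variables are uppercase strings ('X', ...).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s=None):
    """Return a unifier of a and b as a dict, or None if none exists.
    No occurs check, which suffices for this illustration."""
    if s is None:
        s = {}
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None
```

On the overgeneralisation example, γ(f(a)) and γ(b) fail to unify, so the γ-refutation ¬γ(b) is incompatible with the folded clause.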
The most difficult task is likely to be the design and implementation of a flexible, yet manageable, mechanism for specifying generalisation functions, together with a higher-level interface for this mechanism which would enable productive use of heuristics. The reliance on heuristics also implies that very extensive experimentation will be required to assess the general effectiveness of the method and to compare its variants.

5 Acknowledgments

This paper is almost entirely based on my work on Vampire in the Computer Science Department at the University of Manchester. The work was supported by a grant from EPSRC. The first draft of this paper was also written in Manchester. I would like to thank Andrei Voronkov for useful discussions of the ideas presented here. Many thanks to Geoff Sutcliffe for his scribblings on the first draft of this paper.

References

1. W. Ahrendt, T. Baar, B. Beckert, M. Giese, E. Habermalz, R. Hähnle, W. Menzel, and P. H. Schmitt. The KeY Approach: Integrating Object Oriented Design and Formal Verification. In M. Ojeda-Aciego, I. P. de Guzmán, G. Brewka, and L. M. Pereira, editors, Proc. 8th European Workshop on Logics in AI (JELIA), Malaga, Spain, volume 1919 of LNAI, pages 21–36. Springer Verlag, October 2000.
2. L. Bachmair and H. Ganzinger. Resolution Theorem Proving. In A. Robinson and A. Voronkov, editors, Handbook of Automated Reasoning, volume I, chapter 2. Elsevier Science, 2001.
3. C. Benzmüller, L. Cheikhrouhou, D. Fehrer, A. Fiedler, X. Huang, M. Kerber, M. Kohlhase, K. Konrad, E. Melis, A. Meier, W. Schaarschmidt, J. Siekmann, and V. Sorge. Omega: Towards a Mathematical Assistant. In W. W. McCune, editor, Proceedings of the 14th International Conference on Automated Deduction (CADE-14), number 1249 in LNAI, pages 252–255, Townsville, Australia, 1997. Springer.
4. K. Claessen, R. Hähnle, and J. Mårtensson.
Verification of Hardware Systems with First-Order Logic. In G. Sutcliffe, J. Pelletier, and C. Suttner, editors, Proceedings of the CADE-18 Workshop - Problem and Problem Sets for ATP, number 02/10 in Department of Computer Science, University of Copenhagen, Technical Report, 2002.
5. D. Crocker. Making Formal Methods popular through Automated Verification. In R. Goré, A. Leitsch, and T. Nipkow, editors, IJCAR-2001 - Short Papers, number 11/01 in Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Siena, Technical Report, pages 21–24, 2001.
6. D. L. Detlefs, K. Rustan M. Leino, G. Nelson, and J. B. Saxe. Extended Static Checking. Research Report 159, COMPAQ Systems Research Center, December 1998.
7. M. Fuchs and D. Fuchs. Abstraction-Based Relevancy Testing for Model Elimination. In H. Ganzinger, editor, Proc. CADE-16, volume 1632 of LNCS, pages 344–358. Springer Verlag, 1999.
8. J.-M. Gaillourdet, Th. Hillenbrand, B. Löchner, and H. Spies. The new Waldmeister loop at work. In F. Baader, editor, Proc. CADE-19, volume 2741 of LNAI, pages 317–321. Springer-Verlag, 2003.
9. M. Giese. Incremental closure of free variable tableaux. In R. Goré, A. Leitsch, and T. Nipkow, editors, First International Joint Conference on Automated Reasoning, IJCAR 2001, volume 2083 of LNAI, pages 545–560. Springer Verlag, 2001.
10. F. Giunchiglia, A. Villafiorita, and T. Walsh. Theories of Abstractions. AI Communications, 10(3-4):167–176, 1997.
11. D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
12. P. Graf. Term Indexing, volume 1053 of LNCS. Springer Verlag, 1996.
13. Th. Hillenbrand and B. Löchner. The next Waldmeister loop. In A. Voronkov, editor, Proc. CADE-18, LNAI, pages 486–500. Springer-Verlag, 2002.
14. Th. Hillenbrand and B. Löchner. A Phytography of Waldmeister.
AI Communications, 15(2-3):127–133, 2002.
15. K. Hodgson and J. Slaney. TPTP, CASC and the Development of a Semantically Guided Theorem Prover. AI Communications, 15(2-3):135–146, 2002.
16. I. Horrocks and P. F. Patel-Schneider. Three Theses of Representation in the Semantic Web. In Proc. of the Twelfth International World Wide Web Conference (WWW 2003), pages 39–47, 2003.
17. U. Hustadt, B. Motik, and U. Sattler. A Decomposition Rule for Decision Procedures by Resolution-Based Calculi. In F. Baader and A. Voronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning: 11th International Conference, LPAR 2004, volume 3452 of LNCS, pages 21–35, Montevideo, Uruguay, March 2005.
18. D. B. Lenat. CYC: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM, 38(11):33–38, November 1995.
19. M. Lowry, A. Philpot, T. Pressburger, I. Underwood, R. Waldinger, and M. Stickel. Amphion: Automatic Programming for the NAIF Toolkit. NASA Science Information Systems Newsletter, 31:22–25, 1994.
20. E. L. Lusk. Controlling Redundancy in Large Search Spaces: Argonne-Style Theorem Proving Through the Years. In A. Voronkov, editor, Logic Programming and Automated Reasoning. International Conference LPAR'92, volume 624 of LNAI, pages 96–106, St. Petersburg, Russia, July 1992.
21. W. W. McCune. Solution of the Robbins Problem. Journal of Automated Reasoning, 19(3):263–276, 1997.
22. W. W. McCune. OTTER 3.0 Reference Manual and Guide. Technical Report ANL-94/6, Argonne National Laboratory, January 1994.
23. J. Meng. Experiments On Supporting Interactive Proof Using Resolution. In D. Basin and M. Rusinowitch, editors, IJCAR 2004: Second International Joint Conference on Automated Reasoning, volume 3097 of LNCS, pages 372–384. Springer Verlag, 2004.
24. M. Newborn. Automated Theorem Proving: Theory and Practice. Springer Verlag, 2001.
25. M.
Ne wborn and Z. W ang. Octopus: Combin ing Learning and Parallel Search. Journa l of Automa t ed Reason ing , 33:171–2 18, 2005. 26. R. Nieuwenhu i s and A. Rub io. Paramodulation-Based Theo r em Prov ing. In A. Ro binson and A. V oronk ov , editors, Handbook of Automated Reason i ng , volume I, chap ter 7. Elsevier Science, 2001. 27. R. Overbeek . A Ne w Class of Automated T heorem-Proving Algorithms. J ournal of the ACM , 21:191–200, 1974. 28. D. Plaisted. Theorem proving with abstraction. Arti ficial Intelligen ce , 16(1):227–26 1, 1981. 29. D. Plaisted. Abstractions using Generalisation Functions. In J. H. Siekmann, editor , Pr oc. CADE-8 , volume 230 of LNCS , pages 365–376 , 1986. 30. A. Riazanov . Implementing an Efficient Theor em Pr over . P hD Thesis, The Univ ersity of Manchester , Manchester , July 2003. 31. A. Riazano v and A. V oronk ov . Splitti ng w ithout Backtrackin g. In B. Ne bel, editor , Pr oc. IJCAI’01 , v olume 1, page s 611–617 , 2001. 32. A. Riazanov and A. V oronk ov . The Design and Implementation of V ampire. AI Communications , 15(2-3):91–11 0, 2002. 33. A. Riaza nov and A. V oronk ov . Limi ted Resource Strategy i n Resolution T heorem Proving. J ournal o f Symbolic Computations , 36(1-2):101– 115, 2003. 34. S. Schulz. E - a Brainiac Theorem Pr ov er . AI Communications , 15(2-3):111– 126, 2002. 35. R. S ekar , I.V . Ramakrishnan, and A. V oronk ov . T erm Indexing. In A. Robinson and A. V oronko v , editors, Handboo k of Automated Reason i ng , volume 2, pages 1853– 1964. Elsevier Science and MIT Press, 200 1. 36. M. Stickel. Homepag e of the SNARK system: http://www .at.sri.com/ ∼ stickel/snark.html . SRI, 2005 . 37. G. Sutcliffe a nd C. Suttner . The TP TP Problem Library . TPTP v . 2.4.1. T echnical report, Univ ersity of Miami, 2001. 38. T . T ammet. Gandalf. J ournal of Automated Reason i ng , 18(2):199–204 , 1997. 39. C. W eidenbach. Combining Superposition, S orts and Splitting. In A. Robinson and A. 
V oronko v , editors, Handboo k of Automated Reason i ng , volume II, chap ter 27, pages 1965–2014. Elsevier Science, 2001. 17