Simulating reachability using first-order logic with applications to verification of linked data structures

Logical Methods in Computer Science, Vol. 5 (2:12) 2009, pp. 1–30, www.lmcs-online.org
Submitted Apr. 2, 2006; published May 28, 2009

SIMULATING REACHABILITY USING FIRST-ORDER LOGIC WITH APPLICATIONS TO VERIFICATION OF LINKED DATA STRUCTURES ∗

TAL LEV-AMI (a), NEIL IMMERMAN (b), THOMAS W. REPS (c), MOOLY SAGIV (d), SIDDHARTH SRIVASTAVA (e), AND GRETA YORSH (f)

(a,d,f) School of Computer Science, Tel Aviv University
e-mail address: tal.levami@cs.tau.ac.il, {msagiv, gretay}@post.tau.ac.il
(b,e) Department of Computer Science, University of Massachusetts, Amherst
e-mail address: {immerman,siddharth}@cs.umass.edu
(c) Computer Science Department, University of Wisconsin, Madison
e-mail address: reps@cs.wisc.edu

ABSTRACT. This paper shows how to harness existing theorem provers for first-order logic to automatically verify safety properties of imperative programs that perform dynamic storage allocation and destructive updating of pointer-valued structure fields. One of the main obstacles is specifying and proving the (absence of) reachability properties among dynamically allocated cells. The main technical contributions are methods for simulating reachability in a conservative way using first-order formulas: the formulas describe a superset of the set of program states that would be specified if one had a precise way to express reachability. These methods are employed for semi-automatic program verification (i.e., using programmer-supplied loop invariants) on programs such as mark-and-sweep garbage collection and destructive reversal of a singly linked list. (The mark-and-sweep example has been previously reported as being beyond the capabilities of ESC/Java.)

1. INTRODUCTION

This paper explores how to harness existing theorem provers for first-order logic to prove reachability properties of programs that manipulate dynamically allocated data structures.
The approach that we use involves simulating reachability in a conservative way using first-order formulas, i.e., the formulas describe a superset of the set of program states that would be specified if one had an accurate way to express reachability.

1998 ACM Subject Classification: F.3.1, F.4.1, F.3.2.
Key words and phrases: First Order Logic, Transitive Closure, Approximation, Program Verification, Program Analysis.
∗ A preliminary version of this paper appeared in Automated Deduction - CADE-20, 20th International Conference on Automated Deduction, Tallinn, Estonia, July 22-27, 2005.
(a) This research was supported by an Adams Fellowship through the Israel Academy of Sciences and Humanities.
(b,e) Supported by NSF grants CCF-0514621, 0541018, 0830174.
(c) Supported by ONR under contracts N00014-01-1-{0796,0708}.
(f) Partially supported by the Israeli Academy of Science.
LOGICAL METHODS IN COMPUTER SCIENCE, DOI: 10.2168/LMCS-5(2:12)2009. © T. Lev-Ami, N. Immerman, T. Reps, M. Sagiv, S. Srivastava, and G. Yorsh. Creative Commons.

Automatically establishing safety and liveness properties of sequential and concurrent programs that permit dynamic storage allocation and low-level pointer manipulations is challenging. Dynamic allocation causes the state space to be infinite; moreover, a program is permitted to mutate a data structure by destructively updating pointer-valued fields of nodes. These features remain even if a programming language has good capabilities for data abstraction. Abstract-datatype operations are implemented using loops, procedure calls, and sequences of low-level pointer manipulations; consequently, it is hard to prove that a data-structure invariant is reestablished once a sequence of operations is finished [Hoa75].
In languages such as Java, concurrency poses yet another challenge: establishing the absence of deadlock requires establishing the absence of any cycle of threads that are waiting for locks held by other threads.

Reachability is crucial for reasoning about linked data structures. For instance, to establish that a memory configuration contains no garbage elements, we must show that every element is reachable from some program variable. Other cases where reachability is a useful notion include
• Specifying acyclicity of data-structure fragments, i.e., from every element reachable from node $n$, one cannot reach $n$
• Specifying the effect of procedure calls when references are passed as arguments: only elements that are reachable from a formal parameter can be modified
• Specifying the absence of deadlocks
• Specifying safety conditions that allow establishing that a data-structure traversal terminates, e.g., there is a path from a node to a sink-node of the data structure.

The verification of such properties presents a challenge. Even simple decidable fragments of first-order logic become undecidable when reachability is added [GME99, IRR+04a]. Moreover, the utility of monadic second-order logic on trees is rather limited because (i) many programs allow non-tree data structures, (ii) expressing the postcondition of a procedure (which is essential for modular reasoning) usually requires referring to the pre-state that holds before the procedure executes, and thus cannot, in general, be expressed in monadic second-order logic on trees, even for procedures that manipulate only singly-linked lists, such as the in-situ list-reversal program shown in Fig. 6, and (iii) the complexity is prohibitive.
While our work was actually motivated by our experience using abstract interpretation (and, in particular, the TVLA system [LAS00, SRW02, RSW04]) to establish properties of programs that manipulate heap-allocated data structures, in this paper we consider the problem of verifying data-structure operations, assuming that we have user-supplied loop invariants. This is similar to the approach taken in systems like ESC/Java [FLL+02] and Pale [MS01].

The contributions of the paper can be summarized as follows:

Handling FO(TC) formulas using FO theorem provers. We want to use first-order theorem provers and we need to discuss the transitive closure of certain binary predicates, $f$. However, first-order theorem provers cannot handle transitive closure. We solve this conundrum by adding a new relation symbol $f_{tc}$ for each such $f$, together with first-order axioms that assure that $f_{tc}$ is interpreted correctly. The theoretical details of how this is done are presented in Section 3. The fact that we are able to handle transitive closure effectively and reasonably automatically is quite surprising.

As explained in Section 3, the axioms that we add to control the behavior of the added predicates, $f_{tc}$, must be sound but not necessarily complete. One way to think about this is that we are simulating a formula, $\chi$, in which transitive closure occurs, with a pure first-order formula $\chi'$. If our axioms are not complete then we are allowing $\chi'$ to denote more stores than $\chi$ does. The study of methods that are sound but potentially incomplete is motivated by the fact that abstraction [CC77] can be an aid in the verification of many properties. In terms of logic, abstraction corresponds to using formulas that describe a superset of the set of program states that can actually arise.
A definite answer about whether a property always holds can sometimes be obtained even when information has been lost because of abstraction. If $\chi'$ is proven valid in FO then $\chi$ is also valid in FO(TC); however, if we fail to prove that $\chi'$ is valid, it is still possible that $\chi$ is valid: the failure would be due to the incompleteness of the axioms, or the lack of time or space for the theorem prover to complete the proof.

As we will see in Section 3, it is easy to write a sound axiom, $T_1[f]$, that is "complete" in the very limited sense that every finite, acyclic model satisfying $T_1[f]$ must interpret $f_{tc}$ as the reflexive, transitive closure of its interpretation of $f$. However, in practice this is not worth much because, as is well-known, finiteness is not expressible in first-order logic. Thus, the properties that we want to prove do not follow from $T_1[f]$. We do prove that $T_1[f]$ is complete for positive transitive-closure properties (Proposition 3.2). The real difficulty lies in proving properties involving the negation of $f_{tc}$, i.e., that a certain $f$-path does not exist.

Induction axiom scheme. To solve the above problem, we add an induction axiom scheme. Although in general, there is no complete, recursively-enumerable axiomatization of transitive closure (Proposition 4.1), we have found, on the practical side, that on the examples we have tried, $T_1$ plus induction allows us to automatically prove all of our desired properties. On the theoretical side, we prove that our axiomatization is complete for word models (Theorem 4.8).

We think of the axioms that we use as aides for the first-order theorem prover that we employ (SPASS [WGR96]) to prove the properties in question.
Rather than giving SPASS many instances of the induction scheme, our experience is that it finds the proof faster if we give it several axioms that are simpler to use than induction. As already mentioned, the hard part is to show that certain paths do not exist.

Coloring axiom schemes. In particular, we use three axiom schemes, having to do with partitioning memory into a small set of colors. We call instances of these schemes "coloring axioms". Our coloring axioms are simple, and are easily proved using SPASS (in under ten seconds) from the induction axioms. For example, the first coloring axiom scheme, $\mathrm{NoExit}[A, f]$, says that if no $f$-edges leave color class $A$, then no $f$-paths leave $A$. It turns out that the NoExit axiom scheme implies (and thus is equivalent to) the induction scheme. However, we have found in practice that explicitly adding other coloring axioms (which are consequences of NoExit) enables SPASS to prove properties that it otherwise fails at.

We first assume that the programmer provides the colors by means of first-order formulas with transitive closure. Our initial experience indicates that the generated coloring axioms are useful to SPASS. In particular, they make it possible to verify programs like the mark phase of a mark-and-sweep garbage collector. This example has been previously reported as being beyond the capabilities of ESC/Java. TVLA also succeeds on this example; however, our new approach provides verification methods that can in some instances be more precise than TVLA.

Prototype implementation. Perhaps most exciting, we have implemented the heuristics for selecting colors and their corresponding axioms in a prototype using SPASS. We have used this to automatically choose useful color axioms and then verify a series of small heap-manipulating programs.
We believe that the detailed examples presented here give convincing evidence of the promise of our methodology. Of course much further study is needed.

Strengthening Nelson's results. Greg Nelson considered a set of axiom schemes for reasoning about reachability in function graphs, i.e., graphs in which there is at most one $f$-edge leaving any node [Nel83]. He left open the question of whether his axiom schemes were complete for function graphs. We show that Nelson's axioms are provable from $T_1$ plus our induction axioms. We also show that Nelson's axioms are not complete: in fact, they do not imply NoExit.

Outline. The remainder of the paper is organized as follows: Section 2 explains our notation and the setting; Section 3 fills in our formal framework, introduces the induction axiom scheme, and presents the coloring axiom schemes; Section 4 provides more detail about TC-completeness, including a description of Nelson's axioms, a proof that they are not TC-complete for the functional case, and a proof that our axiomatization is TC-complete for words; Section 5 presents our heuristics, including the details of their successful use on a variety of examples; Section 6 describes the applicability of our methodology, relating it to the reasoning done in the TVLA system; Section 7 describes some related work; and Section 8 describes some conclusions and future directions.

2. PRELIMINARIES

This section defines the basic notations used in this paper and the setting.

2.1. Notation.

Syntax: A relational vocabulary $\tau = \{p_1, p_2, \ldots, p_k\}$ is a set of relation symbols, each of fixed arity. We use the letters $u$, $v$, and $w$ (possibly with numeric subscript) for first-order variables.
We write first-order formulas over $\tau$ with quantifiers $\forall$ and $\exists$, logical connectives $\land$, $\lor$, $\to$, $\leftrightarrow$, and $\lnot$, where atomic formulas include: equality, $p_i(v_1, v_2, \ldots, v_{a_i})$, and $\mathrm{TC}[f](v_1, v_2)$, where $p_i \in \tau$ is of arity $a_i$ and $f \in \tau$ is binary. Here $\mathrm{TC}[f](v_1, v_2)$ denotes the existence of a finite path of 0 or more $f$ edges from $v_1$ to $v_2$. A formula without TC is called a first-order formula.

We use the following precedence of logical operators: $\lnot$ has highest precedence, followed by $\land$ and $\lor$, followed by $\to$ and $\leftrightarrow$, and $\forall$ and $\exists$ have lowest precedence.

Semantics: A model, $\mathcal{A}$, of vocabulary $\tau$, consists of a non-empty universe, $|\mathcal{A}|$, and a relation $p^{\mathcal{A}}$ over the universe interpreting each relation symbol $p \in \tau$. We write $\mathcal{A} \models \varphi$ to mean that the formula $\varphi$ is true in the model $\mathcal{A}$. For $\Sigma$ a set of formulas, we write $\Sigma \models \varphi$ ($\Sigma$ semantically implies $\varphi$) to mean that all models of $\Sigma$ satisfy $\varphi$.

2.2. Setting.

We are primarily interested in formulas that arise while proving the correctness of programs. We assume that the programmer specifies pre- and post-conditions for procedures and loop invariants using first-order formulas with transitive closure on binary relations. The transformer for a loop body can be produced automatically from the program code.

For instance, to establish the partial correctness with respect to a user-supplied specification of a program that contains a single loop, we need to establish three properties: First, the loop invariant must hold at the beginning of the first iteration; i.e., we must show that the loop invariant follows from the precondition and the code leading to the loop. Second, the loop invariant provided by the user must be maintained; i.e., we must show that if the loop invariant holds at the beginning of an iteration and the loop condition also holds, the transformer causes the loop invariant to hold at the end of the iteration.
Finally, the postcondition must follow from the loop invariant and the condition for exiting the loop.

In general, these formulas are of the form
\[ \psi_1[\tau] \land Tr[\tau, \tau'] \to \psi_2[\tau'] \]
where $\tau$ is the vocabulary of the before state, $\tau'$ is the vocabulary of the after state (footnote 1), and $Tr$ is the transformer, which may use both the before and after predicates to describe the meaning of the module to be executed. If symbol $f$ denotes the value of a predicate before the operation, then $f'$ denotes the value of the same predicate after the operation.

An interesting special case is the proof of the maintenance formula of a loop invariant. This has the form:
\[ LC[\tau] \land LI[\tau] \land Tr[\tau, \tau'] \to LI[\tau'] \]
Here $LC$ is the condition for entering the loop and $LI$ is the loop invariant. $LI[\tau']$ indicates that the loop invariant remains true after the body of the loop is executed.

The challenge is that the formulas of interest contain transitive closure; thus, the validity of these formulas cannot be directly proven using a theorem prover for first-order logic.

3. AXIOMATIZATION OF TRANSITIVE CLOSURE

The original formula that we want to prove, $\chi$, contains transitive closure, which first-order theorem provers cannot handle. To address this problem, we replace $\chi$ by a new formula, $\chi'$, where all appearances of $\mathrm{TC}[f]$ have been replaced by the new binary relation symbol, $f_{tc}$.

We show in this paper that from $\chi'$, we can often automatically generate an appropriate first-order axiom, $\sigma$, with the following two properties:
(1) If $\sigma \to \chi'$ is valid in FO, then $\chi$ is valid in FO(TC).
(2) A theorem prover successfully proves that $\sigma \to \chi'$ is valid in FO.

We now explain the theory behind this process.
A TC model, $\mathcal{A}$, is a model such that if $f$ and $f_{tc}$ are in the vocabulary of $\mathcal{A}$, then $(f_{tc})^{\mathcal{A}} = (f^{\mathcal{A}})^{\star}$; i.e., $\mathcal{A}$ interprets $f_{tc}$ as the reflexive, transitive closure of its interpretation of $f$.

A first-order formula $\varphi$ is TC valid iff it is true in all TC models. We say that an axiomatization, $\Sigma$, is TC sound if every formula that follows from $\Sigma$ is TC valid. Since first-order reasoning is sound, $\Sigma$ is TC sound iff every $\sigma \in \Sigma$ is TC valid.

We say that $\Sigma$ is TC complete if for every TC-valid $\varphi$, $\Sigma \models \varphi$. If $\Sigma$ is TC complete and TC sound, then for all first-order $\varphi$,
\[ \Sigma \models \varphi \iff \varphi \text{ is TC valid} \]
Thus a TC-complete set of axioms proves exactly the first-order formulas, $\chi'$, such that the corresponding FO(TC) formula, $\chi$, is valid.

All the axioms that we consider are TC valid. There is no recursively enumerable TC-complete axiom system (Proposition 4.1). However, the axiomatization that we give does allow SPASS to prove all the desired properties on the examples that we have tried. We do prove that our axiomatization is TC complete for word models (Theorem 4.8).

Footnote 1. In some cases it is useful for the postcondition formula to refer to the original vocabulary as well. This way the postcondition can summarize some of the behavior of the transformer, e.g., summarize the behavior of an entire procedure.

3.1. Some TC-Sound Axioms.

We begin with our first TC axiom scheme. For any binary relation symbol, $f$, let
\[ T_1[f] \equiv \forall u, v \,.\; f_{tc}(u, v) \leftrightarrow (u = v) \lor \exists w \,.\; f(u, w) \land f_{tc}(w, v) \]
We first observe that $T_1[f]$ is "complete" in a very limited way for finite, acyclic graphs, i.e., $T_1[f]$ exactly characterizes the meaning of $f_{tc}$ for all finite, acyclic graphs.
The reason that we say this is limited is that it does not give us a complete set of first-order axioms: as is well known, there is no first-order axiomatization of "finite".

Proposition 3.1. Any finite and acyclic model of $T_1[f]$ is a TC model.

Proof. Let $\mathcal{A} \models T_1[f]$ where $\mathcal{A}$ is finite and acyclic. Let $a_0, b \in |\mathcal{A}|$. Assume that there is an $f$-path from $a_0$ to $b$. Since $\mathcal{A} \models T_1[f]$, it is easy to see that $\mathcal{A} \models f_{tc}(a_0, b)$. Conversely, suppose that $\mathcal{A} \models f_{tc}(a_0, b)$. If $a_0 = b$, then there is a path of length 0 from $a_0$ to $b$. Otherwise, by $T_1[f]$, there exists an $a_1 \in |\mathcal{A}|$ such that $\mathcal{A} \models f(a_0, a_1) \land f_{tc}(a_1, b)$. Note that $a_1 \neq a_0$ since $\mathcal{A}$ is acyclic. If $a_1 = b$ then there is an $f$-path of length 1 from $a_0$ to $b$. Otherwise there must exist an $a_2 \in |\mathcal{A}|$ such that $\mathcal{A} \models f(a_1, a_2) \land f_{tc}(a_2, b)$, and so on, generating a set $\{a_1, a_2, \ldots\}$. None of the $a_i$ can be equal to $a_j$, for $j < i$, by acyclicity. Thus, by finiteness, some $a_i = b$. Hence $\mathcal{A}$ is a TC model.

Let $T'_1[f]$ be the $\leftarrow$ direction of $T_1[f]$:
\[ T'_1[f] \equiv \forall u, v \,.\; f_{tc}(u, v) \leftarrow (u = v) \lor \exists w \,.\; f(u, w) \land f_{tc}(w, v) \]

Proposition 3.2. Let $f_{tc}$ occur only positively in $\varphi$. If $\varphi$ is TC valid, then $T'_1[f] \models \varphi$.

Proof. Suppose that $T'_1[f] \not\models \varphi$. Let $\mathcal{A} \models T'_1[f] \land \lnot\varphi$. Note that $f_{tc}$ occurs only negatively in $\lnot\varphi$. Furthermore, since $\mathcal{A} \models T'_1[f]$, it is easy to show by induction on the length of the path that if there is an $f$-path from $a$ to $b$ in $\mathcal{A}$, then $\mathcal{A} \models f_{tc}(a, b)$. Define $\mathcal{A}'$ to be the model formed from $\mathcal{A}$ by interpreting $f_{tc}$ in $\mathcal{A}'$ as $(f^{\mathcal{A}})^{\star}$. Thus $\mathcal{A}'$ is a TC model and it only differs from $\mathcal{A}$ by the fact that we have removed zero or more pairs from $(f_{tc})^{\mathcal{A}}$ to form $(f_{tc})^{\mathcal{A}'}$. Because $\mathcal{A} \models \lnot\varphi$ and $f_{tc}$ occurs only negatively in $\lnot\varphi$, it follows that $\mathcal{A}' \models \lnot\varphi$, which contradicts the assumption that $\varphi$ is TC valid.
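Proposition 3.1 can be sanity-checked by brute force on very small universes. The following Python sketch is an illustration only, not part of the paper's toolchain (which uses SPASS); the function names `rt_closure`, `satisfies_t1`, `is_acyclic`, `subsets`, and `check_prop_3_1` are hypothetical names introduced here. It enumerates every acyclic edge relation $f$ on a universe of size $n$ and confirms that the only interpretation of $f_{tc}$ satisfying $T_1[f]$ is the reflexive, transitive closure of $f$.

```python
from itertools import product

def rt_closure(n, f):
    """Reflexive, transitive closure of edge set f over universe range(n)."""
    reach = {(u, u) for u in range(n)} | set(f)
    changed = True
    while changed:
        changed = False
        for (u, w) in list(reach):
            for (w2, v) in list(reach):
                if w == w2 and (u, v) not in reach:
                    reach.add((u, v))
                    changed = True
    return reach

def satisfies_t1(n, f, ftc):
    """T1[f]: ftc(u,v) <-> (u = v) or exists w. f(u,w) and ftc(w,v)."""
    for u, v in product(range(n), repeat=2):
        rhs = (u == v) or any((u, w) in f and (w, v) in ftc for w in range(n))
        if ((u, v) in ftc) != rhs:
            return False
    return True

def is_acyclic(n, f):
    """True iff no node reaches itself by a path of one or more f-edges."""
    star = rt_closure(n, f)
    return not any((v, u) in star for (u, v) in f)

def subsets(items):
    """Yield every subset of the list `items` as a set."""
    for mask in range(1 << len(items)):
        yield {x for i, x in enumerate(items) if (mask >> i) & 1}

def check_prop_3_1(n):
    """For every acyclic f on range(n), T1[f] admits exactly one ftc: the closure."""
    pairs = list(product(range(n), repeat=2))
    for f in subsets(pairs):
        if not is_acyclic(n, f):
            continue
        models = [ftc for ftc in subsets(pairs) if satisfies_t1(n, f, ftc)]
        if models != [rt_closure(n, f)]:
            return False
    return True
```

Acyclicity matters here: with a self-loop `f = {(0, 0)}` on a two-element universe, the relation `{(0, 0), (1, 1), (0, 1)}` also satisfies `satisfies_t1` even though it is not the closure of `f`, which is consistent with the proposition being stated only for acyclic models.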
Proposition 3.2 shows that proving positive facts of the form $f_{tc}(u, v)$ is easy; it is the task of proving that paths do not exist that is more subtle.

Proposition 3.1 shows that what we are missing, at least in the acyclic case, is that there is no first-order axiomatization of finiteness. Traditionally, when reasoning about the natural numbers, this problem is mitigated by adding induction axioms. We next introduce an induction scheme that, together with $T_1$, seems to be sufficient to prove any property we need concerning TC.

Notation: In general, we will use $F$ to denote the set of all binary relation symbols, $f$, such that $\mathrm{TC}[f]$ occurs in a formula we are considering. If $\varphi[f]$ is a formula in which $f$ occurs, let $\varphi[F] = \bigwedge_{f \in F} \varphi[f]$. Thus, for example, $T_1[F]$ is the conjunction of the axiom $T_1[f]$ for all binary relation symbols, $f$, under consideration.

Definition 3.3. For any first-order formulas $Z(u)$, $P(u)$, and binary relation symbol, $f$, let the induction principle, $\mathrm{IND}[Z, P, f]$, be the following first-order formula:
\[ (\forall w \,.\; Z(w) \to P(w)) \land (\forall u, v \,.\; P(u) \land f(u, v) \to P(v)) \to \forall u, w \,.\; Z(w) \land f_{tc}(w, u) \to P(u) \]

In order to explain the meaning of IND and other axioms it is important to remember that we are trying to write axioms, $\Sigma$, that are
• TC valid, i.e., true in all TC models, and
• useful, i.e., all models of $\Sigma$ are sufficiently like TC models that they satisfy the TC-valid properties we want to prove.

To make the meaning of our axioms intuitively clear, in this section we will say, for example, that "$y$ is $f_{tc}$-reachable from $x$" to mean that $f_{tc}(x, y)$ holds. Later, we will assume that the reader has the idea and just say "reachable" instead of "$f_{tc}$-reachable".
The intuitive meaning of the induction principle is that if every zero point satisfies $P$, and $P$ is preserved when following $f$-edges, then every point $f_{tc}$-reachable from a zero point satisfies $P$. Obviously this principle is TC valid, i.e., it is true for all structures such that $f_{tc} = f^{\star}$.

As an easy application of the induction principle, consider the following cousin of $T_1[f]$,
\[ T_2[f] \equiv \forall u, v \,.\; f_{tc}(u, v) \leftrightarrow (u = v) \lor \exists w \,.\; f_{tc}(u, w) \land f(w, v) \]
The difference between $T_1$ and $T_2$ is that $T_1$ requires that each path represented by $f_{tc}$ starts with an $f$ edge and $T_2$ requires the path to end with an $f$ edge. It is easy to see that neither of $T_1[f]$, $T_2[f]$ implies the other. However, in the presence of the induction principle they do imply each other. For example, it is easy to prove $T_2[f]$ from $T_1[f]$ using $\mathrm{IND}[Z, P, f]$ where $Z(v) \equiv v = u$ and $P(v) \equiv u = v \lor \exists w \,.\; f_{tc}(u, w) \land f(w, v)$. Here, for each $u$ we use $\mathrm{IND}[Z, P, f]$ to prove by induction that every $v$ reachable from $u$ satisfies the right-hand side of $T_2[f]$.

Another useful axiom scheme provable from $T_1$ plus IND is the transitivity of reachability:
\[ \mathrm{Trans}[f] \equiv \forall u, v, w \,.\; f_{tc}(u, w) \land f_{tc}(w, v) \to f_{tc}(u, v) \]

3.2. Coloring Axioms.

We next describe three TC-sound axiom schemes that are not implied by $T_1[F] \land T_2[F]$, and are provable from the induction principle. We will see in the sequel that these coloring axioms are very useful in proving that paths do not exist, permitting us to verify a variety of algorithms. In Section 5, we will present some heuristics for automatically choosing particular instances of the coloring axiom schemes that enable us to prove our goal formulas.

The first coloring axiom scheme is the NoExit axiom scheme:
\[ (\forall u, v \,.\; A(u) \land \lnot A(v) \to \lnot f(u, v)) \to \forall u, v \,.\; A(u) \land \lnot A(v) \to \lnot f_{tc}(u, v) \]
For any first-order formula $A(u)$, and binary relation symbol, $f$, $\mathrm{NoExit}[A, f]$ says that if no $f$-edge leaves color class $A$, then no point outside of $A$ is $f_{tc}$-reachable from $A$.

Observe that although it is very simple, $\mathrm{NoExit}[A, f]$ does not follow from $T_1[f] \land T_2[f]$. Let $G_1 = (V, f, f_{tc}, A)$ be a model consisting of two disjoint cycles: $V = \{1, 2, 3, 4\}$, $f = \{\langle 1, 2\rangle, \langle 2, 1\rangle, \langle 3, 4\rangle, \langle 4, 3\rangle\}$, and $A = \{1, 2\}$. Let $f_{tc}$ have all 16 possible pairs. Thus $G_1$ satisfies $T_1[f] \land T_2[f]$ but violates $\mathrm{NoExit}[A, f]$. Even for acyclic models, $\mathrm{NoExit}[A, f]$ does not follow from $T_1[f] \land T_2[f]$ because there are infinite models in which the implication does not hold (Proposition 4.7).

$\mathrm{NoExit}[A, f]$ follows easily from the induction principle: if no $f$-edges leave $A$, then induction tells us that everything $f_{tc}$-reachable from a point in $A$ satisfies $A$. Similarly, $\mathrm{NoExit}[A, f]$ implies the induction axiom, $\mathrm{IND}[Z, A, f]$, for any formula $Z$.

The second coloring axiom scheme is the GoOut axiom: for any first-order formulas $A(u)$, $B(u)$, and binary relation symbol, $f$, $\mathrm{GoOut}[A, B, f]$ says that if the only $f$-edges leaving color class $A$ are to $B$, then any $f_{tc}$-path from a point in $A$ to a point not in $A$ must pass through $B$.
\[ (\forall u, v \,.\; A(u) \land \lnot A(v) \land f(u, v) \to B(v)) \to \forall u, v \,.\; A(u) \land \lnot A(v) \land f_{tc}(u, v) \to \exists w \,.\; B(w) \land f_{tc}(u, w) \land f_{tc}(w, v) \]
To see that $\mathrm{GoOut}[A, B, f]$ follows from the induction principle, assume that the only $f$-edges out of $A$ enter $B$. For any fixed $u$ in $A$, we prove by induction that any point $v$ that is $f_{tc}$-reachable from $u$ is either in $A$ or has a predecessor, $b$ in $B$, that is $f_{tc}$-reachable from $u$.
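The counterexample $G_1$ used above to separate NoExit from $T_1[f] \land T_2[f]$ is small enough to check mechanically. The Python sketch below is an illustration only; the helper names `holds_t1`, `holds_t2`, and `holds_no_exit` are hypothetical and simply evaluate the three formulas over an explicit finite model. It also evaluates the same formulas on the intended interpretation of $f_{tc}$, the true reflexive, transitive closure, for comparison.

```python
from itertools import product

def holds_t1(V, f, f_tc):
    """T1[f]: f_tc(u,v) <-> (u = v) or exists w. f(u,w) and f_tc(w,v)."""
    return all(
        ((u, v) in f_tc)
        == (u == v or any((u, w) in f and (w, v) in f_tc for w in V))
        for u, v in product(V, repeat=2)
    )

def holds_t2(V, f, f_tc):
    """T2[f]: f_tc(u,v) <-> (u = v) or exists w. f_tc(u,w) and f(w,v)."""
    return all(
        ((u, v) in f_tc)
        == (u == v or any((u, w) in f_tc and (w, v) in f for w in V))
        for u, v in product(V, repeat=2)
    )

def holds_no_exit(V, A, f, f_tc):
    """NoExit[A,f]: if no f-edge leaves A, then no f_tc-pair leaves A."""
    leaving = [(u, v) for u, v in product(V, repeat=2) if u in A and v not in A]
    premise = all((u, v) not in f for u, v in leaving)
    conclusion = all((u, v) not in f_tc for u, v in leaving)
    return (not premise) or conclusion

# The model G1 from the text: two disjoint 2-cycles, color class A = {1, 2}.
V = [1, 2, 3, 4]
f = {(1, 2), (2, 1), (3, 4), (4, 3)}
A = {1, 2}

# Non-standard interpretation from the text: f_tc contains all 16 pairs.
f_tc_all = set(product(V, repeat=2))

# Intended interpretation: u reaches v exactly when both lie on the same cycle.
f_tc_true = {(u, v) for u, v in product(V, repeat=2) if (u in A) == (v in A)}

print(holds_t1(V, f, f_tc_all), holds_t2(V, f, f_tc_all),
      holds_no_exit(V, A, f, f_tc_all))
print(holds_t1(V, f, f_tc_true), holds_t2(V, f, f_tc_true),
      holds_no_exit(V, A, f, f_tc_true))
```

With the all-pairs interpretation, $T_1$ and $T_2$ hold because every node has both an outgoing and an incoming $f$-edge, while NoExit fails on pairs such as $\langle 1, 3\rangle$; with the true closure all three formulas hold.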
The third coloring axiom scheme is the NewStart axiom, which is useful in the context of dynamically changing graphs: for any first-order formula $A(u)$, and binary relation symbols $f$ and $g$, think of $f$ as the previous edge relation and $g$ as the current edge relation. $\mathrm{NewStart}[A, f, g]$ says that if there are no new edges between $A$ nodes, then any new path, i.e., $g_{tc}$ but not $f_{tc}$, from $A$ must leave $A$ to make its change:
\[ (\forall u, v \,.\; A(u) \land A(v) \land g(u, v) \to f(u, v)) \to \forall u, v \,.\; g_{tc}(u, v) \land \lnot f_{tc}(u, v) \to \exists w \,.\; \lnot A(w) \land g_{tc}(u, w) \land g_{tc}(w, v) \]
$\mathrm{NewStart}[A, f, g]$ follows from the induction principle by a proof that is similar to the proof of $\mathrm{GoOut}[A, B, f]$.

3.2.1. Linked Lists.

The spirit behind our consideration of the coloring axioms is similar to that found in a paper of Greg Nelson's in which he introduced a set of reachability axioms for a functional predicate, $f$, i.e., there is at most one $f$ edge leaving any point [Nel83]. Nelson asked whether his axiom schemes are complete for the functional setting. We remark that Nelson's axiom schemes are provable from $T_1$ plus our induction principle. However, Nelson's axiom schemes are not complete: we constructed a functional graph that satisfies Nelson's axioms but violates $\mathrm{NoExit}[A, f]$ (Proposition 4.7).

At least one of Nelson's axiom schemes seems orthogonal to our coloring axioms and may be useful in certain proofs. Nelson's fifth axiom scheme states that the points reachable from a given point are linearly ordered. The soundness of the axiom scheme is due to the fact that $f$ is functional. We make use of a simplified version of Nelson's ordering axiom scheme: Let $\mathrm{Func}[f] \equiv \forall u, v, w \,.\; f(u, v) \land f(u, w) \to v = w$; then,
\[ \mathrm{Order}[f] \equiv \mathrm{Func}[f] \to \forall u, v, w \,.\; f_{tc}(u, v) \land f_{tc}(u, w) \to f_{tc}(v, w) \lor f_{tc}(w, v) \]

3.2.2. Trees.
When working with programs manipulating trees, we have a fixed set of selectors $Sel$, and transitive closure is performed on the $down$ relation, defined as
\[ \forall v_1, v_2 \,.\; down(v_1, v_2) \leftrightarrow \bigvee_{s \in Sel} s(v_1, v_2) \]
Trees have no sharing (i.e., the $down$ relation is injective); thus a similar axiom to $\mathrm{Order}[f]$ is used:
\[ \forall u, v, w \,.\; down_{tc}(v, u) \land down_{tc}(w, u) \to down_{tc}(v, w) \lor down_{tc}(w, v) \]
Another important property of trees is that the subtrees below distinct children of a node are disjoint. We use the following axioms to capture this, where $s_1 \neq s_2 \in Sel$:
\[ \forall v, v_1, v_2, w \,.\; \lnot\bigl(s_1(v, v_1) \land s_2(v, v_2) \land down_{tc}(v_1, w) \land down_{tc}(v_2, w)\bigr) \]

4. ON TC-COMPLETENESS

In this section we consider the concept of TC-completeness in detail. The reader anxious to see how we use our methodology is encouraged to skim or skip this section.

We first show that there is no recursively enumerable TC-complete set of axioms.

Proposition 4.1. Let $\Gamma$ be an r.e. set of TC-valid first-order sentences. Then $\Gamma$ is not TC-complete.

Proof. By the proof of Corollary 9, page 11 of [IRR+04a], there is a recursive procedure that, given any Turing machine $M_n$ as input, produces a first-order formula $\varphi_n$ in a vocabulary $\tau_n$ such that $\varphi_n$ is TC-valid iff Turing machine $M_n$, on input 0, never halts. The vocabulary $\tau_n$ consists of the two binary relation symbols, $E$, $E_{tc}$, constant symbols, $a$, $d$, and some unary relation symbols. It follows that if $\Gamma$ were TC-complete, then it would prove all true instances of $\varphi_n$ and thus the halting problem would be solvable.

Proposition 4.1 shows that even in the presence of only one binary relation symbol, there is no r.e. TC-complete axiomatization.
In [Avr03], Avron gives an elegant finite axiomatization of the natural numbers using transitive closure, a successor relation and the binary function symbol, "+". Furthermore, he shows that multiplication is definable in this language. Since the unique TC-model for Avron's axioms is the standard natural numbers it follows that:

Corollary 4.2. Let $\Gamma$ be an arithmetic set of TC-valid first-order sentences over a vocabulary including a binary relation symbol and a binary function symbol (or a ternary relation symbol). Then $\Gamma$ is not TC-complete.

In Proposition 3.1 we showed that any finite and acyclic model of $T_1[f]$ is a TC model. This can be strengthened to

Proposition 4.3. Any finite model of $T_1$ plus IND is a TC-model.

Proof. Let $\mathcal{A}$ be a finite model of $T_1$ plus IND. Let $f$ be a binary relation symbol, and let $a, b$ be elements of the universe of $\mathcal{A}$. Since $\mathcal{A} \models T_1$, if there is an $f$ path from $a$ to $b$ then $\mathcal{A} \models f_{tc}(a, b)$. Conversely, suppose that there is no $f$ path from $a$ to $b$. Let $R_a$ be the set of elements of the universe of $\mathcal{A}$ that are reachable from $a$. Let $k = |R_a|$. Since $\mathcal{A}$ is finite we may use existential quantification to name exactly all the elements of $R_a$: $x_1, \ldots, x_k$. We can then define the color class: $C(y) \equiv y = x_1 \lor \cdots \lor y = x_k$. Then we can prove using IND, or equivalently NoExit, that no vertex outside this color class is reachable from $a$, i.e., $\mathcal{A} \models \lnot f_{tc}(a, b)$. Thus, as desired, $\mathcal{A}$ is a TC-model.

4.1. More About TC-Completeness.

Even though there is no r.e. set of TC-complete axioms in general, there are TC-complete axiomatizations for certain interesting cases. Let $\Sigma$ be a set of formulas. We say that $\psi$ is TC-valid wrt $\Sigma$ iff every TC-model of $\Sigma$ satisfies $\psi$. Let $\Gamma$ be TC-sound. We say that $\Gamma$ is TC-complete wrt $\Sigma$ iff $\Gamma \cup \Sigma \vdash \psi$ for every $\psi$ that is TC-valid wrt $\Sigma$.
We are interested in whether $T_1$ plus IND is TC-complete with respect to interesting theories, $\Sigma$. Since $\mathrm{TC}[s](a, b)$ asserts the existence of a finite $s$-path from $a$ to $b$, we can express that a structure is finite by writing the formula:
\[ \Phi \equiv \mathit{Func}[s] \wedge \exists x\, \forall y .\ s_{tc}(x, y) . \]
Observe that every TC-model that satisfies $\Phi$ is finite. Thus, if we are in a setting (as is frequent in logic) where we may add a new binary relation symbol, $s$, then finiteness is TC-expressible.

Proposition 4.4. Let $\Sigma$ be a finite set of formulas, and $\Gamma$ an r.e., TC-complete axiomatization wrt $\Sigma$ in a language where finiteness is TC-expressible. Then finite TC-validity for $\Sigma$ is decidable.

Proof. Let $\Phi$ be a formula as above that TC-expresses finiteness. Let $\psi$ be any formula. If $\psi$ is not finite TC-valid wrt $\Sigma$, then we can find a finite TC-model of $\Sigma$ where $\psi$ is false. If $\psi$ is finite TC-valid, then $\Gamma \cup \Sigma \vdash \Phi \rightarrow \psi$, and we can find this out by systematically generating all proofs from $\Gamma$.

From Proposition 4.4 we know that we must restrict our search for cases of TC-completeness to those where finite TC-validity is decidable. In particular, since the finite theory of two functional relations is undecidable, e.g., [IRR+04a], we know that,

Corollary 4.5. There are no r.e. TC-complete axioms for the functional case even if we restrict to at most two binary relation symbols.

4.2. Nelson's Axioms. Our idea of considering transitive-closure axioms is similar in spirit to the approach that Nelson takes [Nel83]. To prove some program properties, he introduces a set of reachability axiom schemes for a functional predicate, $f$. By "functional" we mean that $f$ is a partial function:
\[ \mathit{Func}[f] \equiv \forall u, v, w .\ f(u, v) \wedge f(u, w) \rightarrow v = w . \]
We remark that Nelson's axiom schemes are provable from $T_1$ plus our induction principle.
At least two of his schemes may be useful for us to add in our approach. Nelson asked whether his axioms are complete for the functional setting. It follows from Corollary 4.5 that the answer is no. We prove below that Nelson's axioms do not prove NoExit.

Nelson's basic relation symbols are ternary. For example, he writes "$u \xrightarrow[x]{f} v$" to mean that there is an $f$-path from $u$ to $v$ that follows no edges out of $x$. We encode this as $f^{x}_{tc}(u, v)$, where, for each parameter $x$, we add a new relation symbol, $f^{x}$, together with the assertion:
\[ \forall u, v .\ f^{x}(u, v) \leftrightarrow f(u, v) \wedge (u \neq x) . \]
Nelson also includes a notation for modifying the partial function $f$. He writes $f^{(p)}_{q}$ for the partial function that agrees with $f$ everywhere except on argument $p$, where it has value $q$. Nelson's eighth axiom scheme asserts a basic consistency property for this notation. In our translation we simply assert that
\[ f^{(p)}_{q}(u, v) \leftrightarrow (u \neq p \wedge f(u, v)) \vee (u = p \wedge v = q) . \]
When we translate Nelson's eighth axiom scheme the result is tautological, so we can safely omit it.

Using our translation, Nelson's axiom schemes are the following.

(N1) $f^{x}_{tc}(u, v) \leftrightarrow (u = v) \vee \exists z .\ ( f^{x}(u, z) \wedge f^{x}_{tc}(z, v) )$
(N2) $f^{x}_{tc}(u, v) \wedge f^{x}_{tc}(v, w) \rightarrow f^{x}_{tc}(u, w)$
(N3) $f^{x}_{tc}(u, v) \rightarrow f_{tc}(u, v)$
(N4) $f^{y}_{tc}(u, x) \wedge f^{z}_{tc}(u, y) \rightarrow f^{z}_{tc}(u, x)$
(N5) $f_{tc}(u, x) \rightarrow f^{y}_{tc}(u, x) \vee f^{x}_{tc}(u, y)$
(N6) $f^{y}_{tc}(u, x) \wedge f^{z}_{tc}(u, y) \rightarrow f^{z}_{tc}(x, y)$
(N7) $f(x, u) \wedge f_{tc}(u, v) \rightarrow f^{x}_{tc}(u, v)$

These axiom schemes can be proved using appropriate instances of $T_1$ and the induction principle.

Just as we showed in Proposition 3.1 that any finite and acyclic model of $T_1[f]$ is a TC-model, we have that,

Proposition 4.6. Any finite and functional model of Nelson's axioms is a TC-model.

Proof.
Consider any finite and functional model, $\mathcal{M}$. We claim that for each $f$ and $x \in |\mathcal{M}|$, $(f^{x}_{tc})^{\mathcal{M}} = ((f^{x})^{\mathcal{M}})^{\star}$. If there is an $f^{x}$-path from $u$ to $v$, then it follows from repeated uses of (N1) that $f^{x}_{tc}(u, v)$ holds. If there is no $f^{x}$-path from $u$ to $v$ and $u$ is not on an $f$-cycle, then using (N1) we can follow $f$-edges from $u$ to the end and prove that $f^{x}_{tc}(u, v)$ does not hold. If there is no $f^{x}$-path from $u$ to $v$ and $u$ is on an $f$-cycle containing $x$, then using (N1) we can follow $f$-edges from $u$ to $x$ to prove that $f^{x}_{tc}(u, v)$ does not hold.

Finally, if there is no $f$-path from $u$ to $v$ and $u$ is on an $f$-cycle, suppose for the sake of a contradiction that $f_{tc}(u, v)$ holds. Let $x$ be the predecessor of $u$ on the cycle. By (N7), $f^{x}_{tc}(u, v)$ must hold. However, this contradicts the previous paragraph.

Axiom schemes (N5) and (N7) may be useful for us to assert when $f$ is functional. (N5) says that the points reachable from $u$ are totally ordered, in the sense that if $x$ and $y$ are both reachable from $u$, then on the path from $u$ either $x$ comes first or $y$ comes first. (N7) says that if there is an edge from $x$ to $u$ and a path from $u$ to $v$, then there is a path from $u$ to $v$ that does not go through $x$. This implies the useful property that no vertex not on a cycle is reachable from a vertex on the cycle.

We conclude this section by proving the following,

Proposition 4.7. Nelson's axioms do not imply NoExit.

Proof. Consider the structure $G = (V, f, f_{tc}, f^{0}_{tc}, f^{1}_{tc}, f^{2}_{tc}, \ldots, f^{\infty}_{tc}, A)$ such that $V = \mathbb{N} \cup \{\infty\}$, the set of natural numbers plus a point at infinity. Let $A = \mathbb{N}$, i.e., the color class $A$ is interpreted as all points except $\infty$. Define $f = \{ \langle u, u+1 \rangle \mid u \in \mathbb{N} \}$, i.e., there is an edge from every natural number to its successor, but $\infty$ is isolated. However, let $f_{tc} = \{ \langle u, v \rangle \mid u \leq v \}$, i.e., $G$ believes that there is a path from each natural number to infinity.
Similarly, for each $k \in V$, $f^{k}_{tc} = \{ \langle u, v \rangle \mid u \leq v \wedge (k < u \vee v \leq k) \}$. It is easy to check that $G$ satisfies all of Nelson's axioms. The problem is that $G \models \neg \mathit{NoExit}[A, f]$. It follows that Nelson's axioms do not entail $\mathit{NoExit}[A, f]$. This is another proof that they are not TC-complete.

4.3. TC-Completeness for Words. In this subsection, we prove that $T_1$ plus IND is TC-complete for words. For any alphabet, $\Sigma$, let the vocabulary of words over $\Sigma$ be
\[ \mathit{vocab}(\Sigma) = \langle 0, \mathit{max};\ s^{2}, s^{2}_{tc}, P^{1}_{\sigma} : \sigma \in \Sigma \rangle . \]
The domain of a word model is an ordered set of positions, and the unary relation $P_{\sigma}(x)$ expresses the presence of symbol $\sigma$ at position $x$. $s$ is the successor relation over positions, and $s_{tc}$ is its transitive closure. The constants $0$ and $\mathit{max}$ represent the first and last positions in the word.

A simple axiomatization of words is $A^{\Sigma}_{w}$, the conjunction of the following four statements:

(A1) $\forall x .\ ( \neg s(x, 0) \wedge \neg s(\mathit{max}, x) \wedge (x \neq 0 \rightarrow \exists y .\ s(y, x)) \wedge (x \neq \mathit{max} \rightarrow \exists y .\ s(x, y)) )$
(A2) $\forall x y z .\ ( (s(x, y) \wedge s(x, z)) \vee (s(y, x) \wedge s(z, x)) ) \rightarrow y = z$
(A3) $\forall x .\ s_{tc}(0, x) \wedge s_{tc}(x, \mathit{max})$
(A4) $\forall x .\ \bigvee_{\sigma \in \Sigma} ( P_{\sigma}(x) \wedge \bigwedge_{\tau \neq \sigma} \neg P_{\tau}(x) )$

In particular, observe that a TC-model of $A^{\Sigma}_{w}$ is exactly a $\Sigma$-word. Let $\Gamma = \mathrm{IND} \cup \{ T_1 \}$. We wish to prove the following:

Theorem 4.8. $\Gamma$ is TC-complete wrt $A^{\Sigma}_{w}$.

We first note that $\Gamma \cup \{ A^{\Sigma}_{w} \}$ implies acyclicity: $\forall x y .\ s(x, y) \rightarrow \neg s_{tc}(y, x)$. The proof using induction proceeds as follows: in the base case, there is no loop at $0$. Inductively, suppose there is no loop starting at $x$, $s(x, y)$ holds, but there is a loop at $y$, i.e., $\exists z .\ s(y, z) \wedge s_{tc}(z, y)$. Then by $T_1$ and IND we know $\exists x' .\ s_{tc}(z, x') \wedge s(x', y)$, and $s_{tc}(y, x')$.
(A2) asserts that the in-degree of $s$ is 1, which means $x' = x$, and we have a contradiction: $s_{tc}(y, x)$.

In order to prove Theorem 4.8, we need to show that if $\varphi$ is true in all TC-models of $\Gamma \cup \{ A^{\Sigma}_{w} \}$, i.e., in all words, then $\Gamma \cup \{ A^{\Sigma}_{w} \} \vdash \varphi$. By the completeness of first-order logic it suffices to show that $\Gamma \cup \{ A^{\Sigma}_{w} \} \models \varphi$. We prove the contrapositive of this in Lemma 4.10. In order to do so, we first construct a DFA $D_{\varphi}$ that has some desirable properties.

Lemma 4.9. For any $\varphi \in \mathcal{L}(\mathit{vocab}(\Sigma))$ we can build a DFA $D_{\varphi} = (Q_{\varphi}, \Sigma, \delta_{\varphi}, q_1, F_{\varphi})$, satisfying the following properties:

(1) The states $q_1, q_2, \ldots, q_n$ of $D_{\varphi}$ are first-order definable as unary formulas $q^{1}_{1}, q^{1}_{2}, \ldots, q^{1}_{n}$, where intuitively $q_i(x)$ will mean that $D_{\varphi}$ is in state $q_i$ after reading the symbols at word positions $0, 1, \ldots, x$.
(2) The transition function $\delta_{\varphi}$ of $D_{\varphi}$ is captured by the first-order definitions of the states. That is, for all $i \leq n$, $\Gamma \cup \{ A^{\Sigma}_{w} \}$ semantically implies the following two formulas for every state $q_i$:
    (a) $q_i(0) \leftrightarrow \bigvee_{\sigma \in \Sigma,\ \delta_{\varphi}(q_1, \sigma) = q_i} P_{\sigma}(0)$.
    (b) $\forall u, v .\ s(u, v) \rightarrow \Big( q_i(v) \leftrightarrow \bigvee_{\sigma \in \Sigma,\ \delta_{\varphi}(q_j, \sigma) = q_i} ( P_{\sigma}(v) \wedge q_j(u) ) \Big)$.
(3) $\Gamma \cup \{ A^{\Sigma}_{w} \} \models \varphi \leftrightarrow F(\mathit{max})$, where $F(u) \equiv \bigvee_{q_i \in F_{\varphi}} q_i(u)$.

Proof. We prove properties 1, 2, and 3 while constructing $D_{\varphi}$ and the first-order definitions of its states by induction on the length of $\varphi$. The reward is that we get a generalized form of the McNaughton-Papert [MP71] construction that works on non-standard models.

Some subformulas of $\varphi$ may have free variables, e.g., $x, y$. In the inductive step considering such subformulas, we expand the vocabulary of the automaton to $\Sigma' = \{ x, \epsilon \} \times \{ y, \epsilon \} \times \Sigma$. We write $P_{\sigma}(u) \wedge (x = u) \wedge (y \neq u)$ to mean that at position $u$, symbol $\sigma$ occurs, as does $x$, but not $y$.
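The expanded alphabet $\Sigma' = \{x, \epsilon\} \times \{y, \epsilon\} \times \Sigma$ can be pictured as annotating each position of a word with the variables that denote it. The sketch below is illustrative only: the class name, the string rendering of letters, and the use of `eps` for $\epsilon$ are assumptions, not notation from the paper.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative encoding of a word plus an assignment to x and y as a word over the expanded alphabet. */
final class MarkedWord {

    /** One letter of {x, eps} x {y, eps} x Sigma, rendered as a string. */
    static String letter(char sigma, boolean hasX, boolean hasY) {
        return "(" + (hasX ? "x" : "eps") + "," + (hasY ? "y" : "eps") + "," + sigma + ")";
    }

    /** Annotates position xPos with x and position yPos with y; every other mark is eps. */
    static List<String> encode(String word, int xPos, int yPos) {
        if (xPos < 0 || xPos >= word.length() || yPos < 0 || yPos >= word.length())
            throw new IllegalArgumentException("each variable must denote a position of the word");
        List<String> letters = new ArrayList<>();
        for (int u = 0; u < word.length(); u++)
            letters.add(letter(word.charAt(u), u == xPos, u == yPos));
        return letters;
    }

    public static void main(String[] args) {
        // The word "abba" with x at position 1 and y at position 3.
        System.out.println(encode("abba", 1, 3));
    }
}
```

Because `encode` takes a single position per variable, every encoded word carries each variable mark at exactly one position.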
Note: Since every structure gives a unique value to each variable, $x$, we are only interested in strings in which $x$ occurs at exactly one position.

For the following induction, let $\mathcal{B}$ be any model of $\Gamma \cup \{ A^{\Sigma}_{w} \}$. For the intermediate stages of induction where some variables may occur freely, we assume that $\mathcal{B}$ interprets these free variables. We prove that the formulas of properties 2 and 3 must hold in $\mathcal{B}$ at each step of the induction.

Base cases: $\varphi$ is either $P_{\sigma}(x)$, $x = y$, $s(x, y)$, or $s_{tc}(x, y)$.

$\varphi = P_{\sigma}(x)$: The automaton for $P_{\sigma}(x)$ and its state definitions are shown in Fig. 1.

Figure 1: $D_{P_{\sigma}(x)}$

    State predicate    Definition
    $q_1(v)$           $\neg s_{tc}(x, v)$
    $q_2(v)$           $s_{tc}(x, v) \wedge P_{\sigma}(x)$
    $q_3(v)$           $s_{tc}(x, v) \wedge \neg P_{\sigma}(x)$

Table 1: $D_{P_{\sigma}(x)}$

Properties 2 and 3 can be verified as follows. For property 2b, suppose that $\mathcal{B} \models s(u, v)$. We must show that $\mathcal{B} \models q_2(v)$ iff one of the two rules leading to state $q_2$ holds. These two rules correspond to the edge from $q_1$ (if $x = v$), and the self loop on $q_2$ (if $x \neq v$). Suppose $\mathcal{B} \models q_2(v) \wedge (v = x)$. Expanding the definition of $q_2$, we get $\mathcal{B} \models s_{tc}(x, v) \wedge P_{\sigma}(x) \wedge (v = x)$. But this means $\mathcal{B} \models \neg s_{tc}(x, u)$, since $\mathcal{B} \models \Gamma \cup \{ A^{\Sigma}_{w} \}$ and we have acyclicity. Therefore, we have $\mathcal{B} \models q_1(u)$ by the definition of $q_1$, and we get the desired conclusion, $\mathcal{B} \models q_1(u) \wedge P_{\sigma}(v)$. The case corresponding to $x \neq v$ is also easy, and relies on the fact that $\mathcal{B} \models s_{tc}(x, v) \wedge s(u, v) \wedge (x \neq v) \rightarrow s_{tc}(x, u)$. In other words, if $q_2(v)$ holds and $x \neq v$, then $q_2$ holds at $v$'s predecessor too. This proves one direction of property 2b for state $q_2$. The other direction for $q_2$, and the proofs for the other states, proceed similarly. The proof for 2a is similar. For property 3, we need to show that $\mathcal{B} \models P_{\sigma}(x) \leftrightarrow q_2(\mathit{max})$.
This can be verified easily from the definition of $q_2$.

$\varphi = (x = y)$ or $s(x, y)$: The automata and their state definitions for $\varphi = (x = y)$ and $\varphi = s(x, y)$ are shown in Figs. 2 and 3. Properties 2 and 3 can be verified easily for these definitions.

Figure 2: $D_{x = y}$

    State predicate    Definition
    $q_1(v)$           $\neg s_{tc}(x, v)$
    $q_2(v)$           $(x = y) \wedge s_{tc}(x, v)$
    $q_3(v)$           $(x \neq y) \wedge s_{tc}(x, v)$

Table 2: $D_{x = y}$

Figure 3: $D_{s(x, y)}$

    State predicate    Definition
    $q_1(v)$           $\neg s_{tc}(x, v)$
    $q_2(v)$           $x = v$
    $q_3(v)$           $s(x, y) \wedge s_{tc}(y, v)$
    $q_4(v)$           $s_{tc}(x, v) \wedge (x \neq v) \wedge \neg s(x, y)$

Table 3: $D_{s(x, y)}$

$\varphi = s_{tc}(x, y)$: The automaton for $\varphi = s_{tc}(x, y)$ and its state definitions are shown in Fig. 4.

Figure 4: $D_{s_{tc}(x, y)}$

    State predicate    Definition
    $q_1(v)$           $\neg s_{tc}(x, v)$
    $q_2(v)$           $s_{tc}(x, v) \wedge \neg ( s_{tc}(x, y) \wedge s_{tc}(y, v) )$
    $q_3(v)$           $s_{tc}(x, y) \wedge s_{tc}(y, v)$

Table 4: $D_{s_{tc}(x, y)}$

We provide a sketch of the proof of property 2b for state $q_3$. Proofs for the other states follow using similar arguments. Suppose $\mathcal{B} \models q_3(v) \wedge s(u, v)$. Expanding the definition of $q_3(v)$, we get $\mathcal{B} \models s_{tc}(x, y) \wedge s_{tc}(y, v) \wedge s(u, v)$. There are two possibilities: $v \neq y$ and $v = y$, corresponding to the loop on state $q_3$, and the incoming edges from $q_2$ or $q_1$. Suppose $v = y$. Now we have two further cases, $x = y$ and $x \neq y$. If $x = y = v$, we get $\mathcal{B} \models \neg s_{tc}(x, u)$, or $\mathcal{B} \models q_1(u) \wedge s(u, x) \wedge (x = y = v)$, denoting the appropriate transition from state $q_1$. On the other hand, if $\mathcal{B} \models (x \neq y)$, we need to show that $q_3$ was reached via $q_2$. Expanding the definition of $q_3(v)$ we have $\mathcal{B} \models s_{tc}(x, y) \wedge s_{tc}(y, v)$. Since $y = v$, we get $\mathcal{B} \models s_{tc}(x, u) \wedge s(u, y)$. But by the definition of $q_2$, this means $\mathcal{B} \models q_2(u)$.
Thus, we have $\mathcal{B} \models q_2(u) \wedge s(u, v) \wedge v = y$, the appropriate transition rule for moving from state $q_2$ to $q_3$. For this direction of property 2b, the only remaining case is $y \neq v$. In this case, it is easy to prove that we entered state $q_3$ at $y$, and looped thereafter using the appropriate transition for the loop.

For the reverse direction, we need to prove that if a transition rule is applicable at a position then the corresponding next state must hold at the next position. This is easily verified using the state definitions.

Property 2 for the other states follows by similar arguments. Property 3 can also be verified easily using the definition of $q_3$.

Inductive steps: $\varphi$ is either $\varphi_1 \wedge \varphi_2$, or $\neg \psi$, or $\exists x .\ \psi(x)$.

$\varphi = \varphi_1 \wedge \varphi_2$: Inductively we have $D_{\varphi_1}$ and $D_{\varphi_2}$ with final state definitions $q_{f_1}$ and $q_{f_2}$, respectively. To construct $D_{\varphi}$, we perform the product construction: let $q_i$ be the state definitions of $D_{\varphi_1}$ and $q'_i$ those of $D_{\varphi_2}$. Then the state definitions of $D_{\varphi}$ are $q_{\langle i, j \rangle}$, and we have $q_{\langle i, j \rangle}(u) \equiv q_i(u) \wedge q'_j(u)$. The accepting states are
\[ F_{\varphi_1 \wedge \varphi_2}(u) \equiv \bigvee_{f_1 \in F_1 \wedge f_2 \in F_2} q_{\langle f_1, f_2 \rangle}(u) . \]
Property 1 holds because we are still in first-order logic. Property 2 follows because we are just performing logical transliterations of the standard DFA conjunction operation. Property 3 follows from the fact that we already have $\mathcal{B} \models F_1(\mathit{max}) \leftrightarrow \varphi_1$ and $\mathcal{B} \models F_2(\mathit{max}) \leftrightarrow \varphi_2$, and from the definition of $F_{\varphi_1 \wedge \varphi_2}$.

$\varphi = \neg \psi$: In this case, we take the complement of $D_{\psi}$, which is easy because our automata are deterministic. Let the final state definition of $D_{\psi}$ be $F'$. $D_{\varphi}$ has the same state definitions as $D_{\psi}$, but its final state definition is $F(u) \equiv \neg F'(u)$. It is easy to see that properties 1, 2, and 3 hold in this case.

$\varphi = \exists x .\ \psi(x)$: Inductively we have $D_{\psi} = ( \{ q_1, \ldots, q_n \}, \Sigma \times \{ x, \epsilon \}, \delta_{\psi}, q_1, F_{\psi} )$.
First we transform $D_{\psi}$ to an NFA $N_{\varphi} = ( \{ p_1, \ldots, p_n, p'_1, \ldots, p'_n \}, \Sigma, \delta, p_1, F )$, where $F = \{ p'_i \mid q_i \in F_{\psi} \}$ and
\[ \delta(p_i, \sigma) = \{ p_j, p'_k \mid \delta_{\psi}(q_i, \sigma \wedge \neg x) = q_j,\ \delta_{\psi}(q_i, \sigma \wedge x) = q_k \} . \]
Thus $N_{\varphi}$ no longer sees $x$'s. Instead, it guesses the one place that $x$ might occur, and that is where the transition from $p_i$ to $p'_i$ occurs (see Fig. 5).

Let $p_i(u) \equiv \exists x .\ \neg s_{tc}(x, u) \wedge q_i(u)$; $p'_i(u) \equiv \exists x .\ s_{tc}(x, u) \wedge q_i(u)$.

Figure 5: $N_{\exists x . P_{\sigma}(x)}$

Define $D_{\varphi}$ to be the DFA equivalent to $N_{\varphi}$ obtained using the subset construction. Let $S_0 = \{ p_{i_0}, p'_j \mid j \in J_0 \}$ and $S_1 = \{ p_{i_1}, p'_j \mid j \in J_1 \}$ be two states of $D_{\varphi}$. (Note that each reachable state of $D_{\varphi}$ has exactly one element of $\{ p_1, \ldots, p_n \}$.) Observe that in a "run" of $N_{\varphi}$ on $\mathcal{B}$, we can be in state $p_i$ at position $u$ iff $\mathcal{B} \models p_i(u)$, and we can be in state $p'_i$ at position $u$ iff $\mathcal{B} \models p'_i(u)$. Thus, the first-order formula capturing state $S_0$ is
\[ S_0(u) \equiv p_{i_0}(u) \wedge \bigwedge_{j \in J_0} p'_j(u) \wedge \bigwedge_{j \notin J_0} \neg p'_j(u) \]
Conditions 2 and 3 for $D_{\varphi}$ thus follow from these conditions for $D_{\psi}$, which hold by the inductive assumption. For example, if $\delta_{\varphi}(S_0, \sigma) = S_1$, then $\delta_{\psi}(q_{i_0}, \sigma \wedge \neg x) = q_{i_1}$, and $j \in J_1$ iff $\delta_{\psi}(q_{i_0}, \sigma \wedge x) = q_j$ or $\delta_{\psi}(q_{j_0}, \sigma \wedge \neg x) = q_j$ for some $j_0 \in J_0$.

Thus, we have inductively constructed $D_{\varphi}$ and proved that it satisfies properties 1, 2, and 3.

Lemma 4.9 tells us that for any model $\mathcal{B}$ of $\Gamma \cup \{ A^{\Sigma}_{w} \}$, $\mathcal{B} \models \varphi$ iff $\mathcal{B} \models F_{\varphi}(\mathit{max})$. In other words, $\mathcal{B} \models \varphi$ iff $\mathcal{B}$ "believes" that there is a path from the start state to some $q_f$ in $F_{\varphi}$. As a part of the next lemma, we use induction to prove that this implies that there actually must be a path in $D_{\varphi}$ from the start state to some $q_f$ in $F_{\varphi}$.

Lemma 4.10. Suppose $\mathcal{B} \models \Gamma \cup \{ A^{\Sigma}_{w} \} \cup \{ \varphi \}$.
Then, there exists a word, $w_0$, such that its corresponding word model, $\mathcal{B}_0$, satisfies $\varphi$.

Proof. By Lemma 4.9, we can construct $D_{\varphi}$, and we have $\mathcal{B} \models F_{\varphi}(\mathit{max})$. So $\mathcal{B}$ "believes" that there is a path to some $q_f \in F_{\varphi}$. Suppose there is no such path in $D_{\varphi}$. Let $C$ denote the disjunction of all states that are truly reachable from the start state in $D_{\varphi}$. This situation can be expressed as follows:
\[ \forall u, v .\ C(u) \wedge s(u, v) \rightarrow C(v) . \]
But this is exactly the premise for the axiom scheme NoExit, which must hold since $\mathcal{B} \models \Gamma$. Therefore, we have
\[ \mathcal{B} \models \forall u, v .\ C(u) \wedge s_{tc}(u, v) \rightarrow C(v) . \]
This implies that some accepting state $q_f$ should be in $C$, because $\mathcal{B} \models \forall u .\ s_{tc}(u, \mathit{max}) \wedge F_{\varphi}(\mathit{max})$, and we get a contradiction. Therefore, there has to be a real path from the start state to a final state $q_f$ in $D_{\varphi}$. This implies that the DFA $D_{\varphi}$ accepts some standard word, $w_0$. Let $\mathcal{B}_0$ be the word model corresponding to $w_0$. Thus $\mathcal{B}_0 \models F_{\varphi}(\mathit{max})$, and therefore by Lemma 4.9 $\mathcal{B}_0 \models \varphi$, as desired.

    // Minimal node type assumed by the figure: a singly linked list cell.
    class Node { Node next; }

    Node reverse(Node x) {
        Node y = null;            // head of the already-reversed prefix
        while (x != null) {
            Node t = x.next;      // remember the rest of the list
            x.next = y;           // reverse the edge out of x
            y = x;
            x = t;
        }
        return y;
    }

Figure 6: A simple Java-like implementation of the in-place reversal of a singly linked list.

5. HEURISTICS FOR USING THE COLORING AXIOMS

This section presents heuristics for using the coloring axioms. Toward that end, it answers the following questions:

• How can the coloring axioms be used by a theorem prover to prove $\chi$? (Section 5.2)
• When should a specific instance of a coloring axiom be given to the theorem prover while trying to prove $\chi$? (Section 5.4)
• What part of the process can be automated? (Section 5.5)

We first present a running example (more examples are described in Section 5.6 and used in later sections to illustrate the heuristics). We then explain how the coloring axioms are useful, describe the search space for useful axioms, give an algorithm for exploring this space, and conclude by discussing a prototype implementation we have developed that proves the example presented and others.

5.1. Reverse Specification. The heuristics described in Sections 5.2–5.4 are illustrated on problems that arise in the verification of partial correctness of a list reversal procedure. Other examples proven using this technique can be found in Section 5.6.

The procedure reverse, shown in Fig. 6, performs in-place reversal of a singly linked list, destructively updating the list. The precondition requires that the input list be acyclic and unshared (i.e., each heap node is pointed to by at most one heap node). For simplicity, we assume that there is no garbage. The postcondition ensures that the resulting list is acyclic and unshared. Also, it ensures that the nodes reachable from the formal parameter on entry to reverse are exactly the nodes reachable from the return value of reverse at the exit. Most importantly, it ensures that each edge in the original list is reversed in the returned list.

The specification for reverse is shown in Fig. 7. We use unary predicates to represent program variables and binary predicates to represent data-structure fields. Fig. 7(a) defines some shorthands. To specify that a unary predicate $z$ can point to a single node at a time and that a binary predicate $f$ of a node can point to at most one node (i.e., $f$ is a partial function), we use $\mathit{unique}[z]$ and $\mathit{func}[f]$. To specify that there are no cycles of $f$-fields in the graph, we use $\mathit{acyclic}[f]$.
To specify that the graph does not contain nodes shared by $f$-fields (i.e., nodes with 2 or more incoming $f$-fields), we use $\mathit{unshared}[f]$. To specify that all nodes in the graph are reachable from $z_1$ or $z_2$ by following $f$-fields, we use $\mathit{total}[z_1, z_2, f]$. Another helpful shorthand is $r_{x,f}(v)$, which specifies that $v$ is reachable from the node pointed to by $x$ using $f$-edges.

The precondition of the reverse procedure is shown in Fig. 7(b). We use the predicates $xe$ and $ne$ to record the values of the variable $x$ and the next field at the beginning of the procedure. The precondition requires that the list pointed to by $x$ be acyclic and unshared. It also requires that $\mathit{unique}[z]$ and $\mathit{func}[f]$ hold for all unary predicates $z$ that represent program variables and all binary predicates $f$ that represent fields, respectively. For simplicity, we assume that there is no garbage, i.e., all nodes are reachable from $x$.

The postcondition is shown in Fig. 7(c). It ensures that the resulting list is acyclic and unshared. Also, it ensures that the nodes reachable from the formal parameter $x$ on entry to the procedure are exactly the nodes reachable from the return value $y$ at the exit. Most importantly, we wish to show that each edge in the original list is reversed in the returned list (see Eq. (5.9)).

A loop invariant is given in Fig. 7(d). It describes the state of the program at the beginning of each loop iteration. Every node is in one of two disjoint lists pointed to by $x$ and $y$ (Eq. (5.10)). The lists are acyclic and unshared. Every edge in the list pointed to by $x$ is exactly an edge in the original list (Eq. (5.12)). Every edge in the list pointed to by $y$ is the reverse of an edge in the original list (Eq. (5.13)). The only original edge going out of $y$ is to $x$ (Eq. (5.14)).

The transformer is given in Fig.
7(e), using the primed predicates $n'$, $x'$, and $y'$ to describe the values of predicates $n$, $x$, and $y$, respectively, at the end of the iteration.

5.2. Proving Formulas using the Coloring Axioms. All the coloring axioms have the form $A \equiv P_A \rightarrow C_A$, where $P_A$ and $C_A$ are closed formulas. We call $P_A$ the axiom's premise and $C_A$ the axiom's conclusion. For an axiom to be useful, the theorem prover will have to prove the premise (as a subgoal) and then use the conclusion in the proof of the goal formula $\chi$. For each of the coloring axioms, we now explain when the premise can be proved, how its conclusion can help, and give an example.

NoExit. The premise $P_{\mathit{NoExit}}[C, f]$ states that there are no $f$-edges exiting color class $C$. When $C$ is a unary predicate appearing in the program, the premise is sometimes a direct result of the loop invariant.

Another color that will be used heavily throughout this section is reachability from a unary predicate, i.e., unary reachability, formally defined in Eq. (5.6). Let us examine two cases. $P_{\mathit{NoExit}}[r_{x,f}, f]$ is immediate from the definition of $r_{x,f}$ and the transitivity of $f_{tc}$. $P_{\mathit{NoExit}}[r_{x,f}, f']$ actually states that there is no $f$-path from $x$ to an edge for which $f'$ holds but $f$ does not, i.e., a change in $f'$ with respect to $f$. Thus, we use the absence of $f$-paths to prove the absence of $f'$-paths. In many cases, the change is an important part of the loop invariant, and paths from and to it are part of the specification.

A sketch of the proof by refutation of $P_{\mathit{NoExit}}[r_{x',n}, n']$ that arises in the reverse example is given in Fig. 8. The numbers in brackets are the stages of the proof.

(1) The negation of the premise expands to:
    $\exists u_1, u_2, u_3 .\ x'(u_1) \wedge n_{tc}(u_1, u_2) \wedge \neg n_{tc}(u_1, u_3) \wedge n'(u_2, u_3)$
(2) Since $u_2$ is reachable from $u_1$ and $u_3$ is not, by $T_2$, we have $\neg n(u_2, u_3)$.
(3) By the definition of $n'$ in the transformer, the only edge in which $n$ differs from $n'$ is out of $x$ (one of the clauses generated from Eq. (5.15) is $\forall v_1, v_2 .\ \neg n'(v_1, v_2) \vee n(v_1, v_2) \vee x(v_1)$). Thus, $x(u_2)$ holds.
(4) By the definition of $x'$, it has an incoming $n$-edge from $x$. Thus, $n(u_2, u_1)$ holds.

The list pointed to by $x$ must be acyclic, whereas we have a cycle between $u_1$ and $u_2$; i.e., we have a contradiction. Thus, $P_{\mathit{NoExit}}[r_{x',n}, n']$ must hold.

$C_{\mathit{NoExit}}[C, f]$ states that there are no $f$-paths ($f_{tc}$ edges) exiting $C$. This is useful because proving the absence of paths is the difficult part of proving formulas with TC.

(a)
\begin{align}
\mathit{unique}[z] &\stackrel{\mathrm{def}}{=} \forall v_1, v_2 .\ z(v_1) \wedge z(v_2) \rightarrow v_1 = v_2 \tag{5.1} \\
\mathit{func}[f] &\stackrel{\mathrm{def}}{=} \forall v_1, v_2, v .\ f(v, v_1) \wedge f(v, v_2) \rightarrow v_1 = v_2 \tag{5.2} \\
\mathit{acyclic}[f] &\stackrel{\mathrm{def}}{=} \forall v_1, v_2 .\ \neg f(v_1, v_2) \vee \neg \mathrm{TC}[f](v_2, v_1) \tag{5.3} \\
\mathit{unshared}[f] &\stackrel{\mathrm{def}}{=} \forall v_1, v_2, v .\ f(v_1, v) \wedge f(v_2, v) \rightarrow v_1 = v_2 \tag{5.4} \\
\mathit{total}[z_1, z_2, f] &\stackrel{\mathrm{def}}{=} \forall v .\ \exists w .\ ( z_1(w) \vee z_2(w) ) \wedge \mathrm{TC}[f](w, v) \tag{5.5} \\
r_{x,f}(v) &\stackrel{\mathrm{def}}{=} \exists w .\ x(w) \wedge \mathrm{TC}[f](w, v) \tag{5.6} \\
r_{x,\overleftarrow{f}}(v) &\stackrel{\mathrm{def}}{=} \exists w .\ x(w) \wedge \mathrm{TC}[f](v, w) \tag{5.7}
\end{align}

(b)
\begin{align}
\mathit{pre} \stackrel{\mathrm{def}}{=}\ & \mathit{total}[xe, xe, ne] \wedge \mathit{acyclic}[ne] \wedge \mathit{unshared}[ne] \wedge {} \notag \\
& \mathit{unique}[xe] \wedge \mathit{func}[ne] \tag{5.8}
\end{align}

(c)
\begin{align}
\mathit{post} \stackrel{\mathrm{def}}{=}\ & \mathit{total}[y, y, n] \wedge \mathit{acyclic}[n] \wedge \mathit{unshared}[n] \wedge {} \notag \\
& \forall v_1, v_2 .\ ne(v_1, v_2) \leftrightarrow n(v_2, v_1) \tag{5.9}
\end{align}

(d)
\begin{align}
\mathit{LI}[x, y, n] \stackrel{\mathrm{def}}{=}\ & \mathit{total}[x, y, n] \wedge \forall v .\ ( \neg r_{x,n}(v) \vee \neg r_{y,n}(v) ) \wedge {} \tag{5.10} \\
& \mathit{acyclic}[n] \wedge \mathit{unshared}[n] \wedge \mathit{unique}[x] \wedge \mathit{unique}[y] \wedge \mathit{func}[n] \wedge {} \tag{5.11} \\
& \forall v_1, v_2 .\ ( r_{x,n}(v_1) \rightarrow ( ne(v_1, v_2) \leftrightarrow n(v_1, v_2) ) ) \wedge {} \tag{5.12} \\
& \forall v_1, v_2 .\ ( r_{y,n}(v_2) \wedge \neg y(v_1) \rightarrow ( ne(v_1, v_2) \leftrightarrow n(v_2, v_1) ) ) \wedge {} \tag{5.13} \\
& \forall v_1, v_2, v .\ y(v_1) \rightarrow ( x(v_2) \leftrightarrow ne(v_1, v_2) ) \tag{5.14}
\end{align}

(e)
\begin{align}
T \stackrel{\mathrm{def}}{=}\ & \forall v .\ ( y'(v) \leftrightarrow x(v) ) \wedge \forall v .\ ( x'(v) \leftrightarrow \exists w .\ x(w) \wedge n(w, v) ) \wedge {} \notag \\
& \forall v_1, v_2 .\ n'(v_1, v_2) \leftrightarrow ( ( n(v_1, v_2) \wedge \neg x(v_1) ) \vee ( x(v_1) \wedge y(v_2) ) ) \tag{5.15}
\end{align}

Figure 7: Example specification of the reverse procedure: (a) shorthands, (b) precondition $\mathit{pre}$, (c) postcondition $\mathit{post}$, (d) loop invariant $\mathit{LI}[x, y, n]$, (e) transformer $T$ (effect of the loop body).

[Diagram over nodes $u_1$, $u_2$, $u_3$, with each edge labeled by the proof stage that introduces it: $x'$ points to $u_1$ [1]; $u_1 \xrightarrow{n_{tc}} u_2$ [1]; $u_1 \xrightarrow{\neg n_{tc}} u_3$ [1]; $u_2 \xrightarrow{n'} u_3$ [1]; $u_2 \xrightarrow{\neg n} u_3$ [2]; $x$ points to $u_2$ [3]; $u_2 \xrightarrow{n} u_1$ [4].]

Figure 8: Proving $P_{\mathit{NoExit}}[r_{x',n}, n']$.

GoOut. The premise $P_{\mathit{GoOut}}[A, B, f]$ states that all $f$-edges going out of color class $A$ go to $B$. When $A$ and $B$ are unary predicates that appear in the program, again the premise sometimes holds as a direct result of the loop invariant. An interesting special case is when $B$ is defined as $\exists w .\ A(w) \wedge f(w, v)$. In this case the premise is immediate. Note that in this case the conclusion is also provable from $T_1$. However, from experience, the axiom is very useful for improving performance (2 orders of magnitude when proving the acyclic part of reverse's postcondition).

$C_{\mathit{GoOut}}[A, B, f]$ states that all paths out of $A$ must pass through $B$. Thus, under the premise $P_{\mathit{GoOut}}[A, B, f]$, if we know that there is a path from $A$ to somewhere outside of $A$, we know that there is a path to there from $B$. In case all nodes in $B$ are reachable from all nodes in $A$, together with the transitivity of $f_{tc}$ this means that the nodes reachable from $B$ are exactly the nodes outside of $A$ that are reachable from $A$.
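The premise and conclusion of GoOut can be evaluated directly on a small finite graph. The sketch below is illustrative and rests on an assumed reading of the axiom: the premise is taken as $\forall u, v.\ A(u) \wedge \neg A(v) \wedge f(u,v) \rightarrow B(v)$ and the conclusion as $\forall u, v.\ A(u) \wedge \neg A(v) \wedge f_{tc}(u,v) \rightarrow \exists w.\ B(w) \wedge f_{tc}(u,w) \wedge f_{tc}(w,v)$. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative evaluation of the premise and conclusion of GoOut[A, B, f] on a finite graph. */
final class GoOutCheck {

    /** Reflexive transitive closure of f (Warshall's algorithm). */
    static boolean[][] closure(boolean[][] f) {
        int n = f.length;
        boolean[][] c = new boolean[n][];
        for (int u = 0; u < n; u++) {
            c[u] = Arrays.copyOf(f[u], n);
            c[u][u] = true;
        }
        for (int k = 0; k < n; k++)
            for (int u = 0; u < n; u++)
                if (c[u][k])
                    for (int v = 0; v < n; v++)
                        if (c[k][v]) c[u][v] = true;
        return c;
    }

    /** Assumed premise: every f-edge from a node in A to a node outside A ends in B. */
    static boolean premise(boolean[] a, boolean[] b, boolean[][] f) {
        for (int u = 0; u < f.length; u++)
            for (int v = 0; v < f.length; v++)
                if (a[u] && !a[v] && f[u][v] && !b[v]) return false;
        return true;
    }

    /** Assumed conclusion: if v outside A is reachable from u in A, some B-node lies between them. */
    static boolean conclusion(boolean[] a, boolean[] b, boolean[][] f) {
        boolean[][] tc = closure(f);
        for (int u = 0; u < f.length; u++)
            for (int v = 0; v < f.length; v++) {
                if (!(a[u] && !a[v] && tc[u][v])) continue;
                boolean viaB = false;
                for (int w = 0; w < f.length; w++)
                    viaB |= b[w] && tc[u][w] && tc[w][v];
                if (!viaB) return false;
            }
        return true;
    }

    /** The special case B(v) = exists w. A(w) and f(w, v): the f-successors of A. */
    static boolean[] successorsOf(boolean[] a, boolean[][] f) {
        boolean[] b = new boolean[f.length];
        for (int w = 0; w < f.length; w++)
            for (int v = 0; v < f.length; v++)
                b[v] |= a[w] && f[w][v];
        return b;
    }

    public static void main(String[] args) {
        // The list 0 -> 1 -> 2 -> 3 with A = {0, 1}.
        boolean[][] f = new boolean[4][4];
        f[0][1] = f[1][2] = f[2][3] = true;
        boolean[] a = {true, true, false, false};
        boolean[] b = successorsOf(a, f);
        System.out.println(premise(a, b, f) + " " + conclusion(a, b, f));
    }
}
```

With $B$ chosen as the successors of $A$, the premise holds by construction, which matches the special case noted above; choosing a $B$ that misses the exit edge makes the premise fail.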
For example, $C_{\mathit{GoOut}}[y', y, n']$ allows us to prove that only the original list pointed to by $y$ is reachable from $y'$ (in addition to $y'$ itself).

NewStart. The premise $P_{\mathit{NewStart}}[C, g, h]$ states that all $g$-edges between nodes in $C$ are also $h$-edges. This can mean that the iteration has not added edges or has not removed edges, according to the selection of $h$ and $g$. In some cases, the premise holds as a direct result of the definition of $C$ and the loop invariant.

$C_{\mathit{NewStart}}[C, g, h]$ means that every $g$-path that is not an $h$-path must pass outside of $C$. Together with $C_{\mathit{NoExit}}[C, g]$, it proves that there are no new paths within $C$.

For example, in reverse the NewStart scheme can be used as follows. No outgoing edges were added to nodes reachable from $y$. There are no $n$ or $n'$ edges from nodes reachable from $y$ to nodes not reachable from $y$. Thus, no paths were added between nodes reachable from $y$. Since the list pointed to by $y$ is acyclic before the loop body, we can prove that it is acyclic at the end of the loop body.

We can see that NewStart allows the theorem prover to reason about paths within a color, and the other axioms allow the theorem prover to reason about paths between colors. Together, given enough colors, the theorem prover can often prove all the facts that it needs about paths and thus prove the formula of interest.

5.3. The Search Space of Possible Axioms. To answer the question of when we should use a specific instance of a coloring axiom when attempting to prove the target formula, we first define the search space in which we are looking for such instances. The axioms can be instantiated with the colors defined by an arbitrary unary formula (one free variable) and one or two binary predicates. First, we limit ourselves to binary predicates for which TC was used in the target formula.
Now, since it is infeasible to consider all arbitrary unary formulas, we start limiting the set of colors we consider.

The initial set of colors to consider are the unary predicates that occur in the formula we want to prove. Interestingly enough, these colors are enough to prove that the postcondition of mark and sweep is implied by the loop invariant, because the only axiom we need is $\mathit{NoExit}[\mathit{marked}, f]$.

An immediate extension that is very effective is forward and backward reachability from unary predicates, as defined in Eq. (5.6) and Eq. (5.7), respectively. Instantiating all possible axioms from the unary predicates appearing in the formula and their unary forward reachability predicates allows us to prove reverse. For a list of the axioms needed to prove reverse, see Fig. 9. Other examples are presented in Section 5.6.

Finally, we consider Boolean combinations of the above colors. Though not used in the examples shown in this paper, this is needed, for example, in the presence of sharing or when splicing two lists together.

All the colors above are based on the unary predicates that appear in the original formula. To prove the reverse example, we needed $x'$ as part of the initial colors. Table 5 gives a heuristic for finding the initial colors we need in cases when they cannot be deduced from the formula, and how it applies to reverse.

    $\mathit{NoExit}[r_{x',n}, n']$     $\mathit{GoOut}[x, x', n]$    $\mathit{NewStart}[r_{x',n}, n, n']$     $\mathit{NewStart}[r_{x',n}, n', n]$
    $\mathit{NoExit}[r_{x',n'}, n]$     $\mathit{GoOut}[x, y, n']$    $\mathit{NewStart}[r_{x',n'}, n, n']$    $\mathit{NewStart}[r_{x',n'}, n', n]$
    $\mathit{NoExit}[r_{y,n}, n']$                                    $\mathit{NewStart}[r_{y,n}, n, n']$      $\mathit{NewStart}[r_{y,n}, n', n]$
    $\mathit{NoExit}[r_{y,n'}, n]$                                    $\mathit{NewStart}[r_{y,n'}, n, n']$     $\mathit{NewStart}[r_{y,n'}, n', n]$

Figure 9: The instances of coloring axioms used in proving reverse.
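As a concrete illustration of why the color $r_{x',n}$ works for reverse while $r_{x,n}$ does not, the following sketch applies the transformer of Eq. (5.15) to one mid-reversal heap and evaluates the NoExit premise and conclusion on the result. The heap, the class name, and the method names are hypothetical; the NoExit premise is read as "no $f$-edge leaves $C$" and the conclusion as "no $f$-path leaves $C$", following the informal description in Section 5.2.

```java
import java.util.Arrays;

/** Illustrative check of NoExit instances on one concrete iteration of reverse. */
final class ReverseStepColors {

    /** Reflexive transitive closure of f (Warshall's algorithm). */
    static boolean[][] closure(boolean[][] f) {
        int n = f.length;
        boolean[][] c = new boolean[n][];
        for (int u = 0; u < n; u++) {
            c[u] = Arrays.copyOf(f[u], n);
            c[u][u] = true;
        }
        for (int k = 0; k < n; k++)
            for (int u = 0; u < n; u++)
                if (c[u][k])
                    for (int v = 0; v < n; v++)
                        if (c[k][v]) c[u][v] = true;
        return c;
    }

    /** r_{z,f}(v): v is f-reachable from a node marked by z, as in Eq. (5.6). */
    static boolean[] reach(boolean[] z, boolean[][] f) {
        boolean[][] tc = closure(f);
        boolean[] r = new boolean[f.length];
        for (int w = 0; w < f.length; w++)
            for (int v = 0; v < f.length; v++)
                r[v] |= z[w] && tc[w][v];
        return r;
    }

    /** Premise of NoExit[C, f]: no f-edge leaves the color C. */
    static boolean noExitPremise(boolean[] c, boolean[][] f) {
        for (int u = 0; u < f.length; u++)
            for (int v = 0; v < f.length; v++)
                if (c[u] && f[u][v] && !c[v]) return false;
        return true;
    }

    /** Conclusion of NoExit[C, f]: no f-path leaves the color C. */
    static boolean noExitConclusion(boolean[] c, boolean[][] f) {
        return noExitPremise(c, closure(f));
    }

    /** n' from Eq. (5.15): keep n-edges not leaving x, and add the edge from x to y. */
    static boolean[][] nextN(boolean[] x, boolean[] y, boolean[][] n) {
        boolean[][] next = new boolean[n.length][n.length];
        for (int v1 = 0; v1 < n.length; v1++)
            for (int v2 = 0; v2 < n.length; v2++)
                next[v1][v2] = (n[v1][v2] && !x[v1]) || (x[v1] && y[v2]);
        return next;
    }

    /** x' from the transformer: the n-successor of x. */
    static boolean[] nextX(boolean[] x, boolean[][] n) {
        boolean[] next = new boolean[n.length];
        for (int w = 0; w < n.length; w++)
            for (int v = 0; v < n.length; v++)
                next[v] |= x[w] && n[w][v];
        return next;
    }

    public static void main(String[] args) {
        // Mid-reversal heap: x -> 2 -> 3 -> 4 (still to reverse) and y -> 1 -> 0 (already reversed).
        boolean[][] n = new boolean[5][5];
        n[2][3] = n[3][4] = n[1][0] = true;
        boolean[] x = {false, false, true, false, false};
        boolean[] y = {false, true, false, false, false};
        boolean[][] nPrime = nextN(x, y, n);
        boolean[] good = reach(nextX(x, n), n); // r_{x',n} = {3, 4}
        boolean[] bad = reach(x, n);            // r_{x,n}  = {2, 3, 4}
        System.out.println(noExitPremise(good, nPrime) + " " + noExitConclusion(good, nPrime));
        System.out.println(noExitPremise(bad, nPrime));
    }
}
```

In this heap the new edge from node 2 to node 1 leaves $r_{x,n}$ but not $r_{x',n}$, so only the latter color yields a provable premise.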
(a)
  Group              | Criteria
  Roots[f]           | All changes are reachable from one of the colors using $f_{tc}$
  StartChange[f,g]   | All edges for which $f$ and $g$ differ start from a node in these colors
  EndChange[f,g]     | All edges for which $f$ and $g$ differ end at a node in these colors

(b)
  Group              | Colors
  Roots[n]           | $x(v)$, $y(v)$
  Roots[n']          | $x'(v)$, $y'(v)$
  StartChange[n,n']  | $x(v)$
  EndChange[n,n']    | $y(v)$, $x'(v)$

Table 5: (a) Heuristic for choosing initial colors. (b) Results of applying the heuristic on reverse.

An interesting observation is that the initial colors we need can, in many cases, be deduced from the program code. As in the previous section, we have a good way for deducing paths between colors and within colors in which the edges have not changed. The program usually manipulates fields using pointers, and can traverse an edge only in one direction. Thus, the unary predicates that represent the program variables (including the temporary variables) are in many cases what we need as initial colors.

5.4. Exploring the Search Space. When trying to automate the process of choosing colors, the problem is that the set of possible colors to choose from is doubly-exponential in the number of initial colors; giving all the axioms directly to the theorem prover is infeasible. In this section, we define a heuristic algorithm for exploring a limited number of axioms in a directed way. Pseudocode for this algorithm is shown in Fig. 10. The operator $\vdash$ is implemented as a call to a theorem prover.

Because the coloring axioms have the form $A \equiv P_A \rightarrow C_A$, the theorem prover must prove $P_A$ or the axiom is of no use. Therefore, the pseudocode works iteratively, trying to prove $P_A$ from the current $\psi \wedge \Sigma$, and if successful it adds $C_A$ to $\Sigma$.

The algorithm tries colors in increasing levels of complexity. $BC(i, C)$ gives all the Boolean combinations of the predicates in $C$ up to size $i$.
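The prove-the-premise-then-add-the-conclusion loop can be sketched as follows. This is a simplified illustration, not the authors' implementation: formulas are plain strings, `proves` stands in for the call to a theorem prover, and `candidates_at`, `toy_proves`, and `toy_candidates` are hypothetical helpers introduced only for this example.

```python
def instantiate(psi, goal, candidates_at, proves, max_level=3):
    """Grow the axiom set sigma level by level; return (proved, sigma).

    candidates_at(level) yields (premise, conclusion) pairs, i.e. (P_A, C_A).
    proves(sigma, psi, formula) is a stand-in for the theorem-prover call.
    """
    sigma = set()
    for level in range(1, max_level + 1):
        for premise, conclusion in candidates_at(level):
            if conclusion in sigma:
                continue  # conclusion already acquired, no need to re-prove
            if proves(sigma, psi, premise):
                sigma.add(conclusion)  # premise holds, so the conclusion is usable
        if proves(sigma, psi, goal):
            return True, sigma
    return False, sigma


def toy_proves(sigma, psi, formula):
    # Toy prover: a formula counts as proved only if it is psi itself
    # or has already been added to sigma.
    return formula == psi or formula in sigma


def toy_candidates(level):
    # Toy candidate pool; higher levels include everything from lower levels.
    pool = {1: [("psi", "a"), ("b", "c")], 2: [("a", "b"), ("b", "goal")]}
    return [axiom for lvl in sorted(pool) if lvl <= level for axiom in pool[lvl]]


if __name__ == "__main__":
    print(instantiate("psi", "goal", toy_candidates, toy_proves))
```

In the toy run, the conclusion `c` is never acquired because its premise `b` was not yet available when that candidate was tried, which illustrates why the order in which candidates are attempted matters.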
After each iteration we try to prove the goal formula. Sometimes we need the conclusion of one axiom to prove the premise of another. The NoExit axioms are particularly useful for proving $P_{\mathit{NewStart}}$. Therefore, we need a way to order instantiations so that axioms useful for proving the premises of other axioms are acquired first. The ordering we chose is based on phases: First, try to instantiate axioms from the axiom scheme GoOut. Second, try to instantiate axioms from the axiom scheme NoExit. Finally, try to instantiate axioms from the axiom scheme NewStart. For $\mathit{NewStart}[c, f, g]$ to be useful, we need to be able to show that there are either no incoming $f$-paths or no outgoing $f$-paths from $c$. Thus, we only try to instantiate such an axiom when either $P_{\mathit{NoExit}}[c, f]$ or $P_{\mathit{NoExit}}[\neg c, f]$ has been proven.

  explore(Init, χ) {
    Let χ = ψ → ϕ
    Σ := { Trans[f], Order[f] | f ∈ F }
    Σ := Σ ∪ { T1[f], T2[f] | f ∈ F }
    C := { r_{c,f}(v) | c ∈ Init, f ∈ F }
    C := C ∪ Init
    i := 1
    forever {
      C′ := BC(i, C)
      // Phase 1
      foreach f ∈ F, c_s ≠ c_e ∈ C′
        if Σ ∧ ψ ⊢ P_GoOut[c_s, c_e, f]
          Σ := Σ ∪ { C_GoOut[c_s, c_e, f] }
      // Phase 2
      foreach f ∈ F, c ∈ C′
        if Σ ∧ ψ ⊢ P_NoExit[c, f]
          Σ := Σ ∪ { C_NoExit[c, f] }
      // Phase 3
      foreach C_NoExit[c, f] ∈ Σ, g ≠ f ∈ F
        if Σ ∧ ψ ⊢ P_NewStart[c, f, g]
          Σ := Σ ∪ { C_NewStart[c, f, g] }
      if Σ ∧ ψ ⊢ ϕ
        return SUCCESS
      i := i + 1
    }
  }

Figure 10: An iterative algorithm for instantiating the axiom schemes. Each iteration consists of three phases that augment the axiom set Σ.

5.5. Implementation. The algorithm presented here was implemented using a Perl script and the SPASS theorem prover [WGR96] and used successfully to verify the example programs of Section 5.1 and Section 5.6. The method described above can be optimized.
For instance, if $C_A$ has already been added to the axioms, we do not try to prove $P_A$ again. These details are important in practice, but have been omitted for brevity.

When trying to prove the different premises, SPASS may fail to terminate if the formula that it is trying to prove is invalid. Thus, we limit the time that SPASS can spend proving each formula. It is possible that we will fail to acquire useful axioms this way.

5.6. Further Examples. This section shows the code (Fig. 11) and the complete specification of two additional examples: appending two linked lists, and the mark phase of a simple mark and sweep garbage collector.

(a)
  Node append(Node x, Node y) {
    Node last = x;
    if (last == null)
      return y;
    while (last.next != null) {
      last = last.next;
    }
    last.next = y;
    return x;
  }

(b)
  void mark(NodeSet root, NodeSet marked) {
    Node x;
    if (!root.isEmpty()) {
      NodeSet pending = new NodeSet();
      pending.addAll(root);
      marked.clear();
      while (!pending.isEmpty()) {
        x = pending.selectAndRemove();
        marked.add(x);
        if (x.car != null &&
            !marked.contains(x.car))
          pending.add(x.car);
        if (x.cdr != null &&
            !marked.contains(x.cdr))
          pending.add(x.cdr);
      }
    }
  }

Figure 11: A simple Java-like implementation of (a) the concatenation procedure for two singly-linked lists; (b) the mark phase of a mark-and-sweep garbage collector.

5.6.1. Specification of append. The specification of append (see Fig. 11(a)) is given in Fig. 12. The specification includes the procedure's pre-condition, a transformer of the procedure's body effect, and the procedure's post-condition. The pre-condition (Fig. 12(a)) states that the lists pointed to by $x$ and $y$ are acyclic, unshared and disjoint. It also states there is no garbage.
The post-condition (Fig. 12(b)) states that after the procedure's execution, the list pointed to by $x'$ is exactly the union of the lists pointed to by $x$ and $y$. Also, the list is still acyclic and unshared. The transformer is given in Fig. 12(c). The result of the loop in the procedure's body is summarized as a formula defining the $\mathit{last}$ variable. The only change to $n$ is the addition of an edge between $\mathit{last}$ and $y$.

The coloring axioms needed to prove append are given in Fig. 13.

(a)
$$\mathit{pre} \stackrel{\mathrm{def}}{=} \mathit{acyclic}[n] \wedge \mathit{unshared}[n] \wedge \mathit{unique}[x] \wedge \mathit{unique}[y] \wedge \mathit{func}[n] \wedge (\forall v.\, \neg r_{x,n}(v) \vee \neg r_{y,n}(v)) \wedge \forall v.\, r_{x,n}(v) \vee r_{y,n}(v) \qquad (5.16)$$

(b)
$$\mathit{post} \stackrel{\mathrm{def}}{=} \mathit{acyclic}[n'] \wedge \mathit{unshared}[n'] \wedge \mathit{unique}[x'] \wedge \mathit{unique}[\mathit{last}] \wedge \mathit{func}[n'] \wedge (\forall v.\, r_{x',n'}(v) \leftrightarrow (r_{x,n}(v) \vee r_{y,n}(v))) \wedge \forall v_1, v_2.\, n'(v_1, v_2) \leftrightarrow n(v_1, v_2) \vee (\mathit{last}(v_1) \wedge y(v_2)) \qquad (5.17)$$

(c) $T$ is the conjunction of the following formulas:
$$\forall v.\, x'(v) \leftrightarrow x(v) \qquad (5.18)$$
$$\forall v.\, \mathit{last}(v) \leftrightarrow r_{x,n}(v) \wedge \forall u.\, \neg n(v, u) \qquad (5.19)$$
$$\exists v.\, \mathit{last}(v) \qquad (5.20)$$
$$\forall v_1, v_2.\, n'(v_1, v_2) \leftrightarrow n(v_1, v_2) \vee (\mathit{last}(v_1) \wedge y(v_2)) \qquad (5.21)$$

Figure 12: Example specification of append procedure: (a) precondition $\mathit{pre}$, (b) postcondition $\mathit{post}$, (c) transformer $T$ (effect of the procedure body).

  $\mathit{NoExit}[r_{y,n}, n']$          $\mathit{GoOut}[\mathit{last}, y, n']$
  $\mathit{NewStart}[r_{x,n}, n, n']$     $\mathit{NewStart}[r_{x,n}, n', n]$
  $\mathit{NewStart}[r_{y,n}, n, n']$     $\mathit{NewStart}[r_{y,n}, n', n]$

Figure 13: The instances of coloring axioms used in proving append.

5.6.2. Specification of the mark phase. Another example proven is the mark phase of a mark-and-sweep sequential garbage collector, shown in Fig. 11(b). The example goes beyond the reverse example in that it manipulates a general graph and not just a linked list.
Furthermore, as far as we know, ESC/Java [FLL+02] was not able to prove its correctness because it could not show that unreachable elements were not marked. Note that the axiom needed to prove this property is NoExit, which we have shown to be beyond the power of Nelson's axiomatization.

The loop invariant of mark is given in Fig. 14(a). The first disjunct of the formula holds only in the first iteration, when only the nodes in root are pending and nothing is marked. The second holds from the second iteration on. Here, the nodes in root are marked or pending (they start as pending, and the only way to stop being pending is to become marked). No node is both marked and pending (because the procedure checks if the node is marked before adding it to pending). All nodes that are marked or pending are reachable from the root set (we start with only the root nodes as pending, and after that only nodes that are neighbors of pending nodes become pending; furthermore, only pending nodes may become marked). There are no edges between marked nodes and nodes that are neither marked nor pending (because when we mark a node we add all its neighbors to pending, unless they are marked already). Our method succeeded in proving the loop invariant in Fig. 14(a) using only the positive axioms.

The post-condition of mark is given in Fig. 14(b). To prove it, we had to use the fact that there are no edges between marked and unmarked nodes (i.e., there are no pending nodes at the end of the loop). Thus, we instantiate the axiom $\mathit{NoExit}[\mathit{marked}, f]$, and this is enough to prove the post-condition.

(a)
$$((\forall v.\, \mathit{root}(v) \leftrightarrow \mathit{pending}(v)) \;\wedge \qquad (5.22)$$
$$(\forall v.\, \neg \mathit{marked}(v))) \qquad (5.23)$$
$$\vee\; ((\forall v.\, \mathit{root}(v) \rightarrow \mathit{marked}(v) \vee \mathit{pending}(v)) \;\wedge \qquad (5.24)$$
$$(\forall v.\, \neg \mathit{pending}(v) \vee \neg \mathit{marked}(v)) \;\wedge \qquad (5.25)$$
$$(\forall v.\, \mathit{pending}(v) \vee \mathit{marked}(v) \rightarrow r_{\mathit{root},f}(v)) \;\wedge \qquad (5.26)$$
$$(\forall v_1, v_2.\, \mathit{marked}(v_1) \wedge \neg \mathit{marked}(v_2) \wedge \neg \mathit{pending}(v_2) \rightarrow \neg f(v_1, v_2))) \qquad (5.27)$$

(b)
$$\forall v.\, \mathit{marked}(v) \leftrightarrow r_{\mathit{root},f}(v) \qquad (5.28)$$

Figure 14: Example specification of mark procedure: (a) The loop invariant of mark, (b) The post-condition of mark.

6. APPLICABILITY OF THE COLORING AXIOMS

The coloring axioms are applicable to a wide variety of verification problems. To demonstrate this, we describe the reasoning done by the TVLA system and how it can be simulated using the coloring axioms. TVLA is based on the theory of abstract interpretation [CC79] and specifically on canonical abstraction [SRW02]. TVLA has been successfully used to analyze a large variety of small but intricate heap-manipulating programs (see e.g., [LAS00, BLARS07]), including the verification of several algorithms (see e.g., [LARSW00, LRS06]). Furthermore, the axioms described in this paper have been used to integrate SPASS as the reasoning engine behind the TVLA system. The integrated system is used to perform backward analysis on heap-manipulating programs as described in [LASR07].

In [SRW02], logical structures are used to represent the concrete stores of the program, and FO(TC) is used to specify the concrete transformers. This provides great flexibility in what programming-language constructs the method can handle. For the purpose of this section, we assume that the vocabulary used is fixed and always contains equality. Furthermore, we assume that the transformer cannot change the universe of the concrete store. Allocation and deallocation can be easily modeled by using a designated unary predicate that holds for the allocated heap cells. Similarly, we assume that the universe of the concrete store is non-empty.
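As a small illustration of the fixed-universe assumption, the sketch below models allocation and deallocation purely by flipping a designated unary predicate. The class and method names (`Store`, `malloc`, `free`) and the finite universe are assumptions made for this example only; they are not part of TVLA or of the formalization in [SRW02].

```python
class Store:
    """Concrete store over a fixed universe; `alloc` is the designated predicate."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("the universe must be non-empty")
        self.universe = tuple(range(size))  # never changes after construction
        self.alloc = set()                  # alloc(v) holds exactly for v in this set

    def malloc(self):
        """Allocate by making alloc(v) true for some currently unallocated cell."""
        for v in self.universe:
            if v not in self.alloc:
                self.alloc.add(v)
                return v
        raise MemoryError("no unallocated cell left in this finite model")

    def free(self, v):
        """Deallocate by making alloc(v) false; the cell stays in the universe."""
        self.alloc.discard(v)


if __name__ == "__main__":
    store = Store(3)
    a = store.malloc()
    b = store.malloc()
    store.free(a)
    print(store.universe, sorted(store.alloc))
```

Neither operation adds or removes individuals, so a transformer written this way only changes the interpretation of predicates, which is the form the section assumes.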
Abstract stores are represented as finite 3-valued logical structures. We shall explain the meaning of a structure $S$ by describing the formula $\widehat{\gamma}(S)$ to which it corresponds. The individuals of a 3-valued logical structure are called abstract nodes. We use an auxiliary unary predicate for each abstract node to capture the concrete nodes that are mapped to it. For an abstract structure with universe $\{\mathit{node}_1, \ldots, \mathit{node}_n\}$, let $\{a_1, \ldots, a_n\}$ be the corresponding unary predicates.

For each $k$-ary predicate $p$ in the vocabulary, each $k$-tuple $\langle \mathit{node}_1, \ldots, \mathit{node}_k \rangle$ in the abstract structure (called an abstract tuple) can have one of the truth values $\{0, 1, \tfrac{1}{2}\}$, as follows:
• The truth value $1$ means that the predicate $p$ universally holds for all of the concrete tuples mapped to this abstract tuple, i.e.,
$$\forall v_1, \ldots, v_k.\, a_1(v_1) \wedge \ldots \wedge a_k(v_k) \rightarrow p(v_1, \ldots, v_k) \qquad (6.1)$$
• The truth value $0$ means that the predicate $p$ universally does not hold for all of the concrete tuples mapped to this abstract tuple, i.e.,
$$\forall v_1, \ldots, v_k.\, a_1(v_1) \wedge \ldots \wedge a_k(v_k) \rightarrow \neg p(v_1, \ldots, v_k) \qquad (6.2)$$
• The truth value $\tfrac{1}{2}$ means that we have no information about this abstract tuple, and thus the value of the predicate $p$ is not restricted.

We use a designated set of unary predicates called abstraction predicates to control the distinctions among concrete nodes that can be made in an abstract element, which also places a bound on the size of abstract elements. For each abstract node $\mathit{node}_i$, $A_i$ denotes the set of abstraction predicates for which $\mathit{node}_i$ has the truth value $1$, and $\overline{A_i}$ denotes the set of abstraction predicates for which $\mathit{node}_i$ has the truth value $0$. Every pair $\mathit{node}_i$, $\mathit{node}_j$ of different abstract nodes satisfies either $A_i \cap \overline{A_j} \neq \emptyset$ or $\overline{A_i} \cap A_j \neq \emptyset$.
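The truth values in (6.1) and (6.2) can be made concrete with a short sketch. It assumes canonical abstraction in the sense of [SRW02], where concrete nodes that agree on all abstraction predicates are merged into one abstract node; the function names and the dictionary encoding of predicates are hypothetical and chosen only for this example.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # the indefinite truth value 1/2


def kleene(values):
    """1 if the predicate holds for every concrete tuple, 0 if for none, else 1/2."""
    seen = set(values)
    if seen == {True}:
        return 1
    if seen == {False}:
        return 0
    return HALF


def canonical_abstraction(nodes, unary, binary, abstraction_preds):
    """unary: name -> set of nodes; binary: name -> set of node pairs."""
    # Concrete nodes with the same abstraction-predicate values share a block.
    blocks = {}
    for v in nodes:
        key = tuple(v in unary[p] for p in abstraction_preds)
        blocks.setdefault(key, []).append(v)
    abs_unary = {
        p: {a: kleene(v in members for v in blocks[a]) for a in blocks}
        for p, members in unary.items()
    }
    abs_binary = {
        p: {(a, b): kleene((u, v) in pairs for u in blocks[a] for v in blocks[b])
            for a in blocks for b in blocks}
        for p, pairs in binary.items()
    }
    return blocks, abs_unary, abs_binary


if __name__ == "__main__":
    # A four-cell list 0 -> 1 -> 2 -> 3 with x pointing to cell 0.
    result = canonical_abstraction(
        [0, 1, 2, 3], {"x": {0}}, {"n": {(0, 1), (1, 2), (2, 3)}}, ["x"])
    print(result)
```

In the example, the three cells not pointed to by `x` collapse into one abstract node, and the `n` edge from the head into that node is indefinite because it holds for only one of the three concrete pairs.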
In addition, we require that the abstract nodes in the structure represent all the concrete nodes, i.e., $\forall v.\, \bigvee_i a_i(v)$. Thus, the abstract nodes form a bounded partition of the concrete nodes. Finally, each node must represent at least one concrete node, i.e., $\exists v.\, a_i(v)$.

The vocabulary may contain additional predicates called derived predicates, which are explicitly defined from other predicates using a formula in FO(TC). These derived predicates help the precision of the analysis by recording correlations not captured by the universal information. Some of the unary derived predicates may also be abstraction predicates, and thus can induce finer-granularity abstract nodes.

We say that $S_1 \sqsubseteq S_2$ if there is a total mapping $m$ between the abstract nodes of $S_1$ and the abstract nodes of $S_2$ such that $S_2$ represents all of the concrete stores that $S_1$ represents when considering each abstract node of $S_2$ as a union of the abstract nodes of $S_1$ mapped to it by $m$. Formally, $\widehat{\gamma}(S_1) \wedge \psi_m \rightarrow \widehat{\gamma}(S_2)$, where
$$\psi_m = \bigwedge_{\substack{\mathit{node}_i \in S_1 \\ m(\mathit{node}_i) = \mathit{node}'_j}} \forall v.\, a_i(v) \rightarrow a'_j(v)$$
The order is extended to sets using the induced Hoare order (i.e., $XS_1 \sqsubseteq XS_2$ if for each element $S_1 \in XS_1$ there exists an element $S_2 \in XS_2$ such that $S_1 \sqsubseteq S_2$).

In the original TVLA implementation [LAS00] the abstract transformer is computed by a three-step process:
• First, a heuristic is used to perform case splits by refining the partition induced by the abstraction predicates. This process is called Focus.
• Second, the formulas comprising the concrete transformer are used to conservatively approximate the effect of the concrete transformer on all the represented memory states. Update formulas are either handwritten or derived using finite differencing [RSL03].
• Third, a constraint solver called Coerce is used to improve the precision of the abstract element by taking advantage of the inter-dependencies between the predicates dictated by the defining formulas of the derived predicates and constraints of the programming language semantics.

Most of the logical reasoning performed by TVLA is first order in nature. The transitive-closure reasoning consists of three parts:
(1) The update formulas for derived predicates based on transitive closure use first-order formulas to update the transitive-closure relation, as explained in Section 6.1.
(2) The Coerce procedure relates the definition of the edge relation with its transitive closure by performing Kleene evaluation (see below).
(3) Handwritten axioms are given to Coerce to allow additional transitive-closure reasoning. They are usually written once and for all per data structure analyzed by the system.

To compare the transitive-closure reasoning of TVLA and the coloring axioms presented in this paper, we concentrate on programs that manipulate singly-linked lists and trees, although the basic argument holds for other data structures analyzed by TVLA as well. The handwritten axioms used by TVLA for these cases are all covered by the axioms described in Section 3.2. The issue of update formulas is covered in detail in Section 6.1. A detailed description of Kleene evaluation is beyond the scope of this paper and can be found in [SRW02]. Kleene evaluation of transitive closure is equivalent to applying transitivity to infer the existence of paths, and finding a subset of the partition that has no outgoing edges to infer the absence of paths. The latter is equivalent to applying the NoExit axiom on the formula that defines the appropriate partition.

6.1. Precise Update.
Maintenance of transitive closure through updates in the underlying relation is required for the verification of heap-manipulating programs. In general, it is not possible to update transitive closure for arbitrary change using first-order-logic formulas. Instead, we limit the discussion to unit changes (i.e., the addition or removal of a single edge). Work in descriptive dynamic complexity [PI97, Hes03] and database theory [DS95] gives first-order update formulas to unit changes in several classes of graphs, including functional graphs and acyclic graphs.

We demonstrate the applicability of the proposed axiom schemes by showing how they can be used to prove the precise update formula for unit changes in several classes of graphs.

6.1.1. Edge addition. We refer to the edge relation before the update by $e$ and the edge relation after the update by $e'$. Adding an edge from $s$ to $t$ can be formulated as $\forall v_1, v_2.\, e'(v_1, v_2) \leftrightarrow (e(v_1, v_2) \vee (s(v_1) \wedge t(v_2)))$. The precise update formula for this change is
$$\exists v_s, v_t.\, s(v_s) \wedge t(v_t) \wedge \forall v_1, v_2.\, e'_{tc}(v_1, v_2) \leftrightarrow (e_{tc}(v_1, v_2) \vee (e_{tc}(v_1, v_s) \wedge e_{tc}(v_t, v_2)))$$
We have used SPASS to prove the validity of this update formula using the color axioms described in this paper. The basic colors needed are $r_{t,e}$, i.e., forward reachability from the target of the new edge, and $r_{s,\overleftarrow{e}}$, i.e., backward reachability from the source of the new edge. The axioms instantiated in the proof are given in Table 6(a).

6.1.2. Edge removal. There is no known precise formula for updating the transitive closure of a general graph. For general acyclic graphs, Dong and Su [DS95] give a precise update formula that is beyond the scope of this work.
For functional graphs, Hesse [Hes03] gives precise update formulas based on either an auxiliary binary relation, or by using a ternary relation to describe paths in the graph that pass through each node. Without these additions, it is not possible to give precise update formulas in the presence of cyclicity.

When limiting the discussion to acyclic graphs in which between any two nodes there is at most one path (such as acyclic functional graphs and trees) it is possible to give a simple precise update formula. As before, let $s$ be the source of the edge to be removed and $t$ be the target of the edge. The formula for removing an edge is $\forall v_1, v_2.\, e'(v_1, v_2) \leftrightarrow (e(v_1, v_2) \wedge \neg(s(v_1) \wedge t(v_2)))$.

(a)
  $\mathit{NewStart}[\mathit{true}, e, e']$
  $\mathit{NewStart}[r_{t,e} \wedge \neg r_{s,\overleftarrow{e}}, e', e]$
  $\mathit{NewStart}[\neg r_{t,e} \wedge r_{s,\overleftarrow{e}}, e', e]$
  $\mathit{NewStart}[\neg r_{t,e}, e', e]$
  $\mathit{NewStart}[\neg r_{s,\overleftarrow{e}}, e', e]$
  $\mathit{NoExit}[\neg r_{s,\overleftarrow{e}}, e']$
  $\mathit{NoExit}[r_{t,e}, e']$

(b)
  $\mathit{NewStart}[\mathit{true}, e', e]$
  $\mathit{NewStart}[r_{t,e}, e, e']$
  $\mathit{NewStart}[r_{s,\overleftarrow{e}}, e, e']$
  $\mathit{NewStart}[\neg r_{t,e}, e, e']$
  $\mathit{NewStart}[\neg r_{s,\overleftarrow{e}}, e, e']$
  $\mathit{NoExit}[r_{s,\overleftarrow{e}}, e']$

(c)
  $\mathit{NewStart}[\mathit{true}, e', e]$
  $\mathit{NewStart}[r_{t,e}, e, e']$
  $\mathit{NewStart}[r_{s,\overleftarrow{e}}, e, e']$
  $\mathit{NewStart}[\neg r_{t,e}, e, e']$
  $\mathit{NewStart}[\neg r_{s,\overleftarrow{e}}, e, e']$
  $\mathit{NoExit}[\neg r_{t,e}, e']$

Table 6: Axioms instantiated for the proof of the precise update formula of: (a) adding an edge to a general graph, (b) removing an edge from an acyclic functional graph, and (c) removing an edge from a tree.

The precise update formula for this change is
$$\exists v_s, v_t.\, s(v_s) \wedge t(v_t) \wedge \forall v_1, v_2.\, e'_{tc}(v_1, v_2) \leftrightarrow (e_{tc}(v_1, v_2) \wedge \neg(e_{tc}(v_1, v_s) \wedge e_{tc}(v_t, v_2)))$$
We have used SPASS to prove the validity of this update formula for the case of acyclic functional graphs and the case of trees.
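Independently of the theorem-prover proofs, both update formulas can be sanity-checked by brute force on small graphs. The sketch below is such a check, not a proof: it assumes that $e_{tc}$ denotes the reflexive transitive closure, and all function names are hypothetical. It also includes a diamond-shaped graph showing why the removal formula needs the at-most-one-path restriction.

```python
from itertools import product


def rtc(nodes, edges):
    """Reflexive transitive closure of `edges` over `nodes` (Warshall's algorithm)."""
    reach = {(v, v) for v in nodes} | set(edges)
    for k in nodes:
        for a in nodes:
            for b in nodes:
                if (a, k) in reach and (k, b) in reach:
                    reach.add((a, b))
    return reach


def tc_after_add(nodes, tc, s, t):
    """Update formula for adding the edge (s, t)."""
    return {(a, b) for a in nodes for b in nodes
            if (a, b) in tc or ((a, s) in tc and (t, b) in tc)}


def tc_after_remove(nodes, tc, s, t):
    """Update formula for removing the edge (s, t)."""
    return {(a, b) for a in nodes for b in nodes
            if (a, b) in tc and not ((a, s) in tc and (t, b) in tc)}


def addition_holds(n):
    """Check the addition formula on every directed graph with n nodes."""
    nodes = range(n)
    pairs = list(product(nodes, repeat=2))
    for mask in range(2 ** len(pairs)):
        edges = {p for i, p in enumerate(pairs) if (mask >> i) & 1}
        tc = rtc(nodes, edges)
        for s, t in pairs:
            if tc_after_add(nodes, tc, s, t) != rtc(nodes, edges | {(s, t)}):
                return False
    return True


def removal_holds_on_acyclic_functional(n):
    """Check the removal formula on every acyclic functional graph with n nodes."""
    nodes = range(n)
    for succ in product([None, *nodes], repeat=n):
        edges = {(v, w) for v, w in enumerate(succ) if w is not None}
        tc = rtc(nodes, edges)
        has_self_loop = any(v == w for v, w in edges)
        has_cycle = any(a != b and (b, a) in tc for a, b in tc)
        if has_self_loop or has_cycle:
            continue  # only acyclic graphs are in scope
        for s, t in edges:
            if tc_after_remove(nodes, tc, s, t) != rtc(nodes, edges - {(s, t)}):
                return False
    return True


if __name__ == "__main__":
    print(addition_holds(3), removal_holds_on_acyclic_functional(4))
    # Diamond 0->1->3, 0->2->3: two paths from 0 to 3, so the formula is too eager.
    nodes, diamond = range(4), {(0, 1), (1, 3), (0, 2), (2, 3)}
    print((0, 3) in tc_after_remove(nodes, rtc(nodes, diamond), 1, 3),
          (0, 3) in rtc(nodes, diamond - {(1, 3)}))
```

On the diamond, the removal formula drops the pair (0, 3) even though node 3 is still reachable from node 0 through node 2, which matches the restriction to graphs with at most one path between any two nodes.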
As in edge addition, $r_{t,e}$ and $r_{s,\overleftarrow{e}}$ are used as the basic colors. The axioms instantiated in the proof are given in Table 6(b) and Table 6(c).

7. RELATED WORK

Shape Analysis. This work was motivated by our experience with TVLA [LAS00, SRW02], which is a generic system for abstract interpretation [CC77]. The TVLA system is more automatic than the methods described in this paper since it does not rely on user-supplied loop invariants. However, the techniques presented in the present paper are potentially more precise due to the use of full first-order reasoning. It can be shown that the NoExit scheme allows us to infer reachability at least as precisely as evaluation rules for 3-valued logic with Kleene semantics. In the future, we hope to develop an efficient non-interactive theorem prover that enjoys the benefits of both approaches. An interesting observation is that the colors needed in our examples to prove the formula are the same unary predicates used by TVLA to define its abstraction. This similarity may, in the future, help us find better ways to automatically instantiate the required axioms. In particular, inductive logic programming has recently been used to learn formulas to use in TVLA abstractions [LRS05], which holds out the possibility of applying similar methods to further automate the approach of the present paper.

Decidable Logics. Decidable logics can be employed to define properties of linked data structures: Weak monadic second-order logic has been used in [EMS00, MS01] to define properties of heap-allocated data structures, and to conduct Hoare-style verification using programmer-supplied loop invariants in the PALE system [MS01]. A decidable logic called $L_r$ (for "logic of reachability expressions") was defined in [BRS99]. $L_r$ is rich enough to express the shape descriptors studied in [SRW98] and the path matrices introduced in [Hen90].
More recent decidable logics include Logic of Reachable Patterns [YRS+06] and a decision procedure for linked data structures that can handle singly linked lists [BR06].

The present paper does not develop decision procedures, but instead suggests methods that can be used in conjunction with existing theorem provers. Thus, the techniques are incomplete and the theorem provers need not terminate. However, our initial experience is that the extra flexibility gained by the use of first-order logic with transitive closure is promising. For example, we can prove the correctness of imperative destructive list-reversal specified in a natural way and the correctness of mark and sweep garbage collectors, which are beyond the scope of Mona and $L_r$.

Indeed, in [IRR+04b], we have tried to simulate existing data structures using decidable logics and realized that this can be tricky because the programmer may need to prove a specific simulation invariant for a given program. Giving an inaccurate simulation invariant causes the simulation to be unsound. One of the advantages of the technique described in the present paper is that soundness is guaranteed no matter which axioms are instantiated. Moreover, the simulation requirements are not necessarily expressible in the decidable logic.

Other First-Order Axiomatizations of Linked Data Structures. The closest approach to ours that we are aware of was taken by Nelson as we describe in Section 4. This also has some follow-up work by Leino and Joshi [Lei98]. Our impression from their write-up is that Leino and Joshi's work can be pushed forward by using our coloring axioms.

A more recent work by Lahiri and Qadeer [LQ06] uses first-order axiomatization. This work can be seen as a specialization of ours to the case of (cyclic) singly linked lists.
Dynamic Maintenance of Transitive Closure. Another orthogonal but promising approach to transitive closure is to maintain reachability relations incrementally as we make unit changes in the data structure. It is known that in many cases, reachability can be maintained by first-order formulas [DS95, PI97] and even sometimes by quantifier-free formulas [Hes03]. Furthermore, in these cases, it is often possible to automatically derive the first-order update formulas using finite differencing [RSL03].

8. CONCLUSION

This paper reports on our proposal of a new methodology for using off-the-shelf first-order theorem provers to reason about reachability in programs. We have explored many of the theoretical issues as well as presenting examples that, while still preliminary, suggest that this is indeed a viable approach.

As mentioned earlier, proving the absence of paths is the difficult part of proving formulas with TC. The promise of our approach is that it is able to handle such formulas effectively and reasonably automatically, as shown by the fact that it can successfully handle the programs described in Section 5 and the success of the TVLA system, which uses similar transitive-closure reasoning. Of course, much further work is needed including the following:
• Exploring other heuristics for identifying color classes.
• Exploring variations of the algorithm given in Fig. 10 for instantiating coloring axioms.
• Exploring the use of additional axiom schemes, such as two of the schemes from [Nel83], which are likely to be useful when dealing with predicates that are partial functions.
Such predicates arise in programs that manipulate singly-linked or doubly-linked lists or, more generally, data structures that are acyclic in one or more "dimensions" [HHN92] (i.e., in which the iterated application of a given field selector can never return to a previously visited node).
• Additional work should be done on the theoretical power of $T_1$ + IND and related axiomatizations of transitive closure. We conjecture, for example, that $T_1$ + IND is TC-complete for trees.

Acknowledgements. Thanks to Aharon Abadi and Roman Manevich for interesting suggestions. Thanks to Viktor Kuncak for useful conversations including his observation and proof of Proposition 4.4.

REFERENCES

[Avr03] A. Avron. Transitive closure and the mechanization of mathematics. In Thirty Five Years of Automating Mathematics, pages 149–171. Kluwer Academic Publishers, 2003.
[BLARS07] I. Bogudlov, T. Lev-Ami, T. Reps, and M. Sagiv. Revamping TVLA: Making parametric shape analysis competitive. In CAV, 2007.
[BR06] J. Bingham and Z. Rakamaric. A logic and decision procedure for predicate abstraction of heap-manipulating programs. In VMCAI, pages 207–221, 2006.
[BRS99] M. Benedikt, T. Reps, and M. Sagiv. A decidable logic for describing linked data structures. In European Symp. On Programming, pages 2–19, March 1999.
[CC77] Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL '77: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 238–252. ACM Press, 1977.
[CC79] P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Symp. on Princ. of Prog. Lang., pages 269–282, New York, NY, 1979. ACM Press.
[DS95] G. Dong and J. Su. Incremental and decremental evaluation of transitive closure by first-order queries. Inf. & Comput., 120:101–106, 1995.
[EMS00] J. Elgaard, A. Møller, and M.I. Schwartzbach. Compile-time debugging of C programs working on trees. In European Symp. On Programming, pages 119–134, 2000.
[FLL+02] C. Flanagan, K.R.M. Leino, M. Lillibridge, G. Nelson, J.B. Saxe, and R. Stata. Extended static checking for Java. In SIGPLAN Conf. on Prog. Lang. Design and Impl., 2002.
[GME99] E. Grädel, M. Otto, and E. Rosen. Undecidability results on two-variable logics. Archive of Math. Logic, 38:313–354, 1999.
[Hen90] L. Hendren. Parallelizing Programs with Recursive Data Structures. PhD thesis, Cornell Univ., Ithaca, NY, Jan 1990.
[Hes03] W. Hesse. Dynamic Computational Complexity. PhD thesis, Department of Computer Science, UMass, Amherst, July 2003.
[HHN92] L. Hendren, J. Hummel, and A. Nicolau. Abstractions for recursive pointer data structures: Improving the analysis and the transformation of imperative programs. In SIGPLAN Conf. on Prog. Lang. Design and Impl., pages 249–260, New York, NY, June 1992. ACM Press.
[Hoa75] C.A.R. Hoare. Recursive data structures. Int. J. of Comp. and Inf. Sci., 4(2):105–132, 1975.
[IRR+04a] N. Immerman, A. Rabinovich, T. Reps, M. Sagiv, and G. Yorsh. The boundary between decidability and undecidability of transitive closure logics. In CSL'04, 2004.
[IRR+04b] N. Immerman, A. Rabinovich, T. Reps, M. Sagiv, and G. Yorsh. Verification via structure simulation. In Proc. Computer-Aided Verif., pages 281–294, 2004.
[LARSW00] T. Lev-Ami, T. Reps, M. Sagiv, and R. Wilhelm. Putting static analysis to work for verification: A case study. In ISSTA 2000: Proc. of the Int. Symp. on Software Testing and Analysis, pages 26–38, 2000.
[LAS00] T. Lev-Ami and M. Sagiv. TVLA: A system for implementing static analyses. In Static Analysis Symp., pages 280–301, 2000.
[LASR07] T. Lev-Ami, M. Sagiv, and T. Reps. Backward analysis for inferring quantified preconditions. Submitted for publication, 2007.
[Lei98] R. Leino. Recursive object types in a logic of object-oriented programs. Nordic J. of Computing, 5:330–360, 1998.
[LQ06] S. K. Lahiri and S. Qadeer. Verifying properties of well-founded linked lists. In POPL, pages 115–126, 2006.
[LRS05] A. Loginov, T. Reps, and M. Sagiv. Abstraction refinement via inductive learning. In Proc. Computer-Aided Verif., 2005.
[LRS06] A. Loginov, T. Reps, and M. Sagiv. Automatic verification of the Deutsch-Schorr-Waite tree-traversal algorithm. In SAS, 2006.
[MP71] R. McNaughton and S. Papert. Counter-Free Automata. MIT Press, 1971.
[MS01] A. Møller and M.I. Schwartzbach. The pointer assertion logic engine. In SIGPLAN Conf. on Prog. Lang. Design and Impl., pages 221–231, 2001.
[Nel83] G. Nelson. Verifying reachability invariants of linked structures. In Symp. on Princ. of Prog. Lang., pages 38–47, 1983.
[PI97] S. Patnaik and N. Immerman. Dyn-FO: A parallel, dynamic complexity class. Journal of Computer and System Sciences, 55(2):199–209, October 1997.
[RSL03] T. Reps, M. Sagiv, and A. Loginov. Finite differencing of logical formulas for static analysis. In European Symp. On Programming, pages 380–398, 2003.
[RSW04] T. Reps, M. Sagiv, and R. Wilhelm. Static program analysis via 3-valued logic. In CAV, pages 15–30, 2004.
[SRW98] M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. Trans. on Prog. Lang. and Syst., 20(1):1–50, January 1998.
[SRW02] M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. Trans. on Prog. Lang. and Syst., 2002.
[WGR96] Christoph Weidenbach, Bernd Gaede, and Georg Rock. Spass & flotter version 0.42. In CADE-13: Proceedings of the 13th International Conference on Automated Deduction, pages 141–145. Springer-Verlag, 1996.
[YRS+06] G. Yorsh, A. Rabinovich, M. Sagiv, A. Meyer, and A. Bouajjani. A logic of reachable patterns in linked data-structures. In FOSSACS, 2006.

This work is licensed under the Creative Commons Attribution-NoDerivs License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.0/ or send a letter to Creative Commons, 171 Second St, Suite 300, San Francisco, CA 94105, USA, or Eisenacher Strasse 2, 10777 Berlin, Germany
