On the distributed evaluation of recursive queries over graphs
Logical formalisms such as first-order logic (FO) and fixpoint logic (FP) are well suited to express in a declarative manner fundamental graph functionalities required in distributed systems. We show that these logics constitute good abstractions for…
Authors: Stéphane Grumbach, Fang Wang, Zhilin Wu
Stéphane Grumbach∗ Fang Wang†‡ Zhilin Wu§

Abstract

Logical formalisms such as first-order logic (FO) and fixpoint logic (FP) are well suited to express in a declarative manner fundamental graph functionalities required in distributed systems. We show that these logics constitute good abstractions for programming distributed systems as a whole, since they can be evaluated in a fully distributed manner with reasonable complexity upper bounds. We first prove that FO and FP can be evaluated with a polynomial number of messages of logarithmic size. We then show that the (global) logical formulas can be translated into rule programs describing the local behavior of the nodes of the distributed system, which compute equivalent results. Next, we introduce local fragments of these logics, which preserve as much as possible the locality of their distributed computation, while offering a rich expressive power for networking functionalities. We prove that they admit tighter upper bounds, with a bounded number of messages of bounded size. Finally, we show that the semantics and the complexity of the local fragments are preserved over locally consistent networks as well as anonymous networks, thus showing the robustness of the proposed local logical formalisms.

1 Introduction

Logical formalisms have been widely used in different fields of computer science to provide high-level programming abstractions. The relational calculus used by Codd to describe data-centric applications in an abstract way is at the origin of the technological and commercial success of relational database management systems [16]. Datalog, an extension of Horn clause logic with fixpoints, has been widely used to specify functionalities involving recursion [17].
The development of distributed applications over networks of devices is generally a very tedious task, involving handling low-level system details. The lack of high-level programming abstraction has been identified as one of the roadblocks for the deployment of networks of cooperating objects [15]. Recently, the use of queries to define network applications has been considered. Initially, the idea emerged in the field of sensor networks. It was suggested to see the network as a database, and interact with it through declarative queries. Several systems have been developed, among which Cougar [9] and TinyDB [14], supporting SQL dialects. Queries are processed in a centralized manner, leading to distributed execution plans.

∗ INRIA-LIAMA, CASIA, PO Box 2728, Beijing 100190, PR China. Stephane.Grumbach@inria.fr
† Lab of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190. wangf@ios.ac.cn
‡ China Graduate School, Chinese Academy of Sciences, Beijing 100049, China
§ LIAMA, CASIA, PO Box 2728, Beijing 100190, PR China. zlwu@liama.ia.ac.cn

More recently, query languages were proposed as a means to express communication network problems such as routing protocols [13] and declarative overlays [12]. This approach, known as declarative networking, is extremely promising, for it offers a high-level abstraction to program networks. It was also shown how to use recursive queries to perform diagnosis of asynchronous systems [1], network monitoring [18], as well as self-organization protocols [10]. Distributed query languages provide new means to express complex network problems such as node discovery [3], route finding, path maintenance with quality of service [6], topology discovery, including physical topology [5], etc.
However, there is a lack of systematic theoretical investigations of query languages in the distributed setting, in particular on their semantics, as well as the complexity of their distributed computation. In the present paper, we consider a distributed evaluation of classical query languages, namely first-order logic and fixpoint logic, which preserves their classical semantics.

First-order logic and fixpoint logic have been extensively investigated in the context of database theory [2] as well as finite model theory [7]. Since the seminal paper of Fagin [8], showing that the class NP corresponds exactly to problems which can be expressed in existential second-order logic, many results have linked Turing complexity classes with logical formalisms. Parallel complexity has also been considered for first-order queries, which can be evaluated in constant time over circuits with arbitrary fan-in gates [11].

This raised our curiosity on the distributed potential of these classical query languages to express the functionalities of communication networks, which have to be computed in a distributed manner over the network itself. If their computation can be distributed efficiently, they can form the basis of a high-level abstraction for programming distributed systems as a whole.

We rely on the classical message passing model [4]. Nodes exchange messages with their neighbors in the network. We consider four measures of complexity: (i) the in-node computational complexity, rarely addressed in distributed computing; (ii) the distributed time complexity; (iii) the message size; and (iv) the per-node message complexity.
The behavior of the nodes is governed by an algorithm, the distributed query engine, which is installed on each node, and evaluates the queries by alternating local computation and exchange of queries and results with the other nodes.

We first consider the distributed complexity of first-order logic and fixpoint logic with inflationary semantics, which accumulates all the results of the different stages of the computation. Note that our results carry over to other formalisms such as least fixpoint. We prove that the distributed complexity of first-order queries is in $O(\log n)$ in-node time, $O(\Delta)$ distributed time ($\Delta$ is the diameter of the network), messages of size $O(\log n)$, and a polynomial number of messages per node. For fixpoint, a similar bound can be shown but with a polynomial distributed time.

We then consider the translation of logical formulae that express properties of graphs at a global level into rule programs that express the behavior of nodes at a local level and compute the same result. We introduce a rule language, Netlog, which extends Datalog with communication primitives, and is well suited to express distributed applications, ranging from networking protocols to distributed data management. Netlog is supported by the Netquest system, on which the examples of this paper have been implemented. We prove that graph programs in Datalog$^{\neg}$ [2] can be translated to Netlog programs. Since it is well known that first-order and fixpoint logics can be translated into Datalog$^{\neg}$ [7], it follows that global logical formulae can be translated into behavioral programs in Netlog producing the same result.

Finally, we define local fragments of first-order and fixpoint logic, respectively $FO_{loc}$ and $FP_{loc}$.
These fragments provide a good compromise in the trade-off between expressive power and efficiency of the distributed evaluation. Important network functionalities (e.g. spanning tree, on-demand routes, etc.) can be defined easily in $FP_{loc}$. Meanwhile, its complexity is constant for all our measures but the distributed time, which is linear in the diameter for $FO_{loc}$ and in the size of the network for $FP_{loc}$.

Our results shed light on the complexity of the distributed evaluation of queries. Note that if the communication network is a clique (unbounded degree), our machinery resembles Boolean circuits, and we get constant distributed time, a result which resembles the classical $AC^0$ bound [11].

We have restricted our attention to bounded-degree graphs and synchronous systems. Most of our algorithms carry over, or can be extended, to unrestricted graphs and asynchronous computation, but not necessarily the complexity bounds. Interestingly, the results for the local fragments carry over to other classes of networks, such as locally consistent networks or anonymous networks, thus showing the robustness of the languages $FO_{loc}$ and $FP_{loc}$.

The paper is organized as follows. In the next section, we recall the basics of first-order and fixpoint logics. In Section 3, the computation model is presented. Section 4 is devoted to distributed first-order query execution, and Section 5 to fixpoint query execution. In Section 6, we introduce a behavioral language, Netlog, and show that FP formulae can be translated into equivalent Netlog programs. In Section 7, we consider the restriction to the local fragments, and show that they can be evaluated over different types of networks.

2 Graph logics

We are interested in functions on graphs that represent the topology of communication networks.
We thus restrict our attention to finite connected bounded-degree undirected graphs. Let $D$ be the bound on the degree. We assume the existence of an infinite ordered set of constants, $U$, the universe of node Id's. A graph, $\mathbf{G} = (V, G)$, is defined by a finite set of nodes $V \subset U$, and a set of edges $G \subseteq V \times V$. We express the functions on graphs as queries. A query of arity $\ell$ is a computable mapping from finite graphs to finite relations of arity $\ell$ over the domain of the input graph, closed under graph isomorphisms. A Boolean query is a query with Boolean output.

Logical languages have been widely used to define queries. A formula $\varphi$ over signature $G$ with $\ell$ free variables defines a query mapping instances of finite graphs $\mathbf{G}$ to relations of arity $\ell$ defined by:

$$A = \{ (x_1, \dots, x_\ell) \mid \mathbf{G} \models \varphi(x_1, \dots, x_\ell) \}.$$

We equivalently write $\mathbf{G}, A \models \varphi$.

We denote by FO the set of queries definable using first-order formulae. First-order queries can be used in particular to check locally forbidden configurations, for instance. Their expressive power is rather limited though. Fixpoint logics, on the other hand, make it possible to express fundamental network functionalities, such as those involving paths.

If $\varphi(T; x_1, \dots, x_\ell)$ is a first-order formula with $\ell$ free variables over signature $\{G, T\}$, where $T$ is a new relation symbol of arity $\ell$, called the fixpoint relation, then $\mu(\varphi(T))$ denotes a fixpoint formula whose semantics is defined inductively as the inflationary fixpoint $I$ of the sequence:

$$I_0 = \emptyset; \qquad I_{i+1} = \varphi(I_i) \cup I_i, \quad i \geq 0$$

where $\varphi(I_i)$ denotes the result of the evaluation of $\varphi(T)$ with $T$ interpreted by $I_i$. The $I_i$'s constitute the stages of the computation of the fixpoint. We write $\mathbf{G}, I \models \mu(\varphi(T))$ whenever $I$ is the fixpoint of the formula $\varphi(T)$ as defined by the above induction.
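As an illustration of the inflationary semantics, the stages $I_i$ can be computed by brute force on a small graph. The following Python sketch is not part of the formalism itself: the helper names `inflationary_fixpoint` and `phi_reach`, the four-node path graph, and the reachability formula $\varphi(T)(x, y) = G(x, y) \vee \exists z\, (T(x, z) \wedge G(z, y))$ are all assumptions chosen for the example.

```python
from itertools import product


def inflationary_fixpoint(nodes, phi, arity):
    """Iterate I_{i+1} = phi(I_i) | I_i from I_0 = {} until no tuple is added.

    phi(T, t) must return True when tuple t satisfies the formula with the
    fixpoint relation interpreted by the set T. Returns (fixpoint, stages).
    """
    current = set()
    stages = [set()]  # stages[i] is a copy of I_i
    while True:
        derived = {t for t in product(sorted(nodes), repeat=arity)
                   if phi(current, t)}
        nxt = current | derived  # inflationary: earlier tuples are kept
        if nxt == current:
            return current, stages
        current = nxt
        stages.append(set(current))


# Hypothetical input: the undirected path 1 - 2 - 3 - 4.
nodes = {1, 2, 3, 4}
edges = {(1, 2), (2, 3), (3, 4)}
G = edges | {(b, a) for (a, b) in edges}  # symmetric edge relation


def phi_reach(T, t):
    """phi(T)(x, y) = G(x, y) or exists z (T(x, z) and G(z, y))."""
    x, y = t
    return (x, y) in G or any((x, z) in T and (z, y) in G for z in nodes)


reach, stages = inflationary_fixpoint(nodes, phi_reach, arity=2)
print([len(s) for s in stages])
```

On this input the first stage is the edge relation itself, and the iteration stops once every pair of nodes has been derived. The brute-force enumeration of all tuples is exponential in the arity and is only meant to mirror the definition, not the distributed evaluation studied later.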
It is well known [2] that on ordered domains, the class of graph queries defined by inflationary fixpoint, denoted FP, captures exactly all PTIME mappings, that is, mappings that can be computed on a Turing machine in time polynomial in the size of the graph.

The following examples illustrate the expressive power of FP for distributed applications. The formula $\mu(\varphi(T)(x, h, d))$, for instance, where the formula $\varphi(T)(x, h, d)$ is defined by:

$$(G(x, h) \wedge h = d) \vee (G(x, h) \wedge \exists z\, (T(h, z, d) \wedge x \neq z) \wedge \neg \exists u\, T(x, u, d))$$

defines a table-based routing protocol (OLSR-like) on the graph $\mathbf{G}$, where $h$ is the next hop from $x$ to destination $d$.

A spanning tree from a node $x$ satisfying $ReqNode(x)$ can be defined by a fixpoint formula $\mu(\varphi(ST)(x, y))$, where the formula $\varphi(ST)(x, y)$ is defined by:

$$(G(x, y) \wedge ReqNode(x)) \vee (\neg \exists x'\, ST(x', y) \wedge \exists w\, (ST(w, x) \wedge w \neq y) \wedge G(x, y) \wedge \forall w' \forall x'\, (ST(w', x') \wedge G(x', y) \Rightarrow x' \geq x))$$

Similarly, an on-demand routing protocol (AODV-like) can be defined by the fixpoint queries $\mu(\varphi(RouteReq)(x, y, d))$ and $\mu(\psi(NextHop)(x, y, d))$, where $d$ is a constant and $\varphi(RouteReq)(x, y, d)$ is defined by:

$$(G(x, y) \wedge ReqNode(x) \wedge dest(d)) \vee (\exists w\, (RouteReq(w, x, d) \wedge w \neq y) \wedge G(x, y) \wedge x \neq d \wedge \neg \exists w'\, RouteReq(w', y, d))$$

and $\psi(NextHop)(x, y, d)$ is defined by:

$$(RouteReq(x, d, d) \wedge y = d) \vee (\exists z\, NextHop(y, z, d) \wedge RouteReq(x, y, d))$$

where a route request is first emitted by a node $x$ satisfying $ReqNode(x)$, then a path defined by next hops from that node to destination $d$ is established by backward computation on the route request.

3 Distributed evaluation

We are interested in this paper in the distributed evaluation of queries.
We assume that each query to the network is posed by a requesting node (the node satisfying the predicate $ReqNode(x)$ in the examples of the previous section). The result of a query shall be distributed over all the nodes of the network. In a query $Q(x_1, x_2, \cdots, x_\ell)$, one of the attributes $x_i$ denotes the holding node, written explicitly as $@x_i$, that is, the node which holds the results relative to $x_i$. More precisely, a tuple $\langle a_1, \cdots, a_{i-1}, a, a_{i+1}, \cdots, a_\ell \rangle$ such that $Q(a_1, \cdots, a_{i-1}, a, a_{i+1}, \cdots, a_\ell)$ holds is held by node $a$. For simplicity, we will choose the first variable as holding attribute. The results of fixpoint queries are thus distributed on holding nodes. In the OLSR-like example of the previous section, each node shall hold its routing table as a result of the evaluation of the query.

The nodes of the network are equipped with a distributed query engine to evaluate queries. It is a universal algorithm that performs the distributed evaluation of any network functionality expressed using queries. The computation relies on the message passing model for distributed computing [4]. The configuration of a node is given by a state, an in-buffer for incoming messages, an out-buffer for outgoing messages, and some local data and metadata used for the computation. We assume that the metadata on each node contain a unique identifier, the upper bound on the size of the network, $n$, and the diameter of the network, $\Delta$. We also assume that the local data of each node includes all its neighbors with their identifiers.

We distinguish between computation events, performed in a node, and delivery events, performed between nodes which broadcast their messages to their neighbors.
A sequence of computation events followed by delivery events is called a round of the distributed computation. A local execution is a sequence of alternating configurations and events occurring on one node. We assume that the network is static, nodes are not moving, and that the communication has no failure.

We assume that at the beginning of the computation of a query, all the nodes are idle, in initial state, with their in-buffers and out-buffers empty. Note that it is easy to extend the present computational framework to a multithreaded computation with several concurrent queries running in the network. The requesting node broadcasts its query to its neighbors. The incoming messages in the subsequent nodes trigger the start of their query engine computation. The evaluation of a query terminates when the out-buffers of all nodes are empty. The result is distributed over the network in the memories of all nodes. Note that alternative termination modes are also possible.

We consider four measures of the complexity of the distributed computation:
• The per-round in-node computational complexity, IN-TIME/ROUND, is the maximal computational time of the in-node computation in one round;
• The distributed time complexity, DIST-TIME, is the maximum number of rounds of any local execution of any node till the termination;
• The message size, MSG-SIZE, is the maximum number of bits in messages;
• The per-node message complexity, #MSG/NODE, is the maximum number of messages sent by any node till the termination of the evaluation.

There is a trade-off between the in-node computation and the communication. Our objective is to distribute the workload in the network as evenly as possible, with a balanced amount of computation and communication on each node.
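The round structure and two of the four measures can be made concrete with a small simulation. The Python sketch below is only an illustration under simplifying assumptions: it floods a single query token over a hypothetical four-node path, each node forwards the query once, and a broadcast to $d$ neighbors is counted as $d$ messages. The function `flood` and its variable names are not taken from the query engine described in this paper.

```python
QUERY = "query"  # placeholder payload; a real message would carry a formula


def flood(neighbors, requester):
    """Synchronous flooding of one query.

    Each round consists of computation events (every node processes its
    in-buffer and fills its out-buffer) followed by delivery events (every
    node broadcasts its out-buffer to its neighbors). The run terminates
    when all out-buffers are empty. Returns (rounds, messages sent per node).
    """
    in_buffer = {v: [] for v in neighbors}
    in_buffer[requester].append(QUERY)  # the query is submitted at the requester
    seen = set()
    sent = {v: 0 for v in neighbors}
    rounds = 0
    while True:
        # Computation events: forward the query the first time it is seen.
        out_buffer = {v: [] for v in neighbors}
        for v in neighbors:
            if in_buffer[v] and v not in seen:
                seen.add(v)
                out_buffer[v].append(QUERY)
            in_buffer[v] = []
        if not any(out_buffer.values()):
            return rounds, sent  # all out-buffers empty: termination
        # Delivery events: broadcast every outgoing message to all neighbors.
        for v, msgs in out_buffer.items():
            for m in msgs:
                for u in neighbors[v]:
                    in_buffer[u].append(m)
                    sent[v] += 1
        rounds += 1


# Hypothetical network: the path 1 - 2 - 3 - 4, with node 1 as requesting node.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
rounds, sent = flood(path, requester=1)
print(rounds, max(sent.values()))
```

Here `rounds` plays the role of DIST-TIME and `max(sent.values())` the role of #MSG/NODE for this toy protocol. On this path of diameter 3 the flood takes one round more than the diameter, because the last node still forwards the query once before all out-buffers are empty.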
Clearly, centralized computation can be carried out by loading the topology of the network on the requesting node, and performing the evaluation by in-node computation. The centralized evaluation of FO and FP admits the following complexity bounds.

Proposition 1. Let $\mathbf{G}$ be a network of diameter $\Delta$, with $n$ nodes. Let $\varphi$ be a FO formula with $v$ variables. The complexity of the centralized evaluation of the query $\varphi$ on $\mathbf{G}$ is given by:

IN-TIME/ROUND      DIST-TIME      MSG-SIZE       #MSG/NODE
$O(n^v \log n)$    $O(\Delta)$    $O(\log n)$    $O(n)$

Suppose $\mu(\varphi(T)(x_1, \dots, x_\ell))$ is a FP formula such that $T$ is a relational symbol of arity $\ell$, and it contains $v = \ell + k$ variables ($\ell$ free and $k$ bounded). Then the complexity upper bound of the centralized evaluation of the query $\mu(\varphi(T)(x_1, \dots, x_\ell))$ on $\mathbf{G}$ is the same as the above complexity for FO formulae except for the IN-TIME/ROUND, which is in $O(n^{\ell + v} \log n)$.

Note that all nodes but the requesting node have $O(\log n)$ per-round in-node complexity. The proof of this result follows from classical results on data complexity of query languages [2]. In the sequel, we focus exclusively on distributed query evaluation.

4 Distributed complexity of FO

In this section we show that the distributed evaluation of FO can be done with a polynomial number of messages but logarithmic in-node computation per round. The result relies on a naive distributed query engine for FO, $QE_{FO}$, which works as follows. The requesting node starts the computation by submitting a query. The nodes broadcast Boolean answers to queries when they have them, and otherwise queries they cannot answer, to their neighbors. Each node reduces queries by instantiating variables. In $QE_{FO}$, nodes start instantiating from the leftmost quantified variable, and from the rightmost free variable.
The last instantiated free variable therefore denotes the holding node of the query, on which the corresponding tuples will be stored. The nodes simplify the queries by removing all facts, or subformulae, that they can fully evaluate.

Let $\varphi$ be a first-order formula with $\ell$ free variables. The query engine handles the following message types: message $\{?_B \varphi\}$ for Boolean queries, message $\{?x_1 \dots ?x_i\, !a_{i+1} \dots !a_\ell\, \varphi\}$ for non-Boolean queries, and message $\{!_B \varphi\}$ for answers of Boolean queries. Each node stores pairs $(query, parentquery)$ in a query table, associating the query being evaluated to the query from which it derives. Nodes also store the Boolean answers $!_B \varphi$ and non-Boolean answers $\langle a_1 \dots a_\ell \rangle$ to queries in an answer table.

We will see that the diameter $\Delta$ of the graph induces an upper bound on the response time of queries. The algorithm uses clocks that are defined according to this upper bound. Clocks are associated to the evaluation of queries as well as subqueries. After the time of a clock associated to a query on a node has elapsed, the value of the query can be determined by the node. From now on, we assume that we are given a clock compliant with the communication graph. The value of the clocks will be defined in Definition 1 below.

The main steps of the query engine work as follows. Note that we assume for simplicity in the sequel that the system is synchronous. This assumption can be relaxed easily in asynchronous systems without impact on the complexity by using spanning trees rather than the clocks.

Initial Boolean query emission. For a Boolean query, the requesting node, say $a$, broadcasts the query, $?_B \varphi$, adds $(?_B \varphi, nil)$ into the query table, and sets a clock for the answer. Meanwhile it instantiates the leftmost bounded variable and produces a subquery.
For an existentially quantified formula $\exists x \psi$, if $\psi(a)$ is true then it is a witness that $\exists x \psi$ is true. For a universally quantified formula $\forall x \psi$, if $\psi(a)$ is false then it is a counterevidence and $\forall x \psi$ is false. If the node does not have the answer to $\psi(a)$, it inserts $\psi(a)$ along with its parent query into the query table, e.g. $(?_B \psi(a), ?_B \exists x \psi)$, broadcasts $\psi(a)$ and also sets a clock for $\psi(a)$. If no witness / counterevidence is received before the clock elapses, then $\exists x \psi$ is false / $\forall x \psi$ is true. It then recursively handles $\psi(a)$ in the same way.

Boolean query reception. Every node, upon reception of a Boolean query $?_B \varphi$, first checks its query table. If there is a record for this query, it does nothing. Otherwise its behavior is similar to the Boolean query emission of the requesting node, with the difference that it also broadcasts the answer.

Boolean answer reception. Every node receiving an answer to a Boolean query, $!_B \varphi$, checks its answer table. If there is a record, it does nothing. Otherwise, it stores the answer and checks the query table. If it is waiting for the answer, it then tries to evaluate the parent query (if it has one), and stores and broadcasts the answer to the parent query if it obtains one; if it is not waiting for the answer, it broadcasts $!_B \varphi$.

Initial non-Boolean query emission. The requesting node submits and broadcasts the query $?x_1 \dots ?x_\ell\, \varphi(x_1, \dots, x_\ell)$. It sets the clock, inserts $(?x_1 \dots ?x_\ell\, \varphi(x_1, \dots, x_\ell), nil)$ into the query table, instantiates the rightmost free variable to get the subquery, which is $?x_1 \dots ?x_{\ell-1}\, !a\, \varphi(x_1, \dots, x_{\ell-1}, a)$, and broadcasts it. Meanwhile the subquery is inserted into the query table and handled further by the requesting node. When all the free variables are instantiated, the Boolean query $?_B \varphi(a_1 \dots a_\ell)$ is emitted and a record $(?_B \varphi(a_1 \dots a_\ell),\ !a_1 \dots !a_\ell\, \varphi(a_1 \dots a_\ell))$ is inserted in the query table of node $a_1$.

Non-Boolean query reception. Every node checks its query table when it receives a query $?x_1 \dots ?x_{i-1}\, !a_i \dots !a_\ell\, \varphi(x_1, \dots, x_{i-1}, a_i, \dots, a_\ell)$. If there is a record in the table, it does nothing. Otherwise, it stores $(?x_1 \dots ?x_{i-1}\, !a_i \dots !a_\ell\, \varphi(x_1, \dots, x_{i-1}, a_i, \dots, a_\ell), nil)$ in the query table; its behavior is then similar to the initial non-Boolean query emission with $i - 1$ free variables.

Distributed tuple answer collection. If the Boolean query $?_B \varphi(a_1, \dots, a_\ell)$ receives a positive answer, and there is a record $(?_B \varphi(a_1 \dots a_\ell),\ !a_1 \dots !a_\ell\, \varphi(a_1 \dots a_\ell))$ in the query table, $\langle a_1, \dots, a_\ell \rangle$ is stored in the answer table of the current node, which corresponds to the instantiation of the leftmost free variable, that is, the holding node for the answer.

We now turn to the clocks which parameterize the first-order query engines. The following theorem provides an upper bound on the distributed time complexity of the evaluation of a formula.

Theorem 1. For networks of diameter $\Delta$, the distributed time complexity of the evaluation of a formula with $w$ variables or constants by $QE_{FO}$ is bounded by $2\Delta w$.

Proof. The proof is done by induction on the number of variables and constants in the query $\psi$.

Basis: Assume $w = 2$. There are three possibilities: two constants, or two variables, or one constant and one variable in the query $\psi$.
• If there are two constants, say $a$ and $b$, the query $\psi$ is propagated to $a$ and gets the value of the atom $G(a, b)$, which takes at most $\Delta$ rounds. Then the answer of $\psi$ is sent back to the requesting node, which takes at most $\Delta$ rounds. The total time is at most $2\Delta$ rounds.
• If there are one variable $x$ and one constant $a$ in $\psi$, the query is propagated to every node, at which the variable is instantiated, and we get the answers of $G(x, a)$, which takes $\Delta$ rounds.
  - When the variable is free, the answer is stored in the local table of $x$.
  - When the variable is bounded, the witness/counterevidence of $\psi$ is sent back to the requesting node, which takes at most $\Delta$ rounds. Or if after $\Delta$ rounds the requesting node does not receive any sub-answer, it is sound to consider that there are no witnesses or counterevidences.
  So the total time is $2\Delta$ rounds in both cases.
• If there are two variables then it takes $\Delta$ rounds to instantiate one variable at every node (suppose the formula obtained is $\eta$) and then $\Delta$ rounds for the other variable (suppose the formula obtained is $\xi$). Therefore $2\Delta$ in all.
  - If both of the variables are free variables, if $\xi$ is true, then the tuple is stored in the local table.
  - If the first variable is free and the second one is bounded, then it takes $\Delta$ rounds for the witness/counterevidence (if there is one) to get to the first instantiating node from the second one; if $\eta$ is true, supposing $a$ is the instantiation of the free variable, then the answer is stored in the local table.
  - If both variables are bounded, then it takes $\Delta$ rounds for the answer to get to the first instantiating node and then $\Delta$ to the requesting node. So the total time is $4\Delta$ rounds.
Therefore, for $w = 2$, the time is bounded by $2\Delta w$ rounds.

Induction: Suppose that when the sum of variables and constants is $w$, e.g. there are $l$ free variables, $k$ bounded variables, $c$ constants and $w = l + k + c$, the time is bounded by $2\Delta w$ rounds. We prove the result for $w + 1$.
• When there are $c + 1$ constants: there are $\Delta$ rounds (at most) for the sub-query to get to the additional constant node and $\Delta$ rounds for the answer to the sub-query getting back.
Therefore the total time is at most $2\Delta(w + 1)$ rounds.
• When there are $k + 1$ bounded variables: w.l.o.g. we assume that the additional bounded variable is the leftmost bounded variable; then $\Delta$ rounds are sufficient to instantiate the first variable before instantiating the second variable, and $\Delta$ rounds for the answer getting to the first instantiating node from the second one. Therefore the total time is at most $2\Delta(w + 1)$ rounds.
• When there are $l + 1$ free variables: it takes $\Delta$ rounds for instantiating the additional free variable. So the total time is $2\Delta w + \Delta$.
Therefore, the distributed time is bounded by $2\Delta(w + 1)$.

We can now settle the values of the clocks in the query engine.

Definition 1. The value of the clock in a network of diameter $\Delta$, for an FO query with $w$ variables or constants, is $2\Delta w$.

The next result shows the robustness of the algorithm: its independence from the order in which messages are handled by the query engine.

Proposition 2. The distributed first-order query engine is insensitive to the order of the incoming messages in a round.

Proof. There are two fundamental steps in the algorithm of the query engine: query propagation and result construction. During query propagation, queries and subqueries arriving on one node have no interaction. They generate entries in the query table. During result construction, results of independent queries do not interfere, and results of the same query are handled with a set semantics.

We can now define the distributed inference.

Definition 2. Let $\mathbf{G}$ be a graph, $\psi$ a formula with $\ell$ free variables, and $A$ a finite relation of arity $\ell$. We write $\mathbf{G}, A \vdash_{FO} \psi$ if and only if $A$ is the union of all the answers produced by the query engine $QE_{FO}$ on all nodes, upon request of $\psi$ from any node.

We next prove the soundness and completeness of the query engine.

Theorem 2.
For any network $\mathbf{G}$ of diameter at most $\Delta$, and any first-order formula $\psi$, $\mathbf{G}, A \models \psi$ if and only if $\mathbf{G}, A \vdash_{FO} \psi$.

Proof. First observe that it is sufficient to prove the result for Boolean formulae. Indeed, if there are $\ell$ free variables in the query, they get instantiated by all possible $n^\ell$ instantiations when the query travels around the system of $n$ nodes, resulting in $n^\ell$ Boolean first-order queries. The result of each query (a tuple of $\ell$ constants) is then stored at the key node if it satisfies the Boolean query.

The result is also rather obvious for variable-free formulae. Suppose that a query has $c$ ($c \geq 2$) constants and no variables. The query is broadcast to every node and, as it successively reaches nodes, it gets the Boolean value for the atoms containing the corresponding constants, replaces the corresponding atoms by their value and produces a new query which is broadcast again. The result is obtained when the query has reached (at most) $c - 1$ of the constants. Then the answer is sent back to the requesting node. The total time required is at most $2\Delta(c - 1)$. The clock time being fixed at $2\Delta c$ rounds, it suffices to get the result.

The rest of the proof is done by induction on the number of bounded variables for Boolean formulae.

Basis: Assume the query has one bounded variable. Then it must have at least one constant, so $c \geq 1$. First it is broadcast by the requesting node and the variable is instantiated by every node, thus producing $n$ sub-queries with at most $c + 1$ constants. After the sub-queries reach at most $c - 1$ of the constants (note that one of the constants stems from instantiating the variable and the sub-queries get it immediately at the instantiating node) and get their answers, the witness for $\exists$ or the counterevidence for $\forall$ is sent back to the requesting node, which then produces the final answer.
If no witnesses/counterevidences are received before the clock time elapses, a negative/positive answer is produced by the requesting node.

Induction: Assume that if the query has $k$ ($k \geq 2$) bounded variables and $c$ constants, i.e. the query is of the form

$$\psi_k = A_1 x_1 \dots A_k x_k\, \varphi(x_1 \dots x_k)$$

(denoting $\exists$ or $\forall$ by $A$), then $\mathbf{G} \models \psi_k$ if and only if $\mathbf{G} \vdash_{FO} \psi_k$. We prove the result for the case when there are $k + 1$ bounded variables in the query

$$\psi_{k+1} = A_1 x_1 \dots A_{k+1} x_{k+1}\, \varphi(x_1 \dots x_{k+1}).$$

After the first variable has been instantiated at each node, the $n$ sub-queries of the form

$$\psi'_k = A_2 x_2 \dots A_{k+1} x_{k+1}\, \varphi(x_2 \dots x_{k+1})$$

are queries with $k$ bounded variables and $c + 1$ constants. They are then further propagated by the instantiating node. By induction assumption, $\mathbf{G} \models \psi'_k$ if and only if $\mathbf{G} \vdash_{FO} \psi'_k$, so every node gets a sound answer to $\psi'_k$. After one instantiating node gets the answer to $\psi'_k$, it sends the answer to the requesting node. If it is true and $A_1$ is $\exists$ then the requesting node takes it as a witness and $\psi_{k+1}$ is true; if it is false and $A_1$ is $\forall$ then the requesting node takes it as a counterevidence and $\psi_{k+1}$ is false. If the requesting node does not receive any witnesses/counterevidences until the clock time has elapsed, it gives a negative/positive answer to $\psi_{k+1}$. Therefore $\mathbf{G} \models \psi_{k+1}$ if and only if $\mathbf{G} \vdash_{FO} \psi_{k+1}$.

We next consider the complexity of the distributed evaluation. Theorem 3 is the fundamental result of this section. It shows the potential for distributed evaluation of first-order queries with logarithmic in-node time complexity, distributed time linear in the diameter of the graph, and polynomial amount of communication.

Theorem 3. Let $\mathbf{G}$ be a graph of diameter $\Delta$, with $n$ nodes, and let $\varphi$ be a first-order formula with $v$ variables.
The complexity of the distributed evaluation of the query $\varphi$ on $G$ by $QE_{FO}$ is given by:

    IN-TIME/ROUND    DIST-TIME      MSG-SIZE       #MSG/NODE
    $O(\log n)$      $O(\Delta)$    $O(\log n)$    $O(n^{v+1})$

Proof. (sketch) We assume that $\varphi$ has $\ell$ free variables, $k$ bounded variables and $c$ constants. So $v = \ell + k$. Let $w = v + c$.

IN-TIME/ROUND: We consider the complexity in the size of the graph. The query is partially evaluated on the local data (identifiers of neighbors) of $O(\log n)$ size. It is rewritten in a systematic fashion into sub-queries by instantiating variables. Both operations can be performed in $O(\log n)$ time. Searching the query table and the answer table (both of size $O(n^v)$) can be done in $O(\log n)$ time as well by binary search.

DIST-TIME: As shown in Theorem 1, the distributed time for a query is $2\Delta w$, so the time complexity is in $O(\Delta)$.

MSG-SIZE: It is evident that MSG-SIZE is $O(\log n)$.

#MSG/NODE: During the distributed evaluation of queries, new queries can be generated by instantiating free and bounded variables. The total number of queries generated during the distributed evaluation is $O(\sum_{i=1}^{v} n^i)$, which is $O(n^{v+1})$. So the number of queries and answers received by each node is $O(n^{v+1})$. Therefore, the number of messages sent by each node is $O(n^{v+1})$.

Note that the first-order query engine relies on a naive evaluation of queries. It can be optimized by taking advantage of the patterns in the query to limit the propagation of sub-queries, but this does not affect the global complexity upper bounds.

5 Distributed complexity of FP

We next consider the complexity upper bounds for FP. They rely on a query engine which is defined as follows. Note that we first assume that the system is synchronous, and we discuss asynchronous systems at the end of the present section.

Query engine for FP, $QE_{FP}$.
At first, the requesting node broadcasts $\mu(\varphi(T)(x_1, \ldots, x_\ell))$ (where $T$ is a relational symbol of arity $\ell$). It takes $\Delta$ rounds for all nodes to receive the query. In order to coordinate the computation of the stages of the fixpoint on different nodes, a hop counter $c$ is broadcast together with the query $\mu(\varphi(T)(x_1, \ldots, x_\ell))$, and a clock $\sigma$ is set for each node. Initially, the requesting node sets $\sigma = \Delta$, and broadcasts $(\mu(\varphi(T)(x_1, \ldots, x_\ell)), \Delta - 1)$ to its neighbors. Each node receiving messages of the form $(\mu(\varphi(T)(x_1, \ldots, x_\ell)), c)$ sets $\sigma = c$ and propagates the formula $(\mu(\varphi(T)(x_1, \ldots, x_\ell)), c - 1)$ to its neighbors, unless $c = 0$ or $\sigma$ has been set before.

When the clock $\sigma$ expires, each node $a$ sets a local table for $T$ and performs the recursion on $\mu(\varphi(T))$ by iterating the use of the first-order query engine $QE_{FO}$ on the query $\varphi(T)$ as follows:

• $a$ sets a clock $\tau = 2\Delta w$ (where $w$ is the number of variables or constants in $\varphi(T)$), and evaluates the query $?x_1 \ldots ?x_{\ell-1}\, !a\, \varphi(T)(x_1, \ldots, x_{\ell-1}, a)$ using $QE_{FO}$, which takes time $2\Delta w$.

• If $a$ receives a query $?x_1\, !a_2 \ldots !a_\ell\, \varphi(T)(x_1, a_2, \ldots, a_\ell)$ before $\tau$ expires, $x_1$ is instantiated by $a$ to get the sub-query $!a\, !a_2 \ldots !a_\ell\, \varphi(T)(a, a_2, \ldots, a_\ell)$, and the evaluation of the Boolean query $?B\, \varphi(T)(a, a_2, \ldots, a_\ell)$ starts. If $a$ gets a positive answer to that Boolean query, it stores $\langle a, a_2, \ldots, a_\ell \rangle$ in a temporary buffer.

• When the clock $\tau$ expires, node $a$ updates the local table for $T$ and sets another clock $\eta = \Delta$. If some new tuples $\langle a, a_2, \ldots, a_\ell \rangle$ have been produced, $a$ broadcasts an informing message to its neighbors, which will be propagated further to all the nodes in the network to inform them that the computation has not reached a fixpoint yet.
• If some new tuples have been produced in $a$, or $a$ has received some informing messages when the clock $\eta$ expires, it resets $\tau = 2\Delta w$ and starts the next iteration; otherwise the evaluation terminates.

Definition 3. Let $\mu(\varphi(T))$ be a fixpoint formula. $G, I \vdash_{FP} \mu(\varphi(T))$ if and only if upon request of $\mu(\varphi(T))$ from any node $a$, the query engine $QE_{FP}$ produces answer $I$ distributed in the network.

As for FO, we show that the query engine is sound and complete.

Theorem 4. For a network $G$ and $\mu(\varphi(T))$ a fixpoint formula, $G, I \models \mu(\varphi(T))$ if and only if $G, I \vdash_{FP} \mu(\varphi(T))$.

The proof of Theorem 4 follows easily from Theorem 2.

Theorem 5. Let $G$ be a graph of diameter $\Delta$, with $n$ nodes, $T$ a relation symbol of arity $\ell$, and $\mu(\varphi(T)(x_1, \ldots, x_\ell))$ be an FP formula with $v = \ell + k$ (first-order) variables ($\ell$ free and $k$ bounded). The complexity of the distributed evaluation of the query $\mu(\varphi(T))$ by $QE_{FP}$ on $G$ is given by:

    IN-TIME/ROUND    DIST-TIME             MSG-SIZE       #MSG/NODE
    $O(\log n)$      $O(n^\ell \Delta)$    $O(\log n)$    $O(n^{\ell+v+1})$

Proof. Let $w$ be the total number of variables and constants in $\varphi(T)(x_1, \ldots, x_\ell)$. Messages $(\mu(\varphi(T)(x_1, \ldots, x_\ell)), hop)$ are transferred in the network before the clock $\sigma$ expires, which takes $O(\Delta)$ rounds and $O(1)$ messages for each node.

Queries $?x_1 \ldots ?x_{\ell-1}\, !a\, \varphi(T)(x_1, \cdots, x_{\ell-1}, a)$ are evaluated after the clock $\sigma$ expires and before $\tau$ expires. By Theorem 3, $O(n^v)$ messages are sent by each node for each such query (there are at most $v - 1$ variables in $\varphi(T)(x_1, \cdots, x_{\ell-1}, a)$). Since there are $n$ such queries $?x_1 \ldots ?x_{\ell-1}\, !a\, \varphi(T)(x_1, \cdots, x_{\ell-1}, a)$, the total number of messages sent by each node is $O(n^{v+1})$.

When $\tau$ expires, each node sets a clock $\eta = \Delta$, and broadcasts informing messages to its neighbors if some new tuples are produced.
Each node that receives the informing message will broadcast it to its neighbors unless it has done that before. Each node sends $O(1)$ informing messages before $\eta$ expires. When $\eta$ expires, if a node has produced some new tuples or received some informing messages during the previous iteration, it starts the next iteration.

So before the evaluation terminates, in each iterating period $2\Delta w + \Delta$ after the expiration of $\sigma$, at least one new tuple in $T$ is produced in some node. Thus there are at most $n^\ell$ such periods before the termination of the evaluation, since there are at most $n^\ell$ tuples in $T$. Consequently the total time of the evaluation is in $\Delta + n^\ell(2\Delta w + \Delta) = O(n^\ell \Delta)$. Because in each such period $O(n^{v+1})$ messages are sent by each node, the total number of messages sent by each node before the termination of the evaluation is $O(n^{\ell+v+1})$.

Although the complexity upper bound for DIST-TIME and #MSG/NODE is polynomial, the exponent relates to the number of variables. For most networking functionalities, this number is small, and the dependencies between the variables might even lower it.

The algorithm $QE_{FP}$ above can be adapted to an asynchronous system by using a breadth-first-search (BFS) spanning tree (with the requesting node as the root), without impact on the complexity bounds. If an arbitrary spanning tree, not necessarily a BFS tree, is used, then the complexity bounds do not change, except the distributed time, which becomes $O(n^{\ell+1})$.

Note that with $QE_{FP}$, nodes are coordinated to compute every stage of the fixpoint simultaneously by using the clock $2\Delta w$, which is critical for preserving the centralized semantics of FP formulae. However, if $\varphi$ is monotone on $T$, the centralized semantics of the fixpoint is preserved no matter whether the stages are computed simultaneously or not.
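The counting argument in the proof of Theorem 5 (each productive period adds at least one tuple to $T$, so there are at most $n^\ell$ periods) can be checked on a small instance. The following is only a centralized sketch of the inflationary iteration, not the distributed engine $QE_{FP}$. The function names `inflationary_fixpoint`, `distributed_time_bound` and `reach`, as well as the four-node directed path used as input, are hypothetical and chosen for illustration.

```python
from itertools import product


def inflationary_fixpoint(nodes, arity, phi):
    """Iterate T := T ∪ {t | phi(T, t)} over all tuples of the given arity.

    Returns the final relation and the number of productive stages,
    i.e. stages in which at least one new tuple was added.
    """
    table = set()
    stages = 0
    while True:
        new = {t for t in product(nodes, repeat=arity)
               if t not in table and phi(table, t)}
        if not new:
            # Corresponds to a period in which no informing message is sent.
            return table, stages
        table |= new
        stages += 1


def distributed_time_bound(n, arity, delta, w):
    """Bound from the proof: Delta + n^l * (2 * Delta * w + Delta)."""
    return delta + n ** arity * (2 * delta * w + delta)


# Hypothetical input: the directed path 0 -> 1 -> 2 -> 3.
nodes = [0, 1, 2, 3]
edges = {(0, 1), (1, 2), (2, 3)}


def reach(table, t):
    # phi(T)(x, y) = G(x, y) or exists z (G(x, z) and T(z, y))
    x, y = t
    return (x, y) in edges or any(
        (x, z) in edges and (z, y) in table for z in nodes)


closure, stages = inflationary_fixpoint(nodes, 2, reach)
print(stages, len(closure))
```

On this input the loop performs three productive stages and produces six tuples, well below the worst-case count of $n^\ell = 16$ periods that the proof allows for.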
Similar results can be shown for alternative definitions of the fixpoint logic, such as Least Fixpoint.

6 In-node behavioral compilation

In this section, we see how to transform FO and FP formulae, which express queries at the global level of abstraction of the graph, into equivalent rule programs that model the behavior of nodes. We first introduce the Netlog language. A Netlog program is a finite set of rules of the form:

$$(\uparrow)\ \gamma_0 \mathrel{:-} \gamma_1; \ldots; \gamma_l.$$

where $l \geq 0$. The head of the rule, $\gamma_0$, is an atomic first-order formula. The body, $\gamma_1; \ldots; \gamma_l$, is constituted of literals, i.e., atomic ($R(\vec{x})$) or negated atomic ($\neg R(\vec{x})$) formulae.

Each atomic formula $\gamma_i$ has a holding variable, which is written explicitly as $@x$ and specifies the node on which the evaluation is performed. The communication construct, $\uparrow$, is added before the head if the result is to be pushed to neighbors. In the sequel we denote the head of a rule $r$ as $head_r$ and the body as $body_r$, and denote the holding variable of a formula $\gamma_i$ as $hv_{\gamma_i}$. The relations occurring in the head of the rules are called intensional relations.

Some localization restrictions are imposed on the rules to ensure the effectiveness of the distributed evaluation: (i) all literals in the body have the same holding variable; (ii) the head is not pushed (by $\uparrow$) if the holding variable of the head is the holding variable of the body; (iii) if the head is pushed (by $\uparrow$), assuming the holding variable of the head is $x$ and the holding variable of the body is $y$, then $G(@y, x)$ is in the body.

A Netlog program is running on each node of the network concurrently. All the rules are applied simultaneously on a node. The holding variable of literals in the body is instantiated by the node ID itself. Facts deduced are stored on the node if the rule is not modified by $\uparrow$.
Otherwise, they are sent to nodes interpreting the holding variable of the head.

On each node, (i) phases of execution of the rules on the node and (ii) phases of communication with other nodes alternate until no new facts are deduced on each node. The global semantics is defined as the union of the facts obtained on each node.

For a graph $G = (V, G)$, an instance $I$ such that $I = \bigcup_{v \in V} I_v$ where $I_v$ is the fragment of $I$ stored on node $v$, a rule

$$r : Q(\vec{x}) \mathrel{:-} R_1(\vec{y}_1); \ldots; R_m(\vec{y}_m); \neg R_{m+1}(\vec{y}_{m+1}); \ldots; \neg R_l(\vec{y}_l).$$

and an instantiation $\sigma$ of the variables occurring in $r$,

$$(I, \sigma) \models_G R_1(\vec{y}_1); \ldots; R_m(\vec{y}_m); \neg R_{m+1}(\vec{y}_{m+1}); \ldots; \neg R_l(\vec{y}_l)$$

if and only if

$$R_i(\sigma(\vec{y}_i)) \begin{cases} \in I_{\sigma(y)} \cup G, & \text{for } i \in [1, m] \\ \notin I_{\sigma(y)} \cup G, & \text{for } i \in [m+1, l] \end{cases}$$

where $y$ is the holding variable of $body_r$.

We define the immediate consequence operator of a Netlog program $P$ as a mapping from an instance $I$ to an instance:

$$\Psi_{P,G}(I) = \bigcup_{v \in V} \left\{ Q(\vec{u}) \;\middle|\; \begin{array}{l} \exists r \in P : Q(\vec{x}) \mathrel{:-} body_r,\ \exists \sigma \text{ s.t. } (I, \sigma) \models_G body_r; \\ \vec{u} = \sigma(\vec{x});\ \sigma(hv_{Q(\vec{x})}) = v \end{array} \right\}.$$

The computation of a Netlog program $P$ on a graph $G$ is given by the following sequence:

$$I_0 = \emptyset; \qquad I_{i+1} = \Psi_{P,G}(I_i), \quad i \geq 0.$$

The computation of $P$ on $G$ terminates if the sequence $(I_i)_{i \geq 0}$ converges to a fixpoint. If the computation of $P$ on $G$ terminates, we define $P(G)$ to be the least fixpoint obtained by the computation sequence $(I_i)_{i \geq 0}$.

Before we see how FO or FP formulae can be rewritten into Netlog programs, let us first illustrate the technique on the examples of Section 2.

Example 1. The following program computes the OLSR-like table-based routing protocol as defined in Section 2:

    $T(@x, d, d) \mathrel{:-} G(@x, d).$
    $T(@x, h, d) \mathrel{:-} \neg existT(@x, d);\ G(@x, h);\ askT(@x, h, d).$
    $existT(@x, d) \mathrel{:-} T(@x, u, d).$
    $\uparrow askT(@x, h, d) \mathrel{:-} T(@h, z, d);\ G(@h, x);\ x \neq z.$
    $T(@x, d, d) \mathrel{:-} T(@x, d, d).$

New predicates ($askT$) are introduced to store partial results that are computed on some nodes, and used by other nodes to which they have been forwarded. The last rule ensures the inflationary behavior (accumulation of results).

Example 2. The following program computes spanning trees as defined in Section 2. Several new predicates are introduced to reduce the complexity of the formula ($delay$, $rej$) and to ensure the transfer of data between the nodes involved in the computation ($askST$).

    $\uparrow ST(x, @y) \mathrel{:-} G(@x, y);\ ReqNode(@x).$
    $ST(x, @y) \mathrel{:-} \neg existST(@y);\ delay(x, @y);\ \neg rej(x, @y).$
    $\uparrow askST(x, @y) \mathrel{:-} ST(w, @x);\ G(@x, y);\ w \neq y.$
    $existST(@y) \mathrel{:-} ST(x, @y).$
    $rej(x', @y) \mathrel{:-} askST(x, @y);\ askST(x', @y);\ x' \geq x.$
    $delay(x, @y) \mathrel{:-} askST(x, @y).$
    $ST(x, @y) \mathrel{:-} ST(x, @y).$

Example 3. The following program computes the AODV-like on-demand routing protocol as defined in Section 2.

    $\uparrow RouteReq(x, @y, d) \mathrel{:-} G(@x, y);\ ReqNode(@x);\ dest(d).$
    $RouteReq(x, @y, d) \mathrel{:-} askRouteReq(x, @y, d);\ \neg existRR(@y, d).$
    $\uparrow askRouteReq(x, @y, d) \mathrel{:-} RouteReq(w, @x, d);\ G(@x, y);\ x \neq d;\ w \neq y.$
    $existRR(@y, d) \mathrel{:-} RouteReq(w', @y, d).$
    $\uparrow Nexthop(@x, d, d) \mathrel{:-} RouteReq(x, @d, d);\ G(@d, x).$
    $\uparrow Nexthop(@x, y, d) \mathrel{:-} RouteReq(x, @y, d);\ Nexthop(@y, z, d);\ G(@y, x).$
    $RouteReq(x, @y, d) \mathrel{:-} RouteReq(x, @y, d).$
    $Nexthop(@x, d, d) \mathrel{:-} Nexthop(@x, d, d).$

We now consider the general translation of FO and FP formulae to Netlog programs. It has been shown in [7] that FP is equivalent to Datalog$^\neg$, both with inflationary semantics.
Moreover, both FO and FP formulae can be translated effectively to Datalog$^\neg$ programs. We therefore consider the translation of Datalog$^\neg$ programs into equivalent Netlog programs. The main difficulty lies in the distribution of the computation.

The syntax and semantics of Datalog$^\neg$ are similar to those of Netlog, but without the communication primitives. Indeed, unlike Netlog, a program in Datalog$^\neg$ is processed in a centralized manner.

The computation of a Datalog$^\neg$ program $P$ on a graph $G$ is given by the following sequence:

$$I_0 = \emptyset; \qquad I_{i+1} = \Psi_{P,G}(I_i) \cup I_i, \quad i \geq 0$$

where $\Psi_{P,G}(I_i)$ is defined in a similar way as for Netlog.

The following algorithm rewrites a Datalog$^\neg$ program $P_{DL}$ into a Netlog program $P_{NL}$. To synchronize stages of the recursion, there is a fact "$start(a)$" stored on each node $a$ at the beginning of the computation, which triggers a clock used to coordinate stages. In the sequel we do not distinguish between $G(@x, y)$ and $G(@y, x)$.

Rewriting Algorithm: The algorithm rewrites the input program step by step.

Step 1: Distributing Data
Input: $P_{DL}$. Output: $P_1$.

Algorithm $Localize(P_{DL})$ chooses one variable as the holding variable for each relation in $P_{DL}$. $P_1$ is obtained by marking the holding variable of each literal in $P_{DL}$.

The Rewriting Algorithm supports different assignments of holding variables. For simplicity, we assume the leftmost variable of each relation is chosen as holding variable. For lack of space, we do not address the associated optimization problem.

Step 2: Distributing Computation
Input: $P_1$. Output: $\langle P_2, \kappa \rangle$.

Let $\Delta$ be the diameter of $G$. For each rule $r \in P_1$, assume

• $hv_{head_r}$ is the holding variable of $head_r$,
• $h_r := hv_{head_r}$, and
• $CN_r := \{h_r\}$.
$Rewrite(r, h_r, CN_r)$ recursively rewrites the rule $r$ into several rules until the output rules satisfy the localization restriction (i). $body_r$ is divided into several parts: the local part that can be evaluated locally and the non-local part that cannot be evaluated locally. $h_r$ is the holding variable of the literals in the local part. The non-local part is partitioned into several disconnected parts which share no variables except the variables in $CN_r$, and which are evaluated by additional rules $r_i$ on different nodes in parallel. The deduced facts of $r_i$ are pushed to the node where the rule $r$ is evaluated. Meanwhile, it calculates the number of rounds $\kappa_r$ for evaluating $r$.

$Rewrite(r, h_r, CN_r)$: output $\langle T_r, \kappa_r \rangle$
Begin
  Assume $r : \gamma \mathrel{:-} \gamma_1; \ldots; \gamma_l.$ where $l \geq 1$.
  Let $S = \{\gamma_1, \ldots, \gamma_l\}$, $S' = \{\gamma_i \mid \gamma_i \in S \text{ and } hv_{\gamma_i} = h_r\}$, so that $S'$ contains all the literals in $body_r$ whose holding variable is the same as the one of the head, $h_r$.
  - If $S' = S$, then $T_r := \{r\}$, and $\kappa_r := 1$.
  - If $S' \neq S$,
    Begin
      Let $S'' := S - S'$, so that $S''$ contains all the literals in $body_r$ whose holding variables are not $h_r$.
      For $\gamma_j, \gamma_k \in S''$, let $\gamma_j \approx \gamma_k$ if $\gamma_j$ and $\gamma_k$ have some common variables besides the variables in $CN_r$.
      Assume $\{S''_1, \ldots, S''_n\}$ ($n \geq 1$) is a partition of $S''$ in minimal subsets closed under $\approx$, so that the literals in $S''$ are divided into disconnected "subgraph" components.
      For each $S''_i$, $i \in [1, n]$, let $T_i := \{hv_{\gamma_{iw}} \mid \gamma_{iw} \in S''_i \text{ and } G(@h_r, hv_{\gamma_{iw}}) \in S'\}$, so that $T_i$ contains the variables which are the holding variable of one literal in $S''_i$ and are also a neighbor of $h_r$.
      - If $T_i \neq \emptyset$, then the non-local part $S''_i$ is connected with the local part $S'$.
        Choose one variable $hv_{\gamma_{iu}}$ from $T_i$.
        Let $S''_i := S''_i \cup \{G(@hv_{\gamma_{iu}}, h_r)\}$.
        Let $h_{r_i} := hv_{\gamma_{iu}}$.
        Let $CN_{r_i} := CN_r \cup \{h_{r_i}\}$.
        Let $d_{r_i} := 1$.
        Assume $S''_i = \{\gamma_{i,1}, \ldots, \gamma_{i,m_i}\}$.
        Let $r_i : Q_i(\vec{y}_i) \mathrel{:-} \gamma_{i,1}; \ldots; \gamma_{i,m_i}.$ where $Q_i$ is a new relation name and $\vec{y}_i$ contains all the variables occurring both in $S''_i$ and in either $S'$ or $head_r$, that is in $var(S''_i) \cap (var(S') \cup var(head_r))$, with $h_r$ as holding variable.
      - If $T_i = \emptyset$, then the non-local part $S''_i$ is disconnected from the local part $S'$.
        Choose one literal $\gamma_{it} \in S''_i$.
        Assume $y$ is a variable not occurring in $r$, let $S''_i := S''_i \cup \{y = hv_{\gamma_{it}}\}$.
        Let $h_{r_i} := hv_{\gamma_{it}}$.
        Let $CN_{r_i} := CN_r \cup \{h_{r_i}\}$.
        Let $d_{r_i} := 1 + \Delta$.
        Assume $S''_i = \{\gamma_{i,1}, \ldots, \gamma_{i,m_i}\}$.
        Let $r_i : Q_i(\vec{y}_i) \mathrel{:-} \gamma_{i,1}; \ldots; \gamma_{i,m_i}.$ where $Q_i$ is a new relation name and $\vec{y}_i$ contains all the variables occurring both in $S''_i$ and in either $S'$ or $head_r$, that is in $var(S''_i) \cap (var(S') \cup var(head_r))$, with $y$ as holding variable.
        Moreover, let $r'_i : Q_i(@x \ldots) \mathrel{:-} Q_i(@y \ldots);\ G(@y, x).$
      Assume $S' = \{\gamma'_1, \ldots, \gamma'_k\}$ ($k \geq 0$), let
        $r' : \gamma \mathrel{:-} \gamma'_1; \ldots; \gamma'_k; Q_1(\vec{y}_1); \ldots; Q_n(\vec{y}_n).$
      $Q_i(\vec{y}_i)$, $i \in [1, n]$, is called a sub-query.
      Assume $\langle T_{r_i}, \kappa_{r_i} \rangle = Rewrite(r_i, h_{r_i}, CN_{r_i})$, let
      • $T_r := \{r'\} \cup \bigcup_{i \in [1,n]} (\{r'_i\} \cup T_{r_i})$, and
      • $\kappa_r := \max\{\kappa_{r_i} + d_{r_i} \mid i \in [1, n]\}$.
    End
End

Finally, let

• $P_2 := \bigcup_{r \in P_1} T_r$, and
• $\kappa := \max\{\Delta, \max\{\kappa_r \mid r \in P_1\}\}$.

Step 3: Communication
Input: $\langle P_2, \kappa \rangle$. Output: $\langle P_3, \kappa \rangle$.

$P_3$ is obtained by adding $\uparrow$ in the head of each rule $r \in P_2$ whose head has a holding variable different from the holding variable of the body, so that the rules in $P_3$ satisfy the localization restrictions (ii) and (iii).

Step 4: Stage coordination with clocks
Input: $\langle P_3, \kappa \rangle$. Output: $P_4$.
The rules in $P_3$ are modified as follows:

- Add the literals "$clock(@x, q)$" and "$q \neq 0$" to the body of each rule, where $x$ is the holding variable of the body.

- For each rule with an intensional relation $R$ of $P_{DL}$ in its head, replace $R$ in the head with $tempR$ and add

    $R(\vec{x}) \mathrel{:-} tempR(\vec{x});\ clock(@x, 0).$
    $continue(@x) \mathrel{:-} tempR(\vec{x});\ \neg R(\vec{x});\ clock(@x, 0).$
    $\uparrow inf(@y, x) \mathrel{:-} tempR(\vec{x});\ \neg R(\vec{x});\ clock(@x, 0);\ G(@x, y).$

  in $P_4$, where $x$ is the holding variable of both $R$ and $tempR$.

- Add

    $continue(@x) \mathrel{:-} start(@x).$
    $\uparrow inf(@y, x) \mathrel{:-} start(@x);\ G(@x, y).$
    $clock(@x, \kappa) \mathrel{:-} start(@x).$
    $clock(@x, p) \mathrel{:-} clock(@x, q);\ q \geq 1;\ p = q - 1;\ \neg stop(@x).$
    $clock(@x, \kappa) \mathrel{:-} clock(@x, 0);\ \neg stop(@x).$
    $\uparrow inf(@z, x) \mathrel{:-} inf(@y, x);\ G(@y, z);\ x \neq z;\ clock(@x, q);\ q \geq \Delta.$
    $continue(@x) \mathrel{:-} inf(@x, y);\ clock(@x, q);\ q \neq 0.$
    $continue(@x) \mathrel{:-} continue(@x);\ clock(@x, q);\ q \neq 0.$
    $stop(@x) \mathrel{:-} \neg continue(@x);\ clock(@x, 0).$

  in $P_4$.

Step 5: Inflationary result
Input: $P_4$. Output: $P_{NL}$.

$P_{NL}$ contains the rules in $P_4$ and the following rules:

- For each relation $R$ in $P_4$ but not in $P_{DL}$, except $start$, $clock$, $continue$, $inf$ and $stop$, add

    $R(\ldots @x \ldots) \mathrel{:-} R(\ldots @x \ldots);\ clock(@x, q);\ q \neq 0.$

  in $P_{NL}$.

- For each intensional relation $R$ of $P_{DL}$, add

    $R(\vec{x}) \mathrel{:-} R(\vec{x}).$

  in $P_{NL}$.

It is obvious that each rule in a program $P_{NL}$ produced by the Rewriting Algorithm satisfies the localization restrictions, and can thus be computed effectively on one node. We can now state the main result of this section, which shows that the global semantics of $P_{DL}$ coincides with the distributed semantics of $P_{NL}$.

Theorem 6.
For a graph $G = (V, G)$, a Datalog program $P_{DL}$ and its rewritten Netlog program $P_{NL}$ produced by the Rewriting Algorithm, the computation of $P_{NL}$ on $G$ terminates iff the computation of $P_{DL}$ on $G$ terminates, and $P_{NL}(G) = P_{DL}(G)$.

$P_{NL}$ slows down the computation of $P_{DL}$. During one stage ($\kappa$ rounds) of the computation of $P_{NL}$, the clock turns from $\kappa$ to $0$, the sub-queries are evaluated and the sub-results are transmitted. At the end of each stage, the deduced facts for the intensional relations of $P_{DL}$ are cumulated and all the sub-results are cleared. Hence, one such stage of $P_{NL}$ is equivalent to one stage of $P_{DL}$. For an intensional relation $R$ of $P_{DL}$, $R(\vec{c}) \in I^{DL}_i$ if and only if $R(\vec{c}) \in I^{NL}_{i(\kappa+1)+1}$, $i \geq 0$, where $I^{DL}_i$ and $I^{NL}_i$ are the stages of the fixpoints of $P_{DL}$ and $P_{NL}$ respectively.

The termination of the computation of $P_{NL}$ is ensured by the predicate $stop$ as follows: the computation starts with a fact $start(a)$ on each node $a$, which triggers $clock(a, \kappa)$, $continue(a)$ and $inf(b, a)$ where $b$ is a neighbor of $a$. When the clock decreases from $\kappa$ to $0$, the evaluation of the sub-queries is done. The facts of an intensional relation $R$ of $P_{DL}$ are stored in $tempR$. Meanwhile, $inf(v, a)$ is pushed to all the other nodes $v$ to inform them that the computation on $a$ continues, so that $continue(v)$ is deduced. $continue(a)$ for one stage is maintained to the end of the stage. When the clock turns to zero, (i) the program checks if $continue(a)$ is true. If it is false, $stop(a)$ is deduced. Since $\neg stop(a)$ is a precondition for decreasing the clock, and the clock is a precondition for deducing facts of all the other relations except $R$, only the facts of $R$ are preserved along the stages. Thus the fixpoint is obtained and the computation terminates.
Otherwise ($stop(a)$ is not deduced), the computation continues. (ii) The program compares the facts of $tempR$ and $R$. If there are newly deduced facts, these facts are added into $R$. Meanwhile $continue(a)$ and $inf(b, a)$ are deduced for the next stage.

The proof of Theorem 6 relies on the following Lemma and the fact that the Rewriting Algorithm produces only rules satisfying the localization restrictions.

Lemma 7. For a graph $G = (V, G)$, a Datalog program $P_{DL}$ and its rewritten Netlog program $P_{NL}$ produced by the Rewriting Algorithm, the computation sequence $(I^{NL}_j)_{j \geq 0}$ for $P_{NL}$ satisfies:

1. For each relation $R$ in $P_{NL}$, $R(\vec{c}) \in I^{NL}_p$ iff $R(\vec{c}) \in I^{NL}_{p,c_1}$ and $R(\vec{c}) \notin I^{NL}_{p,c'}$, where $c_1$ is the holding node of $R(\vec{c})$ and $c' \neq c_1$.

2. $I^{NL}_0 = \{start(v) \mid v \in V\}$.

3. If $clock(a, c) \in I^{NL}_p$, then $clock(v, c) \in I^{NL}_p$ for all $v \in V$. If $stop(a) \in I^{NL}_p$, then $stop(v) \in I^{NL}_p$ for all $v \in V$.

4. If $stop(a) \in I^{NL}_s$, then (i) $clock(a, \kappa) \in I^{NL}_s$, (ii) for $q \in [1, s]$, $clock(a, \kappa - p) \in I^{NL}_q$, $p \in [0, \kappa]$, iff $q = n(\kappa + 1) + p + 1$, and (iii) if $R(\vec{c}) \in I^{NL}_f$ where $f > s + 1$, then $R$ is an intensional relation of $P_{DL}$. $continue(a) \notin I^{NL}_{n(k+)}$ for any $a \in V$ and any $p \geq s - (\kappa + 1)$ iff $stop(a) \in I^{NL}_s$. (iv) $continue(a) \notin I^{NL}_s$.

5. For each relation $R$ in $P_{NL}$ but not in $P_{DL}$, except the relations $start$, $clock$, $continue$, $inf$ and $stop$, (i) if $R(\vec{c}) \in I^{NL}_p$, then $clock(a, \kappa) \notin I^{NL}_p$, and (ii) if $p = n(\kappa + 1) + q$, $q \in [2, \kappa + 1]$, then $R(\vec{c}) \in I^{NL}_{n(\kappa+1)+q'}$, $q' \in [q, \kappa + 1]$.

6. For each intensional relation $R$ of $P_{DL}$, if $R(\vec{c}) \in I^{NL}_p$ then $R(\vec{c}) \in I^{NL}_{p'}$ where $p' \geq p$. Assume $q = \min\{p \mid R(\vec{c}) \in I^{NL}_p\}$, then $clock(a, \kappa) \in I^{NL}_q$.

Now we prove Theorem 6.

Proof.
Assume the computation sequence for $P_{DL}$ is $(I^{DL}_i)_{i \geq 0}$ and for $P_{NL}$ is $(I^{NL}_j)_{j \geq 0}$. We prove that for any intensional relation $Q$ of $P_{DL}$, $Q(\vec{c}) \in I^{NL}_{i(\kappa+1)+1}$ iff $Q(\vec{c}) \in I^{DL}_i$.

Basis: $i = 0$, $I^{DL}_0 = \emptyset$ and $I^{NL}_1 = \{continue(a), inf(b, a), clock(a, \kappa) \mid a \in V, G(a, b)\}$.

Induction: Suppose that for $n \geq 0$ and each intensional relation $Q$ of $P_{DL}$, $Q(a_1, \ldots, a_k) \in I^{DL}_n$ iff $Q(a_1, \ldots, a_k) \in I^{NL}_{n(\kappa+1)+1}$.

First we prove that for $n + 1$, if $Q(b_1, \ldots, b_k) \in I^{DL}_{n+1}$, then $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)+1}$.

If $Q(b_1, \ldots, b_k) \in I^{DL}_{n+1}$, then (i) $Q(b_1, \ldots, b_k) \in I^{DL}_n$ or (ii) $Q(b_1, \ldots, b_k)$ is a newly deduced fact in $I^{DL}_{n+1}$.

If $Q(b_1, \ldots, b_k) \in I^{DL}_n$, then $Q(b_1, \ldots, b_k) \in I^{NL}_{n(\kappa+1)+1}$ by the induction hypothesis, and $Q(b_1, \ldots, b_k) \in I^{NL}_p$ where $p \geq n(\kappa+1)+1$ by Lemma 7.6, therefore $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)+1}$.

Otherwise ($Q(b_1, \ldots, b_k) \notin I^{DL}_n$), there is one rule $r \in P_{DL}$

$$r : Q(x_1, \ldots, x_k) \mathrel{:-} R_1(\vec{y}_1); \ldots; R_m(\vec{y}_m); \neg R_{m+1}(\vec{y}_{m+1}); \ldots; \neg R_l(\vec{y}_l).$$

and an instantiation $\sigma$ of the variables in $r$ such that $\sigma(x_i) = b_i$ for $i \in [1, k]$,

$$R_i(\sigma(\vec{y}_i)) \begin{cases} \in I^{DL}_n \cup G, & \text{for } i \in [1, m] \\ \notin I^{DL}_n \cup G, & \text{for } i \in [m+1, l] \end{cases}$$

and for some $e \in [1, m]$, $R_e(\sigma(\vec{y}_e)) \notin I^{DL}_{n-1} \cup G$.

By the induction hypothesis and Lemma 7.1,

$$R_i(\sigma(\vec{y}_i)) \begin{cases} \in I^{NL}_{n(\kappa+1)+1,\,\sigma(hv_{R_i})} \cup G, & \text{for } i \in [1, m] \\ \notin I^{NL}_{n(\kappa+1)+1,\,\sigma(hv_{R_i})} \cup G, & \text{for } i \in [m+1, l] \end{cases}$$

and $R_e(\sigma(\vec{y}_e)) \notin I^{NL}_{(n-1)(\kappa+1)+1,\,\sigma(hv_{R_i})} \cup G$.
According to Lemma 7.6,

$$R_i(\sigma(\vec{y}_i)) \begin{cases} \in I^{NL}_{p,\,\sigma(hv_{R_i})} \cup G, & \text{for } i \in [1, m] \\ \notin I^{NL}_{p,\,\sigma(hv_{R_i})} \cup G, & \text{for } i \in [m+1, l] \end{cases}$$

where $p \in [n(\kappa+1)+1, (n+1)(\kappa+1)]$, and $R_e(\sigma(\vec{y}_e))$ is newly deduced in $I^{NL}_{n(\kappa+1)+1}$. So $continue(\sigma(hv_{R_e})) \in I^{NL}_{n(\kappa+1)+1}$. By Lemma 7.4, $stop(a) \notin I^{NL}_{n(\kappa+1)+1}$ and $clock(a, \kappa - p) \in I^{NL}_{n(\kappa+1)+1+p}$ for $p \in [0, \kappa]$ and for any $a \in V$.

Because

    $Q(@x_1, \ldots, x_k) \mathrel{:-} tempQ(@x_1, \ldots, x_k);\ clock(@x_1, 0).$

is in $P_{NL}$, if $tempQ(b_1, \ldots, b_k) \in I^{NL}_{n(\kappa+1)+1+p}$, $p \in [\kappa_r, \kappa]$, then $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)+1}$ since $\kappa_r \leq \kappa$.

According to the Rewriting Algorithm, $h_r = hv_Q$, $CN_r = \{h_r\}$ and

• if all the holding variables of the literals in $body_r$ are the same as $h_r$ ($S' = S$), then

    $tempQ(@x_1, \ldots, x_k) \mathrel{:-} R_1(\vec{y}_1); \ldots; R_m(\vec{y}_m); \neg R_{m+1}(\vec{y}_{m+1}); \ldots; \neg R_l(\vec{y}_l);\ clock(@x_1, q);\ q \neq 0.$

  and

    $tempQ(@x_1, \ldots, x_k) \mathrel{:-} tempQ(@x_1, \ldots, x_k);\ clock(@x_1, q);\ q \neq 0.$

  are in $P_{NL}$, and $\kappa_r = 1$. Therefore $tempQ(b_1, \ldots, b_k) \in I^{NL}_{n(\kappa+1)+1+p}$ for each $p \in [1, \kappa]$, and $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)+1}$ by Lemma 7.1 and 7.5.

• Otherwise, not all of the holding variables of the literals in $body_r$ are the same as $h_r$ ($S' \neq S$). Assume $hv_{R_1} = \cdots = hv_{R_w} = hv_{R_{m+1}} = \cdots = hv_{R_{m+u}} = h_r$. Then

    $tempQ(@x_1, \ldots, x_k) \mathrel{:-} R_1(\vec{y}_1); \ldots; R_w(\vec{y}_w); \neg R_{m+1}(\vec{y}_{m+1}); \ldots; \neg R_{m+u}(\vec{y}_{m+u}); Q_1(\vec{z}_1); \ldots; Q_t(\vec{z}_t);\ clock(@x_1, q);\ q \neq 0.$

  and

    $tempQ(@x_1, \ldots, x_k) \mathrel{:-} tempQ(@x_1, \ldots, x_k);\ clock(@x_1, q);\ q \neq 0.$

  are in $P_{NL}$, where $Q_i(\vec{z}_i)$ is in $head_{r_i}$ for $r_i \in P_{NL}$. If for each $i \in [1, t]$, $Q_i(\vec{c}_i) \in I^{NL}_{n(\kappa+1)+1+(\kappa_r - 1)}$, where $\vec{c}_i = \sigma(\vec{z}_i)$, then $tempQ(b_1, \ldots, b_k) \in I^{NL}_{n(\kappa+1)+1+\kappa_r}$, and then $tempQ(b_1, \ldots, b_k) \in I^{NL}_{n(\kappa+1)+1+p}$, $p \in [\kappa_r, \kappa]$.

  $r_i$ is as follows: the literals $R_{w+1}(\vec{y}_{w+1}), \ldots, R_m(\vec{y}_m), \neg R_{m+u+1}(\vec{y}_{m+u+1}), \ldots, \neg R_l(\vec{y}_l)$ are grouped into subsets $S''_1, \ldots, S''_n$, such that the literals in different subsets have no common variables except the variable in $CN_r$, which is $x_1$. For each $S''_i$,

  - if some of the holding variables of the literals in $S''_i$ are neighbors of $h_r$ ($T_i \neq \emptyset$), then $G(@hv_{\gamma_{iu}}, h_r)$, where $hv_{\gamma_{iu}}$ is one of such variables, is added into $S''_i$. Then $h_{r_i} = hv_{\gamma_{iu}}$ and $CN_{r_i} = CN_r \cup \{h_{r_i}\}$. The literals in $S''_i$ along with "$clock(@hv_{\gamma_{iu}}, q)$" and "$q \neq 0$" constitute $body_{r_i}$. "$\uparrow Q_i(\vec{z}_i)$" constitutes $head_{r_i}$, where $\vec{z}_i$ contains all the variables occurring both in $body_{r_i}$ and in any of $R_1(\vec{y}_1), \ldots, R_w(\vec{y}_w), \neg R_{m+1}(\vec{y}_{m+1}), \ldots, \neg R_{m+u}(\vec{y}_{m+u})$ or $head_r$, with $h_r$ as the holding node. If the evaluation of $r_i$ is finished, the result for the sub-query $Q_i$ gets to $\sigma(h_r)$ in the next round. $d_{r_i} = 1$.

  - Otherwise (none of the holding variables of the literals in $S''_i$ are neighbors of $h_r$), "$y = hv_{\gamma_{it}}$" is added into $S''_i$, where $\gamma_{it}$ is one literal in $S''_i$ and $y$ does not occur in $r$. $h_{r_i} = hv_{\gamma_{it}}$. $CN_{r_i} = CN_r \cup \{h_{r_i}\}$. The literals in $S''_i$ along with "$clock(@y, q)$" and "$q \neq 0$" constitute $body_{r_i}$. $head_{r_i}$ is "$\uparrow Q_i(\vec{z}_i)$", where $\vec{z}_i$ contains all the variables occurring both in $body_{r_i}$ and in any of $R_1(\vec{y}_1), \ldots, R_w(\vec{y}_w), \neg R_{m+1}(\vec{y}_{m+1}), \ldots, \neg R_{m+u}(\vec{y}_{m+u})$ or $head_r$, with $y$ as holding node.

    Moreover, because the following rule is in $P_{NL}$

    $\uparrow Q_i(@x \ldots) \mathrel{:-} Q_i(@y \ldots);\ G(@y, x);\ clock(@y, q);\ q \neq 0.$
therefore if the evaluation of $r_i$ is finished, then the result for the sub-query $Q_i$ is obtained locally in the next round and is then broadcast to every node in $\Delta$ rounds. $d_{r_i} = 1 + \Delta$.

For each $i \in [1, t]$, if $Q_i(\sigma(h_{r_i}) \ldots) \in I^{NL}_{n(\kappa+1)+1+\kappa_{r_i}}$, then $Q_i(\vec{c}_i) \in I^{NL}_{n(\kappa+1)+1+(\kappa_{r_i}+d_{r_i})-1}$, and because $\kappa_r = \max\{\kappa_{r_i} + d_{r_i}\}$, $Q_i(\vec{c}_i) \in I^{NL}_{n(\kappa+1)+1+(p'-1)}$ for $p' \in [\kappa_{r_i} + d_{r_i}, \kappa_r]$.

Each $r_i$ is then rewritten by the $Rewrite$ function and the output rules are modified by the Rewriting Algorithm. A set of rules $T_r \subseteq P_{NL}$ is obtained by applying the Rewriting Algorithm on $r$. For $r' \in T_r$ with some sub-queries, $\kappa_{r'} = \max\{\kappa_{r'_i} + d_{r'_i} \mid r' \in T_r \text{ and } r'_i \text{ is a sub-query of } r'\}$, and for $r'' \in T_r$ without sub-queries, $\kappa_{r''} = 1$. The answers to $r \in T_r$ are in $I^{NL}_{n(\kappa+1)+1+\kappa_r}$. Therefore $Q_i(\sigma(h_{r_i}) \ldots) \in I^{NL}_{n(\kappa+1)+1+\kappa_{r_i}}$ for each $i \in [1, t]$, so finally $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)+1}$.

Then we prove that if $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)+1}$ then $Q(b_1, \ldots, b_k) \in I^{DL}_{n+1}$ for $n + 1$.

If $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)+1}$, then (i) $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)}$ or (ii) $tempQ(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)}$ and $clock(b_1, 0) \in I^{NL}_{(n+1)(\kappa+1)}$.

If $Q(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)}$, then according to Lemma 7.6, $Q(b_1, \ldots, b_k) \in I^{NL}_{n(\kappa+1)+1}$. By the induction hypothesis, $Q(b_1, \ldots, b_k) \in I^{DL}_n$, so $Q(b_1, \ldots, b_k) \in I^{DL}_{n+1}$.

Otherwise ($Q(b_1, \ldots, b_k) \notin I^{NL}_{(n+1)(\kappa+1)}$), the rule $Q(@x \ldots) \mathrel{:-} tempQ(@x \ldots);\ clock(@x, 0)$ is in $P_{NL}$, $tempQ(b_1, \ldots, b_k) \in I^{NL}_{(n+1)(\kappa+1)}$, and $clock(b_1, 0) \in I^{NL}_{(n+1)(\kappa+1)}$. Therefore $stop(a) \notin I^{NL}_{(n+1)(\kappa+1)+1}$, and by Lemma 7.4, $clock(a, \kappa - p) \in I^{NL}_q$, $p \in [0, \kappa]$, iff $q = n(\kappa+1) + 1 + p$.
A set of rules in $P_{NL}$ of the following form, with $Q_i(\vec{x}_i), Q_{i1}(\vec{z}_{i1}), \ldots, Q_{io}(\vec{z}_{io})$ as sub-queries, is used, and only used, for deducing $tempQ(b_1, \ldots, b_k)$:

$$(\uparrow) Q_i(\vec{x}_i) \mathrel{:-} R_{i1}(\vec{y}_{i1}); \ldots; R_{im}(\vec{y}_{im}); \neg R_{im+1}(\vec{y}_{im+1}); \ldots; \neg R_{il}(\vec{y}_{il}); Q_{i1}(\vec{z}_{i1}); \ldots; Q_{io}(\vec{z}_{io}); clock(@y, q); q \neq 0.$$

and $tempQ(b_1, \ldots, b_k) \in I^{NL}_p$, $p \in [p', (n+1)(\kappa+1)]$, for some $p' \in [2, (n+1)(\kappa+1)]$.

According to the Rewriting Algorithm, all of these rules are rewritten from a rule in $P_{DL}$ with $Q(x_1, \ldots, x_k)$ as the head and the literals $R_t(\vec{y}_t)$ and $\neg R_u(\vec{y}_u)$ occurring in these rules as the body. Assume the rule is

$$r:\ Q(x_1, \ldots, x_k) \mathrel{:-} R_1(\vec{y}_1); \ldots; R_m(\vec{y}_m); \neg R_{m+1}(\vec{y}_{m+1}); \ldots; \neg R_l(\vec{y}_l).$$

By Lemma 7.6,

$$R_i(\sigma(\vec{y}_i)) \begin{cases} \in I^{NL}_{n(\kappa+1)+1} \cup G, & \text{for } i \in [1, m] \\ \notin I^{NL}_{n(\kappa+1)+1} \cup G, & \text{for } i \in [m+1, l] \end{cases}$$

where $\sigma(x_i) = b_i$ for $i \in [1, k]$, and for some $e \in [1, m]$, $R_e(\sigma(\vec{y}_e)) \notin I^{NL}_{(n-1)(\kappa+1)+1} \cup G$.

By the induction hypothesis,

$$R_i(\sigma(\vec{y}_i)) \begin{cases} \in I^{DL}_n \cup G, & \text{for } i \in [1, m] \\ \notin I^{DL}_n \cup G, & \text{for } i \in [m+1, l] \end{cases}$$

and $R_e(\sigma(\vec{y}_e)) \notin I^{DL}_{n-1} \cup G$. So $Q(b_1, \ldots, b_k) \in I^{DL}_{n+1}$.

Therefore, for an intensional relation $Q$ of $P_{DL}$, $Q(\vec{c}) \in I^{DL}_i$ if and only if $Q(\vec{c}) \in I^{NL}_{i(\kappa+1)+1}$.

We now prove that the computation of $P_{NL}$ on $G$ terminates iff the computation of $P_{DL}$ on $G$ terminates.
The computation of $P_{DL}$ on $G$ terminates,
iff $(I^{DL}_j)_{j \geq 0}$ converges,
iff no new facts in any intensional relation of $P_{DL}$ are deduced in $I^{DL}_i$ for the minimal $i$,
iff no new facts in any intensional relation of $P_{DL}$ are deduced in $I^{NL}_{i(\kappa+1)+1}$ for the minimal $i$,
iff $continue(v) \notin I^{NL}_{(i+1)(\kappa+1)}$ and $continue(v) \in I^{NL}_{i(\kappa+1)}$,
iff $stop(v) \in I^{NL}_{(i+1)(\kappa+1)+1}$,
iff $clock(v, c) \notin I^{NL}_{(i+1)(\kappa+1)+2}$,
iff only the facts in the intensional relations of $P_{DL}$ are in $I^{NL}_p$, $p > (i+1)(\kappa+1)+2$,
iff $(I^{NL}_j)_{j \geq 0}$ converges,
iff the computation of $P_{NL}$ on $G$ terminates.

Therefore the computation of $P_{NL}$ on $G$ terminates iff the computation of $P_{DL}$ on $G$ terminates, and $P_{NL}(G) = P_{DL}(G)$.

7 Restriction to neighborhood

We next consider a restriction of FO and FP to bounded neighborhoods of nodes, which ensures that the distributed computation can be performed with only a bounded number of messages per node.

Let $dist(x, y) \leq k$ be the first-order formula stating that the distance between $x$ and $y$ in the graph is no more than $k$. Let $N_k(x) = \{y \mid dist(x, y) \leq k\}$ denote the $k$-neighborhood of $x$. Let $\varphi(x, \vec{y})$ be an FO formula with free variables $x, \vec{y}$. Then $\varphi^{(k)}(x, \vec{y})$ denotes the formula with all the variables occurring in $\varphi$ relativized to the $k$-neighborhood of $x$, that is, each quantifier $\forall/\exists z$ is replaced by $\forall/\exists z \in N_k(x)$, and $y \in N_k(x)$ is added for each free variable $y$.

The local fragments of FO and FP can be defined as follows.

Definition 4. $FO_{loc}$ is the set of FO formulae of the form $\varphi^{(k)}(x, \vec{y})$.

The local fragment of FP can be defined as the fixpoint of $FO_{loc}$ formulae.

Definition 5. $FP_{loc}$ is the set of FP formulae of the form $\mu(\varphi^{(k)}(T)(x, \vec{y}))$, where $\vec{y} = y_1 \ldots y_\ell$ and $T$ is of arity $\ell + 1$.

Consider again the examples of Section 2.
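As an illustration of the definitions above, the following minimal sketch computes $N_k(x)$ by breadth-first search and evaluates quantifiers relativized to it. It is not the paper's query engine: the adjacency-dictionary representation and the function names `k_neighborhood`, `exists_in_neighborhood`, and `forall_in_neighborhood` are assumptions made here for illustration, and the sketch runs centrally rather than in a distributed fashion.

```python
from collections import deque


def k_neighborhood(adj, x, k):
    """Return N_k(x) = {y | dist(x, y) <= k} using breadth-first search.

    adj is assumed to map each node to an iterable of its neighbors.
    """
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            # Nodes at distance k are kept but not expanded further.
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def exists_in_neighborhood(adj, x, k, pred):
    """Relativized quantifier: does some z in N_k(x) satisfy pred(z)?"""
    return any(pred(z) for z in k_neighborhood(adj, x, k))


def forall_in_neighborhood(adj, x, k, pred):
    """Relativized quantifier: do all z in N_k(x) satisfy pred(z)?"""
    return all(pred(z) for z in k_neighborhood(adj, x, k))


# Usage on a path graph 0 - 1 - 2 - 3 - 4 (illustrative data only).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(k_neighborhood(path, 2, 1)))
print(exists_in_neighborhood(path, 0, 1, lambda z: len(path[z]) == 2))
```

Because every quantifier ranges only over $N_k(x)$, the truth value at $x$ depends on a part of the graph whose size is bounded when the degree is bounded, which is the property the complexity results of this section rely on.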
It is easy to verify that the formula $\mu(\varphi(T)(x, h, d))$ defining the OLSR-like table-based routing is not in $FP_{loc}$. On the other hand, the formula $\mu(\varphi(ST)(x, y))$ defining the spanning tree is in $FP_{loc}$, as are the formulae $\mu(\varphi(RouteReq)(x, y, d))$ and $\mu(\varphi(NextHop)(x, y, d))$ defining the AODV-like On-Demand Routing.

7.1 Distributed complexity

We now show that the distributed computation of the local fragments, $FO_{loc}$ and $FP_{loc}$, can be done very efficiently. We assume that the nodes are equipped with ports for each of their neighbors. The ports make it possible to bound the message size by a constant independent of the network size.

The proof relies, as previously, on specific query engines for $FO_{loc}$ and $FP_{loc}$. The query engine for $FO_{loc}$ works both in synchronous and asynchronous systems.

Query engine for $FO_{loc}$ ($QE_{FO_{loc}}$)

The requesting node broadcasts the $FO_{loc}$ formula $\varphi^{(k)}(x, \vec{y})$. When a node $a$ receives the query $\varphi^{(k)}(x, \vec{y})$, it collects the topology information of its $k$-neighborhood by sending messages of size $O(1)$, then evaluates $\varphi^{(k)}(a, \vec{y})$ (where $x$ is instantiated by $a$) by in-node computation.

Since all nodes collect the topology information of their $k$-neighborhoods concurrently, these computations may interfere with each other. To avoid interference between the concurrent local computations of different nodes, the traces of traversed ports are incorporated in all messages.

Each node collects the topology of its $k$-neighborhood as follows.

• When a node $a$ receives the query $\varphi^{(k)}(x, \vec{y})$, it sends a message ("collect", $k$, $j$) to its neighbor through port $j$, for each of its ports $j$, and waits for replies.
• Upon reception of a message ("collect", $i$, $j_1 \ldots j_{2(k-i)+1}$) on port $j'$, $a$ adds $j_1 \ldots j_{2(k-i)+1} j'$ to a table $tracelist_a$, and
  – if $i > 0$, $a$ sends on each port $j''$ such that $j'' \neq j'$ the message ("collect", $i - 1$, $j_1 \ldots j_{2(k-i)+1} j' j''$), and waits for replies;
  – otherwise ($i = 0$), $a$ sends on port $j'$ the message ("reply", $j_1 j_2 \ldots j_{2k+1}$, $j'$, $tracelist_a$).
• Upon reception of a message ("reply", $j_1 \ldots j_{2r+1}$, $j_{2r+2} \ldots j_{2k+2}$, $tracelist'_1 \ldots tracelist'_{k-r+1}$) on port $j_{2r+1}$, once replies from all the other ports have been received,
  – if $r = 0$, then for $1 \leq s \leq k + 1$, $a$ stores $(j_1 \ldots j_{2s}, tracelist'_s)$ in its local memory;
  – otherwise, $a$ sends on port $j_{2r}$ the message ("reply", $j_1 \ldots j_{2r-1}$, $j_{2r} \ldots j_{2k+2}$, $tracelist_a\ tracelist'_1 \ldots tracelist'_{k-r+1}$).
• After receiving replies from all ports, $a$ computes the topology of its $k$-neighborhood from the stored tuples $(j_1 \ldots j_{2r}, tracelist)$ as follows.

Let $T_k(a) := \{ j_1 \ldots j_{2r} \mid (j_1 \ldots j_{2r}, tracelist) \text{ is stored in the local memory of } a \}$.

Define an equivalence relation $\approx$ on $T_k(a)$ as follows: for $j_1 \ldots j_{2r}, j'_1 \ldots j'_{2s} \in T_k(a)$, $j_1 \ldots j_{2r} \approx j'_1 \ldots j'_{2s}$ if and only if there exist stored tuples $(j_1 \ldots j_{2r}, tracelist)$ and $(j'_1 \ldots j'_{2s}, tracelist')$ such that $j_1 \ldots j_{2r} \in tracelist'$ or $j'_1 \ldots j'_{2s} \in tracelist$.

The vertex set of the $k$-neighborhood of $a$ is

$$\{\, j_1 \ldots j_{2r} \mid j_1 \ldots j_{2r} \in T_k(a),\ r \leq k \,\} / {\approx},$$

namely the equivalence classes $[j_1 \cdots j_{2r}]$ of $\approx$ on the elements $j_1 \ldots j_{2r}$ ($r \leq k$) of $T_k(a)$.
Let $[j_1 \ldots j_{2r}]$ and $[j'_1 \ldots j'_{2s}]$ be two vertices of the $k$-neighborhood of $a$. There is an edge between $[j_1 \ldots j_{2r}]$ and $[j'_1 \ldots j'_{2s}]$ if and only if there is $j^*_1 j^*_2 \ldots j^*_{2t+1} j^*_{2t+2} \in T_k(a)$ such that $j^*_1 \ldots j^*_{2t} \approx j_1 \ldots j_{2r}$ and $j^*_1 \ldots j^*_{2t+2} \approx j'_1 \ldots j'_{2s}$.

We can now state our main result for $FO_{loc}$.

Theorem 8. Let $\mathbf{G} = (V, G)$ be a network with $n$ nodes and diameter $\Delta$. $FO_{loc}$ formulae $\varphi^{(k)}(x, \vec{y})$ can be evaluated on $\mathbf{G}$ with the following complexity upper bounds:

IN-TIME/ROUND    DIST-TIME      MSG-SIZE    #MSG/NODE
$O(1)$           $O(\Delta)$    $O(1)$      $O(1)$

Note that the distributed time $O(\Delta)$ comes from the initial broadcasting of the formula. The computation itself is fully local, and can be done in $O(1)$ distributed time. In the case of an asynchronous system, DIST-TIME is bounded by $O(n)$.

We now consider $FP_{loc}$, which admits the same complexity bounds as $FO_{loc}$ except for the distributed time. We first assume that the system is synchronous, and discuss the asynchronous case later.

Query engine for $FP_{loc}$ ($QE_{FP_{loc}}$)

Request flooding. The requesting node sets a clock $\sigma$ of value $\Delta$ and broadcasts the message $(\mu(\varphi^{(k)}(T)(x, \vec{y})), \Delta - 1)$ to its neighbors. When a node $a$ receives a message $(\mu(\varphi^{(k)}(T)(x, \vec{y})), c)$ and has not set the clock $\sigma$ before, it sets a clock $\sigma$ of value $c$, and if $c > 0$, it broadcasts the message $(\mu(\varphi^{(k)}(T)(x, \vec{y})), c - 1)$ to all its neighbors.

Topology collection. When the clock $\sigma$ expires, each node $a$ sets a clock $\sigma'$ of value $4k$ and starts collecting all the topology information of its $2k$-neighborhood by sending messages and tracing the traversed ports (as for Theorem 8).
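The effect of the request-flooding phase described above is that all clocks $\sigma$ expire in the same global round, so that topology collection starts simultaneously everywhere. The sketch below simulates this in a synchronous model; it is an illustration under stated assumptions, not the paper's implementation. The function name `flood_request`, the adjacency dictionary, and the convention that a message sent in round $r$ is delivered in round $r + 1$ are all assumptions introduced here, and the formula itself is omitted from the messages since only the counter matters for the timing.

```python
def flood_request(adj, requester, diameter):
    """Simulate synchronous request flooding.

    Returns a dict mapping each reached node to the global round in which
    its clock sigma expires. Assumes `diameter` is the diameter of the
    (connected) graph described by `adj`.
    """
    # Round 0: the requester sets its clock to Delta ...
    expiry = {requester: diameter}
    # ... and broadcasts the counter Delta - 1 to its neighbors.
    in_flight = [(nb, diameter - 1) for nb in adj[requester]]
    current_round = 0
    while in_flight:
        current_round += 1  # messages sent last round are delivered now
        next_flight = []
        for node, c in in_flight:
            if node in expiry:
                continue  # clock already set: the message is ignored
            expiry[node] = current_round + c  # clock of value c set now
            if c > 0:
                next_flight.extend((nb, c - 1) for nb in adj[node])
        in_flight = next_flight
    return expiry


# Usage on a path graph 0 - 1 - 2 - 3 of diameter 3 (illustrative data only).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(flood_request(path, 1, 3))
```

A node at distance $d$ from the requester receives the counter $\Delta - d$ in round $d$, so its clock expires in round $d + (\Delta - d) = \Delta$, whatever its distance.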
Each node $a$ now has a $2k$-local name for each $a'$ in its $k$-neighborhood, which is the set of traces from $a$ to $a'$ of length no more than $2k$. We denote this $2k$-local name of $a'$ at $a$ by $Name^{2k}_a(a')$.

Fixpoint computation. Each node $a$ has a local table storing the tuples $(a, \vec{b})$ in $T$, which uses the $k$-local names $Name^k_a(a')$ of $a'$.

When the clock $\sigma'$ expires, each node $a$ sets a clock $\tau = 3k$ and starts evaluating the FO formula $\varphi^{(k)}(T)(a, \vec{y})$ (where $x$ is instantiated by $Name^{2k}_a(a)$, the $2k$-local name of $a$ at $a$).

Node $a$ evaluates $\varphi^{(k)}(T)(a, \vec{y})$ by instantiating all the (free or bound) variables in $\varphi^{(k)}(T)(a, \vec{y})$ by its $2k$-local names $Name^{2k}_a(a')$ for the nodes in its $k$-neighborhood, and by considering all the possible instantiations one by one.

Suppose $a$ instantiates $(x, \vec{y})$ by $(a, \vec{b})$ and also instantiates all the bound variables; a variable-free formula $\psi$ is then obtained. Since $\psi$ may contain atomic formulae $T(a', \vec{b}')$, $a$ sends the query $?T(a', \vec{b}')$ to $a'$; then $a'$ checks whether $T(a', \vec{b}')$ holds and sends the answer to $a$. This works because from $Name^{2k}_a(b'_i)$, the $2k$-local name of $b'_i$ at $a$, $a'$ can obtain $Name^k_{a'}(b'_i)$, the $k$-local name of $b'_i$ at $a'$.

During the above evaluation of $\varphi^{(k)}(T)(a, \vec{y})$, if a new tuple $(a, \vec{b})$ satisfying $\varphi^{(k)}(T)(x, \vec{y})$ is obtained, $a$ stores it in a temporary buffer (the local table for $T$ will be updated later), using the $k$-local names of $a$ and $\vec{b}$ at $a$, and sends messages to inform the other nodes in its $k$-neighborhood that new facts have been produced.
When the clock $\tau$ of a node $a$ expires, $a$ resets $\tau$ to $3k$; if some new tuples have been produced, $a$ updates the local table for $T$ and empties the temporary buffer; if some new tuples have been produced or some informing messages have been received, it evaluates $\varphi^{(k)}(T)(a, \vec{y})$ again.

Theorem 9. Let $\mathbf{G} = (V, G)$ be a network with $n$ nodes and diameter $\Delta$. $FP_{loc}$ formulae $\mu(\varphi^{(k)}(T)(x, \vec{y}))$ can be evaluated on $\mathbf{G}$ with the following complexity upper bounds:

IN-TIME/ROUND    DIST-TIME    MSG-SIZE    #MSG/NODE
$O(1)$           $O(n)$       $O(1)$      $O(1)$

Proof. It is easy to see that the messages sent during the computation of $QE_{FP_{loc}}$ are of size $O(1)$.

Before the clock $\sigma$ expires, each node evidently sends only $O(1)$ messages of the form $(\mu(\varphi^{(k)}(T)(x, \vec{y})), c)$.

Then each node sets the clock $\sigma'$ and collects the topology information of its $2k$-neighborhood. Since the degree of the nodes is bounded, the $2k$-neighborhood of $a$ contains only $O(1)$ nodes, so each node sends only $O(1)$ messages in this phase as well.

After the clock $\sigma'$ expires, each node $a$ sets the clock $\tau$ and starts evaluating $\varphi^{(k)}(T)(a, \vec{y})$. During each period $3k$ of $\tau$, node $a$ considers all the possible instantiations of the (free or bound) variables in $\varphi^{(k)}(T)(a, \vec{y})$ one by one and evaluates the instantiated formula. During each such period, since the total number of different instantiations is $O(1)$ and only $O(1)$ messages are sent during the evaluation of each instantiated formula $\varphi^{(k)}(T)(a, \vec{b})$, the total number of messages sent by $a$ is $O(1)$.
Moreover, after the clock $\sigma'$ expires and before the distributed computation terminates, each node $a$ sends only $O(1)$ messages: $a$ can only receive informing messages from nodes in its $k$-neighborhood, and the total number of tuples $(a, \vec{b})$ produced on nodes in the $k$-neighborhood of $a$ is $O(1)$, so the total number of informing messages received by $a$ is $O(1)$. Consequently $a$ evaluates $\varphi^{(k)}(x, \vec{y})$ at most $O(1)$ times, and thus the total number of messages sent by $a$ is $O(1)$.

After the clock $\sigma'$ expires, during each period $3k$ of the clock $\tau$ before termination, at least one informing message is sent by some node, which means that at least one new tuple in $T$ is produced. Since there are at most $O(n)$ tuples in $T$, the total distributed time for the evaluation of $\mu(\varphi^{(k)}(T)(x, \vec{y}))$ is $O(n)$.

For asynchronous systems, a spanning tree rooted at the requesting node can be used to evaluate $FP_{loc}$, and the complexity bounds DIST-TIME and #MSG/NODE become respectively $O(n^2)$ and $O(n)$.

7.2 Networks with no global identifiers

The query engines $QE_{FO_{loc}}$ and $QE_{FP_{loc}}$ evaluate $FO_{loc}$ and $FP_{loc}$ queries by using only local names in the bounded neighborhoods of nodes, which suggests that unique global identifiers for nodes are unnecessary for the evaluation of the local fragments of FO and FP. In this section, we show that this is essentially the case, and consider their evaluation on networks with identifiers which are only locally consistent and on anonymous networks with ports.

Definition 6. A network $\mathbf{G} = (V, G, L)$ with a labeling function $L : V \to C$ assigning identifiers to nodes is $k$-locally consistent if for each node $a \in V$ and for any two distinct $b_1, b_2 \in N_k(a)$, $L(b_1) \neq L(b_2)$.

Ports have been used to construct local names in the previous subsection.
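The condition of Definition 6 can be checked directly on small examples. The following sketch is only an illustration: the helper names `ball` and `is_k_locally_consistent`, the adjacency dictionary, and the label dictionary are assumptions made here, and the definition is read as requiring distinct labels for distinct nodes of $N_k(a)$.

```python
from collections import deque


def ball(adj, a, k):
    """Return the k-neighborhood N_k(a) using breadth-first search."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond distance k
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def is_k_locally_consistent(adj, label, k):
    """Check that labels are pairwise distinct inside every N_k(a)."""
    for a in adj:
        labels = [label[b] for b in ball(adj, a, k)]
        if len(labels) != len(set(labels)):
            return False
    return True


# Usage: a 6-cycle labeled 0, 1, 2, 0, 1, 2 (illustrative data only).
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
labels = {i: i % 3 for i in range(6)}
print(is_k_locally_consistent(cycle, labels, 1))
print(is_k_locally_consistent(cycle, labels, 2))
```

On this cycle, three identifiers suffice for 1-local consistency even though they are not globally unique, whereas each 2-neighborhood contains five nodes and therefore needs at least five distinct identifiers.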
Such ports are not needed to evaluate $FO_{loc}$ and $FP_{loc}$ on locally consistent networks, since these networks have locally unique identifiers for nodes.

Theorem 10. An $FO_{loc}$ formula $\varphi^{(k)}(x, \vec{y})$ can be evaluated on $k$-locally consistent networks with the following complexity upper bounds:

IN-TIME/ROUND    DIST-TIME      MSG-SIZE    #MSG/NODE
$O(1)$           $O(\Delta)$    $O(1)$      $O(1)$

Theorem 11. An $FP_{loc}$ formula $\mu(\varphi^{(k)}(T)(x, \vec{y}))$ can be evaluated on $k$-locally consistent networks with the following complexity upper bounds:

IN-TIME/ROUND    DIST-TIME    MSG-SIZE    #MSG/NODE
$O(1)$           $O(n)$       $O(1)$      $O(1)$

Local fragments of FO and FP can also be evaluated with the same complexity bounds on anonymous networks with ports, since local names can be obtained by tracing the traversed ports of messages.

Note that in general, FO and FP queries cannot be evaluated over locally consistent or anonymous networks.

8 Conclusion

Fixpoint logic expresses, at a global level and in a declarative way, the interesting functionalities of distributed systems. We have proved that fixpoint formulae over graphs admit reasonable distributed complexity upper bounds. Moreover, we showed how global formulae can be translated into rule programs describing the behavior of the nodes of the network and computing the same result. The examples given in the paper have been implemented on the Netquest system, which supports the Netlog language. Finally, we showed the potential of fragments of fixpoint logic restricted to local neighborhoods, which are still very expressive but admit much tighter distributed complexity upper bounds, with a bounded number of messages of bounded size, independent of the size of the network.
These results show how classical logical formalisms can help in designing high-level programming abstractions for distributed systems that allow one to state the desired global result without specifying its computation mode. We plan to pursue this investigation in the following directions. (i) Investigate the distributed complexity of other logical formalisms such as monadic second-order logic, which is very expressive on graphs. (ii) Study the optimization of the translation from fixpoint logic to Netlog, to obtain efficient programs. (iii) Extend these results to other distributed computing models.

9 Acknowledgments

The authors thank Huimin Lin for fruitful discussions.

References

[1] S. Abiteboul, Z. Abrams, S. Haar, and T. Milo. Diagnosis of asynchronous discrete event systems: datalog to the rescue! In Proceedings of the Twenty-fourth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Baltimore, Maryland, USA, 2005.
[2] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
[3] G. Alonso, E. Kranakis, C. Sawchuk, R. Wattenhofer, and P. Widmayer. Probabilistic protocols for node discovery in ad hoc multi-channel broadcast networks. In Ad-Hoc, Mobile, and Wireless Networks, Second International Conference, ADHOC-NOW, 2003.
[4] H. Attiya and J. Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics. Wiley-Interscience, 2004.
[5] Y. Bejerano, Y. Breitbart, M. N. Garofalakis, and R. Rastogi. Physical topology discovery for large multi-subnet networks. In INFOCOM, 2003.
[6] Y. Bejerano, Y. Breitbart, A. Orda, R. Rastogi, and A. Sprintson. Algorithms for computing qos paths with restoration. IEEE/ACM Trans. Netw., 13(3), 2005.
[7] H. Ebbinghaus and J. Flum. Finite model theory. Springer-Verlag, Berlin, 1999.
[8] R. Fagin. Generalized first-order spectra and polynomial-time recognizable sets. In Complexity of computation (Proc. SIAM-AMS Sympos. Appl. Math., New York, 1973), pages 43–73. SIAM–AMS Proc., Vol. VII. Amer. Math. Soc., Providence, R.I., 1974.
[9] W. F. Fung, D. Sun, and J. Gehrke. Cougar: the network is the database. In SIGMOD Conference, 2002.
[10] S. Grumbach, J. Lu, and W. Qu. Self-organization of wireless networks through declarative local communication. In OTM, On the Move Conference, MONET Workshop, volume LNCS 4805, pages 497–506, 2007.
[11] N. Immerman. Expressibility and parallel complexity. SIAM J. Comput., 18(3):625–638, 1989.
[12] B. T. Loo, T. Condie, J. M. Hellerstein, P. Maniatis, T. Roscoe, and I. Stoica. Implementing declarative overlays. In 20th ACM SOSP Symposium on Operating Systems Principles, 2005.
[13] B. T. Loo, J. M. Hellerstein, I. Stoica, and R. Ramakrishnan. Declarative routing: extensible routing with declarative queries. In ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2005.
[14] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. Tinydb: an acquisitional query processing system for sensor networks. ACM Trans. Database Syst., 30(1), 2005.
[15] P. Marron and D. Minder et al. Embedded WiSeNts research roadmap. Technical report, Embedded WiSeNts Consortium, 2006.
[16] R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, 2003.
[17] R. Ramakrishnan and J. Ullman. A survey of deductive database systems. Journal of Logic Programming, 23(2), 1995.
[18] F. Reiss and J. M. Hellerstein. Declarative network monitoring with an underprovisioned query processor. In ICDE, 2006.