Decomposition Techniques for Subgraph Matching

In the constraint programming framework, state-of-the-art static and dynamic decomposition techniques are hard to apply to problems with complete initial constraint graphs. For such problems, we propose a hybrid approach of these techniques in the pr…

Authors: Stéphane Zampelli (Université catholique de Louvain, Belgium), Martin Mann (Albert‑Ludwigs‑Universität Freiburg, Germany), Yves Deville (Université catholique de Louvain

Stéphane Zampelli (1), Martin Mann (2), Yves Deville (1), Rolf Backofen (2)

(1) Université catholique de Louvain, Department of Computing Science and Engineering, 2, Place Sainte-Barbe, 1348 Louvain-la-Neuve (Belgium), {stephane.zampelli,yves.deville}@uclouvain.be
(2) Albert-Ludwigs-University Freiburg, Bioinformatics Group, Institute of Computer Science, Georges-Koehler-Allee 106, D-79110 Freiburg (Germany), {mmann,backofen}@informatik.uni-freiburg.de

Abstract. In the constraint programming framework, state-of-the-art static and dynamic decomposition techniques are hard to apply to problems with complete initial constraint graphs. For such problems, we propose a hybrid approach of these techniques in the presence of global constraints. In particular, we solve the subgraph isomorphism problem. Further, we design specific heuristics for this hard problem, exploiting its special structure to achieve decomposition. The underlying idea is to precompute a static heuristic on a subset of its constraint network, to follow this static ordering until a first problem decomposition is available, and to switch afterwards to a fully propagated, dynamically decomposing search. Experimental results show that, for sparse graphs, our decomposition method solves more instances than dedicated, state-of-the-art matching algorithms or standard constraint programming approaches.

1 Introduction

Graph pattern matching is a central application in many fields [1] and can be successfully modeled using constraint programming [12,17,19]. Here, we show how to apply decomposition techniques to solve the Subgraph Isomorphism Problem (SIP) in order to outperform the dedicated state-of-the-art algorithm. Decomposition techniques are an instantiation of the divide and conquer paradigm to overcome redundant work for independent partial problems.
A constraint satisfaction problem (CSP) can be associated with its constraint network, which represents the active constraints together with their relationships. During search, the constraint network loses structure as variables are instantiated and constraints are entailed by domain propagation. The constraint network may then consist of two or more independent components, leading to redundant work due to the repeated computation and combination of the corresponding independent partial solutions. The key to solving this is decomposition, which consists of two steps. The first step detects the possible problem decompositions by examining the underlying constraint network for independent components. The second step exploits these independent components by solving the corresponding partial CSPs independently, and combines their solutions without redundant work. Decomposition can occur at any node of the search tree, i.e. at the root node or dynamically during search.

In constraint programming, decomposition techniques have been studied through the concept of AND/OR search [15]. AND/OR search is sensitive to problem decomposition, introducing AND nodes that combine search subtrees as an extension to classical OR search nodes. The size of the minimal AND/OR search tree is exponential in the tree width, while the size of the minimal OR search tree is exponential in the path width; the AND/OR search tree is never larger than the OR search tree.

The check for decomposition is usually done in one of two ways. Either only the initial constraint network is statically analyzed, resulting in a so-called pseudo-tree. This structure encodes both the static search heuristic and the information on when a subproblem is decomposable [5]. Another possibility is to consider the dynamic changes of the constraint network by analyzing it at each node during the search [3].
Such a dynamic approach is better suited if a strong constraint propagation (e.g. by AC) is present, but obviously at the cost of additional computations.

A major problem of decomposition techniques is their problem specificity. Without good heuristics, decomposition may occur seldom or very late, such that the computational overhead for checking etc. is too high for an efficient application. Nevertheless, some approaches have been shown to be more general by applying dedicated algorithms, e.g. graph separators or cycle cutset conditioning [10,15,16]. However, those (usually static) algorithms fail to compute good heuristics on problems with global constraints, which have an initially complete constraint graph. Indeed, such algorithms presuppose a sparse constraint graph. In the subgraph isomorphism problem (SIP), for example, the initial constraint graph is complete due to the presence of a global alldiff constraint. This prevents cycle cutset and graph separator algorithms from being applied.

A further drawback of a static analysis is the non-predictable decomposability of the constraint network achieved by constraint entailment through propagation. To exploit this, a dynamic analysis of the problem structure during the search is necessary. This is of high importance for SAT [13] and CSP solving [3]. Unfortunately, a dynamic analysis requires significant additional work that slows down the search process once more.

In this paper we show how to overcome those shortcomings by combining static and dynamic decomposition approaches to take advantage of decomposition for the hard problem of SIP. A combination yields a balance between the fast static analysis and the needed full propagation exploited by dynamic search strategies in the presence of global constraints.
The underlying idea is to follow the static ordering until a first problem decomposition is available (or likely) and to switch afterwards to a fully propagated decomposing search. For the latter, we consider only a binary constraint representation inside the constraint network in order to compute a good decomposition-enforcing heuristic. As shown in the experiments, this idea is a key point for an efficient application of a decomposing search (such as AND/OR) for the SIP.

To face the problem of graph pattern matching [1], many different types of algorithms have been proposed, ranging from general methods to specific algorithms for particular types of graphs. The state-of-the-art approach is the dedicated VF algorithm, freely available in the C++ vflib library [2]. In constraint programming, several authors [12,17] have shown that graph matching can be formulated as a CSP, and argued that constraint programming could be a powerful tool to handle its combinatorial complexity. Our modeling [19] is based on these works. In [19], we showed that a CSP approach is competitive with dedicated algorithms over a graph database representing graphs with various topologies.

Regarding decomposition, Valiente et al. [18] have shown how to use decomposition techniques in order to speed up subgraph homeomorphism. [18] states that, if the initial pattern graph is made of several disconnected components, then matching each component separately is equivalent to matching all of them together. Specific algorithms are also demonstrated. Our work can be seen as an extension of this work. We consider the subgraph isomorphism problem instead of the subgraph homeomorphism problem. The latter case is easier as the constraint graph is made only of the initial pattern graph.
Moreover, we apply the decomposition dynamically, whereas [18] decomposes only statically on the initial pattern graph.

Objectives and results - In this paper we study the limits of the direct application of state-of-the-art (static and dynamic) decomposition techniques for problems with global constraints; we show that such a direct application is useless for SIP. We develop a hybrid decomposition approach for such problems and design specific search heuristics for SIP, exploiting the structure of the problem to achieve decomposition. We show that the CP approach using the proposed decomposition techniques outperforms the state-of-the-art algorithms, and solves more instances on some classes of problems (sparse instances with many solutions).

The paper is structured as follows. Section 2 introduces a decomposition method able to detect decomposition at any stage during the search. In Section 3, the proposed decomposition method is applied and specialized to SIP. Experimental results assessing the efficiency of our approach are presented in Section 4. Section 5 concludes the paper.

2 Decomposition

In this section we show how to define and detect decomposition during search. Sections 2.1 and 2.2 define a decomposition method able to detect decomposition at any state during search, considering that we do not know a priori when decomposition occurs. Section 2.3 shows that our method is able to compute the same decompositions as the AND/OR search framework [15], where the search is precomputed on a graph representation of the constraint network, and decomposition events are known in advance. The AND/OR search method has been shown to be very attractive for a large class of constraint networks. But as we will see in Section 3, our method is suited for the SIP while the AND/OR method is not applicable because the decomposition events cannot be precomputed.
2.1 Preliminary

A Constraint Satisfaction Problem (CSP) $P$ is a triple $(X, D, C)$ where $X = \{x_1, \dots, x_n\}$ is a set of variables, $D = \{D_1, \dots, D_n\}$ is a set of domains (each a finite set of values), each variable $x_i$ is associated with a domain $D_i$, and $C$ is a finite set of constraints with $\mathit{scope}(c) \subseteq X$ for all $c \in C$, where $\mathit{scope}(c)$ is the set of variables involved in the constraint $c$. A constraint $c$ over a set of variables defines a relation between the variables. A solution of the CSP is an assignment of each variable in $X$ to one value in its associated domain so that no constraint $c \in C$ is violated. We denote by $\mathit{Sol}(P)$ the set of solutions of a CSP $P$.

A partial CSP $\hat{P}$ of a CSP $P \equiv (X, D, C)$ is a CSP $(\hat{X}, \hat{D}, \hat{C})$ where $\hat{X} \subseteq X$, $\forall \hat{D}_k \in \hat{D} : \hat{D}_k \subseteq D_k$ and $\hat{C} \subseteq C$. Note that since $\hat{P}$ is a CSP, we have $\mathit{scope}(\hat{c}) \subseteq \hat{X}$ for all $\hat{c} \in \hat{C}$.

2.2 Decomposing CSPs and graphs

This subsection defines the notion of decomposition for a CSP. A CSP is decomposable into partial CSPs if the CSP and its decomposition have the same solutions.

Definition 1. A CSP $P$ is decomposable into partial CSPs $P_1, \dots, P_k$ iff:
– $\forall s \in \mathit{Sol}(P) : \exists s_1, \dots, s_k \in \mathit{Sol}(P_1), \dots, \mathit{Sol}(P_k) : s = \cup_{i \in [1,k]} s_i$
– $\forall s_1, \dots, s_k \in \mathit{Sol}(P_1), \dots, \mathit{Sol}(P_k) : \exists s \in \mathit{Sol}(P) : s = \cup_{i \in [1,k]} s_i$.

This general definition of decomposition can be instantiated to two practical cases. The first definition corresponds to the direct intuition of a decomposition: a CSP is decomposable if it can be split into disjoint partial CSPs. It is called 0-decomposability as no variables are shared between the partial CSPs.

Definition 2. A CSP $P = (X, D, C)$ is 0-decomposable into partial CSPs $P_1, \dots, P_k$ with $P_i = (X_i, D_i, C_i)$ iff $\forall\, 1 \le i < j \le k : X_i \cap X_j = \emptyset$, $\cup_{i \in [1,k]} X_i = X$, $\cup_{i \in [1,k]} D_i = D$, $\cup_{i \in [1,k]} C_i = C$.
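As an illustration of Definitions 1 and 2, the following Python sketch enumerates by brute force the solutions of a toy CSP and of two disjoint partial CSPs, and checks that the solutions of the whole problem are exactly the unions of one solution per part. The variables a, b, c, d, their domains and the two constraints are hypothetical and chosen only for this example; the helper `solutions` is an assumption of the sketch, not part of the method described in this paper.

```python
from itertools import product

def solutions(variables, domains, constraints):
    """Enumerate all solutions of a small CSP by brute force.

    constraints is a list of (scope, predicate) pairs, where scope is a
    tuple of variable names and predicate takes their values in order.
    """
    found = []
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(pred(*(assignment[v] for v in scope)) for scope, pred in constraints):
            found.append(assignment)
    return found

# Hypothetical toy CSP P: two constraints over disjoint sets of variables.
domains = {"a": [1, 2], "b": [1, 2], "c": [1, 2, 3], "d": [1, 2, 3]}
c_ab = (("a", "b"), lambda a, b: a != b)
c_cd = (("c", "d"), lambda c, d: d > c)

sol_p = solutions(["a", "b", "c", "d"], domains, [c_ab, c_cd])
sol_p1 = solutions(["a", "b"], domains, [c_ab])   # partial CSP P1
sol_p2 = solutions(["c", "d"], domains, [c_cd])   # partial CSP P2

# Definition 1: combine one solution of each partial CSP by union.
combined = [{**s1, **s2} for s1 in sol_p1 for s2 in sol_p2]
```

In a plain OR search, the partial problem over c and d would be solved again for each of the two solutions over a and b; solving both parts once and combining them yields the same six solutions.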
The second definition finds more decompositions by allowing the partial CSPs to have instantiated variables in common. It is called 1-decomposability as variables shared between the partial CSPs have a domain of size 1.

Definition 3. A CSP $P = (X, D, C)$ is 1-decomposable into partial CSPs $P_1, \dots, P_k$ with $P_i = (X_i, D_i, C_i)$ iff $\forall\, 1 \le i < j \le k : x \in (X_i \cap X_j) \Rightarrow |D_x| = 1$, $\cup_{i \in [1,k]} X_i = X$, $\cup_{i \in [1,k]} D_i = D$, $\cup_{i \in [1,k]} C_i = C$.

The relationship with the general definition is direct. If a CSP $P$ is 0-decomposable or 1-decomposable into partial CSPs $P_1, \dots, P_k$, then $P$ is decomposable into partial CSPs $P_1, \dots, P_k$. From Definitions 2 and 3, it follows further:

Property 1. If a CSP $P = (X, D, C)$ is 0-decomposable into $P_1, \dots, P_k$, then $P$ is 1-decomposable into $P_1, \dots, P_k$. Further, $P$ might be 1-decomposable into $P'_1, \dots, P'_{k'}$ with $k' \ge k$ via overlapping partial problems $P'_i$.

Redundant computation during CSP solving is performed whenever a CSP is 0- or 1-decomposable into $k$ partial CSPs $P_1, \dots, P_k$. For instance, if the solutions of $P_1$ are computed first, then for each solution of $P_1$ all solutions of $P_2, \dots, P_k$ are computed again. Therefore, $P_2, \dots, P_k$ are solved $|\mathit{Sol}(P_1)|$ times, and this overhead can be exponential in the size of the CSP. This can be avoided by solving the partial problems independently.

The necessary detection of the CSP decomposition into independent partial CSPs can be performed through the concept of constraint graphs. A graph $G = (V, E)$ consists of a vertex/node set $V$ and an edge set $E \subseteq V \times V$, where an edge $(u, v)$ is a pair of nodes. The vertices $u$ and $v$ are the endpoints of the edge $(u, v)$. We consider directed and undirected graphs.
A subgraph of a graph $G = (V, E)$ is a graph $G' = (V', E')$ with $V' \subseteq V$ and $E' \subseteq E$ such that $\forall (u, v) \in E' : u, v \in V'$. A graph $G$ is said to be singly connected if and only if there is at most one simple path between any two nodes in $G$.

Definition 4. The constraint graph of a (partial) CSP $P = (X, D, C)$ is an undirected graph $G_P = (V, E)$ where $V = X$ and $E = \{(x_i, x_j) \mid \exists c \in C : x_i, x_j \in \mathit{scope}(c)\}$.

Note that all variables in the scope of one constraint form a clique in $G_P$. This constraint graph is also called the primal graph [4]. There is a standard syntactic way of decomposing a CSP, based on its constraint graph.

Definition 5. A graph $G = (V, E)$ is decomposable into $k$ subgraphs $G_1, \dots, G_k$ iff $\forall\, 1 \le i$ [...]

[...] and correspond to assignment of the values $v_k$ in the domains of the variables. The root of the AND/OR search tree is an OR node, labeled with the root of the pseudo-tree $T_P$. The children of an OR node $x_i$ are AND nodes labeled with assignments $\langle x_i, v_k \rangle$, consistent along the path from the root. The children of an AND node $\langle x_i, v_k \rangle$ are OR nodes labeled with the children of variable $x_i$ in $T_P$.

Semantically, the OR states represent alternative solutions, whereas the AND nodes represent the problem decompositions into independent partial problems, all of which need to be solved. When the pseudo-tree is a chain, the AND/OR search tree coincides with the regular OR search tree.

Following the ordering induced by a given pseudo-tree $T_P$ of the constraint graph of a CSP $P$, the notion of 1-decomposability coincides with the decompositions induced by an AND/OR search.

Property 3.
Given a CSP $P = (X, D, C)$, a pseudo-tree $T_P$ over the constraint graph of $P$ and a path $p$ of length $l$ ($l \ge 1$) from the root node of $T_P$ to an AND node $p_l$, the CSP $P$ where all variables in the path $p$ are assigned is 1-decomposable into $P_1, \dots, P_k$, where $k$ is the number of OR successors in $T_P$ of the end node $p_l$.

Proof - Let $y_1, \dots, y_k$ ($k \ge 1$) be the OR successor nodes of the end node $p_l$ in $T_P$. We denote by $\mathit{tree}(y_i)$ the tree rooted at $y_i$ in $T_P$. Let $X_s = \{v \in X \mid v \in p\}$. Then build the partial CSPs $P_i = (X_i, D_i, C_i)$ ($i \in [1, k]$):
$X_i = X_s \cup \{v \in X \mid v \in \mathit{tree}(y_i)\}$
$D_i = \{D_x \in D \mid x \in X_i\}$
$C_i = \{c \in C \mid \mathit{scope}(c) \subseteq X_i\}$.
It is clear that $\cup_{i \in [1,k]} C_i = C$ since there exists no constraint between two different $\mathit{tree}(y_i)$ in $T_P$, by definition of a pseudo-tree. $\square$

As will be explained in the next section, neither static nor dynamic AND/OR search is suited for our particular problem. In SIP, the constraint graph is complete, and thus the pseudo-tree is a chain, leading to an AND/OR search tree equivalent to an OR search tree. However, the CSP $P$ becomes 1-decomposable during search, and a dynamic framework is needed in order to check decomposition at any state during the search. But this is computationally very expensive, as we will show in Section 4.

3 Applying decomposition to SIP

3.1 Subgraph Isomorphism Problem Definition

A subgraph isomorphism problem between a pattern graph $G_p = (V_p, E_p)$ and a target graph $G_t = (V_t, E_t)$ consists in deciding whether $G_p$ is isomorphic to some subgraph of $G_t$. More precisely, one should find an injective function $f : V_p \to V_t$ such that $\forall (u, v) \in E_p : (f(u), f(v)) \in E_t$. This NP-hard problem is also called the subgraph monomorphism problem or subgraph matching in the literature. The function $f$ is called a subgraph matching function.
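The definition of a subgraph matching function can be made concrete with a small Python sketch. It checks the two conditions stated above (injectivity and arc preservation) and enumerates all matchings of a tiny instance by brute force. The function names and the example graphs are hypothetical and serve only as an illustration; the enumeration is exponential and is not the solving method studied in this paper.

```python
from itertools import permutations

def is_subgraph_matching(f, pattern_edges, target_edges):
    """Check that f is injective and maps every pattern arc to a target arc."""
    injective = len(set(f.values())) == len(f)
    return injective and all((f[u], f[v]) in target_edges for u, v in pattern_edges)

def all_matchings(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Enumerate all subgraph matching functions (tiny graphs only)."""
    target_edges = set(target_edges)
    result = []
    # Each permutation of len(pattern_nodes) target nodes is an injective map.
    for image in permutations(target_nodes, len(pattern_nodes)):
        f = dict(zip(pattern_nodes, image))
        if is_subgraph_matching(f, pattern_edges, target_edges):
            result.append(f)
    return result

# Hypothetical example: a directed path u -> v -> w matched into a directed 4-cycle.
pattern_nodes = ["u", "v", "w"]
pattern_edges = [("u", "v"), ("v", "w")]
target_nodes = [0, 1, 2, 3]
target_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

matchings = all_matchings(pattern_nodes, pattern_edges, target_nodes, target_edges)
```

In this example the path can start at any of the four nodes of the cycle, so four matching functions exist.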
We assume the graphs are directed. Undirected graphs are a particular case where undirected arcs are replaced by two directed arcs.

The CSP model $P = (X, D, C)$ of subgraph isomorphism should represent a total function $f : V_p \to V_t$. This total function can be modeled with $X = x_1, \dots, x_n$, with $x_i$ an FD variable corresponding to the $i$-th node of $G_p$ and $D(x_i) = V_t$. The injective condition is modeled with an $\mathit{alldiff}(x_1, \dots, x_n)$ global constraint. The isomorphism condition is translated into a set of $n$ $k$-ary constraints $MC_i \equiv (x_i, x_j) \in E_t$ for all $x_i \in V_p$. Given the above modelling, the constraint graph of the CSP, called the SIP constraint graph, is the graph $G_P = (V_P, E_P)$ where $V_P = X$ and $E_P = E_p \cup E_{\neq}$. Note that $E_p$ represents all propagations of the $MC_i$ constraints, while $E_{\neq}$ depicts the global alldiff constraint, i.e. a clique ($E_{\neq} = V_p \times V_p$). Therefore, the SIP-CSP consists of global constraints only, which would prevent decomposition using a static AND/OR search. Implementation, comparison with dedicated algorithms, and extension to subgraph isomorphism and to graph and function computation domains can be found in [19].

3.2 Decomposing SIP

This subsection explains how to decompose the SIP problem. We first show why static AND/OR search fails by studying the SIP constraint graph.

Static AND/OR Search: Because of the alldiff constraint, the SIP constraint graph corresponds to the complete graph $K_{|V_p|}$. The pseudo-tree computed on the constraint graph of any SIP instance is a chain, detecting no decomposition at all. Moreover, the initial SIP constraint graph is not 1-decomposable. Therefore a static analysis of the SIP-CSP yields no decomposition at all and is not applicable.

Decomposition seems difficult to achieve.
However, as variables are assigned during search, 1-decomposition may occur at some nodes of the search tree. A dynamic detection of 1-decomposition at different nodes of the search tree gives a first way of detecting decomposition for the SIP.

Dynamic AND/OR Search: A dynamic analysis of the SIP constraint graph, as done for dynamic AND/OR search, takes care of possible constraint entailments and propagation results. It is therefore very useful for a strongly propagated CSP. The main drawback is the slowdown due to the additional propagation and dynamic decomposition checks. Further, the SIP constraint graph is still a complete one and does not allow for decomposition.

Our 1-decomposition removes assigned variables in the decomposition process. One could also remove entailed constraints, thus leading to more decomposition. This can easily be done for the alldiff constraint by removing an edge $(x_i, x_j) \in E_P$ representing $x_i \neq x_j$ when $D_i \cap D_j = \emptyset$ ($i \neq j$). In the following, we redefine the constraint graph of a SIP as a constraint graph for the morphism constraints together with a dynamic constraint graph of the alldiff constraint.

Definition 8. Given the CSP $P = (X, D, C)$ of a SIP instance, its SIP constraint graph is the undirected graph $G = (V, E_{MC} \cup E_{\neq})$, where $V = X$, $E_{MC} = \{(x_i, x_j) \in E_p \mid x_i, x_j \in X\}$ and $E_{\neq} = \{(i, j) \in X \times X \mid D_i \cap D_j \neq \emptyset\}$.

Given the particular structure of a SIP constraint graph, it is possible to specialize and simplify the detection of 1-decomposition.

Property 4. Let $P = (X, D, C)$ be a CSP model of a SIP instance, and let $G = (V, E_{MC} \cup E_{\neq})$ be its SIP constraint graph. Let $M = (V', E')$ be the constraint graph without assigned variables, i.e. with $V' = \{x \in X \mid |D_x| > 1\}$ and $E' = (V' \times V') \cap E_{MC}$. Then $P$ is 1-decomposable into $P_1, \dots, P_m$ iff $M$ is decomposable into $M_1, \dots, M_m$ and $D(M_i) \cap D(M_j) = \emptyset$ ($1 \le i < j \le m$), with $D(M_i)$ the union of the domains of the variables associated with the nodes of $M_i$.

The above property states that the decomposition of $M$ is a necessary condition. We can therefore design heuristics leading to the decomposition of $M$, hence in some cases to the decomposition of $P$.

A direct approach consists in detecting 1-decomposition at each node of the search tree. When the CSP becomes 1-decomposable into partial CSPs, those are computed separately in AND nodes. As shown in the experimental section, this strategy proves to be much slower than a standard OR search tree. The reason is twofold:
1. Decomposition is tested at every node of the search tree. Starting from the root node is useless, as a lot of computation time is lost.
2. There is no guarantee that a decomposition will occur.
Based on this observation, we present a hybrid approach combining the best of the static and dynamic strategies.

The Hybrid Approach: As stated before, even a dedicated dynamic AND/OR search, checking for decomposition on the reduced constraint graph only, is not fast enough to compete with state-of-the-art SIP solvers as implemented in the vflib library. Therefore, we suggest a hybrid approach in order to fix this. The idea is as follows:
1. calculate a static pseudo-tree heuristic on the reduced constraint graph
2. apply a forward checking search following the pseudo-tree up to the first branching or until a fixed number of variables is assigned
3. switch the strategy to dynamic AND/OR search with full AC-propagation
This ensures that the expensive dynamic approach is first used when a decomposition is available or at least likely after full propagation.
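The detection of 1-decomposition stated in Property 4 can be sketched in Python as follows. This is one possible reading, under the assumption that the current domains are already propagated: unassigned variables are grouped by connectivity, where two variables are linked either by a morphism arc of the pattern or by an alldiff edge that is kept only while their domains still share a value. The function names and the example instance are hypothetical and do not come from the implementation used in the experiments.

```python
from itertools import combinations

def connected_components(nodes, edges):
    """Connected components of an undirected graph, by depth-first search."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        if u in adjacency and v in adjacency:  # ignore edges touching assigned variables
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            n = stack.pop()
            if n in component:
                continue
            component.add(n)
            stack.extend(adjacency[n] - component)
        seen |= component
        components.append(component)
    return components

def sip_components(domains, pattern_edges):
    """Groups of unassigned variables that can be solved independently.

    domains: dict mapping each variable to its current set of target nodes.
    pattern_edges: arcs of the pattern graph (morphism constraints).
    """
    unassigned = [x for x, d in domains.items() if len(d) > 1]
    edges = list(pattern_edges)
    # The alldiff edge between x and y survives only if their domains intersect.
    edges += [(x, y) for x, y in combinations(unassigned, 2) if domains[x] & domains[y]]
    return connected_components(unassigned, edges)

# Hypothetical state: pattern path x1 - x2 - x3 - x4 - x5 with x3 already assigned.
domains = {"x1": {1, 2}, "x2": {2, 3}, "x3": {7}, "x4": {4, 5}, "x5": {5, 6}}
pattern_edges = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x5")]
parts = sip_components(domains, pattern_edges)
```

Here the assignment of x3 splits the morphism graph, and the domain unions {1, 2, 3} and {4, 5, 6} are disjoint, so two independent groups are found. If the domain of x4 still contained the value 3, the alldiff edge between x2 and x4 would keep the problem in one piece.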
Until the switch to the dynamic strategy, a cheap forward checking approach is used for a fast inconsistency check and a strong reduction of the reduced constraint graph. In the following, we give two dedicated heuristics that we have applied in the preliminary forward checking procedure.

3.3 Heuristics

We now present two heuristics based on Property 4, aiming at reducing the number of decomposition tests and favoring decomposition. The general idea is to first detect a subset of variables disconnecting the morphism constraint graph into disjoint components, as it is a necessary condition for 1-decomposability. The search process will first distribute over these variables. The test of 1-decomposition is performed when all these variables are instantiated. It is also performed at the subsequent nodes of the search tree.

The cycle heuristic (h1) The objective of the cycle heuristic is to find a set of nodes $S$ in the morphism graph $CG_{MC} = (X, E_{MC})$ (see Def. 8) such that the graph without those nodes is singly connected. When the variables associated with $S$ are assigned, any subsequent assignment will decompose the morphism graph. Finding the minimal set of nodes is known as the minimal cycle cutset problem and is an NP-hard problem [6]. We propose here a simple linear approximation that returns the nodes of the cycles of the graph. Algorithm 1 runs in $O(|V_p|)$. The effectiveness of such a procedure on different classes of problems is shown in the experimental section. One of its main advantages is its simplicity.

Using graph partitioning (h2) Graph partitioning is a well-known technique that allows hard graph problems to be handled by a divide and conquer approach. In our context, it can be used to separate the morphism constraint graph into two graphs of equal size.

Definition 9.
Given a graph $G = (V, E)$, a $k$-graph partitioning of $G$ is a partition of $V$ in $k$ subsets $V_1, \dots, V_k$ such that $V_i \cap V_j = \emptyset$ for $i \neq j$, $\cup_i V_i = V$, and the number of edges of $E$ whose incident vertices belong to different subsets is minimized (called the edge cut).

Algorithm 1: Selection of the body variables.
    input : G = (X, E), the CG_MC
    output: the nodes of the cycles of G
    All ← X
    T ← ∅
    while (∃ n ∈ X | Degree(n) == 1) do
        T ← T ∪ {n}
        remove node n from G
    end
    return All \ T

Based on the edge cut of the morphism constraint graph, we can easily deduce a subset of variables.

Definition 10. Given a 2-graph partitioning of $G$, a nodecut is a set of nodes containing one node of each edge in the edge cut.

Finding a minimum edge cut is an NP-hard problem for $k \ge 3$, but can be solved in polynomial time for $k = 2$ by matching (see [8], page 209). However, we use a fast local search approximation [11], as the exact minimum subset is not needed.

4 Experiments

Goals - The objective of our experiments is to compare our decomposition method on different classes of SIP with standard CSP models as well as vflib, the standard and reference algorithm for subgraph isomorphism [2]. We also compare our decomposition method with standard direct decomposition. The different heuristics presented in Section 3.3 are also tested.

Instances - The instances are taken from the vflib graph database described in [7]. There are several classes of randomly generated graphs: random graphs, bounded graphs and mesh graphs. The target graph has size $n$ and the relative size of the pattern is denoted $\alpha$. For random graphs, the target graph has a fixed number of nodes $n$ and there is a directed arc between two nodes with a probability $\eta$. The pattern graph is also generated with the same probability $\eta$, but its number of nodes is $\alpha n$.
If the generated graph is not connected, further edges are added until the graph is connected. For random graphs, $n$ takes a value in [20, 40, 80, 100, 200, 400, 800, 1000], $\eta$ in [0.01, 0.05, 0.1], and $\alpha$ in [20%, 40%, 60%]. There are thus 69 classes of randomly connected graphs. In a class of instances denoted as si2-r001-m200, we have $\alpha = 20\%$, $\eta = 0.01$, and $n = 200$ nodes.

Mesh-$k$-connected graphs are graphs where each node is connected with its $k$ neighborhood nodes. Irregular mesh-$k$-connected graphs are made of a regular mesh with the addition of random edges uniformly distributed. The number of added branches is $\rho n$. For mesh graphs, $n$ can take a value in [16, ..., 1096], $k$ in [2, 3, 4], and $\rho$ in [0.2, 0.4, 0.6]. In an irregular mesh-connected class of instances denoted as si2-m4Dr6-m625, we have $\alpha = 20\%$, $k = 4$, $\rho = 0.6$ and $n = 625$ nodes.

One hundred graphs are generated for each class of instances. For random graphs, we also generated 100 additional instances where the target graph has 1600 nodes, for each possible value of $\eta$ and $\alpha$. We used the generator freely available from the graph database, following the methodology described in [7].

Models - Several models were considered for the experiments. First of all, we use the available implementation of vflib. Then classical CP models are used, called CPFC and CPAC. The model CPFC is a model where all the constraints use forward checking and the variable selection selects the first variable which is involved in the maximum number of constraints (called maxcstr), using minimal domain size as tiebreaker. The model CPAC is similar except that it uses an arc consistent version of the MC constraint.
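The two variable selection policies named in the model descriptions, maxcstr and minsize, can be sketched in Python as follows. This is a minimal illustration under assumptions: the number of constraints per variable is passed in as a precomputed dictionary (`constraint_count`, a hypothetical name), and how the solver of [14] actually counts constraints is not specified here.

```python
def maxcstr(unassigned, domains, constraint_count):
    """First variable involved in the most constraints; smallest domain breaks ties."""
    # min() returns the first minimal element, which gives the "first variable" rule.
    return min(unassigned, key=lambda x: (-constraint_count[x], len(domains[x])))

def minsize(unassigned, domains):
    """First uninstantiated variable with the smallest domain."""
    return min(unassigned, key=lambda x: len(domains[x]))

# Hypothetical state with three uninstantiated variables.
domains = {"a": {1, 2, 3}, "b": {1, 2}, "c": {1, 2, 3, 4}}
constraint_count = {"a": 2, "b": 1, "c": 2}
unassigned = ["a", "b", "c"]

chosen_maxcstr = maxcstr(unassigned, domains, constraint_count)
chosen_minsize = minsize(unassigned, domains)
```

In this state maxcstr picks a (two constraints, and a smaller domain than c), whereas minsize picks b.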
The model CP+Dec waits for 30% of the variables to be instantiated following a variable selection policy (called minsize) that selects the uninstantiated variable with the smallest domain. It then tests at each node of the search tree whether decomposition occurs, using a maxcstr variable selection. The model CP+Dec+h1 uses the cycle heuristic; once the nodes belonging to the cycles of the pattern graph are instantiated using a minsize variable selection policy (up to 30% of the size of the pattern), decomposition is tested at each node of the search tree and follows a maxcstr variable selection. The model CP+Dec+h2 uses the graph partitioning heuristic; once the variables belonging to the nodecut set are instantiated (up to 30% of the size of the pattern), decomposition is tested at each node of the search tree and follows a maxcstr variable selection.

Setup - All experiments were performed on a cluster of 16 machines (AMD Opteron(tm) 875 2.2 GHz with 2 GB of RAM) using the implementation of [14]. All runs are limited to a time bound of 10 minutes. In each experiment, we search for all solutions. Experiments searching for one solution have also been done but are not reported here for lack of space. These experiments lead to the same conclusions.

Description of the tables - Table 1 shows the results for random graphs and Table 2 for irregular mesh-connected graphs. Each line describes the execution of 100 instances from a particular class. The column N indicates the mean number of solutions among the solved instances. The column % indicates the number of instances that were solved within the time bound of 10 minutes. The column µ indicates the mean time over the solved instances and the column σ indicates the corresponding standard deviation. The column D indicates the number of instances that used decomposition among the solved instances. The column #D indicates the mean number of decompositions that occurred over all solved instances. The column S indicates the mean size of the initial variable set computed by the heuristics h1 or h2. Table 3 gives the mean degree and its variance for the different instance classes. For each class of instances in Tables 1 and 2, the results of the best algorithms are in bold.

Table 1. Randomly connected graphs, searching for all solutions.

Bench          | N     | vflib         | CPAC          | CPFC
               |       | %    µ    σ   | %    µ    σ   | %    µ    σ
si2-r001-m200  | 61E+6 | 72   74   115 | 83   56   109 | 85   41   76
si2-r001-m400  | 17E+8 | 2    248  118 | 10   106  156 | 7    288  177
si2-r001-m800  | 28E+7 | 0    -    -   | 11   220  136 | 1    153  -
si2-r001-m1600 | 2500  | 16   203  202 | 30   227  146 | 0    -    -
si6-r01-m200   | 1     | 100  2    3   | 100  9    11  | 100  12   17
si6-r01-m400   | 1     | 66   99   133 | 89   156  116 | 50   190  137
si6-r01-m800   | 1     | 7    235  153 | 0    -    -   | 5    389  125
si6-r01-m1600  | 1     | 0    -    -   | 0    -    -   | 39   499  51

Bench          | N     | CP+Dec                  | CP+Dec+h1                    | CP+Dec+h2
               |       | %   µ    σ    D   #D    | %    µ    σ    D   #D    S   | %    µ    σ    D   #D   S
si2-r001-m200  | 61E+6 | 94  49   100  91  9244  | 98   6    40   98  1834  0.2 | 87   23   48   71  909  0.2
si2-r001-m400  | 17E+8 | 15  160  177  15  35655 | 75   68   125  75  2268  0.4 | 29   212  218  22  196  0.3
si2-r001-m800  | 28E+7 | 0   -    -    0   12    | 4    227  254  4   21    0.6 | 12   256  239  8   0    0.6
si2-r001-m1600 | 2500  | 0   -    -    0   0     | 7    165  199  1   0     0.8 | 0    -    -    0   0    0.9
si6-r01-m200   | 1     | 94  148  153  0   0     | 100  0    0    0   0     1   | 100  0    0    0   0    1
si6-r01-m400   | 1     | 2   179  220  0   0     | 100  2    1    0   0     1   | 100  4    6    0   0    1
si6-r01-m800   | 1     | 0   -    -    0   0     | 100  46   35   0   0     1   | 100  46   39   0   0    1
si6-r01-m1600  | 1     | 0   -    -    0   0     | 74   479  71   0   0     1   | 54   435  79   0   0    1

Analysis - We start the analysis by looking at random graphs (see Table 1). We first compare vflib with the CP models CPFC and CPAC. For si2-r001-* instances, the CPAC model is the best in mean time and % of solved instances. When the level of consistency is higher for the MC constraint, the search space size diminishes, and all solutions are quickly found.
For si6-r01-* instances, CPAC is the best model for m200 and m400 instances, while CPFC is the best model for m800 and m1600 instances. As shown in Table 3, the mean degree increases with the size of the generated graph, which changes the effect of propagation. The MC forward checking propagator is more efficient with denser graphs than an arc consistent one. With sparse graphs, an arc consistent MC is cheap and propagates a lot, while with denser graphs it is more efficient to wait for instantiation to propagate.

Table 2. Irregular meshes, searching for all solutions.

                         vflib           CPAC            CPFC
Bench            N       %    µ    σ     %    µ    σ     %    µ    σ
si2-m4Dr6-m625   88E+5   89   23   50    94   21   38    95   6    27
si2-m4Dr6-m1296  17E+7   16   135  137   33   178  123   38   107  154
si6-m4Dr6-m625   3.31    100  7    43    100  29   4     100  9    4
si6-m4Dr6-m1296  10.38   100  13   55    100  233  30    100  113  65

                         CP+Dec                     CP+Dec+h1                       CP+Dec+h2
Bench            N       %    µ    σ    D    #D     %    µ    σ    D    #D    S     %    µ    σ    D    #D    S
si2-m4Dr6-m625   88E+5   35   223  151  35   0.7    100  6    22   96   5.4   0.5   94   6    21   88   5.5   0.3
si2-m4Dr6-m1296  17E+7   3    120  36   3    0.1    63   67   109  63   4     0.5   49   163  170  49   3.9   0.5
si6-m4Dr6-m625   3.3     8    105  32   0    0      100  7    3    6    0.1   0.8   100  22   26   6    0.1   0.7
si6-m4Dr6-m1296  10.3    0    -    -    0    0      100  65   20   41   0.6   0.7   77   223  161  29   0.4   0.7

Table 3. Mean degree for the tested graph set.

                  degree
Bench             µ       σ
si2-r001-m200     2.30    0.14
si2-r001-m400     2.89    0.14
si2-r001-m800     3.99    0.18
si2-r001-m1600    6.80    0.19
si6-r01-m200      3.29    0.14
si6-r01-m400      5.27    0.16
si6-r01-m800      9.76    0.15
si6-r01-m1600     19.20   0.17
si2-m4Dr6-m625    3.51    0.26
si2-m4Dr6-m1296   3.53    0.20
si6-m4Dr6-m625    5.12    0.16
si6-m4Dr6-m1296   5.19    0.14

We now look at the use of decomposition for random graphs (second part of Table 1). The first model CP+Dec, which corresponds to a decomposition approach that uses the whole constraint graph only, fails. This model cannot take into account the structure of the problem. This can be measured through the quality of the decomposition.
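As a rough illustration of what a decomposition test at a search node involves, the sketch below computes the connected components of a pattern graph once some of its nodes have been instantiated. It is a simplification under stated assumptions, not the method of the paper or of [14]: it looks only at the pattern graph's edges and ignores the global constraints that the actual approach has to handle, and the function name `components` and the adjacency-dictionary encoding are hypothetical.

```python
from collections import deque


def components(adjacency, assigned):
    """Connected components of the pattern graph restricted to unassigned nodes.

    `adjacency` maps each pattern node to a list of its neighbours
    (hypothetical encoding); `assigned` is the set of instantiated nodes.
    More than one component means the remaining subproblems are
    independent in this simplified view.
    """
    remaining = set(adjacency) - set(assigned)
    seen = set()
    result = []
    for start in sorted(remaining):
        if start in seen:
            continue
        # Breadth-first search over unassigned nodes only.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour in remaining and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    # Hypothetical pattern: node 0 joins two branches, 1-3 and 2-4.
    pattern = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
    print(components(pattern, assigned={0}))
```

In this toy pattern, instantiating node 0 alone separates the two branches, which is the kind of small initial variable set that the column S reports as a fraction of the pattern size. On a dense pattern no small set of nodes has this effect, in line with S approaching 1 for the denser classes.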
First, we focus on the si2-r001-* classes. The models CP+Dec+h1 and CP+Dec+h2 achieve better decompositions than the CP+Dec model. Even though CP+Dec tends to induce more decompositions, the number of instances using decomposition (see column D) is higher for CP+Dec+h1 and CP+Dec+h2 than for CP+Dec. This illustrates the computational overhead of a pure dynamic decomposition approach. However, the number of instances using decomposition tends to zero for m1600 instances. This is due to the fact that the graphs have higher degrees as their size increases (see Table 3). This can be observed by looking at the column S: the size of the initial subset of variables to instantiate becomes closer to 100% as the size increases. For this reason our decomposition method is beaten by the CPAC model for si2-r001-m1600.

We now focus on the si6-r01-* classes. As stressed earlier, those instances have denser graphs. The initial set of variables to instantiate is the whole set of pattern nodes for CP+Dec+h1 and CP+Dec+h2, so no decomposition occurs. Why then do the CP+Dec+h* models outperform all other methods in those classes? Because the CP+Dec+h* models use a minsize variable selection policy instead of the maxcstr policy used by CPFC. In the class si6-r01-*, the CP+Dec+h1 approach thus reduces to a CPFC with a minsize variable selection policy.

For random graphs, the decomposition method with heuristics is especially useful for sparse graphs with many solutions, while a CPFC model using a minsize variable selection policy seems the best choice for denser graphs with few solutions. The vflib is clearly outperformed on all these classes of instances. Experiments on the other classes of random graphs, not reported here for lack of space, confirmed this analysis.

We now analyze irregular mesh-connected graphs.
We observe in Table 3 that the mean degree of the si2-m4Dr6-* classes is lower than for the si6-m4Dr6-* classes. We first compare vflib and the CP models without decomposition. For the sparser si2-m4Dr6-* classes, CPFC is the best method, while for the denser si6-m4Dr6-* classes, vflib is the best. We have no particular explanation for this behavior and this is an open question. Regarding decomposition methods, the same remarks as for random graphs apply. The CP+Dec model tends to produce fewer decompositions than the CP+Dec+h* models. Moreover, the CP+Dec+h* models are the best models for sparser instances with many solutions. As the mean degree of the instances increases (see Table 3), the decomposition methods become less efficient. Indeed, for si6-m4Dr6-m1296, the best method is vflib, but our decomposition approach also solves all the instances and helps CP in reducing the mean time.

Summary - The application of standard direct decomposition methods (CP+Dec) leads to performance worse than the direct application of standard CP models (CPFC, CPAC) and vflib. On most classes, the cycle heuristic (h1) is better than the graph partitioning heuristic (h2). On sparse randomly connected graphs with many solutions, and on sparse irregular meshes, our decomposition method outperforms standard CP approaches as well as vflib. For denser connected graphs, CP models (CPAC or CPFC with a minsize policy) outperform vflib. For denser irregular meshes, vflib, the standard CP models and our decomposition method solve all the instances, but vflib is more efficient.

5 Conclusion

Our initial question was to investigate the application of decomposition techniques as AND/OR search for problems with global constraints, in particular for the SIP.
We showed that it is indeed possible using a hybrid approach of static and dynamic techniques and a dedicated problem structure analysis. For the SIP, one can derive a decomposition enforcing static heuristic that is used by a cheap forward checking approach. As soon as the problem gets (likely) decomposable, the search process is switched to a fully propagated, dynamically decomposed search. This exploits the non-predictable reduction of the constraint graph structure via constraint propagation and entailment but reduces the huge computational effort of a completely propagated search. We showed that our hybrid decomposition approach is able to beat the state-of-the-art VF-algorithm for sparse graphs with high solution numbers. As future work, we would like to investigate more heuristics for SIP, as the heuristic influences the quality of decomposition. Moreover, we intend to investigate the use of our decomposition method for motif discovery, where solving SIP is used as an enumeration tool [9].

References

1. Donatello Conte, Pasquale Foggia, Carlo Sansone, and Mario Vento. Thirty years of graph matching in pattern recognition. IJPRAI, 18(3):265–298, 2004.
2. Luigi Pietro Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. An improved algorithm for matching large graphs. In 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, pages 149–159. Cuen, 2001.
3. R. Dechter and R. Mateescu. AND/OR search spaces for graphical models. Artificial Intelligence, 171(2-3):73–106, 2007.
4. Rina Dechter. Constraint Processing. Morgan Kaufmann, May 2003.
5. Rina Dechter and Robert Mateescu. The impact of AND/OR search spaces on constraint satisfaction and counting. In Proc. of the CP'2004, 2004.
6. François Fages and Akash Lal. A constraint programming approach to cutset problems. Comput. Oper. Res., 33(10):2852–2865, 2006.
7. P. Foggia, C. Sansone, and M. Vento. A database of graphs for isomorphism and sub-graph isomorphism benchmarking. CoRR, cs.PL/0105015, 2001.
8. Michael R. Garey and David S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1990.
9. Joshua A. Grochow and Manolis Kellis. Network motif discovery using subgraph enumeration and symmetry-breaking. In RECOMB'07, pages 92–106, 2007.
10. G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20:359–392, 1998.
11. George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359–392, 1998.
12. Javier Larrosa and Gabriel Valiente. Constraint satisfaction algorithms for graph pattern matching. Mathematical. Structures in Comp. Sci., 12(4):403–422, 2002.
13. Wei Li and Peter van Beek. Guiding real-world SAT solving with dynamic hypergraph separator decomposition. In Proc. of the 16th IEEE ICTAI, 2004.
14. Martin Mann, Guido Tack, and Sebastian Will. Decomposition during search for propagation-based constraint solvers. CoRR, 2007. arXiv:0712.2389.
15. R. Mateescu. AND/OR Search Spaces for Graphical Models. PhD thesis, 2007.
16. Robert Mateescu and Rina Dechter. AND/OR cutset conditioning. In Proc. of the IJCAI'2005, 2005.
17. Michael Rudolf. Utilizing constraint satisfaction techniques for efficient graph pattern matching. In Theory and Application of Graph Transformations, 1998.
18. Gabriel Valiente and Conrado Martínez. An algorithm for graph pattern-matching. In Proc. 4th South American Workshop on String Processing, volume 8, 1997.
19. Stéphane Zampelli, Yves Deville, and Pierre Dupont. Approximate constrained subgraph matching. In Proc. of CP'05, volume 3709, pages 832–836, 2005.
