Efficient modularity optimization by multistep greedy algorithm and vertex mover refinement

Identifying strongly connected substructures in large networks provides insight into their coarse-grained organization. Several approaches based on the optimization of a quality function, e.g., the modularity, have been proposed. We present here a mu…

Authors: Philipp Schuetz, Amedeo Caflisch

Eien t mo dularit y optimization b y m ultistep greedy algorithm and v ertex mo v er renemen t Philipp S h uetz and Amedeo Cais h 1 1 Dep artment of Bio hemistry, University of Zurih, Winterthur erstr asse 190, CH-8057 Zurih, Switzerland Iden tifying strongly onneted substrutures in large net w orks pro vides insigh t in to their oarse- grained organization. Sev eral approa hes based on the optimization of a qualit y funtion, e.g., the mo dularit y , ha v e b een prop osed. W e presen t here a m ultistep extension of the greedy algorithm (MSG) that allo ws the merging of more than one pair of omm unities at ea h iteration step. The es- sen tial idea is to prev en t the premature ondensation in to few large omm unities. Up on on v ergene of the MSG a simple renemen t pro edure alled v ertex mo v er (VM) is used for reassigning v er- ties to neigh b oring omm unities to impro v e the nal mo dularit y v alue. With an appropriate  hoie of the step width, the om bined MSG-VM algorithm is able to nd solutions of higher mo dularit y than those rep orted previously . The m ultistep extension do es not alter the saling of omputational ost of the greedy algorithm. P A CS n um b ers: 89.75.Fb,05.10. − a,89.75.H I. INTR ODUCTION The net w orks under study in natural and so ial si- enes often sho w a natural divisibilit y in to smaller mo d- ules (or omm unities) originating from an inheren t, oarse-grained struture. In general, these mo dules are  haraterized b y an abundane of edges onneting the v erties within individual omm unities in omparison to the n um b er of edges linking the mo dules. T o detet these partitions sev eral algorithm- or sore- based approa hes ha v e b een dev elop ed and applied. V ery p opular b eame the approa h in tro dued b y Girv an and Newman [1 ℄ based on the qualit y funtion alled mo du- larit y for partition assessmen t. 
This scoring function compares the actual fraction of intracommunity edges with its expectation in the random case given an identical degree distribution. The partition with the highest value of the scoring function is then considered to be the optimal splitting. The modularity $Q$ is defined (for undirected networks) as

$$Q = \sum_{i=1}^{N_C} \left[ \frac{I(i)}{L} - \left( \frac{d_i}{2L} \right)^2 \right]$$

with $I(i)$ the total weight of all edges linking pairs of vertices in community $i$, $d_i$ the sum over all degrees of vertices in module $i$, $L$ the total weight of all edges, and $N_C$ the number of communities.

Intrinsically, the modularity-based approach does not prescribe the usage of a particular optimization procedure. In practice, a strategy for optimization has to be chosen. Modularity optimization is an NP-hard problem [2]. Therefore, only an exhaustive search reveals the optimal solution for a generic network. This type of search is extremely demanding and feasible only in a few cases. Thus, many heuristic approaches such as extremal optimization [3], simulated annealing [4], and the greedy algorithm [5] have been developed, refined, and successfully applied. Among the published approaches the greedy algorithm is one of the fastest techniques [6]. On the other hand, many examples show that the greedy algorithm is not capable of finding the solutions with the highest modularity value. Furthermore, recent studies have provided evidence that modularity [7] and Potts-model-based approaches [8] are endowed with an intrinsic resolution limit (small modules are not detected and are amalgamated into bigger ones). Thus, each community has to be refined by subjecting it as a separate network to the community detection algorithm. Therefore, a fast and accurate optimization technique is necessary.
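
The definition of the modularity can be checked on a small graph. The following Python sketch is only an illustration under stated assumptions: the function name `modularity`, the edge-list input format, and the toy graph are hypothetical and not part of the original work, and self-loops are assumed to be absent.

```python
from collections import defaultdict


def modularity(edges, community):
    """Return Q for an undirected, weighted edge list.

    edges:     iterable of (u, v, weight), each undirected edge listed once
    community: dict mapping every vertex to its community label
    """
    edges = list(edges)
    total_weight = sum(w for _, _, w in edges)      # L
    internal = defaultdict(float)                   # I(i)
    degree_sum = defaultdict(float)                 # d_i
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += w
        degree_sum[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(
        internal[c] / total_weight - (degree_sum[c] / (2 * total_weight)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single edge.
TOY_EDGES = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
             (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
TWO_TRIANGLES = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
SINGLETONS = {v: v for v in range(6)}

print(modularity(TOY_EDGES, TWO_TRIANGLES))
print(modularity(TOY_EDGES, SINGLETONS))
```

For this toy graph ($L = 7$) the two-triangle partition gives $Q = 2(3/7 - 1/4) = 5/14$, whereas the partition into single vertices gives the negative value $-17/98$.
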

In this article, we enhance the greedy algorithm by a multistep feature in combination with a local refinement procedure. The enhanced algorithm finds partitions with higher modularity values than previously reported.

This paper is organized as follows. In Sec. II we introduce both procedures and describe the motivation for their construction. In addition, we discuss performance-oriented implementations and estimate their running times. Benchmarking results for a set of real-world networks and a comparison with other published results are presented in Sec. III. The conclusions are in Sec. IV. In this paper, all networks are considered as undirected. The extension to directed networks is straightforward.

II. THE ALGORITHM

A. Multistep greedy algorithm (MSG)

The classical greedy algorithm (first application in Ref. [5]) iteratively joins the pair of communities that improves modularity most in each step. The essential idea of the multistep greedy algorithm (MSG) is to promote the simultaneous merging of several pairs of communities at each iteration. The pseudocode of the MSG algorithm is presented in Algorithms 1 and 2, and an illustrative example is given in Fig. 1.

    Each vertex is a community
    Calculate the modularity change matrix $\Delta Q$
    Determine the community degrees $d_i$
    while pair $(i, j)$ with $\Delta Q_{ij} > 0$ exists do
        for all elements $(i, j, \Delta Q_{ij})$ in $\Delta Q$ matrix, parsed w.r.t. decreasing $\Delta Q$ and increasing $(i, j)$ do
            if ($\Delta Q_{ij} > 0$ in best $l$ values in $\Delta Q$ matrix) and ($i$ and $j$ unchanged in iteration) then
                MergeCommunities(i, j)
            end if
        end for
    end while

Algorithm 1: Flow chart of the MSG algorithm. The modularity change is calculated according to Eq. (1). Details of the algorithm are given in Algorithm 2.

The MSG algorithm starts with each vertex in its own community.
A t ea h iteration the mo dularit y  hange ∆ Q ij up on merge of ea h pair of onneted omm unities ( i, j ) is al- ulated (while nononneted pairs are ignored b eause their merging yields a negativ e mo dularit y  hange). The triplets ( i, j, ∆ Q ij ) are parsed in the order of dereasing ∆ Q -v alue and inreasing omm unit y index. Those om- m unit y pairs ( i, j ) are joined whi h fulll the follo wing t w o riteria: 1. The mo dularit y  hange ∆ Q ij is within the l most fa v orable v alues (lev els) and p ositiv e. 2. T ou hed-omm unit y-exlusion-rule (TCER): Nei- ther mo dule i nor j is presen t in another pair in- duing a higher mo dularit y  hange. Con v ergene is rea hed when all pairwise merges of om- m unities derease mo dularit y (b y indution one an pro v e that all merges in further iterations w ould derease mo d- ularit y). A level enompasses all triplets ( i, j, ∆ Q ij ) with equal ∆ Q ij -v alue and the level p ar ameter l is k ept on- stan t. By onstrution the lev el parameter is alw a ys smaller than the n um b er of edges in the net w ork. The m ultiple lev els promote the onurren t formation of m ultiple en ters. Sim ultaneously gro wing omm unit y en ters hinder the ondensation in to few large omm uni- ties (few formed omm unities srap e all v erties as the establishmen t of a new omm unit y is to o exp ensiv e in mo dularit y) as observ ed in the lassial greedy algorithm. The TCER is a seond mean against exessiv e aggrega- tion in to few large mo dules. This rule p ermits the addi- tion of only one omm unit y to an existing omm unit y p er algorithm iteration. F urthermore, the TCER guaran tees that the mo dularit y  hange up on all p erformed merges is just the sum o v er the orresp onding ∆ Q elemen ts whi h impro v es eieny . B. 
Implemen tation details of MSG The k ey observ ation for an eien t implemen tation of the MSG is the follo wing: Up on merge of omm unities i and j only those ∆ Q -elemen ts onerning either of the t w o mo dules ha v e to b e realulated. When the mo dules i and j are joined in to a new one alled I , the up dated mo dularit y  hanges ∆ Q new I k (mo dule k is onneted either to omm unit y i or j ) reads (see Se. I I in Ref. [ 9℄ for details) ∆ Q new I k =    ∆ Q ik + ∆ Q j k i, j a nd k pairwise connected ∆ Q ik − d j d k 2 L 2 i and k connected , j and k not ∆ Q j k − d i d k 2 L 2 j and k connected , i and k not (1) with d x the sum o v er all degrees of v erties in omm unit y x = i, j and L the total edge w eigh t. F urther eieny impro v emen ts are gained from an ap- propriate  hoie of data strutures. A set (implemen ta- tion tak en from the C++-STL -library) is a sorted binary sear h tree. In a set individual elemen ts an b e found or inserted in O (log ( n )) time ( n the n um b er of elemen ts) and the extremal en tries are found in onstan t time. The mo dularit y  hanges are stored in the ∆ Q matrix imple- men ted as v etor of ro w strutures. The i th ro w onsists of a set with elemen ts ( j, ∆ Q ij ) ( j a mo dule link ed to the omm unit y i ) ordered aording to the omm unit y index j . This data struture obsoletes a separate stor- age of the top ology information. The extration of the b est l mo dularit y  hanges is handled via the level set . F or ea h pair of onneted omm unities i and j the el- emen t (min { i, j } , max { i, j } , ∆ Q ij ) is added to the level set . The level-set elemen ts are sorted with resp et to dereasing ∆ Q and inreasing index v alues. The degree information is stored in a v etor heneforth named d . In ea h iteration a Bo olean v etor alled touhe d stores whether a omm unit y has already b een mo died in the same round. 
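
The selection of pairs from the sorted level set and the row update of Eq. (1) can be sketched in Python. This is a rough illustration, not the authors' C++ implementation: the names `select_merges` and `merged_row`, the use of dicts for matrix rows, and exact `Fraction` arithmetic are assumptions made for the sketch. The $\Delta Q$ values are those listed for the example network of Fig. 1; the two pairs that the figure marks as removed are entered with a negative sign, which is an assumption about a sign lost in extraction.

```python
from fractions import Fraction


def select_merges(triplets, level):
    """Return the pairs merged in one MSG iteration.

    triplets: iterable of (i, j, dq) for connected community pairs
    level:    level parameter l (number of distinct best dq values used)
    """
    # Level-set order: decreasing dq, then increasing (i, j).
    ordered = sorted(triplets, key=lambda t: (-t[2], t[0], t[1]))
    touched = set()      # stands in for the Boolean `touched` vector
    merges = []
    levels_seen = 0
    previous_dq = None
    for i, j, dq in ordered:
        if dq <= 0:
            break        # only positive modularity changes qualify
        if dq != previous_dq:
            levels_seen += 1
            previous_dq = dq
            if levels_seen > level:
                break    # beyond the l best levels
        if i not in touched and j not in touched:
            merges.append((i, j))
            touched.update((i, j))
    return merges


def merged_row(row_i, row_j, degree, i, j, total_weight):
    """Row of the community formed from i and j, following Eq. (1).

    row_x maps each neighbor k of community x to dQ_xk; degrees and
    total_weight are assumed to be integers so Fraction stays exact.
    """
    denom = 2 * total_weight ** 2
    new_row = {}
    for k in sorted((row_i.keys() | row_j.keys()) - {i, j}):
        if k in row_i and k in row_j:
            new_row[k] = row_i[k] + row_j[k]
        elif k in row_i:
            new_row[k] = row_i[k] - Fraction(degree[j] * degree[k], denom)
        else:
            new_row[k] = row_j[k] - Fraction(degree[i] * degree[k], denom)
    return new_row


FIG1_TRIPLETS = [
    (1, 2, Fraction(3, 50)), (2, 3, Fraction(3, 50)),
    (1, 4, Fraction(3, 80)), (3, 4, Fraction(3, 80)),
    (6, 7, Fraction(3, 80)), (7, 8, Fraction(3, 80)),
    (5, 6, Fraction(1, 100)), (5, 8, Fraction(1, 100)),
    (4, 5, Fraction(-9, 400)), (5, 7, Fraction(-9, 400)),  # sign assumed
]

for level in (1, 2, 3):
    print(level, select_merges(FIG1_TRIPLETS, level))
```

With these inputs the sketch selects (1,2) for $l = 1$, adds (3,4) and (6,7) for $l = 2$, and adds (5,8) for $l = 3$, which is the outcome listed in Fig. 1. Note that a community is marked as touched only when it is actually merged, as in Algorithm 2.
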
To save the time needed to determine the highest index of the present communities, the number of vertices (initial length) is chosen as the length of the touched vector.

The implementation details of the MSG algorithm are listed in Algorithm 2. The calculation of the community degrees involves one parse of the edge information. In the second parse of the edge information the $\Delta Q$ matrix and the level set are filled. The initial modularity change $\Delta Q_{ij}$ upon joining modules (at this stage the vertices) $i$ and $j$ is calculated as (see Sec. II in Ref. [9] for details)

$$\Delta Q_{ij} = \frac{I}{L} - \frac{d_i d_j}{2L^2}$$

with $I$ the weight of the edges connecting the vertices $i$ and $j$, $d_x$ the degree of vertex $x = i, j$, and $L$ the total edge weight. The modularity value of the initial partition is ($N$ the number of vertices)

$$Q_0 = -\sum_{i=1}^{N} \frac{d_i^2}{4L^2}.$$

FIG. 1: Effect of different values of the level parameter during the first MSG iteration on an example network. [Figure not reproduced. It lists the modularity changes of the connected vertex pairs: 3/50 for (1,2) and (2,3); 3/80 for (1,4), (3,4), (6,7), and (7,8); 1/100 for (5,6) and (5,8); the pairs (4,5) and (5,7), with magnitude 9/400, are removed due to $\Delta Q < 0$. After applying the TCER, $l = 1$ merges (1,2); $l = 2$ merges (1,2), (3,4), (6,7); $l = 3$ merges (1,2), (3,4), (6,7), (5,8).]

    Each vertex is a community
    Calculate community degrees $d$ and the $\Delta Q$ matrix
    Determine the initial modularity $Q \leftarrow Q_0 = -\sum_{i=1}^{N} d_i^2 / (4L^2)$
    level set $\leftarrow$ set of $\Delta Q$ elements $(i, j, \Delta Q_{ij})$, sorted with respect to decreasing $\Delta Q$ and increasing $(i, j)$
    while first element of level set has $\Delta Q > 0$ do
        touched $\leftarrow (0, \ldots, 0)$  {Boolean, $N$-dimensional vector ($N$ = number of vertices); touched$_i = 1$ if module $i$ is modified in while-loop}
        MP $\leftarrow$ subset of level-set elements $(i, j, \Delta Q_{ij})$ with $\Delta Q_{ij} > 0$ and $\Delta Q_{ij}$ among highest $l$ values
        for all elements $(i, j, \Delta Q_{ij})$ of MP do
            if (not touched$_i$) and (not touched$_j$) then
                while parse $\Delta Q_{i\cdot}$ and $\Delta Q_{j\cdot}$ concurrently do
                    $\Delta Q_{ik} \leftarrow \Delta Q_{ik} + \Delta Q_{jk}$          if $i, k$ and $j, k$ are linked
                    $\Delta Q_{ik} \leftarrow \Delta Q_{ik} - d_j d_k / (2L^2)$    if only $i$ and $k$ are linked
                    $\Delta Q_{ik} \leftarrow \Delta Q_{jk} - d_i d_k / (2L^2)$    if only $j$ and $k$ are linked
                    $\Delta Q_{ki} \leftarrow \Delta Q_{ik}$
                    Update the level set
                end while
                Update the modularity $Q \leftarrow Q + \Delta Q_{ij}$
                Empty $\Delta Q_{j\cdot}$
                Flag touched$_i$, touched$_j \leftarrow 1$
                Update degrees: $d_i \leftarrow d_i + d_j$, $d_j \leftarrow 0$
            end if
        end for
    end while

Algorithm 2: Performance-oriented implementation of the MSG algorithm. The vector touched contains the information for the touched-community-exclusion rule (TCER).

The algorithm iteration starts by initializing the touched vector. Subsequently, the level set is parsed and all elements with positive $\Delta Q$ value whose modularity change is among the best $l$ (external level parameter) different values are stored in a set named MP, conserving the order of the level set. In this order the module pairs are merged unless one of them was part of an amalgamation in the same algorithm iteration. In the merge process, the changed $\Delta Q$ matrix elements are calculated as described at the beginning of this section. To determine which case applies in Eq. (1), the fact that each row of the $\Delta Q$ matrix is ordered with respect to the community index can be used. More precisely, for the merge of modules $i$ and $j$ the corresponding rows are parsed concurrently. For each row a currently considered element $p$ is defined. If the community index of $p_i$ is equal to the one of $p_j$, the first case applies and both $p$'s are advanced to the next element in the corresponding row.
If the index $k$ of $p_i$ is lower than the one of $p_j$, the $\Delta Q^{\mathrm{new}}_{Ik}$ element ($I$ the name of the merged community) is calculated according to the second case and only $p_i$ is advanced (if possible). If the module index of $p_i$ is larger than the one of $p_j$, one proceeds analogously. If one $p$ reaches the end of the row, the remaining elements of the other row are merged according to the respective rule. This procedure will be called asynchronous parsing in Sec. II C. It is customary to update each $\Delta Q$ element after calculation. To complete the merge process it remains to update the community degrees and to flag the modified communities in the touched vector.

C. Running time estimation of MSG

As we adopted the modularity change calculation of Clauset et al. (Sec. II in Ref. [9]) we can adopt their method of running time estimation as well. First, we observe that the update of one element in the $\Delta Q$ matrix and the level set costs in the worst case $O(\log N)$ (insertion in a set; each community has at most $N$ neighbors with $N$ the number of vertices) and $O(\log M) = O(\log N)$ running time (the number of distinct edges $M$ is bounded by the square of the number of vertices, $N^2$), respectively. Merging communities $i$ and $j$ involves an update of the $\Delta Q$ matrix and the level set for each element of the corresponding rows of the $\Delta Q$ matrix. The calculation of each changed value can be achieved in constant time because during the asynchronous parsing it is known whether the other community is linked as well and all other information (community degrees) is stored in a vector. Thus, the total running time contribution of one merging event is $O((d_i + d_j) \log N)$ with $d_k$ the number of edge starts/ends on vertices of community $k = i, j$. In the worst case all communities are changed in one algorithm round.
As the sum over all $d_i$ values is twice the number of distinct edges, the contribution of the merging processes in one algorithm round is at most $O(M \log N)$. The other steps of one algorithm round are less costly: The extraction of pairs belonging to the best $l$ levels can be performed in constant time. The same is true for the update of the degree information. If $D$ is defined as the depth of the dendrogram of communities, at most $D$ algorithm rounds have to be performed. Thus, the running time expectation for the iterative part is $O(D M \log N)$, which is identical to the complexity of the classical greedy algorithm [9].

The initialization involves the read-in processes of the edge information ($M$ constant time operations), the degree calculation (part of the read-in process), the calculation of the initial modularity (constant time operation on $N$ elements), and finally the generation of the $\Delta Q$ matrix and the level set at cost $O(M \log N)$ ($M$ insertions in a set with at most $N$ or $M$ elements, respectively). In the worst case the expected contribution of the initialization to the running time is $O(M \log N)$.

In the preceding paragraphs we have shown that the MSG algorithm has the total complexity $O(D M \log N)$. Among the published strategies for modularity optimization the classical greedy algorithm [9] is the fastest [6]. As the MSG shares the worst case expectation for the running time with the classical greedy algorithm, we conclude that the MSG is one of the fastest procedures for modularity optimization.

D. Vertex mover (VM)

To further improve modularity by adjusting misplaced vertices, a refinement step called vertex mover (VM) is applied upon convergence of the MSG algorithm. In principle, it could also be applied to other modularity optimization procedures.
In the VM, the list of vertices is parsed in the order of increasing degree and vertex index (to resolve the degeneracy of multiple vertices with equal degree) and every vertex is reassigned to the neighboring community with maximal modularity improvement. This parsing-and-reassignment procedure is repeated until no modularity improvement is observed.

The VM procedure is similar to the Kernighan-Lin algorithm [10] (applied to modularity optimization in Ref. [11]). In contrast to the Kernighan-Lin algorithm the VM procedure has a perfectly local focus. In other words, instead of repetitively searching for the optimal vertex to reassign, the VM procedure parses the vertices in the aforementioned order and identifies the optimal community for the considered vertex. Furthermore, each reassignment of the VM approach improves modularity. Therefore, the selection of the optimal intermediate partition as in the Kernighan-Lin algorithm is not necessary.

E. VM implementation

The modularity change $\Delta Q$ upon reassignment of vertex $v$ from community $i$ to $j$ can be written as

$$\Delta Q = \frac{\mathrm{links}(v \leftrightarrow j) - \mathrm{links}(v \leftrightarrow i)}{L} - \frac{k_v \left( d_j - d_{i \setminus v} \right)}{2L^2} \qquad (2)$$

with $k_v$ the degree of vertex $v$, $d_j$ the sum over the degrees of all vertices in community $j$, $d_{i \setminus v} = d_i - k_v$ the corresponding degree for community $i$ without vertex $v$, and $L$ the total weight of all edges.

The most time-consuming part of the VM is the calculation of the modularity changes upon reassignment of the vertices. Eq. (2) reduces this bottleneck to the calculation of the weight of the edges connecting the vertex to the neighboring communities. The connectivity information of vertex $v$ is stored in a sparse vector [i.e., a vector of elements $(u, w_{vu})$ with $u$ a vertex linked to $v$ and $w_{vu}$ the total weight of all edges connecting vertices $u$ and $v$].
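
Equation (2) can be evaluated with one pass over the neighbors of $v$. The following Python sketch is illustrative only: the function name `move_gain`, the dict-of-dicts `adjacency` (vertex to neighbor to weight), and the precomputed `degree_sum` per community are assumptions, not the authors' implementation.

```python
def move_gain(v, target, adjacency, community, degree_sum, total_weight):
    """Modularity change of Eq. (2) for moving vertex v to `target`.

    adjacency:    dict vertex -> {neighbor: edge weight} (sparse rows)
    community:    dict vertex -> current community label
    degree_sum:   dict community label -> sum of member degrees (d_i)
    total_weight: total weight of all edges (L)
    """
    source = community[v]
    if target == source:
        return 0.0                     # Eq. (2) assumes a different community
    k_v = sum(adjacency[v].values())   # degree of v
    links = {}                         # links(v <-> c) for neighboring c
    for u, w in adjacency[v].items():
        c = community[u]
        links[c] = links.get(c, 0.0) + w
    d_source_without_v = degree_sum[source] - k_v
    edge_term = (links.get(target, 0.0) - links.get(source, 0.0)) / total_weight
    degree_term = (k_v * (degree_sum[target] - d_source_without_v)
                   / (2 * total_weight ** 2))
    return edge_term - degree_term


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
ADJACENCY = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
COMMUNITY = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
DEGREE_SUM = {"a": 7, "b": 7}

print(move_gain(2, "b", ADJACENCY, COMMUNITY, DEGREE_SUM, 7))
```

For the toy graph, moving vertex 2 out of its triangle gives $\Delta Q = -1/7 - 9/98 = -23/98$, so the VM would leave this vertex in place.
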
The sparse rows of all vertices are stored in a vector and form the topology matrix. To determine the total edge weight connecting vertex $v$ with community $j$ the $v$th row is parsed and for each entry the weight is added to the subtotal edge weight of the corresponding community. To keep access times short an $N$-dimensional vector ($N$ the number of vertices) is chosen to store the intermediate $\mathrm{links}(v \leftrightarrow j)$ results. The optimal reassignment partner for vertex $v$ is the community with smallest index yielding the maximal modularity improvement.

F. Estimation of VM running time

Calculating the modularity changes upon reassignment of one vertex to any neighboring community involves one parse of its edge list supplemented with direct memory access to determine the community affiliation and some constant time operations for the actual modularity calculation. Therefore, the running time contribution of one vertex is proportional to its degree. One algorithm round requires $O(L) = O(\sum_i d_i)$ running time. The number of needed iterations cannot be estimated because it depends on the quality of the MSG result. In all examples tested by us the running time of the VM was always at least one order of magnitude smaller and less than one minute even for the biggest networks under study.

III. RESULTS

FIG. 2: (Color online) Dependence of the MSG modularity value $Q_{\mathrm{MSG}}(l)$ (blue) and the MSG-VM modularity value $Q_{\mathrm{MSG\text{-}VM}}(l)$ (black) on the level parameter $l$ relative to the maximal MSG-VM modularity value $Q_{\max}$. The previously published result $Q_{\mathrm{pub}}/Q_{\max}$ (dashed green line) is also shown as a basis of comparison. The red circles indicate the value of $l$ that yields maximal modularity. A significant number of $l$ values yield higher modularity than the previously published maximal modularity for all but the smallest two networks, i.e., Zachary (not shown) and College. In the latter, only $l = 1$ yields a higher modularity than $Q_{\mathrm{pub}}$. [Figure not reproduced.]

A. Test set of networks

For benchmarking algorithms that optimize modularity the networks commonly used are the collaboration network (coauthorships in cond-mat articles) [12], the graph of metabolic reactions in Caenorhabditis elegans [13], the email network [14], the network of mutual trust (PGP-key signing) [15, 16], the conference graph of college football teams [17], the network of jazz groups with common musicians [18], and the Zachary karate club example [19]. In addition, we include less frequently used examples such as the graph of the metabolic reactions in Escherichia coli [20] and two different data sets describing the protein-protein interactions in S. cerevisiae (budding yeast) [21, 22] with labels PPI and yeast. To cover linguistic applications we benchmark the word association network [23] and the graph of the co-appearing words in publication titles (co)authored by Martin Karplus [24], who has the third highest h-factor [25] among chemists [26]. Further aspects of social webs were incorporated by considering the graph of costarring actors in the IMDB database [27]. Notably, the actor network, which has the largest number of edges, serves as a proof of concept that networks of this size are treatable as well. From computer science we include the internet routing network [28] and the graph of World Wide Web pages [29]. With this selection of networks most currently known application fields of networks are covered. To study the effect of disconnected graphs and weighted networks, we consider in both cases the full network as well as the largest connected component (suffix CP) and the unweighted variant, respectively.
Unless stated otherwise the networks are treated as unweighted.

Network                     Ref.      Vertices  Edges    | MSG-VM                       | Greedy
                                                         | l_opt  Q      Time [s]  N_C  | Q      Time [s]  N_C
Zachary Karate Club         [19]      34        78       | 3      0.398  na        4    | 0.381  na        3
Metabolic E. coli           [20]      443       586      | 6, 8   0.816  na        19   | 0.811  na        20
College Football            [17]      115       613      | 1      0.603  na        8    | 0.556  na        6
Metabolic C. elegans        [13]      453       1899     | 209    0.450  na        8    | 0.412  na        13
Jazz                        [18]      198       2742     | 566    0.445  na        4    | 0.439  na        4
Email                       [14]      1133      5451     | 56     0.575  na        10   | 0.503  na        12
Yeast (PPI, CP)             [21]      2552      7031     | 35     0.706  na        33   | 0.675  na        51
M. Karplus                  [24]      1167      13423    | 91     0.316  na        11   | 0.264  na        18
PPI-CP S. cerevisiae        [22]      4626      14801    | 170    0.545  na        24   | 0.500  na        38
PPI S. cerevisiae           [22]      4713      14846    | 170    0.546  na        65   | 0.501  na        81
M. Karplus weighted         [24]      1167      18991    | 173    0.320  na        13   | 0.296  na        11
Internet                    [28]      11174     23409    | 278    0.625  8         35   | 0.584  8         49
PGP-key signing             [15, 16]  10680     24340    | 44     0.878  2         140  | 0.849  3         195
Word Association (CP)       [23]      7204      31783    | 71     0.541  4         16   | 0.452  7         52
Word Association            [23]      7207      31784    | 97     0.540  3         17   | 0.465  7         38
Collaboration               [12]      27519     116181   | 153    0.748  14        82   | 0.661  103       381
WWW                         [29]      325729    1117563  | 3034   0.939  562       674  | 0.927  7640      2183
Actor                       [27]      82583     3666738  | 2429   0.543  1722      238  | 0.470  6288      406
Actor weighted              [27]      82583     4475520  | 389    0.536  5099      322  | 0.480  3541      361

TABLE I: Results on real-world examples. Among all tested level parameters (all positive integers smaller than 5000 or the number of edges if smaller) the value $l_{\mathrm{opt}}$ yields the highest value of $Q$ for the considered network. $N_C$ is the number of communities found. In most cases, a larger number of communities (larger $N_C$) is identified by the classical greedy algorithm than by the MSG-VM extension because the former partitions the network into few large communities and many small communities with less than ten vertices (mostly 2 to 20 times more small modules are identified by greedy than by MSG-VM). The MSG-VM approach prevents the condensation into few large modules: The three largest modules contain between 1.5 and 4 times fewer vertices in the MSG-VM partition than in the greedy partition (not shown). The running time (on a recent laptop) is reported for a single run of the algorithm. The entry na indicates that the running time is shorter than 1 s and therefore not displayed. The suffix CP indicates that only the largest connected component (the central part) was considered. The acronym PPI stands for protein-protein interaction.

Network                 Q^MSG-VM_max  Q_pub   Source  Method
Zachary Karate Club     0.398         0.419   [11]    [11]
College Football        0.603         0.601   [17]    [17]
Metabolic C. elegans    0.450         0.435   [11]    [11]
Jazz                    0.445         0.445   [11]    [3]
Email                   0.575         0.574   [11]    [3]
PGP-key signing         0.878         0.855   [11]    [11]
Collaboration           0.748         0.723   [11]    [11]

TABLE II: Comparison of the maximal value of modularity obtained by the MSG-VM algorithm, $Q^{\mathrm{MSG\text{-}VM}}_{\max}$, with previously published results $Q_{\mathrm{pub}}$. The highest published value was extracted from the referenced paper (Source), where it was calculated by the Method whose reference is listed in the last column.

B. Dependence on l and vertex labeling

It is important to investigate the robustness with respect to the choice of $l$ and to determine the highest modularity values achievable with the MSG-VM algorithm. There is a minor dependence on the value of $l$ (Fig. 2), which changes the MSG-VM modularity by less than 2% for large networks. Moreover, the maximal modularity is obtained with $l < 300$ for 14 of the 19 networks (Table I). An empirical formula for the optimal choice of the level parameter will be presented elsewhere.

It is worth noting that for a labeled graph and a chosen level parameter the algorithm is deterministic.
To assess the contribution of the labeling, the benchmarking procedure is also performed on one hundred copies of the smallest ten networks with permuted vertex labels. This permutation leaves the topology invariant, but modifies the order in which the community pairs are considered. In comparison to the maximal modularity value found for the unscrambled variants a maximal improvement of 0.94% is observed.

C. Performance and running time

The modularity values obtained with the MSG-VM approach are listed in Table II. For five of the seven networks considered here the MSG-VM algorithm finds solutions with modularity higher than previously published. Only for the Zachary Karate network does the MSG-VM procedure yield a smaller modularity value. For the jazz network a solution with the identical $Q$ value is obtained. For the networks without published modularity values we compare the optimal values obtained by the MSG-VM algorithm with the classical greedy algorithm for modularity optimization as introduced by Newman [5] in Table I. We observe that the MSG-VM algorithm outperforms the original greedy algorithm significantly.

The running time estimations in Secs. II C and II F are based on a worst case scenario. To investigate the running time behavior on real-world examples, we compare the running times of the classical greedy variant and the MSG-VM algorithm in Table I. These data show that, given the appropriate level parameter choice, the MSG-VM algorithm is in almost all cases faster than the classical greedy algorithm and, at the same time, reaches a higher value of modularity.

IV. CONCLUSIONS

To prevent premature condensation into few large communities the greedy algorithm for modularity optimization has been extended by a procedure for simultaneous merging of more than one pair of communities at each step.
Furthermore, this multistep greedy variant has been combined with a simple vertex-by-vertex a posteriori refinement. On seven networks with previously published modularity values the MSG-VM algorithm combination outperforms all other frequently used, generic techniques except for the smallest of the seven examples. In addition, a single run of the MSG-VM algorithm requires similar computer time as the greedy algorithm. In most cases fewer than 10 independent (i.e., embarrassingly parallel) runs of MSG-VM are required to obtain a modularity within 1% of the highest value because an empirical formula has been derived for the appropriate choice of the optimal step width. Therefore, the MSG-VM algorithm is an efficient tool to find network partitions with high modularity [30].

V. ACKNOWLEDGMENTS

The authors thank Stefanie Muff and Francesco Rao for helpful discussions. Christian Bolliger, Thorsten Steenbock, and Dr. Alexander Godknecht are acknowledged for maintaining the Matterhorn cluster where most of the parameter studies were performed. We are thankful to Drs. Arenas, Barabási, Gleiser, and Newman for providing the network data. This work was supported by a Swiss National Science Foundation grant to A.C.

[1] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
[2] U. Brandes, D. Delling, M. Gaertler, R. Goerke, M. Hoefer, Z. Nikoloski, and D. Wagner, e-print arXiv:physics/0608255.
[3] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[4] R. Guimerà and L. A. N. Amaral, Nature (London) 433, 895 (2005).
[5] M. E. J. Newman, Phys. Rev. E 69, 066133 (2004).
[6] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas, J. Stat. Mech. 2005, P09008 (2005).
[7] S. Fortunato and M. Barthélemy, Proc. Natl. Acad. Sci. U.S.A. 104, 36 (2007).
[8] J. M. Kumpula, J. Saramäki, K. Kaski, and J. Kertész, Eur. Phys. J. B 56, 41 (2007).
[9] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[10] B. Kernighan and S. Lin, Bell Syst. Tech. J. 49, 291 (1972).
[11] M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 103, 8577 (2006).
[12] M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 98, 404 (2001).
[13] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabási, Nature (London) 407, 651 (2000).
[14] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, Phys. Rev. E 68, 065103(R) (2003).
[15] X. Guardiola, R. Guimerà, A. Arenas, A. Díaz-Guilera, D. Streib, and L. A. N. Amaral, e-print arXiv:cond-mat/0206240.
[16] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, Phys. Rev. E 70, 056122 (2004).
[17] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
[18] P. Gleiser and L. Danon, Adv. Complex Syst. 6, 565 (2003).
[19] W. W. Zachary, J. Anthropol. Res. 33, 452 (1974).
[20] H. Ma and A.-P. Zeng, Bioinformatics 19, 270 (2003).
[21] N. J. Krogan et al., Nature (London) 440, 637 (2006).
[22] V. Colizza, A. Flammini, A. Maritan, and A. Vespignani, Physica A 352, 1 (2005).
[23] D. L. Nelson, C. L. McEvoy, and T. A. Schreiber, Behav. Res. Methods. Instrum. Comput. 36, 402 (2004).
[24] P. Schuetz and A. Caflisch, the network of words in the titles of Martin Karplus' publications (unpublished).
[25] J. E. Hirsch, Proc. Natl. Acad. Sci. U.S.A. 102, 16569 (2005).
[26] P. Ball, Nature (London) 448, 737 (2007).
[27] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
[28] Internet Network. Undirected, unweighted network of the Internet at the Autonomous System level from data collected by the Oregon Route Views Project (http://www.routeviews.org/) in May 2001, where vertices represent Internet service providers and edges connections among them. The file reports the list of connected pairs of nodes.
[29] R. Albert, H. Jeong, and A.-L. Barabási, Nature (London) 401, 130 (1999).
[30] The code is available at http://www.biochem-caflisch.uzh.ch/communitydetection/
