Fast unfolding of communities in large networks

Vincent D. Blondel (1,a), Jean-Loup Guillaume (1,2,b), Renaud Lambiotte (1,3,c) and Etienne Lefebvre (1)

(1) Department of Mathematical Engineering, Université catholique de Louvain, 4 avenue Georges Lemaitre, B-1348 Louvain-la-Neuve, Belgium
(2) LIP6, Université Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France
(3) Institute for Mathematical Sciences, Imperial College London, 53 Prince's Gate, South Kensington campus, SW7 2PG, UK

E-mail: (a) vincent.blondel@uclouvain.be; (b) jean-loup.guillaume@lip6.fr; (c) r.lambiotte@imperial.ac.uk

Abstract. We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.

Keywords: Random graphs, networks; Critical phenomena of socio-economic systems; Socio-economic networks

1. Introduction

Social, technological and information systems can often be described in terms of complex networks that have a topology of interconnected nodes combining organization and randomness [1, 2]. The typical size of large networks such as social network services, mobile phone networks or the web now counts in millions when not billions of nodes and these scales demand new methods to retrieve comprehensive information from their structure.
A promising approach consists in decomposing the networks into sub-units or communities, which are sets of highly inter-connected nodes [3]. The identification of these communities is of crucial importance as they may help to uncover a-priori unknown functional modules such as topics in information networks or cyber-communities in social networks. Moreover, the resulting meta-network, whose nodes are the communities, may then be used to visualize the original network structure.

The problem of community detection requires the partition of a network into communities of densely connected nodes, with the nodes belonging to different communities being only sparsely connected. Precise formulations of this optimization problem are known to be computationally intractable. Several algorithms have therefore been proposed to find reasonably good partitions in a reasonably fast way. This search for fast algorithms has attracted much interest in recent years due to the increasing availability of large network data sets and the impact of networks on everyday life. One can distinguish several types of community detection algorithms: divisive algorithms detect inter-community links and remove them from the network [4, 5, 6], agglomerative algorithms merge similar nodes/communities recursively [7] and optimization methods are based on the maximisation of an objective function [8, 9, 10]. The quality of the partitions resulting from these methods is often measured by the so-called modularity of the partition. The modularity of a partition is a scalar value between -1 and 1 that measures the density of links inside communities as compared to links between communities [4, 11].
In the case of weighted networks (weighted networks are networks that have weights on their links, such as the number of communications between two mobile phone users), it is defined as [12]

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j), \qquad (1)$$

where $A_{ij}$ represents the weight of the edge between $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges attached to vertex $i$, $c_i$ is the community to which vertex $i$ is assigned, the $\delta$-function $\delta(u, v)$ is 1 if $u = v$ and 0 otherwise and $m = \frac{1}{2} \sum_{ij} A_{ij}$.

Figure 1. Visualization of the steps of our algorithm. Each pass is made of two phases: one where modularity is optimized by allowing only local changes of communities; one where the found communities are aggregated in order to build a new network of communities. The passes are repeated iteratively until no increase of modularity is possible.

Modularity has been used to compare the quality of the partitions obtained by different methods, but also as an objective function to optimize [13]. Unfortunately, exact modularity optimization is a problem that is computationally hard [14] and so approximation algorithms are necessary when dealing with large networks. The fastest approximation algorithm for optimizing modularity on large networks was proposed by Clauset et al. [8]. That method consists in recurrently merging communities that optimize the production of modularity. Unfortunately, this greedy algorithm may produce values of modularity that are significantly lower than what can be found by using, for instance, simulated annealing [15]. Moreover, the method proposed in [8] has a tendency to produce super-communities that contain a large fraction of the nodes, even on synthetic networks that have no significant community structure.
This artefact also has the disadvantage of slowing down the algorithm considerably and makes it inapplicable to networks of more than a million nodes. This undesired effect has been circumvented by introducing tricks in order to balance the size of the communities being merged, thereby speeding up the running time and making it possible to deal with networks that have a few million nodes [16].

The largest networks that have been dealt with so far in the literature are a protein-protein interaction network of 30739 nodes [17], a network of about 400000 items on sale on the website of a large on-line retailer [8], and a Japanese social networking system of about 5.5 million users [16]. These sizes still leave considerable room for improvement [18] considering that, as of today, the social networking service Facebook has about 64 million active users, the mobile network operator Vodafone has about 200 million customers and Google indexes several billion web-pages. Let us also notice that in most large networks such as those listed above there are several natural organization levels (communities divide themselves into sub-communities) and it is thus desirable to obtain community detection methods that reveal this hierarchical structure [19].

2. Method

We now introduce our algorithm that finds high modularity partitions of large networks in short time and that unfolds a complete hierarchical community structure for the network, thereby giving access to different resolutions of community detection. Contrary to all the other community detection algorithms, the network size limits that we are facing with our algorithm are due to limited storage capacity rather than limited computation time: identifying communities in a 118 million nodes network took only 152 minutes [20].
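
Before the algorithm itself is described, the quantity it optimizes can be made concrete. The following is a minimal Python sketch of how the modularity of Eq. (1) might be evaluated for a given partition. It uses the equivalent per-community form of the double sum (internal weight minus squared total degree, both normalized by 2m). The function name `modularity`, the edge-list input format and the toy graph are illustrative assumptions, not part of the implementation mentioned in [20].

```python
from collections import defaultdict

def modularity(edges, community):
    """Evaluate Eq. (1) for an undirected weighted graph.

    edges:     iterable of (u, v, weight), each undirected link listed once
    community: dict mapping every node to a community label
    (both input formats are assumptions of this sketch)
    """
    degree = defaultdict(float)    # k_i
    internal = defaultdict(float)  # sum of A_ij over pairs i, j in the same community
    total = defaultdict(float)     # sum of k_i over the nodes of each community
    m = 0.0
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        m += w
        if community[u] == community[v]:
            internal[community[u]] += 2.0 * w  # A_uv and A_vu both appear in the sum
    for node, k in degree.items():
        total[community[node]] += k
    return sum(internal[c] / (2 * m) - (total[c] / (2 * m)) ** 2 for c in total)

# Toy example: two triangles joined by a single link.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
print(modularity(edges, two_groups))  # 5/14 for this toy graph
print(modularity(edges, one_group))   # 0: a single community has zero modularity
```

On this toy graph, splitting along the bridge gives Q = 5/14, while putting every node in one community gives Q = 0.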

Our algorithm is divided in two phases that are repeated iteratively. Assume that we start with a weighted network of $N$ nodes. First, we assign a different community to each node of the network. So, in this initial partition there are as many communities as there are nodes. Then, for each node $i$ we consider the neighbours $j$ of $i$ and we evaluate the gain of modularity that would take place by removing $i$ from its community and by placing it in the community of $j$. The node $i$ is then placed in the community for which this gain is maximum (in case of a tie we use a breaking rule), but only if this gain is positive. If no positive gain is possible, $i$ stays in its original community. This process is applied repeatedly and sequentially for all nodes until no further improvement can be achieved and the first phase is then complete. Let us insist on the fact that a node may be, and often is, considered several times. This first phase stops when a local maximum of the modularity is attained, i.e., when no individual move can improve the modularity. One should also note that the output of the algorithm depends on the order in which the nodes are considered. Preliminary results on several test cases seem to indicate that the ordering of the nodes does not have a significant influence on the modularity that is obtained. However, the ordering can influence the computation time. The problem of choosing an order is thus worth studying since it could give good heuristics to enhance the computation time.

Part of the algorithm efficiency results from the fact that the gain in modularity $\Delta Q$ obtained by moving an isolated node $i$ into a community $C$ can easily be computed by:

$$\Delta Q = \left[ \frac{\Sigma_{in} + k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right], \qquad (2)$$

where $\Sigma_{in}$ is the sum of the weights of the links inside $C$, $\Sigma_{tot}$ is the sum of the weights of the links incident to nodes in $C$, $k_i$ is the sum of the weights of the links incident to node $i$, $k_{i,in}$ is the sum of the weights of the links from $i$ to nodes in $C$ and $m$ is the sum of the weights of all the links in the network. A similar expression is used in order to evaluate the change of modularity when $i$ is removed from its community. In practice, one therefore evaluates the change of modularity by removing $i$ from its community and then by moving it into a neighbouring community.

Figure 2. We have applied our method to the ring of 30 cliques discussed in [23]. The cliques are composed of 5 nodes and are inter-connected through single links. The first pass of the algorithm finds the natural partition of the network. The second pass finds the global maximum of modularity where cliques are combined into groups of two.

The second phase of the algorithm consists in building a new network whose nodes are now the communities found during the first phase. To do so, the weights of the links between the new nodes are given by the sum of the weight of the links between nodes in the corresponding two communities [21]. Links between nodes of the same community lead to self-loops for this community in the new network. Once this second phase is completed, it is then possible to reapply the first phase of the algorithm to the resulting weighted network and to iterate. Let us denote by "pass" a combination of these two phases. By construction, the number of meta-communities decreases at each pass, and as a consequence most of the computing time is used in the first pass. The passes are iterated (see Figure 1) until there are no more changes and a maximum of modularity is attained.
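
The two phases described above can be sketched in a few dozen lines of Python. This is a hedged illustration under stated assumptions, not the implementation referred to in [20]. Rather than transcribing Eq. (2), the sketch ranks candidate communities by $k_{i,in} - \Sigma_{tot} k_i / (2m)$, the candidate-dependent part of the gain obtained by expanding Eq. (1) for a single move. The names `build_adj`, `one_pass`, `unfold` and `EPS`, the dictionary-based graph format, the convention that a self-loop of weight w contributes 2w to a degree, and the "first best candidate wins" tie rule are all assumptions of this sketch. The graph is assumed to contain at least one link.

```python
from collections import defaultdict

EPS = 1e-12  # assumed tolerance guarding gain comparisons against round-off

def build_adj(edges):
    """Symmetric adjacency dict from (u, v, weight) triples, one triple per link."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        if u != v:
            adj[v][u] = adj[v].get(u, 0.0) + w
    return adj

def one_pass(adj):
    """Phase 1 (local moves) followed by phase 2 (aggregation).

    A self-loop of weight w is stored as adj[u][u] = w and counts 2*w
    towards the degree of u. Returns (community, new_adj, moved).
    """
    degree = {u: sum(nb.values()) + nb.get(u, 0.0) for u, nb in adj.items()}
    two_m = sum(degree.values())
    community = {u: u for u in adj}   # initial partition: one node per community
    tot = dict(degree)                # Sigma_tot of each community
    moved, improved = False, True
    while improved:                   # sweep over the nodes until none moves
        improved = False
        for i in adj:
            old = community[i]
            links = defaultdict(float)   # k_{i,in} towards each neighbouring community
            for j, w in adj[i].items():
                if j != i:
                    links[community[j]] += w
            tot[old] -= degree[i]        # take i out of its community
            best, best_gain = old, links[old] - tot[old] * degree[i] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[i] / two_m
                if gain > best_gain + EPS:
                    best, best_gain = c, gain
            tot[best] += degree[i]       # put i into the chosen community
            if best != old:
                community[i] = best
                moved = improved = True
    # Phase 2: communities become nodes, internal links become self-loops.
    new_adj = defaultdict(lambda: defaultdict(float))
    for u, nb in adj.items():
        cu = community[u]
        new_adj[cu]                      # make sure every community is a key
        for v, w in nb.items():
            cv = community[v]
            if u == v:
                new_adj[cu][cu] += w
            elif cu == cv:
                new_adj[cu][cu] += w / 2.0   # an internal link is seen from both ends
            else:
                new_adj[cu][cv] += w
    return community, new_adj, moved

def unfold(adj):
    """Iterate passes until nothing moves; one node-to-community dict per level."""
    levels = []
    while True:
        community, adj, moved = one_pass(adj)
        if not moved:
            return levels
        levels.append(community)

# Toy example: two triangles joined by a single link.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
levels = unfold(build_adj(edges))
first = levels[0]
```

On the toy graph the first pass groups each triangle into one community and the second pass makes no move, so `levels` holds a single level. Each dictionary in `levels` maps the nodes of that level (original nodes first, then community labels of the previous level) to their community, which is one possible way to keep the hierarchy.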

The algorithm is reminiscent of the self-similar nature of complex networks [22] and naturally incorporates a notion of hierarchy, as communities of communities are built during the process. The height of the hierarchy that is constructed is determined by the number of passes and is generally a small number, as will be shown on some examples below.

This simple algorithm has several advantages. First, its steps are intuitive and easy to implement, and the outcome is unsupervised. Moreover, the algorithm is extremely fast, i.e., computer simulations on large ad-hoc modular networks suggest that its complexity is linear on typical and sparse data. This is due to the fact that the possible gains in modularity are easy to compute with the above formula and that the number of communities decreases drastically after just a few passes so that most of the running time is concentrated on the first iterations.

The so-called resolution limit problem of modularity also seems to be circumvented thanks to the intrinsic multi-level nature of our algorithm. Indeed, it is well-known [23] that modularity optimization fails to identify communities smaller than a certain scale, thereby inducing a resolution limit on the community detected by a pure modularity optimization approach. This observation is only partially relevant in our case because the first phase of our algorithm involves the displacement of single nodes from one community to another. Consequently, the probability that two distinct communities can be merged by moving nodes one by one is very low. These communities may possibly be merged in the later passes, after blocks of nodes have been aggregated. However, our algorithm provides a decomposition of the network into communities for different levels of organization. For instance, when applied on the clique network proposed in [23], the cliques are indeed merged in the final partition but they are distinct after the first pass (see Figure 2). This result suggests that the intermediate solutions found by our algorithm may also be meaningful and that the uncovered hierarchical structure may allow the end-user to zoom in the network and to observe its structure with the desired resolution.

Figure 3. Graphical representation of the network of communities extracted from a Belgian mobile phone network. About 2M customers are represented on this network. The size of a node is proportional to the number of individuals in the corresponding community and its colour on a red-green scale represents the main language spoken in the community (red for French and green for Dutch). Only the communities composed of more than 100 customers have been plotted. Notice the intermediate community of mixed colours between the two main language clusters. A zoom at higher resolution reveals that it is made of several sub-communities with less apparent language separation.

Figure 4. For the largest communities in the Belgian mobile phone network we represent the size of the community and the proportion of customers in the community that speak the dominant language of the community. For all but one community of more than 10000 members the dominant language is spoken by more than 85% of the community members.

3. Application to large networks

In order to verify the validity of our algorithm, we have applied it on a number of test-case networks that are commonly used for efficiency comparison and we have compared it with three other community detection algorithms (see Table 1).

                Karate    Arxiv       Internet    Web nd.edu   Phone        Web uk-2005   Web WebBase 2001
Nodes/links     34/77     9k/24k      70k/351k    325k/1M      2.6M/6.3M    39M/783M      118M/1B
CNM             .38/0s    .772/3.6s   .692/799s   .927/5034s   -/-          -/-           -/-
PL              .42/0s    .757/3.3s   .729/575s   .895/6666s   -/-          -/-           -/-
WT              .42/0s    .761/0.7s   .667/62s    .898/248s    .56/464s     -/-           -/-
Our algorithm   .42/0s    .813/0s     .781/1s     .935/3s      .769/134s    .979/738s     .984/152mn

Table 1. Summary of numerical results. This table gives the performances of the algorithm of Clauset, Newman and Moore (CNM) [8], of Pons and Latapy (PL) [7], of Wakita and Tsurumi (WT) [16] and of our algorithm for community detection in networks of various sizes. For each method/network, the table displays the modularity that is achieved and the computation time. Empty cells (-/-) correspond to a computation time over 24 hours. Our method clearly performs better in terms of computer time and modularity. It is also interesting to note the small value of Q found by WT for the mobile phone network. This bad modularity result may originate from their heuristic which creates balanced communities, while our approach gives unbalanced communities in this specific network.

The networks that we consider include a small social network [24], a network of 9000 scientific papers and their citations [25], a sub-network of the internet [26] and a webpage network of a few hundred thousand web-pages (the nd.edu domain, see [27]). In all cases, one can observe both the rapidity and the large values of the modularity that are obtained. Our method outperforms all the other methods to which it is compared. We have also applied our method on two web networks of unprecedented sizes: a sub-network of the .uk domain of 39 million nodes and 783 million links [28] and a network of 118 million nodes and 1 billion links obtained by the Stanford WebBase crawler [28, 29].
Even for these very large networks, the computation time is small (12 minutes and 152 minutes respectively) and makes networks of still larger size, perhaps a billion nodes, accessible to computational analysis. It is also interesting to note that the number of passes is usually very small. In the case of the Karate Club [24], for instance, there are only 3 passes: during the first one, the 34 nodes of the network are partitioned into 6 communities; after the second one, only four communities remain; during the third one, nothing happens and the algorithm therefore stops. In the above examples, the number of passes is always smaller than 5.

We have also tested the sensitivity of our algorithm by applying it on ad-hoc networks that have a known community structure. To do so, we have used networks composed of 128 nodes which are split into 4 communities of 32 nodes each [30]. Pairs of nodes belonging to the same community are linked with probability $p_{in}$ while pairs belonging to different communities are linked with probability $p_{out}$. The accuracy of the method is evaluated by measuring the fraction of correctly identified nodes and the normalized mutual information. In the benchmark proposed in [30], the fraction of correctly identified nodes is 0.67 for $z_{out} = 8$, 0.92 for $z_{out} = 7$ and 0.98 for $z_{out} = 6$, i.e., an accuracy similar to that of the algorithm of Pons and Latapy [7] and of the algorithm of Reichardt and Bornholdt [31]. To our knowledge, only two algorithms have a better accuracy than ours, the algorithm of Duch and Arenas [32] and the simulated annealing method first proposed in [15], but their computational cost limits their applicability to much smaller networks than the ones considered here. Our algorithm has also been successfully tested on other benchmarks, such as the ones proposed in [19, 33].
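
To make the benchmark setup above concrete, here is a small Python sketch of a planted-partition generator (4 groups of 32 nodes, link probabilities p_in and p_out) together with a normalized mutual information score. The function names, the default probabilities, the seed handling and the normalization 2I/(H_A + H_B) are assumptions of this sketch. The exact parameter values and accuracy measures used for the results quoted above are those of [30] and are not reproduced here.

```python
import math
import random
from collections import Counter

def planted_partition(n_groups=4, group_size=32, p_in=0.3, p_out=0.05, seed=0):
    """Random graph with a known partition; default probabilities are illustrative."""
    rng = random.Random(seed)
    n = n_groups * group_size
    truth = [i // group_size for i in range(n)]   # ground-truth group of each node
    edges = [(u, v, 1.0)
             for u in range(n) for v in range(u + 1, n)
             if rng.random() < (p_in if truth[u] == truth[v] else p_out)]
    return edges, truth

def normalized_mutual_information(a, b):
    """NMI of two partitions given as equal-length label sequences.

    Uses 2 * I(A; B) / (H(A) + H(B)), an assumed normalization.
    """
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log(c / n) for c in cb.values())
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    if h_a + h_b == 0:
        return 1.0   # both partitions consist of a single group
    return 2.0 * mi / (h_a + h_b)

edges, truth = planted_partition(seed=1)
print(len(truth), len(edges))
print(normalized_mutual_information(truth, truth))   # identical partitions score 1
```

A detected partition, written as one label per node in the same order as `truth`, can be scored with `normalized_mutual_information(truth, detected)`. The score does not depend on how the labels are named.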
In the benchmark proposed in [33], for instance, the normalized mutual information is nearly 1 for the macro-communities with a mixing parameter $k_3$ up to 35. It reaches 0.5 when the mixing parameter is around 55.

To validate the communities obtained we have also applied our algorithm to a large network constructed from the records of a Belgian mobile phone company. This network is described in detail in [34] where it is shown to exhibit typical features of social networks, such as a high clustering coefficient and a fat-tailed degree distribution. The network is composed of 2.6 million customers, between whom weighted links are drawn that account for their total number of phone calls during a 6 month period. Each customer is identified by a surrogate key to which several entries are associated, such as his age, his sex, his language and the zip code of the place where he lives. This large social network is exceptional due to the particular situation of Belgium where two main linguistic communities (French and Dutch) coexist and which provides an excellent way to test the validity of our community detection method by looking at the linguistic homogeneity of communities [35]. From a more sociological point of view, the possibility to highlight the linguistic, religious or ethnic homogeneity of communities opens perspectives for describing the social cohesion and the potential fragility of a country [36].

On this particular network, our community detection algorithm has identified a hierarchy of six levels. At the bottom level every customer is a community of its own and at the top level there are 261 communities that have more than 100 customers. These communities account for about 75% of all customers. We have performed a language analysis of these 261 communities (see Figure 3).
The homogeneity of a community is characterized by the percentage of those speaking the dominant language in that community; this quantity goes to 100% when the community tends to be monolingual. Our analysis reveals that the network is strongly segregated, with most communities almost monolingual. There are 36 communities with more than 10000 customers and, except for one community at the interface between the two language clusters, all these communities have more than 85% of their members speaking the same language (see Figure 4 for a complete distribution).

It is interesting to analyse more closely the only community that has a more balanced distribution of languages. Our hierarchy revealing algorithm allows us to do this by considering the sub-communities provided by the algorithm at the lower level. As shown on Figure 4, these sub-communities are closely connected to each other and are themselves composed of heterogeneous groups of people. These groups of people, where language ceases to be a discriminating factor, might possibly play a crucial role for the integration of the country and for the emergence of consensus between the communities [37]. One may indeed wonder what would happen if the community at the interface between the two language clusters on Figure 3 were to be removed.

Another interesting observation is related to the presence of other languages. There are actually four possible language declarations for the customers of this particular mobile phone operator: French, Dutch, English or German. It is interesting to note that, whereas English speaking customers disperse themselves quite evenly in all communities, more than 60% of the German speaking customers are concentrated in just one community.
This is probably due to the fact that German speaking people are mainly concentrated in a small region close to Germany, while English speaking people are spread in the whole country. Let us finally observe that, as can be visually noticed on Figure 3, French speaking communities are much more densely connected than their Dutch speaking counterparts: on average, the links between French speaking communities are 54% stronger than those between Dutch speaking communities. This difference of structure between the two sub-networks seems to indicate that the two linguistic communities are characterized by different social behaviours and therefore suggests searching for other topological characteristics of the communities.

4. Conclusion and discussion

We have introduced an algorithm for optimizing modularity that allows one to study networks of unprecedented size. The limitation of the method for the experiments that we performed was the storage of the network in main memory rather than the computation time. This change of scales, i.e., from around 5 million nodes for previous methods to more than 100 million nodes in our case, opens exciting perspectives as the modular structure of complex systems such as whole countries or huge parts of the Internet can now be unraveled. The accuracy of our method has also been tested on ad-hoc modular networks and is shown to be excellent in comparison with other (much slower) community detection methods.

It is interesting to note that the speed of our algorithm can still be substantially improved by using some simple heuristics, for instance by stopping the first phase of our algorithm when the gain of modularity is below a given threshold or by removing the nodes of degree 1 (leaves) from the original network and adding them back after the community computation. The impact of these heuristics on the final partition of the network should be studied further, as well as the role played by the ordering of the nodes during the first phase of the algorithm.

By construction, our algorithm unfolds a complete hierarchical community structure for the network, each level of the hierarchy being given by the intermediate partitions found at each pass. In this paper, however, we have only verified the accuracy of the top level of this hierarchy, namely the final partition found by our algorithm, and the accuracy of the intermediate partitions has still to be shown. Several points suggest, however, that these intermediate partitions make sense. First, intermediate partitions correspond to local maxima of modularity, maxima in the sense that it is not possible to increase modularity by moving one single "entity" from one community to a neighbouring one. In the first pass of the algorithm, these entities are nodes, but at subsequent passes, they correspond to larger and larger sets of nodes. Intermediate partitions may therefore be viewed as local maxima of modularity at different scales. It is the agglomeration of nodes during the second phase of the algorithm which allows larger and larger communities to be uncovered, thereby taking advantage of the self-similar structure of many complex networks.
Second, the final partition found by our algorithm has a very high value of modularity for a broad range of system sizes (for instance, as shown in Table 1, our algorithm performs better in terms of modularity than those of Clauset, Newman and Moore [8], of Pons and Latapy [7] and of Wakita and Tsurumi [16]). Finally, it is instructive to consider a community C found at the last pass of our algorithm. In order to test the validity of the sub-communities found at the penultimate pass, it is tempting to look at community C as a new network, thereby neglecting links going from C to the rest of the network. By reapplying our algorithm on the isolated community C, one expects to find very similar sub-communities due to the local optimization involved at each step. These are, however, very qualitative arguments and the multi-resolution of our algorithm will only be confirmed after looking in detail at the hierarchies found in ad-hoc networks with known hierarchical structure [19] or without community structure (e.g. Erdős-Rényi random graphs), or after comparing with other methods incorporating a tunable resolution [33, 38, 39].

Acknowledgements

This research was supported by the Communauté Française de Belgique through a grant ARC and by the Belgian Network DYSCO, funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office. J.-L. G. is also supported by the project MAPE (ANR France) and MAPAP (Safer Internet Plus Programme, European Union).

References

[1] Albert R and Barabási A-L, 2002 Rev. Mod. Phys. 74 47-97.
[2] Newman M E J, Barabási A-L and Watts D J, The Structure and Dynamics of Networks (Princeton University Press, Princeton, 2006).
[3] Fortunato S and Castellano C, 2007 arXiv:0712.2716
[4] Girvan M and Newman M E J, 2002 Proc. Natl. Acad. Sci. USA 99 7821.
[5] Newman M E J and Girvan M, 2004 Phys. Rev. E 69 026113.
[6] Radicchi F, Castellano C, Cecconi F, Loreto V and Parisi D, 2004 Proc. Natl. Acad. Sci. USA 101 2658.
[7] Pons P and Latapy M, 2006 Journal of Graph Algorithms and Applications 10 191.
[8] Clauset A, Newman M E J and Moore C, 2004 Phys. Rev. E 70 066111.
[9] Wu F and Huberman B A, 2004 Eur. Phys. J. B 38 331.
[10] Newman M E J, 2006 Phys. Rev. E 74 036104.
[11] Newman M E J, 2006 Proc. Natl. Acad. Sci. USA 103 8577.
[12] Newman M E J, 2004 Phys. Rev. E 70 056131.
[13] Newman M E J, 2004 Phys. Rev. E 69 066133.
[14] Brandes U, Delling D, Gaertler M, Goerke R, Hoefer M, Nikoloski Z and Wagner D, 2006 physics/0608255
[15] Guimera R, Sales M and Amaral L A N, 2004 Phys. Rev. E 70 025101.
[16] Wakita K and Tsurumi T, 2007 Proceedings of IADIS international conference on WWW/Internet 2007 153.
[17] Palla G, Derényi I, Farkas I and Vicsek T, 2005 Nature 435 814.
[18] Raghavan U N, Albert R and Kumara S, 2007 Physical Review E 76 036106.
[19] Sales-Pardo M, Guimera R, Moreira A A and Amaral L A N, 2007 Proc. Natl. Acad. Sci. USA 104 15224.
[20] All methods described here have been compiled and tested on the same machine: a bi-opteron 2.2k with 24GB of memory. The code is freely available for download on the webpage http://findcommunities.googlepages.com.
[21] Arenas A, Duch J, Fernández A and Gómez S, 2007 N. J. of Phys. 9 176.
[22] Song C, Havlin S and Makse H A, 2005 Nature 433 392.
[23] Fortunato S and Barthélemy M, 2007 Proc. Natl. Acad. Sci. USA 104 36.
[24] Zachary W W, 1977 Journal of Anthropological Research 33 452.
[25] http://www.cs.cornell.edu/projects/kddcup/ (Cornell KDD Cup)
[26] Hoerdt M and Magoni D, 2003 Proceedings of the 11th International Conference on Software, Telecommunications and Computer Networks 257.
[27] Albert R, Jeong H and Barabási A-L, 1999 Nature 401 130.
[28] http://law.dsi.unimi.it/ (Laboratory for Web Algorithmics)
[29] http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/ (Stanford WebBase Project)
[30] Danon L, Díaz-Guilera A, Duch J and Arenas A, 2005 J. Stat. Mech. P09008.
[31] Reichardt J and Bornholdt S, 2004 Phys. Rev. Lett. 93 218701.
[32] Duch J and Arenas A, 2005 Phys. Rev. E 72 027104.
[33] Lancichinetti A, Fortunato S, Kertesz J, arXiv:0802.1218
[34] Lambiotte R, Blondel V D, de Kerchove C, Huens E, Prieur C, Smoreda Z and Van Dooren P, 2008 arXiv:0802.2178
[35] Palla G, Barabási A-L and Vicsek T, 2007 Nature 446 664.
[36] Onnela J-P, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, Kertész J and Barabási A-L, 2007 Proc. Natl. Acad. Sci. USA 104 7332.
[37] Lambiotte R, Ausloos M and Holyst J A, 2007 Phys. Rev. E 75 030101(R).
[38] Arenas A, Fernández A and Gómez S, 2008 N. J. of Phys. 10 053039.
[39] Delvenne J-C, Yaliraki S and Barahona M, in preparation
