Axioms for graph clustering quality functions

Journal of Machine Learning Research 1 (2013) 1-23. Submitted 7/13; Published 11/13.

Twan van Laarhoven (tvanlaarhoven@cs.ru.nl)
Elena Marchiori (elenam@cs.ru.nl)
Institute for Computing and Information Sciences
Radboud University Nijmegen
Postbus 9010, 6500 GL Nijmegen, The Netherlands

Editor: Vahab Mirrokni

© 2013 Twan van Laarhoven and Elena Marchiori.

Abstract

We investigate properties that intuitively ought to be satisfied by graph clustering quality functions, i.e. functions that assign a score to a clustering of a graph. Graph clustering, also known as network community detection, is often performed by optimizing such a function. Two axioms tailored for graph clustering quality functions are introduced, and the four axioms introduced in previous work on distance based clustering are reformulated and generalized for the graph setting. We show that modularity, a standard quality function for graph clustering, does not satisfy all of these six properties. This motivates the derivation of a new family of quality functions, adaptive scale modularity, which does satisfy the proposed axioms. Adaptive scale modularity has two parameters, which give greater flexibility in the kinds of clusterings that can be found. Standard graph clustering quality functions, such as normalized cut and unnormalized cut, are obtained as special cases of adaptive scale modularity.

In general, the results of our investigation indicate that the considered axiomatic framework covers existing 'good' quality functions for graph clustering, and can be used to derive an interesting new family of quality functions.

Keywords: graph clustering, modularity, axiomatic framework.

1. Introduction

Following the work by Kleinberg (2002) there have been various contributions to the theoretical foundation and analysis of clustering, such as axiomatic frameworks for quality functions (Ackerman and Ben-David, 2008), for criteria to compare clusterings (Meila, 2005), uniqueness theorems for specific types of clustering (Zadeh and Ben-David, 2009; Ackerman and Ben-David, 2013; Carlsson, Mémoli, Ribeiro, and Segarra, 2013), taxonomy of clustering paradigms (Ackerman et al., 2010a), and characterization of diversification systems (Gollapudi and Sharma, 2009).

Kleinberg focused on clustering functions, which are functions from a distance function to a clustering. He showed that there are no clustering functions that simultaneously satisfy three intuitive properties: scale invariance, consistency and richness. Ackerman and Ben-David (2008) continued this work, and showed that the impossibility result does not apply when these properties are formulated in terms of quality functions instead of clustering functions, where consistency is replaced with a weaker property called monotonicity.

Both of these previous works are formulated in terms of distance functions over a fixed domain. In this paper we focus on weighted graphs, where the weight of an edge indicates the strength of a connection. The clustering problem on graphs is also known as network community detection.

Graphs provide additional freedoms over distance functions. First, it is possible for two points to be unrelated, indicated by a weight of 0. These zero-weight edges in turn make it natural to consider graphs over different sets of nodes as part of a larger graph. Second, we can allow for self loops. Self loops can indicate internal edges in a node. This notation is used for instance by Blondel et al. (2008), where a graph is contracted based on a fine-grained clustering.

In this setting, where edges with weight 0 are possible, Kleinberg's impossibility result does not apply. This can be seen by considering the connected components of a graph. Taking the connected components is a graph clustering function that satisfies all three of Kleinberg's axioms: scale invariance, consistency and richness (see Section 4.2).

Our focus is on the investigation of graph clustering quality functions, which are functions from a graph and a clustering to a real number 'quality'. A notable example is modularity (Newman and Girvan, 2004). In particular we ask which properties of quality functions intuitively ought to hold, and which are often assumed to hold when reasoning informally about graph clustering. Such properties might be called axioms for graph clustering.

The rest of this paper is organized as follows. Section 2 gives basic definitions. Next, Section 3 discusses different ways in which properties could be formulated. In Section 4 we propose an axiomatic framework that consists of six properties of graph clustering quality functions: adaptations of the four axioms from Kleinberg (2002) and Ackerman and Ben-David (2008) (permutation invariance, scale invariance, richness and monotonicity), and two additional properties specific to the graph setting (continuity and locality). Then, in Section 5, we show that modularity does not satisfy the monotonicity and locality properties. This result motivates the analysis of variants of modularity, leading to the derivation of a new parametric quality function in Section 6 that satisfies all properties. This quality function, which we call adaptive scale modularity, has two parameters, $M$ and $\gamma$, which can be tuned to control the resolution of the clustering.
We show that quality functions similar to normalized cut and unnormalized cut are obtained in the limit when $M$ goes to zero and to infinity, respectively. Furthermore, setting $\gamma$ to 0 yields a parametric quality function similar to that proposed by Reichardt and Bornholdt (2004).

1.1 Related Work

Previous axiomatic studies of clustering quality functions have focused mainly on hierarchical clustering and on weakest and strongest link style quality functions (Kleinberg, 2002; Ackerman and Ben-David, 2008; Zadeh and Ben-David, 2009; Carlsson et al., 2013). Papers in this line of work that also focused on the partitional setting include Puzicha et al. (1999); Ackerman et al. (2012, 2013). Puzicha et al. (1999) investigated a particular class of clustering quality functions obtained by requiring the function to decompose into a certain additive form. Ackerman et al. (2012) considered clustering in the weighted setting, in which every data point is assigned a real valued weight. They performed a theoretical analysis of the influence of weighted data on standard clustering algorithms. Ackerman et al. (2013) analyzed robustness of clustering algorithms to the addition of a small set of points, and investigated the robustness of popular clustering methods.

All these studies are framed in terms of distance (or similarity and dissimilarity) functions.

Bubeck and Luxburg (2009) studied statistical consistency of clustering methods. They introduced the so-called nearest neighbor clustering and showed its consistency also for standard graph based quality functions, such as normalized cut, ratio cut, and modularity. Here we do not focus on properties of methods to optimize clustering quality, but on natural properties that quality functions for graph clustering should satisfy.
Related works on graph clustering quality functions mainly focus on the so-called resolution limit, that is, the tendency of a quality function to prefer either small or large clusters. In particular, Fortunato and Barthélemy (2007) proved that modularity may not detect clusters smaller than a scale which depends on the total size of the network and on the degree of interconnectedness of the clusters. Van Laarhoven and Marchiori (2013) showed that the resolution limit is the most important difference between quality functions in graph clustering optimized using local search optimization.

To mitigate the resolution limit phenomenon, the quality function may be extended with a so-called resolution parameter. For example, Reichardt and Bornholdt (2006) proposed a formulation of graph clustering (therein called network community detection) based on principles from statistical mechanics. This interpretation leads to the introduction of a family of quality functions with a parameter that allows control over the clustering resolution. In Section 6.1 we will show that this extension is a special case of adaptive scale modularity.

Traag, Van Dooren, and Nesterov (2011) formalized the notion of resolution-free quality functions, that is, quality functions not suffering from the resolution limit, and provided a characterization of this class of quality functions. Their notion is essentially an axiom, and we will discuss the relation to our axioms in Section 4.1.1.

2. Definitions and Notation

A symmetric weighted graph is a pair $(V, E)$ of a finite set $V$ of nodes and a function $E : V \times V \to \mathbb{R}_{\ge 0}$ of edge weights, where $E(i, j) = E(j, i)$ for all $i, j \in V$. Edges with larger weights represent stronger connections, so missing edges can get weight 0. Note that this is the opposite of the convention used in distance based clustering. We explicitly allow for self loops, that is, nodes for which $E(i, i) > 0$.
A clustering $C$ of a graph $G = (V, E)$ is a partition of its nodes. That is, $\bigcup C = V$ and for all $c_1, c_2 \in C$, $c_1 \cap c_2 \neq \emptyset$ if and only if $c_1 = c_2$. When two nodes $i$ and $j$ are in the same cluster in clustering $C$, i.e. when $i, j \in c$ for some $c \in C$, then we write $i \sim_C j$. Otherwise we write $i \not\sim_C j$. A clustering $C$ is a refinement of a clustering $D$, written $C \sqsubseteq D$, when for every cluster $c \in C$ there is a cluster $d \in D$ such that $c \subseteq d$.

A graph clustering quality function (or objective function) $Q$ is a function from graphs $G$ and clusterings of $G$ to real numbers. We adopt the convention that a higher quality indicates a 'better' clustering. As a generalization, we will sometimes work with parameterized families of quality functions. A single quality function can be seen as a family with no parameters.

Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ be two graphs and let $V_a \subseteq V_1 \cap V_2$ be a subset of the common nodes. We say that the graphs agree on $V_a$ if $E_1(i, j) = E_2(i, j)$ for all $i, j \in V_a$. We say that the graphs also agree on the neighborhood of $V_a$ if
• $E_1(i, j) = E_2(i, j)$ for all $i \in V_a$ and $j \in V_1 \cap V_2$,
• $E_1(i, j) = 0$ for all $i \in V_a$ and $j \in V_1 \setminus V_2$, and
• $E_2(i, j) = 0$ for all $i \in V_a$ and $j \in V_2 \setminus V_1$.
This means that for nodes in $V_a$ the weights and endpoints of incident edges are exactly the same in the two graphs.

3. On the Form of Axioms

There are three different ways to state potential axioms for clustering:

1. As a property of clustering functions, as in Kleinberg (2002). For example, scale invariance of a clustering function $\hat{C}$ would be written as "$\hat{C}(G) = \hat{C}(\alpha G)$, for all graphs $G$, $\alpha > 0$". I.e. the optimal clustering is invariant under scaling of edge weights.
2. As a property of the values of a quality function $Q$, as in Ackerman and Ben-David (2008).
For example "$Q(G, C) = Q(\alpha G, C)$, for all graphs $G$, all clusterings $C$ of $G$, and $\alpha > 0$". I.e. the quality is invariant under scaling of edge weights.
3. As a property of the relation between qualities of different clusterings, or equivalently, as a property of an ordering of clusterings for a particular graph. For example "$Q(G, C) \ge Q(G, D) \Rightarrow Q(\alpha G, C) \ge Q(\alpha G, D)$". I.e. the 'better than' relation for clusterings is invariant under scaling of edge weights.

The third form is slightly more flexible than the other two. Any quality function that satisfies a property in the second style will also satisfy the corresponding property in the third style, but the converse is not true. Note also that if $D$ is not restricted in a property in the third style, then one can take $\hat{C}(G) = \operatorname{argmax}_C Q(G, C)$ to obtain a clustering function and an axiom in the first style.

Most properties are more easily stated and proved in the second, absolute, style. Therefore, we adopt the second style unless doing so requires us to make specific choices.

4. Axioms for Graph Clustering Quality Functions

Kleinberg defined three axioms for distance based clustering functions. Ackerman and Ben-David (2008) reformulated these into four axioms for clustering quality functions. These axioms can easily be adapted to the graph setting.

The first property that one expects for graph clustering is that the quality of a clustering depends only on the graph, that is, only on the weight of edges between nodes, not on the identity of nodes.
We formalize this in the permutation invariance axiom.

Definition 1 (Permutation invariance) A graph clustering quality function $Q$ is permutation invariant if for all graphs $G = (V, E)$ and all isomorphisms $f : V \to V'$, it is the case that $Q(G, C) = Q(f(G), f(C))$; where $f$ is extended to graphs and clusterings by $f(C) = \{\{f(i) \mid i \in c\} \mid c \in C\}$ and $f((V, E)) = (V', (i, j) \mapsto E(f^{-1}(i), f^{-1}(j)))$.

The second property, scale invariance, requires that the quality doesn't change when edge weights are scaled uniformly. This is an intuitive axiom when one thinks in terms of units: a graph with edges in "m/s" can be scaled to a graph with edges in "km/h". The quality should not be affected by such a transformation, perhaps up to a change in units.

Ackerman and Ben-David (2008) defined scale invariance by insisting that the quality stays equal when distances are scaled. In contrast, in Puzicha et al. (1999) the quality should scale proportionally with the scaling of distances. We generalize both of these previous definitions by only considering the relations between the quality of two clusterings.

Definition 2 (Scale invariance) A graph clustering quality function $Q$ is scale invariant if for all graphs $G = (V, E)$, all clusterings $C_1, C_2$ of $G$ and all constants $\alpha > 0$, $Q(G, C_1) \le Q(G, C_2)$ if and only if $Q(\alpha G, C_1) \le Q(\alpha G, C_2)$, where $\alpha G = (V, (i, j) \mapsto \alpha E(i, j))$ is a graph with edge weights scaled by a factor $\alpha$.

This formulation is flexible enough for single quality functions. However, families of quality functions could have parameters that are also scale dependent.
For such families we therefore propose to use as an axiom a more flexible property that also allows the parameters to be scaled.

Definition 3 (Scale invariant family) A family of quality functions $Q_P$ parameterized by $P \in \mathcal{P}$ is scale invariant if for all constants $P \in \mathcal{P}$ and $\alpha > 0$ there is a $P' \in \mathcal{P}$ such that for all graphs $G = (V, E)$, and all clusterings $C_1, C_2$ of $G$, $Q_P(G, C_1) \le Q_P(G, C_2)$ if and only if $Q_{P'}(\alpha G, C_1) \le Q_{P'}(\alpha G, C_2)$.

Thirdly, we want to rule out trivial quality functions. This is done by requiring richness, i.e. that by changing the edge weights any clustering can be made optimal for that quality function.

Definition 4 (Richness) A graph clustering quality function $Q$ is rich if for all sets $V$ and all non-trivial partitions $C^*$ of $V$, there is a graph $G = (V, E)$ such that $C^*$ is the $Q$-optimal clustering of $V$, i.e. $\operatorname{argmax}_C Q(G, C) = C^*$.

The last axiom that Ackerman and Ben-David consider is by far the most interesting. Intuitively, we expect that strengthening the edges within a cluster, or weakening the edges between clusters, does not decrease the quality. Formally we call such a change of a graph a consistent improvement.

Definition 5 (Consistent improvement) Let $G = (V, E)$ be a graph and $C$ a clustering of $G$. A graph $G' = (V, E')$ is a $C$-consistent improvement of $G$ if for all nodes $i$ and $j$, $E'(i, j) \ge E(i, j)$ whenever $i \sim_C j$ and $E'(i, j) \le E(i, j)$ whenever $i \not\sim_C j$.

We say that a quality function that does not decrease under consistent improvement is monotonic. In previous work this axiom is often called consistency.

Definition 6 (Monotonicity) A graph clustering quality function $Q$ is monotonic if for all graphs $G$, all clusterings $C$ of $G$ and all $C$-consistent improvements $G'$ of $G$ it is the case that $Q(G', C) \ge Q(G, C)$.
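The consistent improvement condition of Definition 5 can be checked mechanically. The sketch below is only an illustration under stated assumptions: graphs are stored as dictionaries from ordered node pairs to weights (absent pairs count as weight 0), and the helper names `same_cluster`, `is_consistent_improvement` and `symmetric`, as well as the small example graph, are hypothetical and do not come from the paper.

```python
from itertools import product


def same_cluster(clustering, i, j):
    """Return True when nodes i and j share a cluster, i.e. i ~_C j."""
    return any(i in c and j in c for c in clustering)


def is_consistent_improvement(nodes, E, E_new, clustering):
    """Check Definition 5 for the graphs (nodes, E) and (nodes, E_new).

    E and E_new map ordered node pairs (i, j) to non-negative weights;
    pairs that are absent are treated as weight 0.
    """
    for i, j in product(nodes, repeat=2):
        old, new = E.get((i, j), 0), E_new.get((i, j), 0)
        if same_cluster(clustering, i, j):
            if new < old:   # within-cluster weights must not decrease
                return False
        elif new > old:     # between-cluster weights must not increase
            return False
    return True


def symmetric(weights):
    """Expand {(i, j): w} so that E(i, j) = E(j, i) holds."""
    E = dict(weights)
    E.update({(j, i): w for (i, j), w in weights.items()})
    return E


# Hypothetical example: strengthen the a-b edge, weaken the b-c edge.
nodes = ["a", "b", "c"]
C = [{"a", "b"}, {"c"}]
G = symmetric({("a", "b"): 1, ("b", "c"): 1})
G_better = symmetric({("a", "b"): 2, ("b", "c"): 0.5})

print(is_consistent_improvement(nodes, G, G_better, C))  # True
print(is_consistent_improvement(nodes, G_better, G, C))  # False
```

A monotonic quality function in the sense of Definition 6 must then satisfy $Q(G', C) \ge Q(G, C)$ for every pair of graphs on which this check succeeds.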
4.1 Locality

In the graph setting it also becomes natural to look at combining different graphs. With distance functions this is impossible, since it is not clear what the distance between nodes from the two different sets should be. But for graphs we can take the edge weight between nodes not in both graphs to be zero, which is the case when the graphs agree on the neighborhood of some set.

Consider adding nodes to one side of a large network: we would not want the clustering on the other side of the network to change if there is no direct connection. For example, if a new protein is discovered in yeast, then the clustering of unrelated proteins in humans should remain the same. Similarly, we can consider any two graphs with disjoint node sets as one larger graph. Then the quality of clusterings of the two original graphs should relate directly to quality on the combined graph.

In general, local changes to a graph should have only local consequences for a clustering. Or in other words, the contribution of a single cluster to the total quality should only depend on nodes in the neighborhood of that cluster.

Definition 7 (Locality) A graph clustering quality function $Q$ is local if for all graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ that agree on a set $V_a$ and its neighborhood, and for all clusterings $C_a, D_a$ of $V_a$, $C_1$ of $V_1 \setminus V_a$ and $C_2$ of $V_2 \setminus V_a$, if $Q(G_1, C_a \cup C_1) \ge Q(G_1, D_a \cup C_1)$ then $Q(G_2, C_a \cup C_2) \ge Q(G_2, D_a \cup C_2)$.

Any quality function that has a preference for a fixed number of clusters will not be local. On the other hand, a quality function that is written as a sum over clusters, where each summand depends only on properties of nodes and edges in one cluster and not on global properties, is local.

Ackerman et al. (2010b) defined a similar locality property for clustering functions. Their definition differs from ours in three ways.
First of all, they looked at $k$-clustering, where the number of clusters is given and fixed. Secondly, their locality property only implies a consistent clustering when the rest of the graph is removed, corresponding to $V_2 = V_1 \cap V_a$. They do not consider the other direction, where more nodes and edges are added. Finally, their locality property requires only agreement on the overlapping set $V_a$, not on its neighborhood. That means that clustering functions should also give the same results if edges with one endpoint in $V_a$ are removed.

4.1.1 Relation to Resolution-Limit-Free Quality Functions

Traag et al. (2011) introduced the notion of resolution-limit-free quality functions, which is similar to locality. They then showed that resolution-limit-free quality functions do not suffer from the resolution limit as described by Fortunato and Barthélemy (2007). Their definition is as follows.

Definition 8 (Resolution-limit-free) Call a clustering $C$ of a graph $G$ $Q$-optimal if for all clusterings $C'$ of $G$ we have that $Q(G, C) \ge Q(G, C')$. Let $C$ be a $Q$-optimal clustering of a graph $G_1$. Then the quality function $Q$ is called resolution-limit-free if for each subgraph $G_2$ induced by $D \subset C$, the partition $D$ is also $Q$-optimal.

There are three differences compared to our locality property. First of all, Definition 8 refers only to the optimal clustering, not to the quality, i.e. it is a property in the style of Kleinberg. Secondly, locality does not require that $G_2$ be a subgraph of $G_1$. Locality is stronger in that sense. Thirdly, and perhaps most importantly, in the subgraph $G_2$ induced by $D \subset C$, edges from a node in $D$ to nodes not in $D$ will be removed. That means that while $G_1$ and $G_2$ agree on the set of common nodes, they do not also agree on their neighborhood. So in this sense locality is weaker than resolution-limit-freedom.
The notion of resolution-limit-free quality functions was born out of the need to avoid the resolution limit of graph clustering. And indeed locality is not enough to guarantee that a quality function is free from this resolution limit.

We could look at a stronger version of locality, which replaces agreement on the neighborhood of a set $V_a$ by plain agreement on that set. Such a strong locality property would imply resolution-limit-freedom. However, it is a very strong property in that it rules out many sensible quality functions. In particular, a strongly local quality function cannot depend on the weight of edges entering or leaving a cluster, because that weight can be different in another graph that agrees only on that cluster.

The solution used by Traag et al. is to use the number of nodes instead of the volume of a cluster. In this way they obtain a resolution-limit-free variant of the Potts model by Reichardt and Bornholdt (2004), which they call the constant Potts model. But this comes at the cost of scale invariance.

4.2 Continuity

In the context of graphs, perhaps the most intuitive clustering function is finding the connected components of a graph. As a quality function, we could write

$$Q_{\mathrm{coco}}(G, C) = \mathbf{1}[C = \hat{C}_{\mathrm{coco}}(G)],$$

where the function $\hat{C}_{\mathrm{coco}}$ yields the connected components of a graph.

This quality function is clearly permutation invariant, scale invariant, rich, and local. Since a consistent change can only remove edges between clusters and add edges within clusters, the coco quality function is also monotonic.

In fact, all of Kleinberg's axioms (reformulated in terms of graphs) also hold for $\hat{C}_{\mathrm{coco}}$, which seems to refute his impossibility result. However, the impossibility proof cannot be directly transferred to graphs, because it involves a multiplication and division by a maximum distance.
In the graph setting this would be multiplication and division by a minimum edge weight, which can be zero.

Still, despite connected components satisfying all previously defined properties (except for strong locality), it is not a very useful quality function. In many real-world graphs, most nodes are part of one giant connected component (Bollobás, 2001). We would also like the clustering to be influenced by the weight of edges, not just by their existence. A natural way to rule out such degenerate quality functions is to require continuity.

Definition 9 (Continuity) A quality function $Q$ is continuous if a small change in the graph leads to a small change in the quality. Formally, $Q$ is continuous if for every $\epsilon > 0$ and every graph $G = (V, E)$ there exists a $\delta > 0$ such that for all graphs $G' = (V, E')$, if $E(i, j) - \delta < E'(i, j) < E(i, j) + \delta$ for all nodes $i$ and $j$, then $Q(G', C) - \epsilon < Q(G, C) < Q(G', C) + \epsilon$ for all clusterings $C$ of $G$.

Connected components clustering is not continuous, because adding an edge with a small weight $\delta$ between clusters changes the connected components, and hence dramatically changes the quality.

Continuous quality functions have an important property in practice, in that they provide a degree of robustness to noise. A clustering that is optimal with regard to a continuous quality function will still be close to optimal after a small change to the graph.

4.3 Summary of Axioms

We propose to consider the following six properties as axioms for graph clustering quality functions:

1. Permutation invariance (Definition 1),
2. Scale invariance (Definition 2),
3. Richness (Definition 4),
4. Monotonicity (Definition 6),
5. Locality (Definition 7), and
6. Continuity (Definition 9).

As mentioned previously, for families of quality functions we replace scale invariance by scale invariance for families (Definition 3).
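The failure of continuity for the connected components quality function of Section 4.2 can be seen on a two-node graph. The following is a minimal sketch, assuming graphs are stored as dictionaries from ordered node pairs to weights; the function names `connected_components` and `q_coco` and the weight `1e-9` are illustrative choices, not part of the paper.

```python
def connected_components(nodes, E):
    """Partition nodes into components, linking nodes joined by a positive weight."""
    remaining, components = set(nodes), []
    while remaining:
        start = remaining.pop()
        component, frontier = {start}, [start]
        while frontier:
            i = frontier.pop()
            for j in list(remaining):
                if E.get((i, j), 0) > 0 or E.get((j, i), 0) > 0:
                    remaining.remove(j)
                    component.add(j)
                    frontier.append(j)
        components.append(frozenset(component))
    return set(components)


def q_coco(nodes, E, clustering):
    """Q_coco(G, C): 1 if C equals the connected components of G, else 0."""
    as_sets = {frozenset(c) for c in clustering}
    return 1 if as_sets == connected_components(nodes, E) else 0


nodes = ["a", "b"]
C = [{"a"}, {"b"}]
no_edge = {}
tiny_edge = {("a", "b"): 1e-9, ("b", "a"): 1e-9}

print(q_coco(nodes, no_edge, C))    # 1
print(q_coco(nodes, tiny_edge, C))  # 0
```

However small the added weight is, the quality of $C$ jumps from 1 to 0, so no $\delta$ can satisfy Definition 9 for any $\epsilon < 1$.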
In Section 6 we will show that this set of axioms is consistent by defining a quality function and a family of quality functions that satisfy all of them. Additionally, the fact that there are quality functions that satisfy only some of the axioms shows that they are (at least partially) independent.

5. Modularity

For graph clustering one of the most popular quality functions is modularity (Newman and Girvan, 2004), despite its limitations (Good et al., 2010; Traag et al., 2011),

$$Q_{\mathrm{modularity}}(G, C) = \sum_{c \in C} \left( \frac{w_c}{v_V} - \left( \frac{v_c}{v_V} \right)^2 \right). \tag{1}$$

In this expression $v_c(G) = \sum_{i \in c} \sum_{j \in V} E(i, j)$ is the volume of a cluster, while $w_c(G) = \sum_{i, j \in c} E(i, j)$ is the within cluster weight. $v_V$ is the volume of the entire graph. We leave the argument $G$ implicit for readability.

It is easy to see that modularity is permutation invariant, scale invariant and continuous.

Theorem 1 Modularity is rich.

The proof of Theorem 1 is in Appendix A.

An important aspect of modularity is that volume and within weight are normalized with respect to the total volume of the graph. This ensures that the quality function is scale invariant, but it also means that the quality can change in unexpected ways when the total volume of the graph changes. This leads us to Theorem 2.

Theorem 2 Modularity is not local.

Proof Consider the graphs $G_1$ and $G_2$ shown below, which agree on the set $V_a = \{a, b\}$.

[Figure: graphs $G_1$ and $G_2$. In $G_1$, nodes $a$ and $b$ each have a self loop of weight 2 and are joined by an edge of weight 1 in each direction. $G_2$ additionally contains a node $c$ with a self loop of weight 4 and no other edges.]

Note that we draw the graphs as directed graphs, to make it clear that each undirected edge is counted twice for the purposes of volume and within cluster weight. Now take the clusterings $C_a = \{\{a\}, \{b\}\}$ and $D_a = \{\{a, b\}\}$ of $V_a$; $C_1 = \{\}$ of $V_1 \setminus V_a$; and $C_2 = \{\{c\}\}$ of $V_2 \setminus V_a$.
Then

$$Q_{\mathrm{modularity}}(G_1, C_a \cup C_1) = 1/6 > 0 = Q_{\mathrm{modularity}}(G_1, D_a \cup C_1),$$

while

$$Q_{\mathrm{modularity}}(G_2, C_a \cup C_2) = 23/50 < 24/50 = Q_{\mathrm{modularity}}(G_2, D_a \cup C_2).$$

This counterexample shows that modularity is not local.

Even without changing the node set, changes in the total volume can be problematic, as shown by the following theorem.

Theorem 3 Modularity is not monotonic.

Proof Consider the graphs $G$ and $G'$ shown below, and the clustering $C = \{\{a\}, \{b\}, \{c\}\}$.

[Figure: graphs $G$ and $G'$ on nodes $a$, $b$, $c$. In $G$, nodes $a$ and $b$ are joined by an edge of weight 1 in each direction and $c$ has a self loop of weight 2. In $G'$ the weight of the edge between $a$ and $b$ is 0 and the self loop on $c$ is unchanged.]

$G'$ is a $C$-consistent improvement of $G$, because the weight of a between-cluster edge is decreased. The modularity of $C$ in $G$ is $Q_{\mathrm{modularity}}(G, C) = 1/8$, while the modularity of $C$ in $G'$ is $Q_{\mathrm{modularity}}(G', C) = 0$. So modularity can decrease with a consistent change of a graph, and hence it is not a monotonic quality function.

Monotonicity might be too strong a condition. When the goal is to find a clustering of a single graph, we are not actually interested in the absolute value of a quality function. Rather, what is of interest is the optimal clustering, and which changes to the graph preserve this optimum.

At a smaller scale, we can look at the relation between two clusterings. If $C$ is better than $D$ on a graph $G$, then on what other graphs is $C$ better than $D$? We therefore define a relative version of monotonicity, in the hope that modularity does satisfy this weaker version.

Definition 10 (Relative monotonicity) A quality function $Q$ is relatively monotonic if for all graphs $G$ and $G'$ and clusterings $C$ and $D$, if $G'$ is a $C$-consistent improvement of $G$ and $G$ is a $D$-consistent improvement of $G'$ and $Q(G, C) \ge Q(G, D)$ then $Q(G', C) \ge Q(G', D)$.

Theorem 4 Modularity is not relatively monotonic.

Proof Take the graphs $G$ and $G'$ shown below, and the clusterings $C = \{\{a, b, c\}, \{d\}\}$ and $D = \{\{a\}, \{b\}, \{c, d\}\}$.

[Figure: graphs $G$ and $G'$ on nodes $a$, $b$, $c$, $d$. In both graphs $c$ has a self loop of weight 8 and $d$ has a self loop of weight 1. Nodes $a$ and $b$ are joined by an edge of weight 1 in each direction in $G$, and of weight 2 in each direction in $G'$.]
$G'$ is a $C$-consistent improvement of $G$, because the weight of a within cluster edge is increased. $G$ is a $D$-consistent improvement of $G'$, because the weight of a between cluster edge is decreased. However, $Q_{\mathrm{modularity}}(G, C) = 20/121 > 16/121 = Q_{\mathrm{modularity}}(G, D)$ while $Q_{\mathrm{modularity}}(G', C) = 24/169 < 28/169 = Q_{\mathrm{modularity}}(G', D)$. This counterexample shows that modularity is not relatively monotonic.

6. Adaptive Scale Modularity

The problems with modularity stem from the fact that the total volume can change when changes are made to the graph. It is therefore natural to look at a variant of modularity where the total volume is replaced by a constant $M$,

$$Q_{M\text{-fixed}}(G, C) = \sum_{c \in C} \left( \frac{w_c}{M} - \left( \frac{v_c}{M} \right)^2 \right).$$

This quality function is obviously local. It is also a scale invariant family parameterized by $M$. However, this fixed scale modularity quality function is not scale invariant for any fixed scale $M > 0$.

We might hope that fixed scale modularity would be monotonic, because it doesn't suffer from the problem where changes in the edge weights affect the total volume. Unfortunately, fixed scale modularity has problems when the volume of a cluster starts to exceed $M/2$. In that case, increasing the weight of within cluster edges starts to decrease the fixed scale modularity. Looking at a cluster $c$ with volume $v_c = w_c + b_c$,

$$\frac{\partial Q_{M\text{-fixed}}(G, C)}{\partial w_c} = \frac{1}{M} - \frac{2 v_c}{M^2}. \tag{2}$$

This derivative is negative when $2 v_c > M$, so in that case increasing the weight of a within-cluster edge will decrease the quality. Hence fixed scale modularity is not monotonic.

The above argument also suggests a possible solution: add $2 v_c$ to the normalization factor $M$. Or more generally, add $\gamma v_c$ with $\gamma \ge 2$, which leads to the quality function

$$Q_{M,\gamma}(G, C) = \sum_{c \in C} \left( \frac{w_c}{M + \gamma v_c} - \left( \frac{v_c}{M + \gamma v_c} \right)^2 \right). \tag{3}$$

This adaptive scale modularity quality function is clearly still permutation invariant, continuous and local. For $M = 0$ it is also scale invariant. Since the value of $M$ should scale along with the edge weights, adaptive scale modularity is a scale invariant family parameterized by $M$. Additionally, we have the following two theorems:

Theorem 5 Adaptive scale modularity is rich for all $M \ge 0$ and $\gamma \ge 1$.

Theorem 6 Adaptive scale modularity is monotonic for all $M \ge 0$ and $\gamma \ge 2$.

The proofs of these theorems can be found in Appendices B and C.

Together with the properties above, these theorems show that adaptive scale modularity satisfies all six axioms we have defined for families of graph clustering quality functions, and the six axioms for single quality functions when $M = 0$. Hence our extended set of axioms is consistent.

6.1 Relation to Other Quality Functions

Interestingly, in the limit as $M$ goes to 0, the adaptive scale quality function becomes similar to normalized cut (Shi and Malik, 2000) with an added constant,

$$Q_{0,\gamma}(G, C) = \frac{1}{\gamma} \sum_{c \in C} \left( \frac{w_c}{v_c} - \frac{1}{\gamma} \right).$$

This 0-adaptive modularity is also scale invariant as a single quality function.

Conversely, when $M$ goes to infinity the quality goes to 0. However, the quality function approaches unnormalized cut in behavior:

$$\lim_{M \to \infty} M \cdot Q_{M,\gamma}(G, C) = \sum_{c \in C} w_c.$$

This expression is similar to the constant Potts model (CPM) by Traag et al. (2011),

$$Q_{\mathrm{cpm}}(G, C) = \sum_{c \in C} \left( w_c - \gamma n_c^2 \right). \tag{4}$$

In contrast to the quality functions discussed thus far, CPM uses the number of nodes instead of volume to control the size of clusters. Like adaptive scale modularity, the constant Potts model satisfies all six axioms (as a family).

As stated before, the fixed scale and adaptive scale modularity quality functions are a scale invariant family; they are not scale invariant for a fixed value of $M$ (except for $M = 0$).
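To make equations (1) and (3) concrete, the sketch below evaluates both quality functions with exact rational arithmetic on a small graph. The graph is one reading of the Theorem 3 example that reproduces the modularity values stated there (1/8 and 0): an edge of weight 1 between $a$ and $b$, stored once per direction, and a self loop of weight 2 on $c$. The dictionary representation, the function names, and the parameter choice $M = 1$, $\gamma = 2$ are assumptions made for illustration only.

```python
from fractions import Fraction


def cluster_stats(E, cluster):
    """Return (w_c, v_c): within-cluster weight and volume of one cluster."""
    w = sum(x for (i, j), x in E.items() if i in cluster and j in cluster)
    v = sum(x for (i, j), x in E.items() if i in cluster)
    return w, v


def modularity(E, clustering):
    """Equation (1): sum over clusters of w_c / v_V - (v_c / v_V)^2."""
    total = Fraction(sum(E.values()))  # v_V, the volume of the whole graph
    q = Fraction(0)
    for c in clustering:
        w, v = cluster_stats(E, c)
        q += Fraction(w) / total - (Fraction(v) / total) ** 2
    return q


def adaptive_scale_modularity(E, clustering, M, gamma):
    """Equation (3); requires M + gamma * v_c > 0 for every cluster."""
    q = Fraction(0)
    for c in clustering:
        w, v = cluster_stats(E, c)
        scale = Fraction(M) + Fraction(gamma) * Fraction(v)
        q += Fraction(w) / scale - (Fraction(v) / scale) ** 2
    return q


C = [{"a"}, {"b"}, {"c"}]
G = {("a", "b"): 1, ("b", "a"): 1, ("c", "c"): 2}
G_improved = {("c", "c"): 2}  # between-cluster edge a-b reduced to weight 0

print(modularity(G, C), modularity(G_improved, C))  # 1/8 0
print(adaptive_scale_modularity(G, C, 1, 2),
      adaptive_scale_modularity(G_improved, C, 1, 2))  # 4/225 6/25
```

On this consistent improvement modularity drops from 1/8 to 0, whereas adaptive scale modularity with $\gamma = 2$ rises from 4/225 to 6/25, in line with Theorem 6. This is a single data point, not a substitute for the proof.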
The lack of scale invariance for a fixed $M$ is not a large problem in practice, since scale invariance is often sacrificed to overcome the resolution limit of modularity (Fortunato and Barthélemy, 2007). In fact, fixed scale modularity is proportional to the quality function introduced by Reichardt and Bornholdt (2004),

$$Q_{\text{RB}}(G, C) = \sum_{c \in C} \left( w_c - \gamma_{\text{RB}} \frac{v_c^2}{v_V} \right) = M \cdot Q_{M\text{-fixed}}(G, C),$$

with $M = v_V / \gamma_{\text{RB}}$.

6.2 Parameter Dependence Analysis

There has been a lot of interest in the so-called resolution limit of modularity. This problem can be illustrated with a simple graph that consists of a ring of cliques, where each clique is connected to the next one with a single edge. We would like the clusters in the optimal clustering to correspond to the cliques in the ring. It was observed by Fortunato and Barthélemy (2007) that, as the number of cliques in the ring increases, at some point the clustering with the highest modularity will have multiple cliques per cluster.

This resolution problem stems from the fact that the behavior of modularity depends on the total volume of the graph. Both the fixed scale and adaptive scale modularity quality functions instead have a parameter $M$, and hence do not suffer from this problem. In fact, any local quality function will not have a resolution limit in the sense of Fortunato and Barthélemy. A similar observation was made by Traag et al. (2011) in the context of modularity-like quality functions.

In real situations graphs are not uniform as in the ring-of-cliques model. But we can still take simple uniform problems as a building block for larger and more complex graphs, since for local quality functions the rest of the network doesn't matter. Therefore we will look at a simple problem with two subgraphs of varying sizes connected by a varying number of edges. More precisely, we take two cliques, each with within weight $w$, connected by edges with weight $b$.
The total volume of this (sub)graph is then $2w + 2b$. There are three possible outcomes when clustering such a two-clique network:

(1) the optimal solution has a single cluster;
(2) the optimal solution has two clusters, corresponding to the two cliques;
(3) the optimal solution has more than two clusters, splitting the cliques apart.

See Figure 1 for an illustration. Which of these outcomes is desirable depends on the circumstances.

Another heterogeneous resolution limit model was proposed by Lancichinetti and Fortunato (2011). In this situation there are two cliques of equal size connected by a single edge, and a random subgraph. Now the ideal solution would be to find three clusters, one for each clique and one for the random subgraph. The optimal split of the random subgraph will roughly cut it in half, with a fixed fraction of the volume being between the two clusters (Reichardt and Bornholdt, 2007). So this model can be considered as a combination of two instances of our simpler problem, one for the two cliques and one for the random subgraph (see footnote 1). Hence, we want outcome (2) for the cliques, and outcome (1) for the random subgraph.

[Figure 1: three panels, one per outcome (1), (2), (3), showing the two cliques $x$ and $y$ with within weight $w$ and between weight $b$; in panel (3) the cliques are split into $x_1, x_2$ and $y_1, y_2$.]

Figure 1: An illustration of the possible outcomes when clustering a two-clique network. Clusters are indicated by circles. In outcome (3), the vertical edges each have weight $w/4$, while the horizontal and diagonal ones have weight $b/4$.

In Figure 2 we show which graphs give which outcomes for adaptive scale modularity with various parameter settings. The first column, $\gamma = 0$, is of particular interest, since it corresponds to fixed scale modularity and hence also to $Q_{\text{RB}}$ and to modularity in certain graphs. In the third row we can see that when $2v = 2w + 2b > M = 100$ the cliques are split apart. This is precisely the region in which monotonicity no longer holds.
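The comparison between outcomes (1) and (2) can be sketched numerically. This is an illustrative sketch under stated assumptions, not the procedure used to produce Figure 2: clusters are again hypothetical `(w_c, v_c)` pairs, outcome (1) is modelled as one cluster with $w_c = v_c = 2w + 2b$, outcome (2) as two clusters with $w_c = w$ and $v_c = w + b$, and outcome (3) is deliberately left out. The helper `q_rb` follows the $Q_{\text{RB}}$ formula of Section 6.1, taking $v_V$ as the sum of the cluster volumes.

```python
def q_adaptive(clusters, M, gamma):
    """Adaptive scale modularity for a list of (w_c, v_c) pairs."""
    total = 0.0
    for w_c, v_c in clusters:
        denom = M + gamma * v_c
        total += w_c / denom - (v_c / denom) ** 2
    return total


def q_rb(clusters, gamma_rb):
    """Reichardt-Bornholdt quality; v_V is the sum of all cluster volumes."""
    v_total = sum(v_c for _, v_c in clusters)
    return sum(w_c - gamma_rb * v_c * v_c / v_total for w_c, v_c in clusters)


def single_cluster(w, b):
    # Outcome (1): one cluster holds everything, so no weight leaves it.
    v = 2 * w + 2 * b
    return [(v, v)]


def two_clusters(w, b):
    # Outcome (2): one cluster per clique, within weight w, volume w + b.
    return [(w, w + b), (w, w + b)]


def preferred_outcome(w, b, M, gamma):
    """Return 1 or 2, whichever of the two modelled outcomes scores higher."""
    q1 = q_adaptive(single_cluster(w, b), M, gamma)
    q2 = q_adaptive(two_clusters(w, b), M, gamma)
    return 1 if q1 >= q2 else 2


# With M = 0 and gamma = 2, the two cliques stay separate only if the
# within fraction w / (w + b) exceeds (1 + 1/gamma) / 2 = 0.75.
well_separated = preferred_outcome(30, 5, M=0, gamma=2)
weakly_separated = preferred_outcome(10, 10, M=0, gamma=2)

# gamma = 0 is fixed scale modularity, and Q_RB = M * Q_{M-fixed}
# when M = v_V / gamma_RB (here v_V = 70 and gamma_RB = 1).
rb_value = q_rb(two_clusters(30, 5), gamma_rb=1.0)
fixed_value = 70 * q_adaptive(two_clusters(30, 5), M=70, gamma=0)
```

For $M = 0$ the threshold on $w/(w+b)$ follows directly from equating the two scores, which is one way to read the role of $\gamma$ as a required within-cluster fraction.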
Overall, the parameter $M$ has the effect of determining the scale; each row in this figure is merely the previous row magnified by a factor 10. Increasing $M$ has the effect of merging small clusters. On the other hand, the $\gamma$ parameter controls the slope of the boundary between outcomes (1) and (2), i.e. the fraction of edges that should be within a cluster. This is most clearly seen when $M = 0$, while otherwise the effect of $M$ dominates for small clusters.

[Figure 2: a grid of plots with rows $M = 0, 10, 100, 1000$ and columns $\gamma = 0, 1, 2, 10$; the horizontal axis is $w$ and the vertical axis is $b$, each ranging from 0 to 50; regions are labeled 1, 2 and 3.]

Figure 2: The behavior of $Q_{M,\gamma}$ for varying parameter values. The graph consists of two subgraphs with $w$ internal weight each, connected by an edge with weight $b$. Hence the volume of the total graph is $2w + 2b$. In region (1) the optimal clustering has a single cluster. In region (2) (light blue) the optimal clustering separates the subgraphs. In region (3) (red, hatched) the subgraphs themselves will be split apart.

Footnote 1. Lancichinetti and Fortunato include edges between the cliques and the random subgraph to ensure that the entire network is connected; these edges are not relevant to the problem.

7. Conclusion and Open Questions

In this paper we presented an axiomatic framework for graph clustering quality functions consisting of six properties. We showed that modularity does not satisfy the monotonicity property. This motivated the derivation of a new family of quality functions, adaptive scale modularity, that satisfies all properties and has standard graph clustering quality functions as special cases. Results of an experimental parameter dependence analysis showed the high flexibility of adaptive scale modularity. However, adaptive scale modularity should not be considered the solution to all the problems of modularity, but rather an example of how axioms can be used in practice.

An overview of the discussed axioms and quality functions can be found in Table 1. Many more quality functions have been proposed in the literature, so this list is by no means exhaustive. An interesting topic for future research is to make a survey of which existing quality functions satisfy which of the proposed properties.

We also investigated resolution-limit-free quality functions as defined in (Traag et al., 2011). As illustrated in Section 6.2, adaptive scale modularity allows clustering at various resolutions, by varying the values of its two parameters. However, it is not resolution-limit-free.

Our paper did not address questions such as finding a best quality function (Almeida, Guedes, Jr., and Zaki, 2011), or selecting a significant resolution scale (Traag et al., 2013). The aim was to provide necessary conditions about what a good quality function is, in order to rule out and/or to improve quality functions. The proposed axioms and the introduction of adaptive scale modularity are an effort in this direction.

We also did not address the question of finding a clustering with the highest quality.
Finding the optimal value of quality functions such as modularity is NP-hard (Brandes et al., 2008), but several heuristic and approximation algorithms have been developed. One class of algorithms uses a divisive approach, see for instance Newman (2006); Ruan and Zhang (2008). For such a tactic to be valid, an optimal or close to optimal clustering of a subgraph should also be a near optimal clustering of the entire graph. This is ensured by locality. Recently Dinh and Thai (2013) proposed polynomial-time approximation algorithms for modularity maximization in the context of scale-free networks. It would be interesting to investigate the suitability of these algorithms for adaptive scale modularity maximization.

| Quality function | Permutation invariance | Scale invariance | Scale invariance (family) | Richness | Monotonicity | Locality | Continuity |
|---|---|---|---|---|---|---|---|
| Connected components | ✓ | ✓ | n.a. | ✓ | ✓ | ✓ | − |
| Modularity | ✓ | ✓ | n.a. | ✓ | − | − | ✓ |
| Reichardt and Bornholdt (2004) | ✓ | ✓ | ✓ | ✓ | − | − | ✓ |
| Fixed scale modularity | ✓ | $M = 0$ | ✓ | ✓ | − | ✓ | ✓ |
| Adaptive scale modularity | ✓ | $M = 0$ | ✓ | $\gamma \ge 1$ | $\gamma \ge 2$ | ✓ | ✓ |
| Constant Potts Model (Traag et al., 2011) | ✓ | − | ✓ | $\gamma > 0$ | ✓ | ✓ | ✓ |
| Normalized cut | ✓ | ✓ | n.a. | − | ✓ | ✓ | ✓ |

Table 1: Overview of quality functions discussed in this paper and the properties they satisfy.

In this work we have only looked at non-negative weights, undirected graphs, and only at hard partitioning. An extension to graphs with negative weights, to directed graphs and to overlapping clusters remains to be investigated.

Another open problem is how to use these axioms for reasoning about quality functions and clustering algorithms.

Acknowledgments

We thank the reviewers for their comments. This work has been partially funded by the Netherlands Organization for Scientific Research (NWO) within the NWO project 612.066.927.

Appendix A. Proof of Theorem 1 (Modularity is Rich)

The proofs of richness rely on clique graphs.

Definition 11 (Clique graph). Let $V$ be a set of nodes, $C$ be a partition of $V$, and $k$ be a positive constant. The clique graph of $C$ with edge weight $k$ is defined as $G = (V, E)$ where $E(i, j) = k$ if $i \sim_C j$ and $E(i, j) = 0$ otherwise.

Proof. Let $V$ be a set of nodes and $C \neq \{V\}$ be a clustering of $V$. Let $G = (V, E)$ be a clique graph of $C$ with edge weight 1. Note that $E(i, i) = 1$, so any possible cluster will have a positive volume. Let $D$ be a clustering of $G$ with maximal modularity.

Suppose that there is a cluster $d \in D$ that contains $i, j \in d$ with $i \not\sim_C j$. Then we can split the cluster into $d_1 = \{k \in d \mid k \sim_C i\}$ and $d_2 = \{k \in d \mid k \not\sim_C i\}$. Because there are no edges between nodes in $d_1$ and nodes in $d_2$, it is the case that $w_d = w_{d_1} + w_{d_2}$. Both $d_1$ and $d_2$ are non-empty and have a positive volume, so $v_d^2 = (v_{d_1} + v_{d_2})^2 > v_{d_1}^2 + v_{d_2}^2$. Therefore $Q_{\text{modularity}}(G, D) < Q_{\text{modularity}}(G, (D \setminus \{d\}) \cup \{d_1, d_2\})$. So $D$ does not have maximal modularity, which is a contradiction.

Suppose, on the other hand, that all clusters $d \in D$ are a subset of some cluster in $C$, i.e. $D$ is a refinement of $C$. Then either $D = C$, or there are two clusters $d_1, d_2 \in D$ that are both a subset of the same cluster $c \in C$. In the latter case we can combine the two clusters into $d = d_1 \cup d_2$. The within weight of this combined cluster is $w_d = |d|^2 = w_{d_1} + w_{d_2} + 2 |d_1| |d_2|$. The squared volume of the combined cluster is $v_d^2 = |d|^2 |c|^2 = v_{d_1}^2 + v_{d_2}^2 + 2 |d_1| |d_2| |c|^2$.
So this change increases the modularity by

$$Q_{\text{modularity}}(G, (D \setminus \{d_1, d_2\}) \cup \{d\}) - Q_{\text{modularity}}(G, D) = \frac{2 |d_1| |d_2|}{v_V} - \frac{2 |d_1| |d_2| |c|^2}{v_V^2} = \frac{2 |d_1| |d_2| \left( v_V - |c|^2 \right)}{v_V^2} > 0,$$

which contradicts the assumption that $D$ has maximal modularity. Therefore the only optimal clustering of $G$ is $C$. Note that the above inequality only holds when $|c|^2 = v_c < v_V$, which is the case because $C \neq \{V\}$.

When $C = \{V\}$, a clique graph will not work, because both $\{V\}$ and the clustering that assigns half the nodes to one cluster and half to another have modularity equal to 0. In this case, instead define $G = (V, E)$ by $E(i, j) = 1$ if $i \neq j$ and $0$ if $i = j$. Then the modularity for $C$ is $Q_{\text{modularity}}(G, \{V\}) = 0$. Any cluster $d$ in a clustering $D$ will have $v_d = |d| (|V| - 1)$ and $w_d = |d| (|d| - 1)$. Therefore the contribution of this cluster to the total quality is $-|d| (|V| - |d|) / (|V|^2 (|V| - 1))$, which is negative when $|d| < |V|$. So the modularity of any clustering other than $\{V\}$ will be negative, hence $\{V\}$ is the only optimal clustering.

Since for every $C$ we can construct a graph where $C$ is the only optimal clustering, modularity is rich.

Appendix B. Proof of Theorem 5 (Adaptive Scale Modularity is Rich)

Denote by $f_C(d)$ the largest fraction of any cluster from $C$ that is contained in a cluster $d$,

$$f_C(d) = \max_{c \in C} \frac{|c \cap d|}{|c|}.$$

For any clustering $D$ we have that

$$\sum_{d \in D} f_C(d) = \sum_{d \in D} \max_{c \in C} \frac{|c \cap d|}{|c|} \le \sum_{d \in D} \sum_{c \in C} \frac{|c \cap d|}{|c|} = |C|. \tag{5}$$

And since $f_C(d) \le 1$ for all clusters $d$, we also have that

$$\sum_{d \in D} f_C(d) \le |D|. \tag{6}$$

Lemma 7. For a clique graph of $C$ it is the case that $w_d / v_d \le f_C(d)$.
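As a small numeric illustration of $f_C$ and of the inequality in Lemma 7 (an example, not a substitute for the proof), the sketch below builds the clique graph of a hypothetical partition and evaluates both sides for one cluster $d$. The helper names, the node labels and the weight $k = 3$ are assumptions made for this example; volume and within weight are computed as sums of $E(i, j)$ over ordered pairs, including self loops.

```python
from fractions import Fraction


def clique_graph(C, k):
    """Edge weight function of the clique graph of partition C with weight k."""
    label = {node: idx for idx, c in enumerate(C) for node in c}

    def E(i, j):
        # Weight k inside a cluster of C (self loops included), 0 otherwise.
        return k if label[i] == label[j] else 0

    return E


def volume(d, nodes, E):
    """v_d: total weight of edges with one endpoint in d."""
    return sum(E(i, j) for i in d for j in nodes)


def within(d, E):
    """w_d: total weight of edges with both endpoints in d."""
    return sum(E(i, j) for i in d for j in d)


def f_C(C, d):
    """Largest fraction of any cluster of C that is contained in d."""
    return max(Fraction(len(c & d), len(c)) for c in C)


# Hypothetical partition C of five nodes, and a cluster d that cuts across it.
C = [frozenset({0, 1, 2}), frozenset({3, 4})]
nodes = frozenset().union(*C)
d = frozenset({0, 1, 3})

E = clique_graph(C, k=3)
w_d = within(d, E)           # 3 * (2**2 + 1**2) = 15
v_d = volume(d, nodes, E)    # 3 * (2*3 + 1*2) = 24
ratio = Fraction(w_d, v_d)   # 5/8
bound = f_C(C, d)            # max(2/3, 1/2) = 2/3
```

Exact rational arithmetic via `Fraction` avoids rounding issues when comparing the ratio with the bound.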
Proof. Given a cluster $d$ and a clique graph $G$ of $C$ with weight $k > 0$, the volume of $d$ is

$$v_d = \sum_{c \in C} k \, |c \cap d| \, |c|,$$

and the within-cluster weight is

$$w_d = \sum_{c \in C} k \, |c \cap d|^2.$$

Therefore

$$w_d \le \sum_{c \in C} k \, |c \cap d| \, |c| \, f_C(d) = v_d \, f_C(d),$$

and hence $w_d / v_d \le f_C(d)$.

Lemma 8. Let $G$ be the clique graph of a clustering $C$ with weight $k$, and let $0 < \beta < 1$ be a constant. Then $\sum_{d \in D} (w_d / v_d - \beta) = (1 - \beta) |C|$ if $D = C$, while $\sum_{d \in D} (w_d / v_d - \beta) < (1 - \beta) |C| - \epsilon$ if $D \neq C$, where $\epsilon = \min(\beta, 1 - \beta, 1/|V|) / 2$.

Proof. Suppose that $D = C$. Then for every cluster $c \in C$, $w_c = v_c = k |c|^2$, and so

$$\sum_{c \in C} \left( \frac{w_c}{v_c} - \beta \right) = (1 - \beta) |C|.$$

Otherwise, $D \neq C$. Assume that $\sum_{d \in D} (w_d / v_d - \beta) \ge (1 - \beta) |C| - \epsilon$. By Lemma 7,

$$|C| - \beta (|C| + 1) < |C| - \beta |C| - \epsilon \le \sum_{d \in D} \left( \frac{w_d}{v_d} - \beta \right) \le \sum_{d \in D} \left( f_C(d) - \beta \right) \le |C| - \beta |D|.$$

Since $\beta > 0$, this implies that $|D| < |C| + 1$.

Additionally, since $f_C(d) \le 1$ for all clusters $d \in D$,

$$(1 - \beta)(|C| - 1) < (1 - \beta) |C| - \epsilon \le \sum_{d \in D} \left( f_C(d) - \beta \right) \le (1 - \beta) |D|.$$

Since $\beta < 1$, this implies that $|D| > |C| - 1$. Hence $|D| = |C|$.

Suppose that $f_C(d) < 1$ for some $d \in D$, which implies that $|c \cap d| < |c|$. Because edges are discrete, this can only happen when $|c \cap d| \le |c| - 1$ for all clusters $c$. And the size of clusters is bounded by $|c| \le |V|$. Hence $f_C(d) \le (|V| - 1) / |V| = 1 - 1/|V|$. And since for all other clusters $d'$, $f_C(d') \le 1$, we then have

$$\sum_{d \in D} \left( f_C(d) - \beta \right) \le (1 - \beta) |D| - 1/|V| < (1 - \beta) |C| - \epsilon \le \sum_{d \in D} \left( w_d / v_d - \beta \right) \le \sum_{d \in D} \left( f_C(d) - \beta \right),$$

which is a contradiction. Hence, it must be the case that $f_C(d) = 1$ for all clusters $d \in D$. By the definition of $f_C$ this means that for every $d$ there is a cluster $c \in C$ such that $|c \cap d| = |c|$, and therefore $c \subseteq d$.
Since the clusters are disjoint and $|D| = |C|$, this implies that $D = C$, which is a contradiction. So $\sum_{d \in D} (w_d / v_d - \beta) < (1 - \beta) |C| - \epsilon$.

When $M = 0$, the adaptive scale modularity reduces to $\sum_{d \in D} w_d / (\gamma v_d) - |D| / \gamma^2$, and the above lemma is enough to prove richness. For non-zero values of $M$, we can get 'close enough' by choosing large enough edge weights. This is formalized in the following lemma.

Lemma 9. Let $d$ be a cluster in a clustering of a clique graph of $C$ with weight $k$. Then

$$\frac{w_d}{v_d} - \beta - \beta M / k \le q(d) / \beta \le \frac{w_d}{v_d} - \beta + 2 \beta^2 M / k,$$

where

$$q(d) = \frac{w_d}{M + v_d / \beta} - \left( \frac{v_d}{M + v_d / \beta} \right)^2$$

denotes the contribution of $d$ to the $M$-adaptive modularity.

Proof. Since clusters are non-empty, and in a clique graph $E(i, i) = k$, it follows that $v_d \ge w_d \ge k$. So

$$
\begin{aligned}
q(d) / \beta &= \frac{\beta M w_d + v_d w_d - \beta v_d^2}{(\beta M + v_d)^2} \\
&= \frac{w_d}{v_d} - \beta + \frac{\beta^2 M (\beta M + 2 v_d) - \beta^2 M^2 w_d / v_d - \beta M w_d}{(\beta M + v_d)^2} \\
&\le \frac{w_d}{v_d} - \beta + \frac{\beta^2 M (\beta M + 2 v_d)}{(\beta M + v_d)^2} \\
&\le \frac{w_d}{v_d} - \beta + \frac{2 \beta^2 M (\beta M + 2 v_d)}{(\beta M + v_d)(\beta M + 2 v_d)} \\
&= \frac{w_d}{v_d} - \beta + \frac{2 \beta^2 M}{\beta M + v_d} \\
&\le \frac{w_d}{v_d} - \beta + \frac{2 \beta^2 M}{k}.
\end{aligned}
$$

And since $w_d \le v_d$,

$$
\begin{aligned}
q(d) / \beta &= \frac{w_d}{v_d} - \beta + \frac{\beta^2 M (\beta M + 2 v_d) - \beta^2 M^2 w_d / v_d - \beta M w_d}{(\beta M + v_d)^2} \\
&\ge \frac{w_d}{v_d} - \beta - \frac{\beta^2 M^2 + \beta M v_d}{(\beta M + v_d)^2} \\
&= \frac{w_d}{v_d} - \beta - \frac{\beta M}{\beta M + v_d} \\
&\ge \frac{w_d}{v_d} - \beta - \frac{\beta M}{k}.
\end{aligned}
$$

Combining these lemmas yields the proof of the general theorem:

Proof. Given a clustering $C$, define $\beta = 1 / \gamma$. If $\gamma > 1$ then $0 < \beta < 1$. Pick $k > 3 |V| \beta^2 M / \epsilon$ where $\epsilon$ is defined as in Lemma 8. Let $G$ be the clique graph of $C$ with weight $k$. Let $D \neq C$ be a clustering of $G$.
Then by Lemmas 8 and 9,

$$
\begin{aligned}
Q_{M,\gamma}(G, D) / \beta = \sum_{d \in D} q(d) / \beta &\le \sum_{d \in D} \left( w_d / v_d - \beta + 2 \beta^3 M / k \right) \\
&\le (1 - \beta) |C| + 2 |D| \beta^3 M / k - \epsilon \\
&\le (1 - \beta) |C| + 2 |V| \beta^2 M / k - \epsilon \\
&< (1 - \beta) |C| - |V| \beta^2 M / k \\
&\le (1 - \beta) |C| - |C| \beta^2 M / k \\
&= \sum_{c \in C} \left( w_c / v_c - \beta - \beta^2 M / k \right) \\
&\le Q_{M,\gamma}(G, C) / \beta.
\end{aligned}
$$

Hence the quality is maximal for $C$. Since there is a clique graph and $k$ for every clustering, adaptive scale modularity is rich.

Appendix C. Proof of Theorem 6 (Adaptive Scale Modularity is Monotonic)

Proof. Given constants $M > 0$ and $\gamma \ge 2$, a graph $G$ and a clustering $C$ of $G$, let $c \in C$ be any cluster. Writing the volume of $c$ as $v_c = w_c + b_c$, the contribution of this cluster to the quality of $G$ is $q(w_c, b_c)$ where

$$q(w, b) = \frac{w}{M + \gamma w + \gamma b} - \left( \frac{w + b}{M + \gamma w + \gamma b} \right)^2.$$

The partial derivatives of $q$ are

$$\frac{\partial q(w, b)}{\partial w} = \frac{M^2 + (\gamma - 2) M (w + b) + \gamma b (M + \gamma w + \gamma b)}{(M + \gamma w + \gamma b)^3} \ge 0,$$

$$\frac{\partial q(w, b)}{\partial b} = - \frac{\gamma w M + (w + b)(2 M + \gamma^2 w)}{(M + \gamma w + \gamma b)^3} \le 0.$$

This means that $q$ is a monotonically non-decreasing function in $w$ and a non-increasing function in $b$.

For any graph $G'$ that is a $C$-consistent improvement of $G$, it holds that $w'_c \ge w_c$ and $b'_c \le b_c$. So $q(w'_c, b'_c) \ge q(w_c, b_c)$. And therefore $Q_{M,\gamma}(G', C) \ge Q_{M,\gamma}(G, C)$. So adaptive scale modularity is monotonic.

References

M. Ackerman, S. Ben-David, D. Loker, and S. Sabato. Clustering oligarchies. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), volume 31 of JMLR Workshop and Conference Proceedings, pages 66–74, 2013.

Margareta Ackerman and Shai Ben-David. Measures of clustering quality: A working set of axioms for clustering. In Daphne Koller, Dale Schuurmans, Yoshua Bengio, and Léon Bottou, editors, NIPS, pages 121–128. Curran Associates, Inc., 2008.

Margareta Ackerman and Shai Ben-David. A characterization of linkage-based hierarchical clustering. Journal of Machine Learning Research, 2013.

Margareta Ackerman, Shai Ben-David, and David Loker. Towards property-based classification of clustering paradigms. In John D. Lafferty, Christopher K. I. Williams, John Shawe-Taylor, Richard S. Zemel, and Aron Culotta, editors, NIPS, pages 10–18. Curran Associates, Inc., 2010a.

Margareta Ackerman, Shai Ben-David, and David Loker. Characterization of linkage-based clustering, 2010b.

Margareta Ackerman, Shai Ben-David, Simina Brânzei, and David Loker. Weighted clustering. In Jörg Hoffmann and Bart Selman, editors, AAAI. AAAI Press, 2012.

Helio Almeida, Dorgival Guedes, Wagner Meira Jr., and Mohammed J. Zaki. Is there a best quality metric for graph clusters? In Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, and Michalis Vazirgiannis, editors, Machine Learning and Knowledge Discovery in Databases, volume 6911 of Lecture Notes in Computer Science, pages 44–59. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-23779-9.

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp., 2008(10):P10008, 2008. ISSN 1742-5468. doi: 10.1088/1742-5468/2008/10/P10008. URL http://dx.doi.org/10.1088/1742-5468/2008/10/P10008.

Béla Bollobás. The Evolution of Random Graphs – the Giant Component, pages 130–159. Cambridge University Press, 2001. ISBN 9780521797221.

Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Gorke, Martin Hoefer, Zoran Nikoloski, and Dorothea Wagner. On modularity clustering. IEEE Transactions on Knowledge and Data Engineering, 20(2):172–188, 2008. ISSN 1041-4347. doi: 10.1109/TKDE.2007.190689.

Sébastien Bubeck and Ulrike von Luxburg. Nearest neighbor clustering: A baseline method for consistent clustering with arbitrary objective functions. J. Mach. Learn. Res., 10:657–698, June 2009. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1577069.1577092.

Gunnar Carlsson, Facundo Mémoli, Alejandro Ribeiro, and Santiago Segarra. Axiomatic construction of hierarchical clustering in asymmetric networks. CoRR, abs/1301.7724, 2013.

Thang N. Dinh and My T. Thai. Community detection in scale-free networks: Approximation algorithms for maximizing modularity. IEEE Journal on Selected Areas in Communications, 31(6):997–1006, 2013.

Santo Fortunato and Marc Barthélemy. Resolution limit in community detection. Proc. Natl. Acad. Sci. USA, 104(1):36–41, 2007. doi: 10.1073/pnas.0605965104.

Sreenivas Gollapudi and Aneesh Sharma. An axiomatic approach for result diversification. In Proceedings of the 18th international conference on World wide web, pages 381–390, 2009.

Benjamin H. Good, Yves A. de Montjoye, and Aaron Clauset. Performance of modularity maximization in practical contexts. Phys. Rev. E, 81(4):046106, April 2010. doi: 10.1103/PhysRevE.81.046106. URL http://dx.doi.org/10.1103/PhysRevE.81.046106.

Jon M. Kleinberg. An impossibility theorem for clustering. In Suzanna Becker, Sebastian Thrun, and Klaus Obermayer, editors, NIPS, pages 446–453. MIT Press, 2002. ISBN 0-262-02550-7.

Andrea Lancichinetti and Santo Fortunato. Limits of modularity maximization in community detection. Phys. Rev. E, 84:066122, December 2011. doi: 10.1103/PhysRevE.84.066122. URL http://dx.doi.org/10.1103/PhysRevE.84.066122.

Marina Meila. Comparing clusterings: an axiomatic view. In Proceedings of the 22nd international conference on Machine learning, pages 577–584. ACM, 2005.

Mark E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 74(3):036104, July 2006. doi: 10.1103/PhysRevE.74.036104. URL http://dx.doi.org/10.1103/PhysRevE.74.036104.

Mark E. J. Newman and Michelle Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69:026113, Feb 2004. doi: 10.1103/PhysRevE.69.026113. URL http://pre.aps.org/abstract/PRE/v69/i2/e026113.

Jan Puzicha, Thomas Hofmann, and Joachim M. Buhmann. A theory of proximity based clustering: Structure detection by optimization. Pattern Recognition, 33:617–634, 1999.

Jörg Reichardt and Stefan Bornholdt. Detecting fuzzy community structures in complex networks with a Potts model. Phys. Rev. Lett., 93:218701, 2004. doi: 10.1103/PhysRevLett.93.218701.

Jörg Reichardt and Stefan Bornholdt. Statistical mechanics of community detection. Physical Review E, 74(1):016110, 2006.

Jörg Reichardt and Stefan Bornholdt. Partitioning and modularity of graphs with arbitrary degree distribution. Phys. Rev. E, 76:015102, Jul 2007. doi: 10.1103/PhysRevE.76.015102. URL http://link.aps.org/doi/10.1103/PhysRevE.76.015102.

Jianhua Ruan and Weixiong Zhang. Identifying network communities with a high resolution. Phys. Rev. E, 77:016104, Jan 2008. doi: 10.1103/PhysRevE.77.016104. URL http://link.aps.org/doi/10.1103/PhysRevE.77.016104.

Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. volume 22, pages 888–905, Washington, DC, USA, August 2000. IEEE Computer Society. doi: 10.1109/34.868688. URL http://dx.doi.org/10.1109/34.868688.

Vincent A. Traag, Paul Van Dooren, and Yurii E. Nesterov. Narrow scope for resolution-limit-free community detection. Phys. Rev. E, 84:016114, Jul 2011. doi: 10.1103/PhysRevE.84.016114. URL http://link.aps.org/doi/10.1103/PhysRevE.84.016114.

Vincent A. Traag, Gautier Krings, and Paul Van Dooren. Significant scales in community structure. Submitted, Jun 2013.

Twan van Laarhoven and Elena Marchiori. Graph clustering with local search optimization: The resolution bias of the objective function matters most. Phys. Rev. E, 87:012812, Jan 2013. doi: 10.1103/PhysRevE.87.012812. URL http://link.aps.org/doi/10.1103/PhysRevE.87.012812.

Reza Bosagh Zadeh and Shai Ben-David. A uniqueness theorem for clustering. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI '09, pages 639–646, Arlington, Virginia, United States, 2009. AUAI Press. ISBN 978-0-9749039-5-8. URL http://dl.acm.org/citation.cfm?id=1795114.1795189.
