Detecting Overlapping Temporal Community Structure in Time-Evolving Networks
We present a principled approach for detecting overlapping temporal community structure in dynamic networks. Our method is based on the following framework: find the overlapping temporal community structure that maximizes a quality function associate…
Authors: Yudong Chen, Vikas Kawadia, Rahul Urgaonkar
Yudong Chen∗, Vikas Kawadia† and Rahul Urgaonkar†
∗The University of Texas at Austin, ydchen@utexas.edu
†Raytheon BBN Technologies, {vkawadia,rahul}@bbn.com

Abstract: We present a principled approach for detecting overlapping temporal community structure in dynamic networks. Our method is based on the following framework: find the overlapping temporal community structure that maximizes a quality function associated with each snapshot of the network subject to a temporal smoothness constraint. A novel quality function and a smoothness constraint are proposed to handle overlaps, and a new convex relaxation is used to solve the resulting combinatorial optimization problem. We provide theoretical guarantees as well as experimental results that reveal community structure in real and synthetic networks. Our main insight is that certain structures can be identified only when temporal correlation is considered and when communities are allowed to overlap. In general, discovering such overlapping temporal community structure can enhance our understanding of real-world complex networks by revealing the underlying stability behind their seemingly chaotic evolution.

I. INTRODUCTION

Communities are densely connected groups of nodes in a network. Community detection, which attempts to identify such communities, is a fundamental primitive in the analysis of complex networked systems that span multiple disciplines in network science such as biological networks, online social networks, epidemic networks, communication networks, etc. It serves as an important tool for understanding the underlying, often latent, structure of networks and has a wide range of applications: user profiling for online marketing, computer virus spread and spam detection, understanding protein-protein interactions, content caching, to name a few.
The concept of communities has been generalized to overlapping communities, which allow nodes to belong to multiple communities at the same time. This has been shown to reveal the latent structure at multiple levels of hierarchy.

Community detection in static networks has been studied extensively (see [1] for a comprehensive survey), but has primarily been applied to social networks and information networks. Applications to communication networks have been few. Perhaps this is because communication networks (and in particular wireless networks) change at a much faster timescale than social and information networks, and the science of community detection in time-varying networks is still developing. In this paper, we hope to narrow this gap by providing efficient techniques for detecting communities in networks that vary over time while allowing such communities to overlap, as we elaborate below.

Fig. 1. A schematic illustrating the various notions of community structure in networks. Panel a) shows a typical community structure in an example network. If one uses a quality function and methods that look for overlapping community structure on the same network, one could find a structure shown in panel b). When the network is time-varying, we illustrate the temporal community structure by showing the communities that a node belongs to over time. The top panels are single snapshots of the network evolution in the bottom panels. The (non-overlapping) temporal community structure in panel c) reveals how communities change with time. The overlapping temporal community structure shown in panel d) can uncover deeper hidden patterns such as small communities persistent over time (shown in blue).

Temporal community detection [2]–[5] aims to identify how communities emerge, grow, combine, and decay over time. This is useful in practice because most networks of interest, particularly communication networks, are time-varying.
Typically, temporal community detection enforces some continuity or “smoothness” with the past community structure as the network evolves. While one could apply static community detection independently in each snapshot, this fails to discover small, yet persistent communities because, without the smoothness constraints, these structures would be buried in noise and thus be unidentifiable to static methods.

In this paper, we propose a principled framework that goes beyond regular temporal communities and incorporates the concept of overlapping temporal communities (OTC). In this formulation, nodes can belong to multiple communities at any given time and those communities can persist over time as well. This allows us to detect even more subtle persistent structure that would otherwise be subsumed into larger communities. We illustrate the various notions of community structure in Fig. 1. As we will demonstrate in Section V using both synthetic and real-world data, our framework is able to correctly identify such community structure.

Knowing the OTC structure of an evolving network is useful because, although the observed network may change rapidly, its latent structure often evolves much more slowly. For example, contact-based social networks might change from day to day due to people’s varying daily activities, but the social groups (e.g. family, friends, colleagues) that people belong to are much more stable. Identifying such latent time-persistent structure can reveal the fundamental rules governing the seemingly chaotic evolution of real-life complex networks. In addition, knowledge of the times when significant changes occur could be used for predicting network evolution.

The OTC structure of networks has many applications to designing communication networks as well. For example, it can be used in efficient distributed storage of information in a wireless network so that average access latency is minimized [6].
The OTC structure can also be used to select relay nodes and design routing schemes in disruption-tolerant wireless networks. Another application is to devise real-life mobility models for analysis and evaluation of network protocols. We elaborate on these applications in Section VI.

Our approach: We describe the key ideas behind our techniques. A naive approach to temporal community detection is to perform static community detection independently in each snapshot. The limitations of this approach are well-documented [5], as it is very sensitive to even minor changes in the network. The approach can be extended to detect OTC structure as well but has the same limitations. Past work, including [2] and [5], argues that temporal communities can be detected if an explicit smoothness constraint that captures the distance to past partitions is enforced. With the smoothness constraint, it is possible to go beyond static methods and detect small persistent communities, as information at multiple snapshots is considered together.

In this paper, we build on the above intuition and propose an approach for detecting OTC structure using temporal information. Our approach is a novel convex relaxation of the following combinatorial problem: find the temporal community structure that maximizes a quality function associated with each snapshot subject to a temporal smoothness constraint. To handle overlapping communities, we use generalizations of the quality function proposed in [7] and the temporal smoothness measure proposed in [5]. While the quality function favors densely connected groups, the smoothness constraint promotes persistent structure. Our formulation is fairly general and allows other quality functions and smoothness metrics to be used. Further, it is naturally applicable to overlapping temporal communities and does not require any ad hoc modifications.
Unlike most existing approaches that use greedy heuristics to solve the resulting NP-hard problem of optimizing over the combinatorial set of all partitions (or covers when overlaps are allowed), we use a tight convex relaxation of this set via the trace norm. This not only results in a convex optimization problem that can be solved efficiently using existing techniques, but also enables us to obtain a priori guarantees on the performance of our method, and provides valuable insight. In particular, our analysis shows that, under a natural generative model, our method is able to recover communities that persist for $m$ snapshots and have size $K \ge \sqrt{n/m}$, where $n$ is the number of nodes. This result highlights the benefit of utilizing temporal information: with more snapshots, we are able to detect smaller communities. We believe this specific relation is novel, and applies beyond the particular methods in this paper. We note that our approach can detect non-overlapping temporal communities as well.

To summarize, we provide the first principled formulation of the problem of detecting overlapping temporal communities. A critical piece of the formulation is the quality function for quantifying the community structure in any snapshot, and a distance function to ensure contiguity with the past community structure. To the best of our knowledge, we are the first to propose such functions for overlapping communities. We provide a convex relaxation and hence an efficient way to solve the optimization problem, while most existing work relies on greedy heuristics. In addition, we provide theoretical guarantees on the performance of our method, and we discuss the insights we gain from the guarantees. Finally, we evaluate our method using several synthetic and real network datasets and illustrate its efficacy. We also discuss applications of our method to communication networks.
Remark on terminology: In the sequel, we use cluster and community interchangeably, both allowing overlaps.

II. RELATED WORK

There is a long line of work on community detection, which has been comprehensively surveyed in [1]. Here, we focus on work that is most relevant to our approach. In particular, [8] presents a convex formulation for optimizing modularity [9], a well-known quality function for static non-overlapping communities. Our convex formulation is completely different, and ours allows overlapping and temporal communities. Our formulation is related to low-rank matrix recovery techniques [10], [11]. This line of work typically uses the trace norm as a convex surrogate of the non-convex rank function, and similarly the $\ell_1$ norm for sparsity. In this paper, we use the trace norm as a relaxation for the set of covers, a combinatorial and non-convex set, and the (weighted) $\ell_1$ norm as the quality function. A similar relaxation for static clustering without overlaps is considered in [12]. Our approach for dealing with overlaps is very different from existing work; for a survey we refer to [13] and section XI of [1]. In the rest of this section we focus on an overview of temporal clustering.

Most existing work on temporal clustering can be divided into two categories: 1) maximize a quality function subject to smoothness constraints; 2) slightly modify the clustering structure from the previous snapshot.

The first approach starts with [2], which proposes the framework of Evolutionary Clustering that aims to optimize a combination of a snapshot quality and a temporal smoothness cost. The work in [5] argues that a specific choice of the temporal cost, namely estrangement, works well. In [3] the authors use the KL-divergence in both the snapshot quality and temporal cost, and reformulate the problem using non-negative matrix factorization in order to obtain soft clusterings.
In [4] a particle-density based method is proposed to optimize the clustering objective. All of these works use greedy algorithms to solve the optimization problem, which only guarantee convergence to a local optimum. Our formulation is similar to the Evolutionary Clustering framework, but we are able to use convex optimization via a reasonable relaxation.

The second class of methods typically works as follows: each time the network changes, they modify the clustering structure to reflect the change according to some predefined rules. Smoothness is maintained since the modification would not change the clustering too much. For example, [14] proposes the iLCD (intrinsic Longitudinal Community Detection) algorithm, which updates, merges, and creates communities based on the previous clustering; overlap is allowed. [15] adopts a similar idea, but does not allow a node to belong to multiple clusters, while follow-up work [16] removes this restriction. However, all of these works use update rules that are based on heuristics; some of them might produce duplicates or very small communities and need to use ad hoc procedures to remove them. Unlike our work, they do not provide any analytical guarantees.

Other existing approaches include [17]–[19], which use objective functions that essentially measure both snapshot quality and temporal smoothness. Also, [20] proposes a method to detect communities in multi-dimensional networks. None of these, however, detects overlapping communities and their evolution over time simultaneously.

III. FORMULATION AND ALGORITHM

In this section, we formulate the problem and describe our algorithm. We consider the following natural formulation of OTC detection. Suppose we are given $T$ snapshots of a network with $n$ nodes in terms of adjacency matrices $A^t$, $t = 1, \ldots, T$.¹
Our general goal is, at each time $t$, to assign each node to a number of clusters so that the edge densities within clusters are higher than those across clusters, and that the assignment does not change rapidly with time. Note that each node might be assigned to multiple clusters, and clusters can overlap. A node might also be associated with no cluster; these nodes are called outliers, and are common in real networks.

Mathematically, let $r_t$ be the total number of communities at time $t$. The value of $r_t$ is, of course, not known a priori. We would like to find $T$ covers with outliers, where a cover $\mathcal{C}^t$ with outliers means a collection of $r_t$ subsets $\mathcal{C}^t = \{C^t_k, k = 1, \ldots, r_t\}$ with $C^t_k \subseteq \{1, \ldots, n\}$; again note that we allow outlier nodes that do not belong to any of the subsets. For convenience, we will use cover in the sequel when we actually mean cover with outliers.

To make this formulation concrete, there are several questions that need to be answered.
1) How to concisely represent a cover?
2) When overlaps are allowed, how to measure the quality of a cover?
3) In particular, how to avoid degenerate solutions? For example, declaring each edge as a cluster would make the in-cluster edge density 1 and across-cluster density 0, but is an undesirable solution providing little information. Similarly, producing a cluster that differs from another only by one node hardly reveals any additional structure.
4) How to enforce temporal smoothness when overlap is present?
5) How to solve the resulting optimization problem over covers?
In the remainder of this section, we present our precise approach and address the above questions.

¹ We use the convention $A^t_{ii} = 1$.

A. Cover Matrix

Our first step, and also a key to later development, is to adopt a matrix representation of a cover. We use the following representation from [7].

Definition 1 (Matrix Representation of a Cover). A matrix $Y \in \mathbb{R}^{n \times n}$ represents a cover $\mathcal{C} = \{C_k\}$ if $Y_{ij} = |\{C \in \mathcal{C} : i \in C, j \in C\}|$.

That is, $Y_{ij}$ equals the number of clusters that include both node $i$ and $j$. Each cover has a unique matrix representation. To see this, let us introduce the notion of a cluster assignment matrix.

Definition 2 (Cluster Assignment Matrix). $U \in \mathbb{R}^{n \times r}$ is the cluster assignment matrix of a cover $\mathcal{C} = \{C_k, k = 1, \ldots, r\}$ if $U_{ik} = 1$ when $i \in C_k$ and zero otherwise.

The cluster assignment matrix $U$ is another representation of a cover which shows the clusters that each node belongs to. Clearly each cover corresponds to a unique $U$, and each $U$ corresponds to a unique $Y$ via the factorization $Y = UU^\top$ (the $(i, j)$-th entry of the matrix $UU^\top$ is the inner product of the $i$-th and $j$-th rows of $U$, which, due to the structure of $U$, equals the number of shared clusters of node $i$ and $j$, i.e., $Y_{ij}$). In the sequel we will mainly use $Y$ as the optimization variable, but the factorization is useful later for post-processing.

Another way to view the cover matrix $Y$ is that it assigns to each pair of nodes $(i, j)$ a “similarity level” $Y_{ij}$, measured by the number of shared clusters between $i$ and $j$ [7]. When there is no overlap, the assigned similarity level is either 1 ($i, j$ assigned to the same cluster) or 0 (assigned to different clusters). When overlaps are allowed, nodes sharing many clusters are considered more similar. In contrast, the network adjacency matrix $A$ can be viewed as the observed similarity level. With this in mind, we can think of the general objective of OTC detection as: find a series of covers $Y^t$ such that the assigned similarity level is close to the observed one at each $t$, and the covers change smoothly over time. In general the number of clusters that include both node $i$ and $j$ might be greater than 1, so the assigned similarity is also above 1.

B. Overlapping Temporal Community Detection

We now give a precise formulation of the above general objective. We adopt an optimization-based approach to OTC detection. In particular, we consider the following framework:

$$
\begin{aligned}
\max_{\{Y^t\}} \quad & \sum_{t=1}^{T} f_{A^t}(Y^t) \\
\text{s.t.} \quad & \sum_{t=1}^{T-1} d_{A^{t+1},A^t}(Y^{t+1}, Y^t) \le \delta, \\
& Y^t \text{ represents a cover}, \quad t = 1, \ldots, T.
\end{aligned} \tag{1}
$$

Here $f_{A^t}(Y^t)$ is the snapshot quality, which serves two purposes: 1) it measures how well the cover $Y^t$ reflects the network $A^t$, i.e., the closeness between the assigned similarity level encoded in $Y^t$ and the observed similarity level in $A^t$, and 2) it prevents the algorithm from over-fitting, e.g., generating duplicate communities or many small communities that overlap with each other. The function $d_{A^{t+1},A^t}(Y^{t+1}, Y^t)$ is a distance function that measures the difference between the covers at time $t + 1$ and $t$. Consequently, the first constraint in the above formulation ensures that the covers evolve smoothly over time. This constraint prefers the evolutionary path with fewer changes and reflects the inertia inherent in the evolution of groups in real-life networks.

In this paper we focus on concave $f$ and convex $d$ (w.r.t. $\{Y^t\}$). This covers many existing methods for clustering without overlap. For example, $f$ can be the modularity function [9]

$$f_A(Y) = \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2M} \right) Y_{ij}$$

(here $k_i$ is the degree of node $i$ in $A$, $M$ is the total number of edges, and we ignore the pre-constant) or the correlation clustering [21] objective $f_A(Y) = -\|A - Y\|_1$ (here $\|X\|_1 = \sum_{ij} |X_{ij}|$ is called the matrix $\ell_1$ norm of $X$), and $d$ can be the estrangement [5]

$$d_{A^{t+1},A^t}(Y^{t+1}, Y^t) = \sum_{ij} A^{t+1}_{ij} A^t_{ij} \max\left( Y^t_{ij} - Y^{t+1}_{ij},\, 0 \right).$$

For OTC detection, the difficulty lies in defining quality and distance functions that can handle overlaps. We propose two novel metrics that are suitable for this task.
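To make the cover representation of Definitions 1 and 2 concrete, the following minimal sketch builds the assignment matrix $U$ and the cover matrix $Y = UU^\top$ for a toy cover. The helper names `assignment_matrix` and `cover_matrix`, the zero-based node labels and the toy cover itself are illustrative assumptions, not part of the paper.

```python
import numpy as np

def assignment_matrix(cover, n):
    """Return the n x r 0/1 cluster assignment matrix U of a cover (Definition 2)."""
    U = np.zeros((n, len(cover)), dtype=int)
    for k, cluster in enumerate(cover):
        U[list(cluster), k] = 1  # node i belongs to cluster k
    return U

def cover_matrix(cover, n):
    """Return Y = U U^T, where Y[i, j] counts the clusters shared by i and j (Definition 1)."""
    U = assignment_matrix(cover, n)
    return U @ U.T

# Hypothetical toy cover on 6 nodes (labelled 0..5): two clusters that
# overlap on nodes 2 and 3, with node 5 left as an outlier.
cover = [{0, 1, 2, 3}, {2, 3, 4}]
Y = cover_matrix(cover, n=6)

print(Y[2, 3])   # nodes 2 and 3 share both clusters
print(Y[0, 4])   # nodes 0 and 4 share no cluster
print(Y[5])      # the outlier row is all zeros
```

In this example the diagonal entry `Y[i, i]` is the number of clusters containing node `i`, so the outlier has a zero diagonal entry and the nodes in the overlap have a diagonal entry of 2.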
For the snapshot quality $f$, we use the weighted $\ell_1$ distance between the cover matrix $Y$ and the adjacency matrix $A$:

$$f_A(Y) = -\sum_{i,j} \left| C_{ij} \left( Y_{ij} - A_{ij} \right) \right|,$$

where $C_{ij}$ are some weights to be chosen. In this paper, we use the weights $C_{ij} = \left| A_{ij} - \frac{k_i k_j}{2M} \right|$, where $k_i$ and $M$ are defined in the last paragraph. This quality function generalizes the correlation clustering objective [21] and is closely related to the widely-used modularity quality function [9] when there is no overlap. In particular, it penalizes three types of “errors” (recall $Y_{ij}$ is the number of clusters including both $i$ and $j$, or the assigned similarity level between $i, j$):
• $A_{ij} = 1$ and $Y_{ij} = 0$: nodes $i$ and $j$ are connected but they are assigned to different clusters.
• $A_{ij} = 0$ and $Y_{ij} \ge 1$: nodes $i$ and $j$ are disconnected but they share at least one cluster, i.e., the assigned similarity level is positive while the observed one is zero.
• $A_{ij} = 1$ and $Y_{ij} > 1$: nodes $i$ and $j$ are connected but they share more than one cluster, i.e., the assigned similarity level is higher than the observed one.
Note that in the last two cases, the more clusters $i$ and $j$ share, the higher the cost is. This prevents the algorithm from over-fitting by generating many small clusters with lots of overlap.

For the temporal distance $d$, we use:

$$d_{A^{t+1},A^t}(Y^{t+1}, Y^t) = \sum_{i,j} A^{t+1}_{ij} A^t_{ij} \left| Y^{t+1}_{ij} - Y^t_{ij} \right|.$$

In other words, we measure the change in the assigned similarity level between node $i$ and $j$ (i.e., the number of clusters that include both nodes), but only when there is an edge between $i$ and $j$ in both snapshots $t$ and $t + 1$. For non-overlapping clusters, this reduces to the number of persisting edges that change “state” from intra-cluster to inter-cluster and vice-versa. Our measure is a modification and generalization of the estrangement measure in [5] to overlapping clusters.

C. Convex Relaxation

The optimization problem (1) is combinatorial due to the constraint “$Y^t$ represents a cover”. Exhaustive search is infeasible because there are exponentially many possible covers. One option is to use greedy local search, which is a popular choice for optimizing modularity and other clustering objectives, but it only converges to local optima and provides no guarantees. In this paper, we use convex optimization. There are two advantages of this approach: 1) it leads to an optimization problem that is efficiently solvable and guaranteed to converge to the global optimum, and 2) it is possible to obtain an a priori characterization of the optimal solution (see Section IV), which provides interesting insights into the problem. To this end, we relax the cover constraint and solve the following optimization problem:

$$
\begin{aligned}
\max \quad & \sum_{t=1}^{T} f_{A^t}(Y^t) \\
\text{s.t.} \quad & \sum_{t=1}^{T-1} d_{A^{t+1},A^t}(Y^{t+1}, Y^t) \le \delta, \\
& \|Y^t\|_* \le B, \quad t = 1, \ldots, T;
\end{aligned} \tag{2}
$$

here $\|Y^t\|_*$ is the so-called trace norm, the sum of singular values of $Y^t$. It is known that the trace norm constraint $\|Y\|_* \le B$ is a convex relaxation of the original cover constraint [7]. We briefly explain the reason here. Recall that a cover matrix admits the factorization $Y = UU^\top$, so a cover $Y$ is positive semidefinite and satisfies

$$\|Y\|_* = \sum_i Y_{ii} = \sum_i \#(\text{clusters that include node } i). \tag{3}$$

In particular, the right hand side in (3) equals $n$ if $Y$ represents a partition. Therefore, as long as $B$ is no smaller than the right hand side in (3), a cover matrix $Y$ also satisfies the new constraint, which is thus a relaxation. Although the right hand side in (3) is unknown a priori, in practice we find that choosing $B$ to be suitably large, such as $10n$ as is done in our experiment section, works well.
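The snapshot quality $f$, the temporal distance $d$ and the identity (3) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions, not the solver of Appendix A: all function names are hypothetical, $k_i$ is taken to be the row sum of $A$ with the $A_{ii} = 1$ diagonal included, and $M$ is taken to be half the sum of all entries of $A$ (the text does not say how the diagonal is counted).

```python
import numpy as np

def modularity_weights(A):
    """Weights C_ij = |A_ij - k_i k_j / (2M)|.

    Assumption: k_i is the row sum of A (diagonal included) and
    M is half the sum of all entries of A.
    """
    k = A.sum(axis=1)
    M = A.sum() / 2.0
    return np.abs(A - np.outer(k, k) / (2.0 * M))

def snapshot_quality(A, Y):
    """f_A(Y) = -sum_ij |C_ij (Y_ij - A_ij)|."""
    return -np.abs(modularity_weights(A) * (Y - A)).sum()

def temporal_distance(A_next, A_prev, Y_next, Y_prev):
    """d = sum_ij A^{t+1}_ij A^t_ij |Y^{t+1}_ij - Y^t_ij| (persisting edges only)."""
    return (A_next * A_prev * np.abs(Y_next - Y_prev)).sum()

def trace_norm(Y):
    """Sum of the singular values of Y."""
    return np.linalg.svd(Y, compute_uv=False).sum()

def cover_matrix(cover, n):
    """Y = U U^T for a cover given as a list of node sets."""
    U = np.zeros((n, len(cover)))
    for k, cluster in enumerate(cover):
        U[list(cluster), k] = 1.0
    return U @ U.T

n = 6
# Toy network: two disjoint triangles, with the convention A_ii = 1.
A = cover_matrix([{0, 1, 2}, {3, 4, 5}], n)
Y_true = A.copy()                                    # the matching partition
Y_dup = 2 * Y_true                                   # every cluster declared twice
Y_moved = cover_matrix([{0, 1}, {2, 3, 4, 5}], n)    # node 2 switches cluster
Y_overlap = cover_matrix([{0, 1, 2, 3}, {2, 3, 4}], n)

q_true = snapshot_quality(A, Y_true)                 # perfect fit: no penalty
q_dup = snapshot_quality(A, Y_dup)                   # duplicates are penalized
d_moved = temporal_distance(A, A, Y_moved, Y_true)   # 4 persisting pairs change state
```

For the partition `Y_true`, `trace_norm` returns $n = 6$, and for the overlapping cover `Y_overlap` it returns the trace $4 + 3 = 7$, as (3) predicts, so any budget $B \ge 7$ would admit that cover.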
Moreover, the constraint $\|Y\|_* \le B$ effectively imposes an upper bound on the amount of overlap and prevents the algorithm from producing a large number of clusters, which is desirable in its own right.

Trace norm is known to be a good relaxation for partition matrices both in theory and in practice [12], [22]. All partition matrices with a small number of partitions (which is the case of interest) are low-rank, and trace norm is the tightest convex relaxation of low-rank matrices in a formal sense [23]. Moreover, trace norm utilizes the graph eigen-spectrum, which has long been known to reveal hidden clustering structure and is the basis of the highly successful spectral clustering methods. This advantage of trace norm carries over to overlapping clusters [7].

With this relaxation and our choice of $f$ and $d$, (2) becomes a convex program and can be solved in polynomial time using general-purpose convex optimization packages such as SDPT3. In Appendix A, we describe a specialized gradient descent algorithm, which is even faster.

D. Post-processing

Ideally, the optimal solution $\hat{Y}^t$ would represent a cover, which could be easily extracted from $\hat{Y}^t$ (e.g. by finding all maximal cliques); in the next section we provide one sufficient condition for this to happen. In practice, however, because of the relaxation, $\hat{Y}^t$ may not have the structure of a cover matrix. But it is empirically observed that $\hat{Y}^t$ is usually close to a cover matrix; in particular, the optimization can be viewed as a “denoising” procedure, which filters out most (though not all) of the noise in the observation $A^t$ and makes the underlying clustering structure clearer. Therefore, a good clustering is likely to be extracted from $\hat{Y}^t$ via simple post-processing. We describe one such procedure below.
Recall again that a cover matrix can be factorized as $Y = UU^\top$, where $U$ is an assignment matrix of non-negative entries, with $U_{ik} = 1$ indicating node $i$ in cluster $k$. Therefore, performing Non-negative Matrix Factorization (NMF) [24] on a cover $Y$ gives the corresponding clustering assignment. When the optimal solution $\hat{Y}$ is not a cover but is close to one, we expect that performing NMF $\hat{Y} = \hat{U}\hat{U}^\top$ would still produce an approximate assignment matrix $\hat{U}$, which is then rounded to be an exact assignment matrix. In particular, we declare node $i$ to belong to cluster $k$ at time $t$ if $\hat{U}^t_{ik} \ge 0.5$.

E. Remarks on Our Method

Mapping communities: Practical application sometimes requires the communities at time $t$ to be mapped to those at $t - 1$, in order to track the evolution of communities. In the experiment section, we use the mapping method in [5], which still works when $Y$ is a cover instead of a partition. The method involves mapping those communities across consecutive snapshots that have the maximal mutual Jaccard overlap between their constituent node-sets, and generating new community identifiers only when needed.

Online algorithm: In some cases it might be interesting to use an online version of the algorithm (2): at each time $t$ when a new snapshot $A^t$ becomes available, we obtain a new cover $Y^t$ by solving the following problem:

$$
\begin{aligned}
\max_{Y^t} \quad & f_{A^t}(Y^t) \\
\text{s.t.} \quad & d_{A^t,A^{t-1}}(Y^t, Y^{t-1}) \le \delta_t, \\
& \|Y^t\|_* \le B,
\end{aligned} \tag{4}
$$

where $Y^{t-1}$ is the solution from the last snapshot $t - 1$ and is considered fixed. Rigorously speaking, the solution to the online formulation is in general different from that to the offline one. But we expect in practice the online formulation will perform reasonably well, and various updating rules can be adopted to choose the online upper bound $\delta_t$. We do not delve into this in this paper.
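The rounding rule of the post-processing step and the community mapping step can be sketched as follows. This is a hedged reading of the description above, not the exact procedure of [5]: the function names are hypothetical, the NMF step itself is not shown (an approximate $\hat{U}$ is assumed to be given as a list of rows), "maximal mutual Jaccard overlap" is interpreted as a mutual best match, and community identifiers are assumed to be integers with `next_id` larger than any identifier in use.

```python
def round_assignment(U_hat, threshold=0.5):
    """Round an approximate assignment matrix (list of rows) into a list of node sets."""
    r = len(U_hat[0]) if U_hat else 0
    return [{i for i, row in enumerate(U_hat) if row[k] >= threshold}
            for k in range(r)]

def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two node sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def map_communities(prev, curr, next_id):
    """Label the communities in `curr` using the labels of `prev`.

    prev: dict mapping integer community id -> set of nodes at time t-1.
    curr: list of node sets at time t.
    A current community inherits a previous id only when the two are each
    other's best Jaccard match; otherwise it receives a fresh id.
    Returns (dict id -> set of nodes at time t, next unused id).
    """
    best_prev = {}  # index in curr -> best matching previous id
    for j, c in enumerate(curr):
        scored = [(jaccard(p, c), pid) for pid, p in prev.items()]
        if scored and max(scored)[0] > 0:
            best_prev[j] = max(scored)[1]
    best_curr = {}  # previous id -> best matching index in curr
    for pid, p in prev.items():
        scored = [(jaccard(p, c), j) for j, c in enumerate(curr)]
        if scored and max(scored)[0] > 0:
            best_curr[pid] = max(scored)[1]
    labelled = {}
    for j, c in enumerate(curr):
        pid = best_prev.get(j)
        if pid is not None and best_curr.get(pid) == j:
            labelled[pid] = c          # mutual best match: keep the identifier
        else:
            labelled[next_id] = c      # no mutual match: new identifier
            next_id += 1
    return labelled, next_id

# Hypothetical example: two communities persist with small changes, one is new.
clusters = round_assignment([[0.9, 0.1], [0.6, 0.55], [0.2, 0.8]])
prev = {0: {0, 1, 2, 3}, 1: {3, 4, 5}}
curr = [{3, 4, 5, 6}, {0, 1, 2}, {7, 8}]
labelled, next_id = map_communities(prev, curr, next_id=2)
```

Here `clusters` places the middle node in both clusters because both of its entries reach the 0.5 threshold, and `labelled` keeps identifiers 0 and 1 for the two persisting communities while the unmatched community `{7, 8}` receives the new identifier 2.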
Complexity and Scalability: Using the fast gradient descent algorithm, the space and time complexities of our method both scale linearly with the problem size (the numbers of nodes, edges and snapshots); see Appendix B for details. With the online implementation suggested above, the dependence on the snapshot number can be further alleviated. Our method is therefore amenable to large datasets.

IV. THEORETICAL ANALYSIS

In this section we provide theoretical analysis on the performance of our algorithm. In particular, our analysis shows that if the adjacency matrices $A^t$ are generated from an underlying persistent partition according to a generative model, then with high probability our method will recover the underlying partition as long as $K = \Omega(\sqrt{n/m})$, where $K$ is the minimum cluster size in the partition and $m$ is the number of snapshots it persists for. This highlights the benefit of temporal clustering: a small cluster of size $\sqrt{n/m}$ is likely to be undetectable if each snapshot is considered individually (e.g., the cluster might not be connected in each single snapshot), but can be recovered by temporal clustering if the cluster persists for $m$ snapshots and all snapshots are used. This result is quite revealing: traditional single-snapshot clustering techniques can only find clusters that are large in size, but temporal clustering is capable of detecting clusters that are small in size but large in the time axis. Moreover, our theorem predicts a specific tradeoff between the “spatial size” $K$ and the “temporal size” $m$: with four times as many temporal snapshots, one can detect a cluster that is half the size spatially. We believe this is the first such result in the literature.

We now present our theorem. We use a generative model which can be considered as a multi-snapshot version of the classical and widely studied planted partition model (a.k.a. stochastic block model) [25].
Definition 3 (Multi-Snapshot Planted Partition Model). Suppose $n$ nodes are in $r$ disjoint clusters, each with size $K$, and this clustering structure does not change over time (see remarks after the theorem). Let $Y^*$ be the matrix that represents this clustering. The adjacency matrices $A^t$, $t = 1, \ldots, m$ are generated as follows: if node $i$ and $j$ are in different clusters, then there is an edge between them (i.e. $A^t_{ij} = 1$) with probability $q$, independently of all others; if they are in the same cluster, then $A^t_{ij} = 1$ with probability $p$. We assume $q < \frac{1}{2} < p$ are constants independent of $n$, $m$ and $K$.

Since the underlying partition does not change, we impose the constraint $\sum_{t=1}^{T-1} d_{A^{t+1},A^t}(Y^{t+1}, Y^t) = 0$, which is equivalent to $Y^t = Y, \forall t$. Rewritten in an equivalent minimization form, our algorithm becomes

$$
\begin{aligned}
\min_{Y} \quad & \sum_{t} \sum_{ij} C_{ij} \left| Y_{ij} - A^t_{ij} \right| \\
\text{s.t.} \quad & \|Y\|_* \le n.
\end{aligned} \tag{5}
$$

Note that under the multi-snapshot planted partition model, we have $C_{ij} = \left| A_{ij} - \frac{k_i k_j}{2M} \right| \approx |A_{ij} - s|$, where $s := p\frac{K}{n} + q\left(1 - \frac{K}{n}\right) \in (q, p)$. The following theorem characterizes when (5) recovers the true underlying partition matrix $Y^*$.

Theorem 1. Suppose $C_{ij} = |A_{ij} - s|$. Under the multi-snapshot planted partition model, if $K = \Omega(\sqrt{n/m})$, then $Y^*$ is the unique optimal solution to the convex optimization problem (5) with probability converging to 1 as $n \to \infty$.

The proof is given in Appendix C.

Remark on Theorem 1: Although the multi-snapshot planted partition model assumes that the underlying clustering structure does not change, and that the clusters do not overlap, we conjecture similar theoretical guarantees can be obtained with these restrictions removed. In particular, we expect that our algorithm can detect clusters of size $\Theta(\sqrt{n/m})$ even if the underlying structure changes, provided that between consecutive changes there are at least $m$ snapshots. This conjecture is supported by the experimental results in Section V.

V. EXPERIMENTAL RESULTS

We apply our method to two synthetic datasets and three real-world datasets. Our synthetic networks are random graphs generated according to an underlying community structure evolution. Each snapshot is an instantiation of a random graph generated by connecting each pair of nodes sharing at least one community with probability 0.5, and with probability 0.2 otherwise (including the case where one or both of the nodes are not in any community). Note that we allow some nodes in some snapshots to not belong to any community, as is often true in real scenarios. Also, note that nodes sharing more than one community are not connected with a higher probability. This makes overlapping communities harder to detect and is a better test of the detection methods.

Using this prescription, we generate two synthetic time-varying networks to validate our method and demonstrate its efficacy. We compare the results obtained with and without overlap allowed, and with and without the smoothness constraint. A popular temporal clustering method using multi-slice modularity [19] is also considered.

The four real network datasets considered in this section include MANET, international trade, AS links, and the MIT Reality Mining Data.

A. Synthetic Random Networks I

In the first synthetic experiment, we demonstrate the advantage of considering the temporal aspect and allowing overlap, and that there is clustering structure that can be detected only if we consider both. We generate the network snapshots as follows. Suppose there are 120 nodes and 5 underlying communities. Community 1 is a small 15-node group including nodes 0 through 14. Communities 2 and 3, both of size 38, consist of nodes 15-52 and 47-85, respectively, and overlap at 5 nodes (47-52). Communities 4 and 5, both of smaller size 20, include nodes 85-104 and 100-119, respectively, and overlap at 5 nodes (100-104).
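The generation prescription above can be sketched in a few lines. This is an illustration under explicit assumptions rather than the script used for the experiments: the function name `synthetic_snapshot`, the fixed seed and the choice of 10 snapshots are hypothetical, the community structure is held fixed over time, and the node ranges are read as inclusive (under that reading the range 47-85 contains 39 nodes and the overlap 47-52 contains 6).

```python
import numpy as np

def synthetic_snapshot(cover, n, p_in=0.5, p_out=0.2, rng=None):
    """Draw one adjacency matrix for a given cover.

    Pairs sharing at least one community are linked with probability p_in,
    all other pairs with probability p_out. Sharing several communities
    does not raise the probability. The diagonal follows A_ii = 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    U = np.zeros((n, len(cover)), dtype=int)
    for k, cluster in enumerate(cover):
        U[list(cluster), k] = 1
    share = (U @ U.T) > 0                      # True if i and j share a community
    prob = np.where(share, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # sample each pair once
    A = (upper | upper.T).astype(int)          # symmetrize
    np.fill_diagonal(A, 1)
    return A

# Communities of synthetic experiment I, with node ranges read as inclusive.
n = 120
cover = [
    set(range(0, 15)),     # community 1: nodes 0-14
    set(range(15, 53)),    # community 2: nodes 15-52
    set(range(47, 86)),    # community 3: nodes 47-85
    set(range(85, 105)),   # community 4: nodes 85-104
    set(range(100, 120)),  # community 5: nodes 100-119
]

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility of the sketch
snapshots = [synthetic_snapshot(cover, n, rng=rng) for _ in range(10)]
```

Averaged over the snapshots, the edge density inside community 1 should be close to 0.5 and the density between community 1 and the rest of the network close to 0.2, which is the gap a detection method has to exploit.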
Since community 1 is small, in light of Theorem 1, we expect that single-snapshot methods are unable to detect it due to noise/randomness in the network, but that temporal methods will find it. Communities 2 and 3 are large but overlap with each other, so only methods that allow overlap would detect them, even if the snapshots are considered individually. Finally, communities 4 and 5 are small and overlapping, and are thus expected to be discoverable only when both the temporal and overlap aspects are considered. This is indeed the case in our experiments. The results are shown in Figs. 2 to 5.

[Figure 2 annotations: Community 2, containing nodes 15 to 52; Community 3, containing nodes 47 to 85; overlap (nodes 47 to 52); 5 communities are detected at snapshot 12.]

Fig. 2. Synthetic experiment I: Overlapping temporal community structure detected by our algorithm. In the figure, each strip between consecutive vertical black lines shows the community assignment for one snapshot. For each snapshot, there are a number of colored vertical strips, each representing a community that contains the nodes with the corresponding labels on the Y axis. For example, the leftmost blue strip represents a community (Community 3) at time 0 containing nodes 47 to 85, and the cyan strip to its right is a community (Community 2) containing nodes 15 to 52; nodes 47 to 52 are in both communities. Across snapshots, communities with the same color are those that are mapped to each other using the mapping method described in Section III-D. This figure shows that our method faithfully recovers the underlying 5-community structure that is used to generate the network.

Visualizing overlapping temporal communities is not a trivial task. Here we extend the approach used in Fig. 1 to allow overlaps, as explained in the caption of Fig. 2. Fig. 2 shows the results of our method, which detects all of the underlying structure.
Fig. 3 shows the result of our method with $\delta$ set to infinity, so that there is no temporal smoothness constraint and the snapshots are considered independently. In this case, communities 1, 4 and 5 are not recovered completely. Fig. 4 shows the result when overlap is not allowed, i.e., we impose the constraint $Y^t_{ij} \le 1, \forall t, i, j$. All overlapping structure is clearly lost. Fig. 5 shows the result when $\delta = \infty$ and overlap is not allowed; one can see a further degradation of performance.

We also measure the performance of the above four methods by computing the distance of the recovered community structure from the ground truth. We use the distance $\sum_{t=1}^{T} \|Y^*_t - \hat{Y}_t\|_1$, where $Y^*_t$ denotes the cover matrix of the ground truth and $\hat{Y}_t$ the one found by a clustering algorithm. The results are given in the second row of Table I. The error is smaller by a factor of roughly 7 to 10 when both the overlapping and temporal aspects are considered.

Comparison with existing schemes: Although there has been much work on community detection algorithms, almost none allows simultaneous discovery of overlapping and temporal communities. Thus, we can only compare against representative non-overlapping temporal community detection algorithms. We compare against the widely cited temporal community detection scheme presented in [19]. This method involves two parameters, the resolution $\gamma$ for the modularity function and the inter-slice coupling strength $\omega$. Since the ground truth clustering structure does not change over time, a large $\omega$ is used to force a static output. We then search over different values of $\gamma$ and use the one that gives the smallest error. The recovered community structure is shown in Fig. 6. We find that this method cannot identify the overlap structure (as expected), and fails to recover the non-overlapping portions of the small communities (communities 4 and 5).
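As a concrete illustration of the distance $\sum_t \|Y^*_t - \hat{Y}_t\|_1$ used for Table I, the following sketch evaluates it on a toy example. The helper names and the four-node example are hypothetical, and the cover matrix is assumed to count the communities shared by each pair of nodes.

```python
import numpy as np

def cover_matrix(n, communities):
    """Y[i, j] = number of communities containing both i and j (assumed encoding)."""
    Z = np.zeros((n, len(communities)), dtype=int)
    for c, members in enumerate(communities):
        Z[list(members), c] = 1
    return Z @ Z.T

def cover_distance(truth, estimate):
    """Sum over snapshots of the entry-wise l1 distance between cover matrices."""
    return sum(int(np.abs(Yt - Yh).sum()) for Yt, Yh in zip(truth, estimate))

# Toy ground truth: communities {0, 1, 2} and {2, 3} overlap at node 2.
Y_star = cover_matrix(4, [[0, 1, 2], [2, 3]])
# A non-overlapping estimate that assigns node 2 to the first community only.
Y_hat = cover_matrix(4, [[0, 1, 2], [3]])

# Two identical snapshots, so the per-snapshot error is counted twice.
error = cover_distance([Y_star, Y_star], [Y_hat, Y_hat])
```

In this toy case the estimate differs from the truth in three entries per snapshot, namely (2,2), (2,3) and (3,2), so the distance over two snapshots is 6. Forbidding overlap is penalized both on the diagonal and on the pairs that only the second community connects.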
The recovery error, given in the last column of Table I, is also high.

[Figure 3 annotations: Small communities and their overlaps are not well recovered; overlaps between large communities are detected.]

Fig. 3. Synthetic experiment I: Clustering result when overlaps are allowed but without the temporal smoothness constraint. Small communities and their overlaps are not well recovered.

[Figure 4 annotations: Small community is recovered; overlap structure is lost.]

Fig. 4. Synthetic experiment I: Clustering result with the temporal smoothness constraint but not allowing overlaps. The overlap structure is lost.

[Figure 5 annotations: Small communities and their overlaps are not recovered; overlap structure between large communities is also lost.]

Fig. 5. Synthetic experiment I: Clustering result without the temporal smoothness constraint and with no overlaps. Small communities are not well recovered and the overlap structure is lost.

[Figure 6 axes: Time, Node id.]

Fig. 6. Synthetic experiment I: Clustering using the multi-slice modularity method in [19]. The two small communities 4 and 5 are incorrectly identified as one cluster, and the overlap structure is lost.

TABLE I
DISTANCE FROM GROUND TRUTH FOR SYNTHETIC EXPERIMENTS.

          Overlap + temporal   Overlap only   Temporal only   None    Ref. [19]
Expt I    3133                 27203          20646           32789   27390
Expt II   1646                 14843          8318            12534   7450

B. Synthetic Random Networks II

The second synthetic experiment demonstrates the ability of our method to detect and track time-varying cluster structure, including the overlap, merging, emergence, splitting, growth, and shrinking of communities. We now describe how we generate the snapshots. The network has 100 nodes, and the underlying clustering structure has four phases, each with 10 snapshots:
• Phase I: There are two communities: community 1 includes nodes 0 to 39, and community 2 includes nodes 40 to 79. The structure does not change during this phase.
• Phase II: Community 1 remains the same, but community 2, now with nodes 30 to 69, overlaps with community 1. The structure does not change during this phase.
• Phase III: Communities 1 and 2 merge into a large community A, which consists of nodes 0 to 69. This community then gradually shrinks: at each time step one node leaves, and at the end of this phase community A has nodes 0 to 60. In addition, a new community B, comprising nodes 75-99, emerges at the beginning of this phase and remains unchanged throughout it.
• Phase IV: Community A splits into two smaller communities consisting of nodes 0-19 and 20-59, respectively. Community B grows by absorbing nodes 60-74, and thus has nodes 60-99. The structure does not change in this phase.

[Figure 7 annotations: Phase I, Phase II, Phase III, Phase IV; Shift and overlap, Merge, Emerge, Shrink, Split, Grow.]

Fig. 7. Clustering result of our method for synthetic experiment II. Our method is able to detect the merge, emerge, shrink, split, and growth of communities, as well as their overlaps.

[Figure 8 axes: Time, Node id.]

Fig. 8. Clustering result of the multi-slice modularity method in [19] for synthetic experiment II.

As can be seen in Fig. 7, our method performs quite well in recovering the underlying evolving structure. This complements our theoretical results in Section IV and shows that our method can handle overlaps and detect jumps in the structure. We compare our method with those that do not allow overlap or that ignore the temporal aspect; see Table I. Our method again outperforms the other methods by a large margin.

Comparison: We also compare with the algorithm in [19]. For this algorithm, we search for the parameters $(\gamma, \omega)$ that give the smallest error. The recovered community structure is shown in Fig. 8, and the error is given in the last column of Table I.
One observes that the method of [19] cannot detect overlaps. When overlaps are not considered, our method is competitive with a state-of-the-art algorithm that specializes in temporal clustering.

C. Real MANET Scenario

We now present results on a real wireless network with mobility. The data is based on the mobility trace from an experiment scenario in New Jersey as described in [26]. We use a 40-node version of the scenario in which the nodes are organized into three teams. The teams move from an initial point to a target point using two primary routes over a three-hour period. The scenario is divided into several phases, each associated with a rendezvous point. During each phase the teams move from one rendezvous point to the next and pause before moving on. There are also six leader nodes, which have high-range radios and are mostly in range of each other. The input to our algorithm is 711 network snapshots formed by the wireless connections between the nodes. The physical locations of the nodes, as well as the underlying team structure, are unknown to the algorithm.

The community structure found by our algorithm is shown in Fig. 9. We find that the leader nodes form a small yet persistent community (shown in orange in Fig. 9), which is detected only when overlaps are allowed (compare with Fig. 10). We also find that the overlapping temporal community structure is essentially invariant within each phase of the scenario, even when the topology and the instantaneous non-overlapping community structure are changing. Thus, in this case the overlapping temporal community structure detected by our method reveals a structural pattern that remains invariant despite considerable mobility.

D. MIT Reality-mining

We apply our method to a human-human contact network in the Reality-mining project [27]. The results are shown in Fig. 11.
Two predominant groups can be seen, one corresponding to the staff at the MIT Media Lab, and the other corresponding to the students at the MIT Sloan School of Business. We also observe a discontinuity of the Sloan School community around the New Year's break.

E. International Trade Network

Our next real dataset consists of annual trade volumes between pairs of countries during 1870–2006 [28]. We create an unweighted network for each year by placing an edge between two countries if the trade volume between them exceeds 0.1% of the total trade volume of each of the two countries; in other words, an edge is drawn if their trade is significant for both of them. This leads to a dynamic network with 197 nodes and 137 snapshots, which is fed to our algorithm.

Fig. 12 shows the post-World War II (1950–2006) community structure found by our algorithm, where the overlaps are not displayed (for each node, only the largest cluster it belongs to is shown). Five prominent trade communities can be immediately identified: Latin-American, US-Euro-Asian, Ex-USSR Block, West African, and Afro-Asian. One also observes the evolution of the communities, including the formation of the West African block in 1960 ("the Year of Africa") due to decolonization, the emergence of the Ex-USSR block after 1991, as well as Colombia and Venezuela joining the US-Euro-Asian block in the 1970s.

More information can be obtained by examining the overlap structure. A number of countries are associated with multiple communities. For example, the US, Mexico, Colombia and Brazil belong to both the US-Euro-Asian and Latin-American blocks. France and Portugal are in the US-Euro-Asian block, but they both interact with the West African block for a significant number of years. Similarly, Ivory Coast, Ghana and Nigeria are mainly West African but are also associated with the US-Euro-Asian block. Several Asian/Pacific countries, including Saudi Arabia and Australia, have trade partners in both the US-Euro-Asian and Afro-Asian blocks.

[Figure 9 panel labels: "Temporal community structure without overlap detected by our method"; "First 30 snapshots with overlaps" (Team 1, Team 2, Team 3, six leader nodes); "Snapshot No. 1"; "Snapshot No. 26" (Team 1, Team 2, Team 3, six leader nodes).]

Fig. 9. Clustering results for MANET data. Top panel: community structure found by our method for all 711 snapshots, where the colors indicate the community membership of each node at each time; for each node, only the largest cluster it belongs to is shown; overlaps are not displayed. Middle panel: overlapping community structure for the first 30 snapshots; the 3 teams and the six leader nodes are identified by our method; the six leader nodes form a small yet persistent community that overlaps with the other communities; this community cannot be detected if overlap is not allowed (compare with Fig. 10). Bottom panel: the observed network structure at two snapshots; at snapshot No. 1, all nodes are densely connected with each other and form a single community; at snapshot No. 26, there are three communities corresponding to the three teams; in addition, the six leader nodes form a community of their own, which is not obvious from looking at a single snapshot of the network, yet our method is able to detect it.

[Figure 10 axes: Time, Node id.]

Fig. 10. Clustering results without overlaps for MANET data for the first 30 snapshots. The community of leader nodes is not detected.

[Figure 11 labels: time axis from Sep 2004 to Mar 2005; groups "Media Lab" and "Sloan School of Business".]

Fig. 11. MIT Data. No significant overlap is observed, so we only show the non-overlapping temporal community structure found.
Two predominant groups can be seen, one corresponding to the staff and students at the MIT Media Lab, and the other corresponding to the students at the MIT Sloan School of Business.

[Figure 12: rows are countries, columns are years 1950–2006; labeled blocks: Latin-American Block, US-Euro-Asian Block, Ex-USSR Block (formed after 1991), West African Block (formed at 1960 due to decolonization), Afro-Asian Block (formed post-1970).]

Fig. 12. Clustering result for the International Trade Network; only years 1950–2006 are shown (overlaps not shown for clarity). Five prominent trade communities (blocks) can be seen. Moreover, one can observe the emergence of the Ex-USSR block (after 1992), the West African Block (at 1960) and the Afro-Asian Block (post-1970), as well as Colombia and Venezuela joining the US-Euro-Asian block (1970s, orange arrow at the top-middle part of the plot). Note that black is the background color and not a community.

[Figure 13: rows are countries, columns are years 1950–2006; highlighted countries: US, Mexico, Colombia, Brazil (US-Euro-Asian and Latin American); France, Portugal (mainly US-Euro-Asian, also West African); Ivory Coast, Ghana, Nigeria (mainly West African, also US-Euro-Asian); Saudi Arabia, Australia (US-Euro-Asian and Afro-Asian).]

Fig. 13. Clustering with overlaps for the International Trade Network; only years 1950–2006 are shown. The figure indicates some of the countries that are associated with multiple communities. Note that the color black is the background and does not indicate a community.

F. The Skitter AS Links Dataset

Finally, to validate the performance of our algorithm on larger networks, we analyze the Internet topology at the Autonomous System (AS) level as collected by CAIDA [29]. We obtained quarterly snapshots of the data over an 8-year period starting in 2000. The data has up to 28000 nodes in some snapshots. Many of those are edge nodes with a low degree and do not belong to a community. Thus we only consider nodes with degree larger than 9 in at least one snapshot. The final dataset consists of 2807 nodes and 32 snapshots.

Among these 2807 ASes, we identify 90 that exhibit significant community structure: each of them is assigned to a community in at least 10 snapshots. The temporal community structure for these nodes is shown in Fig. 14; overlaps are not shown for clarity. Results with overlaps are shown in Fig. 15.

We make some initial observations from Fig. 14: (1) In the upper portion we see a persistent block with AS 1, 1239, 7018, 5511, 2914, 3561, 6461, 3549, 3356, 701, 209, and 6453. These seem to be mainly in the US. (2) In the lower-right there is another, smaller block with AS 8928, 286, 6695 and 13237, which seem to be in the EU and DE. (3) Between 2004 and 2005 there is significant formation of new communities. A similar phenomenon has been observed in [30]. Moreover, by looking at the overlap structure, we find that all the nodes in the US block mentioned above consistently appear in multiple clusters. These turn out to be Tier 1 providers or large Internet exchange points.

[Figure 14 axes: Time (2000 to 2007), AS Number.]

Fig. 14. Clustering result for the AS Link Dataset, overlaps not shown.

[Figure 15 axes: Time (2000.01.01 to 2007.01.01), AS Number.]

Fig. 15. Clustering with overlaps for the AS Link Dataset.

VI. APPLICATIONS TO COMMUNICATION NETWORKS

We now describe some applications of community detection to the design of communication networks.

Routing in Disruption Tolerant Networks (DTNs): DTNs are often formed by devices that are carried around by humans whose mobility patterns are strongly influenced by their social relationships. Thus, the structures of the social graph between the humans and the contact graph between the devices are correlated.
While the contact graph can change rapidly, it usually possesses a relatively stable underlying structure that is a function of the less volatile social graph [31]–[35]. This can be used to develop "social-aware" routing strategies that use social metrics such as node centrality and community labels to make forwarding decisions [31]–[34], [36]. All of these schemes utilize some form of community detection on the contact graph to infer social relations between the mobile nodes. However, the community detection methods used are generally limited to the non-overlapping and even non-time-varying case. The community detection framework proposed in this paper can be used with any of these schemes while overcoming these limitations. This can result in significant performance gains when, for example, people belong to multiple social groups (e.g., friends, family, co-workers).

Efficient Caching in Content Centric Networks: Content-based networking is an emerging paradigm that does not require connection-oriented protocols between producers and consumers of information in communication networks. Intelligent caching and replication of the content can significantly reduce access delays as well as the overhead costs associated with repeated querying and duplicate transmissions. Recent work [6] proposes making use of the community structure of a MANET to determine nodes for content replication. Assuming that the community structure changes on a slower time scale than the network topology, nodes in the same community can cooperate to provide efficient and speedy access to content. The method proposed in this paper can provide a principled approach to building distributed content caching protocols.

Developing Realistic Mobility Models: Much initial work on the design and analysis of routing algorithms for mobile networks assumed simplistic mobility models such as random walk, random waypoint, etc.
However, the analysis of mobility traces from many real-life scenarios suggests that these simplistic models do not capture the details of real-world mobility characteristics such as periodicity and correlations due to social relationships between nodes. Recent work on mobility modeling [37], [38] attempts to capture the dependence of the social relationships between nodes on their mobility patterns. Community detection methods such as ours can be used to construct more refined mobility models that capture complex features such as the existence of overlapped communities as well as small yet persistent temporal communities.

VII. CONCLUSION

In this paper, we consider the problem of detecting overlapping temporal communities in dynamic networks. A convex optimization based approach is proposed for this problem. Theoretical and experimental results show that our method is capable of revealing interesting community structure that cannot be detected by methods that do not allow overlap, or by those that do not utilize temporal information. For simplicity, in this work we have focused on unweighted graphs. In the future, we plan to extend our method to treat weighted graphs as well as to develop distributed versions of the algorithm. We believe our methods have wide applications in studying the structure and evolution of complex networked systems, including communication networks and social networks.

Acknowledgements

Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053.

REFERENCES

[1] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.
[2] D. Chakrabarti, R. Kumar, and A. Tomkins, "Evolutionary clustering," in ACM KDD, 2006.
[3] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. L. Tseng, "Facetnet: a framework for analyzing communities and their evolutions in dynamic networks," in ACM WWW, 2008.
[4] M. Kim and J. Han, "A particle-and-density based evolutionary clustering method for dynamic networks," Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 622–633, 2009.
[5] V. Kawadia and S. Sreenivasan, "Sequential detection of temporal communities by estrangement confinement," Scientific Reports 2, 2012.
[6] V. Kawadia, N. Riga, J. Opper, and D. Sampath, "Slinky: An adaptive protocol for content access in disruption-tolerant ad hoc networks," in ACM Intl. Workshop on Tactical Mobile Ad Hoc Networking, 2011.
[7] Y. Chen, H. Xu, and S. Sanghavi, "Graph clustering with overlaps," Manuscript. Submitted.
[8] G. Agarwal and D. Kempe, "Modularity-maximizing graph communities via mathematical programming," The European Physical Journal B, vol. 66, no. 3, pp. 409–418, 2008.
[9] M. E. J. Newman, "Modularity and community structure in networks," PNAS, vol. 103, no. 23, pp. 8577–8582, 2006.
[10] E. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Preprint, 2009.
[11] V. Chandrasekaran, S. Sanghavi, S. Parrilo, and A. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
[12] Y. Chen, S. Sanghavi, and H. Xu, "Clustering sparse graphs," Advances in Neural Information Processing Systems 25, 2012.
[13] J. Xie, S. Kelley, and B. K. Szymanski, "Overlapping community detection in networks: the state of the art and comparative study," To appear in ACM Computing Surveys, vol. abs/1110.5813, 2011.
[14] R. Cazabet, F. Amblard, and C. Hanachi, "Detection of overlapping communities in dynamical social networks," in IEEE SocialCom, 2010.
[15] N. Nguyen, T. Dinh, Y. Xuan, and M. Thai, "Adaptive algorithms for detecting community structure in dynamic social networks," in INFOCOM, 2011 Proceedings IEEE. IEEE, 2011, pp. 2282–2290.
[16] N. Nguyen, T. Dinh, S. Tokala, and M. Thai, "Overlapping communities in dynamic networks: their detection and mobile applications," in ACM Mobicom, 2011, pp. 85–96.
[17] J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu, "Graphscope: parameter-free mining of large time-evolving graphs," in ACM KDD, 2007.
[18] J. Ferlez, C. Faloutsos, J. Leskovec, D. Mladenic, and M. Grobelnik, "Monitoring network evolution using MDL," in IEEE ICDE, 2008.
[19] P. Mucha, T. Richardson, K. Macon, M. Porter, and J. Onnela, "Community structure in time-dependent, multiscale, and multiplex networks," Science, vol. 328, no. 5980, pp. 876–878, 2010.
[20] Y. Chi, X. Song, D. Zhou, K. Hino, and B. Tseng, "On evolutionary spectral clustering," ACM Trans on Knowledge Discovery from Data, vol. 3, no. 4, p. 17, 2009.
[21] N. Bansal, A. Blum, and S. Chawla, "Correlation clustering," Machine Learning, vol. 56, no. 1, pp. 89–113, 2004.
[22] B. Ames and S. Vavasis, "Nuclear norm minimization for the planted clique and biclique problems," Mathematical Programming, vol. 129, no. 1, pp. 69–89, 2011.
[23] B. Recht, M. Fazel, and P. Parrilo, "Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization," SIAM Review, vol. 52, no. 471, 2010.
[24] D. Seung and L. Lee, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems 13, 2000.
[25] A. Condon and R. Karp, "Algorithms for graph partitioning on the planted partition model," Random Structures and Algorithms, vol. 18, no. 2, pp. 116–140, 2001.
[26] Y. Lu, F. Wicker, Y. Chen, P. Liò, and D. Towsley, "On secure network structures in the Lakehurst trace," 2008.
[27] N. Eagle and A. Pentland, "Reality mining: sensing complex social systems," Personal Ubiquitous Comput., vol. 10, pp. 255–268, 2006.
[28] K. Barbieri and O. Keshk, "Correlates of war project trade data set." [Online]. Available: correlatesofwar.org
[29] B. Huffaker, Y. Hyun, D. Andersen, and kc claffy, "The Skitter AS Links Dataset."
[30] B. Edwards, S. Hofmeyr, G. Stelle, and S. Forrest, "Internet topology over time," Preprint, 2012.
[31] E. M. Daly and M. Haahr, "Social network analysis for routing in disconnected delay-tolerant manets," in ACM Mobihoc, 2007, pp. 32–40.
[32] P. Hui, J. Crowcroft, and E. Yoneki, "Bubble rap: Social-based forwarding in delay-tolerant networks," Mobile Computing, IEEE Transactions on, vol. 10, no. 11, pp. 1576–1589, Nov. 2011.
[33] W. Gao, Q. Li, B. Zhao, and G. Cao, "Multicasting in delay tolerant networks: a social network perspective," in ACM MobiHoc, 2009.
[34] A. Mtibaa, M. May, C. Diot, and M. Ammar, "Peoplerank: social opportunistic forwarding," in IEEE INFOCOM, 2010.
[35] A.-K. Pietilänen and C. Diot, "Dissemination in opportunistic social networks: the role of temporal communities," in ACM MobiHoc, 2012.
[36] Y. Zhu, B. Xu, X. Shi, and Y. Wang, "A survey of social-based routing in delay tolerant networks: Positive and negative social effects," IEEE Comm. Surveys Tutorials, vol. PP, no. 99, pp. 1–15, 2012.
[37] W.-J. Hsu, T. Spyropoulos, K. Psounis, and A. Helmy, "Modeling spatial and temporal dependencies of user mobility in wireless mobile networks," IEEE/ACM Trans. Netw., vol. 17, pp. 1564–1577, 2009.
[38] T. Hossmann, T. Spyropoulos, and F. Legendre, "Putting contacts into context: mobility modeling beyond inter-contact times," in ACM MobiHoc, 2011, pp. 18:1–18:11.
[39] N. Halko, P. Martinsson, and J. Tropp, "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," SIAM Review, vol. 53, no. 2, pp. 217–288, 2011.

APPENDIX

A. Fast Algorithm

Solving the program (2) using a standard package is feasible only for small or medium-size problems. In this section, we describe a faster algorithm that is suitable for larger-scale datasets. Our method is based on matrix factorization.
Each positive semidefinite cover matrix $Y^t$ can be factorized as $Y^t = U^t (U^t)^\top$, where $U^t \in \mathbb{R}^{n \times r}$ and $\|Y^t\|_* = \|U^t\|_F^2$. Here $r$ is any upper bound on the number of clusters at each snapshot; one can always use $r = n$, but a smaller value is more desirable. We consider the Lagrangian of the original constrained formulation. The optimization problem becomes

$$\max_{U^t} \; \sum_{t=1}^{T} f\big(U^t (U^t)^\top \mid A^t\big) - \gamma \sum_{t=1}^{T-1} d\big(Y^{t+1}, Y^t\big), \quad \text{s.t. } \|U^t\|_F^2 \le B. \tag{6}$$

Choosing the multiplier $\gamma$ is equivalent to choosing $\delta$ in the original formulation. We use (sub-)gradient descent to solve the problem:

$$U^t \leftarrow P_{\sqrt{B}}\Big[ U^t + \tau_t \Big( \nabla f\big(U^t (U^t)^\top\big) - \gamma \nabla_2 d\big(U^{t+1} (U^{t+1})^\top, U^t (U^t)^\top\big) - \gamma \nabla_1 d\big(U^t (U^t)^\top, U^{t-1} (U^{t-1})^\top\big) \Big) U^t \Big], \tag{7}$$

where $\nabla$ is the sub-gradient operator, $\nabla_i$ denotes the (sub-)gradient w.r.t. the $i$-th argument, $P_{\sqrt{B}}(X)$ is the Euclidean projection of $X$ onto the Frobenius norm ball $\{Z : \|Z\|_F \le \sqrt{B}\}$ (i.e., scale down $X$ to have Frobenius norm $\sqrt{B}$ if and only if it is outside the ball), and $\tau_t$ is the step size. As with all gradient descent methods, the above procedure is guaranteed to converge provided $\tau_t \to 0$. In this paper, we use a geometrically decreasing step size $\tau_t = 0.001 \cdot 0.995^t$.

B. Complexity and Scalability

We analyze the memory and time complexity of the gradient descent algorithm.

Memory complexity: We need to store the adjacency matrices $\{A^t\}$ and the factorization $\{U^t\}$, which require $O(E)$ and $O(nrT)$ memory, respectively; here $E$ is the total number of edges in all snapshots, $n$ the number of nodes, $r$ the maximum number of clusters at each snapshot, and $T$ the number of snapshots. The total memory complexity is $O(E + nrT)$. The online implementation suggested in Section III-E will further alleviate the dependence on $T$.

Time complexity: The algorithm requires time $T_1$ for computing an initial point, and $T_2$ for each iteration, with $M$ iterations in total. Here we initialize $U^t$ by taking a rank-$r$ SVD of $A^t$.
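As a minimal sketch of the projection $P_{\sqrt{B}}$ and the step-size schedule in update (7), the following assumes that matrices are stored as plain Python lists of rows and that the direction `G` (the bracketed gradient combination in (7), already multiplied by $U^t$) has been computed elsewhere. The function names are illustrative and do not come from the paper.

```python
import math


def frobenius_norm(X):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in X for x in row))


def project_frobenius_ball(X, B):
    """Euclidean projection onto {Z : ||Z||_F <= sqrt(B)}.

    X is rescaled to norm sqrt(B) only if it lies outside the ball;
    otherwise an unchanged copy is returned.
    """
    radius = math.sqrt(B)
    norm = frobenius_norm(X)
    if norm <= radius:
        return [row[:] for row in X]
    scale = radius / norm
    return [[scale * x for x in row] for row in X]


def step_size(t, tau0=0.001, decay=0.995):
    """Geometrically decreasing step size tau_t = 0.001 * 0.995**t."""
    return tau0 * decay ** t


def projected_step(U, G, t, B):
    """One update U <- P_sqrt(B)[U + tau_t * G].

    G is assumed to be the precomputed direction from (7); how it is
    obtained depends on the choice of f and d and is not shown here.
    """
    tau = step_size(t)
    moved = [
        [u + tau * g for u, g in zip(u_row, g_row)]
        for u_row, g_row in zip(U, G)
    ]
    return project_frobenius_ball(moved, B)
```

This covers only the summation and the projection in (7); the (sub-)gradient computations are left out because they depend on the specific $f$ and $d$.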
For each $t$, this SVD can be computed in time $O(rE_t + nr^2)$ (see [39]), where $E_t$ is the number of edges in snapshot $t$. So $T_1 = O(rE + nr^2 T)$. Now consider the update (7). Computing the products of the three (sub-)gradients with $U^t$ takes time $O(r^2 E_t + nr)$, $O(r^2 E_t)$ and $O(r^2 E_t)$, respectively, by taking advantage of the fact that we can use any sub-gradient. The summation and the projection both take $O(nr)$. We thus have $T_2 = O(r^2 E + nrT)$. The total time complexity is then $O(nr^2 T + M r^2 E + M nrT)$. Rigorously characterizing the number of iterations $M$ needed for a specified accuracy is difficult. However, as observed empirically in our simulations and many other studies, $M$ is independent of $E$, $n$ and $T$, and can be treated as $O(1)$.

In summary, with a bounded number of clusters $r$, both the space and time complexity scale linearly in $E$ and $nT$. This is the best one can hope for, as it takes at least this much space and time to read the input and write down the final solution.

C. Proof of Theorem 1

The following lemma shows that it suffices to study the Lagrangian formulation. Recall that $\|X\|_1 = \sum_{i,j} |X_{ij}|$ is the matrix $\ell_1$ norm of $X$. Let $\circ$ denote the entry-wise product.

Lemma 1. $Y^*$ is the unique optimal solution to (5) if there exists a $\lambda$ such that $Y^*$ is the unique optimal solution to the following problem:

$$\min_Y \; \|Y\|_* + \lambda \sum_t \|C \circ (Y - A^t)\|_1. \tag{8}$$

Proof: Let $g(Y) = \|Y\|_*$ and $h(Y) = \sum_t \|C \circ (Y - A^t)\|_1$. Note that $g(Y^*) = n$. By standard convex analysis and the fact that $Y^*$ is optimal to (8), we have the following chain of inequalities:

$$(5) = \min_{Y : g(Y) \le n} h(Y) = \max_{\lambda'} \min_Y \Big[ h(Y) + \frac{1}{\lambda'}\big(g(Y) - n\big) \Big] \ge \frac{1}{\lambda} \min_Y \Big[ \lambda h(Y) + \big(g(Y) - n\big) \Big] = h(Y^*) \ge (5).$$

Therefore, equality holds throughout, which proves that $Y^*$ is an optimal solution to (5). We prove uniqueness by contradiction.
If $Y^*$ is not the unique optimal solution to (5), then there exists $Y' \neq Y^*$ with $g(Y') \le n$ and $h(Y') = (5)$. Using the equality we just proved, we have

$$h(Y') + \frac{1}{\lambda}\big(g(Y') - n\big) \le h(Y') = (5) = \frac{1}{\lambda} \min_Y \Big[ \lambda h(Y) + \big(g(Y) - n\big) \Big],$$

which contradicts the assumption that $Y^*$ is the unique optimal solution to (8).

To prove Theorem 1, it suffices to show that $Y^*$ is the unique optimal solution to (8) with $\lambda = \sqrt{\frac{1}{16mn}}$. We do this by showing that any other solution $Y^* + \Delta$ with $\Delta \neq 0$ has a higher objective value.

We define a matrix $W$ which serves as a dual certificate. Let $S^t = A^t - Y^*$, $\Omega_t = \{(i,j) \mid S^t_{ij} \neq 0\}$, $R = \{(i,j) \mid Y^*_{ij} = 1\}$, and let $U$ be the matrix whose columns are the singular vectors of $Y^*$. For any entry set $\Omega \subseteq [n] \times [n]$, let $1_\Omega$ denote the matrix whose entries in $\Omega$ equal 1 and whose other entries equal 0. Define $W = \sum_{t=1}^{m} V^t + \sum_t Z^t$, where

$$V^t = \frac{1}{m}\Big( -P_{\Omega_t} U U^\top + \frac{1-p}{p} P_{\Omega_t^c} U U^\top \Big),$$

$$Z^t = 2\lambda \Big( C \circ S^t + \frac{1-p}{p} \sum_{(i,j) \in R \cap \Omega_t^c} (1-s)\, 1_{(i,j)} - \frac{q}{1-q} \sum_{(i,j) \in R^c \cap \Omega_t^c} s\, 1_{(i,j)} \Big).$$

Due to the randomness in $\Omega_t$, both $\sum_t V^t$ and $\sum_t Z^t$ are random matrices with independent zero-mean entries, whose variances are bounded by $\frac{1}{K^2 m}$ and $4\lambda^2 m$, respectively, due to the setup of the model. Under our choice of $\lambda$ and the assumption of the theorem, they are further bounded by $\frac{1}{2n}$. Let $\|\cdot\|$ be the spectral norm (the largest singular value). Standard bounds on the spectral norm of random matrices guarantee that, with probability converging to one,

$$\|P_{T^\perp} W\| \le \Big\| \sum_t V^t \Big\| + \Big\| \sum_t Z^t \Big\| \le 1.$$

It follows that $U U^\top + P_{T^\perp} W$ is a subgradient of $\|Y\|_*$ at $Y^*$, which means

$$\|Y^* + \Delta\|_* - \|Y^*\|_* \ge \big\langle \Delta, \, U U^\top + P_{T^\perp} W \big\rangle \quad \text{for all } \Delta.$$

Also define $F^t = -\operatorname{sign}\big(P_{\Omega_t^c}(\Delta)\big)$, where $\operatorname{sign}(\cdot)$ is the signum function, so $\langle F^t, \Delta \rangle = -\|P_{\Omega_t^c} \Delta\|_1$. We also know that $C \circ (S^t + F^t)$ is a subgradient of $\|C \circ S^t\|_1$, so

$$\|C \circ (S^t - \Delta)\|_1 - \|C \circ S^t\|_1 \ge \big\langle C \circ (S^t + F^t), \, -\Delta \big\rangle.$$
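Combining the above discussion, we have

$$\|Y^* + \Delta\|_* - \|Y^*\|_* + \lambda \sum_t \Big( \|C \circ (S^t - \Delta)\|_1 - \|C \circ S^t\|_1 \Big) \ge \big\langle U U^\top + P_{T^\perp} W, \Delta \big\rangle + \lambda \sum_t \big\langle C \circ (S^t + F^t), -\Delta \big\rangle.$$

We bound each of the above two terms. Notice that

$$\begin{aligned}
\big\langle U U^\top + P_{T^\perp} W, \Delta \big\rangle
&= \big\langle U U^\top + W, \Delta \big\rangle - \langle P_T W, \Delta \rangle \\
&= \sum_t \Big\langle (P_{\Omega_t} + P_{\Omega_t^c})\Big( \frac{1}{m} U U^\top + V^t + Z^t \Big), \Delta \Big\rangle - \langle P_T W, \Delta \rangle \\
&\ge 2\lambda \sum_t \|P_{\Omega_t}(C \circ \Delta)\|_1 - \|P_T W\|_\infty \|\Delta\|_1 \\
&\quad + \sum_t \Big\langle \frac{1}{m} \frac{1}{1-q} P_{\Omega_t^c} U U^\top + 2\lambda \frac{1-p}{p} \sum_{(i,j) \in R \cap \Omega_t^c} (1-s)\, 1_{(i,j)} - 2\lambda \frac{q}{1-q} \sum_{(i,j) \in R^c \cap \Omega_t^c} s\, 1_{(i,j)}, \; \Delta \Big\rangle;
\end{aligned}$$

here $\|M\|_\infty := \max_{i,j} |M_{ij}|$ is the matrix $\ell_\infty$ norm. Under the assumption of the theorem, we have

$$\Big\langle \frac{1}{m} \frac{1}{1-q} P_{\Omega_t^c} U U^\top + 2\lambda \frac{1-p}{p} \sum_{(i,j) \in R \cap \Omega_t^c} (1-s)\, 1_{(i,j)}, \; \Delta \Big\rangle \ge -\frac{1}{2} \lambda \, \|P_{R \cap \Omega_t^c}(C \circ \Delta)\|_1$$

and

$$\Big\langle -2\lambda \frac{q}{1-q} \sum_{(i,j) \in R^c \cap \Omega_t^c} s\, 1_{(i,j)}, \; \Delta \Big\rangle \ge -\frac{1}{2} \lambda \, \|P_{R^c \cap \Omega_t^c}(C \circ \Delta)\|_1.$$

Moreover, observe that each entry $(P_T W)_{ij} = \frac{1}{K} \sum_{k=1}^{K} W_{ij}$, which is the sum of independent random variables with bounded variance as previously discussed. Under the assumption of the theorem, this sum is bounded by $\frac{1}{\sqrt{K}} \sqrt{\frac{1}{2n}} \le \frac{1}{4} m \lambda \min\{s, 1-s\}$ with probability converging to one by the standard Bernstein inequality; $\|P_T W\|_\infty$ is bounded by the same quantity using a union bound. It follows that

$$\big\langle U U^\top + P_{T^\perp} W, \Delta \big\rangle \ge \frac{7}{4} \lambda \sum_t \|P_{\Omega_t}(C \circ \Delta)\|_1 - \frac{3}{4} \lambda \sum_t \|P_{\Omega_t^c}(C \circ \Delta)\|_1.$$

On the other hand, we have

$$\lambda \sum_t \big\langle C \circ (S^t + F^t), -\Delta \big\rangle = -\lambda \sum_t \|P_{\Omega_t}(C \circ \Delta)\|_1 + \lambda \sum_t \|P_{\Omega_t^c}(C \circ \Delta)\|_1.$$

Combining pieces, we obtain

$$\begin{aligned}
&\|Y^* + \Delta\|_* - \|Y^*\|_* + \lambda \sum_t \Big( \|C \circ (S^t - \Delta)\|_1 - \|C \circ S^t\|_1 \Big) \\
&\quad \ge \big\langle U U^\top + P_{T^\perp} W, \Delta \big\rangle - \lambda \sum_t \|P_{\Omega_t}(C \circ \Delta)\|_1 + \lambda \sum_t \|P_{\Omega_t^c}(C \circ \Delta)\|_1 \\
&\quad \ge \frac{7}{4} \lambda \sum_t \|P_{\Omega_t}(C \circ \Delta)\|_1 - \frac{3}{4} \lambda \sum_t \|P_{\Omega_t^c}(C \circ \Delta)\|_1 - \lambda \sum_t \|P_{\Omega_t}(C \circ \Delta)\|_1 + \lambda \sum_t \|P_{\Omega_t^c}(C \circ \Delta)\|_1 \\
&\quad > 0.
\end{aligned}$$

This completes the proof of the theorem.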
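The final strict inequality can be made explicit by regrouping the four terms of the last bound. The sketch below assumes, as the argument appears to do implicitly, that every entry of $C$ is positive, so that $C \circ \Delta \neq 0$ whenever $\Delta \neq 0$; it also uses the identity $\|P_{\Omega_t} X\|_1 + \|P_{\Omega_t^c} X\|_1 = \|X\|_1$.

```latex
\begin{aligned}
&\tfrac{7}{4}\lambda \sum_t \|P_{\Omega_t}(C\circ\Delta)\|_1
 - \tfrac{3}{4}\lambda \sum_t \|P_{\Omega_t^c}(C\circ\Delta)\|_1
 - \lambda \sum_t \|P_{\Omega_t}(C\circ\Delta)\|_1
 + \lambda \sum_t \|P_{\Omega_t^c}(C\circ\Delta)\|_1 \\
&\quad = \tfrac{3}{4}\lambda \sum_t \|P_{\Omega_t}(C\circ\Delta)\|_1
 + \tfrac{1}{4}\lambda \sum_t \|P_{\Omega_t^c}(C\circ\Delta)\|_1 \\
&\quad \ge \tfrac{1}{4}\lambda \sum_t \|C\circ\Delta\|_1 \;>\; 0 .
\end{aligned}
```

Both coefficients remain positive after the two bounds are combined, which is why the objective of (8) strictly increases for every $\Delta \neq 0$.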