Bayesian Discovery of Threat Networks

5324 IEEE TRANSA CTIONS ON SIGNAL PR OCESSING, V OL. 62, NO. 20, 2014 Bayesian Disco v ery of Threat Netw orks Ste ven Thomas Smith, Senior Member , IEEE , Edward K. Kao, Member , IEEE , K enneth D. Senne, Life F ellow , IEEE , Garrett Bernstein, and Scott Philips Abstract —A novel uniﬁed Bay esian framework f or network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity , and is pr oved to be optimum in the Neyman-Pearson sense. The algorithm is deﬁned by a graph, at least one obser vation, and a diffusion model for threat. A link to well-kno wn spectral detection methods is provided, and the equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatio- temporal relationships between vertices, and is used for a speciﬁc space-time formulation that leads to signiﬁcant performance im- pro vements on coordinated covert networks. This performance is demonstrated using a new hybrid mixed-membership blockmodel introduced to simulate random cov ert networks with realistic properties. Index T erms —Network detection, optimal detection, maxi- mum likelihood detection, community detection, network theory (graphs), graph theory , diffusion on graphs, random walks on graphs, dynamic network models, Bayesian methods, harmonic analysis, eigen vector centrality , Laplace equations. I . I N T R O D U C T I O N N ETWORK detection is the objective in many di verse graph analytic applications, ranging from graph parti- tioning, mesh segmentation, manifold learning, community detection [44], network anomaly detection [10], [30], and the discov ery of clandestine networks [32], [43], [52], [56], [70]. A new Bayesian approach to network detection is de- veloped and analyzed in this paper , with speciﬁc application to detecting small, covert networks embedded within much larger background networks. The novel approach is based on a Bayesian probabilistic framework where the probability of threat is deriv ed from an observation model and an a priori threat diffusion model. Speciﬁcally , observed threats from one or more vertices are propagated through the graph using a Manuscript received Nov ember 15, 2013; revised March 17, 2014; accepted May 29, 2014. Date of publication July 08, 2014; date of current version September 8, 2004. The associate editor coordinating the re view of this manuscript and approving it for publication was Prof. Francesco V erde. This work is sponsored by the Assistant Secretary of Defense for Research & Engineering under Air Force Contract F A8721-05-C-0002. Opinions, inter- pretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government. S. T . Smith, K. D. Senne, G. Bernstein, and S. Philips are with the MIT Lincoln Laboratory , Lexington, MA 02420 USA (e-mail: stsmith@ll.mit.edu; edward.kao@ll.mit.edu; senne@ll.mit.edu; garrett.bernstein@ll.mit.edu). E. K. Kao is with the MIT Lincoln Laboratory , Lexington, MA 02420 USA, and also with the Department of Statistics, Harvard University; Cambridge MA USA 02138 (e-mail: edwardkao@fas.harv ard.edu). Color versions of one or more of the ﬁgures in this paper are av ailable online at http://ieeexplore.ieee.org. Digital Object Identiﬁer 10.1109/TSP .2014.2336613 model based on random walks represented as Marko v chains with absorbing states. The resulting network detection algo- rithm is proved to be optimum in the Neyman–Pearson sense of maximizing the probability of detection at a ﬁxed false alarm probability . In the speciﬁc case of space-time graphs with time-stamped edges, a model for threat diffusion yields the new space-time threat propagation algorithm, which is shown to be an optimal detector for covert networks with coordinated activity . Network detectors are analyzed using both a stochastic framew ork of random walks on the graph and a probabilistic framew ork. The two frame works are shown to be equiv alent, providing an original, uniﬁed approach for Bayesian network detection. Performance for a variety of Bayesian network de- tection algorithms is shown with both a stochastic blockmodel and a new hybrid mixed-membership blockmodel (HMMB) introduced to simulate random cov ert networks with realistic properties. Using insights from algebraic graph theory , the connection between this uniﬁed framework and other spectral-based net- work detection methods [18], [22], [44] is shown, and the two approaches are contrasted by comparing their different optimality criteria based on detection probability and subgraph connectivity properties. The random walk framework provides a connection with many other well-kno wn graph analytic methods that may also be posed in this context [7], [11], [15], [34], [46], [54], [62]. In contrast to other research on network detection, rather than using a sensor network to detect signals [3], [13], [30], the signal of interest in this paper is the network. In this sense the paper is also related to work on so- called manifold learning methods [8], [10], [16], although the network to be detected is a subgraph of an existing network, and therefore the methods described here belong to a class of netw ork anomaly detection [10] as well as maximum- likelihood methods for network detection [21]. Threat network discovery is predicated on the existence of observations of network relationships. Detection of network communities is most likely to be effecti ve if the communities exhibit high lev els of connection activity . The covert networks of interest in this paper exist to accomplish nefarious, illegal, or terrorism goals, while “hiding in plain sight” [70]. Covert networks necessarily adopt operational procedures to remain hidden and robustly adapt to losses of parts of the network [9], [52], [61], [66]. This paper’ s major contributions are organized into a de- scription of the nov el approach to Bayesian network detection in Section III, and showing and comparing detection perfor- mance using simulations of realistic networks in Section IV. 1053-587X c  2014 IEEE. T ranslations and content mining are permitted for academic research only . Personal use is also permitted, but republication/ redistribution requires IEEE permission. See http://www .ieee.org/publications˙standards/publications/rights/index.html for more information. SMITH et al. : BA YESIAN DISCOVER Y OF THREA T NETW ORKS 5325 Fundamental ne w results are established in Theorems 1 – 3, which pro ve a maximum principal for threat propagation, provide a nonnegati ve basis for the principal inv ariant sub- space, and pro ve the equiv alence between the probabilistic and stochastic realization approaches of threat propagation. The Neyman–Pearson optimality of threat propagation is es- tablished in Theorem 4. I I . B AC K G R O U N D A. Notation A graph G = ( V , E ) is deﬁned by two sets, the vertices V , and the edges E ⊂ [ V ] 2 , in which [ V ] 2 denotes the set of 2 -element subsets of V [17]. For example, the sets V = { 1 , 2 , 3 } , E =  { 1 , 2 } , { 2 , 3 }  describe a simple graph with undirected edges between vertices 1 and 2 , and 2 and 3 : 1 − − 2 − − 3 . The or der and size of G are deﬁned to be # V and # E , respectively . A subgraph G 0 ⊆ G is a graph ( V 0 , E 0 ) with V 0 ⊆ V and E 0 ⊆ E . If E 0 contains all edges in E with both endpoints in V 0 , then G 0 = G [ V 0 ] is the induced subgraph of V 0 . The adjacency matrix A = A ( G ) of G is the { 0 , 1 } -matrix with a ij = 1 iff { i, j } ∈ E . In the example, A =  0 1 0 1 0 1 0 1 0  . The adjacency matrix of simple or undirected graphs is necessarily symmetric. The de gree matrix D = Diag( A · 1 ) is the diagonal matrix of the vector of degrees of all vertices, where 1 = (1 , . . . , 1) T is the vector of all ones. The neighborhood N ( u ) =  v : { u, v } ∈ E  of a verte x u ∈ V is the set of vertices adjacent to u , or equiv alently , the set of nonzero elements in the u -th row of A . The vertex space V ( G ) of G is the vector space of functions f : V → { 0 , 1 } . A dir ected graph G σ is deﬁned by an orientation map σ : [ V ] 2 → V × V (the ordered Cartesian product of V with itself) in which the ﬁrst and second coordinates are called the initial and terminal vertices, respectively . A strongly connected graph is a connected graph for which a directed path exists between any two vertices. The incidence matrix B = B ( G σ ) of G σ is the (0 , ± 1) -matrix of size # V -by- # E with B ie = ± 1 , if i is an terminal/initial v ertex of σ ( e ) , and 0 otherwise. For example, the directed graph 1 ← − 2 − → 3 has incidence matrix B =  1 0 − 1 − 1 0 1  . The unnormalized Laplacian matrix or Kirc hhoff matrix Q of a graph, the (normalized) Laplacian matrix L , and the generalized or asymmetric Laplacian matrix Ł are, respectively , Q = BB T = D − A , (1) L = D − 1 / 2 QD − 1 / 2 = I − D − 1 / 2 AD − 1 / 2 , (2) Ł = D − 1 / 2 LD 1 / 2 = D − 1 Q = I − D − 1 A . (3) In the e xample, Ł is immediately recognized as a discretization of the second deri vati ve − d 2 / dx 2 , i.e. the ne gativ e of the 1 -d Laplacian operator ∆ = ∂ 2 /∂ x 2 + ∂ 2 /∂ y 2 + · · · that appears in physical applications. The connection between the Laplacian matrices and physical applications is made through Green’ s ﬁrst identity , a link that explains many theoretical and performance advantages of the normalized Laplacian over the Kirchhoff matrix across applications [14], [64], [67], [68]. Solutions to Laplace’ s equation on a graph are directly connected to random walks or discrete Markov chains on the vertices of the graph, which provide stochastic realizations for harmonic problems. A (right) stochastic matrix T of a graph is a nonnegati ve matrix such that T1 = 1 . This represents a state transition matrix of a random walk on the graph with transition probability t ij of jumping from verte x v i to vertex v j . The Perron–Frobenius theorem guarantees if T is irreducible (i.e. G is strongly connected) then there exists a stationary probability distribution p v on V such that p T T = p T [26], [28]. Random walk realizations can be used to describe the solution to harmonic boundary value problems, e.g. equilibrium thermodynamics [47], [51], in which gi ven values are proscribed at speciﬁc “boundary” vertices. B. Network Detection Network detection is a special class of the more general graph partitioning (GP) problem in which the binary decision of membership or non-membership for each graph vertex must be determined. Indeed, the network detection problem for a graph G of order N results in a 2 N -ary multiple hypothesis test ov er the vertex space V ( G ) , and, when detection optimality is considered, an optimal test inv olves partitioning the measure- ment space into 2 N regions yielding a maximum probability of detection (PD). This NP-hard combinatoric problem is computationally and analytically intractable. In general, net- work detection methods in vok e various relaxation approaches to avoid the NP-hard network detection problem. The new Bayesian threat propagation approach taken in this paper is to greatly simplify the general 2 N -ary multiple hypothesis test by applying the random walk model and treating it as N independent binary hypothesis tests. This approach is related to existing network detection methods by posing an optimization problem on the graph—e.g. threat propagation maximizes PD—and through solutions to Laplace’ s equation on graphs. Because many network detection algorithms in volv e such solutions, a key fact is that the constant vector 1 = (1 , . . . , 1) T is in the kernel of the Laplacian, Q1 = 0 ; Ł 1 = 0 . (4) This constant solution does not distinguish between vertices at all, a deﬁciency that may be resolv ed in a variety of ways. Efﬁcient graph partitioning algorithms and analysis ap- peared in the 1970s with Donath and Hof fman’ s eigenv alue- based bounds for graph partitioning [18] and Fiedler’ s con- nectivity analysis and graph partitioning algorithm [22] which established the connection between a graph’ s algebraic prop- erties and the spectrum of its Kirchhoff Laplacian ma- trix Q = D − A [Eq. (1)]. Spectral methods solve the graph partitioning problem by optimizing various subgraph connec- tivity properties. Similarly , the threat propagation algorithm dev eloped here in Section III optimizes the probability of detecting a subgraph for a speciﬁc Bayesian model. Though the optimality criteria for spectral methods and threat prop- agation are different, all these netw ork detection methods must address the fundamental problem of avoiding the trivial solution of constant harmonic functions on graphs. Threat 5326 IEEE TRANSA CTIONS ON SIGNAL PR OCESSING, V OL. 62, NO. 20, 2014 propagation av oids this problem by using observation vertices and a priori probability of threat dif fusion (Section III-A). Spectral methods take a complementary approach to a void this problem by using an alternate optimization criterion that depends upon the network’ s topology . The cut size of a subgraph—the number of edges necessary to remove to separate the subgraph from the graph—is quan- tiﬁed by the quadratic form s T Qs , where s = ( ± 1 , . . . , ± 1) T is a ± 1 -vector who entries are determined by subgraph mem- bership [50]. Minimizing this quadratic form over s , whose solution is an eigen v alue problem for the graph Laplacian, provides a network detection algorithm based on the model of minimal cut size. Howe ver , there is a paradox in the appli- cation of spectral methods to network detection: the smallest eigen value of the graph Laplacian λ 0 ( Q ) = 0 corresponds to the eigenv ector 1 constant over all vertices, which fails to discriminate between subgraphs. Intuitively this degenerate constant solution makes sense because the two subgraphs with minimal (zero) subgraph cut size are the entire graph itself ( s ≡ 1 ), or the null graph ( s ≡ − 1 ). This property manifests itself in many well-known results from complex analysis, such as the maximum principle. Fiedler showed that if rather the eigen vector ξ 1 correspond- ing to the second smallest eigen value λ 1 ( Q ) of Q is used (many authors write λ 1 = 0 and λ 2 rather than the zero of fset indexing λ 0 = 0 and λ 1 used here), then for ev ery nonpositiv e constant c ≤ 0 , the subgraph whose vertices are deﬁned by the threshold ξ 1 ≥ c is necessarily connected. This algorithm is called spectral detection . Giv en a graph G , the number λ 1 ( Q ) is called the Fiedler value of G , and the corresponding eigen vector ξ 1 ( Q ) is called the F iedler vector . Completely analogous with comparison theorems in Riemannian geometry that relate topological properties of manifolds to algebraic properties of the Laplacian, many graph topological properties are tied to its Laplacian. For example, the graph’ s diameter D and the minimum degree d min provide lo wer and upper bounds for the Fiedler v alue λ 1 ( Q ) : 4 / ( nD ) ≤ λ 1 ( Q ) ≤ n/ ( n − 1) · d min [41]. This inequality explains why the Fiedler value is also called the algebr aic connectivity : the greater the Fiedler value, the smaller the graph diameter , implying greater graph connectivity . If the normalized Laplacian L of Eq. (2) is used, the corresponding inequality inv olving the generalized eigen value λ 1 ( L ) = λ 1 ( Q , D ) in volv es the graph’ s diame- ter D and volume V : 1 / ( D V ) ≤ λ 1 ( L ) ≤ n/ ( n − 1) [14]. Because in practice spectral detection with its implicit as- sumption of minimizing the cut size oftentimes does not detect intuitiv ely appealing subgraphs, Ne wman introduced the alter- nate criterion of subgraph “modularity” for subgraph detec- tion [44]. Rather than minimize the cut size, Ne wman proposes to maximize the subgraph connectivity relative to background graph connectivity , which yields the quadratic maximization problem max s s T Ms , where M = A − V − 1 dd T is New- man’ s modularity matrix , A is the adjacency matrix, ( d ) i = d i is the degree vector , and V = 1 T d is the graph volume [44]. Newman’ s modularity-based graph partitioning algorithm, also called community detection, in volves thresholding the values of the principal eigenv ector of M . Miller et al. [38]–[40] also consider thresholding arbitrary eigen vectors of the modularity matrix, which by the Courant minimax principle biases the Newman community detection algorithm to smaller subgraphs, a desirable property for many applications. They also outline an approach for exploiti ng observ ations within the spectral framew ork [38]. Other graph partitioning methods in voke alternate relaxation approaches that yield practical detection/partitioning algo- rithms such as semideﬁnite programming (SDP) [6], [35], [69]. A class of graph partitioning algorithms is based on inﬁnite random walks on graphs [59]. Zhou and Lipowsky deﬁne proximity using the average distance between vertices [72]. Anderson et al. deﬁne a local version biased towards spe- ciﬁc vertices [4]. Mahoney et al. develop a local spectral partitioning method by augmenting the quadratic optimization problem with a locality constraint and relaxing to a con ve x SDP [37]. An important dual to network detection is the problem of identifying the source of an epidemic or rumor using observations on the graph [53], [54]. Another related problem is the determination of graph topologies for which epidemic spreading occurs [11], [62]. The approach adopted in this paper has fundamentally dif ferent objectives and propaga- tion models than the closely-related epidemiological problems. These problems focus on disease spreading to large portions of the entire graph, which arises because disease may spread from any infected neighbor—yielding a logical OR of neighborhood disease. Network detection focuses on discov ering a subgraph most likely associated with a set of observed vertices, assum- ing random walk propagation to the observations—yielding an arithmetic mean of neighborhood threat. All of these methods are related to spectral partitioning through the graph Laplacian. I I I . B A Y E S I A N N E T W O R K D E T E C T I O N The Bayesian model de veloped here depends upon threat observation and propagation via random walks o ver both space and time, and the underlying probabilistic models that govern inference from observ ation to threat, then propagation of threat throughout the graph. Bayes’ rule is used to dev elop a network detection approach for spatial-only , space-time, and hybrid graphs. The framework assumes a giv en Markov chain model for transition probabilities, and hence knowledge of the graph, and a diffusion model for threat. Neyman–Pearson optimality is developed in the context of network detection with a simple binary hypothesis, and it is prov ed that threat propagation is optimum in this sense. The framew ork is sufﬁciently general to capture graphs formed by many possible relationships between entities, from simple graphs with vertices that represent a single type of entity , to bipartite or multipartite graphs with heterogeneous entities. For example, an email network is a bipartite graph comprised of two types of vertices: individual people and in- dividual email messages, with edges representing a connection between people and messages. Without loss of generality , all entity types to be detected are represented as vertices in the graph, and their connections are represented by edges weighted by scalar transition probabilities. Network detection is the problem of identifying a speciﬁc subgraph within a given graph G = ( V , E ) . Assume that SMITH et al. : BA YESIAN DISCOVER Y OF THREA T NETWORKS 5327 within G , a foreground or “threat” network V Θ exists deﬁned by an (unkno wn) binary random variable: Deﬁnition 1 Threat is a { 0 , 1 } -valued discr ete random vari- able. Threat on a graph G = ( V , E ) is a { 0 , 1 } -valued function Θ ∈ V ( G ) . Thr eat at the vertex v is denoted Θ v . A vertex v ∈ V is in the foreground if Θ v = 1 , otherwise v is in the background . The foreground or threat vertices are the set V Θ = { v : Θ v = 1 } , and the foreground or threat network is the induced subgraph G Θ = G [ V Θ ] . A network detector of the subgraph G Θ is a collection of binary hypothesis tests to decide which of the graph’ s vertices belong to the foreground vertices V Θ . Formally , a network detector is an element of the vertex space of G : Deﬁnition 2 Let G = ( V , E ) be a graph. A network detec- tor φ on G is a { 0 , 1 } -valued function φ ∈ V ( G ) . The induced subgraph G φ = G [ V φ ] of V φ = { v : φ v = 1 } is called the foreground network and the induced subgraph G ˜ φ = G [ V ˜ φ ] of V ˜ φ = { v : φ v = 0 } is called the background network , in which ˜ φ denotes the logical complement of φ . The correlation between a network detector φ and the actual threat network deﬁned by the function Θ determines the detection performance of φ , measured using the detector’ s probability of detection (PD) and probability of false alarm (PF A). The PD and PF A of φ are the fraction of correct and incorrect foreground vertices determined by φ : PD φ = #( V φ ∩ V Θ ) / # V Θ , (5) PF A φ = #( V φ ∩ V ˜ Θ ) / # V ˜ Θ . (6) Observation models are now introduced and applied in the sequel to threat propagation models in the contexts of spatial- only graphs, space-time graphs whose edges have time stamps, and ﬁnally a hybrid graphs with edges of mixed type. Assume that there are C observed vertices { v b 1 , . . . , v b C } ⊂ V at which observations are taken. In the resulting Laplacian prob- lem, these are “boundary” vertices, and the rest are “interior . ” The simplest case in volv es scalar measurements; ho wever , there is a straightforward extension to multidimensional ob- servations. Deﬁnition 3 Let G = ( V , E ) be a graph. An observation on the graph is a vector z : { v b 1 , . . . , v b C } → M ⊂ R C fr om C vertices to a measurement space M ⊂ R C . Ideally , observation of a foreground and/or background vertices unequi vocally determines whether the observed ver - tices lie in the foreground or background networks, i.e. gi ven a foreground graph G Θ = G [ V Θ ] and a foreground vertex v ∈ V Θ , an observation vector z ev aluated at v would yield z ( v ) = 1 , and z ev aluated at a background verte x v 0 ∈ V ˜ Θ would yield z ( v 0 ) = 0 . In general, it is assumed that the observation z ( v ) at v and the threat Θ v at v are not statistically independent, i.e. f  z ( v ) | Θ v  6 = f  z ( v )  for probability density f , so that there is positiv e mutual information between z ( v ) and Θ v . Bayes’ rule for determining how likely a verte x is to be a foreground member or not depends on the model linking observations to threat: Deﬁnition 4 Let G Θ = G [ V Θ ] be the for e gr ound graph of a graph G determined by Θ ∈ V ( G ) , and let z : { v b 1 , . . . , v b C } → M ⊂ R C be an observation on G . The conditional pr obability density f  z ( v ) | Θ v  is called the observation model of verte x v ∈ V . The simplest, ideal observation model equates threat with observation so that f ideal  z ( v ) | Θ v  = δ z ( v )Θ v in which δ ij is the Kronecker delta. Though the threat network hypotheses are being treated here independently at each verte x, this framew ork allows for more sophisticated global models that include hypotheses ov er two or more vertices. The remainder of this section is de voted to the dev elopment of Bayesian methods of using measurements on a graph to determine the probability of threat on a graph in various contexts—spatial-only , space-timed, and the h ybrid case—then showing that these methods are optimum in the Neyman– Pearson sense of maximizing the probability of detection at a given false alarm rate. The motivating problem is: Problem 1 Detect the fore gr ound graph G Θ = G [ V Θ ] in the graph G = ( V , E ) with an unknown for e gr ound Θ ∈ V ( G ) and known observation vector z ( v b 1 , . . . , v b C ) . This problem is addressed by computing the probability of threat P (Θ v ) at all graph vertices from the measurements at observed vertices using an observation model and the application of Bayes’ rule. A. Spatial Thr eat Pr opagation A spatial threat propagation algorithm is moti vated and dev eloped now , which will be used in the subsequent space- time generalization, and will demonstrate the connection to spectral network detection methods. A verte x is declared to be threatening if the observed threat propagates to that vertex. W e wish to compute the probability of threat P (Θ v = 1 | z ) at all vertices v ∈ V in a graph G = ( V , E ) giv en an observation z ( v b 1 , . . . , v b C ) on G . Implicit in Problem 1 is a coordinated threat network in which threat propagates via network connec- tions, i.e. graph edges. For simplicity , probabilities conditioned on the observ ation z will be written θ v = P (Θ v | z ) (7) with an implied dependence on the observation vector z and the event Θ v = 1 expressed as Θ v . T o model the diffusion of threat throughout the graph, we introduce an a priori probability ψ v at each verte x v that represents threat diffusion at v . ψ v is the probability that threat propagates through vertex v to its neighbors, otherwise threat propagates to an absorbing “non-threat” state with probability 1 − ψ v . A threat diffusion event at v is represented by the { 0 , 1 } -valued r .v . Ψ v : 5328 IEEE TRANSA CTIONS ON SIGNAL PR OCESSING, V OL. 62, NO. 20, 2014 Deﬁnition 5 The threat dif fusion model of a graph G = ( V , E ) with observation z is given by the a priori { 0 , 1 } -valued event Ψ v that threat Θ v pr opagates through v with pr obability ψ v . Threat propagation on the graph from the observed vertices to all other vertices is deﬁned as an average over all random walks between vertices and the observations. A single random walk between v and an observed vertex v b c is deﬁned by the sequence walk v → v b c = ( v w 1 , v w 2 , . . . , v w L ) (8) with endpoints v w 1 = v and v w L = v b c , comprised of L steps along vertices v w l ∈ V . The probabilities for each step of the random walk are deﬁned by the elements of the transition matrix t v u from verte x v to u , multiplied by the a priori prob- ability ψ v that threat propagates through v . The assumption that G is strongly connected guarantees the e xistence of a walk between e very vertex and e very observation. Threat may be absorbed to the non-threat state with probability 1 − ψ v w l at each step. The simplest models for both the transition and a priori probabilities are uniform: t ij = 1 / degree( v i ) for ( i, j ) ∈ E , i.e. T = D − 1 A , and ψ v ≡ 1 . The implications of these simple models as well as more general weighted models will be explored throughout this section. The indicator function I walk v → v b c = Y L l =1 Ψ ( l ) v w l (9) determines whether threat propagates along the walk or is absorbed into the non-threat state (the superscript ‘ ( l ) ’ allows for the possibility of repeated vertices in the sequence). The deﬁnition of threat propagation is captured in three parts: (1) a single random walk, walk v → v b c , with I walk v → v b c = 1 yields threat probability θ v b c at v ; (2) the probability of threat av eraged ov er all such random walks; (3) the random v ariable obtained by averaging the r .v . Θ v b c ov er all such random walks. Formally , Deﬁnition 6 (Threat Propagation). Let G = ( V , E ) be a str ongly connected graph with threat pr obabilities θ v b 1 , . . . , θ v b C at observed vertices v b 1 , . . . , v b C and the threat diffusion model ψ v for all v ∈ V . (1) F or a random walk on G fr om v to observed vertex v b c with transition matrix T , walk v → v b c = ( v w 1 , v w 2 , . . . , v w L ) , if events Ψ v w l ≡ 1 for all vertices v w l along the walk, then the threat propagation from v b c to v along walk v → v b c is deﬁned to be θ v b c ; otherwise, the thr eat equals zer o. (2) Threat propagation to vertex v is deﬁned as the expectation of threat propa gation to v along all random walks emanating fr om v , θ v = lim K →∞ 1 K X k I walk ( k ) v → v b c ( k ) θ v b c ( k ) , (10) wher e the k th walk terminates at the observed vertex v b c ( k ) . (3) Random threat propagation to vertex v is deﬁned as the random variable ¯ Θ v = lim K →∞ 1 K X k I walk ( k ) v → v b c ( k ) Θ ( k ) v b c ( k ) (11) v Single hop walk P ( Θ v | z ) = P ( v → v b ) P ( Θ v b ) = ψ v P ( Θ v b ) v b z v Multiple hop walk v b z 1 2 v Random walk Θ v = lim K –1 � k I v → v b ( k ) Θ v b ( k ) → P ( Θ v | z ) = P ( v → v b ) P ( Θ v b ) v b z 1 2 3 4 ψ 3 ψ 1 ψ 2 ψ 4 ψ v ψ 1 ψ 2 ψ v ψ v P ( Θ v | z ) = P ( v → v b ) P ( Θ v b ) = ψ 1 ψ 2 ψ v P ( Θ v b ) a.s. Fig. 1. Illustration of the random walk representation for threat propagation from Deﬁnition 6 and Eqs. (11) and (33), for the case of a single observation. The upper illustration shows the simplest, trivial case with a single hop from the observation to the vertex. The middle illustration shows the next simplest case with multiple hops. The lower illustration sho ws an example of the general case, comprised of the simpler multiple hop case. with independent draws Θ ( k ) v b c ( k ) of the observed threat. Fig. 1 illustrates threat propagation of Deﬁnition 6 and Eq. (11) [and Eq. (33) from the sequel] for the simple-to- general cases of a single hop, multiple hops, and an arbitrary random walk. By the la w of large numbers, ¯ Θ v a . s . → θ v as K → ∞ . (12) The random walk model is described using the distinct yet equiv alent probabilistic and stochastic realization repre- sentations. The probabilistic representation describes the threat probabilities by a Laplacian system of linear equations, which amounts to equating threat at a verte x to an av erage of neighboring probabilities. In contrast, the stochastic realization representation presented below in Section III-A2 describes the ev olution of a single random walk realization whose ensemble statistics are described by the probabilistic representation, presented next. 1) Pr obabilistic Approac h: Consider the (unobserved) verte x v 6∈ { v b 1 , . . . , v b C } with neighbors N ( v ) = { v n 1 , . . . , v n d v } ⊂ V and d v = degree( v ) . The probabilistic equation for threat propagation from the neighbors of a ver - tex v follows immediately from Deﬁnition 6 from ﬁrst-step analysis, yielding the threat pr opagation equation: θ v = ψ v X u ∈ N ( v ) t v u θ u , (13) which is simply the a verage of the neighboring threat proba- bilities weighted by transition probabilities t v u = ( T ) v u . Note that because A θ ≥ θ , θ v is a subharmonic function on the graph [19], [28]. In the simplest case of uniform transition probabilities, T = D − 1 A and θ v = ψ v d v X u ∈ N ( v ) θ u . (14) SMITH et al. : BA YESIAN DISCOVER Y OF THREA T NETWORKS 5329 Expressed in matrix-vector notation, Eqs. (13) and (14) be- come θ = ΨT θ and θ = ΨD − 1 A θ , (15) where ( θ ) v = θ v , Ψ = Diag( ψ v ) is the diagonal matrix of a priori threat diffusion probabilities, T , D , and A are, respectiv ely , the transition, degree, and adjacency matrices. The threat probabilities at the observed vertices v b 1 , . . . , v b C are determined by the observ ation model of Deﬁnition 4, and threat probabilities at all other vertices are determined by solving Eq. (15), as with all Laplacian boundary value problems. As seen in the spectral network detection methods in Section II-B, many network detection algorithms exploit prop- erties of the graph Laplacian, and therefore must address the fundamental challenge posed by the implication of the maximum principle that harmonic functions are constant [19] in many important situations [Eq. (4)], and because the con- stant function does not distinguish between vertices, detection algorithms that rely only on solutions to Laplace’ s equation provide a futile approach to detection. If the boundary is constant, i.e. the probability of threat on all observed vertices is equal, then this is the probability of threat on every verte x in the graph. The later example is relev ant in the practical case in which there a single observation. The maximum principle applies directly to threat propagation with uniform prior Ψ = I and uniform probability of threat p o on the observed vertices: Eqs. (15) are recognized as Laplace’ s equation, ( I − T ) θ = 0 or ( I − D − 1 A ) θ = 0 , whose solution is trivially θ = p o 1 . Equiv alently , from the stochastic realization point-of-view , the probability of threat on all vertices is the same because av erage ov er all random walks between any vertex to a boundary (observed) verte x is trivially the observed, constant probability of threat p o . The following maximum principle establishes the existence of a unique non-negati ve threat probability on a graph given threat probabilities at observed vertices: Theorem 1 (Maximum Principle for Threat Propagation). Let G = ( V , E ) be a connected graph with positive pr obability of threat θ v b 1 , . . . , θ v b C at observed vertices v b 1 , . . . , v b C and the a priori pr obability ψ v that threat pr opagates thr ough vertex v . Then ther e exists a unique pr obability of threat θ v at all vertices such that θ v ≥ 0 and the maximum threat occurs at the observed vertices. Proof: That θ v exists follows from the connectivity of G , and that it takes its maximum on the boundary follo ws immediately from Eq. (14) because the threat at all vertices is necessarily bounded abo ve by their neighbors. Now prove that θ v is nonnegati ve by establishing a contradiction. Let θ m be the minimum of all θ v < 0 . Because ψ m ≤ 1 , Eq. (14) implies that θ m ≥ Avg[ N ( m )] , the weighted average value of the neighbors of m . Therefore, there exists a neighbor n ∈ N ( m ) such that θ n ≤ θ m . But θ m is by assumption the minimum value. Therefore, θ n = θ m for all n ∈ N ( m ) . Because G is connected, θ v ≡ θ m is constant for all unobserved v ertices on G . Now consider the minimum threat θ i for which i ∈ N ( b ) is a neighbor of an observed vertex b . By Eq. (14), θ i = ψ i d − 1 i  P j ∈ N ( i ) \ b θ j + θ b  , (16) ≥ ψ i d − 1 i  ( d i − 1) θ i + θ b  . (17) Therefore, θ i ≥ ψ i θ b (1 − ψ i ) d i + ψ i ≥ 0 , (18) a contradiction. Therefore, the minimum value of θ v is non- negati ve. This theorem is intuitively appealing because it sho ws how nonuniform a priori probabilities ψ v yield a nonconstant and nonnegati ve threat on the graph; howe ver , the theorem conceals the crucial additional “absorbing” state that allows threat to dissipate away from the constant solution. This slight defect will be corrected shortly when the equiv alent stochastic realization Markov chain model is introduced. Models about the likelihood of threat at speciﬁc vertices across the graph are provided by the a priori probabilities ψ v , which as discussed abov e prev ent the uninformati ve (yet valid) solution of con- stant threat across the graph gi ven an observ ation of threat at a speciﬁc verte x. A simple model for the a priori probabilities is degree- weighted threat propagation (DWTP), ψ v = 1 d v (D WTP) , (19) in which threat is less likely to propagate through high-degree vertices. Another simple model sets the mean propagation length proportional to the graph’ s average path length l ( G ) yields length-weighted threat propagation (L WTP) ψ v ≡ 2 − 1 / l ( G ) (L WTP) . (20) For almost-surely connected Erd ˝ os–R ´ enyi graphs with p = n − 1 log n , l ( G ) = (log n − γ ) / log log n + 1 / 2 and γ = 0 . 5772 . . . is Euler’ s constant [25]. A model akin to breadth-ﬁrst search (BFS) sets the a priori probabilities to be in versely proportional to the Dijkstra distance from observed vertices, i.e. ψ v ∝ 1 / dist( v , { v b 1 , . . . , v b C } ) (BFS) . (21) Deﬁning the generalized Laplacian operator Ł ψ def = I − ΨD − 1 A , (22) the threat propagation equation Eq. (15) written as Ł ψ θ = 0 , (23) connects the generalized asymmetric Laplacian matrix of with threat propagation, the solution of which itself may be viewed as a boundary v alue problem with the harmonic operator Ł ψ . Giv en observations at vertices v b 1 , . . . , v b C , the harmonic thr eat pr opagation equation is  Ł ψ ii Ł ψ ib  θ i θ b  = 0 (24) where the generalized Laplacian Ł ψ =  Ł ψ ii Ł ψ bi Ł ψ ib Ł ψ bb  and the threat vector θ =  θ i θ b  hav e been permuted so that observed 5330 IEEE TRANSA CTIONS ON SIGNAL PR OCESSING, V OL. 62, NO. 20, 2014 vertices are in the ‘ b ’ blocks (the “boundary”), unobserved vertices are in ‘ i ’ blocks (the “interior”), and the observation vector θ b is given. The harmonic thr eat is the solution to Eq. (24), θ i = − ( Ł ψ ii ) − 1 ( Ł ψ ib θ b ) . (25) Eq. (24) is directly analogous to Laplace’ s equation ∆ ϕ = 0 giv en a ﬁxed boundary condition. As discussed in the next subsection and Section II-B, the connection between threat propagation and harmonic graph analysis also provides a link to spectral-based methods for network detection. In practice, the highly sparse linear system of Eq. (25) may be solved by simple repeated iteration of Eq. (13), or using the biconjugate gradient method, which provides a practical computational approach that scales well to graphs with thousands of vertices and thousands of time samples in the case of space-time threat propagation, resulting in graphs of order ten million or more. In practice, signiﬁcantly smaller subgraphs are encountered in applications such as threat network discov ery [56], for which linear solvers with sparse systems are extremely fast. 2) Stoc hastic Realization Appr oach: The stochastic realiza- tion interpretation of the Bayesian threat propagation equations (13) is that the probability of threat for one random walk from v to the observed vertex v b c is θ v | walk v → v b c = θ v b c , (26) and the probability of threat θ v at v equals the threat proba- bility av eraged ov er all random walks emanating from v . This is equi v alent to an absorbing Markov chain with absorbing states [49] at which random walks terminate. The absorbing vertices for the threat dif fusion model are the C observed vertices, and an augmented state reachable by all unobserved vertices representing a transition from threat to non-threat with probability 1 − ψ v . The ( N + 1) -by- ( N + 1) transition matrix for the Markov chain corresponding to threat propagation equals T =   N − C C 1 N − C G H 1 − ψ N − C C 0 I 0 1 0 0 1   (27) in which G and H are deﬁned by the block partition ΨD − 1 A =  N − C C N − C G H C ∗ ∗  (28) with ‘ ∗ ’ denoting unused blocks, and ψ N − C = ( ψ 1 , ψ 2 , . . . , ψ N − C ) T is the vector of a priori threat diffusion probabilities from 1 to N − C . The observed vertices v b 1 , . . . , v b C are assigned to indices N − C + 1 , . . . , N , and the augmented “non-threat” state is assigned to index N + 1 . According to this stochastic realization model, the threat at a vertex for any single random walk that terminates at an absorbing vertex is giv en by the threat le vel at the terminal verte x, with the augmented “non-threat” verte x assigned a threat lev el of zero; the threat is determined by this result av eraged over all random walks. Ignoring the a priori proba- bilities, this is also precisely the stochastic realization model for equilibrium thermodynamics and, in general, solutions to Laplace’ s equation [47], [51]. As in Eq. (4), the uniform vector ( N + 1) − 1 1 N +1 is the left eigen vector of T because T is a right stochastic matrix, i.e. T · 1 = 1 . For an irreducible transition matrix of a strongly connected graph, the Perron–Frobenius theorem [26], [28] guarantees that this eigenv alue is simple and that the con- stant vector is the unique in variant eigen vector corresponding to λ = 1 , a trivial solution that, as usual, poses a fundamental problem for network detection. Ho we ver , neither v ersion of the Perron–Frobenius theorem applies to the transition matrix T of an absorbing Markov chain because T is not strictly positiv e, as required by Perron, nor is T irreducible, as required by Frobenius—the absorbing states are not strongly connected to the graph. T o guarantee the existence of nonnegati ve threat propagating ov er the graph, we require a generalization of the Perron– Frobenius theorem for reducible nonnegati ve matrices of the form found in Eq. (27). The following theorem introduces a new version of Perron–Frobenius that establishes the existence of a nonnegati ve basis for the principal in variant subspace of a reducible nonnegati ve matrix. Theorem 2 (Perron–Fr obenius for a Reducible Nonnega- tive Matrix). Let T be a r educible, nonne gative, or der n matrix of canonical form, T =  Q R 0 I r  , (29) such that the maximum modulus of the eigen values of Q is less than unity , | λ max ( Q ) | < 1 , and rank R = r . Then the maximal eigen value of T is unity with multiplicity r and nondefective. Furthermor e, ther e exists a nonne gative matrix E =  ( I − Q ) − 1 R I r  (30) of rank r such that TE = E , (31) i.e. the columns of E span the principal in variant subspace of T . The proof follo ws immediately by construction and a straightforward computation in volving the partition E =  E 1 E 2  with the choice E 2 = I r , resulting in the nonnegati ve solution to Eq. (31), E 1 = ( I − Q ) − 1 R = ( I + Q + Q 2 + · · · ) R . Theorem 2 has immediate application to threat propagation, for by deﬁnition the probability of threat on the graph is determined by the vector θ a =  θ i θ a b  such that T θ a = θ a and θ a b =  θ b 0  is determined by the probabilities of threat θ b = ( θ N − C +1 , . . . , θ N ) T at observed vertices v b 1 , . . . , v b C [cf. Eq. (24)] augmented with zero threat θ a N +1 = 0 at the “non-threat” vertex. From Eqs. (27) and (29), Q = G , R =  H 1 − ψ N − C  , and R θ a b = H θ b . Therefore, the vector that satisﬁes the proscribed boundary v alue problem equals θ a =  ( I − G ) − 1 H θ b θ a b  . (32) SMITH et al. : BA YESIAN DISCOVER Y OF THREA T NETWORKS 5331 As is well-kno wn [49], the hitting probabilities of a random walk from an unobserved verte x to an observed vertex are giv en by the matrix U = ( I − G ) − 1 H ; therefore, an equi v- alent deﬁnition of threat probability θ v from Eq. (32) is the probability that a random walk emanating from v terminates at an observed vertex, conditioned on the probability of threat ov er all observed vertices: θ v = X c P ( walk v → v b c ) P (Θ v b c ) . (33) W e have thus prov ed the follo wing theorem establishing the equiv alence between the probabilistic and stochastic realiza- tion approaches of threat propagation. Theorem 3 (Harmonic threat propagation). The vec- tor θ =  θ i θ b  ∈ R N is a solution to the boundary value pr oblem of Eq. (24) if and only if the augmented vector θ a =  θ i θ a b  ∈ R N +1 is a stationary vector of the absorbing Markov chain transition matrix T of Eq. (27) with given values θ a b . Furthermore , θ is nonnegative . This theorem will also provide a connection to the spectral method for network detection discussed in Section II-B. B. Space-T ime Thr eat Pr opagation Many important network detection applications, especially networks based on vehicle tracks and computer communi- cation networks, inv olve directed graphs in which the edges hav e departure and arriv al times associated with their initial and terminal vertices. Space-Time threat propagation is used compute the time-v arying threat across a graph gi ven one or more observ ations at speciﬁc vertices and times [48], [57]. In such scenarios, the time-stamped graph G = ( V , E ) may be viewed as a space-time graph G T = ( V × T , E T ) where T is the set of sample times and E T ⊂ [ V × T ] 2 is an edge set determined by the temporal correlations between vertices at speciﬁc times. This edge set is application-dependent, b ut must satisfy the two constraints, (1) if  u ( t k ) , v ( t l )  ∈ E T then ( u, v ) ∈ E , and (2) temporal subgraphs  ( u, v ) , E T ( u, v )  between any two vertices u and v are deﬁned by a tem- poral model E T ( u, v ) ⊂ [ T t T ] 2 . If the stronger , con- verse of property (1) holds, i.e. if ( u, v ) ∈ E then either  u ( t k ) , v ( t l )  ∈ E T or  v ( t l ) , u ( t k )  ∈ E T for all t k , t l , then if the graph G is irreducible, then so is the space-time graph G T . An e xample space-time graph is illustrated in Fig. 2. The general models for spatial threat propagation provided in the preceding subsection will now be augmented to include dynamic models of threat propagation. Giv en an observed threat at a particular verte x and time, we wish to compute the inferred threat across all vertices and all times. Gi ven a verte x v , denote the threat at v and at time t ∈ R by the { 0 , 1 } -valued stochastic process Θ v ( t ) , with value zero indicating no threat, and value unity indicating a threat. As abov e, denote the pr obability of thr eat at v at t by θ v ( t ) def = P  Θ v ( t ) = 1  = P  Θ v ( t )  . (34) The threat state at v is modeled by a ﬁnite-state continuous time Markov jump process between from state 1 to state 0 i v i u - G t 1 t 2 t 3 t 4 t 5 t 6 ? T   v ( t 1 )   v ( t 2 )   v ( t 3 )   v ( t 4 )   v ( t 5 )   v ( t 6 )   u ( t 1 )   u ( t 2 )   u ( t 3 )   u ( t 4 )   u ( t 5 )   u ( t 6 ) @ @ @ @ @ @ @ @ @ @ @ I Z Z Z Z Z Z Z Z Z Z Z } H H H H H H H H H H H Y X X X X X X X X X X X y             9            *            : - X X X X X X X X X X X z H H H H H H H H H H H j Z Z Z Z Z Z Z Z Z Z Z ~ Fig. 2. A directed space-time graph G T with vertices V × T , V = { u, v } sampled at index times T = ( t 1 , . . . , t 6 ) . F or example, an interaction between u ( t 5 ) and v ( t 3 ) (represented by the doubled-sided arrow v ( t 3 ) ← → u ( t 5 ) above) also creates space-time edges via the space-time kernel [Eq. (36)] between u ( t 5 ) and other times at v , and v ( t 3 ) and other times at u . with Poisson rate λ v . With this simple model the threat stochastic process Θ v ( t ) satisﬁes the It ˆ o stochastic differential equation [60], d Θ v = − Θ v dN v ; Θ v (0) = θ 1 , (35) where N v ( t ) is a Poisson process with rate λ v deﬁned for positiv e time, and simple time-rev ersal provides the model for negati ve times. Gi ven an observed threat z = Θ v (0) = 1 at v at t = 0 so that θ V (0) = 1 , the probability of threat at v under the Poisson process model (including time-reversal) is θ v ( t ) = P  Θ v ( t ) | z = Θ v (0) = 1  = e − λ v | t | , (36) This stochastic model provides a Bayesian framework for inferring, or propagating, threat at a verte x over time giv en threat at a speciﬁc time. The function K v ( t ) = e − λ v | t | (37) of Eq. (36) is called the space-time thr eat kernel and when combined with spatial propagation provides a temporal model E T for a space-time graph. A Bayesian model for propagating threat from vertex to verte x will provide a full space-time threat diffusion model and allo w for the application of the optimum maximum likelihood test that will be de veloped in Section III-D. Propagation of threat from vertex to vertex is determined by interactions between vertices. Upon computation of the space- time adjacency matrix, the spatial analysis of Section III-A applies directly to space-time graphs whose vertices are space- time positions. The threat at verte x v at which a single interaction τ from verte x u arri ves and/or departs at times t v τ and t u τ is determined by Eq. (36) and the (independent) ev ent Ψ v ( t ) that threat propagates through v at time t : P  Θ v ( t )  = θ v ( t ) = θ u ( t u τ ) K v ( t − t v τ ) ψ v ( t ) . There is a linear 5332 IEEE TRANSA CTIONS ON SIGNAL PR OCESSING, V OL. 62, NO. 20, 2014 transformation θ v ( t ) = ψ v ( t ) K ( t − t v τ ) θ u ( t u τ ) = Z ∞ −∞ ψ v ( t ) K ( t − t v τ ) δ ( σ − t u τ ) θ u ( σ ) dσ (38) from the threat probability at u to v . Discretizing time, the temporal matrix K uv τ for the discretized operator has the sparse form K uv τ =  0 . . . 0 K ( t k − t v τ ) 0 . . . 0  , (39) where 0 represents an all-zero column, t k represents a vector of discretized time, and the discretized function K ( t k − t v τ ) appears in the column corresponding to the discretized time at t u τ . Threat propagating from vertex v to u along the same interaction τ is gi ven by the comparable expression θ u ( t ) = θ v ( t v τ ) K ( t − t u τ ) , whose discretized linear operator K v u τ takes the form K v u τ =  0 . . . 0 K ( t k − t u τ ) 0 . . . 0  (40) [cf. Eq. (39)] where the nonzero column corresponds to t v τ . The sparsity of K uv τ and K v u τ will be essential for practical space-time threat propagation algorithms. The collection of all interactions determines a weighted space-time adjacency matrix A for the space-time graph G T . This is a matrix of order # V · # T whose temporal blocks for interactions between vertices u and v equals,  A uu A uv A v u A v v  =  0 P l K v u τ l P l K uv τ l 0  . (41) Note that with the space-time threat kernel of Eq. (37), if G is irreducible, then so is G T . As with spatial-only threat propagation of Eq. (13), the space-time threat pr opagation equation is θ = ΨW − 1 A θ (42) or θ v ( t k ) = ψ v ( t k ) P u,l k v u ; kl X u,l k v u ; kl θ u ( t l ) (43) in which θ is the (discretized) space-time vector of threat probabilities, A = ( k v u ; kl ) is the (weighted) space- time adjacency matrix, and W = Diag( A · 1 ) and Ψ = diag  ψ 1 ( t 1 ) , . . . , ψ N ( t # T )  are, respectively , the space-time diagonal matrices of the space-time vertex weights and a priori probabilities that threat propagates through each spatial verte x at a speciﬁc time. In contrast to the treatment of spatial-only threat propagation in Section III-A, the space-time graph is necessarily a directed graph, consistent with the asymmetric space-time adjacency matrix of Eq. (41). By assumption, the graph G is irreducible, implying that the space-time graph G T is also irreducible. Therefore, The- orem 1 implies a well-deﬁned solution to the space-time threat propagation equation of Eq. (42) for a set observ ations at speciﬁc vertices and times, v b 1 ( t b 1 ) , . . . , v b C ( t b C ) . Y et the Perron–Frobenius theorem for the space-time Laplacian Ł = I − W − 1 A poses precisely the same detection challenge as with spatial-only propagation: if the a priori probabilities are constant and equal to unity , i.e. Ψ = I , and the observed probability of threat is constant, then the space-time proba- bility of threat is also constant for all spatial vertices and all times, yielding a hopeless detection method. Howe ver , the advantage of time-stamped edges is that the times can be used to detected temporally coordinated network activity—we seek to detect vertices whose acti vity is corre- lated with that of threat observed at other vertices. According to this model of threat networks, the a priori probability that a threat propagates through vertex v at time t k is determined by the Poisson process used to model the probability of threat as a function of time: ψ v ( t k ) = 1 d v X u,l k v u ; kl , (44) where d v is the spatial degree of vertex v , i.e. the number of interactions associated with a spatial verte x. If all interactions arriv e/depart at the same time at v , then the a priori probability of threat diffusion is unity at this time, b ut dif ferent times reduce this probability according to the stochastic process for threat. Thus space-time threat propagation for coordinated activity is determined by the threat propagation equation, θ = D − 1 A θ (45) in which D = diag  d 1 I , . . . , d N I  is the block-diagonal space-time matrix of (unweighted) spatial degrees and A is the weighted space-time adjacency matrix as in Eq. (42). This al- gorithm may also be further generalized to account for spatial- only a priori probability models such as the distance from ob- served vertices by replacing 1 /d v in Eq. (45) with ψ 0 v /d v and an a priori model as in Eq. (19), yielding the threat propagation equation θ = Ψ 0 D − 1 A θ with Ψ 0 = diag  ψ 0 1 I , . . . , ψ 0 N I  . C. Hybrid Thr eat Pr opagation The temporal kernels introduced for time-stamped edges in Section III-B are appropriate for network detection appli- cations that inv olve time-stamped edges; howe ver , there are many applications in which such time-stamped information is either unav ailable, irrele vant, or uncertain. Ignoring small routing delays, computer network communication protocols occur essentially instantaneously , and te xt documents may describe relationships between sites independent of a spe- ciﬁc timeframe. Integrating spatio-temporal relationships from multiple information sources necessitates a hybrid approach combining, where appropriate, the spatial-only capabilities of Section III-A with the space-time methods of Section III-B. In situations such as computer communication networks in which the timescale of the relationship is much smaller than the discretized timescale, then connections from one vertex to another arriv e at the same discretized time, and the temporal blocks for connections between vertices u and v replaces Eq. (41) and equals,  A uu A uv A v u A v v  =  0 I I 0  . (46) In situations such as time-independent references within text documents in which threat at any time at vertex u implies a SMITH et al. : BA YESIAN DISCOVER Y OF THREA T NETWORKS 5333 threat at all times at vertex v , and vice versa, the temporal blocks for connections between vertices u and v equals,  A uu A uv A v u A v v  =  0 (# T ) − 1 11 T (# T ) − 1 11 T 0  , (47) i.e. a space-time clique between u and v . This equi v alent the space-time model with Poisson rate λ = 0 . D. Ne yman–P earson Network Detection Network detection of a subgraph within a graph G = ( V , E ) of order N is treated as N independent binary hypothesis tests to decide which of the graph’ s N vertices do not belong (null hypothesis H 0 ) or belong (hypothesis H 1 ) to the network. Maximizing the probability of detection (PD) for a ﬁxed probability of false alarm (PF A) yields the Neyman– Pearson test in volving the log-likelihood ratio of the competing hypotheses. W e will derive this test in the context of network detection, which both illustrates the assumptions that ensure detection optimality , as well as indicates practical methods for computing the log-likelihood ratio test and achieving an optimal network detection algorithm. It will be seen that a few basic assumptions yield an optimum test that is equiv alent to the Bayesian threat propagation algorithm de veloped in the previous section. If any part of the graph is unknown or uncer- tain, then the Markov transition probabilities may be treated as random variables and either marginalized out of the likelihood ratio, yielding Neyman–Pearson optimality in the av erage sense, or the maximum likelihood estimate may be used in the suboptimum generalized likelihood ratio test (GLR T) [63]. W e will not cov er extensions to unknown parameters in this paper . The optimum test in volv es the graph Laplacian, which allows comparison of Neyman–Pearson testing to se veral other network detection methods whose algorithms are also related to the properties of the Laplacian. An optimum hypothesis test is now derived for the presence of a network giv en a set of observations z according to the observation model of Deﬁnition 4. Optimality is deﬁned in the Neyman–Pearson sense in which the probability of detection is maximized at a constant false alarm rate (CF AR) [63]. For the general problem of network detection of a subgraph within graph G of order N , the decision of which of the 2 N hypothesis Θ = (Θ v 1 , . . . , Θ v N ) T to choose in volv es a 2 N -ary multiple hypothesis test ov er the measurement space of the observation vector z , and an optimal test inv olves partitioning the measurement space into 2 N regions yielding a maximum PD. This NP-hard general combinatoric problem is clearly computationally and analytically intractable. Ho we ver , Eq. (11) following Deﬁnition 6 guarantees that the threats at each vertex are independent random variables, allowing the general 2 N -ary multiple hypothesis test to be greatly simpliﬁed by treating it as N independent binary hypothesis tests at each verte x. At each verte x v ∈ G and unknown threat Θ : V → { 0 , 1 } across the graph , consider the binary hypothesis test for the unknown v alue Θ v , H 0 ( v ) : Θ v = 0 (verte x belongs to background) H 1 ( v ) : Θ v = 1 (verte x belongs to foreground). (48) Giv en the observation vector z : { v b 1 , . . . , v b C } ⊂ V → M ⊂ R C with observation models f  z ( v b j ) | Θ v b j  , j = 1 , . . . , C , the PD and PF A are gi ven by the integrals PD = R R f ( z | Θ v = 1) d z and PF A = R R f ( z | Θ v = 0) d z , where R ⊂ M is the detection region in which observations are declared to yield the decision Θ v = 1 , otherwise Θ v is declared to equal 0 . The optimum Neyman–Pearson test uses the detection region R that maximizes PD at a ﬁxed CF AR value PF A 0 , yielding the likelihood ratio (LR) test [63], f ( z | Θ v = 1) f ( z | Θ v = 0) H 1 ( v ) ≷ H 0 ( v ) λ (49) for some λ > 0 . Likelihood ratio tests are also used for graph classiﬁcation [36]. Finally , a simple application of Bayes’ theorem to the har- monic threat θ v = f (Θ v | z ) provides the optimum Neyman– Pearson detector [Eq. (49)] because f ( z | Θ v = 1) f ( z | Θ v = 0) = f (Θ v = 1 | z ) f (Θ v = 0 | z ) · f (Θ v = 0) f (Θ v = 1) = θ v 1 − θ v · f (Θ v = 1) f (Θ v = 0) H 1 ( v ) ≷ H 0 ( v ) λ, (50) results in a threshold of the harmonic space-time threat prop- agation vector of Eq. (7), θ v H 1 ( v ) ≷ H 0 ( v ) threshold , (51) with the prior ratio f (Θ v = 1) /f (Θ v = 0) and the monotonic function θ v 7→ θ v / (1 − θ v ) being absorbed into the detection threshold. By construction, the e vent Θ v = 1 is equiv alent to a random walk between v and one of the observed vertices v b 1 , . . . , v b C , along with one of the e vents Θ v b 1 = 1 , . . . , Θ v b C = 1 , as represented in Eq. (33). Note that θ v and equiv- alently the likelihood ratio are continuous functions of the probabilities P ( walk v → v b c ) and P (Θ v b c ) ; therefore, equality of the likelihood ratio to any giv en threshold exists only on a set of measure zero. If the prior ratio is constant for all vertices, then the threshold is also constant, and the likelihood ratio test [Eq. (51)] for optimum network detection becomes θ H 1 ≷ H 0 threshold . (52) This establishes the detection optimality of harmonic space- time threat propagation. Because the probability of detecting threat is maximized at each vertex, the probability of detection for the entire subgraph is also maximized, yielding an optimum Neyman– Pearson test under the simpliﬁcation of treating the 2 N -ary multiple hypothesis testing problem as a sequence of N binary hypothesis tests. Summarizing, the probability of network detection gi ven an observation z is maximized by computing f (Θ v | z ) using a Bayesian threat propagation method and applying a simple likelihood ratio test, yielding the following theorem that equates threat propagation with the optimum Neyman–Pearson test. 5334 IEEE TRANSA CTIONS ON SIGNAL PR OCESSING, V OL. 62, NO. 20, 2014 Theorem 4 (Neyman–Pearson Optimality of Threat Prop- agation). The solution to Bayesian threat propa gation ex- pr essed in Eqs. (24) or (32) yields an optimum likelihood ratio test in the Neyman–P earson sense. I V . M O D E L I N G A N D P E R F O R M A N C E Evaluation of network detection algorithms may be ap- proached from the perspectiv es of theoretical analysis or empirical experimentation. Theoretical performance bounds hav e only been accomplished for simple network models, i.e. cliques [23], [33], [42] or dense subgraphs [5] embedded within Erd ˝ os–R ´ enyi backgrounds, and there are no theoretical results at all for more complex network models that char- acterize real-world networks [58]. If representativ e network data with truth is av ailable, one may ev aluate algorithm performance with speciﬁc data sets [71]. Howe ver , real-world data sets of covert networks with truth is unknown to the authors. Therefore, network detection performance ev aluation must be conducted on simulated networks using generativ e models. W e begin with a simple stochastic blockmodel [65], explore this model’ s limitations, then introduce a ne w network model designed to address these defects while at the same time encompassing the characteristics of real-world networks [1], [2], [12], [45], [70]. V arying model parameters also yields in- sight on the dependence of algorithm performance on different network characteristics. For each ev aluation, we compare performance between the space-time threat propagation [STTP; Section III-B], breadth- ﬁrst search spatial-only threat propagation [BFS; Eq. (21)], and modularity-based spectral detection algorithm [SPEC] [40]. The performance metric is the standard recei ver operating characteristic (R OC), which in the case of network detection is the probability of detection (i.e. the percentage of true foreground vertices detected) versus the probability of false alarms (i.e. the percentage of background vertices detected) as the detection threshold is varied. A. Detection P erformance On Stochastic Bloc kmodels 1) Stoc hastic Blockmodel Description: The stochastic blockmodel captures the sparsity of real-world networks and basic community structure [27] using a simple network frame- work [65]. For a graph of order N di vided into K com- munities, the model is parameterized by a N -by- K { 0 , 1 } membership matrix Π and a K -by- K probability matrix S that deﬁnes the probability of an edge between two v er- tices based upon their community membership. Therefore, the probability of an edge is determined by the off-diagonal terms of the matrix ΠSΠ T . By the classical result of Erd ˝ os– R ´ enyi [20], each community is almost surely connected if S kk > log N k / N k in which N k is the number of vertices in community k . W e introduce the activity parameter r k ≥ 1 and set S kk = r k log N k / N k to adjust a community’ s density relativ e to its Erd ˝ os–R ´ enyi connectivity threshold. 2) Experimental Setup and Results: The objectiv e of this experiment is to quantify detection performance of a fore- ground network with varying activity giv en observations from a small fraction of its members. Fig. 3 illustrates the R OC 0 0.2 0.4 0.6 0.8 1 PD 0 0.2 0.4 0.6 0.8 1 STTP, r fg = 2 STTP, r fg = 1.1 BFS, r fg = 2 BFS, r fg = 1.1 SPEC, r fg = 2 SPEC, r fg = 1.1 1000 trials PFA Fig. 3. Detection ROC curves of the three dif ferent algorithms at tw o levels of foreground activity , 1 . 1 · log N fg / N fg and 2 · log N fg / N fg . Data is simulated using the stochastic blockmodel with 1000 Monte Carlo trials each with an independent draw of the random network and single threat observation. detection performance with a graph of order N = 256 and K = 3 with two background communities of order 128 , and a foreground community of order 30 randomly embedded in the background. The probability matrix is, S =   0 . 08 0 . 02 0 . 02 0 . 02 0 . 08 0 . 02 0 . 02 0 . 02 r fg · 0 . 1   , parameterized by the activity r fg relativ e to the Erd ˝ os–R ´ enyi foreground connectivity threshold log 30 / 30 ≈ 0 . 1 . A simple temporal model is used with all foreground interactions at the same time (i.e. perfect coordination), and background interactions uniformly distributed in time (i.e. uncoordinated). Results are shown for both sparsely connected ( r fg = 1 . 1 ) and moderately connected ( r fg = 2 ) foreground networks. The simulations show that excellent ROC performance is achie v- able if temporal information is exploited (STTP) with highly coordinated foreground networks with sparse to moderate connectivity . Because of the use of temporal information, STTP outperforms BFS. Spectral methods, which are designed to detect highly connected networks perform poorly on sparse foreground networks, and improve as foreground network connectivity increases, especially in the lo w PF A region in which SPEC performs better than BFS threat propagation. This result is consistent with expectations and recent theoretical results for spectral methods applied to clique detection [5], [42]. Continuous lik elihood ratio tests possess R OC curv es that are necessarily conv ex upwards [63]; therefore, the ROCs for threat propagation algorithms applied to data generated from random walk propagation are necessarily con vex. The results of Fig. 3 sho w both threat propagation and spectral meth- ods applied to data generated from a stochastic blockmodel. Because the spectral detection algorithm is not associated with a likelihood ratio test, con vexity of its R OC curves is not guaranteed—indeed, the spectral ROC curve with r fg = 2 is seen to be concave in the high PD region. All threat SMITH et al. : BA YESIAN DISCOVER Y OF THREA T NETWORKS 5335 X ( L × K ) i ( K ) N i N N × N ij i j m S B ( K × K ) i j i j Membership Activity level Community interaction Sparsity i l ( L ) ( L ) j i z ( K ) i j z ( K ) ( K ) ( K ) ( K × K ) Fig. 4. Hybrid mixed-membership blockmodel for the network simulation with N vertices, K communities, and L lifestyles. Shaded squares are model parameters for tuning and circles are variables drawn during simulation. propagation R OC curves are observed to be con vex, except for a small part of STTP with r fg = 1 . 1 near PD = 0 . 7 . This slight conca vity (about 2% ) may be caused by model mismatch between the stochastic blockmodel and the random walk model, or statistical ﬂuctuation of the Monte Carlo analysis (about 1 . 4% binomial distribution variance at PD = 0 . 7 ). Of course, real-w orld networks are not perfectly coordi- nated, ideal Erd ˝ os–R ´ enyi graphs. W e will de velop a no vel, more realistic model in the next section to explore ho w more realistic networks with v arying lev els of foreground coordina- tion affect the performance of space-time threat propagation. B. Detection P erformance on the Hybrid Mixed-Membership Blockmodel 1) Hybrid Mixed-Membership Blockmodel Description: Real-world networks display basic topological characteristics that include a po wer-law degree distribution (i.e. the “small world” property) [12], mixed-membership-based community structure (i.e. individuals belong to multiple communities) [2], [65], and sparsity [44]. No one simple network model captures all these traits. For example, the stochastic blockmodel abov e provides sparsity and a rough community structure, but does not capture interactions through time, the po wer-law degree distribution, nor the reality that each individual may belong to multiple communities. The po wer-la w models such as R- MA T [12] do not capture membership-based community struc- ture, and mixed-membership stochastic blockmodels [2] does not capture po wer-law degree distribution nor temporal coor- dination. T o capture all these characteristics of the real-world networks model, we propose a ne w parameterized generati ve model called the “hybrid mixed-membership blockmodel” (HMMB) that combines the features of these fundamental network models. The proposed model is depicted as the plate diagram in Fig. 4. The hybrid mixed-membership blockmodel is an aggreg ate of the following simpler models and their features: Erd ˝ os– R ´ enyi for sparsity [20], Chung–Lu for po wer-la w degree distri- bution [1], and mixed-membership stochastic blockmodel for community structure [2]. W e model the number of interactions between an y two indi viduals (i.e. edge weights) as Poisson ran- dom v ariables. Each interaction recei ves a timestamp through a coordination model. As above, let N be the order of the graph, and let K be the number of communities. Each individual (i.e. verte x) divides its membership among the K communities (i.e. mixed membership), and the fraction in which an individual participates among the dif ferent communities is determined by L distinct lifestyles. The rate λ ij of interactions between vertices i and j is gi ven by the product λ ij = I S ij · λ i λ j P k λ k · z T i → j Bz j → i , (53) where the ﬁrst term I S ij is the (binary) indicator function drawn from the stochastic blockmodel described in Section IV -A, the second term λ i λ j /  P k λ k  is the Chung–Lu model with per- verte x expected degrees λ i , and the third term z T i → j Bz j → i is the mixed-membership stochastic blockmodel with K -by- K block matrix B that determines the intercommunity interaction strength, and z i → j is a { 0 , 1 } -valued K -vector that indicates which community membership that verte x i assumes when interacting with verte x j . The mixed-membership K -vector π i speciﬁes the fraction that indi vidual vertex i divides its membership among the K communities so that 1 T π i ≡ 1 . Each v ertex is assigned, via the { 0 , 1 } -valued L -vector l i , to one of L “lifestyles” each with an e xpected membership distribution gi ven by the L -by- K matrix X . The membership distrib ution π i is determined by a Dirichlet random draw using the K -vector l T X . The lifestyle vector l i is determined from a multinomial random draw using the L -vector φ as the probability of belonging to each lifestyle. Similarly , for each interaction, the community indicator vector z i → j is determined from a multinomial random draw using the K -vector π i as the probability of belonging to each community . The expected vertex degrees λ i are determined from a po wer-la w random draw using the exponent α . The parameter matrices S and B are ﬁxed. Finally , intracommunity coordination is achiev ed by the nonnegati ve K -vector γ , a Poisson parameter of the avera ge number of coordinated ev ents at each vertex within a speciﬁc community . Smaller values of γ k correspond to higher le vels of coordination in community k because there are fewer ev ent times from which to choose. A community-dependent Poisson random draw determines the integer number of ev ent times within each community , which are then drawn uniformly ov er the time extent of interest. An edge between vertices i and j is assigned two random event timestamps based on the community indicator vectors z i → j and z j → i . 2) Experimental Setup and Results: The objectiv e of this experiment is to quantify detection performance with varying coordination of a realistic fore ground network operating within a realistic background. W e use elev en “lifestyles” spanning ten communities, with two lifestyles designated as foreground and all others as background. The foreground network’ s coordination varies from γ fg = 1 (i.e. highly coordinated activity at a single time, consistent with the tactic used by cov ert networks to mitigate their exposure to discov ery) to γ fg = 24 (i.e. less coordination). Each member of the covert foreground network is also a mem- ber of sev eral background communities. The foreground and background order are the same as in the experiment of Sec- tion IV -A2, and sparsity le vels all log N i / N i . The foreground network is only a small fraction of the entire population. 5336 IEEE TRANSA CTIONS ON SIGNAL PR OCESSING, V OL. 62, NO. 20, 2014 0 0.2 0.4 0.6 0.8 1 PD 0 0.2 0.4 0.6 0.8 1 PFA STTP, γ fg = 1 STTP, γ fg = 10 STTP, γ fg = 24 BFS, γ fg = 1 SPEC, γ fg = 1 1000 trials Fig. 5. Detection R OC curves of the three different algorithms at three lev els ( γ fg = 1 , 10 , 24 ) of foreground coordination. Spectral detection and BFS threat propagation do not use temporal information so their performance is unaffected by the coordination level. Fore ground actors are characterized by two distinct lifestyles representing their memberships in the covert community as well as dif ferent background communities. The background communities are intended to represent various business, home, industry , religious, sports, or other social interactions. Fig. 5 illustrates the R OC performance with these param- eters, varying the lev el of foreground coordination. Through Eq. (44), space-time threat propagation is designed to perform well with highly coordinated networks, consistent with the results observed in Fig. 5 in which STTP performs best at the higher coordination levels and outperforms the breadth- ﬁrst search and modularity-based spectral detection methods. The spectral detection algorithm is expected to perform poorly in this scenario because, as discussed in Section II-B, it relies upon a relativ ely dense foreground network, which does not exist in this simulated dataset with realistic properties of co vert networks. V . C O N C L U S I O N S A Bayesian framework for network detection can be used to unify the different approaches of network detection algorithms based on random walks/diffusion and algorithms based on spectral properties. Indeed, using the concise assumptions for random walks and threat propagation laid out in Deﬁnition 6, all the theoretical results follow immediately , including the proof of equiv alence, an exact, closed-form, efﬁcient solution, and Neyman–Pearson optimality . Not only is this theoretically appealing, but it provides direct practical beneﬁts through a ne w network detection algorithm called space-time threat propagation, that is sho wn to achie ve superior performance with simulated covert networks. Bayesian space-time threat propagation is interpreted both as a random walk on a graph and, equi valently , as the solution to a harmonic boundary value problem. Bayes’ rule determines the unkno wn probability of threat on the uncued nodes—the “interior”—based on threat observations at cue nodes—the “boundary . ” Hybrid threat propagation algorithms appropriate for heterogeneous spatio- temporal relationships can be obtained from this general threat diffusion model. This new method is compared to well-known spectral methods by examining competing notions of network detection optimality . T o model realistic covert networks real- istically embedded within realistic backgrounds, a new hybrid mixed-membership blockmodel based on mixed membership of random graphs is introduced and used to assess algorithm detection performance on graphs with varying activity and coordination. In the important situations of lo w foreground activity with varying le vels of coordination, the e xamples sho w the superior detection performance of Bayesian space-time threat propagation compared to other spatial-only and uncued spectral methods. A C K N O W L E D G M E N T S The authors gratefully ackno wledge the consistently incisi ve and constructive comments from our revie wers, which greatly improv ed this paper . W e also thank Professor Patrick W olfe for originally suggesting the deep connection with random walks on graphs. R E F E R E N C E S [1] W . A I E L L O , F. C H U N G , and L . L U . “ A random graph model for power law graphs, ” Experimental Mathematics 10 (1) : 53–66 (2001). [2] E . M . A I RO L D I , D . M . B L E I , S . E . F I E N B E R G , and E . P . X I N G . “Mixed-membership stochastic blockmodels, ” JMLR 9 : 1981–2014 (2008). [3] M . A L A N Y A L I , S . V E N K A T E S H , O . S A V A S , and S . A E RO N . “Distrib uted Bayesian hypothesis testing in sensor networks, ” in Pr oc. 2005 American Contr ol Conf . Boston MA, pp. 5369–5374 (2004). [4] R . A N D E R S E N , F. R . K . C H U N G , and K . L A N G . “Local graph partition- ing using PageRank vectors, ” in Pr oc. 47th IEEE Symp. F oundations of Computer Science (FOCS) . pp. 475–486 (2006). [5] E . A R I A S - C A S T RO and N . V ER Z E L E N . “Community detection in random networks, ” arXiv:1302.7099 [math.ST] . Mar . 18 2013 [Online]. A vailable: h http://arxiv .org/abs/1302.7099 i . [6] S . A R OR A , S . R AO , and U . V A Z I R A N I . “Geometry , ﬂows, and graph- partitioning algorithms, ” Comm A CM 51 (10) : 96–105 (2008). [7] K . A V R AC H E N K OV , N . L I T V A K , M . S O KO L , and D . T O W S L E Y . “Quick detection of nodes with large degrees, ” Internet Mathe- matics doi:10.1080/15427951.2013.798601, 12 Nov . 2013. Feb. 27 2014 [Online]. A vailable: h http://www .tandfonline.com/doi/abs/10.1080/ 15427951.2013.798601 i . [8] M . B E L K I N and P . N I YO G I . “Laplacian eigenmaps for dimensionality reduction and data representation, ” Neural Computation 15 : 1373–1396 (2003). [9] K . C A R L E Y . “Estimating vulnerabilities in large cov ert networks, ” in Pr oc. 16th Intl. Symp. Command and Control Researc h and T ec h. (ICCRTS) . (San Diego, CA) (2004). [10] K . M . C A RT E R , R . R A I C H , and A . O . H E R O I I I . “On local intrinsic di- mension estimation and its applications, ” IEEE T rans. Signal Pr ocessing 58 (2) : 650–663 (2010). [11] D . C H A K R A BA RT I , Y . W A N G , C . W AN G , J . L E S KO V EC , and C . F AL O U T S O S . “Epidemic thresholds in real netw orks, ” A CM T . Inform. Syst. Se. , 10 (4) : 13.1–13.26 (2008). [12] D . C H A K R A B A RT I , Y . Z H A N , and C . F A L O U T S O S . “R-MA T : A recur- siv e model for graph mining, ” in Pr oc. 2004 SIAM Intl. Conf. Data Mining . pp. 442–446 (2004). [13] J . - F. C HA M B E R L A N D and V . V . V E E R A V A L L I . “Decentralized detection in sensor networks, ” IEEE T rans. Signal Pr ocessing 51 (2) : 407–416 (2003). [14] F. R . K . C H U N G . Spectral Graph Theory , Regional Conference Series in Mathematics 92 . Providence, RI: American Mathematical Society (1994). SMITH et al. : BA YESIAN DISCOVER Y OF THREA T NETWORKS 5337 [15] F. R . K . C H U N G and W. Z H AO . “PageRank and random walks on graphs, ” in F ete of Combinatorics and Computer Science , Bolyai Society Mathematical Studies 20 : 43–62, V ienna: Springer (2010). [16] J . A . C O S T A and A . O . H E RO I I I. “Geodesic entropic graphs for dimension and entropy estimation in manifold learning, ” IEEE T rans. Signal Pr ocessing 52 (8) : 2210–2221 (2004). [17] R . D I E S T E L . Graph Theory . New Y ork: Springer-V erlag, Inc. (2000). [18] W . E . D O NATH and A . J . H O FF M A N . “Lower bounds for the partitioning of graphs, ” IBM J. Res. Development 17 : 420–425 (1973). [19] E . B . D Y N K I N and A . A . Y U S H K E V I C H . Markov Processes: Theorems and Pr oblems . New Y ork: Plenum Press (1969). [20] P . E R D ˝ O S and A . R ´ E N Y I , “On the evolution of random graphs, ” Pubs. Mathematical Institute of the Hungarian Academy of Sciences 5 : 17–61 (1960). [21] J . P. F E R RY , D . L O , S . T. A H E A R N , and A . M . P H I L L I P S . “Network detection theory , ” in Mathematical Methods in Counterterrorism , eds. N . M E M O N et al., pp. 161–181, V ienna: Springer (2009). [22] M . F I E D L E R . “ A property of eigen vectors of non-negati ve symmetric matrices and its application to graph theory , ” Czech. Math. J. 25 : 619– 633 (1975). [23] S . F O RT U N A T O and M . B A RT H ´ E L E M Y . “Resolution limit in community detection, ” PNAS 104 (1) : 36–41 (2007). [24] S . F O RT U NATO . “Community detection in graphs, ” Physics Reports 486 : 75–174 (2010). [25] A . F RO N C Z A K , P . F R O NC Z A K , and J . A . H O Ł Y S T . “ A verage path length in random networks, ” Phys. Rev . E 70 : 056110 (2004). [26] F. R . G A N T M AC H E R . Matrix Theory . V ol. 2. New Y ork: Chelsea (1959). [27] M . G I RV A N and M . E . J . N E W M A N . “Community structure in social and biological networks, ” PNAS 99 (12) : 7821–7826 (2002). [28] C . G O D S I L and G . R O Y L E . Algebr aic Gr aph Theory . New Y ork: Springer-V erlag, Inc. (2001). [29] M . O . J AC K S O N . Social and Economic Networks , Princeton U. Press (2008). [30] A . S A N D RY H A I L A and J . F. M O U R A . “Discrete signal processing on graphs: Frequency analysis, ” IEEE Tr ans. Signal Pr ocessing 62 (12) : 3042–3054 (2014). [31] D . K O L L E R and N . F R I E D M A N . Pr obabilistic Graphical Models . Cam- bridge, MA: MIT Press (2009). [32] V . E . K R E B S . “Uncloaking terrorist networks, ” F irst Monday 7 (4) (2002). Feb. 27 2014 [Online]. A vailable: h http://ﬁrstmonday .org/ ojs/index.php/fm/article/vie w/941 i . [33] J . M . K U M P U L A , J . S A R A M ¨ A K I , K . K A S K I , and J . K E RT ´ E S Z . “Limited resolution in complex network community detection with Potts model approach, ” Eur . Phys. J. B 56 : 41–45 (2007). [34] J . L E S K OV E C and C . F A L O U T S OS . “Sampling from large graphs, ” in Pr oc. 12th ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining pp. 631–636 (2006). [35] J . L E S KO V E C , K . J . L A N G , and M . M A H O N E Y . “Empirical comparison of algorithms for network community detection, ” in Pr oc. 19th Intl. Conf. W orld W ide W eb (WWW’10) . Raleigh, NC, pp. 631–640 (2010). [36] J . G . L I G O , G . K . A T I A , and V . V . V E E R A V A L L I . “ A controlled sensing approach to graph classiﬁcation, ” in Pr oc. IEEE Intl. Conf. Acoustics, Speech and Signal Processing (ICASSP) . V ancouver , BC (2013). [37] M . W. M A H O N E Y , L . O R E C C H I A , and N . K . V I S H N O I . “ A local spec- tral method for graphs: W ith applications to improving graph partitions and exploring data graphs locally , ” J. Machine Learning Researc h 13 : 2339–2365 (2012). [38] B . A . M I L L E R , M . S . B E A R D , and N . T. B L I S S . “Eigenspace Analysis for Threat Detection in Social Networks, ” in Proc. 14th Intl. Conf. Informat. Fusion (FUSION) . Chicago, IL (2011). [39] B . A . M I L L E R , N . T. B L I S S , and P . J . W O L F E . “T oward signal processing theory for graphs and other non-Euclidean data, ” in Pr oc. IEEE Intl. Conf. Acoustics, Speech and Signal Processing , pp. 5414– 5417 (2010). [40] B . A . M I L L E R , N . T. B L I S S , and P . J . W O L F E . “Subgraph detection using eigenv ector L 1 norms, ” in Proc. 2010 Neural Information Pro- cessing Systems (NIPS) . V ancouver , Canada (2010). [41] B . M O H A R . “The Laplacian Spectrum of Graphs, ” in Graph Theory , Combinatorics, and Applications , 2 , eds. Y . A L A V I , G . C H A RT R A N D , O . R . O E L L E R M A N N , and A . J . S C H W E N K . New Y ork: Wile y , pp. 871– 898 (1991). [42] R . R . N A DA K U D I T I and M . E . J . N E W M A N . “Graph spectra and the detectability of community structure in networks, ” Phys. Rev . Lett. 108 , 188701 (2012). [43] J . N E V I L L E , O . S I M S E K , D . J E N S E N , J . K O M O RO S K E , K . P AL M E R , and H . G O L D B E R G . “Using relational knowledge discovery to prevent securities fraud, ” in Proc. 11th ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining pp. 449–458 (2005). [44] M . E . J . N E W M A N . “Finding community structure in networks using the eigenv ectors of matrices, ” Phys. Rev . E , 74 (3) (2006). [45] . “The structure and function of complex networks, ” SIAM Rev . 45 (2) : 167–256 (2003). [46] J . - P . O N N E L A and N . A . C H R I S TA KI S . “Spreading paths in partially observed social networks, ” Phys. Rev . E , 85 (3) : 036106 (2012). [47] M . N . ¨ O Z I S ¸ I K . Boundary V alue Pr oblems of Heat Conduction . Scranton P A: International T extbook Company , 1968. [48] S . P H I L I P S , E . K . K AO , M . Y E E , and C . C . A N D E R S O N . “Detecting activity-based communities using dynamic membership propagation, ” in Pr oc. IEEE Intl. Conf. Acoustics, Speech and Signal Processing (ICASSP) . Kyoto, Japan (2012). [49] M . A . P I N S K Y and S . K A R L I N . An Introduction to Stochastic Modeling . New Y ork: Academic Press (2010). [50] A . P OT H E N , H . S I M O N , and K . - P . L I O U . “Partitioning sparse matrices with eigen vectors of graphs, ” SIAM J. Matrix Anal. Appl. 11 : 430–45 (1990). [51] K . K . S A B E L F E L D and N . A . S I M O N OV . Random W alks on Boundaries for Solving PDEs . Utrecht, The Netherlands: VSP International Science Publishers, 1994. [52] M . S AG E M A N . Understanding T err or Networks . Philadelphia, P A: U. Pennsylvania Press (2004). [53] D . S H A H and T. Z A M A N . “Rumors in a network: Who’s the culprit?, ” IEEE T rans. Inf. Theory 57 (8) : 5163–5181 (2011). [54] D . S H A H . “Gossip Algorithms, ” F oundations and T rends in Networking 3 (1)1–125 (2009). [55] A . S H A M I R . “ A surve y on mesh segmentation techniques, ” Computer Graphics F orum 27 (6) : 1539–1556 (2008). Sep. 3 2012 [Online]. A vail- able: h http://www .faculty .idc.ac.il/arik/site/mesh-segment.asp i . [56] S . T. S M I T H , A . S I L B E R FAR B , S . P H I L I P S , E . K . K AO , and C . C . A N D E R S O N . “Network Discovery Using Wide-Area Surveillance Data, ” in Proc. 14th Intl. Conf. Informat. Fusion (FUSION) . Chicago, IL (2011). [57] S . T. S M I T H , S . P H I L I P S , and E . K . K A O . “Harmonic space-time threat propagation for graph detection, ” in Proc. IEEE Intl. Conf. Acoustics, Speech and Signal Processing (ICASSP) . Kyoto, Japan (2012). [58] S . T. S M I T H , K . D . S E N N E , S . P H I L I P S , E . K . K AO , and G . B E R N - S T E I N . “Covert Network Detection, ” Lincoln Laboratory J. 20 (1) : 47– 61 (2013). [59] D . A . S P I E L M A N and S . - H . T E N G . “Nearly-linear time algorithms for graph partitioning, graph sparsiﬁcation, and solving linear systems, ” in Pr oc. 36th ACM Symp. Theory of Computing (STOC) , pp. 81–90 (2004). [60] D . S T I R Z A K E R . Stochastic Pr ocesses and Models . Oxford University Press (2005). [61] R . T R I N Q U I E R . Modern W arfare: A F r ench V iew of Counterinsurg ency . W estport, CT : Praeger Security International (2006). [62] P . V A N M I E G H E M , D . S T E V A N OV I ´ C , F. K U I P E R S , C . L I , R . V A N D E B OV E N K A M P , D . L I U , and H . W A N G . “Decreasing the spectral radius of a graph by link removals” Phys. Rev . E 84 : 016101 (2011). [63] H . L . V A N T R E E S . Detection, Estimation, and Modulation Theory , Part 1. New Y ork: John W iley and Sons, Inc. (1968). [64] U . VO N L U X B U R G , O . B O U S Q U E T , and M . B E L K I N . “Limits of spectral clustering, ” in Advances in Neural Information Processing Systems 17 , eds. L . K . S AU L , Y . W E I S S , and L . B O T TO U . Cambridge, MA: MIT Press (2005). [65] S . W A S S E R M A N and K . F AU S T . Social Network Analysis . Cambridge Univ ersity Press. (1994). [66] D . J . W A T T S . “Networks, dynamics, and the small-world phenomenon, ” American Journal of Sociology 13 (2) : 493–527 (1999). [67] Y . W E I S S . “Segmentation using eigenv ectors: A unifying view , ” in Pr oc. of the Intl. Conf. Computer V ision 2 : 975 (1999). [68] S . W H I T E and P . S M Y T H . “ A spectral clustering approach to ﬁnding communities in graphs, ” in Proc. 5th SIAM Intl. Conf. Data Mining , eds. H . K A R G U P TA , J . S R I V A S TA V A , C . K A M A T H , and A . G O O D M A N . Philadelphia P A, pp. 76–84 (2005.) 5338 IEEE TRANSA CTIONS ON SIGNAL PR OCESSING, V OL. 62, NO. 20, 2014 [69] H . W O L KO W I C Z and Q . Z H AO . “Semideﬁnite programming relaxations for the graph partitioning problem, ” Discrete Applied Mathematics 96– 97 : 461–479 (1999). [70] J . X U and H . C H E N . “The topology of dark networks, ” Comm. A CM 51 (10) : 58–65 (2008). [71] M . J . Y E E , S . P H I L I P S , G . R . C O N D O N , P . B . J O N E S , E . K . K A O , S . T. S M I T H , C . C . A N D E R S O N , and F. R . W AU G H . “Network discovery with multi-intelligence sources, ” Lincoln Laboratory J. 20 (1) : 31–46 (2013). [72] H . Z H O U and R . L I P O W S K Y . “Network Brownian Motion: A New Method to Measure V ertex-V ertex Proximity and to Identify Commu- nities and Subcommunities, ” in Computational Science—ICCS 2004 , Lectur e Notes in Computer Science 3038 : 1062–1069. Berlin: Springer (2004). Steven Thomas Smith (M’86–SM’04) is a Senior Staff Member at MIT Lincoln Laboratory , Lexing- ton, MA. He recei ved the B.A.Sc. degree in electrical engineering and mathematics from the Univ ersity of British Columbia, V ancouver , BC in 1986 and the Ph.D. degree in applied mathematics from Harvard Univ ersity , Cambridge, MA in 1993. He has over 15 years experience as an innovati ve technology leader with statistical data analytics, both theory and practice, and broad leadership experience rang- ing from ﬁrst-of-a-kind algorithm dev elopment for groundbreaking sensor systems to graph-based intelligence architectures. His contributions span diverse applications from optimum network detection, geometric optimization, geometric acoustics, statistical resolution limits, and nonlinear parameter estimation. He receiv ed the SIAM Outstanding Paper A ward in 2001 and the IEEE Signal Processing Society Best Paper A ward in 2010. He was associate editor of the IEEE T ransactions on Signal Pr ocessing in 2000–2002, and currently serves on the IEEE Sensor Array and Multichannel committee. He has taught signal processing courses at Harvard and for the IEEE. Edward K. Kao (M’03) is a Lincoln scholar at MIT Lincoln Laboratory in the Intelligence and Decision T echnologies Group. Since joining Lincoln in 2008, he has been working on graph-based intelligence, where actionable intelligence is inferred from inter- actions and relationships between entities. Applica- tions include wide area surveillance, threat network detection, homeland security , and cyber warfare, etc. In 2011, he entered the Ph.D. program at Harvard Statistics. Current research topics include: causal inference on peer inﬂuence effects, statistical models for community membership estimation, information content in network infer- ence, and optimal sampling and experimental design for network inference. Kenneth D. Senne (S’65–M’72–SM’95–F’02– LF’08) serves as Principal Staff in the ISR and T actical Systems Division at Lincoln Laboratory . His research examines the application of large data analytics to decision support problems. He joined the Laboratory in 1972 to work on the design and collision avoidance application of the Mode-S beacon system for the Federal A viation Adminis- tration. From 1977 to 1986 he contributed to the dev elopment of anti-jam airborne communication systems and super resolution direction ﬁnding with adaptiv e antennas. In 1986 he was asked to set up an array signal processing group as part of a large air defense airborne electronics program. This effort resulted in the pioneering demonstration of a large scale, real-time embedded adaptiv e signal processor. In 1998 he was promoted to head the Air Defense T echnology Division. In 2002 he established the Laboratory’ s T echnology Ofﬁce, with responsibility for managing technology in vestments, including the internal innovati ve research program. Prior to joining Lincoln Laboratory he earned a Ph.D. degree in electrical engineering from Stanford University with foundational research on digital adaptive signal processing and he served as Captain in the U.S. Air Force with the Frank J. Seiler Research Laboratory at the Air Force Academy . He was elected Fellow of the IEEE in 2002. Garrett Bernstein is a Computer Science Ph.D. student at the University of Massachusetts Amherst. At MIT Lincoln Laboratory , he was a member of the technical staff in the Intelligence and Deci- sion T echnologies Group. His research focused on statistical inference and machine learning applied to diverse problems, such as graph detection al- gorithms, model simulation, semantic analysis, and military operational effecti veness. Prior to joining the Laboratory , he receiv ed a bachelor’ s degree in applied and engineering physics and and engineering master’ s degree in computer science, both from Cornell University . Scott Philips is currently a data scientist at Palantir T echnologies. Before joining Palantir , Scott spent ﬁv e years as a member of the technical staff in the Intelligence and Decision T echnologies Group. While he was at Lincoln Laboratory , his research focused on dev eloping statistical algorithms for the exploitation of data from intelligence, surveillance, and reconnaissance sensors. Scott received his doc- toral degree in electrical engineering in 2007 from the Univ ersity of W ashington, where his research focused on signal processing and machine learning algorithms for the detection and classiﬁcation of sonar signals.

Bayesian Discovery of Threat Networks

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment