Nonparametric Multi-group Membership Model for Dynamic Networks
Relational data (like graphs, networks, and matrices) is often dynamic, where the relational structure evolves over time. A fundamental problem in the analysis of time-varying network data is to extract a summary of the common structure and the dynamic…
Authors: Myunghwan Kim, Jure Leskovec
Myunghwan Kim, Stanford University, Stanford, CA 94305, mykim@stanford.edu
Jure Leskovec, Stanford University, Stanford, CA 94305, jure@cs.stanford.edu

Relational data (like graphs, networks, and matrices) is often dynamic, where the relational structure evolves over time. A fundamental problem in the analysis of time-varying network data is to extract a summary of the common structure and the dynamics of the underlying relations between the entities. Here we build on the intuition that changes in the network structure are driven by the dynamics at the level of groups of nodes. We propose a nonparametric multi-group membership model for dynamic networks. Our model contains three main components. We model the birth and death of individual groups with respect to the dynamics of the network structure via a distance dependent Indian Buffet Process. We capture the evolution of individual node group memberships via a Factorial Hidden Markov model. And we explain the dynamics of the network structure by explicitly modeling the connectivity structure of groups. We demonstrate our model's capability of identifying the dynamics of latent groups in a number of different types of network data. Experimental results show that our model provides improved predictive performance over existing dynamic network models on future network forecasting and missing link prediction.

1 Introduction

Statistical analysis of social networks and other relational data is becoming an increasingly important problem as the scope and availability of network data increases. Network data, such as the friendships in a social network, is often dynamic in the sense that relations between entities rise and decay over time.
A fundamental problem in the analysis of such dynamic network data is to extract a summary of the common structure and the dynamics of the underlying relations between entities. Accurate models of structure and dynamics of network data have many applications. They allow us to predict missing relationships [19, 20, 22], recommend potential new relations [2], identify clusters and groups of nodes [1, 28], forecast future links [4, 9, 11, 23], and even predict group growth and longevity [15].

Here we present a new approach to modeling network dynamics by considering time-evolving interactions between groups of nodes as well as individual node arrival and departure dynamics to these groups. We develop a dynamic network model, the Dynamic Multi-group Membership Graph Model, that identifies the birth and death of individual groups as well as the dynamics of nodes joining and leaving groups in order to explain changes in the underlying network linking structure. Our nonparametric model considers an infinite number of latent groups, where each node can belong to multiple groups simultaneously. We capture the evolution of individual node group memberships via a Factorial Hidden Markov model. However, in contrast to recent works on dynamic network modeling [4, 5, 11, 12, 14], we explicitly model the birth and death dynamics of individual groups by using a distance-dependent Indian Buffet Process [7]. Under our model only active/alive groups influence relationships in a network at a given time. A further innovation of our approach is that we model not only relations between the members of the same group but also links between members and non-members.
By explicitly modeling group lifespan and group connectivity structure we achieve greater modeling flexibility, which leads to improved performance on link prediction and network forecasting tasks as well as to increased interpretability of obtained results.

The rest of the paper is organized as follows: Section 2 provides the background and Section 3 presents our generative model and motivates its parametrization. We discuss related work in Section 4 and present the model inference procedure in Section 5. Last, in Section 6 we provide experimental results as well as an analysis of the social network from the Lord of the Rings movie.

2 Models of Dynamic Networks

First, we describe general components of modern dynamic network models [4, 5, 11, 14]. In the next section we will then describe our own model and point out the differences to the previous work.

Dynamic networks are generally conceptualized as discrete time series of graphs on a fixed set of nodes $N$. A dynamic network $Y$ is represented as a time series of adjacency matrices $Y^{(t)}$ for each time $t = 1, 2, \cdots, T$. In this work, we limit our focus to unweighted directed as well as undirected networks. So, each $Y^{(t)}$ is an $N \times N$ binary matrix where $Y^{(t)}_{ij} = 1$ if a link from node $i$ to $j$ exists at time $t$ and $Y^{(t)}_{ij} = 0$ otherwise.

Each node $i$ of the network is associated with a number of latent binary features that govern the interaction dynamics with other nodes of the network. We denote the binary value of feature $k$ of node $i$ at time $t$ by $z^{(t)}_{ik} \in \{0, 1\}$. Such latent features can be viewed as assigning nodes to multiple overlapping, latent clusters or groups [1, 20]. In our work, we interpret these latent features as memberships to latent groups such as social communities of people with the same interests or hobbies. We allow each node to belong to multiple groups simultaneously.
We model each node-group membership using a separate Bernoulli random variable [17, 21, 28]. This is in contrast to mixed-membership models where the distribution over an individual node's group memberships is modeled using a multinomial distribution [1, 5, 12]. The advantage of our multiple-membership approach is as follows. Mixed-membership models (i.e., a multinomial distribution over group memberships) essentially assume that by increasing the amount of a node's membership to some group $k$, the same node's membership to some other group $k'$ has to decrease (due to the condition that the probabilities normalize to 1). On the other hand, multiple-membership models do not suffer from this assumption and allow nodes to truly belong to multiple groups. Furthermore, we consider a nonparametric model of groups which does not restrict the number of latent groups ahead of time. Hence, our model adaptively learns the appropriate number of latent groups for a given network at a given time.

In dynamic network models, one also specifies a process by which nodes dynamically join and leave groups. We assume that each node $i$ can join or leave a given group $k$ according to a Markov model. However, since each node can join multiple groups independently, we naturally consider factorial hidden Markov models (FHMM) [8], where the latent group membership of each node independently evolves over time. To be concrete, each membership $z^{(t)}_{ik}$ evolves through a 2-by-2 Markov transition probability matrix $Q^{(t)}_k$ where each entry $Q^{(t)}_k[r, s]$ corresponds to $P(z^{(t)}_{ik} = s \mid z^{(t-1)}_{ik} = r)$, where $r, s \in \{0 = \text{non-member}, 1 = \text{member}\}$.

Now, given node group memberships $z^{(t)}_{ik}$ at time $t$, one also needs to specify the process of link generation. Links of the network are realized according to a link function $f(\cdot)$.
A link from node $i$ to node $j$ at time $t$ occurs with probability determined by the link function $f(z^{(t)}_{i\cdot}, z^{(t)}_{j\cdot})$. In our model, we develop a link function that not only accounts for links between group members but also models links between the members and non-members of a given group.

3 Dynamic Multi-group Membership Graph Model

Next we shall describe our Dynamic Multi-group Membership Graph Model (DMMG) and point out the differences with the previous work. In our model, we pay close attention to the three processes governing network dynamics: (1) birth and death dynamics of individual groups, (2) evolution of memberships of nodes to groups, and (3) the structure of network interactions between group members as well as non-members. We now proceed by describing each of them in turn.

Model of active groups. Links of the network are influenced not only by nodes changing memberships to groups but also by the birth and death of groups themselves. New groups can be born and old ones can die. However, without explicitly modeling group birth and death there exists ambiguity between group membership change and the birth/death of groups. For example, consider two disjoint groups $k$ and $l$ such that their lifetimes and members do not overlap. In other words, group $l$ is born after group $k$ dies out. However, if group birth and death dynamics are not explicitly modeled, then the model could interpret the two groups as a single latent group where all the members of $k$ leave the group before the members of $l$ join the group. To resolve this ambiguity we devise an explicit model of birth/death dynamics of groups by introducing a notion of active groups.

Under our model, a group can be in one of two states: it can be either active (alive) or inactive (not yet born or dead). However, once a group becomes inactive, it can never be active again.
That is, once a group dies, it can never be alive again. To ensure coherence of a group's state over time, we build on the idea of distance-dependent Indian Buffet Processes (dd-IBP) [7]. The IBP is named after a metaphorical process that gives rise to a probability distribution, where customers enter an Indian Buffet restaurant and sample some subset of an infinitely long sequence of dishes. In the context of networks, nodes usually correspond to 'customers' and latent features/groups correspond to 'dishes'. However, we apply dd-IBP in a different way. We regard each time step $t$ as a 'customer' that samples a set of active groups $\mathcal{K}_t$. So, at the first time step $t = 1$, we have a $\mathrm{Poisson}(\lambda)$ number of groups that are initially active, i.e., $|\mathcal{K}_1| \sim \mathrm{Poisson}(\lambda)$. To account for the death of groups we then consider that each active group at time $t - 1$ can become inactive at the next time step $t$ with probability $\gamma$. On the other hand, $\mathrm{Poisson}(\gamma\lambda)$ new groups are also born at time $t$. Thus, at each time step currently active groups can die, while new ones can also be born. The hyperparameter $\gamma$ controls how often new groups are born and how often old ones die. For instance, there will be almost no newborn or dead groups if $\gamma \approx 0$, while there would be no temporal group coherence and practically all the groups would die between consecutive time steps if $\gamma = 1$.

Figure 1(a) gives an example of the above process. Black circles indicate active groups and white circles denote inactive (not yet born or dead) groups. Groups 1 and 3 exist at $t = 1$ and Group 2 is born at $t = 2$. At $t = 3$, Group 3 dies but Group 4 is born. Without our group activity model, Group 3 could have been reused with a completely new set of members and Group 4 would have never been born. Our model can distinguish these two disjoint groups.

Formally, we denote the number of active groups at time $t$ by $K_t = |\mathcal{K}_t|$.
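The birth and death process described above can be simulated in a few lines. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `sample_poisson` and `simulate_group_activity` are hypothetical, time steps are 0-indexed, and the small Poisson sampler is a stand-in suitable only for small rates.

```python
import math
import random


def sample_poisson(rng, rate):
    """Draw from Poisson(rate) with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_group_activity(T, lam, gamma, seed=0):
    """Return W as a list of T rows; W[t][k] = 1 if group k is active at step t."""
    rng = random.Random(seed)
    birth, death = [], []  # death[k] is the first step at which group k is inactive
    alive = []             # indices of groups active at the previous step
    for t in range(T):
        survivors = []
        for k in alive:
            if rng.random() < gamma:  # an active group dies with probability gamma
                death[k] = t
            else:
                survivors.append(k)
        # Poisson(lam) groups at the first step, Poisson(gamma * lam) afterwards.
        n_new = sample_poisson(rng, lam if t == 0 else gamma * lam)
        for _ in range(n_new):
            birth.append(t)
            death.append(T)  # alive until the end unless it dies later
            survivors.append(len(birth) - 1)
        alive = survivors
    n_groups = len(birth)
    return [[1 if birth[k] <= t < death[k] else 0 for k in range(n_groups)]
            for t in range(T)]
```

Each column of the returned matrix is a single contiguous run of ones, which mirrors the constraint that a dead group never becomes active again.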
We also denote the state (active/inactive) of group $k$ at time $t$ by $W^{(t)}_k = \mathbf{1}\{k \in \mathcal{K}_t\}$. For convenience, we also define the set of newly active groups at time $t$ to be $\mathcal{K}^{+}_t = \{k \mid W^{(t)}_k = 1,\ W^{(t')}_k = 0\ \forall t' < t\}$ and $K^{+}_t = |\mathcal{K}^{+}_t|$.

Putting it all together we can now fully describe the process of group birth/death as follows:

$$
K^{+}_t \sim \begin{cases} \mathrm{Poisson}(\lambda), & \text{for } t = 1 \\ \mathrm{Poisson}(\gamma\lambda), & \text{for } t > 1 \end{cases}
\qquad
W^{(t)}_k \sim \begin{cases} \mathrm{Bernoulli}(1 - \gamma), & \text{if } W^{(t-1)}_k = 1 \\ 1, & \text{if } \sum_{t'=1}^{t-1} K^{+}_{t'} < k \le \sum_{t'=1}^{t} K^{+}_{t'} \\ 0, & \text{otherwise.} \end{cases}
\tag{1}
$$

Note that under this model an infinite number of active groups can exist. This means our model automatically determines the right number of active groups and each node can belong to many groups simultaneously. We now proceed by describing the model of node group membership dynamics.

Dynamics of node group memberships. We capture the dynamics of nodes joining and leaving groups by assuming that latent node group memberships form a Markov chain. In this framework, node memberships to active groups evolve through time according to Markov dynamics:

$$
P(z^{(t)}_{ik} \mid z^{(t-1)}_{ik}) = Q_k = \begin{pmatrix} 1 - a_k & a_k \\ b_k & 1 - b_k \end{pmatrix},
$$

where matrix entry $Q_k[r, s]$ denotes a Markov transition from state $r$ to state $s$, which can be a fixed parameter, group specific, or otherwise domain dependent as long as it defines a Markov transition matrix. Thus, the transition of node $i$'s membership to active group $k$ can be defined as follows:

$$
a_k, b_k \sim \mathrm{Beta}(\alpha, \beta), \qquad z^{(t)}_{ik} \sim W^{(t)}_k \cdot \mathrm{Bernoulli}\left( a_k^{\,1 - z^{(t-1)}_{ik}} (1 - b_k)^{z^{(t-1)}_{ik}} \right).
\tag{2}
$$

Typically, $\beta > \alpha$, which ensures that a group's memberships are not too volatile over time.

Figure 1: (a) Birth and death of groups: Black circles represent active and white circles represent inactive (unborn or dead) groups. A dead group can never become active again.
(b) Link function: $z^{(t)}_i$ denotes binary node group memberships. Entries of the link affinity matrix $\Theta_k$ denote linking parameters between all 4 combinations of members ($z^{(t)}_i = 1$) and non-members ($z^{(t)}_i = 0$). To obtain the link probability $p^{(t)}_{ij}$, individual affinities $\Theta_k[z^{(t)}_i, z^{(t)}_j]$ are combined using a logistic function $g(\cdot)$.

Relationship between node group memberships and links of the network. Last, we describe the part of the model that establishes the connection between a node's memberships to groups and the links of the network. We achieve this by defining a link function $f(\cdot)$, which for a given pair of nodes $i, j$ determines their interaction probability $p^{(t)}_{ij}$ based on their group memberships.

We build on the Multiplicative Attribute Graph model [16, 18], where each group $k$ is associated with a link affinity matrix $\Theta_k \in \mathbb{R}^{2 \times 2}$. Each of the four entries of the link affinity matrix captures the tendency of linking between the group's members, between members and non-members, as well as between non-members themselves. While traditionally link affinities were considered to be probabilities, we relax this assumption by allowing affinities to be arbitrary real numbers and then combine them through a logistic function to obtain a final link probability.

The model is illustrated in Figure 1(b). Given group memberships $z^{(t)}_{ik}$ and $z^{(t)}_{jk}$ of nodes $i$ and $j$ at time $t$, the binary indicators "select" an entry $\Theta_k[z^{(t)}_{ik}, z^{(t)}_{jk}]$ of matrix $\Theta_k$. This way the linking tendency from node $i$ to node $j$ is reflected based on their membership to group $k$. We then determine the overall link probability $p^{(t)}_{ij}$ by combining the link affinities via a logistic function $g(\cdot)$¹. Thus,

$$
p^{(t)}_{ij} = f(z^{(t)}_{i\cdot}, z^{(t)}_{j\cdot}) = g\left( \epsilon_t + \sum_{k=1}^{\infty} \Theta_k[z^{(t)}_{ik}, z^{(t)}_{jk}] \right)
$$
$$
Y^{(t)}_{ij} \sim \mathrm{Bernoulli}(p^{(t)}_{ij})
\tag{3}
$$

where $\epsilon_t$ is a density parameter that reflects the varying link density of the network over time.

Note that due to the potentially infinite number of groups, the sum of an infinite number of link affinities may not be tractable. To resolve this, we notice that for a given $\Theta_k$, subtracting $\Theta_k[0, 0]$ from all its entries and then adding this value to $\epsilon_t$ does not change the overall linking probability $p^{(t)}_{ij}$. Thus, we can set $\Theta_k[0, 0] = 0$ and then only a finite number of affinities selected by $z^{(t)}_{ik}$ have to be considered. For all other entries of $\Theta_k$ we use $\mathcal{N}(0, \nu^2)$ as a prior distribution.

To sum up, Figure 2 illustrates the three components of the DMMG in plate notation. A group's state $W^{(t)}_k$ is determined by the dd-IBP process and each node-group membership $z^{(t)}_{ik}$ is defined as the FHMM over active groups. Then, the link between nodes $i$ and $j$ is determined based on the groups they belong to and the corresponding group link affinity matrices $\Theta$.

¹ $g(x) = \exp(x) / (1 + \exp(x))$

Figure 2: Dynamic Multi-group Membership Graph Model. Network $Y$ depends on each node's group memberships $Z$ and active groups $W$. Links of $Y$ appear via link affinities $\Theta$.

4 Related Work

Classically, non-Bayesian approaches such as exponential random graph models [10, 26] have been used to study dynamic networks. On the other hand, in the Bayesian approaches to dynamic network analysis, latent variable models have been most widely used. These approaches differ by the structure of the latent space that they assume. For example, Euclidean space models [13, 23] place nodes in a low dimensional Euclidean space and the network evolution is then modeled as a regression problem of a node's future latent location. In contrast, our model uses HMMs, where latent variables stochastically depend on the state at the previous time step.
Related to our work are dynamic mixed-membership models where a node is probabilistically allocated to a set of latent features. Examples of this model include the dynamic mixed-membership block model [5, 12] and the dynamic infinite relational model [14]. However, the critical difference here is that our model uses multi-memberships where a node's membership to one group does not limit its membership to other groups. Probably most related to our work here are the DRIFT [4] and LFP [11] models. Both of these models consider Markov switching of latent multi-group memberships over time. DRIFT uses the infinite factorial HMM [6], while LFP adds "social propagation" to the Markov processes so that the network links of each node at a given time directly influence the group memberships of the corresponding node at the next time. Compared to these models, we uniquely incorporate the model of group birth and death and present a novel and powerful linking function.

5 Model Inference via MCMC

We develop a Markov chain Monte Carlo (MCMC) procedure to approximate samples from the posterior distribution of the latent variables in our model. More specifically, there are five types of variables that we need to sample: node group memberships $Z = \{z^{(t)}_{ik}\}$, group states $W = \{W^{(t)}_k\}$, group membership transitions $Q = \{Q_k\}$, link affinities $\Theta = \{\Theta_k\}$, and density parameters $\epsilon = \{\epsilon_t\}$. By sampling each type of variable while fixing all the others, we end up with many samples representing the posterior distribution $P(Z, W, Q, \Theta, \epsilon \mid Y, \lambda, \gamma, \alpha, \beta)$. We shall now explain a sampling strategy for each variable type.

Sampling node group memberships Z. To sample node group membership $z^{(t)}_{ik}$, we use the forward-backward recursion algorithm [25].
The algorithm first defines a deterministic forward pass which runs down the chain starting at time one, and at each time point $t$ collects information from the data and parameters up to time $t$ in a dynamic programming cache. A stochastic backward pass starts at time $T$ and samples each $z^{(t)}_{ik}$ in backwards order using the information collected during the forward pass. In our case, we only need to sample $z^{(T^B_k : T^D_k)}_{ik}$ where $T^B_k$ and $T^D_k$ indicate the birth time and the death time of group $k$. Due to space constraints, we discuss further details in the Appendix.

Sampling group states W. To update active groups, we use the Metropolis-Hastings algorithm with the following proposal distribution $P(W \to W')$: We add a new group, remove an existing group, or update the lifetime of an active group with the same probability 1/3. When adding a new group $k'$ we select the birth and death time of the group at random such that $1 \le T^B_{k'} \le T^D_{k'} \le T$. For removing groups we randomly pick one of the existing groups $k''$ and remove it by setting $W^{(t)}_{k''} = 0$ for all $t$. Finally, to update the birth and death time of an existing group, we select an existing group and propose a new birth and death time of the group at random. Once a new state vector $W'$ is proposed we accept it with probability

$$
\min\left( 1,\ \frac{P(Y \mid W')\, P(W' \mid \lambda, \gamma)\, P(W' \to W)}{P(Y \mid W)\, P(W \mid \lambda, \gamma)\, P(W \to W')} \right).
\tag{4}
$$

We compute $P(W \mid \lambda, \gamma)$ and $P(W' \to W)$ in closed form, while we approximate the posterior $P(Y \mid W)$ by sampling $L$ Gibbs samples while keeping $W$ fixed.

Sampling group membership transition matrix Q.
The Beta distribution is a conjugate prior of the Bernoulli distribution and thus we can sample each $a_k$ and $b_k$ in $Q_k$ directly from the posterior distribution: $a_k \sim \mathrm{Beta}(\alpha + N_{01,k},\ \beta + N_{00,k})$ and $b_k \sim \mathrm{Beta}(\alpha + N_{10,k},\ \beta + N_{11,k})$, where $N_{rs,k}$ is the number of nodes that transition from state $r$ to $s$ in group $k$ ($r, s \in \{0 = \text{non-member}, 1 = \text{member}\}$).

Sampling link affinities Θ. Once node group memberships $Z$ are determined, we update the entries of link affinity matrices $\Theta_k$. Direct sampling of $\Theta$ is intractable because of the non-conjugacy of the logistic link function. An appropriate method in such a case would be the Metropolis-Hastings algorithm, which accepts or rejects the proposal based on the likelihood ratio. However, to avoid low acceptance rates and quickly move toward the mode of the posterior distribution, we develop a method based on Hybrid Monte Carlo (HMC) sampling [3]. We guide the sampling using the gradient of the log-likelihood function with respect to each $\Theta_k$. Because links $Y^{(t)}_{ij}$ are generated independently given group memberships $Z$, the gradient with respect to $\Theta_k[x, y]$ can be computed by

$$
-\frac{1}{2\sigma^2} \Theta_k^2 + \sum_{i,j,t} \left( Y^{(t)}_{ij} - p^{(t)}_{ij} \right) \mathbf{1}\{ z^{(t)}_{ik} = x,\ z^{(t)}_{jk} = y \}.
\tag{5}
$$

Updating density parameter ε. Parameter vector $\epsilon$ is defined over a finite dimension $T$. Therefore, we can update $\epsilon$ by maximizing the log-likelihood given all the other variables. We compute the gradient update for each $\epsilon_t$ and directly update $\epsilon_t$ via a gradient step.

Updating hyperparameters. The number of groups over all time periods is given by a Poisson distribution with parameter $\lambda(1 + \gamma(T - 1))$. Hence, given $\gamma$ we sample $\lambda$ by using a Gamma conjugate prior. Similarly, we can use the Beta conjugate prior for the group death process (i.e., Bernoulli distribution) to sample $\gamma$.
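The Beta-Bernoulli conjugate update for the transition parameters $a_k$ and $b_k$ described above can be illustrated with a short sketch. This is a hedged illustration, not the authors' code: the helper names `count_transitions` and `sample_transition_params` are hypothetical, the input `z` is assumed to hold each node's 0/1 membership in a single group over that group's active lifetime, and $N_{rs,k}$ is assumed to count transitions over all nodes and consecutive time steps.

```python
import random


def count_transitions(z):
    """Count r -> s membership transitions for one group.

    z[i] is the 0/1 membership sequence of node i over the group's lifetime.
    """
    counts = {(r, s): 0 for r in (0, 1) for s in (0, 1)}
    for path in z:
        for prev, cur in zip(path, path[1:]):
            counts[(prev, cur)] += 1
    return counts


def sample_transition_params(z, alpha, beta, rng):
    """Sample (a_k, b_k) from their Beta posteriors given the counts."""
    n = count_transitions(z)
    # a_k: probability of joining (0 -> 1); b_k: probability of leaving (1 -> 0).
    a_k = rng.betavariate(alpha + n[(0, 1)], beta + n[(0, 0)])
    b_k = rng.betavariate(alpha + n[(1, 0)], beta + n[(1, 1)])
    return a_k, b_k
```

For example, `sample_transition_params([[0, 0, 1, 1], [1, 1, 1, 0]], 1.0, 5.0, random.Random(0))` returns one posterior draw of the join and leave probabilities for that group.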
However, hyperparameters $\alpha$ and $\beta$ do not have a conjugate prior, so we update them by using a gradient method based on the sampled values of $a_k$ and $b_k$.

Time complexity of model parameter estimation. Last, we briefly comment on the time complexity of our model parameter estimation procedure. Each sample $z^{(t)}_{ik}$ requires computation of the link probability $p^{(t)}_{ij}$ for all $j \neq i$. Since the expected number of active groups at each time is $\lambda$, this requires $O(\lambda N^2 T)$ computations of $p^{(t)}_{ij}$. By caching the sum of link affinities between every pair of nodes, sampling $Z$ as well as $W$ requires $O(\lambda N^2 T)$ time. Sampling $\Theta$ and $\epsilon$ also requires $O(\lambda N^2 T)$ because the gradient of each $p^{(t)}_{ij}$ needs to be computed. Overall, our approach takes $O(\lambda N^2 T)$ to obtain a single sample, while models that are based on the interaction matrix between all groups [4, 5, 11] require $O(K^2 N^2 T)$, where $K$ is the expected number of groups. Furthermore, it has been shown that $O(\log N)$ groups are enough to represent networks [16, 18]. Thus, in practice $K$ (i.e., $\lambda$) is of order $\log N$ and the running time for each sample is $O(N^2 T \log N)$.

6 Experiments

We evaluate our model on three different tasks. For quantitative evaluation, we perform missing link prediction as well as future network forecasting and show our model gives favorable performance when compared to current dynamic and static network models. We also analyze the dynamics of groups in a dynamic social network of characters in the movie "The Lord of the Rings: The Two Towers."

Experimental setup. For the two prediction experiments, we use the following three datasets. First, the NIPS co-authorship network connects two people if they appear on the same publication in the NIPS conference in a given year. The network spans $T = 17$ years (1987 to 2003).
Following [11] we focus on a subset of the 110 most connected people over all time periods. Second, the DBLP co-authorship network is obtained from 21 Computer Science conferences from 2000 to 2009 ($T = 10$) [27]. We focus on 209 people by taking the 7-core of the aggregated network for the entire time. Third, the INFOCOM dataset represents the physical proximity interactions between 78 students at the 2006 INFOCOM conference, recorded by wireless detector remotes given to each attendee [24]. As in [11] we use the processed data that removes inactive time slices to have $T = 50$.

To evaluate the predictive performance of our model, we compare it to three baseline models. For a naive baseline model, we regard the relationship between each pair of nodes as an instance of an independent Bernoulli distribution with a $\mathrm{Beta}(1, 1)$ prior. Thus, for a given pair of nodes, the link probability at each time equals the expected probability from the posterior distribution given network data. The second baseline is LFRM [20], a model of static networks. For missing link prediction, we independently fit LFRM to each snapshot of the dynamic networks. For the network forecasting task, we fit LFRM to the most recent snapshot of a network. Even though LFRM does not capture time dynamics, we consider this to be a strong baseline model.

| Model | NIPS TestLL | NIPS AUC | NIPS F1 | DBLP TestLL | DBLP AUC | DBLP F1 | INFOCOM TestLL | INFOCOM AUC | INFOCOM F1 |
|---|---|---|---|---|---|---|---|---|---|
| Naive | -2030 | 0.808 | 0.177 | -12051 | 0.814 | 0.300 | -17821 | 0.677 | 0.252 |
| LFRM | -880 | 0.777 | 0.195 | -3783 | 0.784 | 0.146 | -8689 | 0.946 | 0.703 |
| DRIFT | -758 | 0.866 | 0.296 | -3108 | 0.916 | 0.421 | -6654 | 0.973 | 0.757 |
| DMMG | **-624** | **0.916** | **0.434** | **-2684** | **0.939** | **0.492** | **-6422** | **0.976** | **0.764** |

Table 1: Missing link prediction. We bold the performance of the best scoring method. Our DMMG performs the best in all cases. All improvements are statistically significant at the 0.01 significance level.
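The naive baseline described above has a closed-form prediction, which the following sketch illustrates. It is a minimal example under stated assumptions: the function name `naive_link_probability` is hypothetical, and held-out entries for a node pair are assumed to be marked with `None`.

```python
def naive_link_probability(observations, prior_a=1.0, prior_b=1.0):
    """Posterior mean link probability for one node pair.

    observations holds the pair's 0/1 link indicators over time, with None
    marking held-out entries. Under a Beta(prior_a, prior_b) prior and a
    Bernoulli likelihood, the posterior is
    Beta(prior_a + links, prior_b + non_links), and its mean is returned.
    """
    observed = [y for y in observations if y is not None]
    links = sum(observed)
    return (prior_a + links) / (prior_a + prior_b + len(observed))
```

With two observed links out of three observed time steps, `naive_link_probability([1, 0, None, 1])` gives (1 + 2) / (2 + 3) = 0.6, and a pair with no observed entries falls back to the prior mean of 0.5.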
Finally, for the comparison with dynamic network models, we consider two recent state of the art models. The DRIFT model [4] is based on an infinite factorial HMM and the authors kindly shared their implementation. We also consider the LFP model [11], for which we were not able to obtain the implementation, but since we use the same datasets, we compare performance numbers directly with those reported in [11].

To evaluate predictive performance, we use various standard evaluation metrics. First, to assess the goodness of inferred probability distributions, we report the log-likelihood of held-out edges. Second, to verify the predictive performance, we compute the area under the ROC curve (AUC). Last, we also report the maximum F1-score (F1) by scanning over all possible precision/recall thresholds.

Task 1: Predicting missing links. To generate the datasets for the task of missing link prediction, we randomly hold out 20% of node pairs (i.e., either link or non-link) throughout the entire time period. We then run each model to obtain 400 samples after 800 burn-in samples for each of 10 MCMC chains. Each sample gives a link probability for a given missing entry, so the final link probability of a missing entry is computed by averaging the corresponding link probability over all the samples. This final link probability provides the evaluation metric for a given missing data entry.

Table 1 shows average evaluation metrics for each model and dataset over 10 runs. We also compute the p-value on the difference between the two best results for each dataset and metric. Overall, our DMMG model significantly outperforms the other models in every metric and dataset. Particularly in terms of F1-score we gain up to 46.6% improvement over the other models.
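The evaluation steps described above (averaging link probabilities over MCMC samples, then scoring with AUC and maximum F1) can be sketched as follows. This is an illustrative sketch, not the evaluation code used for the reported numbers: the function names are hypothetical, labels are assumed to be 0/1 with at least one of each, and the pairwise AUC computation is quadratic and meant only for small inputs.

```python
def posterior_mean_probability(sample_probs):
    """Average the link probability of one held-out entry over all samples."""
    return sum(sample_probs) / len(sample_probs)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def max_f1(labels, scores):
    """Maximum F1 over all thresholds; an entry is predicted as a link
    when its score is at least the threshold."""
    best = 0.0
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < thr and y == 1)
        if tp:
            best = max(best, 2 * tp / (2 * tp + fp + fn))
    return best
```

For labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75, and the best threshold (0.7) gives an F1 of 0.8.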
By comparing the naive model and LFRM, we observe that LFRM performs especially poorly compared to the naive model in the two networks with few edges (NIPS and DBLP). Intuitively this makes sense because due to the network sparsity we can obtain more information from the temporal trajectory of each link than from each snapshot of the network. However, both DRIFT and DMMG successfully combine the temporal and the network information, which results in better predictive performance. Furthermore, we note that DMMG outperforms the other models by a larger margin as networks get sparser. DMMG makes better use of temporal information because it can explicitly model temporally local links through active groups.

Last, we also compare our model to the LFP model. The LFP paper reports an AUC ROC score of ∼0.85 for NIPS and ∼0.95 for INFOCOM on the same task of missing link prediction with 20% held-out missing data [11]. Performance of our DMMG on these same networks under the same conditions is 0.916 for NIPS and 0.976 for INFOCOM, which is a strong improvement over LFP.

Task 2: Future network forecasting. Here we are given a dynamic network up to time $T_{obs}$ and the goal is to predict the network at the next time $T_{obs} + 1$. We follow the experimental protocol described in [4, 11]: We train the models on the first $T_{obs}$ networks, fix the parameters, and then for each model we run MCMC sampling one time step into the future. For each model and network, we obtain 400 samples with 10 different MCMC chains, resulting in 400K network samples. These network samples provide a probability distribution over links at time $T_{obs} + 1$.

Table 2 shows performance averaged over different $T_{obs}$ values ranging from 3 to $T - 1$. Overall, DMMG generally exhibits the best performance, but performance results seem to depend on the dataset.
DMMG performs the best at the 0.001 significance level in terms of AUC and F1 for the NIPS dataset, and at the 0.05 level for the INFOCOM dataset. While DMMG improves performance on AUC (9%) and F1 (133%), DRIFT achieves the best log-likelihood on the NIPS dataset. In light of our previous observations, we conjecture that this is due to the change in network edge density between different snapshots. On the DBLP dataset, DRIFT gives the best log-likelihood, the naive model performs best in terms of AUC, and DMMG is the best on F1 score. However, in all cases of the DBLP dataset, the differences are not statistically significant. Overall, DMMG performs the best on NIPS and INFOCOM and provides comparable performance on DBLP.

| Model | NIPS TestLL | NIPS AUC | NIPS F1 | DBLP TestLL | DBLP AUC | DBLP F1 | INFOCOM TestLL | INFOCOM AUC | INFOCOM F1 |
|---|---|---|---|---|---|---|---|---|---|
| Naive | -547 | 0.524 | 0.130 | -3248 | **0.668** | 0.243 | -774 | 0.673 | 0.270 |
| LFRM | -356 | 0.398 | 0.011 | -1680 | 0.492 | 0.024 | -760 | 0.640 | 0.248 |
| DRIFT | **-148** | 0.672 | 0.084 | **-1324** | 0.650 | 0.122 | -661 | 0.782 | 0.381 |
| DMMG | -170 | **0.732** | **0.196** | -1347 | 0.652 | **0.245** | **-625** | **0.804** | **0.392** |

Table 2: Future network forecasting. DMMG performs best on NIPS and INFOCOM while results on DBLP are mixed.

Figure 3: Group arrival and departure dynamics of different characters in the Lord of the Rings. Dark areas in the plots correspond to a given node's (y-axis) membership to each group over time (x-axis). Panels: (a) Group 1, (b) Group 2, (c) Group 3. Each panel lists the 21 characters on the y-axis and time epochs 1 to 5 on the x-axis.

Task 3: Case study of "The Lord of the Rings: The Two Towers" social network.
Last, we also investigate groups identified by our model on a dynamic social network of characters in a movie, The Lord of the Rings: The Two Towers. Based on the transcript of the movie we created a dynamic social network on 21 characters and T=5 time epochs, where we connect a pair of characters if they co-appear inside some time window. We fit our model to this network and examine the results in Figure 3. Our model identified three dynamic groups, which all nicely correspond to the Lord of the Rings storyline. For example, the core of Group 1 corresponds to Aragorn, elf Legolas, dwarf Gimli, and people in Rohan who in the end all fight against the Orcs. Similarly, Group 2 corresponds to hobbits Sam, Frodo and Gollum on their mission to destroy the ring in Mordor, and are later joined by Faramir and ranger Madril. Interestingly, Group 3 evolving around Merry and Pippin only forms at t=2 when they start their journey with Treebeard and later fight against wizard Saruman. While the fight occurs in two separate places we find that some scenes are not distinguishable, so it looks as if Merry and Pippin fought together with Rohan's army against Saruman's army.

Acknowledgments

We thank Creighton Heaukulani and Zoubin Ghahramani for sharing data and code. This research has been supported in part by NSF IIS-1016909, CNS-1010921, IIS-1149837, IIS-1159679, IARPA AFRL FA8650-10-C-7058, Okawa Foundation, Docomo, Boeing, Allyes, Volkswagen, Intel, Alfred P. Sloan Fellowship and the Microsoft Faculty Fellowship.

References

[1] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. JMLR, 9, 2008.
[2] L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. In WSDM, 2011.
[3] S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid monte carlo.
Physics Letters B, 195(2):216–222, 1987.
[4] J. Foulds, A. U. Asuncion, C. DuBois, C. T. Butts, and P. Smyth. A dynamic relational infinite feature model for longitudinal social networks. In AISTATS, 2011.
[5] W. Fu, L. Song, and E. P. Xing. Dynamic mixed membership blockmodel for evolving networks. In ICML, 2009.
[6] J. V. Gael, Y. W. Teh, and Z. Ghahramani. The infinite factorial hidden markov model. In NIPS, 2009.
[7] S. J. Gershman, P. I. Frazier, and D. M. Blei. Distance dependent infinite latent feature models. arXiv:1110.5454, 2012.
[8] Z. Ghahramani and M. I. Jordan. Factorial hidden markov models. Machine Learning, 29(2-3):245–273, 1997.
[9] F. Guo, S. Hanneke, W. Fu, and E. P. Xing. Recovering temporally rewiring networks: a model-based approach. In ICML, 2007.
[10] S. Hanneke, W. Fu, and E. P. Xing. Discrete temporal models of social networks. Electron. J. Statist., 4:585–605, 2010.
[11] C. Heaukulani and Z. Ghahramani. Dynamic probabilistic models for latent feature propagation in social networks. In ICML, 2013.
[12] Q. Ho, L. Song, and E. P. Xing. Evolving cluster mixed-membership blockmodel for time-varying networks. In AISTATS, 2011.
[13] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. JASA, 97(460):1090–1098, 2002.
[14] K. Ishiguro, T. Iwata, N. Ueda, and J. Tenenbaum. Dynamic infinite relational model for time-varying relational data analysis. In NIPS, 2010.
[15] S. Kairam, D. Wang, and J. Leskovec. The life and death of online groups: Predicting group growth and longevity. In WSDM, 2012.
[16] M. Kim and J. Leskovec. Modeling social networks with node attributes using the multiplicative attribute graph model. In UAI, 2011.
[17] M. Kim and J. Leskovec. Latent multi-group membership graph model. In ICML, 2012.
[18] M. Kim and J. Leskovec.
Multiplicative attribute graph model of real-world networks. Internet Mathematics, 8(1-2):113–160, 2012.
[19] J. R. Lloyd, P. Orbanz, Z. Ghahramani, and D. M. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In NIPS, 2012.
[20] K. T. Miller, T. L. Griffiths, and M. I. Jordan. Nonparametric latent feature models for link prediction. In NIPS, 2010.
[21] M. Mørup, M. N. Schmidt, and L. K. Hansen. Infinite multiple membership relational modeling for complex networks. In MLSP, 2011.
[22] K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In ICML, 2012.
[23] P. Sarkar and A. W. Moore. Dynamic social network analysis using latent space models. In NIPS, 2005.
[24] J. Scott, R. Gass, J. Crowcroft, P. Hui, C. Diot, and A. Chaintreau. CRAWDAD data set cambridge/haggle (v. 2009-05-29), May 2009.
[25] S. L. Scott. Bayesian methods for hidden markov models. JASA, 97(457):337–351, 2002.
[26] T. A. B. Snijders, G. G. van de Bunt, and C. E. G. Steglich. Introduction to stochastic actor-based models for network dynamics. Social Networks, 32(1):44–60, 2010.
[27] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: Extraction and mining of academic social networks. In KDD'08, 2008.
[28] J. Yang and J. Leskovec. Community-affiliation graph model for overlapping community detection. In ICDM, 2012.

A Sampling group memberships Z

To sample node group membership $z_{ik}^{(t)}$, we use the forward-backward recursion algorithm [25] that samples the whole Markov chain $z_{ik}^{(1:T)}$ at once. Since we focus only on active groups, we only need to sample $z_{ik}^{(T_k^B : T_k^D)}$, where $T_k^B$ and $T_k^D$ indicate the birth time and the death time of group $k$, respectively. Suppose that all the other variables but $Z$ are given.
To sample each group membership $z_{ik}^{(t)}$, we use the forward-backward recursion algorithm [25], which samples the whole Markov chain $z_{ik}^{(1:T)}$ jointly. Moreover, since the active groups are fixed, i.e., the birth time $T_k^B$ and death time $T_k^D$ of group $k$ are given, we only need to sample its sub-chain $z_{ik}^{(T_k^B : T_k^D)}$. The algorithm consists of two passes: a forward pass and a backward pass. In the forward pass, for each time $t$, we compute the posterior transition probability of $z_{ik}^{(\cdot)}$ from $t-1$ to $t$ given the links up to time $t$. Once the forward pass is done, we sample the latent feature $z_{ik}^{(\cdot)}$ backward from $T_k^D$ to $T_k^B$, using the posterior transition probabilities computed in the forward pass.

To be concrete, let $\Omega$ be the states of all the other variables except for $z_{ik}^{(\cdot)}$. For the forward pass, we define the following variables:

$$P_{trs} = P\left(z_{ik}^{(t-1)} = r,\ z_{ik}^{(t)} = s \mid Y^{(T_k^B : t)}, \Omega\right), \qquad \pi_{ts} = P\left(z_{ik}^{(t)} = s \mid Y^{(T_k^B : t)}, \Omega\right). \tag{6}$$

Then, we can find the value of each $P_{trs}$ and $\pi_{ts}$ by dynamic programming:

$$\pi_{ts} = \sum_{r} P_{trs}, \qquad P_{trs} \propto \pi_{t-1,r}\, Q_k[r,s]\, P\left(Y^{(t)} \mid z_{ik}^{(t)} = s, \Omega\right) \tag{7}$$

where

$$Q_k = \begin{pmatrix} 1 - a_k & a_k \\ 1 - b_k & b_k \end{pmatrix}$$

and $\sum_{r,s} P_{trs} = 1$. Now given each $P_{trs}$ and $\pi_{ts}$, $z_{ik}^{(T_k^D)}$ can be sampled according to $\pi_{T_k^D}$, and then the backward pass samples the $z_{ik}^{(\cdot)}$ chain backwards:

$$P\left(z_{ik}^{(t)} = r \mid z_{ik}^{(t+1)} = s,\ Y^{(T_k^B : T_k^D)}, \Omega\right) \propto P_{(t+1)rs}. \tag{8}$$
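The two passes in Eqs. (6) to (8) can be sketched for a single binary chain as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `sample_membership_chain` is hypothetical, the per-time likelihoods $P(Y^{(t)} \mid z_{ik}^{(t)} = s, \Omega)$ are assumed to be precomputed and passed in as `lik`, and `init` is an assumed distribution for the state just before $T_k^B$ (here, inactive with probability 1), which the text does not specify. The forward step weights the previous filtered probability of state $r$ by the transition $Q_k[r, s]$ and the likelihood of state $s$.

```python
import numpy as np

def sample_membership_chain(lik, a, b, rng, init=(1.0, 0.0)):
    """Forward filtering, backward sampling for one binary membership chain.

    lik[t, s]: likelihood of the links at step t given state s (up to a constant),
               with t indexing the active window T_B..T_D.
    a, b:      P(z_t = 1 | z_{t-1} = 0) and P(z_t = 1 | z_{t-1} = 1).
    init:      assumed state distribution just before T_B (not given in the text).
    """
    lik = np.asarray(lik, dtype=float)
    T = lik.shape[0]
    Q = np.array([[1.0 - a, a], [1.0 - b, b]])  # Q[r, s]: transition r -> s
    P = np.zeros((T, 2, 2))                     # P[t, r, s], as in Eq. (6)
    pi = np.zeros((T, 2))                       # pi[t, s], as in Eq. (6)

    # Forward pass: Eq. (7), normalized so that each P[t] sums to one.
    prev = np.asarray(init, dtype=float)
    for t in range(T):
        joint = prev[:, None] * Q * lik[t][None, :]
        P[t] = joint / joint.sum()
        pi[t] = P[t].sum(axis=0)
        prev = pi[t]

    # Backward pass: draw the last state from pi, then apply Eq. (8).
    z = np.zeros(T, dtype=int)
    z[T - 1] = rng.choice(2, p=pi[T - 1])
    for t in range(T - 2, -1, -1):
        weights = P[t + 1][:, z[t + 1]]         # proportional to P_{(t+1) r s}
        z[t] = rng.choice(2, p=weights / weights.sum())
    return z, P, pi
```

For example, `sample_membership_chain(lik, a=0.3, b=0.8, rng=np.random.default_rng(0))` returns one joint draw of the sub-chain together with the filtered quantities. Sampling the chain jointly, rather than one time step at a time, is the design choice the text attributes to [25].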