Network as a computer: ranking paths to find flows

Dusko Pavlovic ⋆
Oxford University and Kestrel Institute

Abstract. We explore a simple mathematical model of network computation, based on Markov chains. Similar models apply to a broad range of computational phenomena, arising in networks of computers, as well as in genetic and neural nets, in social networks, and so on. The main problem of interaction with such spontaneously evolving computational systems is that the data are not uniformly structured. An interesting approach is to try to extract the semantical content of the data from their distribution among the nodes. A concept is then identified by finding the community of nodes that share it. The task of data structuring is thus reduced to the task of finding the network communities, as groups of nodes that together perform some non-local data processing. Towards this goal, we extend the ranking methods from nodes to paths, which allows us to extract information about the likely flow biases from the available static information about the network.

1 Introduction

Initially, Web search was developed as an instance of information retrieval, optimized for a particularly large distributed database. With the advent of online advertising, Web search got enhanced by a broad range of information supply techniques, where the search results are expanded by additional data, extrapolated from the user's interests and from the search engine's stock of information. From the simple idea to match and coordinate the push and the pull of information on the Web as a new computational platform [18] sprang up a new generation of web businesses and social networks.
Similar patterns of information processing are found in many other evolutionary systems, from gene regulation, protein interaction and neural nets, through the various networks of computers and devices, to the complex social and market structures [15].

This paper explores some simple mathematical consequences of the observation that the Web, and similar networks, are much more than mere information repositories: besides storing, retrieving, and supplying information, they also generate and process information. We pursue the idea that the Web can be modeled as a computer, rather than a database; or more precisely, as a vast multi-party computation [6], akin to a market place, where masses of selfish agents jointly evaluate and generate public information, driven by their private utilities.

While this view raises interesting new problems across the whole gamut of Computer Science, the most effective solutions, so far, of the problem of semantical interactions with the Web computations were obtained by rediscovering and adopting the ranking methods, deeply rooted in the sociometric tradition [11,10], and adapting them for use on very large indices, leading to the whole new paradigm of search [19,12,13]. The idea of the Web as a computer is tacitly present already in this paradigm, in the sense that the search rankings are extracted from the link structure, and other intrinsic information, generated on the Web itself, rather than stored in it.

Outline of the paper. In section 2 we introduce the basic network model, and describe a first attempt to extract information about the flows through a network from the available static data about it. In sections 3 and 4, we describe the structure which allows us to lift the notion of rank, described in section 5, to path networks in section 6.
Ranking paths allows us to extract a random variable, called attraction bias, which allows measuring the mutual information of the distributions of the inputs and the outputs of the network computation. This can be viewed as an indicator of non-local information processing that takes place in the given network. In the final section, we describe how the obtained data can be used to detect semantical structures in a network. The experimental work necessary to test the practical effectiveness of the approach is left for future work.

⋆ Email: dusko@{kestrel.edu,comlab.ox.ac.uk}. Supported by EPSRC and ONR.

2 Networks

Basic model. We view a network as an edge-labelled directed graph
$$A = \left( R \xleftarrow{\;\gamma\;} E \underset{\varrho}{\overset{\delta}{\rightrightarrows}} N \right),$$
where $N$ and $E$ are, respectively, the finite sets of nodes, and links, or edges, whereas $R$ is an ordered field of values (in some applications an ordered ring of functions). A link $i \xrightarrow{e} j$ is thus an element $e \in E$ with $\delta(e) = i$ and $\varrho(e) = j$. The value $\gamma(e)$ is the cost (when positive), or payoff (when negative) of the traffic over $e$. These data induce the adjacency matrix $E = (E_{ij})_{N \times N}$ and the capacity matrix $A = (A_{ij})_{N \times N}$, with the entries
$$E_{ij} = \{ e \in E \mid i \xrightarrow{e} j \} \qquad \text{and} \qquad A_{ij} = \sum_{e \in E_{ij}} A_e,$$
where $A_e = 2^{-\gamma(e)}$ is the capacity of the link $e$.

Remark. The term "capacity" is used here as in network flow theory.¹ The cost or the payoff of a link may represent its value in a pay-per-click model of a fragment of the Web; or it may denote the proximity of the web pages within the same site, or within a group of interconnected sites. In a protein network, the energy cost or payoff may be derived from the chemical affinities between the nodes. While this parameter can be abstracted away, simply by taking $\gamma(e) = 0$ for all links $e$, its role will become clear in sections 3 and 6, where it allows discounting and eliminating some paths.

Basic dynamics.
The simplest model of network dynamics is based on the assumption that the traffic flows are distributed proportionally to the link capacities. Randomly sampling the Web traffic, we shall thus find a surfer on a link $e$ with the probability
$$\alpha_e = \frac{A_e}{A_\bullet}, \qquad \text{where } A_\bullet = \sum_{f \in E} A_f.$$
In order to find the communities in a network, we need to detect the traffic biases between its nodes. We assume that the traffic between the nodes within the same community will be higher than the capacity of the links between them would lead us to expect; and that the traffic between the different communities will be lower than expected. To measure such traffic biases, we normalize the capacity matrix $A$ to get the capacity distribution $\alpha = (\alpha_{ij})_{N \times N}$ as
$$\alpha_{ij} = \frac{A_{ij}}{A_{\bullet\bullet}}, \qquad \text{where } A_{\bullet\bullet} = \sum_{k,\ell \in N} A_{k\ell}.$$
The entry $\alpha_{ij}$ is thus the probability that a random sample of traffic on $A$, following the simple dynamics proportional to capacity, will be found on a link from $i$ to $j$. On the other hand, the marginals of the probability distribution $\alpha$,
$$\alpha_{i\bullet} = \sum_{j \in N} \alpha_{ij}, \qquad \alpha_{\bullet j} = \sum_{i \in N} \alpha_{ij},$$
correspond, respectively, to the probabilities that a random sample of traffic will have $i$ as its source, and $j$ as its destination. Let us call $\alpha_{i\bullet}$ the out-rank of $i$, and $\alpha_{\bullet j}$ the in-rank of $j$, because they can be viewed as the simplest, albeit degenerate cases of the notion of rank.

If the in-rank and the out-rank are statistically independent, then (by the definition of independence) the probability that a random traffic sample goes from $i$ to $j$ will be $\alpha_{i\bullet}\,\alpha_{\bullet j}$. Their dependency is thus measured by the traffic bias matrix $\upsilon = (\upsilon_{ij})_{N \times N}$ with the entries
$$\upsilon_{ij} = \alpha_{ij} - \alpha_{i\bullet}\,\alpha_{\bullet j}$$
falling in the interval $[-1, 1]$. The higher the bias, the more unexpected traffic there is.
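As a rough illustration of the definitions above, the following Python sketch computes the out-ranks, the in-ranks and the traffic bias of a small network. The four-node network, its costs, and the helper names `capacity_matrix` and `traffic_bias` are assumptions made for this example only; they are not taken from the paper.

```python
def capacity_matrix(n, links):
    """Sum the capacities A_e = 2**(-cost) of all links from i to j."""
    A = [[0.0] * n for _ in range(n)]
    for i, j, cost in links:
        A[i][j] += 2.0 ** (-cost)
    return A


def traffic_bias(A):
    """Return the out-ranks, the in-ranks and the traffic bias matrix of A."""
    n = len(A)
    total = sum(sum(row) for row in A)  # A_{..}, the total capacity
    alpha = [[A[i][j] / total for j in range(n)] for i in range(n)]
    out_rank = [sum(alpha[i]) for i in range(n)]                      # alpha_{i.}
    in_rank = [sum(alpha[i][j] for i in range(n)) for j in range(n)]  # alpha_{.j}
    bias = [[alpha[i][j] - out_rank[i] * in_rank[j] for j in range(n)]
            for i in range(n)]
    return out_rank, in_rank, bias


# Hypothetical data: nodes {0, 1} and {2, 3} are joined by payoff links
# (cost -2, capacity 4); every link across the two groups costs 1 (capacity 0.5).
links = [(0, 1, -2), (1, 0, -2), (2, 3, -2), (3, 2, -2)]
links += [(i, j, 1) for i in (0, 1) for j in (2, 3)]
links += [(i, j, 1) for i in (2, 3) for j in (0, 1)]

A = capacity_matrix(4, links)
out_rank, in_rank, bias = traffic_bias(A)
print(bias[0][1], bias[0][2])  # bias inside a group vs. across the groups
```

In this toy network the bias is positive inside each group and negative across the groups. Since both $\alpha$ and the product of its marginals sum to 1, the entries of the bias matrix always sum to 0.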
For a set of nodes $U \subseteq N$ the values
$$\mathrm{Coh}(U) = \sum_{i,j \in U} \upsilon_{ij}, \qquad \mathrm{Adh}(U) = \sum_{i \in U,\, j \notin U} \left( \upsilon_{ij} + \upsilon_{ji} \right)$$
can thus be construed as the cohesion and the adhesion forces: the total traffic bias within the group, and with its exterior, respectively. A network community $U$ can thus be recognized as a set of nodes with a high cohesion and a low adhesion [16]. The idea that semantically related nodes can be captured as members of the same network communities, derived from the graph structure, is a natural extension of the ranking approach, which has been formalized in [9,17].

¹ The information theoretic homonym has a different, albeit related meaning, which motivates the choice of $\gamma(e) = -\log_2 A_e$.

The only problem with applying that idea to the above model is that our initial assumption, that the traffic distribution on $A$ is proportional to its link capacities, is not very realistic. It abstracts away all traffic dynamics. On the other hand, the static network model, as given above, does not provide any data about the actual traffic. We explore the ways to solve this problem, and extract increasingly more realistic views of traffic dynamics from a static network model.

3 Adding paths

A path $i \xrightarrow{a} j$ in a network $A$ is a sequence of links $i \xrightarrow{a_1} k_1 \xrightarrow{a_2} k_2 \to \cdots \xrightarrow{a_n} j$. In many cases of interest, traffic dynamics on a network depends on the path selections, rather than just on single links. One idea is to add the paths to the structure of a network, and to annotate how the links compose into paths, and how the paths compose into longer paths. This amounts to generating the free category [14] over the network graph.

Unfortunately, adding all paths to a network usually destroys some essential information, just like the transitive closure of a relation does. E.g., in a social network, a friend of a friend is often not even an acquaintance.
Taking the transitive closure of the friendship relation obliterates that fact. Moreover, the popular "small world" phenomenon suggests that almost every two people can be related through no more than six friends of friends of friends... So already adding all paths of length six to a social network, with a symmetric friendship relation, is likely to generate a complete graph. In fact, the average probability that two of a node's neighbors in an undirected graph are also linked with each other is an important factor, called clustering coefficient [20]. On the other hand, in some networks, e.g. of protein interactions, a link $i \to k$ which shortcuts the links $i \to j \to k$ often denotes a direct feed-forward connection, rather than a composition of the two links, and leads to essentially different dynamics. So only "short" paths must be added to a network: composition must be penalized.

Definition 1. For a given network $A = \left( R \xleftarrow{\;\gamma\;} E \underset{\varrho}{\overset{\delta}{\rightrightarrows}} N \right)$, a cutoff value $v \in R$, and a composition penalty $d \in R$, we define the $v$-completion to be the network
$$A^{*v} = \left( R \xleftarrow{\;\gamma\;} E^{*v} \underset{\varrho}{\overset{\delta}{\rightrightarrows}} N \right),$$
where
$$E^{*v} = \{ a \in E^{*} \mid \gamma(a) \le v \}, \qquad \gamma\left( i_0 \xrightarrow{a_1} i_1 \xrightarrow{a_2} i_2 \to \cdots \xrightarrow{a_n} i_n \right) = (n-1)\,d + \gamma(a_1) + \cdots + \gamma(a_n),$$
and $E^{*}$ is the set of all nonempty paths in $A$.

Remarks. $E^{*}$ can be obtained as the matrix of sets $E^{*} = \sum_{n=1}^{\infty} E^{n}$, where each $E^{n}$ is a power of the adjacency matrix $E$. If the entry $E_{ij}$ is viewed as the set of links $\{ i \xrightarrow{e} j \}$, then the entry $E^{2}_{ij} = \sum_{k=1}^{N} E_{ik} \cdot E_{kj}$ corresponds to the set of 2-hop paths $\{ i \xrightarrow{e_1} k \xrightarrow{e_2} j \}$ through the various nodes $k$; the matrix $E^{3}$ similarly corresponds to the matrix of 3-hop paths, and so on. The $v$-closed network $A^{*v}$ is closed under the composition of low cost paths, but not if the cost is greater than $v$. It is not hard to see that the $v$-completion is an idempotent operation, i.e. $A^{*v*v} = A^{*v}$, but it may fail to be a proper closure operation, because a link $e$ in $A$, with $\gamma(e) > v$, may lead to $A \not\subseteq A^{*v}$.

In the rest of the paper, we assume that the networks are $v$-complete for some $v$, i.e. $A = A^{*v}$. This means that the relevant pathways are already represented as links, with the composition penalty absorbed in the cost.

4 Network dynamics

In order to derive network dynamics from a static network model, one first specifies the way in which the behavior of a computational agent, processing data on the network, is influenced by the network structure, and then usually derives a Markov chain that drives the traffic. The network features that influence its dynamics can then be incrementally refined, yielding more and more information.

4.1 Forward and backward

Random walks on networks are often represented in terms of the behavior of surfers on the Web, following the hyperlinks.² The simplest surfer behavior chooses an out-link uniformly at random at each node. A visitor of a node $i$ will thus proceed to a node $j$ with probability
$$A^{\triangleright}_{ij} = \frac{A_{ij}}{A_{i\bullet}}, \qquad \text{where } A_{i\bullet} = \sum_{k=1}^{N} A_{ik}$$
is the out-degree of $i$. The row-stochastic matrix $A^{\triangleright} = (A^{\triangleright}_{ij})_{N \times N}$ represents forward dynamics of a network $A$. The entries $A^{\triangleright}_{ij}$ are called the pull coefficients of $i$ by $j$. Dually, backward dynamics of a network $A$ is represented by a column-stochastic matrix $A^{\triangleleft} = (A^{\triangleleft}_{ij})_{N \times N}$, where the entry
$$A^{\triangleleft}_{ij} = \frac{A_{ij}}{A_{\bullet j}}, \qquad \text{with } A_{\bullet j} = \sum_{k=1}^{N} A_{kj}$$
denoting the in-degree of $j$, describes the probability that a surfer who is on the node $j$ came there from the node $i$. The entries $A^{\triangleleft}_{ij}$ are called the push coefficients.

Remark. Note that the capacity matrix can be normalized to get $A^{\triangleright}$ and $A^{\triangleleft}$ as above only if no rows, resp. columns, consist of 0s alone. This means that every node of the network must have at least one out-link, resp. at least one in-link.
Networks that do not satisfy this requirement need to be modified, in one way or another, in order to enable analysis. Adding a high-cost link between every two nodes is clearly the minimal perturbation (with maximal entropy) that achieves this. Alternatively, the problem can also be resolved by adjoining a fresh node, and the high-cost links in and out of it [2]. Either way, the quantitative effect of such modifications can be made arbitrarily small by increasing the cost of the added links.

4.2 Forward-out and backward-in dynamics

The next example can be interpreted in two ways, either to show how forward and backward dynamics can be refined to take into account various navigation capabilities, or how to abstract away irrelevant cycles. Suppose that a surfer searches for the hubs on the network: he prefers to follow the hyperlinks that lead to the nodes with a higher out-degree. This preference may be realized by annotating the hyperlinks according to the out-rank of their target nodes. Alternatively, the surfer may explore the hyperlinks ahead, and select those with the highest out-degree; but we want to ignore the exploration part, and simply assume that he proceeds according to the out-rank of the nodes ahead. The probability that this surfer will move from $i$ to $j$ is thus
$$A^{\blacktriangleright}_{ij} = A^{\triangleright}_{ij} \cdot \alpha_{j\bullet} = \frac{A_{ij}\, A_{j\bullet}}{A_{i\bullet}\, A_{\bullet\bullet}}.$$
We call this the forward-out dynamics. In the dual, backward-in dynamics, the surfers are more likely to arrive to $j$ from $i$ if this is a frequently visited node, i.e. if its in-rank is higher:
$$A^{\blacktriangleleft}_{ij} = \alpha_{\bullet i} \cdot A^{\triangleleft}_{ij} = \frac{A_{\bullet i}\, A_{ij}}{A_{\bullet\bullet}\, A_{\bullet j}}.$$
These dynamics will be particularly convenient for demonstrating bias analysis in section 6, because they display clearly how the simple traffic bias $\upsilon$ from section 2 can be refined by the various dynamics factors.
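The four dynamics of sections 4.1 and 4.2 differ only in how the capacity matrix is normalized, which a short Python sketch can make concrete. The function names and the three-node capacity matrix below are assumptions chosen for illustration; the formulas follow the definitions of the pull and push coefficients and of the forward-out and backward-in dynamics given above.

```python
def row_sums(A):
    return [sum(row) for row in A]


def col_sums(A):
    return [sum(col) for col in zip(*A)]


def forward(A):
    """Pull coefficients A_ij / A_i. (row-stochastic)."""
    out = row_sums(A)
    if min(out) == 0:
        raise ValueError("every node needs at least one out-link")
    n = len(A)
    return [[A[i][j] / out[i] for j in range(n)] for i in range(n)]


def backward(A):
    """Push coefficients A_ij / A_.j (column-stochastic)."""
    inn = col_sums(A)
    if min(inn) == 0:
        raise ValueError("every node needs at least one in-link")
    n = len(A)
    return [[A[i][j] / inn[j] for j in range(n)] for i in range(n)]


def forward_out(A):
    """Forward coefficient weighted by the out-rank of the target node."""
    out, total, n = row_sums(A), sum(row_sums(A)), len(A)
    F = forward(A)
    return [[F[i][j] * out[j] / total for j in range(n)] for i in range(n)]


def backward_in(A):
    """Backward coefficient weighted by the in-rank of the source node."""
    inn, total, n = col_sums(A), sum(row_sums(A)), len(A)
    B = backward(A)
    return [[inn[i] / total * B[i][j] for j in range(n)] for i in range(n)]


# Hypothetical capacity matrix of a 3-node network.
A = [[0.0, 1.0, 1.0],
     [2.0, 0.0, 2.0],
     [4.0, 0.0, 0.0]]

F, B = forward(A), backward(A)
FO, BI = forward_out(A), backward_in(A)
print(FO[0][1], BI[0][1])
```

The explicit `ValueError` mirrors the remark in section 4.1: a node without out-links (or in-links) leaves the corresponding normalization undefined, so the network has to be modified before these matrices can be formed.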
4.3 Teleportation and preference

The main point of formulating network dynamics, especially in the Markov chain form, is to be able to compute the node ranking as its invariant distribution. However, since the network graphs are usually not strongly connected, the Markov chains, derived from their structure, are often reducible to classes of nodes with no way out. The simplest remedy is the idea of teleportation, going back to [19]. A general interpretation is that, whichever dynamics a surfer might follow, at each node he tosses a biased coin, and with a probability $d \in (0, 1)$ follows that dynamics. Otherwise, with a probability $1 - d$, he "teleports" to a randomly chosen node, ignoring all hyperlinks and other structure.

² The surfers deserve their name by following the "waves", i.e. obeying the same dynamics.

Following, say, forward dynamics, the probability that he will go from $i$ to $j$ is thus
$$A^{P}_{ij} = d\, A^{\triangleright}_{ij} + \frac{1 - d}{N}.$$
This is roughly the PageRank dynamics, from which the Google search engine had started [19].³ The induced dynamics is thus $A^{P} = d\, A^{\triangleright} + (1 - d)\, P$, where $P = (P_{ij})_{N \times N}$ has all entries $P_{ij} = \frac{1}{N}$. In the networks without a cost function, this is interpreted as adding a link between every two nodes. The influence of such links can be controlled using the cost functions. In any case, the resulting Markov chains become irreducible, and their stationary distributions do not get captured in any closed components. Furthermore, the model can be personalized by capturing the surfer's preferences in terms of the biases in $P$: the entries $P_{ij}$ can be interpreted as $i$'s trust in $j$ [8]. The extension of the backward dynamics by teleportation leads to different interpretations, which the reader may wish to consider on her own.
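To see what teleportation buys, the following Python sketch perturbs a reducible forward dynamics and computes the invariant distribution by plain power iteration. The three-node matrix `F`, the value `d = 0.85`, and the helper names are assumptions for this example; power iteration is used here only as the simplest method that works, not as the method of any particular search engine.

```python
def teleport(F, d):
    """Mix a row-stochastic matrix F with uniform teleportation."""
    n = len(F)
    return [[d * F[i][j] + (1.0 - d) / n for j in range(n)] for i in range(n)]


def stationary(P, steps=200):
    """Approximate the row vector r with r = r P by power iteration."""
    n = len(P)
    r = [1.0 / n] * n
    for _ in range(steps):
        r = [sum(r[k] * P[k][j] for k in range(n)) for j in range(n)]
    return r


# Hypothetical forward dynamics: nodes 0 and 1 link only to each other,
# node 2 links to node 0, and nothing links to node 2. The closed class
# {0, 1} has no way out, so the chain is reducible.
F = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]

P = teleport(F, d=0.85)
r = stationary(P)
print(r)
```

With teleportation, node 2 keeps a small positive rank, equal to $(1 - d)/N$, although no link leads to it; without teleportation its rank would vanish in the long run.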
5 Ranking

Intuitively, the rank of a node is the probability that randomly sampled traffic will be found to visit that node. In search, this is taken as a generic relevance measure. The technical implication is that the rank can be obtained as a stationary distribution of the Markov chain capturing dynamics. Each notion of dynamics thus induces a corresponding notion of rank. Since a Markov chain can be viewed as a linear, and hence continuous transformation of the simplex of distributions, which is closed and compact, already Brouwer's fixed point theorem guarantees that the rank always exists.

Finding a meaningful, useful notion of rank is another matter. First of all, as already mentioned, networks often decompose into loosely connected subnets. In the long run, all traffic is likely to get captured in some such subnet. This results in multiple stationary distributions, each concentrated in a closed subnet, zero otherwise. Dynamics derived directly from the network graph therefore result in uninformative ranking data. In order to assure that the relevant Markov chains are irreducible and aperiodic, and thus induce unique and nondegenerate stationary distributions, network dynamics usually need to be perturbed, using a damping and stabilizing factor such as teleportation.

Another sort of problem arises when the unique stationary distribution is not an attractor, or when the rate of convergence is unfeasibly slow [7,3]. While very important in concrete applications, these problems, and their solutions, have less impact on the conceptual analyses pursued in this paper. We shall henceforth assume that all processes have been adjusted to induce unique and effectively computable ranking.⁴

5.1 Promotion and reputation

We now explain the intuition behind the simplest notions of rank.
In social terms, the push coefficient $A^{\triangleleft}_{ij} = \frac{A_{ij}}{A_{\bullet j}}$ can be interpreted as measuring how much $i$ supports (or advocates) $j$. The concept of promotion can then be formalized as a probability distribution $r^{\triangleleft}$, such that
$$r^{\triangleleft}_{i} = \sum_{k=1}^{N} A^{\triangleleft}_{ik}\, r^{\triangleleft}_{k}.$$
In words, the promotion rank (or push rank) $r^{\triangleleft}_{i}$ of a node $i$ is the sum of the promotion ranks $r^{\triangleleft}_{k}$ of its children nodes, each allocated to $i$ according to the push coefficient $A^{\triangleleft}_{ik}$, measuring $i$'s support for $k$.

Dually, the pull coefficient $A^{\triangleright}_{ij}$ can be interpreted as measuring how much $i$ trusts $j$. The concept of reputation can then be formalized as a probability distribution $r^{\triangleright}$, such that
$$r^{\triangleright}_{j} = \sum_{k=1}^{N} r^{\triangleright}_{k}\, A^{\triangleright}_{kj}.$$
The reputation rank (or pull rank) $r^{\triangleright}_{j}$ of a node $j$ is thus the sum of the reputation ranks $r^{\triangleright}_{k}$ of its parent nodes, each allocated according to the pull coefficient $A^{\triangleright}_{kj}$, of $k$'s trust in $j$.

Gathering the promotion values in a column vector $r^{\triangleleft}$ and the reputation values in a row vector $r^{\triangleright}$, we can rewrite the definitions of $r^{\triangleleft}$ and $r^{\triangleright}$ in the matrix form
$$r^{\triangleleft} = A^{\triangleleft}\, r^{\triangleleft}, \qquad r^{\triangleright} = r^{\triangleright}\, A^{\triangleright}.$$
The refined notions of promotion $r^{\blacktriangleleft}$ and reputation $r^{\blacktriangleright}$ are defined and interpreted along the same lines, as the stationary distributions of the processes $A^{\blacktriangleleft}$ and $A^{\blacktriangleright}$ respectively.

³ The original version allowed $A^{\triangleright}_{ij}$ to be 0, if $A_{i\bullet}$ is 0, i.e. if $i$ is a "sink-hole", and the teleportation factor was added to save dynamics from such sinkholes. Other modifications were introduced later.

⁴ This implies that all notions of dynamics that we consider have a tacit damping factor. We do not display it only because it needlessly complicates formulas.

5.2 Expected flow

While dynamics of reputation has been studied for a long time [11,10], and with increased attention recently, since it became a crucial tool of Web search [19,13], the dual dynamics of promotion does not seem to have attracted much attention.
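The duality between reputation and promotion from section 5.1 can be sketched in Python as two power iterations that differ only in the side on which the rank vector is multiplied. The capacity matrix below is a made-up example with strictly positive entries, so that both iterations converge without an explicit damping factor; the function names are likewise assumptions for this sketch.

```python
def forward(A):
    """Pull coefficients A_ij / A_i. (row-stochastic)."""
    return [[a / sum(row) for a in row] for row in A]


def backward(A):
    """Push coefficients A_ij / A_.j (column-stochastic)."""
    inn = [sum(col) for col in zip(*A)]
    return [[A[i][j] / inn[j] for j in range(len(A))] for i in range(len(A))]


def reputation(A, steps=200):
    """Pull rank: row vector r with r_j = sum_k r_k F_kj."""
    F, n = forward(A), len(A)
    r = [1.0 / n] * n
    for _ in range(steps):
        r = [sum(r[k] * F[k][j] for k in range(n)) for j in range(n)]
    return r


def promotion(A, steps=200):
    """Push rank: column vector r with r_i = sum_k B_ik r_k."""
    B, n = backward(A), len(A)
    r = [1.0 / n] * n
    for _ in range(steps):
        r = [sum(B[i][k] * r[k] for k in range(n)) for i in range(n)]
    return r


# Hypothetical capacity matrix with all entries positive.
A = [[1.0, 2.0, 1.0],
     [2.0, 1.0, 3.0],
     [4.0, 1.0, 1.0]]

pull_rank = reputation(A)
push_rank = promotion(A)
print(pull_rank, push_rank)
```

For networks whose capacity matrix has zero entries, the same iterations would be applied to dynamics adjusted by a damping factor, in line with the standing assumption of section 5.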
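We need both notions to define the expected traffic flow. The expected flow from $j$ to $k$, under the assumption that they are independent, is caused only by a "traffic pressure", resulting from the pull to $j$ and the push from $k$. Following this idea, we define
$$r^{\blacktriangleright\blacktriangleleft}_{jk} = r^{\blacktriangleright}_{j}\, r^{\blacktriangleleft}_{k} \tag{1}$$
The expected flow $r^{\blacktriangleright\blacktriangleleft}$ is thus a probability distribution over $N \times N$, which can be represented as the matrix obtained as the outer product of $r^{\blacktriangleright}$, taken as a column vector, and $r^{\blacktriangleleft}$, taken as a row vector. Since $r^{\blacktriangleright}$ and $r^{\blacktriangleleft}$ are the principal eigenvectors of $A^{\blacktriangleright}$ and $A^{\blacktriangleleft}$, $r^{\blacktriangleright\blacktriangleleft}$ is the unique distribution satisfying $r^{\blacktriangleright\blacktriangleleft} = (A^{\blacktriangleright})^{T} \cdot r^{\blacktriangleright\blacktriangleleft} \cdot (A^{\blacktriangleleft})^{T}$, i.e.
$$r^{\blacktriangleright\blacktriangleleft}_{jk} = \sum_{i=1}^{N} \sum_{\ell=1}^{N} A^{\blacktriangleright}_{ij}\, r^{\blacktriangleright\blacktriangleleft}_{i\ell}\, A^{\blacktriangleleft}_{k\ell}.$$
Intuitively, this means that the flow pressure from $i$ to $\ell$ propagates to cause a flow pressure from $j$ to $k$ proportionally to the force of the traffic from $i$ to $j$ and to the force of traffic flows from $k$ to $\ell$, provided that $j$ and $k$ are independent. In order to measure their dependency, we attempt to capture how the actual flows from $i$ to $\ell$ (rather than mere flow pressure) may get diverted, say by the high costs and the low capacities, to cause actual flows from $j$ to $k$.

6 Path networks

Definition 2. Given a $v$-closed network $A = \left( R \xleftarrow{\;\gamma\;} E \underset{\varrho}{\overset{\delta}{\rightrightarrows}} N \right)$, we define the path network
$$\hat{A} = \left( R \xleftarrow{\;\gamma\;} \hat{E} \underset{\varrho}{\overset{\delta}{\rightrightarrows}} \hat{N} \right),$$
where $\hat{N} = E$, and $\hat{E} = \sum_{a,b \in E} \hat{E}_{ab}$, with
$$\hat{E}_{ab} = \left\{ f = \langle f_0, f_1 \rangle \in E_{ij} \times E_{k\ell} \;\middle|\; \gamma(f_0) + \gamma(b) + \gamma(f_1) - \gamma(a) \le v - 2d \right\} \tag{2}$$
$$\gamma(f) = 2d + \gamma(f_0) + \gamma(b) + \gamma(f_1) - \gamma(a) \tag{3}$$
for $i \xrightarrow{a} \ell$ and $j \xrightarrow{b} k$.

[Diagram: a link $f = \langle f_0, f_1 \rangle$ from $a$ to $b$, with $a : i \to \ell$, $f_0 : i \to j$, $b : j \to k$, and $f_1 : k \to \ell$.]

Dynamics of path selection.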
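Recalling that $\hat{A}_{ab} = \sum_{f \in \hat{E}_{ab}} 2^{-\gamma(f)}$, we define the forward and the backward dynamics, and the pull rank and the push rank just like before:
$$\hat{A}^{\triangleright}_{ab} = \frac{\hat{A}_{ab}}{\hat{A}_{a\bullet}}, \qquad \hat{A}^{\triangleleft}_{ab} = \frac{\hat{A}_{ab}}{\hat{A}_{\bullet b}}, \qquad \hat{r}_{b} = \sum_{a \in \hat{N}} \hat{r}_{a}\, \hat{A}^{\triangleright}_{ab}, \qquad \hat{r}^{\triangleleft}_{a} = \sum_{b \in \hat{N}} \hat{A}^{\triangleleft}_{ab}\, \hat{r}^{\triangleleft}_{b}.$$
Intuitively, $\hat{A}^{\triangleright}_{ab}$ is now the probability that traffic through $a$ is diverted to $b$ (rather than to some other path); while $\hat{A}^{\triangleleft}_{ab}$ is the probability that traffic through $b$ is diverted from $a$ (and not from some other path). The pull rank $\hat{r}_{b}$, i.e. the probability that $b$ will be traversed, can thus be understood as its attraction; whereas $\hat{r}^{\triangleleft}_{a}$ is the probability that $a$ will be avoided.

Using the pull rank of the paths, we can now define the node attraction between $j$ and $k$ to be the total attraction of all paths between them:
$$\hat{r}_{jk} = \sum_{j \xrightarrow{b} k} \hat{r}_{b} \tag{4}$$
The idea is that this notion of attraction between the nodes will allow us to refine the estimate of the traffic bias $\upsilon$ as described in section 2. In particular, consider attraction bias
$$\Upsilon_{jk} = \hat{r}_{jk} - r^{\blacktriangleright\blacktriangleleft}_{jk} \tag{5}$$
To motivate this, note that expanding the formula for $r^{\blacktriangleright\blacktriangleleft}_{jk}$ in section 5.2 shows that $r^{\blacktriangleright\blacktriangleleft}$ is the stationary distribution of the Markov chain $A^{\blacktriangleright\blacktriangleleft} = \left( A^{\blacktriangleright\blacktriangleleft}_{(ij)(k\ell)} \right)_{N^2 \times N^2}$, where
$$A^{\blacktriangleright\blacktriangleleft}_{(ij)(k\ell)} = \frac{A_{ij}\, A_{j\bullet}\, A_{\bullet k}\, A_{k\ell}}{A_{i\bullet}\, A_{\bullet\bullet}^{2}\, A_{\bullet\ell}} \qquad \text{and} \qquad r^{\blacktriangleright\blacktriangleleft}_{jk} = \sum_{i,\ell \in N} A^{\blacktriangleright\blacktriangleleft}_{(ij)(k\ell)}\, r^{\blacktriangleright\blacktriangleleft}_{i\ell}.$$
On the other hand, the node attraction $\hat{r}$ turns out to be a stationary distribution of a process that refines $A^{\blacktriangleright\blacktriangleleft}$.

Definition 3. Given a network $A$, its attraction dynamics is a Markov chain $\hat{A} = \left( \hat{A}_{(ij)(k\ell)} \right)_{N^2 \times N^2}$, with the entries
$$\hat{A}_{(ij)(k\ell)} = \frac{A_{ij}\, A_{jk}\, A_{k\ell}}{A_{i\bullet}\, A_{\bullet\bullet}\, A_{\bullet\ell}} \tag{6}$$
where the denominator here denotes $A_{i\bullet}\, A_{\bullet\bullet}\, A_{\bullet\ell} = \sum_{m,n \in N} A_{im}\, A_{mn}\, A_{n\ell}$.

Proposition 1. Suppose that a given network $A$ is $v$-complete for a sufficiently large $v$. Then the node attraction $\hat{r}$, defined in (4), is the stationary distribution of its attraction dynamics (6).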
In other words, for every $j, k$ holds
$$\hat{r}_{jk} = \sum_{i,\ell \in N} \hat{A}_{(ij)(k\ell)}\, \hat{r}_{i\ell} \tag{7}$$
The proof is in the Appendix. It is based on the following lemma.

Lemma 1. For a network $A$, which is $v$-complete for a sufficiently large cutoff value $v$, the following equations hold for $i \xrightarrow{a} \ell$ and $j \xrightarrow{b} k$:
$$\hat{A}_{ab} = \frac{A_{b}}{4^{d} A_{a}}\, A_{ij}\, A_{k\ell} \tag{8}$$
$$\sum_{j \xrightarrow{c} k} \hat{A}_{ac} = \frac{1}{4^{d} A_{a}}\, A_{ij}\, A_{jk}\, A_{k\ell} \tag{9}$$
$$\hat{A}_{a\bullet} = \frac{1}{4^{d} A_{a}}\, A_{i\bullet}\, A_{\bullet\bullet}\, A_{\bullet\ell} \tag{10}$$
On the other hand, proposition 1 implies the following corollary, which establishes that formula (5) can be used to measure the attraction bias, as intended.

Corollary 1. The directed reputation and promotion ranks are the marginals of the node attraction:
$$\sum_{k \in N} \hat{r}_{jk} = r^{\blacktriangleright}_{j} \tag{11}$$
$$\sum_{j \in N} \hat{r}_{jk} = r^{\blacktriangleleft}_{k} \tag{12}$$

Interpretation. To understand the meaning of attraction bias, consider a $v$-complete network $A$, with the forward-out and backward-in dynamics. The pull rank $r^{\blacktriangleright}_{j}$ tells how likely it is that a randomly sampled traffic path arrives to $j$; whereas the push rank $r^{\blacktriangleleft}_{k}$ tells how likely it is that a randomly sampled traffic path departs from $k$. On the other hand, the attraction dynamics in the induced path network $\hat{A}$ gives the node attraction $\hat{r}_{jk}$, which tells how likely it is that a randomly sampled traffic path traverses a path from $j$ to $k$. In summary, we have
$$r^{\blacktriangleright}_{j} = \mathrm{Prob}\left( \bullet \xrightarrow{\xi} j \mid \xi \in A^{\blacktriangleright} \right), \qquad r^{\blacktriangleleft}_{k} = \mathrm{Prob}\left( k \xrightarrow{\xi} \bullet \mid \xi \in A^{\blacktriangleleft} \right), \qquad \hat{r}_{jk} = \mathrm{Prob}\left( j \xrightarrow{\xi} k \mid \xi \in \hat{A} \right).$$
Although the notation suggests that $r^{\blacktriangleright}$, $r^{\blacktriangleleft}$, and $\hat{r}$ are sampled from different processes, corollary 1 establishes that $\hat{r}$ is in fact the joint distribution of $r^{\blacktriangleright}$ and $r^{\blacktriangleleft}$. Nevertheless, a diligent reader will surely notice a twist of $j$ and $k$ in the last three equations, and wonder: why is the probability that traffic goes from $j$ to $k$ related with the probabilities that it arrives to $j$, and that it departs from $k$?
The answer to this question makes the forward-out and the backward-in dynamics into a more interesting example than its many dynamical cousins. Briefly, if the surfers are more likely to flow with $\bullet \to j$ if the capacity of the links out of $j$ is higher, and if they are more likely to flow with $k \to \bullet$ if the capacity of the links into $k$ is higher, then the surfers are most likely to follow both these flows, i.e. into $j$ and out of $k$, if there is a high capacity of the links $j \to k$.

Mutual information of the inputs and the outputs. The fact that $\hat{r}$ is the joint distribution of the processes expressed by $r^{\blacktriangleright}$ and $r^{\blacktriangleleft}$ allows us to extract their mutual information [4]
$$I(r^{\blacktriangleright}; r^{\blacktriangleleft}) = D(\hat{r} \,\|\, r^{\blacktriangleright\blacktriangleleft}) = \sum_{j=1}^{N} \sum_{k=1}^{N} \hat{r}_{jk} \log \frac{\hat{r}_{jk}}{r^{\blacktriangleright}_{j}\, r^{\blacktriangleleft}_{k}}.$$
Its expression in terms of relative entropy $D(\hat{r} \,\|\, r^{\blacktriangleright\blacktriangleleft})$ [ibidem] shows that it measures how much we lose, in the efficiency of encoding of $\hat{r}$, if we assume that $r^{\blacktriangleright}$ and $r^{\blacktriangleleft}$ are mutually independent. Intuitively, the mutual information $I(r^{\blacktriangleright}; r^{\blacktriangleleft})$ can thus be taken as a measure of the locality of information processing in $A$. If this is an entirely local process, then every path must begin and end at the same node, and the random walks $\delta$ and $\varrho$, selecting the sources and the destinations of the paths, must coincide. But if $\delta = \varrho$, then the push rank and the pull rank must obey the same distribution $r^{\blacktriangleright} = r^{\blacktriangleleft} = r$, and their mutual information is $I(r^{\blacktriangleright}; r^{\blacktriangleleft}) = H(r)$, their entropy. In the other extreme case, the random walks $\delta$ and $\varrho$ are independent,⁵ and their joint distribution is just the product of their distributions $\hat{r}_{jk} = r^{\blacktriangleright}_{j}\, r^{\blacktriangleleft}_{k}$. Their mutual information is then $I(r^{\blacktriangleright}; r^{\blacktriangleleft}) = 0$.

7 Conclusions and future work

When the Web is viewed as a global data store, the problem of its semantics is the problem of determining a uniform meaning for the data published by its various participants.
The search engines are dealing with this problem on the level of the human-Web interaction (e.g., distinguishing the meanings of the word "jaguar", sometimes denoting a car, sometimes an animal [12], or deciding whether "Paris Hilton", in a given context, refers to a person or to a hotel, etc.), whereas the Semantic Web project [1] deals with the computer-Web interactions.

⁵ The theorem in the appendix suggests that they are similarly distributed, up to a scale factor.

When the Web is viewed as a computer, the problem of its semantics is not just a matter of assigning some meanings to some data stored in it, but also to its data processing operations. For programming languages, this is what we usually call operational semantics. However, unlike a programming language, the Web, and other spontaneously evolving networks, do not have a formally defined set of data structures and operations: data are transformed by many random walks, running concurrently. Operational semantics of network computation requires a toolkit for incremental analysis of such processes.

In this paper, we described a path ranking method, which may become a useful piece of that toolkit. Now we sketch a way to test it experimentally. Using the notion of attraction bias, we lift the graph theoretic notion of (maximal) clique into rank analysis, while retaining network dynamics as a graph structure over such generalized cliques. We call these generalized cliques concepts and the links between them associations.

Communities and concepts.
Taking the notion of attraction bias back to the idea of communities as sets of nodes with high cohesion, from which we started in the Introduction, we now reformulate the notion of cohesion in a different norm ($\ell_\infty$ instead of $\ell_1$), and define cohesion of a set of nodes $U \subseteq N$ to be their minimal symmetric attraction bias
$$\Upsilon(U) = \bigwedge_{i,j \in U} \left( \Upsilon_{ij} \vee \Upsilon_{ji} \right).$$
For each $\varepsilon \in [0, 1]$, we define an $\varepsilon$-community to be a set of nodes $U \subseteq N$ such that $\Upsilon(U) \ge \varepsilon$. Denoting by $\wp_{\varepsilon} N$ the set of $\varepsilon$-communities, note that $\varepsilon_1 \le \varepsilon_2$ implies that $\wp_{\varepsilon_1} N \supseteq \wp_{\varepsilon_2} N$. The partial ordering of $U, V \in \wp_{\varepsilon} N$ is given by
$$U \sqsubseteq V \iff U \subseteq V \,\wedge\, \Upsilon(U) \le \Upsilon(V).$$
This gives a directed complete partial order (dcpo). It is not a lattice because some communities cannot be extended by new nodes without decreasing their cohesion; so there are pairs of communities that cannot be joined, and do not have an upper bound. However, directed sets of communities (i.e., where each pair has an upper bound) do have least upper bounds, which are just their set theoretic unions.

Directed complete partial orders are often used in denotational semantics of programming languages [5]. According to that interpretation, communities can be thought of as pieces of partial information, their $\sqsubseteq$-ordering as the increase of information, and the existence of an upper bound of two communities as the consistency of the information that they carry.

The maximal elements of $\wp_{\varepsilon} N$, i.e. the communities that cannot be extended by new nodes without losing cohesion, can be construed as $\varepsilon$-concepts. A set $U \in \wp_{\varepsilon} N$ is thus an $\varepsilon$-concept if $\Upsilon(\{i, j\}) \ge \varepsilon$ holds for all $i, j \in U$, but for every $k \in N \setminus U$ there is a $j \in U$ such that $\Upsilon(\{k, j\}) < \varepsilon$.
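For small networks, the definitions of cohesion, $\varepsilon$-community and $\varepsilon$-concept can be explored by brute force. The Python sketch below enumerates all node sets, which is exponential in the number of nodes and only meant to illustrate the definitions; the attraction bias matrix `Y` is invented for this example, and the function names are assumptions. The cohesion is read literally, with the meet ranging over all pairs $i, j \in U$, including $i = j$.

```python
from itertools import combinations


def cohesion(Y, U):
    """Minimal symmetric attraction bias over all pairs i, j in U."""
    return min(max(Y[i][j], Y[j][i]) for i in U for j in U)


def is_community(Y, U, eps):
    """U is an eps-community if it is nonempty and its cohesion is >= eps."""
    return len(U) > 0 and cohesion(Y, U) >= eps


def concepts(Y, eps):
    """Communities that cannot be extended by any node (brute force)."""
    n = len(Y)
    found = []
    # Larger sets first: any community contained in an already found
    # concept can be extended, so it is not maximal.
    for size in range(n, 0, -1):
        for U in combinations(range(n), size):
            if is_community(Y, U, eps) and not any(set(U) < set(V) for V in found):
                found.append(U)
    return found


# Hypothetical attraction bias matrix for 4 nodes (not derived from data).
Y = [[0.30, 0.20, 0.15, -0.10],
     [0.05, 0.30, 0.25, -0.05],
     [0.10, 0.02, 0.30, 0.18],
     [-0.20, -0.10, 0.01, 0.30]]

print(concepts(Y, 0.15))
print(concepts(Y, 0.2))
```

Raising $\varepsilon$ from 0.15 to 0.2 splits the concept on the nodes 0, 1, 2 into two smaller overlapping concepts and isolates node 3, illustrating how the family of communities shrinks as the cohesion parameter grows.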
The community and concept structure of a network $A$ can be analyzed by studying the sequence of hypergraphs $A_\varepsilon$, where the $\varepsilon$-concepts, or the $\varepsilon$-communities approximating them, are viewed as hyperedges. The sequence $(A_\varepsilon)_{\varepsilon \in [0,1]}$ decreases as the cohesion parameter $\varepsilon$ increases, and the highly cohesive communities and concepts can be feasibly analyzed. A level further, concepts and communities can be viewed as the nodes of a network. The most interesting definition of the links between them, intuitively thought of as associations, is based on a variant of a path network, complementing definition 2. A sketch of this definition is in the next, final subsection.

Associations. Let $N_\varepsilon$ denote the set of $\varepsilon$-concepts in a network $A$. The concept network $A_\varepsilon$, induced by a network $A$, has the $\varepsilon$-concepts as its nodes. Its edges are called concept associations. The set of associations between $U, V \in N_\varepsilon$ is

\[ E^{\varepsilon}_{UV} = \sum_{U \xrightarrow{a} U \cap V} \; \sum_{U \cap V \xrightarrow{b} V} \tilde{E}_{ab} \tag{13} \]

where $U \xrightarrow{\xi} V$ abbreviates $\delta(\xi) \in U \wedge \varrho(\xi) \in V$, and

\[ \tilde{E}_{ab} = \left\{ f = \langle f_0, f_1 \rangle \in E_{ij} \times E_{k\ell} \;\middle|\; \gamma(f_0) + \gamma(b) \leq v - d \text{ and } \gamma(a) + \gamma(f_1) \leq v - d \right\} \]

An association $f \in A_{UV}$ is thus a quadruple $f = \langle a, b, f_0, f_1 \rangle$

\[ \begin{array}{ccc} i & \xrightarrow{\;f_0\;} & j \\ {\scriptstyle a}\downarrow & & \downarrow{\scriptstyle b} \\ k & \xrightarrow{\;f_1\;} & \ell \end{array} \]

such that $i, j, k \in U$ and $j, k, \ell \in V$. Its cost is $\gamma(f) = \gamma(f_0) + \gamma(b) - \gamma(a) - \gamma(f_1)$. The cost of an association from $U$ to $V$ is lower if the traffic from $i \in U$ to $\ell \in V$ gets less costly when it crosses to $V$ earlier.

While the general network analysis tools apply to concept networks, the various notions of dynamics acquire new meanings on this level. At this point, understanding which of the possible interpretations may lead to useful tools for extracting and analyzing the relevant concepts, processed in a network, seems to call for experimentation with real data.

References

1. T. Berners-Lee. Semantic Web road map, October 1998.
2. M. Bianchini, M. Gori, and F. Scarselli. Inside PageRank. ACM Trans. Inter. Tech., 5(1):92–128, February 2005.
3. P. Boldi, M. Santini, and S. Vigna. PageRank as a function of the damping factor. In WWW '05: Proceedings of the 14th international conference on World Wide Web, pages 557–566, New York, NY, USA, 2005. ACM Press.
4. T. M. Cover and J. A. Thomas. Elements of information theory. Wiley-Interscience, New York, NY, USA, 1991.
5. G. Gierz, K. H. Hoffmann, K. Keimel, J. Lawson, M. Mislove, and D. Scott. Continuous Lattices and Domains, volume 93 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2003.
6. O. Goldreich. Foundations of Cryptography: Volume 2, Basic Applications. Cambridge University Press, New York, NY, USA, 2004.
7. G. H. Golub and C. Greif. An Arnoldi-type algorithm for computing PageRank. BIT Numerical Mathematics, 43(1):1–18, 2003.
8. Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating Web spam with TrustRank. In VLDB, pages 576–587, 2004.
9. T. H. Haveliwala. Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng., 15(4):784–796, 2003.
10. C. Hubbell. An input-output approach to clique identification. Sociometry, 28:377–399, 1965.
11. L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18:39–43, 1953.
12. J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.
13. A. N. Langville and C. D. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton, NJ, USA, 2006.
14. S. Mac Lane. Categories for the Working Mathematician. Number 5 in Graduate Texts in Mathematics. Springer-Verlag, 1971.
15. M. Newman, A.-L. Barabasi, and D. J. Watts, editors. The Structure and Dynamics of Networks. Princeton Studies in Complexity. Princeton University Press, Princeton, NJ, USA, 2006.
16. M. E. J. Newman. Modularity and community structure in networks. PNAS, 103(23):8577–8582, June 2006.
17. Y. Ollivier and P. Senellart. Finding related pages using Green measures: An illustration with Wikipedia. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence, pages 1427–1433, Menlo Park, California, July 2007. AAAI, AAAI Press.
18. T. O'Reilly. What is Web 2.0, September 2005.
19. L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998.
20. D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, June 1998.

Appendix: Proofs

Proof (of lemma 1(8)). The first claim is that there is a sufficiently large $v$ such that $\gamma(f_0) + \gamma(b) + \gamma(f_1) \leq v - 2d$ holds for all $f_0 \in E_{ij}$ and $f_1 \in E_{k\ell}$. Since $b$ and $d$ are fixed, the claim is clear if $E_{ij}$ and $E_{k\ell}$ are finite. Since $A$ is assumed to be truncated complete, an infinite set of paths can only be generated from the links with a cost $\leq 0$. So the costs of the elements of $E_{ij}$ and $E_{k\ell}$ are in any case bounded. But if all $f_0 \in E_{ij}$ and $f_1 \in E_{k\ell}$ satisfy $\gamma(f_0) + \gamma(b) + \gamma(f_1) \leq v - 2d$, then $\hat{E}_{ab} = E_{ij} \times E_{k\ell}$. Unfolding the definition of $\hat{A}_{ab}$ and using (3) we get

\begin{align*}
\hat{A}_{ab} &= \sum_{f \in \hat{E}_{ab}} 2^{-\gamma(f)} \\
&= \sum_{f_0 \in E_{ij}} \sum_{f_1 \in E_{k\ell}} 2^{-2d - \gamma(f_0) - \gamma(b) - \gamma(f_1) + \gamma(a)} \\
&= \frac{4^{-d}\, 2^{-\gamma(b)}}{2^{-\gamma(a)}} \sum_{f_0 \in E_{ij}} 2^{-\gamma(f_0)} \sum_{f_1 \in E_{k\ell}} 2^{-\gamma(f_1)} \\
&= \frac{A_b}{4^d A_a} A_{ij} A_{k\ell}
\end{align*}

1(9) follows directly from 1(8), unpacking $A_e = 2^{-\gamma(e)}$. And 1(10) then follows from 1(9):

\[ \hat{A}_{a\bullet} = \sum_{b} \hat{A}_{ab} = \sum_{j,k \in N} \; \sum_{j \xrightarrow{b} k} \hat{A}_{ab} = \sum_{j,k \in N} \frac{1}{4^d A_a} A_{ij} A_{jk} A_{k\ell} = \frac{1}{4^d A_a} A_{i\bullet} A_{\bullet\bullet} A_{\bullet\ell} \]

Proof (of proposition 1(7)).
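We unfold the definition of $\hat{r}$ and then use (9) and (10):

\begin{align*}
\hat{r}_{jk} &= \sum_{j \xrightarrow{b} k} \hat{r}_b
= \sum_{j \xrightarrow{b} k} \sum_{a} \frac{\hat{A}_{ab}}{\hat{A}_{a\bullet}} \hat{r}_a
= \sum_{a} \frac{\sum_{j \xrightarrow{b} k} \hat{A}_{ab}}{\hat{A}_{a\bullet}} \hat{r}_a \\
&= \sum_{a} \frac{\frac{1}{4^d A_a} A_{ij} A_{jk} A_{k\ell}}{\frac{1}{4^d A_a} A_{i\bullet} A_{\bullet\bullet} A_{\bullet\ell}} \hat{r}_a
= \sum_{i,\ell \in N} \; \sum_{i \xrightarrow{a} \ell} \frac{A_{ij} A_{jk} A_{k\ell}}{A_{i\bullet} A_{\bullet\bullet} A_{\bullet\ell}} \hat{r}_a \\
&= \sum_{i,\ell \in N} \frac{A_{ij} A_{jk} A_{k\ell}}{A_{i\bullet} A_{\bullet\bullet} A_{\bullet\ell}} \sum_{i \xrightarrow{a} \ell} \hat{r}_a
= \sum_{i,\ell \in N} \hat{A}_{(ij)(k\ell)} \hat{r}_{i\ell}
\end{align*}

Proof (of corollary 1(11)). We set $q_j = \sum_{k \in N} \hat{r}_{jk}$ and expand $\hat{r}_{jk}$ using proposition 1(7), to show that $q$ is the stationary distribution of the process $A^{\triangleright}$:

\begin{align*}
q_j &= \sum_{k \in N} \hat{r}_{jk}
= \sum_{k,i,\ell \in N} \hat{A}_{(ij)(k\ell)} \hat{r}_{i\ell}
= \sum_{k,i,\ell \in N} \frac{A_{ij} A_{jk} A_{k\ell}}{A_{i\bullet} A_{\bullet\bullet} A_{\bullet\ell}} \hat{r}_{i\ell} \\
&= \sum_{i,\ell \in N} \frac{A_{ij} A_{j\bullet}}{A_{i\bullet} A_{\bullet\bullet}} \hat{r}_{i\ell}
= \sum_{i \in N} A^{\triangleright}_{ij} \frac{A_{j\bullet}}{A_{\bullet\bullet}} \sum_{\ell \in N} \hat{r}_{i\ell}
= \sum_{i \in N} A^{\triangleright}_{ij} q_i
\end{align*}

Since $r^{\triangleright}$ is by definition the stationary point of $A^{\triangleright}$, the uniqueness implies $q = r^{\triangleright}$.

To prove 1(12), we set $q_k = \sum_{j \in N} \hat{r}_{jk}$ and proceed similarly:

\begin{align*}
q_k &= \sum_{j \in N} \hat{r}_{jk}
= \sum_{j,i,\ell \in N} \hat{A}_{(ij)(k\ell)} \hat{r}_{i\ell}
= \sum_{j,i,\ell \in N} \frac{A_{ij} A_{jk} A_{k\ell}}{A_{i\bullet} A_{\bullet\bullet} A_{\bullet\ell}} \hat{r}_{i\ell} \\
&= \sum_{i,\ell \in N} \frac{A_{\bullet k} A_{k\ell}}{A_{\bullet\bullet} A_{\bullet\ell}} \hat{r}_{i\ell}
= \sum_{\ell \in N} \frac{A_{\bullet k}}{A_{\bullet\bullet}} A^{\triangleleft}_{k\ell} \sum_{i \in N} \hat{r}_{i\ell}
= \sum_{\ell \in N} A^{\triangleleft}_{k\ell} q_\ell
\end{align*}

Since $r^{\triangleleft}$ is by definition the stationary point of $A^{\triangleleft}$, the uniqueness implies $q = r^{\triangleleft}$.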
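The factorization step in the proof of lemma 1(8) can be sanity-checked numerically. The sketch below assumes, as in that proof, that $v$ is large enough for every pair $(f_0, f_1)$ to be admitted, so that the sum ranges over the whole product $E_{ij} \times E_{k\ell}$. All names (`weight`, `term_by_term_sum`, `factored_form`) and all cost values are hypothetical and chosen only for illustration; the exponent is copied from the second line of that derivation.

```python
from itertools import product


def weight(cost):
    """Link or path weight A_e = 2^(-gamma(e)), as unpacked in the proof."""
    return 2.0 ** (-cost)


def term_by_term_sum(costs_ij, costs_kl, cost_a, cost_b, d):
    """Sum 2^(-2d - g(f0) - g(b) - g(f1) + g(a)) over all pairs (f0, f1)."""
    return sum(
        2.0 ** (-2 * d - g0 - cost_b - g1 + cost_a)
        for g0, g1 in product(costs_ij, costs_kl)
    )


def factored_form(costs_ij, costs_kl, cost_a, cost_b, d):
    """Closed form A_b / (4^d A_a) * A_ij * A_kl."""
    a_ij = sum(weight(g) for g in costs_ij)  # total weight of paths i -> j
    a_kl = sum(weight(g) for g in costs_kl)  # total weight of paths k -> l
    return weight(cost_b) / (4.0 ** d * weight(cost_a)) * a_ij * a_kl


# Made-up costs: three paths i -> j, two paths k -> l, links a and b, constant d.
costs_ij, costs_kl = [1, 2, 3], [0, 2]
cost_a, cost_b, d = 4, 1, 1

term_by_term = term_by_term_sum(costs_ij, costs_kl, cost_a, cost_b, d)
factored = factored_form(costs_ij, costs_kl, cost_a, cost_b, d)
print(term_by_term, factored)
```

Integer costs are used so that every term is an exact power of two and the two values agree without rounding error; with real-valued costs a tolerance-based comparison would be the safer check.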
