Recursive Shortest Path Algorithm with Application to Density-integration of Weighted Graphs

Graph theory is increasingly commonly utilised in genetics, proteomics and neuroimaging. In such fields, the data of interest generally constitute weighted graphs. Analysis of such weighted graphs often requires the integration of topological metrics …

Authors: Cedric E. Ginestet, Andrew Simmons

Running Head: RECURSIVE SHORTEST PATH ALGORITHM

Cedric E. Ginestet†‡ and Andrew Simmons†‡

† King’s College London, Institute of Psychiatry, Centre for Neuroimaging Sciences (CNS)
‡ National Institute of Health Research (NIHR) Biomedical Research Centre for Mental Health at South London and King’s College London Institute of Psychiatry

Correspondence concerning this article should be sent to Centre for Neuroimaging Sciences, NIHR Biomedical Research Centre, Institute of Psychiatry, Box P089, King’s College London, De Crespigny Park, London, SE5 8AF, UK. Email may be sent to cedric.ginestet@kcl.ac.uk

Abstract

Graph theory is increasingly commonly utilised in genetics, proteomics and neuroimaging. In such fields, the data of interest generally constitute weighted graphs. Analysis of such weighted graphs often requires the integration of topological metrics with respect to the density of the graph. Here, density refers to the proportion of the number of edges present in that graph. When topological metrics based on shortest paths are of interest, such density-integration usually necessitates the iterative application of Dijkstra’s algorithm in order to compute the shortest path matrix at each density level. In this short note, we describe a recursive shortest path algorithm based on single edge updating, which replaces the need for the iterative use of Dijkstra’s algorithm. Our proposed procedure is based on pairs of breadth-first searches around each of the vertices incident to the edge added at each recursion. An algorithmic analysis of the proposed technique is provided.
When the graph of interest is coded as an adjacency list, our algorithm can be shown to be more efficient than an iterative use of Dijkstra’s algorithm.

Introduction

The last ten years have seen a surge of interest in graph theory among biologists, physicists and other natural scientists. This was primarily stimulated by the seminal papers of Watts and Strogatz (1998) and Barabasi and Albert (1999). In particular, a wide range of different data types are now analyzed through systematic calculations of various topological measures, such as the characteristic path length or clustering coefficient. In systems biology and neuroscience, subject-specific networks can be constructed in order to compare several populations of networks for testing putative differences between groups of subjects (see Bullmore and Sporns, 2009, for a review). (For convenience, the terms network and graph will here be used interchangeably, as this reflects some of the recent developments in the literature.) Such biological networks, however, tend to be weighted undirected graphs, which generally correspond to some standardized covariance matrices between a set of regions of interest. By contrast, most of the topological measures introduced by Watts and Strogatz (1998) and Barabasi and Albert (1999) pertain to unweighted networks.

There is currently no general consensus on how to compute or compare the topology of weighted graphs. This is a particularly arduous problem, since it requires the use of real-valued mathematical tools on objects that are essentially discrete. One of the possible solutions to this conundrum has been advanced by He et al. (2009), who suggested integrating the topological measures of interest with respect to the density of the network (see also Achard and Bullmore, 2007; Ginestet and Simmons, 2011).
The density of a network is here defined as the proportion of possible edges that are present in a given graph. Such integration, however, is computationally expensive, and its complexity grows quadratically with the number of nodes. A Monte Carlo scheme has been proposed in the literature to address this issue and approximate the value of such an integral (Ginestet et al., Submitted). Such Monte Carlo methods, however, also necessitate a large number of simulations in order to reduce the variability of the resulting estimates.

Most of the topological metrics of interest to researchers in neuroscience and systems biology tend to involve the computation of the matrix of shortest paths, denoted $D$. This includes, for instance, the global and local efficiency measures proposed by Latora and Marchiori (2001) (see also Latora and Marchiori, 2003). The computation of $D$ for a given network can be done efficiently using the celebrated Dijkstra’s algorithm (Dijkstra, 1959). However, when considering weighted networks, Dijkstra’s algorithm may need to be invoked as many times as the number of edges in the graph of interest. In this short note, we address this specific problem by proposing a recursive shortest path algorithm based on applying single edge updates to $D$. In this setup, we only work with the shortest path matrix and compute the value of the desired topological metric at every density level. Taken together, we therefore provide an efficient algorithm for the density-integration of the topological functions of weighted networks.

Density-integration of Topological Metrics

In this paper, our main focus will be on undirected weighted graphs, containing no graph loops or multiple edges. However, since we also need to refer to unweighted graphs, we introduce the following notation.
A graph $G$ is here defined as a triple $(\mathcal{V}, \mathcal{E}, \mathcal{W})$, where $\mathcal{V}(G)$ is the standard vertex set, $\mathcal{E}(G)$ is the edge set and $\mathcal{W}(G)$ is a multiset of real-valued weights. Our convention generalizes to directed graphs. In addition, this also includes undirected unweighted (simple) graphs as special cases, for which the elements of $\mathcal{W}$ belong to $\{0, 1\}$. Such a setup may, for instance, apply to the consideration of correlation matrices or other matrices of similarity measures with real-valued entries. In addition, we will make use of the following notation,

$$N_V := |\mathcal{V}(G)|, \qquad N_E := |\mathcal{E}(G)|, \qquad \text{and} \qquad N_I := \frac{N_V (N_V - 1)}{2},$$

where $:=$ signifies that the left-hand side is defined as the right-hand side. We define $N_I$ as the number of shortest paths in $G$. Naturally, $N_I$ here takes this value because $G$ is undirected. For a directed network, $N_I$ would be $N_V (N_V - 1)$. For convenience, we will interchangeably use the following two sets of indices to label the elements of $\mathcal{W}$,

$$\mathcal{W}(G) = \{w_{v_1 v_2}, \ldots, w_{ij}, \ldots, w_{N_V - 1, N_V}\} = \{w_1, \ldots, w_e, \ldots, w_{N_E}\}. \qquad (1)$$

Although we will here restrict our attention to undirected graphs, an extension of our proposed technique to directed networks will be discussed in the conclusion.

A range of topological metrics necessitating the computation of the shortest path matrix have been proposed in the literature. Two popular choices of topological measures are the global and local efficiency measures introduced by Latora and Marchiori (2001). Both of these quantities can be derived from the general definition of the efficiency, $E(\cdot)$, of a simple graph $G = (\mathcal{V}, \mathcal{E}, \mathcal{W})$, which is defined as follows,

$$E(G) := \frac{1}{N_I} \sum_{i < \ldots}^{N_V} \ldots$$

[…]

$$\ldots_{\ldots > v} \; \mathcal{I}\{w_{ij} \geq w_{uv}\}, \qquad (8)$$

where $\mathcal{I}\{\cdot\}$ is the indicator function returning 1 if the argument is true and 0 otherwise. We will assume that there are no ties in the values of $\mathcal{W}$. In practice, the presence of ties can be resolved by randomization.
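The ranking in equation (8) and the randomized resolution of ties can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `edge_ranks`, its `seed` parameter and the dense list-of-lists layout for the weights are hypothetical, and the sum in equation (8) is assumed to run over all vertex pairs, so that the rank of an edge counts the weights that do not exceed its own.

```python
import random


def edge_ranks(W, seed=None):
    """Rank the vertex pairs {i, j} (i < j) by weight; rank 1 is the smallest.

    W is a symmetric matrix of weights given as a list of lists (assumed
    layout). Pairs with equal weights are ordered at random.
    """
    rng = random.Random(seed)
    n = len(W)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Sort by weight; the random second component of the key breaks ties.
    ordered = sorted(pairs, key=lambda e: (W[e[0]][e[1]], rng.random()))
    # Without ties, the rank equals the number of weights <= w_ij.
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}
```

When all weights are distinct the result does not depend on `seed`; with ties, different seeds may give different but equally valid orderings.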
Secondly, we extract each edge in the order provided by the ranks. That is, running over the ranks $k_t$, where $t = 1, \ldots, N_I$, we have the following $N_I$ ordered pairs:

$$\{v_1, v_2\}_t := \operatorname*{argmax}_{\{i,j\}} \mathcal{I}\{P_{ij}(\mathcal{W}) = k_t\}. \qquad (9)$$

It then suffices to update $D$ using each of these pairs recursively, as follows,

$$D_t = \mathrm{edgeUpdate}(D_{t-1}, \{v_1, v_2\}_t). \qquad (10)$$

For each $D_t$, we can now collect the topological measure based on this particular shortest path matrix, $T(D_t)$. Finally, it then remains to compute the mean value of these collected topological measures in order to obtain the desired density-integrated metric of the graph of interest. That is,

$$T(G) = \frac{1}{N_I} \sum_{t=1}^{N_I} T(D_t). \qquad (11)$$

The difficulty of this method centres on the use of the edgeUpdate function in equation (10). This algorithm proceeds as follows. At each step $t$, we ask what the impact of the addition of a new edge to an existing graph is in terms of shortest path relationships. Our algorithm answers this question by two successive breadth-first searches (BFS) around the vertices incident to the edge added at each $t$. Firstly, we conduct a BFS around $v_2$ and check whether the shortest path between $v_1$ and each of the $m$th degree neighbors of $v_2$ is shortened by the addition of a new edge between $v_1$ and $v_2$. Secondly, we conduct a BFS centred at $v_1$, where we check if the shortest paths between all the neighbors of $v_2$ that were modified in the first stage and the $m$th degree neighbors of $v_1$ are shortened by the introduction of the new edge.

The full edge updating algorithm of $D_t$ is described in pseudocode in Figure 1. For simplicity, we represent the algorithm when each $D_t$ is coded as a full matrix. However, a list representation can also be adopted to minimise storage space. Moreover, we have also provided a graphical description of our edge updating algorithm for density-integration in Figure 2.
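A minimal Python sketch of the two-phase edge update and of the averaging in equation (11) may help to fix ideas. It is not the authors' C++ implementation, and it rests on several assumptions: the names `edge_update`, `global_efficiency` and `density_integrate` are illustrative; $D$ is stored as a dense list of lists with 0 on the diagonal and `math.inf` for unconnected pairs; the tests use strict inequalities rather than the ≥ of Figure 1, so that unchanged vertices are not expanded; edges are assumed to be inserted in decreasing order of weight; and the example metric is the mean inverse shortest path length of Latora and Marchiori (2001).

```python
import math
from itertools import combinations


def edge_update(D, v1, v2):
    """Update the shortest path matrix D in place after adding edge {v1, v2}."""
    n = len(D)
    if D[v1][v2] <= 1:  # edge already present: nothing changes
        return D
    D[v1][v2] = D[v2][v1] = 1

    def neighbours(v):
        # Adjacent vertices are exactly those at distance 1.
        return [u for u in range(n) if D[v][u] == 1]

    # Phase I: BFS around v2, shortening distances to v1.
    modified = [v2]  # vertices whose distance to v1 was shortened
    visited = {v1, v2}
    frontier, level = [v2], 0
    while frontier:
        level += 1
        reached = []
        for p in frontier:
            for v in neighbours(p):
                if v in visited:
                    continue
                visited.add(v)
                if D[v1][v] > level + 1:
                    D[v1][v] = D[v][v1] = level + 1
                    modified.append(v)
                    reached.append(v)  # only expand shortened vertices
        frontier = reached

    # Phase II: BFS around v1, tested against the Phase I vertices.
    visited = set(modified) | {v1}
    frontier, level = [v1], 0
    while frontier:
        level += 1
        reached = []
        for p in frontier:
            for v in neighbours(p):
                if v in visited:
                    continue
                visited.add(v)
                improved = False
                for u in modified:
                    candidate = D[v1][u] + level  # path v .. v1, v2 .. u
                    if D[v][u] > candidate:
                        D[v][u] = D[u][v] = candidate
                        improved = True
                if improved:
                    reached.append(v)
        frontier = reached
    return D


def global_efficiency(D):
    """Mean inverse shortest path length over all vertex pairs (1/inf == 0)."""
    pairs = list(combinations(range(len(D)), 2))
    return sum(1.0 / D[i][j] for i, j in pairs) / len(pairs)


def density_integrate(W, metric=global_efficiency):
    """Average metric(D_t) over the N_I graphs built one edge at a time."""
    n = len(W)
    # Assumption: the strongest edges enter the graph first.
    order = sorted(combinations(range(n), 2),
                   key=lambda e: W[e[0]][e[1]], reverse=True)
    D = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    values = []
    for i, j in order:
        edge_update(D, i, j)
        values.append(metric(D))
    return sum(values) / len(values)
```

Any function of the shortest path matrix can be passed as `metric`. Applied to the panel (a) matrix of Figure 2 with `edge_update(D, 3, 4)` (0-based indices for $v_4$ and $v_5$), the sketch should reproduce the panel (c) matrix.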
A C++ version of this algorithm is freely available as part of the NetworkAnalysis package on the R platform (http://cran.r-project.org/package=NetworkAnalysis).

Algorithmic Analysis

When storing the graph of interest as an adjacency matrix, Dijkstra’s algorithm has efficiency in $O(|V|^2)$. Since density-integration would require invoking that algorithm $N_I = N_V (N_V - 1)/2$ times, the efficiency would, in that case, be in $O(|V|^4)$. If coding the graph as a matrix, our proposed algorithm does not perform better than a combination of Dijkstra’s algorithms. As the efficiency of a BFS is $O(|V|^2)$ and we perform $N_I$ such searches, it follows that in the worst-case scenario, the efficiency of our proposed method would also be $O(|V|^4)$. However, if the graph of interest is coded as a list, each BFS is in $O(|E| + |V|)$, and therefore the entire recursive shortest-path algorithm has an efficiency of $O(|V|^2 |E| + |V|^3)$. By contrast, a combination of Dijkstra’s algorithms based on an adjacency list only reduces to $O(|V|^2 |E| \log |V|)$, or $O(|V|^2 |E| + |V|^3 \log |V|)$ using the Fibonacci heap. Thus, our algorithm outperforms a combination of $N_I$ Dijkstra’s algorithms when the graph of interest is coded as a list.

Conclusion

In this paper, we have described a recursive shortest path algorithm for weighted graphs, which can be used for integrating topological metrics with respect to density. This proposed method can readily be generalized to directed networks. In such a case, one simply needs to define a graph $G = (\mathcal{V}, \mathcal{E}, \mathcal{W})$, where the elements of $\mathcal{E}(G)$ are ordered pairs of vertices. The edgeUpdate function in equation (10) can then be modified in order to check for directed shortest paths instead of undirected ones.
Given the growing interest of natural scientists in graph topological properties and the large availability of weighted networks, the utilization of algorithms of the type described in this paper is likely to become ubiquitous.

Acknowledgments

This work was supported by a fellowship from the UK National Institute for Health Research (NIHR) Biomedical Research Centre for Mental Health (BRC-MH) at the South London and Maudsley NHS Foundation Trust and King’s College London. This work has also been funded by the Guy’s and St Thomas’ Charitable Foundation as well as the South London and Maudsley Trustees. The authors would also like to thank two reviewers for their valuable input.

References

Achard, S. and Bullmore, E. (2007). Efficiency and cost of economical brain functional networks. PLOS Computational Biology, 3, 174–182.
Barabasi, A.L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
Bullmore, E. and Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(1), 1–13.
Dijkstra, E. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269–271.
Ginestet, C.E. and Simmons, A. (2011). Statistical parametric network analysis of functional connectivity dynamics during a working memory task. NeuroImage, doi:10.1016/j.neuroimage.2010.11.030, 5(2), 688–704.
Ginestet, C., Nichols, T., Bullmore, E., and Simmons, A. (Submitted). Weighted network analysis: Separating differences in cost from differences in topology. PLoS ONE.
He, Y., Dagher, A., Chen, Z., Charil, A., Zijdenbos, A., Worsley, K., and Evans, A. (2009). Impaired small-world efficiency in structural cortical networks in multiple sclerosis associated with white matter lesion load. Brain, 132(12), 3366–3379.
Latora, V. and Marchiori, M. (2003). Economic small-world behavior in weighted networks. The European Physical Journal B - Condensed Matter and Complex Systems, 32(2), 249–263.
Latora, V. and Marchiori, M. (2001). Efficient behavior of small-world networks. Phys. Rev. Lett., 87(19), 198701–198705.
Watts, D.J. and Strogatz, S.H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440–442.

Edge Updating of D

    ## Inputs: D, {v1, v2}.
    ## Output: D.
    ### Initialization:
    Set N_V = D.ncol();
    d_{v1 v2} = d_{v2 v1} = 1;

    ### BFS around v2:
    Set S_G = v1 ∪ v2, S^(0) = v2;
    FOR (m = 1, ..., N_V − 2) DO
        ∆ = (⋃_{v ∈ S^(m−1)} δ(v)) \ S_G;
        FOR (v ∈ ∆) DO
            IF (d_{v1 v} ≥ m + 1)
                d_{v1 v} = d_{v v1} = m + 1; Add v to S^(m); Add v to S_G;
            END IF;
        END FOR;
        IF S^(m) = ∅ BREAK;
    END FOR;

    ### BFS around v1:
    Set S_G = S_G \ v1, S^(0) = v1;
    FOR (m = 1, ..., N_V − 2) DO
        ∆ = (⋃_{v ∈ S^(m−1)} δ(v)) \ S_G;
        FOR (v ∈ ∆) DO
            FOR (u ∈ S_G) DO
                IF (d_{vu} ≥ d_{v1 u} + m)
                    d_{vu} = d_{v1 u} + m; d_{uv} = d_{v1 u} + m; Add v to S^(m);
                END IF;
            END FOR;
        END FOR;
        IF S^(m) = ∅ BREAK;
    END FOR;

    Return D;

Figure 1. Updating of D inserting one edge at a time, here denoted v1v2. The set S_G is the set of visited vertices, whereas the S^(m)’s are the sets of unvisited edges corresponding to the mth degree neighborhoods of the previously modified vertices, and ∆ is the set of relevant vertices at every level of the BFS. The sets ∆, S_G and the S^(m)’s should all be regarded as containers, where adding implies inserting a new element in a set.

    D    v1  v2  v3  v4  v5  v6  v7
    v1    .   1   3   2   ∞   ∞   ∞
    v2    1   .   2   1   ∞   ∞   ∞
    v3    3   2   .   1   ∞   ∞   ∞
    v4    2   1   1   .   ∞   ∞   ∞
    v5    ∞   ∞   ∞   ∞   .   1   2
    v6    ∞   ∞   ∞   ∞   1   .   1
    v7    ∞   ∞   ∞   ∞   2   1   .

a) Update: New edge between v4 and v5.

    D    v1  v2  v3  v4  v5  v6  v7
    v1    .   1   3   2   ∞   ∞   ∞
    v2    1   .   2   1   ∞   ∞   ∞
    v3    3   2   .   1   ∞   ∞   ∞
    v4    2   1   1   .   1   2   3
    v5    ∞   ∞   ∞   1   .   1   2
    v6    ∞   ∞   ∞   2   1   .   1
    v7    ∞   ∞   ∞   3   2   1   .

b) Phase I: Breadth-first search around v5.

    D    v1  v2  v3  v4  v5  v6  v7
    v1    .   1   3   2   3   4   5
    v2    1   .   2   1   2   3   4
    v3    3   2   .   1   2   3   4
    v4    2   1   1   .   1   2   3
    v5    3   2   2   1   .   1   2
    v6    4   3   3   2   1   .   1
    v7    5   4   4   3   2   1   .

c) Phase II: Breadth-first search around v4.

Figure 2. Graphical representation of the edge updating algorithm to modify the shortest path matrix, D, one edge at a time. In panel (a), a new edge, v4v5, is added to an existing graph, which is otherwise composed of two disconnected components. In panel (b), we conduct a BFS around v5 with respect to v4, updating D accordingly with the new shortest paths between v4 and v5 and its first and second degree neighbors represented in red, yellow and purple, respectively. In panel (c), we conduct a BFS around v4 with respect to the vertices that were modified in phase I of edgeUpdate, denoted in blue. The first and second degree neighbors of v4 are here denoted in orange and purple, respectively. In each panel, the corresponding modifications in the matrix of shortest paths are reported on the right-hand side. The presence of a dashed line between two vertices indicates that we test whether the inclusion of v4v5 shortens the shortest path between these two vertices.
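As an independent check on the matrices of Figure 2, the shortest path matrix can be recomputed from scratch by running a BFS from every vertex. The Python sketch below does this; the edge list is read off the unit entries of the panel (a) matrix, vertices are numbered from 0, and the helper name `all_pairs_bfs` is hypothetical. This is the brute-force baseline that the edge updating algorithm is meant to avoid, not the algorithm itself.

```python
import math
from collections import deque


def all_pairs_bfs(n, edges):
    """Shortest path matrix of an unweighted undirected graph via n BFS runs."""
    adjacency = [[] for _ in range(n)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    D = [[math.inf] * n for _ in range(n)]
    for source in range(n):
        D[source][source] = 0
        queue = deque([source])
        while queue:
            p = queue.popleft()
            for q in adjacency[p]:
                if D[source][q] == math.inf:  # q not reached yet
                    D[source][q] = D[source][p] + 1
                    queue.append(q)
    return D


# Edges of Figure 2 before the update: v1v2, v2v4, v3v4, v5v6, v6v7.
edges = [(0, 1), (1, 3), (2, 3), (4, 5), (5, 6)]
before = all_pairs_bfs(7, edges)            # two disconnected components
after = all_pairs_bfs(7, edges + [(3, 4)])  # with the new edge v4v5

for row in after:
    print(row)
```

The rows of `after` can be compared entry by entry with the panel (c) matrix, reading the diagonal dots as 0.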
