Optimal Per-Edge Processing Times in the Semi-Streaming Model
We present semi-streaming algorithms for basic graph problems that have optimal per-edge processing times and therefore surpass all previous semi-streaming algorithms for these tasks. The semi-streaming model, which is appropriate when dealing with m…
Mariano Zelke
Humboldt-Universität zu Berlin, Institut für Informatik, 10099 Berlin, Germany

We present semi-streaming algorithms for basic graph problems that have optimal per-edge processing times and therefore surpass all previous semi-streaming algorithms for these tasks. The semi-streaming model, which is appropriate when dealing with massive graphs, forbids random access to the input and restricts the memory to $O(n \cdot \operatorname{polylog} n)$ bits. In particular, the formerly best per-edge processing times are $O(\alpha(n))$ for finding the connected components and a bipartition, $O(k^2 n)$ and $O(n \cdot \log n)$ for determining $k$-vertex and $k$-edge connectivity, respectively, for any constant $k$, and $O(\log n)$ for computing a minimum spanning forest. We reduce all these time bounds to $O(1)$. Every presented algorithm determines a solution asymptotically as fast as the best corresponding algorithm known to date in the classical RAM model, which therefore cannot convert the advantage of unlimited memory and random access into superior computing times for these problems.

Keywords: graph algorithms, streaming algorithms, per-edge processing time

1. Introduction

When facing computational tasks on massive graphs, the postulate of the classical RAM model, that is, storing the whole input in memory and allowing random access to it, is no longer adequate. In fact, the information building up the graph may arrive in no specified order, and the attempt to store it completely exceeds common main memories. With this in mind, Muthukrishnan [8] proposed the semi-streaming model in 2003 as a more restrictive model of computation. In this model the edges of the input graph $G$ appear in arbitrary order and the memory is limited to $O(n \cdot \operatorname{polylog} n)$ bits, where $n$ is the number of vertices in $G$.
An important parameter of a semi-streaming algorithm is the per-edge processing time $T$, i.e., the time the algorithm needs to handle each single edge. This time determines the frequency at which the edges may arrive. The second parameter of the semi-streaming model is the number $P$ of passes the algorithm takes over the input stream. All algorithms considered in this paper use only one pass.

Despite the heavy restrictions of the semi-streaming model, algorithms are known that solve basic graph problems. In [4] semi-streaming algorithms are given for computing the connected components and a bipartition of a graph as well as a minimum spanning tree of a weighted graph. There are approaches to determine the $k$-edge connectivity [5] and the $k$-vertex connectivity [5],[13] of a graph for any constant $k$.

In this paper we present semi-streaming algorithms for computing the connected components and a bipartition of a graph, for calculating the $k$-vertex and $k$-edge connectivity for any constant $k$, and for finding a minimum spanning forest (MSF). All these algorithms have constant and therefore optimal per-edge processing times.

Section 2 gives the usual definitions. In Section 3 we discuss our definition of the per-edge processing time, which is a slight refinement of previous definitions. We develop our semi-streaming algorithms in Section 4. In Section 5 we discuss how the obtained algorithms compete with the corresponding algorithms in the RAM model. A final conclusion is found in Section 6.

Supported by the DFG Research Center Matheon "Mathematics for key technologies" in Berlin. Email address: zelke@informatik.hu-berlin.de

2. Preliminaries and Definitions

By $G$ we denote a graph $G(V, E)$ with vertex set $V$ and edge set $E$. We call $n = |V|$ and $m = |E|$ the number of vertices and edges, respectively. Every graph considered in this paper is undirected and contains no loops but might have multiple edges. For computing an MSF we consider $G$ to be a weighted graph, that is, with a nonnegative weight associated with each edge. Regarding the memory constraints of the semi-streaming model we assume every weight to be storable in $O(\operatorname{polylog} n)$ bits.

We define $\alpha(m, n)$ to be a natural inverse of Ackermann's function $A(\cdot, \cdot)$ as defined in [12]:

$$\alpha(m, n) := \min\{\, i \ge 1 \mid A(i, \lfloor m/n \rfloor) > \log n \,\}.$$

We abbreviate $\alpha(n, n)$ as $\alpha(n)$.

Bipartition. A graph $G$ is called bipartite if the vertices can be split into two parts, a bipartition, such that no edge runs between two vertices in the same part. The problem of finding a bipartition is to find two such parts or to state that there is no bipartition since the graph is not bipartite.

Connectivity. We call two vertices connected if there is a path between them. A graph $G$ is connected if any pair of vertices in $G$ is connected; a connected component of $G$ is an induced subgraph $C$ of $G$ such that $C$ is connected and maximal. A spanning forest of $G$ is a subgraph of $G$ without any cycles having the same connected components as $G$. Given a positive integer $k$, a graph $G$ is said to be $k$-vertex connected ($k$-edge connected) if the removal of any $k - 1$ vertices (edges) leaves the graph connected. A subset $S$ of the vertices (edges) of $G$ we call an $l$-separator ($l$-cut) if $l = |S|$ and the graph obtained by removing $S$ from $G$ has more connected components than $G$. The local vertex-connectivity $\kappa(x, y; G)$ (local edge-connectivity $\lambda(x, y; G)$) denotes the number of vertex-disjoint (edge-disjoint) paths between $x$ and $y$ in $G$. By a classical result of Menger (see e.g. [1]) the local vertex- (edge-) connectivity between $x$ and $y$ equals the minimum number of vertices (edges) that must be removed to obtain $x$ and $y$ in different connected components.

MSF/MST. For an edge-weighted graph $G$ the minimum spanning forest (MSF) is a subgraph $G'$ of $G$ with minimum total cost consisting of the same connected components as $G$. If $G$ is connected we call $G'$, which is then connected as well, the minimum spanning tree (MST) of $G$.

Certificates. Given any graph property $P$ and a graph $G$, a certificate of $G$ for $P$ is a graph $G'$ on the same vertex set such that $G$ has $P$ if and only if $G'$ has $P$. For any graph $G$ on vertex set $V$ and any property $P$, a strong certificate of $G$ for $P$ is a graph $G'$ on vertex set $V$ such that for any graph $H$ on $V$, $G \cup H$ has $P$ if and only if $G' \cup H$ has $P$. A certificate is said to be sparse if its number of edges is $O(n)$.

Semi-Streaming Algorithm. A graph stream of a graph $G$ is a sequence of the $m$ edges of $G$ in arbitrary order. A semi-streaming algorithm $A$ gets a graph stream as input and is restricted to use a space of at most $O(n \cdot \operatorname{polylog} n)$ bits. The algorithm may access the input stream for $P$ passes in a sequential one-way order. All algorithms considered in this paper use only $P = 1$ pass. We define the per-edge processing time $T$ of $A$ to be the minimum time allowed between the revealing of two consecutive edges in the input stream. This definition of $T$ makes the definitions of previous papers more precise; we discuss this in Section 3. There we also comment on the computing time, which denotes the total time required by $A$ to determine the property in question of the input graph.

3. Discussion of Per-Edge Processing Time

In previous papers about semi-streaming algorithms that consider the per-edge processing time $T$ ([4],[5],[13]), $T$ is used in an ambiguous way.
While it is used as the worst-case time to process a single edge on the one hand, it is equally used on the other hand, even if not explicitly stated, as amortized time charged over the number of edges. In fact, if tools such as dynamic trees or disjoint set data structures are utilized, they give rise to amortized times since their time bounds are of amortized type, too. Processing the input edges is then assumed to be evenly spread over the whole computing time, which is just $m \cdot T$.

This definition is not appropriate for a streaming algorithm: As Muthukrishnan [8] pointed out, the computing time, i.e., the time to evaluate the property in question for the items read in so far, is not the most important parameter of a streaming algorithm. What is more crucial is the maximum frequency of incoming items that can still be considered by the algorithm. That refers to the speed at which external storage devices can present their data content to a streaming algorithm and constitutes the frequency at which observed phenomena can be taken into account. To this end it is desirable to maximize the possible rate of incoming items by postponing as many operations as possible to a point after which all items are received, possibly accepting a higher computing time.

To model this worthwhile property of a streaming algorithm $A$ we propose the definition of the per-edge processing time $T$ to be the minimum allowable time between two consecutive edges in the graph stream. The final determination of the property in question may require some postprocessing after reading all input edges. This time is considered in the computing time, which incorporates the sum of the per-edge processing times of all edges and the postprocessing time.

4. Computing Certificates and Buffering Edges

To achieve our optimal per-edge processing times we exploit the general method of sparsification as presented by Eppstein et al. [3]. Feigenbaum et al. [5] pointed out how the results of [3] can be adapted to the semi-streaming model. They thereby obtained the formerly best bounds on $T$ for almost all problems considered in this paper. We refine their method to obtain an improvement of their results. For a comparison of our new bounds with the previous ones see Table 1.

Due to the memory limitations of the semi-streaming model it is not possible to memorize a whole graph which is too dense, that is, if $m/n \gg \log n$. A way to determine graph properties without completely storing the graph is to find a sparse certificate $C$ of the graph for the property in question. Consisting of a linear number of edges, the certificate can be stored within the memory restrictions, and testing it answers the question for the original graph. The concept of certificates has been applied to the semi-streaming model in [5] and [13]. However, in [13] every input edge initiates an update of the certificate, which is time-consuming and prevents a faster per-edge processing.

To increase the manageable frequency of incoming edges, updating the certificate can be done not for every single edge but for a group of edges. While such a group of edges is being considered, the next incoming edges can be buffered to compose the group for the following update.

To permit this updating in groups of edges, the utilized certificate must be a strong certificate, an assumption that is not required in [13]. That is because strong certificates obey two important attributes for any fixed graph property: Firstly, they behave transitively, that is, if $C$ is a strong certificate for $G$ and $C'$ is a strong certificate for $C$, then $C'$ is a strong certificate for $G$.
Secondly, if $G'$ and $H'$ are strong certificates of $G$ and $H$ respectively, then $G' \cup H'$ is a strong certificate of $G \cup H$.

The technique of group-wise updating is used by Eppstein et al. [3], yielding fast dynamic algorithms, and has been transferred to the semi-streaming model by Feigenbaum et al. [5]. The following theorem is a slightly extended version of their result, augmented with space considerations. We will need details of the proof later on.

Table 1. Previously best per-edge processing times $T$ compared to our new bounds.

| Problem | Previous best $T$ | New $T$ |
|---|---|---|
| Connected components | $O(\alpha(n))$ | $O(1)$ |
| Bipartition | $O(\alpha(n))$ | $O(1)$ |
| $\{2,3\}$-vertex connectivity | $O(\alpha(n))$ | $O(1)$ |
| 4-vertex connectivity | $O(\log n)$ | $O(1)$ |
| $k$-vertex connectivity | $O(k^2 n)$ | $O(1)$ |
| $\{2,3\}$-edge connectivity | $O(\alpha(n))$ | $O(1)$ |
| 4-edge connectivity | $O(n \alpha(n))$ | $O(1)$ |
| $k$-edge connectivity | $O(n \cdot \log n)$ | $O(1)$ |
| Minimum spanning forest | $O(\log n)$ | $O(1)$ |

All previous bounds are due to [5], apart from $k$-vertex connectivity, which is a result of [13]. $k$ is any constant, $\alpha(n)$ the inverse of Ackermann's function.

Theorem 1. Let $G$ be a graph and let $C$ be a sparse and strong certificate of $G$ for a graph property $P$. If $C$ can be computed in space $O(m)$ and time $f(n, m)$, then there is a one-pass semi-streaming algorithm building $C$ of $G$ with per-edge processing time $T = f(n, O(n))/n$.

Proof. We denote the edges of the input stream as $e_1, e_2, \ldots, e_m$ and the subgraph of $G$ containing the first $i$ edges in the stream as $G_i$. We inductively assume that we computed a sparse and strong certificate $C_{jn}$ of the graph $G_{jn}$ for $1 \le j < \lfloor m/n \rfloor$ using a time of $f(n, O(n))/n$ per already processed edge. During the computation of $C_{jn}$ we buffered the next $n$ edges $e_{jn+1}, e_{jn+2}, \ldots, e_{(j+1)n}$.
Because of the properties of strong certificates, $T = C_{jn} \cup \{e_{jn+1}, e_{jn+2}, \ldots, e_{(j+1)n}\}$ is a strong certificate for $G_{(j+1)n}$. Since $C_{jn}$ is sparse, $T$ consists of $O(n)$ edges as well. Computing $C_{(j+1)n}$ as a sparse and strong certificate of $T$ can be realized in a space linear in the space needed to memorize the edges of $T$, which is $O(n \cdot \operatorname{polylog} n)$ bits, without exceeding the memory limitation of the semi-streaming model. By transitivity $C_{(j+1)n}$ is a strong certificate of $G_{(j+1)n}$. A time of $f(n, O(n))$ suffices to compute $C_{(j+1)n}$, hence the input edges can arrive with a time delay of $f(n, O(n))/n$, building the group of the next $n$ edges to update the certificate after the computation of $C_{(j+1)n}$ is completed.

Finally, for $k = \lfloor m/n \rfloor$ the last group of edges $\{e_{kn+1}, e_{kn+2}, \ldots, e_m\}$ can simply be added to $C_{kn}$ to obtain a sparse and strong certificate of the input graph $G$ for the property $P$. □

To obtain our semi-streaming algorithms with optimal per-edge processing times, all that remains is to present the required certificates and to show within which time and space bounds they can be computed. At first glance it may seem surprising that Feigenbaum et al. [5], using the same technique of updating certificates with groups of edges, do not meet the bounds we present in this paper. The reason is that they just observe that results of Eppstein et al. [3] can be transferred to the semi-streaming model. However, Eppstein et al. develop dynamic graph algorithms requiring powerful abilities: The algorithm must be able to answer a query for the subgraph of already read edges at any time, and it must handle edge deletions. In the semi-streaming model the property is queried only at the end of the stream and there are no edge deletions. Thus we can drop both requirements for faster per-edge processing times.
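The group-wise update in the proof of Theorem 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not an implementation from the paper: the names `search`, `spanning_forest` and `stream_certificate` are hypothetical, vertices are assumed to be numbered $0, \ldots, n-1$, and the spanning forest of Section 4.1 serves as the example certificate. The sketch processes buffering and recomputation one after the other; in the streaming setting the buffering of the next group overlaps with the recomputation, which is what yields the delay of $f(n, O(n))/n$ per edge.

```python
from itertools import islice
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def search(n: int, edges: List[Edge]) -> Tuple[List[Edge], Set[FrozenSet[int]]]:
    """Explore the graph on vertices 0..n-1 with a stack-based search.

    Returns a spanning forest and the connected components in
    O(n + len(edges)) time and space.
    """
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    forest: List[Edge] = []
    components: Set[FrozenSet[int]] = set()
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack, members = [root], [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    forest.append((u, v))  # tree edge that discovers v
                    members.append(v)
                    stack.append(v)
        components.add(frozenset(members))
    return forest, components


def spanning_forest(n: int, edges: List[Edge]) -> List[Edge]:
    """Example certificate: a spanning forest, as used in Section 4.1."""
    return search(n, edges)[0]


def stream_certificate(
    n: int,
    stream: Iterable[Edge],
    certify: Callable[[int, List[Edge]], List[Edge]],
) -> List[Edge]:
    """Maintain a certificate over a one-pass edge stream in groups of n edges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    certificate: List[Edge] = []
    it = iter(stream)
    while True:
        group = list(islice(it, n))  # buffer the next (at most) n edges
        if len(group) < n:
            # Last, incomplete group: it is simply added to the certificate.
            return certificate + group
        # Merge certificate and full group, then recompute on O(n) edges.
        certificate = certify(n, certificate + group)
```

Calling `stream_certificate(n, edges, spanning_forest)` and then running `search` once more on the result plays the role of the postprocessing step. At no point are more than roughly $2n$ edges held in memory, which mirrors the space argument of the proof.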
In the following, the input graph for our semi-streaming algorithms is denoted by $G$ with $n$ vertices and $m$ edges as usual.

4.1. Connected Components

We use a spanning forest $F$ of $G$ as a certificate. $F$ is not only a strong certificate for connectivity, it also has the same connected components as $G$. $F$ can be computed by a depth-first search in time and space $O(n + m)$ and is sparse by definition. Using Theorem 1 we get a semi-streaming algorithm computing a spanning forest of $G$ with per-edge processing time $T = O(1)$. To identify the connected components of $G$ in the postprocessing step we can run a depth-first search on the final certificate in time $O(n)$. The resulting computing time is $m \cdot T + O(n) = O(n + m)$.

4.2. Bipartition

As a certificate for bipartiteness of $G$ we use $F^+$, which is a spanning forest of $G$ augmented with one more edge of $G$ inducing an odd cycle if there is any. If no such cycle exists, $F^+$ is just a spanning forest. By [3] $F^+$ is a strong certificate of $G$, and it is sparse by definition. It can be computed by a depth-first search which alternately colors the visited vertices and is therefore able to find an odd cycle. A time and space of $O(n + m)$ suffices to do so, yielding a semi-streaming algorithm with $T = O(1)$. On the final certificate we can again run a depth-first search coloring the vertices alternately in time $O(n)$ during the postprocessing step. That produces a bipartition of the vertices or identifies an odd cycle in $G$ in a computing time of $O(n + m)$.

4.3. $k$-Vertex Connectivity

For $k$-vertex connectivity, $k$ being any constant, we use as a certificate of $G$ a subgraph $C_k$ which is derived by an algorithm presented by Nagamochi and Ibaraki [9]. $C_k$ can be computed in time and space $O(n + m)$, contains at most $kn$ edges and is therefore sparse.
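To give a feeling for what such a sparse subgraph can look like, the following Python sketch builds the union of $k$ successive maximal spanning forests, each computed on the edges not used by the earlier ones. This is a simplified stand-in and explicitly not the linear-time algorithm of [9]: it takes $O(k(n + m))$ time, the function names are hypothetical, and vertices are assumed to be numbered $0, \ldots, n-1$. Such a union has at most $k(n-1)$ edges and is a standard sparse certificate for $k$-edge connectivity; the vertex-connectivity guarantee relies on the specific forests constructed in [9], which this sketch does not reproduce.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def spanning_forest_ids(n: int, edges: List[Edge], alive: List[bool]) -> List[int]:
    """Indices of edges forming a spanning forest of the still-alive edges."""
    adj: List[List[Tuple[int, int]]] = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        if alive[idx]:
            adj[u].append((v, idx))
            adj[v].append((u, idx))
    seen = [False] * n
    chosen: List[int] = []
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [root]
        while stack:
            u = stack.pop()
            for v, idx in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    chosen.append(idx)  # edge idx becomes a tree edge
                    stack.append(v)
    return chosen


def k_forest_certificate(n: int, edges: List[Edge], k: int) -> List[Edge]:
    """Union of k successive maximal spanning forests (at most k*(n-1) edges)."""
    alive = [True] * len(edges)  # edges not yet assigned to a forest
    certificate: List[Edge] = []
    for _ in range(k):
        for idx in spanning_forest_ids(n, edges, alive):
            alive[idx] = False
            certificate.append(edges[idx])
    return certificate
```

Edges are tracked by index rather than by endpoints so that multiple edges between the same pair of vertices, which the paper allows, are treated as distinct.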
Beyond that, as a main result of [9], $C_k$ preserves the local vertex connectivity up to $k$ for any pair of nodes in $G$:

$$\kappa(x, y; C_k) \ge \min\{\kappa(x, y; G),\, k\} \quad \forall x, y \in V \tag{1}$$

This quality of $C_k$ leads to useful properties:

Lemma 2. Every $l$-separator $S$ in $C_k$, $l < k$, is an $l$-separator in $G$, and its removal leaves the same connected components in both $C_k \setminus S$ and $G \setminus S$.

Proof. In $C_k \setminus S$ we find two nonempty, disjoint connected components $X$ and $Y$ with vertices $x \in X$ and $y \in Y$. Assume that $S$ is not an $l$-separator in $G$; then there exists a path $Z$ from $x$ to $y$ in $G \setminus S$. Let $x'$ be the last vertex on $Z$ in $X$ and $y'$ the first one in $Y$. The part of $Z$ between $x'$ and $y'$ we call $Z'$. In $C_k$ we find at most $l$ vertex-disjoint paths between $x'$ and $y'$, all of them using vertices of $S$. In $G$ these paths exist as well, with the additional path $Z'$, which is vertex-disjoint from the other paths by construction. Therefore the local connectivity between $x'$ and $y'$ in $G$ exceeds that in $C_k$, contradicting property (1) of $C_k$.

Since $C_k \setminus S$ is a subgraph of $G \setminus S$, every connected component of $C_k \setminus S$ is included in one connected component of $G \setminus S$. Assume that $W$ is a connected component in $G \setminus S$ which contains two vertices $i$ and $j$ within different connected components of $C_k \setminus S$, namely $I \ni i$ and $J \ni j$. As in the first part of this proof we can find a path $Z$ from $i$ to $j$ in $W$ with $x'$ being the last vertex in $I$ and $y'$ the first one in $J$ on $Z$. We can deduce the same contradiction as above. □

So $C_k$ is usable for our purposes:

Lemma 3. $C_k$ is a strong certificate for $k$-vertex connectivity of $G$.

Proof. If $C_k \cup H$ is $k$-vertex connected, then $G \cup H$, including $C_k \cup H$ as a subgraph, is $k$-vertex connected as well. Assume for the proof of the converse direction that $G \cup H$ is $k$-vertex connected and $C_k \cup H$ is not. Then $C_k \cup H$ contains an $l$-separator $S$ for some $l < k$.
After the removal of $S$ the remaining vertices of $C_k \cup H$ can be grouped into two nonempty sets $A$ and $B$ such that no edge joins a vertex of $A$ with a vertex of $B$. It is immediate that $H$ does not contain any edges between $A$ and $B$. Clearly, removing $S$ from $C_k$ produces the same sets $A$ and $B$, still with no edge joining them. The properties of $C_k$ shown in Lemma 2 make sure that the removal of $S$ from $G$ leaves $A$ and $B$ without any joining edge, too. With $H$ having no edges between $A$ and $B$, the graph $G \cup H$ cannot be $k$-vertex connected. □

Using Theorem 1 yields a semi-streaming algorithm computing a sparse and strong certificate of $k$-vertex connectivity in per-edge processing time $T = O(1)$. To test the final certificate for $k$-vertex connectivity in a postprocessing step we can use an algorithm of Gabow [7] on it. That algorithm runs in time $O((k^{5/2} + n)kn) = O(kn^2)$ and, what is more important, uses a space linear in the number of edges of the final certificate, hence respects the memory constraints of the semi-streaming model. The resulting computing time is $O(m + kn^2)$.

4.4. $k$-Edge Connectivity

We use the same $C_k$ as in Section 4.3, produced by the algorithm of Nagamochi and Ibaraki presented in [9], where it is shown that $C_k$ reflects the local edge-connectivity of $G$ in the following way:

$$\lambda(x, y; C_k) \ge \min\{\lambda(x, y; G),\, k\} \quad \forall x, y \in V \tag{2}$$

Therefore Lemma 2 and Lemma 3 can be formulated and proven with respect to $l$-cuts, $l < k$, and $k$-edge connectivity. Accordingly we have a semi-streaming algorithm computing a strong and sparse certificate for $k$-edge connectivity using $T = O(1)$. To determine $k$-edge connectivity of the final certificate we can use an algorithm of Gabow [6] using a space linear in the number of edges of the final certificate.
It takes a time of $O(m + k^2 n \log(n/k))$, which is also the resulting computing time of our semi-streaming algorithm.

4.5. Minimum Spanning Forest

Let us first take a look at the algorithm we use as a subroutine for our semi-streaming algorithm computing an MSF of a given graph. We utilize the MST algorithm of Pettie and Ramachandran [11], which uses a space of $O(m)$. A remark on how we use an algorithm computing an MST to obtain an MSF is given below. The algorithm of [11] uses a time of $O(T^*(m, n))$, where $T^*(m, n)$ denotes the minimum number of edge-weight comparisons needed to find an MST of a graph with $n$ vertices and $m$ edges. The algorithm uses decision trees which are provably optimal but whose exact depth is unknown. Because of that, the exact running time of the algorithm is not known even though it is optimal.

The currently tightest time bound for the MST problem is given by algorithms due to Chazelle [2] and Pettie [10] that run in time $O(m \cdot \alpha(m, n))$. Consequently the optimal algorithm of Pettie and Ramachandran [11] inherits this running time, $T^*(m, n) = O(m \cdot \alpha(m, n))$. By definition, $\alpha(m, n) = O(1)$ if $m/n \ge \log n$. Therefore on a sufficiently dense graph the algorithm of [11] computes an MST in time $O(m)$.

Using this optimal algorithm as our subroutine we can find a semi-streaming algorithm with per-edge processing time $T = O(1)$ in the following way. We use the technique described in Theorem 1 of merging a computed subgraph with buffered edges and then calculating a new subgraph of the merged graph while buffering the next group of edges. Unlike before, we use groups of edges consisting of $r = n \cdot \log n$ edges instead of $n$.
Such a number of edges can be memorized in the semi-streaming model using $O(n \cdot \operatorname{polylog} n)$ bits, even if weights are assigned to the edges, which we assume to be storable in $O(\operatorname{polylog} n)$ bits each.

Taking up the notation of the proof of Theorem 1, $C_{jr}$ is the memorized MSF of the graph $G_{jr}$ made up of the edges $e_1, e_2, \ldots, e_{jr}$. We merge the buffered next $r$ edges with $C_{jr}$ to obtain $T = C_{jr} \cup \{e_{jr+1}, e_{jr+2}, \ldots, e_{(j+1)r}\}$. For the number $m_T$ of edges in $T$ we have $m_T \ge n \cdot \log n$, and therefore the optimal MST algorithm uses a time of $O(m_T)$ to compute the MSF $C_{(j+1)r}$ of $T$. Since $m_T < 2r$, the computation of $C_{(j+1)r}$ takes a time of $O(r)$. To fill the buffer of the next $r$ edges in the meantime, the edges can arrive with a time delay of $O(1)$.

It remains to show that what we compute in the described way is indeed an MSF of the input graph $G$. Every edge of $G_{jr}$ that is not in $C_{jr}$ is the heaviest on a cycle in $G_{jr}$ and cannot be in an MSF of $G_{jr}$. On the other hand, $C_{jr}$ does not contain any dispensable edges since it includes no cycles: The removal of any edge from $C_{jr}$ produces two connected components in $C_{jr}$ whose vertices form a common connected component in $G_{jr}$. Therefore $C_{jr}$ forms an MSF of $G_{jr}$, inductively showing that we really obtain an MSF of $G$ in this manner.

Now we can state the computing time of our semi-streaming algorithm, which depends on the density of the input graph $G$. If $G$ has at most $r = n \cdot \log n$ edges, all edges are read and buffered in time $O(m)$ and then the optimal algorithm of Pettie and Ramachandran [11] computes an MSF in time $O(T^*(m, n))$, producing a computing time of $O(T^*(m, n))$, since $\Omega(m)$ is a lower bound for $T^*(m, n)$.

If $G$ has more than $r$ edges we successively update an MSF with groups of edges.
Note that, different from the procedure described in the proof of Theorem 1, the last group of edges is not simply merged with the $C_{\lfloor m/r \rfloor r}$ computed so far. Instead the MSF of the merged graph is calculated to obtain the final MSF, which is also the MSF of the input graph, in the postprocessing step. We can fill the last group of edges up to a complete group of $r$ edges by using dummy edges weighted heavier than any edge in the input stream. This way we ensure that the last merged graph for the postprocessing, with $m_f \ge r$ edges, is sufficiently dense for the optimal MST algorithm running on it. So for the postprocessing time we have $O(T^*(m_f, n)) = O(m_f \cdot \alpha(m_f, n)) = O(m_f)$. Therefore the computing time is $O(m) + O(m_f) = O(m)$, which is trivially $O(T^*(m, n))$.

Let us give two minor remarks about the algorithm of Pettie and Ramachandran [11] we use. Firstly, the algorithm of [11] assumes the edge weights to be distinct. We do not require that property since ties can be broken while reading the input edges in a way described in [3]. Secondly, the algorithm of [11] works on connected graphs. Before running it, we can use a depth-first search to identify the connected components, which are then processed separately. Identifying the connected components takes a time of $O(m) = O(T^*(m, n))$, so the running time of our subroutine persists, as does the per-edge processing time of our semi-streaming algorithm.

5. Discussion

In this section we compare the obtained semi-streaming algorithms to algorithms determining the same properties in the classical RAM model, which allows random access to all the edges of a graph without any memory constraints.
First note that the presented semi-streaming algorithms have optimal per-edge processing times, that is, no semi-streaming algorithm exists allowing asymptotically shorter times: Every single edge must be considered to determine a solution for the problems considered in this paper, so a time of $\Omega(1)$ per edge is a lower bound for these problems.

Let us now take a look at the presented semi-streaming algorithms testing $k$-vertex and $k$-edge connectivity. For $k$-vertex connectivity with $k$ being a constant, the fastest algorithm in the RAM model to date is due to Gabow [7] and runs in $O(kn^2)$. Gabow obtains this result even in graphs with multiple edges by preprocessing the input graph with the algorithm of Nagamochi and Ibaraki [9] in time $O(m)$, producing a running time of $O(kn^2 + m)$ on graphs and multigraphs. This asymptotically equals our computing time, which is not surprising since we use Gabow's algorithm as our subroutine. We find the same situation when looking at $k$-edge connectivity. Our achieved computing time of $O(m + k^2 n \log(n/k))$ is asymptotically as fast as the fastest algorithm in the RAM model due to Gabow [6], which we use as a subroutine. So both our connectivity algorithms have a computing time that is asymptotically the same as that of the fastest known corresponding algorithms in the RAM model.

It is possible that there are faster but still unknown algorithms in the RAM model for $k$-vertex and $k$-edge connectivity which cannot be utilized in the semi-streaming model because they consume too much space. The converse is true for the problems of finding connected components, a bipartition and an MSF of a given graph. The presented semi-streaming algorithms have asymptotically the same computing time as the fastest possible algorithms in the RAM model.
That can easily be seen for connected components and bipartition: We obtain in each case a computing time of $O(n + m)$, which is trivially a lower bound for any algorithm in the RAM model solving these problems. For computing an MSF we get a computing time of $O(T^*(m, n))$, where $T^*(m, n)$ is the lower time bound for any RAM algorithm.

For the asymptotic time needed to determine a solution there is no difference for $k$-edge and $k$-vertex connectivity between the currently fastest algorithms in the RAM model and the presented semi-streaming algorithms. Unless faster connectivity algorithms in the RAM model are developed, there is no demand for random access to the edges and for a memory exceeding $O(n \cdot \operatorname{polylog} n)$ bits. For computing the connected components, a bipartition and an MSF such a demand will never emerge since the presented semi-streaming algorithms have optimal computing times. The RAM model cannot capitalize on its mighty potential of unlimited memory and random access to beat the computing times of the weaker semi-streaming model.

We close this section by indicating a tradeoff between memory and time when computing an MSF in the semi-streaming model. If the memory constraint of the semi-streaming algorithm is reduced from $O(n \cdot \operatorname{polylog} n)$ to $O(n \cdot \log^{2-\varepsilon} n)$ bits, only $s = o(n \cdot \log n)$ edges can be memorized. So the optimal MST algorithm we use as a subroutine needs a time of $O(T^*(s, n))$. Provided that $T^*(s, n) = \omega(s)$, we obtain a per-edge processing time of $\omega(1)$ and therefore a computing time of $\omega(m)$. Both bounds are significantly larger than the corresponding ones when $O(n \cdot \operatorname{polylog} n)$ bits of memory are permitted.
However, if it turns out that $T^*(m, n) = O(m)$ for any $m$, it suffices to store $\Theta(n)$ edges to obtain both optimal per-edge and computing time in the semi-streaming model.

6. Conclusion

We presented semi-streaming algorithms for computing the connected components, a bipartition, the $k$-vertex and $k$-edge connectivity for any constant $k$ and an MSF of a given graph. The presented per-edge processing times $T$ surpass former semi-streaming algorithms and are optimal because they are constant. All introduced semi-streaming algorithms are asymptotically as fast as the fastest corresponding algorithms in the RAM model. For connected components, bipartition and MSF we actually achieve the time bounds of the best possible RAM algorithms.

The main idea for our semi-streaming algorithms is quite simple: A sparse memorized subgraph is merged with buffered edges, and while computing a sparse subgraph of the merged one the next edges are buffered. We believe this idea to be fruitful for other graph problems as well when tackling them without random access and within the memory constraints of the semi-streaming model.

REFERENCES

1. B. Bollobás. Graph Theory, An Introductory Course. Springer, New York, 1979.
2. B. Chazelle. A minimum spanning tree algorithm with inverse-Ackermann type complexity. J. ACM 47(6):1028–1047, 2000.
3. D. Eppstein, Z. Galil, G. F. Italiano, and A. Nissenzweig. Sparsification - A technique for speeding up dynamic graph algorithms. Journal of the ACM, 44(1):669–696, 1997.
4. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. ICALP 2004, In: LNCS 3142, 531–543, 2004.
5. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph Distances in the Streaming Model: the Value of Space. SODA 2005: 745–754.
6. H. N. Gabow. A Matroid Approach to Finding Edge Connectivity and Packing Arborescences. Journal of Computer and System Sciences, Volume 50, Issue 2, 259–273, 1995.
7. H. N. Gabow. Using expander graphs to find vertex connectivity. In: Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, IEEE Computer Society, Los Alamitos, CA, 2000, pp. 410–420.
8. S. Muthukrishnan. Data streams: Algorithms and applications. 2003. Available at http://athos.rutgers.edu/~muthu/stream-1-1.ps
9. N. Nagamochi and T. Ibaraki. A linear time algorithm for finding a sparse k-connected spanning subgraph of a k-connected graph. Algorithmica, 7:583–596, 1992.
10. S. Pettie. Finding minimum spanning trees in $O(m \alpha(m, n))$ time. Tech. Rep. TR99-23, Univ. of Texas at Austin, Austin, Tex.
11. S. Pettie and V. Ramachandran. An Optimal Minimum Spanning Tree Algorithm. J. ACM 49(1):16–34, 2002.
12. R. E. Tarjan. Data Structures and Network Algorithms. CBMS-NSF Regional Conference Series in Applied Mathematics, 1983.
13. M. Zelke. k-Connectivity in the Semi-Streaming Model. Available at arXiv: cs.DM/0608066.