Optimal Per-Edge Processing Times in the Semi-Streaming Model

Mariano Zelke
Humboldt-Universität zu Berlin, Institut für Informatik, 10099 Berlin, Germany
Email address: zelke@informatik.hu-berlin.de

Supported by the DFG Research Center Matheon "Mathematics for key technologies" in Berlin.

We present semi-streaming algorithms for basic graph problems that have optimal per-edge processing times and therefore surpass all previous semi-streaming algorithms for these tasks. The semi-streaming model, which is appropriate when dealing with massive graphs, forbids random access to the input and restricts the memory to $O(n \cdot \mathrm{polylog}\, n)$ bits.

In particular, the formerly best per-edge processing times are $O(\alpha(n))$ for finding the connected components and a bipartition, $O(k^2 n)$ and $O(n \cdot \log n)$ for determining $k$-vertex and $k$-edge connectivity respectively for any constant $k$, and $O(\log n)$ for computing a minimum spanning forest. We reduce all these time bounds to $O(1)$.

Every presented algorithm determines a solution asymptotically as fast as the best corresponding algorithm known to date in the classical RAM model, which therefore cannot convert the advantage of unlimited memory and random access into superior computing times for these problems.

Keywords: graph algorithms, streaming algorithms, per-edge processing time

1. Introduction

When facing computational tasks on massive graphs, the postulate of the classical RAM model, that is, storing the whole input in memory and allowing random access to it, is no longer adequate. In fact, the information building up the graph may arrive in no specified order, and the attempt to store it completely exceeds common main memories. Regarding this, Muthukrishnan [8] proposed the semi-streaming model in 2003 as a more restrictive model of computation. According to this model, the edges of the input graph $G$ appear in arbitrary order and the memory is limited to $O(n \cdot \mathrm{polylog}\, n)$ bits, where $n$ is the number of vertices in $G$.

An important parameter of a semi-streaming algorithm is the per-edge processing time $T$, i.e., the time the algorithm needs to handle each single edge. This time determines the frequency at which the edges may arrive. The second parameter of the semi-streaming model is the number $P$ of passes the algorithm takes over the input stream. All algorithms considered in this paper use only one pass.

Despite the heavy restrictions of the semi-streaming model, there are known algorithms solving basic graph problems. In [4] semi-streaming algorithms are given for computing the connected components and a bipartition of a graph as well as a minimum spanning tree of a weighted graph. There are approaches to determine the $k$-edge connectivity [5] and the $k$-vertex connectivity [5],[13] of a graph for any constant $k$.

In this paper we present semi-streaming algorithms for computing the connected components and a bipartition of a graph, for calculating the $k$-vertex and $k$-edge connectivity for any constant $k$, and for finding a minimum spanning forest (MSF). All these algorithms have constant and therefore optimal per-edge processing times.

Section 2 gives the usual definitions. In Section 3 we discuss our definition of the per-edge processing time, which is a slight refinement of previous definitions. We develop our semi-streaming algorithms in Section 4. In Section 5 we discuss how the obtained algorithms compete with the corresponding algorithms in the RAM model. A final conclusion is found in Section 6.

2. Preliminaries and Definitions

By $G$ we denote a graph $G(V, E)$ with vertex set $V$ and edge set $E$. We call $n = |V|$ and $m = |E|$ the number of vertices and edges respectively.
Every graph considered in this paper is undirected and contains no loops but might have multiple edges. For computing an MSF we consider $G$ to be a weighted graph, that is, with a nonnegative weight associated with each edge. Regarding the memory constraints of the semi-streaming model, we assume every weight to be storable in $O(\mathrm{polylog}\, n)$ bits.

We define $\alpha(m, n)$ to be a natural inverse of Ackermann's function $A(\cdot, \cdot)$ as defined in [12]:

$$\alpha(m, n) := \min \{\, i \ge 1 \mid A(i, \lfloor m/n \rfloor) > \log n \,\}.$$

We abbreviate $\alpha(n, n)$ as $\alpha(n)$.

Bipartition. A graph $G$ is called bipartite if the vertices can be split into two parts, a bipartition, such that no edge runs between two vertices in the same part. The problem of finding a bipartition is to find two such parts or to state that there is no bipartition because the graph is not bipartite.

Connectivity. We call two vertices connected if there is a path between them. A graph $G$ is connected if any pair of vertices in $G$ is connected; a connected component of $G$ is an induced subgraph $C$ of $G$ such that $C$ is connected and maximal. A spanning forest of $G$ is a subgraph of $G$ without any cycles having the same connected components as $G$. Given a positive integer $k$, a graph $G$ is said to be $k$-vertex connected ($k$-edge connected) if the removal of any $k - 1$ vertices (edges) leaves the graph connected. A subset $S$ of the vertices (edges) of $G$ is called an $l$-separator ($l$-cut) if $l = |S|$ and the graph obtained by removing $S$ from $G$ has more connected components than $G$. The local vertex-connectivity $\kappa(x, y; G)$ (local edge-connectivity $\lambda(x, y; G)$) denotes the maximum number of vertex-disjoint (edge-disjoint) paths between $x$ and $y$ in $G$. By a classical result of Menger (see e.g. [1]) the local vertex- (edge-) connectivity between $x$ and $y$ equals the minimum number of vertices (edges) that must be removed to obtain $x$ and $y$ in different connected components.

MSF/MST. For an edge-weighted graph $G$ the minimum spanning forest (MSF) is a subgraph $G'$ of $G$ with minimum total cost consisting of the same connected components as $G$. If $G$ is connected we call $G'$, which is then connected as well, the minimum spanning tree (MST) of $G$.

Certificates. Given any graph property $P$ and a graph $G$, a certificate of $G$ for $P$ is a graph $G'$ on the same vertex set such that $G$ has $P$ if and only if $G'$ has $P$. For any graph $G$ on vertex set $V$ and any property $P$, a strong certificate of $G$ for $P$ is a graph $G'$ on vertex set $V$ such that for any graph $H$ on $V$, $G \cup H$ has $P$ if and only if $G' \cup H$ has $P$. A certificate is said to be sparse if its number of edges is $O(n)$.

Semi-Streaming Algorithm. A graph stream of a graph $G$ is a sequence of the $m$ edges of $G$ in arbitrary order. A semi-streaming algorithm $A$ gets a graph stream as input and is restricted to use a space of at most $O(n \cdot \mathrm{polylog}\, n)$ bits. The algorithm may access the input stream for $P$ passes in a sequential one-way order. All algorithms considered in this paper use only $P = 1$ pass.

We define the per-edge processing time $T$ of $A$ to be the minimum time allowed between the revealing of two consecutive edges in the input stream. This definition of $T$ makes the definitions of previous papers more precise; we discuss this in Section 3. There we also comment on the computing time, which denotes the total time required by $A$ to determine the property in question of the input graph.

3. Discussion of Per-Edge Processing Time

In previous papers about semi-streaming algorithms that consider the per-edge processing time $T$ ([4],[5],[13]), $T$ is used in an ambiguous way.
While being used as the worst-case time to process a single edge on the one hand, it is equally used on the other hand, even if not explicitly stated, as amortized time charged over the number of edges. In fact, if tools such as dynamic trees or disjoint-set data structures are utilized, they give rise to amortized times since their time bounds are of amortized type, too. Processing the input edges is then assumed to be evenly spread over the whole computing time, which is just $m \cdot T$.

This definition is not appropriate for a streaming algorithm: As Muthukrishnan [8] pointed out, the computing time, i.e., the time to evaluate the property in question for the items read so far, is not the most important parameter of a streaming algorithm. What is more crucial is the maximum frequency of incoming items that can still be considered by the algorithm. That refers to the speed at which external storage devices can present their data content to a streaming algorithm and constitutes the frequency at which observed phenomena can be taken into account. To this end it is desirable to maximize the possible rate of incoming items by postponing as many operations as possible to a point after which all items have been received, possibly accepting a higher computing time.

To model this worthwhile property of a streaming algorithm $A$, we propose the definition of the per-edge processing time $T$ to be the minimum allowable time between two consecutive edges in the graph stream. The final determination of the property in question may require some postprocessing after reading all input edges. This time is considered in the computing time, which incorporates the sum of the per-edge processing times of all edges and the postprocessing time.

4. Computing Certificates and Buffering Edges

To achieve our optimal per-edge processing times we exploit the general method of sparsification as presented by Eppstein et al. [3]. Feigenbaum et al. [5] pointed out how the results of [3] can be adapted to the semi-streaming model. Thus they obtained the formerly best bounds on $T$ for almost all problems considered in this paper. We refine their method to obtain an improvement of their results. For a comparison of our new bounds with the previous ones see Table 1.

Due to the memory limitations of the semi-streaming model it is not possible to memorize a whole graph that is too dense, that is, if $m/n \gg \log n$. A way to determine graph properties without completely storing the graph is to find a sparse certificate $C$ of the graph for the property in question. Consisting of a linear number of edges, the certificate can be stored within the memory restrictions, and testing it answers the question for the original graph. The concept of certificates has been applied to the semi-streaming model in [5] and [13]. However, in [13] every input edge initiates an update of the certificate, which is time-consuming and prevents faster per-edge processing.

To increase the manageable frequency of incoming edges, updating the certificate can be done not for every single edge but for a group of edges. While such a group of edges is being considered, the next incoming edges can be buffered to compose the group for the following update.

To permit this updating in groups of edges, the utilized certificate must be a strong certificate, an assumption that is not required in [13]. That is because strong certificates obey two important attributes for any fixed graph property: Firstly, they behave transitively, that is, if $C$ is a strong certificate for $G$ and $C'$ is a strong certificate for $C$, then $C'$ is a strong certificate for $G$.
Secondly, if $G'$ and $H'$ are strong certificates of $G$ and $H$ respectively, then $G' \cup H'$ is a strong certificate of $G \cup H$.

The technique of group-wise updating is used by Eppstein et al. [3], yielding fast dynamic algorithms, and has been transferred to the semi-streaming model by Feigenbaum et al. [5]. The following theorem is a slightly extended version of their result, augmented with space considerations. We will need details of the proof later on.

Table 1
Previously best per-edge processing times $T$ compared to our new bounds

Problem                        Previous best $T$       New $T$
Connected components           $O(\alpha(n))$          $O(1)$
Bipartition                    $O(\alpha(n))$          $O(1)$
{2,3}-vertex connectivity      $O(\alpha(n))$          $O(1)$
4-vertex connectivity          $O(\log n)$             $O(1)$
$k$-vertex connectivity        $O(k^2 n)$              $O(1)$
{2,3}-edge connectivity        $O(\alpha(n))$          $O(1)$
4-edge connectivity            $O(n \alpha(n))$        $O(1)$
$k$-edge connectivity          $O(n \cdot \log n)$     $O(1)$
Minimum spanning forest        $O(\log n)$             $O(1)$

All previous bounds are due to [5], apart from $k$-vertex connectivity, which is a result of [13]. $k$ is any constant, $\alpha(n)$ the inverse of Ackermann's function.

Theorem 1. Let $G$ be a graph and let $C$ be a sparse and strong certificate of $G$ for a graph property $P$. If $C$ can be computed in space $O(m)$ and time $f(n, m)$, then there is a one-pass semi-streaming algorithm building $C$ of $G$ with per-edge processing time $T = f(n, O(n))/n$.

Proof. We denote the edges of the input stream as $e_1, e_2, \ldots, e_m$ and the subgraph of $G$ containing the first $i$ edges in the stream as $G_i$. We inductively assume that we have computed a sparse and strong certificate $C_{jn}$ of the graph $G_{jn}$ for $1 \le j < \lfloor m/n \rfloor$ using a time of $f(n, O(n))/n$ per already processed edge. During the computation of $C_{jn}$ we buffered the next $n$ edges $e_{jn+1}, e_{jn+2}, \ldots, e_{(j+1)n}$.
Because of the properties of strong certificates, $T = C_{jn} \cup \{e_{jn+1}, e_{jn+2}, \ldots, e_{(j+1)n}\}$ is a strong certificate for $G_{(j+1)n}$. Since $C_{jn}$ is sparse, $T$ consists of $O(n)$ edges as well. Computing $C_{(j+1)n}$ as a sparse and strong certificate of $T$ can be realized in a space linear in the space needed to memorize the edges of $T$, which is $O(n \cdot \mathrm{polylog}\, n)$ bits, without exceeding the memory limitation of the semi-streaming model. By transitivity, $C_{(j+1)n}$ is a strong certificate of $G_{(j+1)n}$. A time of $f(n, O(n))$ suffices to compute $C_{(j+1)n}$, hence the input edges can arrive with a time delay of $f(n, O(n))/n$, building the group of the next $n$ edges to update the certificate after the computation of $C_{(j+1)n}$ is completed.

Finally, for $k = \lfloor m/n \rfloor$ the last group of edges $\{e_{kn+1}, e_{kn+2}, \ldots, e_m\}$ can simply be added to $C_{kn}$ to obtain a sparse and strong certificate of the input graph $G$ for the property $P$. □

To obtain our semi-streaming algorithms with optimal per-edge processing times, all that remains to do is to present the required certificates and to show within which time and space bounds they can be computed. At first glance it may seem surprising that Feigenbaum et al. [5], using the same technique of updating certificates with groups of edges, do not meet the bounds we present in this paper. The reason is that they just observe that results of Eppstein et al. [3] can be transferred to the semi-streaming model. However, Eppstein et al. develop dynamic graph algorithms requiring powerful abilities: The algorithm must be able to answer a query for the subgraph of already read edges at any time and it must handle edge deletions. In the semi-streaming model the property is queried only at the end of the stream and there are no edge deletions. Thus we can drop both requirements for faster per-edge processing times.
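The group-wise updating from the proof of Theorem 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the stated bounds: the names `stream_certificate`, `spanning_forest` and `component_labels` are hypothetical, vertices are assumed to be numbered 0 to n-1, and a spanning forest serves as the example certificate. The sketch recomputes the certificate synchronously whenever a group of n edges is complete, so it shows the data flow and the bound of O(n) stored edges only; a worst-case delay of O(1) per edge would additionally require interleaving the recomputation with the buffering of the next group, as the proof describes.

```python
from typing import Callable, Iterable, List, Tuple

Edge = Tuple[int, int]


def _adjacency(n: int, edges: List[Edge]) -> List[List[int]]:
    """Build adjacency lists for an undirected (multi)graph on vertices 0..n-1."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def spanning_forest(n: int, edges: List[Edge]) -> List[Edge]:
    """Example certificate: a spanning forest found by graph search in O(n + len(edges))."""
    adj = _adjacency(n, edges)
    seen = [False] * n
    forest: List[Edge] = []
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    forest.append((u, v))
                    stack.append(v)
    return forest


def stream_certificate(
    n: int,
    stream: Iterable[Edge],
    certify: Callable[[int, List[Edge]], List[Edge]],
) -> List[Edge]:
    """One pass over the stream, updating the certificate once per group of n edges."""
    certificate: List[Edge] = []
    buffer: List[Edge] = []
    for edge in stream:
        buffer.append(edge)
        if len(buffer) == n:
            # Merge the old certificate with the buffered group and re-certify.
            certificate = certify(n, certificate + buffer)
            buffer = []
    # The last, possibly incomplete group is simply added (end of the proof).
    return certificate + buffer


def component_labels(n: int, edges: List[Edge]) -> List[int]:
    """Postprocessing: label the connected components of the final certificate."""
    adj = _adjacency(n, edges)
    labels = [-1] * n
    next_label = 0
    for root in range(n):
        if labels[root] != -1:
            continue
        labels[root] = next_label
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels
```

Calling `component_labels(n, stream_certificate(n, stream, spanning_forest))` then yields one label per vertex. At no time does the sketch hold more than the current certificate plus one group of n edges.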
In the following, the input graph for our semi-streaming algorithms is denoted by $G$ with $n$ vertices and $m$ edges as usual.

4.1. Connected Components

We use a spanning forest $F$ of $G$ as a certificate. $F$ is not only a strong certificate for connectivity, it also has the same connected components as $G$. $F$ can be computed by a depth-first search in time and space $O(n + m)$ and is sparse by definition. Using Theorem 1 we get a semi-streaming algorithm computing a spanning forest of $G$ with per-edge processing time $T = O(1)$. To identify the connected components of $G$ in the postprocessing step we can run a depth-first search on the final certificate in time $O(n)$. The resulting computing time is $m \cdot T + O(n) = O(n + m)$.

4.2. Bipartition

As a certificate for bipartiteness of $G$ we use $F^+$, which is a spanning forest of $G$ augmented with one more edge of $G$ inducing an odd cycle if there is any. If no such cycle exists, $F^+$ is just a spanning forest. By [3] $F^+$ is a strong certificate of $G$, and it is sparse by definition. It can be computed by a depth-first search that alternately colors the visited vertices and is therefore able to find an odd cycle. To do so, a time and space of $O(n + m)$ suffices, yielding a semi-streaming algorithm with $T = O(1)$. On the final certificate we can again run a depth-first search coloring the vertices alternately in time $O(n)$ during the postprocessing step. That produces a bipartition of the vertices or identifies an odd cycle in $G$ in a computing time of $O(n + m)$.

4.3. $k$-Vertex Connectivity

For $k$-vertex connectivity, $k$ being any constant, we use as a certificate of $G$ a subgraph $C_k$ which is derived by an algorithm presented by Nagamochi and Ibaraki [9]. $C_k$ can be computed in time and space $O(n + m)$, contains at most $kn$ edges and is therefore sparse.
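The sparsity of such a certificate can be illustrated with a simpler construction than the linear-time algorithm of [9]: the union of $k$ successively extracted maximal spanning forests. The sketch below is a stand-in under stated assumptions, not the procedure of [9]. The function name `forest_union_certificate` is hypothetical, vertices are assumed to be numbered 0 to n-1, and each round uses union-find, so the total running time is not $O(n + m)$. Such a forest union is commonly used as a sparse certificate for $k$-edge connectivity; the preservation of local vertex connectivity relies on the particular forests constructed in [9], which this sketch does not reproduce.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def _find(parent: List[int], x: int) -> int:
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def forest_union_certificate(n: int, edges: List[Edge], k: int) -> List[Edge]:
    """Union of k successive maximal spanning forests; at most k * (n - 1) edges."""
    remaining = list(edges)
    certificate: List[Edge] = []
    for _ in range(k):
        if not remaining:
            break
        parent = list(range(n))
        leftover: List[Edge] = []
        for u, v in remaining:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru != rv:
                # The edge joins two trees of the current forest: keep it.
                parent[ru] = rv
                certificate.append((u, v))
            else:
                # The edge closes a cycle: defer it to the next forest.
                leftover.append((u, v))
        remaining = leftover
    return certificate
```

Every round contributes at most $n - 1$ edges, which is where a bound linear in $n$ for constant $k$ comes from in this simplified setting.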
Beyond that, as a main result of [9], $C_k$ preserves the local vertex connectivity up to $k$ for any pair of nodes in $G$:

$$\kappa(x, y; C_k) \ge \min\{\kappa(x, y; G), k\} \quad \forall x, y \in V \qquad (1)$$

This quality of $C_k$ leads to useful properties:

Lemma 2. Every $l$-separator $S$ in $C_k$, $l < k$, is an $l$-separator in $G$ and its removal leaves the same connected components in both $C_k \setminus S$ and $G \setminus S$.

Proof. In $C_k \setminus S$ we find two nonempty, disjoint connected components $X$ and $Y$ with vertices $x \in X$ and $y \in Y$. Assume that $S$ is not an $l$-separator in $G$, so there exists a path $Z$ from $x$ to $y$ in $G \setminus S$. Let $x'$ be the last vertex on $Z$ in $X$ and $y'$ the first one in $Y$. The part of $Z$ between $x'$ and $y'$ we call $Z'$. In $C_k$ we find at most $l$ vertex-disjoint paths between $x'$ and $y'$, all of them using vertices of $S$. In $G$ these paths exist as well, with the additional path $Z'$ which is vertex-disjoint from the other paths by construction. Therefore the local connectivity between $x'$ and $y'$ in $G$ exceeds that in $C_k$, which is at most $l < k$, contradicting property (1) of $C_k$.

Since $C_k \setminus S$ is a subgraph of $G \setminus S$, every connected component of $C_k \setminus S$ is included in one connected component of $G \setminus S$. Assume that $W$ is a connected component in $G \setminus S$ which contains two vertices $i$ and $j$ within different connected components of $C_k \setminus S$, namely $I \ni i$ and $J \ni j$. As in the first part of this proof we can find a path $Z$ from $i$ to $j$ in $W$ with $x'$ being the last vertex in $I$ and $y'$ the first one in $J$ on $Z$. We can deduce the same contradiction as above. □

So $C_k$ is usable for our purposes:

Lemma 3. $C_k$ is a strong certificate for $k$-vertex connectivity of $G$.

Proof. If $C_k \cup H$ is $k$-vertex connected, then $G \cup H$, including $C_k \cup H$ as a subgraph, is $k$-vertex connected as well. Assume for the proof of the converse direction that $G \cup H$ is $k$-vertex connected and $C_k \cup H$ is not. Then $C_k \cup H$ contains an $l$-separator $S$ for some $l < k$.
After the removal of $S$ the remaining vertices of $C_k \cup H$ can be grouped into two nonempty sets $A$ and $B$ such that no edge joins a vertex of $A$ with a vertex of $B$. It is immediate that $H$ does not contain any edges between $A$ and $B$. Clearly, removing $S$ from $C_k$ produces the same sets $A$ and $B$, still with no edge joining them. The properties of $C_k$ shown in Lemma 2 make sure that the removal of $S$ from $G$ leaves $A$ and $B$ without any joining edge, too. With $H$ having no edges between $A$ and $B$, the graph $G \cup H$ cannot be $k$-vertex connected. □

Using Theorem 1 yields a semi-streaming algorithm computing a sparse and strong certificate of $k$-vertex connectivity in per-edge processing time $T = O(1)$. To test the final certificate for $k$-vertex connectivity in a postprocessing step we can use an algorithm of Gabow [7] on it. That algorithm runs in time $O((k^{5/2} + n)kn) = O(kn^2)$ and, what is more important, uses a space linear in the number of edges of the final certificate, hence it respects the memory constraints of the semi-streaming model. The resulting computing time is $O(m + kn^2)$.

4.4. $k$-Edge Connectivity

We use the same $C_k$ as utilized in Section 4.3, produced by the algorithm of Nagamochi and Ibaraki presented in [9], where it is shown that $C_k$ reflects the local edge-connectivity of $G$ in the following way:

$$\lambda(x, y; C_k) \ge \min\{\lambda(x, y; G), k\} \quad \forall x, y \in V \qquad (2)$$

Therefore Lemma 2 and Lemma 3 can be formulated and proven with respect to $l$-cuts, $l < k$, and $k$-edge connectivity. Accordingly we have a semi-streaming algorithm computing a strong and sparse certificate for $k$-edge connectivity using $T = O(1)$. To determine $k$-edge connectivity of the final certificate we can use an algorithm of Gabow [6] using a space linear in the number of edges of the final certificate.
It takes a time of $O(m + k^2 n \log(n/k))$, which is also the resulting computing time of our semi-streaming algorithm.

4.5. Minimum Spanning Forest

Let us first take a look at the algorithm we use as a subroutine for our semi-streaming algorithm computing an MSF of a given graph. We utilize the MST algorithm of Pettie and Ramachandran [11], which uses a space of $O(m)$. A remark on how we use an algorithm computing an MST to obtain an MSF is given below. The algorithm of [11] uses a time of $O(T^*(m, n))$, where $T^*(m, n)$ denotes the minimum number of edge-weight comparisons needed to find an MST of a graph with $n$ vertices and $m$ edges. The algorithm uses decision trees which are provably optimal but whose exact depth is unknown. Because of that, the exact running time of the algorithm is not known even though it is optimal.

The currently tightest time bound for the MST problem is given by algorithms due to Chazelle [2] and Pettie [10] that run in time $O(m \cdot \alpha(m, n))$. Consequently the optimal algorithm of Pettie and Ramachandran [11] inherits this running time, $T^*(m, n) = O(m \cdot \alpha(m, n))$. By definition, $\alpha(m, n) = O(1)$ if $m/n \ge \log n$. Therefore, on a sufficiently dense graph the algorithm of [11] computes an MST in time $O(m)$.

Using this optimal algorithm as our subroutine we can find a semi-streaming algorithm with per-edge processing time $T = O(1)$ in the following way. We use the technique described in Theorem 1 of merging a computed subgraph with buffered edges and then calculating a new subgraph of the merged graph while buffering the next group of edges. Unlike before, we use groups of $r = n \cdot \log n$ edges instead of $n$. Such a number of edges can be memorized in the semi-streaming model using $O(n \cdot \mathrm{polylog}\, n)$ bits, even if weights are assigned to the edges, which we assume to be storable in $O(\mathrm{polylog}\, n)$ bits each.
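The merging of a memorized MSF with buffered groups of $r = n \cdot \log n$ edges can be sketched as follows. This is an illustration under explicit assumptions rather than the algorithm analyzed here: Kruskal's algorithm with union-find is swapped in for the optimal MST algorithm of [11], so the sketch does not attain the stated time bounds; the names `kruskal_msf` and `stream_msf` are hypothetical; edges are assumed to be tuples `(weight, u, v)` with vertices numbered 0 to n-1; the logarithm is taken to base 2; and the parameter `r` is exposed only so that small inputs can exercise several groups. The recomputation runs synchronously instead of being interleaved with buffering.

```python
import math
from typing import Iterable, List, Optional, Tuple

WeightedEdge = Tuple[float, int, int]  # (weight, u, v)


def kruskal_msf(n: int, edges: List[WeightedEdge]) -> List[WeightedEdge]:
    """Minimum spanning forest via Kruskal; a stand-in for the subroutine of [11]."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest: List[WeightedEdge] = []
    # Sorting the tuples breaks weight ties by endpoint numbers.
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def stream_msf(
    n: int, stream: Iterable[WeightedEdge], r: Optional[int] = None
) -> List[WeightedEdge]:
    """One pass: merge the memorized MSF with each full group of r buffered edges."""
    if r is None:
        r = max(1, math.ceil(n * math.log2(max(n, 2))))
    msf: List[WeightedEdge] = []
    buffer: List[WeightedEdge] = []
    for edge in stream:
        buffer.append(edge)
        if len(buffer) == r:
            msf = kruskal_msf(n, msf + buffer)
            buffer = []
    # Postprocessing: the last group is merged and the MSF is computed once more.
    return kruskal_msf(n, msf + buffer)
```

With Kruskal as the subroutine no padding of the last group is needed; padding matters only when the subroutine's linear running time depends on the density of its input.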
Taking up the notation of the proof of Theorem 1, $C_{jr}$ is the memorized MSF of the graph $G_{jr}$ made up of the edges $e_1, e_2, \ldots, e_{jr}$. We merge the buffered next $r$ edges with $C_{jr}$ to obtain $T = C_{jr} \cup \{e_{jr+1}, e_{jr+2}, \ldots, e_{(j+1)r}\}$. For the number $m_T$ of edges in $T$ we have $m_T \ge n \cdot \log n$, and therefore the optimal MST algorithm uses a time of $O(m_T)$ to compute the MSF $C_{(j+1)r}$ of $T$. Since $m_T < 2r$, the computation of $C_{(j+1)r}$ takes a time of $O(r)$. To fill the buffer with the next $r$ edges in the meantime, the edges can arrive with a time delay of $O(1)$.

It remains to show that what we compute in the described way is indeed an MSF of the input graph $G$. Every edge of $G_{jr}$ that is not in $C_{jr}$ is the heaviest on a cycle in $G_{jr}$ and cannot be in an MSF of $G_{jr}$. On the other hand, $C_{jr}$ does not contain any dispensable edges since it includes no cycles: The removal of any edge from $C_{jr}$ produces two connected components in $C_{jr}$ whose vertices form a common connected component in $G_{jr}$. Therefore $C_{jr}$ forms an MSF of $G_{jr}$, inductively showing that we really obtain an MSF of $G$ in this manner.

Now we can state the computing time of our semi-streaming algorithm, which depends on the density of the input graph $G$. If $G$ has at most $r = n \cdot \log n$ edges, all edges are read and buffered in time $O(m)$ and then the optimal algorithm of Pettie and Ramachandran [11] computes an MSF in time $O(T^*(m, n))$, producing a computing time of $O(T^*(m, n))$, since $\Omega(m)$ is a lower bound for $T^*(m, n)$.

If $G$ has more than $r$ edges we successively update an MSF with groups of edges. Note that, different from the procedure described in the proof of Theorem 1, the last group of edges is not simply merged into the $C_{\lfloor m/r \rfloor r}$ computed so far.
Instead the MSF of the merged graph is calculated in the postprocessing step to obtain the final MSF, which is also the MSF of the input graph. We can fill the last group of edges up to a complete group of $r$ edges by using dummy edges weighted heavier than any edge in the input stream. This way we ensure that the last merged graph for the postprocessing, with $m_f \ge r$ edges, is sufficiently dense for the optimal MST algorithm running on it. So for the postprocessing time we have $O(T^*(m_f, n)) = O(m_f \cdot \alpha(m_f, n)) = O(m_f)$. Therefore the computing time is $O(m) + O(m_f) = O(m)$, which is trivially $O(T^*(m, n))$.

Let us give two minor remarks about the algorithm of Pettie and Ramachandran [11] we use. Firstly, the algorithm of [11] assumes the edge weights to be distinct. We do not require that property since ties can be broken while reading the input edges in a way described in [3]. Secondly, the algorithm of [11] works on connected graphs. Before running it, we can use a depth-first search to identify the connected components, which are then processed separately. Identifying the connected components takes a time of $O(m) = O(T^*(m, n))$, so the running time of our subroutine persists, as does the per-edge processing time of our semi-streaming algorithm.

5. Discussion

In this section we compare the obtained semi-streaming algorithms to algorithms determining the same properties in the classical RAM model, which allows random access to all the edges of a graph without any memory constraints.
First note that the presented semi-streaming algorithms have optimal per-edge processing times, that is, no semi-streaming algorithm exists allowing asymptotically shorter times: Every single edge must be considered to determine a solution for the problems considered in this paper, so a time of $\Omega(1)$ per edge is a lower bound for these problems.

Let us now take a look at the presented semi-streaming algorithms testing $k$-vertex and $k$-edge connectivity. For $k$-vertex connectivity with $k$ being a constant, the fastest algorithm in the RAM model to date is due to Gabow [7], which runs in $O(kn^2)$. Gabow obtains this result even in graphs with multiple edges by preprocessing the input graph with the algorithm of Nagamochi and Ibaraki [9] in time $O(m)$, producing a running time of $O(kn^2 + m)$ on graphs and multigraphs. This asymptotically equals our computing time, which is not surprising since we use Gabow's algorithm as our subroutine. We find the same situation when looking at $k$-edge connectivity. Our achieved computing time of $O(m + k^2 n \log(n/k))$ is asymptotically as fast as the fastest algorithm in the RAM model due to Gabow [6], which we use as a subroutine. So both our connectivity algorithms have a computing time that is asymptotically the same as that of the fastest known corresponding algorithms in the RAM model.

It is possible that there are faster but still unknown algorithms in the RAM model for $k$-vertex and $k$-edge connectivity which cannot be utilized in the semi-streaming model because they consume too much space. The converse is true for the problems of finding connected components, a bipartition and an MSF of a given graph. The presented semi-streaming algorithms have asymptotically the same computing time as the fastest possible algorithms in the RAM model.
That can easily be seen for connected components and bipartition: We obtain in each case a computing time of $O(n + m)$, which is trivially a lower bound for any algorithm in the RAM model solving these problems. For computing an MSF we get a computing time of $O(T^*(m, n))$, where $T^*(m, n)$ is the lower time bound for any RAM algorithm.

For the asymptotic time needed to determine a solution there is no difference for $k$-edge and $k$-vertex connectivity between the currently fastest algorithms in the RAM model and the presented semi-streaming algorithms. Unless faster connectivity algorithms in the RAM model are developed, there is no demand for random access to the edges and for a memory exceeding $O(n \cdot \mathrm{polylog}\, n)$ bits. For computing the connected components, a bipartition and an MSF such a demand will never emerge since the presented semi-streaming algorithms have optimal computing times. The RAM model cannot capitalize on its mighty potential of unlimited memory and random access to beat the computing times of the weaker semi-streaming model.

We close this section by indicating a tradeoff between memory and time when computing an MSF in the semi-streaming model. If the memory constraint of the semi-streaming algorithm is reduced from $O(n \cdot \mathrm{polylog}\, n)$ to $O(n \cdot \log^{2-\varepsilon} n)$ bits, only $s = o(n \cdot \log n)$ edges can be memorized. So the optimal MST algorithm we use as a subroutine needs a time of $O(T^*(s, n))$. Provided that $T^*(s, n) = \omega(s)$ we obtain a per-edge processing time of $\omega(1)$ and therefore a computing time of $\omega(m)$. Both bounds are significantly larger than the corresponding ones when $O(n \cdot \mathrm{polylog}\, n)$ bits of memory are permitted.
However, if it turns out that $T^*(m, n) = O(m)$ for any $m$, it suffices to store $\Theta(n)$ edges to obtain both optimal per-edge and computing time in the semi-streaming model.

6. Conclusion

We presented semi-streaming algorithms for computing the connected components, a bipartition, the $k$-vertex and $k$-edge connectivity for any constant $k$ and an MSF of a given graph. The presented per-edge processing times $T$ surpass those of former semi-streaming algorithms and are optimal because they are constant. All introduced semi-streaming algorithms are asymptotically as fast as the fastest corresponding algorithms in the RAM model. For connected components, bipartition and MSF we actually achieve the time bounds of the best possible RAM algorithms.

The main idea for our semi-streaming algorithms is quite simple: A sparse memorized subgraph is merged with buffered edges, and while a sparse subgraph of the merged one is computed, the next edges are buffered. We believe this idea to be fruitful for other graph problems as well when tackling them without random access and within the memory constraints of the semi-streaming model.

REFERENCES

1. B. Bollobás. Graph Theory, An Introductory Course. Springer, New York, 1979.
2. B. Chazelle. A minimum spanning tree algorithm with inverse-Ackermann type complexity. J. ACM 47(6):1028-1047, 2000.
3. D. Eppstein, Z. Galil, G. F. Italiano, and A. Nissenzweig. Sparsification - A technique for speeding up dynamic graph algorithms. Journal of the ACM, 44(1):669-696, 1997.
4. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. ICALP 2004, In: LNCS 3142, 531-543, 2004.
5. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph Distances in the Streaming Model: the Value of Space. SODA 2005: 745-754.
6. H. N. Gabow. A Matroid Approach to Finding Edge Connectivity and Packing Arborescences. Journal of Computer and System Sciences, Volume 50, Issue 2, 259-273, 1995.
7. H. N. Gabow. Using expander graphs to find vertex connectivity. In: Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, IEEE Computer Society, Los Alamitos, CA, 2000, pp. 410-420.
8. S. Muthukrishnan. Data streams: Algorithms and applications. 2003. Available at http://athos.rutgers.edu/~muthu/stream-1-1.ps
9. H. Nagamochi and T. Ibaraki. A linear time algorithm for finding a sparse k-connected spanning subgraph of a k-connected graph. Algorithmica, 7:583-596, 1992.
10. S. Pettie. Finding minimum spanning trees in $O(m\alpha(m, n))$ time. Tech. Rep. TR99-23, Univ. of Texas at Austin, Austin, Tex.
11. S. Pettie and V. Ramachandran. An Optimal Minimum Spanning Tree Algorithm. J. ACM 49(1):16-34, 2002.
12. R. E. Tarjan. Data Structures and Network Algorithms. CBMS-NSF Regional Conference Series in Applied Mathematics, 1983.
13. M. Zelke. k-Connectivity in the Semi-Streaming Model. Available at arXiv: cs.DM/0608066.
