Balanced allocation: Memory performance tradeoffs

Suppose we sequentially put $n$ balls into $n$ bins. If we put each ball into a random bin then the heaviest bin will contain ${\sim}\log n/\log\log n$ balls with high probability. However, Azar, Broder, Karlin and Upfal [SIAM J. Comput. 29 (1999) 180–200] showed that if each time we choose two bins at random and put the ball in the least loaded bin among the two, then the heaviest bin will contain only ${\sim}\log\log n$ balls with high probability. How much memory do we need to implement this scheme? We need roughly $\log\log\log n$ bits per bin, and $n\log\log\log n$ bits in total. Let us assume now that we have a limited amount of memory. For each ball, we are given two random bins and we have to put the ball into one of them. Our goal is to minimize the load of the heaviest bin. We prove that if we have $n^{1-\delta}$ bits then the heaviest bin will contain at least $\Omega(\delta\log n/\log\log n)$ balls with high probability. The bound is tight in the communication complexity model.

Authors: **Itai Benjamini (Weizmann Institute) & Yury Makarychev (Toyota Technological Institute at Chicago)**

The Annals of Applied Probability 2012, Vol. 22, No. 4, 1642–1649. DOI: 10.1214/11-AAP804. © Institute of Mathematical Statistics, 2012.

**1. Introduction.** Suppose we sequentially put $n$ balls into $n$ bins. If we put each ball in a bin chosen independently and uniformly at random, the maximum load (the largest number of balls in any bin) will be ${\sim}\log n/\log\log n$ with high probability. We can significantly reduce the maximum load by using the "power of two choices" scheme of Azar, Broder, Karlin and Upfal [2]: if we put each ball in the least loaded of two bins chosen independently and uniformly at random, the maximum load will be ${\sim}\log\log n$ with high probability.
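The gap between these two schemes is easy to observe empirically. The following is a minimal simulation sketch (our own illustration, not code from the paper; the function name and parameters are arbitrary):

```python
import random

def max_load(n, choices, seed=0):
    """Throw n balls into n bins; each ball goes to the least loaded of
    `choices` bins drawn uniformly at random. Returns the maximum load."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        target = min(candidates, key=lambda i: bins[i])
        bins[target] += 1
    return max(bins)

n = 100_000
print("one choice :", max_load(n, choices=1))   # typically ~ log n / log log n
print("two choices:", max_load(n, choices=2))   # typically ~ log log n
```

Even at this modest $n$, the two-choice maximum load is visibly smaller than the one-choice maximum load.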
This scheme has numerous applications in hashing, server load balancing and low-congestion circuit routing (see [1–8]). As an example, consider an implementation of a hash table that uses the "power of two choices" paradigm. We keep a table of size $n$; each table entry can store multiple elements, say, in a doubly-linked list. We use two perfectly random hash functions $h_1$ and $h_2$ that map elements to table entries. To insert an element $e$, we find the two possible table entries $h_1(e)$ and $h_2(e)$, and store the element in the table entry with fewer elements. To find an element $e$, we search through all elements in entries $h_1(e)$ and $h_2(e)$. This requires only $O(\log\log n)$ operations per element $e$ w.h.p., whereas if we used only one hash function we would need to perform $\Omega(\log n/\log\log n)$ operations for some elements w.h.p.

How many extra bits of memory do we need to implement this scheme? We need roughly $\log\log\log n$ bits per bin (table entry) to store the number of balls (elements) in the bin, and $n\log\log\log n$ bits in total. Let us assume now that we have a limited amount of memory. For each ball, we are given two random bins (e.g., we are given two hash values) and we have to put the ball into one of them. Can we still guarantee that the maximum load is $O(\log\log n)$ with high probability? The correct answer is not obvious.

(Received November 2009; revised February 2011. AMS 2000 subject classifications: primary 68Q87; secondary 60C05. Key words and phrases: balls-and-bins process, load balancing, memory performance tradeoffs. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Probability, 2012, Vol. 22, No. 4, 1642–1649; it differs from the original in pagination and typographic detail.)
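A minimal sketch of the two-choice hash table described above (our own illustrative code: the SHA-256-based hash pair and plain-list buckets are stand-ins for the perfectly random $h_1$, $h_2$ and the doubly-linked lists in the text):

```python
import hashlib

class TwoChoiceHashTable:
    """Each element is stored in the less loaded of its two candidate buckets."""

    def __init__(self, n):
        self.n = n
        self.buckets = [[] for _ in range(n)]

    def _slots(self, key):
        # Derive two hash values for the key (stand-ins for h1(e), h2(e)).
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big") % self.n
        h2 = int.from_bytes(digest[8:16], "big") % self.n
        return h1, h2

    def insert(self, key):
        h1, h2 = self._slots(key)
        # Store the element in the candidate bucket with fewer elements.
        target = h1 if len(self.buckets[h1]) <= len(self.buckets[h2]) else h2
        self.buckets[target].append(key)

    def find(self, key):
        # Search both candidate buckets: O(max bucket size) operations.
        h1, h2 = self._slots(key)
        return key in self.buckets[h1] or key in self.buckets[h2]

table = TwoChoiceHashTable(1000)
for k in range(1000):
    table.insert(f"key-{k}")
print("max bucket size:", max(len(b) for b in table.buckets))
```

With 1000 elements in 1000 entries, the largest bucket stays small, so both `insert` and `find` touch only a handful of elements.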
One could assume that if the number of memory bits is $o(n)$ then the maximum load should be ${\sim}\log n/\log\log n$ balls. However, that is not the case, as the following example shows. Let us group all bins into $n/\log\log n$ clusters; each cluster consists of $\log\log n$ bins. For each cluster, we keep the total number of balls in the bins that form the cluster. Now, given a ball and two bins, we put the ball into the bin whose cluster contains fewer balls. The result of Azar, Broder, Karlin and Upfal [2] implies that w.h.p. each cluster will contain at most
$$\frac{n}{n/\log\log n} + \log\log n = 2\log\log n$$
balls. Therefore, each bin will also contain at most $2\log\log n$ balls. This scheme uses $\frac{n}{\log\log n}\log\log\log n = o(n)$ bits of memory.

In this paper, we show that if we have $n^{1-\delta}$ bits of memory then the maximum load is $\Omega(\delta\log n/\log\log n)$ balls with high probability. We study the problem in the "communication complexity model." In this model, the state of the algorithm is determined by $M$ bits of memory. Before each step, we choose the memory state $m \in \{1, \dots, 2^M\}$. Then the algorithm gets two bin choices $i$ and $j$. It selects one of them based on $m$, $i$, $j$ and independent random bits. That is, the algorithm chooses $i$ with a certain probability $f(m, i, j)$ and $j$ with probability $1 - f(m, i, j)$; the choice is independent of the previous steps. Unlike in the standard computational model, in the communication complexity model we do not require that the memory state of the algorithm depend only on $m$, $i$, $j$ and the random bits. In particular, the state can depend on the current load of the bins. Hence, algorithms in our model are more powerful than algorithms in the computational model.
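The clustering example above can be sketched as follows (our own illustration; the exact cluster size and rounding are arbitrary choices). Only the per-cluster counts are consulted when placing a ball; per-bin loads are tracked here solely to measure the resulting maximum load:

```python
import math
import random

def clustered_max_load(n, seed=0):
    """Place n balls into n bins while keeping only per-cluster ball counts:
    each ball goes to the bin whose cluster currently holds fewer balls."""
    rng = random.Random(seed)
    c = max(2, round(math.log(math.log(n))))   # cluster size ~ log log n
    cluster_load = [0] * ((n + c - 1) // c)    # the only state the scheme keeps
    bin_load = [0] * n                         # ground truth, for measurement only
    for _ in range(n):
        i, j = rng.randrange(n), rng.randrange(n)
        pick = i if cluster_load[i // c] <= cluster_load[j // c] else j
        cluster_load[pick // c] += 1
        bin_load[pick] += 1
    return max(bin_load)

print("max load:", clustered_max_load(100_000))  # bounded by the max cluster load
```

A bin's load never exceeds its cluster's load, so the $O(\log\log n)$ bound on cluster loads carries over to individual bins, just as in the argument above.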
Consequently, our lower bound (Theorem 1.1) applies also to the computational model, whereas our upper bound (Theorem 1.2) applies only to the communication complexity model.

First, we prove the lower bound on the maximum load.

**Theorem 1.1.** *We are sequentially given $n$ balls. We have to put each of them into one of two bins chosen uniformly and independently at random among $n$ bins. We have only $M = n^{1-\delta}$ bits of memory ($\delta > 0$ may depend on $n$); our choice of where to put a ball can depend only on these memory bits and random bits. Then the maximum load will be at least $\frac{\delta \log n}{2\log\log n}$ with probability $1 - o(1)$.*

Then we show that the bound is essentially tight in the communication complexity model.

**Theorem 1.2.** *There exists an algorithm that gets $M = n^{1-\delta}$ bits of advice before each step and uses no other memory, and ensures that the heaviest bin contains at most $O(\frac{\delta \log n}{\log\log n})$ balls w.h.p. [where $\delta \geq 1/(\log n)^{1-\Omega(1)}$].*

In Section 2, we prove Theorem 1.1. In Section 3, we prove Theorem 1.2.

**2. Proof of Theorem 1.1.** We assume that $\frac{\delta \log n}{2\log\log n} \geq 1$, as otherwise the statement of the theorem is trivially true (there is a bin that contains at least one ball). Consider one step of the bins-and-balls process: we are given two bins chosen uniformly at random, and we put the ball into one of them. Let $p_i \equiv p_i^{(m)}$ be the probability that we put the ball into bin $i$ given that the memory state is $m \in \{1, \dots, 2^M\}$. Let $F_m \equiv F_m^\varepsilon = \{i : p_i^{(m)} < \varepsilon/n\}$.

**Claim 2.1.** *(1) For every set of bins $S$, the probability that we put a ball in a bin from $S$ is at least $\varepsilon|S \setminus F_m^\varepsilon|/n$. (2) $|F_m^\varepsilon| \leq \varepsilon n$.*

*Proof.* (1) The desired probability equals
$$\sum_{i \in S} p_i \geq \sum_{i \in S \setminus F_m^\varepsilon} p_i \geq \frac{\varepsilon |S \setminus F_m^\varepsilon|}{n}.$$
(2) The probability that both chosen bins are in $F_m^\varepsilon$ is $|F_m^\varepsilon|^2/n^2$.
Therefore, the probability $t$ that we put the ball into a bin from $F_m^\varepsilon$ is at least $|F_m^\varepsilon|^2/n^2$. On the other hand, we have
$$t = \sum_{i \in F_m^\varepsilon} p_i < \frac{\varepsilon |F_m^\varepsilon|}{n}.$$
We conclude that $|F_m^\varepsilon| \leq \varepsilon n$. □

We divide the process into $L$ consecutive phases. In each phase, we put $\lfloor n/L \rfloor$ balls into bins. Let $S_i$ be the set of bins that contain at least $i$ balls at the end of phase $i$; let $S_0 = \{1, \dots, n\}$. Now we will prove a bound on the size of $S_i$ that in turn will imply Theorem 1.1.

**Lemma 2.2.** *Let $L = \lceil \frac{\delta}{2} \log n / \log\log n \rceil$, $\varepsilon = 1/(2L)$ and $\beta = 1/(4L)$. For every $i \in \{0, \dots, L\}$, let $E_i$ be the event that for every $m_1, \dots, m_{L-i} \in \{1, \dots, 2^M\}$,
$$\left| S_i \setminus \bigcup_{j=1}^{L-i} F_{m_j} \right| \geq \frac{(\beta\varepsilon)^i}{2} n.$$
Then for every $i$, $\Pr(E_i) = 1 - o(1)$. In particular, $\Pr(|S_L| > 0) \geq \Pr(E_L) = 1 - o(1)$, and therefore, in the end, the heaviest bin contains at least $L$ balls w.h.p.*

*Proof.* First, note that the event $E_0$ always holds:
$$\left| S_0 \setminus \bigcup_{j=1}^{L} F_{m_j} \right| \geq n - L\varepsilon n = n/2.$$
Now we shall prove that $\Pr(\bar{E}_i \mid E_{i-1}) \leq o(1/L)$ (uniformly for all $i$), and thus
$$\Pr(E_i) \geq \Pr(E_0 \wedge \cdots \wedge E_i) = 1 - \sum_{j=1}^{i} \Pr(E_0 \wedge \cdots \wedge E_{j-1} \wedge \bar{E}_j) - \Pr(\bar{E}_0) \geq 1 - \sum_{j=1}^{i} \Pr(\bar{E}_j \wedge E_{j-1}) \geq 1 - \sum_{j=1}^{i} \Pr(\bar{E}_j \mid E_{j-1}) = 1 - o(1).$$
Assume that $E_{i-1}$ holds. Fix $m_1, \dots, m_{L-i}$. We are going to estimate the number of bins in $S_{i-1} \setminus \bigcup_{j=1}^{L-i} F_{m_j}$ into which we put a ball during phase $i$. All those bins are in the set $S_i \setminus \bigcup_{j=1}^{L-i} F_{m_j}$. Consider one step of the process; we are given the $t$-th ball (in the current phase) and have to put it in a bin. Let $N_{t-1}$ be the set of bins in $S_{i-1} \setminus \bigcup_{j=1}^{L-i} F_{m_j}$ where we have already put a ball (during the current phase).
We are going to lower bound the probability of the event that we put the ball into a "new bin," that is, into a bin in $S_{i-1} \setminus \bigcup_{j=1}^{L-i} F_{m_j} \setminus N_{t-1}$. Denote the indicator variable of this event by $q_t$. Let $m$ be the state of the memory at time $t$. Since $E_{i-1}$ holds,
$$\left| S_{i-1} \setminus \left( \bigcup_{j=1}^{L-i} F_{m_j} \right) \setminus F_m \right| \geq \frac{(\beta\varepsilon)^{i-1} n}{2}.$$
Therefore, by Claim 2.1(1), the probability that $q_t = 1$ is at least
$$\frac{\varepsilon \left| S_{i-1} \setminus \bigcup_{j=1}^{L-i} F_{m_j} \setminus F_m \setminus N_{t-1} \right|}{n} \geq \left( \frac{(\beta\varepsilon)^{i-1}}{2} - \frac{|N_{t-1}|}{n} \right) \varepsilon.$$
Thus, if $|N_{t-1}| \leq (\beta\varepsilon)^{i-1} n/4$,
$$\Pr(q_t = 1 \mid q_1, \dots, q_{t-1}) \geq (\beta\varepsilon)^{i-1} \times \varepsilon/4 \stackrel{\text{def}}{=} \mu.$$
Note that $|N_t| = |N_{t-1}| + q_t$ and $|N_t| = q_1 + \cdots + q_t$. Now we want to apply the Chernoff bound to the random variables $\{q_j\}_j$. However, since they are not necessarily independent, we will need an additional step. Define random variables $\tilde{q}_j$ as follows. If $|N_{t-1}| \leq (\beta\varepsilon)^{i-1} n/4$,
$$\begin{cases} \text{if } q_t = 1, & \text{let } \tilde{q}_t = 1 \text{ w.p. } \mu/\Pr(q_t = 1 \mid q_1, \dots, q_{t-1});\\ \text{if } q_t = 1, & \text{let } \tilde{q}_t = 0 \text{ w.p. } 1 - \mu/\Pr(q_t = 1 \mid q_1, \dots, q_{t-1});\\ \text{if } q_t = 0, & \text{let } \tilde{q}_t = 0. \end{cases}$$
If $|N_{t-1}| > (\beta\varepsilon)^{i-1} n/4$, let $\tilde{q}_t = 1$ w.p. $\mu$ and $\tilde{q}_t = 0$ w.p. $1 - \mu$.

It is easy to see that in either case $\Pr(\tilde{q}_t = 1 \mid \tilde{q}_1, \dots, \tilde{q}_{t-1}) = \mu$. Therefore, $\tilde{q}_1, \dots, \tilde{q}_t$ are i.i.d. 0–1 Bernoulli random variables with expectation $\mu$. By the Chernoff bound, the probability that $\tilde{q}_1 + \cdots + \tilde{q}_{n/L}$ is at least
$$\frac{1}{2} \times \mathbb{E}[\tilde{q}_1 + \cdots + \tilde{q}_{n/L}] = \frac{1}{2} \times \frac{n\mu}{L} = \frac{(\beta\varepsilon)^i n}{2}$$
is at least $1 - 2 \cdot 2^{-(\beta\varepsilon)^i n/8}$. Since $q_t \geq \tilde{q}_t$ whenever $|N_{t-1}| < (\beta\varepsilon)^{i-1} n/4$,
$$|N_t| = q_1 + \cdots + q_t \geq \min((\beta\varepsilon)^{i-1} n/4,\ \tilde{q}_1 + \cdots + \tilde{q}_t).$$
Finally, we have
$$\Pr\left( |N_{n/L}| \geq \frac{(\beta\varepsilon)^i n}{2} \right) \geq \Pr\left( \min((\beta\varepsilon)^{i-1} n/4,\ \tilde{q}_1 + \cdots + \tilde{q}_{n/L}) \geq \frac{(\beta\varepsilon)^i n}{2} \right) = \Pr\left( \tilde{q}_1 + \cdots + \tilde{q}_{n/L} \geq \frac{(\beta\varepsilon)^i n}{2} \right) \geq 1 - 2 \cdot 2^{-(\beta\varepsilon)^i n/8}.$$
Since $S_i \setminus \bigcup_{j=1}^{L-i} F_{m_j} \supset N_{n/L}$,
$$\Pr\left( \left| S_i \setminus \bigcup_{j=1}^{L-i} F_{m_j} \right| \geq \frac{(\beta\varepsilon)^i}{2} n \ \Big|\ E_{i-1} \right) \geq 1 - 2 \cdot 2^{-(\beta\varepsilon)^i n/8}$$
for fixed $m_1, \dots, m_{L-i}$. By the union bound [recall that $\varepsilon = 1/(2L)$ and $\beta = 1/(4L)$],
$$\Pr\left( \text{for all } m_1, \dots, m_{L-i}: \left| S_i \setminus \bigcup_{j=1}^{L-i} F_{m_j} \right| \geq (\beta\varepsilon)^i n/2 \ \Big|\ E_{i-1} \right) \geq 1 - 2 \cdot (2^M)^{L-i}\, 2^{-(\varepsilon\beta)^i n/8} \geq 1 - 2^{ML - (1/(8L^2))^L n/8} = 1 - 2^{\,n^{1-\delta} L (1 - n^\delta L (1/(8L^2))^{L+1})}. \tag{1}$$
Recall that $L = \lceil \frac{\delta \log n}{2\log\log n} \rceil$. We have
$$(8L^2)^{L+1} \leq (8L^2)^2 \cdot (8L^2)^{\delta \log n/(2\log\log n)} \leq (8L^2)^2 \cdot \left( \frac{\log n}{\omega(1)} \right)^{\delta \log n/\log\log n} \leq (8L^2)^2\, n^\delta\, 2^{-\omega(L)} = n^\delta\, 2^{6 + 4\log L - \omega(L)} = o(n^\delta).$$
Therefore, expression (1) is $1 - 2^{n^{1-\delta} L (1 - \omega(L))} = 1 - o(1)$. □

**3. Proof of Theorem 1.2.** In this section, we prove that our bound is tight in the communication complexity model. Specifically, we present an algorithm that gets $M = n^{1-\delta}$ bits of advice before each ball is thrown, and ensures that the maximum load is at most $O(\frac{\delta \log n}{\log\log n})$ w.h.p. when $\delta \geq 1/(\log n)^{1-\Omega(1)}$.

Observe that no matter which of the two bins we choose at each step, the probability $p_i$ that we put the ball in bin $i$ is at most $2/n$. Therefore, the probability that after $n$ steps the total number of balls in bin $i$ exceeds
$$T = \frac{2\delta \log n}{\log\log n} \left( 1 + \frac{2\log(1/\delta)}{\log(\delta \log n)} \right)$$
is asymptotically at most the probability that a Poisson random variable with $\lambda = 2$ exceeds $T$; that is, it is at most
$$\frac{e^{-2}\, 2^T}{T!} (1 + o(1)) = o\left( \left( \frac{2e}{T} \right)^T \right) = o\left( \frac{1}{n^\delta \log n} \right).$$
Thus the number of bins that contain at least $T$ balls is at most $n^{1-\delta}/(2\log n)$ w.h.p. Before each step, our algorithm receives the list $L$ of such bins, and the number of balls in each of them. Now, if one of the two randomly chosen bins belongs to $L$ and the other does not, the algorithm puts the ball into the bin that is not in $L$; if both bins are in $L$, the algorithm puts the ball into the bin with fewer balls (let us say that we use the "always-go-left" tie-breaking rule: if both bins contain the same number of balls, we put the ball into the left of the two bins); finally, if neither bin is in $L$, the algorithm puts the ball into an arbitrary bin.

Let us estimate the maximum load. We say that a ball is an "extra ball" if we put it into a bin that is in $L$ (at the moment when we put the ball). Then the total number of balls in a bin is at most $T$ plus the number of extra balls in the bin. Let us now count only extra balls. Note that every time we get a ball, we either:

- "discard it": put it into a bin that is not in $L$, and thus do not count it as an extra ball, or
- put it into the one of the two bins that contains fewer "extra balls."

That is, we use a modified scheme of Azar, Broder, Karlin and Upfal, where we sometimes put a ball into the one of the two bins that contains fewer "extra balls," and sometimes discard it. We claim that each bin contains at most $\log\log n$ extra balls, as in the standard "power of two choices" scheme of Azar, Broder, Karlin and Upfal.

**Claim 3.1.** *Consider the balls-and-bins process. Suppose at step $i$ we are given the choice of two bins $a_i^1$ and $a_i^2$. Let $k_{ij}$ be the number of balls in bin $j$ after $i$ steps when we use the standard "power of two choices" scheme.*
*Let $\tilde{k}_{ij}$ be the number of extra balls in bin $j$ after $i$ steps when we use our modified "power of two choices" scheme. Assume that in both cases we use the "always-go-left" tie-breaking rule. Then $\tilde{k}_{ij} \leq k_{ij}$ for every $1 \leq i, j \leq n$ (the statement holds for every sequence $\{a_i^1, a_i^2\}_{i=1,\dots,n}$).*

*Proof.* We prove that $\tilde{k}_{ij} \leq k_{ij}$ by induction on $i$. Initially, all bins contain no balls, $\tilde{k}_{0j} = k_{0j} = 0$, so the statement holds. Assume that the statement holds for $i < i_0$; we verify that $\tilde{k}_{ij} \leq k_{ij}$ for $i = i_0$. Fix $j$. Consider several cases.

- First, suppose that we put the ball into bin $j$ at step $i$ in both schemes. Then $\tilde{k}_{ij} = \tilde{k}_{i-1,j} + 1 \leq k_{i-1,j} + 1 = k_{ij}$.
- Now suppose that we put the ball into bin $j$ at step $i$ in the modified scheme but into some bin $j' \neq j$ in the standard scheme. Note that if $j < j'$ then $\tilde{k}_{i-1,j} \leq \tilde{k}_{i-1,j'}$ and $k_{i-1,j'} < k_{i-1,j}$, thus $\tilde{k}_{ij} = \tilde{k}_{i-1,j} + 1 \leq \tilde{k}_{i-1,j'} + 1 \leq k_{i-1,j'} + 1 \leq k_{i-1,j} = k_{ij}$; if $j' < j$ then $\tilde{k}_{i-1,j} < \tilde{k}_{i-1,j'}$ and $k_{i-1,j'} \leq k_{i-1,j}$, and thus $\tilde{k}_{ij} = \tilde{k}_{i-1,j} + 1 \leq \tilde{k}_{i-1,j'} \leq k_{i-1,j'} \leq k_{i-1,j} = k_{ij}$.
- Finally, suppose that in the modified scheme we put the ball into some bin $j' \neq j$ or discard it at step $i$. Then $\tilde{k}_{ij} = \tilde{k}_{i-1,j} \leq k_{i-1,j} \leq k_{ij}$. □

Note that if bins $a_i^1$ and $a_i^2$ are chosen uniformly at random, then $\max_j k_{nj} = \log\log n + \Theta(1)$ with high probability [2]. Therefore, by the claim, $\max_j \tilde{k}_{nj} = \log\log n + O(1)$, and each bin contains at most $T + \log\log n + O(1) = O(\frac{\delta \log n}{\log\log n})$ balls w.h.p.

**Acknowledgments.** Noga Alon showed us the no-memory case before we pursued this work. We would like to thank Noga Alon and Eyal Lubetzky for useful discussions. We thank the anonymous referee for valuable suggestions.

**REFERENCES**

[1] Adler, M., Chakrabarti, S., Mitzenmacher, M. and Rasmussen, L. (1998). Parallel randomized load balancing. *Random Structures Algorithms* **13** 159–188. MR1642570
[2] Azar, Y., Broder, A. Z., Karlin, A. R. and Upfal, E. (1999). Balanced allocations. *SIAM J. Comput.* **29** 180–200. MR1710347
[3] Berenbrink, P., Czumaj, A., Steger, A. and Vöcking, B. (2006). Balanced allocations: The heavily loaded case. *SIAM J. Comput.* **35** 1350–1385. MR2217150
[4] Byers, J., Considine, J. and Mitzenmacher, M. (2003). Simple load balancing for distributed hash tables. In *Peer-to-Peer Systems II. Lecture Notes in Computer Science* **2735** 80–87. Springer, Berlin.
[5] Byers, J. W., Considine, J. and Mitzenmacher, M. (2004). Geometric generalizations of the power of two choices. In *Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures* 54–63. ACM, New York.
[6] Cole, R., Maggs, B. M., Meyer auf der Heide, F., Mitzenmacher, M., Richa, A. W., Schröder, K., Sitaraman, R. K. and Vöcking, B. (1999). Randomized protocols for low-congestion circuit routing in multistage interconnection networks. In *STOC'98 (Dallas, TX)* 378–388. ACM, New York. MR1731590
[7] Mitzenmacher, M., Richa, A. W. and Sitaraman, R. (2001). The power of two random choices: A survey of techniques and results. In *Handbook of Randomized Computing, Vol. I, II. Combinatorial Optimization* **9** 255–312. Kluwer Academic, Dordrecht. MR1966907
[8] Talwar, K. and Wieder, U. (2007). Balanced allocations: The weighted case. In *STOC'07—Proceedings of the 39th Annual ACM Symposium on Theory of Computing* 256–265. ACM, New York. MR2402449

Department of Mathematics, Weizmann Institute, Rehovot 76100, Israel. E-mail: itai.benjamini@weizmann.ac.il

Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave., Chicago, Illinois 60637, USA. E-mail: yury@ttic.edu
