Dynamic Range Majority Data Structures

Amr Elmasry (1), Meng He (2), J. Ian Munro (3), and Patrick K. Nicholson (3)

(1) Department of Computer Science, University of Copenhagen, Denmark
(2) Faculty of Computer Science, Dalhousie University, Canada
(3) David R. Cheriton School of Computer Science, University of Waterloo, Canada
elmasry@diku.dk, mhe@cs.dal.ca, {imunro, p3nichol}@uwaterloo.ca

Abstract. Given a set $P$ of $n$ coloured points on the real line, we study the problem of answering range $\alpha$-majority (or "heavy hitter") queries on $P$. More specifically, for a query range $Q$, we want to return each colour that is assigned to more than an $\alpha$-fraction of the points contained in $Q$. We present a new data structure for answering range $\alpha$-majority queries on a dynamic set of points, where $\alpha \in (0, 1)$. Our data structure uses $O(n)$ space, supports queries in $O((\lg n)/\alpha)$ time, and updates in $O((\lg n)/\alpha)$ amortized time. If the coordinates of the points are integers, then the query time can be improved to $O(\lg n/(\alpha \lg\lg n))$. For constant values of $\alpha$, this improved query time matches an existing lower bound, for any data structure with polylogarithmic update time. We also generalize our data structure to handle sets of points in $d$ dimensions, for $d \ge 2$, as well as dynamic arrays, in which each entry is a colour.

1 Introduction

Many problems in computational geometry deal with point sets that have information encoded as colours assigned to the points. In this paper, we design dynamic data structures for the range $\alpha$-majority problem, in which we want to report colours that appear frequently within an axis-aligned query rectangle. This problem is useful in database applications in which we would like to know typical attributes of the data points in a query range [23,24].
For the one-dimensional case, where the points represent time stamps, this problem has data mining applications for network traffic logs, similar to those of coloured range counting (cf. [17]).

Formally, we are given a set, $P$, of $n$ points, where each point $p \in P$ is assigned a colour $c$ from a set, $C$, of colours. We denote the colour of $p$ as $\mathrm{col}(p) = c$. We are also given a fixed parameter $\alpha \in (0, 1)$ that defines the threshold for determining whether a colour is to be considered frequent. Our goal is to design a dynamic range $\alpha$-majority data structure that can perform the following operations:

– Query($Q$): We are given an axis-aligned hyperrectangle $Q$ as a query. Let $P(Q)$ be the set $\{p \mid p \in P \cap Q\}$, and $P(Q, c)$ be the set $\{p \mid p \in P(Q), \mathrm{col}(p) = c\}$. The answer to the query $Q$ is the set of colours $C^\star$ such that for each colour $c \in C^\star$, $|P(Q, c)| > \alpha |P(Q)|$, and for all $c \notin C^\star$, $|P(Q, c)| \le \alpha |P(Q)|$. We refer to a colour $c \in C^\star$ as an $\alpha$-majority for $Q$, and this type of query as an $\alpha$-majority query. When $\alpha = 1/2$, the problem is to identify the majority colour in $Q$, if such a colour exists.
– Insert($p$, $c$): Insert a point $p$ with colour $c$ into $P$.
– Delete($p$): Remove the point $p$ from $P$.

⋆ A preliminary version of this work appeared in the 22nd International Symposium on Algorithms and Computation (ISAAC 2011). This work was supported by NSERC of Canada, an NSERC PGS-D Scholarship, and the Canada Research Chairs Program.

1.1 Previous Work

Static and Dynamic Range $\alpha$-Majority: In all of the following results, unless mentioned otherwise, the threshold $\alpha \in (0, 1)$ is fixed at construction time, rather than specified for each query individually. Karpinski and Nekrich [23] studied the problem of answering range $\alpha$-majority queries, which they call coloured $\alpha$-domination queries.
In the static case, they gave an $O(n/\alpha)$ space data structure that supports one-dimensional queries in $O((\lg n \lg\lg n)/\alpha)$ time [footnote 4], and an $O((n \lg\lg n)/\alpha)$ space data structure that supports queries in $O((\lg n)/\alpha)$ time. In the dynamic case, they gave an $O(n/\alpha)$ space data structure for one-dimensional queries that supports queries and insertions in $O((\lg^2 n)/\alpha)$ time, and deletions in $O((\lg^2 n)/\alpha)$ amortized time. They also gave an alternative $O((n \lg n)/\alpha)$ space data structure that supports queries and insertions in $O((\lg n)/\alpha)$ time, and deletions in $O((\lg n)/\alpha)$ amortized time. For points in $d$ dimensions, for constant $d \ge 2$, they gave a static $O((n \lg^{d-1} n)/\alpha)$ space data structure that supports queries in $O((\lg^d n)/\alpha)$ time, as well as a dynamic $O((n \lg^{d-1} n)/\alpha)$ space data structure that supports queries and insertions in $O((\lg^{d+1} n)/\alpha)$ time, and deletions in $O((\lg^{d+1} n)/\alpha)$ amortized time.

Durocher et al. [12] described a static $O(n(\lg(1/\alpha) + 1))$ space data structure that answers range $\alpha$-majority queries in an array in $O(1/\alpha)$ time. This data structure is based on the idea that it is possible to produce a short list of candidate $\alpha$-majorities for any query, and then efficiently verify the frequencies of these candidates using succinct data structures. In a later version of the same paper [13], they described how to extend their technique to $d$ dimensions for constant $d \ge 2$, resulting in an $O(n \lg^{d-1} n)$ space data structure that supports range $\alpha$-majority queries in $O((\lg^d n)/\alpha)$ time. Gagie et al. [16] improved the static one-dimensional result to $O(n(\min(\lg(1/\alpha), H) + 1))$ space, where $H \le \lg n$ is the 0th-order empirical entropy of the sequence stored in the array.
The same authors also described how to improve the query time to $O(1/\beta)$, when asked for the $\beta$-majorities in a query range, for any $\beta \ge \alpha$ specified at query time. Recently, for the two-dimensional static case, Wilkinson [31] presented an improved data structure that occupies $O(n \lg^{\varepsilon} n \lg(1/\alpha))$ space, for any constant $\varepsilon > 0$, and can answer queries in $O((\lg n)/\alpha)$ time.

Approximate Versions of the Problem: Researchers have also examined an approximate version of the range $\alpha$-majority problem, in which the solution must contain all the $\alpha$-majorities in a query range, but can also contain some false positives. Lai et al. [24] studied the dynamic problem, using the term heavy-colours instead of $\alpha$-majorities. They presented a dynamic data structure based on sketching, which provides an approximate solution with probabilistic guarantees for constant values of $\alpha$. For one dimension their data structure uses $O(hn)$ space and supports queries and updates in $O(h \lg n)$ time, where the parameter $h = O\left(\frac{\lg |C|}{\varepsilon} \lg\left(\frac{\lg |C|}{\alpha\delta}\right)\right)$ depends on the threshold $\alpha$, the approximation factor $\varepsilon$, the total number of colours $|C|$, and the probability of failure $\delta$. They also noted that the space can be reduced to $O(n)$, at the cost of increasing the query time to $O(h \lg n + h^2)$. Thus, for constant values of $\varepsilon$, $\delta$, and $\alpha$, their data structure uses $O(n)$ space and has $O((\lg n \lg\lg n)^2)$ query and update time in the worst case when $\lg |C| = \Omega(\lg n)$.

Another approximate data structure based on sketching was proposed by Wei and Yi [30]. Their data structure uses linear space, answers queries in $O(\lg n + 1/\varepsilon)$ time, and may return false positives with relative frequency between $\alpha - \varepsilon$ and $\alpha$. The cost of updates is $O(\mu \lg n \lg(1/\varepsilon))$ amortized time, where $\mu$ is the cost of updating the sketches.
We note that this result was obtained independently of ours, and that both our techniques and the main technique they develop, called exponential decomposability, are similar. By combining Theorem 4 of their paper with standard range counting data structures, it is not difficult to get a data structure that occupies linear space, answers queries in $O((\lg n)/\alpha)$ time, and supports updates in $O((\lg n \lg(1/\alpha))/\alpha)$ amortized time for the non-approximate version of the problem that we study. However, we slightly improve this update time, and also generalize our data structures to higher dimensions, whereas their structure is part of a more general framework that supports other kinds of aggregate queries.

Lower Bounds: The partial sum problem for threshold functions [21] captures the essence of the dynamic range $\alpha$-majority problem: maintain $n$ bits $x_1, \ldots, x_n$ subject to updates and threshold queries. An update consists of flipping the bit at a specified index. The answer to the query threshold($i$) is "yes" if and only if $\sum_{j=1}^{i} x_j \ge f(i)$, where $f(i)$ is an integer function such that $f(i) \in \{0, \ldots, \lceil i/2 \rceil\}$. Husfeldt and Rauhe proved a lower bound [21] on the query time $t_q$ for a data structure that can answer threshold queries with update time $t_u$.

Footnote 4: We use $\lg n$ to denote $\lceil \log_2 n \rceil$.

Any data structure for dynamic $\alpha$-majority can be used to solve the partial sum problem for threshold functions. In particular, we can treat the problem as involving $n$ points with integer coordinates $1, \ldots, n$, with each point having one of two colours. A flip operation can be implemented as a deletion followed by an insertion. Thus, we can state their lower bound in terms of our problem, denoting the cell size of our machine as $w$:

Lemma 1 (Follows from [21], Prop. 4). Let $t_u$ and $t_q$ denote the update and query times, respectively, for any dynamic $\alpha$-majority data structure.
Then,

$$t_q = \Omega\left( \frac{\lg(\min\{\alpha n, (1-\alpha) n\})}{\lg(t_u w \lg(\min\{\alpha n, (1-\alpha) n\}))} \right).$$

This bound suggests that, for constant values of $\alpha$ and word size $\Theta(\lg n)$ bits, $O(\lg n/\lg\lg n)$ query time for integer point sets is optimal for any data structure with polylogarithmic update time.

Other Related Work: Finally, several other results exist for finding $\alpha$-majorities in the streaming model, typically referred to as heavy hitters [6,10,22,25]. De Berg and Haverkort [9] studied a similar problem of reporting $\tau$-significant colours. For this problem, the goal is to output all colours $c$ such that at least a $\tau$-fraction of all of the points with colour $c$ lie in the axis-aligned query rectangle. More broadly, there are other data structure problems that deal with coloured points. In coloured range reporting problems, we are interested in reporting the set of distinct colours assigned to the points contained in an axis-aligned rectangle. Similarly, in the coloured range counting problem we are interested in returning the number of such distinct colours. Gupta et al. [20], Bozanis et al. [7], and, more recently, Gagie et al. [18] and Gagie and Kärkkäinen [17] studied these problems and presented several interesting results.

1.2 Our Results

In this paper we present new data structures for the dynamic range $\alpha$-majority problem in the word-RAM model with word size $\Omega(\lg n)$, where $n$ is the number of points in the set $P$, and $\alpha \in (0, 1)$. Our results are summarized and compared to the previous best results in Table 1. The input column indicates the type of data we are considering.
We use points to denote a set of points on a line with real-valued coordinates that we can compare in constant time, integers to denote a set of points on a line with word-sized integer coordinates, and array to denote that the input is to be considered a dynamic array, where the positions of the points are in rank space.

Source         Input      Space            Query                    Insert                       Delete
[23, Thm. 3]   points     O(n/α)           O((lg^2 n)/α)            O((lg^2 n)/α)                O((lg^2 n)/α)*
[23, Thm. 3]   points     O((n lg n)/α)    O((lg n)/α)              O((lg n)/α)                  O((lg n)/α)*
New            points     O(n)             O((lg n)/α)              O((lg n)/α)*                 O((lg n)/α)*
New            integers   O(n)             O(lg n/(α lg lg n))      O((lg n)/α)*                 O((lg n)/α)*
New            array      O(n)             O(lg n/(α lg lg n))      O(lg^3 n/(α lg lg n))*       O((lg n)/α)*

Table 1. Comparison of the results in this paper to the previous best results. For the entries marked with "*" the running times are amortized.

Our results improve upon previous results in several ways. Most noticeably, all our data structures require linear space. In order to provide fast query and update times for our linear space structures, we prove several interesting properties of $\alpha$-majority colours. We note that the lower bound from Lemma 1 implies that, for constant values of $\alpha$, an $O(\lg n/\lg\lg n)$ query time for integer point sets is optimal for any data structure with polylogarithmic update time, when the word size $w = \Theta(\lg n)$. Our data structure for points on a line with integer coordinates achieves this optimal query time. Our data structures can also be generalized to handle $d$-dimensional points, improving upon previous results in the dynamic case [23]. For $d \ge 2$, our data structure occupies $O(n \lg^{d-1} n)$ space, answers range $\alpha$-majority queries in $O((\lg^d n)/\alpha)$ time, and supports updates in $O((\lg^d n)/\alpha)$ amortized time.

Road Map: In Section 2 we present a dynamic range $\alpha$-majority data structure for points in one dimension.
In Section 3 we show how to speed up the query time of our data structure in the case where the points have integer coordinates. In Section 4 we generalize our one-dimensional data structures to higher dimensions. Finally, in Section 5 we present our data structure for dynamic arrays.

Assumptions About Colours: In the following sections, we assume that we can compare colours in constant time. In order to support a dynamic set of colours, we employ the techniques described by Gupta et al. [20]. These techniques allow us to maintain a mapping from the set of colours to integers in the range $[1, 2n]$, where $n$ is the number of points currently in our data structure. This allows us to index into an array using a colour in constant time. For the dynamic problems discussed, this mapping is maintained using a method similar to global rebuilding to ensure that the integer identifiers of the colours do not grow too large [20, Section 2.3]. When a coloured point is inserted, we must first determine whether we have already assigned an integer to that colour. By storing the set of known colours in a balanced binary search tree, this can be checked in $O(\lg |C|)$ time; recall that $|C|$ is the number of distinct colours currently assigned to points in our data structure. Since $|C| \le n$, this cost is absorbed by the update time of our data structure; see Table 1. Therefore, from this point on, we assume that we are dealing with integers in the range $[1, 2n]$ when we discuss colours.

2 Dynamic Data Structures in One Dimension

In one dimension we can interchange the notion of points and $x$-coordinates in $P$, since they are equivalent. Depending on the context we may use either term. Our basic data structure, like that of Karpinski and Nekrich [23], is a modified weight-balanced B-tree [3].
However, we prove several interesting combinatorial properties of $\alpha$-majorities in order to provide more efficient support for queries and updates.

Let $T$ be a weight-balanced B-tree with branching parameter 8 and leaf parameter 1 such that each leaf represents an $x$-coordinate in $P$. From left to right the leaves are sorted in ascending order of the $x$-coordinate that they represent. Let $T(u)$ be the subtree rooted at node $u$. Each internal node $u$ in the tree represents a range $R(u) = [x_{\min}, x_{\max}]$, where $x_{\min}$ is the $x$-coordinate represented by the leftmost leaf in $T(u)$, and $x_{\max}$ is the $x$-coordinate represented by the rightmost leaf in $T(u)$. We number the levels of the tree $0, 1, \ldots, \Theta(\lg n)$ from top to bottom. If a node is $h$ levels above the leaf level, we say that this node is of height $h$. By the properties of weight-balanced B-trees, the range represented by an internal node of height $h$ contains at least $8^h/2$ (except the root) and at most $2 \times 8^h$ points, and the degree of each internal node is at least 2 and at most 32.

2.1 Supporting Queries

Given a query $Q' = [x'_a, x'_b]$, we perform a top-down traversal on $T$ to map $Q'$ to the range $Q = [x_a, x_b]$, where $x_a$ and $x_b$ are the points in $P$ with $x$-coordinates that are the successor of $x'_a$ and the predecessor of $x'_b$, respectively. We call the query range $Q$ general if $Q$ is not represented by a single node of $T$. We first define the notion of representing a general query range by a set of nodes:

Definition 1. Given a general query range $Q = [x_a, x_b]$, $Q$ induces a set, $I$, of nodes in the tree $T$, satisfying the following two conditions.
1. The range represented by the parent of each node in $I$ is not entirely contained in $Q$.
2. For all $p \in P \cap Q$, there exists some node $u \in I$ with $p \in R(u)$.
We say that $I$ is the set of nodes in the tree $T$ representing $Q$.
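To make Definition 1 concrete, the following is a minimal sketch of how a set of representing nodes can be collected by a top-down traversal. It is an illustration only: the `Node` class, the `build` helper, and the fan-out of 2 are hypothetical and stand in for the weight-balanced B-tree, and the sketch additionally assumes that every node placed in $I$ has its range entirely inside $Q$, which is how the set is used in the rest of this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    lo: float                      # x_min of R(u)
    hi: float                      # x_max of R(u)
    children: List["Node"] = field(default_factory=list)


def build(xs, fanout=2):
    """Build a toy balanced tree over sorted coordinates xs.

    This is NOT a weight-balanced B-tree; it only provides nodes with ranges.
    """
    level = [Node(x, x) for x in xs]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(g[0].lo, g[-1].hi, g) for g in groups]
    return level[0]


def representing_set(root, a, b):
    """Collect the maximal nodes whose ranges lie entirely inside Q = [a, b]."""
    out = []

    def visit(u):
        if u.hi < a or u.lo > b:       # R(u) is disjoint from Q: nothing to report
            return
        if a <= u.lo and u.hi <= b:    # R(u) is inside Q: keep u and stop descending
            out.append(u)
            return
        for child in u.children:       # R(u) straddles an endpoint of Q: recurse
            visit(child)

    visit(root)
    return out
```

On the toy tree over coordinates 1 through 8 with $Q = [2, 7]$, the traversal returns nodes with ranges [2,2], [3,4], [5,6], and [7,7]: the parent of each returned node straddles an endpoint of $Q$, and every point of $Q$ is covered.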
For each node $u \in T$, we keep a list, $L(u)$, of $k$ candidate colours, i.e., the $k$ most frequent colours in the range $R(u)$ represented by $u$, breaking ties arbitrarily. Later, we will fix a value for $k$. Let $L^\star = \cup_{u \in I} L(u)$, i.e., the union of all the candidate lists among the nodes representing the query range $Q$. For each colour $c \in C$, we keep a separate range counting data structure, $F_c$, containing all points $p \in P$ with colour $c$, and also a range counting data structure, $F$, containing all of the points in $P$. Let $m$ be the total number of points in the range $[x_a, x_b]$, which can be determined by querying $F$. For each $c \in L^\star$, we query $F_c$ with the range $[x_a, x_b]$, letting $occ$ be the result. If $occ > \alpha m$, then we report that $c \in C^\star$.

It is clear that $I$ contains $O(\lg n)$ nodes. Furthermore, if a colour $c$ is an $\alpha$-majority for $Q$, then it must be an $\alpha$-majority for at least one of the ranges in $I$ [23, Observation 1]. If we set $k = \lceil 1/\alpha \rceil$ and store $\lceil 1/\alpha \rceil$ colours in each internal node as candidate colours, then, by the procedure just described, we will perform a range counting query on $\Theta((\lg n)/\alpha)$ colours. If we use balanced search trees for our range counting data structures, then this takes $\Theta((\lg^2 n)/\alpha)$ time overall. However, in the sequel we show how to improve this query time by exploiting the fact that the nodes in $I$ that are closer to the root of $T$ contain more points in the ranges that they represent.

We shall prove useful properties of a general query range $Q$ and the set, $I$, of nodes representing it in Lemmas 2, 3, 4, and 5. In these lemmas, $m$ denotes the number of points in $Q$, and $i_1, i_2, \ldots$ denote the distinct values of the heights of the nodes in $I$, where $i_1 > i_2 > \ldots \ge 0$.
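The candidate verification step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the data structure of this paper: the hypothetical `RangeCounter` class is a static sorted list queried with binary search, standing in for the dynamic range counting structures $F$ and $F_c$, and the candidate set $L^\star$ is simply passed in by the caller.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class RangeCounter:
    """Static stand-in for F and F_c: counts coordinates inside [a, b]."""

    def __init__(self, xs):
        self.xs = sorted(xs)

    def count(self, a, b):
        return bisect_right(self.xs, b) - bisect_left(self.xs, a)


class MajorityVerifier:
    """Holds F (all points) and one F_c per colour (hypothetical helper)."""

    def __init__(self, points):
        # points: iterable of (x, colour) pairs
        points = list(points)
        self.F = RangeCounter(x for x, _ in points)
        by_colour = defaultdict(list)
        for x, c in points:
            by_colour[c].append(x)
        self.F_c = {c: RangeCounter(xs) for c, xs in by_colour.items()}

    def query(self, candidates, a, b, alpha):
        """Return the candidates c with occ > alpha * m in the range [a, b]."""
        m = self.F.count(a, b)                 # total number of points in Q
        result = set()
        for c in set(candidates):
            if c not in self.F_c:              # unknown colour: no occurrences
                continue
            occ = self.F_c[c].count(a, b)      # occurrences of colour c in Q
            if occ > alpha * m:
                result.add(c)
        return result
```

Each candidate costs one counting query, so the total query time is the number of candidates multiplied by the cost of a range count, which is why the lemmas that follow concentrate on shrinking the candidate set.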
We first give an upper bound on the number of points contained in the ranges represented by the nodes of $I$ of a given height:

Lemma 2. The total number of points in the ranges represented by all the nodes in $I$ of height $i_j$ is less than $m \times \min(1, 31 \times 8^{1-j})$.

Proof. Since $Q$ is general and contains at least one node of height $i_1$, $m$ is greater than the minimum number of points that can be contained in a node of height $i_1$, which is $8^{i_1}/2$. The nodes of $I$ whose height is $i_j$, $j \ne 1$, are siblings and must have at least one sibling that is not in $I$. The number of points contained in the interval represented by this sibling is at least $8^{i_j}/2$. Therefore, the number, $m_j$, of points in the ranges represented by the nodes of $I$ at level $i_j$ is at most $2 \times 8^{i_j+1} - 8^{i_j}/2 = (31/2) \times 8^{i_j}$. Thus, $m_j/m < 31 \times 8^{i_j - i_1} \le 31 \times 8^{1-j}$. □

We next use the above lemma to bound the number of points whose colours are not among the candidate colours stored in the corresponding nodes in $I$.

Lemma 3. Suppose we are given a node $v \in I$ of height $i_j$ and a colour $c$. Let $n_v^{(c)}$ denote the number of points with colour $c$ in $R(v)$, the range covered by $v$, if $c$ is not among the first $k_j = \lceil k/2^{j-1} \rceil$ most frequent candidate colours in the candidacy list of $v$, and $n_v^{(c)} = 0$ otherwise. Then $\sum_{v \in I} n_v^{(c)} < \frac{5.59\,m}{k+1}$.

Proof. If $c$ is not among the first $k_j$ candidate colours stored in $v$, then the number of points with colour $c$ in $R(v)$ is at most $1/(k_j + 1)$ times the number of points in $R(v)$. Thus,

$$\sum_{v \in I} n_v^{(c)} < \sum_{j=1}^{2} \frac{m}{k_j + 1} + \sum_{j \ge 3} \left(31 \times 8^{1-j}\right) \frac{m}{k_j + 1} < \frac{m}{k+1}\left(1 + 2 + 31\left(\frac{2^2}{8^2} + \frac{2^3}{8^3} + \ldots\right)\right) < \frac{5.59\,m}{k+1}.$$ □

We next consider the nodes in $I$ that are closer to the leaf levels. Let $I_t$ denote the nodes in $I$ that are at one of the top $t = \left\lceil \frac{\lg(1/\alpha)}{3} + 2.05 \right\rceil$ (not necessarily consecutive) levels of the nodes in $I$. We prove the following property:

Lemma 4. The number of points contained in the ranges represented by the nodes in $I \setminus I_t$ is less than $\alpha m/2$.

Proof. By Lemma 2, the number of points contained in the ranges represented by the nodes in $I \setminus I_t$ is less than

$$31 m \sum_{j \ge t+1} 8^{1-j} = 31 m \left(\frac{1}{8^t} + \frac{1}{8^{t+1}} + \ldots\right) = 31 m \left(\frac{8}{7} \times \frac{1}{8^t}\right).$$

Since $t \ge \frac{\lg(1/\alpha)}{3} + 2.05$, the above value is less than $\alpha m/2$. □

With the above lemmas, we can choose an appropriate value for $k$ to guarantee the following property, which is critical to achieving improved query time:

Lemma 5. When $k = \lceil 11.18/\alpha \rceil - 1$, any $\alpha$-majority colour, $c$, of the query range $Q$ is among the union of the first $\lceil k/2^{j-1} \rceil$ candidates stored in each node of height $i_j$ representing a range in $I_t$.

Proof. The total number of points with colour $c$ in the ranges represented by the nodes in $I \setminus I_t$ is less than $\alpha m/2$ by Lemma 4. By Lemma 3 and our choice for the value of $k$, less than $\alpha m/2$ points in the ranges represented by the nodes in $I_t$ for which $c$ is not a candidate can have colour $c$. Since an $\alpha$-majority colour has more than $\alpha m$ points in $Q$, it must therefore be a candidate in at least one node of $I_t$. The lemma thus follows. □

For each node $v \in T$, we keep a semi-ordered list of the $k$ candidate colours in the range $R(v)$ represented by $v$. The order on the colours for any candidacy list is maintained such that the most frequent $\lceil k/2^{j-1} \rceil$ colours come first, for all $j = 2, 3, \ldots$, arbitrarily ordered within their positions. Note that such a semi-ordering can be obtained in $O(k)$ time by repeated median queries. That is, by using a linear time median finding algorithm [5], we can partition the list so that the first half of the list contains the $k/2$ most frequent colours, and then recurse on the first half of the list until the list has 1 element. In total, this takes $O(k + k/2 + k/4 + \ldots) = O(k)$ time.

By setting $k = \lceil 11.18/\alpha \rceil - 1$, Lemma 5 implies that the colours that we check are the only possible $\alpha$-majority colours for the query. Furthermore, Lemma 4 implies that we need only check the nodes on the top $O(\lg(1/\alpha))$ levels in $I$. As before, let $I_t$ denote the set of nodes in these levels. We present the following lemma:

Lemma 6. The data structures described in this section occupy $O(n)$ words, and can be used to answer a range $\alpha$-majority query in $O((\lg n)/\alpha)$ time with the help of an additional array of size $2n$.

Proof. To support $\alpha$-majority queries, we only consider the nontrivial case in which the query range $Q$ is general. By Lemma 5, the $\alpha$-majorities can be found by examining the first $\lceil k/2^{j-1} \rceil$ candidate colours stored in each node representing a range in $I_t$. Thus, there are at most $O\left(\lceil \frac{1}{\alpha} \rceil + \lceil \frac{1}{2\alpha} \rceil + \lceil \frac{1}{4\alpha} \rceil + \ldots + \lceil \frac{1}{2^{t-1}\alpha} \rceil\right) = O(\frac{1}{\alpha})$ relevant colours to check. Let $L_t$ denote the set of these colours. For each $c \in L_t$ we query our range counting data structures $F_c$ and $F$ in $\Theta(\lg n)$ time to determine whether $c$ is an $\alpha$-majority. Thus, the overall query time is $O((\lg n)/\alpha)$.

There are $\Theta(n)$ nodes in the weight-balanced B-tree. Therefore, one would expect the space to be $\Theta(n/\alpha)$ words, since each node stores $\Theta(1/\alpha)$ colours. We use a pruning technique on the lower levels of the tree in order to reduce the space to $O(n)$ words overall. If a node $u$ covers less than $1/\alpha$ points, then we need not store $L(u)$, since every colour in $T(u)$ is an $\alpha$-majority for $R(u)$. Instead, during a query, we can traverse the leaves of $T(u)$ in order to determine the unique colours. To make this efficient, we require an array $D$ of $2n$ integers to count the frequencies of the colours in $R(u)$.
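The role of the counting array $D$ can be sketched as follows. This is a hedged illustration: the function name `distinct_colours` is hypothetical, colours are assumed to already be integer identifiers in $[1, 2n]$ as in Section 1.2, and resetting only the touched entries after each traversal is an assumption made here so that $D$ can be reused without an $O(n)$ clearing step.

```python
def distinct_colours(leaf_colours, D):
    """Count colour frequencies over the leaves of a small subtree T(u).

    leaf_colours: colour identifiers (ints in [1, 2n]) of the leaves of T(u).
    D: shared array of length 2n + 1 whose entries are all zero on entry.
    Returns a list of (colour, frequency) pairs in order of first appearance.
    """
    seen = []
    for c in leaf_colours:
        if D[c] == 0:          # first time this colour is met in T(u)
            seen.append(c)
        D[c] += 1              # O(1) increment, indexed directly by colour
    result = [(c, D[c]) for c in seen]
    for c in seen:
        D[c] = 0               # assumption: reset touched entries for reuse
    return result
```

The running time is proportional to the number of leaves visited, independent of the size of $D$.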
As mentioned in Section 1.2, we can map a colour into an index of the array $D$, which allows us to increment a frequency counter in $O(1)$ time. Thus, we can extract the unique colours in $R(u)$ in $O(|T(u)|) = O(\frac{1}{\alpha})$ time. The number of tree nodes whose subtrees have at least $1/\alpha$ leaves is $O(n\alpha)$. Thus, we store $O(k) = O(1/\alpha)$ words in $O(n\alpha)$ nodes, and the total space used by our B-tree $T$ is $O(n)$ words. The only other data structures we make use of are the array $D$ and the range counting data structures $F$ and $F_c$ for each $c \in C$, and together these occupy $O(n)$ words. □

2.2 Supporting Updates

We next establish how much time is required to maintain the list $L(v)$ in node $v$ under insertions and deletions. We begin by observing that it is not possible to lazily maintain the list of the top $k = \lceil 11.18/\alpha \rceil - 1$ most frequent colours in each range: many of these colours could have low frequencies, and the list $L(v)$ would have to be rebuilt after very few insertions or deletions. To circumvent this problem, we relax our requirements on what is stored in $L(v)$, only guaranteeing that all of the $\beta$-majorities of the range $R(v)$ must be present in $L(v)$, where $\beta = \lceil 11.18/\alpha \rceil^{-1}$. With this alteration, we can still make use of the lemmas from the previous section, since they depend only on the fact that there are no colours $c \notin L(v)$ with frequency greater than $\beta |T(v)|$.

The issue now is how to maintain the $\beta$-majorities of $R(v)$ during insertions and deletions of colours. Karpinski and Nekrich noted that if we store the $(\beta/2)$-majorities for each node $v$ in $T$, then it is only after $|T(v)|\beta/2$ deletions that we must rebuild $L(v)$ [23].
For the case of insertions and deletions, their data structure performs a range counting query at each node $v$ along the path from the root of $T$ to the leaf representing the inserted or deleted colour $c$. This counting query is used to determine if the colour $c$ should be added to, or removed from, the list $L(v)$.

In contrast, our strategy is to be lazy during insertions and deletions, waiting as long as possible before recomputing $L(v)$, and to avoid performing range counting queries for each node in the update path. We provide a tighter analysis (to constant factors) of how many insertions and deletions can occur before the list $L(v)$ is to be rebuilt. One caveat is that the results in this section only apply when $\beta \in (0, \frac{1}{2}]$. However, since $\alpha < 1$, our choice of $\beta$ satisfies this condition.

We use $\mathbb{Z}^*$ to denote $\mathbb{Z}^+ \cup \{0\}$. The following lemma is used to show a lower bound on the number of update operations (insertions and deletions) that can occur before a list needs to be recomputed:

Lemma 7. Let

$$\Gamma(\ell, j, \beta) = \min_{n_i \in \mathbb{Z}^*,\, n_d \in \mathbb{Z}^*} \left\{ n_i + n_d \;\middle|\; \frac{j + n_i}{\ell + n_i - n_d} > \beta \right\}$$

where $\ell \in \mathbb{Z}^+$, $j \in \mathbb{Z}^*$, $j < \ell$, and $\beta \in (0, \frac{1}{2}]$. If $\ell \ge 2j + 1$, then $\Gamma(\ell, j, \beta) \ge \frac{\beta\ell - j}{1 - \beta}$.

Proof. Observe that $\frac{j+1}{\ell+1} \ge \frac{j}{\ell-1}$ if and only if $\ell \ge 2j + 1$. This implies that increasing $n_i$ rather than $n_d$ by the same amount increases the value of the ratio $\frac{j + n_i}{\ell + n_i - n_d}$ by a greater amount when $\ell + n_i - n_d \ge 2(j + n_i) + 1$. Thus, we have

$$\Gamma(\ell, j, \beta) = \min_{n_i \in \mathbb{Z}^*} \left\{ n_i \;\middle|\; \frac{j + n_i}{\ell + n_i} > \beta \right\},$$

if $\ell + i \ge 2(j + i) + 1$ for $1 \le i < \Gamma(\ell, j, \beta)$. Also observe that $\frac{j + n_i}{\ell + n_i} > \beta$ implies $n_i > \frac{\beta\ell - j}{1 - \beta}$. All that remains is to show that for $\beta \in (0, \frac{1}{2}]$ the constraint $\ell + i \ge 2(j + i) + 1$ is satisfied for $1 \le i < \Gamma(\ell, j, \beta)$. To show this, we observe that if $i < \ell - 2j$, then $\ell + i \ge 2(j + i) + 1$.
Thus, the constraint is satisfied if $\Gamma(\ell, j, \beta) \le \ell - 2j$. Since $\frac{\beta\ell - j}{1 - \beta} \le \ell - 2j$ for all $\beta \in (0, \frac{1}{2}]$, we get the desired bound. □

We can think of the variables $n_i$ and $n_d$ as the numbers of insertions and deletions into our data structure. Thus, $\Gamma(\ell, j, \beta)$ represents the number of updates that can occur in a range containing $\ell$ points before a colour $c$ with $j$ occurrences can possibly become a $\beta$-majority. We next prove the following lemma:

Lemma 8. Suppose the list $L(v)$ for node $v$ contains the $\left\lceil \frac{1 - \beta + \sqrt{1 - \beta}}{\beta} \right\rceil$ most frequent colours in the range $R(v)$, breaking ties arbitrarily. For $\beta \in (0, \frac{1}{2}]$, this value is upper bounded by $\lceil \frac{2}{\beta} \rceil$. Let $\ell$ be the number of points contained in $R(v)$. Only after $\left\lceil \frac{\beta\ell}{\sqrt{1 - \beta}\,(1 + \sqrt{1 - \beta})} \right\rceil \ge \left\lceil \frac{\beta\ell}{2} \right\rceil$ insertions or deletions into $T(v)$ can a colour $c \notin L(v)$ possibly become a $\beta$-majority for the range spanned by node $v$.

Proof. Since we keep in $L(v)$ the $k$ most frequently appearing colours in the range $R(v)$, any colour not in $L(v)$ can appear at most $\frac{\ell}{k+1}$ times. We apply Lemma 7, noting that it exactly describes the number of insertions or deletions required to cause a colour with frequency $j$ to become a $\beta$-majority in a range containing $\ell$ points. Thus, we get that $\Gamma(\ell, \frac{\ell}{k+1}, \beta) \ge \frac{\beta\ell - \ell/(k+1)}{1 - \beta}$. We want to maximize the ratio $\Gamma(\ell, \frac{\ell}{k+1}, \beta)/k$, which gives us the maximum number of updates before rebuilding $L(v)$ per colour stored in $v$. If $h(k) = \frac{\beta\ell - \ell/(k+1)}{(1 - \beta)k}$, then the derivative is

$$h'(k) = \frac{(-2k + \beta k^2 + 2\beta k + \beta - 1)\,\ell}{(k+1)^2 (-1 + \beta)\, k^2},$$

which has zeros at $k \in \left\{ \frac{1 - \beta + \sqrt{1 - \beta}}{\beta}, \frac{1 - \beta - \sqrt{1 - \beta}}{\beta} \right\}$. The relevant zero, which maximizes $h(k)$, is $k = \frac{1 - \beta + \sqrt{1 - \beta}}{\beta}$.
Substituting this value of $k$ into $\frac{\beta\ell - \ell/(k+1)}{1 - \beta}$, we get that $\frac{\beta\ell}{\sqrt{1 - \beta}\,(1 + \sqrt{1 - \beta})}$ updates are required before a colour $c \notin L(v)$ can become a $\beta$-majority for the range spanned by node $v$. □

By Lemma 8, our lazy updating scheme only requires each list $L(v)$ to have size $\left\lceil \frac{1 - \beta + \sqrt{1 - \beta}}{\beta} \right\rceil = O(1/\alpha)$. This leads to the following theorem:

Theorem 1. Given a set $P$ of $n$ points in one dimension and a fixed $\alpha \in (0, 1)$, there is an $O(n)$ space data structure that supports range $\alpha$-majority queries on $P$ in $O((\lg n)/\alpha)$ time, and insertions and deletions in $O((\lg n)/\alpha)$ amortized time.

Proof. Query time follows from Lemma 6. In order to get the desired space, we combine Lemmas 6 and 8, implying that each list $L(v)$ contains $O(1/\alpha)$ colours. This allows us to use the same pruning technique described in Lemma 6 in order to reduce the space to $O(n)$.

When an update occurs, we follow the path from the root of $T$ to the updated node $u$. Suppose, without loss of generality, the update is an insertion of a point of colour $c$. For each vertex $v$ on the path, if $v$ contains a list $L(v)$, we check whether $c$ is in $L(v)$. If it is, then we increment the count of colour $c$. This takes $O(1/\alpha)$ time. We also increment the counter for $v$ that keeps track of the number of updates into $T(v)$ that have occurred since $L(v)$ was rebuilt. Thus, modifying the lists and counters along the path requires $O((\lg n)/\alpha)$ time in the worst case.

We next look at the costs of maintaining the lists $L(v)$. The list $L(v)$ can be rebuilt in $O(|T(v)|)$ time, using the array $D$. Note that $D$ can be maintained under updates using the same scheme described in Section 1.2. First, we use $D$ to compute the frequency of all the colours in $R(v)$ in $\Theta(|T(v)|)$ time. Let $k$ be the value from Lemma 8.
Since there are at most $O(|T(v)|)$ colours, we can use a linear time selection algorithm to find the $k$-th most frequent colour in $D$, and then find the top $k$ most frequent colours via a linear scan in $O(|T(v)|)$ time. We can then enforce the necessary semi-ordering on this list in $O(k) = O(1/\alpha)$ time, as described in Section 2.1. Thus, each leaf in $T(v)$ pays $O(1)$ cost every $\Theta(|T(v)|\alpha)$ insertions, or $O(1/\alpha)$ amortized cost per insertion. Since each update may cause $O(\lg n)$ lists to be rebuilt, this increases the cost to $O((\lg n)/\alpha)$ amortized time per update.

We make use of standard local rebuilding techniques to keep the tree $T$ balanced, rebuilding the lists in nodes that are merged or split during an update. Since a node $v$ will only be merged or split after $O(|T(v)|)$ updates by the properties of weight-balanced B-trees, these local rebuilding operations require $O((\lg n)/\alpha)$ amortized time. Finally, we can update $F_c$ and $F$ during an insertion or deletion of a point of colour $c$ in $O(\lg n)$ time. Thus, updates require $O((\lg n)/\alpha)$ amortized time overall, and are dominated by the costs of maintaining the lists $L(v)$ in each node $v$. □

3 Speedup for Integer Coordinates

We next describe how to improve the query time of the data structure from Theorem 1 from $O((\lg n)/\alpha)$ to $O(\lg n/(\alpha \lg\lg n))$ for the case in which the $x$-coordinates of the points in $P$ are integers that can be stored in a constant number of words. To accomplish this goal, we require an improved one-dimensional range counting data structure, which we get by combining two existing data structures. The fusion tree of Fredman and Willard [15] is an $O(n)$ space data structure that supports predecessor and successor queries in $O(\lg n/\lg\lg n)$ time and insertions/deletions in $O(\lg n/\lg\lg n)$ amortized time.
The list indexing data structure of Dietz [11] uses $O(n)$ space and supports rank queries (i.e., given an element, return the number of elements that precede it in the list) in $O(\lg n/\lg\lg n)$ time, and insertions/deletions in $O(\lg n/\lg\lg n)$ amortized time. In Andersson et al. [1], it was observed that these data structures could be combined to support dynamic one-dimensional range counting queries in $O(\lg n/\lg\lg n)$ time per operation; amortized for updates. We refer to this data structure as an augmented fusion tree.

In order to achieve $O(\lg n/(\alpha \lg\lg n))$ query time, we implement all the range counting data structures as augmented fusion trees: i.e., the data structures $F$, and $F_c$ for each $c \in C$. Immediately, we get that we can perform a query in $O(\lg n/(\alpha \lg\lg n) + \lg n)$ time: $O(\lg n/(\alpha \lg\lg n))$ time for the range counting queries, and $O(\lg n)$ time to find the nodes in $I_t$. We now discuss how to remove the additive $O(\lg n)$ term, which involves modifying our weight-balanced B-tree to support dynamic lowest common ancestor queries. To identify the top $O(\lg \frac{1}{\alpha})$ levels of $I$, we use the following lemma:

Lemma 9. The weight-balanced B-tree $T$ can be augmented in order to support lowest common ancestor queries in $O(\sqrt{\lg n})$ time without changing the $O((\lg n)/\alpha)$ amortized time required for updates.

Proof. Let the first ancestor of a node $u$ be the parent of $u$, and the $\ell$-th ancestor of $u$ be the parent of the $(\ell-1)$-th ancestor of $u$ for $\ell > 1$. In order to support lowest common ancestor queries between two nodes $z_a$ and $z_b$, denoted $LCA(z_a, z_b)$, we add three pointers to each node $u \in T$: pointers to both the leaves representing the minimum and maximum $x$-coordinates in $T(u)$, and a pointer to the $\ell$-th ancestor of $u$; we will fix the value of $\ell$ later.
We can search for $LCA(z_a, z_b)$ by setting $v = z_a$ and following the pointer to the $\ell$-th ancestor of $v$, denoted $v'$. By checking the maximum $x$-coordinate to see if $R(v')$ contains $z_b$, we can determine in constant time whether $v'$ is an ancestor of $LCA(z_a, z_b)$ or a descendant of $LCA(z_a, z_b)$. If $v'$ is a descendant of $LCA(z_a, z_b)$, then we set $v$ to $v'$ and $v'$ to the $\ell$-th ancestor of $v'$. If $v'$ is an ancestor of $LCA(z_a, z_b)$, then we backtrack and walk up the path from $v$ to $v'$ until we find $LCA(z_a, z_b)$. Overall, it takes $O(h_0/\ell + \ell)$ time to find node $z = LCA(z_a, z_b)$, if $z$ is at height $h_0$ in $T$. By setting $\ell = O(\sqrt{\lg n})$ we get $O(\sqrt{\lg n})$ time.

Furthermore, the pointers we added to $T$ can be updated in $O(\lg n)$ amortized time during an insertion or deletion. Whenever we merge or split a node $u$, we have $O(|T(u)|)$ time to fix all of the pointers into $u$, by properties of weight-balanced B-trees. The pointers out of $u$ can be fixed in $O(\lg n)$ worst case time. □

Although Lemma 9 is weaker than other results (cf. [29]), it is simple and sufficient for our needs. We next present the following theorem:

Theorem 2. Given a set $P$ of $n$ points in one dimension with integer coordinates and a fixed $\alpha \in (0, 1)$, there is an $O(n)$ space data structure that supports range $\alpha$-majority queries on $P$ in $O(\lg n/(\alpha \lg\lg n))$ time, and both insertions and deletions into $P$ in $O((\lg n)/\alpha)$ amortized time.

Proof. Suppose we are given a query range $[x_a, x_b]$. Applying Lemma 9 to the weight-balanced B-tree $T$, we claim that we can identify the top $\ell$ levels of $I$ (which are not necessarily from consecutive levels in $T$) using $O(\ell)$ lowest common ancestor operations. To show this, we describe a recursive procedure Findtop$(z_a, z_b, \ell)$ for identifying the top $\ell$ levels of $I$.
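Before turning to the details of Findtop, the jump-pointer search from the proof of Lemma 9 can be sketched in Python. This is a minimal illustration under simplifying assumptions: a static binary tree over integer leaves stands in for the weight-balanced B-tree $T$, each node stores its coordinate range directly instead of pointers to its extreme leaves, and the names `Node`, `build`, `set_jump_pointers`, `covers`, `lca`, and `leaf` are hypothetical.

```python
class Node:
    def __init__(self, lo, hi, parent=None):
        self.lo, self.hi = lo, hi   # min and max x-coordinate among the leaves of T(u)
        self.parent = parent
        self.children = []
        self.jump = None            # ell-th ancestor (the root if fewer ancestors exist)

def build(lo, hi, parent=None):
    """Build a binary tree over the integer leaves lo..hi (a stand-in for T)."""
    node = Node(lo, hi, parent)
    if lo < hi:
        mid = (lo + hi) // 2
        node.children = [build(lo, mid, node), build(mid + 1, hi, node)]
    return node

def set_jump_pointers(node, ell):
    anc = node
    for _ in range(ell):
        if anc.parent is not None:
            anc = anc.parent
    node.jump = anc
    for child in node.children:
        set_jump_pointers(child, ell)

def covers(v, z):
    """True if the range of v contains the range of z."""
    return v.lo <= z.lo and z.hi <= v.hi

def lca(za, zb):
    v = za
    # Jump phase: climb ell levels at a time while the jump target is
    # still a proper descendant of the lowest common ancestor.
    while not covers(v.jump, zb):
        v = v.jump
    # Walk phase: at most ell single steps up to the lowest ancestor covering zb.
    while not covers(v, zb):
        v = v.parent
    return v

def leaf(root, x):
    """Descend to the leaf holding coordinate x (binary tree only)."""
    node = root
    while node.children:
        node = node.children[0] if x <= node.children[0].hi else node.children[1]
    return node

root = build(0, 7)
set_jump_pointers(root, 2)
z = lca(leaf(root, 2), leaf(root, 5))
print(z.lo, z.hi)
```

The jump phase performs $O(h_0/\ell)$ pointer hops and the walk phase at most $\ell$ parent steps, matching the $O(h_0/\ell + \ell)$ bound in the proof. Maintaining the jump pointers under updates is not modelled here.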
We assume that we have acquired pointers to $z_a$ and $z_b$, the leaves of $T$ that represent the $x$-coordinates of the successor of $x_a$ and predecessor of $x_b$, respectively. To do this, we add a pointer from each leaf in the augmented fusion tree $F$ to its corresponding leaf in $T$. Given a query, we initially perform a successor query for $x_a$ and predecessor query for $x_b$ in $F$, and follow these extra pointers to $z_a$ and $z_b$, respectively. We assume that $z_a \ne z_b$, otherwise the query is trivially answered by reporting the colour stored in $z_a$.

Let $z = LCA(z_a, z_b)$, and $c_i$ denote the $i$-th child of $z$. Let $z_l$ and $z_r$ denote the leftmost and rightmost leaves in $T(z)$. In constant time we can determine children $c_j$ and $c_k$ of $z$ which are on the path to $z_a$ and $z_b$, respectively. Note that $k - j > 0$, otherwise $z$ is not $LCA(z_a, z_b)$. We say we are in the good case when $z_a = z_l$, $z_b = z_r$, and/or $k - j > 1$. When we are in the good case, either $c_j$, $c_k$, and/or $c_{j+1}, \ldots, c_{k-1}$ are in the top level of $I$, and we set $\ell' = \ell - 1$. Otherwise, if $k - j = 1$ and $z_a \ne z_l$ and $z_b \ne z_r$, then we are in the bad case. In the bad case we have not found the top level of $I$, and we set $\ell' = \ell$. In both cases (good or bad), let $z_{b'}$ be the leaf in $c_k$ representing the minimum $x$-coordinate in $T(c_k)$, and $z_{a'}$ be the leaf in $c_j$ representing the maximum $x$-coordinate in $T(c_j)$. We recurse if $\ell' > 0$, calling Findtop$(z_a, z_{a'}, \ell')$ if $z_a \ne z_{a'}$ and Findtop$(z_{b'}, z_b, \ell')$ if $z_b \ne z_{b'}$.

We observe that the procedure Findtop$(z_a, z_b, \ell)$ uses $O(\ell)$ lowest common ancestor queries. This is because if a call to Findtop is in the bad case, then the subsequent recursive call(s) will be in the good case by choice of $z_{a'}$ and $z_{b'}$, and only the initial call to Findtop can make two recursive calls.
Using Findtop, we can identify the top $O(\lg \frac{1}{\alpha})$ levels of $I$ in $O(\sqrt{\lg n} \cdot \lg \frac{1}{\alpha})$ time, replacing the $O(\lg n)$ additive term. This term is asymptotically smaller than the time required to perform the range counting queries, which is $O(\lg n/(\alpha \lg\lg n))$.

By Lemma 9, we can support the lowest common ancestor operation without increasing the update time of $T$ as stated in Theorem 1. The extra pointers we added from the leaves of $F$ to the leaves of $T$ can also be updated without affecting the bound from Theorem 1, since during any insertion/deletion of a point $p$, the two leaves corresponding to $p$ in both $F$ and $T$ must be located. Therefore, the total update time follows from Theorem 1. □

4 Extension to Higher Dimensional Point Sets

In this section we present a refinement of the technique presented by Karpinski and Nekrich [23], who used standard range tree techniques [4] to generalize their range $\alpha$-majority structures to higher dimensions.⁵ We note that, recently, Wilkinson [31] has used the same refinement to improve the bounds of Durocher et al. [13] for the two-dimensional static case.

All of the algorithms presented thus far have the following two phase structure. The first is the candidate extraction phase, in which we extract a list of candidates from our data structure. The second phase is the verification phase, in which we use range counting data structures to verify that the candidates are actual $\alpha$-majorities. For higher dimensional problems we speed up the verification phase by adding an additional filtering phase between the candidate extraction and verification phases. In order to do this, we make use of approximate range counting data structures [26,28,31]. If $m$ points are contained in the query range, then an approximate range counting data structure with additive error $m'$ will return a count in the range $[m - m', m + m']$; see [28].
Similarly, a data structure with multiplicative error $(1 - \varepsilon)$ will return a count in the range $[(1 - \varepsilon)m, m]$; see [26]. In the remainder of this section we first modify existing data structures for approximate range counting, and then consider their applications to higher dimensional data structures for dynamic range $\alpha$-majority queries.

4.1 Approximate Range Counting

Before stating the results for higher dimensional range $\alpha$-majority data structures we require some additional results on approximate range counting. We begin with a lemma, which is a very minor generalization of Nekrich's one-dimensional approximate range counting data structure [28, Theorem 1]. In the original structure each point is unweighted, but we wish to add the operations increment and decrement to the structure, which respectively increase and decrease the weight of a point by one. We assume a newly inserted point begins with weight one. Instead of returning the number of points in a query range (within an additive error term), our query operation will return the sum of the weights of the points in the query range, within an additive error term.

Lemma 10. Let $\tau > 1$ be an integer constant, $dpred(n)$ denote the cost of a dynamic predecessor search on $n$ keys, and $m$ denote the sum of the weights of the points contained in the query range. There exists an $O(n)$ space data structure that supports approximate weighted range counting queries with additive error $m^{1/\tau}$ in $O(dpred(n) + \lg\lg n)$ time, deletions in $O(\lg\lg n)$ amortized time, and insertions in $O(dpred(n) + \lg\lg n)$ amortized time. The operations increment and decrement are supported in $O(\lg\lg n)$ amortized time.

Proof. Let $m'$ be the approximate weight returned by our data structure, while $m$ is the exact weight. We divide the solution into two cases.
In the first case, we assume that $m \ge h_0 (\lg n)^{\tau}$ for some arbitrary constant $h_0 > 0$. We emphasize that in both cases the data structure and proof are essentially the same as Theorem 1 of Nekrich [28], with some minor modifications. We use an exponential search tree $T$ [2], where each leaf in $T$ represents a point, but also stores the weight associated with the point. We require some additional notation, and closely follow that of Nekrich [28].

⁵ In the preliminary version of this paper [14], the paragraph preceding Theorem 3 erroneously stated that the result which we prove in this section follows immediately from the analysis of Karpinski and Nekrich [23].

Let $v_i$ denote the $i$-th child of $v$, $n_v$ denote the number of leaves in the subtree $T(v)$, $W(v)$ denote the weight of the leaves of the subtree $T(v)$, and $f(v)$ denote the number of children of $v$. In the exponential search tree $T$, each node $v$ has $\Theta(n_v^{1/\tau})$ children, each of which contains between $n_v^{(\tau-1)/\tau}/2$ and $2 n_v^{(\tau-1)/\tau}$ points, for a fixed constant $\tau > 2$. Each node $v$ stores its weight $W(v)$, as well as a set of approximate weights $W'(v, i, j)$, such that

$$W(v_i) + \cdots + W(v_j) - n_v^{3/\tau}/2 \;\le\; W'(v, i, j) \;\le\; W(v_i) + \cdots + W(v_j) + n_v^{3/\tau}/2,$$

for all $1 \le i \le j \le f(v)$. We recompute all counts $W'(v, i, j)$ after $n_v^{3/\tau}/2$ update operations (insertions, deletions, increments, and decrements). Recomputing all the $W'(v, i, j)$'s for a node takes $O(n_v^{2/\tau})$ time. Thus, each update operation (insertion, deletion, increment, or decrement) requires $O(\lg\lg n)$ amortized time, since the height of $T$ is $O(\lg\lg n)$ and we must increment or decrement the weight $W(v)$ stored in each node on the path from leaf to root.
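The lazy recomputation of the approximate weights can be illustrated with a small Python sketch for a single node. It is a simplification, not the structure of [28]: the class name `LazyNode` is hypothetical, the parameter `budget` plays the role of $n_v^{3/\tau}/2$, child indices are 0-based, and the values $W'(v, i, j)$ are read from a snapshot of prefix sums instead of being stored explicitly for every pair $(i, j)$.

```python
import random

class LazyNode:
    def __init__(self, child_weights, budget):
        self.weights = list(child_weights)  # exact W(v_i), always current
        self.budget = budget                # plays the role of n_v^(3/tau) / 2; must be >= 1
        self._snapshot()

    def _snapshot(self):
        # Prefix sums of the weights at rebuild time; W'(v, i, j) is read from these.
        self.prefix = [0]
        for w in self.weights:
            self.prefix.append(self.prefix[-1] + w)
        self.pending = 0

    def update(self, i, delta):
        """Apply a unit change (delta = +1 or -1) to the weight of child i."""
        self.weights[i] += delta
        self.pending += 1
        if self.pending >= self.budget:
            self._snapshot()

    def approx(self, i, j):
        """Approximate total weight of children i..j, as of the last snapshot."""
        return self.prefix[j + 1] - self.prefix[i]

    def exact(self, i, j):
        return sum(self.weights[i:j + 1])

random.seed(1)
node = LazyNode([10] * 8, budget=5)
worst = 0
for _ in range(1000):
    node.update(random.randrange(8), random.choice((1, -1)))
    i = random.randrange(8)
    j = random.randrange(i, 8)
    worst = max(worst, abs(node.approx(i, j) - node.exact(i, j)))
assert worst <= node.budget
```

Because fewer than `budget` unit updates can occur between snapshots, every approximate sum stays within `budget` of the exact sum, which is the invariant required of $W'(v, i, j)$.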
The space is linear by the properties of exponential search trees [2], and all that remains is to argue the correctness of the query algorithm of Nekrich [28]. The query algorithm essentially finds the ranges in $T$ that represent the query range, and returns the summation of the approximate counters of those ranges. For a fixed node $v$ from the set of nodes representing the query range, with children $v_i, \ldots, v_j$ contained entirely in the query range, let $m'_v = W'(v, i, j)$ and $m_v = \sum_{\ell=i}^{j} W(v_\ell)$. Then $m_v - n_v^{3/\tau} \le m'_v \le m_v + n_v^{3/\tau}$. Since $m_v \ge n_v^{(\tau-1)/\tau}$, we have $m_v - m_v^{3/(\tau-1)} \le m'_v \le m_v + m_v^{3/(\tau-1)}$. Since $T$ has height $h_1 \lg\lg n$ for some constant $h_1$, we have $m - (2 h_1 \lg\lg n)\, m^{3/(\tau-1)} \le m' \le m + (2 h_1 \lg\lg n)\, m^{3/(\tau-1)}$. However, since we assume $m \ge h_0 (\lg n)^{\tau}$, we need only ensure $h_0 > (2 h_1)^{\tau}$ in order for $2 h_1 \lg\lg n \le m^{1/(\tau-1)}$. Thus, $m - m^{4/(\tau-1)} \le m' \le m + m^{4/(\tau-1)}$. By replacing $\tau$ with $\tau' = \max(5\tau, 5)$ we obtain the result of the lemma.

In the second case, when $m < h_0 (\lg n)^{\tau}$, we make use of an alternative data structure. We divide the point set into groups of between $h_0 (\lg n)^{\tau}$ and $4 h_0 (\lg n)^{\tau}$ consecutive points, and store each group in a balanced binary search tree. Each node $u$ in the search tree stores the total weight of the nodes in the subtree induced by $u$. Given the successor and predecessor, $e'_1$ and $e'_2$, of a query range $[e_1, e_2]$, we can assume that points $e'_1$ and $e'_2$ either belong to the same group, or to two adjacent groups. Thus, given $e'_1$ and $e'_2$, which can be obtained in $O(dpred(n))$ time, we can tally the exact weight in this case in $O(\lg\lg n)$ time.
Using standard techniques we can support insertion in $O(dpred(n) + \lg\lg n)$ amortized time: $O(dpred(n))$ to find the position in which to insert the new element, and $O(\lg\lg n)$ amortized time to insert it into the binary search tree for its group, accounting for merging/splitting of groups. By analogous arguments deletion takes $O(\lg\lg n)$ amortized time. Finally, increment and decrement can be performed in $O(\lg\lg n)$ worst case time. □

Before continuing, we require the definition of a generalized union-split-find (GUSF) data structure, as well as the time bounds for its operations.

Lemma 11 ([19], Theorem 5.2). A GUSF stores an ordered list $G$ of elements, in which each element $x$ of $G$ is associated with a subset $U(x) \subseteq \{1, \ldots, \lg^{1/4} n\}$ of colours. Assume we have a pointer to an element $x \in G$, and let $U' \subseteq \{1, \ldots, \lg^{1/4} n\}$ be a set of colours. A GUSF supports the operations:
– find$(x, U')$: returns the successor of $x$ with colour $c \in U'$.
– add$(y, x)$: inserts $y$ into the list before $x$.
– erase$(x)$: removes $x$ from the list, assuming $U(x) = \emptyset$.
– mark$(x, c)$: inserts $c$ into $U(x)$.
– unmark$(x, c)$: removes $c$ from $U(x)$.
A GUSF can be implemented in $O(n)$ space, such that each operation takes $O(\lg\lg n)$ time. The time bound for add and erase is amortized, while the running times of all other operations are worst case. A GUSF containing $n$ elements can be constructed in $O(n \lg^{1/8} n \lg\lg n)$ time.

Next we are ready to state and prove the main result of this section:

Lemma 12. Let $P$ be a set of $d$-dimensional points for any $d \ge 2$.
The point set, $P$, can be preprocessed into an $O((n \lg^{d-1} n)/\lg\lg n)$ space data structure, such that for any arbitrary $d$-dimensional axis-aligned hyperrectangle, $Q$, approximate range counting queries can be performed in $O(\lg^{d-1} n)$ time, with additive error $|P \cap Q|^{1/j}$, for any constant integer $j > 1$. Insertions and deletions can be performed in $O(\lg^{d-1+1/4} n)$ amortized time.

Proof. The proof of this lemma is almost the same as Theorem 2 in [28], except that we increase the cost of the query by a factor of $O(\lg\lg n)$ for any $d \ge 3$, and decrease the space bound by a factor of $O(\lg^{\varepsilon} n)$. We make use of dynamic fractional cascading in weight balanced B-trees without modifications [19], and a slight modification of the GUSF of Lemma 11.

We next describe how to combine the one-dimensional weighted approximate counting data structure from Lemma 10 with the GUSF. This will allow the GUSF to support coloured approximate range counting: i.e., given a colour $c \in \{1, \ldots, \lg^{1/4} n\}$ and a range $[x_a, x_b]$, approximately report the number of elements with colour $c$ contained in $[x_a, x_b]$. A single modified GUSF will then be stored in each internal node of a weight-balanced B-tree, in order to support two-dimensional approximate range counting.

A GUSF groups consecutive elements into blocks which are of size $\Theta(\lg^{2+1/4} n)$. The elements in each block are stored in a balanced binary search tree. For each node in the tree, we store the counts of the number of children with each of the $\lg^{1/4} n$ colours, with counter $n_c$ storing the number of points of colour $c$. Since the tree has $O(\lg^{2+1/4} n)$ elements, each counter requires $O(\lg\lg n)$ bits, and thus the counters for a node can be packed into a constant number of words. Thus, these counters do not increase the space of the GUSF structure asymptotically.
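The packing of several small counters into one machine word can be sketched as follows. This is only an illustration of the idea: the class name `PackedCounters` is hypothetical, the choice of 4 colours with 12-bit counters is an arbitrary example, and since Python integers are unbounded the 64-bit word is only simulated.

```python
class PackedCounters:
    def __init__(self, num_counters, bits):
        self.num_counters = num_counters
        self.bits = bits                  # width of each counter, O(lg lg n) bits in the text
        self.mask = (1 << bits) - 1
        self.word = 0                     # all counters live in this single integer

    def get(self, c):
        """Read the counter n_c for colour index c (0-based)."""
        return (self.word >> (c * self.bits)) & self.mask

    def add(self, c, delta):
        """Add delta to n_c, refusing values that do not fit in the counter."""
        value = self.get(c) + delta
        if not 0 <= value <= self.mask:
            raise OverflowError("counter out of range for its bit width")
        shift = c * self.bits
        self.word = (self.word & ~(self.mask << shift)) | (value << shift)

counters = PackedCounters(num_counters=4, bits=12)   # 48 bits in total
counters.add(0, 5)
counters.add(3, 7)
counters.add(0, -2)
print([counters.get(c) for c in range(4)])
```

With 4 counters of 12 bits each, the whole vector fits in 48 bits, so reading or updating one counter costs a constant number of shift and mask operations.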
As in a standard GUSF, each block in the modified GUSF is represented in an order maintenance structure that maps a block to an integer coordinate. Given two blocks, $b$ and $b'$, we denote their corresponding integer coordinates $X(b)$ and $X(b')$, and we can determine whether the elements in $b$ precede those in $b'$, or vice versa, by comparing these coordinates; see [19] for full details.

Our modified GUSF also stores $O(\lg^{1/4} n)$ copies of the data structure from Lemma 10, one for each colour $c \in [1, \lg^{1/4} n]$, denoted $D_c$. We discuss how to set the parameter $\tau$ for each $D_c$ later. For each block $b$ that has a counter value $n_c > 0$ in its root for colour $c$, we store a point representing that block in $D_c$, with weight $n_c$, and coordinate $X(b)$. The root of $b$ also stores a pointer to the leaf in $D_c$ representing these points, for each colour $c \in [1, \lg^{1/4} n]$. Since there are at most $O(n/\lg^{2+1/4} n)$ blocks, all these structures together occupy $O(n/\lg^{2} n)$ space.

Given two elements $e_1$ and $e_2$, where $e_1 < e_2$ and both elements are marked with colour $c$, we can determine the approximate number of points with colour $c$ that both succeed $e_1$ and precede $e_2$, as follows. First, if $e_1$ and $e_2$ are in the same block, we can return the exact count in $O(\lg\lg n)$ time using the counters that are stored in the nodes of the balanced binary tree representing the block. Otherwise, we need to perform an additional step of querying the data structure $D_c$, providing pointers to the leaves in $D_c$ that represent the blocks containing $e_1$ and $e_2$, respectively.

With the exception of the data structures $D_c$, the GUSF containing $n$ elements can be constructed in $O(n \lg^{1/8} n \lg\lg n)$ time by Lemma 11, since each GUSF operation takes at most $O(\lg\lg n)$ amortized time.
Since each point results in an insertion or increment operation on $O(\lg^{1/8} n)$ approximate range counting data structures, each of size $O(n/\lg^{2+1/4} n)$, this takes $O(\lg^{1/8} n \lg\lg n)$ amortized time per point, and does not asymptotically change the construction time.

We are next ready to discuss our data structure for planar approximate counting, i.e., the case in which $d = 2$. We store a weight balanced B-tree $T$ over the $y$-coordinates of the given points, with branching parameter $\Theta(\lg^{1/8} n)$ and leaf parameter 1. For each internal node $u$ of $T$ with degree $f$, we store our modified GUSF $M(u)$, over all of the points in the subtree $T(u)$, ordered by their $x$-coordinate. Note that there are $\Theta(\lg^{1/4} n)$ possible contiguous subranges of children of $u$ in total, and each child of $u$ belongs to $\Theta(\lg^{1/8} n)$ of these ranges $[i, j]$, where $1 \le i \le j \le f$. We construct a set of colours, and each colour corresponds to a possible range $[i, j]$. Thus, each point in $M(u)$ is marked with the $\Theta(\lg^{1/8} n)$ colours corresponding to these ranges. Each node $u$ also stores a catalogue $V(u)$ corresponding to the points in $T(u)$ ordered by $x$-coordinate. Each catalogue element stores a pointer to the corresponding element in $M(u)$. We maintain a dynamic fractional cascading data structure over the catalogues of $T$.

Since the branching parameter of the tree $T$ is $\Theta(\lg^{1/8} n)$, the tree has height $\Theta(\lg n/\lg\lg n)$. Each point is stored in $\Theta(\lg n/\lg\lg n)$ nodes, each containing a constant number of linear space structures. Thus, the space occupied by the data structure is $\Theta(n \lg n/\lg\lg n)$.

To answer a query of the form $[x_1, x_2] \times [y_1, y_2]$, we perform a search for the successor and predecessor, $e_1$ and $e_2$, of the query range $[x_1, x_2]$ in each catalogue of each node among the nodes representing $[y_1, y_2]$ in $T$.
This takes $\Theta(\lg n)$ time, since there are $\Theta(\lg n/\lg\lg n)$ catalogues: the initial search requires $\Theta(\lg n)$ time, and each additional search uses $\Theta(\lg\lg n)$ time. For a fixed internal node $u$ of $T$, such that the query range $[y_1, y_2]$ spans children $[i, j]$, let $c$ be the colour representing $[i, j]$ in $M(u)$. We locate $e'_1$ and $e'_2$, the respective successor and predecessor of $e_1$ and $e_2$ in $M(u)$ with colour $c$, using the find operation. Thus, locating $e'_1$ and $e'_2$ in $M(u)$ takes $O(\lg\lg n)$ time. Once we have located $e'_1$ and $e'_2$ we can query $D_c$, and the counters in the block(s) containing $e'_1$ and $e'_2$, in $O(\lg\lg n)$ time, as outlined above. Thus, the overall query time is $O(\lg\lg n \cdot (\lg n/\lg\lg n) + \lg n)$, which is $O(\lg n)$.

Suppose we desire additive error $|P \cap Q|^{1/j}$ for some fixed constant $j > 1$. Then, we set the parameter $\tau = 2j$. Let $m'$ denote the sum of each of the $h_2 (\lg n/\lg\lg n)$ approximate counts tallied at each node that represents the query range, where $h_2$ is a constant that depends on the height of $T$, and $m$ denotes the exact count. Thus,

$$m - h_2 (\lg n/\lg\lg n)\, m^{1/\tau} \;\le\; m' \;\le\; m + h_2 (\lg n/\lg\lg n)\, m^{1/\tau}. \tag{1}$$

If $m \ge (h_2 \lg n)^{\tau}$, then $m^{1/\tau} \ge h_2 (\lg n/\lg\lg n)$. Thus, $m - m^{2/\tau} \le m' \le m + m^{2/\tau}$. Since $\tau = 2j$, we are left with $m - m^{1/j} \le m' \le m + m^{1/j}$, which is the desired error term. Next suppose $m < (h_2 \lg n)^{\tau}$. In this case, we can retrieve the exact count in $O(\lg n)$ time using the binary tree representation of $D_c$, since none of the ranges represented can contain more than $m$ points. Thus, in both cases we have shown the query algorithm is correct. Note that we must ensure $h_0 \ge h_2^{\tau}$, in addition to the constraints on $h_0$ described in Lemma 10.

In order to insert a point $p$, we identify the nodes on the path $Y$ from the root of $T$ to the leaf where $p$ will be inserted.
We then search for the successors of $p$ in all of the catalogues on this path, which takes $O(\lg n)$ time in total. Once we have the successor, we can insert $p$ into each GUSF along $Y$ in the following way. Let $u$ be a node in $Y$ and $u_i$ be the child of $u$ whose range contains $p$. Using the pointer to the successor of $p$ in $M(u)$, we can perform an add operation, inserting $p$ into a block $b$ in $M(u)$. Let $U'$ denote the set of colours in $M(u)$ representing the ranges that contain $u_i$. If $b$ splits into two blocks $b$ and $b'$ as a result of this, then we must decrement the weight of the element representing $b$ in each $D_c$ for each $c$ with a nonzero counter in the root of $b$. We also must insert a new element representing $b'$ into each $D_c$ for each $c$ with a nonzero counter in the root of $b'$, and increment its weight accordingly. Recall that $O(\lg^{2+1/4} n)$ elements must have been inserted into $M(u)$ to cause $b$ to split. Each split causes $O(\lg^{1/8} n\,(\lg^{2+1/4} n))$ update operations on all $D_c$, for each $c$ stored in the roots of $b$ and $b'$. This is $O(\lg^{1/8} n \lg\lg n)$ amortized update time. In the case in which $b$ does not split, we still require $O(\lg^{1/8} n \lg\lg n)$ amortized time to increase the weight of the representative of $b$ in each $D_c$, for $c \in U'$. Since there are $O(\lg n/\lg\lg n)$ nodes in $Y$, the overall insertion time is thus $O(\lg^{1+1/8} n)$, provided we do not cause a node to split or two nodes to merge in the base tree $T$.

Deletion is handled analogously, except that we decrease the weight of the representative of $b$ in each $D_c$ for $c \in U'$. Thus, deletion requires $O(\lg^{1+1/8} n)$ amortized time as well.

In the case in which a node $u \in T$ splits or merges, we can efficiently update the fractional cascading data structure using the techniques described in [19].
The cost of a split or merge is dominated by the cost of rebuilding the modified GUSF structures in both $u$ and $u$'s parent. We can rebuild each modified GUSF in a node representing $m$ points in $O(m \lg^{1/8} n \lg\lg n)$ time. Since $O(m)$ updates are required to split a node with a parent containing $O(m \lg^{1/8} n)$ points, and $O(\lg n/\lg\lg n)$ splits/merges can occur during a single update, the cost of performing an update is $O(\lg^{1+1/4} n)$ amortized time.

To get the bound stated by the lemma for higher dimensions, we use the standard range tree technique [4], which inflates the space, query and update time by a factor of $O(\lg n)$ for each additional dimension. In general, we must set the parameter $\tau = 2^{d-1} j$. □

4.2 Range α-Majority in Higher Dimensions

As an application of Theorem 1, and the approximate range counting data structures of Lemma 12, we can improve the query time for range $\alpha$-majority data structures in higher dimensions.

Theorem 3. Given a set $P$ of $n$ points in $d$ dimensions (for a constant $d \ge 2$) and a fixed $\alpha \in (0, 1)$, there is an $O(n \lg^{d-1} n)$ space data structure that supports range $\alpha$-majority queries on $P$ in $O((\lg^{d} n)/\alpha)$ time, and insertions and deletions into $P$ in $O((\lg^{d} n)/\alpha)$ amortized time.

Proof. Using range trees, we can convert any $d$-dimensional range $\alpha$-majority query into a combination of several $(d-1)$-dimensional range $\alpha$-majority queries and $d$-dimensional range counting queries. In particular, let $S(n, d)$ denote the cost of a $d$-dimensional range counting query on a dynamic set of $n$ points, and $M(n, d)$ denote the cost of a $d$-dimensional range $\alpha$-majority query on a dynamic set of $n$ points.
Then, for $d \ge 2$,

$$M(n, d) = O(\lg n)\, M(n, d-1) + O(\lg n/\alpha)\, S(n, d), \tag{2}$$

since we can extract and verify the frequency of the $O((\lg n)/\alpha)$ candidates from the $O(\lg n)$ nodes representing the range spanned by the $d$-th coordinate of the query range. Note that each candidate is an $\alpha$-majority if we consider the first $(d-1)$ coordinates only. Since $d$-dimensional dynamic orthogonal range counting queries require $O(\lg^{d} n)$ time [8], $M(n, d) = O((\lg^{d+1} n)/\alpha)$.

To further reduce the query time we observe that only $O(1/\alpha)$ of the $O(\lg n/\alpha)$ candidates can have frequency above $(1 - \varepsilon)\alpha q$, where $q$ is the total number of points contained in the query range and $\varepsilon \in (0, 1)$ is an arbitrary constant. Thus, we add additional data structures $F'_c$ for each $c \in C$, where each $F'_c$ is the structure of Lemma 12 and stores the points of colour $c$.⁶ Using these data structures, we can perform an additional filtering pass that reduces the list of $O(\lg n/\alpha)$ candidates to a shorter list of $O(1/\alpha)$ candidates. After this filtering step we can then verify the frequency of the $O(1/\alpha)$ candidates above this threshold exactly using the range counting data structures $F_c$. Let $\hat{S}(n, d)$ denote the query time of the data structure from Lemma 12. We can rewrite the recurrence of Equation 2 as:

$$M(n, d) = O(\lg n)\, M(n, d-1) + O(\lg n/\alpha)\, \hat{S}(n, d) + O(1/\alpha)\, S(n, d). \tag{3}$$

This recurrence resolves to $M(n, d) = O((\lg^{d} n)/\alpha)$.

We can update the structures $F_c$ and $F'_c$ for the colour, $c$, of the inserted or deleted point, and $F$, in $O(\lg^{d} n + \lg^{d-1+1/4} n)$ amortized time. Each of the $O(\lg n)$ $(d-1)$-dimensional range $\alpha$-majority data structures can be updated in $O((\lg^{d-1} n)/\alpha)$ amortized time, for a total of $O((\lg^{d} n)/\alpha)$ amortized time. Finally, the space cost is dominated by the range $\alpha$-majority structures.
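The filter-then-verify query described in this proof can be sketched in Python. The sketch is hedged in several ways: the function `filter_then_verify` and its parameters are hypothetical, an in-memory `Counter` stands in for the exact range counting structures $F_c$, and the estimator `approx_count` is assumed to return a value in $[(1 - \varepsilon)m, m]$ for a colour with exact count $m$, which is the constant multiplicative error that the proof actually needs.

```python
import math
from collections import Counter

def filter_then_verify(colours_in_range, candidates, alpha, eps, approx_count):
    q = len(colours_in_range)
    exact = Counter(colours_in_range)   # stands in for exact range counting (F_c)
    # Filtering pass: a true alpha-majority has count m > alpha * q, so its
    # estimate is at least (1 - eps) * m > (1 - eps) * alpha * q and it survives.
    shortlist = [c for c in candidates if approx_count(c) > (1 - eps) * alpha * q]
    # Verification pass: exact counts are consulted only for the shortlist.
    majorities = [c for c in shortlist if exact[c] > alpha * q]
    return shortlist, majorities

colours = ["a"] * 6 + ["b"] * 3 + ["c"]
true_counts = Counter(colours)
# Hypothetical estimator with eps = 0.5: ceil(m / 2) lies in [(1 - eps) m, m].
estimate = lambda c: math.ceil(0.5 * true_counts[c])
shortlist, majorities = filter_then_verify(colours, ["a", "b", "c"], 0.25, 0.5, estimate)
print(shortlist, majorities)
```

Every colour that survives the filter has exact count above $(1 - \varepsilon)\alpha q$, so at most $1/((1 - \varepsilon)\alpha)$ colours reach the more expensive exact verification.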
The space occupied by these structures can be expressed as $U(n, d) = O(\lg n)\, U(n, d-1)$, where $U(n, d)$ is the space occupied by a $d$-dimensional dynamic range $\alpha$-majority structure, and $d \ge 2$. Thus $U(n, d) = O(n \lg^{d-1} n)$. □

⁶ The data structure of Lemma 12 has very small additive error, though, for the purposes of this proof, we need only constant multiplicative error.

5 Dynamic Arrays

In this section we extend our results to dynamic arrays. In the dynamic array version of the problem, we wish to support the following operations on an array $A$ of length $n$, where each $A[i]$ contains a colour, for $1 \le i \le n$:
– Insert$(i, c)$: Insert the colour $c$ between the colours $A[i-1]$ and $A[i]$. This shifts the colours in positions $i$ to $n$ to positions $i+1$ to $n+1$, respectively.
– Delete$(i)$: Delete the colour $A[i]$. This shifts the colours in positions $i+1$ to $n$ to positions $i$ to $n-1$, respectively.
– Modify$(i, c)$: Set the colour $A[i]$ to $c$.
– Query$(i, j)$: Let $|A[i..j]|_c$ denote the number of occurrences of colour $c$ in the range $A[i..j]$. Report the set of colours $M$ such that for each $c \in M$, $|A[i..j]|_c > \alpha\,|j - i + 1|$.

As before, we refer to a colour $c \in M$ as an $\alpha$-majority in the range $A[i..j]$, and the query as a range $\alpha$-majority query. The dynamic array problem boils down to the well-studied problem of maintaining an injective order preserving mapping from the positions in $A$ into a larger set of integer keys $P$ [27]. We next prove the following theorem:

Theorem 4. Given an array $A[1..n]$ of colours and a fixed $\alpha \in (0, 1)$, there is an $O(n)$ space data structure that supports range $\alpha$-majority queries on $A$ in $O((\lg n)/(\alpha \lg\lg n))$ time, Insert in $O((\lg^{3} n)/(\alpha \lg\lg n))$ amortized time, Delete in $O((\lg n)/\alpha)$ amortized time, and Modify in $O((\lg n)/\alpha)$ amortized time.

Proof.
We maintain our data structure $T$ from Theorem 2 on the integer key set $P$. Each time a key $p$ in $P$ is changed to a key $p'$, we must delete $p$ from $T$ and then insert $p'$ into $T$. If an insertion or deletion in our dynamic array changes $\ell$ keys in the mapping, it requires $O((\ell \lg n)/\alpha)$ amortized time to change these values in $T$. We note that a Modify operation corresponds to one deletion and one insertion into $T$, requiring $O((\lg n)/\alpha)$ amortized time. We apply the dynamic reduction to extended rank space technique [27], which maps the positions in $A$ to integers in the bounded universe $[1..O(n^3)]$. This mapping requires $O((\lg^2 n)/\lg\lg n)$ amortized time for insertions and $O(1)$ amortized time for deletions. These time bounds also bound the number of key changes for insertion and deletion (in the amortized sense). Multiplying them by the $O((\lg n)/\alpha)$ cost per key change gives the stated bounds for Insert and Delete, completing the proof. □

6 Conclusions

We have presented several new dynamic data structures for the range $\alpha$-majority problem. These data structures improve on the previous results in terms of space, query time, and update time. Notably, for one-dimensional points, we presented a linear space data structure with $O((\lg n)/\alpha)$ query time and $O((\lg n)/\alpha)$ amortized update time. In the case in which the coordinates of the points are integers, we reduced the query time by a $(\lg\lg n)$-factor. This improved query time matches an existing cell-probe lower bound for the case when $1/\alpha$ is a constant and the word size is $\Theta(\lg n)$. We also extended our one-dimensional data structure to $d$ dimensions, where $d \ge 2$ is an arbitrary constant. The generalized structure occupies $O(n \lg^{d-1} n)$ space, has $O((\lg^d n)/\alpha)$ query time, and supports updates in $O((\lg^d n)/\alpha)$ amortized time. It would be interesting to determine whether either the space or the query time can be improved for the higher-dimensional data structure.

References

1. Andersson, A., Miltersen, P., Thorup, M.: Fusion trees can be implemented with AC^0 instructions only. Theoretical Computer Science 215(1-2), 337–344 (1999)
2. Andersson, A., Thorup, M.: Dynamic ordered sets with exponential search trees. Journal of the ACM 54 (June 2007)
3. Arge, L., Vitter, J.S.: Optimal external memory interval management. SIAM Journal on Computing 32(6), 1488–1508 (2003)
4. Bentley, J.: Multidimensional divide-and-conquer. Communications of the ACM 23(4), 214–229 (1980)
5. Blum, M., Floyd, R., Pratt, V., Rivest, R., Tarjan, R.: Time bounds for selection. Journal of Computer and System Sciences 7(4), 448–461 (1973)
6. Boyer, R.S., Moore, J.S.: MJRTY - A fast majority vote algorithm. In: Boyer, R.S. (ed.) Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 105–117. Automated Reasoning Series, Kluwer, Dordrecht, The Netherlands (1991)
7. Bozanis, P., Kitsios, N., Makris, C., Tsakalidis, A.: New upper bounds for generalized intersection searching problems. In: Proceedings of the 22nd International Colloquium on Automata, Languages and Programming. LNCS, vol. 944, pp. 464–474. Springer (1995)
8. Chazelle, B.: Functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing 17(3), 427–462 (1988)
9. De Berg, M., Haverkort, H.: Significant-presence range queries in categorical data. In: Proceedings of the 8th International Workshop on Algorithms and Data Structures. LNCS, vol. 2748, pp. 462–473. Springer (2003)
10. Demaine, E., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Proceedings of the 10th European Symposium on Algorithms. LNCS, vol. 2461, pp. 11–20. Springer (2002)
11. Dietz, P.: Optimal algorithms for list indexing and subset rank. In: Dehne, F., Sack, J., Santoro, N. (eds.) Algorithms and Data Structures, LNCS, vol. 382, pp. 39–46. Springer (1989)
12. Durocher, S., He, M., Munro, J.I., Nicholson, P.K., Skala, M.: Range majority in constant time and linear space. In: Proceedings of the 38th International Colloquium on Automata, Languages, and Programming. LNCS, vol. 6755, pp. 244–255. Springer (2011)
13. Durocher, S., He, M., Munro, J.I., Nicholson, P.K., Skala, M.: Range majority in constant time and linear space. Information and Computation (2012), to appear.
14. Elmasry, A., He, M., Munro, J.I., Nicholson, P.K.: Dynamic range majority data structures. In: Proceedings of the 22nd International Symposium on Algorithms and Computation. LNCS, vol. 7074, pp. 150–159 (2011)
15. Fredman, M., Willard, D.: Surpassing the information theoretic bound with fusion trees. Journal of Computer and System Sciences 47(3), 424–436 (1993)
16. Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding Frequent Elements in Compressed 2D Arrays and Strings. In: Proceedings of the 18th Symposium on String Processing and Information Retrieval, LNCS, vol. 7024. Springer (2011)
17. Gagie, T., Kärkkäinen, J.: Counting colours in compressed strings. In: Proceedings of the 22nd Annual Symposium on Combinatorial Pattern Matching. LNCS, vol. 6661, pp. 197–207. Springer (2011)
18. Gagie, T., Navarro, G., Puglisi, S.: Colored range queries and document retrieval. In: Chavez, E., Lonardi, S. (eds.) String Processing and Information Retrieval, LNCS, vol. 6393, pp. 67–81. Springer (2010)
19. Giora, Y., Kaplan, H.: Optimal dynamic vertical ray shooting in rectilinear planar subdivisions. ACM Transactions on Algorithms 5, 28:1–28:51 (July 2009)
20. Gupta, P., Janardan, R., Smid, M.: Further Results on Generalized Intersection Searching Problems: Counting, Reporting, and Dynamization. Journal of Algorithms 19(2), 282–317 (1995)
21. Husfeldt, T., Rauhe, T.: New lower bound techniques for dynamic partial sums and related problems. SIAM Journal on Computing 32(3), 736–753 (2003)
22. Karp, R., Shenker, S., Papadimitriou, C.: A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems 28(1), 51–55 (2003)
23. Karpinski, M., Nekrich, Y.: Searching for frequent colors in rectangles. In: Proceedings of the 20th Canadian Conference on Computational Geometry. pp. 11–14 (2008)
24. Lai, Y., Poon, C., Shi, B.: Approximate colored range and point enclosure queries. Journal of Discrete Algorithms 6(3), 420–432 (2008)
25. Misra, J., Gries, D.: Finding repeated elements. Science of Computer Programming 2(2), 143–152 (1982)
26. Mortensen, C.W.: Data Structures for Orthogonal Intersection Searching and Other Problems. Ph.D. thesis, IT University of Copenhagen (2006)
27. Nekrich, Y.: Space efficient dynamic orthogonal range reporting. Algorithmica 49(2), 94–108 (2007)
28. Nekrich, Y.: Data structures for approximate orthogonal range counting. In: Proceedings of the 20th International Symposium on Algorithms and Computation. LNCS, vol. 5878, pp. 183–192. Springer (2009)
29. Tsakalidis, A.: The nearest common ancestor in a dynamic tree. Acta Informatica 25(1), 37–54 (1988)
30. Wei, Z., Yi, K.: Beyond simple aggregates: indexing for summary queries. In: Proceedings of the 2011 ACM SIGMOD/PODS Conference. pp. 117–128. ACM (2011)
31. Wilkinson, B.: Adaptive Range Counting and Other Frequency-Based Range Query Problems. Ph.D. thesis, University of Waterloo (2012)
