Succinct Representations of Permutations and Functions

J. Ian Munro (a), Rajeev Raman (b), Venkatesh Raman (c), S. Srinivasa Rao (d, ∗)

a School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
b Department of Computer Science, University of Leicester, Leicester, LE1 7RH, UK
c Institute of Mathematical Sciences, Chennai, 600 113, India
d School of Computer Science and Engineering, Seoul National University, Seoul, 151-744, Republic of Korea

Abstract

We investigate the problem of succinctly representing an arbitrary permutation, $\pi$, on $\{0, \ldots, n-1\}$ so that $\pi^k(i)$ can be computed quickly for any $i$ and any (positive or negative) integer power $k$. A representation taking $(1+\epsilon)n\lg n + O(1)$ bits suffices to compute arbitrary powers in constant time, for any positive constant $\epsilon \le 1$. A representation taking the optimal $\lceil \lg n! \rceil + o(n)$ bits can be used to compute arbitrary powers in $O(\lg n/\lg\lg n)$ time.

We then consider the more general problem of succinctly representing an arbitrary function, $f : [n] \to [n]$, so that $f^k(i)$ can be computed quickly for any $i$ and any integer power $k$. We give a representation that takes $(1+\epsilon)n\lg n + O(1)$ bits, for any positive constant $\epsilon \le 1$, and computes arbitrary positive powers in constant time. It can also be used to compute $f^k(i)$, for any negative integer $k$, in optimal $O(1 + |f^k(i)|)$ time.

We place emphasis on the redundancy, or the space beyond the information-theoretic lower bound that the data structure uses in order to support operations efficiently. A number of lower bounds have recently been shown on the redundancy of data structures. These lower bounds confirm the space-time optimality of some of our solutions.
Furthermore, the redundancy of one of our structures "surpasses" a recent lower bound by Golynski [Golynski, SODA 2009], thus demonstrating the limitations of this lower bound.

Keywords: Succinct data structures, Space redundancy, Permutations, Functions, Benes network, Succinct tree representations, Level ancestor queries

✩ Work supported in part by UISTRF project 2001.04/IT.
✩✩ Preliminary versions of these results have appeared in the Proceedings of the International Colloquium on Automata, Languages and Programming (ICALP) in 2003 and 2004.
∗ Corresponding author.
Email addresses: imunro@uwaterloo.ca (J. Ian Munro), rr29@leicester.ac.uk (Rajeev Raman), vraman@imsc.res.in (Venkatesh Raman), ssrao@cse.snu.ac.kr (S. Srinivasa Rao)

Preprint submitted to Elsevier, September 12, 2018

1. Introduction

For an arbitrary function $f$ from $[n] = \{0, \ldots, n-1\}$ to $[n]$, define $f^k(i)$, for all $i \in [n]$ and any integer $k$, as follows:

$$f^k(i) = \begin{cases} i & \text{when } k = 0,\\ f(f^{k-1}(i)) & \text{when } k > 0,\\ \{\, j \mid f^{-k}(j) = i \,\} & \text{when } k < 0. \end{cases}$$

We consider the following problem: we are given a specific and arbitrary (static) function $f$ from $[n]$ to $[n]$ that arises in some application. We want to represent $f$ (after preprocessing $f$) in a data structure that, given $k$ and $i$ as parameters, rapidly returns the value of $f^k(i)$. For simplicity, in the rest of the paper we assume that the given number $k$ is bounded by some polynomial in $n$. Our interest is in succinct, or highly space-efficient, representations of such functions, whose space usage is close to the information-theoretic lower bound for representing such a function. Since there are $n^n$ functions from $[n]$ to $[n]$, such a function cannot be represented in fewer than $\lceil n\lg n \rceil$ bits.¹
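To make the definition of $f^k(i)$ concrete, here is a minimal Python sketch of the naive approach that stores nothing beyond $f$ itself. It assumes $f$ is given as a plain list `f` with `f[i]` $= f(i)$; the function name `power` is illustrative only. Positive powers take $k$ steps, and negative powers are found by brute force over all $j$, which is exactly the kind of cost the data structures in this paper are designed to avoid.

```python
from typing import List, Set, Union


def power(f: List[int], k: int, i: int) -> Union[int, Set[int]]:
    """Naive evaluation of f^k(i) for a function f: [n] -> [n] given as a list.

    k >= 0: apply f to i exactly k times (O(k) steps, no extra space).
    k < 0:  return the set {j | f^{-k}(j) = i}, found by trying every j.
    """
    n = len(f)
    if k >= 0:
        for _ in range(k):
            i = f[i]
        return i
    preimages = set()
    for j in range(n):
        y = j
        for _ in range(-k):  # compute f^{-k}(j) step by step
            y = f[y]
        if y == i:
            preimages.add(j)
    return preimages
```

For example, with `f = [1, 2, 0, 2]`, `power(f, 2, 0)` is `2` and `power(f, -1, 2)` is the set `{1, 3}`; for a non-surjective $f$ a negative power can be empty, as with `power(f, -1, 3)`.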
Any amount of memory used by a data structure that represents such a function, above and beyond this lower bound, is termed the redundancy of the data structure. We also consider the case where $f$ is given as a "black box", i.e., the data structure is given access to a routine to evaluate $f(i)$ for any $i \in [n]$; in this case any amount of memory whatsoever used by the data structure is its redundancy. The fundamental aim is to understand precisely the minimum redundancy required to support operations rapidly.

Clearly, the above problem is trivial if space is not an issue. To facilitate the computation in constant time, one could store $f^k(i)$ for all $i$ and $k$ ($|k| \le n$, along with some extra information), but that would require $\Omega(n^2)$ words of memory. The most natural compromise is to retain the values of $f^k(i)$ where $2 \le k \le n$ is a power of 2. This $\Theta(n\lg n)$-word representation easily yields a logarithmic evaluation scheme. Unfortunately, this representation not only uses non-linear space (and is relatively slow) but also does not support queries for the negative powers of $f$ efficiently.

Given $f$ in a natural representation (the sequence $f(i)$ for $i = 0, \ldots, n-1$, or as a black box), a highly space-efficient solution is to store no additional data structures (zero redundancy), and to compute $f^k(i)$ in $k$ steps, for positive $k$. However, this is unacceptably slow for large $k$, and still does not address the issue of negative powers.

1.1. Results

Our results are primarily in the unit-cost RAM with word size $\Theta(\lg n)$ bits, where we measure the running time and the bits of space used by an algorithm. We also consider the "black-box" model, also known as the systematic model [10], where we look at the number of evaluations of $f$ in addition to the running time and space (in bits) used by the algorithm.
Lower bound results are discussed either in the black-box model or in the cell-probe model, where we consider the space (in bits) used by the algorithm, and the running time is the number of $w$-bit words of the data structure read by the algorithm to answer a query (all other computation is free). Finally, we also briefly consider the bit-probe model, which is the cell-probe model with $w = 1$ [24].

¹ lg denotes the logarithm base 2.

1.1.1. Permutations

We begin by considering a special case, where the function is a permutation (abbreviated hereafter as a perm [22]) of $[n] = \{0, \ldots, n-1\}$. This turns out not only to be an interesting sub-case in its own right, but is also essential to our solution to the general problem. Note that for storing perms, the information-theoretic lower bound is $P(n) = \lceil \lg n! \rceil \approx n\lg n - 1.44n$ bits, so the obvious representation (an array storing $\pi(i)$ for $i = 0, \ldots, n-1$) has redundancy $\Theta(n)$ bits (and of course does not support inverses or powers). We obtain the following results for representing perms:

1. We give a representation that uses $P(n) + O(n(\lg\lg n)^5/(\lg n)^2)$ bits, and supports $\pi()$ and $\pi^{-1}()$ in $O(\lg n/\lg\lg n)$ time.
2. In the "black box" model, where access to the perm is only through the $\pi()$ operation, we show how to support $\pi^{-1}()$ in $O(t)$ time and at most $t+1$ evaluations of $\pi()$, using $(n/t)(\lg n + \lg t + O(1))$ bits, for any $1 \le t \le n$.
3. Given a structure that represents a perm $\pi$ in space $S(n)$ bits, and supports $\pi()$ and $\pi^{-1}()$ in time $t_f(n)$ and $t_i(n)$ respectively, we show how to represent a given perm $\pi'$ on $[n]$ in space $S(n) + O(n\lg n/\lg\lg n)$ bits (or $S(n) + O(\sqrt{n}\lg n)$ bits) and support arbitrary powers of $\pi'$ in $t_f(n) + t_i(n) + O(1)$ time (or $t_f(n) + t_i(n) + O(\lg\lg n)$ time, respectively).
As corollaries, we get the following representations of perms:

4. one that uses $P(n) + O((n/t)\lg n)$ bits, and supports $\pi()$ in $O(1)$ time and $\pi^{-1}()$ in $O(t)$ time, for any $t \le \lg n$;
5. one that uses $P(n) + O((n/t)\lg n)$ bits and supports $\pi^k()$ in $O(t)$ time for arbitrary $k$, for any $t \le \lg n$;
6. one that uses $P(n) + O(n(\lg\lg n)^5/(\lg n)^2)$ bits and supports $\pi^k()$ in $O(\lg n/\lg\lg n)$ time for arbitrary $k$.

Related Work

Perms are fundamental in computer science and have been the focus of extensive study. A number of papers have dealt with issues pertaining to perm generation, membership in perm groups, etc. There has also been work on space-efficient representation of restricted classes of perms, such as the perms representing the lexicographic order of the suffixes of a string [17, 18], or so-called approximately min-wise independent perms, used for document similarity estimation [6]. Our paper is the first to study the space-efficient representation of general perms so that general powers can be computed efficiently (however, see the discussion of Hellman's work in Section 1.2).

Recently, Golynski [14, 15] showed a number of lower bounds for the redundancy of permutation representations. He showed a space lower bound of $\Omega((n/t)\lg(n/t))$ bits for Item (2) for any algorithm that evaluates $\pi$ at most $t < n/2$ times [15, Theorem 17]. Thus, (2) is asymptotically optimal for all $t = n^{1-\Omega(1)}$. Furthermore, Golynski [14] showed that the redundancy of (4) is asymptotically optimal in the cell-probe model with word size $w = \lg n$: specifically, that any perm representation which supports $\pi()$ in $O(1)$ probes and $\pi^{-1}()$ in $t$ probes, for any $t \le (1/16)(\lg n/\lg\lg n)$, must have asymptotically the same redundancy as (4).
He also shows that any perm representation that supports both $\pi()$ and $\pi^{-1}()$ in at most $t$ cell probes, for any $t \le (1/16)(\lg n/\lg\lg n)$, must have redundancy $\Omega(n(\lg\lg n)^2/\lg n)$. In the preliminary version of this paper [26], a perm representation was given that supported $\pi()$ and $\pi^{-1}()$ in $O(\lg n/\lg\lg n)$ time, and had redundancy $\Theta(n(\lg\lg n)^2/\lg n)$. Golynski suggested that the result of [26] was "optimal up to constant factor in the cell probe model". However, we note that the lower bound is quite sensitive to the precise constant in the number of probes: our result (1) obtains an asymptotically smaller redundancy by using over $2\lg n/\lg\lg n$ cell probes.

1.1.2. Functions

For general functions from $[n]$ to $[n]$, our main result is that we reduce the problem of representing functions to that of representing permutations, with $O(n)$ additional bits. As corollaries, we get the following representations of functions, both of which use close to the information-theoretic minimum amount of space, and answer queries in optimal time:

1. one that uses $n\lg n\,(1 + 1/t) + O(1)$ bits, and supports $f^k(i)$ in $O(1 + |f^k(i)| \cdot t)$ time for any integer $k$, and for any $t \le \lg n/\lg\lg n$;
2. one that uses $n\lg n + O(n)$ bits and supports $f^k(i)$ in $O((1 + |f^k(i)|) \cdot (\lg n/\lg\lg n))$ time, for any integer $k$.

Along the way, we show that an unlabelled static $n$-node rooted tree can be represented using the optimal $2n + o(n)$ bits of space to answer level-ancestor queries (given a node $x$ and a number $k$, report the $k$-th ancestor of $x$) and level-successor/level-predecessor queries (report the next/previous node at the same level as the given node) in constant time. We represent the tree in $2n$ bits as a balanced parenthesis (BP) sequence. The key technical contribution is to provide an $o(n)$-bit index for excess search in a BP sequence.
For a position $i$ in a BP sequence, excess($i$) is the number of unclosed open parentheses up to that position (this corresponds to the depth of a node in the tree represented by the BP sequence). The operation next-excess($i, k$), starting at a position $i$ in the BP sequence, finds the next position $j$ whose excess is $k$; we support next-excess in $O(1)$ time provided that $j$'s excess is at most $(\lg n)^c$ below or above the excess of $i$ (i.e., $|k - \text{excess}(i)| = O((\lg n)^c)$), for any fixed constant $c \ge 0$. To add standard navigational operations, one can use existing $o(n)$-bit indices for BP sequences [25].

Related work

The problem of representing a function $f$ space-efficiently in the "black box" model, so that $f^{-1}$ can be computed quickly, was considered by Hellman [20]. Specialized to perms, Hellman's idea is similar to our "black box" representation for representing a perm and its inverse, modulo some implementation details. The version of the function powers problem that we consider is different: whereas Hellman attempts, given $x$, to find any $y$ such that $f(y) = x$, we enumerate all such $y$. Furthermore, our solution does not use the "black box" model, and assumes space for representing $f$ in its entirety, which is both unnecessary and prohibitive in Hellman's context.

Representing trees to support level-ancestor queries is a well-studied problem. Solutions with $O(n)$ preprocessing time and $O(1)$ query time were given by Dietz [8], Berkman and Vishkin [5] and Alstrup and Holm [1]. A much simpler solution was given by Bender and Farach-Colton [3]. For a tree on $n$ nodes, all these solutions require $\Theta(n)$ words, or $\Theta(n\lg n)$ bits, to represent the tree itself, and the additional data structures stored to support level-ancestor queries also take $\Theta(n)$ words (level-successor/predecessor is trivial using $\Theta(n)$ words).
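The excess and next-excess operations on a BP sequence, defined at the start of this part, can be pinned down with a naive Python sketch. It assumes the BP sequence is a string of `(` and `)` characters and that excess($i$) counts position $i$ itself, so the open parenthesis of a node at depth $d$ (root at depth 1) has excess $d$; the names `excess_array` and `next_excess` are hypothetical. Next-excess is answered by a linear scan, so this is only a reference for the semantics, not the $o(n)$-bit constant-time index that the paper contributes.

```python
from typing import List, Optional


def excess_array(bp: str) -> List[int]:
    """excess[i] = (number of '(') - (number of ')') in bp[0..i], inclusive."""
    out, e = [], 0
    for ch in bp:
        e += 1 if ch == "(" else -1
        out.append(e)
    return out


def next_excess(bp: str, i: int, k: int) -> Optional[int]:
    """Smallest j > i with excess(j) == k, or None; a plain O(n) scan."""
    ex = excess_array(bp)
    for j in range(i + 1, len(bp)):
        if ex[j] == k:
            return j
    return None
```

For the tree whose root has two children, the first of which has one child, the BP string is `((())())` and the excess values are `[1, 2, 3, 2, 1, 2, 1, 0]`; `next_excess(bp, 2, 1)` returns `4`, the position where the root's first child is closed.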
As noted above, our interest is in succinct tree representations. We make a few remarks about such representations, so as to better understand our contribution in relation to others. Succinct tree representations can be viewed as split into a tree encoding that takes $2n + o(n)$ bits, and an index of $o(n)$ bits for that tree encoding. There are many tree encodings, including BP [25], DFUDS [4], LOUDS [21] and Partition [12], and it is not known whether they are equivalent, i.e., whether there are operations that have $o(n)$-sized indices for one tree encoding and not the other. Another feature is that different tree encodings impose different numberings on the nodes of the tree. Therefore, a result showing a succinct index for a particular operation in (say) BP does not imply the existence of a succinct index for that operation in (say) LOUDS. This matters from an application perspective because the only way to get a space-efficient data structure that simultaneously supports operations $a$ and $b$, where $a$ and $b$ are known to be supported only by (say) LOUDS- and BP-based tree encodings respectively, would be to encode the tree twice, once each in LOUDS and BP, and to maintain the correspondence between the LOUDS and BP numberings, which would severely affect the space usage.

We provide $o(n)$-bit BP indices for the operations of level-ancestor and level-successor/predecessor, via excess search. Geary et al. [12] gave an $o(n)$-bit index for supporting level-ancestor in $O(1)$ time using the Partition encoding, but they did not provide support for level-successor/predecessor; an $o(n)$-bit index for supporting these queries was announced by He et al. [19].
Very recently, Sadakane and Navarro [33] gave an alternative algorithm for excess search in BP and showed that excess search together with range-minimum queries suffices to support a wide variety of tree operations, among other things. Their excess index is of smaller size, but seems not to support search for excess values greater than the starting point.

1.2. Motivation

There are a number of motivations for succinct data structures in general, many to do with text indexing or representing huge graphs [17, 21, 25, 32]. Work on succinct representation of a perm and its inverse was, for one of the authors, originally motivated by a data warehousing application. Under the indexing scheme in the system, the perm corresponding to the rows of a relation sorted under any given key was explicitly stored. It was realized that to perform certain joins, the inverse of a segment of this perm was precisely what was required. The perms in question occupied a substantial portion of the several hundred gigabytes in the indexing structure, and doubling this space requirement (for the perm inverses) for the sole purpose of improving the time to compute certain joins was inappropriate.

Since the publication of the preliminary versions of this paper, the results herein have found numerous applications, most notably to the problem of supporting rank and select operations over strings over large alphabets [16]. Other applications arise in bioinformatics [2].

The more general problem of quickly computing $\pi^k()$ also has a number of applications. An interesting one is determining the $r$-th root of a perm [30]. Our techniques not only solve the $r$-th power problem immediately, but can also be used to find the $r$-th root, if one exists. Inverting a "one-way" function, particularly in the scenario considered by Hellman [20], is a fundamental task in cryptography.
Finally, very recently a number of results have been shown that focus on the redundancy of succinct data structures for various objects, including [10, 13, 14, 29]; we have already mentioned lower bounds on the redundancy of representing perms in particular. This has been accompanied by some remarkable results on very low-redundancy data structures. For example, consider the simple task of representing a sequence of $n$ integers from $[r]$, for some $r \ge 1$, to permit random access to the $i$-th integer. The naive bound of $n\lceil \lg r \rceil$ bits has redundancy $\Theta(n)$ bits relative to the optimal $\lceil n\lg r \rceil$ bits. Following the first non-trivial result on this topic ([26, Theorem 3]), a line of work culminated in Dodis et al.'s remarkable result that $O(1)$-time access can be obtained with effectively zero redundancy [9]. We also note that the redundancy is often important in practice, as the "lower-order" redundancy term in the space usage is often significant for practical input sizes [11].

The remainder of the paper is organized as follows. The next section describes some previous results on indexable dictionaries used in later sections. Section 3 deals with permutation representations. In Section 3.1 we describe the "shortcut" method, and Section 3.2 describes an optimal-space representation based on Benes networks. Both of these are representations supporting $\pi()$ and $\pi^{-1}()$ queries, and we consider the optimality of these solutions in Section 3.3. In Section 3.4 we consider representations that support arbitrary powers. Sections 4 and 5 deal with general function representations. Section 4 outlines new operations on balanced parenthesis sequences which lead to an optimal-space tree representation that supports level-ancestor queries along with various other navigational operations in constant time.
Section 5 describes a succinct representation of a function that supports computing arbitrary powers in optimal time.

2. Preliminaries

Given a set $S \subseteq [m]$, $|S| = n$, define the following operations:

rank($x, S$): Given $x \in [m]$, return $|\{y \in S \mid y < x\}|$.
select($i, S$): Given $i \in [n]$, return the $(i+1)$-st smallest element in $S$.
p-rank($x, S$): Given $x \in [m]$, return $-1$ if $x \notin S$ and rank($x, S$) otherwise (the partial rank operation).

Furthermore, define the following data structures:

• A fully indexable dictionary (FID) representation for $S$ supports rank($x, S$), select($i, S$), rank($x, \bar{S}$) and select($i, \bar{S}$) in $O(1)$ time.
• An indexable dictionary (ID) representation for $S$ supports p-rank($x, S$) and select($i, S$) in $O(1)$ time.

Raman, Raman and Rao [32] show the following:

Theorem 2.1. On the RAM model with word size $O(\lg m)$ bits:
(a) There is an FID for a set $S \subseteq [m]$ of size $n$ using at most $\lceil \lg \binom{m}{n} \rceil + O(m\lg\lg m/\lg m)$ bits.
(b) There is an ID for a set $S \subseteq [m]$ of size $n$ using at most $\lceil \lg \binom{m}{n} \rceil + o(n) + O(\lg\lg m)$ bits.

3. Representing Permutations

3.1. The Shortcut Method

We first provide a space-efficient representation (based on Hellman's idea) that supports $\pi^{-1}()$ in the "black box" model. Recall that in the "black box" model, the perm is accessible only through calls of $\pi()$. Let $t \ge 2$ be a parameter. We trace the cycle structure of the perm $\pi$, and for every cycle whose length $k$ is greater than $t$, the key idea is to associate with some selected elements a shortcut pointer to an element $t$ positions prior to it. Specifically, let $c_0, c_1, \ldots, c_{k-1}$ be the elements of a cycle of the perm $\pi$ such that $\pi(c_i) = c_{(i+1) \bmod k}$, for $i = 0, 1, \ldots, k-1$. We associate shortcut pointers with the indices whose $\pi$ values are $c_{it}$, for $i = 0, 1, \ldots$
, $l = \lfloor k/t \rfloor$, and the shortcut pointer value at $c_{it}$ stores the index whose $\pi$ value is $c_{((i-1) \bmod (l+1))t}$, for $i = 0, 1, \ldots, l$ (see Fig. 1). Let $s \le n/t$ be the number of shortcut pointers after doing this for every cycle of the perm, and let $d_1 < d_2 < \ldots < d_s$ be the elements associated with shortcut pointers.

Figure 1: Shortcut method. Solid lines denote the perm, and the dotted lines denote the shortcut pointers. The shaded nodes indicate the positions having shortcut pointers.

We store the set $\{d_i\}$ in a data structure $D$ that is an instance of the indexable dictionary (ID) of Theorem 2.1(b). Given an index $i$, $D$ allows us to test whether a particular element has a shortcut pointer associated with it, and if so, returns its position in the set $\{d_i\}$. We store the sequence $\{s_i\}$, where $s_i$ is the shortcut pointer associated with $d_i$, in an array $S$. The following procedure computes $\pi^{-1}(x)$ for a given $x$:

    i := x; jumped := false
    while π(i) ≠ x do
        if not jumped and p-rank(i, D) = r ≥ 0 then   // i ∈ D and its rank r, from one query to D
            i := S[r]; jumped := true                  // follow a shortcut at most once
        else
            i := π(i)
    endwhile
    return i

Since we have a shortcut pointer for every $t$ elements of a cycle, the walk from $x$ meets an element of $D$ within $t$ steps (or, on a cycle of length at most $t$, reaches $\pi^{-1}(x)$ directly); the single shortcut then jumps back to a point on the cycle at or before $\pi^{-1}(x)$, from where the walk proceeds forward to $\pi^{-1}(x)$. Following a shortcut only once matters, since the target of a shortcut itself carries a shortcut. Hence the number of $\pi()$ evaluations made by the algorithm is at most $t+1$, and all other operations take $O(1)$ time by Theorem 2.1. By the standard approximation $\lceil \lg \binom{n}{s} \rceil = s(\lg(n/s) + O(1))$, we see that the space used by $D$ is at most $(n/t)(\lg t + O(1))$ bits. The space used by $S$ is clearly $s\lceil \lg n \rceil = s(\lg n + O(1))$. Thus we have:

Theorem 3.1. Given an arbitrary permutation $\pi$ on $[n]$ as a "black box", and an integer $1 \le t \le n$, there is a data structure that uses at most $(n/t)(\lg n + \lg t + O(1))$ bits that allows $\pi^{-1}()$ to be computed in at most $t+1$ evaluations of $\pi()$, plus $O(t)$ time.

We get the following easy corollary:

Corollary 3.1.
There is a representation of an arbitrary perm $\pi$ on $[n]$ using at most $P(n) + O((n/t)\lg n)$ bits, for any $1 \le t \le \lg n$, that supports $\pi()$ in $O(1)$ time and $\pi^{-1}()$ in $O(t)$ time.

Proof. We represent $\pi$ naively as an array taking $n\lceil \lg n \rceil = P(n) + O(n)$ bits, which allows $\pi()$ to be computed in $O(1)$ time, and apply Theorem 3.1. The space bound follows since, for $t \le \lg n$, $(n/t)(\lg n + \lg t + O(1)) = \Omega(n)$, so the $O(n)$ term of the array is absorbed.

Remark: Choosing $t = \lceil 1/\epsilon \rceil$ for any constant $\epsilon > 0$ in Corollary 3.1, we get a representation of a permutation $\pi$ on $[n]$ in $(1+\epsilon)n\lg n$ bits where $\pi()$ and $\pi^{-1}()$ both take $O(1)$ time.

3.2. Representations based on the Benes network

3.2.1. The Benes Network

The results in this section are based on the Benes network, a communication network composed of a number of switches, which we now briefly outline (see [23] for details). Each switch has two inputs $x_0$ and $x_1$ and two outputs $y_0$ and $y_1$, and can be configured either so that $x_0$ is connected to $y_0$ (i.e., a packet that is input along $x_0$ comes out of $y_0$) and $x_1$ is connected to $y_1$, or the other way around. An $r$-Benes network has $2^r$ inputs and $2^r$ outputs, and is defined as follows. For $r = 1$, the Benes network is a single switch with two inputs and two outputs. An $(r+1)$-Benes network is composed of $2^{r+1}$ switches and two $r$-Benes networks, connected as shown in Fig. 2(a). A particular setting of the switches of a Benes network realises a perm $\pi$ if a packet introduced at input $i$ comes out at output $\pi(i)$, for all $i$ (Fig. 2(b)). The following properties are either easy to verify or well known [23]:

• An $r$-Benes network has $r2^r - 2^{r-1}$ switches, and every path from an input to an output passes through $2r - 1$ switches;
• For every perm $\pi$ on $[2^r]$ there is a setting of the switches of an $r$-Benes network that realises $\pi$.
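Both properties above, and the path tracing used next for Proposition 3.1, can be checked with a small Python sketch. It assumes the standard recursive layout: input switch $j$ takes inputs $2j, 2j+1$ and feeds input $j$ of the upper and lower $r$-Benes subnetworks, and output switch $k$ collects output $k$ of each subnetwork and drives outputs $2k, 2k+1$. The switch-setting procedure is the classic "looping" algorithm, which this section does not spell out (it cites [23]); all function names are hypothetical, and the nested tuples stand in for the flat list of setting bits a succinct version would store.

```python
from typing import List, Tuple, Union

# A network is a single 2x2 switch (int: 0 = straight, 1 = cross) or a tuple
# (input-switch settings, output-switch settings, upper subnetwork, lower subnetwork).
Net = Union[int, Tuple[List[int], List[int], "Net", "Net"]]


def benes_route(perm: List[int]) -> Net:
    """Set an r-Benes network (len(perm) == 2**r, r >= 1) so that it realises perm."""
    n = len(perm)
    if n == 2:
        return perm[0]  # 0 for [0, 1] (straight), 1 for [1, 0] (cross)
    inv = [0] * n
    for i, p in enumerate(perm):
        inv[p] = i
    side = [-1] * n  # 0: packet uses the upper subnetwork, 1: the lower one
    for start in range(n):
        i = start
        while side[i] == -1:  # 2-colour one cycle of the constraint graph
            side[i] = 0
            j = inv[perm[i] ^ 1]  # shares i's output switch: must use the other side
            side[j] = 1
            i = j ^ 1  # shares j's input switch: must use the other side
    half = n // 2
    in_set = [side[2 * j] for j in range(half)]
    out_set = [side[inv[2 * k]] for k in range(half)]
    up, low = [0] * half, [0] * half
    for i in range(n):
        (up if side[i] == 0 else low)[i // 2] = perm[i] // 2
    return (in_set, out_set, benes_route(up), benes_route(low))


def benes_forward(net: Net, i: int) -> int:
    """pi(i): follow the packet entering at input i, one switch per layer."""
    if isinstance(net, int):
        return i ^ net
    in_set, out_set, up, low = net
    j, b = divmod(i, 2)
    s = b ^ in_set[j]  # subnetwork the packet enters
    k = benes_forward(up if s == 0 else low, j)
    return 2 * k + (s ^ out_set[k])


def benes_backward(net: Net, o: int) -> int:
    """pi^{-1}(o): trace the path back from output o."""
    if isinstance(net, int):
        return o ^ net
    in_set, out_set, up, low = net
    k, b = divmod(o, 2)
    s = b ^ out_set[k]
    j = benes_backward(up if s == 0 else low, k)
    return 2 * j + (s ^ in_set[j])


def benes_switch_count(net: Net) -> int:
    if isinstance(net, int):
        return 1
    in_set, out_set, up, low = net
    return len(in_set) + len(out_set) + benes_switch_count(up) + benes_switch_count(low)
```

As a usage example, reading the permutation of Fig. 2(b) as the list `[4, 7, 0, 6, 1, 5, 2, 3]` of values $\pi(0), \ldots, \pi(7)$ (an assumed reading of the figure), `benes_route` produces a 3-Benes setting with $3 \cdot 8 - 4 = 20$ switches on which `benes_forward` reproduces $\pi$.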
Figure 2: The Benes network construction and an example: (a) construction of an $(r+1)$-Benes network from two $r$-Benes networks; (b) a Benes network realising the permutation (4 7 0 6 1 5 2 3).

Clearly, Benes networks may be used to represent perms. If $n = 2^r$, a representation of a perm $\pi$ on $[n]$ may be obtained by configuring an $r$-Benes network to realise $\pi$ and then listing the settings of the switches in some canonical order (e.g., level order). This represents $\pi$ using $r2^r - 2^{r-1} = n\lg n - n/2$ bits. Given $i$, one can trace the path taken by a packet at input $i$ by inspecting the appropriate bits in this representation, and thereby compute $\pi(i)$; by tracing the path back from output $i$ we can likewise compute $\pi^{-1}(i)$. The time taken is clearly $O(\lg n)$; indeed, the algorithm only makes $O(\lg n)$ bit-probes. To summarize:

Proposition 3.1. When $n = 2^r$ for some integer $r > 0$, there is a representation of an arbitrary perm $\pi$ on $[n]$ that uses $n\lg n - n/2$ bits and supports the operations $\pi()$ and $\pi^{-1}()$ in $O(\lg n)$ time.

However, the Benes network has two shortcomings from our viewpoint. Firstly, it is defined only for values of $n$ that are powers of 2; to represent a perm with $n$ not a power of 2, rounding $n$ up to the next power of 2 could double the space usage, which is unacceptable. Secondly, even for $n$ a power of 2, representing a perm using a Benes network uses $P(n) + \Omega(n)$ bits. We now define a family of Benes-like networks that admit greater flexibility in the number of inputs, namely the $(q, r)$-Benes networks, for integers $r \ge 0$, $q > 1$.

Definition 3.1. A $q$-permuter is a communication network that has $q$ inputs and $q$ outputs, and realises any of the $q!$ perms of its inputs (an $r$-Benes network is a $2^r$-permuter).

Definition 3.2.
A $(q, r)$-Benes network is a $q$-permuter for $r = 0$, and for $r > 0$ it is composed of $q2^r$ switches and two $(q, r-1)$-Benes networks, connected together in exactly the same way as a standard Benes network.

Lemma 3.1. Let $q > 1$, $r \ge 0$ be integers and take $p = q2^r$. Then:
1. A $(q, r)$-Benes network consists of $qr2^r$ switches (arranged in $2r$ layers of $q2^{r-1}$ switches each) and $2^r$ $q$-permuters;
2. For every perm $\pi$ on $[p]$ there is a setting of the switches of a $(q, r)$-Benes network that realises $\pi$.

Proof. (1) follows by induction on $r$ from Definition 3.2, since the number of switches $S(r)$ satisfies $S(0) = 0$ and $S(r) = q2^r + 2S(r-1)$; (2) can be proved in the same way as for a standard Benes network.

We now consider representations based on $(q, r)$-Benes networks; a crucial component is the representation of the central $q$-permuters, which we address in the next subsection. Since we are not interested in designing communication networks as such, we focus instead on ways to represent the perms realised by the central $q$-permuters in optimal (or very close to optimal) space and to operate on them (specifically, to compute $\pi()$ and $\pi^{-1}()$ on the perms represented by the $q$-permuters) in the bit-probe, cell-probe or RAM model. This is sufficient to compute $\pi()$ and $\pi^{-1}()$ in the $(q, r)$-Benes network at large.

3.2.2. Representing Small Perms

In this section we consider the highly space-efficient representation of "small" perms to use as a central $q$-permuter in a $(q, r)$-Benes network. It is straightforward (as noted in Section 3.3) to represent a perm on $[q]$, $q = O(\lg n/\lg\lg n)$, and operate on it in the cell-probe model, or by table lookup in the RAM model. As we will see, the larger we can make our central $q$-permuters (while keeping optimal space and reasonable processing times), the lower the redundancy of our representation. With this in mind, we now give a method for asymptotically larger values of $q$.
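The switch count in Lemma 3.1(1) can be checked mechanically from the recursion in Definition 3.2. The Python sketch below (function name hypothetical) returns the number of switches and of central $q$-permuters; the accompanying check also confirms that a $(2, r-1)$-Benes network, counting each central 2-permuter as one switch, has the $r2^r - 2^{r-1}$ switches of a standard $r$-Benes network.

```python
from typing import Tuple


def benes_qr_counts(q: int, r: int) -> Tuple[int, int]:
    """(switches, central q-permuters) of a (q, r)-Benes network, per Definition 3.2:
    a single q-permuter when r == 0, otherwise q * 2**r switches plus two
    (q, r - 1)-Benes networks."""
    if r == 0:
        return 0, 1
    switches, permuters = benes_qr_counts(q, r - 1)
    return q * 2 ** r + 2 * switches, 2 * permuters
```

For instance, `benes_qr_counts(3, 2)` gives `(24, 4)`, matching $qr2^r = 3 \cdot 2 \cdot 4$ switches and $2^r = 4$ central 3-permuters.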
We use the following complexity bounds for integer multiplication and division using the fast Fourier transform [7]:

Lemma 3.2. Given a number $A$ occupying $m$ words and another number $B \le A$, one can compute the numbers $(A \bmod B)$ and $(A \operatorname{div} B)$ in $O(m\lg m)$ time.

Lemma 3.3. If $q \le (\lg n)^2/(\lg\lg n)^4$, then there is a representation of an arbitrary perm $\pi$ on $[q]$ using $P(q)$ bits that supports $\pi(i)$ and $\pi^{-1}(i)$ in $O(\lg n/\lg\lg n)$ time. This assumes access to a set of precomputed constants that depend on $q$ and can be stored in $O(q^2\lg q)$ bits, and also to precomputed tables of size $\sqrt{n}(\lg n)^{O(1)}$ bits.

Proof. We represent a perm $\pi$ over $[q]$ as a sequence $r(0), r(1), \ldots, r(q-1)$, where $r(0) = 0$ and, for $1 \le i < q$, $r(i) = |\{j < i \mid \pi(j) < \pi(i)\}|$ is the rank of $\pi(i)$ in the set $\{\pi(0), \pi(1), \ldots, \pi(i-1)\}$. This sequence is viewed as a $q$-digit number in a "mixed-radix" system, where the $i$-th digit $r(i)$ is from $[i+1]$, representing the integer $R = \sum_{i=0}^{q-1} i!\, r(i)$. The perm $\pi$ is encoded by storing $R$ in binary: since $R$ is an integer from $[q!]$, the space used by the encoding is $P(q)$ bits, and $R$ is stored in $m = O(\lg n/(\lg\lg n)^3)$ words. To compute $\pi()$ or $\pi^{-1}()$, we first decode the sequence $r(0), \ldots, r(q-1)$ from $R$ in $O(m(\lg m)^2)$ time, and from this sequence compute $\pi()$ and $\pi^{-1}()$ in $O(m\lg m)$ and $O(m)$ time respectively, for an overall running time of $O(m(\lg m)^2) = O(\lg n/\lg\lg n)$. We now describe these steps, assuming for simplicity that $q$ is a power of 2. To decode $R$, we first obtain representations $R'$ and $R''$ of the sequences of digits $r(q-1), r(q-2), \ldots, r(q/2)$ and $r(q/2-1), \ldots, r(0)$ as $R' = (R \operatorname{div} (q/2)!)$ and $R'' = (R \bmod (q/2)!)$ in $O(m\lg m)$ time, and recurse.
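The encoding, the divide-and-conquer decoding, and the backward scans in this proof can be sketched in Python as follows. This is a sketch under simplifying assumptions: Python's built-in big integers stand in for the FFT-based division of Lemma 3.2, the divisors $\mathrm{mid}!/\mathrm{lo}!$ are computed on the fly rather than precomputed, and the scans process one digit at a time instead of a chunk of $c$ digits per table lookup, so it illustrates correctness rather than the stated running times. All function names are hypothetical. The scans rely on the rule $\pi(i) \le x \iff r(i) \le x - \mathrm{under}(x, i+1)$, which holds because $\pi(i)$ and $\pi(0), \ldots, \pi(i-1)$ together contain exactly $x + 1 - \mathrm{under}(x, i+1)$ values that are at most $x$.

```python
from math import factorial
from typing import List


def encode_perm(pi: List[int]) -> int:
    """R = sum_i i! * r(i), with r(i) = |{j < i : pi(j) < pi(i)}|."""
    R = 0
    for i in range(len(pi)):
        r_i = sum(1 for j in range(i) if pi[j] < pi[i])
        R += factorial(i) * r_i
    return R


def decode_digits(R: int, lo: int, hi: int) -> List[int]:
    """Digits r(lo), ..., r(hi-1) of a value in which digit i has weight i!/lo!."""
    if hi - lo == 1:
        return [R]
    mid = (lo + hi) // 2
    upper, lower = divmod(R, factorial(mid) // factorial(lo))  # split like R', R''
    return decode_digits(lower, lo, mid) + decode_digits(upper, mid, hi)


def inverse_from_digits(r: List[int], x: int) -> int:
    """pi^{-1}(x), scanning i = q-1, ..., 0 while maintaining under(x, i+1)."""
    under = 0  # number of values <= x among pi(q-1), ..., pi(i+1)
    for i in range(len(r) - 1, -1, -1):
        d = x - under
        if r[i] == d:  # x not met yet, so pi(i) == x
            return i
        if r[i] < d:  # pi(i) < x
            under += 1
    raise ValueError("x is out of range")


def leq_from_digits(r: List[int], j: int, x: int) -> bool:
    """Is pi(j) <= x?"""
    under = 0
    for i in range(len(r) - 1, j, -1):
        if r[i] <= x - under:  # pi(i) <= x
            under += 1
    return r[j] <= x - under


def forward_from_digits(r: List[int], j: int) -> int:
    """pi(j) by binary search over x using leq_from_digits."""
    lo, hi = 0, len(r) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if leq_from_digits(r, j, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, $\pi = (3, 0, 4, 1, 2)$ has digits $r = (0, 0, 2, 1, 2)$ and is encoded as $R = 2 \cdot 2! + 1 \cdot 3! + 2 \cdot 4! = 58$; `decode_digits(58, 0, 5)` recovers the digits, from which both $\pi$ and $\pi^{-1}$ are read back.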
When recursing, note that $\lg R' - (\lg R)/2 = O(q)$ bits, so the lengths of $R'$ and $R''$ are equal to within $O(m/\lg m)$ words. Standard arithmetic, plus table lookup, is used once the integer to be decoded fits into a single word. Thus, the recurrence is:

$$T(m) = m\lg m + T(m_1) + T(m_2), \qquad T(1) = O(1),$$

where $m_1 + m_2 \le m + 1$ and $|m_j - m/2| = O(m/\lg m)$ (for $j = 1, 2$), which clearly solves to $O(m(\lg m)^2)$. (It is assumed that the divisors at each level of the recursion, such as $(q/2)!$ at the top level, $(q/4)!$ and $(3q/4)!/(q/2)! = (3q/4)(3q/4 - 1)\cdots(q/2 + 1)$ at the next level, etc., are precomputed; these depend on $q$ only, and are independent of the perm $\pi$.)

We partition the sequence $r(q-1), \ldots, r(0)$ into chunks of $c = \lceil \frac{1}{2}(\lg n/\lg q) \rceil$ consecutive numbers each; each chunk fits into a single word and the number of chunks is $O(m)$. Define under($x, i$) as the number of values in $\pi(q-1), \ldots, \pi(i)$ that are $\le x$. As $r(q-1) = \pi(q-1)$, under($x, q-1$) is immediate. Further observe that:

• if $r(i) = x - \text{under}(x, i+1)$ and $x$ does not occur among $\pi(q-1), \ldots, \pi(i+1)$, then $\pi(i) = x$;
• if $r(i) < x - \text{under}(x, i+1)$, then $\pi(i) < x$;
• if $r(i) > x - \text{under}(x, i+1)$, then $\pi(i) > x$.

Thus, under($x, i$) is easily computed from under($x, i+1$) and $r(i)$. Given under($x, i$) and a chunk $r(i-1), \ldots, r(i-c)$, one can perform all the following tasks in $O(1)$ time using table lookup:

• compute under($x, i-c$);
• determine whether there is a $j \in [i-c, i-1]$ such that $\pi(j) = x$;
• given a position $j \in [i-c, i-1]$, determine whether $\pi(j) \le x$ or $\pi(j) > x$.

This gives an $O(m)$-time algorithm for computing $\pi^{-1}()$ and an $O(m\lg m)$-time algorithm for computing $\pi()$ (via binary search).

3.2.3. Representing Larger Perms

We will now use the representation of Lemma 3.3 to represent larger permutations via the Benes network.
We begin by showing:

Proposition 3.2. For all integers $p > t \ge 1$ there is an integer $p' \ge p$ such that $p' = q2^{\ell}$ and $p' < p(1 + 1/t)$, for integers $q$ and $\ell$ where $t < q \le 2t$ and $\ell \ge 0$.

Proof. Take $q$ to be $\lceil p/2^{\ell} \rceil$, where $\ell$ is the integer that satisfies $t < p/2^{\ell} \le 2t$. Note that $p' < (p/2^{\ell} + 1) \cdot 2^{\ell} = p(1 + 2^{\ell}/p) < p(1 + 1/t)$.

Now we describe the necessary modifications to the Benes network. Although no new ideas are needed, a little care is needed to minimize redundancy.

Lemma 3.4. For any integer $p \le n$, if $p = q2^r$ for integers $q$ and $r$ such that $(\lg n)^2/(2(\lg\lg n)^4) < q \le (\lg n)^2/(\lg\lg n)^4$ and $r \ge 0$, then there is a representation of an arbitrary perm $\pi$ on $[p]$ that uses $P(p) + \Theta((p\lg q)/q)$ bits, and supports $\pi()$ and $\pi^{-1}()$ in $O(r + \lg n/\lg\lg n)$ time each. This assumes access to a precomputed table of size $O(\sqrt{n}(\lg n)^c)$ bits that does not depend upon $\pi$, for some constant $c > 0$.

Proof. Consider the $(q, r)$-Benes network that realizes the perm $\pi$, and represent this network as follows. List all the switch settings of the outer $2r$ layers of switches as in Proposition 3.1, and represent each of the central $q$-permuters using Lemma 3.3. The representation of Lemma 3.3 requires precomputed tables of size $O(\sqrt{n}(\lg n)^c)$ bits (for some constant $c > 0$), which can be shared over all the applications of the lemma. We now calculate the space used. Note that

$P(p) = p\lg(p/e) + \Theta(\lg p) = q2^r(r + \lg q - \lg e) + \Theta(\lg p) = qr2^r + 2^r(q\lg(q/e)) + \Theta(\lg p)$.

By Lemma 3.1 and Lemma 3.3, the space used by the above representation (excluding lookup tables) is

$qr2^r + 2^r P(q) = qr2^r + 2^r(q\lg(q/e) + \Theta(\lg q)) = P(p) + \Theta((p\lg q)/q)$.
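To make the Benes-network step concrete, here is a small Python sketch that computes switch settings for a $(q, r)$-Benes network with the classical looping algorithm and routes one input through the result. The nested-tuple layout, the names `route`, `follow` and `outer_bits`, and the choice of the looping algorithm are our assumptions (the network construction itself is Lemma 3.1 and Proposition 3.1, outside this excerpt); central $q$-permuters are kept as plain lists rather than encoded with Lemma 3.3, and switches are not laid out as a flat bit string.

```python
def route(perm, q):
    """Settings of a (q, r)-Benes network realizing perm, len(perm) == q * 2**r.
    Returns ('leaf', perm) for a central q-permuter, otherwise
    ('net', first_layer, upper, lower, last_layer) with True = crossed switch."""
    n = len(perm)
    if n == q:
        return ('leaf', list(perm))
    inv = [0] * n
    for i, o in enumerate(perm):
        inv[o] = i
    side = [None] * n                    # 0 = upper sub-network, 1 = lower
    for start in range(0, n, 2):         # looping algorithm
        i = start
        while side[i] is None:
            side[i], side[i ^ 1] = 0, 1
            # the output paired with a lower packet's output must come from upper
            i = inv[perm[i ^ 1] ^ 1]
    up, lo = [0] * (n // 2), [0] * (n // 2)
    for i, o in enumerate(perm):
        (up if side[i] == 0 else lo)[i // 2] = o // 2
    first = [side[2 * j] == 1 for j in range(n // 2)]       # input 2j sent lower
    last = [side[inv[2 * j]] == 1 for j in range(n // 2)]   # output 2j fed from lower
    return ('net', first, route(up, q), route(lo, q), last)

def follow(net, i):
    """Output position reached by the packet entering at input i."""
    if net[0] == 'leaf':
        return net[1][i]
    _, first, upper, lower, last = net
    j = i // 2
    to_upper = (i % 2 == 0) != first[j]  # even input goes up unless crossed
    t = follow(upper if to_upper else lower, j)
    upper_feeds_even = not last[t]       # uncrossed: upper feeds output 2t
    return 2 * t + (0 if to_upper == upper_feeds_even else 1)

def outer_bits(net):
    """Number of switch bits in the outer 2r layers."""
    if net[0] == 'leaf':
        return 0
    _, first, upper, lower, last = net
    return len(first) + len(last) + outer_bits(upper) + outer_bits(lower)
```

For any perm on $[q2^r]$, `follow(route(p, q), i) == p[i]` for every input $i$, and `outer_bits` returns $qr2^r$, the first term of the space bound above: each of the $r$ recursion levels contributes $p$ switch bits.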
The running time for the queries follows from the fact that we need to look at $O(r)$ bits among the outer layers of switch settings, and that the representation of the central $q$-permuter (Lemma 3.3) supports the queries in $O(\lg n/\lg\lg n)$ time.

Theorem 3.2. An arbitrary perm $\pi$ on $[n]$ may be represented using $P(n) + O(n(\lg\lg n)^5/(\lg n)^2)$ bits, such that $\pi()$ and $\pi^{-1}()$ can both be computed in $O(\lg n/\lg\lg n)$ time.

Proof. Let $t = (\lg n)^3$. We first consider representing a perm $\psi$ on $[l]$ for some integer $l$, $t < l \le 2t$. To do this, we find an integer $p = l(1 + O((\lg\lg n)^4/(\lg n)^2))$ that satisfies the preconditions of Lemma 3.4; such a $p$ exists by Proposition 3.2. An elementary calculation shows that $P(p) = P(l)(1 + O((\lg\lg n)^4/(\lg n)^2)) = P(l) + O(\lg n(\lg\lg n)^5)$. We extend $\psi$ to a perm on $[p]$ by setting $\psi(i) = i$ for all $l \le i < p$, and represent $\psi$. By Lemma 3.4, $\psi$ can be represented using $P(p) + \Theta((p\lg p)(\lg\lg n)^4/(\lg n)^2) = P(l) + \Theta(\lg n(\lg\lg n)^5)$ bits such that $\psi()$ and $\psi^{-1}()$ operations are supported in $O(\lg n/\lg\lg n)$ time, assuming access to a precomputed table of size $O(\sqrt{n}(\lg n)^c)$ bits, for some constant $c > 0$.

Now we represent $\pi$ as follows. We choose an $n' \ge n$ such that $n' = n(1 + O(1/(\lg n)^3))$ and $n' = q2^r$ for some integers $q, r$ such that $t < q \le 2t$. Again we extend $\pi$ to a perm on $[n']$ by setting $\pi(i) = i$ for $n \le i < n'$, and represent this extended perm. As in Lemma 3.4, we start with a $(q, r)$-Benes network that realizes $\pi$ and write down the switch settings of the $2r$ outer levels in level order. The perms realized by the central $q$-permuters are represented using Lemma 3.4.
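The padding step used twice in this proof (choosing $p$ and $n'$ of the form $q2^{\ell}$ via Proposition 3.2, then extending the perm by fixed points) reduces to a few lines. A Python sketch following the proof of Proposition 3.2, assuming integers $p > t \ge 1$; the function name `pad_to_q_pow2` is ours.

```python
def pad_to_q_pow2(p, t):
    """Proposition 3.2: return (p2, q, ell) with p2 = q * 2**ell >= p,
    t < q <= 2t and p2 < p * (1 + 1/t). Assumes integers p > t >= 1."""
    if not p > t >= 1:
        raise ValueError("need p > t >= 1")
    ell = 0
    while p > (2 * t) * (1 << ell):     # stop once t < p / 2**ell <= 2t
        ell += 1
    q = -(-p // (1 << ell))             # ceil(p / 2**ell)
    return q << ell, q, ell

print(pad_to_q_pow2(1000, 7))           # -> (1024, 8, 7)

# Extending a perm on [l] to one on [p2] by fixed points, as in the proof.
pi = [2, 0, 1]
p2, _, _ = pad_to_q_pow2(len(pi), 1)    # (4, 2, 1)
padded = pi + list(range(len(pi), p2))  # [2, 0, 1, 3]
```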
Ignoring any precomputed tables, the space requirement is $qr2^r + 2^r(P(q) + \Theta(\lg n(\lg\lg n)^5))$ bits, which is again easily shown to be $P(n') + \Theta((n'\lg n')/q + 2^r\lg n(\lg\lg n)^5) = P(n') + \Theta(n(\lg\lg n)^5/(\lg n)^2)$ bits. Finally, as above, $P(n') = (1 + O(1/(\lg n)^3))P(n)$, and the space requirement is $P(n) + \Theta(n(\lg\lg n)^5/(\lg n)^2)$ bits.

The running time for $\pi()$ and $\pi^{-1}()$ is clearly $O(\lg n)$. To improve this to $O(\lg n/\lg\lg n)$, we now explain how to step through multiple levels of a Benes network in $O(1)$ time, taking care not to increase the space consumption significantly. Consider a $(q, r)$-Benes network and let $t = \lfloor \lg\lg n - \lg\lg\lg n \rfloor - 1$. Consider the case when $t \le r$ (the other case is easier), and consider input number 0 to the $(q, r)$-Benes network. Depending upon the settings of the switches, a packet entering at input 0 may reach any of $2^t$ switches in $t$ steps. A little thought shows that the only packets that could appear at the inputs to these $2^t$ switches are the $2^{t+1}$ packets that enter at inputs $0, 1, k, k+1, 2k, 2k+1, \ldots$, where $k = q2^{r-t}$. The settings of the $t2^t$ switches that could be seen by any one of these packets suffice to determine the next $t$ steps of all of these packets. Hence, when writing down the settings of the switches of the Benes network in the representation of $\pi$, we write all the settings of these switches in $t2^t \le (\lg n)/2$ consecutive locations. Using table lookup, we can then step through $t$ of the outer $2r$ layers of the $(q, r)$-Benes network in $O(1)$ time. Since computing the effect of the central $q$-permuter takes $O(\lg n/\lg\lg n)$ time, we see that the overall running time is $O(r/t + \lg n/\lg\lg n) = O(\lg n/\lg\lg n)$.

3.3.
Optimality

We now consider the optimality of the solutions given in the previous two sections: specifically, whether they achieve the best possible redundancy for a given query time. As noted in the Introduction, Golynski [15, Theorem 17] has shown that any data structure in the "black-box" model that supports $\pi^{-1}$ in at most $t < n/2$ evaluations of $\pi()$ requires an index of size $\Omega((n/t)\lg(n/t))$. This shows the asymptotic optimality of Theorem 3.1 for $t = n^{1-\Omega(1)}$. In the cell probe model, Golynski [14] shows that:

Lemma 3.5. For any data structure which uses $P(n) + r$ bits of space to represent a perm over $[n]$ and supports $\pi()$ and $\pi^{-1}()$ in time $t_f$ and $t_i$ respectively, such that $\max\{t_f, t_i\} \le (1/16)(\lg n/\lg\lg n)$, it holds that $r = \Omega((n\lg n)/(t_f \cdot t_i))$ bits.

This shows that Corollary 3.1 is optimal for a range of values of the parameter $t$. Specifically, there is a constant $c$ (which depends upon the constant within the $O()$ in Corollary 3.1 and the value $1/16$ in Lemma 3.5) such that the redundancy of Corollary 3.1 is asymptotically optimal for all $t \le c\lg n/\lg\lg n$. In order to clarify the relationship of Lemma 3.5 to the results in Section 3.2, we have the following proposition:

Proposition 3.3. In the cell probe model with word size $O(\lg n)$, a perm $\pi$ on $[n]$ can be represented as follows:
i. Both $\pi()$ and $\pi^{-1}()$ can be computed using $2\lg n/\lg\lg n + O(1)$ probes, and the space used is $P(n) + O(n(\lg\lg n)^2/\lg n)$ bits.
ii. Both $\pi()$ and $\pi^{-1}()$ can be computed using $(2 + \epsilon)\lg n/\lg\lg n + O(1)$ probes, for any constant $\epsilon > 0$, and the space used is $P(n) + O(n(\lg\lg n)^3/(\lg n)^2)$ bits.

Proof. In the cell probe model, we note that given a perm $\pi$ on $[q]$, one can compute $\pi()$ and $\pi^{-1}()$ in $O(1 + (q\lg q)/\lg n)$ time, using $P(q)$ bits.
This is done by representing $\pi$ implicitly, e.g., as the index of $\pi$ in a canonical enumeration of all perms on $[q]$, and computing $\pi()$ and $\pi^{-1}()$ by simply reading the entire representation (which occupies $O(1 + (q\lg q)/\lg n)$ cells). Two particular values of $q$ are of interest here: $q_1 = \Theta(\lg n/\lg\lg n)$, when the time is $O(1)$ probes, and $q_2 = \epsilon(\lg n/\lg\lg n)^2$, for some constant $\epsilon < 1$, when the time is at most $\epsilon\lg n/\lg\lg n$ probes. Using these representations as the central $q$-permuter in Lemma 3.4, followed by Theorem 3.2, we note that the number of probes made in the outer layers of the Benes network is at most $2\lg n/\lg\lg n$. By adding the probes made to the central $q$-permuter (for both $q = q_1$ and $q = q_2$), we get the numbers of probes claimed. The redundancies are obtained by straightforward calculation as in Lemma 3.4 and Theorem 3.2.

The first of the two cases represents the lowest number of probes that we are able to achieve with our approach. Although the number of probes is still higher than the maximum number of probes allowed by Lemma 3.5, the redundancy equals the lowest redundancy provable by Lemma 3.5. However, with a very small increase in the number of probes, the redundancy drops considerably (and in fact is lower than that of Theorem 3.2).

3.4. Supporting Arbitrary Powers

We now consider the problem of representing an arbitrary perm $\pi$ to compute $\pi^k()$ for $k > 1$ (or $k < -1$) more efficiently than by repeated application of $\pi()$ (or $\pi^{-1}()$). Here we develop a succinct structure to support all powers of $\pi$ (including $\pi()$ and $\pi^{-1}()$). The results in this section assume that we have $P(n)$ bits (plus some redundancy) to store the representation, i.e., we do not work in the "black-box" model.

Theorem 3.3.
Suppose there is a representation $R$ taking $s(n)$ bits to store an arbitrary perm $\pi$ on $[n]$, that supports $\pi()$ in time $t_f$ and $\pi^{-1}()$ in time $t_i$. Then there is a representation for an arbitrary perm on $[n]$ taking $s(n) + O(n\lg\lg n/\lg n)$ bits in which $\pi^k()$ for any integer $|k| \le n^c$ can be supported in $t_f + t_i + O(1)$ time, and one taking $s(n) + O(\sqrt{n}\lg n)$ bits in which $\pi^k()$ can be supported in $t_f + t_i + O(\lg\lg n)$ time.

Proof. Consider the cycle representation of the given perm $\pi$, in which, for all cycles of $\pi$, we write down the elements comprising the cycle, in the order in which they appear in the cycle, starting with the smallest element in the cycle. It will be convenient to consider the logical array $\psi$ of length $n$, which comprises the cycles written in nondecreasing order of length, with logical separators marking the boundary of each cycle (see Fig. 3 for an example)². Clearly, ignoring the logical separators between cycles, $\psi$ is itself a permutation. To compute $\pi^k(x)$ for any (positive or negative) $k$ we do the following:
1. find the position $j$ in $\psi$ that contains $x$;
2. find the left endpoint $l$ of the segment of $\psi$ that represents the cycle containing $x$, and the length $\lambda$ of this cycle; and
3. return the element of $\psi$ in position $s = l + ((j - l + k) \bmod \lambda)$.

The data structure for implementing this is as follows. We represent $\psi$ in the assumed representation $R$. In Step (1), $j$ is computed as $\psi^{-1}(x)$ in time $t_i$, and in Step (3), the return value is just $\psi(s)$, computed in time $t_f$. We now focus on Step (2). Let $\lambda_1 < \lambda_2 < \cdots < \lambda_z$ be the distinct cycle lengths in $\pi$ (the example in Fig. 3 has $z = 3$); note that $z = O(\sqrt{n})$. We store the sequence $\{\lambda_i\}$ in an array, using $O(\sqrt{n}\lg n)$ bits. Also consider the set $S = \{s_i\}$, where $s_1 = 0$ and, for $i = 2, \ldots, z$, $s_i$ is the total length of all cycles in $\pi$ whose length is strictly less than $\lambda_i$ (note that $s_i$ is the starting position of the sequence of cycles of size $\lambda_i$).

² One can dispense with the logical separators by writing the cycles in order of decreasing minimum element, but this is not as convenient for our purposes.

[Figure 3: A permutation $\pi$ and the logical array $\psi$ representing its cycles.]

Thus, if $j$ is the position of $x$ in $\psi$ in Step (1), then the length $\lambda$ of the cycle containing $x$ is $\lambda_t$, where $t = \mathit{rank}(j, S)$. Also, since all the cycles of length $\lambda$ begin at $s_t = \mathit{select}(S, t)$, it is straightforward to compute the left endpoint of the cycle containing $x$, namely $l = s_t + \lambda\lfloor (j - s_t)/\lambda \rfloor$. It only remains to describe how to represent $S$. We choose two options, giving the claimed results:
• represent $S$ in the FID of Theorem 2.1, taking $\lg\binom{n}{z} + O(n\lg\lg n/\lg n) = O(n\lg\lg n/\lg n)$ bits, which supports rank and select in $O(1)$ time;
• represent $S$ as an array, supporting select in $O(1)$ time, and also as a predecessor data structure (e.g., the y-fast trie [34]) which supports rank in $O(\lg\lg n)$ time. The space used by this option is $O(\sqrt{n}\lg n)$ bits.

As an immediate corollary, we get, from Theorem 3.2:

Corollary 3.2. There is a representation to store an arbitrary perm $\pi$ on $[n]$ using at most $P(n) + O(n(\lg\lg n)^5/(\lg n)^2)$ bits that can support $\pi^k()$ for any $k$ in $O(\lg n/\lg\lg n)$ time.

4. Succinct trees with level-ancestor queries

In this section we consider the problem of supporting level-ancestor queries on a static rooted ordered tree. The structure developed here will be used in the next section as a substructure in representing a function efficiently.
Given a rooted tree $T$ with $n$ nodes, the level-ancestor problem is to preprocess $T$ to answer queries of the following form: given a vertex $v$ and an integer $i > 0$, find the $i$th vertex on the path from $v$ to the root, if it exists. Existing solutions take $\Theta(n\lg n)$ bits to answer queries in $O(1)$ time [8, 5, 1, 3]; our solution stores $T$ using (essentially optimal) $2n$ bits of space, and uses auxiliary structures of $o(n)$ bits to support level-ancestor queries in $O(1)$ time. Another useful feature of our solution (which we need in the function representation) is that it also supports finding the level-successor (or predecessor) of a node, i.e., the node to the right (left) of a given node on the same level, if it exists, in constant time.

A high-level view of our structure and the query algorithm is as follows: for any constant $c > 0$ we construct a structure $A$ that, given a node $x$ and any (positive or negative) integer $k$ with $|k| \le \lg^c n$, supports finding the ancestor (or the first successor in preorder, if $k \ge 0$) of $x$ whose depth is $\mathit{depth}(x) + k$ (this structure is our main contribution). Applying the above with $c = 2$ (say), we also construct another structure, $B$, which supports level-ancestor queries on nodes whose depths are multiples of $\lg^2 n$ and whose heights are at least $\lg^2 n$. To support a level-ancestor query, structure $A$ is first used to find the closest ancestor of the given node whose depth is a multiple of $\lg^2 n$ and whose height is at least $\lg^2 n$. Then structure $B$ is used to find the ancestor which is the closest descendant of the required node and whose depth is a multiple of $\lg^2 n$. Structure $A$ is again used to find the required node from this node. The choice of different powers of $\lg n$ in the structures given below is somewhat arbitrary, and could be fine-tuned to slightly improve the lower-order term.
The structure $A$ consists of the tree $T$ represented in $2n$ bits as a balanced parenthesis (BP) sequence as in [25], obtained by visiting the nodes of the tree in depth-first order and writing an open parenthesis whenever a node is first visited, and a closing parenthesis when a node is visited after all its children have been visited. Thus, each node has exactly one open and one closing parenthesis corresponding to it. Hereafter, we also refer to a node by the position of either the open or the closing parenthesis corresponding to it in the BP sequence of the tree. We store an existing auxiliary structure of size $o(n)$ bits that answers the following queries in $O(1)$ time on the BP sequence (see [25, 11] for details):
• close(i): find the position of the closing parenthesis that matches the open parenthesis at position $i$.
• open(i): find the position of the open parenthesis that matches the closing parenthesis at position $i$.
• excess(i): find the difference between the number of open parentheses and the number of closing parentheses from the beginning up to position $i$.
Note that the excess of a position $i$ is simply the depth of the node $i$ in the tree. Our new contribution is to give an $o(n)$-bit structure to support the following operation in $O(1)$ time:
• next-excess(i, k): find the least position $j > i$ such that $\mathit{excess}(j) = k$.
We only support this query for $\mathit{excess}(i) - O(\lg^c n) \le k \le \mathit{excess}(i) + O(\lg^c n)$ for some fixed constant $c$. In the following theorem, we fix the value of $c$ to be 2. Observe that next-excess(i, k) gives: (a) the ancestor of $i$ at depth $k$, if $k < \mathit{depth}(i)$; (b) the next node after $i$ in the level-order traversal of the tree, if $k = \mathit{depth}(i)$; and (c) the next node after $i$ in preorder, if $k > \mathit{depth}(i)$.
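The definitions above can be checked with a brute-force Python sketch. It assumes conventions the text leaves implicit: the root has depth 0, excess(i) counts parentheses up to and including position $i$, and a node is identified by its closing parenthesis, so that excess equals depth and observation (a) holds literally. `next_excess` is a linear scan standing in for the $O(1)$ structure of Theorem 4.1; all names are ours.

```python
def bp_encode(children, root=0):
    """BP string of an ordered tree, plus the position of each closing paren."""
    bp, close_pos = [], {}
    stack = [(root, False)]
    while stack:
        v, finished = stack.pop()
        if finished:
            close_pos[v] = len(bp)
            bp.append(")")
        else:
            bp.append("(")
            stack.append((v, True))
            stack.extend((c, False) for c in reversed(children[v]))
    return "".join(bp), close_pos

def excess(bp):
    """excess[i] = #'(' - #')' over bp[0..i] (inclusive)."""
    out, e = [], 0
    for ch in bp:
        e += 1 if ch == "(" else -1
        out.append(e)
    return out

def next_excess(ex, i, k):
    """Least j > i with ex[j] == k, or None (linear-scan stand-in)."""
    return next((j for j in range(i + 1, len(ex)) if ex[j] == k), None)

def ancestor_at_depth(ex, close_pos, node_of_close, v, k):
    """Ancestor of v at depth k < depth(v): the node closing at next-excess."""
    return node_of_close[next_excess(ex, close_pos[v], k)]

children = {0: [1, 2], 1: [3], 2: [], 3: []}
bp, close_pos = bp_encode(children)                      # "((())())"
ex = excess(bp)
node_of_close = {p: v for v, p in close_pos.items()}
print(ancestor_at_depth(ex, close_pos, node_of_close, 3, 1))   # -> 1
```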
We now describe the auxiliary structure to support the next-excess query in constant time using $o(n)$ bits of extra space, showing the following:

Theorem 4.1. Given a balanced parenthesis sequence of length $2n$, one can support the operations open, close, excess and next-excess(i, k), where $|k - \mathit{excess}(i)| \le \lg^2 n$, all in constant time using an additional index of size $o(n)$ bits.

Proof. The auxiliary structure to support open, close and excess in constant time using $o(n)$ additional bits has been described by Munro and Raman [25] (see also [11] for a simpler structure). We now describe the auxiliary structures required to support the next-excess query in constant time. We split the parenthesis sequence corresponding to the tree into superblocks of size $s = \lg^4 n$ and each superblock into blocks of size $b = (\lg n)/2$. Since the excess values of two consecutive positions differ only by one, the set containing the excess values of all the positions in a superblock/block forms a single range of integers, which we denote as the excess-range of the superblock/block. We store this excess-range information for each superblock, which requires $O(n\lg n/\lg^4 n) = o(n)$ bits for the entire sequence. For each block, we also store the excess-range information, where excess is defined with respect to the beginning of the superblock. As the excess-range for each block can be stored using $O(\lg\lg n)$ bits, the space used over all the blocks is $O(n\lg\lg n/\lg n) = o(n)$ bits. For each superblock, we store the following structure to support the queries within the superblock (i.e., if the answer lies in the same superblock as the query element) in $O(1)$ time: we build a complete tree with branching factor $\sqrt{\lg n}$ (and hence constant height) with blocks at the leaves.
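The role of the excess-ranges just introduced can be illustrated with a Python sketch that answers next-excess by scanning the rest of the query's own block and then skipping every block whose range does not contain $k$; since consecutive excesses differ by one, a block whose range contains $k$ is guaranteed to hold the answer. This is a linear-time illustration only, under our own simplifications: the block size `b` is a free parameter, ranges are absolute rather than relative to the superblock, and the skipping loop stands in for the constant-height tree and lookup tables.

```python
def excess_prefix(bp):
    """excess[i] = #'(' - #')' over bp[0..i] (inclusive)."""
    out, e = [], 0
    for ch in bp:
        e += 1 if ch == "(" else -1
        out.append(e)
    return out

def block_ranges(ex, b):
    """(min, max) excess of each block of b consecutive positions."""
    return [(min(ex[a:a + b]), max(ex[a:a + b])) for a in range(0, len(ex), b)]

def next_excess_blocked(ex, ranges, b, i, k):
    """Least j > i with ex[j] == k, or None, skipping blocks by their ranges."""
    blk = i // b
    for j in range(i + 1, min((blk + 1) * b, len(ex))):    # rest of i's block
        if ex[j] == k:
            return j
    for nb in range(blk + 1, len(ranges)):
        lo, hi = ranges[nb]
        if lo <= k <= hi:              # +-1 steps: k must occur in this block
            for j in range(nb * b, min((nb + 1) * b, len(ex))):
                if ex[j] == k:
                    return j
    return None

ex = excess_prefix("((())())")                        # [1, 2, 3, 2, 1, 2, 1, 0]
print(next_excess_blocked(ex, block_ranges(ex, 3), 3, 0, 0))   # -> 7
```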
Each internal node of this tree stores the excess-ranges of all its children, where the excess-range of an internal node is defined as the union of the excess-ranges of all the leaves in its subtree. Thus, the size of this structure for each superblock is $O(s\lg\lg n/b) = o(s)$ bits. Using this structure, given any position $i$ in the superblock and a number $k$, we can find the position next-excess(i, k) in constant time, if it exists within the superblock. More specifically, a query is answered by starting at the leaf (block) $v$ containing the position $i$, traversing the tree upwards until we find the first ancestor node which has a child with preorder number larger than that of $v$ whose excess-range contains $k$, and then traversing downwards to reach the leaf containing the answer to the query; searches at the internal nodes and leaves are performed using precomputed tables, as the information stored at these nodes is either $O(\sqrt{\lg n}\lg\lg n)$ bits for internal nodes, or $(\lg n)/2$ bits for leaves.

Let $[e_1, e_2]$ be the range of excess values in a superblock $B$. Then for each $i$ such that $e_1 - \lg^2 n \le i < e_1$ or $e_2 \le i < e_2 + \lg^2 n$, we store the least position to the right of superblock $B$ whose excess is $i$, in an array $A_B$. In addition, for each $i$, $e_1 \le i \le e_2$, we store a pointer to the first superblock $B'$ to the right of superblock $B$ such that $B'$ has a position with excess $i$. Then we remove all multiple pointers (thus each pointer corresponds to a range of excesses instead of just one excess). The graph representing these pointers between superblocks is planar. [One way to see this is to draw the graph on the Euclidean plane so that the vertex corresponding to the $j$-th superblock $B$, with excess values in the range $[e_1, e_2]$, is represented as a vertical line with end points $(j, e_1)$ and $(j, e_2)$.
Then, there is an edge between two superblocks $B$ and $B'$ if and only if the vertices (vertical lines) corresponding to these are "visible" to each other (i.e., a horizontal line connecting these two vertical lines at some height does not intersect any other vertical lines in the middle).] Since the number of edges in a planar graph on $m$ vertices is $O(m)$, the number of these inter-superblock pointers (edges) is $O(n/s)$, as there are $n/s$ superblocks (vertices). The total space required to store all the pointers and the arrays $A_B$ is $O((n/s)\lg^3 n) = o(n)$ bits.

Thus, each superblock has a set of pointers associated with a set of ranges of excess values. Given an excess value, we need to find the range containing that value in a given superblock (if the value belongs to the range of excess values in that superblock), to find the pointer associated with that range. For this purpose, we store the following auxiliary structure. If a superblock has more than $\lg n$ ranges associated with it (i.e., if the degree of the node corresponding to a superblock in the graph representing the inter-superblock pointers is more than $\lg n$), then we store a bit vector for that superblock that has a 1 at the positions where a range starts, and 0 everywhere else. We also store an auxiliary structure to support rank queries on this bit vector in constant time. Since there are at most $O(n/(s\lg n))$ superblocks containing more than $\lg n$ ranges, the total space used for storing all these bit vectors together with the auxiliary structures is $o(n)$ bits. If a superblock has at most $\lg n$ ranges associated with it, then we store the lengths of these ranges (from left to right) using the searchable partial sum structure of [31], which supports predecessor queries in constant time. This requires $o(s)$ bits for every such superblock, and hence $o(n)$ bits overall.
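A brute-force Python sketch of the inter-superblock pointers: for each superblock and each excess value in its range, it finds the first superblock to the right whose range contains that value, and merges consecutive values with the same target into one range, mirroring the removal of multiple pointers described above. The superblock size `s` is a free parameter here and the names are ours.

```python
def superblock_ranges(ex, s):
    """Excess-range (e1, e2) of each superblock of s consecutive positions."""
    return [(min(ex[a:a + s]), max(ex[a:a + s])) for a in range(0, len(ex), s)]

def pointer_runs(ranges):
    """runs[b]: list of (lo, hi, c) meaning that for every excess e in [lo, hi],
    superblock c is the first superblock after b whose range contains e."""
    runs = []
    for b, (e1, e2) in enumerate(ranges):
        mine = []
        for e in range(e1, e2 + 1):
            c = next((c for c in range(b + 1, len(ranges))
                      if ranges[c][0] <= e <= ranges[c][1]), None)
            if c is None:
                continue                         # no later position has excess e
            if mine and mine[-1][2] == c and mine[-1][1] == e - 1:
                mine[-1] = (mine[-1][0], e, c)   # extend the current range
            else:
                mine.append((e, e, c))           # start a new range
        runs.append(mine)
    return runs
```

The total number of ranges, `sum(len(r) for r in pointer_runs(ranges))`, is at most three times the number of superblocks: each boundary between two consecutive ranges of some superblock $B$ can be charged to an endpoint of the excess-range of a superblock visible from $B$, and each endpoint is charged at most once. This counting argument is ours; it is a linear bound consistent with the planarity argument in the text.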
Given a query next-excess(i, k), let $B$ be the superblock to which the position $i$ belongs. We first check to see if the answer lies within the superblock $B$ (using the prefix sums tree structure mentioned above), and if so, we output the position. Otherwise, let $[e_1, e_2]$ be the range of excess values in $B$. If $e_1 - \lg^2 n \le k < e_1$ or $e_2 \le k < e_2 + \lg^2 n$, then we can find the answer from the array $A_B$. Otherwise (when $e_1 \le k \le e_2$), we first find the pointer associated with the range containing $k$ (using either the bit vector or the partial sum structure associated with the superblock) and use this pointer to find the superblock containing the answer. Finding the answer, given the superblock in which it is contained, is done using the prefix sums tree structure stored for that superblock. Thus, using these structures, we can support next-excess(i, k) for any $i$ and $|k - \mathit{excess}(i)| \le \lg^2 n$ in constant time.

By using the balanced parenthesis representation of the given tree and by storing the auxiliary structures of Theorem 4.1, we can support the following: given a node in the tree, find its $k$-th ancestor, for $k \le \lg^2 n$, and also the next node in the level-order traversal of the tree, in constant time.

To support general level-ancestor queries, we do as follows. Firstly, we mark all nodes of the tree that are at a depth which is a multiple of $\lg^2 n$ and whose height is at least $\lg^2 n$ (similar to [1]). There are $O(n/\lg^2 n)$ such nodes. We store all these marked nodes as a tree (preserving the ancestor relation among these nodes) and store a linear-space (hence $o(n)$-bit) structure that supports level-ancestor queries in constant time [3]. Note that one level in this tree corresponds to exactly $\lg^2 n$ levels in the original tree. We also store the correspondence between the nodes in the original tree and those in the tree containing only the marked nodes.
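The marking rule above, combined with the A-B-A query pattern outlined at the start of this section, can be sketched in Python with naive stand-ins: walking parent pointers replaces next-excess (structure A), and repeated hops of `L` levels replace the constant-time structure B. Here `L` plays the role of $\lg^2 n$. Because the nearest marked ancestor of a node can lie up to $2L - 1$ levels above it, the sketch simply climbs when $k \le 2L$; that threshold is our choice, as are all names.

```python
def depths_heights(parent):
    """parent[0] == -1 is the root and parent[v] < v for every other node v."""
    n = len(parent)
    depth, height = [0] * n, [0] * n
    for v in range(1, n):
        depth[v] = depth[parent[v]] + 1
    for v in range(n - 1, 0, -1):            # children are visited before parents
        height[parent[v]] = max(height[parent[v]], height[v] + 1)
    return depth, height

def mark(parent, L):
    """Marked nodes: depth is a multiple of L and height is at least L."""
    depth, height = depths_heights(parent)
    return [depth[v] % L == 0 and height[v] >= L for v in range(len(parent))]

def level_ancestor(parent, depth, marked, L, x, k):
    """Ancestor of x at depth depth[x] - k, for 0 <= k <= depth[x]."""
    def up(v, steps):                        # stand-in for next-excess (structure A)
        for _ in range(steps):
            v = parent[v]
        return v
    target = depth[x] - k
    if k <= 2 * L:
        return up(x, k)
    m = x
    while not marked[m]:                     # nearest marked ancestor: < 2L levels up
        m = parent[m]
    goal = -(-target // L) * L               # least multiple of L that is >= target
    while depth[m] > goal:                   # stand-in for structure B
        m = up(m, L)                         # lands on a marked node again
    return up(m, depth[m] - target)          # final short climb (structure A)
```

For any tree and any `L >= 2`, `sum(mark(parent, L)) * (L - 1) <= n`: a longest downward path from a marked node supplies $L - 1$ descendants strictly between its depth and the next multiple of $L$, and these sets are disjoint for distinct marked nodes, which gives the $O(n/\lg^2 n)$ count.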
A query level-ancestor(x, k), the ancestor of $x$ at height $k$ from $x$ (i.e., at depth $\mathit{depth}(x) - k$), is answered as follows. If $k \le \lg^2 n$, we find the answer using a next-excess query. Otherwise, we first find the least ancestor of $x$ which is marked, using at most two next-excess queries (the first one to find the least ancestor whose depth is a multiple of $\lg^2 n$, and the next one, if necessary, to find the marked ancestor whose height is at least $\lg^2 n$). From this we find the highest marked ancestor of $x$ which is a descendant of the answer node, using the level-ancestor structure for the marked nodes. The required ancestor is found from this node using another next-excess query, if necessary. The query level-successor(x), which returns the successor of node $x$ in the level order (i.e., the node to the right of $x$ which is on the same level as $x$), can be supported in constant time using a next-excess(x, depth(x)) query. Since all the nodes in a subtree are together in the parenthesis representation, checking whether a node $x$ is a descendant of another node $y$ can be done in constant time by comparing either the open or closing parenthesis position of $x$ with the open and closing parenthesis positions of $y$. Hence the representation also supports the is-ancestor operation in constant time. Thus we have:

Corollary 4.1. Given an unlabeled rooted tree with $n$ nodes, there is a structure that represents the tree using $2n + o(n)$ bits of space and supports parent, first-child, level-ancestor, level-successor and is-ancestor queries in $O(1)$ time.

5. Representing functions

We now consider the representation of functions $f : [n] \to [n]$. Given such a function $f$, we equate it to a digraph in which every node has outdegree 1, and represent this graph space-efficiently.
We then show how to compute arbitrary powers of the function by translating them into navigational operations on the digraph. More specifically, given an arbitrary function $f : [n] \to [n]$, consider the digraph $G_f = (V, E)$ obtained from it, where $V = [n]$ and $E = \{\langle i, j\rangle : f(i) = j\}$. In general this digraph consists of a set of connected components, where each component has a directed cycle with each vertex being the root of a (possibly single-node) directed tree, with edges directed towards the root. See Figure 4(a) for an example. We refer to each connected component as a gadget. The main idea of our representation is to store the structure of the graph $G_f$ as a tree $T_f$ such that the forward and inverse queries can be translated into appropriate navigational operations on the tree. We store the bijection between the node labels in $G_f$ and the preorder numbers of the "corresponding" nodes in $T_f$ as a perm $\pi$.

[Figure 4: Representing a function. (a) Graph representation of the function $f(x) = (x^2 + 2x - 1) \bmod 19$, for $0 \le x \le 18$; the vertex labels in brackets correspond to the function $g$ obtained by renaming the vertices. (b) Perm defining the isomorphism between $G_f$ and $G_g$. (c) Parenthesis representation and the bit vectors indicating the starting positions of the gadgets and the trees (auxiliary structures are not shown).]
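A Python sketch of the decomposition of $G_f$ into gadgets, reporting each gadget's cycle (in forward order) and its members. The function name and output format are ours; it runs in linear time by colouring nodes while walking forward along $f$, then letting every tree node inherit the gadget it drains into.

```python
def gadgets(f):
    """Split the functional graph of f (f[i] is the image of i) into gadgets.
    Returns (cycles, members): cycles[g] lists gadget g's cycle in forward
    order (f[c[i]] == c[i + 1]), members[g] lists every node of gadget g."""
    n = len(f)
    state, cycles = [0] * n, []          # 0 unseen, 1 on current walk, 2 done
    for s in range(n):
        walk, v = [], s
        while state[v] == 0:
            state[v] = 1
            walk.append(v)
            v = f[v]
        if state[v] == 1:                # the walk closed a new cycle at v
            cycles.append(walk[walk.index(v):])
        for u in walk:
            state[u] = 2
    gid = [-1] * n
    for g, cyc in enumerate(cycles):
        for v in cyc:
            gid[v] = g
    for s in range(n):                   # tree nodes inherit their cycle's gadget
        path, v = [], s
        while gid[v] == -1:
            path.append(v)
            v = f[v]
        for u in path:
            gid[u] = gid[v]
    members = [[] for _ in cycles]
    for v in range(n):
        members[gid[v]].append(v)
    return cycles, members

f = [(x * x + 2 * x - 1) % 19 for x in range(19)]   # the function of Figure 4
cycles, members = gadgets(f)
```

For the function of Figure 4 this reports four gadgets, with cycle lengths 1, 1, 2 and 3 and sizes 2, 6, 3 and 8 (the cycles are $\{4\}$, $\{14\}$, $\{17, 18\}$ and $\{5, 15, 7\}$).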
To support the queries for powers of $f$, we need to find the node in $T_f$ corresponding to a given label, perform the required navigational operations on the tree to find the answer node(s), and finally return the label(s) corresponding to the answer node(s). Hence we store the perm $\pi$ using one of the perm representations from Section 3 so that $\pi()$ and $\pi^{-1}()$ can be supported efficiently.

We define a gadget to be wide if its cycle length is larger than $\lg^{1/3} n$, and narrow otherwise. The size of a gadget or a tree is defined as the number of nodes in it. Before constructing the tree $T_f$, we first reorder the gadgets and the tree nodes within each gadget as follows:
(i) We first order the gadgets so that all the narrow gadgets are before any of the wide gadgets.
(ii) Wide gadgets are ordered arbitrarily among themselves, while narrow gadgets are ordered in nondecreasing order of their sizes.
(iii) Within each group of narrow gadgets with the same size, we arrange them in nondecreasing order of their cycle lengths (the cycle length of a gadget is the number of trees in the gadget).
(iv) For each gadget whose cycle length is greater than 1, we break the cycle by selecting a tree with maximal height among all the trees that belong to the gadget and deleting the outgoing edge from the root of this tree. We then order the trees such that the trees are in the reverse order as we move along the cycle edges in the forward direction (thus the tree with the maximal height that was selected is the last tree in this order).
(v) We also arrange the nodes within each tree such that the leftmost path of any subtree is the longest path in that subtree, breaking ties arbitrarily.

We now construct a tree that encodes the structure of the function $f$. Let $C_1, C_2, \ldots, C_p$ be the gadgets in $G_f$ and let $T_i^1, T_i^2, \ldots$
, $T_i^{q_i}$ be the trees in the $i$-th gadget, for $1 \le i \le p$, after the reordering of the gadgets and of the nodes within the trees. Let $r_i^j$ be the root of the tree $T_i^j$, for $1 \le i \le p$ and $1 \le j \le q_i$. We refer to the node $r_i^1$ as the root of the gadget $C_i$. Construct a tree $T_f$ with root $r$ whose children are the $p$ nodes $r_1^1, r_2^1, \ldots, r_p^1$. For $1 \le i \le p$, under the node $r_i^1$ add the path $r_i^2 - r_i^3 - \cdots - r_i^{q_i}$. Also attach the subtree under the root $r_i^j$ in $T_i^j$ to the node $r_i^j$ in $T_f$. The size of $T_f$ is $n + 1$ (the $n$ nodes in $G_f$ plus the new root $r$). We represent the tree $T_f$ using the structure of Corollary 4.1, using $2n + o(n)$ bits. Items (iv) and (v) above ensure that the leftmost path in any subtree of $T_f$ is a longest path in that subtree, and hence is represented by a sequence of open parentheses in the BP sequence. This enables us to find the descendant of any node in the subtree at a given level, if it exists, in constant time.

We number the nodes of $T_f$ with their preorder numbers, starting from 0 for the root $r$. Every node in the tree $T_f$, except for the root $r$, corresponds to a unique node in the graph $G_f$, and this correspondence can be easily determined from the construction of the tree. As mentioned earlier, we store this bijection $\pi$ between the labels in $G_f$ and the preorder numbers in $T_f$ by representing the perm $\pi$ so that it supports $\pi()$ and $\pi^{-1}()$ efficiently. In addition to the perm $\pi$ and the tree $T_f$, we store the following data structures using $o(n)$ bits:
1. An array $A$ storing the distinct sizes of the narrow gadgets in increasing order (i.e., the sequence $s_1, s_2, \ldots, s_d$, where $1 \le s_1 < s_2 < \cdots < s_d \le n$ and, for $1 \le i \le d$, there exists a narrow gadget of size $s_i$ in $G_f$). Note that $d = O(\sqrt{n})$.
2. An FID for the set $B = \{p_1, p_2, \ldots$
p d } , where p i is the preorder n um b er of the ﬁr st narrow gadget (in the ab ov e o r dering) whose siz e is s i (or equiv alently , the sum of the size s of all the narrow gadgets in G f whose sizes are less than s i ), for 1 ≤ i ≤ d . 3. An FID for the m ultiset C = { s i,j } , for 1 ≤ i ≤ d and 1 ≤ j ≤ n 1 / 3 , where s i,j is the sum of the sizes of all the gadgets whos e size s are: (i) less than s i , and (ii) equal to s i whose cycle lengths are at most j . (A rank op eration in this FID enables us to ﬁnd the cycle length o f the ga dget containing the no de with a given preo r der n umber, if it is in a na rrow gadget). 4. An ar ray A ′ that sto r es the size and cycle leng th of each wide gadget, in the ab ov e o r dering of the wide gadgets. 22 5. An FID for the s et B ′ = { p ′ 1 , p ′ 2 , . . . p ′ d ′ } , where d ′ is the n umber of wide gadgets in G f , and p ′ i is the preorder num b er of the ro ot of the i -th wide gadget (in the a b ove o rdering). Given a no de in a tree, we can ﬁnd its k -th succe s sor (i.e., the no de r eached by trav ers ing k edges in the for ward direction), if it exis ts within the s ame tr ee, in cons tant time using a level- ancestor query . The k -th successor of no de r j i (the ro ot of the j th tree in the i th gadg et) can b e found in O (1 ) time by computing the length of the cycle in the i th gadget, using rank and select op erations on the the ab ove FIDs. By com bining these tw o , we can ﬁnd the k - th succes sor of an arbitrar y no de in a g adget in constant time. 
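The successor computation can be illustrated with a plain-array sketch that keeps the same decomposition (every node lies in a tree hanging off the cycle of its gadget) but none of the succinct machinery. The class `GadgetIndex` and its fields are hypothetical names. Binary lifting, an $O(\lg n)$-time technique, stands in for the constant-time level-ancestor query on $T_f$, and explicit per-cycle arrays stand in for the FIDs that give the cycle length.

```python
class GadgetIndex:
    """Plain-array sketch of the gadget view of f: [n] -> [n]. Every node lies in
    a tree hanging off exactly one cycle. Binary lifting stands in for the
    constant-time level-ancestor query on the succinct tree T_f."""

    def __init__(self, f):
        n = len(f)
        self.f = list(f)
        self.cycles = []                  # cycles[c]: cycle nodes in forward order
        self.cyc_id = [-1] * n            # cycle index, for cycle nodes only
        self.cyc_pos = [-1] * n           # position on that cycle
        state = [0] * n                   # 0 new, 1 on current walk, 2 finished
        for s in range(n):
            walk, v = [], s
            while state[v] == 0:
                state[v] = 1
                walk.append(v)
                v = f[v]
            if state[v] == 1:             # the walk closed a new cycle at v
                cyc = walk[walk.index(v):]
                for p, u in enumerate(cyc):
                    self.cyc_id[u], self.cyc_pos[u] = len(self.cycles), p
                self.cycles.append(cyc)
            for u in walk:
                state[u] = 2
        # depth[v]: edges from v to its cycle; root[v]: the cycle node it reaches
        self.depth = [0 if p >= 0 else -1 for p in self.cyc_pos]
        self.root = [v if self.cyc_pos[v] >= 0 else -1 for v in range(n)]
        for s in range(n):
            path, v = [], s
            while self.depth[v] < 0:
                path.append(v)
                v = f[v]
            for u in reversed(path):
                self.depth[u] = self.depth[f[u]] + 1
                self.root[u] = self.root[f[u]]
        # up[j][v] = f^(2^j)(v); enough levels to climb any tree depth (< n)
        self.up = [self.f]
        while (1 << len(self.up)) <= n:
            prev = self.up[-1]
            self.up.append([prev[prev[v]] for v in range(n)])

    def _climb(self, v, d):
        """f^d(v) for d <= depth[v], i.e. a level-ancestor query inside the tree."""
        j = 0
        while d:
            if d & 1:
                v = self.up[j][v]
            d >>= 1
            j += 1
        return v

    def power(self, v, k):
        """f^k(v) for k >= 0: climb inside the tree, then finish on the cycle."""
        if k <= self.depth[v]:
            return self._climb(v, k)
        t = self.root[v]                  # where v's tree meets the cycle
        cyc = self.cycles[self.cyc_id[t]]
        return cyc[(self.cyc_pos[t] + k - self.depth[v]) % len(cyc)]
```

For example, with `f = [1, 2, 0, 0, 3, 5]`, `GadgetIndex(f).power(4, 4)` climbs two levels to the cycle node 0 and then takes two more steps around the cycle (0, 1, 2), returning 2.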
Given a node $x$ in a gadget, if it is not the root of any tree, then we can find all its $k$-th predecessors (i.e., all the nodes reachable by traversing $k$ edges in the reverse direction) in optimal time using the tree structure, by finding all the descendants of $x$ that are $k$ levels below it, as follows. We first find the leftmost descendant in the subtree rooted at $x$ at the given level, if it exists, in constant time, since the leftmost path is represented by a sequence of open parentheses in the parenthesis representation of the tree. From this node, we find all the nodes at this level by repeatedly using the level-successor operation to find the next node at this level, checking whether that node is a descendant of $x$ using the is-ancestor operation, and stopping when this test fails.

To report the set of all $k$-th predecessors of a node $r_i^j$ (which is the root of the $j$-th tree in the $i$-th gadget), if $j + k \le q_i$, then we report all the nodes in the subtree (of $T_f$) rooted at $r_i^j$ that are at the same level as $r_i^{j+k}$. Otherwise, we first find all trees $T_x^y$ that contain at least one answer, and then report all the answers in each of those trees. To find all the trees that contain at least one answer, we observe that if $T_i^{j'}$ contains at least one node that is a $k$-th predecessor of $r_i^j$, then it also contains at least one node that is a $(q_i + (k \bmod q_i))$-th predecessor of $r_i^j$ (here $q_i$ is the number of trees in the $i$-th gadget). Also, the set of all $(q_i + (k \bmod q_i))$-th predecessors of $r_i^j$ is a subset of the set of $k$-th predecessors of $r_i^j$ when $k \ge q_i$. In other words, the set of all trees that contain at least one $k$-th predecessor of $r_i^j$ is the same as the set of all trees that contain at least one $(q_i + (k \bmod q_i))$-th predecessor of $r_i^j$.
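The enumeration of the descendants of $x$ that lie $k$ levels below it can be sketched over explicit arrays, assuming the tree is given as child lists (a hypothetical input format; `LevelIndex` and `descendants_at` are likewise hypothetical names). A binary search in a per-depth list of preorder numbers replaces the constant-time leftmost-path step, advancing through that list plays the role of level-successor, and the preorder-interval test plays the role of is-ancestor. After the search, the cost is linear in the number of nodes reported.

```python
from bisect import bisect_left


class LevelIndex:
    """Preorder numbers, subtree sizes and per-depth node lists for an ordinal
    tree given as child lists (children[v] lists v's children left to right)."""

    def __init__(self, children, root=0):
        n = len(children)
        self.pre, self.size, self.depth = [0] * n, [1] * n, [0] * n
        self.order = []                       # nodes in preorder
        stack = [root]
        while stack:
            v = stack.pop()
            self.pre[v] = len(self.order)
            self.order.append(v)
            for c in reversed(children[v]):   # push right-to-left: leftmost pops first
                self.depth[c] = self.depth[v] + 1
                stack.append(c)
        for v in reversed(self.order):        # descendants are finished before ancestors
            for c in children[v]:
                self.size[v] += self.size[c]
        self.levels = {}                      # depth -> increasing preorder numbers
        for v in self.order:
            self.levels.setdefault(self.depth[v], []).append(self.pre[v])

    def descendants_at(self, x, k):
        """All nodes exactly k levels below x, from left to right."""
        level = self.levels.get(self.depth[x] + k, [])
        i = bisect_left(level, self.pre[x])   # leftmost candidate at that level
        end = self.pre[x] + self.size[x]      # x's subtree occupies [pre[x], end)
        out = []
        while i < len(level) and level[i] < end:   # is-ancestor test
            out.append(self.order[level[i]])
            i += 1                                 # level successor
        return out
```

If the children of each node are its pre-images under $f$, as in the trees of a gadget, then `descendants_at(x, k)` returns exactly the $k$-th predecessors of a non-root node $x$.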
Thus, to find the $k$-th predecessors of $r_i^j$, we identify two subsets of trees whose union is the set of all trees in the gadget $C_i$ that contain at least one answer. These two subsets are the sets of all trees that contain at least one node
• at a depth of $k$ in the subtree rooted at node $r_i^j$ in $T_f$, and
• at a depth of $k - (q_i - j)$ in the subtree rooted at $r_i^1$ in $T_f$.
Once we identify all the trees containing at least one answer, we can report all the answer nodes in the tree $T_f$ in time linear in the number of such nodes, as explained earlier. Each of these node numbers is then transformed into its corresponding node number in $G_f$ using the representation of $\pi$. Combining all these, we have:

Theorem 5.1. If there is a representation of a perm on $[n]$ that takes $P(n)$ space and supports forward in $t_f$ time and inverse in $t_i$ time, then there is a representation of a function $f : [n] \to [n]$ that takes $P(n) + 2n + o(n)$ bits of space and supports $f^k(i)$ in $O(t_f + t_i \cdot |f^k(i)|)$ time (or in $O(t_i + t_f \cdot |f^k(i)|)$ time), for any integer $k$ (which can be stored in $O(1)$ words) and for any $i \in [n]$.

Using the succinct perm representation of Corollary 3.1, we get:

Corollary 5.1. There is a representation of a function $f : [n] \to [n]$ that takes $(1 + \epsilon) n \lg n + O(1)$ bits of space for any fixed positive constant $\epsilon$, and supports $f^k(i)$ in $O(1 + |f^k(i)|)$ time, for any integer $k$ (which can be stored in $O(1)$ words) and for any $i \in [n]$.

5.1. Functions with arbitrary ranges

So far we have considered functions whose domain and range are the same set $[n]$. We now consider functions $f : [n] \to [m]$ whose domain and range are of different sizes, and deal with the two cases (i) $n > m$ and (ii) $n < m$ separately.
These results can easily be extended to the case when neither the domain nor the range is a subset of the other. We only consider the queries for positive powers.

Case (i) $n > m$: A function $f : [n] \to [m]$, where $n > m$, can be represented by storing the restriction of $f$ to $[m]$ using the representation described in the previous section, together with the sequence $S = f(m+1), f(m+2), \ldots, f(n)$ stored in an array. This gives a representation that supports forward queries efficiently.

To support the inverse queries, we store the sequence $S$ using a representation that supports access and select queries efficiently, where $\mathrm{access}(i)$ returns the value $f(m+i)$, and $\mathrm{select}(j, k)$ returns the position of the $k$-th occurrence of the value $j$ in the sequence. We use the following representation, which is implicit in Golynski et al. [16]: a sequence $S$ of length $n$ over an alphabet of size $k$ (where $n \ge k$) can be represented as a collection of $\lceil n/k \rceil$ perms over $[k]$ together with $O(n)$ bits, such that a select or an access query on $S$ can be answered by performing a single $\pi()$ or $\pi^{-1}()$ query on one of the perms, together with a constant amount of computation. In addition, we augment the directed graph $G_f$, representing the function $f$ restricted to $[m]$, with dummy nodes as follows: if $f(m+i) = j$, then we add a dummy node $v$ as a 'child' of the node corresponding to $j$ in $G_f$. The node $v$ is a representative of the set $\{i \mid f(i) = j, i > m\}$. We represent this augmented directed graph to support the forward and inverse queries, using $O(m)$ bits. We also represent the perm that maps the 'real' (non-dummy) nodes to their original values in the function $f$. Finally, we store an FID that indicates the positions of the dummy nodes in the order determined by the representation of $G_f$, using $O(m)$ bits (note that the size of the graph $G_f$ is $O(m)$).
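The chunked representation attributed above to Golynski et al. [16] can be sketched as follows, under the assumption that each chunk's perm is the stable sort of the chunk's positions by symbol; this is one way to realise the stated property, not necessarily the exact construction of [16]. Plain Python lists stand in for the $O(n)$ bits of rank/select-capable bit vectors, and `ChunkedSequence`, `access` and `select` are hypothetical names (with `select(a, t)` taking a 1-based occurrence number and returning a 0-based position). `access` performs one lookup in some $\pi^{-1}$, and `select` one lookup in some $\pi$.

```python
from bisect import bisect_left, bisect_right


class ChunkedSequence:
    """Sketch: a sequence S over the alphabet [sigma] kept as ceil(len(S)/sigma)
    perms plus symbol counts. Integer lists stand in for the O(n)-bit
    rank/select bit vectors that a succinct version would use."""

    def __init__(self, S, sigma):
        self.sigma = sigma
        self.chunks = []                          # (pi, pi_inv, starts) per chunk
        self.cum = [[0] for _ in range(sigma)]    # cum[a][c] = # of a's in chunks < c
        for lo in range(0, len(S), sigma):
            chunk = S[lo:lo + sigma]
            # pi: slot in (symbol, position)-sorted order -> position in the chunk
            pi = sorted(range(len(chunk)), key=lambda p: (chunk[p], p))
            pi_inv = [0] * len(chunk)
            for slot, p in enumerate(pi):
                pi_inv[p] = slot
            counts = [0] * sigma
            for x in chunk:
                counts[x] += 1
            starts = [0]                          # starts[a] = # of symbols < a in chunk
            for a in range(sigma):
                starts.append(starts[-1] + counts[a])
                self.cum[a].append(self.cum[a][-1] + counts[a])
            self.chunks.append((pi, pi_inv, starts))

    def access(self, i):
        """S[i], using a single pi^{-1} lookup."""
        c, p = divmod(i, self.sigma)
        _, pi_inv, starts = self.chunks[c]
        slot = pi_inv[p]
        return bisect_right(starts, slot) - 1     # symbol whose block holds this slot

    def select(self, a, t):
        """Position of the t-th (1-based) occurrence of a, or -1; one pi lookup."""
        cum = self.cum[a]
        if t < 1 or t > cum[-1]:
            return -1
        c = bisect_left(cum, t) - 1               # chunk holding the t-th occurrence
        pi, _, starts = self.chunks[c]
        return c * self.sigma + pi[starts[a] + (t - cum[c]) - 1]
```

For instance, with `S = [2, 0, 1, 1, 0, 2, 2, 1]` and alphabet size 3, `select(1, 2)` returns 3, the position of the second occurrence of 1.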
To answer a query $f^k(i)$ for $i \in [n]$ and $k \ge 1$, we first find the node $v$ corresponding to $i$ in the augmented graph $G_f$. The node $v$ is a 'real' node if $i \le m$, and can be found using the perm $\pi$ that maps the nodes of $G_f$ to their values in $f$, together with the FID indicating the positions of dummy nodes. We then find the node $u$ that is reached by traversing $k$ edges in the forward direction, using the structure of $G_f$. Finally, the value corresponding to the node $u$ is obtained using the perm $\pi$. If $i > m$, then the node $v$ is a dummy node; we find $j = f(i)$ using an access query on the string $S$, and use the fact that $f^k(i) = f^{k-1}(j)$ to compute the answer.

To answer a query $f^{-k}(i)$ for $i \in [m]$ and $k \ge 1$, we first find the node corresponding to the value $i$ in $G_f$, find all the nodes that can be reached by traversing $k$ edges in the backward direction, and return the values corresponding to all such nodes. Thus we have:

Theorem 5.2. If there is a representation of a perm on $[n]$ that takes $P(n)$ space and supports forward in $t_f$ time and inverse in $t_i$ time, then there is a representation of a function $f : [n] \to [m]$, $n \ge m$, that takes $(n - m)\lceil \lg m \rceil + P(m) + O(m)$ bits of space and supports $f^k(i)$ in $O(t_f + t_i)$ time, for any positive integer $k$ and for any $i \in [n]$. There is another representation of $f$ that takes $\lceil n/m \rceil P(m) + O(m)$ bits and supports, for any $k \ge 1$, $f^k(i)$ in $O(t_f + t_i)$ time, and $f^{-k}(i)$ in $O(t_f + t_i \cdot |f^{-k}(i)|)$ time (or in $O(t_i + t_f \cdot |f^{-k}(i)|)$ time).

Case (ii) $n < m$: For a function $f : [n] \to [m]$, where $n < m$, larger powers (i.e., $f^k(i)$ for $k \ge 2$) are not defined in general (as we might go out of the domain after one or more applications of the function).
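The convention for undefined powers can be illustrated with a plain-array sketch, which is not the succinct structure described next: it precomputes, for every $i \in [n]$, the largest $k$ for which $f^k(i)$ is defined, and returns $-1$ beyond that, matching the convention used in Theorem 5.3. The functions `max_defined_power` and `partial_power` are hypothetical names; $f$ is assumed to be a 0-based Python list of length $n$ with values in $[m]$, and the final walk is a naive $O(k)$ loop used only for illustration.

```python
import math


def max_defined_power(f, n):
    """lim[i] = largest k such that f^k(i) is defined, for f: [n] -> [m] with m > n;
    math.inf when the walk from i never leaves the domain [n]."""
    lim = [None] * n
    for s in range(n):
        path, seen, v = [], set(), s
        # walk forward until we leave [n], reach a solved node, or close a cycle
        while v < n and lim[v] is None and v not in seen:
            seen.add(v)
            path.append(v)
            v = f[v]
        if v >= n:
            steps = 0                 # a value outside [n] admits no further step
        elif lim[v] is None:
            steps = math.inf          # v is on the current walk: a cycle inside [n]
        else:
            steps = lim[v]
        for u in reversed(path):      # each node allows one step more than its image
            steps += 1
            lim[u] = steps
    return lim


def partial_power(f, lim, i, k):
    """f^k(i) if it is defined, and -1 otherwise."""
    if k > lim[i]:
        return -1
    for _ in range(k):                # naive O(k) walk, for illustration only
        i = f[i]
    return i
```

For example, with $n = 4$ and `f = [1, 5, 3, 2]`, the walk 0, 1, 5 leaves the domain, so `partial_power(f, lim, 0, 2)` is 5 while `partial_power(f, lim, 0, 3)` is $-1$; the elements 2 and 3 form a cycle, so all their powers are defined.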
Let $R$ be the set of all elements of the range $[m]$ that are greater than $n$ and have pre-images in the domain $[n]$. In the graph $G_f$ representing the function $f$, each element in $R$ corresponds to the root of a tree with no outgoing edges. We order these trees so that the elements corresponding to their roots are in increasing order. We then store an indexable dictionary for the set $R \subseteq [m]$ using $\lg \binom{m}{|R|} + o(|R|) + O(\lg \lg m)$ bits. Since $|R| \le n$, this space is at most $n \lg(m/n) + O(n + \lg \lg m)$ bits. The size of the graph $G_f$ is $O(n)$, and hence it is stored in $O(n)$ bits using the representation described in the previous section. Finally, we store the correspondence between the node numbering given by the $O(n)$-bit representation and the actual node labels in $G_f$, except for the nodes corresponding to $R$. As all these nodes are in the set $[n]$, we need to store a perm $\pi$ over $[n]$.

A query for $f^k(i)$, for $i \in [n]$ and $k \ge 1$, is answered by first finding the node corresponding to $i$ in $G_f$ using $\pi$, then finding the $k$-th node in the forward direction, if it exists, using the structure of $G_f$, and finally finding the element corresponding to this node, using the representation of $\pi$ again.

To find the set $f^{-k}(i)$, for $i \in [m]$ and $k \ge 1$, we first find the node $x$ corresponding to $i$ in $G_f$, using either the representation of $\pi$ if $i \le n$, or the indexable dictionary stored for the set $R$ if $n < i \le m$. We then find all the nodes reachable from $x$ by taking $k$ edges in the backward direction. We finally report the elements corresponding to each of these nodes, using the representation of $\pi$. Thus we have:

Theorem 5.3. If there is a representation of a perm on $[n]$ that takes $P(n)$ space and supports forward in $t_f$ time and inverse in $t_i$ time, then there is a representation of a function $f : [n] \to [m]$, $n < m$, that takes $n \lg(m/n) + P(n) + O(n)$ bits. For any positive integer $k$, this representation supports the queries for $f^k(i)$, for any $i \in [n]$ (returning the power if defined and $-1$ otherwise), in $O(t_f + t_i)$ time, and supports $f^{-k}(i)$, for any $i \in [m]$, in $O(t_f + t_i \cdot |f^{-k}(i)|)$ time (or in $O(t_i + t_f \cdot |f^{-k}(i)|)$ time).

References

[1] S. Alstrup and J. Holm. Improved algorithms for finding level-ancestors in dynamic trees. In Proceedings of the 27th International Colloquium on Automata, Languages and Programming, LNCS 1853, 73–84, 2000.
[2] D. A. Bader, M. Yan and B. M. W. Moret. A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. University of New Mexico Technical Report HPCERC2001-005 (August 2001): http://www.hpcerc.unm.edu/Research/tr/HPCERC2001-005.pdf
[3] M. A. Bender and M. Farach-Colton. The level ancestor problem simplified. In Proceedings of LATIN, LNCS 2286, 508–515, 2002.
[4] D. Benoit, E. D. Demaine, J. I. Munro, R. Raman, V. Raman and S. S. Rao. Representing trees of higher degree. Algorithmica, 43(4): 275–292, 2005.
[5] O. Berkman and U. Vishkin. Finding level-ancestors in trees. Journal of Computer and System Sciences, 48(2): 214–230, 1994.
[6] A. Z. Broder, M. Charikar, A. M. Frieze and M. Mitzenmacher. Min-wise independent permutations. Journal of Computer and System Sciences, 60: 630–659, 2000.
[7] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein. Introduction to Algorithms (3rd edition). The MIT Press, 2009.
[8] P. F. Dietz. Finding level-ancestors in dynamic trees. In Proceedings of the 2nd Workshop on Algorithms and Data Structures, LNCS 519, 32–40, 1991.
[9] Y. Dodis, M. Patrascu and M. Thorup. Changing base without losing space. In Proceedings of the ACM Symposium on Theory of Computing, 593–602, 2010.
[10] A. Gál and P. B. Miltersen. The cell probe complexity of succinct data structures. Theor. Comput. Sci., 379(3): 405–417, 2007.
[11] R. Geary, N. Rahman, R. Raman and V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368(3): 231–246, 2006.
[12] R. Geary, R. Raman and V. Raman. Succinct ordinal trees with level-ancestor queries. ACM Transactions on Algorithms, 2(4): 510–534, 2006.
[13] A. Golynski. Optimal lower bounds for rank and select indexes. Theor. Comput. Sci., 387(3): 348–359, 2007.
[14] A. Golynski. Cell probe lower bounds for succinct data structures. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 625–634, 2009.
[15] A. Golynski. Upper and Lower Bounds for Text Indexing Data Structures. PhD thesis, University of Waterloo, 2007.
[16] A. Golynski, J. I. Munro and S. S. Rao. Rank/select operations on large alphabets: a tool for text indexing. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 368–373, 2006.
[17] R. Grossi and J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In Proceedings of the ACM Symposium on Theory of Computing, 397–406, 2000.
[18] M. He, J. I. Munro and S. S. Rao. A categorization theorem on suffix arrays with applications to space efficient text indexes. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 23–32, 2005.
[19] M. He, J. I. Munro and S. S. Rao. Succinct ordinal trees based on tree covering. In Proceedings of the International Colloquium on Automata, Languages and Programming, LNCS 4596: 509–520, 2007.
[20] M. E. Hellman. A cryptanalytic time-memory tradeoff. IEEE Transactions on Information Theory, 26: 401–406, 1980.
[21] G. Jacobson. Space-efficient static trees and graphs. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science, 549–554, 1989.
[22] D. E. Knuth. Efficient representation of perm groups. Combinatorica, 11: 33–43, 1991.
[23] F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees and Hypercubes. Computer Science and Information Processing. Morgan Kaufmann, 1992.
[24] P. B. Miltersen. The bit probe complexity measure revisited. In Proceedings of the Annual Symposium on Theoretical Aspects of Computer Science, LNCS 665: 662–671, Springer-Verlag, 1993.
[25] J. I. Munro and V. Raman. Succinct representation of balanced parentheses and static trees. SIAM Journal on Computing, 31(3): 762–776, 2002.
[26] J. I. Munro, R. Raman, V. Raman and S. S. Rao. Succinct representations of permutations. In Proceedings of the International Colloquium on Automata, Languages and Programming, LNCS 2719: 345–356, 2003.
[27] J. I. Munro and S. S. Rao. Succinct representations of functions. In Proceedings of the International Colloquium on Automata, Languages and Programming, LNCS 3142: 1006–1015, 2004.
[28] M. Patrascu. Succincter. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science, 305–313, 2008.
[29] M. Patrascu and E. Viola. Cell-probe lower bounds for succinct partial sums. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, 117–122, 2010.
[30] N. Pouyanne. On the number of permutations admitting an m-th root. The Electronic Journal of Combinatorics, 9, 2002.
[31] R. Raman, V. Raman and S. S. Rao. Succinct dynamic data structures. In Proceedings of the Workshop on Algorithms and Data Structures, LNCS 2125: 426–437, 2001.
[32] R. Raman, V. Raman and S. S. Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. ACM Transactions on Algorithms, 3(4), 2007.
[33] K. Sadakane and G. Navarro. Fully-functional succinct trees. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 134–149, 2010.
[34] D. E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(N). Information Processing Letters, 17: 81–84, 1983.
