Error-Correcting Data Structures

We study data structures in the presence of adversarial noise. We want to encode a given object in a succinct data structure that enables us to efficiently answer specific queries about the object, even if the data structure has been corrupted by a constant fraction of errors. This new model is the common generalization of (static) data structures and locally decodable error-correcting codes. The main issue is the tradeoff between the space used by the data structure and the time (number of probes) needed to answer a query about the encoded object. We prove a number of upper and lower bounds on various natural error-correcting data structure problems. In particular, we show that the optimal length of error-correcting data structures for the Membership problem (where we want to store subsets of size s from a universe of size n) is closely related to the optimal length of locally decodable codes for s-bit strings.

Authors: Ronald de Wolf (CWI Amsterdam)

Keywords: data structures, fault-tolerance, error-correcting codes, locally decodable codes, membership problem, length-queries tradeoff

1 Introduction

Data structures deal with one of the most fundamental questions of computer science: how can we store certain objects in a way that is both space-efficient and that enables us to efficiently answer questions about the object? Thus, for instance, it makes sense to store a set as an ordered list or as a heap-structure, because this is space-efficient and allows us to determine quickly (in time logarithmic in the size of the set) whether a certain element is in the set or not.
From a complexity-theoretic point of view, the aim is usually to study the tradeoff between the two main resources of the data structure: the length/size of the data structure (storage space) and the efficiency with which we can answer specific queries about the stored object. To make this precise, we measure the length of the data structure in bits, and measure the efficiency of query-answering in the number of probes, i.e., the number of bit-positions in the data structure that we look at in order to answer a query. The following is adapted from Miltersen's survey [Mil99]:

Definition 1 Let D be a set of data items, Q be a set of queries, A be a set of answers, and f : D × Q → A. A (p, ε)-data structure for f of length N is a map φ : D → {0,1}^N for which there is a randomized algorithm A that makes at most p probes to its oracle and satisfies, for every q ∈ Q and x ∈ D,

Pr[A^{φ(x)}(q) = f(x, q)] ≥ 1 − ε.

∗ rdewolf@cwi.nl. Partially supported by Veni and Vidi grants from the Netherlands Organization for Scientific Research (NWO), and by the European Commission under the Integrated Project Qubit Applications (QAP) funded by the IST directorate as Contract Number 015848.

Usually we will study the case D ⊆ {0,1}^n and A = {0,1}. Most standard data structures taught in undergraduate computer science are deterministic, and hence have error probability ε = 0. As mentioned, the main complexity issue here is the tradeoff between N and p. Some data structure problems that we will consider are the following:

• Equality. D = Q = {0,1}^n, and f(x, y) = 1 if x = y, f(x, y) = 0 if x ≠ y. This is not a terribly interesting data structure problem in itself, since for every x there is only one query y for which the answer is '1'; we merely mention this data structure here because it will be used to illustrate some definitions later on.
• Membership. D = {x ∈ {0,1}^n : Hamming weight |x| ≤ s}, Q = [n] := {1, ..., n}, and f(x, i) = x_i. In other words, x corresponds to a set of size at most s from a universe of size n, and we want to store the set in a way that easily allows us to make membership queries. This is probably the most basic and widely-studied data structure problem of them all [FKS84, Yao81, BMRV02, RSV02]. Note that for s = 1 this is Equality on log n bits, while for s = n it is the general Membership problem without constraints on the set.

• Substring. D = {0,1}^n, Q = {y ∈ {0,1}^n : |y| ≤ r}, f(x, y) = x_y, where x_y is the |y|-bit substring of x indexed by the 1-bits of y (e.g., 1010_{0110} = 01). For r = 1 it is Membership.

• Inner product (IP_{n,r}). D = {0,1}^n, Q = {y ∈ {0,1}^n : |y| ≤ r}, and f(x, y) = x·y mod 2. This problem is among the hardest Boolean problems where the answer depends on at most r bits of x (again, for r = 1 it is Membership).

More complicated data structure problems such as Rank, Predecessor, and Nearest neighbor have also been studied a lot, but we will not consider them here.

One issue that the above definition ignores is the issue of noise. Memory and storage devices are not perfect: the world is full of cosmic rays, small earthquakes, random (quantum) events, bypassing trams, etc., that can cause a few errors here and there. Another potential source of noise is transmission of the data structure over some noisy channel. Of course, better hardware can partly mitigate these effects, but in many situations it is realistic to expect a small fraction of the bits in the storage space to become corrupted over time. Our goal in this paper is to study error-correcting data structures.
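For concreteness, the four query maps above can be written out directly. This is our own illustrative sketch (bit strings as Python tuples, function names ours), not code from the paper:

```python
# Illustrative implementations of the query maps f : D x Q -> A defined
# above. Bit strings are represented as Python tuples of 0/1 bits.

def equality(x, y):
    # f(x, y) = 1 iff x = y
    return 1 if x == y else 0

def membership(x, i):
    # f(x, i) = x_i, with i in [n] = {1, ..., n} (1-indexed as in the paper)
    return x[i - 1]

def substring(x, y):
    # x_y: the |y|-bit substring of x indexed by the 1-bits of y
    return tuple(xi for xi, yi in zip(x, y) if yi == 1)

def inner_product(x, y):
    # f(x, y) = x . y mod 2
    return sum(xi * yi for xi, yi in zip(x, y)) % 2

x = (1, 0, 1, 0)
assert substring(x, (0, 1, 1, 0)) == (0, 1)   # the paper's example 1010_0110 = 01
assert membership(x, 1) == 1
assert equality(x, (1, 0, 1, 0)) == 1
assert inner_product(x, (1, 1, 1, 1)) == 0
```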
These still enable efficient computation of f(x, q) from the stored data structure φ(x), even if the latter has been corrupted by a constant fraction of errors. In analogy with the usual setting for error-correcting codes [MS77, vL98], we will take a pessimistic, adversarial view of errors here: we want to be able to deal with a constant fraction of errors no matter where they are placed. Formally, we define error-correcting data structures as follows.

Definition 2 Let D be a set of data items, Q be a set of queries, A be a set of answers, and f : D × Q → A. A (p, δ, ε)-error-correcting data structure for f of length N is a map φ : D → {0,1}^N for which there is a randomized algorithm A that makes at most p probes to its oracle and satisfies

Pr[A^y(q) = f(x, q)] ≥ 1 − ε,

for every q ∈ Q, every x ∈ D, and every y ∈ {0,1}^N at Hamming distance ∆(y, φ(x)) ≤ δN.

Definition 1 is the special case of Definition 2 where δ = 0.^1 Note that if δ > 0 then the adversary can always set the errors in a way that gives the decoder A a non-zero error probability. Hence the setting with bounded error probability is the natural one for error-correcting data structures. This contrasts with the standard noiseless setting, where one usually considers deterministic structures.

A simple example of an efficient error-correcting data structure is for Equality: encode x with a good error-correcting code φ(x). Then N = O(n), and we can decode by one probe: given y, probe φ(x)_j for uniformly chosen j ∈ [N], compare it with φ(y)_j, and output 1 iff these two bits are equal. If up to a δ-fraction of the bits in φ(x) are corrupted, then we will give the correct answer with probability 1 − δ in the case x = y.
If the distance between any two codewords is close to N/2 (which is true for instance for a random linear code), then we will give the correct answer with probability about 1/2 − δ in the case x ≠ y. These two probabilities can be balanced to 2-sided error ε = 1/3 + 2δ/3. The error can be reduced further by allowing more than one probe.

We only deal with so-called static data structures here: we do not worry about updating the x that we are encoding. What about dynamic data structures, which allow efficient updates as well as efficient queries to the encoded object? Note that if data items x and x′ are distinguishable, in the sense that f(x, q) ≠ f(x′, q) for at least one query q ∈ Q, then their respective error-correcting encodings φ(x) and φ(x′) will have distance Ω(N).^2 Hence updating the encoded data from x to x′ will require Ω(N) changes in the data structure, which shows that a dynamic version of our model of error-correcting data structures with efficient updates is not possible.

Error-correcting data structures not only generalize the standard (static) data structures (Definition 1), but they also generalize locally decodable codes. These are defined as follows:

Definition 3 A (p, δ, ε)-locally decodable code (LDC) of length N is a map φ : {0,1}^n → {0,1}^N for which there is a randomized algorithm A that makes at most p probes to its oracle and satisfies

Pr[A^y(i) = x_i] ≥ 1 − ε,

for every i ∈ [n], every x ∈ {0,1}^n, and every y ∈ {0,1}^N at Hamming distance ∆(y, φ(x)) ≤ δN.

Note that a (p, δ, ε)-error-correcting data structure for Membership (with s = n) is exactly a (p, δ, ε)-locally decodable code. Much work has been done on LDCs, but their length-vs-probes tradeoff is still largely unknown for p ≥ 3. We refer to [Tre04] and the references therein.
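The one-probe Equality decoder described above is easy to simulate. Below is a minimal sketch under our own choices (a random linear code as φ, whose pairwise codeword distances are close to N/2 with high probability; all names ours):

```python
import random

random.seed(42)

n, N = 8, 256   # message length and (generous) code length
# Random linear code: generator matrix G over GF(2), phi(x) = G x mod 2.
G = [[random.randint(0, 1) for _ in range(n)] for _ in range(N)]

def encode(x):
    return [sum(g * xi for g, xi in zip(row, x)) % 2 for row in G]

def equality_decoder(corrupted_phi_x, y):
    # One probe: compare a uniformly random position of the (possibly
    # corrupted) encoding of x with the same position of phi(y).
    phi_y = encode(y)
    j = random.randrange(N)
    return 1 if corrupted_phi_x[j] == phi_y[j] else 0

def corrupt(word, delta):
    # Adversarial noise stand-in: flip a delta-fraction of the positions.
    w = list(word)
    for j in random.sample(range(len(w)), int(delta * len(w))):
        w[j] ^= 1
    return w

x = [1, 0, 1, 1, 0, 0, 1, 0]
delta = 0.05
noisy = corrupt(encode(x), delta)

trials = 2000
# Case x = y: correct answer '1' with probability about 1 - delta.
agree = sum(equality_decoder(noisy, x) for _ in range(trials)) / trials
assert agree > 0.9

# Case x != y: answer '1' only with probability about 1/2 +- delta.
y = [0, 1, 1, 1, 0, 0, 1, 0]
agree2 = sum(equality_decoder(noisy, y) for _ in range(trials)) / trials
assert 0.3 < agree2 < 0.7
```

As in the text, the two cases can then be balanced into a single two-sided error bound.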
LDCs address only a very simple type of data structure problem: we have an n-bit "database" and want to be able to retrieve individual bits from it. In practice, databases have more structure and complexity, and one usually asks more complicated queries, such as retrieving all records within a certain range. Our more general notion of error-correcting data structures enables a study of such more practical data structure problems in the presence of adversarial noise.

Comment on terminology. The terminologies used in the data-structure and LDC literature conflict at various points, and we needed to reconcile them somehow. To avoid confusion, let us repeat here the choices we made. We reserve the term "query" for the question q one asks about the encoded data x, while accesses to bits of the data structure are called "probes" (in contrast, these are usually called "queries" in the LDC literature). The number of probes is denoted by p. We use n for the number of bits of the data item x (in contrast with the literature about Membership, which mostly uses m for the size of the universe and n for the size of the set). We use N for the length of the data structure (while the LDC literature mostly uses m, except for Yekhanin [Yek07] who uses N as we do). We use the term "decoder" for the algorithm A.

^1 As [BMRV02, end of Section 1.1] notes, a data structure can be viewed as a locally decodable source code. With this information-theoretic point of view, an error-correcting data structure is a locally decodable combined source-channel code, and our results for Membership show that one can sometimes do better than combining the best source code with the best channel code. We thank one of the anonymous referees for pointing this out.

^2 Hence if all pairs x, x′ ∈ D are distinguishable (which is usually the case), then φ is an error-correcting code.
Another issue is that ε is sometimes used as the error probability (in which case one wants ε ≈ 0), and sometimes as the bias away from 1/2 (in which case one wants ε ≈ 1/2). We use the former.

1.1 Our results

If one subscribes to the approach towards errors taken in the area of error-correcting codes, then our definition of error-correcting data structures seems a very natural one. Yet, to our knowledge, this definition is new and has not been studied before (see Section 1.2 for other approaches).

1.1.1 Membership

The most basic data structure problem is probably the Membership problem. Fortunately, our main positive result for error-correcting data structures applies to this problem. Fix some number of probes p, noise level δ, and allowed error probability ε, and consider the minimal length of p-probe error-correcting data structures for s-out-of-n Membership. Let us call this minimal length MEM(p, s, n). A first observation is that such a data structure is actually a locally decodable code for s bits: just restrict attention to n-bit strings whose last n − s bits are all 0. Hence, with LDC(p, s) denoting the minimal length among all p-probe LDCs that encode s bits (for our fixed ε, δ), we immediately get the obvious lower bound

LDC(p, s) ≤ MEM(p, s, n).

This bound is close to optimal if s ≈ n. Another trivial lower bound comes from the observation that our data structure for Membership is a map with domain of size B(n, s) := Σ_{i=0}^{s} (n choose i) and range of size 2^N that has to be injective. Hence we get another obvious lower bound

Ω(s log(n/s)) ≤ log B(n, s) ≤ MEM(p, s, n).

What about upper bounds? Something that one can always do to construct error-correcting data structures for any problem is to take the optimal non-error-correcting p_1-probe construction and encode it with a p_2-probe LDC.
If the error probability of the LDC is much smaller than 1/p_1, then we can just run the decoder for the non-error-correcting structure, replacing each of its p_1 probes by p_2 probes to the LDC. This gives an error-correcting data structure with p = p_1 p_2 probes. In the case of Membership, the optimal non-error-correcting data structure of Buhrman et al. [BMRV02] uses only 1 probe and O(s log n) bits. Encoding this with the best possible p-probe LDC gives error-correcting data structures for Membership of length LDC(p, O(s log n)). For instance, for p = 2 we can use the Hadamard code^3 for s bits, giving upper bound MEM(2, s, n) ≤ exp(O(s log n)).

^3 The Hadamard code of x ∈ {0,1}^s is the codeword of length 2^s obtained by concatenating the bits x·y (mod 2) for all y ∈ {0,1}^s. It can be decoded by two probes, since for every y ∈ {0,1}^s we have (x·y) ⊕ (x·(y ⊕ e_i)) = x_i. Picking y at random, decoding from a δ-corrupted codeword will be correct with probability at least 1 − 2δ, because both probes y and y ⊕ e_i are individually random and hence probe a corrupted entry with probability at most δ. This exponential length is optimal for 2-probe LDCs [KW04].

Our main positive result in Section 2 says that something much better is possible: the max of the above two lower bounds is not far from optimal. Slightly simplifying^4, we prove

MEM(p, s, n) ≤ O(LDC(p, 1000s) log n).

In other words, if we have a decent p-probe LDC for encoding O(s)-bit strings, then we can use this to encode sets of size s from a much larger universe [n], at the expense of blowing up our data structure by only a factor of log n. For instance, for p = 2 probes we get MEM(2, s, n) ≤ exp(O(s)) log n from the Hadamard code, which is much better than the earlier exp(O(s log n)).
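The two-probe Hadamard decoding from footnote 3 can be sketched as follows (our own toy implementation; codeword positions are indexed by s-bit strings read as integers):

```python
import random

random.seed(1)

def dot(a, b):
    # Inner product mod 2 of the bit representations of two integers.
    return bin(a & b).count("1") % 2

def hadamard_encode(x_int, s):
    # Position y of the codeword holds x . y mod 2, for all s-bit y.
    return [dot(x_int, y) for y in range(2 ** s)]

def decode_bit(word, s, i):
    # Two probes: (x . y) xor (x . (y xor e_i)) = x_i for every y, so a
    # uniformly random y answers correctly w.p. >= 1 - 2*delta under
    # delta-fraction corruption (union bound over the two probes).
    y = random.randrange(2 ** s)
    return word[y] ^ word[y ^ (1 << i)]

s, x_int = 10, 0b1011001110
clean = hadamard_encode(x_int, s)

# Without noise, decoding is always correct.
assert all(decode_bit(clean, s, i) == ((x_int >> i) & 1) for i in range(s))

# Corrupt a delta-fraction of positions and measure the success rate.
delta = 0.05
noisy = list(clean)
for pos in random.sample(range(len(noisy)), int(delta * len(noisy))):
    noisy[pos] ^= 1
trials = 4000
success = sum(decode_bit(noisy, s, 3) == ((x_int >> 3) & 1)
              for _ in range(trials)) / trials
assert success > 1 - 2 * delta - 0.03   # close to the 1 - 2*delta guarantee
```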
For p = 3 probes, we get MEM(3, s, n) ≤ exp(exp(√log s)) log n from Efremenko's recent 3-probe LDC [Efr08] (which improved Yekhanin's breakthrough construction [Yek07]). Our construction relies heavily on the Membership construction of [BMRV02]. Note that the near-tightness of the above upper and lower bounds implies that progress (meaning better upper and/or lower bounds) on locally decodable codes for any number of probes is equivalent to progress on error-correcting data structures for s-out-of-n Membership.

1.1.2 Inner product

In Section 3 we analyze the inner product problem, where we are encoding x ∈ {0,1}^n and want to be able to compute the dot product x·y (mod 2), for any y ∈ {0,1}^n of weight at most r. We first study the non-error-correcting setting, where we can prove nearly matching upper and lower bounds (this is not the error-correcting setting, but provides something to compare it with). Clearly, a trivial 1-probe data structure is to store the answers to all B(n, r) possible queries separately. In Section 3.1 we use a discrepancy argument from communication complexity to prove a lower bound of about B(n, r)^{1/p} on the length of p-probe data structures. This shows that the trivial solution is essentially optimal if p = 1.

We also construct various p-probe error-correcting data structures for inner product. For small p and large r, their length is not much worse than the best non-error-correcting structures. The upshot is that inner product is a problem where data structures can sometimes be made error-correcting at little extra cost compared to the non-error-correcting case; admittedly, this is mostly because the non-error-correcting solutions for IP_{n,r} are already very expensive in terms of length.

1.2 Related work

Much work has of course been done on locally decodable codes, a.k.a.
error-correcting data structures for the Membership problem without constraints on the set size [Tre04]. However, the error-correcting version of s-out-of-n Membership ("storing sparse tables") or of other possible data structure problems has not been studied before.^5 Here we briefly describe a number of other approaches to data structures in the presence of memory errors. There is also much work on data structures with faulty processors, but we will not discuss that here.

Fault-tolerant pointer-based data structures. Aumann and Bender [AB96] study fault-tolerant versions of pointer-based data structures. They define a pointer-based data structure as a directed graph where the edges are pointers, and the nodes come in two types: information nodes carry real data, while auxiliary nodes carry auxiliary or structural data. An error is the destruction of a node and its outgoing edges. They assume such an error is detected when accessing the node. Even a few errors may be very harmful to pointer-based data structures: for instance, losing one pointer halfway along a standard linked list means we lose the second half of the list. They call a data structure (d, g)-fault-tolerant (where d is an integer that upper bounds the number of errors, and g is a function) if f ≤ d errors cause at most g(f) information nodes to be lost.

^4 Our actual result, Theorem 2, is a bit dirtier, with some deterioration in the error and noise parameters.

^5 Using the connection between information-theoretical private information retrieval and locally decodable codes, one may derive some error-correcting data structures from the PIR results of [CIK+01]. However, the resulting structures seem fairly weak.
Aumann and Bender present fault-tolerant stacks with g(f) = O(f), and fault-tolerant linked lists and binary search trees with g(f) = O(f log d), with only a constant-factor overhead in the size of the data structure and small computational overhead. Notice, however, that their error-correcting demands are much weaker than ours: we require that no part of the data is lost (every query should be answered with high success probability), even in the presence of a constant fraction of errors. Of course, we pay for that in terms of the length of the data structure.

Faulty-memory RAM model. An alternative model of error-correcting data structures is the "faulty-memory RAM model", introduced by Finocchi and Italiano [FI04]. In this model, one assumes there are O(1) incorruptible memory cells available. This is justified by the fact that CPU registers are much more robust than other kinds of memory. On the other hand, all other memory cells can be faulty, including the ones used by the algorithm that is answering queries (something our model does not consider). The model assumes an upper bound ∆ on the number of errors. Finocchi, Grandoni, and Italiano described essentially optimal resilient algorithms for sorting that work in O(n log n + ∆²) time with ∆ up to about √n, and for searching in Θ(log n + ∆) time. There is a lot of recent work in this model: Jørgenson et al. [JMM07] study resilient priority queues, Finocchi et al. [FGI07] study resilient search trees, and Brodal et al. [BFF+07] study resilient dictionaries.
This interesting model allows for more efficient data structures than our model, but its disadvantages are also clear: it assumes a small number of incorruptible cells, which may not be available in many practical situations (for instance when the whole data structure is stored on a hard disk), and the constructions mentioned above cannot deal well with a constant noise rate.

2 The Membership problem

2.1 Noiseless case: the BMRV data structure for Membership

Our error-correcting data structures for Membership rely heavily on the construction of Buhrman et al. [BMRV02], whose relevant properties we sketch here. Their structure is obtained using the probabilistic method. Explicit but slightly less efficient structures were subsequently given by Ta-Shma [TS02]. The BMRV structure maps x ∈ {0,1}^n (of weight ≤ s) to a string y := y(x) ∈ {0,1}^{n′} of length n′ = (100/ε²) s log n that can be decoded with one probe if δ = 0. More precisely, for every i ∈ [n] there is a set P_i ⊆ [n′] of size |P_i| = log(n)/ε such that for every x of weight ≤ s:

Pr_{j ∈ P_i}[y_j = x_i] ≥ 1 − ε,   (1)

where the probability is taken over a uniform index j ∈ P_i. For fixed ε, the length n′ = O(s log n) of the BMRV structure is optimal up to a constant factor, because clearly log (n choose s) is a lower bound.

2.2 Noisy case: 1 probe

For the noiseless case, the BMRV data structure has information-theoretically optimal length O(s log n) and decodes with the minimal number of probes (one). This can also be achieved in the error-correcting case if s = 1: then we just have the Equality problem, for which see the remark following Definition 2.
For larger s, one can observe that the BMRV structure still works with high probability if δ ≪ 1/s: in that case the total number of errors is δn′ ≪ log n, so for each i, most bits in the Θ(log n)-set P_i are uncorrupted.

Theorem 1 (BMRV) There exist (1, Ω(1/s), 1/4)-error-correcting data structures for Membership of length N = O(s log n).

This only works if δ ≪ 1/s, which is actually close to optimal, as follows. An s-bit LDC can be embedded in an error-correcting data structure for Membership, hence it follows from Katz-Trevisan's [KT00, Theorem 3] that there are no 1-probe error-correcting data structures for Membership if s > 1/(δ(1 − H(ε))) (where H(·) denotes the binary entropy function). In sum, there are 1-probe error-correcting data structures for Membership of information-theoretically optimal length if δ ≪ 1/s. In contrast, if δ ≫ 1/s then there are no 1-probe error-correcting data structures at all, not even of exponential length.

2.3 Noisy case: p > 1 probes

As we argued in the introduction, for fixed ε and δ there is an easy lower bound on the length N of p-probe error-correcting data structures for s-out-of-n Membership:

N ≥ max( LDC(p, s), log Σ_{i=0}^{s} (n choose i) ).

Our nearly matching upper bound, described below, uses the ε-error data structure of [BMRV02] for some small fixed ε. A simple way to obtain a p-probe error-correcting data structure is just to encode their O(s log n)-bit string y with the optimal p-probe LDC (with error ε′, say), which gives length LDC(p, O(s log n)). The one probe to y is replaced by p probes to the LDC. By the union bound, the error probability of the overall construction is at most ε + ε′.
This, however, achieves more than we need: this structure enables us to recover y_j for every j, whereas it would suffice if we were able to recover y_j for most j ∈ P_i (for each i ∈ [n]).

Definition of the data structure and decoder. To construct a shorter error-correcting data structure, we proceed as follows. Let δ be a small constant (e.g. 1/10000); this is the noise level we want our final data structure for Membership to protect against. Consider the BMRV structure for s-out-of-n Membership, with error probability at most 1/10. Then n′ = 10000 s log n is its length, and b = 10 log n is the size of each of the sets P_i. Now apply a random permutation π to y (we show below that π can be fixed to a specific permutation). View the resulting n′-bit string as made up of b = 10 log n consecutive blocks of 1000s bits each. We encode each block with the optimal (p, 100δ, 1/100)-LDC that encodes 1000s bits. Let ℓ be the length of this LDC. This gives overall length

N = 10 ℓ log n.

The decoding procedure is as follows. Randomly choose a k ∈ [b]. This picks out one of the blocks. If this k-th block contains exactly one j ∈ P_i, then recover y_j from the (possibly corrupted) LDC for that block, using the p-probe LDC decoder, and output y_j. If the k-th block contains 0 or more than 1 elements of P_i, then output a uniformly random bit.

Analysis. Our goal below is to show that we can fix the permutation π such that for at least n/20 of the indices i ∈ [n], this procedure has good probability of correctly decoding x_i (for all x of weight ≤ s). The intuition is as follows. Thanks to the random permutation and the fact that |P_i| equals the number of blocks, the expected intersection between P_i and a block is exactly 1. Hence for many i ∈ [n], many blocks will contain exactly one index j ∈ P_i.
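The decoding procedure just described can be illustrated by a toy simulation. This is only a sketch of the decoder's control flow under strong simplifications (tiny parameters, and the inner LDC replaced by the identity map), not the actual construction:

```python
import random

random.seed(7)

# Toy stand-in for the construction above. The paper's parameters
# (b = 10 log n blocks of 1000s bits, a (p, 100*delta, 1/100)-LDC per
# block) are replaced by small numbers, and the inner LDC by the identity
# map, so this sketch illustrates only the decoder's control flow.
b = 16                    # number of blocks, equal to |P_i|
block_size = 32
n_prime = b * block_size

y = [1] * n_prime         # pretend y_j = x_i = 1 for all j (best case)
perm = list(range(n_prime))
random.shuffle(perm)      # the permutation pi
stored = [0] * n_prime
for j, pos in enumerate(perm):
    stored[pos] = y[j]    # the permuted string (identity "LDC" per block)
P_i = random.sample(range(n_prime), b)

def decode_once():
    # Pick a random block k; if it contains exactly one permuted index of
    # P_i, recover that bit from the block, else output a random bit.
    k = random.randrange(b)
    block = range(k * block_size, (k + 1) * block_size)
    hits = [perm[j] for j in P_i if perm[j] in block]
    if len(hits) == 1:
        return stored[hits[0]]
    return random.randint(0, 1)

# Blocks with exactly one element of P_i answer correctly; the rest flip
# a coin, so the success rate should be close to 1/2 + good/(2b).
good = sum(
    1 for k in range(b)
    if sum(1 for j in P_i
           if perm[j] in range(k * block_size, (k + 1) * block_size)) == 1)
trials = 4000
success = sum(decode_once() for _ in range(trials)) / trials
assert abs(success - (0.5 + good / (2 * b))) < 0.05
```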
Moreover, for most blocks, their LDC encoding won't have too many errors, hence we can recover y_j using the LDC decoder for that block. Since y_j = x_i for 90% of the j ∈ P_i, we usually recover x_i.

To make this precise, call k ∈ [b] "good for i" if block k contains exactly one j ∈ P_i, and let X_{ik} be the indicator random variable for this event. Call i ∈ [n] "good" if at least b/4 of the blocks are good for i (i.e., Σ_{k∈[b]} X_{ik} ≥ b/4), and let X_i be the indicator random variable for this event. The expected value (over uniformly random π) of each X_{ik} is the probability that if we randomly place b balls into ab positions (a is the block size 1000s), then there is exactly one ball among the a positions of the first block, and the other b − 1 balls are in the last ab − a positions. This is

a (ab−a choose b−1) / (ab choose b) = [(ab−b)(ab−b−1) ··· (ab−b−a+2)] / [(ab−1)(ab−2) ··· (ab−a+1)] ≥ ((ab−b−a+2)/(ab−a+1))^{a−1} ≥ (1 − 1/(a−1))^{a−1}.

The right-hand side goes to 1/e ≈ 0.37 for large a, so we can safely lower bound it by 3/10. Then, using linearity of expectation:

3bn/10 ≤ Exp[ Σ_{i∈[n], k∈[b]} X_{ik} ] ≤ b · Exp[ Σ_{i∈[n]} X_i ] + (b/4) · ( n − Exp[ Σ_{i∈[n]} X_i ] ),

which implies

Exp[ Σ_{i∈[n]} X_i ] ≥ n/20.

Hence we can fix one permutation π such that at least n/20 of the indices i are good. For every index i, at least 90% of all j ∈ P_i satisfy y_j = x_i. Hence for a good index i, with probability at least 1/4 − 1/10 we will pick a k such that the k-th block is good for i and the unique j ∈ P_i in the k-th block satisfies y_j = x_i. By Markov's inequality, the probability that the block that we picked has more than a 100δ-fraction of errors is less than 1/100.
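Before continuing the analysis, the exact balls-in-bins probability and the lower bound used above can be checked numerically (our own sanity check, using exact binomial coefficients):

```python
from math import comb

def prob_exactly_one(a, b):
    # Exact probability that, placing b balls into a*b positions, exactly
    # one ball lands among the a positions of the first block.
    return a * comb(a * b - a, b - 1) / comb(a * b, b)

a, b = 1000, 50
p = prob_exactly_one(a, b)
lower = (1 - 1 / (a - 1)) ** (a - 1)   # the bound used above, tending to 1/e
assert p >= lower >= 3 / 10
assert 0.3 < p < 0.4                   # indeed close to 1/e ~ 0.37
```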
If the fraction of errors is at most 100δ, then our LDC decoder recovers the relevant bit y_j with probability 99/100. Hence the overall probability of outputting the correct value x_i is at least

(3/4) · (1/2) + (1/4 − 1/10 − 1/100) · (99/100) > 51/100.

We end up with an error-correcting data structure for Membership for a universe of size n/20 instead of n elements, but we can fix this by starting with the BMRV structure for 20n bits. We summarize this construction in a theorem:

Theorem 2 If there exists a (p, 100δ, 1/100)-LDC of length ℓ that encodes 1000s bits, then there exists a (p, δ, 49/100)-error-correcting data structure of length O(ℓ log n) for the s-out-of-n Membership problem.

The error and noise parameters of this new structure are not great, but they can be improved by a more careful analysis. We here sketch a better solution without giving all technical details. Suppose we change the decoding procedure for x_i as follows: pick j ∈ P_i uniformly at random, decode y_j from the LDC of the block where y_j sits, and output the result. There are three sources of error here: (1) the BMRV structure makes a mistake (i.e., j happens to be such that y_j ≠ x_i); (2) the LDC decoder fails because there is too much noise on the LDC that we are decoding from; (3) the LDC decoder fails even though there is not too much noise on it. The 2nd kind is hardest to analyze. The adversary will do best if he puts just a bit more than the tolerable noise level on the encodings of blocks that contain the most j ∈ P_i, thereby "destroying" those encodings. For a random permutation, we expect that about b/(e·m!) of the b blocks contain m elements of P_i. Hence about 1/65 of all blocks have 4 or more elements of P_i.
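The arithmetic in the analysis above can be verified quickly (our own check; the occupancy fractions b/(e·m!) are evaluated directly):

```python
from math import exp, factorial

# The success-probability bound from the analysis above:
# 3/4 * 1/2 + (1/4 - 1/10 - 1/100) * 99/100 > 51/100.
success = (3 / 4) * (1 / 2) + (1 / 4 - 1 / 10 - 1 / 100) * (99 / 100)
assert success > 51 / 100

# For a random permutation, about a 1/(e*m!) fraction of the blocks
# contains m elements of P_i; only a small fraction has 4 or more.
frac_4_plus = sum(1 / (exp(1) * factorial(m)) for m in range(4, 40))
assert frac_4_plus < 1 / 40
```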
If the LDC is designed to protect against a 65δ-fraction of errors within one encoded block, then with overall error fraction δ the adversary has exactly enough noise to "destroy" all blocks containing 4 or more elements of P_i. The probability that our uniformly random j sits in such a "destroyed" block is about

Σ_{m≥4} (m/b) · (b/(e·m!)) = (1/e)(1/3! + 1/4! + ···) ≈ 0.08.

Hence if we set the error of the BMRV structure to 1/10 and the error of the LDC to 1/100 (as above), then the total error probability for decoding x_i is less than 0.2 (of course, we need to show that we can fix a π such that good decoding occurs for a good fraction of all i ∈ [n]). Another parameter that may be adjusted is the block size, which we here took to be 1000s. Clearly, different tradeoffs between code length, tolerable noise level, and error probability are possible.

3 The Inner product problem

3.1 Noiseless case

Here we show bounds for Inner product, first for the case where there is no noise (δ = 0).

Upper bound. Consider all strings z of weight at most ⌈r/p⌉. The number of such z is B(n, ⌈r/p⌉) = Σ_{i=0}^{⌈r/p⌉} (n choose i) ≤ (epn/r)^{r/p}. We define our codeword by writing down, for all z in lexicographic order, the inner product x·z mod 2. If we want to recover the inner product x·y for some y of weight at most r, we write y = z_1 + ··· + z_p for z_j's of weight at most ⌈r/p⌉ and recover x·z_j for each j ∈ [p], using one probe for each. Summing the results of the p probes gives x·y (mod 2). In particular, for p = 1 probe, the length is B(n, r).

Lower bound. To prove a nearly matching lower bound, we use Miltersen's technique of relating a data structure to a two-party communication game [Mil94]. We refer to [KN97] for a general introduction to communication complexity.
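Returning to the "Upper bound" paragraph above, the 1-probe table and the p-probe decomposition can be sketched as follows. The greedy split of y's support into pieces of weight at most ⌈r/p⌉ is our own choice; any such split works:

```python
from itertools import combinations
from math import ceil

def build_table(x, r, p):
    # Store x . z mod 2 for every z of weight <= ceil(r/p), keyed by the
    # support of z (the lexicographic order of the paper is implicit).
    n, w = len(x), ceil(r / p)
    table = {}
    for k in range(w + 1):
        for support in combinations(range(n), k):
            table[support] = sum(x[i] for i in support) % 2
    return table

def query(table, y, r, p):
    # Split the support of y into at most p groups of weight <= ceil(r/p);
    # each group costs one probe, and the probe results are summed mod 2.
    support = [i for i, yi in enumerate(y) if yi]
    w = ceil(r / p)
    groups = [tuple(support[j:j + w]) for j in range(0, len(support), w)]
    assert len(groups) <= p
    return sum(table[g] for g in groups) % 2

x = [1, 0, 1, 1, 0, 1]
r, p = 4, 2
table = build_table(x, r, p)
y = [1, 1, 0, 1, 0, 1]   # weight 4 <= r
expected = sum(xi * yi for xi, yi in zip(x, y)) % 2
assert query(table, y, r, p) == expected
```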
Suppose Alice gets string x ∈ {0,1}^n, Bob gets string y ∈ {0,1}^n of weight ≤ r, and they need to compute x · y (mod 2) with bounded error probability and minimal communication between them. Call this communication problem IP_{n,r}. Let B(n,r) = Σ_{i=0}^{r} C(n,i) be the size of Q, i.e., the number of possible queries y. The proof of our communication complexity lower bound below uses a fairly standard discrepancy argument, but we have not found this specific result anywhere. For completeness we include a proof in Appendix A.

Theorem 3. Every communication protocol for IP_{n,r} with worst-case (or even average-case) success probability ≥ 1/2 + β needs at least log(B(n,r)) − 2 log(1/2β) bits of communication.

Armed with this communication complexity bound we can lower bound data structure length:

Theorem 4. Every (p, ε)-data structure for IP_{n,r} needs space N ≥ (1/2) · 2^{(log(B(n,r)) − 2 log(1/(1−2ε)) − 1)/p}.

Proof. We will use the data structure to obtain a communication protocol for IP_{n,r} that uses p(log(N) + 1) + 1 bits of communication, and then invoke Theorem 3 to obtain the lower bound. Alice holds x, and hence φ(x), while Bob simulates the decoder. Bob starts the communication. He picks his first probe to the data structure and sends it over in log N bits. Alice sends back the 1-bit answer. After p rounds of communication, all p probes have been simulated and Bob can give the same output as the decoder would have given. Bob's output will be the last bit of the communication. Theorem 3 now implies p(log(N) + 1) + 1 ≥ log(B(n,r)) − 2 log(1/(1 − 2ε)). Rearranging gives the bound on N. ✷

For fixed ε, the lower bound is N = Ω(B(n,r)^{1/p}).
This is Ω((n/r)^{r/p}), which (at least for small p) is not too far from the upper bound of approximately (epn/r)^{r/p} mentioned above. Note that in general our bound on N is superpolynomial in n whenever p = o(r). For instance, when r = αn for some constant α ∈ (0, 1/2), then N = Ω(2^{nH(α)/p}), which is non-trivial whenever p = o(n). Finally, note that the proof technique also works if Alice's messages are longer than 1 bit (i.e., if the code is over a larger-than-binary alphabet).

3.2 Noisy case

3.2.1 Constructions for Substring

One can easily construct error-correcting data structures for Substring, which also suffice for Inner product. Note that since we are recovering r bits, and each probe gives at most one bit of information, by information theory we need at least about r probes to the data structure.⁶ Our solutions below will use O(r log r) probes.⁷ View x as a concatenation x = x^{(1)} ... x^{(r)} of r strings of n/r bits each (we ignore rounding for simplicity), and define φ(x) as the concatenation of the Hadamard codes of these r pieces. Then φ(x) has length N = r · 2^{n/r}. If δ ≥ 1/4r then the adversary could corrupt one of the r Hadamard codes by 25% noise, ensuring that some of the bits of x are irrevocably lost even when we allow the full N probes. However, if δ ≪ 1/r then we can recover each bit x_i with small constant error probability by 2 probes in the Hadamard codeword where i sits, and with error probability ≪ 1/r using O(log r) probes. Hence we can compute f(x,y) = x_y with error close to 0 using p = O(r log r) probes (or with 2r probes if δ ≪ 1/r²). This also implies that any data structure problem where f(x,q) depends on at most some fixed constant r bits of x has an error-correcting data structure of length N = r · 2^{n/r} and p = O(r log r) probes, and that works if δ ≪ 1/r. Alternatively, we can take Efremenko's [Efr08] or Yekhanin's 3-probe LDC [Yek07], and just decode each of the r bits separately. Using O(log r) probes to recover a bit with error probability ≪ 1/r, we recover the r-bit string x_y using p = O(r log r) probes even if δ is a constant independent of r.

⁶ ... d/(log(N) + 1) probes in the case of quantum decoders.
⁷ It follows from Buhrman et al. [BNRW07] that if we allow a quantum decoder, the factor of log r is not needed.

3.2.2 Constructions for Inner product

Going through the proof of [Yek07], it is easy to see that it allows us to compute the parity of any set of r bits from x using at most 3r probes with error ε, if the noise rate δ is at most ε/(3r) (just add the results of the 3 probes one would make for each bit in the parity). To get error-correcting data structures even for small constant p (independent of r), we can adapt the polynomial schemes from [BIK05] to get the following theorem. The details are given in Appendix B.

Theorem 5. For every p ≥ 2, there exists a (p, δ, pδ)-error-correcting data structure for IP_{n,r} of length N ≤ p · 2^{r(p−1)² n^{1/(p−1)}}.

For the p = 2 case, we get something simpler and better from the Hadamard code. This code, of length 2^n, actually allows us to compute x · y (mod 2) for any y ∈ {0,1}^n of our choice, with 2 probes and error probability at most 2δ (just probe z and y ⊕ z for uniformly random z ∈ {0,1}^n and observe that (x · z) ⊕ (x · (y ⊕ z)) = x · y). Note that for r = Θ(n) and p = O(1), even non-error-correcting data structures need length 2^{Θ(n)} (Theorem 4). This is an example where error-correcting data structures are not significantly longer than the non-error-correcting kind.

4 Future work

Many questions are opened up by our model of error-correcting data structures.
We mention a few:

• There are plenty of other natural data structure problems, such as Rank, Predecessor, versions of Nearest neighbor, etc. [Mil99]. What about the length-vs-probes tradeoffs for their error-correcting versions? The obvious approach is to put the best known LDC on top of the best known non-error-correcting data structures. This is not always optimal, though: for instance, in the case of s-out-of-n Membership one can do significantly better, as we showed.

• It is often natural to assume that a memory cell contains not a bit, but some number from, say, a polynomial-size universe. This is called the cell-probe model [Yao81], in contrast to the bit-probe model we considered here. Probing a cell gives O(log n) bits at the same time, which can significantly improve the length-vs-probes tradeoff and is worth studying. Still, we view the bit-probe approach taken here as more fundamental than the cell-probe model. A p-probe cell-probe structure is an O(p log n)-probe bit-probe structure, but not vice versa. Also, the way memory is addressed in actual computers, in constant chunks of, say, 8 or 16 bits at a time, is closer in spirit to the bit-probe model than to the cell-probe model.

• Zvi Lotker suggested to me the following connection with distributed computing. Suppose the data structure is distributed over N processors, each holding one bit. Interpreted in this setting, an error-correcting data structure allows an honest party to answer queries about the encoded object while communicating with at most p processors. The answer will be correct with probability 1 − ε, even if up to a δ-fraction of the N processors are faulty or even malicious (the querier need not know where the faulty/malicious sites are).
Acknowledgments

Thanks to Nitin Saxena for many useful discussions, to Harry Buhrman and Jaikumar Radhakrishnan for discussions about [BMRV02], to Zvi Lotker for the connection with distributed computation mentioned in Section 4, to Peter Bro Miltersen for a pointer to [JMM07] and the faulty-memory RAM model, and to Gabriel Moruz for sending me a copy of that paper.

References

[AB96] Y. Aumann and M. Bender. Fault-tolerant data structures. In Proceedings of 37th IEEE FOCS, pages 580–589, 1996.

[BFF+07] G. Brodal, R. Fagerberg, I. Finocchi, F. Grandoni, G. Italiano, A. Jørgenson, G. Moruz, and T. Mølhave. Optimal resilient dynamic dictionaries. In Proceedings of 15th European Symposium on Algorithms (ESA), pages 347–358, 2007.

[BIK05] A. Beimel, Y. Ishai, and E. Kushilevitz. General constructions for information-theoretic Private Information Retrieval. Journal of Computer and System Sciences, 72(2):247–281, 2005.

[BMRV02] H. Buhrman, P. B. Miltersen, J. Radhakrishnan, and S. Venkatesh. Are bitvectors optimal? SIAM Journal on Computing, 31(6):1723–1744, 2002. Earlier version in STOC'00.

[BNRW07] H. Buhrman, I. Newman, H. Röhrig, and R. de Wolf. Robust polynomials and quantum algorithms. Theory of Computing Systems, 40(4):379–395, 2007.

[CIK+01] R. Canetti, Y. Ishai, R. Kumar, M. Reiter, R. Rubinfeld, and R. Wright. Selective private function evaluation with applications to private statistics. In Proceedings of 20th Annual ACM Symposium on Principles of Distributed Computing (PODC), pages 293–304, 2001.

[Efr08] K. Efremenko. 3-query locally decodable codes of subexponential length. Technical report, ECCC Report TR08-069, 2008.

[FGI07] I. Finocchi, F. Grandoni, and G. Italiano. Resilient search trees. In Proceedings of 18th ACM-SIAM SODA, pages 547–553, 2007.

[FI04] I. Finocchi and G. Italiano. Sorting and searching in the presence of memory faults (without redundancy). In Proceedings of 36th ACM STOC, pages 101–110, 2004.

[FKS84] M. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31(3):538–544, 1984.

[JMM07] A. G. Jørgenson, G. Moruz, and T. Mølhave. Resilient priority queues. In Proceedings of 10th International Workshop on Algorithms and Data Structures (WADS), volume 4619 of Lecture Notes in Computer Science, 2007.

[KN97] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.

[KT00] J. Katz and L. Trevisan. On the efficiency of local decoding procedures for error-correcting codes. In Proceedings of 32nd ACM STOC, pages 80–86, 2000.

[KW04] I. Kerenidis and R. de Wolf. Exponential lower bound for 2-query locally decodable codes via a quantum argument. Journal of Computer and System Sciences, 69(3):395–420, 2004.

[Mil94] P. B. Miltersen. Lower bounds for Union-Split-Find related problems on random access machines. In Proceedings of 26th ACM STOC, pages 625–634, 1994.

[Mil99] P. B. Miltersen. Cell probe complexity - a survey. Invited paper at Advances in Data Structures workshop. Available at Miltersen's homepage, 1999.

[MS77] F. MacWilliams and N. Sloane. The Theory of Error-Correcting Codes. North-Holland, 1977.

[RSV02] J. Radhakrishnan, P. Sen, and S. Venkatesh. The quantum complexity of set membership. Algorithmica, 34(4):462–479, 2002. Earlier version in FOCS'00.

[Tre04] L. Trevisan. Some applications of coding theory in computational complexity. Quaderni di Matematica, 13:347–424, 2004.

[TS02] A. Ta-Shma. Storing information with extractors. Information Processing Letters, 83(5):267–274, 2002.

[vL98] J. H. van Lint. Introduction to Coding Theory. Springer, third edition, 1998.

[Yao77] A. C-C. Yao. Probabilistic computations: Toward a unified measure of complexity. In Proceedings of 18th IEEE FOCS, pages 222–227, 1977.

[Yao81] A. C-C. Yao. Should tables be sorted? Journal of the ACM, 28(3):615–628, 1981.

[Yek07] S. Yekhanin. Towards 3-query locally decodable codes of subexponential length. In Proceedings of 39th ACM STOC, pages 266–274, 2007.

A Proof of Theorem 3

Let µ be the uniform input distribution: each x has probability 1/2^n and each y of weight ≤ r has probability 1/B(n,r). We show a lower bound on the communication c of deterministic protocols that compute IP_{n,r} with µ-probability at least 1/2 + β. By Yao's principle [Yao77], this lower bound then also applies to randomized protocols. Consider a deterministic c-bit protocol. Assume the last bit communicated is the output bit. It is well known that this partitions the input space into rectangles R_1, ..., R_{2^c}, where R_i = A_i × B_i, and the protocol gives the same output bit a_i for each (x, y) ∈ R_i.⁸ The discrepancy of rectangle R = A × B under µ is the difference between the weight of the 0s and the 1s in that rectangle:

δ_µ(R) = |µ(R ∩ IP_{n,r}^{−1}(1)) − µ(R ∩ IP_{n,r}^{−1}(0))|.

We can show for every rectangle that its discrepancy is not very large:

⁸ [KN97, Section 1.2]. The number of rectangles may be smaller than 2^c, but we can always add empty ones.

Lemma 1. δ_µ(R) ≤ √|R| / (√(2^n) · B(n,r)).

Proof. Let M be the 2^n × B(n,r) matrix whose (x,y)-entry is (−1)^{IP_{n,r}(x,y)} = (−1)^{x·y}. It is easy to see that M^T M = 2^n I, where I is the B(n,r) × B(n,r) identity matrix. This implies, for any v ∈ R^{B(n,r)},

‖Mv‖² = (Mv)^T (Mv) = v^T M^T M v = 2^n v^T v = 2^n ‖v‖².

Let R = A × B, and let v_A ∈ {0,1}^{2^n} and v_B ∈ {0,1}^{B(n,r)} be the characteristic (column) vectors of the sets A and B.
Note that ‖v_A‖ = √|A| and ‖v_B‖ = √|B|. The sum of M-entries in R is Σ_{a∈A, b∈B} M_{ab} = v_A^T M v_B. We can bound this using Cauchy-Schwarz:

|v_A^T M v_B| ≤ ‖v_A‖ · ‖M v_B‖ = ‖v_A‖ · √(2^n) · ‖v_B‖ = √(|A| · |B| · 2^n).

Observing that δ_µ(R) = |v_A^T M v_B| / (2^n B(n,r)) and |R| = |A| · |B| concludes the proof. ✷

Define the success and failure probabilities (under µ) of the protocol as

P_s = Σ_{i=1}^{2^c} µ(R_i ∩ IP_{n,r}^{−1}(a_i))   and   P_f = Σ_{i=1}^{2^c} µ(R_i ∩ IP_{n,r}^{−1}(1 − a_i)).

Then

2β ≤ P_s − P_f = Σ_i ( µ(R_i ∩ IP_{n,r}^{−1}(a_i)) − µ(R_i ∩ IP_{n,r}^{−1}(1 − a_i)) )
   ≤ Σ_i |µ(R_i ∩ IP_{n,r}^{−1}(a_i)) − µ(R_i ∩ IP_{n,r}^{−1}(1 − a_i))|
   = Σ_i δ_µ(R_i)
   ≤ (Σ_i √|R_i|) / (√(2^n) · B(n,r))
   ≤ √(2^c) · √(Σ_i |R_i|) / (√(2^n) · B(n,r))
   = √(2^c / B(n,r)),

where the last inequality is Cauchy-Schwarz and the last equality holds because Σ_i |R_i| is the total number of inputs, which is 2^n B(n,r). Rearranging gives 2^c ≥ (2β)² B(n,r), hence c ≥ log(B(n,r)) − 2 log(1/2β).

B Proof of Theorem 5

Here we construct p-probe error-correcting data structures for the inner product problem, inspired by the approach to locally decodable codes of [BIK05]. Let d be an integer to be determined later. Pick m = ⌈d n^{1/d}⌉. Then C(m,d) ≥ n, so there exist n distinct sets S_1, ..., S_n ⊆ [m], each of size d. For each x ∈ {0,1}^n, define an m-variate polynomial p_x of degree d over F_2 by

p_x(z_1, ..., z_m) = Σ_{i=1}^{n} x_i Π_{j∈S_i} z_j.

Note that if we identify S_i with its m-bit characteristic vector, then p_x(S_i) = x_i. For z^{(1)}, ..., z^{(r)} ∈ {0,1}^m, define an rm-variate polynomial p_{x,r} over F_2 by

p_{x,r}(z^{(1)}, ..., z^{(r)}) = Σ_{j=1}^{r} p_x(z^{(j)}).
This polynomial p_{x,r}(z) has rm variables, degree d, and allows us to evaluate parities of any set of r of the variables of x: if y ∈ {0,1}^n (of weight r) has its 1-bits at positions i_1, ..., i_r, then

p_{x,r}(S_{i_1}, ..., S_{i_r}) = Σ_{j=1}^{r} x_{i_j} = x · y (mod 2).

To construct an error-correcting data structure for IP_{n,r}, it thus suffices to give a structure that enables us to evaluate p_{x,r} at any point w of our choice.⁹ Let w ∈ {0,1}^{rm}. Suppose we "secret-share" this into p pieces w^{(1)}, ..., w^{(p)} ∈ {0,1}^{rm} which are uniformly random subject to the constraint w = w^{(1)} + ··· + w^{(p)}. Now consider the prm-variate polynomial q_{x,r} defined by

q_{x,r}(w^{(1)}, ..., w^{(p)}) = p_{x,r}(w^{(1)} + ··· + w^{(p)}).   (2)

Each monomial M in this polynomial has at most d variables. If we pick d = p − 1, then for every M there will be a j ∈ [p] such that M does not contain variables from w^{(j)}. Assign all such monomials to a new polynomial q^{(j)}_{x,r}, which is independent of w^{(j)}. This allows us to write

q_{x,r}(w^{(1)}, ..., w^{(p)}) = q^{(1)}_{x,r}(w^{(2)}, ..., w^{(p)}) + ··· + q^{(p)}_{x,r}(w^{(1)}, ..., w^{(p−1)}).   (3)

Note that each q^{(j)}_{x,r} has domain of size 2^{(p−1)rm}. The data structure is defined as the concatenation, for all j ∈ [p], of the values of q^{(j)}_{x,r} on all possible inputs. This has length

N = p · 2^{(p−1)rm} = p · 2^{r(p−1)² n^{1/(p−1)}}.

This length is 2^{O(r n^{1/(p−1)})} for p = O(1). Answering a query works as follows: the decoder would like to evaluate p_{x,r} on some point w ∈ {0,1}^{rm}. He picks w^{(1)}, ..., w^{(p)} as above, and for each j ∈ [p], in the jth block of the code probes the point (w^{(1)}, ..., w^{(j−1)}, w^{(j+1)}, ..., w^{(p)}). This, if uncorrupted, returns the value of q^{(j)}_{x,r} at that point.
The decoder outputs the sum of his p probes (mod 2). If none of the probed bits were corrupted, then the output is p_{x,r}(w) by Eqs. (2) and (3). Note that the probe within the jth block is uniformly random in that block, so its error probability is exactly the fraction δ_j of errors in the jth block. Hence by the union bound, the total error probability is at most Σ_{j=1}^{p} δ_j. If the overall fraction of errors in the data structure is at most δ, then we have (1/p) Σ_{j=1}^{p} δ_j ≤ δ, hence the total error probability is at most pδ.

⁹ If we also want to be able to compute x · y (mod 2) for |y| < r, we can just add a dummy 0 as (n+1)st variable to x, and use its index r − |y| times as inputs to p_{x,r}.
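The scheme of this appendix can be sketched in code. The toy version below is my own noiseless illustration (all function names are mine): it evaluates each q^{(j)}_{x,r} on demand instead of tabulating its 2^{(p−1)rm} values, so it demonstrates the decoding identity (3) rather than the stored structure itself.

```python
import itertools
import math
import random

def make_sets(n, d):
    """n distinct size-d subsets S_1,...,S_n of [m], with m = ceil(d * n^(1/d))."""
    m = math.ceil(d * n ** (1.0 / d))
    sets = list(itertools.islice(itertools.combinations(range(m), d), n))
    assert len(sets) == n  # guaranteed since C(m,d) >= (m/d)^d >= n
    return sets, m

def eval_q_share(x, sets, r, shares, skip):
    """Evaluate q^(skip)_{x,r}: the monomials of p_{x,r}(w^(1)+...+w^(p)) that
    avoid share `skip`.  Expanding the product over S_i picks one share index
    per variable; each monomial (degree d = p-1 < p) misses some share, and we
    route it to the smallest missing index."""
    p = len(shares)
    total = 0
    for i, S in enumerate(sets):
        if x[i] == 0:
            continue
        for blk in range(r):
            for assign in itertools.product(range(p), repeat=len(S)):
                if min(set(range(p)) - set(assign)) != skip:
                    continue
                total += math.prod(shares[a][blk][l] for a, l in zip(assign, S))
    return total % 2

def decode_parity(x, sets, m, r, p, positions):
    """Recover x.y (mod 2) for y supported on `positions` (weight r), with p
    probes: secret-share the query point and sum the p share evaluations."""
    # Query point: block j is the characteristic vector of S_{positions[j]}.
    w = [[1 if l in sets[i] else 0 for l in range(m)] for i in positions]
    shares = [[[random.randint(0, 1) for _ in range(m)] for _ in range(r)]
              for _ in range(p - 1)]
    last = [[(w[blk][l] + sum(s[blk][l] for s in shares)) % 2 for l in range(m)]
            for blk in range(r)]
    shares.append(last)  # now w = w^(1) + ... + w^(p) over F_2
    return sum(eval_q_share(x, sets, r, shares, j) for j in range(p)) % 2

random.seed(1)
n, r, p = 10, 3, 3                       # degree d = p - 1 = 2
sets, m = make_sets(n, d=p - 1)
x = [random.randint(0, 1) for _ in range(n)]
positions = [1, 4, 7]
assert decode_parity(x, sets, m, r, p, positions) == (x[1] + x[4] + x[7]) % 2
```

Since the shares are fresh and uniform for every query, each of the p probes individually looks uniformly random within its block, which is what drives the pδ error bound in the noisy analysis above.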
