Tight Lower Bound for Linear Sketches of Moments


Authors: Alexandr Andoni, Huy L. Nguyen, Yury Polyanskiy, Yihong Wu

Alexandr Andoni (Microsoft Research SVC, andoni@microsoft.com), Huy L. Nguyễn (Princeton U, hlnguyen@princeton.edu), Yury Polyanskiy (MIT, yp@mit.edu), and Yihong Wu (UIUC, yihongwu@illinois.edu)

Abstract. The problem of estimating frequency moments of a data stream has attracted a lot of attention since the onset of streaming algorithms [AMS99]. While the space complexity for approximately computing the $p$th moment, for $p \in (0,2]$, has been settled [KNW10], for $p > 2$ the exact complexity remains open. For $p > 2$ the current best algorithm uses $O(n^{1-2/p}\log n)$ words of space [AKO11, BO10], whereas the lower bound is $\Omega(n^{1-2/p})$ [BJKS04]. In this paper, we show a tight lower bound of $\Omega(n^{1-2/p}\log n)$ words for the class of algorithms based on linear sketches, which store only a sketch $Ax$ of the input vector $x$ for some (possibly randomized) matrix $A$. We note that all known algorithms for this problem are linear sketches.

1 Introduction

One of the classical problems in the streaming literature is that of computing the $p$-frequency moments (or $p$-norm) [AMS99]. In particular, the question is to compute the norm $\|x\|_p$ of a vector $x \in \mathbb{R}^n$, up to a $1+\epsilon$ approximation, in the streaming model using low space. Here, we assume the most general model of streaming, where one sees updates to $x$ of the form $(i, \delta_i)$, meaning that a quantity $\delta_i \in \mathbb{R}$ is added to coordinate $i$ of $x$.⁵ In this setting, linear estimators, which store $Ax$ for a matrix $A$, are particularly useful, as such an update can be processed easily thanks to the equality $A(x + \delta_i e_i) = Ax + A(\delta_i e_i)$.

The frequency moments problem is among the problems that have received the most attention in the streaming literature. For example, the space complexity for $p \le 2$ has been fully understood.
Specifically, for $p = 2$, the foundational paper of [AMS99] showed that $O_\epsilon(1)$ words (linear measurements) suffice to approximate the Euclidean norm.⁶ Later work showed how to achieve the same space for all $p \in (0,2)$ norms [Ind06, Li08, KNW10]. This upper bound has a matching lower bound [AMS99, IW03, Bar02, Woo04]. Further research focused on other aspects, such as algorithms with improved update time (the time to process an update $(i, \delta_i)$) [NW10, KNW10, Li08, GC07, KNPW11].

In contrast, when $p > 2$, the exact space complexity still remains open. After a line of research on both upper bounds [AMS99, IW05, BGKS06, MW10, AKO11, BO10, Gan11] and lower bounds [AMS99, CKS03, BJKS04, JST11, PW12], we presently know that the best space upper bound is $O(n^{1-2/p}\log n)$ words, and the lower bound is $\Omega(n^{1-2/p})$ bits (or linear measurements). (Very recently, in a restricted streaming model where $\delta_i = 1$, [BO12] achieves an improved upper bound of nearly $O(n^{1-2/p})$ words.) In fact, since for $p = \infty$ the right bound is $O(n)$ (without the log factor), it may be tempting to assume that the right upper bound should be $O(n^{1-2/p})$ in the general case as well. In this work, we prove a tight lower bound of $\Omega(n^{1-2/p}\log n)$ for the case of linear estimators.

Footnotes:
⁵ For simplicity of presentation, we assume that $\delta_i \in \{-n^{O(1)}, \ldots, n^{O(1)}\}$, although more refined bounds can be stated otherwise. Note that in this case a "word" (or a measurement, in the case of a linear sketch; see the definition below) is usually $O(\log n)$ bits.
⁶ The exact bound is $O(1/\epsilon^2)$ words; since in this paper we concentrate on the case of $\epsilon = \Omega(1)$ only, we drop the dependence on $\epsilon$.
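The update rule $A(x + \delta_i e_i) = Ax + A(\delta_i e_i)$ is what makes linear sketches convenient to maintain over a stream: each update touches only one column of $A$. A minimal sketch of this bookkeeping (the Gaussian matrix and the dimensions are our own illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 50                      # ambient dimension, number of measurements
A = rng.standard_normal((m, n))      # an illustrative sketching matrix

x = np.zeros(n)
sketch = A @ x                       # maintained sketch Ax

# Stream of updates (i, delta_i): add delta_i to coordinate i.
updates = [(3, 2.5), (17, -1.0), (3, 0.5), (999, 4.0)]
for i, delta in updates:
    x[i] += delta                    # kept only to verify; a streaming
                                     # algorithm would not store x
    sketch += delta * A[:, i]        # O(m) work: A(x + delta*e_i) = Ax + delta*A e_i

# The incrementally maintained sketch equals the sketch of the final vector.
assert np.allclose(sketch, A @ x)
```

A streaming algorithm keeps only `sketch` ($m$ words) and discards $x$; the full vector is maintained above only to verify the linearity identity.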
A linear estimator uses a distribution over $m \times n$ matrices $A$ such that, with high probability over the choice of $A$, it is possible to calculate the $p$th moment $\|x\|_p$ from the sketch $Ax$. The parameter $m$, the number of words used by the algorithm, is also called the number of measurements of the algorithm. Our new lower bound is of $\Omega(n^{1-2/p}\log n)$ measurements/words, which matches the upper bound from [AKO11, BO10]. We stress that essentially all known algorithms in the general streaming model are in fact linear estimators.

Theorem 1. Fix $p \in (2, \infty)$. Any linear sketching algorithm for approximating the $p$th moment of a vector $x \in \mathbb{R}^n$ up to a multiplicative factor 2 with probability $99/100$ requires $\Omega(n^{1-2/p}\log n)$ measurements. In other words, for any $p \in (2, \infty)$ there is a constant $C_p$ such that for any distribution on $m \times n$ matrices $A$ with $m < C_p n^{1-2/p}\log n$ and any function $f : \mathbb{R}^{m \times n} \times \mathbb{R}^m \to \mathbb{R}_+$, we have

$$\inf_{x \in \mathbb{R}^n} \Pr\left[\tfrac{1}{2}\|x\|_p \le f(A, Ax) \le 2\|x\|_p\right] \le \frac{99}{100}. \qquad (1)$$

The proof uses similar hard distributions as in some of the previous work; namely, all coordinates of the input vector $x$ have random small values except for possibly one location. To succeed on these distributions, the algorithm has to distinguish between a mixture of Gaussian distributions and a pure Gaussian distribution. Analyzing the optimal probability of success directly seems too difficult. Instead, we use the $\chi^2$-divergence to bound the success probability, which turns out to be much more amenable to analysis.

From a statistical perspective, the problem of linear sketches of moments can be recast as a minimax statistical estimation problem, where one observes the pair $(Ax, A)$ and produces an estimate of $\|x\|_p$.
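As a toy illustration of why such instances are hard (our own one-dimensional computation, not from the paper): the total variation distance between a pure Gaussian and a symmetric Gaussian mixture with slightly shifted components, which controls the advantage of any distinguishing test, is already tiny for small shifts:

```python
import numpy as np

def normal_pdf(z, mean=0.0):
    return np.exp(-(z - mean) ** 2 / 2) / np.sqrt(2 * np.pi)

# Grid for numerical integration of TV = (1/2) * integral |p - q|.
z = np.linspace(-12, 12, 200001)
dz = z[1] - z[0]

def tv_pure_vs_mixture(mu):
    p = normal_pdf(z)                                        # pure N(0,1)
    q = 0.5 * normal_pdf(z, -mu) + 0.5 * normal_pdf(z, mu)   # symmetric mixture
    return 0.5 * np.sum(np.abs(p - q)) * dz

# A small shift is nearly invisible; a large one is easy to detect.
assert tv_pure_vs_mixture(0.1) < 0.01
assert tv_pure_vs_mixture(4.0) > 0.9
```

In the actual proof the mixture lives in $\mathbb{R}^m$ and the component means come from the columns of $A$, but the one-dimensional picture already shows why a direct analysis of the testing error is delicate.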
More specifically, this is a functional estimation problem, where the goal is to estimate some functional (in this case, the $p$th moment) of the parameter $x$ instead of estimating $x$ directly. Under this decision-theoretic framework, our argument can be understood as Le Cam's two-point method for deriving minimax lower bounds [LC86]. The idea is to use a binary hypothesis testing argument where two priors (distributions of $x$) are constructed such that 1) the $p$th moments of $x$ differ by a constant factor under the respective priors; 2) the resulting distributions of the sketches $Ax$ are indistinguishable. Consequently, there exists no moment estimator which can achieve constant relative error. This approach is also known as the method of fuzzy hypotheses [Tsy09, Section 2.7.4]. See also [BL96, IS03, Low10, CL11] for the method of using $\chi^2$-divergence in minimax lower bounds. We remark that our proof does not give a lower bound as a function of $\epsilon$ (but [Woo13] independently reports progress on this front).

1.1 Preliminaries

We use the following definitions of divergences.

Definition 1. Let $P$ and $Q$ be probability measures. The $\chi^2$-divergence from $P$ to $Q$ is

$$\chi^2(P \| Q) \triangleq \int \left(\frac{dP}{dQ} - 1\right)^2 dQ = \int \left(\frac{dP}{dQ}\right)^2 dQ - 1.$$

The total variation distance between $P$ and $Q$ is

$$V(P, Q) \triangleq \sup_A |P(A) - Q(A)| = \frac{1}{2}\int |dP - dQ|. \qquad (2)$$

The operational meaning of the total variation distance is as follows: denote the optimal sum of Type-I and Type-II error probabilities of the binary hypothesis testing problem $H_0 : X \sim P$ versus $H_1 : X \sim Q$ by

$$\mathcal{E}(P, Q) \triangleq \inf_A \{P(A) + Q(A^c)\}, \qquad (3)$$

where the infimum is over all measurable sets $A$ and the corresponding test is to declare $H_1$ if and only if $X \in A$. Then

$$\mathcal{E}(P, Q) = 1 - V(P, Q). \qquad (4)$$

The total variation and the $\chi^2$-divergence are related by the following inequality [Tsy09, Section 2.4.1]:

$$2V^2(P, Q) \le \log(1 + \chi^2(P \| Q)). \qquad (5)$$

Therefore, in order to establish that two hypotheses cannot be distinguished with vanishing error probability, it suffices to show that the $\chi^2$-divergence is bounded. One additional fact about $V$ and $\chi^2$ is the data-processing property [Csi67]: if a measurable function $f : A \to B$ carries probability measure $P$ on $A$ to $P'$ on $B$, and carries $Q$ to $Q'$, then

$$V(P, Q) \ge V(P', Q'). \qquad (6)$$

2 Lower Bound Proof

In this section we prove Theorem 1 for an arbitrary fixed measurement matrix $A$. Indeed, by Yao's minimax principle, we only need to demonstrate an input distribution and show that any deterministic algorithm succeeding on this distribution with probability $99/100$ must use $\Omega(n^{1-2/p}\log n)$ measurements.

Fix $p \in (2, \infty)$. Let $A \in \mathbb{R}^{m \times n}$ be a fixed matrix which is used to produce the linear sketch, where $m < C_p n^{1-2/p}\log n$ is the number of measurements and $C_p$ is to be specified. Next, we construct distributions $D_1$ and $D_2$ for $x$ to fulfill the following properties:

1. $\|x\|_p \le C n^{1/p}$ on the entire support of $D_1$, and $\|x\|_p \ge 4C n^{1/p}$ on the entire support of $D_2$, for some appropriately chosen constant $C$.
2. Let $E_1$ and $E_2$ denote the distributions of $Ax$ when $x$ is drawn from $D_1$ and $D_2$, respectively. Then $V(E_1, E_2) \le 98/100$.

The above claims immediately imply the desired (1) via the relationship between statistical tests and estimators. To see this, note that any moment estimator $f$ induces a test for distinguishing $E_1$ versus $E_2$: declare $D_2$ if and only if $\frac{f(A, Ax)}{2C n^{1/p}} \ge 1$.
In other words,

$$\Pr_{x \sim \frac{1}{2}(D_1 + D_2)}\left[\tfrac{1}{2}\|x\|_p \le f(A, Ax) \le 2\|x\|_p\right] \le \tfrac{1}{2}\Pr_{x \sim D_2}\left[f(A, Ax) \ge 2C n^{1/p}\right] + \tfrac{1}{2}\Pr_{x \sim D_1}\left[f(A, Ax) \le 2C n^{1/p}\right] \qquad (7)$$

$$\le \tfrac{1}{2}\left(1 + V(E_1, E_2)\right) \le \tfrac{99}{100}, \qquad (8)$$

where the last line follows from the characterization of the total variation in (2).

The idea for constructing the desired pair of distributions is to use the Gaussian distribution and its sparse perturbation. Since the moment of a Gaussian random vector takes values on the entire $\mathbb{R}_+$, we need to further truncate by taking its conditioned version. To this end, let $y \sim N(0, I_n)$ be a standard normal random vector and $t$ a random index uniformly distributed on $\{1, \ldots, n\}$ and independent of $y$. Let $\{e_1, \ldots, e_n\}$ denote the standard basis of $\mathbb{R}^n$. Let $\bar D_1$ and $\bar D_2$ be input distributions defined as follows: under the distribution $\bar D_1$, we let the input vector $x$ equal $y$. Under the distribution $\bar D_2$, we add a one-sparse perturbation by setting $x = y + C_1 n^{1/p} e_t$ with an appropriately chosen constant $C_1$. Now we set $D_1$ to be $\bar D_1$ conditioned on the event $E = \{z : \|z\|_p \le C n^{1/p}\}$, i.e., $D_1(\cdot) = \frac{\bar D_1(\cdot \cap E)}{\bar D_1(E)}$, and set $D_2$ to be $\bar D_2$ conditioned on the event $F = \{z : \|z\|_p \ge 4C n^{1/p}\}$. By the triangle inequality,

$$V(E_1, E_2) \le V(\bar E_1, \bar E_2) + V(\bar E_1, E_1) + V(\bar E_2, E_2) \le V(\bar E_1, \bar E_2) + V(\bar D_1, D_1) + V(\bar D_2, D_2) = V(\bar E_1, \bar E_2) + \Pr_{x \sim \bar D_1}\left(\|x\|_p \ge C n^{1/p}\right) + \Pr_{x \sim \bar D_2}\left(\|x\|_p \le 4C n^{1/p}\right), \qquad (9)$$

where the second inequality follows from the data-processing inequality (6) (applied to the mapping $x \mapsto Ax$). It remains to bound the three terms in (9).

First observe that for any $i$, $\mathbb{E}[|y_i|^p] = t_p$ where $t_p = 2^{p/2}\,\Gamma(\tfrac{p+1}{2})\,\pi^{-1/2}$. Thus $\mathbb{E}[\|y\|_p^p] = n t_p$. By Markov's inequality, $\|y\|_p^p \ge 100 n t_p$ holds with probability at most $1/100$.
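The closed form $t_p = 2^{p/2}\,\Gamma(\frac{p+1}{2})\,\pi^{-1/2}$ for $\mathbb{E}|y_i|^p$ can be sanity-checked by direct quadrature against the standard normal density (our own check, not part of the proof):

```python
import math
import numpy as np

def t_p_closed_form(p):
    # t_p = 2^{p/2} * Gamma((p+1)/2) / sqrt(pi)
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

def t_p_numeric(p):
    # E|Y|^p for Y ~ N(0,1), by quadrature on a wide grid
    z = np.linspace(-40, 40, 400001)
    dz = z[1] - z[0]
    pdf = np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)
    return np.sum(np.abs(z) ** p * pdf) * dz

for p in (2.5, 3.0, 4.0, 6.0):
    assert abs(t_p_numeric(p) - t_p_closed_form(p)) < 1e-5 * t_p_closed_form(p)

# p = 2 recovers the second moment E[Y^2] = 1.
assert abs(t_p_closed_form(2.0) - 1.0) < 1e-12
```

For instance, $p = 4$ gives $t_4 = 3$, the familiar fourth moment of a standard Gaussian.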
Now, if we set

$$C_1 = 4 \cdot (100 t_p)^{1/p} + 10, \qquad (10)$$

we have $(y_t + C_1 n^{1/p})^p > 4^p \cdot 100 n t_p$ with probability at least $99/100$, and hence the third term in (9) is also smaller than $\frac{1}{100}$.

It remains to show that $V(\bar E_1, \bar E_2) \le 96/100$. Without loss of generality, we assume that the rows of $A$ are orthonormal, since we can always change the basis of $A$ after taking the measurements. Let $\epsilon$ be a constant smaller than $1 - 2/p$. Assume that $m < \frac{\epsilon}{100 C_1^2} \cdot n^{1-2/p}\log n$. Let $A_i$ denote the $i$th column of $A$. Let $S$ be the set of indices $i$ such that $\|A_i\|_2 \le 10\sqrt{m/n} \le n^{-1/p}\sqrt{\epsilon \log n}/C_1$. Let $\bar S$ be the complement of $S$. Since $\sum_{i=1}^n \|A_i\|_2^2 = m$, we have $|\bar S| \le n/100$. Let $s$ be uniformly distributed on $S$ and $\tilde E_2$ the distribution of $A(y + C_1 n^{1/p} e_s)$. By the convexity of $(P, Q) \mapsto V(P, Q)$ and the fact that $V(P, Q) \le 1$, we have $V(\bar E_1, \bar E_2) \le V(\bar E_1, \tilde E_2) + \frac{|\bar S|}{n} \le V(\bar E_1, \tilde E_2) + 1/100$. In view of (5), it suffices to show that

$$\chi^2(\tilde E_2 \| \bar E_1) \le c \qquad (11)$$

for some sufficiently small constant $c$. To this end, we first prove a useful fact about the measurement matrix $A$.

Lemma 1. For any matrix $A$ with $m < \frac{\epsilon}{100 C_1^2} \cdot n^{1-2/p}\log n$ orthonormal rows, denote by $S$ the set of column indices $i$ such that $\|A_i\|_2 \le 10\sqrt{m/n}$. Then

$$|S|^{-2}\sum_{i,j \in S} e^{C_1^2 n^{2/p}\langle A_i, A_j\rangle} \le 1.03\, C_1^4\left(n^{-2+4/p+\epsilon}\, m + n^{2/p-1}\sqrt{m}\right) + 1.$$

Proof. Because $AA^T = I_m$, we have

$$\sum_{i,j \in [n]} \langle A_i, A_j\rangle^2 = \sum_{i,j \in [n]} (A^T A)_{ij}^2 = \|A^T A\|_F^2 = \mathrm{tr}(A^T A A^T A) = \mathrm{tr}(A^T A) = \|A\|_F^2 = m.$$

We consider the following relaxation: let $x_1, \ldots, x_{|S|^2} \ge 0$ where $\sum_i x_i^2 \le C_1^4 n^{4/p} \cdot m$ and $x_i \le \epsilon \log n$. We now upper bound $|S|^{-2}\sum_{i=1}^{|S|^2} e^{x_i}$. We have

$$|S|^{-2}\sum_{i=1}^{|S|^2} e^{x_i} = |S|^{-2}\sum_{i=1}^{|S|^2}\Big(1 + x_i + \sum_{j \ge 2}\frac{x_i^j}{j!}\Big) \le 1 + |S|^{-2}\sum_{i=1}^{|S|^2} x_i + |S|^{-2}\sum_{i=1}^{|S|^2} x_i^2 \sum_{j \ge 2}\frac{(\max_{i \in [n^2]} x_i)^{j-2}}{j!} \le 1 + |S|^{-2}\sqrt{|S|^2 \sum_i x_i^2} + |S|^{-2}\left(C_1^4\, m\, n^{4/p}\right)\frac{e^{\epsilon \log n}}{(\epsilon \log n)^2} \le 1 + 1.03\, C_1^2 \sqrt{m}\, n^{2/p-1} + 1.03\, C_1^4\, n^{-2+4/p+\epsilon}\, m.$$

The last inequality uses the fact that $99n/100 \le |S| \le n$. Applying the above upper bound to $x_{(i-1)|S|+j} = C_1^2 n^{2/p}|\langle A_i, A_j\rangle| \le C_1^2 n^{2/p}\|A_i\| \cdot \|A_j\| \le \epsilon \log n$, we conclude the lemma.

We also need the following lemma [IS03, p. 97], which gives a formula for the $\chi^2$-divergence from a Gaussian location mixture to a standard Gaussian distribution:

Lemma 2. Let $P$ be a distribution on $\mathbb{R}^m$. Then

$$\chi^2\left(N(0, I_m) * P \,\|\, N(0, I_m)\right) = \mathbb{E}[\exp(\langle X, X'\rangle)] - 1,$$

where $X$ and $X'$ are independently drawn from $P$.

We now proceed to proving an upper bound on the $\chi^2$-divergence between $\bar E_1$ and $\tilde E_2$.

Lemma 3. $\chi^2(\tilde E_2 \| \bar E_1) \le 1.03\, C_1^4\left(n^{-2+4/p+\epsilon}\, m + n^{2/p-1}\sqrt{m}\right)$.

Proof. Let $p_i = 1/|S|$ for all $i \in S$ be the probability that $s = i$. Recall that $s$ is the random index uniform on the set $S = \{i \in [n] : \|A_i\|_2 \le 10\sqrt{m/n}\}$. Note that $Ay \sim N(0, AA^T)$. Since $AA^T = I_m$, we have $\bar E_1 = N(0, I_m)$. Therefore $A(y + C_1 n^{1/p} e_s) \sim \tilde E_2 = \frac{1}{|S|}\sum_{i \in S} N(C_1 n^{1/p} A_i, I_m)$, a Gaussian location mixture. Applying Lemma 2 and then Lemma 1, we have

$$\chi^2(\tilde E_2 \| \bar E_1) = \sum_{i,j \in S} p_i p_j\, e^{C_1^2 n^{2/p}\langle A_i, A_j\rangle} - 1 \le 1.03\, C_1^4\left(n^{-2+4/p+\epsilon}\, m + n^{2/p-1}\sqrt{m}\right).$$

Finally, to finish the lower bound proof: since $\epsilon < 1 - 2/p$, we have $n^{-2+4/p+\epsilon}\, m + n^{2/p-1}\sqrt{m} = o(1)$, implying (11) for all sufficiently large $n$ and completing the proof of $V(E_1, E_2) \le 98/100$.

3 Discussion

While Theorem 1 is stated only for constant $p$, the proof also gives lower bounds for $p$ depending on $n$. At one extreme, the proof recovers the known $\Omega(n)$ lower bound for approximating the $\ell_\infty$-norm.
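As an aside, the identity of Lemma 2 can be checked numerically in one dimension (our own sketch; the two-point choice of $P$ is a hypothetical example, not from the paper). For $P$ uniform on $\{-\mu, +\mu\}$, one has $\mathbb{E}[\exp(XX')] = \cosh(\mu^2)$, so the lemma predicts $\chi^2 = \cosh(\mu^2) - 1$:

```python
import numpy as np

mu = 0.8
z = np.linspace(-15, 15, 300001)
dz = z[1] - z[0]

def phi(t, mean=0.0):
    return np.exp(-(t - mean) ** 2 / 2) / np.sqrt(2 * np.pi)

# Gaussian location mixture N(0,1) * P, with P uniform on {-mu, +mu}
mix = 0.5 * phi(z, -mu) + 0.5 * phi(z, mu)

# chi^2(mix || N(0,1)) = int mix^2 / phi - 1, by quadrature (Definition 1)
chi2_numeric = np.sum(mix ** 2 / phi(z)) * dz - 1

# Lemma 2: chi^2 = E[exp(X X')] - 1 = cosh(mu^2) - 1 for this two-point P
chi2_formula = np.cosh(mu ** 2) - 1

assert abs(chi2_numeric - chi2_formula) < 1e-6
```

In the proof, $P$ is instead uniform on the scaled columns $\{C_1 n^{1/p} A_i : i \in S\}$, which is exactly how the sum $\sum_{i,j} p_i p_j e^{C_1^2 n^{2/p}\langle A_i, A_j\rangle}$ arises.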
Notice that the ratio between the $\ell_{(\ln n)/\varepsilon}$-norm and the $\ell_\infty$-norm of any vector is bounded by $e^\varepsilon$, so it suffices to consider $p = (\ln n)/\varepsilon$ with a sufficiently small constant $\varepsilon$. Applying the Stirling approximation to the crude value of $C_1$ in the proof, we get $C_1 = \Theta(\sqrt{p})$. Thus, the lower bound we obtain is $\Omega(n^{1-2/p}(\log n)/C_1^2) = \Omega(n)$.

At the other extreme, when $p \to 2$, the proof also gives super-constant lower bounds up to $p = 2 + \Theta(\log\log n/\log n)$. Notice that $\epsilon$ can be set to $1 - 2/p - \Theta(\log\log n/\log n)$ instead of a positive constant strictly smaller than $1 - 2/p$. For this value of $p$, the proof gives a $\mathrm{polylog}(n)$ lower bound. We leave it as an open question to obtain tight bounds for $p = 2 + o(1)$.

Acknowledgments. HN was supported by NSF CCF 0832797 and a Gordon Wu Fellowship. YP's work was supported by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370.

References

AKO11. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms from precision sampling. In Proceedings of the Symposium on Foundations of Computer Science (FOCS), 2011. Full version appears on arXiv:1011.1263.
AMS99. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comp. Sys. Sci., 58:137-147, 1999. Previously appeared in STOC'96.
Bar02. Ziv Bar-Yossef. The complexity of massive data set computations. PhD thesis, UC Berkeley, 2002.
BGKS06. Lakshminath Bhuvanagiri, Sumit Ganguly, Deepanjan Kesh, and Chandan Saha. Simpler algorithm for estimating frequency moments of data streams. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 708-713, 2006.
BJKS04. Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68(4):702-732, 2004.
BL96. L. D. Brown and M. G. Low. A constrained risk inequality with applications to nonparametric functional estimation. The Annals of Statistics, 24:2524-2535, 1996.
BO10. Vladimir Braverman and Rafail Ostrovsky. Recursive sketching for frequency moments. CoRR, abs/1011.2571, 2010.
BO12. Vladimir Braverman and Rafail Ostrovsky. Approximating large frequency moments with pick-and-drop sampling. CoRR, abs/1212.0202, 2012.
CKS03. Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In IEEE Conference on Computational Complexity, pages 107-117, 2003.
CL11. T. T. Cai and M. G. Low. Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional. The Annals of Statistics, 39(2):1012-1041, 2011.
Csi67. I. Csiszár. Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar., 2:299-318, 1967.
Gan11. Sumit Ganguly. Polynomial estimators for high frequency moments. arXiv, 1104.4552, 2011.
GC07. Sumit Ganguly and Graham Cormode. On estimating frequency moments of data streams. In Proceedings of the International Workshop on Randomization and Computation (RANDOM), pages 479-493, 2007.
Ind06. Piotr Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. J. ACM, 53(3):307-323, 2006. Previously appeared in FOCS'00.
IS03. Y. I. Ingster and I. A. Suslina. Nonparametric goodness-of-fit testing under Gaussian models. Springer, New York, NY, 2003.
IW03. Piotr Indyk and David Woodruff. Tight lower bounds for the distinct elements problem. Proceedings of the Symposium on Foundations of Computer Science (FOCS), pages 283-290, 2003.
IW05. Piotr Indyk and David Woodruff. Optimal approximations of the frequency moments of data streams. Proceedings of the Symposium on Theory of Computing (STOC), 2005.
JST11. Hossein Jowhari, Mert Saglam, and Gábor Tardos. Tight bounds for Lp samplers, finding duplicates in streams, and related problems. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS), pages 49-58, 2011. Previously http://arxiv.org/abs/1012.4889.
KNPW11. Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the Symposium on Theory of Computing (STOC), 2011. A previous version appeared as arXiv:1007.4191, http://arxiv.org/abs/1007.4191.
KNW10. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching small norms. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010.
LC86. Lucien Le Cam. Asymptotic methods in statistical decision theory. Springer-Verlag, New York, NY, 1986.
Li08. Ping Li. Estimators and tail bounds for dimension reduction in lp (0 < p ≤ 2) using stable random projections. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2008.
Low10. M. G. Low. Chi-square lower bounds. Borrowing Strength: Theory Powering Applications - A Festschrift for Lawrence D. Brown, pages 22-31, 2010.
MW10. Morteza Monemizadeh and David Woodruff. 1-pass relative-error lp-sampling with applications. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2010.
NW10. Jelani Nelson and David Woodruff. Fast Manhattan sketches in data streams. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS), 2010.
PW12. Eric Price and David P. Woodruff. Applications of the Shannon-Hartley theorem to data streams and sparse recovery. In Proceedings of the 2012 IEEE International Symposium on Information Theory, pages 1821-1825, 2012.
Tsy09. A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer-Verlag, New York, NY, 2009.
Woo04. David Woodruff. Optimal space lower bounds for all frequency moments. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004.
Woo13. David Woodruff. Personal communication, February 2013.
