The Optimal Quantile Estimator for Compressed Counting


Authors: Ping Li

Ping Li (pingli@cornell.edu), Cornell University, Ithaca, NY 14853

Abstract

Compressed Counting (CC)¹ was recently proposed for very efficiently computing the (approximate) $\alpha$th frequency moments of data streams, where $0 < \alpha \le 2$. Several estimators were reported, including the geometric mean estimator, the harmonic mean estimator, the optimal power estimator, etc. The geometric mean estimator is particularly interesting for theoretical purposes. For example, when $\alpha \to 1$, the complexity of CC (using the geometric mean estimator) is $O(1/\epsilon)$, breaking the well-known large-deviation bound $O(1/\epsilon^2)$. The case $\alpha \approx 1$ has important applications, for example, computing the entropy of data streams. For practical purposes, this study proposes the optimal quantile estimator. Compared with previous estimators, this estimator is computationally more efficient and is also more accurate when $\alpha > 1$.

1 Introduction

Compressed Counting (CC) [4, 7] was very recently proposed for efficiently computing the $\alpha$th frequency moments, where $0 < \alpha \le 2$, in data streams. The underlying technique of CC is maximally skewed stable random projections, which significantly improves the well-known algorithm based on symmetric stable random projections [3, 6], especially when $\alpha \to 1$. CC boils down to a statistical estimation problem, and various estimators have been proposed [4, 7]. In this study, we present an estimator based on the optimal quantiles, which is computationally more efficient and significantly more accurate when $\alpha > 1$, as long as the sample size is not too small.

One direct application of CC is to estimate the entropy of data streams. A recent trend is to approximate entropy using frequency moments and to estimate frequency moments using symmetric stable random projections [11, 2]. [8] applied CC to estimate entropy and demonstrated huge improvements (e.g., 50-fold) over previous studies.

CC was recently presented at MMDS 2008: Workshop on Algorithms for Modern Massive Data Sets. Slides are available at http://www.stanford.edu/group/mmds/slides2008/li.pdf.

1.1 The Relaxed Strict Turnstile Data Stream Model

Compressed Counting (CC) assumes a relaxed strict Turnstile data stream model. In the Turnstile model [9], the input stream $a_t = (i_t, I_t)$, $i_t \in [1, D]$, arriving sequentially, describes the underlying signal $A$, meaning

$$A_t[i_t] = A_{t-1}[i_t] + I_t, \quad (1)$$

where the increment $I_t$ can be either positive (insertion) or negative (deletion). Restricting $A_t[i] \ge 0$ at all $t$ results in the strict Turnstile model, which suffices for describing most natural phenomena. CC constrains $A_t[i] \ge 0$ only at the $t$ we care about; at any $s \ne t$, CC allows $A_s[i]$ to be arbitrary.

Under the relaxed strict Turnstile model, the $\alpha$th frequency moment of a data stream $A_t$ is defined as

$$F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^\alpha. \quad (2)$$

When $\alpha = 1$, one can obviously compute $F_{(1)} = \sum_{i=1}^{D} A_t[i] = \sum_{s=1}^{t} I_s$ trivially, using a simple counter. When $\alpha \ne 1$, however, computing $F_{(\alpha)}$ exactly requires $D$ counters.

¹The results were initially drafted in January 2008, as part of a report for private communications with several theorists. That report was later filed to arXiv [7], which, for shortening the presentation, excluded the content of the optimal quantile estimator.

1.2 Maximally-Skewed Stable Random Projections

Based on maximally skewed stable random projections, CC provides a very efficient mechanism for approximating $F_{(\alpha)}$. One first generates a random matrix $R \in \mathbb{R}^{D \times k}$, whose entries are i.i.d. samples of a $\beta$-skewed $\alpha$-stable distribution with scale parameter 1, denoted by $r_{ij} \sim S(\alpha, \beta, 1)$. By properties of stable distributions [12, 10], the entries of the resultant projected vector $X = R^{\rm T} A_t \in \mathbb{R}^k$ are i.i.d. samples of a $\beta$-skewed $\alpha$-stable distribution whose scale parameter is the $\alpha$th frequency moment of $A_t$ we are after:

$$x_j = \left(R^{\rm T} A_t\right)_j = \sum_{i=1}^{D} r_{ij} A_t[i] \sim S\!\left(\alpha,\ \beta,\ F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^\alpha\right).$$

The skewness parameter $\beta \in [-1, 1]$; CC recommends $\beta = 1$, i.e., maximally skewed, for the best performance.

In a real implementation, the linear projection $X = R^{\rm T} A_t$ is conducted incrementally, using the fact that the Turnstile model is also linear: for every incoming $a_t = (i_t, I_t)$, we update $x_j \leftarrow x_j + r_{i_t j} I_t$ for $j = 1$ to $k$. This procedure is similar to that of symmetric stable random projections [3, 6]; the difference is the distribution of the elements in $R$.

2 The Statistical Estimation Problem and Previous Estimators

CC boils down to a statistical estimation problem: given $k$ i.i.d. samples $x_j \sim S\left(\alpha, \beta = 1, F_{(\alpha)}\right)$, estimate the scale parameter $F_{(\alpha)}$. Various estimators were proposed in [4, 7], including the geometric mean estimator, the harmonic mean estimator, the maximum likelihood estimator, and the optimal power estimator. Figure 1 compares their asymptotic variances, along with the asymptotic variance of the geometric mean estimator for symmetric stable random projections [6].

Figure 1: Let $\hat F$ be an estimator of $F$ with asymptotic variance ${\rm Var}\left(\hat F\right) = V\frac{F^2}{k} + O\left(\frac{1}{k^2}\right)$. We plot the $V$ values for the geometric mean estimator, the harmonic mean estimator (for $\alpha < 1$), the optimal power estimator (the lower dashed curve), and the optimal quantile estimator, along with the $V$ values for the geometric mean estimator for symmetric stable random projections in [6] ("symmetric GM", the upper dashed curve).
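The incremental projection described in Section 1.2 amounts to a one-row update per stream element. The following sketch (the helper name `stream_project` is ours, and the Gaussian entries of $R$ are placeholders for the skewed stable entries, since the update rule itself does not depend on the distribution of $R$) verifies that the streamed updates reproduce the batch projection $R^{\rm T}A_t$:

```python
import numpy as np

def stream_project(stream, R):
    """Incrementally maintain the projected vector X = R^T A_t.

    `stream` is a sequence of turnstile updates (i_t, I_t); `R` is the
    D x k projection matrix. Each update touches only row i_t of R,
    because the turnstile model is linear.
    """
    D, k = R.shape
    x = np.zeros(k)
    for i_t, I_t in stream:
        x += R[i_t, :] * I_t  # x_j <- x_j + r_{i_t j} * I_t for all j
    return x

# Sanity check against the batch projection on a tiny example.
rng = np.random.default_rng(0)
D, k = 5, 3
R = rng.standard_normal((D, k))
stream = [(0, 2.0), (3, 1.0), (0, -1.0), (4, 5.0)]

A = np.zeros(D)
for i_t, I_t in stream:
    A[i_t] += I_t  # accumulate the underlying signal A_t

x_stream = stream_project(stream, R)
x_batch = R.T @ A
assert np.allclose(x_stream, x_batch)
```

Because each update touches a single row of $R$, the per-update cost is $O(k)$, independent of $D$.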
When $\alpha \to 1$, CC achieves an "infinite improvement" in terms of the asymptotic variances.

2.1 The geometric mean estimator, $\hat F_{(\alpha),gm}$, for $0 < \alpha \le 2$ ($\alpha \ne 1$)

$$\hat F_{(\alpha),gm} = \frac{\prod_{j=1}^{k}|x_j|^{\alpha/k}}{\left[\cos^k\!\left(\frac{\kappa(\alpha)\pi}{2k}\right)\big/\cos\!\left(\frac{\kappa(\alpha)\pi}{2}\right)\right]\left[\frac{2}{\pi}\sin\!\left(\frac{\pi\alpha}{2k}\right)\Gamma\!\left(1-\frac{1}{k}\right)\Gamma\!\left(\frac{\alpha}{k}\right)\right]^k},$$

$${\rm Var}\left(\hat F_{(\alpha),gm}\right) = \frac{F_{(\alpha)}^2}{k}\frac{\pi^2}{12}\left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right),$$

where $\kappa(\alpha) = \alpha$ if $\alpha < 1$, and $\kappa(\alpha) = 2 - \alpha$ if $\alpha > 1$. $\hat F_{(\alpha),gm}$ is unbiased and has exponential tail bounds for all $0 < \alpha \le 2$.

2.2 The harmonic mean estimator, $\hat F_{(\alpha),hm,c}$, for $0 < \alpha < 1$

$$\hat F_{(\alpha),hm,c} = \frac{k\,\cos\!\left(\frac{\alpha\pi}{2}\right)\big/\Gamma(1+\alpha)}{\sum_{j=1}^{k}|x_j|^{-\alpha}}\left(1 - \frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right)\right),$$

$${\rm E}\left(\hat F_{(\alpha),hm,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \qquad {\rm Var}\left(\hat F_{(\alpha),hm,c}\right) = \frac{F_{(\alpha)}^2}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right) + O\left(\frac{1}{k^2}\right).$$

$\hat F_{(\alpha),hm,c}$ has exponential tail bounds.

2.3 The maximum likelihood estimator, $\hat F_{(0.5),mle,c}$, for $\alpha = 0.5$ only

$$\hat F_{(0.5),mle,c} = \left(1 - \frac{3}{4}\frac{1}{k}\right)\sqrt{\frac{k}{\sum_{j=1}^{k}\frac{1}{x_j}}},$$

$${\rm E}\left(\hat F_{(0.5),mle,c}\right) = F_{(0.5)} + O\left(\frac{1}{k^2}\right), \qquad {\rm Var}\left(\hat F_{(0.5),mle,c}\right) = \frac{1}{2}\frac{F_{(0.5)}^2}{k} + \frac{9}{8}\frac{F_{(0.5)}^2}{k^2} + O\left(\frac{1}{k^3}\right).$$

$\hat F_{(0.5),mle,c}$ has exponential tail bounds.

2.4 The optimal power estimator, $\hat F_{(\alpha),op,c}$, for $0 < \alpha \le 2$ ($\alpha \ne 1$)

$$\hat F_{(\alpha),op,c} = \left(\frac{\frac{1}{k}\sum_{j=1}^{k}|x_j|^{\lambda^*\alpha}}{\frac{\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)}{\cos^{\lambda^*}\left(\frac{\kappa(\alpha)\pi}{2}\right)}\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\!\left(\frac{\pi}{2}\lambda^*\alpha\right)}\right)^{1/\lambda^*} \times \left(1 - \frac{1}{k}\frac{1}{2\lambda^*}\left(\frac{1}{\lambda^*}-1\right)\left(\frac{\cos(\kappa(\alpha)\lambda^*\pi)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin(\pi\lambda^*\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2}-1\right)\right),$$

$${\rm E}\left(\hat F_{(\alpha),op,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right),$$

$${\rm Var}\left(\hat F_{(\alpha),op,c}\right) = \frac{F_{(\alpha)}^2}{\lambda^{*2}\,k}\left(\frac{\cos(\kappa(\alpha)\lambda^*\pi)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin(\pi\lambda^*\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2}-1\right) + O\left(\frac{1}{k^2}\right).$$
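As an illustration, the geometric mean estimator can be transcribed directly into code. The sanity check at $\alpha = 2$ relies on the assumption (our reading of the parametrization above) that $S(2, \beta, F)$ is a centered Gaussian with variance $2F$; the function name `f_hat_gm` is ours:

```python
import math
import numpy as np

def f_hat_gm(x, alpha):
    """Geometric mean estimator F_hat_{(alpha),gm}, 0 < alpha <= 2, alpha != 1;
    a direct transcription of the formula in Section 2.1."""
    k = len(x)
    kappa = alpha if alpha < 1 else 2.0 - alpha
    numerator = np.prod(np.abs(x) ** (alpha / k))
    denominator = (math.cos(kappa * math.pi / (2 * k)) ** k
                   / math.cos(kappa * math.pi / 2)) * (
        2 / math.pi * math.sin(math.pi * alpha / (2 * k))
        * math.gamma(1 - 1 / k) * math.gamma(alpha / k)) ** k
    return numerator / denominator

# Sanity check at alpha = 2 with F = 1, i.e., x_j ~ N(0, sqrt(2)):
rng = np.random.default_rng(1)
x = rng.normal(0.0, math.sqrt(2.0), size=10_000)
est_gm = f_hat_gm(x, alpha=2.0)  # should be close to 1
```

With the variance factor $\frac{\pi^2}{12}(\alpha^2 + 2 - 3\kappa^2(\alpha)) = \pi^2/2$ at $\alpha = 2$, the standard error at $k = 10^4$ is roughly $0.02$, so the check above should land well within a few percent of 1.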
$$\lambda^* = \mathop{\rm argmin}_{\lambda}\ g(\lambda;\alpha), \qquad g(\lambda;\alpha) = \frac{1}{\lambda^2}\left(\frac{\cos(\kappa(\alpha)\lambda\pi)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin(\pi\lambda\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2}-1\right).$$

When $0 < \alpha < 1$, $\lambda^* < 0$ and $\hat F_{(\alpha),op,c}$ has exponential tail bounds. $\hat F_{(\alpha),op,c}$ becomes the harmonic mean estimator when $\alpha = 0+$, the arithmetic mean estimator when $\alpha = 2$, and the maximum likelihood estimator when $\alpha = 0.5$.

3 The Optimal Quantile Estimator

Because $X \sim S\left(\alpha, \beta = 1, F_{(\alpha)}\right)$ belongs to the location-scale family (the location is always zero), one can estimate the scale parameter $F_{(\alpha)}$ simply from the sample quantiles.

3.1 A General Quantile Estimator

Assume $x_j \sim S\left(\alpha, 1, F_{(\alpha)}\right)$, $j = 1$ to $k$. One possibility is to use the $q$-quantile of the absolute values, i.e.,

$$\hat F_{(\alpha),q} = \left(\frac{q\text{-Quantile}\{|x_j|,\ j = 1, 2, \ldots, k\}}{W_q}\right)^{\alpha}, \quad (3)$$

where

$$W_q = q\text{-Quantile}\{|S(\alpha, \beta = 1, 1)|\}. \quad (4)$$

Denote $Z = |X|$, where $X \sim S\left(\alpha, 1, F_{(\alpha)}\right)$; note that when $\alpha < 1$, $Z = X$. Denote the probability density function of $Z$ by $f_Z\left(z; \alpha, F_{(\alpha)}\right)$, the cumulative distribution function by $F_Z\left(z; \alpha, F_{(\alpha)}\right)$, and the inverse cumulative function by $F_Z^{-1}\left(q; \alpha, F_{(\alpha)}\right)$. We can analyze the asymptotic (as $k \to \infty$) variance of $\hat F_{(\alpha),q}$, presented in Lemma 1.

Lemma 1.

$${\rm Var}\left(\hat F_{(\alpha),q}\right) = \frac{1}{k}\frac{(q-q^2)\,\alpha^2}{f_Z^2\left(F_Z^{-1}(q;\alpha,1);\alpha,1\right)\left(F_Z^{-1}(q;\alpha,1)\right)^2}F_{(\alpha)}^2 + O\left(\frac{1}{k^2}\right). \quad (5)$$

Proof: The proof follows directly from known statistical results on sample quantiles, e.g., [1, Theorem 9.2], and the "delta" method.
$${\rm Var}\left(\hat F_{(\alpha),q}\right) = \frac{1}{k}\frac{q-q^2}{f_Z^2\left(F_Z^{-1}\left(q;\alpha,F_{(\alpha)}\right);\alpha,F_{(\alpha)}\right)\left(F_Z^{-1}(q;\alpha,1)\right)^2}\left(F_{(\alpha)}^{(\alpha-1)/\alpha}\right)^2\alpha^2 + O\left(\frac{1}{k^2}\right)$$

$$= \frac{1}{k}\frac{(q-q^2)\,\alpha^2}{f_Z^2\left(F_Z^{-1}(q;\alpha,1);\alpha,1\right)\left(F_Z^{-1}(q;\alpha,1)\right)^2}F_{(\alpha)}^2 + O\left(\frac{1}{k^2}\right),$$

using the facts that

$$F_Z^{-1}\left(q;\alpha,F_{(\alpha)}\right) = F_{(\alpha)}^{1/\alpha}\,F_Z^{-1}(q;\alpha,1), \qquad f_Z\left(z;\alpha,F_{(\alpha)}\right) = F_{(\alpha)}^{-1/\alpha}\,f_Z\left(z F_{(\alpha)}^{-1/\alpha};\alpha,1\right).$$

We can choose $q = q^*$ to minimize the asymptotic variance factor,

$$\frac{(q-q^2)\,\alpha^2}{f_Z^2\left(F_Z^{-1}(q;\alpha,1);\alpha,1\right)\left(F_Z^{-1}(q;\alpha,1)\right)^2},$$

which is apparently a convex function of $q$, although there appears to be no simple algebraic method to prove the convexity (except when $\alpha = 0+$). We denote the optimal quantile estimator by $\hat F_{(\alpha),oq} = \hat F_{(\alpha),q^*}$.

3.2 The Optimal Quantiles

The optimal quantile, denoted by $q^* = q^*(\alpha)$, has to be determined by numerical procedures, using the simulated probability density functions of stable distributions. We used the fBasics package in R. We found, however, that those functions had numerical problems when $1 < \alpha < 1.011$ and $0.989 < \alpha < 1$. For all the other estimators, we have not noticed any numerical issues, even when $\alpha = 1 - 10^{-4}$ or $1 + 10^{-4}$. Therefore, we do not consider there to be any numerical instability in CC, as far as the method itself is concerned.

Table 1 presents the numerical results, including $q^*$, $W_{q^*} = q^*$-Quantile$\{|S(\alpha, \beta = 1, 1)|\}$, and the variance factor of $\hat F_{(\alpha),oq}$ (without the $\frac{1}{k}$ term). The variance factor is also plotted in Figure 1, indicating significant improvement over the geometric mean estimator when $\alpha > 1$.
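For illustration, Eq. (3) is a one-liner once $W_q$ is available. In this sketch (function name ours) $W_q$ is passed in as a precomputed constant, and the check at $\alpha = 2$ takes $q^*$ and $W_{q^*}$ from Table 1, assuming (our reading of the parametrization) that $S(2, \beta, 1)$ is $N(0, \sqrt{2})$:

```python
import numpy as np

def f_hat_quantile(x, alpha, q, W_q):
    """General quantile estimator of Eq. (3):
    F_hat = (q-quantile of |x_j| / W_q)^alpha,
    where W_q = q-quantile of |S(alpha, beta=1, 1)| (Eq. (4)) is precomputed."""
    z_q = np.quantile(np.abs(x), q)
    return (z_q / W_q) ** alpha

# Check at alpha = 2 with F = 1 (x_j ~ N(0, sqrt(2)) under this parametrization),
# using q* = 0.862 and W_{q*} = 2.097626 from Table 1:
rng = np.random.default_rng(3)
x = rng.normal(0.0, 2.0 ** 0.5, size=10_000)
est_oq = f_hat_quantile(x, alpha=2.0, q=0.862, W_q=2.097626)  # close to 1
```

Note the computational appeal: the estimator needs only a single order statistic of $\{|x_j|\}$, with no products, powers, or gamma functions over all $k$ samples.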
Table 1: The optimal quantile $q^*$, the asymptotic variance factor of $\hat F_{(\alpha),oq}$ (without the $\frac{1}{k}$ term), and $W_{q^*}$.

α      q*       Var factor     W_{q*}
0.20   0.180    1.39003806     0.05561700
0.30   0.167    1.21559359     0.11484008
0.40   0.151    1.00047427     0.2720723
0.50   0.137    0.76653704     0.4522449
0.60   0.127    0.53479789     0.7406894
0.70   0.116    0.32478420     1.231919
0.80   0.108    0.15465894     2.256365
0.85   0.104    0.08982992     3.296870
0.90   0.101    0.04116676     5.400842
0.95   0.098    0.01059831     11.74773
0.96   0.097    0.006821834    14.92508
0.97   0.096    0.003859153    20.22440
0.98   0.0944   0.001724739    30.82616
0.989  0.0941   0.0005243589   56.86694
1.011  0.8904   0.0005554749   58.83961
1.02   0.8799   0.001901498    32.76892
1.03   0.869    0.004424189    22.13097
1.04   0.861    0.008099329    16.80970
1.05   0.855    0.01298757     13.61799
1.10   0.827    0.05717725     7.206345
1.15   0.810    0.1365222      5.070801
1.20   0.799    0.2516604      4.011459
1.30   0.784    0.5808422      2.962799
1.40   0.779    1.0133272      2.468643
1.50   0.778    1.502868       2.191925
1.60   0.785    1.997239       2.048035
1.70   0.794    2.444836       1.968536
1.80   0.806    2.798748       1.937256
1.90   0.828    3.019045       1.976624
2.00   0.862    3.066164       2.097626

3.3 Comments on the Optimal Quantile Estimator

The optimal quantile estimator has at least two advantages:

• When the sample size $k$ is not too small (e.g., $k \ge 50$), $\hat F_{(\alpha),oq}$ is more accurate than $\hat F_{(\alpha),gm}$, especially for $\alpha > 1$.

• $\hat F_{(\alpha),oq}$ is computationally more efficient.

The disadvantages are:

• For small samples (e.g., $k \le 20$), $\hat F_{(\alpha),oq}$ exhibits bad behavior when $\alpha > 1$.

• Its theoretical analysis, e.g., variances and tail bounds, is based on the density function of skewed stable distributions, which does not have a closed form. The tail bounds can be obtained similarly using the method developed in [5].

• The important parameters, $q^*$ and $W_{q^*}$, are obtained from the numerically computed density functions. Due to the numerical difficulty in those functions, we can only obtain $q^*$ and $W_{q^*}$ values for $\alpha \ge 1.011$ and $\alpha \le 0.989$.
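Two rows of Table 1 can be cross-checked in closed form: at $\alpha = 2$ the stable law is Gaussian, and at $\alpha = 0.5$ it is the Lévy law, so both quantiles reduce to the Gaussian inverse CDF. This sketch assumes the parametrization in which $S(2, \beta, 1)$ is $N(0, \sqrt{2})$ and $S(0.5, 1, 1)$ is Lévy with unit scale (equivalently, $1/Z^2$ for $Z \sim N(0,1)$); the function names are ours:

```python
from statistics import NormalDist

def W_alpha2(q):
    """q-quantile of |S(2, beta, 1)|, i.e., of |N(0, sqrt(2))|:
    P(|X| <= w) = 2*Phi(w/sqrt(2)) - 1 = q."""
    return (2.0 ** 0.5) * NormalDist().inv_cdf((1.0 + q) / 2.0)

def W_alpha_half(q):
    """q-quantile of S(0.5, 1, 1), the unit-scale Levy law (1/Z^2, Z ~ N(0,1)):
    P(X <= w) = 2*(1 - Phi(1/sqrt(w))) = q."""
    return NormalDist().inv_cdf(1.0 - q / 2.0) ** (-2.0)

w2 = W_alpha2(0.862)          # Table 1 gives W_{q*} = 2.097626 at alpha = 2.00
w_half = W_alpha_half(0.137)  # Table 1 gives W_{q*} = 0.4522449 at alpha = 0.50
```

Both closed-form values agree with the corresponding Table 1 entries to about three decimal places, which is consistent with our reading of the parametrization.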
4 Conclusion

Compressed Counting (CC) dramatically improves symmetric stable random projections, especially when $\alpha \approx 1$, and has important applications in data stream computations such as entropy estimation. CC boils down to a statistical estimation problem. We propose the optimal quantile estimator, which considerably improves the previously proposed geometric mean estimator when $\alpha > 1$, at least asymptotically. For practical purposes, this estimator should be very useful. For theoretical purposes, however, it cannot replace the geometric mean estimator.

References

[1] Herbert A. David. Order Statistics. John Wiley & Sons, Inc., New York, NY, second edition, 1981.

[2] Nicholas J. A. Harvey, Jelani Nelson, and Krzysztof Onak. Sketching and streaming entropy via approximation theory. In FOCS, 2008.

[3] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM, 53(3):307–323, 2006.

[4] Ping Li. Compressed counting. CoRR, abs/0802.2305, 2008.

[5] Ping Li. Computationally efficient estimators for dimension reductions using stable random projections. CoRR, abs/0806.4422, 2008.

[6] Ping Li. Estimators and tail bounds for dimension reduction in $l_\alpha$ ($0 < \alpha \le 2$) using stable random projections. In SODA, pages 10–19, 2008.

[7] Ping Li. On approximating frequency moments of data streams with skewed projections. CoRR, abs/0802.0802, 2008.

[8] Ping Li. A very efficient scheme for estimating entropy of data streams using compressed counting. Technical report, Department of Statistical Science, Cornell University, 2008.

[9] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2):117–236, 2005.

[10] Gennady Samorodnitsky and Murad S. Taqqu. Stable Non-Gaussian Random Processes. Chapman & Hall, New York, 1994.
[11] Haiquan Zhao, Ashwin Lall, Mitsunori Ogihara, Oliver Spatscheck, Jia Wang, and Jun Xu. A data streaming algorithm for estimating entropies of od flows. In IMC, San Diego, CA, 2007.

[12] Vladimir M. Zolotarev. One-dimensional Stable Distributions. American Mathematical Society, Providence, RI, 1986.
