Uniform limit theorems for wavelet density estimators

Let $p_n(y)=\sum_k\hat{\alpha}_k\phi(y-k)+\sum_{l=0}^{j_n-1}\sum_k\hat {\beta}_{lk}2^{l/2}\psi(2^ly-k)$ be the linear wavelet density estimator, where $\phi$, $\psi$ are a father and a mother wavelet (with compact support), $\hat{\alpha}_k$, $\hat{\b…

Authors: Evarist Gine, Richard Nickl

The Annals of Probability 2009, Vol. 37, No. 4, 1605–1646. DOI: 10.1214/08-AOP447. © Institute of Mathematical Statistics, 2009.

By Evarist Giné and Richard Nickl, University of Connecticut and University of Cambridge.

Let $p_n(y)=\sum_k\hat\alpha_k\phi(y-k)+\sum_{l=0}^{j_n-1}\sum_k\hat\beta_{lk}2^{l/2}\psi(2^ly-k)$ be the linear wavelet density estimator, where $\phi$, $\psi$ are a father and a mother wavelet (with compact support), $\hat\alpha_k$, $\hat\beta_{lk}$ are the empirical wavelet coefficients based on an i.i.d. sample of random variables distributed according to a density $p_0$ on $\mathbb R$, and $j_n\in\mathbb Z$, $j_n\nearrow\infty$. Several uniform limit theorems are proved: First, the almost sure rate of convergence of $\sup_{y\in\mathbb R}|p_n(y)-Ep_n(y)|$ is obtained, and a law of the logarithm for a suitably scaled version of this quantity is established. This implies that $\sup_{y\in\mathbb R}|p_n(y)-p_0(y)|$ attains the optimal almost sure rate of convergence for estimating $p_0$, if $j_n$ is suitably chosen. Second, a uniform central limit theorem as well as strong invariance principles for the distribution function of $p_n$, that is, for the stochastic processes $\sqrt n(F_n^W(s)-F(s))=\sqrt n\int_{-\infty}^s(p_n-p_0)$, $s\in\mathbb R$, are proved; and more generally, uniform central limit theorems for the processes $\sqrt n\int(p_n-p_0)f$, $f\in\mathcal F$, for other Donsker classes $\mathcal F$ of interest are considered. As a statistical application, it is shown that essentially the same limit theorems can be obtained for the hard thresholding wavelet estimator introduced by Donoho et al. [Ann. Statist. 24 (1996) 508–539].

1. Introduction.

Let $X,X_1,\dots,X_n$ be independent identically distributed real-valued random variables with absolutely continuous law $P$ and density $p_0$, and denote by $P_n$ the usual empirical measure induced by the sample.
If $\phi$ is a bounded and compactly supported father wavelet (scaling function) and $\psi$ an associated mother wavelet, the (linear) wavelet density estimator of $p_0$ is given by

$$p_n(y)=\sum_{k\in\mathbb Z}\hat\alpha_{j_nk}2^{j_n/2}\phi(2^{j_n}y-k)=\sum_{k\in\mathbb Z}\hat\alpha_{0k}\phi(y-k)+\sum_{l=0}^{j_n-1}\sum_{k\in\mathbb Z}\hat\beta_{lk}2^{l/2}\psi(2^ly-k),\qquad y\in\mathbb R,\tag{1}$$

where $\hat\alpha_{lk}=\int2^{l/2}\phi(2^lx-k)\,dP_n(x)$, $\hat\beta_{lk}=\int2^{l/2}\psi(2^lx-k)\,dP_n(x)$ and where $j_n\nearrow\infty$. This estimator was introduced in Doukhan and León (1990) and Kerkyacharian and Picard (1992). The latter authors proved, using wavelet theory as established by Daubechies (1992), Meyer (1992) and others, that this estimator is, for a suitable choice of $j_n$, an optimal estimator of $p_0$ in $L^p$-loss, $1\le p<\infty$, if $p_0$ belongs to a Besov space $B^t_{pq}(\mathbb R)$. Furthermore, "nonlinear" modifications of $p_n$ were shown to be optimal even in more general settings, including, in particular, the case when $t$ is unknown [see Donoho, Johnstone, Kerkyacharian and Picard (1995, 1996), Delyon and Juditsky (1996), Kerkyacharian, Picard and Tribouley (1996), Hall, Kerkyacharian and Picard (1998), Juditsky and Lambert-Lacroix (2004) and others]. The linear estimator is part of the analysis of these more complex nonlinear estimators.

Received October 2007; revised October 2008. AMS 2000 subject classifications: Primary 62G07; secondary 60F05, 60F15, 60F17. Key words and phrases: wavelet density estimator, sup-norm loss, law of the logarithm, rates of convergence, uniform central limit theorem, wavelet thresholding, adaptive estimation. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Probability, 2009, Vol. 37, No. 4, 1605–1646. This reprint differs from the original in pagination and typographic detail.
We refer to the monographs Härdle, Kerkyacharian, Picard and Tsybakov (1998) and Vidakovic (1999) for a general treatment of the use of wavelets in statistics.

In this article, we have three main goals: the first two consist in studying the limiting behavior of the linear estimator $p_n(y)$ both as an estimator for the true density function $p_0(y)$ and as an estimator $F_n^W(s)=\int_{-\infty}^sp_n(y)\,dy$ for the true distribution function $F(s)=\int_{-\infty}^sp_0(y)\,dy$, in sup-norm loss. Third, as a statistical application, we consider the same problems for a nonlinear modification of $p_n$, namely the "hard thresholding" wavelet density estimator.

In the first case, we show that under mild conditions,

$$\sup_{y\in\mathbb R}|p_n(y)-Ep_n(y)|=O_{a.s.}\Bigg(\sqrt{\frac{j_n2^{j_n}}{n}}\Bigg);\tag{2}$$

in fact we obtain an exact law of the logarithm for a suitably scaled version of $p_n-Ep_n$, somewhat analogous to that of Deheuvels (2000) and Giné and Guillou (2002) for the Rosenblatt–Parzen kernel density estimator. A corollary of the proof also recovers, under weaker conditions, a result of Massiani (2003), where the supremum is taken over a bounded interval, as in the classical law of the logarithm of Stute (1982) for the Rosenblatt–Parzen estimator. The result (2) implies that, if $p_0$ is in the Besov space $B^t_{\infty\infty}(\mathbb R)$ (or in the corresponding Hölder space of order $t$), then

$$\sup_{y\in\mathbb R}|p_n(y)-p_0(y)|=O_{a.s.}\bigg(\Big(\frac{\log n}{n}\Big)^{t/(2t+1)}\bigg)\tag{3}$$

if one chooses $2^{j_n}\simeq(n/\log n)^{1/(2t+1)}$, which is the optimal rate of convergence in sup-norm loss. These results are complemented by expectation bounds and convergence of Laplace transforms.
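The trade-off behind (3) can be illustrated numerically. The sketch below (Python, standard library only) is an illustration under stated assumptions, not part of the argument of the paper: it takes the stochastic term to be exactly $\sqrt{j_n2^{j_n}/n}$, the order in (2), assumes the standard bias bound $\sup_y|Ep_n(y)-p_0(y)|\lesssim2^{-j_nt}$ for $p_0\in B^t_{\infty\infty}(\mathbb R)$, and sets all constants to one. The rounding rule in the hypothetical helper `resolution_level` is our own choice.

```python
import math


def resolution_level(n: int, t: float) -> int:
    """Hypothetical rule: j_n = round(log2((n / log n) ** (1 / (2t + 1)))), at least 1."""
    return max(1, round(math.log2((n / math.log(n)) ** (1.0 / (2.0 * t + 1.0)))))


def error_terms(n: int, t: float):
    """Return j_n, the stochastic term, the assumed bias term and the rate in (3)."""
    j = resolution_level(n, t)
    stochastic = math.sqrt(j * 2.0**j / n)               # order of sup|p_n - E p_n|, cf. (2)
    bias = 2.0 ** (-j * t)                               # assumed bias order, constants set to 1
    target = (math.log(n) / n) ** (t / (2.0 * t + 1.0))  # right-hand side of (3)
    return j, stochastic, bias, target


for n in (10**4, 10**6, 10**8):
    j, s, b, r = error_terms(n, t=1.0)
    print(f"n={n:>9}  j_n={j:2d}  stochastic/rate={s / r:.2f}  bias/rate={b / r:.2f}")
```

Both ratios remain of order one as $n$ grows, which is the content of (3) up to constants; rounding $j_n$ to an integer only moves them within a bounded range.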
In the second case, we show, for $j_n$ as in the previous paragraph (and other choices), that the processes

$$\sqrt n(F_n^W-F)(s),\qquad s\in\mathbb R,\tag{4}$$

converge in law in the Banach space of bounded functions on $\mathbb R$ to the $P$-Brownian bridge process, and that

$$\sup_{s\in\mathbb R}|F_n^W(s)-F(s)|=O_{a.s.}\Bigg(\sqrt{\frac{\log\log n}{n}}\Bigg);$$

in fact, we obtain an exact law of the iterated logarithm and a strong approximation result. More generally, we then also prove uniform central limit theorems for the processes

$$\sqrt n\int_{\mathbb R}(p_n(y)-p_0(y))f(y)\,dy,\qquad f\in\mathcal F,$$

for several (Donsker) classes of functions $\mathcal F$. These results again parallel limit theorems for the classical Rosenblatt–Parzen estimator [see Bickel and Ritov (2003), Giné and Nickl (2008, 2009)].

To motivate the relevance of our third goal, note that the resolution $j_n$ under which the linear estimator achieves the optimal rate (3) for $p_0\in B^t_{\infty\infty}(\mathbb R)$ depends on $t$, which is typically unknown. To remedy this, Donoho et al. (1996) introduced (soft and hard) thresholding wavelet estimators: one first chooses $j_n$ sufficiently large and independent of $t$, and then deletes the wavelet coefficients $\hat\beta_{lk}$ in (1) in a certain range of $l$'s if they are smaller than a certain threshold. This estimator does not depend on $t$ anymore, but still achieves rates of convergence in the $L^p$-loss, $1\le p<\infty$, that are optimal up to a logarithmic factor, uniformly over compactly supported densities that are contained in balls of Besov spaces $B^t_{pq}(\mathbb R)$, with $t$ unknown (but bounded). We show, as an application of our results for the linear estimator, that their hard thresholding estimator is exact rate adaptive in the sup-norm, that is, it achieves the optimal rate (3) in the sup-norm, even without a logarithmic penalty, for (not necessarily compactly supported) $p_0$ in $B^t_{\infty\infty}(\mathbb R)$, and any unspecified (but bounded) $t$.
(In fact, this implies optimality over balls of densities in $B^t_{pq}(\mathbb R)$ as well, cf. Remark 8 below.) Moreover, we prove that the hard thresholding wavelet density estimator also satisfies the central limit theorem (4). Hence this remarkable estimator is not only rate-adaptive in sup-norm loss, but also satisfies Bickel and Ritov's (2003) plug-in property.

The linear estimator in (1) can be expressed as a generalized kernel-type estimator

$$p_n(y)=\frac{2^{j_n}}{n}\sum_{i=1}^nK(2^{j_n}X_i,2^{j_n}y),$$

where $K(x,y)$ is the wavelet projection kernel. It is interesting to compare to other kernel choices. The classical case would be the Parzen–Rosenblatt kernel density estimator, where $K(x,y)=K(x-y)$ with $K$ some probability density: if one makes the usual conversion from bandwidth $h$ to $2^{-j}$, one can compare directly with the classical kernel case, and we discuss this in some detail in Remark 6 below. In a nutshell, while the proof in the wavelet case follows a pattern similar to the one for classical kernels, some fundamental difficulties arise due to the fact that $K(x,y)$ is not a "convolution-type" kernel $K(x-y)$. Most importantly, the size of the random fluctuations of the (centered) wavelet estimator $p_n(y)-Ep_n(y)$ depends on $y$ not only through $p_0(y)$, but also through the quantity $\int K^2(2^jy,u)\,du$, which is part of the variance term, and which has periodic oscillations on $\mathbb R$ (unless one restricts oneself to the Haar wavelet). Among other things, this requires a normalization in the law of the logarithm that is not necessary in the convolution-kernel case of Stute (1982) and Giné and Guillou (2002).
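For the Haar wavelet the kernel form can be made concrete. The following minimal sketch (Python with NumPy; all function names are hypothetical) assumes $\phi=1_{(0,1]}$ and $\psi=1_{(0,1/2]}-1_{(1/2,1]}$, for which $K(u,v)=\sum_k\phi(u-k)\phi(v-k)$ equals one exactly when $\lceil u\rceil=\lceil v\rceil$. It evaluates $p_n$ once through the kernel form and once through the multiresolution form (1), so the two can be compared. The optional `threshold` argument drops detail coefficients with $|\hat\beta_{lk}|\le$ `threshold`; this is a simplified hard-thresholding rule with a user-supplied constant, not the level-dependent threshold of Donoho et al. (1996).

```python
import numpy as np


def phi(x):
    """Haar father wavelet 1_{(0,1]}."""
    x = np.asarray(x, dtype=float)
    return ((x > 0) & (x <= 1)).astype(float)


def psi(x):
    """Haar mother wavelet 1_{(0,1/2]} - 1_{(1/2,1]}."""
    x = np.asarray(x, dtype=float)
    return ((x > 0) & (x <= 0.5)).astype(float) - ((x > 0.5) & (x <= 1)).astype(float)


def haar_kernel(u, v):
    """K(u, v) = sum_k phi(u - k) phi(v - k), which for Haar is 1{ceil(u) == ceil(v)}."""
    return (np.ceil(u) == np.ceil(v)).astype(float)


def p_n_kernel(y, sample, j):
    """Kernel form: p_n(y) = 2^j / n * sum_i K(2^j y, 2^j X_i)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))[:, None]
    return 2.0**j * haar_kernel(2.0**j * y, 2.0**j * sample[None, :]).mean(axis=1)


def p_n_multires(y, sample, j, threshold=None):
    """Multiresolution form (1); optionally drop small detail coefficients."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    est = np.zeros_like(y)
    # Level-0 father part: only k with some X_i in (k, k+1] give a nonzero alpha_hat_k.
    for k in np.unique(np.ceil(sample) - 1):
        alpha_hat = phi(sample - k).mean()
        est += alpha_hat * phi(y - k)
    # Detail levels l = 0, ..., j - 1 with psi_lk(x) = 2^{l/2} psi(2^l x - k).
    for l in range(j):
        scale = 2.0**l
        for k in np.unique(np.ceil(scale * sample) - 1):
            beta_hat = (np.sqrt(scale) * psi(scale * sample - k)).mean()
            if threshold is not None and abs(beta_hat) <= threshold:
                continue  # hard thresholding: this detail term is deleted
            est += beta_hat * np.sqrt(scale) * psi(scale * y - k)
    return est


rng = np.random.default_rng(0)
X = rng.normal(size=500)
grid = np.linspace(-3.1, 3.1, 400)
print(np.max(np.abs(p_n_kernel(grid, X, 4) - p_n_multires(grid, X, 4))))
```

With $\phi=1_{(0,1]}$ both evaluations reduce to a histogram with dyadic bins of width $2^{-j_n}$, and a very large threshold leaves only the level-0 term $\sum_k\hat\alpha_{0k}\phi(y-k)$.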
One might also be interested in considering projection kernels associated with other orthonormal systems that are not of wavelet type, as, for example, the Dirichlet kernel (which corresponds to an estimator based on Fourier series expansions). While our techniques may apply there as well, these kernels are often less interesting for estimating a function in the sup-norm, because of approximation-theoretic reasons: for example, the Fourier series of a uniformly continuous function might not converge at all points, and even if it does, its approximation properties in sup-norm can be suboptimal.

Our proofs are based on techniques from empirical process theory. Note that if $p_0$ is compactly supported, or if $y$ varies in a fixed compact set, then $p_n(y)-Ep_n(y)$ consists of a finite sum of centered empirical wavelet coefficients, and in this case "finite dimensional" probabilistic methods are sufficient to analyze the limiting behavior of $p_n$ in the sup-norm. Otherwise, empirical process methods seem to be required. We show that the classes of functions naturally associated to wavelet density estimators are of Vapnik–Červonenkis type, and this allows the effective use of exponential inequalities for empirical processes [Talagrand (1996)] and entropy-based moment bounds [e.g., see Einmahl and Mason (2000), Giné and Guillou (2001)]. We also use that bounded subsets of Besov spaces are $P$-Donsker classes of functions [Nickl and Pötscher (2007)]. Wavelet theory is used throughout, and a brief summary of what we need precedes the main results.

2. Basic setup.

2.1. Notation. For an arbitrary (nonempty) set $M$, $\ell^\infty(M)$ will denote the Banach space of bounded real-valued functions $H$ on $M$ normed by $\|H\|_M:=\sup_{m\in M}|H(m)|$, but we will use the symbol $\|h\|_\infty$ to denote $\sup_{x\in\mathbb R}|h(x)|$ for $h:\mathbb R\to\mathbb R$.
For Borel-measurable functions $h:\mathbb R\to\mathbb R$ and Borel measures $\mu$ on $\mathbb R$, we set $\mu h:=\int_{\mathbb R}h\,d\mu$, and we denote by $L^p(\mu):=L^p(\mathbb R,\mu)$ the usual Lebesgue spaces of real-valued functions, normed by $\|\cdot\|_{p,\mu}$. If $d\mu(x)=dx$ is Lebesgue measure, we set $L^p(\mathbb R):=L^p(\mathbb R,\mu)$, and, for $1\le p<\infty$, we abbreviate the norm by $\|\cdot\|_p$. Similarly, $\ell^p:=\ell^p(\mathbb Z)$, $1\le p\le\infty$, are the usual sequence spaces, and we also denote their norm by $\|\cdot\|_p$ in a slight abuse of notation. All integrals are over the real line unless stated otherwise.

Let $X_1,\dots,X_n$ be i.i.d. random variables with common law $P$ on $\mathbb R$, and denote by $P_n=n^{-1}\sum_{i=1}^n\delta_{X_i}$ the empirical measure. We assume throughout that the variables $X_i$ are the coordinate projections of $(\mathbb R^{\mathbb N},\mathcal B^{\mathbb N},P^{\mathbb N})$, and we set $\Pr:=P^{\mathbb N}$. The empirical process indexed by $\mathcal F\subseteq L^2(\mathbb R,P)$ is given by $f\mapsto\sqrt n(P_n-P)f$, $f\in\mathcal F$. Convergence in law of random elements in $\ell^\infty(\mathcal F)$ is defined as, for example, in Dudley (1999) or de la Peña and Giné (1999), and will be denoted by the symbol $\rightsquigarrow_{\ell^\infty(\mathcal F)}$. The class $\mathcal F$ is said to be $P$-Donsker if the centered Gaussian process $G_P$ with covariance $EG_P(f)G_P(g)=P[(f-Pf)(g-Pg)]$ is sample-bounded and sample-continuous w.r.t. the covariance semimetric, and if $\sqrt n(P_n-P)\rightsquigarrow_{\ell^\infty(\mathcal F)}G_P$.

2.2. Wavelet expansions and estimators. We recall here some standard facts from wavelet theory [e.g., see Meyer (1992), Daubechies (1992), Härdle et al. (1998) or Vidakovic (1999)]. Let $\phi\in L^2(\mathbb R)$ be a father wavelet, that is, $\phi$ is such that $\{\phi(\cdot-k):k\in\mathbb Z\}$ is an orthonormal system in $L^2(\mathbb R)$, and moreover the linear spaces $V_0=\{f(x)=\sum_kc_k\phi(x-k):\{c_k\}_{k\in\mathbb Z}\in\ell^2\}$, $V_1=\{h(x)=f(2x):f\in V_0\}$, $\dots$, $V_j=\{h(x)=f(2^jx):f\in V_0\}$, $\dots$
, are nested ($V_{j-1}\subseteq V_j$ for $j\in\mathbb N$) and such that $\bigcup_{j\ge0}V_j$ is dense in $L^2(\mathbb R)$. For $\phi$ with compact support and

$$K(y,x):=K_\phi(y,x)=\sum_{k\in\mathbb Z}\phi(y-k)\phi(x-k),\tag{5}$$

the functions $K_j(y,x):=2^jK(2^jy,2^jx)$, $j\in\mathbb N\cup\{0\}$, are the kernels of the orthogonal projections of $L^2(\mathbb R)$ onto $V_j$, and we write $K_j(f)(y)=\int K_j(y,x)f(x)\,dx$ for this projection. We will use the following properties repeatedly throughout the proofs: if $\phi$ (not necessarily a father wavelet) is bounded and compactly supported, we have [e.g., Härdle et al. (1998), Lemma 8.6]

$$|K(y,x)|\le\Phi(y-x)\quad\text{and}\quad\sum_k|\phi(\cdot-k)|\in L^\infty(\mathbb R),\tag{6}$$

where $\Phi:\mathbb R\to\mathbb R^+$ is bounded, compactly supported and symmetric. Furthermore, if $\phi$ is a bounded and compactly supported father wavelet, then, for every $x$,

$$\int K(x,y)\,dy=1\tag{7}$$

[see Corollary 8.1 in Härdle et al. (1998)]; moreover, for $f\in L^p(\mathbb R)$, $1\le p\le\infty$, and fixed $j$, the series

$$K_j(f)(y)=\sum_{k\in\mathbb Z}2^j\phi(2^jy-k)\int\phi(2^jx-k)f(x)\,dx,\qquad y\in\mathbb R,$$

converges pointwise (since for each $y$ this is a finite sum). For $f\in L^1(\mathbb R)$, which is the main case in this article, the convergence of the series in fact takes place in $L^p(\mathbb R)$, $1\le p\le\infty$. [For the reader's convenience, here is a proof: since $j$ is fixed, we can assume $j=0$. Setting $a_k=\int\phi(x-k)f(x)\,dx$, we have $\int K_0(f)(x)\phi(x-k)\,dx=a_k$ by compactness of the support of $\phi$ and orthogonality, hence

$$\sum_k|a_k|\le\int\sum_k|K_0(f)(x)\phi(x-k)|\,dx\le\sup_x\sum_k|\phi(x-k)|\,\|K_0(f)\|_1\le c_1\|\Phi*|f|\|_1\le c_2\|f\|_1\tag{8}$$

by (6). Therefore, for any $1\le p\le\infty$, $\sum_k\|a_k\phi(\cdot-k)\|_p\le\|\phi\|_p\sum_k|a_k|<\infty$.]

If now $\phi$ is a father wavelet and $\psi$ the associated mother wavelet so that $\{\phi(\cdot-k),2^{l/2}\psi(2^l(\cdot)-k):k\in\mathbb Z,\ l\in\mathbb N\cup\{0\}\}$ is an orthonormal basis of $L^2(\mathbb R)$ [see, e.g., Härdle et al. (1998), page 27], then any $f\in L^p(\mathbb R)$ has the formal expansion

$$f(y)=\sum_k\alpha_k(f)\phi(y-k)+\sum_{l=0}^\infty\sum_k\beta_{lk}(f)\psi_{lk}(y),\tag{9}$$

where $\psi_{lk}(y)=2^{l/2}\psi(2^ly-k)$, $\alpha_k(f)=\int f(x)\phi(x-k)\,dx$, $\beta_{lk}(f)=\int f(x)\psi_{lk}(x)\,dx$. Since $(K_{l+1}-K_l)f=\sum_k\beta_{lk}(f)\psi_{lk}$ [e.g., Härdle et al. (1998), page 92], the partial sums of the series (9) are in fact given by

$$K_j(f)(y)=\sum_k\alpha_k(f)\phi(y-k)+\sum_{l=0}^{j-1}\sum_k\beta_{lk}(f)\psi_{lk}(y)\tag{10}$$

and, just as in the previous paragraph, one shows that, if $\phi,\psi$ are bounded and have compact support, then (10) converges pointwise and also in $L^p(\mathbb R)$, $1\le p\le\infty$, if $f\in L^1(\mathbb R)$. If $p<\infty$ and $f\in L^p(\mathbb R)$, then convergence in (10) takes place in $L^p(\mathbb R)$ by a similar argument. Now using (6), (7), Minkowski's inequality for integrals and continuity of translations in $L^p(\mathbb R)$, we have $\|K_j(f)-f\|_p\le\int\Phi(u)\|f(2^{-j}u+\cdot)-f\|_p\,du\to0$ as $j\to\infty$ for all $f\in L^p(\mathbb R)$, $1\le p<\infty$, so that convergence of the wavelet series in (9) takes place in $L^p(\mathbb R)$.

Some regularity conditions on the wavelets $\phi,\psi$ will be needed. They parallel the order and moment conditions for convolution kernels in classical kernel density estimation. The standard conditions read as follows. Recall that $D\phi$ is the weak derivative of $\phi$ if $\int\phi Df=-\int(D\phi)f$ holds for all compactly supported infinitely differentiable functions $f:\mathbb R\to\mathbb R$.

Condition 1 (S).
We say that the orthonormal system $\{\phi(\cdot-k),\psi_{lk}:k\in\mathbb Z,\ l\in\mathbb N\cup\{0\}\}$ is $S$-regular if $\phi$ and $\psi$ are bounded and have compact support and if, in addition, one of the following two conditions is satisfied: either (i) the father wavelet $\phi$ has weak derivatives up to order $S$ that are in $L^p(\mathbb R)$ for some $1\le p\le\infty$; or (ii) the mother wavelet $\psi$ associated to $\phi$ satisfies $\int x^i\psi(x)\,dx=0$, $i=0,\dots,S$.

The Haar wavelets, corresponding to $\phi=1_{(0,1]}$ and $\psi=1_{(0,1/2]}-1_{(1/2,1]}$, satisfy this condition only for $S=0$. And, for any given $S$ there exist compactly supported wavelets $\phi$ and $\psi$ that satisfy condition (S) [e.g., Daubechies' wavelets, see Daubechies (1992), Chapter 6, or Härdle et al. (1998)].

Given $X_1,\dots,X_n$ i.i.d. with common absolutely continuous law $P$ on $\mathbb R$, the linear wavelet density estimator has the form

$$p_n(y):=P_n(K_{j_n}(y,\cdot))=\frac1n\sum_{i=1}^nK_{j_n}(y,X_i)=\sum_k\hat\alpha_k\phi(y-k)+\sum_{l=0}^{j_n-1}\sum_k\hat\beta_{lk}\psi_{lk}(y),\qquad y\in\mathbb R,\tag{11}$$

where $K$ is as in (5), $j_n\in\mathbb N$ satisfies $j_n\nearrow\infty$ as $n\to\infty$, and where $\hat\alpha_k=\int\phi(x-k)\,dP_n(x)$, $\hat\beta_{lk}=\int\psi_{lk}(x)\,dP_n(x)$ are the empirical wavelet coefficients. We note that for $\phi,\psi$ compactly supported, there are only finitely many $k$ for which the summands in (11) are nonzero (with the set of such $k$ depending on $y$). Note that, if $\phi=1_{(0,1]}$, then $p_n$ is just the usual histogram density estimator (with dyadic bin points). For general compactly supported wavelets $\phi,\psi$, the estimator $p_n$ was first studied by Doukhan and León (1990) and Kerkyacharian and Picard (1992).

2.3. Besov spaces.
To deal with the approximation error ("bias term") of wavelet density estimators, and for some proofs, we introduce the Besov spaces $B^s_{pq}(\mathbb R)$, which form a general scale of smooth function spaces (that contain all the classical ones as special cases). Besov spaces can be defined in several equivalent ways, the classical one being in terms of $L^p$–$L^q$ norms of the second differences $|h|^{-sq-1}\times\big(D^{s-\{s\}}f(\cdot+h)+D^{s-\{s\}}f(\cdot-h)-2D^{s-\{s\}}f(\cdot)\big)$ of weak derivatives of $f$, where $0<\{s\}\le1$ and $s-\{s\}\in\mathbb N\cup\{0\}$. Wavelet bases provide another characterization of these spaces, hence it is most convenient for our purposes to define them in this way.

Definition 1. Let $1\le p,q\le\infty$, $0<s<S$, $s\in\mathbb R$, $S\in\mathbb N$. Let $\phi$ be a bounded, compactly supported father wavelet that satisfies part (i) of Condition 1 (S), and denote by $\alpha_k(f)$ and $\beta_{lk}(f)$ the wavelet coefficients of $f\in L^p(\mathbb R)$. The Besov space $B^s_{pq}(\mathbb R)$ is defined as the set of functions

$$\Bigg\{f\in L^p(\mathbb R):\|f\|_{s,p,q}:=\|\alpha_{(\cdot)}(f)\|_p+\Bigg(\sum_{l=0}^\infty\big(2^{l(s+1/2-1/p)}\|\beta_{l(\cdot)}(f)\|_p\big)^q\Bigg)^{1/q}<\infty\Bigg\}$$

with the obvious modification in case $q=\infty$.

Remark 1 (Properties of Besov spaces). That this definition coincides with the more classical ones follows, for instance, from Meyer (1992, page 200) or Section 9 in Härdle et al. (1998). The definition is independent of the choice of $\phi,\psi$, and one has the continuous imbedding of $B^r_{pq}(\mathbb R)$ [defined via $\phi$ satisfying part (i) of Condition 1 (R) with $0<r<R$] into $B^s_{pq}(\mathbb R)$ [defined via a possibly different $\phi'$ satisfying part (i) of Condition 1 (S) with $0<s<S$] for $r\ge s$.
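Definition 1 is a weighted sequence norm on the wavelet coefficients, so it can be evaluated directly once coefficients are available. The minimal sketch below (Python with NumPy; the name `besov_seq_norm` is hypothetical) assumes that $\alpha_k(f)$ and $\beta_{lk}(f)$ have already been computed and truncated to finitely many $k$ and to levels $l<L$. On such a truncation it returns the finite part of $\|f\|_{s,p,q}$; since all omitted terms are nonnegative, this is a lower bound for the full norm.

```python
import numpy as np


def besov_seq_norm(alpha, betas, s, p, q):
    """Finite-level part of ||f||_{s,p,q} from Definition 1.

    alpha : coefficients alpha_k(f) (finitely many k)
    betas : list whose l-th entry holds the coefficients beta_{lk}(f), l = 0, ..., L-1
    p, q  : exponents in [1, inf]; pass np.inf for the sup-norm cases
    """
    inv_p = 0.0 if np.isinf(p) else 1.0 / p
    weighted = np.array(
        [2.0 ** (l * (s + 0.5 - inv_p)) * np.linalg.norm(np.asarray(b, dtype=float), ord=p)
         for l, b in enumerate(betas)]
    )
    if weighted.size == 0:
        detail = 0.0
    elif np.isinf(q):
        detail = weighted.max()  # the "obvious modification" for q = inf
    else:
        detail = (weighted**q).sum() ** (1.0 / q)
    return np.linalg.norm(np.asarray(alpha, dtype=float), ord=p) + detail


print(besov_seq_norm([1.0], [[1.0], [0.5]], s=1.0, p=2, q=2))  # equals 1 + sqrt(2)
```

The weight $2^{l(s+1/2-1/p)}$ is where smoothness enters: for fixed coefficients, a larger $s$ penalizes fine-scale detail coefficients more heavily.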
We also recall some well-known relations of $B^s_{pq}(\mathbb R)$ to classical smooth function spaces [see, e.g., Triebel (1983)]: for example, $B^s_{pq}(\mathbb R)$ is continuously imbedded into $L^p(\mathbb R)$ for $1\le p\le\infty$, and, if $C^s(\mathbb R)$ are the classical Hölder spaces (of $s$-times continuously differentiable functions in case $s\in\mathbb N$), then

$$B^s_{\infty1}(\mathbb R)\hookrightarrow C^s(\mathbb R)\hookrightarrow B^s_{\infty\infty}(\mathbb R)\tag{12}$$

holds, where the second imbedding is even an identity if $s$ is noninteger; and one also has the Sobolev type imbedding $B^s_{pq}(\mathbb R)\hookrightarrow C^{s-1/p}(\mathbb R)$ for $s>1/p$, or $s=1/p$ and $q=1$. Further examples are the classical Sobolev spaces $H^s(\mathbb R)=\{f\in L^2(\mathbb R):|\mathcal Ff(\cdot)|^2(1+|\cdot|^2)^s\in L^1(\mathbb R)\}$, where $\mathcal F$ is the Fourier transform, for which one has $H^s(\mathbb R)=B^s_{22}(\mathbb R)$; and if $BV(\mathbb R)=\{f:v_1(f)<\infty\}$, where $v_1$ is defined in (13) below, then $B^1_{11}(\mathbb R)\hookrightarrow BV(\mathbb R)\cap L^1(\mathbb R)\hookrightarrow B^1_{1\infty}(\mathbb R)$.

3. Entropy and expectation bounds.

In this section we will show that certain classes of functions related to the kernel $K(y,x)=\sum_{k\in\mathbb Z}\phi(y-k)\phi(x-k)$ are VC-type classes of functions, meaning that they have $L^2(Q)$ covering numbers of polynomial order, uniformly in all probability measures $Q$. Using expectation inequalities for VC-classes, we obtain, as an immediate consequence, a finite sample inequality for the expected value of the deviation of the wavelet estimator from its mean. Also, these VC-bounds will be applied in later sections to obtain various exponential inequalities for wavelet density estimators.

A function $h$ is of bounded $p$-variation on $\mathbb R$, $0<p<\infty$, if

$$v_p(h):=\sup\Bigg\{\sum_{i=1}^n|h(x_i)-h(x_{i-1})|^p:n\in\mathbb N,\ -\infty<x_0<x_1<\dots<x_n<\infty\Bigg\}\tag{13}$$

is finite. The following lemma, which uses (and generalizes) a result due to Nolan and Pollard (1987), will be useful in what follows.
As usual, for $\mathcal H$ a class of functions in $L^r(Q)$, $1\le r<\infty$, $N(\mathcal H,L^r(Q),\varepsilon)$ denotes the minimal number of $L^r(Q)$-balls of radius less than or equal to $\varepsilon$ that cover $\mathcal H$. The logarithm of the covering number is the $L^r(Q)$-metric entropy of $\mathcal H$.

Lemma 1. Let $h:\mathbb R\to\mathbb R$ be a function of bounded $p$-variation, $p\ge1$. Set $\mathcal H=\{h((\cdot)t-s):t,s\in\mathbb R\}$. Then $\mathcal H$ satisfies

$$\sup_QN(\mathcal H,L^2(Q),\varepsilon)\le\Big(\frac A\varepsilon\Big)^v,\qquad0<\varepsilon<A,$$

with finite positive constants $A,v$ depending only on $h$, and where the supremum extends over all Borel probability measures $Q$ on $\mathbb R$.

Proof. It is known that $h$ is equal to $g\circ f$, where $f$ is nondecreasing with range contained in $[0,v_p(h)]$ and $g$ is $1/p$-Hölder-continuous on the full interval $[0,v_p(h)]$ [see Love and Young (1937) and also Dudley (1992), page 1971]. The set $\mathcal M$ of dilations and translations of $f$ satisfies the required entropy bound with $L^2(Q)$ replaced by $L^r(Q)$ for any $r>0$ (where $\|\cdot\|_{r,Q}=\int|\cdot|^r\,dQ$ if $r<1$), with a constant $A$ that depends only on $r$ times $v_1(f)$ [see Nolan and Pollard (1987), and de la Peña and Giné (1999), page 224, for $r<1$]. Since

$$\int|g(m_1)-g(m_2)|^2\,dQ\le\int|m_1-m_2|^{2/p}\,dQ,$$

it follows that any $\varepsilon$-covering of $\mathcal M$ for $L^{2/p}(Q)$ induces an $\varepsilon^s$-covering of $\mathcal H$ of the same cardinality, for $s=1/p$ if $2/p\ge1$ and $s=1/2$ otherwise, proving the lemma (for suitable $v$ depending only on $p$). $\square$

We will impose the following condition on the function $\phi$ defining the kernel $K$ in (5).

Condition 2. $\phi:\mathbb R\to\mathbb R$ is of bounded $p$-variation for some $1\le p<\infty$ and vanishes on $(B_1,B_2]^c$ for some $-\infty<B_1<B_2<\infty$.

The Haar father wavelet $\phi=1_{(0,1]}$ is of bounded variation ($p=1$) and hence satisfies Condition 2.
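The $p$-variation (13) can be computed exactly for a function restricted to a finite grid $x_0<\dots<x_m$: the supremum then runs over increasing subsequences of the grid, and a dynamic program over the last point of the subsequence finds it in $O(m^2)$ steps. Because the grid restricts the admissible partitions, the value is a lower bound for $v_p(h)$. In the minimal sketch below (Python, standard library only; the name `p_variation` is hypothetical), the first example samples the Haar father wavelet across both of its jumps, and the second samples the $1/2$-Hölder function $\sqrt x$ on $[0,1]$, whose $2$-variation there is at most one because $(\sqrt b-\sqrt a)^2\le b-a$.

```python
import math


def p_variation(values, p):
    """Exact p-variation of the finite sequence h(x_0), ..., h(x_m) (grid version of (13)).

    best[j] is the largest sum of |h(x_{i_r}) - h(x_{i_{r-1}})|^p over increasing index
    chains ending at index j; a chain consisting of a single point has sum 0.
    """
    m = len(values)
    best = [0.0] * m
    for j in range(m):
        for i in range(j):
            candidate = best[i] + abs(values[j] - values[i]) ** p
            if candidate > best[j]:
                best[j] = candidate
    return max(best, default=0.0)


haar_samples = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]             # phi = 1_{(0,1]} sampled across its jumps
sqrt_samples = [math.sqrt(k / 100) for k in range(101)]  # 1/2-Hoelder function on [0, 1]
print(p_variation(haar_samples, 1), p_variation(sqrt_samples, 2))
```

The Haar samples give $1$-variation $2$, one unit per jump, consistent with $\phi=1_{(0,1]}$ satisfying Condition 2 with $p=1$.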
Furthermore, since any $\alpha$-Hölder-continuous function ($0<\alpha\le1$) with compact support is also of bounded $1/\alpha$-variation, Condition 2 is also satisfied, for example, for all Daubechies' (father) wavelets [see, e.g., Härdle et al. (1998), Remark 7.1]. It should be noted that not all Daubechies' wavelets are of bounded variation, but they are all Hölder continuous for some $0<\alpha<1$, which is why the generalization to $p$-variation of the result of Nolan and Pollard (1987) is useful in the present context.

Now for $\phi$ satisfying Condition 2, define

$$\mathcal F_\phi=\Bigg\{\sum_{k\in\mathbb Z}\phi(2^jy-k)\phi(2^j(\cdot)-k):y\in\mathbb R,\ j\in\mathbb N\cup\{0\}\Bigg\}\tag{14}$$

and

$$\mathcal D_{\phi,j}=\Bigg\{\sum_{k\in\mathbb Z}2^j\int_{-\infty}^t\phi(2^jy-k)\,dy\,\phi(2^j(\cdot)-k):t\in\mathbb R\Bigg\},\qquad j\in\mathbb N\cup\{0\}.\tag{15}$$

Notice that by (6), both classes have a constant envelope.

Lemma 2. Let $\mathcal G$ be either $\mathcal F_\phi$ or $\mathcal D_{\phi,j}$, where $\phi$ satisfies Condition 2. Then we have the uniform entropy bound

$$\sup_QN(\mathcal G,L^2(Q),\varepsilon)\le\Big(\frac A\varepsilon\Big)^v,\qquad0<\varepsilon<A,\tag{16}$$

for $A,v$ positive and finite constants depending only on $\phi$ (and not on $j$ for $\mathcal D_{\phi,j}$), and where the supremum extends over all Borel probability measures $Q$ on $\mathbb R$.

Proof. The case of $\mathcal F_\phi$: for $y,j$ fixed, the sum $\sum_{k\in\mathbb Z}\phi(2^jy-k)\phi(2^j(\cdot)-k)$ consists of at most $[B_2-B_1]+1$ summands, each of which has the form $\phi(2^jy-k)\phi(2^j(\cdot)-k)=c_{j,y,k}\phi(2^j(\cdot)-k)$, where $k$ is a fixed integer satisfying $2^jy-B_2\le k<2^jy-B_1$, and where $|c_{j,y,k}|\le\|\phi\|_\infty$. Since $\phi$ is of bounded $p$-variation, Lemma 1 above applies to the class $\mathcal M$ of dilations and translations of $\phi$, yielding the entropy bound (16) for $\mathcal M$ (with different constants $A,v$). The class $\mathcal F_\phi$ consists of linear combinations of at most $[B_2-B_1]+1$ elements of $\mathcal M$, whose coefficients are bounded in absolute value by $\|\phi\|_\infty$.
For given $\varepsilon'>0$, take an $\varepsilon'$-dense subset $\{a_l\}$ of $[-\|\phi\|_\infty,\|\phi\|_\infty]$ and an $L^2(Q)$-$\varepsilon'$-dense subset $\{m_i(\cdot)\}$ of $\mathcal M$. Then $\big\{\sum_{k=1}^{[B_2-B_1]+1}a_{l_k}m_{i_k}(\cdot)\big\}_{l,i}$ are the centers of a covering of $\mathcal F_\phi$ by $L^2(Q)$-balls of radius $\varepsilon=([B_2-B_1]+1)(\|\phi\|_\infty+1)\varepsilon'$, and a simple computation on covering numbers shows that the required entropy bound holds for $\mathcal F_\phi$.

The case of $\mathcal D_{\phi,j}$: by the support assumption on $\phi$, we have for every fixed $t$,

$$\sum_{k\in\mathbb Z}2^j\int_{-\infty}^t\phi(2^jy-k)\,dy\,\phi(2^j(\cdot)-k)=c\sum_{k\le2^jt-B_2}\phi(2^j(\cdot)-k)+\sum_{2^jt-B_2<\cdots}\cdots$$

…

$$\cdots>s\Big)\le\Pr\Bigg(\max_{n_{k-1}<\cdots}\cdots>s\sqrt{\frac{n_{k-1}j_{n_k}}{2^{j_{n_k}}}}\Bigg),$$

where $j\in\mathbb N$. To estimate the last probability, we will apply Talagrand's inequality to the classes of functions

$$\mathcal F_k=\Big\{\bar K(2^jy,2^j(\cdot))-P\big(\bar K(2^jy,2^j(\cdot))\big):y\in\mathbb R,\ j_{n_{k-1}}<j_n\le j_{n_k}\Big\},$$

which have constant envelope $2\|\Phi\|_\infty/D_1$ [by (6) and (20)] and satisfy the same entropy bound as $\mathcal F_\phi$ in Lemma 2, with possibly different $A,v$ (but independent of $k$), by that lemma and a simple computation on covering numbers (since $(\int K^2(2^jy,u)\,du)^{-1}\in[1/D_2^2,1/D_1^2]$ for all $y\in\mathbb R$). Consequently we may apply inequality (60) below with $U:=U_k=2\|\Phi\|_\infty/D_1$ and $\sigma^2=2^{-j_{n_{k-1}}}\|p_0\|_\infty$, where the bound on $\sigma$ follows from

$$\sup_y\int\bar K^2(2^jy,2^jx)p_0(x)\,dx=\sup_y2^{-j}\frac{\int K^2(2^jy,u)p_0(2^{-j}u)\,du}{\int K^2(2^jy,u)\,du}\le2^{-j}\|p_0\|_\infty.\tag{24}$$

To be precise, we also need that the supremum in (23) is countable, and we show in Remark 2 below that this is the case. Setting $s=\sqrt{2\tau+2C_1}\,\|p_0\|_\infty^{1/2}$ makes $t=s\sqrt{n_{k-1}2^{-j_{n_k}}j_{n_k}}$ an admissible choice in (60) for all $k$ large enough by the first and third conditions in (21).
As a consequence, for these values of $k$, the probability in question is bounded from above by

$$R\exp\Bigg(-\frac1{C_3}\,\frac{s^2n_{k-1}2^{-j_{n_k}}j_{n_k}}{n_k2^{-j_{n_{k-1}}}\|p_0\|_\infty}\Bigg)\le R\exp\Big(-\frac{2C_1j_{n_k}}{C_3}\Big).$$

Now the second limit in (21) becomes $j_{n_k}/\log k\to\infty$; hence the last expression is the general term of a convergent series. Thus, modulo measurability, we have proved that (for the stipulated $s$)

$$\sum_k\Pr\Big(\max_{n_{k-1}<\cdots}\cdots>s\Big)<\infty,$$

which gives the theorem by Borel–Cantelli and the 0–1 law. $\square$

Remark 2 (Measurability). In order to apply Talagrand's inequality in the previous theorem, we must show that the supremum in (22) is in fact a countable supremum. Let $T_1$ be the set of discontinuities of $\phi$, which is countable since $\phi$, being of bounded $p$-variation, is the composition of a Hölder-continuous function with a nondecreasing function (see the proof of Lemma 1). Let $T_0$ be a countable dense subset of $\mathbb R\setminus T_1$ and define $T=\{2^{-j}(z+k):k\in\mathbb Z,\ z\in T_0\cup T_1\}$. For each $y$, let $\phi_y=(\phi(2^jy-k):k\in\mathbb Z)\in\ell^\infty(\mathbb Z)$. We first prove that

$$\{\phi_y:y\in\mathbb R\}\subseteq\overline{\{\phi_y:y\in T\}},\tag{25}$$

where the closure is in $\ell^\infty(\mathbb Z)$. Given $y\in\mathbb R$, two cases are possible. Either $2^jy-k$ is a discontinuity point of $\phi$ for some $k\in\mathbb Z$, or $2^jy-k\in T_1^c$ for all $k$. In the first case, $y\in T$. In the second case, $\phi_y$ can be approximated by $\phi_{y_\delta}$, $y_\delta\in T$, as follows. Let $k_0$ be the largest integer such that $2^jy-k_0>B_2$, let $k_N=k_0+N$ be the smallest integer such that $2^jy-k_N<B_1$, and set $k_i=k_0+i$, $i=0,\dots,N$. Note that $N\le B_2-B_1+1$. Let $0<\delta_0<1$ be such that $\phi$ is continuous on the neighborhood $N_i(\delta_0)$ of $2^jy-k_i$ of radius $\delta_0$, $i=0,\dots,N$. For $\delta\le\delta_0$ let $z\in N_0(\delta)\cap T_0$ and define $y_\delta=2^{-j}(z+k_0)\in T$. Then $|2^jy-k_i-(2^jy_\delta-k_i)|<\delta$, $i=0,\dots,N$.
Hence, by continuity of $\phi$ at $2^jy-k_i$, we have $\max_{0\le i\le N}|\phi(2^jy-k_i)-\phi(2^jy_\delta-k_i)|\to0$ as $\delta\to0$. Since, moreover, $\phi(2^jy-k)=\phi(2^jy_\delta-k)=0$ if $k\notin\{k_0,\dots,k_N\}$, we have $\phi_{y_\delta}\to\phi_y$ in $\ell^\infty(\mathbb Z)$ as $\delta\to0$, concluding the proof of (25). Now

$$\sum_{i=1}^n\big(\bar K(2^jy,2^jX_i)-E\bar K(2^jy,2^jX)\big)=\frac{\sum_k\phi(2^jy-k)c_k}{\sqrt{\sum_k\phi^2(2^jy-k)}}=:\Gamma(y),$$

where $c_k$ are random variables satisfying $\sum_k|c_k|\le c<\infty$ for $c$ nonrandom by (6), and where $\sum_k\phi^2(2^jy-k)\ge D_1^2>0$ by (20). Hence if $\phi_{y_\delta}\to\phi_y$ in $\ell^\infty(\mathbb Z)$, then $\Gamma(y_\delta)\to\Gamma(y)$ (as $\delta\to0$). This, together with (25), proves that $\sup_{y\in\mathbb R}|\Gamma(y)|=\sup_{y\in T}|\Gamma(y)|$. That is, the supremum in (22) is countable.

Remark 3. The proof of Theorem 1 also shows that, under the conditions of this theorem,

$$\limsup_{n\to\infty}\sqrt{\frac{n}{j_n2^{j_n}}}\,\sup_{y\in\mathbb R}|p_n(y)-Ep_n(y)|=C\quad\text{a.s.},\tag{26}$$

where $C^2\le M^2\tau\|\Phi\|_2^2\|p_0\|_\infty$. [The only difference is that in this case we use the variance estimate $\sigma^2=2^{-j}\|p_0\|_\infty\|\Phi\|_2^2$, which follows as in (24).]

The following corollary to the proof of Theorem 1 will be needed for the more exact result below.

Corollary 1. Let $D\subset\mathbb R$ be such that $\|p_0\|_D>0$. If, in addition to the hypotheses in Theorem 1, $p_0$ is uniformly continuous, then

$$\limsup_{n\to\infty}\sqrt{\frac{n}{j_n2^{j_n}}}\,\sup_{y\in D}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|=C\quad\text{a.s.},$$

where $C^2\le2\tau M^2\|p_0\|_D$ and where $M$ is as in Theorem 1.

Proof. The proof is as in Theorem 1, after observing that for every $\varepsilon>0$ and $k$ large enough, the bound in (24), for $y\in D$, becomes $\sigma^2=(1+\varepsilon)2^{-j_{n_{k-1}}}\|p_0\|_D$, by uniform continuity of $p_0$ and since, for all $x$, $K(x,x+u)=0$ if $|u|>B_2-B_1$. $\square$

To obtain the exact constant in the almost sure limit, we now proceed to give a lower bound.

Proposition 2.
Let $\phi$ be a bounded father wavelet vanishing on $(B_1,B_2]^c$, $-\infty<B_1<B_2<\infty$, and assume that $P$ has a bounded continuous density $p_0$. Then, if $j_n/\log\log n\to\infty$, we have
$$\liminf_{n\to\infty}\sqrt{\frac{n}{(2\log2)j_n2^{j_n}}}\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\ge\|p_0\|_\infty^{1/2}\quad\text{a.s.}$$

Proof. By Proposition 2 in Einmahl and Mason (2000), the conclusion holds if, for every $\varepsilon>0$ and $n$ large enough (depending on $\varepsilon$), we can find $k_n=k_n(\varepsilon)$ points $z_{in}=z_{in}(\varepsilon)$, $i=1,\dots,k_n$, such that, if $g_i^{(n)}(x)=g_i^{(n,\varepsilon)}(x)=\bar K(2^{j_n}z_{in},2^{j_n}x)$, then the following conditions hold (for all $n$ large enough and for constants $r,\mu_i,\sigma_i$, $i=1,2$, depending on $\varepsilon$):

(a) $\Pr\{g_i^{(n)}(X)\ne0,\ g_k^{(n)}(X)\ne0\}=0$, $i\ne k$;
(b) $\sum_{i=1}^{k_n}\Pr\{g_i^{(n)}(X)\ne0\}\le1/2$;
(c) $2^{-j_n}k_n\to r\in(0,\infty)$;
(d) $2^{-j_n}\mu_1\le E(g_i^{(n)}(X))\le2^{-j_n}\mu_2$ for some $-\infty<\mu_1<\mu_2<\infty$;
(e) $\sigma_12^{-j_n/2}\le\sqrt{\operatorname{Var}(g_i^{(n)}(X))}\le\sigma_22^{-j_n/2}$ for some $0<\sigma_1<\sigma_2<\infty$;
(f) $\sup_{i,n}\|g_i^{(n)}\|_\infty<\infty$;
(g) $\lim_{\varepsilon\to0}\sigma_1(\varepsilon)=\lim_{\varepsilon\to0}\sigma_2(\varepsilon)=\|p_0\|_\infty$.

We proceed to verify these conditions. Given $\varepsilon>0$, let $I$ be an interval such that $p_0(x)\ge(1-\varepsilon)\|p_0\|_\infty$ for all $x\in I$ and such that $\Pr\{X\in I\}\le1/2$. Such an interval exists because $p_0$ is bounded and continuous. Set $I=[a,b]$ and define $z_{in}=a+3(B_2-B_1)i2^{-j_n}$, where
$$i=1,2,\dots,\left[\frac{(b-a)2^{j_n}}{3(B_2-B_1)}\right]-1=:k_n.$$
For (a), note that $K(2^{j_n}z_{in},2^{j_n}x)\ne0$ implies $|x-z_{in}|2^{j_n}\le B_2-B_1$, and that $|z_{in}-z_{kn}|\ge3|B_2-B_1|2^{-j_n}$ for $i\ne k$ by construction; together these imply that the set in question is empty. For (b), note that by (a) the sum of the probabilities in (b) is $\Pr\big(\bigcup_{i=1}^{k_n}\{g_i^{(n)}(X)\ne0\}\big)\le\Pr\{X\in I\}\le1/2$.
By construction, the limit in (c) is $r=\frac{b-a}{3(B_2-B_1)}$. Condition (f) follows immediately from (6) and the assumption on $\phi$. Conditions (d) and (e) are implied by the following estimates. First,
$$\begin{aligned}\int|\bar K(2^{j_n}z_{in},2^{j_n}x)|p_0(x)\,dx&\le D_1^{-1}\int|K(2^{j_n}z_{in},2^{j_n}x)|p_0(x)\,dx\\&\le2^{-j_n}D_1^{-1}\int|K(2^{j_n}z_{in},2^{j_n}z_{in}+u)|\,p_0(z_{in}+u2^{-j_n})\,du\\&\le2^{-j_n}D_1^{-1}\|p_0\|_\infty\|\Phi\|_1,\end{aligned}\tag{27}$$
where we use (20) in the first inequality and (6) in the last, and
$$\int\bar K^2(2^{j_n}z_{in},2^{j_n}x)p_0(x)\,dx\le2^{-j_n}\|p_0\|_\infty$$
by (24). These give the upper bounds in (d) and (e) with $\mu_2=D_1^{-1}\|p_0\|_\infty\|\Phi\|_1$ and $\sigma_2^2=\|p_0\|_\infty$. Second, for the lower bound in (d), again using (6), (7) and (20),
$$\begin{aligned}\int\bar K(2^{j_n}z_{in},2^{j_n}x)p_0(x)\,dx&\ge D_2^{-1}\int K(2^{j_n}z_{in},2^{j_n}x)\|p_0\|_\infty\,dx-D_2^{-1}\int|K(2^{j_n}z_{in},2^{j_n}x)|\,\big|\|p_0\|_\infty-p_0(x)\big|\,dx\\&\ge2^{-j_n}D_2^{-1}\|p_0\|_\infty(1-\varepsilon\|\Phi\|_1),\end{aligned}$$
which gives $\mu_1=D_2^{-1}\|p_0\|_\infty(1-\varepsilon\|\Phi\|_1)$ in (d). Third, for the lower bound in (e), note that the inequalities (27) give $(E(g_i^{(n)}(X)))^2=O(2^{-2j_n})$, whereas
$$E\big(g_i^{(n)}(X)\big)^2=2^{-j_n}\int_{B_1-B_2}^{B_2-B_1}\big(\bar K(2^{j_n}z_{in},2^{j_n}z_{in}+u)\big)^2p_0(z_{in}+u2^{-j_n})\,du\ge2^{-j_n}\|p_0\|_\infty(1-\varepsilon),$$
since $z_{in}+u2^{-j_n}\in I$ and by construction of $I$. So the lower bound in condition (e) is satisfied with $\sigma_1^2=\|p_0\|_\infty(1-2\varepsilon)$ for all $n$ large enough, which, together with $\sigma_2^2=\|p_0\|_\infty$, gives condition (g). ∎

This proposition, Theorem 1 and the bounds (20) already determine the a.s. rate of convergence of $\|p_n-Ep_n\|_\infty$:
$$\sqrt{\frac{n}{j_n2^{j_n}}}\|p_n-Ep_n\|_\infty=O_{\mathrm{a.s.}}(1)\quad\text{and not}\quad o_{\mathrm{a.s.}}(1),$$
and the same is true for the normalized quantity in Theorem 1. To obtain the exact limit (with normalization), we need the following proposition.
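The point configuration used in the proof of Proposition 2 can be checked numerically. The following is a minimal sketch, assuming a given interval $I=[a,b]$ and support bounds $B_1<B_2$; the function names are hypothetical and not taken from the paper. It builds the points $z_{in}=a+3(B_2-B_1)i2^{-j}$, $i=1,\dots,k_n$, verifies that the windows $|x-z_{in}|\le(B_2-B_1)2^{-j}$ on which $g_i^{(n)}$ can be nonzero do not overlap (condition (a)), and lets one watch $2^{-j}k_n$ approach $r=(b-a)/(3(B_2-B_1))$ as in condition (c).

```python
import math


def separated_points(a, b, B1, B2, j):
    """Points z_in = a + 3(B2 - B1) i 2^{-j}, i = 1..k_n, with
    k_n = [(b - a) 2^j / (3 (B2 - B1))] - 1 (integer part), as in the proof."""
    step = 3 * (B2 - B1) * 2.0 ** (-j)
    k_n = math.floor((b - a) * 2 ** j / (3 * (B2 - B1))) - 1
    return [a + step * i for i in range(1, k_n + 1)]


def supports_disjoint(points, B1, B2, j):
    """K(2^j z, 2^j x) != 0 forces |x - z| <= (B2 - B1) 2^{-j}, so consecutive
    points must be more than twice that half-width apart for condition (a)."""
    half = (B2 - B1) * 2.0 ** (-j)
    return all(q - p > 2 * half for p, q in zip(points, points[1:]))


if __name__ == "__main__":
    for j in (8, 12, 16):
        pts = separated_points(0.0, 1.0, 0, 1, j)
        # 2^{-j} k_n should approach r = 1/3 for I = [0, 1] and B2 - B1 = 1
        print(j, len(pts) * 2.0 ** (-j), supports_disjoint(pts, 0, 1, j))
```

The spacing factor 3 leaves a gap of one window width between neighbouring supports, which is more than condition (a) strictly needs; the sketch simply mirrors the construction in the text.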
In the next proposition, the (at first sight somewhat awkward) condition $B_1,B_2\in\mathbb{Z}$ is designed to include both the Haar wavelet and any continuous father wavelet with bounded support and bounded $p$-variation.

Proposition 3. Let $\phi$ be a father wavelet that satisfies Condition 2 and is uniformly continuous on $(B_1,B_2]$, where $B_1,B_2\in\mathbb{Z}$. Suppose $P$ has a bounded uniformly continuous density $p_0$. Let $D$ be a bounded subset of $\mathbb{R}$. Then, if $j_n$ satisfies (21), we have
$$\limsup_{n\to\infty}\sqrt{\frac{n}{(2\log2)j_n2^{j_n}}}\sup_{y\in D}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\le\|p_0\|_\infty^{1/2}\quad\text{a.s.}$$

Proof. We choose $\lambda\in(1,2)$ and $n'_k=[\lambda^k]$ (where $[a]$ denotes the integer part of $a$). Since $[\lambda^k]\le2[\lambda^{k-1}]$ for $k$ large enough (as $[\lambda^k]/[\lambda^{k-1}]\to\lambda<2$), it follows that for such $k$ the cardinality of the set $\{2^{-j_n}:n'_{k-1}<n\le n'_k\}$ does not exceed 2 if $\tau$ in (21) equals 1, which we assume, because the proof for larger $\tau$ requires only formal changes to the present proof. Define $n_{k-1}=n'_{k-1}$ if this cardinality is 1, and otherwise let $n_{k-1}$ be the largest integer $n$ such that $j_n=j_{n'_{k-1}}$. Then we have
$$[\lambda^{k-1}]=n'_{k-1}\le n_{k-1}<n'_k\le n_k<n'_{k+1}=[\lambda^{k+1}]\tag{28}$$
and
$$j_n=j_{n'_k}=j_{n_k}\quad\text{for }n_{k-1}<n\le n_k.\tag{29}$$
Let $\delta_m=1/m$ for $m\in\mathbb{N}$. For each given $k$ and $\delta_m$, we consider the following partition of $D$. $D$ is contained in the union of at most $2+\operatorname{diam}(D)/(2^{-j_{n_k}}(B_2-B_1))$ disjoint sets $(2^{-j_{n_k}}(B_1+l),2^{-j_{n_k}}(B_2+l)]$, $l\in\mathbb{Z}$. Then divide each of these intervals into $m(B_2-B_1)$ disjoint left-open right-closed subintervals $I_{k,i}$ of length $\delta_m2^{-j_{n_k}}$, and let $z_{ki}$ be the right endpoints of the intervals $I_{k,i}$ [i.e., $z_{ki}=(B_1+m'\delta_m+l)2^{-j_{n_k}}$ for some $1\le m'\le m$ and some $l\in\mathbb{Z}$].
Then the number $l_k$ of intervals $I_{ki}$ covering $D$ satisfies
$$l_k\le\frac{2+\operatorname{diam}(D)}{\delta_m2^{-j_{n_k}}}\le\frac{c}{\delta_m2^{-j_{n_k}}}\tag{30}$$
for some finite $c$ (and $k$ large enough depending on $m$). These intervals $I_{ki}$ also have the following property:
$$\text{if }z\in I_{ki}\text{ and }l\in\mathbb{Z},\text{ then }2^{j_{n_k}}z_{ki}-l\in(B_1,B_2]\iff2^{j_{n_k}}z-l\in(B_1,B_2],\tag{31}$$
and, for each $z_{ki}$, this happens for $B_2-B_1$ integers $l$. As in (24) we have $E\bar K^2(2^{j_{n_k}}z_{ki},2^{j_{n_k}}X)\le2^{-j_{n_k}}\|p_0\|_\infty$, hence the maximal version of Bernstein's inequality [see Einmahl and Mason (1996)] gives that, for all $\eta>0$,
$$\Pr\Big\{\max_{1\le i\le l_k}\max_{n_{k-1}<n\le n_k}\ \cdots\ >\sqrt{2(1+\eta)n_k2^{-j_{n_k}}\|p_0\|_\infty\log2^{j_{n_k}}}\Big\}$$
$$\le2l_k\exp\left(-\frac{2(1+\eta)n_k2^{-j_{n_k}}\|p_0\|_\infty\log2^{j_{n_k}}}{2n_k2^{-j_{n_k}}\|p_0\|_\infty+\frac43D_1^{-1}\|\Phi\|_\infty\sqrt{2(1+\eta)n_k2^{-j_{n_k}}\|p_0\|_\infty\log2^{j_{n_k}}}}\right),$$
which, by the first condition in (21), is dominated by
$$2l_k\exp\left(-\frac{(1+\eta)\log2^{j_{n_k}}}{1+\eta_k}\right)\le cm\big(2^{-j_{n_k}}\big)^{(1+\eta)/(1+\eta_k)-1}$$
for some $\eta_k\to0$ and $c$ as in (30). This is the general term of a convergent series since $\frac{1+\eta}{1+\eta_k}-1>0$ and by the second condition in (21). Then Borel–Cantelli shows that $\limsup_n\max_{1\le i\le l_k}\max_{n_{k-1}<n\le n_k}\cdots$ […] $\cdots>\sqrt{3C_3n_k\sigma_k^2\log(U/\sigma_k)}\Big\}\le Rl_k\exp\big(-3\log(U/\sigma_k)\big)$. (Note that the supremum over $G_{ki}$ is a countable supremum by Remark 2.) Now, for a fixed constant $L'$,
$$l_k\exp\left(-3\log\frac{U}{\sigma_k}\right)\le cm2^{j_{n_k}}\left(\frac{\sigma_k}{U}\right)^3\le L'm\,\omega_{\delta_m}^3(\phi)2^{-j_{n_k}/2},$$
which, by the second limit in (21) and by (28), is the general term of a convergent series. Then, for $n_{k-1}<n\le n_k$ and $k,m$ large enough, one has
$$n_k\sigma_k^2\log\frac{U}{\sigma_k}\le2\lambda^2C^2n2^{-j_n}\log(2^{j_n})\,\omega_{\delta_m}^2(\phi)\log\frac{U}{C\omega_{\delta_m}(\phi)}.$$
It then follows by Borel–Cantelli that $\limsup_n\max_{1\le i\le l_k}\max_{n_{k-1}<n\le n_k}\cdots$ […] $\varepsilon,\ |x|<1/\varepsilon\}$ for $\varepsilon>0$.
Applying Proposition 3 to $D_\varepsilon$ and Corollary 1 to $D_\varepsilon^c$, we obtain
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\le\|p_0\|_\infty^{1/2}+\frac{M^2\tau}{2}\|p_0\|_{D_\varepsilon^c}$$
for all $\varepsilon>0$. Now, since $\|p_0\|_{D_\varepsilon^c}\to0$ as $\varepsilon\to0$ by uniform continuity of $p_0$, this lim sup does not exceed $\|p_0\|_\infty^{1/2}$. This and Proposition 2 prove the theorem. ∎

With the natural changes in the variance computations (24) and (33), the proof of Theorem 2 implies a result similar to the one in Massiani (2003), which is the counterpart for the wavelet density estimator of the classical result of Stute (1982) for the Parzen–Rosenblatt estimator.

Corollary 2. Let $\phi$ and the sequence $j_n$ be as in Proposition 3. Let $D$ be a bounded subset of $\mathbb{R}$ and assume that $p_0$ is uniformly continuous on a neighborhood of $D$ and $\|p_0\|_D\ne0$. Then
$$\lim_{n\to\infty}\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\sup_{y\in D}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|=\|p_0\|_D^{1/2}\quad\text{a.s.}$$
If furthermore $\inf_{x\in D}p_0(x)>0$, then
$$\lim_{n\to\infty}\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\sup_{y\in D}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{p_0(y)\sum_k\phi^2(2^{j_n}y-k)}}\right|=1\quad\text{a.s.}$$

Remark 4 (Moments and Laplace transforms). We note that the a.s. limits in the previous theorem can be complemented by convergence of moments. In fact, direct application of inequality (61) gives that, under the conditions of Theorem 1,
$$\sup_nE\exp\left(\lambda\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\right)<\infty\tag{35}$$
for all $\lambda\ge0$, and the same is true without the normalization by $\sqrt{\sum_k\phi^2(2^{j_n}y-k)}$. This yields enough uniform integrability to obtain that, under the conditions of Theorem 2,
$$\lim_{n\to\infty}E\exp\left(\lambda\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\right)=e^{\lambda\|p_0\|_\infty^{1/2}}\tag{36}$$
for all $\lambda\ge0$.
In particular we obtain convergence of all moments in Theorem 2 (as well as uniform boundedness of all moments in Theorem 1 and in Remark 3).

Remark 5 (Haar wavelet and normalization). Theorem 2 (and Corollary 2) applies to the Haar father wavelet $\phi=1_{(0,1]}$ (which is obviously uniformly continuous on $(B_1,B_2]=(0,1]$). In this case, $\sum_k\phi^2(2^{j_n}y-k)=1$ for all $j,y$. However, if $\phi$ is not constant on $(B_1,B_2]$, the quantity $\sum_k\phi^2(2^{j_n}y-k)=\int K^2(2^{j_n}y,u)\,du$, although bounded from above and below, depends on $y$ and $j_n$, which is why it must be part of the normalization instead of the limit.

Remark 6 (Comparison to convolution kernels). The resolution level $j_n$ in wavelet density estimation, more exactly the quantity $2^{-j_n}$, corresponds to the window width $h_n$ in the classical ("Parzen–Rosenblatt") convolution kernel density estimator. If $K(y,x)=K(y-x)$, then the variance of the estimator $\tilde p_n(y)=h_n^{-1}n^{-1}\sum_{i=1}^nK((y-X_i)/h_n)$ is asymptotically of the order $n^{-1}h_n^{-1}p_0(y)\|K\|_2^2$, whereas the order of the variance of $p_n(y)$ is $n^{-1}2^{j_n}p_0(y)\int K^2(2^{j_n}y,x)\,dx$, which is different (except for Haar wavelets) since the $L^2(\mathbb{R})$ norm of $K(y,\cdot)$ oscillates with $y$. Modulo these differences, the a.s. asymptotic behavior of wavelet estimators is similar to that of convolution kernel estimators [cf. Stute (1982), Deheuvels (2000) and Giné and Guillou (2002)].
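For the Haar father wavelet of Remark 5 everything is explicit: $K(y,x)=\sum_k\phi(y-k)\phi(x-k)$ equals 1 exactly when $y$ and $x$ fall in the same unit interval $(k,k+1]$, so the linear estimator $p_n(y)=n^{-1}\sum_i2^{j}K(2^{j}y,2^{j}X_i)$ is a histogram on the dyadic bins $(k2^{-j},(k+1)2^{-j}]$, and the normalization $\sum_k\phi^2(2^jy-k)$ is identically 1. The sketch below assumes this $(0,1]$ convention; the helper names are illustrative, not from the paper. It also makes the correspondence $h_n\leftrightarrow2^{-j_n}$ of Remark 6 visible, since the bin width plays the role of the window width.

```python
import math

import numpy as np


def haar_phi(x):
    """Haar father wavelet phi = 1_(0,1], evaluated elementwise on an array."""
    return ((x > 0) & (x <= 1)).astype(float)


def haar_normalizer(y, j):
    """sum_k phi^2(2^j y - k); only k with 2^j y - k in (0,1] contribute,
    so it suffices to scan a few integers around 2^j y."""
    u = 2.0 ** j * y
    ks = np.arange(math.floor(u) - 2, math.floor(u) + 3)
    return float(np.sum(haar_phi(u - ks) ** 2))


def haar_linear_estimator(sample, j):
    """Return y -> p_n(y) = (2^j / n) * #{i : X_i in the dyadic bin containing y}."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    bins_x = np.ceil(2.0 ** j * sample)  # bin index k+1 of (k 2^-j, (k+1) 2^-j]

    def p_n(y):
        return 2.0 ** j * np.count_nonzero(bins_x == np.ceil(2.0 ** j * y)) / n

    return p_n
```

Because the normalizer is identically 1 for Haar, the statistic in Theorem 2 reduces to the unnormalized sup-deviation of this histogram; for a smoother father wavelet `haar_normalizer` would have to be replaced by the corresponding (non-constant) sum.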
Regarding proofs, generally, the derivation of Theorems 1 and 2 follows the same pattern as the proofs of Theorems 2.3 and 3.3 in Giné and Guillou (2002); the short proofs of their Theorem 2.3 and of our Theorem 1 consist of a direct application of Talagrand's inequality, moment bounds for VC-type classes of functions and "blocking," and here the differences only reside in the fact that the classes of functions associated with the kernels $K$ are not the same (but in both cases of VC type), in different variance computations, and in measurability considerations. However, the proof of the exact limit law (Theorem 2) is more delicate in the wavelet case. In Proposition 3 above we cannot use continuity of translations and dilations in $L^1(\mathbb{R})$ as in the upper bound part of Proposition 3.1 in Giné and Guillou (2002). Similarly, the conditions (a)–(g) that we verify in the proof of Proposition 2 also require different methods than those in Einmahl and Mason (2000) or Giné and Guillou (2002).

Remark 7 [Nonorthogonal $\phi(\cdot-k)$'s]. The estimator $p_n(y)=P_n(K(2^{j_n}y,\cdot))$ for $K(y,x)=\sum_k\phi(y-k)\phi(x-k)$ makes sense even if $\phi$ is not a father wavelet, that is, the $\phi(\cdot-k)$ need not form an orthogonal system. Assuming $\phi$ satisfies Condition 2 and $\inf\|K(y,\cdot)\|_2>0$, the results proved so far in this section still hold true both for $\|p_n-Ep_n\|_\infty$ and for $\sup_y|p_n(y)-Ep_n(y)|/\|K(2^{j_n}y,\cdot)\|_2$.

4.1. Approximation error and optimal rates of convergence. The previous results were formulated for the deviation of $p_n$ from $Ep_n$, whereas the quantity of statistical interest is $p_n-p_0$. The "bias" $Ep_n-p_0=K_j(p_0)-p_0$ is nonrandom and can be dealt with by using standard results on approximation of functions by wavelets.
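To make the deterministic bias term concrete, consider again the Haar case, where the projection $K_j(f)(y)=2^j\int_{(k2^{-j},(k+1)2^{-j}]}f$ is the average of $f$ over the dyadic bin containing $y$. If $p_0$ is Lipschitz with constant $L$, this average differs from $p_0(y)$ by at most $L2^{-j}$. The sketch below checks that bound on a grid, assuming (for illustration only) that $p_0$ is the standard normal density, whose Lipschitz constant is $\varphi(1)=e^{-1/2}/\sqrt{2\pi}$; the function names are hypothetical.

```python
import math

# Lipschitz constant of the standard normal density: max |x| pdf(x), attained at |x| = 1
LIP = math.exp(-0.5) / math.sqrt(2 * math.pi)


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))


def haar_projection(cdf, y, j):
    """K_j(f)(y) for the Haar father wavelet: 2^j times the integral of f
    over the dyadic bin (k 2^-j, (k+1) 2^-j] containing y, via the antiderivative."""
    h = 2.0 ** (-j)
    right = math.ceil(y / h) * h
    return (cdf(right) - cdf(right - h)) / h


def sup_bias(j, grid):
    """Grid approximation of ||K_j(p0) - p0||_inf for the standard normal p0."""
    return max(abs(haar_projection(normal_cdf, y, j) - normal_pdf(y)) for y in grid)


if __name__ == "__main__":
    grid = [i / 500 for i in range(-2500, 2501)]
    for j in range(2, 8):
        print(j, sup_bias(j, grid), LIP * 2.0 ** (-j))
```

The Haar basis only reaches the rate $2^{-j}$; the faster rates $2^{-jt}$ in (37) require father and mother wavelets satisfying Condition 1(T) with $T+1>t$.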
If $p_0$ is uniformly continuous then, by (6), (7) and Minkowski's inequality for integrals, $\|K_j(p_0)-p_0\|_\infty\le\int\Phi(u)\|p_0(2^{-j}u+\cdot)-p_0\|_\infty\,du\to0$ as $j\to\infty$ for $\phi$ satisfying Condition 1(0), so that if also Condition 2 holds, then $\|p_n-p_0\|_\infty=o_{\mathrm{a.s.}}(1)$ by Remark 3 if one chooses $2^{j_n}\simeq n/(\log n)^{1+\delta}$ for some $\delta>0$.

If more is known about the smoothness of $p_0$, one can obtain rates of convergence. The approximation error in sup-norm loss of a function $f$ by wavelets is related to containment of $f$ in the Besov space $B^t_{\infty\infty}(\mathbb{R})$. Recall from (12) that these spaces include the more classical Hölder spaces $C^t(\mathbb{R})$. A function $p_0$ in $B^t_{\infty\infty}(\mathbb{R})$ is approximated by its projection $K_j(p_0)$ in the uniform norm at rate $2^{-jt}$ if $\phi$ has some regularity. To be precise, if $\phi,\psi$ satisfy Condition 1(T) for $0<t<T+1$, and if $p_0\in B^t_{\infty\infty}(\mathbb{R})$, then the bounds
$$\|K_j(p_0)-p_0\|_\infty\le C2^{-jt}\quad\text{and}\quad\sup_k|\beta_{lk}(p_0)|\le C2^{-l(t+1/2)}\tag{37}$$
can be shown to hold [e.g., Härdle et al. (1998), Theorem 9.4]. Inspection of the proof of that theorem shows that the constant $C$ depends on $p_0$ only through its Besov norm $\|p_0\|_{t,\infty,\infty}$. As a consequence we have the following theorem.

Theorem 3. Assume that the density $p_0$ of $P$ satisfies $p_0\in B^t_{\infty\infty}(\mathbb{R})$ for some $t>0$. Let $p_n$ be as in (11), where $\phi$ satisfies Condition 2, and $\phi,\psi$ are such that Condition 1(T) holds for some $0\le T<\infty$. If $j_n$ satisfies (21), then
$$\sup_{x\in\mathbb{R}}|p_n(x)-p_0(x)|=O_{\mathrm{a.s.}}\left(\sqrt{\frac{j_n2^{j_n}}{n}}+2^{-tj_n}\right)$$
and
$$\Big(E\sup_{x\in\mathbb{R}}|p_n(x)-p_0(x)|^p\Big)^{1/p}=O\left(\sqrt{\frac{j_n2^{j_n}}{n}}+2^{-tj_n}\right)$$
for every $0<t<T+1$ and for every $1\le p<\infty$.

Proof. This follows from Remarks 3 and 4 and (37).
∎

We note that Conditions 1(T) and 2 are satisfied for a large variety of wavelets, for example, the Haar wavelet ($T=0$), or sufficiently regular Daubechies wavelets (arbitrary $T\ge0$) [cf. Härdle et al. (1998), Remark 7.1].

Remark 8 (Optimal rates of convergence over general Besov bodies). The last theorem implies that the linear wavelet estimator with $2^{j_n}\simeq(n/\log n)^{1/(2t+1)}$ achieves the optimal [over $B^t_{\infty\infty}(\mathbb{R})$-balls] rate of convergence $((\log n)/n)^{t/(2t+1)}$ in the uniform norm for estimating $p_0$ [see, e.g., Juditsky and Lambert-Lacroix (2004) for optimality of these rates]. One might ask whether the linear wavelet estimator $p_n$ is also best possible if $p_0$ is contained in some space other than $B^t_{\infty\infty}(\mathbb{R})$, for example, in a Besov space $B^t_{pq}(\mathbb{R})$ with $t>1/p$. The continuous (Sobolev-type) imbedding of $B^t_{pq}(\mathbb{R})$ into $B^{t-1/p}_{\infty\infty}(\mathbb{R})$ (see Remark 1) and the choice $2^{j_n}\simeq(n/\log n)^{1/(2(t-1/p)+1)}$ then give $(E\|p_n-p_0\|_\infty^r)^{1/r}=O\big((\log n/n)^{(t-1/p)/(2(t-1/p)+1)}\big)$ for all $r$, which is still the optimal rate of convergence [see, e.g., Donoho et al. (1996), Theorem 1].

5. Uniform central limit theorems for wavelet density estimators. Consider again the wavelet estimator $p_n(y)$ defined in (11). In this section we study the stochastic process
$$f\mapsto\sqrt n\int(p_n(y)-p_0(y))f(y)\,dy=\sqrt n(P_n^W-P)f,$$
where $f$ varies over some Donsker class $\mathcal{F}$ of functions and where $dP_n^W(y)=p_n(y)\,dy$. Note that $P_n^W(f)=P_n(K_j(f))$. The classical case is $\mathcal{F}=\{1_{(-\infty,s]}:s\in\mathbb{R}\}$, in which case one has $\sqrt n(P_n^W-P)f=\sqrt n(F_n^W(s)-F(s))$, where $F_n^W$ and $F$ are the distribution functions of $p_n$ and $p_0$, respectively.
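The identity $P_n^W(f)=P_n(K_j(f))$ can be checked directly in the classical case $f=1_{(-\infty,s]}$. The sketch below assumes the Haar father wavelet $\phi=1_{(0,1]}$, for which $p_n$ is the dyadic histogram and $K_j(f)(x)=2^j\,|{\rm bin}(x)\cap(-\infty,s]|$; the function names are hypothetical. One function integrates $p_n$ up to $s$ bin by bin, the other averages $K_j(f)$ over the sample, and the two agree up to rounding.

```python
import numpy as np


def fw_direct(sample, j, s):
    """F_n^W(s): integral of the Haar histogram p_n over (-inf, s].
    Each occupied bin (k h, (k+1) h] carries mass count/n spread uniformly."""
    sample = np.asarray(sample, dtype=float)
    h = 2.0 ** (-j)
    bins, counts = np.unique(np.ceil(sample / h), return_counts=True)
    left = (bins - 1) * h
    frac = np.clip((s - left) / h, 0.0, 1.0)  # portion of each bin inside (-inf, s]
    return float(np.sum(counts * frac) / sample.size)


def pn_kj_indicator(sample, j, s):
    """P_n(K_j f) for f = 1_(-inf, s]: average over X_i of 2^j |bin(X_i) ∩ (-inf, s]|."""
    h = 2.0 ** (-j)
    total = 0.0
    for x in np.asarray(sample, dtype=float):
        left = (np.ceil(x / h) - 1) * h
        total += min(max(s - left, 0.0), h) / h
    return total / len(sample)
```

In this case $F_n^W$ is continuous and piecewise linear with knots at the dyadic points, which is why it is a smoothed version of the empirical distribution function rather than a step function.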
We will show that $\sqrt n(P_n^W-P)$ converges in distribution in $\ell^\infty(\mathcal{F})$ to $G_P$, for various Donsker classes $\mathcal{F}$. Our proofs will in fact show $\|P_n^W-P_n\|_{\mathcal{F}}=o_P(1/\sqrt n)$. The limit theorem for $P_n^W$ can then be inferred from a limit theorem for $P_n$ (using the fact that $\mathcal{F}$ is a Donsker class). In the classical case of $(F_n^W-F)$, we will also obtain a Dvoretzky–Kiefer–Wolfowitz type inequality, the compact law of the iterated logarithm, as well as a strong invariance principle.

We set, for ease of notation, $P(K_j)(y)=PK_j(y,\cdot)$, and we will use the symbol $P(K_j)$ both for the function and for the finite signed measure that has it as density. The same applies to $P_n(K_j)$. For $f$ a bounded function, the following decomposition will be useful:
$$(P_n^W-P_n)f=(P_n-P)(K_j(f)-f)+(P(K_j)-P)f.\tag{38}$$
The first term is stochastic, whereas the second ("expectation") term is deterministic, and we will deal with these two terms separately.

5.1. CLT and strong invariance principles for the distribution function of the wavelet estimator. We will first treat the classical special case $\mathcal{F}=\{1_{(-\infty,s]}:s\in\mathbb{R}\}$, which corresponds to studying the distribution function $F_n^W(s)=\int_{-\infty}^sp_n(y)\,dy$ of the wavelet density estimator $p_n$. The key result will be an exponential inequality for the random quantity $\sqrt n\|F_n^W-F_n\|_\infty$, where $F_n(s)=\int_{-\infty}^sdP_n$ is the empirical distribution function. This inequality will follow from applying Talagrand's inequality (and Lemma 2) to the centered term in the decomposition (38), but we first must show that the second ("expectation") term is sufficiently small for relevant choices of $j$.

Lemma 3. Let $K(y,x)$ be a projection kernel as in (5) arising from the father wavelet $\phi$, and assume that $\phi,\psi$ satisfy Condition 1(T) for some $0\le T<\infty$.
Assume further that the density $p_0$ is a bounded function (in which case we set $t=0$) or that $p_0\in B^t_{\infty\infty}(\mathbb{R})$ for some $t$, $0<t<T+1$. Let $\mathcal{F}=\{1_{(-\infty,s]}:s\in\mathbb{R}\}$. Then the inequality
$$\sup_{f\in\mathcal{F}}|(P(K_j)-P)f|\le C2^{-j(t+1)}$$
holds for some constant $C$ depending only on $\phi$ and $\|p_0\|_{t,\infty}$ (with $\|p_0\|_{0,\infty}=\|p_0\|_\infty$).

Proof. Using that the wavelet series (9) of $p_0\in L^1(\mathbb{R})$ converges in $L^1(\mathbb{R})$, we have
$$K_j(p_0)-p_0=-\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\psi_{lk}$$
in the $L^1$-sense. Therefore, since $f=1_{(-\infty,s]}\in L^\infty(\mathbb{R})$, we have
$$\begin{aligned}(P(K_j)-P)f&=\int(K_j(p_0)-p_0)f=-\int\Big(\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\psi_{lk}(x)\Big)f(x)\,dx\\&=-\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\int f(x)\psi_{lk}(x)\,dx=-\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\beta_{lk}(f).\end{aligned}\tag{39}$$
Thus, we only need to obtain bounds for the wavelet coefficients $\beta_{lk}(p_0)$ and $\beta_{lk}(f)$. We first obtain a bound for $f$. We observe that
$$\int(K_{l+1}-K_l)(f)\psi_{lk}=\int\sum_r\beta_{lr}(f)\psi_{lr}\psi_{lk}=\beta_{lk}(f),$$
where the first identity holds by pointwise equality of the integrands, and the second because the sum has only a finite number of nonzero terms (due to compactness of the support of $\psi$) and since the $\psi_{lk}$'s are orthogonal. Therefore, using also (6) with $\psi$ instead of $\phi$, we have
$$\begin{aligned}\|\beta_{l(\cdot)}(f)\|_1&\le\int\sum_k|(K_{l+1}-K_l)(f)(x)|\,|\psi_{lk}(x)|\,dx\\&\le2^{l/2}\Big\|\sum_k|\psi(2^l(\cdot)-k)|\Big\|_\infty\|K_{l+1}(f)-K_l(f)\|_1\\&\le c2^{l/2}\big(\|K_{l+1}(f)-f\|_1+\|K_l(f)-f\|_1\big).\end{aligned}\tag{40}$$
To bound the right-hand side, we have by Tonelli, (6) and the definition of $f$
$$\begin{aligned}\int\Big|\int2^lK(2^ly,2^lx)f(x)\,dx-f(y)\Big|\,dy&=\int\Big|\int2^lK(2^ly,2^lu+2^ly)\big(f(u+y)-f(y)\big)\,du\Big|\,dy\\&\le\int\int2^l\Phi(2^lu)|f(u+y)-f(y)|\,du\,dy\\&=\int\Phi(u)\int|f(2^{-l}u+y)-f(y)|\,dy\,du\\&=\int\Phi(u)\Big|\int_{s-2^{-l}u}^{s}dy\Big|\,du\le2^{-l}\int\Phi(u)|u|\,du.\end{aligned}$$
Since $\Phi$ is bounded and has compact support, we conclude that
$$\sup_{f\in\mathcal{F}}\|\beta_{l(\cdot)}(f)\|_1\le c'2^{-l/2}\tag{41}$$
for some $c'\in(0,\infty)$. For the wavelet coefficients of $p_0$, we have from (37) that $\|\beta_{l(\cdot)}(p_0)\|_\infty\le C2^{-l(t+1/2)}$ for $0<t<T+1$ and some constant $C$. If $t=0$, one has the same bound since
$$|\beta_{lk}(p_0)|\le2^{l/2}\int|\psi(2^lx-k)|p_0(x)\,dx\le2^{-l/2}\|\psi\|_1\|p_0\|_\infty$$
by a simple change of variables. Applying these bounds to (39), we have
$$\sup_{f\in\mathcal{F}}\Big|\int(K_j(p_0)-p_0)f\Big|\le\sup_{f\in\mathcal{F}}\sum_{l\ge j}\|\beta_{l(\cdot)}(p_0)\|_\infty\|\beta_{l(\cdot)}(f)\|_1\le c''2^{-j(t+1)},$$
which completes the proof. ∎

Using Lemmas 2 and 3 one can prove the following inequality, which is similar to Theorem 1 in Giné and Nickl (2009).

Lemma 4. Let $F_n(s)=\int_{-\infty}^sdP_n$ and $F_n^W(s)=\int_{-\infty}^sp_n(y)\,dy$, where $p_n$ is as in (11), $\phi$ satisfies Condition 2, and $\phi,\psi$ are such that Condition 1(T) holds for some $0\le T<\infty$. Assume further that the density $p_0$ of $P$ is a bounded function (in which case we set $t=0$) or that $p_0\in B^t_{\infty\infty}(\mathbb{R})$ for some $t$, $0<t<T+1$. Let $j$ satisfy $2^{-j}\ge d(\log n)/n$ for some $0<d<\infty$. Then there exist finite positive constants $L:=L(\|p_0\|_\infty,\phi,d)$ and $\Lambda_0:=\Lambda_0(\|p_0\|_{t,\infty},\phi,d)$ such that for all $n\in\mathbb{N}$ and $\lambda\ge\Lambda_0\max\big(\sqrt{j2^{-j}},\sqrt n2^{-j(t+1)}\big)$ we have
$$\Pr\big(\sqrt n\|F_n^W-F_n\|_\infty>\lambda\big)\le L\exp\left(-\frac{\min(2^j\lambda^2,\sqrt n\lambda)}{L}\right).$$

Proof. Set $\mathcal{F}=\{1_{(-\infty,s]}:s\in\mathbb{R}\}$. Using the decomposition (38) and Lemma 3 we have
$$\Pr\big(\sqrt n\|F_n^W-F_n\|_\infty>\lambda\big)\le\Pr\Big(n\sup_{f\in\mathcal{F}}|(P_n-P)(K_j(f)-f)|>\frac{\sqrt n\lambda}{2}\Big)$$
by assumption on $\lambda$ (if we take $\Lambda_0\ge2C$), and we will apply Talagrand's inequality to the class $\tilde{\mathcal{F}}=\{K_j(f)-f-P(K_j(f)-f):f\in\mathcal{F}\}$, which is a VC-type class by Lemma 2 (and since $\mathcal{F}$ is VC), to bound the last probability.
Notice that the supremum over $f\in\mathcal{F}$ is in fact over a countable set by left continuity of the function $s\mapsto K_j(1_{(-\infty,s]})-1_{(-\infty,s]}$. Using the fact that $K$ is majorized by a convolution kernel $\Phi$ [cf. (6)], one establishes by the same arguments as in the proof of Theorem 1 in Giné and Nickl (2009) that $\tilde{\mathcal{F}}$ has constant envelope $U=c\|\Phi\|_1$ and that $\sup_{f\in\mathcal{F}}\|K_j(f)-f\|_{2,P}\le c'2^{-j/2}=:\sigma$ for some $0<c'<\infty$ that depends only on $\|p_0\|_\infty$ and $\Phi$. Therefore, we have $\sigma<U/2$ and $n\sigma^2>C\log(U/\sigma)$. Using (59) we can choose $\Lambda_0$ large enough so that $4^{-1}\sqrt n\Lambda_0\sqrt{j2^{-j}}>E$ in the notation of the Appendix, which means that $E+\sqrt n\lambda/4\le\sqrt n\lambda/2$. These conditions and the elementary bound $\log(1+x)\ge\big((e-1)^{-1}x\wedge1\big)$ for $x>0$, applied to (56), give the result. ∎

Remark 9 (Asymptotic equivalence of $F_n^W$ and $F_n$). Note that the variance estimate in the previous proof together with Lemma 3 [assuming $(d\log n)/n\le2^{-j_n}$] implies, using (57),
$$E\sup_{t\in\mathbb{R}}|F_n^W(t)-F_n(t)|\le C\left[\sqrt{\frac{j}{2^jn}}+2^{-j(t+1)}\right],$$
which is $o(1/\sqrt n)$ if $j=j_n$ is such that $\sqrt n2^{-j_n(t+1)}\to0$.

The last remark suggests that, in the most interesting range of $j_n$'s such as $2^{-j_n}\simeq n^{-1/(2t+1)}$, the integrated wavelet density estimator is asymptotically equivalent to the empirical distribution function (while, at the same time, delivering a reasonable estimate of the density). The exponential bound from Lemma 4 is the key to transferring several classical results for the empirical distribution function to the cdf of the wavelet density estimator, and we state below some of the more important results that can be obtained in this way.
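In the Haar case the closeness of $F_n^W$ and $F_n$ described in Remark 9 can be seen without any probability: $F_n^W$ linearly interpolates $F_n$ between the dyadic points $k2^{-j}$, where the two coincide, so $\sup_s|F_n^W(s)-F_n(s)|$ is at most the largest bin count divided by $n$. The following sketch, assuming $\phi=1_{(0,1]}$ and using hypothetical helper names, checks both facts on simulated normal data. It is only an illustration of the mechanism, not of the general bound in Remark 9.

```python
import numpy as np


def fw_haar(sample, j, s):
    """Integrated Haar estimator F_n^W(s): each occupied dyadic bin
    (k h, (k+1) h], h = 2^-j, carries mass count/n spread uniformly."""
    sample = np.asarray(sample, dtype=float)
    h = 2.0 ** (-j)
    bins, counts = np.unique(np.ceil(sample / h), return_counts=True)
    frac = np.clip((s - (bins - 1) * h) / h, 0.0, 1.0)
    return float(np.sum(counts * frac) / sample.size)


def f_emp(sample, s):
    """Empirical distribution function F_n(s) = #{X_i <= s} / n."""
    sample = np.asarray(sample, dtype=float)
    return np.count_nonzero(sample <= s) / sample.size


if __name__ == "__main__":
    X = np.random.default_rng(2).normal(size=2000)
    j = 5
    grid = np.linspace(-6, 6, 2001)
    gap = max(abs(fw_haar(X, j, s) - f_emp(X, s)) for s in grid)
    print("sqrt(n) * sup |F_n^W - F_n| on grid:", np.sqrt(X.size) * gap)
```

With $2^{-j}$ of order $(\log n)/n$ the largest bin count is of order $\log n$ with high probability, which matches the $\log n/\sqrt n$ scale appearing in Theorem 6.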
For example, Lemma 4 implies a Dvoretzky, Kiefer and Wolfowitz (1956) type exponential bound, up to constants, for the distribution function of the wavelet estimator; namely, there exist universal constants $c_1,c_2$ such that for
$$\Lambda_0\max\big(\sqrt{j2^{-j}},\sqrt n2^{-j(t+1)}\big)\le\lambda\le\sqrt n,$$
we have
$$\Pr\big(\sqrt n\|F_n^W-F\|_\infty>\lambda\big)\le c_1\exp\{-c_2\lambda^2\}.\tag{42}$$
We next give the wavelet analogue of Donsker's classical functional CLT for the empirical distribution function.

Theorem 4. Let $\phi,\psi$ and $p_0$ satisfy the conditions of Lemma 4 for some $t\ge0$, and let $j_n$ satisfy $2^{-j_n}\ge d(\log n)/n$ for all $n$ and $\sqrt n2^{-j_n(t+1)}\to0$ as $n\to\infty$. If $F$ is the distribution function of $P$, then
$$\sqrt n(F_n^W-F)\rightsquigarrow_{\ell^\infty(\mathbb{R})}G_P.$$

For the compact law of the iterated logarithm define
$$\mathcal{S}=\left\{x\mapsto\int_{-\infty}^xf\,dP:\int f\,dP=0,\ \int f^2\,dP\le1\right\},$$
the Strassen set.

Theorem 5. Let $\phi,\psi$ and $p_0$ satisfy the conditions of Lemma 4 for some $t\ge0$, and let $j_n$ satisfy $2^{-j_n}\ge d(\log n)/n$ for all $n$ and $\sup_n\sqrt n(2^{-j_n})^{t+1}=M<\infty$. Let $F$ be the distribution function of $P$. Then, almost surely, the sequence
$$\left(\sqrt{\frac{n}{2\log\log n}}(F_n^W-F)\right)_{n=3}^\infty$$
is relatively compact in $\ell^\infty(\mathbb{R})$ and its set of limit points coincides with the Strassen set $\mathcal{S}$.

Finally, we consider the smallest admissible choice of $\lambda$ in Lemma 4. In the case $t=0$ and the largest admissible resolution level $2^{j_n}\simeq n/\log n$, we see that we can take $\lambda\simeq\log n/\sqrt n$, the rate occurring in the Komlós, Major and Tusnády (1975) result on strong approximation of $\sqrt n(F_n-F)$ by Brownian bridges. Consequently, we have the following strong invariance principle for the integrated wavelet density estimator $F_n^W$.

Theorem 6. Let $\phi,\psi$ and $p_0$ satisfy the conditions of Lemma 4 with $t=0$, and set $2^{-j_n}\simeq(\log n)/n$. Let $F$ be the distribution function of $P$.
Then there exists a probability space that supports $X_1,X_2,\dots$ i.i.d. with density $p_0$ and a sequence of Brownian bridges $B_n$ such that, for all $n\in\mathbb{N}$ and $x\in\mathbb{R}$,
$$\Pr\Big(\|\sqrt n(F_n^W-F)-B_n\circ F\|_\infty>n^{-1/2}\big((C+\Lambda_0')\log n+x\big)\Big)\le2n^{-2}+Me^{-\eta x},$$
where $C,M,\eta$ are absolute constants and where $\Lambda_0'=\max(2L,\sqrt{2L},\Lambda_0)$, with $\Lambda_0$ and $L$ as in Lemma 4. In particular, for these versions, one has
$$\|\sqrt n(F_n^W-F)-B_n\circ F\|_\infty=O_{\mathrm{a.s.}}\left(\frac{\log n}{\sqrt n}\right).$$

5.2. General UCLTs for wavelet density estimators. The question arises whether $\{1_{(-\infty,s]}:s\in\mathbb{R}\}$ in the last section can be replaced by a more general Donsker class $\mathcal{F}$. Considering the central limit theorem, such results were proved for other density estimators, such as nonparametric maximum likelihood estimators and kernel density estimators, in Nickl (2007) and Giné and Nickl (2008). We show in this section that such results can also be proved for the wavelet estimator $P_n^W$, for many classes $\mathcal{F}$, in particular for balls in general Besov spaces (hence covering Sobolev, Hölder and Lipschitz classes). In the case of general (Besov) classes of functions, the wavelet structure will be particularly helpful, but before we turn to these classes, we show that Lemma 4 immediately implies the following result for bounded variation classes, since these are in the closed convex hull of indicator functions. A measurable function $f:\mathbb{R}\to\mathbb{R}$ is of bounded variation if $v_1(f)<\infty$, cf. (13), and the class $\mathcal{F}=\{f:\|f\|_\infty+v_1(f)\le1\}$ is a $P$-Donsker class for every $P$ [see, e.g., Dudley (1992)].

Corollary 3. Let $\phi,\psi$ and $p_0$ satisfy the conditions of Lemma 4 for some $t\ge0$. Then, if $\mathcal{F}_R=\{f\text{ right continuous}:\|f\|_\infty+v_1(f)\le1\}$ and $L,\Lambda_0,\lambda,j$ are as in Lemma 4, we have for all $n\in\mathbb{N}$,
$$\Pr\big(\sqrt n\|P_n^W-P_n\|_{\mathcal{F}_R}>\lambda\big)\le L\exp\left(-\frac{\min(2^j\lambda^2,\sqrt n\lambda)}{L}\right).$$
If furthermore $\sqrt n2^{-j_n(t+1)}\to0$ as $n\to\infty$ and if $\mathcal{F}=\{f:\|f\|_\infty+v_1(f)\le1\}$, then also $\sqrt n(P_n^W-P)\rightsquigarrow_{\ell^\infty(\mathcal{F})}G_P$.

Proof. If $f$ is of bounded variation and right continuous, and $f(-\infty+)=0$, then there exists a unique finite Borel measure $\mu_f$ such that $f(x)=\int1_{(-\infty,x]}(v)\,d\mu_f(v)$. Since $(P_n^W-P_n)c=0$ for $c$ constant [see (7)], we may assume that the elements in $\mathcal{F}_R$ all satisfy $f(-\infty+)=0$. We then have from Fubini's theorem [using also (6)], for $f\in\mathcal{F}_R$, that $|(P_n^W-P_n)f|\le\|F_n^W-F_n\|_\infty$. This already proves the first claim of the corollary by Lemma 4. To prove the second claim, observe that any $f\in\mathcal{F}$ is right-continuous except at most at a countable number of points; in particular there exists a right-continuous function $\tilde f$ such that $\tilde f=f$ almost everywhere. Since $P_n^W$ and $P$ are absolutely continuous measures, we have
$$\sqrt n(P_n^W-P)f=\sqrt n(P_n^W-P)\tilde f=\sqrt n(P_n^W-P_n)\tilde f+\sqrt n(P_n-P)\tilde f,$$
which proves the second claim by using the first and since $\mathcal{F}$ is $P$-Donsker. ∎

We will now prove a general central limit theorem for the wavelet density estimator, uniformly over Besov balls. The proof via the decomposition (38) necessitates that these balls be Donsker classes of functions. The following Donsker property of balls in $B^s_{pq}(\mathbb{R})$ was proved in Nickl and Pötscher (2007), and can be shown to be essentially sharp [see Nickl (2006)]. Note that under the following conditions on $s,p,q$, the Besov spaces $B^s_{pq}(\mathbb{R})$ can (and will) be viewed as spaces of bounded continuous functions.

Proposition 4. Let $\mathcal{F}$ be a bounded subset of $B^s_{pq}(\mathbb{R})$, where $1\le p\le\infty$, $1\le q\le\infty$, and let $P$ be a probability measure on $\mathbb{R}$. Suppose that one of the following conditions holds:
(a) $1\le p\le2$ and $s>1/p$.
(b) $2<p\le\infty$, $s>1/2$, and $\int_{\mathbb{R}}|x|^{2\gamma}\,dP(x)<\infty$ for some $\gamma>1/2-1/p$.
(c) $1\le p<2$, $q=1$ and $s=1/p$.
Then $\mathcal{F}$ is $P$-Donsker.

Theorem 7. Let $1\le p,q\le\infty$ and $1/p+1/r=1$. Let $dP_n^W(x)=p_n(x)\,dx$, where $p_n$ is as in (11) and where $\phi,\psi$ satisfy part (i) of Condition 1(T) for some $1\le T<\infty$. For $0<s<T+1$ and $P,s,p,q$ satisfying one of the conditions in Proposition 4, let $\mathcal{F}$ be a bounded subset of $B^s_{pq}(\mathbb{R})$. Assume in addition that $p_0\in L^r(\mathbb{R})$ (in which case we set $t=0$) or that $p_0\in B^t_{r\infty}(\mathbb{R})$ for some $t$, $0<t<T+1$. Suppose $\sqrt n2^{-j_n(t+s)}\to0$ as $n\to\infty$. Then $\sqrt n(P_n^W-P)\rightsquigarrow_{\ell^\infty(\mathcal{F})}G_P$.

Proof. We shall use throughout the proof the properties of Besov spaces summarized in Remark 1. Note first that under the conditions of the theorem, if $p=1$, then $s>1$ or $s=q=1$, in which case $\mathcal{F}$ is a bounded subset of $BV(\mathbb{R})$, and the conclusion of the theorem follows from Corollary 3. So we need only consider the case $p>1$. We will use the decomposition (38) from above, and we first deal with the expectation term. As in (39), we obtain
$$(P(K_j)-P)f=-\sum_{l\ge j}\sum_k\beta_{lk}(p_0)\beta_{lk}(f),$$
where one uses the conjugacy of $p$ and $r$ and the fact that the wavelet series of $p_0\in L^r(\mathbb{R})$ converges in $L^r(\mathbb{R})$ if $1\le r<\infty$. If $t>0$, we obtain from [Härdle et al. (1998), Theorem 9.4] that $\|\beta_{l(\cdot)}(p_0)\|_r\le c2^{-l(t+1/2-1/r)}$ for some finite constant $c$. In case $t=0$ this follows from (6) and a computation similar to the one in (40), using Hölder's inequality. Similarly, it follows from the same reference, noting the obvious imbedding of $B^s_{pq}(\mathbb{R})$ into $B^s_{p\infty}(\mathbb{R})$, that we have
$$\sup_{f\in\mathcal{F}}\|\beta_{l(\cdot)}(f)\|_p\le c'2^{-l(s+1/2-1/p)}.\tag{43}$$
(43) Hence the second “exp ectation” term in ( 38 ) is of order sup f ∈F     Z ( K j n ( p 0 ) − p 0 ) f     ≤ sup f ∈F X l ≥ j n k β l ( · ) ( p 0 ) k r k β l ( · ) ( f ) k p ≤ X l ≥ j n c ′′ 2 − l ( t + s +1 − 1 /r − 1 /p ) ≤ c ′′′ 2 − j n ( t + s ) = o (1 / √ n ) b y the assumption on j n . It remains to treat the first term in ( 38 ). First observ e that the class of functions [ j ≥ 0 F ′ j := [ j ≥ 0 { K j ( f ) − f : f ∈ F } is P -Donsker: by definition of the Beso v n orm and ( 10 ), w e see that for s ′ suc h that max(1 / 2 , 1 /p ) < s ′ < min( s, 1), k K j ( f ) k s ′ ,p,q is b ound ed from ab o v e b y k f k s ′ ,p,q ≤ c k f k s,p,q , u niformly in j . Consequently , S j ≥ 0 F ′ j is con- tained in a ball of B s ′ pq ( R ) of radius at m ost c ′ sup f ∈F k f k s,p,q for some con- stan t 0 < c ′ < ∞ , hence it is P -Donsk er by Prop ositio n 4 . So, in order to pro v e k P n − P k F ′ j n = o P (1 / √ n ) , W A VELET DENS ITY ESTIMA TORS 33 it suffices to sho w that the v ariances su p f ∈F ′ j n E f 2 ( X ) con verge to zero. Since b ounded subsets of B s ′ pq ( R ) are uniformly b ounded classes of f unctions under the conditions of the theorem, w e h a ve E ( K j ( f )( X ) − f ( X )) 2 ≤ c Z | K j ( f )( x ) − f ( x ) | p 0 ( x ) dx (44) ≤ c k K j ( f ) − f k p k p 0 k r and this completes the pro of s in ce p 0 ∈ L r ( R ) by assumption and since sup f ∈F k K j n ( f ) − f k p ≤ c ′ sup f ∈F k K j n ( f ) − f k s ′ ,p,q = sup f ∈F ∞ X l ≥ j n (2 l ( s ′ +1 / 2 − 1 /p ) k β l ( · ) ( f ) k p ) q ! 1 /q → 0 as n → ∞ , b y ( 43 ).  Gin ´ e and Nic kl [( 2008 ), Theorems 5–7, Lemma 12 ] pro v ed an a nalogue of Theorem 7 and of Corollary 3 for the classical k ernel den sit y estimator. 
At first sight the proof there seems somewhat more involved, but it should be noted that the proof in the wavelet case relies on nontrivial results such as the wavelet characterization of Besov spaces together with the Donsker property of Besov balls (Proposition 4), which cannot be used in the case of convolution kernels. We should also mention that the case $p\ge2$ (and compactly supported $p_0$) in the above theorem was considered in Nickl (2007) for the much more involved case of nonparametric maximum likelihood estimators.

6. Adaptation in sup-norm loss and the “plug-in property” of thresholding wavelet estimators.

The linear wavelet estimator $p_n(y)$ from (11) requires choosing $j_n$, and the choice of $j_n$ that produces optimal results for $p_n$ depends on the smoothness $t$ of the true density $p_0$ (cf. the discussion in Remark 8). From a practical point of view, this is a drawback, as $p_0$ is unknown. A remedy for this problem was suggested in Donoho et al. (1996): so-called “thresholding” wavelet estimators, defined as follows. Note first that, for integers $j_0<j_1$, we may write
$$P_n(K_{j_1})=P_n(K_{j_0})+\sum_{l=j_0}^{j_1-1}P_n(K_{l+1}-K_l)=P_n(K_{j_0})+\sum_{l=j_0}^{j_1-1}\sum_k\hat\beta_{lk}\psi_{lk}.$$
Hard thresholding (the only kind we will consider) consists of keeping in this sum only those $\hat\beta_{lk}$ that are larger in absolute value than a threshold $\tau$. That is, for $j_i=j_i(n)$ and $\tau=\tau(n,l)$, the hard thresholding estimator of $p_0$ is given by
$$p_n^H(y)=P_n(K_{j_0}(y,\cdot))+\sum_{l=j_0}^{j_1-1}\sum_k\hat\beta_{lk}I_{\{|\hat\beta_{lk}|>\tau\}}\psi_{lk}(y).\tag{45}$$

It is known [e.g., Donoho et al.
(1996), Juditsky and Lambert-Lacroix (2004)] that if $\tau$, $j_0$, $j_1$ are chosen in a suitable way (not requiring knowledge of the smoothness parameter $t$), then $p_n^H$ is rate-adaptive within a logarithmic factor for any $L^{p'}$ loss, $1\le p'<\infty$; that is,
$$\sup_{p_0:\,\|p_0\|_{t,p,q}\le L,\ |\operatorname{supp}(p_0)|\le M}E_{p_0}\|p_n^H-p_0\|_{p'}^{p'}\le C(\log n)^\gamma r_n(t,p'),$$
where $\gamma>0$, $C$ is a constant and $r_n(t,p')$ is the minimax rate of convergence for estimating a density in the given Besov ball.

We now show, without assuming compact support for $p_0$, that the thresholding wavelet estimator is rate adaptive for sup-norm loss without the logarithmic penalty and that, simultaneously, its distribution function is $\sqrt n$-consistent in the sup norm (in fact, it satisfies the UCLT). The pattern of proof of the result below follows that of the aforementioned authors, but we must use the results from the previous sections in several instances, and we must deal with the unbounded support of $p_0$ by introducing a moment condition for $p_0$ of arbitrarily small order, combined with an application of Hoffmann–Jørgensen's inequality.

For $\kappa>0$, define the constant
$$c(\kappa):=c(\kappa,\psi,\|p_0\|_\infty)=\frac{\kappa^2}{8\|\psi\|_2^2\|p_0\|_\infty+8/(3\sqrt{\log2})\,\kappa\|\psi\|_\infty}.$$
Also, define
$$\mathcal P(L,L',\eta)=\left\{p_0:\|p_0\|_{t,\infty,\infty}\le L,\ \int|x|^\eta p_0(x)\,dx\le L'\right\}.$$

Theorem 8. Suppose $\phi$ satisfies Condition 2 and $\phi$, $\psi$ are such that Condition 1(T) holds for some $0\le T<\infty$. Assume further that the density $p_0$ of $P$ satisfies $p_0\in B^t_{\infty\infty}(\mathbb R)$ for some $t$, $0<t<T+1$, and that $E|X|^\eta<\infty$ for some $\eta>0$. Let $p_n^H$, $n\ge2$, be the thresholding estimator in (45) corresponding to
$$\tau=\tau(n,l)=\kappa\sqrt{l/n},\qquad2^{j_0}\simeq(n/\log n)^{1/(2(T+1)+1)}\qquad\text{and}\qquad n/\log n\le2^{j_1}\le2n/\log n,\quad j_0<j_1,$$
where $\kappa>0$ is chosen so that $c(\kappa)\ge(T+3)(1+\eta^{-1})\log2$.
Then
$$\sup_{p_0\in\mathcal P(L,L',\eta)}E_{p_0}\sup_{y\in\mathbb R}|p_n^H(y)-p_0(y)|=O\left(\left(\frac{\log n}{n}\right)^{t/(2t+1)}\right).\tag{46}$$
Moreover, letting $F_n^H$ and $F$ denote the distribution functions of $p_n^H$ and $p_0$, respectively,
$$\sqrt n(F_n^H-F)\rightsquigarrow_{\ell^\infty(\mathbb R)}G_P.\tag{47}$$

Proof. Since
$$p_0=K_{j_0}(p_0)+\sum_{l=j_0}^{j_1-1}(K_{l+1}-K_l)(p_0)-K_{j_1}(p_0)+p_0$$
and since
$$\sum_{l=j_0}^{j_1-1}(K_{l+1}-K_l)(p_0)=\sum_{l=j_0}^{j_1-1}\sum_k\beta_{lk}(p_0)\psi_{lk},$$
with the last series converging pointwise (in fact uniformly) because $p_0\in L^1(\mathbb R)$, we have
$$\|p_n^H-p_0\|_\infty\le\|(P_n-P)(K_{j_0})\|_\infty+\left\|\sum_{l=j_0}^{j_1-1}\sum_k\left(\hat\beta_{lk}I_{\{|\hat\beta_{lk}|>\tau\}}-\beta_{lk}(p_0)\right)\psi_{lk}\right\|_\infty+\|K_{j_1}(p_0)-p_0\|_\infty.$$
The expectation of the first term is $O(((\log n)/n)^{(T+1)/(2(T+1)+1)})=o(((\log n)/n)^{t/(2t+1)})$ by (17) and since $t<T+1$. The third term is at most of order $2^{-j_1t}$ by (37), and this is $O((\log n/n)^t)=o((\log n/n)^{t/(2t+1)})$. It remains to consider the second term, which can be decomposed as
$$\begin{aligned}&\sum_{l=j_0}^{j_1-1}\sum_k(\hat\beta_{lk}-\beta_{lk})\psi_{lk}\left[I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|>\tau/2]}+I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|\le\tau/2]}\right]\\&\quad-\sum_{l=j_0}^{j_1-1}\sum_k\beta_{lk}\psi_{lk}\left[I_{[|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|>2\tau]}+I_{[|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|\le2\tau]}\right]:=\mathrm{(I)}+\mathrm{(II)}-\mathrm{(III)}-\mathrm{(IV)},\end{aligned}$$
where we write $\beta_{lk}$ for $\beta_{lk}(p_0)$. We first treat the “large deviations” terms (II) and (III).

For (II) we choose $\alpha\in(1,\eta+1)$ such that
$$c(\kappa)\ge(T+2)\frac{\alpha}{\alpha-1}\log2,\tag{48}$$
which is possible by the condition on $c(\kappa)$, since $\alpha/(\alpha-1)\downarrow1+\eta^{-1}$ as $\alpha\uparrow\eta+1$ and $T+2<T+3$; and we note that
$$\begin{aligned}E\sup_x\left|\sum_{l=j_0}^{j_1-1}\sum_k(\hat\beta_{lk}-\beta_{lk})\psi_{lk}(x)I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|\le\tau/2]}\right|&\le\sum_{l=j_0}^{j_1-1}2^{l/2}\|\psi\|_\infty\sum_k\left[E|\hat\beta_{lk}-\beta_{lk}|^\alpha\right]^{1/\alpha}\\&\qquad\times\left[\Pr\{|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|\le\tau/2\}\right]^{1-1/\alpha}.\end{aligned}\tag{49}$$
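Since $c(\kappa)$ and the conditions on $\kappa$ and $\alpha$ are explicit, the admissible tuning can be computed directly. The sketch below (helper names `c_kappa`, `minimal_kappa` and `admissible_alpha` are hypothetical) evaluates $c(\kappa)$ as defined before Theorem 8, solves $c(\kappa)\ge(T+3)(1+\eta^{-1})\log2$ for the smallest $\kappa$ (a quadratic inequality in $\kappa$), and then picks some $\alpha\in(1,\eta+1)$ satisfying (48). It assumes that $\|\psi\|_2$, $\|\psi\|_\infty$ and a bound for $\|p_0\|_\infty$ are known; in practice $\|p_0\|_\infty$ is unknown, so this only illustrates the arithmetic of the conditions.

```python
import math


def c_kappa(kappa, psi_l2, psi_sup, p0_sup):
    """c(kappa) = kappa^2 / (8 ||psi||_2^2 ||p0||_inf + 8/(3 sqrt(log 2)) kappa ||psi||_inf)."""
    a = 8.0 * psi_l2 ** 2 * p0_sup
    b = 8.0 / (3.0 * math.sqrt(math.log(2.0))) * psi_sup
    return kappa ** 2 / (a + b * kappa)


def minimal_kappa(T, eta, psi_l2, psi_sup, p0_sup):
    """Smallest kappa > 0 with c(kappa) >= (T + 3)(1 + 1/eta) log 2.

    c(kappa) >= m is equivalent to kappa^2 - m*b*kappa - m*a >= 0,
    so the answer is the positive root of that quadratic.
    """
    m = (T + 3) * (1.0 + 1.0 / eta) * math.log(2.0)
    a = 8.0 * psi_l2 ** 2 * p0_sup
    b = 8.0 / (3.0 * math.sqrt(math.log(2.0))) * psi_sup
    return (m * b + math.sqrt((m * b) ** 2 + 4.0 * m * a)) / 2.0


def admissible_alpha(kappa, T, eta, psi_l2, psi_sup, p0_sup):
    """Return some alpha in (1, eta + 1) satisfying (48), or None if none exists.

    (48) reads c(kappa) >= (T + 2) * alpha / (alpha - 1) * log 2. Since
    alpha / (alpha - 1) decreases on (1, inf), (48) holds iff alpha >= q / (q - 1),
    where q = c(kappa) / ((T + 2) log 2) must exceed 1.
    """
    q = c_kappa(kappa, psi_l2, psi_sup, p0_sup) / ((T + 2) * math.log(2.0))
    if q <= 1.0:
        return None
    lower = q / (q - 1.0)
    if lower >= eta + 1.0:
        return None
    return (lower + eta + 1.0) / 2.0  # midpoint of the admissible interval [lower, eta + 1)
```

With the illustrative values $T=0$, $\eta=1$ and $\|\psi\|_2=\|\psi\|_\infty=\|p_0\|_\infty=1$ (the first two as for the Haar wavelet), `minimal_kappa` gives $\kappa\approx15.47$, and `admissible_alpha` then returns $\alpha=1.75$, the midpoint of $[1.5,2)$. Taking the midpoint is an arbitrary choice; any point of that interval satisfies (48).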
Now, since $\sup_x|\psi_{lk}(x)|\le2^{l/2}\|\psi\|_\infty$ and $E\psi_{lk}^2(X)\le\|\psi\|_2^2\|p_0\|_\infty=\|p_0\|_\infty$, Bernstein's inequality gives, for $l\le j_1-1$ (and $n\ge e^2$),
$$\begin{aligned}\Pr\{|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|\le\tau/2\}&\le\Pr\left\{\left|\frac1n\sum_{i=1}^n(\psi_{lk}(X_i)-E\psi_{lk}(X))\right|>2^{-1}\kappa\sqrt{l/n}\right\}\\&\le2\exp\left(-\frac{\kappa^2l}{8\|p_0\|_\infty+(8/3)\kappa\|\psi\|_\infty\sqrt{2^ll/n}}\right)\\&\le2\exp\left(-\frac{\kappa^2l}{8\|p_0\|_\infty+(8/(3\sqrt{\log2}))\kappa\|\psi\|_\infty}\right)=2e^{-c(\kappa)l},\end{aligned}\tag{50}$$
a bound which is independent of $k$. In order to estimate $\sum_k[E|\hat\beta_{lk}-\beta_{lk}|^\alpha]^{1/\alpha}$, we note that, by Hoffmann–Jørgensen's inequality [see the version of Corollary 1.2.7 in de la Peña and Giné (1999)], there exists a universal constant $d(\alpha)$ such that
$$\|\hat\beta_{lk}-\beta_{lk}\|_{\alpha,P}\le d(\alpha)\left(\left\|\max_{1\le i\le n}\left|\frac1n(\psi_{lk}(X_i)-E\psi_{lk}(X))\right|\right\|_{\alpha,P}+\|\hat\beta_{lk}-\beta_{lk}\|_{1,P}\right).\tag{51}$$
If $\operatorname{supp}\psi\subseteq[A_1,A_2]$, we have, for the second summand,
$$\sum_k\|\hat\beta_{lk}-\beta_{lk}\|_{1,P}\le2^{l/2+1}\|\psi\|_\infty\sum_k\int_{\operatorname{supp}\psi_{lk}}dP\le2^{l/2+1}\|\psi\|_\infty\sum_k\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}dP\le2(A_2-A_1+1)\|\psi\|_\infty2^{l/2}\tag{52}$$
(since, for $l$ fixed, each $x\in\mathbb R$ is contained in at most $A_2-A_1+1$ intervals $[(A_1+k)/2^l,(A_2+k)/2^l]$). In order to estimate the sum over $k$ of the first summands in (51), we first observe that, for each $k$ and $l$, they are bounded by
$$2\left(nE\left|\frac1n\psi_{lk}(X)\right|^\alpha\right)^{1/\alpha}\le n^{-(\alpha-1)/\alpha}2^{(l/2)+1}\|\psi\|_\infty\left(\int_{\operatorname{supp}\psi_{lk}}dP\right)^{1/\alpha}.$$
Let $K_1=\{k:0\in[(A_1+k)/2^l,(A_2+k)/2^l]\}$, which consists of at most $A_2-A_1+1$ terms, and set $K_2=\mathbb Z\setminus K_1$.
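The two counting facts used in (52) and in the definition of $K_1$ (each $x$ lies in at most $A_2-A_1+1$ of the intervals $[(A_1+k)/2^l,(A_2+k)/2^l]$, and $K_1$ has at most $A_2-A_1+1$ elements) can be checked numerically. A minimal sketch, assuming integer support endpoints $A_1<A_2$ (as for compactly supported Daubechies wavelets) and using exact rational arithmetic to avoid rounding at interval endpoints; both function names are hypothetical:

```python
import math
from fractions import Fraction


def cover_count(x, A1, A2, l):
    """Number of k in Z with x in the closed interval [(A1 + k)/2^l, (A2 + k)/2^l].

    x lies in the k-th interval iff 2^l x - A2 <= k <= 2^l x - A1.
    """
    u = Fraction(x) * 2 ** l
    return max(0, math.floor(u - A1) - math.ceil(u - A2) + 1)


def index_set_K1(A1, A2, l):
    """K_1 = {k : 0 in [(A1 + k)/2^l, (A2 + k)/2^l]}, found by direct search."""
    window = range(-A2 - 5, -A1 + 6)  # K_1 must lie in [-A2, -A1]
    return [k for k in window
            if Fraction(A1 + k, 2 ** l) <= 0 <= Fraction(A2 + k, 2 ** l)]
```

Because the intervals are closed, points with $2^lx\in\mathbb Z$ attain the maximum $A_2-A_1+1$; every other point lies in exactly $A_2-A_1$ intervals. For instance, with $A_1=-1$, $A_2=2$ the set $K_1$ is $\{-2,-1,0,1\}$ at every level $l$.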
For these index sets,
$$\sum_{k\in K_1}\left(\int_{\operatorname{supp}\psi_{lk}}dP\right)^{1/\alpha}\le(A_2-A_1+1)^{(\alpha+1)/\alpha}2^{-l/\alpha}\|p_0\|_\infty^{1/\alpha}$$
and
$$\begin{aligned}\sum_{k\in K_2}\left(\int_{\operatorname{supp}\psi_{lk}}dP\right)^{1/\alpha}&\le\sum_{k\in K_2}\frac{1}{(1+(|A_1+k|\wedge|A_2+k|)/2^l)^{\eta/\alpha}}\left(\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}(1+|x|)^\eta\,dP\right)^{1/\alpha}\\&\le2^{l\eta/\alpha}\left(\sum_{k\in K_2}\frac{1}{(2^l+(|A_1+k|\wedge|A_2+k|))^{\eta/(\alpha-1)}}\right)^{1-1/\alpha}\left(\sum_{k\in K_2}\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}(1+|x|)^\eta\,dP(x)\right)^{1/\alpha}\end{aligned}$$
by Hölder's inequality (with exponents $\alpha/(\alpha-1)$ and $\alpha$). Since, for $\lambda>1$, $\sum_{k\in K_2}(2^l+(|A_1+k|\wedge|A_2+k|))^{-\lambda}\le2\sum_{r=2^l}^\infty r^{-\lambda}$, we get, taking $\lambda=\eta/(\alpha-1)>1$ (recall $\alpha<\eta+1$),
$$2^{l\eta/\alpha}\left(\sum_{k\in K_2}\frac{1}{(2^l+(|A_1+k|\wedge|A_2+k|))^{\eta/(\alpha-1)}}\right)^{1-1/\alpha}\le C\,2^{l(\alpha-1)/\alpha}$$
for a constant $C=C_{\eta,\alpha}$ depending only on $\eta$ and $\alpha$. Moreover,
$$\left(\sum_k\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}(1+|x|)^\eta\,dP(x)\right)^{1/\alpha}\le(A_2-A_1+1)^{1/\alpha}\left(E(1+|X|^\eta)\right)^{1/\alpha}<\infty.$$
Thus, collecting terms,
$$\sum_k\left\|\max_{1\le i\le n}\left|\frac1n(\psi_{lk}(X_i)-E\psi_{lk}(X))\right|\right\|_{\alpha,P}\le C\,n^{-(\alpha-1)/\alpha}2^{l/2}\left(2^{-l/\alpha}+2^{l(\alpha-1)/\alpha}\right),\tag{53}$$
where $C$ depends on $\alpha$, $\eta$, $\|\psi\|_\infty$, $A_1$, $A_2$ and $\|p_0\|_\infty$. Now, adding (52) and (53) gives, by (51), a bound for $\sum_k\|\hat\beta_{lk}-\beta_{lk}\|_{\alpha,P}$ which, combined with inequalities (50) and (49), proves that the series in (49) is dominated by
$$C'\sum_{l=j_0}^{j_1-1}2^l\left[1+2(n^{-1}2^l)^{(\alpha-1)/\alpha}\right]e^{-c(\kappa)l(\alpha-1)/\alpha},\tag{54}$$
where $C'$ depends only on $\alpha$, $\eta$, $\|\psi\|_\infty$, $A_1$, $A_2$ and $\|p_0\|_\infty$. By definition of $j_1$, $n^{-1}2^l<2/\log n$ for $l<j_1$, which gives
$$2^l\left[1+2(n^{-1}2^l)^{(\alpha-1)/\alpha}\right]\le2^l\left(1+2(2/\log n)^{(\alpha-1)/\alpha}\right)\le c\,2^l$$
for some $c<\infty$, and, using the definition of $\alpha$ and condition (48) for $c(\kappa)$, we obtain that (54) is bounded by
$$C''\sum_{l=j_0}^{j_1-1}2^{-l(T+1)}\le C'''\,2^{-j_0(T+1)}$$
for suitable constants $C''$ and $C'''$.
By the definition of $j_0$ and $T$, we see that this gives the bound
$$O\left(\left(\frac{\log n}{n}\right)^{(T+1)/[2(T+1)+1]}\right)=o\left(\left(\frac{\log n}{n}\right)^{t/(2t+1)}\right)$$
for the series in (49), which is what we wanted to prove for term (II).

For term (III),
$$\begin{aligned}E\sup_x\left|\sum_{l=j_0}^{j_1-1}\sum_k\beta_{lk}\psi_{lk}(x)I_{[|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|>2\tau]}\right|&\le\sum_{l=j_0}^{j_1-1}2^{l/2}\|\psi\|_\infty\sum_k|\beta_{lk}|\Pr\{|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|>2\tau\}\\&\le c\sum_{l=j_0}^{j_1-1}2^le^{-c(\kappa)l}<c'\,2^{-j_0(T+1)}=o\left(\left(\frac{\log n}{n}\right)^{t/(2t+1)}\right),\end{aligned}$$
where we have used that (40) and $\|K_l(p_0)\|_1\le\|\Phi_l*p_0\|_1\le\|\Phi\|_1$ [by (6)] imply $\sum_k|\beta_{lk}|\le C2^{l/2}$, and that $\Pr\{|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|>2\tau\}\le\Pr\{|\hat\beta_{lk}-\beta_{lk}|>\tau\}\le2\exp\{-c(\kappa)l\}$ by (50) and the choice of $\kappa$.

Next, we consider (I). We will use (18), and we note in advance that if $l\le j_1$, then $\sqrt{l/n}\ge C2^{l/2}l/n$, so that $\sqrt{l/n}$ is the dominating term in that bound. Let $j_1(t)$ be such that $j_0<j_1(t)\le j_1-1$ and $2^{j_1(t)}\simeq(n/\log n)^{1/(2t+1)}$ [such $j_1(t)$ exists by the definitions]. Using (18) and (6), we have
$$\begin{aligned}E\sup_x\left|\sum_{l=j_0}^{j_1(t)-1}\sum_k(\hat\beta_{lk}-\beta_{lk})\psi_{lk}(x)I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|>\tau/2]}\right|&\le\sum_{l=j_0}^{j_1(t)-1}E\left(\sup_k|\hat\beta_{lk}-\beta_{lk}|\right)2^{l/2}\sup_x\sum_k|\psi(2^lx-k)|\\&\le C\sum_{l=j_0}^{j_1(t)-1}\sqrt{\frac{2^ll}{n}}=O\left(\sqrt{\frac{2^{j_1(t)}j_1(t)}{n}}\right)=O\left(\left(\frac{\log n}{n}\right)^{t/(2t+1)}\right).\end{aligned}$$
For the second part of (I), using the same facts as in the previous computation, the bound $I_{[|\beta_{lk}|>\tau/2]}\le2|\beta_{lk}|/\tau$, and the fact that $\sup_k|\beta_{lk}(p_0)|\le c2^{-l(t+1/2)}$ by (37), we have [recall the definition of $\tau=\tau(l,n)$]
$$\begin{aligned}E\sup_x\left|\sum_{l=j_1(t)}^{j_1-1}\sum_k(\hat\beta_{lk}-\beta_{lk})\psi_{lk}(x)I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|>\tau/2]}\right|&\le\sum_{l=j_1(t)}^{j_1-1}E\left(\sup_k|\hat\beta_{lk}-\beta_{lk}|\right)\frac2\kappa\sqrt{\frac nl}\sup_k|\beta_{lk}|\,2^{l/2}\sup_x\sum_k|\psi(2^lx-k)|\\&\le C\sum_{l=j_1(t)}^{j_1-1}2^{-lt}=O\left(\left(\frac{\log n}{n}\right)^{t/(2t+1)}\right).\end{aligned}$$
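To make the estimator (45) and the level-dependent threshold $\tau=\kappa\sqrt{l/n}$ concrete, here is a minimal Python sketch that evaluates $p_n^H(y)$ at a single point. It assumes the Haar pair $\phi=1_{[0,1)}$, $\psi=1_{[0,1/2)}-1_{[1/2,1)}$, takes $T=0$ in the choice of $j_0$, and uses natural logarithms; whether the Haar basis meets Conditions 1(T) and 2 as stated in the paper is not checked here, and the helper names are hypothetical. Because the Haar functions $\psi_{lk}$ at a fixed level have disjoint supports, only one coefficient per level contributes at a given $y$.

```python
import math
import random


def haar_psi(u):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= u < 0.5:
        return 1.0
    if 0.5 <= u < 1.0:
        return -1.0
    return 0.0


def resolution_levels(n, T=0):
    """Hypothetical helper: j0, j1 as in Theorem 8 (natural logarithm assumed).

    2^j0 ~ (n / log n)^(1 / (2(T+1)+1)) and n/log n <= 2^j1 < 2 n/log n.
    """
    ratio = n / math.log(n)
    j0 = max(0, round(math.log2(ratio) / (2 * (T + 1) + 1)))
    j1 = math.ceil(math.log2(ratio))
    return j0, j1


def hard_threshold_haar(sample, y, kappa, T=0):
    """Evaluate the hard-thresholding estimator (45) at y with the Haar basis."""
    n = len(sample)
    j0, j1 = resolution_levels(n, T)
    # Linear part P_n(K_{j0}(y, .)): for Haar this is 2^{j0} times the
    # fraction of observations in the dyadic bin of length 2^{-j0} containing y.
    k0 = math.floor(2 ** j0 * y)
    estimate = 2 ** j0 * sum(1 for x in sample if math.floor(2 ** j0 * x) == k0) / n
    for l in range(j0, j1):
        k = math.floor(2 ** l * y)  # the only k with psi_{lk}(y) != 0
        scale = 2 ** (l / 2)
        beta_hat = sum(scale * haar_psi(2 ** l * x - k) for x in sample) / n
        tau = kappa * math.sqrt(l / n)  # level-dependent threshold
        if abs(beta_hat) > tau:  # hard thresholding: keep or kill
            estimate += beta_hat * scale * haar_psi(2 ** l * y - k)
    return estimate
```

Two limiting cases serve as sanity checks. With $\kappa=0$ every nonzero coefficient is kept and, by the identity $P_n(K_{j_1})=P_n(K_{j_0})+\sum_{l=j_0}^{j_1-1}\sum_k\hat\beta_{lk}\psi_{lk}$, the output is the level-$j_1$ Haar histogram. With a very large $\kappa$ all detail coefficients are discarded (provided $j_0\ge1$, since $\tau=0$ at level $l=0$), and the output is the level-$j_0$ histogram.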
Finally, for term (IV), using (6) and (37), we have
$$\begin{aligned}\sup_x\left|\sum_{l=j_0}^{j_1-1}\sum_k\beta_{lk}\psi_{lk}(x)I_{[|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|\le2\tau]}\right|&\le c\sum_{l=j_0}^{j_1-1}\sup_k2^{l/2}|\beta_{lk}|I_{[|\beta_{lk}|\le2\tau]}\\&\le c\sum_{l=j_0}^{j_1-1}\min(2^{l/2}\delta,\,C2^{-lt}),\end{aligned}\tag{55}$$
where $\delta=2\kappa\sqrt{j_1/n}\ge2\tau$ and where $C$ only depends on the Besov norm $L$ of $p_0$. To estimate this quantity, we use an idea of Donoho et al. [(1997), proof of Theorem 3; see also Delyon and Juditsky (1996)]. Set $W(l)=\min(2^{l/2}\delta,C2^{-lt})$. Clearly $\sup_{j_0\le l\le j_1-1}W(l)$ is attained at $l^*$ such that $2^{l^*}=(C/\delta)^{1/(t+1/2)}$, and $W(l^*)=C^{1-r}\delta^r=C2^{-tl^*}$, where $r=t/(t+1/2)$. Hence,
$$W(l)/W(l^*)\le\min\left(2^{t(l^*-l)},\,2^{l^*t+l/2}\delta/C\right).$$
So the last term in (55) equals
$$c\sum_{l=j_0}^{j_1-1}W(l)\le cW(l^*)2^{l^*t}\delta C^{-1}\sum_{j_0\le l<\cdots}\cdots$$
[…] $>0$, $N(\mathcal F,L_2(Q),\varepsilon)\le(AU/\varepsilon)^v$. For such classes, assuming $Pf=0$ for $f\in\mathcal F$, there exists a universal constant $L'$ such that
$$E\left\|\sum_{i=1}^nf(X_i)\right\|_{\mathcal F}\le L'\left(\sqrt v\sqrt{n\sigma^2}\sqrt{\log\frac{AU}{\sigma}}+vU\log\frac{AU}{\sigma}\right)\tag{57}$$
[see, e.g., Giné and Guillou (2001)]. If $\sigma<U/2$ we may replace $A$ by 1 at the price of changing the constant $L'$. Then, if
$$n\sigma^2>C\log\frac U\sigma\tag{58}$$
for some constant $C$, we obtain
$$E\left\|\sum_{i=1}^nf(X_i)\right\|_{\mathcal F}\le L''\sqrt{n\sigma^2}\sqrt{\log\frac U\sigma}\qquad\text{and}\qquad V\le L'''n\sigma^2\tag{59}$$
for constants $L''$, $L'''$ that depend only on $A$, $v$, $C$. Combining these estimates with Talagrand's inequality (56), it is easy to obtain [as in Corollary 2.2 in Giné and Guillou (2002)] that there exist constants $R$ and $C_1$ depending only on $A$ and $v$ such that, for all $C_2\ge C_1$, if
$$C_1\sqrt n\,\sigma\sqrt{\log\frac U\sigma}\le t\le C_2\frac{n\sigma^2}{U},\qquad\sigma<U/2,$$
and (58) are satisfied, then
$$\Pr\left\{\max_{k\le n}\left\|\sum_{i=1}^kf(X_i)\right\|_{\mathcal F}\ge t\right\}\le R\exp\left(-\frac{1}{C_3}\frac{t^2}{n\sigma^2}\right),\tag{60}$$
where $C_3=\log(1+C_2/L''')/RC_2$.
In particular, for $u\ge C_1$, with $\bar L=L'''\vee R$,
$$\Pr\left\{\max_{k\le n}\left\|\sum_{i=1}^kf(X_i)\right\|_{\mathcal F}\ge u\sqrt{n\sigma^2}\sqrt{\log\frac U\sigma}\right\}\le\bar L\exp\left(-\frac{u\log(1+u/\bar L)}{\bar L}\log\frac U\sigma\right).$$
These tail probabilities are of Poisson type, and an easy (but somewhat cumbersome) computation yields that, for all $\lambda\ge0$,
$$\begin{aligned}E\exp\left(\lambda\frac{\max_{k\le n}\|\sum_{i=1}^kf(X_i)\|_{\mathcal F}}{\sqrt{n\sigma^2}\sqrt{\log(U/\sigma)}}\right)&\le D(A,v,C_1,\bar L)\Bigl(1+\sqrt{\lambda\bar L\left(e^{2\lambda\bar L/\log(U/\sigma)}-1\right)}\\&\qquad\times\exp\left\{\lambda\bar L\left(e^{2\lambda\bar L/\log(U/\sigma)}-1\right)\right\}\Bigr).\end{aligned}\tag{61}$$

Acknowledgment. We thank two anonymous referees for a careful reading of this article and for constructive criticism that resulted in an improved exposition.

REFERENCES

Bickel, P. J. and Ritov, Y. (2003). Nonparametric estimators which can be “plugged-in.” Ann. Statist. 31 1033–1053. MR2001641
de la Peña, V. H. and Giné, E. (1999). Decoupling: From Dependence to Independence, Randomly Stopped Processes. U-Statistics and Processes. Martingales and Beyond. Springer, New York. MR1666908
Daubechies, I. (1992). Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics 61. SIAM, Philadelphia, PA. MR1162107
Deheuvels, P. (2000). Uniform limit laws for kernel density estimators on possibly unbounded intervals. In Recent Advances in Reliability Theory (Bordeaux, 2000) 477–492. Birkhäuser, Boston. MR1783500
Delyon, B. and Juditsky, A. (1996). On minimax wavelet estimators. Appl. Comput. Harmon. Anal. 3 215–228. MR1400080
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? J. Roy. Statist. Soc. Ser. B 57 301–369. MR1323344
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist. 24 508–539. MR1394974
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1997). Universal near minimaxity of wavelet shrinkage. In Festschrift for Lucien Le Cam 183–218. Springer, New York. MR1462946
Doukhan, P. and León, J. R. (1990). Déviation quadratique d'estimateurs de densité par projections orthogonales. C. R. Acad. Sci. Paris Sér. I Math. 310 425–430. MR1046526
Dudley, R. M. (1992). Fréchet differentiability, p-variation and uniform Donsker classes. Ann. Probab. 20 1968–1982. MR1188050
Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathematics 63. Cambridge Univ. Press, Cambridge. MR1720712
Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist. 27 642–669. MR0083864
Einmahl, U. and Mason, D. M. (1996). Some universal results on the behavior of increments of partial sums. Ann. Probab. 24 1388–1407. MR1411499
Einmahl, U. and Mason, D. M. (2000). An empirical process approach to the uniform consistency of kernel-type function estimators. J. Theoret. Probab. 13 1–37. MR1744994
Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist. 33 1380–1403. MR2195639
Giné, E. and Guillou, A. (2001). On consistency of kernel density estimators for randomly censored data: Rates holding uniformly over adaptive intervals. Ann. Inst. H. Poincaré Probab. Statist. 37 503–522. MR1876841
Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. H. Poincaré Probab. Statist. 38 907–921. MR1955344
Giné, E., Latała, R. and Zinn, J. (2000). Exponential and moment inequalities for U-statistics. In High Dimensional Probability, II (Seattle, WA, 1999). Progress in Probability 47 (E. Giné, D. M. Mason and J. A. Wellner, eds.) 13–38. Birkhäuser, Boston. MR1857312
Giné, E. and Nickl, R. (2008). Uniform central limit theorems for kernel density estimators. Probab. Theory Related Fields 141 333–387. MR2391158
Giné, E. and Nickl, R. (2009). An exponential inequality for the distribution function of the kernel density estimator, with applications to adaptive estimation. Probab. Theory Related Fields 143 569–596. MR2475673
Golubev, Y., Lepski, O. and Levit, B. (2001). On adaptive estimation for the sup-norm losses. Math. Methods Statist. 10 23–37. MR1841807
Hall, P., Kerkyacharian, G. and Picard, D. (1998). Block threshold rules for curve estimation using kernel and wavelet methods. Ann. Statist. 26 922–942. MR1635418
Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. Lecture Notes in Statistics 129. Springer, New York. MR1618204
Juditsky, A. and Lambert-Lacroix, S. (2004). On minimax density estimation on R. Bernoulli 10 187–220. MR2046772
Kerkyacharian, G. and Picard, D. (1992). Density estimation in Besov spaces. Statist. Probab. Lett. 13 15–24. MR1147634
Kerkyacharian, G., Picard, D. and Tribouley, K. (1996). L^p adaptive density estimation. Bernoulli 2 229–247. MR1416864
Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent rv's and the sample df. I. Z. Wahrsch. Verw. Gebiete 32 111–131. MR0375412
Ledoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89. Amer. Math. Soc., Providence, RI. MR1849347
Love, E. R. and Young, L. C. (1937). Sur une classe de fonctionelles linéaires. Fund. Math. 28 243–257.
Massiani, A. (2003). Vitesse de convergence uniforme presque sûre de l'estimateur linéaire par méthode d'ondelettes. C. R. Math. Acad. Sci. Paris 337 67–70. MR1995680
Meyer, Y. (1992). Wavelets and Operators. Cambridge Studies in Advanced Mathematics 37. Cambridge Univ. Press, Cambridge. MR1228209
Nickl, R. (2006). Empirical and Gaussian processes on Besov classes. In High Dimensional Probability. Institute of Mathematical Statistics Lecture Notes–Monograph Series 51 (E. Giné, V. Koltchinskii, W. Li and J. Zinn, eds.) 185–195. IMS, Beachwood, OH. MR2387769
Nickl, R. (2007). Donsker-type theorems for nonparametric maximum likelihood estimators. Probab. Theory Related Fields 138 411–449. MR2299714
Nickl, R. and Pötscher, B. M. (2007). Bracketing metric entropy rates and empirical central limit theorems for function classes of Besov- and Sobolev-type. J. Theoret. Probab. 20 177–199. MR2324525
Nolan, D. and Pollard, D. (1987). U-processes: Rates of convergence. Ann. Statist. 15 780–799. MR888439
Stute, W. (1982). A law of the logarithm for kernel density estimators. Ann. Probab. 10 414–422. MR647513
Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 28–76. MR1258865
Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505–563. MR1419006
Triebel, H. (1983). Theory of Function Spaces. Birkhäuser, Basel. MR781540
Tsybakov, A. B. (1998). Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes. Ann. Statist. 26 2420–2469. MR1700239
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer, New York. MR1385671
Vidakovic, B. (1999). Statistical Modeling by Wavelets. Wiley, New York. MR1681904

Department of Mathematics
University of Connecticut
Storrs, Connecticut 06269-3009
USA
E-mail: gine@math.uconn.edu

Statistical Laboratory
Department of Pure Mathematics and Mathematical Statistics
University of Cambridge
Wilberforce Road
CB3 0WB Cambridge
United Kingdom
E-mail: nickl@statslab.cam.ac.uk
