Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections
Given an i.i.d. sample from a distribution $F$ on $\mathbb{R}$ with uniformly continuous density $p_0$, purely data-driven estimators are constructed that efficiently estimate $F$ in sup-norm loss and simultaneously estimate $p_0$ at the best possibl…
Authors: Evarist Gine, Richard Nickl
Bernoulli 16(4), 2010, 1137–1163. DOI: 10.3150/09-BEJ239

EVARIST GINÉ (1) and RICHARD NICKL (2)

(1) Department of Mathematics, University of Connecticut, Storrs, CT 06269-3009, USA. E-mail: gine@math.uconn.edu
(2) Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK. E-mail: nickl@statslab.cam.ac.uk

Given an i.i.d. sample from a distribution $F$ on $\mathbb{R}$ with uniformly continuous density $p_0$, purely data-driven estimators are constructed that efficiently estimate $F$ in sup-norm loss and simultaneously estimate $p_0$ at the best possible rate of convergence over Hölder balls, also in sup-norm loss. The estimators are obtained by applying a model selection procedure close to Lepski's method with random thresholds to projections of the empirical measure onto spaces spanned by wavelets or $B$-splines. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein-type analogs of the inequalities in Koltchinskii [Ann. Statist. 34 (2006) 2593–2656] for the deviation of suprema of empirical processes from their Rademacher symmetrizations.

Keywords: adaptive estimation; Lepski's method; Rademacher processes; spline estimator; sup-norm loss; wavelet estimator

1. Introduction

If $X_1, \dots, X_n$ are i.i.d. with unknown distribution function $F$ on $\mathbb{R}$, then classical results of mathematical statistics establish optimality of the empirical distribution function $F_n$ as an estimator of $F$.
That is to say, if we assume no a priori knowledge whatsoever on $F$ and equip the set of all probability distribution functions with some natural loss function such as sup-norm loss, then $F_n$ is asymptotically sharp minimax for estimating $F$. (The same is true even if more is known about $F$, for instance, if $F$ is known to have a uniformly continuous density.) However, this does not preclude the existence of other estimators that are also asymptotically minimax for estimating $F$ in sup-norm loss, but which improve upon $F_n$ in other respects. What we have in mind is a purely data-driven estimator that is efficient for $F$, but, at the same time, also estimates the density $f$ of $F$ at the best rate of convergence in some relevant loss function over some prescribed classes of densities. More precisely, our goal in the present article is to construct estimators that satisfy the functional central limit theorem (CLT) for the distribution function and which adapt to the unknown smoothness of the density in sup-norm loss. Whereas this article is concerned with the mathematical problem of the existence and construction of such estimators, it does not deal with the practical implementation of estimation procedures.

To achieve adaptation, one can opt for several approaches, all of which are related. Among them, we mention the penalization method of Barron, Birgé and Massart [1], wavelet thresholding [7] and Lepski's [26] method. Our choice for the goal at hand consists of using Lepski's method, with random thresholds, applied to wavelet and spline projection estimators of a density.

This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 2010, Vol. 16, No. 4, 1137–1163. This reprint differs from the original in pagination and typographic detail. 1350-7265 © 2010 ISI/BS
The linear estimators underlying our procedure are projections of the empirical measure onto spaces spanned by wavelets, and wavelet theory is central to some of the derivations of this article. The wavelets most commonly used in statistics are those that are compactly supported (for example, Daubechies wavelets), and our results readily apply to these. However, for computational and other purposes, projections onto spline spaces are also interesting candidates for the estimators. Density estimators obtained by projecting the empirical measure onto Schoenberg spaces spanned by $B$-splines were studied by Huang and Studden [19]. As is well known in wavelet theory, the Schoenberg spline spaces with equally spaced knots have an orthonormal basis consisting of the Battle–Lemarié wavelets so that the spline projection estimator is, in fact, exactly equal to the wavelet estimator based on Battle–Lemarié wavelets. These wavelets do not have compact support, but they are exponentially localized. Although we cannot, in general, handle exponentially decaying wavelets, we can still work with Battle–Lemarié wavelets because the $B$-spline expansion of the projections allows us to show that the relevant classes of functions are of Vapnik–Chervonenkis type so that empirical process techniques can be applied. In particular, the adaptive estimators we devise in Theorem 3 may be based either on spline projections or on compactly supported wavelets. In the process of proving the main theorem, we also provide new asymptotic results for spline projection density estimators similar to those for wavelet estimators in [14].
We need to use Talagrand's exponential inequality with sharp constants [3, 21] in the proofs, but to do this, we have to estimate the expectation of suprema of certain empirical processes that appear in the centering of Talagrand's inequality. The use of entropy-based moment inequalities for empirical processes typically results in too conservative constants (for example, in [13]). In order to remedy this problem, we adapt recent ideas due to Koltchinskii [22, 23] and Bartlett, Boucheron and Lugosi [2] to density estimation: the entropy-based moment bounds are replaced by the sup-norm of the associated Rademacher averages, which are, with high probability, better estimates of the expected value of the supremum of the empirical process. We derive a Bernstein-type analog of an exponential inequality in [23] that shows how the supremum of an empirical process deviates from the supremum of the associated Rademacher processes. This Bernstein-type version allows one to use partial knowledge of the variance of the empirical processes involved, which is crucial for applications in our context of adaptive density estimation. Moreover, we show that one can use, instead of the supremum of the Rademacher process, its conditional expectation given the data.

Adaptive estimation in sup-norm loss is a relatively recent subject. We should mention the results in Tsybakov [34], Golubev, Lepski and Levit [16] (who only considered Sobolev-type smoothness conditions) and [15]. All of these results were obtained in the Gaussian white noise model. If one is interested in adapting to a Hölder-continuous density in sup-norm loss in the i.i.d. density model on $\mathbb{R}$, this simplifying Gaussian structure is not available and novel techniques are needed. In the i.i.d. density model on $\mathbb{R}$, a direct 'competitor' to the estimators constructed in this article is the hard thresholding wavelet density estimator introduced in [7]: as proved in [14], its distribution function satisfies the functional CLT and it is adaptive in the sup-norm over Hölder balls; however, the proofs there seem to require the additional assumption that $dF$ integrates $|x|^\delta$ for some $\delta > 0$, and the constants appearing in the threshold and the risk become quite large for $\delta$ small. The results in the present article hold under no moment condition whatsoever.

2. Wavelet expansions and estimators

We start with some basic notation. If $(S, \mathcal{S})$ is a measurable space, then for Borel-measurable functions $h : S \to \mathbb{R}$ and Borel measures $\mu$ on $S$, we set $\mu h := \int_S h \, d\mu$. We will denote by $L^p(Q) := L^p(S, Q)$, $1 \le p \le \infty$, the usual Lebesgue spaces on $S$ with respect to a Borel measure $Q$, and if $Q$ is Lebesgue measure on $S = \mathbb{R}$, then we simply denote this space by $L^p(\mathbb{R})$, and its norm by $\|\cdot\|_p$, if $p < \infty$. We will use $\|h\|_\infty$ to denote $\sup_{x \in \mathbb{R}} |h(x)|$ for $h : \mathbb{R} \to \mathbb{R}$. For $s \in \mathbb{N}$, denote by $C^s(\mathbb{R})$ the spaces of functions $f : \mathbb{R} \to \mathbb{R}$ that are $s$-times differentiable with bounded uniformly continuous $D^r f$, $0 < r \le s$, equipped with the norm $\|f\|_{s,\infty} = \sum_{0 \le \alpha \le s} \|D^\alpha f\|_\infty$, with the convention that $D^0 =: \mathrm{id}$ and that $C(\mathbb{R}) := C^0(\mathbb{R})$ is then the space of bounded uniformly continuous functions. For non-integer $s > 0$ and $[s]$ the integer part of $s$, set

$$C^s(\mathbb{R}) = \left\{ f \in C^{[s]}(\mathbb{R}) : \|f\|_{s,\infty} := \sum_{0 \le \alpha \le [s]} \|D^\alpha f\|_\infty + \sup_{x \ne y} \frac{|D^{[s]} f(x) - D^{[s]} f(y)|}{|x - y|^{s - [s]}} < \infty \right\}.$$

2.1. Multiresolution analysis and wavelet bases

We recall here a few well-known facts about wavelet expansions; see, for example, Sections 8 and 9 in [17].
Let $\phi \in L^2(\mathbb{R})$ be a scaling function, that is, $\phi$ is such that $\{\phi(\cdot - k) : k \in \mathbb{Z}\}$ is an orthonormal system in $L^2(\mathbb{R})$ and, moreover, the linear spaces $V_0 = \{f(x) = \sum_k c_k \phi(x - k) : \{c_k\}_{k \in \mathbb{Z}} \in \ell^2\}$, $V_1 = \{h(x) = f(2x) : f \in V_0\}$, $\dots$, $V_j = \{h(x) = f(2^j x) : f \in V_0\}$, $\dots$ are nested ($V_{j-1} \subseteq V_j$ for $j \in \mathbb{N}$) and their union is dense in $L^2(\mathbb{R})$. In the case where $\phi$ is a bounded function that decays exponentially at infinity (that is, $|\phi(x)| \le C e^{-\gamma |x|}$ for some $C, \gamma > 0$), which we assume for the rest of this subsection, the kernel of the projection onto the space $V_j$ has certain properties. First, the series

$$K(y, x) := K(\phi, y, x) = \sum_{k \in \mathbb{Z}} \phi(y - k)\, \phi(x - k) \tag{1}$$

converges pointwise and we set $K_j(y, x) := 2^j K(2^j y, 2^j x)$, $j \in \mathbb{N} \cup \{0\}$. Furthermore, we have

$$|K(y, x)| \le \Phi(|y - x|) \quad \text{and} \quad \sup_{x \in \mathbb{R}} \sum_k |\phi(x - k)| < \infty, \tag{2}$$

where $\Phi : \mathbb{R} \to \mathbb{R}^+$ is bounded and has exponential decay (cf. Lemma 8.6 in [33]). For any $j$ fixed, if $f \in L^p(\mathbb{R})$, $1 \le p \le \infty$, then the series

$$K_j(f)(y) := \int K_j(x, y) f(x) \, dx = \sum_{k \in \mathbb{Z}} 2^j \phi(2^j y - k) \int \phi(2^j x - k) f(x) \, dx, \qquad y \in \mathbb{R},$$

converges pointwise and, for $f \in L^2(\mathbb{R})$, $K_j(f)$ coincides with the orthogonal projection $\pi_j : L^2(\mathbb{R}) \to V_j$ of $f$ onto $V_j$. For $f \in L^1(\mathbb{R})$, which is the main case in this article, the convergence of the series in fact takes place in $L^p(\mathbb{R})$, $1 \le p \le \infty$. This still holds true if $f(x)\,dx$ is replaced by $d\mu(x)$, where $\mu$ is any finite signed measure.
If, now, $\phi$ is a scaling function and $\psi$ the associated mother wavelet so that $\{\phi(\cdot - k),\ 2^{l/2} \psi(2^l(\cdot) - k) : k \in \mathbb{Z},\ l \in \mathbb{N} \cup \{0\}\}$ is an orthonormal basis of $L^2(\mathbb{R})$, then any $f \in L^p(\mathbb{R})$ admits the formal expansion

$$f(y) = \sum_k \alpha_k(f)\, \phi(y - k) + \sum_{l=0}^{\infty} \sum_k \beta_{lk}(f)\, \psi_{lk}(y), \tag{3}$$

where $\psi_{lk}(y) = 2^{l/2} \psi(2^l y - k)$, $\alpha_k(f) = \int f(x) \phi(x - k) \, dx$ and $\beta_{lk}(f) = \int f(x) \psi_{lk}(x) \, dx$. Since $(K_{l+1} - K_l) f = \sum_k \beta_{lk}(f) \psi_{lk}$, the partial sums of the series (3) are in fact given by

$$K_j(f)(y) = \sum_k \alpha_k(f)\, \phi(y - k) + \sum_{l=0}^{j-1} \sum_k \beta_{lk}(f)\, \psi_{lk}(y) \tag{4}$$

and if $\phi, \psi$ are bounded and have exponential decay, then convergence of the series (4) holds pointwise; it also holds in $L^p(\mathbb{R})$, $1 \le p \le \infty$, if $f \in L^1(\mathbb{R})$ or if $f$ is replaced by a finite signed measure. Now, using these facts, one can furthermore show that the wavelet series (3) converges in $L^p(\mathbb{R})$, $p < \infty$, for $f \in L^p(\mathbb{R})$ and we also note that if $p_0$ is a uniformly continuous density, then its wavelet series converges uniformly.

2.2. Density estimation using wavelet and spline projection kernels

Let $X_1, \dots, X_n$ be i.i.d. random variables with common law $P$ and density $p_0$ on $\mathbb{R}$, and denote by $P_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ the associated empirical measure. A natural first step is to estimate the projection $K_j(p_0)$ of $p_0$ onto $V_j$ by

$$p_n(y) := p_n(y, j) = \frac{1}{n} \sum_{i=1}^n K_j(y, X_i) = \sum_k \hat\alpha_k\, \phi(y - k) + \sum_{l=0}^{j-1} \sum_k \hat\beta_{lk}\, \psi_{lk}(y), \qquad y \in \mathbb{R}, \tag{5}$$

where $K$ is as in (1), $j \in \mathbb{N}$, and where $\hat\alpha_k = \int \phi(x - k) \, dP_n(x)$, $\hat\beta_{lk} = \int \psi_{lk}(x) \, dP_n(x)$ are the empirical wavelet coefficients. We note that for $\phi$, $\psi$ compactly supported (for example, Daubechies wavelets), there are only finitely many $k$'s for which these coefficients are non-zero.
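As an illustration of the linear estimator (5), the following minimal Python sketch evaluates $p_n(y, j)$ in the simplest case, the Haar scaling function $\phi = 1_{[0,1)}$, for which $K(y, x) = 1$ exactly when $y$ and $x$ share the same unit interval $[k, k+1)$. The function name `haar_projection_estimator` and the use of NumPy are assumptions of this sketch and are not part of the article, which does not address implementation.

```python
import numpy as np


def haar_projection_estimator(sample, j, y):
    """Evaluate p_n(y, j) = (1/n) * sum_i K_j(y, X_i) for the Haar basis.

    For phi = 1_[0,1), K_j(y, x) = 2^j * 1{floor(2^j y) == floor(2^j x)},
    so the estimator is a histogram with dyadic bins of width 2^-j.
    (Hypothetical helper, written only to illustrate formula (5).)
    """
    sample = np.asarray(sample, dtype=float)
    y = np.atleast_1d(np.asarray(y, dtype=float))
    bins_x = np.floor(2.0**j * sample)   # dyadic bin index of each X_i
    bins_y = np.floor(2.0**j * y)        # dyadic bin index of each y
    # counts[m] = number of sample points falling in the bin of y[m]
    counts = (bins_y[:, None] == bins_x[None, :]).sum(axis=1)
    return 2.0**j * counts / sample.size


values = haar_projection_estimator([0.1, 0.2, 0.6, 0.9], j=1, y=[0.25, 0.75, 1.5])
print(values)
```

With the four sample points above and $j = 1$, two points fall in each of the bins $[0, 1/2)$ and $[1/2, 1)$, so the estimator equals $2 \cdot 2/4 = 1$ on $[0, 1)$ and $0$ elsewhere.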
This estimator was first studied by Kerkyacharian and Picard [20] for compactly supported wavelets.

If the wavelets $\phi$ and $\psi$ do not have compact support, it may be impossible to compute the estimator exactly since the sums over $k$ consist of infinitely many summands. However, in the special case of the Battle–Lemarié family $\phi_r$, $r \ge 1$ (see, for example, Section 6.1 in [17]), which is a class of non-compactly supported but exponentially decaying wavelets, the estimator has a simple form in terms of splines: the associated spaces $V_{j,r} = \{\sum_k c_k 2^{j/2} \phi_r(2^j(\cdot) - k) : \sum_k c_k^2 < \infty\}$ are, in fact, equal to the Schoenberg spaces generated by the Riesz basis of $B$-splines of order $r$ so that the sum in (5) can be computed by

$$p_n(y, j) := \frac{1}{n} \sum_{i=1}^n \kappa_j(y, X_i) = \frac{2^j}{n} \sum_{i=1}^n \sum_k \sum_l b_{kl}\, N_{j,k,r}(X_i)\, N_{j,l,r}(y), \qquad y \in \mathbb{R}, \tag{6}$$

where the $N_{j,k,r}$ are (suitably translated and dilated) $B$-splines of order $r$, the kernel $\kappa$ is as in (29) below and the $b_{kl}$'s are the entries of the inverse of the matrix defined in (28) below. An exact derivation of this spline projection, its wavelet representation and detailed definitions are given in Section 3.2. It turns out that for every sample point $X_i$ and for every $y$, each of the last two sums extends over only $r$ terms. We should note that this 'spline projection' estimator was first studied (outside the wavelet setting) by Huang and Studden [19], who derived pointwise rates of convergence; see also [18], where some comparison between Daubechies and spline wavelets can be found.

In the course of proving the main theorem of this article, we will derive some basic results for the linear spline projection estimator (6), which we now state.
For classical kernel estimators, results similar to those that follow were obtained in [5, 11, 13], and for wavelet estimators based on compactly supported wavelets, this was done in [14].

Theorem 1. Suppose that $P$ has a bounded density $p_0$. Assume that $j_n \to \infty$, $n/(j_n 2^{j_n}) \to \infty$, $j_n / \log\log n \to \infty$ and $j_{2n} - j_n \le \tau$ for some $\tau$ positive. Let $p_n(y) = p_n(y, j_n)$ be the estimator from (6) for some $r \ge 1$. Then

$$\limsup_n \sqrt{\frac{n}{2^{j_n} j_n}}\, \sup_{y \in \mathbb{R}} |p_n(y) - E p_n(y)| = C \quad \text{a.s.}$$

and, for $1 \le p < \infty$,

$$\sup_n \sqrt{\frac{n}{2^{j_n} j_n}} \left( E \sup_{y \in \mathbb{R}} |p_n(y) - E p_n(y)|^p \right)^{1/p} \le C',$$

where $C$ and $C'$ depend only on $\|p_0\|_\infty$ and on $r$, $p$, $\tau$. Moreover, if $p_0 \in C^t(\mathbb{R})$, then

$$\sup_{y \in \mathbb{R}} |p_n(y) - p_0(y)| = O\left( \sqrt{\frac{2^{j_n} j_n}{n}} + 2^{-t j_n} \right)$$

a.s. and in $L^p(P)$.

For rates of convergence in probability, the conditions on $j_n$ can be weakened (see Proposition 3 below). The last bound in this theorem gives, for $p_0 \in C^t(\mathbb{R})$ with $t \le r$ and $2^{j_n} \simeq (n/\log n)^{1/(2t+1)}$, that

$$\sup_{y \in \mathbb{R}} |p_n(y) - p_0(y)| = O\left( \left( \frac{\log n}{n} \right)^{t/(2t+1)} \right),$$

both a.s. and in $L^p(P)$.

For the following central limit theorem, we denote by $\rightsquigarrow_{\ell^\infty(\mathbb{R})}$ convergence in law for sample-bounded processes in the Banach space of bounded functions on $\mathbb{R}$, and by $G_P$ the usual $P$-Brownian bridge (for example, Chapter 3 in [8]). We should emphasize that the optimal bandwidth choice $2^{-j_n} \simeq n^{-1/(2t+1)}$ (if sup-norm loss is being considered, replace $n$ by $n/\log n$) is admissible for every $t > 0$ in the theorem below.

Theorem 2. Assume that the density $p_0$ of $P$ is a bounded function ($t = 0$) or that $p_0 \in C^t(\mathbb{R})$ for some $t$, $0 < t \le r$. Let $j_n$ satisfy $n/(2^{j_n} j_n) \to \infty$ and $\sqrt{n}\, 2^{-j_n(t+1)} \to 0$ as $n \to \infty$. If $F$ is the distribution function of $P$ and we set $F_n^S(s) := \int_{-\infty}^s p_n(y, j_n) \, dy$, then

$$\sqrt{n}\,(F_n^S - F) \rightsquigarrow_{\ell^\infty(\mathbb{R})} G_P.$$

Proof.
Given $\varepsilon > 0$, apply Proposition 4 below with $\lambda = \varepsilon$ so that $\|F_n^S - F_n\|_\infty = o_P(1/\sqrt{n})$ follows and use the fact that $\sqrt{n}(F_n - F)$ converges in law in $\ell^\infty(\mathbb{R})$ to $G_P$.

3. The adaptive estimation procedures

In this section, we construct data-driven choices of the resolution level $j$ and state the main adaptation results. As mentioned in the Introduction, we will use Rademacher symmetrization for this. Generate a Rademacher sequence $\varepsilon_i$, $i = 1, \dots, n$, independent of the sample (that is, $\varepsilon_i$ takes values $1, -1$ with probability $1/2$) and set, for $j < l$,

$$R(n, j) = 2 \left\| \frac{1}{n} \sum_{i=1}^n \varepsilon_i K_j(X_i, \cdot) \right\|_\infty \quad \text{and} \quad T(n, j, l) = 2 \left\| \frac{1}{n} \sum_{i=1}^n \varepsilon_i (K_j - K_l)(X_i, \cdot) \right\|_\infty, \tag{7}$$

where $K_j$ is the kernel of the wavelet projection $\pi_j$ onto $V_j$ (both for Battle–Lemarié and compactly supported wavelets). In both cases, these are suprema of fixed random functions that depend only on known quantities that can be computed in a numerically effective way. For more details on Rademacher processes, see Section 3.1.1.

To construct the estimators, we first need a grid indexing the spaces $V_j$ onto which we project $P_n$. For $r \ge 1$, $n > 1$, choose integers $j_{\min} := j_{\min,n}$ and $j_{\max} := j_{\max,n}$ such that $0 < j_{\min} < j_{\max}$,

$$2^{j_{\min}} \simeq \left( \frac{n}{\log n} \right)^{1/(2r+1)} \quad \text{and} \quad 2^{j_{\max}} \simeq \frac{n}{(\log n)^2}, \tag{8}$$

and set $\mathcal{J} := \mathcal{J}_n = [j_{\min}, j_{\max}] \cap \mathbb{N}$. Note that the number of elements in this grid is of order $\log n$.

We will consider two preliminary estimators, $\bar j_n$ and $\tilde j_n$, of the resolution level (of course, only one is needed, but we offer a choice between two, as discussed below). Let $p_n(j)$ be as in (5) or (6). First, we set

$$\bar j_n = \min\left\{ j \in \mathcal{J} : \|p_n(j) - p_n(l)\|_\infty \le T(n, j, l) + 7 \|\Phi\|_2\, \|p_n(j_{\max})\|_\infty^{1/2} \sqrt{\frac{2^l l}{n}},\ \forall l > j,\ l \in \mathcal{J} \right\}, \tag{9}$$

where the function $\Phi$ is as in (2), and we discuss an explicit way to construct $\Phi$ in Remark 2 below.
If the minimum does not exist, then we set $\bar j_n$ equal to $j_{\max}$. An alternative estimator of the resolution level is

$$\tilde j_n = \min\left\{ j \in \mathcal{J} : \|p_n(j) - p_n(l)\|_\infty \le (B(\phi) + 1)\, R(n, l) + 7 \|\Phi\|_2\, \|p_n(j_{\max})\|_\infty^{1/2} \sqrt{\frac{2^l l}{n}},\ \forall l > j,\ l \in \mathcal{J} \right\}, \tag{10}$$

where $B(\phi)$ is a bound, uniform in $j$, for the operator norm in $L^\infty(\mathbb{R})$ of the projection $\pi_j$; see Remark 3 below. Again, if the minimum does not exist, we set $\tilde j_n$ equal to $j_{\max}$.

Before we state the main result, we briefly discuss these procedures. The data-driven resolution level $\tilde j_n$ in (10) is based on tests that use Rademacher-type analogs of the usual thresholds in Lepski's method: starting with $j_{\min}$, the main contribution to $\|p_n(j) - p_n(l)\|_\infty$ is the bias $\|E p_n(j) - p_0\|_\infty$. The procedure should stop when the 'variance term' $\|p_n(l) - E p_n(l)\|_\infty$ starts to dominate. Since this is an unknown quantity and since we know no good non-random upper bound for it, we estimate it by the supremum of the associated Rademacher process, that is, by $R(n, l)$. The constant $B(\phi)$ is necessary in order to correct for the lack of monotonicity of the $R(n, l)$'s in the resolution level $l$.

The estimator $\bar j_n$ in (9) is somewhat more refined: it attempts to take advantage of the fact that in the 'small bias' domain, and using the results from Section 3.1.1,

$$\|p_n(j) - p_n(l)\|_\infty = \left\| \frac{1}{n} \sum_{i=1}^n (K_j - K_l)(X_i, \cdot) \right\|_\infty$$

should not exceed its Rademacher symmetrization

$$T(n, j, l) = 2 \left\| \frac{1}{n} \sum_{i=1}^n \varepsilon_i (K_j - K_l)(X_i, \cdot) \right\|_\infty.$$

We now state the main result, whose proof is deferred to the next section. As usual, we say that a wavelet basis is $s$-regular, $s \in \mathbb{N} \cup \{0\}$, if either the scaling function $\phi$ has $s$ weak derivatives contained in $L^p(\mathbb{R})$ for some $p \ge 1$ or if the mother wavelet $\psi$ satisfies $\int x^\alpha \psi(x) \, dx = 0$ for $\alpha = 0, \dots, s$.
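The selection rule (9) can be sketched in Python for the Haar basis, where $\Phi = \phi$ and hence $\|\Phi\|_2 = 1$ (Remark 2). This is only an illustrative sketch under stated assumptions: the names `resolution_grid`, `haar_sum` and `select_level`, the rounding used to turn the orders in (8) into integers, and the evaluation of sup-norms on midpoints of level-$j_{\max}$ dyadic bins (exact for Haar projections of level at most $j_{\max}$) are choices of this sketch, not of the article.

```python
import numpy as np


def resolution_grid(n, r=1):
    """Integer levels with 2^j_min ~ (n/log n)^(1/(2r+1)), 2^j_max ~ n/(log n)^2.

    The rounding rule is an assumption; (8) only fixes orders of magnitude.
    """
    j_min = max(1, int(round(np.log2((n / np.log(n)) ** (1.0 / (2 * r + 1))))))
    j_max = max(j_min + 1, int(np.floor(np.log2(n / np.log(n) ** 2))))
    return j_min, j_max


def haar_sum(sample, weights, j, grid):
    """(1/n) * sum_i weights[i] * K_j(y, X_i) for y in grid, Haar kernel."""
    bx = np.floor(2.0**j * sample)
    by = np.floor(2.0**j * grid)
    match = by[:, None] == bx[None, :]
    return 2.0**j * (match * weights[None, :]).sum(axis=1) / sample.size


def select_level(sample, j_min, j_max, rng, phi_norm=1.0):
    """Smallest j in [j_min, j_max] passing all tests in (9); j_max otherwise."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Midpoints of level-j_max dyadic bins covering the data range.
    lo, hi = np.floor(sample.min()), np.floor(sample.max()) + 1.0
    m = int((hi - lo) * 2**j_max)
    grid = lo + (np.arange(m) + 0.5) / 2**j_max
    eps = rng.choice([-1.0, 1.0], size=n)            # Rademacher signs
    levels = range(j_min, j_max + 1)
    p = {j: haar_sum(sample, np.ones(n), j, grid) for j in levels}
    rad = {j: haar_sum(sample, eps, j, grid) for j in levels}
    sup_pmax = np.abs(p[j_max]).max()                # ||p_n(j_max)||_inf
    for j in levels:
        passed = True
        for l in range(j + 1, j_max + 1):
            lhs = np.abs(p[j] - p[l]).max()          # ||p_n(j) - p_n(l)||_inf
            t_njl = 2.0 * np.abs(rad[j] - rad[l]).max()   # T(n, j, l)
            thr = t_njl + 7.0 * phi_norm * np.sqrt(sup_pmax) * np.sqrt(2.0**l * l / n)
            if lhs > thr:
                passed = False
                break
        if passed:
            return j
    return j_max


rng = np.random.default_rng(0)
x = rng.normal(size=400)
j_min, j_max = resolution_grid(x.size)
print(j_min, j_max, select_level(x, j_min, j_max, rng))
```

Replacing `t_njl` by `(B + 1) * 2.0 * np.abs(rad[l]).max()` for a bound `B` on the projection norm would give the analogous sketch of rule (10).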
Note that any compactly supported element of $C^s(\mathbb{R})$, $0 < s \le 1$, is of bounded $(1/s)$-variation so that the $p$-variation condition in the following theorem is satisfied, for example, for all Daubechies wavelets. The estimators below achieve the optimal rate of convergence for estimating $p_0$ in sup-norm loss in the minimax sense (over Hölder balls); see, for example, [24] for optimality of these rates.

Theorem 3. Let $X_1, \dots, X_n$ be i.i.d. on $\mathbb{R}$ with common law $P$ that possesses a uniformly continuous density $p_0$. Let $p_n(j) := p_n(y, j)$ be as in (5), where $\phi$ is either compactly supported, of bounded $p$-variation ($p \ge 1$) and $(r-1)$-regular or $\phi = \phi_r$ equals a Battle–Lemarié wavelet. Let the sequence $\{\hat j_n\}_{n \in \mathbb{N}}$ be either $\{\bar j_n\}_{n \in \mathbb{N}}$ or $\{\tilde j_n\}_{n \in \mathbb{N}}$ and let $F_n(\hat j_n)(t) = \int_{-\infty}^t p_n(y, \hat j_n) \, dy$. Then

$$\sqrt{n}\,(F_n(\hat j_n) - F) \rightsquigarrow_{\ell^\infty(\mathbb{R})} G_P, \tag{11}$$

the convergence being uniform over the set of all probability measures $P$ on $\mathbb{R}$ with densities $p_0$ bounded by a fixed constant, in any distance that metrizes convergence in law. Furthermore, if $\mathcal{C}$ is any precompact subset of $C(\mathbb{R})$, then

$$\sup_{p_0 \in \mathcal{C}} E \sup_{y \in \mathbb{R}} |p_n(y, \hat j_n) - p_0(y)| = o(1). \tag{12}$$

If, in addition, $p_0 \in C^t(\mathbb{R})$ for some $0 < t \le r$, then we also have

$$\sup_{p_0 : \|p_0\|_{t,\infty} \le D} E \sup_{y \in \mathbb{R}} |p_n(y, \hat j_n) - p_0(y)| = O\left( \left( \frac{\log n}{n} \right)^{t/(2t+1)} \right). \tag{13}$$

Remark 1 (Relaxing the uniform continuity assumption). The assumption of uniform continuity of the density of $F$ can be relaxed by modifying the definition of $\bar j_n$ (or $\tilde j_n$) along the lines of [13]. The idea is to constrain all candidate estimators to lie in a ball of size $o(1/\sqrt{n})$ around the empirical distribution function $F_n$ so that (11) holds automatically.
Formally, this can be done by adding the requirement

$$\sup_{t \in \mathbb{R}} \left| \int_{-\infty}^t p_n(y, j) \, dy - F_n(t) \right| \le \frac{1}{\sqrt{n} \log n}$$

to each test in (9) or (10). If this requirement does not even hold for $j_{\max}$, then it can be seen as evidence that $F$ has no density and one just uses $F_n$ as the estimator so as to obtain at least the functional CLT. If $F$ has a bounded density, then one can use the exponential bound in Proposition 4 in the proof to control rejection probabilities of these tests in the 'small bias' domain $\hat j_n > j^*$ and Theorem 3 can then still be proven for this procedure without any assumptions on $F$. See Theorem 2 in [13] for more details on this procedure and its proof.

Remark 2 (The constant $\|\Phi\|_2$). Once the wavelet $\phi$ has been chosen, $\hat j_n$ is purely data-driven since the function $\Phi$ depends only on $\phi$. For the Haar basis ($\phi = I_{[0,1)}$), we can take $\Phi = \phi$ because, in this case, $K(x, y) \le I_{[0,1)}(|x - y|)$ so that $\|\Phi\|_2 = 1$. A general way to obtain majorizing kernels $\Phi$ is described in Section 8.6 of [17]. For Battle–Lemarié wavelets, the spline representation of the projection kernel is again useful for estimating $\|\Phi\|_2$. See [19] for explicit computations.

Remark 3 (The constant $B(\phi)$). To construct $\tilde j_n$, one requires knowledge of the constant $B(\phi)$ that bounds the operator norm $\|\pi_j\|'_\infty$ of $\pi_j$, viewed as an operator on $L^\infty(\mathbb{R})$. A simple way of obtaining a bound is as follows: for any $f \in L^\infty(\mathbb{R})$, we have, by (2),

$$|\pi_j(f)(x)| = \left| \int K_j(x, y) f(y) \, dy \right| \le \|\Phi\|_1 \|f\|_\infty,$$

that is, $\|\pi_j\|'_\infty \le \|\Phi\|_1$. In combination with the previous remark, one readily obtains possible values for $B(\phi)$. For instance, for the Haar wavelet, $B(\phi) \le 1$. For spline wavelets, other methods are available.
For example, for Battle–Lemarié wavelets arising from linear $B$-splines, $\|\pi_j\|'_\infty$ is bounded by 3, and [30], page 135, conjectures the bound $2r - 1$ for general order $r$. See [6], Chapter 13.4, [30] and references therein for more information.

We also note that, as the results in Section 3.1.1 (in particular Proposition 2) show, all of our proofs go through if one replaces $R(n, j)$, $T(n, j, l)$ by their respective Rademacher expectations $E_\varepsilon R(n, j)$, $E_\varepsilon T(n, j, l)$ in the definitions of $\tilde j_n$, $\bar j_n$.

3.1. Estimating suprema of empirical processes

Talagrand's [33] exponential inequality for empirical processes (see also [25]), which is a uniform Prohorov-type inequality, is not specific about constants. Constants in its Bernstein-type version have been specified by several authors [3, 21, 27]. Let $X_i$ be the coordinates of the product probability space $(S, \mathcal{S}, P)^{\mathbb{N}}$, where $P$ is any probability measure on $(S, \mathcal{S})$ and let $\mathcal{F}$ be a countable class of measurable functions on $S$ that take values in $[-1/2, 1/2]$ or, if $\mathcal{F}$ is $P$-centered, in $[-1, 1]$. Let $\sigma \le 1/2$ and $V$ be any two numbers satisfying

$$\sigma^2 \ge \|P f^2\|_{\mathcal{F}}, \qquad V \ge n\sigma^2 + 2 E \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}}, \tag{14}$$

in which case $V$ is also an upper bound for $E \left\| \sum_{i=1}^n (f(X_i) - Pf)^2 \right\|_{\mathcal{F}}$ [21]. Then, noting that $\sup_{f \in \mathcal{F} \cup (-\mathcal{F})} \sum_{i=1}^n f(X_i) = \sup_{\mathcal{F}} \left| \sum_{i=1}^n f(X_i) \right|$, Bousquet's [3] version of Talagrand's inequality is as follows: for every $t > 0$,

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \ge E \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} + t \right\} \le \exp\left( -\frac{t^2}{2V + (2/3)t} \right). \tag{15}$$

In the other direction, the Klein and Rio [21] result is that for every $t > 0$,

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \le E \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} - t \right\} \le \exp\left( -\frac{t^2}{2V + 2t} \right). \tag{16}$$

These inequalities can be applied in conjunction with an estimate of the expected value obtained via empirical process methods. Here, we describe one such result for VC-type classes, that is, for $\mathcal{F}$ satisfying the uniform metric entropy condition

$$\sup_Q N(\mathcal{F}, L^2(Q), \tau) \le \left( \frac{A}{\tau} \right)^v, \qquad 0 < \tau \le 1 \quad (A \ge e,\ v \ge 2), \tag{17}$$

with the supremum extending over all Borel probability measures on $(S, \mathcal{S})$. We denote here by $N(\mathcal{G}, L^2(Q), \tau)$ the usual covering numbers of a class $\mathcal{G}$ of functions by balls of radius less than or equal to $\tau$ in $L^2(Q)$-distance. One then has, for every $n$,

$$E \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \le 2 \left[ 15 \sqrt{2 v n \sigma^2 \log \frac{5A}{\sigma}} + 1350\, v \log \frac{5A}{\sigma} \right]; \tag{18}$$

see Proposition 3 in [13] with a change obtained by using $V$ as in (14) instead of an earlier bound due to Talagrand for $E \left\| \sum_{i=1}^n (f(X_i) - Pf)^2 \right\|_{\mathcal{F}}$. Inequalities of this type also have some historical precedents ([9, 10, 12, 32] among others). The constants on the right-hand side of (18) may be far from the best possible, but we prefer them over unspecified 'universal' constants.

As is the case of Bernstein's inequality in $\mathbb{R}$, Talagrand's inequality is especially useful in the Gaussian tail range and, combining (15) and (18), one can obtain such a 'Gaussian tail' bound for the supremum of the empirical process that depends only on $\sigma$ (similar to a bound in [10]).

Proposition 1. Let $\mathcal{F}$ be a countable class of measurable functions that satisfies (17) and is uniformly bounded (in absolute value) by $1/2$. Assume, further, that for some $\lambda > 0$,

$$n\sigma^2 \ge \frac{\lambda^2 v}{2} \log \frac{5A}{\sigma}. \tag{19}$$

Set $c_1(\lambda) = 2[15 + 1350 \lambda^{-1}]$ and let $c_2(\lambda) \ge 1 + 120 \lambda^{-1} + 10{,}800 \lambda^{-2}$. Then, if

$$c_1(\lambda) \sqrt{2 v n \sigma^2 \log \frac{5A}{\sigma}} \le t \le \frac{3}{2} c_2(\lambda) n \sigma^2, \tag{20}$$

we have

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \ge 2t \right\} \le \exp\left( -\frac{t^2}{3 c_2(\lambda) n \sigma^2} \right). \tag{21}$$

Proof.
In the light of (19), inequality (18) gives

$$E \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \le c_1(\lambda) \sqrt{2 v n \sigma^2 \log \frac{5A}{\sigma}}$$

and (14) implies that we can take $V = c_2(\lambda) n \sigma^2$. The result now follows from (15), taking into account that in the range of $t$'s, $E \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \le t \le 3V/2$, (15) becomes

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \ge 2t \right\} \le \exp\left( -\frac{t^2}{3V} \right).$$

The constants here may be too large for some applications, but they are not so in situations where $\lambda$ can be taken very large, in particular, in asymptotic considerations. (Then $c_1(\lambda) \to 30$ and $c_2(\lambda) \to 1$ as $\lambda \to \infty$.)

3.1.1. Estimating the size of empirical processes by Rademacher averages

The constants one could obtain from Proposition 1 are not satisfactory for the applications to adaptive estimation which we have in mind. We now propose a remedy for this problem, inspired by a nice idea of Koltchinskii [22] and Bartlett, Boucheron and Lugosi [2] which they used in other contexts, namely in risk minimization and model selection. This consists of replacing the expectation of the supremum of an empirical process by the supremum of the associated Rademacher process. An inequality of this type (see [23], page 2602) is

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \ge 2 \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} + 3t \right\} \le \exp\left( -\frac{2t^2}{3n} \right), \tag{22}$$

where $\varepsilon_i$, $i \in \mathbb{N}$, are i.i.d. Rademacher random variables, independent of the $X_i$'s, all defined as coordinates on a large product probability space. Note that this bound does not take the variance $V$ in (15) into account, but in the applications to density estimation that we have in mind, $V$ is much smaller than $n$ (it is of order $n 2^{-j_n}$, $j_n \to \infty$). We need a similar inequality, with the quantity $n$ in the bound replaced by $V$, valid over a large enough range of $t$'s.
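To see numerically why replacing $n$ by a variance proxy matters, the following small Python sketch compares the right-hand side of (22) with the Bernstein-type right-hand side $2\exp(-t^2/(2V' + 2t))$ established in Proposition 2. The values $n = 2^{20}$, $V' = n 2^{-10}$ and $t = 5\sqrt{V'}$ are purely illustrative assumptions, chosen to mimic a variance of order $n 2^{-j}$ with $j = 10$; the function names are hypothetical.

```python
import math


def symmetrization_tail(t, n):
    """Right-hand side of (22): exp(-2 t^2 / (3 n))."""
    return math.exp(-2.0 * t**2 / (3.0 * n))


def bernstein_type_tail(t, v_prime):
    """Bernstein-type right-hand side: 2 exp(-t^2 / (2 V' + 2 t))."""
    return 2.0 * math.exp(-t**2 / (2.0 * v_prime + 2.0 * t))


n = 2**20
v_prime = n * 2.0**-10          # illustrative proxy of order n 2^{-j}, j = 10
t = 5.0 * math.sqrt(v_prime)    # deviation of five "standard deviations"

print("bound (22):          ", symmetrization_tail(t, n))
print("Bernstein-type bound:", bernstein_type_tail(t, v_prime))
```

For these values the bound (22) stays above 0.98, so it is uninformative at this scale of $t$, whereas the Bernstein-type bound is below $10^{-4}$.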
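It will be convenient to use the following well-known symmetrization inequality (see, for example, [8], page 343):

$$\frac{1}{2} E \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} - \frac{\sqrt{n}}{2} \|Pf\|_{\mathcal{F}} \le E \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \le 2 E \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}}. \tag{23}$$

The following exponential bound is the Bernstein-type analog of (22). Denote by $E_\varepsilon$ expectation with respect to the Rademacher variables only.

Proposition 2. Let $\mathcal{F}$ be a countable class of measurable functions, uniformly bounded (in absolute value) by $1/2$. Then, for every $t > 0$,

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - E f(X)) \right\|_{\mathcal{F}} \ge 2 \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} + 3t \right\} \le 2 \exp\left( -\frac{t^2}{2V' + 2t} \right), \tag{24}$$

as well as

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - E f(X)) \right\|_{\mathcal{F}} \ge 2 E_\varepsilon \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} + 3t \right\} \le 2 \exp\left( -\frac{t^2}{2V' + 2t} \right), \tag{25}$$

where $V' = n\sigma^2 + 4 E \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}}$.

Proof. We have

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \ge 2 \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} + 3t \right\} \le \Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \ge 2 E \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} + t \right\} + \Pr\left\{ \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} \le E \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} - t \right\}.$$

For the first term, combining (23) with (15) gives

$$\Pr\left\{ \left\| \sum_{i=1}^n (f(X_i) - Pf) \right\|_{\mathcal{F}} \ge 2 E \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} + t \right\} \le \exp\left( -\frac{t^2}{2V' + (2/3)t} \right).$$

For the second term, note that (16) applies to the randomized sums $\sum_{i=1}^n \varepsilon_i f(X_i)$ as well, by just taking the class of functions $\mathcal{G} = \{g(\tau, x) = \tau f(x) : f \in \mathcal{F}\}$, $\tau \in \{-1, 1\}$, instead of $\mathcal{F}$ and the probability measure $\bar P = 2^{-1}(\delta_{-1} + \delta_1) \times P$ instead of $P$. Hence,

$$\Pr\left\{ \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} \le E \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}} - t \right\} \le \exp\left( -\frac{t^2}{2V' + 2t} \right) \tag{26}$$

since $V' \ge n\sigma^2 + 2 E \left\| \sum_{i=1}^n \varepsilon_i f(X_i) \right\|_{\mathcal{F}}$. Combining the bounds completes the proof of (24).

It remains to prove (25). Let $\mathcal{G}$, $\bar P$ be as above, let $Y_i = (\varepsilon_i, X_i)$ and note that $\bar P$ is the law of $Y_i$.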
By convexity,
$$E e^{-tE_\varepsilon\|\sum_{i=1}^n \varepsilon_i f(X_i)\|_{\mathcal F}} \le E e^{-t\|\sum_{i=1}^n \varepsilon_i f(X_i)\|_{\mathcal F}} = E e^{-t\|\sum_{i=1}^n g(Y_i)\|_{\mathcal G}}$$
for all $t$. The Klein and Rio [21] version (16) of Talagrand's inequality is, in fact, established by estimating the Laplace transform $E e^{-t\|\sum_{i=1}^n g(Y_i)\|_{\mathcal G}}$ and Theorem 1.2a in [21] implies that
$$\log E e^{-tE_\varepsilon\|\sum_{i=1}^n \varepsilon_i (f(X_i)-Pf)\|_{\mathcal F}} \le -tE\Bigl\|\sum_{i=1}^n g(Y_i)\Bigr\|_{\mathcal G} + \frac{V}{9}\bigl(e^{3t}-3t-1\bigr)$$
for $V \ge n\sigma^2 + 2E\|\sum_{i=1}^n g(Y_i)\|_{\mathcal G}$, which, by their proof of the implication (a) $\Rightarrow$ (c) in that theorem, gives
$$\Pr\Bigl\{E_\varepsilon\Bigl\|\sum_{i=1}^n \varepsilon_i f(X_i)\Bigr\|_{\mathcal F} \le E\Bigl\|\sum_{i=1}^n \varepsilon_i f(X_i)\Bigr\|_{\mathcal F} - t\Bigr\} \le \exp\Bigl(-\frac{t^2}{2V'+2t}\Bigr).$$
The proof of (25) now follows as in the previous case.

For $\mathcal F$ of VC-type, the moment bound (18) is usually proved as a consequence of a bound for the Rademacher process. In fact, the proof of Proposition 3 in [13] shows that
$$E\Bigl\|\sum_{i=1}^n \varepsilon_i f(X_i)\Bigr\|_{\mathcal F} \le 15\sqrt{2vn\sigma^2\log\frac{5A}{\sigma}} + 1350\,v\log\frac{5A}{\sigma}, \tag{27}$$
where $\sigma$ is as in (14), which we use in the following corollary, together with the previous proposition. The constant $c_2(\lambda)$ in the exponent below is still potentially large, but tends to one if $\lambda \to \infty$.

Corollary 1. Let $\mathcal F$ be a countable class of measurable functions that satisfies (17) and assume it to be uniformly bounded (in absolute value) by $1/2$. Assume, further, (19) for some $\lambda > 0$. Then, for
$$0 < t \le \tfrac{1}{20}c_2(\lambda)n\sigma^2$$
with $c_2(\lambda)$ as in Proposition 1, we have
$$\Pr\Bigl\{\Bigl\|\sum_{i=1}^n (f(X_i)-Ef(X))\Bigr\|_{\mathcal F} \ge 2\Bigl\|\sum_{i=1}^n \varepsilon_i f(X_i)\Bigr\|_{\mathcal F} + 3t\Bigr\} \le 2\exp\Bigl(-\frac{t^2}{2.1\,c_2(\lambda)n\sigma^2}\Bigr)$$
and the same inequality holds if $\|\sum_{i=1}^n \varepsilon_i f(X_i)\|_{\mathcal F}$ is replaced by its $E_\varepsilon$ expectation.

Proof. By (19) and (27), we have $V' \le c_2(\lambda)n\sigma^2$, and the condition on $t$ together with (24) gives the result.

3.2. Projections onto spline spaces and their wavelet representation

In this section, we briefly review how the wavelet estimator (5) for Battle–Lemarié wavelets can be represented as a spline projection estimator (6). We shall need the spline representation in some proofs, while the wavelet representation will be useful in others.

Let $T := T_j = \{t_i(j)\}_{i=-\infty}^{\infty} = 2^{-j}\mathbb{Z}$, $j \in \mathbb{Z}$, be a bi-infinite sequence of equally spaced knots, $t_i := t_i(j)$. A function $S$ is a spline of order $r$, or of degree $m = r-1$, if, on each interval $(t_i, t_{i+1})$, it is a polynomial of degree less than or equal to $m$ (and of degree exactly $m$ on at least one interval) and, at each breakpoint $t_i$, $S$ is at least $(m-1)$-times differentiable. The Schoenberg space $\mathcal S_r(T) := \mathcal S_r(T,\mathbb{R})$ is defined as the set of all splines of order (less than or equal to) $r$ and it coincides with the space $\mathcal S_r(T,1,\mathbb{R})$ in [6], page 135. The space $\mathcal S_r(T_j)$ has a Riesz basis formed by $B$-splines $\{N_{j,k,r}\}_{k\in\mathbb{Z}}$ that we now describe; see Section 4.4 in [31] and page 138f in [6] for more details. Define
$$N_{0,r}(x) = \underbrace{1_{[0,1)} * \cdots * 1_{[0,1)}}_{r\text{-times}}(x) := \sum_{i=0}^{r} (-1)^i \binom{r}{i} \frac{(x-i)_+^{r-1}}{(r-1)!}.$$
For $r = 2$, this is the linear $B$-spline (the usual 'hat' function), for $r = 3$, it is the quadratic and for $r = 4$, it is the cubic $B$-spline. Set $N_{k,r}(x) := N_{0,r}(x-k)$. The elements of the Riesz basis are then given by
$$N_{j,k,r}(x) := N_{k,r}(2^j x) = N_{0,r}(2^j x - k).$$
By the Curry–Schoenberg theorem, any $S \in \mathcal S_r(T_j)$ can be uniquely represented as $S(x) = \sum_{k\in\mathbb{Z}} c_k N_{j,k,r}(x)$.
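The truncated-power formula for $N_{0,r}$ and the dilated basis $N_{j,k,r}$ can be checked numerically. The following is a minimal sketch in Python, assuming a finite coefficient dictionary in place of the bi-infinite sequence $(c_k)$; the names `cardinal_bspline` and `spline_value` are hypothetical and not part of the paper.

```python
from math import comb, factorial


def cardinal_bspline(x, r):
    """N_{0,r}(x) via the truncated-power formula; r = 1 is the indicator of [0, 1)."""
    if r < 1:
        raise ValueError("order r must be at least 1")
    if r == 1:
        return 1.0 if 0 <= x < 1 else 0.0
    if x <= 0 or x >= r:
        return 0.0  # N_{0,r} is supported in [0, r]
    total = 0.0
    for i in range(r + 1):
        if x > i:  # (x - i)_+ vanishes otherwise
            total += (-1) ** i * comb(r, i) * (x - i) ** (r - 1)
    return total / factorial(r - 1)


def spline_value(coeffs, j, r, x):
    """S(x) = sum_k c_k N_{j,k,r}(x) with N_{j,k,r}(x) = N_{0,r}(2^j x - k).

    coeffs maps an integer k to c_k; missing keys are treated as zero.
    """
    return sum(c * cardinal_bspline(2 ** j * x - k, r) for k, c in coeffs.items())


if __name__ == "__main__":
    print(cardinal_bspline(0.5, 2))  # value of the hat function at 0.5
    print(spline_value({k: 1.0 for k in range(-5, 5)}, 1, 3, 0.3))
```

With all coefficients equal to one on the relevant range, `spline_value` returns the partition of unity $\sum_k N_{0,r}(2^j x - k) = 1$ up to rounding, which is a quick sanity check of the formula.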
The orthogonal projection $\pi_j(f)$ of $f \in L^2(\mathbb{R})$ onto $\mathcal S_r(T_j) \cap L^2(\mathbb{R})$ is derived, for example, in [6], page 401f, where it is shown that $\pi_j(f) = 2^{j/2}\sum_{k\in\mathbb{Z}} c_k N_{j,k,r}$, with the coefficients $c_k := c_k(f)$ satisfying $(Ac)_k = 2^{j/2}\int N_{j,k,r}(x)f(x)\,dx$, the matrix $A$ being given by
$$a_{kl} = \int 2^j N_{j,k,r}(x)N_{j,l,r}(x)\,dx = \int N_{k,r}(x)N_{l,r}(x)\,dx. \tag{28}$$
The inverse $A^{-1}$ of $A$ exists (see Corollary 4.2 on page 404 in [6]) and if we denote its entries by $b_{kl}$ so that $c_k = 2^{j/2}\int \sum_l b_{kl}N_{j,l,r}(x)f(x)\,dx$, then we have
$$\pi_j(f)(y) = 2^j\int \sum_k\sum_l b_{kl}N_{j,l,r}(x)N_{j,k,r}(y)f(x)\,dx = \int \kappa_j(x,y)f(x)\,dx,$$
where $\kappa_j(x,y) = 2^j\kappa(2^jx, 2^jy)$ with
$$\kappa(x,y) = \sum_k\sum_l b_{kl}N_{l,r}(x)N_{k,r}(y) \tag{29}$$
is the spline projection kernel. Note that $\kappa$ is symmetric in its arguments.

In fact, diagonalization of the kernel $\kappa$ of the projection operator $\pi_j$ led to one of the first examples of wavelets; see, for example, page 21f and Section 2.3 in [28], Section 5.4 in [4] or Section 6.1 in [17]. There, it is shown that there exists an $(r-1)$-times differentiable scaling function $\phi_r$ with exponential decay, the Battle–Lemarié wavelet of order $r$, such that
$$\mathcal S_r(T_j) \cap L^2(\mathbb{R}) = V_{j,r} = \Bigl\{\sum_k c_k 2^{j/2}\phi_r(2^j(\cdot)-k) : \sum_k c_k^2 < \infty\Bigr\}.$$
This necessarily implies that the kernels $\kappa$ and $K = K(\phi_r)$ describe the same projections in $L^2(\mathbb{R})$ and the following simple lemma shows that these kernels are, in fact, pointwise the same.

Lemma 1. Let $\{N_{k,r}\}_{k\in\mathbb{Z}}$ be the Riesz basis of $B$-splines of order $r \ge 1$ and let $\phi_r$ be the associated Battle–Lemarié scaling function. If $K$ is as in (1) and $\kappa$ is as in (29), then, for all $x, y \in \mathbb{R}$, we have $K(x,y) = \kappa(x,y)$.

Proof. If $r = 1$, then $N_{0,1} = \phi_1$ since this is just the Haar basis. So, consider $r > 1$.
Since $\{\phi_r(\cdot - k) : k \in \mathbb{Z}\}$ is an orthonormal basis of $\mathcal S_r(\mathbb{Z}) \cap L^2(\mathbb{R})$ (see, for example, Theorem 1 on page 26 in [28]), it follows that $K$ and $\kappa$ are the kernels of the same $L^2$-projection operator and, therefore, for all $f, g \in L^2(\mathbb{R})$,
$$\int\!\!\int (K(x,y)-\kappa(x,y))f(x)g(y)\,dx\,dy = 0.$$
By density in $L^2(\mathbb{R}\times\mathbb{R})$ of linear combinations of products of elements of $L^2(\mathbb{R})$, this implies that $\kappa$ and $K$ are almost everywhere equal in $\mathbb{R}^2$. We complete the proof by showing that both functions are continuous on $\mathbb{R}^2$. For $K$, this follows from the decomposition
$$|K(x,y)-K(x',y')| \le \sum_k |\phi_r(x-k)-\phi_r(x'-k)||\phi_r(y-k)| + \sum_k |\phi_r(y-k)-\phi_r(y'-k)||\phi_r(x'-k)|,$$
the uniform continuity of $\phi_r$ ($r > 1$) and relation (2). For $\kappa$, we use the relation (31) below,
$$|\kappa(x,y)-\kappa(x',y')| \le \sum_i |N_{i,r}(x)-N_{i,r}(x')||H(y-i)| + \sum_i |H(y-i)-H(y'-i)||N_{i,r}(x')|,$$
which implies continuity of $\kappa$ on $\mathbb{R}^2$ since $N_{0,r}$ and $H$ are uniformly continuous (as $N_{0,r}$ is, and $\sum_i |g(|i|)| < \infty$) and since $N_{0,r}$ has compact support.

3.3. An exponential inequality for the uniform deviations of the linear estimator

To control the uniform deviations of the linear estimators from their means, one can use inequalities for the empirical process indexed by classes of functions $\mathcal F$ contained in
$$\mathcal K = \{2^{-j}K_j(\cdot,y) : y \in \mathbb{R},\ j \in \mathbb{N}\cup\{0\}\}, \tag{30}$$
together with suitable bounds on the 'weak' variance $\sigma$. If $\phi$ has compact support (and is of finite $p$-variation), it is proved in Lemma 2 of [14] that the class $\mathcal K$ also satisfies the bound (17). However, the proof there does not apply to Battle–Lemarié wavelets.
A different proof, using the Toeplitz and band-limited structure of the spline projection kernel, still enables us to prove that these classes of functions are of Vapnik–Chervonenkis type.

Lemma 2. Let $\mathcal K$ be as in (30), where $\phi_r$ is a Battle–Lemarié wavelet for some $r \ge 1$. There then exist finite constants $A \ge 2$ and $v \ge 2$ such that
$$\sup_Q N(\mathcal K, L^2(Q), \varepsilon) \le \Bigl(\frac{A}{\varepsilon}\Bigr)^v$$
for $0 < \varepsilon < 1$ and where the supremum extends over all Borel probability measures on $\mathbb{R}$.

Proof. In the case $r = 1$, $\phi_1$ is just the Haar wavelet, in which case the result follows from Lemma 2 of [14]. Hence, we assume that $r \ge 2$. The matrix $A$ is Toeplitz since, by a change of variables in (28), $a_{kl} = a_{k+1,l+1}$ for all $k, l \in \mathbb{Z}$, and it is band-limited because $N_{0,r}$ has compact support. It follows that $A^{-1}$ is also Toeplitz and we denote its entries by $b_{kl} = g(|k-l|)$ for some function $g$. Furthermore, it is known (for example, Theorem 4.3 on page 404 of [6]) that the entries of the inverse of any positive definite band-limited matrix satisfy $|b_{kl}| \le c\lambda^{|k-l|}$ for some $0 < \lambda < 1$ and $c$ finite. Now, following [19], we write
$$\sum_k g(|l-k|)N_{k,r}(x) = \sum_k g(|l-k|)N_{k-l,r}(x-l) = \sum_k g(|k|)N_{k,r}(x-l),$$
so that
$$2^{-j}\kappa_j(\cdot,y) = \sum_{l\in\mathbb{Z}} N_{j,l,r}(y)H(2^j(\cdot)-l), \tag{31}$$
where $H(x) = \sum_{k\in\mathbb{Z}} g(|k|)N_{k,r}(x)$ is a function of bounded variation. To see the last claim, note that $N_{0,r}$ is of bounded variation and hence $\|N_{k,r}\|_{TV} = \|N_{0,r}\|_{TV}$ (where $\|\cdot\|_{TV}$ denotes the usual total variation norm) so that $\|H\|_{TV} \le \|N_{0,r}\|_{TV} \times \sum_{k\in\mathbb{Z}} |g(|k|)| < \infty$ because $\sum_k |b_{l,l-k}| \le \sum_k c\lambda^{|k|} < \infty$. The last fact implies that
$$\mathcal H = \{H(2^j(\cdot)-l) : l \in \mathbb{Z},\ j \in \mathbb{N}\cup\{0\}\}$$
satisfies, for finite constants $B > 1$ and $w \ge 1$,
$$\sup_Q N(\mathcal H, L^2(Q), \varepsilon) \le \Bigl(\frac{B\|H\|_{TV}}{\varepsilon}\Bigr)^w \quad\text{for } 0 < \varepsilon < \|H\|_\infty,$$
as proved in [29].
Since $N_{j,0,r}$ is zero if $y$ is not contained in $[0, 2^{-j}r]$, the sum in (31), for fixed $y$ and $j$, extends over only those $l$'s such that $2^jy - r \le l < 2^jy$, hence it consists of at most $r$ terms. This implies that $\mathcal K$ is contained in the set $\mathcal H_r$ of linear combinations of at most $r$ functions from $\mathcal H$, with coefficients bounded in absolute value by $\|N_{j,l,r}\|_\infty = \|N_{0,r}\|_\infty < \infty$. Given $\varepsilon$, let $\varepsilon' = \varepsilon/(2r\max(\|H\|_\infty, \|N_{0,r}\|_\infty))$. Let $\alpha_1, \dots, \alpha_{n_1}$ be an $\varepsilon'$-dense subset of $[-\|N_{0,r}\|_\infty, \|N_{0,r}\|_\infty]$ which, for $\varepsilon' < \|N_{0,r}\|_\infty$, has cardinality $n_1 \le 3\|N_{0,r}\|_\infty/\varepsilon'$. Furthermore, let $h_1, \dots, h_{n_2}$ be a subset of $\mathcal H$ of cardinality $n_2 = N(\mathcal H, L^2(Q), \varepsilon')$ which is $\varepsilon'$-dense in $\mathcal H$ in the $L^2(Q)$-metric. It follows that for $\varepsilon' < \min(\|H\|_\infty, \|N_{0,r}\|_\infty)$, every $\sum_{l\in\mathbb{Z}} N_{j,l,r}(y)H(2^j(\cdot)-l)$ is at $L^2(Q)$-distance at most $\varepsilon$ from $\sum_{l=1}^r \alpha_{i(l)}h_{i'(l)}$ for some $1 \le i(l) \le n_1$ and $1 \le i'(l) \le n_2$. The total number of such linear combinations is dominated by $(n_1n_2)^r \le (B'/\varepsilon)^{(w+1)r}$. This shows that the lemma holds for
$$\varepsilon < 2r\min\{\|H\|_\infty, \|N_{0,r}\|_\infty\}\max\{\|H\|_\infty, \|N_{0,r}\|_\infty\} = 2r\|H\|_\infty\|N_{0,r}\|_\infty = U,$$
which completes the proof by taking $A = \max(B', U, e)$ (for $\varepsilon \in [U, A]$, one ball covers the whole set).

Proposition 3. Let $K$ be as in (1) and assume either that $\phi$ has compact support and is of bounded $p$-variation ($p < \infty$) or that $\phi$ is a Battle–Lemarié scaling function for some $r \ge 1$. Suppose that $P$ has a bounded density $p_0$. Given $C, T > 0$, there exist finite positive constants $C_1 = C_1(C, K, \|p_0\|_\infty)$ and $C_2 = C_2(C, T, K, \|p_0\|_\infty)$ such that, if
$$\frac{n}{2^jj} \ge C \quad\text{and}\quad C_1\sqrt{\frac{2^jj}{n}} \le t \le T,$$
then
$$\Pr\Bigl\{\sup_{y\in\mathbb{R}} |p_n(y,j) - Ep_n(y,j)| \ge t\Bigr\} \le \exp\Bigl(-C_2\frac{nt^2}{2^j}\Bigr). \tag{32}$$

Proof. We first prove the Battle–Lemarié wavelet case.
If $r > 1$, then the function $K$ is continuous (see the proof of Lemma 1) and therefore the supremum in (32) is over a countable set. That this is also true for $r = 1$ follows from Remark 1 in [14]. We apply Proposition 1 and Lemma 2 to the supremum of the empirical process indexed by the classes of functions
$$\mathcal K_j := \{2^{-j}K_j(\cdot,y)/(2\|\Phi\|_\infty) : y \in \mathbb{R}\},$$
where $\Phi$ is a function majorizing $K$ (as in (2)) so that $\mathcal K_j$ is uniformly bounded by $1/2$. We next bound the second moments $E(2^{-2j}K_j^2(X,y))$. We have, using (2), that
$$\int 2^{-2j}K_j^2(x,y)p_0(x)\,dx \le \int \Phi^2(|2^j(x-y)|)p_0(x)\,dx \tag{33}$$
$$\le 2^{-j}\int \Phi^2(|u|)p_0(y+2^{-j}u)\,du \le 2^{-j}\|p_0\|_\infty\|\Phi\|_2^2.$$
We may hence take $\sigma = \sqrt{2^{-j}\|\Phi\|_2^2\|p_0\|_\infty}/(2\|\Phi\|_\infty)$ and the result is then a direct consequence of Proposition 1, which applies by Lemma 2. For compactly supported wavelets, the same proof applies, using Lemma 2 (and Remark 1) in [14].

Proof of Theorem 1. Using Lemma 2, the first two claims of the theorem follow by the same proof as in [14], Theorem 1 and Remark 4. For the bias term, we argue as in Theorem 8.1 in [17], using the fact that $\phi_r$ is $(r-1)$-times differentiable, and obtain, for $p_0 \in C^t(\mathbb{R})$,
$$|Ep_n(x) - p_0(x)| \le 2^{-jt}\|p_0\|_{t,\infty}C, \tag{34}$$
where $C := C(\Phi) = \int \Phi(|u|)|u|^t\,du$.

3.4. An exponential inequality for the distribution function of the linear estimator

The quantity of interest in this subsection is the distribution function $F_n^S$ of the linear projection estimator $p_n$ from (6). More precisely, we will study the stochastic process
$$\sqrt n(F_n^S(s) - F(s)) = \sqrt n\int_{-\infty}^s (p_n(y,j) - p_0(y))\,dy, \quad s \in \mathbb{R}.$$
To prove a functional CLT for this process, it turns out that it is easier to compare $F_n^S$ to $F_n$ rather than to $F$.
With $\mathcal F = \{1_{(-\infty,s]} : s \in \mathbb{R}\}$, the decomposition
$$(F_n^S - F_n)(s) = (P_n - P)(\pi_j(f) - f) + \int (\pi_j(p_0) - p_0)f, \quad f \in \mathcal F, \tag{35}$$
will be useful, since it splits the quantity of interest into a deterministic 'bias' term and an empirical process.

Lemma 3. Assume that $p_0$ is a bounded function ($t = 0$) or that $p_0 \in C^t(\mathbb{R})$ for some $0 < t \le r$. Let $\mathcal F = \{1_{(-\infty,s]} : s \in \mathbb{R}\}$. We then have
$$\Bigl|\int_{\mathbb{R}} (\pi_j(p_0) - p_0)f\Bigr| \le C2^{-j(t+1)} \tag{36}$$
for some constant $C$ depending only on $r$ and $\|p_0\|_{t,\infty}$.

Proof. Let $\psi := \psi_r$ be the mother wavelet associated with $\phi_r$. Since the wavelet series of $p_0 \in L^1(\mathbb{R})$ converges in $L^1(\mathbb{R})$, we have $\pi_j(p_0) - p_0 = -\sum_{l=j}^\infty\sum_k \beta_{lk}(p_0)\psi_{lk}$ in the $L^1(\mathbb{R})$-sense and then, since $f = 1_{(-\infty,s]} \in L^\infty(\mathbb{R})$,
$$-\int_{\mathbb{R}} (\pi_j(p_0) - p_0)f = \int_{\mathbb{R}} \Bigl(\sum_{l=j}^\infty\sum_k \beta_{lk}(p_0)\psi_{lk}(x)\Bigr)f(x)\,dx = \sum_{l=j}^\infty\sum_k \beta_{lk}(p_0)\beta_{lk}(f).$$
The lemma now follows from an estimate for the decay of the wavelet coefficients of $p_0$ and $f$, namely, the bounds
$$\sup_{f\in\mathcal F}\sum_k |\beta_{lk}(f)| \le c2^{-l/2} \quad\text{and}\quad \sup_k |\beta_{lk}(p_0)| \le c'2^{-l(t+1/2)}. \tag{37}$$
The first bound is proved as in the proof of Lemma 3 in [14], noting that the identity before equation (37) in that proof also holds for spline wavelets by their exponential decay property. The second bound follows from
$$\sup_k |\beta_{lk}(p_0)| \le c''2^{-l/2}\|K_{l+1}(p_0) - K_l(p_0)\|_\infty \le c''2^{-l/2}\bigl(\|K_l(p_0) - p_0\|_\infty + \|K_{l+1}(p_0) - p_0\|_\infty\bigr) \le c'2^{-l/2}2^{-lt},$$
where we used (9.35) in [17] for the first inequality and (34) in the last.

To control the fluctuations of the stochastic term, one applies Talagrand's inequality to the empirical process indexed by the 'shrinking' classes of functions $\{\pi_j(f) - f : f \in \mathcal F\}$.
These classes consist of differences of elements in $\mathcal F$ and in
$$\mathcal K'_j := \Bigl\{\int_{-\infty}^t K_j(\cdot,y)\,dy : t \in \mathbb{R}\Bigr\},$$
and we have to show that for each $j$, this class satisfies the entropy condition (17). Again, for $\phi$ with compact support (and of finite $p$-variation), this result was proven in Lemma 2 of [14] and we now extend it to the Battle–Lemarié wavelets considered here.

Lemma 4. Let $\mathcal K'_j$ be as above, where $\phi_r$ is a Battle–Lemarié wavelet for $r \ge 1$. There then exist finite constants $A \ge e$ and $v \ge 2$, independent of $j$ and such that
$$\sup_Q N(\mathcal K'_j, L^2(Q), \varepsilon) \le \Bigl(\frac{A}{\varepsilon}\Bigr)^v, \quad 0 < \varepsilon < 1,$$
where the supremum extends over all Borel probability measures on $\mathbb{R}$.

Proof. In analogy to the proof of Lemma 2, one can write
$$\int_{-\infty}^t K_j(\cdot,y)\,dy = \sum_{l\in\mathbb{Z}} \Bigl(\int_{-\infty}^t 2^jN_{j,l,r}(y)\,dy\Bigr)H(2^j(\cdot)-l)$$
since the series (31) converges absolutely (in view of
$$\sum_l |H(2^jx-l)| \le \sum_k |g(|k|)|\sum_l N_{k,r}(2^jx-l) \le r\|N_{0,r}\|_\infty\sum_k |g(|k|)| < \infty).$$
Recall that $N_{j,l,r}$ is supported in the interval $[2^{-j}l, 2^{-j}(r+l)]$. Hence, if $l > 2^jt$, then the last integral is zero. For $l \le 2^jt - r$, the integral equals the constant $c = \int_{\mathbb{R}} N_{0,r}(y)\,dy$ and for $l \in [2^jt - r, 2^jt]$, the integral $c_{j,l,r}$ is bounded by $c$, so this sum, in fact, equals $c\sum_{l\le 2^jt-r} H(2^j(\cdot)-l) + \sum_{2^jt-r}$ […] $\lambda) \le L\exp\bigl(-\min(2^j\lambda^2, \sqrt n\lambda)/L\bigr)$.

Proof. Given the preceding lemmas, the proposition follows from Talagrand's inequality applied to the class $\{\pi_j(1_{(-\infty,x]}) - 1_{(-\infty,x]}\}$ in the same way as in the proof of Lemma 4 in [14], so we omit it.

3.5. Proof of Theorem 3

We can now prove the main result, Theorem 3. We will prove it only for Battle–Lemarié wavelets.
For compactly supported wavelets, the proof is exactly the same, replacing the results from steps (I) and (II) below and from Sections 3.3 and 3.4 for spline wavelets by the corresponding ones for compactly supported wavelets obtained in [14]. Also, uniformity in $p_0$ (which is proved by controlling the respective constants) is left implicit in the derivations. We start with some preliminary observations.

(I) Since, uniformly in $j \in \mathcal J$, we have $n/(2^jj) > c\log n$ for some $c > 0$ independent of $n$, we have from Theorem 1 that
$$E\|p_n(j) - Ep_n(j)\|_\infty^p \le D^p\Bigl(\frac{2^jj}{n}\Bigr)^{p/2} := D^p\sigma^p(j,n) \tag{38}$$
for every $j \in \mathcal J$, $1 \le p < \infty$ and some $0 < D < \infty$ depending only on $\|p_0\|_\infty$ and $\Phi$. For the bias, we recall from (34) that for $0 < t \le r$,
$$|Ep_n(y,j) - p_0(y)| \le 2^{-jt}\|p_0\|_{t,\infty}C(\Phi) := B(j,p_0). \tag{39}$$
If the density $p_0$ is only uniformly continuous, then one still has from (2) and integrability of $\Phi$ that, uniformly in $y \in \mathbb{R}$,
$$|Ep_n(y,j) - p_0(y)| \le \int |\Phi(|u|)||p_0(y - 2^{-j}u) - p_0(y)|\,du := B(j,p_0) = o(1). \tag{40}$$

(II) Define $\tilde M := \tilde M_n = C\|p_n(j_{\max})\|_\infty$ and set $C = 49\|\Phi\|_2^2$. Also, define $M = C\|p_0\|_\infty$ for the same $C$. We need to control the probability that $\tilde M > 1.01M$ or $\tilde M < 0.99M$ if $p_0$ is uniformly continuous. For some $0 < L < \infty$ and $n$ large enough, we have
$$\Pr(|\tilde M - M| > 0.01C\|p_0\|_\infty) = \Pr(|\|p_n(j_{\max})\|_\infty - \|p_0\|_\infty| > 0.01\|p_0\|_\infty)$$
$$\le \Pr(\|p_n(j_{\max}) - p_0\|_\infty > 0.01\|p_0\|_\infty)$$
$$\le \Pr(\|p_n(j_{\max}) - Ep_n(j_{\max})\|_\infty > 0.01\|p_0\|_\infty - B(j_{\max},p_0))$$
$$\le \Pr(\|p_n(j_{\max}) - Ep_n(j_{\max})\|_\infty > 0.009\|p_0\|_\infty)$$
$$\le \exp\Bigl(-\frac{(\log n)^2}{L}\Bigr),$$
by Proposition 3 and step (I).
Furthermore, there exists a constant $L'$ such that $E\tilde M \le L'$ for every $n$, in view of
$$E\|p_n(j_{\max})\|_\infty \le E\|p_n(j_{\max}) - Ep_n(j_{\max})\|_\infty + \|Ep_n(j_{\max})\|_\infty \le c + \|\Phi\|_1\|p_0\|_\infty,$$
where we have used (2) and (38).

(III) We need some observations on the Rademacher processes used in the definition of $\hat j_n$. First, for the symmetrized empirical measure $\tilde P_n = 2n^{-1}\sum_{i=1}^n \varepsilon_i\delta_{X_i}$, we have
$$R(n,j) = \|\pi_j(\tilde P_n)\|_\infty = \|\pi_j(\pi_l(\tilde P_n))\|_\infty \le \|\pi_j\|'_\infty R(n,l) \le B(\phi)R(n,l) \tag{41}$$
for every $l > j$. Here, $\|\pi_j\|'_\infty$ is the operator norm in $L^\infty(\mathbb{R})$ of the projection $\pi_j$, which admits bounds $B(\phi)$ independent of $j$. (Clearly, $\pi_j$ acts on finite signed measures $\mu$ by duality, taking values in $L^\infty(\mathbb{R})$ since $|\pi_j(\mu)| = |\int K_j(\cdot,y)\,d\mu(y)| \le 2^j\|\Phi\|_\infty|\mu|(\mathbb{R})$.) See Remark 3 for details on how to obtain $B(\phi)$. Furthermore, for $j < l$,
$$T(n,j,l) \le R(n,j) + R(n,l) \le (1 + B(\phi))R(n,l) \tag{42}$$
and the same inequality holds for the Rademacher expectations of $T(n,j,l)$. We also record the following bound for the (full) expectation of $R(n,l)$, $l \in \mathcal J$: using inequality (27) and the variance computation (33), we have that there exists a constant $L$ depending only on $\|p_0\|_\infty$ and $\Phi$ such that, for every $l \in \mathcal J$,
$$ER(n,l) \le L\sqrt{2^ll/n}.$$

Proof of (11). Let $\mathcal F = \{1_{(-\infty,s]} : s \in \mathbb{R}\}$ and let $f \in \mathcal F$. We have
$$\sqrt n\int (p_n(\hat j_n) - p_0)f = \sqrt n\int (p_n(j_{\max}) - p_0)f + \sqrt n\int (p_n(\hat j_n) - p_n(j_{\max}))f.$$
The first term satisfies the CLT from Theorem 2 for the linear estimator with $j_n = j_{\max}$. We now show that the second term converges to zero in probability. First, observe that
$$p_n(\hat j_n)(y) - p_n(j_{\max})(y) = P_n(K_{\hat j_n}(\cdot,y) - K_{j_{\max}}(\cdot,y)) = -\sum_{l=\hat j_n}^{j_{\max}-1}\sum_k \hat\beta_{lk}\psi_{lk}(y),$$
with convergence in $L^1(\mathbb{R})$.
Next, we have, by (9.35) in [17], for all $l \in [\hat j_n, j_{\max}-1]$ and all $k$, by the definition of $\hat j_n$, that for some $0 < D' < \infty$,
$$(1/D')2^{l/2}|\hat\beta_{lk}| \le \sup_{y\in\mathbb{R}} |P_n(K_{l+1}(\cdot,y)) - P_n(K_l(\cdot,y))| = \|p_n(l+1) - p_n(l)\|_\infty$$
$$\le \|p_n(l+1) - p_n(\hat j_n)\|_\infty + \|p_n(l) - p_n(\hat j_n)\|_\infty$$
$$\le (1 + B(\phi))(R(n,l+1) + R(n,l)) + 3\sqrt{\tilde M2^ll/n},$$
in the case $\hat j_n = \bar j_n$, also using the inequality $T(n,\bar j_n,l) \le (1 + B(\phi))R(n,l)$ for $l \ge \bar j_n$; see (42). Consequently, uniformly in $f \in \mathcal F$,
$$E\Bigl|\int (p_n(\hat j_n) - p_n(j_{\max}))f\Bigr| = E\Bigl|\sum_{l=\hat j_n}^{j_{\max}-1}\sum_k \hat\beta_{lk}\int \psi_{lk}(y)f(y)\,dy\Bigr|$$
$$\le E\sum_{l=j_{\min}}^{j_{\max}-1} D'2^{-l/2}\Bigl((B(\phi)+1)(R(n,l+1) + R(n,l)) + 3\sqrt{\tilde M2^ll/n}\Bigr)\sum_k |\beta_{lk}(f)|$$
$$\le \frac{D''}{\sqrt n}\sum_{l=j_{\min}}^{j_{\max}-1} 2^{-l/2}\sqrt l = o\Bigl(\frac{1}{\sqrt n}\Bigr),$$
using the moment bounds in (II) and (III), $\hat j_n \ge j_{\min} \to \infty$ as $n \to \infty$ (by definition of $\mathcal J$) and the fact that $\sup_{f\in\mathcal F}\sum_k |\beta_{lk}(f)| \le c2^{-l/2}$ by (37) for some constant $c$.

Proof of (12) and (13). The proof of the case $t = 0$ follows from a simple modification of the arguments below as in Theorem 2 of [13], so we omit it. (In this case, one defines $j^*$ as $j_{\max}$ if $t = 0$ so that only the case $\hat j_n \le j^*$ has to be considered.) For $t > 0$, define $j^* := j(p_0)$ by the balance equation
$$j^* = \min\bigl\{j \in \mathcal J : B(j,p_0) \le \sqrt{2\log 2}\,\|p_0\|_\infty^{1/2}\|\Phi\|_2\,\sigma(j,n)\bigr\}. \tag{43}$$
Using the results from (I), it is easily verified that $2^{j^*} \simeq (n/\log n)^{1/(2t+1)}$ if $p_0 \in C^t(\mathbb{R})$ for some $0 < t \le r$ and that
$$\sigma(j^*,n) = O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{t/(2t+1)}\Bigr)$$
is the rate of convergence required in (13).

We will consider the cases $\{\hat j_n \le j^*\}$ and $\{\hat j_n > j^*\}$ separately. First, if $\hat j_n$ is $\bar j_n$, then we have, by the definition of $\bar j_n$, (42), the definitions of $M$ and $j^*$, (38) and the moment bound in (III),
$$E\|p_n(\bar j_n) - p_0\|_\infty I_{\{\bar j_n\le j^*\}\cap\{\tilde M\le 1.01M\}}$$
$$\le E\bigl(\|p_n(\bar j_n) - p_n(j^*)\|_\infty + E\|p_n(j^*) - p_0\|_\infty\bigr)I_{\{\bar j_n\le j^*\}\cap\{\tilde M\le 1.01M\}} \tag{44}$$
$$\le (B(\phi)+1)ER(n,j^*) + \sqrt{1.01M}\,\sigma(j^*,n) + E\|p_n(j^*) - p_0\|_\infty$$
$$\le B'\sqrt{\frac{2^{j^*}j^*}{n}} + B''\sigma(j^*,n) = O(\sigma(j^*,n)).$$
If $\hat j_n$ is $\tilde j_n$, then one has the same bound (without even using (42)). Also, by the results in (I) and (II), we have
$$E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n\le j^*\}\cap\{\tilde M>1.01M\}}$$
$$\le \sum_{j\in\mathcal J: j\le j^*} E\bigl([\|p_n(j) - Ep_n(j)\|_\infty + B(j,p_0)]I_{\{\hat j_n=j\}}I_{\{\tilde M>1.01M\}}\bigr)$$
$$\le c\log n\,[D\sigma(j^*,n) + B(j_{\min},p_0)]\cdot\sqrt{E1_{\{\tilde M>1.01M\}}}$$
$$= o\Bigl((\log n)\sqrt{\exp\Bigl(-\frac{(\log n)^2}{L}\Bigr)}\Bigr) = o(\sigma(j^*,n)).$$
We now turn to $\{\hat j_n > j^*\}$. First,
$$E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n>j^*\}\cap\{\tilde M<0.99M\}}$$
$$\le \sum_{j\in\mathcal J: j>j^*} E\bigl([\|p_n(j) - Ep_n(j)\|_\infty + B(j,p_0)]I_{\{\hat j_n=j\}}I_{\{\tilde M<0.99M\}}\bigr)$$
$$\le c'\log n\,[D\sigma(j_{\max},n) + B(j^*,p_0)]\cdot\sqrt{EI_{\{\tilde M<0.99M\}}}$$
$$= O\Bigl(\sqrt{(\log n)\exp\Bigl(-\frac{(\log n)^2}{L}\Bigr)}\Bigr) = o(\sigma(j^*,n)),$$
again by the results in (I) and (II), and, second, for any $1 < p < \infty$, $1/p + 1/q = 1$, using (38) and the definition of $j^*$, we have
$$E\|p_n(\hat j_n) - p_0\|_\infty I_{\{\hat j_n>j^*\}\cap\{0.99M\le\tilde M\}}$$
$$\le \sum_{j\in\mathcal J: j>j^*} (E\|p_n(j) - p_0\|_\infty^p)^{1/p}\bigl(EI_{\{\hat j_n=j\}\cap\{0.99M\le\tilde M\}}\bigr)^{1/q}$$
$$\le \sum_{j\in\mathcal J: j>j^*} D'\sigma(j,n)\cdot\Pr(\{\hat j_n=j\}\cap\{0.99M\le\tilde M\})^{1/q}.$$
We show below that for $n$ large enough, some constant $c$, some $\delta > 0$ and some $q > 1$,
$$\Pr(\{\hat j_n=j\}\cap\{0.99M\le\tilde M\}) \le c2^{-j(q/2+\delta)}, \tag{45}$$
which gives the bound
$$\sum_{j\in\mathcal J: j>j^*} D''\sigma(j,n)\cdot 2^{-j/2-j\delta/q} = O\Bigl(\frac{1}{\sqrt n}\Bigr) = o(\sigma(j^*,n)),$$
completing the proof, modulo verification of (45).

To verify (45), we split the proof into two cases. Pick any $j \in \mathcal J$ such that $j > j^*$ and denote by $j^-$ the previous element in the grid (that is, $j^- = j - 1$).

Case I: $\hat j_n = \bar j_n$.
We have
$$\Pr(\{\bar j_n=j\}\cap\{0.99M\le\tilde M\}) \le \sum_{l\in\mathcal J: l\ge j}\Pr\bigl(\|p_n(j^-) - p_n(l)\|_\infty > T(n,j^-,l) + \sqrt{0.99M}\,\sigma(l,n)\bigr).$$
We first observe that
$$\|p_n(j^-) - p_n(l)\|_\infty \le \|p_n(j^-) - p_n(l) - Ep_n(j^-) + Ep_n(l)\|_\infty + B(j^-,p_0) + B(l,p_0), \tag{46}$$
where, setting $\sqrt{2\log 2}\,\|p_0\|_\infty^{1/2}\|\Phi\|_2 =: U(p_0,\Phi)$,
$$B(j^-,p_0) + B(l,p_0) \le 2B(j^*,p_0) \le 2U(p_0,\Phi)\sigma(j^*,n) \le 2U(p_0,\Phi)\sigma(l,n),$$
by definition of $j^*$ and since $l > j^- \ge j^*$. Consequently, the $l$th probability in the last sum is bounded by
$$\Pr\bigl(\|p_n(j^-) - p_n(l) - Ep_n(j^-) + Ep_n(l)\|_\infty > T(n,j^-,l) + (\sqrt{0.99M} - 2U(p_0,\Phi))\sigma(l,n)\bigr) \tag{47}$$
and we now apply Corollary 1 to this bound. Define the class of functions
$$\mathcal F := \mathcal F_{j^-,l} = \{2^{-l}(K_{j^-}(\cdot,y) - K_l(\cdot,y))/(4\|\Phi\|_\infty)\},$$
which is uniformly bounded by $1/2$ and satisfies (17) for some $A$ and $v$ independent of $l$ and $j^-$, by Lemma 2 (and a computation on covering numbers). We compute $\sigma$, using (33) and $l > j^-$:
$$(2^{-l}E(K_{j^-} - K_l)(X,y))^2 \le 2^{-2l+1}(EK_{j^-}^2(X,y) + EK_l^2(X,y)) \le 2^{-2l+1}\|\Phi\|_2^2\|p_0\|_\infty(2^{j^-} + 2^l) \le 3\cdot 2^{-l}\|\Phi\|_2^2\|p_0\|_\infty,$$
so that we can take
$$\sigma^2 = 3\cdot 2^{-l}\|\Phi\|_2^2\|p_0\|_\infty/(16\|\Phi\|_\infty^2).$$
The probability in (47) is then equal to
$$\Pr\Bigl(\frac{2^l4\|\Phi\|_\infty}{n}\Bigl\|\sum_{i=1}^n (f(X_i) - Pf)\Bigr\|_{\mathcal F} > \frac{2^l4\|\Phi\|_\infty}{n}\,2\Bigl\|\sum_{i=1}^n \varepsilon_if(X_i)\Bigr\|_{\mathcal F} + (\sqrt{0.99M} - 2U(p_0,\Phi))\sigma(l,n)\Bigr)$$
$$= \Pr\Bigl(\Bigl\|\sum_{i=1}^n (f(X_i) - Pf)\Bigr\|_{\mathcal F} > 2\Bigl\|\sum_{i=1}^n \varepsilon_if(X_i)\Bigr\|_{\mathcal F} + \frac{3n(\sqrt{0.99M} - 2U(p_0,\Phi))\sigma(l,n)}{3\cdot 2^l\cdot 4\|\Phi\|_\infty}\Bigr).$$
Since $n\sigma^2/\log(1/\sigma) \simeq n/(2^ll) \to \infty$ uniformly in $l \in \mathcal J$, there exists $\lambda_n \to \infty$ independent of $l$ such that (19) is satisfied and the choice
$$t = \frac{n(\sqrt{0.99M} - 2U(p_0,\Phi))\sigma(l,n)}{3\cdot 2^l\cdot 4\|\Phi\|_\infty}$$
is admissible in Corollary 1 for $c_2(\lambda_n) = 1 + 120\lambda_n^{-1} + 10{,}800\lambda_n^{-2}$. Hence, using Corollary 1, the last probability is bounded by
$$\le 2\exp\Bigl(-\frac{n^2(\sqrt{0.99M} - 2U(p_0,\Phi))^2(2^ll/n)\,16\|\Phi\|_\infty^2}{9\cdot 6.3\cdot c_2(\lambda_n)2^{2l}n2^{-l}\|\Phi\|_2^2\|p_0\|_\infty 16\|\Phi\|_\infty^2}\Bigr) \le 2^{-l((q/2)+\delta)} \tag{48}$$
for some $\delta > 0$ and $q > 1$, by the definition of $M$. Since $\sum_{l\in\mathcal J: l\ge j} 2^{-l((q/2)+\delta)} \le c2^{-j((q/2)+\delta)}$, we have proven (45).

Case II: $\hat j_n = \tilde j_n$. The proof reduces to the previous case since, by inequality (42), one has
$$\Pr(\{\tilde j_n=j\}\cap\{0.99M\le\tilde M\}) \le \sum_{l\in\mathcal J: l\ge j}\Pr\bigl(\|p_n(j^-) - p_n(l)\|_\infty > (B(\phi)+1)R(n,l) + \sqrt{0.99M}\,\sigma(l,n)\bigr)$$
$$\le \sum_{l\in\mathcal J: l\ge j}\Pr\bigl(\|p_n(j^-) - p_n(l)\|_\infty > T(n,j^-,l) + \sqrt{0.99M}\,\sigma(l,n)\bigr).$$

Acknowledgements

We would like to thank Patricia Reynaud-Bouret and Benedikt Pötscher for helpful comments. The idea of using Rademacher thresholds in Lepski's method arose from a conversation with Patricia Reynaud-Bouret.

References

[1] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413. MR1679028
[2] Bartlett, P., Boucheron, S. and Lugosi, G. (2002). Model selection and error estimation. Mach. Learn. 48 85–113.
[3] Bousquet, O. (2003). Concentration inequalities for sub-additive functions using the entropy method. In Stochastic Inequalities and Applications (E. Giné, C. Houdré and D. Nualart, eds.). Progr. Probab. 56 213–247. Boston: Birkhäuser. MR2073435
[4] Daubechies, I. (1992). Ten Lectures on Wavelets. CBMS-NSF Reg. Conf. Ser. in Appl. Math. 61. Philadelphia, PA: SIAM. MR1162107
[5] Deheuvels, P. (2000). Uniform limit laws for kernel density estimators on possibly unbounded intervals. In Recent Advances in Reliability Theory (N. Limnios and M. Nikulin, eds.) 477–492.
Boston: Birkhäuser. MR1783500
[6] DeVore, R.A. and Lorentz, G.G. (1993). Constructive Approximation. Berlin: Springer. MR1261635
[7] Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist. 24 508–539. MR1394974
[8] Dudley, R.M. (1999). Uniform Central Limit Theorems. Cambridge: Cambridge Univ. Press. MR1720712
[9] Einmahl, U. and Mason, D.M. (2000). An empirical process approach to the uniform consistency of kernel-type function estimators. J. Theoret. Probab. 13 1–37. MR1744994
[10] Giné, E. and Guillou, A. (2001). On consistency of kernel density estimators for randomly censored data: Rates holding uniformly over adaptive intervals. Ann. Inst. H. Poincaré Probab. Statist. 37 503–522. MR1876841
[11] Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. H. Poincaré Probab. Statist. 38 907–921. MR1955344
[12] Giné, E. and Koltchinskii, V. (2006). Concentration inequalities and asymptotic results for ratio type empirical processes. Ann. Probab. 34 1143–1216. MR2243881
[13] Giné, E. and Nickl, R. (2009). An exponential inequality for the distribution function of the kernel density estimator, with applications to adaptive estimation. Probab. Theory Related Fields 143 569–596. MR2475673
[14] Giné, E. and Nickl, R. (2009). Uniform limit theorems for wavelet density estimators. Ann. Probab. 37 1605–1646. MR2546757
[15] Goldenshluger, A. and Lepski, O. (2009). Structural adaptation via $L_p$-norm oracle inequalities. Probab. Theory Related Fields 143 41–71. MR2449122
[16] Golubev, Y., Lepski, O. and Levit, B. (2001). On adaptive estimation for the sup-norm losses. Math. Methods Statist. 10 23–37. MR1841807
[17] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998).
Wavelets, Approximation, and Statistical Applications. Lecture Notes in Statistics 129. New York: Springer. MR1618204
[18] Huang, S.-Y. (1999). Density estimation by wavelet-based reproducing kernels. Statist. Sinica 9 137–151. MR1678885
[19] Huang, S.-Y. and Studden, W.J. (1993). Density estimation using spline projection kernels. Comm. Statist. Theory Methods 22 3263–3285. MR1245544
[20] Kerkyacharian, G. and Picard, D. (1992). Density estimation in Besov spaces. Statist. Probab. Lett. 13 15–24. MR1147634
[21] Klein, T. and Rio, E. (2005). Concentration around the mean for maxima of empirical processes. Ann. Probab. 33 1060–1077. MR2135312
[22] Koltchinskii, V. (2001). Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 1902–1914. MR1842526
[23] Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 2593–2656. MR2329442
[24] Korostelev, A. and Nussbaum, M. (1999). The asymptotic minimax constant for sup-norm loss in nonparametric density estimation. Bernoulli 5 1099–1118. MR1735786
[25] Ledoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89. Providence, RI: Amer. Math. Soc. MR1849347
[26] Lepski, O.V. (1991). Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36 682–697. MR1147167
[27] Massart, P. (2000). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 863–884. MR1782276
[28] Meyer, Y. (1992). Wavelets and Operators I. Cambridge: Cambridge Univ. Press. MR1228209
[29] Nolan, D. and Pollard, D. (1987). U-processes: Rates of convergence. Ann. Statist. 15 780–799. MR0888439
[30] Shadrin, A.Y. (2001). The $L_\infty$-norm of the $L_2$-spline projector is bounded independently of the knot sequence: A proof of de Boor's conjecture.
Acta Math. 187 59–137. MR1864631
[31] Schumaker, L.L. (1993). Spline Functions: Basic Theory. Malabar: Krieger. MR1226234
[32] Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 28–76. MR1258865
[33] Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505–563. MR1419006
[34] Tsybakov, A.B. (1998). Pointwise and sup-norm sharp adaptive estimation of the functions on the Sobolev classes. Ann. Statist. 26 2420–2469. MR1700239

Received May 2008 and revised August 2009