Uniform bias study and Bahadur representation for local polynomial estimators of the conditional quantile function
This paper investigates the bias and the weak Bahadur representation of a local polynomial estimator of the conditional quantile function and its derivatives. The bias and Bahadur remainder term are studied uniformly with respect to the quantile leve…
Authors: Emmanuel Guerre, Camille Sabbah
Uniform bias study and Bahadur representation for local polynomial estimators of the conditional quantile function [1]

Emmanuel Guerre [2], Camille Sabbah [3]

September 2014

Abstract

This paper investigates the bias and the weak Bahadur representation of a local polynomial estimator of the conditional quantile function and its derivatives. The bias and Bahadur remainder term are studied uniformly with respect to the quantile level, the covariates and the smoothing parameter. The order of the local polynomial estimator can be higher than the differentiability order of the conditional quantile function. Applications of the results deal with global optimal consistency rates of the local polynomial quantile estimator, performance of random bandwidths and estimation of the conditional quantile density function. The latter yields a simple estimator of the conditional quantile function of the private values in first-price sealed-bid auctions under the independent private values paradigm and risk neutrality.

JEL Classification: Primary C14; Secondary C21.

Keywords: Bahadur representation; Conditional quantile function; Local polynomial estimation; Econometrics of Auctions.

[1] This paper was started and completed when both authors were at Laboratoire de Statistique Théorique et Appliquée, Université Pierre et Marie Curie, whose support is gratefully acknowledged. Financial support from the Department of Economics, Queen Mary University of London, is also gratefully acknowledged. The authors would like to thank the participants of the Queen Mary Econometrics Reading Group, of the Berlin Quantile Regression Workshop, of the LSE Econometrics and Statistics Workshop as well as the Associate Editor and two anonymous referees whose careful readings, suggestions and comments have helped to improve the paper. All remaining errors are our responsibility.
This version corrects an error in the proof of Lemma B.2 which was pointed out by Zhongjun Qu but does not change the results of the published version.

[2] School of Economics and Finance, Queen Mary University of London, Great Britain. e.guerre@qmul.ac.uk

[3] Laboratoire EQUIPPE, Université Lille 3, France. camille.sabbah@univ-lille3.fr

1. Introduction

The conditional quantile function is a powerful tool to represent the dependence between two variables. Let $Q(\alpha|x)$, $\alpha$ in $(0,1)$, be the conditional quantile function of a univariate dependent variable $Y$ given $X = x$, where $X$ is the $d$-dimensional covariate,
$$Q(\alpha|x) = \inf\{y : P(Y \le y \mid X = x) \ge \alpha\}.$$
Under fairly general conditions, the Lévy-Smirnov-Rosenblatt transformation ensures that there is a random variable $A$ independent of $X$ and uniform over $[0,1]$ such that
$$Y = Q(A|X). \tag{1.1}$$
In other words, knowledge of the conditional quantile function makes it possible to compute the impact on $Y$ of a shock on $X$ for any given $A$. The conditional quantile function is also central in the identification of the impact of such shocks or of more general parameters in nonseparable models in microeconometrics, see Chesher (2003), Chernozhukov and Hansen (2005), Hoderlein and Mammen (2007) and Imbens and Newey (2009) to mention just a few. See also Firpo, Fortin and Lemieux (2009) or Rothe (2010) for an unconditional point of view when evaluating distributional policy effects. Conditional quantile approaches can also be useful in industrial organization due to the important role played by increasing functions and the equivariance property of the quantile function, which states that $\Psi(Q(\alpha|x))$ is the conditional quantile function of $\Psi(Y)$ given $X$ provided $\Psi$ is an increasing transformation. See Haile, Hong and Shum (2003), Marmer and Shneyerov (2008) and below for the case of auctions.
Echenique and Komunjer (2009) show the usefulness of a conditional quantile approach when analyzing general multiple equilibria economic models.

However, inference with the quantile representation (1.1) is potentially difficult due to nonseparability. In a regression model $Y = m(X) + \varepsilon$ where $X$ and $\varepsilon$ are independent, the dependence between $Y$ and $X$ is summarized through the regression function $m(\cdot)$ and does not involve the unobserved noise $\varepsilon$. This contrasts with (1.1) where the random variable $A$ may potentially change the shape of $x \mapsto Q(A|x)$. Hence, inference in (1.1) should not focus on a particular value of the quantile level $\alpha$ but should consider instead all $\alpha$ in an interval $[\underline{\alpha}, \overline{\alpha}]$ close enough to $[0,1]$, as recommended for instance in the case of the more constrained quantile regression model analyzed in Koenker (2005). In practice, this often leads one to consider graphical representations of the estimated curves $x \mapsto \hat{Q}(\alpha|x)$ for various $\alpha$. A natural norm for evaluating these estimated graphs is the uniform norm with respect to $\alpha$ and $x$, $\sup_{\alpha,x} |\hat{Q}(\alpha|x) - Q(\alpha|x)|$.

The present paper contributes to this issue for local polynomial quantile estimators $\hat{Q}_h(\alpha|x)$, which depend upon a bandwidth $h$. We study the bias of $\hat{Q}_h(\alpha|x)$ uniformly in $\alpha$ and $x$ and derive a uniform Bahadur representation for $\hat{Q}_h(\alpha|x)$ and its derivatives which holds in probability, that is a weak Bahadur representation. In a few words, a Bahadur representation is an approximation of $\hat{Q}_h(\alpha|x) - Q(\alpha|x)$ by a bias term plus a leading stochastic term, up to a remainder term with an explicit order. In our setup, uniformity is with respect to the level $\alpha$, the bandwidth $h$, and the covariate $x$, implying that our Bahadur representation is an important step for the study of $\sup_{\alpha,x} |\hat{Q}(\alpha|x) - Q(\alpha|x)|$, see Proposition 2 below.
Various other interesting results also follow from our uniform results.

To be more specific, consider independent and identically distributed observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ with the same distribution as $(X, Y)$. Define, for $\alpha$ in $(0,1)$, the loss function
$$\ell_\alpha(q) = |q| + (2\alpha - 1)q = 2q\,(\alpha - I(q \le 0)), \quad q \text{ in } \mathbb{R}, \tag{1.2}$$
where $\mathbb{R}$ stands for the set of real numbers. It is well known that
$$Q(\alpha|x) = \arg\min_{q \in \mathbb{R}} E\left[\ell_\alpha(Y - q) \mid X = x\right] \tag{1.3}$$
is the conditional quantile of $Y$ given $X = x$. When $d = 1$, the local polynomial estimator of order $p$ of $Q(\alpha|x)$ is $\hat{Q}_h(\alpha|x) = \hat{b}_0(\alpha; h, x)$ where, for $b = (b_0, \ldots, b_p)^T$,
$$\hat{b}(\alpha; h, x) = \arg\min_{b \in \mathbb{R}^{p+1}} \sum_{i=1}^n \ell_\alpha\left(Y_i - b_0 - b_1 (X_i - x) - \cdots - \frac{b_p}{p!}(X_i - x)^p\right) K\left(\frac{X_i - x}{h}\right). \tag{1.4}$$
In the expression above, $p!$ is the factorial $p \times (p-1) \times \cdots \times 1$, $K(\cdot)$ is a kernel function and $h$ is a smoothing parameter which goes to 0 with the sample size. As detailed in Section 2 and studied throughout the paper, the local polynomial estimator $\hat{Q}_h(\alpha|x)$ has a natural extension which covers the multivariate case $d > 1$. As noted in Fan and Gijbels (1996, Chapter 5), the local polynomial estimator $\hat{Q}_h(\alpha|x)$ is a modification of the Least Squares local polynomial estimator of a regression function, the latter using the square loss function in (1.4) instead of the loss function $\ell_\alpha(\cdot)$. A Taylor expansion
$$Q(\alpha|X_i) \simeq Q(\alpha|x) + \frac{\partial Q(\alpha|x)}{\partial x}(X_i - x) + \cdots + \frac{1}{p!}\frac{\partial^p Q(\alpha|x)}{\partial x^p}(X_i - x)^p$$
suggests that $\hat{b}_1(\alpha; h, x), \ldots, \hat{b}_p(\alpha; h, x)$ estimate the partial derivatives $\partial^r Q(\alpha|x)/\partial x^r$, $r = 1, \ldots, p$, provided $Q(\alpha|x)$ is smooth enough.

Robust local polynomial estimation of a regression function and its derivatives, including quantile methods, has already been considered in many research articles.
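To make the loss (1.2) and the characterization (1.3) concrete, here is a minimal sketch in the unconditional case, with no covariate and no kernel weights. The function names and the toy sample are illustrative assumptions, not part of the paper; the search over sample points relies on the fact that a convex piecewise-linear objective of this kind attains its minimum at a data point.

```python
def check_loss(q, alpha):
    """Loss (1.2): |q| + (2*alpha - 1)*q."""
    return abs(q) + (2.0 * alpha - 1.0) * q


def quantile_by_loss(ys, alpha):
    """Minimise sum_i check_loss(y_i - q, alpha) over q.

    The objective is convex and piecewise linear with kinks at the data
    points, so it is enough to search over the sample itself.
    """
    return min(sorted(ys), key=lambda q: sum(check_loss(y - q, alpha) for y in ys))


# Hypothetical toy sample of nine observations.
ys = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5, 8.0]
median = quantile_by_loss(ys, 0.5)   # fifth order statistic
upper = quantile_by_loss(ys, 0.8)    # eighth order statistic
```

The local polynomial estimator (1.4) applies the same loss to the residuals of a local polynomial fit and weights each term by $K((X_i - x)/h)$; the sketch above corresponds to a constant fit with equal weights.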
See in particular Tsybakov (1986) for optimal pointwise consistency rates, Fan (1992) for design adaptation, and Fan and Gijbels (1996) and Loader (1999) for a general overview. The present paper is perhaps more specifically related to Truong (1989), Chaudhuri (1991), Hoderlein and Mammen (2009) and Kong, Linton and Xia (2010). Truong (1989) showed that local median estimators achieve the global optimal rates of Stone (1982) with respect to $L_m$ norms, $0 < m \le \infty$, for conditional quantile functions satisfying a Lipschitz condition. Chaudhuri (1991) obtained a strong (that is, holding in an almost sure sense) Bahadur representation for the local polynomial quantile estimators when the kernel function $K(\cdot)$ of (1.4) is uniform. Hong (2003) extended this result to local polynomial robust M-estimation and more general kernels. The Bahadur representation of Chaudhuri (1991) is pointwise, that is, it holds for some prescribed $x$ and $\alpha$ and a given deterministic bandwidth $h \to 0$. As explained and illustrated in Kong et al. (2010), pointwise Bahadur representations are not sufficient for many applications including plug-in estimation of conditional quantile functionals or marginal integration estimators. Hence Kong et al. (2010) derive a strong uniform Bahadur representation for robust local polynomial M-estimators for dependent observations. Here uniformity is with respect to the location variable $x$. For local polynomial quantile estimators of order $p = 1$, Hoderlein and Mammen (2009) consider uniformity with respect to $\alpha$ and $x$, but they only show that their remainder term is negligible in probability and do not obtain its order.

In this work, we study the bias term and obtain the order in probability of the Bahadur remainder term uniformly in $\alpha$, $h$ and $x$ for local polynomial quantile estimators.
A first contribution given in Theorem 1 below deals with the study of the bias of local polynomial quantile estimators. Most of the literature has focused on the case where the order $p$ of the local polynomial is equal to the order of differentiability of $x \mapsto Q(\alpha|x)$, say $s$. This is somewhat unrealistic since it amounts to assuming that $s$ is known. Since the case where $p \le s$ can be easily dealt with by ignoring higher order derivatives, we focus on the more interesting case where $p > s$, which has apparently not been considered in the statistical and econometric literature. As shown in Corollary 1, a local polynomial quantile estimator with $p > s$ still estimates $Q(\alpha|x)$ with the optimal rate $n^{-s/(2s+d)}$ of Stone (1982). This suggests that local polynomial estimators using a high order $p$ should be preferred, since they estimate in an optimal way a wider range of smooth conditional quantile functions. Another interesting conclusion of our bias study is that the additional local polynomial coefficients $\hat{b}_v(\alpha; h, x)$, $v = s+1, \ldots, p$, can diverge, and Proposition 1 describes a simple example where this indeed happens. Hence, in the local polynomial setup, a high value of $\hat{b}_v(\alpha; h, x)$ may also correspond to a nonsmooth quantile function, in which case a lower degree $p < v$ could have been used.

Our uniform study of the Bahadur remainder term, namely Theorem 2, is the second main contribution of the paper. A third contribution builds on the fact that Theorems 1 and 2 hold uniformly with respect to $x$ in a compact inner subset of the support of $X$.
Combining these results with a study of the stochastic part of the Bahadur representation allows us to show that the local polynomial quantile estimator achieves the global optimal rates of Stone (1982) for the $L_m$ and uniform norms provided the bandwidth goes to 0 at an appropriate rate. This result, stated in Corollary 1, is apparently new and extends Truong (1989), which is restricted to Lipschitz quantile functions, and Chaudhuri (1991), who considers pointwise optimality.

A fourth contribution uses the fact that Theorems 1 and 2 hold uniformly with respect to $h$ in an interval $[\underline{h}, \overline{h}]$. Proposition 2 shows that a random bandwidth performs as well as its deterministic equivalent counterpart with respect to convergence rates of the uniform norm $\sup_{\alpha,x} |\hat{Q}_h(\alpha|x) - Q(\alpha|x)|$. Such a result gives a solid theoretical basis to the suggestion of Li and Racine (2008) of choosing the local polynomial bandwidth $h$ via a simpler cross-validation procedure for the conditional cumulative distribution function. As mentioned earlier, uniformity with respect to $\alpha$ and $x$ is also useful for graphical representations of (1.1).

A fifth contribution also exploits uniformity with respect to the quantile order $\alpha$. Proposition 3 considers estimation of the conditional quantile density function
$$q(\alpha|x) = \frac{\partial Q(\alpha|x)}{\partial \alpha} = \frac{1}{f(Q(\alpha|x)|x)}. \tag{1.5}$$
As argued in Parzen (1979), the quantile density function $q(\alpha|x)$ or its inverse $1/q(\alpha|x)$ is a renormalization of the density function $f(y|x)$ which is well suited for statistical explanatory analysis. The function $q(\alpha|x)$ is also crucial for quantile based statistical inference. Indeed, the asymptotic variance of $\hat{Q}_h(\alpha|x)$ is proportional to
$$\frac{1}{nh}\,\frac{\alpha(1-\alpha)\, q^2(\alpha|x)}{f(x)},$$
where $f(\cdot)$ is the marginal density of $X$, see Fan and Gijbels (1996, p. 202).
Hence estimating $q(\alpha|x)$ is useful to estimate the variance of $\hat{Q}_h(\alpha|x)$. As noted in Guerre, Perrigne and Vuong (2009), the conditional quantile density function plays an important role in the identification of first-price sealed-bid auction models. Under the independent private values paradigm and risk neutrality, the conditional quantile function of the private values $Q_v(\alpha|x)$ satisfies
$$Q_v(\alpha|x) = Q_b(\alpha|x) + \frac{\alpha\, q_b(\alpha|x)}{I - 1},$$
where $Q_b(\alpha|x)$ and $q_b(\alpha|x)$ are the conditional quantile function and quantile density function of the bids. Hence estimating $Q_b(\alpha|x)$ and $q_b(\alpha|x)$ gives a straightforward estimation of the conditional quantile function of the private values $Q_v(\alpha|x)$, which is an alternative to the two-step approach of Guerre, Perrigne and Vuong (2000). See Haile et al. (2003) or Marmer and Shneyerov (2008) for a related estimation strategy.

There are however just a few references that address the estimation of $q(\alpha|x)$. For the related function $q(\alpha|x)\,\partial F(Q(\alpha|x)|x)/\partial x$, Lee and Lee (2008) use a composition approach which nonparametrically estimates $\partial F(y|x)/\partial x$, $f(y|x)$ and $Q(\alpha|x) = F^{-1}(\alpha|x)$. Haile et al. (2003) and Marmer and Shneyerov (2008) proceed similarly. Xiang (1995) proposes the estimator
$$\frac{1}{h_q} \int \hat{F}^{-1}(\alpha + h_q a \mid x)\, dK_q(a),$$
where $\hat{F}(y|x)$ is a kernel estimator of the conditional cumulative distribution function, $K_q(\cdot)$ a probability distribution and $h_q$ a smoothing parameter. As argued in Fan and Gijbels (1996), local polynomial estimators may have better design adaptation properties than kernel ones. Hence we propose to use the local polynomial $\hat{Q}_h(\alpha|x)$ instead of the kernel $\hat{F}^{-1}(\alpha|x)$.
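The auction identity $Q_v = Q_b + \alpha q_b/(I-1)$ stated above can be checked by hand in a textbook case. The sketch below is an illustration under assumptions that are not taken from the paper: no covariates, $I$ read as the number of bidders, private values uniform on $[0,1]$, and the standard symmetric equilibrium bid $b(v) = (I-1)v/I$, so that the bid quantile function is $(I-1)\alpha/I$ and the value quantile function is $\alpha$. All function names are hypothetical.

```python
def bid_quantile(alpha, n_bidders):
    """Quantile function of equilibrium bids when values are U[0, 1] (assumed)."""
    return (n_bidders - 1) / n_bidders * alpha


def bid_quantile_density(alpha, n_bidders):
    """Derivative of bid_quantile with respect to alpha (constant in this case)."""
    return (n_bidders - 1) / n_bidders


def value_quantile(alpha, n_bidders):
    """Private-value quantile recovered as Q_b + alpha * q_b / (I - 1)."""
    return (bid_quantile(alpha, n_bidders)
            + alpha * bid_quantile_density(alpha, n_bidders) / (n_bidders - 1))


# The recovered value quantile should coincide with the U[0, 1] quantile alpha.
recovered = {(a, i): value_quantile(a, i) for a in (0.1, 0.5, 0.9) for i in (2, 3, 5)}
```

In an application the two ingredients would be replaced by estimates of the bid quantile function and of its derivative in $\alpha$, which is where an estimator of the quantile density enters.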
Thanks to uniformity with respect to $\alpha$ in Theorems 1 and 2, the resulting conditional quantile density function estimator $\hat{q}(\alpha|x)$ has a simple Bahadur representation which facilitates the study of its consistency rate, see Proposition 3.

The rest of the paper is organized as follows. The next section groups our main assumptions and notations and explains in particular how to extend (1.4) to multivariate covariates. Section 3 presents our main results and Section 4 concludes the paper. The proofs of our statements are gathered in two appendices.

2. Main assumptions and notations

The definition (1.4) of $\hat{Q}_h(\alpha|x)$ assumes that the covariate $X$ is univariate. In the multivariate case, we use a multivariate kernel function $K(z) = K(z_1, \ldots, z_d)$ but we restrict to a univariate bandwidth for the sake of simplicity. The univariate polynomial expansion $b_0 + b_1(X_i - x) + \cdots + b_p (X_i - x)^p/p!$ is replaced by a multivariate counterpart as defined now. Let $\mathbb{N}$ be the set of natural integer numbers. For $v = (v_1, \ldots, v_d)$ in $\mathbb{N}^d$, let $|v| = v_1 + \cdots + v_d$ and let $P$ be the number of $v$'s with $|v| \le p$. Then a generic expression for a multivariate polynomial function of order $p$ is, for $b$ in $\mathbb{R}^P$,
$$U(z)^T b = \sum_{v;\,|v| \le p} b_v \frac{z^v}{v!}, \quad \text{where } z^v = z_1^{v_1} \times \cdots \times z_d^{v_d}, \quad U(z)^T = \left(\frac{z^v}{v!},\ |v| \le p\right),$$
and $v! = \prod_{i=1}^d v_i!$. In the expression above, the vectors $v$ of $\mathbb{N}^d$ are ordered according to the lexicographic order. The multivariate version of the local polynomial estimator (1.4) is $\hat{b}(\alpha; h, x) = \arg\min_{b \in \mathbb{R}^P} L_n(b; \alpha, h, x)$ with
$$L_n(b; \alpha, h, x) = \frac{1}{nh^d} \sum_{i=1}^n \ell_\alpha\left(Y_i - U(X_i - x)^T b\right) K\left(\frac{X_i - x}{h}\right). \tag{2.1}$$
As in the univariate case, the entry $\hat{b}_0(\alpha; h, x) = \hat{Q}_h(\alpha|x)$ of $\hat{b}(\alpha; h, x)$ is an estimator of $Q(\alpha|x)$.
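The multi-index bookkeeping above is easy to get wrong, so a small sketch may help. It enumerates the $v$ in $\mathbb{N}^d$ with $|v| \le p$ in the lexicographic order of tuples and builds the vector $U(z)$ with entries $z^v/v!$. The function names are illustrative assumptions; the count $P = \binom{p+d}{d}$ is the usual number of monomials of degree at most $p$ in $d$ variables.

```python
from itertools import product
from math import comb, factorial, prod


def multi_indices(d, p):
    """All v in N^d with |v| <= p, in lexicographic order of tuples."""
    return [v for v in product(range(p + 1), repeat=d) if sum(v) <= p]


def U(z, p):
    """Vector (z^v / v!, |v| <= p) for a point z in R^d."""
    return [
        prod(zi ** vi for zi, vi in zip(z, v)) / prod(factorial(vi) for vi in v)
        for v in multi_indices(len(z), p)
    ]


indices = multi_indices(2, 2)   # (0,0), (0,1), (0,2), (1,0), (1,1), (2,0)
P = len(indices)                # equals comb(2 + 2, 2)
u = U((2.0, 3.0), 2)
```

With this ordering the first entry of $U(z)$ is always 1, so the first coefficient of $b$ is the intercept, which is the entry used to define $\hat{Q}_h(\alpha|x)$.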
The entry $\hat{b}_v(\alpha; h, x)$ can be viewed as an estimator of the partial derivative
$$b_v(\alpha|x) = \frac{\partial^{|v|} Q(\alpha|x)}{\partial x_1^{v_1} \times \cdots \times \partial x_d^{v_d}}$$
provided this partial derivative exists.

We shall consider later on the following Hölder class. Consider a subset $[\underline{\alpha}, \overline{\alpha}]$ of $(0,1)$ over which $Q(\alpha|x)$ or its partial derivatives will be estimated. Let $\lfloor s \rfloor$ be the lowest integer part of $s$, i.e. $\lfloor s \rfloor$ is the unique integer number with $\lfloor s \rfloor < s \le \lfloor s \rfloor + 1$. Then $Q(\cdot|\cdot)$ is in $C(L, s)$, $L, s > 0$, if

(i) for all $\alpha$ in $[\underline{\alpha}, \overline{\alpha}]$, $x \mapsto Q(\alpha|x)$ is $\lfloor s \rfloor$ times continuously differentiable over the support $\mathcal{X}$ of $X$;

(ii) for all $v$ in $\mathbb{N}^d$ with $|v| = \lfloor s \rfloor$, all $\alpha$ in $[\underline{\alpha}, \overline{\alpha}]$, all $x$, $x'$ in $\mathcal{X}$,
$$\left| b_v(\alpha|x) - b_v(\alpha|x') \right| \le L \left\| x - x' \right\|^{s - \lfloor s \rfloor},$$
where $\|\cdot\|$ stands for the Euclidean norm.

Since the estimators $\hat{b}_v(\alpha; h, x)$ of the partial derivatives $b_v(\alpha|x)$ converge with different rates, we use the diagonal standardization matrix
$$H = H(h) = \mathrm{Diag}\left(h^{|v|},\ v \in \mathbb{N}^d,\ |v| \le p\right).$$
It is well known that local polynomial estimation techniques apply at the boundaries. However we will focus on those $x$ which are in an inner subset $\mathcal{X}_0$ of the support $\mathcal{X}$ of $X$ to avoid technicalities. Our main assumptions are as follows. Let $B(0,1)$ be the closed unit ball $\{z \in \mathbb{R}^d : \|z\| \le 1\}$.

Assumption X. The distribution of $X$ has a probability density function $f(\cdot)$ with respect to the Lebesgue measure, which is strictly positive and continuously differentiable over the compact support $\mathcal{X}$ of $X$. The set $\mathcal{X}_0$ is a compact subset of the interior of $\mathcal{X}$.

Assumption F. The cumulative distribution function $F(\cdot|\cdot)$ of $Y$ given $X$ has a continuous probability density function $f(y|x)$ with respect to the Lebesgue measure, which is strictly positive for $y$ in $\mathbb{R}$ and $x$ in $\mathcal{X}$. The partial derivative $\partial F(y|x)/\partial x$ is continuous over $\mathbb{R} \times \mathcal{X}$.
There is an $L_0 > 0$ such that $\left| f(y|x) - f(y'|x') \right| \le L_0 \left\| (x, y) - (x', y') \right\|$ for all $(x, y)$, $(x', y')$ of $\mathcal{X} \times \mathbb{R}$.

Assumption K. The nonnegative kernel function $K(\cdot)$ is Lipschitz over $\mathbb{R}^d$, has a compact support $\mathcal{K}$ and satisfies $\int K(z)\,dz = 1$. For some $\underline{K} > 0$, $K(z) \ge \underline{K}\, I(z \in B(0,1))$. The bandwidth is in $[\underline{h}_n, \overline{h}_n]$ with $0 < \underline{h}_n \le \overline{h}_n < \infty$, $\lim_{n\to\infty} \overline{h}_n = 0$ and $\lim_{n\to\infty} (\log n)/(n \underline{h}_n^d) = 0$.

Assumption X is standard. Assumption F ensures uniqueness of the conditional quantile $Q(\alpha|x) = F^{-1}(\alpha|x)$ in (1.3) and existence of the quantile density function (1.5). Assumption K allows for a wide range of smoothing parameters $h \to 0$ in $[\underline{h}_n, \overline{h}_n]$. In the univariate case $d = 1$, Hong (2003) restricts to bandwidths $h = O(n^{-1/(2p+3)})$, a condition which is not imposed here, and Chaudhuri (1991) assumes that $h$ has the exact order $n^{-1/(2p+d)}$. In the simpler context of univariate kernel regression, Einmahl and Mason (2005) assume $h^d \ge C(\log n)/n$ to obtain uniform consistency, so that Assumption K is fairly general.

3. Bias study and Bahadur representation

Applying standard parametric M-estimation theory as detailed in White (1994) or van der Vaart (1998) suggests that the local polynomial estimator $\hat{b}(\alpha; h, x)$ of (2.1) is an estimator of $b^*(\alpha; h, x)$ with
$$b^*(\alpha; h, x) = \arg\min_{b \in \mathbb{R}^P} E\left[\ell_\alpha\left(Y - U(X - x)^T b\right) K\left(\frac{X - x}{h}\right)\right]. \tag{3.1}$$
In particular, $Q^*_h(\alpha|x) = b^*_0(\alpha; h, x)$ may differ from the true conditional quantile $Q(\alpha|x)$ due to a bias term $Q^*_h(\alpha|x) - Q(\alpha|x)$. Studying this bias term can be done using the first-order condition
$$\frac{\partial}{\partial b^T} E\left[\ell_\alpha\left(Y - U(X - x)^T b^*(\alpha; h, x)\right) K\left(\frac{X - x}{h}\right)\right] = 0$$
and the Implicit Function Theorem.
This approach gives in particular the order of the difference between $b^*_v(\alpha; h, x)$ and the $v$th partial derivative $b_v(\alpha|x)$ of $Q(\alpha|x)$ provided the partial derivative exists.

Theorem 1. Assume that $Q(\cdot|\cdot)$ is in a Hölder class $C(L, s)$ with $\lfloor s \rfloor \le p$. Then under Assumptions F, K and X and provided $\overline{h}$ is small enough, there is a constant $C$ such that for all $|v| \le \lfloor s \rfloor$ and $n$ large enough,
$$\sup_{(\alpha,h,x) \in [\underline{\alpha},\overline{\alpha}] \times [\underline{h},\overline{h}] \times \mathcal{X}_0} \left| \frac{b^*_v(\alpha; h, x) - b_v(\alpha|x)}{h^{s-|v|}} \right| \le C L.$$

It follows that $Q^*_h(\alpha|x) - Q(\alpha|x) = O(h^s)$ and more generally that
$$b^*_v(\alpha; h, x) - b_v(\alpha|x) = O\left(h^{s-|v|}\right)$$
uniformly provided $|v| \le \lfloor s \rfloor$. Since $\lfloor s \rfloor \le p$, the bias order $h^{s-|v|}$ is not affected by the order $p$ of the local polynomial estimator. This bias order is better than the bias order $h^{p-|v|}$, $|v| \le p$, that would be achieved by suboptimal local polynomial estimators of lower order $p < \lfloor s \rfloor$.

The proof of Theorem 1 establishes a slightly stronger result since it also gives the order of the coefficients $b^*_v(\alpha; h, x)$ with $|v| > \lfloor s \rfloor$, which correspond to partial derivatives that may not exist. Indeed, equation (A.8) of the proof of Theorem 1 implies that
$$b^*_v(\alpha; h, x) = O\left(h^{s-|v|}\right) \quad \text{for } |v| \ge s \tag{3.2}$$
uniformly in $(\alpha, h, x) \in [\underline{\alpha}, \overline{\alpha}] \times [\underline{h}, \overline{h}] \times \mathcal{X}_0$. See also Loader (1999, Theorem 4.2), which gives a less precise $b^*_v(\alpha; h, x) = o\left(h^{-|v|}\right)$. Hence the higher order polynomial coefficients $b^*_v(\alpha; h, x)$, $|v| > s$, may diverge when $h \to 0$. That this may indeed be the case can be seen on a simple regression example. Consider
$$Y = m(X) + \varepsilon, \qquad m(x) = \begin{cases} |x|^{1/2} & \text{if } x \ge 0, \\ -|x|^{1/2} & \text{if } x < 0, \end{cases} \tag{3.3}$$
where the $U([-1,1])$ random variable $X$ and the $N(0,1)$ variable $\varepsilon$ are independent. Let $\Phi(\cdot)$ be the cumulative distribution function of the standard normal $N(0,1)$.
In this example, $Q(\alpha|x) = \Phi^{-1}(\alpha) + m(x)$ inherits the smoothness properties of the regression function $m(\cdot)$. Note that the derivative of $m(\cdot)$ at $x = 0$ is infinite. It also follows that $Q(\alpha|x)$ is at best in a Hölder class $C(L, 1/2)$ since, for $L$ large enough,
$$|m(x) - m(x')| \le L \left| x - x' \right|^{1/2} \quad \text{for all } (x, x') \in [-1,1]^2,$$
an inequality that cannot be improved by increasing the exponent $1/2$, as seen by taking $x = 0$ and $x' \to 0$. The next Proposition uses the behavior of $m(\cdot)$ at $x = 0$ to show that the rate given in (3.2) is sharp.

Proposition 1. Suppose that $(X, Y)$ satisfies (3.3). Let $b^*(\alpha; h, x) = (b^*_0(\alpha; h, x), b^*_1(\alpha; h, x))^T$ from (3.1) be given by a local polynomial procedure of order 1. Then under Assumption K and $\int z K(z)\,dz = 0$, $b^*_0(0.5; h, 0) = m(0) + O(h^{1/2})$ and $b^*_1(0.5; h, 0)$ diverges with the exact rate $h^{-1/2}$,
$$\lim_{h \to 0} h^{1/2}\, b^*_1(0.5; h, 0) = \frac{\int |z|^{3/2} K(z)\,dz}{\int z^2 K(z)\,dz} \ne 0.$$

The divergence of $b^*_1(0.5; h, 0)$ implies that the estimator $\hat{b}_1(0.5; h, 0)$ will diverge in probability. This recalls that observing a large $\hat{b}_1(0.5; h, 0)$ is not an argument for claiming that a local polynomial estimator of order $p = 1$ should be used.

We now consider the stochastic terms $\hat{Q}_h(\alpha|x) - Q^*_h(\alpha|x)$ and the rescaled $H\left(\hat{b}(\alpha; h, x) - b^*(\alpha; h, x)\right)$. Let us first introduce some additional notations. Local polynomial estimation builds on an order $p$ Taylor expansion of $Q(\alpha|x')$ with $x'$ in the vicinity of $x$. This Taylor expansion can be written as $Q(\alpha|x') \simeq U(x' - x)^T b_p(\alpha|x)$, where $b_p(\alpha|x)$ groups the partial derivatives of $Q(\alpha|x)$ with respect to $x$.
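As a numerical aside on Proposition 1, the divergence rate can be observed directly from the population first-order condition. The sketch below is an illustration under stated assumptions, not the authors' computation: it takes $\alpha = 0.5$, $x = 0$, and a triangular kernel $K(z) = (2 - |z|)_+/4$, chosen here because it is Lipschitz, symmetric, integrates to one and is bounded below on $[-1,1]$. By symmetry the intercept is $b^*_0 = 0 = m(0)$, and the slope solves $\int (\Phi(b_1 x - m(x)) - 1/2)\, x\, K(x/h)\, dx = 0$; the integrand is even in $x$, so only the half line is integrated. All function names are hypothetical.

```python
from math import erf, sqrt


def Phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))


def kernel(z):
    """Triangular kernel on [-2, 2] (an assumed choice, see lead-in)."""
    return max(0.0, 2.0 - abs(z)) / 4.0


def half_line_integral(g, upper, n_grid=2000):
    """Midpoint rule for the integral of g over (0, upper)."""
    step = upper / n_grid
    return step * sum(g((k + 0.5) * step) for k in range(n_grid))


def first_order_condition(b1, h):
    """Slope equation at alpha = 0.5, x = 0, with intercept 0 and m(x) = sqrt(x) for x > 0."""
    return half_line_integral(
        lambda x: (Phi(b1 * x - sqrt(x)) - 0.5) * x * kernel(x / h), 2.0 * h)


def slope_coefficient(h, lo=0.0, hi=1000.0, n_iter=60):
    """Bisection on b1; the first-order condition is increasing in b1."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if first_order_condition(mid, h) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def limiting_constant():
    """Ratio of kernel moments appearing in the limit of Proposition 1."""
    num = half_line_integral(lambda z: z ** 1.5 * kernel(z), 2.0)
    den = half_line_integral(lambda z: z ** 2 * kernel(z), 2.0)
    return num / den


h = 0.005
scaled_slope = sqrt(h) * slope_coefficient(h)
limit = limiting_constant()
```

For this kernel the moment ratio works out to $2^{5/2} \cdot 6/35$, and for a small bandwidth the rescaled slope `scaled_slope` should be close to `limit`, while the slope itself is of order $h^{-1/2}$.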
Consider the following counterpart of the Taylor approximation,
$$Q^*(x'; \alpha, h, x) = U(x' - x)^T b^*(\alpha; h, x). \tag{3.4}$$
Define also $S_i(\alpha; h, x) = S(X_i, Y_i; \alpha, h, x)$ and $J_i(\alpha; h, x) = J(X_i; \alpha, h, x)$ with
$$S_i(\alpha; h, x) = 2\left\{ I\left(Y_i \le Q^*(X_i; \alpha, h, x)\right) - \alpha \right\} U\left(\frac{X_i - x}{h}\right) K\left(\frac{X_i - x}{h}\right), \tag{3.5}$$
$$J_i(\alpha; h, x) = 2 f\left(Q^*(X_i; \alpha, h, x) \mid X_i\right) U\left(\frac{X_i - x}{h}\right) U\left(\frac{X_i - x}{h}\right)^T K\left(\frac{X_i - x}{h}\right). \tag{3.6}$$
Since $U(X_i - x) = H\, U\left(\frac{X_i - x}{h}\right)$ and (1.2) gives
$$\frac{\partial}{\partial b^T}\left[\ell_\alpha\left(Y_i - U(X_i - x)^T b\right) K\left(\frac{X_i - x}{h}\right)\right] = 2\left\{ I\left(Y_i \le U(X_i - x)^T b\right) - \alpha \right\} U(X_i - x)\, K\left(\frac{X_i - x}{h}\right)$$
almost everywhere, the variables $S_i(\alpha; h, x)$ satisfy
$$\frac{\partial L_n}{\partial b^T}\left(b^*(\alpha; h, x); \alpha, h, x\right) = \frac{H}{nh^d} \sum_{i=1}^n S_i(\alpha; h, x)$$
almost everywhere. Hence $\sum_{i=1}^n S_i(\alpha; h, x)$ can be viewed as a score function term whereas $\sum_{i=1}^n J_i(\alpha; h, x)$ is actually similar to a second derivative of the objective function $L_n$, although it is not twice differentiable. Indeed, it can be shown that it admits a quadratic approximation with second-order derivatives
$$H \left( \frac{1}{nh^d} \sum_{i=1}^n J_i(\alpha; h, x) \right) H.$$
Classical results of White (1994) or van der Vaart (1998) for parametric estimation suggest that a candidate approximation for $\hat{b}(\alpha; h, x) - b^*(\alpha; h, x)$ is
$$-\left( H \left( \frac{1}{nh^d} \sum_{i=1}^n J_i(\alpha; h, x) \right) H \right)^{-1} \frac{H}{nh^d} \sum_{i=1}^n S_i(\alpha; h, x) = -H^{-1} \left( \frac{1}{nh^d} \sum_{i=1}^n J_i(\alpha; h, x) \right)^{-1} \frac{1}{nh^d} \sum_{i=1}^n S_i(\alpha; h, x).$$
Hence the rescaled $\left(nh^d\right)^{1/2} H\left(\hat{b}(\alpha; h, x) - b^*(\alpha; h, x)\right)$ is expected to be close to
$$\beta_n(\alpha; h, x) = -\left( \frac{1}{nh^d} \sum_{i=1}^n J_i(\alpha; h, x) \right)^{-1} \frac{1}{(nh^d)^{1/2}} \sum_{i=1}^n S_i(\alpha; h, x). \tag{3.7}$$
The matrix $\sum_{i=1}^n J_i(\alpha; h, x)/(nh^d)$ is similar to a kernel regression estimator and obeys a Law of Large Numbers for triangular arrays, which ensures that this matrix is asymptotically close to
$$2 f\left(Q^*(x; \alpha, h, x) \mid x\right) \int U(t)\, U(t)^T K(t)\, dt.$$
Since this matrix is symmetric positive definite, the inverse in (3.7) exists with a probability tending to 1. The term $\sum_{i=1}^n S_i(\alpha; h, x)/(nh^d)^{1/2}$ has a similar kernel structure but with centered $S_i(\alpha; h, x)$, see (A.1) in Lemma A.1 of Appendix A. Hence $\sum_{i=1}^n S_i(\alpha; h, x)/(nh^d)^{1/2}$ satisfies a pointwise Central Limit Theorem, as does $\beta_n(\alpha; h, x)$. Hence $\left(nh^d\right)^{1/2} H\left(\hat{b}(\alpha; h, x) - b^*(\alpha; h, x)\right)$ should also be asymptotically Gaussian provided the so-called Bahadur error term
$$E_n(\alpha; h, x) = \left(nh^d\right)^{1/2} H\left(\hat{b}(\alpha; h, x) - b^*(\alpha; h, x)\right) - \beta_n(\alpha; h, x) \tag{3.8}$$
is asymptotically negligible pointwise. But transposing the various uniform results established in the Appendices for the leading term $\beta_n(\alpha; h, x)$ of the expansion of $\left(nh^d\right)^{1/2} H\left(\hat{b}(\alpha; h, x) - b^*(\alpha; h, x)\right)$ requires a uniform study of $E_n(\alpha; h, x)$.

Techniques to study $E_n(\alpha; h, x)$ for a fixed argument $\alpha$, $h$ and $x$ are given in Hjort and Pollard (1993). See also Fan, Heckman and Wand (1995, p. 143) or Fan and Gijbels (1996, p. 210). In our uniform setup, obtaining a uniform order for $E_n(\alpha; h, x)$ is performed using a preliminary uniform study of a stochastic process we introduce now. Define first
$$\begin{aligned} L^1_n(\beta; \alpha, h, x) &= nh^d \left\{ L_n\left( b^*(\alpha; h, x) + \frac{H^{-1}\beta}{(nh^d)^{1/2}}; \alpha, h, x \right) - L_n\left(b^*(\alpha; h, x); \alpha, h, x\right) \right\} \\ &= \sum_{i=1}^n \left\{ \ell_\alpha\left( Y_i - Q^*(X_i; \alpha, h, x) - \frac{U\left(\frac{X_i - x}{h}\right)^T \beta}{(nh^d)^{1/2}} \right) - \ell_\alpha\left(Y_i - Q^*(X_i; \alpha, h, x)\right) \right\} K\left(\frac{X_i - x}{h}\right), \end{aligned}$$
which is such that
$$\left(nh^d\right)^{1/2} H\left(\hat{b}(\alpha; h, x) - b^*(\alpha; h, x)\right) = \arg\min_{\beta} L^1_n(\beta; \alpha, h, x).$$
It then follows from (3.8) that $E_n(\alpha; h, x) = \arg\min_{\epsilon} L_n\left(\beta_n(\alpha; h, x), \epsilon; \alpha; h, x\right)$ where
$$L_n(\beta, \epsilon; \alpha; h, x) = L^1_n(\beta + \epsilon; \alpha, h, x) - L^1_n(\beta; \alpha, h, x). \tag{3.9}$$
Hence the stochastic process $L_n(\beta, \epsilon; \alpha; h, x)$ plays a central role in our analysis.
Especially useful is the decomposition
$$L_n(\beta, \epsilon; \alpha; h, x) = L^0_n(\beta, \epsilon; \alpha; h, x) + R_n(\beta, \epsilon; \alpha; h, x),$$
where $L^0_n(\beta, \epsilon; \alpha; h, x)$ is the quadratic approximation of $L_n(\beta, \epsilon; \alpha; h, x)$,
$$\begin{aligned} L^0_n(\beta, \epsilon; \alpha; h, x) &= \frac{1}{(nh^d)^{1/2}} \sum_{i=1}^n S_i(\alpha; h, x)^T (\beta + \epsilon) + \frac{1}{2} (\beta + \epsilon)^T \left( \frac{1}{nh^d} \sum_{i=1}^n J_i(\alpha; h, x) \right) (\beta + \epsilon) \\ &\quad - \left\{ \frac{1}{(nh^d)^{1/2}} \sum_{i=1}^n S_i(\alpha; h, x)^T \beta + \frac{1}{2} \beta^T \left( \frac{1}{nh^d} \sum_{i=1}^n J_i(\alpha; h, x) \right) \beta \right\} \\ &= \frac{1}{(nh^d)^{1/2}} \sum_{i=1}^n S_i(\alpha; h, x)^T \epsilon + \frac{1}{2} \epsilon^T \left( \frac{1}{nh^d} \sum_{i=1}^n J_i(\alpha; h, x) \right) (\epsilon + 2\beta), \end{aligned} \tag{3.10}$$
and $R_n$ is a remainder term. As in the expression above (3.9) for $E_n(\alpha; h, x)$, the variable $\beta$ in (3.10) will be taken equal to $\beta_n(\alpha; h, x)$ in the proof of Theorem 2 below. As noted in the quadratic approximation lemma of Fan et al. (1995, p. 148) in the pointwise case, the order of $E_n(\alpha; h, x)$ is driven by the order of $R_n$. The proof of the next Theorem relies on a uniform study of $R_n$ based on a maximal inequality under bracketing entropy conditions from Massart (2007), see the proof of Proposition A.1. This maximal inequality plays here the role of the Bernstein inequality used in the pointwise framework of Hong (2003).

Theorem 2. Under Assumptions F, K and X,
$$\sup_{(\alpha,h,x) \in [\underline{\alpha},\overline{\alpha}] \times [\underline{h},\overline{h}] \times \mathcal{X}_0} \left\| E_n(\alpha; h, x) \right\| = O_P\left( \left( \frac{\log^3(n)}{n \underline{h}^d} \right)^{1/4} \right).$$

In the case where the lower and upper bandwidths $\underline{h}$ and $\overline{h}$ have the same order, Theorem 2 gives, uniformly in $h$ in $[\underline{h}, \overline{h}]$, $\alpha$ and $x$,
$$\hat{Q}_h(\alpha|x) = Q^*_h(\alpha|x) + \frac{e_0^T \beta_n(\alpha; h, x)}{(nh^d)^{1/2}} + O_P\left( \left( \frac{\log n}{nh^d} \right)^{3/4} \right),$$
where $e_0$ is the first vector of the canonical basis of $\mathbb{R}^P$, whose first coordinate is equal to 1 and whose other coordinates are equal to 0. For $h$ of order $n^{-1/(2p+d)}$ as studied in Chaudhuri (1991, Theorem 3.2), the order of the remainder term is $n^{-3p/(2(2p+d))} \log^{3/4} n$ as found by this author.
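The arithmetic behind the two remainder orders quoted above is short enough to spell out. The following is only a worked check of the exponents, assuming $\underline{h}$ and $\overline{h}$ are of the same order $h$: dividing the bound of Theorem 2 by $(nh^d)^{1/2}$ gives the $3/4$ power, and substituting $h \asymp n^{-1/(2p+d)}$ gives the rate attributed to Chaudhuri (1991).

```latex
\begin{align*}
\frac{1}{(nh^d)^{1/2}} \left( \frac{\log^3 n}{nh^d} \right)^{1/4}
  &= \frac{\log^{3/4} n}{(nh^d)^{3/4}}
   = \left( \frac{\log n}{nh^d} \right)^{3/4}, \\
h \asymp n^{-1/(2p+d)}
  \;\Longrightarrow\;
  nh^d &\asymp n^{1 - d/(2p+d)} = n^{2p/(2p+d)}, \\
\left( \frac{\log n}{nh^d} \right)^{3/4}
  &\asymp n^{-\frac{3}{4} \cdot \frac{2p}{2p+d}} \log^{3/4} n
   = n^{-3p/(2(2p+d))} \log^{3/4} n .
\end{align*}
```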
When $d = 1$, Hong (2003) obtains the better order $(\log\log n/(nh))^{3/4}$ but his Bahadur representation only holds pointwise in $\alpha$ and $x$. It can be conjectured that the order $(\log n/(nh^d))^{3/4}$ is optimal for a Bahadur expansion holding uniformly with respect to $x$. For higher order partial derivatives, Theorem 2 yields
$$\hat{b}_v(\alpha; h, x) = b^*_v(\alpha; h, x) + \frac{e_v^T \beta_n(\alpha; h, x)}{(nh^d)^{1/2} h^{|v|}} + \frac{1}{h^{|v|}}\, O_P\left( \left( \frac{\log n}{nh^d} \right)^{3/4} \right),$$
where the $v$th entry of $e_v$ is 1 and the others are 0, see also Hong (2003) for a pointwise version of this expansion and Kong et al. (2010) for a version which is uniform with respect to $x$. Such an expansion can be used to study the pointwise asymptotic normality of the local polynomial quantile estimator.

Combining this Bahadur representation with the bias study of Theorem 1 gives a global rate result which is apparently new. The next Corollary extends the study of local medians in Truong (1989).

Corollary 1. Assume that $Q(\alpha|x)$ is in $C(L, s)$ for some $\lfloor s \rfloor \le p$. Suppose that Assumptions F, K and X hold. Then for all partial derivative orders $v$ with $|v| \le \lfloor s \rfloor$ and all $\alpha$ in $[\underline{\alpha}, \overline{\alpha}]$,

(i) $\left( \int_{\mathcal{X}_0} \left| \hat{b}_v(\alpha; h, x) - b_v(\alpha|x) \right|^m dx \right)^{1/m} = O_P\left( \left( \frac{1}{n} \right)^{\frac{s-|v|}{2s+d}} \right)$ for any finite $m > 0$ provided $h$ is asymptotically proportional to $n^{-\frac{1}{2s+d}}$;

(ii) $\sup_{x \in \mathcal{X}_0} \left| \hat{b}_v(\alpha; h, x) - b_v(\alpha|x) \right| = O_P\left( \left( \frac{\log n}{n} \right)^{\frac{s-|v|}{2s+d}} \right)$ if $h$ is asymptotically proportional to $\left( \frac{\log n}{n} \right)^{\frac{1}{2s+d}}$.

Since, in a regression model such as (3.3), the $\hat{b}_v(\alpha; h, x)$ are estimators of the partial derivatives of $m(x)$, it follows from Stone (1982) that the global rates derived in Corollary 1 are optimal in a minimax sense.

A second application builds on the uniformity with respect to the bandwidth $h$ of our Bahadur representation. The next Proposition allows for data-driven bandwidths.
Observe that it also deals with the uniform norm $\sup_{(\alpha,x)\in[\underline{\alpha},\overline{\alpha}]\times\mathcal{X}_0} \left| \widehat{Q}_h(\alpha|x) - Q(\alpha|x) \right|$, which evaluates the estimated curves $(\alpha,x) \mapsto \widehat{Q}_h(\alpha|x)$ used in empirical graphic illustrations of (1.1).

Proposition 2. Consider a random bandwidth $\widehat{h}_n$ such that $\widehat{h}_n = O_P(h_n)$ and $1/\widehat{h}_n = O_P(1/h_n)$, where $h_n$ is a deterministic sequence satisfying $h_n = o(1)$ and $\lim_{n\to\infty} (\log n)/(nh_n^d) = 0$. Suppose that Assumptions K, F and X hold and that $Q(\alpha|x)$ is in $C(L,s)$. Then for any $v$ with $|v| \le \lfloor s\rfloor$,
\[
\sup_{(\alpha,x)\in[\underline{\alpha},\overline{\alpha}]\times\mathcal{X}_0} \left| \widehat{b}_v(\alpha;\widehat{h}_n,x) - b_v(\alpha|x) \right| = h_n^{-|v|}\, O_P\left( h_n^s + \left( \frac{\log n}{nh_n^d} \right)^{1/2} \right).
\]

In particular, if the exact order of $\widehat{h}_n$ is $(\log(n)/n)^{1/(2s+d)}$ in probability, $\sup_{x\in\mathcal{X}_0} | \widehat{b}_v(\alpha;\widehat{h}_n,x) - b_v(\alpha|x) |$ has the optimal order $(\log(n)/n)^{(s-|v|)/(2s+d)}$ of Corollary 1-(ii). It is likely that an $L^m$ version of Proposition 2 holds, but it is slightly longer to prove. Proposition 2 can for instance be fruitfully applied to cross-validated bandwidths for the conditional cumulative distribution as proposed by Li and Racine (2008).

Our last application builds on the fact that Theorems 1 and 2 hold uniformly with respect to the quantile order $\alpha$. This application concerns estimation of the conditional quantile density function (1.5). The considered estimator of $q(\alpha|x)$ is a conditional version of the Parzen (1979) convolution estimator,
\[
\widehat{q}(\alpha|x) = \frac{1}{h_q} \int \widehat{Q}_h(a|x)\, dK_q\left( \frac{a-\alpha}{h_q} \right) = \frac{1}{h_q} \int \widehat{Q}_h(\alpha+h_q t|x)\, dK_q(t), \tag{3.11}
\]
see also Xiang (1995). In the expression above, $h_q > 0$ is a bandwidth and $K_q(\cdot)$ is a signed measure over $\mathbb{R}$ such that
\[
\int dK_q(t) = 0, \qquad \int t\, dK_q(t) = 1.
\]
In particular, if $K_q(\cdot)$ has a Lebesgue derivative $dK_q(t) = K_q'(t)\,dt$, substituting in (3.11) gives
\[
\widehat{q}(\alpha|x) = \frac{1}{h_q} \int \widehat{Q}_h(\alpha+h_q t|x)\, K_q'(t)\, dt.
\]
Computing these integrals may require intensive numerical steps, so that the resulting estimator may be difficult to implement in practice. A more realistic estimator uses a discrete measure $K_q(\cdot)$ in (3.11). If $K_q(\cdot)$ is a linear combination of Dirac masses at $t_j$ with weights $\kappa_j$, $j = 1,\ldots,J$, the resulting estimator
\[
\widehat{q}(\alpha|x) = \frac{1}{h_q} \sum_{j=1}^J \kappa_j \widehat{Q}_h(\alpha+h_q t_j|x), \qquad \sum_{j=1}^J \kappa_j = 0 \ \text{ and } \ \sum_{j=1}^J t_j\kappa_j = 1,
\]
may indeed be simpler to compute. Note that this includes the well-known numerical derivatives
\[
\frac{\widehat{Q}_h(\alpha+h_q|x) - \widehat{Q}_h(\alpha|x)}{h_q}, \qquad \frac{\widehat{Q}_h(\alpha|x) - \widehat{Q}_h(\alpha-h_q|x)}{h_q} \qquad \text{and} \qquad \frac{\widehat{Q}_h(\alpha+h_q|x) - \widehat{Q}_h(\alpha-h_q|x)}{2h_q}.
\]

To study the bias of $\widehat{q}(\alpha|x)$, we strengthen the definition of the smoothness class $C(L,s)$ as follows. $Q(\alpha|x)$ is in $C_q(L,s)$ if
(i) $Q(\alpha|x)$ is in $C(L,s+1)$;
(ii) for each $x$ in $\mathcal{X}$, $\alpha\in[\underline{\alpha},\overline{\alpha}] \mapsto q(\alpha|x)$ is $\lfloor s\rfloor$ times differentiable;
(iii) for each $x$ in $\mathcal{X}$ and all $(\alpha,\alpha')\in[\underline{\alpha},\overline{\alpha}]^2$,
\[
\left| \frac{\partial^{\lfloor s\rfloor} q(\alpha|x)}{\partial\alpha^{\lfloor s\rfloor}} - \frac{\partial^{\lfloor s\rfloor} q(\alpha'|x)}{\partial\alpha^{\lfloor s\rfloor}} \right| \le L\left| \alpha-\alpha' \right|^{s-\lfloor s\rfloor}.
\]
We shall assume in addition that $K_q(\cdot)$ has a compact support and satisfies the additional conditions
\[
\int t^j\, dK_q(t) = 0, \quad j = 2,\ldots,\lfloor s\rfloor, \qquad \int |dK_q(t)| < \infty.
\]

Proposition 3. Assume that $Q(\alpha|x)$ is in $C_q(L,s)$ and $\lfloor s+1\rfloor \le p$. Suppose that Assumptions K, F and X hold with $h = O(h_q)$, $h_q \to 0$ and $(\log n)/(nh^d) \to 0$. Then for any $x$ in $\mathcal{X}_0$ and $\alpha$ in $(\underline{\alpha},\overline{\alpha})$,
\[
\widehat{q}(\alpha|x) = q(\alpha|x) + O_P\left( h_q^s + \frac{1}{(nh^dh_q)^{1/2}} \right) + \frac{\log^{3/4} n}{(nh^dh_q^2)^{1/4}}\, O_P\left( \frac{1}{(nh_qh^d)^{1/2}} \right).
\]

Taking $h_q$ and $h$ of the same order is the optimal choice for the order of $h$ in the expansion of Proposition 3. This gives
\[
\widehat{q}(\alpha|x) = q(\alpha|x) + O_P\left( h^s + \frac{1}{(nh^{d+1})^{1/2}} \right) + \frac{\log^{3/4} n}{(nh^{d+2})^{1/4}}\, O_P\left( \frac{1}{(nh^{d+1})^{1/2}} \right).
\]
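The discrete-measure version of the estimator described above (Dirac masses at $t_j$ with weights $\kappa_j$) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `quantile_density_discrete` is hypothetical, and the known standard normal quantile function stands in for the estimated curve $\widehat{Q}_h(\cdot|x)$.

```python
import math
from statistics import NormalDist

def quantile_density_discrete(Q, alpha, h_q, nodes, weights):
    """(1/h_q) * sum_j kappa_j * Q(alpha + h_q * t_j) for Dirac masses at t_j.

    Checks the two constraints sum_j kappa_j = 0 and sum_j t_j kappa_j = 1.
    """
    if not math.isclose(sum(weights), 0.0, abs_tol=1e-12):
        raise ValueError("weights must sum to 0")
    if not math.isclose(sum(t * k for t, k in zip(nodes, weights)), 1.0, abs_tol=1e-12):
        raise ValueError("sum of t_j * kappa_j must equal 1")
    return sum(k * Q(alpha + h_q * t) for t, k in zip(nodes, weights)) / h_q

# Stand-in for the estimated quantile curve: the standard normal quantile function.
Q = NormalDist().inv_cdf
forward = quantile_density_discrete(Q, 0.5, 0.01, nodes=[0.0, 1.0], weights=[-1.0, 1.0])
backward = quantile_density_discrete(Q, 0.5, 0.01, nodes=[-1.0, 0.0], weights=[-1.0, 1.0])
central = quantile_density_discrete(Q, 0.5, 0.01, nodes=[-1.0, 1.0], weights=[-0.5, 0.5])
# Known value at the median of the standard normal: q = 1 / f(0) = sqrt(2 * pi).
target = math.sqrt(2.0 * math.pi)
```

The three calls reproduce the forward, backward and central numerical derivatives; only $J$ evaluations of the quantile curve are needed for each value of $\alpha$.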
The term $\log^{3/4} n\, (nh^{d+2})^{-1/4}\, O_P\big( (nh^{d+1})^{-1/2} \big)$ is given by the Bahadur error term $E_n(\alpha;h,x)$ of Theorem 2. The other term, $O_P\big( h^s + (nh^{d+1})^{-1/2} \big)$, can be viewed as a bias-variance decomposition component. The latter is the leading term of the expansion provided $nh^{d+2} \to \infty$, a condition also used in Lee and Lee (2008) when $d = 1$. In this case, the optimal order for $h$ is $n^{-1/(2s+d+1)}$, which is such that $nh^{d+2} \to \infty$ provided $s > 1/2$. The optimal rate for pointwise estimation of $q(\alpha|x)$ is then $n^{-s/(2s+d+1)}$ which, as expected from (1.5), coincides with the optimal rate for pointwise estimation of $f(y|x)$.

4. Final remarks

This paper has investigated the bias and the Bahadur representation of a local polynomial estimator of the conditional quantile function and its derivatives. Compared to the existing literature, a distinctive feature is that the bias and Bahadur remainder term are studied uniformly with respect to the quantile level, the covariates and the smoothing parameter, thereby extending Chaudhuri (1991) and Kong et al. (2010). Our framework also considers the case where the order $p$ of the local polynomial estimator is higher than the order of differentiability $s$ of the conditional quantile function. An interesting consequence of our bias study is that using a local polynomial estimator of order $p \ge s$ does not affect its rate optimality. Our uniform studies of the bias and of the Bahadur remainder term are applied to derive the global rate optimality of the local polynomial estimators of the conditional quantile function and its derivatives with respect to $L^m$ norms, $0 < m \le \infty$, provided the bandwidth goes to 0 at an appropriate rate. This extends Truong (1989), who states a similar result for local medians and under a rather strong Lipschitz condition for the conditional quantile function.
Another application deals with the performance of randomly selected bandwidths, which are shown to perform as well as their deterministic equivalents in terms of consistency rates in uniform norm. Our framework is flexible enough to be adapted to other global norms. This new result is especially useful in view of the suggestion of Li and Racine (2008) to implement local polynomial quantile estimation with a data-driven bandwidth given by a cross-validation criterion for the conditional cumulative distribution function. A last application to nonparametric estimation of the quantile density function can be useful for confidence intervals and in Econometrics of Auctions, where the conditional quantile density function plays an important role.

Our uniform results can also be useful for other studies. For instance, an issue far beyond the scope of the present paper is the choice of the local polynomial order $p$. Local polynomial quantile estimation can be implemented using a large $p$, possibly growing with the sample size. This would allow very smooth conditional quantile functions to be estimated with a small bias, although it may inflate the asymptotic variance of the resulting estimator. Another approach would be to use a data-driven local polynomial order $p$. Such a problem is very close to the issue of choosing the order of the kernel when estimating a regression or a probability density function. The latter can be addressed following the recent adaptive approach of Goldenshluger and Lepski (2008, 2009), which gives a data-driven choice of the kernel and bandwidth in the context of the continuous time white noise model. Our uniform Bahadur representation is a preliminary step that can be useful to extend their results to local polynomial quantile estimation.

References

[1] Chaudhuri, P. (1991).
Nonparametric estimates of regression quantiles and their local Bahadur representation. The Annals of Statistics, 19, 760-777.
[2] Chernozhukov, V. & C. Hansen (2005). An IV model of quantile treatment effects. Econometrica 73, 245-261.
[3] Chesher, A. (2003). Identification in nonseparable models. Econometrica, 71, 1405-1441.
[4] Chow, Y.S. & H. Teicher (2003). Probability theory. Independence, interchangeability, martingales. Third edition, Springer.
[5] Echenique, F. & I. Komunjer (2009). Testing models with multiple equilibria by quantile methods. Econometrica 77, 1281-1297.
[6] Einmahl, U. & D.M. Mason (2005). Uniform in bandwidth consistency of kernel-type function estimators. The Annals of Statistics, 33, 1380-1403.
[7] Fan, J. (1992). Design-adaptive nonparametric regression. Journal of the American Statistical Association, 87, 998-1004.
[8] Fan, J. & I. Gijbels (1996). Local polynomial modeling and its applications. Chapman & Hall/CRC.
[9] Fan, J., N.E. Heckman & M.P. Wand (1995). Local polynomial kernel regression for generalized linear model and quasi-likelihood functions. Journal of the American Statistical Association, 90, 141-151.
[10] Firpo, S., N. Fortin & T. Lemieux (2009). Unconditional quantile regression. Econometrica 77, 953-973.
[11] Guerre, E., I. Perrigne & Q. Vuong (2000). Optimal nonparametric estimation of first price auctions. Econometrica, 68, 525-574.
[12] Guerre, E., I. Perrigne & Q. Vuong (2009). Nonparametric identification of risk aversion in first-price auctions under exclusion restrictions. Econometrica 77, 1193-1227.
[13] Goldenshluger, A. & O. Lepski (2008). Universal pointwise selection rule in multivariate function estimation. Bernoulli 14, 1150-1190.
[14] Goldenshluger, A. & O. Lepski (2009). Structural adaptation via L_p-norm oracle inequalities. Probability Theory and Related Fields 143, 41-71.
[15] Haile, P.A., H. Hong & M. Shum (2003). Nonparametric tests for common values in first-price sealed-bid auctions. Cowles Foundation discussion paper.
[16] Hjort, N. & D. Pollard (1993). Asymptotics for minimisers of convex processes. Unpublished manuscript, http://www.stat.yale.edu/~Pollard/Papers/.
[17] Hoderlein, S. & E. Mammen (2007). Identification of marginal effects in nonseparable models without monotonicity. Econometrica 75, 1513-1518.
[18] Hoderlein, S. & E. Mammen (2009). Identification and estimation of local average derivatives in non-separable models without monotonicity. Econometrics Journal 12, 1-25.
[19] Hong, S.Y. (2003). Bahadur representation and its applications for local polynomial estimates in nonparametric M-regression. Journal of Nonparametric Statistics, 15, 237-251.
[20] Imbens, G.W. & W.K. Newey (2009). Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity. Econometrica 77, 1481-1512.
[21] Koenker, R. (2005). Quantile Regression. Cambridge University Press.
[22] Kong, E., O. Linton & Y. Xia (2010). Uniform Bahadur representation for local polynomial estimates of M-regression and its application to the additive model. Econometric Theory 26, 1529-1564.
[23] Lee, K.L. & E.R. Lee (2008). Kernel methods for estimating derivatives of conditional quantiles. Journal of the Korean Statistical Society, 37, 365-373.
[24] Li, Q. & J.S. Racine (2008). Nonparametric Estimation of Conditional CDF and Quantile Functions With Mixed Categorical and Continuous Data. Journal of Business and Economic Statistics, 26, 423-434.
[25] Loader, C. (1999). Local regression and likelihood. Springer.
[26] Marmer, V. & A. Shneyerov (2008). Quantile-based nonparametric inference for first-price auctions. Document paper.
[27] Massart, P. (2007). Concentration inequalities and model selection.
Lecture Notes in Mathematics, 1896. Ecole d'Eté de Probabilités de Saint Flour XXXIII-2003, Jean Picard Editor. Springer.
[28] Parzen, E. (1979). Nonparametric Statistical Data Modeling. Journal of the American Statistical Association, 74, 105-121.
[29] Rothe, C. (2010). Nonparametric estimation of distributional policy effects. Journal of Econometrics 155, 56-70.
[30] Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 10, 1040-1053.
[31] Tsybakov, A.B. (1986). Robust reconstruction of functions by the local-approximation method. Problemy Peredachi Informatsii, 22, 69-84.
[32] Truong, Y.K. (1989). Asymptotic properties of kernel estimators based on local medians. The Annals of Statistics, 17, 606-617.
[33] van de Geer, S. (1999). Empirical processes in M-estimation. Cambridge University Press.
[34] van der Vaart, A.W. (1998). Asymptotic statistics. Cambridge University Press.
[35] White, H. (1994). Estimation, inference and specification analysis. Econometric Society Monographs. Cambridge University Press.
[36] Xiang, X. (1995). Estimation of conditional quantile density function. Journal of Nonparametric Statistics, 4, 309-316.
[37] Zeidler, E. (1985). Nonlinear Functional Analysis and its Applications, I. Fixed-Point Theorems. Springer Verlag.

Appendix A: Proofs of main results

Appendix A groups the proofs of Theorems 1 and 2, Propositions 1, 2 and 3, and Corollary 1. The proofs of intermediary results used to prove these main results are grouped in Appendix B. We first introduce some additional notations. Sequences $\{a_n\}$ and $\{b_n\}$ satisfy $a_n \asymp b_n$ if $|a_n|/C \le |b_n| \le C|a_n|$ for some $C > 0$ and $n$ large enough. Recall that $\|\cdot\|$ is the Euclidean norm and $\mathcal{B}(0,1) = \{z;\ \|z\| \le 1\}$.
Let $\succ$ be the usual order for symmetric matrices, that is, $A_1 \succ A_2$ if and only if $A_1 - A_2$ is a non-negative symmetric matrix. If $A$ is a symmetric matrix, $\|A\| = \sup_{u\in\mathcal{B}(0,1)} \|Au\| = \sup_{u\in\mathcal{B}(0,1)} |u^TAu|$ is the largest eigenvalue in absolute value of $A$. This norm is such that $\|AB\| \le \|A\|\|B\|$ for any matrix or vector $B$. Denote by $\|\cdot\|_\infty$ the uniform norm, i.e. $\|f(\cdot|\cdot)\|_\infty = \sup_{(x,y)\in\mathbb{R}^d\times\mathbb{R}} |f(y|x)|$.

We use the abbreviation $\theta = (\alpha,h,x)$. In particular, $Q^*(x';\theta)$, $S_i(\theta)$ and $J_i(\theta)$ stand for $Q^*(x';\alpha,h,x)$, $S(X_i,Y_i;\alpha,h,x)$ and $J(X_i;\alpha,h,x)$, see equations (3.4), (3.5) and (3.6). We abbreviate $\underline{h}_n$ and $\overline{h}_n$ into $\underline{h}$ and $\overline{h}$. Define
\[
\Theta_0 = [\underline{\alpha},\overline{\alpha}]\times[0,\overline{h}]\times\mathcal{X}_0, \qquad \Theta_1 = [\underline{\alpha},\overline{\alpha}]\times[\underline{h},\overline{h}]\times\mathcal{X}_0,
\]
where $\mathcal{X}_0$ is as in Assumption X and $[\underline{\alpha},\overline{\alpha}] \subset (0,1)$ is as in the definition of the smoothness class $C(L,s)$. For $L_n(b;\alpha,h,x) = L_n(b;\theta)$ as in (2.1), define
\[
L(b;\theta) = E\left[ L_n(b;\theta) \right] = \frac{1}{h^d} E\left[ \left\{ \ell_\alpha\left( Y - U(X-x)^Tb \right) - \ell_\alpha(Y) \right\} K\left( \frac{X-x}{h} \right) \right].
\]
We also use $K_h(z) = K(z/h)$. It is convenient to change $b$ into its standardization $B = Hb$ and to define $\widehat{B}(\theta) = H\widehat{b}(\theta)$ and $B^*(\theta) = Hb^*(\theta)$. Absolute constants are denoted by the generic letter $C$ and may vary from line to line. The following argument is used systematically. Recall that $\mathcal{X}_0$ is an inner subset of the compact $\mathcal{X}$ under Assumption X. Hence for any $(x,z)\in\mathcal{X}_0\times\mathcal{K}$, $x+hz$ is in $\mathcal{X}$ under Assumption K provided $h$ is small enough.

The next lemma is used in the proofs of Theorems 1 and 2. Its proof is given in Appendix B with the proofs of the other intermediary results.

Lemma A.1. Under Assumptions F, K and X, we have for $\overline{h}$ small enough,
(i) $b^*(\theta)$ exists and is unique for all $\theta$ in $\Theta_0$.
(ii) $B^*(\theta) = Hb^*(\theta)$ satisfies
\[
E\left[ S_i(\theta) \right] = \int \left\{ F\left( U(z)^TB^*(\theta)\,|\,x+hz \right) - F\left( Q(\alpha|x+hz)\,|\,x+hz \right) \right\} f(x+hz)\,U(z)K(z)\,dz = 0, \tag{A.1}
\]
\[
\lim_{\overline{h}\to 0}\ \sup_{\theta\in\Theta_0} \left\| B^*(\theta) - B^*(\alpha;0,x) \right\| = 0, \tag{A.2}
\]
where $B^*(\alpha;0,x) = (Q(\alpha|x),0,\ldots,0)^T$.
(iii) For all $(x',\theta_i)$ in $\mathcal{X}\times\Theta_1$, $i = 1,2$, $\left| Q^*(x';\theta_1) - Q^*(x';\theta_2) \right| \le C\underline{h}^{-p}(1+\underline{h}^{-1})\|\theta_1-\theta_2\|$.
(iv) There exists $C$ such that, for all $\theta$ in $\Theta_1$, all $x'$ in $\mathcal{X}$ and all $x$ in $\mathcal{X}_0$,
\[
f\left( Q^*(x';\theta)\,|\,x' \right) K\left( \frac{x-x'}{h} \right) \ge C\,K\left( \frac{x-x'}{h} \right).
\]

A.1. Proof of Theorem 1. Since $Q(\cdot|\cdot)$ is in $C(L,s)$, the Taylor-Lagrange Formula and Assumption K yield that there exists $t = t(h,x,z)$ in $(0,1)$ such that, for $\overline{h}$ small enough and all $(x,z)$ in $\mathcal{X}_0\times\mathcal{K}$,
\begin{align*}
Q(\alpha|x+hz) &= \sum_{0\le|v|\le\lfloor s\rfloor} \frac{b_v(\alpha|x)}{v!}(hz)^v + \sum_{|v|=\lfloor s\rfloor} \frac{(hz)^v}{v!}\left( b_v(\alpha|x+thz) - b_v(\alpha|x) \right) \\
&= U(z)^THb(\alpha|x) + \epsilon(\theta,z). \tag{A.3}
\end{align*}
In the equation above, $b_v(\alpha|x)$ is the $v$th partial derivative of $Q(\alpha|x)$ with respect to $x$ and $b(\alpha|x) = (b_v(\alpha|x), |v|\le\lfloor s\rfloor, 0,\ldots,0)^T \in \mathbb{R}^P$. Since $Q(\cdot|\cdot)\in C(L,s)$,
\[
\lim_{\overline{h}\to 0}\ \sup_{(\theta,z)\in\Theta_1\times\mathcal{K}} \left| \frac{\epsilon(\theta,z)}{h^s} \right| \le CL. \tag{A.4}
\]
Let
\[
I(\theta,z) = \int_0^1 f\left( Q(\alpha|x+hz) + t\left\{ U(z)^TB^*(\theta) - Q(\alpha|x+hz) \right\}\,\Big|\,x+hz \right) dt.
\]
Assumptions F, K, X, $Q(\cdot|\cdot)\in C(L,s)$ and (A.2) give
\[
\lim_{\overline{h}\to 0}\ \sup_{(\theta,z)\in\Theta_0\times\mathcal{K}} \left| I(\theta,z) - f\left( Q(\alpha|x)\,|\,x \right) \right| = 0. \tag{A.5}
\]
A Taylor expansion with integral remainder gives
\[
F\left( U(z)^TB^*(\theta)\,|\,x+hz \right) - F\left( Q(\alpha|x+hz)\,|\,x+hz \right) = \left\{ U(z)^TB^*(\theta) - Q(\alpha|x+hz) \right\} I(\theta,z).
\]
Substituting in the first-order condition (A.1) yields
\[
\int U(z)\left\{ U(z)^TB^*(\theta) - Q(\alpha|x+hz) \right\} I(\theta,z)\,f(x+hz)K(z)\,dz = 0. \tag{A.6}
\]
We show that the matrix $\int U(z)U(z)^T I(\theta,z)f(x+hz)K(z)\,dz$ has an inverse.
Indeed, Assumptions K and X, (A.5) and $\overline{h}$ small enough give that, uniformly in $\theta$ in $\Theta_0$ and $A$ in $\mathbb{R}^P$,
\begin{align*}
A^T\left[ \int U(z)U(z)^T I(\theta,z)f(x+hz)K(z)\,dz \right] A
&= \int \left( U(z)^TA \right)^2 I(\theta,z)f(x+hz)K(z)\,dz \\
&= (1+o(1))\,f\left( Q(\alpha|x)\,|\,x \right) \int \left( U(z)^TA \right)^2 K(z)\,dz \ \ge\ C\|A\|^2,
\end{align*}
using the fact that $A \mapsto \int (U(z)^TA)^2K(z)\,dz$ is a squared norm and norm equivalence over $\mathbb{R}^P$. It follows that $\int U(z)U(z)^T I(\theta,z)f(x+hz)K(z)\,dz$ is strictly positive definite and has an inverse which satisfies, for $n$ large enough,
\[
\sup_{\theta\in\Theta_0} \left\| \left[ \int U(z)U(z)^T I(\theta,z)f(x+hz)K(z)\,dz \right]^{-1} \right\| < \infty. \tag{A.7}
\]
(A.6) and (A.3) give
\[
Hb^*(\theta) = Hb(\alpha|x) + \left[ \int U(z)U(z)^T I(\theta,z)f(x+hz)K(z)\,dz \right]^{-1} \int \epsilon(\theta,z)I(\theta,z)f(x+hz)U(z)K(z)\,dz.
\]
It then follows from (A.4) and (A.7) that
\begin{align*}
\left\| Hb^*(\theta) - Hb(\alpha|x) \right\|
&\le \left\| \left[ \int U(z)U(z)^T I(\theta,z)f(x+hz)K(z)\,dz \right]^{-1} \right\| \left\| \int \epsilon(\theta,z)I(\theta,z)f(x+hz)U(z)K(z)\,dz \right\| \\
&\le CLh^s \tag{A.8}
\end{align*}
uniformly in $\theta$ in $\Theta_0$. This ends the proof of the Theorem and also establishes (3.2) since $b(\alpha|x) = (b_v(\alpha|x), |v|\le\lfloor s\rfloor, 0,\ldots,0)^T$. □

A.2. Proof of Proposition 1. Let $\varphi(t) = \exp(-t^2/2)/\sqrt{2\pi}$ and $\Phi(t) = \int_{-\infty}^t \varphi(u)\,du$ be the p.d.f. and c.d.f. of the standard normal. The regression model (3.3) is such that
\[
F(y|x) = \Phi\left( y-m(x) \right), \qquad f(x) = I\left( x\in[-1,1] \right).
\]
(A.2) gives that $\lim_{h\to 0} \max_{z\in\mathcal{K}} U(z)^TB(0.5;h,0) = Q(0.5|0) = m(0) = 0$. Hence (A.6), (A.5) and Assumption K give
\[
(1+o(1))\,\varphi(0) \int U(z)\left\{ U(z)^TB(0.5;h,0) - m(hz) \right\} K(z)\,dz = 0.
\]
Recall that $U(z) = (1,z)^T$, so that the equation above gives
\begin{align*}
\begin{pmatrix} b_0(0.5;h,0) \\ h\,b_1(0.5;h,0) \end{pmatrix}
&= (1+o(1)) \left[ \int U(z)U(z)^TK(z)\,dz \right]^{-1} \begin{pmatrix} h^{1/2}\int m(z)K(z)\,dz \\ h^{1/2}\int |z|^{3/2}K(z)\,dz \end{pmatrix} \\
&= (1+o(1))\,h^{1/2} \begin{pmatrix} \int m(z)K(z)\,dz \\ \dfrac{\int |z|^{3/2}K(z)\,dz}{\int z^2K(z)\,dz} \end{pmatrix}.
\end{align*}
□

A.3.
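Proof of Theorem 2. We first state some intermediary results. The two following propositions deal with the remainder term $R_n(\beta,\epsilon;\theta) = \sum_{i=1}^n R_i(\beta,\epsilon;\theta)$ from (3.10), where
\begin{align*}
R_i(\beta,\epsilon;\theta) &= \left\{ \ell_\alpha\left( Y_i - Q^*(X_i;\theta) - U\left( \frac{X_i-x}{h} \right)^T \frac{\beta+\epsilon}{(nh^d)^{1/2}} \right) - \ell_\alpha\left( Y_i - Q^*(X_i;\theta) - U\left( \frac{X_i-x}{h} \right)^T \frac{\beta}{(nh^d)^{1/2}} \right) \right\} K\left( \frac{X_i-x}{h} \right) \\
&\quad - \frac{1}{(nh^d)^{1/2}} S_i(\theta)^T\epsilon - \frac{1}{2}\epsilon^T \left( \frac{1}{nh^d}J_i(\theta) \right) (\epsilon+2\beta).
\end{align*}
Define also
\begin{align*}
\overline{R}_i(\beta,\epsilon;\theta) &= R_i(\beta,\epsilon;\theta) + \frac{1}{2}\epsilon^T \left( \frac{1}{nh^d}J_i(\theta) \right) (\epsilon+2\beta) \tag{A.9} \\
&= \left\{ \ell_\alpha\left( Y_i - Q^*(X_i;\theta) - U\left( \frac{X_i-x}{h} \right)^T \frac{\beta+\epsilon}{(nh^d)^{1/2}} \right) - \ell_\alpha\left( Y_i - Q^*(X_i;\theta) - U\left( \frac{X_i-x}{h} \right)^T \frac{\beta}{(nh^d)^{1/2}} \right) \right. \\
&\qquad \left. - 2\left\{ I\left( Y_i \le Q^*(X_i;\theta) \right) - \alpha \right\} U\left( \frac{X_i-x}{h} \right)^T \frac{\epsilon}{(nh^d)^{1/2}} \right\} K\left( \frac{X_i-x}{h} \right),
\end{align*}
\[
R_i^1(\beta,\epsilon;\theta) = \overline{R}_i(\beta,\epsilon;\theta) - E\left[ \overline{R}_i(\beta,\epsilon;\theta)\,|\,X_i \right], \tag{A.10}
\]
\[
R_i^2(\beta,\epsilon;\theta) = E\left[ \overline{R}_i(\beta,\epsilon;\theta)\,|\,X_i \right] - \frac{1}{2}\epsilon^T \left( \frac{1}{nh^d}J_i(\theta) \right) (\epsilon+2\beta), \tag{A.11}
\]
which are such that
\[
R_n(\beta,\epsilon;\theta) = R_n^1(\beta,\epsilon;\theta) + R_n^2(\beta,\epsilon;\theta), \qquad R_n^j(\beta,\epsilon;\theta) = \sum_{i=1}^n R_i^j(\beta,\epsilon;\theta), \quad j = 1,2.
\]
Here the overline on $\overline{R}_i$ only serves to distinguish the quantity defined in (A.9) from $R_i$.

Proposition A.1. Consider two real numbers $t_\beta, t_\epsilon > 0$ which may depend upon $n$, with $t_\beta \ge 1$, $t_\epsilon \ge 1/n$ and $(t_\beta+t_\epsilon)^{1/2}/t_\epsilon \le O\left( (n\underline{h}^d)^{1/4}/\log^{1/2} n \right)$. Then, under Assumptions F, K and X and for $n$ large enough,
\[
E\left[ \sup_{(\beta,\epsilon,\theta)\in\mathcal{B}(0,t_\beta)\times\mathcal{B}(0,t_\epsilon)\times\Theta_1} \left| R_n^1(\beta,\epsilon;\theta) \right| \right] \le C\,\frac{\log^{1/2} n}{(n\underline{h}^d)^{1/4}}\, t_\epsilon(t_\beta+t_\epsilon)^{1/2}.
\]

Proposition A.2. Consider two real numbers $t_\beta, t_\epsilon > 0$ which may depend upon $n$, with $t_\beta \ge 1$ and $t_\beta/t_\epsilon = O\left( n\underline{h}^d/\log^{1/2} n \right)$. Then, under Assumptions F, K and X and for $n$ large enough,
\[
E\left[ \sup_{(\beta,\epsilon,\theta)\in\mathcal{B}(0,t_\beta)\times\mathcal{B}(0,t_\epsilon)\times\Theta_1} \left| R_n^2(\beta,\epsilon;\theta) \right| \right] \le C\,\frac{t_\epsilon(t_\beta+t_\epsilon)^2}{(n\underline{h}^d)^{1/2}}.
\]

The next lemma is used to bound the eigenvalues of $\sum_{i=1}^n J_i(\theta)/(nh^d)$ from below. It implies in particular that all the $\beta_n(\theta)$ in (3.7), $\theta$ in $\Theta_1$, are well defined with a probability tending to 1.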
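Let $\gamma_n(\theta)$ be the smallest eigenvalue of the nonnegative symmetric matrix $\sum_{i=1}^n J_i(\theta)/(nh^d)$.

Lemma A.2. Under Assumptions F, K and X, $\inf_{\theta\in\Theta_1} \gamma_n(\theta) \ge \gamma + o_P(1)$ for some $\gamma > 0$.

Lemma A.2 together with Lemma A.3 below gives $\sup_{\theta\in\Theta_1} \|\beta_n(\theta)\| = O_P\left( \log^{1/2} n \right)$.

Lemma A.3. Suppose that Assumptions F, K and X are satisfied. Then
\[
\sup_{\theta\in\Theta_1} \left\| \frac{1}{(nh^d)^{1/2}} \sum_{i=1}^n S_i(\theta) \right\| = O_P\left( \log^{1/2} n \right).
\]

The rest of the proof of Theorem 2 is divided in two steps. In what follows
\[
t_n = t\,\frac{\log^{3/4} n}{(n\underline{h}^d)^{1/4}}, \qquad t > 0.
\]
Under Assumption K, $(\log n)/(n\underline{h}^d) = o(1)$ so that $t_n = o\left( \log^{1/2} n \right)$. In the sequel, $t_n$ will play the role of $t_\epsilon$ whereas $t_\beta$ will be chosen such that $t_\beta \asymp \log^{1/2} n$. Hence
\[
\frac{(t_\beta+t_\epsilon)^{1/2}}{t_\epsilon} \asymp \frac{(n\underline{h}^d)^{1/4}\log^{1/4} n}{t\log^{3/4} n} = \frac{1}{t}\, O\left( \frac{(n\underline{h}^d)^{1/4}}{\log^{1/2} n} \right),
\]
\[
\frac{t_\beta}{t_\epsilon} \asymp \frac{(n\underline{h}^d)^{1/4}\log^{1/2} n}{t\log^{3/4} n} = O\left( \left( \frac{n\underline{h}^d}{\log n} \right)^{1/4} \right) = o\left( \frac{n\underline{h}^d}{\log n}\times\log^{1/2} n \right) = o\left( \frac{n\underline{h}^d}{\log^{1/2} n} \right).
\]
Hence these choices of $t_\beta$ and $t_\epsilon$ satisfy the conditions of Propositions A.1 and A.2 provided $t$ is chosen large enough.

Step 1: order of $\sup_{(\epsilon,\theta)\in\mathcal{B}(0,t_n)\times\Theta_1} |R_n(\beta_n(\theta),\epsilon;\theta)|$. Consider $\eta > 0$ arbitrarily small. Let $\gamma$ be as in Lemma A.2. Since Lemmas A.2 and A.3 give $\sup_{\theta\in\Theta_1} \|\beta_n(\theta)\| = O_P\left( \log^{1/2} n \right)$, there is a $C_\eta$ such that, for $n$ large enough,
\begin{align*}
&P\left( \sup_{(\epsilon,\theta)\in\mathcal{B}(0,t_n)\times\Theta_1} |R_n(\beta_n(\theta),\epsilon;\theta)| \ge \frac{\gamma t_n^2}{4} \right) \\
&\quad \le P\left( \sup_{(\epsilon,\theta)\in\mathcal{B}(0,t_n)\times\Theta_1} |R_n(\beta_n(\theta),\epsilon;\theta)| \ge \frac{\gamma t_n^2}{4},\ \sup_{\theta\in\Theta_1} \|\beta_n(\theta)\| \le C_\eta\log^{1/2} n \right) + P\left( \sup_{\theta\in\Theta_1} \|\beta_n(\theta)\| > C_\eta\log^{1/2} n \right) \\
&\quad \le P\left( \sup_{(\beta,\epsilon,\theta)\in\mathcal{B}(0,C_\eta\log^{1/2} n)\times\mathcal{B}(0,t_n)\times\Theta_1} |R_n(\beta,\epsilon;\theta)| \ge \frac{\gamma t_n^2}{4} \right) + \eta.
\end{align*}
Propositions A.1 and A.2, $R_n = R_n^1 + R_n^2$ and the Markov inequality give
\[
P\left( \sup_{(\beta,\epsilon,\theta)\in\mathcal{B}(0,C_\eta\log^{1/2} n)\times\mathcal{B}(0,t_n)\times\Theta_1} |R_n(\beta,\epsilon;\theta)| \ge \frac{\gamma t_n^2}{4} \right)
\]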
≤ C t 2 n t n C η log 1 / 2 n + t n 1 / 2 log 1 / 2 n nh d 1 / 4 + t n C η log 1 / 2 n + t n 2 nh d 1 / 2 = C t n log 3 / 4 n ( nh d ) 1 / 4 C η + t n log 1 / 2 n 1 / 2 + log n nh d 1 / 4 C η + t n log 1 / 2 n 2 ! . The definition of t n , t n = o log 1 / 2 n and Assumption K give (A.12) lim sup n →∞ P sup ( ǫ,θ ) ∈B (0 ,t n ) × Θ 1 | R n ( β n ( θ ) , ǫ ; θ ) | ≥ γ t 2 n 4 ! = η + O C 1 / 2 η t ! when t → ∞ . Step 2: sup θ ∈ Θ 1 k E n ( θ ) k . Consider τ n ≥ t n and ǫ = τ n e , k e k = 1 so that k ǫ k ≥ t n . Since ℓ α ( · ) is conv ex , ǫ 7→ L n ( β ( θ ) , ǫ ; θ ) is conv ex. This gives since L n ( β ( θ ) , 0; θ ) = 0 a nd L n = L 0 n + R n t n τ n L n ( β n ( θ ) , ǫ ; θ ) = t n τ n L n ( β n ( θ ) , ǫ ; θ ) + 1 − t n τ n L n ( β n ( θ ) , 0; θ ) ≥ L n β n ( θ ) , t n τ n ǫ ; θ = L n ( β n ( θ ) , t n e ; θ ) ≥ L 0 n ( β n ( θ ) , t n e ; θ ) + R n ( β n ( θ ) , t n e ; θ ) . Hence E n ( θ ) = arg min ǫ L n ( β n ( θ ) , ǫ ; θ ) and the latter inequality give {k E n ( θ ) k ≥ t n } ⊂ inf ǫ ; k ǫ k≥ t n L n ( β n ( θ ) , ǫ ; θ ) ≤ inf ǫ ; k ǫ k 1. Reca ll that Lemma A.2 together Lemma A.3 gives sup θ ∈ Θ 1 k β n ( θ ) k = O P log 1 / 2 n . Hence (3.8), Theo rems 1 and 2 give, for all C > 1, sup ( α,x,h ) ∈ [ α ,α ] ×X 0 × [ h,h ] b b v ( α ; b h, x ) − b v ( α | x ) = h −| v | O P h s + log n nh d 1 / 2 ! = h −| v | n O P h s n + log n nh d n 1 / 2 ! . This ends the pro o f of the P rop osition s ince lim inf n →∞ P b h n ∈ [ h , h ] can b e made arbitra rily close to 1 by increa sing C . ✷ A.6. Pro of of Prop o sition 3. Substituting (3.8) in (3.1 1) yields b q ( α | x ) − q ( α | x ) = 1 h q Z Q ( α + h q t | x ) dK q ( t ) − q ( α | x ) + 1 h q Z ( Q ∗ ( α + h q t | x ) − Q ( α + h q t | x )) dK q ( t ) + Z e T 0 β n ( α + h q t ; h, x ) h q ( nh d ) 1 / 2 dK q ( t ) + Z e T 0 E n ( α + h q t ; h, x ) h q ( nh d ) 1 / 2 dK q ( t ) . 
Theorems 1 a nd 2 with h = O ( h q ) a nd h q → 0 give 1 h q Z ( Q ∗ ( α + h q t | x ) − Q ( α + h q t | x )) dK q ( t ) = O h s +1 h q Z | dK q ( t ) | = O ( h s q ) , Z E n ( α + h q t ; h, x ) h q ( nh d ) 1 / 2 dK q ( t ) = log 3 / 4 n nh d h 2 q 1 / 4 O P 1 ( nh q h d ) 1 / 2 ! . Hence it r emains to show that 1 h q Z Q ( α + h q t | x ) dK q ( t ) − q ( α | x ) = O h s q , (A.14) 1 h 1 / 2 q Z β n ( α + h q t ; h, x ) dK q ( t ) = O P (1) . (A.15) The t wo next steps es ta blish these tw o equalities. Step 1: pr o of of (A .14) . Let q ( j ) ( α | x ) = ∂ j q ( α | x ) /∂ x j . Since Q ( α | x ) ∈ C ( L, s + 1), the T aylor- Lagra ng e F o r mula gives, for some ω in [0 , 1], Q ( α + h q t | x ) − Q ( α | x ) = ⌊ s ⌋ X j =0 q ( j ) ( α | x ) ( j + 1)! ( h q t ) j + q ( ⌊ s ⌋ ) ( α + ω h q t | x ) − q ( ⌊ s ⌋ ) ( α | x ) ( ⌊ s ⌋ + 1)! ( h q t ) ⌊ s ⌋ . 25 The definition of the smo othnes s clas s C q ( L, s ) g ives q ( ⌊ s ⌋ ) ( α + ω h q t | x ) − q ( ⌊ s ⌋ ) ( α | x ) ≤ L | h q t | s −⌊ s ⌋ . Hence, s ince the supp or t o f K q ( · ) is co mpact, R | dK q ( t ) | < ∞ and R dK q ( t ) = 0, R tdK q ( t ) = 1, R t 2 dK q ( t ) = · · · = R t ⌊ s ⌋ dK q ( t ) = 0, 1 h q Z Q ( α + h q t | x ) dK q ( t ) = Q ( α | x ) h q Z dK q ( t ) + q ( α | x ) Z tdK q ( t ) + h q q (1) ( α | x ) 2 Z t 2 dK q ( t ) + · · · + h ⌊ s ⌋ q q ( ⌊ s ⌋ ) ( α | x ) ( ⌊ s ⌋ + 1) Z t ⌊ s ⌋ dK q ( t ) + O ( h s ) = q ( α | x ) + O ( h s ) . Step 2: pr o of of (A .15) . Let θ t = ( α + h q t, h, x ), θ = θ 0 . Since R dK q ( t ) = 0, (3 .7) gives 1 h 1 / 2 q Z β n ( θ t ) dK q ( t ) = 1 h 1 / 2 q Z ( β n ( θ t ) − β n ( θ ) ) dK q ( t ) = 1 h 1 / 2 q Z 1 nh d n X i =1 J i ( θ ) ! − 1 − 1 nh d n X i =1 J i ( θ t ) ! − 1 1 ( nh d ) 1 / 2 n X i =1 S i ( θ ) dK q ( t ) (A.16) + 1 h 1 / 2 q Z 1 nh d n X i =1 J i ( θ t ) ! − 1 1 ( nh d ) 1 / 2 n X i =1 { S i ( θ ) − S i ( θ t ) } dK q ( t ) . 
(A.17) Since A 7→ A − 1 is Lipshitz ov e r the set of semi-de finite p ositive matrices A with smallest eigenv alue bo unded from b elow by γ , Lemmas A.2 a nd A.3, (3.6) and Assumption F yield that (A.16) s a tisfies 1 h 1 / 2 q Z 1 nh d n X i =1 J i ( θ ) ! − 1 − 1 nh d n X i =1 J i ( θ t ) ! − 1 1 ( nh d ) 1 / 2 n X i =1 S i ( θ ) dK q ( t ) ≤ O P (1) h 1 / 2 q Z 1 nh d n X i =1 { J i ( θ t ) − J i ( θ ) } 1 ( nh d ) 1 / 2 n X i =1 S i ( θ ) | dK q ( t ) | ≤ O P (log n ) 1 / 2 h 1 / 2 q Z 1 nh d n X i =1 | Q ∗ ( X i ; θ t ) − Q ∗ ( X i ; θ ) | I X i − x h ∈ K | dK q ( t ) | . 26 The definition (3.4 ) of Q ∗ ( X ; θ ) and (A.8 ) give, since Q ( α | x ) ∈ C ( L, s + 1) a nd b ecause the suppo rt of K q is co mpa ct, 1 h 1 / 2 q Z 1 nh d n X i =1 | Q ∗ ( X i ; θ t ) − Q ∗ ( X i ; θ ) | I X i − x h ∈ K | dK q ( t ) | = 1 h 1 / 2 q Z 1 nh d n X i =1 U X i − x h T ( Hb ∗ ( θ t ) − Hb ∗ ( θ )) I X i − x h ∈ K | dK q ( t ) | ≤ C 1 nh d n X i =1 I X i − x h ∈ K 1 h 1 / 2 q Z k Hb ∗ ( θ t ) − Hb ∗ ( θ ) k | dK q ( t ) | ≤ O P (1) 1 h 1 / 2 q Z k Hb ( α + h q t | x ) − Hb ( α | x ) k | dK q ( t ) | + O ( h s +1 ) ≤ O P (1) 1 h 1 / 2 q Z | Q ( α + h q t | x ) − Q ( α | x ) | | dK q ( t ) | + O h s +1 + h = O P h 1 / 2 q . This gives that the item in (A.16) is O P h 1 / 2 q = o P (1). F or (A.17), Lemma A.2, E [ S i ( θ t )] = 0 , (3.5), Q ∗ ( X ; θ t ) = Q ∗ ( X ; θ ) + O ( h q ) uniformly with resp e c t to t in the suppor t of K q and X ∈ x + h K (a s easily s e en arg uing as in the equation a b ov e) and Assumptions F, X g ive 1 h 1 / 2 q Z 1 nh d n X i =1 J i ( θ t ) ! 
− 1 1 ( nh d ) 1 / 2 n X i =1 { S i ( θ ) − S i ( θ t ) } dK q ( t ) ≤ O P (1) h 1 / 2 q Z 1 ( nh d ) 1 / 2 n X i =1 { S i ( θ ) − S i ( θ t ) } | dK q ( t ) | = O P (1) h 1 / 2 q E " Z 1 ( nh d ) 1 / 2 n X i =1 { S i ( θ ) − S i ( θ t ) } | dK q ( t ) | # ≤ O P (1) h 1 / 2 q Z E 1 / 2 1 ( nh d ) 1 / 2 n X i =1 { S i ( θ ) − S i ( θ t ) } 2 | dK q ( t ) | = O P (1) h 1 / 2 q Z V a r 1 / 2 1 h d/ 2 { S ( θ ) − S ( θ t ) } | dK q ( t ) | = O P (1) Z " Z 1 h q Z Q ∗ ( x + hz ; θ )+ C h q Q ∗ ( x + hz ; θ ) − C h q f ( y | x + hz ) dy ! I ( z ∈ K ) f ( x + hz ) dz # 1 / 2 | dK q ( t ) | = O P (1) . ✷ Appendix B: Pr oofs of intermediar y resul ts B.1. Pro of of Lemma A.1. Recall U ( X − x ) T b = U ( X − x ) T H − 1 B = U X − x h T B and define e L ( B ; θ ) = L ( b ; θ ) = 1 h d E ℓ α ( Y − U (( X − x ) /h ) T B ) − ℓ α ( Y ) K h ( X − x ) . 27 The change of v a riable x 1 = x + hz g ives e L ( B ; θ ) = 1 h d Z " Z ℓ α ( y − U x 1 − x h T B ) − ℓ α ( y ) ! f ( y | x 1 ) dy # f ( x 1 ) K x 1 − x h dx 1 = Z Z ℓ α ( y − U ( z ) T B ) − ℓ α ( y ) f ( y | x + hz ) dy f ( x + hz ) K ( z ) dz , (B.1) showing that e L ( B ; θ ) is also defined for h = 0. Pr o of of (i). It is sufficient to show that B ∗ ( θ ) = ar g min B ∈ R P e L ( B ; θ ) exists and is unique. Note that B 7→ e L ( B ; θ ) is co nvex by (B.1 ) b ecause ℓ α ( · ) is conv ex. Since lim | t |→ + ∞ ℓ α ( t ) = + ∞ and U ( z ) T B di- verges almost ev e r ywhere w hen k B k div erges, (B.1) gives that lim k B k→ + ∞ e L ( B ; θ ) = + ∞ . Hence e L ( B ; θ ) has a minimum. W e s how that this minim um is unique by showing that B 7→ e L ( B ; θ ) is stric tly conv ex for a ll θ in Θ 0 . W e co mpute the first and sec ond B -der iv atives of e L ( B ; θ ). Eq ua tion (1.2) gives tha t for almost all B , ∂ ℓ α y − U ( z ) T B ∂ B T = 2 I y ≤ U ( z ) T B − α U ( z ) which is bounded for z in the compact K . 
Assumptions F, K and X, the Lebesgue Dominated Convergence Theorem and (B.1) yield that
\begin{align*}
\widetilde{L}^{(1)}(B;\theta) = \frac{\partial\widetilde{L}(B;\theta)}{\partial B^T}
&= 2\int \left[ \int \left\{ I\left( y \le U(z)^TB \right) - \alpha \right\} f(y|x+hz)\,dy \right] f(x+hz)U(z)K(z)\,dz \\
&= 2\int F\left( U(z)^TB\,|\,x+hz \right) f(x+hz)U(z)K(z)\,dz - 2\alpha\int f(x+hz)U(z)K(z)\,dz. \tag{B.2}
\end{align*}
Applying again the Dominated Convergence Theorem yields that
\[
\widetilde{L}^{(2)}(B;\theta) = \frac{\partial^2\widetilde{L}(B;\theta)}{\partial B^T\partial B} = 2\int f\left( U(z)^TB\,|\,x+hz \right) f(x+hz)U(z)U(z)^TK(z)\,dz. \tag{B.3}
\]
For all $A \ne 0$ in $\mathbb{R}^P$, (B.3), Assumptions F, K, X and $x\in\mathcal{X}_0$ give
\begin{align*}
A^T\widetilde{L}^{(2)}(B;\theta)A
&= 2\int f\left( U(z)^TB\,|\,x+hz \right) f(x+hz)\,A^TU(z)U(z)^TA\,K(z)\,dz \\
&= 2\int f\left( U(z)^TB\,|\,x+hz \right) f(x+hz)\left( U(z)^TA \right)^2 K(z)\,dz > 0. \tag{B.4}
\end{align*}
Hence $\widetilde{L}^{(2)}(\cdot;\theta)$ is a positive definite symmetric matrix for all $\theta$ in $\Theta_0$ and $B$ in $\mathbb{R}^P$, so that the strictly convex function $\widetilde{L}(B;\theta)$ achieves its minimum for a unique $B^*(\theta)$.

Proof of (ii). Consider a fixed $\overline{h}$ to be chosen small enough, and let $\widetilde{\Theta}_0$ be the corresponding $\Theta_0$, which is compact. The proof of (i) yields that $B^*(\theta)$ is unique for all $\theta$ in $\widetilde{\Theta}_0$ and is the unique solution of the first-order condition $\widetilde{L}^{(1)}(B;\theta) = 0$, that is
\[
\int F\left( U(z)^TB\,|\,x+hz \right) f(x+hz)U(z)K(z)\,dz = \alpha\int f(x+hz)U(z)K(z)\,dz, \tag{B.5}
\]
see (B.2), so that (A.1) is proved. In particular, $B^*(\alpha;0,x)$ is the unique solution of $\widetilde{L}^{(1)}(B;\alpha,0,x) = 0$. If $h = 0$, the first-order condition (A.1) is equivalent to
\[
\int F\left( U(z)^TB^*(\alpha;0,x)\,|\,x \right) U(z)K(z)\,dz = \alpha\int U(z)K(z)\,dz.
\]
Let $B_0(\alpha|x)^T = (Q(\alpha|x),0,\ldots,0)$ in $\mathbb{R}^P$. Since $U(z)^TB_0(\alpha|x) = Q(\alpha|x)$, $B_0(\alpha|x)$ satisfies the first-order condition above. Hence $B^*(\alpha;0,x) = B_0(\alpha|x)$ by uniqueness.
We now show that $B^*(\theta)$ is continuously differentiable in $\theta$ over $\tilde{\Theta}_0$ and give bounds for $B^*(\theta)$, $\partial \tilde{L}^{(1)}(B^*(\theta);\theta)/\partial\theta^T$ and $\tilde{L}^{(2)}(B^*(\theta);\theta)$. As shown above, $B \mapsto \tilde{L}^{(1)}(B;\theta)$ is continuously differentiable and $\tilde{L}^{(2)}(B;\theta)$ is a symmetric positive definite matrix for all $B$ in $\mathbb{R}^P$ and so has an inverse. Assumptions F, K and X yield that $F(U(z)^T B \,|\, x+hz)$ and $f(x+hz)$ are bounded and have bounded $\theta$-partial derivatives over $\tilde{\Theta}_0$ provided $h$ is small enough. Hence the Dominated Convergence Theorem and (B.2) yield that $\tilde{L}^{(1)}(B;\theta)$ is continuously differentiable in $\theta$ over $\tilde{\Theta}_0$. Then the Implicit Function Theorem (see e.g. Zeidler (1985), p.130) and the first-order condition $\tilde{L}^{(1)}(B^*(\theta);\theta) = 0$ yield that $B^*(\theta)$ is continuously differentiable in $\theta$ over $\tilde{\Theta}_0$, with
$$
\frac{\partial B^*(\theta)}{\partial\theta^T} = - \left[ \tilde{L}^{(2)}(B^*(\theta);\theta) \right]^{-1} \frac{\partial \tilde{L}^{(1)}(B^*(\theta);\theta)}{\partial\theta^T}.
\tag{B.6}
$$
Recall now that $\Theta_0 \subset \tilde{\Theta}_0$ when $h$ tends to 0. Hence continuity of $B^*(\cdot)$, $\partial \tilde{L}^{(1)}(\cdot,\cdot)/\partial\theta^T$ and compactness of $\tilde{\Theta}_0$ give
$$
\begin{aligned}
&\lim_{h \to 0} \sup_{\theta \in \Theta_0} \left\| B^*(\theta) - B^*(\alpha;0,x) \right\| = 0, \\
&\lim_{h \to 0} \sup_{\theta \in \Theta_0} \left\| \frac{\partial \tilde{L}^{(1)}(B^*(\theta);\theta)}{\partial\theta^T} - \frac{\partial \tilde{L}^{(1)}(B^*(\alpha;0,x);\alpha,0,x)}{\partial\theta^T} \right\| = 0.
\end{aligned}
\tag{B.7}
$$
Since the first limit is (A.2), (ii) is proved.

Proof of (iii). We bound the partial derivative (B.6). Observe that (A.2), the expression of $B^*(\alpha;0,x)$, the compactness of $\Theta_0$ and Assumption F yield that there is a compact $\mathcal{B}$ such that $B^*(\theta)$ is in $\mathcal{B}$ for all $\theta$ in $\Theta_0$, provided $h$ is small enough. Then (B.3) and (B.4) give that, uniformly in $\theta$ in $\Theta_0$,
$$
\tilde{L}^{(2)}(B^*(\theta);\theta) \succ C \int_{\mathcal{B}(0,1)} U(z) U(z)^T dz.
$$
Hence (B.6) and (B.7) give
$$
\lim_{h \to 0} \sup_{\theta \in \Theta_0} \left\| \frac{\partial B^*(\theta)}{\partial\theta^T} \right\|
\le \left\| \left( C \int_{\mathcal{B}(0,1)} U(z) U(z)^T dz \right)^{-1} \right\| \lim_{h \to 0} \sup_{\theta \in \Theta_0} \left\| \frac{\partial \tilde{L}^{(1)}(B^*(\theta);\theta)}{\partial\theta^T} \right\| \le C.
\tag{B.8}
$$
Let us now return to the proof of (iii).
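The differentiability results above yield that $\theta \in \Theta_1 \mapsto Q^*(x';\theta) = U((x - x')/h)^T B^*(\theta)$ is continuously differentiable in $\theta$. We have, for all $x$, $x'$ in $\mathcal{X}$ and $h \ge \underline{h}$,
$$
\left\| U\left(\frac{x - x'}{h}\right) \right\| \le \frac{C}{h^p}, \qquad \left\| \frac{\partial}{\partial\theta^T} U\left(\frac{x - x'}{h}\right) \right\| \le \frac{C}{h^{p+1}}.
$$
Hence for $h$ small enough, (A.2) and (B.8) yield that for all $\theta$ in $\Theta_1$ and $x'$ in $\mathcal{X}$,
$$
\begin{aligned}
\left\| \frac{\partial Q^*(x';\theta)}{\partial\theta^T} \right\|
&= \left\| \left[ \frac{\partial}{\partial\theta^T} U\left(\frac{x - x'}{h}\right)^T \right] B^*(\theta) + U\left(\frac{x - x'}{h}\right)^T \frac{\partial B^*(\theta)}{\partial\theta^T} \right\| \\
&\le \left\| \frac{\partial}{\partial\theta^T} U\left(\frac{x - x'}{h}\right) \right\| \left\| B^*(\theta) \right\| + \left\| U\left(\frac{x - x'}{h}\right) \right\| \left\| \frac{\partial B^*(\theta)}{\partial\theta^T} \right\|
\le C h^{-p} \left( 1 + h^{-1} \right).
\end{aligned}
$$
The Taylor inequality shows that (iii) is proved.

Proof of (iv). The change of variable $x' = x + hz$ shows that it is sufficient to prove that, for all $\theta$ in $\Theta_0$ and $z$ in $\mathcal{K}$,
$$
f\left( Q^*(x+hz;\theta) \,|\, x+hz \right) \ge C \quad \text{with} \quad f\left( Q^*(x+hz;\theta) \,|\, x+hz \right) = f\left( U(z)^T B^*(\theta) \,|\, x+hz \right),
$$
which is true for $h$ small enough by (A.2) and under Assumption F, which gives that $f(y|x) \ge C > 0$ for $y$ in any compact subset of $\mathbb{R}$ and any $x$ in $\mathcal{X}_0$. $\square$

B.2. Proof of Proposition A.1.

The proof of the Proposition uses the two following lemmas. In what follows, the stochastic processes $R(\cdot;\cdot)$, $R^1(\cdot;\cdot)$ and $R^2(\cdot;\cdot)$ have the same distribution as the $R_i(\cdot;\cdot)$, $R_i^1(\cdot;\cdot)$ and $R_i^2(\cdot;\cdot)$ in (A.9), (A.10) and (A.11). Define also
$$
\delta(\beta,\theta) = U\left(\frac{X-x}{h}\right)^T \frac{\beta}{(nh^d)^{1/2}}.
\tag{B.9}
$$

Lemma B.1. Under Assumptions F, K and X, we have
$$
\mathrm{Var}\left( R(\beta,\epsilon;\theta) \right) \le C \frac{\|\epsilon\|^2 \left( \|\beta\| + \|\epsilon\| \right)}{n (nh^d)^{1/2}}.
$$

Proof of Lemma B.1. Observe that $\ell_\alpha(t) = 2 \int_0^t (\alpha - I(z \le 0))\,dz$. Hence (A.9) and (B.9) yield
$$
R(\beta,\epsilon;\theta) = 2 K_h(X-x) \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} \left( I(Y \le Q^*(X;\theta) + t) - I(Y \le Q^*(X;\theta)) \right) dt.
\tag{B.10}
$$
The Cauchy-Schwarz inequality gives
$$
\begin{aligned}
R(\beta,\epsilon;\theta)^2
&= 4 K_h(X-x)^2 \left( \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} \left( I(Y \le Q^*(X;\theta) + t) - I(Y \le Q^*(X;\theta)) \right) dt \right)^2 \\
&\le 4 K_h(X-x)^2 |\delta(\epsilon,\theta)| \left| \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} \left( I(Y \le Q^*(X;\theta) + t) - I(Y \le Q^*(X;\theta)) \right)^2 dt \right| \\
&\le 4 K_h(X-x)^2 |\delta(\epsilon,\theta)| \left| \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} I\left( |Y - Q^*(X;\theta)| < |t| \right) dt \right|.
\end{aligned}
$$
Hence Assumption F and (B.9) give
$$
\begin{aligned}
E\left[ R^2(\beta,\epsilon;\theta) \,|\, X \right]
&\le 4 K_h(X-x)^2 |\delta(\epsilon,\theta)| \left| \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} \left[ \int I\left( |y - Q^*(X;\theta)| < |t| \right) f(y|X)\,dy \right] dt \right| \\
&\le 4 K_h(X-x)^2 \|f(\cdot|\cdot)\|_\infty |\delta(\epsilon,\theta)| \cdot 2 \left| \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} |t|\,dt \right| \\
&\le C K_h(X-x)^2 \delta(\epsilon,\theta)^2 \left( |\delta(\beta,\theta)| + |\delta(\epsilon,\theta)| \right) \\
&\le C K^2\left(\frac{X-x}{h}\right) \left\| U\left(\frac{X-x}{h}\right) \right\|^3 \frac{\|\epsilon\|^2 \left( \|\beta\| + \|\epsilon\| \right)}{(nh^d)^{3/2}}.
\end{aligned}
$$
Then, under Assumptions K and X,
$$
\begin{aligned}
\mathrm{Var}\left( R(\beta,\epsilon;\theta) \right) \le E\left[ R^2(\beta,\epsilon;\theta) \right] = E\left[ E\left[ R^2(\beta,\epsilon;\theta) \,|\, X \right] \right]
&\le C \frac{\|\epsilon\|^2 \left( \|\beta\| + \|\epsilon\| \right)}{(nh^d)^{3/2}} \int K^2\left(\frac{x' - x}{h}\right) \left\| U\left(\frac{x' - x}{h}\right) \right\|^3 f_X(x')\,dx' \\
&\le C \frac{\|\epsilon\|^2 \left( \|\beta\| + \|\epsilon\| \right)}{(nh^d)^{3/2}} h^d \int K^2(z) \|U(z)\|^3 f_X(x+hz)\,dz \\
&\le C \frac{\|\epsilon\|^2 \left( \|\beta\| + \|\epsilon\| \right)}{n (nh^d)^{1/2}}. \qquad \square
\end{aligned}
$$

Define
$$
\mathcal{F} = \mathcal{F}_{t_\beta, t_\epsilon, \Theta_1} = \left\{ R(\beta,\epsilon;\theta),\ (\beta,\epsilon,\theta) \in \mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon) \times \Theta_1 \right\}.
$$
The next lemma studies coverings of $\mathcal{F}$ with brackets $[\underline{R}, \overline{R}]$. Recall that the bracket $[\underline{R}, \overline{R}] = [\underline{R}(X,Y), \overline{R}(X,Y)]$ is the set of random variables $r = r(X,Y)$ such that $\underline{R} \le r \le \overline{R}$ almost surely.

Lemma B.2. Under Assumptions F, K and X, and if $t_\beta + t_\epsilon \ge 1$ and $n$ is large enough:

(i) There are some $\sigma^2$ and $w$, with
$$
\sigma^2 \asymp \frac{t_\epsilon^2 (t_\epsilon + t_\beta)}{n (nh^d)^{1/2}}, \qquad w \asymp \frac{t_\beta + t_\epsilon}{(nh^d)^{1/2}},
$$
such that for all integers $k \ge 2$ and all $(\beta,\epsilon,\theta)$ in $\mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon) \times \Theta_1$,
$$
E\left[ \left| R(\beta,\epsilon;\theta) - E[R(\beta,\epsilon;\theta)] \right|^k \right] \le \frac{k!}{2} w^{k-2} \sigma^2.
$$

(ii) Let $\tau$ in $(0,1)$ be a bracket length. There is a set of brackets $\mathcal{I}_\tau = \left\{ [\underline{R}_{j,\tau}, \overline{R}_{j,\tau}],\ 1 \le j \le e^{H(\tau)} \right\}$ such that
$$
\mathcal{F} \subset \bigcup_{1 \le j \le e^{H(\tau)}} [\underline{R}_{j,\tau}, \overline{R}_{j,\tau}], \qquad
E\left[ \left| \overline{R}_{j,\tau} - \underline{R}_{j,\tau} \right|^k \right] \le \frac{k!}{2} w^{k-2} \tau^2
$$
for all integers $k \ge 2$ and all $j$ in $[1, e^{H(\tau)}]$, and
$$
H(\tau) \le C \log\left( \frac{n (t_\beta + t_\epsilon)}{\tau} \right) \quad \text{for all } \tau,\ t_\beta \text{ and } t_\epsilon.
$$

Proof of Lemma B.2. Define, for $\beta$ in $\mathbb{R}^P$,
$$
\tilde{R}(\beta;\theta) = 2 K_h(X-x) \int_0^{\delta(\beta,\theta)} \left( I(Y \le Q^*(X;\theta) + u) - I(Y \le Q^*(X;\theta)) \right) du.
$$
Let $\mathrm{sgn}(t) = I(t \ge 0) - I(t < 0)$. Observe that $\tilde{R}(\beta;\theta) \ge 0$ with
$$
\begin{aligned}
\tilde{R}(\beta;\theta)
&= 2 K_h(X-x) \int_0^{|\delta(\beta,\theta)|} \left| I(Y \le Q^*(X;\theta) + \mathrm{sgn}(\delta(\beta,\theta)) u) - I(Y \le Q^*(X;\theta)) \right| du \\
&= 2 K_h(X-x) |\delta(\beta,\theta)| \int_0^1 \left| I(Y \le Q^*(X;\theta) + \delta(\beta,\theta) v) - I(Y \le Q^*(X;\theta)) \right| dv \\
&= 2 K_h(X-x) |\delta(\beta,\theta)| \int_0^1 I\left( Y - Q^*(X;\theta) \text{ lies between } 0 \text{ and } \delta(\beta,\theta) v \right) dv.
\end{aligned}
\tag{B.11}
$$
(B.10) and $\delta(\beta,\theta) + \delta(\epsilon,\theta) = \delta(\beta+\epsilon,\theta)$ give
$$
R(\beta,\epsilon;\theta) = \tilde{R}(\beta+\epsilon;\theta) - \tilde{R}(\beta;\theta).
\tag{B.12}
$$
It also follows from (B.9) and Assumption K that, for all $\beta$ in $\mathcal{B}(0, t_\beta + t_\epsilon)$ and all $\theta$ in $\Theta_1$,
$$
\left| \tilde{R}(\beta;\theta) \right| \le 2 \left\| U\left(\frac{X-x}{h}\right) \right\| K\left(\frac{X-x}{h}\right) \frac{\|\beta\|}{(nh^d)^{1/2}} \le \frac{w}{2}, \qquad w \asymp \frac{t_\beta + t_\epsilon}{(nh^d)^{1/2}}.
\tag{B.13}
$$
Part (i) follows from Lemma B.1 and (B.12), which give
$$
\begin{aligned}
E\left[ \left| R(\beta,\epsilon;\theta) - E[R(\beta,\epsilon;\theta)] \right|^k \right]
&= E\left[ \left| \tilde{R}(\beta+\epsilon;\theta) - E[\tilde{R}(\beta+\epsilon;\theta)] - \left( \tilde{R}(\beta;\theta) - E[\tilde{R}(\beta;\theta)] \right) \right|^{k-2} \left| R(\beta,\epsilon;\theta) - E[R(\beta,\epsilon;\theta)] \right|^2 \right] \\
&\le \left( 2 \times \frac{w}{2} \right)^{k-2} \mathrm{Var}\left( R(\beta,\epsilon;\theta) \right) \le w^{k-2} \sigma^2.
\end{aligned}
$$
The proof of part (ii) is divided into three steps. Let $\tilde{\mathcal{F}}_t$ be $\{ \tilde{R}(\beta;\theta),\ (\beta,\theta) \in \mathcal{B}(0,t) \times \Theta_1 \}$. For the sake of brevity we abbreviate $[\underline{R}_{j,\tau}, \overline{R}_{j,\tau}]$ into $[\underline{R}_j, \overline{R}_j]$.

Step 1: Coverings of $\mathcal{F}$ and $\tilde{\mathcal{F}}_t$, $t = t_\beta + t_\epsilon \ge 1$. We show in this step that it is sufficient to find a covering of $\tilde{\mathcal{F}}_t$ with $e^{H(\tau)}$ brackets, $H(\tau) = H(\tau;t)$, satisfying
$$
E\left[ \left| \overline{R}_j - \underline{R}_j \right|^k \right] \le \frac{k!}{8} \left( \frac{w}{2} \right)^{k-2} \tau^2,
\tag{B.14}
$$
$$
H(\tau) \le C \log\left( \frac{nt}{\tau} \right).
\tag{B.15}
$$
Indeed, consider two such coverings of $\tilde{\mathcal{F}}_{t_\beta}$ and $\tilde{\mathcal{F}}_{t_\beta + t_\epsilon}$,
$$
\tilde{\mathcal{F}}_{t_\beta} \subset \bigcup_{1 \le j \le e^{H_1(\tau)}} \left[ \underline{R}^1_j, \overline{R}^1_j \right], \qquad
\tilde{\mathcal{F}}_{t_\beta + t_\epsilon} \subset \bigcup_{1 \le j \le e^{H_2(\tau)}} \left[ \underline{R}^2_j, \overline{R}^2_j \right],
$$
with $H_1(\tau) \le H_2(\tau) = H(\tau, t_\beta + t_\epsilon)$. Consider an $R(\beta,\epsilon;\theta)$ in $\mathcal{F}$. Since $\tilde{R}(\beta;\theta) \in [\underline{R}^1_{j_1}, \overline{R}^1_{j_1}]$ and $\tilde{R}(\beta+\epsilon;\theta) \in [\underline{R}^2_{j_2}, \overline{R}^2_{j_2}]$ for some $j_1$ and $j_2$, (B.12) implies that $R(\beta,\epsilon;\theta) \in [\underline{R}^2_{j_2} - \overline{R}^1_{j_1}, \overline{R}^2_{j_2} - \underline{R}^1_{j_1}]$. Hence these $e^{H'(\tau)}$ brackets form a covering of $\mathcal{F}$ with, using (B.14) and (B.15),
$$
\begin{aligned}
E\left[ \left| \left( \overline{R}^2_{j_2} - \underline{R}^1_{j_1} \right) - \left( \underline{R}^2_{j_2} - \overline{R}^1_{j_1} \right) \right|^k \right]
&\le 2^{k-1} \left( E\left[ \left| \overline{R}^2_{j_2} - \underline{R}^2_{j_2} \right|^k \right] + E\left[ \left| \overline{R}^1_{j_1} - \underline{R}^1_{j_1} \right|^k \right] \right) \\
&\le 2^k \frac{k!}{8} \left( \frac{w}{2} \right)^{k-2} \tau^2 = \frac{k!}{2} w^{k-2} \tau^2, \\
H'(\tau) = H_1(\tau) + H_2(\tau) &\le C \log\left( \frac{n (t_\beta + t_\epsilon)}{\tau} \right).
\end{aligned}
$$

Step 2: Preliminary results for the construction of a covering of $\tilde{\mathcal{F}}_t$. We bound the increments of $(\beta,\theta) \mapsto Q^*(X;\theta)$, $K_h(X-x)$, $\delta(\beta,\theta)$. Lemma A.1-(iii) gives that, for all $\theta$, $\theta'$ in $\Theta_1$,
$$
\left| Q^*(X;\theta) - Q^*(X;\theta') \right| \le C h^{-p} (1 + h^{-1}) \|\theta - \theta'\|.
$$
Under Assumption K,
$$
\left| K\left(\frac{X-x}{h}\right) - K\left(\frac{X-x'}{h'}\right) \right|
\le C \left( \left\| \frac{x - x'}{h'} \right\| + \|X - x\| \left| \frac{1}{h} - \frac{1}{h'} \right| \right)
\le C \left( \frac{1}{h} \|x - x'\| + \frac{1}{h^2} |h - h'| \right)
\le \frac{C}{h^2} \|\theta - \theta'\|.
$$
For the increments of $\delta(\beta,\theta)$, define $U = U(X-x)$, $U' = U(X-x')$, $H' = H(h')$. This gives
$$
\begin{aligned}
\left| \delta(\beta,\theta) - \delta(\beta',\theta') \right|
&= \left| \frac{U^T H^{-1}}{(nh^d)^{1/2}} (\beta - \beta') + \frac{(U - U')^T H^{-1}}{(nh^d)^{1/2}} \beta' + U'^T \left( \frac{H^{-1}}{(nh^d)^{1/2}} - \frac{H'^{-1}}{(nh'^d)^{1/2}} \right) \beta' \right| \\
&\le C \frac{\|H^{-1}\|}{(nh^d)^{1/2}} \|\beta - \beta'\| + C \|x - x'\| \frac{\|H^{-1}\|}{(nh^d)^{1/2}} \|\beta'\| + C \|\beta'\| \left\| \frac{H^{-1}}{(nh^d)^{1/2}} - \frac{H'^{-1}}{(nh'^d)^{1/2}} \right\| \\
&\le \frac{C (1+t)}{h^p (nh^d)^{1/2}} \left( \|\beta - \beta'\| + \|x - x'\| + \frac{1}{h} |h - h'| \right)
\le \frac{C (1+t)}{h^{p+1} (nh^d)^{1/2}} \left( \|\beta - \beta'\| + \|\theta - \theta'\| \right).
\end{aligned}
$$

Step 3: Construction of the covering of $\tilde{\mathcal{F}}_t$. Define
$$
\rho(q,\delta) = \left| I(q \le \delta) - I(q \le 0) \right| = I(q \in (0,\delta]) I(\delta \ge 0) + I(q \in [\delta,0)) I(\delta < 0), \qquad
r(q,\delta) = \int_0^1 \rho(q, \delta v)\,dv.
$$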
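Hence (B.11) shows
$$
\tilde{R}(\beta;\theta) = 2 K_h(X-x) |\delta(\beta,\theta)|\, r\left( Y - Q^*(X;\theta), \delta(\beta,\theta) \right).
$$
For any $\eta > 0$, there exist functions $\underline{\rho}(q,\delta) = \underline{\rho}_\eta(q,\delta)$ and $\overline{\rho}(q,\delta) = \overline{\rho}_\eta(q,\delta)$ and an open set $D = D_\eta \subset \mathbb{R}^2$ such that

$\rho$-(i) $0 \le \underline{\rho}(q,\delta) \le \rho(q,\delta) \le \overline{\rho}(q,\delta) \le 1$ for all $(q,\delta)$, with $\underline{\rho}(q,\delta) = \rho(q,\delta) = \overline{\rho}(q,\delta)$ if $(q,\delta) \in \mathbb{R}^2 \setminus D_\eta$,

$\rho$-(ii) $\sup_{(q,\delta) \in D_\eta} \left( \left| \frac{\partial \underline{\rho}(q,\delta)}{\partial q} \right| + \left| \frac{\partial \underline{\rho}(q,\delta)}{\partial \delta} \right| + \left| \frac{\partial \overline{\rho}(q,\delta)}{\partial q} \right| + \left| \frac{\partial \overline{\rho}(q,\delta)}{\partial \delta} \right| \right) \le C \eta^{-1/2}$,

$\rho$-(iii) $D \subset D' = \left\{ (q,\delta) \in \mathbb{R}^2;\ |q| \le C \eta^{-1/2} \text{ or } |q - \delta| \le C \eta^{-1/2} \right\}$.

Define $\underline{r}(q,\delta) = \int_0^1 \underline{\rho}(q, v\delta)\,dv$, $\overline{r}(q,\delta) = \int_0^1 \overline{\rho}(q, v\delta)\,dv$ and
$$
\begin{aligned}
\underline{R}(\beta,\theta) &= 2 K_h(X-x) |\delta(\beta,\theta)|\, \underline{r}\left( Y - Q^*(X;\theta), \delta(\beta,\theta) \right), \\
\overline{R}(\beta,\theta) &= 2 K_h(X-x) |\delta(\beta,\theta)|\, \overline{r}\left( Y - Q^*(X;\theta), \delta(\beta,\theta) \right).
\end{aligned}
$$
Since $K(\cdot) \ge 0$, $\rho$-(i) gives that these functions are such that
$$
\underline{R}(\beta,\theta) \le \tilde{R}(\beta,\theta) \le \overline{R}(\beta,\theta).
\tag{B.16}
$$
We now bound $\underline{R}(\beta,\theta) - \underline{R}(\beta',\theta')$ and $\overline{R}(\beta,\theta) - \overline{R}(\beta',\theta')$. We have
$$
\begin{aligned}
\left| \underline{R}(\beta,\theta) - \underline{R}(\beta',\theta') \right|
&\le 2 \left| K_h(X-x) - K_{h'}(X-x') \right| |\delta(\beta,\theta)|\, \underline{r}\left( Y - Q^*(X;\theta), \delta(\beta,\theta) \right) \\
&\quad + 2 K_{h'}(X-x') \left| \delta(\beta,\theta) - \delta(\beta',\theta') \right| \underline{r}\left( Y - Q^*(X;\theta), \delta(\beta,\theta) \right) \\
&\quad + 2 K_{h'}(X-x') |\delta(\beta',\theta')| \left| \underline{r}\left( Y - Q^*(X;\theta), \delta(\beta,\theta) \right) - \underline{r}\left( Y - Q^*(X;\theta'), \delta(\beta',\theta') \right) \right|.
\end{aligned}
$$
Hence Step 2, $\rho$-(i,ii), (B.9) and the Taylor inequality give, for all $(\beta,\theta)$, $(\beta',\theta')$ in $\mathcal{B}(0,t) \times \Theta_1$ and provided $n$ is large enough,
$$
\begin{aligned}
\left| \underline{R}(\beta,\theta) - \underline{R}(\beta',\theta') \right|
&\le C \left( \frac{t}{h^p (nh^d)^{1/2}} \frac{\|\theta - \theta'\|}{h^2} + \frac{1+t}{h^{p+1} (nh^d)^{1/2}} \left( \|\theta - \theta'\| + \|\beta - \beta'\| \right) \right) \\
&\quad + C \eta^{-1/2} \left( \frac{\|\theta - \theta'\|}{h^{p+1}} + \frac{1+t}{h^{p+1} (nh^d)^{1/2}} \left( \|\theta - \theta'\| + \|\beta - \beta'\| \right) \right) \\
&\le C \left( 1 + \eta^{-1/2} \right) \frac{1+t}{h^{p+2}} \left( \|\theta - \theta'\| + \|\beta - \beta'\| \right).
\end{aligned}
$$
Arguing symmetrically gives
$$
\left| \overline{R}(\beta,\theta) - \overline{R}(\beta',\theta') \right| \le C \left( 1 + \eta^{-1/2} \right) \frac{1+t}{h^{p+2}} \left( \|\theta - \theta'\| + \|\beta - \beta'\| \right).
$$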
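We now construct the brackets. Recall that there is a covering of $\mathcal{B}(0,t) \times \Theta_1$ with $N$ balls $\mathcal{B}((\beta_j,\theta_j), \eta)$, $\theta_j = (\alpha_j, h_j, x_j)$, with center $(\beta_j,\theta_j)$ and radius $\eta$ such that
$$
N \le \max\left( 1, \frac{C t^P}{\eta^{P+d+2}} \right),
\tag{B.17}
$$
see van de Geer (1999, p.20). Define
$$
\begin{aligned}
\underline{R}'_j &= \underline{R}(\beta_j,\theta_j) - C \eta \left( 1 + \eta^{-1/2} \right) \frac{1+t}{h^{p+2}}, &
\overline{R}'_j &= \overline{R}(\beta_j,\theta_j) + C \eta \left( 1 + \eta^{-1/2} \right) \frac{1+t}{h^{p+2}}, \\
\underline{R}_j &= \max\left( 0, \underline{R}'_j \right), &
\overline{R}_j &= \min\left( \frac{w}{2}, \overline{R}'_j \right).
\end{aligned}
\tag{B.18}
$$
Bounding $\underline{R}(\beta,\theta) - \underline{R}_j$ and $\overline{R}(\beta,\theta) - \overline{R}_j$ for $(\beta,\theta)$ in $\mathcal{B}((\beta_j,\theta_j), \eta)$, (B.16) and (B.13) give
$$
\underline{R}'_j \le \underline{R}_j \le \tilde{R}(\beta,\theta) \le \overline{R}_j \le \overline{R}'_j.
\tag{B.19}
$$
It then follows that $\left\{ [\underline{R}_j, \overline{R}_j],\ j = 1, \ldots, N \right\}$ is a covering of $\tilde{\mathcal{F}}_t$ with, since $0 \le \underline{R}_j \le \overline{R}_j \le w/2$,
$$
\left| \overline{R}_j - \underline{R}_j \right| \le \frac{w}{2} \asymp \frac{C t}{(nh^d)^{1/2}}.
\tag{B.20}
$$
We now bound $E\left[ (\overline{R}_j - \underline{R}_j)^2 \right]$ and $E\left[ |\overline{R}_j - \underline{R}_j|^k \right]$. (B.19), $\rho$-(i,iii), (B.9) and Assumptions F, K give
$$
\begin{aligned}
E\left[ \left( \overline{R}_j - \underline{R}_j \right)^2 \right]
&\le E\left[ \left( \overline{R}'_j - \underline{R}'_j \right)^2 \right]
\le 2 E\left[ \left( \overline{R}(\beta_j,\theta_j) - \underline{R}(\beta_j,\theta_j) \right)^2 \right] + C \eta^2 \left( 1 + \eta^{-1/2} \right)^2 \frac{(1+t)^2}{h^{2(p+2)}} \\
&\le 8 E\left[ K^2_{h_j}(X - x_j)\, \delta^2(\beta_j,\theta_j) \left( \overline{r}\left( Y - Q^*(X;\theta_j), \delta(\beta_j,\theta_j) \right) - \underline{r}\left( Y - Q^*(X;\theta_j), \delta(\beta_j,\theta_j) \right) \right)^2 \right] + \frac{C (1+t)^2}{h^{2(p+2)}} \left( \eta^2 + \eta \right) \\
&\le 8 E\left[ K^2_{h_j}(X - x_j)\, \delta^2(\beta_j,\theta_j) \int \left( \int_0^1 I\left( (y - Q^*(X;\theta_j), v \delta(\beta_j,\theta_j)) \in D \right) dv \right)^2 f(y|X)\,dy \right] + \frac{C (1+t)^2}{h^{2(p+2)}} \left( \eta^2 + \eta \right) \\
&\le 8 h_j^d \frac{\|\beta_j\|^2}{n h_j^d} \int K^2(z) \|U(z)\|^2 \left[ \int \left( \int_0^1 I\left( (y - Q^*(x_j + h_j z;\theta_j), v \delta(\beta_j,\theta_j)) \in D \right) dv \right) f(y|x_j + h_j z)\,dy \right] f(x_j + h_j z)\,dz \\
&\quad + \frac{C (1+t)^2}{h^{2(p+2)}} \left( \eta^2 + \eta \right) \\
&\le \frac{C (1+t)^2}{h^{2(p+2)}} \left( \eta^2 + \eta + \eta^{1/2} \right).
\end{aligned}
$$
This together with (B.20) gives, for any integer $k \ge 2$,
$$
E\left[ \left| \overline{R}_j - \underline{R}_j \right|^k \right] \le \left( \frac{w}{2} \right)^{k-2} E\left[ \left( \overline{R}_j - \underline{R}_j \right)^2 \right]
\le \frac{k!}{8} \left( \frac{w}{2} \right)^{k-2} \times \frac{C (1+t)^2}{h^{2(p+2)}} \left( \eta^2 + \eta + \eta^{1/2} \right).
$$
Hence (B.14) holds if $\eta$ satisfies
$$
\eta = \frac{C}{3} \min\left\{ \left( \frac{h^{2(p+2)}}{(1+t)^2} \right)^{1/2} \tau,\ \frac{h^{2(p+2)}}{(1+t)^2} \tau^2,\ \left( \frac{h^{2(p+2)}}{(1+t)^2} \right)^2 \tau^4 \right\}.
$$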
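Recall now that $\tau < 1$, $t \ge 1$ and that $h \ge C n^{-1/d}$ under Assumption K. The bound (B.17) for $N = \exp(H(\tau))$ gives, taking $\eta$ as above,
$$
e^{H(\tau)} \le \max\left( 1, \frac{C t^P}{\left[ \min\left\{ \left( \frac{h^{2(p+2)}}{(1+t)^2} \right)^{1/2} \tau,\ \frac{h^{2(p+2)}}{(1+t)^2} \tau^2,\ \left( \frac{h^{2(p+2)}}{(1+t)^2} \right)^2 \tau^4 \right\} \right]^{P+d+2}} \right)
\le \max\left( 1, C \left( \frac{t^3 n^{(4p+4)/d}}{\tau} \right)^{P+d+2} \right).
$$
It then follows, for $n$ large enough,
$$
H(\tau) \le (P+d+2) \max\left( 0, \log\left( \frac{C t^3 n^{(4p+4)/d}}{\tau} \right) \right) = C \left( 3 \log t + \frac{4p+4}{d} \log n - \log \tau \right) \le C \log\left( \frac{tn}{\tau} \right),
$$
and (B.15) is proved. This ends the proof of the Lemma. $\square$

Let us now return to the proof of Proposition A.1. Define $X = (X_1, \ldots, X_n)$. The definition of $R^1_n$ and (A.10) give
$$
\begin{aligned}
E\left[ \sup_{(\beta,\epsilon,\theta) \in \mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon) \times \Theta_1} \left| R^1_n(\beta,\epsilon;\theta) \right| \right]
&= E\left[ \sup_{(\beta,\epsilon,\theta)} \left| \sum_{i=1}^n \left( R_i(\beta,\epsilon;\theta) - E[R_i(\beta,\epsilon;\theta) \,|\, X] \right) \right| \right] \\
&\le E\left[ \sup_{(\beta,\epsilon,\theta)} \left| \sum_{i=1}^n \left( R_i(\beta,\epsilon;\theta) - E[R_i(\beta,\epsilon;\theta)] \right) \right| \right]
 + E\left[ \sup_{(\beta,\epsilon,\theta)} \left| E\left[ \sum_{i=1}^n \left( R_i(\beta,\epsilon;\theta) - E[R_i(\beta,\epsilon;\theta)] \right) \,\Big|\, X \right] \right| \right] \\
&\le 2 E\left[ \sup_{(\beta,\epsilon,\theta)} \left| \sum_{i=1}^n \left( R_i(\beta,\epsilon;\theta) - E[R_i(\beta,\epsilon;\theta)] \right) \right| \right],
\end{aligned}
$$
where every supremum is over $(\beta,\epsilon,\theta) \in \mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon) \times \Theta_1$. Let $H(\cdot)$, $\sigma$ and $w$ be as in Lemma B.2. Recall that $t_\beta + t_\epsilon \ge 1$ and that $\sigma < 1 \le n(t_\beta + t_\epsilon)$ for $n$ large enough under the assumptions for $t_\beta$ and $t_\epsilon$ of the Proposition. It follows from Massart (2007, Theorem 6.8) that
$$
E\left[ \sup_{(\beta,\epsilon,\theta) \in \mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon) \times \Theta_1} \left| \sum_{i=1}^n \left( R_i(\beta,\epsilon;\theta) - E[R_i(\beta,\epsilon;\theta)] \right) \right| \right]
\le C \left( n^{1/2} \int_0^\sigma H(u)^{1/2}\,du + (w + \sigma) H(\sigma) \right).
$$
Since $\sigma < 1$, Lemma B.2 gives, for all $u$ in $(0,\sigma]$, $H(u) \le C \log(n(t_\beta + t_\epsilon)/u)$. This gives
$$
\begin{aligned}
n^{1/2} \int_0^\sigma H^{1/2}(u)\,du
&\le (n\sigma)^{1/2} \left( \int_0^\sigma H(u)\,du \right)^{1/2}
\le C (n\sigma)^{1/2} \left( \int_0^\sigma \log\left( \frac{n(t_\beta + t_\epsilon)}{u} \right) du \right)^{1/2} \\
&= C (n\sigma)^{1/2} \left( \sigma \left( \log\left( \frac{(t_\beta + t_\epsilon) n}{\sigma} \right) + 1 \right) \right)^{1/2}
\le C n^{1/2} \sigma \log^{1/2}\left( \frac{(t_\beta + t_\epsilon) n}{\sigma} \right).
\end{aligned}
$$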
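For completeness, the entropy integral in the last display can be evaluated in closed form. The derivation below is a worked step added for the reader; the shorthand $A = n(t_\beta + t_\epsilon)$ is introduced here only and does not appear in the original argument.

```latex
% Integration by parts, using d/du [ u log(A/u) ] = log(A/u) - 1
% and u log(A/u) -> 0 as u -> 0:
\int_0^\sigma \log\frac{A}{u}\,du
  = \Big[\, u \log\frac{A}{u} \,\Big]_0^\sigma + \int_0^\sigma du
  = \sigma \log\frac{A}{\sigma} + \sigma
  = \sigma \Big( \log\frac{A}{\sigma} + 1 \Big).
% Whenever log(A/sigma) >= 1, the additive 1 is at most log(A/sigma), so
\sigma \Big( \log\frac{A}{\sigma} + 1 \Big) \le 2\,\sigma \log\frac{A}{\sigma}.
```

Combined with the Cauchy-Schwarz step $\int_0^\sigma H^{1/2}(u)\,du \le \sigma^{1/2} \left( \int_0^\sigma H(u)\,du \right)^{1/2}$, this gives a bound of order $n^{1/2} \sigma \log^{1/2}(A/\sigma)$, with the factor 2 absorbed into the generic constant $C$.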
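The order for $\sigma$ given in Lemma B.2, the assumption on $t_\beta + t_\epsilon$ and Assumption K give
$$
\log\left( \frac{n (t_\beta + t_\epsilon)}{\sigma} \right)
\le C \log\left( \frac{n^{3/2} (nh^d)^{1/4} (t_\beta + t_\epsilon)^{1/2}}{t_\epsilon} \right)
\le C \log\left( \frac{n^{3/2} (nh^d)^{1/2}}{\log^{1/2} n} \right)
\le C \log n.
$$
Substituting gives
$$
\begin{aligned}
E\left[ \sup_{(\beta,\epsilon,\theta) \in \mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon) \times \Theta_1} \left| R^1_n(\beta,\epsilon;\theta) \right| \right]
&\le C \left( n^{1/2} \sigma \log^{1/2} n + (\sigma + w) \log n \right) \\
&\le C \frac{t_\epsilon (t_\beta + t_\epsilon)^{1/2}}{(nh^d)^{1/4}} \log^{1/2} n \left( 1 + \log^{1/2} n \left( \frac{1}{n^{1/2}} + \frac{(t_\beta + t_\epsilon)^{1/2}}{t_\epsilon (nh^d)^{1/4}} \right) \right) \\
&\le C \frac{t_\epsilon (t_\beta + t_\epsilon)^{1/2}}{(nh^d)^{1/4}} \log^{1/2} n. \qquad \square
\end{aligned}
$$

B.3. Proof of Proposition A.2.

The proof of Proposition A.2 follows the same steps as the proof of Proposition A.1 and we only sketch it. The integral expression of $R(\beta,\epsilon;\theta)$ in (B.10) and the expression (A.11) of $R^2(\beta,\epsilon;\theta)$ give
$$
R^2(\beta,\epsilon;\theta) = 2 K_h(X-x) \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} \left( F(Q^*(X;\theta) + u \,|\, X) - F(Q^*(X;\theta) \,|\, X) \right) du - \frac{1}{2nh^d} \epsilon^T J(\theta) (\epsilon + 2\beta).
$$
The definition (3.6) of $J(\theta)$ gives
$$
\begin{aligned}
R^2(\beta,\epsilon;\theta)
&= 2 K_h(X-x) \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} \left( F(Q^*(X;\theta) + u \,|\, X) - F(Q^*(X;\theta) \,|\, X) - u f(Q^*(X;\theta) \,|\, X) \right) du \\
&= 2 K_h(X-x) \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} u \left[ \int_0^1 \left( f(Q^*(X;\theta) + vu \,|\, X) - f(Q^*(X;\theta) \,|\, X) \right) dv \right] du.
\end{aligned}
$$
Define
$$
r(\beta;\theta) = 2 K_h(X-x) \int_0^{\delta(\beta,\theta)} u \left[ \int_0^1 \left( f(Q^*(X;\theta) + vu \,|\, X) - f(Q^*(X;\theta) \,|\, X) \right) dv \right] du,
$$
which is such that $R^2(\beta,\epsilon;\theta) = r(\beta+\epsilon;\theta) - r(\beta;\theta)$. Since $|f(q+v|x) - f(q|x)| \le L_0 |v|$ under Assumption F, (B.9) gives
$$
\begin{aligned}
\left| R^2(\beta,\epsilon;\theta) \right|
&\le K_h(X-x) L_0 \left| \int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)} u^2\,du \right|
\le C K_h(X-x) |\delta(\epsilon,\theta)| \left( |\delta(\beta,\theta)| + |\delta(\epsilon,\theta)| \right)^2 \\
&\le C \left\| U\left(\frac{X-x}{h}\right) \right\|^3 K\left(\frac{X-x}{h}\right) \frac{\|\epsilon\| \left( \|\beta\| + \|\epsilon\| \right)^2}{(nh^d)^{3/2}}, \\
|r(\beta;\theta)|
&\le C K_h(X-x) |\delta(\beta,\theta)|^3
\le C \left\| U\left(\frac{X-x}{h}\right) \right\|^{3/2} K\left(\frac{X-x}{h}\right) \frac{\|\beta\|^3}{(nh^d)^{3/2}}.
\end{aligned}
\tag{B.21}
$$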
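The latter inequality gives, for all $\beta$ in $\mathcal{B}(0, t_\beta + t_\epsilon)$ and all $\theta$ in $\Theta_1$,
$$
|r(\beta;\theta)| \le \frac{w'}{2}, \qquad w' \asymp \frac{(t_\beta + t_\epsilon)^3}{(nh^d)^{3/2}}.
$$
It follows from (B.21) that, for all $(\beta,\epsilon)$ in $\mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon)$,
$$
\begin{aligned}
\mathrm{Var}\left( R^2(\beta,\epsilon;\theta) \right) \le E\left[ \left( R^2(\beta,\epsilon;\theta) \right)^2 \right]
&\le \left( C \frac{\|\epsilon\| \left( \|\beta\| + \|\epsilon\| \right)^2}{(nh^d)^{3/2}} \right)^2 \int \left\| U\left(\frac{x' - x}{h}\right) \right\|^4 K^2\left(\frac{x' - x}{h}\right) f(x')\,dx' \\
&\le C \frac{\|\epsilon\|^2 \left( \|\beta\| + \|\epsilon\| \right)^4}{(nh^d)^3} h^d \int \|U(z)\|^4 K^2(z)\,dz
\le (\sigma')^2, \qquad \sigma' \asymp \frac{t_\epsilon (t_\beta + t_\epsilon)^2}{n^{3/2} h^d}.
\end{aligned}
$$
Then constructing brackets as in Lemma B.2 and arguing as in the proof of Proposition A.1 give
$$
E\left[ \sup_{(\beta,\epsilon,\theta) \in \mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon) \times \Theta_1} \left| R^2_n(\beta,\epsilon;\theta) - E\left[ R^2_n(\beta,\epsilon;\theta) \right] \right| \right]
\le n^{1/2} \sigma' \log^{1/2}\left( \frac{n (t_\beta + t_\epsilon)}{\sigma'} \right) + (\sigma' + w') \log\left( \frac{n (t_\beta + t_\epsilon)}{\sigma'} \right).
$$
Since (B.21) yields, for all $(\beta,\epsilon,\theta) \in \mathcal{B}(0,t_\beta) \times \mathcal{B}(0,t_\epsilon) \times \Theta_1$,
$$
\left| E\left[ R^2_n(\beta,\epsilon;\theta) \right] \right| = n \left| E\left[ R^2(\beta,\epsilon;\theta) \right] \right|
\le C n E\left[ \left\| U\left(\frac{X-x}{h}\right) \right\|^3 K\left(\frac{X-x}{h}\right) \frac{\|\epsilon\| \left( \|\beta\| + \|\epsilon\| \right)^2}{(nh^d)^{3/2}} \right]
\le C \frac{t_\epsilon (t_\epsilon + t_\beta)^2}{(nh^d)^{1/2}},
$$
substituting gives, using $t_\beta \ge 1$, $t_\beta / t_\epsilon = O\left( nh^d / \log^{1/2} n \right)$ and Assumption K, which ensures $\log\left( n^{5/2} h^d / t_\epsilon \right) = O(\log n)$,
$$
\begin{aligned}
E\left[ \sup_{(\beta,\epsilon,\theta)} \left| R^2_n(\beta,\epsilon;\theta) \right| \right]
&\le E\left[ \sup_{(\beta,\epsilon,\theta)} \left( \left| R^2_n(\beta,\epsilon;\theta) - E\left[ R^2_n(\beta,\epsilon;\theta) \right] \right| + \left| E\left[ R^2_n(\beta,\epsilon;\theta) \right] \right| \right) \right] \\
&\le C \frac{t_\epsilon (t_\beta + t_\epsilon)^2}{nh^d} \left( 1 + \frac{t_\beta + t_\epsilon}{t_\epsilon (nh^d)^{1/2}} \log^{1/2}\left( \frac{n^{5/2} h^d}{t_\epsilon (t_\beta + t_\epsilon)} \right) \right) + C \frac{t_\epsilon (t_\epsilon + t_\beta)^2}{(nh^d)^{1/2}} \\
&\le C \frac{t_\epsilon (t_\epsilon + t_\beta)^2}{(nh^d)^{1/2}}. \qquad \square
\end{aligned}
$$

B.4. Proof of Lemma A.2.

Lemma A.1 (iv) and Assumptions K and F give that there is a $C > 0$ such that, for all $\theta$ in $\Theta_1$ and all $i$,
$$
J_i(\theta) \succ C M_i(\theta), \qquad M_i(\theta) = 2 K_h(X_i - x)\, U\left(\frac{X_i - x}{h}\right) U\left(\frac{X_i - x}{h}\right)^T.
$$
Hence, for all $\theta$ in $\Theta_1$,
$$
\frac{1}{nh^d} \sum_{i=1}^n J_i(\theta) \succ \frac{C}{nh^d} \sum_{i=1}^n M_i(\theta) = M_n(\theta).
\tag{B.22}
$$
The entries of $M_n(\theta)$ are of the form
$$
\frac{C}{nh^d} \sum_{i=1}^n \left( \frac{X_i - x}{h} \right)^{v_1 + v_2} K\left(\frac{X_i - x}{h}\right), \qquad 0 \le |v_1|, |v_2| \le p.
$$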
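Let $M(\theta)$ be the matrix with entries
$$
\frac{C}{h^d} E\left[ \left( \frac{X - x}{h} \right)^{v_1 + v_2} K\left(\frac{X - x}{h}\right) \right], \qquad 0 \le |v_1|, |v_2| \le p.
$$
Arguing as in the proof of Proposition A.1 for each of the entries of $M_n(\theta)$ gives
$$
\sup_{\theta \in \Theta_1} \left\| M_n(\theta) - M(\theta) \right\| = o_P(1).
$$
Assumptions K, F and X give, for all $u$ in $\mathbb{R}^P$, all $x$ in $\mathcal{X}_0$ and $h$ small enough,
$$
\begin{aligned}
u^T M(\theta) u
&= \frac{C}{h^d} E\left[ \sum_{0 \le |v_1|, |v_2| \le p} u_{v_1} u_{v_2} \left( \frac{X - x}{h} \right)^{v_1 + v_2} K\left(\frac{X - x}{h}\right) \right]
= C \sum_{0 \le |v_1|, |v_2| \le p} u_{v_1} u_{v_2} \int z^{v_1 + v_2} K(z) f(x+hz)\,dz \\
&= C \int \left( \sum_{0 \le |v| \le p} u_v z^v \right)^2 K(z) f(x+hz)\,dz
\ge C \int_{\mathcal{B}(0,1)} \left( \sum_{0 \le |v| \le p} u_v z^v \right)^2 dz
\ge C \|u\|^2,
\end{aligned}
$$
where the last bound uses the fact that
$$
u \mapsto \left( \int_{\mathcal{B}(0,1)} \left( \sum_{0 \le |v| \le p} u_v z^v \right)^2 dz \right)^{1/2}
$$
is a norm and that norms over $\mathbb{R}^P$ are equivalent. Hence (B.22) and $\| M_n(\theta) - M(\theta) \| = o_P(1)$ yield that there is a $\gamma > 0$ such that
$$
\inf_{\theta \in \Theta_1} \gamma_n(\theta) \ge \inf_{\theta \in \Theta_1} \inf_{\|u\| = 1} u^T M_n(\theta) u \ge \gamma + o_P(1). \qquad \square
$$

B.5. Proof of Lemma A.3.

The first-order condition (A.1) implies that $E[S_i(\theta)] = 0$. Consider the $v$-th coordinate of $S_i(\theta)$,
$$
S_{v,i}(\theta) = 2 \left\{ I\left( Y_i \le Q^*(X_i;\theta) \right) - \alpha \right\} \left( \frac{X_i - x}{h} \right)^v K\left(\frac{X_i - x}{h}\right).
$$
Hence Assumptions K and X give, uniformly in $\theta \in \Theta_1$ and for all $i$,
$$
\begin{aligned}
\left| \frac{S_{v,i}(\theta)}{(nh^d)^{1/2}} \right| &\le w'', \qquad w'' \asymp (nh^d)^{-1/2}, \\
\mathrm{Var}\left( \frac{S_{v,i}(\theta)}{(nh^d)^{1/2}} \right)
&\le E\left[ \left( \frac{S_{v,i}(\theta)}{(nh^d)^{1/2}} \right)^2 \right]
\le E\left[ \left( \frac{\left( \frac{X_i - x}{h} \right)^v K\left(\frac{X_i - x}{h}\right)}{(nh^d)^{1/2}} \right)^2 \right]
= \frac{h^d}{nh^d} \int \left( z^v K(z) \right)^2 dz \le (\sigma'')^2, \qquad \sigma'' \asymp n^{-1/2}.
\end{aligned}
$$
Hence arguing as in the proof of Proposition A.1 gives, under Assumption K,
$$
E\left[ \sup_{\theta \in \Theta_1} \left| \frac{1}{(nh^d)^{1/2}} \sum_{i=1}^n S_{v,i}(\theta) \right| \right]
= O\left( n^{1/2} \sigma'' \log^{1/2} n + (\sigma'' + w'') \log^{1/2} n \right) = O\left( \log^{1/2} n \right).
$$
The Markov inequality then shows that the Lemma is proved. $\square$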
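The norm-equivalence step used in the proof of Lemma A.2 can be illustrated numerically in the simplest case. The sketch below assumes $d = 1$ and $p = 2$, so that the monomials are $1, z, z^2$ and the unit ball is $[-1, 1]$. These choices and all function names are illustrative and not taken from the paper. It evaluates the quadratic form $u \mapsto \int_{-1}^{1} (\sum_v u_v z^v)^2\,dz$ on random unit vectors and reports the smallest value found.

```python
import math
import random


def gram_matrix(p):
    """G[i][j] = int_{-1}^{1} z^(i+j) dz for the monomials 1, z, ..., z^p (d = 1)."""
    def moment(m):
        # Odd moments vanish by symmetry; even moments equal 2 / (m + 1).
        return 2.0 / (m + 1) if m % 2 == 0 else 0.0
    return [[moment(i + j) for j in range(p + 1)] for i in range(p + 1)]


def quadratic_form(G, u):
    """u' G u, i.e. the integral of (sum_v u_v z^v)^2 over [-1, 1]."""
    size = len(u)
    return sum(u[i] * G[i][j] * u[j] for i in range(size) for j in range(size))


def min_on_unit_sphere(G, n_draws=20000, seed=0):
    """Crude Monte Carlo search for the minimum of u' G u over ||u|| = 1."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_draws):
        u = [rng.gauss(0.0, 1.0) for _ in range(len(G))]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        best = min(best, quadratic_form(G, u))
    return best


G = gram_matrix(2)
print(min_on_unit_sphere(G))  # stays strictly above zero
```

The reported minimum is an upper estimate of the smallest eigenvalue of the moment matrix; its being bounded away from zero is the finite-dimensional fact behind the bound $\ge C\|u\|^2$.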