Quantile Estimation of a General Single-Index Model
The single-index model is one of the most popular semiparametric models in econometrics. In this paper, we define a quantile regression single-index model, which includes the single-index structure for both the conditional mean and the conditional variance.
Authors: Efang Kong, Yingcun Xia
Keywords: Local polynomial fitting; M-regression; Strongly mixing processes; Uniform strong consistency.

Affiliations: Efang Kong, Eurandom, Technische Universiteit Eindhoven, The Netherlands (kong@eurandom.tue.nl); Yingcun Xia, Department of Statistics and Applied Probability, National University of Singapore (staxyc@nus.edu.sg, http://www.stat.nus.edu.sg/~staxyc).

1 Introduction

Regression quantiles, along with the dual methods of regression rank scores, can be considered one of the major statistical breakthroughs of the past decades. Their advantages over other estimation methods have been well investigated. Regression quantile methods provide a much more complete statistical analysis of the stochastic relationships among variables; in addition, they are more robust against possible outliers or extreme values, and can be computed via traditional linear programming methods. Although median regression ideas go back to the 18th century and the work of Laplace, regression quantile methods were first introduced by Koenker and Bassett (1978). The linear regression quantile is very useful, but like linear regression it is not flexible enough to capture complicated relationships. For quantile regression, this disadvantage is even worse.
As an example, consider the popular AR(1)-ARCH(1) model:
$y_t = \alpha_0 + \alpha_1 y_{t-1} + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t, \quad z_t \sim \text{IID}, \quad \sigma_t^2 = \beta_0 + \beta_1\varepsilon_{t-1}^2, \quad \beta_0 > 0, \ \beta_1 \ge 0,$
which cannot be fitted well by the linear quantile model. In this paper, we focus on an important special case where the loss function is specified as
$\rho_\tau(v) = \tau I(v > 0)v + (\tau - 1)I(v \le 0)v, \qquad (1)$
where $0 < \tau < 1$ and $I(\cdot)$ is the indicator function, leading to the $\tau$th quantile regression; see Koenker and Bassett (1978).

In a nonparametric setting, we can state the problem as follows. Suppose $Y$ is the response variable and $X \in \mathbb{R}^d$ are the covariates. For the loss function $\rho_\tau(\cdot)$, we are interested in a function $m_\tau(x)$ such that
$m_\tau(x) = \arg\min E\{\rho_\tau[Y - m(X)] \mid X = x\} \text{ with respect to } m(\cdot) \in L_1. \qquad (2)$
The function $m_\tau(x)$ is called the $\tau$th quantile nonparametric regression function of $Y$ on $X$. The application of nonparametric quantile estimation has been intensively investigated in the literature; see, for example, Koenker (2005) and Kong et al. (2008). As in nonparametric estimation of the conditional mean function, there is the "curse of dimensionality" in estimating the typically multivariate function $m_\tau(\cdot)$. The dimension-reduction approach can thus be applied here, by considering
$m_\tau(\theta^\top x) = \arg\min E\{\rho_\tau(Y - m(\theta^\top X)) \mid X = x\} \text{ with respect to } \theta \in \Theta \text{ and } m(\cdot) \in L_1, \qquad (3)$
where $\Theta = \{\theta : |\theta| = 1\}$. Ideally, we arrive at a single-index quantile model
$Y = m(\theta_0^\top X) + \varepsilon, \quad E(\phi(\varepsilon) \mid X) = 0 \ \text{a.s.}, \qquad (4)$
where $\phi(\cdot)$ is the piecewise derivative of $\rho(\cdot)$ in (1). A typical model is the general single-index model $Y = g(\theta_0^\top X, \varepsilon)$, where $\varepsilon$ is independent of $X$. Under such a model specification, it is easy to see that
$m_\tau(x) = g_\tau(\theta^\top x) \equiv \min_v\{v : P(g(\theta_0^\top x, \varepsilon) \le v) \ge \tau\}.$
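As a quick numerical sanity check (ours, not from the paper), the following sketch verifies that minimizing the empirical risk under the check loss $\rho_\tau$ of (1) over constants recovers the $\tau$th sample quantile, which is the population fact behind (2); all variable names are illustrative.

```python
import numpy as np

def rho(v, tau):
    """Check loss rho_tau(v) = v * (tau - I(v <= 0)), equivalent to (1)."""
    return v * (tau - (v <= 0))

rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)
tau = 0.25

# Minimize the empirical check-loss risk over a grid of candidate constants.
grid = np.linspace(-3, 3, 1201)
risks = [rho(y - c, tau).mean() for c in grid]
c_star = grid[int(np.argmin(risks))]

# The minimizer should coincide (up to grid resolution) with the tau-quantile.
print(abs(c_star - np.quantile(y, tau)) < 0.05)
```

Any minimizer of the empirical check-loss risk over constants is a sample $\tau$-quantile, so the two agree up to the grid spacing.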
For the conditional heteroscedasticity model, where $g(\theta_0^\top X, \varepsilon) = g(\theta_0^\top X)\varepsilon$, we even have $m_\tau(x) = g(\theta_0^\top X)Q_\tau(\varepsilon)$, where $Q_\tau(\varepsilon)$ is the $\tau$th quantile of $\varepsilon$. An interesting special case of this setting is the ARCH(p) model, where $X = (y_{t-1}^2, \ldots, y_{t-p}^2)^\top$ and $Y = y_t$ in a time-series setting.

Our main focus is the estimation of $\theta_0$. Suppose $\{X_i, Y_i\}_{i=1}^n$ are i.i.d. observations from the underlying model (4). We propose to estimate the index parameter $\theta_0$ by
$\hat\theta = \arg\min_{\theta\in\Theta}\min_{a_j,b_j}\sum_{i=1}^n\sum_{j=1}^nK(\theta^\top X_{ij}/h)\,\rho(Y_i - a_j - b_j\theta^\top X_{ij}), \qquad X_{ij} = X_i - X_j, \qquad (5)$
where $K(\cdot)$ is a kernel function and $h$ is a bandwidth. The minimization in (5) can be carried out by iteration. First, for any initial estimate $\vartheta \in \Theta$, denote by $[\hat a_\vartheta(x), \hat b_\vartheta(x)]$ the minimizer of
$\sum_{i=1}^nK(\vartheta^\top X_{ix}/h)\,\rho(Y_i - a - b\vartheta^\top X_{ix}) \text{ with respect to } a \text{ and } b, \qquad (6)$
where $X_{ix} = X_i - x$. The estimate of $\theta_0$ is then updated by
$\hat\theta = \arg\min_{\theta\in\Theta}\sum_{i=1}^n\sum_{j=1}^nK(\vartheta^\top X_{ij}/h)\,\rho\{Y_i - \hat a_\vartheta(X_j) - \hat b_\vartheta(X_j)\theta^\top X_{ij}\}. \qquad (7)$
Repeat (6) and (7) until convergence. The true value $\theta_0$ is then estimated by the standardized final estimate $\hat\theta := \hat\theta/|\hat\theta|$.

2 Numerical studies

Again, the calculation of the above minimization problem can be decomposed into two minimization problems.

• Fixing $\theta = \vartheta$ and $w_{ij}^\vartheta = K_h(\vartheta^\top X_{ij})$, estimate $a_j$ and $d_j$ by minimizing
$\sum_{i=1}^n\rho\{Y_i - a_j - d_j\vartheta^\top X_{ij}\}\,w_{ij}^\vartheta.$

• Fixing $a_j$ and $d_j$, the minimization with respect to $\theta$ can be done as follows. Let
$Y_{ij}^\vartheta = Y_i(w_{ij}^\vartheta)^{1/2} - a_j(w_{ij}^\vartheta)^{1/2}, \qquad X_{ij}^\vartheta = d_jX_{ij}(w_{ij}^\vartheta)^{1/2}.$
Then the problem becomes
$\min_\theta\sum_{i,j=1}^n\rho\{Y_{ij}^\vartheta - \theta^\top X_{ij}^\vartheta\}.$
Suppose the solution to the above problem is $\theta$; standardize it to $\theta := \theta/\|\theta\|$.

Set $\vartheta = \theta$ and repeat the two steps until convergence.
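The reduction of each weighted step above to a plain linear quantile regression relies on the positive homogeneity of $\rho_\tau$, i.e. $\rho_\tau(cv) = c\,\rho_\tau(v)$ for $c \ge 0$, which lets the kernel weights be absorbed into transformed responses and regressors. A minimal numerical check of this identity (our sketch, with illustrative names):

```python
import numpy as np

def rho(v, tau):
    """Check loss rho_tau(v) = v * (tau - I(v <= 0))."""
    return v * (tau - (v <= 0))

rng = np.random.default_rng(1)
v = rng.standard_normal(1000)
w = rng.uniform(0.1, 2.0, size=1000)   # positive kernel weights
tau = 0.5

# Positive homogeneity: rho_tau(w * v) = w * rho_tau(v) for w > 0, so the
# weights can be moved inside the residual, turning each weighted fit into
# an unweighted linear quantile-regression problem.
lhs = rho(w * v, tau)
rhs = w * rho(v, tau)
print(np.allclose(lhs, rhs))
```

This is the degree-one analogue of the familiar square-root reweighting used for least squares.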
Note that both steps are simple linear quantile regression problems, and several efficient algorithms are available; see Koenker (2005).

Example 2.1 (Single-index median regression) Consider the following model:
$y = \exp\{-5(\theta_0^\top X)^2\} + \varepsilon, \qquad (8)$
where $X \sim \Sigma_0^{1/2}X_0$ with $X_0 \sim N(0, I_5)$ and $\Sigma_0 = (0.5^{|i-j|})$. For the noise term, we consider several distributions with both heavy and thin tails. For simplicity, we consider median regression only. As a comparison, we also run MAVE, where a least-squares-type estimation is used. With sample sizes $n = 100, 200$, we carried out 100 replications. The results are listed in Table 1.

Table 1: Estimation errors (and standard errors) for model (8), based on the quadratic loss (MAVE) and the 50% quantile (qMAVE)

Distribution of $\varepsilon$:
n     method   0.05 t(1)         0.1(N(0,1)^4 - 3)   sqrt(5) t(5)/20    N(0,1)/4
100   MAVE     0.3641 (0.3526)   0.3530 (0.3102)     0.0401 (0.0182)    0.0581 (0.0263)
100   qMAVE    0.0902 (0.1074)   0.1512 (0.1957)     0.0833 (0.0785)    0.1146 (0.0651)
200   MAVE     0.3381 (0.3389)   0.2859 (0.2887)     0.0232 (0.0091)    0.0373 (0.0147)
200   qMAVE    0.0681 (0.1415)   0.0581 (0.0698)     0.0402 (0.0173)    0.0652 (0.0272)

The MAVE method with quadratic loss performs very badly when the noise has heavy tails (e.g. $t(1)$) or is highly asymmetric (e.g. $N(0,1)^4$). With the absolute-value loss function, the performance is much better. Even when the noise is thin-tailed and symmetric, qMAVE still performs reasonably well.

3 Assumptions and asymptotic properties

We adopt model (4) throughout and make the additional assumption that $\{(X_i, Y_i)\}_{i=1}^\infty$ are i.i.d. observations. The extension to weakly dependent time series should be straightforward, but complicates matters without adding anything conceptually.
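The simulation design of Example 2.1 can be reproduced along the following lines. This is our sketch: the particular $\theta_0$ is an arbitrary unit vector, since Example 2.1 does not specify it here, and we draw the noise from one of the Table 1 distributions, $N(0,1)/4$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
theta0 = np.ones(d) / np.sqrt(d)   # hypothetical true index (not stated in the paper)

# Design: X = Sigma0^{1/2} X0 with Sigma0 = (0.5^{|i-j|}) and X0 ~ N(0, I_5).
Sigma0 = 0.5 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
L = np.linalg.cholesky(Sigma0)
X = rng.standard_normal((n, d)) @ L.T

# Model (8) with noise N(0,1)/4.
eps = rng.standard_normal(n) / 4
y = np.exp(-5 * (X @ theta0) ** 2) + eps
print(X.shape, y.shape)
```

The same generator would be wrapped in a replication loop (100 runs per cell of Table 1) with the estimator applied to each draw.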
Furthermore, the following conditions are assumed in the proofs of Theorem 6.1.

(A1) For each $v \in \mathbb{R}$, $\rho(v)$ is absolutely continuous; i.e., there is a function $\phi(\cdot)$ such that $\rho(v) = \rho(0) + \int_0^v\phi(t)\,dt$. The probability density function of $\varepsilon_i$ is bounded and continuously differentiable. $E\{\phi(\varepsilon_i) \mid X_i\} = 0$ almost surely and $E|\phi(\varepsilon_i)|^{\nu_1} \le M_0 < \infty$ for some $\nu_1 > 2$.

(A2) The function $\phi(\cdot)$ satisfies a Lipschitz condition on each $(a_j, a_{j+1})$, $j = 0, \ldots, m$, where $a_1 < \cdots < a_m$ are the finitely many jump discontinuity points of $\phi(\cdot)$, $a_0 \equiv -\infty$, $a_{m+1} \equiv +\infty$ and $m < \infty$.

(A3) The kernel $K(\cdot)$ is a symmetric density function with compact support, satisfying $|u^jK(u) - v^jK(v)| \le C|u - v|$ for all $j$ with $0 \le j \le 3$.

(A4) The link function $m(\cdot)$ defined in (4) has continuous and bounded derivatives up to third order.

(A5) The smoothing parameter $h$ is chosen such that $nh^4 \to \infty$ and $nh^5/\log n < \infty$.

Note that (A1) and (A2) are satisfied in quantile regression with $\rho(\cdot) = \rho_\tau(\cdot)$ given in (1). Conditions (A3) and (A4) are standard in kernel smoothing. Based on (A1) and (A2), Hong (2003) proved that there is a constant $C > 0$ such that
$E\big[\{\phi(Y - t - a) - \phi(Y - a)\}^2 \mid X = x\big] \le C|t| \qquad (9)$
holds for all small $t$ and all $(a, x)$ in a neighborhood of $\{m(x^\top\theta_0), x\}$. Define
$G(t; x) = E\{\rho\{Y - m(x^\top\theta_0) + t\} \mid X = x\}, \quad G_i(t; x) = (\partial^i/\partial t^i)G(t; x), \quad i = 1, 2, 3. \qquad (10)$
Then it follows that $g(x) \stackrel{\text{def}}{=} G_2(0; x) \ge C > 0$ and $G_3(t; x)$ is continuous and uniformly bounded for all $x \in D$ and $t$ near 0. For quantile regression, $g(x) = f_\varepsilon(0 \mid x)$, where $f_\varepsilon(\cdot \mid x)$ is the conditional probability density function of $\varepsilon$ given $X = x$.
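To see why (A1) holds for the check loss, note that the a.e. derivative of $\rho_\tau$ is $\phi_\tau(v) = \tau - I(v \le 0)$, and $E\{\phi_\tau(\varepsilon) \mid X\} = 0$ exactly when the conditional $\tau$-quantile of $\varepsilon$ is zero, which is the identification condition in model (4). A small numerical illustration (ours, unconditional case):

```python
import numpy as np

tau = 0.25
rng = np.random.default_rng(4)

# phi is the a.e. derivative of rho_tau: phi(v) = tau - I(v <= 0).
phi = lambda v: tau - (v <= 0)

# If eps is shifted so that its tau-quantile is (approximately) zero, then
# E[phi(eps)] = tau - P(eps <= 0) = tau - tau = 0.
z = rng.standard_normal(200_000)
eps = z - np.quantile(z, tau)   # centre at the tau-quantile
print(abs(phi(eps).mean()) < 0.01)
```

The same cancellation, conditional on $X$, is what makes $\theta_0$ identifiable under (4).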
4 Initial estimator of $\theta_0$

We use the average derivative estimation (ADE; Härdle and Stoker, 1989; Chaudhuri et al., 1997) method to obtain an initial estimate of $\theta_0$, by observing that
$E[\partial m(\theta_0^\top X)/\partial X] = \theta_0E[\partial m(\theta_0^\top X)/\partial(\theta_0^\top X)] \quad\text{and}\quad \theta_0 = \frac{E[\partial m(\theta_0^\top X)/\partial X]}{E[\partial m(\theta_0^\top X)/\partial(\theta_0^\top X)]}. \qquad (11)$

For any $x \in \mathbb{R}^d$ and a kernel density function $H(\cdot): \mathbb{R}^d \to \mathbb{R}_+$, denote by $[\hat a(x), \hat b(x)]$ the minimizer of
$\sum_{i=1}^nH(X_{ix}/h_0)\,\rho(Y_i - a - b^\top X_{ix}) \text{ with respect to } a \text{ and } b.$
Observing (11), an initial estimate of $\theta_0$ can be constructed as
$\vartheta = \sum_{j=1}^nc(X_j)\hat b(X_j)\Big/\Big|\sum_{j=1}^nc(X_j)\hat b(X_j)\Big|, \qquad (12)$
where $c(x)$ is a trimming function introduced to deal with boundary effects.

The consistency of $\vartheta$ in (12) can be proved using results on the uniform Bahadur representation of $\hat b(x)$ over any compact subset $D$ of the support of $X$. Suppose $H(\cdot)$ is symmetric about 0 in each coordinate direction and the conditions of Proposition 3.1 and Corollary 3.3 in Kong et al. (2007) are met, in particular $nh_0^{d+4}/\log n < \infty$ and $nh_0^d/\log n \to \infty$. Then with probability one,
$\hat b(x) = m'(\theta_0^\top x)\theta_0 + \frac{1}{nh_0^{d+1}\{fg\}(x)}\sum_{i=1}^nH(X_{ix}/h_0)\phi(\varepsilon_i)X_{ix}/h_0 + O\Big\{h_0^{-1}\Big(\frac{\log n}{nh_0^d}\Big)^{3/4}\Big\} \qquad (13)$
uniformly in $x \in D$, where $\{fg\}(x) = f(x)g(x)$ with $f(\cdot)$ the density function of $X$ and $g(x) > 0$ some deterministic function. This in turn implies that with probability one,
$\frac{1}{n}\sum_{j=1}^nc(X_j)\hat b(X_j) = m'(\theta_0^\top x)\theta_0 + \frac{1}{n^2h_0^{d+1}}\sum_{i,j=1}^nc(X_j)\{fg\}^{-1}(X_j)H(X_{ij}/h_0)\phi(\varepsilon_i)X_{ij}/h_0 + O\Big\{h_0^{-1}\Big(\frac{\log n}{nh_0^d}\Big)^{3/4}\Big\}.$
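The ADE step can be sketched as follows. For illustration we replace the local M-regression fit of $[\hat a(x), \hat b(x)]$ by a kernel-weighted local-linear least-squares fit (a substitution, not the paper's estimator), and the model, bandwidth $h_0$, and $\theta_0$ below are invented for the demo; the point is only that averaging local slopes and normalizing, as in (12), recovers the index direction.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 400, 3
theta0 = np.array([1.0, 2.0, -1.0])
theta0 /= np.linalg.norm(theta0)
X = rng.standard_normal((n, d))
y = np.sin(X @ theta0) + 0.1 * rng.standard_normal(n)

h0 = 1.0  # bandwidth for the demo; the paper's rate conditions are not enforced

def local_slope(x):
    # Gaussian-kernel weighted local-linear fit around x; returns the slope b-hat(x).
    Z = X - x
    w = np.exp(-0.5 * np.sum((Z / h0) ** 2, axis=1))
    A = np.hstack([np.ones((n, 1)), Z])
    beta = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * y))
    return beta[1:]

# Average-derivative estimate: average the local slopes and normalise, cf. (12)
# (no trimming here; all evaluation points are kept).
slopes = np.array([local_slope(x) for x in X[:100]])
vartheta = slopes.mean(axis=0)
vartheta /= np.linalg.norm(vartheta)
print(abs(vartheta @ theta0))
```

The printed alignment $|\vartheta^\top\theta_0|$ should be close to 1, reflecting that the averaged gradient is proportional to $\theta_0$ by (11).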
Using results in Masry (1996), we know that with probability 1,
$\frac{1}{nh_0^d}\sum_{i=1}^nH(X_{ix}/h_0)\phi(\varepsilon_i)\frac{X_{ix}}{h_0} = O\{(nh_0^d/\log n)^{-1/2}\}$
uniformly in $x \in D$, whence
$\frac{1}{n^2h_0^{d+1}}\sum_{i,j=1}^nc(X_j)\{fg\}^{-1}(X_j)H(X_{ij}/h_0)\phi(\varepsilon_i)\frac{X_{ij}}{h_0} = O\{h_0^{-1}(nh_0^d/\log n)^{-1/2}\}$
almost surely. Therefore, for the initial estimator $\vartheta$ in (12), we have
$\delta_\vartheta \equiv \theta_0 - \vartheta = O\{h_0^{-1}(nh_0^d/\log n)^{-1/2}\} \qquad (14)$
almost surely. Consequently, from now on we focus on the parameter space $\Theta_n \equiv \{\vartheta : |\delta_\vartheta| < Ch(nh_0^{d+2}/\log n)^{-1/2}\}$ for some constant $C > 0$.

5 Asymptotics of $\hat a_\vartheta(x)$ and $\hat b_\vartheta(x)$

For any $\vartheta \in \Theta_n$, denote by $f_\vartheta(x)$ and $F_\vartheta(x)$ the probability density function and distribution function of $\vartheta^\top X$ at $\vartheta^\top x$, respectively, and for any $v \in \mathbb{R}$ and $x \in D \subset \mathbb{R}^d$, define
$m_\vartheta(v) = \arg\min_aE\{\rho(Y - a) \mid X^\top\vartheta = v\},$
$G_\vartheta(t, x) = E\{\rho(Y - m_\vartheta(\vartheta^\top x) + t) \mid \vartheta^\top X = \vartheta^\top x\},$
$G_{i\vartheta}(t, x) = (\partial^i/\partial t^i)G_\vartheta(t, x), \quad i = 1, 2; \qquad g_\vartheta(x) = G_{2\vartheta}(m_\vartheta(x), x).$
Apparently $g_{\theta_0}(x) \equiv g(x)$. We assume that, for any $\vartheta$ in a neighborhood of $\theta_0$, $G_{2\vartheta}(t, x)$ is continuous and uniformly bounded in a neighborhood of $(m_\vartheta(x), x)$, and that there exists some $\delta > 0$ such that $g_\vartheta(x) > \delta$ for $\vartheta$ near enough to $\theta_0$ and $x \in D$.

With initial estimate $\vartheta$, let $[\hat a_j, \hat b_j] \equiv [\hat a_\vartheta(X_j), \hat b_\vartheta(X_j)]$ be the solution to (6) with $x$ specified as $X_j$.
If the smoothing parameter $h$ is chosen such that $nh/\log n \to \infty$ and $nh^5/\log n < \infty$, then, using the results on uniform Bahadur representation in Kong et al. (2007), we have
$\hat a_j - m_\vartheta(X_j) = \frac{1}{nh}\{g f\}_\vartheta^{-1}(X_j)\sum_{i=1}^nK_{ij}^\vartheta\phi(Y_{ij}^*) + O\Big\{\Big(\frac{\log n}{nh}\Big)^{3/4}\Big\}, \qquad (15)$
$h\{\hat b_j - m'_\vartheta(X_j)\} = \frac{1}{nh}\{g f\}_\vartheta^{-1}(X_j)\sum_{i=1}^nK_{ij}^\vartheta\phi(Y_{ij}^*)X_{ij}^\top\vartheta/h + O\Big\{\Big(\frac{\log n}{nh}\Big)^{3/4}\Big\},$
uniformly in $X_j \in D$, where $K_{ij}^\vartheta = K(X_{ij}^\top\vartheta/h)$, $Y_{ij}^* = Y_i - m_\vartheta(X_j) - m'_\vartheta(X_j)X_{ij}^\top\vartheta$ and $\{gf\}_\vartheta(\cdot) = g_\vartheta(\cdot)f_\vartheta(\cdot)$. Note that $m_\vartheta(X_j) \stackrel{\text{def}}{=} m_\vartheta(X_j^\top\vartheta)$ and $m'_\vartheta(X_j) \stackrel{\text{def}}{=} m'_\vartheta(X_j^\top\vartheta)$.

Combined with Lemma 6.5 and Lemma 6.6 in the Appendix, further to (15) we have
$\hat a_j - a_j = \frac12m''(X_j^\top\theta_0)\{(f\mu)/(fg)\}_\vartheta(X_j)h^2 + b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} + (nh)^{-1}\{gf\}_\vartheta^{-1}(X_j)\sum_{i=1}^n\varphi_{ij} + O\Big\{\Big(\frac{\log n}{nh}\Big)^{3/4} + h^4 + h\delta_\vartheta\Big\}, \qquad (16)$
$\hat b_j - b_j = h^2\Big[\frac12m''(X_j^\top\theta_0)\{(f\mu)'/(fg)\}_\vartheta(X_j) + \frac16m^{(3)}(X_j^\top\theta_0)\{(f\mu)/(fg)\}_\vartheta(X_j)\Big] + b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j) + (nh^2)^{-1}\{gf\}_\vartheta^{-1}(X_j)\sum_{i=1}^n\tilde\varphi_{ij} + O\Big\{h^4 + h^2\delta_\vartheta + h^{-1}\Big(\frac{\log n}{nh}\Big)^{3/4}\Big\}$
uniformly in $j$ with $X_j \in D$, where $(\nu/\mu)_\vartheta(X_j) \equiv \nu_\vartheta(X_j^\top\vartheta)/\mu_\vartheta(X_j^\top\vartheta)$,
$\mu_\vartheta(v) = E[g(X) \mid X^\top\vartheta = v], \qquad \nu_\vartheta(v) = E[g(X)X \mid X^\top\vartheta = v], \qquad (17)$
and $\varphi_{ij}$ and $\tilde\varphi_{ij}$ are zero-mean i.i.d. random variables defined as
$\varphi_{ij} = K_{ij}^\vartheta\phi(Y_{ij}^*) - E[K_{ij}^\vartheta\phi(Y_{ij}^*)], \qquad (18)$
$\tilde\varphi_{ij} = K_{ij}^\vartheta\phi(Y_{ij}^*)X_{ij}^\top\vartheta/h - E[K_{ij}^\vartheta\phi(Y_{ij}^*)X_{ij}^\top\vartheta/h].$

Note that (16) concerns the almost-sure properties of $[\hat a_j, \hat b_j]$. Welsh (1996) studied their asymptotic bias and variance, i.e.
$E\{\hat a(x)\} = m_\vartheta(\vartheta^\top x) + O(h^2), \qquad E\{\hat b(x)\} = m'_\vartheta(\vartheta^\top x) + O(h^2),$
$\mathrm{Var}\{\hat a(x)\} = O(n^{-1}h^{-1}), \qquad \mathrm{Var}\{\hat b(x)\} = O(n^{-1}h^{-3}), \qquad (19)$
where the $O(\cdot)$ terms are uniform in $x$ over any compact subset of the support of $X$.
6 Asymptotics of $\hat\theta$

For the previously obtained $\vartheta$, $\hat a_j$, $\hat b_j$, $j = 1, \ldots, n$, suppose $\hat\theta$ minimizes
$\sum_{i=1}^n\sum_{j=1}^nK_{ij}^\vartheta\rho(Y_i - \hat a_j - \hat b_j\theta^\top X_{ij}) + \frac{n^2h}{2}(\theta - \vartheta)^\top\vartheta\vartheta^\top(\theta - \vartheta).$
Up to a term not depending on $\theta$, $\hat\theta$ then also minimizes
$\tilde\Phi_n(\theta) = \Phi_n(\theta) + n^2h\Big\{\frac12(\theta - \theta_0)^\top\vartheta\vartheta^\top(\theta - \theta_0) + (\theta_0 - \vartheta)^\top\vartheta\vartheta^\top(\theta - \theta_0)\Big\},$
$\Phi_n(\theta) = \sum_{i=1}^n\sum_{j=1}^nK_{ij}^\vartheta\{\rho(Y_i - \hat a_j - \hat b_j\theta^\top X_{ij}) - \rho(Y_{ij})\}, \qquad (20)$
where $Y_{ij} \equiv Y_i - \hat a_j - \hat b_jX_{ij}^\top\theta_0$. Let $a_{n\vartheta} = \max\{(n\log\log n)^{-1/2}, |\delta_\vartheta|\}$. As $|\vartheta - \theta_0| = O(a_{n\vartheta})$, $\vartheta\vartheta^\top = \theta_0\theta_0^\top + O(a_{n\vartheta})$, whence for any $\theta$ with $\delta_\theta \stackrel{\text{def}}{=} \theta_0 - \theta = O(a_{n\vartheta})$ we have
$\tilde\Phi_n(\theta) = \Phi_n(\theta) + n^2h\Big\{\frac12\delta_\theta^\top\theta_0\theta_0^\top\delta_\theta - \delta_\vartheta^\top\theta_0\theta_0^\top\delta_\theta\Big\} + o(n^2ha_{n\vartheta}^2).$
Write
$\Phi_n(\theta) = E[\Phi_n(\theta)] + \delta_\theta^\top\{R_{n1}(\theta) - ER_{n1}(\theta)\} + R_{n2}(\theta) - ER_{n2}(\theta),$
where
$R_{n1} = \sum_{i,j}K_{ij}^\vartheta\phi(Y_{ij})\hat b_jX_{ij}, \qquad R_{n2}(\theta) = \sum_{i,j}K_{ij}^\vartheta\Big[\rho(Y_i - \hat a_j - \hat b_j\theta^\top X_{ij}) - \rho(Y_{ij}) - \delta_\theta^\top\phi(Y_{ij})\hat b_jX_{ij}\Big].$
Applying the results on $E(\Phi_n(\theta))$ in Lemma 6.11, we have
$\Phi_n(\theta) = \delta_\theta^\top R_{n1} + \frac12\delta_\theta^\top G_{n\vartheta}\delta_\theta\{1 + o(1)\} + R_{n2}(\theta) - ER_{n2}(\theta), \qquad (21)$
where
$G_{n\vartheta} = \sum_{i,j}E[K_{ij}^\vartheta g(X_i)\hat b_j^2X_{ij}X_{ij}^\top] = n^2hS_2\{1 + O(\delta_\vartheta)\}, \qquad S_2 = \int\{m'(X^\top\theta_0)\}^2\omega_{\theta_0}(X)f_{\theta_0}(X)\,dX,$
and $\omega_\vartheta(x) = E\{g_\vartheta(X)(X - x)(X - x)^\top \mid X^\top\vartheta = x^\top\vartheta\}$. Consequently,
$\tilde\Phi_n(\theta) = \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) + \frac12\delta_\theta^\top(G_{n\vartheta} + n^2h\theta_0\theta_0^\top)\delta_\theta\{1 + o(1)\} + R_{n2}(\theta) - ER_{n2}(\theta).$

Our main result is as follows.

Theorem 6.1 Suppose (A1)-(A4) hold. With $\nu_\vartheta(\cdot)$ and $\mu_\vartheta(\cdot)$ as defined in (17), we have
$\hat\theta - \theta_0 = (S_2 + \theta_0\theta_0^\top)^{-1}\frac1n\sum_i\phi(\varepsilon_i)b_i\{f\bar\nu\}_{\theta_0}(X_i) - (S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_{n\vartheta} + \theta_0\theta_0^\top)\delta_\vartheta + \alpha_n|\vartheta - \theta_0| + o(n^{-1/2})$
$= (S_2 + \theta_0\theta_0^\top)^{-1}\frac1n\sum_i\phi(\varepsilon_i)b_i\{f\bar\nu\}_{\theta_0}(X_i) - (S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_0 + \theta_0\theta_0^\top)\delta_\vartheta + \alpha_n|\vartheta - \theta_0| + o(n^{-1/2}) \qquad (22)$
almost surely, where $\bar\nu_\theta(x) = E(X \mid X^\top\theta = x^\top\theta) - x$, $\alpha_n = o(1)$ uniformly in $\vartheta$, and
$\Omega_{n\vartheta} \stackrel{\text{def}}{=} \frac1n\sum_jb_j^2\mu_\vartheta(X_j)\{(\nu/\mu)_\vartheta(X_j) - X_j\}\{(\nu/\mu)_\vartheta(X_j) - X_j\}^\top,$
$\Omega_0 = E\big[\{m'(X^\top\theta_0)\}^2\mu_{\theta_0}(X)\{(\nu/\mu)_{\theta_0}(X) - X\}\{(\nu/\mu)_{\theta_0}(X) - X\}^\top\big].$

Remark 6.2 In Lemma 6.16 we prove that if $\delta_\vartheta \ne 0$,
$0 < |(S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_0 + \theta_0\theta_0^\top)\delta_\vartheta|/|\delta_\vartheta| < 1. \qquad (23)$
This implies that the effect of the initial estimation error $\vartheta - \theta_0$ on $\hat\theta - \theta_0$ decreases geometrically.

Remark 6.3 Theorem 6.1 is proved under the assumption that $\{(X_i, Y_i)\}_{i=1}^\infty$ are i.i.d. observations. It is possible, however, to extend this result to time-series observations, provided that the time dependence (usually measured by a mixing coefficient) is weak enough. An example is the class of stationary $\beta$-mixing processes, which satisfy
$\beta(k) = \sup_{A\in\mathcal{F}_{-\infty}^a,\,B\in\mathcal{F}_{a+k}^\infty}|P(B) - P(B \mid A)| \to 0 \quad\text{as } k \to \infty,$
where $\mathcal{F}_a^b$ is the $\sigma$-algebra generated by $\{(X_i, Y_i)\}_{i=a}^b$.

Lemma 6.4 Under the conditions of Theorem 6.1, we have
$(n^2h)^{-1}R_{n1} = \frac1n\sum_i\phi(\varepsilon_i)b_i\{f\bar\nu\}_{\theta_0}(X_i) - \Omega_{n\vartheta}\delta_\vartheta + \alpha_n|\vartheta - \theta_0| + o(n^{-1/2}) \quad\text{a.s.} \qquad (24)$

Proof of Theorem 6.1. Based on (24), it suffices to prove that
$\hat\theta - \theta_0 = \{n^2h(S_2 + \theta_0\theta_0^\top)\}^{-1}(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) \quad\text{a.s.} \qquad (25)$
As the first step in proving (25), we show in Lemma 6.13 and Lemma 6.14 that for each fixed $\theta$,
$(n^2ha_{n\vartheta}^2)^{-1}[R_{n2}(\theta) - ER_{n2}(\theta)] = o(1) \quad\text{a.s.} \qquad (26)$
This, together with (21) and the fact that $G_{n\vartheta} = n^2hS_2\{1 + O(\delta_\vartheta)\}$, implies that for any fixed $\theta$,
$(n^2ha_{n\vartheta}^2)^{-1}\big[\tilde\Phi_n(\theta) - \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) - \tfrac12n^2h\,\delta_\theta^\top(S_2 + \theta_0\theta_0^\top)\delta_\theta\big] \to 0 \quad\text{a.s.}$
As both $\tilde\Phi_n(\theta) - \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta)$ and $\delta_\theta^\top(S_2 + \theta_0\theta_0^\top)\delta_\theta$ are convex in $\theta$, it follows from Lemma 6.7 that for any compact set $\Theta_{n\theta} \subset \Theta_n$ (a convex open set),
$\sup_{\theta\in\Theta_{n\theta}}(n^2ha_{n\vartheta}^2)^{-1}\big|\tilde\Phi_n(\theta) - \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) - \tfrac12n^2h\,\delta_\theta^\top(S_2 + \theta_0\theta_0^\top)\delta_\theta\big| \to 0 \quad\text{a.s.} \qquad (27)$
Let $\eta_n = \{n^2h(S_2 + \theta_0\theta_0^\top)\}^{-1}(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta)$. Now we are ready to prove the equivalent of (25), i.e. that with probability 1, for any $\delta > 0$, $|\hat\theta - \theta_0 - \eta_n|/a_{n\vartheta} \le \delta$ for large $n$. First note that, as $\theta_0 + \eta_n$ is bounded with probability 1, $\Theta_n$ can be chosen to contain $B_n^\delta$, a closed ball with center $\theta_0 + \eta_n$ and radius $a_{n\vartheta}\delta$. Replacing $\Theta_{n\theta}$ in (27) by $B_n^\delta$, we have
$\Delta_n \equiv \sup_{\theta\in B_n^\delta}(n^2ha_{n\vartheta}^2)^{-1}\big|\tilde\Phi_n(\theta) - \delta_\theta^\top(R_{n1} - n^2h\theta_0\theta_0^\top\delta_\vartheta) - \tfrac12n^2h\,\delta_\theta^\top(S_2 + \theta_0\theta_0^\top)\delta_\theta\big| = o(1) \quad\text{a.s.} \qquad (28)$
Now consider the behavior of $\tilde\Phi_n(\theta)$ outside $B_n^\delta$. Suppose $\theta = \theta_0 + \eta_n + a_{n\vartheta}\beta\nu$ for some $\beta > \delta$ and a unit vector $\nu$. Define $\theta^*$ as the boundary point of $B_n^\delta$ that lies on the line segment from $\theta_0 + \eta_n$ to $\theta$, i.e. $\theta^* = \theta_0 + \eta_n + a_{n\vartheta}\delta\nu$. Convexity of $\tilde\Phi_n(\theta)$ and the definition of $\Delta_n$ imply
$\frac{\delta}{\beta}\tilde\Phi_n(\theta) + \Big(1 - \frac{\delta}{\beta}\Big)\tilde\Phi_n(\theta_0 + \eta_n) \ge \tilde\Phi_n(\theta^*)$
$\ge \frac12n^2h\delta^2a_{n\vartheta}^2\nu^\top(S_2 + \theta_0\theta_0^\top)\nu - \frac12(n^2h)^{-1}R_{n1}^\top(S_2 + \theta_0\theta_0^\top)^{-1}R_{n1} - n^2ha_{n\vartheta}^2\Delta_n$
$\ge \frac12n^2h\delta^2a_{n\vartheta}^2\nu^\top(S_2 + \theta_0\theta_0^\top)\nu + \tilde\Phi_n(\theta_0 + \eta_n) - 2n^2ha_{n\vartheta}^2\Delta_n.$
It follows that
$\inf_{|\theta - \theta_0 - \eta_n| > \delta a_{n\vartheta}}\tilde\Phi_n(\theta) \ge \tilde\Phi_n(\theta_0 + \eta_n) + \frac{\beta}{\delta}n^2ha_{n\vartheta}^2\Big[\frac12\delta^2\nu^\top(S_2 + \theta_0\theta_0^\top)\nu - 2\Delta_n\Big].$
As $S_2 + \theta_0\theta_0^\top$ is positive definite, (28) implies that with probability 1, $\delta^2\nu^\top S_2\nu > 4\Delta_n$ for large enough $n$. This implies that for any $\delta > 0$ and large enough $n$, the minimum of $\tilde\Phi_n(\theta)$ must occur within $B_n^\delta$, which proves (25).

Appendix

Proof of Lemma 6.4. Write
$R_{n1}(\theta) = \sum_{i,j}K_{ij}^\vartheta\phi(\varepsilon_i)b_jX_{ij} + \sum_{i,j}K_{ij}^\vartheta\phi(\varepsilon_i)(\hat b_j - b_j)X_{ij} + \sum_{i,j}K_{ij}^\vartheta\hat b_jX_{ij}\{\phi(Y_{ij}) - \phi(\varepsilon_i)\}.$
In what follows, $E_j$ denotes expectation taken with respect to $X_j$ for given $X_i$. We will show that
$\frac{1}{n^2h}\sum_{i,j}K_{ij}^\vartheta\phi(\varepsilon_i)b_jX_{ij} = \frac1n\sum_i\phi(\varepsilon_i)b_i\{f\bar\nu\}_{\theta_0}(X_i) + O\{(\log\log n/n)^{1/2}(h^2 + \delta_\vartheta)\}, \qquad (29)$
which, together with Lemma 6.12, leads to (24). First note that
$E_j[K_{ij}^\vartheta b_jX_{ij}/h] = b_i\{f\bar\nu\}_\vartheta(X_i) - \delta_\vartheta m''(X_i^\top\theta_0)\{\Sigma f\}_{\theta_0}(X_i) + h^2b_i\{f\bar\nu\}''_{\theta_0}(X_i) + O(|\delta_\vartheta|^2 + h^4).$
This, together with Lemma 7.8 in Xia and Tong (2006), gives
$\frac{1}{n^2h}\sum_{i,j}K_{ij}^\vartheta\phi(\varepsilon_i)b_jX_{ij} = \frac1n\sum_i\phi(\varepsilon_i)b_i\{f\bar\nu\}_{\theta_0}(X_i) + O\{(\log\log n/n)^{1/2}(h^2 + \delta_\vartheta)\},$
from which (29) follows, as $\{f\bar\nu\}_\vartheta(\cdot)$ is Lipschitz continuous in $\vartheta$.

Lemma 6.5
$m_\vartheta(X_j) - a_j = b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} + o(|\delta_\vartheta|), \qquad (30)$
$m'_\vartheta(X_j) - b_j = b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j) + o(|\delta_\vartheta|). \qquad (31)$

Proof. It follows from the properties of conditional expectation that
$E\{\rho(Y - a) \mid X^\top\vartheta = x^\top\vartheta\} = E[E\{\rho(Y - a) \mid X\} \mid X^\top\vartheta = x^\top\vartheta] = E[G\{m(\theta_0^\top X) - a; X\} \mid X^\top\vartheta = x^\top\vartheta].$
Using the differentiability of $G(t; X)$ in $t$, we have
$G\{m(\theta_0^\top X) - a; X\} = G(0; X) + g(X)(m(\theta_0^\top X) - a)^2/2 + O\{(m(\theta_0^\top X) - a)^3\}.$
If $X^\top\vartheta = x^\top\vartheta$ and $\delta_\vartheta = o(1)$, then
$m(\theta_0^\top X) - m(\theta_0^\top x) = O\{\theta_0^\top(X - x)\} = O\{\delta_\vartheta^\top(X - x)\} = o(1).$
Therefore, for every $a$ near $m(\theta_0^\top X)$ (and hence near $m(\theta_0^\top x)$),
$E[G\{m(\theta_0^\top X) - a; X\} \mid X^\top\vartheta = x^\top\vartheta] - E[G(0; X) \mid X^\top\vartheta = x^\top\vartheta] \to \frac12E[g(X)(m(\theta_0^\top X) - a)^2 \mid X^\top\vartheta = x^\top\vartheta].$
As $\rho(\cdot)$ is convex, we can argue that this convergence is in fact uniform over all $a$ near $m(\theta_0^\top X)$, which implies that the minimizer of $E[G\{m(\theta_0^\top X) - a; X\} \mid X^\top\vartheta = x^\top\vartheta]$ is also approximately the minimizer of $E[g(X)(m(\theta_0^\top X) - a)^2 \mid X^\top\vartheta = x^\top\vartheta]$. We have
$m(\theta_0^\top X) = m(\theta_0^\top x) + m'(\theta_0^\top x)\theta_0^\top(X - x) + C\{\theta_0^\top(X - x)\}^2,$
$E[g(X)(m(\theta_0^\top X) - a)^2 \mid X^\top\vartheta = x^\top\vartheta] = 2m'(\theta_0^\top x)\{m(\theta_0^\top x) - a\}\delta_\vartheta^\top\{\nu_\vartheta(x^\top\vartheta) - x\mu_\vartheta(x^\top\vartheta)\} + \{m(\theta_0^\top x) - a\}^2\mu_\vartheta(x^\top\vartheta) + O(|\delta_\vartheta|^2). \qquad (32)$
Taking the derivative with respect to $a$, (30) follows. To prove (31), for any $t \to 0$, mimicking (32),
$E[g(X)\{m(\theta_0^\top X) - a\}^2 \mid X^\top\vartheta = x^\top\vartheta + t] = 2m'(\theta_0^\top x)\{m(\theta_0^\top x) - a\}E[g(X)\{t + \delta_\vartheta^\top(X - x)\} \mid X^\top\vartheta = x^\top\vartheta + t] + \{a - m(\theta_0^\top x)\}^2\mu_\vartheta(x^\top\vartheta + t) + O(|\delta_\vartheta|^2)$
$= \{a - m(\theta_0^\top x)\}^2\mu_\vartheta(x^\top\vartheta + t) + 2tm'(\theta_0^\top x)\{m(\theta_0^\top x) - a\}\mu_\vartheta(x^\top\vartheta + t) + 2m'(\theta_0^\top x)\{m(\theta_0^\top x) - a\}\delta_\vartheta^\top\{\nu_\vartheta(x^\top\vartheta + t) - x\mu_\vartheta(x^\top\vartheta + t)\} + O(t^2 + |\delta_\vartheta|^2).$
Again taking the derivative with respect to $a$ and using the definition of $m_\vartheta(\cdot)$, we have
$m_\vartheta(\vartheta^\top x + t) \approx m(\theta_0^\top x) + tm'(\theta_0^\top x) + m'(\theta_0^\top x)\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(x^\top\vartheta + t) - x\}.$
Recall from (30) that
$m_\vartheta(\vartheta^\top x) \approx m(\theta_0^\top x) + m'(\theta_0^\top x)\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(x^\top\vartheta) - x\} + O(|\delta_\vartheta|^2).$
Subtracting this from the equation above and supposing that the first-order derivatives of $\mu_\vartheta(\cdot)$ and $\nu_\vartheta(\cdot)$ are both Lipschitz continuous, we have
$m_\vartheta(\vartheta^\top x + t) - m_\vartheta(\vartheta^\top x) \approx tm'(\theta_0^\top x) + m'(\theta_0^\top x)\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(x^\top\vartheta + t) - (\nu/\mu)_\vartheta(x^\top\vartheta)\} = tm'(\theta_0^\top x) + tm'(\theta_0^\top x)\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(x^\top\vartheta) + O(t^2).$
Dividing by $t$ and letting $t \to 0$ yields (31).

Lemma 6.6
$E_iK_{ij}^\vartheta\phi(Y_{ij}^*) = \frac12m''(X_j^\top\theta_0)(f\mu)_\vartheta(X_j)h^3 + O(h^4) + o(h\delta_\vartheta),$
$E_iK_{ij}^\vartheta\phi(Y_{ij}^*)X_{ij}^\top\vartheta = h^4\Big\{\frac12m''(X_j^\top\theta_0)(f\mu)'_\vartheta(X_j) + \frac16m^{(3)}(X_j^\top\theta_0)(f\mu)_\vartheta(X_j)\Big\} + O(h^4\delta_\vartheta + h^6). \qquad (33)$

Proof. Based on (30) and (31), we have
$m(X_i^\top\theta_0) - m_\vartheta(X_j) - m'_\vartheta(X_j)X_{ij}^\top\vartheta = m(X_i^\top\theta_0) - m(X_j^\top\theta_0) - b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} - \{b_j + b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j)\}X_{ij}^\top\vartheta + o(|\delta_\vartheta|)$
$= b_jX_{ij}^\top\delta_\vartheta + \frac12m''(X_j^\top\theta_0)(\theta_0^\top X_{ij})^2 + \frac16m^{(3)}(X_j^\top\theta_0)(\theta_0^\top X_{ij})^3 - b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j)X_{ij}^\top\vartheta - b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} + o(|\delta_\vartheta|) + O\{(X_{ij}^\top\vartheta)^4\}.$
As $m(X_i^\top\theta_0) - m_\vartheta(X_j) - m'_\vartheta(X_j)X_{ij}^\top\vartheta = o(1)$, by the continuity of $G_1(t; X)$ in $t$ we have
$E[\phi\{Y_i - m_\vartheta(X_j) - m'_\vartheta(X_j)X_{ij}^\top\vartheta\} \mid X_i] = G_1\{m(X_i^\top\theta_0) - m_\vartheta(X_j) - m'_\vartheta(X_j)X_{ij}^\top\vartheta; X_i\}$
$= b_j\delta_\vartheta^\top g(X_i)X_{ij} - b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\}g(X_i) - b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j)g(X_i)X_{ij}^\top\vartheta + \frac12m''(X_j^\top\theta_0)g(X_i)(\theta_0^\top X_{ij})^2 + \frac16m^{(3)}(X_j^\top\theta_0)g(X_i)(\theta_0^\top X_{ij})^3 + o(|\delta_\vartheta|) + O((X_{ij}^\top\vartheta)^4), \qquad (34)$
and thus
$E_i[K_{ij}^\vartheta\phi\{Y_i - m_\vartheta(X_j) - m'_\vartheta(X_j)X_{ij}^\top\vartheta\}] = \frac12m''(X_j^\top\theta_0)(f\mu)_\vartheta(X_j)h^3 + o(h|\delta_\vartheta|) + O(h^4).$
Similarly, (33) follows from (34) and the following facts:
$E[g(X_i)X_{ij} \mid X_i^\top\vartheta = X_j^\top\vartheta + hu] = \nu_\vartheta(X_j^\top\vartheta + hu) - X_j\mu_\vartheta(X_j^\top\vartheta + hu) = \nu_\vartheta(X_j^\top\vartheta) + hu\nu'_\vartheta(X_j^\top\vartheta) - X_j\mu_\vartheta(X_j^\top\vartheta) - huX_j\mu'_\vartheta(X_j^\top\vartheta) + O(h^2),$
$E[g(X_i) \mid X_i^\top\vartheta = X_j^\top\vartheta + hu] = \mu_\vartheta(X_j^\top\vartheta) + hu\mu'_\vartheta(X_j^\top\vartheta) + O(h^2),$
$\int K(u)E[g(X_i)X_{ij} \mid X_i^\top\vartheta = X_j^\top\vartheta + hu]hu\,du = h^2\{(f\nu')_\vartheta(X_j^\top\vartheta) - X_j(f\mu')_\vartheta(X_j^\top\vartheta)\} + h^2\{(f'\nu)_\vartheta(X_j^\top\vartheta) - X_j(f'\mu)_\vartheta(X_j^\top\vartheta)\} + O(h^4),$
$\int K(u)E[g(X_i) \mid X_i^\top\vartheta = X_j^\top\vartheta + hu]hu\,du = h^2(\mu'f + \mu f')_\vartheta(X_j^\top\vartheta) + O(h^4),$
$\int K(u)E[g(X_i) \mid X_i^\top\vartheta = X_j^\top\vartheta + hu]h^2u^2\,du = h^2(\mu f)_\vartheta(X_j^\top\vartheta) + O(h^4).$

Lemma 6.7 Let $\{\lambda_n(\theta) : \theta\in\Theta\}$ be a sequence of random convex functions defined on a convex, open subset $\Theta$ of $\mathbb{R}^d$. Suppose $\lambda(\theta)$ is a real-valued function on $\Theta$ such that $\lambda_n(\theta)$ tends to $\lambda(\theta)$ almost surely for each $\theta$. Then for each compact subset $K$ of $\Theta$, with probability 1,
$\sup_{\theta\in K}|\lambda_n(\theta) - \lambda(\theta)| \to 0.$

Proof. The condition can be restated as follows: for any fixed $\theta\in\Theta$ there exists some $\Omega_\theta\subseteq\Omega$ such that $P(\Omega_\theta) = 1$ and $\lambda_n(\omega, \theta) - \lambda(\theta) \to 0$ for any $\omega\in\Omega_\theta$. The conclusion can be restated as: for each compact subset $K$ of $\Theta$ there exists some $\Omega_0\subseteq\Omega$ such that $P(\Omega_0) = 1$ and $\sup_{\theta\in K}|\lambda_n(\omega, \theta) - \lambda(\theta)| \to 0$ for any $\omega\in\Omega_0$. For such uniformity of the convergence, it is enough to consider the case where $K$ is a cube with edges parallel to the coordinate directions $e_1, \ldots, e_d$: every compact subset of $\Theta$ can be covered by finitely many such cubes. Let $\Im_0 \equiv K$ and let $K + \delta_0$ be the larger cube constructed by adding an extra layer of cubes with sides $\delta_0$ to $K$. Suppose $\delta_0 > 0$ is small enough that $K + \delta_0 \subset \Theta$. Let $V_0$ denote the finite set of all vertices of all the cubes that make up $K + \delta_0$.
Now for $k = 1, 2, \ldots$, let $\epsilon_k = k^{-1}$. As convexity implies continuity, there is a $0 < \delta_k < \delta_{k-1}$ such that $\lambda(\cdot)$ varies by less than $\epsilon_k/(d + 1)$ over each cube of side $3\delta_k$ that intersects $K$. Partition each cube in $\Im_{k-1}$ into a union of cubes with sides at most $\delta_k$, and denote by $\Im_k$ the resulting union of cubes. Then expand $K$ to a larger cube $K + \delta_k$ by adding an extra layer of these $\delta_k$-cubes around each face. As $\delta_k < \delta_{k-1}$, $K + \delta_k \subset K + \delta_{k-1}$ is still within $\Theta$. Define
$V_k = \{\text{vertices of all the } \delta_k\text{-cubes that make up } K + \delta_k\} \cup V_{k-1}$
and $\Omega_k = \bigcap_{\theta\in V_k}\Omega_\theta$. As $V_k$ is finite, we have $P(\Omega_k) = 1$ and, for any $\omega\in\Omega_k$,
$M_n^k(\omega) = \sup_{\theta\in V_k}|\lambda_n(\omega, \theta) - \lambda(\theta)| \to 0. \qquad (35)$

We first establish the connection between $M_n^k(\omega)$ and an upper bound for $\lambda_n(\omega, \theta) - \lambda(\theta)$ over $\theta\in K$, for any given $\omega\in\Omega_k$. For any fixed $k = 1, 2, \ldots$, each $\theta$ in $K$ lies within a $\delta_k$-cube with vertices $\{\theta_i\}\subset V_k$; it can be written as a convex combination of those vertices, i.e.
$\theta = \sum_{\theta_i\in V_k}\alpha_i\theta_i, \qquad \sum_{\theta_i\in V_k}\alpha_i = 1.$
Then for any given $\omega\in\Omega_k$, convexity of $\lambda_n(\omega, \theta)$ in $\theta$ gives
$\lambda_n(\omega, \theta) \le \sum_{\theta_i\in V_k}\alpha_i\lambda_n(\omega, \theta_i) = \sum_{\theta_i\in V_k}\alpha_i\{\lambda_n(\omega, \theta_i) - \lambda(\theta_i)\} + \sum_{\theta_i\in V_k}\alpha_i\{\lambda(\theta_i) - \lambda(\theta)\} + \lambda(\theta) \le M_n^k(\omega) + \max_{\theta_i\in V_k}|\lambda(\theta_i) - \lambda(\theta)| + \lambda(\theta).$
Therefore,
$\lambda_n(\omega, \theta) - \lambda(\theta) \le M_n^k(\omega) + \epsilon_k. \qquad (36)$

Next we establish the companion lower bound. For any fixed $k = 1, 2, \ldots$, each $\theta$ in $K$ lies within a $\delta_k$-cube with a vertex $\theta_0$ in $K\cap V_k$:
$\theta = \theta_0 + \sum_{i=1}^d\delta_ie_i, \quad\text{with } |\delta_i| \le \delta_k, \ i = 1, \ldots, d.$
Without loss of generality, suppose $\delta_i \ge 0$ for each $i = 1, \ldots, d$.
Define θ ik = θ 0 − δ ′ i e i , w here δ ′ i ≡ min { c ≥ δ k : θ 0 − ce i ∈ ✵ k } , i = 1 , · · · , d Note that as θ 0 ∈ K T ✵ k , δ ′ i m ust exist and δ ′ i < 2 δ k , for all i = 1 , · · · , d. W rite θ 0 as a con vex com bination of θ and these θ ik : θ 0 = Q d j =1 δ ′ j Q d j =1 δ ′ j + P d j =1 δ j Q l 6 = j δ ′ l θ + d X i =1 δ i Q j 6 = i δ ′ j Q d j =1 δ ′ j + P d j =1 δ j Q l 6 = j δ ′ l θ ik . Denote these conv ex we ights by β and { β i } . As δ j ≤ δ k ≤ δ ′ j , we h a v e β ≥ 1 / ( d + 1) and β λ n ( ω , θ ) ≥ λ n ( ω , θ 0 ) − X i β i λ n ( ω , θ ik ) ( con v exit y of λ n ( ω , θ ) in θ ) ≥ λ ( θ 0 ) − X i β i λ ( θ ik ) − 2 M k n ( ω ) ( from (35)) ≥ λ ( θ ) − ǫ k / ( d + 1) − X i β i [ λ ( θ ) + ǫ k / ( d + 1)] − 2 M k n ( ω ) = β λ ( θ ) − 2 ǫ k / ( d + 1) − 2 M k n ( ω ) where the third inequ alit y is du e to th e definition of δ k and the fact that th ere exists a cub e of side 3 δ k whic h conta ins b oth θ ik and θ 0 . As β ≥ 1 / ( d + 1), λ n ( ω , θ ) − λ ( θ ) ≥ − 2 ǫ k − 2( d + 1) M k n ( ω ) . This toge ther with (36) implies that for any k = 1 , 2 , · · · , there exists some Ω k ( ⊇ Ω k +1 ) suc h that P (Ω k ) = 1 and ∀ ω ∈ Ω k , sup θ ∈ K | λ n ( ω , θ ) − λ ( θ ) | ≤ ( d + 1 ) M k n ( ω ) + 2 k − 1 . Let Ω 0 ≡ T ∞ k =1 Ω k . As Ω k is a decreasing sequence and P (Ω k ) = 1, w e h av e P (Ω 0 ) = 1 and for an y ω ∈ Ω 0 , sup θ ∈ K | λ n ( ω , θ ) − λ ( θ ) | ≤ ( d + 1 ) M k n ( ω ) + 2 k − 1 , for all k ≥ 1 . (37) Note that as n → ∞ , M k n ( ω ) → 0 for eac h fixed k , as in (35). T ak e limit of b oth s id es of (37) lim n →∞ sup θ ∈ K | λ n ( ω , θ ) − λ ( θ ) | ≤ lim n →∞ M k n ( ω ) + k − 1 = k − 1 , for all k ≥ 1 . This is equiv alen t to th at with probabilit y 1 , lim n →∞ sup θ ∈ K | λ n ( ω , θ ) − λ ( θ ) | → 0 . W e no w list a n umb er of facts in the literature th at will b e used in our pro ofs later. 16 Lemma 6.8 [Korolyuk et al, 1989] Let X 1 , X 2 , · · · , X n b e i.i.d. random v ariables. 
With a symmetric ke rn el Φ : X m → R , w e consider the U-statistic U n = n m X l ≤ i 1 < ··· 0 and for all c = 1 , · · · , m , E g 2 c/ (2 c − 1) c < ∞ . T he with probability 1, lim sup n →∞ n 1 / 2 ( U n − θ ) (2 m 2 σ 2 1 log log n ) 1 / 2 = 1 Lemma 6.9 [Berb ee’s Lemma] Let ( X , Y ) b e a R d × R d ′ − v alued random v ector. Then there exists a R d ′ − v alued random v ector Y ∗ whic h h as the same distribution as Y and Y ∗ is indep endent of X ; P ( Y ∗ 6 = Y ) = β ( σ ( X ) , σ ( Y )) (38) where σ ( X ) and σ ( Y )) are th e σ − algebra generated by X and Y resp ectiv ely , and β [ σ ( X ) , σ ( Y )] = E sup A ∈ σ ( Y ) | P ( A ) − P ( A | σ ( X )) | Lemma 6.10 β [ σ ( X 1 , Y 1 ) , σ (ˆ a j , ˆ b j )] = O { ( nh/ log 3 n ) − 1 / 4 } Pro of By the defin ition, β [ σ ( X 1 , Y 1 ) , σ (ˆ a j , ˆ b j )] = E sup A ∈ σ ( ˆ a j , ˆ b j ) | P ( A ) − P ( A | σ ( X 1 , Y 1 )) | According to results in W elsh (1996), [(ˆ a j − E ˆ a j ) /σ 1 , ( ˆ b j − E ˆ b j ) /σ 2 ] are asymptotically normal, w h ere σ 1 ≡ { V ar ˆ a j } 1 / 2 = O { ( nh ) − 1 / 2 } and σ 2 ≡ { V ar ˆ b j } 1 / 2 = O { ( nh 3 ) − 1 / 2 } . Let τ n = ( nh/ log n ) − 3 / 4 and rewrite (16) as ˆ a j = E ˆ a j + 1 nh n X i =2 K ϑ ij ϕ ij + 1 nh K 1 j ϕ 1 ( X 1 , Y 1 ) + O ( τ n ) , ˆ b j = E ˆ b j + 1 nh 2 n X i =2 ˜ ϕ ij + 1 nh 2 ˜ ϕ 1 j + O { τ n /h } . (39) 17 Note that ϕ ij , ˜ ϕ ij , i = 1 , · · · , n are t w o sequences of zero-mean i.i.d. 
bounded random variables defined in (18), whence
$$P\{\hat a_j \le t_1, \hat b_j \le t_2 \mid Y_1, X_1\} \le P[\hat a_j \le C\tau_n + t_1,\ \hat b_j \le C\tau_n/h + t_2]$$
$$\le P\big[(\hat a_j - E\hat a_j)/\sigma_1 \le (t_1 - E\hat a_j + C\tau_n)/\sigma_1,\ (\hat b_j - E\hat b_j)/\sigma_2 \le (t_2 - E\hat b_j + C\tau_n/h)/\sigma_2\big]$$
$$= P[\hat a_j \le t_1, \hat b_j \le t_2] + C(nh)^{1/2}\tau_n,$$
and similarly
$$P\{\hat a_j \ge t_1, \hat b_j \ge t_2 \mid Y_1, X_1\} \ge P[\hat a_j \ge t_1 - C\tau_n,\ \hat b_j \ge t_2 - C\tau_n/h]$$
$$\ge P\big[(\hat a_j - E\hat a_j)/\sigma_1 \ge (t_1 - E\hat a_j - C\tau_n)/\sigma_1,\ (\hat b_j - E\hat b_j)/\sigma_2 \ge (t_2 - E\hat b_j - C\tau_n/h)/\sigma_2\big]$$
$$= P[\hat a_j \ge t_1, \hat b_j \ge t_2] - C(nh)^{1/2}\tau_n.$$
Therefore,
$$|P\{\hat a_j \le t_1, \hat b_j \le t_2 \mid Y_1, X_1\} - P\{\hat a_j \le t_1, \hat b_j \le t_2\}| \le C(nh)^{1/2}\tau_n = O\{(nh/\log^3 n)^{-1/4}\}.$$

Lemma 6.11 Under the assumptions (A1)–(A5), we have
$$E\Phi_n(\theta) = \delta_\theta^\top ER_{n1}(\theta) + \delta_\theta^\top G_{n\vartheta}\delta_\theta + o(n^2h|\delta_\theta|^2).$$

Proof Apparently it suffices to show that
$$EK_{\vartheta ij}\{\rho(Y_1 - \hat a_j - \hat b_j\theta^\top X_{1j}) - \rho(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\}$$
$$= \delta_\theta^\top E[K_{\vartheta ij}\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_jX_{1j}] + \delta_\theta^\top E[K_{\vartheta ij}X_{1j}X_{1j}^\top g(X_1)\hat b_j^2]\delta_\theta + o(|\delta_\theta|^2).$$
By the continuity of $E[\rho(Y_1 - \hat a_j - t\hat b_j)\mid\mathcal{X}]$ in $t$, where $\mathcal{X} = \sigma(X_1, \cdots, X_n)$, we have
$$E\{\rho(Y_1 - \hat a_j - \hat b_j\theta^\top X_{1j}) - \rho(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\mid\mathcal X\}$$
$$= \delta_\theta^\top X_{1j}E[\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_j\mid\mathcal X] + \delta_\theta^\top X_{1j}X_{1j}^\top\delta_\theta\,\partial\big[E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j\mid\mathcal X\}\big]/\partial t\big|_{t=X_{1j}^\top\theta_0}$$
$$\quad + \delta_\theta^\top X_{1j}X_{1j}^\top\delta_\theta\Big[\partial\big[E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j\mid\mathcal X\}\big]/\partial t\big|_{t=t^*} - \partial\big[E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j\mid\mathcal X\}\big]/\partial t\big|_{t=X_{1j}^\top\theta_0}\Big],$$
where $t^*$ is some value between $\theta^\top X_{1j}$ and $\theta_0^\top X_{1j}$.
Taking expectations of both sides, we have
$$EK_{\vartheta ij}\{\rho(Y_1 - \hat a_j - \hat b_j\theta^\top X_{1j}) - \rho(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\} \quad (40)$$
$$= \delta_\theta^\top E[K_{\vartheta ij}\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_jX_{1j}] + \delta_\theta^\top(\Delta_1 + \Delta_2)\delta_\theta,$$
$$\Delta_1 = E\big\{K_{\vartheta ij}X_{1j}X_{1j}^\top\,\partial\big[E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j\mid\mathcal X\}\big]/\partial t\big|_{t=X_{1j}^\top\theta_0}\big\},$$
$$\Delta_2 = E\big\{K_{\vartheta ij}X_{1j}X_{1j}^\top\,\partial\big[E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j\mid\mathcal X\}\big]/\partial t\big|_{t=t^*}\big\} - \Delta_1,$$
where $t^*$ is some value between $\theta^\top X_{1j}$ and $\theta_0^\top X_{1j}$.

To study $\Delta_1$, we need to compute $\partial[E\{\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j\mid\mathcal X\}]/\partial t$. To this end, we apply Lemma 6.9 and Lemma 6.10. Suppose $[\tilde a_j, \tilde b_j]$ has the same distribution as $[\hat a_j, \hat b_j]$, but is independent of $(Y_1, X_1)$, with $P([\tilde a_j, \tilde b_j] \neq [\hat a_j, \hat b_j]) = O\{(nh/\log^3 n)^{-1/4}\}$. Thus for any $\delta \to 0$,
$$E[\varphi(Y_1 - \hat a_j - \hat b_j(t+\delta))\hat b_j\mid\mathcal X] - E[\varphi(Y_1 - \hat a_j - \hat b_jt)\hat b_j\mid\mathcal X]$$
$$= E[\varphi\{Y_1 - \tilde a_j - \tilde b_j(t+\delta)\}\tilde b_j\mid\mathcal X] - E[\varphi(Y_1 - \tilde a_j - \tilde b_jt)\tilde b_j\mid\mathcal X]$$
$$\quad + E[\{\varphi(Y_1 - \hat a_j - \hat b_j(t+\delta)) - \varphi(Y_1 - \hat a_j - \hat b_jt)\}\hat b_jI\{[\tilde a_j,\tilde b_j]\neq[\hat a_j,\hat b_j]\}\mid\mathcal X]$$
$$\quad - E[\{\varphi(Y_1 - \tilde a_j - \tilde b_j(t+\delta)) - \varphi(Y_1 - \tilde a_j - \tilde b_jt)\}\tilde b_jI\{[\tilde a_j,\tilde b_j]\neq[\hat a_j,\hat b_j]\}\mid\mathcal X]$$
$$\equiv T_1 + T_2 + T_3. \quad (41)$$
Based on the definition of $G_1(s; X)$, since $Y_1$ is independent of $[\tilde a_j, \tilde b_j]$, we have
$$T_1 = E[\{G_1(a_1 - \tilde a_j - \tilde b_j(t+\delta); X_1) - G_1(a_1 - \tilde a_j - \tilde b_jt; X_1)\}\tilde b_j\mid\mathcal X]$$
$$= \delta\,E[G_2(a_1 - \tilde a_j - \tilde b_jt; X_1)\tilde b_j^2\mid\mathcal X] + o(\delta), \quad (42)$$
where the last equality follows from the continuity of $G_1(t; X)$ in $t$. Next, we show that $T_2 = o(\delta)$.
As we mentioned in the proof of Lemma 6.10, $[v_1, v_2] \equiv [(\hat a_j - E\hat a_j)/\sigma_1, (\hat b_j - E\hat b_j)/\sigma_2]$ is asymptotically normal, where $\sigma_1 \equiv \{\mathrm{Var}\,\hat a_j\}^{1/2} = O\{(nh)^{-1/2}\}$, $\sigma_2 \equiv \{\mathrm{Var}\,\hat b_j\}^{1/2} = O\{(nh^3)^{-1/2}\}$. Similarly, construct $[\tilde v_1, \tilde v_2]$ from $\tilde a_j$ and $\tilde b_j$. Without loss of generality, consider a small $\delta > 0$. It is easy to see that the conditional probability density function of $Y_1$ given $[v_1, v_2]$ is uniformly bounded. Therefore, for any given values of $\hat a_j$ and $\hat b_j$ (equivalently, $v_1$ and $v_2$),
$$|E\{\varphi(Y_i - \hat a_j - \hat b_j(t+\delta)) - \varphi(Y_i - \hat a_j - \hat b_jt)\mid v_1, v_2\}| \le C\delta|\hat b_j|.$$
Let $f(\tilde v_1, \tilde v_2\mid v_1, v_2)$ be the conditional probability density function of $(\tilde v_1, \tilde v_2)$ given $(v_1, v_2)$, and
$$g(v_1, v_2) = \int_{[\tilde v_1,\tilde v_2]\neq[v_1,v_2]}f(\tilde v_1, \tilde v_2\mid v_1, v_2)\,d\tilde v_1\,d\tilde v_2.$$
As $\int f(v_1,v_2)g(v_1,v_2)\,dv_1\,dv_2 = P([\tilde a_j,\tilde b_j]\neq[\hat a_j,\hat b_j]) = O\{(nh/\log^3 n)^{-1/4}\}$, we have
$$|T_2| \le C\delta\int|\hat b_j|f(v_1,v_2)g(v_1,v_2)\,dv_1\,dv_2 = o(\delta).$$
Similarly we can show that $T_3 = o(\delta)$. This together with (41) and (42) yields
$$\partial\big[E\varphi(Y_i - \hat a_j - \hat b_jt)\hat b_j\mid\mathcal X\big]/\partial t = E[G_2(a_1 - \tilde a_j - \tilde b_jt; X_1)\tilde b_j^2\mid\mathcal X]. \quad (43)$$
Applying this result to $\Delta_1$ and $\Delta_2$, we have
$$\Delta_1 = E[K_{\vartheta ij}X_{1j}X_{1j}^\top G_2(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0; X_1)\tilde b_j^2], \qquad \Delta_2 = O(\delta_\theta).$$
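The local quadratic behaviour of the expected check loss, which drives the derivative computation in (43), can be checked numerically. The sketch below is our own illustration, not the paper's construction: $\rho_\tau(u) = u(\tau - I\{u<0\})$ is the usual quantile check function, $\varphi_\tau$ its a.e. derivative, the error is standard normal (whose median is 0), so the first-order term vanishes and the loss difference is quadratic in the shift with curvature given by the error density at the quantile.

```python
import numpy as np

# Quantile check loss and its a.e. derivative (standard definitions;
# the Monte Carlo setup below is our own illustrative choice).
def rho(u, tau):
    return u * (tau - (u < 0))

def phi(u, tau):
    return tau - (u < 0).astype(float)

rng = np.random.default_rng(0)
tau = 0.5
eps = rng.standard_normal(2_000_000)   # tau-quantile of eps is 0, so E phi(eps) = 0

delta = 0.05
lhs = np.mean(rho(eps - delta, tau) - rho(eps, tau))
# Second-order expansion: E rho(eps - d) - E rho(eps)
#   ~ -d * E phi(eps) + (d^2 / 2) * f(0), where f is the error density
f0 = 1.0 / np.sqrt(2 * np.pi)          # standard normal density at its median
rhs = 0.5 * delta**2 * f0
print(lhs, rhs)
```

The Monte Carlo average and the quadratic approximation agree closely, which is exactly the mechanism that makes the quadratic term $\delta_\theta^\top(\Delta_1+\Delta_2)\delta_\theta$ appear even though $\rho$ is not differentiable.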
Plugging this into (40) leads to
$$EK_{\vartheta ij}\{\rho(Y_1 - \hat a_j - \hat b_j\theta^\top X_{1j}) - \rho(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\}$$
$$= \delta_\theta^\top E[K_{\vartheta ij}\varphi(Y_1 - \hat a_j - \hat b_jX_{1j}^\top\theta_0)\hat b_jX_{1j}] + \delta_\theta^\top E[K_{\vartheta ij}X_{1j}X_{1j}^\top G_2(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0; X_1)\tilde b_j^2]\delta_\theta + o(|\delta_\theta|^2)$$
$$= \delta_\theta^\top E[K_{\vartheta ij}\varphi(Y_1 - \hat a_j - \hat b_j\theta_0^\top X_{1j})\hat b_jX_{1j}] + \delta_\theta^\top E[K_{\vartheta ij}X_{1j}X_{1j}^\top g(X_1)b_j^2]\delta_\theta + o(|\delta_\theta|^2),$$
where the last equality follows from the continuity of $G_2(t; X_1)$ in $t$ and (19).

Lemma 6.12 Define $Z_{ij} = K_{\vartheta ij}\hat b_jX_{ij}\{\varphi(Y_{ij}) - \varphi(\varepsilon_i)\}$. Then
$$h^{-1}E_iZ_{ij} = -\delta_\vartheta^\top b_j^2\{(\nu/\mu)_\vartheta(X_j) - X_j\}\{\nu_\vartheta(X_j) - X_j\mu_\vartheta(X_j)\}^\top + o(|\delta_\vartheta| + n^{-1/2}), \quad (44)$$
$$\sum_{i,j}(Z_{ij} - E_iZ_{ij}) = o(n^2h\delta_\vartheta), \quad (45)$$
$$(nh)^{-1}\sum_iK_{\vartheta ij}\varphi(\varepsilon_i)(\hat b_j - b_j)X_{ij} = o(n^{-1/2}) + O\{\delta_\vartheta(nh/\log n)^{-1/2}\} \quad (46)$$
uniformly in $\vartheta$.

Proof Once again we apply Lemma 6.9 and suppose $[\tilde a_j, \tilde b_j]$ has the same distribution as $[\hat a_j, \hat b_j]$ and is independent of $(X_1, Y_1)$. By Lemma 6.10, $P([\tilde a_j, \tilde b_j] \neq [\hat a_j, \hat b_j]) = O\{(nh/\log^3 n)^{-1/4}\}$. Recall $\mathcal X = \sigma(X_1, \cdots, X_n)$. Note that $E_1Z_{1j} = E[K_{1j}X_{1j}(T_1 - T_2 + T_3)]$, where
$$E[\{\varphi(Y_1 - \hat a_j - X_{1j}^\top\theta_0\hat b_j) - \varphi(\varepsilon_1)\}\hat b_j\mid\mathcal X] = T_1 - T_2 + T_3,$$
$$T_1 = E[\{\varphi(Y_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0) - \varphi(\varepsilon_1)\}\tilde b_j\mid\mathcal X],$$
$$T_2 = E[\{\varphi(Y_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0) - \varphi(\varepsilon_1)\}\tilde b_jI\{[\tilde a_j,\tilde b_j]\neq[\hat a_j,\hat b_j]\}\mid\mathcal X],$$
$$T_3 = E[\{\varphi(Y_1 - \hat a_j - \hat b_jX_{1j}^\top\theta_0) - \varphi(\varepsilon_1)\}\hat b_jI\{[\tilde a_j,\tilde b_j]\neq[\hat a_j,\hat b_j]\}\mid\mathcal X].$$
Similar to (42), we can conclude that
$$T_1 = E[\{G_1(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0; X_1) - G_1(0; X_1)\}\tilde b_j\mid\mathcal X] \quad (47)$$
$$= g(X_1)E\{\tilde b_j(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0)\mid\mathcal X\} + O[E\{(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0)^2\mid\mathcal X\}].$$
Using the results on the asymptotic bias and variance of $(\tilde a_j, \tilde b_j)$ in (19), we can see that
$$E\{K_{\vartheta 1j}(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0)^2\} = O(h\delta_\vartheta^2 + n^{-1}).$$
Next we deal with the first term in (47). Using (16),
$$a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0 = a_1 - a_j + a_j - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0$$
$$= \tfrac12 m''(X_j^\top\theta_0)(X_{1j}^\top\theta_0)^2 - \tfrac12 m''(X_j^\top\theta_0)h^2 + O\{(X_{1j}^\top\theta_0)^3\}$$
$$\quad - b_j\delta_\vartheta^\top\{(\nu/\mu)_\vartheta(X_j) - X_j\} - b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j)X_{1j}^\top\theta_0$$
$$\quad - h^2\big[\tfrac12 m''(X_j^\top\theta_0)\{(f\mu)'/(fg)\}_\vartheta(X_j) + \tfrac16 m^{(3)}(X_j^\top\theta_0)(f\mu)_\vartheta(X_j)\big]X_{1j}^\top\theta_0$$
$$\quad + \{gf\}_\vartheta^{-1}(X_j)\frac{1}{nh}\sum_{i=1}^n\varphi_{ij} - \{gf\}_\vartheta^{-1}(X_j)\Big\{\frac{1}{nh^2}\sum_{i=1}^n\tilde\varphi_{ij}\Big\}X_{1j}^\top\theta_0$$
$$\quad + O\{(nh/\log n)^{-3/4}(1 + \delta_\vartheta/h) + h^3\}, \quad (48)$$
where $\varphi_{ij}, \tilde\varphi_{ij}$ are zero-mean i.i.d. random variables. Hence
$$E[K_{\vartheta 1j}X_{1j}T_1] = E[K_{\vartheta 1j}g(X_1)X_{1j}\tilde b_1(a_1 - \tilde a_j - \tilde b_jX_{1j}^\top\theta_0)] + o(h|\delta_\vartheta| + n^{-1/2}h) \quad (49)$$
$$= -h\delta_\vartheta^\top b_j^2\{(\nu/\mu)_\vartheta(X_j) - X_j\}\{\nu_\vartheta(X_j) - X_j\mu_\vartheta(X_j)\}^\top + o(h|\delta_\vartheta| + hn^{-1/2})$$
uniformly in $\vartheta$, where (19) is used in the last step. As $P([\tilde a_j, \tilde b_j] \neq [\hat a_j, \hat b_j]) = O\{(nh/\log^3 n)^{-1/4}\}$, similarly to $T_2$ in (41) we have
$$E[K_{\vartheta 1j}X_{1j}T_2] = o(n^{-1/2}h) + o(h\delta_\vartheta), \qquad E[K_{\vartheta 1j}X_{1j}T_3] = o(n^{-1/2}h) + o(h\delta_\vartheta)$$
uniformly in $\vartheta$. This together with (49) yields (44). To prove (45), first note that
$$\varphi(Y_i - \hat a_j - \hat b_j\theta_0^\top X_{ij}) - \varphi(\varepsilon_i) = [\varphi(Y_i - \hat a_j - \hat b_j\theta_0^\top X_{ij}) - \varphi(Y_i - a_j - b_j\theta_0^\top X_{ij})] + [\varphi(Y_i - a_j - b_j\theta_0^\top X_{ij}) - \varphi(\varepsilon_i)].$$
Let $\tilde Z_{ij} = K_{\vartheta ij}X_{ij}\{\varphi(Y_i - a_j - b_j\theta_0^\top X_{ij}) - \varphi(\varepsilon_i)\}$. By Lemma 6.14, it suffices to show that
$$\sum_{i,j}b_j(\tilde Z_{ij} - E\tilde Z_{ij}) = o(n^2h\delta_\vartheta), \quad (50)$$
$$\sum_j(\hat b_j - b_j)\sum_i\tilde Z_{ij} = o(n^2h\delta_\vartheta). \quad (51)$$
Due to the Borel–Cantelli lemma, (50) can be further reduced to: for any $\epsilon > 0$,
$$nP\Big\{\Big|\sum_ib_j(\tilde Z_{ij} - E\tilde Z_{ij})\Big| \ge \epsilon nh\delta_\vartheta\Big\} \ \text{is summable over } n, \quad (52)$$
which follows from the facts that $\tilde Z_{ij}$ is bounded, $E\tilde Z_{ij}^2 = O(h^3 + h\delta_\vartheta^2)$, and Bernstein's inequality:
$$P\Big\{\Big|\sum_i(\tilde Z_{ij} - E\tilde Z_{ij})\Big| \ge \epsilon nh\delta_\vartheta\Big\} \le C\exp\Big\{-\frac{\epsilon^2n^2h^2\delta_\vartheta^2}{nh^3 + nh\delta_\vartheta^2 + \epsilon nh\delta_\vartheta}\Big\} = o(n^{-2}).$$
To prove (51), we again use the expansion of $\hat b_j - b_j$ given in (16), i.e.
$$\hat b_j - b_j = h^2\big[\tfrac12 m''(X_j^\top\theta_0)\{(f\mu)'/(fg)\}_\vartheta(X_j) + \tfrac16 m^{(3)}(X_j^\top\theta_0)\{(f\mu)/(fg)\}_\vartheta(X_j)\big] + b_j\delta_\vartheta^\top\{(\mu\nu' - \mu'\nu)/\mu^2\}_\vartheta(X_j) + \frac{1}{nh^2}\sum_{i=1}^n\tilde\varphi_{ij} + O\{(nh/\log n)^{-3/4}/h\},$$
where $E\tilde\varphi_{ij} = 0$. If we denote by $C(X_j)$ the deterministic (bias) term in $\hat b_j - b_j$, it is easy to see that $\sum_{i,j}C(X_j)\tilde Z_{ij} = o(n^2h\delta_\vartheta)$. For the stochastic part, write
$$\sum_{j,i,l}\tilde Z_{ij}\tilde\varphi_{lj} = \sum_{i,j}\tilde Z_{ij}\tilde\varphi_{ij} + \sum_{j,i\neq l}\tilde Z_{ij}\tilde\varphi_{lj}. \quad (53)$$
We focus on the second term, as the first term is relatively negligible. Let $c \equiv E\tilde Z_{ij} = O(h^3 + h\delta_\vartheta^2)$, whence the second term in (53) is $(nh^2)^{-1}\sum_j(T_{1j} + cT_{2j})$, where $T_{1j} = \sum_{i<\cdot}\cdots$, $T_{2j} = \cdots$.

For any $\epsilon > 0$ and $\vartheta \in \Theta_n$, define
$$M^\vartheta_{n1} = Ca_{n\vartheta},\quad M^\vartheta_{n2} = C\{|\delta_\vartheta| + (nh/\log n)^{-1/2}\},\quad M^\vartheta_{n3} = C\{|\delta_\vartheta| + (nh/\log n)^{-1/2}/h\},$$
$$B^{(1)}_n = \{\alpha\in R^{d+1}\mid \alpha = [0, \alpha_1^\top]^\top,\ |\alpha_1| \le M^\vartheta_{n1}\},\quad B^{(2)}_n = \{\beta\in R^{d+1}\mid \beta = [b_1, b_2\theta_0^\top]^\top,\ |b_1| \le M^\vartheta_{n2},\ |b_2| \le M^\vartheta_{n3}\}.$$
As $|\hat b_j\delta_\theta| \le Ca_{n\vartheta}$, $|\hat a_j - a_j| = O\{|\delta_\vartheta| + (nh/\log n)^{-1/2}\}$ and $|\hat b_j - b_j| = O\{|\delta_\vartheta| + (nh/\log n)^{-1/2}/h\}$, (54) will follow if for any $\epsilon > 0$,
$$\sup_{x\in\mathcal D}\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\Big|\sum_{i=1}^nR_{ni}(x;\alpha,\beta)\Big| \le \epsilon d_n \ \text{a.s.}, \qquad d_n = nha^2_{n\vartheta}. \quad (55)$$
This is done in a style similar to Lemma 4.2 in Kong et al. (2008).
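Bernstein's inequality for bounded zero-mean variables is the workhorse behind (52) and the chaining bounds that follow; it can be illustrated numerically. The simulation below is our own sketch (Uniform errors, arbitrary constants), checking the classical two-sided form $P(|S_n| \ge t) \le 2\exp\{-t^2/(2n\sigma^2 + \tfrac23 bt)\}$ for $|X_i| \le b$:

```python
import numpy as np

# Monte Carlo check of Bernstein's inequality for bounded, zero-mean,
# i.i.d. variables. All distributional choices here are illustrative.
rng = np.random.default_rng(1)
n, b = 1000, 1.0
X = rng.uniform(-b, b, size=(5000, n))   # 5000 replications of S_n = sum X_i
S = X.sum(axis=1)
sigma2 = b**2 / 3                        # Var of Uniform(-b, b)

t = 70.0
emp = np.mean(np.abs(S) >= t)            # empirical tail probability
bound = 2 * np.exp(-t**2 / (2 * n * sigma2 + 2 * b * t / 3))
print(emp, bound)
```

The empirical tail sits below the Bernstein bound; in the proofs above the same inequality is applied with the variance proxy $E\tilde Z_{ij}^2 = O(h^3 + h\delta_\vartheta^2)$ in place of $n\sigma^2$.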
Cover $\mathcal D$ by a finite number $T_n$ of cubes $D_k = D_{n,k}$ with side length $l_n = O\{h(nh/\log n)^{-1/4}\}$ and centers $x_k = x_{n,k}$. Write
$$\sup_{x\in\mathcal D}\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\Big|\sum_{i=1}^nR_{ni}(x;\alpha,\beta)\Big| \le \max_{1\le k\le T_n}\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\sum_{i=1}^nR_{ni}(x_k;\alpha,\beta)$$
$$\quad + \max_{1\le k\le T_n}\sup_{x\in D_k}\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\sum_{i=1}^n\big\{\Phi_{ni}(x_k;\alpha,\beta) - \Phi_{ni}(x;\alpha,\beta)\big\}$$
$$\quad + \max_{1\le k\le T_n}\sup_{x\in D_k}\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\sum_{i=1}^n\big\{E\Phi_{ni}(x_k;\alpha,\beta) - E\Phi_{ni}(x;\alpha,\beta)\big\}$$
$$\equiv Q_1 + Q_2 + Q_3.$$
In Lemma 6.15, we will prove that $Q_2 = o(d_n)$ a.s., whence $Q_3 \le EQ_2 = o(d_n)$. It remains to show that $Q_1 \le \epsilon d_n/3$ a.s., which can be done following a proof style similar to Lemma 4.2 in Kong et al. (2008). Partition $B^{(i)}_n$, $i = 1, 2$, into a sequence of subrectangles $D^{(i)}_1, \cdots, D^{(i)}_{J_1}$, $i = 1, 2$, such that for all $1 \le j_1 \le J_1 \le M^{d+1}$ ($M = \epsilon^{-1}$) and for all $\alpha, \alpha' \in D^{(1)}_{j_1}$ we have $|\alpha - \alpha'| \le M^\vartheta_{n1}/M$; and for all $\beta = [b_1, b_2\theta_0^\top]^\top, \beta' = [b_1', b_2'\theta_0^\top]^\top \in D^{(2)}_{j_1}$, we have $|b_1 - b_1'| \le M^\vartheta_{n2}/M$, $|b_2 - b_2'| \le M^\vartheta_{n3}/M$. Choose points $\alpha_{j_1} \in D^{(1)}_{j_1}$ and $\beta_{k_1} \in D^{(2)}_{k_1}$, $1 \le j_1, k_1 \le J_1$. Then for any $x$,
$$\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\Big|\sum_iR_{ni}(x;\alpha,\beta)\Big| \le \max_{1\le j_1,k_1\le J_1}\sup_{\alpha\in D^{(1)}_{j_1},\beta\in D^{(2)}_{k_1}}\Big|\sum_{i=1}^n\{R_{ni}(x;\alpha_{j_1},\beta_{k_1}) - R_{ni}(x;\alpha,\beta)\}\Big| + \max_{1\le j_1,k_1\le J_1}\Big|\sum_{i=1}^nR_{ni}(x;\alpha_{j_1},\beta_{k_1})\Big| \equiv H_{n1} + H_{n2}. \quad (56)$$
We first show that for any $\epsilon > 0$,
$$T_nP\Big\{H_{n2} \ge \frac{\epsilon d_n}{2}\Big\} \le T_nJ_1^2P\Big\{\Big|\sum_{i=1}^nR_{ni}(x;\alpha_{j_1},\beta_{k_1})\Big| \ge \frac{\epsilon d_n}{3}\Big\} = O(n^{-a}), \quad (57)$$
for some $a > 1$.
By Bernstein's inequality and the facts that $|R_{ni}(x;\alpha_{j_1},\beta_{k_1})| \le Ca_{n\vartheta}$ and $\mathrm{Var}\{R_{ni}(x;\alpha_{j_1},\beta_{k_1})\} = O[ha^2_{n\vartheta}\{a_{n\vartheta} + (nh/\log n)^{-1/2}\}]$, we have
$$T_nJ_1^2P\Big\{\Big|\sum_{i=1}^nR_{ni}(x;\alpha_{j_1},\beta_{k_1})\Big| \ge \frac{\epsilon d_n}{3}\Big\} = T_nJ_1^2\exp\big[-\epsilon^2nha_{n\vartheta}/\{1 + a_{n\vartheta}(nh/\log n)^{1/2}\}\big] = O(n^{-a}),$$
for some $a > 1$. Therefore, (57) holds. We next consider $H_{n1}$. For each $j_1 = 1, \cdots, J_1$ and $i = 1, 2$, partition each rectangle $D^{(i)}_{j_1}$ further into a sequence of subrectangles $D^{(i)}_{j_1,1}, \cdots, D^{(i)}_{j_1,J_2}$. Repeat this process recursively as follows. Suppose after the $l$th round we have a sequence of rectangles $D^{(i)}_{j_1,j_2,\cdots,j_l}$ with $1 \le j_k \le J_k$, $1 \le k \le l$; then in the $(l+1)$th round each rectangle $D^{(i)}_{j_1,j_2,\cdots,j_l}$ is partitioned into a sequence of subrectangles $\{D^{(i)}_{j_1,j_2,\cdots,j_l,j_{l+1}},\ 1 \le j_{l+1} \le J_{l+1}\}$ such that for all $1 \le j_{l+1} \le J_{l+1}$ and all $\alpha, \alpha' \in D^{(1)}_{j_1,j_2,\cdots,j_l,j_{l+1}}$, $|\alpha - \alpha'| \le M^\vartheta_{n1}/M^{l+1}$; and for all $\beta = [b_1, b_2\theta_0^\top]^\top, \beta' = [b_1', b_2'\theta_0^\top]^\top \in D^{(2)}_{j_1,j_2,\cdots,j_l,j_{l+1}}$, $|b_1 - b_1'| \le M^\vartheta_{n2}/M^{l+1}$, $|b_2 - b_2'| \le M^\vartheta_{n3}/M^{l+1}$, where $J_{l+1} \le M^{d+1}$. Stop this process after the $(L_n+2)$th round, with $L_n$ being the largest integer such that
$$n(2/M)^{L_n} > d_n/M^\vartheta_{n2}. \quad (58)$$
Let $\mathcal D^{(i)}_l$, $i = 1, 2$, denote the set of all subrectangles of $D^{(i)}_0$ after the $l$th round of partitioning, and denote a typical element $D^{(i)}_{j_1,j_2,\cdots,j_l}$ of $\mathcal D^{(i)}_l$ by $D^{(i)}_{(j_l)}$. Choose points $\alpha_{(j_l)} \in D^{(1)}_{(j_l)}$ and $\beta_{(j_l)} \in D^{(2)}_{(j_l)}$.
Define
$$V_l = \sum_{(j_{l+1})(k_{l+1})}P\Big\{\sum_{i=1}^n\{R_{ni}(x;\alpha_{(j_l)},\beta_{(k_l)}) - R_{ni}(x;\alpha_{(j_{l+1})},\beta_{(k_{l+1})})\} \ge \frac{\varepsilon d_n}{2^{l+1}}\Big\}, \quad 1 \le l \le L_n+1,$$
$$Q_l = \sum_{(j_l)(k_l)}P\Big\{\sup_{\alpha\in D^{(1)}_{(j_l)},\beta\in D^{(2)}_{(k_l)}}\sum_{i=1}^n\{R_{ni}(x;\alpha_{(j_l)},\beta_{(k_l)}) - R_{ni}(x;\alpha,\beta)\} \ge \frac{\varepsilon d_n}{2^l}\Big\}, \quad 1 \le l \le L_n+2.$$
Then $Q_l \le V_l + Q_{l+1}$, $1 \le l \le L_n+1$. On the other hand, it is easy to see that for any $\alpha \in D^{(1)}_{(j_{L_n+2})}$ and $\beta \in D^{(2)}_{(k_{L_n+2})}$,
$$n|R_{ni}(x;\alpha_{(j_{L_n+2})},\beta_{(k_{L_n+2})}) - R_{ni}(x;\alpha,\beta)| \le nM^\vartheta_{n2}/M^{L_n+2} \le \epsilon d_n/2^{L_n+2},$$
due to the choice of $L_n$ specified in (58). Therefore, $Q_{L_n+2} = 0$ and it remains to show that
$$T_nP\{H_{n1} \ge \epsilon d_n/2\} \le T_nJ_1^2Q_1 \le T_nJ_1^2\sum_{l=1}^{L_n+1}V_l = O(n^{-a}), \quad \text{for some } a > 1. \quad (59)$$
To find an upper bound for $V_l$, $1 \le l \le L_n+1$, we again apply Bernstein's inequality. As
$$|R_{ni}(x;\alpha_{(j_l)},\beta_{(k_l)}) - R_{ni}(x;\alpha_{(j_{l+1})},\beta_{(k_{l+1})})| \le C\{|\alpha_{(j_l)} - \alpha_{(j_{l+1})}| + |\beta_{(k_l)} - \beta_{(k_{l+1})}|(\delta_\vartheta + h)\} \equiv M^\vartheta_{n2}/M^l,$$
$$E|R_{ni}(x;\alpha_{(j_l)},\beta_{(k_l)}) - R_{ni}(x;\alpha_{(j_{l+1})},\beta_{(k_{l+1})})|^2 \le h(M^\vartheta_{n2})^3/M^l,$$
we have
$$V_l \le \prod_{j=1}^{l+1}J_j^2\exp\big[-\varepsilon^2nh/\{1 + a_{n\vartheta}(nh/\log n)^{1/2}\}\big],$$
and (59) thus holds. This together with (57) completes the proof.

Lemma 6.14 Let $Z_{ij} = K_{ij}[\varphi(Y_i - a_j - b_j\theta_0^\top X_{ij}) - \varphi(Y_i - \hat a_j - \hat b_j\theta_0^\top X_{ij})]\hat b_jX_{ij}$. Then
$$\sum_{i,j}Z_{ij} - EZ_{ij} = o(n^2ha_{n\vartheta}). \quad (60)$$
Proof As $\hat a_j - a_j = O(a_{n\vartheta})$, $\hat b_j - b_j = O\{a_{n\vartheta} + (nh/\log n)^{-1/2}/h\}$ and, for any $\epsilon > 0$,
$$P\Big\{\Big|\sum_{i,j}Z_{ij} - EZ_{ij}\Big| \ge \epsilon n^2ha_{n\vartheta}\Big\} \le nP\Big\{\Big|\sum_iZ_{ij} - EZ_{ij}\Big| \ge \epsilon nha_{n\vartheta}\Big\},$$
(60) would follow if we could show that for any $x$,
$$P\Big\{\sup_{a\in B^{(1)}_n, b\in B^{(2)}_n}\Big|\sum_iR_{ix}(a,b)\Big| \ge \epsilon nha_{n\vartheta}\Big\} = O(n^{-a}) \quad \text{for some } a > 2, \quad (61)$$
where $B^{(1)}_n = \{a \in R : |a - a_x| \le ca_{n\vartheta}\}$, $B^{(2)}_n = \{b \in R : |b - b_x| \le c\{a_{n\vartheta} + (nh/\log n)^{-1/2}/h\}\}$, $a_x = m(\theta_0^\top x)$, $b_x = m'(\theta_0^\top x)$, $R_{ix}(a,b) = Z_{ix}(a,b) - EZ_{ix}(a,b)$, $K_{ix} = K(X_{ix}^\top\vartheta/h)$ and $Z_{ix}(a,b) = K_{ix}X_{ix}[\varphi(Y_i - a_x - b_x\theta_0^\top X_{ix}) - \varphi(Y_i - a - b\theta_0^\top X_{ix})]$. To this end, partition $B^{(i)}_n$, $i = 1, 2$, into a sequence of subrectangles $D^{(i)}_1, \cdots, D^{(i)}_{J_1}$, $i = 1, 2$, such that
$$|D^{(i)}_{j_1}| = \sup\{|a - a'| : a, a' \in D^{(i)}_{j_1}\} \le M^{(i)}_n/M, \quad 1 \le j_1 \le J_1,$$
where $M^{(1)}_n = ca_{n\vartheta}$, $M^{(2)}_n = c\{a_{n\vartheta} + (nh/\log n)^{-1/2}/h\}$, $M \equiv \epsilon^{-1}$ and $J_1 \le M$. Choose points $a_{j_1} \in D^{(1)}_{j_1}$ and $b_{k_1} \in D^{(2)}_{k_1}$. Then
$$\sup_{a\in B^{(1)}_n, b\in B^{(2)}_n}\Big|\sum_iR_{ix}(a,b)\Big| \le \max_{1\le j_1,k_1\le J_1}\sup_{a\in D^{(1)}_{j_1},b\in D^{(2)}_{k_1}}\Big|\sum_{i=1}^n\{R_{ix}(a_{j_1},b_{k_1}) - R_{ix}(a,b)\}\Big| + \max_{1\le j_1,k_1\le J_1}\Big|\sum_{i=1}^nR_{ix}(a_{j_1},b_{k_1})\Big| \equiv H_{n1} + H_{n2}. \quad (62)$$
We first consider $H_{n2}$:
$$P\Big\{H_{n2} \ge \frac{\varepsilon nha_{n\vartheta}}{2}\Big\} \le J_1^2P\Big\{\Big|\sum_{i=1}^nR_{ix}(a_{j_1},b_{k_1})\Big| \ge \frac{\epsilon nha_{n\vartheta}}{2}\Big\}.$$
As $R_{ix}(a_{j_1},b_{k_1})$ is bounded and $\mathrm{Var}\{R_{ix}(a_{j_1},b_{k_1})\} = O\{h(a_{n\vartheta} + (nh/\log n)^{-1/2})\}$, by Bernstein's inequality we have
$$J_1^2P\Big\{\Big|\sum_{i=1}^nR_{ix}(a_{j_1},b_{k_1})\Big| \ge \frac{\epsilon nha_{n\vartheta}}{2}\Big\} \le CJ_1^2\exp\{-\epsilon^2n^{1/2}h^{3/2}\} = O(n^{-a}), \quad \text{for some } a > 2.$$
We next consider $H_{n1}$. For each $j_1 = 1, \cdots, J_1$ and $i = 1, 2$, partition each rectangle $D^{(i)}_{j_1}$ further into a sequence of subrectangles $D^{(i)}_{j_1,1}, \cdots, D^{(i)}_{j_1,J_2}$.
Repeat this process recursively as follows. Suppose after the $l$th round we have a sequence of rectangles $D^{(i)}_{j_1,j_2,\cdots,j_l}$ with $1 \le j_k \le J_k$, $1 \le k \le l$; then in the $(l+1)$th round each rectangle $D^{(i)}_{j_1,j_2,\cdots,j_l}$ is partitioned into a sequence of subrectangles $\{D^{(i)}_{j_1,j_2,\cdots,j_l,j_{l+1}},\ 1 \le j_{l+1} \le J_{l+1}\}$ such that
$$|D^{(i)}_{j_1,j_2,\cdots,j_l,j_{l+1}}| = \sup\{|a - a'| : a, a' \in D^{(i)}_{j_1,j_2,\cdots,j_l,j_{l+1}}\} \le M^{(i)}_n/M^{l+1}, \quad 1 \le j_{l+1} \le J_{l+1},$$
where $J_{l+1} \le M$. End this process after the $(L_n+2)$th round, with $L_n$ being the smallest integer such that
$$(2/M)^{L_n} > a_{n\vartheta}/M^{(2)}_{n\vartheta} \quad [\text{which means } 2^{L_n} \le \{M^{(2)}_{n\vartheta}/a_{n\vartheta}\}^{\log 2/\log(M/2)}]. \quad (63)$$
Let $\mathcal D^{(i)}_l$, $i = 1, 2$, denote the set of all subrectangles of $D^{(i)}_0$ after the $l$th round of partitioning, and denote a typical element $D^{(i)}_{j_1,j_2,\cdots,j_l}$ of $\mathcal D^{(i)}_l$ by $D^{(i)}_{(j_l)}$. Choose points $a_{(j_l)} \in D^{(1)}_{(j_l)}$ and $b_{(j_l)} \in D^{(2)}_{(j_l)}$ and define
$$V_l = \sum_{(j_l)(k_l)}P\Big\{\sum_{i=1}^n\{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a_{j_{l+1}},b_{k_{l+1}})\} \ge \frac{\epsilon nha_{n\vartheta}}{2^{l+1}}\Big\}, \quad 1 \le l \le L_n+1,$$
$$Q_l = \sum_{(j_l)(k_l)}P\Big\{\sup_{a\in D^{(1)}_{(j_l)}, b\in D^{(2)}_{(k_l)}}\sum_{i=1}^n\{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\} \ge \frac{\epsilon nha_{n\vartheta}}{2^l}\Big\}, \quad 1 \le l \le L_n+2.$$
Then $Q_l \le V_l + Q_{l+1}$, $1 \le l \le L_n+1$. We first give a bound for $V_l$, $1 \le l \le L_n+1$. As $R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a_{j_{l+1}},b_{k_{l+1}})$ is bounded and
$$E|R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a_{j_{l+1}},b_{k_{l+1}})|^2 \le h\{a_{n\vartheta} + (nh/\log n)^{-1/2}\}/M^{l+1},$$
applying Bernstein's inequality and using (63), we have
$$V_l \le \prod_{j=1}^{l+1}J_j^2\exp\big[-\epsilon^2nh\min\{a_{n\vartheta}, a^2_{n\vartheta}(nh/\log n)^{1/2}\}\big] \le \prod_{j=1}^{l+1}J_j^2\exp(-\epsilon^2n^{1/2}h^{3/2}). \quad (64)$$
We now focus on $Q_{L_n+2}$. Recall the definition of $Z_{ix}(a,b)$:
$$Z_{ix}(a,b) = K_{ix}[\varphi(Y_i - a_x - b_x\theta_0^\top X_{ix}) - \varphi(Y_i - a - b\theta_0^\top X_{ix})]X_{ix}.$$
For any $a \in D^{(1)}_{(j_l)}$ and $b \in D^{(2)}_{(k_l)}$, let $I^{a,b}_i = 1$ if there is a discontinuity point of $\varphi(\cdot)$ between $Y_i - a_{j_l} - b_{k_l}\theta_0^\top X_{ix}$ and $Y_i - a - b\theta_0^\top X_{ix}$, and $I^{a,b}_i = 0$ otherwise. Write
$$R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b) = \{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\}I^{a,b}_i + \{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\}(1 - I^{a,b}_i).$$
Then we have $|\{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\}(1 - I^{a,b}_i)| \le C\{a_{n\vartheta} + (nh/\log n)^{-1/2}\}/M^l$, and specifically for $l = L_n+2$,
$$P\Big\{\sup_{a\in D^{(1)}_{(j_l)}, b\in D^{(2)}_{(k_l)}}\sum_{i=1}^n\{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\}(1 - I^{a,b}_i) \ge \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\} \le P\Big\{\sum_{i=1}^nU_i \ge \frac{1}{8}Mnh\Big\} \le P\Big\{\sum_{i=1}^nU_i - EU_i \ge \frac{Mnh}{16}\Big\},$$
where $U_i = I\{|X_{ix}^\top\vartheta| \le h\}$ and the first inequality is due to (63). By Bernstein's inequality, this in turn implies that for $l = L_n+2$,
$$\prod_{j=1}^{l+1}J_j^2P\Big\{\sup_{a\in D^{(1)}_{(j_l)}, b\in D^{(2)}_{(k_l)}}\sum_{i=1}^n\{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\}(1 - I^{a,b}_i) \ge \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\} = O(n^{-a}), \quad (65)$$
for some $a > 2$. Now we have to show a similar result for
$$\prod_{j=1}^{l+1}J_j^2P\Big\{\sup_{a\in D^{(1)}_{(j_l)}, b\in D^{(2)}_{(k_l)}}\sum_{i=1}^n\{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\}I^{a,b}_i \ge \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\}, \quad l = L_n+2.$$
Note that for any $a \in D^{(1)}_{(j_l)}$ and $b \in D^{(2)}_{(k_l)}$, $I^{a,b}_i \le I\{Y_i \in S_i\}$, where
$$S_i = [a_{j_l} + b_{k_l}\theta_0^\top X_{ix} - CM^{(2)}_n/M^l,\ a_{j_l} + b_{k_l}\theta_0^\top X_{ix} + CM^{(2)}_n/M^l],$$
which is independent of $a, b$. Let $U_i = I\{|X_{ix}^\top\vartheta| \le h\}I\{Y_i \in S_i\}$. As $R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)$ is bounded, we have for $l = L_n+2$,
$$P\Big\{\sup_{a\in D^{(1)}_{(j_l)}, b\in D^{(2)}_{(k_l)}}\sum_{i=1}^n\{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\}I^{a,b}_i \ge \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\} \le P\Big\{\sum_{i=1}^nU_i \ge \frac{\epsilon nha_{n\vartheta}}{C2^{L_n+2}}\Big\} \le P\Big\{\sum_{i=1}^nU_i - EU_i \ge \frac{\epsilon nha_{n\vartheta}}{C2^{L_n+4}}\Big\}, \quad (66)$$
where the second inequality is due to (63).
Applying Bernstein's inequality to the right-hand side of (66) and using (63), we have
$$\prod_{j=1}^{l+1}J_j^2P\Big\{\sup_{a\in D^{(1)}_{(j_l)}, b\in D^{(2)}_{(k_l)}}\sum_{i=1}^n\{R_{ix}(a_{j_l},b_{k_l}) - R_{ix}(a,b)\}I^{a,b}_i \ge \frac{\epsilon nha_{n\vartheta}}{2^{L_n+3}}\Big\} = O(n^{-a}), \quad \text{for } l = L_n+2,$$
for some $a > 2$. This together with (65) implies that $Q_{L_n+2} = O(n^{-a})$ for some $a > 2$. Therefore, we have
$$P\Big\{H_{n1} \ge \frac{\epsilon nha_{n\vartheta}}{2}\Big\} \le Q_1 \le \sum_{l=1}^{L_n+1}V_l + Q_{L_n+2} = O(n^{-a}), \quad \text{for some } a > 2.$$

Lemma 6.15 For all large enough $M > 0$, $Q_2 \le Md_n$ a.s., where
$$d_n = nha^2_{n\vartheta}(l_n/h)\{1 + a^{-1}_{n\vartheta}(nh/\log n)^{-1/2}\} = o(nha^2_{n\vartheta}).$$

Proof Let $X_{ik} = X_i - x_k$, $\mu_{ik} = (1, X_{ik}^\top)^\top$, $K_{ik} = K(X_{ik}^\top\vartheta/h)$ and write
$$\Phi_{ni}(x_k;\alpha,\beta) - \Phi_{ni}(x;\alpha,\beta) = \xi_{i1} + \xi_{i2} + \xi_{i3},$$
where
$$\xi_{i1} = (K_{ik}\mu_{ik} - K_{ix}\mu_{ix})^\top\alpha\int_0^1\{\varphi_{ni}(x_k; \mu_{ik}^\top(\beta + \alpha t)) - \varphi_{ni}(x_k; 0)\}\,dt,$$
$$\xi_{i2} = K_{ix}\mu_{ix}^\top\alpha\int_0^1\{\varphi_{ni}(x_k; \mu_{ik}^\top(\beta + \alpha t)) - \varphi_{ni}(x; \mu_{ix}^\top(\beta + \alpha t))\}\,dt,$$
$$\xi_{i3} = K_{ix}\mu_{ix}^\top\alpha\{\varphi_{ni}(x; 0) - \varphi_{ni}(x_k; 0)\}.$$
Then $P(Q_2 > M^{3/2}d_n/3) \le T_n(P_{n1} + P_{n2} + P_{n3})$, where
$$P_{nj} \equiv \max_{1\le k\le T_n}P\Big\{\sup_{x\in D_k}\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\Big|\sum_{i=1}^n\xi_{ij}\Big| \ge M^{3/2}d_n/9\Big\}, \quad j = 1, 2, 3.$$
Based on the Borel–Cantelli lemma, $Q_2 \le M^{3/2}d_n$ almost surely if $\sum_nT_nP_{nj} < \infty$, $j = 1, 2, 3$. Again this can be accomplished through an approach similar to Lemma 5.1 in Kong et al. (2008). We only deal with $P_{n1}$ to illustrate. First note that if $\xi_{i1} \neq 0$, then either $K_{ik} \neq 0$ or $K_{ix} \neq 0$. Without loss of generality, suppose $K_{ik} \neq 0$, i.e. $|X_{ix}^\top\vartheta| \le h$, whence $|X_{ix}^\top\theta_0| \le h + |\delta_\vartheta|$ and $|\mu_{ik}^\top(\beta + \alpha t)| \le C\{M^{(1)}_{n\vartheta} + M^{(2)}_{n\vartheta}\}$. For any fixed $\alpha \in B^{(1)}_n$ and $\beta \in B^{(2)}_n$, let $I^{\alpha,\beta}_{ik} = 1$ if there exists some $t \in [0, 1]$ such that there are discontinuity points of $\varphi(Y_i - a)$ between $\mu_{ik}^\top(\beta(x_k) + \beta + \alpha t)$ and $\mu_{ik}^\top\beta_p(x_k)$; and $I^{\alpha,\beta}_{ik} = 0$ otherwise.
Write $\xi_{i1} = \xi_{i1}I^{\alpha,\beta}_{ik} + \xi_{i1}(1 - I^{\alpha,\beta}_{ik})$. As $|(K_{ik}\mu_{ik} - K_{ix}\mu_{ix})^\top\alpha| \le CM^{(1)}_{n\vartheta}l_n/h$ and $|\mu_{ik}^\top(\beta + \alpha t)| \le CM^{(2)}_{n\vartheta}$, we have
$$|\xi_{i1}(1 - I^{\alpha,\beta}_{ik})| \le CM^{(1)}_{n\vartheta}M^{(2)}_{n\vartheta}l_n/h = o(a^2_{n\vartheta})$$
uniformly in $i, \alpha, \beta$ and $x \in D_k$, if $nh^3/\log^3 n \to \infty$. Let $U_{ik} = I\{|X_{ik}^\top\vartheta| \le 2h\}$. As $\xi_{i1} = \xi_{i1}U_{ik}$ (because $l_n = o(h)$), we have
$$P\Big\{\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\sup_{x\in D_k}\sum_{i=1}^n\xi_{i1}(1 - I^{\alpha,\beta}_{ik}) > \frac{Md_n}{18}\Big\} \le P\Big\{\sum_{i=1}^nU_{ik} > \frac{Mnh}{18C}\Big\} \le P\Big\{\Big|\sum_{i=1}^nU_{ik} - EU_{ik}\Big| > \frac{Mnh}{36C}\Big\}, \quad (67)$$
where the second inequality follows from the fact that $EU_{ik} = O(h)$. We can then apply to (67) Bernstein's inequality for independent data, or Lemma 5.4 in Kong et al. (2008) for the dependent case, to obtain that
$$T_nP\Big\{\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\sum_{i=1}^n\xi_{i1}(1 - I^{\alpha,\beta}_{ik}) > Md_n/18\Big\} \ \text{is summable over } n, \quad (68)$$
whence $\sum_nT_nP_{n1} < \infty$ is equivalent to
$$T_nP\Big\{\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\sum_{i=1}^n\xi_{i1}I^{\alpha,\beta}_{ik} > Md_n/18\Big\} \ \text{being summable over } n. \quad (69)$$
To this end, first note that $I^{\alpha,\beta}_{ik} \le I\{\varepsilon_i \in S^{\alpha,\beta}_{i;k}\}$, where
$$S^{\alpha,\beta}_{i;k} = \bigcup_{j=1}^m\bigcup_{t\in[0,1]}[a_j - A(X_i, x_k) + \mu_{ik}^\top(\beta + \alpha t),\ a_j - A(X_i, x_k)] \subseteq \bigcup_{j=1}^m[a_j - CM^{(2)}_{n\vartheta},\ a_j + CM^{(2)}_{n\vartheta}] \equiv D_n,$$
for some $C > 0$, with $A(x_1, x_2) = m(x_1^\top\theta_0) - m(x_2^\top\theta_0) - m'(x_1^\top\theta_0)(x_1 - x_2)^\top\theta_0$, where in the derivation of $S^{\alpha,\beta}_{i;k} \subseteq D_n$ we have used the facts that $|X_{ik}| \le 2h$, $\mu_{ik}^\top(\beta + \alpha t) = O(M^{(2)}_n)$ and $A(X_i, x_k) = O(h^2 + |\delta_\vartheta|^2) = o(M^{(2)}_n)$ uniformly in $i$. As $I^{\alpha,\beta}_{ik} \le I\{\varepsilon_i \in D_n\}$, we have $|\xi_{i1}|I^{\alpha,\beta}_{ik} \le |\xi_{i1}|U_{ni}$, where $U_{ni} \equiv I(|X_{ik}| \le 2h)I\{\varepsilon_i \in D_n\}$, which is independent of the choice of $\alpha$ and $\beta$.
Therefore,
$$P\Big\{\sup_{\alpha\in B^{(1)}_n,\beta\in B^{(2)}_n}\sum_{i=1}^n\xi_{i1}I^{\alpha,\beta}_{ik} > Md_n/18\Big\} \le P\Big\{\sum_{i=1}^nU_{ni} > MnhM^{(2)}_n/(18C)\Big\} \le P\Big\{\sum_{i=1}^n(U_{ni} - EU_{ni}) > \frac{MnhM^{(2)}_n}{36C}\Big\}, \quad (70)$$
where the first inequality is because $|\xi_{i1}| \le CMa_{n\vartheta}l_n/h$ and the second because $EU_{ni} = O(hM^{(2)}_n)$. Similar to (67), we can apply either Bernstein's inequality for independent data or, for the dependent case, Lemma 5.4 in Kong et al. (2008) to see that (69) indeed holds.

Lemma 6.16 All eigenvalues of $(S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_0 + \theta_0\theta_0^\top)$ fall into the interval $(0, 1)$.

Proof By the Cauchy–Schwarz inequality, for any $x \in R^d$,
$$E\{g(X)(X - x)\mid X^\top\vartheta = x^\top\vartheta\}\,E\{g(X)(X - x)\mid X^\top\vartheta = x^\top\vartheta\}^\top \le E\{g(X)\mid X^\top\vartheta = x^\top\vartheta\}\,E\{g(X)(X - x)(X - x)^\top\mid X^\top\vartheta = x^\top\vartheta\},$$
which is equivalent to
$$\{\nu_\vartheta(x) - x\mu_\vartheta(x)\}\{\nu_\vartheta(x) - x\mu_\vartheta(x)\}^\top \le \mu_\vartheta(x)\omega_\vartheta(x)$$
or
$$\mu_\vartheta(x)\{(\nu/\mu)_\vartheta(x) - x\}\{(\nu/\mu)_\vartheta(x) - x\}^\top \le \omega_\vartheta(x).$$
Multiplying both sides by $m'(x^\top\theta_0)^2$ and taking expectations, we have $S_2 - \Omega_0 \ge 0$, which can be strengthened to $S_2 - \Omega_0 > 0$. This is because, if there exists some $\vartheta_1 \neq 0$ such that $\vartheta_1^\top(S_2 - \Omega_0)\vartheta_1 = 0$, then for any $x$ there exists some $C$ such that
$$\{g(X)\}^{1/2}\vartheta_1^\top(X - x) \equiv C\{g(X)\}^{1/2} \ \text{for all } X^\top\vartheta = x^\top\vartheta \ \Rightarrow\ \vartheta_1^\top(X - x) \equiv C \ \text{for all } X^\top\vartheta = x^\top\vartheta \ \Rightarrow\ \vartheta_1 \equiv \vartheta. \quad (71)$$
A sufficient condition for $(S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_0 + \theta_0\theta_0^\top)$ to have only positive eigenvalues is that $\theta_0$ is the sole eigenvector of $S_2$ and $\Omega_0$ corresponding to eigenvalue 0. We argue this by contradiction.
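The conditional Cauchy–Schwarz step used above, $E\{gZ\}E\{gZ\}^\top \le E\{g\}\,E\{gZZ^\top\}$ in the positive-semidefinite order, can be checked by simulation. The sketch below is our own illustration with an arbitrary positive weight $g$ and a Gaussian design (not the paper's model); it estimates both sides by Monte Carlo and verifies that the gap matrix is positive definite:

```python
import numpy as np

# Monte Carlo check of the matrix Cauchy-Schwarz inequality:
# for positive g(X) and Z = X - x,
#   E[g Z] E[g Z]^T  <=  E[g] * E[g Z Z^T]   (PSD order).
# The weight g and the law of X are illustrative choices of ours.
rng = np.random.default_rng(2)
d, n = 3, 200_000
X = rng.standard_normal((n, d))
x = np.array([0.5, -1.0, 0.2])
g = 1.0 + X[:, 0] ** 2                 # some positive weight function

Z = X - x
m1 = (g[:, None] * Z).mean(axis=0)     # estimate of E[g Z]
s0 = g.mean()                          # estimate of E[g]
m2 = np.einsum('i,ij,ik->jk', g, Z, Z) / n   # estimate of E[g Z Z^T]

gap = s0 * m2 - np.outer(m1, m1)       # should be positive semidefinite
eigs = np.linalg.eigvalsh(gap)
print(eigs)
```

All eigenvalues of the gap matrix come out strictly positive here, mirroring the strict inequality $S_2 - \Omega_0 > 0$ that the lemma establishes whenever no direction is degenerate.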
Suppose there exists some $\vartheta$ such that $\vartheta \perp \theta_0$ and
$$E\{g(X)\vartheta^\top(X - x)(X - x)^\top\vartheta\mid\theta_0^\top X = \theta_0^\top x\} = 0, \ \text{for any } x \in R^d, \quad (72)$$
$$E\{g(X)\vartheta^\top(X - x)\mid\theta_0^\top X = \theta_0^\top x\} = 0, \ \text{for any } x \in R^d. \quad (73)$$
Note that as $g(X) > 0$, (72) in fact implies that $E\{\vartheta^\top(X - x)\mid\theta_0^\top X = \theta_0^\top x\} = 0$, which in turn means that $\vartheta = \theta_0$; this contradicts the fact that $\vartheta \perp \theta_0$. To show that (73) cannot be true, let $\{b_1, \cdots, b_{d-1}\}$ constitute an orthogonal basis of the orthogonal complement of the vector $\theta_0$. Let $x = b_i$, $i = 1, \cdots, d-1$; then $\theta_0^\top x = 0$ and from (73) we have
$$E\{g(X)\vartheta^\top(X - b_i)\mid\theta_0^\top X = 0\} = 0 \ \Rightarrow\ \vartheta^\top E\{g(X)X\mid\theta_0^\top X = 0\} = \vartheta^\top b_iE\{g(X)\mid\theta_0^\top X = 0\}.$$
As $E\{g(X)X\mid\theta_0^\top X = 0\}$ and $E\{g(X)\mid\theta_0^\top X = 0\}$ are constants (a vector and a scalar) independent of $b_i$, and $E\{g(X)X\mid\theta_0^\top X = 0\} \perp \theta_0$, there exists some vector $b \perp \theta_0$ such that
$$\vartheta^\top b = \vartheta^\top b_i, \ i = 1, \cdots, d-1 \ \Leftrightarrow\ \vartheta^\top(b - b_i) = 0, \ i = 1, \cdots, d-1,$$
but this cannot be true unless $\vartheta \perp b_i$ for all $i = 1, \cdots, d-1$. Next we show that $\lambda_{\max} < 1$ by contradiction. If not, suppose $x$ is the corresponding eigenvector:
$$(S_2 + \theta_0\theta_0^\top)^{-1}(\Omega_0 + \theta_0\theta_0^\top)x = \lambda_{\max}x \ \Rightarrow\ (\Omega_0 + \theta_0\theta_0^\top)x = \lambda_{\max}(S_2 + \theta_0\theta_0^\top)x$$
$$\Rightarrow\ x^\top(\Omega_0 + \theta_0\theta_0^\top)x = \lambda_{\max}x^\top(S_2 + \theta_0\theta_0^\top)x \ \Rightarrow\ x^\top\Omega_0x \ge \lambda_{\max}x^\top S_2x \quad (\because\ \lambda_{\max} \ge 1),$$
which contradicts the fact that $S_2 - \Omega_0 > 0$ if $x \neq \theta_0$.

REFERENCES

Andrews, D.W.K. (1994) Asymptotics for semiparametric econometric models via stochastic equicontinuity. Econometrica 62, 43-72.

Bosq, D. (1998) Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. Springer, New York.

Chaudhuri, P., Doksum, K. and Samarov, A. (1997) On average derivative quantile regression. Ann. Statist. 25, 715-744.

Koenker, R. (2005) Quantile Regression. Cambridge University Press, New York.

Koenker, R.
and Bassett, G. (1978) Regression quantiles. Econometrica 46(1), 33-50.

Kong, E., Linton, O. and Xia, Y. (2008) Uniform Bahadur representation for local polynomial estimates of M-regression and its application to the additive model. Econometric Theory (to appear).

Korolyuk, V.S. and Borovskikh, Yu.V. (1989) Law of the iterated logarithm for U-statistics. Ukrainian Mathematical Journal 1, 89-92.

Lange, T., Rahbek, A. and Jensen, S.T. (2006) Estimation and asymptotic inference in the first-order AR-ARCH model. Preprint.

Pollard, D. (1991) Asymptotics for least absolute deviation regression estimators. Econometric Theory 7, 186-199.

Sun, S. and Chiang, C.Y. (1997) Limiting behavior of the perturbed empirical distribution functions evaluated at U-statistics for strongly mixing sequences of random variables. Journal of Applied Mathematics and Stochastic Analysis 10, 3-20.

Welsh, A.H. (1996) Robust estimation of smooth regression and spread functions and their derivatives. Statistica Sinica 6, 347-366.

Xia, Y. (2007) Direct estimation of the multiple-index model.