Principal components analysis for sparsely observed correlated functional data using a kernel smoothing approach

Debashis Paul and Jie Peng
University of California, Davis

Abstract

In this paper, we consider the problem of estimating the covariance kernel and its eigenvalues and eigenfunctions from sparse, irregularly observed, noise-corrupted and (possibly) correlated functional data. We present a method based on pre-smoothing of individual sample curves through an appropriate kernel. We show that the naive empirical covariance of the pre-smoothed sample curves gives a highly biased estimator of the covariance kernel along its diagonal. We attend to this problem by estimating the diagonal and off-diagonal parts of the covariance kernel separately. We then present a practical and efficient method for choosing the bandwidth for the kernel by using an approximation to the leave-one-curve-out cross-validation score. We prove that under standard regularity conditions on the covariance kernel, and assuming i.i.d. samples, the risk of our estimator, under $L^2$ loss, achieves the optimal nonparametric rate when the number of measurements per curve is bounded. We also show that even when the sample curves are correlated in such a way that the noiseless data has a separable covariance structure, the proposed method is still consistent, and we quantify the role of this correlation in the risk of the estimator.

AMS Subject Classification: 62G20, 62H25
Keywords: Functional data analysis, principal component analysis, kernel smoothing, cross validation, consistency

1 Introduction

Noisy functional data arise frequently in various fields, for example longitudinal data analysis, chemometrics, and econometrics (Ferraty and Vieu, 2006).
Depending on how the measurements are taken, there can be two different scenarios: (i) individual curves are measured on a dense, regular grid; (ii) the measurements are observed on a sparse, and typically irregular, set of points in an interval. The first situation usually arises when the data are recorded by some automated instrument, e.g. in chemometrics, where the curves represent the spectra of certain chemical substances. The second scenario is more typical in longitudinal studies, where the individual curves could represent the level of concentration of some substance, and the measurements on the subjects may be taken only at irregular time points. In these settings, when the goal of analysis is either data compression, model building or studying covariate effects, one may want to extract information about the functional principal components (i.e., the eigenvalues and eigenfunctions of the covariance kernel). The eigenfunctions give a nice basis for representing the data, and hence are very useful in problems related to model building and prediction for functional data. For example, they have been used extensively in functional linear regression (Cardot, Ferraty and Sarda (1999), Hall and Horowitz (2007), Cai and Hall (2006)). Ramsay and Silverman (2005) and Ferraty and Vieu (2006) give extensive surveys of the applications of functional principal components. In the first scenario, i.e., data on a regular grid, as long as the individual curves are smooth, the measurement noise level is low, and the grid is dense enough, one can essentially treat the data as lying on a continuum and employ techniques similar to the ones used in classical multivariate analysis. However, the irregular nature of the data in the second scenario, and the associated measurement noise, require a different treatment.
In this paper, we propose a kernel smoothing approach to estimate the covariance kernel and its functional principal components based on sparse, irregularly observed, noise-corrupted functional data. This method is based on the pre-smoothing of individual curves, with a suitable modification of the diagonal, for estimating the covariance kernel. We prove the consistency and derive the rate of convergence of the proposed estimator. Also, under many practical circumstances the sample curves are correlated, for example in spatio-temporal data (Hlubinka and Prchal, 2007), online auction data (Peng and Müller, 2008), and time course gene expression data (Spellman et al., 1998). However, in the existing literature, most theoretical studies of principal components analysis assume i.i.d. sample curves. The analysis presented in this paper shows that the asymptotic consistency of the principal components holds for the proposed method even under certain types of correlation structures (as discussed later).

Before we go into the details of the proposed procedure, we first give an outline of the data model and an overview of different approaches to this problem. Suppose that we observe $n$ realizations of an $L^2$-stochastic process $\{X(t) : t \in [0,1]\}$ at a sequence of points on the interval $[0,1]$ (or, more generally, on an interval $[a,b]$), with additive measurement noise. That is, the observed data $\{Y_{ij} : 1 \le j \le m_i;\ 1 \le i \le n\}$ can be modeled as

$$Y_{ij} = X_i(T_{ij}) + \sigma \varepsilon_{ij}, \qquad (1)$$

where $\{\varepsilon_{ij}\}$ are i.i.d. with mean 0 and variance 1. Since $X(t)$ is an $L^2$ stochastic process, by Mercer's theorem (Ash, 1972) there exists a positive semi-definite kernel $C(\cdot,\cdot)$ such that $\mathrm{Cov}(X(s), X(t)) = C(s,t)$, and each $X_i(t)$ has the following a.s. representation in terms of the eigenfunctions of the kernel $C(\cdot,\cdot)$:

$$X_i(t) = \mu(t) + \sum_{\nu=1}^{\infty} \sqrt{\lambda_\nu}\, \psi_\nu(t)\, \xi_{i\nu}, \qquad (2)$$

where $\mu(\cdot) = E(X(\cdot))$ is the mean function; $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues of $C(\cdot,\cdot)$; $\psi_\nu(\cdot)$ are the corresponding orthonormal eigenfunctions; and the random variables $\{\xi_{i\nu} : \nu \ge 1\}$, for each $i$, are uncorrelated with zero mean and unit variance. Furthermore, we assume that for each pair $(i,j)$ with $1 \le i \ne j \le n$, the correlation is modelled by $E(\xi_{i\nu}\xi_{j\nu'}) = \delta_{\nu\nu'}\rho_{ij}$ for $1 \le \nu, \nu' \le M$, where $\rho_{ij}$ may be nonzero. This gives rise to a separable covariance structure for the noiseless data. That is, the processes $\{X_i(\cdot)\}_{i=1}^n$ satisfy $\mathrm{Cov}(X_i(s), X_j(t)) = \rho_{ij} C(s,t)$, with $\rho_{ii} \equiv 1$. This holds, for example, when the principal component scores $\{\xi_{i\nu}\}_{i=1}^n$ for different $\nu$ are i.i.d. stationary time series. Finally, in the observed data model (1), we assume that $T_i = \{T_{ij} : j = 1, \ldots, m_i\}$ are randomly sampled from a continuous distribution.

As an example that is particularly suitable for modeling within the framework presented above, we consider the data on atmospheric radiation in Hlubinka and Prchal (2007). There, the measurements are taken from balloons from Earth's surface up to an altitude of 35 km. The data points corresponding to the $i$-th balloon are of the form $(a_i, z_i)$, where $a$ represents the altitude and $z$ represents the average number of pulses at altitude $a$, which is thought to be proportional to the radiation intensity. Thus, these vertical profiles of atmospheric radiation are considered as individual realizations of functional data. That is, here the $a_i$'s are the measurement points, the $z_i$'s are the measurements, and the subjects are indexed by time.
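A minimal simulation from the data model (1)-(2) with the separable correlation structure may help fix ideas. In the sketch below, the eigenvalues, Fourier eigenfunctions, uniform design, and AR(1) score dependence are illustrative choices, not prescribed by the paper; the AR(1) scores make $\mathrm{Cov}(X_i(s), X_j(t)) = \rho^{|i-j|} C(s,t)$, a special case of the separable structure above.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sparse_fda(n=50, m=5, sigma=0.5, rho=0.6):
    """Simulate sparse, noisy curves from model (1)-(2). For each component
    nu, the scores (xi_{1,nu}, ..., xi_{n,nu}) follow a stationary AR(1)
    series across curves, giving a separable covariance structure."""
    lam = np.array([1.0, 0.5, 0.25])                 # lambda_1 >= lambda_2 >= ...
    T = rng.uniform(0.0, 1.0, size=(n, m))           # design points T_ij
    # orthonormal eigenfunctions psi_nu(t) = sqrt(2) cos(nu * pi * t) on [0,1]
    psi = np.stack([np.sqrt(2.0) * np.cos((nu + 1) * np.pi * T)
                    for nu in range(len(lam))])      # shape (K, n, m)
    # AR(1) scores across curves, independent across components
    xi = np.empty((n, len(lam)))
    xi[0] = rng.standard_normal(len(lam))
    for i in range(1, n):
        xi[i] = rho * xi[i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(len(lam))
    X = np.einsum('k,ik,kim->im', np.sqrt(lam), xi, psi)   # noiseless X_i(T_ij)
    Y = X + sigma * rng.standard_normal((n, m))            # observed Y_ij, eq. (1)
    return T, Y

T, Y = simulate_sparse_fda()
```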
Hence there is a natural dependence among the sample curves observed over different time points. Moreover, it is reasonable to assume that the dependence across time does not change with the vertical distance, except possibly through a long-term trend; i.e., the spatio-temporal covariance structure is separable.

Below we give a short overview of two existing approaches to the problem of estimating functional principal components from sparse data. Yao, Müller and Wang (2005) propose a local linear smoothing of the empirical covariances $\{\hat C_i(T_{ij}, T_{ij'}) : j \ne j'\}_{i=1}^n$, where $\hat C_i(T_{ij}, T_{ij'}) = (Y_{ij} - \hat\mu(T_{ij}))(Y_{ij'} - \hat\mu(T_{ij'}))$ and $\hat\mu$ is the estimate of the mean function $\mu(\cdot)$ obtained by local linear smoothing. They prove asymptotic consistency of this estimator and the estimated eigenfunctions, assuming i.i.d. sample curves. Hall, Müller and Wang (2006) further show that the problem of estimating the covariance kernel and that of estimating its eigenfunctions are intrinsically different, in that the former is a two-dimensional smoothing problem while the latter is a one-dimensional one, which results in different choices of optimal bandwidth. They also prove that the proposed local polynomial estimator achieves the optimal nonparametric convergence rate with the optimal choice of bandwidths, under the i.i.d. setting, when the number of measurements per curve is bounded. Instead of the local polynomial approach, where one imposes regularization on the estimates by varying the bandwidth of the kernel, one can impose regularization by restricting the eigenfunctions to a known basis of smooth functions. This approach has been used by various researchers, including Besse, Cardot and Ferraty (1997), Cardot (2000), James, Hastie and Sugar (2000) and Peng and Paul (2007).
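The first step of the Yao et al. (2005) approach, assembling the raw covariance points $\hat C_i(T_{ij}, T_{ij'})$ for $j \ne j'$, can be sketched as follows; the two-dimensional local linear smoothing of these points is omitted, and the mean estimate is passed in as a callable.

```python
import numpy as np

def raw_covariance_pairs(Ys, Ts, mu_hat):
    """Raw covariances C_i(T_ij, T_ij') = (Y_ij - mu(T_ij)) (Y_ij' - mu(T_ij'))
    for j != j', pooled over all curves; these points are the input to the
    2-D smoother of Yao et al. (2005). mu_hat is a callable estimate of mu."""
    pts, vals = [], []
    for Y, T in zip(Ys, Ts):
        r = Y - mu_hat(T)                 # centered observations for curve i
        m = len(T)
        for j in range(m):
            for jp in range(m):
                if j != jp:               # diagonal pairs j == j' are excluded
                    pts.append((T[j], T[jp]))
                    vals.append(r[j] * r[jp])
    return np.array(pts), np.array(vals)
```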
Peng and Paul (2007) propose to directly maximize the restricted log-likelihood under the working assumption of Gaussianity, such that the resulting estimator satisfies the geometry of the parameter space. This method is implemented through a Newton-Raphson algorithm on the Stiefel manifold of rectangular matrices with orthonormal columns; the latter space is the parameter space for the matrix of basis coefficients of the eigenfunctions. Furthermore, in Paul and Peng (2007) the authors prove that this restricted maximum likelihood (REML) estimator also achieves the optimal nonparametric rate when the number of measurements per sample curve is bounded and the sample curves are i.i.d.

We now give a brief description of the estimation procedure proposed in this paper. The method is partly motivated by the observation that the naive sample covariance based on the pre-smoothed individual sample curves is a highly biased estimator along the diagonal of the covariance kernel when $m_i$, the number of measurements per curve, is small. As can be seen clearly from (6) in Section 2.1, this bias does not vanish asymptotically unless $(\min_{1\le i\le n} m_i)\, h_n \to \infty$ as $n \to \infty$, where $h_n$ is the bandwidth of the kernel smoother. Under the latter setting, Hall et al. (2006) discuss the possibility of using a local linear smoother for individual sample curves and then performing a PCA on the smoothed curves. Furthermore, when the design points $T_{ij}$ are regularly spaced and sufficiently dense, they show that using conventional PCA for functional data (see the statements and conditions in Theorem 3 of that paper for details) one obtains root-$n$ consistent estimates of the eigenvalues and eigenfunctions, so that the problem is asymptotically equivalent to a parametric problem.
An interesting question is whether the naive kernel smoothing approach can be suitably modified so that it produces estimators with good asymptotic risk properties even when the $m_i$'s are relatively small. Our approach in this paper goes in this direction and involves estimating the diagonal and the off-diagonal portions separately, and then merging them together using a smooth weight kernel. The estimation of the off-diagonal portion is based on pre-smoothing individual sample curves by a linearized kernel smoother. The estimation of the diagonal part involves linearized kernel smoothing of the empirical variances. The task of selecting an appropriate bandwidth, and the number of nonzero eigenvalues, is addressed by obtaining a computationally efficient approximation to the leave-one-curve-out cross-validation score. This approximation procedure, as well as the asymptotic analysis of the estimators, is based on the perturbation theory of linear operators.

We now summarize the main contributions of this paper. Our approach of merging two separate pre-smoothed linearized kernel estimates of the diagonal and off-diagonal parts of the covariance kernel is new and is computationally very efficient. We prove that the proposed estimator achieves the optimal nonparametric rate when the observations are i.i.d. realizations of a finite dimensional smooth stochastic process and the number of measurements per curve is bounded. This result parallels the one obtained by Hall et al. (2006) for the local polynomial approach. Moreover, we obtain explicit expressions for the integrated mean squared error of the estimated eigenfunctions under a regime of separable covariance structure among the sample curves. The quantification of the role of correlation in the risk behavior (Theorem 4.2) is seemingly new in the literature in the context of functional data analysis. We also derive a lower bound on the rate of convergence of the risk of the first eigenfunction (Theorem 4.3) which is sharper than an analogous (but more general) bound obtained in Hall et al. (2006). This lower bound and the matching upper bound on the rate of convergence for the i.i.d. case show that the proposed estimator attains the optimal rate even when $\max_{1\le i\le n} m_i \to \infty$, at least under the restricted setting described in Theorem 4.3. Moreover, if the correlation between sample curves is "weak" in a suitable sense, then the optimal rates of convergence for the eigenfunctions in the correlated and i.i.d. cases are the same. Furthermore, we show that our estimation procedure also allows for a computationally efficient approximation of the leave-one-curve-out cross-validation score, which is used for selecting the bandwidth for estimating the eigenfunctions. This approximation is based on a perturbation analysis approach that is natural given the form of our estimator. In the paper, we also show that the widely used prediction error loss for cross-validation is not correctly scaled in the current context. Thus we propose to use the empirical Kullback-Leibler loss for the cross-validation criterion.

The rest of the paper is organized as follows. In Section 2, we propose the estimation procedure and contrast it with the naive kernel smoothing approach. In Section 3, we propose an approximation to the leave-one-curve-out cross-validation score based on the perturbation theory for linear operators. In Section 4, we state the main results about the consistency and rates of convergence of the estimators of the covariance kernel and its eigenfunctions.
In Section 5, we give an outline of the proofs of the main results (Theorems 4.1 and 4.2) and discuss their implications. In Section 6, we give an overview of various related issues and future research directions. The proof details are provided in the appendices.

2 Method

Throughout this section, we assume that the mean curve has been estimated separately and subtracted from the data. Thus, without loss of generality, we assume that $\mu = 0$. Also, in the asymptotic analysis carried out in Section 4, we make the same assumption to simplify the exposition. The case of arbitrary $\mu$ with a sufficient degree of smoothness can be easily handled.

2.1 Naive kernel smoothing approach

A popular method in nonparametric function estimation is to smooth the individual sample curves by a kernel averaging of the sample points. In principle, one can adopt a similar approach in the current context. This means first smoothing individual sample curves, then computing the covariance of the "pre-smoothed" sample curves, followed by an eigen-analysis of this "pre-smoothed" empirical covariance. In the following, we first describe such an approach briefly, and then show that even in the case of i.i.d. data, the estimator thus obtained has an intrinsic bias in estimating the diagonal of the covariance kernel, unless the number of measurements per curve is large.

Let $K(\cdot)$ be a summability kernel with an adequate degree of smoothness, satisfying the following conditions:

B1 (i) $\mathrm{supp}(K) = [-B_K, B_K]$ for some $B_K > 0$; (ii) $K$ is symmetric about 0; (iii) $\int K(x)\,dx = 1$; (iv) $\int xK(x)\,dx = 0$; (v) $\int K'(x)\,dx = 0$; (vi) $\int xK'(x)\,dx = -1$.

We then define the pre-smoothed sample curves as

$$\tilde X_i(t) = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}\, K_{h_{n,i}}(t - T_{ij}), \qquad i = 1,\ldots,n, \qquad (3)$$

where $K_h(x) = h^{-1}K(h^{-1}x)$ for $h > 0$, and $h_{n,i}$ is the bandwidth for the $i$-th curve. Then the empirical covariance based on the pre-smoothed curves is simply

$$\tilde C(s,t) = \frac{1}{n} \sum_{i=1}^n \tilde X_i(s)\, \tilde X_i(t). \qquad (4)$$

In the following, we derive an expression for the expectation of $\tilde C(s,t)$ as an estimator of $C(s,t)$, to quantify the bias, when $h_{n,i} = h_n$ for all $i$, under the assumption that $C(\cdot,\cdot)$ is twice continuously differentiable. Suppose for simplicity that the density of the design points $\{T_{ij}\}_{j=1}^{m_i}$, for each subject, is uniform on $[0,1]$. Define $C(t) = C(t,t)$ for $t \in [0,1]$, and $K_2(\cdot) = \int K(\cdot - u)\, K(-u)\,du$. Also, we assume that the $m_i$'s are given. In the following proposition the bounds hold as $h_n \to 0$.

Proposition 2.1. When $s \ne t$,

$$E[\tilde X_i(s)\tilde X_i(t)] = \frac{1}{m_i h_n} K_2\Big(\frac{s-t}{h_n}\Big)(C(t) + \sigma^2) + \frac{1}{m_i} C'(t) \int u K(-u)\, K\Big(\frac{s-t}{h_n} - u\Big)\,du + \Big(1 - \frac{1}{m_i}\Big) C(s,t) + \frac{1}{m_i} O(h_n) + O(h_n^2). \qquad (5)$$

And,

$$E[\tilde X_i(t)^2] = \frac{1}{m_i h_n} K_2(0)(C(t) + \sigma^2) + \Big(1 - \frac{1}{m_i}\Big) C(t) + \frac{1}{m_i} O(h_n) + O(h_n^2). \qquad (6)$$

The $O(\cdot)$ terms involve $\sup_{t\in[0,1]} |C''(t)|$, $\sup_{s,t\in[0,1]} \|D^2 C(s,t)\|$ and $\int u^2 K(u)\,du$, where $D^2$ is the Hessian operator.

By Proposition 2.1, it is easy to see that $E[\tilde X_i(s)\tilde X_i(t)] = (1 - \frac{1}{m_i})\, C(s,t) + O(h_n^2)$ if $|s-t| > 2B_K h_n$, since the first two terms in (5) vanish, as does the $O(h_n)$ term (see the proof in Appendix C for more details). This shows that $\tilde C(s,t)$ should be multiplied by $m_i/(m_i - 1)$ to get rid of the trivial bias. However, (5) and (6) also show that the empirical covariance $\tilde C(s,t)$ is a highly biased estimate of $C(s,t)$ near the diagonal even after this trivial modification, unless $h_n \min_{1\le i\le n} m_i \to \infty$.
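The diagonal inflation in (6) is easy to see numerically. The sketch below implements the pre-smoothing (3) and the naive covariance (4) with an Epanechnikov kernel (an illustrative choice with the compact support and symmetry required by B1) on pure-noise data, for which $C \equiv 0$; the naive estimate is nevertheless far from zero along the diagonal.

```python
import numpy as np

def presmooth(T, Y, h, grid):
    """Pre-smoothed curves (3): X~_i(t) = (1/m_i) sum_j Y_ij K_h(t - T_ij),
    with K the Epanechnikov kernel (illustrative choice)."""
    K = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)      # support [-1, 1]
    u = (grid[None, None, :] - T[:, :, None]) / h          # shape (n, m, L)
    return (Y[:, :, None] * K(u) / h).mean(axis=1)         # shape (n, L)

def naive_cov(T, Y, h, grid):
    """Naive empirical covariance (4) of the pre-smoothed curves."""
    Xs = presmooth(T, Y, h, grid)
    return Xs.T @ Xs / Xs.shape[0]

rng = np.random.default_rng(1)
n, m, h = 500, 5, 0.1
T = rng.uniform(0, 1, (n, m))
Y = rng.standard_normal((n, m))        # pure noise: C(s,t) = 0, sigma^2 = 1
grid = np.linspace(0.05, 0.95, 19)
C_naive = naive_cov(T, Y, h, grid)
# Per (6), the diagonal is inflated by roughly K_2(0) sigma^2 / (m h) > 0,
# while entries with |s - t| > 2 B_K h are nearly unbiased (here ~ 0).
diag_mean = np.mean(np.diag(C_naive))
far = np.abs(grid[:, None] - grid[None, :]) > 2 * h
off_mean = np.mean(C_naive[far])
```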
This is because the first terms in (5) and (6) are always positive along the diagonal (i.e., when $|s-t| < 2B_K h_n$), which results in overestimation. In fact, the degree of overestimation becomes large (of order $(m_i h_n)^{-1}$) as soon as $|s-t| < 2B_K h_n$. This demonstrates clearly that the naive kernel smoothing approach is intrinsically biased and needs to be appropriately modified. To understand the reason for this bias, notice that if a pair of points $(T_{ij}, T_{ij'})$, for some $1 \le j \ne j' \le m_i$, is randomly sampled from $[0,1]^2$, then it has probability of order $O(h_n^2)$ of falling in a neighborhood of length and width $h_n$ around a given point $(s,t)$ away from the diagonal. In contrast, a randomly chosen point $T_{ij}$ has probability $O(h_n)$ of falling in a neighborhood of length $h_n$ of the point $(t,t)$ along the diagonal. Therefore, measurements are much denser along the diagonal, and this explains the difference in rates.

2.2 Modification to naive kernel smoothing

In this section, we propose a modification to deal with the bias in the naive kernel smoothing approach described in Section 2.1. We propose to remedy the effect of the unequal scale along the diagonal of the covariance kernel (and the resulting bias) by estimating the diagonal and off-diagonal parts separately. We then use a suitable (smooth) weight kernel to combine these two estimates. Throughout the paper, we assume that the density of the time points $\{T_{ij}\}$ is known and is denoted by $g(\cdot)$; in practice we can estimate $g$ from the data separately. We further assume that there are constants $0 < c_0 \le c_1 < \infty$ such that $c_0 \le g(\cdot) \le c_1$. We also propose to use a linearized version of the kernel smoothing to reduce the bias while controlling the variance.
For this purpose, define $Q(s,t)$ to be a tensor-product kernel (that is, a kernel of the form $Q(s,t) = Q(s)Q(t)$ for some smooth function $Q$) with the following properties, together referred to as condition B2:

(i) $Q$ is supported on $[-C_Q, C_Q]$ for some $C_Q > 0$, and $Q(\cdot) \ge 0$; (ii) $\|Q\|_\infty < \infty$; (iii) $\sum_{k\in\mathbb{Z}} Q(x - k) = 1$; (iv) $Q$ is symmetric about 0.

Property (iii) can be rephrased as saying that the integer translates of $Q$ form a partition of unity. As an example, the B-spline basis functions (Chui, 1987) satisfy all four properties. Let $Q_h(\cdot,\cdot)$ denote the kernel $Q(h^{-1}\cdot, h^{-1}\cdot)$.

For estimation of the diagonal $C(t) = C(t,t)$, let $\hat C(t) := \hat C_*(t) - \hat\sigma^2$, where $\hat\sigma^2$ is an estimator of $\sigma^2$ (discussed in Section 2.3), and $\hat C_*(t)$ is the estimate of $C(t) + \sigma^2$ obtained by a linearized kernel smoothing of the terms $\{\frac{1}{m_i} Y_{ij}^2 : j = 1,\ldots,m_i;\ i = 1,\ldots,n\}$. This is because, for each pair $(i,j)$, the conditional expectation of $Y_{ij}^2$ (given $T_i$ and $m_i$) is $C(T_{ij}, T_{ij}) + \sigma^2$. Define a grid on $[0,1]$ with grid spacing $h_n$ and denote the grid points by $\{s_l : l = 1,\ldots,L_n\}$, where $L_n = c_L h_n^{-1}$ for an appropriately chosen $c_L \approx 1$. Then define

$$\hat C_{*,h_n}(t) = \frac{1}{g(t)} \frac{1}{n} \sum_{i=1}^n \sum_{l=1}^{L_n} \big[ S_i(s_l) + (t - s_l)\, S_i'(s_l) \big]\, Q_{h_n}(t - s_l), \qquad (7)$$

with

$$S_i(s) = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}^2\, K_{h_n}(s - T_{ij}). \qquad (8)$$

Note that (7) is a linearized version of conventional kernel smoothing, which can be interpreted as a local linear smoothing of the empirical variances. A similar principle is applied to construct an estimator of the off-diagonal part (see (9) below).
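A minimal sketch of (7)-(8), assuming a uniform design density $g \equiv 1$, an Epanechnikov $K$, and the linear B-spline (hat function) for $Q$, whose integer translates form a partition of unity as required by B2(iii); these kernel choices are illustrative.

```python
import numpy as np

def linearized_variance_smoother(T, Y, h, t_eval, g=1.0):
    """Linearized kernel estimate (7) of C(t) + sigma^2. S_i and its derivative
    S_i' from (8) are evaluated on a grid with spacing h, extended linearly
    around each grid point, and tied together by Q(u) = max(1 - |u|, 0)."""
    K  = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)       # Epanechnikov
    Kp = lambda u: -1.5 * u * (np.abs(u) <= 1.0)            # derivative K'
    Q  = lambda u: np.maximum(1.0 - np.abs(u), 0.0)         # linear B-spline
    s = np.arange(0.0, 1.0 + h / 2, h)                      # grid points s_l
    u = (s[None, None, :] - T[:, :, None]) / h              # (n, m, L)
    S  = (Y[:, :, None]**2 * K(u) / h).mean(axis=1)         # S_i(s_l)
    Sp = (Y[:, :, None]**2 * Kp(u) / h**2).mean(axis=1)     # S_i'(s_l)
    # average over curves, then evaluate the local linear extension at t
    lin = S.mean(0)[None, :] + (t_eval[:, None] - s[None, :]) * Sp.mean(0)[None, :]
    w = Q((t_eval[:, None] - s[None, :]) / h)               # Q_{h}(t - s_l)
    return (w * lin).sum(axis=1) / g

rng = np.random.default_rng(2)
n, m, h = 500, 5, 0.1
T = rng.uniform(0, 1, (n, m))
Y = rng.standard_normal((n, m))     # C = 0, sigma^2 = 1: target curve is 1
est = linearized_variance_smoother(T, Y, h, np.array([0.3, 0.5, 0.7]))
```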
The linearization has two advantages: on one hand, it helps reduce the bias of the estimate; on the other hand, it facilitates efficient computation, both for estimation and for model selection. The main difference between this linearization approach and local linear smoothing lies in the fact that we use $g(t)$ (or an estimate of $g(t)$) in the denominator, while in local linear smoothing the denominator is implicitly a local estimate of $g$ obtained by averaging the smoothing kernel in a neighborhood of $t$. Note that, as opposed to our estimator of $g$, which uses a different bandwidth than the one for estimating the covariance, local linear smoothing essentially uses the same bandwidth for estimating both $g$ and $C$, and thus it suffers from instability. More specifically, the local linear estimator of Yao et al. (2005) involves ratios whose denominator is essentially the number of time points falling in a small interval. Since the time points are assumed to be randomly distributed and are sparse, in practice this can cause instability.

Let $\tilde X_i(t)$ be the $i$-th smoothed sample curve as defined in (3), and $\tilde X_i'(t)$ be the derivative of $\tilde X_i(t)$. Then define the estimate of the off-diagonal part as (with a slight abuse of notation)

$$\tilde C_{h_n}(s,t) = \frac{1}{g(s)g(t)} \frac{1}{n} \sum_{i=1}^n w(m_i) \sum_{l,l'=1}^{L_n} \Big[ \big(\tilde X_i(s_l) + (s - s_l)\tilde X_i'(s_l)\big) \cdot \big(\tilde X_i(s_{l'}) + (t - s_{l'})\tilde X_i'(s_{l'})\big) \Big]\, Q_{h_n}(s - s_l, t - s_{l'}). \qquad (9)$$

Here $w(m_i) = \frac{m_i}{m_i - 1}$ is a weight function determined through an asymptotic bias analysis (Proposition 2.1). Note that, as long as $|s - t| \ge A h_n$ for some constant $A$ depending on $B_K$ and $C_Q$, the terms with $l = l'$ are absent from the inner sum in (9).
Therefore, according to our analysis in the previous section, they do not contribute anything by way of bias. Now let $W(\cdot,\cdot)$ be a weight kernel on the domain $[0,1]^2$ defined as

$$W(s,t) := W(s-t) = \begin{cases} 0 & \text{if } |s-t| > \tfrac12, \\ 1 & \text{if } |s-t| \le \tfrac12. \end{cases} \qquad (10)$$

Define $W_{\tilde h_n}(s,t) = W((s-t)/\tilde h_n)$ and $\overline W_{\tilde h_n}(s,t) = 1 - W_{\tilde h_n}(s,t)$, where $\tilde h_n = A h_n$ for the above $A > 0$. We then smooth the kernels $W_{\tilde h_n}$ and $\overline W_{\tilde h_n}$ by convolving them with a Gaussian kernel $G_{\tau_n}(\cdot)$ with a small bandwidth $\tau_n$ (in the sense that $\tau_n = o(h_n)$), and, with an abuse of notation, denote the resulting kernels also by $W_{\tilde h_n}$ and $\overline W_{\tilde h_n}$, respectively. Finally, we are ready to define the proposed combined estimator of $C(s,t)$ as

$$\hat C_{c,h_n}(s,t) = \overline W_{\tilde h_n}(s,t)\, \tilde C_{h_n}(s,t) + W_{\tilde h_n}(s,t)\, \max\Big\{ \hat C_{h_n}\Big(\frac{s+t}{2}\Big),\, h_n^2 \Big\}, \qquad (11)$$

where $\hat C_{h_n}(\cdot) := \hat C_{*,h_n}(\cdot) - \hat\sigma^2$. The maximum in the second term simply guarantees that the estimator of the diagonal is nonnegative and that the bias is $O(h_n^2)$.

We now briefly discuss the computational aspects of the proposed estimator. A key step is the computation of the functions $S_i(\cdot)$ and $\tilde X_i(\cdot)$ and their derivatives at the grid points $s_l$, $l = 1,\ldots,L_n$. Each of these computations requires $O(m_i)$ floating point operations (for each $i = 1,\ldots,n$). From these, we obtain $\tilde C_{h_n}(s,t)$ and $\hat C_{*,h_n}(t)$ by using (9) and (7), respectively. Both expressions are in the form of discrete convolutions, and hence can be computed very rapidly using the Fast Fourier Transform. Thus, the estimation procedure is computationally very efficient, requiring $O(n m L_n \log L_n)$ computations on the whole grid, where $m = \max_i m_i$.

2.3 Estimation of $\sigma^2$

Here we briefly outline a method for estimating the error variance $\sigma^2$.
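Stepping back to the combined estimator, the blending step in (11) can be sketched as follows; the diagonal and off-diagonal inputs are taken as given (e.g. from (7) and (9)), and the constants $A$, $\tau$, the grid, and the stand-in surfaces are illustrative choices, not the paper's.

```python
import numpy as np

def combined_estimator(C_off, C_diag, grid, h, A=2.0, tau=0.01):
    """Blend per (11): near the diagonal use max{C_diag((s+t)/2), h^2}, away
    from it use the off-diagonal surface C_off. The indicator weight (10),
    rescaled to the band |s-t| <= A h / 2, is smoothed by convolving its 1-D
    profile with a Gaussian of bandwidth tau (meant to be o(h))."""
    S, Tg = np.meshgrid(grid, grid, indexing="ij")
    d = S - Tg
    # Gaussian-smoothed version of W((s-t)/(A h)) via a 1-D discrete convolution
    dd = np.linspace(d.min(), d.max(), 801)
    prof = (np.abs(dd) <= 0.5 * A * h).astype(float)
    gk = np.exp(-0.5 * (dd / tau) ** 2)
    gk /= gk.sum()
    W = np.interp(d, dd, np.convolve(prof, gk, mode="same"))
    Cd = np.interp(0.5 * (S + Tg), grid, np.maximum(C_diag, h**2))
    return W * Cd + (1.0 - W) * C_off

grid = np.linspace(0, 1, 41)
h = 0.05
C_off = np.minimum.outer(grid, grid)   # stand-in off-diagonal surface
C_diag = grid + 0.3                    # stand-in diagonal estimate
C_hat = combined_estimator(C_off, C_diag, grid, h)
```

On the diagonal the result follows the (floored) diagonal estimate, and far from the diagonal it reproduces the off-diagonal surface, with a smooth transition in between.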
The method is similar to the approach taken in Yao, Müller and Wang (2006), and hence we omit the details. First, for a given bandwidth $h_n$, we estimate the function $C(s,t)$ for $|s-t| > A h_n$, for some $A$ depending on $B_K$ and $C_Q$, using (9). Then, as in Yao et al. (2006), we estimate the diagonal $\{C(t) : t \in [0,1]\}$ using an oblique linear interpolation,

$$\hat C_{0,h_n}(t) = \int_{A_1}^{A_2} \frac12 \big( \tilde C_{h_n}(t - u h_n, t + u h_n) + \tilde C_{h_n}(t + u h_n, t - u h_n) \big)\, d\tilde G(u), \qquad (12)$$

for some probability distribution function $\tilde G$ supported on $[A_1, A_2]$, where $A_1 > A$. On the other hand, we estimate the curve $\{C(t) + \sigma^2 : t \in [0,1]\}$ by $\hat C_{*,h_n}(t)$ defined in (7). Now, we estimate $\sigma^2$ by

$$\hat\sigma^2 = \frac{1}{T_1 - T_0} \int_{T_0}^{T_1} \big( \hat C_{*,h_n}(t) - \hat C_{0,h_n}(t) \big)\, dt, \qquad (13)$$

where $0 < T_0 < T_1 < 1$. It can be shown (Corollary 4.1 in Section 4) that the estimator $\hat\sigma^2$ thus obtained is consistent for an appropriate choice of $h_n$.

3 Bandwidth selection

The choice of optimal bandwidth for the kernel is a key step in any kernel-based estimation procedure. Yao et al. (2005) use a leave-one-curve-out cross-validation score based on the prediction error for selecting the bandwidth of the smoother, and an AIC approach for selecting the number of nonzero eigenvalues. However, leave-one-curve-out cross-validation is computationally very expensive. Also, as shown below, the prediction error loss is not an appropriate criterion for cross-validation in the current context. Therefore, in this paper we address the issue of model selection by producing an approximation to the leave-one-curve-out cross-validation score based on the empirical Kullback-Leibler loss. The approximation is based on the idea that the estimator obtained by dropping any single curve is a small perturbation of the estimator based on the whole data (Peng and Paul, 2007).
In particular, we use the perturbation theory of linear operators to quantify this perturbation and produce a first order approximation to the CV score that is computationally efficient. It also enables us to select the bandwidth and the dimension of the process simultaneously.

We first discuss the choice of loss function, which is very important for a cross-validation scheme. We want to point out that the prediction problem is intrinsically different from the estimation of the covariance kernel. We find that the criterion based on the prediction error loss is not correctly scaled, as opposed to the one based on the empirical Kullback-Leibler loss. To make this point clear, we examine these two cross-validation criteria in detail. Define $Y_i = (Y_{ij})_{j=1}^{m_i}$, $\mu_i = (\mu(T_{ij}))_{j=1}^{m_i}$, $\psi_{i\nu} = (\psi_\nu(T_{ij}))_{j=1}^{m_i}$. We assume that the covariance kernel can be represented using $K$ orthonormal eigenfunctions for some $K \ge 1$. Then the leave-one-curve-out cross-validation score based on the prediction error loss is given by

$$CV(K, h_n) = \sum_{i=1}^n \sum_{j=1}^{m_i} \big( Y_{ij} - \hat Y_i^{(-i)}(T_{ij}) \big)^2. \qquad (14)$$

Here $\hat Y_i^{(-i)}(t) = \hat\mu^{(-i)}(t) + \sum_{\nu=1}^K \hat\xi_{i\nu}^{(-i)} \hat\psi_\nu^{(-i)}(t)$, where $\hat\mu^{(-i)}(t)$ and $\hat\psi_\nu^{(-i)}(t)$ are the estimates of $\mu(t)$ and $\psi_\nu(t)$ computed from the observations $\{Y_{i'}\}_{i' \ne i}$, and $\hat\xi_{i\nu}^{(-i)}$ is the estimated principal component score based on the observations $\{Y_{i'}\}_{i' \ne i}$. Note that the estimated principal component scores $\hat\xi_{i\nu}^{(-i)}$ can be obtained through the procedure described in Yao et al. (2005), even though this will not be necessary for the model selection procedure we shall adopt.
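The score (14) has the following shape in code, with the leave-one-out estimates passed in as callables; the container layout and names here are hypothetical stand-ins for the fitted quantities, not part of the paper's procedure.

```python
import numpy as np

def prediction_error_cv(Ys, Ts, loo_fits):
    """Leave-one-curve-out score (14):
    sum_i sum_j (Y_ij - Yhat_i^{(-i)}(T_ij))^2, where
    Yhat_i^{(-i)}(t) = mu^{(-i)}(t) + sum_nu xi_{i,nu}^{(-i)} psi_nu^{(-i)}(t).
    loo_fits[i] = (mu, psis, xis): mu and each psi are callables fitted
    without curve i, xis are the corresponding estimated scores."""
    cv = 0.0
    for Y, T, (mu, psis, xis) in zip(Ys, Ts, loo_fits):
        Yhat = mu(T) + sum(xi * psi(T) for xi, psi in zip(xis, psis))
        cv += np.sum((Y - Yhat) ** 2)
    return cv
```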
On the other hand, the CV score based on the empirical Kullback-Leibler loss is given by
\[
CV^*(K, h_n) = \sum_{i=1}^n \ell_i\big(Y_i;\, \hat\mu_i^{(-i)}, \hat\Sigma_{i,K}^{(-i)}\big), \qquad (15)
\]
where $\hat\Sigma_{i,K}^{(-i)} = \sum_{\nu=1}^K \hat\lambda_\nu^{(-i)} \hat\psi_{i\nu}^{(-i)} (\hat\psi_{i\nu}^{(-i)})^T + \hat\sigma^{2(-i)} I_{m_i}$, and $\ell_i$ is (up to an additive constant) the negative log-likelihood of the $i$-th observation under the working assumption of Gaussianity, namely
\[
\ell_i(Y_i; \mu_i, \Sigma_i) = \frac12 \log|\Sigma_i| + \frac12 \operatorname{tr}\big(\Sigma_i^{-1}(Y_i - \mu_i)(Y_i - \mu_i)^T\big).
\]
To gain an understanding of what these CV scores are approximating, we assume that we have two independent samples, each with $n$ i.i.d. sample curves. Furthermore, to simplify exposition, we assume that $\mu \equiv 0$. Suppose that the estimates $\hat\Psi = \{\hat\psi_\nu\}_{\nu=1}^K$ and $\hat\Lambda = \{\hat\lambda_\nu\}_{\nu=1}^K$ are obtained from the first sample. Then a leave-one-curve-out CV score can be reasonably approximated by substituting these estimates in the corresponding empirical loss function based on the second sample; with an abuse of notation we also denote this quantity by $CV$. If $\ell_i(\Psi, \Lambda)$ denotes the loss function corresponding to the $i$-th observation in the second sample, then the CV score is given by $\frac1n \sum_{i=1}^n \ell_i(\hat\Psi, \hat\Lambda)$. For simplicity, we assume that there is a true model $(\Psi^*, \Lambda^*)$ within the class of models we are considering. A first order expansion of the difference between the CV scores under the true and estimated parameters for the empirical Kullback-Leibler loss shows that, with high probability,
\[
\frac1n \sum_{i=1}^n \ell_i(\hat\Psi, \hat\Lambda) - \frac1n \sum_{i=1}^n \ell_i(\Psi^*, \Lambda^*)
= \frac{1}{4n} \sum_{i=1}^n \big\| \hat\Sigma_i^{-1/2} (\Sigma_{*i} - \hat\Sigma_i) \hat\Sigma_i^{-1/2} \big\|_F^2 \,(1 + o(1))
+ O\left( \sqrt{\frac{\log n}{n}} \left[ \frac1n \sum_{i=1}^n \big\| \Sigma_{*i}^{1/2} (\hat\Sigma_i^{-1} - \Sigma_{*i}^{-1}) \Sigma_{*i}^{1/2} \big\|_F^2 \right]^{1/2} \right), \qquad (16)
\]
where $\|\cdot\|_F$ is the Frobenius norm, and $\Sigma_{*i}$ and $\hat\Sigma_i$ are the covariance matrices of the observations $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$ corresponding to the true parameter $(\Psi^*, \Lambda^*)$ and the estimates $(\hat\Psi, \hat\Lambda)$, respectively. Since we can essentially ignore the $O(\cdot)$ term in (16) as long as $\frac1n \sum_{i=1}^n \|\hat\Sigma_i^{-1/2}(\Sigma_{*i} - \hat\Sigma_i)\hat\Sigma_i^{-1/2}\|_F^2$ is not too small, (16) gives a quadratic approximation to the CV score. Notice that, in each term within the summation of this quadratic approximation, directions with high variability are down-weighted by the multiplicative factors $\hat\Sigma_i^{-1/2}$. Therefore this CV score based on the empirical Kullback-Leibler loss is properly scaled. Moreover, note that approximation (16) does not really depend on Gaussianity but only on the tails of the distributions involved.

On the other hand, it can be shown by simple algebra that, up to a multiplicative factor, the CV score based on the prediction error loss is $CV = \frac1n \sum_{i=1}^n \tilde\ell_i(\Psi, \Lambda)$, where $\tilde\ell_i(\Psi, \Lambda) = \operatorname{tr}(\hat\Sigma_i^{-2} S_i)$ and $S_i = (Y_i - \hat\mu_i)(Y_i - \hat\mu_i)^T$ is the empirical covariance matrix corresponding to the $i$-th observation vector. The corresponding difference of the CV scores between estimated and true parameters becomes (ignoring the multiplicative constant)
\[
\frac1n \sum_{i=1}^n \tilde\ell_i(\hat\Psi, \hat\Lambda) - \frac1n \sum_{i=1}^n \tilde\ell_i(\Psi^*, \Lambda^*)
= \frac1n \sum_{i=1}^n \operatorname{tr}\big[(\hat\Sigma_i^{-2} - \Sigma_{*i}^{-2}) \Sigma_{*i}\big] + \frac1n \sum_{i=1}^n \operatorname{tr}\big[(\hat\Sigma_i^{-2} - \Sigma_{*i}^{-2})(S_i - \Sigma_{*i})\big]
= \frac1n \sum_{i=1}^n \operatorname{tr}\big[\Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2}\big] + O\left( \sqrt{\frac{\log n}{n}} \left[ \frac1n \sum_{i=1}^n \big\|\Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2}\big\|_F^2 \right]^{1/2} \right) \qquad (17)
\]
with high probability. Here $A_i = \Sigma_{*i}^{1/2} \hat\Sigma_i^{-1} \Sigma_{*i}^{1/2}$, which is already properly scaled. Therefore, from (17) it is clear that this CV score itself is not correctly scaled. Also, the expression $\frac1n \sum_{i=1}^n \operatorname{tr}[\Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2}]$ appearing in (17) is not necessarily nonnegative.
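The scaling contrast between the two losses can also be seen numerically. The sketch below (an illustration, not part of the paper's procedure) compares the population versions of the two criteria, $\mathbb{E}\,\ell_i(\Sigma) = \frac12\log|\Sigma| + \frac12\operatorname{tr}(\Sigma^{-1}\Sigma_{*i})$ and $\mathbb{E}\,\tilde\ell_i(\Sigma) = \operatorname{tr}(\Sigma^{-2}\Sigma_{*i})$, along a hypothetical one-parameter family $\Sigma = c\,\Sigma_{*i}$ (chosen only for this example): the Kullback-Leibler criterion is minimized at $c = 1$, i.e. at the true covariance, while the prediction-error criterion keeps decreasing as $c$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((m, m))
Sigma_star = A @ A.T + m * np.eye(m)      # true covariance of one observation vector

def kl_loss(Sigma):
    # population version of l_i: (1/2) log|Sigma| + (1/2) tr(Sigma^{-1} Sigma*)
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * logdet + 0.5 * np.trace(np.linalg.solve(Sigma, Sigma_star))

def pred_loss(Sigma):
    # population version of the prediction error loss: tr(Sigma^{-2} Sigma*)
    Sinv = np.linalg.inv(Sigma)
    return np.trace(Sinv @ Sinv @ Sigma_star)

# Scan Sigma = c * Sigma_star: the KL criterion is minimized at c = 1,
# while the prediction-error criterion decreases monotonically in c.
cs = np.linspace(0.5, 3.0, 26)
kl = [kl_loss(c * Sigma_star) for c in cs]
pe = [pred_loss(c * Sigma_star) for c in cs]
print(cs[int(np.argmin(kl))], cs[int(np.argmin(pe))])
```

Along this family the KL criterion equals $\frac12 m\log c + m/(2c) + \text{const}$, with a unique minimum at $c = 1$, whereas the prediction-error criterion equals $c^{-2}\operatorname{tr}(\Sigma_{*i}^{-1})$, which has no interior minimum.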
This means that the prediction error loss does not enjoy the pleasing property of the Kullback-Leibler loss that the minimum of the expected loss occurs at the true parameter. Hence the use of the prediction error loss is not recommended for the current problem.

3.1 First order approximation

Direct computation of the criterion $CV^*(K, h_n)$ (equation (15)) is a laborious process, since we need to compute $\hat C_c^{(-i)}(s,t)$ and perform its eigen-analysis for every $i = 1, \ldots, n$. Therefore, we propose to approximate $CV^*(K, h_n)$ by using a first order approximation to the quantities $\hat\mu_i^{(-i)}$, $\hat\psi_\nu^{(-i)}(\cdot)$ and $\hat\lambda_\nu^{(-i)}$ around the estimates $\hat\mu_i$, $\hat\psi_\nu(\cdot)$ and $\hat\lambda_\nu$, respectively. The approximation of the eigenfunctions and eigenvalues is based on a perturbation analysis approach. The key idea is that the leave-one-curve-out estimator $\hat C_c^{(-i)}$ of the covariance can be viewed as a perturbation of the linear operator $\hat C_c$. The key component is Proposition 3.1, which uses a result on perturbation of eigenfunctions of a linear operator (Lemma 7.1 in Appendix A). Note that our approximation scheme can also be applied to CV scores based on other loss functions, such as $CV(K, h_n)$. Using Lemma 7.1, we can obtain a first order approximation to the quantities $\hat\psi_{i\nu}^{(-i)}$ and $\hat\lambda_\nu^{(-i)}$ that depends on the observations through a term that is linear in $\Delta_i(s,t) = \hat C_c(s,t) - \hat C_c^{(-i)}(s,t)$ (for convenience we omit $h_n$ in the notation). Since the latter quantity has a rather simple expression involving essentially only the $i$-th observation, this step substantially reduces the computational burden of the cross-validation procedure.

Proposition 3.1.
For the proposed estimator $\hat C_c$ given by (11), we have:

(i)
\[
\hat\psi_{i\nu}^{(-i)} - \hat\psi_{i\nu} = \big(\hat\psi_\nu^{(-i)}(T_{ij}) - \hat\psi_\nu(T_{ij})\big)_{j=1}^{m_i} \approx \big((\hat H_\nu \Delta_i \hat\psi_\nu)(T_{ij})\big)_{j=1}^{m_i}; \qquad (18)
\]

(ii)
\[
\hat\lambda_\nu^{(-i)} - \hat\lambda_\nu \approx -\operatorname{tr}(\hat P_\nu \Delta_i); \qquad (19)
\]

where

(a) $\hat P_\nu = \hat\psi_\nu \otimes \hat\psi_\nu$, where, for $f, g \in L^2([0,1])$, $f \otimes g$ denotes the integral operator with kernel $f(x)g(y)$, which acts on any $w \in L^2([0,1])$ as $(f \otimes g)(w)(x) = \big(\int_0^1 g(y) w(y)\,dy\big) f(x)$;

(b)
\[
\hat H_\nu(t, u) = \sum_{k \ne \nu}^K \frac{1}{\hat\lambda_k - \hat\lambda_\nu}\, \hat\psi_k(t)\hat\psi_k(u) - \frac{1}{\hat\lambda_\nu}\Big(\delta(t-u) - \sum_{k=1}^K \hat\psi_k(t)\hat\psi_k(u)\Big)
= \sum_{k \ne \nu}^K \frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\, \hat\psi_k(t)\hat\psi_k(u) + \frac{1}{\hat\lambda_\nu}\hat\psi_\nu(t)\hat\psi_\nu(u) - \frac{1}{\hat\lambda_\nu}\delta(t-u), \qquad (20)
\]
with $\delta$ being the Dirac $\delta$-function, i.e., $\int \delta(t-u) w(u)\,du = w(t)$ for any smooth $w \in L^2([0,1])$. Here $\operatorname{tr}(\hat P_\nu \Delta_i)$ and $(\hat H_\nu \Delta_i \hat\psi_\nu)(t)$ are defined as follows:

(a')
\[
\operatorname{tr}(\hat P_\nu \Delta_i) = \int \hat\psi_\nu(u)\, \Delta_i(u,v)\, \hat\psi_\nu(v)\,du\,dv; \qquad (21)
\]

(b')
\[
(\hat H_\nu \Delta_i \hat\psi_\nu)(t) = \iint \hat H_\nu(t,u)\, \Delta_i(u,v)\, \hat\psi_\nu(v)\,du\,dv. \qquad (22)
\]

Also,

(iii)
\[
\hat\mu^{(-i)}(t) - \hat\mu(t) = \frac{1}{n-1}\hat\mu(t) - \frac{1}{n-1}\,\frac{1}{m_i}\sum_{j=1}^{m_i} Y_{ij}\, K_{h_\mu}(t - T_{ij}), \qquad (23)
\]
where $\hat\mu(t) = \frac1n \sum_{i=1}^n \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}\, K_{h_\mu}(t - T_{ij})$, with $h_\mu$ being the bandwidth for estimating $\mu$ (chosen separately).
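In the finite-dimensional (matrix) analog, the content of (18)-(19) is the classical first-order eigen-perturbation expansion: $\Delta v_r \approx -H_r B v_r$ and $\Delta\lambda_r \approx \operatorname{tr}(P_r B)$. A minimal numpy check (a generic symmetric matrix stands in for the discretized covariance operator, so no Dirac-$\delta$ term is needed; all names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Symmetric operator A with well-separated eigenvalues 1, 2, ..., 6,
# standing in for (a discretization of) the covariance operator
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam_true = np.arange(1.0, n + 1.0)
A = Q @ np.diag(lam_true) @ Q.T

# Small symmetric perturbation B (the analog of -Delta_i in Proposition 3.1)
E = 1e-3 * rng.standard_normal((n, n))
B = (E + E.T) / 2

lam, V = np.linalg.eigh(A)            # eigenvalues in ascending order
r = n - 1                             # index of the largest eigenvalue
v_r = V[:, r]

# H_r(A) = sum_{s != r} (lambda_s - lambda_r)^{-1} v_s v_s^T  (cf. Lemma 7.1)
H_r = sum(np.outer(V[:, s], V[:, s]) / (lam[s] - lam[r])
          for s in range(n) if s != r)

# First-order predictions: eigenvector shift -H_r B v_r, eigenvalue shift tr(P_r B)
dv_pred = -H_r @ (B @ v_r)
dlam_pred = v_r @ B @ v_r

# Exact perturbed eigen-structure, sign-aligned as in the modified L2 loss
lam2, V2 = np.linalg.eigh(A + B)
v_r2 = V2[:, r] * np.sign(V2[:, r] @ v_r)

err_v = np.linalg.norm((v_r2 - v_r) - dv_pred)     # residual is O(||B||^2)
err_lam = abs((lam2[r] - lam[r]) - dlam_pred)      # residual is O(||B||^2)
print(err_v, err_lam)
```

With $\|B\| \sim 10^{-3}$ and unit eigen-gaps, both residuals are of order $\|B\|^2$, i.e. several orders of magnitude smaller than the first-order shifts, which is exactly the accuracy Proposition 3.1 relies on.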
After we obtain the approximations for $\hat\psi_{i\nu}^{(-i)}$ and $\hat\lambda_\nu^{(-i)}$ from Proposition 3.1, we plug them back into equation (15) for $CV^*(K, h_n)$ to obtain the final approximation of the CV score, denoted by $\widetilde{CV}^*(K, h_n)$:
\[
\widetilde{CV}^*(K, h_n) = \frac{1}{2n}\sum_{i=1}^n \log|\tilde\Sigma_i| + \frac{1}{2n}\sum_{i=1}^n \operatorname{tr}\big(\tilde\Sigma_i^{-1}(Y_i - \hat\mu_i^{(-i)})(Y_i - \hat\mu_i^{(-i)})^T\big),
\]
where $\tilde\Sigma_i = \sum_{\nu=1}^K \tilde\lambda_{i\nu}\tilde\psi_{i\nu}\tilde\psi_{i\nu}^T + \hat\sigma^{2(-i)} I_{m_i}$, with $\tilde\lambda_{i\nu} = \hat\lambda_\nu - \operatorname{tr}(\hat P_\nu \Delta_i)$ and $\tilde\psi_{i\nu} = \hat\psi_{i\nu} + \big((\hat H_\nu \Delta_i \hat\psi_\nu)(T_{ij})\big)_{j=1}^{m_i}$, and $\hat\mu_i^{(-i)} = (\hat\mu^{(-i)}(T_{ij}))_{j=1}^{m_i}$, with $\hat\mu^{(-i)}$ given by (23). An expression for $\hat\sigma^{2(-i)} - \frac{n}{n-1}\hat\sigma^2$ is easily obtained by using (7), (12) and (13). Note that this step does not require any extra computation beyond that for computing $\hat\sigma^2$.

Observe that our objective in minimizing the criterion $\widetilde{CV}^*(K, h_n)$ is to estimate the number of nonzero eigenvalues and to select an appropriate bandwidth for estimating the eigenfunctions. If instead the objective is to select an appropriate bandwidth for estimating the covariance kernel, we can do so by replacing the term $\sum_{\nu=1}^K \tilde\lambda_{i\nu}\tilde\psi_{i\nu}\tilde\psi_{i\nu}^T$ in the definition of $\tilde\Sigma_i$ with the leave-one-curve-out estimate of the covariance kernel, viz. $\hat C_{c,h_n}^{(-i)}$ evaluated at the design points, and minimizing the corresponding CV criterion. This distinction is important since the theoretical results (Theorems 4.1 and 4.2) show that the optimal rates for the bandwidth $h_n$ are different for estimating the covariance kernel and for estimating its eigenfunctions.

3.2 Representation of $\hat H_\nu \Delta_i \hat\psi_\nu$ and $\operatorname{tr}(\hat P_\nu \Delta_i)$

In order to obtain the approximate CV score $\widetilde{CV}^*(K, h_n)$ efficiently, we need to compute the quantities $(\hat H_\nu \Delta_i \hat\psi_\nu)(t)$ and $\operatorname{tr}(\hat P_\nu \Delta_i)$ in an efficient manner. Thus we have the following further approximation based on Lemma 7.1.

Proposition 3.2.
We have

(i)
\[
(\hat H_\nu \Delta_i \hat\psi_\nu)(t) \approx \frac{w(m_i)}{n-1} \sum_{k \ne \nu}^K \frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\, \gamma_{k,h_n}(i)\, \gamma_{\nu,h_n}(i)\, \hat\psi_k(t)
+ \frac{w(m_i)}{n-1} \frac{1}{\hat\lambda_\nu} (\gamma_{\nu,h_n}(i))^2\, \hat\psi_\nu(t)
- \frac{w(m_i)}{n-1} \frac{1}{\hat\lambda_\nu} \frac{1}{g(t)} \Big[\sum_{l=1}^{L_n} \big(\tilde X_i(s_l) + (t - s_l)\tilde X_i'(s_l)\big) Q_{h_n}(s_l - t)\Big]\, \tilde\gamma_{\nu,h_n}(i,t)
- \frac{w(m_i)}{n-1} \Big[\sum_{k \ne \nu}^K \frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\, \gamma_{k,\nu,h_n}(i)\, \hat\psi_k(t) + \frac{1}{\hat\lambda_\nu}\gamma_{\nu,\nu,h_n}(i)\, \hat\psi_\nu(t)\Big]
+ \frac{1}{n-1} \sum_{k \ne \nu}^K \frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\, \hat\psi_k(t) \sum_{l=1}^{L_n} \int \frac{\hat\psi_k(u)\hat\psi_\nu(u)}{g(u)} \big(S_i(s_l)\beta_{1,h}(u,s_l) + S_i'(s_l)\beta_{2,h}(u,s_l)\big)\,du
+ \frac{1}{n-1} \frac{1}{\hat\lambda_\nu}\, \hat\psi_\nu(t) \sum_{l=1}^{L_n} \int \frac{(\hat\psi_\nu(u))^2}{g(u)} \big(S_i(s_l)\beta_{1,h}(u,s_l) + S_i'(s_l)\beta_{2,h}(u,s_l)\big)\,du
- \frac{1}{n-1} \frac{1}{\hat\lambda_\nu} \sum_{l=1}^{L_n} \big(S_i(s_l)\beta_{1,h}(t,s_l) + S_i'(s_l)\beta_{2,h}(t,s_l)\big) \frac{\hat\psi_\nu(t)}{g(t)}
+ \Big(\hat\sigma^{2(-i)} - \frac{n}{n-1}\hat\sigma^2\Big)\Big(\int \hat H_\nu(t,u)\,du\Big)\Big(\int \hat\psi_\nu(u)\,du\Big); \qquad (24)
\]

(ii)
\[
\operatorname{tr}(\hat P_\nu \Delta_i) \approx -\frac{\hat\lambda_\nu}{n-1} + \frac{w(m_i)}{n-1}\big[(\gamma_{\nu,h_n}(i))^2 - \gamma_{\nu,\nu,h_n}(i)\big]
+ \frac{1}{n-1} \int \frac{(\hat\psi_\nu(u))^2}{g(u)} \big(S_i(u)\beta_{1,h}(u) + S_i'(u)\beta_{2,h}(u)\big)\,du
+ \Big(\hat\sigma^{2(-i)} - \frac{n}{n-1}\hat\sigma^2\Big)\Big(\int \hat\psi_\nu(u)\,du\Big)^2, \qquad (25)
\]

where

(a)
\[
\gamma_{k,h_n}(i) = \sum_{l=1}^{L_n} \tilde X_i(s_l)\, G_0\Big(\frac{\hat\psi_k}{g}, Q_{h_n}\Big)(s_l) - \sum_{l=1}^{L_n} \tilde X_i'(s_l)\, G_1\Big(\frac{\hat\psi_k}{g}, Q_{h_n}\Big)(s_l), \qquad (26)
\]
where, for any two functions $f_1$ and $f_2$ defined on $[0,1]$,
\[
G_0(f_1, f_2)(s) = (f_1 * f_2)(s) = \int f_1(x) f_2(s-x)\,dx, \qquad
G_1(f_1, f_2)(s) = (f_1 * (x f_2))(s) = \int f_1(x)(s-x) f_2(s-x)\,dx;
\]

(b)
\[
\tilde\gamma_{k,h_n}(i,t) = \sum_{l=1}^{L_n} \int \big(1 - W_{\tilde h_n}(t,v)\big)\big[(\tilde X_i(s_l) + (v-s_l)\tilde X_i'(s_l)) Q_{h_n}(s_l - v)\big] \frac{\hat\psi_k(v)}{g(v)}\,dv; \qquad (27)
\]

(c)
\[
\gamma_{k,k',h_n}(i) = \sum_{j,j'=0}^{1} \sum_{l,l'=1}^{L_n} X_i^{(j)}(s_l)\, X_i^{(j')}(s_{l'}) \iint W_{\tilde h_n}(u,v)\, \frac{\hat\psi_k(u)}{g(u)}\frac{\hat\psi_{k'}(v)}{g(v)}\, (u-s_l)^j (v-s_{l'})^{j'}\, Q_{h_n}(s_l - u)\, Q_{h_n}(s_{l'} - v)\,du\,dv; \qquad (28)
\]

(d)
\[
\beta_{1,h}(u,s) = \int_{u - \frac{A}{2}h}^{u + \frac{A}{2}h} Q_h\Big(\frac{u+v}{2} - s\Big)\,dv, \qquad (29)
\]
\[
\beta_{2,h}(u,s) = \int_{u - \frac{A}{2}h}^{u + \frac{A}{2}h} \Big(\frac{u+v}{2} - s\Big) Q_h\Big(\frac{u+v}{2} - s\Big)\,dv. \qquad (30)
\]

In the above, the computation of $\gamma_{k,h_n}(i)$ can easily be done using the fast Fourier transform (FFT). Also, $\tilde\gamma_{k,h_n}(i,t) \approx \gamma_{k,h_n}(i)$ for all $t \in [0,1]$. However, the computation of $\gamma_{k,k',h_n}(i)$ involves a double integration, so some approximations are needed to simplify the computation; a computationally efficient approximation to $\gamma_{k,k',h_n}(i)$ is described in Appendix B. Computation of $\beta_{1,h}(u)$ and $\beta_{2,h}(u)$ can be done in closed form whenever $Q_h(\cdot)$ has a "nice" functional form (e.g., a B-spline). From Propositions 3.1 and 3.2 it is clear that most of the components have already been computed in constructing the estimator, and the convolutions can be performed quickly using the FFT. Thus, the key advantage afforded by Proposition 3.2 is to replace the expensive computation of double integrals with the much cheaper computation of single integrals and convolutions. See Appendix F for details of some of these steps.

4 Asymptotic properties

In this section, we present the theoretical properties of the proposed estimators through a large sample analysis.
Our main interest is in the estimation accuracy of the covariance kernel and its eigenfunctions. The statements of the results and the associated regularity conditions are given below. We first state the following assumptions on $g$, the density of the design points; $C$, the covariance kernel; and $\{\psi_k\}_{k=1}^M$, the eigenfunctions.

A1 $g$ is twice continuously differentiable and its second derivative is Hölder($\alpha$) for some $\alpha \in (0,1)$. The same holds for the covariance kernel $C$.

A2 $\max_k \{\|\psi_k\|_\infty, \|\psi_k'\|_\infty, \|\psi_k''\|_\infty\}$ is bounded.

A3 There are constants $0 < c_0 \le c_1 < \infty$ such that $c_0 \le g(\cdot) \le c_1$.

We also assume that the kernels $K(\cdot)$ and $Q(\cdot)$ satisfy conditions B1 and B2, respectively. We need to make further assumptions about the covariance kernel $C$ and the correlations among the sample curves. Let $R$ denote the $n \times n$ matrix with $(i,j)$-th entry $\rho_{ij}$. Assume:

C1 $\lambda_1 > \lambda_2 > \cdots > \lambda_M > 0$ and $\lambda_{M+1} = \cdots = 0$. That is, the nonzero eigenvalues are all distinct and the covariance kernel is of finite dimension.

C2 $\max_{1 \le \nu \le M} (\lambda_\nu - \lambda_{\nu+1})^{-1}$ is bounded above.

C3 $\frac{1}{n^2}\operatorname{tr}[(R - I_n)^2] \to 0$ as $n \to \infty$, and $\|R\| \le \kappa_n$ for some $\kappa_n > 0$.

Note that the first part of C3 quantifies the total contribution of the correlations among the sample curves to the variance of the estimated covariance kernel (see Theorem 4.1). The second part of C3 imposes a stability condition on the correlation matrix $R$: the sample curves are "weakly correlated" in the sense that $\|R\|$ is bounded by $\kappa_n$. Define $\underline{m} = \min_{1 \le i \le n} m_i$ and $\overline{m} = \max_{1 \le i \le n} m_i$. We further assume:

C4 $\overline{m}/\underline{m}$ is bounded above as $n \to \infty$.

We now give the bias and variance of the proposed combined estimator.

Theorem 4.1. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold.
Assume further that $\sigma^2$ is known and $\hat C(\cdot) = \hat C^*(\cdot) - \sigma^2$, where $\hat C^*(\cdot)$ is defined through (7). Suppose further that in the definition (11), $\tilde h_n = A h_n$ for some constant $A \ge 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and $nh_n^2 \to \infty$, the estimator $\hat C_c$ satisfies:
\[
\mathbb{E}[\hat C_c(s,t)] = C(s,t) + O(h_n^2), \qquad (31)
\]
\[
\operatorname{Var}[\hat C_c(s,t)] = O\Big(\frac1n\Big) + O\Big(\max\Big\{\frac{1}{nh_n^2 m^2}, \frac{1}{nh_n m}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i \ne j}^n \rho_{ij}^2\Big) O(1), \qquad (32)
\]
where the $O(\cdot)$ terms are uniform in $s, t \in [0,1]$.

One implication of Theorem 4.1 is that it gives the rate of convergence of the estimator $\hat\sigma^2$ defined in (13), as illustrated in the following corollary.

Corollary 4.1. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold, and in the definition (11), $\tilde h_n = A h_n$ for some constant $A \ge 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and $nh_n^2 \to \infty$,
\[
\mathbb{E}(\hat\sigma^2 - \sigma^2)^2 = O\Big(\frac1n\Big) + O\Big(\max\Big\{\frac{1}{nh_n^2 m^2}, \frac{1}{nh_n m}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i \ne j}^n \rho_{ij}^2\Big) O(1) + O(h_n^4), \qquad (33)
\]
where the $O(\cdot)$ terms are uniform in $s, t \in [0,1]$.

Using Corollary 4.1 and Theorem 4.1, we get a bound on the variance of the proposed estimator of the covariance kernel when $\sigma^2$ is estimated by $\hat\sigma^2$ defined in (13).

Corollary 4.2. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold, and in the definition (11), $\tilde h_n = A h_n$ for some constant $A \ge 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and $nh_n^2 \to \infty$,
\[
\operatorname{Var}[\hat C_c(s,t)] = O\Big(\frac1n\Big) + O\Big(\max\Big\{\frac{1}{nh_n^2 m^2}, \frac{1}{nh_n m}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i \ne j}^n \rho_{ij}^2\Big) O(1) + O(h_n^4), \qquad (34)
\]
where the $O(\cdot)$ terms are uniform in $s, t \in [0,1]$.

Next we state the result about the asymptotic behavior of the estimated eigenfunctions. Let the loss function for $\psi_\nu$ be the modified $L^2$-loss given by
\[
L(\hat\psi_\nu, \psi_\nu) = \big\|\hat\psi_\nu - \operatorname{sign}(\langle\hat\psi_\nu, \psi_\nu\rangle)\,\psi_\nu\big\|_2^2, \qquad (35)
\]
where $\|\cdot\|_2$ denotes the $L^2$ norm and $\langle\hat\psi_\nu, \psi_\nu\rangle = \int_0^1 \hat\psi_\nu(x)\psi_\nu(x)\,dx$.
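To make (35) concrete: since an eigenfunction is identified only up to sign, the loss first aligns $\hat\psi_\nu$ with $\psi_\nu$ and only then measures the $L^2$ distance. A small numerical illustration on a grid, with hypothetical eigenfunctions $\sqrt2\sin(k\pi x)$ chosen purely for this example:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4001)
dx = x[1] - x[0]

def inner(f, g):
    # Riemann-sum approximation of the L2([0,1]) inner product
    return float(np.sum(f * g) * dx)

# "True" eigenfunction and a sign-flipped, slightly contaminated estimate
psi = np.sqrt(2.0) * np.sin(np.pi * x)              # ||psi||_2 = 1
phi2 = np.sqrt(2.0) * np.sin(2.0 * np.pi * x)       # orthogonal direction
psi_hat = -(psi + 0.05 * phi2)
psi_hat /= np.sqrt(inner(psi_hat, psi_hat))         # normalize to L2 norm 1

# Modified L2 loss (35): align the sign before measuring the distance
s = np.sign(inner(psi_hat, psi))
loss = inner(psi_hat - s * psi, psi_hat - s * psi)
naive = inner(psi_hat - psi, psi_hat - psi)
print(loss, naive)   # the naive distance is near 4; the sign-aligned loss is small
```

Without the sign alignment, an estimate that is essentially $-\psi_\nu$ would be charged a distance close to $\|2\psi_\nu\|_2^2 = 4$ even though it recovers the eigenfunction almost perfectly; (35) removes this artifact.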
For the statement of Theorem 4.2, we only need to assume that the estimator $\hat\sigma^2$ of $\sigma^2$ satisfies $\mathbb{E}(\hat\sigma^2 - \sigma^2)^2 = o(1)$.

Theorem 4.2. Suppose that conditions A1-A3, B1-B2 and C1-C4 hold. Suppose further that in the definition (11), $\tilde h_n = A h_n$ for some constant $A > 4(B_K + C_Q)$. If $m h_n = o(1)$, $n h_n^2 \to \infty$ and $\kappa_n m h_n^{-1} n^{-1/2 + \epsilon'} \to 0$ for some $\epsilon' > 0$, then the estimator $\hat\psi_\nu$, the eigenfunction corresponding to the $\nu$-th largest eigenvalue of $\hat C_c$, satisfies, for any arbitrary but fixed $\epsilon > 0$,
\[
\sup_{(C,g)\in\Theta} \mathbb{E}\, L(\hat\psi_\nu, \psi_\nu) \le (1+\epsilon)\,\frac1n \Big(\sum_{1\le k\ne\nu\le M} \frac{\lambda_k\lambda_\nu}{(\lambda_k-\lambda_\nu)^2}\Big)
+ (1+\epsilon)\Big(\frac{1}{n^2}\sum_{i\ne j}^n \rho_{ij}^2\Big)\Big(\sum_{1\le k\ne\nu\le M} \frac{\lambda_k\lambda_\nu}{(\lambda_k-\lambda_\nu)^2} + O(h_n)\Big)
+ O(h_n^4) + O\Big(\frac{1}{nh_n m}\Big), \qquad (36)
\]
where $\Theta$ denotes the class of covariance-density pairs $(C, g)$ satisfying the conditions A1-A3, B1-B2 and C1-C4.

One important implication of Theorems 4.1 and 4.2 is that, if the correlation between sample curves is "weak" in a suitable sense, then the best rates of convergence for the correlated and i.i.d. cases are the same. Comparing with the i.i.d. case, we immediately see that, in order for this to hold under the conditions of Theorem 4.2, we need
\[
\frac{1}{n^2}\sum_{j\ne i} \rho_{ij}^2 = o\Big(\frac{1}{nh_{n,*}m}\Big), \qquad (37)
\]
where $h_{n,*}$ is the optimal bandwidth choice (at the level of rates of convergence) for the i.i.d. case. Also, in order to ensure that the optimal rate for the estimate of the covariance kernel in the correlated case is the same as that in the i.i.d. case, it is sufficient (by Corollary 4.2) that
\[
\frac{1}{n^2}\sum_{j\ne i} \rho_{ij}^2 = o\Big(\max\Big\{\frac{1}{nh_{n,*}m}, \frac{1}{nh_{n,*}^2 m^2}\Big\}\Big), \qquad (38)
\]
where $h_{n,*}$ is the optimal bandwidth choice for the covariance estimator (at the level of rates of convergence) in the i.i.d. case.
Specifically, for estimating the covariance, $h_{n,*} = (nm^2)^{-1/6}$ (by Theorem 4.1, under the setting where $m h_{n,*} = o(1)$), and for estimating the eigenfunctions, $h_{n,*} = (nm)^{-1/5}$ (by Theorem 4.2). Thus, the optimal bandwidths for estimating the covariance and for estimating its eigenfunctions are different, at least when $m$ can only grow rather slowly with $n$. Combining the lower bound given by Theorem 2 of Hall et al. (2006) and the upper bound from Theorem 4.2, it follows that when $m$ is bounded, the rate of convergence of the $L^2$-risk is optimal if (37) holds. Thus, under this setting the proposed estimator of the eigenfunctions is optimal even when the sample curves are weakly correlated. Similarly, under the setting of Theorem 4.1, if (38) holds, then the $L^2$-risk of the proposed estimator of the covariance also attains the optimal rate under an appropriate choice of bandwidth.

Another important point is that the conditions in Theorem 4.2, specifically that $m h_n = o(1)$, $n h_n^2 \to \infty$ and $m \kappa_n h_n^{-1} n^{-1/2+\epsilon'} = o(1)$, which together imply that $m = o(n^{1/4})$, are not the most general conditions. We conjecture that (36) holds under weaker conditions. Indeed, in the i.i.d. case, (36) holds (without the second term on the right-hand side) for a much wider range of possible values of $m$, as indicated by the following result, which gives a lower bound on the rate of convergence of the first eigenfunction when $m \to \infty$ under the i.i.d. setting. This bound is a refinement of an analogous result (Theorem 2) in Hall et al. (2006), even though the latter holds for all eigenfunctions. Notice that this lower bound, together with the upper bound elucidated in the paragraph following Theorem 4.2, implies that at least for the first eigenfunction, the best rate of convergence, viz. $O((nm)^{-4/5})$, is optimal when $m \to \infty$ at a faster rate and (37) holds.

Theorem 4.3. Let $\mathcal{C}$ denote the class of covariance kernels $\Sigma(\cdot,\cdot)$ on $[0,1]^2$ with rank $\ge 1$ and nonzero eigenvalues $\{\lambda_j\}_{j\ge1}$ satisfying $C_0 \ge \lambda_1 > \lambda_2 \ge 0$ with $\lambda_1 - \lambda_2 \ge C_1$, and with first eigenfunction $\psi_1$ twice differentiable and satisfying $\|\psi_1''\|_\infty \le C_2$, for some constants $C_0, C_1, C_2 > 0$. Also, let $\mathcal{G}$ denote the class of continuous densities $g$ on $[0,1]$ such that $c_1 \le g \le c_2$ for some $0 < c_1 \le 1 \le c_2 < \infty$. Suppose that we observe data according to model (1), where the $X_i(\cdot)$ are i.i.d. Gaussian processes with mean $0$ and covariance kernel $\Sigma$. Also suppose that the numbers of measurements $m_i$ satisfy $\underline{m} \le m_i \le \overline{m}$ for $\overline{m} \ge \underline{m} \ge 4$, with $\overline{m}/\underline{m} \le C_3$ for some $C_3 < \infty$, and $\overline{m} = o(n^{2/3})$. Let $\mathcal{D}$ denote the space of such designs $D = \{m_i\}_{i=1}^n$. Then, for sufficiently large $n$, for any estimator $\hat\psi_1$ with $L^2$ norm one, the following holds:
\[
\sup_{D\in\mathcal{D}}\; \sup_{g\in\mathcal{G}}\; \sup_{\Sigma\in\mathcal{C}}\; \mathbb{E}\,\|\hat\psi_1 - \psi_1\|_2^2 \;\ge\; C_4 (nm)^{-4/5}. \qquad (39)
\]
The proof of Theorem 4.3 is given in Appendix G.

5 Outline of the proof of Theorems 4.1 and 4.2

In this section, we briefly describe the main ideas leading to the proof of Theorems 4.1 and 4.2. The technical arguments are given in the appendices. The proof of Theorem 4.1 uses direct computation (Appendices C and D). The basic idea in the computation of the moments is to treat the diagonal and the off-diagonal parts of $\hat C_c(\cdot,\cdot)$ separately. The proof of Theorem 4.2 relies heavily on an application of Lemma 7.1. In view of this, the key quantity in the derivation of the asymptotic risk is $\mathbb{E}\|H_\nu \hat C_c \psi_\nu\|_2^2$, where $\|f\|_2^2$ denotes $\int_0^1 f^2(x)\,dx$ for a function $f \in L^2([0,1])$.
Once we obtain an expression for this (as given in Section 5.1), we use a probabilistic bound on the operator norm of the difference between the estimated and true covariance kernels to complete the proof. The proofs of Theorems 4.1 and 4.2 require repeated computation of mixed moments of correlated Gaussian random variables. The details of all these computations are given in the appendices.

5.1 Asymptotic risk for estimating $\psi_\nu$

The key result in this section is the following proposition.

Proposition 5.1. Under the assumptions of Theorem 4.2, we have
\[
\mathbb{E}\|H_\nu \hat C_c \psi_\nu\|_2^2 \le (1+\epsilon)\,\frac1n\Big(\sum_{1\le k\ne\nu\le M}\frac{\lambda_k\lambda_\nu}{(\lambda_k-\lambda_\nu)^2}\Big)
+ (1+\epsilon)\Big(\frac{1}{n^2}\sum_{i_1\ne i_2}^n \rho_{i_1 i_2}^2\Big)\Big(\sum_{1\le k\ne\nu\le M}\frac{\lambda_k\lambda_\nu}{(\lambda_k-\lambda_\nu)^2} + O(h_n)\Big)
+ O(h_n^4) + O\Big(\frac{1}{nh_n m}\Big) \qquad (40)
\]
for any arbitrary but fixed $\epsilon > 0$.

Here we briefly describe the main idea of the proof. For convenience of exposition, throughout we replace $\max\{\hat C(\frac{s+t}{2}), h_n^2\}$ in the definition (11) by $\hat C(\frac{s+t}{2})$. Using appropriate exponential inequalities for $\hat C^*(t)$, it can be shown that asymptotically this makes no difference as long as $\min_{t\in[0,1]} C(t,t) > c_3$ for some $c_3 > 0$. Also, for computational purposes, it is helpful to consider the unsmoothed version (10) of the kernel $W$ and take $\tilde h_n = A h_n$, where $A \ge 4(B_K + C_Q)$; the advantage is being able to deal with the contributions from the diagonal and off-diagonal parts of the estimator separately. Since the definition of $H_\nu$ involves the Dirac-$\delta$ operator, we need to account for the contribution of the terms involving $\delta$ carefully. The estimation error in $\sigma^2$ also plays a role, and is taken into account separately. The main decompositions that facilitate the computations are given through (55)-(57) in Appendix D.
The last bound reduces the task of bounding $\mathbb{E}\|H_\nu \hat C_c \psi_\nu\|_2^2$ to that of bounding $\mathbb{E}\|H_\nu \tilde C_c \psi_\nu\|_2^2$, with $\tilde C_c(\cdot,\cdot)$ as described in Appendix D. Note also that, if $\sigma^2$ is assumed to be known, then the decomposition (57) is not required, and we can dispense with the multiplicative factor $(1+\epsilon)$ in the expression (36) for the risk in Theorem 4.2.

5.2 Norm bound on $\hat C_c - \mathbb{E}\hat C_c$

To complete the proof of the theorems, we need a probabilistic bound for $\|\hat C_c - \mathbb{E}\hat C_c\|$, where $\|\cdot\|$ denotes the operator norm. We shall first find a bound on the sup norm of $\hat C_c - \mathbb{E}\hat C_c$, and then bound the operator norm via the inequality $\|\hat C_c - \mathbb{E}\hat C_c\| \le \|\hat C_c - \mathbb{E}\hat C_c\|_F$, where $\|\cdot\|_F$ denotes the Hilbert-Schmidt norm. This in turn follows from the inequality
\[
\|\hat C_c - \mathbb{E}\hat C_c\|_F \le \sup_{x,y\in[0,1]} |\hat C_c(x,y) - \mathbb{E}\hat C_c(x,y)| =: \|\hat C_c - \mathbb{E}\hat C_c\|_\infty.
\]
Note that, by piecewise differentiability of the estimate $\hat C_c$, in order to provide exponential bounds for the deviations of $\|\hat C_c - \mathbb{E}\hat C_c\|_\infty$, it is enough to provide exponential bounds for the fluctuations of $|\hat C_c(s,t) - \mathbb{E}[\hat C_c(s,t)]|$ at a finite (but polynomially growing with $n$) number of points $(s,t)$. Thus, we fix an arbitrary $(s,t)$ and derive an exponential inequality for the deviation of the estimate at this point. To simplify the computations, without loss of generality, we assume that $g$ is the density of the Uniform(0,1) distribution. Then we have the following proposition.

Proposition 5.2. Under the conditions of Theorem 4.2, given $\eta > 0$, there is a $c_\eta > 0$ such that, for every fixed $s, t \in [0,1]$,
\[
\mathbb{P}\left(|\hat C_c(s,t) - \mathbb{E}(\hat C_c(s,t))| > c_\eta\, m \kappa_n \sqrt{\frac{\log n}{n h_n^2}}\right) \le n^{-\eta}. \qquad (41)
\]

The proof of Theorem 4.2 then follows by noticing first that, by Lemma 7.1 and the fact that $\|\hat\psi_\nu\|_2 = \|\psi_\nu\|_2 = 1$,
\[
\mathbb{E}\,L(\hat\psi_\nu, \psi_\nu) \le \mathbb{E}\|H_\nu \hat C_c \psi_\nu\|_2^2\, (1 + \delta_{n,\eta}) + 2\,\mathbb{P}\left(\|\hat C_c - \mathbb{E}(\hat C_c)\| > c'_\eta\, m \kappa_n \sqrt{\frac{\log n}{n h_n^2}}\right)
\]
for some $\eta > 0$, $c'_\eta > 0$ and $\delta_{n,\eta} \to 0$ appropriately chosen, and then using Propositions 5.1 and 5.2.

5.3 Connection to the parametric rate for "purely functional" data

It is instructive to compare the optimal rate for our procedure with that obtained by Hall et al. (2006). We can regard the first line on the right-hand side of (40) as the parametric component of the risk and the second line as the nonparametric component. If we take $h = O(n^{-1/5})$, then for bounded $m$ we get the optimal nonparametric rate. For consistency of $\hat\psi_\nu$ in the $L^2$ sense, we clearly need $\frac{1}{n^2}\sum_{i_1\ne i_2}\rho_{i_1 i_2}^2 = o(1)$ (used in Theorem 4.2). If $m$ increases with the sample size, then the rate also improves, but there is no result about optimality. When the observations are i.i.d., it can be checked, using a modification of the proof of Proposition 5.2, that if $m \to \infty$ and $h \to 0$ such that $(mh)^{-1} = o(1)$ and $h = o(n^{-1/4})$, we obtain the parametric rate for the $L^2$-risk of $\hat\psi_\nu$ (as indicated in Hall et al., 2006). In other words, under that setting there is asymptotically no difference between the risk of estimating the eigenfunctions from data observed with noise at randomly distributed points and that from data measured on the continuum without noise. Indeed, such a scenario is possible if $m^{-1} = o(n^{-1/4-\epsilon})$ for some $\epsilon > 0$. Then, by taking $h_n = o(n^{-1/4})$, and assuming that either $\sigma^2$ is known or an estimator $\hat\sigma^2$ satisfying $|\hat\sigma^2 - \sigma^2| = O_P(h_n^2)$ is available, we attain the conditions mentioned above.
We conjecture that the same result holds even when the observations are "weakly" correlated.

6 Discussion

In this paper, we presented a procedure for estimating the covariance kernel and its eigenfunctions from sparsely observed, noise-corrupted and correlated functional data. The estimator of the covariance kernel is based on merging two separate estimators: (i) the estimator of the off-diagonal part, based on computing linearized empirical covariances of the smoothed versions of individual sample curves; and (ii) the estimator of the diagonal part, based on linearized kernel smoothing of the empirical variances. The importance of this modification of the naive kernel smoothing approach, especially when the number of design points per curve is small, is demonstrated through an asymptotic bias analysis. The linearized version of the kernel smoothing helps in reducing bias while controlling the variance, and is computationally appealing. The asymptotic risk behavior of the proposed estimators is studied under the assumption that the sample curves have a "separable covariance" structure and are "weakly" correlated. Exact quantification of the asymptotic risk for the eigenfunctions is obtained under the Gaussian setting (Theorem 4.2). It is also shown that the $L^2$-risk for the eigenfunctions achieves the optimal rate, under an appropriate choice of the bandwidth, when the number of measurements per curve is bounded. Also, in the i.i.d. case, we obtain a lower bound on the rate of convergence for estimating the first eigenfunction that is sharper than the bounds in the existing literature, which proves the rate-optimality of our estimator in a wider regime.
Finally, we propose a computationally tractable model selection procedure based on minimizing an approximation to the leave-one-curve-out cross-validation score that uses the empirical Kullback-Leibler loss. We also show that, in the context of estimating the covariance kernel or its eigenfunctions, it has clear advantages over the commonly used prediction error loss.

The proposed procedure for estimation and model selection is easily implementable and computationally more tractable than some of the existing methods. Moreover, due to the linear structure of the pre-smoothing of individual curves, our estimator is stable. Furthermore, the linear structure of the proposed estimator also allows for a simple approximation to the cross-validation score. Finally, even though the results are proved under Gaussianity of the noise process, it can be shown that, at the level of rates of convergence, the upper bounds hold under sufficient moment conditions on the noise, and hence the estimator is expected to be robust to distributional assumptions.

There are a few aspects of the estimation procedure that need further exploration. In the asymptotic analysis, we assumed that $g$, the density function of the design points, is known. In practice it has to be estimated from the data. Additional computations are needed to show that the results derived here hold under that setting as well. It will also be useful to study its impact on the estimation procedure through simulation studies, and in real data applications where the assumption of exact randomness of the design points may be violated. A natural generalization of the framework studied in this paper arises when the principal component scores jointly form a stationary vector autoregressive process.
Under such a setting, we would like to extend the estimation and model selection procedures described here to exploit the special structures of such processes. This is likely to summarize the statistical properties of some real-life phenomena and also help in model building and prediction, for example in spatio-temporal models where the covariance is not separable.

7 Appendix

Appendix A: Perturbation of eigen-structure

The following lemma is a modified version of a similar result in Paul and Johnstone (2007). Several variants of this lemma appear in the literature (see, e.g., Kneip and Utikal (2001), Cai and Hall (2006)), and most of them implicitly use the approach taken in Kato (1980). In the following we use $\|A\|$ to denote the operator norm of an operator $A$, i.e., the largest singular value of $A$.

Lemma 7.1. Let $A$ and $B$ be two symmetric Hilbert-Schmidt operators acting on $L^2([0,1])$. Let the eigenvalues of the operator $A$ be denoted by $\lambda_1(A), \lambda_2(A), \ldots$. Set $\lambda_0(A) = \infty$ and $\lambda_\infty(A) = -\infty$. For any $r \ge 1$, if $\lambda_r(A)$ is a unique eigenvalue of $A$, i.e., of multiplicity 1, denote by $p_r$ the eigenfunction associated with the $r$-th eigenvalue. Then
\[
p_r(A+B) - \operatorname{sign}\langle p_r(A+B), p_r(A)\rangle\, p_r(A) = -H_r(A)\, B\, p_r(A) + R_r,
\]
where $H_r(A) := \sum_{s\ne r} \frac{1}{\lambda_s(A) - \lambda_r(A)}\, P_{E_s}(A)$, and $P_{E_s}(A)$ denotes the orthogonal projection operator onto the eigen-subspace $E_s$ corresponding to the eigenvalue $\lambda_s(A)$ (possibly multi-dimensional). Define $\delta_r$ and $\bar\delta_r$ as
\[
\delta_r := \frac12\big[\|H_r(A)B\| + |\lambda_r(A+B) - \lambda_r(A)|\,\|H_r(A)\|\big], \qquad
\bar\delta_r := \frac{\|B\|}{\min_{1\le j\ne r\le\infty} |\lambda_j(A) - \lambda_r(A)|}.
\]
Then, the residual term $R_r$ can be bounded as
\[
\|R_r\| \le \min\left\{ 10\,\bar\delta_r^2,\;\; \|H_r(A)Bp_r(A)\|\,\frac{2\delta_r(1+2\delta_r)}{1 - 2\delta_r(1+2\delta_r)} + \frac{\|H_r(A)Bp_r(A)\|}{(1 - 2\delta_r(1+2\delta_r))^2} \right\},
\]
where the second bound holds only if $\delta_r < \frac{\sqrt5 - 1}{4}$. In addition, if $1 \le r_1 \le r_2$ are such that $\lambda_{r_1}(A) > \lambda_{r_1+1}(A) = \cdots = \lambda_{r_2}(A) > \lambda_{r_2+1}(A)$, then
\[
\sum_{k=r_1}^{r_2} \big(\lambda_k(A+B) - \lambda_k(A)\big) = \operatorname{tr}\big(P_{E_{r_1}}(A)\, B\big) + R_{r_1,r_2},
\]
where $P_{E_{r_1}}(A)$ is the orthogonal projection operator of $A$ corresponding to the eigenvalues $\lambda_{r_1}(A), \ldots, \lambda_{r_2}(A)$, and the residual $R_{r_1,r_2}$ satisfies
\[
|R_{r_1,r_2}| \le \frac{6\,(r_2 - r_1 + 1)\,\|B\|^2}{\min_{1\le j\ne r\le\infty} |\lambda_j(A) - \lambda_r(A)|}.
\]

Large deviations of quadratic forms

The following lemmas are from Paul (2004). Suppose that $\Phi : \mathcal{X} \to \mathbb{R}^{n\times n}$ is a measurable function and that $Z$ is a random variable taking values in $\mathcal{X}$.

Lemma 7.2. Suppose that $X$ and $Y$ are i.i.d. $N_n(0, I)$ and are independent of $Z$. Then for every $L > 0$ and $0 < \delta < 1$, for all $0 < t < \frac{\delta}{1-\delta} L$,
\[
\mathbb{P}\Big(\frac1n |X^T \Phi(Z) Y| > t,\; \|\Phi(Z)\| \le L\Big) \le 2\exp\Big(-\frac{(1-\delta)\, n t^2}{2L^2}\Big).
\]

Lemma 7.3. Suppose that $X$ is distributed as $N_n(0, I)$ and is independent of $Z$. Also assume that $\Phi(z) = \Phi^T(z)$ for all $z \in \mathcal{X}$. Then for every $L > 0$ and $0 < \delta < 1$, for all $0 < t < \frac{2\delta}{1-\delta} L$,
\[
\mathbb{P}\Big(\frac1n |X^T \Phi(Z) X - \operatorname{tr}(\Phi(Z))| > t,\; \|\Phi(Z)\| \le L\Big) \le 2\exp\Big(-\frac{(1-\delta)\, n t^2}{4L^2}\Big).
\]

Computation of conditional mixed moments

In order to calculate the bias and variance of the proposed estimator, we need to compute the conditional expectations $\mathbb{E}(Y_{i_1 j_1} Y_{i_1 j_1'} Y_{i_2 j_2} Y_{i_2 j_2'} \mid T_{i_1}, T_{i_2})$ for various choices of $i_1, i_2, j_1, j_1', j_2, j_2'$. We shall use the following well-known result, which is a special case of the Wick formula (Nica and Speicher, 2006, p. 129) for the computation of mixed moments of a Gaussian random vector.

Lemma 7.4.
If $W_1, W_2, W_3$ and $W_4$ are jointly Gaussian with mean zero and covariance matrix $\Sigma$, then
\[
E(W_1 W_2 W_3 W_4) = \Sigma_{12}\Sigma_{34} + \Sigma_{13}\Sigma_{24} + \Sigma_{14}\Sigma_{23}. \qquad (42)
\]
We shall use this formula to compute the above mixed moments, together with the observation that $\mathrm{Cov}(X_{i_1 j_1}, X_{i_2 j_2} \mid T_{i_1}, T_{i_2}) = \rho_{i_1 i_2}\, C(T_{i_1 j_1}, T_{i_2 j_2})$. The details of this computation in various generic cases are given in Appendix F.

Appendix B

In Appendix B and the following appendices, we shall often write $h$ and $\tilde h$ to denote $h_n$ and $\tilde h_n$, respectively, and we shall drop the subscript $h_n$ from the covariance estimates. For example, $\hat C_c$ will be used to denote $\hat C_{c,h_n}$.

Proof of Proposition 3.1

This is a straightforward application of Lemma 7.1, taking the estimated covariance kernel $\hat C_c$ as the operator $A$ and $-\Delta_i = \hat C_c^{(-i)} - \hat C_c$ as the operator $B$. Note that in (20) the last term corresponds to the zero eigenvalues of $\hat C_c$.

Proof of Proposition 3.2

We can express $\Delta_i(u,v)$ as $\hat\Delta_i(u,v) + R_i(u,v) + \big(\hat\sigma^2_{(-i)} - \frac{n}{n-1}\hat\sigma^2\big) - \frac{1}{n-1}\hat C_c(u,v)$, where
\[
\hat\Delta_i(u,v) = \big(1 - W_{\tilde h_n}(u,v)\big)\,\frac{w(m_i)}{n-1}\,\frac{1}{g(u)g(v)}\sum_{l,l'=1}^{L_n}\big(\tilde X_i(s_l) + (u - s_l)\tilde X_i'(s_l)\big)\big(\tilde X_i(s_{l'}) + (v - s_{l'})\tilde X_i'(s_{l'})\big)\,Q_{h_n}(u - s_l)\,Q_{h_n}(v - s_{l'})
\]
\[
+\; W_{\tilde h_n}(u,v)\,\frac{1}{n-1}\,\frac{1}{g\big(\frac{u+v}{2}\big)}\sum_{l=1}^{L_n}\Big(S_i\big(\tfrac{u+v}{2}\big) + \big(\tfrac{u+v}{2} - s_l\big) S_i'\big(\tfrac{u+v}{2}\big)\Big)\,Q_{h_n}\big(\tfrac{u+v}{2} - s_l\big), \qquad (43)
\]
and $R_i(u,v)$ equals (with $z$ denoting $\frac{u+v}{2}$)
\[
\big(1 - W_{\tilde h_n}(u,v)\big)\,\frac{\sigma}{n-1}\sum_{j \neq i} w(m_j)\Big[\big(\hat\mu^{(-i)}_{*,j}(u) - \hat\mu_{*,j}(u)\big)\hat\varepsilon_j(v) + \hat\varepsilon_j(u)\big(\hat\mu^{(-i)}_{*,j}(v) - \hat\mu_{*,j}(v)\big)\Big]
\]
\[
+ \big(1 - W_{\tilde h_n}(u,v)\big)\sum_{j \neq i}\frac{w(m_j)}{n-1}\Big[\big(\hat\mu^{(-i)}_{*,j}(u) - \hat\mu_{*,j}(u)\big)\big(\mu_{*,j}(v) - \hat\mu_{*,j}(v)\big) + \big(\hat\mu^{(-i)}_{*,j}(v) - \hat\mu_{*,j}(v)\big)\big(\mu_{*,j}(u) - \hat\mu_{*,j}(u)\big) - \big(\hat\mu^{(-i)}_{*,j}(u) - \hat\mu_{*,j}(u)\big)\big(\hat\mu^{(-i)}_{*,j}(v) - \hat\mu_{*,j}(v)\big)\Big]
\]
\[
+ W_{\tilde h_n}(u,v)\,\frac{2\sigma}{(n-1)g(z)}\sum_{j \neq i}\frac{1}{m_j}\sum_{k=1}^{m_j}\big(\hat\mu(T_{jk}) - \hat\mu^{(-i)}(T_{jk})\big)\,\varepsilon_{jk}\sum_{l=1}^{L_n}\tilde K_{z,l}(T_{jk})\,Q_{h_n}(z - s_l)
\]
\[
+ W_{\tilde h_n}(u,v)\,\frac{2}{(n-1)g(z)}\sum_{j \neq i}\frac{1}{m_j}\sum_{k=1}^{m_j}\big(\hat\mu^{(-i)}(T_{jk}) - \hat\mu(T_{jk})\big)\big(\mu(T_{jk}) - \hat\mu(T_{jk})\big)\sum_{l=1}^{L_n}\tilde K_{z,l}(T_{jk})\,Q_{h_n}(z - s_l)
\]
\[
- W_{\tilde h_n}(u,v)\,\frac{1}{(n-1)g(z)}\sum_{j \neq i}\frac{1}{m_j}\sum_{k=1}^{m_j}\big(\hat\mu^{(-i)}(T_{jk}) - \hat\mu(T_{jk})\big)^2\sum_{l=1}^{L_n}\tilde K_{z,l}(T_{jk})\,Q_{h_n}(z - s_l),
\]
where the kernel $\tilde K_{s,l}(\cdot) \equiv \tilde K_{s,l,h_n}(\cdot)$ is defined as
\[
\tilde K_{s,l}(u) = \frac{1}{h_n}\Big[K\Big(\frac{s_l - u}{h_n}\Big) + \Big(\frac{s - s_l}{h_n}\Big) K'\Big(\frac{s_l - u}{h_n}\Big)\Big], \qquad (44)
\]
for $s \in [0,1]$ and $l = 1, \ldots, L_n$; for any function $f$,
\[
f_{*,j}(x) = (g(x))^{-1}\sum_{l=1}^{L_n}\big(\tilde f_j(s_l) + (x - s_l)\tilde f_j'(s_l)\big)\,Q_{h_n}(x - s_l), \quad \text{with } \tilde f_j(s) := \frac{1}{m_j}\sum_{k=1}^{m_j} f(T_{jk})\,K_{h_n}(s - T_{jk});
\]
and
\[
\hat\varepsilon_j(x) = (g(x))^{-1}\sum_{l=1}^{L_n}\big[\tilde\varepsilon_j(s_l) + (x - s_l)\tilde\varepsilon_j'(s_l)\big]\,Q_{h_n}(x - s_l), \quad \text{with } \tilde\varepsilon_j(s) = \frac{1}{m_j}\sum_{k=1}^{m_j}\varepsilon_{jk}\,K_{h_n}(s - T_{jk}).
\]
Since $\hat H_\nu \hat C_c \hat\psi_\nu = \hat\lambda_\nu \hat H_\nu \hat\psi_\nu = 0$, it follows from the representation of $\Delta_i$ that
\[
\hat H_\nu \Delta_i \hat\psi_\nu = \hat H_\nu(\hat\Delta_i + R_i)\hat\psi_\nu + \big(\hat\sigma^2_{(-i)} - \tfrac{n}{n-1}\hat\sigma^2\big)\big(\hat H_\nu \mathbf{1}\,\hat\psi_\nu\big),
\]
where $\mathbf{1}(u,v) = 1_{\{0 \leq u, v \leq 1\}}$. It is easy to see from the expression for $R_i(u,v)$ and (23) that, for reasonable choices of $h_\mu$, the contribution of $R_i(u,v)$ can be ignored, since it is of smaller asymptotic order (in fact it can be shown to be $o_P(n^{-1})$). Hence we end up with the approximation $\hat H_\nu \Delta_i \hat\psi_\nu \approx \hat H_\nu \hat\Delta_i \hat\psi_\nu + (\hat\sigma^2_{(-i)} - \frac{n}{n-1}\hat\sigma^2)(\hat H_\nu \mathbf{1}\,\hat\psi_\nu)$. Thus, we can separate the first term on the RHS of (43) into two parts: one with multiplier 1, and the other with multiplier $W_{\tilde h_n}(u,v)$.
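As an aside, the first-order eigenfunction expansion of Lemma 7.1, which drives the perturbation arguments in the proofs above, can be illustrated numerically in finite dimensions. The following sketch (with illustrative choices of $A$, $B$, and the eigen-index, none of which come from the paper) checks that correcting $p_r(A)$ by $-H_r(A)Bp_r(A)$ leaves a residual that is second order in $\|B\|$:

```python
import numpy as np

# Finite-dimensional stand-in for Lemma 7.1: the leading eigenvector of A + B
# is approximated by p_r(A) - H_r(A) B p_r(A), with a residual R_r that is
# second order in ||B||.  A, B, n and r below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, r = 6, 0
A = np.diag([5.0, 3.0, 2.0, 1.0, 0.5, 0.2])
M = rng.standard_normal((n, n))
B = 1e-3 * (M + M.T) / 2                  # small symmetric perturbation

lam, P = np.linalg.eigh(A)
lam, P = lam[::-1], P[:, ::-1]            # decreasing eigenvalue order
p_r = P[:, r]

# H_r(A) = sum_{s != r} (lam_s - lam_r)^{-1} P_{E_s}
H_r = sum(np.outer(P[:, s], P[:, s]) / (lam[s] - lam[r])
          for s in range(n) if s != r)

lam2, P2 = np.linalg.eigh(A + B)
p_r_hat = P2[:, -1 - r]                   # eigenvector of the r-th largest eigenvalue
p_r_hat = p_r_hat * np.sign(p_r_hat @ p_r)  # sign alignment as in the lemma

first_order = p_r - H_r @ B @ p_r
residual = np.linalg.norm(p_r_hat - first_order)
raw_error = np.linalg.norm(p_r_hat - p_r)
# The residual after the first-order correction is far smaller than the raw
# perturbation of the eigenvector.
assert residual < raw_error
```

The eigen-gap at $\lambda_1 = 5$ is $2$, so the residual is of order $(\|B\|/\mathrm{gap})^2$, roughly three orders of magnitude below the raw error here.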
Then, using (22) and the second representation of $\hat H_\nu$ in (20), we obtain the expressions in the first four lines of the RHS of (24). Next, using the fact that $W_{\tilde h_n}(u,v) \approx 1_{\{|u-v| \leq \frac{A}{2}h_n\}}$, and using the approximations $\tilde\psi_\nu(v) \approx \tilde\psi_\nu(u)$ and $g(\frac{u+v}{2}) \approx g(u)$ on the interval $[u - \frac{A}{2}h_n,\, u + \frac{A}{2}h_n]$, we obtain the last three terms on the RHS of (24). Now, using (21), noting that $\mathrm{tr}(\hat P_\nu \hat C_c) = \hat\lambda_\nu$, and following similar arguments, we obtain (25).

Approximation of $\gamma_{k,k',h_n}(i)$

First, to fix notation, suppose that $\tilde h_n = A h_n$ for some constant $A > 0$. Then, by the definition of $W_{\tilde h_n}$ and the symmetry of $Q$, the integral appearing in (28) can be expressed as (ignoring the boundaries)
\[
d_{ll',kk';h_n}^{jj'} := \int_0^1 \frac{\hat\psi_k(u)}{g(u)}\,(u - s_l)^j\, Q_{h_n}(s_l - u)\int_{u - \frac{A}{2}h_n}^{u + \frac{A}{2}h_n}\frac{\hat\psi_{k'}(v)}{g(v)}\,(v - s_{l'})^{j'}\, Q_{h_n}(v - s_{l'})\,dv\,du. \qquad (45)
\]
Noticing that, on $[u - \frac{A}{2}h_n,\, u + \frac{A}{2}h_n]$, $\frac{\hat\psi_{k'}(v)}{g(v)}$ can be approximated by $\frac{\hat\psi_{k'}(u)}{g(u)}$, we can approximate the inner integral (with respect to $v$) by
\[
\frac{\hat\psi_{k'}(u)}{g(u)}\int_{u-\frac{A}{2}h_n}^{u+\frac{A}{2}h_n}(v - s_{l'})^{j'}\, Q_{h_n}(v - s_{l'})\,dv = h_n^{j'+1}\,\frac{\hat\psi_{k'}(u)}{g(u)}\int_{\frac{u-s_{l'}}{h_n}-\frac{A}{2}}^{\frac{u-s_{l'}}{h_n}+\frac{A}{2}} w^{j'} Q(w)\,dw \quad \Big(\text{setting } w = \tfrac{v - s_{l'}}{h_n}\Big)
\]
\[
=: h_n^{j'+1}\,\frac{\hat\psi_{k'}(u)}{g(u)}\,G_{j'}^Q\Big(\frac{u - s_{l'}}{h_n}\Big) =: h_n^{j'+1}\,\frac{\hat\psi_{k'}(u)}{g(u)}\,G_{j',l';h_n}^Q(u).
\]
Substituting this in (45), we have the approximation
\[
d_{ll',kk';h_n}^{jj'} \approx (-1)^j h_n^{j'+1}\int_0^1\frac{\hat\psi_k(u)\hat\psi_{k'}(u)}{g^2(u)}\,G_{j',l';h_n}^Q(u)\,(s_l - u)^j\,Q_{h_n}(s_l - u)\,du = (-1)^j h_n^{j'+1}\,G_j\Big(\frac{\hat\psi_k\hat\psi_{k'}}{g^2}\,G_{j',l';h_n}^Q,\; Q_{h_n}\Big)(s_l) =: \bar d_{ll',kk';h_n}^{jj'}, \qquad (46)
\]
by the definition of $G_j(f_1, f_2)(\cdot)$, $j = 0, 1$.
Since
\[
\Big[\frac{u - s_{l'}}{h_n} - \frac{A}{2},\; \frac{u - s_{l'}}{h_n} + \frac{A}{2}\Big] \cap [-C_Q, C_Q] = \emptyset \iff |u - s_{l'}| > \Big(C_Q + \frac{A}{2}\Big)h_n,
\]
we have $G_{j'}^Q\big(\frac{u - s_{l'}}{h_n}\big) \equiv 0$ if $|u - s_{l'}| > (C_Q + \frac{A}{2})h_n$. Furthermore, $Q_{h_n}(u - s_l) \equiv 0$ if $|u - s_l| > C_Q h_n$. This means that if either $|u - s_l| > C_Q h_n$ or $|u - s_{l'}| > (C_Q + \frac{A}{2})h_n$, then the integrand in the first step of (46) is zero. So the domain of integration is, effectively, $[s_l - C_Q h_n,\, s_l + C_Q h_n] \cap [s_{l'} - (C_Q + \frac{A}{2})h_n,\, s_{l'} + (C_Q + \frac{A}{2})h_n]$. This implies that if $|s_l - s_{l'}| > (2C_Q + \frac{A}{2})h_n$, then the effective domain of integration is empty, so that $\bar d_{ll',kk';h_n}^{jj'} = 0$ whenever $|s_l - s_{l'}| > (2C_Q + \frac{A}{2})h_n$. If $Q(\cdot)$ is chosen to be a centered cubic B-spline (so that $C_Q = 2$), we can compute $G_{j'}^Q(\cdot)$ explicitly, without having to perform a numerical integration (Appendix F).

Appendix C

In the following, we often drop the subscript $n$ from $h_n$ for simplicity, and sometimes we even drop the subscript $h$ from the notation.

Proof of Proposition 2.1

By elementary calculations, and supposing that $m_i \geq 2$ for each $1 \leq i \leq n$, we have
\[
E[\tilde X_i(s)\tilde X_i(t)] = \frac{1}{m_i^2}\sum_{j,j'=1}^{m_i} E\Big[Y_{ij}Y_{ij'}\,\frac{1}{h_n^2}\,K\Big(\frac{s - T_{ij}}{h_n}\Big)K\Big(\frac{t - T_{ij'}}{h_n}\Big)\Big]
\]
\[
= \frac{m_i}{m_i^2}\,\frac{1}{h_n^2}\int\big(C(u,u) + \sigma^2\big)K\Big(\frac{s-u}{h_n}\Big)K\Big(\frac{t-u}{h_n}\Big)\,du + \frac{m_i(m_i-1)}{m_i^2}\,\frac{1}{h_n^2}\iint C(u,v)\,K\Big(\frac{s-u}{h_n}\Big)K\Big(\frac{t-v}{h_n}\Big)\,du\,dv
\]
\[
= \frac{1}{m_i}\,\frac{1}{h_n}\int\big(C(t + h_n u, t + h_n u) + \sigma^2\big)K(-u)\,K\Big(\frac{s-t}{h_n} - u\Big)\,du + \frac{m_i - 1}{m_i}\iint C(s + h_n u,\, t + h_n v)\,K(-u)K(-v)\,du\,dv
\]
\[
= \frac{1}{m_i h_n}\Big[(C(t) + \sigma^2)\int K(-u)K\Big(\frac{s-t}{h_n} - u\Big)du + h_n C'(t)\int u\,K(-u)K\Big(\frac{s-t}{h_n} - u\Big)du + O(h_n^2)\Big]
\]
\[
+ \Big(1 - \frac{1}{m_i}\Big)C(s,t)\iint K(-u)K(-v)\,du\,dv + \Big(1 - \frac{1}{m_i}\Big)h_n\iint\big[C_s(s,t)\,u + C_t(s,t)\,v\big]K(-u)K(-v)\,du\,dv + O(h_n^2), \qquad (47)
\]
where the last step is by Taylor series expansions.
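Gaussian moment computations like the one just carried out, and especially the fourth-moment Wick formula of Lemma 7.4 that Appendix F uses repeatedly, can be sanity-checked numerically. The following sketch (illustrative code, not part of the estimation procedure) verifies equation (42) against two exact special cases and a Monte Carlo estimate for an equicorrelated covariance:

```python
import numpy as np

def wick4(S):
    """E(W1 W2 W3 W4) for zero-mean jointly Gaussian W, via eq. (42)."""
    return S[0, 1] * S[2, 3] + S[0, 2] * S[1, 3] + S[0, 3] * S[1, 2]

# Exact case 1: W1 = W2 = W3 = W4 = the same N(0,1) variable, so Sigma is the
# all-ones matrix and the formula must reproduce E(W^4) = 3.
assert wick4(np.ones((4, 4))) == 3.0

# Exact case 2: four independent standard normals; every pairing vanishes.
assert wick4(np.eye(4)) == 0.0

# Monte Carlo agreement for an equicorrelated covariance (correlation 0.5),
# where wick4(S) = 3 * 0.5^2 = 0.75.
S = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))
rng = np.random.default_rng(1)
W = rng.multivariate_normal(np.zeros(4), S, size=400_000)
mc = (W[:, 0] * W[:, 1] * W[:, 2] * W[:, 3]).mean()
assert abs(mc - wick4(S)) < 0.1
```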
Now, noticing that $K$ is symmetric about 0, $\int K(x)\,dx = 1$ and $\int xK(x)\,dx = 0$, (5) and (6) follow from (47) after simplifications.

Asymptotic pointwise bias (31)

We first compute the expected value of the estimate described by (11). For notational simplicity, we write $\tilde X_{i,l}(s)$ for $\tilde X_i(s_l) + (s - s_l)\tilde X_i'(s_l)$. Observe that
\[
\tilde X_{i,l}(s) = \frac{1}{m_i}\sum_{j=1}^{m_i} Y_{ij}\,\frac{1}{h}\Big[K\Big(\frac{s_l - T_{ij}}{h}\Big) + \frac{s - s_l}{h}\,K'\Big(\frac{s_l - T_{ij}}{h}\Big)\Big].
\]
Let the support of the kernel $K(\cdot)$ be denoted by $[-B_K, B_K]$. Then, for each fixed $j = 1, \ldots, m_i$ and $i = 1, \ldots, n$,
\[
E\Big[Y_{ij}^2\Big(K\Big(\frac{s_l - T_{ij}}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - T_{ij}}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - T_{ij}}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - T_{ij}}{h}\Big)\Big)\Big]
\]
\[
= \int\big[C(u,u) + \sigma^2\big]\,g(u)\Big(K\Big(\frac{s_l - u}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - u}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - u}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - u}{h}\Big)\Big)\,du, \qquad (48)
\]
which is 0 if $|s_l - s_{l'}| > 2B_K h$, since in that case $K(\frac{s_l - u}{h})K(\frac{s_{l'} - u}{h}) = 0$ for all $u \in \mathbb{R}$. If $|s_l - s_{l'}| \leq 2B_K h$, the term (48) makes a nonzero contribution to $E[\tilde X_{i,l}(s)\tilde X_{i,l'}(t)]\,Q_h(s - s_l)\,Q_h(t - s_{l'})$ only if $|s - t| \leq 2(B_K + C_Q)h$, where $\mathrm{supp}(Q) = [-C_Q, C_Q]$. Thus, if $A > 4(B_K + C_Q)$, then for $|s - t| > \frac{1}{2}Ah$ we have
\[
w(m_i)\,E\big(\tilde X_{i,l}(s)\tilde X_{i,l'}(t)\big) = \iint C(u,v)\,g(u)g(v)\,\frac{1}{h^2}\Big(K\Big(\frac{s_l - u}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - u}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - v}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - v}{h}\Big)\Big)\,du\,dv
\]
\[
= \iint C(s_l + xh,\, s_{l'} + yh)\,g(s_l + xh)\,g(s_{l'} + yh)\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy. \qquad (49)
\]
We assume that the conditions in Section 4 hold. Then, using the representation (49) and the calculations done in Appendix F, we obtain an expression for the asymptotic bias in estimating $C(\cdot,\cdot)$ as a function of the bandwidth $h \equiv h_n$.
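The bias expansion just sketched reduces, in Appendix F, to one-dimensional kernel moment identities such as $\int (a + x)^2\big(K(x) - a K'(-x)\big)\,dx = K_2 - a^2$, with $a = \frac{s_l - s}{h}$. A numerical check of this identity — taking $K$ to be the standard Gaussian density purely for illustration (the paper only requires a symmetric kernel with $\int K = 1$ and $\int xK = 0$):

```python
import numpy as np

# Check: for b = (s - s_l)/h = -a, int (a + x)^2 (K(x) + b K'(-x)) dx = K_2 - a^2.
# K is chosen as the standard Gaussian density for this illustration only.
xs = np.linspace(-10.0, 10.0, 400001)
dx = xs[1] - xs[0]
K = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)
Kprime_neg = xs * K                     # K'(-x) = x K(x) for the Gaussian density
K2 = (xs**2 * K).sum() * dx             # second kernel moment (equals 1 here)

for a in [0.0, 0.4, -0.8]:
    lhs = ((a + xs)**2 * (K - a * Kprime_neg)).sum() * dx
    assert abs(lhs - (K2 - a**2)) < 1e-6
```

The same grid is used for both sides, so discretization error largely cancels; the identity underlies every $h^2$-order term in Lemmas 7.5 and 7.6.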
These results are summarized in the following lemmas, where $C_s, C_{ss}$ and $C_t, C_{tt}$ denote the first and second partial derivatives of $C(s,t)$ with respect to $s$ and $t$, respectively.

Lemma 7.5. (Expectation of $\tilde C(s,t)$): Let $K_2 = \int x^2 K(x)\,dx$,
\[
\bar Q_h(s) = \sum_{l=1}^{L_n} Q_h(s - s_l), \qquad Q_h^{(2)}(s) = \sum_{l=1}^{L_n}\Big(\frac{s - s_l}{h}\Big)^2 Q_h(s - s_l).
\]
Then, for $|s - t| > 2Ah_n$,
\[
E\tilde C(s,t) = C(s,t)\,\bar Q_h(s)\bar Q_h(t) + \frac{h^2}{2}C(s,t)\Big[\frac{g''(s)}{g(s)}\big(K_2\bar Q_h(s) - Q_h^{(2)}(s)\big)\bar Q_h(t) + \frac{g''(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big)\bar Q_h(s)\Big]
\]
\[
+ h^2 C_s\,\frac{g'(s)}{g(s)}\big(K_2\bar Q_h(s) - Q_h^{(2)}(s)\big)\bar Q_h(t) + h^2 C_t\,\frac{g'(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big)\bar Q_h(s)
\]
\[
+ \frac{h^2}{2}\,\frac{1}{g(s)g(t)}\Big[C_{ss}\big(K_2\bar Q_h(s) - Q_h^{(2)}(s)\big)\bar Q_h(t) + C_{tt}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big)\bar Q_h(s)\Big] + O(h^{2+\alpha}). \qquad (50)
\]
Note that, because of property (iii) of the kernel $Q$ and the fact that $s_l = (l + a)h$ for $l = 1, \ldots, L_n$, for some constant $a \in [-3, 3]$, we have, for $s \in (c, 1-c)$ with some $c \in (0,1)$,
\[
\bar Q_h(s) = \sum_{l=1}^{L_n} Q\Big(\frac{s}{h} - a - l\Big) = 1.
\]
Therefore, we can choose $L_n$ and the sequence of points $\{s_l\}_{l=1}^{L_n}$ so that $L_n \approx h_n^{-1}$ and $\bar Q_h(s) \equiv 1$ for all $s \in [0,1]$. That is, from Lemma 7.5 we have $E\tilde C(s,t) = C(s,t) + O(h^2)$.

Lemma 7.6. (Expectation of $\hat C_*(t)$): Let $C'(t)$ and $C''(t)$ denote the first and second derivatives of the function $C(t) := C(t,t)$. Then, uniformly in $t$,
\[
E\hat C_*(t) = (C(t) + \sigma^2)\,\bar Q_h(t) + \frac{h^2}{2}(C(t) + \sigma^2)\,\frac{g''(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big) + h^2 C'(t)\,\frac{g'(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big)
\]
\[
+ \frac{h^2}{2}\,\frac{C''(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big) + O(h^{2+\alpha}). \qquad (51)
\]
The proof of Lemma 7.6 follows along the lines of that of Lemma 7.5.
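The claim above that $\bar Q_h(s) \equiv 1$ away from the boundary rests on the partition-of-unity property of the centered cubic B-spline: its integer translates sum to one. This can be checked directly from the piecewise polynomial form of $Q$ given in Appendix F (the evaluation points below are arbitrary illustrative choices):

```python
import numpy as np

def Q(x):
    """Centered cubic B-spline on [-2, 2], piecewise form from Appendix F."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m = (-2 <= x) & (x < -1); out[m] = (2 + x[m])**3 / 6
    m = (-1 <= x) & (x < 0);  out[m] = (-3 * x[m]**3 - 6 * x[m]**2 + 4) / 6
    m = (0 <= x) & (x < 1);   out[m] = (3 * x[m]**3 - 6 * x[m]**2 + 4) / 6
    m = (1 <= x) & (x <= 2);  out[m] = (2 - x[m])**3 / 6
    return out

# Partition of unity: sum_l Q(s - l) = 1 at interior points.
s = np.linspace(-0.49, 0.49, 97)
total = sum(Q(s - l) for l in range(-4, 5))
assert np.allclose(total, 1.0)

# Q is a density on [-2, 2]: a fine Riemann sum of Q is 1.
xs = np.linspace(-2.0, 2.0, 400001)
assert abs(Q(xs).mean() * 4.0 - 1.0) < 1e-4
```

This is exactly why a grid $s_l = (l + a)h$ with $L_n \approx h_n^{-1}$ points makes $\bar Q_h \equiv 1$ in the interior.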
Furthermore, if an estimator $\hat\sigma^2$ is such that $E\hat\sigma^2 = \sigma^2 + O(h^2)$, then it follows from Lemma 7.6 that the estimator $\hat C(t) := \hat C_*(t) - \hat\sigma^2$ satisfies
\[
E\hat C(t) = C(t) + O(h^2), \qquad (52)
\]
uniformly in $t \in [0,1]$, since $\bar Q_h(t) \equiv 1$ on $[0,1]$. Next, since $C(s,t) = C(t,s)$ and $C(\cdot,\cdot)$ is smooth, it follows that $C_s - C_t \equiv 0$ on the diagonal. Consequently, using a Taylor series expansion, it follows that, for any $A > 0$,
\[
C(s,t) = C\Big(\frac{s+t}{2}\Big) + O(h^2), \qquad \text{for } |s - t| \leq \frac{A}{2}h. \qquad (53)
\]
Combining (52) and (53), we get
\[
E\hat C\Big(\frac{s+t}{2}\Big) = C(s,t) + O(h^2), \qquad \text{for } |s - t| \leq \frac{A}{2}h,\; s,t \in [0,1]. \qquad (54)
\]

Appendix D

Proof of Proposition 5.1

We shall make extensive use of the representation
\[
H_\nu(x,y) = \bar H_\nu(x,y) - \frac{1}{\lambda_\nu}\,\delta(x - y), \qquad (55)
\]
where
\[
\bar H_\nu(x,y) = \sum_{1 \leq k \neq \nu \leq M}\frac{\lambda_k}{\lambda_k - \lambda_\nu}\,\psi_k(x)\psi_k(y) + \frac{1}{\lambda_\nu}\,\psi_\nu(x)\psi_\nu(y).
\]
The first step is to express $\hat C_c(s,t)$ as $\tilde C_c(s,t) - W_{\tilde h_n}(s,t)(\hat\sigma^2 - \sigma^2)$, where
\[
\tilde C_c(s,t) = \big(1 - W_{\tilde h_n}(s,t)\big)\tilde C(s,t) + W_{\tilde h_n}(s,t)\Big(\hat C_*\Big(\frac{s+t}{2}\Big) - \sigma^2\Big). \qquad (56)
\]
Therefore, in order to separate out the effect of estimating $\sigma^2$, we use the fact that, for any fixed $\epsilon > 0$,
\[
\|H_\nu\hat C_c\psi_\nu\|_2^2 \leq (1+\epsilon)\,\|H_\nu\tilde C_c\psi_\nu\|_2^2 + \Big(1 + \frac{1}{\epsilon}\Big)(\hat\sigma^2 - \sigma^2)^2\,\|H_\nu W_{\tilde h_n}\psi_\nu\|_2^2 = (1+\epsilon)\,\|H_\nu\tilde C_c\psi_\nu\|_2^2 + \Big(1 + \frac{1}{\epsilon}\Big)(\hat\sigma^2 - \sigma^2)^2\,O(h_n^4). \qquad (57)
\]
The last equality follows since, using $H_\nu\psi_\nu = 0$, the definition of $W_{\tilde h_n}$, and the Mean Value Theorem, we have
\[
\big|(H_\nu W_{\tilde h_n}\psi_\nu)(x)\big| = \left|\int H_\nu(x,s)\int_{(s - \frac{A h_n}{2})\vee 0}^{(s + \frac{A h_n}{2})\wedge 1}\big(\psi_\nu(t) - \psi_\nu(s)\big)\,dt\,ds\right| \leq \frac{A^2 h_n^2}{2}\,\|\psi_\nu'\|_\infty\Big(\int|H_\nu(x,s)|\,ds + \frac{1}{\lambda_\nu}\Big).
\]
Since $E(\hat\sigma^2 - \sigma^2)^2 = o(1)$, it is enough to show that $E\|H_\nu\tilde C_c\psi_\nu\|_2^2$ has the bound given by the RHS of (40), without the multiplicative factor $(1+\epsilon)$.
With a slight abuse of notation, we write $\hat C(s)$ for $\hat C_*(s) - \sigma^2$. Then, since
\[
(H_\nu\tilde C_c\psi_\nu)(x) = \iint H_\nu(x,s)\big(1 - W_{\tilde h_n}(s,t)\big)\tilde C(s,t)\,\psi_\nu(t)\,ds\,dt + \iint H_\nu(x,s)\,W_{\tilde h_n}(s,t)\,\hat C\Big(\frac{s+t}{2}\Big)\psi_\nu(t)\,ds\,dt,
\]
it follows that $\|H_\nu\tilde C_c\psi_\nu\|_2^2$ equals
\[
\int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\big(1 - W_{\tilde h_n}(s_1,t_1)\big)\big(1 - W_{\tilde h_n}(s_2,t_2)\big)\,\tilde C(s_1,t_1)\tilde C(s_2,t_2)\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx
\]
\[
+ \int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\,W_{\tilde h_n}(s_1,t_1)W_{\tilde h_n}(s_2,t_2)\,\hat C\Big(\frac{s_1+t_1}{2}\Big)\hat C\Big(\frac{s_2+t_2}{2}\Big)\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx
\]
\[
+ 2\int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\big(1 - W_{\tilde h_n}(s_1,t_1)\big)W_{\tilde h_n}(s_2,t_2)\,\tilde C(s_1,t_1)\,\hat C\Big(\frac{s_2+t_2}{2}\Big)\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx. \qquad (58)
\]
Thus, in order to obtain $E\|H_\nu\tilde C_c\psi_\nu\|_2^2$, we need to evaluate the quantities $E[\tilde C(s_1,t_1)\tilde C(s_2,t_2)]$, $E[\hat C(\frac{s_1+t_1}{2})\hat C(\frac{s_2+t_2}{2})]$ and $E[\tilde C(s_1,t_1)\hat C(\frac{s_2+t_2}{2})]$. Let
\[
U_i(s,t) = \sum_{l,l'=1}^{L_n}\frac{1}{m_i^2}\sum_{j,j'=1}^{m_i} Y_{ij}Y_{ij'}\,\tilde K_{s,l}(T_{ij})\,\tilde K_{t,l'}(T_{ij'})\,Q_h(s - s_l)\,Q_h(t - s_{l'}), \qquad (59)
\]
where $\tilde K_{s,l}(\cdot)$ is as in (44). Then we can express the expectation of the first term on the RHS of (58) as
\[
\frac{1}{n^2}\sum_{i=1}^n w^2(m_i)\int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\big(1 - W_{\tilde h_n}(s_1,t_1)\big)\big(1 - W_{\tilde h_n}(s_2,t_2)\big)\,\frac{E[U_i(s_1,t_1)U_i(s_2,t_2)]}{g(s_1)g(s_2)g(t_1)g(t_2)}\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx
\]
\[
+ \frac{1}{n^2}\sum_{i_1 \neq i_2} w(m_{i_1})w(m_{i_2})\int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\big(1 - W_{\tilde h_n}(s_1,t_1)\big)\big(1 - W_{\tilde h_n}(s_2,t_2)\big)\,\frac{E[U_{i_1}(s_1,t_1)U_{i_2}(s_2,t_2)]}{g(s_1)g(s_2)g(t_1)g(t_2)}\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx. \qquad (60)
\]
The following proposition is the key to obtaining a simplified bound on (60). It is proved by a lengthy but fairly straightforward calculation.
The details are given in Appendix F.

Proposition 7.1. Suppose that $A > 4(B_K + C_Q)$. Then, for $|s_k - t_k| > \frac{1}{2}Ah_n$ ($k = 1,2$), we have
\[
\frac{1}{n^2}\sum_{i=1}^n w^2(m_i)\,\frac{E[U_i(s_1,t_1)U_i(s_2,t_2)]}{g(s_1)g(s_2)g(t_1)g(t_2)} = \frac{1}{n}\sum_{i=1}^n\frac{(m_i-2)(m_i-3)}{m_i(m_i-1)}\Big[\big(C(s_1,t_1) + O(h_n^2)\big)\big(C(s_2,t_2) + O(h_n^2)\big)
\]
\[
+ \big(C(s_1,s_2) + O(h_n^2)\big)\big(C(t_1,t_2) + O(h_n^2)\big) + \big(C(s_1,t_2) + O(h_n^2)\big)\big(C(s_2,t_1) + O(h_n^2)\big)\Big] + Z_1 + Z_2 + Z_3 + Z_4 + Z_5 + Z_6, \qquad (61)
\]
where the quantities $Z_j := Z_j(s_1,s_2,t_1,t_2)$, $j = 1,\ldots,6$, are as follows: $Z_1,\ldots,Z_4$ are asymptotically equivalent to $Z(s_1,s_2)$, $Z(s_1,t_2)$, $Z(t_1,s_2)$ and $Z(t_1,t_2)$, respectively, and $Z_5, Z_6$ are asymptotically equivalent to $\tilde Z(s_1,s_2,t_1,t_2)$ and $\tilde Z(s_1,t_2,t_1,s_2)$, respectively, where
\[
Z(s,t) = \begin{cases} O\big(\frac{1}{n h_n m}\big) & \text{if } |s-t| \leq \frac{Ah_n}{2} \\ 0 & \text{otherwise,} \end{cases}
\]
and
\[
\tilde Z(s_1,s_2,t_1,t_2) = \begin{cases}
O\big(\frac{1}{n h_n^2 m^2}\big) & \text{if } \max\{|s_1-s_2|, |t_1-t_2|\} \leq \frac{Ah_n}{2} \\
O\big(\frac{1}{n h_n m^2}\big) & \text{if } |s_1-s_2| \leq \frac{Ah_n}{2} \text{ and } |t_1-t_2| > \frac{Ah_n}{2} \\
O\big(\frac{1}{n h_n m^2}\big) & \text{if } |s_1-s_2| > \frac{Ah_n}{2} \text{ and } |t_1-t_2| \leq \frac{Ah_n}{2} \\
0 & \text{otherwise.}
\end{cases}
\]
Also,
\[
\frac{1}{n^2}\sum_{i_1 \neq i_2}w(m_{i_1})w(m_{i_2})\,\frac{E[U_{i_1}(s_1,t_1)U_{i_2}(s_2,t_2)]}{g(s_1)g(s_2)g(t_1)g(t_2)} = \frac{n-1}{n}\big(C(s_1,t_1) + O(h_n^2)\big)\big(C(s_2,t_2) + O(h_n^2)\big)
\]
\[
+ \frac{1}{n^2}\Big(\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big)\Big[\big(C(s_1,s_2) + O(h_n^2)\big)\big(C(t_1,t_2) + O(h_n^2)\big) + \big(C(s_1,t_2) + O(h_n^2)\big)\big(C(s_2,t_1) + O(h_n^2)\big)\Big]. \qquad (62)
\]
In all of the above, the $O(\cdot)$ terms are uniform in $s_1, s_2, t_1, t_2$ in their respective domains.

Now we deal with the last two terms on the RHS of (58). Let
\[
V_i(s) = \sum_{l=1}^{L_n}\frac{1}{m_i}\sum_{j=1}^{m_i}Y_{ij}^2\,\tilde K_{s,l}(T_{ij})\,Q_{h_n}(s - s_l). \qquad (63)
\]
Then $\hat C_*(s) = \frac{1}{n}\sum_{i=1}^n [g(s)]^{-1}\,V_i(s)$.
For convenience, in the rest of this subsection we shall use $z_k$ to denote $(s_k + t_k)/2$, for $k = 1, 2$. The following proposition then describes the contribution of quantities of the type $E[V_{i_1}(z_1)V_{i_2}(z_2)]$ and $E[U_{i_1}(s_1,t_1)V_{i_2}(z_2)]$.

Proposition 7.2. Suppose that $A > 4(B_K + C_Q)$. Then, (i) for $|s_k - t_k| \leq \frac{Ah_n}{2}$, $k = 1, 2$,
\[
\frac{1}{n^2}\sum_{i=1}^n\frac{E(V_i(z_1)V_i(z_2))}{g(z_1)g(z_2)} + \frac{1}{n^2}\sum_{i_1 \neq i_2}\frac{E(V_{i_1}(z_1)V_{i_2}(z_2))}{g(z_1)g(z_2)} - \sigma^2\big[E(\hat C_*(z_1)) + E(\hat C_*(z_2))\big] + \sigma^4
\]
\[
= C(s_1,t_1)C(s_2,t_2) - \Big(\frac{1}{n^2}\sum_{i=1}^n\frac{1}{m_i}\Big)\big(C(s_1,t_1) + \sigma^2\big)\big(C(s_2,t_2) + \sigma^2\big) + O(h_n^2)
\]
\[
+ \Big[\frac{1}{n}\Big(1 - \frac{1}{n}\sum_{i=1}^n\frac{1}{m_i}\Big) + \frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big]\big(C(s_1,s_2)C(t_1,t_2) + C(s_1,t_2)C(s_2,t_1) + O(h_n)\big) + Z_7, \qquad (64)
\]
where $Z_7 := Z_7(z_1,z_2)$ is asymptotically equivalent to $Z(z_1,z_2)$. Next, if (ii) $|s_1 - t_1| > \frac{Ah_n}{2}$ and $|s_2 - t_2| \leq \frac{Ah_n}{2}$, then
\[
\frac{1}{n^2}\sum_{i=1}^n w(m_i)\,E(U_i(s_1,t_1)V_i(z_2)) + \frac{1}{n^2}\sum_{i_1 \neq i_2}w(m_{i_1})\,E(U_{i_1}(s_1,t_1)V_{i_2}(z_2)) - \sigma^2\,E\tilde C(s_1,t_1)
\]
\[
= \big(C(s_1,t_1) + O(h_n^2)\big)\big(C(s_2,t_2) + O(h_n^2)\big) - \Big(\frac{1}{n^2}\sum_{i=1}^n\frac{2}{m_i}\Big)\big(C(s_1,t_1) + O(h_n^2)\big)\big(C(s_2,t_2) + \sigma^2 + O(h_n^2)\big)
\]
\[
+ \Big[\frac{1}{n}\Big(1 - \frac{1}{n}\sum_{i=1}^n\frac{2}{m_i}\Big) + \frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big]\big(C(s_1,s_2)C(t_1,t_2) + C(s_1,t_2)C(s_2,t_1) + O(h_n)\big) + Z_8 + Z_9, \qquad (65)
\]
where the $O(h_n^2)$ terms within brackets in the first term on the RHS depend on $(s_1,t_1)$ and $(s_2,t_2)$, respectively, and $Z_j := Z_j(s_1,t_1,z_2)$, $j = 8, 9$, satisfy
\[
Z_8 = \begin{cases} O\big(\frac{1}{n h_n^2 m}\big) & \text{if } |s_1 - s_2| \leq \frac{Ah_n}{2} \\ 0 & \text{otherwise;} \end{cases} \qquad
Z_9 = \begin{cases} O\big(\frac{1}{n h_n^2 m}\big) & \text{if } |t_1 - s_2| \leq \frac{Ah_n}{2} \\ 0 & \text{otherwise.} \end{cases}
\]
The proof of Proposition 5.1 is now completed by using the definitions of $E[\tilde C(s_1,t_1)\tilde C(s_2,t_2)]$, $E[\hat C(\frac{s_1+t_1}{2})\hat C(\frac{s_2+t_2}{2})]$ and $E[\tilde C(s_1,t_1)\hat C(\frac{s_2+t_2}{2})]$; using the properties of the kernel $H_\nu(x,y)$; and plugging the bounds of Propositions 7.1 and 7.2 back into the expectation of (58). The details can be found in Appendix F.

Appendix E

Asymptotic pointwise variance (32)

In this section we prove (32), (33) and (34). Most of the derivations are similar to those for Proposition 5.1, so we give only a brief outline. First, using the fact that $(1 - W_{\tilde h_n}(s,t))\,W_{\tilde h_n}(s,t) = 0$, we obtain
\[
\mathrm{Var}(\hat C_c(s,t)) = \big(1 - W_{\tilde h_n}(s,t)\big)\,\mathrm{Var}(\tilde C(s,t)) + W_{\tilde h_n}(s,t)\,\mathrm{Var}\Big(\hat C_*\Big(\frac{s+t}{2}\Big) - \hat\sigma^2\Big)
\]
\[
\leq \big(1 - W_{\tilde h_n}(s,t)\big)\,\mathrm{Var}(\tilde C(s,t)) + 2\,W_{\tilde h_n}(s,t)\Big[\mathrm{Var}\Big(\hat C_*\Big(\frac{s+t}{2}\Big)\Big) + \mathrm{Var}(\hat\sigma^2)\Big].
\]
Since $E(\hat\sigma^2 - \sigma^2)^2$ has the rate given by (33) (Corollary 4.1), we only need to provide bounds for $(1 - W_{\tilde h_n}(s,t))\,\mathrm{Var}(\tilde C(s,t))$ and $W_{\tilde h_n}(s,t)\,\mathrm{Var}(\hat C_*(\frac{s+t}{2}))$. We state these in the following propositions.

Proposition 7.3.
\[
\big(1 - W_{\tilde h_n}(s,t)\big)\,\mathrm{Var}(\tilde C(s,t)) = O\Big(\frac{1}{n}\Big) + \Big(\frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big)\,O(1) + O\Big(\max\Big\{\frac{1}{n h_n^2 m^2},\, \frac{1}{n h_n m}\Big\}\Big). \qquad (66)
\]

Proposition 7.4.
\[
W_{\tilde h_n}(s,t)\,\mathrm{Var}\Big(\hat C_*\Big(\frac{s+t}{2}\Big)\Big) = O\Big(\frac{1}{n}\Big) + \Big(\frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big)\,O(1) + O\Big(\frac{1}{n h_n m}\Big). \qquad (67)
\]
The proof of (34) is completed by combining Propositions 7.3 and 7.4 with Corollary 4.1.
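Before turning to the remaining proofs, it may help to see the quadratic-form tail bounds of Lemmas 7.2-7.3 (used heavily in the proof of Proposition 5.2 below) in action. The following simulation uses arbitrary illustrative choices ($n = 100$, $\Phi = I$, $L = 1$, $t = 0.2$, $\delta = 0.5$, none taken from the paper) and checks that the empirical tail probability of $|X^T Y|/n$ stays below the bound of Lemma 7.2:

```python
import numpy as np

# Empirical check of the bilinear-form tail bound of Lemma 7.2 with Phi = I,
# so that ||Phi|| = 1 = L and the event {||Phi|| <= L} always holds.
rng = np.random.default_rng(2)
n, L, t, delta = 100, 1.0, 0.2, 0.5
assert t < delta / (1 - delta) * L          # t lies in the admissible range

reps = 20_000
X = rng.standard_normal((reps, n))
Y = rng.standard_normal((reps, n))
tail_emp = np.mean(np.abs((X * Y).sum(axis=1)) / n > t)

bound = 2 * np.exp(-(1 - delta) * n * t**2 / (2 * L**2))
# The empirical tail probability is far below the exponential bound.
assert tail_emp < bound
```

Here the bound is $2e^{-1} \approx 0.74$, deliberately loose; $X^T Y/n$ has standard deviation $n^{-1/2} = 0.1$, so the empirical tail at $t = 0.2$ is roughly a two-sigma event.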
Proof of Corollary 4.1

First observe that
\[
E(\hat\sigma^2 - \sigma^2)^2 = \frac{1}{(T_1 - T_0)^2}\int_{T_0}^{T_1}\int_{T_0}^{T_1}E\big[(\hat C_*(t) - \sigma^2 - \hat C_0(t))(\hat C_*(s) - \sigma^2 - \hat C_0(s))\big]\,ds\,dt
\]
\[
\leq \sup_{t \in [T_0,T_1]} E\big(\hat C_*(t) - \sigma^2 - \hat C_0(t)\big)^2 \qquad \text{(by the Cauchy-Schwarz inequality)}
\]
\[
\leq 2\sup_{t \in [T_0,T_1]}\mathrm{Var}(\hat C_*(t)) + 2\sup_{t \in [T_0,T_1]}\mathrm{Var}(\hat C_0(t)) + \sup_{t \in [T_0,T_1]}\big(E(\hat C_*(t)) - \sigma^2 - E(\hat C_0(t))\big)^2. \qquad (68)
\]
By Propositions 7.3 and 7.4, and the definition (12) of $\hat C_0$, the sum of the first two terms on the RHS of (68) is bounded by
\[
O\Big(\frac{1}{n}\Big) + \Big(\frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big)\,O(1) + O\Big(\max\Big\{\frac{1}{n h_n^2 m^2},\, \frac{1}{n h_n m}\Big\}\Big).
\]
On the other hand, since for any bounded $u \in [A_1, A_2]$,
\[
\Big|\frac{1}{2}\big(C(t - h_n u,\, t + h_n u) + C(t + h_n u,\, t - h_n u)\big) - C(t,t)\Big| = O(h_n^2),
\]
uniformly in $t \in [T_0, T_1]$, it follows from Lemmas 7.5 and 7.6 (Appendix C) that the last term on the RHS of (68) is $O(h_n^4)$.

Proof of Proposition 5.2

Without loss of generality we take $g$ to be the uniform density on $[0,1]$. We need to consider two cases separately: (i) $|s - t| > \frac{Ah}{2}$ and (ii) $|s - t| \leq \frac{Ah}{2}$.

(i) $|s - t| > \frac{Ah}{2}$: In this case we have $\hat C_c(s,t) - E[\hat C_c(s,t)] = (1 - W_{Ah}(s,t))(\tilde C(s,t) - E[\tilde C(s,t)])$. Let
\[
B_i(s, T_{ij}) = \sum_{l=1}^{L_n}\tilde K_{s,l}(T_{ij})\,Q_h(s - s_l), \qquad 1 \leq j \leq m_i,\; 1 \leq i \leq n.
\]
Since $|\tilde K_{s,l}(T_{ij})| = O(h^{-1})$ and only finitely many summands are nonzero, there exists a constant $C_3 > 0$ such that
\[
\sup_{s \in [0,1]}\max_{1 \leq i \leq n}\max_{1 \leq j \leq m_i}|B_i(s, T_{ij})| \leq C_3 h^{-1}. \qquad (69)
\]
Note further that $B_i(s, T_{ij}) = 0$ if $|s - T_{ij}| > 2(B_K + C_Q)h$.
Next,
\[
\sum_{l=1}^{L_n}\tilde X_{i,l}(s)\,Q_h(s - s_l) = \frac{1}{m_i}\sum_{j=1}^{m_i}\big(X_i(T_{ij}) + \sigma\varepsilon_{ij}\big)B_i(s, T_{ij}) = \sum_{k=1}^M\sqrt{\lambda_k}\,\xi_{ik}\Big(\frac{1}{m_i}\sum_{j=1}^{m_i}\psi_k(T_{ij})B_i(s,T_{ij})\Big) + \sigma\,\frac{1}{m_i}\sum_{j=1}^{m_i}\varepsilon_{ij}B_i(s,T_{ij})
\]
\[
= \sum_{k=1}^M\sqrt{\lambda_k}\,\xi_{ik}\,B^1_{i,k}(s) + \sigma\,\frac{1}{m_i}\sum_{j=1}^{m_i}\varepsilon_{ij}B_i(s,T_{ij}),
\]
where $B^1_{i,k}(s) := \frac{1}{m_i}\sum_{j=1}^{m_i}\psi_k(T_{ij})B_i(s,T_{ij})$. By (69), there exists $C_4 > 0$ such that
\[
\sup_{s \in [0,1]}\max_{1 \leq k \leq M}\max_{1 \leq i \leq n}|B^1_{i,k}(s)| \leq C_4 h^{-1}. \qquad (70)
\]
Also, since $A > 4(B_K + C_Q)$ and $|s - t| \geq \frac{Ah}{2}$, it follows that $B_i(s,T_{ij})B_i(t,T_{ij}) = 0$. Moreover, $B_i(s,T_{ij})B_i(t,T_{ij'}) \neq 0$ only if $1 \leq j \neq j' \leq m_i$ are such that $|s - T_{ij}| \leq 2(B_K + C_Q)h$ and $|t - T_{ij'}| \leq 2(B_K + C_Q)h$. This implies that $P_g(B^1_{i,k}(s)B^1_{i,k'}(t) \neq 0) \leq C_5\, m_i(m_i - 1)\,h^2$ for some $C_5 := C_5(A) > 0$. Furthermore, for each $k = 1,\ldots,M$, the $\{B^1_{i,k}(s)\}_{i=1}^n$ are independent, and these random variables are independent of $\{\xi_{ik} : 1 \leq k \leq M\}_{i=1}^n$ and $\{\varepsilon_{ij} : 1 \leq j \leq m_i\}_{i=1}^n$. Then we can express $\tilde C(s,t) - E[\tilde C(s,t)]$ as
\[
\tilde C(s,t) - E[\tilde C(s,t)] = \sum_{1 \leq k \neq k' \leq M}\sqrt{\lambda_k\lambda_{k'}}\,\frac{1}{n}\sum_{i=1}^n\xi_{ik}\xi_{ik'}\,w(m_i)\,B^1_{i,k}(s)B^1_{i,k'}(t) + \sum_{k=1}^M\lambda_k\,\frac{1}{n}\sum_{i=1}^n(\xi_{ik}^2 - 1)\,w(m_i)\,B^1_{i,k}(s)B^1_{i,k}(t)
\]
\[
+ \sum_{k=1}^M\lambda_k\,\frac{1}{n}\sum_{i=1}^n w(m_i)\big(B^1_{i,k}(s)B^1_{i,k}(t) - E(B^1_{i,k}(s)B^1_{i,k}(t))\big) + \sigma\sum_{k=1}^M\sqrt{\lambda_k}\,\frac{1}{n}\sum_{i=1}^n\frac{w(m_i)}{m_i}\sum_{j=1}^{m_i}\xi_{ik}\varepsilon_{ij}\big(B^1_{i,k}(s)B_i(t,T_{ij}) + B^1_{i,k}(t)B_i(s,T_{ij})\big)
\]
\[
+ \sigma^2\,\frac{1}{n}\sum_{i=1}^n\frac{w(m_i)}{m_i^2}\sum_{j \neq j'}^{m_i}\varepsilon_{ij}\varepsilon_{ij'}\,B_i(s,T_{ij})B_i(t,T_{ij'}) + \sigma^2\,\frac{1}{n}\sum_{i=1}^n\frac{w(m_i)}{m_i^2}\sum_{j=1}^{m_i}(\varepsilon_{ij}^2 - 1)\,B_i(s,T_{ij})B_i(t,T_{ij})
\]
\[
+ \sigma^2\,\frac{1}{n}\sum_{i=1}^n\frac{w(m_i)}{m_i^2}\sum_{j=1}^{m_i}\big(B_i(s,T_{ij})B_i(t,T_{ij}) - E(B_i(s,T_{ij})B_i(t,T_{ij}))\big).
\]
(71)

The last two terms in the above expression vanish, since $|s - t| > 4(B_K + C_Q)h$. Note that $\max_{1 \leq i \leq n}w(m_i)$ is bounded. By (70), $|B^1_{i,k}(s)B^1_{i,k}(t)| \leq C_4^2 h^{-2}$ for $k = 1,\ldots,M$, and for all $k, k'$,
\[
\max_{1 \leq i \leq n}\mathrm{Var}\big(B^1_{i,k}(s)B^1_{i,k'}(t)\big) \leq C_6\max\big\{(mh)^{-2},\, (mh)^{-1}\big\} \quad \text{for some } C_6 = C_6(A) > 0 \qquad (72)
\]
(see Appendix F). Thus, by Bernstein's inequality and the condition $m^2 = o(nh^2/\log n)$, given $\eta > 0$ there exists $c_{1,\eta} > 0$ such that for sufficiently large $n$ (so that the bound in (72) is $O((mh)^{-2})$),
\[
P_g\Big(\max_{k=1,\ldots,M}\Big|\frac{1}{n}\sum_{i=1}^n w(m_i)\big(B^1_{i,k}(s)B^1_{i,k}(t) - E(B^1_{i,k}(s)B^1_{i,k}(t))\big)\Big| > c_{1,\eta}\sqrt{\frac{\log n}{n h^2 m^2}}\Big) \leq n^{-\eta}. \qquad (73)
\]
Next, let $\mathcal{A}$ be the set of indices $i$ such that $B^1_{i,k}(s)B^1_{i,k'}(t) \neq 0$ for some $k, k'$, and let $N_n = |\mathcal{A}|$. Since for any $k, k'$ we have $P(B^1_{i,k}(s)B^1_{i,k'}(t) \neq 0) \leq C_5 m^2 h^2$, it follows by another application of Bernstein's inequality that there exist a set $D_n$ (in the sigma-field generated by $\{T_{ij}\}$) and a constant $c_{2,\eta} > 0$ such that $D_n = \{N_n \leq c_{2,\eta}\,n m^2 h^2\}$ and $P(D_n) \geq 1 - n^{-\eta}$. Therefore we can restrict our attention to the set $D_n$. Conditioning on $T$, we can express $\xi_{\mathcal{A},k} = (\xi_{ik})_{i \in \mathcal{A}}$ as $\xi_{\mathcal{A},k} := (R_{\mathcal{A}\mathcal{A}})^{1/2}\bar\xi_{\mathcal{A},k}$, where the random vectors $\bar\xi_{\mathcal{A},k}$ have $N_{N_n}(0, I)$ distributions and are independent for different $k$. Then we can write (conditionally on $T$)
\[
\sum_{i=1}^n\xi_{ik}\xi_{ik'}\,w(m_i)\,B^1_{i,k}(s)B^1_{i,k'}(t) = \bar\xi_{\mathcal{A},k}^T\,\Phi(T)\,\bar\xi_{\mathcal{A},k'}, \quad \text{where } \Phi(T) = (R_{\mathcal{A}\mathcal{A}})^{1/2}\,\mathrm{diag}\big(w(m_i)B^1_{i,k}(s)B^1_{i,k'}(t)\big)_{i \in \mathcal{A}}\,(R_{\mathcal{A}\mathcal{A}})^{1/2}.
\]
Observe that, by (70) and condition C3, we have $\|\Phi(T)\| \leq C_4\,\kappa_n\,h^{-2}$.
Therefore, by an application of Lemma 7.2, we have, for some $c_{3,\eta} > 0$,
\[
P\Big(\Big|\frac{1}{n}\sum_{i=1}^n\xi_{ik}\xi_{ik'}\,w(m_i)\,B^1_{i,k}(s)B^1_{i,k'}(t)\Big| > c_{3,\eta}\,m\kappa_n\sqrt{\frac{\log n}{n h^2}},\; D_n\Big) \leq n^{-\eta}.
\]
Very similar arguments can be used to obtain bounds of order $m\kappa_n\sqrt{\log n/(nh^2)}$ (holding with probability at least $1 - O(n^{-\eta})$, for any given $\eta > 0$) for the second, fourth and fifth terms on the RHS of (71). Thus, by the conditions on $\kappa_n$ and $h_n$, we have, for some constant $c_{4,\eta} > 0$,
\[
P\Big(\big|\big(1 - W_{Ah}(s,t)\big)\big(\hat C_c(s,t) - E(\hat C_c(s,t))\big)\big| > c_{4,\eta}\,m\kappa_n\sqrt{\frac{\log n}{n h^2}}\Big) \leq n^{-\eta}. \qquad (74)
\]
(ii) $|s - t| \leq \frac{Ah}{2}$: In this case we have $\hat C_c(s,t) - E[\hat C_c(s,t)] = W_{Ah}(s,t)\big(\hat C_*(\frac{s+t}{2}) - E[\hat C_*(\frac{s+t}{2})]\big)$ (ignoring the maximum over $h_n^2$ in the definition). Then similar (but somewhat simpler) arguments, now involving Lemma 7.3, show that for some $c_{5,\eta} > 0$,
\[
P\Big(\Big|W_{Ah}(s,t)\Big(\hat C_*\Big(\frac{s+t}{2}\Big) - E\Big[\hat C_*\Big(\frac{s+t}{2}\Big)\Big]\Big)\Big| > c_{5,\eta}\,m\kappa_n\sqrt{\frac{\log n}{n h^2}}\Big) \leq n^{-\eta}. \qquad (75)
\]
Combining (74) and (75), we obtain the result.

Appendix F

Details of the computation of $G^Q_j(\cdot)$

We want to give the explicit functional form of $G^Q_j(y)$ for $j = 0, 1$ and any $y \in \mathbb{R}$. Let
\[
B_1(x) = x^3/6, \quad B_2(x) = (-3x^3 + 3x^2 + 3x + 1)/6, \quad B_3(x) = (3x^3 - 6x^2 + 4)/6, \quad B_4(x) = (1 - x)^3/6.
\]
Then the centered version of the cubic B-spline $Q$ has the form
\[
Q(x) = \begin{cases}
B_1(x+2) & \text{for } -2 \leq x \leq -1 \\
B_2(x+1) & \text{for } -1 \leq x \leq 0 \\
B_3(x) & \text{for } 0 \leq x \leq 1 \\
B_4(x-1) & \text{for } 1 \leq x \leq 2 \\
0 & \text{otherwise}
\end{cases}
= \begin{cases}
\frac{1}{6}(2+x)^3 & \text{for } -2 \leq x \leq -1 \\
\frac{1}{6}(-3x^3 - 6x^2 + 4) & \text{for } -1 \leq x \leq 0 \\
\frac{1}{6}(3x^3 - 6x^2 + 4) & \text{for } 0 \leq x \leq 1 \\
\frac{1}{6}(2-x)^3 & \text{for } 1 \leq x \leq 2 \\
0 & \text{otherwise.}
\end{cases}
\]
Note that $G^Q_j(y)$ can then be computed by utilizing the fact that, for $j = 0, 1$,
\[
G^Q_j(y) = \int_{-2}^{(y + \frac{A}{2})\wedge 2}x^j Q(x)\,dx - \int_{-2}^{(y - \frac{A}{2})\wedge 2}x^j Q(x)\,dx,
\]
where the integrals on the right-hand side are defined to be zero if the corresponding upper limits are less than $-2$. The integrals on the RHS of the above equation can be computed from the representation of $Q(\cdot)$ as follows:
\[
\int_{-2}^b Q(x)\,dx = \frac{1}{24}(2+b)^4, \quad -2 \leq b \leq -1; \qquad
\int_{-1}^b Q(x)\,dx = \frac{1}{24}(-3b^4 - 8b^3 + 16b + 11), \quad -1 \leq b \leq 0;
\]
\[
\int_0^b Q(x)\,dx = \frac{1}{24}(3b^4 - 8b^3 + 16b), \quad 0 \leq b \leq 1; \qquad
\int_1^b Q(x)\,dx = \frac{1}{24}\big(1 - (2-b)^4\big), \quad 1 \leq b \leq 2;
\]
\[
\int_{-2}^b xQ(x)\,dx = \frac{1}{30}(2+b)^5 - \frac{1}{12}(2+b)^4, \quad -2 \leq b \leq -1; \qquad
\int_{-1}^b xQ(x)\,dx = \frac{1}{60}(-6b^5 - 15b^4 + 20b^2 - 11), \quad -1 \leq b \leq 0;
\]
\[
\int_0^b xQ(x)\,dx = \frac{1}{60}(6b^5 - 15b^4 + 20b^2), \quad 0 \leq b \leq 1; \qquad
\int_1^b xQ(x)\,dx = \frac{1}{30}(2-b)^5 - \frac{1}{12}(2-b)^4 + \frac{1}{20}, \quad 1 \leq b \leq 2.
\]

Details of the calculation of the pointwise bias

Performing Taylor series expansions around $(s,t)$, we get
\[
g(s_l + xh) = g(s) + h\Big(\frac{s_l - s}{h} + x\Big)g'(s) + \frac{h^2}{2}\Big(\frac{s_l - s}{h} + x\Big)^2 g''(s) + O\Big(\Big(\Big|\frac{s - s_l}{h}\Big|^{2+\alpha} + |x|^{2+\alpha}\Big)h^{2+\alpha}\Big),
\]
\[
g(s_{l'} + yh) = g(t) + h\Big(\frac{s_{l'} - t}{h} + y\Big)g'(t) + \frac{h^2}{2}\Big(\frac{s_{l'} - t}{h} + y\Big)^2 g''(t) + O\Big(\Big(\Big|\frac{t - s_{l'}}{h}\Big|^{2+\alpha} + |y|^{2+\alpha}\Big)h^{2+\alpha}\Big), \qquad (76)
\]
and
\[
C(s_l + xh,\, s_{l'} + yh) = C(s,t) + h\Big(\frac{s_l - s}{h} + x,\; \frac{s_{l'} - t}{h} + y\Big)\begin{pmatrix} C_s(s,t) \\ C_t(s,t) \end{pmatrix}
\]
\[
+ \frac{h^2}{2}\Big(\frac{s_l - s}{h} + x,\; \frac{s_{l'} - t}{h} + y\Big)\begin{pmatrix} C_{ss} & C_{st} \\ C_{ts} & C_{tt} \end{pmatrix}\begin{pmatrix}\frac{s_l - s}{h} + x \\ \frac{s_{l'} - t}{h} + y\end{pmatrix} + O\Big(\Big(\Big|\frac{s - s_l}{h}\Big|^{2+\alpha} + \Big|\frac{t - s_{l'}}{h}\Big|^{2+\alpha} + |x|^{2+\alpha} + |y|^{2+\alpha}\Big)h^{2+\alpha}\Big). \qquad (77)
\]
First we consider the off-diagonal terms, i.e., we compute $E\tilde C(s,t)$ for $|s - t| > 2Ah$.

• $h^0$ terms: Since $\int K(x)\,dx = 1$ and $\int K'(x)\,dx = 0$,
\[
\iint C(s,t)\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy = C(s,t).
\]
(78)

• $h^1$ terms: Since $\int xK'(-x)\,dx = 1$, $\int xK(x)\,dx = 0$ and $\int K(x)\,dx = 1$,
\[
\iint h\Big[\Big(\frac{s_l - s}{h} + x\Big)C_s + \Big(\frac{s_{l'} - t}{h} + y\Big)C_t\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy = 0, \qquad (79)
\]
and
\[
\iint h\,C(s,t)\Big[g(s)g'(t)\Big(\frac{s_{l'} - t}{h} + y\Big) + g'(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy = 0. \qquad (80)
\]

• $h^2$ terms: Since $\int x^2K'(-x)\,dx = 0$, $\int xK'(-x)\,dx = 1$, $\int xK(x)\,dx = 0$ and $\int K(x)\,dx = 1$,
\[
\frac{h^2}{2}C(s,t)\iint\Big[g''(t)g(s)\Big(\frac{s_{l'} - t}{h} + y\Big)^2 + g''(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)^2\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy
\]
\[
= \frac{h^2}{2}C(s,t)\Big[g''(t)g(s)\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big) + g''(s)g(t)\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big)\Big];
\]
\[
h^2 C(s,t)\iint\Big(\frac{s_l - s}{h} + x\Big)\Big(\frac{s_{l'} - t}{h} + y\Big)g'(s)g'(t)\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy = 0;
\]
\[
h^2\iint\Big[\Big(\frac{s_l - s}{h} + x\Big)C_s + \Big(\frac{s_{l'} - t}{h} + y\Big)C_t\Big]\Big[g(s)g'(t)\Big(\frac{s_{l'} - t}{h} + y\Big) + g'(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy
\]
\[
= h^2\Big[C_s\,g'(s)g(t)\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) + C_t\,g(s)g'(t)\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big)\Big];
\]
\[
\frac{h^2}{2}\iint\Big[\Big(\frac{s_l - s}{h} + x\Big)^2 C_{ss} + 2\Big(\frac{s_l - s}{h} + x\Big)\Big(\frac{s_{l'} - t}{h} + y\Big)C_{st} + \Big(\frac{s_{l'} - t}{h} + y\Big)^2 C_{tt}\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy
\]
\[
= \frac{h^2}{2}\Big[C_{ss}\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) + C_{tt}\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big)\Big].
\]
In summary, the $h^2$ term in the expansion is
\[
h^2\Big[\frac{1}{2}g''(s)g(t)\,C + g'(s)g(t)\,C_s + \frac{1}{2}C_{ss}\Big]\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) + h^2\Big[\frac{1}{2}g(s)g''(t)\,C + g(s)g'(t)\,C_t + \frac{1}{2}C_{tt}\Big]\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big).
\]
(81)

Proof of Lemma 7.5: Combining (78), (79), (80) and (81), and using (76), (77) and the fact that $\sum_{l=1}^{L_n} \bigl|\frac{s - s_l}{h}\bigr|^{\beta}\, Q_h(s - s_l) < \infty$, after some algebra, we obtain (50).

Combined bound on $E\|H_\nu \tilde C_c \psi_\nu\|_2^2$

We put the different pieces derived in Appendix D together to obtain a bound on $E\|H_\nu \tilde C_c \psi_\nu\|_2^2$. For ease of notation, we denote by $H_\nu \equiv H_\nu(x,s_1,s_2,t_1,t_2)$ the integral operator with kernel $H_\nu(x,s_1)\, H_\nu(x,s_2)\, \psi_\nu(t_1)\, \psi_\nu(t_2)$. Then, with $r_1, r_2$ taking values 0 or 1,
\[
\int\!\cdots\!\int H_\nu(x,s_1,s_2,t_1,t_2)\,(C(s_1,t_1))^{r_1}\,(C(s_2,t_2))^{r_2}\,ds_1\,ds_2\,dt_1\,dt_2\,dx = 0, \tag{82}
\]
\[
\int\!\cdots\!\int H_\nu(x,s_1,s_2,t_1,t_2)\,(C(s_1,t_2))^{r_1}\,(C(s_2,t_1))^{r_2}\,ds_1\,ds_2\,dt_1\,dt_2\,dx = 0, \tag{83}
\]
\[
\int\!\cdots\!\int H_\nu(x,s_1,s_2,t_1,t_2)\,(C(s_1,s_2))^{r_1}\,(C(t_1,t_2))^{r_2}\,ds_1\,ds_2\,dt_1\,dt_2\,dx = \lambda_\nu^{r_2}\Bigl(\sum_{1 \le k \ne \nu \le M} \frac{\lambda_k}{(\lambda_k - \lambda_\nu)^2}\Bigr)^{r_1}. \tag{84}
\]
Implicitly using (130)-(132), we also have the bound
\[
\Bigl|\int\!\cdots\!\int H_\nu(x,s_1,s_2,t_1,t_2)\, R(s_1,s_2,t_1,t_2)\,ds_1\,ds_2\,dt_1\,dt_2\,dx\Bigr| = O(\|R\|_\infty). \tag{85}
\]
From Proposition 7.1, the total contribution in (60) of the first terms on the RHS of (61) and (62) becomes
\[
\Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{4 m_i - 6}{m_i(m_i - 1)}\Bigr) + \frac{n-1}{n}\Bigr) \int\Bigl(\int\!\!\int H_\nu(x,s)\, W_{\tilde h_n}(s,t)\,[C(s,t) + O(h_n^2)]\,\psi_\nu(t)\,ds\,dt\Bigr)^2 dx
\]
\[
+ \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{4 m_i - 6}{m_i(m_i - 1)}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2} \rho^2_{i_1 i_2}\Bigr) \int\!\cdots\!\int H_\nu(x,s_1)\, H_\nu(x,s_2)\, W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\,(C(s_1,s_2) + O(h_n^2))(C(t_1,t_2) + O(h_n^2))\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1\,ds_2\,dt_1\,dt_2\,dx
\]
\[
+ \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{4 m_i - 6}{m_i(m_i - 1)}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2} \rho^2_{i_1 i_2}\Bigr) \int\!\cdots\!\int H_\nu(x,s_1)\, H_\nu(x,s_2)\, W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\,(C(s_1,t_2) + O(h_n^2))(C(s_2,t_1) + O(h_n^2))\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1\,ds_2\,dt_1\,dt_2\,dx. \tag{86}
\]
Since $H_\nu C \psi_\nu \equiv 0$, it can be checked that the first integral in (86) is $O(h_n^2)$. On the other hand, from the definition of $W_{\tilde h_n}(s,t)$ and the fact that $H_\nu C \psi_\nu \equiv 0$, it follows that the last integral term is $O(h_n)$. Next, apply $H_\nu$ to the following functions: $W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\, D_2(s_1,s_2,t_1,t_2)$ and $2\, W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\, D_3(s_1,s_2,t_1,t_2)$, where $D_2(s_1,s_2,t_1,t_2)$ and $D_3(s_1,s_2,t_1,t_2)$ are the terms given by the sum of the first three terms on the RHS of (64) (including the isolated $O(h_n^2)$ term), and the sum of the first three terms on the RHS of (65), respectively. Then, adding these terms to (86), we have, by (82)-(85), (132) (for dealing with the isolated $O(h_n^2)$ term in (64)), and the comment following (86), that this sum equals
\[
R_1 = \frac{1}{n}\Bigl(\sum_{1 \le k \ne \nu \le M} \frac{\lambda_k \lambda_\nu}{(\lambda_k - \lambda_\nu)^2}\Bigr) + \Bigl(\frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\Bigl(\sum_{1 \le k \ne \nu \le M} \frac{\lambda_k \lambda_\nu}{(\lambda_k - \lambda_\nu)^2} + O(h_n)\Bigr) + O(h_n^4) + O\Bigl(\frac{1}{nm}\Bigr) + O\Bigl(\frac{h_n}{n}\Bigr).
\]
(87)

Next, for notational convenience, express the integral operator $H_\nu$ applied to $Z_j$ (where the $Z_j$ are as in Propositions 7.1-7.2) times $W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\, W_{\tilde h_n}(s_1,t_2)$ by $H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{s_1,t_2} Z_j$, etc. Using (130)-(135), and the bounds in Proposition 7.1 for $Z_j$, $j = 1,\ldots,4$, we have
\[
R_2 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_1 = H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{s_1,s_2} Z_1 = O\Bigl(\frac{1}{n h_n m}\Bigr),
\]
\[
R_3 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_2 = H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{s_1,t_2} Z_2 = O\Bigl(\frac{1}{nm}\Bigr),
\]
\[
R_4 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_3 = H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{s_2,t_1} Z_3 = O\Bigl(\frac{1}{nm}\Bigr),
\]
\[
R_5 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_4 = H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{t_1,t_2} Z_4 = O\Bigl(\frac{1}{nm}\Bigr).
\]
Using analogous reasoning, from Propositions 7.1 and 7.2 we also have
\[
R_6 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_5 = O\Bigl(\frac{1}{n h_n m^2}\Bigr), \qquad R_7 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_6 = O\Bigl(\frac{1}{n m^2}\Bigr),
\]
\[
R_8 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_7 = O\Bigl(\frac{h_n}{nm}\Bigr), \qquad R_9 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_8 = O\Bigl(\frac{1}{n h_n m}\Bigr), \qquad R_{10} := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_9 = O\Bigl(\frac{1}{nm}\Bigr).
\]
Hence, combining (87) with the bounds for $R_2$ to $R_{10}$, using the definitions of $E[\tilde C(s_1,t_1)\,\tilde C(s_2,t_2)]$, $E[\hat C(\frac{s_1+t_1}{2})\,\hat C(\frac{s_2+t_2}{2})]$ and $E[\tilde C(s_1,t_1)\,\hat C(\frac{s_2+t_2}{2})]$, and plugging everything back into (58), we complete the proof of Proposition 5.1. The details of the key steps in this derivation are given below.

Proof details for Proposition 5.1

Proof of Proposition 7.1: We need to deal with terms of the form
\[
\tilde E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2) := E\bigl[Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2}\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i_1 j'_1})\, \tilde K_{s_2,l_2}(T_{i_2 j_2})\, \tilde K_{t_2,l'_2}(T_{i_2 j'_2})\bigr],
\]
for $1 \le j_1, j'_1 \le m_{i_1}$, $1 \le j_2, j'_2 \le m_{i_2}$, $1 \le i_1, i_2 \le n$.
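The analysis that follows splits the sum over the index quadruples $(j_1, j'_1, j_2, j'_2)$ into coincidence patterns: all four distinct, six single-coincidence patterns, three two-pair patterns, four triple-coincidence patterns, and all four equal. A minimal sketch (not part of the paper; the value of $m$ is an arbitrary small illustration) that verifies this bookkeeping by brute-force enumeration:

```python
from itertools import product

def pattern(j1, jp1, j2, jp2):
    # Canonical coincidence pattern of the quadruple, e.g. (0, 0, 1, 2)
    # means the first two indices coincide and the rest are distinct.
    first, out = {}, []
    for v in (j1, jp1, j2, jp2):
        if v not in first:
            first[v] = len(first)
        out.append(first[v])
    return tuple(out)

m = 5  # illustrative number of measurements per curve
counts = {}
for t in product(range(m), repeat=4):
    p = pattern(*t)
    counts[p] = counts.get(p, 0) + 1

all_distinct = counts[(0, 1, 2, 3)]
one_pair  = sum(v for k, v in counts.items() if len(set(k)) == 3)
two_pairs = sum(v for k, v in counts.items()
                if sorted(k.count(i) for i in set(k)) == [2, 2])
triples   = sum(v for k, v in counts.items()
                if sorted(k.count(i) for i in set(k)) == [1, 3])
all_equal = counts[(0, 0, 0, 0)]

assert all_distinct == m * (m - 1) * (m - 2) * (m - 3)
assert one_pair  == 6 * m * (m - 1) * (m - 2)  # six single-coincidence patterns
assert two_pairs == 3 * m * (m - 1)            # three two-pair patterns
assert triples   == 4 * m * (m - 1)            # four triple patterns
assert all_equal == m
assert all_distinct + one_pair + two_pairs + triples + all_equal == m ** 4
```

The last assertion checks that the fifteen patterns together exhaust all $m^4$ quadruples, which is exactly the completeness of the case decomposition used below.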
For computational convenience, we also define
\[
E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2) = \tilde E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)\, Q_h(s_1 - s_{l_1})\, Q_h(t_1 - s_{l'_1})\, Q_h(s_2 - s_{l_2})\, Q_h(t_2 - s_{l'_2}). \tag{88}
\]
First, consider the case $i_1 = i_2 = i$, say. Then, using $\star$ to denote $E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)$, we have
\[
E[U_i(s_1,t_1)\, U_i(s_2,t_2)] = \frac{1}{m_i^4} \sum_{j_1 \ne j'_1 \ne j_2 \ne j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star
\]
\[
+ \frac{1}{m_i^4}\Bigl(\sum_{j_1 = j'_1 \ne j_2 \ne j'_2}^{m_i} + \sum_{j_1 = j_2 \ne j'_1 \ne j'_2}^{m_i} + \sum_{j_1 = j'_2 \ne j'_1 \ne j_2}^{m_i} + \sum_{j_1 \ne j'_1 \ne j_2 = j'_2}^{m_i} + \sum_{j_1 \ne j'_1 = j_2 \ne j'_2}^{m_i} + \sum_{j_1 \ne j_2 \ne j'_1 = j'_2}^{m_i}\Bigr) \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star
\]
\[
+ \frac{1}{m_i^4}\Bigl(\sum_{j_1 = j'_1 \ne j_2 = j'_2}^{m_i} + \sum_{j_1 = j_2 \ne j'_1 = j'_2}^{m_i} + \sum_{j_1 = j'_2 \ne j'_1 = j_2}^{m_i}\Bigr) \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star
\]
\[
+ \frac{1}{m_i^4}\Bigl(\sum_{j_1 = j'_1 = j_2 \ne j'_2}^{m_i} + \sum_{j_1 = j'_1 = j'_2 \ne j_2}^{m_i} + \sum_{j_1 = j_2 = j'_2 \ne j'_1}^{m_i} + \sum_{j_1 \ne j'_1 = j_2 = j'_2}^{m_i}\Bigr) \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star
\]
\[
+ \frac{1}{m_i^4} \sum_{j_1 = j'_1 = j_2 = j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star. \tag{89}
\]
Next, consider the case $i_1 \ne i_2$. Then, with $\star$ denoting $E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)$,
\[
E[U_{i_1}(s_1,t_1)\, U_{i_2}(s_2,t_2)] = \frac{1}{m_{i_1}^2 m_{i_2}^2} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2 \ne j'_2}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star + \frac{1}{m_{i_1}^2 m_{i_2}^2}\Bigl(\sum_{j_1 = j'_1}^{m_{i_1}} \sum_{j_2 \ne j'_2}^{m_{i_2}} + \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2 = j'_2}^{m_{i_2}}\Bigr) \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star + \frac{1}{m_{i_1}^2 m_{i_2}^2} \sum_{j_1 = j'_1}^{m_{i_1}} \sum_{j_2 = j'_2}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star.
\]
(90)

Note that, for all $i_1, i_2$, if either $j_1 = j'_1$ or $j_2 = j'_2$, then
\[
\sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2) = 0,
\]
unless $|s_1 - t_1| \le \frac{Ah}{2}$, or $|s_2 - t_2| \le \frac{Ah}{2}$, respectively, for $A$ satisfying $A \ge 4(B_K + C_Q)$ and $\tilde h_n = A h_n$. This can be verified by using the definition of $E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)$, equations (113), (114), (116)-(119), and arguing as in the analysis of the term (48). Therefore, since $1_{\{|s_k - t_k| \le \frac{Ah}{2}\}}\, W_{\tilde h_n}(s_k,t_k) = 0$ for $k = 1, 2$, the sums corresponding to either $j_1 = j'_1$ or $j_2 = j'_2$ in (89) and (90) do not contribute anything to (60). Thus, when $i_1 \ne i_2$, the only sum that contributes to (60) corresponds to $j_1 \ne j'_1$, $j_2 \ne j'_2$. When $i_1 = i_2 = i$, the sums that contribute to (60) are the ones corresponding to $j_1 \ne j'_1 \ne j_2 \ne j'_2$, $j_1 = j_2 \ne j'_1 \ne j'_2$, $j_1 = j'_2 \ne j'_1 \ne j_2$, $j_1 \ne j'_1 = j_2 \ne j'_2$, $j_1 \ne j_2 \ne j'_1 = j'_2$, $j_1 = j_2 \ne j'_1 = j'_2$, and $j_1 = j'_2 \ne j'_1 = j_2$. We consider these cases one by one.

Lemma 7.7. If $i_1 = i_2$, $j_1 \ne j'_1 \ne j_2 \ne j'_2$; or $i_1 \ne i_2$, $j_1 \ne j'_1$, $j_2 \ne j'_2$, then
\[
\sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = (C(s_1,t_1) + O(h^2))(C(s_2,t_2) + O(h^2))
\]
\[
+\ \rho^2_{i_1 i_2}\bigl[(C(s_1,s_2) + O(h^2))(C(t_1,t_2) + O(h^2)) + (C(s_1,t_2) + O(h^2))(C(s_2,t_1) + O(h^2))\bigr], \tag{91}
\]
where the $O(h^2)$ terms are uniform in $s_1, t_1, s_2, t_2 \in [0,1]$.

The following lemma gives an expression and the corresponding bound for the term $Z_1$.

Lemma 7.8.
If $i_1 = i_2 = i$, $j_1 = j_2 \ne j'_1 \ne j'_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 = j_2 \ne j'_1 \ne j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |s_1 - s_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{92}
\]
The following lemma gives expressions and the corresponding bounds for the terms $Z_2$, $Z_3$ and $Z_4$.

Lemma 7.9. If $i_1 = i_2 = i$ and $j_1 = j'_2 \ne j'_1 \ne j_2$; $j_1 \ne j'_1 = j_2 \ne j'_2$; $j_1 \ne j_2 \ne j'_1 = j'_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 = j'_2 \ne j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |s_1 - t_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise;} \end{cases} \tag{93}
\]
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 \ne j'_1 = j_2 \ne j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |t_1 - s_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise;} \end{cases} \tag{94}
\]
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 \ne j_2 \ne j'_1 = j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |t_1 - t_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{95}
\]
The following lemma gives expressions and the corresponding bounds for the terms $Z_5$ and $Z_6$.

Lemma 7.10.
If $i_1 = i_2 = i$ and $j_1 = j_2 \ne j'_1 = j'_2$, $j_1 = j'_2 \ne j'_1 = j_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 = j_2 \ne j'_1 = j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nh^2 m^2}\bigr) & \text{if } \max\{|s_1 - s_2|, |t_1 - t_2|\} \le \frac{Ah}{2}, \\ O\bigl(\frac{1}{nh m^2}\bigr) & \text{if } |s_1 - s_2| \le \frac{Ah}{2} \text{ and } |t_1 - t_2| > \frac{Ah}{2}, \\ O\bigl(\frac{1}{nh m^2}\bigr) & \text{if } |s_1 - s_2| > \frac{Ah}{2} \text{ and } |t_1 - t_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise;} \end{cases} \tag{96}
\]
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 = j'_2 \ne j'_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nh^2 m^2}\bigr) & \text{if } \max\{|s_1 - t_2|, |s_2 - t_1|\} \le \frac{Ah}{2}, \\ O\bigl(\frac{1}{nh m^2}\bigr) & \text{if } |s_1 - t_2| \le \frac{Ah}{2} \text{ and } |s_2 - t_1| > \frac{Ah}{2}, \\ O\bigl(\frac{1}{nh m^2}\bigr) & \text{if } |s_1 - t_2| > \frac{Ah}{2} \text{ and } |s_2 - t_1| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{97}
\]
Proof of Proposition 7.2: Define
\[
F_{i_1 i_2;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1, l_2) := E\bigl[Y^2_{i_1 j_1} Y^2_{i_2 j_2}\, \tilde K_{(s_1+t_1)/2,\, l_1}(T_{i_1 j_1})\, \tilde K_{(s_2+t_2)/2,\, l_2}(T_{i_2 j_2})\bigr]\, Q_h\Bigl(\frac{s_1+t_1}{2} - s_{l_1}\Bigr)\, Q_h\Bigl(\frac{s_2+t_2}{2} - s_{l_2}\Bigr) \tag{98}
\]
and
\[
G_{i_1 i_2;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1, l'_1, l_2) := E\bigl[Y_{i_1 j_1} Y_{i_1 j'_1} Y^2_{i_2 j_2}\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i_1 j'_1})\, \tilde K_{(s_2+t_2)/2,\, l_2}(T_{i_2 j_2})\bigr]\, Q_h(s_1 - s_{l_1})\, Q_h(t_1 - s_{l'_1})\, Q_h\Bigl(\frac{s_2+t_2}{2} - s_{l_2}\Bigr). \tag{99}
\]
First, if $i_1 = i_2 = i$ then, with $\star$ denoting $F_{ii;\, j_1 j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)$,
\[
E\Bigl[V_i\Bigl(\frac{s_1+t_1}{2}\Bigr)\, V_i\Bigl(\frac{s_2+t_2}{2}\Bigr)\Bigr] = \frac{1}{m_i^2} \sum_{j_1 \ne j_2}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^2} \sum_{j_1 = j_2}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star.
\]
(100)

Next, if $i_1 \ne i_2$ then, with $\star$ denoting $F_{i_1 i_2;\, j_1 j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)$,
\[
E\Bigl[V_{i_1}\Bigl(\frac{s_1+t_1}{2}\Bigr)\, V_{i_2}\Bigl(\frac{s_2+t_2}{2}\Bigr)\Bigr] = \frac{1}{m_{i_1} m_{i_2}} \sum_{j_1=1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star. \tag{101}
\]
Next, if $i_1 = i_2 = i$ then, with $\star$ denoting $G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)$,
\[
E\Bigl[U_i(s_1,t_1)\, V_i\Bigl(\frac{s_2+t_2}{2}\Bigr)\Bigr] = \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^3} \sum_{j_1 = j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^3} \sum_{j'_1 \ne j_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^3} \sum_{j_1 = j'_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star. \tag{102}
\]
Finally, if $i_1 \ne i_2$, then, with $\star$ denoting $G_{i_1 i_2;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)$,
\[
E\Bigl[U_{i_1}(s_1,t_1)\, V_{i_2}\Bigl(\frac{s_2+t_2}{2}\Bigr)\Bigr] = \frac{1}{m_{i_1}^2 m_{i_2}} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_{i_1}^2 m_{i_2}} \sum_{j_1 = j'_1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star. \tag{103}
\]
Arguments similar to those employed earlier show that the sums corresponding to $j_1 = j'_1$ in (102) and (103) do not contribute anything to $E\|H_\nu \hat C_c \psi_\nu\|^2$. We first consider $E(V_{i_1}(z_1)\, V_{i_2}(z_2))$. Lemmas 7.11 and 7.12, stated below, give expressions for the leading term and the term $Z_7$ (and the corresponding bound), respectively, in (64).

Lemma 7.11.
If $i_1 \ne i_2$, or $i_1 = i_2$ and $j_1 \ne j_2$, then for $|s_k - t_k| \le \frac{Ah}{2}$, $k = 1, 2$, with $A \ge 4(B_K + C_Q)$,
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^2} \sum_{j_1 \ne j_2}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{ii;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{1}{m_{i_1} m_{i_2}} \sum_{j_1=1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{i_1 i_2;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} - \sigma^2\bigl[E(\hat C^*(z_1)) + E(\hat C^*(z_2))\bigr] + \sigma^4
\]
\[
= C(s_1,t_1)\, C(s_2,t_2) - \Bigl(\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i}\Bigr)(C(s_1,t_1) + \sigma^2)(C(s_2,t_2) + \sigma^2) + O(h^2)
+ \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr). \tag{104}
\]

Lemma 7.12. If $i_1 = i_2 = i$, $j_1 = j_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^2} \sum_{j_1=1}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{ii;\, j_1, j_1}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |z_1 - z_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{105}
\]
Finally, consider the term $E(U_{i_1}(s_1,t_1)\, V_{i_2}(z_2))$. Lemma 7.13 gives an expression for the leading term in (65); Lemma 7.14 gives expressions and the corresponding bounds for the terms $Z_8$ and $Z_9$.

Lemma 7.13. If $i_1 \ne i_2$, $j_1 \ne j'_1$; or $i_1 = i_2$, $j_1 \ne j'_1 \ne j_2$, then for $|s_1 - t_1| > \frac{Ah}{2}$ and $|s_2 - t_2| \le \frac{Ah}{2}$, with $A \ge 4(B_K + C_Q)$,
\[
\frac{1}{n^2} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)}
+ \frac{1}{n^2} \sum_{i_1 \ne i_2}^n w(m_{i_1})\, \frac{1}{m_{i_1}^2 m_{i_2}} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{i_1 i_2;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} - \sigma^2\, E\tilde C(s_1,t_1)
\]
\[
= (C(s_1,t_1) + O(h^2))(C(s_2,t_2) + O(h^2)) - \Bigl(\frac{1}{n^2} \sum_{i=1}^n \frac{2}{m_i}\Bigr)(C(s_1,t_1) + O(h^2))(C(s_2,t_2) + \sigma^2 + O(h^2))
+ \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr). \tag{106}
\]

Lemma 7.14. If $i_1 = i_2 = i$ and $j'_1 \ne j_1 = j_2$, $j_1 \ne j'_1 = j_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^3} \sum_{j'_1 \ne j_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} = \begin{cases} O\bigl(\frac{1}{nh^2 m}\bigr) & \text{if } |s_1 - s_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise;} \end{cases} \tag{107}
\]
\[
\frac{1}{n^2} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} = \begin{cases} O\bigl(\frac{1}{nh^2 m}\bigr) & \text{if } |t_1 - s_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{108}
\]

Details of the calculation of pointwise variance (32)

Proof of Proposition 7.3: Consider first
\[
W_{\tilde h_n}(s,t)\,\mathrm{Var}(\tilde C(s,t)) = W_{\tilde h_n}(s,t)\, E\bigl[\tilde C(s,t)^2\bigr] - \bigl(E\bigl[W_{\tilde h_n}(s,t)\, \tilde C(s,t)\bigr]\bigr)^2.
\]
Using (59), (88) and the arguments leading to (91), we have
\[
W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{w(m_{i_1})\, w(m_{i_2})}{m_{i_1}^2 m_{i_2}^2} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2 \ne j'_2}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s,t,s,t;\, l_1,l'_1,l_2,l'_2)}{(g(s)\, g(t))^2} - \bigl(E\bigl[W_{\tilde h_n}(s,t)\, \tilde C(s,t)\bigr]\bigr)^2
\]
\[
= W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{w(m_{i_1})\, w(m_{i_2})}{m_{i_1}^2 m_{i_2}^2} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2 \ne j'_2}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E\bigl[C(T_{i_1 j_1}, T_{i_1 j'_1})\, \tilde K_{s,l_1}(T_{i_1 j_1})\, \tilde K_{t,l'_1}(T_{i_1 j'_1})\bigr]}{g(s)\, g(t)} \cdot \frac{E\bigl[C(T_{i_2 j_2}, T_{i_2 j'_2})\, \tilde K_{s,l_2}(T_{i_2 j_2})\, \tilde K_{t,l'_2}(T_{i_2 j'_2})\bigr]}{g(s)\, g(t)}
\]
\[
- W_{\tilde h_n}(s,t)\, \frac{1}{n^2}\Bigl(\sum_{i=1}^n \frac{w(m_i)}{m_i^2} \sum_{j_1 \ne j'_1}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \frac{E\bigl[C(T_{i j_1}, T_{i j'_1})\, \tilde K_{s,l_1}(T_{i j_1})\, \tilde K_{t,l'_1}(T_{i j'_1})\bigr]}{g(s)\, g(t)}\Bigr)^2
\]
\[
+ W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\bigl[(C(s,s) + O(h^2))(C(t,t) + O(h^2)) + (C(s,t) + O(h^2))^2\bigr]
\]
\[
= -W_{\tilde h_n}(s,t)\, \frac{1}{n}\,(C(s,t) + O(h^2))^2 + W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\bigl[(C(s,s) + O(h^2))(C(t,t) + O(h^2)) + (C(s,t) + O(h^2))^2\bigr]
\]
\[
= O\Bigl(\frac{1}{n}\Bigr) + \Bigl(\frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\, O(1). \tag{109}
\]
Combining (109) with (91) and (92)-(97), we obtain (66).
Proof of Proposition 7.4: Write
\[
W_{\tilde h_n}(s,t)\,\mathrm{Var}\Bigl(\hat C^*\Bigl(\frac{s+t}{2}\Bigr)\Bigr) = W_{\tilde h_n}(s,t)\, E\Bigl[\hat C^*\Bigl(\frac{s+t}{2}\Bigr)^2\Bigr] - \Bigl(E\Bigl[W_{\tilde h_n}(s,t)\, \hat C^*\Bigl(\frac{s+t}{2}\Bigr)\Bigr]\Bigr)^2,
\]
and observe that, by (63), (98) and (104), and following steps very similar to those leading to (109), we have
\[
W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{1}{m_{i_1} m_{i_2}} \sum_{j_1=1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{i_1 i_2;\, j_1, j_2}(s,t,s,t;\, l_1,l_2)}{g\bigl(\frac{s+t}{2}\bigr)^2} - \Bigl(E\Bigl[W_{\tilde h_n}(s,t)\, \hat C^*\Bigl(\frac{s+t}{2}\Bigr)\Bigr]\Bigr)^2
\]
\[
= -W_{\tilde h_n}(s,t)\, \frac{1}{n}\,(C(s,t) + \sigma^2 + O(h^2))^2 + W_{\tilde h_n}(s,t)\Bigl(\frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(2(C(s,t))^2 + O(h)\bigr) = O\Bigl(\frac{1}{n}\Bigr) + \Bigl(\frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\, O(1). \tag{110}
\]
Combining (110) with the steps leading to (104) and (105), we obtain (67).

Proofs of Lemmas 7.7 - 7.14

Proof of Lemma 7.7: Since $\rho_{ii} = 1$, from expressions (112) and (115) we can treat the terms corresponding to $i_1 = i_2 = i$ and $i_1 \ne i_2$ in a unified way. From (120) and (121), the expression (122) and the calculations leading to (50), (91) follows.

Proof of Lemma 7.8: It follows from (116), (123) and (128) (taking $s = s_1$, $s' = s_2$, $t = t_1$ and $t' = t_2$ in the latter).

Proof of Lemma 7.9: Follows by arguments analogous to those for deriving (92).

Proof of Lemma 7.10: Follows from (118), (123) and (126).

Proof of Lemma 7.11: By (114) and (118),
\[
E\bigl[Y^2_{i_1 j_1} Y^2_{i_2 j_2} \mid T_{i_1}, T_{i_2}\bigr] = (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)(C(T_{i_2 j_2}, T_{i_2 j_2}) + \sigma^2) + 2\rho^2_{i_1 i_2}\,(C(T_{i_1 j_1}, T_{i_2 j_2}))^2. \tag{111}
\]
The expression for $E\bigl[(C(T_{i_1 j_1}, T_{i_2 j_2}))^2\, \tilde K_{z_1,l_1}(T_{i_1 j_1})\, \tilde K_{z_2,l_2}(T_{i_2 j_2})\bigr]$ is given by
\[
\int\!\!\int (C(u,v))^2\, g(u)\, g(v)\, \tilde K_{z_1,l_1}(u)\, \tilde K_{z_2,l_2}(v)\,du\,dv,
\]
and it can be shown that when we sum over $l_1, l_2 = 1, \ldots, L_n$, the sum equals $(C(z_1,z_2))^2\, g(z_1)\, g(z_2) + O(h^2)$. From this, and the calculations leading to (51), we have, for $|s_k - t_k| \le \frac{Ah}{2}$, $k = 1, 2$, with $A \ge 4(B_K + C_Q)$,
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^2} \sum_{j_1 \ne j_2}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{ii;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{1}{m_{i_1} m_{i_2}} \sum_{j_1=1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{i_1 i_2;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} - \sigma^2\bigl[E(\hat C^*(z_1)) + E(\hat C^*(z_2))\bigr] + \sigma^4
\]
\[
= \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{n-1}{n}\Bigr)(C(z_1,z_1) + \sigma^2 + O(h^2))(C(z_2,z_2) + \sigma^2 + O(h^2)) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(2(C(z_1,z_2))^2 + O(h^2)\bigr) - \sigma^2\bigl(C(z_1,z_1) + C(z_2,z_2) + 2\sigma^2 + O(h^2)\bigr) + \sigma^4
\]
\[
= \Bigl(1 - \frac{1}{n^2}\sum_{i=1}^n \frac{1}{m_i}\Bigr)(C(s_1,t_1) + \sigma^2 + O(h^2))(C(s_2,t_2) + \sigma^2 + O(h^2)) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr) - \sigma^2(C(s_1,t_1) + C(s_2,t_2)) - \sigma^4 + O(h^2)
\]
\[
= C(s_1,t_1)\, C(s_2,t_2) - \Bigl(\frac{1}{n^2}\sum_{i=1}^n \frac{1}{m_i}\Bigr)(C(s_1,t_1) + \sigma^2)(C(s_2,t_2) + \sigma^2) + O(h^2) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr).
\]

Proof of Lemma 7.12: Note that $E(Y^4_{i j_1} \mid T_i) = 3(C(T_{i j_1}, T_{i j_1}) + \sigma^2)^2$. Thus, from (129), we have
\[
\sum_{l_1,l_2=1}^{L_n} E\bigl[E(Y^4_{i j_1} \mid T_i)\, \tilde K_{z_1,l_1}(T_{i j_1})\, \tilde K_{z_2,l_2}(T_{i j_1})\bigr]\, Q_h(z_1 - s_{l_1})\, Q_h(z_2 - s_{l_2}) = \begin{cases} O(h^{-1}) & \text{if } |z_1 - z_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise,} \end{cases}
\]
uniformly in $s_1, t_1, s_2, t_2 \in [0,1]$ and $1 \le j_1 \le m_i$, $1 \le i \le n$. Therefore (105) follows.
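The Gaussian fourth-moment identities used in the last two proofs, $E[X_1^2 X_2^2] = C_{11}C_{22} + 2C_{12}^2$ and $E(Y^4) = 3(C + \sigma^2)^2$, can be checked numerically. A minimal sketch, not part of the paper (the covariance values are arbitrary), using Gauss-Hermite quadrature:

```python
import numpy as np

# Gauss-Hermite nodes/weights for the physicists' weight exp(-x^2);
# rescaled so that E[f(Z)] = sum_i wn_i f(z_i) for Z ~ N(0, 1).
x, w = np.polynomial.hermite.hermgauss(20)
z = np.sqrt(2.0) * x
wn = w / np.sqrt(np.pi)

def expect2d(f, cov):
    # E[f(X1, X2)] for zero-mean Gaussian (X1, X2) with covariance `cov`,
    # via X = L @ Z (Cholesky) and tensor-product quadrature.
    L = np.linalg.cholesky(cov)
    total = 0.0
    for zi, wi in zip(z, wn):
        for zj, wj in zip(z, wn):
            x1 = L[0, 0] * zi
            x2 = L[1, 0] * zi + L[1, 1] * zj
            total += wi * wj * f(x1, x2)
    return total

c11, c12, c22, sigma2 = 1.3, 0.4, 0.9, 0.25
cov = np.array([[c11, c12], [c12, c22]])

# Wick pairing: E[X1^2 X2^2] = C11*C22 + 2*C12^2.
m22 = expect2d(lambda a, b: a * a * b * b, cov)
assert abs(m22 - (c11 * c22 + 2 * c12 ** 2)) < 1e-8

# With noise, Y = X1 + eps, eps ~ N(0, sigma^2) independent of X1,
# so Y ~ N(0, C11 + sigma^2) and E[Y^4] = 3*(C11 + sigma^2)^2.
covy = np.array([[c11 + sigma2, 0.0], [0.0, 1.0]])
m4 = expect2d(lambda a, b: a ** 4, covy)
assert abs(m4 - 3 * (c11 + sigma2) ** 2) < 1e-8
```

Since the integrands are degree-4 polynomials, the 20-node rule evaluates these expectations essentially exactly, so the assertions confirm the pairing counts (one cross pairing with multiplicity two, and the $3$ in the quartic moment).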
Proof of Lemma 7.13: By (113) and (116),
\[
E\bigl[Y_{i_1 j_1} Y_{i_1 j'_1} Y^2_{i_2 j_2} \mid T_{i_1}, T_{i_2}\bigr] = C(T_{i_1 j_1}, T_{i_1 j'_1})(C(T_{i_2 j_2}, T_{i_2 j_2}) + \sigma^2) + 2\rho^2_{i_1 i_2}\, C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j'_1}, T_{i_2 j_2}).
\]
The expression for $E\bigl[C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j'_1}, T_{i_2 j_2})\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i_1 j'_1})\, \tilde K_{z_2,l_2}(T_{i_2 j_2})\bigr]$ is given by
\[
\int\!\!\int\!\!\int C(u,w)\, C(v,w)\, g(u)\, g(v)\, g(w)\, \tilde K_{s_1,l_1}(u)\, \tilde K_{t_1,l'_1}(v)\, \tilde K_{z_2,l_2}(w)\,du\,dv\,dw,
\]
and it can be shown that when we sum this over $l_1, l'_1, l_2 = 1, \ldots, L_n$, the sum equals
\[
C(s_1,z_2)\, C(t_1,z_2)\, g(s_1)\, g(t_1)\, g(z_2) + O(h^2).
\]
From this, and similar arguments as before, we have, for $|s_1 - t_1| > \frac{Ah}{2}$ and $|s_2 - t_2| \le \frac{Ah}{2}$, with $A \ge 4(B_K + C_Q)$,
\[
\frac{1}{n^2} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n w(m_{i_1})\, \frac{1}{m_{i_1}^2 m_{i_2}} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{i_1 i_2;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} - \sigma^2\, E\tilde C(s_1,t_1)
\]
\[
= \frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr)(C(s_1,t_1) + O(h^2))(C(z_2,z_2) + \sigma^2 + O(h^2))
\]
\[
+ \frac{1}{n^2} \sum_{i_1 \ne i_2}^n w(m_{i_1})\, \frac{1}{m_{i_1}^2} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{l_1,l'_1=1}^{L_n} \frac{E\bigl[C(T_{i_1 j_1}, T_{i_1 j'_1})\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i_1 j'_1})\bigr]}{g(s_1)\, g(t_1)} \cdot \frac{1}{m_{i_2}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_2=1}^{L_n} \frac{E\bigl[(C(T_{i_2 j_2}, T_{i_2 j_2}) + \sigma^2)\, \tilde K_{z_2,l_2}(T_{i_2 j_2})\bigr]}{g(z_2)}
\]
\[
- \sigma^2\, \frac{1}{n} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^2} \sum_{j_1 \ne j'_1}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \frac{E\bigl[C(T_{i j_1}, T_{i j'_1})\, \tilde K_{s_1,l_1}(T_{i j_1})\, \tilde K_{t_1,l'_1}(T_{i j'_1})\bigr]}{g(s_1)\, g(t_1)} + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(2\, C(s_1,z_2)\, C(t_1,z_2) + O(h^2)\bigr)
\]
\[
= \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{n-1}{n}\Bigr)(C(s_1,t_1) + O(h^2))(C(z_2,z_2) + \sigma^2 + O(h^2)) - \sigma^2(C(s_1,t_1) + O(h^2)) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(2\, C(s_1,z_2)\, C(t_1,z_2) + O(h^2)\bigr)
\]
\[
= (C(s_1,t_1) + O(h^2))(C(s_2,t_2) + O(h^2)) - \Bigl(\frac{1}{n^2}\sum_{i=1}^n \frac{2}{m_i}\Bigr)(C(s_1,t_1) + O(h^2))(C(s_2,t_2) + \sigma^2 + O(h^2)) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr).
\]
The last equality follows from the fact that the $C(s_1,t_1) + O(h^2)$ factors appearing in the successive displays are the same.

Proof of Lemma 7.14: Follows from (117) and (127).

Proof of (72)

Define $W_{ij} = \psi_k(T_{ij})\, B_i(s, T_{ij})$ and $\bar W_{ij} = \psi_{k'}(T_{ij})\, B_i(t, T_{ij})$. Since $|s - t| > Ah/2$, it follows that for all $i$, $W^k_{ij}\, \bar W^l_{ij} = 0$ for all $k, l \ge 1$ and all $j = 1, \ldots, m_i$. Thus, if $|s - t| > Ah/2$, then
\[
m_i^4\, \mathrm{Var}(B_{1i,k}(s)\, B_{1i,k'}(t)) = E\Bigl[\sum_{j \ne j'}^{m_i}\bigl(W_{ij}\bar W_{ij'} - E(W_{ij}\bar W_{ij'})\bigr)\Bigr]^2
\]
\[
= \sum_{j \ne j'}^{m_i}\bigl[E(W_{ij}\bar W_{ij'})^2 - (E(W_{ij}\bar W_{ij'}))^2\bigr] + \sum_{j_1 = j'_2 \ne j'_1 = j_2}^{m_i}\bigl[E(W_{ij_1}\bar W_{ij'_1} W_{ij'_1}\bar W_{ij_1}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij'_1}\bar W_{ij_1})\bigr]
\]
\[
+ \sum_{j_1 = j_2 \ne j'_1 \ne j'_2}^{m_i}\bigl[E(W^2_{ij_1}\bar W_{ij'_1}\bar W_{ij'_2}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij_1}\bar W_{ij'_2})\bigr] + \sum_{j_1 = j'_2 \ne j'_1 \ne j_2}^{m_i}\bigl[E(W_{ij_1}\bar W_{ij'_1} W_{ij_2}\bar W_{ij_1}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij_2}\bar W_{ij_1})\bigr]
\]
\[
+ \sum_{j'_1 = j_2 \ne j_1 \ne j'_2}^{m_i}\bigl[E(W_{ij_1}\bar W_{ij'_1} W_{ij'_1}\bar W_{ij'_2}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij'_1}\bar W_{ij'_2})\bigr] + \sum_{j'_1 = j'_2 \ne j_1 \ne j_2}^{m_i}\bigl[E(W_{ij_1}\bar W^2_{ij'_1} W_{ij_2}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij_2}\bar W_{ij'_1})\bigr],
\]
since the term corresponding to $j_1 \ne j'_1 \ne j_2 \ne j'_2$ vanishes.
Now, by using the fact that the $T_{ij}$'s are i.i.d., we can simplify each sum on the RHS:
\[
\text{1st term} = m_i(m_i - 1)\bigl[E(W^2_{i1})\, E(\bar W^2_{i1}) - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{2nd term} = m_i(m_i - 1)\bigl[0 - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{3rd term} = m_i(m_i - 1)(m_i - 2)\bigl[E(W^2_{i1})\, (E(\bar W_{i1}))^2 - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{4th term} = m_i(m_i - 1)(m_i - 2)\bigl[0 - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{5th term} = m_i(m_i - 1)(m_i - 2)\bigl[0 - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{6th term} = m_i(m_i - 1)(m_i - 2)\bigl[(E(W_{i1}))^2\, E(\bar W^2_{i1}) - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr].
\]
Thus,
\[
m_i^4\, \mathrm{Var}(B_{1i,k}(s)\, B_{1i,k'}(t)) = m_i(m_i - 1)\bigl[E(W^2_{i1})\, E(\bar W^2_{i1}) + (m_i - 2)\, E(W^2_{i1})\, (E(\bar W_{i1}))^2 + (m_i - 2)\, (E(W_{i1}))^2\, E(\bar W^2_{i1})\bigr] - m_i(m_i - 1)(4 m_i - 6)\, (E(W_{i1}))^2\, (E(\bar W_{i1}))^2.
\]
Now, using the facts that $E(W^2_{i1}) = O(h^{-1}) = E(\bar W^2_{i1})$ and $|E(W_{i1})| = O(1) = |E(\bar W_{i1})|$, we conclude (72).

Computation of conditional mixed moments

The computation of the moments is done by using the Wick formula (Lemma 7.4). We consider all the different generic cases below:

$\bullet$ Case $i_1 \ne i_2$, $j_1 \ne j'_1$, $j_2 \ne j'_2$: In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = C(T_{i_1 j_1}, T_{i_1 j'_1})\, C(T_{i_2 j_2}, T_{i_2 j'_2}) + \rho^2_{i_1 i_2}\bigl[C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j'_1}, T_{i_2 j'_2}) + C(T_{i_1 j_1}, T_{i_2 j'_2})\, C(T_{i_1 j'_1}, T_{i_2 j_2})\bigr]. \tag{112}
\]
$\bullet$ Case $i_1 \ne i_2$, $j_1 = j'_1$, $j_2 \ne j'_2$ (equivalent to $i_1 \ne i_2$, $j_1 \ne j'_1$, $j_2 = j'_2$): In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = E(X^2_{i_1 j_1} X_{i_2 j_2} X_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) + \sigma^2\, E(X_{i_2 j_2} X_{i_2 j'_2} \mid T_{i_1}, T_{i_2}).
\]
Therefore, by (42),
\[
E(X^2_{i_1 j_1} X_{i_2 j_2} X_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = C(T_{i_1 j_1}, T_{i_1 j_1})\, C(T_{i_2 j_2}, T_{i_2 j'_2}) + 2\rho^2_{i_1 i_2}\, C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j_1}, T_{i_2 j'_2}).
\]
Combining, we have
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = C(T_{i_1 j_1}, T_{i_1 j_1})\, C(T_{i_2 j_2}, T_{i_2 j'_2}) + 2\rho^2_{i_1 i_2}\, C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j_1}, T_{i_2 j'_2}) + \sigma^2\, C(T_{i_2 j_2}, T_{i_2 j'_2}). \tag{113}
\]
$\bullet$ Case $i_1 \ne i_2$, $j_1 = j'_1$, $j_2 = j'_2$: In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)(C(T_{i_2 j_2}, T_{i_2 j_2}) + \sigma^2) + 2\rho^2_{i_1 i_2}\, (C(T_{i_1 j_1}, T_{i_2 j_2}))^2. \tag{114}
\]
$\bullet$ Case $i_1 = i_2$, $j_1 \ne j'_1 \ne j_2 \ne j'_2$: In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = C(T_{i_1 j_1}, T_{i_1 j'_1})\, C(T_{i_1 j_2}, T_{i_1 j'_2}) + C(T_{i_1 j_1}, T_{i_1 j_2})\, C(T_{i_1 j'_1}, T_{i_1 j'_2}) + C(T_{i_1 j_1}, T_{i_1 j'_2})\, C(T_{i_1 j'_1}, T_{i_1 j_2}). \tag{115}
\]
$\bullet$ Case $i_1 = i_2$, $j_1 = j'_1 \ne j_2 \ne j'_2$ (equivalent to $i_1 = i_2$, $j_1 = j_2 \ne j'_1 \ne j'_2$; $i_1 = i_2$, $j_1 = j'_2 \ne j'_1 \ne j_2$; $i_1 = i_2$, $j_1 \ne j'_1 = j_2 \ne j'_2$; $i_1 = i_2$, $j_1 \ne j'_1 = j'_2 \ne j_2$; and $i_1 = i_2$, $j_1 \ne j'_1 \ne j_2 = j'_2$): In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)\, C(T_{i_1 j_2}, T_{i_1 j'_2}) + 2\, C(T_{i_1 j_1}, T_{i_1 j_2})\, C(T_{i_1 j_1}, T_{i_1 j'_2}). \tag{116}
\]
$\bullet$ Case $i_1 = i_2$, $j_1 = j'_1 = j_2 \ne j'_2$ (equivalent to $i_1 = i_2$, $j_1 = j'_1 = j'_2 \ne j_2$; $i_1 = i_2$, $j_1 = j_2 = j'_2 \ne j'_1$; and $i_1 = i_2$, $j_1 \ne j'_1 = j_2 = j'_2$): In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = 3\, C(T_{i_1 j_1}, T_{i_1 j_1})\, C(T_{i_1 j_1}, T_{i_1 j'_2}) + 3\sigma^2\, C(T_{i_1 j_1}, T_{i_1 j'_2}).
\]
(117)

$\bullet$ Case $i_1 = i_2$, $j_1 = j'_1 \ne j_2 = j'_2$ (equivalent to $i_1 = i_2$, $j_1 = j_2 \ne j'_1 = j'_2$; and $i_1 = i_2$, $j_1 = j'_2 \ne j'_1 = j_2$): In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)(C(T_{i_1 j_2}, T_{i_1 j_2}) + \sigma^2) + 2\, (C(T_{i_1 j_1}, T_{i_1 j_2}))^2. \tag{118}
\]
$\bullet$ Case $i_1 = i_2$, $j_1 = j'_1 = j_2 = j'_2$: In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = 3\, (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)^2. \tag{119}
\]

Computation of unconditional mixed moments (off-diagonal part)

Here, we obtain simplified forms of certain expectations that are used in the proofs of Propositions 7.3 and 7.4. Observe that, based on our calculations in Appendix A, we only need to compute expectations of the form
\[
E\bigl[C(T_{i_1 j_1}, T_{i'_1 j'_1})\, C(T_{i_2 j_2}, T_{i'_2 j'_2})\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i'_1 j'_1})\, \tilde K_{s_2,l_2}(T_{i_2 j_2})\, \tilde K_{t_2,l'_2}(T_{i'_2 j'_2})\bigr]. \tag{120}
\]
Notice that, when the pairs $(T_{i_1 j_1}, T_{i'_1 j'_1})$ and $(T_{i_2 j_2}, T_{i'_2 j'_2})$ are independent, the expectation in (120) factorizes as
\[
E\bigl[C(T_{i_1 j_1}, T_{i'_1 j'_1})\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i'_1 j'_1})\bigr]\; E\bigl[C(T_{i_2 j_2}, T_{i'_2 j'_2})\, \tilde K_{s_2,l_2}(T_{i_2 j_2})\, \tilde K_{t_2,l'_2}(T_{i'_2 j'_2})\bigr]. \tag{121}
\]
Each individual term is exactly of the same form that we encountered while calculating the bias of our estimate. The expectations appearing above are of the form
\[
\int\!\!\int C(u,v)\, g(u)\, g(v)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(v)\,du\,dv. \tag{122}
\]
For the other terms we need to evaluate or approximate various other integrals. The general forms of these integrals are given below, for $1 \le l, l', m, m' \le L_n$ and $s, s', t, t' \in [0,1]$.
\[
\int (C(u,u))^r\, g(u)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\,du = \begin{cases} O(h^{-1}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|\} \le 2 B_K h, \\ 0 & \text{otherwise;} \end{cases} \quad \text{for } r = 0, 1, 2; \tag{123}
\]
\[
\int C(u,u)\, g(u)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(u)\, \tilde K_{t',m'}(u)\,du; \tag{124}
\]
\[
\int (C(u,u))^2\, g(u)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(u)\, \tilde K_{t',m'}(u)\,du; \tag{125}
\]
\[
\int\!\!\int (C(u,v))^r\, g(u)\, g(v)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(v)\, \tilde K_{t',m'}(v)\,du\,dv = \begin{cases} O(h^{-2}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|, |t - s_m|, |t' - s_{m'}|\} \le 2 B_K h, \\ 0 & \text{otherwise;} \end{cases} \quad \text{for } r = 0, 1, 2; \tag{126}
\]
\[
\int\!\!\int (C(u,u))^r\, C(u,v)\, g(u)\, g(v)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(u)\, \tilde K_{t',m'}(v)\,du\,dv = \begin{cases} O(h^{-2}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|, |t - s_m|\} \le 2 B_K h, \\ 0 & \text{otherwise;} \end{cases} \quad \text{for } r = 0, 1; \tag{127}
\]
\[
\int\!\!\int\!\!\int C(u,v)\, C(u,w)\, g(u)\, g(v)\, g(w)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(v)\, \tilde K_{t',m'}(w)\,du\,dv\,dw = \begin{cases} O(h^{-1}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|\} \le 2 B_K h, \\ 0 & \text{otherwise.} \end{cases} \tag{128}
\]

Computation of unconditional mixed moments (diagonal and mixed part)

We have the following bound:
\[
\int (C(u,u))^r\, g(u)\, \tilde K_{z_1,l_1}(u)\, \tilde K_{z_2,l_2}(u)\,du = \begin{cases} O(h^{-1}) & \text{if } |z_k - s_{l_k}| \le 2 B_K h,\ k = 1, 2, \\ 0 & \text{otherwise,} \end{cases} \quad \text{for } r = 0, 1, 2. \tag{129}
\]

Some error bounds involving the Dirac $\delta$

Here, we provide some key estimates that are crucial to obtaining the overall risk bound. They all involve the operator $H_\nu$. Due to the decomposition (55) we can reduce the computations of these bounds to integrals involving $\{\psi_k(\cdot)\}_{k=1}^M$ and $\delta(\cdot,\cdot)$. Throughout we assume that $R(s_1,s_2,t_1,t_2)$ is a "nice" function satisfying certain (boundedness) conditions. Then,
\[
\Bigl|\int\!\cdots\!\int \delta(x,s_1)\, \delta(x,s_2)\, R(s_1,s_2,t_1,t_2)\, \psi_\nu(t_1)\, \psi_\nu(t_2)\,ds_1\,ds_2\,dt_1\,dt_2\Bigr| = \Bigl|\int\!\!\int R(x,x,t_1,t_2)\, \psi_\nu(t_1)\, \psi_\nu(t_2)\,dt_1\,dt_2\Bigr| \le \|R\|_\infty\, \|\psi_\nu\|^2_\infty.
\]
(130)
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(s_1, t_1) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(s_1 - \frac{Ah}{2}) \vee 0}^{(s_1 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le Ah \, \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned} \tag{131}
\]
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(s_1, t_1) W_{Ah}(s_2, t_2) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(s_2 - \frac{Ah}{2}) \vee 0}^{(s_2 + \frac{Ah}{2}) \wedge 1} \int_{(s_1 - \frac{Ah}{2}) \vee 0}^{(s_1 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le (Ah)^2 \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned} \tag{132}
\]
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(t_1, t_2) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(t_2 - \frac{Ah}{2}) \vee 0}^{(t_2 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int \int_{(t_2 - \frac{Ah}{2}) \vee 0}^{(t_2 + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le Ah \, \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned} \tag{133}
\]
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(t_1, t_2) W_{Ah}(s_2, t_2) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(s_2 - \frac{Ah}{2}) \vee 0}^{(s_2 + \frac{Ah}{2}) \wedge 1} \int_{(t_2 - \frac{Ah}{2}) \vee 0}^{(t_2 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} \int_{(t_2 - \frac{Ah}{2}) \vee 0}^{(t_2 + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le (Ah)^2 \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned}
\]
(134)
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(t_1, s_2) W_{Ah}(s_2, t_2) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(s_2 - \frac{Ah}{2}) \vee 0}^{(s_2 + \frac{Ah}{2}) \wedge 1} \int_{(s_2 - \frac{Ah}{2}) \vee 0}^{(s_2 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le (Ah)^2 \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned} \tag{135}
\]

Appendix G: Proof of Theorem 4.3

In order to prove this result, we use a strategy very similar to the one used in the proof of Corollary 1 in Paul and Peng (2007). In view of the statement of the theorem, it suffices to consider a submodel consisting of kernels $\Sigma$ of rank 1. Let $\Sigma^{(0)}(s,t) = \lambda \bar\psi(s) \bar\psi(t)$, $s, t \in [0,1]$, for $\lambda \ge C_1$, where $\bar\psi(\cdot) \equiv 1$. Then $\bar\psi$ is the first (and only) eigenfunction of $\Sigma^{(0)}$, with corresponding eigenvalue $\lambda$. Suppose that the design $D$ satisfies $\underline{m} = \overline{m} = m \ge 4$. Finally, choose $g$ to be the uniform density on $[0,1]$. Let $M_* \sim (nm)^{1/5}$, and let $\{\gamma_l\}_{l=1}^{M_*}$ be orthonormal functions such that (i) the $\gamma_l$ are twice continuously differentiable, with $\max_l \|\gamma_l^{(j)}\|_\infty = O(M_*^{1/2+j})$ for $j = 0, 1, 2$; (ii) $\int_0^1 \gamma_l(s) \, ds = 0$ for all $l$; and (iii) $\gamma_l$ is centered around $l/M_*$ with length of support $O(M_*^{-1})$, uniformly over $l$. Note that, since $\bar\psi \equiv 1$, condition (ii) implies that the $\{\gamma_l\}$ are orthogonal to $\bar\psi$. Let $M_0 = [\frac{2M_*}{9}]$. Let $\mathcal{F}_0$ be an index set satisfying $\log |\mathcal{F}_0| \asymp M_*$, and let $\{z_l^{(j)} : l = 1, \ldots, M_*\}_{j \in \mathcal{F}_0}$ be a collection with $z_l^{(j)}$ taking values in $\{-M_0^{-1/2}, 0, M_0^{-1/2}\}$, such that, with $z^{(j)}$ denoting the vector $(z_l^{(j)})_{l=1}^{M_*}$, we have $\|z^{(j)}\|_2 = 1$ and $\|z^{(j)} - z^{(j')}\|_2 \ge 1$ for $j \ne j' \in \mathcal{F}_0$. The construction is by a "sphere packing" argument as in Paul and Johnstone (2007).
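The norm and separation properties of these packing vectors can be sanity-checked numerically. The sketch below is a hypothetical stand-in (it draws sign patterns at random rather than via an actual sphere-packing argument, and uses illustrative sizes for $M_*$ and $M_0$); it verifies that a vector with exactly $M_0$ nonzero entries of magnitude $M_0^{-1/2}$ has unit $\ell_2$ norm, and that two such vectors are at distance at least 1 whenever they disagree in at least $M_0$ coordinates, since each disagreement contributes at least $M_0^{-1}$ to the squared distance.

```python
import numpy as np

rng = np.random.default_rng(0)
M_star, M0 = 90, 20   # illustrative sizes; the paper takes M0 = [2*M_star/9]

def packed_vector(rng):
    # Hypothetical stand-in for z^(j): exactly M0 nonzero entries of
    # magnitude M0^{-1/2}, so that ||z||_2 = 1 by construction.
    z = np.zeros(M_star)
    idx = rng.choice(M_star, size=M0, replace=False)
    z[idx] = rng.choice([-1.0, 1.0], size=M0) / np.sqrt(M0)
    return z

z1, z2 = packed_vector(rng), packed_vector(rng)
assert np.isclose(np.linalg.norm(z1), 1.0)   # ||z^(j)||_2 = 1

# Each coordinate where z1 and z2 disagree contributes at least M0^{-1}
# to ||z1 - z2||^2, so >= M0 disagreements give ||z1 - z2||_2 >= 1.
if np.sum(z1 != z2) >= M0:
    assert np.linalg.norm(z1 - z2) >= 1.0 - 1e-12
```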
Let $\delta \asymp (nm)^{-2/5} \asymp M_*^{-2}$. Then define
\[
\psi^{(j)}(s) = \sqrt{1 - \delta^2} \, \bar\psi(s) + \delta \sum_{l=1}^{M_*} z_l^{(j)} \gamma_l(s), \qquad j \in \mathcal{F}_0.
\]
Note that, by construction: (i') $\|\psi^{(j)}\|_2 = 1$; (ii') the $\psi^{(j)}$ are twice differentiable, with bounded second derivative; (iii') $\|\psi^{(j)} - \psi^{(j')}\|_2 \ge \delta$ for $j \ne j' \in \mathcal{F}_0$; (iv') $\|\bar\psi - \psi^{(j)}\|_\infty = O(\delta)$ uniformly over $j \in \mathcal{F}_0$. Property (iv') will be crucial for much of our analysis later on. In order to prove Theorem 4.3, we need to show the following:
\[
\sum_{i=1}^n E \, K(\Sigma_i^{(j)}, \Sigma_i^{(0)}) \asymp nm\delta^2, \quad \text{uniformly in } j \in \mathcal{F}_0, \tag{136}
\]
where $\Sigma_i^{(j)}$ denotes the covariance of observation $i$ given $\{T_{il}\}_{l=1}^m$ under the model parameterized by $\Sigma_0^{(j)}$, and $E$ denotes expectation with respect to the design points $T$.

Proof of (136): From now on, we fix $j \in \mathcal{F}_0$ and drop the superscript $(j)$ for convenience. Denote the $m \times 1$ vectors $(\psi(T_{ij}))_{j=1}^m$ and $(\bar\psi(T_{ij}))_{j=1}^m$ by $\psi_i$ and $\bar\psi_i$, respectively. Of course, $\bar\psi_i$ is the nonrandom vector with all entries equal to 1. Next, observe that
\[
\|\Sigma_i^{(0)} - \Sigma_i\|_F = \lambda \|\bar\psi_i (\bar\psi_i - \psi_i)^T + (\bar\psi_i - \psi_i) \psi_i^T\|_F \le \lambda (\|\bar\psi_i\|_2 + \|\psi_i\|_2) \|\bar\psi_i - \psi_i\|_2. \tag{137}
\]
Since $\|\bar\psi_i - \psi_i\|_2 \le \sqrt{m}\, \|\bar\psi - \psi\|_\infty = O(\sqrt{m}\,\delta)$ (by property (iv')), and $\|\bar\psi_i\|_2 = \sqrt{m}$, it follows from (137) that
\[
\max_{1 \le i \le n} \|\Sigma_i^{(0)} - \Sigma_i\|_F^2 = O(m^2 \delta^2). \tag{138}
\]
Since $m\delta \asymp m(nm)^{-2/5}$ and $m = o(n^{2/3})$, the RHS of (138) is $o(1)$ (a nonrandom bound) uniformly over $\mathcal{F}_0$, and hence, using arguments as in the proof of Proposition 2 in Paul and Peng (2007), we have
\[
\sum_{i=1}^n K(\Sigma_i, \Sigma_i^{(0)}) \asymp \sum_{i=1}^n \|(\Sigma_i^{(0)})^{-1/2} (\Sigma_i^{(0)} - \Sigma_i) (\Sigma_i^{(0)})^{-1/2}\|_F^2, \quad \text{uniformly over } \mathcal{F}_0.
\]
Thus, (136) will follow once we prove:

Proposition 7.5. Uniformly over $\mathcal{F}_0$,
\[
E \|(\Sigma_1^{(0)})^{-1/2} (\Sigma_1^{(0)} - \Sigma_1) (\Sigma_1^{(0)})^{-1/2}\|_F^2 \asymp m\delta^2.
\]
(139)

Proof of Proposition 7.5: First, note that $\theta = \bar\psi_1/\sqrt{m}$ is a vector of $\ell_2$ norm 1, and hence, by a standard matrix inversion formula,
\[
(\Sigma_1^{(0)})^{-1} = (I + \lambda m \theta \theta^T)^{-1} = I - \kappa \theta \theta^T, \quad \text{where } \kappa = \frac{\lambda m}{1 + \lambda m}.
\]
Let $\Delta = \Sigma_1 - \Sigma_1^{(0)} = \lambda(\psi_1 \psi_1^T - m \theta \theta^T)$. Then,
\[
\begin{aligned}
\|(\Sigma_1^{(0)})^{-1/2} (\Sigma_1^{(0)} - \Sigma_1) (\Sigma_1^{(0)})^{-1/2}\|_F^2 &= \mathrm{tr}[(I - \kappa \theta \theta^T) \Delta (I - \kappa \theta \theta^T) \Delta] \\
&= \mathrm{tr}[(I - \theta \theta^T) \Delta (I - \theta \theta^T) \Delta] + 2(1-\kappa)\, \theta^T \Delta (I - \theta \theta^T) \Delta \theta + (1-\kappa)^2 (\theta^T \Delta \theta)^2 \\
&= \lambda^2 \Big[ \|(I - \theta \theta^T) \psi_1\|_2^4 + 2(1-\kappa)(\theta^T \psi_1)^2 \|(I - \theta \theta^T) \psi_1\|_2^2 + (1-\kappa)^2 \big(m - (\theta^T \psi_1)^2\big)^2 \Big] \\
&= \lambda^2 \Big[ \big(\|\psi_1\|_2^2 - (\theta^T \psi_1)^2\big)^2 + 2(1-\kappa)(\theta^T \psi_1)^2 \big(\|\psi_1\|_2^2 - (\theta^T \psi_1)^2\big) + (1-\kappa)^2 \big(m - (\theta^T \psi_1)^2\big)^2 \Big],
\end{aligned} \tag{140}
\]
where the third and last steps follow from the facts that $(I - \theta \theta^T)\theta = 0$ and $(I - \theta \theta^T)^2 = I - \theta \theta^T$. From (140), the proof will follow once we establish the following results.

Lemma 7.15. With $T_{ij}$ i.i.d. from Uniform$[0,1]$, we have (uniformly over $\mathcal{F}_0$)
\[
E[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2] = m(m-1)\delta^2 (1 + o(1)), \quad \text{and} \quad \mathrm{Var}[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2] = O(m^3 \delta^4).
\]

Lemma 7.16. With $T_{ij}$ i.i.d. from Uniform$[0,1]$, we have (uniformly over $\mathcal{F}_0$)
\[
E[m - \bar\psi_1^T \psi_1]^2 = m\delta^2 (1 + o(1)), \qquad E\|\bar\psi_1 - \psi_1\|_2^4 = O(m^2 \delta^4).
\]

Lemma 7.17. With $T_{ij}$ i.i.d. from Uniform$[0,1]$, we have (uniformly over $\mathcal{F}_0$)
\[
E[\|\psi_1\|_2^2 (m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)] = m^2 (m-1) \delta^2 (1 + o(1)).
\]

To see how (139) follows from Lemmas 7.15–7.17, note first that
\[
\begin{aligned}
E\big[(\bar\psi_1^T \psi_1)^2 (m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)\big] &= m E[\|\psi_1\|_2^2 (m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)] - E(m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)^2 \\
&= m^3 (m-1) \delta^2 (1 + o(1)) - O(m^4 \delta^4) = m^3 (m-1) \delta^2 (1 + o(1))
\end{aligned} \tag{141}
\]
by Lemmas 7.15 and 7.17.
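The inversion step $(I + \lambda m \theta\theta^T)^{-1} = I - \kappa\theta\theta^T$ used above is the standard rank-one (Sherman–Morrison) identity. A minimal numeric sketch, with illustrative values of $m$ and $\lambda$ and the noise variance normalized to 1 as in the display, confirms it:

```python
import numpy as np

m, lam = 6, 2.5   # illustrative values only

# theta = psi_bar_1 / sqrt(m): unit vector proportional to the all-ones vector.
theta = np.ones(m) / np.sqrt(m)

# Sigma^(0)_1 = I + lam * m * theta theta^T (noise variance normalized to 1).
Sigma0 = np.eye(m) + lam * m * np.outer(theta, theta)

# Claimed closed-form inverse via the rank-one (Sherman-Morrison) identity.
kappa = lam * m / (1 + lam * m)
inv_closed = np.eye(m) - kappa * np.outer(theta, theta)

assert np.allclose(Sigma0 @ inv_closed, np.eye(m))
assert np.allclose(inv_closed, np.linalg.inv(Sigma0))
```

The identity holds because $(I + c\,\theta\theta^T)(I - \kappa\,\theta\theta^T) = I + (c - \kappa(1+c))\theta\theta^T$, which equals $I$ exactly when $\kappa = c/(1+c)$, here with $c = \lambda m$.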
Now, from (140), we obtain
\[
E\|(\Sigma_1^{(0)})^{-1/2} (\Sigma_1^{(0)} - \Sigma_1) (\Sigma_1^{(0)})^{-1/2}\|_F^2 \ge 2\lambda^2 (1-\kappa) \frac{1}{m^2} E\big[(\bar\psi_1^T \psi_1)^2 (m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)\big] = \frac{2\lambda^2 m(m-1)}{1 + \lambda m}\, \delta^2 (1 + o(1)),
\]
where the last step is by (141). This establishes the lower bound in (139).

To establish the upper bound in (139), we also need to consider the expectations of the other two terms on the RHS of (140). First, by Lemma 7.15,
\[
\lambda^2 E\big(\|\psi_1\|_2^2 - (\theta^T \psi_1)^2\big)^2 = \frac{\lambda^2}{m^2} E\big(m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2\big)^2 = O(m^2 \delta^4). \tag{142}
\]
Next, writing
\[
m - (\theta^T \psi_1)^2 = m - \|\psi_1\|_2^2 + \frac{1}{m}\big[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2\big],
\]
and then using the fact that, for any $\epsilon > 0$ and $a, b \in \mathbb{R}$, $(a+b)^2 \le (1+\epsilon) a^2 + (1+\epsilon^{-1}) b^2$, we have, for arbitrary but fixed $\epsilon > 0$,
\[
\begin{aligned}
E\big(m - (\theta^T \psi_1)^2\big)^2 &\le (1+\epsilon)\, E[m - \|\psi_1\|_2^2]^2 + \frac{1+\epsilon^{-1}}{m^2}\, E[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2]^2 \\
&= (1+\epsilon)\, E[\|\bar\psi_1 - \psi_1\|_2^2 - 2(m - \bar\psi_1^T \psi_1)]^2 + O(m^2 \delta^4) \quad \text{(by Lemma 7.15)} \\
&\le 4(1+\epsilon)^2 E[m - \bar\psi_1^T \psi_1]^2 + (1+\epsilon)(1+\epsilon^{-1})\, E\|\bar\psi_1 - \psi_1\|_2^4 + O(m^2 \delta^4) \\
&= 4(1+\epsilon)^2 m\delta^2 (1 + o(1)) + O(m^2 \delta^4),
\end{aligned} \tag{143}
\]
where the last step follows from Lemma 7.16. Finally, substituting (142), (141) and (143) in (140), we obtain an upper bound of the form
\[
\frac{2\lambda^2 m(m-1)}{1 + \lambda m}\, \delta^2 (1 + o(1)) + (1+\epsilon)^2 \frac{4\lambda^2 m}{(1 + \lambda m)^2}\, \delta^2 (1 + o(1)) + O(m^2 \delta^4) = O(m\delta^2),
\]
which concludes the proof.

Proofs of Lemmas 7.15–7.17: In order to prove the lemmas, we define $\xi = \bar\psi - \psi$ and note the following very important set of relations:
\[
\int \xi = \int (\bar\psi - \psi) = \int (\bar\psi - \psi)\bar\psi = 1 - \int \psi \bar\psi = \frac{1}{2} \int |\bar\psi - \psi|^2 = \frac{1}{2} \int \xi^2 = \frac{1}{2}\delta^2 + O(\delta^4).
\]
(144)

Proof of Lemma 7.15: The decomposition
\[
m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2 = (m-1) \sum_{k=1}^m \psi^2(T_{1k}) - \sum_{k \ne k'} \psi(T_{1k}) \psi(T_{1k'}) \tag{145}
\]
yields
\[
\begin{aligned}
E[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2] &= (m-1) m \int \psi^2 - m(m-1) \Big(\int \psi\Big)^2 = m(m-1)\Big[1 - \Big(1 - \int \xi\Big)^2\Big] \\
&= m(m-1)\Big[2 \int \xi - \Big(\int \xi\Big)^2\Big] = m(m-1)\delta^2 (1 + O(\delta^2)) \quad \text{(by (144))}.
\end{aligned}
\]
Define $\tau = \int \psi$. Then, using (145),
\[
\begin{aligned}
\mathrm{Var}[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2]
&= (m-1)^2 E\Big[\sum_{k=1}^m (\psi^2(T_{1k}) - 1)\Big]^2 + \sum_{k_1 \ne k_1'} \sum_{k_2 \ne k_2'} E[(\psi(T_{1k_1})\psi(T_{1k_1'}) - \tau^2)(\psi(T_{1k_2})\psi(T_{1k_2'}) - \tau^2)] \\
&\quad - 2(m-1) \sum_{k_1=1}^m \sum_{k_2 \ne k_2'} E[(\psi^2(T_{1k_1}) - 1)(\psi(T_{1k_2})\psi(T_{1k_2'}) - \tau^2)] \\
&= (m-1)^2 m\, E(\psi^2(T_{11}) - 1)^2 + 2m(m-1)\, E(\psi(T_{11})\psi(T_{12}) - \tau^2)^2 \\
&\quad + 4m(m-1)(m-2)\, E[(\psi(T_{11})\psi(T_{12}) - \tau^2)(\psi(T_{11})\psi(T_{13}) - \tau^2)] \\
&\quad - 4m(m-1)^2\, E[(\psi^2(T_{11}) - 1)(\psi(T_{11})\psi(T_{12}) - \tau^2)] \\
&= m(m-1)\Big[(m-1)\Big(\int \psi^4 - 1\Big) + 2\Big(\int \psi^2 \int \psi^2 - \tau^4\Big) + 4(m-2)\Big(\int \psi^2 \Big(\int \psi\Big)^2 - \tau^4\Big)\Big] - 4m(m-1)^2 \Big(\int \psi^3 \int \psi - \tau^2\Big) \\
&= m(m-1)\Big[(m-1)\Big(\int (1-\xi)^4 - 1\Big) + 2(1 - \tau^4) + 4(m-2)\tau^2(1 - \tau^2) - 4(m-1)\tau\Big(\int (1-\xi)^3 - \tau\Big)\Big].
\end{aligned}
\]
Simplifying this expression using (144): the first term within the square brackets is $(m-1)(4\int \xi^2 - 4\int \xi^3 + \int \xi^4)$, and the last term within the square brackets is $-4(m-1)\tau(2\int \xi^2 - \int \xi^3)$. Collecting terms and using the fact that $1 - \tau^2 = 2(1-\tau) - (1-\tau)^2 = \int \xi^2 - \frac{1}{4}(\int \xi^2)^2$ (again by (144)), we can express the sum as
\[
\begin{aligned}
& m(m-1)\Big[\big(4(m-1) + 4 + 4(m-2) - 8(m-1)\big)\int \xi^2 - \big(4(m-1) - 4(m-1)\big)\int \xi^3 + (m-1)\int \xi^4\Big] \\
&\quad + m(m-1)\Big[-4(1-\tau)^2 - 2(1-\tau^2)^2 - 4(m-2)\big((1-\tau)^2 - (1-\tau^2)^2\big) + 4(m-1)(1-\tau)\Big(2\int \xi^2 - \int \xi^3\Big)\Big] = O(m^3 \delta^4).
\end{aligned}
\]
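The relations (144), used repeatedly in the proof above, can be checked by quadrature. The sketch below is a minimal numeric illustration, assuming a single cosine bump $\gamma(u) = \sqrt{2}\cos(2\pi u)$ (which has $\int_0^1 \gamma = 0$ and $\|\gamma\|_2 = 1$) as a hypothetical stand-in for the perturbation $\sum_l z_l^{(j)} \gamma_l$; it verifies both $\int \xi = \frac{1}{2}\int \xi^2$ (which holds exactly, since $\|\psi\|_2 = 1$) and $\int \xi = \frac{1}{2}\delta^2 + O(\delta^4)$.

```python
import numpy as np

u = np.linspace(0.0, 1.0, 20_001)
delta = 1e-2

# Hypothetical stand-in for the perturbing function: integral 0, L2 norm 1.
gamma = np.sqrt(2.0) * np.cos(2 * np.pi * u)

# Perturbed eigenfunction psi = sqrt(1 - delta^2) * psi_bar + delta * gamma,
# with psi_bar identically 1, and xi = psi_bar - psi.
psi = np.sqrt(1 - delta**2) + delta * gamma
xi = 1.0 - psi

int_xi = np.trapz(xi, u)
int_xi2 = np.trapz(xi**2, u)

assert np.isclose(int_xi, 0.5 * int_xi2, atol=1e-10)      # exact identity in (144)
assert np.isclose(int_xi, 0.5 * delta**2, atol=delta**4)  # = delta^2/2 + O(delta^4)
```

Here $\int \xi = 1 - \sqrt{1-\delta^2}$, and $\int \xi^2 = (1-\sqrt{1-\delta^2})^2 + \delta^2 = 2(1-\sqrt{1-\delta^2})$, so the two quantities agree exactly, up to quadrature error.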
Proof of Lemma 7.16: First observe that
\[
\begin{aligned}
E[m - \bar\psi_1^T \psi_1]^2 &= E\Big[\sum_{k=1}^m (1 - \psi(T_{1k}))\Big]^2 = \sum_{k=1}^m E(1 - \psi(T_{1k}))^2 + \sum_{k \ne k'} E[(1 - \psi(T_{1k}))(1 - \psi(T_{1k'}))] \\
&= m \int (\bar\psi - \psi)^2 + m(m-1)\Big(\int (\bar\psi - \psi)\Big)^2 = m(\delta^2 + O(\delta^4)) + \frac{m(m-1)}{4}\,(\delta^2 + O(\delta^4))^2 \\
&= m\delta^2(1 + o(1)) \quad \text{(by (144))}.
\end{aligned}
\]
Next,
\[
\begin{aligned}
E\|\bar\psi_1 - \psi_1\|_2^4 &= E\Big[\sum_{k=1}^m (1 - \psi(T_{1k}))^2\Big]^2 = \sum_{k=1}^m E(1 - \psi(T_{1k}))^4 + \sum_{k \ne k'} E[(1 - \psi(T_{1k}))^2 (1 - \psi(T_{1k'}))^2] \\
&= m \int (\bar\psi - \psi)^4 + m(m-1)\Big(\int (\bar\psi - \psi)^2\Big)^2 \le m \|\bar\psi - \psi\|_\infty^2 \int (\bar\psi - \psi)^2 + m(m-1)\Big(\int (\bar\psi - \psi)^2\Big)^2 \\
&= O(m\delta^4) + m(m-1)\delta^4(1 + o(1)) = O(m^2\delta^4),
\end{aligned}
\]
where in the last step we used (iv') and (144).

Proof of Lemma 7.17: Use (145) to write the expectation as
\[
\begin{aligned}
& (m-1)\, E\Big[\sum_{k=1}^m \psi^2(T_{1k})\Big]^2 - E\Big[\Big(\sum_{k_1=1}^m \psi^2(T_{1k_1})\Big)\Big(\sum_{k_2 \ne k_2'} \psi(T_{1k_2})\psi(T_{1k_2'})\Big)\Big] \\
&= (m-1)\Big[\sum_{k=1}^m E\psi^4(T_{1k}) + \sum_{k \ne k'} E[\psi^2(T_{1k})\psi^2(T_{1k'})]\Big] \\
&\quad - \Big[\sum_{k_1 = k_2 \ne k_2'} E[\psi^3(T_{1k_1})\psi(T_{1k_2'})] + \sum_{k_1 = k_2' \ne k_2} E[\psi^3(T_{1k_1})\psi(T_{1k_2})] + \sum_{k_1 \ne k_2 \ne k_2'} E[\psi^2(T_{1k_1})\psi(T_{1k_2})\psi(T_{1k_2'})]\Big] \\
&= (m-1)\Big[m \int \psi^4 + m(m-1)\Big(\int \psi^2\Big)^2\Big] - \Big[2m(m-1)\Big(\int \psi^3\Big)\Big(\int \psi\Big) + m(m-1)(m-2)\Big(\int \psi^2\Big)\Big(\int \psi\Big)^2\Big] \\
&= m(m-1)\Big[\int (1-\xi)^4 + (m-1) - 2\Big(\int (1-\xi)^3\Big)\Big(\int (1-\xi)\Big) - (m-2)\Big(\int (1-\xi)\Big)^2\Big] \\
&= m(m-1)\Big[m \int \xi^2 - \Big(\int \xi^3\Big)\Big(2 - \int \xi^2\Big) - \frac{m-2}{4}\Big(\int \xi^2\Big)^2 + \int \xi^4\Big] = m^2(m-1)\delta^2(1 + o(1)),
\end{aligned}
\]
where in the fourth and last steps we used (144) and (iv').

References

[1] Ash, R. B. (1972). Real Analysis and Probability. Academic Press.
[2] Besse, P., Cardot, H. and Ferraty, F. (1997). Simultaneous nonparametric regression of unbalanced longitudinal data. Computational Statistics and Data Analysis 24, 255-270.
[3] Cai, T. and Hall, P. (2006). Prediction in functional linear regression. Annals of Statistics 34, 2159-2179.
[4] Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model. Statistics and Probability Letters 45, 11-22.
[5] Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics 12, 503-538.
[6] Chui, C. (1987). Multivariate Splines. SIAM.
[7] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer.
[8] Hall, P. and Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression. (http://www.faculty.econ.northwestern.edu/faculty/horowitz/papers/hhor-final.pdf)
[9] Hall, P., Müller, H.-G. and Wang, J.-L. (2006). Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics 34, 1493-1517.
[10] Hlubinka, D. and Prchal, L. (2007). Changes in atmospheric radiation from the statistical point of view. Computational Statistics and Data Analysis 51, 4926-4941.
[11] James, G. M., Hastie, T. J. and Sugar, C. A. (2000). Principal component models for sparse functional data. Biometrika 87, 587-602.
[12] Kato, T. (1980). Perturbation Theory for Linear Operators. Springer-Verlag.
[13] Kneip, A. and Utikal, K. J. (2001). Inference for density families using functional principal component analysis. Journal of the American Statistical Association 96, 519-542.
[14] Nica, A. and Speicher, R. (2006). Lectures on the Combinatorics of Free Probability. Cambridge University Press.
[15] Paul, D. (2004). Asymptotics of the leading sample eigenvalues for a spiked covariance model. Technical report. (http://anson.ucdavis.edu/~debashis/techrep/eigenlimit.pdf)
[16] Paul, D. and Johnstone, I. M. (2007). Augmented sparse principal component analysis for high dimensional data. Working paper. (http://anson.ucdavis.edu/~debashis/techrep/augmented-spca.pdf)
[17] Paul, D. and Peng, J. (2007). Consistency of restricted maximum likelihood estimators of principal components. To appear in Annals of Statistics. (http://anson.ucdavis.edu/~jie/REML-Asymptotics revision.pdf)
[18] Peng, J. and Paul, D. (2007). A geometric approach to maximum likelihood estimation of covariance kernel from sparse irregular longitudinal data. Technical report. arXiv:0710.5343v1 [stat.ME]. (http://anson.ucdavis.edu/~jie/pd-cov-likelihood-technical.pdf)
[19] Peng, J. and Müller, H.-G. (2008). Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. To appear in Annals of Applied Statistics.
[20] Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis, 2nd edition. Springer.
[21] Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9, 3273-3297.
[22] Yao, F., Müller, H.-G. and Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100, 577-590.
[23] Yao, F., Müller, H.-G. and Wang, J.-L. (2006). Functional linear regression for longitudinal data. Annals of Statistics 33, 2873-2903.
