Principal components analysis for sparsely observed correlated functional data using a kernel smoothing approach

Debashis Paul and Jie Peng
University of California, Davis

Abstract

In this paper, we consider the problem of estimating the covariance kernel and its eigenvalues and eigenfunctions from sparse, irregularly observed, noise-corrupted and (possibly) correlated functional data. We present a method based on pre-smoothing of individual sample curves through an appropriate kernel. We show that the naive empirical covariance of the pre-smoothed sample curves gives a highly biased estimator of the covariance kernel along its diagonal. We attend to this problem by estimating the diagonal and off-diagonal parts of the covariance kernel separately. We then present a practical and efficient method for choosing the bandwidth for the kernel by using an approximation to the leave-one-curve-out cross-validation score. We prove that under standard regularity conditions on the covariance kernel, and assuming i.i.d. samples, the risk of our estimator, under $L^2$ loss, achieves the optimal nonparametric rate when the number of measurements per curve is bounded. We also show that even when the sample curves are correlated in such a way that the noiseless data has a separable covariance structure, the proposed method is still consistent, and we quantify the role of this correlation in the risk of the estimator.

AMS Subject Classification: 62G20, 62H25
Keywords: Functional data analysis, principal component analysis, kernel smoothing, cross validation, consistency

1 Introduction

Noisy functional data arise frequently in various fields, for example longitudinal data analysis, chemometrics, and econometrics (Ferraty and Vieu, 2006).
Depending on how the measurements are taken, there can be two different scenarios: (i) individual curves are measured on a dense, regular grid; (ii) the measurements are observed on a sparse, and typically irregular, set of points in an interval. The first situation usually arises when the data are recorded by some automated instrument, e.g. in chemometrics, where the curves represent the spectra of certain chemical substances. The second scenario is more typical in longitudinal studies, where the individual curves could represent the level of concentration of some substance, and the measurements on the subjects may be taken only at irregular time points. In these settings, when the goal of analysis is either data compression, model building or studying covariate effects, one may want to extract information about the functional principal components (i.e., the eigenvalues and eigenfunctions of the covariance kernel). The eigenfunctions give a nice basis for representing the data, and hence are very useful in problems related to model building and prediction for functional data. For example, they have been used extensively in functional linear regression (Cardot, Ferraty and Sarda (1999), Hall and Horowitz (2007), Cai and Hall (2006)). Ramsay and Silverman (2005) and Ferraty and Vieu (2006) give extensive surveys of the applications of functional principal components. In the first scenario, i.e., data on a regular grid, as long as the individual curves are smooth, the measurement noise level is low, and the grid is dense enough, one can essentially treat the data as lying on a continuum and employ techniques similar to the ones used in classical multivariate analysis. However, the irregular nature of the data in the second scenario, and the associated measurement noise, require a different treatment.
In this paper, we propose a kernel smoothing approach to estimate the covariance kernel and its functional principal components based on sparse, irregularly observed, noise-corrupted functional data. This method is based on the pre-smoothing of individual curves, with a suitable modification of the diagonal, for estimating the covariance kernel. We prove the consistency and derive the rate of convergence of the proposed estimator. Also, under many practical circumstances the sample curves are correlated, for example in spatio-temporal data (Hlubinka and Prchal, 2007), online auction data (Peng and Müller, 2008), and time course gene expression data (Spellman et al., 1998). However, in the existing literature, most theoretical studies of principal components analysis assume i.i.d. sample curves. The analysis presented in this paper shows that the asymptotic consistency of the principal components holds for the proposed method even under certain types of correlation structures (as discussed later).

Before we go into the details of the proposed procedure, we first give an outline of the data model and an overview of different approaches to this problem. Suppose that we observe $n$ realizations of an $L^2$-stochastic process $\{X(t) : t \in [0,1]\}$ at a sequence of points on the interval $[0,1]$ (or, more generally, on an interval $[a,b]$), with additive measurement noise. That is, the observed data $\{Y_{ij} : 1 \le j \le m_i;\ 1 \le i \le n\}$ can be modeled as

$$Y_{ij} = X_i(T_{ij}) + \sigma \varepsilon_{ij}, \qquad (1)$$

where $\{\varepsilon_{ij}\}$ are i.i.d. with mean 0 and variance 1. Since $X(t)$ is an $L^2$ stochastic process, by Mercer's theorem (Ash, 1972) there exists a positive semi-definite kernel $C(\cdot,\cdot)$ such that $\mathrm{Cov}(X(s), X(t)) = C(s,t)$, and each $X_i(t)$ has the following a.s. representation in terms of the eigenfunctions of the kernel $C(\cdot,\cdot)$:

$$X_i(t) = \mu(t) + \sum_{\nu=1}^{\infty} \sqrt{\lambda_\nu}\, \psi_\nu(t)\, \xi_{i\nu}, \qquad (2)$$

where $\mu(\cdot) = E(X(\cdot))$ is the mean function; $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues of $C(\cdot,\cdot)$; $\psi_\nu(\cdot)$ are the corresponding orthonormal eigenfunctions; and the random variables $\{\xi_{i\nu} : \nu \ge 1\}$, for each $i$, are uncorrelated with zero mean and unit variance. Furthermore, we assume that for each pair $(i,j)$ with $1 \le i \ne j \le n$, the correlation is modelled by $E(\xi_{i\nu}\xi_{j\nu'}) = \delta_{\nu\nu'}\rho_{ij}$ for $1 \le \nu, \nu' \le M$, where $\rho_{ij}$ may be nonzero. This gives rise to a separable covariance structure for the noiseless data. That is, the processes $\{X_i(\cdot)\}_{i=1}^n$ satisfy $\mathrm{Cov}(X_i(s), X_j(t)) = \rho_{ij} C(s,t)$, with $\rho_{ii} \equiv 1$. This holds, for example, when the principal component scores $\{\xi_{i\nu}\}_{i=1}^n$ for different $\nu$ are i.i.d. stationary time series. Finally, in the observed data model (1), we assume that $T_i = \{T_{ij} : j = 1, \ldots, m_i\}$ are randomly sampled from a continuous distribution.

As an example that is particularly suitable for modeling within the framework presented above, we consider the data on atmospheric radiation in Hlubinka and Prchal (2007). There, the measurements are taken from balloons from Earth's surface up to an altitude of 35 km. The data points corresponding to the $i$-th balloon are of the form $(a_i, z_i)$, where $a$ represents the altitude and $z$ represents the average number of pulses at altitude $a$, which is thought to be proportional to the radiation intensity. Thus, these vertical profiles of atmospheric radiation are considered as individual realizations of functional data. That is, here the $a_i$'s are the measurement points, the $z_i$'s are the measurements, and the subjects are indexed by time.
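A minimal simulation from the data model (1)-(2) with the separable correlation structure may help fix ideas. In the sketch below, the eigenvalues, Fourier eigenfunctions, uniform design, and AR(1) score dependence are illustrative choices, not prescribed by the paper; the AR(1) scores make $\mathrm{Cov}(X_i(s), X_j(t)) = \rho^{|i-j|} C(s,t)$, a special case of the separable structure above.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sparse_fda(n=50, m=5, sigma=0.5, rho=0.6):
    """Simulate sparse, noisy curves from model (1)-(2). For each component
    nu, the scores (xi_{1,nu}, ..., xi_{n,nu}) follow a stationary AR(1)
    series across curves, giving a separable covariance structure."""
    lam = np.array([1.0, 0.5, 0.25])                 # lambda_1 >= lambda_2 >= ...
    T = rng.uniform(0.0, 1.0, size=(n, m))           # design points T_ij
    # orthonormal eigenfunctions psi_nu(t) = sqrt(2) cos(nu * pi * t) on [0,1]
    psi = np.stack([np.sqrt(2.0) * np.cos((nu + 1) * np.pi * T)
                    for nu in range(len(lam))])      # shape (K, n, m)
    # AR(1) scores across curves, independent across components
    xi = np.empty((n, len(lam)))
    xi[0] = rng.standard_normal(len(lam))
    for i in range(1, n):
        xi[i] = rho * xi[i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(len(lam))
    X = np.einsum('k,ik,kim->im', np.sqrt(lam), xi, psi)   # noiseless X_i(T_ij)
    Y = X + sigma * rng.standard_normal((n, m))            # observed Y_ij, eq. (1)
    return T, Y

T, Y = simulate_sparse_fda()
```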
Hence there is a natural dependence among the sample curves observed over different time points. Moreover, it is reasonable to assume that the dependence across time does not change with the vertical distance, except possibly through a long-term trend; i.e., the spatio-temporal covariance structure is separable.

Below we give a short overview of two existing approaches to the problem of estimating functional principal components from sparse data. Yao, Müller and Wang (2005) propose a local linear smoothing of the empirical covariances $\{\hat C_i(T_{ij}, T_{ij'}) : j \ne j'\}_{i=1}^n$, where $\hat C_i(T_{ij}, T_{ij'}) = (Y_{ij} - \hat\mu(T_{ij}))(Y_{ij'} - \hat\mu(T_{ij'}))$ and $\hat\mu$ is the estimate of the mean function $\mu(\cdot)$ obtained by local linear smoothing. They prove asymptotic consistency of this estimator and the estimated eigenfunctions, assuming i.i.d. sample curves. Hall, Müller and Wang (2006) further show that the problem of estimating the covariance kernel and that of estimating its eigenfunctions are intrinsically different, in that the former is a two-dimensional smoothing problem while the latter is a one-dimensional one, which results in different choices of optimal bandwidth. They also prove that the proposed local polynomial estimator achieves the optimal nonparametric convergence rate with the optimal choice of bandwidths, under the i.i.d. setting, when the number of measurements per curve is bounded. Instead of the local polynomial approach, where one imposes regularization on the estimates by varying the bandwidth of the kernel, one can impose regularization by restricting the eigenfunctions to a known basis of smooth functions. This approach has been used by various researchers, including Besse, Cardot and Ferraty (1997), Cardot (2000), James, Hastie and Sugar (2000) and Peng and Paul (2007).
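The first step of the Yao et al. (2005) approach, assembling the raw covariance points $\hat C_i(T_{ij}, T_{ij'})$ for $j \ne j'$, can be sketched as follows; the two-dimensional local linear smoothing of these points is omitted, and the mean estimate is passed in as a callable.

```python
import numpy as np

def raw_covariance_pairs(Ys, Ts, mu_hat):
    """Raw covariances C_i(T_ij, T_ij') = (Y_ij - mu(T_ij)) (Y_ij' - mu(T_ij'))
    for j != j', pooled over all curves; these points are the input to the
    2-D smoother of Yao et al. (2005). mu_hat is a callable estimate of mu."""
    pts, vals = [], []
    for Y, T in zip(Ys, Ts):
        r = Y - mu_hat(T)                 # centered observations for curve i
        m = len(T)
        for j in range(m):
            for jp in range(m):
                if j != jp:               # diagonal pairs j == j' are excluded
                    pts.append((T[j], T[jp]))
                    vals.append(r[j] * r[jp])
    return np.array(pts), np.array(vals)
```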
Peng and Paul (2007) propose to directly maximize the restricted log-likelihood under the working assumption of Gaussianity, such that the resulting estimator satisfies the geometry of the parameter space. This method is implemented through a Newton-Raphson algorithm on the Stiefel manifold of rectangular matrices with orthonormal columns; the latter space is the parameter space for the matrix of basis coefficients of the eigenfunctions. Furthermore, in Paul and Peng (2007) the authors prove that this restricted maximum likelihood (REML) estimator also achieves the optimal nonparametric rate when the number of measurements per sample curve is bounded and the sample curves are i.i.d.

We now give a brief description of the estimation procedure proposed in this paper. The method is partly motivated by the observation that the naive sample covariance based on the pre-smoothed individual sample curves is a highly biased estimator along the diagonal of the covariance kernel when $m_i$, the number of measurements per curve, is small. As can be seen clearly from (6) in Section 2.1, this bias does not vanish asymptotically unless $(\min_{1\le i\le n} m_i)\, h_n \to \infty$ as $n \to \infty$, where $h_n$ is the bandwidth of the kernel smoother. Under the latter setting, Hall et al. (2006) discuss the possibility of using a local linear smoother for individual sample curves and then performing a PCA on the smoothed curves. Furthermore, when the design points $T_{ij}$ are regularly spaced and sufficiently dense, they show that using conventional PCA for functional data (see the statements and conditions in Theorem 3 of that paper for details) one obtains root-$n$ consistent estimates of the eigenvalues and eigenfunctions, so that the problem is asymptotically equivalent to a parametric problem.
An interesting question is whether the naive kernel smoothing approach can be suitably modified so that it produces estimators with good asymptotic risk properties even when the $m_i$'s are relatively small. Our approach in this paper goes in this direction and involves estimating the diagonal and the off-diagonal portions separately, and then merging them together using a smooth weight kernel. The estimation of the off-diagonal portion is based on pre-smoothing individual sample curves by a linearized kernel smoother. The estimation of the diagonal part involves linearized kernel smoothing of the empirical variances. The task of selecting an appropriate bandwidth, and the number of nonzero eigenvalues, is addressed by obtaining a computationally efficient approximation to the leave-one-curve-out cross-validation score. This approximation procedure, as well as the asymptotic analysis of the estimators, is based on the perturbation theory of linear operators.

We now summarize the main contributions of this paper. Our approach of merging two separate pre-smoothed linearized kernel estimates of the diagonal and off-diagonal parts of the covariance kernel is new and is computationally very efficient. We prove that the proposed estimator achieves the optimal nonparametric rate when the observations are i.i.d. realizations of a finite dimensional smooth stochastic process and the number of measurements per curve is bounded. This result parallels the one obtained by Hall et al. (2006) for the local polynomial approach. Moreover, we obtain explicit expressions for the integrated mean squared error of the estimated eigenfunctions under a regime of separable covariance structure among the sample curves. The quantification of the role of correlation in the risk behavior (Theorem 4.2) is seemingly new in the literature in the context of functional data analysis. We also derive a lower bound on the rate of convergence of the risk of the first eigenfunction (Theorem 4.3) which is sharper than an analogous (but more general) bound obtained in Hall et al. (2006). This lower bound and the matching upper bound on the rate of convergence for the i.i.d. case show that the proposed estimator attains the optimal rate even when $\max_{1\le i\le n} m_i \to \infty$, at least under the restricted setting described in Theorem 4.3. Moreover, if the correlation between sample curves is "weak" in a suitable sense, then the optimal rates of convergence for the eigenfunctions in the correlated and i.i.d. cases are the same. Furthermore, we show that our estimation procedure also allows for a computationally efficient approximation of the leave-one-curve-out cross-validation score, which is used for selecting the bandwidth for estimating the eigenfunctions. This approximation is based on a perturbation analysis approach that is natural given the form of our estimator. In the paper, we also show that the widely used prediction error loss for cross-validation is not correctly scaled in the current context. Thus we propose to use the empirical Kullback-Leibler loss for the cross-validation criterion.

The rest of the paper is organized as follows. In Section 2, we propose the estimation procedure and contrast it with the naive kernel smoothing approach. In Section 3, we propose an approximation to the leave-one-curve-out cross-validation score based on the perturbation theory for linear operators. In Section 4, we state the main results about the consistency and rates of convergence of the estimators of the covariance kernel and its eigenfunctions.
In Section 5, we give an outline of the proofs of the main results (Theorems 4.1 and 4.2) and discuss their implications. In Section 6, we give an overview of various related issues and future research directions. The proof details are provided in the appendices.

2 Method

Throughout this section, we assume that the mean curve has been estimated separately and subtracted from the data. Thus, without loss of generality, we assume that $\mu = 0$. Also, in the asymptotic analysis carried out in Section 4, we make the same assumption to simplify the exposition. The case of arbitrary $\mu$ with a sufficient degree of smoothness can be easily handled.

2.1 Naive kernel smoothing approach

A popular method in nonparametric function estimation is to smooth the individual sample curves by a kernel averaging of the sample points. In principle, one can adopt a similar approach in the current context. This means first smoothing individual sample curves, then computing the covariance of the "pre-smoothed" sample curves, followed by an eigen-analysis of this "pre-smoothed" empirical covariance. In the following, we first describe such an approach briefly, and then show that even in the case of i.i.d. data, the estimator thus obtained has an intrinsic bias in estimating the diagonal of the covariance kernel, unless the number of measurements per curve is large.

Let $K(\cdot)$ be a summability kernel with an adequate degree of smoothness, satisfying the following conditions:

B1 (i) $\mathrm{supp}(K) = [-B_K, B_K]$ for some $B_K > 0$; (ii) $K$ is symmetric about 0; (iii) $\int K(x)\,dx = 1$; (iv) $\int xK(x)\,dx = 0$; (v) $\int K'(x)\,dx = 0$; (vi) $\int xK'(x)\,dx = -1$.

We then define the pre-smoothed sample curves as

$$\tilde X_i(t) = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}\, K_{h_{n,i}}(t - T_{ij}), \qquad i = 1,\ldots,n, \qquad (3)$$

where $K_h(x) = h^{-1}K(h^{-1}x)$ for $h > 0$, and $h_{n,i}$ is the bandwidth for the $i$-th curve. Then the empirical covariance based on the pre-smoothed curves is simply

$$\tilde C(s,t) = \frac{1}{n} \sum_{i=1}^n \tilde X_i(s)\, \tilde X_i(t). \qquad (4)$$

In the following, we derive an expression for the expectation of $\tilde C(s,t)$ as an estimator of $C(s,t)$, to quantify the bias, when $h_{n,i} = h_n$ for all $i$, under the assumption that $C(\cdot,\cdot)$ is twice continuously differentiable. Suppose for simplicity that the density of the design points $\{T_{ij}\}_{j=1}^{m_i}$, for each subject, is uniform on $[0,1]$. Define $C(t) = C(t,t)$ for $t \in [0,1]$, and $K_2(\cdot) = \int K(\cdot - u)\, K(-u)\,du$. Also, we assume that the $m_i$'s are given. In the following proposition the bounds hold as $h_n \to 0$.

Proposition 2.1. When $s \ne t$,

$$E[\tilde X_i(s)\tilde X_i(t)] = \frac{1}{m_i h_n} K_2\Big(\frac{s-t}{h_n}\Big)(C(t) + \sigma^2) + \frac{1}{m_i} C'(t) \int u K(-u)\, K\Big(\frac{s-t}{h_n} - u\Big)\,du + \Big(1 - \frac{1}{m_i}\Big) C(s,t) + \frac{1}{m_i} O(h_n) + O(h_n^2). \qquad (5)$$

And,

$$E[\tilde X_i(t)^2] = \frac{1}{m_i h_n} K_2(0)(C(t) + \sigma^2) + \Big(1 - \frac{1}{m_i}\Big) C(t) + \frac{1}{m_i} O(h_n) + O(h_n^2). \qquad (6)$$

The $O(\cdot)$ terms involve $\sup_{t\in[0,1]} |C''(t)|$, $\sup_{s,t\in[0,1]} \|D^2 C(s,t)\|$ and $\int u^2 K(u)\,du$, where $D^2$ is the Hessian operator.

By Proposition 2.1, it is easy to see that $E[\tilde X_i(s)\tilde X_i(t)] = (1 - \frac{1}{m_i})\, C(s,t) + O(h_n^2)$ if $|s-t| > 2B_K h_n$, since the first two terms in (5) vanish, as does the $O(h_n)$ term (see the proof in Appendix C for more details). This shows that $\tilde C(s,t)$ should be multiplied by $m_i/(m_i - 1)$ to get rid of the trivial bias. However, (5) and (6) also show that the empirical covariance $\tilde C(s,t)$ is a highly biased estimate of $C(s,t)$ near the diagonal even after this trivial modification, unless $h_n \min_{1\le i\le n} m_i \to \infty$.
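The diagonal inflation in (6) is easy to see numerically. The sketch below implements the pre-smoothing (3) and the naive covariance (4) with an Epanechnikov kernel (an illustrative choice with the compact support and symmetry required by B1) on pure-noise data, for which $C \equiv 0$; the naive estimate is nevertheless far from zero along the diagonal.

```python
import numpy as np

def presmooth(T, Y, h, grid):
    """Pre-smoothed curves (3): X~_i(t) = (1/m_i) sum_j Y_ij K_h(t - T_ij),
    with K the Epanechnikov kernel (illustrative choice)."""
    K = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)      # support [-1, 1]
    u = (grid[None, None, :] - T[:, :, None]) / h          # shape (n, m, L)
    return (Y[:, :, None] * K(u) / h).mean(axis=1)         # shape (n, L)

def naive_cov(T, Y, h, grid):
    """Naive empirical covariance (4) of the pre-smoothed curves."""
    Xs = presmooth(T, Y, h, grid)
    return Xs.T @ Xs / Xs.shape[0]

rng = np.random.default_rng(1)
n, m, h = 500, 5, 0.1
T = rng.uniform(0, 1, (n, m))
Y = rng.standard_normal((n, m))        # pure noise: C(s,t) = 0, sigma^2 = 1
grid = np.linspace(0.05, 0.95, 19)
C_naive = naive_cov(T, Y, h, grid)
# Per (6), the diagonal is inflated by roughly K_2(0) sigma^2 / (m h) > 0,
# while entries with |s - t| > 2 B_K h are nearly unbiased (here ~ 0).
diag_mean = np.mean(np.diag(C_naive))
far = np.abs(grid[:, None] - grid[None, :]) > 2 * h
off_mean = np.mean(C_naive[far])
```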
This is because the first terms in (5) and (6) are always positive along the diagonal (i.e., when $|s-t| < 2B_K h_n$), which results in overestimation. In fact, the degree of overestimation becomes large (of order $(m_i h_n)^{-1}$) as soon as $|s-t| < 2B_K h_n$. This demonstrates clearly that the naive kernel smoothing approach is intrinsically biased and needs to be appropriately modified. To understand the reason for this bias, notice that if a pair of points $(T_{ij}, T_{ij'})$, for some $1 \le j \ne j' \le m_i$, is randomly sampled from $[0,1]^2$, then it has probability of order $O(h_n^2)$ of falling in a neighborhood of length and width $h_n$ around a given point $(s,t)$ away from the diagonal. In contrast, a randomly chosen point $T_{ij}$ has probability $O(h_n)$ of falling in a neighborhood of length $h_n$ of the point $(t,t)$ along the diagonal. Therefore, measurements are much denser along the diagonal, and this explains the difference in rates.

2.2 Modification to naive kernel smoothing

In this section, we propose a modification to deal with the bias in the naive kernel smoothing approach described in Section 2.1. We propose to remedy the effect of the unequal scale along the diagonal of the covariance kernel (and the resulting bias) by estimating the diagonal and off-diagonal parts separately. We then use a suitable (smooth) weight kernel to combine these two estimates. Throughout the paper, we assume that the density of the time points $\{T_{ij}\}$ is known and is denoted by $g(\cdot)$; in practice we can estimate $g$ from the data separately. We further assume that there are constants $0 < c_0 \le c_1 < \infty$ such that $c_0 \le g(\cdot) \le c_1$. We also propose to use a linearized version of the kernel smoothing to reduce the bias while controlling the variance.
For this purpose, define $Q(s,t)$ to be a tensor-product kernel (that is, a kernel of the form $Q(s,t) = Q(s)Q(t)$ for some smooth function $Q$) with the following properties, together referred to as condition B2:

(i) $Q$ is supported on $[-C_Q, C_Q]$ for some $C_Q > 0$, and $Q(\cdot) \ge 0$; (ii) $\|Q\|_\infty < \infty$; (iii) $\sum_{k\in\mathbb{Z}} Q(x - k) = 1$; (iv) $Q$ is symmetric about 0.

Property (iii) can be rephrased as saying that the integer translates of $Q$ form a partition of unity. As an example, the B-spline basis functions (Chui, 1987) satisfy all four properties. Let $Q_h(\cdot,\cdot)$ denote the kernel $Q(h^{-1}\cdot, h^{-1}\cdot)$.

For estimation of the diagonal $C(t) = C(t,t)$, let $\hat C(t) := \hat C_*(t) - \hat\sigma^2$, where $\hat\sigma^2$ is an estimator of $\sigma^2$ (discussed in Section 2.3), and $\hat C_*(t)$ is the estimate of $C(t) + \sigma^2$ obtained by a linearized kernel smoothing of the terms $\{\frac{1}{m_i} Y_{ij}^2 : j = 1,\ldots,m_i;\ i = 1,\ldots,n\}$. This is because, for each pair $(i,j)$, the conditional expectation of $Y_{ij}^2$ (given $T_i$ and $m_i$) is $C(T_{ij}, T_{ij}) + \sigma^2$. Define a grid on $[0,1]$ with grid spacing $h_n$ and denote the grid points by $\{s_l : l = 1,\ldots,L_n\}$, where $L_n = c_L h_n^{-1}$ for an appropriately chosen $c_L \approx 1$. Then define

$$\hat C_{*,h_n}(t) = \frac{1}{g(t)} \frac{1}{n} \sum_{i=1}^n \sum_{l=1}^{L_n} \big[ S_i(s_l) + (t - s_l)\, S_i'(s_l) \big]\, Q_{h_n}(t - s_l), \qquad (7)$$

with

$$S_i(s) = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}^2\, K_{h_n}(s - T_{ij}). \qquad (8)$$

Note that (7) is a linearized version of conventional kernel smoothing, which can be interpreted as a local linear smoothing of the empirical variances. A similar principle is applied to construct an estimator of the off-diagonal part (see (9) below).
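A minimal sketch of (7)-(8), assuming a uniform design density $g \equiv 1$, an Epanechnikov $K$, and the linear B-spline (hat function) for $Q$, whose integer translates form a partition of unity as required by B2(iii); these kernel choices are illustrative.

```python
import numpy as np

def linearized_variance_smoother(T, Y, h, t_eval, g=1.0):
    """Linearized kernel estimate (7) of C(t) + sigma^2. S_i and its derivative
    S_i' from (8) are evaluated on a grid with spacing h, extended linearly
    around each grid point, and tied together by Q(u) = max(1 - |u|, 0)."""
    K  = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)       # Epanechnikov
    Kp = lambda u: -1.5 * u * (np.abs(u) <= 1.0)            # derivative K'
    Q  = lambda u: np.maximum(1.0 - np.abs(u), 0.0)         # linear B-spline
    s = np.arange(0.0, 1.0 + h / 2, h)                      # grid points s_l
    u = (s[None, None, :] - T[:, :, None]) / h              # (n, m, L)
    S  = (Y[:, :, None]**2 * K(u) / h).mean(axis=1)         # S_i(s_l)
    Sp = (Y[:, :, None]**2 * Kp(u) / h**2).mean(axis=1)     # S_i'(s_l)
    # average over curves, then evaluate the local linear extension at t
    lin = S.mean(0)[None, :] + (t_eval[:, None] - s[None, :]) * Sp.mean(0)[None, :]
    w = Q((t_eval[:, None] - s[None, :]) / h)               # Q_{h}(t - s_l)
    return (w * lin).sum(axis=1) / g

rng = np.random.default_rng(2)
n, m, h = 500, 5, 0.1
T = rng.uniform(0, 1, (n, m))
Y = rng.standard_normal((n, m))     # C = 0, sigma^2 = 1: target curve is 1
est = linearized_variance_smoother(T, Y, h, np.array([0.3, 0.5, 0.7]))
```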
The linearization has two advantages: on one hand, it helps reduce the bias of the estimate; on the other hand, it facilitates efficient computation, both for estimation and for model selection. The main difference between this linearization approach and local linear smoothing lies in the fact that we use $g(t)$ (or an estimate of $g(t)$) in the denominator, while in local linear smoothing the denominator is implicitly a local estimate of $g$ obtained by averaging the smoothing kernel in a neighborhood of $t$. Note that, as opposed to our estimator of $g$, which uses a different bandwidth than the one for estimating the covariance, local linear smoothing essentially uses the same bandwidth for estimating both $g$ and $C$, and thus it suffers from instability. More specifically, the local linear estimator of Yao et al. (2005) involves ratios whose denominator is essentially the number of time points falling in a small interval. Since the time points are assumed to be randomly distributed and are sparse, in practice this can cause instability.

Let $\tilde X_i(t)$ be the $i$-th smoothed sample curve as defined in (3), and $\tilde X_i'(t)$ be the derivative of $\tilde X_i(t)$. Then define the estimate of the off-diagonal part as (with a slight abuse of notation)

$$\tilde C_{h_n}(s,t) = \frac{1}{g(s)g(t)} \frac{1}{n} \sum_{i=1}^n w(m_i) \sum_{l,l'=1}^{L_n} \Big[ \big(\tilde X_i(s_l) + (s - s_l)\tilde X_i'(s_l)\big) \cdot \big(\tilde X_i(s_{l'}) + (t - s_{l'})\tilde X_i'(s_{l'})\big) \Big]\, Q_{h_n}(s - s_l, t - s_{l'}). \qquad (9)$$

Here $w(m_i) = \frac{m_i}{m_i - 1}$ is a weight function determined through an asymptotic bias analysis (Proposition 2.1). Note that, as long as $|s - t| \ge A h_n$ for some constant $A$ depending on $B_K$ and $C_Q$, the terms with $l = l'$ are absent from the inner sum in (9).
Therefore, according to our analysis in the previous section, they do not contribute anything by way of bias. Now let $W(\cdot,\cdot)$ be a weight kernel on the domain $[0,1]^2$ defined as

$$W(s,t) := W(s-t) = \begin{cases} 0 & \text{if } |s-t| > \tfrac12, \\ 1 & \text{if } |s-t| \le \tfrac12. \end{cases} \qquad (10)$$

Define $W_{\tilde h_n}(s,t) = W((s-t)/\tilde h_n)$ and $\overline W_{\tilde h_n}(s,t) = 1 - W_{\tilde h_n}(s,t)$, where $\tilde h_n = A h_n$ for the above $A > 0$. We then smooth the kernels $W_{\tilde h_n}$ and $\overline W_{\tilde h_n}$ by convolving them with a Gaussian kernel $G_{\tau_n}(\cdot)$ with a small bandwidth $\tau_n$ (in the sense that $\tau_n = o(h_n)$), and, with an abuse of notation, denote the resulting kernels also by $W_{\tilde h_n}$ and $\overline W_{\tilde h_n}$, respectively. Finally, we are ready to define the proposed combined estimator of $C(s,t)$ as

$$\hat C_{c,h_n}(s,t) = \overline W_{\tilde h_n}(s,t)\, \tilde C_{h_n}(s,t) + W_{\tilde h_n}(s,t)\, \max\Big\{ \hat C_{h_n}\Big(\frac{s+t}{2}\Big),\, h_n^2 \Big\}, \qquad (11)$$

where $\hat C_{h_n}(\cdot) := \hat C_{*,h_n}(\cdot) - \hat\sigma^2$. The maximum in the second term simply guarantees that the estimator of the diagonal is nonnegative and that the bias is $O(h_n^2)$.

We now briefly discuss the computational aspects of the proposed estimator. A key step is the computation of the functions $S_i(\cdot)$ and $\tilde X_i(\cdot)$ and their derivatives at the grid points $s_l$, $l = 1,\ldots,L_n$. Each of these computations requires $O(m_i)$ floating point operations (for each $i = 1,\ldots,n$). From these, we obtain $\tilde C_{h_n}(s,t)$ and $\hat C_{*,h_n}(t)$ by using (9) and (7), respectively. Both expressions are in the form of discrete convolutions, and hence can be computed very rapidly using the Fast Fourier Transform. Thus, the estimation procedure is computationally very efficient, requiring $O(n m L_n \log L_n)$ computations on the whole grid, where $m = \max_i m_i$.

2.3 Estimation of $\sigma^2$

Here we briefly outline a method for estimating the error variance $\sigma^2$.
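Stepping back to the combined estimator, the blending step in (11) can be sketched as follows; the diagonal and off-diagonal inputs are taken as given (e.g. from (7) and (9)), and the constants $A$, $\tau$, the grid, and the stand-in surfaces are illustrative choices, not the paper's.

```python
import numpy as np

def combined_estimator(C_off, C_diag, grid, h, A=2.0, tau=0.01):
    """Blend per (11): near the diagonal use max{C_diag((s+t)/2), h^2}, away
    from it use the off-diagonal surface C_off. The indicator weight (10),
    rescaled to the band |s-t| <= A h / 2, is smoothed by convolving its 1-D
    profile with a Gaussian of bandwidth tau (meant to be o(h))."""
    S, Tg = np.meshgrid(grid, grid, indexing="ij")
    d = S - Tg
    # Gaussian-smoothed version of W((s-t)/(A h)) via a 1-D discrete convolution
    dd = np.linspace(d.min(), d.max(), 801)
    prof = (np.abs(dd) <= 0.5 * A * h).astype(float)
    gk = np.exp(-0.5 * (dd / tau) ** 2)
    gk /= gk.sum()
    W = np.interp(d, dd, np.convolve(prof, gk, mode="same"))
    Cd = np.interp(0.5 * (S + Tg), grid, np.maximum(C_diag, h**2))
    return W * Cd + (1.0 - W) * C_off

grid = np.linspace(0, 1, 41)
h = 0.05
C_off = np.minimum.outer(grid, grid)   # stand-in off-diagonal surface
C_diag = grid + 0.3                    # stand-in diagonal estimate
C_hat = combined_estimator(C_off, C_diag, grid, h)
```

On the diagonal the result follows the (floored) diagonal estimate, and far from the diagonal it reproduces the off-diagonal surface, with a smooth transition in between.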
The method is similar to the approach taken in Yao, Müller and Wang (2006), and hence we omit the details. First, for a given bandwidth $h_n$, we estimate the function $C(s,t)$ for $|s-t| > A h_n$, for some $A$ depending on $B_K$ and $C_Q$, using (9). Then, as in Yao et al. (2006), we estimate the diagonal $\{C(t) : t \in [0,1]\}$ using an oblique linear interpolation,

$$\hat C_{0,h_n}(t) = \int_{A_1}^{A_2} \frac12 \big( \tilde C_{h_n}(t - u h_n, t + u h_n) + \tilde C_{h_n}(t + u h_n, t - u h_n) \big)\, d\tilde G(u), \qquad (12)$$

for some probability distribution function $\tilde G$ supported on $[A_1, A_2]$, where $A_1 > A$. On the other hand, we estimate the curve $\{C(t) + \sigma^2 : t \in [0,1]\}$ by $\hat C_{*,h_n}(t)$ defined in (7). Now, we estimate $\sigma^2$ by

$$\hat\sigma^2 = \frac{1}{T_1 - T_0} \int_{T_0}^{T_1} \big( \hat C_{*,h_n}(t) - \hat C_{0,h_n}(t) \big)\, dt, \qquad (13)$$

where $0 < T_0 < T_1 < 1$. It can be shown (Corollary 4.1 in Section 4) that the estimator $\hat\sigma^2$ thus obtained is consistent for an appropriate choice of $h_n$.

3 Bandwidth selection

The choice of optimal bandwidth for the kernel is a key step in any kernel-based estimation procedure. Yao et al. (2005) use a leave-one-curve-out cross-validation score based on the prediction error for selecting the bandwidth of the smoother, and an AIC approach for selecting the number of nonzero eigenvalues. However, leave-one-curve-out cross-validation is computationally very expensive. Also, as shown below, the prediction error loss is not an appropriate criterion for cross-validation in the current context. Therefore, in this paper we address the issue of model selection by producing an approximation to the leave-one-curve-out cross-validation score based on the empirical Kullback-Leibler loss. The approximation is based on the idea that the estimator obtained by dropping any single curve is a small perturbation of the estimator based on the whole data (Peng and Paul, 2007).
In particular, we use the perturbation theory of linear operators to quantify this perturbation and produce a first order approximation to the CV score that is computationally efficient. It also enables us to select the bandwidth and the dimension of the process simultaneously.

We first discuss the choice of loss function, which is very important for a cross-validation scheme. We want to point out that the prediction problem is intrinsically different from the estimation of the covariance kernel. We find that the criterion based on the prediction error loss is not correctly scaled, as opposed to the one based on the empirical Kullback-Leibler loss. To make this point clear, we examine these two cross-validation criteria in detail. Define $Y_i = (Y_{ij})_{j=1}^{m_i}$, $\mu_i = (\mu(T_{ij}))_{j=1}^{m_i}$, $\psi_{i\nu} = (\psi_\nu(T_{ij}))_{j=1}^{m_i}$. We assume that the covariance kernel can be represented using $K$ orthonormal eigenfunctions for some $K \ge 1$. Then the leave-one-curve-out cross-validation score based on the prediction error loss is given by

$$CV(K, h_n) = \sum_{i=1}^n \sum_{j=1}^{m_i} \big( Y_{ij} - \hat Y_i^{(-i)}(T_{ij}) \big)^2. \qquad (14)$$

Here $\hat Y_i^{(-i)}(t) = \hat\mu^{(-i)}(t) + \sum_{\nu=1}^K \hat\xi_{i\nu}^{(-i)} \hat\psi_\nu^{(-i)}(t)$, where $\hat\mu^{(-i)}(t)$ and $\hat\psi_\nu^{(-i)}(t)$ are the estimates of $\mu(t)$ and $\psi_\nu(t)$ computed from the observations $\{Y_{i'}\}_{i' \ne i}$, and $\hat\xi_{i\nu}^{(-i)}$ is the estimated principal component score based on the observations $\{Y_{i'}\}_{i' \ne i}$. Note that the estimated principal component scores $\hat\xi_{i\nu}^{(-i)}$ can be obtained through the procedure described in Yao et al. (2005), even though this will not be necessary for the model selection procedure we shall adopt.
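The score (14) has the following shape in code, with the leave-one-out estimates passed in as callables; the container layout and names here are hypothetical stand-ins for the fitted quantities, not part of the paper's procedure.

```python
import numpy as np

def prediction_error_cv(Ys, Ts, loo_fits):
    """Leave-one-curve-out score (14):
    sum_i sum_j (Y_ij - Yhat_i^{(-i)}(T_ij))^2, where
    Yhat_i^{(-i)}(t) = mu^{(-i)}(t) + sum_nu xi_{i,nu}^{(-i)} psi_nu^{(-i)}(t).
    loo_fits[i] = (mu, psis, xis): mu and each psi are callables fitted
    without curve i, xis are the corresponding estimated scores."""
    cv = 0.0
    for Y, T, (mu, psis, xis) in zip(Ys, Ts, loo_fits):
        Yhat = mu(T) + sum(xi * psi(T) for xi, psi in zip(xis, psis))
        cv += np.sum((Y - Yhat) ** 2)
    return cv
```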
On the other hand, the CV score based on the empirical Kullback-Leibler loss is given by
\[
CV^*(K, h_n) = \sum_{i=1}^n \ell_i\big(Y_i;\, \hat\mu_i^{(-i)}, \hat\Sigma_{i,K}^{(-i)}\big), \qquad (15)
\]
where $\hat\Sigma_{i,K}^{(-i)} = \sum_{\nu=1}^K \hat\lambda_\nu^{(-i)} \hat\psi_{i\nu}^{(-i)} (\hat\psi_{i\nu}^{(-i)})^T + \hat\sigma^{2(-i)} I_{m_i}$, and $\ell_i$ is (up to an additive constant) the negative log-likelihood of the $i$-th observation under the working assumption of Gaussianity, namely
\[
\ell_i(Y_i; \mu_i, \Sigma_i) = \frac12 \log|\Sigma_i| + \frac12 \operatorname{tr}\big(\Sigma_i^{-1}(Y_i - \mu_i)(Y_i - \mu_i)^T\big).
\]
To gain an understanding of what these CV scores are approximating, we assume that we have two independent samples, each with $n$ i.i.d. sample curves. Furthermore, to simplify exposition, we assume that $\mu \equiv 0$. Suppose that the estimates $\hat\Psi = \{\hat\psi_\nu\}_{\nu=1}^K$ and $\hat\Lambda = \{\hat\lambda_\nu\}_{\nu=1}^K$ are obtained from the first sample. Then a leave-one-curve-out CV score can be reasonably approximated by substituting these estimates in the corresponding empirical loss function based on the second sample; with an abuse of notation we also denote this quantity by $CV$. If $\ell_i(\Psi, \Lambda)$ denotes the loss function corresponding to the $i$-th observation in the second sample, then the CV score is given by $\frac1n \sum_{i=1}^n \ell_i(\hat\Psi, \hat\Lambda)$. For simplicity, we assume that there is a true model $(\Psi^*, \Lambda^*)$ within the class of models we are considering. A first order expansion of the difference between the CV scores under the true and estimated parameters for the empirical Kullback-Leibler loss shows that, with high probability,
\[
\frac1n \sum_{i=1}^n \ell_i(\hat\Psi, \hat\Lambda) - \frac1n \sum_{i=1}^n \ell_i(\Psi^*, \Lambda^*)
= \frac{1}{4n} \sum_{i=1}^n \big\| \hat\Sigma_i^{-1/2} (\Sigma_{*i} - \hat\Sigma_i) \hat\Sigma_i^{-1/2} \big\|_F^2 \,(1 + o(1))
+ O\left( \sqrt{\frac{\log n}{n}} \left[ \frac1n \sum_{i=1}^n \big\| \Sigma_{*i}^{1/2} (\hat\Sigma_i^{-1} - \Sigma_{*i}^{-1}) \Sigma_{*i}^{1/2} \big\|_F^2 \right]^{1/2} \right), \qquad (16)
\]
where $\|\cdot\|_F$ is the Frobenius norm, and $\Sigma_{*i}$ and $\hat\Sigma_i$ are the covariance matrices of the observations $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$ corresponding to the true parameter $(\Psi^*, \Lambda^*)$ and the estimates $(\hat\Psi, \hat\Lambda)$, respectively. Since we can essentially ignore the $O(\cdot)$ term in (16) as long as $\frac1n \sum_{i=1}^n \|\hat\Sigma_i^{-1/2}(\Sigma_{*i} - \hat\Sigma_i)\hat\Sigma_i^{-1/2}\|_F^2$ is not too small, (16) gives a quadratic approximation to the CV score. Notice that, in each term within the summation of this quadratic approximation, directions with high variability are down-weighted by the multiplicative factors $\hat\Sigma_i^{-1/2}$. Therefore this CV score based on the empirical Kullback-Leibler loss is properly scaled. Moreover, note that approximation (16) does not really depend on Gaussianity but only on the tails of the distributions involved.

On the other hand, it can be shown by simple algebra that, up to a multiplicative factor, the CV score based on the prediction error loss is $CV = \frac1n \sum_{i=1}^n \tilde\ell_i(\Psi, \Lambda)$, where $\tilde\ell_i(\Psi, \Lambda) = \operatorname{tr}(\hat\Sigma_i^{-2} S_i)$ and $S_i = (Y_i - \hat\mu_i)(Y_i - \hat\mu_i)^T$ is the empirical covariance matrix corresponding to the $i$-th observation vector. The corresponding difference of the CV scores between estimated and true parameters becomes (ignoring the multiplicative constant)
\[
\frac1n \sum_{i=1}^n \tilde\ell_i(\hat\Psi, \hat\Lambda) - \frac1n \sum_{i=1}^n \tilde\ell_i(\Psi^*, \Lambda^*)
= \frac1n \sum_{i=1}^n \operatorname{tr}\big[(\hat\Sigma_i^{-2} - \Sigma_{*i}^{-2}) \Sigma_{*i}\big] + \frac1n \sum_{i=1}^n \operatorname{tr}\big[(\hat\Sigma_i^{-2} - \Sigma_{*i}^{-2})(S_i - \Sigma_{*i})\big]
= \frac1n \sum_{i=1}^n \operatorname{tr}\big[\Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2}\big] + O\left( \sqrt{\frac{\log n}{n}} \left[ \frac1n \sum_{i=1}^n \big\|\Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2}\big\|_F^2 \right]^{1/2} \right) \qquad (17)
\]
with high probability. Here $A_i = \Sigma_{*i}^{1/2} \hat\Sigma_i^{-1} \Sigma_{*i}^{1/2}$, which is already properly scaled. Therefore, from (17) it is clear that this CV score itself is not correctly scaled. Also, the expression $\frac1n \sum_{i=1}^n \operatorname{tr}[\Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2}]$ appearing in (17) is not necessarily nonnegative.
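The scaling contrast between the two losses can also be seen numerically. The sketch below (an illustration, not part of the paper's procedure) compares the population versions of the two criteria, $\mathbb{E}\,\ell_i(\Sigma) = \frac12\log|\Sigma| + \frac12\operatorname{tr}(\Sigma^{-1}\Sigma_{*i})$ and $\mathbb{E}\,\tilde\ell_i(\Sigma) = \operatorname{tr}(\Sigma^{-2}\Sigma_{*i})$, along a hypothetical one-parameter family $\Sigma = c\,\Sigma_{*i}$ (chosen only for this example): the Kullback-Leibler criterion is minimized at $c = 1$, i.e. at the true covariance, while the prediction-error criterion keeps decreasing as $c$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((m, m))
Sigma_star = A @ A.T + m * np.eye(m)      # true covariance of one observation vector

def kl_loss(Sigma):
    # population version of l_i: (1/2) log|Sigma| + (1/2) tr(Sigma^{-1} Sigma*)
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * logdet + 0.5 * np.trace(np.linalg.solve(Sigma, Sigma_star))

def pred_loss(Sigma):
    # population version of the prediction error loss: tr(Sigma^{-2} Sigma*)
    Sinv = np.linalg.inv(Sigma)
    return np.trace(Sinv @ Sinv @ Sigma_star)

# Scan Sigma = c * Sigma_star: the KL criterion is minimized at c = 1,
# while the prediction-error criterion decreases monotonically in c.
cs = np.linspace(0.5, 3.0, 26)
kl = [kl_loss(c * Sigma_star) for c in cs]
pe = [pred_loss(c * Sigma_star) for c in cs]
print(cs[int(np.argmin(kl))], cs[int(np.argmin(pe))])
```

Along this family the KL criterion equals $\frac12 m\log c + m/(2c) + \text{const}$, with a unique minimum at $c = 1$, whereas the prediction-error criterion equals $c^{-2}\operatorname{tr}(\Sigma_{*i}^{-1})$, which has no interior minimum.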
This means that the prediction error loss does not enjoy the pleasing property of the Kullback-Leibler loss that the minimum of the expected loss occurs at the true parameter. Hence the use of the prediction error loss is not recommended for the current problem.

3.1 First order approximation

Direct computation of the criterion $CV^*(K, h_n)$ (equation (15)) is a laborious process, since we need to compute $\hat C_c^{(-i)}(s,t)$ and perform its eigen-analysis for every $i = 1, \ldots, n$. Therefore, we propose to approximate $CV^*(K, h_n)$ by using a first order approximation to the quantities $\hat\mu_i^{(-i)}$, $\hat\psi_\nu^{(-i)}(\cdot)$ and $\hat\lambda_\nu^{(-i)}$ around the estimates $\hat\mu_i$, $\hat\psi_\nu(\cdot)$ and $\hat\lambda_\nu$, respectively. The approximation of the eigenfunctions and eigenvalues is based on a perturbation analysis approach. The key idea is that the leave-one-curve-out estimator $\hat C_c^{(-i)}$ of the covariance can be viewed as a perturbation of the linear operator $\hat C_c$. The key component is Proposition 3.1, which uses a result on perturbation of eigenfunctions of a linear operator (Lemma 7.1 in Appendix A). Note that our approximation scheme can also be applied to CV scores based on other loss functions, such as $CV(K, h_n)$. Using Lemma 7.1, we can obtain a first order approximation to the quantities $\hat\psi_{i\nu}^{(-i)}$ and $\hat\lambda_\nu^{(-i)}$ that depends on the observations through a term that is linear in $\Delta_i(s,t) = \hat C_c(s,t) - \hat C_c^{(-i)}(s,t)$ (for convenience we omit $h_n$ in the notation). Since the latter quantity has a rather simple expression involving essentially only the $i$-th observation, this step substantially reduces the computational burden of the cross-validation procedure.

Proposition 3.1.
For the proposed estimator $\hat C_c$ given by (11), we have:

(i)
\[
\hat\psi_{i\nu}^{(-i)} - \hat\psi_{i\nu} = \big(\hat\psi_\nu^{(-i)}(T_{ij}) - \hat\psi_\nu(T_{ij})\big)_{j=1}^{m_i} \approx \big((\hat H_\nu \Delta_i \hat\psi_\nu)(T_{ij})\big)_{j=1}^{m_i}; \qquad (18)
\]

(ii)
\[
\hat\lambda_\nu^{(-i)} - \hat\lambda_\nu \approx -\operatorname{tr}(\hat P_\nu \Delta_i); \qquad (19)
\]

where

(a) $\hat P_\nu = \hat\psi_\nu \otimes \hat\psi_\nu$, where, for $f, g \in L^2([0,1])$, $f \otimes g$ denotes the integral operator with kernel $f(x)g(y)$, which acts on any $w \in L^2([0,1])$ as $(f \otimes g)(w)(x) = \big(\int_0^1 g(y) w(y)\,dy\big) f(x)$;

(b)
\[
\hat H_\nu(t, u) = \sum_{k \ne \nu}^K \frac{1}{\hat\lambda_k - \hat\lambda_\nu}\, \hat\psi_k(t)\hat\psi_k(u) - \frac{1}{\hat\lambda_\nu}\Big(\delta(t-u) - \sum_{k=1}^K \hat\psi_k(t)\hat\psi_k(u)\Big)
= \sum_{k \ne \nu}^K \frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\, \hat\psi_k(t)\hat\psi_k(u) + \frac{1}{\hat\lambda_\nu}\hat\psi_\nu(t)\hat\psi_\nu(u) - \frac{1}{\hat\lambda_\nu}\delta(t-u), \qquad (20)
\]
with $\delta$ being the Dirac $\delta$-function, i.e., $\int \delta(t-u) w(u)\,du = w(t)$ for any smooth $w \in L^2([0,1])$. Here $\operatorname{tr}(\hat P_\nu \Delta_i)$ and $(\hat H_\nu \Delta_i \hat\psi_\nu)(t)$ are defined as follows:

(a')
\[
\operatorname{tr}(\hat P_\nu \Delta_i) = \int \hat\psi_\nu(u)\, \Delta_i(u,v)\, \hat\psi_\nu(v)\,du\,dv; \qquad (21)
\]

(b')
\[
(\hat H_\nu \Delta_i \hat\psi_\nu)(t) = \iint \hat H_\nu(t,u)\, \Delta_i(u,v)\, \hat\psi_\nu(v)\,du\,dv. \qquad (22)
\]

Also,

(iii)
\[
\hat\mu^{(-i)}(t) - \hat\mu(t) = \frac{1}{n-1}\hat\mu(t) - \frac{1}{n-1}\,\frac{1}{m_i}\sum_{j=1}^{m_i} Y_{ij}\, K_{h_\mu}(t - T_{ij}), \qquad (23)
\]
where $\hat\mu(t) = \frac1n \sum_{i=1}^n \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}\, K_{h_\mu}(t - T_{ij})$, with $h_\mu$ being the bandwidth for estimating $\mu$ (chosen separately).
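In the finite-dimensional (matrix) analog, the content of (18)-(19) is the classical first-order eigen-perturbation expansion: $\Delta v_r \approx -H_r B v_r$ and $\Delta\lambda_r \approx \operatorname{tr}(P_r B)$. A minimal numpy check (a generic symmetric matrix stands in for the discretized covariance operator, so no Dirac-$\delta$ term is needed; all names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Symmetric operator A with well-separated eigenvalues 1, 2, ..., 6,
# standing in for (a discretization of) the covariance operator
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam_true = np.arange(1.0, n + 1.0)
A = Q @ np.diag(lam_true) @ Q.T

# Small symmetric perturbation B (the analog of -Delta_i in Proposition 3.1)
E = 1e-3 * rng.standard_normal((n, n))
B = (E + E.T) / 2

lam, V = np.linalg.eigh(A)            # eigenvalues in ascending order
r = n - 1                             # index of the largest eigenvalue
v_r = V[:, r]

# H_r(A) = sum_{s != r} (lambda_s - lambda_r)^{-1} v_s v_s^T  (cf. Lemma 7.1)
H_r = sum(np.outer(V[:, s], V[:, s]) / (lam[s] - lam[r])
          for s in range(n) if s != r)

# First-order predictions: eigenvector shift -H_r B v_r, eigenvalue shift tr(P_r B)
dv_pred = -H_r @ (B @ v_r)
dlam_pred = v_r @ B @ v_r

# Exact perturbed eigen-structure, sign-aligned as in the modified L2 loss
lam2, V2 = np.linalg.eigh(A + B)
v_r2 = V2[:, r] * np.sign(V2[:, r] @ v_r)

err_v = np.linalg.norm((v_r2 - v_r) - dv_pred)     # residual is O(||B||^2)
err_lam = abs((lam2[r] - lam[r]) - dlam_pred)      # residual is O(||B||^2)
print(err_v, err_lam)
```

With $\|B\| \sim 10^{-3}$ and unit eigen-gaps, both residuals are of order $\|B\|^2$, i.e. several orders of magnitude smaller than the first-order shifts, which is exactly the accuracy Proposition 3.1 relies on.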
After we obtain the approximations for $\hat\psi_{i\nu}^{(-i)}$ and $\hat\lambda_\nu^{(-i)}$ from Proposition 3.1, we plug them back into equation (15) for $CV^*(K, h_n)$ to obtain the final approximation of the CV score, denoted by $\widetilde{CV}^*(K, h_n)$:
\[
\widetilde{CV}^*(K, h_n) = \frac{1}{2n}\sum_{i=1}^n \log|\tilde\Sigma_i| + \frac{1}{2n}\sum_{i=1}^n \operatorname{tr}\big(\tilde\Sigma_i^{-1}(Y_i - \hat\mu_i^{(-i)})(Y_i - \hat\mu_i^{(-i)})^T\big),
\]
where $\tilde\Sigma_i = \sum_{\nu=1}^K \tilde\lambda_{i\nu}\tilde\psi_{i\nu}\tilde\psi_{i\nu}^T + \hat\sigma^{2(-i)} I_{m_i}$, with $\tilde\lambda_{i\nu} = \hat\lambda_\nu - \operatorname{tr}(\hat P_\nu \Delta_i)$ and $\tilde\psi_{i\nu} = \hat\psi_{i\nu} + \big((\hat H_\nu \Delta_i \hat\psi_\nu)(T_{ij})\big)_{j=1}^{m_i}$, and $\hat\mu_i^{(-i)} = (\hat\mu^{(-i)}(T_{ij}))_{j=1}^{m_i}$, with $\hat\mu^{(-i)}$ given by (23). An expression for $\hat\sigma^{2(-i)} - \frac{n}{n-1}\hat\sigma^2$ is easily obtained by using (7), (12) and (13). Note that this step does not require any extra computation beyond that for computing $\hat\sigma^2$.

Observe that our objective in minimizing the criterion $\widetilde{CV}^*(K, h_n)$ is to estimate the number of nonzero eigenvalues and to select an appropriate bandwidth for estimating the eigenfunctions. If instead the objective is to select an appropriate bandwidth for estimating the covariance kernel, we can do so by replacing the term $\sum_{\nu=1}^K \tilde\lambda_{i\nu}\tilde\psi_{i\nu}\tilde\psi_{i\nu}^T$ in the definition of $\tilde\Sigma_i$ with the leave-one-curve-out estimate of the covariance kernel, viz. $\hat C_{c,h_n}^{(-i)}$ evaluated at the design points, and minimizing the corresponding CV criterion. This distinction is important since the theoretical results (Theorems 4.1 and 4.2) show that the optimal rates for the bandwidth $h_n$ are different for estimating the covariance kernel and for estimating its eigenfunctions.

3.2 Representation of $\hat H_\nu \Delta_i \hat\psi_\nu$ and $\operatorname{tr}(\hat P_\nu \Delta_i)$

In order to obtain the approximate CV score $\widetilde{CV}^*(K, h_n)$ efficiently, we need to compute the quantities $(\hat H_\nu \Delta_i \hat\psi_\nu)(t)$ and $\operatorname{tr}(\hat P_\nu \Delta_i)$ in an efficient manner. Thus we have the following further approximation based on Lemma 7.1.

Proposition 3.2.
We have

(i)
\[
(\hat H_\nu \Delta_i \hat\psi_\nu)(t) \approx \frac{w(m_i)}{n-1} \sum_{k \ne \nu}^K \frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\, \gamma_{k,h_n}(i)\, \gamma_{\nu,h_n}(i)\, \hat\psi_k(t)
+ \frac{w(m_i)}{n-1} \frac{1}{\hat\lambda_\nu} (\gamma_{\nu,h_n}(i))^2\, \hat\psi_\nu(t)
- \frac{w(m_i)}{n-1} \frac{1}{\hat\lambda_\nu} \frac{1}{g(t)} \Big[\sum_{l=1}^{L_n} \big(\tilde X_i(s_l) + (t - s_l)\tilde X_i'(s_l)\big) Q_{h_n}(s_l - t)\Big]\, \tilde\gamma_{\nu,h_n}(i,t)
- \frac{w(m_i)}{n-1} \Big[\sum_{k \ne \nu}^K \frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\, \gamma_{k,\nu,h_n}(i)\, \hat\psi_k(t) + \frac{1}{\hat\lambda_\nu}\gamma_{\nu,\nu,h_n}(i)\, \hat\psi_\nu(t)\Big]
+ \frac{1}{n-1} \sum_{k \ne \nu}^K \frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\, \hat\psi_k(t) \sum_{l=1}^{L_n} \int \frac{\hat\psi_k(u)\hat\psi_\nu(u)}{g(u)} \big(S_i(s_l)\beta_{1,h}(u,s_l) + S_i'(s_l)\beta_{2,h}(u,s_l)\big)\,du
+ \frac{1}{n-1} \frac{1}{\hat\lambda_\nu}\, \hat\psi_\nu(t) \sum_{l=1}^{L_n} \int \frac{(\hat\psi_\nu(u))^2}{g(u)} \big(S_i(s_l)\beta_{1,h}(u,s_l) + S_i'(s_l)\beta_{2,h}(u,s_l)\big)\,du
- \frac{1}{n-1} \frac{1}{\hat\lambda_\nu} \sum_{l=1}^{L_n} \big(S_i(s_l)\beta_{1,h}(t,s_l) + S_i'(s_l)\beta_{2,h}(t,s_l)\big) \frac{\hat\psi_\nu(t)}{g(t)}
+ \Big(\hat\sigma^{2(-i)} - \frac{n}{n-1}\hat\sigma^2\Big)\Big(\int \hat H_\nu(t,u)\,du\Big)\Big(\int \hat\psi_\nu(u)\,du\Big); \qquad (24)
\]

(ii)
\[
\operatorname{tr}(\hat P_\nu \Delta_i) \approx -\frac{\hat\lambda_\nu}{n-1} + \frac{w(m_i)}{n-1}\big[(\gamma_{\nu,h_n}(i))^2 - \gamma_{\nu,\nu,h_n}(i)\big]
+ \frac{1}{n-1} \int \frac{(\hat\psi_\nu(u))^2}{g(u)} \big(S_i(u)\beta_{1,h}(u) + S_i'(u)\beta_{2,h}(u)\big)\,du
+ \Big(\hat\sigma^{2(-i)} - \frac{n}{n-1}\hat\sigma^2\Big)\Big(\int \hat\psi_\nu(u)\,du\Big)^2, \qquad (25)
\]

where

(a)
\[
\gamma_{k,h_n}(i) = \sum_{l=1}^{L_n} \tilde X_i(s_l)\, G_0\Big(\frac{\hat\psi_k}{g}, Q_{h_n}\Big)(s_l) - \sum_{l=1}^{L_n} \tilde X_i'(s_l)\, G_1\Big(\frac{\hat\psi_k}{g}, Q_{h_n}\Big)(s_l), \qquad (26)
\]
where, for any two functions $f_1$ and $f_2$ defined on $[0,1]$,
\[
G_0(f_1, f_2)(s) = (f_1 * f_2)(s) = \int f_1(x) f_2(s-x)\,dx, \qquad
G_1(f_1, f_2)(s) = (f_1 * (x f_2))(s) = \int f_1(x)(s-x) f_2(s-x)\,dx;
\]

(b)
\[
\tilde\gamma_{k,h_n}(i,t) = \sum_{l=1}^{L_n} \int \big(1 - W_{\tilde h_n}(t,v)\big)\big[(\tilde X_i(s_l) + (v-s_l)\tilde X_i'(s_l)) Q_{h_n}(s_l - v)\big] \frac{\hat\psi_k(v)}{g(v)}\,dv; \qquad (27)
\]

(c)
\[
\gamma_{k,k',h_n}(i) = \sum_{j,j'=0}^{1} \sum_{l,l'=1}^{L_n} X_i^{(j)}(s_l)\, X_i^{(j')}(s_{l'}) \iint W_{\tilde h_n}(u,v)\, \frac{\hat\psi_k(u)}{g(u)}\frac{\hat\psi_{k'}(v)}{g(v)}\, (u-s_l)^j (v-s_{l'})^{j'}\, Q_{h_n}(s_l - u)\, Q_{h_n}(s_{l'} - v)\,du\,dv; \qquad (28)
\]

(d)
\[
\beta_{1,h}(u,s) = \int_{u - \frac{A}{2}h}^{u + \frac{A}{2}h} Q_h\Big(\frac{u+v}{2} - s\Big)\,dv, \qquad (29)
\]
\[
\beta_{2,h}(u,s) = \int_{u - \frac{A}{2}h}^{u + \frac{A}{2}h} \Big(\frac{u+v}{2} - s\Big) Q_h\Big(\frac{u+v}{2} - s\Big)\,dv. \qquad (30)
\]

In the above, the computation of $\gamma_{k,h_n}(i)$ can easily be done using the fast Fourier transform (FFT). Also, $\tilde\gamma_{k,h_n}(i,t) \approx \gamma_{k,h_n}(i)$ for all $t \in [0,1]$. However, the computation of $\gamma_{k,k',h_n}(i)$ involves a double integration, so some approximations are needed to simplify the computation; a computationally efficient approximation to $\gamma_{k,k',h_n}(i)$ is described in Appendix B. Computation of $\beta_{1,h}(u)$ and $\beta_{2,h}(u)$ can be done in closed form whenever $Q_h(\cdot)$ has a "nice" functional form (e.g., a B-spline). From Propositions 3.1 and 3.2 it is clear that most of the components have already been computed in constructing the estimator, and the convolutions can be performed quickly using the FFT. Thus, the key advantage afforded by Proposition 3.2 is to replace the expensive computation of double integrals with the much cheaper computation of single integrals and convolutions. See Appendix F for details of some of these steps.

4 Asymptotic properties

In this section, we present the theoretical properties of the proposed estimators through a large sample analysis.
Our main interest is in the estimation accuracy of the covariance kernel and its eigenfunctions. The statements of the results and the associated regularity conditions are given below. We first state the following assumptions on $g$, the density of the design points; $C$, the covariance kernel; and $\{\psi_k\}_{k=1}^M$, the eigenfunctions.

A1 $g$ is twice continuously differentiable and its second derivative is Hölder($\alpha$) for some $\alpha \in (0,1)$. The same holds for the covariance kernel $C$.

A2 $\max_k \{\|\psi_k\|_\infty, \|\psi_k'\|_\infty, \|\psi_k''\|_\infty\}$ is bounded.

A3 There are constants $0 < c_0 \le c_1 < \infty$ such that $c_0 \le g(\cdot) \le c_1$.

We also assume that the kernels $K(\cdot)$ and $Q(\cdot)$ satisfy conditions B1 and B2, respectively. We need to make further assumptions about the covariance kernel $C$ and the correlations among the sample curves. Let $R$ denote the $n \times n$ matrix with $(i,j)$-th entry $\rho_{ij}$. Assume:

C1 $\lambda_1 > \lambda_2 > \cdots > \lambda_M > 0$ and $\lambda_{M+1} = \cdots = 0$. That is, the nonzero eigenvalues are all distinct and the covariance kernel is of finite dimension.

C2 $\max_{1 \le \nu \le M} (\lambda_\nu - \lambda_{\nu+1})^{-1}$ is bounded above.

C3 $\frac{1}{n^2}\operatorname{tr}[(R - I_n)^2] \to 0$ as $n \to \infty$, and $\|R\| \le \kappa_n$ for some $\kappa_n > 0$.

Note that the first part of C3 quantifies the total contribution of the correlations among the sample curves to the variance of the estimated covariance kernel (see Theorem 4.1). The second part of C3 imposes a stability condition on the correlation matrix $R$: the sample curves are "weakly correlated" in the sense that $\|R\|$ is bounded by $\kappa_n$. Define $\underline{m} = \min_{1 \le i \le n} m_i$ and $\overline{m} = \max_{1 \le i \le n} m_i$. We further assume:

C4 $\overline{m}/\underline{m}$ is bounded above as $n \to \infty$.

We now give the bias and variance of the proposed combined estimator.

Theorem 4.1. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold.
Assume further that $\sigma^2$ is known and $\hat C(\cdot) = \hat C^*(\cdot) - \sigma^2$, where $\hat C^*(\cdot)$ is defined through (7). Suppose further that in the definition (11), $\tilde h_n = A h_n$ for some constant $A \ge 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and $nh_n^2 \to \infty$, the estimator $\hat C_c$ satisfies:
\[
\mathbb{E}[\hat C_c(s,t)] = C(s,t) + O(h_n^2), \qquad (31)
\]
\[
\operatorname{Var}[\hat C_c(s,t)] = O\Big(\frac1n\Big) + O\Big(\max\Big\{\frac{1}{nh_n^2 m^2}, \frac{1}{nh_n m}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i \ne j}^n \rho_{ij}^2\Big) O(1), \qquad (32)
\]
where the $O(\cdot)$ terms are uniform in $s, t \in [0,1]$.

One implication of Theorem 4.1 is that it gives the rate of convergence of the estimator $\hat\sigma^2$ defined in (13), as illustrated in the following corollary.

Corollary 4.1. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold, and in the definition (11), $\tilde h_n = A h_n$ for some constant $A \ge 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and $nh_n^2 \to \infty$,
\[
\mathbb{E}(\hat\sigma^2 - \sigma^2)^2 = O\Big(\frac1n\Big) + O\Big(\max\Big\{\frac{1}{nh_n^2 m^2}, \frac{1}{nh_n m}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i \ne j}^n \rho_{ij}^2\Big) O(1) + O(h_n^4), \qquad (33)
\]
where the $O(\cdot)$ terms are uniform in $s, t \in [0,1]$.

Using Corollary 4.1 and Theorem 4.1, we get a bound on the variance of the proposed estimator of the covariance kernel when $\sigma^2$ is estimated by $\hat\sigma^2$ defined in (13).

Corollary 4.2. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold, and in the definition (11), $\tilde h_n = A h_n$ for some constant $A \ge 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and $nh_n^2 \to \infty$,
\[
\operatorname{Var}[\hat C_c(s,t)] = O\Big(\frac1n\Big) + O\Big(\max\Big\{\frac{1}{nh_n^2 m^2}, \frac{1}{nh_n m}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i \ne j}^n \rho_{ij}^2\Big) O(1) + O(h_n^4), \qquad (34)
\]
where the $O(\cdot)$ terms are uniform in $s, t \in [0,1]$.

Next we state the result about the asymptotic behavior of the estimated eigenfunctions. Let the loss function for $\psi_\nu$ be the modified $L^2$-loss given by
\[
L(\hat\psi_\nu, \psi_\nu) = \big\|\hat\psi_\nu - \operatorname{sign}(\langle\hat\psi_\nu, \psi_\nu\rangle)\,\psi_\nu\big\|_2^2, \qquad (35)
\]
where $\|\cdot\|_2$ denotes the $L^2$ norm and $\langle\hat\psi_\nu, \psi_\nu\rangle = \int_0^1 \hat\psi_\nu(x)\psi_\nu(x)\,dx$.
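To make (35) concrete: since an eigenfunction is identified only up to sign, the loss first aligns $\hat\psi_\nu$ with $\psi_\nu$ and only then measures the $L^2$ distance. A small numerical illustration on a grid, with hypothetical eigenfunctions $\sqrt2\sin(k\pi x)$ chosen purely for this example:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4001)
dx = x[1] - x[0]

def inner(f, g):
    # Riemann-sum approximation of the L2([0,1]) inner product
    return float(np.sum(f * g) * dx)

# "True" eigenfunction and a sign-flipped, slightly contaminated estimate
psi = np.sqrt(2.0) * np.sin(np.pi * x)              # ||psi||_2 = 1
phi2 = np.sqrt(2.0) * np.sin(2.0 * np.pi * x)       # orthogonal direction
psi_hat = -(psi + 0.05 * phi2)
psi_hat /= np.sqrt(inner(psi_hat, psi_hat))         # normalize to L2 norm 1

# Modified L2 loss (35): align the sign before measuring the distance
s = np.sign(inner(psi_hat, psi))
loss = inner(psi_hat - s * psi, psi_hat - s * psi)
naive = inner(psi_hat - psi, psi_hat - psi)
print(loss, naive)   # the naive distance is near 4; the sign-aligned loss is small
```

Without the sign alignment, an estimate that is essentially $-\psi_\nu$ would be charged a distance close to $\|2\psi_\nu\|_2^2 = 4$ even though it recovers the eigenfunction almost perfectly; (35) removes this artifact.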
For the statement of Theorem 4.2, we only need to assume that the estimator $\hat\sigma^2$ of $\sigma^2$ satisfies $\mathbb{E}(\hat\sigma^2 - \sigma^2)^2 = o(1)$.

Theorem 4.2. Suppose that conditions A1-A3, B1-B2 and C1-C4 hold. Suppose further that in the definition (11), $\tilde h_n = A h_n$ for some constant $A > 4(B_K + C_Q)$. If $m h_n = o(1)$, $n h_n^2 \to \infty$ and $\kappa_n m h_n^{-1} n^{-1/2 + \epsilon'} \to 0$ for some $\epsilon' > 0$, then the estimator $\hat\psi_\nu$, the eigenfunction corresponding to the $\nu$-th largest eigenvalue of $\hat C_c$, satisfies, for any arbitrary but fixed $\epsilon > 0$,
\[
\sup_{(C,g)\in\Theta} \mathbb{E}\, L(\hat\psi_\nu, \psi_\nu) \le (1+\epsilon)\,\frac1n \Big(\sum_{1\le k\ne\nu\le M} \frac{\lambda_k\lambda_\nu}{(\lambda_k-\lambda_\nu)^2}\Big)
+ (1+\epsilon)\Big(\frac{1}{n^2}\sum_{i\ne j}^n \rho_{ij}^2\Big)\Big(\sum_{1\le k\ne\nu\le M} \frac{\lambda_k\lambda_\nu}{(\lambda_k-\lambda_\nu)^2} + O(h_n)\Big)
+ O(h_n^4) + O\Big(\frac{1}{nh_n m}\Big), \qquad (36)
\]
where $\Theta$ denotes the class of covariance-density pairs $(C, g)$ satisfying the conditions A1-A3, B1-B2 and C1-C4.

One important implication of Theorems 4.1 and 4.2 is that, if the correlation between sample curves is "weak" in a suitable sense, then the best rates of convergence for the correlated and i.i.d. cases are the same. Comparing with the i.i.d. case, we immediately see that, in order for this to hold under the conditions of Theorem 4.2, we need
\[
\frac{1}{n^2}\sum_{j\ne i} \rho_{ij}^2 = o\Big(\frac{1}{nh_{n,*}m}\Big), \qquad (37)
\]
where $h_{n,*}$ is the optimal bandwidth choice (at the level of rates of convergence) for the i.i.d. case. Also, in order to ensure that the optimal rate for the estimate of the covariance kernel in the correlated case is the same as that in the i.i.d. case, it is sufficient (by Corollary 4.2) that
\[
\frac{1}{n^2}\sum_{j\ne i} \rho_{ij}^2 = o\Big(\max\Big\{\frac{1}{nh_{n,*}m}, \frac{1}{nh_{n,*}^2 m^2}\Big\}\Big), \qquad (38)
\]
where $h_{n,*}$ is the optimal bandwidth choice for the covariance estimator (at the level of rates of convergence) in the i.i.d. case.
Specifically, for estimating the covariance, $h_{n,*} = (nm^2)^{-1/6}$ (by Theorem 4.1, under the setting where $m h_{n,*} = o(1)$), and for estimating the eigenfunctions, $h_{n,*} = (nm)^{-1/5}$ (by Theorem 4.2). Thus, the optimal bandwidths for estimating the covariance and for estimating its eigenfunctions are different, at least when $m$ can only grow rather slowly with $n$. Combining the lower bound given by Theorem 2 of Hall et al. (2006) and the upper bound from Theorem 4.2, it follows that when $m$ is bounded, the rate of convergence of the $L^2$-risk is optimal if (37) holds. Thus, under this setting the proposed estimator of the eigenfunctions is optimal even when the sample curves are weakly correlated. Similarly, under the setting of Theorem 4.1, if (38) holds, then the $L^2$-risk of the proposed estimator of the covariance also attains the optimal rate under an appropriate choice of bandwidth.

Another important point is that the conditions in Theorem 4.2, specifically that $m h_n = o(1)$, $n h_n^2 \to \infty$ and $m \kappa_n h_n^{-1} n^{-1/2+\epsilon'} = o(1)$, which together imply that $m = o(n^{1/4})$, are not the most general conditions. We conjecture that (36) holds under weaker conditions. Indeed, in the i.i.d. case, (36) holds (without the second term on the right-hand side) for a much wider range of possible values of $m$, as indicated by the following result, which gives a lower bound on the rate of convergence of the first eigenfunction when $m \to \infty$ under the i.i.d. setting. This bound is a refinement of an analogous result (Theorem 2) in Hall et al. (2006), even though the latter holds for all eigenfunctions. Notice that this lower bound, together with the upper bound elucidated in the paragraph following Theorem 4.2, implies that at least for the first eigenfunction, the best rate of convergence, viz. $O((nm)^{-4/5})$, is optimal when $m \to \infty$ at a faster rate and (37) holds.

Theorem 4.3. Let $\mathcal{C}$ denote the class of covariance kernels $\Sigma(\cdot,\cdot)$ on $[0,1]^2$ with rank $\ge 1$ and nonzero eigenvalues $\{\lambda_j\}_{j\ge1}$ satisfying $C_0 \ge \lambda_1 > \lambda_2 \ge 0$ with $\lambda_1 - \lambda_2 \ge C_1$, and with first eigenfunction $\psi_1$ twice differentiable and satisfying $\|\psi_1''\|_\infty \le C_2$, for some constants $C_0, C_1, C_2 > 0$. Also, let $\mathcal{G}$ denote the class of continuous densities $g$ on $[0,1]$ such that $c_1 \le g \le c_2$ for some $0 < c_1 \le 1 \le c_2 < \infty$. Suppose that we observe data according to model (1), where the $X_i(\cdot)$ are i.i.d. Gaussian processes with mean $0$ and covariance kernel $\Sigma$. Also suppose that the numbers of measurements $m_i$ satisfy $\underline{m} \le m_i \le \overline{m}$ for $\overline{m} \ge \underline{m} \ge 4$, with $\overline{m}/\underline{m} \le C_3$ for some $C_3 < \infty$, and $\overline{m} = o(n^{2/3})$. Let $\mathcal{D}$ denote the space of such designs $D = \{m_i\}_{i=1}^n$. Then, for sufficiently large $n$, for any estimator $\hat\psi_1$ with $L^2$ norm one, the following holds:
\[
\sup_{D\in\mathcal{D}}\; \sup_{g\in\mathcal{G}}\; \sup_{\Sigma\in\mathcal{C}}\; \mathbb{E}\,\|\hat\psi_1 - \psi_1\|_2^2 \;\ge\; C_4 (nm)^{-4/5}. \qquad (39)
\]
The proof of Theorem 4.3 is given in Appendix G.

5 Outline of the proof of Theorems 4.1 and 4.2

In this section, we briefly describe the main ideas leading to the proof of Theorems 4.1 and 4.2. The technical arguments are given in the appendices. The proof of Theorem 4.1 uses direct computation (Appendices C and D). The basic idea in the computation of the moments is to treat the diagonal and the off-diagonal parts of $\hat C_c(\cdot,\cdot)$ separately. The proof of Theorem 4.2 relies heavily on an application of Lemma 7.1. In view of this, the key quantity in the derivation of the asymptotic risk is $\mathbb{E}\|H_\nu \hat C_c \psi_\nu\|_2^2$, where $\|f\|_2^2$ denotes $\int_0^1 f^2(x)\,dx$ for a function $f \in L^2([0,1])$.
Once we obtain an expression for this (as given in Section 5.1), we use a probabilistic bound on the operator norm of the difference between the estimated and true covariance kernels to complete the proof. The proofs of Theorems 4.1 and 4.2 require repeated computation of mixed moments of correlated Gaussian random variables. The details of all these computations are given in the appendices.

5.1 Asymptotic risk for estimating $\psi_\nu$

The key result in this section is the following proposition.

Proposition 5.1. Under the assumptions of Theorem 4.2, we have
\[
\mathbb{E}\|H_\nu \hat C_c \psi_\nu\|_2^2 \le (1+\epsilon)\,\frac1n\Big(\sum_{1\le k\ne\nu\le M}\frac{\lambda_k\lambda_\nu}{(\lambda_k-\lambda_\nu)^2}\Big)
+ (1+\epsilon)\Big(\frac{1}{n^2}\sum_{i_1\ne i_2}^n \rho_{i_1 i_2}^2\Big)\Big(\sum_{1\le k\ne\nu\le M}\frac{\lambda_k\lambda_\nu}{(\lambda_k-\lambda_\nu)^2} + O(h_n)\Big)
+ O(h_n^4) + O\Big(\frac{1}{nh_n m}\Big) \qquad (40)
\]
for any arbitrary but fixed $\epsilon > 0$.

Here we briefly describe the main idea of the proof. For convenience of exposition, throughout we replace $\max\{\hat C(\frac{s+t}{2}), h_n^2\}$ in the definition (11) by $\hat C(\frac{s+t}{2})$. Using appropriate exponential inequalities for $\hat C^*(t)$, it can be shown that asymptotically this makes no difference as long as $\min_{t\in[0,1]} C(t,t) > c_3$ for some $c_3 > 0$. Also, for computational purposes, it is helpful to consider the unsmoothed version (10) of the kernel $W$ and take $\tilde h_n = A h_n$, where $A \ge 4(B_K + C_Q)$; the advantage is being able to deal with the contributions from the diagonal and off-diagonal parts of the estimator separately. Since the definition of $H_\nu$ involves the Dirac-$\delta$ operator, we need to account for the contribution of the terms involving $\delta$ carefully. The estimation error in $\sigma^2$ also plays a role, and is taken into account separately. The main decompositions that facilitate the computations are given through (55)-(57) in Appendix D.
The last bound reduces the task of bounding $\mathbb{E}\|H_\nu \hat C_c \psi_\nu\|_2^2$ to that of bounding $\mathbb{E}\|H_\nu \tilde C_c \psi_\nu\|_2^2$, with $\tilde C_c(\cdot,\cdot)$ as described in Appendix D. Note also that, if $\sigma^2$ is assumed to be known, then the decomposition (57) is not required, and we can dispense with the multiplicative factor $(1+\epsilon)$ in the expression (36) for the risk in Theorem 4.2.

5.2 Norm bound on $\hat C_c - \mathbb{E}\hat C_c$

To complete the proof of the theorems, we need a probabilistic bound for $\|\hat C_c - \mathbb{E}\hat C_c\|$, where $\|\cdot\|$ denotes the operator norm. We shall first find a bound on the sup norm of $\hat C_c - \mathbb{E}\hat C_c$, and then bound the operator norm via the inequality $\|\hat C_c - \mathbb{E}\hat C_c\| \le \|\hat C_c - \mathbb{E}\hat C_c\|_F$, where $\|\cdot\|_F$ denotes the Hilbert-Schmidt norm. This in turn follows from the inequality
\[
\|\hat C_c - \mathbb{E}\hat C_c\|_F \le \sup_{x,y\in[0,1]} |\hat C_c(x,y) - \mathbb{E}\hat C_c(x,y)| =: \|\hat C_c - \mathbb{E}\hat C_c\|_\infty.
\]
Note that, by piecewise differentiability of the estimate $\hat C_c$, in order to provide exponential bounds for the deviations of $\|\hat C_c - \mathbb{E}\hat C_c\|_\infty$, it is enough to provide exponential bounds for the fluctuations of $|\hat C_c(s,t) - \mathbb{E}[\hat C_c(s,t)]|$ at a finite (but polynomially growing with $n$) number of points $(s,t)$. Thus, we fix an arbitrary $(s,t)$ and derive an exponential inequality for the deviation of the estimate at this point. To simplify the computations, without loss of generality, we assume that $g$ is the density of the Uniform(0,1) distribution. Then we have the following proposition.

Proposition 5.2. Under the conditions of Theorem 4.2, given $\eta > 0$, there is a $c_\eta > 0$ such that, for every fixed $s, t \in [0,1]$,
\[
\mathbb{P}\left(|\hat C_c(s,t) - \mathbb{E}(\hat C_c(s,t))| > c_\eta\, m \kappa_n \sqrt{\frac{\log n}{n h_n^2}}\right) \le n^{-\eta}. \qquad (41)
\]

The proof of Theorem 4.2 then follows by noticing first that, by Lemma 7.1 and the fact that $\|\hat\psi_\nu\|_2 = \|\psi_\nu\|_2 = 1$,
\[
\mathbb{E}\,L(\hat\psi_\nu, \psi_\nu) \le \mathbb{E}\|H_\nu \hat C_c \psi_\nu\|_2^2\, (1 + \delta_{n,\eta}) + 2\,\mathbb{P}\left(\|\hat C_c - \mathbb{E}(\hat C_c)\| > c'_\eta\, m \kappa_n \sqrt{\frac{\log n}{n h_n^2}}\right)
\]
for some $\eta > 0$, $c'_\eta > 0$ and $\delta_{n,\eta} \to 0$ appropriately chosen, and then using Propositions 5.1 and 5.2.

5.3 Connection to the parametric rate for "purely functional" data

It is instructive to compare the optimal rate for our procedure with that obtained by Hall et al. (2006). We can regard the first line on the right-hand side of (40) as the parametric component of the risk and the second line as the nonparametric component. If we take $h = O(n^{-1/5})$, then for bounded $m$ we get the optimal nonparametric rate. For consistency of $\hat\psi_\nu$ in the $L^2$ sense, we clearly need $\frac{1}{n^2}\sum_{i_1\ne i_2}\rho_{i_1 i_2}^2 = o(1)$ (used in Theorem 4.2). If $m$ increases with the sample size, then the rate also improves, but there is no result about optimality. When the observations are i.i.d., it can be checked, using a modification of the proof of Proposition 5.2, that if $m \to \infty$ and $h \to 0$ such that $(mh)^{-1} = o(1)$ and $h = o(n^{-1/4})$, we obtain the parametric rate for the $L^2$-risk of $\hat\psi_\nu$ (as indicated in Hall et al., 2006). In other words, under that setting there is asymptotically no difference between the risk of estimating the eigenfunctions from data observed with noise at randomly distributed points and that from data measured on the continuum without noise. Indeed, such a scenario is possible if $m^{-1} = o(n^{-1/4-\epsilon})$ for some $\epsilon > 0$. Then, by taking $h_n = o(n^{-1/4})$, and assuming that either $\sigma^2$ is known or an estimator $\hat\sigma^2$ satisfying $|\hat\sigma^2 - \sigma^2| = O_P(h_n^2)$ is available, we attain the conditions mentioned above.
We conjecture that the same result holds even when the observations are "weakly" correlated.

6 Discussion

In this paper, we presented a procedure for estimating the covariance kernel and its eigenfunctions from sparsely observed, noise-corrupted and correlated functional data. The estimator of the covariance kernel is based on merging two separate estimators: (i) the estimator of the off-diagonal part, based on computing linearized empirical covariances of the smoothed versions of individual sample curves; and (ii) the estimator of the diagonal part, based on linearized kernel smoothing of the empirical variances. The importance of this modification of the naive kernel smoothing approach, especially when the number of design points per curve is small, is demonstrated through an asymptotic bias analysis. The linearized version of the kernel smoothing helps in reducing bias while controlling the variance, and is computationally appealing. The asymptotic risk behavior of the proposed estimators is studied under the assumption that the sample curves have a "separable covariance" structure and are "weakly" correlated. Exact quantification of the asymptotic risk for the eigenfunctions is obtained under the Gaussian setting (Theorem 4.2). It is also shown that the $L^2$-risk for the eigenfunctions achieves the optimal rate, under an appropriate choice of the bandwidth, when the number of measurements per curve is bounded. Also, in the i.i.d. case, we obtain a lower bound on the rate of convergence for estimating the first eigenfunction that is sharper than the bounds in the existing literature, which proves the rate-optimality of our estimator in a wider regime.
Finally, we propose a computationally tractable model selection procedure based on minimizing an approximation to the leave-one-curve-out cross-validation score that uses the empirical Kullback-Leibler loss. We also show that, in the context of estimating the covariance kernel or its eigenfunctions, it has clear advantages over the commonly used prediction error loss.

The proposed procedure for estimation and model selection is easily implementable and computationally more tractable than some of the existing methods. Moreover, due to the linear structure of the pre-smoothing of individual curves, our estimator is stable. Furthermore, the linear structure of the proposed estimator also allows for a simple approximation to the cross-validation score. Finally, even though the results are proved under Gaussianity of the noise process, it can be shown that, at the level of rates of convergence, the upper bounds hold under sufficient moment conditions on the noise, and hence the estimator is expected to be robust to distributional assumptions.

There are a few aspects of the estimation procedure that need further exploration. In the asymptotic analysis, we assumed that $g$, the density function of the design points, is known. In practice it has to be estimated from the data. Additional computations are needed to show that the results derived here hold under that setting as well. It will also be useful to study its impact on the estimation procedure through simulation studies, and in real data applications where the assumption of exact randomness of the design points may be violated. A natural generalization of the framework studied in this paper arises when the principal component scores jointly form a stationary vector autoregressive process.
Under such a setting, we would like to extend the estimation and model selection procedures described here to exploit the special structures of such processes. This is likely to summarize the statistical properties of some real-life phenomena and also help in model building and prediction, for example in spatio-temporal models where the covariance is not separable.

7 Appendix

Appendix A: Perturbation of eigen-structure

The following lemma is a modified version of a similar result in Paul and Johnstone (2007). Several variants of this lemma appear in the literature (see, e.g., Kneip and Utikal (2001), Cai and Hall (2006)), and most of them implicitly use the approach taken in Kato (1980). In the following we use $\|A\|$ to denote the operator norm of an operator $A$, i.e., the largest singular value of $A$.

Lemma 7.1. Let $A$ and $B$ be two symmetric Hilbert-Schmidt operators acting on $L^2([0,1])$. Let the eigenvalues of the operator $A$ be denoted by $\lambda_1(A), \lambda_2(A), \ldots$. Set $\lambda_0(A) = \infty$ and $\lambda_\infty(A) = -\infty$. For any $r \ge 1$, if $\lambda_r(A)$ is a unique eigenvalue of $A$, i.e., of multiplicity 1, denote by $p_r$ the eigenfunction associated with the $r$-th eigenvalue. Then
\[
p_r(A+B) - \operatorname{sign}\langle p_r(A+B), p_r(A)\rangle\, p_r(A) = -H_r(A)\, B\, p_r(A) + R_r,
\]
where $H_r(A) := \sum_{s\ne r} \frac{1}{\lambda_s(A) - \lambda_r(A)}\, P_{E_s}(A)$, and $P_{E_s}(A)$ denotes the orthogonal projection operator onto the eigen-subspace $E_s$ corresponding to the eigenvalue $\lambda_s(A)$ (possibly multi-dimensional). Define $\delta_r$ and $\bar\delta_r$ as
\[
\delta_r := \frac12\big[\|H_r(A)B\| + |\lambda_r(A+B) - \lambda_r(A)|\,\|H_r(A)\|\big], \qquad
\bar\delta_r := \frac{\|B\|}{\min_{1\le j\ne r\le\infty} |\lambda_j(A) - \lambda_r(A)|}.
\]
Then, the residual term $R_r$ can be bounded as
\[
\|R_r\| \le \min\left\{ 10\,\bar\delta_r^2,\;\; \|H_r(A)Bp_r(A)\|\,\frac{2\delta_r(1+2\delta_r)}{1 - 2\delta_r(1+2\delta_r)} + \frac{\|H_r(A)Bp_r(A)\|}{(1 - 2\delta_r(1+2\delta_r))^2} \right\},
\]
where the second bound holds only if $\delta_r < \frac{\sqrt5 - 1}{4}$. In addition, if $1 \le r_1 \le r_2$ are such that $\lambda_{r_1}(A) > \lambda_{r_1+1}(A) = \cdots = \lambda_{r_2}(A) > \lambda_{r_2+1}(A)$, then
\[
\sum_{k=r_1}^{r_2} \big(\lambda_k(A+B) - \lambda_k(A)\big) = \operatorname{tr}\big(P_{E_{r_1}}(A)\, B\big) + R_{r_1,r_2},
\]
where $P_{E_{r_1}}(A)$ is the orthogonal projection operator of $A$ corresponding to the eigenvalues $\lambda_{r_1}(A), \ldots, \lambda_{r_2}(A)$, and the residual $R_{r_1,r_2}$ satisfies
\[
|R_{r_1,r_2}| \le \frac{6\,(r_2 - r_1 + 1)\,\|B\|^2}{\min_{1\le j\ne r\le\infty} |\lambda_j(A) - \lambda_r(A)|}.
\]

Large deviations of quadratic forms

The following lemmas are from Paul (2004). Suppose that $\Phi : \mathcal{X} \to \mathbb{R}^{n\times n}$ is a measurable function and that $Z$ is a random variable taking values in $\mathcal{X}$.

Lemma 7.2. Suppose that $X$ and $Y$ are i.i.d. $N_n(0, I)$ and are independent of $Z$. Then for every $L > 0$ and $0 < \delta < 1$, for all $0 < t < \frac{\delta}{1-\delta} L$,
\[
\mathbb{P}\Big(\frac1n |X^T \Phi(Z) Y| > t,\; \|\Phi(Z)\| \le L\Big) \le 2\exp\Big(-\frac{(1-\delta)\, n t^2}{2L^2}\Big).
\]

Lemma 7.3. Suppose that $X$ is distributed as $N_n(0, I)$ and is independent of $Z$. Also assume that $\Phi(z) = \Phi^T(z)$ for all $z \in \mathcal{X}$. Then for every $L > 0$ and $0 < \delta < 1$, for all $0 < t < \frac{2\delta}{1-\delta} L$,
\[
\mathbb{P}\Big(\frac1n |X^T \Phi(Z) X - \operatorname{tr}(\Phi(Z))| > t,\; \|\Phi(Z)\| \le L\Big) \le 2\exp\Big(-\frac{(1-\delta)\, n t^2}{4L^2}\Big).
\]

Computation of conditional mixed moments

In order to calculate the bias and variance of the proposed estimator, we need to compute the conditional expectations $\mathbb{E}(Y_{i_1 j_1} Y_{i_1 j_1'} Y_{i_2 j_2} Y_{i_2 j_2'} \mid T_{i_1}, T_{i_2})$ for various choices of $i_1, i_2, j_1, j_1', j_2, j_2'$. We shall use the following well-known result, which is a special case of the Wick formula (Nica and Speicher, 2006, p. 129) for the computation of mixed moments of a Gaussian random vector.

Lemma 7.4.
If $W_1, W_2, W_3$ and $W_4$ are jointly Gaussian with mean zero and covariance matrix $\Sigma$, then
\[
E(W_1 W_2 W_3 W_4) = \Sigma_{12}\Sigma_{34} + \Sigma_{13}\Sigma_{24} + \Sigma_{14}\Sigma_{23}. \qquad (42)
\]
We shall use this formula to compute the above mixed moments, together with the observation that $\mathrm{Cov}(X_{i_1 j_1}, X_{i_2 j_2} \mid T_{i_1}, T_{i_2}) = \rho_{i_1 i_2}\, C(T_{i_1 j_1}, T_{i_2 j_2})$. The details of this computation in various generic cases are given in Appendix F.

Appendix B

In Appendix B and the following appendices, we shall often write $h$ and $\tilde h$ to denote $h_n$ and $\tilde h_n$, respectively, and we shall drop the subscript $h_n$ from the covariance estimates. For example, $\hat C_c$ will be used to denote $\hat C_{c,h_n}$.

Proof of Proposition 3.1

This is a straightforward application of Lemma 7.1, taking the estimated covariance kernel $\hat C_c$ as the operator $A$ and $-\Delta_i = \hat C_c^{(-i)} - \hat C_c$ as the operator $B$. Note that in (20) the last term corresponds to the zero eigenvalues of $\hat C_c$.

Proof of Proposition 3.2

We can express $\Delta_i(u,v)$ as $\hat\Delta_i(u,v) + R_i(u,v) + \big(\hat\sigma^2_{(-i)} - \frac{n}{n-1}\hat\sigma^2\big) - \frac{1}{n-1}\hat C_c(u,v)$, where
\[
\hat\Delta_i(u,v) = \big(1 - W_{\tilde h_n}(u,v)\big)\,\frac{w(m_i)}{n-1}\,\frac{1}{g(u)g(v)}\sum_{l,l'=1}^{L_n}\big(\tilde X_i(s_l) + (u - s_l)\tilde X_i'(s_l)\big)\big(\tilde X_i(s_{l'}) + (v - s_{l'})\tilde X_i'(s_{l'})\big)\,Q_{h_n}(u - s_l)\,Q_{h_n}(v - s_{l'})
\]
\[
+\; W_{\tilde h_n}(u,v)\,\frac{1}{n-1}\,\frac{1}{g\big(\frac{u+v}{2}\big)}\sum_{l=1}^{L_n}\Big(S_i\big(\tfrac{u+v}{2}\big) + \big(\tfrac{u+v}{2} - s_l\big) S_i'\big(\tfrac{u+v}{2}\big)\Big)\,Q_{h_n}\big(\tfrac{u+v}{2} - s_l\big), \qquad (43)
\]
and $R_i(u,v)$ equals (with $z$ denoting $\frac{u+v}{2}$)
\[
\big(1 - W_{\tilde h_n}(u,v)\big)\,\frac{\sigma}{n-1}\sum_{j \neq i} w(m_j)\Big[\big(\hat\mu^{(-i)}_{*,j}(u) - \hat\mu_{*,j}(u)\big)\hat\varepsilon_j(v) + \hat\varepsilon_j(u)\big(\hat\mu^{(-i)}_{*,j}(v) - \hat\mu_{*,j}(v)\big)\Big]
\]
\[
+ \big(1 - W_{\tilde h_n}(u,v)\big)\sum_{j \neq i}\frac{w(m_j)}{n-1}\Big[\big(\hat\mu^{(-i)}_{*,j}(u) - \hat\mu_{*,j}(u)\big)\big(\mu_{*,j}(v) - \hat\mu_{*,j}(v)\big) + \big(\hat\mu^{(-i)}_{*,j}(v) - \hat\mu_{*,j}(v)\big)\big(\mu_{*,j}(u) - \hat\mu_{*,j}(u)\big) - \big(\hat\mu^{(-i)}_{*,j}(u) - \hat\mu_{*,j}(u)\big)\big(\hat\mu^{(-i)}_{*,j}(v) - \hat\mu_{*,j}(v)\big)\Big]
\]
\[
+ W_{\tilde h_n}(u,v)\,\frac{2\sigma}{(n-1)g(z)}\sum_{j \neq i}\frac{1}{m_j}\sum_{k=1}^{m_j}\big(\hat\mu(T_{jk}) - \hat\mu^{(-i)}(T_{jk})\big)\,\varepsilon_{jk}\sum_{l=1}^{L_n}\tilde K_{z,l}(T_{jk})\,Q_{h_n}(z - s_l)
\]
\[
+ W_{\tilde h_n}(u,v)\,\frac{2}{(n-1)g(z)}\sum_{j \neq i}\frac{1}{m_j}\sum_{k=1}^{m_j}\big(\hat\mu^{(-i)}(T_{jk}) - \hat\mu(T_{jk})\big)\big(\mu(T_{jk}) - \hat\mu(T_{jk})\big)\sum_{l=1}^{L_n}\tilde K_{z,l}(T_{jk})\,Q_{h_n}(z - s_l)
\]
\[
- W_{\tilde h_n}(u,v)\,\frac{1}{(n-1)g(z)}\sum_{j \neq i}\frac{1}{m_j}\sum_{k=1}^{m_j}\big(\hat\mu^{(-i)}(T_{jk}) - \hat\mu(T_{jk})\big)^2\sum_{l=1}^{L_n}\tilde K_{z,l}(T_{jk})\,Q_{h_n}(z - s_l),
\]
where the kernel $\tilde K_{s,l}(\cdot) \equiv \tilde K_{s,l,h_n}(\cdot)$ is defined as
\[
\tilde K_{s,l}(u) = \frac{1}{h_n}\Big[K\Big(\frac{s_l - u}{h_n}\Big) + \Big(\frac{s - s_l}{h_n}\Big) K'\Big(\frac{s_l - u}{h_n}\Big)\Big], \qquad (44)
\]
for $s \in [0,1]$ and $l = 1, \ldots, L_n$; for any function $f$,
\[
f_{*,j}(x) = (g(x))^{-1}\sum_{l=1}^{L_n}\big(\tilde f_j(s_l) + (x - s_l)\tilde f_j'(s_l)\big)\,Q_{h_n}(x - s_l), \quad \text{with } \tilde f_j(s) := \frac{1}{m_j}\sum_{k=1}^{m_j} f(T_{jk})\,K_{h_n}(s - T_{jk});
\]
and
\[
\hat\varepsilon_j(x) = (g(x))^{-1}\sum_{l=1}^{L_n}\big[\tilde\varepsilon_j(s_l) + (x - s_l)\tilde\varepsilon_j'(s_l)\big]\,Q_{h_n}(x - s_l), \quad \text{with } \tilde\varepsilon_j(s) = \frac{1}{m_j}\sum_{k=1}^{m_j}\varepsilon_{jk}\,K_{h_n}(s - T_{jk}).
\]
Since $\hat H_\nu \hat C_c \hat\psi_\nu = \hat\lambda_\nu \hat H_\nu \hat\psi_\nu = 0$, it follows from the representation of $\Delta_i$ that
\[
\hat H_\nu \Delta_i \hat\psi_\nu = \hat H_\nu(\hat\Delta_i + R_i)\hat\psi_\nu + \big(\hat\sigma^2_{(-i)} - \tfrac{n}{n-1}\hat\sigma^2\big)\big(\hat H_\nu \mathbf{1}\,\hat\psi_\nu\big),
\]
where $\mathbf{1}(u,v) = 1_{\{0 \leq u, v \leq 1\}}$. It is easy to see from the expression for $R_i(u,v)$ and (23) that, for reasonable choices of $h_\mu$, the contribution of $R_i(u,v)$ can be ignored, since it is of smaller asymptotic order (in fact it can be shown to be $o_P(n^{-1})$). Hence we end up with the approximation $\hat H_\nu \Delta_i \hat\psi_\nu \approx \hat H_\nu \hat\Delta_i \hat\psi_\nu + (\hat\sigma^2_{(-i)} - \frac{n}{n-1}\hat\sigma^2)(\hat H_\nu \mathbf{1}\,\hat\psi_\nu)$. Thus, we can separate the first term on the RHS of (43) into two parts: one with multiplier 1, and the other with multiplier $W_{\tilde h_n}(u,v)$.
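As an aside, the first-order eigenfunction expansion of Lemma 7.1, which drives the perturbation arguments in the proofs above, can be illustrated numerically in finite dimensions. The following sketch (with illustrative choices of $A$, $B$, and the eigen-index, none of which come from the paper) checks that correcting $p_r(A)$ by $-H_r(A)Bp_r(A)$ leaves a residual that is second order in $\|B\|$:

```python
import numpy as np

# Finite-dimensional stand-in for Lemma 7.1: the leading eigenvector of A + B
# is approximated by p_r(A) - H_r(A) B p_r(A), with a residual R_r that is
# second order in ||B||.  A, B, n and r below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, r = 6, 0
A = np.diag([5.0, 3.0, 2.0, 1.0, 0.5, 0.2])
M = rng.standard_normal((n, n))
B = 1e-3 * (M + M.T) / 2                  # small symmetric perturbation

lam, P = np.linalg.eigh(A)
lam, P = lam[::-1], P[:, ::-1]            # decreasing eigenvalue order
p_r = P[:, r]

# H_r(A) = sum_{s != r} (lam_s - lam_r)^{-1} P_{E_s}
H_r = sum(np.outer(P[:, s], P[:, s]) / (lam[s] - lam[r])
          for s in range(n) if s != r)

lam2, P2 = np.linalg.eigh(A + B)
p_r_hat = P2[:, -1 - r]                   # eigenvector of the r-th largest eigenvalue
p_r_hat = p_r_hat * np.sign(p_r_hat @ p_r)  # sign alignment as in the lemma

first_order = p_r - H_r @ B @ p_r
residual = np.linalg.norm(p_r_hat - first_order)
raw_error = np.linalg.norm(p_r_hat - p_r)
# The residual after the first-order correction is far smaller than the raw
# perturbation of the eigenvector.
assert residual < raw_error
```

The eigen-gap at $\lambda_1 = 5$ is $2$, so the residual is of order $(\|B\|/\mathrm{gap})^2$, roughly three orders of magnitude below the raw error here.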
Then, using (22) and the second representation of $\hat H_\nu$ in (20), we obtain the expressions in the first four lines of the RHS of (24). Next, using the fact that $W_{\tilde h_n}(u,v) \approx 1_{\{|u-v| \leq \frac{A}{2}h_n\}}$, and using the approximations $\tilde\psi_\nu(v) \approx \tilde\psi_\nu(u)$ and $g(\frac{u+v}{2}) \approx g(u)$ on the interval $[u - \frac{A}{2}h_n,\, u + \frac{A}{2}h_n]$, we obtain the last three terms on the RHS of (24). Now, using (21), noting that $\mathrm{tr}(\hat P_\nu \hat C_c) = \hat\lambda_\nu$, and following similar arguments, we obtain (25).

Approximation of $\gamma_{k,k',h_n}(i)$

First, to fix notation, suppose that $\tilde h_n = A h_n$ for some constant $A > 0$. Then, by the definition of $W_{\tilde h_n}$ and the symmetry of $Q$, the integral appearing in (28) can be expressed as (ignoring the boundaries)
\[
d_{ll',kk';h_n}^{jj'} := \int_0^1 \frac{\hat\psi_k(u)}{g(u)}\,(u - s_l)^j\, Q_{h_n}(s_l - u)\int_{u - \frac{A}{2}h_n}^{u + \frac{A}{2}h_n}\frac{\hat\psi_{k'}(v)}{g(v)}\,(v - s_{l'})^{j'}\, Q_{h_n}(v - s_{l'})\,dv\,du. \qquad (45)
\]
Noticing that, on $[u - \frac{A}{2}h_n,\, u + \frac{A}{2}h_n]$, $\frac{\hat\psi_{k'}(v)}{g(v)}$ can be approximated by $\frac{\hat\psi_{k'}(u)}{g(u)}$, we can approximate the inner integral (with respect to $v$) by
\[
\frac{\hat\psi_{k'}(u)}{g(u)}\int_{u-\frac{A}{2}h_n}^{u+\frac{A}{2}h_n}(v - s_{l'})^{j'}\, Q_{h_n}(v - s_{l'})\,dv = h_n^{j'+1}\,\frac{\hat\psi_{k'}(u)}{g(u)}\int_{\frac{u-s_{l'}}{h_n}-\frac{A}{2}}^{\frac{u-s_{l'}}{h_n}+\frac{A}{2}} w^{j'} Q(w)\,dw \quad \Big(\text{setting } w = \tfrac{v - s_{l'}}{h_n}\Big)
\]
\[
=: h_n^{j'+1}\,\frac{\hat\psi_{k'}(u)}{g(u)}\,G_{j'}^Q\Big(\frac{u - s_{l'}}{h_n}\Big) =: h_n^{j'+1}\,\frac{\hat\psi_{k'}(u)}{g(u)}\,G_{j',l';h_n}^Q(u).
\]
Substituting this in (45), we have the approximation
\[
d_{ll',kk';h_n}^{jj'} \approx (-1)^j h_n^{j'+1}\int_0^1\frac{\hat\psi_k(u)\hat\psi_{k'}(u)}{g^2(u)}\,G_{j',l';h_n}^Q(u)\,(s_l - u)^j\,Q_{h_n}(s_l - u)\,du = (-1)^j h_n^{j'+1}\,G_j\Big(\frac{\hat\psi_k\hat\psi_{k'}}{g^2}\,G_{j',l';h_n}^Q,\; Q_{h_n}\Big)(s_l) =: \bar d_{ll',kk';h_n}^{jj'}, \qquad (46)
\]
by the definition of $G_j(f_1, f_2)(\cdot)$, $j = 0, 1$.
Since
\[
\Big[\frac{u - s_{l'}}{h_n} - \frac{A}{2},\; \frac{u - s_{l'}}{h_n} + \frac{A}{2}\Big] \cap [-C_Q, C_Q] = \emptyset \iff |u - s_{l'}| > \Big(C_Q + \frac{A}{2}\Big)h_n,
\]
we have $G_{j'}^Q\big(\frac{u - s_{l'}}{h_n}\big) \equiv 0$ if $|u - s_{l'}| > (C_Q + \frac{A}{2})h_n$. Furthermore, $Q_{h_n}(u - s_l) \equiv 0$ if $|u - s_l| > C_Q h_n$. This means that if either $|u - s_l| > C_Q h_n$ or $|u - s_{l'}| > (C_Q + \frac{A}{2})h_n$, then the integrand in the first step of (46) is zero. So the domain of integration is, effectively, $[s_l - C_Q h_n,\, s_l + C_Q h_n] \cap [s_{l'} - (C_Q + \frac{A}{2})h_n,\, s_{l'} + (C_Q + \frac{A}{2})h_n]$. This implies that if $|s_l - s_{l'}| > (2C_Q + \frac{A}{2})h_n$, then the effective domain of integration is empty, so that $\bar d_{ll',kk';h_n}^{jj'} = 0$ whenever $|s_l - s_{l'}| > (2C_Q + \frac{A}{2})h_n$. If $Q(\cdot)$ is chosen to be a centered cubic B-spline (so that $C_Q = 2$), we can compute $G_{j'}^Q(\cdot)$ explicitly, without having to perform a numerical integration (Appendix F).

Appendix C

In the following, we often drop the subscript $n$ from $h_n$ for simplicity, and sometimes we even drop the subscript $h$ from the notation.

Proof of Proposition 2.1

By elementary calculations, and supposing that $m_i \geq 2$ for each $1 \leq i \leq n$, we have
\[
E[\tilde X_i(s)\tilde X_i(t)] = \frac{1}{m_i^2}\sum_{j,j'=1}^{m_i} E\Big[Y_{ij}Y_{ij'}\,\frac{1}{h_n^2}\,K\Big(\frac{s - T_{ij}}{h_n}\Big)K\Big(\frac{t - T_{ij'}}{h_n}\Big)\Big]
\]
\[
= \frac{m_i}{m_i^2}\,\frac{1}{h_n^2}\int\big(C(u,u) + \sigma^2\big)K\Big(\frac{s-u}{h_n}\Big)K\Big(\frac{t-u}{h_n}\Big)\,du + \frac{m_i(m_i-1)}{m_i^2}\,\frac{1}{h_n^2}\iint C(u,v)\,K\Big(\frac{s-u}{h_n}\Big)K\Big(\frac{t-v}{h_n}\Big)\,du\,dv
\]
\[
= \frac{1}{m_i}\,\frac{1}{h_n}\int\big(C(t + h_n u, t + h_n u) + \sigma^2\big)K(-u)\,K\Big(\frac{s-t}{h_n} - u\Big)\,du + \frac{m_i - 1}{m_i}\iint C(s + h_n u,\, t + h_n v)\,K(-u)K(-v)\,du\,dv
\]
\[
= \frac{1}{m_i h_n}\Big[(C(t) + \sigma^2)\int K(-u)K\Big(\frac{s-t}{h_n} - u\Big)du + h_n C'(t)\int u\,K(-u)K\Big(\frac{s-t}{h_n} - u\Big)du + O(h_n^2)\Big]
\]
\[
+ \Big(1 - \frac{1}{m_i}\Big)C(s,t)\iint K(-u)K(-v)\,du\,dv + \Big(1 - \frac{1}{m_i}\Big)h_n\iint\big[C_s(s,t)\,u + C_t(s,t)\,v\big]K(-u)K(-v)\,du\,dv + O(h_n^2), \qquad (47)
\]
where the last step is by Taylor series expansions.
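Gaussian moment computations like the one just carried out, and especially the fourth-moment Wick formula of Lemma 7.4 that Appendix F uses repeatedly, can be sanity-checked numerically. The following sketch (illustrative code, not part of the estimation procedure) verifies equation (42) against two exact special cases and a Monte Carlo estimate for an equicorrelated covariance:

```python
import numpy as np

def wick4(S):
    """E(W1 W2 W3 W4) for zero-mean jointly Gaussian W, via eq. (42)."""
    return S[0, 1] * S[2, 3] + S[0, 2] * S[1, 3] + S[0, 3] * S[1, 2]

# Exact case 1: W1 = W2 = W3 = W4 = the same N(0,1) variable, so Sigma is the
# all-ones matrix and the formula must reproduce E(W^4) = 3.
assert wick4(np.ones((4, 4))) == 3.0

# Exact case 2: four independent standard normals; every pairing vanishes.
assert wick4(np.eye(4)) == 0.0

# Monte Carlo agreement for an equicorrelated covariance (correlation 0.5),
# where wick4(S) = 3 * 0.5^2 = 0.75.
S = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))
rng = np.random.default_rng(1)
W = rng.multivariate_normal(np.zeros(4), S, size=400_000)
mc = (W[:, 0] * W[:, 1] * W[:, 2] * W[:, 3]).mean()
assert abs(mc - wick4(S)) < 0.1
```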
Now, noticing that $K$ is symmetric about 0, $\int K(x)\,dx = 1$ and $\int xK(x)\,dx = 0$, (5) and (6) follow from (47) after simplifications.

Asymptotic pointwise bias (31)

We first compute the expected value of the estimate described by (11). For notational simplicity, we write $\tilde X_{i,l}(s)$ for $\tilde X_i(s_l) + (s - s_l)\tilde X_i'(s_l)$. Observe that
\[
\tilde X_{i,l}(s) = \frac{1}{m_i}\sum_{j=1}^{m_i} Y_{ij}\,\frac{1}{h}\Big[K\Big(\frac{s_l - T_{ij}}{h}\Big) + \frac{s - s_l}{h}\,K'\Big(\frac{s_l - T_{ij}}{h}\Big)\Big].
\]
Let the support of the kernel $K(\cdot)$ be denoted by $[-B_K, B_K]$. Then, for each fixed $j = 1, \ldots, m_i$ and $i = 1, \ldots, n$,
\[
E\Big[Y_{ij}^2\Big(K\Big(\frac{s_l - T_{ij}}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - T_{ij}}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - T_{ij}}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - T_{ij}}{h}\Big)\Big)\Big]
\]
\[
= \int\big[C(u,u) + \sigma^2\big]\,g(u)\Big(K\Big(\frac{s_l - u}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - u}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - u}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - u}{h}\Big)\Big)\,du, \qquad (48)
\]
which is 0 if $|s_l - s_{l'}| > 2B_K h$, since in that case $K(\frac{s_l - u}{h})K(\frac{s_{l'} - u}{h}) = 0$ for all $u \in \mathbb{R}$. If $|s_l - s_{l'}| \leq 2B_K h$, the term (48) makes a nonzero contribution to $E[\tilde X_{i,l}(s)\tilde X_{i,l'}(t)]\,Q_h(s - s_l)\,Q_h(t - s_{l'})$ only if $|s - t| \leq 2(B_K + C_Q)h$, where $\mathrm{supp}(Q) = [-C_Q, C_Q]$. Thus, if $A > 4(B_K + C_Q)$, then for $|s - t| > \frac{1}{2}Ah$ we have
\[
w(m_i)\,E\big(\tilde X_{i,l}(s)\tilde X_{i,l'}(t)\big) = \iint C(u,v)\,g(u)g(v)\,\frac{1}{h^2}\Big(K\Big(\frac{s_l - u}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - u}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - v}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - v}{h}\Big)\Big)\,du\,dv
\]
\[
= \iint C(s_l + xh,\, s_{l'} + yh)\,g(s_l + xh)\,g(s_{l'} + yh)\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy. \qquad (49)
\]
We assume that the conditions in Section 4 hold. Then, using the representation (49) and the calculations done in Appendix F, we obtain an expression for the asymptotic bias in estimating $C(\cdot,\cdot)$ as a function of the bandwidth $h \equiv h_n$.
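The bias expansion just sketched reduces, in Appendix F, to one-dimensional kernel moment identities such as $\int (a + x)^2\big(K(x) - a K'(-x)\big)\,dx = K_2 - a^2$, with $a = \frac{s_l - s}{h}$. A numerical check of this identity — taking $K$ to be the standard Gaussian density purely for illustration (the paper only requires a symmetric kernel with $\int K = 1$ and $\int xK = 0$):

```python
import numpy as np

# Check: for b = (s - s_l)/h = -a, int (a + x)^2 (K(x) + b K'(-x)) dx = K_2 - a^2.
# K is chosen as the standard Gaussian density for this illustration only.
xs = np.linspace(-10.0, 10.0, 400001)
dx = xs[1] - xs[0]
K = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)
Kprime_neg = xs * K                     # K'(-x) = x K(x) for the Gaussian density
K2 = (xs**2 * K).sum() * dx             # second kernel moment (equals 1 here)

for a in [0.0, 0.4, -0.8]:
    lhs = ((a + xs)**2 * (K - a * Kprime_neg)).sum() * dx
    assert abs(lhs - (K2 - a**2)) < 1e-6
```

The same grid is used for both sides, so discretization error largely cancels; the identity underlies every $h^2$-order term in Lemmas 7.5 and 7.6.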
These results are summarized in the following lemmas, where $C_s, C_{ss}$ and $C_t, C_{tt}$ denote the first and second partial derivatives of $C(s,t)$ with respect to $s$ and $t$, respectively.

Lemma 7.5. (Expectation of $\tilde C(s,t)$): Let $K_2 = \int x^2 K(x)\,dx$,
\[
\bar Q_h(s) = \sum_{l=1}^{L_n} Q_h(s - s_l), \qquad Q_h^{(2)}(s) = \sum_{l=1}^{L_n}\Big(\frac{s - s_l}{h}\Big)^2 Q_h(s - s_l).
\]
Then, for $|s - t| > 2Ah_n$,
\[
E\tilde C(s,t) = C(s,t)\,\bar Q_h(s)\bar Q_h(t) + \frac{h^2}{2}C(s,t)\Big[\frac{g''(s)}{g(s)}\big(K_2\bar Q_h(s) - Q_h^{(2)}(s)\big)\bar Q_h(t) + \frac{g''(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big)\bar Q_h(s)\Big]
\]
\[
+ h^2 C_s\,\frac{g'(s)}{g(s)}\big(K_2\bar Q_h(s) - Q_h^{(2)}(s)\big)\bar Q_h(t) + h^2 C_t\,\frac{g'(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big)\bar Q_h(s)
\]
\[
+ \frac{h^2}{2}\,\frac{1}{g(s)g(t)}\Big[C_{ss}\big(K_2\bar Q_h(s) - Q_h^{(2)}(s)\big)\bar Q_h(t) + C_{tt}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big)\bar Q_h(s)\Big] + O(h^{2+\alpha}). \qquad (50)
\]
Note that, because of property (iii) of the kernel $Q$ and the fact that $s_l = (l + a)h$ for $l = 1, \ldots, L_n$, for some constant $a \in [-3, 3]$, we have, for $s \in (c, 1-c)$ with some $c \in (0,1)$,
\[
\bar Q_h(s) = \sum_{l=1}^{L_n} Q\Big(\frac{s}{h} - a - l\Big) = 1.
\]
Therefore, we can choose $L_n$ and the sequence of points $\{s_l\}_{l=1}^{L_n}$ so that $L_n \approx h_n^{-1}$ and $\bar Q_h(s) \equiv 1$ for all $s \in [0,1]$. That is, from Lemma 7.5 we have $E\tilde C(s,t) = C(s,t) + O(h^2)$.

Lemma 7.6. (Expectation of $\hat C_*(t)$): Let $C'(t)$ and $C''(t)$ denote the first and second derivatives of the function $C(t) := C(t,t)$. Then, uniformly in $t$,
\[
E\hat C_*(t) = (C(t) + \sigma^2)\,\bar Q_h(t) + \frac{h^2}{2}(C(t) + \sigma^2)\,\frac{g''(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big) + h^2 C'(t)\,\frac{g'(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big)
\]
\[
+ \frac{h^2}{2}\,\frac{C''(t)}{g(t)}\big(K_2\bar Q_h(t) - Q_h^{(2)}(t)\big) + O(h^{2+\alpha}). \qquad (51)
\]
The proof of Lemma 7.6 follows along the lines of that of Lemma 7.5.
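The claim above that $\bar Q_h(s) \equiv 1$ away from the boundary rests on the partition-of-unity property of the centered cubic B-spline: its integer translates sum to one. This can be checked directly from the piecewise polynomial form of $Q$ given in Appendix F (the evaluation points below are arbitrary illustrative choices):

```python
import numpy as np

def Q(x):
    """Centered cubic B-spline on [-2, 2], piecewise form from Appendix F."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m = (-2 <= x) & (x < -1); out[m] = (2 + x[m])**3 / 6
    m = (-1 <= x) & (x < 0);  out[m] = (-3 * x[m]**3 - 6 * x[m]**2 + 4) / 6
    m = (0 <= x) & (x < 1);   out[m] = (3 * x[m]**3 - 6 * x[m]**2 + 4) / 6
    m = (1 <= x) & (x <= 2);  out[m] = (2 - x[m])**3 / 6
    return out

# Partition of unity: sum_l Q(s - l) = 1 at interior points.
s = np.linspace(-0.49, 0.49, 97)
total = sum(Q(s - l) for l in range(-4, 5))
assert np.allclose(total, 1.0)

# Q is a density on [-2, 2]: a fine Riemann sum of Q is 1.
xs = np.linspace(-2.0, 2.0, 400001)
assert abs(Q(xs).mean() * 4.0 - 1.0) < 1e-4
```

This is exactly why a grid $s_l = (l + a)h$ with $L_n \approx h_n^{-1}$ points makes $\bar Q_h \equiv 1$ in the interior.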
Furthermore, if an estimator $\hat\sigma^2$ is such that $E\hat\sigma^2 = \sigma^2 + O(h^2)$, then it follows from Lemma 7.6 that the estimator $\hat C(t) := \hat C_*(t) - \hat\sigma^2$ satisfies
\[
E\hat C(t) = C(t) + O(h^2), \qquad (52)
\]
uniformly in $t \in [0,1]$, since $\bar Q_h(t) \equiv 1$ on $[0,1]$. Next, since $C(s,t) = C(t,s)$ and $C(\cdot,\cdot)$ is smooth, it follows that $C_s - C_t \equiv 0$ on the diagonal. Consequently, using a Taylor series expansion, it follows that, for any $A > 0$,
\[
C(s,t) = C\Big(\frac{s+t}{2}\Big) + O(h^2), \qquad \text{for } |s - t| \leq \frac{A}{2}h. \qquad (53)
\]
Combining (52) and (53), we get
\[
E\hat C\Big(\frac{s+t}{2}\Big) = C(s,t) + O(h^2), \qquad \text{for } |s - t| \leq \frac{A}{2}h,\; s,t \in [0,1]. \qquad (54)
\]

Appendix D

Proof of Proposition 5.1

We shall make extensive use of the representation
\[
H_\nu(x,y) = \bar H_\nu(x,y) - \frac{1}{\lambda_\nu}\,\delta(x - y), \qquad (55)
\]
where
\[
\bar H_\nu(x,y) = \sum_{1 \leq k \neq \nu \leq M}\frac{\lambda_k}{\lambda_k - \lambda_\nu}\,\psi_k(x)\psi_k(y) + \frac{1}{\lambda_\nu}\,\psi_\nu(x)\psi_\nu(y).
\]
The first step is to express $\hat C_c(s,t)$ as $\tilde C_c(s,t) - W_{\tilde h_n}(s,t)(\hat\sigma^2 - \sigma^2)$, where
\[
\tilde C_c(s,t) = \big(1 - W_{\tilde h_n}(s,t)\big)\tilde C(s,t) + W_{\tilde h_n}(s,t)\Big(\hat C_*\Big(\frac{s+t}{2}\Big) - \sigma^2\Big). \qquad (56)
\]
Therefore, in order to separate out the effect of estimating $\sigma^2$, we use the fact that, for any fixed $\epsilon > 0$,
\[
\|H_\nu\hat C_c\psi_\nu\|_2^2 \leq (1+\epsilon)\,\|H_\nu\tilde C_c\psi_\nu\|_2^2 + \Big(1 + \frac{1}{\epsilon}\Big)(\hat\sigma^2 - \sigma^2)^2\,\|H_\nu W_{\tilde h_n}\psi_\nu\|_2^2 = (1+\epsilon)\,\|H_\nu\tilde C_c\psi_\nu\|_2^2 + \Big(1 + \frac{1}{\epsilon}\Big)(\hat\sigma^2 - \sigma^2)^2\,O(h_n^4). \qquad (57)
\]
The last equality follows since, using $H_\nu\psi_\nu = 0$, the definition of $W_{\tilde h_n}$, and the Mean Value Theorem, we have
\[
\big|(H_\nu W_{\tilde h_n}\psi_\nu)(x)\big| = \left|\int H_\nu(x,s)\int_{(s - \frac{A h_n}{2})\vee 0}^{(s + \frac{A h_n}{2})\wedge 1}\big(\psi_\nu(t) - \psi_\nu(s)\big)\,dt\,ds\right| \leq \frac{A^2 h_n^2}{2}\,\|\psi_\nu'\|_\infty\Big(\int|H_\nu(x,s)|\,ds + \frac{1}{\lambda_\nu}\Big).
\]
Since $E(\hat\sigma^2 - \sigma^2)^2 = o(1)$, it is enough to show that $E\|H_\nu\tilde C_c\psi_\nu\|_2^2$ has the bound given by the RHS of (40), without the multiplicative factor $(1+\epsilon)$.
With a slight abuse of notation, we write $\hat C(s)$ for $\hat C_*(s) - \sigma^2$. Then, since
\[
(H_\nu\tilde C_c\psi_\nu)(x) = \iint H_\nu(x,s)\big(1 - W_{\tilde h_n}(s,t)\big)\tilde C(s,t)\,\psi_\nu(t)\,ds\,dt + \iint H_\nu(x,s)\,W_{\tilde h_n}(s,t)\,\hat C\Big(\frac{s+t}{2}\Big)\psi_\nu(t)\,ds\,dt,
\]
it follows that $\|H_\nu\tilde C_c\psi_\nu\|_2^2$ equals
\[
\int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\big(1 - W_{\tilde h_n}(s_1,t_1)\big)\big(1 - W_{\tilde h_n}(s_2,t_2)\big)\,\tilde C(s_1,t_1)\tilde C(s_2,t_2)\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx
\]
\[
+ \int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\,W_{\tilde h_n}(s_1,t_1)W_{\tilde h_n}(s_2,t_2)\,\hat C\Big(\frac{s_1+t_1}{2}\Big)\hat C\Big(\frac{s_2+t_2}{2}\Big)\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx
\]
\[
+ 2\int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\big(1 - W_{\tilde h_n}(s_1,t_1)\big)W_{\tilde h_n}(s_2,t_2)\,\tilde C(s_1,t_1)\,\hat C\Big(\frac{s_2+t_2}{2}\Big)\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx. \qquad (58)
\]
Thus, in order to obtain $E\|H_\nu\tilde C_c\psi_\nu\|_2^2$, we need to evaluate the quantities $E[\tilde C(s_1,t_1)\tilde C(s_2,t_2)]$, $E[\hat C(\frac{s_1+t_1}{2})\hat C(\frac{s_2+t_2}{2})]$ and $E[\tilde C(s_1,t_1)\hat C(\frac{s_2+t_2}{2})]$. Let
\[
U_i(s,t) = \sum_{l,l'=1}^{L_n}\frac{1}{m_i^2}\sum_{j,j'=1}^{m_i} Y_{ij}Y_{ij'}\,\tilde K_{s,l}(T_{ij})\,\tilde K_{t,l'}(T_{ij'})\,Q_h(s - s_l)\,Q_h(t - s_{l'}), \qquad (59)
\]
where $\tilde K_{s,l}(\cdot)$ is as in (44). Then we can express the expectation of the first term on the RHS of (58) as
\[
\frac{1}{n^2}\sum_{i=1}^n w^2(m_i)\int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\big(1 - W_{\tilde h_n}(s_1,t_1)\big)\big(1 - W_{\tilde h_n}(s_2,t_2)\big)\,\frac{E[U_i(s_1,t_1)U_i(s_2,t_2)]}{g(s_1)g(s_2)g(t_1)g(t_2)}\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx
\]
\[
+ \frac{1}{n^2}\sum_{i_1 \neq i_2} w(m_{i_1})w(m_{i_2})\int\!\!\cdots\!\!\int H_\nu(x,s_1)H_\nu(x,s_2)\big(1 - W_{\tilde h_n}(s_1,t_1)\big)\big(1 - W_{\tilde h_n}(s_2,t_2)\big)\,\frac{E[U_{i_1}(s_1,t_1)U_{i_2}(s_2,t_2)]}{g(s_1)g(s_2)g(t_1)g(t_2)}\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1 ds_2 dt_1 dt_2\,dx. \qquad (60)
\]
The following proposition is the key to obtaining a simplified bound on (60). It is proved by a lengthy but fairly straightforward calculation.
The details are given in Appendix F.

Proposition 7.1. Suppose that $A > 4(B_K + C_Q)$. Then, for $|s_k - t_k| > \frac{1}{2}Ah_n$ ($k = 1,2$), we have
\[
\frac{1}{n^2}\sum_{i=1}^n w^2(m_i)\,\frac{E[U_i(s_1,t_1)U_i(s_2,t_2)]}{g(s_1)g(s_2)g(t_1)g(t_2)} = \frac{1}{n}\sum_{i=1}^n\frac{(m_i-2)(m_i-3)}{m_i(m_i-1)}\Big[\big(C(s_1,t_1) + O(h_n^2)\big)\big(C(s_2,t_2) + O(h_n^2)\big)
\]
\[
+ \big(C(s_1,s_2) + O(h_n^2)\big)\big(C(t_1,t_2) + O(h_n^2)\big) + \big(C(s_1,t_2) + O(h_n^2)\big)\big(C(s_2,t_1) + O(h_n^2)\big)\Big] + Z_1 + Z_2 + Z_3 + Z_4 + Z_5 + Z_6, \qquad (61)
\]
where the quantities $Z_j := Z_j(s_1,s_2,t_1,t_2)$, $j = 1,\ldots,6$, are as follows: $Z_1,\ldots,Z_4$ are asymptotically equivalent to $Z(s_1,s_2)$, $Z(s_1,t_2)$, $Z(t_1,s_2)$ and $Z(t_1,t_2)$, respectively, and $Z_5, Z_6$ are asymptotically equivalent to $\tilde Z(s_1,s_2,t_1,t_2)$ and $\tilde Z(s_1,t_2,t_1,s_2)$, respectively, where
\[
Z(s,t) = \begin{cases} O\big(\frac{1}{n h_n m}\big) & \text{if } |s-t| \leq \frac{Ah_n}{2} \\ 0 & \text{otherwise,} \end{cases}
\]
and
\[
\tilde Z(s_1,s_2,t_1,t_2) = \begin{cases}
O\big(\frac{1}{n h_n^2 m^2}\big) & \text{if } \max\{|s_1-s_2|, |t_1-t_2|\} \leq \frac{Ah_n}{2} \\
O\big(\frac{1}{n h_n m^2}\big) & \text{if } |s_1-s_2| \leq \frac{Ah_n}{2} \text{ and } |t_1-t_2| > \frac{Ah_n}{2} \\
O\big(\frac{1}{n h_n m^2}\big) & \text{if } |s_1-s_2| > \frac{Ah_n}{2} \text{ and } |t_1-t_2| \leq \frac{Ah_n}{2} \\
0 & \text{otherwise.}
\end{cases}
\]
Also,
\[
\frac{1}{n^2}\sum_{i_1 \neq i_2}w(m_{i_1})w(m_{i_2})\,\frac{E[U_{i_1}(s_1,t_1)U_{i_2}(s_2,t_2)]}{g(s_1)g(s_2)g(t_1)g(t_2)} = \frac{n-1}{n}\big(C(s_1,t_1) + O(h_n^2)\big)\big(C(s_2,t_2) + O(h_n^2)\big)
\]
\[
+ \frac{1}{n^2}\Big(\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big)\Big[\big(C(s_1,s_2) + O(h_n^2)\big)\big(C(t_1,t_2) + O(h_n^2)\big) + \big(C(s_1,t_2) + O(h_n^2)\big)\big(C(s_2,t_1) + O(h_n^2)\big)\Big]. \qquad (62)
\]
In all of the above, the $O(\cdot)$ terms are uniform in $s_1, s_2, t_1, t_2$ in their respective domains.

Now we deal with the last two terms on the RHS of (58). Let
\[
V_i(s) = \sum_{l=1}^{L_n}\frac{1}{m_i}\sum_{j=1}^{m_i}Y_{ij}^2\,\tilde K_{s,l}(T_{ij})\,Q_{h_n}(s - s_l). \qquad (63)
\]
Then $\hat C_*(s) = \frac{1}{n}\sum_{i=1}^n [g(s)]^{-1}\,V_i(s)$.
For convenience, in the rest of this subsection we shall use $z_k$ to denote $(s_k + t_k)/2$, for $k = 1, 2$. The following proposition then describes the contribution of quantities of the type $E[V_{i_1}(z_1)V_{i_2}(z_2)]$ and $E[U_{i_1}(s_1,t_1)V_{i_2}(z_2)]$.

Proposition 7.2. Suppose that $A > 4(B_K + C_Q)$. Then, (i) for $|s_k - t_k| \leq \frac{Ah_n}{2}$, $k = 1, 2$,
\[
\frac{1}{n^2}\sum_{i=1}^n\frac{E(V_i(z_1)V_i(z_2))}{g(z_1)g(z_2)} + \frac{1}{n^2}\sum_{i_1 \neq i_2}\frac{E(V_{i_1}(z_1)V_{i_2}(z_2))}{g(z_1)g(z_2)} - \sigma^2\big[E(\hat C_*(z_1)) + E(\hat C_*(z_2))\big] + \sigma^4
\]
\[
= C(s_1,t_1)C(s_2,t_2) - \Big(\frac{1}{n^2}\sum_{i=1}^n\frac{1}{m_i}\Big)\big(C(s_1,t_1) + \sigma^2\big)\big(C(s_2,t_2) + \sigma^2\big) + O(h_n^2)
\]
\[
+ \Big[\frac{1}{n}\Big(1 - \frac{1}{n}\sum_{i=1}^n\frac{1}{m_i}\Big) + \frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big]\big(C(s_1,s_2)C(t_1,t_2) + C(s_1,t_2)C(s_2,t_1) + O(h_n)\big) + Z_7, \qquad (64)
\]
where $Z_7 := Z_7(z_1,z_2)$ is asymptotically equivalent to $Z(z_1,z_2)$. Next, if (ii) $|s_1 - t_1| > \frac{Ah_n}{2}$ and $|s_2 - t_2| \leq \frac{Ah_n}{2}$, then
\[
\frac{1}{n^2}\sum_{i=1}^n w(m_i)\,E(U_i(s_1,t_1)V_i(z_2)) + \frac{1}{n^2}\sum_{i_1 \neq i_2}w(m_{i_1})\,E(U_{i_1}(s_1,t_1)V_{i_2}(z_2)) - \sigma^2\,E\tilde C(s_1,t_1)
\]
\[
= \big(C(s_1,t_1) + O(h_n^2)\big)\big(C(s_2,t_2) + O(h_n^2)\big) - \Big(\frac{1}{n^2}\sum_{i=1}^n\frac{2}{m_i}\Big)\big(C(s_1,t_1) + O(h_n^2)\big)\big(C(s_2,t_2) + \sigma^2 + O(h_n^2)\big)
\]
\[
+ \Big[\frac{1}{n}\Big(1 - \frac{1}{n}\sum_{i=1}^n\frac{2}{m_i}\Big) + \frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big]\big(C(s_1,s_2)C(t_1,t_2) + C(s_1,t_2)C(s_2,t_1) + O(h_n)\big) + Z_8 + Z_9, \qquad (65)
\]
where the $O(h_n^2)$ terms within brackets in the first term on the RHS depend on $(s_1,t_1)$ and $(s_2,t_2)$, respectively, and $Z_j := Z_j(s_1,t_1,z_2)$, $j = 8, 9$, satisfy
\[
Z_8 = \begin{cases} O\big(\frac{1}{n h_n^2 m}\big) & \text{if } |s_1 - s_2| \leq \frac{Ah_n}{2} \\ 0 & \text{otherwise;} \end{cases} \qquad
Z_9 = \begin{cases} O\big(\frac{1}{n h_n^2 m}\big) & \text{if } |t_1 - s_2| \leq \frac{Ah_n}{2} \\ 0 & \text{otherwise.} \end{cases}
\]
The proof of Proposition 5.1 is now completed by using the definitions of $E[\tilde C(s_1,t_1)\tilde C(s_2,t_2)]$, $E[\hat C(\frac{s_1+t_1}{2})\hat C(\frac{s_2+t_2}{2})]$ and $E[\tilde C(s_1,t_1)\hat C(\frac{s_2+t_2}{2})]$; using the properties of the kernel $H_\nu(x,y)$; and plugging the bounds of Propositions 7.1 and 7.2 back into the expectation of (58). The details can be found in Appendix F.

Appendix E

Asymptotic pointwise variance (32)

In this section we prove (32), (33) and (34). Most of the derivations are similar to those for Proposition 5.1, so we give only a brief outline. First, using the fact that $(1 - W_{\tilde h_n}(s,t))\,W_{\tilde h_n}(s,t) = 0$, we obtain
\[
\mathrm{Var}(\hat C_c(s,t)) = \big(1 - W_{\tilde h_n}(s,t)\big)\,\mathrm{Var}(\tilde C(s,t)) + W_{\tilde h_n}(s,t)\,\mathrm{Var}\Big(\hat C_*\Big(\frac{s+t}{2}\Big) - \hat\sigma^2\Big)
\]
\[
\leq \big(1 - W_{\tilde h_n}(s,t)\big)\,\mathrm{Var}(\tilde C(s,t)) + 2\,W_{\tilde h_n}(s,t)\Big[\mathrm{Var}\Big(\hat C_*\Big(\frac{s+t}{2}\Big)\Big) + \mathrm{Var}(\hat\sigma^2)\Big].
\]
Since $E(\hat\sigma^2 - \sigma^2)^2$ has the rate given by (33) (Corollary 4.1), we only need to provide bounds for $(1 - W_{\tilde h_n}(s,t))\,\mathrm{Var}(\tilde C(s,t))$ and $W_{\tilde h_n}(s,t)\,\mathrm{Var}(\hat C_*(\frac{s+t}{2}))$. We state these in the following propositions.

Proposition 7.3.
\[
\big(1 - W_{\tilde h_n}(s,t)\big)\,\mathrm{Var}(\tilde C(s,t)) = O\Big(\frac{1}{n}\Big) + \Big(\frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big)\,O(1) + O\Big(\max\Big\{\frac{1}{n h_n^2 m^2},\, \frac{1}{n h_n m}\Big\}\Big). \qquad (66)
\]

Proposition 7.4.
\[
W_{\tilde h_n}(s,t)\,\mathrm{Var}\Big(\hat C_*\Big(\frac{s+t}{2}\Big)\Big) = O\Big(\frac{1}{n}\Big) + \Big(\frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big)\,O(1) + O\Big(\frac{1}{n h_n m}\Big). \qquad (67)
\]
The proof of (34) is completed by combining Propositions 7.3 and 7.4 with Corollary 4.1.
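Before turning to the remaining proofs, it may help to see the quadratic-form tail bounds of Lemmas 7.2-7.3 (used heavily in the proof of Proposition 5.2 below) in action. The following simulation uses arbitrary illustrative choices ($n = 100$, $\Phi = I$, $L = 1$, $t = 0.2$, $\delta = 0.5$, none taken from the paper) and checks that the empirical tail probability of $|X^T Y|/n$ stays below the bound of Lemma 7.2:

```python
import numpy as np

# Empirical check of the bilinear-form tail bound of Lemma 7.2 with Phi = I,
# so that ||Phi|| = 1 = L and the event {||Phi|| <= L} always holds.
rng = np.random.default_rng(2)
n, L, t, delta = 100, 1.0, 0.2, 0.5
assert t < delta / (1 - delta) * L          # t lies in the admissible range

reps = 20_000
X = rng.standard_normal((reps, n))
Y = rng.standard_normal((reps, n))
tail_emp = np.mean(np.abs((X * Y).sum(axis=1)) / n > t)

bound = 2 * np.exp(-(1 - delta) * n * t**2 / (2 * L**2))
# The empirical tail probability is far below the exponential bound.
assert tail_emp < bound
```

Here the bound is $2e^{-1} \approx 0.74$, deliberately loose; $X^T Y/n$ has standard deviation $n^{-1/2} = 0.1$, so the empirical tail at $t = 0.2$ is roughly a two-sigma event.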
Proof of Corollary 4.1

First observe that
\[
E(\hat\sigma^2 - \sigma^2)^2 = \frac{1}{(T_1 - T_0)^2}\int_{T_0}^{T_1}\int_{T_0}^{T_1}E\big[(\hat C_*(t) - \sigma^2 - \hat C_0(t))(\hat C_*(s) - \sigma^2 - \hat C_0(s))\big]\,ds\,dt
\]
\[
\leq \sup_{t \in [T_0,T_1]} E\big(\hat C_*(t) - \sigma^2 - \hat C_0(t)\big)^2 \qquad \text{(by the Cauchy-Schwarz inequality)}
\]
\[
\leq 2\sup_{t \in [T_0,T_1]}\mathrm{Var}(\hat C_*(t)) + 2\sup_{t \in [T_0,T_1]}\mathrm{Var}(\hat C_0(t)) + \sup_{t \in [T_0,T_1]}\big(E(\hat C_*(t)) - \sigma^2 - E(\hat C_0(t))\big)^2. \qquad (68)
\]
By Propositions 7.3 and 7.4, and the definition (12) of $\hat C_0$, the sum of the first two terms on the RHS of (68) is bounded by
\[
O\Big(\frac{1}{n}\Big) + \Big(\frac{1}{n^2}\sum_{i_1 \neq i_2}\rho_{i_1 i_2}^2\Big)\,O(1) + O\Big(\max\Big\{\frac{1}{n h_n^2 m^2},\, \frac{1}{n h_n m}\Big\}\Big).
\]
On the other hand, since for any bounded $u \in [A_1, A_2]$,
\[
\Big|\frac{1}{2}\big(C(t - h_n u,\, t + h_n u) + C(t + h_n u,\, t - h_n u)\big) - C(t,t)\Big| = O(h_n^2),
\]
uniformly in $t \in [T_0, T_1]$, it follows from Lemmas 7.5 and 7.6 (Appendix C) that the last term on the RHS of (68) is $O(h_n^4)$.

Proof of Proposition 5.2

Without loss of generality we take $g$ to be the uniform density on $[0,1]$. We need to consider two cases separately: (i) $|s - t| > \frac{Ah}{2}$ and (ii) $|s - t| \leq \frac{Ah}{2}$.

(i) $|s - t| > \frac{Ah}{2}$: In this case we have $\hat C_c(s,t) - E[\hat C_c(s,t)] = (1 - W_{Ah}(s,t))(\tilde C(s,t) - E[\tilde C(s,t)])$. Let
\[
B_i(s, T_{ij}) = \sum_{l=1}^{L_n}\tilde K_{s,l}(T_{ij})\,Q_h(s - s_l), \qquad 1 \leq j \leq m_i,\; 1 \leq i \leq n.
\]
Since $|\tilde K_{s,l}(T_{ij})| = O(h^{-1})$ and only finitely many summands are nonzero, there exists a constant $C_3 > 0$ such that
\[
\sup_{s \in [0,1]}\max_{1 \leq i \leq n}\max_{1 \leq j \leq m_i}|B_i(s, T_{ij})| \leq C_3 h^{-1}. \qquad (69)
\]
Note further that $B_i(s, T_{ij}) = 0$ if $|s - T_{ij}| > 2(B_K + C_Q)h$.
Next,
\[
\sum_{l=1}^{L_n}\tilde X_{i,l}(s)\,Q_h(s - s_l) = \frac{1}{m_i}\sum_{j=1}^{m_i}\big(X_i(T_{ij}) + \sigma\varepsilon_{ij}\big)B_i(s, T_{ij}) = \sum_{k=1}^M\sqrt{\lambda_k}\,\xi_{ik}\Big(\frac{1}{m_i}\sum_{j=1}^{m_i}\psi_k(T_{ij})B_i(s,T_{ij})\Big) + \sigma\,\frac{1}{m_i}\sum_{j=1}^{m_i}\varepsilon_{ij}B_i(s,T_{ij})
\]
\[
= \sum_{k=1}^M\sqrt{\lambda_k}\,\xi_{ik}\,B^1_{i,k}(s) + \sigma\,\frac{1}{m_i}\sum_{j=1}^{m_i}\varepsilon_{ij}B_i(s,T_{ij}),
\]
where $B^1_{i,k}(s) := \frac{1}{m_i}\sum_{j=1}^{m_i}\psi_k(T_{ij})B_i(s,T_{ij})$. By (69), there exists $C_4 > 0$ such that
\[
\sup_{s \in [0,1]}\max_{1 \leq k \leq M}\max_{1 \leq i \leq n}|B^1_{i,k}(s)| \leq C_4 h^{-1}. \qquad (70)
\]
Also, since $A > 4(B_K + C_Q)$ and $|s - t| \geq \frac{Ah}{2}$, it follows that $B_i(s,T_{ij})B_i(t,T_{ij}) = 0$. Moreover, $B_i(s,T_{ij})B_i(t,T_{ij'}) \neq 0$ only if $1 \leq j \neq j' \leq m_i$ are such that $|s - T_{ij}| \leq 2(B_K + C_Q)h$ and $|t - T_{ij'}| \leq 2(B_K + C_Q)h$. This implies that $P_g(B^1_{i,k}(s)B^1_{i,k'}(t) \neq 0) \leq C_5\, m_i(m_i - 1)\,h^2$ for some $C_5 := C_5(A) > 0$. Furthermore, for each $k = 1,\ldots,M$, the $\{B^1_{i,k}(s)\}_{i=1}^n$ are independent, and these random variables are independent of $\{\xi_{ik} : 1 \leq k \leq M\}_{i=1}^n$ and $\{\varepsilon_{ij} : 1 \leq j \leq m_i\}_{i=1}^n$. Then we can express $\tilde C(s,t) - E[\tilde C(s,t)]$ as
\[
\tilde C(s,t) - E[\tilde C(s,t)] = \sum_{1 \leq k \neq k' \leq M}\sqrt{\lambda_k\lambda_{k'}}\,\frac{1}{n}\sum_{i=1}^n\xi_{ik}\xi_{ik'}\,w(m_i)\,B^1_{i,k}(s)B^1_{i,k'}(t) + \sum_{k=1}^M\lambda_k\,\frac{1}{n}\sum_{i=1}^n(\xi_{ik}^2 - 1)\,w(m_i)\,B^1_{i,k}(s)B^1_{i,k}(t)
\]
\[
+ \sum_{k=1}^M\lambda_k\,\frac{1}{n}\sum_{i=1}^n w(m_i)\big(B^1_{i,k}(s)B^1_{i,k}(t) - E(B^1_{i,k}(s)B^1_{i,k}(t))\big) + \sigma\sum_{k=1}^M\sqrt{\lambda_k}\,\frac{1}{n}\sum_{i=1}^n\frac{w(m_i)}{m_i}\sum_{j=1}^{m_i}\xi_{ik}\varepsilon_{ij}\big(B^1_{i,k}(s)B_i(t,T_{ij}) + B^1_{i,k}(t)B_i(s,T_{ij})\big)
\]
\[
+ \sigma^2\,\frac{1}{n}\sum_{i=1}^n\frac{w(m_i)}{m_i^2}\sum_{j \neq j'}^{m_i}\varepsilon_{ij}\varepsilon_{ij'}\,B_i(s,T_{ij})B_i(t,T_{ij'}) + \sigma^2\,\frac{1}{n}\sum_{i=1}^n\frac{w(m_i)}{m_i^2}\sum_{j=1}^{m_i}(\varepsilon_{ij}^2 - 1)\,B_i(s,T_{ij})B_i(t,T_{ij})
\]
\[
+ \sigma^2\,\frac{1}{n}\sum_{i=1}^n\frac{w(m_i)}{m_i^2}\sum_{j=1}^{m_i}\big(B_i(s,T_{ij})B_i(t,T_{ij}) - E(B_i(s,T_{ij})B_i(t,T_{ij}))\big).
\]
(71)

The last two terms in the above expression vanish, since $|s - t| > 4(B_K + C_Q)h$. Note that $\max_{1 \leq i \leq n}w(m_i)$ is bounded. By (70), $|B^1_{i,k}(s)B^1_{i,k}(t)| \leq C_4^2 h^{-2}$ for $k = 1,\ldots,M$, and for all $k, k'$,
\[
\max_{1 \leq i \leq n}\mathrm{Var}\big(B^1_{i,k}(s)B^1_{i,k'}(t)\big) \leq C_6\max\big\{(mh)^{-2},\, (mh)^{-1}\big\} \quad \text{for some } C_6 = C_6(A) > 0 \qquad (72)
\]
(see Appendix F). Thus, by Bernstein's inequality and the condition $m^2 = o(nh^2/\log n)$, given $\eta > 0$ there exists $c_{1,\eta} > 0$ such that for sufficiently large $n$ (so that the bound in (72) is $O((mh)^{-2})$),
\[
P_g\Big(\max_{k=1,\ldots,M}\Big|\frac{1}{n}\sum_{i=1}^n w(m_i)\big(B^1_{i,k}(s)B^1_{i,k}(t) - E(B^1_{i,k}(s)B^1_{i,k}(t))\big)\Big| > c_{1,\eta}\sqrt{\frac{\log n}{n h^2 m^2}}\Big) \leq n^{-\eta}. \qquad (73)
\]
Next, let $\mathcal{A}$ be the set of indices $i$ such that $B^1_{i,k}(s)B^1_{i,k'}(t) \neq 0$ for some $k, k'$, and let $N_n = |\mathcal{A}|$. Since for any $k, k'$ we have $P(B^1_{i,k}(s)B^1_{i,k'}(t) \neq 0) \leq C_5 m^2 h^2$, it follows by another application of Bernstein's inequality that there exist a set $D_n$ (in the sigma-field generated by $\{T_{ij}\}$) and a constant $c_{2,\eta} > 0$ such that $D_n = \{N_n \leq c_{2,\eta}\,n m^2 h^2\}$ and $P(D_n) \geq 1 - n^{-\eta}$. Therefore we can restrict our attention to the set $D_n$. Conditioning on $T$, we can express $\xi_{\mathcal{A},k} = (\xi_{ik})_{i \in \mathcal{A}}$ as $\xi_{\mathcal{A},k} := (R_{\mathcal{A}\mathcal{A}})^{1/2}\bar\xi_{\mathcal{A},k}$, where the random vectors $\bar\xi_{\mathcal{A},k}$ have $N_{N_n}(0, I)$ distributions and are independent for different $k$. Then we can write (conditionally on $T$)
\[
\sum_{i=1}^n\xi_{ik}\xi_{ik'}\,w(m_i)\,B^1_{i,k}(s)B^1_{i,k'}(t) = \bar\xi_{\mathcal{A},k}^T\,\Phi(T)\,\bar\xi_{\mathcal{A},k'}, \quad \text{where } \Phi(T) = (R_{\mathcal{A}\mathcal{A}})^{1/2}\,\mathrm{diag}\big(w(m_i)B^1_{i,k}(s)B^1_{i,k'}(t)\big)_{i \in \mathcal{A}}\,(R_{\mathcal{A}\mathcal{A}})^{1/2}.
\]
Observe that, by (70) and condition C3, we have $\|\Phi(T)\| \leq C_4\,\kappa_n\,h^{-2}$.
Therefore, by an application of Lemma 7.2, we have, for some $c_{3,\eta} > 0$,
\[
P\Big(\Big|\frac{1}{n}\sum_{i=1}^n\xi_{ik}\xi_{ik'}\,w(m_i)\,B^1_{i,k}(s)B^1_{i,k'}(t)\Big| > c_{3,\eta}\,m\kappa_n\sqrt{\frac{\log n}{n h^2}},\; D_n\Big) \leq n^{-\eta}.
\]
Very similar arguments can be used to obtain bounds of order $m\kappa_n\sqrt{\log n/(nh^2)}$ (holding with probability at least $1 - O(n^{-\eta})$, for any given $\eta > 0$) for the second, fourth and fifth terms on the RHS of (71). Thus, by the conditions on $\kappa_n$ and $h_n$, we have, for some constant $c_{4,\eta} > 0$,
\[
P\Big(\big|\big(1 - W_{Ah}(s,t)\big)\big(\hat C_c(s,t) - E(\hat C_c(s,t))\big)\big| > c_{4,\eta}\,m\kappa_n\sqrt{\frac{\log n}{n h^2}}\Big) \leq n^{-\eta}. \qquad (74)
\]
(ii) $|s - t| \leq \frac{Ah}{2}$: In this case we have $\hat C_c(s,t) - E[\hat C_c(s,t)] = W_{Ah}(s,t)\big(\hat C_*(\frac{s+t}{2}) - E[\hat C_*(\frac{s+t}{2})]\big)$ (ignoring the maximum over $h_n^2$ in the definition). Then similar (but somewhat simpler) arguments, now involving Lemma 7.3, show that for some $c_{5,\eta} > 0$,
\[
P\Big(\Big|W_{Ah}(s,t)\Big(\hat C_*\Big(\frac{s+t}{2}\Big) - E\Big[\hat C_*\Big(\frac{s+t}{2}\Big)\Big]\Big)\Big| > c_{5,\eta}\,m\kappa_n\sqrt{\frac{\log n}{n h^2}}\Big) \leq n^{-\eta}. \qquad (75)
\]
Combining (74) and (75), we obtain the result.

Appendix F

Details of the computation of $G^Q_j(\cdot)$

We want to give the explicit functional form of $G^Q_j(y)$ for $j = 0, 1$ and any $y \in \mathbb{R}$. Let
\[
B_1(x) = x^3/6, \quad B_2(x) = (-3x^3 + 3x^2 + 3x + 1)/6, \quad B_3(x) = (3x^3 - 6x^2 + 4)/6, \quad B_4(x) = (1 - x)^3/6.
\]
Then the centered version of the cubic B-spline $Q$ has the form
\[
Q(x) = \begin{cases}
B_1(x+2) & \text{for } -2 \leq x \leq -1 \\
B_2(x+1) & \text{for } -1 \leq x \leq 0 \\
B_3(x) & \text{for } 0 \leq x \leq 1 \\
B_4(x-1) & \text{for } 1 \leq x \leq 2 \\
0 & \text{otherwise}
\end{cases}
= \begin{cases}
\frac{1}{6}(2+x)^3 & \text{for } -2 \leq x \leq -1 \\
\frac{1}{6}(-3x^3 - 6x^2 + 4) & \text{for } -1 \leq x \leq 0 \\
\frac{1}{6}(3x^3 - 6x^2 + 4) & \text{for } 0 \leq x \leq 1 \\
\frac{1}{6}(2-x)^3 & \text{for } 1 \leq x \leq 2 \\
0 & \text{otherwise.}
\end{cases}
\]
Note that $G^Q_j(y)$ can then be computed by utilizing the fact that, for $j = 0, 1$,
\[
G^Q_j(y) = \int_{-2}^{(y + \frac{A}{2})\wedge 2}x^j Q(x)\,dx - \int_{-2}^{(y - \frac{A}{2})\wedge 2}x^j Q(x)\,dx,
\]
where the integrals on the right-hand side are defined to be zero if the corresponding upper limits are less than $-2$. The integrals on the RHS of the above equation can be computed from the representation of $Q(\cdot)$ as follows:
\[
\int_{-2}^b Q(x)\,dx = \frac{1}{24}(2+b)^4, \quad -2 \leq b \leq -1; \qquad
\int_{-1}^b Q(x)\,dx = \frac{1}{24}(-3b^4 - 8b^3 + 16b + 11), \quad -1 \leq b \leq 0;
\]
\[
\int_0^b Q(x)\,dx = \frac{1}{24}(3b^4 - 8b^3 + 16b), \quad 0 \leq b \leq 1; \qquad
\int_1^b Q(x)\,dx = \frac{1}{24}\big(1 - (2-b)^4\big), \quad 1 \leq b \leq 2;
\]
\[
\int_{-2}^b xQ(x)\,dx = \frac{1}{30}(2+b)^5 - \frac{1}{12}(2+b)^4, \quad -2 \leq b \leq -1; \qquad
\int_{-1}^b xQ(x)\,dx = \frac{1}{60}(-6b^5 - 15b^4 + 20b^2 - 11), \quad -1 \leq b \leq 0;
\]
\[
\int_0^b xQ(x)\,dx = \frac{1}{60}(6b^5 - 15b^4 + 20b^2), \quad 0 \leq b \leq 1; \qquad
\int_1^b xQ(x)\,dx = \frac{1}{30}(2-b)^5 - \frac{1}{12}(2-b)^4 + \frac{1}{20}, \quad 1 \leq b \leq 2.
\]

Details of the calculation of the pointwise bias

Performing Taylor series expansions around $(s,t)$, we get
\[
g(s_l + xh) = g(s) + h\Big(\frac{s_l - s}{h} + x\Big)g'(s) + \frac{h^2}{2}\Big(\frac{s_l - s}{h} + x\Big)^2 g''(s) + O\Big(\Big(\Big|\frac{s - s_l}{h}\Big|^{2+\alpha} + |x|^{2+\alpha}\Big)h^{2+\alpha}\Big),
\]
\[
g(s_{l'} + yh) = g(t) + h\Big(\frac{s_{l'} - t}{h} + y\Big)g'(t) + \frac{h^2}{2}\Big(\frac{s_{l'} - t}{h} + y\Big)^2 g''(t) + O\Big(\Big(\Big|\frac{t - s_{l'}}{h}\Big|^{2+\alpha} + |y|^{2+\alpha}\Big)h^{2+\alpha}\Big), \qquad (76)
\]
and
\[
C(s_l + xh,\, s_{l'} + yh) = C(s,t) + h\Big(\frac{s_l - s}{h} + x,\; \frac{s_{l'} - t}{h} + y\Big)\begin{pmatrix} C_s(s,t) \\ C_t(s,t) \end{pmatrix}
\]
\[
+ \frac{h^2}{2}\Big(\frac{s_l - s}{h} + x,\; \frac{s_{l'} - t}{h} + y\Big)\begin{pmatrix} C_{ss} & C_{st} \\ C_{ts} & C_{tt} \end{pmatrix}\begin{pmatrix}\frac{s_l - s}{h} + x \\ \frac{s_{l'} - t}{h} + y\end{pmatrix} + O\Big(\Big(\Big|\frac{s - s_l}{h}\Big|^{2+\alpha} + \Big|\frac{t - s_{l'}}{h}\Big|^{2+\alpha} + |x|^{2+\alpha} + |y|^{2+\alpha}\Big)h^{2+\alpha}\Big). \qquad (77)
\]
First we consider the off-diagonal terms, i.e., we compute $E\tilde C(s,t)$ for $|s - t| > 2Ah$.

• $h^0$ terms: Since $\int K(x)\,dx = 1$ and $\int K'(x)\,dx = 0$,
\[
\iint C(s,t)\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy = C(s,t).
\]
(78)

• $h^1$ terms: Since $\int xK'(-x)\,dx = 1$, $\int xK(x)\,dx = 0$ and $\int K(x)\,dx = 1$,
\[
\iint h\Big[\Big(\frac{s_l - s}{h} + x\Big)C_s + \Big(\frac{s_{l'} - t}{h} + y\Big)C_t\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy = 0, \qquad (79)
\]
and
\[
\iint h\,C(s,t)\Big[g(s)g'(t)\Big(\frac{s_{l'} - t}{h} + y\Big) + g'(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy = 0. \qquad (80)
\]

• $h^2$ terms: Since $\int x^2K'(-x)\,dx = 0$, $\int xK'(-x)\,dx = 1$, $\int xK(x)\,dx = 0$ and $\int K(x)\,dx = 1$,
\[
\frac{h^2}{2}C(s,t)\iint\Big[g''(t)g(s)\Big(\frac{s_{l'} - t}{h} + y\Big)^2 + g''(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)^2\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy
\]
\[
= \frac{h^2}{2}C(s,t)\Big[g''(t)g(s)\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big) + g''(s)g(t)\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big)\Big];
\]
\[
h^2 C(s,t)\iint\Big(\frac{s_l - s}{h} + x\Big)\Big(\frac{s_{l'} - t}{h} + y\Big)g'(s)g'(t)\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy = 0;
\]
\[
h^2\iint\Big[\Big(\frac{s_l - s}{h} + x\Big)C_s + \Big(\frac{s_{l'} - t}{h} + y\Big)C_t\Big]\Big[g(s)g'(t)\Big(\frac{s_{l'} - t}{h} + y\Big) + g'(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy
\]
\[
= h^2\Big[C_s\,g'(s)g(t)\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) + C_t\,g(s)g'(t)\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big)\Big];
\]
\[
\frac{h^2}{2}\iint\Big[\Big(\frac{s_l - s}{h} + x\Big)^2 C_{ss} + 2\Big(\frac{s_l - s}{h} + x\Big)\Big(\frac{s_{l'} - t}{h} + y\Big)C_{st} + \Big(\frac{s_{l'} - t}{h} + y\Big)^2 C_{tt}\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)\,dx\,dy
\]
\[
= \frac{h^2}{2}\Big[C_{ss}\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) + C_{tt}\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big)\Big].
\]
In summary, the $h^2$ term in the expansion is
\[
h^2\Big[\frac{1}{2}g''(s)g(t)\,C + g'(s)g(t)\,C_s + \frac{1}{2}C_{ss}\Big]\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) + h^2\Big[\frac{1}{2}g(s)g''(t)\,C + g(s)g'(t)\,C_t + \frac{1}{2}C_{tt}\Big]\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big).
\]
(81)

Proof of Lemma 7.5: Combining (78), (79), (80) and (81), and using (76), (77) and the fact that $\sum_{l=1}^{L_n} \bigl|\frac{s - s_l}{h}\bigr|^{\beta}\, Q_h(s - s_l) < \infty$, after some algebra, we obtain (50).

Combined bound on $E\|H_\nu \tilde C_c \psi_\nu\|_2^2$

We put the different pieces derived in Appendix D together to obtain a bound on $E\|H_\nu \tilde C_c \psi_\nu\|_2^2$. For ease of notation, we denote by $H_\nu \equiv H_\nu(x,s_1,s_2,t_1,t_2)$ the integral operator with kernel $H_\nu(x,s_1)\, H_\nu(x,s_2)\, \psi_\nu(t_1)\, \psi_\nu(t_2)$. Then, with $r_1, r_2$ taking values 0 or 1,
\[
\int\!\cdots\!\int H_\nu(x,s_1,s_2,t_1,t_2)\,(C(s_1,t_1))^{r_1}\,(C(s_2,t_2))^{r_2}\,ds_1\,ds_2\,dt_1\,dt_2\,dx = 0, \tag{82}
\]
\[
\int\!\cdots\!\int H_\nu(x,s_1,s_2,t_1,t_2)\,(C(s_1,t_2))^{r_1}\,(C(s_2,t_1))^{r_2}\,ds_1\,ds_2\,dt_1\,dt_2\,dx = 0, \tag{83}
\]
\[
\int\!\cdots\!\int H_\nu(x,s_1,s_2,t_1,t_2)\,(C(s_1,s_2))^{r_1}\,(C(t_1,t_2))^{r_2}\,ds_1\,ds_2\,dt_1\,dt_2\,dx = \lambda_\nu^{r_2}\Bigl(\sum_{1 \le k \ne \nu \le M} \frac{\lambda_k}{(\lambda_k - \lambda_\nu)^2}\Bigr)^{r_1}. \tag{84}
\]
Implicitly using (130)-(132), we also have the bound
\[
\Bigl|\int\!\cdots\!\int H_\nu(x,s_1,s_2,t_1,t_2)\, R(s_1,s_2,t_1,t_2)\,ds_1\,ds_2\,dt_1\,dt_2\,dx\Bigr| = O(\|R\|_\infty). \tag{85}
\]
From Proposition 7.1, the total contribution in (60) of the first terms on the RHS of (61) and (62) becomes
\[
\Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{4 m_i - 6}{m_i(m_i - 1)}\Bigr) + \frac{n-1}{n}\Bigr) \int\Bigl(\int\!\!\int H_\nu(x,s)\, W_{\tilde h_n}(s,t)\,[C(s,t) + O(h_n^2)]\,\psi_\nu(t)\,ds\,dt\Bigr)^2 dx
\]
\[
+ \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{4 m_i - 6}{m_i(m_i - 1)}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2} \rho^2_{i_1 i_2}\Bigr) \int\!\cdots\!\int H_\nu(x,s_1)\, H_\nu(x,s_2)\, W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\,(C(s_1,s_2) + O(h_n^2))(C(t_1,t_2) + O(h_n^2))\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1\,ds_2\,dt_1\,dt_2\,dx
\]
\[
+ \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{4 m_i - 6}{m_i(m_i - 1)}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2} \rho^2_{i_1 i_2}\Bigr) \int\!\cdots\!\int H_\nu(x,s_1)\, H_\nu(x,s_2)\, W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\,(C(s_1,t_2) + O(h_n^2))(C(s_2,t_1) + O(h_n^2))\,\psi_\nu(t_1)\psi_\nu(t_2)\,ds_1\,ds_2\,dt_1\,dt_2\,dx. \tag{86}
\]
Since $H_\nu C \psi_\nu \equiv 0$, it can be checked that the first integral in (86) is $O(h_n^2)$. On the other hand, from the definition of $W_{\tilde h_n}(s,t)$ and the fact that $H_\nu C \psi_\nu \equiv 0$, it follows that the last integral term is $O(h_n)$. Next, apply $H_\nu$ to the following functions: $W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\, D_2(s_1,s_2,t_1,t_2)$ and $2\, W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\, D_3(s_1,s_2,t_1,t_2)$, where $D_2(s_1,s_2,t_1,t_2)$ and $D_3(s_1,s_2,t_1,t_2)$ are the terms given by the sum of the first three terms on the RHS of (64) (including the isolated $O(h_n^2)$ term), and the sum of the first three terms on the RHS of (65), respectively. Then, adding these terms to (86), we have, by (82)-(85), (132) (for dealing with the isolated $O(h_n^2)$ term in (64)), and the comment following (86), that this sum equals
\[
R_1 = \frac{1}{n}\Bigl(\sum_{1 \le k \ne \nu \le M} \frac{\lambda_k \lambda_\nu}{(\lambda_k - \lambda_\nu)^2}\Bigr) + \Bigl(\frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\Bigl(\sum_{1 \le k \ne \nu \le M} \frac{\lambda_k \lambda_\nu}{(\lambda_k - \lambda_\nu)^2} + O(h_n)\Bigr) + O(h_n^4) + O\Bigl(\frac{1}{nm}\Bigr) + O\Bigl(\frac{h_n}{n}\Bigr).
\]
(87)

Next, for notational convenience, express the integral operator $H_\nu$ applied to $Z_j$ (where the $Z_j$ are as in Propositions 7.1-7.2) times $W_{\tilde h_n}(s_1,t_1)\, W_{\tilde h_n}(s_2,t_2)\, W_{\tilde h_n}(s_1,t_2)$ by $H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{s_1,t_2} Z_j$, etc. Using (130)-(135), and the bounds in Proposition 7.1 for $Z_j$, $j = 1,\ldots,4$, we have
\[
R_2 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_1 = H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{s_1,s_2} Z_1 = O\Bigl(\frac{1}{n h_n m}\Bigr),
\]
\[
R_3 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_2 = H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{s_1,t_2} Z_2 = O\Bigl(\frac{1}{nm}\Bigr),
\]
\[
R_4 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_3 = H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{s_2,t_1} Z_3 = O\Bigl(\frac{1}{nm}\Bigr),
\]
\[
R_5 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_4 = H_\nu W_{s_1,t_1} W_{s_2,t_2} W_{t_1,t_2} Z_4 = O\Bigl(\frac{1}{nm}\Bigr).
\]
Using analogous reasoning, from Propositions 7.1 and 7.2 we also have
\[
R_6 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_5 = O\Bigl(\frac{1}{n h_n m^2}\Bigr), \qquad R_7 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_6 = O\Bigl(\frac{1}{n m^2}\Bigr),
\]
\[
R_8 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_7 = O\Bigl(\frac{h_n}{nm}\Bigr), \qquad R_9 := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_8 = O\Bigl(\frac{1}{n h_n m}\Bigr), \qquad R_{10} := H_\nu W_{s_1,t_1} W_{s_2,t_2} Z_9 = O\Bigl(\frac{1}{nm}\Bigr).
\]
Hence, combining (87) with the bounds for $R_2$ to $R_{10}$, using the definitions of $E[\tilde C(s_1,t_1)\,\tilde C(s_2,t_2)]$, $E[\hat C(\frac{s_1+t_1}{2})\,\hat C(\frac{s_2+t_2}{2})]$ and $E[\tilde C(s_1,t_1)\,\hat C(\frac{s_2+t_2}{2})]$, and plugging everything back into (58), we complete the proof of Proposition 5.1. The details of the key steps in this derivation are given below.

Proof details for Proposition 5.1

Proof of Proposition 7.1: We need to deal with terms of the form
\[
\tilde E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2) := E\bigl[Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2}\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i_1 j'_1})\, \tilde K_{s_2,l_2}(T_{i_2 j_2})\, \tilde K_{t_2,l'_2}(T_{i_2 j'_2})\bigr],
\]
for $1 \le j_1, j'_1 \le m_{i_1}$, $1 \le j_2, j'_2 \le m_{i_2}$, $1 \le i_1, i_2 \le n$.
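The analysis that follows splits the sum over the index quadruples $(j_1, j'_1, j_2, j'_2)$ into coincidence patterns: all four distinct, six single-coincidence patterns, three two-pair patterns, four triple-coincidence patterns, and all four equal. A minimal sketch (not part of the paper; the value of $m$ is an arbitrary small illustration) that verifies this bookkeeping by brute-force enumeration:

```python
from itertools import product

def pattern(j1, jp1, j2, jp2):
    # Canonical coincidence pattern of the quadruple, e.g. (0, 0, 1, 2)
    # means the first two indices coincide and the rest are distinct.
    first, out = {}, []
    for v in (j1, jp1, j2, jp2):
        if v not in first:
            first[v] = len(first)
        out.append(first[v])
    return tuple(out)

m = 5  # illustrative number of measurements per curve
counts = {}
for t in product(range(m), repeat=4):
    p = pattern(*t)
    counts[p] = counts.get(p, 0) + 1

all_distinct = counts[(0, 1, 2, 3)]
one_pair  = sum(v for k, v in counts.items() if len(set(k)) == 3)
two_pairs = sum(v for k, v in counts.items()
                if sorted(k.count(i) for i in set(k)) == [2, 2])
triples   = sum(v for k, v in counts.items()
                if sorted(k.count(i) for i in set(k)) == [1, 3])
all_equal = counts[(0, 0, 0, 0)]

assert all_distinct == m * (m - 1) * (m - 2) * (m - 3)
assert one_pair  == 6 * m * (m - 1) * (m - 2)  # six single-coincidence patterns
assert two_pairs == 3 * m * (m - 1)            # three two-pair patterns
assert triples   == 4 * m * (m - 1)            # four triple patterns
assert all_equal == m
assert all_distinct + one_pair + two_pairs + triples + all_equal == m ** 4
```

The last assertion checks that the fifteen patterns together exhaust all $m^4$ quadruples, which is exactly the completeness of the case decomposition used below.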
For computational convenience, we also define
\[
E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2) = \tilde E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)\, Q_h(s_1 - s_{l_1})\, Q_h(t_1 - s_{l'_1})\, Q_h(s_2 - s_{l_2})\, Q_h(t_2 - s_{l'_2}). \tag{88}
\]
First, consider the case $i_1 = i_2 = i$, say. Then, using $\star$ to denote $E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)$, we have
\[
E[U_i(s_1,t_1)\, U_i(s_2,t_2)] = \frac{1}{m_i^4} \sum_{j_1 \ne j'_1 \ne j_2 \ne j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star
\]
\[
+ \frac{1}{m_i^4}\Bigl(\sum_{j_1 = j'_1 \ne j_2 \ne j'_2}^{m_i} + \sum_{j_1 = j_2 \ne j'_1 \ne j'_2}^{m_i} + \sum_{j_1 = j'_2 \ne j'_1 \ne j_2}^{m_i} + \sum_{j_1 \ne j'_1 \ne j_2 = j'_2}^{m_i} + \sum_{j_1 \ne j'_1 = j_2 \ne j'_2}^{m_i} + \sum_{j_1 \ne j_2 \ne j'_1 = j'_2}^{m_i}\Bigr) \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star
\]
\[
+ \frac{1}{m_i^4}\Bigl(\sum_{j_1 = j'_1 \ne j_2 = j'_2}^{m_i} + \sum_{j_1 = j_2 \ne j'_1 = j'_2}^{m_i} + \sum_{j_1 = j'_2 \ne j'_1 = j_2}^{m_i}\Bigr) \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star
\]
\[
+ \frac{1}{m_i^4}\Bigl(\sum_{j_1 = j'_1 = j_2 \ne j'_2}^{m_i} + \sum_{j_1 = j'_1 = j'_2 \ne j_2}^{m_i} + \sum_{j_1 = j_2 = j'_2 \ne j'_1}^{m_i} + \sum_{j_1 \ne j'_1 = j_2 = j'_2}^{m_i}\Bigr) \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star
\]
\[
+ \frac{1}{m_i^4} \sum_{j_1 = j'_1 = j_2 = j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star. \tag{89}
\]
Next, consider the case $i_1 \ne i_2$. Then, with $\star$ denoting $E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)$,
\[
E[U_{i_1}(s_1,t_1)\, U_{i_2}(s_2,t_2)] = \frac{1}{m_{i_1}^2 m_{i_2}^2} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2 \ne j'_2}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star + \frac{1}{m_{i_1}^2 m_{i_2}^2}\Bigl(\sum_{j_1 = j'_1}^{m_{i_1}} \sum_{j_2 \ne j'_2}^{m_{i_2}} + \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2 = j'_2}^{m_{i_2}}\Bigr) \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star + \frac{1}{m_{i_1}^2 m_{i_2}^2} \sum_{j_1 = j'_1}^{m_{i_1}} \sum_{j_2 = j'_2}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \star.
\]
(90)

Note that, for all $i_1, i_2$, if either $j_1 = j'_1$ or $j_2 = j'_2$, then
\[
\sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2) = 0,
\]
unless $|s_1 - t_1| \le \frac{Ah}{2}$, or $|s_2 - t_2| \le \frac{Ah}{2}$, respectively, for $A$ satisfying $A \ge 4(B_K + C_Q)$ and $\tilde h_n = A h_n$. This can be verified by using the definition of $E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)$, equations (113), (114), (116)-(119), and arguing as in the analysis of the term (48). Therefore, since $1_{\{|s_k - t_k| \le \frac{Ah}{2}\}}\, W_{\tilde h_n}(s_k,t_k) = 0$ for $k = 1, 2$, the sums corresponding to either $j_1 = j'_1$ or $j_2 = j'_2$ in (89) and (90) do not contribute anything to (60). Thus, when $i_1 \ne i_2$, the only sum that contributes to (60) corresponds to $j_1 \ne j'_1$, $j_2 \ne j'_2$. When $i_1 = i_2 = i$, the sums that contribute to (60) are the ones corresponding to $j_1 \ne j'_1 \ne j_2 \ne j'_2$, $j_1 = j_2 \ne j'_1 \ne j'_2$, $j_1 = j'_2 \ne j'_1 \ne j_2$, $j_1 \ne j'_1 = j_2 \ne j'_2$, $j_1 \ne j_2 \ne j'_1 = j'_2$, $j_1 = j_2 \ne j'_1 = j'_2$, and $j_1 = j'_2 \ne j'_1 = j_2$. We consider these cases one by one.

Lemma 7.7. If $i_1 = i_2$, $j_1 \ne j'_1 \ne j_2 \ne j'_2$; or $i_1 \ne i_2$, $j_1 \ne j'_1$, $j_2 \ne j'_2$, then
\[
\sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = (C(s_1,t_1) + O(h^2))(C(s_2,t_2) + O(h^2))
\]
\[
+\ \rho^2_{i_1 i_2}\bigl[(C(s_1,s_2) + O(h^2))(C(t_1,t_2) + O(h^2)) + (C(s_1,t_2) + O(h^2))(C(s_2,t_1) + O(h^2))\bigr], \tag{91}
\]
where the $O(h^2)$ terms are uniform in $s_1, t_1, s_2, t_2 \in [0,1]$.

The following lemma gives an expression and the corresponding bound for the term $Z_1$.

Lemma 7.8.
If $i_1 = i_2 = i$, $j_1 = j_2 \ne j'_1 \ne j'_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 = j_2 \ne j'_1 \ne j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |s_1 - s_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{92}
\]
The following lemma gives expressions and the corresponding bounds for the terms $Z_2$, $Z_3$ and $Z_4$.

Lemma 7.9. If $i_1 = i_2 = i$ and $j_1 = j'_2 \ne j'_1 \ne j_2$; $j_1 \ne j'_1 = j_2 \ne j'_2$; $j_1 \ne j_2 \ne j'_1 = j'_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 = j'_2 \ne j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |s_1 - t_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise;} \end{cases} \tag{93}
\]
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 \ne j'_1 = j_2 \ne j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |t_1 - s_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise;} \end{cases} \tag{94}
\]
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 \ne j_2 \ne j'_1 = j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |t_1 - t_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{95}
\]
The following lemma gives expressions and the corresponding bounds for the terms $Z_5$ and $Z_6$.

Lemma 7.10.
If $i_1 = i_2 = i$ and $j_1 = j_2 \ne j'_1 = j'_2$, $j_1 = j'_2 \ne j'_1 = j_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 = j_2 \ne j'_1 = j'_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nh^2 m^2}\bigr) & \text{if } \max\{|s_1 - s_2|, |t_1 - t_2|\} \le \frac{Ah}{2}, \\ O\bigl(\frac{1}{nh m^2}\bigr) & \text{if } |s_1 - s_2| \le \frac{Ah}{2} \text{ and } |t_1 - t_2| > \frac{Ah}{2}, \\ O\bigl(\frac{1}{nh m^2}\bigr) & \text{if } |s_1 - s_2| > \frac{Ah}{2} \text{ and } |t_1 - t_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise;} \end{cases} \tag{96}
\]
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^4} \sum_{j_1 = j'_2 \ne j'_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{ii;\, j_1 j'_1 j_2 j'_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2,l'_2)}{g(s_1)\, g(t_1)\, g(s_2)\, g(t_2)} = \begin{cases} O\bigl(\frac{1}{nh^2 m^2}\bigr) & \text{if } \max\{|s_1 - t_2|, |s_2 - t_1|\} \le \frac{Ah}{2}, \\ O\bigl(\frac{1}{nh m^2}\bigr) & \text{if } |s_1 - t_2| \le \frac{Ah}{2} \text{ and } |s_2 - t_1| > \frac{Ah}{2}, \\ O\bigl(\frac{1}{nh m^2}\bigr) & \text{if } |s_1 - t_2| > \frac{Ah}{2} \text{ and } |s_2 - t_1| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{97}
\]
Proof of Proposition 7.2: Define
\[
F_{i_1 i_2;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1, l_2) := E\bigl[Y^2_{i_1 j_1} Y^2_{i_2 j_2}\, \tilde K_{(s_1+t_1)/2,\, l_1}(T_{i_1 j_1})\, \tilde K_{(s_2+t_2)/2,\, l_2}(T_{i_2 j_2})\bigr]\, Q_h\Bigl(\frac{s_1+t_1}{2} - s_{l_1}\Bigr)\, Q_h\Bigl(\frac{s_2+t_2}{2} - s_{l_2}\Bigr) \tag{98}
\]
and
\[
G_{i_1 i_2;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1, l'_1, l_2) := E\bigl[Y_{i_1 j_1} Y_{i_1 j'_1} Y^2_{i_2 j_2}\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i_1 j'_1})\, \tilde K_{(s_2+t_2)/2,\, l_2}(T_{i_2 j_2})\bigr]\, Q_h(s_1 - s_{l_1})\, Q_h(t_1 - s_{l'_1})\, Q_h\Bigl(\frac{s_2+t_2}{2} - s_{l_2}\Bigr). \tag{99}
\]
First, if $i_1 = i_2 = i$ then, with $\star$ denoting $F_{ii;\, j_1 j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)$,
\[
E\Bigl[V_i\Bigl(\frac{s_1+t_1}{2}\Bigr)\, V_i\Bigl(\frac{s_2+t_2}{2}\Bigr)\Bigr] = \frac{1}{m_i^2} \sum_{j_1 \ne j_2}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^2} \sum_{j_1 = j_2}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star.
\]
(100)

Next, if $i_1 \ne i_2$ then, with $\star$ denoting $F_{i_1 i_2;\, j_1 j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)$,
\[
E\Bigl[V_{i_1}\Bigl(\frac{s_1+t_1}{2}\Bigr)\, V_{i_2}\Bigl(\frac{s_2+t_2}{2}\Bigr)\Bigr] = \frac{1}{m_{i_1} m_{i_2}} \sum_{j_1=1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star. \tag{101}
\]
Next, if $i_1 = i_2 = i$ then, with $\star$ denoting $G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)$,
\[
E\Bigl[U_i(s_1,t_1)\, V_i\Bigl(\frac{s_2+t_2}{2}\Bigr)\Bigr] = \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^3} \sum_{j_1 = j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^3} \sum_{j'_1 \ne j_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_i^3} \sum_{j_1 = j'_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star. \tag{102}
\]
Finally, if $i_1 \ne i_2$, then, with $\star$ denoting $G_{i_1 i_2;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)$,
\[
E\Bigl[U_{i_1}(s_1,t_1)\, V_{i_2}\Bigl(\frac{s_2+t_2}{2}\Bigr)\Bigr] = \frac{1}{m_{i_1}^2 m_{i_2}} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star + \frac{1}{m_{i_1}^2 m_{i_2}} \sum_{j_1 = j'_1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \star. \tag{103}
\]
Arguments similar to those employed earlier show that the sums corresponding to $j_1 = j'_1$ in (102) and (103) do not contribute anything to $E\|H_\nu \hat C_c \psi_\nu\|^2$. We first consider $E(V_{i_1}(z_1)\, V_{i_2}(z_2))$. Lemmas 7.11 and 7.12, stated below, give expressions for the leading term and the term $Z_7$ (and the corresponding bound), respectively, in (64).

Lemma 7.11.
If $i_1 \ne i_2$, or $i_1 = i_2$ and $j_1 \ne j_2$, then for $|s_k - t_k| \le \frac{Ah}{2}$, $k = 1, 2$, with $A \ge 4(B_K + C_Q)$,
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^2} \sum_{j_1 \ne j_2}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{ii;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{1}{m_{i_1} m_{i_2}} \sum_{j_1=1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{i_1 i_2;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} - \sigma^2\bigl[E(\hat C^*(z_1)) + E(\hat C^*(z_2))\bigr] + \sigma^4
\]
\[
= C(s_1,t_1)\, C(s_2,t_2) - \Bigl(\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i}\Bigr)(C(s_1,t_1) + \sigma^2)(C(s_2,t_2) + \sigma^2) + O(h^2)
+ \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr). \tag{104}
\]

Lemma 7.12. If $i_1 = i_2 = i$, $j_1 = j_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^2} \sum_{j_1=1}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{ii;\, j_1, j_1}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} = \begin{cases} O\bigl(\frac{1}{nhm}\bigr) & \text{if } |z_1 - z_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{105}
\]
Finally, consider the term $E(U_{i_1}(s_1,t_1)\, V_{i_2}(z_2))$. Lemma 7.13 gives an expression for the leading term in (65); Lemma 7.14 gives expressions and the corresponding bounds for the terms $Z_8$ and $Z_9$.

Lemma 7.13. If $i_1 \ne i_2$, $j_1 \ne j'_1$; or $i_1 = i_2$, $j_1 \ne j'_1 \ne j_2$, then for $|s_1 - t_1| > \frac{Ah}{2}$ and $|s_2 - t_2| \le \frac{Ah}{2}$, with $A \ge 4(B_K + C_Q)$,
\[
\frac{1}{n^2} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)}
+ \frac{1}{n^2} \sum_{i_1 \ne i_2}^n w(m_{i_1})\, \frac{1}{m_{i_1}^2 m_{i_2}} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{i_1 i_2;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} - \sigma^2\, E\tilde C(s_1,t_1)
\]
\[
= (C(s_1,t_1) + O(h^2))(C(s_2,t_2) + O(h^2)) - \Bigl(\frac{1}{n^2} \sum_{i=1}^n \frac{2}{m_i}\Bigr)(C(s_1,t_1) + O(h^2))(C(s_2,t_2) + \sigma^2 + O(h^2))
+ \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr). \tag{106}
\]

Lemma 7.14. If $i_1 = i_2 = i$ and $j'_1 \ne j_1 = j_2$, $j_1 \ne j'_1 = j_2$, then
\[
\frac{1}{n^2} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^3} \sum_{j'_1 \ne j_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} = \begin{cases} O\bigl(\frac{1}{nh^2 m}\bigr) & \text{if } |s_1 - s_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise;} \end{cases} \tag{107}
\]
\[
\frac{1}{n^2} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 = j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} = \begin{cases} O\bigl(\frac{1}{nh^2 m}\bigr) & \text{if } |t_1 - s_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise.} \end{cases} \tag{108}
\]

Details of the calculation of pointwise variance (32)

Proof of Proposition 7.3: Consider first
\[
W_{\tilde h_n}(s,t)\,\mathrm{Var}(\tilde C(s,t)) = W_{\tilde h_n}(s,t)\, E\bigl[\tilde C(s,t)^2\bigr] - \bigl(E\bigl[W_{\tilde h_n}(s,t)\, \tilde C(s,t)\bigr]\bigr)^2.
\]
Using (59), (88) and the arguments leading to (91), we have
\[
W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{w(m_{i_1})\, w(m_{i_2})}{m_{i_1}^2 m_{i_2}^2} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2 \ne j'_2}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E_{i_1 i_2;\, j_1 j'_1 j_2 j'_2}(s,t,s,t;\, l_1,l'_1,l_2,l'_2)}{(g(s)\, g(t))^2} - \bigl(E\bigl[W_{\tilde h_n}(s,t)\, \tilde C(s,t)\bigr]\bigr)^2
\]
\[
= W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{w(m_{i_1})\, w(m_{i_2})}{m_{i_1}^2 m_{i_2}^2} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2 \ne j'_2}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2,l'_2=1}^{L_n} \frac{E\bigl[C(T_{i_1 j_1}, T_{i_1 j'_1})\, \tilde K_{s,l_1}(T_{i_1 j_1})\, \tilde K_{t,l'_1}(T_{i_1 j'_1})\bigr]}{g(s)\, g(t)} \cdot \frac{E\bigl[C(T_{i_2 j_2}, T_{i_2 j'_2})\, \tilde K_{s,l_2}(T_{i_2 j_2})\, \tilde K_{t,l'_2}(T_{i_2 j'_2})\bigr]}{g(s)\, g(t)}
\]
\[
- W_{\tilde h_n}(s,t)\, \frac{1}{n^2}\Bigl(\sum_{i=1}^n \frac{w(m_i)}{m_i^2} \sum_{j_1 \ne j'_1}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \frac{E\bigl[C(T_{i j_1}, T_{i j'_1})\, \tilde K_{s,l_1}(T_{i j_1})\, \tilde K_{t,l'_1}(T_{i j'_1})\bigr]}{g(s)\, g(t)}\Bigr)^2
\]
\[
+ W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\bigl[(C(s,s) + O(h^2))(C(t,t) + O(h^2)) + (C(s,t) + O(h^2))^2\bigr]
\]
\[
= -W_{\tilde h_n}(s,t)\, \frac{1}{n}\,(C(s,t) + O(h^2))^2 + W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\bigl[(C(s,s) + O(h^2))(C(t,t) + O(h^2)) + (C(s,t) + O(h^2))^2\bigr]
\]
\[
= O\Bigl(\frac{1}{n}\Bigr) + \Bigl(\frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\, O(1). \tag{109}
\]
Combining (109) with (91) and (92)-(97), we obtain (66).
Proof of Proposition 7.4: Write
\[
W_{\tilde h_n}(s,t)\,\mathrm{Var}\Bigl(\hat C^*\Bigl(\frac{s+t}{2}\Bigr)\Bigr) = W_{\tilde h_n}(s,t)\, E\Bigl[\hat C^*\Bigl(\frac{s+t}{2}\Bigr)^2\Bigr] - \Bigl(E\Bigl[W_{\tilde h_n}(s,t)\, \hat C^*\Bigl(\frac{s+t}{2}\Bigr)\Bigr]\Bigr)^2,
\]
and observe that, by (63), (98) and (104), and following steps very similar to those leading to (109), we have
\[
W_{\tilde h_n}(s,t)\, \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{1}{m_{i_1} m_{i_2}} \sum_{j_1=1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{i_1 i_2;\, j_1, j_2}(s,t,s,t;\, l_1,l_2)}{g\bigl(\frac{s+t}{2}\bigr)^2} - \Bigl(E\Bigl[W_{\tilde h_n}(s,t)\, \hat C^*\Bigl(\frac{s+t}{2}\Bigr)\Bigr]\Bigr)^2
\]
\[
= -W_{\tilde h_n}(s,t)\, \frac{1}{n}\,(C(s,t) + \sigma^2 + O(h^2))^2 + W_{\tilde h_n}(s,t)\Bigl(\frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(2(C(s,t))^2 + O(h)\bigr) = O\Bigl(\frac{1}{n}\Bigr) + \Bigl(\frac{1}{n^2} \sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\, O(1). \tag{110}
\]
Combining (110) with the steps leading to (104) and (105), we obtain (67).

Proofs of Lemmas 7.7 - 7.14

Proof of Lemma 7.7: Since $\rho_{ii} = 1$, from expressions (112) and (115) we can treat the terms corresponding to $i_1 = i_2 = i$ and $i_1 \ne i_2$ in a unified way. From (120) and (121), the expression (122) and the calculations leading to (50), (91) follows.

Proof of Lemma 7.8: It follows from (116), (123) and (128) (taking $s = s_1$, $s' = s_2$, $t = t_1$ and $t' = t_2$ in the latter).

Proof of Lemma 7.9: Follows by arguments analogous to those for deriving (92).

Proof of Lemma 7.10: Follows from (118), (123) and (126).

Proof of Lemma 7.11: By (114) and (118),
\[
E\bigl[Y^2_{i_1 j_1} Y^2_{i_2 j_2} \mid T_{i_1}, T_{i_2}\bigr] = (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)(C(T_{i_2 j_2}, T_{i_2 j_2}) + \sigma^2) + 2\rho^2_{i_1 i_2}\,(C(T_{i_1 j_1}, T_{i_2 j_2}))^2. \tag{111}
\]
The expression for $E\bigl[(C(T_{i_1 j_1}, T_{i_2 j_2}))^2\, \tilde K_{z_1,l_1}(T_{i_1 j_1})\, \tilde K_{z_2,l_2}(T_{i_2 j_2})\bigr]$ is given by
\[
\int\!\!\int (C(u,v))^2\, g(u)\, g(v)\, \tilde K_{z_1,l_1}(u)\, \tilde K_{z_2,l_2}(v)\,du\,dv,
\]
and it can be shown that when we sum over $l_1, l_2 = 1, \ldots, L_n$, the sum equals $(C(z_1,z_2))^2\, g(z_1)\, g(z_2) + O(h^2)$. From this, and the calculations leading to (51), we have, for $|s_k - t_k| \le \frac{Ah}{2}$, $k = 1, 2$, with $A \ge 4(B_K + C_Q)$,
\[
\frac{1}{n^2} \sum_{i=1}^n \frac{1}{m_i^2} \sum_{j_1 \ne j_2}^{m_i} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{ii;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n \frac{1}{m_{i_1} m_{i_2}} \sum_{j_1=1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{F_{i_1 i_2;\, j_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l_2)}{g(z_1)\, g(z_2)} - \sigma^2\bigl[E(\hat C^*(z_1)) + E(\hat C^*(z_2))\bigr] + \sigma^4
\]
\[
= \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{n-1}{n}\Bigr)(C(z_1,z_1) + \sigma^2 + O(h^2))(C(z_2,z_2) + \sigma^2 + O(h^2)) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(2(C(z_1,z_2))^2 + O(h^2)\bigr) - \sigma^2\bigl(C(z_1,z_1) + C(z_2,z_2) + 2\sigma^2 + O(h^2)\bigr) + \sigma^4
\]
\[
= \Bigl(1 - \frac{1}{n^2}\sum_{i=1}^n \frac{1}{m_i}\Bigr)(C(s_1,t_1) + \sigma^2 + O(h^2))(C(s_2,t_2) + \sigma^2 + O(h^2)) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr) - \sigma^2(C(s_1,t_1) + C(s_2,t_2)) - \sigma^4 + O(h^2)
\]
\[
= C(s_1,t_1)\, C(s_2,t_2) - \Bigl(\frac{1}{n^2}\sum_{i=1}^n \frac{1}{m_i}\Bigr)(C(s_1,t_1) + \sigma^2)(C(s_2,t_2) + \sigma^2) + O(h^2) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{1}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr).
\]

Proof of Lemma 7.12: Note that $E(Y^4_{i j_1} \mid T_i) = 3(C(T_{i j_1}, T_{i j_1}) + \sigma^2)^2$. Thus, from (129), we have
\[
\sum_{l_1,l_2=1}^{L_n} E\bigl[E(Y^4_{i j_1} \mid T_i)\, \tilde K_{z_1,l_1}(T_{i j_1})\, \tilde K_{z_2,l_2}(T_{i j_1})\bigr]\, Q_h(z_1 - s_{l_1})\, Q_h(z_2 - s_{l_2}) = \begin{cases} O(h^{-1}) & \text{if } |z_1 - z_2| \le \frac{Ah}{2}, \\ 0 & \text{otherwise,} \end{cases}
\]
uniformly in $s_1, t_1, s_2, t_2 \in [0,1]$ and $1 \le j_1 \le m_i$, $1 \le i \le n$. Therefore (105) follows.
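The Gaussian fourth-moment identities used in the last two proofs, $E[X_1^2 X_2^2] = C_{11}C_{22} + 2C_{12}^2$ and $E(Y^4) = 3(C + \sigma^2)^2$, can be checked numerically. A minimal sketch, not part of the paper (the covariance values are arbitrary), using Gauss-Hermite quadrature:

```python
import numpy as np

# Gauss-Hermite nodes/weights for the physicists' weight exp(-x^2);
# rescaled so that E[f(Z)] = sum_i wn_i f(z_i) for Z ~ N(0, 1).
x, w = np.polynomial.hermite.hermgauss(20)
z = np.sqrt(2.0) * x
wn = w / np.sqrt(np.pi)

def expect2d(f, cov):
    # E[f(X1, X2)] for zero-mean Gaussian (X1, X2) with covariance `cov`,
    # via X = L @ Z (Cholesky) and tensor-product quadrature.
    L = np.linalg.cholesky(cov)
    total = 0.0
    for zi, wi in zip(z, wn):
        for zj, wj in zip(z, wn):
            x1 = L[0, 0] * zi
            x2 = L[1, 0] * zi + L[1, 1] * zj
            total += wi * wj * f(x1, x2)
    return total

c11, c12, c22, sigma2 = 1.3, 0.4, 0.9, 0.25
cov = np.array([[c11, c12], [c12, c22]])

# Wick pairing: E[X1^2 X2^2] = C11*C22 + 2*C12^2.
m22 = expect2d(lambda a, b: a * a * b * b, cov)
assert abs(m22 - (c11 * c22 + 2 * c12 ** 2)) < 1e-8

# With noise, Y = X1 + eps, eps ~ N(0, sigma^2) independent of X1,
# so Y ~ N(0, C11 + sigma^2) and E[Y^4] = 3*(C11 + sigma^2)^2.
covy = np.array([[c11 + sigma2, 0.0], [0.0, 1.0]])
m4 = expect2d(lambda a, b: a ** 4, covy)
assert abs(m4 - 3 * (c11 + sigma2) ** 2) < 1e-8
```

Since the integrands are degree-4 polynomials, the 20-node rule evaluates these expectations essentially exactly, so the assertions confirm the pairing counts (one cross pairing with multiplicity two, and the $3$ in the quartic moment).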
Proof of Lemma 7.13: By (113) and (116),
\[
E\bigl[Y_{i_1 j_1} Y_{i_1 j'_1} Y^2_{i_2 j_2} \mid T_{i_1}, T_{i_2}\bigr] = C(T_{i_1 j_1}, T_{i_1 j'_1})(C(T_{i_2 j_2}, T_{i_2 j_2}) + \sigma^2) + 2\rho^2_{i_1 i_2}\, C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j'_1}, T_{i_2 j_2}).
\]
The expression for $E\bigl[C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j'_1}, T_{i_2 j_2})\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i_1 j'_1})\, \tilde K_{z_2,l_2}(T_{i_2 j_2})\bigr]$ is given by
\[
\int\!\!\int\!\!\int C(u,w)\, C(v,w)\, g(u)\, g(v)\, g(w)\, \tilde K_{s_1,l_1}(u)\, \tilde K_{t_1,l'_1}(v)\, \tilde K_{z_2,l_2}(w)\,du\,dv\,dw,
\]
and it can be shown that when we sum this over $l_1, l'_1, l_2 = 1, \ldots, L_n$, the sum equals
\[
C(s_1,z_2)\, C(t_1,z_2)\, g(s_1)\, g(t_1)\, g(z_2) + O(h^2).
\]
From this, and similar arguments as before, we have, for $|s_1 - t_1| > \frac{Ah}{2}$ and $|s_2 - t_2| \le \frac{Ah}{2}$, with $A \ge 4(B_K + C_Q)$,
\[
\frac{1}{n^2} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^3} \sum_{j_1 \ne j'_1 \ne j_2}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{ii;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} + \frac{1}{n^2} \sum_{i_1 \ne i_2}^n w(m_{i_1})\, \frac{1}{m_{i_1}^2 m_{i_2}} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_1,l'_1=1}^{L_n} \sum_{l_2=1}^{L_n} \frac{G_{i_1 i_2;\, j_1, j'_1, j_2}(s_1,t_1,s_2,t_2;\, l_1,l'_1,l_2)}{g(s_1)\, g(t_1)\, g(z_2)} - \sigma^2\, E\tilde C(s_1,t_1)
\]
\[
= \frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr)(C(s_1,t_1) + O(h^2))(C(z_2,z_2) + \sigma^2 + O(h^2))
\]
\[
+ \frac{1}{n^2} \sum_{i_1 \ne i_2}^n w(m_{i_1})\, \frac{1}{m_{i_1}^2} \sum_{j_1 \ne j'_1}^{m_{i_1}} \sum_{l_1,l'_1=1}^{L_n} \frac{E\bigl[C(T_{i_1 j_1}, T_{i_1 j'_1})\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i_1 j'_1})\bigr]}{g(s_1)\, g(t_1)} \cdot \frac{1}{m_{i_2}} \sum_{j_2=1}^{m_{i_2}} \sum_{l_2=1}^{L_n} \frac{E\bigl[(C(T_{i_2 j_2}, T_{i_2 j_2}) + \sigma^2)\, \tilde K_{z_2,l_2}(T_{i_2 j_2})\bigr]}{g(z_2)}
\]
\[
- \sigma^2\, \frac{1}{n} \sum_{i=1}^n w(m_i)\, \frac{1}{m_i^2} \sum_{j_1 \ne j'_1}^{m_i} \sum_{l_1,l'_1=1}^{L_n} \frac{E\bigl[C(T_{i j_1}, T_{i j'_1})\, \tilde K_{s_1,l_1}(T_{i j_1})\, \tilde K_{t_1,l'_1}(T_{i j'_1})\bigr]}{g(s_1)\, g(t_1)} + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(2\, C(s_1,z_2)\, C(t_1,z_2) + O(h^2)\bigr)
\]
\[
= \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{n-1}{n}\Bigr)(C(s_1,t_1) + O(h^2))(C(z_2,z_2) + \sigma^2 + O(h^2)) - \sigma^2(C(s_1,t_1) + O(h^2)) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(2\, C(s_1,z_2)\, C(t_1,z_2) + O(h^2)\bigr)
\]
\[
= (C(s_1,t_1) + O(h^2))(C(s_2,t_2) + O(h^2)) - \Bigl(\frac{1}{n^2}\sum_{i=1}^n \frac{2}{m_i}\Bigr)(C(s_1,t_1) + O(h^2))(C(s_2,t_2) + \sigma^2 + O(h^2)) + \Bigl(\frac{1}{n}\Bigl(1 - \frac{1}{n}\sum_{i=1}^n \frac{2}{m_i}\Bigr) + \frac{1}{n^2}\sum_{i_1 \ne i_2}^n \rho^2_{i_1 i_2}\Bigr)\bigl(C(s_1,s_2)\, C(t_1,t_2) + C(s_1,t_2)\, C(s_2,t_1) + O(h)\bigr).
\]
The last equality follows from the fact that the $C(s_1,t_1) + O(h^2)$ factors appearing in the successive displays are the same.

Proof of Lemma 7.14: Follows from (117) and (127).

Proof of (72)

Define $W_{ij} = \psi_k(T_{ij})\, B_i(s, T_{ij})$ and $\bar W_{ij} = \psi_{k'}(T_{ij})\, B_i(t, T_{ij})$. Since $|s - t| > Ah/2$, it follows that for all $i$, $W^k_{ij}\, \bar W^l_{ij} = 0$ for all $k, l \ge 1$ and all $j = 1, \ldots, m_i$. Thus, if $|s - t| > Ah/2$, then
\[
m_i^4\, \mathrm{Var}(B_{1i,k}(s)\, B_{1i,k'}(t)) = E\Bigl[\sum_{j \ne j'}^{m_i}\bigl(W_{ij}\bar W_{ij'} - E(W_{ij}\bar W_{ij'})\bigr)\Bigr]^2
\]
\[
= \sum_{j \ne j'}^{m_i}\bigl[E(W_{ij}\bar W_{ij'})^2 - (E(W_{ij}\bar W_{ij'}))^2\bigr] + \sum_{j_1 = j'_2 \ne j'_1 = j_2}^{m_i}\bigl[E(W_{ij_1}\bar W_{ij'_1} W_{ij'_1}\bar W_{ij_1}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij'_1}\bar W_{ij_1})\bigr]
\]
\[
+ \sum_{j_1 = j_2 \ne j'_1 \ne j'_2}^{m_i}\bigl[E(W^2_{ij_1}\bar W_{ij'_1}\bar W_{ij'_2}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij_1}\bar W_{ij'_2})\bigr] + \sum_{j_1 = j'_2 \ne j'_1 \ne j_2}^{m_i}\bigl[E(W_{ij_1}\bar W_{ij'_1} W_{ij_2}\bar W_{ij_1}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij_2}\bar W_{ij_1})\bigr]
\]
\[
+ \sum_{j'_1 = j_2 \ne j_1 \ne j'_2}^{m_i}\bigl[E(W_{ij_1}\bar W_{ij'_1} W_{ij'_1}\bar W_{ij'_2}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij'_1}\bar W_{ij'_2})\bigr] + \sum_{j'_1 = j'_2 \ne j_1 \ne j_2}^{m_i}\bigl[E(W_{ij_1}\bar W^2_{ij'_1} W_{ij_2}) - E(W_{ij_1}\bar W_{ij'_1})\, E(W_{ij_2}\bar W_{ij'_1})\bigr],
\]
since the term corresponding to $j_1 \ne j'_1 \ne j_2 \ne j'_2$ vanishes.
Now, by using the fact that the $T_{ij}$'s are i.i.d., we can simplify each sum on the RHS:
\[
\text{1st term} = m_i(m_i - 1)\bigl[E(W^2_{i1})\, E(\bar W^2_{i1}) - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{2nd term} = m_i(m_i - 1)\bigl[0 - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{3rd term} = m_i(m_i - 1)(m_i - 2)\bigl[E(W^2_{i1})\, (E(\bar W_{i1}))^2 - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{4th term} = m_i(m_i - 1)(m_i - 2)\bigl[0 - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{5th term} = m_i(m_i - 1)(m_i - 2)\bigl[0 - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr],
\]
\[
\text{6th term} = m_i(m_i - 1)(m_i - 2)\bigl[(E(W_{i1}))^2\, E(\bar W^2_{i1}) - (E(W_{i1}))^2\, (E(\bar W_{i1}))^2\bigr].
\]
Thus,
\[
m_i^4\, \mathrm{Var}(B_{1i,k}(s)\, B_{1i,k'}(t)) = m_i(m_i - 1)\bigl[E(W^2_{i1})\, E(\bar W^2_{i1}) + (m_i - 2)\, E(W^2_{i1})\, (E(\bar W_{i1}))^2 + (m_i - 2)\, (E(W_{i1}))^2\, E(\bar W^2_{i1})\bigr] - m_i(m_i - 1)(4 m_i - 6)\, (E(W_{i1}))^2\, (E(\bar W_{i1}))^2.
\]
Now, using the facts that $E(W^2_{i1}) = O(h^{-1}) = E(\bar W^2_{i1})$ and $|E(W_{i1})| = O(1) = |E(\bar W_{i1})|$, we conclude (72).

Computation of conditional mixed moments

The computation of the moments is done by using the Wick formula (Lemma 7.4). We consider all the different generic cases below:

$\bullet$ Case $i_1 \ne i_2$, $j_1 \ne j'_1$, $j_2 \ne j'_2$: In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = C(T_{i_1 j_1}, T_{i_1 j'_1})\, C(T_{i_2 j_2}, T_{i_2 j'_2}) + \rho^2_{i_1 i_2}\bigl[C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j'_1}, T_{i_2 j'_2}) + C(T_{i_1 j_1}, T_{i_2 j'_2})\, C(T_{i_1 j'_1}, T_{i_2 j_2})\bigr]. \tag{112}
\]
$\bullet$ Case $i_1 \ne i_2$, $j_1 = j'_1$, $j_2 \ne j'_2$ (equivalent to $i_1 \ne i_2$, $j_1 \ne j'_1$, $j_2 = j'_2$): In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = E(X^2_{i_1 j_1} X_{i_2 j_2} X_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) + \sigma^2\, E(X_{i_2 j_2} X_{i_2 j'_2} \mid T_{i_1}, T_{i_2}).
\]
Therefore, by (42),
\[
E(X^2_{i_1 j_1} X_{i_2 j_2} X_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = C(T_{i_1 j_1}, T_{i_1 j_1})\, C(T_{i_2 j_2}, T_{i_2 j'_2}) + 2\rho^2_{i_1 i_2}\, C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j_1}, T_{i_2 j'_2}).
\]
Combining, we have
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = C(T_{i_1 j_1}, T_{i_1 j_1})\, C(T_{i_2 j_2}, T_{i_2 j'_2}) + 2\rho^2_{i_1 i_2}\, C(T_{i_1 j_1}, T_{i_2 j_2})\, C(T_{i_1 j_1}, T_{i_2 j'_2}) + \sigma^2\, C(T_{i_2 j_2}, T_{i_2 j'_2}). \tag{113}
\]
$\bullet$ Case $i_1 \ne i_2$, $j_1 = j'_1$, $j_2 = j'_2$: In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)(C(T_{i_2 j_2}, T_{i_2 j_2}) + \sigma^2) + 2\rho^2_{i_1 i_2}\, (C(T_{i_1 j_1}, T_{i_2 j_2}))^2. \tag{114}
\]
$\bullet$ Case $i_1 = i_2$, $j_1 \ne j'_1 \ne j_2 \ne j'_2$: In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = C(T_{i_1 j_1}, T_{i_1 j'_1})\, C(T_{i_1 j_2}, T_{i_1 j'_2}) + C(T_{i_1 j_1}, T_{i_1 j_2})\, C(T_{i_1 j'_1}, T_{i_1 j'_2}) + C(T_{i_1 j_1}, T_{i_1 j'_2})\, C(T_{i_1 j'_1}, T_{i_1 j_2}). \tag{115}
\]
$\bullet$ Case $i_1 = i_2$, $j_1 = j'_1 \ne j_2 \ne j'_2$ (equivalent to $i_1 = i_2$, $j_1 = j_2 \ne j'_1 \ne j'_2$; $i_1 = i_2$, $j_1 = j'_2 \ne j'_1 \ne j_2$; $i_1 = i_2$, $j_1 \ne j'_1 = j_2 \ne j'_2$; $i_1 = i_2$, $j_1 \ne j'_1 = j'_2 \ne j_2$; and $i_1 = i_2$, $j_1 \ne j'_1 \ne j_2 = j'_2$): In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)\, C(T_{i_1 j_2}, T_{i_1 j'_2}) + 2\, C(T_{i_1 j_1}, T_{i_1 j_2})\, C(T_{i_1 j_1}, T_{i_1 j'_2}). \tag{116}
\]
$\bullet$ Case $i_1 = i_2$, $j_1 = j'_1 = j_2 \ne j'_2$ (equivalent to $i_1 = i_2$, $j_1 = j'_1 = j'_2 \ne j_2$; $i_1 = i_2$, $j_1 = j_2 = j'_2 \ne j'_1$; and $i_1 = i_2$, $j_1 \ne j'_1 = j_2 = j'_2$): In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = 3\, C(T_{i_1 j_1}, T_{i_1 j_1})\, C(T_{i_1 j_1}, T_{i_1 j'_2}) + 3\sigma^2\, C(T_{i_1 j_1}, T_{i_1 j'_2}).
\]
(117)

$\bullet$ Case $i_1 = i_2$, $j_1 = j'_1 \ne j_2 = j'_2$ (equivalent to $i_1 = i_2$, $j_1 = j_2 \ne j'_1 = j'_2$; and $i_1 = i_2$, $j_1 = j'_2 \ne j'_1 = j_2$): In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)(C(T_{i_1 j_2}, T_{i_1 j_2}) + \sigma^2) + 2\, (C(T_{i_1 j_1}, T_{i_1 j_2}))^2. \tag{118}
\]
$\bullet$ Case $i_1 = i_2$, $j_1 = j'_1 = j_2 = j'_2$: In this case,
\[
E(Y_{i_1 j_1} Y_{i_1 j'_1} Y_{i_2 j_2} Y_{i_2 j'_2} \mid T_{i_1}, T_{i_2}) = 3\, (C(T_{i_1 j_1}, T_{i_1 j_1}) + \sigma^2)^2. \tag{119}
\]

Computation of unconditional mixed moments (off-diagonal part)

Here, we obtain simplified forms of certain expectations that are used in the proofs of Propositions 7.3 and 7.4. Observe that, based on our calculations in Appendix A, we only need to compute expectations of the form
\[
E\bigl[C(T_{i_1 j_1}, T_{i'_1 j'_1})\, C(T_{i_2 j_2}, T_{i'_2 j'_2})\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i'_1 j'_1})\, \tilde K_{s_2,l_2}(T_{i_2 j_2})\, \tilde K_{t_2,l'_2}(T_{i'_2 j'_2})\bigr]. \tag{120}
\]
Notice that, when the pairs $(T_{i_1 j_1}, T_{i'_1 j'_1})$ and $(T_{i_2 j_2}, T_{i'_2 j'_2})$ are independent, the expectation in (120) factorizes as
\[
E\bigl[C(T_{i_1 j_1}, T_{i'_1 j'_1})\, \tilde K_{s_1,l_1}(T_{i_1 j_1})\, \tilde K_{t_1,l'_1}(T_{i'_1 j'_1})\bigr]\; E\bigl[C(T_{i_2 j_2}, T_{i'_2 j'_2})\, \tilde K_{s_2,l_2}(T_{i_2 j_2})\, \tilde K_{t_2,l'_2}(T_{i'_2 j'_2})\bigr]. \tag{121}
\]
Each individual term is exactly of the same form that we encountered while calculating the bias of our estimate. The expectations appearing above are of the form
\[
\int\!\!\int C(u,v)\, g(u)\, g(v)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(v)\,du\,dv. \tag{122}
\]
For the other terms we need to evaluate or approximate various other integrals. The general forms of these integrals are given below, for $1 \le l, l', m, m' \le L_n$ and $s, s', t, t' \in [0,1]$.
\[
\int (C(u,u))^r\, g(u)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\,du = \begin{cases} O(h^{-1}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|\} \le 2 B_K h, \\ 0 & \text{otherwise;} \end{cases} \quad \text{for } r = 0, 1, 2; \tag{123}
\]
\[
\int C(u,u)\, g(u)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(u)\, \tilde K_{t',m'}(u)\,du; \tag{124}
\]
\[
\int (C(u,u))^2\, g(u)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(u)\, \tilde K_{t',m'}(u)\,du; \tag{125}
\]
\[
\int\!\!\int (C(u,v))^r\, g(u)\, g(v)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(v)\, \tilde K_{t',m'}(v)\,du\,dv = \begin{cases} O(h^{-2}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|, |t - s_m|, |t' - s_{m'}|\} \le 2 B_K h, \\ 0 & \text{otherwise;} \end{cases} \quad \text{for } r = 0, 1, 2; \tag{126}
\]
\[
\int\!\!\int (C(u,u))^r\, C(u,v)\, g(u)\, g(v)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(u)\, \tilde K_{t',m'}(v)\,du\,dv = \begin{cases} O(h^{-2}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|, |t - s_m|\} \le 2 B_K h, \\ 0 & \text{otherwise;} \end{cases} \quad \text{for } r = 0, 1; \tag{127}
\]
\[
\int\!\!\int\!\!\int C(u,v)\, C(u,w)\, g(u)\, g(v)\, g(w)\, \tilde K_{s,l}(u)\, \tilde K_{s',l'}(u)\, \tilde K_{t,m}(v)\, \tilde K_{t',m'}(w)\,du\,dv\,dw = \begin{cases} O(h^{-1}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|\} \le 2 B_K h, \\ 0 & \text{otherwise.} \end{cases} \tag{128}
\]

Computation of unconditional mixed moments (diagonal and mixed part)

We have the following bound:
\[
\int (C(u,u))^r\, g(u)\, \tilde K_{z_1,l_1}(u)\, \tilde K_{z_2,l_2}(u)\,du = \begin{cases} O(h^{-1}) & \text{if } |z_k - s_{l_k}| \le 2 B_K h,\ k = 1, 2, \\ 0 & \text{otherwise,} \end{cases} \quad \text{for } r = 0, 1, 2. \tag{129}
\]

Some error bounds involving the Dirac $\delta$

Here, we provide some key estimates that are crucial to obtaining the overall risk bound. They all involve the operator $H_\nu$. Due to the decomposition (55) we can reduce the computations of these bounds to integrals involving $\{\psi_k(\cdot)\}_{k=1}^M$ and $\delta(\cdot,\cdot)$. Throughout we assume that $R(s_1,s_2,t_1,t_2)$ is a "nice" function satisfying certain (boundedness) conditions. Then,
\[
\Bigl|\int\!\cdots\!\int \delta(x,s_1)\, \delta(x,s_2)\, R(s_1,s_2,t_1,t_2)\, \psi_\nu(t_1)\, \psi_\nu(t_2)\,ds_1\,ds_2\,dt_1\,dt_2\Bigr| = \Bigl|\int\!\!\int R(x,x,t_1,t_2)\, \psi_\nu(t_1)\, \psi_\nu(t_2)\,dt_1\,dt_2\Bigr| \le \|R\|_\infty\, \|\psi_\nu\|^2_\infty.
\]
(130)
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(s_1, t_1) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(s_1 - \frac{Ah}{2}) \vee 0}^{(s_1 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le Ah \, \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned} \tag{131}
\]
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(s_1, t_1) W_{Ah}(s_2, t_2) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(s_2 - \frac{Ah}{2}) \vee 0}^{(s_2 + \frac{Ah}{2}) \wedge 1} \int_{(s_1 - \frac{Ah}{2}) \vee 0}^{(s_1 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le (Ah)^2 \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned} \tag{132}
\]
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(t_1, t_2) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(t_2 - \frac{Ah}{2}) \vee 0}^{(t_2 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int \int_{(t_2 - \frac{Ah}{2}) \vee 0}^{(t_2 + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le Ah \, \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned} \tag{133}
\]
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(t_1, t_2) W_{Ah}(s_2, t_2) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(s_2 - \frac{Ah}{2}) \vee 0}^{(s_2 + \frac{Ah}{2}) \wedge 1} \int_{(t_2 - \frac{Ah}{2}) \vee 0}^{(t_2 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} \int_{(t_2 - \frac{Ah}{2}) \vee 0}^{(t_2 + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le (Ah)^2 \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned}
\]
(134)
\[
\begin{aligned}
& \Big| \int\!\!\int\!\!\int\!\!\int \delta(x, s_1) \delta(x, s_2) W_{Ah}(t_1, s_2) W_{Ah}(s_2, t_2) R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, ds_1 \, ds_2 \, dt_1 \, dt_2 \Big| \\
&= \Big| \int\!\!\int \delta(x, s_1) \delta(x, s_2) \int_{(s_2 - \frac{Ah}{2}) \vee 0}^{(s_2 + \frac{Ah}{2}) \wedge 1} \int_{(s_2 - \frac{Ah}{2}) \vee 0}^{(s_2 + \frac{Ah}{2}) \wedge 1} R(s_1, s_2, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \, ds_1 \, ds_2 \Big| \\
&= \Big| \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} \int_{(x - \frac{Ah}{2}) \vee 0}^{(x + \frac{Ah}{2}) \wedge 1} R(x, x, t_1, t_2) \psi_\nu(t_1) \psi_\nu(t_2) \, dt_1 \, dt_2 \Big| \le (Ah)^2 \|R\|_\infty \|\psi_\nu\|_\infty^2.
\end{aligned} \tag{135}
\]

Appendix G: Proof of Theorem 4.3

In order to prove this result, we use a strategy very similar to the one used in the proof of Corollary 1 in Paul and Peng (2007). In view of the statement of the theorem, it suffices to consider a submodel consisting of kernels $\Sigma$ of rank 1. Let $\Sigma^{(0)}(s,t) = \lambda \bar\psi(s) \bar\psi(t)$, $s, t \in [0,1]$, for $\lambda \ge C_1$, where $\bar\psi(\cdot) \equiv 1$. Then $\bar\psi$ is the first (and only) eigenfunction of $\Sigma^{(0)}$, with corresponding eigenvalue $\lambda$. Suppose that the design $D$ satisfies $\underline{m} = \overline{m} = m \ge 4$. Finally, choose $g$ to be the uniform density on $[0,1]$. Let $M_* \sim (nm)^{1/5}$, and let $\{\gamma_l\}_{l=1}^{M_*}$ be orthonormal functions such that (i) the $\gamma_l$ are twice continuously differentiable, with $\max_l \|\gamma_l^{(j)}\|_\infty = O(M_*^{1/2+j})$ for $j = 0, 1, 2$; (ii) $\int_0^1 \gamma_l(s) \, ds = 0$ for all $l$; and (iii) $\gamma_l$ is centered around $l/M_*$ with length of support $O(M_*^{-1})$, uniformly over $l$. Note that, since $\bar\psi \equiv 1$, condition (ii) implies that the $\{\gamma_l\}$ are orthogonal to $\bar\psi$. Let $M_0 = [\frac{2M_*}{9}]$. Let $\mathcal{F}_0$ be an index set satisfying $\log |\mathcal{F}_0| \asymp M_*$, and let $\{z_l^{(j)} : l = 1, \ldots, M_*\}_{j \in \mathcal{F}_0}$ be a collection with $z_l^{(j)}$ taking values in $\{-M_0^{-1/2}, 0, M_0^{-1/2}\}$, such that, with $z^{(j)}$ denoting the vector $(z_l^{(j)})_{l=1}^{M_*}$, we have $\|z^{(j)}\|_2 = 1$ and $\|z^{(j)} - z^{(j')}\|_2 \ge 1$ for $j \ne j' \in \mathcal{F}_0$. The construction is by a "sphere packing" argument as in Paul and Johnstone (2007).
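The norm and separation properties of these packing vectors can be sanity-checked numerically. The sketch below is a hypothetical stand-in (it draws sign patterns at random rather than via an actual sphere-packing argument, and uses illustrative sizes for $M_*$ and $M_0$); it verifies that a vector with exactly $M_0$ nonzero entries of magnitude $M_0^{-1/2}$ has unit $\ell_2$ norm, and that two such vectors are at distance at least 1 whenever they disagree in at least $M_0$ coordinates, since each disagreement contributes at least $M_0^{-1}$ to the squared distance.

```python
import numpy as np

rng = np.random.default_rng(0)
M_star, M0 = 90, 20   # illustrative sizes; the paper takes M0 = [2*M_star/9]

def packed_vector(rng):
    # Hypothetical stand-in for z^(j): exactly M0 nonzero entries of
    # magnitude M0^{-1/2}, so that ||z||_2 = 1 by construction.
    z = np.zeros(M_star)
    idx = rng.choice(M_star, size=M0, replace=False)
    z[idx] = rng.choice([-1.0, 1.0], size=M0) / np.sqrt(M0)
    return z

z1, z2 = packed_vector(rng), packed_vector(rng)
assert np.isclose(np.linalg.norm(z1), 1.0)   # ||z^(j)||_2 = 1

# Each coordinate where z1 and z2 disagree contributes at least M0^{-1}
# to ||z1 - z2||^2, so >= M0 disagreements give ||z1 - z2||_2 >= 1.
if np.sum(z1 != z2) >= M0:
    assert np.linalg.norm(z1 - z2) >= 1.0 - 1e-12
```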
Let $\delta \asymp (nm)^{-2/5} \asymp M_*^{-2}$. Then define
\[
\psi^{(j)}(s) = \sqrt{1 - \delta^2} \, \bar\psi(s) + \delta \sum_{l=1}^{M_*} z_l^{(j)} \gamma_l(s), \qquad j \in \mathcal{F}_0.
\]
Note that, by construction: (i') $\|\psi^{(j)}\|_2 = 1$; (ii') the $\psi^{(j)}$ are twice differentiable, with bounded second derivative; (iii') $\|\psi^{(j)} - \psi^{(j')}\|_2 \ge \delta$ for $j \ne j' \in \mathcal{F}_0$; (iv') $\|\bar\psi - \psi^{(j)}\|_\infty = O(\delta)$ uniformly over $j \in \mathcal{F}_0$. Property (iv') will be crucial for much of our analysis later on. In order to prove Theorem 4.3, we need to show the following:
\[
\sum_{i=1}^n E \, K(\Sigma_i^{(j)}, \Sigma_i^{(0)}) \asymp nm\delta^2, \quad \text{uniformly in } j \in \mathcal{F}_0, \tag{136}
\]
where $\Sigma_i^{(j)}$ denotes the covariance of observation $i$ given $\{T_{il}\}_{l=1}^m$ under the model parameterized by $\Sigma_0^{(j)}$, and $E$ denotes expectation with respect to the design points $T$.

Proof of (136): From now on, we fix $j \in \mathcal{F}_0$ and drop the superscript $(j)$ for convenience. Denote the $m \times 1$ vectors $(\psi(T_{ij}))_{j=1}^m$ and $(\bar\psi(T_{ij}))_{j=1}^m$ by $\psi_i$ and $\bar\psi_i$, respectively. Of course, $\bar\psi_i$ is the nonrandom vector with all entries equal to 1. Next, observe that
\[
\|\Sigma_i^{(0)} - \Sigma_i\|_F = \lambda \|\bar\psi_i (\bar\psi_i - \psi_i)^T + (\bar\psi_i - \psi_i) \psi_i^T\|_F \le \lambda (\|\bar\psi_i\|_2 + \|\psi_i\|_2) \|\bar\psi_i - \psi_i\|_2. \tag{137}
\]
Since $\|\bar\psi_i - \psi_i\|_2 \le \sqrt{m}\, \|\bar\psi - \psi\|_\infty = O(\sqrt{m}\,\delta)$ (by property (iv')), and $\|\bar\psi_i\|_2 = \sqrt{m}$, it follows from (137) that
\[
\max_{1 \le i \le n} \|\Sigma_i^{(0)} - \Sigma_i\|_F^2 = O(m^2 \delta^2). \tag{138}
\]
Since $m\delta \asymp m(nm)^{-2/5}$ and $m = o(n^{2/3})$, the RHS of (138) is $o(1)$ (a nonrandom bound) uniformly over $\mathcal{F}_0$, and hence, using arguments as in the proof of Proposition 2 in Paul and Peng (2007), we have
\[
\sum_{i=1}^n K(\Sigma_i, \Sigma_i^{(0)}) \asymp \sum_{i=1}^n \|(\Sigma_i^{(0)})^{-1/2} (\Sigma_i^{(0)} - \Sigma_i) (\Sigma_i^{(0)})^{-1/2}\|_F^2, \quad \text{uniformly over } \mathcal{F}_0.
\]
Thus, (136) will follow once we prove:

Proposition 7.5. Uniformly over $\mathcal{F}_0$,
\[
E \|(\Sigma_1^{(0)})^{-1/2} (\Sigma_1^{(0)} - \Sigma_1) (\Sigma_1^{(0)})^{-1/2}\|_F^2 \asymp m\delta^2.
\]
(139)

Proof of Proposition 7.5: First, note that $\theta = \bar\psi_1/\sqrt{m}$ is a vector of $\ell_2$ norm 1, and hence, by a standard matrix inversion formula,
\[
(\Sigma_1^{(0)})^{-1} = (I + \lambda m \theta \theta^T)^{-1} = I - \kappa \theta \theta^T, \quad \text{where } \kappa = \frac{\lambda m}{1 + \lambda m}.
\]
Let $\Delta = \Sigma_1 - \Sigma_1^{(0)} = \lambda(\psi_1 \psi_1^T - m \theta \theta^T)$. Then,
\[
\begin{aligned}
\|(\Sigma_1^{(0)})^{-1/2} (\Sigma_1^{(0)} - \Sigma_1) (\Sigma_1^{(0)})^{-1/2}\|_F^2 &= \mathrm{tr}[(I - \kappa \theta \theta^T) \Delta (I - \kappa \theta \theta^T) \Delta] \\
&= \mathrm{tr}[(I - \theta \theta^T) \Delta (I - \theta \theta^T) \Delta] + 2(1-\kappa)\, \theta^T \Delta (I - \theta \theta^T) \Delta \theta + (1-\kappa)^2 (\theta^T \Delta \theta)^2 \\
&= \lambda^2 \Big[ \|(I - \theta \theta^T) \psi_1\|_2^4 + 2(1-\kappa)(\theta^T \psi_1)^2 \|(I - \theta \theta^T) \psi_1\|_2^2 + (1-\kappa)^2 \big(m - (\theta^T \psi_1)^2\big)^2 \Big] \\
&= \lambda^2 \Big[ \big(\|\psi_1\|_2^2 - (\theta^T \psi_1)^2\big)^2 + 2(1-\kappa)(\theta^T \psi_1)^2 \big(\|\psi_1\|_2^2 - (\theta^T \psi_1)^2\big) + (1-\kappa)^2 \big(m - (\theta^T \psi_1)^2\big)^2 \Big],
\end{aligned} \tag{140}
\]
where the third and last steps follow from the facts that $(I - \theta \theta^T)\theta = 0$ and $(I - \theta \theta^T)^2 = I - \theta \theta^T$. From (140), the proof will follow once we establish the following results.

Lemma 7.15. With $T_{ij}$ i.i.d. from Uniform$[0,1]$, we have (uniformly over $\mathcal{F}_0$)
\[
E[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2] = m(m-1)\delta^2 (1 + o(1)), \quad \text{and} \quad \mathrm{Var}[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2] = O(m^3 \delta^4).
\]

Lemma 7.16. With $T_{ij}$ i.i.d. from Uniform$[0,1]$, we have (uniformly over $\mathcal{F}_0$)
\[
E[m - \bar\psi_1^T \psi_1]^2 = m\delta^2 (1 + o(1)), \qquad E\|\bar\psi_1 - \psi_1\|_2^4 = O(m^2 \delta^4).
\]

Lemma 7.17. With $T_{ij}$ i.i.d. from Uniform$[0,1]$, we have (uniformly over $\mathcal{F}_0$)
\[
E[\|\psi_1\|_2^2 (m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)] = m^2 (m-1) \delta^2 (1 + o(1)).
\]

To see how (139) follows from Lemmas 7.15–7.17, note first that
\[
\begin{aligned}
E\big[(\bar\psi_1^T \psi_1)^2 (m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)\big] &= m E[\|\psi_1\|_2^2 (m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)] - E(m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)^2 \\
&= m^3 (m-1) \delta^2 (1 + o(1)) - O(m^4 \delta^4) = m^3 (m-1) \delta^2 (1 + o(1))
\end{aligned} \tag{141}
\]
by Lemmas 7.15 and 7.17.
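The inversion step $(I + \lambda m \theta\theta^T)^{-1} = I - \kappa\theta\theta^T$ used above is the standard rank-one (Sherman–Morrison) identity. A minimal numeric sketch, with illustrative values of $m$ and $\lambda$ and the noise variance normalized to 1 as in the display, confirms it:

```python
import numpy as np

m, lam = 6, 2.5   # illustrative values only

# theta = psi_bar_1 / sqrt(m): unit vector proportional to the all-ones vector.
theta = np.ones(m) / np.sqrt(m)

# Sigma^(0)_1 = I + lam * m * theta theta^T (noise variance normalized to 1).
Sigma0 = np.eye(m) + lam * m * np.outer(theta, theta)

# Claimed closed-form inverse via the rank-one (Sherman-Morrison) identity.
kappa = lam * m / (1 + lam * m)
inv_closed = np.eye(m) - kappa * np.outer(theta, theta)

assert np.allclose(Sigma0 @ inv_closed, np.eye(m))
assert np.allclose(inv_closed, np.linalg.inv(Sigma0))
```

The identity holds because $(I + c\,\theta\theta^T)(I - \kappa\,\theta\theta^T) = I + (c - \kappa(1+c))\theta\theta^T$, which equals $I$ exactly when $\kappa = c/(1+c)$, here with $c = \lambda m$.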
Now, from (140), we obtain
\[
E\|(\Sigma_1^{(0)})^{-1/2} (\Sigma_1^{(0)} - \Sigma_1) (\Sigma_1^{(0)})^{-1/2}\|_F^2 \ge 2\lambda^2 (1-\kappa) \frac{1}{m^2} E\big[(\bar\psi_1^T \psi_1)^2 (m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2)\big] = \frac{2\lambda^2 m(m-1)}{1 + \lambda m}\, \delta^2 (1 + o(1)),
\]
where the last step is by (141). This establishes the lower bound in (139).

To establish the upper bound in (139), we also need to consider the expectations of the other two terms on the RHS of (140). First, by Lemma 7.15,
\[
\lambda^2 E\big(\|\psi_1\|_2^2 - (\theta^T \psi_1)^2\big)^2 = \frac{\lambda^2}{m^2} E\big(m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2\big)^2 = O(m^2 \delta^4). \tag{142}
\]
Next, writing
\[
m - (\theta^T \psi_1)^2 = m - \|\psi_1\|_2^2 + \frac{1}{m}\big[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2\big],
\]
and then using the fact that, for any $\epsilon > 0$ and $a, b \in \mathbb{R}$, $(a+b)^2 \le (1+\epsilon) a^2 + (1+\epsilon^{-1}) b^2$, we have, for arbitrary but fixed $\epsilon > 0$,
\[
\begin{aligned}
E\big(m - (\theta^T \psi_1)^2\big)^2 &\le (1+\epsilon)\, E[m - \|\psi_1\|_2^2]^2 + \frac{1+\epsilon^{-1}}{m^2}\, E[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2]^2 \\
&= (1+\epsilon)\, E[\|\bar\psi_1 - \psi_1\|_2^2 - 2(m - \bar\psi_1^T \psi_1)]^2 + O(m^2 \delta^4) \quad \text{(by Lemma 7.15)} \\
&\le 4(1+\epsilon)^2 E[m - \bar\psi_1^T \psi_1]^2 + (1+\epsilon)(1+\epsilon^{-1})\, E\|\bar\psi_1 - \psi_1\|_2^4 + O(m^2 \delta^4) \\
&= 4(1+\epsilon)^2 m\delta^2 (1 + o(1)) + O(m^2 \delta^4),
\end{aligned} \tag{143}
\]
where the last step follows from Lemma 7.16. Finally, substituting (142), (141) and (143) in (140), we obtain an upper bound of the form
\[
\frac{2\lambda^2 m(m-1)}{1 + \lambda m}\, \delta^2 (1 + o(1)) + (1+\epsilon)^2 \frac{4\lambda^2 m}{(1 + \lambda m)^2}\, \delta^2 (1 + o(1)) + O(m^2 \delta^4) = O(m\delta^2),
\]
which concludes the proof.

Proofs of Lemmas 7.15–7.17: In order to prove the lemmas, we define $\xi = \bar\psi - \psi$ and note the following very important set of relations:
\[
\int \xi = \int (\bar\psi - \psi) = \int (\bar\psi - \psi)\bar\psi = 1 - \int \psi \bar\psi = \frac{1}{2} \int |\bar\psi - \psi|^2 = \frac{1}{2} \int \xi^2 = \frac{1}{2}\delta^2 + O(\delta^4).
\]
(144)

Proof of Lemma 7.15: The decomposition
\[
m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2 = (m-1) \sum_{k=1}^m \psi^2(T_{1k}) - \sum_{k \ne k'} \psi(T_{1k}) \psi(T_{1k'}) \tag{145}
\]
yields
\[
\begin{aligned}
E[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2] &= (m-1) m \int \psi^2 - m(m-1) \Big(\int \psi\Big)^2 = m(m-1)\Big[1 - \Big(1 - \int \xi\Big)^2\Big] \\
&= m(m-1)\Big[2 \int \xi - \Big(\int \xi\Big)^2\Big] = m(m-1)\delta^2 (1 + O(\delta^2)) \quad \text{(by (144))}.
\end{aligned}
\]
Define $\tau = \int \psi$. Then, using (145),
\[
\begin{aligned}
\mathrm{Var}[m \|\psi_1\|_2^2 - (\bar\psi_1^T \psi_1)^2]
&= (m-1)^2 E\Big[\sum_{k=1}^m (\psi^2(T_{1k}) - 1)\Big]^2 + \sum_{k_1 \ne k_1'} \sum_{k_2 \ne k_2'} E[(\psi(T_{1k_1})\psi(T_{1k_1'}) - \tau^2)(\psi(T_{1k_2})\psi(T_{1k_2'}) - \tau^2)] \\
&\quad - 2(m-1) \sum_{k_1=1}^m \sum_{k_2 \ne k_2'} E[(\psi^2(T_{1k_1}) - 1)(\psi(T_{1k_2})\psi(T_{1k_2'}) - \tau^2)] \\
&= (m-1)^2 m\, E(\psi^2(T_{11}) - 1)^2 + 2m(m-1)\, E(\psi(T_{11})\psi(T_{12}) - \tau^2)^2 \\
&\quad + 4m(m-1)(m-2)\, E[(\psi(T_{11})\psi(T_{12}) - \tau^2)(\psi(T_{11})\psi(T_{13}) - \tau^2)] \\
&\quad - 4m(m-1)^2\, E[(\psi^2(T_{11}) - 1)(\psi(T_{11})\psi(T_{12}) - \tau^2)] \\
&= m(m-1)\Big[(m-1)\Big(\int \psi^4 - 1\Big) + 2\Big(\int \psi^2 \int \psi^2 - \tau^4\Big) + 4(m-2)\Big(\int \psi^2 \Big(\int \psi\Big)^2 - \tau^4\Big)\Big] - 4m(m-1)^2 \Big(\int \psi^3 \int \psi - \tau^2\Big) \\
&= m(m-1)\Big[(m-1)\Big(\int (1-\xi)^4 - 1\Big) + 2(1 - \tau^4) + 4(m-2)\tau^2(1 - \tau^2) - 4(m-1)\tau\Big(\int (1-\xi)^3 - \tau\Big)\Big].
\end{aligned}
\]
Simplifying this expression using (144): the first term within the square brackets is $(m-1)(4\int \xi^2 - 4\int \xi^3 + \int \xi^4)$, and the last term within the square brackets is $-4(m-1)\tau(2\int \xi^2 - \int \xi^3)$. Collecting terms and using the fact that $1 - \tau^2 = 2(1-\tau) - (1-\tau)^2 = \int \xi^2 - \frac{1}{4}(\int \xi^2)^2$ (again by (144)), we can express the sum as
\[
\begin{aligned}
& m(m-1)\Big[\big(4(m-1) + 4 + 4(m-2) - 8(m-1)\big)\int \xi^2 - \big(4(m-1) - 4(m-1)\big)\int \xi^3 + (m-1)\int \xi^4\Big] \\
&\quad + m(m-1)\Big[-4(1-\tau)^2 - 2(1-\tau^2)^2 - 4(m-2)\big((1-\tau)^2 - (1-\tau^2)^2\big) + 4(m-1)(1-\tau)\Big(2\int \xi^2 - \int \xi^3\Big)\Big] = O(m^3 \delta^4).
\end{aligned}
\]
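The relations (144), used repeatedly in the proof above, can be checked by quadrature. The sketch below is a minimal numeric illustration, assuming a single cosine bump $\gamma(u) = \sqrt{2}\cos(2\pi u)$ (which has $\int_0^1 \gamma = 0$ and $\|\gamma\|_2 = 1$) as a hypothetical stand-in for the perturbation $\sum_l z_l^{(j)} \gamma_l$; it verifies both $\int \xi = \frac{1}{2}\int \xi^2$ (which holds exactly, since $\|\psi\|_2 = 1$) and $\int \xi = \frac{1}{2}\delta^2 + O(\delta^4)$.

```python
import numpy as np

u = np.linspace(0.0, 1.0, 20_001)
delta = 1e-2

# Hypothetical stand-in for the perturbing function: integral 0, L2 norm 1.
gamma = np.sqrt(2.0) * np.cos(2 * np.pi * u)

# Perturbed eigenfunction psi = sqrt(1 - delta^2) * psi_bar + delta * gamma,
# with psi_bar identically 1, and xi = psi_bar - psi.
psi = np.sqrt(1 - delta**2) + delta * gamma
xi = 1.0 - psi

int_xi = np.trapz(xi, u)
int_xi2 = np.trapz(xi**2, u)

assert np.isclose(int_xi, 0.5 * int_xi2, atol=1e-10)      # exact identity in (144)
assert np.isclose(int_xi, 0.5 * delta**2, atol=delta**4)  # = delta^2/2 + O(delta^4)
```

Here $\int \xi = 1 - \sqrt{1-\delta^2}$, and $\int \xi^2 = (1-\sqrt{1-\delta^2})^2 + \delta^2 = 2(1-\sqrt{1-\delta^2})$, so the two quantities agree exactly, up to quadrature error.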
Proof of Lemma 7.16: First observe that
\[
\begin{aligned}
E[m - \bar\psi_1^T \psi_1]^2 &= E\Big[\sum_{k=1}^m (1 - \psi(T_{1k}))\Big]^2 = \sum_{k=1}^m E(1 - \psi(T_{1k}))^2 + \sum_{k \ne k'} E[(1 - \psi(T_{1k}))(1 - \psi(T_{1k'}))] \\
&= m \int (\bar\psi - \psi)^2 + m(m-1)\Big(\int (\bar\psi - \psi)\Big)^2 = m(\delta^2 + O(\delta^4)) + \frac{m(m-1)}{4}\,(\delta^2 + O(\delta^4))^2 \\
&= m\delta^2(1 + o(1)) \quad \text{(by (144))}.
\end{aligned}
\]
Next,
\[
\begin{aligned}
E\|\bar\psi_1 - \psi_1\|_2^4 &= E\Big[\sum_{k=1}^m (1 - \psi(T_{1k}))^2\Big]^2 = \sum_{k=1}^m E(1 - \psi(T_{1k}))^4 + \sum_{k \ne k'} E[(1 - \psi(T_{1k}))^2 (1 - \psi(T_{1k'}))^2] \\
&= m \int (\bar\psi - \psi)^4 + m(m-1)\Big(\int (\bar\psi - \psi)^2\Big)^2 \le m \|\bar\psi - \psi\|_\infty^2 \int (\bar\psi - \psi)^2 + m(m-1)\Big(\int (\bar\psi - \psi)^2\Big)^2 \\
&= O(m\delta^4) + m(m-1)\delta^4(1 + o(1)) = O(m^2\delta^4),
\end{aligned}
\]
where in the last step we used (iv') and (144).

Proof of Lemma 7.17: Use (145) to write the expectation as
\[
\begin{aligned}
& (m-1)\, E\Big[\sum_{k=1}^m \psi^2(T_{1k})\Big]^2 - E\Big[\Big(\sum_{k_1=1}^m \psi^2(T_{1k_1})\Big)\Big(\sum_{k_2 \ne k_2'} \psi(T_{1k_2})\psi(T_{1k_2'})\Big)\Big] \\
&= (m-1)\Big[\sum_{k=1}^m E\psi^4(T_{1k}) + \sum_{k \ne k'} E[\psi^2(T_{1k})\psi^2(T_{1k'})]\Big] \\
&\quad - \Big[\sum_{k_1 = k_2 \ne k_2'} E[\psi^3(T_{1k_1})\psi(T_{1k_2'})] + \sum_{k_1 = k_2' \ne k_2} E[\psi^3(T_{1k_1})\psi(T_{1k_2})] + \sum_{k_1 \ne k_2 \ne k_2'} E[\psi^2(T_{1k_1})\psi(T_{1k_2})\psi(T_{1k_2'})]\Big] \\
&= (m-1)\Big[m \int \psi^4 + m(m-1)\Big(\int \psi^2\Big)^2\Big] - \Big[2m(m-1)\Big(\int \psi^3\Big)\Big(\int \psi\Big) + m(m-1)(m-2)\Big(\int \psi^2\Big)\Big(\int \psi\Big)^2\Big] \\
&= m(m-1)\Big[\int (1-\xi)^4 + (m-1) - 2\Big(\int (1-\xi)^3\Big)\Big(\int (1-\xi)\Big) - (m-2)\Big(\int (1-\xi)\Big)^2\Big] \\
&= m(m-1)\Big[m \int \xi^2 - \Big(\int \xi^3\Big)\Big(2 - \int \xi^2\Big) - \frac{m-2}{4}\Big(\int \xi^2\Big)^2 + \int \xi^4\Big] = m^2(m-1)\delta^2(1 + o(1)),
\end{aligned}
\]
where in the fourth and last steps we used (144) and (iv').

References

[1] Ash, R. B. (1972). Real Analysis and Probability. Academic Press.
[2] Besse, P., Cardot, H. and Ferraty, F. (1997). Simultaneous nonparametric regression of unbalanced longitudinal data. Computational Statistics and Data Analysis 24, 255-270.
[3] Cai, T. and Hall, P. (2006). Prediction in functional linear regression. Annals of Statistics 34, 2159-2179.
[4] Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model. Statistics and Probability Letters 45, 11-22.
[5] Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics 12, 503-538.
[6] Chui, C. (1987). Multivariate Splines. SIAM.
[7] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer.
[8] Hall, P. and Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression. (http://www.faculty.econ.northwestern.edu/faculty/horowitz/papers/hhor-final.pdf)
[9] Hall, P., Müller, H.-G. and Wang, J.-L. (2006). Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics 34, 1493-1517.
[10] Hlubinka, D. and Prchal, L. (2007). Changes in atmospheric radiation from the statistical point of view. Computational Statistics and Data Analysis 51, 4926-4941.
[11] James, G. M., Hastie, T. J. and Sugar, C. A. (2000). Principal component models for sparse functional data. Biometrika 87, 587-602.
[12] Kato, T. (1980). Perturbation Theory for Linear Operators. Springer-Verlag.
[13] Kneip, A. and Utikal, K. J. (2001). Inference for density families using functional principal component analysis. Journal of the American Statistical Association 96, 519-542.
[14] Nica, A. and Speicher, R. (2006). Lectures on the Combinatorics of Free Probability. Cambridge University Press.
[15] Paul, D. (2004). Asymptotics of the leading sample eigenvalues for a spiked covariance model. Technical report. (http://anson.ucdavis.edu/~debashis/techrep/eigenlimit.pdf)
[16] Paul, D. and Johnstone, I. M. (2007). Augmented sparse principal component analysis for high dimensional data. Working paper. (http://anson.ucdavis.edu/~debashis/techrep/augmented-spca.pdf)
[17] Paul, D. and Peng, J. (2007). Consistency of restricted maximum likelihood estimators of principal components. To appear in Annals of Statistics. (http://anson.ucdavis.edu/~jie/REML-Asymptotics revision.pdf)
[18] Peng, J. and Paul, D. (2007). A geometric approach to maximum likelihood estimation of covariance kernel from sparse irregular longitudinal data. Technical report. arXiv:0710.5343v1 [stat.ME]. (http://anson.ucdavis.edu/~jie/pd-cov-likelihood-technical.pdf)
[19] Peng, J. and Müller, H.-G. (2008). Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. To appear in Annals of Applied Statistics.
[20] Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis, 2nd edition. Springer.
[21] Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9, 3273-3297.
[22] Yao, F., Müller, H.-G. and Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100, 577-590.
[23] Yao, F., Müller, H.-G. and Wang, J.-L. (2006). Functional linear regression for longitudinal data. Annals of Statistics 33, 2873-2903.
