Thresholding Projection Estimators in Functional Linear Models
Authors: Hervé Cardot, Jan Johannes
December 17, 2008

Abstract

We consider the problem of estimating the regression function in functional linear regression models by proposing a new type of projection estimators which combine dimension reduction and thresholding. The introduction of a threshold rule allows us to obtain consistency under broad assumptions as well as minimax rates of convergence under additional regularity hypotheses. We also consider the particular case of Sobolev spaces generated by the trigonometric basis, which permits us to derive easily the mean squared error of prediction as well as estimators of the derivatives of the regression function. We prove that these estimators are minimax, and rates of convergence are given for some particular cases.

Keywords: Derivatives estimation, Galerkin method, Linear inverse problem, Mean squared error of prediction, Optimal rate of convergence, Hilbert scale, Sobolev space.

AMS 2000 subject classifications: Primary 62J05; secondary 62G20, 62G08.

1 Introduction

Functional data analysis (Ramsay and Silverman (2005), Ferraty and Vieu (2006)) is a topic of growing interest in statistics, and many applications in chemometrics (Frank and Friedman (1993)), finance (Preda and Saporta (2005)), biometry, or climatology (Besse et al. (2000)) now deal with the functional linear model. This model is useful to estimate or predict a scalar random variable, say $Y \in \mathbb{R}$, thanks to a random function denoted by $X$. We assume in the following that $Y$ and $X$ are centered random variables and, without loss of generality, that the random function $X$ takes values in $L^2[0,1]$, the space of square integrable functions defined on $[0,1]$ endowed with its usual inner product $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$ and associated norm $\|f\| = \langle f, f\rangle^{1/2}$, $f, g \in L^2[0,1]$.
The functional linear model is then defined by
\[ Y = \int_0^1 \beta(t) X(t)\,dt + \sigma\epsilon, \quad \sigma > 0, \tag{1.1} \]
where the function $\beta(t)$ is called the regression or slope function and the error term $\epsilon$ is supposed to be centered, $E(\epsilon) = 0$, and not correlated with $X$: $E(X(t)\epsilon) = 0$ for all $t \in [0,1]$.

∗ Université de Bourgogne, Institut de Mathématiques de Bourgogne, 9 Av. Alain Savary, 21078 Dijon Cedex, France, e-mail: herve.cardot@u-bourgogne.fr
† Universität Heidelberg, Institut für Angewandte Mathematik, Im Neuenheimer Feld 294, D-69120 Heidelberg, Germany, e-mail: johannes@statlab.uni-heidelberg.de

Assuming that $X$ has a finite second moment, i.e. $E\|X\|^2 = \int_0^1 E|X(t)|^2\,dt < \infty$, one can define the covariance operator of $X$, say $\Gamma$. This operator is defined on $L^2[0,1]$ as follows: for any function $f \in L^2[0,1]$,
\[ \Gamma f(s) = \int_0^1 \mathrm{cov}(X(t), X(s))\, f(t)\,dt, \quad s \in [0,1]. \tag{1.2} \]
It is well known (see e.g. Cardot et al. (1999)) that the regression function $\beta$ satisfies the following moment equation
\[ g(s) := E[Y X(s)] = [\Gamma\beta](s), \quad s \in [0,1], \tag{1.3} \]
where $g$ belongs to $L^2[0,1]$. Since $\Gamma$ is a nonnegative nuclear operator (Dauxois et al. (1982)), a continuous generalized inverse of $\Gamma$ does not exist as long as the range of the operator $\Gamma$ is an infinite dimensional subspace of $L^2[0,1]$. Consequently, inverting equation (1.3) to recover $\beta$ can be seen as an ill-posed inverse problem. Cardot et al. (2003) provides a necessary and sufficient condition for the existence of a unique solution of equation (1.3).

Assumption 1.1. The covariance operator $\Gamma$ of the random function $X$ is injective and the function $g = E[YX]$ belongs to the range $\mathcal{R}(\Gamma)$ of $\Gamma$.
Under this assumption, the covariance operator $\Gamma$ admits a discrete spectral decomposition given by a sequence $(\lambda_j)_{j\in\mathbb{N}}$ of strictly positive eigenvalues and a sequence of corresponding orthonormal eigenfunctions $\{\phi_j\}_{j\in\mathbb{N}}$. Then, the normal equation (1.3) can be rewritten as follows:
\[ \beta = \sum_{j\in\mathbb{N}} \frac{g_j}{\lambda_j}\,\phi_j \quad\text{with}\quad g_j := \langle g, \phi_j\rangle, \ j\in\mathbb{N}. \tag{1.4} \]
It is well known that, even in the case of a priori known eigenvalues $\{\lambda_j\}$ and eigenfunctions $\{\phi_j\}$, replacing the unknown function $g$ in (1.4) by a consistent estimator $\hat g$ does in general not lead to a consistent estimator of $\beta$. To be more precise, since the sequence $(\lambda_j)_{j\in\mathbb{N}}$ tends to zero, $E\|\hat g - g\|^2 = o(1)$ does generally not imply $\sum_{j\in\mathbb{N}} |\lambda_j|^{-2}\, E|\langle \hat g - g, \phi_j\rangle|^2 = o(1)$. Consequently, the estimation in the functional linear model is called ill-posed, and additional regularity assumptions on the regression function $\beta$ are necessary in order to obtain a uniform rate of convergence (c.f. Engl et al. (2000)).

The objective is to estimate the regression function $\beta$, as well as its derivatives, when observing a sample $(Y_i, X_i)$ of $n$ i.i.d. realizations of $(Y, X)$. We can define the empirical estimators of $g$ and $\Gamma$, respectively, as follows:
\[ \hat g := \frac{1}{n}\sum_{i=1}^n Y_i X_i \quad\text{and}\quad \hat\Gamma := \frac{1}{n}\sum_{i=1}^n \langle X_i, \cdot\rangle X_i. \tag{1.5} \]
The main class of estimation procedures studied in the statistical literature is based on principal components regression and consists in reducing the dimension by inverting equation (1.3) in the finite dimensional space generated by the eigenfunctions of $\hat\Gamma$ associated to the largest eigenvalues (see e.g. Bosq (2000), Frank and Friedman (1993), Cardot et al. (1999), Cardot et al. (2007) or Müller and Stadtmüller (2005) in the context of generalized linear models).
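As a concrete illustration, the empirical estimators in (1.5) can be sketched numerically by discretizing $L^2[0,1]$ on a uniform grid, so that the inner product becomes a Riemann sum. All concrete choices below (grid size, sample size, curves $X_i$, slope function) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize [0, 1]: functions become vectors, <f, g> = ∫ f g dt ≈ sum * dt.
T, n = 200, 500                    # grid size and sample size (assumptions)
t = np.arange(T) / T
dt = 1.0 / T

# Hypothetical centered random curves X_i and slope function beta, built
# from a few cosine modes with decaying coefficients.
K = 20
modes = np.array([np.sqrt(2) * np.cos(2 * np.pi * k * t) for k in range(1, K + 1)])
beta = (np.arange(1, K + 1) ** -2.0) @ modes
X = (rng.standard_normal((n, K)) / np.arange(1, K + 1)) @ modes

# Model (1.1): Y_i = ∫ beta(t) X_i(t) dt + sigma * eps_i
sigma = 0.1
Y = X @ beta * dt + sigma * rng.standard_normal(n)

# Empirical estimators (1.5):
#   g_hat(s)    = (1/n) sum_i Y_i X_i(s)              -> a vector of length T
#   Gamma_hat f = (1/n) sum_i <X_i, f> X_i            -> here a T x T kernel,
#                 acting on a discretized f via Gamma_hat_kernel @ f * dt
g_hat = (Y @ X) / n
Gamma_hat_kernel = (X.T @ X) / n
```

Note that `Gamma_hat_kernel` is symmetric, as the discretized covariance operator should be.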
The second important class of estimators relies on minimizing a penalized least squares criterion, which can be seen as a generalization of ridge regression. Marx and Eilers (1999) and Cardot et al. (2003) proposed B-splines expansions of the regression function with a penalty dealing with the squared norm of a fixed order derivative of the estimators. More recently, Crambes et al. (2008) proposed a spline smoothing decomposition with the same type of penalty and proved the optimality of their estimators according to a criterion that can be interpreted as a squared error of prediction. Note that this question has recently given rise to numerous publications in the machine learning community, with similar ideas based on reproducing kernel Hilbert spaces (RKHS) and Tikhonov regularization (see e.g. Smale and Zhou (2007), Bauer et al. (2007) and references therein).

Borrowing ideas from the inverse problems community (Efromovich and Koltchinskii (2001) and Hoffmann and Reiß (2008)), we propose in this article a new class of estimators which rely on dimension reduction, by projecting the data onto some basis of orthonormal functions, and on threshold techniques that allow us to control the accuracy of the estimator. More precisely, let us consider a set of orthonormal functions, such as a wavelet or trigonometric basis, denoted by $\{\psi_1, \ldots, \psi_m, \ldots\}$, which forms a basis of $L^2[0,1]$. Given a dimension $m \geq 1$, we denote by $[\hat\Gamma]_m$ the $m \times m$ matrix with generic elements $\langle\hat\Gamma\psi_\ell, \psi_j\rangle$, $j, \ell = 1, \ldots, m$, and by $[\hat g]_m$ the $m$-vector with elements $\langle\hat g, \psi_\ell\rangle$, $\ell = 1, \ldots, m$. We can first remark that the least squares estimator of $\beta$ obtained with the projections of the $X_i$ onto $\Psi_m$, the subspace of $L^2[0,1]$ spanned by the functions $\{\psi_1, \ldots, \psi_m\}$, is simply given, when $[\hat\Gamma]_m$ is nonsingular, by $([\hat\Gamma]_m^{-1}[\hat g]_m)^t\,[\psi]_m(\cdot)$, where $[\psi]_m(\cdot) = (\psi_1(\cdot), \ldots, \psi_m(\cdot))^t$. Our estimator, in its simplest form, consists in thresholding this projection estimator when, roughly speaking, the norm of the inverse of the matrix $[\hat\Gamma]_m$ is too large. More precisely, introducing a threshold value $\gamma$ which will depend on $m$ and $n$, we propose to estimate $\beta$ as follows:
\[ \hat\beta(t) = \sum_{\ell=1}^m \hat\beta_\ell \cdot 1\{\|[\hat\Gamma]_m^{-1}\| \leq \gamma\} \cdot \psi_\ell(t), \quad t \in [0,1], \tag{1.6} \]
where the $\hat\beta_\ell$ are the generic elements of the vector of coordinates obtained by least squares projection and $1$ is the indicator function. This new thresholding step can be seen as an improvement of the estimator proposed by Ramsay and Dalzell (1991), which was built by projecting the data onto a finite dimensional basis of functions. From an inverse problems perspective, this approach is similar to the linear Galerkin procedure (Natterer (1997) or Engl et al. (2000)) defined as follows: $\beta_m \in \Psi_m$ denotes a Galerkin solution of the operator equation $g = \Gamma\beta$ when
\[ \|g - \Gamma\beta_m\| \leq \|g - \Gamma\tilde\beta\|, \quad \forall\, \tilde\beta \in \Psi_m. \tag{1.7} \]
Since $\Gamma$ is strictly positive, it follows that $\beta_m = [\beta_m]_m^t\,[\psi]_m(\cdot)$ with $[\beta_m]_m = [\Gamma]_m^{-1}[g]_m$ is the unique Galerkin solution satisfying $[\Gamma(\beta - \beta_m)]_m = 0$. It has the advantage, compared to principal components regression, that it does not necessitate estimating the eigenfunctions of the empirical covariance operator.

We will consider a large class of weighted norms to evaluate the asymptotic rates of convergence of the thresholded projection estimators. For $f \in L^2[0,1]$, we define
\[ \|f\|_\omega^2 := \sum_{j=1}^\infty \omega_j\, |\langle f, \psi_j\rangle|^2 \tag{1.8} \]
for some strictly positive sequence of weights $(\omega_j)_{j\in\mathbb{N}}$.
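A minimal numerical sketch of the thresholded projection estimator (1.6) may help to fix ideas. It uses the trigonometric basis and simulated data; the slope function, the curves $X_i$, the noise level, and the threshold value are all illustrative assumptions rather than choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Grid on [0, 1] and trigonometric basis psi_1, ..., psi_K.
T, n, m = 200, 400, 7              # grid, sample size, dimension (assumptions)
t = np.arange(T) / T
dt = 1.0 / T

def psi(j):
    # psi_1 = 1, psi_{2k} = sqrt(2) cos(2 pi k t), psi_{2k+1} = sqrt(2) sin(2 pi k t)
    if j == 1:
        return np.ones(T)
    k = j // 2
    return (np.sqrt(2) * np.cos(2 * np.pi * k * t) if j % 2 == 0
            else np.sqrt(2) * np.sin(2 * np.pi * k * t))

K = 15                                           # modes used in X and beta
Basis = np.array([psi(j) for j in range(1, K + 1)])
beta = (np.arange(1, K + 1) ** -2.0) @ Basis     # hypothetical slope function
X = (rng.standard_normal((n, K)) / np.arange(1, K + 1)) @ Basis
Y = X @ beta * dt + 0.05 * rng.standard_normal(n)

# Projections onto Psi_m: [Gamma_hat]_m and [g_hat]_m.
Psi_m = Basis[:m]                                # m x T
proj = X @ Psi_m.T * dt                          # <X_i, psi_j>, an n x m matrix
Gm = proj.T @ proj / n                           # [Gamma_hat]_m
gm = (Y @ proj) / n                              # [g_hat]_m

# Thresholded projection estimator (1.6): least squares in Psi_m, set to
# zero when the spectral norm of [Gamma_hat]_m^{-1} exceeds gamma.
gamma = 1e4                                      # threshold value (assumption)
if np.linalg.norm(np.linalg.inv(Gm), 2) <= gamma:
    beta_hat = np.linalg.solve(Gm, gm) @ Psi_m
else:
    beta_hat = np.zeros(T)
```

With these choices the threshold is not triggered and `beta_hat` is the plain Galerkin least squares solution in $\Psi_m$.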
Then, the performance of the estimator $\hat\beta$ of $\beta$ is evaluated according to the risk $E\|\hat\beta - \beta\|_\omega^2$, called the $W_\omega$-risk in the following, which is simply the $L^2[0,1]$-risk when $\omega_j = 1$ for all $j\in\mathbb{N}$. This general framework allows us, with appropriate choices of the weight sequence $\omega$, to cover the estimation of derivatives of $\beta$ as well as the optimal estimation with respect to the mean squared prediction error. Indeed, the prediction error of a new value of $Y$, given any random function $X_{n+1}$ possessing the same distribution as $X$ and being independent of $X_1, \ldots, X_n$, can be evaluated as follows (see for example Cardot et al. (2003) or Crambes et al. (2008) for similar setups):
\[ E\Big[\Big(\int_0^1 \hat\beta(s) X_{n+1}(s)\,ds - \int_0^1 \beta(s) X_{n+1}(s)\,ds\Big)^2 \,\Big|\, \hat\beta\Big] = \langle\Gamma(\hat\beta - \beta), (\hat\beta - \beta)\rangle. \]
Consequently, if we suppose, now for the sake of simplicity, that the functions $\psi_j$ are also the eigenfunctions $\phi_j$ of the operator $\Gamma$, then it is clear that choosing $\omega_j = \lambda_j$ leads to evaluating, according to the $\omega$-norm, the mean squared prediction error of the estimator.

The paper is organized as follows. In Section 2, we fix notations, first derive consistency of the estimator in the general case under broad moment assumptions, and then prove minimax results under some additional regularity assumptions based on a link condition between the operator $\Gamma$ and the basis $\{\psi_j\}$. Section 3 is devoted to the particular case of the trigonometric basis and focuses on finitely and infinitely smoothing operators $\Gamma$ as well as different regularity conditions for the function $\beta$. We first consider the case of the mean squared prediction error and get asymptotic rates of convergence which are comparable to those of Crambes et al. (2008) in the polynomial case. One remarkable result is that in the exponential case, one can attain the parametric rate up to a power of a $\log n$ factor.
Rates of convergence for the function itself and its derivatives are also given. They are similar to those obtained by Hall and Horowitz (2007) in the case of the estimation of the function itself. Finally, a brief Section 4 presents the concluding remarks and some perspectives. The proofs are gathered in the Appendix.

2 Asymptotic properties, the general case

2.1 Notations and assumptions.

We assume from now on that the regression function $\beta$ belongs to some ellipsoid $W_b^\rho$, $\rho > 0$, defined as follows:
\[ W_b^\rho := \Big\{ f \in L^2[0,1] : \sum_{j=1}^\infty b_j\,|\langle f, \psi_j\rangle|^2 =: \|f\|_b^2 \leq \rho \Big\}, \tag{2.1} \]
where $\{\psi_j, j\in\mathbb{N}\}$ is, as before, some orthonormal basis in $L^2[0,1]$, not necessarily corresponding to the eigenfunctions of $\Gamma$, and the sequence of weights $(b_j)_{j\in\mathbb{N}}$ is non-decreasing. Here $W_b^\rho$ captures all the prior information (such as the smoothness) about the unknown slope function $\beta$.

Matrix and operator notations. Given $m \geq 1$, $\Psi_m$ denotes the subspace of $L^2[0,1]$ spanned by the functions $\{\psi_1, \ldots, \psi_m\}$. $\Pi_m$ and $\Pi_m^\perp$ denote the orthogonal projections on $\Psi_m$ and its orthogonal complement $\Psi_m^\perp$, respectively. Given an operator (matrix) $K$, $\|K\|_\omega$ denotes its operator $W_\omega$-norm, i.e. $\|K\|_\omega := \sup_{\|f\|_\omega = 1}\|Kf\|_\omega$. The inverse operator (matrix) of $K$ is denoted by $K^{-1}$, and the adjoint (transposed) operator (matrix) of $K$ by $K^t$. The identity operator (matrix) is denoted by $I$. For a vector $v$ and a matrix $K$, the upper $m$-subvector and $m \times m$ submatrix are denoted by $[v]_m$ and $[K]_m$, and their entries by $v_i$ and $K_{i,j}$, respectively. The diagonal matrix with entries $v$ is denoted by $\mathrm{Diag}(v)$. $[f]$ and $[K]$ denote the (infinite) vector and matrix of the function $f$ and the operator $K$ with the entries $[f]_i = \langle f, \psi_i\rangle$ and $[K]_{i,j} = \langle K\psi_j, \psi_i\rangle$, respectively.
Clearly, $[\Pi_m f]_m = [f]_m$, and if we restrict $\Pi_m K \Pi_m$ to an operator from $\Psi_m$ into itself, then it has the matrix $[K]_m$. Moreover, $\Pi_m f = [f]_m^t\,[\psi]_m(\cdot)$ and $\Pi_m K \Pi_m f = [f]_m^t\,[K]_m\,[\psi]_m(\cdot)$ with $[\psi]_m(\cdot) = (\psi_1(\cdot), \ldots, \psi_m(\cdot))^t$.

Consider the covariance operator $\Gamma$. We assume throughout the paper that $\Gamma$ is strictly positive definite, and hence the matrix $[\Gamma]_m$ is nonsingular for all $m\in\mathbb{N}$, so that $[\Gamma]_m^{-1}$ always exists. Under this assumption, the notation $\Gamma_m^{-1}$ is used for the operator from $L^2[0,1]$ into itself whose matrix in the basis $\{\psi_j\}$ has the entries $([\Gamma]_m^{-1})_{i,j}$ for $1 \leq i, j \leq m$ and zeros otherwise.

Moment assumptions. The results derived below involve additional conditions on the moments of the random function $X$, which we formalize now. Here and subsequently, we denote by $\mathcal{X}$ the set of all centered random functions $X$ with finite second moment, i.e. $E\|X\|^2 < \infty$, and strictly positive covariance operator. Given $X\in\mathcal{X}$, consider the random vector $[X]_m$; its entries $[X]_j = \langle X, \psi_j\rangle$ have mean zero and variance $[\Gamma]_{j,j} = \langle\Gamma\psi_j, \psi_j\rangle$, but they are not uncorrelated. In fact, $[\Gamma]_m$ is the covariance matrix of $[X]_m$. Since $\Gamma$ is strictly positive definite, it follows that $[\Gamma]_m$ is nonsingular. Therefore, the random vector $[\Gamma]_m^{-1/2}[X]_m$ has mean zero and the identity $I_m$ as covariance matrix. Then we denote by $\mathcal{X}_\eta^k$, $k\in\mathbb{N}$, $\eta \geq 1$, the subset of $\mathcal{X}$ containing only random functions $X$ with uniformly bounded $k$-th moments of the corresponding random variables $[X]_j/[\Gamma]_{j,j}^{1/2}$, $j\in\mathbb{N}$, and $([\Gamma]_m^{-1/2}[X]_m)_j$, $1 \leq j \leq m$, $m\in\mathbb{N}$, that is,
\[ \mathcal{X}_\eta^k := \Big\{ X\in\mathcal{X} \ \text{such that}\ \sup_{j\in\mathbb{N}} E\big|[X]_j/[\Gamma]_{j,j}^{1/2}\big|^k \leq \eta \ \text{and}\ \sup_{m\in\mathbb{N}} \sup_{1\leq j\leq m} E\big|([\Gamma]_m^{-1/2}[X]_m)_j\big|^k \leq \eta \Big\}. \tag{2.2} \]
It is worth noting that in case $X\in\mathcal{X}$ is a Gaussian random function, the corresponding random variables $[X]_j/[\Gamma]_{j,j}^{1/2}$, $j\in\mathbb{N}$, and $([\Gamma]_m^{-1/2}[X]_m)_j$, $1 \leq j \leq m$, $m\in\mathbb{N}$, are Gaussian with mean zero and variance one. Hence, for each $k\in\mathbb{N}$ there exists $\eta$ such that any Gaussian random function $X\in\mathcal{X}$ belongs also to $\mathcal{X}_\eta^k$. Furthermore, in what follows, $\mathcal{E}_\eta^k$ stands for the set of all centered error terms $\epsilon$ with variance one and finite $k$-th moment, i.e. $E|\epsilon|^k \leq \eta$.

2.2 Consistency.

The $W_\omega$-risk of $\hat\beta$ is essentially determined by the deviation of the estimators of $[g]_m$ and $[\Gamma]_m$, and by the regularization error due to the projection. The next assertion summarizes the minimal conditions to ensure consistency of the estimator $\hat\beta$ proposed in (1.6).

Proposition 2.1. Assume an $n$-sample of $(Y, X)$ satisfying (1.1) with $\sigma > 0$. Let $\beta \in W_\omega$, $X\in\mathcal{X}_\eta^4$ and $\epsilon\in\mathcal{E}_\eta^4$, $\eta \geq 1$. Consider the estimator $\hat\beta$, where the parameter $m := m(n)$ and threshold $\gamma := \gamma(n)$ are chosen such that $\gamma \geq 2\|[\Gamma]_m^{-1}\|$, and suppose, as $n\to\infty$, that $1/m = o(1)$, $\gamma\,(m/n)\sup_{1\leq j\leq m}\{\omega_j\} = o(1)$, $(m^2/n) = o(1)$ and $\gamma^2\,(m^3/n^{3/2}) = O(1)$. If in addition $\sup_{m\in\mathbb{N}}\|\Gamma_m^{-1}\Pi_m\Gamma\Pi_m^\perp\|_\omega < \infty$, then $E\|\hat\beta - \beta\|_\omega^2 = o(1)$ as $n\to\infty$.

Remark 2.1. The last result covers the case $\omega \equiv 1$, i.e. the estimator of $\beta$ is consistent without an additional assumption on $\beta$. However, consistency is only obtained under the condition $\sup_{m\in\mathbb{N}}\|\Gamma_m^{-1}\Pi_m\Gamma\Pi_m^\perp\|_\omega < \infty$, which is known to be sufficient to ensure convergence in the $W_\omega$-norm, as $m\to\infty$, of the Galerkin solution $\beta_m = [\beta_m]_m^t\,[\psi]_m(\cdot)$ with $[\beta_m]_m = [\Gamma]_m^{-1}[g]_m$ to the slope parameter $\beta$. Furthermore, if $\omega$ is increasing, as in the case of a Sobolev norm, then $\hat\beta$ is obviously a consistent estimator only if $\beta \in W_\omega$.
Moreover, in the last assertion we may replace the condition $\beta \in W_\omega$ by the assumption that $\beta \in W_b$ and $(\omega_j/b_j)$ is non-increasing. In this situation we have $W_b \subset W_\omega$ and thus the result still holds true. Roughly speaking, this corresponds to the condition that at least $p \geq s$ derivatives exist in case we want to estimate the $s$-th derivative.

Link condition. In the last assertion, the choice of the smoothing parameters $m$ and $\gamma$, i.e. $\gamma \geq 2\|[\Gamma]_m^{-1}\|$, depends on the relation between the covariance operator $\Gamma$ associated to the regressor $X$ and the basis $\{\psi_j\}$ used for the projection, which we formalize next. Consider the sequence $(\|\Gamma\psi_j\|)_{j\geq 1}$, which is summable and hence converges to zero since $\Gamma$ is nuclear. In what follows we impose a restriction on the decay of this sequence. Therefore, consider a strictly positive, monotonically decreasing and summable sequence of weights $\upsilon := (\upsilon_j)_{j\in\mathbb{N}}$ with $\upsilon_1 = 1$. Then for $s\in\mathbb{R}$ denote by $\|\cdot\|_{\upsilon^s}$ the associated weighted norm given by $\|f\|_{\upsilon^s}^2 := \sum_{j=1}^\infty \upsilon_j^s\,|\langle f, \psi_j\rangle|^2$. Let $\mathcal{N}$ be the set of all self-adjoint nuclear operators defined on $L^2[0,1]$. Then for $d \geq 1$ define the subset $\mathcal{N}_\upsilon^d$ of $\mathcal{N}$ by
\[ \mathcal{N}_\upsilon^d := \Big\{ \Gamma\in\mathcal{N} : \|f\|_{\upsilon^2}^2/d^2 \leq \|\Gamma f\|^2 \leq d^2\,\|f\|_{\upsilon^2}^2, \ \forall f\in L^2[0,1] \Big\}. \tag{2.3} \]
A similar condition, but in a different context, can be found, for example, in Nair et al. (2005) and Chen and Reiß (2008). Note that for all $\Gamma\in\mathcal{N}_\upsilon^d$, by using the inequality of Heinz (1951), it follows that¹ $\|\Gamma\psi_j\| \asymp_d \upsilon_j$. Hence, the sequence $(\upsilon_j)_{j\in\mathbb{N}}$ has to be summable, i.e. $\sum_j \upsilon_j < \infty$, since $\Gamma$ is nuclear. We first consider this general class of operators. However, we illustrate condition (2.3) in Section 3 by considering the particular cases of a sequence $\upsilon$ with polynomial or exponential decay, which are naturally linked to polynomial or exponential decreasing rates for the eigenvalues of $\Gamma$.
To be more precise, if the eigenvalue decomposition of $\Gamma\in\mathcal{N}$ is given by $\{\lambda_j, \psi_j, j\in\mathbb{N}\}$, then $\Gamma\in\mathcal{N}_\upsilon^d$ if and only if $\lambda_j \asymp_d \upsilon_j$ for all $j\in\mathbb{N}$. All the results below are derived under the following basic regularity assumption.

Assumption 2.1. Let $\omega := (\omega_j)_{j\geq 1}$, $b := (b_j)_{j\geq 1}$ and $\upsilon := (\upsilon_j)_{j\geq 1}$ be strictly positive sequences of weights with $\omega_1 = 1$, $b_1 = 1$ and $\upsilon_1 = 1$, such that $b$ and $(b_j/\omega_j)_{j\geq 1}$ are non-decreasing and $\upsilon$ and $(\upsilon_j^2/\omega_j)_{j\geq 1}$ are non-increasing, with $\Lambda := \sum_j \upsilon_j < \infty$.

Note that under Assumption 2.1, i.e. $(b_j/\omega_j)_{j\geq 1}$ non-decreasing, the ellipsoid $W_b^\rho$ is a subset of $W_\omega^\rho$. Roughly speaking, if $W_b^\rho$ describes $p$-times differentiable functions, then Assumption 2.1 ensures that the $W_\omega$-risk involves at most $s \leq p$ derivatives. On the other hand, if the sequence $\omega$ is decreasing, i.e. the $W_\omega$-norm is roughly speaking smoothing, Assumption 2.1 excludes cases in which $\omega$ decreases faster than the sequence $\upsilon^2$. However, in case $\omega \equiv \upsilon^2$ we show below that the obtainable optimal rate is parametric, and hence, whenever $(\omega_j/\upsilon_j^2) = o(1)$, it is parametric too. The next assertion summarizes minimal conditions to ensure consistency of the estimator $\hat\beta$ given in (1.6) when the covariance operator satisfies a link condition.

Corollary 2.2. Assume an $n$-sample of $(Y, X)$ satisfying (1.1) with $\sigma > 0$ and associated covariance operator $\Gamma\in\mathcal{N}_\upsilon^d$, $d \geq 1$. Let $\beta \in W_b$, $X\in\mathcal{X}_\eta^4$ and $\epsilon\in\mathcal{E}_\eta^4$, $\eta \geq 1$. Consider the estimator $\hat\beta$ with threshold $\gamma = 8d^3/\upsilon_m$ and parameter $m := m(n)$ chosen such that $1/m = o(1)$, $(m/n)\sup_{1\leq j\leq m}\{\omega_j/\upsilon_j\} = o(1)$, $(m^2/n) = o(1)$ and $m^3/(\upsilon_m^2\, n^{3/2}) = O(1)$ as $n\to\infty$. If in addition Assumption 2.1 is satisfied, then $E\|\hat\beta - \beta\|_\omega^2 = o(1)$ as $n\to\infty$.

¹ We write $a \asymp_d b$ if $d^{-1} \leq b/a \leq d$.
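The sandwich inequality defining the class $\mathcal{N}_\upsilon^d$ in (2.3) is easy to check numerically for an operator that is diagonal in the basis $\{\psi_j\}$, working entirely in coefficient space. The truncation level, weight sequence, and eigenvalues below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# If Gamma psi_j = lambda_j psi_j with lambda_j ≍_d upsilon_j, then for every f
#   ||f||^2_{upsilon^2} / d^2  <=  ||Gamma f||^2  <=  d^2 ||f||^2_{upsilon^2}.
J = 50                                             # truncation level (assumption)
upsilon = 1.0 / np.arange(1, J + 1) ** 2           # polynomial weights, a = 1
d = 2.0
lam = upsilon * rng.uniform(1.0 / d, d, size=J)    # eigenvalues with lambda_j ≍_d upsilon_j

f = rng.standard_normal(J)                         # coefficients <f, psi_j>
norm_Gf_sq = np.sum((lam * f) ** 2)                # ||Gamma f||^2 in coefficients
norm_f_ups2 = np.sum(upsilon ** 2 * f ** 2)        # ||f||^2_{upsilon^2}

lower = norm_f_ups2 / d ** 2
upper = d ** 2 * norm_f_ups2
```

Since each $\lambda_j$ lies between $\upsilon_j/d$ and $d\,\upsilon_j$, the squared norm $\|\Gamma f\|^2$ is squeezed between the two weighted-norm bounds for any coefficient vector $f$.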
It is worth noting that the link condition $\Gamma\in\mathcal{N}_\upsilon^d$ used in the last assertion implies $\sup_{m\in\mathbb{N}}\|\Gamma_m^{-1}\Pi_m\Gamma\Pi_m^\perp\|_\omega < \infty$ and hence automatically ensures the consistency in the $W_\omega$-norm of the Galerkin solution $\beta_m$ as $m\to\infty$. However, in order to obtain a rate of convergence it is necessary to impose an additional regularity assumption on the slope parameter $\beta$. First we derive a lower bound for any estimator when these regularity assumptions are formalized by the condition that $\beta$ belongs to the ellipsoid $W_b^\rho$.

2.3 The lower bound.

It is well known that in general the hardest one-dimensional subproblem does not capture the full difficulty in estimating the solution of an inverse problem, even in the case of a known operator (for details see e.g. the proof in Mair and Ruymgaart (1996)). In other words, there do not exist two sequences of slope functions $\beta_{1,n}, \beta_{2,n} \in W_b^\rho$ which are statistically not consistently distinguishable and which satisfy $\|\beta_{1,n} - \beta_{2,n}\|_\omega^2 \geq C\,\delta_n^*$, where $\delta_n^*$ is the optimal rate of convergence. Therefore, we need to consider subsets of $W_b^\rho$ with a growing number of elements in order to get the optimal lower bound. More precisely, we obtain the following lower bound by applying Assouad's cube technique (see e.g. Korostolev and Tsybakov (1993) or Chen and Reiß (2008)). Moreover, the following lower bound is obtained under the additional assumption that the distribution of the error term $\epsilon$ is Gaussian with mean zero and variance one, i.e. $\epsilon \sim \mathcal{N}(0,1)$.

Theorem 2.3. Assume an $n$-sample of $(Y, X)$ satisfying (1.1) with $\sigma > 0$ and associated covariance operator $\Gamma\in\mathcal{N}_\upsilon^d$, $d \geq 1$. Suppose the error term $\epsilon \sim \mathcal{N}(0,1)$ is independent of $X$. Consider $W_b^\rho$, $\rho > 0$, as set of slope functions.
Let $m^* := m^*(n)$ and $\delta_n^* := \delta_n^*(m^*)$ for some $\triangle \geq 1$ be chosen such that
\[ \triangle^{-1} \leq \frac{b_{m^*}}{n\,\omega_{m^*}} \sum_{j=1}^{m^*} \frac{\omega_j}{\upsilon_j} \leq \triangle \quad\text{and}\quad \delta_n^* := \omega_{m^*}/b_{m^*}. \tag{2.4} \]
If in addition Assumption 2.1 is satisfied, then for any estimator $\tilde\beta$ of $\beta$ we have
\[ \sup_{\beta\in W_b^\rho} E\|\tilde\beta - \beta\|_\omega^2 \geq \frac{1}{4\triangle}\cdot\min\Big\{\frac{\sigma^2}{2d}, \frac{\rho}{\triangle}\Big\}\cdot\delta_n^*. \]

Remark 2.2. The normality and independence assumption on the error term in the last theorem is only used to simplify the calculation of the distance between distributions corresponding to different slope functions. However, below we show an upper bound for the estimator $\hat\beta$ in case the error term $\epsilon\in\mathcal{E}_\eta^k$ and the regressor $X\in\mathcal{X}_\eta^k$, for some $k\in\mathbb{N}$ and $\eta \geq 1$, are only uncorrelated, which includes the particular case of an independent Gaussian error considered in Theorem 2.3 as long as $\eta$ is sufficiently large. Therefore, by applying Theorem 2.3, an upper bound of order $\delta_n^*$ implies that this rate is optimal and hence the estimator $\hat\beta$ is minimax optimal. Note further that if $(\omega_j/\upsilon_j)$ is summable, then the order $\delta_n^*$ is parametric. This in particular is the case when $\omega \equiv \upsilon^2$, since $(\upsilon_j)$ is summable.

Remark 2.3. In case the eigenfunctions of the operator $\Gamma$ are known, the obtainable accuracy of any estimator of $\beta$ is essentially determined by the decay of the eigenvalues $(\lambda_j)_{j\geq 1}$ of $\Gamma$. To be more precise, if for some sequence of weights $\upsilon := (\upsilon_j)_{j\geq 1}$ we have
\[ \exists\, d \geq 1 : \lambda_j \asymp_d \upsilon_j, \quad j \geq 1, \tag{2.5} \]
then $\upsilon$ determines the obtainable rate of convergence (c.f. Johannes (2008)). If $\{\psi_j\}$ are the eigenfunctions of $\Gamma$, i.e. $\lambda_j = \langle\Gamma\psi_j, \psi_j\rangle$, then condition (2.5) holds if and only if $\Gamma\in\mathcal{N}_\upsilon^d$. In other words, the condition $\Gamma\in\mathcal{N}_\upsilon^d$ specifies in this situation the decay of the eigenvalues of $\Gamma$. However, the set $\mathcal{N}_\upsilon^d$ also contains operators whose eigenfunctions are not given by $\{\psi_j\}$.
Then the corresponding eigenvalues may decay far slower than the sequence of weights $\upsilon$. Hence, for these operators the obtainable rate of convergence may be far slower when using the basis $\{\psi_j\}$ in place of their eigenfunctions.

2.4 The upper bound.

In the following theorem we provide an upper bound for the estimator $\hat\beta$ defined in (1.6), assuming sequences $b$, $\omega$ and $\upsilon$ with the additional property that, for some $k\in\mathbb{N}$ as $n\to\infty$,
\[ \frac{(m^*)^{2k}}{\delta_n^*\, n^k} = O(1), \quad \frac{m^*}{\delta_n^*\, n}\,\sup_{1\leq j\leq m^*}\Big\{\frac{\omega_j}{\upsilon_j}\Big\} = O(1) \quad\text{and}\quad \frac{(m^*)^{2+k}}{n^{k/2-1}} = O(1), \tag{2.6} \]
where $m^* := m^*(n)$ and $\delta_n^* := \delta_n^*(m^*)$ are given by (2.4). The next theorem states that the rate $\delta_n^*$ of the lower bound given in Theorem 2.3 also provides an upper bound for the estimator $\hat\beta$ defined in (1.6).

Theorem 2.4. Assume an $n$-sample of $(Y, X)$ satisfying (1.1) with $\sigma > 0$ and associated covariance operator $\Gamma\in\mathcal{N}_\upsilon^d$, $d \geq 1$. Consider $W_b^\rho$, $\rho > 0$, as set of slope functions and suppose that the sequences $b$, $\omega$ and $\upsilon$ satisfy Assumption 2.1. Let $m^* := m^*(n)$ and $\delta_n^* := \delta_n^*(m^*)$ be given by (2.4) and suppose (2.6) is satisfied for some $k \geq 4$. Consider the estimator $\hat\beta$ with parameter $m = m^*$ and threshold $\gamma = n\,\max(1, 8d^3\triangle/b_{m^*})$. If in addition $X\in\mathcal{X}_\eta^{4k}$ and $\epsilon\in\mathcal{E}_\eta^{4k}$, $\eta \geq 1$, then we have
\[ \sup_{\beta\in W_b^\rho} E\|\hat\beta - \beta\|_\omega^2 \leq C\,\delta_n^*\,\eta\, d^{16}\,\triangle^2\,\{\sigma^2 + \rho\Lambda\}, \]
where $C$ is a positive constant. Thus, we have proved that the rate $\delta_n^*$ is optimal and hence the estimator $\hat\beta$ is minimax optimal.

Remark 2.4. It is worth noting that as long as the sequence $b$ is increasing, the condition on the threshold $\gamma$ given in Theorem 2.4 writes $\gamma = n$ for all sufficiently large $n$. Therefore, only the parameter $m$ has to be chosen data-driven in order to build an adaptive estimation procedure. On the other hand, under the assumptions of Theorem 2.4 the parametric rate cannot be obtained.
To be more precise, in case $\sum_j \omega_j/\upsilon_j < \infty$, the rate of the lower bound in Theorem 2.3 is given by $\delta_n^* = 1/n$. But in this case the condition $m^*/(\delta_n^*\,n)\sup_{1\leq j\leq m^*}\{\omega_j/\upsilon_j\} = O(1)$ is not satisfied, and hence we cannot apply Theorem 2.4. However, we conjecture that the proposed estimator also attains the parametric rate under a stronger set of assumptions, as used, for example, by Johannes and Schenk (2008) in order to obtain rate optimal estimation of a linear functional of the slope parameter $\beta$.

3 Mean squared prediction error and derivative estimation

In this section we suppose that the slope function $\beta$ is an element of the Sobolev space of periodic functions $W_p$, for some $p > 0$, given by
\[ W_p = \big\{ f \in H_p : f^{(j)}(0) = f^{(j)}(1), \ j = 0, 1, \ldots, p-1 \big\}, \]
where $H_p := \{f \in L^2[0,1] : f^{(p-1)} \text{ absolutely continuous}, f^{(p)} \in L^2[0,1]\}$ is a Sobolev space (c.f. Neubauer (1988a,b), Mair and Ruymgaart (1996) or Tsybakov (2004)). Let us first remark that if we consider the sequence of weights $(b_j^p)_{j\in\mathbb{N}}$ given by
\[ b_1^p = 1 \quad\text{and}\quad b_{2j}^p = b_{2j+1}^p = j^{2p}, \ j\in\mathbb{N}, \tag{3.1} \]
and the trigonometric basis
\[ \psi_1(t) = 1, \quad \psi_{2k}(t) = \sqrt{2}\cos(2\pi k t), \quad \psi_{2k+1}(t) = \sqrt{2}\sin(2\pi k t), \quad k = 1, 2, \ldots, \tag{3.2} \]
then the Sobolev space of periodic functions is equivalently given by $W_{b^p}$ defined via (2.1). Therefore, let us denote by $W_p^\rho := W_{b^p}^\rho$, $\rho > 0$, an ellipsoid in the Sobolev space $W_p$.

Mean squared prediction error. We shall first measure the performance of the estimator by considering the mean prediction error (MPE), i.e. $E\|\hat\beta - \beta\|_\Gamma^2$. In this case, if $\Gamma$ satisfies a link condition, that is $\Gamma\in\mathcal{N}_\upsilon^d$, $d \geq 1$, for some weight sequence $\upsilon$ (see definition (2.3)), then it follows, by using the inequality of Heinz (1951), that the MPE is equivalent to the $W_\upsilon$-risk, that is $E\|\hat\beta - \beta\|_\upsilon^2$.
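The trigonometric basis (3.2) and the Sobolev weights (3.1) can be generated directly. The short sketch below (grid size and the order $p$ are arbitrary illustrative choices) also confirms the discrete orthonormality of the basis on a uniform grid:

```python
import numpy as np

# Trigonometric basis (3.2) and Sobolev weights (3.1) on a uniform grid.
T = 256                       # grid size (assumption)
t = np.arange(T) / T
dt = 1.0 / T

def psi(j):
    # psi_1 = 1, psi_{2k} = sqrt(2) cos(2 pi k t), psi_{2k+1} = sqrt(2) sin(2 pi k t)
    if j == 1:
        return np.ones(T)
    k = j // 2
    return (np.sqrt(2) * np.cos(2 * np.pi * k * t) if j % 2 == 0
            else np.sqrt(2) * np.sin(2 * np.pi * k * t))

def sobolev_weights(p, m):
    # b^p_1 = 1 and b^p_{2j} = b^p_{2j+1} = j^{2p}
    return np.array([1.0 if j == 1 else float((j // 2) ** (2 * p))
                     for j in range(1, m + 1)])

m = 9
Psi = np.array([psi(j) for j in range(1, m + 1)])
Gram = Psi @ Psi.T * dt       # the identity matrix: the basis is orthonormal
b2 = sobolev_weights(2, m)    # for p = 2: 1, 1, 1, 16, 16, 81, 81, 256, 256
```

With these weights, $\|f\|_{b^p}^2 = \sum_j b_j^p\,|\langle f, \psi_j\rangle|^2$ is finite exactly when $f$ lies in the periodic Sobolev space $W_p$.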
To illustrate the previous results, we assume in the following the sequence $(\upsilon_j)_{j\in\mathbb{N}}$ to be either polynomially decreasing, i.e. $\upsilon_1 = 1$ and $\upsilon_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, or exponentially decreasing, i.e. $\upsilon_1 = 1$ and $\upsilon_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$. In the polynomial case, easy calculus shows that a covariance operator $\Gamma\in\mathcal{N}_\upsilon^d$ acts like integrating $(2a)$-times and hence it is called finitely smoothing (c.f. Natterer (1984)). Furthermore, if the eigenfunctions of $\Gamma$ are $\{\psi_j\}$, then $\Gamma\in\mathcal{N}_\upsilon^d$ holds if and only if the eigenvalues $\lambda_j$ of $\Gamma$ satisfy $\lambda_j \asymp_d |j|^{-2a}$, which is the case considered, for example, in Crambes et al. (2008). On the other hand, in the exponential case it can easily be seen that the link condition $\Gamma\in\mathcal{N}_\upsilon^d$ implies $\mathcal{R}(\Gamma) \subset W_p$ for all $p > 0$; therefore, the operator $\Gamma$ is called infinitely smoothing (c.f. Mair (1994)). Moreover, if the eigenfunctions of $\Gamma$ are $\{\psi_j\}$, then $\Gamma\in\mathcal{N}_\upsilon^d$ holds if and only if the eigenvalues $\lambda_j$ of $\Gamma$ satisfy $\lambda_j \asymp_d \exp(-j^{2a})$. To the best of our knowledge, this case has not yet been considered in the literature. Since in both cases the basic regularity Assumption 2.1 is satisfied, the lower bounds presented in the next assertion follow directly from Theorem 2.3. Here and subsequently, we write $a_n \lesssim b_n$ when there exists $C > 0$ such that $a_n \leq C\, b_n$ for all sufficiently large $n\in\mathbb{N}$, and $a_n \sim b_n$ when $a_n \lesssim b_n$ and $b_n \lesssim a_n$ simultaneously.

Proposition 3.1. Under the assumptions of Theorem 2.3 we have, for any estimator $\tilde\beta$,
(i) in the polynomial case, i.e. $\upsilon_1 = 1$ and $\upsilon_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, that
\[ \sup_{\beta\in W_p^\rho} E\|\tilde\beta - \beta\|_\Gamma^2 \gtrsim n^{-(2p+2a)/(2p+2a+1)}, \]
(ii) in the exponential case, i.e. $\upsilon_1 = 1$ and $\upsilon_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, that
\[ \sup_{\beta\in W_p^\rho} E\|\tilde\beta - \beta\|_\Gamma^2 \gtrsim n^{-1}(\log n)^{1/(2a)}. \]
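The polynomial rate in Proposition 3.1(i) corresponds to solving the balancing condition (2.4) with $\omega = \upsilon$. A numerical sketch, using the simplified weights $b_j = j^{2p}$ and $\upsilon_j = j^{-2a}$ (this ignores the paired-index structure of (3.1), which changes constants but not rates), recovers the closed-form exponent:

```python
import numpy as np

def balance(n, p, a, m_max=200):
    # Balance condition (2.4) with omega = upsilon (prediction risk):
    #   triangle^{-1} <= b_{m}/(n omega_{m}) * sum_{j<=m} omega_j/upsilon_j <= triangle,
    # and delta*_n := omega_{m*}/b_{m*}.
    j = np.arange(1, m_max + 1)
    upsilon = j ** (-2.0 * a)          # polynomial weights (assumption)
    omega = upsilon                    # omega = upsilon for the MPE
    b = j ** (2.0 * p)                 # simplified Sobolev weights (assumption)
    lhs = b / (n * omega) * np.cumsum(omega / upsilon)
    m_star = int(np.argmin(np.abs(lhs - 1.0))) + 1
    return m_star, omega[m_star - 1] / b[m_star - 1]

n, p, a = 10 ** 6, 1, 1
m_star, delta = balance(n, p, a)
# Closed form: m* ~ n^{1/(2p+2a+1)} and delta*_n ~ n^{-(2p+2a)/(2p+2a+1)}.
m_theory = n ** (1.0 / (2 * p + 2 * a + 1))
rate_theory = n ** (-(2 * p + 2 * a) / (2 * p + 2 * a + 1.0))
```

Here the left-hand side of (2.4) reduces to $m^{2p+2a+1}/n$, so the balanced dimension and the resulting $\delta_n^*$ track the theoretical rate up to the discreteness of $m$.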
On the other hand, if the dimension parameter $m$ and the threshold $\gamma$ in the definition of the estimator $\hat\beta$ given in (1.6) are chosen appropriately, then, by applying Theorem 2.4, the rates of the lower bound given in the last assertion also provide, up to a constant, the upper bound of the risk of the estimator $\hat\beta$, which is summarized in the next proposition.

Proposition 3.2. Under the assumptions of Theorem 2.3, consider the estimator $\hat\beta$:
(i) in the polynomial case, i.e. $\upsilon_1 = 1$ and $\upsilon_j = |j|^{-2a}$, $j \geq 2$, for some $a > 1/2$, with $m \sim n^{1/(2p+2a+1)}$ and threshold $\gamma = n$. If in addition $k > 2 + 8/(2p+2a-1)$, then
\[ \sup_{\beta\in W_p^\rho} E\|\hat\beta - \beta\|_\Gamma^2 \lesssim n^{-(2p+2a)/(2p+2a+1)}; \]
(ii) in the exponential case, i.e. $\upsilon_1 = 1$ and $\upsilon_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, with $m \sim (\log n)^{1/(2a)}$ and threshold $\gamma = n$. Then
\[ \sup_{\beta\in W_p^\rho} E\|\hat\beta - \beta\|_\Gamma^2 \lesssim n^{-1}(\log n)^{1/(2a)}. \]

We have thus proved that these rates are optimal and the proposed estimator $\hat\beta$ is minimax optimal in both cases. It is worth noting that, replacing the condition $\gamma = n$ by $\gamma = cn$ with $c > 0$ appropriately chosen, Proposition 3.2 remains true when $p = 0$, that is to say when $\beta$ is only supposed to be square integrable.

Remark 3.1. It is of interest to compare our results with those of Crambes et al. (2008), who measure the performance of their estimator in terms of squared prediction error. In their notations, the decay of the eigenvalues of $\Gamma$ is assumed to be of order $(|j|^{-2q-1})$, i.e. $q = a - 1/2$. Furthermore, they suppose the slope function to be $m$-times continuously differentiable, i.e. $m = p$. Using this parametrization, we see that our results in the polynomial case imply the same rate of convergence in probability of the prediction error as presented in Crambes et al. (2008).
However, our general results yield both a lower and an upper bound on the MPE not only in the polynomial case but also in the exponential case. Furthermore, we shall emphasize the interesting influence of the parameters p and a, characterizing the smoothness of β and the smoothing properties of Γ, respectively. As we see from Propositions 3.1 and 3.2, in the polynomial case an increasing value of p leads to a faster optimal rate. In other words, as expected, a smoother regression function can be estimated faster. The situation in the exponential case is very different. It seems rather surprising that, contrary to the polynomial case, the optimal rate of convergence in the exponential case does not depend on the value of p; this dependence is, however, clearly hidden in the constant. Furthermore, the parameter m does not even depend on the value of p. Thereby, the proposed estimator is automatically adaptive, i.e., it does not involve a priori knowledge of the degree of smoothness of the slope function β. However, the choice of the smoothing parameter depends on the value a specifying the decay of {υ_j}. Note further that in both cases an increasing value of a leads to a faster optimal rate of convergence, i.e., we may call 1/a the degree of ill-posedness (cf. Natterer (1984)).

Estimation of the derivatives. Let us now consider the estimation of derivatives of the slope function β. It is well known that for any function g belonging to a Sobolev ellipsoid W^ρ_p the Sobolev norm ‖g‖_{b^s}, for each 0 ≤ s ≤ p, is equivalent to the L²-norm of the s-th weak derivative g^{(s)}, i.e., ‖g^{(s)}‖. Thereby, the results of the previous section again imply a lower bound as well as an upper bound on the L²-risk for the estimation of the s-th weak derivative of β.
In the following we consider again the two particular cases of polynomially and exponentially decreasing rates for the sequence of weights (υ_j). The next assertion summarizes lower bounds on the L²-risk for the estimation of the s-th weak derivative of β in both cases.

Proposition 3.3. Under the assumptions of Theorem 2.3 we have for any estimator β̃^{(s)}
(i) in the polynomial case, i.e. υ_1 = 1 and υ_j = |j|^{−2a}, j ≥ 2, for some a > 1/2, that
  sup_{β∈W^ρ_p} E‖β̃^{(s)} − β^{(s)}‖² ≳ n^{−(2p−2s)/(2p+2a+1)},
(ii) in the exponential case, i.e. υ_1 = 1 and υ_j = exp(−|j|^{2a}), j ≥ 2, for some a > 0, that
  sup_{β∈W^ρ_p} E‖β̃^{(s)} − β^{(s)}‖² ≳ (log n)^{−(p−s)/a}.

On the other hand, considering the estimator β̂ given in (1.6), we only have to calculate the s-th weak derivative of β. Given the exponential basis, which is linked to the trigonometric basis by the relation exp(2iπkt) = 2^{−1/2}(ψ_{2k}(t) + i ψ_{2k+1}(t)), for k ∈ Z and t ∈ [0, 1], with i² = −1, we recall that for 0 ≤ s < p the s-th weak derivative β^{(s)} of β satisfies
  β^{(s)}(t) = Σ_{k∈Z} (2iπk)^s ( ∫_0^1 β(u) exp(−2iπku) du ) exp(2iπkt).
Given a dimension m ≥ 1, we now denote by [Γ̂]_m the (2m+1)×(2m+1) matrix with generic elements ⟨Γ̂ψ_ℓ, ψ_j⟩, −m ≤ j, ℓ ≤ m, and by [ĝ]_m the (2m+1)-vector with elements ⟨ĝ, ψ_ℓ⟩, −m ≤ ℓ ≤ m. Furthermore, for integer s we define the diagonal matrix ∇^s_m with entries (∇^s_m)_{j,j} := (2iπj)^s, −m ≤ j ≤ m. Then we consider the estimator of β^{(s)} defined by
  β̂^{(s)} := [β̂^{(s)}]^t_m [ψ]_m(·)  with  [β̂^{(s)}]_m = ∇^s_m [Γ̂]^{−1}_m [ĝ]_m, if [Γ̂]_m is nonsingular and ‖[Γ̂]^{−1}_m‖ ≤ γ; and 0 otherwise.
(3.3)

Furthermore, if the dimension parameter m and the threshold γ in the definition of the estimator β̂^{(s)} given in (3.3) are chosen appropriately, then by applying Theorem 2.4 the rates of the lower bound given in the last assertion again provide, up to a constant, the upper bound on the L²-risk of the estimator β̂^{(s)}, which is summarized in the next proposition. We have thus proved that these rates are optimal and that the proposed estimator β̂^{(s)} is minimax optimal in both cases.

Proposition 3.4. Under the assumptions of Theorem 2.3 consider the estimator β̂^{(s)}
(i) in the polynomial case, i.e. υ_1 = 1 and υ_j = |j|^{−2a}, j ≥ 2, for some a > 1/2, with m ∼ n^{1/(2p+2a+1)} and threshold γ = n. If in addition k > 2 + 8/(2p + 2a − 1), then
  sup_{β∈W^ρ_p} E‖β̂^{(s)} − β^{(s)}‖² ≲ n^{−(2p−2s)/(2p+2a+1)},
(ii) in the exponential case, i.e. υ_1 = 1 and υ_j = exp(−|j|^{2a}), j ≥ 2, for some a > 0, with m ∼ (log n)^{1/(2a)} and threshold γ = n. Then
  sup_{β∈W^ρ_p} E‖β̂^{(s)} − β^{(s)}‖² ≲ (log n)^{−(p−s)/a}.

Remark 3.2. It is worth noting that the L²-risk in estimating the slope function β itself, i.e., s = 0, has been considered in Hall and Horowitz (2007) only in the polynomial case. In their notation the decrease of the eigenvalues of Γ is of order (|j|^{−α}), i.e., α = 2a. Furthermore, the Fourier coefficients of the slope function decay at least with rate j^{−β}, i.e., β = p + 1/2. Using this parametrization, we see that we recover the result of Hall and Horowitz (2007) in the polynomial case with s = 0, but without the additional assumption β > α/2 + 1 or β > α − 1/2. Furthermore, we shall discuss again the interesting influence of the parameters p and a.
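The diagonal action of ∇^s_m on the exponential-basis coefficients in (3.3) amounts to multiplying the k-th Fourier coefficient by (2iπk)^s. A minimal sketch of this step (the function name is ours):

```python
import numpy as np

def weak_derivative_coeffs(beta_coeffs, s):
    """Given coefficients b_k = <β, exp(2iπk·)> ordered as k = -m, ..., m,
    return the coefficients (2iπk)^s b_k of the s-th weak derivative
    β^{(s)}, mirroring the diagonal matrix ∇^s_m of (3.3)."""
    m = (len(beta_coeffs) - 1) // 2
    k = np.arange(-m, m + 1)
    return (2j * np.pi * k) ** s * np.asarray(beta_coeffs, dtype=complex)
```

For example, β(t) = exp(2iπt) has the single nonzero coefficient b_1 = 1, and for s = 1 the routine returns the coefficient 2iπ, matching β′(t) = 2iπ exp(2iπt).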
As we see from Propositions 3.3 and 3.4, in both cases a decreasing value of a or an increasing value of p leads to a faster optimal rate of convergence. Hence, in contrast to the MPE, for the L²-risk the parameter a describes in both cases the degree of ill-posedness. Furthermore, the estimation of higher derivatives of the slope function, i.e., considering a larger value of s, is, as usual, only possible at a slower optimal rate. Finally, as for the MPE in the exponential case, the parameter m does not depend on the values of p or s; hence the proposed estimator is automatically adaptive.

Remark 3.3. There is an interesting hidden issue in the parametrization we have chosen. Consider a classical indirect regression model with known operator Γ, i.e., Y = [Γβ](U) + ε, where U has a uniform distribution on [0, 1] and ε is white noise (for details see, e.g., Mair and Ruymgaart (1996)). If in addition the operator Γ is finitely smoothing, i.e., (υ_j) is polynomially decreasing with υ_j = j^{−2a}, j ≥ 2, then given an n-sample of Y the optimal rate of convergence of the W_s-risk of any estimator of β is of order n^{−2(p−s)/[2(p+2a)+1]}, since R(Γ) = W_{2a} (cf. Mair and Ruymgaart (1996) or Chen and Reiß (2008)). However, we have shown that in a functional linear model, even with estimated operator, the optimal rate is of order n^{−2(p−s)/[2(p+a)+1]}. Comparing the two rates, we see that in a functional linear model the covariance operator Γ has degree of ill-posedness a, while the same operator has, in the indirect regression model, degree of ill-posedness 2a. In other words, in a functional linear model we do not face the complexity of an inversion of Γ but only of its square root Γ^{1/2}.
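The comparison in Remark 3.3 can be made concrete by evaluating the two rate exponents for some hypothetical parameter values (the values p = 2, s = 0, a = 1 below are ours, chosen only for illustration):

```python
def rate_exponent(p, s, a_eff):
    """Exponent r in the optimal rate n^{-r} = n^{-2(p-s)/[2(p+a_eff)+1]},
    where a_eff is the effective degree of ill-posedness: a in the
    functional linear model, 2a in the indirect regression model."""
    return 2 * (p - s) / (2 * (p + a_eff) + 1)

p, s, a = 2, 0, 1                         # hypothetical illustrative values
functional = rate_exponent(p, s, a)       # degree of ill-posedness a
indirect = rate_exponent(p, s, 2 * a)     # degree of ill-posedness 2a
# The functional linear model attains the larger exponent, i.e. the
# faster rate: only Γ^{1/2}, not Γ, has to be inverted.
```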
This, roughly speaking, may be seen as a multiplication of the normal equation YX = ⟨β, X⟩X + Xε by the inverse of Γ^{1/2}. Remarking that Γ is also the covariance operator associated with the error term εX, the multiplication by the inverse of Γ^{1/2} leads, roughly speaking, to white noise.

4 Concluding remarks and perspectives

We have proposed in this work a new kind of estimation procedure for the regression function and its derivatives in the functional linear model and proved that it can attain optimal rates of convergence. These estimators depend on two parameters which play the role of smoothing parameters: the dimension m of the projection space and the threshold value γ. Building data-driven rules that permit choosing the values of these parameters automatically is certainly a topic that deserves further attention, and one promising direction is to adapt the selection techniques proposed in Efromovich and Koltchinskii (2001), Goldenshluger and Pereverzev (2000) and Tsybakov (2000). Another point of interest is to extend the thresholding approach in order to consider different thresholding rules for different coordinates in the considered basis. This could lead, for instance with wavelet bases, to estimators that would adapt to sparseness as well as varying regularity of the regression function.

A Appendix: Proofs

A.1 Proofs of Section 2

We begin by defining and recalling notation to be used in the proofs of this section. Given m ≥ 0, a Galerkin solution of g = Γβ is denoted by β_m ∈ Ψ_m (see equation (1.7)).
Furthermore, we use the notations
  β̃_m := [β̃_m]^t_m [ψ]_m(·) with [β̃_m]_m := [β_m]_m 1{‖[Γ̂]^{−1}_m‖ ≤ γ},
  [Γ̂]_m = (1/n) Σ_{i=1}^n [X_i]_m [X_i]^t_m,   [X̃_i]_m := [Γ]^{−1/2}_m [X_i]_m,
  [Γ̃]_m := (1/n) Σ_{i=1}^n [X̃_i]_m [X̃_i]^t_m,   [Ξ_n]_m := [Γ̃]_m − I_m,
  [T_n]_m := (1/n) Σ_{i=1}^n ⟨X_i, β − β_m⟩ [X_i]_m,   [W_n]_m := (σ/n) Σ_{i=1}^n ε_i [X_i]_m,   (A.1)
where [ĝ]_m − [Γ̂]_m [β_m]_m = [T_n]_m + [W_n]_m with E[T_n]_m = [Γ(β − β_m)]_m = 0 and E[W_n]_m = 0, E[Γ̂]_m = [Γ]_m, [Γ̃]_m = [Γ]^{−1/2}_m [Γ̂]_m [Γ]^{−1/2}_m and hence E[Ξ_n]_m ≡ 0. Moreover, let us introduce the events
  Ω := {‖[Γ̂]^{−1}_m‖ ≤ γ},   Ω_{1/2} := {‖[Ξ_n]_m‖ ≤ 1/2},
  Ω^c := {‖[Γ̂]^{−1}_m‖ > γ},   Ω^c_{1/2} := {‖[Ξ_n]_m‖ > 1/2}.   (A.2)
Observe that Ω_{1/2} ⊂ Ω in case γ ≥ 2‖[Γ]^{−1}_m‖. Indeed, if ‖[Ξ_n]_m‖ ≤ 1/2, then the identity [Γ̂]_m = [Γ]^{1/2}_m {I + [Ξ_n]_m} [Γ]^{1/2}_m implies, by the usual Neumann series argument, that ‖[Γ̂]^{−1}_m‖ ≤ 2‖[Γ]^{−1}_m‖. Thereby, if γ ≥ 2‖[Γ]^{−1}_m‖, then we have Ω_{1/2} ⊂ Ω. These results will be used below without further reference. At the end of this section we prove the two technical Lemmas A.1 and A.2, which are used in the following proofs.

Proof of the consistency.

Proof of Proposition 2.1. The proof is based on the decomposition
  E‖β̂ − β‖²_ω ≤ 2 {E‖β̂ − β̃_m‖²_ω + E‖β̃_m − β‖²_ω}.   (A.3)
Since γ ≥ 2‖[Γ]^{−1}_m‖, it follows that Ω^c ⊂ Ω^c_{1/2} and hence
  E‖β̃_m − β‖²_ω ≤ 2 {‖β_m − β‖²_ω + ‖β_m‖²_ω P(Ω^c_{1/2})}.
(A.4)

On the other hand, we show below, for some constant C > 0, the following bound:
  E‖β̂ − β̃_m‖²_ω ≤ C ‖[Diag(ω)]^{1/2}_m [Γ]^{−1/2}_m‖² (m/n) η {σ² + ‖β − β_m‖² E‖X‖²} {1 + γ² (m²/n) η^{−1/2} (P(Ω^c_{1/2}))^{1/2} ‖[Γ]_m‖²},   (A.5)
where, by applying Markov's inequality, (A.12) in Lemma A.1 implies P(Ω^c_{1/2}) ≤ C η m²/n for some C > 0. Moreover, ‖[Γ]_m‖² ≤ ‖Γ‖² and ‖[Diag(ω)]^{1/2}_m [Γ]^{−1/2}_m‖² ≤ γ sup_{1≤j≤m} {ω_j}, since γ ≥ 2‖[Γ]^{−1}_m‖, which by combination of (A.4) and (A.5) leads to the estimate
  E‖β̂ − β‖²_ω ≤ C {‖β_m − β‖²_ω + ‖β_m‖²_ω (m²/n) η + γ sup_{1≤j≤m} {ω_j} (m/n) η {σ² + ‖β − β_m‖² E‖X‖²} {1 + γ² (m³/n^{3/2}) ‖Γ‖²}}   (A.6)
for some C > 0. Furthermore, for each β ∈ W_ω we have ‖β − β_m‖_ω = o(1) as m → ∞, which can be seen as follows. Since ‖Π^⊥_m β‖ = o(1) and ‖Π^⊥_m β‖_ω = o(1) as m → ∞ by Lebesgue's dominated convergence theorem, the assertion follows from the identity [Π_m β − β_m]_m = −[Γ]^{−1}_m [Γ Π^⊥_m β]_m together with ‖Π_m β − β_m‖_ω ≤ ‖Π^⊥_m β‖_ω sup_m ‖Γ^{−1}_m Π_m Γ Π^⊥_m‖_ω = O(‖Π^⊥_m β‖_ω). Consequently, the conditions on m and γ ensure the convergence to zero, as n → ∞, of the bound given in (A.6), which proves the result.

Proof of (A.5). From the identity [ĝ]_m − [Γ̂]_m [β_m]_m = [T_n]_m + [W_n]_m it follows that
  E‖β̂ − β̃_m‖²_ω = E‖[Diag(ω)]^{1/2}_m {[Γ]^{−1}_m + [Γ̂]^{−1}_m ([Γ]_m − [Γ̂]_m) [Γ]^{−1}_m} {[T_n]_m + [W_n]_m}‖² 1_Ω.
Since 2‖[Γ]^{−1}_m‖ ≤ γ we have Ω_{1/2} ⊂ Ω, and hence, by using ‖[Γ̂]^{−1}_m‖² 1_Ω ≤ γ², we obtain
  E‖β̂ − β̃_m‖²_ω ≤ 3 ‖[Diag(ω)]^{1/2}_m [Γ]^{−1/2}_m‖² { E‖[Γ]^{−1/2}_m {[T_n]_m + [W_n]_m}‖²
    + γ² ‖[Γ]_m‖² (E‖[Ξ_n]_m‖^8)^{1/4} (E‖[Γ]^{−1/2}_m {[T_n]_m + [W_n]_m}‖^8)^{1/4} (P(Ω^c_{1/2}))^{1/2}
    + E‖{I + [Ξ_n]_m}^{−1}‖² ‖[Ξ_n]_m‖² ‖[Γ]^{−1/2}_m {[T_n]_m + [W_n]_m}‖² 1_{Ω_{1/2}} }.
From (A.10)–(A.12) in Lemma A.1, together with ‖{I + [Ξ_n]_m}^{−1}‖ ‖[Ξ_n]_m‖ 1_{Ω_{1/2}} ≤ 1, (A.5) then follows, which completes the proof.

Proof of Corollary 2.2. The link condition Γ ∈ N^d_υ implies 2‖[Γ]^{−1}_m‖ ≤ 8d³/υ_m = γ, ‖[Diag(ω)]^{1/2}_m [Γ]^{−1/2}_m‖² ≤ 4d³ sup_{1≤j≤m} {ω_j/υ_j} and ‖[Γ]_m‖² ≤ d², by using the estimates (A.16), (A.17) and (A.18) in Lemma A.3, respectively. Therefore, by combination of (A.4) and (A.5) in the proof of Proposition 2.1 we obtain
  E‖β̂ − β‖²_ω ≤ C {‖β_m − β‖²_ω + ‖β_m‖²_ω (m²/n) η + d³ sup_{1≤j≤m} {ω_j/υ_j} (m/n) η {σ² + ‖β − β_m‖² E‖X‖²} {1 + m³/(n^{3/2} υ²_m) d^8}}   (A.7)
for some C > 0. By using the identity [Π_m β − β_m]_m = −[Γ]^{−1}_m [Γ Π^⊥_m β]_m and the estimate (A.23) in the proof of Lemma A.3 with b ≡ ω, the link condition Γ ∈ N^d_υ further implies that ‖Γ^{−1}_m Π_m Γ Π^⊥_m‖²_ω = sup_{‖β‖_ω=1} ‖Π_m β − β_m‖²_ω ≤ 2(1 + d⁴) for all m ∈ N. Therefore we have ‖β − β_m‖_ω = o(1) as m → ∞ for each β ∈ W_ω. Consequently, the conditions on m and γ ensure the convergence to zero, as n → ∞, of the bound given in (A.7), which proves the result.

Proof of the lower bound.

Proof of Theorem 2.3. Let X_i, i ∈ N, be i.i.d. copies of X with associated covariance operator Γ belonging to N^d_υ. Then for each j, [X_i]_j is centered and has variance E[X]²_j = ⟨Γψ_j, ψ_j⟩ ≤ υ_j d.
This result will be used below without further reference. Consider independent error terms ε_i ∼ N(0, 1), i ∈ N, which are independent of the random functions {X_i}. Let θ ∈ {−1, 1}^{m_*}, where m_* := m_*(n) ∈ N satisfies (2.4) for some △ > 1. Define an m_*-vector u of coefficients u_j satisfying (A.14) in Lemma A.2. For each θ we consider a slope function β_θ := Σ_{j=1}^{m_*} θ_j u_j ψ_j ∈ W^ρ_p, by using (A.15) in Lemma A.2. Consequently, for each θ the random variables (Y_i, X_i) with Y_i := ∫_0^1 β_θ(s) X_i(s) ds + σ ε_i, i = 1, …, n, form a sample of the model (1.1), and we denote their joint distribution by P_θ. Furthermore, for j = 1, …, m_* and each θ we introduce θ^{(j)} by θ^{(j)}_l = θ_l for j ≠ l and θ^{(j)}_j = −θ_j. As under P_θ the conditional distribution of Y_i given X_i is Gaussian with mean Σ_{l=1}^{m_*} θ_l u_l [X_i]_l and variance σ², it is easily seen that the log-likelihood of P_{θ^{(j)}} with respect to P_θ is given by
  log(dP_{θ^{(j)}}/dP_θ) = −(1/σ²) Σ_{i=1}^n {Y_i − Σ_{l=1}^{m_*} θ_l u_l [X_i]_l} θ_j u_j [X_i]_j − (2/σ²) Σ_{i=1}^n u²_j [X_i]²_j,
and its expectation with respect to P_θ satisfies E_{P_θ}[log(dP_{θ^{(j)}}/dP_θ)] ≥ −2nd u²_j υ_j/σ². In terms of the Kullback–Leibler divergence this means KL(P_{θ^{(j)}}, P_θ) ≤ 2nd u²_j υ_j/σ². Since the Hellinger distance H(P_{θ^{(j)}}, P_θ) satisfies H²(P_{θ^{(j)}}, P_θ) ≤ KL(P_{θ^{(j)}}, P_θ), it follows from (A.15) in Lemma A.2 that
  H²(P_{θ^{(j)}}, P_θ) ≤ (2nd/σ²) u²_j υ_j ≤ 1,   j = 1, …, m_*.
(A.8)

Consider the Hellinger affinity ρ(P_{θ^{(j)}}, P_θ) = ∫ √(dP_{θ^{(j)}} dP_θ); then we obtain for any estimator β̃ of β that
  ρ(P_{θ^{(j)}}, P_θ) ≤ ∫ (|⟨β̃ − β_{θ^{(j)}}, ψ_j⟩| / |⟨β_θ − β_{θ^{(j)}}, ψ_j⟩|) √(dP_{θ^{(j)}} dP_θ) + ∫ (|⟨β̃ − β_θ, ψ_j⟩| / |⟨β_θ − β_{θ^{(j)}}, ψ_j⟩|) √(dP_{θ^{(j)}} dP_θ)
  ≤ ( ∫ (|⟨β̃ − β_{θ^{(j)}}, ψ_j⟩|² / |⟨β_θ − β_{θ^{(j)}}, ψ_j⟩|²) dP_{θ^{(j)}} )^{1/2} + ( ∫ (|⟨β̃ − β_θ, ψ_j⟩|² / |⟨β_θ − β_{θ^{(j)}}, ψ_j⟩|²) dP_θ )^{1/2}.   (A.9)
Due to the identity ρ(P_{θ^{(j)}}, P_θ) = 1 − (1/2) H²(P_{θ^{(j)}}, P_θ), combining (A.8) with (A.9) yields
  E_{θ^{(j)}} |⟨β̃ − β_{θ^{(j)}}, ψ_j⟩|² + E_θ |⟨β̃ − β_θ, ψ_j⟩|² ≥ (1/2) u²_j,   j = 1, …, m_*.
From this we conclude, for each estimator β̃, that
  sup_{β∈W^ρ_b} E‖β̃ − β‖²_ω ≥ sup_{θ∈{−1,1}^{m_*}} E_θ ‖β̃ − β_θ‖²_ω ≥ (1/2^{m_*}) Σ_{θ∈{−1,1}^{m_*}} Σ_{j=1}^{m_*} ω_j E_θ |⟨β̃ − β_θ, ψ_j⟩|²
  = (1/2^{m_*}) Σ_{θ∈{−1,1}^{m_*}} Σ_{j=1}^{m_*} (ω_j/2) { E_θ |⟨β̃ − β_θ, ψ_j⟩|² + E_{θ^{(j)}} |⟨β̃ − β_{θ^{(j)}}, ψ_j⟩|² }
  ≥ (1/4) Σ_{j=1}^{m_*} u²_j ω_j ≥ (1/4) min{σ²/(2d), ρ/△} δ^*_n/△,
where the last inequality follows from (A.15) in Lemma A.2, which completes the proof.

Proof of the upper bound.

Proof of Theorem 2.4. Our proof starts with the observation that the link condition Γ ∈ N^d_υ implies 2‖[Γ]^{−1}_m‖ ≤ 8d³/υ_m, ‖[Diag(ω)]^{1/2}_m [Γ]^{−1/2}_m‖² ≤ 4d³ sup_{1≤j≤m} {ω_j/υ_j} and ‖[Γ]_m‖² ≤ d², by using the estimates (A.16), (A.17) and (A.18) in Lemma A.3, respectively. Moreover, for all X ∈ X^{4k}_η, by applying Markov's inequality, (A.12) in Lemma A.1 yields P(Ω^c_{1/2}) ≤ C η m^{2k}/n^k for some C > 0. Furthermore, by using the definition of m_*, the condition m = m_* implies 1/υ_{m_*} ≤ n△/b_{m_*} and hence γ = n max(1, 8d³△/b_{m_*}) ≥ 2‖[Γ]^{−1}_{m_*}‖.
Therefore, from (A.4) and (A.5) in the proof of Proposition 2.1 it follows that
  E‖β̂ − β‖²_ω ≤ C {‖β_{m_*} − β‖²_ω + ‖β_{m_*}‖²_ω (m_*^{2k}/n^k) η + d³ sup_{1≤j≤m_*} {ω_j/υ_j} (m_*/n) η {σ² + ‖β − β_{m_*}‖² E‖X‖²} {1 + m_*^{2+k}/n^{k/2−1} d^8 △²}}
for some C > 0. Consequently, the definition of δ^*_n, by using (A.19) in Lemma A.3, i.e., ‖β − β_{m_*}‖²_ω ≤ 10 d⁴ ρ δ^*_n, and E‖X‖² ≤ dΛ, implies
  E‖β̂ − β‖²_ω ≤ C δ^*_n η d^{16} △² {σ² + ρΛ} {1 + m_*^{2k}/(δ^*_n n^k) + (m_*/(δ^*_n n)) sup_{1≤j≤m_*} {ω_j/υ_j}} {1 + m_*^{2+k}/n^{k/2−1}}.
Thereby, the result follows from condition (2.6), which ensures that the factors in braces are bounded as n → ∞, completing the proof.

Technical assertions. The following two lemmas gather technical results used in the proofs of Proposition 2.1, Theorem 2.3 and Theorem 2.4.

Lemma A.1. Suppose X ∈ X^{4k}_η and ε ∈ E^{4k}_η, k ∈ N. Then for some constant C > 0 depending only on k we have
  E‖[Γ]^{−1/2}_m W_{n,m}‖^{2k} ≤ C (m^k/n^k) σ^{2k} η,   (A.10)
  E‖[Γ]^{−1/2}_m T_{n,m}‖^{2k} ≤ C (m^k/n^k) ‖β − β_m‖^{2k} (E‖X‖²)^k η,   (A.11)
  E‖Ξ_{n,m}‖^{2k} ≤ C η m^{2k}/n^k,   (A.12)
  E‖{[Γ]_m − [Γ̂]_m} [Γ]^{−1/2}_m‖^{2k} ≤ C η (m^{2k}/n^k) (E‖X‖²)^k.   (A.13)

Proof. Let W̃ := [Γ]^{−1/2}_m W_{n,m}; then E‖[Γ]^{−1/2}_m W_{n,m}‖^{2k} ≤ m^{k−1} Σ_{j=1}^m E W̃_j^{2k}, where W̃_j = (1/n) Σ_{i=1}^n σ ε_i [X̃_i]_j. The random variables (ε_i [X̃_i]_j), 1 ≤ i ≤ n, are independent and identically distributed (i.i.d.) with mean zero. Since X ∈ X^{4k}_η and ε ∈ E^{4k}_η, (A.10) follows from Theorem 2.10 in Petrov (1995), that is, E W̃_j^{2k} ≤ C n^{−k} σ^{2k} E|ε [X̃]_j|^{2k} ≤ C n^{−k} σ^{2k} η for some constant C > 0 depending only on k.

Proof of (A.11). Due to E⟨β − β_m, X⟩[X]_m = [Γ(β − β_m)]_m = 0, the random variables (⟨β − β_m, X_i⟩[X_i]_m), 1 ≤ i ≤ n, are i.i.d.
with mean zero. Furthermore, we claim that X ∈ X^{4k}_η implies E|⟨β − β_m, X⟩[X̃]_j|^{2k} ≤ C η ‖β − β_m‖^{2k} (E‖X‖²)^k for each j ∈ N. Then the estimate (A.11) follows in analogy to (A.10). Indeed, we have E|[X̃]_j|^{4k} ≤ η and
  E|⟨β − β_m, X⟩|^{4k} ≤ ‖β − β_m‖^{4k} Σ_{j_1} [Γ]_{j_1,j_1} ⋯ Σ_{j_{2k}} [Γ]_{j_{2k},j_{2k}} E Π_{l=1}^{2k} |[X]_{j_l}/[Γ]^{1/2}_{j_l,j_l}|² ≤ ‖β − β_m‖^{4k} (E‖X‖²)^{2k} η,
which together imply the assertion by using the Cauchy–Schwarz inequality.

Proof of (A.12). From the identity (Ξ_{n,m})_{j,l} = (1/n) Σ_{i=1}^n {[X̃_i]_j [X̃_i]_l − δ_{jl}}, with δ_{jl} = 1 if j = l and zero otherwise, we conclude E(Ξ_{n,m})_{j,l}^{2k} ≤ C′ n^{−k} E|[X̃]_j [X̃]_l − δ_{jl}|^{2k}. Thus X ∈ X^{4k}_η implies E‖Ξ_{n,m}‖^{2k} ≤ m^{2(k−1)} Σ_{j,l} E(Ξ_{n,m})_{j,l}^{2k} ≤ C m^{2k} n^{−k} η. The estimate (A.13) follows from (A.12) by using the identity {[Γ]_m − [Γ̂]_m} [Γ]^{−1/2}_m = [Γ]^{1/2}_m Ξ_{n,m}, which completes the proof.

Lemma A.2. Let m_* ∈ N and δ^*_n be chosen such that (2.4) is satisfied for some △ > 1. Consider an (infinite) vector u with components u_j satisfying
  u²_j = ζ/(n υ_j), j ∈ N, with ζ := min{σ²/(2d), ρ/△};   (A.14)
then under Assumption 2.1 we have for all j ∈ N
  (2nd/σ²) u²_j υ_j ≤ 1,   Σ_{j=1}^{m_*} u²_j b_j ≤ ρ,   and   Σ_{j=1}^{m_*} u²_j ω_j ≥ min{σ²/(2d), ρ/△} δ^*_n/△.   (A.15)

Proof. The first inequality in (A.15) follows trivially from the definition of ζ, while the definition of m_* given in (2.4), together with Assumption 2.1, i.e., (b_j/ω_j) is non-decreasing, implies the second, i.e., Σ_{j=1}^{m_*} u²_j b_j ≤ ζ (b_{m_*}/ω_{m_*}) Σ_{j=1}^{m_*} ω_j/(n υ_j) ≤ ζ△ ≤ ρ. To deduce the third estimate from the definitions of m_* and δ^*_n, observe that Σ_{j=1}^{m_*} u²_j ω_j = δ^*_n ζ (b_{m_*}/ω_{m_*}) Σ_{j=1}^{m_*} ω_j/(n υ_j) ≥ δ^*_n ζ/△, which proves the lemma.

Lemma A.3. Suppose the sequences b, ω and υ satisfy Assumption 2.1.
Let Γ ∈ N^d_υ. Then
  sup_{m∈N} {υ_m ‖[Γ]^{−1/2}_m‖²} ≤ {2d²(2d⁴ + 3)}^{1/2} ≤ 4d³,   (A.16)
  sup_{m∈N} {‖[Diag(υ)]^{1/2}_m [Γ]^{−1/2}_m‖²} ≤ {2d²(2d⁴ + 3)}^{1/2} ≤ 4d³,   (A.17)
  sup_{m∈N} {‖[Diag(υ)]^{−1/2}_m [Γ]^{1/2}_m‖²} ≤ d.   (A.18)
If in addition β_m denotes a Galerkin solution of g = Γβ with β ∈ W^ρ_b, then
  sup_{m∈N} {(b_m/ω_m) ‖β − β_m‖²_ω} ≤ 2(2d⁴ + 3) ρ ≤ 10 d⁴ ρ.   (A.19)

Proof. We start with the observation that the link condition Γ ∈ N^d_υ implies that Γ is strictly positive and that for all |s| ≤ 1, by using the inequality of Heinz (1951),
  d^{−2|s|} ‖f‖²_{υ^{2s}} ≤ ‖Γ^s f‖² ≤ d^{2|s|} ‖f‖²_{υ^{2s}}.   (A.20)
Consider g ∈ Ψ_m. Then (A.20) implies β := Γ^{−1} g ∈ L²[0, 1], by using that ‖g‖_{υ^{−2}} = ‖[Diag(υ)]^{−1}_m [g]_m‖ < ∞. Furthermore, β_m = [Γ]^{−1}_m [g]_m is the unique Galerkin solution of (1.7). By using successively the first inequality of (A.20), the Galerkin condition (1.7) and the second inequality of (A.20), we obtain
  ‖β − β_m‖²_{υ²} ≤ d² ‖Γ(β − β_m)‖² ≤ d² ‖Γ(β − Π_m β)‖² ≤ d⁴ ‖β − Π_m β‖²_{υ²}.   (A.21)
Since (υ_j) is monotonically decreasing, it follows that ‖β − Π_m β‖²_{υ²} ≤ υ²_m ‖β‖², and hence, by using (A.20) with s = −1, we have ‖β − Π_m β‖²_{υ²} ≤ d² υ²_m ‖g‖²_{υ^{−2}}. Combining the last estimate with (A.21) we obtain
  ‖β_m − Π_m β‖²_{υ²} ≤ 2 {‖β − β_m‖²_{υ²} + ‖β − Π_m β‖²_{υ²}} ≤ 2d²(d⁴ + 1) υ²_m ‖g‖²_{υ^{−2}},
which together with ‖f‖² ≤ υ^{−2}_m ‖f‖²_{υ²} for all f ∈ Ψ_m leads to
  ‖β_m − Π_m β‖² ≤ υ^{−2}_m ‖β_m − Π_m β‖²_{υ²} ≤ 2d²(d⁴ + 1) ‖g‖²_{υ^{−2}}.
By using the last estimate together with ‖g‖_{υ^{−2}} = ‖[Diag(υ)]^{−1}_m [g]_m‖ we conclude that
  ‖[Γ]^{−1}_m [g]_m‖² = ‖β_m‖² ≤ 2 {‖β_m − Π_m β‖² + ‖Π_m β‖²} ≤ 2d²(2d⁴ + 3) ‖[Diag(υ)]^{−1}_m [g]_m‖², for all g ∈ Ψ_m.
(A.22)

Then from (A.22), by using the inequality of Heinz (1951), it follows for all g ∈ Ψ_m that
  ‖[Γ]^{−1/2}_m [g]_m‖² ≤ {2d²(2d⁴ + 3)}^{1/2} ‖[Diag(υ)]^{−1/2}_m [g]_m‖²,
which, together with ‖[Diag(υ)]^{−1}_m‖ = υ^{−1}_m, implies the estimate (A.16), and furthermore, by replacing [g]_m by [Diag(υ)]^{1/2}_m [g]_m, the estimate (A.17), that is,
  ‖[Γ]^{−1/2}_m [Diag(υ)]^{1/2}_m [g]_m‖² ≤ {2d²(2d⁴ + 3)}^{1/2} ‖[g]_m‖², for all g ∈ Ψ_m.

Proof of (A.18). By using the second inequality of (A.20) together with ‖Π_m‖ = 1 we obtain
  ‖[Γ]_m [g]_m‖² = ‖Π_m Γ g‖² ≤ ‖Γ g‖² ≤ d² ‖g‖²_{υ²} = d² ‖[Diag(υ)]_m [g]_m‖², for all g ∈ Ψ_m,
and hence the inequality of Heinz (1951) implies ‖[Γ]^{1/2}_m [g]_m‖² ≤ d ‖[Diag(υ)]^{1/2}_m [g]_m‖² for all g ∈ Ψ_m. Thereby, (A.18) follows by replacing [g]_m by [Diag(υ)]^{−1/2}_m [g]_m, that is, ‖[Γ]^{1/2}_m [Diag(υ)]^{−1/2}_m [g]_m‖² ≤ d ‖[g]_m‖² for all g ∈ Ψ_m.

Proof of (A.19). Let β ∈ W^ρ_b. Consider the decomposition ‖β − β_m‖²_ω ≤ 2 {‖β − Π_m β‖²_ω + ‖Π_m β − β_m‖²_ω}. Since (ω_j/b_j) is non-increasing, it follows that ‖β − Π_m β‖²_ω ≤ (ω_m/b_m) ‖β‖²_b, while we show below that
  ‖Π_m β − β_m‖²_ω ≤ 2(1 + d⁴) (ω_m/b_m) ‖β‖²_b.   (A.23)
Consequently, by combination of these two bounds, the condition β ∈ W^ρ_b, i.e., ‖β‖²_b ≤ ρ, implies (A.19). From (A.21) it follows that ‖β − β_m‖²_{υ²} ≤ d⁴ ‖β − Π_m β‖²_{υ²} ≤ d⁴ (υ²_m/b_m) ‖β‖²_b, because (υ²_j/b_j) is non-increasing, and hence
  ‖Π_m β − β_m‖²_{υ²} ≤ 2 {‖β − β_m‖²_{υ²} + ‖β − Π_m β‖²_{υ²}} ≤ 2(1 + d⁴) (υ²_m/b_m) ‖β‖²_b.   (A.24)
Furthermore, ‖Π_m β − β_m‖²_ω ≤ ω_m υ^{−2}_m ‖Π_m β − β_m‖²_{υ²}, since (ω_j/υ²_j) is non-decreasing. The last estimate and (A.24) now imply (A.23), which completes the proof.

A.2 Proofs of Section 3

The mean prediction error.

Proof of Proposition 3.1.
Since Γ ∈ N^d_υ, d ≥ 1, it follows, by using the inequality of Heinz (1951), that E‖β̃ − β‖²_Γ ≍_d E‖β̃ − β‖²_υ. Therefore, we can apply the general results by considering the W_ω-risk with ω = υ as a measure of the performance of an estimator of β. Furthermore, in case (i) the definitions of b^p_j and υ_j together imply (b^p_{m_*}/ω_{m_*}) Σ_{j=1}^{m_*} ω_j/υ_j = m_*^{2a+2p+1}. It follows that the condition on m_* and δ^*_n given in (2.4) of Theorem 2.3 can be rewritten as m_* ∼ n^{1/(2p+2a+1)} and δ^*_n ∼ n^{−(2p+2a)/(2p+2a+1)}. On the other hand, in case (ii), (b^p_{m_*}/ω_{m_*}) Σ_{j=1}^{m_*} ω_j/υ_j = m_*^{2p+1} exp(m_*^{2a}) implies that the condition on m_* and δ^*_n reads m_* ∼ (log n)^{1/(2a)} and δ^*_n ∼ n^{−1} (log n)^{1/(2a)}. Consequently, the lower bounds in Proposition 3.1 follow by applying Theorem 2.3.

Proof of Proposition 3.2. Note that for sufficiently large n the condition on γ in Theorem 2.4 reads γ = n, because (b^p_j) is increasing. Furthermore, it is easily seen that the additional condition (2.6) is satisfied in the exponential case, and for all k > 2 + 8/(2p + 2a − 1) also in the polynomial case. Finally, since in both cases the condition on m ensures that m ∼ m_* (see the proof of Proposition 3.1), the result follows from Theorem 2.4.

The estimation of derivatives.

Proof of Proposition 3.3. Since for each 0 ≤ s ≤ p we have E‖β̃^{(s)} − β^{(s)}‖² ∼ E‖β̃ − β‖²_{b^s}, we can again apply the general results by considering the W_ω-risk with ω = b^s. In case (i), the well-known approximation Σ_{j=1}^m j^r ∼ m^{r+1} for r > 0, together with the definitions of b^p_j and υ_j, implies (b^p_{m_*}/ω_{m_*}) Σ_{j=1}^{m_*} ω_j/υ_j ∼ m_*^{2a+2p+1}. It follows that the condition on m_* and δ^*_n given in (2.4) of Theorem 2.3 reads m_* ∼ n^{1/(2p+2a+1)} and δ^*_n ∼ n^{−(2p−2s)/(2p+2a+1)}.
On the other hand, in case (ii), by applying Laplace's method (cf. Chapter 3.7 in Olver (1974)), the definitions of b_j and υ_j imply (b^p_{m_*}/ω_{m_*}) Σ_{j=1}^{m_*} ω_j/υ_j ∼ m_*^{2p} exp(m_*^{2a}), so that the condition on m_* and δ^*_n can be rewritten as m_* ∼ (log n)^{1/(2a)} and δ^*_n ∼ (log n)^{−(p−s)/a}. Consequently, the lower bounds in Proposition 3.3 follow by applying Theorem 2.3.

Proof of Proposition 3.4. The proof is analogous to that of Proposition 3.2 and we omit the details.

References

F. Bauer, S. V. Pereverzev, and L. Rosasco. On regularization algorithms in learning theory. J. Complexity, 23:52–72, 2007.
P. Besse, H. Cardot, and D. Stephenson. Autoregressive forecasting of some functional climatic variations. Scand. J. Statist., 27:673–687, 2000.
D. Bosq. Linear Processes in Function Spaces, volume 149 of Lecture Notes in Statistics. Springer-Verlag, 2000.
H. Cardot, F. Ferraty, and P. Sarda. Functional linear model. Statistics & Probability Letters, 45:11–22, 1999.
H. Cardot, F. Ferraty, and P. Sarda. Spline estimators for the functional linear model. Statistica Sinica, 13:571–591, 2003.
H. Cardot, A. Mas, and P. Sarda. CLT in functional linear regression models. Prob. Theory Rel. Fields, 138:325–361, 2007.
X. Chen and M. Reiß. On rate optimality for ill-posed inverse problems in econometrics. Technical report, Yale University, 2008.
C. Crambes, A. Kneip, and P. Sarda. Smoothing splines estimators for functional linear regression. Annals of Statistics, 2008. To appear.
J. Dauxois, A. Pousse, and Y. Romain. Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. Journal of Multivariate Analysis, 12:136–154, 1982.
S. Efromovich and V. Koltchinskii. On inverse problems with unknown operators.
IEEE Transactions on Information Theory, 47(7):2876–2894, 2001.
H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer Academic, Dordrecht, 2000.
F. Ferraty and P. Vieu. Nonparametric Functional Data Analysis: Methods, Theory, Applications and Implementations. Springer-Verlag, London, 2006.
I. Frank and J. Friedman. A statistical view of some chemometrics regression tools. Technometrics, 35:109–148, 1993.
A. Goldenshluger and S. V. Pereverzev. Adaptive estimation of linear functionals in Hilbert scales from indirect white noise observations. Prob. Theory Rel. Fields, 118:169–186, 2000.
P. Hall and J. L. Horowitz. Methodology and convergence rates for functional linear regression. Annals of Statistics, 35(1):70–91, 2007.
E. Heinz. Beiträge zur Störungstheorie der Spektralzerlegung. Mathematische Annalen, 123:415–438, 1951.
M. Hoffmann and M. Reiß. Nonlinear estimation for linear inverse problems with error in the operator. Annals of Statistics, 36(1):310–336, 2008.
J. Johannes. Nonparametric estimation in functional linear models with second order stationary regressors. Submitted, 2008.
J. Johannes and R. Schenk. Rate optimal estimation of linear functionals in functional linear models. Technical report, University of Heidelberg, 2008.
A. P. Korostelev and A. B. Tsybakov. Minimax Theory of Image Reconstruction, volume 82 of Lecture Notes in Statistics. Springer-Verlag, 1993.
B. A. Mair. Tikhonov regularization for finitely and infinitely smoothing operators. SIAM Journal on Mathematical Analysis, 25:135–147, 1994.
B. A. Mair and F. H. Ruymgaart. Statistical inverse estimation in Hilbert scales. SIAM Journal on Applied Mathematics, 56(5):1424–1444, 1996.
B. D. Marx and P. H. Eilers. Generalized linear regression on sampled signals and curves: a P-spline approach.
Technometrics, 41:1–13, 1999.
H.-G. Müller and U. Stadtmüller. Generalized functional linear models. Ann. Stat., 33:774–805, 2005.
M. Nair, S. V. Pereverzev, and U. Tautenhahn. Regularization in Hilbert scales under general smoothing conditions. Inverse Problems, 21:1851–1869, 2005.
F. Natterer. Regularisierung schlecht gestellter Probleme durch Projektionsverfahren. Numer. Math., 28:329–341, 1977.
F. Natterer. Error bounds for Tikhonov regularization in Hilbert scales. Applicable Analysis, 18:29–37, 1984.
A. Neubauer. When do Sobolev spaces form a Hilbert scale? Proc. Amer. Math. Soc., 103(2):557–562, 1988a.
A. Neubauer. An a posteriori parameter choice for Tikhonov regularization in Hilbert scales leading to optimal convergence rates. SIAM J. Numer. Anal., 25(6):1313–1326, 1988b.
F. Olver. Asymptotics and Special Functions. Academic Press, New York, 1974.
V. V. Petrov. Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Studies in Probability, vol. 4. Clarendon Press, Oxford, 1995.
C. Preda and G. Saporta. PLS regression on a stochastic process. Computational Statistics & Data Analysis, 48:149–158, 2005.
J. Ramsay and B. Silverman. Functional Data Analysis. Springer, New York, second edition, 2005.
J. O. Ramsay and C. J. Dalzell. Some tools for functional data analysis. Journal of the Royal Statistical Society, Series B, 53:539–572, 1991.
S. Smale and D. Zhou. Learning theory estimates via integral operators and their approximations. Constr. Approx., 26:153–172, 2007.
A. B. Tsybakov. Introduction à l'estimation non-paramétrique (Introduction to Nonparametric Estimation). Mathématiques & Applications (Paris), 41. Springer, Paris, 2004.
A. B. Tsybakov. On the best rate of adaptive estimation in some inverse problems.
Comptes Rendus de l'Académie des Sciences, Série I, Mathématiques, 330:835–840, 2000.