Supervised functional classification: A theoretical remark and some comparisons

Amparo Baíllo* and Antonio Cuevas†
Departamento de Análisis Económico: Economía Cuantitativa, Univ. Autónoma de Madrid, Spain
Departamento de Matemáticas, Univ. Autónoma de Madrid, Spain

Abstract

The problem of supervised classification (or discrimination) with functional data is considered, with a special interest in the popular k-nearest neighbors (k-NN) classifier. First, relying on a recent result by Cérou and Guyader (2006), we prove the consistency of the k-NN classifier for functional data whose distribution belongs to a broad family of Gaussian processes with triangular covariance functions. Second, on a more practical side, we check the behavior of the k-NN method when compared with a few other functional classifiers. This is carried out through a small simulation study and the analysis of several real functional data sets. While no global "uniform" winner emerges from such comparisons, the overall performance of the k-NN method, together with its sound intuitive motivation and relative simplicity, suggests that it could represent a reasonable benchmark for the classification problem with functional data.

Key words and phrases. Supervised classification, functional data, projections method, nearest neighbors, discriminant analysis.

AMS 2000 subject classification. Primary 62G07; secondary 62G20.

* Corresponding author. Phone: +34 914978640, e-mail: amparo.baillo@uam.es
† The research of both authors was partially supported by Spanish grant MTM2007-66632 and the IV PRICIT program titled Modelización Matemática y Simulación Numérica en Ciencia y Tecnología (SIMUMAT).
1. Introduction

1.1 Some background on supervised classification

Supervised classification is the modern name for one of the oldest statistical problems in experimental science: to decide whether an individual, from which just a random measurement X (with values in a "feature space" F endowed with a metric D) is known, belongs either to the population P_0 or to P_1. For example, in a medical problem P_0 and P_1 could correspond to the groups of "healthy" and "ill" individuals, respectively. The decision must be taken from the information provided by a "training sample" X_n = {(X_i, Y_i), 1 ≤ i ≤ n}, where the X_i, i = 1, ..., n, are independent replications of X, measured on n randomly chosen individuals, and the Y_i are the corresponding values of an indicator variable which takes the value 0 or 1 according to the membership of the i-th individual in P_0 or P_1. Thus the mathematical problem is to find a "classifier" g_n(x) = g_n(x; X_n), with g_n : F → {0, 1}, that minimizes the classification error P{g_n(X) ≠ Y}. The term "supervised" refers to the fact that the individuals in the training sample are supposed to be correctly classified, typically using "external" non-statistical procedures, so that they provide a reliable basis for the assignment of the new observation.

This problem, also known as "statistical discrimination" or "pattern recognition", is at least 70 years old. Its origin goes back to the classical work by Fisher (1936) where, in the d-variate case F = R^d, a simple "linear classifier" g_n(x) = 1_{x : w'x + w_0 > 0} was introduced (1_A stands for the indicator function of a set A ⊂ F). A deep, insightful perspective on the supervised classification problem can be found in the book by Devroye et al. (1996). Other useful textbooks are Hand (1997) and Hastie et al. (2001). All of them focus on the standard multivariate case F = R^d.
It is not difficult to prove (e.g., Devroye et al., 1996, p. 11) that the optimal classification rule (often called the "Bayes rule") is

g*(x) = 1_{η(x) > 1/2},   (1)

where η(x) = E(Y | X = x). Of course, since η is unknown, the exact expression of this rule is usually unknown, and thus different procedures have been proposed in order to approximate it. In particular, it can be seen that Fisher's linear rule is optimal provided that the conditional distributions of X | Y = 0 and X | Y = 1 are both normal with identical covariance matrix. While these conditions look quite restrictive, and it is straightforward to construct problems where any linear rule has a poor performance, Fisher's classifier is still by far the most popular choice among users.

A simple nonparametric alternative is given by the k-nearest neighbors (k-NN) method, which is obtained by replacing the unknown regression function η(x) in (1) with the regression estimator

η_n(x) = (1/k) Σ_{i=1}^n 1_{X_i ∈ k(x)} Y_i,   (2)

where k = k_n is a given (integer) smoothing parameter and "X_i ∈ k(x)" means that X_i is one of the k nearest neighbors of x. More concretely, if the pairs (X_i, Y_i), 1 ≤ i ≤ n, are re-indexed as (X_(i), Y_(i)), 1 ≤ i ≤ n, so that the X_(i)'s are arranged in increasing distance from x, D(x, X_(1)) ≤ D(x, X_(2)) ≤ ... ≤ D(x, X_(n)), then k(x) = {X_(i), 1 ≤ i ≤ k}. This leads to the k-NN classifier g_n(x) = 1_{η_n(x) > 1/2}.

It is well known that, in addition to this simple classifier, several other alternative methods (kernel classifiers, neural networks, support vector machines, ...) have been developed and extensively analyzed in recent years. However, when used in practice with real data sets, the performance of Fisher's rule is often found to be very close to that of the best one among all the main alternative procedures.
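As an illustration of rule (2) on discretized curves, the following minimal sketch implements the k-NN classifier; the function name, the NumPy discretization and the two metrics offered are our own choices, not part of the original paper.

```python
import numpy as np

def knn_classify(x, X_train, Y_train, k, metric="sup"):
    """Classify one discretized curve x by the k-NN rule g_n(x) = 1{eta_n(x) > 1/2}.

    X_train: (n, N) array of n curves observed on a common grid of N nodes.
    Y_train: (n,) array of 0/1 labels.
    metric: "sup" for the supremum norm, "l2" for the (discretized) L2 norm.
    """
    diffs = X_train - x
    if metric == "sup":
        dists = np.max(np.abs(diffs), axis=1)
    else:
        dists = np.sqrt(np.mean(diffs ** 2, axis=1))
    nearest = np.argsort(dists)[:k]      # indices of the k nearest curves
    eta_n = Y_train[nearest].mean()      # k-NN regression estimate of eta(x)
    return 1 if eta_n > 0.5 else 0
```

With k = k_n chosen so that k_n → ∞ and k_n/n → 0, this is the rule whose consistency is discussed below.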
On these grounds, Hand (2006) has argued in a provocative paper about the "illusion of progress" in supervised classification techniques. The central idea is that the study of new classification rules often fails to take into account the structure of real data sets and tends to overlook the fact that, in spite of its theoretical limitations, Fisher's rule is quite satisfactory in many practical applications. This, together with its conceptual simplicity, explains its popularity over the years.

1.2 The purpose and structure of this paper

We are concerned here with the problem of (binary) supervised classification with functional data. That is, we consider the general framework indicated above, but we will assume throughout that the space (F, D) where the random elements X_i take values is a separable metric space of functions. For some theoretical results (Theorem 2) we will impose a more specific assumption by taking F as the space C[a, b] of real continuous functions defined on a closed finite interval [a, b], with the usual supremum norm ‖·‖_∞.

The study of discrimination techniques with functional data is not as developed as the corresponding finite-dimensional theory but, clearly, it is one of the most active research topics in the booming field of functional data analysis (FDA). Two well-known books including broad overviews of FDA with interesting examples are Ferraty and Vieu (2006) and Ramsay and Silverman (2005). Other recent, more specific references will be mentioned below.

There are of course several important differences between the theory and practice of supervised classification for functional data and the classical development of this topic in the finite-dimensional case, where typically the data dimension d is much smaller than the sample size n (the "high-dimensional" case where d is "large", and usually d > n, requires a separate treatment).
A first important practical difference is the role of Fisher's linear discriminant method as a "default" choice and a benchmark for comparisons. As we have mentioned, this holds for the finite-dimensional case with "small" values of d, but it is no longer true if functional (or high-dimensional) data are involved. To begin with, there is no obvious way to apply Fisher's idea in practice in the infinite-dimensional case, as it requires inverting a linear operator, which is not in general a straightforward task in functional spaces; see, however, James and Hastie (2001) for an interesting adaptation of linear discrimination ideas to a functional setting. Then, the question is whether there exists any functional discriminant method, based on simple ideas, which could play a reference role similar to that of Fisher's method in the finite-dimensional case. The results in this paper suggest (as a partial, not definitive, answer) that the k-NN method could represent a "default standard" in functional settings.

Another difference, particularly important from the theoretical point of view, concerns the universal consistency of the k-NN classifier. A classical result by Stone (1977) establishes that in the finite-dimensional case (with X_i ∈ R^d) the conditional error of the k-NN classifier,

L_n = P{g_n(X) ≠ Y | X_n},   (3)

converges in probability (and also in mean) to that of the Bayes (optimal) rule g*, that is, E(L_n) → L* = P{g*(X) ≠ Y}, provided that k_n → ∞ and k_n/n → 0 as n → ∞. This result holds universally, that is, irrespective of the distribution of the variable (X, Y). The interesting point here is that this universal consistency result is no longer valid in the infinite-dimensional setting.
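Stone's conditions on the smoothing sequence are easy to satisfy in practice. The short sketch below (our own illustration, not from the paper) checks a standard choice, k_n ≈ √n, against both requirements:

```python
import math

def k_of(n: int) -> int:
    """A standard smoothing choice satisfying Stone's conditions:
    k_n -> infinity while k_n / n -> 0 (here k_n is roughly sqrt(n))."""
    return max(1, math.isqrt(n))

sizes = [10, 100, 1_000, 10_000, 100_000]
ks = [k_of(n) for n in sizes]                # grows without bound
ratios = [k / n for k, n in zip(ks, sizes)]  # shrinks toward zero
```

Any sequence with these two limiting properties (n^{1/2}, n^{1/3}, log n, ...) is admissible; the rate only affects finite-sample behavior, not the consistency itself.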
As recently proved by Cérou and Guyader (2006), if the space F where X takes values is a general separable metric space, a non-trivial condition must be imposed on the distribution of (X, Y) in order to ensure the consistency of the k-NN classifier.

The aim of this paper is twofold, with a common focus on the k-NN classifier and in close relation with the above-mentioned two differences between the classification problem in finite and infinite settings. First, on the theoretical side, we take a further look at the consistency theorem in Cérou and Guyader (2006) by giving concrete non-trivial examples where their consistency condition is fulfilled. Second, from a more practical viewpoint, we carry out numerical comparisons (based both on Monte Carlo studies and real data examples) to assess the performance of different functional classifiers, including k-NN.

This paper is organized as follows. In Section 2 the consistency of the functional k-NN classifier is established, as a consequence of Theorem 2 in Cérou and Guyader (2006), for a broad class of Gaussian processes. In Section 3 other functional classifiers recently considered in the literature are introduced and briefly commented on. They are all compared through a simulation study (based on two different models) as well as six real data examples, very much in the spirit of Hand's (2006) paper, where the performance of the classical Fisher's rule was assessed in terms of its discrimination capacity in several randomly chosen data sets.

2. On the consistency of the functional k-NN classifier

In the functional classification problem several auxiliary devices have been used to overcome the extra difficulty posed by the infinite-dimensional nature of the feature space. They include dimension reduction techniques (e.g., James and Hastie 2001, Preda et al.
2007), random projections combined with the use of data-depth measures (Cuevas et al. 2007) and different adaptations to the functional framework of several nonparametric and regression-based methods, including kernel classifiers (Abraham et al. 2006, Biau et al. 2005, Ferraty and Vieu 2003), reproducing kernel procedures (Preda 2007), logistic regression (Müller and Stadtmüller 2005) and multilayer perceptron techniques with functional inputs (Ferré and Villa 2006).

2.1 On the consistency of the functional k-NN classifier

The functional k-NN classifier also belongs to the class of procedures adapted from the usual nonparametric multivariate setup. Nevertheless, unlike most of the above-mentioned functional methodologies, the k-NN procedure works according to exactly the same principles in the finite- and infinite-dimensional cases. It is defined by g_n(x) = 1_{η_n(x) > 1/2}, where η_n is the k-NN regression estimator (2), whose definition is formally identical to that of the finite-dimensional case. The intuitive interpretation is also the same in both cases. No previous data manipulation, projection or dimension reduction technique is required in principle, apart from the discretization process necessarily involved in the practical handling of functional data.

In the present section we offer some concrete examples where the k-NN functional classifier is weakly consistent. As we have mentioned in the previous section, this is a non-trivial point since the k-NN classifier is no longer universally consistent in the case of infinite-dimensional inputs X. Throughout this section the feature space where the variable X takes values is a separable metric space (F, D). We will denote by P_X the distribution of X, defined by P_X(B) = P{X ∈ B} for B ∈ B_F, where B_F are the Borel sets of F.
Let us now consider the following regularity assumption on the regression function η(x) = E(Y | X = x):

(BC) Besicovitch condition:

lim_{δ→0} (1 / P_X(B_{X,δ})) ∫_{B_{X,δ}} η(z) dP_X(z) = η(X)   in probability,

where B_{x,δ} := {z ∈ F : D(x, z) ≤ δ} is the closed ball with center x and radius δ.

Under (BC), Cérou and Guyader (2006, Th. 2) obtain the following consistency result.

Denote by L_n and L*, respectively, the conditional error associated with the above defined k-NN classifier and the Bayes (optimal) error for the problem at hand. If (F, D) is separable and condition (BC) is fulfilled, then the k-NN classifier is weakly consistent, that is, E(L_n) → L* as n → ∞, provided that k → ∞ and k/n → 0.

The Besicovitch condition also plays an important role in the consistency of kernel rules (see Abraham et al. 2006). Cérou and Guyader (2006) have also considered the following more convenient condition (called P_X-continuity) that ensures (BC): for every ε > 0 and for P_X-a.e. x ∈ F,

lim_{δ→0} P_X{z ∈ F : |η(z) − η(x)| > ε | D(x, z) < δ} = 0.

However, for our purposes, it will be sufficient to observe that the continuity (P_X-a.e.) of η(x) also implies (BC). We are interested in finding families of distributions of (X, Y) under which the regression function η(x) is continuous (P_X-a.e.) and hence (BC) holds.

From now on we will use the following notation. Let μ_i be the distribution of X conditional on Y = i, that is, μ_i(B) = P{X ∈ B | Y = i}, for B ∈ B_F and i = 0, 1. We denote by S_i ⊂ F the support of μ_i, for i = 0, 1, and S = S_0 ∩ S_1. The expression μ_0 << μ_1 will denote that μ_0 is absolutely continuous with respect to μ_1. Also, we will assume that p = P{Y = 0} fulfills p ∈ (0, 1). The following theorem shows that the property of continuity (resp.
P_X-continuity) of η(x), and hence the weak consistency of the k-NN classifier, follows from the continuity (resp. P_X-continuity) of the Radon-Nikodym derivative of μ_0 with respect to μ_1, provided that it exists.

Theorem 1: Assume that P_X(∂S) = 0 and that μ_0 << μ_1 and μ_1 << μ_0 on S. Then the following inequality holds for P_X-a.e. x, z ∈ F:

|η(z) − η(x)| ≤ (p / (1 − p)) |dμ_0/dμ_1 (x) − dμ_0/dμ_1 (z)|,

where dμ_0/dμ_1 denotes the Radon-Nikodym derivative of μ_0 with respect to μ_1. When S_0 = S_1 = S the assumption P_X(∂S) = 0 may be dropped.

In particular, η is continuous P_X-a.e. (resp. P_X-continuous) whenever dμ_0/dμ_1 is continuous P_X-a.e. (resp. P_X-continuous). Of course, a similar result holds by interchanging the sub-indices 0 and 1 and replacing p by 1 − p.

Proof: Define μ = μ_0 + μ_1. Then μ_i << μ, for i = 0, 1, and we can define the Radon-Nikodym derivatives f_i = dμ_i/dμ, for i = 0, 1. From the definition of the conditional expectation we know that η(x) = E(Y | X = x) = P(Y = 1 | X = x) can be expressed as

η(x) = f_1(x)(1 − p) / [f_0(x) p + f_1(x)(1 − p)].   (4)

Observe that μ restricted to S^c ∩ S_i coincides with μ_i restricted to S^c ∩ S_i, and thus f_i = 1_{S^c ∩ S_i} on S^c ∩ S_i, for i = 0, 1. Since μ_0 << μ_1 and μ_1 << μ_0 on S then, on this set, we can define the Radon-Nikodym derivatives dμ_0/dμ_1 and dμ_1/dμ_0. In this case, it also holds that μ << μ_i on S, for both i = 0, 1, and

dμ/dμ_i (x) = 1 + dμ_{1−i}/dμ_i (x)   for any x ∈ S.

Then (see, e.g., Folland 1999), for i = 0, 1 and for P_X-a.e. x ∈ S,

f_i(x) = dμ_i/dμ (x) = (dμ/dμ_i (x))^{−1} = 1 / (1 + dμ_{1−i}/dμ_i (x)).   (5)

Substituting (5) into expression (4) we get

η(x) = 0,                                      if x ∈ S_0 ∩ S_1^c,
η(x) = 1,                                      if x ∈ S_1 ∩ S_0^c,
η(x) = (1 − p) / [p dμ_0/dμ_1 (x) + 1 − p],    if x ∈ S.   (6)

Using this last expression we can see that if P_X(∂S) = 0 and if dμ_0/dμ_1 is continuous P_X-a.e. (resp.
P_X-continuous) on S, then η is also continuous P_X-a.e. (resp. P_X-continuous) on S. To see this it suffices to observe that, for P_X-a.e. x, z ∈ int(S),

|η(z) − η(x)| = |(1 − p)/[p dμ_0/dμ_1 (z) + 1 − p] − (1 − p)/[p dμ_0/dμ_1 (x) + 1 − p]| ≤ (p / (1 − p)) |dμ_0/dμ_1 (x) − dμ_0/dμ_1 (z)|.

To derive the last inequality we have used that, as μ_i, i = 0, 1, are positive measures, the Radon-Nikodym derivative dμ_0/dμ_1 is also non-negative. □

In order to combine Theorem 1 with the consistency result in Cérou and Guyader (2006, Th. 2), we are interested in finding distributions μ_0, μ_1 of an infinite-dimensional random element X such that μ_0 << μ_1 and μ_1 << μ_0 with continuous Radon-Nikodym derivatives. Measures μ_0 and μ_1 satisfying that μ_0 << μ_1 and μ_1 << μ_0 on S are said to be equivalent on S.

Let us denote by (C[a, b], ‖·‖_∞) the metric space of continuous real-valued functions x defined on the interval [a, b], endowed with the supremum norm ‖x‖_∞ = sup{|x(t)| : t ∈ [a, b]}. Also, let C²[a, b] be the space of twice continuously differentiable functions defined on [a, b].

In the next theorem we exhibit a broad class of Gaussian processes fulfilling the conditions of Theorem 2 in Cérou and Guyader (2006). Thus the consistency of the k-NN classifier is guaranteed for them. Key elements in the proof are the results by Varberg (1961) and Jørsboe (1968) providing explicit expressions for the Radon-Nikodym derivative of a Gaussian measure with respect to another one. From the Gaussianity assumption, the model is completely determined by giving the mean and covariance functions.
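Triangular-covariance Gaussian processes are easy to simulate on a grid. For instance, the Ornstein-Uhlenbeck covariance σ² exp(−β|s − t|) is triangular, with u(s) = σ² e^{βs} and v(t) = e^{−βt}, since u(min(s, t)) v(max(s, t)) = σ² e^{−β|s−t|}. The sketch below (our own illustration; the function names are ours) builds such a covariance matrix and draws trajectories by a Cholesky factorization:

```python
import numpy as np

def triangular_cov(u, v, grid):
    """Covariance matrix Gamma(s, t) = u(min(s, t)) * v(max(s, t)) on a grid."""
    S, T = np.meshgrid(grid, grid, indexing="ij")
    return u(np.minimum(S, T)) * v(np.maximum(S, T))

def sample_gp(mean, cov, n_paths, rng):
    """Draw n_paths trajectories of a Gaussian process with the given mean
    vector and covariance matrix (Cholesky with a small jitter for stability)."""
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(cov)))
    return mean + rng.standard_normal((n_paths, len(cov))) @ L.T

# Ornstein-Uhlenbeck example: sigma^2 * exp(-beta|s - t|) = u(min) * v(max)
sigma2, beta = 0.25, 1.0 / 0.3
grid = np.linspace(0.0, 1.0, 51)
cov = triangular_cov(lambda s: sigma2 * np.exp(beta * s),
                     lambda t: np.exp(-beta * t), grid)
```

The particular values sigma2 = 0.25 and beta = 1/0.3 match the covariance later used in the Monte Carlo study of Section 3.2.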
For the sake of a clearer and more systematic presentation, the statement is divided into three parts. The first applies to the case where the mean function of both functional populations, with distributions μ_0 and μ_1 (corresponding to X | Y = 0 and X | Y = 1), is common and the difference between the two processes lies in the covariance functions (which, however, keep a common structure). The second part considers the dual case where the difference lies in the mean functions and the covariance structure is common. Finally, the third part of the theorem generalizes the previous two statements by including the case of different mean and covariance functions.

Theorem 2: Let (F, D) = (C[a, b], ‖·‖_∞) with 0 ≤ a < b < ∞.

a) Assume that X | Y = i, for i = 0, 1, are Gaussian processes on [a, b], whose mean function is zero and with covariance functions Γ_i(s, t) = u_i(min(s, t)) v_i(max(s, t)), for s, t ∈ [a, b], where u_i, v_i, for i = 0, 1, are positive functions in C²[a, b]. Assume also that v_i, for i = 0, 1, and v_1 u_1' − u_1 v_1' are bounded away from zero on [a, b], that u_1 v_1' − u_1' v_1 = u_0 v_0' − u_0' v_0 and that u_1(a) = 0 if and only if u_0(a) = 0. Then dμ_0/dμ_1 is continuous on F.

b) Assume that X | Y = i, for i = 0, 1, are Gaussian processes on [a, b], with equal covariance function Γ(s, t) = u(min(s, t)) v(max(s, t)), for s, t ∈ [a, b], where u, v ∈ C²[a, b] are positive functions and v and v u' − u v' are bounded away from zero on [a, b]. Assume also that the mean function of X | Y = 1 is 0 and that of X | Y = 0 is a function m ∈ C²[a, b] such that m(a) = 0 whenever u(a) = 0. Then dμ_0/dμ_1 is continuous on F.
c) Assume that X | Y = i, for i = 0, 1, are Gaussian processes on [a, b], with mean functions m_i ∈ C²[a, b] and covariance functions Γ_i(s, t) = u_i(min(s, t)) v_i(max(s, t)), for s, t ∈ [a, b], where u_i, v_i, for i = 0, 1, are positive functions in C²[a, b] which fulfill the same conditions imposed in (a). Assume also that m_i(a) = 0 whenever u_i(a) = 0. Then dμ_0/dμ_1 is continuous on F.

Therefore, under the assumptions in either (a), (b) or (c), the k-NN classifier discriminating between μ_0 and μ_1 is weakly consistent when k → ∞ and k/n → 0.

Proof: a) Varberg (1961, Th. 1) shows that, under the assumptions of (a), μ_0 and μ_1 are equivalent measures and the Radon-Nikodym derivative of μ_0 with respect to μ_1 is given by

dμ_0/dμ_1 (x) = C_1 exp{ (1/2) [ C_2 x²(a) + ∫_a^b f(t) d( x²(t) / (v_0(t) v_1(t)) ) ] },   (7)

where

C_1 = [v_0(a) v_1(b) / (v_0(b) v_1(a))]^{1/2}   if u_0(a) = 0,
C_1 = [u_1(a) v_1(b) / (v_0(b) u_0(a))]^{1/2}   if u_0(a) ≠ 0,

C_2 = 0   if u_0(a) = 0,
C_2 = [v_0(a) u_0(a) − u_1(a) v_1(a)] / [v_1(a) v_0(a) u_0(a) u_1(a)]   if u_0(a) ≠ 0,

and

f(s) = [v_1(s) v_0'(s) − v_0(s) v_1'(s)] / [v_1(s) u_1'(s) − u_1(s) v_1'(s)]   for s ∈ [a, b].

Observe that, by the assumptions of the theorem, the function f is differentiable with bounded derivative. Thus f is of bounded variation and it may be expressed as the difference of two bounded positive increasing functions. Therefore the stochastic integral in (7) is well defined and it can be evaluated integrating by parts:

dμ_0/dμ_1 (x) = C_1 exp{ (1/2) [ C_3 x²(a) + C_4 x²(b) − ∫_a^b (x²(t) / (v_0(t) v_1(t))) df(t) ] },

with C_3 = C_2 − f(a)/(v_0(a) v_1(a)) and C_4 = f(b)/(v_0(b) v_1(b)). It is clear that this derivative is a continuous functional of x with respect to the supremum norm.
Now, Theorem 1 implies that η(x) is continuous and, therefore, the Besicovitch condition (BC) holds and, from Theorem 2 in Cérou and Guyader (2006), the k-NN classifier is weakly consistent. Note that the equivalence of μ_0 and μ_1 implies the coincidence of both supports, S_0 = S_1 = S.

b) In Jørsboe (1968), p. 61, it is proved that, under the indicated assumptions, μ_0 and μ_1 are equivalent measures with the following Radon-Nikodym derivative:

dμ_0/dμ_1 (x) = exp{ D_1 + D_2 x(a) + (1/2) ∫_a^b g(t) d( (2x(t) − m(t)) / v(t) ) },

where

D_1 = −(m²(a) / (2 u(a) v(a))) 1_{u(a) > 0},   D_2 = (m(a) / (u(a) v(a))) 1_{u(a) > 0}

and

g(t) = [v(t) m'(t) − m(t) v'(t)] / [v(t) u'(t) − u(t) v'(t)].

Again, integration by parts gives

dμ_0/dμ_1 (x) = exp{ D_3 + (D_2 − g(a)/v(a)) x(a) + (g(b)/v(b)) x(b) − ∫_a^b (x(t)/v(t)) dg(t) },   (8)

with

D_3 = D_1 − (1/2) ∫_a^b g(t) d(m(t)/v(t)).

Thus dμ_0/dμ_1, and hence η, are continuous and the consistency of the k-NN classifier also holds in this case.

c) Let us denote by P_{m,Γ} the distribution of the Gaussian process with mean m and covariance function Γ. Then dμ_0/dμ_1 (x) is continuous since (see, e.g., Folland 1999)

dμ_0/dμ_1 (x) = dP_{m_0,Γ_0}/dP_{m_1,Γ_1} (x) = [dP_{m_0,Γ_0}/dP_{0,Γ_0} (x)] [dP_{0,Γ_0}/dP_{0,Γ_1} (x)] [dP_{0,Γ_1}/dP_{m_1,Γ_1} (x)],   (9)

and, as we have shown in the proofs of (a) and (b), the Radon-Nikodym derivatives on the right-hand side of (9) are all continuous. □

Remark 1 (Application to Ornstein-Uhlenbeck processes). Let X | Y = i, for i = 0, 1, be Gaussian processes on [a, b], with zero mean and covariance functions Γ_i(s, t) = σ_i² exp(−β_i |s − t|), for s, t ∈ [a, b], where β_i, σ_i > 0 for i = 0, 1. Assume that σ_1² β_1 = σ_0² β_0. Then these processes satisfy the assumptions in Theorem 2(a).

Remark 2 (Application to the Brownian motion).
Theorem 2(b) can also be used to consistently discriminate between a Brownian motion without trend (m_0 = 0) and another one with trend (m_1 ≠ 0). It suffices to consider the case where u(t) = t and v ≡ 1.

Remark 3 (On triangular covariance functions). Covariance functions of the type Γ(s, t) = u(min(s, t)) v(max(s, t)), called triangular, have received considerable attention in the literature. For example, Sacks and Ylvisaker (1966) use this condition in the study of optimal designs for regression problems where the errors are generated by a zero-mean process with covariance function K(s, t). It turns out that the Hilbert space with reproducing kernel K plays an important role in the results and, as these authors point out, the norm of this space is particularly easy to handle when K is triangular. On the other hand, Varberg (1964) has given an interesting representation of the processes X(t), 0 ≤ t < b, with zero mean and triangular covariance function, by proving that they can be expressed in the form

X(t) = ∫_0^b W(u) d_u R(t, u),

where W is the standard Wiener process and R = R(t, u) is a function, of bounded variation with respect to u, defined in terms of K.

Remark 4 (On plug-in functional classifiers). The explicit knowledge of the conditional expectation (6) in the cases considered in Theorem 2 could be explored from the statistical point of view, as it suggests the use of "plug-in" classifiers obtained by replacing η(x) in (1) with suitable parametric or semiparametric estimators.

Remark 5 (On equivalent Gaussian measures and their supports). According to a well-known result by Feldman and Hájek, for any given pair of Gaussian processes there is a dichotomy, in such a way that they are either equivalent or mutually singular. In the first case both measures μ_0 and μ_1 have a common support S, so that Theorem 1 is applicable with S = S_0 = S_1.
As for the identification of the support, Vakhania (1975) has proved that if a Gaussian process, with trajectories in a separable Banach space F, is not degenerate (i.e., the distribution of any non-trivial linear continuous functional is not degenerate), then the support of such a process is the whole space F. Again, expression (6) of the regression function η suggests the possibility of investigating nonparametric estimators of the Radon-Nikodym derivative dμ_0/dμ_1, which would in turn provide plug-in versions of the Bayes rule g*(x) = 1_{η(x) > 1/2} with no further assumption on the structure of the involved Gaussian processes, apart from their equivalence.

3. Some numerical comparisons

The aim of this section is to compare (numerically) the performance of several supervised functional classification procedures already introduced in the literature. The procedures are the k-NN rule, computed both with respect to the supremum norm ‖·‖_∞ and the L² norm ‖·‖_2, and the other discrimination rules reviewed in Section 3.1. One of the objectives of this numerical study is to gain some insight into which classification procedures perform well no matter the type of functional data under consideration and could thus be considered a sort of benchmark for the functional discrimination problem. Section 3.2 contains a Monte Carlo study carried out on two different functional data generating models. In Section 3.3 we consider six functional real data sets taken from the literature.

3.1 Other functional classifiers

Here we review other classification techniques that have been used in the literature in the context of functional data. From now on we denote by (t_1, ..., t_N) the nodes where the functional predictor X has been observed.

Partial Least Squares (PLS) classification

Let us first describe the procedure in the context of a multivariate predictor X.
PLS is actually a dimension reduction technique for regression problems with predictor X and a response Y (which in the case of classification takes only two values, 0 or 1, depending on which population the individual comes from). The dimension reduction is carried out by projecting X onto a lower-dimensional space such that the coordinates of the projected X, the PLS coordinates, are uncorrelated with each other and have maximum covariance with Y. Then, if the aim is classification, Fisher's linear discriminant is applied to the PLS coordinates of X (see Barker and Rayens 2003, Liu and Rayens 2007). In the case of a functional predictor X (see Preda et al. 2007), the above described procedure is applied to the discretized version of X, X = (X(t_1), X(t_2), ..., X(t_N)). Here we have chosen the number of PLS directions, among the values 1, ..., 10, by cross-validation.

Reproducing Kernel Hilbert Space (RKHS) classification

We will also define this technique initially for a multivariate predictor X. For simplicity, we will assume that X takes values in [0, 1]^N. Let κ be a function defined on [0, 1]^N × [0, 1]^N. An RKHS with kernel κ is the vector space generated by all finite linear combinations of functions of the form κ_{t*}(·) = κ(t*, ·), for any t* ∈ [0, 1]^N, endowed with the inner product given by ⟨κ_{t*}, κ_{t**}⟩_κ = κ(t*, t**). RKHS are frequently used in the context of machine learning (see Evgeniou et al. 2002, Wahba 2002); for their applications in statistics the reader is referred to the monograph by Berlinet and Thomas-Agnan (2004). In this work we use the Gaussian kernel κ(s, t) = exp(−‖s − t‖²_2 / σ_κ²), where σ_κ > 0 is a fixed parameter. The classification problem is solved by plugging a regression estimator of the type η_n(x) = Σ_{i=1}^n c_i κ(x, X_i) into the Bayes classifier.
When X is a random function, this procedure is applied to the discretized X. The parameters c_i, for i = 1, ..., n, are chosen to minimize the risk functional n^{−1} Σ_{i=1}^n (Y_i − η_n(X_i))² + λ ⟨η, η⟩_κ, where λ > 0 is a penalization parameter. In this work the values of the parameters λ and σ_κ have been chosen by cross-validation via a leave-one-out procedure. According to our results, it seems that the performance of the RKHS methodology is rather sensitive to changes in these parameters and even to the starting point of the leave-one-out procedure mentioned.

Classification via depth measures

The idea is to assign a new observation x to the population, P_0 or P_1, with respect to which x is deeper (see Ghosh and Chaudhuri 2005, Cuevas et al. 2007). From the five functional depth measures considered by Cuevas et al. (2007) we have taken the h-mode depth and the random projection (RP) depth. Specifically, the h-mode depth of x with respect to the population given by the random element X is defined as f_h(x) = E(K_h(‖x − X‖_2)), where K_h(·) = h^{−1} K(·/h), K is a kernel function (here we have taken the Gaussian kernel K(t) = √(2/π) exp(−t²/2)) and h is a smoothing parameter. As the distribution of X is usually unknown, in the simulations we actually use the empirical version of f_h, f̂_h(x) = n^{−1} Σ_{i=1}^n K_h(‖x − X_i‖_2). The smoothing parameter has been chosen as the 20th percentile of the L² distances between the functions in the training sample (see Cuevas et al. 2007).

To compute the RP depth, the training sample X_1, ..., X_n is projected onto a (functional) random direction a (independent of the X_i). The sample depth of an observation x with respect to P_i is defined as the univariate depth of the projection of x onto a with respect to the projected training sample from P_i.
Since a is a random element, this definition leads to a random measure of depth, but a single representative value has been obtained by averaging these random depths over 50 independent random directions (see Cuevas and Fraiman 2008 for a theoretical development of this idea). If we are working with discretized versions (x(t_1), ..., x(t_N)) of the functional data x(t), we may take a according to a uniform distribution on the unit sphere of R^N. This can be achieved, for example, by setting a = Z/‖Z‖, where Z is drawn from a standard Gaussian distribution on R^N.

Moving window rule

The moving window classifier is given by

g_n(x) = 0 if Σ_{i=1}^n 1{Y_i = 0, X_i ∈ B(x, h)} ≥ Σ_{i=1}^n 1{Y_i = 1, X_i ∈ B(x, h)}, and g_n(x) = 1 otherwise,

where h = h_n > 0 is a smoothing parameter. This classification rule was considered in the functional setting, for instance, by Abraham et al. (2006). In this work the parameter h has again been chosen via cross-validation.

3.2. Monte Carlo results

In this section we study two functional data models already considered by other authors. More specifically, in Model 1, similar to one used in Cuevas et al. (2007), X | Y = i is a Gaussian process with mean m_i(t) = 30(1 − t)^{1.1^i} t^{1.1^{1−i}} and covariance function Γ_i(s, t) = 0.25 exp(−|s − t|/0.3), for i = 0, 1. Observe that this model with smooth trajectories satisfies the assumptions in Theorem 2, and thus we would expect the k-NN classification rule (with respect to the ‖·‖_∞ norm) to perform nicely. Let us note that the value 1.1 in the exponent of m_i(t) is in fact the one used in Model 1, p. 487, of Cuevas et al. (2007), although in their work a 1.2 was misprinted instead. Model 2 appears in Preda et al. (2007), but here the functions h_i, used to define the mean, have been rescaled to have domain [0, 1].
The trajectories of X | Y = i are given by

X_i(t) = U h_1(t) + (1 − U) h_{i+2}(t) + ε(t), for i = 0, 1,    (10)

where U is uniformly distributed on [0, 1], h_1(t) = 2 max(3 − 5|2t − 1|, 0), h_2(t) = h_1(t − 1/5), h_3(t) = h_1(t + 1/5) and ε(t) is an approximation to continuous-time white noise. In practice, this means that in the discretized approximations (X(t_1), ..., X(t_N)) of X(t), the variables ε(t_1), ..., ε(t_N) are independently drawn from a standard normal distribution.

The simulation results are summarized in Tables 1 and 2. The number of equispaced nodes where the functional data have been evaluated is the same for both models, 51. The number of Monte Carlo runs is 100. In every run we generated two training samples (from X | Y = 0 and X | Y = 1, respectively), each with sample size 100, and we also generated a test sample of size 50 from each of the two populations. The tables display the descriptive statistics of the proportion of correctly classified observations from these test samples.

                 k-NN|∞   k-NN|2   PLS      RKHS     h-modal  RP(hM)   MWR
Minimum          0.6200   0.6600   0.6000   0.4800   0.6400   0.5400   0.6600
First quartile   0.8000   0.8000   0.8000   0.6600   0.8000   0.7800   0.8000
Median           0.8400   0.8400   0.8400   0.8400   0.8400   0.8400   0.8400
Mean             0.8396   0.8354   0.8371   0.7999   0.8409   0.8260   0.8393
Third quartile   0.8800   0.8800   0.8800   0.9400   0.8800   0.8800   0.8800
Maximum          0.9800   0.9600   0.9800   1.0000   0.9800   0.9800   1.0000
Std. deviation   0.0603   0.0572   0.0668   0.1457   0.0589   0.0725   0.0634

Table 1: Simulation results for Model 1

Regarding Model 1, observe that there is little difference between the correct classification rates of any of the methods, except for the RKHS procedure, which performs worse. In Model 2 the PLS, RKHS and h-modal methods slightly outperform the others.
                 k-NN|∞   k-NN|2   PLS      RKHS     h-modal  RP(hM)   MWR
Minimum          0.8400   0.8400   0.8800   0.8400   0.8600   0.8400   0.8200
First quartile   0.9200   0.9400   0.9600   0.9600   0.9400   0.9400   0.9400
Median           0.9600   0.9600   0.9800   0.9800   0.9800   0.9600   0.9600
Mean             0.9522   0.9558   0.9686   0.9688   0.9657   0.9522   0.9570
Third quartile   0.9800   0.9800   0.9800   1.0000   1.0000   0.9800   0.9800
Maximum          1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
Std. deviation   0.0335   0.0355   0.0279   0.0313   0.0308   0.0345   0.0349

Table 2: Simulation results for Model 2

When the Monte Carlo study with this model was carried out, we also applied the k-NN classification procedures to a spline-smoothed version of the X trajectories. The result was that the mean correct classification rate increased to 0.9582 in the case of the supremum norm and to 0.9624 in the case of the L₂ norm. This, together with the analysis of the flies data in the next subsection, seems to suggest that, when the curves X are irregular, smoothing these functions will enhance the k-NN discrimination procedure.

3.3. Some comparisons based on real data sets

3.3.1. Brief description of the data sets

Berkeley Growth Data: The Berkeley Growth Study (Tuddenham and Snyder 1954) recorded the heights of n_0 = 54 girls and n_1 = 39 boys between the ages of 1 and 18 years. Heights were measured at 31 ages for each child. These data have been previously analyzed by Ramsay and Silverman (2002).

ECG data: These are electrocardiogram (ECG) data, studied by Wei and Keogh (2006), from the MIT-BIH Arrhythmia database (see Goldberger et al. 2000). Each observation contains the successive measurements recorded by one electrode during one heartbeat and was normalized and rescaled to have length 85. A group of cardiologists has assigned a label of normal or abnormal to each data record.
Due to computational limitations, we have randomly chosen only 200 observations from each group out of the original 2026 records in the data set.

MCO data: The variable under study is the mitochondrial calcium overload (MCO), measured every 10 seconds during an hour in isolated mouse cardiac cells. The data come from research conducted by Dr. David García-Dorado at the Vall d'Hebron Hospital (see Ruiz-Meana et al. 2003, Cuevas, Febrero and Fraiman 2004, 2007). In order to assess whether a certain drug increased the MCO level, a sample of functions of size n_0 = 45 was taken from a control group and n_1 = 44 functions were sampled from the treatment group.

Spectrometric data: For each of 215 pieces of meat, a spectrometer provided the absorbance attained at 100 different wavelengths (see Ferraty and Vieu 2006 and references therein). The fat content of the meat was also obtained via chemical processing and each of the meat pieces was classified as low- or high-fat.

Phoneme data: The X variable is the log-periodogram (discretized to 150 nodes) of a phoneme. The two populations correspond to the phonemes "aa" and "ao", respectively (see more information in Ferraty and Vieu 2006). We have considered a sample of 100 observations from each phoneme.

Medflies data: This data set was obtained by Prof. Carey from U.C. Davis (see Carey et al. 1998) and has been studied, for instance, by Müller and Stadtmüller (2005). The predictor X is the number of eggs laid daily by a Mediterranean fruit fly over a 30-day period. The fly is classified as long-lived if its remaining lifetime past 30 days is more than 14 days, and short-lived otherwise. The numbers of long- and short-lived flies observed were 256 and 278, respectively.

3.3.2. Results

We have applied the classification techniques reviewed in Section 3.1 to the real data sets just described.
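For concreteness, one of the benchmarked rules, the k-NN classifier with the supremum metric, together with a leave-one-out cross-validated estimate of its correct classification rate (the kind of figure reported below), can be sketched as follows. This is an illustrative reimplementation on discretized curves, with our own naming, not the code actually used in the study:

```python
import numpy as np

def knn_sup(x, X_train, y_train, k):
    """k-NN classifier for discretized curves, with the supremum distance
    d(x, X_i) = max_t |x(t) - X_i(t)|; ties are broken in favor of class 0."""
    d = np.array([np.max(np.abs(x - Xi)) for Xi in X_train])
    nearest = np.argsort(d)[:k]
    votes = np.sum(np.asarray(y_train)[nearest])
    return int(votes > k / 2)

def loo_rate(X, y, k):
    # leave-one-out cross-validated proportion of correctly classified curves
    n = len(X)
    hits = 0
    for i in range(n):
        X_i = [X[j] for j in range(n) if j != i]
        y_i = [y[j] for j in range(n) if j != i]
        hits += knn_sup(X[i], X_i, y_i, k) == y[i]
    return hits / n
```

Replacing the supremum distance by np.linalg.norm(x - Xi) gives the k-NN | 2 variant of the table below.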
While carrying out the simulations of Subsection 3.2, we observed that the performance of the RKHS procedure was very dependent on the initial values of the parameters σ_κ and λ provided to the cross-validation algorithm. In fact, finding initial values for these parameters that would finally yield competitive results with respect to the other methods took a considerable time. Thus we decided to exclude the RKHS classification method from the study with real data.

We have computed, via a cross-validation procedure, the mean correct classification rates attained by the different discrimination methods on the real data sets. In Table 3 we display the results. Since the egg-laying trajectories in the medflies data set were very irregular and spiky, we have computed the correct classification rate for both the original data and a smoothed version obtained with splines. The smoothing leads to a better performance of the k-NN procedure with the supremum metric, just as happened in the simulations with Model 2.

As a conclusion we would say that the k-NN classification methodology with respect to the L_∞ norm is always among the best performing ones when the X trajectories are smooth. The k-NN procedure with respect to the L₂ norm and the PLS methodology also give good results, although the latter has the drawback of a much higher computation time.

Data set                   k-NN|∞   k-NN|2   PLS      h-modal  RP(hM)   MWR
Growth                     0.9462   0.9677   0.9462   0.9462   0.9462   0.9570
ECG                        0.9900   0.9950   0.9825   0.9900   0.8575   0.8850
MCO                        0.8427   0.8315   0.8876   0.7640   0.7079   0.6854
Spectrometric              0.9070   0.8558   0.9163   0.6791   0.6930   0.6558
Phoneme                    0.7300   0.7800   0.7400   0.7300   0.7450   0.6950
Medflies (non-smoothed)    0.5468   0.5412   0.5262   0.4925   0.5056   0.5431
Medflies (smoothed)        0.5712   0.5431   0.5094   0.5075   0.5543   0.5206

Table 3: Mean correct classification rates for the real data sets

References

Abraham, C., Biau, G. and Cadre, B. (2006).
On the kernel rule for function classification. Annals of the Institute of Statistical Mathematics 58, 619–633.

Barker, M. and Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics 17, 166–173.

Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers.

Biau, G., Bunea, F. and Wegkamp, M. (2005). Functional classification in Hilbert spaces. IEEE Transactions on Information Theory 51, 2163–2172.

Carey, J.R., Liedo, P., Müller, H.G., Wang, J.L. and Chiou, J.M. (1998). Relationship of age patterns of fecundity to mortality, longevity, and lifetime reproduction in a large cohort of Mediterranean fruit fly females. Journal of Gerontology, Ser. A 53, 245–251.

Cérou, F. and Guyader, A. (2006). Nearest neighbor classification in infinite dimension. ESAIM: Probability and Statistics 10, 340–355.

Cuevas, A., Febrero, M. and Fraiman, R. (2004). An ANOVA test for functional data. Computational Statistics and Data Analysis 47, 111–122.

Cuevas, A., Febrero, M. and Fraiman, R. (2007). Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics 22, 481–496.

Cuevas, A. and Fraiman, R. (2008). On depth measures and dual statistics. A methodology for dealing with general data. Manuscript.

Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag.

Evgeniou, T., Poggio, T., Pontil, M. and Verri, A. (2002). Regularization and statistical learning theory for data analysis. Computational Statistics and Data Analysis 38, 421–432.

Ferraty, F. and Vieu, P. (2003). Curves discrimination: A nonparametric functional approach. Computational Statistics and Data Analysis 44, 161–173.

Ferraty, F. and Vieu, P. (2006). Nonparametric Modelling for Functional Data. Springer.

Ferré, L. and Villa, N. (2006).
Multilayer perceptron with functional inputs: an inverse regression approach. Scandinavian Journal of Statistics 33, 807–823.

Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179–188.

Folland, G.B. (1999). Real Analysis. Modern Techniques and Their Applications. Wiley.

Ghosh, A.K. and Chaudhuri, P. (2005). On maximal depth and related classifiers. Scandinavian Journal of Statistics 32, 327–350.

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P., Mark, R., Mietus, J., Moody, G., Peng, C. and Stanley, H.E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101, 215–220.

Hand, D.J. (1997). Construction and Assessment of Classification Rules. Wiley.

Hand, D.J. (2006). Classifier technology and the illusion of progress. Statistical Science 21, 1–14.

Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer.

James, G.M. and Hastie, T.J. (2001). Functional linear discriminant analysis for irregularly sampled curves. Journal of the Royal Statistical Society, Ser. B 63, 533–550.

Jørsboe, O.G. (1968). Equivalence or Singularity of Gaussian Measures on Function Spaces. Various Publications Series, No. 4, Matematisk Institut, Aarhus Universitet, Aarhus.

Liu, Y. and Rayens, W. (2007). PLS and dimension reduction for classification. Computational Statistics 22, 189–208.

Müller, H.G. and Stadtmüller, U. (2005). Generalized functional linear models. The Annals of Statistics 33, 774–805.

Preda, C. (2007). Regression models for functional data by reproducing kernel Hilbert spaces methods. Journal of Statistical Planning and Inference 137, 829–840.

Preda, C., Saporta, G. and Lévéder, C. (2007). PLS classification of functional data. Computational Statistics 22, 223–235.

Ramsay, J.O. and Silverman, B.W. (2002).
Applied Functional Data Analysis. Methods and Case Studies. Springer-Verlag.

Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis. Second edition. Springer.

Ruiz-Meana, M., García-Dorado, D., Pina, P., Inserte, J., Agulló, L. and Soler-Soler, J. (2003). Cariporide preserves mitochondrial proton gradient and delays ATP depletion in cardiomyocytes during ischemic conditions. American Journal of Physiology - Heart and Circulatory Physiology 285, 999–1006.

Sacks, J. and Ylvisaker, N.D. (1966). Designs for regression problems with correlated errors. Annals of Mathematical Statistics 37, 66–89.

Stone, C.J. (1977). Consistent nonparametric regression. The Annals of Statistics 5, 595–645.

Tuddenham, R.D. and Snyder, M.M. (1954). Physical growth of California boys and girls from birth to eighteen years. University of California Publications in Child Development 1, 183–364.

Vakhania, N.N. (1975). The topological support of Gaussian measure in Banach space. Nagoya Mathematical Journal 57, 59–63.

Varberg, D.E. (1961). On equivalence of Gaussian measures. Pacific Journal of Mathematics 11, 751–762.

Varberg, D.E. (1964). On Gaussian measures equivalent to Wiener measure. Transactions of the American Mathematical Society 113, 262–273.

Wahba, G. (2002). Soft and hard classification by reproducing kernel Hilbert space methods. Proceedings of the National Academy of Sciences 99, 16524–16530.

Wei, L. and Keogh, E. (2006). Semi-supervised time series classification. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 748–753, Philadelphia, U.S.A.