Supervised functional classification: A theoretical remark and some comparisons

Amparo Baíllo* and Antonio Cuevas†
Departamento de Análisis Económico: Economía Cuantitativa, Univ. Autónoma de Madrid, Spain
Departamento de Matemáticas, Univ. Autónoma de Madrid, Spain

Abstract

The problem of supervised classification (or discrimination) with functional data is considered, with a special interest in the popular k-nearest neighbors (k-NN) classifier. First, relying on a recent result by Cérou and Guyader (2006), we prove the consistency of the k-NN classifier for functional data whose distribution belongs to a broad family of Gaussian processes with triangular covariance functions. Second, on a more practical side, we check the behavior of the k-NN method when compared with a few other functional classifiers. This is carried out through a small simulation study and the analysis of several real functional data sets. While no global "uniform" winner emerges from such comparisons, the overall performance of the k-NN method, together with its sound intuitive motivation and relative simplicity, suggests that it could represent a reasonable benchmark for the classification problem with functional data.

Key words and phrases. Supervised classification, functional data, projections method, nearest neighbors, discriminant analysis.

AMS 2000 subject classification. Primary 62G07; secondary 62G20.

* Corresponding author. Phone: +34 914978640, e-mail: amparo.baillo@uam.es
† The research of both authors was partially supported by Spanish grant MTM2007-66632 and the IV PRICIT program titled Modelización Matemática y Simulación Numérica en Ciencia y Tecnología (SIMUMAT).
1. Introduction

1.1 Some background on supervised classification

Supervised classification is the modern name for one of the oldest statistical problems in experimental science: to decide whether an individual, from which just a random measurement X (with values in a "feature space" F endowed with a metric D) is known, belongs either to the population P_0 or to P_1. For example, in a medical problem P_0 and P_1 could correspond to the groups of "healthy" and "ill" individuals, respectively. The decision must be taken from the information provided by a "training sample" X_n = {(X_i, Y_i), 1 ≤ i ≤ n}, where the X_i, i = 1, ..., n, are independent replications of X, measured on n randomly chosen individuals, and the Y_i are the corresponding values of an indicator variable which takes the value 0 or 1 according to the membership of the i-th individual in P_0 or P_1. Thus the mathematical problem is to find a "classifier" g_n(x) = g_n(x; X_n), with g_n : F → {0, 1}, that minimizes the classification error P{g_n(X) ≠ Y}. The term "supervised" refers to the fact that the individuals in the training sample are supposed to be correctly classified, typically using "external" non-statistical procedures, so that they provide a reliable basis for the assignment of the new observation.

This problem, also known as "statistical discrimination" or "pattern recognition", is at least 70 years old. Its origin goes back to the classical work by Fisher (1936) where, in the d-variate case F = R^d, a simple "linear classifier" g_n(x) = 1_{x : w'x + w_0 > 0} was introduced (1_A stands for the indicator function of a set A ⊂ F). A deep, insightful perspective on the supervised classification problem can be found in the book by Devroye et al. (1996). Other useful textbooks are Hand (1997) and Hastie et al. (2001). All of them focus on the standard multivariate case F = R^d.
It is not difficult to prove (e.g., Devroye et al., 1996, p. 11) that the optimal classification rule (often called the "Bayes rule") is

g*(x) = 1_{η(x) > 1/2},   (1)

where η(x) = E(Y | X = x). Of course, since η is unknown, the exact expression of this rule is usually unknown, and thus different procedures have been proposed in order to approximate it. In particular, it can be seen that Fisher's linear rule is optimal provided that the conditional distributions of X | Y = 0 and X | Y = 1 are both normal with identical covariance matrix. While these conditions look quite restrictive, and it is straightforward to construct problems where any linear rule has a poor performance, Fisher's classifier is still by far the most popular choice among users.

A simple nonparametric alternative is given by the k-nearest neighbors (k-NN) method, which is obtained by replacing the unknown regression function η(x) in (1) with the regression estimator

η_n(x) = (1/k) Σ_{i=1}^n 1_{X_i ∈ k(x)} Y_i,   (2)

where k = k_n is a given (integer) smoothing parameter and "X_i ∈ k(x)" means that X_i is one of the k nearest neighbors of x. More concretely, if the pairs (X_i, Y_i), 1 ≤ i ≤ n, are re-indexed as (X_(i), Y_(i)), 1 ≤ i ≤ n, so that the X_(i)'s are arranged in increasing distance from x, D(x, X_(1)) ≤ D(x, X_(2)) ≤ ... ≤ D(x, X_(n)), then k(x) = {X_(i), 1 ≤ i ≤ k}. This leads to the k-NN classifier g_n(x) = 1_{η_n(x) > 1/2}.

It is well known that, in addition to this simple classifier, several other alternative methods (kernel classifiers, neural networks, support vector machines, ...) have been developed and extensively analyzed in recent years. However, when used in practice with real data sets, the performance of Fisher's rule is often found to be very close to that of the best one among all the main alternative procedures.
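As an illustration of rule (2) on discretized curves, the following minimal sketch implements the k-NN classifier; the function name, the NumPy discretization and the two metrics offered are our own choices, not part of the original paper.

```python
import numpy as np

def knn_classify(x, X_train, Y_train, k, metric="sup"):
    """Classify one discretized curve x by the k-NN rule g_n(x) = 1{eta_n(x) > 1/2}.

    X_train: (n, N) array of n curves observed on a common grid of N nodes.
    Y_train: (n,) array of 0/1 labels.
    metric: "sup" for the supremum norm, "l2" for the (discretized) L2 norm.
    """
    diffs = X_train - x
    if metric == "sup":
        dists = np.max(np.abs(diffs), axis=1)
    else:
        dists = np.sqrt(np.mean(diffs ** 2, axis=1))
    nearest = np.argsort(dists)[:k]      # indices of the k nearest curves
    eta_n = Y_train[nearest].mean()      # k-NN regression estimate of eta(x)
    return 1 if eta_n > 0.5 else 0
```

With k = k_n chosen so that k_n → ∞ and k_n/n → 0, this is the rule whose consistency is discussed below.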
On these grounds, Hand (2006) has argued in a provocative paper about the "illusion of progress" in supervised classification techniques. The central idea is that the study of new classification rules often fails to take into account the structure of real data sets and tends to overlook the fact that, in spite of its theoretical limitations, Fisher's rule is quite satisfactory in many practical applications. This, together with its conceptual simplicity, explains its popularity over the years.

1.2 The purpose and structure of this paper

We are concerned here with the problem of (binary) supervised classification with functional data. That is, we consider the general framework indicated above, but we will assume throughout that the space (F, D) where the random elements X_i take values is a separable metric space of functions. For some theoretical results (Theorem 2) we will impose a more specific assumption by taking F as the space C[a, b] of real continuous functions defined on a closed finite interval [a, b], with the usual supremum norm ‖·‖_∞.

The study of discrimination techniques with functional data is not as developed as the corresponding finite-dimensional theory but, clearly, it is one of the most active research topics in the booming field of functional data analysis (FDA). Two well-known books including broad overviews of FDA with interesting examples are Ferraty and Vieu (2006) and Ramsay and Silverman (2005). Other recent, more specific references will be mentioned below.

There are of course several important differences between the theory and practice of supervised classification for functional data and the classical development of this topic in the finite-dimensional case, where typically the data dimension d is much smaller than the sample size n (the "high-dimensional" case where d is "large", and usually d > n, requires a separate treatment).
A first important practical difference is the role of Fisher's linear discriminant method as a "default" choice and a benchmark for comparisons. As we have mentioned, this holds for the finite-dimensional case with "small" values of d, but it is no longer true if functional (or high-dimensional) data are involved. To begin with, there is no obvious way to apply Fisher's idea in practice in the infinite-dimensional case, as it requires inverting a linear operator, which is not in general a straightforward task in functional spaces; see, however, James and Hastie (2001) for an interesting adaptation of linear discrimination ideas to a functional setting. Then, the question is whether there exists any functional discriminant method, based on simple ideas, which could play a reference role similar to that of Fisher's method in the finite-dimensional case. The results in this paper suggest (as a partial, not definitive, answer) that the k-NN method could represent a "default standard" in functional settings.

Another difference, particularly important from the theoretical point of view, concerns the universal consistency of the k-NN classifier. A classical result by Stone (1977) establishes that in the finite-dimensional case (with X_i ∈ R^d) the conditional error of the k-NN classifier,

L_n = P{g_n(X) ≠ Y | X_n},   (3)

converges in probability (and also in mean) to that of the Bayes (optimal) rule g*, that is, E(L_n) → L* = P{g*(X) ≠ Y}, provided that k_n → ∞ and k_n/n → 0 as n → ∞. This result holds universally, that is, irrespective of the distribution of the variable (X, Y). The interesting point here is that this universal consistency result is no longer valid in the infinite-dimensional setting.
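Stone's conditions on the smoothing sequence are easy to satisfy in practice. The short sketch below (our own illustration, not from the paper) checks a standard choice, k_n ≈ √n, against both requirements:

```python
import math

def k_of(n: int) -> int:
    """A standard smoothing choice satisfying Stone's conditions:
    k_n -> infinity while k_n / n -> 0 (here k_n is roughly sqrt(n))."""
    return max(1, math.isqrt(n))

sizes = [10, 100, 1_000, 10_000, 100_000]
ks = [k_of(n) for n in sizes]                # grows without bound
ratios = [k / n for k, n in zip(ks, sizes)]  # shrinks toward zero
```

Any sequence with these two limiting properties (n^{1/2}, n^{1/3}, log n, ...) is admissible; the rate only affects finite-sample behavior, not the consistency itself.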
As recently proved by Cérou and Guyader (2006), if the space F where X takes values is a general separable metric space, a non-trivial condition must be imposed on the distribution of (X, Y) in order to ensure the consistency of the k-NN classifier.

The aim of this paper is twofold, with a common focus on the k-NN classifier and in close relation with the above-mentioned two differences between the classification problem in finite and infinite settings. First, on the theoretical side, we take a further look at the consistency theorem in Cérou and Guyader (2006) by giving concrete non-trivial examples where their consistency condition is fulfilled. Second, from a more practical viewpoint, we carry out numerical comparisons (based both on Monte Carlo studies and real data examples) to assess the performance of different functional classifiers, including k-NN.

This paper is organized as follows. In Section 2 the consistency of the functional k-NN classifier is established, as a consequence of Theorem 2 in Cérou and Guyader (2006), for a broad class of Gaussian processes. In Section 3 other functional classifiers recently considered in the literature are introduced and briefly commented on. They are all compared through a simulation study (based on two different models) as well as six real data examples, very much in the spirit of Hand's (2006) paper, where the performance of the classical Fisher's rule was assessed in terms of its discrimination capacity in several randomly chosen data sets.

2. On the consistency of the functional k-NN classifier

In the functional classification problem several auxiliary devices have been used to overcome the extra difficulty posed by the infinite-dimensional nature of the feature space. They include dimension reduction techniques (e.g., James and Hastie 2001, Preda et al.
2007), random projections combined with the use of data-depth measures (Cuevas et al. 2007) and different adaptations to the functional framework of several nonparametric and regression-based methods, including kernel classifiers (Abraham et al. 2006, Biau et al. 2005, Ferraty and Vieu 2003), reproducing kernel procedures (Preda 2007), logistic regression (Müller and Stadtmüller 2005) and multilayer perceptron techniques with functional inputs (Ferré and Villa 2006).

2.1 On the consistency of the functional k-NN classifier

The functional k-NN classifier also belongs to the class of procedures adapted from the usual nonparametric multivariate setup. Nevertheless, unlike most of the above-mentioned functional methodologies, the k-NN procedure works according to exactly the same principles in the finite- and infinite-dimensional cases. It is defined by g_n(x) = 1_{η_n(x) > 1/2}, where η_n is the k-NN regression estimator (2), whose definition is formally identical to that of the finite-dimensional case. The intuitive interpretation is also the same in both cases. No previous data manipulation, projection or dimension reduction technique is required in principle, apart from the discretization process necessarily involved in the practical handling of functional data.

In the present section we offer some concrete examples where the k-NN functional classifier is weakly consistent. As we have mentioned in the previous section, this is a non-trivial point since the k-NN classifier is no longer universally consistent in the case of infinite-dimensional inputs X. Throughout this section the feature space where the variable X takes values is a separable metric space (F, D). We will denote by P_X the distribution of X, defined by P_X(B) = P{X ∈ B} for B ∈ B_F, where B_F are the Borel sets of F.
Let us now consider the following regularity assumption on the regression function η(x) = E(Y | X = x):

(BC) Besicovitch condition:

lim_{δ→0} (1 / P_X(B_{X,δ})) ∫_{B_{X,δ}} η(z) dP_X(z) = η(X)   in probability,

where B_{x,δ} := {z ∈ F : D(x, z) ≤ δ} is the closed ball with center x and radius δ.

Under (BC), Cérou and Guyader (2006, Th. 2) obtain the following consistency result.

Denote by L_n and L*, respectively, the conditional error associated with the above defined k-NN classifier and the Bayes (optimal) error for the problem at hand. If (F, D) is separable and condition (BC) is fulfilled, then the k-NN classifier is weakly consistent, that is, E(L_n) → L* as n → ∞, provided that k → ∞ and k/n → 0.

The Besicovitch condition also plays an important role in the consistency of kernel rules (see Abraham et al. 2006). Cérou and Guyader (2006) have also considered the following more convenient condition (called P_X-continuity) that ensures (BC): for every ε > 0 and for P_X-a.e. x ∈ F,

lim_{δ→0} P_X{z ∈ F : |η(z) − η(x)| > ε | D(x, z) < δ} = 0.

However, for our purposes, it will be sufficient to observe that the continuity (P_X-a.e.) of η(x) also implies (BC). We are interested in finding families of distributions of (X, Y) under which the regression function η(x) is continuous (P_X-a.e.) and hence (BC) holds.

From now on we will use the following notation. Let μ_i be the distribution of X conditional on Y = i, that is, μ_i(B) = P{X ∈ B | Y = i}, for B ∈ B_F and i = 0, 1. We denote by S_i ⊂ F the support of μ_i, for i = 0, 1, and S = S_0 ∩ S_1. The expression μ_0 << μ_1 will denote that μ_0 is absolutely continuous with respect to μ_1. Also, we will assume that p = P{Y = 0} fulfills p ∈ (0, 1). The following theorem shows that the property of continuity (resp.
P_X-continuity) of η(x), and hence the weak consistency of the k-NN classifier, follows from the continuity (resp. P_X-continuity) of the Radon-Nikodym derivative of μ_0 with respect to μ_1, provided that it exists.

Theorem 1: Assume that P_X(∂S) = 0 and that μ_0 << μ_1 and μ_1 << μ_0 on S. Then the following inequality holds for P_X-a.e. x, z ∈ F:

|η(z) − η(x)| ≤ (p / (1 − p)) |dμ_0/dμ_1 (x) − dμ_0/dμ_1 (z)|,

where dμ_0/dμ_1 denotes the Radon-Nikodym derivative of μ_0 with respect to μ_1. When S_0 = S_1 = S the assumption P_X(∂S) = 0 may be dropped.

In particular, η is continuous P_X-a.e. (resp. P_X-continuous) whenever dμ_0/dμ_1 is continuous P_X-a.e. (resp. P_X-continuous). Of course, a similar result holds by interchanging the sub-indices 0 and 1 and replacing p by 1 − p.

Proof: Define μ = μ_0 + μ_1. Then μ_i << μ, for i = 0, 1, and we can define the Radon-Nikodym derivatives f_i = dμ_i/dμ, for i = 0, 1. From the definition of the conditional expectation we know that η(x) = E(Y | X = x) = P(Y = 1 | X = x) can be expressed as

η(x) = f_1(x)(1 − p) / [f_0(x) p + f_1(x)(1 − p)].   (4)

Observe that μ restricted to S^c ∩ S_i coincides with μ_i restricted to S^c ∩ S_i, and thus f_i = 1_{S^c ∩ S_i} on S^c ∩ S_i, for i = 0, 1. Since μ_0 << μ_1 and μ_1 << μ_0 on S then, on this set, we can define the Radon-Nikodym derivatives dμ_0/dμ_1 and dμ_1/dμ_0. In this case, it also holds that μ << μ_i on S, for both i = 0, 1, and

dμ/dμ_i (x) = 1 + dμ_{1−i}/dμ_i (x)   for any x ∈ S.

Then (see, e.g., Folland 1999), for i = 0, 1 and for P_X-a.e. x ∈ S,

f_i(x) = dμ_i/dμ (x) = (dμ/dμ_i (x))^{−1} = 1 / (1 + dμ_{1−i}/dμ_i (x)).   (5)

Substituting (5) into expression (4) we get

η(x) = 0,                                      if x ∈ S_0 ∩ S_1^c,
η(x) = 1,                                      if x ∈ S_1 ∩ S_0^c,
η(x) = (1 − p) / [p dμ_0/dμ_1 (x) + 1 − p],    if x ∈ S.   (6)

Using this last expression we can see that if P_X(∂S) = 0 and if dμ_0/dμ_1 is continuous P_X-a.e. (resp.
P_X-continuous) on S, then η is also continuous P_X-a.e. (resp. P_X-continuous) on S. To see this it suffices to observe that, for P_X-a.e. x, z ∈ int(S),

|η(z) − η(x)| = |(1 − p)/[p dμ_0/dμ_1 (z) + 1 − p] − (1 − p)/[p dμ_0/dμ_1 (x) + 1 − p]| ≤ (p / (1 − p)) |dμ_0/dμ_1 (x) − dμ_0/dμ_1 (z)|.

To derive the last inequality we have used that, as μ_i, i = 0, 1, are positive measures, the Radon-Nikodym derivative dμ_0/dμ_1 is also non-negative. □

In order to combine Theorem 1 with the consistency result in Cérou and Guyader (2006, Th. 2), we are interested in finding distributions μ_0, μ_1 of an infinite-dimensional random element X such that μ_0 << μ_1 and μ_1 << μ_0 with continuous Radon-Nikodym derivatives. Measures μ_0 and μ_1 satisfying that μ_0 << μ_1 and μ_1 << μ_0 on S are said to be equivalent on S.

Let us denote by (C[a, b], ‖·‖_∞) the metric space of continuous real-valued functions x defined on the interval [a, b], endowed with the supremum norm ‖x‖_∞ = sup{|x(t)| : t ∈ [a, b]}. Also, let C²[a, b] be the space of twice continuously differentiable functions defined on [a, b].

In the next theorem we exhibit a broad class of Gaussian processes fulfilling the conditions of Theorem 2 in Cérou and Guyader (2006). Thus the consistency of the k-NN classifier is guaranteed for them. Key elements in the proof are the results by Varberg (1961) and Jørsboe (1968) providing explicit expressions for the Radon-Nikodym derivative of a Gaussian measure with respect to another one. From the Gaussianity assumption, the model is completely determined by giving the mean and covariance functions.
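Triangular-covariance Gaussian processes are easy to simulate on a grid. For instance, the Ornstein-Uhlenbeck covariance σ² exp(−β|s − t|) is triangular, with u(s) = σ² e^{βs} and v(t) = e^{−βt}, since u(min(s, t)) v(max(s, t)) = σ² e^{−β|s−t|}. The sketch below (our own illustration; the function names are ours) builds such a covariance matrix and draws trajectories by a Cholesky factorization:

```python
import numpy as np

def triangular_cov(u, v, grid):
    """Covariance matrix Gamma(s, t) = u(min(s, t)) * v(max(s, t)) on a grid."""
    S, T = np.meshgrid(grid, grid, indexing="ij")
    return u(np.minimum(S, T)) * v(np.maximum(S, T))

def sample_gp(mean, cov, n_paths, rng):
    """Draw n_paths trajectories of a Gaussian process with the given mean
    vector and covariance matrix (Cholesky with a small jitter for stability)."""
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(cov)))
    return mean + rng.standard_normal((n_paths, len(cov))) @ L.T

# Ornstein-Uhlenbeck example: sigma^2 * exp(-beta|s - t|) = u(min) * v(max)
sigma2, beta = 0.25, 1.0 / 0.3
grid = np.linspace(0.0, 1.0, 51)
cov = triangular_cov(lambda s: sigma2 * np.exp(beta * s),
                     lambda t: np.exp(-beta * t), grid)
```

The particular values sigma2 = 0.25 and beta = 1/0.3 match the covariance later used in the Monte Carlo study of Section 3.2.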
For the sake of a clearer and more systematic presentation, the statement is divided into three parts. The first applies to the case where the mean function of both functional populations, with distributions μ_0 and μ_1 (corresponding to X | Y = 0 and X | Y = 1), is common and the difference between the two processes lies in the covariance functions (which, however, keep a common structure). The second part considers the dual case where the difference lies in the mean functions and the covariance structure is common. Finally, the third part of the theorem generalizes the previous two statements by including the case of different mean and covariance functions.

Theorem 2: Let (F, D) = (C[a, b], ‖·‖_∞) with 0 ≤ a < b < ∞.

a) Assume that X | Y = i, for i = 0, 1, are Gaussian processes on [a, b], whose mean function is zero and with covariance functions Γ_i(s, t) = u_i(min(s, t)) v_i(max(s, t)), for s, t ∈ [a, b], where u_i, v_i, for i = 0, 1, are positive functions in C²[a, b]. Assume also that v_i, for i = 0, 1, and v_1 u_1' − u_1 v_1' are bounded away from zero on [a, b], that u_1 v_1' − u_1' v_1 = u_0 v_0' − u_0' v_0 and that u_1(a) = 0 if and only if u_0(a) = 0. Then dμ_0/dμ_1 is continuous on F.

b) Assume that X | Y = i, for i = 0, 1, are Gaussian processes on [a, b], with equal covariance function Γ(s, t) = u(min(s, t)) v(max(s, t)), for s, t ∈ [a, b], where u, v ∈ C²[a, b] are positive functions and v and v u' − u v' are bounded away from zero on [a, b]. Assume also that the mean function of X | Y = 1 is 0 and that of X | Y = 0 is a function m ∈ C²[a, b] such that m(a) = 0 whenever u(a) = 0. Then dμ_0/dμ_1 is continuous on F.
c) Assume that X | Y = i, for i = 0, 1, are Gaussian processes on [a, b], with mean functions m_i ∈ C²[a, b] and covariance functions Γ_i(s, t) = u_i(min(s, t)) v_i(max(s, t)), for s, t ∈ [a, b], where u_i, v_i, for i = 0, 1, are positive functions in C²[a, b] which fulfill the same conditions imposed in (a). Assume also that m_i(a) = 0 whenever u_i(a) = 0. Then dμ_0/dμ_1 is continuous on F.

Therefore, under the assumptions in either (a), (b) or (c), the k-NN classifier discriminating between μ_0 and μ_1 is weakly consistent when k → ∞ and k/n → 0.

Proof: a) Varberg (1961, Th. 1) shows that, under the assumptions of (a), μ_0 and μ_1 are equivalent measures and the Radon-Nikodym derivative of μ_0 with respect to μ_1 is given by

dμ_0/dμ_1 (x) = C_1 exp{ (1/2) [ C_2 x²(a) + ∫_a^b f(t) d( x²(t) / (v_0(t) v_1(t)) ) ] },   (7)

where

C_1 = [v_0(a) v_1(b) / (v_0(b) v_1(a))]^{1/2}   if u_0(a) = 0,
C_1 = [u_1(a) v_1(b) / (v_0(b) u_0(a))]^{1/2}   if u_0(a) ≠ 0,

C_2 = 0   if u_0(a) = 0,
C_2 = [v_0(a) u_0(a) − u_1(a) v_1(a)] / [v_1(a) v_0(a) u_0(a) u_1(a)]   if u_0(a) ≠ 0,

and

f(s) = [v_1(s) v_0'(s) − v_0(s) v_1'(s)] / [v_1(s) u_1'(s) − u_1(s) v_1'(s)]   for s ∈ [a, b].

Observe that, by the assumptions of the theorem, the function f is differentiable with bounded derivative. Thus f is of bounded variation and it may be expressed as the difference of two bounded positive increasing functions. Therefore the stochastic integral in (7) is well defined and it can be evaluated integrating by parts:

dμ_0/dμ_1 (x) = C_1 exp{ (1/2) [ C_3 x²(a) + C_4 x²(b) − ∫_a^b (x²(t) / (v_0(t) v_1(t))) df(t) ] },

with C_3 = C_2 − f(a)/(v_0(a) v_1(a)) and C_4 = f(b)/(v_0(b) v_1(b)). It is clear that this derivative is a continuous functional of x with respect to the supremum norm.
Now, Theorem 1 implies that η(x) is continuous and, therefore, the Besicovitch condition (BC) holds and, from Theorem 2 in Cérou and Guyader (2006), the k-NN classifier is weakly consistent. Note that the equivalence of μ_0 and μ_1 implies the coincidence of both supports, S_0 = S_1 = S.

b) In Jørsboe (1968), p. 61, it is proved that, under the indicated assumptions, μ_0 and μ_1 are equivalent measures with the following Radon-Nikodym derivative:

dμ_0/dμ_1 (x) = exp{ D_1 + D_2 x(a) + (1/2) ∫_a^b g(t) d( (2x(t) − m(t)) / v(t) ) },

where

D_1 = −(m²(a) / (2 u(a) v(a))) 1_{u(a) > 0},   D_2 = (m(a) / (u(a) v(a))) 1_{u(a) > 0}

and

g(t) = [v(t) m'(t) − m(t) v'(t)] / [v(t) u'(t) − u(t) v'(t)].

Again, integration by parts gives

dμ_0/dμ_1 (x) = exp{ D_3 + (D_2 − g(a)/v(a)) x(a) + (g(b)/v(b)) x(b) − ∫_a^b (x(t)/v(t)) dg(t) },   (8)

with

D_3 = D_1 − (1/2) ∫_a^b g(t) d(m(t)/v(t)).

Thus dμ_0/dμ_1, and hence η, are continuous and the consistency of the k-NN classifier also holds in this case.

c) Let us denote by P_{m,Γ} the distribution of the Gaussian process with mean m and covariance function Γ. Then dμ_0/dμ_1 (x) is continuous since (see, e.g., Folland 1999)

dμ_0/dμ_1 (x) = dP_{m_0,Γ_0}/dP_{m_1,Γ_1} (x) = [dP_{m_0,Γ_0}/dP_{0,Γ_0} (x)] [dP_{0,Γ_0}/dP_{0,Γ_1} (x)] [dP_{0,Γ_1}/dP_{m_1,Γ_1} (x)],   (9)

and, as we have shown in the proofs of (a) and (b), the Radon-Nikodym derivatives on the right-hand side of (9) are all continuous. □

Remark 1 (Application to Ornstein-Uhlenbeck processes). Let X | Y = i, for i = 0, 1, be Gaussian processes on [a, b], with zero mean and covariance functions Γ_i(s, t) = σ_i² exp(−β_i |s − t|), for s, t ∈ [a, b], where β_i, σ_i > 0 for i = 0, 1. Assume that σ_1² β_1 = σ_0² β_0. Then these processes satisfy the assumptions in Theorem 2(a).

Remark 2 (Application to the Brownian motion).
Theorem 2(b) can also be used to consistently discriminate between a Brownian motion without trend (m_0 = 0) and another one with trend (m_1 ≠ 0). It suffices to consider the case where u(t) = t and v ≡ 1.

Remark 3 (On triangular covariance functions). Covariance functions of the type Γ(s, t) = u(min(s, t)) v(max(s, t)), called triangular, have received considerable attention in the literature. For example, Sacks and Ylvisaker (1966) use this condition in the study of optimal designs for regression problems where the errors are generated by a zero-mean process with covariance function K(s, t). It turns out that the Hilbert space with reproducing kernel K plays an important role in the results and, as these authors point out, the norm of this space is particularly easy to handle when K is triangular. On the other hand, Varberg (1964) has given an interesting representation of the processes X(t), 0 ≤ t < b, with zero mean and triangular covariance function, by proving that they can be expressed in the form

X(t) = ∫_0^b W(u) d_u R(t, u),

where W is the standard Wiener process and R = R(t, u) is a function, of bounded variation with respect to u, defined in terms of K.

Remark 4 (On plug-in functional classifiers). The explicit knowledge of the conditional expectation (6) in the cases considered in Theorem 2 could be explored from the statistical point of view, as it suggests the use of "plug-in" classifiers obtained by replacing η(x) in (1) with suitable parametric or semiparametric estimators.

Remark 5 (On equivalent Gaussian measures and their supports). According to a well-known result by Feldman and Hájek, for any given pair of Gaussian processes there is a dichotomy, in such a way that they are either equivalent or mutually singular. In the first case both measures μ_0 and μ_1 have a common support S, so that Theorem 1 is applicable with S = S_0 = S_1.
As for the identification of the support, Vakhania (1975) has proved that if a Gaussian process, with trajectories in a separable Banach space F, is not degenerate (i.e., the distribution of any non-trivial linear continuous functional is not degenerate), then the support of such a process is the whole space F. Again, expression (6) of the regression function η suggests the possibility of investigating nonparametric estimators of the Radon-Nikodym derivative dμ_0/dμ_1, which would in turn provide plug-in versions of the Bayes rule g*(x) = 1_{η(x) > 1/2} with no further assumption on the structure of the involved Gaussian processes, apart from their equivalence.

3. Some numerical comparisons

The aim of this section is to compare (numerically) the performance of several supervised functional classification procedures already introduced in the literature. The procedures are the k-NN rule, computed both with respect to the supremum norm ‖·‖_∞ and the L² norm ‖·‖_2, and the other discrimination rules reviewed in Section 3.1. One of the objectives of this numerical study is to gain some insight into which classification procedures perform well no matter the type of functional data under consideration and could thus be considered a sort of benchmark for the functional discrimination problem. Section 3.2 contains a Monte Carlo study carried out on two different functional data generating models. In Section 3.3 we consider six functional real data sets taken from the literature.

3.1 Other functional classifiers

Here we review other classification techniques that have been used in the literature in the context of functional data. From now on we denote by (t_1, ..., t_N) the nodes where the functional predictor X has been observed.

Partial Least Squares (PLS) classification

Let us first describe the procedure in the context of a multivariate predictor X.
PLS is actually a dimension reduction technique for regression problems with predictor X and a response Y (which in the case of classification takes only two values, 0 or 1, depending on which population the individual comes from). The dimension reduction is carried out by projecting X onto a lower-dimensional space such that the coordinates of the projected X, the PLS coordinates, are uncorrelated with each other and have maximum covariance with Y. Then, if the aim is classification, Fisher's linear discriminant is applied to the PLS coordinates of X (see Barker and Rayens 2003, Liu and Rayens 2007). In the case of a functional predictor X (see Preda et al. 2007), the above described procedure is applied to the discretized version of X, X = (X(t_1), X(t_2), ..., X(t_N)). Here we have chosen the number of PLS directions, among the values 1, ..., 10, by cross-validation.

Reproducing Kernel Hilbert Space (RKHS) classification

We will also define this technique initially for a multivariate predictor X. For simplicity, we will assume that X takes values in [0, 1]^N. Let κ be a function defined on [0, 1]^N × [0, 1]^N. An RKHS with kernel κ is the vector space generated by all finite linear combinations of functions of the form κ_{t*}(·) = κ(t*, ·), for any t* ∈ [0, 1]^N, endowed with the inner product given by ⟨κ_{t*}, κ_{t**}⟩_κ = κ(t*, t**). RKHS are frequently used in the context of machine learning (see Evgeniou et al. 2002, Wahba 2002); for their applications in statistics the reader is referred to the monograph by Berlinet and Thomas-Agnan (2004). In this work we use the Gaussian kernel κ(s, t) = exp(−‖s − t‖²_2 / σ_κ²), where σ_κ > 0 is a fixed parameter. The classification problem is solved by plugging a regression estimator of the type η_n(x) = Σ_{i=1}^n c_i κ(x, X_i) into the Bayes classifier.
When X is a random function, this procedure is applied to the discretized X. The parameters c_i, for i = 1, ..., n, are chosen to minimize the risk functional n^{−1} Σ_{i=1}^n (Y_i − η_n(X_i))² + λ ⟨η, η⟩_κ, where λ > 0 is a penalization parameter. In this work the values of the parameters λ and σ_κ have been chosen by cross-validation via a leave-one-out procedure. According to our results, it seems that the performance of the RKHS methodology is rather sensitive to changes in these parameters and even to the starting point of the leave-one-out procedure mentioned.

Classification via depth measures

The idea is to assign a new observation x to the population, P_0 or P_1, with respect to which x is deeper (see Ghosh and Chaudhuri 2005, Cuevas et al. 2007). From the five functional depth measures considered by Cuevas et al. (2007) we have taken the h-mode depth and the random projection (RP) depth. Specifically, the h-mode depth of x with respect to the population given by the random element X is defined as f_h(x) = E(K_h(‖x − X‖_2)), where K_h(·) = h^{−1} K(·/h), K is a kernel function (here we have taken the Gaussian kernel K(t) = √(2/π) exp(−t²/2)) and h is a smoothing parameter. As the distribution of X is usually unknown, in the simulations we actually use the empirical version of f_h, f̂_h(x) = n^{−1} Σ_{i=1}^n K_h(‖x − X_i‖_2). The smoothing parameter has been chosen as the 20th percentile of the L² distances between the functions in the training sample (see Cuevas et al. 2007).

To compute the RP depth, the training sample X_1, ..., X_n is projected onto a (functional) random direction a (independent of the X_i). The sample depth of an observation x with respect to P_i is defined as the univariate depth of the projection of x onto a with respect to the projected training sample from P_i.
Since a is a random element, this definition leads to a random measure of depth, but a single representative value has been obtained by averaging these random depths over 50 independent random directions (see Cuevas and Fraiman 2008 for a theoretical development of this idea). If we are working with discretized versions (x(t_1), ..., x(t_N)) of the functional data x(t), we may take a according to a uniform distribution on the unit sphere of R^N. This can be achieved, for example, by setting a = Z/‖Z‖, where Z is drawn from a standard Gaussian distribution on R^N.

Moving window rule

The moving window classifier is given by

g_n(x) = 0 if Σ_{i=1}^n 1{Y_i = 0, X_i ∈ B(x, h)} ≥ Σ_{i=1}^n 1{Y_i = 1, X_i ∈ B(x, h)}, and g_n(x) = 1 otherwise,

where h = h_n > 0 is a smoothing parameter. This classification rule was considered in the functional setting, for instance, by Abraham et al. (2006). In this work the parameter h has again been chosen via cross-validation.

3.2. Monte Carlo results

In this section we study two functional data models already considered by other authors. More specifically, in Model 1, similar to one used in Cuevas et al. (2007), X | Y = i is a Gaussian process with mean m_i(t) = 30(1 − t)^{1.1^i} t^{1.1^{1−i}} and covariance function Γ_i(s, t) = 0.25 exp(−|s − t|/0.3), for i = 0, 1. Observe that this model with smooth trajectories satisfies the assumptions in Theorem 2, and thus we would expect the k-NN classification rule (with respect to the ‖·‖_∞ norm) to perform nicely. Let us note that the value 1.1 in the exponent of m_i(t) is in fact the one used in Model 1, p. 487, of Cuevas et al. (2007), although in their work a 1.2 was misprinted instead. Model 2 appears in Preda et al. (2007), but here the functions h_i, used to define the mean, have been rescaled to have domain [0, 1].
The trajectories of X | Y = i are given by

X_i(t) = U h_1(t) + (1 − U) h_{i+2}(t) + ε(t), for i = 0, 1,    (10)

where U is uniformly distributed on [0, 1], h_1(t) = 2 max(3 − 5|2t − 1|, 0), h_2(t) = h_1(t − 1/5), h_3(t) = h_1(t + 1/5) and ε(t) is an approximation to continuous-time white noise. In practice, this means that in the discretized approximations (X(t_1), ..., X(t_N)) of X(t), the variables ε(t_1), ..., ε(t_N) are independently drawn from a standard normal distribution.

The simulation results are summarized in Tables 1 and 2. The number of equispaced nodes where the functional data have been evaluated is the same for both models, 51. The number of Monte Carlo runs is 100. In every run we generated two training samples (from X | Y = 0 and X | Y = 1, respectively), each with sample size 100, and we also generated a test sample of size 50 from each of the two populations. The tables display the descriptive statistics of the proportion of correctly classified observations from these test samples.

                 k-NN|∞   k-NN|2   PLS      RKHS     h-modal  RP(hM)   MWR
Minimum          0.6200   0.6600   0.6000   0.4800   0.6400   0.5400   0.6600
First quartile   0.8000   0.8000   0.8000   0.6600   0.8000   0.7800   0.8000
Median           0.8400   0.8400   0.8400   0.8400   0.8400   0.8400   0.8400
Mean             0.8396   0.8354   0.8371   0.7999   0.8409   0.8260   0.8393
Third quartile   0.8800   0.8800   0.8800   0.9400   0.8800   0.8800   0.8800
Maximum          0.9800   0.9600   0.9800   1.0000   0.9800   0.9800   1.0000
Std. deviation   0.0603   0.0572   0.0668   0.1457   0.0589   0.0725   0.0634

Table 1: Simulation results for Model 1

Regarding Model 1, observe that there is little difference between the correct classification rates of any of the methods, except for the RKHS procedure, which performs worse. In Model 2 the PLS, RKHS and h-modal methods slightly outperform the others.
                 k-NN|∞   k-NN|2   PLS      RKHS     h-modal  RP(hM)   MWR
Minimum          0.8400   0.8400   0.8800   0.8400   0.8600   0.8400   0.8200
First quartile   0.9200   0.9400   0.9600   0.9600   0.9400   0.9400   0.9400
Median           0.9600   0.9600   0.9800   0.9800   0.9800   0.9600   0.9600
Mean             0.9522   0.9558   0.9686   0.9688   0.9657   0.9522   0.9570
Third quartile   0.9800   0.9800   0.9800   1.0000   1.0000   0.9800   0.9800
Maximum          1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
Std. deviation   0.0335   0.0355   0.0279   0.0313   0.0308   0.0345   0.0349

Table 2: Simulation results for Model 2

When the Monte Carlo study with this model was carried out, we also applied the k-NN classification procedures to a spline-smoothed version of the X trajectories. The result was that the mean correct classification rate increased to 0.9582 in the case of the supremum norm and to 0.9624 in the case of the L₂ norm. This, together with the analysis of the flies data in the next subsection, seems to suggest that, when the curves X are irregular, smoothing these functions will enhance the k-NN discrimination procedure.

3.3. Some comparisons based on real data sets

3.3.1. Brief description of the data sets

Berkeley Growth Data: The Berkeley Growth Study (Tuddenham and Snyder 1954) recorded the heights of n_0 = 54 girls and n_1 = 39 boys between the ages of 1 and 18 years. Heights were measured at 31 ages for each child. These data have been previously analyzed by Ramsay and Silverman (2002).

ECG data: These are electrocardiogram (ECG) data, studied by Wei and Keogh (2006), from the MIT-BIH Arrhythmia database (see Goldberger et al. 2000). Each observation contains the successive measurements recorded by one electrode during one heartbeat and was normalized and rescaled to have length 85. A group of cardiologists has assigned a label of normal or abnormal to each data record.
Due to computational limitations, we have randomly chosen only 200 observations from each group out of the original 2026 records in the data set.

MCO data: The variable under study is the mitochondrial calcium overload (MCO), measured every 10 seconds during an hour in isolated mouse cardiac cells. The data come from research conducted by Dr. David García-Dorado at the Vall d'Hebron Hospital (see Ruiz-Meana et al. 2003, Cuevas, Febrero and Fraiman 2004, 2007). In order to assess whether a certain drug increased the MCO level, a sample of functions of size n_0 = 45 was taken from a control group and n_1 = 44 functions were sampled from the treatment group.

Spectrometric data: For each of 215 pieces of meat, a spectrometer provided the absorbance attained at 100 different wavelengths (see Ferraty and Vieu 2006 and references therein). The fat content of the meat was also obtained via chemical processing and each of the meat pieces was classified as low- or high-fat.

Phoneme data: The X variable is the log-periodogram (discretized to 150 nodes) of a phoneme. The two populations correspond to the phonemes "aa" and "ao", respectively (see more information in Ferraty and Vieu 2006). We have considered a sample of 100 observations from each phoneme.

Medflies data: This data set was obtained by Prof. Carey from U.C. Davis (see Carey et al. 1998) and has been studied, for instance, by Müller and Stadtmüller (2005). The predictor X is the number of eggs laid daily by a Mediterranean fruit fly over a 30-day period. The fly is classified as long-lived if its remaining lifetime past 30 days is more than 14 days, and short-lived otherwise. The numbers of long- and short-lived flies observed were 256 and 278, respectively.

3.3.2. Results

We have applied the classification techniques reviewed in Section 3.1 to the real data sets just described.
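For concreteness, one of the benchmarked rules, the k-NN classifier with the supremum metric, together with a leave-one-out cross-validated estimate of its correct classification rate (the kind of figure reported below), can be sketched as follows. This is an illustrative reimplementation on discretized curves, with our own naming, not the code actually used in the study:

```python
import numpy as np

def knn_sup(x, X_train, y_train, k):
    """k-NN classifier for discretized curves, with the supremum distance
    d(x, X_i) = max_t |x(t) - X_i(t)|; ties are broken in favor of class 0."""
    d = np.array([np.max(np.abs(x - Xi)) for Xi in X_train])
    nearest = np.argsort(d)[:k]
    votes = np.sum(np.asarray(y_train)[nearest])
    return int(votes > k / 2)

def loo_rate(X, y, k):
    # leave-one-out cross-validated proportion of correctly classified curves
    n = len(X)
    hits = 0
    for i in range(n):
        X_i = [X[j] for j in range(n) if j != i]
        y_i = [y[j] for j in range(n) if j != i]
        hits += knn_sup(X[i], X_i, y_i, k) == y[i]
    return hits / n
```

Replacing the supremum distance by np.linalg.norm(x - Xi) gives the k-NN | 2 variant of the table below.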
While carrying out the simulations of Subsection 3.2, we observed that the performance of the RKHS procedure was very dependent on the initial values of the parameters σ_κ and λ provided to the cross-validation algorithm. In fact, finding initial values for these parameters that would finally yield competitive results with respect to the other methods took a considerable time. Thus we decided to exclude the RKHS classification method from the study with real data.

We have computed, via a cross-validation procedure, the mean correct classification rates attained by the different discrimination methods on the real data sets. In Table 3 we display the results. Since the egg-laying trajectories in the medflies data set were very irregular and spiky, we have computed the correct classification rate for both the original data and a smoothed version obtained with splines. The smoothing leads to a better performance of the k-NN procedure with the supremum metric, just as happened in the simulations with Model 2.

As a conclusion we would say that the k-NN classification methodology with respect to the L_∞ norm is always among the best performing ones when the X trajectories are smooth. The k-NN procedure with respect to the L₂ norm and the PLS methodology also give good results, although the latter has the drawback of a much higher computation time.

Data set                   k-NN|∞   k-NN|2   PLS      h-modal  RP(hM)   MWR
Growth                     0.9462   0.9677   0.9462   0.9462   0.9462   0.9570
ECG                        0.9900   0.9950   0.9825   0.9900   0.8575   0.8850
MCO                        0.8427   0.8315   0.8876   0.7640   0.7079   0.6854
Spectrometric              0.9070   0.8558   0.9163   0.6791   0.6930   0.6558
Phoneme                    0.7300   0.7800   0.7400   0.7300   0.7450   0.6950
Medflies (non-smoothed)    0.5468   0.5412   0.5262   0.4925   0.5056   0.5431
Medflies (smoothed)        0.5712   0.5431   0.5094   0.5075   0.5543   0.5206

Table 3: Mean correct classification rates for the real data sets

References

Abraham, C., Biau, G. and Cadre, B. (2006).
On the kernel rule for function classification. Annals of the Institute of Statistical Mathematics 58, 619–633.

Barker, M. and Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics 17, 166–173.

Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers.

Biau, G., Bunea, F. and Wegkamp, M. (2005). Functional classification in Hilbert spaces. IEEE Transactions on Information Theory 51, 2163–2172.

Carey, J.R., Liedo, P., Müller, H.G., Wang, J.L. and Chiou, J.M. (1998). Relationship of age patterns of fecundity to mortality, longevity, and lifetime reproduction in a large cohort of Mediterranean fruit fly females. Journal of Gerontology, Ser. A 53, 245–251.

Cérou, F. and Guyader, A. (2006). Nearest neighbor classification in infinite dimension. ESAIM: Probability and Statistics 10, 340–355.

Cuevas, A., Febrero, M. and Fraiman, R. (2004). An ANOVA test for functional data. Computational Statistics and Data Analysis 47, 111–122.

Cuevas, A., Febrero, M. and Fraiman, R. (2007). Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics 22, 481–496.

Cuevas, A. and Fraiman, R. (2008). On depth measures and dual statistics. A methodology for dealing with general data. Manuscript.

Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag.

Evgeniou, T., Poggio, T., Pontil, M. and Verri, A. (2002). Regularization and statistical learning theory for data analysis. Computational Statistics and Data Analysis 38, 421–432.

Ferraty, F. and Vieu, P. (2003). Curves discrimination: A nonparametric functional approach. Computational Statistics and Data Analysis 44, 161–173.

Ferraty, F. and Vieu, P. (2006). Nonparametric Modelling for Functional Data. Springer.

Ferré, L. and Villa, N. (2006).
Multilayer perceptron with functional inputs: an inverse regression approach. Scandinavian Journal of Statistics 33, 807–823.

Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179–188.

Folland, G.B. (1999). Real Analysis. Modern Techniques and Their Applications. Wiley.

Ghosh, A.K. and Chaudhuri, P. (2005). On maximal depth and related classifiers. Scandinavian Journal of Statistics 32, 327–350.

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P., Mark, R., Mietus, J., Moody, G., Peng, C. and Stanley, H.E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101, 215–220.

Hand, D.J. (1997). Construction and Assessment of Classification Rules. Wiley.

Hand, D.J. (2006). Classifier technology and the illusion of progress. Statistical Science 21, 1–14.

Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer.

James, G.M. and Hastie, T.J. (2001). Functional linear discriminant analysis for irregularly sampled curves. Journal of the Royal Statistical Society, Ser. B 63, 533–550.

Jørsboe, O.G. (1968). Equivalence or Singularity of Gaussian Measures on Function Spaces. Various Publications Series, No. 4, Matematisk Institut, Aarhus Universitet, Aarhus.

Liu, Y. and Rayens, W. (2007). PLS and dimension reduction for classification. Computational Statistics 22, 189–208.

Müller, H.G. and Stadtmüller, U. (2005). Generalized functional linear models. The Annals of Statistics 33, 774–805.

Preda, C. (2007). Regression models for functional data by reproducing kernel Hilbert spaces methods. Journal of Statistical Planning and Inference 137, 829–840.

Preda, C., Saporta, G. and Lévéder, C. (2007). PLS classification of functional data. Computational Statistics 22, 223–235.

Ramsay, J.O. and Silverman, B.W. (2002).
Applied Functional Data Analysis. Methods and Case Studies. Springer-Verlag.

Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis. Second edition. Springer.

Ruiz-Meana, M., García-Dorado, D., Pina, P., Inserte, J., Agulló, L. and Soler-Soler, J. (2003). Cariporide preserves mitochondrial proton gradient and delays ATP depletion in cardiomyocytes during ischemic conditions. American Journal of Physiology - Heart and Circulatory Physiology 285, 999–1006.

Sacks, J. and Ylvisaker, N.D. (1966). Designs for regression problems with correlated errors. Annals of Mathematical Statistics 37, 66–89.

Stone, C.J. (1977). Consistent nonparametric regression. The Annals of Statistics 5, 595–645.

Tuddenham, R.D. and Snyder, M.M. (1954). Physical growth of California boys and girls from birth to eighteen years. University of California Publications in Child Development 1, 183–364.

Vakhania, N.N. (1975). The topological support of Gaussian measure in Banach space. Nagoya Mathematical Journal 57, 59–63.

Varberg, D.E. (1961). On equivalence of Gaussian measures. Pacific Journal of Mathematics 11, 751–762.

Varberg, D.E. (1964). On Gaussian measures equivalent to Wiener measure. Transactions of the American Mathematical Society 113, 262–273.

Wahba, G. (2002). Soft and hard classification by reproducing kernel Hilbert space methods. Proceedings of the National Academy of Sciences 99, 16524–16530.

Wei, L. and Keogh, E. (2006). Semi-supervised time series classification. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 748–753, Philadelphia, U.S.A.