Supervised Machine Learning with a Novel Kernel Density Estimator
Yen-Jen Oyang*1, Darby Tien-Hao Chang2, Yu-Yen Ou3, Hao-Geng Hung4, Chih-Peng Wu4, and Chien-Yu Chen5

1 Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 106, Taiwan, R.O.C. yjoyang@csie.ntu.edu.tw
2 Department of Electrical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan, R.O.C. darby@ee.ncku.edu.tw
3 Graduate School of Biotechnology and Bioinformatics, Yuan-Ze University, Chung-Li, 320, Taiwan, R.O.C. yien@saturn.yzu.edu.tw
4 Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 106, Taiwan, R.O.C. hghung@mars.csie.ntu.edu.tw, chinuy@gmail.com
5 Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei, Taiwan, R.O.C. chienyuchen@ntu.edu.tw

Abstract. In recent years, kernel density estimation has been exploited by computer scientists to model machine learning problems. The kernel density estimation based approaches are of interest due to the low time complexity of either $O(n)$ or $O(n\log n)$ for constructing a classifier, where n is the number of sampling instances. Concerning the design of kernel density estimators, one essential issue is how fast the pointwise mean square error (MSE) and/or the integrated mean square error (IMSE) diminish as the number of sampling instances increases. In this article, the kernel function with general form

$$\frac{2\,\Gamma(m/2+1)}{\sqrt{2\pi}\,\pi^{m/2}\,\sigma}\exp\!\left(-\frac{(x_1^2+x_2^2+\cdots+x_m^2)^m}{2\sigma^2}\right),$$

where m is the dimension of the vector space, is employed for generation of the density estimator in a high-dimensional vector space. With the proposed kernel function, it is then feasible to make the pointwise MSE of the density estimator converge at $O(n^{-2/3})$ regardless of the dimension of the vector space, provided that the probability density function at the point of interest meets certain conditions.

Keyterms: kernel density estimation, machine learning, data classification

* To whom correspondence should be addressed. Tel: +886-2-33664888 ext. 431, Fax: +886-2-23688675.

I. Introduction

Kernel density estimation is a problem that has been studied by statisticians for decades [1-4]. In recent years, kernel density estimation has been exploited by computer scientists to model machine learning problems [5-7]. The kernel density estimation based approaches are of interest due to the low time complexity of either $O(n)$ or $O(n\log n)$ for generating an estimator, where n is the number of sampling instances [4]. Furthermore, in comparison with the support vector machine (SVM) [8], a recent study has shown that the kernel density estimation based classifier is capable of delivering the same level of prediction accuracy, while enjoying several distinctive advantages [7]. Therefore, the kernel density estimation based machine learning algorithms may become the favorite choice for contemporary applications that involve large datasets or databases.

Concerning the design of kernel density estimators, one essential issue is how fast the pointwise mean square error (MSE) and/or the integrated mean square error (IMSE) diminish as the number of sampling instances increases.
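For concreteness, the conventional fixed-kernel Gaussian estimator that the following discussion refers to can be sketched in a few lines of Python. This sketch is our illustration rather than part of the original paper, and all function and variable names in it are ours:

```python
import numpy as np

def fixed_gaussian_kde(samples, query, sigma):
    """Conventional fixed-kernel Gaussian density estimate at `query`.

    samples : (n, m) array of sampling instances
    query   : (m,) point of interest
    sigma   : fixed smoothing parameter shared by all instances
    """
    n, m = samples.shape
    sq_dists = np.sum((samples - query) ** 2, axis=1)
    norm = (np.sqrt(2 * np.pi) * sigma) ** m
    return np.mean(np.exp(-sq_dists / (2 * sigma ** 2))) / norm

# Example: estimate a 4-dimensional standard normal density at the origin;
# the true value is (2*pi)**-2, approximately 0.0253.
rng = np.random.default_rng(0)
x = rng.standard_normal((20000, 4))
print(fixed_gaussian_kde(x, np.zeros(4), sigma=0.3))
```

As the next paragraph explains, the pointwise MSE of this kind of estimator degrades rapidly as the dimension m grows, which is the deficiency the proposed estimator is designed to avoid.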
In this respect, the main problem with the conventional kernel density estimators is that the convergence rate of the pointwise MSE becomes extremely slow when the dimension of the dataset is large. For example, with Gaussian kernels, the pointwise MSE of the fixed kernel density estimator converges at $O(n^{-4/(m+4)})$ [4], where m is the dimension of the dataset. Accordingly, the conventional kernel density estimators suffer a serious deficiency in dealing with high-dimensional datasets. Since high-dimensional datasets are common in modern machine learning applications, the design of a novel kernel density estimator that can handle high-dimensional datasets more effectively is essential for exploiting kernel density estimation in modern machine learning applications.

In this article, the kernel function with general form

$$\frac{2\,\Gamma(m/2+1)}{\sqrt{2\pi}\,\pi^{m/2}\,\sigma}\exp\!\left(-\frac{(x_1^2+x_2^2+\cdots+x_m^2)^m}{2\sigma^2}\right),$$

where m is the dimension of the vector space, is employed for generation of the density estimator in a high-dimensional vector space. With the proposed kernel function, it is then feasible to make the pointwise MSE of the density estimator converge at $O(n^{-2/3})$ regardless of the dimension of the vector space, provided that the probability density function at the point of interest meets certain conditions. Just like many conventional kernel density estimators, the proposed kernel density estimator features an average time complexity of $O(n\log n)$ for generating the approximate probability density function. Accordingly, the average time complexity for constructing a classifier with the proposed kernel density estimator is $O(n\log n)$. In [9], the effect of applying the proposed kernel density estimator in a bioinformatics application is addressed.

In the remainder of this paper, Section II presents the novel kernel density estimator proposed in this article. Section III reports the experiments conducted to verify the theorems presented in Section II. Finally, concluding remarks are presented in Section IV.

II. The Proposed Kernel Density Estimator

In this section, we first elaborate the mathematical basis of the novel kernel density estimator proposed in this article. In particular, we show that the pointwise mean square error (MSE) of the basic form of the proposed kernel density estimator converges at $O(n^{-2/3})$, regardless of the dimension of the vector space, where n is the number of instances in the training dataset. Then, we discuss how the proposed kernel density estimator can be exploited in data classification applications.

Since we can always conduct a translation operation with the coordinate system, without loss of generality, we assume in the following discussion that it is the pointwise MSE at the origin of the coordinate system that is of concern. Let $f_X(x_1,x_2,\ldots,x_m)$ denote the probability density function of the distribution of concern in an m-dimensional vector space. Assume that $f_X(x_1,x_2,\ldots,x_m)$ is analytic and $f_X(x_1,x_2,\ldots,x_m)<\infty$ for all $(x_1,x_2,\ldots,x_m)\in\mathbb{R}^m$. Let Z be the random variable that maps a sampling instance $\mathbf{s}_i$ taken from the distribution governed by $f_X$ to $\|\mathbf{s}_i\|^m$, where $\|\mathbf{s}_i\|$ is the distance between the origin and $\mathbf{s}_i$.
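The mapping that defines Z is simple to compute in practice. The following sketch is our illustration, and the names in it are ours rather than the paper's:

```python
import numpy as np

def super_radii(samples):
    """Map each m-dimensional instance s_i to ||s_i||**m,
    i.e., the value of the random variable Z defined above."""
    n, m = samples.shape
    return np.linalg.norm(samples, axis=1) ** m

rng = np.random.default_rng(0)
s = rng.standard_normal((5, 4))
print(super_radii(s))  # five nonnegative scalars
```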
Accordingly, we have the distribution function $F_Z(z)$ of Z equal to

$$F_Z(z)=\idotsint_{x_1^2+x_2^2+\cdots+x_m^2\le z^{2/m}} f_X(x_1,x_2,\ldots,x_m)\,dx_1\,dx_2\cdots dx_m$$

for $z\ge 0$, and $F_Z(z)=0$ for $z<0$.

Theorem 1: Let $f_Z(z)=\lim_{\varepsilon\to 0^+}\frac{F_Z(z+\varepsilon)-F_Z(z)}{\varepsilon}$ for $z\ge 0$. Then, we have

$$f_Z(0)=\frac{\pi^{m/2}}{\Gamma(m/2+1)}\,f_X(\mathbf{0}),$$

where $\Gamma(\cdot)$ is the gamma function [10].

Proof: Since $F_Z(0)=0$, we have

$$f_Z(0)=\lim_{\varepsilon\to 0^+}\frac{F_Z(\varepsilon)-F_Z(0)}{\varepsilon}=\lim_{\varepsilon\to 0^+}\frac{1}{\varepsilon}\idotsint_{x_1^2+\cdots+x_m^2\le\varepsilon^{2/m}} f_X(x_1,x_2,\ldots,x_m)\,dx_1\,dx_2\cdots dx_m.$$

By the Taylor expansion,

$$f_X(x_1,\ldots,x_m)=f_X(\mathbf{0})+\frac{\partial f_X}{\partial x_1}(\mathbf{0})\,x_1+\cdots+\frac{\partial f_X}{\partial x_m}(\mathbf{0})\,x_m+\text{higher-order terms}.$$

Furthermore, in the region where $x_1^2+x_2^2+\cdots+x_m^2\le\varepsilon^{2/m}$, we have $x_1\to 0$, $x_2\to 0$, ..., $x_m\to 0$ as $\varepsilon\to 0$. Therefore,

$$f_Z(0)=\lim_{\varepsilon\to 0^+}\frac{1}{\varepsilon}\cdot\frac{\pi^{m/2}\,(\varepsilon^{1/m})^m}{\Gamma(m/2+1)}\,f_X(\mathbf{0})=\frac{\pi^{m/2}}{\Gamma(m/2+1)}\,f_X(\mathbf{0}),$$

where $\frac{\pi^{m/2}(\varepsilon^{1/m})^m}{\Gamma(m/2+1)}$ is the volume of a sphere with radius $\varepsilon^{1/m}$ in an m-dimensional vector space. □

Theorem 1 implies that we can obtain an estimate of $f_X(\mathbf{0})$ by first obtaining an estimate of $f_Z(0)$. Since $f_Z$ is a univariate probability density function, if we employ a fixed kernel density estimator [4] to estimate $f_Z(0)$, then, as Theorem 2 shows, we can obtain an estimator of $f_X(\mathbf{0})$ with the pointwise MSE converging at $O(n^{-2/3})$.

Theorem 2: Let $\{\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_n\}$ be a set of sampling instances randomly and independently taken from the distribution governed by $f_X$ in the m-dimensional vector space. Assume that $f_Z(z)$ is analytic in $[0,\infty)$ with all orders of right-sided derivatives at 0. Then, with $\sigma=\lambda\cdot n^{-1/3}$ and $\lambda$ being a positive real number,

$$\hat f_X(\mathbf{0})=\frac{1}{n}\sum_{i=1}^{n}\frac{\Gamma(m/2+1)}{\pi^{m/2}}\cdot\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\|\mathbf{s}_i\|^{2m}}{2\sigma^2}\right)$$

is an estimator of $f_X(\mathbf{0})$ with the pointwise MSE converging at $O(n^{-2/3})$.

Proof: Let $z_i=\|\mathbf{s}_i\|^m$ and

$$\hat f_Z(0)=\frac{1}{n}\sum_{i=1}^{n}\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{z_i^2}{2\sigma^2}\right)$$

with $\sigma=\lambda\cdot n^{-1/3}$. We have

$$MSE[\hat f_Z(0)]=\big(E[\hat f_Z(0)]-f_Z(0)\big)^2+Var[\hat f_Z(0)]$$

and

$$E[\hat f_Z(0)]=\int_0^{\infty}\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{z^2}{2\sigma^2}\right)f_Z(z)\,dz.$$

As $n\to\infty$, we have $\sigma\to 0$ and

$$E[\hat f_Z(0)]=\int_0^{\infty}\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{z^2}{2\sigma^2}\right)\big[f_Z(0)+f_Z'(0)\,z+O(z^2)\big]\,dz=f_Z(0)+O\!\left(\frac{2\,f_Z'(0)}{\sqrt{2\pi}}\cdot\sigma\right),$$

where $f_Z'(0)=\lim_{\varepsilon\to 0^+}\frac{f_Z(\varepsilon)-f_Z(0)}{\varepsilon}$. Let

$$\hat f_{Z,1/n}(0)=\frac{1}{n}\cdot\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{z_1^2}{2\sigma^2}\right).$$

We have

$$E[\hat f_{Z,1/n}(0)]=\frac{1}{n}\int_0^{\infty}\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{z^2}{2\sigma^2}\right)f_Z(z)\,dz.$$

Due to $\sigma\to 0$ as $n\to\infty$,

$$E[\hat f_{Z,1/n}^2(0)]=O\!\left(\frac{1}{n^2\sigma^2}\int_0^{\infty}\exp\!\left(-\frac{z^2}{\sigma^2}\right)f_Z(0)\,dz\right)=O\!\left(\frac{f_Z(0)}{n^2\sigma}\right).$$

Therefore, as $n\to\infty$,

$$Var[\hat f_{Z,1/n}(0)]=E[\hat f_{Z,1/n}^2(0)]-\big(E[\hat f_{Z,1/n}(0)]\big)^2=O\!\left(\frac{f_Z(0)}{n^2\sigma}\right).$$

Since $\sigma=\lambda\cdot n^{-1/3}$, as $n\to\infty$, we have $O\big(Var[\hat f_{Z,1/n}(0)]\big)=O(n^{-5/3})$. Furthermore, since $\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_n$ are taken randomly and independently, $Var[\hat f_Z(0)]=n\cdot Var[\hat f_{Z,1/n}(0)]$. Therefore, as $n\to\infty$,

$$O\big(MSE[\hat f_Z(0)]\big)=O\big((E[\hat f_Z(0)]-f_Z(0))^2+Var[\hat f_Z(0)]\big)=O(n^{-2/3}).$$

Now let

$$\hat f_X(\mathbf{0})=\frac{1}{n}\sum_{i=1}^{n}\frac{\Gamma(m/2+1)}{\pi^{m/2}}\cdot\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\|\mathbf{s}_i\|^{2m}}{2\sigma^2}\right)$$

with $\sigma=\lambda\cdot n^{-1/3}$. Then, by Theorem 1, we have

$$MSE\big[\hat f_X(\mathbf{0})\big]=E\Big[\big(\hat f_X(\mathbf{0})-f_X(\mathbf{0})\big)^2\Big]=\left(\frac{\Gamma(m/2+1)}{\pi^{m/2}}\right)^2 E\Big[\big(\hat f_Z(0)-f_Z(0)\big)^2\Big]=\left(\frac{\Gamma(m/2+1)}{\pi^{m/2}}\right)^2 MSE\big[\hat f_Z(0)\big].$$

Since $MSE[\hat f_Z(0)]$ converges at $O(n^{-2/3})$ with $\sigma=\lambda\cdot n^{-1/3}$, $MSE[\hat f_X(\mathbf{0})]$ converges at $O(n^{-2/3})$ as well. □
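To make the estimator of Theorem 2 concrete, the following sketch implements it directly. It is our illustration, with an arbitrary default for λ (the theorem only requires λ > 0):

```python
import numpy as np
from math import gamma, pi, sqrt

def theorem2_estimate_at_origin(samples, lam=0.5):
    """Estimate f_X(0) per Theorem 2: a boundary-corrected Gaussian KDE
    applied to the super-radii z_i = ||s_i||**m, rescaled by
    Gamma(m/2 + 1) / pi**(m/2) per Theorem 1."""
    n, m = samples.shape
    sigma = lam * n ** (-1.0 / 3.0)          # sigma = lambda * n**(-1/3)
    z = np.linalg.norm(samples, axis=1) ** m
    # The factor 2 reflects that f_Z is supported on [0, infinity),
    # so only half of each kernel's mass lies in the support.
    f_z0 = np.mean(2.0 * np.exp(-z ** 2 / (2 * sigma ** 2))) / (sqrt(2 * pi) * sigma)
    return gamma(m / 2 + 1) / pi ** (m / 2) * f_z0

# Sanity check against a 4-D standard normal: f_X(0) = (2*pi)**-2 ~ 0.0253.
rng = np.random.default_rng(0)
x = rng.standard_normal((320000, 4))
print(theorem2_estimate_at_origin(x))
```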
Theorem 3: Let $\{\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_n\}$ be a set of sampling instances randomly and independently taken from the distribution governed by $f_X$ in the m-dimensional vector space. Let

$$\hat f_X(\mathbf{x})=\frac{1}{n}\sum_{i=1}^{n}\frac{\Gamma(m/2+1)}{\pi^{m/2}}\cdot\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\|\mathbf{x}-\mathbf{s}_i\|^{2m}}{2\sigma^2}\right).$$

We have

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\hat f_X(\mathbf{x})\,dx_1\,dx_2\cdots dx_m=1.$$

Proof: In order to prove the claim, we only need to show that the kernel function employed satisfies

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\frac{2\,\Gamma(m/2+1)}{\sqrt{2\pi}\,\pi^{m/2}\,\sigma}\exp\!\left(-\frac{(x_1^2+x_2^2+\cdots+x_m^2)^m}{2\sigma^2}\right)dx_1\,dx_2\cdots dx_m=1.$$

We have

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\!\left(-\frac{(x_1^2+\cdots+x_m^2)^m}{2\sigma^2}\right)dx_1\cdots dx_m=\int_0^{\infty}\frac{2\,\pi^{m/2}\,r^{m-1}}{\Gamma(m/2)}\exp\!\left(-\frac{r^{2m}}{2\sigma^2}\right)dr,$$

where $\frac{2\,\pi^{m/2}\,r^{m-1}}{\Gamma(m/2)}$ is the surface area of a sphere with radius r in an m-dimensional vector space. Let $t=r^m$. Then $dt=m\,r^{m-1}\,dr$. Accordingly,

$$\int_0^{\infty}\frac{2\,\pi^{m/2}\,r^{m-1}}{\Gamma(m/2)}\exp\!\left(-\frac{r^{2m}}{2\sigma^2}\right)dr=\frac{\pi^{m/2}}{\Gamma(m/2+1)}\int_0^{\infty}\exp\!\left(-\frac{t^2}{2\sigma^2}\right)dt=\frac{\pi^{m/2}}{\Gamma(m/2+1)}\cdot\frac{\sqrt{2\pi}\,\sigma}{2}.$$

Therefore, the kernel function integrates to 1, and so does $\hat f_X$. □

The validity of Theorem 2 rests on the assumption that $f_Z(z)$ is analytic in $[0,\infty)$ with all orders of right-sided derivatives at 0. One may wonder how strict this condition is. In this respect, the following illustration should provide some insight. Assume that $f_X(x_1,x_2,\ldots,x_m)$ is a constant function in the proximity of the origin. Then, $f_Z(z)$ is analytic in $[0,\varepsilon]$, where $\varepsilon$ is a small positive real number, with the right-sided derivatives at 0.

The estimator presented in Theorem 2 forms the basis of the novel kernel density estimator proposed in this article. Since both Theorem 1 and Theorem 2 address only the pointwise MSE, for real applications we have incorporated the basic idea of the variable kernel density estimator [2, 4] to generalize the estimator presented in Theorem 2 and obtain the so-called super-radius based kernel density estimator (SRKDE) shown in the following (a code sketch follows the list):

$$\hat f_X^{*}(\mathbf{v})=\frac{\Gamma(m/2+1)}{\pi^{m/2}}\cdot\frac{1}{n}\sum_{i=1}^{n}\frac{2}{\sqrt{2\pi}\,\sigma_i}\exp\!\left(-\frac{\|\mathbf{v}-\mathbf{s}_i\|^{2m}}{2\sigma_i^2}\right),$$

where

1. $\sigma_i=\beta\,[R_k(\mathbf{s}_i)]^m$;
2. $\beta$ is the smoothing parameter with order $O(n^{2/3})$;
3. $R_k(\mathbf{s}_i)$ is the distance from $\mathbf{s}_i$ to its k-th nearest neighbor;
4. $k$ is a parameter to be set.
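A minimal sketch of the SRKDE defined above, assuming a k-d tree for the k-nearest-neighbor queries; the defaults for k and β are placeholders of ours, not values prescribed by the paper beyond the stated order of β:

```python
import numpy as np
from math import gamma, pi, sqrt
from scipy.spatial import cKDTree

def srkde(train, queries, k=10, beta=1.0):
    """Super-radius based KDE with per-instance smoothing parameters
    sigma_i = beta * R_k(s_i)**m, where R_k(s_i) is the distance from
    s_i to its k-th nearest neighbor."""
    n, m = train.shape
    tree = cKDTree(train)
    # k + 1 neighbors because each training instance is its own nearest neighbor
    r_k = tree.query(train, k=k + 1)[0][:, -1]
    sigma = beta * r_k ** m
    const = gamma(m / 2 + 1) / pi ** (m / 2)
    estimates = []
    for v in queries:
        z = np.linalg.norm(v - train, axis=1) ** m   # super-radii about v
        kernel = 2.0 / (sqrt(2 * pi) * sigma) * np.exp(-z ** 2 / (2 * sigma ** 2))
        estimates.append(const * np.mean(kernel))
    return np.array(estimates)
```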
The proposed kernel density estimator is so named because the random variable Z maps a sampling instance $\mathbf{s}_i$ taken from the distribution governed by $f_X$ to $\|\mathbf{s}_i\|^m$, and $\|\mathbf{s}_i\|^m$ is referred to as the super-radius of $\mathbf{s}_i$ in this article. For data classification applications, we construct one SRKDE to approximate the distribution of each class of training instances in the vector space. Then, a query instance located at $\mathbf{v}$ is predicted to belong to the class that gives the maximum value among the likelihood functions defined in the following:

$$L_j(\mathbf{v})=\frac{|S_j|\cdot\hat f_j^{*}(\mathbf{v})}{\sum_h |S_h|\cdot\hat f_h^{*}(\mathbf{v})},$$

where $|S_j|$ is the number of class-j training instances and $\hat f_j^{*}(\mathbf{v})$ is the SRKDE corresponding to the class-j training instances. In our current implementation, aiming to improve the execution time of the classifier, we include only a limited number, denoted by $k'$, of the nearest class-j training instances of $\mathbf{v}$ in computing $\hat f_j^{*}(\mathbf{v})$.

The basic idea of the proposed kernel density estimator can be exploited to obtain a kernel-based approximate function. Let $\{\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_n\}$ be a set of sampling instances randomly and independently taken from the space of function f with a uniform sampling density $\rho$. Then, with $\sigma=\lambda\cdot n^{-1/3}$ and $\lambda$ being a positive real number, the pointwise MSE of the following kernel-based approximate function in an m-dimensional vector space

$$\hat f(\mathbf{0})=\frac{1}{\rho}\sum_{i=1}^{n} f(\mathbf{s}_i)\,\frac{\Gamma(m/2+1)}{\pi^{m/2}}\cdot\frac{2}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\|\mathbf{s}_i\|^{2m}}{2\sigma^2}\right)$$

converges at $O(n^{-2/3})$ regardless of the value of m.

As mentioned earlier, one main distinctive property of the kernel density estimation based approach is that the average time taken to construct a classifier is in the order of $O(n\log n)$, where n is the total number of training instances. This argument is based on the assumption that the k-d tree structure [11] is employed in the implementation. For a detailed analysis of the time complexity, please refer to the discussion presented in [7], which provides the detailed analysis for a similar kernel density estimator. Concerning the execution time for making predictions with n' query instances, it is shown in [7] that the average time complexity is $O(n'\log n)$.
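The classification rule above can be sketched as follows, reusing the srkde function from the previous sketch. Since the denominator of $L_j(\mathbf{v})$ is shared by all classes, it suffices to compare the numerators; the toy interface is ours:

```python
import numpy as np

def classify(train_by_class, v, k=10, beta=1.0):
    """Predict the class of query v as the class j maximizing
    |S_j| * f*_j(v); this is equivalent to maximizing L_j(v) because
    the denominator of L_j(v) is the same for every class."""
    scores = {}
    for label, s_j in train_by_class.items():
        scores[label] = len(s_j) * srkde(s_j, v[None, :], k=k, beta=beta)[0]
    return max(scores, key=scores.get)
```

Note that this sketch evaluates the full SRKDE for each class; the k'-nearest-instance truncation described above is an implementation-level speedup that the sketch omits.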
III. Experimental Results

This section reports the experiments conducted to verify the theorems presented in the previous section. Each dataset used in the experiments contained sampling instances randomly taken from the distribution defined by the following probability density function in the 4-dimensional vector space:

$$\frac{1}{(\sqrt{2\pi})^4}\left\{\frac{11}{20}\exp\!\left(-\frac{(x_1-0.1)^2+x_2^2+x_3^2+x_4^2}{2}\right)+\frac{9}{20}\exp\!\left(-\frac{(x_1+0.1)^2+x_2^2+x_3^2+x_4^2}{2}\right)\right\}.$$

Then, the estimator presented in Theorem 2 with $\sigma$ set to $0.005\cdot\left(\frac{n}{10000}\right)^{-1/3}$ was employed to obtain the estimates at the 5 points listed in Table 1 based on the randomly generated dataset. For the numbers reported in Table 1, the same experimental procedure was repeated 500 times with 500 independently generated datasets, and the observed MSE at each point was computed as follows:

$$\frac{1}{500}\sum_{i=1}^{500}\big(\hat f_i(\mathbf{v})-f(\mathbf{v})\big)^2,$$

where $\hat f_i(\mathbf{v})$ is the estimate of $f(\mathbf{v})$ obtained with the dataset generated in the i-th run of the experiment.

Table 1: The observed MSEs with the estimator presented in Theorem 2 at 5 different points and the respective convergence rates.

| Points | n = 20000 | n = 80000 | n = 320000 | n = 1280000 | c in log MSE = c log n + δ |
|---|---|---|---|---|---|
| (0,0,0,0) | 3.23E-05 | 1.43E-05 | 5.98E-06 | 2.18E-06 | -0.643 |
| (0.05,0,0,0) | 3.45E-05 | 1.48E-05 | 5.12E-06 | 2.50E-06 | -0.644 |
| (0.1,0,0,0) | 2.23E-05 | 8.1E-06 | 3.55E-06 | 1.38E-06 | -0.661 |
| (0,0.1,0,0) | 4.08E-05 | 1.23E-05 | 6.22E-06 | 2.47E-06 | -0.656 |
| (0.05,0.05,0,0) | 3.57E-05 | 1.43E-05 | 5.70E-06 | 2.41E-06 | -0.649 |

The experimental results reported in Table 1 confirm that the pointwise MSE of the estimator presented in Theorem 2 converges at $O(n^{-2/3})$.
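One run of the procedure above can be mirrored as follows, reusing the theorem2_estimate_at_origin sketch from Section II; the point of interest is translated to the origin before estimation, as the without-loss-of-generality argument in Section II permits. The sampling routine reflects our reading of the reconstructed mixture density, so treat it as an illustration rather than the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n):
    """Draw n instances from the 4-dimensional two-component mixture above."""
    x = rng.standard_normal((n, 4))                  # unit-variance components
    x[:, 0] += np.where(rng.random(n) < 11 / 20, 0.1, -0.1)
    return x

# One of the 500 runs: sigma = 0.005 * (n / 10000)**(-1/3), expressed through
# lambda so that lam * n**(-1/3) matches the paper's setting.
n = 20000
point = np.array([0.05, 0.0, 0.0, 0.0])
lam = 0.005 * 10000 ** (1 / 3)
estimate = theorem2_estimate_at_origin(sample_mixture(n) - point, lam=lam)
```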
IV. Conclusion

This article proposes the super-radius based kernel density estimator (SRKDE) and reports the experiments conducted to verify the theorems presented in this article. The major distinction of the SRKDE is that the pointwise MSE of its basic form converges at $O(n^{-2/3})$, where n is the number of instances in the training dataset, regardless of the dimension of the vector space, provided that the probability density function at the point of interest meets certain conditions. Since the average time complexity for constructing an SRKDE based classifier is $O(n\log n)$, it is conceivable that the SRKDE based approach can cope well with contemporary applications that involve a large and ever-growing database, and can deliver ever-improving prediction accuracy as the database continues to grow. On the other hand, because the theorems associated with the SRKDE are all derived with the asymptotic approach, these theorems may not hold well when the number of sampling instances is not sufficiently large. In such cases, the SRKDE based classifier may deliver inferior prediction accuracy in comparison with the state-of-the-art support vector machine (SVM). In summary, the kernel density estimation based approach and the SVM have their respective advantages and disadvantages.

Contributions of Authors

YJO initiated this study, proposed the SRKDE, and established its mathematical foundation. DTHC, YYO, HGH, CPW and CYC jointly implemented the software and designed the experiments reported in this article.

Acknowledgement

The authors greatly appreciate the comments and suggestions provided by Prof. Henry Horng-Shing Lu of National Chiao-Tung University. This research has been supported by the National Science Council of R.O.C. under contracts NSC 95-3114-P-002-005-Y and NSC 96-2627-B-002-003.

References

1. Parzen, E.: On Estimation of a Probability Density Function and Mode. Annals of Mathematical Statistics 33 (1962) 1065-1076
2. Breiman, L., Meisel, W., Purcell, E.: Variable Kernel Estimates of Multivariate Densities. Technometrics 19 (1977) 135-144
3. Sain, S.R., Scott, D.W.: Zero-bias locally adaptive density estimators. Scandinavian Journal of Statistics 29 (2002) 441-460
4. Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, London; New York (1986)
5. Lowe, D.G.: Similarity Metric Learning for a Variable-Kernel Classifier. Neural Computation 7 (1995) 72-85
6. Krzyzak, A., Linder, T., Lugosi, G.: Nonparametric estimation and classification using radial basis function nets and empirical risk minimization. IEEE Transactions on Neural Networks 7 (1996) 475-487
7. Oyang, Y.J., Hwang, S.C., Ou, Y.Y., Chen, C.Y., Chen, Z.W.: Data classification with radial basis function networks based on a novel kernel density estimation algorithm. IEEE Transactions on Neural Networks 16 (2005) 225-236
8. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge; New York (2000)
9. Oyang, Y.J., Chang, D.T.H., Ou, Y.Y., Hung, H.G., Chen, C.Y.: Prediction of Protein Secondary Structures with a Novel Kernel Density Estimator. Proceedings of the 2007 International Conference on Machine Learning, Models, Technologies & Applications, Las Vegas, Nevada, U.S.A. 79-84
10. Artin, E.: The Gamma Function. Holt, Rinehart and Winston, New York (1964)
11. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Communications of the ACM 18 (1975) 509-517