The optimal assignment kernel is not positive definite

We prove that the optimal assignment kernel, proposed recently as an attempt to embed labeled graphs and more generally tuples of basic data to a Hilbert space, is in fact not always positive definite.

Authors: Jean-Philippe Vert (Centre for Computational Biology, Mines ParisTech)

Jean-Philippe Vert
Centre for Computational Biology, Mines ParisTech
Jean-Philippe.Vert@mines.org
October 27, 2018

1 Introduction

Let $\mathcal{X}$ be a set, and $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ a symmetric function that satisfies, for any $n \in \mathbb{N}$ and any $(a_1, \ldots, a_n) \in \mathbb{R}^n$ and $(x_1, \ldots, x_n) \in \mathcal{X}^n$:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j k(x_i, x_j) \geq 0 .$$

Such a function is called a positive definite kernel on $\mathcal{X}$. A famous result by [1] states the equivalence between the definition of a positive definite kernel and the embedding of $\mathcal{X}$ in a Hilbert space, in the sense that $k$ is a positive definite kernel on $\mathcal{X}$ if and only if there exists a Hilbert space $\mathcal{H}$ with inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ and a mapping $\Phi : \mathcal{X} \to \mathcal{H}$ such that, for any $x, x' \in \mathcal{X}$, it holds that:

$$k(x, x') = \langle \Phi(x), \Phi(x') \rangle_{\mathcal{H}} . \quad (1)$$

The construction of positive definite kernels on various sets $\mathcal{X}$ has recently received a lot of attention in statistics and machine learning, because such kernels allow the use of a variety of algorithms for pattern recognition, regression or outlier detection for sets of points in $\mathcal{X}$ [5, 3]. These algorithms, collectively referred to as kernel methods, can be thought of as multivariate linear methods performed in the Hilbert space implicitly defined by any positive definite kernel $k$ through (1), because they access data only through inner products, hence through the kernel. This "kernel trick" allows, for example, supervised classification or regression on strings or graphs with state-of-the-art statistical methods, as soon as a positive definite kernel for strings or graphs is defined.
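The defining inequality can be tested empirically on any finite sample: the quadratic form above is nonnegative for all coefficient vectors exactly when the Gram matrix $K_{ij} = k(x_i, x_j)$ has no negative eigenvalue. A minimal sketch of such a check (using numpy; the helper names are ours, not from the paper):

```python
import numpy as np

def gram_matrix(k, xs):
    """Gram matrix K[i, j] = k(xs[i], xs[j]) for a symmetric function k."""
    n = len(xs)
    return np.array([[k(xs[i], xs[j]) for j in range(n)] for i in range(n)])

def min_eigenvalue(k, xs):
    """Smallest eigenvalue of the Gram matrix; a negative value means the
    quadratic form in the definition goes negative, so k is not positive
    definite on this sample."""
    return float(np.linalg.eigvalsh(gram_matrix(k, xs)).min())

# The Gaussian RBF kernel on R is positive definite, so every Gram matrix
# is positive semidefinite (up to numerical round-off):
rbf = lambda x, y: np.exp(-(x - y) ** 2)
print(min_eigenvalue(rbf, [0.0, 0.5, 1.0, 2.0]) >= -1e-10)  # True
```

Such a check can only certify failure of positive definiteness (one bad sample suffices), never success, since the definition quantifies over all finite samples.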
Unsurprisingly, this has triggered a lot of activity focused on the design of specific positive definite kernels for specific data, such as strings and graphs, for applications in bioinformatics and natural language processing [4].

Motivated by applications in computational chemistry, [2] recently proposed a kernel for labeled graphs, and more generally for structured data that can be decomposed into subparts. The kernel, called the optimal assignment kernel, measures the similarity between two data points by performing an optimal matching between the subparts of both points. It translates a natural notion of similarity between graphs, and can be computed efficiently with the Hungarian algorithm. However, we show below that it is in general not positive definite, which suggests that special care may be needed before using it with kernel methods.

It should be pointed out that not being positive definite is not necessarily a big issue for the use of this kernel in practice. First, it may in fact be positive definite when restricted to the particular set of data used in a practical experiment. Second, other non positive definite kernels, such as the sigmoid kernel, have been shown to be very useful and efficient in combination with kernel methods. Third, practitioners of kernel methods have developed a variety of strategies to limit the possible dysfunction of kernel methods when non positive definite kernels are used, such as projecting the Gram matrix of pairwise kernel values onto the set of positive semidefinite matrices before processing it. The good results reported on several chemoinformatics benchmarks in [2] indeed confirm the usefulness of the method. Hence our message in this note is certainly not to criticize the use of the optimal assignment kernel in the context of kernel methods.
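The projection step mentioned above, replacing an indefinite Gram matrix by its nearest positive semidefinite matrix, amounts to clipping negative eigenvalues to zero. A minimal sketch (using numpy; this illustrates the generic repair strategy, not a procedure from [2]):

```python
import numpy as np

def project_psd(K):
    """Project a symmetric matrix onto the PSD cone by clipping negative
    eigenvalues to zero (nearest PSD matrix in Frobenius norm)."""
    w, V = np.linalg.eigh(K)                    # K = V diag(w) V^T
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

# An indefinite symmetric matrix (eigenvalues 3 and -1):
K = np.array([[1.0, 2.0], [2.0, 1.0]])
K_psd = project_psd(K)
print(np.linalg.eigvalsh(K_psd).min() >= -1e-12)  # True
```

The projected matrix can then be fed to any kernel method expecting a valid Gram matrix, at the cost of slightly distorting the pairwise similarities.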
Instead, we wish to warn that in some cases negative eigenvalues may appear in the Gram matrix and specific care may be needed, and simultaneously to contribute to limiting the propagation of errors in the scientific literature.

2 Main result

Let us first define formally the optimal assignment kernel of [2]. We assume given a set $\mathcal{X}'$, endowed with a positive definite kernel $k_1$ that takes only nonnegative values. The objects we consider are tuples of elements of $\mathcal{X}'$, i.e., an object $x$ decomposes as $x = (x_1, \ldots, x_n)$, where $n$ is the length of the tuple $x$, denoted $|x|$, and $x_1, \ldots, x_n \in \mathcal{X}'$. We denote by $\mathcal{X}$ the set of all tuples of elements of $\mathcal{X}'$. Let $S_n$ be the symmetric group, i.e., the set of permutations of $n$ elements. We now recall the kernel on $\mathcal{X}$ proposed in [2]:

Definition 1. The optimal assignment kernel $k_A : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is defined, for any $x, y \in \mathcal{X}$, by:

$$k_A(x, y) = \begin{cases} \max_{\pi \in S_{|y|}} \sum_{i=1}^{|x|} k_1(x_i, y_{\pi(i)}) & \text{if } |y| \geq |x| , \\ \max_{\pi \in S_{|x|}} \sum_{j=1}^{|y|} k_1(x_{\pi(j)}, y_j) & \text{otherwise.} \end{cases}$$

We can now state our main theorem.

Theorem 1. The optimal assignment kernel is not always positive definite.

Before proving this result we can make a few comments.

Remark 1. The meaning of the statement "not always" in Theorem 1 is that there exist choices of $\mathcal{X}'$ and $k_1$ such that the optimal assignment kernel is positive definite, while there also exist choices for which it is not positive definite.

Remark 2. Theorem 1 contradicts Theorem 2.3 in [2], which claims that the optimal assignment kernel is always positive definite. The proof of Theorem 2.3 in [2], however, contains the following error. Using the notations of [2], the authors define in the course of their proof the values $A := 2 \sum_{j=1}^{n} v_{n+1} v_j K_{n+1,j}$ and $B := \sum_{j=1}^{n} \left( v_{n+1}^2 K_{n+1,n+1} + v_j^2 K_{jj} \right)$. They show that $A \leq B$, on the one hand, and that $A < 0$, on the other hand.
From this they conclude that $B < 0$, which is obviously not a valid logical conclusion.

In order to prove Theorem 1, we now provide an example of a pair $(\mathcal{X}', k_1)$ that leads to a positive definite optimal assignment kernel, and another example that leads to the opposite conclusion.

Lemma 1. Let $\mathcal{X}' = \{1\}$ be a singleton, and $k_1(1, 1) = 1$. Then the optimal assignment kernel is positive definite.

Proof. When $\mathcal{X}' = \{1\}$, the tuples are simply repeats of the unique element, hence each element $x = (1, \ldots, 1) \in \mathcal{X}$ is uniquely defined by its length $|x| \in \mathbb{N}$. The optimal assignment kernel is then given by:

$$k_A(x, y) = \min(|x|, |y|) .$$

The function $\min(a, b)$ is known to be positive definite on $\mathbb{N}$, therefore $k_A$ is a valid kernel on $\mathcal{X}$.

Lemma 2. Let $\mathcal{X}' = \mathbb{R}^2$ and $k_1(x, y) = \exp\left(-\gamma \|x - y\|^2\right)$, for $x, y \in \mathbb{R}^2$ and $\gamma > 0$. Then the optimal assignment kernel is not positive definite.

Proof. The function $k_1$ defined in Lemma 2 is the well-known Gaussian radial basis function kernel, which is known to be positive definite and only takes nonnegative values, hence it satisfies all hypotheses needed in the definition of the optimal assignment kernel. In order to show that the latter is not positive definite, we exhibit a set of points in $\mathcal{X}$ that cannot be embedded in a Hilbert space through (1). For this let us start with four points that form a square in $\mathcal{X}'$, e.g., $A = (0, 0)$, $B = (1, 0)$, $C = (1, 1)$ and $D = (0, 1)$ (Figure 1).

Figure 1: Four points in $\mathcal{X}' = \mathbb{R}^2$, endowed with the positive definite kernel $k_1(x, y) = \exp\left(-\gamma \|x - y\|^2\right)$.

Denoting $a := \exp(-\gamma)$, we directly obtain from the definition of $k_1$ that:

$$\begin{cases} k_1(A, A) = k_1(B, B) = k_1(C, C) = k_1(D, D) = 1 , \\ k_1(A, B) = k_1(B, C) = k_1(C, D) = k_1(D, A) = a , \\ k_1(A, C) = k_1(B, D) = a^2 . \end{cases}$$
In the space $\mathcal{X}$ of tuples, let us now consider the six 2-tuples obtained by taking all pairs of distinct points: $AB, AC, AD, BC, BD, CD$. Using the definition of the optimal assignment kernel, $k_A(uv, wt) = \max\left(k_1(u, w) + k_1(v, t),\; k_1(u, t) + k_1(v, w)\right)$ for $u, v, w, t \in \{A, B, C, D\}$, we easily obtain:

$$\begin{cases} k_A(AB, AB) = k_A(AC, AC) = k_A(AD, AD) = k_A(BC, BC) = k_A(BD, BD) = k_A(CD, CD) = 2 , \\ k_A(AB, AC) = k_A(AB, BD) = k_A(BC, BD) = k_A(BC, AC) = k_A(CD, AC) = k_A(CD, BD) = k_A(AD, AC) = k_A(AD, BD) = 1 + a , \\ k_A(AB, BC) = k_A(BC, CD) = k_A(CD, AD) = k_A(AB, AD) = 1 + a^2 , \\ k_A(AB, CD) = k_A(AD, BC) = k_A(AC, BD) = 2a . \end{cases}$$

If $k_A$ were positive definite, then these six 2-tuples could be embedded into a Hilbert space $\mathcal{H}$ by a mapping $\Phi : \mathcal{X} \to \mathcal{H}$ satisfying (1). Let us show that this is impossible. Let $d(x, y) = \|\Phi(x) - \Phi(y)\|_{\mathcal{H}}$ be the Hilbert distance between two points $x, y \in \mathcal{X}$ after their embedding in $\mathcal{H}$. It can be computed from the kernel values by the classical equality:

$$d(x, y)^2 = k_A(x, x) + k_A(y, y) - 2 k_A(x, y) .$$

We first observe that $d(AB, AC)^2 = d(AC, CD)^2 = 2 - 2a$, and $d(AB, CD)^2 = 4 - 4a$. Therefore,

$$d(AB, CD)^2 = d(AB, AC)^2 + d(AC, CD)^2 ,$$

from which we conclude, by the converse of the Pythagorean theorem, that $(AB, AC, CD)$ form a half-square, with hypotenuse $(AB, CD)$. A similar computation shows that $(AB, BD, CD)$ is also a half-square with hypotenuse $(AB, CD)$. Moreover,

$$d(AC, BD)^2 = 4 - 4a = d(AB, CD)^2 ,$$

which shows that the four points $(AB, AC, CD, BD)$ are in fact coplanar and form a square. The same computation, when $AB$ and $CD$ are respectively replaced by $AD$ and $BC$, shows that the four points $(AD, AC, BC, BD)$ are also coplanar and also form a square.
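Before completing the geometric argument, note that the conclusion can already be checked numerically from the kernel values above: for instance with $a = 1/2$, the $6 \times 6$ Gram matrix of the six 2-tuples has a negative eigenvalue. A sketch with numpy (the explicit witness vector is ours, not part of the original proof):

```python
import numpy as np

# Gram matrix of the six 2-tuples (AB, AC, AD, BC, BD, CD), using the
# kernel values computed above, with a = exp(-gamma) set to 0.5.
a = 0.5
p, q, r = 1 + a, 1 + a**2, 2 * a
K = np.array([
    #  AB AC AD BC BD CD
    [2, p, q, q, p, r],   # AB
    [p, 2, p, p, r, p],   # AC
    [q, p, 2, r, p, q],   # AD
    [q, p, r, 2, p, q],   # BC
    [p, r, p, p, 2, p],   # BD
    [r, p, q, q, p, 2],   # CD
])

# The coefficient vector (1, -4, 4, 4, -4, 1) makes the quadratic form
# from the definition of positive definiteness strictly negative:
v = np.array([1.0, -4.0, 4.0, 4.0, -4.0, 1.0])
print(v @ K @ v)                        # -2.0
print(np.linalg.eigvalsh(K).min() < 0)  # True: K is not positive semidefinite
```

A single such sample already proves Theorem 1; the geometric argument below shows in addition that the failure occurs for every $0 < a < 1$.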
Hence all six points can be embedded in 3 dimensions, and the points $(AB, AD, CD, BC)$ are themselves coplanar and must form a rectangle in the plane equidistant from $AC$ and $BD$ (Figure 2). The edges of this rectangle all have the same squared length,

$$d(AB, BC)^2 = d(BC, CD)^2 = d(CD, AD)^2 = d(AD, AB)^2 = 2 - 2a^2 ,$$

so the rectangle is in fact a square, whose diagonal $(AB, CD)$ should have length $\sqrt{4 - 4a^2}$. However, a direct computation gives $d(AB, CD) = \sqrt{4 - 4a}$, which provides a contradiction since $0 < a < 1$. Hence the six points cannot be embedded into a Hilbert space with $k_A$ as inner product, which shows that $k_A$ is not positive definite on $\mathcal{X}$.

Figure 2: The necessary configuration of the six 2-tuples if an embedding with the optimal assignment kernel were possible.

References

[1] N. Aronszajn. Theory of reproducing kernels. Trans. Am. Math. Soc., 68:337–404, 1950.

[2] H. Fröhlich, J. K. Wegner, F. Sieker, and A. Zell. Optimal assignment kernels for attributed molecular graphs. In Proceedings of the 22nd International Conference on Machine Learning, pages 225–232, New York, NY, USA, 2005. ACM Press.

[3] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2002.

[4] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

[5] V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
