The optimal assignment kernel is not positive definite

We prove that the optimal assignment kernel, proposed recently as an attempt to embed labeled graphs and more generally tuples of basic data to a Hilbert space, is in fact not always positive definite.

Authors: Jean-Philippe Vert (Centre for Computational Biology, Mines ParisTech)

Jean-Philippe Vert
Centre for Computational Biology, Mines ParisTech
Jean-Philippe.Vert@mines.org
October 27, 2018

1 Introduction

Let $\mathcal{X}$ be a set, and $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ a symmetric function that satisfies, for any $n \in \mathbb{N}$ and any $(a_1, \ldots, a_n) \in \mathbb{R}^n$ and $(x_1, \ldots, x_n) \in \mathcal{X}^n$:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j k(x_i, x_j) \geq 0 .$$

Such a function is called a positive definite kernel on $\mathcal{X}$. A famous result by [1] states the equivalence between the definition of a positive definite kernel and the embedding of $\mathcal{X}$ in a Hilbert space, in the sense that $k$ is a positive definite kernel on $\mathcal{X}$ if and only if there exists a Hilbert space $\mathcal{H}$ with inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ and a mapping $\Phi : \mathcal{X} \to \mathcal{H}$ such that, for any $x, x' \in \mathcal{X}$, it holds that:

$$k(x, x') = \langle \Phi(x), \Phi(x') \rangle_{\mathcal{H}} . \quad (1)$$

The construction of positive definite kernels on various sets $\mathcal{X}$ has recently received a lot of attention in statistics and machine learning, because such kernels allow the use of a variety of algorithms for pattern recognition, regression or outlier detection for sets of points in $\mathcal{X}$ [5, 3]. These algorithms, collectively referred to as kernel methods, can be thought of as multivariate linear methods performed in the Hilbert space implicitly defined by any positive definite kernel $k$ through (1), because they access data only through inner products, hence through the kernel. This "kernel trick" allows, for example, supervised classification or regression on strings or graphs with state-of-the-art statistical methods, as soon as a positive definite kernel for strings or graphs is defined.
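The defining inequality can be tested empirically on any finite sample: the quadratic form above is nonnegative for all coefficient vectors exactly when the Gram matrix $K_{ij} = k(x_i, x_j)$ has no negative eigenvalue. A minimal sketch of such a check (using numpy; the helper names are ours, not from the paper):

```python
import numpy as np

def gram_matrix(k, xs):
    """Gram matrix K[i, j] = k(xs[i], xs[j]) for a symmetric function k."""
    n = len(xs)
    return np.array([[k(xs[i], xs[j]) for j in range(n)] for i in range(n)])

def min_eigenvalue(k, xs):
    """Smallest eigenvalue of the Gram matrix; a negative value means the
    quadratic form in the definition goes negative, so k is not positive
    definite on this sample."""
    return float(np.linalg.eigvalsh(gram_matrix(k, xs)).min())

# The Gaussian RBF kernel on R is positive definite, so every Gram matrix
# is positive semidefinite (up to numerical round-off):
rbf = lambda x, y: np.exp(-(x - y) ** 2)
print(min_eigenvalue(rbf, [0.0, 0.5, 1.0, 2.0]) >= -1e-10)  # True
```

Such a check can only certify failure of positive definiteness (one bad sample suffices), never success, since the definition quantifies over all finite samples.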
Unsurprisingly, this has triggered a lot of activity focused on the design of specific positive definite kernels for specific data, such as strings and graphs, for applications in bioinformatics and natural language processing [4].

Motivated by applications in computational chemistry, [2] recently proposed a kernel for labeled graphs, and more generally for structured data that can be decomposed into subparts. The kernel, called the optimal assignment kernel, measures the similarity between two data points by performing an optimal matching between the subparts of both points. It translates a natural notion of similarity between graphs, and can be computed efficiently with the Hungarian algorithm. However, we show below that it is in general not positive definite, which suggests that special care may be needed before using it with kernel methods.

It should be pointed out that not being positive definite is not necessarily a big issue for the use of this kernel in practice. First, it may in fact be positive definite when restricted to the particular set of data used in a practical experiment. Second, other non positive definite kernels, such as the sigmoid kernel, have been shown to be very useful and efficient in combination with kernel methods. Third, practitioners of kernel methods have developed a variety of strategies to limit the possible dysfunction of kernel methods when non positive definite kernels are used, such as projecting the Gram matrix of pairwise kernel values onto the set of positive semidefinite matrices before processing it. The good results reported on several chemoinformatics benchmarks in [2] indeed confirm the usefulness of the method. Hence our message in this note is certainly not to criticize the use of the optimal assignment kernel in the context of kernel methods.
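The projection step mentioned above, replacing an indefinite Gram matrix by its nearest positive semidefinite matrix, amounts to clipping negative eigenvalues to zero. A minimal sketch (using numpy; this illustrates the generic repair strategy, not a procedure from [2]):

```python
import numpy as np

def project_psd(K):
    """Project a symmetric matrix onto the PSD cone by clipping negative
    eigenvalues to zero (nearest PSD matrix in Frobenius norm)."""
    w, V = np.linalg.eigh(K)                    # K = V diag(w) V^T
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

# An indefinite symmetric matrix (eigenvalues 3 and -1):
K = np.array([[1.0, 2.0], [2.0, 1.0]])
K_psd = project_psd(K)
print(np.linalg.eigvalsh(K_psd).min() >= -1e-12)  # True
```

The projected matrix can then be fed to any kernel method expecting a valid Gram matrix, at the cost of slightly distorting the pairwise similarities.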
Instead, we wish to warn that in some cases negative eigenvalues may appear in the Gram matrix and specific care may be needed, and simultaneously to contribute to limiting the propagation of errors in the scientific literature.

2 Main result

Let us first define formally the optimal assignment kernel of [2]. We assume given a set $\mathcal{X}'$, endowed with a positive definite kernel $k_1$ that takes only nonnegative values. The objects we consider are tuples of elements of $\mathcal{X}'$, i.e., an object $x$ decomposes as $x = (x_1, \ldots, x_n)$, where $n$ is the length of the tuple $x$, denoted $|x|$, and $x_1, \ldots, x_n \in \mathcal{X}'$. We denote by $\mathcal{X}$ the set of all tuples of elements of $\mathcal{X}'$. Let $S_n$ be the symmetric group, i.e., the set of permutations of $n$ elements. We now recall the kernel on $\mathcal{X}$ proposed in [2]:

Definition 1. The optimal assignment kernel $k_A : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is defined, for any $x, y \in \mathcal{X}$, by:

$$k_A(x, y) = \begin{cases} \max_{\pi \in S_{|y|}} \sum_{i=1}^{|x|} k_1(x_i, y_{\pi(i)}) & \text{if } |y| \geq |x| , \\ \max_{\pi \in S_{|x|}} \sum_{j=1}^{|y|} k_1(x_{\pi(j)}, y_j) & \text{otherwise.} \end{cases}$$

We can now state our main theorem.

Theorem 1. The optimal assignment kernel is not always positive definite.

Before proving this result we can make a few comments.

Remark 1. The meaning of the statement "not always" in Theorem 1 is that there exist choices of $\mathcal{X}'$ and $k_1$ such that the optimal assignment kernel is positive definite, while there also exist choices for which it is not positive definite.

Remark 2. Theorem 1 contradicts Theorem 2.3 in [2], which claims that the optimal assignment kernel is always positive definite. The proof of Theorem 2.3 in [2], however, contains the following error. Using the notations of [2], the authors define in the course of their proof the values $A := 2 \sum_{j=1}^{n} v_{n+1} v_j K_{n+1,j}$ and $B := \sum_{j=1}^{n} \left( v_{n+1}^2 K_{n+1,n+1} + v_j^2 K_{jj} \right)$. They show that $A \leq B$, on the one hand, and that $A < 0$, on the other hand.
From this they conclude that $B < 0$, which is obviously not a valid logical conclusion.

In order to prove Theorem 1, we now provide an example of a pair $(\mathcal{X}', k_1)$ that leads to a positive definite optimal assignment kernel, and another example that leads to the opposite conclusion.

Lemma 1. Let $\mathcal{X}' = \{1\}$ be a singleton, and $k_1(1, 1) = 1$. Then the optimal assignment kernel is positive definite.

Proof. When $\mathcal{X}' = \{1\}$, the tuples are simply repeats of the unique element, hence each element $x = (1, \ldots, 1) \in \mathcal{X}$ is uniquely defined by its length $|x| \in \mathbb{N}$. The optimal assignment kernel is then given by:

$$k_A(x, y) = \min(|x|, |y|) .$$

The function $\min(a, b)$ is known to be positive definite on $\mathbb{N}$, therefore $k_A$ is a valid kernel on $\mathcal{X}$.

Lemma 2. Let $\mathcal{X}' = \mathbb{R}^2$ and $k_1(x, y) = \exp\left(-\gamma \|x - y\|^2\right)$, for $x, y \in \mathbb{R}^2$ and $\gamma > 0$. Then the optimal assignment kernel is not positive definite.

Proof. The function $k_1$ defined in Lemma 2 is the well-known Gaussian radial basis function kernel, which is known to be positive definite and only takes nonnegative values, hence it satisfies all hypotheses needed in the definition of the optimal assignment kernel. In order to show that the latter is not positive definite, we exhibit a set of points in $\mathcal{X}$ that cannot be embedded in a Hilbert space through (1). For this let us start with four points that form a square in $\mathcal{X}'$, e.g., $A = (0, 0)$, $B = (1, 0)$, $C = (1, 1)$ and $D = (0, 1)$ (Figure 1).

Figure 1: Four points in $\mathcal{X}' = \mathbb{R}^2$, endowed with the positive definite kernel $k_1(x, y) = \exp\left(-\gamma \|x - y\|^2\right)$.

Denoting $a := \exp(-\gamma)$, we directly obtain from the definition of $k_1$ that:

$$\begin{cases} k_1(A, A) = k_1(B, B) = k_1(C, C) = k_1(D, D) = 1 , \\ k_1(A, B) = k_1(B, C) = k_1(C, D) = k_1(D, A) = a , \\ k_1(A, C) = k_1(B, D) = a^2 . \end{cases}$$
In the space $\mathcal{X}$ of tuples, let us now consider the six 2-tuples obtained by taking all pairs of distinct points: $AB, AC, AD, BC, BD, CD$. Using the definition of the optimal assignment kernel, $k_A(uv, wt) = \max\left(k_1(u, w) + k_1(v, t),\; k_1(u, t) + k_1(v, w)\right)$ for $u, v, w, t \in \{A, B, C, D\}$, we easily obtain:

$$\begin{cases} k_A(AB, AB) = k_A(AC, AC) = k_A(AD, AD) = k_A(BC, BC) = k_A(BD, BD) = k_A(CD, CD) = 2 , \\ k_A(AB, AC) = k_A(AB, BD) = k_A(BC, BD) = k_A(BC, AC) = k_A(CD, AC) = k_A(CD, BD) = k_A(AD, AC) = k_A(AD, BD) = 1 + a , \\ k_A(AB, BC) = k_A(BC, CD) = k_A(CD, AD) = k_A(AB, AD) = 1 + a^2 , \\ k_A(AB, CD) = k_A(AD, BC) = k_A(AC, BD) = 2a . \end{cases}$$

If $k_A$ were positive definite, then these six 2-tuples could be embedded into a Hilbert space $\mathcal{H}$ by a mapping $\Phi : \mathcal{X} \to \mathcal{H}$ satisfying (1). Let us show that this is impossible. Let $d(x, y) = \|\Phi(x) - \Phi(y)\|_{\mathcal{H}}$ be the Hilbert distance between two points $x, y \in \mathcal{X}$ after their embedding in $\mathcal{H}$. It can be computed from the kernel values by the classical equality:

$$d(x, y)^2 = k_A(x, x) + k_A(y, y) - 2 k_A(x, y) .$$

We first observe that $d(AB, AC)^2 = d(AC, CD)^2 = 2 - 2a$, and $d(AB, CD)^2 = 4 - 4a$. Therefore,

$$d(AB, CD)^2 = d(AB, AC)^2 + d(AC, CD)^2 ,$$

from which we conclude, by the converse of the Pythagorean theorem, that $(AB, AC, CD)$ form a half-square, with hypotenuse $(AB, CD)$. A similar computation shows that $(AB, BD, CD)$ is also a half-square with hypotenuse $(AB, CD)$. Moreover,

$$d(AC, BD)^2 = 4 - 4a = d(AB, CD)^2 ,$$

which shows that the four points $(AB, AC, CD, BD)$ are in fact coplanar and form a square. The same computation, when $AB$ and $CD$ are respectively replaced by $AD$ and $BC$, shows that the four points $(AD, AC, BC, BD)$ are also coplanar and also form a square.
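Before completing the geometric argument, note that the conclusion can already be checked numerically from the kernel values above: for instance with $a = 1/2$, the $6 \times 6$ Gram matrix of the six 2-tuples has a negative eigenvalue. A sketch with numpy (the explicit witness vector is ours, not part of the original proof):

```python
import numpy as np

# Gram matrix of the six 2-tuples (AB, AC, AD, BC, BD, CD), using the
# kernel values computed above, with a = exp(-gamma) set to 0.5.
a = 0.5
p, q, r = 1 + a, 1 + a**2, 2 * a
K = np.array([
    #  AB AC AD BC BD CD
    [2, p, q, q, p, r],   # AB
    [p, 2, p, p, r, p],   # AC
    [q, p, 2, r, p, q],   # AD
    [q, p, r, 2, p, q],   # BC
    [p, r, p, p, 2, p],   # BD
    [r, p, q, q, p, 2],   # CD
])

# The coefficient vector (1, -4, 4, 4, -4, 1) makes the quadratic form
# from the definition of positive definiteness strictly negative:
v = np.array([1.0, -4.0, 4.0, 4.0, -4.0, 1.0])
print(v @ K @ v)                        # -2.0
print(np.linalg.eigvalsh(K).min() < 0)  # True: K is not positive semidefinite
```

A single such sample already proves Theorem 1; the geometric argument below shows in addition that the failure occurs for every $0 < a < 1$.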
Hence all six points can be embedded in 3 dimensions, and the points $(AB, AD, CD, BC)$ are themselves coplanar and must form a rectangle in the plane equidistant from $AC$ and $BD$ (Figure 2). The edges of this rectangle all have the same squared length,

$$d(AB, BC)^2 = d(BC, CD)^2 = d(CD, AD)^2 = d(AD, AB)^2 = 2 - 2a^2 ,$$

so the rectangle is in fact a square, whose diagonal $(AB, CD)$ should have length $\sqrt{4 - 4a^2}$. However, a direct computation gives $d(AB, CD) = \sqrt{4 - 4a}$, which provides a contradiction since $0 < a < 1$. Hence the six points cannot be embedded into a Hilbert space with $k_A$ as inner product, which shows that $k_A$ is not positive definite on $\mathcal{X}$.

Figure 2: The necessary configuration of the six 2-tuples if an embedding with the optimal assignment kernel were possible.

References

[1] N. Aronszajn. Theory of reproducing kernels. Trans. Am. Math. Soc., 68:337–404, 1950.

[2] H. Fröhlich, J. K. Wegner, F. Sieker, and A. Zell. Optimal assignment kernels for attributed molecular graphs. In Proceedings of the 22nd International Conference on Machine Learning, pages 225–232, New York, NY, USA, 2005. ACM Press.

[3] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2002.

[4] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

[5] V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
