Kernels for Measures Defined on the Gram Matrix of their Support

We present in this work a new family of kernels to compare positive measures on arbitrary spaces $\mathcal{X}$ endowed with a positive kernel $\kappa$, which translates naturally into kernels between histograms or clouds of points. We first cover the case …

Authors: Marco Cuturi

Marco Cuturi
Princeton University
mcuturi@princeton.edu
October 30, 2018

Abstract

We present in this work a new family of kernels to compare positive measures on arbitrary spaces $\mathcal{X}$ endowed with a positive kernel $\kappa$, which translates naturally into kernels between histograms or clouds of points. We first cover the case where $\mathcal{X}$ is Euclidean, and focus on kernels which take into account the variance matrix of the mixture of two measures to compute their similarity. The kernels we define are semigroup kernels in the sense that they only use the sum of two measures to compare them, and spectral in the sense that they only use the eigenspectrum of the variance matrix of this mixture. We show that such a family of kernels has close bonds with the Laplace transforms of nonnegative-valued functions defined on the cone of positive semidefinite matrices, and we present some closed formulas that can be derived as special cases of such integral expressions. By focusing further on functions which are invariant to the addition of a null eigenvalue to the spectrum of the variance matrix, we can define kernels between atomic measures on arbitrary spaces $\mathcal{X}$ endowed with a kernel $\kappa$ by using directly the eigenvalues of the centered Gram matrix of the joined support of the compared measures. We provide explicit formulas suited for applications and present preliminary experiments to illustrate the interest of the approach.

1 Introduction

Defining meaningful kernels on positive measures is an important issue in the field of kernel methods, as it encompasses the topic of comparing histograms, bags-of-components or clouds of points, which all arise very frequently in applications dealing with structured data.
In the pioneering applications of support vector machines to structured data, histograms were often treated as simple vectors and used as such through the standard Gaussian or polynomial kernels [Joa02]. Yet more adequate kernels, which exploit the specificities of histograms, have been proposed since. These rely on the fact that histograms are vectors with nonnegative coordinates [HB05] whose sum may be normalized to one, so that they can be cast as discrete probability measures and treated in the light of information geometry [LL05, Leb06]. Since such histograms are usually defined on bins which are not equally dissimilar, as is for instance the case with color, word or amino-acid histograms, further kernels which take into account an a priori inter-bin similarity were subsequently proposed [KJ03, CFV05, HB05] as an attempt to include with more accuracy prior knowledge on the considered components, through the knowledge of a prior kernel $\kappa$ for instance.

In this paper we investigate further such kernels between two measures, which can conveniently describe the similarity between two clouds of points by only considering their Gram matrices. In this sense we reformulate and extend the results of [CFV05], whose framework we briefly recall. The set $M_+^b(\mathcal{X})$ of bounded positive measures on a set $\mathcal{X}$ is a cone and, from a more elementary algebraic viewpoint, a semigroup¹. In that sense, a natural way to define kernels suited to the geometry of $M_+^b(\mathcal{X})$ is to study the family of semigroup functions on $M_+^b(\mathcal{X})$, as introduced in [BCR84], that is real-valued functions $\psi$ defined on $M_+^b(\mathcal{X})$ such that the map $(\mu, \mu') \mapsto \psi(\mu + \mu')$ is either positive or negative definite. The Jensen divergence, which is computed through the entropy of the mixture of two measures, is such an example, as recalled in [HB05].
Given the complexity of evaluating entropies for finite samples, it is shown in [CFV05] that similar quantities can be defined for measures by only taking into account the variance matrix $\Sigma(\mu + \mu')$ of the mixture of two measures. This has two clear advantages. First, variances are easy to compute given atomic measures, that is measures with finite support, which are usually considered in most applications. Second, the eigenspectrum of the variance matrix of an (atomic) probability measure is known to be the same as, up to zero eigenvalues and an adequate centering, the eigenspectrum of the dot-product matrix of the support of the same measure. This fact paves the way to considering kernels defined on Gram matrices rather than on variance matrices, regardless of the structure of $\mathcal{X}$, as was first hinted in [KJ03]. More precisely, the authors of [CFV05] first prove that for a variance matrix $\Sigma(\mu + \mu')$, the determinant $\left| \frac{1}{\eta} \Sigma(\mu + \mu') + I_n \right|^{-\frac{1}{2}}$ for $\eta > 0$ is a positive definite (p.d.) kernel between two measures $\mu, \mu'$ on a Euclidean space $\mathcal{X}$ of dimension $n$. Second, they prove that this quantity can be cast into a reproducing kernel Hilbert space (rkHs) associated with a kernel $\kappa$ on $\mathcal{X}$, regardless of the nature of $\mathcal{X}$, by using directly a centered Gram matrix $K_{\mu,\mu'}$ of all elements contained in the support of both $\mu$ and $\mu'$.
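As an illustration of the spectral fact recalled above, the following Python sketch (using NumPy) checks numerically, for a linear kernel on $\mathbb{R}^3$, that the variance matrix of an atomic probability measure and a weighted, centered Gram matrix of its support share their nonzero eigenvalues, so that the regularized determinant takes the same value on both. The helper names and the specific centering $D^{1/2}(I - \mathbf{1}a^\top) K (I - a\mathbf{1}^\top) D^{1/2}$, with $D = \mathrm{diag}(a)$, are assumptions made for this sketch and are not notation taken from [CFV05].

```python
import numpy as np

def variance_matrix(points, weights):
    """Variance matrix of the atomic probability measure sum_i a_i * delta_{x_i}."""
    mean = weights @ points                        # barycenter, shape (n,)
    centered = points - mean                       # rows are x_i - mean
    return centered.T @ (weights[:, None] * centered)

def centered_gram(gram, weights):
    """Weighted, centered Gram matrix (assumed form, see the lead-in)."""
    d = len(weights)
    centering = np.eye(d) - np.outer(np.ones(d), weights)    # I - 1 a^T
    root = np.sqrt(weights)
    return root[:, None] * (centering @ gram @ centering.T) * root[None, :]

def igv(matrix, eta):
    """Regularized inverse generalized variance |matrix / eta + I|^(-1/2)."""
    size = matrix.shape[0]
    _, logdet = np.linalg.slogdet(matrix / eta + np.eye(size))
    return float(np.exp(-0.5 * logdet))

rng = np.random.default_rng(0)
points = rng.normal(size=(7, 3))                   # 7 support points in R^3
weights = rng.random(7)
weights /= weights.sum()                           # probability weights

sigma = variance_matrix(points, weights)                  # 3 x 3
k_tilde = centered_gram(points @ points.T, weights)       # 7 x 7, linear kernel

# Largest eigenvalues first; the 7 x 7 matrix has four extra (numerically) zero ones.
spec_sigma = np.sort(np.linalg.eigvalsh(sigma))[::-1]
spec_gram = np.sort(np.linalg.eigvalsh(k_tilde))[::-1][:3]

print(spec_sigma, spec_gram)
print(igv(sigma, 0.01), igv(k_tilde, 0.01))
```

The determinant only depends on the nonzero eigenvalues because each zero eigenvalue contributes a factor of one to $|\frac{1}{\eta} A + I|$, which is why the 3 × 3 and 7 × 7 matrices give the same value.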
We are interested in this paper in characterizing other functions $\varphi$ defined on matrices such that (i) $\mu, \mu' \mapsto \varphi(\Sigma(\mu + \mu'))$ is either positive or negative definite, (ii) $\varphi$ is spectral², and (iii) $\varphi$ is invariant to the addition of null eigenvalues, that is, for two square p.d. matrices $A, B$ which may not have the same size, $\varphi(A) = \varphi(B)$ if $A$ and $B$ have the same positive eigenvalues taken with their multiplicity, regardless of the multiplicity of 0 in their eigenspectrum. It is easy to check that both $|\frac{1}{\eta}\, \cdot + I|$ and the trace fulfill condition (iii). If $\varphi$ satisfies conditions (i) and (ii), we call the composed function $\psi = \varphi \circ \Sigma$ a semigroup spectral positive (resp. negative) definite (s.s.p.d., resp. s.s.n.d.) function on $M_+^b(\mathcal{X})$. Note that the task of defining such functions $\psi$ is not equivalent to defining directly positive or negative definite functions $\varphi$ on the semigroup of p.d. matrices, since the underlying semigroup operation is the addition of measures and not that of the variance matrices of the measures, as recalled in Equation (1). When $\varphi$ is further invariant to null eigenvalues (iii), $\psi$ can be cast in Hilbert spaces of infinite dimension to compare degenerated variance operators, which will be in the context of this paper an rkHs built on $\mathcal{X}$ through a kernel $\kappa$.

This paper is structured as follows: we introduce in Section 2 a lighter formalism for semigroup functions, and propose a general link between s.s.p.d. functions and the Laplace transform of functions defined on matrices in Section 3. We then review in Section 4 different s.s.p.d. functions, notably a function which satisfies criterion (iii) and which does not require any regularization. We provide explicit formulas and test the kernel derived from such a function on a benchmark classification task involving handwritten digits in Section 5.

¹ In this paper, a semigroup will be a non-empty set $S$ endowed with a commutative addition, such that for $s, t \in S$, their sum $s + t = t + s \in S$, and a neutral element $e$ such that $s + e = s$.

² A function $f$ defined on symmetric matrices is spectral, or orthogonally invariant, if for any real $n \times n$ orthogonal matrix $H$, that is such that $H H^\top = I_n$, $f(H A H^\top) = f(A)$. In that case $f$ only depends on the eigenspectrum of $A$. See [BL00].

2 Semigroup Functions on Bounded Subsets of $M_+^b(\mathcal{X})$

We consider $\mathcal{X}$, a Euclidean space of dimension $n$ endowed with Lebesgue's measure, and restrict $M_+^b(\mathcal{X})$ to measures with finite first and second moments. In such a case, the variance of a measure $\mu$ of $M_+^b(\mathcal{X})$ can be defined as
$$\Sigma(\mu) = \mu[x x^\top] - \mu[x]\, \mu[x]^\top.$$
Writing $\bar\mu$ for $\mu[x]$, we recall an elementary result for two measures $\mu, \mu'$ of $M_+^b(\mathcal{X})$,
$$\Sigma(\mu + \mu') = \Sigma(\mu) + \Sigma(\mu') - \left( \bar\mu \bar\mu'^\top + \bar\mu' \bar\mu^\top \right), \tag{1}$$
which highlights the nonlinearity of the variance mapping.

We write $P_n$ for the cone of real, symmetric and positive semidefinite matrices, and $P_n^+$ for its subset of (strictly) p.d. matrices. In this paper, the assumption that for a measure $\mu$ its variance $\Sigma(\mu)$ is in $P_n$ is crucial for most calculations, and this is ensured for sub-probability measures, that is measures $\mu$ such that $|\mu| = \mu(\mathcal{X}) \le 1$, since we then have that
$$\Sigma(\mu) = \mu[(x - \bar\mu)(x - \bar\mu)^\top] + (1 - |\mu|)\, \bar\mu \bar\mu^\top \in P_n. \tag{2}$$
Furthermore, we will also need the identity $\Sigma(\mu) = \mu[(x - \bar\mu)(x - \bar\mu)^\top]$ in order to make the link between the dot-product matrix of the support of $\mu$ and its variance matrix, which is why we restrict our study to probability measures $M_+^1(\mathcal{X})$. $M_+^1(\mathcal{X})$ is not, however, a semigroup, since it is not closed under addition, due to the constraint on $|\mu|$.
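Equations (1) and (2) can be checked numerically on atomic measures. The sketch below is a minimal illustration in Python with NumPy, in which a measure is represented by an array of support points and an array of nonnegative weights; this representation and the function names are choices made for the example only. Note that $\bar\mu = \mu[x]$ is the unnormalized first moment, as in the text.

```python
import numpy as np

def first_moment(points, weights):
    """Unnormalized first moment mu[x] of the atomic measure sum_i a_i * delta_{x_i}."""
    return weights @ points

def variance(points, weights):
    """Sigma(mu) = mu[x x^T] - mu[x] mu[x]^T for an atomic measure."""
    second = points.T @ (weights[:, None] * points)
    mean = first_moment(points, weights)
    return second - np.outer(mean, mean)

rng = np.random.default_rng(1)
x, a = rng.normal(size=(5, 3)), 0.1 * rng.random(5)    # measure mu,  |mu|  < 1
y, b = rng.normal(size=(4, 3)), 0.1 * rng.random(4)    # measure mu', |mu'| < 1

# mu + mu' is the atomic measure carried by the union of both supports.
sum_points, sum_weights = np.vstack([x, y]), np.concatenate([a, b])

# Equation (1): the variance of the sum is not the sum of the variances.
m, m_prime = first_moment(x, a), first_moment(y, b)
lhs_1 = variance(sum_points, sum_weights)
rhs_1 = variance(x, a) + variance(y, b) - (np.outer(m, m_prime) + np.outer(m_prime, m))

# Equation (2): decomposition that makes Sigma(mu) positive semidefinite when |mu| <= 1.
centered = x - m
rhs_2 = centered.T @ (a[:, None] * centered) + (1.0 - a.sum()) * np.outer(m, m)
min_eig = float(np.linalg.eigvalsh(variance(x, a)).min())

print(np.abs(lhs_1 - rhs_1).max(), np.abs(variance(x, a) - rhs_2).max(), min_eig)
```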
To cope with this contradiction, that is to use semigroup-like functions of the type $(\mu, \mu') \mapsto \psi(\mu + \mu')$ where $\psi$ is only defined on a subset of the original semigroup, and where this subset may not be itself a semigroup, we define the following extension to the original definition of semigroup functions which, although technical, is also useful to recall the actual definitions of positive and negative definiteness for semigroup functions.

Definition 1 (Semigroup kernels on subsets) Let $(S, +)$ be a semigroup and $U \subset S$ a nonempty subset of $S$. A function $\psi : U \to \mathbb{R}$ is a p.d. (resp. n.d.) semigroup function on $U$ if
$$\sum_{i,j} c_i c_j\, \psi(s_i + s_j) \ge 0 \quad (\text{resp. } \le 0)$$
holds for any $n \in \mathbb{N}$; any $s_1, \ldots, s_n \in S$ such that $s_i + s_j \in U$ for $1 \le i \le j \le n$; and any $c_1, \ldots, c_n \in \mathbb{R}$ (resp. with the additional condition that $\sum_i c_i = 0$).

In practice, stating that a function $\psi$ defined on the subset $M_+^1(\mathcal{X})$ is positive (resp. negative) definite is equivalent to stating that the kernel for two elements $\mu, \mu'$ of $M_+^1(\mathcal{X})$ defined as
$$(\mu, \mu') \mapsto \psi\left( \frac{\mu + \mu'}{2} \right)$$
is positive (resp. negative) definite. Finally, we write $\Sigma^{-1}(\mu)$ for $(\Sigma(\mu))^{-1}$ when appropriate.

3 Laplace Transforms of Matrix Functions and s.s.p.d. functions

We show in this section how s.s.p.d. functions on $M_+^1(\mathcal{X})$ can be defined through the Laplace transform of a nonnegative-valued function defined on the cone $P_n^+$, through the following lemma.

Lemma 2 For any $S \in P_n$, the real-valued function defined on $M_+^1(\mathcal{X})$, $\mu \mapsto \langle \Sigma(\mu), S \rangle$ is a negative definite semigroup function.

Proof. For any $k \in \mathbb{N}$, any $c_1, \ldots, c_k \in \mathbb{R}$ such that $\sum_i c_i = 0$ and any $\mu_1, \ldots, \mu_k \in M_+^1(\mathcal{X})$ such that $\mu_i + \mu_j \in M_+^1(\mathcal{X})$, we have using Equation (1) that
$$\begin{aligned}
\sum_{i,j} c_i c_j \langle \Sigma(\mu_i + \mu_j), S \rangle
&= \left\langle \sum_{i,j} c_i c_j \left( \Sigma(\mu_i) + \Sigma(\mu_j) - \left( \bar\mu_i \bar\mu_j^\top + \bar\mu_j \bar\mu_i^\top \right) \right), S \right\rangle \\
&= - \left\langle \sum_{i,j} c_i c_j \left( \bar\mu_i \bar\mu_j^\top + \bar\mu_j \bar\mu_i^\top \right), S \right\rangle
= -2 \sum_{i,j} c_i c_j\, \bar\mu_i^\top S \bar\mu_j \le 0,
\end{aligned}$$
where the terms in $\Sigma(\mu_i)$ and $\Sigma(\mu_j)$ vanish because $\sum_i c_i = 0$.

Note that this function is actually n.d. for all measures of $M_+^b(\mathcal{X})$, regardless of their total weight $|\mu|$. The case $S = I_n$ yields the simple function $\psi_{\mathrm{tr}} \overset{\mathrm{def}}{=} \mu \mapsto \operatorname{tr} \Sigma(\mu)$, which provides interesting results in practice, and boils down to a fast kernel on clouds of points, which we will review briefly in Section 5. For any nonnegative-valued function $f : P_n^+ \to \mathbb{R}_+$ defined on the cone of p.d. matrices, we write
$$\mathcal{L}f(Z) = \int_{S \in P_n^+} e^{-\langle Z, S \rangle} f(S)\, dS \tag{3}$$
for the Laplace transform of $f$ evaluated in $Z \in P_n^+$, when the integral exists.

Proposition 3 For any spectral function $f : P_n^+ \to \mathbb{R}_+$, the mapping $\mu \mapsto \mathcal{L}f(\Sigma(\mu))$ defined for all measures $\mu \in M_+^1(\mathcal{X})$ such that $\Sigma(\mu) \in P_n^+$ is a s.s.p.d. function.

Proof. The integral, when it exists, is a sum of p.d. semigroup functions through Schoenberg's theorem [BCR84, Theorem 3.2.2], and is hence p.d.

Laplace transforms of functions defined on matrices are an extensive subject and we refer to [Mat93, Section 4] for a short survey. In the case where $f = 1$ we recover the characteristic function of the cone $P_n^+$, and its logarithm, $\ln \mathcal{L}1(A) = C - \frac{n+1}{2} \log |A|$, is known as the universal barrier [Gül96] of the cone $P_n^+$, with numerous applications in convex optimization.

We now recall a well-known result of multivariate analysis based on zonal polynomials (see [Tak84, MPH95] for an exhaustive presentation of these), which may not, however, be of immediate use for an application in kernel methods.
In short, zonal polynomials $C_\alpha(A)$ are polynomials in the eigenvalues of a matrix $A$ with positive coefficients [MPH95, Remark 4.3.6], and thus nonnegative-valued spectral functions, indexed by the partitions $\alpha$ of an integer $a$. Namely, for $a \in \mathbb{N}$, we write $\alpha = (a_1, a_2, \ldots, a_n)$ for a partition of $a$ into not more than $n$ parts, that is $a_1 + a_2 + \cdots + a_n = a$ and $a_1 \ge a_2 \ge \cdots \ge a_n$. The following result follows from [MPH95, Theorem 4.4.1], where we have dropped constants which only depend on $n$ and $\alpha$ for more readability:

Corollary 4 Given $a \in \mathbb{N}$ and a partition $\alpha$ of $a$, the real-valued zonal kernel $\psi_\alpha$ is a s.s.p.d. function on $M_+^1(\mathcal{X})$, with
$$\psi_\alpha : \mu \mapsto C_\alpha(\Sigma^{-1}(\mu))\, |\Sigma(\mu)|^{-\frac{1}{2} n},$$
through the identity
$$\int_{S \in P_n^+} e^{-\langle \Sigma, S \rangle}\, |S|^{t - \frac{1}{2}(n+1)}\, C_\alpha(S)\, dS \propto |\Sigma|^{-t}\, C_\alpha(\Sigma^{-1}), \quad \text{for } t > \tfrac{1}{2}(n - 1).$$

Actual expressions for zonal polynomials of order $a \le 10$ are currently known, and the use of Wishart densities for $f$ can be seen as a special case of such evaluations. It is also clear that finite and infinite linear combinations of such zonal kernels, with the speculation that they might be a useful basis for a subcategory of s.s.p.d. functions, can be carried out in the spirit of the equations provided in [MPH95, Lemmas 4.4.5&6] and yield convenient formulas, such as
$$\sum_{a=k}^{\infty} \sum_{\alpha} \psi_\alpha : \mu \mapsto e^{\operatorname{tr} \Sigma^{-1}(\mu)}\, (\operatorname{tr} \Sigma^{-1}(\mu))^k\, |\Sigma(\mu)|^{-\frac{1}{2} n},$$
which is a s.s.p.d. function for any $k \ge 0$. However, the weak point of these expressions when used in our setting is that they tend to be extremely degenerated when the eigenspectrum of $\Sigma$ vanishes, due to the high power of the denominator and to the fact that the eigenspectrum of $\Sigma^{-1}$, not $\Sigma$, is considered implicitly. Hence, we do not see at the moment how one would obtain expressions satisfying condition (iii), even through the use of regularization.
To handle this problem, we focus in the next section on degenerated integrations, that is we consider an extension of the Laplace transform setting defined in Equation (3) to degenerated functions $f$ defined on families of semidefinite matrices of $P_n$.

4 Degenerated Integrations on Semidefinite Matrices of Rank 1

We restrict the integration domain to only consider the subspace of $P_n$ of matrices of rank 1, that is matrices of the form $y y^\top$ where $y \in \mathbb{R}^n$. The squared Euclidean norm $y^\top y$ of $y$ is the only positive eigenvalue of $y y^\top$ when $y \neq 0$, hence only real-valued functions of $y^\top y$ can be spectral. Following the proof of Proposition 3, and for any nonnegative-valued function $g : \mathbb{R}_+ \to \mathbb{R}_+$, we thus observe that
$$\psi_g : \mu \mapsto \int_{\mathbb{R}^n} e^{-y^\top \Sigma(\mu) y}\, g(y^\top y)\, dy \tag{4}$$
is a s.s.p.d. function on $M_+^1(\mathcal{X})$, noting simply that $\operatorname{tr}(\Sigma(\mu)\, y y^\top) = y^\top \Sigma(\mu) y$. We start our analysis with a simple example for $g$, which can be computed in closed form.

4.1 The case $g : x \mapsto x^i$

For a matrix $A \in P_n^+$ such that $\operatorname{mspec} A = \{\lambda_1, \ldots, \lambda_n\}$, we set $\gamma_0(A) = 1$ and write for $1 \le i \le n$,
$$\gamma_i(A) \overset{\mathrm{def}}{=} \sum_{|j| = i} \frac{\prod_{k=1}^{n} \Gamma(j_k + \frac{1}{2})}{\lambda_1^{j_1} \cdots \lambda_n^{j_n}}$$
where the summation is taken over all families $j \in \mathbb{N}^n$ such that the sum of their elements $|j|$ is equal to $i$. Writing $\sigma_n$ for $(2\pi^{-\frac{1}{2}})^{\frac{n}{2}}$, we have with these notations that for all $1 \le i \le n$,

Corollary 5 The function $\psi_i : \mu \mapsto \frac{\sigma_n}{\sqrt{|\Sigma(\mu)|}} \cdot \gamma_i(\Sigma(\mu))$ is a s.s.p.d. function on $M_+^1(\mathcal{X})$.

Proof. Let $\mu \in M_+^1(\mathcal{X})$, and write $\operatorname{mspec} \Sigma(\mu) = \{\lambda_1, \ldots, \lambda_n\}$. Then by an appropriate change of basis we have for $g_i : x \mapsto x^i$, $i \le n$,
$$\begin{aligned}
\psi_{g_i}(\mu) &= \int_{\mathbb{R}^n} e^{-\sum_{k=1}^{n} \lambda_k y_k^2} \left( \sum_{k=1}^{n} y_k^2 \right)^{i} dy
= \int_{\mathbb{R}^n} e^{-\sum_{k=1}^{n} \lambda_k y_k^2} \sum_{|j| = i} \prod_{k=1}^{n} y_k^{2 j_k}\, dy \\
&= \sum_{|j| = i} \prod_{k=1}^{n} \int_{\mathbb{R}} e^{-\lambda_k y_k^2}\, y_k^{2 j_k}\, dy_k
= \sum_{|j| = i} \prod_{k=1}^{n} \Gamma(j_k + \tfrac{1}{2})\, \lambda_k^{-j_k - \frac{1}{2}}
= \frac{\sigma_n}{\sqrt{|\Sigma(\mu)|}} \cdot \gamma_i(\Sigma(\mu)).
\end{aligned}$$
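The one-dimensional Gaussian moment used in the last step of the proof, $\int_{\mathbb{R}} e^{-\lambda y^2} y^{2j}\, dy = \Gamma(j + \frac{1}{2})\, \lambda^{-j - \frac{1}{2}}$, can be checked numerically. The sketch below does so with a plain Riemann sum, and also compares the case $i = 1$ in dimension $n = 2$ against a grid approximation of the integral in Equation (4). The function names, grid sizes and test values of $\lambda$ are arbitrary choices made for this illustration.

```python
import math
import numpy as np

def gaussian_moment_numeric(lam, j, half_width=12.0, steps=200001):
    """Riemann-sum approximation of the integral of exp(-lam*y^2) * y^(2j) over R."""
    y = np.linspace(-half_width, half_width, steps)
    dy = y[1] - y[0]
    return float(np.sum(np.exp(-lam * y**2) * y**(2 * j)) * dy)

def gaussian_moment_closed(lam, j):
    """Closed form Gamma(j + 1/2) * lam^(-j - 1/2) used in the proof."""
    return math.gamma(j + 0.5) * lam ** (-j - 0.5)

def psi_g1_closed(eigenvalues):
    """Sum over |j| = 1 of prod_k Gamma(j_k + 1/2) * lam_k^(-j_k - 1/2)."""
    total = 0.0
    for hot in range(len(eigenvalues)):            # index k for which j_k = 1
        term = 1.0
        for k, lam in enumerate(eigenvalues):
            term *= gaussian_moment_closed(lam, 1 if k == hot else 0)
        total += term
    return total

def psi_g1_numeric(eigenvalues, half_width=10.0, steps=801):
    """Grid approximation of the 2-D integral of exp(-sum_k lam_k y_k^2) * (y_1^2 + y_2^2)."""
    lam1, lam2 = eigenvalues
    axis = np.linspace(-half_width, half_width, steps)
    step = axis[1] - axis[0]
    y1, y2 = np.meshgrid(axis, axis, indexing="ij")
    integrand = np.exp(-lam1 * y1**2 - lam2 * y2**2) * (y1**2 + y2**2)
    return float(integrand.sum() * step * step)

print(gaussian_moment_numeric(0.7, 2), gaussian_moment_closed(0.7, 2))
print(psi_g1_numeric((0.8, 1.5)), psi_g1_closed((0.8, 1.5)))
```

The check works in the eigenbasis of $\Sigma(\mu)$, which is where the proof places itself after the change of basis, so only the eigenvalues are needed.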
The inverse generalized variance is recovered as $\psi_0$. We refer now to Lancaster's formulas [Ber05, p.320] to express more explicitly the cases $i = 1, 2, 3$, where we write $\Sigma$ for $\Sigma(\mu)$:
$$\begin{aligned}
\psi_1(\mu) &= \frac{\sigma_n}{\sqrt{|\Sigma|}} \left( \operatorname{tr} \Sigma^{-1} \right), \\
\psi_2(\mu) &= \frac{\sigma_n}{\sqrt{|\Sigma|}} \left( (\operatorname{tr} \Sigma^{-1})^2 + 2 \operatorname{tr} \Sigma^{-2} \right), \\
\psi_3(\mu) &= \frac{\sigma_n}{\sqrt{|\Sigma|}} \left( (\operatorname{tr} \Sigma^{-1})^3 + 6 (\operatorname{tr} \Sigma^{-1})(\operatorname{tr} \Sigma^{-2}) + 8 \operatorname{tr} \Sigma^{-3} \right).
\end{aligned}$$

Although the functions $\psi_i$ are s.s.p.d., they are mainly defined by the lowest eigenvalues of $\Sigma(\mu)$. These functions can all be regularized, by adding a weighted identity matrix $I_n$ to $\Sigma$, while still preserving their positive definiteness, as can be easily justified by using the functions $g_i(x) = e^{-x} x^i$ to penalize large values of $y^\top y$. In such a case however, and with the notable exception of $\psi_0$, this regularization prevents the above functions from being invariant to the addition of a zero eigenvalue to the spectrum of $\Sigma(\mu)$. Intuitively, this degeneracy is due to the fact that we integrate on the whole of $\mathbb{R}^n$, notably on $\ker \Sigma(\mu)$, where the contribution of $\exp(-y^\top \Sigma(\mu) y)$ is infinite. We propose to solve this issue by considering more specifically the contribution of each sphere $\{ y \mid y^\top y = t \}$ to the overall summation in the case where $g = 1$.

4.2 The case $g : x \mapsto \delta_t$ and its variants

The question of integrating $\exp(-y^\top \Sigma y)$ over compact balls $\{ y \in \mathbb{R}^n \mid y^\top y \le t \}$ or spheres $\{ y \in \mathbb{R}^n \mid y^\top y = t \}$ is closely related to the evaluation of the distribution of quadratic forms in normal variates [MP92]. Given a matrix $Q \in P_n$ and a random vector $y$ in $\mathbb{R}^n$ following a normal law $N(m, V)$ with $V \in P_n^+$, the density $h[Q, V, m]$ of the values of $y^\top Q y$, that is $h[Q, V, m](t)\, dt = (2\pi)^{-\frac{n}{2}} |V|^{-\frac{1}{2}} \int_{t \cdots}$ […]

[…] $> 0$, $\eta > 0$ and $\delta > 0$ such that $\delta < 1 / \rho(\tilde K_{\gamma''})$, where for a matrix $A \in P_n$ such that $\operatorname{mspec} A = \{\lambda_1, \ldots, \lambda_n\}$, $\rho(A)$ is the spectral radius of $A$, that is $\max_{1 \le i \le n} \lambda_i$.
We discuss now possible values for $\delta$ which will ensure that $\delta < 1 / \rho(\tilde K_\gamma)$ for any cloud of points $\gamma$ and any kernel $\kappa$ upper-bounded by one, that is $\sup_{x \in \mathcal{X}} |\kappa(x, x)| \le 1$. Through Equation (7), one can obtain that for any cloud of points $\gamma = (x_i, a_i)_{i=1}^{d}$,
$$\rho(\tilde K_\gamma) \le [\max(d \cdot a_{\max} - 1, 1)]^2\, d \cdot a_{\max}$$
where we write $a_{\max}$ for the maximal weight of $\gamma$ and we have bounded $\rho(K_\gamma)$ by $d$, which corresponds to the case $K_\gamma = 1_{d,d}$. Thus, any factor $\delta$ chosen so that
$$\delta < \frac{1}{[\max(d \cdot \omega - 1, 1)]^2\, d \cdot \omega}$$
can be used to compare families of clouds of points whose maximal weights do not exceed $\omega$ and maximal size does not exceed $\frac{1}{2} d$. In the case where these clouds are bounded between $d_{\min}$ points (with weight $1 / d_{\min}$) and $d_{\max}$ points, this condition is ensured for $\delta \le \left( \frac{d_{\min}}{d_{\max}} \right)^3$, which is far from being optimal in practical cases since the values of $\kappa$ are more likely to be better distributed in the [0,1] range. This shows however that if we compare clouds of similar size $\delta$ can be equal to 1, and possibly above depending on the kernel $\kappa$ which is used.

We leave for future work the study of the convergence of the series $\sum_{k=0}^{N} (-1)^k c_k$ corresponding to the evaluation of $k_M$, although we note that in the practice of our experiments very few iterations (that is $N$ set between 10 and 20) are sufficient to converge to the limit value, which reduces considerably the overall computation cost with respect to a straightforward eigenvalue decomposition of $\tilde K_{\gamma''}$. Indeed, as is the case with the inverse generalized variance, this would have a cost of the order of $d^3$ while $N$ computations of the traces $\operatorname{tr}[\delta \tilde K_{\gamma''}]^k$ only grow in complexity $N d^2$. It would be wise, however, to let $N$ depend adaptively on the convergence of $\operatorname{tr}([\delta \tilde K_{\gamma''}]^k)$ to 0, which is very much conditioned by the observed spectrum for $\kappa$.
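The role of the condition $\delta < 1 / \rho(\tilde K_\gamma)$ can be illustrated with a short Python sketch: when it holds, the traces $\operatorname{tr}([\delta \tilde K_\gamma]^k)$ decrease to 0 as $k$ grows. The sketch rests on several assumptions: the weighted centering of the Gram matrix below stands in for Equation (7), the cloud is random with uniform weights, and the coefficients $c_k$ of the series are not reproduced. It also uses plain matrix products, each of cost $d^3$, so it does not attempt to match the $N d^2$ complexity quoted above.

```python
import numpy as np

def gaussian_gram(points, width):
    """Gram matrix of the Gaussian kernel exp(-|x - y|^2 / (2 * width^2))."""
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * width**2))

def centered_gram(gram, weights):
    """Weighted, centered Gram matrix (assumed form, see the lead-in)."""
    d = len(weights)
    centering = np.eye(d) - np.outer(np.ones(d), weights)    # I - 1 a^T
    root = np.sqrt(weights)
    return root[:, None] * (centering @ gram @ centering.T) * root[None, :]

def power_traces(matrix, delta, n_terms):
    """Return [tr((delta * matrix)^k) for k = 1..n_terms] using repeated products."""
    scaled = delta * matrix
    power = np.eye(matrix.shape[0])
    traces = []
    for _ in range(n_terms):
        power = power @ scaled
        traces.append(float(np.trace(power)))
    return traces

rng = np.random.default_rng(2)
cloud = rng.random((40, 2))                    # 40 points in the unit square
weights = np.full(40, 1.0 / 40)                # uniform weights
k_tilde = centered_gram(gaussian_gram(cloud, 0.1), weights)

eigenvalues = np.linalg.eigvalsh(k_tilde)
rho = float(eigenvalues.max())                 # spectral radius of a p.s.d. matrix
delta = 1.0
traces = power_traces(k_tilde, delta, 15)

print(rho, delta < 1.0 / rho)
print(traces[:3], traces[-1])
```

An adaptive choice of $N$, as suggested in the text, would stop the loop in `power_traces` once the latest trace falls below a chosen tolerance.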
5.2 Experiments on MNIST handwritten digits

We use the experimental setting of [KJ03], also used in [CFV05], to compare the three previous kernels. Namely, we sample 1,000 images from the MNIST database, that is 100 images per digit, and randomly sample clouds of pixels to compare such digits using the three kernels described above. The images, which are actually 28 × 28 matrices, are considered as clouds of pixels in the $[0, 1]^2$ square. We use a Gaussian kernel of width $\sigma = 0.1$ to evaluate the similarity between two pixels through $\kappa$, and a three-fold cross-validation with five repeats to evaluate the performance of the kernels. The preliminary results shown in Table 1 show that the kernel $\psi_M$ is competitive with both $\psi_{\mathrm{tr}}$ and the inverse generalized variance, which was itself shown to be effective with respect to other kernels in [CFV05], such as simple polynomial and Gaussian kernels.

Sample Size    ψ_0, η = 0.01    ψ_tr, t = 0.1    ψ_M, δ = 1
40 pixels      16.2             28.6             20.62
50 pixels      14.7             16.47            15.84
60 pixels      14.5             14.97            13.52
70 pixels      13.1             11.3             13
80 pixels      12.8             10.8             12.4

Table 1: Misclassification rate expressed in percent for the 3 s.s.p.d. functions used on a benchmark test of recognizing digit images, with 40 to 80 black points sampled from the original images.

References

[BCR84] Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic Analysis on Semigroups. Number 100 in Graduate Texts in Mathematics. Springer Verlag, 1984.

[Ber05] Dennis S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory. Princeton University Press, 2005.

[BL00] J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization. Springer, New York, 2000.

[CFV05] Marco Cuturi, Kenji Fukumizu, and Jean-Philippe Vert. Semigroup kernels on measures. JMLR, 6:1169–1198, 2005.

[Gül96] O. Güler. Barrier functions in interior point methods. Mathematics of Operations Research, 21:860–885, 1996.

[HB05] M. Hein and O. Bousquet. Hilbertian metrics and positive definite kernels on probability measures. In Z. Ghahramani and R. Cowell, editors, Proceedings of AISTATS 2005, January 2005.

[Joa02] Thorsten Joachims. Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. Kluwer Academic Publishers, 2002.

[KJ03] Risi Kondor and Tony Jebara. A kernel between sets of vectors. In T. Fawcett and N. Mishra, editors, Proc. of ICML'03, pages 361–368, 2003.

[Leb06] Guy Lebanon. Metric learning for text documents. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):497–508, 2006.

[LL05] John Lafferty and Guy Lebanon. Diffusion kernels on statistical manifolds. JMLR, 6:129–163, January 2005.

[Mat93] Arak M. Mathai. A Handbook of Generalized Special Functions for Statistical and Physical Sciences. Oxford Science Publications, 1993.

[MP92] Arak M. Mathai and Serge B. Provost. Quadratic Forms in Random Variables. Number 126 in Statistics: Textbooks and Monographs. Dekker, 1992.

[MPH95] Arak M. Mathai, Serge B. Provost, and Takesi Hayakawa. Bilinear Forms and Zonal Polynomials. Number 102 in LNS. Springer Verlag, 1995.

[Tak84] Akimichi Takemura. Zonal Polynomials. Inst. Math. Stat. Lecture Notes, 1984.
