Properties and applications of Fisher distribution on the rotation group

Prop erties and applications of Fisher distribution on the rotation group T omona ri Sei ∗ , Hirok i Shibat a † , Aki m ic hi T akem ura †‡ , Katsuyoshi Ohar a § , Nobuk i T ak a y am a ¶ ‡ June 2012 Abstract W e study prop e r t ies of Fisher distribution (vo n Mises-Fisher distribution, matrix Langevin distribution) on the rotation group SO ( 3). In particular we apply the holonomic gradien t descen t, in tro duced b y Nak a y ama et al. (2011 ), and a m e tho d of series expansion for e v aluating t h e normalizing constant of the distribution and for computing the maxim um like liho o d estimate. The rotation group can b e iden tiﬁed with the Stiefel manifold of t w o orth o n orm a l ve ctors. Th er efore from the viewp oin t of statistica l modeling, it is of int erest to compare Fisher distr i b utio n s on these manifolds. W e illustrate the diﬀerence with an example of near-earth ob jects data. Keywor ds: algebr aic statistics; dir e ctional statistics; ho lonomic gr adient desc ent; maximum likeliho o d estimation; r otation gr oup. 1 In tro duct ion In this pap er w e apply the holo nomic gradien t descen t (HGD) introduced in Nak a y ama et al. [2011] and a metho d of series expansion for ev aluat in g the normalizing constan t of Fisher distribution on the ro t ation group and on Stiefel manifolds and for obta inin g the maxi- m um lik eliho od estimate. Fisher distribution is the most basic exp onen tial family mo del for these manifolds. The g e neral theory of exp onen tia l families is well established (e.g. Barndor ﬀ-Niels en [1978]). In nice “ t extb o ok” cases, the normalizing constan t of the exponential family (i.e. its cum ulan t generating function) can b e explicitly ev aluat ed and then the calculatio n of maxim um lik eliho o d estimate is also simple. How ev er in general, the integral deﬁning the ∗ Department of Mathematics, Keio Universit y † Department of Mathematical Informatics, Gra duate School of Information Science and T echnology , Univ er sit y of T okyo ‡ JST, CREST § F acult y of Mathematics and Physics, Kanazaw a Univ ersity ¶ Department of Mathematics, Kob e University 1 normalizing constant of an exp onen tia l family can not b e explicitly ev aluated. V arious tec hniques, suc h as inﬁnite series expansion, n umerical in tegra tion, Mark ov c hain Mon te Carlo metho ds, iterat iv e prop ortional scaling, hav e b een applied for these cases. Recen tly , w e in tro duced a v ery no vel approac h, the holonomic g radien t descen t, for ev aluation of the normalizing constan t and solving the likelihoo d eq ua t io n (Nak a yama et al. [2011]). Our appro ac h pro vides a systematic metho dology for these t as ks. Note t ha t the normalizing constan t is a deﬁnite in tegral o ve r the sample space, where the in tegrand con tains the parameter of the family of distributions. The lik eliho o d equation in v olv es dif- feren tiation of the normalizing constan t with resp ect to the parameter. In the holonomic gradien t descen t, the theory of D -mo dule s is used to deriv e a set of partial diﬀeren tia l equations satisﬁed by the normalizing constan t. W e illustrate the HGD method for a simple example. Let C ( κ ), κ ≥ 0, b e the nor- malizing constant of the v on Mises-Fisher distribution on S 1 = { ( x 1 , x 2 ) | x 2 1 + x 2 2 = 1 } with the densit y function p ( x | κ ) = C ( κ ) − 1 exp ( κx 1 ) , x = ( x 1 , x 2 ) ∈ S 1 , with resp ect to the uniform distribution on S 1 . W e assumed that the mean direction is kno wn to b e (1 , 0) just for simplicit y . F o r the sample mean ¯ x = ( ¯ x 1 , ¯ x 2 ) of a giv en data set on S 1 of size n , the lo g-lik eliho o d function is giv en b y n ( κ ¯ x 1 − log C ( κ )). W e calculate the maxim um likelihoo d estimate b y gradien t descen t metho ds, that is, up date the parameter κ according to κ ← κ + A∂ { κ ¯ x 1 − log C ( κ ) } with some (ﬁxed or ada ptive) n um b er A , where ∂ = d /dκ denotes the deriv ativ e with resp ect to κ . This needs ev a luation of C ( κ ) and its deriv a t iv es for a num b er of p oin ts κ . The HGD is the gradient descen t metho d together with the following algorithm of computing C ( κ ) and its deriv at iv es. First w e note that the function C ( κ ) is the mo diﬁed Bessel function of the ﬁrst kind and satisﬁes a diﬀeren tial equation ∂ 2 C + 1 κ ∂ C − C = 0 (1) (see e.g. Mardia and Jupp [20 0 0]). Assume that w e kno w the v alues of C and ∂ C at a giv en p o in t κ (0) > 0. Then v alues of C and ∂ C at an y other p oint κ (1) > 0 is obtained if one solv es t he diﬀeren tial equation ∂  C ∂ C  =  0 1 1 − κ − 1   C ∂ C  (2) n umerically from κ = κ (0) to κ = κ (1) . A k ey p oint is that the co eﬃcien t matrix in (2) is easily ev alua t ed. Now the HGD a lg orithm is giv en as follow s: 1. Compute C ( κ (0) ) and ∂ C ( κ (0) ) a t a p oin t κ (0) b y some metho d suc h as n umerical in tegration or series expansion. 2. F or t = 1 , 2 , . . . , rep eat the f o llo wing pro cedure un til κ ( t ) con v erges. 2 2-a. Determine κ ( t ) from κ ( t − 1) according to the gra dient metho d. 2-b. Solv e the equation (2) nume rically from κ ( t − 1) to κ ( t ) . The metho d needs direct computat io n of C ( κ ) and ∂ C ( κ ) only o nce, at κ = κ (0) . The metho d is also av ailable for m ulti- dimensional para meters. Assume that a function C ( θ ) o f θ ∈ R d , typic ally the normalizing constan t of a parametric fa mily , satisﬁes the follo wing partial diﬀerential equations: ∂ ∂ θ i G = P i ( θ ) G, i = 1 , . . . , d, (3) where G = G ( θ ) is a ﬁnite-dimensional v ector consisting of some partial deriv a t ives of C ( θ ) and P i ( θ ) is a square mat r ix of ra t io nal functions of θ for eac h i . An example is the equation (2), where θ = κ . Note that (3) is essen tially an or dinary diﬀeren tia l equation b ecause each equation con tains only one ∂ i . The equation (3) is called a Pfaﬃan system of C ( θ ). Let θ (0) and θ (1) b e tw o p oin ts in R d and a ssume that a nume rical v a lue of G ( θ (0) ) is kno wn. Then we can obtain a n umerical v alue of G ( θ (1) ) as f ollo ws. By recursiv e argumen t, it is suﬃcien t to consider the case that θ (0) and θ (1) ha v e the same comp onen ts all but the i -th comp onen t for some i . Then a numeric a l v alue of G ( θ (1) ) is obtained b y solving the equation (3) for i with the initial condition of G ( θ (0) ). No w the HGD algorithm is the same as t he one-dimensional case. In this pa p er w e apply the holonomic gradien t descen t and a m etho d of series expansion to F isher distribution on the rotation gr o up S O (3) and on the Stiefel manifold V 2 ( R 3 ) of tw o orthonormal v ectors. The F isher distribution on Stiefel manifolds and orthogonal groups has b een studied by num b er of authors. Ho w ev er only a few pap ers (Pren tice [1986], W o o d [1993]) study the Fisher distribution on the sp ecial orthogonal group S O ( p ). The holonomic gradient descen t needs the initia l v alue for the diﬀeren tial equation as illustrated ab o ve. T o ev aluate this v alue, w e dev elop an explicit form ula of the inﬁnite se- ries expansion for S O (3 ). An alternativ e metho d is a one-dimensional in tegratio n form ula prop osed b y W o o d [1993]. F urthermore, as a referee p ointed out, the Fisher distribution on S O (3) is iden tiﬁed with the Bingham distribution on t he real pro jectiv e space R P 3 (Pren tice [1986], W o o d [1993], Mardia and Jupp [20 00]). Th e normalizing constants of the Bingham distributions are hypergeometric functions of a matrix arg umen t and sad- dlep oin t approximations to these normalizing constan ts w ere giv en b y Kume and W o o d [2005]. The organization of the pap er is as follows . F or the rest of this section w e set up nota- tion and summarize preliminary facts on special orthogonal groups and Stiefel manifolds. In Section 2 w e deriv e some prop erties of F isher distribution on sp ecial orthogonal groups and Stiefel manifolds. In Section 3 we deriv e the set of partial diﬀeren tia l equations sat- isﬁed b y the normalizing constan t (Section 3.1 ). W e also give an inﬁnite series expansion for the normalizing constan t (Section 3.2). In Section 4 w e a pply the results of previous sections to the data on orbits of near- ear t h ob jects. 3 1.1 Notation and p reliminary facts Here w e set up notation of this pap er and summarize some preliminary facts. Although w e a re primarily interes ted in 3 × 3 matrices fo r pra ctical a nd computational reasons, we set up our nota tion for general dimension. Let V r ( R p ) = { A ∈ R p × r | A ⊤ A = I r } (0 < r ≤ p ) denote the Stiefel manifold of p × r real mat r ices with orthonor mal columns, where R p × r denotes the set of p × r real matr ices and A ⊤ denote the transp ose of A . In particular for r = p , V r ( R p ) = O ( p ) is the set of p × p orthog onal matrices. S O ( p ) = { X ∈ O ( p ) | det X = 1 } denotes the sp ecial o rthogonal group. Let V o l denote the in v ariant measure (volume elemen t) on V r ( R p ) a nd let µ ( · ) = 1 V ol( V r ( R p )) V ol( · ) denote the inv arian t probability measure on V r ( R p ). W e also call µ the uniform distribu- tion. Similarly for S O ( p ), b y µ ( · ) = V ol( · ) / V ol( S O ( p )) w e denote the uniform distribution with V ol( S O ( p )) = 1 2 V ol( O ( p )) . If p ≥ 3 and X = ( X ij ) is uniformly distributed on S O ( p ) , then we can show that the ﬁrst t wo momen ts are E [ X ij ] = 0 and E [ X ij X k l ] = 1 p 1 { ( i,j )=( k,l ) } . (4) W e pro v e E [ X 11 ] = E [ X 11 X 21 ] = 0 and E [ X 2 11 ] = 1 /p . The o ther cases are similarly pro ve d. Let Q = diag( − 1 , 1 , · · · , 1 , − 1) ∈ S O ( p ). Then ( QX ) 11 = − X 11 and ( QX ) 21 = X 21 . Since X and QX hav e the same distribution, w e ha ve E [ X 11 ] = 0 a nd E [ X 11 X 21 ] = 0. By a similar reason, X 11 and X i 1 for eac h i hav e the same marginal distribution. Therefore E [ X 2 11 ] = E [ X 2 i 1 ]. Since P p i =1 X 2 i 1 = 1, w e ha v e E [ X 2 11 ] = 1 /p . F o r a p × r matrix Θ ∈ R p × r , r ≤ p , let Θ = QD R, Q ∈ V r ( R p ) , D = diag( ρ 1 , . . . , ρ r ) , R ∈ O ( r ) denote its singular v alue decomposition (SVD ), where ρ 1 ≥ · · · ≥ ρ p ≥ 0. No w let Θ ∈ R p × p b e a square matrix and Θ = QD R b e the SVD. L et ǫ = det( QR ) ∈ {− 1 , 1 } . Then w e can write Θ = ˜ Q ˜ D ˜ R, ˜ Q, ˜ R ∈ S O ( p ) , ˜ D = diag( ǫρ 1 , ρ 2 , . . . , ρ p ) , (5) where ˜ Q = Q diag(det Q, 1 , . . . , 1) and ˜ R = diag(det R, 1 , . . . , 1) R . If Θ is non-singular, w e hav e ǫ = sgn det Θ. W e call the decomp osition (5) the sign-pr eserving SVD of Θ with resp ect to S O ( p ). W e also call φ 1 = ǫρ 1 , φ i = ρ i , i ≥ 2, the s ign-pr eserving s ingular values of Θ. The decomp osition is also used in Pren tice [1986 ] a nd W o o d [1993]. 4 2 Fisher di s tributi ons on V r ( R p ) and S O ( p ) In this section w e consider F isher distribution on V r ( R p ) and S O ( p ). In particular we clarify the diﬀerence b et we en Fisher distributions on V p − 1 ( R p ) and S O ( p ). Basic facts on Fisher distribution o n V r ( R p ) a re summarized in Chapter 1 3 of Mardia and Jupp [20 00]. Let X denote either V r ( R p ) or S O ( p ). The density of the Fisher distribution on X with resp ect to the uniform distribution µ is giv en by f ( X ; Θ) = 1 c (Θ) etr(Θ ⊤ X ) , X ∈ X , where Θ = ( θ ij ) ∈ R p × r is the parameter matrix, etr( · ) = exp(tr( · )), and c (Θ) = Z X etr(Θ ⊤ X ) µ ( d X ) (6) is the normalizing constan t. F or V r ( R p ) it is w ell know n (e.g. Khatri and Mardia [197 7], Muirhead [1982], Chikuse [2 0 03]) that c (Θ) is a hypergeometric function c (Θ) = 0 F 1 ( p/ 2 , Y ) with a matrix argument, where Y = Θ ⊤ Θ / 4. How ev er prop erties of c (Θ ) f or the sp ecial orthogonal group X = S O ( p ) ha ve no t b een studied in detail. F or the case of S O (3), follo wing the appro ac h in Pren tice [1 986], W o o d [1993] used the corresp ondence b et w een the Fisher distribution on S O (3) and the Bingha m distribution on the unit sphere S 3 or the real pro jectiv e space R P 3 and sho we d that c ( Θ ) can b e written as a one- dimensional in tegral in v olving the mo diﬁed Bessel function of degree zero. The normalizing constan ts of the Bingham distributions are h yp ergeometric functions of a matrix argumen t and sad- dlep oin t approximations to these normalizing constan ts w ere giv en b y Kume and W o o d [2005]. In Section 3 w e deriv e diﬀerential equations and an inﬁnite series expansion of c (Θ) for S O (3). Let x 1 , . . . , x p b e the columns of X ∈ S O ( p ). Sinc e x p is uniquely determined from x 1 , . . . , x p − 1 , w e can iden tify S O ( p ) with V p − 1 ( R p ) by ( x 1 , . . . , x p ) ∈ S O ( p ) ↔ ( x 1 , . . . , x p − 1 ) ∈ V p − 1 ( R p ) (7) This leads to the question of diﬀerences b et w een Fisher distributions on S O ( p ) and those on V p − 1 ( R p ). Let Θ = ( θ 1 , . . . , θ p ) ∈ R p × p b e a parameter matrix fo r Fisher distribution on S O ( p ). By setting θ p = 0, we clearly obtain a Fisher distribution on V p − 1 ( R p ). Henc e the family of Fisher distributions on V p − 1 ( R p ) is a submo del of the fa mily of Fisher distributions on S O ( p ). It can b e easily seen that fo r p = 2, θ 2 is redundant and these t w o families are the same. How ev er for p ≥ 3, the family of Fisher distributions on V p − 1 ( R p ) is a strict submo del of that on S O ( p ). W e state this as a lemma. Lemma 1. F or p ≥ 3 , the fami l y of Fis h er distributions on V p − 1 ( R p ) is a strict submo del of that on S O ( p ) . Pr o of. In g eneral, let K b e a p ositive integer a nd consider a K -dimensional exp onential family p ( x | θ ) = 1 C ( θ ) exp  θ ⊤ x  ( x ∈ S ) , C ( θ ) = Z S exp( θ ⊤ x ) ν ( dx ) , 5 where θ is a K -dimensional v ector, S is a smo o t h submanifold of R K and ν is a measure on S . Assume that C ( θ ) exists in some op en neigh b orho o d of the origin θ = 0 . W e call the para meter θ is estimable if p ( x | θ ) 6 = p ( x | ˜ θ ) as f unctions of x whenev er θ 6 = ˜ θ . F or the exp o nen tial family , θ is estimable if and only if Aﬃne(supp ort( ν )) = R K (e.g. Corollary 8.1 of Barndorﬀ-Nielsen [1978]), where supp ort ( ν ) is the supp o r t of ν a nd Aﬃne( U ), U ⊂ R K , denotes the aﬃne hull o f U . W e now show that if p ≥ 3 then Aﬃne( S O ( p )) = R p × p and therefore Θ is estimable, whic h is suﬃcien t to pro ve the lemma. Let L = Aﬃne( S O ( p )). W e ﬁr st see that t he zero matrix 0 b elongs to L . Let ǫ i ∈ {− 1 , 1 } for 1 ≤ i ≤ p . Then the a v erag e of 2 p − 1 matrices diag ( ǫ 1 , . . . , ǫ p − 1 , Q p − 1 i =1 ǫ i ) in S O ( p ) is zero. Hence 0 ∈ L . W e now sho w that e i e ⊤ j b elongs to L ( ∀ i, j ), where e i = (0 , . . . , 1 , . . . , 0) ⊤ is the standard basis v ector with 1 as the i -th elemen t. Then together with 0 ∈ L it f o llo ws t ha t L = R p × p . T ak e matrices P i ∈ S O ( p ) ( i = 1 , . . . , p ) suc h that P i e i = e 1 . F or example, let P 1 = I p and P i = e 1 e ⊤ i − e i e ⊤ 1 + P j 6 =1 ,i e j e ⊤ j for i 6 = 1. Then e i e ⊤ j ∈ L if and only if e 1 e ⊤ 1 = P i e i e ⊤ j P ⊤ j ∈ L . No w it suﬃces to sho w that e 1 e ⊤ 1 b elongs to L . T ak e t he av erage of 2 p − 2 matrices diag(1 , ǫ 2 , . . . , ǫ p − 1 , Q p − 1 i =2 ǫ i ). The n e 1 e ⊤ 1 ∈ L . F o r X = V r ( R p ) the maxim um likelih o o d estimate (MLE) of the Fisher distribution is o btained b y the follow ing pr o cedure (Khatri and Mardia [19 77]). Let X (1) , . . . , X ( N ) b e a data set on V r ( R p ). Let ¯ X = N − 1 P N t =1 X ( t ) b e the sample mean matrix and let ¯ X = Q diag( g 1 , . . . , g r ) R b e the SVD of ¯ X , where Q ∈ V r ( R p ), R ∈ O ( r ) and g 1 ≥ · · · ≥ g r ≥ 0. Then the maxim um lik eliho o d estimate ˆ Θ is give n b y ˆ Θ = Q diag ( ˆ φ 1 , . . . , ˆ φ r ) R , where ˆ φ i is the solution of R V r ( R p ) x ii exp( P r k =1 ˆ φ k x k k ) µ ( dX ) R V r ( R p ) exp( P r k =1 ˆ φ k x k k ) µ ( dX ) = g i , i = 1 , . . . , r. This pro cedure is a lso v alid for S O ( p ) if we use the sign-prese rving SVD in (5). W e giv e the fact as a lemma since it is not explicitly pro ved in the literature. Remark that for S O ( p ) the normalizing constan t c (Θ) in (6) is in v ariant under a tr a nsformation Θ 7→ Q Θ R for an y Q, R ∈ S O ( p ). Lemma 2. L et X (1) , . . . , X ( N ) b e a data se t on S O ( p ) . L et ¯ X = N − 1 P N t =1 X ( t ) b e the sample me an matrix and ¯ X = Q diag( g 1 , . . . , g p ) R b e the sign-pr eserving SVD of ¯ X , wher e Q, R ∈ S O ( p ) and | g 1 | ≥ g 2 · · · ≥ g p ≥ 0 . Then the maxim um li k eliho o d estimate of the Fisher distribution on S O ( p ) is ˆ Θ = Q diag ( ˆ φ 1 , . . . , ˆ φ p ) R , wher e ˆ φ i is the maximize r of the function ( φ k ) p k =1 7→ p X k =1 φ k g k − log c (diag ( φ 1 , . . . , φ p )) , (8) 6 or e quival e ntly, the solution of R S O ( p ) x ii exp( P p k =1 ˆ φ k x k k ) µ ( dX ) R S O ( p ) exp( P p k =1 ˆ φ k x k k ) µ ( dX ) = g i , i = 1 , . . . , p. (9) Pr o of. W e c hange the parameter v ariable from Θ to Φ = ( φ ij ) p i,j =1 = Q ⊤ Θ R ⊤ . Then the (1 / N times) log -lik eliho o d function is written as tr(Θ ⊤ ¯ X ) − log c (Θ) = tr(Φ ⊤ G ) − log c (Φ) , (10) where G = diag ( g 1 , . . . , g p ). Since (10) is strictly con vex in Φ, the unique maximizer mak es its ﬁrst-order deriv ativ es zero. Note tha t the ﬁrst term on the right hand side of (10) do es not dep end on the oﬀ-diago na l elemen ts of Φ. There fo re the condition for maximization of (1 0) with resp ect to an oﬀ-diagonal elemen t is written a s 0 = ∂ ∂ φ ij log c (Φ) , ( i 6 = j ) . (11) W e now ﬁx i 6 = j and ev aluate ( ∂ /∂ φ ij ) log c (Φ) at ( φ i ′ j ′ ) i ′ 6 = j ′ = 0. Then we hav e ∂ ∂ φ ij log c (Φ)     φ i ′ j ′ =0 ∀ i ′ 6 = j ′ = R S O ( p ) x ij exp( P p k =1 φ k k x k k ) µ ( dX ) R S O ( p ) exp( P p k =1 φ k k x k k ) µ ( dX ) . Ho w eve r Z S O ( p ) x ij exp( p X k =1 φ k k x k k ) µ ( dX ) = Z S O ( p ) ( − x ij ) exp( p X k =1 φ k k x k k ) µ ( dX ) = 0 b ecause the uniform distribution µ on S O ( p ) is in v ariant with resp ect to m ultiplication of the i -th row and the i -th (not j -th) column of X b y − 1, which transforms x ij in to − x ij and x k k in to x k k for ev ery k . Therefore any diagona l mat r ix Φ satisﬁes (11). The log-lik eliho o d function o f the diagonal matrix is (8) and the ma ximizer satisﬁes (9). When det ¯ X < 0, it is no t correct to use the ordinary singular v alues of ¯ X on the righ t-ha nd side of (9). Remark 1. The determinant of the sample me an m atrix ¯ X is not ne c e ssarily p ositive even if al l X ( t ) , t = 1 , . . . , N , ar e in S O ( p ) . Inde e d for the c a s e of uniform distribution on S O ( p ) we pr ove P (det ¯ X < 0) → 1 2 , ( N → ∞ ) , as long as p ≥ 3 . B y the c entr al limit the or em √ N ( ¯ X − E ( X )) c onver ges to a Gaussian r and o m matrix Z with the sam e c ovarianc es as X . F r om (4), w e kn ow that E ( X ) = 0 and the c ovarianc es of X ar e diag o nal when p ≥ 3 . Then Z an d any sig n change of a c olumn of Z have the same pr ob ability distribution and ther efor e the pr ob ability of det( Z ) < 0 is 1 / 2 . Henc e the pr ob ability of det( ¯ X ) < 0 c onver ges to 1 / 2 . 7 Remark 2. Even if det ¯ X > 0 , the determinant o f the estimate d p ar ameter ˆ Θ may b e ne g a tive. Inde e d, let the sign-pr ese rv i n g sin g ular values of ¯ X and ˆ Θ b e g = ( g 1 , g 2 , g 3 ) and ˆ φ = ( ˆ φ 1 , ˆ φ 2 , ˆ φ 3 ) , r esp e ctively. We pr ove that g 1 g 2 g 3 and ˆ φ 1 ˆ φ 2 ˆ φ 3 c an have the opp osite signs. T o se e this, we ﬁrst c onsider the c ase ˆ φ 1 = 0 , ˆ φ 2 > 0 and ˆ φ 3 > 0 . Th e n , by using the T aylor exp ansion formulas (16) an d (17) deve l o p e d i n Subse ction 3.2, we de duc e that g 1 , g 2 and g 3 ar e s trictly p ositive. By c ontinuity, ther e exist some ˆ φ 1 < 0 , ˆ φ 2 > 0 and ˆ φ 3 > 0 while al l g i ’s ar e p ositive. 3 Computation of the normalizing constan t and its deriv ativ e s F o r computing the maxim um lik eliho o d estimate of Fisher distribution w e need n umerical ev aluation of the normalizing constan t c (Θ ) of (6) and its deriv at iv es. In this section w e study tw o metho ds for this purp ose. The ﬁrst metho d is the holonomic gradien t descen t . In the second metho d, w e use series expansion of etr(Θ ⊤ X ). The second metho d is also used to compute t he initial v alue of HG D. 3.1 The holonomic gradien t descen t for Sti efel manifolds and sp ecial orthogonal group Let us brieﬂy describ e the holonomic gradient descen t. As to details, w e refer to Nak ay ama et al. [2011]. An algebraic computation is the ﬁrst step; w e construct linear ODE’s (o rdinary diﬀeren tial equations) satisﬁed by c (Θ) with resp ect to eac h θ ij b y G r ¨ obner bases of a set o f partial diﬀeren tial equations satisﬁed b y c . V ariables other than θ ij app ear as pa- rameters in the ODE. The rank o f ODE’s is called the ho lo nomic rank. The OD E’s give a dynamical syste m for the function c (Θ) etr( − Θ ⊤ ¯ X ), the recipro cal of the lik eliho o d. The gradien t of the function can also b e expressed in terms of deriv atives of the recip- ro cal standing for the standard monomials. The second step is a n umerical pro cedure; a p oint in the dynamical system mo v es tow ard the maxim um lik eliho o d estimate along the gradien t dir ection, sim ultaneously up dating the v alues of c (Θ) and its deriv ativ es. F o r the ho lo nomic gradien t descen t, w e study diﬀeren tial op erators A annihilating c (Θ), that is, A · c ( Θ) = 0. Denote the diﬀeren tial op erato r ∂ /∂ θ ij b y ∂ ij . W e ﬁrst study the sp ecial orthogonal gro up and then study the St iefel manifold. 8 3.1.1 The case of sp ecial orthogonal group Let Θ ∈ R p × p . W e consider t he follo wing three types of diﬀerential op erator s: A (1) ij = p X k =1 ∂ ik ∂ j k − δ ij , ˜ A (1) ij = p X k =1 ∂ k i ∂ k j − δ ij ( i ≤ j ) , A (2) = det( ∂ ij ) − 1 , A (3) ij = p X k =1 ( − θ j k ∂ ik + θ ik ∂ j k ) , ˜ A (3) ij = p X k =1 ( − θ k j ∂ k i + θ k i ∂ k j ) ( i < j ) , where δ ij is t he Kronec ke r’s delta. The follo wing lemma is a n analog of Theorem 2 of Nak ay a ma et al. [2011]. Lemma 3. The ab ove diﬀer ential op er ators annihilate c (Θ) of S O ( p ) . Pr o of. W e ﬁrst pro v e that the op erators A (1) ij , ˜ A (1) ij and A (2) annihilate etr(Θ ⊤ X ) for any X ∈ S O ( p ). Then they also annihilate c (Θ) b ecause A · c (Θ) = R S O ( p ) A · etr(Θ ⊤ X ) µ ( d X ) for an y o p erator A . Since ∂ ij · etr (Θ ⊤ X ) = x ij etr(Θ ⊤ X ) and X X ⊤ = I , we hav e A (1) ij · etr(Θ ⊤ X ) = p X k =1 x ik x j k − δ ij ! etr(Θ ⊤ X ) = 0 . Similarly , we obtain ˜ A (1) ij · etr(Θ ⊤ X ) = 0 from X ⊤ X = I and A (2) · etr(Θ ⊤ X ) = 0 from det( X ) = 1. Next consider A (3) ij and ˜ A (3) ij . W e note c (Θ) = c ( Q Θ) = c (Θ Q ) for an y Q ∈ S O ( p ). F or an y ﬁxed i < j , deﬁne a rota tion matrix Q = Q ( ǫ ) b y Q = (cos ǫ )( E ii + E j j ) + (sin ǫ )( − E ij + E j i ) + X k 6 = i,j E k k , where E k l is the matrix whose ( i, j )-th comp onen t is 1 if k = i and l = j and 0 otherwise. Then 0 = c ( Q Θ) − c (Θ) = c Θ − ǫ X k θ j k E ik + ǫ X k θ ik E j k + o( ǫ ) ! − c (Θ) = ǫ p X k =1 ( − θ j k ∂ ik + θ ik ∂ j k ) · c (Θ) + o( ǫ ) , as ǫ → 0. Hence w e ha ve A (3) ij · c (Θ) = 0. Similarly w e obtain ˜ A (3) ij · c (Θ) = 0 from c (Θ Q ) = c (Θ). 9 Let D b e the ring of diﬀeren tial op erator s with p olynomial co eﬃcien ts in θ ij and let I denote the ideal generated by the ab o ve diﬀeren tial op erat o rs A (1) ij , . . . , ˜ A (3) ij in D . Also let I diag denote I restricted to diagona l mat rices Θ = diag ( θ 11 , . . . , θ pp ). I · f ( Θ ) = 0 implies I diag · f (diag (Θ)) = 0. W e denote b y R p the ring of diﬀeren tia l op erators with rational function co eﬃcien ts in θ ij , 1 ≤ i, j ≤ p . The fo llo wing pr o p ositions are essen tial for the holonomic gradien t descen t. W e refer to Nak a yama et a l. [2011] f or the deﬁnition of holonomic ideals in D and zero-dimensional ideals in R p . Once zero-dimensionalit y of R p I is prov ed and a Gr¨ obner basis is constructed, w e can ﬁnd OD E’s and apply the holonomic gradien t descen t for the maxim um lik eliho o d estimate. Prop osition 1. I f p = 2 , then the ide al I is holonomic. In p articular, the ide al R 2 I is zer o- dimensional. The holonomic r ank is e qual to 2 . The prop osition is prov ed b y Macaulay2 (G ra yson a nd Stillman) and the y ang pac k ag e on Risa/Asir (RisaAsir dev eloping team) by utilizing Gr¨ obner basis computations in rings of diﬀeren tial op erators. Also the set of generators of I is obtained by nk restriction function of asir fro m the in tegral represen tation of c ( Θ ) a s g 1 = − ∂ 12 − ∂ 21 , g 2 = − ∂ 11 + ∂ 22 , g 3 = ∂ 2 21 + ∂ 2 22 − 1 , g 4 = ( θ 22 + θ 11 ) ∂ 21 + ( − θ 21 + θ 12 ) ∂ 22 , g 5 = ( θ 21 − θ 12 ) ∂ 22 ∂ 21 + ( θ 22 + θ 11 ) ∂ 2 22 + ∂ 22 − θ 22 − θ 11 , g 6 = ( − θ 21 + θ 12 ) ∂ 21 + ( θ 2 21 − 2 θ 12 θ 21 + θ 2 22 + 2 θ 11 θ 22 + θ 2 11 + θ 2 12 ) ∂ 2 22 + ( θ 22 + θ 11 ) ∂ 22 − θ 2 22 − 2 θ 11 θ 22 − θ 2 11 . F urthermore the set o f generators of I diag is giv en as h 1 = ( − θ 22 − θ 11 ) ∂ 2 11 − ∂ 11 + θ 22 + θ 11 , h 2 = − ∂ 11 + ∂ 22 . Prop osition 2. I f p = 3 , then the ide al R 3 I is zer o -dimensiona l . The holonom ic r ank is less than or e qual to 4 . R 3 / ( R 3 I ) is sp anne d by 1 , ∂ 31 , ∂ 32 , ∂ 33 as a ve ctor sp a c e over the ﬁeld of r ational functions. The prop osition is pro ve d by a large scale computation on R isa/ Asir with Gr¨ obner bases. The algorithm for it is explained in, e.g., Nak a yama et al. [2011]. Programs and obtained data are at the we bsite Op enXM/Math (Op enXM Mathematics Rep ository). W e conjecture that I is holonomic and consequen tly R p I is zero- dimensional for an y p in the case of S O ( p ). 10 3.1.2 The case of Stiefel manifold Let Θ ∈ R p × r ( r ≤ p ). Consider the following diﬀeren tia l o p erators: A (1) ij = p X k =1 ∂ k i ∂ k j − δ ij (1 ≤ i ≤ j ≤ r ) , A (2) ij = r X k =1 ( − θ j k ∂ ik + θ ik ∂ j k ) (1 ≤ i < j ≤ p ) , ˜ A (2) ij = p X k =1 ( − θ k j ∂ k i + θ k i ∂ k j ) (1 ≤ i < j ≤ r ) . Lemma 4. The ab ove op er ators annihilate c (Θ) of V r ( R p ) . Pr o of. The pro of is similar to that of Lemma 3. The op erator A (1) ij annihilates etr(Θ ⊤ X ) if X ∈ V r ( R p ). Since c (Θ) = c ( Q Θ) = c (Θ R ) for an y Q ∈ O ( p ) and R ∈ O ( r ), w e hav e A (2) ij · c ( Θ ) = 0 and ˜ A (2) · c (Θ) = 0, resp ectiv ely . Let I denote the ideal generated by the ab ov e op erato rs and let I diag denote its r e- striction to diag onal matrices Θ = diag ( θ 11 , . . . , θ r r ) ∈ R p × r . W e denote by R r,p the ring of diﬀeren tial op erators with rationa l function co eﬃcien ts in θ ij , 1 ≤ i ≤ p , 1 ≤ j ≤ r . Prop osition 3. If r = 2 , p = 3 , then the i d e al R 2 , 3 I is zer o-dimens ional. T he holon omic r ank is e qual to 4 . R 2 , 3 / ( R 2 , 3 I ) is sp anne d by 1 , ∂ 11 , ∂ 12 , ∂ 2 11 over the ﬁeld of r ational functions. This prop osition is also pro ve d b y a computation on Risa/Asir. Programs to v erify the prop osition are at the we bsite Op enXM/Math (Op enXM Mathematics R ep ository). W e conjecture that I is holonomic and conseque ntly R r,p I is zero-dimensional for any r and p in the case of V r ( R p ). 3.1.3 Diﬀeren tial equations for the diagonal matrix F o r the h yp ergeometric function c (Θ) = 0 F 1 ( p/ 2 , Y ), Y = Θ ⊤ Θ / 4, the followin g partial diﬀeren tial equation is w ell kno wn (Muirhead [1970], [Muirhead, 1 982, Thm.7.5.6]). Let y 1 , . . . , y r denote the eigen v alues of Y . The function 0 F 1 satisﬁes the follo wing partial diﬀeren tial equations: y i ∂ 2 i F + ( p 2 − r − 1 2 + 1 2 r X j =1 ,j 6 = i y i y i − y j ) ∂ i F − 1 2 r X j =1 ,j 6 = i y j y i − y j ∂ j F = F, i = 1 , . . . , r . (12) Muirhead [1970] obtained these partial diﬀeren tial equations from t he partial diﬀerential equations satisﬁed b y zonal p olynomials (James [1968], [T ak em ura , 198 4, Sec.4.5]). In App endix A w e c hec k that for lo w dimensional cases these equations are also deriv ed from the diﬀeren tial op erators in Lemma 4. 11 When Θ = diag( θ ii ) is diago nal, the normalizing constan t c (Θ) satisﬁes Muirhead’s diﬀeren tial equation g iv en b elow in (24). The holonomic rank of the system of equations is 8 when p = r = 3. In the case o f S O (3), which is of inte rest in our applications, the normalizing constan t should satisfy extra partial diﬀeren tial equations, b ecause w e hav e sho wn that the holo nomic rank of R 3 I is less than or equal to 4 in Prop osition 2. In fact, w e can ﬁnd extra diﬀeren tial equations from A ( k ) ij and ˜ A ( k ) ij . Theorem 1. 1. Put ℓ ij = ( θ 2 ii − θ 2 j j ) ∂ ii ∂ j j − ( θ j j ∂ ii − θ ii ∂ j j ) − ( θ 2 ii − θ 2 j j ) ∂ k k (13) for i 6 = j . The index k is chosen so that { i, j, k } = { 1 , 2 , 3 } . Then, the norma lizing c on s tant c (diag ( θ ii )) satisﬁes the p artial diﬀ er en tial e quation ℓ ij · c = 0 fo r 1 ≤ i < j ≤ 3 . 2. The holonom ic r ank of (24) and (13), ij = 12 , 13 , 23 , is 4 . In order to ﬁnd the op erator ℓ ij , we utilized the series expans io n (16) and (1 7) of the normalizing constan t c , whic h will be giv en in the next subsection, and the metho d of undetermined co eﬃcien ts or a syzygy computation. These metho ds will b e explained after the pro of. Once these op erators ℓ ij and some auxiliary op erators ar e found, the pro of consists o f a tedious calculation. Pr o of. 1. Let I b e the left ideal in D 9 , whic h is the ring of diﬀeren tia l op erators in 9 v ariables, generated b y A ( k ) ij , ˜ A ( k ) ij , A (2) . Since the normalizing constan t c is a nnihilat ed b y the elemen ts of I , w e ma y show that ℓ ij ∈ I + P i 6 = j θ ij D 9 . In fact, w e ha ve ℓ 12 = a 13 A (1) 13 + a 32 A (1) 32 + a 33 A (1) 33 + bA (2) + c 21 A (3) 21 + ˜ c 21 ˜ A (3) 21 + X i 6 = j θ ij e ij where a 13 = ( θ 2 22 − θ 2 11 )( ∂ 21 ∂ 32 − ∂ 22 ∂ 31 ) a 32 = − ( θ 2 22 − θ 2 11 )( ∂ 11 ∂ 32 − ∂ 12 ∂ 31 ) a 33 = ( θ 2 22 − θ 2 11 )( ∂ 11 ∂ 22 − ∂ 12 ∂ 21 ) b = − ( θ 2 22 − θ 2 11 ) ∂ 33 c 21 = θ 22 ( θ 22 ∂ 21 − θ 11 ∂ 12 ) ∂ 22 ˜ c 21 = ( θ 11 ∂ 21 − θ 22 ∂ 12 )( θ 22 ∂ 22 − 1) − 2 θ 22 ∂ 12 and e 12 = θ 22 (( θ 22 ∂ 22 + 1) ∂ 11 − θ 11 ∂ 2 22 ) ∂ 12 − θ 11 ( θ 22 ∂ 22 − 1) ∂ 11 ∂ 21 + θ 2 22 ∂ 2 22 ∂ 21 e 13 = θ 22 ( θ 22 ∂ 21 − θ 11 ∂ 12 ) ∂ 22 ∂ 23 e 21 = ( θ 22 ∂ 22 θ 11 ∂ 11 − θ 2 22 ∂ 2 22 − θ 22 ∂ 22 ) ∂ 12 − θ 2 22 ∂ 22 ∂ 21 ∂ 11 + θ 11 ( θ 22 ∂ 22 − 1) ∂ 21 ∂ 22 e 23 = − θ 22 ( θ 22 ∂ 21 − θ 11 ∂ 12 ) ∂ 13 ∂ 22 e 31 = ( θ 11 ( θ 22 ∂ 22 − 1) ∂ 21 − θ 22 ( θ 22 ∂ 22 + 1) ∂ 12 ) ∂ 32 e 32 = ( θ 22 ( θ 22 ∂ 22 + 1) ∂ 12 − θ 11 ( θ 22 ∂ 22 − 1) ∂ 21 ) ∂ 31 . 12 W e can sho w that the diﬀeren tial op erato r (24) found b y the Muirhead also belongs to I + P i 6 = j θ ij D 9 with an analogous metho d. No w, t he second statemen t can b e sho wn b y a rank ev aluat io n program (use, e.g., the holonomicRank command o f Macaula y2). Let us explain how w e fo und the op erat o r ℓ 12 . W e put ℓ 12 = P d ij ( θ ′ ) ∂ ii ∂ j j + P d i ( θ ′ ) ∂ ii + d ( θ ′ ), θ ′ = ( θ 11 , θ 22 , θ 33 ) where the degree of p olynomials d ij , d i , d is less than o r equal to 2. W e act the op erator ℓ 12 to a truncated series expansion and obt a in a sys tem of linear equations for the undetermine d co eﬃcien ts of the p olynomials. By solving the system, w e get a candidate of ℓ 12 . The op erators a ij , ˜ a ij , b , ˜ c ij and c ij are also fo und by the metho d of undetermined co eﬃcien ts. More precisely sp eaking, we put a ij = P | α | + | β |≤ N c αβ ij θ α ∂ β where c αβ ij are undetermined coeﬃcien ts a nd N = 5. W e put other op erat o rs ana logously . Expand ℓ 12 + X a ij A (1) ij + X ˜ a ij ˜ A (1) ij + bA (2) + X c ij A (3) ij + X ˜ c ij ˜ A (3) ij (14) in to the normally ordered expression and put θ ij = 0 f o r all i 6 = j . And then, set the co eﬃcien ts of eac h θ α ∂ β to 0 a nd we obtain a sy stem of linear eq ua t io ns with respect to the undetermined co eﬃcien ts. Find a non-trivial solution of the system whic h giv es a candidat e of undetermined o p erators. The op erators e ij are o bta ined b y collecting the righ t co eﬃcien ts of (14) with respect to θ ij , i 6 = j . A diﬀeren t approac h is a syzy gy computatio n (see, e.g., D.Cox et al. [2005]). W e put a ij = P | β | ≤ N c β ij ∂ β w ere c β ij are undetermined p olynomials. W e put other op erator s analogously . Doing the same pro cedure as abov e, w e obtain a sys tem of linear indef- inite equations in the p olynomial ring Q [ θ 11 , θ 22 , θ 33 ]. It can b e solv ed b y the syzygy computation. The p erformance o f the second metho d is more eﬃcien t than the ﬁrst one. Theorem 2. The Pfaﬃan e quation derive d fr om (24) and (13), ij = 12 , 13 , 2 3 is ∂ ii C = P i C , (15) wher e C = ( c, ∂ 11 c, ∂ 22 c, ∂ 33 c ) T and P 1 =      0 1 0 0 1 − θ 11 (2 θ 2 11 − θ 2 22 − θ 2 33 ) ( θ 11 − θ 33 )( θ 11 + θ 33 )( θ 11 − θ 22 )( θ 11 + θ 22 ) θ 22 ( θ 11 − θ 22 )( θ 11 + θ 22 ) θ 33 ( θ 11 − θ 33 )( θ 11 + θ 33 ) 0 θ 22 ( θ 11 − θ 22 )( θ 11 + θ 22 ) − θ 11 ( θ 11 − θ 22 )( θ 11 + θ 22 ) 1 0 θ 33 ( θ 11 − θ 33 )( θ 11 + θ 33 ) 1 − θ 11 ( θ 11 − θ 33 )( θ 11 + θ 33 )      P 2 =      0 0 1 0 0 θ 22 ( θ 11 − θ 22 )( θ 11 + θ 22 ) − θ 11 ( θ 11 − θ 22 )( θ 11 + θ 22 ) 1 1 − θ 11 ( θ 11 − θ 22 )( θ 11 + θ 22 ) − θ 22 ( θ 2 11 − 2 θ 2 22 + θ 2 33 ) ( θ 22 − θ 33 )( θ 22 + θ 33 )( θ 11 − θ 22 )( θ 11 + θ 22 ) θ 33 ( θ 22 − θ 33 )( θ 22 + θ 33 ) 0 1 θ 33 ( θ 22 − θ 33 )( θ 22 + θ 33 ) − θ 22 ( θ 22 − θ 33 )( θ 22 + θ 33 )      P 3 =      0 0 0 1 0 θ 33 ( θ 11 − θ 33 )( θ 11 + θ 33 ) 1 − θ 11 ( θ 11 − θ 33 )( θ 11 + θ 33 ) 0 1 θ 33 ( θ 22 − θ 33 )( θ 22 + θ 33 ) − θ 22 ( θ 22 − θ 33 )( θ 22 + θ 33 ) 1 − θ 11 ( θ 11 − θ 33 )( θ 11 + θ 33 ) − θ 22 ( θ 22 − θ 33 )( θ 22 + θ 33 ) θ 33 ( θ 2 11 + θ 2 22 − 2 θ 2 33 ) ( θ 22 − θ 33 )( θ 22 + θ 33 )( θ 11 − θ 33 )( θ 11 + θ 33 )      13 This theorem can be sho wn b y a straightforw ard calculation f r om Theorem 1 as ex- plained in, e.g., Nak a yama et al. [2011]. W e ev a luate the normalizing constan t and its deriv atives by ev aluating a truncated series expansion near the origin and extend the v alue b y solving an ordinary diﬀeren tial equation (the holonomic gradient metho d, H.Hashiguc hi et al. [2012]). The ODE is giv en in the following Corollar y . Corollary 1. F o r c o nstants a, b, c , we r estrict the function C to θ 11 = at , θ 22 = bt , θ 33 = ct . The or dinary di ﬀer e ntial e quation satisﬁe d by C with r esp e ct to t is dC dt =  A − 2 t diag(0 , 1 , 1 , 1)  C , wher e A =     0 a b c a 0 c b b c 0 a c b a 0     and the e igenvalues of A ar e { a − b − c, − a + b − c, − a − b + c, a + b + c } . 3.1.4 Practice of H GD Although the HGD is a general metho d whic h can b e applied to broad problems, w e need a go o d guess (oracle) of a starting p oin t to searc h for the optimal p oin t (MLE). W e explain why w e need a go o d guess of a starting p oin t with an example of S O (3 ). Let Θ b e the optimal p oint for a giv en data and ∂ C ∂ θ ii = P i ( θ ) C b e the Pfaﬃan system to apply for the HGD . The denominator of the co eﬃcien t matrix P ij is the po lynomial Q 1 ≤ i 0 but det ˆ Θ (s) < 0 . W e ha ve a ppro ximated the normalizing constan t by our series expansion truncated at the degree 20 and maximized the approximated log- lik eliho o d function. Thes e n umerical results are obta ined b y the implemen tation in R o f t he BF GS metho d. F o r the asteroid data the MLEs are ˆ Θ (s) 1:2 =   0 . 156 0 . 248 0 . 0379 0 . 062 − 0 . 00225 19 . 6   = Q 2  19 . 6 0 0 0 . 161  R 2 20 and ˆ Θ (s) =   0 . 0783 0 . 25 1 − 0 . 716 0 . 753 0 . 059 1 − 0 . 078 − 0 . 00341 19 . 6 0 . 050 3   = Q   19 . 6 0 0 0 0 . 815 0 0 0 − 0 . 6 5 5   R, where Q 2 , R 2 , Q and R are the mat r ices in (21). When w e appro ximate the normalizing constan t b y the p olynomial o f degree 20 as in the case of the comets’ data and use the output as a starting p oint of the holo nomic gradien t descen t, the HGD immediately ﬁnds a b etter lik eliho o d v alue and then w e reject the approximate solution by the series expansion metho d of this degree. W e increase the degree of the appro ximation un til the HG D do es not reject the output. In the outputs ab o ve, we appro ximate the normalizing constan t by our series expansion truncated at the degree 40. W e next compute the MLE b y the HGD with solving nu merically the a sso ciated dy- namical system along gradien t directions. W e a dd a sup erscript (h) as ˆ Θ (h) for v a lues computed by the holonomic gradien t descen t. W e use the output of the series expansion metho d as a starting p oin t of the HGD. F o r the comets’ data the MLE’s of the F isher distribution on V 2 ( R 3 ) a nd on S O (3) by the HGD r esp ective ly agree with the MLE’s by the series expansion metho d. F o r the asteroid data, the MLE’s are ˆ Θ ( h ) 1:2 = ˆ Θ ( s ) 1:2 and ˆ Θ ( h ) =   0 . 0779 0 . 25 1 − 0 . 733 0 . 769 0 . 0591 − 0 . 0776 − 0 . 00346 19 . 6 0 . 05 0 5   = Q   19 . 6 0 0 0 0 . 831 0 0 0 − 0 . 671   R. This output impro ves the approxim a t e MLE b y the series expansion metho d. The AIC (Ak aik e Informatio n Criterion) v alues are giv en in T able 1. F or eac h data set, AIC w as minimized b y t he Fisher distribution on S O (3). The log-lik eliho o d ratio test statistic (LLR) of V 2 ( R 3 ) a gainst S O (3) a nd its p - v alue based o n the χ 2 -approx imatio n with 3 degrees of freedom ar e also sho wn. T able 1: AIC of eac h data a nd eac h mo del, and the result of the lik eliho o d rat io test of V 2 ( R 3 ) a gainst S O (3 ). comets asteroids Uniform V 2 ( R 3 ) S O (3) Uniform V 2 ( R 3 ) S O (3) AIC 0 − 207 . 0 − 219 . 7 0 − 34764 . 6 − 34769 . 2 LLR 18 . 7 10 . 6 p -v a lue 0 . 0003 0 . 014 21 A P artial diﬀe ren tial equatio n for 0 F 1 ( p/ 2 , Y ) If Θ = diag ( θ ii ) is diagonal, then y i = θ 2 ii / 4. By change o f v ariables fr om ( 1 2) w e ha v e y i ∂ 2 i + ( p 2 − r − 1 2 + 1 2 r X j =1 ,j 6 = i y i y i − y j ) ∂ i − 1 2 r X j =1 ,j 6 = i y j y i − y j ∂ j − 1 = ∂ 2 ii + ( p − r ) 1 θ ii ∂ ii + X j 6 = i 1 θ 2 ii − θ 2 j j { θ ii ∂ ii − θ j j ∂ j j } − 1 . (24) W e now show that the numerator of (24) b elongs I diag for small dimensions. F o r p = r = 2 (i.e. for O (2)) by Macaula y2 w e c hec k ed that the a b ov e I is holonomic. Also b y asir (nk restriction), a set of generator s of I diag is giv en as h 1 = ( − θ 2 22 + θ 2 11 ) ∂ 4 11 + 6 θ 11 ∂ 3 11 + (2 θ 2 22 − 2 θ 2 11 + 6) ∂ 2 11 − 6 θ 11 ∂ 11 − θ 2 22 + θ 2 11 − 3 , h 2 = ( θ 2 22 − θ 2 11 ) ∂ 2 11 − θ 11 ∂ 11 + θ 22 ∂ 22 − θ 2 22 + θ 2 11 , h 3 = θ 22 ∂ 4 11 + θ 11 ∂ 22 ∂ 3 11 + (3 ∂ 22 − θ 22 ) ∂ 2 11 − θ 11 ∂ 22 ∂ 11 − 2 ∂ 22 , h 4 = θ 11 θ 22 ∂ 3 11 + ( θ 2 11 ∂ 22 − θ 22 ) ∂ 2 11 + ( − θ 2 11 − 1) ∂ 22 + θ 22 , h 5 = − ∂ 2 11 + ∂ 2 22 . Lo oking at h 2 and h 5 w e hav e h 2 = ( θ 2 22 − θ 2 11 )  ∂ 2 11 + θ 11 ∂ 11 − θ 22 ∂ 22 θ 2 11 − θ 2 22 − 1  , h 2 θ 2 22 − θ 2 11 + h 5 =  ∂ 2 22 + θ 22 ∂ 22 − θ 11 ∂ 11 θ 2 22 − θ 2 11 − 1  . These are the same as (2 4) for p = r = 2. F o r the case of V 2 ( R 3 ) ( p = 3, r = 2) b y Macaula y2 w e hav e chec k ed that I is holonomic By asir (nk restriction) I diag has the set of generators: h 1 = − θ 11 ∂ 22 ∂ 2 11 + ( − θ 22 ∂ 2 22 − 3 ∂ 22 + θ 22 ) ∂ 11 + θ 11 ∂ 22 , h 2 = θ 11 θ 22 ∂ 2 11 + θ 22 ∂ 11 − θ 11 θ 22 ∂ 2 22 − θ 11 ∂ 22 , h 3 = θ 2 11 ∂ 2 11 + 2 θ 11 ∂ 11 − θ 2 22 ∂ 2 22 − 2 θ 22 ∂ 22 + θ 2 22 − θ 2 11 , h 4 = − θ 11 ∂ 2 11 + ( θ 2 22 ∂ 2 22 + 2 θ 22 ∂ 22 − θ 2 22 − 1) ∂ 11 + θ 11 θ 22 ∂ 3 22 + 2 θ 11 ∂ 2 22 − θ 11 θ 22 ∂ 22 , h 5 = ( − θ 11 θ 22 ∂ 2 22 − θ 11 ∂ 22 + θ 11 θ 22 ) ∂ 11 − θ 2 22 ∂ 3 22 − 4 θ 22 ∂ 2 22 + ( θ 2 22 − 2) ∂ 22 + 2 θ 22 , h 6 = − θ 11 θ 22 ∂ 11 + ( θ 3 22 − θ 2 11 θ 22 ) ∂ 2 22 + (2 θ 2 22 − θ 2 11 ) ∂ 22 − θ 3 22 + θ 2 11 θ 22 . Lo oking at h 6 h 6 = − θ 11 θ 22 ∂ 11 + ( θ 3 22 − θ 2 11 θ 22 ) ∂ 2 22 + (2 θ 2 22 − θ 2 11 ) ∂ 22 − θ 3 22 + θ 2 11 θ 22 = ( θ 2 22 − θ 2 11 ) θ 22  ∂ 2 22 + 2 θ 2 22 − θ 2 11 ( θ 2 22 − θ 2 11 ) θ 22 ∂ 22 − θ 11 θ 2 22 − θ 2 11 ∂ 11 − 1  = ( θ 2 22 − θ 2 11 ) θ 22  ∂ 2 22 + 1 θ 22 ∂ 22 + θ 22 ∂ 22 − θ 11 ∂ 11 θ 2 22 − θ 2 11 − 1  22 w e see that it coincides with the case of p = 3 , r = 2 , i = 2 in (24). The case of p = r = 3 is to o big for the current implemen tation of the D -mo dule theoretic restriction algorit hm in the nk restriction pac k ag e. Theorem 1, whic h gives elemen ts in I diag , is sho wn b y a diﬀeren t metho d. B Asymptotic ev aluation of E ( k , l , m ) W e deriv e an asymptotic form of E ( k , l , m ) whe n k , l , m simultaneously go to inﬁnit y . Let k = nα , l = nβ a nd m = nγ where α , β and γ are ﬁxed p ositiv e n um b ers. W e use Laplace’s metho d to show E ( k , l , m ) ∼ r 2 π (( k + l )( l + m )( k + m )) − 1 / 2 (25) as n → ∞ . The integrand x k 11 x l 22 x m 33 of E ( k , l , m ) is maximized a t fo ur p oin ts ( x 11 , x 22 , x 33 ) = (1 , 1 , 1), ( − 1 , − 1 , 1) , ( − 1 , 1 , − 1) and (1 , − 1 , − 1) as long as k , l , m a re all ev en or all o dd. By symmetry it is suﬃcien t to consider the neigh b orho o d of diag (1 , 1 , 1), where X is appro ximated b y X =   (1 − ǫ 2 1 − ǫ 2 2 ) 1 / 2 − ǫ 1 − ǫ 2 ǫ 1 (1 − ǫ 2 1 − ǫ 2 3 ) 1 / 2 − ǫ 3 ǫ 2 ǫ 3 (1 − ǫ 2 2 − ǫ 2 3 ) 1 / 2   with suﬃcien tly small n um b ers ǫ 1 , ǫ 2 , ǫ 3 . The densit y of ( ǫ 1 , ǫ 2 , ǫ 3 ) with respect to the Leb esgue measure dǫ 1 dǫ 2 dǫ 3 is 1 / V ol( S O (3 )) = 1 / (8 π 2 ). Henc e w e obtain E ( k , l , m ) ∼ 4 Z (1 − ǫ 2 1 − ǫ 2 2 ) k / 2 (1 − ǫ 2 1 − ǫ 2 3 ) l/ 2 (1 − ǫ 2 2 − ǫ 2 3 ) m/ 2 1 8 π 2 dǫ 1 dǫ 2 dǫ 3 ∼ 4 Z e − ( k + l ) ǫ 2 1 / 2 − ( k + m ) ǫ 2 2 / 2 − ( l + m ) ǫ 2 3 / 2 1 8 π 2 dǫ 1 dǫ 2 dǫ 3 = r 2 π (( k + l )( l + m )( k + m )) − 1 / 2 . W e hav e c hec ke d that the rig h t-hand side giv es a go o d approximation to t he exact v alue of E ( k , l , m ) for k + l + m ≥ 10 0. The same argumen t sho ws that fo r S O ( p ) E " p Y i =1 x k i ii # ∼ p ( p − 1 ) 2 1 V ol( S O ( p )) Y i 0, and k i ’s are are a ll eve n or all o dd. 23 References Ole Barndorﬀ-Nielsen. Inform ation and Exp onential Fami lies . John Wiley & Sons Inc., New Y ork, 1978 . Wiley Series in Probabilit y and Mathematical Statistics. R. J. Beran. Exp onential mo dels for directional data. The A nnals o f S tatistics , 7:1162 – 1178, 1 979. Y asuk o Chikuse . Statistics on Sp e cial Manifolds , v olume 174 o f L e ctur e Notes in Statistics . Springer-V erlag , New Y ork, 2 003. ISBN 0-38 7-0016 0 -3. D.Co x, J. Little, and D.O’Shea. Using Algebr aic Ge ometry . Springer-V erlag, New Y ork, 2005. ISBN 978- 0387207 339. Daniel R. Gra yson and Michael E. Stillman. Macaula y2, a soft w a re system for researc h in alg ebraic geometry . Av ailable at http://www.m a t h.uiuc.edu/Macaula y2/. H.Hashiguc hi, Y.Numata, N.T ak a yama, and A.T ake mura. Holonomic gradient metho d for the distribution function of the largest ro ot of a Wishart matrix. arxi v :1201.0472 , 2012. A. T. James. Calculation of zonal p olynomial co eﬃcien ts by use o f the Laplace-Beltrami op erator. A nn. Math. Statist , 39:17 11–1718 , 1 9 68. ISSN 0003-4 8 51. P . E. Jupp and K. V. Mardia. Maxim um lik eliho o d estimators for the matrix von Mises– Fisher and Bingham distributions. The Annals of Statistics , 7:59 9–606, 1979. C. G . Khatri and K. V. Mardia. The v on Mises–Fisher dis tr ibution in orentation statistics. Journal o f the Royal Statistic al So ciety: S e ri e s B , 39:95–106 , 1977 . A. Kume a nd A. T. A. W o o d. Saddlep oint approximations for the Bingham and F isher- Bingham no r ma lising constan ts. Biometrika , 92 :4 65–476, 2005. K. V. Mardia. Sta t istics of directional data. Journal of the R oyal Statistic al So ciety: Series B , 3 7 :349–393 , 1975. Kan ti V. Mardia and P eter E. Jupp. Dir e ctional Statistics . Wiley Series in Probabilit y and Sta tistics. John Wiley & Sons Ltd., Chic hester, 20 00. ISBN 0- 471-95 333-4. Brian G. Marsde n. Catalo gue of Cometary Orbits . Smithonian Astroph ysical Obse rv atory , Cam bridge, MA, 19 72. R. J. Muirhead. Systems of par t ia l diﬀ erential equations for h yp ergeometric functions of matrix argumen t. Ann. Math. Statist. , 41:9 9 1–1001, 1970. ISSN 00 0 3-4851 . Robb J. Muirhead. Asp e cts of Multivariate Statistic al The ory . John Wiley & Sons Inc., New Y ork, 1982 . ISBN 0-471-09 442-0. Wiley Series in Probability and Mathematical Statistics. 24 Hiromasa Nak ay ama, Ken ta Nishiy ama, Masay uki Nor o, K a tsuy oshi Ohara, T omonari Sei, Nobuki T ak a y a ma , and Akimic hi T a k em ura. Holonomic gradient descen t and its application to the Fisher-Bingham in tegral. A dv a n c e s in Applie d Mathematics , 47 : 639–658, 2011. doi: 10.1016/j.aam.20 11.03.001. Op enXM Mathematics Rep ository . Dat a for the sp ecial ort ho gonal group a nd the stiefel manifold. h ttp://www.math.k ob e-u.ac.jp/Op enXM/Math/Fisher-Bingham/SO3. Mic hael J. Prentic e. Orien tation statistics without parametric assumptions. J. R oy. Statist. So c. Ser. B , 48(2):2 14–222, 19 86. ISSN 0035-9 2 46. RisaAsir dev eloping team. Risa/asir, a computer algebra system. Av ailable at h ttp://www.math.k ob e-u.ac.jp/Asir/. Akimic hi T ak em ura. Zonal Polynomials . I nstitute o f Mathematical Statistics Lecture Notes—Monograph Series, 4. Institute of Mathematical Statistics, Hayw ard, CA, 19 8 4. ISBN 0-9406 00-05- 6. Andrew T. A. W o o d. Estimation of the concen tration parameters of the Fisher matrix distribution on SO(3) a nd the Bing ha m distribution on S q , q ≥ 2. A ustr al. J. Statist. , 35(1):69–7 9, 1993. ISSN 0004-9 5 81. 25

Properties and applications of Fisher distribution on the rotation group

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment