Open Problems in Algebraic Statistics

Algebraic statistics is concerned with the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry. This article presents a list of open mathematical problems in this emerging field, with main em…

Authors: Bernd Sturmfels

OPEN PROBLEMS IN ALGEBRAIC STATISTICS

BERND STURMFELS∗

Abstract. Algebraic statistics is concerned with the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry. This article presents a list of open mathematical problems in this emerging field, with main emphasis on graphical models with hidden variables, maximum likelihood estimation, and multivariate Gaussian distributions. These are notes from a lecture presented at the IMA in Minneapolis during the 2006/07 program on Applications of Algebraic Geometry.

Key words. Algebraic statistics, contingency tables, hidden variables, Schur modules, maximum likelihood, conditional independence, multivariate Gaussian, gaussoid

AMS(MOS) subject classifications. 13P10, 14Q15, 62H17, 65C60

1. Introduction. This article is based on a lecture given in March 2007 at the workshop on Statistics, Biology and Dynamics held at the Institute for Mathematics and its Applications (IMA) in Minneapolis as part of the 2006/07 program on Applications of Algebraic Geometry. In four sections we present mathematical problems whose solutions would likely become important contributions to the emerging interactions between algebraic geometry and computational statistics. Each of the four sections starts out with a “specific problem” which plays the role of representing the broader research agenda. The latter is summarized in a “general problem”.

Algebraic statistics is concerned with the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry. The term was coined in the book of Pistone, Riccomagno and Wynn [25] and subsequently developed for biological applications in [24].
Readers from statistics will enjoy the introduction and review recently given by Drton and Sullivant [8], while readers from algebra will find various points of entry cited in our discussion and listed among our references.

2. Graphical Models with Hidden Variables. Our first question concerns three-dimensional contingency tables $(p_{ijk})$ whose indices $i, j, k$ range over a set of four elements, such as the set $\{A, C, G, T\}$ of DNA bases.

Specific Problem: Consider the variety of $4 \times 4 \times 4$-tables of tensor rank at most 4. There are certain known polynomials of degree at most nine which vanish on this variety. Do they suffice to cut out the variety?

This particular open problem appears in [24, Conjecture 3.24], and it here serves as a placeholder for the following broader direction of inquiry.

General Problem: Study the geometry and commutative algebra of graphical models with hidden random variables. Construct these varieties by gluing familiar secant varieties, and by applying representation theory.

∗ University of California, Berkeley, CA 94720, USA, bernd@math.berkeley.edu

We are interested in statistical models for discrete data which can be represented by polynomial constraints. As is customary in algebraic geometry, we consider varieties over the field of complex numbers, with the tacit understanding that statisticians mostly care about points whose coordinates are real and non-negative. The model referred to in the Specific Problem lives in the 64-dimensional space $\mathbb{C}^4 \otimes \mathbb{C}^4 \otimes \mathbb{C}^4$ of $4 \times 4 \times 4$-tables $(p_{ijk})$, where $i, j, k \in \{A, C, G, T\}$. It has the parametric representation

$$p_{ijk} = \rho_{Ai}\,\sigma_{Aj}\,\theta_{Ak} + \rho_{Ci}\,\sigma_{Cj}\,\theta_{Ck} + \rho_{Gi}\,\sigma_{Gj}\,\theta_{Gk} + \rho_{Ti}\,\sigma_{Tj}\,\theta_{Tk}. \tag{2.1}$$

Our problem is to compute the homogeneous prime ideal $I$ of all polynomials which vanish on this model.
The desired ideal $I$ lives in the polynomial ring $\mathbb{Q}[p_{AAA}, p_{AAC}, p_{AAT}, \ldots, p_{TTG}, p_{TTT}]$ with 64 unknowns. In principle, one can compute generators of $I$ by applying Gröbner bases methods to the parametrization (2.1). However, our problem has 64 probabilities and 48 parameters, and it is simply too big for the kind of computations which were performed in [24, §3.2] using the software package Singular [13].

Given that Gröbner basis methods appear to be too slow for any problem size which is actually relevant for real data, skeptics may wonder why a statistician should bother learning the language of ideals and varieties. One possible response to the practitioner's legitimate question “Why (pure) mathematics?” is offered by the following quote due to Henri Poincaré: “Mathematics is the Art of Giving the Same Name to Different Things”. Indeed, our prime ideal $I$ gives the same name to the following things:
• the set of $4 \times 4 \times 4$-tables of tensor rank $\leq 4$,
• the mixture of four models for three independent random variables,
• the naive Bayes model with four classes,
• the conditional independence model $[X_1 \perp\!\!\!\perp X_2 \perp\!\!\!\perp X_3 \mid Y]$,
• the fourth secant variety of the Segre variety $\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3$,
• the general Markov model for the phylogenetic tree $K_{1,3}$,
• superposition of four pure states in a quantum system [4, 14].

These different terms have been used in the literature for the geometric object represented by (2.1). The concise language of commutative algebra and algebraic geometry can be an effective channel of communication for the different communities of statisticians, computer scientists, physicists, engineers and biologists, all of whom have encountered formulas like (2.1).

The generators of lowest degree in our ideal $I$ have degree five, and the known generators of highest degree have degree nine.
The analysis of Landsberg and Manivel in [20, Proposition 6.3] on $3 \times 3 \times 4$-tables of tensor rank four implies the existence of additional ideal generators of degree six in $I$. This analysis had been overlooked by the authors of [24] when they formulated their Conjecture 3.24. Readers of [24, Chapter 3] are herewith kindly asked to replace “of degree 5 and 9” by “of degree at most 9”.

In what follows we present the known minimal generators of degree five and nine in our prime ideal $I$, and we postpone a more detailed discussion of the Landsberg-Manivel sextics in [20, Proposition 6.3] to a future study.

Consider any $3 \times 4 \times 4$-subtable $(p_{ijk})$ and let $A, B, C$ be the $4 \times 4$-slices gotten by fixing $i$. To be precise, the entry of the $4 \times 4$-matrix $A$ in row $j$ and column $k$ equals $p_{Ajk}$, the entry of $B$ in row $j$ and column $k$ equals $p_{Cjk}$, and the entry of $C$ in row $j$ and column $k$ equals $p_{Gjk}$. We can check that the following identity of $4 \times 4$-matrices holds for all tables in our model, provided the matrix $B$ is invertible:

$$A \cdot B^{-1} \cdot C = C \cdot B^{-1} \cdot A.$$

After clearing the denominator $\det(B)$, we can write this identity as

$$A \cdot \mathrm{adj}(B) \cdot C - C \cdot \mathrm{adj}(B) \cdot A = 0, \tag{2.2}$$

where $\mathrm{adj}(B) = \det(B) \cdot B^{-1}$ is the adjoint matrix of $B$. The matrix entries on the left hand side give 16 quintic polynomials which lie in our prime ideal $I$. Each matrix entry is a polynomial with 180 terms which involve only 30 of the 64 unknowns. For example, the upper left entry looks like this:

$$p_{AAC}\,p_{CCA}\,p_{CGG}\,p_{CTT}\,p_{GAA} - p_{AAC}\,p_{CCA}\,p_{CGT}\,p_{CTG}\,p_{GAA} - p_{AAC}\,p_{CCG}\,p_{CGA}\,p_{CTT}\,p_{GAA} + p_{AAC}\,p_{CCT}\,p_{CGA}\,p_{CTG}\,p_{GAA} + \cdots\ (175 \text{ terms})\ \cdots - p_{ATA}\,p_{CAG}\,p_{CCC}\,p_{CGA}\,p_{GAT}.$$

We note that there are no non-zero polynomials of degree $\leq 4$ in the ideal $I$. This follows from general results on secant varieties [5, 17].
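The identity (2.2) can be sanity-checked numerically. The following is a minimal sketch in Python with NumPy (our choice of tool, not one used in the text); the function names are hypothetical. It builds a random table of the form (2.1), forms the slices $A, B, C$ obtained by fixing the first index, and evaluates the left hand side of (2.2). For contrast it also evaluates the same expression on an explicit table with slices $E_{12}$, $\mathrm{Id}$, $E_{21}$, which is only an algebraic witness and not a probability table.

```python
import numpy as np

def random_rank4_table(rng):
    """Random 4x4x4 table of the form (2.1): a sum of four rank-one tensors."""
    rho, sigma, theta = (rng.random((4, 4)) for _ in range(3))
    # p[i, j, k] = sum over the hidden state h of rho[h, i] * sigma[h, j] * theta[h, k]
    return np.einsum("hi,hj,hk->ijk", rho, sigma, theta)

def adjugate(B):
    """Adjugate (classical adjoint) of a square matrix, computed from cofactors."""
    n = B.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(B, i, axis=0), j, axis=1)
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

def quintic_residual(p):
    """Left hand side of (2.2) for the slices A = p[0], B = p[1], C = p[2]."""
    A, B, C = p[0], p[1], p[2]
    return A @ adjugate(B) @ C - C @ adjugate(B) @ A

# A table in the model: the residual vanishes up to rounding error.
p = random_rank4_table(np.random.default_rng(0))
print(np.abs(quintic_residual(p)).max())

# A table outside the model: slices E12, Id, E21 give the residual E11 - E22.
q = np.zeros((4, 4, 4))
q[0, 0, 1] = 1.0
q[1] = np.eye(4)
q[2, 1, 0] = 1.0
print(np.abs(quintic_residual(q)).max())
```

The adjugate is computed from cofactors rather than as $\det(B) \cdot B^{-1}$ so that the check does not depend on $B$ being well conditioned.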
An explicit linear algebra computation reveals that all polynomials of degree five in $I$ are gotten from the above construction by relabeling and considering all subtables of format $3 \times 4 \times 4$, format $4 \times 3 \times 4$ and format $4 \times 4 \times 3$, and by applying the natural action of the group $GL(\mathbb{C}^4) \times GL(\mathbb{C}^4) \times GL(\mathbb{C}^4)$ on $4 \times 4 \times 4$-tables. This action leaves the ideal $I$ fixed. We identify the representation of this group on the space of quintics in $I$.

Proposition 2.1. The space of quintic polynomials in the prime ideal $I$ of (2.1) has dimension 1728. As a $GL(\mathbb{C}^4)^3$-module, it is isomorphic to

$$S_{311}(\mathbb{C}^4) \otimes S_{2111}(\mathbb{C}^4) \otimes S_{2111}(\mathbb{C}^4) \;\oplus\; S_{2111}(\mathbb{C}^4) \otimes S_{311}(\mathbb{C}^4) \otimes S_{2111}(\mathbb{C}^4) \;\oplus\; S_{2111}(\mathbb{C}^4) \otimes S_{2111}(\mathbb{C}^4) \otimes S_{311}(\mathbb{C}^4).$$

Here $S_\lambda(\mathbb{C}^4)$ denotes the Schur modules which are the irreducible representations of $GL(\mathbb{C}^4)$. We refer to [10] for the relevant basics on representation theory of the general linear group, and to [17, 18, 19] for more detailed information about the specific modules under consideration here.

The known invariants of degree nine are also obtained by a similar construction. Consider any $3 \times 3 \times 3$-subtable $(p_{ijk})$ and denote the three slices of that table by $A$, $B$ and $C$. We now consider the $3 \times 3$-determinant

$$\det(A \cdot B^{-1} \cdot C - C \cdot B^{-1} \cdot A). \tag{2.3}$$

The denominator of the rational function (2.3) is $\det(B)^2$ and not $\det(B)^3$ as one might think on first glance. The numerator of (2.3) is a homogeneous polynomial of degree nine with 9216 terms which remains invariant under permuting $A$, $B$ and $C$. This homogeneous polynomial of degree nine lies in the ideal $I$ and is known as the Strassen invariant.

Proposition 2.2. The $GL(\mathbb{C}^4)^3$-submodule of the degree 9 component $I_9$ generated by the Strassen invariant is not contained in the ideal $\langle I_5 \rangle$ generated by the quintics in Proposition 2.1.
This module has vector space dimension 8000 and it is isomorphic to the representation

$$S_{333}(\mathbb{C}^4) \otimes S_{333}(\mathbb{C}^4) \otimes S_{333}(\mathbb{C}^4).$$

The first appearance of the Strassen invariant in algebraic statistics was [11, Proposition 22]. A conceptual study of the matrix construction $A B^{-1} C - C B^{-1} A$ was undertaken by Landsberg and Manivel in [18].

The Specific Problem at the beginning of this section plays a pivotal role also in algebraic phylogenetics [1, 2, 3]. Our model (2.1) is known there as the general Markov model on a tree with three leaves branching off directly from the root. Allman and Rhodes [2, §6] showed that phylogenetic invariants which cut out the general Markov model on any larger binary rooted tree can be constructed from the generators of our ideal $I$ by a gluing process. The invariants of degree five and nine arising from (2.2) and (2.3) are therefore basic building blocks for phylogenetic invariants on arbitrary trees whose nodes are labeled with the four letters A, C, G and T.

In her lecture at the same IMA conference in March 2007, Elizabeth Allman [1] offered an extremely attractive prize for the resolution of the Specific Problem. She offered to personally catch and smoke wild salmon from the Copper River, located in her “backyard” in Alaska, and ship it to anyone who will determine a minimal generating set of the prime ideal $I$.

In Propositions 2.1 and 2.2, we emphasized the language of representation theory in characterizing the defining equations of graphical statistical models. This methodology is a main focus in the forthcoming book by J.M. Landsberg and Jason Morton, which advocates the idea of using Schur modules $S_\lambda(\mathbb{C}^n)$ in the description of such models. Morton's key insight is that this naturally generalizes conditional independence, the current language of choice for characterizing graphical models.
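The dimensions 1728 and 8000 stated in Propositions 2.1 and 2.2 can be double-checked with the hook content formula, $\dim S_\lambda(\mathbb{C}^n) = \prod_{\text{boxes}} (n + \text{content})/\text{hook length}$, a standard fact about Schur modules. The sketch below uses plain Python; the function name `schur_dim` is hypothetical and not taken from the text.

```python
from fractions import Fraction

def schur_dim(partition, n):
    """Dimension of the Schur module S_lambda(C^n) via the hook content formula."""
    lam = [part for part in partition if part > 0]
    if len(lam) > n:
        return 0  # more rows than n: the Schur module is zero
    # Column lengths of the Young diagram (the conjugate partition).
    conj = [sum(1 for part in lam if part > c) for c in range(lam[0])] if lam else []
    dim = Fraction(1)
    for r, row_len in enumerate(lam):
        for c in range(row_len):
            content = c - r
            hook = (row_len - c - 1) + (conj[c] - r - 1) + 1  # arm + leg + 1
            dim *= Fraction(n + content, hook)
    return int(dim)

# Proposition 2.1: three summands, each a tensor product S_311 x S_2111 x S_2111.
quintics = 3 * schur_dim((3, 1, 1), 4) * schur_dim((2, 1, 1, 1), 4) ** 2
# Proposition 2.2: the module S_333 x S_333 x S_333.
strassen = schur_dim((3, 3, 3), 4) ** 3
print(quintics, strassen)
```

The individual factors are $\dim S_{311}(\mathbb{C}^4) = 36$, $\dim S_{2111}(\mathbb{C}^4) = 4$ and $\dim S_{333}(\mathbb{C}^4) = 20$, so that $3 \cdot 36 \cdot 4 \cdot 4 = 1728$ and $20^3 = 8000$.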
Conditional independence statements can be interpreted as a convenient shorthand for large systems of quadratic equations; see [12, §4.1] or [27, Proposition 8.1].

In the absence of hidden random variables, the quadratic equations expressed implicitly by conditional independence are sufficient to characterize graphical models. This is the content of the Hammersley-Clifford Theorem (see e.g. [12, Theorem 4.1] or [24, Theorems 1.30 and 1.33]). However, when some of the random variables in a graphical model are hidden then the situation becomes much more complicated. We believe that representation theory of the general linear group can greatly enhance the conditional independence calculus which is so widely used by graphical models experts. The representation-theoretic notation was here illustrated for a tiny graphical model, having three observed random variables and one hidden random variable, all four having the same state space $\{A, C, G, T\}$.

3. Maximum Likelihood Estimation. In this section we discuss topics concerning the algebraic approach to maximum likelihood estimation [24, §3.3]. The following open problem was published in [16, Problem 13].

Specific Problem: Find a geometric characterization of those projective varieties whose maximum likelihood degree (ML degree) is equal to one.

This question and others raised in [6, 16] are just the tip of an iceberg:

General Problem: Study the geometry of maximum likelihood estimation for algebraic statistical models.

Here algebraic statistical models are regarded as projective varieties. A model has ML degree one if and only if its maximum likelihood estimator is a rational function of the data. Models which have this property tend to be very nice.
For instance, in the special context of undirected graphical models (Markov random fields), the property of having ML degree one is equivalent to the statement that the graph is decomposable [12, Theorem 4.4]. For toric varieties, our question was featured in [27, Problem 8.23]. It is hoped that the ML degree is related to convergence properties of numerical algorithms used by statisticians, such as iterative proportional scaling or the EM algorithm, but no systematic study in this direction has yet been undertaken. In general, we wish to learn how statistical features of a model relate to geometric properties of the corresponding variety.

Here are the relevant definitions for our problems. We fix the complex projective space $\mathbb{P}^n$ with coordinates $(p_0 : p_1 : \cdots : p_n)$. The coordinate $p_i$ represents the probability of the $i$th event. The $n$-dimensional probability simplex is identified with the set $\mathbb{P}^n_{\geq 0}$ of points in $\mathbb{P}^n$ which have non-negative real coordinates. The data comes in the form of a non-negative integer vector $(u_0, u_1, \ldots, u_n) \in \mathbb{N}^{n+1}$. Here $u_i$ is the number of times the $i$th event was observed. The corresponding likelihood function is defined as

$$L(p_0, p_1, \ldots, p_n) = \frac{p_0^{u_0} \cdot p_1^{u_1} \cdot p_2^{u_2} \cdots p_n^{u_n}}{(p_0 + p_1 + \cdots + p_n)^{u_0 + u_1 + \cdots + u_n}}. \tag{3.1}$$

Statistical computations are typically done in affine $n$-space specified by $p_0 + p_1 + \cdots + p_n = 1$, where the denominator of $L$ can be ignored. However, the denominator is needed in order for $L$ to be a well-defined rational function on $\mathbb{P}^n$. The unique critical point of the likelihood function $L$ is at $(u_0 : u_1 : \cdots : u_n)$, and this point is the global maximum of $L$ over $\mathbb{P}^n_{\geq 0}$. By a critical point we mean any point at which the gradient of $L$ vanishes.

An algebraic statistical model is represented by a subvariety $M$ of the projective space $\mathbb{P}^n$.
The model itself is the intersection of $M$ with the probability simplex $\mathbb{P}^n_{\geq 0}$. The ML degree of the variety $M$ is the number of complex critical points of the restriction of the likelihood function $L$ to $M$. Here we disregard singular points of $M$, we only count critical points that are not poles or zeros of $L$, and $u_0, u_1, \ldots, u_n$ are assumed to be generic. If $M$ is smooth and the divisor on $M$ defined by $L$ has normal crossings then there is a geometric characterization of the ML degree, derived in the paper [6] with Catanese, Hoşten and Khetan. The assumptions of smoothness and normal crossings are very restrictive and almost never satisfied for models of statistical interest. In general, to understand the ML degree will require invoking some resolution of singularities and its algebraic underpinnings.

We illustrate the computation of the ML degree for the case when $M$ is a plane curve. Here $n = 2$ and $M$ is the zero set of a homogeneous polynomial $F(p_0, p_1, p_2)$. Using Lagrange multipliers or [16, Proposition 2], we derive that the condition for $(p_0 : p_1 : p_2)$ to be a critical point of the restriction of $L$ to $M$ is equivalent to the system of two equations

$$F(p_0, p_1, p_2) \;=\; \det \begin{pmatrix} u_0 & p_0 & p_0 \cdot \partial F/\partial p_0 \\ u_1 & p_1 & p_1 \cdot \partial F/\partial p_1 \\ u_2 & p_2 & p_2 \cdot \partial F/\partial p_2 \end{pmatrix} \;=\; 0.$$

For a general polynomial $F$ of degree $d$, these equations will have $d(d+1)$ solutions, by Bézout's Theorem. Moreover, all of these solutions satisfy

$$p_0 \cdot p_1 \cdots p_n \cdot (p_0 + p_1 + \cdots + p_n) \neq 0, \tag{3.2}$$

and we conclude that the ML degree of a general plane curve of degree $d$ is equal to $d(d+1)$. However, that number can drop considerably for special curves. For instance, while the ML degree of a general plane quadric equals six, the special quadric $\{p_1^2 = \lambda p_0 p_2\}$ has ML degree two for $\lambda \neq 4$, and it has ML degree one for $\lambda = 4$.
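The drop in ML degree for the special quadric can be reproduced with exact arithmetic. The sketch below uses SymPy (our choice of tool, not one used in the text) and a hypothetical helper `ml_degree_of_conic`. It assumes $\lambda \neq 0$ and uses the parametrization $(t^2/\lambda : t : 1)$ of the conic, which misses only the point $(1:0:0)$, and that point is excluded by (3.2). The helper substitutes the parametrization into the determinant above, removes all roots that violate (3.2), and counts the remaining distinct roots. The data vector $u = (2, 3, 5)$ is an arbitrary illustrative choice.

```python
import sympy as sp

def ml_degree_of_conic(lam, u):
    """Count critical points of L on the conic p1^2 = lam*p0*p2 for the data u."""
    p0, p1, p2, t = sp.symbols("p0 p1 p2 t")
    lam = sp.Rational(lam)
    F = p1**2 - lam * p0 * p2
    # The 3x3 matrix with rows (u_i, p_i, p_i * dF/dp_i).
    rows = [[ui, pi, pi * sp.diff(F, pi)] for ui, pi in zip(u, (p0, p1, p2))]
    D = sp.Matrix(rows).det()
    # Chart p2 = 1: the points of the conic are (t^2/lam : t : 1).
    on_curve = {p0: t**2 / lam, p1: t, p2: 1}
    g = sp.Poly(sp.expand(D.subs(on_curve)), t, domain="QQ")
    excluded = p0 * p1 * p2 * (p0 + p1 + p2)
    bad = sp.Poly(sp.expand(excluded.subs(on_curve)), t, domain="QQ")
    if g.is_zero:
        raise ValueError("the data u is not generic for this curve")
    # Strip every root of g at which condition (3.2) fails.
    while True:
        h = g.gcd(bad)
        if h.degree() == 0:
            break
        g = g.exquo(h)
    # Number of distinct remaining roots.
    return g.sqf_part().degree()

print(ml_degree_of_conic(4, (2, 3, 5)))  # special case lambda = 4
print(ml_degree_of_conic(1, (2, 3, 5)))  # a case with lambda != 4
```

On the conic the determinant factors as $p_1^2$ times a linear form in $p_0, p_1, p_2$, which explains why at most two critical points survive; for $\lambda = 4$ one of the two intersection points lies on the line $p_0 + p_1 + p_2 = 0$ and is discarded.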
Thus, returning to the Specific Problem, our first example of a variety of ML degree one is the plane curve defined by

$$F = \det \begin{pmatrix} 2p_0 & p_1 \\ p_1 & 2p_2 \end{pmatrix}. \tag{3.3}$$

Biologists know this as the Hardy-Weinberg curve, with the parametrization

$$p_0 = \theta^2, \qquad p_1 = 2\theta(1-\theta), \qquad p_2 = (1-\theta)^2. \tag{3.4}$$

The unique critical point of the likelihood function $L$ on this curve equals

$$\bigl( (2u_0 + u_1)^2 \,:\, 2(2u_0 + u_1)(u_1 + 2u_2) \,:\, (u_1 + 2u_2)^2 \bigr).$$

Determinantal varieties arise naturally in statistics. They are the models $M$ that are specified by imposing rank conditions on a matrix of unknowns. A first example is the model (3.4) for two i.i.d. binary random variables. For a second example we consider the general $3 \times 3$-matrix

$$P = \begin{pmatrix} p_{00} & p_{01} & p_{02} \\ p_{10} & p_{11} & p_{12} \\ p_{20} & p_{21} & p_{22} \end{pmatrix} \tag{3.5}$$

which represents two ternary random variables. The independence model for these two random variables is the variety of rank one matrices. This model also has ML degree one, i.e., the maximum likelihood estimator is a rational function in the data. It is given by the $3 \times 3$-matrix whose entry in row $i$ and column $j$ equals $(u_{i0} + u_{i1} + u_{i2}) \cdot (u_{0j} + u_{1j} + u_{2j})$.

By contrast, consider the mixture model based on two ternary random variables. It consists of all matrices $P$ of rank at most two. Thus this model is the hypersurface defined by the cubic polynomial $F = \det(P)$. Explicit computation shows that the ML degree of this hypersurface is ten. In general, it remains an open problem to find a formula, in terms of $m$, $n$ and $r$, for the ML degree of the variety of $m \times n$-matrices of rank $\leq r$. The first interesting case arises when $m = n = 4$ and $r = 2$. At present we are unable to solve the likelihood equations for this case symbolically. The following concrete biology example was proposed in [24, Example 1.16]:

“Our data are two aligned DNA sequences ...

ATCACCAAACATTGGGATGCCTGTGCATTTGCAAGCGGCT
ATGAGTCTTAAACGCTGGCCATGTGCCATCTTAGACAGCG

... test the hypothesis that these two sequences were generated by DiaNA using one biased coin and four tetrahedral dice....”

Here the model $M$ consists of all (positive) $4 \times 4$-matrices $(p_{ij})$ of rank at most two. In the given alignment, each match occurs four times and each mismatch occurs two times. Hence the likelihood function (3.1) equals

$$L = \Bigl( \prod_{i} p_{ii} \Bigr)^{4} \cdot \Bigl( \prod_{i \neq j} p_{ij} \Bigr)^{2} \cdot \Bigl( \sum_{i,j} p_{ij} \Bigr)^{-40}.$$

Based on experiments with the EM algorithm, we conjectured that the matrix

$$\bigl( \hat{p}_{ij} \bigr) = \frac{1}{40} \begin{pmatrix} 3 & 3 & 2 & 2 \\ 3 & 3 & 2 & 2 \\ 2 & 2 & 3 & 3 \\ 2 & 2 & 3 & 3 \end{pmatrix}$$

is a global maximum of the likelihood function $L$. In the Nachdiplomsvorlesung (postgraduate course) which I held at ETH Zürich in the summer of 2005, I offered a cash prize of 100 Swiss Francs for the resolution of this very specific conjecture, and this prize remains unclaimed and is still available at this time (August 2007).

The state of the art on this 100 Swiss Francs Conjecture is the work of Hersh which originated in March 2007 at the IMA. She proved a range of constraints on the maximum likelihood estimates of determinantal models, especially when the data $u_{ij}$ have symmetry. A discussion of these ideas appears in Hersh's paper with Fienberg, Rinaldo and Zhou [9]. That paper gives an exposition of MLE for determinantal models aimed at statisticians.

4. Gaussian Conditional Independence Models. The early literature on algebraic statistics, including the book [24], dealt primarily with discrete random variables (binary, ternary, ...). The set-up was as described in the previous two sections. We now shift gears and consider multivariate Gaussian distributions. For continuous random variables, we must work in the space of model parameters in order to apply algebraic geometry.
The following concrete problem concerns Gaussian distributions on $\mathbb{R}^5$.

Specific Problem: Which sets of almost-principal minors can be zero for a positive definite symmetric $5 \times 5$-matrix?

The general question behind this asks for characterization of all conditional independence models which can be realized by Gaussians on $\mathbb{R}^n$.

General Problem: Study the geometry of conditional independence models for multivariate Gaussian random variables.

The state of the art on these problems appears in the work of František Matúš and his collaborators. In particular, Matúš' recent paper with Lněnička [20] on representation of gaussoids solves our Specific Problem for symmetric $4 \times 4$-matrices. Sullivant's construction in [28] complements that work. For more information see also the article by Šimeček [26].

Let us begin, however, with some basic definitions. Our aim is to discuss these problems in a self-contained manner. A multivariate Gaussian distribution on $\mathbb{R}^n$ with mean zero is specified by its covariance matrix $\Sigma = (\sigma_{ij})$. The $n \times n$-matrix $\Sigma$ is symmetric and it is positive definite, which means that all its $2^n$ principal minors are positive real numbers. An almost-principal minor of $\Sigma$ is a subdeterminant which has row indices $\{i\} \cup K$ and column indices $\{j\} \cup K$ for some $K \subset \{1, \ldots, n\}$ and $i, j \in \{1, \ldots, n\} \setminus K$. We denote this subdeterminant by $[\,i \perp\!\!\!\perp j \mid K\,]$. For example, if $n = 5$, $i = 2$, $j = 4$ and $K = \{1, 5\}$ then the corresponding almost-principal minor of the symmetric $5 \times 5$-matrix $\Sigma$ equals

$$[\,2 \perp\!\!\!\perp 4 \mid \{1, 5\}\,] = \det \begin{pmatrix} \sigma_{24} & \sigma_{12} & \sigma_{25} \\ \sigma_{14} & \sigma_{11} & \sigma_{15} \\ \sigma_{45} & \sigma_{15} & \sigma_{55} \end{pmatrix}.$$

Our notation for almost-principal minors is justified by their intimate connection to conditional independence, expressed in the following lemma.
We note that the almost-principal minors are referred to as partial covariances (or, if renormalized, partial correlations) in the statistics literature.

Lemma 4.1. The subdeterminant $[\,i \perp\!\!\!\perp j \mid K\,]$ is zero for a positive definite symmetric $n \times n$-matrix $\Sigma$ if and only if, for the Gaussian random variable $X$ on $\mathbb{R}^n$ with covariance matrix $\Sigma$, the random variable $X_i$ is independent of the random variable $X_j$ given the joint variable $X_K$.

Proof. See [7, Equation (5)], [22, Section 1], or [28, Proposition 2.1].

Let $\mathrm{PD}_n$ denote the $\binom{n+1}{2}$-dimensional cone of positive definite symmetric $n \times n$-matrices. Note that this cone is open. A Gaussian conditional independence model, or GCI model for short, is any semi-algebraic subset of the cone $\mathrm{PD}_n$ which can be defined by polynomial equations of the form

$$[\,i \perp\!\!\!\perp j \mid K\,] = 0. \tag{4.1}$$

In algebraic geometry, we simplify matters by studying the complex algebraic varieties defined by equations of the form (4.1). Of course, what we are particularly interested in is the real locus of such a complexified GCI model, and how it intersects the positive definite cone $\mathrm{PD}_n$ and its closure.

As an illustration of algebraic reasoning for Gaussian conditional independence models, we examine an example taken from [28]. Let $n = 5$ and consider the GCI model given by the five quadratic polynomials

$$\begin{aligned}
[\,1 \perp\!\!\!\perp 2 \mid \{3\}\,] &= \sigma_{12}\sigma_{33} - \sigma_{13}\sigma_{23} \\
[\,2 \perp\!\!\!\perp 3 \mid \{4\}\,] &= \sigma_{23}\sigma_{44} - \sigma_{24}\sigma_{34} \\
[\,3 \perp\!\!\!\perp 4 \mid \{5\}\,] &= \sigma_{34}\sigma_{55} - \sigma_{35}\sigma_{45} \\
[\,4 \perp\!\!\!\perp 5 \mid \{1\}\,] &= \sigma_{45}\sigma_{11} - \sigma_{14}\sigma_{15} \\
[\,5 \perp\!\!\!\perp 1 \mid \{2\}\,] &= \sigma_{15}\sigma_{22} - \sigma_{25}\sigma_{12}
\end{aligned}$$

This variety is a complete intersection (it has dimension ten) in the 15-dimensional space of symmetric $5 \times 5$-matrices.
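The almost-principal minors above are easy to generate symbolically. The following sketch uses SymPy (our choice of tool, not one used in the text); the helper names `symmetric_sigma` and `almost_principal_minor` are hypothetical, and the symbol `s12` stands for $\sigma_{12}$. It rebuilds the five quadrics of the example, and the same helper reproduces the $3 \times 3$ minor $[\,2 \perp\!\!\!\perp 4 \mid \{1,5\}\,]$ displayed earlier.

```python
import sympy as sp

def symmetric_sigma(n):
    """Symbolic symmetric n x n matrix with entries s_ij = s_ji (1-based labels, n < 10)."""
    def entry(i, j):
        return sp.Symbol(f"s{min(i, j)}{max(i, j)}")
    return sp.Matrix([[entry(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)])

def almost_principal_minor(Sigma, i, j, K):
    """Determinant of the submatrix with rows {i} u K and columns {j} u K (1-based)."""
    K = sorted(K)
    rows = [i - 1] + [k - 1 for k in K]
    cols = [j - 1] + [k - 1 for k in K]
    return sp.expand(Sigma.extract(rows, cols).det())

Sigma = symmetric_sigma(5)
pentagon = [(1, 2, {3}), (2, 3, {4}), (3, 4, {5}), (4, 5, {1}), (5, 1, {2})]
for i, j, K in pentagon:
    print(f"[{i} _||_ {j} | {sorted(K)}] =", almost_principal_minor(Sigma, i, j, K))
```

Placing the index $i$ (resp. $j$) first among the rows (resp. columns) matches the layout of the matrix displayed for $[\,2 \perp\!\!\!\perp 4 \mid \{1,5\}\,]$; a different ordering of $K$ would permute rows and columns simultaneously and leave the determinant unchanged.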
Primary decomposition reveals that this variety is the union of precisely two irreducible components, namely,
• the linear space $\{\sigma_{12} = \sigma_{23} = \sigma_{34} = \sigma_{45} = \sigma_{15} = 0\}$, and
• the toric variety defined by the five quadrics plus the extra equation

$$\sigma_{11}\sigma_{22}\sigma_{33}\sigma_{44}\sigma_{55} = \sigma_{13}\sigma_{14}\sigma_{24}\sigma_{25}\sigma_{35}. \tag{4.2}$$

All matrices in the open cone $\mathrm{PD}_5$ satisfy the inequalities $\sigma_{ii} > 0$ and

$$\sigma_{11}\sigma_{33} > \sigma_{13}^2, \quad \sigma_{22}\sigma_{44} > \sigma_{24}^2, \quad \sigma_{33}\sigma_{55} > \sigma_{35}^2, \quad \sigma_{44}\sigma_{11} > \sigma_{14}^2, \quad \sigma_{55}\sigma_{22} > \sigma_{25}^2.$$

Multiplying the left hand sides and right hand sides respectively, we find

$$\sigma_{11}^2\sigma_{22}^2\sigma_{33}^2\sigma_{44}^2\sigma_{55}^2 > \sigma_{13}^2\sigma_{14}^2\sigma_{24}^2\sigma_{25}^2\sigma_{35}^2.$$

This is a contradiction to the equation (4.2), and we conclude that the intersection of our GCI model with $\mathrm{PD}_5$ is contained in the linear space $\{\sigma_{12} = \sigma_{23} = \sigma_{34} = \sigma_{45} = \sigma_{15} = 0\}$. The vanishing of the off-diagonal entry $\sigma_{ij}$ means that $X_i$ is independent of $X_j$, or, in symbols, $[\,i \perp\!\!\!\perp j\,]$. Our algebraic computation thus implies the following axiom for GCI models.

Corollary 4.1. Suppose the conditional independence statements $[\,1 \perp\!\!\!\perp 2 \mid \{3\}\,]$, $[\,2 \perp\!\!\!\perp 3 \mid \{4\}\,]$, $[\,3 \perp\!\!\!\perp 4 \mid \{5\}\,]$, $[\,4 \perp\!\!\!\perp 5 \mid \{1\}\,]$, $[\,5 \perp\!\!\!\perp 1 \mid \{2\}\,]$ hold for some multivariate Gaussian distribution. Then also the following five statements must hold: $[\,1 \perp\!\!\!\perp 2\,]$, $[\,2 \perp\!\!\!\perp 3\,]$, $[\,3 \perp\!\!\!\perp 4\,]$, $[\,4 \perp\!\!\!\perp 5\,]$ and $[\,5 \perp\!\!\!\perp 1\,]$.

Let us now return to the question “which almost-principal minors can simultaneously vanish for a positive definite symmetric $n \times n$-matrix?” Corollary 4.1 gives a necessary condition for $n = 5$. We next discuss the answer to our question for $n \leq 4$.
For $n = 3$, the necessary and sufficient conditions are given (up to relabeling) by the following four axioms:
(a) $[\,1 \perp\!\!\!\perp 2\,]$ and $[\,1 \perp\!\!\!\perp 3 \mid \{2\}\,]$ implies $[\,1 \perp\!\!\!\perp 3\,]$ and $[\,1 \perp\!\!\!\perp 2 \mid \{3\}\,]$,
(b) $[\,1 \perp\!\!\!\perp 2 \mid \{3\}\,]$ and $[\,1 \perp\!\!\!\perp 3 \mid \{2\}\,]$ implies $[\,1 \perp\!\!\!\perp 2\,]$ and $[\,1 \perp\!\!\!\perp 3\,]$,
(c) $[\,1 \perp\!\!\!\perp 2\,]$ and $[\,1 \perp\!\!\!\perp 3\,]$ implies $[\,1 \perp\!\!\!\perp 2 \mid \{3\}\,]$ and $[\,1 \perp\!\!\!\perp 3 \mid \{2\}\,]$,
(d) $[\,1 \perp\!\!\!\perp 2\,]$ and $[\,1 \perp\!\!\!\perp 2 \mid \{3\}\,]$ implies $[\,1 \perp\!\!\!\perp 3\,]$ or $[\,2 \perp\!\!\!\perp 3\,]$.

The necessity of these axioms can be checked by simple calculations involving almost-principal minors of positive definite symmetric $3 \times 3$-matrices:
(a) $\sigma_{12} = \sigma_{13}\sigma_{22} - \sigma_{12}\sigma_{23} = 0$ implies $\sigma_{13} = \sigma_{12}\sigma_{33} - \sigma_{13}\sigma_{23} = 0$,
(b) $\sigma_{12}\sigma_{33} - \sigma_{13}\sigma_{23} = \sigma_{13}\sigma_{22} - \sigma_{12}\sigma_{23} = 0$ implies $\sigma_{12} = \sigma_{13} = 0$,
(c) $\sigma_{12} = \sigma_{13} = 0$ implies $\sigma_{12}\sigma_{33} - \sigma_{13}\sigma_{23} = \sigma_{13}\sigma_{22} - \sigma_{12}\sigma_{23} = 0$,
(d) $\sigma_{12} = \sigma_{12}\sigma_{33} - \sigma_{13}\sigma_{23} = 0$ implies $\sigma_{13} = 0$ or $\sigma_{23} = 0$.

The sufficiency of these axioms was noted in [22, Example 1]. For arbitrary $n \geq 3$, a collection of almost-principal minors is called a gaussoid if it satisfies the axioms (a)-(d), after relabeling and applying Schur complements. For instance, axiom (a) is then written as follows:

$[\,i \perp\!\!\!\perp j \mid L\,]$ and $[\,i \perp\!\!\!\perp k \mid \{j\} \cup L\,]$ implies $[\,i \perp\!\!\!\perp k \mid L\,]$ and $[\,i \perp\!\!\!\perp j \mid \{k\} \cup L\,]$.

This axiom is known as the semigraphoid axiom. See [23] for a discussion.

A gaussoid is representable if it is the set of vanishing almost-principal minors of some matrix in $\mathrm{PD}_n$. For $n = 3$ every gaussoid is representable by [22, Example 1]. For $n = 4$, a complete classification of the representable gaussoids was given in [20]. We are here asking for the extension to $n = 5$.

We now introduce a conceptual framework for our General Problem. For each subset $S$ of $\{1, 2, \ldots, n\}$ we introduce one unknown $H_S$, and we define the submodular cone to be the solution set in $\mathbb{R}^{2^n}$ of the system of linear inequalities

$$H_{\{i\} \cup K} + H_{\{j\} \cup K} \;\leq\; H_{\{i,j\} \cup K} + H_K, \tag{4.3}$$

where $K$ is any subset of $\{1, \ldots, n\}$ and $i, j \in \{1, \ldots, n\} \setminus K$. We denote this cone by $\mathrm{SubMod}_n \subset \mathbb{R}^{2^n}$. Note that $\mathrm{SubMod}_n$ is a polyhedral cone living in a high-dimensional space while $\mathrm{PD}_n$ is a non-polyhedral cone in a low-dimensional space. Between these two cones we have the entropy map

$$H : \mathrm{PD}_n \to \mathrm{SubMod}_n,$$

which is given by the logarithms of all $2^n$ principal minors of a positive definite matrix $\Sigma = (\sigma_{ij})$. Namely, the coordinates of the entropy map are

$$H(\Sigma)_I = -\log \det(\Sigma_I),$$

where $I$ is any subset of $\{1, \ldots, n\}$ and $\Sigma_I$ is the corresponding principal submatrix. Note that the entropy map is well-defined because of the inequality

$$\det(\Sigma_{\{i\} \cup K}) \cdot \det(\Sigma_{\{j\} \cup K}) \;\geq\; \det(\Sigma_{\{i,j\} \cup K}) \cdot \det(\Sigma_K). \tag{4.4}$$

A matrix $\Sigma \in \mathrm{PD}_n$ satisfies (4.1) if and only if equality holds in (4.4) if and only if equality holds in (4.3). This implies the following result.

Proposition 4.1. The Gaussian conditional independence models are those subsets of the positive definite cone $\mathrm{PD}_n$ that arise as inverse images of the faces of the submodular cone $\mathrm{SubMod}_n$ under the entropy map $H$.

The importance of the submodular cone for probabilistic inference with discrete random variables was highlighted in [23]. Here we are concerned with Gaussian random variables, and it is the geometry of the entropy map which we must study. We can thus paraphrase our problem as follows.

General Problem: Characterize the image of the entropy map $H$ and how it intersects the various faces of $\mathrm{SubMod}_n$. Study the fibers of this map.

One approach to this problem is to work with the algebraic equations satisfied by the principal minors of a symmetric matrix.
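To make the entropy map concrete, the following sketch (Python with NumPy, our choice of tool; the function names are hypothetical and indices are 0-based) evaluates $H(\Sigma)_I = -\log\det(\Sigma_I)$ for all subsets $I$ and computes the slack of each inequality (4.3). For a positive definite matrix every slack should be non-negative, and a slack vanishes exactly when the corresponding almost-principal minor is zero. The $3 \times 3$ matrix `Sigma0` is an illustrative example with $\sigma_{12} = 0$.

```python
import itertools
import numpy as np

def entropy_map(Sigma):
    """Coordinates H_I = -log det(Sigma_I) for all subsets I; H of the empty set is 0."""
    n = Sigma.shape[0]
    H = {frozenset(): 0.0}
    for r in range(1, n + 1):
        for I in itertools.combinations(range(n), r):
            H[frozenset(I)] = -np.log(np.linalg.det(Sigma[np.ix_(I, I)]))
    return H

def submodular_slacks(H, n):
    """Slack H_{ijK} + H_K - H_{iK} - H_{jK} of (4.3) for all i < j and all K."""
    slacks = {}
    for i, j in itertools.combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        for r in range(len(rest) + 1):
            for K in itertools.combinations(rest, r):
                K = frozenset(K)
                slacks[(i, j, K)] = H[K | {i, j}] + H[K] - H[K | {i}] - H[K | {j}]
    return slacks

# A random positive definite 4x4 matrix: all 24 slacks are non-negative.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
Sigma = M @ M.T + np.eye(4)
slacks = submodular_slacks(entropy_map(Sigma), 4)
print(len(slacks), min(slacks.values()) >= -1e-9)

# A positive definite matrix with sigma_12 = 0: the slack for (i, j, K) = (0, 1, {}) vanishes,
# while the slack for (0, 1, {2}) is log(9/8) > 0.
Sigma0 = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [1.0, 1.0, 2.0]])
slacks0 = submodular_slacks(entropy_map(Sigma0), 3)
print(slacks0[(0, 1, frozenset())], slacks0[(0, 1, frozenset({2}))])
```

Working with slacks rather than the raw coordinates makes the connection with Proposition 4.1 visible: the set of index triples with zero slack records the face of $\mathrm{SubMod}_n$ on which $H(\Sigma)$ lies.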
A characterization of the relations among these principal minors in terms of hyperdeterminants was proposed in [15]. What we are interested in here is the logarithmic image (or amoeba) of the positive part of the hyperdeterminantal variety of [15]. A reasonable first approximation to this amoeba is the tropicalization of that variety. More precisely, we seek to compute the positive tropical variety [24, §3.4] parametrically represented by the principal minors of a symmetric $n \times n$-matrix.

5. Bonus Problem on Rational Points. Section 4 dealt with conditional independence (CI) models for Gaussians. Our bonus problem concerns CI models for discrete random variables, thus returning to the setting of Section 2. Consider $n$ discrete random variables $X_1, X_2, \ldots, X_n$ with $d_1, d_2, \ldots, d_n$ states. Any collection of CI statements $X_i \perp\!\!\!\perp X_j \mid X_K$ specifies a determinantal variety in the space of tables

$$\mathbb{C}^{d_1} \otimes \mathbb{C}^{d_2} \otimes \cdots \otimes \mathbb{C}^{d_n}. \tag{5.1}$$

We call such a variety a CI variety. It is the zero set of a large collection of $2 \times 2$-determinants. These constraints are well-known and listed explicitly in [12, §4.1] or [27, Proposition 8.1]. The corresponding strict CI variety is the set of tables for which the given CI statements hold but all other CI statements do not hold. Thus a strict CI variety is a constructible subset of (5.1) which is Zariski open in a CI variety. The corresponding strict CI model is the intersection of the strict CI variety with the positive orthant. It consists of all positive $d_1 \times d_2 \times \cdots \times d_n$-tables that lie in a common equivalence class, where two tables are equivalent if precisely the same CI statements $X_i \perp\!\!\!\perp X_j \mid X_K$ are valid (resp. not valid) for both tables.

Bonus Problem: Does every strict CI model have a $\mathbb{Q}$-rational point?

This charming problem was proposed by F. Matúš in [21, page 275].
It suggests that algebraic statistics has something to offer also for arithmetic geometers. One conceivable solution to the Bonus Problem might say that CI models with no rational points exist but that rational points always appear when the number of states grows large, that is, for $d_1, d_2, \ldots, d_n \gg 0$. But that is pure speculation. At present we know next to nothing.

6. Brief Conclusion. This article offered a whirlwind introduction to the emerging field of algebraic statistics, by discussing a few of its numerous open problems. Aside from the Bonus Problem above, we had listed three Specific Problems whose solution might be particularly rewarding:
• Consider the variety of $4 \times 4 \times 4$-tables of tensor rank at most 4. Do the known polynomial invariants of degree at most nine suffice to define this variety? Set-theoretically? Ideal-theoretically?
• Characterize all projective varieties whose maximum likelihood degree is equal to one.
• Which sets of almost-principal minors can be simultaneously zero for a positive definite symmetric $5 \times 5$-matrix?

REFERENCES
[1] E. Allman: Determine the ideal defining $\mathrm{Sec}_4(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3)$. Phylogenetic explanation of an Open Problem at www.dms.uaf.edu/~eallman/salmonPrize.pdf.
[2] E. Allman and J. Rhodes: Phylogenetic ideals and varieties for the general Markov model, Advances in Applied Mathematics, to appear.
[3] E. Allman and J. Rhodes: Phylogenetics, in R. Laubenbacher (ed): Modeling and Simulation of Biological Networks, Proceedings of Symposia in Applied Mathematics, American Mathematical Society, 2007, pp. 1–31.
[4] D. Brody and J. Hughston: Geometric quantum mechanics, J. Geom. Phys. 38 (2001) 19–53.
[5] L. Catalano-Johnson: The homogeneous ideals of higher secant varieties, Journal of Pure and Applied Algebra 158 (2001) 123–129.
[6] F. Catanese, S. Hoşten, A. Khetan and B. Sturmfels: The maximum likelihood degree, American Journal of Mathematics 128 (2006) 671–697.
[7] M. Drton, B. Sturmfels and S. Sullivant: Algebraic factor analysis: tetrads, pentads and beyond, Probability Theory and Related Fields 138 (2007) 463–493.
[8] M. Drton and S. Sullivant: Algebraic statistical models, Statistica Sinica 17 (2007) 1273–1297.
[9] S. Fienberg, P. Hersh, A. Rinaldo and Y. Zhou: Maximum likelihood estimation in latent class models for contingency table data, preprint, arXiv:0709.3535.
[10] W. Fulton and J. Harris: Representation Theory. A First Course, Graduate Texts in Mathematics, 129, Springer-Verlag, 1991.
[11] L. Garcia, M. Stillman and B. Sturmfels: Algebraic geometry of Bayesian networks, Journal of Symbolic Computation 39 (2005) 331–355.
[12] D. Geiger, C. Meek and B. Sturmfels: On the toric algebra of graphical models, Annals of Statistics 34 (2006) 1463–1492.
[13] G.-M. Greuel, G. Pfister, and H. Schönemann: Singular 3.0. A Computer Algebra System for Polynomial Computations, Centre for Computer Algebra, University of Kaiserslautern, http://www.singular.uni-kl.de, 2005.
[14] H. Heydari: General pure multipartite entangled states and the Segre variety, J. Phys. A: Math. Gen. 39 (2006) 9839–9844.
[15] O. Holtz and B. Sturmfels: Hyperdeterminantal relations among symmetric principal minors, Journal of Algebra 316 (2007) 634–648.
[16] S. Hoşten, A. Khetan and B. Sturmfels: Solving the likelihood equations, Foundations of Computational Mathematics 5 (2005) 389–407.
[17] J.M. Landsberg and L. Manivel: On the ideals of secant varieties of Segre varieties, Foundations of Computational Mathematics 4 (2004) 397–422.
[18] J.M. Landsberg and L. Manivel: Generalizations of Strassen's equations for secant varieties of Segre varieties, Communications in Algebra, to appear.
[19] J.M. Landsberg and J. Weyman: On the ideals and singularities of secant varieties of Segre varieties, Bulletin of the London Math. Soc. 39 (2007) 685–697.
[20] R. Lněnička and F. Matúš: On Gaussian conditional independence structures, Kybernetika 43 (2007) 327–342.
[21] F. Matúš: Conditional independences among four random variables III: Final conclusion, Combinatorics, Probability and Computing 8 (1999) 269–276.
[22] F. Matúš: Conditional independences in Gaussian vectors and rings of polynomials, Proceedings of WCII 2002 (eds. G. Kern-Isberner, W. Rödder, and F. Kulmann) LNAI 3301, Springer-Verlag, Berlin, 152–161, 2005.
[23] J. Morton, L. Pachter, A. Shiu, B. Sturmfels and O. Wienand: Convex rank tests and semigraphoids, preprint, arXiv:math.CO/0702564.
[24] L. Pachter and B. Sturmfels: Algebraic Statistics for Computational Biology, Cambridge University Press, 2005.
[25] G. Pistone, E. Riccomagno and H. Wynn: Algebraic Statistics: Computational Commutative Algebra in Statistics, Chapman & Hall/CRC, 2000.
[26] P. Šimeček: Classes of Gaussians, discrete and binary representable independence models that have no finite characterization, Proceedings of Prague Stochastics 2006, pp. 622–632.
[27] B. Sturmfels: Solving Systems of Polynomial Equations, CBMS Regional Conference Series in Mathematics, vol 97, Amer. Math. Society, Providence, 2002.
[28] S. Sullivant: Gaussian conditional independence relations have no finite complete characterization, preprint, arXiv:0704.2847, 2007.
