A class of statistical models to weaken independence in two-way contingency tables

Submitted to the Annals of Statistics

By Enrico Carlini and Fabio Rapallo

Politecnico di Torino and University of Eastern Piedmont

In this paper we study a new class of statistical models for contingency tables. We define this class of models through a subset of the binomial equations of the classical independence model. We use some notions from Algebraic Statistics to compute their sufficient statistic, and to prove that they are log-linear. Moreover, we show how to compute maximum likelihood estimates and to perform exact inference through the Diaconis-Sturmfels algorithm. Examples show that these models can be useful in a wide range of applications.

1. Introduction. One of the most popular statistical models for two-way contingency tables is the independence model. It has become a reference tool in applied research where categorical variables are concerned. In many applications the independence model is sufficient to describe and model the data, but this is not always the case. There are situations where the independence model does not fit the data and one has to detect more complex relations between the random variables. Thus, different models have been introduced in order to identify some structures in the contingency tables. Most of these models belong to the class of log-linear models. Among these, we recall the quasi-independence model, the quasi-symmetry model, and the logistic regression model. As a general reference for these models see [3]. Such models have a wide spectrum of applications in, e.g., biology, psychology and medicine. The books by Fienberg [15], Fingleton [17], Le [26] and Agresti [3] present a great deal of examples with real data sets coming from the most disparate disciplines.
A recent development in the area of statistical models for contingency tables involves the use of some tools from Algebraic Geometry to describe the structure and the properties of the models. This field is currently known under the name of Algebraic Statistics. While the first work in this direction relates to a method for exact inference, see [13], subsequent papers have focused their attention on the geometry of the statistical models through polynomial algebra. The algebraic and geometric point of view in the analysis of probability models allows us to generalize statistical models in the presence of cells with zero probability (toric models), to study their exponential structure, and to make inference feasible also in models with complex structure. This approach has been particularly useful in the fields of log-linear and graphical models. Some relevant works on these recent topics are [20], [18], [19], and [31]. An exposition of such theory, with a view toward applications to computational biology, can be found in [27].

AMS 2000 subject classifications: Primary 62H17; secondary 60A99, 65C60, 13P10.
Keywords and phrases: Algebraic Statistics, log-linear models, Markov bases, sufficient statistic.

The theoretical advances mentioned above also have a computational counterpart. In fact, many symbolic software packages traditionally conceived for polynomial algebra now include special functions or packages specifically designed for Algebraic Statistics, see e.g. CoCoA [9], 4ti2 [1], and LattE [12].

In this paper we consider statistical models for two-way contingency tables with strictly positive cell probabilities. We introduce a class of models in order to weaken independence, starting from the binomial representation of the independence model.
The independence statement means that the table of probabilities has rank 1, and therefore that all $2 \times 2$ minors vanish. In the strictly positive case, this is equivalent to the vanishing of all $2 \times 2$ adjacent minors. Our models, which we call weakened independence models, are defined through a subset of the independence binomial equations. As a consequence, the independence statements hold locally and the resulting models allow us to identify local patterns of independence in contingency tables.

We study the main properties of such models. In particular, we prove that they belong to the class of log-linear models, and we determine their sufficient statistic. Moreover, we compute the corresponding Markov bases, in order to apply the Diaconis-Sturmfels algorithm without symbolic computations. The relevance of our theory is emphasized by some examples on real data sets. We also show that our models have connections with a problem recently stated by Bernd Sturmfels in the field of probability models for Computational Biology, the so-called "100 Swiss Francs Problem", see [32].

While most of the papers in Algebraic Statistics use algebraic and geometric methods to describe and analyze existing statistical models, or to make exact inference, the main focus of this paper is the definition of a new class of models, by exploiting the Algebraic Statistics way of thinking. Notice that we restrict the analysis to adjacent minors. Therefore, the applications are mainly concerned with binary or ordinal random variables. At the end of the paper we will give some pointers to follow-ups and extensions of this work.

In Section 2 we define the weakened independence models and we give some examples, while in Section 3 we provide the computation of a sufficient statistic.
In Section 4 we prove that these models belong to the class of toric models (and therefore they are log-linear for strictly positive probabilities), and we explicitly write down some consequences, such as a canonical parametrization of the models. In Section 5 we compute the Markov bases for weakened independence models and we present some examples with real data. In particular, Example 5.3 is devoted to the discussion of some interesting relationships between our models and the "100 Swiss Francs Problem". Section 6 highlights the main contributions of our theory and provides some pointers to future developments.

2. Definitions. A two-way contingency table collects data from a sample where two categorical variables, say $X$ and $Y$, are measured. Suppose that $X$ has $I$ levels and $Y$ has $J$ levels. The sample space for a sample of size one is $\mathcal{X} = \{1, \dots, I\} \times \{1, \dots, J\}$ and a joint probability distribution for an $I \times J$ contingency table is a table of raw probabilities $(p_{i,j})_{i=1,\dots,I,\ j=1,\dots,J}$ in the simplex
\[ \Delta = \Big\{ (p_{i,j})_{i=1,\dots,I,\ j=1,\dots,J} \in \mathbb{R}_{+}^{I \times J} \ : \ \sum_{i,j} p_{i,j} = 1 \Big\}. \]
A statistical model for an $I \times J$ contingency table is then a subset of $\Delta$ defined through equations on the raw probabilities $p_{1,1}, \dots, p_{I,J}$. In this paper, we do not allow any $p_{i,j}$ to be zero, and we assume strict positivity of all probabilities.

The independence model can be defined in parametric form through the power product representation, i.e. by the set of equations
\[ p_{i,j} = \zeta_0 \, \zeta_i^X \, \zeta_j^Y \tag{2.1} \]
for $i = 1, \dots, I$ and $j = 1, \dots, J$, where $\zeta_i^X$ and $\zeta_j^Y$ are unrestricted positive parameters and $\zeta_0$ is the normalizing constant, see [28]. In terms of log-probabilities, Eq. (2.1) assumes the more familiar form
\[ \log p_{i,j} = \lambda + \lambda_i^X + \lambda_j^Y \tag{2.2} \]
where $\lambda = \log \zeta_0$, $\lambda_i^X = \log \zeta_i^X$ for $i = 1, \dots, I$ and $\lambda_j^Y = \log \zeta_j^Y$ for $j = 1, \dots, J$.

As an equivalent representation, one can derive implicit formulae on the raw probabilities $p_{i,j}$. Eliminating the $\zeta$ variables from Eq. (2.1), one obtains the set of equations below:
\[ p_{i,j} \, p_{k,m} - p_{i,m} \, p_{k,j} = 0 \tag{2.3} \]
for all $1 \le i < k \le I$ and $1 \le j < m \le J$. In other words, in the independence model all $2 \times 2$ minors of the table vanish. It is well known, see e.g. [3], that in the positive case, the equalities in Eq. (2.3) are redundant and it is enough to set to zero the adjacent minors:
\[ p_{i,j} \, p_{i+1,j+1} - p_{i+1,j} \, p_{i,j+1} = 0 \tag{2.4} \]
for all $1 \le i < I$ and $1 \le j < J$.

Remark 2.1. In the framework of toric models as defined in [28], where structural zeros are allowed, the implicit representations (2.3) and (2.4) are not equivalent, as they differ on the boundary. For a description of such phenomenon, see [31].

In algebraic terms, let
\[ C = \{ p_{i,j} \, p_{i+1,j+1} - p_{i+1,j} \, p_{i,j+1} \ : \ 1 \le i < I, \ 1 \le j < J \}. \]
The set $C$ is the set of all $2 \times 2$ adjacent minors of the table of probabilities. Moreover, let $\mathbb{R}[p]$ be the polynomial ring in $I \times J$ indeterminates with real coefficients. From the geometric point of view, the independence model is the variety
\[ V_C = \{ p_{i,j} \ : \ C = 0 \} \cap \Delta, \]
i.e., the set of the points of the simplex where all binomials in $C$ vanish. The choice of a subset of $C$ leads us to the definition of a new class of models.

Definition 2.2. Let $B$ be a subset of $C$. The $B$-weakened independence model is the variety $V_B = \{ p_{i,j} : B = 0 \} \cap \Delta$.

Of course, $V_C \subseteq V_B$ for all subsets $B$ of $C$. The meaning of the class of models in Definition 2.2 is quite simple. In fact, the choice of a given set of minors means that we allow the binomial independence statements to hold locally, i.e., we determine patterns of independence.

Example 2.3.
As a first application, we consider a $2 \times J$ contingency table. A table of this kind could derive, e.g., from the observation of a binary random variable $X$ at different times. The model defined through the set of binomials
\[ B = \{ p_{1,1} p_{2,2} - p_{1,2} p_{2,1}, \ p_{1,2} p_{2,3} - p_{1,3} p_{2,2}, \ \dots, \ p_{1,j'-1} p_{2,j'} - p_{1,j'} p_{2,j'-1} \}, \]
where $j' < J$, is presented in Figure 1. This choice of $B$ means that there is independence between $X$ and the time up to the instant $j'$ and not after. In the literature, the point $j'$ in this model refers to the detection of the change-point in a logistic regression model. A recent paper about this topic is [21].

[Fig 1. Binomials for a change-point problem in logistic regression.]

Example 2.4. Let us consider an $I \times I$ contingency table. A table of this kind could derive from a rater agreement analysis. Suppose that 2 raters independently classify $n$ objects using a nominal or ordinal scale with $I$ categories. If we set $B = \{ p_{1,1} p_{2,2} - p_{1,2} p_{2,1} \}$ the corresponding model yields that categories 1 and 2 are indistinguishable. A reference for the notion of category indistinguishability is, e.g., [11]. This model can be generalized using the set of binomials
\[ B = \{ p_{i,j} \, p_{i+1,j+1} - p_{i,j+1} \, p_{i+1,j} \ : \ 1 \le i \le i', \ 1 \le j \le i' \} \]
meaning that the categories $1, \dots, i'$ are indistinguishable. The first paper in the direction of modelling patterns of agreement is [2]. An example with 5 categories and 3 indistinguishable categories is presented in Figure 2. More examples on the models for rater agreement problems will be presented later in the paper.

Remark 2.5. In the next sections, our approach will proceed somehow backwards with respect to the classical log-linear models theory.
In fact, we will define the model through the binomials and then we will use them to determine a sufficient statistic and a parametrization.

[Fig 2. Binomials for Example 2.4.]

3. Sufficient statistic. As noted in Section 2, the independence model is defined through the log-linear form in Eq. (2.2). One can easily check that for the independence model a sufficient statistic $T$ for the sample of size 1 is given by the indicator functions of the $I$ rows and the indicator functions of the $J$ columns. More precisely, we denote the indicator function of the $i$-th row by $I_{(i,+)}$ and the indicator function of the $j$-th column by $I_{(+,j)}$. Writing the sample space as $\mathcal{X} = \{1, \dots, I\} \times \{1, \dots, J\}$, a sufficient statistic for the independence model is
\[ T = \big( I_{(1,+)}, \dots, I_{(I,+)}, I_{(+,1)}, \dots, I_{(+,J)} \big). \]

A single observation is an element of the sample space $\mathcal{X}$ and its table has a single count of 1 in one cell and 0 otherwise. This observation yields a value of 1 in the corresponding row and column indicator functions in $T$. Therefore, the sufficient statistic $T$ for a sample of size 1 is a linear map from $\mathcal{X}$ to $\mathbb{N}^{I+J}$. The function $T$ can be extended to a linear homomorphism $T : \mathbb{R}^{IJ} \to \mathbb{R}^{I+J}$.

In Section 4 we will prove that weakened independence models, like the independence model, are log-linear. Thus, the sufficient statistic for a sample of size $n$ is the sum of the sufficient statistics of all components of the sample and it will be formed by the sum of appropriate cell counts, as is familiar in the field of categorical data analysis, see e.g. [3]. However, in this section it is more convenient to work with a sample of size one and with the indicator functions. This approach has been fruitfully used in [22] and, more recently, in [28].
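The additivity of the sufficient statistic over the components of a sample can be illustrated with a minimal Python sketch. The function names and the sample of size 4 on a $2 \times 3$ table are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
def sufficient_statistic(table):
    """Row sums followed by column sums of an I x J table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return row_sums + col_sums


def single_observation(i, j, I, J):
    """Table of a sample of size one falling in cell (i, j), 1-based."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(1, J + 1)]
            for r in range(1, I + 1)]


# Hypothetical sample of size 4 on a 2 x 3 table (cells are 1-based).
I, J = 2, 3
sample = [(1, 1), (1, 3), (2, 3), (2, 3)]

# Sufficient statistic of each single observation, then their sum.
per_observation = [sufficient_statistic(single_observation(i, j, I, J))
                   for i, j in sample]
total = [sum(values) for values in zip(*per_observation)]

# The same statistic computed directly from the table of counts.
counts = [[0] * J for _ in range(I)]
for i, j in sample:
    counts[i - 1][j - 1] += 1
```

Here `total` and `sufficient_statistic(counts)` coincide: summing the indicator vectors of the single observations gives the row and column sums of the table of counts.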
Hereinafter, we write the table as a column vector, i.e. the table of probabilities is written as
\[ p = ( p_{1,1}, \dots, p_{1,J}, \dots, p_{I,1}, \dots, p_{I,J} )^t, \]
where $t$ denotes the transposition. Moreover, we use a vector notation, i.e. we write a binomial in the form $p^a - p^b$, meaning
\[ p_{1,1}^{a_{1,1}} \cdots p_{I,J}^{a_{I,J}} - p_{1,1}^{b_{1,1}} \cdots p_{I,J}^{b_{I,J}}. \]

We briefly review the relationship between the sufficient statistic and the binomials in Eq. (2.4). Writing the table as a column vector of length $IJ$, the matrix representation of $T$ is a matrix $A_C$. This matrix has size $IJ \times (I + J)$ and its rank is $I + J - 1$. Moreover, consider the log-vector of a $2 \times 2$ minor to be defined in the following way:
\[ \Lambda : \mathbb{R}[p] \longrightarrow \mathbb{R}^{IJ}, \qquad p^a - p^b \longmapsto a - b. \]
We denote by $Z_C$ the sub-vector space of $\mathbb{R}^{IJ}$ generated by the vectors $\Lambda(m)$, for all $2 \times 2$ adjacent minors $m$. It is well known, see for example [6], that $Z_C$ has dimension $(I-1)(J-1)$ and the sequence of log-vectors $\Lambda(m)$ with $m \in C$ is a sequence of $(I-1)(J-1)$ linearly independent vectors orthogonal to $A_C$. Hence the column space of $A_C$ is the orthogonal of $Z_C$. Thus, from a vector-space perspective, the exponents of the binomials span the orthogonal complement of the matrix $A_C$. In the sequel, we will use the same symbol to denote a matrix $A$ and the sub-vector space of $\mathbb{R}^{IJ}$ generated by the columns of $A$, although this should be considered as a slight abuse of notation.

The procedure described above is quite general and it provides a method to actually compute the relevant binomials of a statistical model with a given sufficient statistic. For more details, see [28].

In order to analyze the weakened independence models in Definition 2.2, we use the theory sketched above for the independence model.
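The relationship between $A_C$ and $Z_C$ reviewed above can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the helper names are hypothetical, the table size $3 \times 4$ is an arbitrary choice, and the rank is computed with exact rational arithmetic from the standard library rather than with any specific algebra package.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def log_vector(i, j, I, J):
    """Log-vector of the adjacent minor p[i,j]p[i+1,j+1] - p[i+1,j]p[i,j+1]."""
    v = [0] * (I * J)
    v[(i - 1) * J + (j - 1)] = 1      # p[i, j]
    v[i * J + j] = 1                  # p[i+1, j+1]
    v[i * J + (j - 1)] = -1           # p[i+1, j]
    v[(i - 1) * J + j] = -1           # p[i, j+1]
    return v


def row_col_indicators(I, J):
    """Indicator vectors of the I rows and the J columns (row-major cells)."""
    rows = [[1 if r == i else 0 for r in range(I) for _ in range(J)]
            for i in range(I)]
    cols = [[1 if c == j else 0 for _ in range(I) for c in range(J)]
            for j in range(J)]
    return rows + cols


I, J = 3, 4  # arbitrary table size chosen for the illustration
Z_C = [log_vector(i, j, I, J) for i in range(1, I) for j in range(1, J)]
A_C = row_col_indicators(I, J)  # stored by columns of A_C, one vector each

rank_Z = rank(Z_C)  # expected (I - 1)(J - 1)
rank_A = rank(A_C)  # expected I + J - 1
orthogonal = all(sum(a * z for a, z in zip(col, vec)) == 0
                 for col in A_C for vec in Z_C)
```

With these choices the two ranks add up to $IJ$ and every log-vector is orthogonal to every row and column indicator, in agreement with the statement that the column space of $A_C$ is the orthogonal of $Z_C$.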
We start with a set of binomials, and we compute a sufficient statistic and the parametric representation of the model.

Remark 3.1. We will prove in Section 4 that weakened independence models are log-linear. Therefore, the orthogonal to the log-vectors of the chosen binomials is the matrix representation of a sufficient statistic. In order to keep notation as simple as possible, we call this orthogonal a sufficient statistic even before showing that the models are log-linear.

Lemma 3.2. The log-vectors of $d$ distinct adjacent minors are linearly independent.

Proof. Let $B_d$ be a set of $d$ distinct adjacent minors and let $L_d$ be the set of their log-vectors. We proceed by induction on $d$. For $d = 1$ the statement is clearly true. We assume that the elements of $L_i$ are linearly independent for all $i < d$ and we will show that the same holds for $L_d$. Let $m \in B_d$ be the minor involving the indeterminate having the lex-smallest index, say $p_{\bar{i},\bar{j}}$, and notice that no other element in $B_d$ involves $p_{\bar{i},\bar{j}}$. Let $l = \Lambda(m) \in L_d$ and notice that $l$ is not a linear combination of the elements of $L_d \setminus \{l\}$, which are linearly independent by hypothesis. Hence the elements of $L_d$ are linearly independent.

Remark 3.3. Lemma 3.2 is false when we consider log-vectors of non-adjacent minors. As a counterexample, take a $2 \times 3$ table and all three minors.

Now, consider a $B$-weakened independence model with set of adjacent minors $B$ of cardinality $m$. Let $Z_B$ be the matrix of the log-vectors of the adjacent minors in $B$. In view of Lemma 3.2, the orthogonal of $Z_B$ has dimension $(IJ - m)$. Thus, the explicit computation of $A_B$, the orthogonal of $Z_B$, requires finding at least $(IJ - m)$ vectors orthogonal to $Z_B$.
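Lemma 3.2, the counterexample of Remark 3.3 and the dimension count $(IJ - m)$ can be checked on small cases. The Python sketch below is an illustration only: the helper names are hypothetical, and the set $B$ of three adjacent minors in a $4 \times 4$ table is an arbitrary choice made for the example.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def minor(i, k, j, m, I, J):
    """Log-vector of p[i,j]p[k,m] - p[i,m]p[k,j], rows i < k, columns j < m."""
    v = [0] * (I * J)
    for (r, c), sign in (((i, j), 1), ((k, m), 1), ((i, m), -1), ((k, j), -1)):
        v[(r - 1) * J + (c - 1)] += sign
    return v


# Remark 3.3: the three 2 x 2 minors of a 2 x 3 table.
adjacent = [minor(1, 2, 1, 2, 2, 3), minor(1, 2, 2, 3, 2, 3)]
non_adjacent = minor(1, 2, 1, 3, 2, 3)
rank_adjacent = rank(adjacent)
rank_all_three = rank(adjacent + [non_adjacent])
sum_of_adjacent = [a + b for a, b in zip(*adjacent)]

# Arbitrary set B of m = 3 adjacent minors in a 4 x 4 table,
# each identified by its upper-left cell.
I, J = 4, 4
B = [(1, 1), (2, 2), (3, 3)]
Z_B = [minor(i, i + 1, j, j + 1, I, J) for i, j in B]
dim_orthogonal = I * J - rank(Z_B)
```

In the $2 \times 3$ table the log-vector of the non-adjacent minor is the sum of the two adjacent ones, so the three log-vectors have rank 2. For the chosen $B$ the log-vectors are independent and the orthogonal has dimension $16 - 3 = 13$.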
Although this can be done simply with a linear algebra algorithm, it is very useful to investigate the structure of the matrix $A_B$. Given a $B$-weakened independence model for $I \times J$ contingency tables, we define a graph in the following way.

Definition 3.4. Given a set $B$ of adjacent minors, we define a graph $G_B$ as follows: the set of vertices is the set of cells and each binomial defines 4 edges. The binomial $p_{i,j} p_{i+1,j+1} - p_{i,j+1} p_{i+1,j}$ defines the edges $(i,j) \leftrightarrow (i+1,j)$, $(i+1,j) \leftrightarrow (i+1,j+1)$, $(i,j+1) \leftrightarrow (i+1,j+1)$ and $(i,j) \leftrightarrow (i,j+1)$.

The edges associated to a binomial are the 4 sides of the square with vertices on the 4 cells involved in the binomial.

Definition 3.5. A cell $(i,j)$ is a free cell if no edge of $G_B$ involves $(i,j)$.

Equivalently, a cell $(i,j)$ is a free cell if and only if the indeterminate $p_{i,j}$ does not appear in any of the binomials in $B$.

Definition 3.6. The sequence of cells $(i,j), (i,j+1), \dots, (i,j+h)$ is a connected component of the $i$-th row if each pair of consecutive cells is connected by an edge of $G_B$. The sequence forms a maximal connected row component (MCR) if the sequence is no longer connected when one adds $(i,j-1)$ or $(i,j+h+1)$.

One can define similarly the maximal connected column component (MCC). We illustrate the definitions above with an example.

[Fig 3. Binomials for Example 3.7.]

Example 3.7. In the model for a $4 \times 4$ contingency table defined through the binomials in Figure 3, we have 4 MCRs, 5 MCCs and 2 free cells.

Proposition 3.8. Consider a $B$-weakened independence model with set of binomials $B$ and let $Z_B$ be the matrix of the log-vectors of the minors in $B$.
The indicator vectors of the free cells, the indicator vectors of the MCRs and the indicator vectors of the MCCs are orthogonal to the column space of $Z_B$.

Proof. If $(i,j)$ is a free cell, then no binomial in $B$ involves the corresponding variable. Hence the indicator vector of $(i,j)$ is orthogonal to the column space of $Z_B$. Given an MCR, its indicator vector is clearly orthogonal to the columns of $Z_B$ corresponding to minors not involving the cells of the MCR. If a minor involves a cell of the MCR, then it involves two cells of the MCR, with opposite signs. A similar argument works for MCCs. Hence the orthogonality follows.

Now, two questions arise: one about the linear independence of the vectors defined in Proposition 3.8 and the other about the dimension of the sub-vector space generated by such vectors. In other words, we have to investigate whether these vectors generate the space orthogonal to $Z_B$ or not. Let us start with two simple examples.

Example 3.9. Consider a weakened independence model for $4 \times 4$ tables defined through the adjacent minors in Figure 4. In this situation, $Z_B$ has rank 3 and there are 4 MCRs, 4 MCCs and 6 free cells. Here, the 14 vectors corresponding to the MCRs, to the MCCs and to the free cells generate a sub-vector space of dimension 13. Thus, they are enough to define the matrix $A_B$.

[Fig 4. Binomials for Example 3.9.]

[Fig 5. Binomials for Example 3.10.]

Example 3.10. Consider now a weakened independence model for $4 \times 4$ tables defined through the adjacent minors in Figure 5. The model above only leaves out one minor, namely the central one. In this case, $Z_B$ has rank 8 and there are 4 MCRs and 4 MCCs.
The 8 vectors corresponding to the MCRs and to the MCCs generate a sub-vector space of dimension 7 and therefore they are not enough to generate the orthogonal space $A_B$.

The dimension of the vector space generated by the indicator vectors of the MCRs, the MCCs and the free cells can be computed and we have the following results.

Proposition 3.11. For any connected component of $B$ with $r$ MCRs and $c$ MCCs, the vector space generated by the MCRs and by the MCCs has dimension $(r + c - 1)$.

Proof. Clearly the indicator vectors of the MCRs and of the MCCs are not linearly independent as their sums are equal. To show that this is the only relation we proceed by induction on the number of minors in the connected component. Let this number be $d$. If $d = 1$ the result is trivial. Now, assume that the result holds for $d$. If the connected component involves $d + 1$ minors, let $r_1, \dots, r_t$ be the indicator vectors of the MCRs and $c_1, \dots, c_s$ be the indicator vectors of the MCCs. Also assume that $c_1$ and $r_1$ involve the lex-smallest cell. Notice that $c_1$ and $r_1$ are the only vectors involving this cell. Given a linear combination
\[ \sum_i \lambda_i c_i = \sum_i \mu_i r_i \tag{3.1} \]
we must have $\lambda_1 = \mu_1$. Then the linear combination (3.1) can be read in $B' = B \setminus \{m\}$, where $m$ is the minor involving the lex-smallest cell and the $c_i$'s and the $r_i$'s represent the indicator vectors of the MCCs and the MCRs of $B'$. By the inductive hypothesis we get $\lambda_i = \mu_i = 1$ for all $i$.

As distinct connected components and free cells act on spaces which are orthogonal to each other, Proposition 3.11 leads to the following result.

Theorem 3.12. Consider a $B$-weakened independence model defined by a set of binomials $B$ whose graph has $k$ connected components and with $r$ MCRs, $c$ MCCs and $f$ free cells.
The dimension of the vector space generated by the indicator vectors of the MCRs, of the MCCs and of the free cells is $(r + c + f - k)$.

Proof. The indicators of the free cells are clearly independent of the indicators of the MCCs and of the MCRs. Moreover, indicators of MCCs and MCRs of different connected components are linearly independent as they do not share any cell. By Proposition 3.11 each connected component gives exactly one relation among the indicators of the MCRs and the MCCs. Hence the result follows.

In the results above, we have addressed dimensional issues. Now, we use them to find a procedure to determine a sufficient statistic. The examples of this section show that in some cases the vectors of MCCs, MCRs and free cells are sufficient to generate the space orthogonal to $Z_B$. Clearly, these vectors are not sufficient when the graph $G_B$ of the binomials in $B$ presents a hole, i.e., when we remove some minors with 4 double edges (minors whose four edges are each shared with another minor) from the complete set of binomials $C$. Removing such a minor adds a new vector to the orthogonal. On the other hand, it does not add anything in terms of MCRs, MCCs and free cells. Thus, the last part of this section is devoted to actually finding a sufficient statistic for a generic weakened independence model. The key idea is to start from the complete set of adjacent minors $C$ and to remove minors iteratively. This approach is motivated by the fact that for the complete set $C$ a sufficient statistic is known to be formed by the row sums and the column sums, as discussed at the beginning of this section.

We begin our analysis from a simple case. Namely, we consider a set of binomials $B$ with given sufficient statistic $A_B$ and we investigate the behavior of the sufficient statistic when we remove one minor $m$ from $B$, i.e., when the set of binomials is $B' = B \setminus \{m\}$.
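The effect of a hole can be checked on the model of Example 3.10. The Python sketch below is an illustration under two assumptions that are not stated in this form in the text: for this $B$ every row is a single MCR and every column a single MCC, and the extra vector is the indicator of the cells $(i,j)$ with $i \ge 3$ and $j \ge 3$, which anticipates the quadrant construction discussed next. All helper names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


I = J = 4


def idx(i, j):
    """Row-major position of cell (i, j), 1-based."""
    return (i - 1) * J + (j - 1)


def log_vector(i, j):
    """Log-vector of the adjacent minor with upper-left cell (i, j)."""
    v = [0] * (I * J)
    v[idx(i, j)] = v[idx(i + 1, j + 1)] = 1
    v[idx(i, j + 1)] = v[idx(i + 1, j)] = -1
    return v


def indicator(cells):
    """Indicator vector of a set of cells."""
    v = [0] * (I * J)
    for i, j in cells:
        v[idx(i, j)] = 1
    return v


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# All adjacent minors except the central one, as in Example 3.10.
B = [(i, j) for i in range(1, I) for j in range(1, J) if (i, j) != (2, 2)]
Z_B = [log_vector(i, j) for i, j in B]

# Assumption: each row is one MCR and each column is one MCC.
mcr = [indicator([(i, j) for j in range(1, J + 1)]) for i in range(1, I + 1)]
mcc = [indicator([(i, j) for i in range(1, I + 1)]) for j in range(1, J + 1)]

# Assumed extra vector: indicator of the cells with i >= 3 and j >= 3.
Q = indicator([(i, j) for i in (3, 4) for j in (3, 4)])

dim_orthogonal = I * J - rank(Z_B)
dim_mcr_mcc = rank(mcr + mcc)
dim_with_Q = rank(mcr + mcc + [Q])
Q_orthogonal_to_Z_B = all(dot(Q, z) == 0 for z in Z_B)
Q_orthogonal_to_removed = dot(Q, log_vector(2, 2)) == 0
```

Under these assumptions the MCR and MCC indicators span a space of dimension 7, one short of the dimension 8 of the orthogonal to $Z_B$, and adding `Q` fills the gap: `Q` is orthogonal to every log-vector in $Z_B$ but not to the log-vector of the removed central minor.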
We separate two cases, depending on the number of double edges of the removed minor.

Lemma 3.13. Consider a weakened independence model obtained by removing a binomial with four double edges from a given family of adjacent binomials $B$, i.e. let $B' = B \setminus \{m\}$ where $m$ has four double edges. If we let $A_B$ be the orthogonal to $Z_B$, then the orthogonal to $Z_{B'}$ is generated by the elements of $A_B$ and by the indicator vector $Q$ of a quadrant centered on one of the indeterminates of the removed minor.

Proof. First notice that the elements of $A_B$ are orthogonal to the columns of $Z_{B'}$, i.e. $A_{B'} \supseteq A_B$, and clearly, by Lemma 3.2, one has
\[ \dim(A_{B'}) = \dim(A_B) + 1, \]
where $A_{B'}$ is the orthogonal to $Z_{B'}$. Now let $Q$ be the indicator vector of a quadrant centered on one of the indeterminates of $m$. Then, $Q \notin A_B$ as it is not orthogonal to the log-vector of $m$. But $Q \in A_{B'}$, as each binomial in $B'$ either avoids the quadrant, or is contained in the quadrant, or has exactly two elements on the border of the quadrant. This is enough to complete the proof.

The quadrant to be used in Example 3.10 is sketched in Figure 6.

Lemma 3.14. Consider a weakened independence model obtained by removing a binomial with not all the edges double from a given family of adjacent binomials $B$, i.e. let $B' = B \setminus \{m\}$ where $m$ has not all the edges double. If we let $A_B$ be the orthogonal to $Z_B$, then the orthogonal to $Z_{B'}$ is generated by the elements of $A_B$ and by the indicator vectors of the MCCs, of the MCRs and of the free cells.

Proof. Clearly, the elements of $A_B$ are orthogonal to the columns of $Z_{B'}$, i.e. $A_{B'} \supseteq A_B$, and by Lemma 3.2 one has
\[ \dim(A_{B'}) = \dim(A_B) + 1, \]
where $A_{B'}$ is the orthogonal to $Z_{B'}$.
The removed binomial $m$, with not all the edges double, can be one of the following

[Diagrams (a) to (e) of the configurations of the removed minor.]

and to complete the proof we only need to present, in each case, a vector $Q$ in $A_{B'}$ which is not in $A_B$. In case (a), either we have a new free cell or not. If we have, let $Q$ be the indicator vector of the free cell. Clearly, $Q \notin A_B$, but $Q \in A_{B'}$ as no minor in $B'$ involves the variable corresponding to the free cell. If we do not have a new free cell, then we have a new MCC or a new MCR and its indicator vector is the required one. Repeating this kind of argument in cases (b) through (e) we complete the proof.

[Fig 6. Binomials of Example 3.10. The dashed line delimits the quadrant defined in Lemma 3.13.]

We are now ready to analyze the general case.

Definition 3.15. Let $B$ be a set of adjacent minors and consider its complement $\bar{B}$ in the set of all adjacent minors. Let $G_{\bar{B}}$ be the graph associated with $\bar{B}$. For each connected component of $G_{\bar{B}}$ not touching the border of the table, we consider the lex-smallest variable and we call it a corner.

Remark 3.16. We notice that the number of corners is just the number of holes one can find in the graph defined by the set of binomials.

Theorem 3.17. Let $B$ be a set of adjacent minors. Then the orthogonal to $Z_B$ is generated by the indicator vectors of the MCCs, of the MCRs, of the free cells and by the indicator vectors of the quadrants centered in variables corresponding to corners.

Proof. $B$ can be constructed from the set of all the adjacent minors by removing one minor at a time. Using Lemmas 3.13 and 3.14 we only need to show the result for the set of all the adjacent minors, but this is a straightforward consequence of Lemma 3.2 and Theorem 3.12.

Remark 3.18.
As an alternative to the direct use of Theorem 3.17, one can apply Lemmas 3.13 and 3.14 to determine a sufficient statistic for a weakened independence model with set of binomials $B$. It is enough to start from the complete set of adjacent minors and remove one by one the minors not in $B$. Notice that such an iterative procedure, and the theorem itself, yield a system of generators of the space orthogonal to $Z_B$, but not a basis, i.e. some of the vectors we add are redundant. We will show some examples and applications of Theorem 3.17 in the next sections.

4. Exponential models. In Section 3 we have carried out some computations to determine a sufficient statistic of a weakened independence model. We are now able to find a parametric representation of the model. Let us introduce unrestricted positive parameters $\zeta_1, \dots, \zeta_s$, where $s$ is the number of columns of the matrix $A_B$. If $A_B$ has full rank, then $s$ coincides with the dimension of the vector sub-space orthogonal to $Z_B$.

The first step is to prove that weakened independence models belong to the class of toric models and therefore they are exponential (log-linear) models on the strictly positive simplex. The first result in this direction is a rewriting of a general theorem to be found in [20].

Remark 4.1. The main result in [20] can be applied to statistical models when the matrix representation $A_B$ of the sufficient statistic has non-negative entries. Therefore, the theory developed in Section 3 is relevant not only to actually determine a sufficient statistic, but also to derive further theoretical properties of the weakened independence models.

Here we use again a vector notation in order to improve the readability and simplify the formulae.

Theorem 4.2 (Geiger, Meek, Sturmfels (2006), Th. 3.2).
Given a $B$-weakened independence model, it can be expressed as
\[ p(\zeta) = \zeta^{A_B} \tag{4.1} \]
apart from the normalizing constant.

Clearly, the parametrization in Eq. (4.1) is not unique. Theorem 4.2 provides an easy way to switch from the implicit representation to its parametrization. It is enough to consider the matrix $A_B$, whose columns are orthogonal to the log-vectors of the binomials. As the columns of $A_B$ can be chosen with non-negative entries, each binomial in $B$ vanishes in all points of the form given in Eq. (4.1). This follows from a direct substitution. The converse part is less intuitive and the proof is not obvious.

As a corollary, Theorem 4.2 allows us to place weakened independence models in the larger class of toric models, as described in [28] and [31]. In the following result, we summarize the main properties inherited from toric models.

Proposition 4.3. Consider a $B$-weakened independence model $V_B$.
1. $V_B$ is a toric model;
2. With the constraint $p > 0$, $V_B$ is an exponential model;
3. In case of sampling, the sufficient statistic for the sample of size $n$ is the sum of the sufficient statistics of all components of the sample.

Proof. See Theorem 2 and the discussion in Section 3 of [31].

Remark 4.4. As noticed in [31], when we consider the general case $p \ge 0$ instead of $p > 0$, the toric model is not an exponential model. Nevertheless it can be described as the disjoint union of a suitable number of exponential models.

We conclude this section with two examples.

Example 4.5. As a first example, we consider a statistical model for $3 \times 3$ contingency tables defined through the binomials in Figure 7. Using the theory developed in the previous section, the MCRs, the MCCs and the free cells are sufficient to describe the orthogonal to $Z_B$ and therefore the relevant matrices are in Table 1.
[Figure 7. Binomials for Example 4.5.]

$$
[Z_B \mid A_B] = \left(\begin{array}{rr|rrrrrrrr}
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & -1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & -1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0
\end{array}\right)
$$

Table 1. Matrices $Z_B$ and $A_B$ for Example 4.5.

Thus, a parametrization with parameters $\zeta_1, \ldots, \zeta_8$ is:

$$
\begin{array}{lll}
p_{1,1} = \zeta_1\zeta_4 & p_{1,2} = \zeta_1\zeta_5 & p_{1,3} = \zeta_7 \\
p_{2,1} = \zeta_2\zeta_4 & p_{2,2} = \zeta_2\zeta_5 & p_{2,3} = \zeta_2\zeta_6 \\
p_{3,1} = \zeta_8 & p_{3,2} = \zeta_3\zeta_5 & p_{3,3} = \zeta_3\zeta_6
\end{array}
$$

Example 4.6. Let us consider the weakened independence model for $4 \times 4$ tables defined in Example 3.10. In that model, a minor with four double edges has been removed and consequently the MCRs, MCCs and free cells are not sufficient to describe the orthogonal of $Z_B$. A vector must be added according to Lemma 3.13. Following the same approach as in the previous example one can easily write down the matrices $Z_B$ and $A_B$. A parametrization with parameters $\zeta_1, \ldots, \zeta_9$ is:

$$
\begin{array}{llll}
p_{1,1} = \zeta_1\zeta_5 & p_{1,2} = \zeta_1\zeta_6 & p_{1,3} = \zeta_1\zeta_7 & p_{1,4} = \zeta_1\zeta_8 \\
p_{2,1} = \zeta_2\zeta_5 & p_{2,2} = \zeta_2\zeta_6 & p_{2,3} = \zeta_2\zeta_7 & p_{2,4} = \zeta_2\zeta_8 \\
p_{3,1} = \zeta_3\zeta_5 & p_{3,2} = \zeta_3\zeta_6 & p_{3,3} = \zeta_3\zeta_7\zeta_9 & p_{3,4} = \zeta_3\zeta_8\zeta_9 \\
p_{4,1} = \zeta_4\zeta_5 & p_{4,2} = \zeta_4\zeta_6 & p_{4,3} = \zeta_4\zeta_7\zeta_9 & p_{4,4} = \zeta_4\zeta_8\zeta_9
\end{array}
$$

5. Inference and examples. In the previous sections we have defined and studied the weakened independence models. When the statistical model is given through a set of adjacent minors $B$, we are now able to compute a sufficient statistic for a sample of size 1 (see Proposition 4.3) and to find a parametrization of the statistical model $V_B$ (see Theorem 4.2).
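The passage from the matrix $A_B$ to the parametrization of Theorem 4.2 can be checked numerically on Example 4.5. The following Python sketch is only an illustration and is not part of the computations reported in the paper, which rely on R and CoCoA. It assumes NumPy is available; the names `A_B`, `Z_B` and `parametrize` are hypothetical. Rows are ordered as the cells $p_{1,1}, p_{1,2}, \ldots, p_{3,3}$, and the two columns of `Z_B` are read as the log-vectors of the binomials $p_{1,1}p_{2,2} - p_{1,2}p_{2,1}$ and $p_{2,2}p_{3,3} - p_{2,3}p_{3,2}$.

```python
import numpy as np

# Exponent matrix A_B of Example 4.5: one row per cell (p11, p12, ..., p33),
# one column per parameter zeta_1, ..., zeta_8.
A_B = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0],  # p11 = zeta1 * zeta4
    [1, 0, 0, 0, 1, 0, 0, 0],  # p12 = zeta1 * zeta5
    [0, 0, 0, 0, 0, 0, 1, 0],  # p13 = zeta7 (free cell)
    [0, 1, 0, 1, 0, 0, 0, 0],  # p21 = zeta2 * zeta4
    [0, 1, 0, 0, 1, 0, 0, 0],  # p22 = zeta2 * zeta5
    [0, 1, 0, 0, 0, 1, 0, 0],  # p23 = zeta2 * zeta6
    [0, 0, 0, 0, 0, 0, 0, 1],  # p31 = zeta8 (free cell)
    [0, 0, 1, 0, 1, 0, 0, 0],  # p32 = zeta3 * zeta5
    [0, 0, 1, 0, 0, 1, 0, 0],  # p33 = zeta3 * zeta6
])

# Log-vectors of the two binomials (columns), in the same cell order.
Z_B = np.array([
    [1, 0], [-1, 0], [0, 0],
    [-1, 0], [1, 1], [0, -1],
    [0, 0], [0, -1], [0, 1],
])


def parametrize(zeta):
    """Return p(zeta) = zeta^{A_B}, normalized to sum to one."""
    p = np.prod(zeta[np.newaxis, :] ** A_B, axis=1)
    return p / p.sum()


# The columns of A_B are orthogonal to the log-vectors of the binomials.
orthogonal = bool(np.all(Z_B.T @ A_B == 0))

# Arbitrary positive parameters (an assumption made for the illustration).
rng = np.random.default_rng(0)
zeta = rng.uniform(0.5, 2.0, size=8)
p = parametrize(zeta).reshape(3, 3)

# The two binomials of the model vanish at p(zeta).
minor_upper_left = p[0, 0] * p[1, 1] - p[0, 1] * p[1, 0]
minor_lower_right = p[1, 1] * p[2, 2] - p[1, 2] * p[2, 1]

print(orthogonal, minor_upper_left, minor_lower_right)
```

For any choice of positive parameters the two minors vanish up to rounding error, which is the direct substitution mentioned after Theorem 4.2. The normalization does not affect this, since both binomials are homogeneous.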
In this section we give some ideas on how to compute maximum likelihood estimates (MLEs) and to perform exact inference through Algebraic Statistics.

In the independence model defined by the set $C$ of all adjacent minors, the maximum likelihood estimate can be expressed in closed form in terms of the observed value of the sufficient statistic $T$. In the weakened independence models this is no longer true, but numerical algorithms for log-linear models can be used. As pointed out in Section 4, at least in the strictly positive case, a weakened independence model is log-linear and thus modified Newton-Raphson methods, Iterative Proportional Fitting or EM methods can be used, see e.g. [3]. To compute the MLEs of the cell probabilities for the examples in this paper, we have used the R software, see [29], together with the package gllm (generalized log-linear models), see [14]. This package allows the user to define a generic matrix for the sufficient statistic. This is the main advantage of the gllm package with respect to other available procedures in different software systems.

Another theoretical result which highlights once again the interplay between statistical models and polynomial algebra is Birch's Theorem (see e.g. [6]). It states that the MLE is the unique point $\hat{p}$ of the model $V_B$ which satisfies the constraints $A_B^t \hat{p} = A_B^t p_{obs}$, where $p_{obs}$ are the observed frequencies. We will have the opportunity to apply this result later in Example 5.3.

Once the MLE is available, the goodness-of-fit can be evaluated through a chi-squared test. The Pearson test statistic

$$C^2 = n \sum_{i,j} \frac{(p_{i,j} - \hat{p}_{i,j})^2}{\hat{p}_{i,j}} \tag{5.1}$$

or the log-likelihood ratio test statistic

$$G^2 = 2n \sum_{i,j} p_{i,j} \log\left(\frac{p_{i,j}}{\hat{p}_{i,j}}\right) \tag{5.2}$$
are evaluated and compared with the chi-square distribution with $\#B$ degrees of freedom, where $\#B$ is the cardinality of $B$, see [22] or [6].

Alternatively, one can run the goodness-of-fit test within Algebraic Statistics, using a Markov Chain Monte Carlo (MCMC) algorithm. A number of papers have shown the relevance of this approach, see [13], and e.g. [30], [4], and [8].

The algebraic MCMC algorithm was first described in [13], and it is by now widely used to compute non-asymptotic $p$-values for goodness-of-fit tests in contingency table problems. Let $h$ be the observed contingency table for a sample of size $n$, written as a vector in $\mathbb{N}^{IJ}$. The MCMC algorithm is useful to efficiently sample from the reference set $\mathcal{F}_t$ of a contingency table $h$ given a sufficient statistic with matrix representation $A_B$, i.e., from the set

$$\mathcal{F}_t = \left\{ h' \in \mathbb{N}^{IJ} \mid A_B^t h' = A_B^t h \right\}.$$

The algorithm samples tables from $\mathcal{F}_t$ with the appropriate hypergeometric distribution $\mathcal{H}$ through a Markov chain based on a suitable set of moves making the chain connected. Such a set of moves is called a Markov basis. In more detail, a Markov basis is a set of tables $\{m_1, \ldots, m_L\}$ with integer entries such that:
• $A_B^t m_k = 0$ for all $k = 1, \ldots, L$;
• if $h_1$ and $h_2$ are tables in $\mathcal{F}_t$, there exist moves $m_{k_1}, \ldots, m_{k_A}$ and signs $\epsilon_1, \ldots, \epsilon_A$ ($\epsilon_a = \pm 1$) such that
$$h_2 = h_1 + \sum_{a=1}^{A} \epsilon_a m_{k_a} \quad \text{and} \quad h_1 + \sum_{a=1}^{l} \epsilon_a m_{k_a} \geq 0$$
for all $l = 1, \ldots, A$.

These conditions ensure the irreducibility of the Markov chain defined by the algorithm below:
• Start from a table $h_1 \in \mathcal{F}_t$;
• Choose a move $m_k$ uniformly in $\{m_1, \ldots, m_L\}$ and a sign $\epsilon$ uniformly in $\{-1, 1\}$. Define $h_2 = h_1 + \epsilon m_k$;
• Choose $u$ uniformly distributed in the interval $[0, 1]$.
If $h_2 \geq 0$ and $\min\{1, \mathcal{H}(h_2)/\mathcal{H}(h_1)\} > u$ then move from $h_1$ to $h_2$, otherwise stay at $h_1$.

In the general case, the computation of a Markov basis needs symbolic computations (the Diaconis-Sturmfels algorithm). Nevertheless, a Markov basis for the weakened independence models can be derived theoretically. In the following, we will determine a Markov basis for the weakened independence models.

Given the set $B$ of binomials defining the $B$-weakened independence model, let $A_B$ be the matrix representation of the sufficient statistic. Moreover, we denote by $I_B$ the polynomial ideal in $\mathbb{R}[p]$ generated by the binomials in $B$. Diaconis and Sturmfels ([13], Theorem 3.1) proved that a Markov basis is formed by the log-vectors of a set of generators of the toric ideal associated to $A_B$, i.e. the ideal

$$J_B = \left\{ p^a - p^b \mid a, b \in \mathbb{N}^{IJ},\ A_B^t(a) = A_B^t(b) \right\}.$$

Therefore, the computation of a Markov basis translates into the computation of a set of generators of a toric ideal.

Bigatti et al. [5] showed that the toric ideal associated to $A_B$ is the saturation of $I_B$ with respect to the product of the indeterminates. Such an ideal is defined as:

$$I_B : (p_{1,1} \cdots p_{I,J})^{\infty} = \left\{ f \in \mathbb{R}[p] \mid (p_{1,1} \cdots p_{I,J})^n f \in I_B \text{ for some } n \right\}.$$

In order to compute a set of generators of this ideal, one can use symbolic algebra packages, e.g. the function Toric of CoCoA. For further details on ideals and their operations, see e.g. [10] and [25].

| First instructor | Second instructor: 1 | 2 | 3 | Total |
|---|---|---|---|---|
| 1 | 7 (6.52) | 5 (5.48) | 0 (0) | 12 |
| 2 | 4 (4.48) | 5 (3.76) | 2 (2.76) | 11 |
| 3 | 1 (1) | 5 (5.76) | 5 (4.24) | 11 |
| Total | 12 | 15 | 7 | 34 |

Table 2. Evaluation of 34 homeworks by two instructors. In parentheses are the MLEs for the weakened independence model.

Example 5.1.
The data we present as a first example in this section have been collected by one of the authors in his Biostatistics course. Each of the 34 students must submit a homework before the exam and this report is evaluated by two instructors on a scale with levels $\{1, 2, 3\}$. The final grade is the maximum of the two evaluations. The data are in Table 2.

The model we use to analyze such data is the model defined by the adjacent minors in Example 4.5. Using the gllm package, we obtain the MLEs written in parentheses in the table. The Pearson statistic is 0.9863. Running an MCMC algorithm with a Markov basis consisting of 2 moves, we find a $p$-value of 0.6665 for the goodness-of-fit test, showing a good fit. The Monte Carlo computations are based on a sample of 10,000 tables, with a burn-in phase of 50,000 tables and sampling every 50 steps.

Models of this kind are used in [7] to detect category indistinguishability both in intra-rater and in inter-rater agreement problems. The model we used shows that categories 1 and 2 are confused, as well as categories 2 and 3. This lack of distinguishability can be ascribed to a relevant non-homogeneity of the marginal distributions.

| HDLC | Smoking level: 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| 1 | 15 | 3 | 6 | 1 | 25 |
| 2 | 8 | 4 | 7 | 2 | 21 |
| 3 | 11 | 6 | 15 | 3 | 35 |
| 4 | 5 | 1 | 11 | 5 | 22 |
| Total | 39 | 14 | 39 | 11 | 103 |

Table 3. Cross-classification of Smoking level and HDLC. Smoking levels: 1 = "No Smoking", 2 = "Less than 5 cigarettes", 3 = "Less than 10 cigarettes", 4 = "More than 10 cigarettes". HDLC levels: 1 = "Normal", 2 = "Low Normal", 3 = "Borderline", 4 = "Abnormal".

Example 5.2.
The data in Table 3 show the cross-classification of 103 subjects with respect to 2 ordinal variables: the smoking level, 4 categories from "No Smoking" to "More than 10 cigarettes", and the quantity of High-Density Lipoprotein Cholesterol (HDLC) in the blood, 4 categories from "Normal" to "Abnormal". The data are presented in [24] and analyzed by the authors under both the independence model and the RC (Row Column effects) model. The authors compute the exact $p$-values for the independence model (0.049) and for the RC model (0.657) using the log-likelihood ratio test statistic.

We use a weakened independence model with binomials in Figure 8. According to Theorem 3.17, a sufficient statistic is formed by 4 MCRs, 4 MCCs and 3 free cells. Moreover, the relevant Markov basis has 14 binomials. With our model we find a $p$-value of 0.7205. Therefore, this weakened independence model fits better than the independence model and it is as good as the RC model. In particular, removing only three adjacent minors from the complete configuration with 9 minors, we obtain a model whose fit is dramatically improved. Moreover, the removed minors allow us to identify quickly the cells which cause the departure from independence.

[Figure 8. Weakened independence model for the cholesterol data.]

Example 5.3. To conclude this section, we show how the models defined in the present paper are related to a recent problem, first stated by Bernd Sturmfels in 2005 and known as the "100 Swiss Francs Problem", see [32]. This problem is related to the modelling of DNA sequence alignments. We briefly describe the probabilistic experiment. For a plain description, the reader can refer to [27], Example 1.15.
A DNA sequence is a sequence of symbols in the alphabet $\{A, T, C, G\}$. A major problem in molecular biology is to compare two DNA sequences. In [32], the following observed sequences were considered:

ATCACCAA ACATTGGG ATGCCTGT GCATTTGCAAGCGGCT
ATGAGTCT TAAAACGC TGGCCATG TCCATCTTAGACAGCG

leading to the observed table below:

$$\begin{pmatrix} 4 & 2 & 2 & 2 \\ 2 & 4 & 2 & 2 \\ 2 & 2 & 4 & 2 \\ 2 & 2 & 2 & 4 \end{pmatrix} \tag{5.3}$$

The hypothesis of the author is that these two DNA sequences are generated through a (biased) coin and four tetrahedral dice $D_1, D_2, D_3, D_4$ with the letters A, T, C, G on the facets. When the coin outcome is "Head", then the dice $D_1$ and $D_2$ are rolled. The outcome of $D_1$ is registered in the first sequence and the outcome of $D_2$ in the second one. When the coin outcome is "Tail", then the dice $D_3$ and $D_4$ are rolled. Denote by $q_1, \ldots, q_4$ the probability vectors of the four dice and by $\alpha$ the probability of "Head" in the coin. Then, the probability distribution of the final outcome is

$$\alpha q_1 q_2^t + (1 - \alpha) q_3 q_4^t. \tag{5.4}$$

Therefore, the construction of this experiment leads us to consider the statistical model of $4 \times 4$ matrices of probabilities whose rank is less than or equal to 2. The author conjectured that the maximum likelihood estimate of the probabilities under the model of matrices with rank at most 2 is:

$$\hat{P}_g = \frac{1}{40} \begin{pmatrix} 3 & 3 & 2 & 2 \\ 3 & 3 & 2 & 2 \\ 2 & 2 & 3 & 3 \\ 2 & 2 & 3 & 3 \end{pmatrix} \tag{5.5}$$

[Figure 9. Weakened independence models $M_1$ (left) and $M_2$ (right) for the 100 Swiss Francs problem.]

Further analyses to be found in [16] show that the model is non-identifiable and that numerical methods are able to identify 3 global maxima and 4 local maxima for the likelihood function. Apart from the simultaneous permutation of the row and column labels, the global maximum is reached for the matrix in Eq.
(5.5) and the local maximum is obtained by the matrix:

$$\hat{P}_l = \frac{1}{40} \begin{pmatrix} 8/3 & 8/3 & 8/3 & 2 \\ 8/3 & 8/3 & 8/3 & 2 \\ 8/3 & 8/3 & 8/3 & 2 \\ 2 & 2 & 2 & 4 \end{pmatrix} \tag{5.6}$$

Now, consider the following probability models for $4 \times 4$ contingency tables:
• the model $M$ of the matrices with rank at most 2;
• the weakened independence models $M_1$ and $M_2$ whose binomials are presented in Figure 9.

One can easily check that both $M_1$ and $M_2$ are proper subsets of $M$. In fact, in $M_1$ the first row is proportional to the second row and so are the third and the fourth, while in $M_2$ the first three rows are proportional. Now it is easy to check by direct substitution that the matrix $\hat{P}_g$ of global maximum in Eq. (5.5) belongs to $M_1$, while the matrix $\hat{P}_l$ of local maximum in Eq. (5.6) belongs to $M_2$. Such matrices are the MLEs for the two models, respectively.

Remark 5.4. This is not a proof that the matrix $\hat{P}_g$ is the MLE for the model $M$. Nevertheless, it is interesting to notice that our models contain both the local and the global maxima. However, our models cannot suffice to find the local extrema in $M$ as $M$ is not a toric model, i.e., it is not defined by binomials.

6. Final remarks and future work. In this paper we used the binomial representation of the independence model to introduce a new class of statistical models: these models are devised to weaken independence. We studied their sufficient statistic, we proved that they are log-linear models for strictly positive probabilities and we showed how to make inference on these models. Some numerical examples emphasized the importance and the wide applicability of our models.

We have in mind different ways to generalize this work. First, we want to find a procedure to characterize the relevant Markov bases to be used in the Diaconis-Sturmfels algorithm.
Then, we plan to consider models defined by non-adjacent $2 \times 2$ minors and try to analyze them with similar techniques. Moreover, we are interested in the study of higher dimensional minors, e.g., $3 \times 3$ minors which appear in the definition of the models in Example 5.3. Finally, for large tables there will be many weakened independence models and model selection strategies must be studied.

Further applications of this kind of models will be investigated, in particular in the field of computational biology. From a geometrical point of view, we would like to explore the structure of the varieties defined by some adjacent minors, as done in [23] when all adjacent minors are considered.

References.

[1] 4ti2 team (2007). 4ti2 - a software package for algebraic, geometric and combinatorial problems on linear spaces. Available at www.4ti2.de.
[2] Agresti, A. (1992). Modelling patterns of agreement and disagreement. Statistical Methods in Medical Research 1, 201–218.
[3] Agresti, A. (2002). Categorical Data Analysis, 2 ed. Wiley, New York.
[4] Aoki, S. and Takemura, A. (2005). Markov chain Monte Carlo exact tests for incomplete two-way contingency tables. J. Stat. Comput. Simul. 75, 10, 787–812.
[5] Bigatti, A., La Scala, R., and Robbiano, L. (1999). Computing toric ideals. J. Symb. Comput. 27, 351–365.
[6] Bishop, Y. M., Fienberg, S., and Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. MIT Press, Cambridge.
[7] Carlini, E. and Rapallo, F. (2008). Algebraic modelling of category distinguishability. In Mathematics Explorations in Contemporary Statistics, P. Gibilisco, E. Riccomagno, and M. P. Rogantin, Eds. Cambridge University Press. In press.
[8] Chen, Y., Dinwoodie, I., Dobra, A., and Huber, M. (2005). Lattice points, contingency tables, and sampling.
In Integer points in polyhedra - geometry, number theory, algebra, optimization. Contemp. Math., Vol. 374. Amer. Math. Soc., Providence, RI, 65–78.
[9] CoCoATeam (2007). CoCoA: a system for doing Computations in Commutative Algebra. Available at http://cocoa.dima.unige.it.
[10] Cox, D., Little, J., and O'Shea, D. (1992). Ideals, Varieties, and Algorithms. Springer Verlag, New York.
[11] Darroch, J. N. and McCloud, P. I. (1986). Category distinguishability and observer agreement. Australian Journal of Statistics 28, 3, 371–388.
[12] De Loera, J., Haws, D., Hemmecke, R., Huggins, P., Tauzer, J., and Yoshida, R. (2003). A user's guide for LattE v1.1. Software package LattE is available at http://www.math.ucdavis.edu/~latte/.
[13] Diaconis, P. and Sturmfels, B. (1998). Algebraic algorithms for sampling from conditional distributions. Ann. Statist. 26, 1, 363–397.
[14] Duffy, D. (2006). The gllm package, 0.31 ed. Available from http://cran.r-project.org.
[15] Fienberg, S. (1980). The Analysis of Cross-Classified Categorical Data. MIT Press, Cambridge.
[16] Fienberg, S. E., Hersh, P., Rinaldo, A., and Zhou, Y. (2007). Maximum likelihood estimation in latent class models for contingency table data. arXiv:0709.3535v1.
[17] Fingleton, B. (1984). Models of Category Counts. Cambridge University Press, Cambridge.
[18] Garcia, L. D., Stillman, M., and Sturmfels, B. (2005). Algebraic geometry of Bayesian networks. J. Symb. Comput. 39, 331–355.
[19] Geiger, D., Heckerman, D., King, H., and Meek, C. (2001). Stratified exponential families: Graphical models and model selection. Ann. Statist. 29, 3, 505–529.
[20] Geiger, D., Meek, C., and Sturmfels, B. (2006). On the toric algebra of graphical models. Ann. Statist. 34, 3, 1463–1492.
[21] Gurevich, G. and Vexler, A. (2005). Change point problems in the model of logistic regression. J. Statist. Plann.
Inference 131, 2, 313–331.
[22] Haberman, S. J. (1974). The Analysis of Frequency Data. The University of Chicago Press, Chicago and London.
[23] Hosten, S. and Sullivant, S. (2004). Ideals of adjacent minors. J. Algebra 277, 615–642.
[24] Jeong, H. C., Jhun, M., and Kim, D. (2005). Bootstrap tests for independence in two-way ordinal contingency tables. Comput. Statist. Data Anal. 48, 623–631.
[25] Kreuzer, M. and Robbiano, L. (2000). Computational Commutative Algebra 1. Springer, Berlin.
[26] Le, C. T. (1998). Applied Categorical Data Analysis. John Wiley & Sons, New York.
[27] Pachter, L. and Sturmfels, B. (2005). Algebraic statistics for computational biology. Cambridge University Press, New York.
[28] Pistone, G., Riccomagno, E., and Wynn, H. P. (2001). Algebraic Statistics: Computational Commutative Algebra in Statistics. Chapman&Hall/CRC, Boca Raton.
[29] R Development Core Team (2006). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org.
[30] Rapallo, F. (2003). Algebraic Markov bases and MCMC for two-way contingency tables. Scand. J. Statist. 30, 2, 385–397.
[31] Rapallo, F. (2007). Toric statistical models: Binomial and parametric representations. Ann. Inst. Statist. Math. 59, 4, 727–740.
[32] Sturmfels, B. (2007). Open problems in algebraic statistics. arXiv:0707.4558v1.

Department of Mathematics
Politecnico di Torino
Corso Duca degli Abruzzi, 24
10124 Torino (Italy)
E-mail: enrico.carlini@polito.it

Department of Science and Advanced Technologies
University of Eastern Piedmont
Via Bellini, 25/g
15100 Alessandria (Italy)
E-mail: fabio.rapallo@mfn.unipmn.it
