A Localization Approach to Improve Iterative Proportional Scaling in Gaussian Graphical Models
Authors: Hisayuki Hara, Akimichi Takemura
Hisayuki Hara (Department of Technology Management for Innovation, University of Tokyo)
Akimichi Takemura (Graduate School of Information Science and Technology, University of Tokyo)

May 2008

Abstract

We discuss an efficient implementation of the iterative proportional scaling procedure in multivariate Gaussian graphical models. We show that the computational cost can be reduced by localizing the update procedure in each iterative step, using the structure of a decomposable model obtained by triangulation of the graph associated with the model. Some numerical experiments demonstrate the competitive performance of the proposed algorithm.

1 Introduction

Since Dempster [6] introduced the multivariate Gaussian graphical model, also called the covariance selection model, it has been investigated by many authors from both theoretical and practical viewpoints. On the theory of Gaussian graphical models, see e.g. Whittaker [22], Lauritzen [16], Cox and Wermuth [2] and Edwards [10]. In recent years much effort has been devoted to applying Gaussian graphical models to the identification of sparse large network systems, especially genetic networks (e.g. [8], [18], [9]), and efficient implementation of inference in these models has been extensively studied.

In this article we discuss an efficient algorithm to compute the maximum likelihood estimator (MLE) of the covariance matrix in Gaussian graphical models. When the graph associated with the model is a chordal graph, the model is called a decomposable model. For a decomposable model, the MLE of the covariance matrix is obtained explicitly.
For general graphical models other than decomposable models, however, we need some iterative procedure to obtain the MLE. The iterative proportional scaling (IPS) procedure is one of the popular algorithms to compute the MLE. The IPS was first introduced by Deming and Stephan [5] to estimate cell probabilities in contingency tables subject to certain fixed marginals. Its convergence and statistical properties have been well studied by many authors (e.g. [13], [11]) and the IPS has been justified in a more general framework ([3]). Speed and Kiiveri [21] first formulated the IPS in the Gaussian graphical model and gave a proof of its convergence. However, from a practical point of view, a straightforward application of the IPS is often computationally too expensive for larger models. For contingency tables, several techniques have been developed to reduce both the storage and the computational time of the IPS (e.g. [14], [15]). Badsberg and Malvestuto [1] proposed a localized implementation of the IPS using the structure of decomposable models containing the graphical model. Such a technique is called chordal extension. Local computation based on chordal extension has been a popular technique in many fields for the numerical solution of sparse linear systems (e.g. [20], [12]).

In the present paper we describe a localized algorithm based on chordal extension for improving the computational efficiency of the IPS in Gaussian graphical models. Let ∆ be the set of variables, which corresponds to the set of vertices of the graph associated with the model. The straightforward implementation of the IPS requires approximately O(|∆|^3) time in each iterative step for large models.
Similarly to the technique of Badsberg and Malvestuto [1], we localize the update procedure in each step by using the structure of a decomposable model containing the model. The proposed algorithm is shown to require O(|∆|) time for some models.

The problem of computing the MLE is equivalent to the positive definite matrix completion problem. The proposed algorithm based on chordal extension is closely related to, but not the same as, the technique discussed by Fukuda et al. [12] in the framework of the positive definite matrix completion problem. As pointed out in Dahl et al. [4], the implementation of the IPS requires enumeration of all maximal cliques of the graph, and this enumeration has exponential complexity. Hence the application of the IPS to large models may be limited. However, when the model is relatively small or the structure of the model is simple, it may be feasible to enumerate the maximal cliques. In this article we consider such situations.

The organization of this paper is as follows. In Section 2 we summarize notations and basic facts on graphs and give a brief review of Gaussian graphical models and the IPS algorithm for covariance matrices. In Section 3 we propose an efficient implementation of the update procedure of the IPS. In Section 4 we perform some numerical experiments to illustrate the effectiveness of the proposed procedure. We end this paper with some concluding remarks in Section 5.

2 Background and preliminaries

2.1 Preliminaries on decompositions of graphs

In this section we summarize some preliminary facts on decompositions of graphs needed in the arguments of the following sections, following Leimer [17], Lauritzen [16] and Malvestuto and Moscarini [19]. Let G = (∆, E) be an undirected graph, where ∆ denotes the set of vertices and E denotes the set of edges.
A subset of ∆ which induces a complete subgraph is called a clique of G. Denote the set of maximal cliques of G by C. For a subset of vertices V, let G(V) denote the subgraph of G induced by V. When a graph G is not connected, we can consider each connected component of G separately; therefore we only consider connected graphs from now on.

A subset S ⊂ ∆ is said to be a separator of G if G(∆ \ S) is disconnected. For a separator S, a triple (A, B, S) of disjoint subsets of ∆ such that A ∪ B ∪ S = ∆ is said to form a decomposition of G. A separator S is called a clique separator if S is a clique of G. For two non-adjacent vertices δ and δ′, S ⊂ ∆ is said to be a (δ, δ′)-separator if δ ∈ A and δ′ ∈ B for a decomposition (A, B, S). A (δ, δ′)-separator which is minimal with respect to the inclusion relation is called a minimal (δ, δ′)-separator or a minimal vertex separator (Lauritzen [16]). Denote by S the set of minimal vertex separators over all non-adjacent pairs of vertices in G.

A graph G is called reducible if ∆ contains a clique separator; otherwise G is said to be prime. If G(V) is prime and G(V′) is reducible for all V′ with V ⊊ V′ ⊂ ∆, then G(V) is called a maximal prime subgraph (mp-subgraph) of G. For any reducible graph, the decomposition into mp-subgraphs is uniquely defined (Leimer [17], Malvestuto and Moscarini [19]). Denote by V the set of subsets of ∆ which induce mp-subgraphs of G and let |V| = M. Then there exists a sequence V_1, ..., V_M ∈ V such that for every m = 2, ..., M there exists m′ < m with V_{m′} ⊃ V_m ∩ (V_1 ∪ ··· ∪ V_{m−1}). Such a sequence is called a D-ordered sequence. Let S_m := V_m ∩ (V_1 ∪ ··· ∪ V_{m−1}) for m = 2, ..., M and define S̄ = {S_2, ..., S_M}. Denote by S_C the set of clique separators of G. Then S̄ satisfies S̄ = S ∩ S_C. So we call the elements of S̄ clique minimal vertex separators.
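These definitions are easy to verify mechanically. The pure-Python sketch below checks them for the eight-vertex reducible graph of Example 1 below; the helper names and the exact edge set (each block V_m inducing a four-cycle) are our reading of Figure 1, not taken verbatim from the paper:

```python
from itertools import combinations

# One plausible reading of Figure 1: each block V_m induces a four-cycle.
edges = {(1, 3), (3, 4), (4, 2), (2, 1),   # G(V1)
         (3, 5), (5, 6), (6, 4),           # G(V2) adds these to the edge (3,4)
         (5, 7), (7, 8), (8, 6)}           # G(V3) adds these to the edge (5,6)
vertices = set(range(1, 9))
adj = {v: set() for v in vertices}
for a, b in edges:
    adj[a].add(b); adj[b].add(a)

def is_complete(S):
    """S induces a complete subgraph (a clique) of G."""
    return all(b in adj[a] for a, b in combinations(sorted(S), 2))

def is_separator(S):
    """S is a separator: G(vertices \\ S) is disconnected."""
    rest = vertices - set(S)
    if not rest:
        return False
    start = next(iter(rest))
    seen, stack = {start}, [start]
    while stack:                     # DFS restricted to the remaining vertices
        for w in adj[stack.pop()] & rest:
            if w not in seen:
                seen.add(w); stack.append(w)
    return seen != rest

# S_2 = {3,4} and S_3 = {5,6} are clique separators of G.
print(is_complete({3, 4}) and is_separator({3, 4}))
print(is_complete({5, 6}) and is_separator({5, 6}))

# D-ordered property: each V_m meets the union of its predecessors
# in a set contained in some earlier V_{m'}.
V = [{1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6, 7, 8}]
for m in range(1, len(V)):
    S_m = V[m] & set().union(*V[:m])
    print(any(S_m <= V[k] for k in range(m)))
```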
Leimer [17] showed that a reducible graph always has a D-ordered sequence with V_1 = V for any V ∈ V. Hence a D-ordered sequence is not uniquely defined. However, S̄ is common to all D-ordered sequences.

Example 1 (A reducible graph). The graph G in Figure 1 is an example of a reducible graph. G has two clique minimal vertex separators S_2 := {3, 4} and S_3 := {5, 6}. Define V_1, V_2 and V_3 by V_1 := {1, 2, 3, 4}, V_2 := {3, 4, 5, 6}, V_3 := {5, 6, 7, 8} as in Figure 1. Then V = {V_1, V_2, V_3} and the sequence V_1, V_2, V_3 is a D-ordered sequence.

[Figure 1: A reducible graph with eight vertices; V_1, V_2, V_3 induce its mp-subgraphs.]

When G is a chordal graph, V and S̄ are equal to the set of maximal cliques C and the set of minimal vertex separators S of G, respectively. Hence |C| = M. A D-ordered sequence for a chordal graph is called a perfect sequence of maximal cliques. There exists a perfect sequence of maximal cliques C_1, ..., C_M such that C_1 = C for any C ∈ C (e.g. Lauritzen [16]).

For a vertex δ ∈ ∆, let adj(δ) denote the set of vertices adjacent to δ. When adj(δ) is a clique, δ is called a simplicial vertex. A simplicial vertex is contained in only one maximal clique C. Hence if δ is simplicial and δ ∈ C, then adj(δ) = C \ {δ}. A sequence of vertices δ_1, δ_2, ..., δ_{|∆|} is called a perfect elimination order of the vertices of G if δ_i is a simplicial vertex in G({δ_i, ..., δ_{|∆|}}). It is well known that G is a chordal graph if and only if G possesses a perfect elimination order (Dirac [7]).

Let C_1, ..., C_M be a perfect sequence of maximal cliques of a chordal graph G. Define R_1 := C_1 \ S_2, S_m := C_m ∩ (C_1 ∪ ··· ∪ C_{m−1}) and R_m := C_m \ S_m for m = 2, ..., M. Let r_m := |R_m| and let δ^m_1, ..., δ^m_{r_m} be any ordering of the vertices of R_m. Then the sequence of vertices

  δ^M_1, ..., δ^M_{r_M}, δ^{M−1}_1, ..., δ^{M−1}_{r_{M−1}}, ..., δ^1_1, ..., δ^1_{r_1}

is a perfect elimination order of G. We call it the perfect elimination order induced by the perfect sequence C_1, ..., C_M.

We now introduce some notation and a basic formula for matrices needed in the following sections. Let A = {a_ij} be a |∆| × |∆| matrix. For two subsets ∆_1 and ∆_2 of ∆, we let A_{∆1∆2} = {a_ij}, i ∈ ∆_1, j ∈ ∆_2, denote the |∆_1| × |∆_2| submatrix of A. Define A^{-1}_{∆1∆2} := (A^{-1})_{∆1∆2}. We let [A_{∆1∆2}]^∆ denote the |∆| × |∆| matrix such that

  ([A_{∆1∆2}]^∆)_ij = a_ij if i ∈ ∆_1 and j ∈ ∆_2, and 0 otherwise.

Let ∆_2 = ∆_1^C and decompose a symmetric matrix A into blocks as

  A = [ A_{∆1∆1}   A_{∆1∆2}
        A′_{∆1∆2}  A_{∆2∆2} ].

Here, for notational simplicity, we display A for the case that the elements of ∆_1 are smaller than those of ∆_2. Suppose that A_{∆2∆2} and A_{∆1∆1} − A_{∆1∆2}(A_{∆2∆2})^{-1}A′_{∆1∆2} are both positive definite. Then A is positive definite and

  A^{-1}_{∆1∆1} = ( A_{∆1∆1} − A_{∆1∆2}(A_{∆2∆2})^{-1}A′_{∆1∆2} )^{-1}.   (1)

2.2 Gaussian graphical models

Let M^+(G) denote the set of |∆| × |∆| positive definite matrices K = {k_ij} such that k_ij = 0 for all i, j ∈ ∆ with i ≠ j and (i, j) ∉ E. Then the Gaussian graphical model for the |∆|-dimensional random variable Y = (Y(1), ..., Y(|∆|))′ associated with a graph G is defined by

  Y ∼ N_{|∆|}(µ, Σ),   K := Σ^{-1} ∈ M^+(G).

Here k_ij = 0 indicates the conditional independence of Y(i) and Y(j) given all other variables. In what follows, we identify M^+(G) with the corresponding graphical model.

Let y_1, ..., y_n be i.i.d. samples from the model. Define ȳ and W by

  ȳ := n^{-1} Σ_{i=1}^n y_i,   W := Σ_{i=1}^n (y_i − ȳ)(y_i − ȳ)′,

respectively.
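The sample statistics ȳ and W are just the sample mean and the matrix of sums of squares and cross-products about the mean; a minimal pure-Python sketch (the function name is ours):

```python
def mean_and_scatter(ys):
    """Return (ybar, W) for samples ys, each a list of length |Delta|:
    ybar = n^{-1} sum_i y_i,  W = sum_i (y_i - ybar)(y_i - ybar)'."""
    n, d = len(ys), len(ys[0])
    ybar = [sum(y[k] for y in ys) / n for k in range(d)]
    W = [[sum((y[i] - ybar[i]) * (y[j] - ybar[j]) for y in ys)
          for j in range(d)] for i in range(d)]
    return ybar, W

# Three samples in two dimensions.
ybar, W = mean_and_scatter([[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]])
# ybar = [2.0, 3.0], W = [[2.0, 0.0], [0.0, 6.0]]
```

W is symmetric positive semidefinite by construction; it is positive definite with probability one when n > |∆|, which is what the explicit MLE formulas below require.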
The likelihood function is written as

  L(µ, K) ∝ (det K)^{n/2} exp( −(1/2) tr KW − (n/2) tr K(ȳ − µ)(ȳ − µ)′ ).

The MLE of µ is ȳ. The likelihood equations involving K are expressed as

  n K^{-1}_{CC} = n Σ_{CC} = W_{CC},   ∀ C ∈ C.   (2)

For a subset of vertices V ⊂ ∆, let K̂_{VV} denote the MLE of K in the marginal model associated with the graph G(V), based on the V-marginal sample only. Let S be a clique separator of G and let (A, B, S) be a decomposition of G. Let V = A ∪ S and V′ = B ∪ S. Then the MLE K̂ is known to satisfy

  K̂ = [K̂_{VV}]^∆ + [K̂_{V′V′}]^∆ − n [(W_{SS})^{-1}]^∆   (3)

(e.g. Lauritzen [16]). More generally, for the set of mp-subgraphs V and the set of clique minimal vertex separators S̄,

  K̂ = Σ_{V ∈ V} [K̂_{VV}]^∆ − n Σ_{S ∈ S̄} [(W_{SS})^{-1}]^∆.   (4)

As mentioned in the previous section, when the model is decomposable, V = C and S = S̄. Hence from (2), K̂ is written explicitly as

  K̂ = n Σ_{C ∈ C} [(W_{CC})^{-1}]^∆ − n Σ_{S ∈ S} [(W_{SS})^{-1}]^∆.

For other graphical models, however, we need some iterative procedure to compute the first term on the right-hand side of (4). The following IPS is commonly used for this purpose. Note that the second term on the right-hand side of (4) needs to be calculated only once and is not involved in the iterative procedure.

The IPS consists of iteratively and successively adjusting Σ_{CC} for C ∈ C as in (2). Let K^t and Σ^t = (K^t)^{-1} denote the estimates of K and Σ at the t-th step of the iteration, respectively. Define D := ∆ \ C for C ∈ C. Then the t-th iterative step of the IPS is described by the following update rule for K^t.

Algorithm 0 (Iterative proportional scaling for K)

Step 0: t ← 1 and select an initial estimate K^0 such that K^0 ∈ M^+(G).
Step 1: Select a maximal clique C ∈ C and update K as follows:

  (K^t)_{CC} ← n(W_{CC})^{-1} + (K^{t-1})_{CD} ((K^{t-1})_{DD})^{-1} (K^{t-1})_{DC}   (5)
  (K^t)_{CD} ← (K^{t-1})_{CD}
  (K^t)_{DD} ← (K^{t-1})_{DD}.

Step 2: If K^t has converged, exit. Otherwise t ← t + 1 and go to Step 1.

From (1), it is easy to see that (K^t)^{-1}_{CC} = (Σ^t)_{CC} = W_{CC}/n. In Step 1, only the C-marginal of K is updated. Therefore, if the initial estimate K^0 satisfies K^0 ∈ M^+(G), then K^t ∈ M^+(G) for all t. By the argument of Csiszár [3], convergence of the algorithm to the MLE,

  lim_{t→∞} K^t = K̂,   lim_{t→∞} Σ^t = Σ̂,

is guaranteed (Speed and Kiiveri [21] and Lauritzen [16]).

Equation (3) suggests that the decomposition (A, B, S) for a clique separator S can localize the problem; that is, in order to obtain the MLE K̂, it suffices to compute the MLEs of the submatrices K̂_{VV} and K̂_{V′V′}, where V = A ∪ S and V′ = B ∪ S. In particular, once the decomposition into mp-subgraphs is obtained, we only need to compute K̂_{VV} for each V ∈ V.

From a complexity-theoretic point of view, the t-th iterative step (5) requires O(|D|^3 + |D|^2|C| + |D||C|^2) time. The graphical model with C = {{1, 2}, {2, 3}, ..., {|∆| − 1, |∆|}, {|∆|, 1}} is called the |∆|-dimensional cycle model, or the |∆|-cycle model. Note that the cycle is prime. In the case of the |∆|-cycle model, |C| = 2 and |D| = |∆| − 2. Hence when |∆| ≥ 4, the iterative step (5) requires O((|∆| − 2)^3) time. In the next section we propose a more efficient algorithm for computing (5) by using the structure of a chordal extension of the graph.

3 A localized algorithm of the IPS

From (1), we note that (5) can be rewritten as

  (K^t)_{CC} = n(W_{CC})^{-1} + (K^{t-1})_{CC} − ((Σ^{t-1})_{CC})^{-1}
             = n(W_{CC})^{-1} + (K^{t-1})_{CC} − ((K^{t-1})^{-1}_{CC})^{-1}.   (6)

In this section we provide an efficient algorithm to compute ((K^{t-1})^{-1}_{CC})^{-1} by using the structure of G. For a graph G, let G* be a chordal graph obtained by triangulating G. Such a G* is called a chordal extension of G. Figure 2 shows the five-cycle model and one of its chordal extensions.

[Figure 2: (i) the five-cycle model; (ii) a chordal extension of (i), with maximal cliques C*_1, C*_2, C*_3.]

Let C*_1, ..., C*_M be a perfect sequence of the maximal cliques of G* with C*_1 ⊃ C. Let S*_m := C*_m ∩ (C*_1 ∪ ··· ∪ C*_{m−1}) for m = 2, ..., M be the minimal vertex separators of G*. We propose the following algorithm to compute ((K^{t-1})^{-1}_{CC})^{-1} for each maximal clique C ∈ C.

Algorithm 1 (Computing ((K^{t-1})^{-1}_{CC})^{-1})

Step 0: m ← M and K* ← K^{t-1}.

Step 1: If m ≠ 1, select a simplicial vertex δ ∈ C*_m of G*. If m = 1, select a vertex δ ∉ C. Let Q = C*_m \ {δ}.

Step 2: Update K*_{QQ} by

  K*_{QQ} ← K*_{QQ} − (k*_{δδ})^{-1} K*_{Qδ} K*_{δQ}.   (7)

Step 3: Update C*_m, G* and ∆ as follows:

  C*_m ← Q,   G* ← G*(∆ \ {δ}),   ∆ ← ∆ \ {δ}.

If C*_m = S*_m, set m ← m − 1. If C*_m = C, return K*_{CC}; otherwise, go to Step 1.

Now we state the main theorem of this paper.

Theorem 1. The output K*_{CC} of Algorithm 1 is equal to ((K^{t-1})^{-1}_{CC})^{-1}.

Proof. Let δ ∈ C*_M be a simplicial vertex of G*. Define Q := C*_M \ {δ}, Q_1 := ∆ \ {δ} and Q_2 := ∆ \ C*_M. Since adj(δ) ⊂ C*_M and K^{t-1} ∈ M^+(G), (K^{t-1})_{Q_2 δ} = 0.
Noting that Q ∪ Q_2 = Q_1, we have from (1)

  ((K^{t-1})^{-1}_{Q_1 Q_1})^{-1} = (K^{t-1})_{Q_1 Q_1} − ((k^{t-1})_{δδ})^{-1} (K^{t-1})_{Q_1 δ} (K^{t-1})_{δ Q_1},

where (k^{t-1})_{δδ} is the (δ, δ) element of K^{t-1}. Since (K^{t-1})_{Q_2 δ} = 0, the subtracted term vanishes outside the Q × Q block, so only (K^{t-1})_{QQ} is modified, exactly as in update (7), and ((K^{t-1})^{-1}_{Q_1 Q_1})^{-1} ∈ M^+(G(Q_1)). By iterating this procedure in accordance with the perfect elimination order induced by the perfect sequence C*_1, ..., C*_M, we complete the proof.

In Algorithm 1 the triangulation G* is arbitrary. However, for every iterative step adjusting the C-marginal, we have to use a perfect sequence with C*_1 ⊃ C.

Example 2 (the five-cycle model). Consider the five-cycle model in Figure 2(i). K is expressed as

  K = [ k_11  k_12  k_13   0     0
        k_12  k_22   0    k_24   0
        k_13   0    k_33   0    k_35
         0    k_24   0    k_44  k_45
         0     0    k_35  k_45  k_55 ].

By adding the fill-in edges {2, 3} and {3, 4}, a triangulated graph G* is obtained as in Figure 2(ii). Consider the case where C = {1, 2}. Define C*_1 = {1, 2, 3}, C*_2 = {2, 3, 4} and C*_3 = {3, 4, 5}. Then the sequence C*_1, C*_2, C*_3 is perfect and it induces the perfect elimination order 5, 4, 3, 2, 1. The updates of K* in Step 2, in accordance with the perfect elimination order, are described as follows:

  K*_{34,34} ← K*_{34,34} − (k*_55)^{-1} (k*_35, k*_45)′ (k*_35, k*_45),
  K*_{23,23} ← K*_{23,23} − (k*_44)^{-1} (k*_24, k*_34)′ (k*_24, k*_34),
  K*_{12,12} ← K*_{12,12} − (k*_33)^{-1} (k*_13, k*_23)′ (k*_13, k*_23).

Then K*_{12,12} = ((K^{t-1})^{-1}_{12,12})^{-1} = ((K^{t-1})^{-1}_{CC})^{-1}.

We now analyze the computational cost of the proposed algorithm.
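The eliminations of Example 2 can also be checked numerically. The following pure-Python sketch is our illustration, not the paper's code: it runs the updates of Algorithm 1 for C = {1, 2} on one concrete K ∈ M^+(G) for the five-cycle (a diagonally dominant matrix of our choosing, hence positive definite), with a small Gauss–Jordan routine standing in for a library inverse, and verifies the claim of Theorem 1:

```python
def inv(A):
    """Gauss-Jordan inverse of a small matrix given as lists of lists."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # partial pivoting
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# A positive definite K in M+(G) for the five-cycle of Figure 2(i):
# zeros exactly at the non-edges, diagonally dominant.
K = [[4.0, 1.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 0.0, 1.0, 0.0],
     [1.0, 0.0, 4.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 1.0, 4.0]]
K0 = [row[:] for row in K]          # keep a copy for the direct check

def eliminate(K, d, live):
    """Step 2 of Algorithm 1, update (7): K*_QQ <- K*_QQ - (k*_dd)^{-1} K*_Qd K*_dQ.
    Vertices are 0-based; K[i][d] is zero for i outside the current clique,
    so looping over all remaining vertices gives the same result."""
    Q = [v for v in live if v != d]
    for i in Q:
        for j in Q:
            K[i][j] -= K[i][d] * K[d][j] / K[d][d]
    return Q

# Perfect elimination order 5, 4, 3 (0-based: 4, 3, 2) induced by the perfect
# sequence C*_1={1,2,3}, C*_2={2,3,4}, C*_3={3,4,5}; stop once C = {1,2} remains.
live = [0, 1, 2, 3, 4]
for d in (4, 3, 2):
    live = eliminate(K, d, live)
K_CC = [[K[i][j] for j in (0, 1)] for i in (0, 1)]   # output of Algorithm 1

# Theorem 1: this equals ((K^{t-1})^{-1}_CC)^{-1} computed directly.
Sigma = inv(K0)
direct = inv([[Sigma[i][j] for j in (0, 1)] for i in (0, 1)])
ok = all(abs(K_CC[a][b] - direct[a][b]) < 1e-10 for a in (0, 1) for b in (0, 1))
```

Each call to `eliminate` is exactly one rank-one (Schur complement) update, which is where the O(Σ_m |R*_m||C*_m|^2) cost analyzed below comes from.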
In Step 2, the running time of the calculation of (7) is as follows:

  • K*_1 := (k*_{δδ})^{-1} K*_{Qδ} requires |Q| divisions;
  • K*_2 := K*_1 K*_{δQ} requires |Q|^2 multiplications;
  • K*_{QQ} − K*_2 requires |Q|^2 subtractions.

Define R*_1 := C*_1 \ C and R*_m := C*_m \ S*_m for m = 2, ..., M. Then |Q| ranges over {|C*_m| − j | 1 ≤ j ≤ |R*_m|, 1 ≤ m ≤ M}. Let µ, γ and σ measure the time units required by a single multiplication, division and subtraction, respectively. Then the running time of Algorithm 1 amounts to

  (µ + σ) Σ_{m=1}^M Σ_{j=1}^{|R*_m|} (|C*_m| − j)^2 + γ Σ_{m=1}^M Σ_{j=1}^{|R*_m|} (|C*_m| − j)
  = (µ + σ) Σ_{m=1}^M { |R*_m||C*_m|^2 − |R*_m||C*_m| − |R*_m|^2|C*_m| + |R*_m|(|R*_m| + 1)(2|R*_m| + 1)/6 }
    + γ Σ_{m=1}^M { |R*_m||C*_m| − (1 + |R*_m|)|R*_m|/2 }.

Since |C*_m| ≥ |R*_m|, the computational cost of Algorithm 1 is O(Σ_{m=1}^M |R*_m||C*_m|^2). Once ((K^{t-1})^{-1}_{CC})^{-1} is obtained, a further 2σ|C|^2, that is O(|C|^2), additions and subtractions are required to compute (6). Note that we can compute (W_{CC})^{-1} once, before the IPS procedure. Hence the computational cost of the t-th iterative step amounts to O(|C|^2 + Σ_{m=1}^M |R*_m||C*_m|^2).

In the case of cycle models, |C| = 2, M = |∆| − 2, |C*_m| = 3 and |R*_m| = 1. Thus the computational cost is O(|∆|). As mentioned in the previous section, the direct computation of ((K^{t-1})^{-1}_{CC})^{-1} requires O(|∆ \ C|^3) = O(|D|^3) time, which in the case of cycle models is O((|∆| − 2)^3). Hence we can see the efficiency of the proposed algorithm.

4 Numerical experiments for cycle models

In this section we compare the localized IPS proposed in the previous section with the direct computation of the IPS by numerical experiments. We consider the |∆|-cycle models with |∆| = 5, 10, 50, 100, 200, 300, 500, 1000. We set K = I_{|∆|}.
We generated 100 Wishart matrices W with parameter I_{|∆|} and |∆| degrees of freedom, and computed the MLE K̂ for the |∆|-cycle models both by the proposed algorithm and by direct computation of the IPS. We set the initial estimate K^0 := I_{|∆|}. As a convergence criterion, we used Σ_{i,j} |k^t_ij − k^{t-1}_ij| ≤ 10^{-6}. The computation was done on a machine with an Intel Core 2 Duo 3.0 GHz CPU, using the R language.

Table 1 presents the average CPU time per iterative step to update K^{t-1} in (6) for both algorithms. We can see the competitive performance of the proposed algorithm for |∆| = 5 and |∆| ≥ 200. However, the direct computation is faster than the proposed one for |∆| = 10, 50, 100. In the update procedure of the direct computation (6), the computation of ((K^{t-1})_{DD})^{-1} is the most expensive part, and in theory it requires O(|D|^3) time. Table 2 shows the average CPU time for computing a |∆| × |∆| inverse matrix in the R language on the same machine. As seen from the table, while the CPU time for computing the inverse of a matrix increases at a rate of nearly O(|∆|^3) for |∆| > 100, it increases much more slowly for relatively small |∆|. On the other hand, we can see from Table 1 that the CPU time of the proposed algorithm increases almost linearly in |∆|, which agrees with the theoretical result of the previous section. This is why the proposed algorithm is slower than the direct computation for relatively small |∆|. When |∆| ≥ 200, however, the cost of computing ((K^{t-1})_{DD})^{-1} is no longer negligible and the proposed algorithm shows a considerable reduction in computational time. In practice, the performance for large models is the more crucial; in this sense the proposed algorithm can be considered efficient.
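The experiment can be reproduced in miniature. The pure-Python sketch below is our illustration rather than the paper's R code: it runs the straightforward IPS (Algorithm 0) on the five-cycle model with n = 1 and one fixed positive definite W (our choice, not Wishart-sampled), and checks that the likelihood equations (2), here Σ_CC = W_CC for every maximal clique, hold at the fixed point:

```python
def inv(A):
    """Gauss-Jordan inverse of a small matrix given as lists of lists."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def submat(A, rows, cols):
    return [[A[i][j] for j in cols] for i in rows]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Maximal cliques (= edges) of the five-cycle of Example 2, 0-based:
# {1,2}, {1,3}, {2,4}, {3,5}, {4,5} in the paper's labels.
cliques = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 4)]
W = [[min(i, j) + 1.0 for j in range(5)] for i in range(5)]   # fixed SPD W, n = 1

K = [[float(i == j) for j in range(5)] for i in range(5)]     # K^0 = I in M+(G)
for sweep in range(500):
    change = 0.0
    for C in cliques:
        D = [v for v in range(5) if v not in C]
        # Update (5) with n = 1: K_CC <- W_CC^{-1} + K_CD K_DD^{-1} K_DC.
        WCCinv = inv(submat(W, C, C))
        fit = matmul(matmul(submat(K, C, D), inv(submat(K, D, D))), submat(K, D, C))
        for a, i in enumerate(C):
            for b, j in enumerate(C):
                new = WCCinv[a][b] + fit[a][b]
                change = max(change, abs(new - K[i][j]))
                K[i][j] = new
    if change < 1e-12:
        break

Sigma = inv(K)
# Likelihood equations (2) at the fixed point: Sigma_CC = W_CC (n = 1).
err = max(abs(Sigma[i][j] - W[i][j]) for C in cliques for i in C for j in C)
```

The entries of K outside the edge set are never touched, so every iterate stays in M^+(G), matching the remark after Algorithm 0; only the timing comparison, which needs large |∆| and R's optimized `solve`, is beyond this sketch.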
Table 1: Average CPU time per iterative step for |∆|-cycle models (unit: 10^-2 CPU seconds)

   |∆|    Algorithm 1   direct computation
     5        1.590          2.261
    10        3.442          2.523
    50       18.63           5.482
   100       38.11          17.48
   200       78.87         104.91
   300      120.03         361.01
   500      225.94        1292.1
  1000      511.60        6625.8

Table 2: Average CPU time for calculating a |∆| × |∆| inverse matrix (unit: 10^-2 CPU seconds)

   |∆|    CPU time
     5       0.027
    10       0.031
    50       0.058
   100       0.286
   200       1.789
   300       5.602
   500      24.666
  1000     207.66

5 Concluding remarks

In this article we discussed localization to reduce the computational burden of the IPS in two ways. We first showed that the decomposition of the graph into mp-subgraphs can localize the IPS. Next we proposed a localized algorithm for the iterative step of the IPS, using the structure of a chordal extension of the graphical model for each mp-subgraph. The proposed algorithm costs O(|∆|) in the case of cycle models, and some numerical experiments confirmed the theory for large models.

As mentioned in Section 1, the implementation of the IPS requires enumeration of all maximal cliques of the graph, and this enumeration has exponential complexity. In addition, the proposed algorithm also requires some characteristics of the graph: a chordal extension, and perfect sequences and perfect elimination orders of the chordal extension. In this sense, the application of the IPS may be limited. However, when the structure of the model is simple or sparse, it may be feasible to obtain these characteristics. In such cases, the proposed algorithm is considered to be effective.

Acknowledgment

The authors are grateful to two anonymous referees for constructive comments and suggestions which have led to improvements in the presentation of the paper.

References

[1] J. H. Badsberg and F. M. Malvestuto. An implementation of the iterative proportional fitting procedure by propagation trees. Comput. Statist. Data Anal., 37:297–322, 2001.

[2] D. R. Cox and N. Wermuth. Multivariate Dependencies. Chapman and Hall, London, 1996.

[3] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. Ann. Probab., 3:146–158, 1975.

[4] J. Dahl, L. Vandenberghe, and V. Roychowdhury. Covariance selection for non-chordal graphs via chordal embedding, 2006. To appear in Optimization Methods and Software.

[5] W. E. Deming and F. F. Stephan. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Statist., 11:427–444, 1940.

[6] A. P. Dempster. Covariance selection. Biometrics, 28:157–175, 1972.

[7] G. A. Dirac. On rigid circuit graphs. Abh. Math. Sem. Univ. Hamburg, 25:71–76, 1961.

[8] A. Dobra, C. Hans, B. Jones, J. R. Nevins, G. Yao, and M. West. Sparse graphical models for exploring gene expression data. J. Multivariate Anal., 90:196–212, 2004.

[9] M. Drton and T. S. Richardson. Graphical methods for efficient likelihood inference in Gaussian covariance models. arXiv:0708.1321, 2007.

[10] D. M. Edwards. Introduction to Graphical Modelling. Springer, New York, 2000.

[11] S. E. Fienberg. An iterative procedure for estimation in contingency tables. Ann. Math. Statist., 41:907–917, 1970.

[12] M. Fukuda, M. Kojima, K. Murota, and K. Nakata. Exploiting sparsity in semidefinite programming via matrix completion I: General framework. SIAM J. Optim., 11:647–674, 2000.

[13] C. T. Ireland and S. Kullback. Contingency tables with given marginals. Biometrika, 55:179–188, 1968.

[14] R. Jiroušek. Solution of the marginal problem and decomposable solutions. Kybernetika, 27:403–412, 1991.

[15] R. Jiroušek and S. Přeučil. On the effective implementation of the iterative proportional fitting procedure. Comput. Statist. Data Anal., 19:177–189, 1995.

[16] S. L. Lauritzen. Graphical Models. Oxford University Press, Oxford, 1996.

[17] H. G. Leimer. Optimal decomposition by clique separators. Discrete Math., 113:99–123, 1993.

[18] H. Li and J. Gui. Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics, 7:302–317, 2006.

[19] F. M. Malvestuto and M. Moscarini. Decomposition of a hypergraph by partial-edge separators. Theoret. Comput. Sci., 237:57–79, 2000.

[20] D. J. Rose. A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations. In R. C. Read, editor, Graph Theory and Computing, pages 183–217. Academic Press, New York, 1971.

[21] T. P. Speed and H. T. Kiiveri. Gaussian Markov distributions over finite graphs. Ann. Statist., 14:138–150, 1986.

[22] J. Whittaker. Graphical Models in Applied Multivariate Statistics. John Wiley and Sons, Chichester, 1990.