A Localization Approach to Improve Iterative Proportional Scaling in Gaussian Graphical Models
Authors: Hisayuki Hara, Akimichi Takemura
Hisayuki Hara (Department of Technology Management for Innovation, University of Tokyo)
Akimichi Takemura (Graduate School of Information Science and Technology, University of Tokyo)

May 2008

Abstract

We discuss an efficient implementation of the iterative proportional scaling procedure in multivariate Gaussian graphical models. We show that the computational cost can be reduced by localizing the update procedure in each iterative step, using the structure of a decomposable model obtained by triangulation of the graph associated with the model. Some numerical experiments demonstrate the competitive performance of the proposed algorithm.

1 Introduction

Since Dempster [6] introduced the multivariate Gaussian graphical model, also called the covariance selection model, it has been investigated by many authors from both theoretical and practical viewpoints. On the theory of Gaussian graphical models, see e.g. Whittaker [22], Lauritzen [16], Cox and Wermuth [2] and Edwards [10]. In recent years much effort has been devoted to applying Gaussian graphical models to the identification of sparse large network systems, especially genetic networks (e.g. [8], [18], [9]), and efficient implementation of inference in these models has been extensively studied.

In this article we discuss an efficient algorithm to compute the maximum likelihood estimator (MLE) of the covariance matrix in Gaussian graphical models. When the graph associated with the model is a chordal graph, the model is called a decomposable model. For a decomposable model, the MLE of the covariance matrix is obtained explicitly.
For general graphical models other than decomposable models, however, we need some iterative procedure to obtain the MLE. The iterative proportional scaling (IPS) procedure is one of the popular algorithms to compute the MLE. The IPS was first introduced by Deming and Stephan [5] to estimate cell probabilities in contingency tables subject to certain fixed marginals. Its convergence and statistical properties have been well studied by many authors (e.g. [13], [11]) and the IPS has been justified in a more general framework ([3]). Speed and Kiiveri [21] first formulated the IPS in the Gaussian graphical model and gave a proof of its convergence. However, from a practical point of view, a straightforward application of the IPS is often computationally too expensive for larger models. For contingency tables, several techniques have been developed to reduce both the storage and the computational time of the IPS (e.g. [14], [15]). Badsberg and Malvestuto [1] proposed a localized implementation of the IPS using the structure of decomposable models containing the graphical model. Such a technique is called chordal extension. Local computation based on chordal extension has been a popular technique in many fields for the numerical solution of sparse linear systems (e.g. [20], [12]).

In the present paper we describe a localized algorithm based on chordal extension for improving the computational efficiency of the IPS in Gaussian graphical models. Let ∆ be the set of variables, which corresponds to the set of vertices of the graph associated with the model. The straightforward implementation of the IPS requires approximately O(|∆|^3) time in each iterative step for large models.
Similarly to the technique of Badsberg and Malvestuto [1], we localize the update procedure in each step by using the structure of a decomposable model containing the model. The proposed algorithm is shown to require O(|∆|) time for some models.

The problem of computing the MLE is equivalent to the positive definite matrix completion problem. The proposed algorithm based on chordal extension is closely related to, but not the same as, the technique discussed by Fukuda et al. [12] in the framework of the positive definite matrix completion problem. As pointed out in Dahl et al. [4], the implementation of the IPS requires enumeration of all maximal cliques of the graph, and this enumeration has exponential complexity. Hence the application of the IPS to large models may be limited. However, when the model is relatively small or the structure of the model is simple, it may be feasible to enumerate the maximal cliques. In this article we consider such situations.

The organization of this paper is as follows. In Section 2 we summarize notations and basic facts on graphs and give a brief review of Gaussian graphical models and the IPS algorithm for covariance matrices. In Section 3 we propose an efficient implementation of the update procedure of the IPS. In Section 4 we perform some numerical experiments to illustrate the effectiveness of the proposed procedure. We end this paper with some concluding remarks in Section 5.

2 Background and preliminaries

2.1 Preliminaries on decompositions of graphs

In this section we summarize some preliminary facts on decompositions of graphs needed in the arguments of the following sections, following Leimer [17], Lauritzen [16] and Malvestuto and Moscarini [19]. Let G = (∆, E) be an undirected graph, where ∆ denotes the set of vertices and E denotes the set of edges.
A subset of ∆ which induces a complete subgraph is called a clique of G. Denote the set of maximal cliques of G by C. For a subset of vertices V, let G(V) denote the subgraph of G induced by V. When a graph G is not connected, we can consider each connected component of G separately; therefore we only consider connected graphs from now on.

A subset S ⊂ ∆ is said to be a separator of G if G(∆ \ S) is disconnected. For a separator S, a triple (A, B, S) of disjoint subsets of ∆ such that A ∪ B ∪ S = ∆ is said to form a decomposition of G. A separator S is called a clique separator if S is a clique of G. For two non-adjacent vertices δ and δ′, S ⊂ ∆ is said to be a (δ, δ′)-separator if δ ∈ A and δ′ ∈ B for a decomposition (A, B, S). A (δ, δ′)-separator which is minimal with respect to the inclusion relation is called a minimal (δ, δ′)-separator or a minimal vertex separator (Lauritzen [16]). Denote by S the set of minimal vertex separators over all non-adjacent pairs of vertices in G.

A graph G is called reducible if ∆ contains a clique separator; otherwise G is said to be prime. If G(V) is prime and G(V′) is reducible for all V′ with V ⊊ V′ ⊂ ∆, then G(V) is called a maximal prime subgraph (mp-subgraph) of G. For any reducible graph, the decomposition into mp-subgraphs is uniquely defined (Leimer [17], Malvestuto and Moscarini [19]). Denote by V the set of subsets of ∆ which induce mp-subgraphs of G and let |V| = M. Then there exists a sequence V_1, ..., V_M ∈ V such that for every m = 2, ..., M there exists m′ < m with V_{m′} ⊃ V_m ∩ (V_1 ∪ ··· ∪ V_{m−1}). Such a sequence is called a D-ordered sequence. Let S_m := V_m ∩ (V_1 ∪ ··· ∪ V_{m−1}) for m = 2, ..., M and define S̄ = {S_2, ..., S_M}. Denote by S_C the set of clique separators of G. Then S̄ satisfies S̄ = S ∩ S_C. So we call the elements of S̄ clique minimal vertex separators.
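These definitions are easy to verify mechanically. The pure-Python sketch below checks them for the eight-vertex reducible graph of Example 1 below; the helper names and the exact edge set (each block V_m inducing a four-cycle) are our reading of Figure 1, not taken verbatim from the paper:

```python
from itertools import combinations

# One plausible reading of Figure 1: each block V_m induces a four-cycle.
edges = {(1, 3), (3, 4), (4, 2), (2, 1),   # G(V1)
         (3, 5), (5, 6), (6, 4),           # G(V2) adds these to the edge (3,4)
         (5, 7), (7, 8), (8, 6)}           # G(V3) adds these to the edge (5,6)
vertices = set(range(1, 9))
adj = {v: set() for v in vertices}
for a, b in edges:
    adj[a].add(b); adj[b].add(a)

def is_complete(S):
    """S induces a complete subgraph (a clique) of G."""
    return all(b in adj[a] for a, b in combinations(sorted(S), 2))

def is_separator(S):
    """S is a separator: G(vertices \\ S) is disconnected."""
    rest = vertices - set(S)
    if not rest:
        return False
    start = next(iter(rest))
    seen, stack = {start}, [start]
    while stack:                     # DFS restricted to the remaining vertices
        for w in adj[stack.pop()] & rest:
            if w not in seen:
                seen.add(w); stack.append(w)
    return seen != rest

# S_2 = {3,4} and S_3 = {5,6} are clique separators of G.
print(is_complete({3, 4}) and is_separator({3, 4}))
print(is_complete({5, 6}) and is_separator({5, 6}))

# D-ordered property: each V_m meets the union of its predecessors
# in a set contained in some earlier V_{m'}.
V = [{1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6, 7, 8}]
for m in range(1, len(V)):
    S_m = V[m] & set().union(*V[:m])
    print(any(S_m <= V[k] for k in range(m)))
```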
Leimer [17] showed that a reducible graph always has a D-ordered sequence with V_1 = V for any V ∈ V. Hence a D-ordered sequence is not uniquely defined. However, S̄ is common to all D-ordered sequences.

Example 1 (A reducible graph). The graph G in Figure 1 is an example of a reducible graph. G has two clique minimal vertex separators S_2 := {3, 4} and S_3 := {5, 6}. Define V_1, V_2 and V_3 by V_1 := {1, 2, 3, 4}, V_2 := {3, 4, 5, 6}, V_3 := {5, 6, 7, 8} as in Figure 1. Then V = {V_1, V_2, V_3} and the sequence V_1, V_2, V_3 is a D-ordered sequence.

[Figure 1: A reducible graph with eight vertices; V_1, V_2, V_3 induce its mp-subgraphs.]

When G is a chordal graph, V and S̄ are equal to the set of maximal cliques C and the set of minimal vertex separators S of G, respectively. Hence |C| = M. A D-ordered sequence for a chordal graph is called a perfect sequence of maximal cliques. There exists a perfect sequence of maximal cliques C_1, ..., C_M such that C_1 = C for any C ∈ C (e.g. Lauritzen [16]).

For a vertex δ ∈ ∆, let adj(δ) denote the set of vertices adjacent to δ. When adj(δ) is a clique, δ is called a simplicial vertex. A simplicial vertex is contained in only one maximal clique C. Hence if δ is simplicial and δ ∈ C, then adj(δ) = C \ {δ}. A sequence of vertices δ_1, δ_2, ..., δ_{|∆|} is called a perfect elimination order of the vertices of G if δ_i is a simplicial vertex in G({δ_i, ..., δ_{|∆|}}). It is well known that G is a chordal graph if and only if G possesses a perfect elimination order (Dirac [7]).

Let C_1, ..., C_M be a perfect sequence of maximal cliques of a chordal graph G. Define R_1 := C_1 \ S_2, S_m := C_m ∩ (C_1 ∪ ··· ∪ C_{m−1}) and R_m := C_m \ S_m for m = 2, ..., M. Let r_m := |R_m| and let δ^m_1, ..., δ^m_{r_m} be any ordering of the vertices of R_m. Then the sequence of vertices

  δ^M_1, ..., δ^M_{r_M}, δ^{M−1}_1, ..., δ^{M−1}_{r_{M−1}}, ..., δ^1_1, ..., δ^1_{r_1}

is a perfect elimination order of G. We call it the perfect elimination order induced by the perfect sequence C_1, ..., C_M.

We now introduce some notation and a basic formula for matrices needed in the following sections. Let A = {a_ij} be a |∆| × |∆| matrix. For two subsets ∆_1 and ∆_2 of ∆, we let A_{∆1∆2} = {a_ij}, i ∈ ∆_1, j ∈ ∆_2, denote the |∆_1| × |∆_2| submatrix of A. Define A^{-1}_{∆1∆2} := (A^{-1})_{∆1∆2}. We let [A_{∆1∆2}]^∆ denote the |∆| × |∆| matrix such that

  ([A_{∆1∆2}]^∆)_ij = a_ij if i ∈ ∆_1 and j ∈ ∆_2, and 0 otherwise.

Let ∆_2 = ∆_1^C and decompose a symmetric matrix A into blocks as

  A = [ A_{∆1∆1}   A_{∆1∆2}
        A′_{∆1∆2}  A_{∆2∆2} ].

Here, for notational simplicity, we display A for the case that the elements of ∆_1 are smaller than those of ∆_2. Suppose that A_{∆2∆2} and A_{∆1∆1} − A_{∆1∆2}(A_{∆2∆2})^{-1}A′_{∆1∆2} are both positive definite. Then A is positive definite and

  A^{-1}_{∆1∆1} = ( A_{∆1∆1} − A_{∆1∆2}(A_{∆2∆2})^{-1}A′_{∆1∆2} )^{-1}.   (1)

2.2 Gaussian graphical models

Let M^+(G) denote the set of |∆| × |∆| positive definite matrices K = {k_ij} such that k_ij = 0 for all i, j ∈ ∆ with i ≠ j and (i, j) ∉ E. Then the Gaussian graphical model for the |∆|-dimensional random variable Y = (Y(1), ..., Y(|∆|))′ associated with a graph G is defined by

  Y ∼ N_{|∆|}(µ, Σ),   K := Σ^{-1} ∈ M^+(G).

Here k_ij = 0 indicates the conditional independence of Y(i) and Y(j) given all other variables. In what follows, we identify M^+(G) with the corresponding graphical model.

Let y_1, ..., y_n be i.i.d. samples from the model. Define ȳ and W by

  ȳ := n^{-1} Σ_{i=1}^n y_i,   W := Σ_{i=1}^n (y_i − ȳ)(y_i − ȳ)′,

respectively.
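The sample statistics ȳ and W are just the sample mean and the matrix of sums of squares and cross-products about the mean; a minimal pure-Python sketch (the function name is ours):

```python
def mean_and_scatter(ys):
    """Return (ybar, W) for samples ys, each a list of length |Delta|:
    ybar = n^{-1} sum_i y_i,  W = sum_i (y_i - ybar)(y_i - ybar)'."""
    n, d = len(ys), len(ys[0])
    ybar = [sum(y[k] for y in ys) / n for k in range(d)]
    W = [[sum((y[i] - ybar[i]) * (y[j] - ybar[j]) for y in ys)
          for j in range(d)] for i in range(d)]
    return ybar, W

# Three samples in two dimensions.
ybar, W = mean_and_scatter([[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]])
# ybar = [2.0, 3.0], W = [[2.0, 0.0], [0.0, 6.0]]
```

W is symmetric positive semidefinite by construction; it is positive definite with probability one when n > |∆|, which is what the explicit MLE formulas below require.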
The likelihood function is written as

  L(µ, K) ∝ (det K)^{n/2} exp( −(1/2) tr KW − (n/2) tr K(ȳ − µ)(ȳ − µ)′ ).

The MLE of µ is ȳ. The likelihood equations involving K are expressed as

  n K^{-1}_{CC} = n Σ_{CC} = W_{CC},   ∀ C ∈ C.   (2)

For a subset of vertices V ⊂ ∆, let K̂_{VV} denote the MLE of K in the marginal model associated with the graph G(V), based on the V-marginal sample only. Let S be a clique separator of G and let (A, B, S) be a decomposition of G. Let V = A ∪ S and V′ = B ∪ S. Then the MLE K̂ is known to satisfy

  K̂ = [K̂_{VV}]^∆ + [K̂_{V′V′}]^∆ − n [(W_{SS})^{-1}]^∆   (3)

(e.g. Lauritzen [16]). More generally, for the set of mp-subgraphs V and the set of clique minimal vertex separators S̄,

  K̂ = Σ_{V ∈ V} [K̂_{VV}]^∆ − n Σ_{S ∈ S̄} [(W_{SS})^{-1}]^∆.   (4)

As mentioned in the previous section, when the model is decomposable, V = C and S = S̄. Hence from (2), K̂ is written explicitly as

  K̂ = n Σ_{C ∈ C} [(W_{CC})^{-1}]^∆ − n Σ_{S ∈ S} [(W_{SS})^{-1}]^∆.

For other graphical models, however, we need some iterative procedure to compute the first term on the right-hand side of (4). The following IPS is commonly used for this purpose. Note that the second term on the right-hand side of (4) needs to be calculated only once and is not involved in the iterative procedure.

The IPS consists of iteratively and successively adjusting Σ_{CC} for C ∈ C as in (2). Let K^t and Σ^t = (K^t)^{-1} denote the estimates of K and Σ at the t-th step of the iteration, respectively. Define D := ∆ \ C for C ∈ C. Then the t-th iterative step of the IPS is described by the following update rule for K^t.

Algorithm 0 (Iterative proportional scaling for K)

Step 0: t ← 1 and select an initial estimate K^0 such that K^0 ∈ M^+(G).
Step 1: Select a maximal clique C ∈ C and update K as follows:

  (K^t)_{CC} ← n(W_{CC})^{-1} + (K^{t-1})_{CD} ((K^{t-1})_{DD})^{-1} (K^{t-1})_{DC}   (5)
  (K^t)_{CD} ← (K^{t-1})_{CD}
  (K^t)_{DD} ← (K^{t-1})_{DD}.

Step 2: If K^t has converged, exit. Otherwise t ← t + 1 and go to Step 1.

From (1), it is easy to see that (K^t)^{-1}_{CC} = (Σ^t)_{CC} = W_{CC}/n. In Step 1, only the C-marginal of K is updated. Therefore, if the initial estimate K^0 satisfies K^0 ∈ M^+(G), then K^t ∈ M^+(G) for all t. By the argument of Csiszár [3], convergence of the algorithm to the MLE,

  lim_{t→∞} K^t = K̂,   lim_{t→∞} Σ^t = Σ̂,

is guaranteed (Speed and Kiiveri [21] and Lauritzen [16]).

Equation (3) suggests that the decomposition (A, B, S) for a clique separator S can localize the problem; that is, in order to obtain the MLE K̂, it suffices to compute the MLEs of the submatrices K̂_{VV} and K̂_{V′V′}, where V = A ∪ S and V′ = B ∪ S. In particular, once the decomposition into mp-subgraphs is obtained, we only need to compute K̂_{VV} for each V ∈ V.

From a complexity-theoretic point of view, the t-th iterative step (5) requires O(|D|^3 + |D|^2|C| + |D||C|^2) time. The graphical model with C = {{1, 2}, {2, 3}, ..., {|∆| − 1, |∆|}, {|∆|, 1}} is called the |∆|-dimensional cycle model, or the |∆|-cycle model. Note that the cycle is prime. In the case of the |∆|-cycle model, |C| = 2 and |D| = |∆| − 2. Hence when |∆| ≥ 4, the iterative step (5) requires O((|∆| − 2)^3) time. In the next section we propose a more efficient algorithm for computing (5) by using the structure of a chordal extension of the graph.

3 A localized algorithm of the IPS

From (1), we note that (5) can be rewritten as

  (K^t)_{CC} = n(W_{CC})^{-1} + (K^{t-1})_{CC} − ((Σ^{t-1})_{CC})^{-1}
             = n(W_{CC})^{-1} + (K^{t-1})_{CC} − ((K^{t-1})^{-1}_{CC})^{-1}.   (6)

In this section we provide an efficient algorithm to compute ((K^{t-1})^{-1}_{CC})^{-1} by using the structure of G. For a graph G, let G* be a chordal graph obtained by triangulating G. Such a G* is called a chordal extension of G. Figure 2 shows the five-cycle model and one of its chordal extensions.

[Figure 2: (i) the five-cycle model; (ii) a chordal extension of (i), with maximal cliques C*_1, C*_2, C*_3.]

Let C*_1, ..., C*_M be a perfect sequence of the maximal cliques of G* with C*_1 ⊃ C. Let S*_m := C*_m ∩ (C*_1 ∪ ··· ∪ C*_{m−1}) for m = 2, ..., M be the minimal vertex separators of G*. We propose the following algorithm to compute ((K^{t-1})^{-1}_{CC})^{-1} for each maximal clique C ∈ C.

Algorithm 1 (Computing ((K^{t-1})^{-1}_{CC})^{-1})

Step 0: m ← M and K* ← K^{t-1}.

Step 1: If m ≠ 1, select a simplicial vertex δ ∈ C*_m of G*. If m = 1, select a vertex δ ∉ C. Let Q = C*_m \ {δ}.

Step 2: Update K*_{QQ} by

  K*_{QQ} ← K*_{QQ} − (k*_{δδ})^{-1} K*_{Qδ} K*_{δQ}.   (7)

Step 3: Update C*_m, G* and ∆ as follows:

  C*_m ← Q,   G* ← G*(∆ \ {δ}),   ∆ ← ∆ \ {δ}.

If C*_m = S*_m, set m ← m − 1. If C*_m = C, return K*_{CC}; otherwise, go to Step 1.

Now we state the main theorem of this paper.

Theorem 1. The output K*_{CC} of Algorithm 1 is equal to ((K^{t-1})^{-1}_{CC})^{-1}.

Proof. Let δ ∈ C*_M be a simplicial vertex of G*. Define Q := C*_M \ {δ}, Q_1 := ∆ \ {δ} and Q_2 := ∆ \ C*_M. Since adj(δ) ⊂ C*_M and K^{t-1} ∈ M^+(G), (K^{t-1})_{Q_2 δ} = 0.
Noting that Q ∪ Q_2 = Q_1, we have from (1)

  ((K^{t-1})^{-1}_{Q_1 Q_1})^{-1} = (K^{t-1})_{Q_1 Q_1} − ((k^{t-1})_{δδ})^{-1} (K^{t-1})_{Q_1 δ} (K^{t-1})_{δ Q_1},

where (k^{t-1})_{δδ} is the (δ, δ) element of K^{t-1}. Since (K^{t-1})_{Q_2 δ} = 0, the subtracted term vanishes outside the Q × Q block, so only (K^{t-1})_{QQ} is modified, exactly as in update (7), and ((K^{t-1})^{-1}_{Q_1 Q_1})^{-1} ∈ M^+(G(Q_1)). By iterating this procedure in accordance with the perfect elimination order induced by the perfect sequence C*_1, ..., C*_M, we complete the proof.

In Algorithm 1 the triangulation G* is arbitrary. However, for every iterative step adjusting the C-marginal, we have to use a perfect sequence with C*_1 ⊃ C.

Example 2 (the five-cycle model). Consider the five-cycle model in Figure 2(i). K is expressed as

  K = [ k_11  k_12  k_13   0     0
        k_12  k_22   0    k_24   0
        k_13   0    k_33   0    k_35
         0    k_24   0    k_44  k_45
         0     0    k_35  k_45  k_55 ].

By adding the fill-in edges {2, 3} and {3, 4}, a triangulated graph G* is obtained as in Figure 2(ii). Consider the case where C = {1, 2}. Define C*_1 = {1, 2, 3}, C*_2 = {2, 3, 4} and C*_3 = {3, 4, 5}. Then the sequence C*_1, C*_2, C*_3 is perfect and it induces the perfect elimination order 5, 4, 3, 2, 1. The updates of K* in Step 2, in accordance with the perfect elimination order, are described as follows:

  K*_{34,34} ← K*_{34,34} − (k*_55)^{-1} (k*_35, k*_45)′ (k*_35, k*_45),
  K*_{23,23} ← K*_{23,23} − (k*_44)^{-1} (k*_24, k*_34)′ (k*_24, k*_34),
  K*_{12,12} ← K*_{12,12} − (k*_33)^{-1} (k*_13, k*_23)′ (k*_13, k*_23).

Then K*_{12,12} = ((K^{t-1})^{-1}_{12,12})^{-1} = ((K^{t-1})^{-1}_{CC})^{-1}.

We now analyze the computational cost of the proposed algorithm.
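The eliminations of Example 2 can also be checked numerically. The following pure-Python sketch is our illustration, not the paper's code: it runs the updates of Algorithm 1 for C = {1, 2} on one concrete K ∈ M^+(G) for the five-cycle (a diagonally dominant matrix of our choosing, hence positive definite), with a small Gauss–Jordan routine standing in for a library inverse, and verifies the claim of Theorem 1:

```python
def inv(A):
    """Gauss-Jordan inverse of a small matrix given as lists of lists."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # partial pivoting
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# A positive definite K in M+(G) for the five-cycle of Figure 2(i):
# zeros exactly at the non-edges, diagonally dominant.
K = [[4.0, 1.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 0.0, 1.0, 0.0],
     [1.0, 0.0, 4.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 1.0, 4.0]]
K0 = [row[:] for row in K]          # keep a copy for the direct check

def eliminate(K, d, live):
    """Step 2 of Algorithm 1, update (7): K*_QQ <- K*_QQ - (k*_dd)^{-1} K*_Qd K*_dQ.
    Vertices are 0-based; K[i][d] is zero for i outside the current clique,
    so looping over all remaining vertices gives the same result."""
    Q = [v for v in live if v != d]
    for i in Q:
        for j in Q:
            K[i][j] -= K[i][d] * K[d][j] / K[d][d]
    return Q

# Perfect elimination order 5, 4, 3 (0-based: 4, 3, 2) induced by the perfect
# sequence C*_1={1,2,3}, C*_2={2,3,4}, C*_3={3,4,5}; stop once C = {1,2} remains.
live = [0, 1, 2, 3, 4]
for d in (4, 3, 2):
    live = eliminate(K, d, live)
K_CC = [[K[i][j] for j in (0, 1)] for i in (0, 1)]   # output of Algorithm 1

# Theorem 1: this equals ((K^{t-1})^{-1}_CC)^{-1} computed directly.
Sigma = inv(K0)
direct = inv([[Sigma[i][j] for j in (0, 1)] for i in (0, 1)])
ok = all(abs(K_CC[a][b] - direct[a][b]) < 1e-10 for a in (0, 1) for b in (0, 1))
```

Each call to `eliminate` is exactly one rank-one (Schur complement) update, which is where the O(Σ_m |R*_m||C*_m|^2) cost analyzed below comes from.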
In Step 2, the running time of the calculation of (7) is as follows:

  • K*_1 := (k*_{δδ})^{-1} K*_{Qδ} requires |Q| divisions;
  • K*_2 := K*_1 K*_{δQ} requires |Q|^2 multiplications;
  • K*_{QQ} − K*_2 requires |Q|^2 subtractions.

Define R*_1 := C*_1 \ C and R*_m := C*_m \ S*_m for m = 2, ..., M. Then |Q| ranges over {|C*_m| − j | 1 ≤ j ≤ |R*_m|, 1 ≤ m ≤ M}. Let µ, γ and σ measure the time units required by a single multiplication, division and subtraction, respectively. Then the running time of Algorithm 1 amounts to

  (µ + σ) Σ_{m=1}^M Σ_{j=1}^{|R*_m|} (|C*_m| − j)^2 + γ Σ_{m=1}^M Σ_{j=1}^{|R*_m|} (|C*_m| − j)
  = (µ + σ) Σ_{m=1}^M { |R*_m||C*_m|^2 − |R*_m||C*_m| − |R*_m|^2|C*_m| + |R*_m|(|R*_m| + 1)(2|R*_m| + 1)/6 }
    + γ Σ_{m=1}^M { |R*_m||C*_m| − (1 + |R*_m|)|R*_m|/2 }.

Since |C*_m| ≥ |R*_m|, the computational cost of Algorithm 1 is O(Σ_{m=1}^M |R*_m||C*_m|^2). Once ((K^{t-1})^{-1}_{CC})^{-1} is obtained, a further 2σ|C|^2, that is O(|C|^2), additions and subtractions are required to compute (6). Note that we can compute (W_{CC})^{-1} once, before the IPS procedure. Hence the computational cost of the t-th iterative step amounts to O(|C|^2 + Σ_{m=1}^M |R*_m||C*_m|^2).

In the case of cycle models, |C| = 2, M = |∆| − 2, |C*_m| = 3 and |R*_m| = 1. Thus the computational cost is O(|∆|). As mentioned in the previous section, the direct computation of ((K^{t-1})^{-1}_{CC})^{-1} requires O(|∆ \ C|^3) = O(|D|^3) time, which in the case of cycle models is O((|∆| − 2)^3). Hence we can see the efficiency of the proposed algorithm.

4 Numerical experiments for cycle models

In this section we compare the localized IPS proposed in the previous section with the direct computation of the IPS by numerical experiments. We consider the |∆|-cycle models with |∆| = 5, 10, 50, 100, 200, 300, 500, 1000. We set K = I_{|∆|}.
We generated 100 Wishart matrices W with parameter I_{|∆|} and |∆| degrees of freedom, and computed the MLE K̂ for the |∆|-cycle models both by the proposed algorithm and by direct computation of the IPS. We set the initial estimate K^0 := I_{|∆|}. As a convergence criterion, we used Σ_{i,j} |k^t_ij − k^{t-1}_ij| ≤ 10^{-6}. The computation was done on a machine with an Intel Core 2 Duo 3.0 GHz CPU, using the R language.

Table 1 presents the average CPU time per iterative step to update K^{t-1} in (6) for both algorithms. We can see the competitive performance of the proposed algorithm for |∆| = 5 and |∆| ≥ 200. However, the direct computation is faster than the proposed one for |∆| = 10, 50, 100. In the update procedure of the direct computation (6), the computation of ((K^{t-1})_{DD})^{-1} is the most expensive part, and in theory it requires O(|D|^3) time. Table 2 shows the average CPU time for computing a |∆| × |∆| inverse matrix in the R language on the same machine. As seen from the table, while the CPU time for computing the inverse of a matrix increases at a rate of nearly O(|∆|^3) for |∆| > 100, it increases much more slowly for relatively small |∆|. On the other hand, we can see from Table 1 that the CPU time of the proposed algorithm increases almost linearly in |∆|, which agrees with the theoretical result of the previous section. This is why the proposed algorithm is slower than the direct computation for relatively small |∆|. When |∆| ≥ 200, however, the cost of computing ((K^{t-1})_{DD})^{-1} is no longer negligible and the proposed algorithm shows a considerable reduction in computational time. In practice, the performance for large models is the more crucial; in this sense the proposed algorithm can be considered efficient.
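The experiment can be reproduced in miniature. The pure-Python sketch below is our illustration rather than the paper's R code: it runs the straightforward IPS (Algorithm 0) on the five-cycle model with n = 1 and one fixed positive definite W (our choice, not Wishart-sampled), and checks that the likelihood equations (2), here Σ_CC = W_CC for every maximal clique, hold at the fixed point:

```python
def inv(A):
    """Gauss-Jordan inverse of a small matrix given as lists of lists."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def submat(A, rows, cols):
    return [[A[i][j] for j in cols] for i in rows]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Maximal cliques (= edges) of the five-cycle of Example 2, 0-based:
# {1,2}, {1,3}, {2,4}, {3,5}, {4,5} in the paper's labels.
cliques = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 4)]
W = [[min(i, j) + 1.0 for j in range(5)] for i in range(5)]   # fixed SPD W, n = 1

K = [[float(i == j) for j in range(5)] for i in range(5)]     # K^0 = I in M+(G)
for sweep in range(500):
    change = 0.0
    for C in cliques:
        D = [v for v in range(5) if v not in C]
        # Update (5) with n = 1: K_CC <- W_CC^{-1} + K_CD K_DD^{-1} K_DC.
        WCCinv = inv(submat(W, C, C))
        fit = matmul(matmul(submat(K, C, D), inv(submat(K, D, D))), submat(K, D, C))
        for a, i in enumerate(C):
            for b, j in enumerate(C):
                new = WCCinv[a][b] + fit[a][b]
                change = max(change, abs(new - K[i][j]))
                K[i][j] = new
    if change < 1e-12:
        break

Sigma = inv(K)
# Likelihood equations (2) at the fixed point: Sigma_CC = W_CC (n = 1).
err = max(abs(Sigma[i][j] - W[i][j]) for C in cliques for i in C for j in C)
```

The entries of K outside the edge set are never touched, so every iterate stays in M^+(G), matching the remark after Algorithm 0; only the timing comparison, which needs large |∆| and R's optimized `solve`, is beyond this sketch.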
Table 1: Average CPU time per iterative step for |∆|-cycle models (unit: 10^-2 CPU seconds)

   |∆|    Algorithm 1   direct computation
     5        1.590          2.261
    10        3.442          2.523
    50       18.63           5.482
   100       38.11          17.48
   200       78.87         104.91
   300      120.03         361.01
   500      225.94        1292.1
  1000      511.60        6625.8

Table 2: Average CPU time for calculating a |∆| × |∆| inverse matrix (unit: 10^-2 CPU seconds)

   |∆|    CPU time
     5       0.027
    10       0.031
    50       0.058
   100       0.286
   200       1.789
   300       5.602
   500      24.666
  1000     207.66

5 Concluding remarks

In this article we discussed localization to reduce the computational burden of the IPS in two ways. We first showed that the decomposition of the graph into mp-subgraphs can localize the IPS. Next we proposed a localized algorithm for the iterative step of the IPS, using the structure of a chordal extension of the graphical model for each mp-subgraph. The proposed algorithm costs O(|∆|) in the case of cycle models, and some numerical experiments confirmed the theory for large models.

As mentioned in Section 1, the implementation of the IPS requires enumeration of all maximal cliques of the graph, and this enumeration has exponential complexity. In addition, the proposed algorithm also requires some characteristics of the graph: a chordal extension, and perfect sequences and perfect elimination orders of the chordal extension. In this sense, the application of the IPS may be limited. However, when the structure of the model is simple or sparse, it may be feasible to obtain these characteristics. In such cases, the proposed algorithm is considered to be effective.

Acknowledgment

The authors are grateful to two anonymous referees for constructive comments and suggestions which have led to improvements in the presentation of the paper.

References

[1] J. H. Badsberg and F. M. Malvestuto. An implementation of the iterative proportional fitting procedure by propagation trees. Comput. Statist. Data Anal., 37:297–322, 2001.

[2] D. R. Cox and N. Wermuth. Multivariate Dependencies. Chapman and Hall, London, 1996.

[3] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. Ann. Probab., 3:146–158, 1975.

[4] J. Dahl, L. Vandenberghe, and V. Roychowdhury. Covariance selection for non-chordal graphs via chordal embedding, 2006. To appear in Optimization Methods and Software.

[5] W. E. Deming and F. F. Stephan. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Statist., 11:427–444, 1940.

[6] A. P. Dempster. Covariance selection. Biometrics, 28:157–175, 1972.

[7] G. A. Dirac. On rigid circuit graphs. Abh. Math. Sem. Univ. Hamburg, 25:71–76, 1961.

[8] A. Dobra, C. Hans, B. Jones, J. R. Nevins, G. Yao, and M. West. Sparse graphical models for exploring gene expression data. J. Multivariate Anal., 90:196–212, 2004.

[9] M. Drton and T. S. Richardson. Graphical methods for efficient likelihood inference in Gaussian covariance models. arXiv:0708.1321, 2007.

[10] D. M. Edwards. Introduction to Graphical Modelling. Springer, New York, 2000.

[11] S. E. Fienberg. An iterative procedure for estimation in contingency tables. Ann. Math. Statist., 41:907–917, 1970.

[12] M. Fukuda, M. Kojima, K. Murota, and K. Nakata. Exploiting sparsity in semidefinite programming via matrix completion I: General framework. SIAM J. Optim., 11:647–674, 2000.

[13] C. T. Ireland and S. Kullback. Contingency tables with given marginals. Biometrika, 55:179–188, 1968.

[14] R. Jiroušek. Solution of the marginal problem and decomposable solutions. Kybernetika, 27:403–412, 1991.

[15] R. Jiroušek and S. Přeučil. On the effective implementation of the iterative proportional fitting procedure. Comput. Statist. Data Anal., 19:177–189, 1995.

[16] S. L. Lauritzen. Graphical Models. Oxford University Press, Oxford, 1996.

[17] H. G. Leimer. Optimal decomposition by clique separators. Discrete Math., 113:99–123, 1993.

[18] H. Li and J. Gui. Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics, 7:302–317, 2006.

[19] F. M. Malvestuto and M. Moscarini. Decomposition of a hypergraph by partial-edge separators. Theoret. Comput. Sci., 237:57–79, 2000.

[20] D. J. Rose. A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations. In R. C. Read, editor, Graph Theory and Computing, pages 183–217. Academic Press, New York, 1971.

[21] T. P. Speed and H. T. Kiiveri. Gaussian Markov distributions over finite graphs. Ann. Statist., 14:138–150, 1986.

[22] J. Whittaker. Graphical Models in Applied Multivariate Statistics. John Wiley and Sons, Chichester, 1990.