Error AMP Chain Graphs
Any regular Gaussian probability distribution that can be represented by an AMP chain graph (CG) can be expressed as a system of linear equations with correlated errors whose structure depends on the CG. However, the CG represents the errors implicit…
Authors: Jose M. Peña
ERROR AMP CHAIN GRAPHS

JOSE M. PEÑA
ADIT, IDA, LINKÖPING UNIVERSITY, SE-58183 LINKÖPING, SWEDEN
JOSE.M.PENA@LIU.SE

Abstract. Any regular Gaussian probability distribution that can be represented by an AMP chain graph (CG) can be expressed as a system of linear equations with correlated errors whose structure depends on the CG. However, the CG represents the errors implicitly, as no nodes in the CG correspond to the errors. We propose in this paper to add some deterministic nodes to the CG in order to represent the errors explicitly. We call the result an EAMP CG. We will show that, as desired, every AMP CG is Markov equivalent to its corresponding EAMP CG under marginalization of the error nodes. We will also show that every EAMP CG under marginalization of the error nodes is Markov equivalent to some LWF CG under marginalization of the error nodes, and that the latter is Markov equivalent to some directed and acyclic graph (DAG) under marginalization of the error nodes and conditioning on some selection nodes. This is important because it implies that the independence model represented by an AMP CG can be accounted for by some data generating process that is partially observed and has selection bias. Finally, we will show that EAMP CGs are closed under marginalization. This is a desirable feature because it guarantees parsimonious models under marginalization.

1. Introduction

Chain graphs (CGs) are graphs with possibly directed and undirected edges, and no semidirected cycle. They have been extensively studied as a formalism to represent independence models. CGs extend Markov networks, i.e. undirected graphs, and Bayesian networks, i.e. directed and acyclic graphs (DAGs). Therefore, they can model symmetric and asymmetric relationships between the random variables of interest, which is one of the reasons for their popularity.
However, unlike Markov and Bayesian networks whose interpretation is unique, there are four different interpretations of CGs as independence models (Cox and Wermuth, 1993, 1996; Drton, 2009; Sonntag and Peña, 2013). In this paper, we are interested in the AMP interpretation (Andersson et al., 2001; Levitz et al., 2001) and the LWF interpretation (Frydenberg, 1990; Lauritzen and Wermuth, 1989).

Any regular Gaussian probability distribution that can be represented by an AMP CG can be expressed as a system of linear equations with correlated errors whose structure depends on the CG (Andersson et al., 2001, Section 5). However, the CG represents the errors implicitly, as no nodes in the CG correspond to the errors. We propose in this paper to add some deterministic nodes to the CG in order to represent the errors explicitly. We call the result an EAMP CG. We will show that, as desired, every AMP CG is Markov equivalent to its corresponding EAMP CG under marginalization of the error nodes, i.e. the independence model represented by the former coincides with the independence model represented by the latter. We will also show that every EAMP CG under marginalization of the error nodes is Markov equivalent to some LWF CG under marginalization of the error nodes, and that the latter is Markov equivalent to some DAG under marginalization of the error nodes and conditioning on some selection nodes. The relevance of this result can be best explained by extending to AMP CGs what Koster (2002, p. 838) stated for summary graphs and Richardson and Spirtes (2002, p. 981) stated for ancestral graphs: The fact that an AMP CG has a DAG as departure point implies that the independence model associated with the former can be accounted for by some data generating process that is partially observed (corresponding to marginalization) and has selection bias (corresponding to conditioning).

Date: 02:14, 26/06/21.

Finally, we will show that EAMP CGs are closed under marginalization, in the sense that every EAMP CG under marginalization of any superset of the error nodes is Markov equivalent to some EAMP CG under marginalization of the error nodes.[1] The relevance of this result can be best appreciated by noting that AMP CGs are not closed under marginalization (Richardson and Spirtes, 2002, Section 9.4). Therefore, the independence model represented by an AMP CG under marginalization may not be representable by any AMP CG. Therefore, we may have to represent it by an AMP CG with extra edges so as to avoid representing false independencies. However, if we consider the EAMP CG corresponding to the original AMP CG, then we will show that the marginal independence model can be represented by some EAMP CG under marginalization of the error nodes. The latter case is of course preferred, because the graphical model is more parsimonious as it does not include extra edges. See also Richardson and Spirtes (2002, p. 965) for a discussion on the importance of the class of models considered being closed under marginalization.

It is worth mentioning that Andersson et al. (2001, Theorem 6) have identified the conditions under which an AMP CG is Markov equivalent to some LWF CG.[2] It is clear from these conditions that there are AMP CGs that are not Markov equivalent to any LWF CG. The results in this paper differ from those by Andersson et al. (2001, Theorem 6), because we show that every AMP CG is Markov equivalent to some LWF CG with error nodes under marginalization of the error nodes.

It is also worth mentioning that Richardson and Spirtes (2002, p. 1025) show that there are AMP CGs that are not Markov equivalent to any DAG under marginalization and conditioning. However, the results in this paper show that every AMP CG is Markov equivalent to some DAG with error and selection nodes under marginalization of the error nodes and conditioning of the selection nodes. Therefore, the independence model represented by any AMP CG has indeed some DAG as departure point and, thus, it can be accounted for by some data generating process. The results in this paper do not contradict those by Richardson and Spirtes (2002, p. 1025), because they did not consider deterministic nodes while we do (recall that the error nodes are deterministic).

Finally, it is also worth mentioning that EAMP CGs are not the first graphical models to have DAGs as departure point or to be closed under marginalization. Specifically, summary graphs (Cox and Wermuth, 1996), MC graphs (Koster, 2002), ancestral graphs (Richardson and Spirtes, 2002), and ribbonless graphs (Sadeghi, 2013) predate EAMP CGs and have the mentioned properties. However, none of these other classes of graphical models subsumes AMP CGs, i.e. there are independence models that can be represented by an AMP CG but not by any member of the other class (Sadeghi and Lauritzen, 2012, Section 4). Therefore, none of these other classes of graphical models subsumes EAMP CGs under marginalization of the error nodes. This justifies the present study.

The rest of the paper is organized as follows. We start by reviewing some concepts in Section 2. We discuss in Section 3 the semantics of deterministic nodes in the context of AMP and LWF CGs. In Section 4, we introduce EAMP CGs and use them to show that every AMP CG is Markov equivalent to some LWF CG under marginalization.
In that section we also show that every AMP CG is Markov equivalent to some DAG under marginalization and conditioning. In Section 5, we show that EAMP CGs are closed under marginalization. Finally, we close with some conclusions in Section 6.

[1] Our definition of closed under marginalization is an adaptation of the standard one to the fact that we only care about independence models under marginalization of the error nodes.

[2] To be exact, Andersson et al. (2001, Theorem 6) have identified the conditions under which all and only the probability distributions that can be represented by an AMP CG can also be represented by some LWF CG. However, for any AMP or LWF CG $G$, there are Gaussian probability distributions that have all and only the independencies in the independence model represented by $G$, as shown by Levitz et al. (2001, Theorem 6.1) and Peña (2011, Theorems 1 and 2). Then, our formulation is equivalent to the original formulation of the result by Andersson et al. (2001, Theorem 6).

2. Preliminaries

In this section, we review some concepts from graphical models that are used later in this paper. All the graphs and probability distributions in this paper are defined over a finite set $V$ unless otherwise stated. The elements of $V$ are not distinguished from singletons. The operators set union and set difference are given equal precedence in the expressions. The term maximal is always with respect to set inclusion. All the graphs in this paper are simple, i.e. they contain at most one edge between any pair of nodes. Moreover, the edge is undirected or directed.

If a graph $G$ contains an undirected or directed edge between two nodes $V_1$ and $V_2$, then we say that $V_1 - V_2$ or $V_1 \to V_2$ is in $G$. The parents of a set of nodes $X$ of $G$ is the set $pa_G(X) = \{V_1 \mid V_1 \to V_2 \text{ is in } G, V_1 \notin X \text{ and } V_2 \in X\}$.
A route between a node $V_1$ and a node $V_n$ in $G$ is a sequence of (not necessarily distinct) nodes $V_1, \ldots, V_n$ such that $V_i - V_{i+1}$, $V_i \to V_{i+1}$ or $V_i \leftarrow V_{i+1}$ is in $G$ for all $1 \leq i < n$. If the nodes in the route are all distinct, then the route is called a path. A route is called undirected if $V_i - V_{i+1}$ is in $G$ for all $1 \leq i < n$. A route is called strictly descending if $V_i \to V_{i+1}$ is in $G$ for all $1 \leq i < n$. The strict ascendants of $X$ is the set $san_G(X) = \{V_1 \mid \text{there is a strictly descending route from } V_1 \text{ to } V_n \text{ in } G, V_1 \notin X \text{ and } V_n \in X\}$. A route $V_1, \ldots, V_n$ in $G$ is called a cycle if $V_n = V_1$. Moreover, it is called a semidirected cycle if $V_1 \to V_2$ is in $G$ and $V_i \to V_{i+1}$ or $V_i - V_{i+1}$ is in $G$ for all $1 < i < n$. A chain graph (CG) is a graph with no semidirected cycles. A set of nodes of a graph is connected if there exists an undirected path in the graph between every pair of nodes in the set. A connectivity component of a CG is a maximal connected set.

We now recall the semantics of AMP and LWF CGs. A node $B$ in a path $\rho$ in an AMP CG $G$ is called a triplex node in $\rho$ if $A \to B \leftarrow C$, $A \to B - C$, or $A - B \leftarrow C$ is a subpath of $\rho$. Moreover, $\rho$ is said to be $Z$-open with $Z \subseteq V$ when
● every triplex node in $\rho$ is in $Z \cup san_G(Z)$, and
● no non-triplex node $B$ in $\rho$ is in $Z$, unless $A - B - C$ is a subpath of $\rho$ and some node in $pa_G(B)$ is not in $Z$.

A section of a route $\rho$ in a CG is a maximal undirected subroute of $\rho$. A section $V_2 - \ldots - V_{n-1}$ of $\rho$ is a collider section of $\rho$ if $V_1 \to V_2 - \ldots - V_{n-1} \leftarrow V_n$ is a subroute of $\rho$. A route $\rho$ in a CG is said to be $Z$-open when
● every collider section of $\rho$ has a node in $Z$, and
● no non-collider section of $\rho$ has a node in $Z$.

Let $X$, $Y$ and $Z$ denote three disjoint subsets of $V$.
When there is no $Z$-open path (respectively route) in an AMP (respectively LWF) CG $G$ between a node in $X$ and a node in $Y$, we say that $X$ is separated from $Y$ given $Z$ in $G$ and denote it as $X \perp_G Y \mid Z$. The independence model represented by $G$, denoted as $I_{AMP}(G)$ or $I_{LWF}(G)$, is the set of separations $X \perp_G Y \mid Z$. In general, $I_{AMP}(G) \neq I_{LWF}(G)$. However, if $G$ is a directed and acyclic graph (DAG), then $I_{AMP}(G) = I_{LWF}(G)$. Given an AMP or LWF CG $G$ and two disjoint subsets $L$ and $S$ of $V$, we denote by $[I(G)]^S_L$ the independence model represented by $G$ under marginalization of the nodes in $L$ and conditioning on the nodes in $S$. Specifically, $X \perp_G Y \mid Z$ is in $[I(G)]^S_L$ iff $X \perp_G Y \mid Z \cup S$ is in $I(G)$ and $X, Y, Z \subseteq V \setminus L \setminus S$.

Finally, we denote by $X \perp_p Y \mid Z$ that $X$ is independent of $Y$ given $Z$ in a probability distribution $p$. We say that $p$ is Markovian with respect to an AMP or LWF CG $G$ when $X \perp_p Y \mid Z$ if $X \perp_G Y \mid Z$ for all $X$, $Y$ and $Z$ disjoint subsets of $V$. We say that $p$ is faithful to $G$ when $X \perp_p Y \mid Z$ iff $X \perp_G Y \mid Z$ for all $X$, $Y$ and $Z$ disjoint subsets of $V$.

3. AMP and LWF CGs with Deterministic Nodes

We say that a node $A$ of an AMP or LWF CG is determined by some $Z \subseteq V$ when $A \in Z$ or $A$ is a function of $Z$. In that case, we also say that $A$ is a deterministic node. We use $D(Z)$ to denote all the nodes that are determined by $Z$. From the point of view of the separations in an AMP or LWF CG, that a node is determined by but is not in the conditioning set of a separation has the same effect as if the node were actually in the conditioning set. We extend the definitions of separation for AMP and LWF CGs to the case where deterministic nodes may exist.
Given an AMP CG $G$, a path $\rho$ in $G$ is said to be $Z$-open when
● every triplex node in $\rho$ is in $D(Z) \cup san_G(D(Z))$, and
● no non-triplex node $B$ in $\rho$ is in $D(Z)$, unless $A - B - C$ is a subpath of $\rho$ and some node in $pa_G(B)$ is not in $D(Z)$.

Given an LWF CG $G$, a route $\rho$ in $G$ is said to be $Z$-open when
● every collider section of $\rho$ has a node in $D(Z)$, and
● no non-collider section of $\rho$ has a node in $D(Z)$.

It should be noted that we are not the first to consider graphical models with deterministic nodes. For instance, Geiger et al. (1990, Section 4) consider DAGs with deterministic nodes. However, our definition of deterministic node is more general than theirs.

4. From AMP CGs to DAGs Via EAMP CGs

Andersson et al. (2001, Section 5) show that any regular Gaussian probability distribution $p$ that is Markovian with respect to an AMP CG $G$ can be expressed as a system of linear equations with correlated errors whose structure depends on $G$. Specifically, assume without loss of generality that $p$ has mean 0. Let $K_i$ denote any connectivity component of $G$. Let $\Omega^i_{K_i,K_i}$ and $\Omega^i_{K_i,pa_G(K_i)}$ denote submatrices of the precision matrix $\Omega^i$ of $p(K_i, pa_G(K_i))$. Then, as shown by Bishop (2006, Section 2.3.1),

$K_i \mid pa_G(K_i) \sim N(\beta^i pa_G(K_i), \Lambda^i)$

where

$\beta^i = -(\Omega^i_{K_i,K_i})^{-1} \Omega^i_{K_i,pa_G(K_i)}$

and

$(\Lambda^i)^{-1} = \Omega^i_{K_i,K_i}$.

Then, $p$ can be expressed as a system of linear equations with normally distributed errors whose structure depends on $G$ as follows:

$K_i = \beta^i pa_G(K_i) + \epsilon^i$

where

$\epsilon^i \sim N(0, \Lambda^i)$.

Note that for all $A, B \in K_i$ such that $A - B$ is not in $G$, $A \perp_G B \mid pa_G(K_i) \cup K_i \setminus A \setminus B$ and thus $(\Lambda^i)^{-1}_{A,B} = 0$ (Lauritzen, 1996, Proposition 5.2). Note also that for all $A \in K_i$ and $B \in pa_G(K_i)$ such that $A \leftarrow B$ is not in $G$, $A \perp_G B \mid pa_G(A)$ and thus $(\beta^i)_{A,B} = 0$. Let $\beta_A$ contain the nonzero elements of the vector $(\beta^i)_{A,\bullet}$.
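As a rough numerical illustration of the expressions for $\beta^i$ and $\Lambda^i$ above, the following sketch works through the smallest possible case: a connectivity component with a single node A that has a single parent B. The covariance matrix `sigma`, the helper `inverse_2x2` and all variable names are assumptions made for this example and do not come from the paper.

```python
# Minimal sketch: one connectivity component K = {A} with a single parent B.
# sigma is a hypothetical covariance matrix of (A, B); it is not taken from the paper.


def inverse_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


sigma = [[2.0, 1.0],   # var(A),    cov(A, B)
         [1.0, 1.0]]   # cov(B, A), var(B)
omega = inverse_2x2(sigma)      # precision matrix of p(K, pa(K))

omega_kk = omega[0][0]          # block Omega_{K,K} (a scalar here)
omega_kpa = omega[0][1]         # block Omega_{K,pa(K)} (a scalar here)

beta = -omega_kpa / omega_kk    # beta = -(Omega_{K,K})^{-1} Omega_{K,pa(K)}
lam = 1.0 / omega_kk            # Lambda = (Omega_{K,K})^{-1}

# Cross-check against the usual regression formulas on the covariance scale:
# slope of A on B, and residual variance of A given B.
beta_check = sigma[0][1] / sigma[1][1]
lam_check = sigma[0][0] - sigma[0][1] ** 2 / sigma[1][1]
```

Under these assumptions, `beta` and `lam` agree with `beta_check` and `lam_check`, i.e. the precision-matrix expressions reproduce the regression coefficient and the error variance in the equation A = beta * B + error.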
Then, $p$ can be expressed as a system of linear equations with correlated errors whose structure depends on $G$ as follows. For any $A \in K_i$,

$A = \beta_A pa_G(A) + \epsilon_A$

and for any other $B \in K_i$,

$covariance(\epsilon_A, \epsilon_B) = \Lambda^i_{A,B}$.

It is worth mentioning that the mapping above between probability distributions and systems of linear equations is bijective (Andersson et al., 2001, Section 5). Note that no nodes in $G$ correspond to the errors $\epsilon_A$. Therefore, $G$ represents the errors implicitly. We propose to represent them explicitly. This can easily be done by transforming $G$ into what we call an EAMP CG $G'$ as follows:

1 Let $G' = G$
2 For each node $A$ in $G$
3     Add the node $\epsilon_A$ to $G'$
4     Add the edge $\epsilon_A \to A$ to $G'$
5 For each edge $A - B$ in $G$
6     Add the edge $\epsilon_A - \epsilon_B$ to $G'$
7     Remove the edge $A - B$ from $G'$

The transformation above basically consists in adding the error nodes $\epsilon_A$ to $G$ and connecting them appropriately. Figure 1 shows an example.

[Figure 1 panels: $G$ over the nodes $A, \ldots, F$; $G'$ with the additional error nodes $\epsilon_A, \ldots, \epsilon_F$; $G''$ with the additional selection nodes $S_{\epsilon_C \epsilon_D}$, $S_{\epsilon_C \epsilon_E}$, $S_{\epsilon_D \epsilon_F}$ and $S_{\epsilon_E \epsilon_F}$; $[G']_{\{A,B,F\}}$ over the nodes $C$, $D$, $E$ and the error nodes.]
Figure 1. Example of the different transformations.

Note that every node $A \in V$ is determined by $pa_{G'}(A)$ and, what is more important in this paper, that $\epsilon_A$ is determined by $pa_{G'}(A) \setminus \epsilon_A \cup A$. Note also that, given $Z \subseteq V$, a node $A \in V$ is determined by $Z$ iff $A \in Z$. The if part is trivial. To see the only if part, note that $\epsilon_A \notin Z$ and thus $A$ cannot be determined by $Z$ unless $A \in Z$. Therefore, a node $\epsilon_A$ in $G'$ is determined by $Z$ iff $pa_{G'}(A) \setminus \epsilon_A \cup A \subseteq Z$ because, as shown, there is no other way for $Z$ to determine $pa_{G'}(A) \setminus \epsilon_A \cup A$ which, in turn, determine $\epsilon_A$. Let $\epsilon$ denote all the error nodes in $G'$. It is easy to see that $G'$ is an AMP CG over $V \cup \epsilon$ and, thus, its semantics are defined. The following theorem confirms that these semantics are as desired.
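Before the theorem is stated, the transformation from G to G′ described above, together with the characterization of the nodes determined by a set Z ⊆ V, can be sketched in Python as follows. The graph encoding (directed edges as `(parent, child)` pairs, undirected edges as `frozenset` pairs), the naming of the error node of A as the tuple `("eps", A)`, the function names and the small example graph A → B − C are all assumptions made for illustration; none of them comes from the paper.

```python
def to_eamp(nodes, directed, undirected):
    """Build the EAMP CG G' from an AMP CG G.

    directed is a set of (parent, child) pairs and undirected is a set of
    frozenset({a, b}) pairs. The error node of A is the tuple ("eps", A),
    which is a naming convention assumed for this sketch.
    """
    eps = {a: ("eps", a) for a in nodes}
    new_nodes = set(nodes) | set(eps.values())
    # Keep the directed edges of G and add eps_A -> A for every node A.
    new_directed = set(directed) | {(eps[a], a) for a in nodes}
    # Replace every undirected edge A - B with eps_A - eps_B.
    new_undirected = {frozenset(eps[a] for a in edge) for edge in undirected}
    return new_nodes, new_directed, new_undirected


def parents(node, directed):
    """Return the parents of a single node."""
    return {p for (p, c) in directed if c == node}


def determined(z, nodes, directed_eamp):
    """Return D(Z) for Z a subset of V in the EAMP CG.

    D(Z) contains Z itself plus every error node eps_A such that A and all
    the parents of A other than eps_A are in Z.
    """
    z = set(z)
    result = set(z)
    for a in nodes:
        eps_a = ("eps", a)
        if a in z and parents(a, directed_eamp) - {eps_a} <= z:
            result.add(eps_a)
    return result


# Hypothetical AMP CG: A -> B - C.
nodes = {"A", "B", "C"}
directed = {("A", "B")}
undirected = {frozenset({"B", "C"})}

nodes_p, directed_p, undirected_p = to_eamp(nodes, directed, undirected)
d_ab = determined({"A", "B"}, nodes, directed_p)  # adds eps_A and eps_B
d_b = determined({"B"}, nodes, directed_p)        # adds nothing: parent A is missing
```

In this example, conditioning on {A, B} determines both ε_A and ε_B, whereas conditioning on {B} alone determines no error node because the parent A of B is not in the conditioning set.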
Theorem 1. $I_{AMP}(G) = [I_{AMP}(G')]^{\emptyset}_{\epsilon}$.

Proof. It suffices to show that every $Z$-open path between $\alpha$ and $\beta$ in $G$ can be transformed into a $Z$-open path between $\alpha$ and $\beta$ in $G'$ and vice versa, with $\alpha, \beta \in V$ and $Z \subseteq V \setminus \alpha \setminus \beta$.

Let $\rho$ denote a $Z$-open path between $\alpha$ and $\beta$ in $G$. We can easily transform $\rho$ into a path $\rho'$ between $\alpha$ and $\beta$ in $G'$: Simply, replace every maximal subpath of $\rho$ of the form $V_1 - V_2 - \ldots - V_{n-1} - V_n$ ($n \geq 2$) with $V_1 \leftarrow \epsilon_{V_1} - \epsilon_{V_2} - \ldots - \epsilon_{V_{n-1}} - \epsilon_{V_n} \to V_n$. We now show that $\rho'$ is $Z$-open.

First, if $B \in V$ is a triplex node in $\rho'$, then $\rho'$ must have one of the following subpaths:

(1) $A \to B \leftarrow C$
(2) $A \to B \leftarrow \epsilon_B - \epsilon_C$
(3) $\epsilon_A - \epsilon_B \to B \leftarrow C$

with $A, C \in V$. Therefore, $\rho$ must have one of the following subpaths (specifically, if $\rho'$ has the $i$-th subpath above, then $\rho$ has the $i$-th subpath below):

(1) $A \to B \leftarrow C$
(2) $A \to B - C$
(3) $A - B \leftarrow C$

In each case, $B$ is a triplex node in $\rho$ and, thus, $B \in Z \cup san_G(Z)$ for $\rho$ to be $Z$-open. Then, $B \in Z \cup san_{G'}(Z)$ by construction of $G'$ and, thus, $B \in D(Z) \cup san_{G'}(D(Z))$.

Second, if $B \in V$ is a non-triplex node in $\rho'$, then $\rho'$ must have one of the following subpaths:

(1) $A \to B \to C$
(2) $A \leftarrow B \leftarrow C$
(3) $A \leftarrow B \to C$
(4) $A \leftarrow B \leftarrow \epsilon_B - \epsilon_C$
(5) $\epsilon_A - \epsilon_B \to B \to C$

with $A, C \in V$. Therefore, $\rho$ must have one of the following subpaths (specifically, if $\rho'$ has the $i$-th subpath above, then $\rho$ has the $i$-th subpath below):

(1) $A \to B \to C$
(2) $A \leftarrow B \leftarrow C$
(3) $A \leftarrow B \to C$
(4) $A \leftarrow B - C$
(5) $A - B \to C$

In each case, $B$ is a non-triplex node in $\rho$ and, thus, $B \notin Z$ for $\rho$ to be $Z$-open. Since $Z$ contains no error node, $Z$ cannot determine any node in $V$ that is not already in $Z$. Then, $B \notin D(Z)$.

Third, if $\epsilon_B$ is a non-triplex node in $\rho'$ (note that $\epsilon_B$ cannot be a triplex node in $\rho'$), then $\rho'$ must have one of the following subpaths:

(1) $A \to B \leftarrow \epsilon_B - \epsilon_C$
(2) $\epsilon_A - \epsilon_B \to B \leftarrow C$
(3) $\alpha = B \leftarrow \epsilon_B - \epsilon_C$
(4) $\epsilon_A - \epsilon_B \to B = \beta$
(5) $A \leftarrow B \leftarrow \epsilon_B - \epsilon_C$
(6) $\epsilon_A - \epsilon_B \to B \to C$
(7) $\epsilon_A - \epsilon_B - \epsilon_C$

with $A, C \in V$. Recall that $\epsilon_B \notin Z$ because $Z \subseteq V \setminus \alpha \setminus \beta$.
In the first case, if $\alpha = A$ then $A \notin Z$ because $Z \subseteq V \setminus \alpha \setminus \beta$; otherwise $A$ is a non-triplex node in $\rho$ and, thus, $A \notin Z$ for $\rho$ to be $Z$-open. Then, $\epsilon_B \notin D(Z)$. In the second case, if $\beta = C$ then $C \notin Z$; otherwise $C \notin Z$ for $\rho$ to be $Z$-open. Then, $\epsilon_B \notin D(Z)$. In the third and fourth cases, $B \notin Z$ because $\alpha = B$ or $\beta = B$. Then, $\epsilon_B \notin D(Z)$. In the fifth and sixth cases, $B \notin Z$ for $\rho$ to be $Z$-open. Then, $\epsilon_B \notin D(Z)$. The last case implies that $\rho$ has the following subpath:

$A - B - C$

Thus, $B$ is a non-triplex node in $\rho$, which implies that $B \notin Z$ or $pa_G(B) \setminus Z \neq \emptyset$ for $\rho$ to be $Z$-open. In either case, $\epsilon_B \notin D(Z)$ (recall that $pa_{G'}(B) = pa_G(B) \cup \epsilon_B$ by construction of $G'$).

Finally, let $\rho'$ denote a $Z$-open path between $\alpha$ and $\beta$ in $G'$. We can easily transform $\rho'$ into a path $\rho$ between $\alpha$ and $\beta$ in $G$: Simply, replace every maximal subpath of $\rho'$ of the form $V_1 \leftarrow \epsilon_{V_1} - \epsilon_{V_2} - \ldots - \epsilon_{V_{n-1}} - \epsilon_{V_n} \to V_n$ ($n \geq 2$) with $V_1 - V_2 - \ldots - V_{n-1} - V_n$. We now show that $\rho$ is $Z$-open.

First, note that all the nodes in $\rho$ are in $V$. Moreover, if $B$ is a triplex node in $\rho$, then $\rho$ must have one of the following subpaths:

(1) $A \to B \leftarrow C$
(2) $A \to B - C$
(3) $A - B \leftarrow C$

with $A, C \in V$. Therefore, $\rho'$ must have one of the following subpaths (specifically, if $\rho$ has the $i$-th subpath above, then $\rho'$ has the $i$-th subpath below):

(1) $A \to B \leftarrow C$
(2) $A \to B \leftarrow \epsilon_B - \epsilon_C$
(3) $\epsilon_A - \epsilon_B \to B \leftarrow C$

In each case, $B$ is a triplex node in $\rho'$ and, thus, $B \in D(Z) \cup san_{G'}(D(Z))$ for $\rho'$ to be $Z$-open. Since $Z$ contains no error node, $Z$ cannot determine any node in $V$ that is not already in $Z$. Then, $B \in D(Z)$ iff $B \in Z$. Since there is no strictly descending route from $B$ to any error node, then any strictly descending route from $B$ to a node $D \in D(Z)$ implies that $D \in V$ which, as seen, implies that $D \in Z$. Then, $B \in san_{G'}(D(Z))$ iff $B \in san_{G'}(Z)$. Moreover, $B \in san_{G'}(Z)$ iff $B \in san_G(Z)$ by construction of $G'$. These results together imply that $B \in Z \cup san_G(Z)$.
Second, if $B$ is a non-triplex node in $\rho$, then $\rho$ must have one of the following subpaths:

(1) $A \to B \to C$
(2) $A \leftarrow B \leftarrow C$
(3) $A \leftarrow B \to C$
(4) $A \leftarrow B - C$
(5) $A - B \to C$
(6) $A - B - C$

with $A, C \in V$. Therefore, $\rho'$ must have one of the following subpaths (specifically, if $\rho$ has the $i$-th subpath above, then $\rho'$ has the $i$-th subpath below):

(1) $A \to B \to C$
(2) $A \leftarrow B \leftarrow C$
(3) $A \leftarrow B \to C$
(4) $A \leftarrow B \leftarrow \epsilon_B - \epsilon_C$
(5) $\epsilon_A - \epsilon_B \to B \to C$
(6) $\epsilon_A - \epsilon_B - \epsilon_C$

In the first five cases, $B$ is a non-triplex node in $\rho'$ and, thus, $B \notin D(Z)$ for $\rho'$ to be $Z$-open. Since $Z$ contains no error node, $Z$ cannot determine any node in $V$ that is not already in $Z$. Then, $B \notin Z$. In the last case, $\epsilon_B$ is a non-triplex node in $\rho'$ and, thus, $\epsilon_B \notin D(Z)$ for $\rho'$ to be $Z$-open. Then, $B \notin Z$ or $pa_{G'}(B) \setminus \epsilon_B \setminus Z \neq \emptyset$. Then, $B \notin Z$ or $pa_G(B) \setminus Z \neq \emptyset$ (recall that $pa_{G'}(B) = pa_G(B) \cup \epsilon_B$ by construction of $G'$).

Theorem 2. Assume that $G'$ has the same deterministic relationships no matter whether it is interpreted as an AMP or LWF CG. Then, $I_{AMP}(G') = I_{LWF}(G')$.

Proof. Assume for a moment that $G'$ has no deterministic node. Note that $G'$ has no induced subgraph of the form $A \to B - C$ with $A, B, C \in V \cup \epsilon$. Such an induced subgraph is called a flag by Andersson et al. (2001, pp. 40-41). They also introduce the term biflag, whose definition is irrelevant here. What is relevant here is the observation that a CG cannot have a biflag unless it has some flag. Therefore, $G'$ has no biflags. Consequently, every probability distribution that is Markovian with respect to $G'$ when interpreted as an AMP CG is also Markovian with respect to $G'$ when interpreted as an LWF CG and vice versa (Andersson et al., 2001, Corollary 1). Now, note that there are Gaussian probability distributions that are faithful to $G'$ when interpreted as an AMP CG (Levitz et al., 2001, Theorem 6.1) as well as when interpreted as an LWF CG (Peña, 2011, Theorems 1 and 2). Therefore, $I_{AMP}(G') = I_{LWF}(G')$. We denote this independence model by $I_{NDN}(G')$.
Now, forget the momentary assumption made above that $G'$ has no deterministic node. Recall that we assumed that $D(Z)$ is the same under the AMP and the LWF interpretations of $G'$ for all $Z \subseteq V \cup \epsilon$. Recall also that, from the point of view of the separations in an AMP or LWF CG, that a node is determined by the conditioning set has the same effect as if the node were in the conditioning set. Then, $X \perp_{G'} Y \mid Z$ is in $I_{AMP}(G')$ iff $X \perp_{G'} Y \mid D(Z)$ is in $I_{NDN}(G')$ iff $X \perp_{G'} Y \mid Z$ is in $I_{LWF}(G')$. Then, $I_{AMP}(G') = I_{LWF}(G')$.

The first major result of this paper is the following corollary, which shows that every AMP CG is Markov equivalent to some LWF CG under marginalization. The corollary follows from Theorems 1 and 2.

Corollary 1. $I_{AMP}(G) = [I_{LWF}(G')]^{\emptyset}_{\epsilon}$.

Now, let $G''$ denote the DAG obtained from $G'$ by replacing every edge $\epsilon_A - \epsilon_B$ in $G'$ with $\epsilon_A \to S_{\epsilon_A \epsilon_B} \leftarrow \epsilon_B$. Figure 1 shows an example. The nodes $S_{\epsilon_A \epsilon_B}$ are called selection nodes. Let $S$ denote all the selection nodes in $G''$. The following theorem relates the semantics of $G'$ and $G''$.

Theorem 3. Assume that $G'$ and $G''$ have the same deterministic relationships. Then, $I_{LWF}(G') = [I_{LWF}(G'')]^{S}_{\emptyset}$.

Proof. Assume for a moment that $G'$ has no deterministic node. Then, $G''$ has no deterministic node either. We show below that every $Z$-open route between $\alpha$ and $\beta$ in $G'$ can be transformed into a $(Z \cup S)$-open route between $\alpha$ and $\beta$ in $G''$ and vice versa, with $\alpha, \beta \in V \cup \epsilon$. This implies that $I_{LWF}(G') = [I_{LWF}(G'')]^{S}_{\emptyset}$. We denote this independence model by $I_{NDN}(G')$.

First, let $\rho'$ denote a $Z$-open route between $\alpha$ and $\beta$ in $G'$. Then, we can easily transform $\rho'$ into a $(Z \cup S)$-open route $\rho''$ between $\alpha$ and $\beta$ in $G''$: Simply, replace every edge $\epsilon_A - \epsilon_B$ in $\rho'$ with $\epsilon_A \to S_{\epsilon_A \epsilon_B} \leftarrow \epsilon_B$.
To see that $\rho''$ is actually $(Z \cup S)$-open, note that every collider section in $\rho'$ is due to a subroute of the form $A \to B \leftarrow C$ with $A, B \in V$ and $C \in V \cup \epsilon$. Then, any node that is in a collider (respectively non-collider) section of $\rho'$ is also in a collider (respectively non-collider) section of $\rho''$.

Second, let $\rho''$ denote a $(Z \cup S)$-open route between $\alpha$ and $\beta$ in $G''$. Then, we can easily transform $\rho''$ into a $Z$-open route $\rho'$ between $\alpha$ and $\beta$ in $G'$: First, replace every subroute $\epsilon_A \to S_{\epsilon_A \epsilon_B} \leftarrow \epsilon_A$ of $\rho''$ with $\epsilon_A$ and, then, replace every subroute $\epsilon_A \to S_{\epsilon_A \epsilon_B} \leftarrow \epsilon_B$ of $\rho''$ with $\epsilon_A - \epsilon_B$. To see that $\rho'$ is actually $Z$-open, note that every undirected edge in $\rho'$ is between two error nodes and recall that no error node has incoming directed edges in $G'$. Then, again every collider section in $\rho'$ is due to a subroute of the form $A \to B \leftarrow C$ with $A, B \in V$ and $C \in V \cup \epsilon$. Then, again any node that is in a collider (respectively non-collider) section of $\rho'$ is also in a collider (respectively non-collider) section of $\rho''$.

Now, forget the momentary assumption made above that $G'$ has no deterministic node. Recall that we assumed that $D(Z)$ is the same no matter whether we are considering $G'$ or $G''$ for all $Z \subseteq V \cup \epsilon$. Recall also that, from the point of view of the separations in an LWF CG, that a node is determined by the conditioning set has the same effect as if the node were in the conditioning set. Then, $X \perp_{G''} Y \mid Z$ is in $[I_{LWF}(G'')]^{S}_{\emptyset}$ iff $X \perp_{G'} Y \mid D(Z)$ is in $I_{NDN}(G')$ iff $X \perp_{G'} Y \mid Z$ is in $I_{LWF}(G')$. Then, $I_{LWF}(G') = [I_{LWF}(G'')]^{S}_{\emptyset}$.

The second major result of this paper is the following corollary, which shows that every AMP CG is Markov equivalent to some DAG under marginalization and conditioning. The corollary follows from Corollary 1, Theorem 3 and the fact that $G''$ is a DAG and, thus, $I_{AMP}(G'') = I_{LWF}(G'')$.
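Before the corollary is stated, the construction of G″ from G′ (replacing every undirected edge between two error nodes with a pair of directed edges into a fresh selection node) can be sketched in Python as follows. The graph encoding, the tuple names `("eps", A)` and `("S", x, y)`, the helper `is_acyclic` and the example graph are assumptions made for illustration and are not part of the paper.

```python
def add_selection_nodes(nodes, directed, undirected):
    """Build G'' from G': replace each edge X - Y with X -> S_XY <- Y.

    directed is a set of (parent, child) pairs and undirected is a set of
    frozenset({x, y}) pairs. The selection node of X - Y is the tuple
    ("S", x, y), a naming convention assumed for this sketch.
    """
    new_nodes = set(nodes)
    new_directed = set(directed)
    selection = set()
    for edge in undirected:
        x, y = sorted(edge)
        s = ("S", x, y)
        selection.add(s)
        new_nodes.add(s)
        new_directed.update({(x, s), (y, s)})
    return new_nodes, new_directed, selection


def is_acyclic(nodes, directed):
    """Check acyclicity by repeatedly removing nodes without incoming edges."""
    indegree = {n: 0 for n in nodes}
    for _, child in directed:
        indegree[child] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for parent, child in directed:
            if parent == n:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
    return seen == len(nodes)


# Hypothetical EAMP CG G' obtained from the AMP CG A -> B - C.
eps_a, eps_b, eps_c = ("eps", "A"), ("eps", "B"), ("eps", "C")
nodes_p = {"A", "B", "C", eps_a, eps_b, eps_c}
directed_p = {("A", "B"), (eps_a, "A"), (eps_b, "B"), (eps_c, "C")}
undirected_p = {frozenset({eps_b, eps_c})}

nodes_pp, directed_pp, selection = add_selection_nodes(nodes_p, directed_p, undirected_p)
dag_ok = is_acyclic(nodes_pp, directed_pp)
```

The resulting graph has no undirected edges, and the check with `is_acyclic` is included only as a sanity test that the output is a DAG in this example.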
Corollary 2. $I_{AMP}(G) = [I_{LWF}(G'')]^{S}_{\epsilon} = [I_{AMP}(G'')]^{S}_{\epsilon}$.

5. EAMP CGs Are Closed under Marginalization

In this section, we show that EAMP CGs are closed under marginalization, meaning that for any EAMP CG $G'$ and $L \subseteq V$ there is an EAMP CG $[G']_L$ such that $[I_{AMP}(G')]_{L \cup \epsilon} = [I_{AMP}([G']_L)]_{\epsilon}$. We actually show how to transform $G'$ into $[G']_L$.

To gain some intuition into the problem and our solution to it, assume that $L$ contains a single node $B$. Then, marginalizing out $B$ from the system of linear equations associated with $G$ implies the following: For every $C$ such that $B \in pa_G(C)$, modify the equation $C = \beta_C pa_G(C) + \epsilon_C$ by replacing $B$ with the right-hand side of its corresponding equation, i.e. $\beta_B pa_G(B) + \epsilon_B$ and, then, remove the equation $B = \beta_B pa_G(B) + \epsilon_B$ from the system. In graphical terms, this corresponds to $C$ inheriting the parents of $B$ in $G'$ and, then, removing $B$ from $G'$. The following pseudocode formalizes this idea for any $L \subseteq V$.

1 Let $[G']_L = G'$
2 Repeat until all the nodes in $L$ have been considered
3     Let $B$ denote any node in $L$ that has not been considered before
4     For each pair of edges $A \to B$ and $B \to C$ in $[G']_L$ with $A, C \in V \cup \epsilon$
5         Add the edge $A \to C$ to $[G']_L$
6     Remove $B$ and all the edges it participates in from $[G']_L$

Note that the result of the pseudocode above is the same no matter the ordering in which the nodes in $L$ are selected in line 3. Note also that we have not yet given a formal definition of EAMP CGs. We define them recursively as all the graphs resulting from applying the pseudocode in the previous section to an AMP CG, plus all the graphs resulting from applying the pseudocode in this section to an EAMP CG. It is easy to see that every EAMP CG is an AMP CG over $W \cup \epsilon$ with $W \subseteq V$ and, thus, its semantics are defined.
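The marginalization pseudocode above can be sketched in Python as follows. The graph encoding (directed edges as `(parent, child)` pairs), the tuple naming `("eps", A)` for error nodes, the function name `marginalize` and the example chain A → B → C are assumptions made for illustration. Since nodes in V take part in no undirected edge of an EAMP CG, the sketch leaves the undirected edges untouched.

```python
def marginalize(nodes, directed, undirected, to_remove):
    """Marginalize the nodes in to_remove (a subset of V) out of an EAMP CG.

    For every removed node B, each child of B inherits the parents of B,
    and then B is deleted together with all its edges.
    """
    nodes, directed = set(nodes), set(directed)
    for b in to_remove:
        parents_b = {a for (a, c) in directed if c == b}
        children_b = {c for (a, c) in directed if a == b}
        # Add A -> C for each pair of edges A -> B and B -> C.
        directed |= {(a, c) for a in parents_b for c in children_b}
        # Remove B and every edge it participates in.
        directed = {(a, c) for (a, c) in directed if b not in (a, c)}
        nodes.discard(b)
    return nodes, directed, set(undirected)


# Hypothetical EAMP CG obtained from the AMP CG A -> B -> C.
eps_a, eps_b, eps_c = ("eps", "A"), ("eps", "B"), ("eps", "C")
nodes_p = {"A", "B", "C", eps_a, eps_b, eps_c}
directed_p = {("A", "B"), ("B", "C"), (eps_a, "A"), (eps_b, "B"), (eps_c, "C")}

# Marginalizing out B: C inherits the parents A and eps_B of B.
nodes_b, directed_b, _ = marginalize(nodes_p, directed_p, set(), ["B"])

# The order in which nodes are processed does not change the result here.
result_ab = marginalize(nodes_p, directed_p, set(), ["A", "B"])
result_ba = marginalize(nodes_p, directed_p, set(), ["B", "A"])
```

In this example the error node ε_B survives the removal of B and becomes a parent of C, which mirrors the substitution of the equation of B into the equation of C.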
Theorem 1 together with the following theorem confirm that these semantics are as desired.

Theorem 4. $[I_{AMP}(G')]_{L \cup \epsilon} = [I_{AMP}([G']_L)]_{\epsilon}$.

Proof. We find it easier to prove the theorem by defining separation in AMP CGs in terms of routes rather than paths. A node $B$ in a route $\rho$ in an AMP CG $G$ is called a triplex node in $\rho$ if $A \to B \leftarrow C$, $A \to B - C$, or $A - B \leftarrow C$ is a subroute of $\rho$ (note that maybe $A = C$ in the first case). A node $B$ in $\rho$ is called a non-triplex node in $\rho$ if $A \leftarrow B \to C$, $A \leftarrow B \leftarrow C$, $A \leftarrow B - C$, $A \to B \to C$, $A - B \to C$, or $A - B - C$ is a subroute of $\rho$ (note that maybe $A = C$ in the first and last cases). Note that $B$ may be both a triplex and a non-triplex node in $\rho$. Moreover, $\rho$ is said to be $Z$-open with $Z \subseteq V$ when
● every triplex node in $\rho$ is in $D(Z)$, and
● no non-triplex node in $\rho$ is in $D(Z)$.

When there is no $Z$-open route in $G$ between a node in $X$ and a node in $Y$, we say that $X$ is separated from $Y$ given $Z$ in $G$ and denote it as $X \perp_G Y \mid Z$. This and the standard definition of separation in AMP CGs introduced in Section 2 are equivalent, in the sense that they identify the same separations in $G$ (Andersson et al., 2001, Remark 3.1).

We prove the theorem for the case where $L$ contains a single node $B$. The general case follows by induction. Specifically, given $\alpha, \beta \in V \setminus L$ and $Z \subseteq V \setminus L \setminus \alpha \setminus \beta$, we show below that every $Z$-open route between $\alpha$ and $\beta$ in $[G']_L$ can be transformed into a $Z$-open route between $\alpha$ and $\beta$ in $G'$ and vice versa.

First, let $\rho$ denote a $Z$-open route between $\alpha$ and $\beta$ in $[G']_L$. We can easily transform $\rho$ into a $Z$-open route between $\alpha$ and $\beta$ in $G'$: For each edge $A \to C$ or $A \leftarrow C$ with $A, C \in V \cup \epsilon$ that is in $[G']_L$ but not in $G'$, replace each of its occurrences in $\rho$ with $A \to B \to C$ or $A \leftarrow B \leftarrow C$, respectively. Note that $B \notin D(Z)$ because $\epsilon_B \notin Z$.
Second, let $\rho$ denote a $Z$-open route between $\alpha$ and $\beta$ in $G'$. Note that $B$ cannot participate in any undirected edge in $G'$, because $B \in V$. Note also that $B$ cannot be a triplex node in $\rho$, because $B \notin D(Z)$. Note also that $B \neq \alpha, \beta$. Then, $B$ can only appear in $\rho$ in the following configurations: $A \to B \to C$, $A \leftarrow B \leftarrow C$, or $A \leftarrow B \to C$ with $A, C \in V \cup \epsilon$. Then, we can easily transform $\rho$ into a $Z$-open route between $\alpha$ and $\beta$ in $[G']_L$: Replace each occurrence of $A \to B \to C$ in $\rho$ with $A \to C$, each occurrence of $A \leftarrow B \leftarrow C$ in $\rho$ with $A \leftarrow C$, and each occurrence of $A \leftarrow B \to C$ in $\rho$ with $A \leftarrow \epsilon_B \to C$. In the last case, note that $\epsilon_B \notin D(Z)$ because $B \notin Z$.

6. Conclusions

In this paper, we have introduced EAMP CGs to model explicitly the errors in the system of linear equations associated with an AMP CG. We have shown that, as desired, every AMP CG is Markov equivalent to its corresponding EAMP CG under marginalization. We have used this result to show that every AMP CG is Markov equivalent to some LWF CG under marginalization. This result links the two most popular interpretations of CGs. We have used the previous result to show that every AMP CG is also Markov equivalent to some DAG under marginalization and conditioning. This result implies that the independence model represented by an AMP CG can be accounted for by some data generating process that is partially observed and has selection bias. Finally, we have shown that EAMP CGs are closed under marginalization, which guarantees parsimonious models under marginalization.

We are currently studying the following two questions. Can we modify EAMP CGs so that they become closed under conditioning too? Can we repeat the work done here for LWF CGs? That is, can we add deterministic nodes to LWF CGs so that they have DAGs as departure point and they become closed under marginalization and conditioning?
Acknowledgments

This work is funded by the Center for Industrial Information Technology (CENIIT) and a so-called career contract at Linköping University, by the Swedish Research Council (ref. 2010-4808), and by FEDER funds and the Spanish Government (MICINN) through the project TIN2010-20900-C04-03.

References

Andersson, S. A., Madigan, D. and Perlman, M. D. Alternative Markov Properties for Chain Graphs. Scandinavian Journal of Statistics, 28:33-85, 2001.
Bishop, C. M. Pattern Recognition and Machine Learning. Springer, 2006.
Cox, D. R. and Wermuth, N. Linear Dependencies Represented by Chain Graphs. Statistical Science, 8:204-218, 1993.
Cox, D. R. and Wermuth, N. Multivariate Dependencies - Models, Analysis and Interpretation. Chapman & Hall, 1996.
Drton, M. Discrete Chain Graph Models. Bernoulli, 15:736-753, 2009.
Frydenberg, M. The Chain Graph Markov Property. Scandinavian Journal of Statistics, 17:333-353, 1990.
Geiger, D., Verma, T. and Pearl, J. Identifying Independence in Bayesian Networks. Networks, 20:507-534, 1990.
Koster, J. T. A. Marginalizing and Conditioning in Graphical Models. Bernoulli, 8:817-840, 2002.
Lauritzen, S. L. Graphical Models. Oxford University Press, 1996.
Lauritzen, S. L. and Wermuth, N. Graphical Models for Associations between Variables, some of which are Qualitative and some Quantitative. The Annals of Statistics, 17:31-57, 1989.
Levitz, M., Perlman, M. D. and Madigan, D. Separation and Completeness Properties for AMP Chain Graph Markov Models. The Annals of Statistics, 29:1751-1784, 2001.
Peña, J. M. Faithfulness in Chain Graphs: The Gaussian Case. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, 588-599, 2011.
Richardson, T. and Spirtes, P. Ancestral Graph Markov Models. The Annals of Statistics, 30:962-1030, 2002.
Sadeghi, K. and Lauritzen, S. L. Markov Properties for Mixed Graphs. arXiv:1109.5909v4 [stat.OT].
Sadeghi, K. Stable Mixed Graphs. Bernoulli, to appear.
Sonntag, D. and Peña, J. M. Chain Graph Interpretations and their Relations. In Proceedings of the 12th European Conference on Symbolic and Quantitative Approaches to Reasoning under Uncertainty, to appear.