A generalized back-door criterion
We generalize Pearl's back-door criterion for directed acyclic graphs (DAGs) to more general types of graphs that describe Markov equivalence classes of DAGs and/or allow for arbitrarily many hidden variables. We also give easily checkable necessary …
Authors: Marloes H. Maathuis, Diego Colombo
The Annals of Statistics, 2015, Vol. 43, No. 3, 1060–1088
DOI: 10.1214/14-AOS1295
© Institute of Mathematical Statistics, 2015

A GENERALIZED BACK-DOOR CRITERION

By Marloes H. Maathuis and Diego Colombo

ETH Zurich

We generalize Pearl's back-door criterion for directed acyclic graphs (DAGs) to more general types of graphs that describe Markov equivalence classes of DAGs and/or allow for arbitrarily many hidden variables. We also give easily checkable necessary and sufficient graphical criteria for the existence of a set of variables that satisfies our generalized back-door criterion, when considering a single intervention and a single outcome variable. Moreover, if such a set exists, we provide an explicit set that fulfills the criterion. We illustrate the results in several examples. R-code is available in the R-package pcalg.

1. Introduction. Causal Bayesian networks are widely used for causal reasoning [e.g., Glymour et al. (1987), Koller and Friedman (2009), Pearl (1995, 2000, 2009), Spirtes, Glymour and Scheines (1993, 2000)]. In particular, if the causal structure is known and represented by a directed acyclic graph (DAG), this framework allows one to deduce post-intervention distributions and causal effects from the pre-intervention (or observational) distribution. Hence, if the causal DAG is known, one can estimate causal effects from observational data. Covariate adjustment is often used for this purpose. The back-door criterion [Pearl (1993)] is a graphical criterion that is sufficient for adjustment, in the sense that a set of variables can be used for covariate adjustment if it satisfies the back-door criterion for the given graph.

In practice, there are two important complications. First, the underlying DAG may be unknown.
In this case one can try to estimate the DAG, but in general one cannot identify the underlying DAG uniquely. Instead, one can identify its Markov equivalence class, which consists of all DAGs that encode the same conditional independence relationships as the underlying DAG. Such a Markov equivalence class can be represented uniquely by a different type of graph, called a completed partially directed acyclic graph (CPDAG) [Spirtes, Glymour and Scheines (1993), Meek (1995), Andersson, Madigan and Perlman (1997)]. Second, it is often the case that some important variables were not measured, meaning that we do not have causal sufficiency. In this case, one can work with maximal ancestral graphs (MAGs) instead of DAGs [Richardson and Spirtes (2002, 2003)]. Finally, the underlying MAG may be unknown, so that it must be estimated from data. Again, there is an identifiability problem here, as we can generally only identify the Markov equivalence class of the underlying MAG, which can be represented uniquely by a partial ancestral graph (PAG) [Richardson and Spirtes (2002), Ali, Richardson and Spirtes (2009)].

Received July 2013; revised November 2014. Supported in part by Swiss NSF Grant 200021-129972. AMS 2000 subject classifications. 62H99. Key words and phrases. Causal inference, covariate adjustment, hidden confounders, DAG, CPDAG, MAG, PAG. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2015, Vol. 43, No. 3, 1060–1088. This reprint differs from the original in pagination and typographic detail.
In this paper, we therefore consider generalizations of the back-door criterion to the following three scenarios:

(1) we assume causal sufficiency, and we only know the CPDAG, that is, the Markov equivalence class of the underlying DAG;
(2) we do not assume causal sufficiency, and we know the MAG on the observed variables;
(3) we do not assume causal sufficiency, and we only know the PAG, that is, the Markov equivalence class of the underlying MAG on the observed variables.

In scenarios 2 and 3, we allow for arbitrarily many hidden (or unmeasured) variables. We do not, however, allow for selection variables, that is, for unmeasured variables that determine whether a unit is included in the sample.

Since the back-door criterion is a simple criterion that is widely used for DAGs, it seems useful to have similar criteria for CPDAGs, MAGs and PAGs. We also hope that our generalized back-door criterion will make working with MAGs and PAGs less daunting, and more accessible to people in practice.

Our generalized back-door criterion for DAGs, CPDAGs, MAGs and PAGs is given in Section 3; see especially Definition 3.7 and Theorem 3.1. Corresponding R-code is available in the function backdoor in the R-package pcalg [Kalisch et al. (2012)]. Our results are derived by first formulating invariance conditions that are sufficient for adjustment, and then using the graphical criteria for invariance derived by Zhang (2008a). We also show that the generalized back-door criterion is equivalent to Pearl's back-door criterion for single interventions in DAGs, and is slightly more general for multiple interventions in DAGs (Lemma 3.1 and Example 1).
In Section 4, we give necessary and sufficient criteria for the existence of a set that satisfies the generalized back-door criterion relative to a pair of variables $(X, Y)$ and a DAG, MAG, CPDAG or PAG. Moreover, if a generalized back-door set exists, we provide an explicit such set. These results are summarized in Theorem 4.1, using a general framework that covers DAGs, CPDAGs, MAGs and PAGs. Corollaries 4.1–4.3 specialize the results for DAGs, CPDAGs and MAGs, respectively. We illustrate our results with several examples in Section 5. All proofs are given in Section 7.

We close this introduction by discussing related work. For a given causal DAG, identifiability of causal effects in general or via covariate adjustment has been studied by various authors. In particular, there are complete graphical criteria for the identification of causal effects if a causal DAG with unmeasured variables is given [e.g., Huang and Valtorta (2006), Shpitser and Pearl (2006a, 2006b, 2008), Tian and Pearl (2002)]. Shpitser, Van der Weele and Robins (2010a, 2010b) studied effects that are identifiable via covariate adjustment, and provided necessary and sufficient graphical criteria for this purpose, again if the causal DAG is given. Their results can be viewed as an improvement on the back-door criterion, which is only sufficient for adjustment. Textor and Liśkiewicz (2011) studied covariate adjustment for a given DAG from an algorithmic perspective. Among other things, they showed that the back-door criterion and the adjustment criterion of Shpitser, Van der Weele and Robins (2010a) are equivalent if one is interested in minimal adjustment sets for a certain subclass of graphs.
Van der Zander, Liśkiewicz and Textor (2014) extended these necessary and sufficient graphical criteria for covariate adjustment to MAGs.

There are also existing approaches that do not make the assumption that the causal DAG or MAG is given. The prediction algorithm [Spirtes, Glymour and Scheines (2000), Chapter 7] roughly starts from a PAG and uses invariance results. In this sense it is probably closest to our work. The main difference between this method and our results is that the prediction algorithm is more complex. In particular, it searches over all possible orderings of the variables, which quickly becomes infeasible for large graphs. The prediction algorithm may, however, be more informative, in the sense that certain distributions may be identifiable by the prediction algorithm but not by the generalized back-door criterion. Studying the exact relationship between these two approaches would be an interesting topic for future work.

Other work on data driven methods for selection of adjustment variables for the estimation of causal effects does not assume that the causal structure is known, but does make some assumptions about causal relationships between the variables of interest and/or about the existence of a set of variables that can be used for covariate adjustment [de Luna, Waernbaum and Richardson (2011), VanderWeele and Shpitser (2011), Entner, Hoyer and Spirtes (2013)]. In the current paper, we do not make any such assumptions. On the other hand, we start from a given DAG, CPDAG, MAG or PAG. We do not see this as a genuine restriction of our approach, however, since there are algorithms to estimate CPDAGs and PAGs from data (e.g., the PC algorithm [Spirtes, Glymour and Scheines (2000)], greedy equivalence search [Chickering (2002)] and versions of the FCI algorithm [Spirtes, Glymour and Scheines (2000), Colombo et al. (2012), Claassen, Mooij and Heskes (2013)]). These algorithms have been shown to be consistent, even in certain sparse high-dimensional settings [Kalisch and Bühlmann (2007), Colombo et al. (2012)]. In practice, one could therefore first employ such an algorithm, and then apply the results in the current paper.

2. Preliminaries. Throughout this paper, we denote sets in a bold font (e.g., $\mathbf{X}$) and graphs in a calligraphic font (e.g., $\mathcal{D}$ or $\mathcal{M}$).

2.1. Basic graphical definitions. A graph $\mathcal{G} = (\mathbf{V}, \mathbf{E})$ consists of a set of vertices $\mathbf{V} = \{X_1, \ldots, X_p\}$ and a set of edges $\mathbf{E}$. The vertices represent random variables, and the edges describe conditional independence and causal (ancestral) relationships. There is at most one edge between every pair of vertices, and the edge set $\mathbf{E}$ can contain (a subset of) the following four edge types: $\rightarrow$ (directed), $\leftrightarrow$ (bi-directed), $\circ\!\!-\!\!\circ$ (nondirected) and $\circ\!\!\rightarrow$ (partially directed). A directed graph contains only directed edges, a mixed graph can contain directed and bi-directed edges and a partial mixed graph can contain all four edge types. The endpoints of an edge are called marks, and they can be tails, arrowheads or circles. We use the symbol "$*$" to denote an arbitrary edge mark. If we are only interested in the presence or absence of edges, and not in the edge marks, then we refer to the skeleton of a graph.

Two vertices are adjacent if there is an edge between them. The adjacency set of a vertex $X$ in $\mathcal{G}$, denoted by $\mathrm{adj}(X, \mathcal{G})$, consists of all vertices adjacent to $X$ in $\mathcal{G}$. A path is a sequence of distinct adjacent vertices. The length of a path $p = \langle X_i, X_{i+1}, \ldots, X_{i+\ell} \rangle$ equals the corresponding number of edges, in this case $\ell$.
The path $p$ is said to be out of (into) $X_i$ if the edge between $X_i$ and $X_{i+1}$ has a tail (arrowhead) at $X_i$. A sub-path of $p$ from $X_j$ to $X_{j'}$ is denoted by $p(X_j, X_{j'})$. We denote the concatenation of paths by $\oplus$, so that, for example, $p = p(X_i, X_{i+k}) \oplus p(X_{i+k}, X_{i+\ell})$ for $k \in \{1, \ldots, \ell - 1\}$. We use the convention that we remove any loops that may occur due to the concatenation, so that the result does not contain duplicate vertices and is again a path. The path $p$ is a directed path from $X_i$ to $X_{i+\ell}$ if for all $k \in \{1, \ldots, \ell\}$, the edge $X_{i+k-1} \rightarrow X_{i+k}$ occurs, and it is a possibly directed path if for all $k \in \{1, \ldots, \ell\}$, the edge between $X_{i+k-1}$ and $X_{i+k}$ is not into $X_{i+k-1}$. A cycle occurs if there is a path between $X_i$ and $X_j$ of length greater than one, and $X_i$ and $X_j$ are adjacent. A directed path from $X_i$ to $X_j$ forms a directed cycle together with the edge $X_j \rightarrow X_i$, and an almost directed cycle together with the edge $X_j \leftrightarrow X_i$. A directed acyclic graph (DAG) is a directed graph without directed cycles. An ancestral graph is a mixed graph without directed and almost directed cycles.

If $X_j \rightarrow X_i$, we say that $X_i$ is a child of $X_j$, and $X_j$ is a parent of $X_i$. The corresponding sets of parents and children are denoted by $\mathrm{pa}(X_i, \mathcal{G})$ and $\mathrm{ch}(X_i, \mathcal{G})$. If there is a (possibly) directed path from $X_i$ to $X_j$ or if $X_i = X_j$, then $X_i$ is a (possible) ancestor of $X_j$ and $X_j$ a (possible) descendant of $X_i$. The sets of ancestors, descendants, possible ancestors, and possible descendants of a vertex $X_i$ in $\mathcal{G}$ are denoted by $\mathrm{an}(X_i, \mathcal{G})$, $\mathrm{de}(X_i, \mathcal{G})$, $\mathrm{possibleAn}(X_i, \mathcal{G})$, and $\mathrm{possibleDe}(X_i, \mathcal{G})$, respectively. These definitions are applied disjunctively to a set $\mathbf{Y} \subseteq \mathbf{V}$, for example, $\mathrm{an}(\mathbf{Y}, \mathcal{G}) = \{X_i \mid X_i \in \mathrm{an}(X_j, \mathcal{G}) \text{ for some } X_j \in \mathbf{Y}\}$.
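The definitions of descendants and possible descendants can be sketched in a few lines of Python. This is only an illustrative sketch, not the pcalg implementation: the mark table, the helper names (`make_graph`, `descendants`, `possible_descendants`) and the small example graph are all assumptions introduced here. It relies on the observation that a vertex is reachable by a (possibly) directed walk exactly when it is reachable by a (possibly) directed path, so a breadth-first search suffices.

```python
from collections import deque

def make_graph(edges):
    """Build a mark table from (a, mark_at_a, mark_at_b, b) tuples.

    marks[(a, b)] is the edge mark at b on the edge between a and b;
    marks are "tail", "arrow" or "circle" (hypothetical encoding).
    """
    marks = {}
    for a, mark_a, mark_b, b in edges:
        marks[(a, b)] = mark_b
        marks[(b, a)] = mark_a
    return marks

def _reach(marks, start, edge_ok):
    """Vertices reachable from start using edges u - v with edge_ok(mark at u, mark at v)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for (a, b), mark_at_b in marks.items():
            if a == u and b not in seen and edge_ok(marks[(b, a)], mark_at_b):
                seen.add(b)
                queue.append(b)
    return seen

def descendants(marks, x):
    # Directed path: every edge is u -> v (tail at u, arrowhead at v).
    return _reach(marks, x, lambda at_u, at_v: at_u == "tail" and at_v == "arrow")

def possible_descendants(marks, x):
    # Possibly directed path: no edge is into its earlier endpoint u.
    return _reach(marks, x, lambda at_u, at_v: at_u != "arrow")

# Hypothetical partial mixed graph: X o-o A, A -> B, C <-> X.
g = make_graph([
    ("X", "circle", "circle", "A"),
    ("A", "tail", "arrow", "B"),
    ("C", "arrow", "arrow", "X"),
])
print(sorted(descendants(g, "X")))
print(sorted(possible_descendants(g, "X")))
```

In this made-up graph, $X$ has no proper descendants, while $A$ and $B$ are possible descendants because the path from $X$ through $A$ to $B$ has no edge into its earlier endpoint. The vertex $C$ is excluded because the bi-directed edge has an arrowhead at $X$.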
A path $\langle X_i, X_j, X_k \rangle$ is an unshielded triple if $X_i$ and $X_k$ are not adjacent. A nonendpoint vertex $X_j$ on a path is a collider on the path if the path contains $*\!\!\rightarrow X_j \leftarrow\!\!*$. A nonendpoint vertex on a path which is not a collider is a noncollider on the path. A collider path is a path on which every nonendpoint vertex is a collider. A path of length one is a trivial collider path.

2.2. Causal Bayesian networks. A Bayesian network for a set of variables $\mathbf{V} = \{X_1, \ldots, X_p\}$ is a pair $(\mathcal{D}, f)$, where $\mathcal{D} = (\mathbf{V}, \mathbf{E})$ is a DAG, and $f$ is a joint probability density for $\mathbf{V}$ (with respect to some dominating measure) that factorizes according to $\mathcal{D}$: $f(\mathbf{V}) = \prod_{i=1}^{p} f(X_i \mid \mathrm{pa}(X_i, \mathcal{D}))$. If the DAG is interpreted causally, in the sense that $X_i \rightarrow X_j$ means that $X_i$ has a (potential) direct causal effect on $X_j$, then we talk about a causal DAG and a causal Bayesian network.

One can easily derive post-intervention densities if the causal Bayesian network is given and all variables are observed. In particular, we consider interventions $\mathrm{do}(\mathbf{X} = \mathbf{x})$ for $\mathbf{X} \subseteq \mathbf{V}$ [Pearl (2000)], which represent outside interventions that set the variables in $\mathbf{X}$ to their respective values in $\mathbf{x}$. We assume that such interventions are effective, meaning that $\mathbf{X} = \mathbf{x}$ after the intervention. Moreover, we assume that the interventions are local, meaning that the generating mechanisms of the other variables, and hence their conditional distributions given their parents, do not change. We then have

\[
f(\mathbf{V} \mid \mathrm{do}(\mathbf{X} = \mathbf{x})) =
\begin{cases}
\prod_{X_i \in \mathbf{V} \setminus \mathbf{X}} f(X_i \mid \mathrm{pa}(X_i, \mathcal{D})), & \text{for values of } \mathbf{V} \text{ consistent with } \mathbf{x}, \\
0, & \text{otherwise.}
\end{cases}
\]

This is known as the g-formula or the truncated factorization formula [Robins (1986), Spirtes, Glymour and Scheines (1993), Pearl (2000)].
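As a small numerical illustration of the truncated factorization formula, the following Python sketch uses a hypothetical three-variable binary causal DAG with edges $Z \rightarrow X$, $Z \rightarrow Y$ and $X \rightarrow Y$. The DAG, the conditional probabilities and all function names are assumptions made up for this example. The sketch computes $f(y \mid \mathrm{do}(x))$ by dropping the factor of the intervened variable, and compares it with covariate adjustment for $Z$ and with plain conditioning.

```python
from itertools import product

# Hypothetical conditional probability tables (all variables binary).
p_z = {0: 0.7, 1: 0.3}                        # f(z)
p_x1_given_z = {0: 0.2, 1: 0.7}               # f(X = 1 | z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.4,    # f(Y = 1 | x, z), keyed by (x, z)
                 (1, 0): 0.5, (1, 1): 0.9}

def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(z, x, y):
    """Observational density: product of all factors f(X_i | pa(X_i))."""
    return p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_xz[(x, z)], y)

def truncated(z, x, y, x_do):
    """Post-intervention density: the factor of X is removed."""
    if x != x_do:
        return 0.0  # values inconsistent with the intervention
    return p_z[z] * bern(p_y1_given_xz[(x, z)], y)

def p_y_do_x(y, x_do):
    """Marginal of Y under do(X = x_do), via the truncated factorization."""
    return sum(truncated(z, x, y, x_do) for z, x in product((0, 1), repeat=2))

def p_y_given_x(y, x):
    """Ordinary conditional f(y | x) from the observational joint."""
    num = sum(joint(z, x, y) for z in (0, 1))
    den = sum(joint(z, x, yy) for z in (0, 1) for yy in (0, 1))
    return num / den

def adjust_for_z(y, x):
    """Covariate adjustment: sum_z f(y | z, x) f(z), computed from the joint."""
    total = 0.0
    for z in (0, 1):
        f_y_given_zx = joint(z, x, y) / sum(joint(z, x, yy) for yy in (0, 1))
        f_z = sum(joint(z, xx, yy) for xx in (0, 1) for yy in (0, 1))
        total += f_y_given_zx * f_z
    return total

print(p_y_do_x(1, 1), adjust_for_z(1, 1), p_y_given_x(1, 1))
```

With these made-up numbers, the interventional probability and the adjusted probability agree (both are $0.7 \cdot 0.5 + 0.3 \cdot 0.9 = 0.62$), while the plain conditional $f(Y = 1 \mid X = 1) = 0.259 / 0.35 = 0.74$ differs because $Z$ confounds $X$ and $Y$.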
In a Bayesian network $(\mathcal{D}, f)$, the DAG $\mathcal{D}$ encodes conditional independence relationships in the density $f$ via d-separation [Pearl (2000); see also Definition 3.5]. Several DAGs can encode the same conditional independence relationships. Such DAGs form a Markov equivalence class which can be uniquely represented by a CPDAG. A CPDAG is a graph with the same skeleton as each DAG in its equivalence class, and its edges are either directed ($\rightarrow$) or nondirected ($\circ\!\!-\!\!\circ$). An edge $X_i \rightarrow X_j$ in such a CPDAG means that $X_i \rightarrow X_j$ is present in every DAG in the Markov equivalence class, while an edge $X_i \circ\!\!-\!\!\circ X_j$ represents uncertainty about the edge marks, in the sense that the Markov equivalence class contains at least one DAG with $X_i \rightarrow X_j$ and at least one DAG with $X_i \leftarrow X_j$. (Note that many authors use $X_i - X_j$ instead of $X_i \circ\!\!-\!\!\circ X_j$; we use $\circ\!\!-\!\!\circ$ to ensure that the CPDAG satisfies the syntactic properties of a PAG; see below.)

If some of the variables in a DAG are unobserved, one can transform the DAG into a unique maximal ancestral graph (MAG) on the observed variables; see Richardson and Spirtes [(2002), page 981] for an algorithm. In particular, two vertices $X_i$ and $X_j$ are adjacent in a MAG if and only if no subset of the remaining observed variables makes them conditionally independent. Moreover, a tail mark $X_i -\!\!* X_j$ in a MAG $\mathcal{M}$ means that $X_i$ is an ancestor of $X_j$ in all DAGs represented by $\mathcal{M}$, while an arrowhead $X_i \leftarrow\!\!* X_j$ means that $X_i$ is not an ancestor of $X_j$ in all DAGs represented by $\mathcal{M}$. Thus an edge $X_i \rightarrow X_j$ in $\mathcal{M}$ means that there is a directed path from $X_i$ to $X_j$ in all DAGs represented by $\mathcal{M}$, but we emphasize that it does not represent a direct effect with respect to the observed variables, in the sense that there may be other observed variables on the directed path.
Several different DAGs can lead to the same MAG, and a MAG represents a class of (infinitely many) DAGs that have the same d-separation and ancestral relationships among the observed variables. The MAG of a causal DAG is called a causal MAG.

A MAG encodes conditional independence relationships via the concept of m-separation (Definition 3.5). Again, several MAGs can encode the same conditional independence relationships. Such MAGs are called Markov equivalent, and can be uniquely represented by a partial ancestral graph (PAG). This is a partial mixed graph with the same skeleton as each MAG in its Markov equivalence class. A tail mark (arrowhead) at an edge $X_i -\!\!* X_j$ ($X_i \leftarrow\!\!* X_j$) in such a PAG means that $X_i -\!\!* X_j$ ($X_i \leftarrow\!\!* X_j$) in every MAG in the Markov equivalence class, while a circle mark at an edge $X_i \circ\!\!-\!\!* X_j$ represents uncertainty about the edge mark, in the sense that the Markov equivalence class contains at least one MAG with $X_i -\!\!* X_j$, and at least one MAG with $X_i \leftarrow\!\!* X_j$.

We say that a density $f$ is compatible with a DAG $\mathcal{D}$ if the pair $(\mathcal{D}, f)$ forms a causal Bayesian network. A density $f$ is compatible with a CPDAG $\mathcal{C}$ if it is compatible with a DAG in the Markov equivalence class described by $\mathcal{C}$. A density $f$ is compatible with a MAG $\mathcal{M}$ if there exists a causal Bayesian network $(\mathcal{D}^*, f^*)$ (including hidden variables), such that $\mathcal{M}$ is the MAG of $\mathcal{D}^*$ and $f$ is the corresponding marginal of $f^*$. Finally, $f$ is compatible with a PAG $\mathcal{P}$ if it is compatible with a MAG in the Markov equivalence class described by $\mathcal{P}$.

Fig. 1. Edge configurations in MAGs and PAGs for a visible edge $A \rightarrow B$; cf. Zhang (2008a), Figure 6. Instead of the tail mark at $C$, one can also have an arrowhead or circle mark.

3. Generalized back-door criterion.
We now present our generalized back-door criterion in Definition 3.7 and Theorem 3.1, where the name "generalized back-door criterion" is motivated by Lemma 3.1. We first introduce some more specialized definitions.

Zhang (2008a) introduced the concept of (definitely) visible edges in MAGs and PAGs. The reason for this is as follows. A directed edge $X \rightarrow Y$ in a DAG, CPDAG, MAG or PAG always means that $X$ is a cause (or ancestor) of $Y$, because of the tail mark at $X$. However, if we allow for hidden variables (i.e., in MAGs and PAGs), there may be a hidden confounding variable between $X$ and $Y$. Visible edges refer to situations where there cannot be such a hidden confounder between $X$ and $Y$. Invisible edges, on the other hand, are possibly confounded in the sense that there is a DAG represented by the MAG or PAG with $X \leftarrow L \rightarrow Y$, where $L$ is not measured (in addition to $X \rightarrow \cdots \rightarrow Y$).

Definition 3.1 [Visible and invisible edges; cf. Zhang (2008a)]. All directed edges in DAGs and CPDAGs are said to be visible. Given a MAG $\mathcal{M}$ / PAG $\mathcal{P}$, a directed edge $A \rightarrow B$ in $\mathcal{M}$ / $\mathcal{P}$ is visible if there is a vertex $C$ not adjacent to $B$, such that there is an edge between $C$ and $A$ that is into $A$, or there is a collider path between $C$ and $A$ that is into $A$ and every nonendpoint vertex on the path is a parent of $B$. Otherwise $A \rightarrow B$ is said to be invisible.

Figure 1 illustrates the different graphical configurations that can lead to a visible edge. We note that Zhang (2008a) used slightly different terminology, referring to definitely visible edges in a PAG, while we simply say visible for both MAGs and PAGs. Borboudakis, Triantafillou and Tsamardinos (2012) used the term pure-causal edges instead of visible edges in MAGs.

We can now generalize the concept of a back-door path in Definition 3.2.

Definition 3.2 (Back-door path).
Let $(X, Y)$ be an ordered pair of vertices in $\mathcal{G}$, where $\mathcal{G}$ is a DAG, CPDAG, MAG or PAG. We say that a path between $X$ and $Y$ is a back-door path from $X$ to $Y$ if it does not have a visible edge out of $X$.

In a DAG, this definition reduces to a path between $X$ and $Y$ that starts with $X \leftarrow$, which is the usual back-door path as defined by Pearl (1993). In a CPDAG, a back-door path from $X$ to $Y$ is a path between $X$ and $Y$ that starts with $X \leftarrow$ or $X \circ\!\!-\!\!\circ$. In a MAG, it is a path between $X$ and $Y$ that starts with $X \leftrightarrow$, $X \leftarrow$ or an invisible edge $X \rightarrow$. Finally, in a PAG, it is a path between $X$ and $Y$ that starts with $X \leftarrow\!\!*$, $X \circ\!\!-\!\!*$ or an invisible edge $X \rightarrow$.

We also need generalizations of the concept of d-separation in DAGs [Definition 1.2.3 of Pearl (2000)]. In MAGs, one can use m-separation [Section 3.4 of Richardson and Spirtes (2002)]. In CPDAGs and PAGs, there is the additional complication that it may be unclear whether a vertex is a collider or a noncollider on the path. We therefore need the following definitions:

Definition 3.3 [Definite noncollider; Zhang (2008a)]. A nonendpoint vertex $X_j$ on a path $\langle \ldots, X_i, X_j, X_k, \ldots \rangle$ in a partial mixed graph $\mathcal{G}$ is a definite noncollider on the path if (i) there is a tail mark at $X_j$, that is, $X_i *\!\!- X_j$ or $X_j -\!\!* X_k$, or (ii) $\langle X_i, X_j, X_k \rangle$ is unshielded and has circle marks at $X_j$, that is, $X_i *\!\!-\!\!\circ X_j \circ\!\!-\!\!* X_k$ and $X_i$ and $X_k$ are not adjacent in $\mathcal{G}$.

The motivation for conditions (i) and (ii) is straightforward. A tail mark out of $X_j$ on the path ensures that $X_j$ is a noncollider on the path in any graph obtained by orienting any possible circle marks. Condition (ii) comes from the fact that the collider status of unshielded triples is known in CPDAGs and PAGs.
Hence, if the graph contains an unshielded triple that was not oriented as a collider, then it must be a noncollider in all underlying DAGs or MAGs. If $\mathcal{G}$ is a DAG or a MAG, then only condition (i) applies and reduces to the usual definition of a noncollider.

Definition 3.4 (Definite status path). A nonendpoint vertex $X_j$ on a path $p$ in a partial mixed graph is said to be of a definite status if it is either a collider or a definite noncollider on $p$. The path $p$ is said to be of a definite status if all nonendpoint vertices on the path are of a definite status.

A path of length one is a trivial definite status path. Moreover, in DAGs and MAGs, all paths are of a definite status.

We now define m-connection for definite status paths.

Definition 3.5 (m-connection). A definite status path $p$ between vertices $X$ and $Y$ in a partial mixed graph is m-connecting given a (possibly empty) set of variables $\mathbf{Z}$ ($X, Y \notin \mathbf{Z}$) if the following two conditions hold:

(a) every definite noncollider on the path is not in $\mathbf{Z}$;
(b) every collider on the path is an ancestor of some member of $\mathbf{Z}$.

If a definite status path $p$ is not m-connecting given $\mathbf{Z}$, then we say that $\mathbf{Z}$ blocks $p$. If $\mathbf{Z} = \emptyset$, we usually omit the phrase "given the empty set."

Definition 3.5 reduces to m-connection for MAGs and d-connection for DAGs. We note that Zhang (2008a) used the notions of possible m-connection and definite m-connection in PAGs, where his notion of definite m-connection is the same as our notion of m-connection for definite status paths.

We now define an adjustment criterion for DAGs, CPDAGs, MAGs and PAGs. Throughout, we think of $\mathbf{X}$ and $\mathbf{Y}$ as nonempty sets.

Definition 3.6 (Adjustment criterion). Let $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{W}$ be pairwise disjoint sets of vertices in $\mathcal{G}$, where $\mathcal{G}$ represents a DAG, CPDAG, MAG or PAG.
Then we say that $\mathbf{W}$ satisfies the adjustment criterion relative to $(\mathbf{X}, \mathbf{Y})$ and $\mathcal{G}$ if for any density $f$ compatible with $\mathcal{G}$, we have

\[
f(\mathbf{y} \mid \mathrm{do}(\mathbf{x})) =
\begin{cases}
f(\mathbf{y} \mid \mathbf{x}), & \text{if } \mathbf{W} = \emptyset, \\
\int_{\mathbf{w}} f(\mathbf{y} \mid \mathbf{w}, \mathbf{x}) f(\mathbf{w}) \, d\mathbf{w} = E_{\mathbf{W}}\{f(\mathbf{y} \mid \mathbf{w}, \mathbf{x})\}, & \text{otherwise.}
\end{cases}
\]

If $\mathbf{X} = \{X\}$ and $\mathbf{Y} = \{Y\}$, we simply say that a set satisfies the criterion relative to $(X, Y)$ [rather than $(\{X\}, \{Y\})$] and the given graph.

We now propose our generalized back-door criterion for DAGs, CPDAGs, MAGs and PAGs. We will show in Theorem 3.1 that this criterion is sufficient for adjustment.

Definition 3.7 (Generalized back-door criterion and generalized back-door set). Let $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{W}$ be pairwise disjoint sets of vertices in $\mathcal{G}$, where $\mathcal{G}$ represents a DAG, CPDAG, MAG or PAG. Then $\mathbf{W}$ satisfies the generalized back-door criterion relative to $(\mathbf{X}, \mathbf{Y})$ and $\mathcal{G}$ if the following two conditions hold:

(B-i) $\mathbf{W}$ does not contain possible descendants of $\mathbf{X}$ in $\mathcal{G}$;
(B-ii) for every $X \in \mathbf{X}$, the set $\mathbf{W} \cup \mathbf{X} \setminus \{X\}$ blocks every definite status back-door path from $X$ to any member of $\mathbf{Y}$, if any, in $\mathcal{G}$.

A set $\mathbf{W}$ that satisfies the generalized back-door criterion relative to $(\mathbf{X}, \mathbf{Y})$ and $\mathcal{G}$ is called a generalized back-door set relative to $(\mathbf{X}, \mathbf{Y})$ and $\mathcal{G}$.

Remark 3.1. Condition (B-i) in Definition 3.7 is equivalent to the following:

(B-i)′ $\mathbf{W}$ does not contain possible descendants of $\mathbf{X}$ along a definite status path in $\mathcal{G}$.

Condition (B-i)′ may be easier to check computationally than (B-i). The equivalence of (B-i) and (B-i)′ is shown in the proof of Theorem 3.1, using Lemma 7.2.

Theorem 3.1. Let $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{W}$ be pairwise disjoint sets of vertices in $\mathcal{G}$, where $\mathcal{G}$ represents a DAG, MAG, CPDAG or PAG.
If $\mathbf{W}$ satisfies the generalized back-door criterion relative to $(\mathbf{X}, \mathbf{Y})$ and $\mathcal{G}$ (Definition 3.7), then it satisfies the adjustment criterion relative to $(\mathbf{X}, \mathbf{Y})$ and $\mathcal{G}$ (Definition 3.6).

The proof of Theorem 3.1 consists of two steps. First, we formulate invariance criteria that are sufficient for adjustment (Theorem 7.1). Next, we translate the invariance criteria into the graphical criteria given in Definition 3.7, using results of Zhang (2008a) (Theorem 7.3).

We refer to Definition 3.7 as generalized back-door criterion because its conditions are closely related to Pearl's original back-door criterion [Pearl (1993, 2000)].

Definition 3.8 [Pearl's back-door criterion; Definition 3.3.1 of Pearl (2000)]. A set of variables $\mathbf{W}$ satisfies the back-door criterion relative to an ordered pair of variables $(X, Y)$ in a DAG $\mathcal{D}$ if the following two conditions hold:

(P-i) no vertex in $\mathbf{W}$ is a descendant of $X$ in $\mathcal{D}$;
(P-ii) $\mathbf{W}$ blocks every path between $X$ and $Y$ in $\mathcal{D}$ that is into $X$.

Similarly, if $\mathbf{X}$ and $\mathbf{Y}$ are two disjoint subsets of vertices in $\mathcal{D}$, then $\mathbf{W}$ is said to satisfy the back-door criterion relative to $(\mathbf{X}, \mathbf{Y})$ in $\mathcal{D}$ if it satisfies the criterion relative to any pair $(X, Y)$ such that $X \in \mathbf{X}$ and $Y \in \mathbf{Y}$.

In particular, the conditions in Definition 3.7 are equivalent to Pearl's back-door criterion for a DAG with a single intervention ($|\mathbf{X}| = 1$). For a DAG with multiple interventions, any set that satisfies Pearl's back-door criterion also satisfies the generalized back-door criterion, but not necessarily the other way around. In this sense, our criterion is slightly better; see Lemma 3.1 and Example 1.

Lemma 3.1. Let $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{W}$ be pairwise disjoint sets of vertices in a DAG $\mathcal{D}$.
If $\mathbf{W}$ satisfies Pearl's back-door criterion (Definition 3.8) relative to $(\mathbf{X}, \mathbf{Y})$ and $\mathcal{D}$, then $\mathbf{W}$ satisfies the generalized back-door criterion (Definition 3.7) relative to $(\mathbf{X}, \mathbf{Y})$ and $\mathcal{D}$.

4. Finding a set that satisfies the generalized back-door criterion. An important reason for the popularity of Pearl's back-door criterion is the following. Consider two distinct vertices $X$ and $Y$ in a DAG $\mathcal{D}$. Then $\mathrm{pa}(X, \mathcal{D})$ satisfies the back-door criterion relative to $(X, Y)$ and $\mathcal{D}$, unless $Y \in \mathrm{pa}(X, \mathcal{D})$. In the latter case, there is no set that satisfies the back-door criterion relative to $(X, Y)$ and $\mathcal{D}$, but it is easy to see that $f(y \mid \mathrm{do}(x)) = f(y)$ for any density $f$ compatible with $\mathcal{D}$, since there cannot be a directed path from $X$ to $Y$ in $\mathcal{D}$.

In this section, we formulate similar results for the generalized back-door criterion. In particular, we consider the following problem. Given two distinct vertices $X$ and $Y$ in a DAG, CPDAG, MAG or PAG, can we easily determine if there exists a generalized back-door set relative to $(X, Y)$ and the given graph? Moreover, if this question is answered positively, can we give an explicit set that satisfies the criterion? Theorem 4.1 addresses these questions in general, while Corollaries 4.1–4.3 give specific results for DAGs, CPDAGs and MAGs.

We emphasize that throughout this section, we focus on the setting with a single intervention variable $X$ and a single variable of interest $Y$. The setting with multiple interventions (i.e., a set $\mathbf{X}$) is considerably more difficult, even for DAGs [Shpitser, Van der Weele and Robins (2010a)]. It therefore seems challenging to generalize the results in this section to sets $\mathbf{X}$. Handling sets $\mathbf{Y}$ seems less difficult, and we plan to study this in future work.

In a DAG, the following result is well known.
If $X$ and $Y$ are not adjacent in a DAG $\mathcal{D}$ and $X \notin \mathrm{an}(Y, \mathcal{D})$, then $\mathrm{pa}(X, \mathcal{D})$ blocks all paths between $X$ and $Y$. In MAGs, we have a similar result, but we need to use $\text{D-SEP}(X, Y, \mathcal{M})$ instead of the parent set; see Definition 4.1 and Lemma 4.1.

Definition 4.1 [$\text{D-SEP}(X, Y, \mathcal{G})$; cf. page 136 of Spirtes, Glymour and Scheines (2000)]. Let $X$ and $Y$ be two distinct vertices in a mixed graph $\mathcal{G}$. We say that $V \in \text{D-SEP}(X, Y, \mathcal{G})$ if $V \neq X$, and there is a collider path between $X$ and $V$ in $\mathcal{G}$, such that every vertex on this path (including $V$) is an ancestor of $X$ or $Y$ in $\mathcal{G}$.

Lemma 4.1. Let $X$ and $Y$ be two distinct vertices in an ancestral graph $\mathcal{G}$. Then the following statements are equivalent: (i) $X$ and $Y$ are m-separated in $\mathcal{G}$ by some subset of the remaining variables, (ii) $Y \notin \text{D-SEP}(X, Y, \mathcal{G})$, and (iii) $X$ and $Y$ are m-separated in $\mathcal{G}$ by $\text{D-SEP}(X, Y, \mathcal{G})$. Moreover, if $\mathcal{G}$ is a MAG, a fourth equivalent statement is (iv) $X$ and $Y$ are not adjacent in $\mathcal{G}$.

We now introduce important definitions that are needed to formulate our generalized back-door criterion in Theorem 4.1.

Definition 4.2 ($\mathcal{R}^*$ and $\mathcal{R}_X$). Let $X$ be a vertex in $\mathcal{G}$, where $\mathcal{G}$ is a DAG, CPDAG, MAG or PAG. Let $\mathcal{R}^* = \mathcal{R}^*(\mathcal{G}, X)$ be a class of DAGs or MAGs, defined as follows. If $\mathcal{G}$ is a DAG or a MAG, we simply let $\mathcal{R}^* = \{\mathcal{G}\}$. If $\mathcal{G}$ is a CPDAG/PAG, we let $\mathcal{R}^*$ be the subclass of DAGs/MAGs in the Markov equivalence class described by $\mathcal{G}$ that have the same number of edges into $X$ as $\mathcal{G}$. For any $\mathcal{R} \in \mathcal{R}^*$, let $\mathcal{R}_X = \mathcal{R}_X(\mathcal{R}, \mathcal{G}, X)$ be the graph obtained from $\mathcal{R}$ by removing all directed edges out of $X$ that are visible in $\mathcal{G}$; see Definition 3.1.

For any given $\mathcal{G}$ and $X$, we say that a graph $\mathcal{R}_X$ satisfies Definition 4.2 if there exists an $\mathcal{R} \in \mathcal{R}^*(\mathcal{G}, X)$ such that $\mathcal{R}_X = \mathcal{R}_X(\mathcal{R}, \mathcal{G}, X)$. Lemma 7.6 shows that the class $\mathcal{R}^*$ is always nonempty.
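Definition 4.1 can be turned into a short search over collider paths. The Python sketch below is a hedged illustration, not the routine used in pcalg: the mark-table encoding, the helper names (`make_graph`, `ancestors`, `d_sep`) and the four-vertex example graph are assumptions introduced here. It enumerates simple paths starting at $X$, keeps only vertices that are ancestors of $X$ or $Y$, and requires every nonendpoint vertex to have arrowheads on both incident path edges.

```python
def make_graph(edges):
    """marks[(a, b)] is the mark ("tail" or "arrow") at b on the edge a - b."""
    marks = {}
    for a, mark_a, mark_b, b in edges:
        marks[(a, b)] = mark_b
        marks[(b, a)] = mark_a
    return marks

def ancestors(marks, targets):
    """Vertices with a directed path to some target (targets included)."""
    seen, stack = set(targets), list(targets)
    while stack:
        u = stack.pop()
        for (a, b), mark_at_b in marks.items():
            # a -> u requires an arrowhead at u and a tail at a.
            if b == u and mark_at_b == "arrow" and marks[(u, a)] == "tail" and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen

def d_sep(marks, x, y):
    """D-SEP(x, y, G) for a mixed graph, by depth-first search over collider paths."""
    anc = ancestors(marks, {x, y})
    result = set()

    def extend(path):
        last = path[-1]
        for (a, b) in marks:
            if a != last or b in path or b not in anc:
                continue
            # Once the path has a nonendpoint vertex, `last` must be a collider:
            # arrowheads at `last` on the edge from its predecessor and on the edge to b.
            if len(path) > 1 and not (marks[(path[-2], last)] == "arrow"
                                      and marks[(b, last)] == "arrow"):
                continue
            result.add(b)
            extend(path + [b])

    extend([x])
    return result

# Hypothetical mixed graph: X <-> A, B -> A, A -> Y, B -> Y.
g = make_graph([
    ("X", "arrow", "arrow", "A"),
    ("B", "tail", "arrow", "A"),
    ("A", "tail", "arrow", "Y"),
    ("B", "tail", "arrow", "Y"),
])
print(sorted(d_sep(g, "X", "Y")))
```

In this made-up graph the search returns $\{A, B\}$: $A$ is adjacent to $X$ and an ancestor of $Y$, and $B$ is reached through the collider $A$. The vertex $Y$ is not in the set, which is consistent with statement (ii) of Lemma 4.1, since $\{A, B\}$ blocks both paths between $X$ and $Y$. The exhaustive path enumeration is only meant for small graphs.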
The definition of R_X is related to the X-lower manipulated MAGs that were used by Zhang (2008a). It is important to note, however, that R_X is obtained from R by removing the edges out of X that are visible in G (rather than R). Moreover, Zhang replaced invisible edges by bi-directed edges, but that is not needed for our purposes (although it would not hurt to do so). Finally, we note that R_X is ancestral, since any R ∈ R* is ancestral.

We can now present the main result of this section.

Theorem 4.1 (Generalized back-door set). Let X and Y be two distinct vertices in G, where G is a DAG, CPDAG, MAG or PAG. Let R_X be any graph satisfying Definition 4.2. Then there exists a generalized back-door set relative to (X, Y) and G if and only if Y ∉ adj(X, R_X) and D-SEP(X, Y, R_X) ∩ possibleDe(X, G) = ∅. Moreover, if such a generalized back-door set exists, then D-SEP(X, Y, R_X) is such a set.

The definitions of R* and R_X in Definition 4.2 are needed in Theorem 4.1 to ensure that D-SEP(X, Y, R_X) ∩ possibleDe(X, G) ≠ ∅ implies that there does not exist a generalized back-door set relative to (X, Y) and G; see also Example 8.

For DAGs, CPDAGs and MAGs we can simplify Theorem 4.1 somewhat; see Corollaries 4.1–4.3. Corollary 4.1 is the well-known result for DAGs that we discussed earlier. Corollary 4.3 is given without proof, since it follows straightforwardly from Theorem 4.1.

Corollary 4.1 (Generalized back-door set for a DAG). Let X and Y be two distinct vertices in a DAG D. Then there exists a generalized back-door set relative to (X, Y) and D if and only if Y ∉ pa(X, D). Moreover, if such a generalized back-door set exists, then pa(X, D) is such a set.

Corollary 4.2 (Generalized back-door set for a CPDAG). Let X and Y be two distinct vertices in a CPDAG C.
Let C_X be the graph obtained from C by removing all directed edges out of X. Then there exists a generalized back-door set relative to (X, Y) and C if and only if Y ∉ pa(X, C) and Y ∉ possibleDe(X, C_X). Moreover, if such a generalized back-door set exists, then pa(X, C) is such a set.

Corollary 4.3 (Generalized back-door set for a MAG). Let X and Y be two distinct vertices in a MAG M. Then there exists a generalized back-door set relative to (X, Y) and M if and only if Y ∉ adj(X, M_X) and D-SEP(X, Y, M_X) ∩ de(X, M) = ∅. Moreover, if such a generalized back-door set exists, then D-SEP(X, Y, M_X) is such a set.

5. Examples. We now give several examples to illustrate the theory for DAGs, CPDAGs, MAGs and PAGs.

5.1. DAG examples. We start with an example that shows that the generalized back-door criterion for DAGs is weaker than Pearl's back-door criterion for DAGs, in the sense that it can happen that there is no set that satisfies Pearl's back-door criterion, while there is a set that satisfies the generalized back-door criterion.

Example 1. Consider the DAG D in Figure 2(a) with 𝐗 = {X_1, X_3, X_4} and 𝐘 = {Y}. We first show that 𝐖 = ∅ is a generalized back-door set relative to (𝐗, 𝐘) and D. Note that we cannot use Theorem 4.1 since 𝐗 is a set. We therefore work with Definition 3.7 directly. We only need to check that the back-door path from X_4 to Y is blocked by 𝐖 ∪ 𝐗 \ {X_4} = {X_1, X_3}, which is the case since X_3 is a noncollider on the path. Indeed, we have that f(y | do(x_1, x_3, x_4)) = f(y | x_1, x_3, x_4) in Figure 2(a), which can be further simplified to f(y | x_3).

On the other hand, there is no set that satisfies Pearl's back-door criterion (Definition 3.8) with respect to (𝐗, 𝐘).
To see this, note that {X_2, X_3, X_4} ⊆ de(X_1, D). Hence, the only possible candidate set is 𝐖 = ∅. But this set does not block the back-door path from X_4 to Y, since there is no collider on this path.

Fig. 2. DAG examples. (a) The DAG D for Example 1. (b) The DAG D for Example 3.

Next, we note that the generalized back-door criterion is not necessary for identifying post-intervention distributions. Two simple examples are given below.

Example 2. Let X and Y be two distinct vertices in G, where G represents a DAG, CPDAG, MAG or PAG. If X ←* Y in G, then Y ∈ adj(X, R_X) for any R_X satisfying Definition 4.2. Hence Theorem 4.1 implies that there does not exist a generalized back-door set relative to (X, Y) and G. On the other hand, it is clear that f(y | do(x)) = f(y) for any density f compatible with G, since the edge X ←* Y implies that there cannot be a possibly directed path from X to Y in G; see Lemma 7.5 below.

Example 3. Let D be the DAG in Figure 2(b), and let 𝐗 = {X_1, X_2} and 𝐘 = {Y}. Then there does not exist a generalized back-door set relative to (𝐗, 𝐘) and D. To see this, note that the only candidate variable Z cannot be used, since it is a descendant of X_1. Moreover, 𝐖 = ∅ violates condition (B-ii) in Definition 3.7 for X_2, since 𝐖 ∪ 𝐗 \ {X_2} = {X_1} does not block the back-door path X_2 ← Z → Y. On the other hand, f(y | do(x_1, x_2)) = ∫ f(z | x_1) f(y | x_2, z) dz for any density f compatible with D, by the g-formula.

5.2. CPDAG examples. We now illustrate the theory for CPDAGs. In Example 5, there is a set that satisfies the generalized back-door criterion, while in Example 4 there is none.

Example 4. In the CPDAG C in Figure 3(a), f(y | do(x)) is not identifiable.
To see this, note that the Markov equivalence class represented by this CPDAG contains three DAGs. Without loss of generality, we denote these by D_1, D_2 and D_3, where we assume that D_1 contains the sub-graph X ← V_2 → Y, D_2 contains the sub-graph X ← V_2 ← Y, and D_3 contains the sub-graph X → V_2 → Y. In D_1 and D_2 there is no directed path from X to Y, so that f(y | do(x)) = f(y) for any density f compatible with D_1 or D_2. In D_3, however, there is a directed path from X to Y. Hence, one can easily construct a density f that is compatible with D_3 such that f(y | do(x)) ≠ f(y). This implies that f(y | do(x)) is not identifiable, and hence that there cannot be a generalized back-door set relative to (X, Y) and C.

Fig. 3. CPDAG examples. (a) The CPDAG C for Example 4. (b) The CPDAG C′ for Example 5.

We now apply Theorem 4.1 to the CPDAG C to check if this leads to the same conclusion. Note that G = C and R* = {D_3}. Hence, we take R = D_3 and the corresponding R_X = D_3. We then have D-SEP(X, Y, R_X) = {V_1, V_2, V_3} and possibleDe(X, G) = {V_2, Y}. Hence, D-SEP(X, Y, R_X) ∩ possibleDe(X, G) = {V_2}, and Theorem 4.1 correctly says that it is impossible to satisfy the generalized back-door criterion relative to (X, Y) and C.

Finally, we check if Corollary 4.2 also yields the same result. Note that C_X = C and Y ∈ possibleDe(X, C_X) = {V_2, Y}. Hence, we again find that it is impossible to satisfy the generalized back-door criterion relative to (X, Y) and C.

Example 5. In the CPDAG C′ in Figure 3(b), f(y | do(x)) is identifiable and equals f(y), since there is no possibly directed path from X to Y in C′. We now check if we also arrive at this conclusion by applying Theorem 4.1.
Note that there are two DAGs in the Markov equivalence class described by C′, namely D′_1 with the edge X → V_2 and D′_2 with the edge X ← V_2. Thus in Theorem 4.1, we have G = C′ and R* = {D′_1}. Hence we take R = D′_1 and the corresponding R_X = D′_1. Note that Y ∉ adj(X, R_X) = {V_1, V_2, V_3} and D-SEP(X, Y, R_X) = {V_1, V_3} and possibleDe(X, G) = {V_2, V_4}. Hence, D-SEP(X, Y, R_X) ∩ possibleDe(X, G) = ∅, and D-SEP(X, Y, R_X) = {V_1, V_3} satisfies the generalized back-door criterion relative to (X, Y) and C′. We can indeed check that the set {V_1, V_3} satisfies the conditions in Definition 3.7.

Finally, we also apply Corollary 4.2. Note that C′_X = C′. Moreover, Y ∉ pa(X, C′) and Y ∉ possibleDe(X, C′_X). Hence, pa(X, C′) = {V_1, V_3} satisfies the generalized back-door criterion relative to (X, Y) and C′.

5.3. MAG examples. Next, we illustrate the theory for MAGs. In Examples 6 and 7, there does not exist a generalized back-door set relative to (X, Y) and the given MAGs. In Example 6, this is due to Y ∈ adj(X, M_X), while in Example 7, it is due to D-SEP(X, Y, M_X) ∩ de(X, M) ≠ ∅.

Example 6. Consider the MAG M consisting of the invisible edge X → Y, and suppose we are interested in f(y | do(x)). Then the underlying DAG could be as in Figure 4(a), where L is unobserved. This is a well-known example where f(y | do(x)) is not identifiable.

Fig. 4. MAG examples. (a) A possible DAG described by the MAG in Example 6, where L is latent. (b) The MAG M for Example 7.

We now apply Corollary 4.3 to check if we indeed find that it is impossible to satisfy the generalized back-door criterion relative to (X, Y) and M. We have that M = M_X is the graph X → Y. Hence, Y ∈ adj(X, M_X), which leads to the correct conclusion.
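The non-identifiability in Example 6 can be made concrete with a small numerical sketch. We assume, as our reading of Figure 4(a), the DAG with edges L → X, L → Y and X → Y, and we assume a zero-mean linear Gaussian model X = L + e_X, Y = bX + cL + e_Y. The function name and all parameter values below are hypothetical choices made for illustration. The two parameter settings give the same covariance matrix for the observed pair (X, Y), and hence the same observational density, but different values of b, which is the slope of E[Y | do(x)] in this model.

```python
from fractions import Fraction as F


def observed_moments(b, c, var_l, var_ex, var_ey):
    """Second moments of (X, Y) for X = L + e_X and Y = b*X + c*L + e_Y."""
    var_x = var_l + var_ex      # L and e_X are independent
    cov_xl = var_l              # Cov(X, L) = Var(L)
    cov_xy = b * var_x + c * cov_xl
    var_y = b * b * var_x + c * c * var_l + 2 * b * c * cov_xl + var_ey
    return var_x, cov_xy, var_y


# Two hypothetical parameter settings with different causal coefficients b.
model_1 = dict(b=F(1), c=F(0), var_l=F(1), var_ex=F(1), var_ey=F(1))
model_2 = dict(b=F(1, 2), c=F(1), var_l=F(1), var_ex=F(1), var_ey=F(1, 2))

moments_1 = observed_moments(**model_1)
moments_2 = observed_moments(**model_2)
same_observational_law = moments_1 == moments_2
different_causal_effect = model_1["b"] != model_2["b"]
```

Exact rational arithmetic is used so that the equality of the two covariance matrices is not obscured by rounding. Since both models are zero-mean Gaussian, equal second moments mean that no function of the observational density of (X, Y) can recover b.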
Example 7. Consider the MAG M in Figure 4(b) and apply Corollary 4.3 with 𝐗 = {X} and 𝐘 = {Y}. Since the edge X → V_3 is visible, M_X is constructed from M by removing this edge. We then have D-SEP(X, Y, M_X) = {V_1, V_2, V_3} and de(X, M) = {V_3, V_5, Y}. Hence the intersection of de(X, M) and D-SEP(X, Y, M_X) is nonempty, and it follows that there is no generalized back-door set relative to (X, Y) and M.

Indeed, we see that it is impossible to satisfy conditions (B-i) and (B-ii) in Definition 3.7. In order to block the back-door path ⟨X, V_2, V_4, Y⟩, we must include V_2 or V_4 in our set 𝐖, but doing so opens the collider V_2 on the back-door path ⟨X, V_2, V_3, V_5, Y⟩. Hence, the latter path must be blocked by V_3 or V_5. But both these vertices are descendants of X in M, and are therefore not allowed by condition (B-i).

5.4. PAG example. Finally, Example 8 is an example where there exists a generalized back-door set relative to some (X, Y) and a PAG. This example also illustrates that there may be subsets of D-SEP(X, Y, R_X) in Theorem 4.1 that satisfy the generalized back-door criterion. In other words, Theorem 4.1 may yield a nonminimal set. Hence, if one is interested in a minimal generalized back-door set, one could consider all subsets of D-SEP(X, Y, R_X). Example 8 also illustrates why R_X is required to satisfy Definition 4.2.

Example 8. Consider the PAG P in Figure 5(a), and suppose we are interested in f(y | do(x)). Note that the MAG R = M as given in Figure 5(b) is in R*; see Definition 4.2. We will apply Theorem 4.1 using the corresponding graph R_X, which is as M but without the edge X → Y. We then have Y ∉ adj(X, R_X) and D-SEP(X, Y, R_X) ∩ possibleDe(X, G) = {V_1, V_2} ∩ {V_3, V_4, Y} = ∅. Hence Theorem 4.1 implies that {V_1, V_2} is a generalized back-door set relative to (X, Y) and P. One can easily verify that all subsets of {V_1, V_2} are also generalized back-door sets relative to (X, Y) and P, since all back-door paths from X to Y are blocked by the collider V_4 on these paths. This shows that D-SEP(X, Y, R_X) is not minimal.

Fig. 5. PAG example. (a) The PAG P for Example 8. (b) A possible MAG M for Example 8.

This example also shows the importance of Definition 4.2. To see this, let R′ be as R, but with the edge X ← V_3 instead of X → V_3, so that there is an additional edge into X. Then D-SEP(X, Y, R′_X) = {V_1, V_2, V_3}, and we get D-SEP(X, Y, R′_X) ∩ possibleDe(X, G) = {V_3} ≠ ∅. This shows that applying Theorem 4.1 with R′_X instead of R_X leads to incorrect results.

6. Discussion. In this paper, we generalize Pearl's back-door criterion [Pearl (1993)] to a generalized back-door criterion for DAGs, CPDAGs, MAGs and PAGs. We also provide easily checkable necessary and sufficient criteria for the existence of a generalized back-door set, when considering a single intervention variable and a single outcome variable. Moreover, if such a set exists, we provide an explicit set that satisfies the generalized back-door criterion. This set is not necessarily minimal, so if one is interested in a minimal set, one could consider all subsets.

Although effects that can be computed via the generalized back-door criterion are only a subset of all identifiable causal effects, we hope that the generalized back-door criterion will be useful in practice, and will make it easier to work with CPDAGs, MAGs and PAGs.
Moreover, combining our results for CPDAGs and PAGs with fast causal structure learning algorithms such as the PC algorithm [Spirtes, Glymour and Scheines (2000)] or the FCI algorithm [Spirtes, Glymour and Scheines (2000), Colombo et al. (2012), Claassen, Mooij and Heskes (2013)] yields a computationally efficient way to obtain information on causal effects when assuming that the observational distribution is faithful to the true unknown causal DAG with or without hidden variables. To our knowledge, the prediction algorithm of Spirtes, Glymour and Scheines (2000) is the only alternative approach under the same assumptions, but the prediction algorithm is computationally much more complex.

The IDA algorithm [Maathuis, Kalisch and Bühlmann (2009), Maathuis et al. (2010)] has been designed to obtain bounds on causal effects when assuming that the observational distribution is faithful to the true underlying causal DAG without hidden variables. IDA roughly combines the PC algorithm with Pearl's back-door criterion. We could now apply a similar approach in the setting with hidden variables, by combining the FCI algorithm with the generalized back-door criterion for MAGs.

Possible directions for future work include studying the exact relationship between the prediction algorithm and our generalized back-door criterion, generalizing the results in Section 4 to allow for sets 𝐗 and 𝐘 and extending the recent results of Van der Zander, Liśkiewicz and Textor (2014) to CPDAGs and PAGs.

7. Proofs.

7.1. Proofs for Section 3. In order to prove Theorem 3.1, we formulate so-called invariance conditions that will turn out to be sufficient for adjustment; see Definition 7.1 and Theorem 7.1 below.

First, we briefly define what is meant by invariance. We refer to Zhang (2008a) for full details.
Let 𝐘, 𝐙 and 𝐗 be three subsets of vertices in a causal DAG D, where 𝐗 ∩ 𝐘 = 𝐘 ∩ 𝐙 = ∅. Then a density f(𝐲 | 𝐳) is said to be entailed to be invariant under interventions on 𝐗 given D if f_{𝐗:=𝐱}(𝐲 | 𝐳) = f(𝐲 | 𝐳) for all causal Bayesian networks (D, f), where the subscript 𝐗 := 𝐱 denotes do(𝐗 = 𝐱). (This notation is used since 𝐗 and 𝐙 are allowed to overlap.) The density f(𝐲 | 𝐳) is said to be entailed to be invariant under interventions on 𝐗 given a CPDAG C, a MAG M or a PAG P if it is entailed to be invariant under interventions on 𝐗 given all DAGs represented by C, M or P, respectively.

Definition 7.1 (Invariance criterion). Let 𝐗, 𝐘 and 𝐖 be pairwise disjoint sets of vertices in G, where G is a DAG, CPDAG, MAG or PAG. Then 𝐖 satisfies the invariance criterion relative to (𝐗, 𝐘) and G if the following two conditions hold for any density f compatible with G:

(I-i) f(𝐰 | do(𝐱)) = f(𝐰);
(I-ii) f(𝐲 | do(𝐱), 𝐰) = f(𝐲 | 𝐱, 𝐰).

In other words, conditions (I-i) and (I-ii) state that f(𝐰) and f(𝐲 | 𝐱, 𝐰) are entailed to be invariant under interventions on 𝐗 given G. The conditions are also closely related to the conditions in equation (9) of Pearl (1993). We note that condition (I-i) is trivially satisfied if 𝐖 = ∅.

Theorem 7.1. Let 𝐗, 𝐘 and 𝐖 be pairwise disjoint sets of vertices in G, where G is a DAG, CPDAG, MAG or PAG. If 𝐖 satisfies the invariance criterion relative to (𝐗, 𝐘) and G, then it satisfies the adjustment criterion relative to (𝐗, 𝐘) and G.

Proof. If 𝐖 = ∅, condition (I-ii) immediately gives f(𝐲 | do(𝐱)) = f(𝐲 | 𝐱). Otherwise, we have

(1)    f(𝐲 | do(𝐱)) = ∫_𝐰 f(𝐲, 𝐰 | do(𝐱)) d𝐰 = ∫_𝐰 f(𝐲 | 𝐰, do(𝐱)) f(𝐰 | do(𝐱)) d𝐰.

Under conditions (I-i) and (I-ii), the right-hand side of (1) simplifies to ∫_𝐰 f(𝐲 | 𝐰, 𝐱) f(𝐰) d𝐰.

Spirtes, Glymour and Scheines (1993, 2000), Zhang (2008a) formulated invariance results for DAGs, MAGs and PAGs. We derive a similar result for CPDAGs and then summarize the results for all these types of graphs in Theorem 7.2.

Theorem 7.2 (Graphical criteria for invariance). Let 𝐗, 𝐘, 𝐙 be three subsets of observed vertices in G, where G represents a DAG, CPDAG, MAG or PAG. Moreover, let 𝐗 ∩ 𝐘 = 𝐘 ∩ 𝐙 = ∅. Then f(𝐲 | 𝐳) is entailed to be invariant under interventions on 𝐗 given G if and only if:

(1) for every X ∈ 𝐗 ∩ 𝐙, every m-connecting definite status path, if any, between X and any member of 𝐘 given 𝐙 \ {X} is out of X with a visible edge;
(2) for every X ∈ 𝐗 ∩ (possibleAn(𝐙, G) \ 𝐙), there is no m-connecting definite status path between X and any member of 𝐘 given 𝐙;
(3) for every X ∈ 𝐗 \ possibleAn(𝐙, G), every m-connecting definite status path, if any, between X and any member of 𝐘 given 𝐙 is into X.

Proof. One can easily check that the conditions reduce to the appropriate conditions for DAGs, MAGs and PAGs [Zhang (2008a), Proposition 18, Theorem 24 and Theorem 30]. The result for CPDAGs can be proved analogously.

Note that 𝐗 ∩ 𝐙, 𝐗 ∩ (possibleAn(𝐙, G) \ 𝐙) and 𝐗 \ possibleAn(𝐙, G) form a partition of 𝐗. Hence, only one of the conditions in Theorem 7.2 is relevant for a given X ∈ 𝐗.

We also need the following basic property of PAGs and CPDAGs:

Lemma 7.1 [Basic property of CPDAGs and PAGs; Lemma 1 of Meek (1995) for CPDAGs, and Lemma 3.3.1 of Zhang (2006) for PAGs]. For any three vertices A, B and C in a CPDAG C or PAG P, the following holds: if A *→ B o–* C, then there is an edge between A and C with an arrowhead at C, namely A *→ C.
Furthermore, if the edge between A and B is A → B, then the edge between A and C is either A o→ C or A → C (i.e., not A ↔ C).

We now show that the invariance conditions in Definition 7.1 are equivalent to the graphical conditions of Definition 3.7.

Theorem 7.3. The generalized back-door criterion (Definition 3.7) is equivalent to the invariance criterion (Definition 7.1).

Proof. We first show that condition (B-ii) of Definition 3.7 is equivalent to condition (I-ii) of Definition 7.1. We use Theorem 7.2 with (𝐗′, 𝐘′, 𝐙′), where 𝐗′ = 𝐗, 𝐘′ = 𝐘 and 𝐙′ = 𝐗 ∪ 𝐖. Then 𝐗′ ⊆ 𝐙′, and clause (1) of the theorem yields that (I-ii) is equivalent to the following: for every X ∈ 𝐗, every m-connecting definite status path, if any, between X and any member of 𝐘 given (𝐗 ∪ 𝐖) \ {X} is out of X with a visible edge. This is equivalent to condition (B-ii) by our definition of a back-door path; see Definition 3.2.

By Lemma 7.2 (below), condition (B-i) of Definition 3.7 is equivalent to condition (B-i)′ of Remark 3.1.

We now show that condition (B-i)′ is equivalent to condition (I-i) in Definition 7.1. We use Theorem 7.2 with (𝐗′, 𝐘′, 𝐙′), where 𝐗′ = 𝐗, 𝐘′ = 𝐖 and 𝐙′ = ∅. Then 𝐙′ = possibleAn(𝐙′, G) = ∅ and clause (3) of the theorem yields that (I-i) is equivalent to the following condition (I-i)′: for every X ∈ 𝐗, every m-connecting definite status path, if any, between X and any member of 𝐖 is into X. We now show that (I-i)′ is equivalent to (B-i)′.

First suppose that 𝐖 violates (B-i)′. Then there are W ∈ 𝐖 and X ∈ 𝐗 such that there is a possibly directed definite status path p from X to W. Since p is possibly directed, it is not into X and it cannot contain colliders. Hence, it is an m-connecting definite status path between X and W that is not into X. This violates (I-i)′.
Now suppose that 𝐖 violates (I-i)′. Then there are W ∈ 𝐖 and X ∈ 𝐗 such that there is an m-connecting definite status path between X and W that is not into X. Let p = ⟨X = U_1, ..., U_k = W⟩ be such a path. Then every nonendpoint vertex on p must be a definite noncollider. Suppose that p is not a possibly directed path from X to W, meaning that there exists an i ∈ {2, ..., k} such that the edge between U_{i−1} and U_i is into U_{i−1}. If i = 2, this means that the path is into X, which is a contradiction. If i > 2, then the edge between U_{i−2} and U_{i−1} must be out of U_{i−1}, since U_{i−1} is a definite noncollider. But this means that the edge must be into U_{i−2}, since edges of the form o– or – are not allowed. Continuing this argument, we find that for all j ∈ {2, ..., i}, the edge between U_{j−1} and U_j is into U_{j−1}. But this means that the path is into U_1 = X, which is a contradiction. Hence, p is a possibly directed path from X to W. Together with the fact that p is of a definite status, this violates (B-i)′.

Lemma 7.2. Let X and Y be two distinct vertices in G, where G is a DAG, CPDAG, MAG or PAG. If Y ∈ possibleDe(X, G), then there is a possibly directed definite status path p = ⟨X = U_1, ..., U_k = Y⟩ from X to Y. Moreover, if U_{i−1} *→ U_i for some i ∈ {2, ..., k}, then U_{j−1} → U_j for all j ∈ {i + 1, ..., k}.

Proof. If G is a DAG or a MAG, the lemma is trivially true. So let G be a CPDAG or a PAG, and assume that Y ∈ possibleDe(X, G). This implies that there is a possibly directed path from X to Y in G. Let p = ⟨X = U_1, ..., U_k = Y⟩ be a shortest such path. If p is of length one, then the lemma is trivially true. So assume that the length of p is at least two, that is, k ≥ 3. We first show that p is a definite status path.
Note that p can contain the following edges: U_{i−1} o–o U_i, U_{i−1} o→ U_i and U_{i−1} → U_i (i = 2, ..., k). We now consider a sub-path p(U_{i−1}, U_{i+1}) = ⟨U_{i−1}, U_i, U_{i+1}⟩ of p, for some i ∈ {2, ..., k − 1}.

This sub-path cannot be of the form U_{i−1} o→ U_i o–o U_{i+1} or U_{i−1} o→ U_i o→ U_{i+1}. To see this, suppose that the sub-path takes such a form. Then Lemma 7.1 implies the edge U_{i−1} *→ U_{i+1}. Suppose that this edge is into U_{i−1}; that is, it is U_{i−1} ↔ U_{i+1}. Then Lemma 7.1 applied to U_{i+1} ↔ U_{i−1} o→ U_i implies the edge U_{i+1} *→ U_i, which is a contradiction. If the edge U_{i−1} *→ U_{i+1} is not into U_{i−1}, then p is not a shortest possibly directed path.

Similarly, the sub-path cannot be of the form U_{i−1} → U_i o–o U_{i+1} or U_{i−1} → U_i o→ U_{i+1}. To see this, suppose that the sub-path takes such a form. Then Lemma 7.1 implies the edge U_{i−1} o→ U_{i+1} or U_{i−1} → U_{i+1}. In either case, p is not a shortest possibly directed path.

Moreover, if the sub-path is of the form U_{i−1} o–o U_i o–o U_{i+1}, U_{i−1} o–o U_i o→ U_{i+1} or U_{i−1} o–o U_i → U_{i+1}, then it must be unshielded. To see this, suppose that the sub-path takes such a form and is not unshielded. If the edge between U_{i−1} and U_{i+1} is into U_{i−1}, then Lemma 7.1 applied to U_{i+1} *→ U_{i−1} o–o U_i implies the edge U_{i+1} *→ U_i, which is a contradiction. If the edge between U_{i−1} and U_{i+1} is not into U_{i−1}, then p is not a shortest possibly directed path.

Hence, p can only contain triples of the form U_{i−1} o→ U_i → U_{i+1} or U_{i−1} → U_i → U_{i+1}, or of the form U_{i−1} o–o U_i o–o U_{i+1}, U_{i−1} o–o U_i o→ U_{i+1} or U_{i−1} o–o U_i → U_{i+1} where U_{i−1} and U_{i+1} are not adjacent. In all these cases, the middle vertex U_i is a definite noncollider, so that p is a definite status path.
Finally, if U_{i−1} *→ U_i for some i ∈ {2, ..., k}, it follows that U_{j−1} → U_j for all j ∈ {i + 1, ..., k}.

Proof of Theorem 3.1. This follows directly from Theorems 7.1 and 7.3.

Proof of Lemma 3.1. Conditions (P-i) and (B-i) are trivially equivalent for DAGs. We therefore only show that (P-ii) implies (B-ii), by contradiction. Thus, suppose that 𝐖 blocks all back-door paths between X ∈ 𝐗 and Y ∈ 𝐘 in D, but there exist X ∈ 𝐗 and Y ∈ 𝐘 such that there is a back-door path p from X to Y that is not blocked by 𝐖 ∪ 𝐗 \ {X}. This means that: (i) no noncollider on p is in 𝐖 ∪ 𝐗 \ {X}, (ii) all colliders on p have a descendant in 𝐖 ∪ 𝐗 \ {X}, (iii) there is at least one collider on p that has a descendant in 𝐗 \ {X} but not in 𝐖. Among all colliders satisfying (iii), let Q be the one that is closest to Y on p, and let X′ denote a descendant of Q in 𝐗 \ {X}. Then the directed path q(Q, X′) from Q to X′ is m-connecting given 𝐖, since it is a path consisting of noncolliders and none of its vertices are in 𝐖. Moreover, the sub-path p(Q, Y) of p is m-connecting given 𝐖 by construction. But this means that q(X′, Q) ⊕ p(Q, Y) is a back-door path from X′ to Y that is m-connecting given 𝐖. This contradicts (P-ii).

7.2. Proofs for Section 4. We first give several lemmas, starting with a result about m-connection in MAGs. This result basically says that replacing condition (b) in Definition 3.5 by "every collider on the path is an ancestor of some member of 𝐙 ∪ {X, Y}" does not change the m-separation relations in a MAG.

Lemma 7.3 [Richardson (2003), Corollary 1]. Let X and Y be two distinct vertices and 𝐙 be a subset of vertices in a mixed graph M, with 𝐙 ∩ {X, Y} = ∅.
If there is a path between X and Y in M on which no noncollider is in 𝐙 and every collider is in an(𝐙 ∪ {X, Y}, M), then there is a path (not necessarily the same path) m-connecting X and Y given 𝐙 in M.

Proof of Lemma 4.1. Let G be an ancestral graph. First, we note that (iii) trivially implies (i).

Next, we show that (i) implies (ii), or equivalently, that not (ii) implies not (i). Thus, suppose that Y ∈ D-SEP(X, Y, G). Then there is a collider path between X and Y such that every vertex on the path is an ancestor of {X, Y} in G. This path is m-connecting given any subset of the remaining vertices, by Lemma 7.3.

Next, we show that (ii) implies (iii). Suppose that Y ∉ D-SEP(X, Y, G). If there is no path between X and Y in G, then X and Y are trivially m-separated by any subset of the remaining vertices. Thus, assume that there is at least one path between X and Y. Consider an arbitrary such path, and call it p. Since Y ∉ D-SEP(X, Y, G), we have Y ∉ adj(X, G). Hence the length of p must be at least two. We will show that p is blocked by D-SEP(X, Y, G).

Suppose p starts with X ← V. Then V ∈ D-SEP(X, Y, G), since V ∈ an(X, G). Since V is a noncollider on p, this implies that p is blocked by D-SEP(X, Y, G).

Suppose p is of the form X *→ V → ··· → Y. Then V ∈ an(Y, G), so that V ∈ D-SEP(X, Y, G). Since V is a noncollider on p, this implies that p is blocked by D-SEP(X, Y, G).

Suppose p starts with X *→ V → ··· and the sub-path p(V, Y) of p contains at least one collider. Let C be the collider closest to V on p. Then V ∈ an(C, G). If C ∉ an(D-SEP(X, Y, G), G), then p is blocked by D-SEP(X, Y, G). Hence, suppose C ∈ an(D-SEP(X, Y, G), G).
Since any vertex in D-SEP(X, Y, G) is an ancestor of {X, Y} in G, this implies C ∈ an({X, Y}, G) and hence V ∈ an({X, Y}, G) and V ∈ D-SEP(X, Y, G). Since V is a noncollider on p, p is blocked by D-SEP(X, Y, G).

Suppose p is a collider path of the form X *→ ↔ ··· ←* Y. Then at least one of the colliders is not in an({X, Y}, G), since otherwise Y ∈ D-SEP(X, Y, G). Let C be the collider closest to X on p that is not in an({X, Y}, G). Then C ∉ an(D-SEP(X, Y, G), G). Hence, p is blocked by D-SEP(X, Y, G).

Suppose p is of the form X *→ ↔ ··· ↔ V ← W ··· Y, with W ≠ Y (W = Y was treated in the previous case) and the sub-path p(X, V) is allowed to be of length one (i.e., X *→ V). If W ∈ D-SEP(X, Y, G), then p is blocked by D-SEP(X, Y, G). So suppose that W ∉ D-SEP(X, Y, G). Then there does not exist a collider path between X and W such that each vertex on the path is in an({X, Y}, G). This implies that there is a collider on the sub-path p(X, W) of p that is not in an({X, Y}, G). Among such vertices, let Z be the one that is closest to X on p(X, W). Then Z ∉ an(D-SEP(X, Y, G), G). Hence, p is blocked by D-SEP(X, Y, G).

Finally, if G is a MAG, two vertices are adjacent if and only if no subset of the remaining variables can m-separate them. Hence, (i) and (iv) are equivalent for MAGs.

The following lemma says that we can check the existence of m-connecting definite status back-door paths in G by checking the existence of m-connecting paths in R_X, where R_X is any graph satisfying Definition 4.2. This lemma is closely related to Lemma 5.1.7 of Zhang (2006) and Lemmas 26 and 27 of Zhang (2008a).

Lemma 7.4. Let X and Y be two distinct vertices and 𝐙 be a subset of vertices in G, where G is a DAG, CPDAG, MAG or PAG. Let R_X be any graph satisfying Definition 4.2.
Then there is a definite status m-connecting back-door path from X to Y given 𝐙 in G if and only if there is an m-connecting path between X and Y given 𝐙 in R_X.

Proof. Let R ∈ R* and R_X satisfy Definition 4.2.

We first prove the "only if" statement. Suppose there is a definite status m-connecting back-door path p from X to Y given 𝐙 in G. Let p′ and p′′ be the corresponding paths in R and R_X, consisting of the same sequence of vertices. (Note that p′′ exists by the definition of R_X and the fact that p is a back-door path in G.) Then the path p′ is m-connecting given 𝐙 in R. The path p′′, however, is not necessarily m-connecting in R_X, since it may happen that there is a collider Q on the path such that Q ∈ an(𝐙, R) but Q ∉ an(𝐙, R_X). But this can only occur if Q ∈ an(X, R_X). Hence, p′′ satisfies the following properties: no noncollider on p′′ is in 𝐙 and every collider on p′′ is in an(𝐙 ∪ {X}, R_X). It then follows from Lemma 7.3 that there is an m-connecting path between X and Y given 𝐙 in R_X.

We now prove the "if" statement. Suppose that there is an m-connecting path p′′ between X and Y given 𝐙 in R_X. Let p′ be the corresponding path in R, consisting of the same sequence of vertices. Then p′ is also m-connecting given 𝐙 in R. Moreover, the corresponding path p in G does not start with a visible edge out of X, because p′′ exists in R_X. By Lemma 2′ in the proof of Lemma 5.1.7 of Zhang (2006), it then follows that there exists an m-connecting definite status back-door path between X and Y given 𝐙 in G.

The next lemma is used several times to derive a contradiction.

Lemma 7.5. Let U and V be two distinct vertices in G, where G denotes a DAG, CPDAG, MAG or PAG. Then G cannot have both a possibly directed path from U to V and an edge of the form V *→ U.

Proof.
This lemma is trivial for DAGs and MAGs, since they cannot contain (almost) directed cycles. So we only show the result for CPDAGs and PAGs. Let G denote the CPDAG or PAG, and suppose that G contains an edge of the form V *→ U as well as a possibly directed path from U to V in G. Then there is also a possibly directed definite status path p = ⟨U = U_1, ..., U_k = V⟩ from U to V in G, by Lemma 7.2. The path p has the following properties: if U_{i−1} *→ U_i for some i ∈ {2, ..., k}, then U_{j−1} → U_j for all j ∈ {i + 1, ..., k}, and the length of p must be at least two, because of the edge V *→ U.

If p is fully directed, there is an (almost) directed cycle in any DAG or MAG in the Markov equivalence class described by G, which violates the ancestral property.

Otherwise, if p contains a directed sub-path, let p(U_d, V) be the longest directed sub-path. Then the sub-path p(U, U_d) must be of the form U ∘-∘ ··· ∘-∘ U_d or U ∘-∘ ··· ∘-∘ ∘→ U_d. In either case, the edge V *→ U and repeated applications of Lemma 7.1 imply the edge V *→ U_d. This gives an (almost) directed cycle together with the directed path p(U_d, V) in any DAG or MAG in the Markov equivalence class described by G. This again contradicts the ancestral property.

Otherwise, p does not contain a directed sub-path. Let T be the vertex preceding V on the path. Then the path has one of the following two forms: U ∘-∘ ··· ∘-∘ T ∘-∘ V or U ∘-∘ ··· ∘-∘ T ∘→ V. The edge V *→ U and repeated applications of Lemma 7.1 yield the edge V *→ T, which contradicts T ∘-∘ V or T ∘→ V.

Theorem 4.1 requires a DAG or MAG in R*; see Definition 4.2. The following lemma establishes that such a DAG or MAG exists, since R* is always nonempty. This result is closely related to constructions in Ali et al.
(2005), Theorem 2 of Zhang (2008b) and Lemma 27 of Zhang (2008a).

Lemma 7.6. Let G be a PAG (CPDAG) with k edges into X, k ∈ {0, 1, ...}. Then there exists at least one MAG (DAG) R in the Markov equivalence class represented by G that has k edges into X.

Proof. Building on the work of Meek (1995), Theorem 2 of Zhang (2008b) gives a procedure to create a MAG (DAG) in the Markov equivalence class represented by a PAG (CPDAG) G. One first replaces all partially directed (∘→) edges in G by directed (→) edges. Next, one considers the circle component G^C of G, that is, the sub-graph of G consisting of nondirected (∘-∘) edges, and orients this into a directed graph without directed cycles and unshielded colliders. The first step of this procedure only creates tail marks, and hence cannot yield an additional edge into X. For the second step, we will argue that we can construct such a graph that does not have any edges into X.

First, we note that G^C is chordal; that is, any cycle of length four or more has a chord, which is an edge joining two vertices that are not adjacent in the cycle; see the proof of Lemma 4.1 of Zhang (2008b). Any chordal graph with more than one vertex has two simplicial vertices, that is, vertices V such that all vertices adjacent to V are also adjacent to each other [e.g., Golumbic (1980)]. Hence, G^C must have at least one simplicial vertex that is different from X. We choose such a vertex V_1 and orient any edges incident to V_1 into V_1. Since V_1 is simplicial, this does not create unshielded colliders. We then remove V_1 and these edges from the graph. The resulting graph is again chordal [e.g., Golumbic (1980)] and therefore again has at least one simplicial vertex that is different from X. Choose such a vertex V_2, and orient any edges incident to V_2 into V_2.
We continue this procedure until all edges are oriented. The resulting ordering is called a perfect elimination scheme for G^C. By construction, this procedure yields an acyclic directed graph without unshielded colliders. Moreover, since X is chosen as the last vertex in the perfect elimination scheme, we do not orient any edges into X.

Lemma 7.7. Let X and Y be two distinct vertices in G, where G is a DAG, CPDAG, MAG or PAG. Let R_X be any graph satisfying Definition 4.2. If V ∈ D-SEP(X, Y, R_X) ∩ possibleDe(X, G), then V ∈ an(Y, R_X).

Proof. Let R_X satisfy Definition 4.2, and let V ∈ D-SEP(X, Y, R_X) ∩ possibleDe(X, G). This means that there is a collider path p_1 between X and V in R_X such that every vertex on the path is an ancestor of X or Y in R_X. In particular, V ∈ an({X, Y}, R_X).

We first show that V ∈ pa(X, R_X) leads to a contradiction. Thus, suppose there is an edge X ← V in R_X. By construction of R_X, G then contains an edge of the form X ←∘ V or X ← V, but this forms a contradiction together with V ∈ possibleDe(X, G), by Lemma 7.5.

We now show that V ∈ an(X, R_X) \ pa(X, R_X) leads to a contradiction. Thus suppose there is a directed path from V to X in R_X of the form ⟨V, ..., W, X⟩, where V ≠ W and W ≠ X. By construction of R_X, the edge W → X must also be into X in G, so that G contains W ∘→ X or W → X. Since V ∈ possibleDe(X, G), there is a possibly directed path p_xv from X to V in G. Since R_X contains a directed path from V to W, G must also contain a possibly directed path p_vw from V to W. This implies that p_xv ⊕ p_vw is a possibly directed path from X to W in G, so that W ∈ possibleDe(X, G). But this forms a contradiction with W ∘→ X or W → X in G, by Lemma 7.5. Hence, we must have V ∈ an(Y, R_X).
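As an illustration of the orientation step used in the proof of Lemma 7.6, the following minimal Python sketch orients a chordal undirected graph by repeatedly eliminating a simplicial vertex different from X, so that X is eliminated last. The function names, the adjacency-dictionary representation and the small example graph are assumptions made for this illustration only; they are not taken from the pcalg package, and the sketch presumes that the input graph is chordal.

```python
from itertools import combinations


def is_simplicial(adj, v):
    """Return True if all neighbours of v are pairwise adjacent in adj."""
    return all(b in adj[a] for a, b in combinations(sorted(adj[v]), 2))


def orient_avoiding(adj, x):
    """Orient a chordal undirected graph without creating edges into x.

    adj maps each vertex to the set of its neighbours (a hypothetical
    representation chosen for this sketch). Returns a set of pairs
    (a, b), each read as the directed edge a -> b.
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # working copy
    directed = set()
    while len(remaining) > 1:
        # Pick a simplicial vertex other than x. next() raises
        # StopIteration if none exists, which signals a non-chordal input.
        v = next(u for u in sorted(remaining)
                 if u != x and is_simplicial(remaining, u))
        for nbr in remaining[v]:
            directed.add((nbr, v))       # orient the edge into v
            remaining[nbr].discard(v)    # then drop it from the graph
        del remaining[v]                 # eliminate v
    return directed


# Example (assumed for illustration): triangle X, A, B plus the edge B - C.
example = {
    "X": {"A", "B"},
    "A": {"X", "B"},
    "B": {"X", "A", "C"},
    "C": {"B"},
}
edges = orient_avoiding(example, "X")
```

In this example no edge is oriented into X, and the only vertex with two parents (A, with parents X and B) is shielded because X and B are adjacent.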
We can now prove the main result in Section 4.

Proof of Theorem 4.1. Let R_X satisfy Definition 4.2. We first show that Y ∈ adj(X, R_X) or D-SEP(X, Y, R_X) ∩ possibleDe(X, G) ≠ ∅ implies that there does not exist a generalized back-door set relative to (X, Y) and G, since no set W can satisfy conditions (B-i) and (B-ii) in Definition 3.7. Thus suppose that Y ∈ adj(X, R_X). Then there is a definite status back-door path of length one in G that cannot be blocked. Hence condition (B-ii) cannot be satisfied by any set W. Next, suppose that there exists some vertex V ∈ D-SEP(X, Y, R_X) ∩ possibleDe(X, G). Then there is a collider path p_1 between X and V in R_X such that every vertex on the path is in an({X, Y}, R_X). Moreover, by Lemma 7.7, there is a directed path p_2 from V to Y in R_X. Now consider p = p_1 ⊕ p_2. All nonendpoint vertices on p that are not on p_2 are colliders on p and in an({X, Y}, R_X). The remaining nonendpoint vertices on p are noncolliders and in possibleDe(X, G) [since V ∈ possibleDe(X, G)], so that including them in W violates condition (B-i). It then follows by Lemma 7.3 that for any subset W satisfying condition (B-i), there exists an m-connecting path between X and Y given W in R_X. By Lemma 7.4, this means that we cannot block all definite status back-door paths from X to Y in G without violating condition (B-i).

We now prove the other direction. Thus suppose that Y ∉ adj(X, R_X) and D-SEP(X, Y, R_X) ∩ possibleDe(X, G) = ∅. Then we need to show that D-SEP(X, Y, R_X) satisfies conditions (B-i) and (B-ii) of Definition 3.7. Condition (B-i) is satisfied trivially, since D-SEP(X, Y, R_X) ∩ possibleDe(X, G) = ∅. To prove that condition (B-ii) is satisfied as well, we first show Y ∉ D-SEP(X, Y, R_X), by contradiction.
Thus, suppose Y ∈ D-SEP(X, Y, R_X) ⊆ D-SEP(X, Y, R). By Lemma 4.1, this implies Y ∈ adj(X, R). Since Y ∉ adj(X, R_X), this implies that X → Y in G with a visible edge. But this means that Y ∈ possibleDe(X, G), so that D-SEP(X, Y, R_X) ∩ possibleDe(X, G) ≠ ∅. This is a contradiction, which implies Y ∉ D-SEP(X, Y, R_X). Hence D-SEP(X, Y, R_X) m-separates X and Y in R_X by Lemma 4.1 (we use here that R_X is ancestral). By Lemma 7.4, this implies that D-SEP(X, Y, R_X) blocks all definite status back-door paths from X to Y in G, so that condition (B-ii) is satisfied.

Proof of Corollary 4.1. Although this result for DAGs is well known, we show how one can derive this from Theorem 4.1. Note that D_X is the graph obtained by removing all directed edges out of X from D. Moreover, D-SEP(X, Y, D_X) = pa(X, D) and possibleDe(X, D) = de(X, D). Now the condition Y ∉ adj(X, D_X) is equivalent to Y ∉ pa(X, D). The other condition D-SEP(X, Y, D_X) ∩ possibleDe(X, D) = ∅ reduces to pa(X, D) ∩ de(X, D) = ∅, and this is fulfilled automatically by the acyclicity of D. Hence Theorem 4.1 reduces to the given statement.

Proof of Corollary 4.2. Let D be a DAG in the Markov equivalence class represented by C, constructed without orienting additional edges into X. Let D_X be obtained from D by removing all directed edges out of X that were directed out of X in C. Let C_X be obtained from C by removing all directed edges out of X.

We first show that Y ∈ pa(X, C) or Y ∈ possibleDe(X, C_X) imply Y ∈ adj(X, D_X) or D-SEP(X, Y, D_X) ∩ possibleDe(X, C) ≠ ∅. Thus suppose Y ∈ pa(X, C). Then Y ∈ adj(X, D_X). Next, suppose Y ∈ possibleDe(X, C_X). It can be easily shown that C_X satisfies the basic property of Lemma 7.1, that A → B ∘-∘ C implies A → C (since all edges that are removed are directed edges out of X).
Hence, Lemma 7.2 applies to C_X, and it follows that there is a possibly directed definite status path from X to Y in C_X. All nonendpoint vertices on this path must be definite noncolliders. By construction of C_X, the first edge on this path must be nondirected in C_X, and by construction of D_X, this edge must be oriented out of X in D_X. This implies that the entire path must be directed from X to Y in D_X, since all nonendpoint vertices are noncolliders. Let V be the vertex adjacent to X on the path. Then V ∈ D-SEP(X, Y, D_X). Moreover, V ∈ possibleDe(X, C). Hence D-SEP(X, Y, D_X) ∩ possibleDe(X, C) ≠ ∅.

We now show that D-SEP(X, Y, D_X) ∩ possibleDe(X, C) ≠ ∅ or Y ∈ adj(X, D_X) imply Y ∈ pa(X, C) or Y ∈ possibleDe(X, C_X). Thus suppose Y ∈ adj(X, D_X). Then either X ← Y or X ∘-∘ Y in C. This implies that Y ∈ pa(X, C) or Y ∈ possibleDe(X, C_X). Next, suppose that there exists a vertex V ∈ D-SEP(X, Y, D_X) ∩ possibleDe(X, C). Note that V ∈ D-SEP(X, Y, D_X) implies: (i) V ∈ pa(X, D_X) or (ii) V ∈ ch(X, D_X) ∩ an(Y, D_X) or (iii) V ∈ pa(ch(X, D_X) ∩ an(Y, D_X)). By construction of D_X, case (i) implies V ∈ pa(X, C). But this is in contradiction with V ∈ possibleDe(X, C), by Lemma 7.5. In case (ii), we have X → V and a directed path from V to Y in D_X, so that Y ∈ de(X, D_X). Similarly, we can obtain Y ∈ de(X, D_X) in case (iii). This implies Y ∈ possibleDe(X, C_X) in cases (ii) and (iii).

The above shows the following: if Y ∈ pa(X, C) or Y ∈ possibleDe(X, C_X), then it is impossible to satisfy the generalized back-door criterion relative to (X, Y) and C.
On the other hand, if Y ∉ pa(X, C) and Y ∉ possibleDe(X, C_X), then D-SEP(X, Y, D_X) satisfies the generalized back-door criterion relative to (X, Y) and C.

It is left to show that in the latter case, we can replace D-SEP(X, Y, D_X) by pa(X, C). Since pa(X, C) ⊆ D-SEP(X, Y, D_X), it is clear that pa(X, C) satisfies condition (B-i) of Definition 3.7. We will now show that it also satisfies condition (B-ii). Thus, suppose that Y ∉ pa(X, C) and Y ∉ possibleDe(X, C_X). Consider a definite status back-door path p = ⟨X = U_1, ..., U_k = Y⟩ from X to Y in C. Since p is a back-door path, it must start with X ← U_2 or X ∘-∘ U_2. Moreover, the length of p is at least two. If X ← U_2, then it is clear that pa(X, C) blocks p. If X ∘-∘ U_2, then p cannot have a sub-path of the form U_{i−1} ∘-∘ U_i ← U_{i+1}, i ∈ {2, ..., k − 1}, because U_i is of a definite status. Moreover, p cannot be possibly directed, because then Y ∈ possibleDe(X, C_X). Hence, there must be at least one collider on p. Let Q be the collider on p that is closest to X. Then the sub-path p(X, Q) is a possibly directed path from X to Q in C. Suppose that Q is an ancestor of some vertex W ∈ pa(X, C) in C. Then there is a possibly directed path from X to W in C, as well as an edge W → X. But this is impossible by Lemma 7.5. Hence, Q cannot be an ancestor of any member of pa(X, C) in C. This implies that p is blocked by pa(X, C).

Acknowledgements. We are very grateful to Markus Kalisch, Thomas Richardson and two anonymous referees for their comments and suggestions that have significantly improved the paper.

REFERENCES

Ali, R. A., Richardson, T. S., Spirtes, P. L. and Zhang, J. (2005). Towards characterizing Markov equivalence classes for directed acyclic graphs with latent variables.
In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI 2005) 10–17. AUAI Press, Arlington, VA.
Ali, R. A., Richardson, T. S. and Spirtes, P. (2009). Markov equivalence for ancestral graphs. Ann. Statist. 37 2808–2837. MR2541448
Andersson, S. A., Madigan, D. and Perlman, M. D. (1997). A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist. 25 505–541. MR1439312
Borboudakis, G., Triantafillou, S. and Tsamardinos, I. (2012). Tools and algorithms for causally interpreting directed edges in maximal ancestral graphs. In Proceedings of the 6th European Workshop on Probabilistic Graphical Models (PGM 2012) 35–42. DECSAI, Univ. Granada.
Chickering, D. M. (2002). Learning equivalence classes of Bayesian-network structures. J. Mach. Learn. Res. 2 445–498. MR1929415
Claassen, T., Mooij, J. and Heskes, T. (2013). Learning sparse causal models is not NP-hard. In Proceedings of the 29th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2013) 172–181. AUAI Press, Corvallis, OR.
Colombo, D., Maathuis, M. H., Kalisch, M. and Richardson, T. S. (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist. 40 294–321. MR3014308
de Luna, X., Waernbaum, I. and Richardson, T. S. (2011). Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika 98 861–875. MR2860329
Entner, D., Hoyer, P. O. and Spirtes, P. (2013). Data-driven covariate selection for nonparametric estimation of causal effects. J. Mach. Learn. Res. Workshop Conf. Proc. 31 256–264.
Glymour, C., Scheines, R., Spirtes, P. and Kelly, K. (1987). Discovering Causal Structure: Artificial Intelligence, Philosophy of Science, and Statistical Modeling. Academic Press, San Diego, CA.
Golumbic, M. C. (1980).
Algorithmic Graph Theory and Perfect Graphs. Academic Press, New York.
Huang, Y. and Valtorta, M. (2006). Identifiability in causal Bayesian networks: A sound and complete algorithm. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006) 1149–1154. AAAI Press, Menlo Park, CA.
Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8 613–636.
Kalisch, M., Mächler, M., Colombo, D., Maathuis, M. H. and Bühlmann, P. (2012). Causal inference using graphical models with the R package pcalg. J. Stat. Softw. 47 1–26.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, MA. MR2778120
Maathuis, M. H., Colombo, D., Kalisch, M. and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nat. Methods 7 247–248.
Maathuis, M. H., Kalisch, M. and Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. Ann. Statist. 37 3133–3164. MR2549555
Meek, C. (1995). Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995) 403–410. Morgan Kaufmann, San Francisco, CA.
Pearl, J. (1993). Comment: Graphical models, causality and intervention. Statist. Sci. 8 266–269.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika 82 669–710. MR1380809
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge Univ. Press, Cambridge. MR1744773
Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge. MR2548166
Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Stat. 30 145–157. MR1963898
Richardson, T. and Spirtes, P. (2002). Ancestral graph Markov models. Ann. Statist. 30 962–1030. MR1926166
Richardson, T. S. and Spirtes, P. (2003). Causal inference via ancestral graph models. In Highly Structured Stochastic Systems. Oxford Statist. Sci. Ser. 27 83–113. Oxford Univ. Press, Oxford. MR2082407
Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Math. Modelling 7 1393–1512. MR0877758
Shpitser, I. and Pearl, J. (2006a). Identification of conditional interventional distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI 2006) 437–444. AUAI Press, Corvallis, OR.
Shpitser, I. and Pearl, J. (2006b). Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006) 1219–1226. AAAI Press, Menlo Park, CA.
Shpitser, I. and Pearl, J. (2008). Complete identification methods for the causal hierarchy. J. Mach. Learn. Res. 9 1941–1979. MR2447308
Shpitser, I., Van der Weele, T. and Robins, J. (2010a). On the validity of covariate adjustment for estimating causal effects. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010) 527–536. AUAI Press, Corvallis, OR.
Shpitser, I., Van der Weele, T. and Robins, J. (2010b). Appendum to On the validity of covariate adjustment for estimating causal effects. Personal communication.
Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction, and Search. Lecture Notes in Statistics 81. Springer, New York. MR1227558
Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. MIT Press, Cambridge, MA. MR1815675
Textor, J.
and Liśkiewicz, M. (2011). Adjustment criteria in causal diagrams: An algorithmic perspective. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011) 681–688. AUAI Press, Corvallis, OR.
Tian, J. and Pearl, J. (2002). A general identification condition for causal effects. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI 2002) 567–573. AAAI Press, Menlo Park, CA.
VanderWeele, T. J. and Shpitser, I. (2011). A new criterion for confounder selection. Biometrics 67 1406–1413. MR2872391
Van der Zander, B., Liśkiewicz, M. and Textor, J. (2014). Constructing separators and adjustment sets in ancestral graphs. In Proceedings of the 30th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2014) 907–916. AUAI Press, Corvallis, OR.
Zhang, J. (2006). Causal inference and reasoning in causally insufficient systems. Ph.D. thesis, Carnegie Mellon Univ., Pittsburgh, PA.
Zhang, J. (2008a). Causal reasoning with ancestral graphs. J. Mach. Learn. Res. 9 1437–1474. MR2426048
Zhang, J. (2008b). On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172 1873–1896. MR2459793

Seminar for Statistics
ETH Zurich
Rämistrasse 101
8092 Zurich
Switzerland
E-mail: maathuis@stat.math.ethz.ch
colombo@stat.math.ethz.ch