Direct genetic effects and their estimation from matched case-control data

1 Direct genetic eﬀects and their esti- mation from matc hed case-con trol data Carl o Berzuini 1 , Stijn V ansteela n d t 2 , Luisa F o co 3 , Rob erta P astori no 3 , Luisa Bernar dinelli 3 , 1 1 Stati stical Lab orat ory , Cen tre for Mathemat ical Sciences, Un i v ersity of Cambridge, United King dom 2 Department o f Applied Mathema tics and Comput er Sci ence, Ghen t U niv ersity , Bel gium 3 Department o f Applied Healt h Sciences, Universit y o f Pa via, Italy In genetic asso ciat ion studies, a single mark er is often asso ciated with m ultiple, correlated p he- not yp es (e.g., ob esity and cardio v ascular disease, or nicotine dep endence and lung cancer). A p erv asiv e question is then whether that marker has in dep enden t eﬀects on all p henot yp es. In this article, w e address this question b y assessing whether there is a dir e ct genetic eﬀect on one phenot yp e that is not mediated through the o ther phenot yp es. In particular, we inv estig ate h ow to ident ify and estimate such dir ect genetic eﬀects on the b asis of (matc hed) case-con trol data. W e d iscuss conditions under whic h su c h eﬀects are iden tiﬁable from the av ailable (matc hed) case-co ntrol data. W e ﬁn d that dir ect genetic eﬀects are sometimes estimable via s tandard re- gression m ethods, and sometimes vi a a more general G-e stimation method , whic h has previously b een prop osed for random samples and unmatc hed case-con trol stud ies [37, 39] and is here ex- tended to matc hed case-con trol stud ies. The results are u sed to assess whether the FTO gene is asso ciated with my ocardial infarction other than via an eﬀect on ob esity . 1 In tro duc t ion Asso ciations o f a genetic v arian t with a primary phenot yp e can b e diﬃcult to interpret when one considers the lik ely presence of correlated phenotypes. The genetic asso ciation ma y then b e the indirect result of g enetic eﬀects on a correlated phenotype, whic h sub- sequen tly aﬀ ect the primary phenot yp e. F or instance, Chano c k and Hun ter [5] discuss 2 three genetic asso ciation studies whic h iden tiﬁed an a ssociation b et w een a genetic v aria- tion on c hromosome 15 and the risk of lung cancer, but the studies disagree on whether the link is direct or mediated through nicotine dep endence . Addressing this question ma y b e imp ortant to a b etter understanding of the underlying causal mec hanism. This article addresses t he general problem of inferring the direct eﬀect of a mark er X on a tra it Y (e.g., lung cancer), controlling for a correlated tra it M (e.g., nicotine dep endence), whic h w e will refer to as a m e diating v ariable or mediator. V ansteelandt et al. [39] consider this pro blem in the con text of prosp ectiv e studies of genetic association. Motiv a ted by the frequen t use of ascertained samples in those studies, in this paper w e extend the metho d to matche d case-con trol studies. W e sho w t ha t case-con trol sampling seriously complicates the iden tiﬁcation of direct genetic eﬀects. Progress can b e made within certain classes of statistical mo dels and under sp eciﬁc no unmeasured confo unding assumptions. In particular, w e ﬁnd that, under v ery restrictiv e conditions, direct eﬀects are estimable from case-con trol data b y us ing standard regression metho ds, and that they are es timable under more lenien t conditions b y using special G -estimation metho ds [37], whic h w e here extend to ma t ched case-con trol data . In this pap er, the required conditions for estimabilit y are unam biguously e xpressed as conditional indep endenc e relationships b et w een problem v ariables, whic h w e can chec k on a causal diagram [8, 17, 25 ]. W e illustrate the metho d with the aid of a motiv a ting study , in whic h w e use matc hed case-con trol data to assess whether v ariation in the c hromosomal region of the FTO gene causally aﬀects susceptibilit y to m y o cardial infar ctio n other than via an increase in b o dy ma ss. 2 Motiv ating study FTO is a larg e gene on c hromosome 16, that is highly expressed in t he hypothala mic n uclei that control eating b ehav iour in mice [13]. The ﬁrst in tron of FTO harb ours the single n ucleotide p olymorphism (SNP) rs9939609, asso ciated with b o dy mass [35] and m y o cardial infa r ctio n [1, 13, 1 4, 28 , 32, 3 3, 41 ]. A simple, ten tativ e, in terpretation of the evidence is that genetic v ariation represen ted (or reﬂected) b y rs9939609 ampliﬁes the ob esit y-inducing eﬀect of FTO, t hereby indirectly aﬀ ecting susceptibilit y t o infarction. Ho w ev er, an analysis of the data of Section 6, based on the metho d w e propose in Section 5, sho ws that the eﬀect of rs9939609 on infarction is not en tirely mediated by b o dy mass. This ﬁnding p oin ts to a diﬀerent theory o f the role o f rs9939 609 in the dev elopmen t of an infarction. Figure 1 a show s a causal diagram represen ta tion of the problem. C ausal diagrams are review ed in App endix 1. In the diagr a m, w e let GE N O denote genetic v ariation resp on- sible for c hanges in risk of infarction and correlated with rs9939 6 09. W e let M I denote o ccurrence o r no no ccurrence of infarction. Let D E M O represen t the follo wing set of 3 v aria bles: sex, geographical area of origin and profession. Let B M I represen t the b o dy mass index. Let B E H AV E represen t frequen t phys ical exercise and drinking habit. Ac- cording to the dia g ram, the correlation b et w een B E H AV E and M I is taken to b e, in part, induced by shared genetic or en vironmen tal f a ctors, U N O B S E RV E D . The miss- ing U N O B S E RV E D → B M I arrow represen ts the assumption that, conditionally on B E H AV E and D E M O , no unobserv ed risk factors for ob esit y are as so ciated with infarc- tion. Application of the propo sed method to the data of Section 6, under the assumptions of Fig ure 1 a , show s that the causal eﬀect of GE N O on M I is not entirely mediated b y B M I , in the sense that a (hypothetical) in terv en tion that ﬁxes the v alue of B M I w ould not completely blo c k the eﬀect exerted on M I b y a (hypothetical) inte rve ntion on GE N O . This ﬁnding p oin ts to new hy p otheses ab out the role of rs99 3 9609 in susceptibilit y to MI. A t the end of this pa per we discuss the biological implications of this ﬁnding in the ligh t of recen t experimen tal researc h evidence. U N O B S E RV E D M I B M I D E M O B E H AV E GE N O (a) U N O B S E RV E D M I B M I D E M O B E H AV E GE N O σ B M I σ GE N O (b) Figure 1. (a) C ausal diagram for our motiv ating study , (b) same d iagram, augmented with in terv ent ion ind icators, as explained in Section 3. 4 3 Con trolle d dir e ct eﬀects More in general, let X denote genetic v ariation of interes t, and t he binary v ariable Y indicate o ccurrence ( Y = 1) or non-o ccurrence ( Y = 0) of the disease. Let M denote a set of v ariables along the causal path from X to Y . Deﬁne the dir e ct eﬀe ct of X on Y , c ontr ol le d for M , to b e the eﬀect exerted on Y by an intervention that c hanges the v alue of X from some reference v alue x 0 to x 1 , while k eeping M ﬁxed at some reference v alue, m 0 [26, 3 0]. T o formalize this concept, we need t o represen t the idea o f “in terv en tion”. This means to distinguish b etw een the “observ at io nal” distribution of the data w e are analyzing, P ∅ , and the distribution, P xm , of the data w e w ould ha v e obta ined had w e ﬁxed X to some v alue x a nd/or M to some v alue m . F ollowin g Da wid [8 ], w e label these diﬀeren t distributions by an intervention indic ator σ X and a n interv en tion indicator σ M , where, for H ∈ ( X , M ), the sym b ol σ H = ∅ indicates that the v alue H is observ ed passiv ely , and the sym bol σ H = h indicates that H is set to h by an interv en tion. Th us, for a g eneric v ariable W , the sym b ol P ( Y = 1 | σ X = x, σ M = m, W = w ) denotes the probabilit y of o ccurrence of the o utcome ev en t, conditional on observing W = w , when w e forcefully set X to x a nd M t o m . The direct eﬀect o f X o n Y , controlled fo r M and conditional on a generic set W of observ ed v ariables, can no w b e measured in terms o f the (causal conditional) relativ e risk P ( Y = 1 | σ X = x 1 , σ M = m 0 , W ) P ( Y = 1 | σ X = x 0 , σ M = m 0 , W ) , (1) or in terms of the (causal conditional) o dds ratio o dds( Y = 1 | σ X = x 1 , σ M = m 0 , W ) o dds( Y = 1 | σ X = x 0 , σ M = m 0 , W ) , (2) where o dds( Y = 1 | σ X = x, σ M = m, W = w ) = P ( Y = 1 | σ X = x, σ M = m, W = w ) /P ( Y = 0 | σ X = x, σ M = m, W = w ). Because our data are generated fr om P ∅ , i.e. , conditional on σ X = ∅ , σ M = ∅ , t hey will – in general – b e uninfo rmativ e ab out the in terv en tional probabilities in v olv ed in the direct eﬀect of in terest, b e it in the form (1) or in the f orm (2). Do es this mean w e can n e ver estimate a direct eﬀect on the basis of observ ational data? Luc kily , no. Estimation is p ossible in sp ecial situations, under iden tiﬁabilit y conditions studied in the next section. As w e shall see, these conditions can b e expressed through the la ng uage of conditional indep endenc e [10], extende d b y Da wid to accommo date interv en tion indicators [9]. An imp ortan t to ol, in our subsequen t discussion, are causal diagrams extended (augmen ted) to incorp ora te interv ention indicators in the form of a dditional no des sending arrows in to their corr esp onding v ariables, as in [9]. One example is the causal diagram of Figure 1 b , whic h extends the diagram o f Figure 1 a by adding no des to r epres en t the interv en tion indicators for v ariables X a nd M . 5 U N O B S E RV E D S = 1 M I B M I D E M O B E H AV E GE N O σ B M I σ GE N O Figure 2. Th is diagram has b een obtained from Figure 1 b by adding the selection ind icato r no de, S = 1, as explained in Section 5. In the diagram, this n od e receiv es arro ws from M I , D E M O and B E H AV E . T his repr esen ts the assumption th at the p robabilit y of a generic individual of the study cohort b eing sampled dep end s on ( M I , D E M O , B E H AV E ) while b eing, conditional on these v ariables, ind ep enden t of GE N O . Its dep end en ce on GE N O w ould violate condition (15). 4 Estimation fro m random p opulation samples Imp ortan t results on the identiﬁabilit y of con trolled direct eﬀects ha v e b een obtained b y Robins, Greenland, Didelez, Dawid, G enele tti and P earl [11, 2 6, 29, 30] under the assump- tion that t he p opulation sample is random. Thes e results are now summarized, with the in v olv ed assumptions expressed in the fo rm of conditional independence conditions b et w een pro blem v ariables. If there exists a ( p ossibly empty ) set W of observ ed v ariables suc h t ha t, conditionally on W , there is no unobserv ed confounding of the relationship b et w een ( X , M ) and Y , then the direct eﬀect of X on Y , con trolling for M , is iden tiﬁable from random p opulation samples and estimable via standard regression of Y o n X , M and W . The stated condition is equiv alen t to asking that W is not a descendan t of either M or X , and that the distribution of Y give n ( X , M , W ) is the same, regar dless of the w a y the v a lues of X a nd M are generated, b e it observ at io nally o r b y forceful in terv en tion, formally: W ⊥ ⊥ ( σ X , σ M ) , (3) Y ⊥ ⊥ ( σ X , σ M ) | ( X , M , W ) . (4) 6 In fact, it follo ws from (4) t ha t P ( Y = 1 | σ X = x, σ M = m, W ) = P ( Y = 1 | X = x, M = m, W ) , where the righ thand side can b e obtained as the ﬁtted v alue from a (logistic) regres- sion mo del, and hence an estimate of the causal conditional relativ e risk (1) can also b e obtained. Conditions (3-4) can b e c hec k ed on an augmen ted causal diagram b y the d -separation criterion [15, 34] review ed in App endix 1, o r the equiv a len t moralisation cri- terion [22]. EXAMPLE 1: the diagram of Figure 1 b conta ins causal paths from GE N O to M I that do not in v olv e B M I . It m ak es thus sense to test for a d irect eﬀect of GE N O on M I , controlling for B M I . Conditions (3-4) for this test to b e approac hable via standard regression imply the existence of a (p ossibly empty) s et of v ariables W th at satisﬁes: W ⊥ ⊥ ( σ GE N O , σ B M I ) , (5) M I ⊥ ⊥ ( σ GE N O , σ B M I ) | ( GE N O , B M I , W ) . (6) In ord er to satisfy (5), the set W must not conta in a member of B E H AV E . But th en, b e - cause M I and σ B M I are d -connected when B M I is in the conditioning set and B E H AV E is not (in accord with the theory of App endix 1), condition (6) will b e inevitably violated, and we conclude that, in this examp le, the direct eﬀect of in terest cannot b e estimated by using standard regression. Estimation of the direct eﬀect of X on Y , con trolled for M , from prosp ectiv e o bserv ational data, is p ossible under mo r e lenien t conditions than (3-4), although this will o ccasionally require standard regression to b e abando ned in fav our of the more general metho d of G – computation [30]. These more lenien t conditions require that there b e a (p ossibly empty) set W of non-causal successors of X suc h that, conditional on W , there is no confo unding b et w een Y and X , and a (p ossibly empt y) set Z of non-causal successors of M suc h that, conditional on ( X , Z , W ), there is no confounding b et w een Y and M . All this is formally expresse d by the follo wing conditions: W ⊥ ⊥ σ X , (7) Z ⊥ ⊥ σ M , (8) Y ⊥ ⊥ σ X | ( X , W ) , (9) Y ⊥ ⊥ σ M | ( X , M , Z , W ) , (10) whic h are sim ilar to those giv en in [11]. V arious authors hav e discusse d G - computation [11, 23, 2 6 , 29, 30, 36] or G -estimation [20, 37, 39] of controlled direct eﬀects from a random p opulation sample in suc h settings. These authors use a ssumptions (7)- ( 10), although they sometimes a dopt a diﬀeren t ”language” to express them. 7 EXAMPLE 2: with reference to the causal d iag ram of Figure 1 b , if we sp ecify W ≡ D E M O and Z ≡ B E H AV E , then conditions (7)-(10) can b e written as: D E M O ⊥ ⊥ σ GE N O , (11) B E H AV E ⊥ ⊥ σ B M I , (12) M I ⊥ ⊥ σ GE N O | ( D E M O , GE N O ) , (13) M I ⊥ ⊥ σ B M I | ( GE N O , B M I , B E H AV E , D E M O ) , (14) Conditions (11)-(12) are satisﬁed b e cause n either D E M O is a descendant of GE N O , nor B E H AV E a descendant of B M I . Condition (13) is satisﬁed b ecause M I and σ GE N O are d -separated in Figure 1 b when D E M O is in the conditioning set. Finally , condition (14) is satisﬁed b ecause, as sho wn in App endix 1, no des M I and σ B M I are d -separated in Figure 1 b if B M I and B E H AV E are in the conditioning set. W e conclude that the direct eﬀect of GE N O on M I , control ling for B M I , is estimable by G -computation from prosp ectiv e observ ati onal data, un der the assum p tions of Figure 1 b . Conditions (7)-(10) do not preve nt Z from b eing a descendan t of X , in whic h case the con- ditioning on Z will – in a g eneral prosp ectiv e study – create a spurious asso ciation b et w een X and Y , even in absence of the direct eﬀect we wish to assess [6, 24]. This ”collider- stratiﬁcation bias” will preven t standard regression, but not necessarily G -computation or G -estimation, from correctly estimating the direct eﬀect o f X on Y , con trolling for M , as sho wn in [39]. 5 Estimation fro m matc hed case-co n trol s tudies Let us no w shift atten tion to the estimation of con trolled direct eﬀects in the con text of a r etr osp e ctive d esign . This is, ev en under the g eneral conditions (7)-(10), a complicated task, one reason b eing the p ossible (”exp osure-induced mediator- outcome”) confounding induced by statistical dep endence b et w een Z and σ X (quite p ossible under (7)-(10)). The literature on estimating con trolled direct eﬀects fro m retrosp ectiv e designs in presence of this t yp e of confounding is, to the b est of our knowle dge, v ery limited so far. G - estimation a pproac hes to this problem in the contex t o f unmatc hed case-con trol studies ha v e b een suggested b y V ansteelandt in [37] and [38]. The latter pap er uses G -estimation in com bination with logistic regr ession. In this section, w e shall presen t an appro ac h to the problem that w orks with matche d case-con trol studies. W e start b y including in the causal diagra m a sp ecial no de S , called the sele ction indi- c ator , to accoun t for t he non-r a ndom sampling in v olv ed in case-con trol studies. This is exempliﬁed in Fig ur e 2 b . The v alue S = 1 indicates that the individual ha s b een selected from the underlying study cohort for inclusion in the study , a s in [1 6, 19]. Implicit in a case-con trol study is the fact that the selection ev en t, S , dep ends on the outcome, Y , and this is wh y w e ha v e the Y → S arr o w in the diagram. Data analysis is (b y tautology) 8 p erformed conditional on S = 1 . Supp ose that the usual ”ra r e disease assumption” is v alid, and that the “collapsibilit y” condition X ⊥ ⊥ S | ( Y , M , W ) , (15) is satisﬁed, whic h make s sure the conditio na l o dds ratio o dds( Y = 1 | X = x, M = m, W ) is not aﬀected by the retrosp ectiv e sampling [12, 40]. In those situations where the a bov e condition is satisﬁed together with (3)-(4), a standard regression approach to t he case- con trol study will w ork ( conditio nal lo gistic regression b eing one optio n when hev case- con trol study is matc hed). In the follo wing, w e are concerned with the more diﬃcult situation of a matc hed case-con trol study where condition (15), but not (3)-(4), hold. Hence supp ose that cases a nd controls ha v e b een 1- to-1 match ed with resp ect to a set W of v a riables t ha t satisﬁes conditions ( 7)-(1 0). Let the W -matched pairs b e indexed by i (with i = 1 , . . . , n ) and let the generic notation G ( ij ) denote the v alue of a v a riable of in terest, G , for the j th mem b er of pair i . Assume the ev en t Y = 1 is rare (whic h is often a main motiv ation fo r the c hoice of a retrosp ectiv e design), and that the follow ing mo del is true: E ( Y | σ X = x, σ M = m, W, Z ) E ( Y | σ X = 0 , σ M = 0 , W , Z ) = exp( ψ x + γ m ) , (16) where exp ectations E ( . ) r efer to the p opulation distribution. Then w e show in App endix 2 that the data will approximately satisfy: E ∗  ( X ( i 1) − X ( i 0) ) exp( − ψ X ( i 1) − γ M ( i 1) )  = 0 , (17) where the exp ectatio n E ∗ ( . ) refers to the observ ed data distribution under retrosp ectiv e sampling. The idea is then to ﬁt the logistic regression mo del: logit P ( Y ( ij ) = 1 | X ( ij ) = x, Z ( ij ) = z , M ( ij ) = m ) = α + δ x + β z + η m + b ( i ) , where b ( i ) is a mean zero r a ndom eﬀect, whic h expresse s the con tribution for matched pair i . A maximum lik eliho o d estimate o f the r emaining parameters, ( α , δ , β , η ), can b e obtained via conditional logistic regression, for example by using the CLOGIT pro cedure in R . Under the ”no confounding” conditions (8) and (10), the estimate of η , denoted b y ˆ η , enco des the conditional causal eﬀect of M on Y , represen ted in Equation (1 6) by the sym b ol γ . Equation (17) then justiﬁes the use of the f ollo wing conditional score equation: 0 = n X i =1 ( x ( i 1) − x ( i 0) ) exp  − ψ x ( i 1) − ˆ η m ( i 1)  (18) for estimating the direct eﬀect o f in terest, whic h is enco ded b y ψ . An estimator for the v aria nce of ˆ ψ is deriv ed in the last paragraph o f App endix 2 . 9 EXAMPLE 3: it is easy to show, along the lines of Examp le 2, that, for W ≡ D E M O and Z ≡ B E H AV E , the causal diag ram of Figure 2 satisﬁes conditions (7)-( 10 ) for W ≡ D E M O and Z ≡ B E H AV E , and th e collapsibilit y condition GE N O ⊥ ⊥ S | ( M I , B M I , D E M O ) , as w ell. Bec ause of the ab o v e consid eratio ns, and b ecause early infarction is a rare disease, w e conclude that the direct eﬀect of GE N O on M I , con trolling for B M I , is estimable by G -computation from m atc hed case-con trol data, u nder th e assump tions of Figure 2. 6 Bac k to o u r motiv ating study Within an Italian study in the genetics of infarction [2], cases w ere ascertained on t he basis o f hospitalization fo r acute m y o cardial infar ction b et w een ages 40 and 45, during the 199 6 – 20 0 2 p erio d. This study in v olv es the v aria bles represen ted in Figures 1 and 2, and whic h w e contin ue to denote through the sym b ols intro duced in Section 2. The con trols w ere selected by matchin g them to the cases ov er sex, geogra phical area of origin and profession ( D E M O ). Our aim here is to estimate, on the basis of the study data, the eﬀect of genetic v ariation reﬂected b y rs9939609 ( GE N O ) on risk of early infarction ( M I ), con trolling for b o dy mass ( B M I ). W e work under the a ssumptions represen ted in the diagram of Figure 2, whic h app ear legitimate, esp ecially when one considers the narrow range of ages a t infarction represen ted in our sample of cases. Under suc h a ssu mptions, w e ha v e already seen in Example 3 that the direct eﬀect of in terest is estimable b y using the alg o rithm described in the previous section. The distribution of the rs99396 09 genotype in sample cases and controls is summarized in T able 1 . No ma jor departure from Ha r dy-W einberg equilibrium in con trols w as detected. n um b er of copies of the ma jo r rs9939 6 09 a llele con trols cases 0 305 380 1 889 921 2 644 537 T able 1. Distribution of the rs9939609 genot yp e in samp le cases and con trols. T able 2 summarizes results from the ﬁtting of a conditional logistic mo del for the de- p endence of o ccurrence of early my ocardia l infarction on wild-ty p e rs993 9609 homozygos- it y , without any adjustmen t for other v aria bles in the mo del (except, o f course, for the matc hing v ariables). This yielded an estimate of 0.76 fo r the total eﬀect of rs9939 6 09 rare homozygosit y on infarction, on a n o dds ratio scale, whic h is signiﬁcan tly diﬀeren t from 10 OR p -v alue 95% conﬁdence in terv a l rs9939609 wild-type homozygosit y? 0.76 0.0001 0.65 - 0.8 7 T able 2. Results from the ﬁtting of a conditional logistic mo del for the dep endence of o ccurrence of early my ocardial infarction on rare r s9939 609 homozygosit y , without any adjustment for other v ariables in the mo del. This pro duces an estimate of the total eﬀect of the rs9939609 r are homozygosit y on su sceptibilit y to early m y o cardial infarction, on an o dds ratio of disease scale, rep orted in the OR column of the table. OR p -v alue 95% conﬁdence in terv al rs9939609 wild-t yp e homozygosit y? 0.81 0.007 0.7 - 0.94 b o dy mass index 1.15 < 2e-16 1.12 - 1.17 T able 3. Results from the ﬁtting of a conditional logistic mo del for the dep endence of o ccurrence of early my ocardial infarction on rare r s9939 609 homozygosit y , adju sting for b ody mass index. the null at a 0.0001 lev el of signiﬁcance. This can b e in terpreted as evidence of an “o v erall protectiv e” eﬀect of the ma jor rs9939609 a llele. When w e further included b o dy mass as an additional explanatory v aria ble in the mo del, w e obtained the results of T able 3, where the eﬀect of rs99 3 9609 wild-type homozygosity on infarction, 0.81 o n an o dds ratio scale, signiﬁcan tly departs from the null at a 0.007 lev el of signiﬁcance. Unfortunately , b ecause conditions (3-4) are violated b y the diagram of Figure 2, w e cannot take this estimate as a v alid measure of the direct eﬀect of rs9939609 wild-t yp e homozygosit y on infarction, con trolling for b o dy mass. One problem here is, in fact, that phys ical exercise a nd drinking are p oten tial confounders o f the asso ciation b et w een b o dy mass and m y o cardial infa rction. Can this problem b e ov ercome b y including the B E H AV E v a riables – physic al exercise and drinking habit – as additio nal co v a riates in the regr ession mo del? When w e did so, the estimated eﬀect of r s99 3 9609 wild-type homozygosit y on infarction w as 0.84 , whic h is a signiﬁcant (at a 0.02 leve l) departure fro m the n ull (see T able 4 ) . Again, b ecause the causal dia gram of Figure 2 violates conditions (3)-(4), our metho d do es not guar a n tee that the ab ov e estimate, obtained by standard regression, is a v alid measure of the direct eﬀect of inte rest. One problem b eing that the conditioning on B E H AV E opens the GE N O → B E H AV E ← U → M I path (see App endix 1) and, as a consequence, it in tro duces a spurious, non causal, asso ciation b et w een GE N O and M I , so called collider- stratiﬁcation bias. W e must accept the fact that, according t o our metho d, no v alid estimate of the direct eﬀect of in terest can b e obtained b y standard regression. Luc kily , b ecause the causal diagram of F ig ure 2 satisﬁes conditio ns (7-1 0, 15 ), our metho d tells us 11 OR p -v alue 95% conﬁdence in terv al rs9939609 wild-t yp e homozygosit y? 0.84 0.02 0.72 - 0.98 b o dy mass index 1.14 < 2e-16 1.11 - 1.16 o ccasional ph ysical exercise? 0.61 1.41e-07 0.50 - 0.73 frequen t ph ysical exercise? 0.53 3.13e-13 0.44 - 0 .63 drinking habit? 1.36 7.48e-05 1.17 - 1 .59 T able 4. Results from the ﬁtting of a conditional logistic mo del for the dep endence of o ccurrence of early my ocardial infarction on rare r s9939 609 homozygosit y , adju sting for b ody mass index, physic al exercise and drin king habit. that a v alid estimate of the direct eﬀect of rs9939 609 o n infarction, controlling for b o dy mass, can b e obtained by using the G -estimation pro cedure of the preceding section. This yields an estimate of 0 . 72, with a 95% conﬁdence in terv al of ( 0 . 62 , 0 . 84), on a relat ive risk scale. This estimate diﬀers appreciably from the estimates obtained in previous steps of the analysis. The fact that the latter estimate refers to t he relativ e risk scale, rather than the o dds ratio scale, do es not en tirely explain this diﬀerence in view of the low prev alence of early-onset m y o cardial infarction. F rom a substan tiv e p oin t o f view, our ﬁnding suggests t ha t genetic v ariation represen ted b y rs9939609 may inﬂuence heart disease via path w ay s diﬀerent from those inv o lv ed in b o dy mass. A biolo gical in terpretation of this ﬁnding is giv en at the end of the next section. 7 Discuss ion In this pap er, w e ha v e started by examining conditions under whic h con trolled direct eﬀects can be estimated from prosp ectiv e observ atio nal dat a via standard regression. When these conditions are violat ed, the direct eﬀect of interest is sometimes still estimable from a pro spectiv e study , alb eit not via regression. W e ha v e examined the more general conditions under whic h a con trolled direct eﬀect is estimable via G -computation, and w e ha v e expresse d them as prop erties of a causal diagra m represen tation of the problem. Then, in consideration o f t he increasing imp ortance of matc hed case-control studies in genetic epidemiology , w e hav e shifted a tten tion to this class of studies. W e hav e prop osed an algorithm for the G – estimation o f con trolled direct eﬀects from matc hed case-con trol studies, and characterized the necessary conditions for a lg orithm v alidit y in t erms of conditional indep endence prop erties of the causal diagra m represen tation of t he pro blem. The prop osed metho d is also relev an t in situations where the notion of ” case” is not the usual one. Examples are o ﬀered b y the pap ers of Cordell and colleagues [7], and 12 of Bernardinelli a nd colleagues [4], where genetic eﬀects are estimated b y conditioning on paren tal genot yp es, using data from proband-paren t tr io s. These pap ers essen tially p erform a mat ched case-con trol analysis via conditio nal logistic regression, using the case and one o r more ”pseudo con trols” deriv ed from the un transmitted parental haplotypes. This approa c h could b e com bined with the metho ds presen ted in this pap er to assess dir e ct controlled g enetic eﬀects. In the contex t of retrosp ectiv e designs, further study is warran ted o f identiﬁc ation results for con trolled direct eﬀects in speciﬁc mo del classes, as w ell as for so-called ”natural” direct and indirect eﬀects [27]. In additio n, further w ork is needed to in v estigate whether direct eﬀect estimators can b e constructed on t he basis of matc hed case-control studies, whic h are either more eﬃcien t than the estimator pr o po sed in this pap er, or less dep enden t on a rare disease assumption. Fina lly , future work will also fo cus on inference under a lt ernativ e strategies for the selection of controls in a retrosp ectiv e study . W e hav e illustrated the metho d with the aid of a study in the g enetics o f m y o cardial infarction. Our ana lysis detected presence of a direct eﬀect of rs9939609 on infarction, con trolling for b o dy mass. This ﬁnding suggests that the eﬀect of this SNP on suscepti- bilit y to infar ctio n is not totally explained in terms of a deleterious eﬀect of FTO on b o dy mass. This ﬁnding p oints to a num ber of p ossible h yp otheses. V ery relev an t here is re- cen t evidence that SNPs can, in general, exert an inﬂuence on the expression of r elativ ely distan t (in terms of DNA stretch) genes. In our case, it could b e that rs993 9609 drive s the expression of a gene other than F TO, functionally unrelated with FTO, whose eﬀect on r isk of infarction is not mediated b y b o dy mass. And hence the direct eﬀect. Such h yp othesis is corrob orated b y biolo g ical evidence tha t the F TO is lo cated in a genomic region con taining highly conserv ed genomic regulatory blo c ks whic h, according to a we ll established theory , are likely to driv e the expression of distant genes [3, 21, 31]. The ab ov e considerations ha v e useful implications with resp ect to p ossible experimen ts to elucidate the mec hanism. It is not unlikely that rs9939609 may sim ultaneously driv e the expres- sion of diﬀeren t, and f unctiona lly unrelated, genes. Suc h a m ulti-eﬀect pattern could b e common. F or example a recen t study [18] sho ws tha t SNPs in the 9p21 .3 regio n of D NA, notoriously a ssociated with susceptibilit y to infarction, not only control nearb y genes, but also the expression of the quite distant IFNA21 g ene. G eneralizing on t his example, one migh t conjecture that many SNPs exert their inﬂuence on disease susceptibilit y through non-o v erlapping path w a ys, and tha t this will, in man y cases, result in evidence of direct and indirect eﬀects that our metho d is able to capture. Ac kno wledgments The authors thank Elisab eth Coart for pr eliminary data analyses, a nd D rs. Diego Ardissino and Pierangelica Merlini for prov iding their insight in to the clinical problem. The ﬁrst au- 13 thor ac kno wledges supp o rt from the UK Medical Researc h Council G ran t no. G08023 20 ( www.mrc.ac. uk ), and b y the Cam bridge Statistics Initiative. The second author ac- kno wledges supp ort from researc h pro ject G.0111.1 2 of the F und for Scien tiﬁc Researc h (Flanders), IAP researc h net w ork gr an t nr. P06/03 from the Belgian gov ernment (Belgian Science P olicy), and Ghen t Univ ersit y (Multidisciplinary Researc h P artnership “Bioinfor- matics: from n ucleotides to net w orks”). App en dix 1: Causal diagrams Causal d iagrams [8, 17, 25] consist of a set of no des repr esenting v ariables in the problem, and directed arr ows conn ecti ng p airs of no des, as in Figure 1, for example. The same, elliptical, shap e is used for all n odes. In particular, no distinction is made, in terms of n od e shap e, b et w een observ ed and unobserve d v ariables/no des, one reason b eing that this is not a distinction that has to do w ith the causal s tructure of the sy s tem un der study . The arrows represent direct causal inﬂuence, in a sense to b e made clear. A p ath is a sequence of distinct no des where any t w o adjacen t no des in the sequence are connected by an arrow. A dir e cte d p ath from a no de X to a n od e B is a path where all arro ws connecting no des on the p ath p oint a wa y from A and to w ards B . F or example, in the graph of Figure 2, the sequence GE N O , B E H AV E , B M I , M I , U N O B S E RV E D is a path b etw een GE N O and U N O B S E R V E D , bu t n ot a d irected one. If A has a directed path to B then A is an anc e stor of B , an d B a desc endant of A . By con v en tion, A is b oth an ancestor and a d escendan t of A . If an arrow p oints fr om A to B , then A is called a p ar ent of B . In this pap er, we restrict to causal diagrams w h ic h ha v e the form of a directed acyclic graph (D A G), that is, a directed graph where for any directed path from A to B , n od e B is n ot a parent of A . A probab ility distribu tion o v er the set of no des of the graph is said to b e M arko v with resp ect to the graph if it can b e expr essed as a pro du ct of factors, where eac h factor is the conditional pr obabilit y of a no de of the graph, give n its parents in the graph. A consecutiv e triple of n od es, A, B , C sa y , on a path is called a c ol lider if the arrow b etw een A and B and the arro w b et wee n C and B b oth hav e arro wheads p oin ting to B . F or example, in Figure 1, n ode B E H AV E is a collider on the GE N O → B E H AV E ← U N O B S E RV E D path. An y other consecutiv e trip le is called a non-c ol lider . A path b et w een tw o no des, A and B say , is said to b e blo cke d by a set C if either for some non-collider on the path, th e mid dle no de is in C , or if the p ath con tains a collider suc h that no descendan t of the mid dle no de of suc h collider is in C . F or example, in the graph of Figure 2, the path GE N O → B E H AV E → B M I → M I ← U N O B S E RV E D 14 is blo c ke d b y any set of n od es that con tains either or b oth of ( B E H AV E , B M I ), and /or do es not con tain S or M I . In particular, the p ath is blo c k ed by th e empt y set of no des. F or disjoin t sets A, B , C of no des in a DA G we sa y A is d -separated from B give n C if ev ery path f rom a n od e in A to a no de in B is b lock ed b y C . If A is not d -separated f r om B giv en C , w e say A is d -connected to B give n C . F or example, in th e d iag ram of Figure 1 b , no des σ B M I and M I are d -separated giv en ( GE N O , B M I , B E H AV E , D E M O ). Th is is b ecause all paths b et w een σ B M I and M I con tain at least one of the follo wing non-colliders: ( M I , B E H AV E , B M I ), ( U N O B S E R V E D, B E H AV E , B M I ), ( U N O B S E RV E D , D E M O , GE N O ), ( U N O B S E RV E D , D E M O , B M I ), ( M I , GE N O , B M I ), all of whic h are blo c k ed by virtu e of the fact th at B E H AV E , D E M O and GE N O are in the conditioning set. As a fu rther example, the r eader is in vited to c hec k that that σ GE N O and M I are d -connected in the d iagram of Figure 1 b if B M I , but not GE N O , is in th e conditioning set. Two s ets of no des, A and B s ay , that are d -separated giv en a third set C , are conditionally indep endent, in a pr ob ab ilistic sense, giv en C , und er an y distribution that is Mark o v with resp ect to the graph. By contrast, if A and B are d -connected giv en C , there exists some probabilit y distribution whic h is Mark o v with resp ect to the graph , un der w h ic h A and B are not conditionally in dep enden t, giv en C . App en dix 2 W e no w pro ve that, under conditions (7-10, 15), mo del (16) and a matc hed case-con trol sampling regime of the kin d describ ed in Section 5 , the data approximat ely satisfy Equation (17) , whic h w e h er e r ep eat for th e reader’s con v enience: E ∗ n ( X ( i 1) − X ( i 0) ) exp( − ψ X ( i 1) − γ M ( i 1) ) o = 0 , (19) where the exp ectation E ∗ ( . ) refers to the observe d data distrib ution un der retrosp ectiv e s am- pling. Mo del (16) implies: E ( Y | σ X = x, σ M = m, W , Z ) E ( Y | σ X = x, σ M = 0 , W , Z ) = exp( γ m ) , from which we obtain: E ( Y | σ X = x, σ M = m, X = x, M = m, W, Z ) exp( − γ m ) = E ( Y | σ X = x, σ M = 0 , W , Z ) , b ecause for a generic v ariable H the equ alit y σ H = h logically implies H = h ; at least, this is true under the so-called consistency assumption that setting H to h by int erve ntio n has no eﬀect amongst those for whom H = h is naturally observ ed. 15 Thanks to th e conditioning on X = x and M = m , w e ma y n o w br ing the exp( − γ m ) factor of the left han d side in to the exp ectation, and further multiply b oth s id es of th e equ atio n b y the factor exp( − ψ x ), so as to obtain: E [ Y exp( − ψ x − γ m ) | σ X = x, σ M = m, X = x, M = m, W , Z ] = = E [ Y exp( − ψ x ) | σ X = x, σ M = 0 , W , Z ] . Then, by virtu e of conditions (9)- (10), r esp ecti vel y , w e can eliminate the conditioning on σ X = x and σ M = m from the left hand sid e of the equation, w h ic h leads to: E { Y exp( − ψ x − γ m ) | X = x, M = m, W, Z } = E { Y exp( − ψ x ) | σ X = x, σ M = 0 , W , Z } . where the exp ectatio n at the left hand sid e is tak en with resp ect to th e p opulation distribution (whic h is what th e absence of the σ indicators in the conditioning part means). F rom the ab o v e equation, by vir tu e of (16), w e obtain: E { Y exp( − ψ x − γ m ) | X = x, M = m, W, Z } = E [ Y | σ X = 0 , σ M = 0 , W , Z ] . (20) The ab o v e equalit y implies that, conditionally on W and Z , the rand om v ariable Y exp( − ψ X − γ M ) is, in exp ectation und er the p opulation distribution, ind ep en den t of ( X, M ) and therefore, in a sample from a r andom p opulation, the quantit y: ( X i − E { X } ) Y i exp( − ψ X i − γ M i ) (21) has, conditionally on W and Z , zero mean at the true parameter v alues. Recall that w e are dealing with a sample from a 1-to-1 m atched case-con trol study . F or the aﬀected member of the i th matc hed set, consider the quantit y: E ∗ n X ( i 1) exp( − ψ X ( i 1) − γ M ( i 1) ) | W = W ( i 1) o = = E n X Y exp( − ψ X − γ M ) | W = W ( i 1) , Y = 1 o , = E n X Y exp( − ψ X − γ M ) | W = W ( i 1) o /P ( Y = 1 | W = W ( i 1) ) where the exp ectations E ( . ) are tak en with resp ect to the p o pu lation distribution. By virtue of the ab o ve indep end ence prop ert y , the ab o ve equation can b e rewritten as: E n X | W = W ( i 1) o E n Y exp( − ψ X ( i 1) − γ M ( i 1) ) | W = W ( i 1) o /P ( Y = 1 | W = W ( i 1) ) . whic h, in the ligh t of (20), can b e written as: = E [ X | W = W ( i 1) ] E h Y | σ X = 0 , σ M = 0 , W = W ( i 1) i /P ( Y = 1 | W = W ( i 1) ) , 16 F urther, note that by a similar reasoning E ∗ h X ( i 0) exp( − ψ X ( i 1) − γ M ( i 1) ) | W = W ( i 1) i = E [ X | W = W ( i 0) , Y = 0] E h Y exp( − ψ X − γ M ) | W = W ( i 1) , Y = 1 i = E [ X | W = W ( i 0) , Y = 0] E h Y | σ X = 0 , σ M = 0 , W = W ( i 1) i /P ( Y = 1 | W = W ( i 1) ) . Under a rare disease assum p tion, w e h a v e E [ X | W , Y = 0] ≈ E [ X | W ], wh ic h giv es Equa- tion (19). Qu o d e r at demonstr andum . In the remaining p art of this App end ix, we derive an estimator for the v ariance of the estimate of the parameter ψ of Equation (18). W e start by deﬁn ing θ ≡ ( ψ , δ, γ , β ) and let U i ( θ ) b e giv en b y: U i ( θ ) =     ( x ( i 1) − x ( i 0) ) exp  − ψ x ( i 1) − η m ( i 1)    x ( i 1) − x ( i 0) m ( i 1) − m ( i 0) z ( i 1) − z ( i 0)   expit  − δ ( x ( i 1) − x ( i 0) ) − η ( m ( i 1) − m ( i 0) ) − β ( z ( i 1) − z ( i 0) )      . Let ˆ θ d enote th e estimate of θ obtained by our m ethod. The v ariance of ˆ θ is we ll appr o ximate d in large s amp les by the follo wing sandwic h estimator: 1 n I E − 1  ∂ U i ( θ ) ∂ θ  V ar( U i ( θ ))I E − 1  ∂ U i ( θ ) ∂ θ  T , (22) where V ar( U i ( θ )) can b e estimated by calculati ng U i ( θ ), then taking the sample v ariance of these con tributions for all sub jects, and ﬁ nally ev aluating at ˆ θ . The qu an tit y I E ( ∂ U i ( θ ) /∂ θ ) can b e estimated b y ﬁrst calculating the gradient matrix ∂ U i ( θ ) /∂ θ for eac h sub ject, ev aluating it at ˆ θ and then calculating the sample av erage (o v er all su b jects) of eac h comp onen t of the matrix. In th is gradient matrix, the elemen t in the j th row and l th column should b e the deriv ativ e of the j th comp onen t of U i ( θ ) with resp ect to the l th comp onen t of θ . The ﬁrst diagonal elemen t of the resu lting matrix (22) give s the appro ximate v ariance of ˆ ψ . References 1. Camilla H. Andreasen, Kirs tin e L. Stender-Pe tersen, Mette S. Mogensen, Signe S. T orek o v, Lise W egner, Gitte An d ersen, Arne L. Nielsen, Anders Albrec htse n, Knut B orc h- Johnsen, Signe S. Rasmussen, Jesp er O. Clausen, Annelli Sand bk, T orsten Lauritzen, Lars Hansen, T orb en Jr gensen, Oluf P edersen, and T orb en Hansen. Lo w Physica l Activit y Ac- cen tuates the Eﬀect of th e FTO rs9939609 P olymorphism on Bo dy F at Accum ulation. Diab etes , 57(1) :95–101, January 2008. 17 2. Diego Ardissino, Carlo Berzuini, Piera Angelica Merlini, Pier Mann uccio Mannucci, Aarti Surti, No el Burtt, Benj amin V oigh t, Marco T ubaro, Flora Pe yv andi, Marta S preaﬁco, P a- trizia C elli, Daniela Lina, Maria F rancesca Notarangelo, Maurizio F errario, Raﬀaela F e- tiv eau, Giorgio Casari, Michele Galli, Fla vio Ribichini, Marco Rossi, F rancesco Bernard i, Nicola Marziliano, Pietro Zonzin, F rancesco Mauri, Alb erto Piazza, Luisa F o co, Luisa Bernardinelli, Da vid Altsh uler, Thr om b osis Kathiresan, Sek ar on b ehalf of Atheroscl ero- sis, and V ascular Biology Italian Inv estiga tors. Inﬂu ence of 9p21.3 genetic v aria nts on clinical and angiographic outcomes in early-onset my o cardial infarction. J ourna l of the Americ an Col le ge of Car diolo gy , 58(4 ):426 – 434, 2011. 3. Gill Bejerano, Mic hael P heasan t, Igor Makunin, Stuart Stephen, W. James Kent, J ohn S. Mattic k, and Da vid Haussler. Ultraconserv ed elemen ts in th e human genome. Scie nc e , 304(5 675):1321 –1325, 2004. 4. Luisa Bernardinelli, Salv atore Bruno Murgia, Pier P aolo Bitti, Luisa F o co, Raﬀaela F er- rai, Luigina Musu, Inga Prokopenko, Rob erta Pa storino, V aleria Saddi, Anna Ticca, Maria Luisa Piras, Da vid Ro xb ee Co x, and Carlo Berzuini. Asso ciatio n b et w een the accn1 gene and multiple sclerosis in cen tral east sardinia. PL oS ONE , 2(5), 2007. 5. Stephen J. Chano c k and Da vid J . Hunt er. Genomics - When the smok e clears ... Natur e , 452(7 187):537– 538, 2008. 6. S. C ole and M. Hernan. F allibilit y in estimating direct eﬀects. International Journal of Epidemiolo gy , 31:1631 65, 2002. 7. BJ C ordell, Heather J an Barratt and Da vid G Cla yton. Case/pseudo con trol analysis in genetic asso ciation studies: a uniﬁed fr amew ork for detection of genot yp e and h ap lotype asso ciat ions, gene-gene and gene-en vironment in teractions and paren t-of-origin eﬀects. Genetic Epidemiolo gy , 26:167–185 , 2004. 8. A. Philip Da wid. In ﬂ uence diagrams for causal mo delling and inference. International Statistic al R evie w , 70(2):161–1 89, 2002. 9. A. Philip Da wid. Causal inference: a decision theoretic appr oac h. In Carlo Berzuini, A.Philip Da wid, and Luisa Bernardinelli, editors, Causal Infer enc e: Statistic al Persp e c- tives and Applic ations . Wiley , 2011 (forth coming). 10. A.P . Da wid. Conditional ind ep endence in statistical th eory . J. R. Stat. So c., Ser. B , 41:1–3 1, 1979. 11. V anessa Didelez, A.P Da wid, and Sara Geneletti. Direct and indirect eﬀects of sequent ial treatmen ts. In Pr o c e e dings of the 22 nd Annual Confer enc e on Unc ertainty in Artiﬁcial Intel ligenc e (UAI-06) , p age s 138–14 6, Arlington, Virginia, 2006. A UAI Press. 12. V anessa Didelez, S vend Kreiner, an d Niels Keiding. Graphical mo dels for inference und er outcome dep end en t sampling. Statistic al Scienc e , 25:368 – 387, 2010. 18 13. C Dina, D Meyre, S Gallina, E Durand, A Krn er, P Jacobson, LM Carlsson, W Kiess, V V atin, C Leco eur, J Delplanque, E V aillan t, F Pat tou, J Ru iz, J W eill, C Levy-Marc hal, F Horb er, N Po to czna, S Hercb erg, Bougnres P Le Stu nﬀ, C, P Ko v acs, M Marre, Balk au B, S Cauc hi, JC Chvre, and P . F roguel. V ariation in FTO con tributes to c hildho o d ob esit y and sev ere adult ob esit y . Natur e Genetics , 39:724–726 , 2007. 14. Timoth y M. F ra yling, Nic holas J. Timpson , Mic hael N. W eedon, Eleftheria Zeggini, Rac hel M. F reath y , Cecilia M. Lindgren, John R. B. P erry , Katherine S. Elliott, Hana Lango, Nigel W. Ra yner, Bev erley S hields, Lorna W. Harries, Jeﬀrey C. Barrett, Sian Ellard, Christoph er J. Gro v es, Bridget Knight, Ann-Marie Patc h , Andrew R. Ness, S hah Ebrahim, Debbie A. Lawlo r, Susan M. Ring, Y oa v Ben-Sh lomo, Marjo-Riitta Jarvelin, Ulla Sovio , Amanda J. Bennett, Da vid Melzer, Luigi F errucci, Ruth J. F. Lo os, In s Bar- roso, Nic holas J. W areham, F redrik Karp e, Katharine R. Owen, Lon R. C ardon, Mark W alk er, Graham A. Hitman, Colin N. A. P almer, Alex S . F. Doney , Andrew D. Morris, George Da v ey Smith, Th e W ell come T rust Case Control Consortium, Andrew T . Hatter- sley , and Mark I. McCarth y . A common v arian t in the FTO gene is asso ciated with b o dy mass ind ex and p redisp oses to childhoo d and adult ob esity . Scienc e , 316(5826) :889–894, 2007. 15. Dan Geiger, T. V erma, and Jud ea P earl. Identifying ind ep endence in ba y esian netw orks. Networks , 20(5):50 7–534, 1990 . 16. Sara Geneletti, Sylvia Richardson, and Nicky Best. Ad justing f or selection bias in retro- sp ectiv e, case-con trol studies. Biostatistics , 10(1):17 –31, 2009 . 17. Sander Greenland. Causal analysis in the health sciences. Journal of the Americ an Statistic al Asso ciation , 95(449):pp. 286–289, 2000. 18. O Harismend y , D Notani, X Son g, NG Rahim, B T anasa, N Hein tzman, B Ren, XD F u, EJ T op ol, MG Rosenfeld, and KA F razer. 9p21 DNA v arian ts associated with coronary artery disease im p air interferon signalling resp onse. N atur e , 470(7333 ):264–268, 2011. 19. MA Hernan, S Hernandez-Diaz, and J M Robin s. A structural appr oac h to selection bias. Epidemiolo gy , 15(5):61 5–625, SEP 2004. 20. MM Joﬀe and T Greene. Related causal framewo rks for surr oga te ou tcomes. Biometrics , 65:530 –538, 2009. 21. Hiroshi Kikuta, Mary Laplante , Pa vla Na vratilo v a, Anna Z. Komisarczuk, Pr G. Engstrm, Da vid F redman, Altuna Ak alin, Mario Caccamo, Ian Sealy , Kerstin Ho w e, J u lien Gh islain, Guillaume Pez eron, Philipp e Mourrain, Staale Ellingsen, Andrew C. Oates, Christine Thisse, Bern ard Thisse, Isab elle F ouc her, Birgit Adolf, Andrea Geling, Boris Lenhard, and Th omas S. Bec ker. Genomic regulatory b locks encompass multiple neigh b oring genes and maint ain conserv ed s yn ten y in v ertebrates. Genome R ese ar ch , 17(5):5 45–555, 2007 . 22. Steﬀen Lilholt Lauritzen, A. Philip Dawid, B.N Larsen, and H.G. Leimer. Indep end ence prop erties of directed m arko v ﬁelds. Networks , 20(5): 491–505, 1990. 19 23. J. P earl and J .M. Robins. P robabilistic ev aluatio n of sequ en tial plans fr om causal mo dels with h idden v ariables. In P . Besnard and S . Hanks, editors, Unc ertainty in Artiﬁcial Intel ligenc e 11 , p age s 444–4 53. Morgan Kaufmann, San F rancisco, 1995. 24. Judea P earl. Graphs, causalit y , and str u ctural equ atio n mo dels. So ciolo gic al Metho ds and R ese ar ch , 27:2 26–284, 1998. 25. Judea Pea rl. Causality : Mo dels, R e asoning, and Infer enc e . Cam br idge Univ ersit y Press, Marc h 2000. 26. Judea Pea rl. Direct an d indirect eﬀects. I n Pr o c e e dings of the Americ an Statistic al A sso- ciation Joint Statistic al Me etings , pages 1572–1 581. MIRA Digital Publish ing. T ec hnical Rep ort R-273, 2005. 27. Judea Pe arl. Th e mediation formula: A guide to the assessment of causal path wa ys in nonlinear mo dels. In Carlo Berzuini, A.Philip Da wid, and Luisa Bernard inelli, editors, Causality: Statistic al Persp e ctives and Applic atio ns . Wiley and Sons, 2011 (forthcoming). 28. Armand P eeters, S igri Bec k ers, An V errijke n, Peter Ro ev ens, Pieter Pe eters, Luc V an Gaal, an d Wim V an Hu l. V arian ts in the FTO gene are asso ciated with common ob esit y in the b elgian p opulation. Mole cular Genetic s and Metab olism , 93(4):48 1 – 484, 2008. 29. James M. Robins. T esting and estimation of direct eﬀects by reparametrizing dir ect ed acyclic graph s with structur al nested mo dels. In Computatio n, Causation and Disc overy (Eds. C. Glymour and G. Co op er) , pages 349–40 5, Menlo Park, CA, and Cambridge, MA, 2006. AAAI Press and The MIT Press. 30. JM Robins and S . Greenland. Id en tiﬁabilit y and exc hangeabilit y for direct and indirect eﬀects. Epidemiolo gy , 3:143 –155, 1992. 31. A San d elin, P Bailey , S Bruce, PG Engstrom, J M Klos, WW W asserman, J Ericson, and Lenhard B. Arrays of ultrac onserved non-co ding regions span the loci of k ey dev elopmen tal genes in vertebrate genomes. BMC Genomics , 5, 2004. 32. Laura J. S cott , Karen L. Mohlke, Lori L. Bonn ycastle, Cr isten J. Willer, Y un Li, William L. Duren, Mic hael R. Erd os, Heather M. Stringh am, Peter S. Chines, Anne U. Jac kson, Ludm ila Prokun ina-Olsson, Chia-Jen Ding, Amy J . Swift, Narisu Narisu , Tianle Hu, Randall Pru im, Rui Xiao, Xiao-Yi Li, K aren N. Conneely , Nancy L. Rieb o w, An- drew G. Spr au, Maurine T ong, P eggy P . White, Kur t N. Hetric k, Mic hael W. Barnhart, Craig W. B ark, Janet L. Goldstein, Lee W at kins, F ang Xiang, Jouko S aramies, Thomas A. Buc hanan, Richard M. W ata nab e, Timo T. V alle, Leena Kinnunen, Gonalo R. Ab ecasis, Elizab eth W. Pugh, Kimb erly F. Dohen y , Ric hard N. Bergman, Jaakk o T uomilehto , F ran- cis S. Collins, and Mic hael Bo ehnke. A genome-wide asso cia tion stud y of t yp e 2 diab etes in ﬁnn s detects multiple su sceptibilit y v arian ts. Scienc e , 316(5829 ):1341–1 345, 2007. 33. Angelo Scuteri, Serena Sann a, W ei-Min C hen, Man uela Uda, Giusepp e Albai, J ames Strait, Samer Na jjar, Ramaiah Nagara ja, Marco O r r, Gianluca Usala, Mariano Dei, San- dra Lai, And rea Masc hio, F abio Bus on er o, Ant onella Mulas, Georg B Ehret, Ashley A 20 Fink, Alan B W eder, Richard S Co op er, Pilar Galan, Ar avinda Ch akra v arti, Da vid Sch- lessinger, Ant onio Cao, Edward Lak atta , and Gonalo R Ab ecasis. Genome-wide asso ci- ation s can shows genetic v arian ts in th e FTO gene are asso ciated with ob esit y-related traits. PL oS Genet , 3(7):e11 5, 07 2007. 34. Ily a Sh pitser. Gr ap h -based criteria of ident iﬁabilit y of causal questions. In Carlo Berzuini, A.Philip Da wid, and Luisa Bernard inelli, editors, Causality: Statistic al Persp e ctives and Applic ations . Wiley and Sons, 2011 (forth coming). 35. The W ellc ome T rust C ase Control Consortium . Genome-wide asso ciation study of 14,000 cases of sev en common diseases an d 3,000 shared controls. Natur e , 447 (7145):66 1–678, 2007. 36. J. Tian an d I. S hpitser. O n identifying causal eﬀects. In R. Dech ter, H. Geﬀner, and J.Y. Halp ern, editors, Heuristics, Pr ob ability and Causality: A T ribute to Jude a Pe ar l , pages 415–4 44. C olle ge Publications, UK, 2010. 37. Stijn V ansteela ndt. Estimating Direct Eﬀects in Cohort and Case-Control Stud ies. Epi- demiolo gy , 20(6):85 1–860, No v em b er 2009. 38. Stijn V a nsteelandt. Estimation of controll ed d irect eﬀects on a dic hotomous outcome using logistic stru ctural direct eﬀect mo dels. Biometrika , 97(4):921–9 34, 2010. 39. Stijn V ansteela nd t, S ylvie Go etgeluk, Sharon Lutz, Ir win W aldman, Helen Ly on, Er ic E. Sc hadt, Scott T. W eiss, and Ch ristoph Lange. On the Adjustment f or Co v ariates in Genetic Asso ciati on Analysis: A Nov el, Simp le Principle to Infer Direct C ausal Eﬀects. Genetic Epidemiolo gy , 33(5):394– 405, JUL 2009. 40. A.S. Whittemore. Collapsibilit y of multidimensional continge ncy tables. Journal of the R oyal Statistic al So c i ety, Series B , 40:328–34 0, 1978. 41. Eleftheria Zeggini, Mic hael N. W eedon, C ecil ia M. Lindgren, T imoth y M. F ra yling, Katherine S . Elliott, Hana Lango, Nic holas J. Timpson, John R. B. P erry , Nige l W. Ra yner, Rac hel M. F reath y , Jeﬀrey C. Barrett, Bev erley Sh ields, Andrew P . Morris, S ian Ellard, Christoph er J. Gro v es, Lorna W. Harr ies, Jonathan L. Marc hini, Katharine R. Ow en, Beatrice Kn igh t, Lon R. Cardon, Mark W alker, Graham A. Hitman, Andrew D. Morris, Alex S. F. Doney , Th e W ell come T rust Case C on trol Consortium (WTCC C), Mark I . McCarth y , and Andrew T. Hattersley . Replication of genome-wide asso ciat ion signals in uk samples r eveal s risk lo ci for t yp e 2 diab etes. Scienc e , 316(582 9):1336–1341, 2007.

Direct genetic effects and their estimation from matched case-control data

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment