Multiple testing procedures under confounding

IMS Collectio ns Beyond P arametr ics in Interdisciplinary Resear c h: F estsc hrift in Honor of Pro fessor Pranab K. Sen V ol. 1 (2008) 243–25 6 c  Institute of Mathemat ical Statistics , 2008 DOI: 10.1214/ 193940307 000000176 Multiple testing pro cedures under confound ing Debashis Ghosh ∗ 1 Pennsylvania State University Abstract: While m ultiple testing pro cedures ha v e b een the fo cus of muc h statistical research, an i mpor tan t facet of the problem is how to deal with possi ble confounding. Pro cedures hav e b een developed b y authors in genetics and statistics. In this chapt er, we relate these prop osals. W e prop ose tw o new multiple testing approac hes within this framework. The ﬁr s t com bines sensi- tivit y analysis metho ds wi th false discov ery rate estimation procedures. The second inv olv es construction of s hr ink age estimators that utilize the mixture model for multiple testing. The pr ocedures are illustrated with applications to a gene expression proﬁling experim ent i n prostate cancer. 1. In tro duction In the study of complex dis eases such as cancer , inv estigators have sought to impli- cate candidate gene p olymo r phisms with disease. Candidate gene p olymor phisms hav e bee n t y pic a lly identiﬁed in case-control studies, wher e their frequencies in cases a nd co nt rols ar e typically compa r ed. Such studies are quite c ommonplace in the scientiﬁc literature. F or this c ontext, w e fo cus o n the problem of m ultiple tes ting . A useful quantit y to consider in m ultiple testing situations is the false disc overy rate (FDR) (Benjamini and Hoch ber g [ 2 ]). Sabatti et al. [ 24 ] a nd Ab ecas is et al. [ 1 ] hav e advocated its use in genetic asso cia tion studies. There has b e en a lot of work on sta tistical pro cedures inv o lving the fals e discov ery rate, which we review in Section 2. While muc h work has b een done on m ultiple testing pro cedures, very little work to date has be e n done on estimation for this setting. Recently , E fron [ 11 ] has ar gued that these metho ds presume theo r etical null dis- tributions, which migh t b e incorr ect, and has instead argued for use of a n empirical nu ll distribution. A parallel developmen t of this problem has o c c urred in g enetic asso ciatio n studies (Cardon and Bell [ 7 ]), in which statistica l geneticists lo ok a t frequencies of alleles in aﬀected (cas e) and unaﬀected (control) po pulations to de- termine asso ciations b etw een genes and disea se. W right [ 36 ] a r gued that a v ariety of p opulation genetic forces, such as nonra ndom ma ting and g enetic dr ift, mig ht induce genetically similar s ubgroups within p o pula tions. This is r eferred to as p op- ulation str ucture in the genetic literatur e . Prop osa ls for addressing this problem hav e be en developed b y Devlin and Roeder [ 10 ] and Ro senberg and Pritchard [ 23 ]. ∗ Supported by the National Institutes of Health and the National Science F oundation. 1 Departmen t of Statistics, T he P ennsylv ania State Unive rsity , 326 Thomas Building, Ann Arb or, M I 48109-2029, USA, e-mail: debashis orama@gma il.com AMS 2000 subje ct classiﬁc ations: Primary 62P10; secondary 92D10. Keywor ds and phr ases: asso ciation studies, empirical null h ypothesis, multiple comparisons, statistical genomics. 243 244 D. Ghosh In this article, w e seek to unify these pr op osals in a f ramework for m ultiple testing in which we adjust for confounding. The p opulation structure that must be a ccounted for in genetic asso ciation studies re pr esents one t ype of confo und- ing. There has also b een mention on the r ole of other confounders in diﬀerential expression analyses for micro a rray da ta (Bhattacharya et al. [ 6 ] and Ghos h and Chinnaiyan [ 17 ]). Co nfounding is a pro blem in tha t it biases the asso ciations that are o bserved in the da ta. In other words, because of co nfounding, under the null hypothesis of no asso cia tion, the usual test statistic do es no t ha ve the corre c t mean. The str ucture of this pap er is as follows. In Section 2, we deﬁne the false dis- cov ery rate and r eview the standard mixture mo del for m ultiple h ypo thesis testing that has b een used in the literature . W e then slightly generalize the mode l to al- low for confo unding and then rela te three propo sals for m ultiple testing (Devlin and Ro eder [ 10 ], Pritchard and Rosenberg [ 23 ] and Efron [ 1 1 ]) to this model. W e then pr op ose tw o analy tical schemes for m ultiple testing in the presence of con- founding. The ﬁrst inv o lves utilization of s ensitivity analysis metho ds to adjust for confounding (Lin et a l. [ 19 ]), followed by q-v a lue estimation (Storey [ 28 ]) o r a false discov ery ra te controlling pro cedure. This , a long with the link to related shrink age estimation pro cedur es (James and Stein [ 18 ], Sen and Sale h [ 2 6 , 27 ] and George [ 15 ]), is discussed in Section 3. In Section 4, we dis cuss estimation of asso cia tion, along with conﬁdence in terv als in the m ultiple testing situation in the presence of confounding. W e highlight the pro cedure s with applications to a gene e x pression data from a pro state ca ncer s tudy in Section 5. Fina lly , we conclude with some discussion in Section 6. 2. Data structures, bac kground and prelim i naries Suppo se there ar e g ge nes that we wish to make gene-s pec iﬁc hypotheses ab o ut. W e ca n cr oss-cla ssify the corr esp onding g h ypothes es into their true status (i.e. null is true or alter native is true) and also based on whether or not w e decide to r eject the null h ypothes is . Suc h a table is g iven in T able 1 . The c la ssical quan tit y that ha s been controlled in multiple testing problems has bee n the familywise type I e rror rate (FWER). Based on T able 1 , the FWER is deﬁned as P ( V ≥ 1 ). By cont rast, the deﬁnition of false discov ery rate (FDR) as put forward by Benjamini and Ho ch ber g [ 2 ] is F DR ≡ E  V Q | Q > 0  P ( Q > 0 ) . The conditioning on the even t [ Q > 0] is needed becaus e the fraction V /Q is not well-deﬁned when Q = 0. If all the null hypotheses are true (i.e., g 0 = g ), then control of the false dis cov ery also provides control of the fa milywise t yp e I error rate. In general, howev er, co n trol of the tw o quant ities ar e not equiv alent, and the control of the false dis c ov ery r ate is less conserv ative than that o f FWER. T able 1 Outc omes of G t ests of hyp otheses Accept Reject T otal T rue null U V G 0 T rue alternative T S G 1 W Q G Multiple co mp arisons 245 FDR-related pro cedur es fa ll int o tw o clas s es: (1) methods that aim to control FDR; (2 ) methods for direct estimation of FDR based on a ﬁxed r e jection regio n. The ﬁrst class o f proce dur es ha s been prop osed by many author s, such a s Benjamini and Ho ch berg [ 2 ], Benjamini a nd Liu [ 3 ], Benjamini and Y ekutieli [ 4 ] and Sa rk ar [ 25 ]. E fron et al. [ 12 ], Storey [ 28 , 29 ] and Genovese and W asser ma n [ 13 ], 2005 hav e developed the s econd class of pro ce dures. The tw o classes of metho ds ha ve been uniﬁed by Storey et al. [ 30 ] and Genov ese and W asser man (2005). The FDR is a lso linked intimately to the q-v alue (Storey [ 28 ]), the smallest FDR at which a null hypothesis can b e rejected. 2.1. Mixtur e mo del fr amework for multiple testing F o r false discov ery rate estimatio n proce dur es, many authors hav e studied a mixture mo del for mult iple tes ting. T o test each n ull hypo thesis, let T g denote t he tes t statistic for the g th gene. Deﬁne indicator v ariable s H 1 , . . . , H G corres p o nding to T 1 , . . . , T G , wher e H i = 0 if the null hypothesis is true and H i = 1 if the alternative hypothesis is tr ue . Assume that H 1 , . . . , H G are a random sample fro m a Bernoulli distribution where P ( H i = 0) = π 0 , i = 1 , . . . , G . W e deﬁne the densities f 0 and f 1 corres p o nding to T i | H i = 0 and T i | H i = 1, ( i = 1 , . . . , m ). The marg inal density of the test statistics T 1 , . . . , T m is given by (2.1) f ( t ) ≡ π 0 f 0 ( t ) + (1 − π 0 ) f 1 ( t ) . The mixture mo del framework repr esented in ( 2.1 ) has b een used by several authors to study the false discovery rate (e.g., E fr on et al. [ 12 ], Storey [ 28 ] and Genov ese and W asser man [ 14 ]). Given an estimate of π 0 , methods for false discov ery rate estimation ha ve b een developed by several author s (Efro n et al. [ 12 ] and Stor ey [ 28 ]). There are several methods f or es timating π 0 in the liter ature, mo s t of which are based o n the deﬁntion of π 0 as the deriv ative of the density of test statistics corres p o nding to the null hypothesis ev a lua ted at one. The latter deﬁnition ca n be found in Stor ey and Tibshira ni [ 31 ]; methods of estimatio n of π 0 can be found in Storey [ 28 ], Pounds and Cheng [ 22 ] and Dalmasso et al. [ 8 ]. Intuitiv ely , it is estimated base d on p- v alues that in a reg ion clo se to one; the size of the region will be inevitably linked to a bias-v ariance tradeoﬀ regar ding the estimate of π 0 . 2.2. A djustments for c onfounding In mo del ( 2.1 ), author s hav e usually a s sumed f 0 to b e known. F or e x ample, if the test statistics are p-v alues, then f 0 has b een assumed to be the p df of a Unifor m(0 ,1 ) random v a riable. If the statistics are W ald- t yp e statistics (e.g., t w o sample t-tes ts ), then f 0 is usually tak en to b e a normal distribution with mean zero and v ariance one. There hav e been c ertain pro p o sals in which authors attempted to use permuta- tion metho ds to estimate f 0 (e.g., E fron et al. [ 12 ]). This tec hnique has b een used to account for cor r elation b etw een the h ypo theses, e.g. genes on a micr oarray hav- ing c orrela tio n. Perm utation is o ften ca r ried out b y interchanging labels b etw een independent samples. How ever, a crucial assumption of p ermutation metho d is that under the n ull hypothesis, all assignments of class la b e ls to samples ar e exchange- able. In the pr esence of confounding, this assumption is no long er true . This is wh y Efron [ 11 ] wr ites that “p ermutation metho ds do no a utomatically resolve the issue of the theor e tical versus the empir ical null hypo thesis.” 246 D. Ghosh T o acco unt for confounding, let us assume that under the null hypo thesis, the i th test statistic T i has distr ibution f C 0 i ( t ), where the subscript C denotes confounding. Assume that under the a lternative hypothesis, the distribution is f C 1 i ( t ) for T i , i = 1 , . . . , n . Then we hav e the follo wing mixtur e mo del: (2.2) T i ind ∼ π 0 f C 0 i ( t ) + (1 − π 0 ) f C 1 i ( t ) . Assume further that f C 1 i ( t ) do e s not dep end on i . Suppose there w ere s upplemen tary test statistics U 1 , . . . , U K av ailable for which it w ere known that the null hypothesis was true. The framework o f Devlin and Roeder [ 10 ] works in the following manner. Using a population g enetic mo del and the presence of populatio n stratiﬁcation, they sho w that f C 0 i ( t ) will b e relatively overdisperse d compared to f 0 i ( t ), whic h they assume not to dep end o n i . T o be sp eciﬁc, f C 0 i ( t ) = σ − 1 i f 0 ( t/σ i ) . While there should also be a mean shift (i.e. bias correction) in f C 0 relative to f 0 , Devlin and Ro eder [ 10 ] argue that the o verdispersio n outw eighs the bias; this has b een supp o rted b y other a uthors using simulation studies (W acholder et a l. [ 34 ]). Making a further assumption that the scale pa rameter σ i is consta nt acro ss the genome so that it no longer dep ends on i , Devlin a nd Roeder [ 10 ] use the s tatistics U 1 , . . . , U K to estimate σ . L e t the resulting estimator b e denoted as ˆ σ ; one can then use in ˆ σ − 1 f 0 ( t/ ˆ σ ) as the reference null hypothesis for multiple testing. Devlin a nd Ro eder [ 10 ] arg ue for the co nstant v ariance ass umption ac ross the genome using po pulation genetic arguments. An a lternative approach to adjusting for co nfounding was put forward b y Pritchard a nd Rosenber g [ 23 ]. They po stulated the existence of a latent v ar iable C such that co nditional on C = c , (2.3) T i | C = c iid ∼ π 0 f 0 ( t ) + (1 − π 0 ) f 1 ( t ) , i.e. within subp opula tions deﬁned by C , the test statistics have the distributio n deﬁned as in ( 2.1 ). In their metho d, wha t is needed are supplementary data that can be used to deﬁne C for eac h individual. Just as in the genomic control metho d of Devlin and Roeder [ 10 ], these are data o n genes for whic h the n ull hypo thes is is known to be true. The appr oach o f Pritchard and Rosenber g [ 23 ] is to infer sub- po pulation mem be r ship ( C ) using the supplementary data and then do standar d m ultiple testing within each subpo pulation. Conditional on C (i.e., within subpopu- lations deﬁned by C ), the theoretical null and alter native distributions ca n b e used for the multiple testing problem. F or Pritchard and Rosenberg [ 23 ], they use the supplement al data to infer subpopula tio n mem bers hip and then for those deﬁned by common v alues for the inferr ed cluster member s hip, they ca n p erfor m the standa rd m ultiple testing pr o cedures. The appro a ch of Efron [ 11 ] in volv es the f ollowing model: (2.4) T i iid ∼ π 0 f C 0 ( t ) + (1 − π 0 ) f C 1 ( t ) . Thu s, the densit y of the test statistics corresp onding to the null and alternative hypotheses are left unsp eciﬁed. No relations hip b etw een f C 0 ( t ) to f 0 ( t ) is formulated, making this approach more ﬂexible than that o f Devlin and Roeder [ 10 ]. In addition, no s upplemen tal data are r equired on known true null h ypothese s, in contrast to the Devlin and Roeder [ 10 ] and Pritchard and Ro senberg [ 2 3 ] pro po sals. T o mak e estimation in ( 2.4 ) feasible, E fron [ 11 ] ma kes a zero-matching assumption, namely Multiple co mp arisons 247 that all statistics near zero come from the n ull h ypothesis co mpo nent and relates f 1 to f 0 via an exp onential tilt (Lindsey [ 20 ]). In co mparing the prop osals, we ﬁnd that b oth the method of Pritchard and Rosenberg [ 23 ] and Devlin and Ro e der [ 10 ] b oth r equire supplemental data , a s well as an assumption that the supplemental data are not asso cia ted with the phenotype in any manner. By contrast, no supplemental data are needed for the metho d of Efron [ 11 ]. How ever, a desirable feature of the ﬁr s t tw o pro p osals mentioned is that they explicitly mo del the confounding in an easily interpretable w a y into the mo del. The Efro n appro ach is more algo rithmic in nature , although he mentions confound- ing a s being one reason to prefer the empirical null r elative to the theoretical null hypothesis. If we think of confounding as manifesting thr ough a bias and v ar iance adjust- men t to the theoretical n ull distribution, then w e ﬁnd that the genomic control metho d makes a v ar iance a djustmen t to the theoretical n ull, while the approach of Pritchard and Ros e n b erg [ 23 ] prop osal mak es an adjustmen t to b oth through the latent v aria ble C . The Efro n [ 11 ] pro cedure also doe s the same witho ut sp ecifying a probabilis tic mo del for the confounding. T he ﬁrst a pproach we prop ose in this pap er is to p o stulate confounding in a manner a nalogo us to P ritchard and Rosen- ber g [ 23 ]; how ever, the ac tual m ultiple testing proc e dur es do no t r equire additional data, which mak es it similar to the approa ch of Efron [ 11 ]. An imp orta n t step in application o f the pro p o sed metho dolo gy is the use of sens itivit y ana lysis metho ds, which a re describ ed next. 3. Sensitivity analysis and m ultiple testing metho dologies T o incorp ora te confounding into the analysis, we will ut ilize the framework of Lin et al. [ 19 ]. They ca st se ns itivit y analysis into a regressio n modelling framework in which the confounder w as treated as an unobs erved cov ariate. While they focused primarily on binary and time-to-even t outco mes, the data we consider in Section 5 inv o lves a contin uous resp onse. Also, we have the issue of multiple testing. 3.1. Continuous r esp onse W e observe the data ( X i , D i ), i = 1 , . . . , m , where X i is the n -dimensional cov ari- ate vector and D i the binar y phenotype for the i th individual. W e formulate the following r egress ion mo del r elating X and D : (3.1) E ( X ki | D i , U ) = β 0 k + β 1 k D i + γ C i , where ( β 0 , β 1 ) are regressio n co eﬃcients, C repre sents the confounder and γ is the asso ciated regr ession co eﬃcient. Note that w e assume no interaction b etw een C i and D i in ( 3.1 ). Since C is not o bserved, we ﬁt the reduced model: (3.2) E ( X ki | D i ) = β ∗ 0 k + β ∗ 1 k D i . W e w ant to determine the r elationship betw een β 1 and β ∗ 1 . Note that integrating ( 3.1 ) with res pe c t to the conditional distr ibution of C given D implies the following mo del: (3.3) E [ X ik | D i ] = β 0 k + β 1 k D i + γ E [ C i | D i ] . 248 D. Ghosh W e now assume C | D to b e normal with mean µ D and v ariance one. F or this situa - tion, plugging in the deﬁnition and equating regression coeﬃcients for D i in ( 3.3 ) and ( 3.2 ) yields (3.4) β 1 k = β ∗ 1 k − γ ( µ 1 − µ 0 ) . Thu s, after the user sp eciﬁes γ and the diﬀerence ( µ 1 − µ 0 ), the estimate o f β 1 k , ˆ β 1 k , can b e found quite ea s ily by direct computation in to ( 3.4 ) Since the inputs are co nstants, this has no impact on the standard error s of β 1 k . O ne ca n then use ˆ β 1 k / ˆ S E ( ˆ β ∗ 1 k ) a s the W ald statistic whic h when co mpared to a standa rd normal distribution yields a p-v a lue. Ag ain, the standard erro rs a re unaﬀected by the sensi- tivit y ana lysis para meters, a nd w e can obta in p-v alues bas e d on the W ald statistic. 3.2. Q-value/FDR adjustment for multi ple testing Based on the p-v alues ca lculated in the prev io us section, one c an co ns truct hypothesis-sp eciﬁc q- v alues (Stor ey a nd Tibshira ni [ 31 ]). Q-v alues are deﬁned to be the smallest FDR a t which a test o f h y po thesis is signiﬁca nt , a nalogo us to the p-v alue b eing the smallest le vel of signiﬁca nc e a t which a test o f hypothesis is deemed to b e signiﬁcan t. Here is the q-v alue construction algo rithm: 1. Order the G p-v alues as p (1) ≤ p (2) ≤ · · · ≤ p ( G ) . 2. Construct a grid of L λ v alues, λ 1 , . . . , λ L and calcula te ˆ π 0 ( λ l ) = # { p j > λ l } G (1 − λ l ) , l = 1 , . . . , L . 3. Fit a cubic smo othing s pline to the v alues { λ l , ˆ π 0 ( λ l ) } , l = 1 , . . . , L . 4. Estimate π 0 by the in ter p olated v alue at λ = 1. 5. F o r th e gene with the largest p- v alue the q- v alue is given b y q ( p ( G ) ) = min t ≥ p ( G ) ˆ π 0 Gt # { p j ≤ t } = ˆ π 0 p ( G ) , and for i = G − 1 , G − 2 , . . . , 1, q ( p ( i ) ) = min( ˆ π 0 Gp ( i ) /i, ˆ p ( i +1) ). This guar - antees that the q-v alues will b e mo no tonically incr easing as a function of p-v alues. An a lternative approach is to use the p-v alues in the Benjamini-Ho ch b er g [ 2 ] pro cedure for co ntrolling the false discov ery rate: (a) Let p (1) ≤ p (2) ≤ · · · ≤ p ( G ) denote the order e d, o bserved p-v alues. (b) Find ˆ k = max { 1 ≤ k ≤ G : p ( k ) ≤ αk / G } . (c) If ˆ k exists, then reject n ull hypotheses p (1) ≤ · · · ≤ p ( ˆ k ) . Otherwis e, reject nothing. The ma jor adjustmen t in these procedure s is that we are accounting for potential adjustment of co nfo unding , which impacts the p- v alues that get used in the multiple testing pro cedur es. Note that while we ha ve used q-v alues to adjust for the multiple testing problem, the p-v a lue s can be input in to an y single-step algo rithm for mult iple tes ting. If we wished to con trol the fa milywise t ype I err or level instead, we co uld use any of the Multiple co mp arisons 249 pro cedures from W estfall and Y oung (1993) or so me of the more rec ent methods from V an der Laan and Dudoit a nd colleagues (V an der L a an et al. [ 32 , 33 ]). One thing to note a bo ut the sensitivit y ana lysis metho d describ ed in Section 3 .1 and is that it is eﬀectiv ely making an mean-s hift adjustment to the theoretical null distribution. In par ticular, the standa rd er rors of the estima to rs are not aﬀected by the sensitivity analys is metho d of Lin et al. [ 19 ]. The m ultiple testing mo del that adjusts for co nfounding assumed here, in the notation of Section 2, is the following: (3.5) T i | Θ iid ∼ π 0 f 0 ( t − Θ) + (1 − π 0 ) f 1 ( t ) , where Θ denotes the sensitivity analysis inputs and a re ﬁxed consta nt s sp eciﬁed by the user. One adv a nt age of such a model for confounding is that it r e quires minimal user input and is easy to implement. A second adv antage is that what ar e aﬀected in this proc e dur e is the estimate of π 0 and the actual q-v alues themselves. In settings where in vestigators are interested in the selection of hypotheses, these quantities migh t be a ll that is of interest. In comparing this mixture model with ( 2.2 ) and ( 2.3 ), w e ﬁnd that the s e ns itivit y analysis approa ch models c o nfounding as a shift in the mean for the confounding nu ll density relative to the theoretical null density . W e allo w for nonconstant v ari- ance acr oss the genome, in con trast to the appr oach o f Devlin and Roeder [ 10 ]. In addition, unlik e Dev lin and Ro e der [ 10 ], we mo del the bias due to confounding. By compa rison, the Pritchard a nd Rosenberg [ 23 ] and Efron [ 11 ] appro aches would allow for both a mean and v ar iance adjustmen t for the null h ypothes is density . It appea rs that combining the appro ach here with t hat of g enomic con trol s ho uld provide an answer that is closer to the Pritchard and Ros e n b erg [ 23 ] pro p osal, if supplement al data were av ailable. 3.3. Shrinkage estimation for the q-value W e now heuristically describ e an ar gument, given in more detail in Ghosh [ 16 ], that shows how the q-v a lue can be motiv ated as a shrink ag e estimator in the m ultiple testing pro blem. Consider mo del ( 2.1 ) ag ain, whe r e the test statistics ar e the p- v alues. Then one could construct shrink ag e estimators of the p-v alues where they are shr unk towards the comp onents of the mixture mo del deﬁned b y ( 2.1 ). If w e assume that the densit y o f the n ull hyp o thesis is a point ma ss at one and that of the alter native is zero, then it is shown in Ghos h [ 16 ] that a shr ink age es timator of the p-v alue in th is problem is lo w er bounded by the q-v a lue. Th us, th e q-v a lue approach advocated by Stor ey [ 28 ] enjo ys a nice shrink a ge pro pe rty . How ever, the deﬁnition o f the q-v alue given by Storey [ 28 ] is a ppropriate to the situation of 0 / 1 loss (miscla ssiﬁcation err or). The shr ink age es timation pro cedur e by Ghosh [ 16 ] could generalize e a sily to other loss functions, whic h might b e more appropria te in other scientiﬁc cont exts. 4. Estimators of ass o ciation in the pres ence of confounding A limitation o f all these approa ches is tha t they do not pro duce mult iple-testing adjusted e s timates of ass o ciation or conﬁdence in terv als. W e no w discuss the prob- lem o f how to estimate asso ciation mea sures in the m ultiple testing problem in the presence of co nfounding. While the multiple testing problem has b een considered quite int ensively in the statistica l literature, that of estima ting ass o ciations has not. 250 D. Ghosh This w as p ointed out recen tly by P rentice and Qi [ 21 ] in a genomewide study ex- ample. Our approach is based on the idea that we can use ( 2.1 ) to deﬁne e stimation targets under the null and alternativ e hypo theses. One then can deﬁne Empirical Bay es estimators that shrink the observed test statistic tow a rds each target with appropria te mixing w eight s. F o r estimatio n, J ames-Stein (Ja mes and Stein [ 18 ]) estimators are used. W e now construct the double s hrink ag e estimators using the tes t statistic mo del ( 2.1 ) where F = π 0 F 0 +(1 − π 0 ) F 1 . T o do this, we utilize the densit y estimation metho d pro p osed by E fron [ 11 ]. Note fro m ( 2.1 ) that w e ha v e π 1 f 1 ( t ) = f ( t ) − π 0 f 0 ( t ) . W e ca n estimate f ( t ) by apply ing density estimation metho ds to T 1 , . . . , T G . F or estimation of π 0 f 0 ( t ), the zer o a ssumption in E fron [ 1 1 ] is utilized. What this means is that most test statistics with a v alue near z e r o comes fr om the null distribution comp onent. The assumption, combined with a no rmal-base d mo ment s matching techn ique and density estimatio n pro cedures as describ ed in Efron [ 11 ], allows o ne to obtain an estimate of π 0 and f 0 ( t ). Giv e n them, w e then obtain a n estimate of π 1 and f 1 ( t ) b y simple subtraction. Ba s ed on the estimates of f 0 ( t ), f 1 ( t ) and π 0 , we c an estimate T J S 1 , . . . , T J S n by (4.1) ˆ T J S i = ˆ π 0 ( T i ) ˆ T J S 0 i + { 1 − ˆ π 0 ( T i ) } ˆ T J S 0 i , where ˆ T J S 0 i = T i − " 1 ∧ G − 2 P G i =1 ( T i − ˆ µ 0 ) 2 # ( T i − ˆ µ 0 ) , ˆ T J S 1 i = T i − " 1 ∧ G − 2 P G i =1 ( T i − ˆ µ 1 ) 2 # ( T i − ˆ µ 1 ) , ˆ π k ( t ) = ˆ π k ˆ f k ( t ) ˆ π 0 ˆ f 0 ( t ) + (1 − ˆ π 0 ) ˆ f 1 ( t ) , ˆ µ 0 = R td ˆ F 0 ( t ) and ˆ µ 1 = R td ˆ F 1 ( t ). Before discuss ing the issue o f conﬁdence in terv als, no te that w e are us ing the mo del fr amework of E fron [ 11 ] to dea l with confounding, i.e., model ( 2.4 ). As men- tioned b efore, estimation in this model requires no s upplemen tal data on known nu ll hypotheses. Along with estimators o f the par ameters that a djust for multiple testing, it is useful to hav e conﬁdence in terv als that als o account for the m ultiple testing phe- nomenon. T o calculate the conﬁdence in terv als , we will use the following sim ulation- based algor ithm: 1. Estimate π 0 , f 0 and f 1 using the algo rithm ab ove. 2. Construct the double shrink age estimators of µ 1 , . . . , µ G , ˆ T J S 1 , . . . , ˆ T J S G . 3. Sample with r e placement G observ a tions µ ∗ 1 , . . . , µ ∗ G from the es timated den- sity ˆ π 0 ˆ f 0 ( t ) + (1 − ˆ π 0 ) ˆ f 1 ( t ) and generate T ∗ 1 , 1 , . . . , T ∗ G, 1 , where T ∗ i, 1 ∼ N ( ˆ T J S i , 1 + ˆ σ 2 ∗ ), where ˆ σ 2 ∗ is the empirical v a riance of the µ ∗ ’s sampled in Step 2. 4. Repe a t Step 3 B times. Use the empirical distr ibution o f T ∗ i, 1 , . . . , T ∗ i,B to calculate conﬁdence interv als for µ i , i = 1 , . . . , G . 5. Multiply b y the gene-speciﬁc standard devia tions to get conﬁdence interv als on the scale of the par ameter. Multiple co mp arisons 251 Based on the b o otstra p distributions, we can co nstruct conﬁdence interv als for each of the comp onents of µ ≡ ( µ 1 , . . . , µ G ). W e hav e chosen to use equal tail conﬁdence int erv a ls, e.g. a 95% co nﬁdence int erv a l would b e based on the 2.5 th and 97.5th per centiles of the empirical distribution of T ∗ i . 5. Numerical example: Mi croarra y proﬁling s tudy in prostate cancer W e no w illustrate the pro p o sed metho dologies using tw o genomic studies. The ﬁrst comes fro m prostate cancer g e ne expression proﬁling exp eriment rep orted in V ar ambally et al. [ 35 ]. The inv estigators us e d cDNA micro arrays co ntaining 9984 genes to proﬁle tissue samples from v arious stages of prostate cancer (normal ad- jacent prostate, b enign prostatic hyperpla sia, lo calized prostate cancer , adv anced metastatic pro state cancer). Measurements w ere made on 99 84 genes for 10 1 individuals, of who m 79 have cancers and 22 do not. Before analyzing the data, we ﬁltered out genes that ha d a sample v ariation less than 0.05 across all samples. This left a total o f G ≡ 6 253 genes av ailable for analy s is. First, a q-v alue analysis (St orey [ 28 ]) w a s done using the unpo o led t wo-sample t-test; this is rep o r ted in Figure 1. F or this analy s is, w e assumed a theoretical nu ll distribution of N (0 , 1) for ea ch t-statistic. O ne po int of note from the plot is the estimate of π 0 , the fraction of null h ypothes e s estimated to b e tr ue, is 0.49. As a result, we are ﬁnding a large prop o rtion of genes to b e diﬀerentially expres sed. Fig 1 . R esults of the q -value analysis (Stor ey [ 28 ]) f or pr ostate c anc er gene e xpr ession data. 252 D. Ghosh T able 2 R esults of sensi t ivity analyses for g ene expr ession data γ ( µ 1 − µ 0 ) ˆ π 0 Num ber q-v alues ≤ 0.05 0 0 0 . 49 2099 1.5 0 . 01 0 . 86 1762 0 . 1 0 . 47 613 0 . 3 0 . 05 6227 0 . 5 0 . 01 6253 1 0 . 006 6253 0.1 0 . 01 0 . 49 2415 0 . 1 0 . 78 1902 0 . 3 0 . 86 1466 0 . 5 0 . 80 1154 1 0 . 69 586 As ar g ued by Ghosh and Chinnaiy an [ 1 7 ], there is a v ariety o f co nfounders that might ca use the apparent o bserved diﬀerence betw een the cancer and noncancer samples. The implica tion is that the s tandard normal distribution might not b e the correct r eference null distributio n to use. W e tried adjustments to the t-statistic based o n the s ensitivity analysis appr oach. First, w e assumed a confounder that was co nt inuous and tried v a rious v alues o f ( γ , µ 1 , µ 0 ) to obtain estimates of π 0 and the q-v a lues. Note that what matters is the diﬀerence in means µ 1 − µ 0 , so we considered changes of 0 . 01 , 0 . 1 , 0 . 3 , 0 . 5 and 1. W e have aggrega ted the sensitiv ity results into T able 2. What we ﬁnd is that the estimate of π 0 is extremely sensitive to confounders . In terestingly , as the pr o duct of γ and ( µ 1 − µ 0 ) decr e ases, w e ﬁnd that the es timate of the prop or tion of true null h yp o theses increa s es, which lea ds to a decrease in the num ber of gene s being called signiﬁcant if w e use a cutoﬀ of 0 .05, say . How ever, we als o see that there is no simple relationship betw een the amoun t of confounding and the estimate of π 0 . The estimate of n um ber of genes b eing called signiﬁcant is also sensitive to the estimate of π 0 . F or example, there is very little diﬀerence in the estimates of π 0 for the unadjusted and ( γ , µ 1 − µ 0 ) = (0 . 1 , 0 . 01) situation, y et we are calling approximately four h undred more genes sig niﬁcant in the second scenar io. This emphasizes the need to ha v e g o o d estimates of π 0 . Next, w e utilized the E fron [ 11 ] metho d for estimating the lo cal false discovery rates; the results are prese n ted in Figure 2. Ag ain, this pro cedure co rresp onds to a data-driven adjustment to null hypothese s for confounding. Now note that the estimate of π 0 from the Efron [ 11 ] pro cedure is 0 . 88, almos t 80% larg er than the estimate using the method of Storey [ 28 ]. This ﬁts into our intuition that adjusting for confounding s ho uld decrease the num ber of signiﬁcant genes. W e next calculated the do uble-shrink a ge estimators for the t-statistics using the metho d in Section 3 .1 . W e then com bined an FDR-thresholding pro cedure with the inference pro cedures describ ed in the paper. This was done in the follo wing manner . W e applied the q-v alue pro cedure o f Stor ey [ 28 ] a nd s e le cted the genes with the t w en ty smalles t q- v alues. W e r e po rted the eﬀects and asso ciated 95% co nﬁdence interv als in T able 3 . Note that the 95% CIs sho uld b e cons erv ative in that we hav e igno r ed the s election pro cess (i.e. taken the tw ent y genes with the sma llest q-v alues ). 6. Discussion In this article, we hav e addr essed the issue of m ultiple testing pro cedur es in the presence of confounding. First, we sought to unify propo s als from the statistical and genetics literature on the problem. Doing so clar iﬁe s and provides a stronger Multiple co mp arisons 253 Fig 2 . R esults of the t -test statistics fr om the pr ostate c anc er gene expr ession data using the lo c al false disc overy r ate estimation pr o c e dur e of Efr on [ 11 ]. T able 3 R esults for t op 20 genes for pr ostate c anc er gene expr ession data Gene name Estimator S ˆ β 95% CI T ransforming growth factor, b eta 1 6 . 99 0 . 30 (0 . 09 , 0 . 52) F er-1-like 3, my oferlin (C. elegans) 8 . 36 0 . 55 (0 . 14 , 0 . 93) O-linked N-acet ylglucosamine transferase − 9 . 11 − 0 . 51 ( − 0 . 88 , − 0 . 18) Human calmo dulin-I (CALM1) mRNA 6 . 87 0 . 56 (0 . 17 , 0 . 98) Hepsin (transmembrane pr otease, serine 1) − 6 . 89 − 0 . 95 ( − 1 . 62 , − 0 . 30) Ca ve olin 2 7 . 44 0 . 53 (0 . 16 , 0 . 85) Ras homolog gene famil y , mem b er B 6 . 7 1 . 41 (0 . 42 , 2 . 40) Zinc ﬁnger protein 36, C3H t ype-l ike 1 6 . 76 0 . 52 (0 . 16 , 0 . 87) Hepatic l euk emia f actor 6 . 72 0 . 46 (0 . 11 , 0 . 78) Phosphatidic acid phosphatase type 2B 6 . 59 0 . 34 (0 . 08 , 0 . 60) Multiple endo crine neoplasia I − 6 . 51 − 0 . 31 ( − 0 . 54 , − 0 . 08) Endothelin r eceptor type A 6 . 47 0 . 51 (0 . 12 , 0 . 92) T ransforming growth factor, b eta receptor I I I 6 . 38 0 . 54 (0 . 10 , 0 . 98) Gro wth factor receptor-bound protein 2 − 6 . 31 − 0 . 42 ( − 0 . 77 , − 0 . 12) Hermansky-Pudlak s yndrome 1 6 . 32 0 . 34 (0 . 07 , 0 . 58) Dickk opf homolog 3 (Xenopus l aevis) 6 . 29 0 . 34 (0 . 09 , 0 . 61) Phosphatidic acid phosphatase type 2B 6 . 24 0 . 32 (0 . 06 , 0 . 57) Tissue inhibitor of metalloproteinase 3 6 . 2 0 . 55 (0 . 12 , 1 . 00) Hypothetical protein hCLA-iso − 6 . 13 − 0 . 38 ( − 0 . 70 , − 0 . 09) Zinc ﬁnger protein 36, C3H t ype-l ike 2 6 . 08 0 . 34 (0 . 08 , 0 . 62) 254 D. Ghosh justiﬁcation for pr e ferring the empirical null distribution that Efro n [ 11 ] has rece ntly advocated. It is also clear fro m the discussion that wha t the metho ds of Devlin and Ro eder [ 10 ] a nd Pritc hard and Rose n b erg [ 2 3 ] provide are a djustmen ts to the theoretical null, using supplemen tal data and under v ar ious assumptions on the nature of confounding a nd its eﬀects on lo cation a nd scale par ameters. Second, we hav e prop osed tw o ana lytical approaches. The ﬁrst is a sensitivity analysis approach using a methodology describ ed b y Lin et al. [ 19 ], whic h is then plugged into standa rd FDR metho dology . As shown in the article, this can b e viewed as a mean-shift adjustment to the theoretical null. The seco nd pr o cedure in the a rticle we pursue is one using E mpirical Bay es methodolo gy . This inv o lves utilizing the mixture mo del for hyp o thesis testing in the pr esence of confounding prop osed by Efron [ 11 ] and constr ucting James-Stein estimators of the test s tatistic that shr ink age tow ards each target (that spe c iﬁed under the nu ll a nd that under the alternative) with data-dep endent weights. What is novel here is tha t the mea n v alue of the distribution of the test statistic under the null is als o estimated as par t of the pro cedure. This leads to calibration of the test statistics under the empirical nu ll h ypo thes is (Efro n [ 11 ]) r ather than the theoretical n ull hypothesis . W e also develop conﬁdence interv als in the multiple testing setting, which has no t b een discussed very muc h in the liter ature, with the ma jor exception being the work o n FDR-controlling conﬁdence in terv als by Benjamini and Y ek utieli [ 5 ]. The metho do logy prop osed her e is fairly general. The se ns itivit y analysis regres - sion mo dels co nsidered here ar e a subset of those considered by Lin et al. [ 19 ]; the metho dology prop osed here could apply more g enerally to censored outcomes through propo rtional haza rds r egressio n mo dels and coun t r esp onses using Pois- son regression mo dels. The double shrink age estimation metho dology in Section 4 requires having W ald-type estimators. While there hav e b een Empirica l Bay es meth o ds for mult iple tes ting prop osed in the literature (Efron et a l. [ 12 ], Datta and Datta [ 9 ] and Ghosh [ 1 6 ]), most o f this work has fo cused o n hypothesis testing and selection o f hypotheses. While this is useful in scr eening problems, it mig ht also b e of interest to rep ort estimated eﬀects and conﬁdence in terv als corres p o nding to the rejected null hypotheses . A ma jor p otential adv an tage of Empirical Bayes (and more genera lly , fully Bay esian) metho ds in this setting is that pa r ameter estimates and co nﬁdence int erv a ls w ill be unaﬀected by the selection of whic h h y po theses to reject, provided the testing pro cedure is Bayesian. This is an ar ea that is curr ent ly under inv estigation. Ac knowledgment s. The a uthor would like to thank T om Nichols and T rivellore Raghunathan for helpful discussions. References [1] Abecasis, G. R., Ghosh, D. and Nicho ls, T. E. ( 2005 ). Link a g e disequi- librium: Ancient history drives the new g enetics. Hu man Her e dity 59 11 8–12 4. [2] Benjamini, Y. a nd Hochberg, Y. (19 95). Controlling the false dis cov ery rate: A practical a nd p owerful approa ch to m ultiple testing. J. R oy. S tatist. So c. Ser. B 57 28 9–30 0. MR13253 92 [3] Benjamini, Y. and Liu, W. (199 9). A s tep-down multiple hypothesis testing pro cedure that controls the false discov ery r ate under indep endence. J. S tatist. Plann. In fer enc e 82 163 –170. MR17364 41 Multiple co mp arisons 255 [4] Benjamini, Y. and Yekutieli, D. (2 001). The co nt rol of the false discov- ery rate in multiple testing under dep endency . Ann. Statist. 29 1165–1 188. MR18692 45 [5] Benjamini, Y. and Yeku tieli, D. (2005 ). F alse discov ery rate controlling conﬁdence in terv als for selected par ameters (with discuss ion). J. Amer. Statist. Asso c. 10 0 71–80 . MR21568 20 [6] Bha tt a char y a, S., Long, D. and L yons-Weiler, J. (2003). Overcoming confounded controls in the a nalysis of gene expr ession da ta from microarr ay exp eriments. Applie d Bioinfo rmatics 2 197–20 8. [7] Cardon, L. and Bell, J. (2001). Asso ciatio n study designs for co mplex diseases. Natur e R eviews Genetics 2 9 1–99. [8] Dalmasso, C., Br o ¨ et, P. and Morea u, T. (20 05). A simple pro c e dure for estimating the false dis cov ery ra te. Bioinf ormatics 21 660–66 8. [9] Da tt a, S. and Da tt a, S. (2005). Empirical Bay es scre e ning of many p-v alues with applications to micro array studies. Bioinformatics 21 1987– 1994. [10] Devlin, B. and R oeder, K. (19 99). Genomic control for as so ciation studies. Biometrics 55 997 –100 4. [11] Efron, B. (2004 ). Large-sca le simultaneous hypothesis testing: The choice o f a null hypo thesis. J. A mer. S tatist. A sso c. 99 96 –104 . MR20542 89 [12] Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001). Em- pirical B ay es ana lysis o f a microarr ay experiment. J. Amer. St atist. Asso c. 96 1151– 1160 . MR19465 71 [13] Genovese, C. and W asserman, L. (20 02). Op era ting characteristics a nd extensions o f the false discov ery rate pr o cedure. J. Ro y. Statist. So c. Ser. B 64 499– 5 17. MR19243 03 [14] Genovese, C. and W asserman, L. (2004). A sto chastic approach to false discov ery control. Ann. Statist. 32 103 5–106 1. MR20651 97 [15] George, E. I. (1986). Minimax m ultiple shrink age estimatio n. A nn. Statist. 14 188– 2 05. MR08295 62 [16] Ghosh, D. (2006). Shrunken p-v alues for assessing diﬀerential expression, with applications to genomic data a nalysis. Biometrics 5 9 1099– 1106. MR229 7681 [17] Ghosh, D. and Chinnaiy an, A. M. (200 5). Cov ar ia te adjustmen t in the analysis o f microa rray data from clinical studies. F unctional and Inte gr ative Genomics 5 18 –27. [18] James, W. and Stein, C. (1 961). E stimation with qua dratic lo s s. Pr o c. F ourth Berkeley Symp. Math. St atist. Pr ob ab. 361– 380. Univ. Ca lifo rnia P ress, Berkeley . MR01331 91 [19] Lin, D. Y., Kronmal, R. A. and Psa ty, B. M. (1998). Ass essing the sensitivity of regressio n r esults to unmeas ur ed confounders in o bserv a tional studies. Biometrics 5 4 9 48–9 6 3. [20] Lindsey, J. K. (1974 ). Co mparison o f probability distributions. J. R oy. Statist. So c. Ser. B 36 38–47 . MR03626 43 [21] Prentice, R. L. and Qi, L. (2006 ). Asp ects of the design and analysis of high-dimensional SNP studies for disease risk estimation. Biostatistics 7 339– 354. [22] Pounds, S. a nd Cheng, C. (2004). Improving false discov ery r ate estimation. Bioinfo rmatics 20 1737– 1745 . [23] Pritchard, J. K. and Rosenber g, N . A. (1999). Use o f unlink ed genetic markers to detect p opulatio n str atiﬁcation in as s o ciation studies. Amer. J. Human Genetics 6 5 2 20–2 28. [24] Saba tti, C., Ser vice, S. and Freimer, N. (2003). F alse disco very r ate in 256 D. Ghosh link age and as so ciation g enome screens for co mplex disorder s . Genetics 164 829–8 33. [25] Sarkar, S. (2002). Some results on false discov ery rates in multiple testing pro cedures. Ann. Statist. 3 0 2 39–2 5 7. MR18926 63 [26] Sen, P . K. a nd Saleh, A. K. (1985). O n so me shrink age estimator s of m ultiv aria te lo cation. Ann. Statist. 1 3 2 72–2 8 1. MR0773 167 [27] Sen, P. K. and Saleh, A. K. (1987). On pr eliminary test and shrink age M -estimation in linear mo dels. A nn. Statist . 15 1580 –1592 . MR09135 75 [28] Storey, J. D. (20 0 2). A dir ect appro ach to fals e discov ery rates . J. R oy. Statist. So c. Ser. B 64 479–4 9 8. MR19 2430 2 [29] Storey, J. D. (2003 ). The p ositive false discov ery rate: A Bay esian interpre- tation and the q- v alue. Ann. Statist. 31 20 13–2 035. MR2036 3 98 [30] Storey, J. D., T a ylor, J. E. and Siegmund, D. (2004). Strong con trol, conserv ativ e po int estimation and simultaneous conserv ative consis tency of false discov ery r a tes: A uniﬁed approach. J. R oy. Statist. So c. Ser. B 66 187 – 205. MR20357 66 [31] Storey, J. D. and Tibshirani, R. (20 0 3). Statistical signiﬁcance for genomewide studies. Pr o c. Nat. Ac ad. S ci. US A 100 9440 –9445 . MR19 9 4856 [32] v an der Laan, M. J., Dudoit, S. and Polla rd, K. S. (2004a ). Augmen- tation pro cedures for control of the generalized family-wise error rate and tail probabilities for the prop ortio n of false p ositives. Statistic al Applic ations in Genetics and Mole cular Biolo gy 3 Article 15 . MR21014 64 [33] v an der Laan, M. J., Dudoit, S. and Po llard, K. S. (2004b). Multi- ple testing. P art I I. Step-down pro cedures for con trol of the family-wis e erro r rate. Statistic al Applic ations in Genetics and Mole cular Biolo gy 3 Article 14. MR21014 63 [34] W a cholder, S. , Rothman, N. and Caporaso, N. (200 0). Population stratiﬁcation in epidemiolog ic studies of common genetic v a riants and cancer: Quantiﬁcation of bia s. J. National Canc er In stitute 92 115 1 –115 8. [35] V aramball y, S. et a l. (200 2). The p olyco mb gro up pr otein EZH2 is inv o lved in progr ession of pr ostate cancer . N atur e 419 6 24–62 9. [36] Wright, S. (1951 ). The genetical s tr ucture of p opulations. Ann. Eugenics 15 323–3 54. MR00414 13

Multiple testing procedures under confounding

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment