Optimal factorial designs for cDNA microarray experiments

The Annals of Applied Statistics 2007, Vol. 2, No. 1, 366–385. DOI: 10.1214/07-AOAS144. © Institute of Mathematical Statistics, 2007.

By Tathagata Banerjee and Rahul Mukerjee
Indian Institute of Management Ahmedabad and Indian Institute of Management Calcutta

We consider cDNA microarray experiments when the cell populations have a factorial structure, and investigate the problem of their optimal designing under a baseline parametrization where the objects of interest differ from those under the more common orthogonal parametrization. First, analytical results are given for the 2 × 2 factorial. Since practical applications often involve a more complex factorial structure, we next explore general factorials and obtain a collection of optimal designs in the saturated, that is, most economic, case. This, in turn, is seen to yield an approach for finding optimal or efficient designs in the practically more important nearly saturated cases. Thereafter, the findings are extended to the more intricate situation where the underlying model incorporates dye-coloring effects, and the role of dye-swapping is critically examined.

1. Introduction.

Optimal designing of cDNA microarray experiments is an area of enormous potential that has started opening up in recent years. The fields of application include the biological, agricultural and pharmaceutical sciences. In an experimental design for microarrays, the cell populations under study represent the treatments which, as in traditional design theory, may be unstructured or have a factorial structure. The design is called varietal or factorial in these two situations respectively.
In varietal designs, interest lies typically in all or some pairwise contrasts of the treatment effects, whereas in factorial designs, the objects of interest are the main effects of the factors and interactions among them. These factorial effects are commonly defined via an orthogonal parametrization, but a relatively less studied baseline parametrization, which is nonorthogonal, can also be of interest depending on the context. The main distinction between these two kinds of parametrization is that the former defines the factorial effects via mutually orthogonal treatment contrasts, whereas the latter defines these effects with reference to natural baseline levels of the factors and, hence, entails nonorthogonality. More details follow in Section 3.

Received April 2007; revised September 2007.
Key words and phrases. Admissibility, augmented design, baseline parametrization, dye-swapping, interaction, main effect, orthogonal parametrization, saturated design, weighted optimality.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2007, Vol. 2, No. 1, 366–385. This reprint differs from the original in pagination and typographic detail.

In a pioneering paper Kerr and Churchill (2001a) discussed the design issues in microarrays and investigated optimal varietal designs that estimate the pairwise contrasts of treatment effects for fixed genes with minimum average variance. While observing that microarray designs can be considered as incomplete block designs with block size two, they noted the inadequacy of the classical optimality results in this regard and obtained, via complete enumeration, economic optimal designs for ten or fewer treatments.
We refer to Kerr and Churchill (2001b), Yang and Speed (2002) and Churchill (2002) for very informative further discussion on the design issues. While Kerr and Churchill (2001a, 2001b) and Churchill (2002) concentrated on varietal designs, Yang and Speed (2002) discussed factorial designs in some detail. Subsequent work on varietal designs for microarrays includes those due to Dobbin and Simon (2002), Kerr (2003), Rosa, Steibel and Tempelman (2005), Wit, Nobile and Khanin (2005) and Altman and Hua (2006), although some of these authors, as also Churchill (2002), briefly touched upon factorial designs as well.

Work on factorial designs for microarrays started gaining momentum only very recently. A major reference in this regard is Kerr (2006), who worked under the framework of the orthogonal parametrization and explored two-level factorial designs that keep all main effects and two-factor interactions estimable, without any assumption on the absence of higher order interactions. Exploiting the connection between microarray designs and incomplete block designs with block size two, she showed how replicates arising from different blocking schemes can be combined for this purpose and, in particular, gave designs with the minimum number of replicates for eight or fewer factors. Related references in the general context of two-level factorials in blocks of size two include Yang and Draper (2003) and Wang (2004) and, interestingly, Kerr (2006) proposed a construction which is more economic than that in Wang (2004) for any number of factors. Bueno Filho, Gilmour and Rosa (2006) also considered factorial microarray designs for both fixed and random treatment effects.
Their parametrization for fixed treatment effects is akin to the orthogonal one and, among other things, they studied optimal designs for the 3 × 3 factorial, paying more attention to the case where the two-factor interaction is absent. Further results on factorial microarray designs under the orthogonal parametrization were obtained by Landgrebe, Bretz and Brunner (2006), Gupta (2006) and Grossmann and Schwabe (2008). Landgrebe, Bretz and Brunner (2006) studied optimal designs within a collection of candidate designs and focused on the 2 × 2 and 3 × 2 factorials, while Gupta (2006) investigated the role of balanced factorial designs. Grossmann and Schwabe (2008) explored optimal designs for models that include only the main effects or only the main effects and two-factor interactions.

Turning to factorial designs for microarrays under the baseline parametrization, which is the main thrust of this paper, two key references are Yang and Speed (2002) and Glonek and Solomon (2004) (hereafter abbreviated GS). While Yang and Speed (2002) broadly discussed the design issues, GS reported illuminating computational results on admissible designs for the 2 × 2 factorial. Continuing with the baseline parametrization, we propose to consider general, possibly asymmetric, factorials and present analytical results. For comparative purposes, results under the orthogonal parametrization will also be occasionally indicated. The present endeavor is motivated by two reasons. First, as will be seen in Section 3, there are practical situations, even beyond the domain of microarray experiments, where the baseline parametrization is natural.
Second, although this parametrization looks simpler than the orthogonal one, it renders the task of finding optimal or efficient designs somewhat more challenging due to lack of orthogonality. Presumably due to this reason, even in traditional factorial design literature, the optimal design problem under the baseline parametrization has received very little attention. It is hoped that our results would fill in this gap to some extent.

We remark in this connection that because of the considerable difference in the definitions of the factorial effects under the baseline and orthogonal parametrizations, the significant body of work that has already been done on factorial microarray designs under the latter parametrization provides no clue to our derivation or results. The following example, concerning a study of leukemic mice, highlights this point; more details on some of the technical terms in the example are available in Sections 2 and 3.

Example 1. GS describe a cDNA microarray experiment that compares two cell lines FI∆ and V449E at times 0 hours and 24 hours. The cell line V449E proliferates into leukemia while FI∆ is nonleukemic. Then there are two factors dictating the cell populations. The first factor, namely, the mutant, has two levels FI∆ and V449E of which FI∆, being nonleukemic, is taken as the baseline level. These levels are coded as 0 and 1 respectively. The second factor is time, again with two levels, 0 hours and 24 hours, and the first of these is taken as the baseline level. These two levels are also coded as 0 and 1 respectively. Thus, considering the two factors together, there are four treatment combinations, 00, 01, 10 and 11, representing the cell populations.
The main effects of the two factors are important, but their interaction, that concerns the differential expression of genes for V449E and FI∆ at time 24 hours as contrasted with that at time 0 hours, is often of even greater interest. The experiment consists of a number of slides each comparing a pair of cell populations or, equivalently, treatment combinations. Suppose the available resources allow experimentation with six slides. Since the four treatment combinations can also be paired in six ways, namely, (01, 00), (10, 00), (11, 00), (10, 01), (11, 01) and (11, 10), the symmetric design, that compares each pair on one slide, seems very attractive, because it is a balanced incomplete block (BIB) design with excellent optimality properties under the orthogonal parametrization [Kiefer (1975)]. As a rival, consider the design that compares the pairs (01, 00) and (10, 00) each on two slides, and the pairs (11, 01) and (11, 10) each on one slide. As noted in GS, under the baseline parametrization, the symmetric design estimates the two main effects and the interaction with respective variances $\frac{1}{2}\sigma^2$, $\frac{1}{2}\sigma^2$ and $\sigma^2$, whereas the corresponding variances for the rival design are $\frac{5}{12}\sigma^2$, $\frac{5}{12}\sigma^2$ and $\frac{3}{4}\sigma^2$, where $\sigma^2$ is the common variance of the observations. Thus, the rival design outperforms the symmetric one not only in overall terms but also individually for each factorial effect, in the sense of entailing uniformly smaller variance, that is, more precise estimation.

The above example is revealing. It shows that even for the seemingly simple 2 × 2 factorial, optimal designs under the orthogonal parametrization are by no means guaranteed to perform well when one works with the baseline parametrization where entirely new designs may turn out to be desirable.
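The variances quoted in Example 1 can be checked numerically. The following is a minimal sketch, assuming the linear model of Sections 2 and 3 in which a slide comparing treatment combinations a and b has expected log intensity ratio $\tau_a - \tau_b$ with uncorrelated, homoscedastic errors; the names `SLIDE_ROWS`, `inverse` and `blue_variances` are illustrative and do not come from GS or from the paper.

```python
from fractions import Fraction

# Coefficients of (theta01, theta10, theta11) in E(log ratio) = tau_a - tau_b
# for the six slide types (a, b), in the order listed in Example 1.
SLIDE_ROWS = [
    (1, 0, 0),   # (01, 00)
    (0, 1, 0),   # (10, 00)
    (1, 1, 1),   # (11, 00)
    (-1, 1, 0),  # (10, 01)
    (0, 1, 1),   # (11, 01)
    (1, 0, 1),   # (11, 10)
]


def inverse(matrix):
    """Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0:
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def blue_variances(freqs):
    """Return (V01, V10, V11) in units of sigma^2 for slide frequencies f1..f6."""
    info = [[Fraction(sum(f * r[p] * r[q] for f, r in zip(freqs, SLIDE_ROWS)))
             for q in range(3)] for p in range(3)]
    inv = inverse(info)  # (X'X)^{-1}; its diagonal holds the BLUE variances
    return tuple(inv[k][k] for k in range(3))


symmetric = blue_variances((1, 1, 1, 1, 1, 1))  # each pair on one slide
rival = blue_variances((2, 2, 0, 0, 1, 1))      # the rival design of Example 1
print(symmetric)
print(rival)
```

Exact rational arithmetic is used so that the results can be compared directly with the fractions 1/2, 1/2, 1 and 5/12, 5/12, 3/4 stated in the example.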
This opens up new challenges which become even more complex for general factorials.

The paper is organized as follows. The next section gives an outline of the experimental setup. In Section 3 we revisit the 2 × 2 factorial, considered previously by GS, and obtain analytical results which supplement and strengthen their computational findings, in addition to preparing the ground for the subsequent development. Taking cognizance of the facts that in many applications there may be more than two factors dictating the cell populations and that, even with two factors, one or both of them may appear at more than two levels, general factorials are considered from Section 4 onward. Since cDNA microarray experiments are still quite expensive, there is a premium on optimal designs that are relatively small in size. From this viewpoint, in Section 4, we first consider the saturated, that is, most economic, case and present a collection of optimal designs in a strong sense. Apart from facilitating a choice under resource constraints, this leads to an approach for finding optimal or efficient designs in the practically more important nearly saturated cases that are also studied at length in Section 4. In Section 5 we extend the main ideas of Section 4 to the situation where the underlying model includes dye-coloring effects. The findings of this section rigorously justify, for such a model, a recommendation by Yang and Speed (2002) on dye-swapping. Several other design issues, including open problems, are discussed in Section 6. Technical details, including proofs, appear in a supplementary material file posted at the journal website [Banerjee and Mukerjee (2008)]. The technical tools include use of approximate design theory, Kronecker representation and unimodularity.

2. Experimental setup.
We refer to Nguyen et al. (2002) and Amaratunga and Cabrera (2004) for detailed accounts of the experimental setup. In cDNA microarrays, each slide compares two cell populations on the basis of mRNA samples separately labeled with fluorescent dyes, usually red and green. This is done for a number of slides and different slides may compare different pairs of cell populations. After competitive hybridization, the ratio of the red and green fluorescence intensities is measured at each spot on each slide. Any such ratio represents the relative abundance of the gene in the two cell populations compared on the corresponding slide. The intensity ratios are usually adjusted for background noise and then normalized with the objective of removing systematic biases.

We consider linear models for the log intensities and, hence, the log intensity ratios. The modeling as well as the corresponding optimal design problem refers to a single gene; it is intended that the same design applies simultaneously to all genes on the array. The log intensity ratios for a gene, arising from different slides, are supposed to be homoscedastic and uncorrelated; a discussion on this, in the light of biological variability, follows in Section 6.

The above experimental setup is structurally similar to classical paired comparison experiments; see Kerr and Churchill (2001a). The cell populations under comparison are the same as treatments (or treatment combinations when they are dictated by several factors as in Example 1), while each slide is equivalent to a block of size two. However, the stringency on the number of slides as well as the baseline parametrization adopted here open up new design problems.

3. The 2 × 2 factorial.

3.1. The baseline parametrization.
Suppose two factors $F_1$ and $F_2$, each at levels 0 and 1, dictate the cell populations, which correspond to the treatment combinations 00, 01, 10 and 11. Let $\tau_{00}$, $\tau_{01}$, $\tau_{10}$ and $\tau_{11}$ denote the expected log intensities, that is, the effects, of these treatment combinations. We focus on the situation where, as in Example 1, there is a null state or baseline level, say, 0, of each factor. Then $\theta_{00} = \tau_{00}$ stands for the baseline effect. We consider the baseline parametrization [cf. Yang and Speed (2002); GS] according to which the main effects of $F_1$ and $F_2$ are given respectively by

$$\theta_{10} = \tau_{10} - \tau_{00} \quad \text{and} \quad \theta_{01} = \tau_{01} - \tau_{00}, \tag{1}$$

while the interaction effect $F_1F_2$ is given by

$$\theta_{11} = \tau_{11} - \tau_{10} - \tau_{01} + \tau_{00}. \tag{2}$$

The counterparts of $\theta_{10}$, $\theta_{01}$ and $\theta_{11}$ under the more common orthogonal parametrization are defined respectively as

$$\theta^*_{10} = \tfrac{1}{2}(\tau_{11} + \tau_{10} - \tau_{01} - \tau_{00}), \quad \theta^*_{01} = \tfrac{1}{2}(\tau_{11} - \tau_{10} + \tau_{01} - \tau_{00}), \quad \theta^*_{11} = \tfrac{1}{2}(\tau_{11} - \tau_{10} - \tau_{01} + \tau_{00}). \tag{3}$$

Observe that the definitions of the main effects under the two parametrizations are entirely different. While $\theta_{11}$ is proportional to $\theta^*_{11}$, this equivalence for the two-factor interaction also disappears for factorials involving three or more factors; see, e.g., (8) below.

Kerr (2006) nicely summarized the situations under which the two parametrizations mentioned above are appropriate. The baseline parametrization is natural if there is a clear null state or baseline level of each factor. As noted above, this happens in Example 1. Similarly, in a toxicological study with binary factors, each representing the presence or absence of a particular toxin, the state of absence can be regarded as a natural baseline level of each factor [Kerr (2006)].
On the other hand, if at least one factor, like gender, lacks a natural baseline level, then the baseline parametrization is inappropriate because this will arbitrarily single out one level of such a factor. In situations of this kind, it is advisable to use the orthogonal parametrization.

Indeed, the null state or baseline level of a factor can be interpreted in a broad sense. It need not strictly mean the zero level on some scale, but may as well refer to a standard or control level like the one currently used in practice. For example, in an agricultural experiment to investigate possible improvement in productivity by changing the doses of several fertilizers, the currently used doses of the fertilizers may represent the control levels. Similarly, in an industrial experiment on possible quality improvement via a change in the settings of several machines used at different stages of the production process, the current settings of the machines may reasonably constitute the control levels. In general, if each factor has such a control or baseline level along with one or more test levels, then the baseline parametrization is appropriate and, hence, the present results should be useful. The possible areas of application extend well beyond microarrays and pertain, notably, to agricultural and industrial experiments as hinted above. We add in this connection that although not much work has so far been reported on optimal factorial designs under the baseline parametrization, there is already a rich literature on the corresponding problem for varietal designs. An excellent review of this development on treatment-control designs is available in Majumdar (1996).

3.2. Design criteria.
Following GS, in this section we assume the absence of systematic biases including dye-color bias because one of our objectives here is to obtain analytical results in their setup. With four treatment combinations 00, 01, 10 and 11, as in Example 1, there are six possibilities for any slide, namely, (01, 00), (10, 00), (11, 00), (10, 01), (11, 01) and (11, 10), where the members of each pair represent the treatment combinations that can be compared on the slide. Within each pair, one member gets red dye-coloring and the other green dye-coloring, but the distinction is immaterial in the absence of dye-color bias. Suppose the total number of slides used in the experiment is fixed at $N$. Then the design problem involves deciding on the respective frequencies $f_1, \ldots, f_6$ with which the slides of the six kinds as listed above should appear in the experiment, so as to entail optimal inference in a reasonable sense. Here $f_1, \ldots, f_6$ are nonnegative integers satisfying $f_1 + \cdots + f_6 = N$. We consider only those designs that keep $\theta_{01}$, $\theta_{10}$ and $\theta_{11}$ estimable.

Let $V_{01}$, $V_{10}$ and $V_{11}$ denote the variances of the best linear unbiased estimators (BLUEs) of $\theta_{01}$, $\theta_{10}$ and $\theta_{11}$, respectively. A good design should aim at keeping these three variances small. Recognizing that commonly no single design will minimize all three simultaneously, GS considered admissible designs. A design $d_0$ is admissible if there is no other design $d_1$ such that each of $V_{01}$, $V_{10}$ and $V_{11}$ under $d_1$ is less than or equal to that under $d_0$, at least one of these inequalities being strict. By complete enumeration, GS tabulated admissible designs for even $N$ in the range $6 \le N \le 18$.

The notion of admissibility is intimately linked with that of weighted optimality. In most applications, one wishes to give equal weight to the two main effect parameters.
Also, as GS noted, the interaction parameter can be of greater interest in microarrays than the main effect parameters. From this perspective, we consider designs that minimize $V_{01} + V_{10} + wV_{11}$, where $w$ is a positive weight, with particular interest in the case $w > 1$. Such a design, called $w$-optimal for simplicity, is evidently admissible. Indeed, even for moderate $N$, admissible designs may be too numerous, and consideration of $w$-optimality helps in narrowing down the choice.

3.3. Results via approximate design theory and their implications.

The fact that the frequencies $f_1, \ldots, f_6$ are integer-valued complicates the task of finding $w$-optimal designs because tools from calculus cannot be employed. This is particularly so because the objective function $V_{01} + V_{10} + wV_{11}$ depends on these frequencies in a complex manner. Considerable simplicity is achieved if for the moment we treat the relative frequencies $\pi_i = f_i/N$ as continuous variables over the range $\pi_i \ge 0$ for each $i$ and $\sum \pi_i = 1$. Any such $\pi = (\pi_1, \ldots, \pi_6)$ is called a design measure. This approach amounts to invoking the approximate design theory [see, e.g., Silvey (1980)] which enables one to use calculus techniques to get the following result.

Result 1.
(a) For $w > 0$, let

$$\xi = \tfrac{1}{4}\{(w^2 + 2w)^{1/2} - w\}. \tag{4}$$

Then the design measure

$$\pi_0 = (\tfrac{1}{2} - \xi, \tfrac{1}{2} - \xi, 0, 0, \xi, \xi) \tag{5}$$

is $w$-optimal, whenever $w \ge \tfrac{2}{3}$.
(b) The design measure $\tilde{\pi} = (\tfrac{1}{4}, \tfrac{1}{4}, 0, 0, \tfrac{1}{4}, \tfrac{1}{4})$ minimizes $V_{11}$ and is admissible.
(c) The design measure $(\tfrac{1}{2} - \xi, \tfrac{1}{2} - \xi, 0, 0, \xi, \xi)$ is admissible whenever $\tfrac{1}{6} \le \xi \le \tfrac{1}{4}$.

Incidentally, Bueno Filho, Gilmour and Rosa (2006) and Grossmann and Schwabe (2008) also employed the approximate theory in the study of optimal microarray designs.
But their settings and criteria and, hence, final results are different from ours.

We now discuss the implication of Result 1 on (exact) designs that take cognizance of the fact that $f_1, \ldots, f_6$ are actually integers. Any such design may be represented by the vector $f = (f_1, \ldots, f_6)$. Since $\pi_i = f_i/N$ for each $i$, the following conclusions, pertaining to even $N$, are evident from Result 1.

(i) If $w \ge \tfrac{2}{3}$ and $\phi = N\xi$ is an integer, where $\xi$ is given by (4), then the design

$$f = (\tfrac{1}{2}N - \phi, \tfrac{1}{2}N - \phi, 0, 0, \phi, \phi) \tag{6}$$

is $w$-optimal.
(ii) If $N$ is a multiple of 4, then the design $(\tfrac{1}{4}N, \tfrac{1}{4}N, 0, 0, \tfrac{1}{4}N, \tfrac{1}{4}N)$ minimizes $V_{11}$.
(iii) Any design of the form (6) is admissible whenever $\tfrac{1}{6}N \le \phi \le \tfrac{1}{4}N$.

The points (i)–(iii) noted above cater to the need, in our context, of finding good designs with emphasis on the interaction parameter. For $N \le 18$, they provide analytical justification for quite a few findings of GS, such as the admissibility of the rival design in Example 1. In addition, they facilitate the study of good designs for $N \ge 20$, which is beyond the range considered by GS and may pose difficulties in complete enumeration. For instance, if $N = 20$, then they show that the designs (6, 6, 0, 0, 4, 4) and (5, 5, 0, 0, 5, 5) are admissible, and that the latter design minimizes $V_{11}$.

Given $w$ ($\ge \tfrac{2}{3}$), even if $\phi = N\xi$ in (i) above is not an integer, one may simply round it off to the nearest integer to get a highly efficient design. As an illustration, let $w = 2$. By (4), then $\xi = 0.207107$. For $N = 22$, rounding $N\xi$ off to the nearest integer, namely, 5, we can follow (6) to consider the design (6, 6, 0, 0, 5, 5), which has efficiency 99.44% as a comparison with the $w$-optimal design measure in (5) reveals.
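The rounding recipe and the quoted efficiency can be reproduced with a short calculation. The sketch below assumes that efficiency means the ratio of the criterion value $V_{01} + V_{10} + wV_{11}$ at the design measure (5) to its value at the exact design expressed as relative frequencies $f_i/N$; the function names are illustrative, and the 3 × 3 inverse is written out by cofactors to keep the block free of external libraries.

```python
import math

# Rows give the coefficients of (theta01, theta10, theta11) for the six slide
# types (01,00), (10,00), (11,00), (10,01), (11,01), (11,10).
SLIDE_ROWS = [(1, 0, 0), (0, 1, 0), (1, 1, 1), (-1, 1, 0), (0, 1, 1), (1, 0, 1)]


def xi(w):
    """Equation (4)."""
    return (math.sqrt(w * w + 2 * w) - w) / 4


def criterion(weights, w):
    """V01 + V10 + w*V11 (per unit sigma^2) for nonnegative slide weights."""
    m = [[sum(p * r[a] * r[b] for p, r in zip(weights, SLIDE_ROWS))
          for b in range(3)] for a in range(3)]
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    # Diagonal entries of the inverse information matrix via cofactors.
    v01 = (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / det
    v10 = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det
    v11 = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det
    return v01 + v10 + w * v11


def rounded_design(n, w):
    """Design (6) with phi taken as N*xi rounded to the nearest integer (N even)."""
    phi = round(n * xi(w))
    return (n // 2 - phi, n // 2 - phi, 0, 0, phi, phi)


def efficiency(n, w):
    """Criterion at the optimal measure (5) relative to the rounded exact design."""
    x = xi(w)
    measure = (0.5 - x, 0.5 - x, 0, 0, x, x)
    exact = [f / n for f in rounded_design(n, w)]
    return criterion(measure, w) / criterion(exact, w)


print(xi(2), rounded_design(22, 2), efficiency(22, 2))
```

With $w = 2$ and $N = 22$ this gives the design (6, 6, 0, 0, 5, 5) and an efficiency of about 0.9944, in line with the figures in the text.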
Continuing with $w = 2$, for every even $N$ in the range $6 \le N \le 30$, one can similarly obtain designs with over 97%, and often over 99%, efficiency. These efficiencies are actually lower bounds, as they are relative to an optimal design measure which is unattainable in the exact setup. Hence, we conjecture that all these designs are actually $w$-optimal, with $w = 2$, for the respective $N$. Using (iii) above, one can also verify that these designs are all admissible.

It is of interest to compare Result 1 with its counterpart arising under the orthogonal parametrization (3). To that effect, we note that the following hold under (3):

(a) The design measure $\pi_0^{\mathrm{orth}} = (\alpha, \alpha, \tfrac{1}{2} - 2\alpha, \tfrac{1}{2} - 2\alpha, \alpha, \alpha)$, where $\alpha = \tfrac{1}{2}w^{1/2}/(2 + w^{1/2})$, is $w$-optimal for $0 < w < 4$.
(b) The design measure $\tilde{\pi} = (\tfrac{1}{4}, \tfrac{1}{4}, 0, 0, \tfrac{1}{4}, \tfrac{1}{4})$ is $w$-optimal for $w \ge 4$.
(c) The design measure $(\alpha, \alpha, \tfrac{1}{2} - 2\alpha, \tfrac{1}{2} - 2\alpha, \alpha, \alpha)$, with $\alpha$ as in (a), is admissible for $0 < \alpha \le \tfrac{1}{4}$.

The proofs of these are similar to but simpler than that of Result 1. From (a) and (b) above, simple rounding off again yields highly efficient exact designs under the orthogonal parametrization. For $w < 4$, unlike $\pi_0$ in Result 1, the measure $\pi_0^{\mathrm{orth}}$ in (a) assigns positive masses to all the six possible slides. In fact, for $w = 1$, $\pi_0^{\mathrm{orth}}$ assigns uniform mass everywhere and, hence, entails a BIB design. On the other hand, for $w \ge 4$, the optimal design measures under the two parametrizations are quite close to each other.

The fact that $\pi$ has only six elements, because of only six possibilities for any slide, helped the study of optimal design measures and, hence, that of optimal or efficient designs in this section.
In microarray experiments for general factorials considered from Section 4 onward, the number of possibilities for any slide increases dramatically and, as a result, the optimal design measures are analytically intractable. For this reason, hereafter we directly investigate exact designs.

4. General factorials.

4.1. Preliminaries.

In many applications of cDNA microarrays there may be more than two factors dictating the cell populations and, even if there are only two factors, one or both of them may appear at more than two levels. For instance, if in Example 1 the two cell lines are compared at time 12 hours, in addition to 0 hours and 24 hours, then we have to consider a 2 × 3 factorial, with the second factor, time, now appearing at three levels. Similar examples abound and underscore the practical need to explore the optimal designing of cDNA microarray experiments with reference to general factorials.

From this perspective, consider an $s_1 \times \cdots \times s_n$ factorial that involves $n$ ($\ge 2$) factors $F_1, \ldots, F_n$ dictating the cell populations, with $F_j$ appearing at levels $0, 1, \ldots, s_j - 1$. Then there are $v = \prod s_j$ cell populations which correspond to the treatment combinations $i_1 \ldots i_n$ ($0 \le i_j \le s_j - 1$, $1 \le j \le n$). Let $\tau_{i_1 \ldots i_n}$ be the expected log intensity, that is, the effect, of the treatment combination $i_1 \ldots i_n$. As before, the baseline level of each factor is denoted by 0. Hence, $\theta_{00\ldots0} = \tau_{00\ldots0}$ stands for the baseline effect. Also, as an obvious extension of the baseline parametrization given by (1) and (2), a main effect, say, that of $F_1$, is represented by the $s_1 - 1$ parameters

$$\theta_{i_1 0 \ldots 0} = \tau_{i_1 0 \ldots 0} - \tau_{00\ldots0} \quad (1 \le i_1 \le s_1 - 1), \tag{7}$$

whereas a two-factor interaction, say, $F_1F_2$, is represented by the $(s_1 - 1)(s_2 - 1)$ parameters

$$\theta_{i_1 i_2 0 \ldots 0} = \tau_{i_1 i_2 0 \ldots 0} - \tau_{i_1 0 0 \ldots 0} - \tau_{0 i_2 0 \ldots 0} + \tau_{000\ldots0} \quad (1 \le i_1 \le s_1 - 1,\ 1 \le i_2 \le s_2 - 1). \tag{8}$$

Similarly, we can define $\theta_{i_1 \ldots i_n}$ for every $i_1 \ldots i_n \ne 0 \ldots 0$ ($0 \le i_j \le s_j - 1$, $1 \le j \le n$) so that any such $\theta_{i_1 \ldots i_n}$ represents a factorial effect as determined by its nonzero subscripts. Thus, any $\theta_{i_1 \ldots i_n}$ with $u$ nonzero subscripts represents a factorial effect involving $u$ factors. Hereafter, often the $v - 1$ parameters $\theta_{i_1 \ldots i_n}$ ($i_1 \ldots i_n \ne 0 \ldots 0$) are collectively referred to as the $\theta$s for ease in presentation. Note that (7) is reminiscent of the canonical parametrization in Wit, Nobile and Khanin (2005), Subsection 2.1, for varietal designs.

Throughout this section we continue to assume the absence of systematic biases including dye-color bias and write $\sigma^2$ for the variance of any observed log intensity ratio.

4.2. Optimal saturated designs.

All main and interaction effects, as represented by the $\theta$s, are of potential interest at least for a relatively small number of factors. Hence, at this stage we consider optimal designs for the estimation of all these $v - 1$ parameters. Clearly, then the number of slides, $N$, in the experiment must satisfy $N \ge v - 1$. We first consider the saturated case $N = v - 1$ and obtain a collection of optimal designs. In addition to facilitating a choice under resource constraints, this paves the way for the development of an approach for finding optimal or efficient designs in the practically more important nearly saturated cases that are taken up in the next subsection.

Result 2. Let $N = v - 1$ and consider a design that keeps all the $\theta$s estimable. Then for any $\theta_{i_1 \ldots i_n}$, which represents a factorial effect involving $u$ factors,

$$\operatorname{Var}(\hat{\theta}_{i_1 \ldots i_n}) \ge \sigma^2 2^{u-1}, \tag{9}$$

where $\hat{\theta}_{i_1 \ldots i_n}$ is the BLUE of $\theta_{i_1 \ldots i_n}$.
Result 3 below shows that the same design can attain the lower bound in (9) simultaneously for all the $\theta$s. Such a design is then optimal not only in overall terms but also individually for every parameter representing a main or interaction effect. In what follows, a slide which compares treatment combinations $i_1 \ldots i_n$ and $j_1 \ldots j_n$, respectively with red and green dye-coloring, is denoted by the ordered pair $(i_1 \ldots i_n, j_1 \ldots j_n)$. A design is represented by a collection of such pairs. Note that the ordering within any pair is immaterial at this stage for inferential purposes, as we are now assuming the absence of dye-color bias. For any $i_1 \ldots i_n \ne 0 \ldots 0$, let $\rho(i_1 \ldots i_n)$ be obtained by replacing the first nonzero entry of $i_1 \ldots i_n$ by 0 and leaving the other entries unchanged. For instance, with a 2 × 2 × 3 factorial, $\rho(012) = 002$, $\rho(111) = 011$, etc.

Result 3. Let $N = v - 1$. Then the design

$$d_0 = \{(i_1 \ldots i_n, \rho(i_1 \ldots i_n)) : 0 \le i_j \le s_j - 1,\ 1 \le j \le n,\ i_1 \ldots i_n \ne 0 \ldots 0\}$$

leads to the attainment of the lower bound in (9) simultaneously for all the $\theta$s.

Remark 1. For $n = 2$, one can check that Result 3 remains valid if for every $i_1 \ge 1$, $i_2 \ge 1$, $\rho(i_1 i_2)$ is allowed to be either $0i_2$ or $i_1 0$, instead of being fixed at $0i_2$ as stipulated above. Because of the two possibilities for any such $\rho(i_1 i_2)$, one gets a collection of $2^{(s_1 - 1)(s_2 - 1)}$ optimal designs.

Remark 2. Examples can be given to show that, for $n \ge 3$, Result 3 does not remain valid if, in the spirit of Remark 1, each $\rho(i_1 \ldots i_n)$ is obtained simply by replacing an arbitrary, rather than the first, nonzero entry of $i_1 \ldots i_n$ by 0. However, even then, Result 3 leads to a collection of optimal designs via factor permutation.
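The construction of $d_0$ in Result 3 is mechanical and can be sketched in a few lines. The code below is illustrative only: it assumes that each $\tau_{i_1 \ldots i_n}$ equals the sum of the baseline effect and all $\theta$s whose subscripts agree with $i_1 \ldots i_n$ in their nonzero positions, which is the natural extension of (7) and (8) but is written out here as an assumption, and the function names are hypothetical. It builds $d_0$ for a given factorial and evaluates the BLUE variances in units of $\sigma^2$.

```python
from fractions import Fraction
from itertools import product


def rho(t):
    """Replace the first nonzero entry of treatment combination t by 0."""
    k = next(i for i, x in enumerate(t) if x != 0)
    return t[:k] + (0,) + t[k + 1:]


def saturated_design(levels):
    """d0 of Result 3: one slide (t, rho(t)) per nonzero treatment combination."""
    combos = product(*(range(s) for s in levels))
    return [(t, rho(t)) for t in combos if any(t)]


def tau_row(t, params):
    """Coefficients of the thetas in tau_t (assumed expansion; baseline term omitted)."""
    return [int(all(p in (0, x) for p, x in zip(q, t))) for q in params]


def inverse(matrix):
    """Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0:
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def blue_variances(design, levels):
    """Var(theta_hat)/sigma^2 for every theta, keyed by its subscript tuple."""
    params = [t for t in product(*(range(s) for s in levels)) if any(t)]
    # Each slide (a, b) contributes the row of coefficients of tau_a - tau_b.
    x = [[p - q for p, q in zip(tau_row(a, params), tau_row(b, params))]
         for a, b in design]
    k = len(params)
    info = [[Fraction(sum(r[p] * r[q] for r in x)) for q in range(k)]
            for p in range(k)]
    inv = inverse(info)
    return {t: inv[i][i] for i, t in enumerate(params)}


d0 = saturated_design((2, 2, 3))
variances = blue_variances(d0, (2, 2, 3))
for t, v in variances.items():
    print(t, v)
```

For the 2 × 2 × 3 factorial, each printed variance can be compared with the bound $2^{u-1}$ in (9), where $u$ is the number of nonzero subscripts of the parameter.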
To illustrate this point, observe that Result 3 yields the design $d_0$ = {(001, 000), (002, 000), (010, 000), (011, 001), (012, 002), (100, 000), (101, 001), (102, 002), (110, 010), (111, 011), (112, 012)} for a $2 \times 2 \times 3$ factorial, and the design {(001, 000), (010, 000), (011, 001), (100, 000), (101, 001), (110, 010), (111, 011), (200, 000), (201, 001), (210, 010), (211, 011)} for a $3 \times 2 \times 2$ factorial. Permuting the factors in the latter, one readily gets another design for the $2 \times 2 \times 3$ factorial which, like $d_0$, is optimal in the sense of Result 3. In the same manner, Result 3 can be easily applied to all possible factor orderings to yield a collection of optimal designs.

4.3. Nearly saturated designs. The optimal designs in Section 4.2 are saturated and, hence, do not yield an internal estimator of $\sigma^2$, which is important for testing of hypotheses. This difficulty can persist even if the same clone is replicated $r$ ($> 1$) times on each slide. Then, for the purpose of estimating the $\theta$s, the means of the $r$ log intensity ratios arising from the slides play the role of the individual ratios considered so far, but an attempt to estimate $\sigma^2$ on the basis of the within-slide variation can be vitiated by unknown correlation among the ratios arising from the same slide [Yang and Speed (2002) and Churchill (2002)]. In view of the above, as a feasible yet economic approach to getting degrees of freedom for the estimation of $\sigma^2$, one may like to have a little more than $v - 1$ slides and thus consider nearly saturated designs. Unlike in the saturated case, typically for $N > v - 1$, no single design can estimate all the $\theta$s with the minimum variance. We, therefore, consider the $w$-optimality criterion as applicable to general factorials.
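As a numerical companion to Result 3 and the $2 \times 2 \times 3$ illustration of Section 4.2, the following minimal sketch builds $d_0$ and checks the bound (9). It assumes that the general $\theta$s extend (8) by inclusion-exclusion, so that $\tau_{i_1 \ldots i_n} - \tau_{0 \ldots 0}$ is the sum of all $\theta$s whose subscripts are obtained from $i_1 \ldots i_n$ by zeroing some entries, and it takes $\sigma^2 = 1$. The function names (`rho`, `nonzero_cells`, `tau_minus_baseline`, `theta_variances`) are illustrative and not part of the paper.

```python
import itertools
import numpy as np

def rho(t):
    """Set the first nonzero entry of the tuple t to 0 (the map rho of Section 4.2)."""
    k = next(m for m, x in enumerate(t) if x != 0)
    return t[:k] + (0,) + t[k + 1:]

def nonzero_cells(levels):
    """All treatment combinations except the baseline 0...0, in lexicographic order."""
    return [t for t in itertools.product(*(range(s) for s in levels)) if any(t)]

def tau_minus_baseline(t, thetas):
    """Coefficients of the thetas in tau_t - tau_{0...0} (assumed inclusion-exclusion form)."""
    return np.array([float(all(jm in (0, tm) for jm, tm in zip(j, t))) for j in thetas])

def theta_variances(design, levels):
    """Var(theta_hat) / sigma^2 for every theta; design is a list of (red, green) pairs."""
    thetas = nonzero_cells(levels)
    X = np.array([tau_minus_baseline(a, thetas) - tau_minus_baseline(b, thetas)
                  for a, b in design])
    cov = np.linalg.inv(X.T @ X)  # least squares covariance with sigma^2 = 1
    return dict(zip(thetas, np.diag(cov)))

levels = (2, 2, 3)
d0 = [(t, rho(t)) for t in nonzero_cells(levels)]  # the design of Result 3
variances = theta_variances(d0, levels)
for t, var in variances.items():
    u = sum(x != 0 for x in t)  # number of factors involved in the effect
    print(t, round(var, 6), 2 ** (u - 1))
```

Each printed line shows a $\theta$ subscript, its computed variance and the bound $2^{u-1}$; under the stated assumptions the last two columns agree for every $\theta$.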
Analytical derivation of optimal designs, via either combinatorial techniques or approximate design theory, still remains difficult for $N > v - 1$. The results in the last subsection, however, readily yield a heuristic approach which, as computations indicate, leads to highly efficient, if not optimal, designs. Based on the intuitive expectation that, for $N$ close to $v - 1$, a design obtained via augmentation of an optimal saturated design should behave well, we propose the following steps:

(I) Given $s_1, \ldots, s_n$, list all optimal saturated designs given by Result 3 and Remark 1 or 2.
(II) Given $N$, augment each design in (I) in all possible ways to generate designs with $N$ slides.
(III) From the augmented designs in (II), select one as per the chosen optimality criterion, also taking care of resource constraints, if any.

For $N$ close to $v - 1$, computationally the above steps are far easier to implement than a complete enumeration of all designs. The least favorable cases for this approach are those where the saturated designs involve a rather small number of slides, and hence, even a slightly larger $N$ can potentially have a significant impact. From this viewpoint, the approach is now evaluated for $2 \times 2$ and $2 \times 3$ factorials, which represent the two smallest saturated cases. We consider $w$-optimal designs that, for an $s_1 \times s_2$ factorial, aim at minimizing

$$\sum_{i_1=1}^{s_1-1} \operatorname{Var}(\hat{\theta}_{i_1 0}) + \sum_{i_2=1}^{s_2-1} \operatorname{Var}(\hat{\theta}_{0 i_2}) + w \sum_{i_1=1}^{s_1-1} \sum_{i_2=1}^{s_2-1} \operatorname{Var}(\hat{\theta}_{i_1 i_2}).$$
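Steps (I) to (III) and the $w$-criterion above can be sketched for two-factor designs as follows. This is only an illustration under stated assumptions: $\sigma^2 = 1$, no dye-color bias (so the orientation of a slide is immaterial), candidate extra slides taken as all unordered pairs of distinct treatment combinations, and no resource constraints. The helper names (`w_criterion`, `saturated_designs`, `heuristic_search`) are hypothetical.

```python
import itertools
import numpy as np

def cells(levels):
    """All treatment combinations, including the baseline."""
    return list(itertools.product(*(range(s) for s in levels)))

def contrast_row(a, b, thetas):
    """Coefficients of the thetas in tau_a - tau_b under the baseline parametrization."""
    def coef(t):
        return np.array([float(all(jm in (0, tm) for jm, tm in zip(j, t))) for j in thetas])
    return coef(a) - coef(b)

def w_criterion(design, levels, w):
    """Sum of main-effect variances plus w times the interaction variances."""
    thetas = [t for t in cells(levels) if any(t)]
    X = np.array([contrast_row(a, b, thetas) for a, b in design])
    M = X.T @ X
    if np.linalg.matrix_rank(M) < len(thetas):
        return float("inf")  # some theta is not estimable
    var = np.diag(np.linalg.inv(M))
    weights = np.array([w if all(t) else 1.0 for t in thetas])
    return float(weights @ var)

def saturated_designs(levels):
    """Step (I) for two factors: the optimal saturated designs of Remark 1."""
    s1, s2 = levels
    main = [((i, 0), (0, 0)) for i in range(1, s1)] + [((0, j), (0, 0)) for j in range(1, s2)]
    inter = [(i, j) for i in range(1, s1) for j in range(1, s2)]
    for choice in itertools.product((0, 1), repeat=len(inter)):
        yield main + [((i, j), (0, j) if c == 0 else (i, 0))
                      for (i, j), c in zip(inter, choice)]

def heuristic_search(levels, N, w):
    """Steps (II)-(III): augment every saturated design and keep the best one."""
    extra = N - (len(cells(levels)) - 1)
    pairs = list(itertools.combinations(cells(levels), 2))  # orientation is immaterial here
    best = (float("inf"), None)
    for base in saturated_designs(levels):
        for added in itertools.combinations_with_replacement(pairs, extra):
            design = base + list(added)
            value = w_criterion(design, levels, w)
            if value < best[0]:
                best = (value, design)
    return best

best_value, best_design = heuristic_search((2, 2), N=4, w=1)
print(best_value, best_design)
```

For the $2 \times 2$ factorial with $N = 4$, the search space is tiny (two saturated designs, six candidate extra slides); for larger factorials the number of augmentations grows quickly, so the sketch is meant for $N$ close to $v - 1$ only.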
Tables 1 and 2 show $w$-optimal designs, as obtained by complete enumeration of all designs, for $2 \times 2$ and $2 \times 3$ factorials, with $w = 1, 2, 3$, and $N = v - 1 + j$ ($j = 1, 2, 3$). While these optimal designs can be nonunique, only one such design is reported in each case to save space. Every design in Table 1 or 2 contains the optimal saturated design $d_0$ of Result 3 as a subdesign, thus showing that the heuristic approach, based on augmentation, indeed yields a $w$-optimal design in each of these cases. For the $2 \times 2$ factorial, we have also checked that all admissible designs for $N = 4$, 5 and 6 are augmentations of one of the two optimal saturated designs, {(01, 00), (10, 00), (11, 01)} and {(01, 00), (10, 00), (11, 10)}, arising from Remark 1.

Table 1. w-optimal designs for the 2 × 2 factorial

N   w         w-optimal design
4   1, 2, 3   {(01, 00), (10, 00), (11, 01), (11, 10)}
5   1, 2, 3   {(01, 00), (10, 00), (10, 00), (11, 01), (11, 10)}
6   1, 2, 3   {(01, 00), (01, 00), (10, 00), (10, 00), (11, 01), (11, 10)}

For an $s_1 \times s_2$ factorial, let $\bar{d}$ be the design obtained as the union of the $2^{(s_1 - 1)(s_2 - 1)}$ optimal saturated designs given by Remark 1, that is, $\bar{d}$ consists of the $v - 1 + (s_1 - 1)(s_2 - 1)$ slides $(i_1 i_2, 0 i_2)$, $1 \le i_1 \le s_1 - 1$, $0 \le i_2 \le s_2 - 1$, and $(i_1 i_2, i_1 0)$, $0 \le i_1 \le s_1 - 1$, $1 \le i_2 \le s_2 - 1$. Let $\Omega$ be the class of all designs that consist of $N$ slides from $\bar{d}$, where $v - 1 < N \le v - 1 + (s_1 - 1)(s_2 - 1)$. Tables 1, 2 and partial enumeration in several other cases lead us to the following conjecture.

Conjecture. (a) If $N = v - 1 + (s_1 - 1)(s_2 - 1)$, then the design $\bar{d}$ is $w$-optimal for any $w \ge 1$. (b) If $v - 1 < N < v - 1 + (s_1 - 1)(s_2 - 1)$, then for any $w \ge 1$, a $w$-optimal design in $\Omega$ is also $w$-optimal in the class of all designs.
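The design $\bar{d}$ entering the Conjecture is easy to write down explicitly. The short sketch below (the function name `d_bar` is illustrative) lists its slides for an $s_1 \times s_2$ factorial and compares their number with $v - 1 + (s_1 - 1)(s_2 - 1)$.

```python
def d_bar(s1, s2):
    """Union of the optimal saturated designs of Remark 1 for an s1 x s2 factorial."""
    # slides (i1 i2, 0 i2): the first factor is reset to its baseline level
    along_first = [((i1, i2), (0, i2)) for i1 in range(1, s1) for i2 in range(s2)]
    # slides (i1 i2, i1 0): the second factor is reset to its baseline level
    along_second = [((i1, i2), (i1, 0)) for i1 in range(s1) for i2 in range(1, s2)]
    return along_first + along_second

for s1, s2 in [(2, 2), (2, 3), (3, 3)]:
    v = s1 * s2
    design = d_bar(s1, s2)
    print((s1, s2), len(design), v - 1 + (s1 - 1) * (s2 - 1))
```

For the $2 \times 2$ and $2 \times 3$ factorials the resulting slide sets are the $N = 4$ design of Table 1 and the $N = 7$ design of Table 2, respectively.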
The case $N = 4$ in Table 1 and the cases $N = 6, 7$ in Table 2 pertain to the Conjecture and show its truth, with $w = 1, 2, 3$, for $2 \times 2$ and $2 \times 3$ factorials. Furthermore, using approximate design theory, the efficiency of the design $\bar{d}$ in (a) is seen to be at least 91.88%, 94.16% and 95.57% for the $2 \times 4$ factorial, and at least 94.26%, 96.16% and 97.04% for the $3 \times 3$ factorial, under $w = 1$, 2 and 3 respectively. Indeed, if this Conjecture is true in general, then part (b) would considerably reduce the search for an optimal design, while part (a) would give a compact result.

Table 2. w-optimal designs for the 2 × 3 factorial

N   w         w-optimal design
6   1, 2, 3   {(01, 00), (02, 00), (10, 00), (11, 01), (12, 02), (11, 10)}
7   1, 2, 3   {(01, 00), (02, 00), (10, 00), (11, 01), (12, 02), (11, 10), (12, 10)}
8   1         {(01, 00), (02, 00), (10, 00), (11, 01), (12, 02), (11, 10), (12, 10), (02, 01)}
8   2, 3      {(01, 00), (02, 00), (10, 00), (10, 00), (11, 01), (12, 02), (11, 10), (12, 10)}

For general factorials, one can define the orthogonal parametrization via a straightforward extension of (3); see, for example, Gupta and Mukerjee [(1989), Chapter 2]. Under this parametrization, BIB designs are optimal in a very strong sense [Kiefer (1975)] and extended group divisible (EGD) designs are known to be admissible [Gupta and Mukerjee (1989), Chapters 3 and 8]. While Example 1 demonstrates that a BIB design can become inadmissible under the baseline parametrization, we now show that the same can happen with EGD designs. Note that in the context of microarrays, an EGD design is one where the number of slides comparing any two treatment combinations $i_1 \ldots i_n$ and $j_1 \ldots j_n$ depends only on the equality or otherwise of $i_u$ and $j_u$, $1 \le u \le n$.
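The behavior of EGD designs under the baseline parametrization can be examined numerically. The sketch below does this for the $2 \times 3$ factorial with $N = 6$, comparing the six-slide EGD design considered in this subsection with the $N = 6$ design of Table 2. It assumes $\sigma^2 = 1$ and the baseline $\theta$s $\theta_{01}, \theta_{02}, \theta_{10}, \theta_{11}, \theta_{12}$, with the interaction defined as in (8); the names `coef` and `theta_variances` are illustrative.

```python
import itertools
import numpy as np

LEVELS = (2, 3)
THETAS = [t for t in itertools.product(*(range(s) for s in LEVELS)) if any(t)]

def coef(t):
    """Coefficients of the thetas in tau_t - tau_00 under the baseline parametrization."""
    return np.array([float(all(jm in (0, tm) for jm, tm in zip(j, t))) for j in THETAS])

def theta_variances(design):
    """Var(theta_hat) / sigma^2 in the order of THETAS, for a list of (red, green) slides."""
    X = np.array([coef(a) - coef(b) for a, b in design])
    return np.diag(np.linalg.inv(X.T @ X))

egd = [((1, 1), (0, 0)), ((1, 2), (0, 0)), ((1, 0), (0, 1)),
       ((1, 2), (0, 1)), ((1, 0), (0, 2)), ((1, 1), (0, 2))]
table2_n6 = [((0, 1), (0, 0)), ((0, 2), (0, 0)), ((1, 0), (0, 0)),
             ((1, 1), (0, 1)), ((1, 2), (0, 2)), ((1, 1), (1, 0))]

var_egd = theta_variances(egd)
var_opt = theta_variances(table2_n6)
for t, ve, vo in zip(THETAS, var_egd, var_opt):
    print(t, round(ve, 4), round(vo, 4))
```

Every $\theta$ has a strictly larger variance under the EGD design than under the Table 2 design, which is the sense in which the former is inadmissible here.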
Thus, for the $2 \times 3$ factorial and with $N = 6$ slides, there is a unique EGD design {(11, 00), (12, 00), (10, 01), (12, 01), (10, 02), (11, 02)} that allows the estimability of all treatment contrasts. Under the baseline parametrization, this design becomes inadmissible because it estimates each of the $\theta$s with uniformly larger variance than the $w$-optimal design shown in Table 2 for $N = 6$.

5. Results under effects due to dye-coloring. We now extend the main ideas of Section 4 to the situation where the underlying model includes effects due to dye-coloring. For $0 \le i_j \le s_j - 1$, $1 \le j \le n$, let $\beta^{(1)}_{i_1 \ldots i_n}$ and $\beta^{(2)}_{i_1 \ldots i_n}$ be the expected log intensities for the treatment combination $i_1 \ldots i_n$ under red and green dye-coloring respectively. Then $\tau_{i_1 \ldots i_n} = \frac{1}{2}\{\beta^{(1)}_{i_1 \ldots i_n} + \beta^{(2)}_{i_1 \ldots i_n}\}$ represents the overall effect of $i_1 \ldots i_n$, whereas $\lambda_{i_1 \ldots i_n} = \frac{1}{2}\{\beta^{(1)}_{i_1 \ldots i_n} - \beta^{(2)}_{i_1 \ldots i_n}\}$ accounts for the effect of dye-coloring on $i_1 \ldots i_n$. For any slide $(i_1 \ldots i_n, j_1 \ldots j_n)$, which compares treatment combinations $i_1 \ldots i_n$ and $j_1 \ldots j_n$ respectively with red and green dye-coloring, the expected log intensity ratio is now given by

$$\beta^{(1)}_{i_1 \ldots i_n} - \beta^{(2)}_{j_1 \ldots j_n} = \tau_{i_1 \ldots i_n} - \tau_{j_1 \ldots j_n} + \lambda_{i_1 \ldots i_n} + \lambda_{j_1 \ldots j_n}. \tag{10}$$

The parameters of interest continue to be the $\theta$s, representing the main and interaction effects and defined with reference to the $\tau$s as in Section 4.1. The $\lambda$s are, on the other hand, nuisance parameters to us. Unlike in the previous sections, where we took the $\lambda$s as zeros, now these are kept perfectly general. Hence, as (10) indicates, the ordering within the slides is no longer inconsequential. A reduced but more restrictive version of the model (10) will be considered briefly in Section 6.
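To make the role of the ordering in (10) concrete, and anticipating the dye-swapped designs discussed next, the following sketch sets up the linear model (10) for the $2 \times 2$ factorial with the $\theta$s and all the $\lambda$s as parameters. It assumes uncorrelated observations with $\sigma^2 = 1$ and uses a Moore-Penrose inverse of $X'X$, which gives the variance of the BLUE of any estimable function. The names `model_row` and `theta_variances` are illustrative, not from the paper.

```python
import itertools
import numpy as np

LEVELS = (2, 2)
CELLS = list(itertools.product(*(range(s) for s in LEVELS)))
THETAS = [t for t in CELLS if any(t)]

def model_row(red, green):
    """Row of the model matrix for slide (red, green) under (10): thetas first, then lambdas."""
    def coef(t):
        return np.array([float(all(jm in (0, tm) for jm, tm in zip(j, t))) for j in THETAS])
    lam = np.zeros(len(CELLS))
    lam[CELLS.index(red)] += 1.0    # lambda of the red-labelled treatment combination
    lam[CELLS.index(green)] += 1.0  # lambda of the green-labelled one enters with a plus sign too
    return np.concatenate([coef(red) - coef(green), lam])

def theta_variances(design):
    """Var(theta_hat) / sigma^2 per theta, or None if some theta is not estimable."""
    X = np.array([model_row(a, b) for a, b in design])
    M = X.T @ X
    eigval, eigvec = np.linalg.eigh(M)
    keep = eigval > 1e-10
    G = (eigvec[:, keep] / eigval[keep]) @ eigvec[:, keep].T  # Moore-Penrose inverse of X'X
    k = len(THETAS)
    # theta_j is estimable iff its coordinate vector lies in the row space of X
    if not np.allclose((M @ G)[:, :k], np.eye(len(M))[:, :k], atol=1e-8):
        return None
    return dict(zip(THETAS, np.diag(G)[:k]))

d0 = [((0, 1), (0, 0)), ((1, 0), (0, 0)), ((1, 1), (0, 1))]  # saturated design of Result 3
d0_swap = d0 + [(b, a) for a, b in d0]                        # add the dye-swapped slides

print(theta_variances(d0))       # the thetas are not estimable without dye-swapping
print(theta_variances(d0_swap))  # variances per theta for the dye-swapped design
```

With the three slides of $d_0$ alone the $\theta$s are confounded with the $\lambda$s, whereas for the six-slide dye-swapped version the computed variances are half those obtained from $d_0$ in the absence of dye-color effects.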
In the presence of dye-coloring effects, several authors, notably Yang and Speed (2002), advocated the use of dye-swapped experiments. It is not hard to see that, under (10), any estimable contrast of the $\tau$s is estimated orthogonally to the $\lambda$s in such an experiment. Let $d_0$ be any optimal design arising from Result 3, Remark 1 or Remark 2. The dye-swapped version of $d_0$, denoted by $d_0^{\mathrm{swap}}$, is a design that includes both the slides $(i_1 \ldots i_n, j_1 \ldots j_n)$ and $(j_1 \ldots j_n, i_1 \ldots i_n)$ for every slide $(i_1 \ldots i_n, j_1 \ldots j_n)$ in $d_0$. Given the optimality of $d_0$ in the absence of dye-coloring effects, one may be inclined to recommend the use of $d_0^{\mathrm{swap}}$ in the present setup. However, in order to justify this rigorously, the following questions need to be answered:

(a) There are $2(v - 1)$ slides in $d_0^{\mathrm{swap}}$. Are at least $2(v - 1)$ slides required to estimate all the $\theta$s under (10), even when possibly nonorthogonal (to the $\lambda$s) estimation is allowed?
(b) Under (10), will $d_0^{\mathrm{swap}}$ be optimal, in the sense of Result 3, among all designs that involve $2(v - 1)$ slides and keep the $\theta$s estimable?

The possibility of nonorthogonal estimation complicates (a). Similarly, (b) needs careful attention because orthogonality alone does not guarantee optimality with $2(v - 1)$ slides. Satisfyingly, the answers to both (a) and (b) are in the affirmative. The following results confirm this and, hence, vindicate the proposal of Yang and Speed (2002) about dye-swapping. As before, the total number of slides is denoted by $N$. Also, we continue to assume that the log intensity ratios arising from different slides are uncorrelated and homoscedastic with common variance $\sigma^2$.

Result 4. Under the model (10), at least $2(v - 1)$ slides are required to keep all the $\theta$s estimable.

Result 5.
Let $N = 2(v - 1)$ and consider a design that keeps all the $\theta$s estimable under (10). Then for any $\theta_{i_1 \ldots i_n}$, which represents a factorial effect involving $u$ factors,

$$\operatorname{Var}(\hat{\theta}_{i_1 \ldots i_n}) \ge \sigma^2 2^{u-2}, \tag{11}$$

where $\hat{\theta}_{i_1 \ldots i_n}$ is the BLUE of $\theta_{i_1 \ldots i_n}$ under (10).

Result 6. Let $N = 2(v - 1)$ and $d_0^{\mathrm{swap}}$ be a design defined as above. Then, under (10), $d_0^{\mathrm{swap}}$ leads to the attainment of the lower bound in (11) simultaneously for all the $\theta$s.

These results show the optimality of $d_0^{\mathrm{swap}}$ in a strong sense. Although Result 5 resembles Result 2, its proof involves much extra work. Following Remarks 1 and 2, there is considerable flexibility in the choice of $d_0$ and hence that of $d_0^{\mathrm{swap}}$. In addition to being helpful under resource constraints, this facilitates the task of finding highly efficient designs when one intends to use a little more than $2(v - 1)$ slides so as to gain degrees of freedom for the estimation of $\sigma^2$. For this purpose, the same heuristic approach as in Section 4.3 can be followed with the only change that now in step (I), all possibilities for $d_0^{\mathrm{swap}}$, corresponding to $d_0$ arising from Result 3 and Remark 1 or 2, have to be considered. As an illustration, consider the $2 \times 2$ factorial and let $N = 8$. Then, under the criterion of $w$-optimality ($w = 1$, 2 or 3), the above approach yields the design {(01, 00), (10, 00), (11, 01), (00, 01), (00, 10), (01, 11), (11, 10), (10, 11)}, which is an augmentation of $d_0^{\mathrm{swap}}$ (consisting of the first six slides) and a dye-swapped design by itself. A complete enumeration shows that this design is, indeed, $w$-optimal among all designs with $N = 8$ slides, for $w = 1, 2, 3$ and under the model (10).

6. Concluding remarks.

6.1. Robustness considerations.
The results in this paper were obtained under the assumption that the log intensity ratios for a gene, arising from different slides, are homoscedastic and uncorrelated. A discussion on this assumption is warranted. In cDNA microarray experiments, the measurement error is typically swamped in biological variability. From a practical viewpoint, it is therefore appropriate to attribute the variance of an observed log intensity ratio arising from a slide $(i_1 \ldots i_n, j_1 \ldots j_n)$ to components, say, $\gamma^2_{i_1 \ldots i_n}$ and $\gamma^2_{j_1 \ldots j_n}$, representing the biological variability within the cell populations given by $i_1 \ldots i_n$ and $j_1 \ldots j_n$, in addition to a component, say, $\delta^2$, due to the measurement error. Thus, this variance equals $\gamma^2_{i_1 \ldots i_n} + \gamma^2_{j_1 \ldots j_n} + \delta^2$. If the variance components $\gamma^2_{i_1 \ldots i_n}$ are supposed to be equal for all cell populations [cf. Kerr (2003) and Altman and Hua (2006), among others] with common value, say, $\gamma^2$, then the log intensity ratios arising from different slides are homoscedastic with common variance $\sigma^2 = 2\gamma^2 + \delta^2$. Furthermore, these ratios can be safely assumed to be uncorrelated if the replications for every treatment combination are biological (i.e., the same subject does not appear in more than one slide). Thus, in this situation all our results go through with $\sigma^2 = 2\gamma^2 + \delta^2$.

If, however, the variance components $\gamma^2_{i_1 \ldots i_n}$ associated with the cell populations are not all equal, then the assumption of homoscedasticity no longer holds. In order to give a flavor of the robustness of our results to this possibility, we revisit Sections 3 and 4. For the $2 \times 2$ factorial in Section 3, writing $\tilde{\gamma}^2_{i_1 i_2} = \gamma^2_{i_1 i_2} / \delta^2$, three patterns are considered for $(\tilde{\gamma}^2_{00}, \tilde{\gamma}^2_{01}, \tilde{\gamma}^2_{10}, \tilde{\gamma}^2_{11})$: (i) (2, 2.5, 2.5, 3), (ii) (2, 3, 4, 6) and (iii) (6, 4, 3, 2).
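The variance structure just described can be explored numerically for a given exact design. The sketch below is only an illustration, not the approximate design theory computation used in the paper: it treats the variance components as known, assumes uncorrelated slides (biological replication only), measures variances in units of $\delta^2$, and applies weighted least squares with slide variance $\tilde{\gamma}^2_{i_1 i_2} + \tilde{\gamma}^2_{j_1 j_2} + 1$ to the $N = 4$ design of Table 1. The function names are hypothetical.

```python
import itertools
import numpy as np

LEVELS = (2, 2)
CELLS = list(itertools.product(*(range(s) for s in LEVELS)))  # 00, 01, 10, 11
THETAS = [t for t in CELLS if any(t)]

def coef(t):
    """Coefficients of the thetas in tau_t - tau_00 under the baseline parametrization."""
    return np.array([float(all(jm in (0, tm) for jm, tm in zip(j, t))) for j in THETAS])

def theta_variances(design, gamma2, delta2=1.0):
    """Weighted least squares variances of the theta estimates for known variance components."""
    X = np.array([coef(a) - coef(b) for a, b in design])
    slide_var = np.array([gamma2[a] + gamma2[b] + delta2 for a, b in design])
    M = X.T @ (X / slide_var[:, None])  # X' W X with W = diag(1 / slide variance)
    return np.diag(np.linalg.inv(M))

design = [((0, 1), (0, 0)), ((1, 0), (0, 0)), ((1, 1), (0, 1)), ((1, 1), (1, 0))]

equal = dict(zip(CELLS, (2.0, 2.0, 2.0, 2.0)))       # homoscedastic case, gamma^2 / delta^2 = 2
pattern_ii = dict(zip(CELLS, (2.0, 3.0, 4.0, 6.0)))  # pattern (ii) of the text

var_equal = theta_variances(design, equal)
var_ii = theta_variances(design, pattern_ii)
print(dict(zip(THETAS, np.round(var_equal, 4))))
print(dict(zip(THETAS, np.round(var_ii, 4))))
```

In the equal-variance case the output reduces to the homoscedastic variances scaled by $2\gamma^2 + \delta^2$, which serves as a sanity check on the weighting.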
Under (i)–(iii), one can employ the approximate design theory to find the $w$-optimal design measures and, hence, obtain Table 3 showing the efficiencies of the design measure $\pi_0$ reported earlier in Result 1(a). It is satisfying to note that $\pi_0$, which is $w$-optimal under homoscedasticity, remains quite robust even to the appreciably heteroscedastic situations (ii) and (iii). The exact designs arising from $\pi_0$ are also seen to remain highly efficient under (i)–(iii).

Table 3. Efficiencies of π0 under heteroscedasticity

Situation   w = 1    w = 2    w = 3
(i)         99.90%   99.89%   99.86%
(ii)        99.25%   99.03%   98.91%
(iii)       99.43%   99.15%   99.00%

The findings are almost equally encouraging for the nearly saturated optimal exact designs shown in Tables 1 and 2. Under (i)–(iii) and for $w = 1, 2, 3$, the designs in Table 1 often remain $w$-optimal among all exact designs and, except in one case, always have efficiency over 97%. The exceptional case concerns the design for $N = 6$, which has efficiency 93.40% under (ii) when $w = 3$. For the $2 \times 3$ factorial, along the line of (i)–(iii), we considered the patterns (2, 2.5, 2.5, 2.5, 3, 3), (2, 3, 3, 4, 6, 6) and (6, 4, 4, 3, 2, 2) for $(\tilde{\gamma}^2_{00}, \tilde{\gamma}^2_{01}, \tilde{\gamma}^2_{02}, \tilde{\gamma}^2_{10}, \tilde{\gamma}^2_{11}, \tilde{\gamma}^2_{12})$. Under all these patterns and for $w = 1, 2, 3$, the designs in Table 2 often turn out to be $w$-optimal and always have efficiency over 98%.

The log intensity ratios from different slides, of course, get correlated when the same subject is allowed to appear in more than one slide. If we continue to assume the equality of the $\gamma^2_{i_1 \ldots i_n}$ and denote their common value by $\gamma^2$, then the correlation terms depend on the ratio $\gamma^2 / \delta^2$, which is commonly unknown. As a result, the standard linear model based analysis and the associated optimal design theory will not work.
On the other hand, if we pretend $\gamma^2 / \delta^2$ to be known so as to allow the use of weighted least squares, empirical evidence favors having only biological replications from the point of view of efficiency. To illustrate this point without making the presentation too involved, we consider the case of a $2 \times 2$ factorial design in $N = 4$ slides under the absence of dye-color effects. We made an enumeration of all such designs that keep the main and interaction effects estimable and, for each design and every treatment combination, enumerated all possibilities for biological or technical replication (here technical replication means repeating the same subject on more than one slide). For instance, in the design $d^*$ = {(01, 00), (10, 00), (11, 01), (11, 10)}, the two replications for any treatment combination can be biological or technical, thus leading to 16 possibilities arising from this design alone. The complete enumeration revealed that, if the ratio $\gamma^2 / \delta^2$ is pretended to be known, then irrespective of the value of this ratio, the design $d^*$, with all replicates biological, is $w$-optimal whenever $w \ge 2/3$. Earlier, in Table 1, the same design was reported to be $w$-optimal for $w = 1$, 2 and 3 in the homoscedastic and uncorrelated setup. Complete enumeration of this kind becomes unmanageable for more complex factorials, but partial enumeration in several other situations led to similar conclusions. This reinforces the findings in Kerr (2003) in a simpler setting and suggests that, in addition to making the log intensity ratios from different slides uncorrelated, use of only biological replicates can be advantageous from the perspective of design efficiency as well; see also Kendziorski et al. (2005) and the references therein for insightful practical results in a similar context.
The point just noted makes sense if the cost of biological replication is negligible compared to the cost of the assay per slide, as has been tacitly supposed in this paper. While Bueno Filho, Gilmour and Rosa (2006) mention that the number of slides is typically the most important limiting factor in microarray experiments, a more detailed discussion in this regard is available in Kerr (2003), who also dwelt on the situation where this is not the case. If the cost of biological replication is a real issue, then the design problem becomes much more complex. Instead of fixing the number of slides, as done here, one should then proceed in the spirit of Kerr (2003) to formulate the problem in terms of a cost function that incorporates the cost of the assays (slides), as well as the cost of biological replication. Given such a cost constraint, the possibility of technical replication and associated correlation will also have to be accounted for. Since commonly this correlation is unknown, the optimal design problem will then concern some kind of likelihood based rather than linear model based analysis.

6.2. Further open issues. Even within the homoscedastic and uncorrelated setting, there are several open issues that deserve attention. One of these concerns analytical derivation of optimal designs for $N$ greater than $v - 1$ or $2(v - 1)$ in Sections 4 or 5 respectively. For instance, a proof of the Conjecture in Section 4.3 will be of interest. This can, however, be challenging, and pending a complete solution, our heuristic approach holds the promise of yielding designs that are at least highly efficient.

From a practical point of view, an important design problem is that of fractional replication. Compared to traditional factorials, a difficulty here is the lack of effect hierarchy [Wu and Hamada (2000), page 112].
Even in the two-factor case, the interaction can be of greater interest to biologists than the main effects. Hence, especially when the number of factors, $n$, is relatively small, it may be too drastic to ignore some interactions, as required in fractional replication. For large $n$, however, this can be a sensible option. The experience with factorial fractions under the orthogonal parametrization [see, e.g., Dey and Mukerjee (1999)] suggests that then, under specification from biologists about the pattern of negligible interactions, the present techniques should be useful.

A problem, akin to that of fractional replication, concerns the study of optimal designs when the impact of possible dye-color bias can be modeled via a reduced version of (10). Note that the model (10) allows a very general form for the effect of dye-coloring and, hence, is applicable to a broad spectrum of situations. If in a specific application one has sufficient knowledge of the underlying process so as to entertain the risk of assuming that such effect is repeatable over slides, that is, additive to treatment effects, then in (10) one can replace $\lambda_{i_1 \ldots i_n} + \lambda_{j_1 \ldots j_n}$ by a single parameter $\eta$. In this case, it can be shown that at least $v$ slides are required to keep all the $\theta$s estimable. However, in contrast with Results 3 and 6, no single design with $v$ slides is optimal simultaneously for all these parameters. For this reduced model, it is known that any even design (i.e., a design where every treatment combination appears an even number of times) allows a dye-color assignment that ensures orthogonality to $\eta$ [Kerr and Churchill (2001a)].
For even $s_1$ and $s_2$, the design in Conjecture (a) of Section 4.3 is even and, hence, with appropriate dye-color assignment, it is again conjectured to be optimal under this model. For odd $s_1$ or $s_2$ too, the initial findings are optimistic. Thus, for the $2 \times 3$ factorial, Conjecture (a) yields the nearly orthogonal design {(00, 01), (02, 00), (10, 00), (01, 11), (11, 10), (12, 02), (10, 12)}, and a complete enumeration shows that it is, indeed, $w$-optimal among all designs with 7 slides, for $w = 1, 2, 3$ and under the reduced model.

In the present paper we studied optimal designs from the statistical consideration of efficiency. From this perspective, our designs often outperform more elementary ones that have gained popularity in applied work. For instance, in the setup of Sections 4 and 5, it is easy to check that the designs arising from Results 3 and 6 estimate the main effect parameters with the same variance and entail smaller variances for the interaction parameters, as compared to the commonly used reference design or the dye-swapped version thereof, respectively. These simpler designs may, however, have other practical benefits, including those dictated by manufacturer recommendations. Nevertheless, as noted earlier, our results allow considerable flexibility under resource constraints and should be useful to applied researchers concerned with practical issues in addition to efficiency considerations. Even in extreme situations where such practicalities preclude direct implementation of the proposed designs, the latter would help in benchmarking the designs actually used from the point of view of efficiency. It is hoped that the present endeavor will generate further interest in the above directions.

Acknowledgments. We thank the referee, the associate editor and the editor for very constructive suggestions.
This work was supported by the Center for Management and Development Studies, Indian Institute of Management Calcutta.

SUPPLEMENTARY MATERIAL

Optimal factorial designs for CDNA microarray experiments: Proofs (doi: 10.1214/07-AOAS144SUPP; .pdf). Technical details, including proofs, appear in a supplementary material file posted at the journal website.

REFERENCES

Altman, N. S. and Hua, J. (2006). Extending the loop design for two-channel microarray experiments. Genet. Res. 88 153–163.
Amaratunga, D. and Cabrera, J. (2004). Exploration and Analysis of DNA Microarray and Protein Array Data. Wiley, New York.
Banerjee, T. and Mukerjee, R. (2008). Supplement to "Optimal factorial designs for CDNA microarray experiments." DOI: 10.1214/07-AOAS144SUPP.
Bueno Filho, J. S. S., Gilmour, S. G. and Rosa, G. J. M. (2006). Design of microarray experiments for genetical genomics studies. Genetics 174 945–957.
Churchill, G. A. (2002). Fundamentals of experimental design for cDNA microarrays. Nature Genetics (Suppl.) 3 490–495.
Dey, A. and Mukerjee, R. (1999). Fractional Factorial Plans. Wiley, New York. MR1679441
Dobbin, K. and Simon, R. (2002). Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18 1438–1445.
Glonek, G. F. V. and Solomon, P. J. (2004). Factorial and time course designs for cDNA microarray experiments. Biostatistics 5 89–111.
Grossman, H. and Schwabe, R. (2008). The relationship between optimal designs for microarray and paired comparison experiments. Preprint.
Gupta, S. (2006). Balanced factorial designs for cDNA microarray experiments. Comm. Statist. Theory Methods 35 1469–1476. MR2328489
Gupta, S. and Mukerjee, R. (1989). A Calculus for Factorial Arrangements. Springer, Berlin. MR1026012
Kendziorski, C., Irizarry, R. A., Chen, K. S., Haag, J. D. and Gould, M. N. (2005).
On the utility of pooling biological samples in microarray experiments. Proc. Natl. Acad. Sci. USA 102 4252–4257.
Kerr, K. F. (2006). Efficient $2^k$ factorial designs for blocks of size 2 with microarray applications. J. Qual. Technol. 38 309–318.
Kerr, M. K. (2003). Design considerations for efficient and effective microarray studies. Biometrics 59 822–828. MR2019821
Kerr, M. K. and Churchill, G. A. (2001a). Experimental design for gene expression microarrays. Biostatistics 2 183–201.
Kerr, M. K. and Churchill, G. A. (2001b). Statistical design and the analysis of gene expression microarray data. Genet. Res. 77 123–128.
Kiefer, J. C. (1975). Construction and optimality of generalized Youden designs. In A Survey of Statistical Design and Linear Models (J. N. Srivastava, ed.) 333–353. North-Holland, Amsterdam. MR0395079
Landgrebe, J., Bretz, F. and Brunner, E. (2006). Efficient design and analysis of two colour factorial microarray experiments. Comput. Statist. Data Anal. 50 499–517. MR2201875
Majumdar, D. (1996). Optimal and efficient treatment-control designs. In Handbook of Statistics 13 (S. Ghosh and C. R. Rao, eds.) 1007–1053. North-Holland, Amsterdam. MR1492589
Nguyen, D., Arpat, A., Wang, N. and Carroll, R. J. (2002). DNA microarray experiments: Biological and technical aspects. Biometrics 58 701–717. MR1939398
Rosa, G. J. M., Steibel, J. P. and Tempelman, R. J. (2005). Reassessing design and analysis of two-color microarray experiments using mixed effects models. Comp. Funct. Genomics 6 123–131.
Silvey, S. D. (1980). Optimal Design. Chapman and Hall, London. MR0606742
Wang, P. C. (2004). Designing two-level fractional factorial experiments in blocks of size two. Sankhyā Ser. A 66 327–342. MR2090981
Wit, E., Nobile, A. and Khanin, R. (2005). Near-optimal designs for dual-channel microarray studies. Appl. Statist. 54 817–830.
MR2209033
Wu, C. F. J. and Hamada, M. (2000). Experiments: Planning, Analysis and Parameter Design Optimization. Wiley, New York. MR1780411
Yang, Y. J. and Draper, N. R. (2003). Two-level factorial and fractional factorial designs in blocks of size two. J. Qual. Technol. 35 294–305.
Yang, Y. H. and Speed, T. (2002). Design issues for cDNA microarray experiments. Nature Genetics (Suppl.) 3 579–588.

Indian Institute of Management Ahmedabad, Vastrapur, Ahmedabad 380 015, India. E-mail: tathagata.bandyopadhyay@gmail.com

Indian Institute of Management Calcutta, Joka, Diamond Harbour Road, Kolkata 700 104, India. E-mail: rmuk1@hotmail.com
