On regression adjustments in experiments with several treatments

Regression adjustments are often made to experimental data. Since randomization does not justify the models, bias is likely; nor are the usual variance calculations to be trusted. Here, we evaluate regression adjustments using Neyman's nonparametric model. Previous results are generalized, and more intuitive proofs are given. A bias term is isolated, and conditions are given for unbiased estimation in finite samples.

Authors: David A. Freedman

The Annals of Applied Statistics 2008, Vol. 2, No. 1, 176–196
DOI: 10.1214/07-AOAS143
© Institute of Mathematical Statistics, 2008

1. Introduction. Data from randomized controlled experiments (including clinical trials) are often analyzed using regression models and the like. The behavior of the estimates can be calibrated using the nonparametric model in Neyman (1923), where each subject has potential responses to several possible treatments. Only one response can be observed, according to the subject's assignment; the other potential responses must then remain unobserved. Covariates are measured for each subject and may be entered into the regression, perhaps with the hope of improving precision by adjusting the data to compensate for minor imbalances in the assignment groups.

As discussed in Freedman (2006, 2007), randomization does not justify the regression model, so that bias can be expected, and the usual formulas do not give the right variances. Moreover, regression need not improve precision. Here, we extend some of those results, with proofs that are more intuitive. We study asymptotics, isolate a bias term of order 1/n, and give some special conditions under which the multiple regression estimator is unbiased in finite samples.
What is the source of the bias when regression models are applied to experimental data? In brief, the regression model assumes linear additive effects. Given the assignments, the response is taken to be a linear combination of treatment dummies and covariates, with an additive random error; coefficients are assumed to be constant across subjects. The Neyman model makes no assumptions about linearity and additivity. If we write the expected response given the assignments as a linear combination of treatment dummies, coefficients will vary across subjects. That is the source of the bias (algebraic details are given below).

Received June 2007; revised October 2007.
Key words and phrases. Models, randomization, experiments, multiple regression, estimation, bias, balanced designs, intention-to-treat.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2008, Vol. 2, No. 1, 176–196. This reprint differs from the original in pagination and typographic detail.

To put this more starkly, in the Neyman model, inferences are based on the random assignment to the several treatments. Indeed, the only stochastic element in the model is the randomization. With regression, inferences are made conditional on the assignments. The stochastic element is the error term, and the inferences depend on assumptions about that error term. Those assumptions are not justified by randomization. The breakdown in assumptions explains why regression comes up short when calibrated against the Neyman model.

For simplicity, we consider three treatments and one covariate, the main difficulty in handling more variables being the notational overhead. There is a finite population of n subjects, indexed by i = 1, ..., n.
Defined on this population are four variables a, b, c, z. The value of a at i is a_i, and so forth. These are fixed real numbers. We consider three possible treatments, A, B, C. If, for instance, i is assigned to treatment A, we observe the response a_i, but do not observe b_i or c_i. The population averages are the parameters of interest here:

    \bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i, \qquad \bar{b} = \frac{1}{n} \sum_{i=1}^{n} b_i, \qquad \bar{c} = \frac{1}{n} \sum_{i=1}^{n} c_i.   (1)

For example, \bar{a} is the average response if all subjects are assigned to A. This could be measured directly, at the expense of losing all information about \bar{b} and \bar{c}. To estimate all three parameters, we divide the population at random into three sets A, B, C, of fixed sizes n_A, n_B, n_C. If i ∈ A, then i receives treatment A; likewise for B and C. We now have a simple model for a clinical trial.

As a matter of notation, A stands for a random set as well as a treatment. Let U, V, W be dummy variables for the sets. For instance, U_i = 1 if i ∈ A and U_i = 0 otherwise. In particular, \sum_i U_i = n_A, and so forth. Let \bar{x}_A be the average of x over A, namely,

    \bar{x}_A = \frac{1}{n_A} \sum_{i \in A} x_i.   (2)

Plainly, \bar{a}_A = \sum_{i \in A} a_i / n_A is an unbiased estimator, called the "ITT estimator," for \bar{a}. Likewise for B and C. "ITT" stands for intention-to-treat. The idea, of course, is that the sample average is a good estimator for the population average. The intention-to-treat principle goes back to Bradford Hill (1961); for additional discussion, see Freedman (2006). There is at least one flaw in the notation: \bar{x}_A is a random variable, being the average of x over the random set A. By contrast, n_A is a fixed quantity, being the number of elements in A.

In the Neyman model, the observed response for subject i = 1, ..., n is

    Y_i = a_i U_i + b_i V_i + c_i W_i,   (3)

because a, b, c code the responses to the treatments.
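The bookkeeping in (1)–(3) is easy to mimic in code. The sketch below is not from the paper; the population values and group sizes are made-up illustrations. It enumerates every assignment of fixed group sizes for a tiny population and checks that the ITT estimator \bar{a}_A averages out, across all assignments, to exactly \bar{a}.

```python
from itertools import combinations

# A tiny population with made-up potential responses (n = 5); the numbers are
# illustrative only. The a_i are fixed; a_i is observed only if i is assigned to A.
a = [1.0, 2.0, 4.0, 0.0, 3.0]
n, nA, nB = 5, 2, 1                      # nC = 2

def average_itt_estimate():
    """Average of the ITT estimator a-bar_A over every assignment of fixed sizes."""
    estimates = []
    for A in combinations(range(n), nA):
        rest = [i for i in range(n) if i not in A]
        for B in combinations(rest, nB):  # the remaining subjects form C
            # Under assignment (A, B, C), only a_i for i in A is observed for A.
            estimates.append(sum(a[i] for i in A) / nA)
    return sum(estimates) / len(estimates)

print(average_itt_estimate(), sum(a) / n)  # the two coincide exactly
```

Each subject lands in A equally often across the enumeration, which is why the average is exact rather than approximate.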
If, for instance, i is assigned to A, the response is a_i. Furthermore, U_i = 1 and V_i = W_i = 0, so Y_i = a_i. In this circumstance, b_i and c_i would not be observable.

We come now to multiple regression. The variable z is a covariate. It is observed for every subject, and is unaffected by assignment. Applied workers often estimate the parameters in (1) by a multiple regression of Y on U, V, W, z. This is the multiple regression estimator whose properties are to be studied. The idea seems to be that estimates are improved by adjusting for random imbalance in assignments. The standard regression model assumes linear additive effects, so that

    E(Y_i | U, V, W, z) = \beta_1 U_i + \beta_2 V_i + \beta_3 W_i + \beta_4 z_i,   (4)

where \beta is constant across subjects. However, the Neyman model makes no assumptions about linearity or additivity. As a result, E(Y_i | U, V, W, z) is given by the right-hand side of (3), with coefficients that vary across subjects. The variation in the coefficients contradicts the basic assumption needed to prove that regression estimates are unbiased [Freedman (2005), page 43]. The variation in the coefficients is the source of the bias.

Analysts who fit (4) to data from a randomized controlled experiment seem to think of \hat{\beta}_1 as estimating the effect of treatment A, namely, \bar{a} in (1). Likewise, \hat{\beta}_3 − \hat{\beta}_1 is used to estimate \bar{c} − \bar{a}, the differential effect of treatment C versus A. Similar considerations apply to other effects. However, these estimators suffer from bias and other problems to be explored below.

We turn for a moment to combinatorics. Proposition 1 is a well-known result. (All proofs are deferred to the Appendix at the end of the article.)

Proposition 1. Let \tilde{p}_S = n_S / n for S = A, B or C.

(i) E(\bar{x}_A) = \bar{x}.

(ii) var(\bar{x}_A) = \frac{1}{n-1} \frac{1 - \tilde{p}_A}{\tilde{p}_A} var(x).
(iii) cov(\bar{x}_A, \bar{y}_A) = \frac{1}{n-1} \frac{1 - \tilde{p}_A}{\tilde{p}_A} cov(x, y).

(iv) cov(\bar{x}_A, \bar{y}_B) = −\frac{1}{n-1} cov(x, y).

Here, x, y = a, b, c or z. Likewise, A in (i)–(iii) may be replaced by B or C. And A, B in (iv) may be replaced by any other distinct pair of sets. By cov(x, y), for example, we mean

    \frac{1}{n} \sum_{i=1}^{n} (x_i − \bar{x})(y_i − \bar{y}).

Curiously, the result in (iv) does not depend on the fractions of subjects allocated to the three sets. We can take x = z and y = z. For instance,

    cov(\bar{z}_A, \bar{z}_B) = −\frac{1}{n-1} var(z).

The finite-sample multivariate CLT in Theorem 1 below is a minor variation on results in Höglund (1978). The theorem will be used to prove the asymptotic normality of the multiple regression estimator. There are several regularity conditions for the theorem.

Condition #1. There is an a priori bound on fourth moments. For all n = 1, 2, ... and x = a, b, c or z,

    \frac{1}{n} \sum_{i=1}^{n} |x_i|^4 < L < ∞.   (5)

Condition #2. The first- and second-order moments, including mixed moments, converge to finite limits, and asymptotic variances are positive. For instance,

    \frac{1}{n} \sum_{i=1}^{n} a_i → ⟨a⟩   (6)

and

    \frac{1}{n} \sum_{i=1}^{n} a_i^2 → ⟨a^2⟩, \qquad \frac{1}{n} \sum_{i=1}^{n} a_i b_i → ⟨ab⟩,   (7)

with

    ⟨a^2⟩ > ⟨a⟩^2;   (8)

likewise for the other variables and pairs of variables. Here, ⟨a⟩ and so forth merely denote finite limits. We take ⟨a^2⟩ and ⟨a, a⟩ as synonymous. In present notation, ⟨a⟩ is the limit of \bar{a}, the latter being the average of a over the population of size n; see (1).

Condition #3. We assume groups are of order n in size, that is,

    \tilde{p}_A = n_A/n → p_A > 0, \qquad \tilde{p}_B = n_B/n → p_B > 0, \qquad \tilde{p}_C = n_C/n → p_C > 0,   (9)

where p_A + p_B + p_C = 1. Notice that \tilde{p}_A, for instance, is the fraction of subjects assigned to A at stage n; the limit as n increases is p_A.

Condition #4.
The variables a, b, c, z have mean 0:

    \frac{1}{n} \sum_{i=1}^{n} x_i = 0, \qquad where x = a, b, c, z.   (10)

Condition #4 is a normalization for Theorem 1. Without it, some centering would be needed.

Theorem 1 (The CLT). Under Conditions #1–#4, the joint distribution of the 12-vector √n (\bar{a}_A, \bar{a}_B, \bar{a}_C, ..., \bar{z}_C) is asymptotically normal, with parameters given by the limits below:

(i) E(√n \bar{x}_A) = 0;

(ii) var(√n \bar{x}_A) → ⟨x^2⟩ (1 − p_A)/p_A;

(iii) cov(√n \bar{x}_A, √n \bar{y}_A) → ⟨x, y⟩ (1 − p_A)/p_A;

(iv) cov(√n \bar{x}_A, √n \bar{y}_B) → −⟨x, y⟩.

Here, x, y = a, b, c or z. Likewise, A in (i)–(iii) may be replaced by B or C. And A, B in (iv) may be replaced by any other distinct pair of sets. The theorem asserts, among other things, that the limiting first- and second-order moments coincide with the moments of the asymptotic distribution, which is safe due to the bound on fourth moments. (As noted above, proofs are deferred to a Technical Appendix at the end of the article.)

Example 1. Suppose we wish to estimate the effect of C relative to A, that is, \bar{c} − \bar{a}. The ITT estimator is \bar{Y}_C − \bar{Y}_A = \bar{c}_C − \bar{a}_A, where the equality follows from (3). As before, \bar{Y}_C = \sum_{i \in C} Y_i / n_C = \sum_{i \in C} c_i / n_C. The estimator \bar{Y}_C − \bar{Y}_A is unbiased by Proposition 1, and its exact variance is

    \frac{1}{n-1} \left[ \frac{1 − \tilde{p}_A}{\tilde{p}_A} var(a) + \frac{1 − \tilde{p}_C}{\tilde{p}_C} var(c) + 2 cov(a, c) \right].

By contrast, the multiple regression estimator would be obtained by fitting (4) to the data, and computing \hat{∆} = \hat{\beta}_3 − \hat{\beta}_1. The asymptotic bias and variance of this estimator will be determined in Theorem 2 below. The performance of the two estimators will be compared in Theorem 4.

2. Asymptotics for multiple regression estimators.
In this section we state a theorem that describes the asymptotic behavior of the multiple regression estimator applied to experimental data: there is a random term of order 1/√n and a bias term of order 1/n. As noted above, we have three treatments and one covariate z. The treatment groups are A, B, C, with dummies U, V, W. The covariate is z. If i is assigned to A, we observe the response a_i, whereas b_i, c_i remain unobserved. Likewise for B, C. The covariate z_i is always observed, and is unaffected by assignment. The response variable Y is given by (3). In Theorem 1, most of the random variables—like \bar{a}_B or \bar{b}_A—are unobservable. That may affect the applications, but not the mathematics. Arguments below involve only observable random variables.

The design matrix for the multiple regression estimator will have n rows and four columns, namely, U, V, W, z. The estimator is obtained by a regression of Y on U, V, W, z, the first three coefficients estimating the effects of A, B, C, respectively. Let \hat{\beta}_{MR} be the multiple regression estimator for the effects of A, B, C. Thus, \hat{\beta}_{MR} is a 3 × 1 vector. We normalize z to have mean 0 and variance 1:

    \frac{1}{n} \sum_{i=1}^{n} z_i = 0, \qquad \frac{1}{n} \sum_{i=1}^{n} z_i^2 = 1.   (11)

The mean-zero condition on z overlaps Condition #4, and is needed for Theorem 2. There is no intercept in our regression model; without the mean-zero condition, the mean of z is liable to confound the effect estimates. See the Appendix for details. (In the alternative, we can drop one of the dummies and put an intercept into the regression—although we would now be estimating effect differences rather than effects.) The condition on the mean of z^2 merely sets the scale. Recall that \tilde{p}_A is the fraction of subjects assigned to treatment A.
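For concreteness, the estimator just described is ordinary least squares of Y on the three dummies and z, with no intercept and z normalized as in (11). A minimal sketch follows; the group sizes and data are made-up illustrations, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, nA, nB = 12, 4, 4                       # nC = 4; sizes are illustrative

# Fixed potential responses and covariate (arbitrary numbers for the sketch).
a = rng.normal(size=n)
b = a + 1.0
c = a + 2.0
z = rng.normal(size=n)
z = z - z.mean()
z = z / np.sqrt((z ** 2).mean())           # normalization (11): mean 0, mean square 1

# One random assignment into groups of the fixed sizes.
perm = rng.permutation(n)
U = np.isin(np.arange(n), perm[:nA]).astype(float)
V = np.isin(np.arange(n), perm[nA:nA + nB]).astype(float)
W = 1.0 - U - V

Y = a * U + b * V + c * W                  # observed response, eq. (3)
X = np.column_stack([U, V, W, z])          # n x 4 design matrix, no intercept
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(beta_hat[:3])                        # estimates of the effects of A, B, C
```

The fit satisfies the normal equations exactly; what the paper questions is not the arithmetic but the sampling properties of these coefficients under randomization.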
Let

    \tilde{Q} = \tilde{p}_A \overline{az} + \tilde{p}_B \overline{bz} + \tilde{p}_C \overline{cz}   (12)

and

    Q = p_A ⟨az⟩ + p_B ⟨bz⟩ + p_C ⟨cz⟩.   (13)

Here, for instance, \overline{az} = \sum_{i=1}^{n} a_i z_i / n is the average over the study population. By Condition #2, as the population size grows, \overline{az} = \sum_{i=1}^{n} a_i z_i / n → ⟨az⟩; likewise for b and c. Thus,

    \tilde{Q} → Q.   (14)

The quantities \tilde{Q} and Q are needed for the next theorem, which demonstrates asymptotic normality and isolates the bias term. To state the theorem, recall that \hat{\beta}_{MR} is the multiple regression estimator for the three effects. The estimand is

    \beta = (\bar{a}, \bar{b}, \bar{c})',   (15)

where \bar{a}, \bar{b}, \bar{c} are defined in (1). Define the 3 × 3 matrix Σ as follows:

    Σ_{11} = \frac{1 − p_A}{p_A} lim var(a − Qz),   (16)
    Σ_{12} = −lim cov(a − Qz, b − Qz),

and so forth. The limits are taken as the population size n → ∞, and exist by Condition #2. Let

    ζ_n = √n (\bar{a}_A − \tilde{Q}\bar{z}_A, \bar{b}_B − \tilde{Q}\bar{z}_B, \bar{c}_C − \tilde{Q}\bar{z}_C)'.   (17)

This turns out to be the lead random element in \hat{\beta}_{MR} − \beta. The asymptotic variance–covariance matrix of ζ_n is Σ, by (14) and Theorem 1. For the bias term, let

    K_A = cov(az, z) − \tilde{p}_A cov(az, z) − \tilde{p}_B cov(bz, z) − \tilde{p}_C cov(cz, z),   (18)

and likewise for K_B, K_C.

Theorem 2. Assume Conditions #1–#3, not Condition #4, and (11). Define ζ_n by (17), and K_S by (18) for S = A, B, C. Then E(ζ_n) = 0 and ζ_n is asymptotically N(0, Σ). Moreover,

    \hat{\beta}_{MR} − \beta = ζ_n/√n − K/n + ρ_n,   (19)

where K = (K_A, K_B, K_C)' and ρ_n = O(1/n^{3/2}) in probability.

Remarks. (i) If K = 0, the bias term will be O(1/n^{3/2}) or smaller.

(ii) What are the implications for practice? In the usual linear model, \hat{\beta} is unbiased given X. With experimental data and the Neyman model, given the assignment, results are deterministic. At best, we will get unbiasedness on average, over all assignments.
Under special circumstances (Theorems 5 and 6 below), that happens. Generally, however, the multiple regression estimator will be biased. See Example 5. The bias decreases as sample size increases.

(iii) Turn now to random error in \hat{\beta}. This is of order 1/√n, both for the ITT estimator and for the multiple regression estimator. However, the asymptotic variances differ. The multiple regression estimator can be more efficient than the ITT estimator—or less efficient—and the difference persists even for large samples. See Examples 3 and 4 below.

3. Asymptotic nominal variances. "Nominal" variances are computed by the usual regression formulae, but are likely to be wrong since the usual assumptions do not hold. We sketch the asymptotics here, under the conditions of Theorem 2. Recall that the design matrix X is n × 4, the columns being U, V, W, z. The response variable is Y. The nominal covariance matrix is then

    Σ_{nom} = \hat{σ}^2 (X'X)^{−1},   (20)

where \hat{σ}^2 is the sum of the squared residuals, normalized by the degrees of freedom (n − 4). Recall Q from (13). Let

    σ^2 = lim_{n→∞} [\tilde{p}_A var(a) + \tilde{p}_B var(b) + \tilde{p}_C var(c)] − Q^2,   (21)

where the limit exists by Conditions #2 and #3. Let D be the 4 × 4 diagonal matrix

    D = diag(p_A, p_B, p_C, 1).   (22)

Theorem 3. Assume Conditions #1–#3, not Condition #4, and (11). Define σ^2 by (21) and D by (22). In probability,

(i) X'X/n → D,

(ii) \hat{σ}^2 → σ^2,

(iii) n Σ_{nom} → σ^2 D^{−1}.

What are the implications for practice? The upper left 3 × 3 block of σ^2 D^{−1} will generally differ from Σ in Theorem 2, so the usual regression standard errors—computed for experimental data—can be quite misleading. This difficulty does not go away for large samples. What explains the breakdown?
In brief, the multiple regression assumes (i) the expectation of the response given the assignment variables and the covariates is linear, with coefficients that are constant across subjects; and (ii) the conditional variance of the response is constant across subjects. In the Neyman model, (i) is wrong as noted earlier. Moreover, given the assignments, there is no variance left in the responses.

More technically, variances in the Neyman model are (necessarily) computed across the assignments, for it is the assignments that are the random elements in the model. With regression, variances are computed conditionally on the assignments, from an error term assumed to be IID across subjects, and independent of the assignment variables as well as the covariates. These assumptions do not follow from the randomization, explaining why the usual formulas break down. For additional discussion, see Freedman (2007). An example may clarify the issues. Write cov_∞ for limiting covariances, for example,

    cov_∞(a, z) = lim cov(a, z) = ⟨az⟩ − ⟨a⟩⟨z⟩ = ⟨az⟩

because ⟨z⟩ = 0 by (11); similarly for variances. See Condition #2.

Example 2. Consider estimating the effect of C relative to A, so the parameter of interest is \bar{c} − \bar{a}. By way of simplification, suppose Q = 0. Let \hat{∆} be the multiple regression estimator for the effect difference. By Theorem 3, the nominal variance of \hat{∆} is essentially 1/n times

    \left(1 + \frac{p_A}{p_C}\right) var_∞(a) + \left(1 + \frac{p_C}{p_A}\right) var_∞(c) + \left(\frac{1}{p_A} + \frac{1}{p_C}\right) p_B var_∞(b).

By Theorem 2, however, the true asymptotic variance of \hat{∆} is 1/n times

    \left(\frac{1}{p_A} − 1\right) var_∞(a) + \left(\frac{1}{p_C} − 1\right) var_∞(c) + 2 cov_∞(a, c).

For instance, we can take the asymptotic variance–covariance matrix of a, b, c, z to be the 4 × 4 identity matrix, with p_A = p_C = 1/4 so p_B = 1/2. The true asymptotic variance of \hat{∆} is 6/n.
The nominal asymptotic variance is 8/n and is too big. On the other hand, if we change var_∞(b) to 1/4, the true asymptotic variance is still 6/n; the nominal asymptotic variance drops to 5/n and is too small.

4. The gain from adjustment. Does adjustment improve precision? The answer is sometimes.

Theorem 4. Assume Conditions #1–#3, not Condition #4, and (11). Consider estimating the effect of C relative to A, so the parameter of interest is \bar{c} − \bar{a}. If we compare the multiple regression estimator to the ITT estimator, the asymptotic gain in variance is Γ/(n p_A p_C), where

    Γ = 2Q [p_C ⟨az⟩ + p_A ⟨cz⟩] − Q^2 [p_A + p_C],   (23)

with Q defined by (13). Adjustment therefore helps asymptotic precision if Γ > 0, but hurts if Γ < 0.

The next two examples are set up like Example 2, with cov_∞ for limiting covariances. We say the design is balanced if n is a multiple of 3 and n_A = n_B = n_C = n/3. We say that effects are additive if b_i − a_i is constant over i and likewise for c_i − a_i. With additive effects, var_∞(a) = var_∞(b) = var_∞(c); write v for the common value. Similarly, cov_∞(a, z) = cov_∞(b, z) = cov_∞(c, z) = Q = ρ√v, where ρ is the asymptotic correlation between a and z, or b and z, or c and z.

Example 3. Suppose effects are additive. Then cov_∞(a, z) = cov_∞(b, z) = cov_∞(c, z) = Q and Γ = Q^2 (p_A + p_C) ≥ 0. The asymptotic gain from adjustment will be positive if cov_∞(a, z) ≠ 0.

Example 4. Suppose the design is balanced, so p_A = p_B = p_C = 1/3. Then 3Q = cov_∞(a, z) + cov_∞(b, z) + cov_∞(c, z). Consequently, 3Γ/2 = Q [2Q − cov_∞(b, z)]. Let z = a + b + c. Choose a, b, c so that var_∞(z) = 1 and cov_∞(a, b) = cov_∞(a, c) = cov_∞(b, c) = 0. In particular, Q = 1/3. Now 2Q − cov_∞(b, z) = 2/3 − var_∞(b).
The asymptotic gain from adjustment will be negative if var_∞(b) > 2/3.

Example 3 indicates one motivation for adjustment: if effects are nearly additive, adjustment is likely to help. However, Example 4 shows that even in a balanced design, the "gain" from adjustment can be negative—if there are subject-by-treatment interactions. More complicated and realistic examples can no doubt be constructed.

5. Finite-sample results. This section gives conditions under which the multiple regression estimator will be exactly unbiased in finite samples. Arguments are from symmetry. As before, the design is balanced if n is a multiple of 3 and n_A = n_B = n_C = n/3; effects are additive if b_i − a_i is constant over i and likewise for c_i − a_i. Then a_i − \bar{a} = b_i − \bar{b} = c_i − \bar{c} = δ_i, say, for all i. Note that \sum_i δ_i = 0.

Theorem 5. If (11) holds, the design is balanced, and effects are additive, then the multiple regression estimator is unbiased.

Examples show that the balance condition is needed in Theorem 5: additivity is not enough. Likewise, if the balance condition holds but there is nonadditivity, the multiple regression estimator will usually be biased. We illustrate the first point.

Example 5. Consider a miniature trial with 6 subjects. Responses a, b, c to treatments A, B, C are shown in Table 1, along with the covariate z. Notice that b − a = 1 and c − a = 2. Thus, effects are additive. We assign one subject at random to A, one to B, and the remaining four to C. There are 6 × 5 = 30 assignments. For each assignment, we build up the 6 × 4 design matrix (one column for each treatment dummy and one column for z); we compute the response variable from Table 1 above, and then the multiple regression estimator. Finally, we average the results across the assignments, as shown in Table 2.
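The enumeration in Example 5 is small enough to reproduce by brute force. The sketch below is a re-implementation, not the paper's code: it averages the multiple regression estimates over every equally likely choice of the subjects sent to A and to B, first for the unbalanced design of Example 5 (where the estimator is biased) and then for a balanced design with two subjects per group, where Theorem 5 says the bias must vanish exactly. The covariate from Table 1 is rescaled so that (11) holds; rescaling z does not affect the effect estimates.

```python
import numpy as np
from itertools import combinations, permutations

# Table 1 parameter values: fixed potential responses a, b, c and covariate z.
a = np.array([0, 0, 0, 2, 2, 4], dtype=float)
b = a + 1.0                       # additive effects: b - a = 1
c = a + 2.0                       #                   c - a = 2
z = np.array([0, 0, 0, -2, -2, 4], dtype=float)
z = z / np.sqrt((z ** 2).mean())  # rescale so (11) holds: mean 0, mean square 1
n = 6

def mr_estimate(A, B):
    """Multiple regression estimates for one assignment (A, B given; the rest is C)."""
    U = np.zeros(n); V = np.zeros(n)
    U[list(A)] = 1.0
    V[list(B)] = 1.0
    W = 1.0 - U - V
    Y = a * U + b * V + c * W                    # observed responses, eq. (3)
    X = np.column_stack([U, V, W, z])            # design matrix, no intercept
    return np.linalg.lstsq(X, Y, rcond=None)[0]

# Unbalanced design of Example 5: one subject to A, one to B, four to C.
avg_unbal = np.mean([mr_estimate([i], [j])
                     for i, j in permutations(range(n), 2)], axis=0)

# Balanced design, two subjects per group: Theorem 5 applies, the bias vanishes.
avg_bal = np.mean([mr_estimate(A, B)
                   for A in combinations(range(n), 2)
                   for B in combinations([k for k in range(n) if k not in A], 2)],
                  axis=0)

print(avg_unbal[0], avg_bal[0], a.mean())  # biased / unbiased / truth
```

The contrast makes the point of Section 5 concrete: the same population and the same additive effects, yet only the balanced design gives exact unbiasedness.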
The average gives the expected value of the multiple regression estimator, because the average is taken across all possible designs. "Truth" is determined from the parameters in Table 1. Calculations are exact, within the limits of rounding error; no simulations are involved.

For instance, the average coefficient for the A dummy is 3.3825. However, from Table 1, the average effect of A is \bar{a} = 1.3333. The difference is bias. Consider next the differential effect of B versus A. On average, this is estimated by multiple regression as 1.9965 − 3.3825 = −1.3860. From Table 1, truth is +1. Again, this reflects bias in the multiple regression estimator. With a larger trial, of course, the bias would be smaller; see Theorem 2. Theorem 5 does not apply because the design is unbalanced.

Table 1
Parameter values

    a   b   c   z
    0   1   2   0
    0   1   2   0
    0   1   2   0
    2   3   4  −2
    2   3   4  −2
    4   5   6   4

For the next theorem, consider the possible values v of z. Let n_v be the number of i with z_i = v. The average of a_i given z_i = v is

    \frac{1}{n_v} \sum_{\{i : z_i = v\}} a_i.

Suppose this is constant across v's, as is \sum_{\{i : z_i = v\}} b_i / n_v and \sum_{\{i : z_i = v\}} c_i / n_v. The common values must be \bar{a}, \bar{b}, \bar{c}, respectively. We call this conditional constancy. No condition is imposed on z, and the design need not be balanced. (Conditional constancy is violated in Example 5, as one sees by looking at the parameter values in Table 1.)

Theorem 6. With conditional constancy, the multiple regression estimator is unbiased.

Remarks. (i) In the usual regression model, Y = Xβ + ε with E(ε | X) = 0. The multiple regression estimator is then conditionally unbiased. In Theorems 5 and 6, the estimator is conditionally biased, although the bias averages out to 0 across permutations. In Theorem 5, for instance, the conditional bias is (X'X)^{−1} X'δ.
Across permutations, the bias averages out to 0. The proof is a little tricky (see the Technical Appendix below). The δ is fixed, as explained before the theorem; it is X that varies from one permutation to another; the conditional bias is a nonlinear function of X. This is all quite different from the usual regression arguments.

Table 2
Average multiple regression estimates versus truth

        Ave MR    Truth
    A   3.3825    1.3333
    B   1.9965    2.3333
    C   2.9053    3.3333
    z  −0.0105

(ii) Kempthorne (1952) points to the difference between permutation models and the usual linear regression model; see Chapters 7–8, especially Section 8.7. Also see Biometrics vol. 13, no. 3 (1957). Cox (1956) cites Kempthorne, but appears to contradict Theorem 5 above. I am indebted to Joel Middleton for the reference to Cox.

(iii) When specialized to two-group experiments, the formulas in this paper (for, e.g., asymptotic variances) differ in appearance but not in substance from those previously reported [Freedman (2007)].

(iv) Although details have not been checked, the results (and the arguments) in this paper seem to extend easily to any fixed number of treatments, and any fixed number of covariates. Treatment-by-covariate interactions can probably be accommodated too.

(v) In this paper treatments have two levels: low or high. If a treatment has several levels—for example, low, medium, high—and linearity is assumed in a regression model, inconsistency is likely to be a consequence. Likewise, we view treatments as mutually exclusive: if subject i is assigned to group A, then i cannot also turn up in group B. If multiple treatments are applied to the same subject in order to determine joint effects, and a regression model assumes additive or multiplicative effects, inconsistency is again likely.
(vi) The theory developed here applies equally well to 0–1 valued responses. With 0–1 variables, it may seem more natural to use logit or probit models to adjust the data. However, such models are not justified by randomization—any more than the linear model. Preliminary calculations suggest that if adjustments are to be made, linear regression may be a safer choice. For instance, the conventional logit estimator for the odds ratio may be severely biased. On the other hand, a consistent estimator can be based on estimated probabilities in the logit model. For discussion, see Freedman (2008).

(vii) The theory developed here can probably be extended to more complex designs (like blocking) and more complex estimators (like two-stage least squares), but the work remains to be done.

(viii) Victora, Habicht and Bryce (2004) favor adjustment. However, they do not address the sort of issues raised here, nor are they entirely clear about whether inferences are to be made on average across assignments, or conditional on assignment. In the latter case, inferences might be strongly model-dependent.

(ix) Models are used to adjust data from large randomized controlled experiments in, for example, Cook et al. (2007), Gertler (2004), Chattopadhyay and Duflo (2004) and Rossouw et al. (2002). Cook et al. report on long-term follow-up of subjects in experiments where salt intake was restricted; conclusions are dependent on the models used to analyze the data. By contrast, the results in Rossouw et al. for hormone replacement therapy do not depend very much on the modeling.

6. Recommendations for practice. Altman et al. (2001) document persistent failures in the reporting of data from clinical trials, and make detailed proposals for improvement.
The following recommendations are complementary:

(i) As is usual, measures of balance between the assigned-to-treatment group and the assigned-to-control group should be reported.

(ii) After that should come a simple intention-to-treat analysis, comparing rates (or averages and SDs) of outcomes among those assigned to treatment and those assigned to the control group.

(iii) Crossover should be discussed, and deviations from protocol.

(iv) Subgroup analyses should be reported, and corrections for crossover if that is to be attempted. Analysis by treatment received requires special justification, and so does per-protocol analysis. (The first compares those who receive treatment with those who do not, regardless of assignment; the second censors subjects who cross over from one arm of the trial to the other, e.g., they are assigned to control but insist on treatment.) Complications are discussed in Freedman (2006).

(v) Regression estimates (including logistic regression and proportional hazards) should be deferred until rates and averages have been presented. If regression estimates differ from simple intention-to-treat results, and reliance is placed on the models, that needs to be explained. As indicated above, the usual models are not justified by randomization, and simpler estimators may be more robust.

TECHNICAL APPENDIX

The Appendix provides technical underpinnings for the theorems discussed above.

Proof of Proposition 1. We prove only claim (iv). Plainly, E(U_i V_j) = 0 if i = j, since i cannot be assigned both to A and to B. Furthermore,

    E(U_i V_j) = P(U_i = 1 \& V_j = 1) = \frac{n_A}{n} \frac{n_B}{n − 1}

if i ≠ j. This is clear if i = 1 and j = 2; but permuting indices will not change the joint distribution of assignment dummies. We may assume without loss of generality that \bar{x} = \bar{y} = 0.
Now

    cov(\bar{x}_A, \bar{y}_B) = \frac{1}{n_A} \frac{1}{n_B} \sum_{i ≠ j} E(U_i V_j x_i y_j)
                              = \frac{1}{n(n−1)} \sum_{i ≠ j} x_i y_j
                              = \frac{1}{n(n−1)} \left( \sum_i x_i \sum_j y_j − \sum_i x_i y_i \right)
                              = −\frac{1}{n(n−1)} \sum_i x_i y_i
                              = −\frac{1}{n−1} cov(x, y)

as required, where i, j = 1, ..., n. □

Proof of Theorem 1. The theorem can be proved by appealing to Höglund (1978) and computing conditional distributions. Another starting point is Hoeffding (1951), with suitable choices for the matrix from which summands are drawn. With either approach, the usual linear-combinations trick can be used to reduce dimensionality. In view of (9), the limiting distribution satisfies three linear constraints. A formal proof is omitted, but we sketch the argument for one case, starting from Theorem 3 in Hoeffding (1951). Let α, β, γ be three constants. Let M be an n × n matrix, with

    M_{ij} = α a_j  for i = 1, ..., n_A,
    M_{ij} = β b_j  for i = n_A + 1, ..., n_A + n_B,
    M_{ij} = γ c_j  for i = n_A + n_B + 1, ..., n.

Pick one j at random from each row, without replacement (interpretation: if j is picked from row i = 1, ..., n_A, subject j goes into treatment group A). According to Hoeffding's theorem, the sum of the corresponding matrix entries will be approximately normal. So the law of √n (\bar{a}_A, \bar{b}_B, \bar{c}_C) tends to multivariate normal. Theorem 1 in Hoeffding's paper will help get the regularity conditions in his Theorem 3 from Conditions #1–#4 above. □

Let X be an n × p matrix of rank p ≤ n. Let Y be an n × 1 vector. The multiple regression estimator computed from Y is \hat{\beta}_Y = (X'X)^{−1} X'Y. Let θ be a p × 1 vector. The "invariance lemma" is a purely arithmetic result; the well-known proof is omitted.

Lemma A.1 (The invariance lemma). \hat{\beta}_{Y + Xθ} = \hat{\beta}_Y + θ.

The multiple-regression estimator for Theorem 2 may be computed as follows.
Recall from (2) that $\bar{Y}_A$ is the average of $Y$ over A, that is, $\sum_{i \in A} Y_i / n_A$; likewise for B, C. Let
$$e_i = Y_i - \bar{Y}_A U_i - \bar{Y}_B V_i - \bar{Y}_C W_i, \tag{A1}$$
which is the residual when $Y$ is regressed on the first three columns of the design matrix. Let
$$f_i = z_i - \bar{z}_A U_i - \bar{z}_B V_i - \bar{z}_C W_i, \tag{A2}$$
which is the residual when $z$ is regressed on those columns. Let $\hat{Q}$ be the slope when $e$ is regressed on $f$:
$$\hat{Q} = e \cdot f / |f|^2. \tag{A3}$$
The next result is standard.

Lemma A.2. The multiple regression estimator for the effect of A, that is, the first element in $(X'X)^{-1}X'Y$, is
$$\bar{Y}_A - \hat{Q}\bar{z}_A \tag{A4}$$
and likewise for B, C. The coefficient of $z$ in the regression of $Y$ on $U, V, W, z$ is $\hat{Q}$.

We turn now to $\hat{Q}$; this is the key technical quantity in the paper, and we develop a more explicit formula for it. Notice that the dummy variables $U, V, W$ are mutually orthogonal. By the usual regression arguments,
$$|f|^2 = |z|^2 - n_A(\bar{z}_A)^2 - n_B(\bar{z}_B)^2 - n_C(\bar{z}_C)^2, \tag{A5}$$
where $|f|^2 = \sum_{i=1}^n f_i^2$. Recall (3). Check that $\bar{Y}_A = \bar{a}_A$, where $\bar{a}_A = \sum_{i \in A} a_i / n_A$; likewise for B, C. Hence,
$$e_i = (a_i - \bar{a}_A)U_i + (b_i - \bar{b}_B)V_i + (c_i - \bar{c}_C)W_i, \tag{A6}$$
where the residual $e_i$ was defined in (A1). Likewise,
$$f_i = (z_i - \bar{z}_A)U_i + (z_i - \bar{z}_B)V_i + (z_i - \bar{z}_C)W_i, \tag{A7}$$
where the residual $f_i$ was defined in (A2). Now
$$e_i f_i = (a_i - \bar{a}_A)(z_i - \bar{z}_A)U_i + (b_i - \bar{b}_B)(z_i - \bar{z}_B)V_i + (c_i - \bar{c}_C)(z_i - \bar{z}_C)W_i \tag{A8}$$
and
$$\sum_{i=1}^n e_i f_i = n_A[\overline{(az)}_A - \bar{a}_A\bar{z}_A] + n_B[\overline{(bz)}_B - \bar{b}_B\bar{z}_B] + n_C[\overline{(cz)}_C - \bar{c}_C\bar{z}_C], \tag{A9}$$
where, for instance, $\overline{(az)}_A = \sum_{i \in A} a_i z_i / n_A$. Recall that $\tilde{p}_A = n_A/n$ is the fraction of subjects assigned to treatment A; likewise for B and C. These fractions are deterministic, not random. We can now give a more explicit formula for the $\hat{Q}$ defined in (A3), dividing numerator and denominator by $n$.
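Lemma A.2 and the identities (A5) and (A9) are finite-sample algebra, so they can be confirmed numerically on synthetic data. The sketch below is mine, not the paper's; group sizes, covariate, and outcomes are invented, and the observed $Y$ stands in for the potential outcomes $a, b, c$ within each assigned group.

```python
import numpy as np

# Synthetic data: groups A, B, C of sizes 4, 3, 3; covariate z; observed Y.
rng = np.random.default_rng(0)
n_A, n_B, n_C = 4, 3, 3
n = n_A + n_B + n_C
groups = np.array(["A"] * n_A + ["B"] * n_B + ["C"] * n_C)
U, V, W = [(groups == g).astype(float) for g in "ABC"]
z = rng.normal(size=n)
Y = rng.normal(size=n)

# e, f: residuals of Y and z on the dummies, i.e., group-mean deviations (A1)-(A2).
e, f = Y.copy(), z.copy()
for g in "ABC":
    e[groups == g] -= Y[groups == g].mean()
    f[groups == g] -= z[groups == g].mean()
Q_hat = e @ f / (f @ f)                               # (A3)

# (A5): |f|^2 = |z|^2 - n_A zbar_A^2 - n_B zbar_B^2 - n_C zbar_C^2.
ssz = z @ z - sum(m * z[groups == g].mean() ** 2
                  for g, m in zip("ABC", (n_A, n_B, n_C)))
assert np.isclose(f @ f, ssz)

# (A9): e.f = n_A[(Yz)bar_A - Ybar_A zbar_A] + (analogous terms for B and C).
rhs = sum(m * (np.mean(Y[groups == g] * z[groups == g])
               - Y[groups == g].mean() * z[groups == g].mean())
          for g, m in zip("ABC", (n_A, n_B, n_C)))
assert np.isclose(e @ f, rhs)

# Lemma A.2: in the regression of Y on U, V, W, z, the coefficient of U is
# Ybar_A - Q_hat * zbar_A, and the coefficient of z is Q_hat.
beta = np.linalg.lstsq(np.column_stack([U, V, W, z]), Y, rcond=None)[0]
assert np.isclose(beta[0], Y[groups == "A"].mean() - Q_hat * z[groups == "A"].mean())
assert np.isclose(beta[3], Q_hat)
```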
By (A5) and (A9), $\hat{Q} = N/D$, where
$$N = \tilde{p}_A[\overline{(az)}_A - \bar{a}_A\bar{z}_A] + \tilde{p}_B[\overline{(bz)}_B - \bar{b}_B\bar{z}_B] + \tilde{p}_C[\overline{(cz)}_C - \bar{c}_C\bar{z}_C], \tag{A10}$$
$$D = 1 - \tilde{p}_A(\bar{z}_A)^2 - \tilde{p}_B(\bar{z}_B)^2 - \tilde{p}_C(\bar{z}_C)^2.$$
In the formula for $D$, we used (11) to replace $|z|^2/n$ by 1. The reason $\hat{Q}$ matters is that it relates the multiple regression estimator to the ITT estimator in a fairly simple way. Indeed, by (3) and Lemma A.2,
$$\hat{\beta}_{\mathrm{MR}} = (\bar{Y}_A - \hat{Q}\bar{z}_A,\ \bar{Y}_B - \hat{Q}\bar{z}_B,\ \bar{Y}_C - \hat{Q}\bar{z}_C)' = (\bar{a}_A - \hat{Q}\bar{z}_A,\ \bar{b}_B - \hat{Q}\bar{z}_B,\ \bar{c}_C - \hat{Q}\bar{z}_C)'. \tag{A11}$$
We must now estimate $\hat{Q}$. In view of (11), Theorem 1 shows that
$$(\bar{z}_A, \bar{z}_B, \bar{z}_C) = O(1/\sqrt{n}). \tag{A12}$$
(All $O$'s are in probability.) Consequently, the denominator $D$ of $\hat{Q}$ in (A10) is
$$1 + O(1/n). \tag{A13}$$
Two deterministic approximations to the numerator $N$ were presented in (12)–(13).

Proof of Theorem 2. By Lemma A.1, we may assume $a = b = c = 0$. To see this more sharply, recall (3). Let $\hat{\beta}$ be the result of regressing $Y$ on $U, V, W, z$. Furthermore, let
$$Y_i^* = (a_i + a^*)U_i + (b_i + b^*)V_i + (c_i + c^*)W_i. \tag{A14}$$
The result of regressing $Y^*$ on $U, V, W, z$ is just $\hat{\beta} + (a^*, b^*, c^*, 0)'$. So the general case of Theorem 2 would follow from the special case. That is why we can, without loss of generality, assume Condition #4. Now
$$(\bar{a}_A, \bar{b}_B, \bar{c}_C) = O(1/\sqrt{n}). \tag{A15}$$
We use (A10) to evaluate (A11). The denominator of $\hat{Q}$ is essentially 1; that is, the departure from 1 can be swept into the error term $\rho_n$, because the departure from 1 gets multiplied by $(\bar{z}_A, \bar{z}_B, \bar{z}_C)' = O(1/\sqrt{n})$. This is a little delicate: we are estimating down to order $1/n^{3/2}$. The departure of the denominator from 1 is multiplied by $N$, but terms like $\bar{a}_A\bar{z}_A$ are $O(1/n)$ and immaterial, while terms like $\overline{(az)}_A$ are $O(1)$ by Condition #1 and Proposition 1 (or see the discussion of Proposition A.1 below).
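With $z$ standardized so that (11) holds ($\bar{z} = 0$ and $|z|^2/n = 1$), the explicit formula $\hat{Q} = N/D$ in (A10) can be confirmed numerically. This sketch is my own illustration with invented data; again the observed $Y$ plays the role of $a, b, c$ within each group.

```python
import numpy as np

rng = np.random.default_rng(1)
n_A, n_B, n_C = 4, 3, 3
n = n_A + n_B + n_C
groups = np.array(["A"] * n_A + ["B"] * n_B + ["C"] * n_C)
p = {g: m / n for g, m in zip("ABC", (n_A, n_B, n_C))}      # fractions p~_A, p~_B, p~_C

z = rng.normal(size=n)
z = (z - z.mean()) / np.sqrt(np.mean((z - z.mean()) ** 2))  # zbar = 0, |z|^2/n = 1
Y = rng.normal(size=n)

# Q_hat from the residual regression (A1)-(A3).
e, f = Y.copy(), z.copy()
for g in "ABC":
    e[groups == g] -= Y[groups == g].mean()
    f[groups == g] -= z[groups == g].mean()
Q_hat = e @ f / (f @ f)

# N and D from (A10).
N = sum(p[g] * (np.mean(Y[groups == g] * z[groups == g])
                - Y[groups == g].mean() * z[groups == g].mean()) for g in "ABC")
D = 1 - sum(p[g] * z[groups == g].mean() ** 2 for g in "ABC")

assert np.isclose(Q_hat, N / D)
```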
For the numerator of $\hat{Q}$, terms like $\bar{a}_A\bar{z}_A$ go into $\rho_n$: after multiplication by $(\bar{z}_A, \bar{z}_B, \bar{z}_C)'$, they are $O(1/n^{3/2})$. Recall that $\overline{az} = \sum_{i=1}^n a_i z_i / n$. What's left of the numerator is $\check{Q} + \tilde{Q}$, where
$$\check{Q} = \tilde{p}_A[\overline{(az)}_A - \overline{az}] + \tilde{p}_B[\overline{(bz)}_B - \overline{bz}] + \tilde{p}_C[\overline{(cz)}_C - \overline{cz}]. \tag{A16}$$
The term $\tilde{Q}(\bar{z}_A, \bar{z}_B, \bar{z}_C)'$ goes into $\zeta_n$; see (17). The rest of $\zeta_n$ comes from $(\bar{a}_A, \bar{b}_B, \bar{c}_C)$ in (A11). The bias in estimating the effects is therefore
$$-E\Biggl[\check{Q}\begin{pmatrix}\bar{z}_A\\ \bar{z}_B\\ \bar{z}_C\end{pmatrix}\Biggr]. \tag{A17}$$
This can be evaluated by Proposition 1, the relevant variables being $az, bz, cz, z$. □

Additional detail for Theorem 2. We need to show, for instance,
$$\hat{Q}\bar{z}_A = \tilde{Q}\bar{z}_A + \check{Q}\bar{z}_A + O\bigl(1/n^{3/2}\bigr).$$
This can be done in three easy steps.

Step 1. $(N/D)\bar{z}_A = N\bar{z}_A + O(1/n^{3/2})$. Indeed, $N = O(1)$, $D = 1 + O(1/n)$, and $\bar{z}_A = O(1/\sqrt{n})$.

Step 2. $N = \tilde{Q} + \check{Q} - R$, where $R = \tilde{p}_A\bar{a}_A\bar{z}_A + \tilde{p}_B\bar{b}_B\bar{z}_B + \tilde{p}_C\bar{c}_C\bar{z}_C$. This is because $\overline{(az)}_A = \overline{az} + [\overline{(az)}_A - \overline{az}]$, and so forth.

Step 3. $R = O(1/n)$, so $R\bar{z}_A = O(1/n^{3/2})$.

Remarks. (i) As a matter of notation, $\tilde{Q}$ is deterministic but $\check{Q}$ is random. Both are scalar: compare (12) and (A16). The source of the bias is the covariance between $\check{Q}$ and $\bar{z}_A, \bar{z}_B, \bar{z}_C$.

(ii) Suppose we add a constant $k$ to $z$. Instead of (11), we get $\bar{z} = k$ and $\overline{z^2} = 1 + k^2$. Because $\bar{z}_A$ and so forth are all shifted by the same amount $k$, the shift does not affect $e$, $f$ or $\hat{Q}$; see (A1)–(A3). The multiple regression estimator for the effect of A is therefore shifted by $\hat{Q}k$; likewise for B, C. This bias does not tend to 0 when the sample size grows, but does cancel when estimating differences in effects.

(iii) In applications, we cannot assume the parameters $a, b, c$ are 0; the whole point is to estimate them. The invariance lemma, however, reduces the general case to the more manageable special case, where $a = b = c = 0$, as in the proof of Theorem 2.
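The bias just isolated is a finite-sample phenomenon and can be exhibited exactly, without asymptotics, by averaging over every possible assignment in a tiny example. The design and potential outcomes below are invented for illustration (this is my sketch, not the paper's): with unequal group sizes and outcomes nonlinear in $z$, the ITT average is exactly unbiased while the adjusted estimate $\bar{Y}_A - \hat{Q}\bar{z}_A$ generally is not.

```python
import numpy as np
from itertools import permutations

# Tiny design: n = 6, unequal group sizes 3, 2, 1; z standardized as in (11).
z = np.array([-1.5, -0.7, -0.1, 0.3, 0.8, 1.2])
z = (z - z.mean()) / np.sqrt(np.mean((z - z.mean()) ** 2))
a = z + 0.9 * z**2            # potential outcomes under A (nonlinear in z)
b = 2.0 - z                   # potential outcomes under B
c = 0.5 * z**2                # potential outcomes under C

itt, mr = [], []
for g in sorted(set(permutations("AAABBC"))):   # all 60 equally likely assignments
    g = np.array(g)
    Y = np.where(g == "A", a, np.where(g == "B", b, c))
    e, f = Y.copy(), z.copy()
    for lab in "ABC":
        e[g == lab] -= Y[g == lab].mean()
        f[g == lab] -= z[g == lab].mean()
    Q_hat = e @ f / (f @ f)
    itt.append(Y[g == "A"].mean())
    mr.append(Y[g == "A"].mean() - Q_hat * z[g == "A"].mean())

itt_bias = np.mean(itt) - a.mean()   # exactly 0, up to rounding
mr_bias = np.mean(mr) - a.mean()     # generally nonzero, of order 1/n
assert abs(itt_bias) < 1e-12
print("ITT bias:", itt_bias, "  MR bias:", mr_bias)
```

Because every assignment is enumerated, the averages are exact expectations rather than Monte Carlo estimates.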
(iv) In (19), $K = O(1)$. Indeed, $\bar{z} = 0$, so $\mathrm{cov}(az, z) = \overline{(az)z} = \overline{az^2}$. Now
$$\Biggl|\frac{1}{n}\sum_{i=1}^n a_i z_i^2\Biggr| \leq \Biggl(\frac{1}{n}\sum_{i=1}^n |a_i|^3\Biggr)^{1/3}\Biggl(\frac{1}{n}\sum_{i=1}^n |z_i|^3\Biggr)^{2/3}$$
by Hölder's inequality applied to $a$ and $z^2$. Finally, use Condition #1. The same argument can be used for $\mathrm{cov}(bz, z)$ and $\mathrm{cov}(cz, z)$.

Define $\hat{Q}$ as in (A3); recall (A1)–(A2). The residuals from the multiple regression are $e - \hat{Q}f$ by Lemma A.2; according to usual procedures,
$$\hat{\sigma}^2 = |e - \hat{Q}f|^2/(n - 4). \tag{A18}$$
Recall $f$ from (A2), and $\hat{Q}, Q$ from (A3) and (13).

Lemma A.3. Assume Conditions #1–#3, not Condition #4, and (11). Then $|f|^2/n \to 1$ and $\hat{Q} \to Q$. Convergence is in probability.

Proof. The first claim follows from (A5) and (A12); the second, from (A10) and Theorem 1. □

Proof of Theorem 3. Let $M$ be the $4 \times 4$ matrix whose diagonal is $\tilde{p}_A, \tilde{p}_B, \tilde{p}_C, 1$; the last row of $M$ is $(\bar{z}_A, \bar{z}_B, \bar{z}_C, 1)$; the last column of $M$ is $(\bar{z}_A, \bar{z}_B, \bar{z}_C, 1)'$. Pad out $M$ with 0's. Plainly, $X'X/n = M$. As before, $\tilde{p}_A = n_A/n$ is deterministic, and $\tilde{p}_A \to p_A$ by (9). But $\bar{z}_A = O(1/\sqrt{n})$; likewise for B, C. This proves (i).

For (ii), $e = e - \hat{Q}f + \hat{Q}f$. But $e - \hat{Q}f \perp f$. So $|e - \hat{Q}f|^2 = |e|^2 - \hat{Q}^2|f|^2$. Then
$$\frac{n-4}{n}\hat{\sigma}^2 = \frac{|e - \hat{Q}f|^2}{n} = \frac{|e|^2 - \hat{Q}^2|f|^2}{n} = \frac{|Y|^2}{n} - \tilde{p}_A(\bar{Y}_A)^2 - \tilde{p}_B(\bar{Y}_B)^2 - \tilde{p}_C(\bar{Y}_C)^2 - \hat{Q}^2\frac{|f|^2}{n} = \frac{|Y|^2}{n} - \tilde{p}_A(\bar{a}_A)^2 - \tilde{p}_B(\bar{b}_B)^2 - \tilde{p}_C(\bar{c}_C)^2 - \hat{Q}^2\frac{|f|^2}{n}$$
by (A1) and (3). Using (3) again, we get
$$\frac{|Y|^2}{n} = \tilde{p}_A\overline{(a^2)}_A + \tilde{p}_B\overline{(b^2)}_B + \tilde{p}_C\overline{(c^2)}_C. \tag{A19}$$
(Remember, the dummy variables are orthogonal.) So
$$\frac{n-4}{n}\hat{\sigma}^2 = \tilde{p}_A[\overline{(a^2)}_A - (\bar{a}_A)^2] + \tilde{p}_B[\overline{(b^2)}_B - (\bar{b}_B)^2] + \tilde{p}_C[\overline{(c^2)}_C - (\bar{c}_C)^2] - \hat{Q}^2\frac{|f|^2}{n}. \tag{A20}$$
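The chain of identities culminating in (A20) is finite-sample algebra, so it can be verified numerically. The sketch below is my own, on synthetic data with three equal groups (so each $\tilde{p}$ is $1/3$); it checks the orthogonality step, the agreement of (A18) with the usual residual sum of squares, and (A20) itself.

```python
import numpy as np

rng = np.random.default_rng(2)
n_A = n_B = n_C = 4
n = n_A + n_B + n_C
groups = np.array(["A"] * n_A + ["B"] * n_B + ["C"] * n_C)
U, V, W = [(groups == g).astype(float) for g in "ABC"]
z = rng.normal(size=n)
Y = rng.normal(size=n)

e, f = Y.copy(), z.copy()
for g in "ABC":
    e[groups == g] -= Y[groups == g].mean()
    f[groups == g] -= z[groups == g].mean()
Q_hat = e @ f / (f @ f)
r = e - Q_hat * f                      # multiple regression residuals (Lemma A.2)

# Orthogonality: |e - Q_hat f|^2 = |e|^2 - Q_hat^2 |f|^2.
assert np.isclose(r @ r, e @ e - Q_hat**2 * (f @ f))

# (A18): sigma^2_hat agrees with the residual sum of squares from the full
# regression of Y on U, V, W, z, divided by n - 4.
X = np.column_stack([U, V, W, z])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
rss = np.sum((Y - X @ beta) ** 2)
assert np.isclose(r @ r / (n - 4), rss / (n - 4))

# (A20): (n-4)/n * sigma^2_hat equals the within-group variance decomposition
# minus Q_hat^2 |f|^2 / n; here each p~ equals 1/3 (equal group sizes).
decomp = sum((1 / 3) * (np.mean(Y[groups == g] ** 2) - Y[groups == g].mean() ** 2)
             for g in "ABC") - Q_hat**2 * (f @ f) / n
assert np.isclose(r @ r / n, decomp)
```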
To evaluate $\lim \hat{\sigma}^2$, we may without loss of generality assume Condition #4, by the invariance lemma. Now $\bar{a}_A = O(1/\sqrt{n})$, and likewise for B, C, by (A15). The terms in (A20) involving $(\bar{a}_A)^2, (\bar{b}_B)^2, (\bar{c}_C)^2$ can therefore be dropped, being $O(1/n)$. Furthermore, $|f|^2/n \to 1$ and $\hat{Q} \to Q$ by Lemma A.3. To complete the proof of (ii), we must show that, in probability,
$$\overline{(a^2)}_A \to \langle a^2\rangle, \quad \overline{(b^2)}_B \to \langle b^2\rangle, \quad \overline{(c^2)}_C \to \langle c^2\rangle. \tag{A21}$$
This follows from Condition #1 and Proposition 1. Given (i) and (ii), claim (iii) is immediate. □

Proof of Theorem 4. The asymptotic variance of the multiple regression estimator is given by Theorem 2. The variance of the ITT estimator $\bar{Y}_C - \bar{Y}_A$ can be worked out exactly, from Proposition 1 (see Example 1). A bit of algebra will now prove Theorem 4. □

Proof of Theorem 5. By the invariance lemma, we may as well assume that $a = b = c = 0$. The ITT estimator is unbiased. By Lemma A.2, the multiple regression estimator differs from the ITT estimator by $\hat{Q}\bar{z}_A, \hat{Q}\bar{z}_B, \hat{Q}\bar{z}_C$. These three random variables sum to 0 by (11) and the balance condition. So their expectations sum to 0. Moreover, the three random variables are exchangeable, so their expectations must be equal. To see the exchangeability more sharply, recall (A1)–(A3). Because there are no interactions, $Y_i = \delta_i$. So
$$e = \delta - \bar{\delta}_A U - \bar{\delta}_B V - \bar{\delta}_C W \tag{A27}$$
by (A1), and
$$f = z - \bar{z}_A U - \bar{z}_B V - \bar{z}_C W \tag{A28}$$
by (A2). These are random $n$-vectors. The joint distribution of
$$e, f, \hat{Q}, \bar{z}_A, \bar{z}_B, \bar{z}_C \tag{A29}$$
does not depend on the labels A, B, C: the pairs $(\delta_i, z_i)$ are just being divided into three random groups of equal size. □

The same argument shows that the multiple regression estimator for an effect difference (like $a - c$) is symmetrically distributed around the true value.

Proof of Theorem 6.
By Lemma A.1, we may assume without loss of generality that $a = b = c = 0$. We can assign subjects to A, B, C by randomly permuting $\{1, 2, \ldots, n\}$: the first $n_A$ subjects go into A, the next $n_B$ into B, and the last $n_C$ into C. Freeze the number of A's, B's (and hence C's) within each level of $z$. Consider only the corresponding permutations. Over those permutations, $\bar{z}_A$ is frozen; likewise for B, C. So the denominator of $\hat{Q}$ is frozen: without condition (11), the denominator must be computed from (A5). In the numerator, $\bar{z}_A, \bar{z}_B, \bar{z}_C$ are frozen, while $\bar{a}_A$ averages out to zero over the permutations of interest; so do $\bar{b}_B$ and $\bar{c}_C$. With a little more effort, one also sees that $\overline{(az)}_A$ averages out to zero, as do $\overline{(bz)}_B, \overline{(cz)}_C$. In consequence, $\hat{Q}\bar{z}_A$ has expectation 0, and likewise for B, C. Lemma A.2 completes the argument. □

Remarks. (i) What if $|f| = 0$ in (A2)–(A3)? Then $z$ is a linear combination of the treatment dummies $U, V, W$; the design matrix $(U\ V\ W\ z)$ is singular, and the multiple regression estimator is ill-defined. This is not a problem for Theorems 2 or 3, being a low-probability event. But it is a problem for Theorems 4 and 5. The easiest course is to assume the problem away, for instance, requiring that
$$z \text{ is linearly independent of the treatment dummies for every permutation of } \{1, 2, \ldots, n\}. \tag{A30}$$
Another solution is more interesting: exclude the permutations where $|f| = 0$, and show the multiple regression estimator is conditionally unbiased, that is, has the right average over the remaining permutations.

(ii) All that is needed for Theorems 2–4 is an a priori bound on absolute third moments in Condition #1, rather than fourth moments; third moments are used for the CLT by Höglund (1978).
The new awkwardness is in proving results like (A21), but this can be done by familiar truncation arguments. More explicitly, let $x_1, \ldots, x_n$ be real numbers, with
$$\frac{1}{n}\sum_{i=1}^n |x_i|^\alpha < L. \tag{A31}$$
Here, $1 < \alpha < \infty$ and $0 < L < \infty$. As will be seen below, $\alpha = 3/2$ is the relevant case. In principle, the $x$'s can be doubly subscripted; for instance, $x_1$ can change with $n$. We draw $m$ times at random without replacement from $\{x_1, \ldots, x_n\}$, generating random variables $X_1, \ldots, X_m$.

Proposition A.1. Under condition (A31), as $n \to \infty$, if $m/n$ converges to a positive limit that is less than 1, then $\frac{1}{m}(X_1 + \cdots + X_m) - E(X_i)$ converges in probability to 0.

Proof. Assume without loss of generality that $E(X_i) = 0$. Let $M$ be a positive number. Let $U_i = X_i$ when $|X_i| < M$; else, let $U_i = 0$. Let $V_i = X_i$ when $|X_i| \geq M$; else, let $V_i = 0$. Thus, $U_i + V_i = X_i$. Let $\mu = E(U_i)$, so $E(V_i) = -\mu$. Now $\frac{1}{m}(U_1 + \cdots + U_m) - \mu \to 0$. Convergence is almost sure, and rates can be given; see, for instance, Hoeffding (1963).

Consider next $\frac{1}{m}(W_1 + \cdots + W_m)$, where $W_i = V_i + \mu$. The $W_i$ are exchangeable. Fix $\beta$ with $1 < \beta < \alpha$. By Minkowski's inequality,
$$\Biggl[E\Biggl(\biggl|\frac{W_1 + \cdots + W_m}{m}\biggr|^\beta\Biggr)\Biggr]^{1/\beta} \leq [E(|W_i|^\beta)]^{1/\beta}. \tag{A32}$$
When $M$ is large, the right-hand side of (A32) is uniformly small, by a standard argument starting from (A31). In essence,
$$\int_{|X_i| > M} |X_i|^\beta < M^{\beta - \alpha}\int_{|X_i| > M} |X_i|^\alpha < L/M^{\alpha - \beta}. \qquad\Box$$

In proving Theorem 2, we needed $\overline{(az)}_A = O(1)$. If there is an a priori bound on the absolute third moments of $a$ and $z$, then (A31) will hold for $x_i = a_i z_i$ and $\alpha = 3/2$, by the Cauchy–Schwarz inequality. On the other hand, a bound on the second moments would suffice, by Chebyshev's inequality.
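Proposition A.1 concerns means of draws made without replacement. A related exact finite-sample fact, standard and not part of the paper's argument, is the finite-population-correction formula for the variance of such a mean; it can be verified by enumerating all subsets of a toy population (the numbers below are invented):

```python
from fractions import Fraction
from itertools import combinations

# Var(sample mean) = (S^2/m) * (1 - m/n) for m of n draws without replacement,
# where S^2 = sum (x_i - xbar)^2 / (n - 1). Toy population:
x = [Fraction(v) for v in (4, -2, 7, 1, 0, 3)]
n, m = 6, 2
xbar = sum(x) / n
S2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

means = [sum(s) / m for s in combinations(x, m)]   # all C(6, 2) = 15 samples
mu = sum(means) / len(means)
var = sum((t - mu) ** 2 for t in means) / len(means)

assert mu == xbar                                  # the sample mean is unbiased
assert var == (S2 / m) * (1 - Fraction(m, n))      # finite-population correction
```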
To get (A21) from third moments, we would, for instance, set $x_i = a_i^2$; again, $\alpha = 3/2$.

Acknowledgments. Donald Green generated a string of examples where the regression estimator was unbiased in finite samples; ad hoc explanations for the findings gradually evolved into Theorems 5 and 6. Sandrine Dudoit, Winston Lim, Michael Newton, Terry Speed and Peter Westfall made useful suggestions, as did an anonymous associate editor.

REFERENCES

Altman, D. G., Schulz, K. F., Moher, D. et al. (2001). The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Ann. Internal Medicine 134 663–694.

Chattopadhyay, R. and Duflo, E. (2004). Women as policy makers: Evidence from a randomized policy experiment in India. Econometrica 72 1409–1443. MR2077488

Cook, N. R., Cutler, J. A., Obarzanek, E. et al. (2007). Long term effects of dietary sodium reduction on cardiovascular disease outcomes: Observational followup of the trials of hypertension prevention. British Medical J. 334 885–892.

Cox, D. R. (1956). A note on weighted randomization. Ann. Math. Statist. 27 1144–1151. MR0083872

Freedman, D. A. (2008). Randomization does not justify logistic regression. Available at http://www.stat.berkeley.edu/users/census/neylogit.pdf.

Freedman, D. A. (2007). On regression adjustments to experimental data. Adv. in Appl. Math. Available at http://www.stat.berkeley.edu/users/census/neyregr.pdf.

Freedman, D. A. (2006). Statistical models for causation: What inferential leverage do they provide? Evaluation Review 30 691–713.

Freedman, D. A. (2005). Statistical Models: Theory and Practice. Cambridge Univ. Press, New York. MR2175838

Gertler, P. (2004). Do conditional cash transfers improve child health? Evidence from PROGRESA's control randomized experiment. American Economic Review 94 336–341.

Hill, A. B. (1961). Principles of Medical Statistics, 7th ed.
The Lancet, London.

Hoeffding, W. (1951). A combinatorial central limit theorem. Ann. Math. Statist. 22 558–566. MR0044058

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30. MR0144363

Höglund, T. (1978). Sampling from a finite population: A remainder term estimate. Scand. J. Statist. 5 69–71. MR0471130

Kempthorne, O. (1952). The Design and Analysis of Experiments. Wiley, New York. MR0045368

Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Roczniki Nauk Rolniczych 10 1–51. [In Polish. English translation by D. M. Dabrowska and T. P. Speed (1990). Statist. Sci. 5 465–480, with discussion.]

Rossouw, J. E., Anderson, G. L., Prentice, R. L. et al. (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women's Health Initiative randomized controlled trial. J. American Medical Association 288 321–333.

Victora, C. G., Habicht, J. P. and Bryce, J. (2004). Evidence-based public health: Moving beyond randomized trials. American J. Public Health 94 400–405.

Department of Statistics
University of California
Berkeley, California 94720-3860
USA
E-mail: freedman@stat.berkeley.edu
