Comment: Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data

Statistical Science 2007, Vol. 22, No. 4, 540–543
DOI: 10.1214/07-STS227C
Main article DOI: 10.1214/07-STS227
© Institute of Mathematical Statistics, 2007

Greg Ridgeway and Daniel F. McCaffrey

This article is an excellent introduction to doubly robust methods and we congratulate the authors for their thoroughness in bringing together the wide array of methods from different traditions that all share the property of being doubly robust.

Statisticians at RAND have been making extensive use of propensity score weighting in education (McCaffrey and Hamilton (2007)), policing and criminal justice (Ridgeway (2006)), drug treatment evaluation (Morral et al. (2006)), and military workforce issues (Harrell, Lim, Castaneda and Golinelli (2004)). More recently, we have been adopting doubly robust (DR) methods in these applications believing that we could achieve further bias and variance reduction. Initially, this article made us second-guess our decision. The apparently strong performance of OLS and the authors' finding that no method outperformed OLS ran counter to our intuition and experience with propensity score weighting and DR estimators. We posited two potential explanations for this. First, we suspected that the high variance reported by the authors when using propensity score weights could result from their use of standard logistic regression. Second, stronger interaction effects in the outcome regression model might favor the DR approach.

Greg Ridgeway is Senior Statistician and Associate Director of the Safety and Justice Program at the RAND Corporation, Santa Monica, California 90407-2138, USA e-mail: gregr@rand.org. Daniel F. McCaffrey is Senior Statistician and Head of the Statistics Group at the RAND Corporation, Pittsburgh, Pennsylvania 15213, USA e-mail: danielm@rand.org.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2007, Vol. 22, No. 4, 540–543. This reprint differs from the original in pagination and typographic detail.

1. METHODS

We felt the authors were somewhat narrow in their discussion of weighting by focusing only on propensity scores estimated by logistic regression in their simulation. The high variability in the weights reported by the authors could result from using this method. The authors state that none of the various IPW methods could overcome the problems with estimated propensity scores near 0 and 1, yet we believed that this is indicative of a problem with the propensity score estimator rather than IPW methods. In our experience weights estimated using a generalized boosted model (GBM) following the methods of McCaffrey, Ridgeway and Morral (2004) as implemented in the Toolkit for Weighting and Analysis of Nonequivalent Groups, the twang package for R, tend not to show the extreme behavior that resulted from logistic regression (Ridgeway, McCaffrey and Morral (2006)).

GBM is a general, automated, data-adaptive algorithm that can be used with a large number of covariates to fit a nonlinear surface and estimate propensity scores. GBM uses a linear combination of a large collection of piecewise constant basis functions to construct a regression model for dichotomous outcomes. Shrunken coefficients prevent the model from overfitting.
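
The boosting idea can be illustrated with a minimal sketch. It is written in Python rather than R, handles a single covariate, and uses plain gradient steps on the Bernoulli log-likelihood with one-split "stumps" as the piecewise constant basis functions. It is not the gbm or twang implementation; the function names (fit_stump, boost, predict) and the default values of n_terms and shrinkage are hypothetical choices made for illustration only.

```python
import math


def sigmoid(z):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(x, residual):
    """Least-squares fit of a one-split piecewise constant function to residuals."""
    best = None
    for cut in sorted(set(x))[:-1]:  # every observed value except the largest
        left = [r for xi, r in zip(x, residual) if xi <= cut]
        right = [r for xi, r in zip(x, residual) if xi > cut]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, cut, ml, mr)
    return best[1:]  # (cut, left value, right value)


def boost(x, t, n_terms=100, shrinkage=0.1):
    """Add n_terms shrunken stumps, each fit to the current residuals t - p."""
    p0 = sum(t) / len(t)
    intercept = math.log(p0 / (1.0 - p0))  # start from the marginal log-odds
    score = [intercept] * len(x)
    stumps = []
    for _ in range(n_terms):
        residual = [ti - sigmoid(s) for ti, s in zip(t, score)]
        cut, ml, mr = fit_stump(x, residual)
        # Shrinkage: only a fraction of each fitted stump enters the model.
        stumps.append((cut, shrinkage * ml, shrinkage * mr))
        score = [s + shrinkage * (ml if xi <= cut else mr)
                 for xi, s in zip(x, score)]
    return intercept, stumps


def predict(model, xi):
    """Estimated propensity score at covariate value xi."""
    intercept, stumps = model
    return sigmoid(intercept + sum(l if xi <= cut else r for cut, l, r in stumps))
```

With covariate values x and 0/1 response indicators t, calling boost(x, t) returns the fitted model and predict(model, xi) gives the estimated propensity score at xi. In this sketch the number of terms is fixed in advance rather than tuned.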
The use of piecewise constants has the effect of keeping the estimated propensity scores relatively flat at the edges of the range of the predictors, yet it still produces well-calibrated probability estimates. This reduces the risk of the spurious predicted probabilities near 0 and 1 that cause problems for propensity score weighting. Many variants of boosting have appeared in the machine learning and statistics literature and Hastie, Tibshirani and Friedman (2001) provide an overview. We optimized the number of terms in the GBM model to provide the best "balance" between the weighted covariate distributions f(x | t = 1) and f(x | t = 0). This approach to fitting propensity scores is fully implemented in the twang package.

We tested our conjectures about the performance of IPW and DR estimators based on GBM and in the presence of omitted interaction terms through a simulation experiment using the same design that the authors used. Using their model from Section 1.4, we generated 1000 datasets and calculated the population and nonresponse estimators as they did for Tables 1, 3, 5 and 6. In addition, for each dataset we also estimated propensity scores using GBM (the ps() function optimizes the number of basis functions in the model to minimize the largest of the marginal Kolmogorov–Smirnov statistics). While their simulations do not test this, Kang and Schafer noted that choices other than logistic regression may be preferable and offered robit regression as a possibility. We included robit(1) propensity score estimates in our simulations as well.

In addition to experimenting with other propensity score estimators, we also expanded the simulation to add an interaction term equal to 20 Z_1 Z_2 to the mean function for Y. R code for the simulation experiments is available upon request.

2. RESULTS

Table 1 shows the results for the IPW methods. The rows of the table correspond to the different estimators presented by Kang and Schafer. The row labeled "Model for Y" denotes whether the outcome OLS model used the Z variables or the "mis-transformed" X variables in the estimation or, in the case of the interaction experiments, whether the fitted model includes a Z_1 Z_2 term. The 12 IPW estimators in Table 1 vary by the propensity score model (logistic, GBM or robit regression), the use of Z or X as covariates in the propensity score model, and the use of either population weighting (IPW-POP) or nonresponder reweighting (IPW-NR). The elements of the table contain the ratio of the RMSE of the alternative estimators to the RMSE of OLS fit with the covariates listed in each column heading.

First note that IPW estimators with logistic regression using the X covariates have by far the largest RMSEs in the table. Second, while OLS seems to be preferable over IPW methods in the case where there is in truth no interaction, when the OLS models exclude an important interaction the IPW methods are preferable. When faced with the choice between OLS and IPW, the analyst must decide whether to hedge against an interaction and use IPW or choose OLS, hoping that the outcome model is specified correctly and consequently gaining a 60% improvement over GBM-based IPW or a 10% improvement over robit-based IPW.
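
For readers who want the two weighting schemes in concrete form, the following Python sketch computes IPW-POP and IPW-NR estimates of a population mean. It follows our reading of the definitions in Kang and Schafer's main article, which this comment does not restate, so the formulas should be treated as an assumption. The function names ipw_pop and ipw_nr, the argument names, and the convention of coding unobserved outcomes as None are hypothetical.

```python
def ipw_pop(y, t, pi):
    """Population weighting: respondents weighted by 1/pi, normalized by the weight sum."""
    num = sum(yi / p for yi, ti, p in zip(y, t, pi) if ti == 1)
    den = sum(1.0 / p for ti, p in zip(t, pi) if ti == 1)
    return num / den


def ipw_nr(y, t, pi):
    """Nonrespondent reweighting.

    The respondent mean is taken as observed. The nonrespondent mean is
    estimated from respondents weighted by the odds of nonresponse,
    (1 - pi) / pi. The two means are combined by their sample shares.
    """
    n = len(t)
    n1 = sum(t)
    y_resp = [yi for yi, ti in zip(y, t) if ti == 1]
    w = [(1.0 - p) / p for ti, p in zip(t, pi) if ti == 1]
    resp_mean = sum(y_resp) / n1
    nonresp_mean = sum(wi * yi for wi, yi in zip(w, y_resp)) / sum(w)
    return (n1 / n) * resp_mean + (1.0 - n1 / n) * nonresp_mean


# Toy data: three respondents (t = 1) and one nonrespondent (t = 0).
y = [1.0, 2.0, 3.0, None]
t = [1, 1, 1, 0]
pi = [0.5, 0.5, 0.25, 0.25]
print(ipw_pop(y, t, pi), ipw_nr(y, t, pi))
```

Either function depends on the estimated propensity scores only through 1/pi or (1 - pi)/pi, so a single respondent with an estimated score near 0 can dominate the estimate.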
Table 1
Simulation study results for IPW methods

Generated data:          K&S model                 K&S model with interaction
Model for Y:             Fit with Z   Fit with X   Fit with Z and   Fit with Z,       Fit with X
                                                   interactions     no interactions

OLS                      1.0 (1.16)   1.0 (1.64)   1.0 (1.35)       1.0 (3.58)        1.0 (5.00)
Logistic   Z   IPW-POP   1.4          1.0          2.0              0.7               0.5
               IPW-NR    1.3          0.9          1.9              0.7               0.5
           X   IPW-POP   9.9          7.0          9.7              3.6               2.6
               IPW-NR    6.0          4.3          5.9              2.2               1.6
GBM        Z   IPW-POP   1.9          1.3          2.2              0.8               0.6
               IPW-NR    1.5          1.0          2.1              0.8               0.6
           X   IPW-POP   2.6          1.9          3.1              1.2               0.8
               IPW-NR    2.2          1.6          2.7              1.0               0.7
Robit      Z   IPW-POP   1.4          1.0          1.7              0.6               0.4
               IPW-NR    1.3          0.9          2.4              0.9               0.6
           X   IPW-POP   1.6          1.1          2.8              1.0               0.7
               IPW-NR    1.6          1.1          2.9              1.1               0.8

The rows define the model used for the propensity score weights and the columns define the variables used in the outcome regression. The cells show the ratio of the RMSE of the estimator to the RMSE of the OLS model that used the covariates listed in the column title. The actual RMSE of the OLS model is shown in parentheses.

Table 2
Simulation study results for DR methods

Generated data:          K&S model                 K&S model with interaction
Model for Y:             Fit with Z   Fit with X   Fit with Z and   Fit with Z,       Fit with X
                                                   interactions     no interactions

OLS                      1.0 (1.16)   1.0 (1.64)   1.0 (1.35)       1.0 (3.58)        1.0 (5.00)
Logistic   Z   BC        1.0          1.0          1.0              0.6               0.4
               WLS       1.0          0.8          1.0              0.5               0.4
           X   BC        1.0          51.3         2.6              85.8              139.2
               WLS       1.0          2.0          1.0              1.1               1.2
GBM        X   BC        1.0          0.9          1.0              0.6*              0.6
               WLS       1.0          0.9          1.0              0.5*              0.6
Robit      X   BC        1.0          1.9          1.0              0.5*              1.2
               WLS       1.0          1.5          1.0              0.5*              1.0

* These estimators use Z in the propensity score model.

The rows define the model used for the propensity score weights and the columns define the variables used in the outcome regression. The cells show the ratio of the RMSE of the estimator to the RMSE of the OLS model that used the covariates listed in the column title. All GBM and robit models were fit using X with the exception of the "Fit with Z, no interactions" column for which they were fit with Z.
The actual RMSE of the OLS model is shown in parentheses.

The aim of DR estimators is to avoid this dilemma and the associated hedging by combining the benefits of both the outcome and selection models. Kang and Schafer's results suggest that current DR estimators can disappoint us. They show DR estimators having twice the RMSE of OLS estimators when both the outcome and selection models use the X covariates. We investigated this using the same propensity score models described previously and the bias corrected (BC) and weighted least squares (WLS) estimators described by Kang and Schafer. Table 2 compares the relative efficiency of DR estimators in terms of the RMSE of the DR estimators compared to OLS. The most interesting comparisons are those for which both the propensity score model and the outcome regression model use X. Other combinations, such as the propensity score fit with X and the outcome regression fit with Z, are not realistic but are included for completeness.

The results clearly show that WLS with GBM dominates OLS. When the model for Y is correct, WLS is essentially as efficient as the OLS estimator. When the model for Y is incorrect, WLS with GBM can be significantly more efficient than OLS. GBM also outperforms the robit regression model that the authors suggested as an option. These results suggest that DR estimators might be reliable methods of buying insurance against model misspecification without paying a high price in lost efficiency.

3. SUMMARY

In the simulation the doubly robust estimators are particularly useful when the model is missing an important interaction between pretreatment variables. Exploratory data analysis could be used to find such missing terms in the model and hence the advantages of WLS might appear overstated.
However, such exploratory analyses require modeling the outcome and present the opportunity for the model selection to be corrupted by the impact of alternative models on the estimated treatment effect. That is, the model might be chosen because it yields significant treatment effects. This type of model fitting removes one of the benefits of the propensity score approach, which is the ability to control for pretreatment variables prior to seeing the outcome to avoid the temptation or even the appearance of data snooping.

Doubly robust estimators with GBM appear to have the desired properties in this simulation study. When the model for the mean is correct, there is no cost for using the doubly robust estimator (bias corrected or WLS). They are essentially as efficient as the correctly specified OLS model. When the OLS model is incorrect, again the doubly robust estimators are at least as efficient as OLS and substantially more efficient when the OLS model is missing important interaction terms. While it is clear that more work on these estimators is needed, our results do suggest that doubly robust estimation should not be dismissed too quickly.

REFERENCES

Harrell, M. C., Lim, N., Castaneda, L. and Golinelli, D. (2004). Working around the military: Challenges to military spouse employment and education. RAND Monograph MG-196-OSD, Santa Monica, CA.

Hastie, T., Tibshirani, R. and Friedman, J. H. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York. MR1851606

McCaffrey, D. F. and Hamilton, L. S. (2007). Value-added assessment in practice: Lessons from the Pennsylvania value-added assessment system pilot project. RAND Technical Report TR-506, Santa Monica, CA.

McCaffrey, D. F., Ridgeway, G. and Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol. Methods 9 403–425.

Morral, A. R., McCaffrey, D. F., Ridgeway, G., Mukherji, A. and Beighley, C. (2006). The relative effectiveness of 10 adolescent substance abuse treatment programs in the United States. RAND Technical Report TR-346, Santa Monica, CA.

Ridgeway, G. (2006). Assessing the effect of race bias in post-traffic stop outcomes using propensity scores. J. Quant. Criminol. 22 1–29.

Ridgeway, G., McCaffrey, D. F. and Morral, A. R. (2006). Twang: Toolkit for weighting and analysis of nonequivalent groups. (Software and reference manual.) Available at http://cran.r-project.org/src/contrib/Descriptions/twang.html.
