Comment: Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data

Comment on ``Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data'' [arXiv:0804.2958]

Authors: Anastasios A. Tsiatis, Marie Davidian

Statistical Science 2007, Vol. 22, No. 4, 569–573
DOI: 10.1214/07-STS227B
Main article DOI: 10.1214/07-STS227
© Institute of Mathematical Statistics, 2007

Anastasios A. Tsiatis is Drexel Professor of Statistics at North Carolina State University, Raleigh, North Carolina 27695-8203, USA; e-mail: tsiatis@stat.ncsu.edu. Marie Davidian is William Neal Reynolds Professor of Statistics at North Carolina State University, Raleigh, North Carolina 27695-8203, USA; e-mail: davidian@stat.ncsu.edu.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2007, Vol. 22, No. 4, 569–573. This reprint differs from the original in pagination and typographic detail.

INTRODUCTION

We congratulate Drs. Kang and Schafer (KS henceforth) for a careful and thought-provoking contribution to the literature regarding the so-called “double robustness” property, a topic that still engenders some confusion and disagreement. The authors’ approach of focusing on the simplest situation of estimation of the population mean $\mu$ of a response $y$ when $y$ is not observed on all subjects according to a missing at random (MAR) mechanism (equivalently, estimation of the mean of a potential outcome in a causal model under the assumption of no unmeasured confounders) is commendable, as the fundamental issues can be explored without the distractions of the messier notation and considerations required in more complicated settings. Indeed, as the article demonstrates, this simple setting is sufficient to highlight a number of key points.

As noted eloquently by Molenberghs (2005), in regard to how such missing data/causal inference problems are best addressed, two “schools” may be identified: the “likelihood-oriented” school and the “weighting-based” school. As we have emphasized previously (Davidian, Tsiatis and Leon, 2005), we prefer to view inference from the vantage point of semiparametric theory, focusing on the assumptions embedded in the statistical models leading to different “types” of estimators (i.e., “likelihood-oriented” or “weighting-based”) rather than on the forms of the estimators themselves. In this discussion, we hope to complement the presentation of the authors by elaborating on this point of view. Throughout, we use the same notation as in the paper.

SEMIPARAMETRIC THEORY PERSPECTIVE

As demonstrated by Robins, Rotnitzky and Zhao (1994) and Tsiatis (2006), exploiting the relationship between so-called influence functions and estimators is a fruitful approach to studying and contrasting the (large-sample) properties of estimators for parameters of interest in a statistical model. We remind the reader that a statistical model is a class of densities that could have generated the observed data. Our presentation here is for scalar parameters such as $\mu$, but generalizes readily to vector-valued parameters. If one restricts attention to estimators that are regular (i.e., not “pathological”; see Davidian, Tsiatis and Leon, 2005, page 263 and Tsiatis, 2006, pages 26–27), then, for a parameter $\mu$ in a parametric or semiparametric statistical model, an estimator $\hat\mu$ for $\mu$ based on independent and identically distributed observed data $z_i$, $i = 1, \ldots, n$, is said to be asymptotically linear if it satisfies

$$
n^{1/2}(\hat\mu - \mu_0) = n^{-1/2} \sum_{i=1}^{n} \varphi(z_i) + o_p(1) \tag{1}
$$

for $\varphi(z)$ with $E\{\varphi(z)\} = 0$ and $E\{\varphi^2(z)\} < \infty$, where $\mu_0$ is the true value of $\mu$ generating the data, and expectation is with respect to the true distribution of $z$. The function $\varphi(z)$ is the influence function of the estimator $\hat\mu$. A regular, asymptotically linear estimator with influence function $\varphi(z)$ is consistent and asymptotically normal with asymptotic variance $E\{\varphi^2(z)\}$. Thus, there is an inextricable connection between estimators and influence functions in that the asymptotic behavior of an estimator is fully determined by its influence function, so that it suffices to focus on the influence function when discussing an estimator’s properties. Many of the estimators discussed by KS are regular and asymptotically linear; in the sequel, we refer to regular and asymptotically linear estimators as simply “estimators.”

We capitalize on this connection by considering the problem of estimating $\mu$ in the setting in KS in terms of statistical models that may be assumed for the observed data, from which influence functions corresponding to estimators valid under the assumed models may be derived. In the situation studied by KS, the “full” data that would ideally be observed are $(t, x, y)$; however, as $y$ is unobserved for some subjects, the observed data available for analysis are $z = (t, x, ty)$. As noted by KS, the MAR assumption states that $y$ and $t$ are conditionally independent given $x$; for example, $P(t = 1 \mid y, x) = P(t = 1 \mid x)$.
Under this assumption, all joint densities for the observed data have the form

$$
p(z) = p(y \mid x)^{I(t=1)} \, p(t \mid x) \, p(x), \tag{2}
$$

where $p(y \mid x)$ is the density of $y$ given $x$, $p(t \mid x)$ is the density of $t$ given $x$, and $p(x)$ is the marginal density of $x$. Let $p_0(z)$ be the density in the class of densities of form (2) generating the observed data (the true joint density).

One may posit different statistical models by making different assumptions on the components of (2). We focus on three such models:

I. Make no assumptions on the forms of $p(x)$ or $p(t \mid x)$, leaving these entirely unspecified. Make a specific assumption on $p(y \mid x)$, namely, that $E(y \mid x) = m(x, \beta)$ for some given function $m(x, \beta)$ depending on parameters $\beta$ $(p \times 1)$. Denote the class of densities satisfying these assumptions as $\mathcal{M}_{\mathrm{I}}$.

II. Make no assumptions on the forms of $p(x)$ or $p(y \mid x)$. Make a specific assumption on $p(t \mid x)$ that $P(t = 1 \mid x) = E(t \mid x) = \pi(x, \alpha)$ for some given function $\pi(x, \alpha)$ depending on parameters $\alpha$ $(s \times 1)$. Here, we also require the assumption that $P(t = 1 \mid x) \ge \varepsilon > 0$ for all $x$ and some $\varepsilon$. Denote the class of densities satisfying these assumptions as $\mathcal{M}_{\mathrm{II}}$.

III. Make no assumptions on the form of $p(x)$, but make specific assumptions on $p(y \mid x)$ and $p(t \mid x)$, namely, that $E(y \mid x) = m(x, \beta)$ and $P(t = 1 \mid x) = E(t \mid x) = \pi(x, \alpha) \ge \varepsilon > 0$ for all $x$ and some $\varepsilon$ for given functions $m(x, \beta)$ and $\pi(x, \alpha)$ depending on parameters $\beta$ and $\alpha$. The class of densities satisfying these assumptions is $\mathcal{M}_{\mathrm{I}} \cap \mathcal{M}_{\mathrm{II}}$.

All of I–III are semiparametric statistical models in that some aspects of $p(z)$ are left unspecified. Denote by $m_0(x)$ the true function $E(y \mid x)$ and by $\pi_0(x)$ the true function $P(t = 1 \mid x) = E(t \mid x)$ corresponding to the true density $p_0(z)$.
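To make the observed-data structure $z = (t, x, ty)$ concrete, the following is a minimal simulation sketch. The function name `simulate_mar`, the linear form chosen for $m_0(x)$, the logistic form chosen for $\pi_0(x)$ and the bound $\varepsilon = 0.05$ are all illustrative assumptions that do not come from KS or from this discussion. The sketch only shows that, under MAR, the complete-case mean can be biased while weighting by the true $\pi_0(x)$ recovers $\mu$.

```python
import numpy as np

def simulate_mar(n, rng):
    """Draw observed data (t, x, t*y) under MAR: t depends on x only, never on y."""
    x = rng.normal(size=n)
    m0 = 1.0 + 2.0 * x                        # hypothetical true E(y | x), so mu = 1
    pi0 = 1.0 / (1.0 + np.exp(-(0.5 + x)))    # hypothetical true P(t = 1 | x)
    pi0 = np.clip(pi0, 0.05, 1.0)             # enforce pi0(x) >= eps > 0 (eps assumed)
    y = m0 + rng.normal(size=n)
    t = rng.binomial(1, pi0)                  # response indicator drawn from pi0
    return t, x, t * y, pi0

rng = np.random.default_rng(0)
t, x, ty, pi0 = simulate_mar(200_000, rng)

# Mean of y among respondents only; biased here because respondents have larger x.
complete_case_mean = ty.sum() / t.sum()

# Inverse-probability weighting with the true pi0; targets mu = 1.
ipw_mean = np.mean(ty / pi0)

print(complete_case_mean, ipw_mean)
```

In this toy setup the respondents over-represent large $x$, which is why the complete-case mean drifts above $\mu$ while the weighted mean does not.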
Semiparametric theory yields the form of all influence functions corresponding to estimators for $\mu$ under each of the statistical models I–III. As discussed in Tsiatis (2006, page 52), loosely speaking, a consistent and asymptotically normal estimator for $\mu$ in a statistical model has the property that, for all $p(z)$ in the class of densities defined by the model, $n^{1/2}(\hat\mu - \mu) \xrightarrow{D(p)} N\{0, \sigma^2(p)\}$, where $\xrightarrow{D(p)}$ means convergence in distribution under the density $p(z)$, and $\sigma^2(p)$ is the asymptotic variance of $\hat\mu$ under $p(z)$.

If model I is correct, then $m_0(x) = m(x, \beta)$ for some $\beta$, and it may be shown (e.g., Tsiatis, 2006, Section 4.5) that all estimators for $\mu$ have influence functions of the form

$$
m_0(x) - \mu + t \, a(x) \{ y - m_0(x) \} \tag{3}
$$

for arbitrary functions $a(x)$ of $x$. If model II is correct, then $\pi_0(x) = \pi(x, \alpha)$ for some $\alpha$, and all estimators for $\mu$ have influence functions of the form

$$
\frac{ty}{\pi_0(x)} + \frac{t - \pi_0(x)}{\pi_0(x)} \, h(x) - \mu \tag{4}
$$

for arbitrary $h(x)$, which is well known from Robins, Rotnitzky and Zhao (1994). If model III is correct, then $m_0(x) = m(x, \beta)$ and $\pi_0(x) = \pi(x, \alpha)$ for some $\beta$ and $\alpha$, and influence functions for estimators $\hat\mu$ have the form

$$
m_0(x) - \mu + t \, a(x) \{ y - m_0(x) \} + \frac{t - \pi_0(x)}{\pi_0(x)} \, h(x) \tag{5}
$$

for arbitrary $a(x)$ and $h(x)$. Depending on forms of $m(x, \beta)$ as a function of $\beta$ and $\pi(x, \alpha)$ as a function of $\alpha$, there will be restrictions on the forms of $a(x)$ and $h(x)$; see below.

We now consider estimators discussed by KS from the perspective of influence functions. The regression estimator $\hat\mu_{\text{OLS}}$ in (7) of KS comes about naturally if one assumes model I is correct.
In terms of influence functions, $\hat\mu_{\text{OLS}}$ may be motivated by considering the influence function (3) with $a(x) = 0$, as this leads to the estimator $n^{-1} \sum_{i=1}^{n} m(x_i, \beta)$. In fact, although KS do not discuss it, the “imputation estimator” $\hat\mu_{\text{IMP}} = n^{-1} \sum_{i=1}^{n} \{ t_i y_i + (1 - t_i) m(x_i, \beta) \}$ may be motivated by taking $a(x) = 1$ in (3). Of course, in practice, $\beta$ must be estimated. In general, (3) implies that all estimators for $\mu$ that are consistent and asymptotically normal if model I is correct must be asymptotically equivalent to an estimator of the form

$$
n^{-1} \sum_{i=1}^{n} \left[ m(x_i, \hat\beta) + t_i \, \tilde a(x_i) \{ y_i - m(x_i, \hat\beta) \} \right], \tag{6}
$$

where $\beta$ is estimated by solving an estimating equation $\sum_{i=1}^{n} t_i A(x_i, \beta) \{ y_i - m(x_i, \beta) \} = 0$ for $A(x, \beta)$ $(p \times 1)$. Because $\beta$ is estimated, the influence function of the estimator (6) with a particular $\tilde a(x)$ will not be exactly equal to (3) with $a(x) = \tilde a(x)$; instead, it may be shown that the influence function of (6) is of form (3) with $a(x)$ in (3) equal to

$$
\tilde a(x) - E\left[ \{ \pi_0(x) \tilde a(x) - 1 \} \, m_\beta^T(x, \beta_0) \right] \left[ E\{ \pi_0(x) A(x, \beta_0) \, m_\beta^T(x, \beta_0) \} \right]^{-1} A(x, \beta_0), \tag{7}
$$

where $m_\beta(x, \beta)$ is the vector of partial derivatives of elements of $m(x, \beta)$ with respect to $\beta$, and $\beta_0$ is such that $m_0(x) = m(x, \beta_0)$.

The IPW estimator $\hat\mu_{\text{IPW-POP}}$ in (3) of KS and its variants arise if one assumes model II. In particular, $\hat\mu_{\text{IPW-POP}}$ can be motivated via the influence function (4) with $h(x) = -\mu$. The estimator $\hat\mu_{\text{IPW-NR}}$ in (4) of KS follows from (4) with $h(x) = -E[y\{1 - \pi(x)\}] / E[\{1 - \pi(x)\}]$.
In fact, if one restricts $h(x)$ in (4) to be a constant, then, using the fact that the expectation of the square of (4) is the asymptotic variance of the estimator, one may find the “best” such constant minimizing the variance as $h(x) = -E[y\{1 - \pi(x)\}/\pi(x)] / E[\{1 - \pi(x)\}/\pi(x)]$. An estimator based on this idea was given in (10) of Lunceford and Davidian (2004, page 2943). In general, as for model I, (4) implies that all estimators for $\mu$ that are consistent and asymptotically normal if model II is correct must be asymptotically equivalent to an estimator of the form

$$
n^{-1} \sum_{i=1}^{n} \left[ \frac{t_i y_i}{\pi(x_i, \hat\alpha)} + \frac{t_i - \pi(x_i, \hat\alpha)}{\pi(x_i, \hat\alpha)} \, \tilde h(x_i) \right], \tag{8}
$$

where $\hat\alpha$ is estimated by solving an equation of the form $\sum_{i=1}^{n} \{ t_i - \pi(x_i, \alpha) \} B(x_i, \alpha) = 0$ for some $(s \times 1)$ $B(x_i, \alpha)$, almost always maximum likelihood for binary regression. As above, because $\alpha$ is estimated, the influence function of (8) is equal to (4) with $h(x)$ equal to

$$
\tilde h(x) - E\left[ \pi_\alpha^T(x, \alpha_0) \{ m_0(x) + \tilde h(x) \} / \pi_0(x) \right] \left[ E\{ B(x, \alpha_0) \, \pi_\alpha^T(x, \alpha_0) \} \right]^{-1} B(x, \alpha_0) \, \pi_0(x), \tag{9}
$$

where $\pi_\alpha(x, \alpha)$ is the vector of partial derivatives of elements of $\pi(x, \alpha)$ with respect to $\alpha$, and $\alpha_0$ satisfies $\pi_0(x) = \pi(x, \alpha_0)$.

Doubly robust (DR) estimators are estimators that are consistent and asymptotically normal for models in $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$, that is, under the assumptions of model I or model II. When the true density $p_0(z) \in \mathcal{M}_{\mathrm{I}} \cap \mathcal{M}_{\mathrm{II}}$, then the influence function of any such DR estimator must be equal to (3) with $a(x) = 1/\pi_0(x)$ or, equivalently, equal to (4) with $h(x) = -m_0(x)$. Accordingly, when $p_0(z) \in \mathcal{M}_{\mathrm{I}} \cap \mathcal{M}_{\mathrm{II}}$, that is, both models have been specified correctly, all such DR estimators will have the same asymptotic variance.
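The equivalence of the two representations of the DR influence function can be checked directly. As a brief worked step, using only the symbols already defined in (3) and (4), setting $a(x) = 1/\pi_0(x)$ in (3) and regrouping terms gives (4) with $h(x) = -m_0(x)$:

$$
\begin{aligned}
m_0(x) - \mu + \frac{t}{\pi_0(x)} \{ y - m_0(x) \}
&= \frac{ty}{\pi_0(x)} - \frac{t}{\pi_0(x)} \, m_0(x) + m_0(x) - \mu \\
&= \frac{ty}{\pi_0(x)} - \frac{t - \pi_0(x)}{\pi_0(x)} \, m_0(x) - \mu .
\end{aligned}
$$

The last line is (4) evaluated at $h(x) = -m_0(x)$, so the two descriptions single out the same influence function.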
The shared influence function also implies that, if both models are correctly specified, the asymptotic properties of the estimator do not depend on the methods used to estimate $\beta$ and $\alpha$.

KS discuss strategies for constructing DR estimators, and they present several specific examples: $\hat\mu_{\text{BC-OLS}}$ in their equation (8); the estimators below (8) using POP or NR weights, which we denote as $\hat\mu_{\text{BC-POP}}$ and $\hat\mu_{\text{BC-NR}}$, respectively; the estimator $\hat\mu_{\text{WLS}}$ in their equation (10); $\hat\mu_{\pi\text{-cov}}$ in their equation (12); and a version of $\hat\mu_{\pi\text{-cov}}$ equal to the estimator proposed by Scharfstein, Rotnitzky and Robins (1999) and Bang and Robins (2005), which we denote as $\hat\mu_{\text{SRR}}$. The results for these estimators under the “Correct-Correct” scenarios ($\mathcal{M}_{\mathrm{I}} \cap \mathcal{M}_{\mathrm{II}}$) in Tables 5–8 of KS are consistent with the asymptotic properties above. We note that $\hat\mu_{\pi\text{-cov}}$ is not DR under $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$ because of the additional assumption that the mean of $y$ given $\pi$ must be equal to a linear combination of basis functions in $\pi$. Making this additional assumption may not be unreasonable in practice; however, strictly speaking, it takes $\hat\mu_{\pi\text{-cov}}$ outside the class of DR estimators discussed here, and hence we do not consider it in the remainder of this section. However, $\hat\mu_{\text{SRR}}$ is still in this class.

KS suggest that a characteristic distinguishing the performance of DR estimators is whether or not the estimator is within or outside the augmented inverse-probability weighted (AIPW) class. We find this distinction artificial, as all of the above estimators $\hat\mu_{\text{BC-OLS}}$, $\hat\mu_{\text{BC-POP}}$, $\hat\mu_{\text{BC-NR}}$, $\hat\mu_{\text{WLS}}$ and $\hat\mu_{\text{SRR}}$ can be expressed in an AIPW form.
Namely, all of these estimators are algebraically exactly of the form (8) with $\tilde h(x_i)$ replaced by a term $-\hat\gamma - m(x_i, \hat\beta)$, where $\hat\gamma_{\text{BC-OLS}} = \hat\gamma_{\text{WLS}} = \hat\gamma_{\text{SRR}} = 0$,

$$
\hat\gamma_{\text{BC-POP}} = \frac{n^{-1} \sum_{i=1}^{n} (t_i / \hat\pi_i)(y_i - \hat m_i)}{n^{-1} \sum_{i=1}^{n} t_i / \hat\pi_i}
\quad \text{and} \quad
\hat\gamma_{\text{BC-NR}} = \frac{n^{-1} \sum_{i=1}^{n} \{ t_i (1 - \hat\pi_i) / \hat\pi_i \} (y_i - \hat m_i)}{n^{-1} \sum_{i=1}^{n} t_i (1 - \hat\pi_i) / \hat\pi_i}, \tag{10}
$$

where we write $\hat\pi_i = \pi(x_i, \hat\alpha)$ and $\hat m_i = m(x_i, \hat\beta)$ for brevity. For $\hat\mu_{\text{WLS}}$ and $\hat\mu_{\text{SRR}}$, this identity follows from the fact that $\sum_{i=1}^{n} t_i \hat\pi_i^{-1} (y_i - \hat m_i) = 0$, which for $\hat\mu_{\text{WLS}}$ holds because KS restrict to $m(x, \beta) = x^T \beta$, with $x$ including a constant term. Thus, we contend that issues of performance under $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$ are not linked to whether or not a DR estimator is AIPW, but, rather, are a consequence of forms of the influence functions of estimators under $\mathcal{M}_{\mathrm{I}}$ or $\mathcal{M}_{\mathrm{II}}$.

In particular, under model II, it follows that the above estimators have influence functions of the form (4) with $h(x)$ equal to (9) with $\tilde h(x) = -\{ \gamma^* + m(x, \beta^*) \}$, where $\gamma^*$ and $\beta^*$ are the limits in probability of $\hat\gamma$ and $\hat\beta$, respectively. Thus, features determining performance of these estimators when model II is correct are how close $\gamma^* + m(x, \beta^*)$ is to $m_0(x)$ and how $\alpha$ is estimated, where maximum likelihood is the optimal choice. In fact, this perspective reveals that, for fixed $m(x, \beta)$, using ideas similar to those in Tan (2006), the optimal choice of $\hat\gamma$ is as in $\hat\gamma_{\text{BC-NR}}$ with $t_i (1 - \hat\pi_i) / \hat\pi_i$ replaced by $t_i (1 - \hat\pi_i) / \hat\pi_i^2$.
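The AIPW representation in (8) and (10) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the simulated data, the linear working model for $m(x, \beta)$ fitted by least squares on respondents, the logistic working model for $\pi(x, \alpha)$, and the helper names `fit_logistic` and `aipw_form` are all hypothetical and are not taken from KS. It evaluates form (8) with $\tilde h(x_i) = -\hat\gamma - \hat m_i$ for the three choices of $\hat\gamma$ in (10).

```python
import numpy as np

def fit_logistic(X, t, iters=25):
    """Maximum likelihood for a logistic pi(x, alpha) via Newton-Raphson (illustrative)."""
    alpha = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ alpha))
        grad = X.T @ (t - p)                          # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # observed information
        alpha += np.linalg.solve(hess, grad)
    return alpha

def aipw_form(t, y, pi_hat, m_hat, gamma_hat):
    """Form (8) with h~(x_i) = -gamma_hat - m_hat_i."""
    h = -gamma_hat - m_hat
    return np.mean(t * y / pi_hat + (t - pi_hat) / pi_hat * h)

# Hypothetical data; y enters every formula only through terms multiplied by t.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + x))))
y = 1.0 + 2.0 * x + rng.normal(size=n)   # true mean mu = 1 in this toy setup

# Working models: least squares on respondents, logistic ML for the response indicator.
beta_hat = np.linalg.lstsq(X[t == 1], y[t == 1], rcond=None)[0]
m_hat = X @ beta_hat
pi_hat = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))

resid = y - m_hat
w_pop = t / pi_hat
w_nr = t * (1.0 - pi_hat) / pi_hat
gamma_pop = np.sum(w_pop * resid) / np.sum(w_pop)   # first expression in (10)
gamma_nr = np.sum(w_nr * resid) / np.sum(w_nr)      # second expression in (10)

mu_bc_ols = aipw_form(t, y, pi_hat, m_hat, 0.0)
mu_bc_pop = aipw_form(t, y, pi_hat, m_hat, gamma_pop)
mu_bc_nr = aipw_form(t, y, pi_hat, m_hat, gamma_nr)
print(mu_bc_ols, mu_bc_pop, mu_bc_nr)
```

With $\hat\gamma = 0$ the expression reduces to the familiar bias-corrected form $n^{-1} \sum_i \{ \hat m_i + t_i (y_i - \hat m_i)/\hat\pi_i \}$, and with $\hat\gamma_{\text{BC-POP}}$ it reduces algebraically to $n^{-1} \sum_i \hat m_i + \hat\gamma_{\text{BC-POP}}$, which is a quick way to sanity-check an implementation.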
Under model I, the influence functions of these estimators are analogously of the form (3) with $a(x)$ equal to (7) with $\tilde a(x) = \psi_1 / \pi(x, \alpha^*) + \psi_2$, where $\alpha^*$ is the limit in probability of $\hat\alpha$ and $\psi_1 = 1$ and $\psi_2 = 0$ for $\hat\mu_{\text{BC-OLS}}$, $\hat\mu_{\text{WLS}}$ and $\hat\mu_{\text{SRR}}$; $\psi_1 = 1 / E\{ \pi_0(x) / \pi(x, \alpha^*) \}$ and $\psi_2 = 0$ for $\hat\mu_{\text{BC-POP}}$; and $\psi_1$ and $\psi_2$ for $\hat\mu_{\text{BC-NR}}$ are more complicated expectations involving $\pi_0(x)$ and $\pi(x, \alpha^*)$. Thus, under model I, features determining performance of these estimators are the form of $\tilde a(x)$ and how $\beta$ is estimated through the choice of $A(x, \beta)$.

We may interpret some of the results in Tables 5, 6 and 8 of KS in light of these observations. Under the “$\pi$-model Correct–$y$-model Incorrect” scenario ($\mathcal{M}_{\mathrm{II}} \cap \mathcal{M}_{\mathrm{I}}^c$), $\hat\mu_{\text{BC-OLS}}$, $\hat\mu_{\text{WLS}}$ and $\hat\mu_{\text{SRR}}$ show some nontrivial differences in performance, which, from above, are likely attributable to differences in $m(x, \beta^*)$. Under the “$\pi$-model Incorrect–$y$-model Correct” scenario ($\mathcal{M}_{\mathrm{I}} \cap \mathcal{M}_{\mathrm{II}}^c$), all three estimators share the same $\tilde a(x)$ but use different methods to estimate $\beta$, so that any differences are dictated entirely by the choice of $A(x, \beta)$. The poor performance of $\hat\mu_{\text{SRR}}$ can be understood from this perspective: “$\beta$” for this estimator is actually $\beta$ in the model $m(x, \beta)$ used by the other two estimators concatenated by an additional element, the coefficient of $\hat\pi_i^{-1}$. The $A(x, \beta)$ for $\hat\mu_{\text{SRR}}$ thus involves a design matrix that is unstable for small $\hat\pi_i$, consistent with the comment of KS at the end of their Section 3.

In summary, we believe that studying the performance of estimators via their influence functions can provide useful insights. Our preceding remarks refer to large-sample performance, which depends directly on the influence function.
Estimators with the same influence function can exhibit different finite-sample properties. It may be possible via higher-order expansions to gain an understanding of some of this behavior; to the best of our knowledge, this is an open question.

BOTH MODELS INCORRECT

The developments in the previous section are relevant in $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$. Key themes of KS are performance of DR and other estimators outside this class, that is, when both the models $\pi(x, \alpha)$ and $m(x, \beta)$ are incorrectly specified, and choice of estimator under these circumstances.

One way to study performance in this situation is through simulation. KS have devised a very interesting and instructive specific simulation scenario that highlights some important features of various estimators. In particular, the KS scenario emphasizes the difficulties encountered with some of the DR estimators when $\pi(x_i, \hat\alpha)$ is small for some $x_i$. Indeed, in our experience, poor performance of DR and IPW estimators in practice can result from a few small $\pi(x_i, \hat\alpha)$. When there are small $\pi(x_i, \hat\alpha)$, as noted by KS, responses are not observed for some portion of the $x$ space. Consequently, estimators like $\hat\mu_{\text{OLS}}$ rely on extrapolation into that part of the $x$ space. KS have constructed a scenario where failure to observe $y$ in a portion of the $x$ space can wreak havoc on some estimators that make use of the $\pi(x_i, \hat\alpha)$ but has minimal impact on the quality of extrapolations for these $x$ based on $m(x, \hat\beta)$. One could equally well build a scenario where the $x$ for which $y$ is unobserved are highly influential for the regression $m(x, \beta)$ and hence could result in deleterious performance of $\hat\mu_{\text{OLS}}$. We thus reiterate the remark of KS that, although simulations can be illuminating, they cannot yield broadly applicable conclusions.
Given this, we offer some thoughts on other strategies for deriving estimators that may have some robustness properties under the foregoing conditions, that is, offer good performance outside $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$. One approach may be to search outside the class of DR estimators valid under $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$. For example, as suggested by the simulations of KS, estimators in the spirit of $\hat\mu_{\pi\text{-cov}}$, which impose additional assumptions rendering them DR in the strict sense only in a subset of $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$, may compensate for this restriction by yielding more robust performance outside $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$; further study along these lines would be interesting.

An alternative tactic for searching outside $\mathcal{M}_{\mathrm{I}} \cup \mathcal{M}_{\mathrm{II}}$ may be to consider the form of influence functions (5) for estimators valid under $\mathcal{M}_{\mathrm{I}} \cap \mathcal{M}_{\mathrm{II}}$. For instance, a “hybrid” estimator of the form

$$
n^{-1} \sum_{i=1}^{n} \left( m(x_i, \hat\beta) \, I\{ \pi(x_i, \hat\alpha) < \delta \} + \left[ \frac{t_i y_i}{\pi(x_i, \hat\alpha)} + \frac{t_i - \pi(x_i, \hat\alpha)}{\pi(x_i, \hat\alpha)} \, \tilde h(x_i) \right] I\{ \pi(x_i, \hat\alpha) \ge \delta \} \right),
$$

for $\delta$ small, may take advantage of the desirable properties of both $\hat\mu_{\text{OLS}}$ and DR estimators.

A second possible strategy for identifying robust estimators arises from the following observation. Consider the estimator

$$
n^{-1} \sum_{i=1}^{n} \left[ \frac{t_i y_i}{\pi(x_i)} - \frac{t_i - \pi(x_i)}{\pi(x_i)} \, m(x_i, \hat\beta) \right]. \tag{11}
$$

If $\pi(x_i) = \pi(x_i, \hat\alpha)$, then (11) yields one form of a DR estimator. If $\pi(x_i) \equiv 1$, then (11) results in the imputation estimator. If $\pi(x_i) = \infty$, (11) reduces to $\hat\mu_{\text{OLS}}$. This suggests that it may be possible to develop estimators based on alternative choices of $\pi(x_i)$ that may have good robustness properties. For example, a method for obtaining estimators $\pi(x_i, \hat\alpha)$ that shrinks these toward a common value may prove fruitful. The suggestion of KS to move away from logistic regression models for $\pi(x_i, \alpha)$ is in a similar spirit.
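The three special cases of (11) and the hybrid idea can be tried on a small example. The following sketch uses made-up numbers; the function names `estimator_11` and `hybrid`, the toy arrays and the threshold `delta = 0.05` are assumptions introduced purely for illustration. Because $\pi(x_i) = \infty$ cannot be entered literally, a very large constant stands in for it.

```python
import numpy as np

def estimator_11(t, y, m_hat, pi):
    """Estimator (11): mean of t*y/pi - (t - pi)/pi * m_hat for a user-chosen pi."""
    return np.mean(t * y / pi - (t - pi) / pi * m_hat)

def hybrid(t, y, m_hat, pi_hat, h, delta):
    """Hybrid estimator: regression prediction where pi_hat < delta, form (8) elsewhere."""
    aipw_term = t * y / pi_hat + (t - pi_hat) / pi_hat * h
    return np.mean(np.where(pi_hat < delta, m_hat, aipw_term))

# Hypothetical toy inputs; y is recorded as 0 where t == 0 (unobserved).
t = np.array([1, 0, 1, 1, 1])
y = np.array([2.0, 0.0, 3.5, 1.0, 5.0])
m_hat = np.array([1.8, 2.5, 3.0, 1.2, 0.4])      # fitted m(x_i, beta_hat)
pi_hat = np.array([0.9, 0.5, 0.7, 0.6, 0.02])    # last unit has a very small pi_hat

dr = estimator_11(t, y, m_hat, pi_hat)                  # pi = pi_hat: a DR estimator
imputation = estimator_11(t, y, m_hat, np.ones(5))      # pi = 1: imputation estimator
near_ols = estimator_11(t, y, m_hat, np.full(5, 1e12))  # pi -> infinity: approaches OLS
hyb = hybrid(t, y, m_hat, pi_hat, -m_hat, delta=0.05)   # h~ = -m_hat, small pi_hat swapped out

print(dr, imputation, near_ols, hyb)
```

In this toy example the respondent with $\hat\pi_i = 0.02$ dominates the DR value through the weight $1/\hat\pi_i$, whereas the hybrid version replaces that unit's contribution with its regression prediction.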
Finally, we note that yet another approach to developing estimators would be to start with the premise that one make no parametric assumption on the forms of $E(y \mid x)$ and $E(t \mid x)$ beyond some mild smoothness conditions. Here, it is likely that first-order asymptotic theory, as in the previous section, may no longer be applicable. It may be necessary to use higher-order asymptotic theory to make progress in this direction; see, for example, Robins and van der Vaart (2006).

CONCLUDING REMARKS

We again compliment the authors for their thoughtful and insightful article, and we appreciate the opportunity to offer our perspectives on this important problem. We look forward to new methodological developments that may overcome some of the challenges brought into focus by KS in their article.

ACKNOWLEDGMENT

This research was supported in part by Grants R01-CA051962, R01-CA085848 and R37-AI031789 from the National Institutes of Health.

REFERENCES

Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61 962–972. MR2216189

Davidian, M., Tsiatis, A. A. and Leon, S. (2005). Semiparametric estimation of treatment effect in a pretest-posttest study with missing data. Statist. Sci. 20 261–301. MR2189002

Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine 23 2937–2960.

Molenberghs, G. (2005). Discussion of “Semiparametric estimation of treatment effect in a pretest–posttest study with missing data,” by M. Davidian, A. A. Tsiatis and S. Leon. Statist. Sci. 20 289–292. MR2189002

Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89 846–866. MR1294730

Robins, J. and van der Vaart, A. (2006). Adaptive nonparametric confidence sets. Ann. Statist. 34 229–253. MR2275241

Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. (1999). Rejoinder to “Adjusting for nonignorable drop-out using semiparametric nonresponse models.” J. Amer. Statist. Assoc. 94 1135–1146. MR1731478

Tan, Z. (2006). A distributional approach for causal inference using propensity scores. J. Amer. Statist. Assoc. 101 1619–1637. MR2279484

Tsiatis, A. A. (2006). Semiparametric Theory and Missing Data. Springer, New York. MR2233926
