Assessing surrogate endpoints in vaccine trials with case-cohort sampling and the Cox model

The Annals of Applie d Statistics 2008, V ol. 2, No. 1, 386–407 DOI: 10.1214 /07-A OAS132 c  Institute of Mathematical Statistics , 2 008 ASSES SING SURR OGA TE ENDPOINTS IN V A C CINE TRIALS WITH CASE-COHOR T SAMPLING A ND THE CO X MODEL 1 By Li Qin, Peter B. Gilber t, Dean F oll mann and Dongfeng Li F r e d Hutchinson Canc er R ese ar ch Center , F r e d Hutchinson Canc er R ese ar ch Center , National Institute of Al ler gy and Infe ctious D ise ases and Peking University Assessing immune resp onses to stu d y v accines as surrogates of protection p la y s a central role in v accine clinical trials. Motiv ated by three ongoing or p ending HIV v accine eﬃcacy t rials, we consider such surrogate endp oint assessmen t in a randomized placeb o-controlle d trial w ith case-cohort sampling of imm une resp onses and a time to even t end p oin t. Based on the principal surrogate deﬁnition un der th e principal stratiﬁcation framew ork proposed by F rangakis and Rubin [ Biometrics 58 (2002) 21–29] and adapted by Gilbert and Hud gens (2006), we introduce estimands th at measure the val ue of an im- mune resp onse as a surrogate of protection in th e context of th e Co x p roportional hazards mo del. The estimands are not identiﬁed b ecause th e imm une resp onse to v accine is n ot measured in p lacebo recipien ts. W e formulate the problem as a Cox mo del with missing co vari ates, and employ nov el trial d esigns for p redicting t he missing imm une responses and thereby identifying th e estimands. The ﬁrst design ut ilizes information from baseline predictors of th e immune response, and bridges their relationship in the va ccine recipients to the placebo recipien ts. The second design provides a v alidation set for the u nmeasured immune resp onses of uninfected placeb o recipien ts by immunizing them with the study v accine after trial closeout. A maximum estimated likelihoo d approac h is prop osed for estimation of th e parameters. Simulated data examples are given to ev aluate the prop osed designs and stud y their prop erties. 1. In tro duction. The ev aluation of v accine eﬃcacy in v accine clinical tri- als is generally costly , either b ecause it tak es a long trial p erio d for the clin- ical outcomes to b e observ ed, or b ecause the v accine ma y only b e p artially Received Novem b er 2006; revised August 2007. 1 Supp orted b y US NIH-NIA I D Gran t 2 R O 1 AI054165-04 and NIH Gra nt R 37 AI291 68. Key wor ds and phr ases. Clinical trial, discrete failure time model, missing data, p oten- tial outcomes, principal stratiﬁcation, surrogate marker. This is a n electronic r eprint of the original article publishe d by the Institute of Mathematical Statistics in The Annals of Applie d Statistics , 2008, V ol. 2, No. 1, 386 –407 . This reprint diﬀers fr om the or iginal in pag ination and typogra phic detail. 1 2 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI eﬀectiv e. Therefore, ident ifying v accine-i nduced immune resp onses as sur- rogate m ark ers for the tr u e stud y endp oin t has spa wn ed int erest in v accine researc h [Halloran ( 1998 ), C han, W ang and Heyse ( 2003 ) and Gilb ert et al. ( 2005 )]. The p oten tial surrogate wo u ld us u ally b e mea sured shortly after administration of the study v accine, and if it can b e v alidated then the v accine’s pr otectiv e eﬀect can b e infer r ed from it. As knowle dge bu ilds on the imm un ologic al mec hanism for p rotecting against disease by a pathogen, ﬁnding a go o d immunolog ical surrogate is p r omising for iterativ ely guiding reﬁnement of the v accine formulatio n, an d ultimately for pro viding a b asis for regulatory decisions. There is an extensiv e literature on the ev aluatio n of sur rogate endp oin ts for therap eutic dev elopment [e.g., Pren tice ( 1989 ), Lin, Fleming and De Gruttola ( 1997 ), DeGruttola et al. ( 2002 ), Molen b er gh s et al. ( 2002 ) and W eir and W alley ( 2006 )]. The assessmen t of an immunologic al sur rogate fo cuses on con trast- ing the clinical outcome rate b et w een v accine recipien ts and placeb o recip- ien ts, given the measured immune resp ons es. Sin ce immune resp onse m ea- surements are made p ost-randomization, this assessm en t is sub j ect to se- lection bias [ F rangakis and Rubin ( 2002 ) and Gilb ert, Bosc h and Hudgens ( 2003 )]. T o addr ess this pr oblem, Gilb ert an d Hudgens ( 2006 ) (henceforth GH) prop osed to ev aluate the v alue of a biomark er as a sur rogate endp oint b y estimating the causal eﬀect predictiv eness (CEP ) surface, whic h co n- trasts th e clinical outcome rates b et wee n the v accine r ecipien ts and placeb o recipien ts within pr in cipal strata formed by join t v alues of th e p oten tial im- m u ne resp onses under assignmen t to v acci ne or p laceb o. Th is work b uilt on F rangakis and Rub in ( 2002 )’s p oten tial outcomes framew ork for ev aluating principal s u rrogate endp oin ts. GH considered a b in ary clinical outcome and used a baseline predictor appr oac h to predict the principal strata and esti- mate the CEP sur face n on p arametrically . W e deve lop a similar metho d for a time-to-ev ent clinical endp oin t, w h ic h is most commonly used in v accine clinical trials, and use the Co x p rop ortional hazards mo del [ Co x ( 1972 )] to describ e the relationship b et we en the surviv al outcome and co v ariates in- cluding the p otenti al sur rogate. Our lik eliho o d calculations utilize d iscrete failure time mo d els, which are suitable for many v accine trials b ecause clin- ical end p oin ts are often assessed at pre-sp eciﬁed dates. In the prin cipal stratiﬁcation framew ork , the principal strata are su b ject to missingness as only the immune r esp onse to the actual treatmen t as- signmen t (v accine or p laceb o) is observed. This situation wa s describ ed as the “fun d amen tal c hallenge of causal inference” [ Holland ( 1986 )]. Th e unob- serv ed immune resp onse is missing for th e su b jects that receiv e th e “opp o- site” assignment. W e f o cus on a marginal estimand that conditions on the imm u ne resp onse to the v acci ne. C on s equen tly , the assessment of a su rrogate in the Co x mo del f r amew ork can b e cast as a problem of estimation w ith a missing co v ariate. Although metho d s for estimating the Cox mo del with ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 3 missing co v ariates ha ve b een extensiv ely stud ied [e.g., Lin and Ying ( 1993 ), Robins, Rotnitzky and Zh ao ( 1994 ), Zhou and P ep e ( 1995 ), Pa ik and Tsai ( 1997 ), Ch en and Little ( 1999 ), Herring and Ibrahim ( 2001 ), Chen ( 2002 ) and Little and Ru bin ( 2002 )], their app lication to the prop osed surrogate assessmen t are not dir ect, as the missin g data are enti rely in the placeb o group. T ec hniques are called for to pr ed ict the “missing” immune resp onses in the p laceb o recipien ts, or a rand om sample of them. Therefore, we extend the innov ativ e designs p r op osed by F ollmann ( 2006 ) for a binary end p oin t to the Co x mo del setting. F ollmann ( 2006 ) pr op osed t wo no v el comp onents to v accine trials: baseline irrelev an t pr edictor (BIP), and closeout placeb o v accination (CPV), w hic h enable inference ab out the v accine-sp eciﬁc immune resp onses of placeb o re- cipien ts. BIP utilizes asso ciation b et we en the resp onse of interest and an- other baseline imm u ne resp onse thought to b e irr elev an t to infection in the v accinate d sub jects. CPV inv olv es v accinating u ninfected p lacebo recipi- en ts after stu dy completion. T o matc h ongoing and pen d ing HIV v accine trials, we extend these strategie s to accommod ate a time to ev ent clini- cal en d p oint and samplin g of imm un e resp onses via a case-cohort design [e.g., Prentic e ( 1986 ), Borgan et al. ( 2000 ), Scheik e and Martin ussen ( 2004 ) and Kulic h and Lin ( 2004 )]. W e fo cus on a sampling design that u ses data from all infected sub jects and a r an d om su b cohort of uninfected su b jects for whom the immune resp onse to the v accine is measured (termed “im- m u nogenicit y sub cohort,” IC ). The metho ds also apply for other sampling designs, such as failure status-indep endent case-cohort sampling. W e also consider measuring the BIP on some su b jects outside the IC , wh ic h ca n help impr o v e eﬃciency . Under the BIP design placeb o su b jects cannot b e selected int o th e IC ; similarly , infected placeb o sub jects cann ot en ter IC in the CPV design. Suc h n u ll selection p robabilities violate a k ey assump tion for most semiparametric approac hes to hand ling missing co v ariates in Co x regression, includ ing all that are based on p artial like liho o d. Accordingly , we emplo y a fu ll-lik eliho o d based estimation pro cedu re b ased on DFT mo dels. F or cont in u ous failure time d ata, we also consider an approximate semiparametric algorithm for the estimation of the BIP-alone design by extending the EM algorithm of Chen ( 2002 ). The prop osed metho ds will b e app lied to analyze three U.S. National Institutes of Health-sp onsored HIV v acci ne eﬃcacy trials. Th ese trials ran- domize HIV n egativ e high risk volun teers to v accine or p laceb o in a 1:1 ratio, and follo w participan ts until a ﬁ xed num b er of HIV infection ev ents. Th e ﬁrs t t wo trials (named S TEP 502 [ Mehrotra, Li and Gilb ert ( 2006 )] and HVTN 503) are ongoing in the Americas and South Africa, resp ectiv ely , and ev al- uate Merck’s Adenovirus serot yp e 5 (Ad5) v ector v accine in appro ximately 3000 sub jects. The th ird tr ial (named P A VE-100), co-sp onsored by the U.S. 4 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI Military HIV Researc h Program, the In ternational AIDS V accine Initia- tiv e, and the Centers f or Disease Cont rol and Prev en tion, is b eing p lanned. The current P A VE-100 design will randomize appro ximately 8500 v olun- teers fr om 13 countries in the Americas, East Africa, and Southern Africa to p lacebo or the V accine Researc h C enter’s p r ime-b o ost v accine regimen (DNA pr ime:Ad5 v ector b o ost). Th e trials plan to analyze appro ximately 100, 120 and 280 HIV infection ev ents, resp ective ly . A secondary ob jectiv e of eac h trial is to ev aluate the magnitude of CD8 + T cell resp onse lev els, as measured by the ELIS p ot assa y from bloo d samples dra wn after Ad5 imm u nization, as a su rrogate for HIV inf ection. The n eutralizing antib o dy titer to Ad5 is measur ed at b aseline for all participan ts. Because it is in- v ersely correlated with th e C D8 + T cell r esp onses [ Catanzaro et al. ( 2006 )], it p oten tially m a y b e used as a BIP . T o devel op our approac h for assessing surr ogate endp oin ts in v accine tri- als, w e p resen t the general framew ork, assum ptions, and deﬁn ition of the estimands in Section 2 , design considerations in Section 3 , and an estima- tion pro cedure in Section 4 . In Section 5 we ev aluate the approac h with sim u lated trials designed to matc h the aforementio ned HIV trials. A discus- sion follo ws in Section 6 . 2. The p rincipal stratiﬁcation framew ork. In this section we introd uce the p rincipal stratiﬁcation framew ork b ased on p oten tial outcomes and pr in - cipal stratiﬁcation [ F rangakis and Rubin ( 2002 ) and Rub in ( 2005 )]. Let n d en ote the total num b er of sub j ects in the v accine trial. F or sub ject i ( i = 1 , . . . , n ), let V i denote the observ ed treatment indicator, W i denote a collect ion of ﬁrst-ph ase baseline co v ariates in the case-cohort sampling (mea- sured on ev eryone), and S i ( V ) denote the p oten tial imm une resp onse of the sub ject if he/she is assigned v accine ( V = 1) or p laceb o ( V = 0). Similarly , for V = 1 , 0, let T i ( V ) and C i ( V ) b e the p oten tial failure time and censor- ing time, and X i ( V ) = min { T i ( V ) , C i ( V ) } and δ i ( V ) = I ( T i ( V ) ≤ C i ( V )). Let t 1 , . . . , t K indicate th e ﬁxed visit times, with t 2 , . . . , t K the p ossible d is- crete failure times for X i ( V i ). Let t + K denote censored at the ﬁn al visit and M i denote the last visit n u m b er of sub ject i dur ing th e trial p erio d, thus, M i ∈ { 1 , . . . , K } . F or v accine recipients at-risk at t 1 and in the IC , the im- m u ne resp onse S i ( V ) is measur ed at time t 1 . L etting R i ( V ) denote the p oten tial at-risk indicator at t 1 , S i ( V ) is only deﬁned if R i ( V ) = 1 ; other- wise, w e put S i ( V ) = ∗ . W e assume that the censoring pr o cess C i ( V ) and failure time distribution T i ( V ) are indep endent given { W i , R i ( V ) , S i ( V ) } . Supp ose t hat { V i , W i , R i (0) , R i (1) , S i (0) , S i (1) , X i (0) , X i (1) , δ i (0) , δ i (1) , i = 1 , . . . , n } are i.i.d. W e mak e the follo wing assu mptions to iden tify the esti- mands: A1. Stable unit treatment v alue assump tion (SUTV A). ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 5 A2. Ignorable tr eatment assignments. Conditional on W i , V i is ind ep endent of { R i (0) , R i (1) , S i (0) , S i (1) , X i (0) , X i (1) , δ i (0) , δ i (1) } . Assumption A1 guaran tees th e “consistency” prop ert y (i.e., the observ ed outcomes for a sub ject assigned V equals h is p oten tial outcomes if assigned V ) and that the p oten tial outcomes of one sub ject are not impacted by the treatmen t assignmen ts of other sub jects. A2 holds for randomized, blinded trials. Under th e ab ov e assum ptions, we deﬁne t wo v accine eﬃcacy estimands: 1. Conditional on joint p otential outc omes ( joint VE ) VE ( s 1 , s 0 ) ≡ 1 − Pr( T (1) = t k | T (1) ≥ t k − 1 , S (1) = s 1 , S (0) = s 0 , R (1) = 1 , R (0) = 1) Pr( T (0) = t k | T (0) ≥ t k − 1 , S (1) = s 1 , S (0) = s 0 , R (1) = 1 , R (0) = 1) . 2. Conditional on mar ginal p otential outc ome ( mar ginal V E ) VE ( s 1 ) ≡ 1 − Pr( T (1) = t k | T (1) ≥ t k − 1 , S (1) = s 1 , R (1) = 1) Pr( T (0) = t k | T (0) ≥ t k − 1 , S (1) = s 1 , R (1) = 1) , k = 2 , . . . , K. The estimand V E ( s 1 , s 0 ) conditions on memb ership in the basic princi- pal stratum { S (1) = s 1 , S (0) = s 0 , R (1) = R (0) = 1 } , and VE ( s 1 ) conditions on membersh ip in a union of basic principal s trata [ F rangakis and Rubin ( 2002 )]. The estimands condition on R i (1) = R i (0) = 1 or on R i (1) = 1 b e- cause S i ( V ) is only deﬁn ed if R i ( V ) = 1 , V = 0 , 1. Th e estimands are prin- cipal stratiﬁcation estimands in that the pair ( S (1) , S (0)) or S (1) can b e treated as a baseline co v ariate. Ho w eve r, they are n ot causal estimands, b ecause the numerators and denominators cond ition on diﬀerent ev en ts T (1) ≥ t k − 1 and T (0) ≥ t k − 1 . Nev erth eless they are scien tiﬁcally in terest- ing, in the same w ay that a hazard ratio conditional on baseline co v ariates is interesting. T o help identi fy the estimands, only sub j ects with R i ( V i ) = 1 are included in the analysis, and w e assume the follo win g: A3. Equal drop-out and risk up to time t 1 : R i (1) = 1 ⇐ ⇒ R i (0) = 1. A3 implies that sub jects observed to b e at risk at t 1 will ha ve R i (1) = R i (0) = 1, so that S i (1) and S i (0) are b oth d eﬁ ned. In addition to A1–A3, id en tiﬁabilit y of VE ( s 1 , s 0 ) requires a w a y to pre- dict S i (1) for sub jects with V i = 0 and a w a y to predict S i (0) for sub- jects with V i = 1 . Identiﬁabilit y of VE ( s 1 ) is easier b ecause only the S i (1) for su b jects in arm V i = 0 m ust b e predicted. F urthermore, for our mo- tiv ating application, typica lly the imm un e resp onse S i (0) is ze ro for all placeb o recipien ts, b ecause exp osure to th e v a ccine is necessary to stim- ulate an imm u ne resp onse. F or th ese r easons, henceforth, we fo cus on the 6 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI marginal estimand VE ( s 1 ). Note th at, for ap p lications with S i (0) = 0 for all i , VE ( s 1 ) = VE ( s 1 , 0). W e prop ose a Co x mo del f or the discrete cumulativ e hazard fun ction Λ( t ), d Λ( t k ; V , S (1) = s 1 , R (1) = 1 , W ) = exp( Z ′ β ) d Λ 0 ( t k ) , (1) k = 2 , . . . , K , with Z = { V , S (1) , V S (1) , W ′ } ′ , β = { β 1 , β 2 , β 3 , β ′ 4 } ′ , and Λ 0 ( · ) is the d is- crete baseline cumulativ e hazard f u nction. T he marginal VE ( s 1 ) can b e expressed as VE ( s 1 ) = 1 − d Λ( t k ; V = 1 , S (1) = s 1 , R (1) = 1) d Λ( t k ; V = 0 , S (1) = s 1 , R (1) = 1) , k = 2 , . . . , K . The d iscrete h azards alw ays condition on { R (1) = 1 } and, h enceforth, w e assume this implicitly . F or sub jects with a particular baseline co v ariate w , a similar estimand V E ( s 1 | w ) can b e formed b y conditioning on W = w in the hazards. The p opu lation estimand VE ( s 1 ) contrasts the rate of the clinical ev en t for sub jects with S (1) = s 1 under assignment to v accine v ersu s under as- signmen t to placeb o. Supp osing S (1) is b ound ed b elo w at v alue zero w hic h indicates a negativ e immune resp onse, w e d eﬁ ne S to b e a pr e dictive surr o- gate if VE (0) = 0 and VE ( s 1 ) > 0 for all s 1 > C for some constant C ≥ 0. These conditions r eﬂect p opulation lev el n ecessit y and suﬃciency of the im- m u ne resp ons e to ac h ieve p ositive v accine eﬃcacy . Under A1–A3 and the Cox mo del ( 1 ), the estimand equals VE ( s 1 ) = 1 − exp( β 1 + s 1 β 3 ) . (2) In equation ( 2 ) a negativ e v alue of β 3 indicates that a higher immune re- sp onse to v acci ne predicts greater v accine eﬃcacy . On th e other hand, β 3 = 0 implies VE ( s 1 ) is constant in s 1 so that the mark er do es not predict v accine eﬃcacy . Th erefore, testing H 0 : β 3 = 0 v ersu s H 1 : β 3 < 0 assesses suﬃciency . A v alue β 1 = 0 in dicates n ecessit y , and b oth β 1 = 0 an d β 3 < 0 indicate the mark er is a predictiv e sur rogate. T h e magnitude of β 3 indicates the qual- it y of the pr ed ictiv e surrogate w ith β 3 = 0 suggesting n o sur rogate v alue [ VE ( s 1 ) is constan t in s 1 ] and larger | β 3 | s uggesting greater surrogate v alue (greater pr edictiv eness). 3. Augmen ted designs for estimation. The immune resp onse to the s tudy v accine, S (1), cannot b e measur ed in placeb o recipien ts, but it ma y b e in- ferred wh en utilizing either the BIP or CPV designs (see Figure 1 ). ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 7 Fig. 1. Il lustr ation of an HIV vac cine trial design under the BIP and CPV str ate gies. Under BIP or BIP + CPV , b aseline me asur ements of W and B ar e obtaine d f r om al l (or a r andom sample of ) the study p articip ants prior to the r andomization at time 0 . The study subje cts ar e then r andomize d to r e c eive ino culation V of the study vac cine or plac eb o. F or some vac cine r e cipients, the imm une r esp onse to the vac cine S (1) is me asur e d at time t 1 . The subse quent assessments of HIV inf e ction ar e c onducte d at discr ete times t 2 , . . . , t K . The study subje cts ar e fol lowe d until diagnosis of HIV i nfe ction (HIV + ) or study close out at or after t K . Under CPV or BIP + CPV, plac eb o r e cipi ents uninfe cte d (HIV − ) at study close out (or a r andom sample of them) ar e im munize d wi th the study vac ci ne and the immune r esp onse S c (1) is me asur e d t 1 units of time af terwar d. Baseline Irr elevant Pr e dictor ( BIP ) . Assume a baseline co v ariate B is a v ailable that do es not aﬀect (i.e., is “irrelev an t” for) clinical risk after ac- coun ting for the imm u ne resp ons e S (1) and ﬁr st-phase co v ariates W : A4. d Λ( t k ; V , S (1) , W, B ) = d Λ( t k ; V , S (1) , W ), k = 2 , . . . , K , V = 0 , 1. Assumptions A1–A3 imply that the relationship b et w een S (1) and B is the same regardless of tr eatmen t assignmen t [ S i (1) | V i = 1 , B i , R i (1) = 1] d = [ S i (1) | V i = 0 , B i , R i (1) = 1] . (3) 8 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI Therefore, S i (1) can b e pr edicted or imputed f or p laceb o s u b jects based on B i . F or v accine recipient s w ith the BIP measured and who are outside the IC , their immune resp onses are p redicted u s ing the BIP as w ell. In case-cohort designs, go o d baseline p r edictors need to b e highly corre- lated with the biomarker S (1), and preferably include ﬁrst-phase (measur ed on every one) inexp ensiv e co v ariates to ac h iev e eﬃciency gains. Close out Plac eb o V ac cination ( CPV ) . This design en tails v accinating uninfected placeb o sub jects after the s tudy closeout, and measuring their im- m u ne resp onse S c i (1). The closeo ut measurement S c i (1) is m ad e at a visit t 1 time u nits after v accination, to matc h the m easuremen t schedule in the v ac- cine trial. W e need to make an add itional assump tion to br id ge the marker v alues S i (1) and S c i (1). Let S true i (1) b e th e true immune r esp onse at time t 1 , allo w ing that th e observed imm u ne resp onse is sub j ect to some assa y measuremen t error. A5. Time c onstancy of S true i (1): F or uninf ected placeb o recipien ts, S i (1) = S true i (1) + e i 1 and S c i (1) = S true i (1) + e i 2 , where e i 1 and e i 2 are indep en- den t and identica lly d istributed r andom errors with mean 0. This assump tion implies that the true immune resp onse is unchanged from time t 1 to stud y closeout plus t 1 , and the measurement errors ha ve the same distribu tion. Thus, S i (1) and S c i (1) are exc hangeable and one can b e used in lieu of th e other. T o b e concrete, supp ose only one shot is give n, the trial is thr ee y ears, and t 1 is 6 m on ths after the sh ot. A5 states that the true imm une resp onse 6 mon ths after the shot is the same w hether it is measured January 1, 2004 or Jan u ary 1, 2007. In the Discuss ion we outline ho w our metho ds can b e generalized to use S true i (1) in the Co x m o del (1) rather than S i (1). Note that even if the regression inv olv es S true i (1), a v alid test of the eﬀect of S true i (1) obtains when u sing S i (1) [ Pren tice ( 1982 )]. If time constancy of imm u ne resp onse is n ot reasonable, then S c i (1) cannot b e used in lieu of S i (1) and CPV ma y b e questionable. See F ollmann ( 2006 ) for further discus s ion of this issue, including h o w to examine this assumption. Under A5, the d istribution of [ S i (1) | V i = 0 , δ i = 1] can b e in ferred from the marginal distribu tions [ S i (1) | V i = 1] d = [ S i (1) | V i = 0] . Ho w ever, in case- cohort sampling, if the IC is small, then the large amount of missing d ata and the in ferred immune resp onses in placeb o recipien ts ma y c h allenge the p erformance of the metho d. Baseline irr elevant pr e d ictor and close out p lac eb o vac cination c o mbi ne d ( BIP + CPV ) . The BIP an d CPV designs can b e com bined by impu ting S i (1) with S c i (1) for all u ninfected p laceb o recipients with S c i (1) measured, and pr ed icting S i (1) with B i for all others with B i measured. Combining ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 9 the designs can yield large eﬃciency gains. In th e situation wh ere there is no goo d baseline predictor or the baseline p redictor is exp ensiv e to collect, conducting sm all-scale CPV on a random sample of the uninfected p laceb o recipien ts can add accuracy and precision to th e estimates. 4. Estimation. Estimation of the estimand is c hallenged by the amoun t of missin g S (1)’s. W e fo cus on the maxim um estimated likel iho o d (MEL) estimation pro cedure th at applies to all three d esigns. W e then b rieﬂy out- line an approxima te EM-t yp e algorithm for estimation with th e BIP-alone design. 4.1. Maximum estimate d likeliho o d estimation. W e present b elo w the es- timation pro cedu re for the BIP + CPV design, whic h includes estimation under the BIP- or C PV-alone designs as sp ecial cases. Let IC V denote the imm u n ogenicit y cohort that con tributes second-phase data S (1) in v accine recipien ts, and IC P denote the cohort within uninfected placeb o su b jects that receiv ed v accinatio n at stud y closeout, s o that IC = IC V ∪ IC P . Let IB denote the set of su b jects with B measured , wh ic h can b e larger than IC . F or p lacebo sub jects th at do not ha ve S (1) measured, their lik eliho o d contribution in tegrates ov er th e marginal distribu tion of S (1) or the conditional d istr ibution of S (1) | B . The fu ll log-lik eliho o d of m o del ( 1 ) under the BIP + CPV design (with con v entio n that Q 1 j = 2 = 1) is giv en b y log L ( β , λ 0 ) = X i ∈ IC V log L 1 ( O i ) + X i ∈ IC P log L 2 ( O i ) + X i ∈ IC , IB log L 3 ( O i ) (4) + X i ∈ IC , IB log L 4 ( O i ) , where L 1 ( O i ) = M i − 1 Y j = 2 (1 − λ 0 j ) exp { V i β 1 + S i (1) β 2 + V i S i (1) β 3 + W ′ i β 4 } R i ( V i ) × { 1 − (1 − λ 0 ,M i ) exp { V i β 1 + S i (1) β 2 + V i S i (1) β 3 + W ′ i β 4 } } δ i R i ( V i ) × (1 − λ 0 ,M i ) exp { V i β 1 + S i (1) β 2 + V i S i (1) β 3 + W ′ i β 4 } (1 − δ i ) R i ( V i ) , L 2 ( O i ) = M i Y j = 2 (1 − λ 0 j ) exp { V i β 1 + S c i β 2 + V i S c i β 3 + W ′ i β 4 } R i ( V i ) , L 3 ( O i ) = Z M i − 1 Y j = 2 (1 − λ 0 j ) exp { V i β 1 + sβ 2 + V i sβ 3 + W ′ i β 4 } R i ( V i ) × { 1 − (1 − λ 0 ,M i ) exp { V i β 1 + sβ 2 + V i sβ 3 + W ′ i β 4 } } δ i R i ( V i ) 10 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI × (1 − λ 0 ,M i ) exp { V i β 1 + sβ 2 + V i sβ 3 + W ′ i β 4 } (1 − δ i ) R i ( V i ) dP ( s | B i , W i ) , L 4 ( O i ) = Z M i − 1 Y j = 2 (1 − λ 0 j ) exp { V i β 1 + sβ 2 + V i sβ 3 + W ′ i β 4 } R i ( V i ) × { 1 − (1 − λ 0 ,M i ) exp { V i β 1 + sβ 2 + V i sβ 3 + W ′ i β 4 } } δ i R i ( V i ) × (1 − λ 0 ,M i ) exp { V i β 1 + sβ 2 + V i sβ 3 + W ′ i β 4 } (1 − δ i ) R i ( V i ) dP ( s | W i ) . Here λ 0 = { λ 02 , . . . , λ 0 K } T are unknown baseline hazards (with λ 0 k = d Λ 0 ( t k ), k = 2 , . . . , K ), and P ( s | w ) and P ( s | b, w ) are the conditional c.d.f.’s of S (1). In the Co x mo d el form ulation, th e estimand VE ( s 1 ) dep ends only on β while the parameters in the cond itional c.d.f.’s P ( s | w ) and P ( s | b, w ) are nuisance parameters. Rather than maximizing the full lik eliho o d o v er the ent ire parameter space, w e tak e th e MEL app roac h [ P ep e and Fleming ( 1991 )] to a void sp ecifying the joint distribu tion of ( S (1) , B , W ) and the in tensive compu tations en tailed in the n um erical in tegration. The condi- tional c.d.f.’s P ( s | w ) and P ( s | b, w ) are ﬁrst consisten tly estimated from the v accine recipient s’ data (Section 4.1.1), and then the estimated likeli ho o d log L ( β , λ , b P ( · ) , b P ( ·|· )) is constructed. F or a categorical W , P ( s | w ) and P ( s | b, w ) can b e estimated nonpara- metrically . Ho w ever, if W is conti n u ous, then nonp arametric estimation will require smoothing and muc h larger sample size s a re n eeded for tractable computation. Th erefore, if W is con tinuous or multi-c omp onent , parametric assumptions on the cond itional c.d.f.’s will usually b e needed to ac hiev e sta- ble estimation in practice. An adv anta ge of the MEL appr oac h is that it can straigh tforwardly accommo date any approac h to estimating the n uisance parameters P ( s | w ) and P ( s | b, w ). In th e MEL approac h w e ﬁrst estimate these distribu tions consisten tly usin g d ata from the v accine recipien ts, and then construct the estimated lik eliho o d L ( β , λ , b P ( · ) , b P ( ·|· )). W e outline thr ee ke y steps in the ev aluation of the log-lik eliho o d ( 4 ) in the absence of the ﬁ rst-phase co v ariates W : 1. Estimation of p ( s ) and p ( s | b ) . Let p ( s ) , p ( b ) , and p ( s , b ) b e marginal and join t p.d .f.s (or p.m.f.s for discrete v ariables) for S (1) and B . Because v accine recipien ts in the IC V pro v id e nonrandom samples of S (1) and B , and v accine r ecipien ts in the IB contribute additional data for B , it f ollo ws that p ( s ) = f 11 ( s ) p 11 + f 10 ( s ) p 10 , p ( b ) = f 11 ( b ) p 11 + f 10 ( b ) p 10 , (5) p ( s, b ) = f 11 ( s, b ) p 11 + f 10 ( s, b ) p 10 , where, for h = 1 , 0, f 1 h ( · ) is the conditional p.d.f. or p.m.f. of S (1) giv en V = 1 and δ = h , and p 1 h ≡ Pr( δ = h | V = 1). The pr ob ab ilities { p 1 h } can b e esti- mated by their sample counterparts { ˆ p 1 h } and estimates of { f 1 h ( s ) , f 1 h ( b ) , f 1 h ( s, b ) } . ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 11 W e sketc h the estimation for t wo sp ecial cases w here (A) ( S (1), B ) are catego rical and (B) ( S (1) , B ) are biv ariate normally distributed. (A) If S (1) and B hav e discrete v alues with J and L categories, resp ec- tiv ely , then f 1 h ( s j ) and p ( S (1) = s j | b l ) ( j = 1 , . . . , J, l = 1 , . . . , L ) can b e es- timated n onparametrically: ˆ f 1 h ( s j ) = P i ∈ IC V I ( S i (1) = s j , δ i = h ) P i ∈ IC V I ( δ i = h ) , ˆ p ( S (1) = s j | b l ) = P i ∈ IC V ,B i = b l δ i I ( S i (1) = s j ) P i ∈ IC V ,B i = b l δ i ˆ p 11 + P i ∈ IC V ,B i = b l (1 − δ i ) I ( S i (1) = s j ) P i ∈ IC V ,B i = b l (1 − δ i ) ˆ p 10 . (B) If ( S (1) , B ) are join tly normally distributed, then p ( s ) and p ( s | b ) are b oth normal densities and thus can b e estimated using estimates of the ﬁ rst and second momen ts from expressions in ( 5 ). Ev aluating the lik eliho o d ( 4 ) in vol v es integ rations ov er s , whic h are brieﬂy describ ed in the App end ix . 2. Maximization and implementa tion. The estimated log-lik eliho o d log L ( β , λ , b P ( · ) , b P ( ·|· )) is maximized us ing qu asi-Newton metho d s. The as- sumption th at S (1) is observe d with nonzero probability in all su b jects is violated. Therefore, the asymp totic v ariance of b β via the MEL approac h cannot b e deriv ed analytically . W e p r op ose to ob tain the standard errors for b β by the b o otstrap. F or computational eﬃciency , the softw are for estima- tion is implemente d in Matlab 7.0.1 (Math works, Inc) with a C ++ plug in, compiled to dynamic link library . 4.2. Appr o ximate E M-typ e estimation. In this su bsection w e present an estimation approac h that u ses regression calibration to impu te th e missin g S i (1)s for sub jects with a BIP B i measured and emplo ys an EM-t yp e algo- rithm based on full likel iho o d to accommo date th e missing S i (1)s for sub - jects without B i measured. Because the CPV-based d esigns ha ve missin g S (1)s for the ent ire { V = 0 , δ = 1 } stratum, the algorithm can only r eliably estimate the Co x mo d el p arameters for the BIP-alone design, as conﬁrmed in simulat ions. W e fo cus on the BIP-alone design with a con tinuous BIP in this section. Th e prop osed algorithm can b e applied to a categ orical BIP with sligh t mo d iﬁcation. An adv an tage of this EM approac h is that it ac- commo dates contin uous failure times. Because the missingness of S (1) do es not dep en d on un observ ed S (1), and we assume the censoring distr ibution do es not dep end on S (1), th e log- lik eliho o d for the BIP-alone design can b e expressed up to a constan t factor 12 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI as l ( β , α , Λ 0 ) = X i ∈ IC { δ i ( Z ′ i β ) − Λ 0 ( X i ) exp( Z ′ i β ) } + X i ∈ IC , IB log  Z exp { δ i ( Z ′ i β ) − Λ 0 ( X i ) exp( Z ′ i β ) } dP ( s | V i , W i , B i )  + X i ∈ IC , IB log  Z exp { δ i ( Z ′ i β ) − Λ 0 ( X i ) exp( Z ′ i β ) } dP ( s | V i , W i )  + δ i log( d Λ 0 ( X i )) , where X i denotes the observed failure time, Λ 0 ( X ) denotes the baseline cum u lativ e h azard fun ction, an d α represent s u nkno wn parameters in th e conditional d istributions of S (1). The log-lik eliho o d score equations can b e solv ed via an iterativ e EM al- gorithm [ Chen and L ittle ( 1999 ), Herring and Ibrahim ( 2001 ), Chen ( 2002 )]. F or compu tational eﬃciency , w e prop ose to mo dify the doub le-semiparametric EM-algorithm of Chen ( 2002 ) to incorp orate the auxiliary co v ariate B as a predictor of the missing S (1). Giv en equation ( 3 ) and th e relat ionship S i (1) = g ( B i ; θ ) + ǫ i , where g ( · ) is a p arametric link function dep end ing on the un kno w n parameter θ and ǫ i has m ean zero and v ariance σ 2 , S i (1) can b e pr edicted by b E ( S (1) | B i ) = g ( B i ; b θ ). When the ev ent o ccurr ence is rare, E ( S (1) | B i ) ≈ E ( S (1) | B i , X i , δ i ). This fact has b een w ell studied in the con- text of regression calibration in the Co x r egression [e.g., Pren tice ( 1982 ) and W ang et al. ( 1997 )]. Ther efore, unobserved S (1)’s can b e r eplaced b y b E( S (1) | B ) and treated as observ ed data in th e E M algorithm. W e name this pro cedure the “Approxima te Calibratio n-Based EM (A CE M)” algorithm. An outline of this pr o cedure is giv en b elo w; interested readers are referr ed to Chen ( 2002 ) for details: 1. Calibration-step: Prediction of unobserved S i (1)s by b S i (1) = b E( S (1) | B i ) . 2. E-step: Giv en parameter v alues at the m th iteration ( β ( m ) , Λ ( m ) 0 ( X ) , α ( m ) , p ( m ) k lj , θ ( m ) ), for p k lj denote the pr obabilit y mass of the observ ed dis- tinct v alues of S (1) at discrete lev els of V = v k and W d = w l ( W = W d ∪ W c where W d and W c denote the categorica l an d conti n u ous co- v ariates in W , resp.), and α ( m ) denote the parameters in the distri- bution P ( W c | S (1) , V , W d , X, δ ) . Calculate conditional exp ectations un d er P ( S (1) | V , W d , X, δ ) . 3. M-step: Up date ( β , Λ 0 ( X ) , α , p k lj , θ ) by solving the corresp ondin g score equations. ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 13 4. Rep eat the E-step and M-step ab o ve until con v ergence. The adv an tage of the A C EM algorithm is that it can accoun t for con tin u- ous failure times and is computationally fast; ho wev er, since it u ses regression calibration, it p erforms well only for the rare eve n t situation with a highly predictiv e BIP . Preve n tion trials, whic h usu ally h a ve a low ev ent rate, are an app licatio n area. 5. Sim ulation study . W e conducted a simulatio n study to ev aluate the p erformance of the prop osed str ategies for estimating th e estimand VE ( s 1 ) and thereb y assessing a predictiv e s u rrogate in the Co x m o del setting. T o sim u late the r eal scenarios, w e roughly follo w the design of the three HIV v accine eﬃcac y trials describ ed in the in tro du ction. W e sup p ose a total sample size of 5000, with 2500 sub j ects p er arm. The treatmen t indicator V = 1 if assigned v accine and V = 0 if assigned p laceb o. Und er the ca se- cohort sampling, the imm u nogenicit y sub cohort ( IC ) consists of all infected v accine recipien ts and a random sample of unin f ected v acc in e recipien ts, whic h includ e a com b ination of 25% or 50% of uninf ected v accine recipien ts. W e considered one auxiliary co v ariate B as the BIP for the p oten tial im- m u nologica l surrogate S (1). Th e v ariables S (1) and B we re generated from a biv ariate normal distrib ution with m ean zero and v ariance 0.4 for eac h comp onen t [reﬂecting the v ariance of th e ELI S POT assa y u sed to measure S (1) = CD8 + T cell resp onse], and correlation ρ = 0 . 6 or 0.9. F or the BIP- alone and BIP + C P V designs, we assume that B w as m easured f rom all individuals in the IC and from 50% or 37.5% of those not in the IC , as a precision factor. In the BIP-alone approac h, S (1) w as treated as missin g f or all placeb o recipien ts, while for the BIP + C PV and C PV-alone ap p roac hes, w e assu me 25% or 50% uninfected p laceb o r ecipients got th e C PV measure- men t S c (1). Infection times we re generated from the con tinuous-time Co x mo del λ ( t | V , S (1)) = λ 0 ( t ) exp { β 1 V + β 2 S (1) + β 3 V S (1) } , and were group ed in to 6 equal-length time in terv als to reﬂect the discrete visit sc hedule of the trials. The tru e p arameters β 2 = − 1 . 109 and β 3 w ere set at 0, − 0.4, or − 0.7, reﬂecting th e null hyp othesis that S (1) h as no v alue as a predictiv e surrogate and alternativ e hyp otheses of 1.2-fold and 1.5-fold lo wer relativ e risks RR ( S (1)) = 1 − VE ( S (1)) p er 1 stand ard deviation higher imm u ne re- sp onse S (1), corresp onding to lo w and high surr ogate v alue, resp ectiv ely . In addition, λ 0 ( t ) = λ 0 and β 1 w ere calibrated to give VE (0) = 0 . 5 and 334 infections exp ected in the placeb o arm, and hence, 7% o v erall infection rate. Random censoring of 10% w as added to accoun t for sub ject dr op out. All uninfected sub jects w ere censored at the end of the f ollo w-up p erio d, sp ec- iﬁed at 3 y ears. Fiv e h u n dred simulati on r u ns and 50 b o otstrap replicates w ere u s ed to obtain standard error estimates for the estimated r egression parameters. 14 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI W e ﬁrst conducted estimation through the MEL algorithm for discrete failure times usin g all three designs. F or th e BIP-alone design, a second sim u lation was conducted to compare th e p erformance of the MEL approac h for group ed failure times, v ersus that of the A C EM algorithm assum ing con tinuous failure times w er e observed in a rare ev ent setting. T o ev aluate eﬃciencies for the p arameter estimates, estimates from the Co x mo d el using the fu ll sim u lated data were obtained as an un attainable “gold standard.” Figure 2 plots the tru e VE ( s 1 ) curve for diﬀerent true parameters ( β 1 , β 3 ) in mo del ( 2 ). It sho ws that w hen β 3 = − 0 . 7, VE (0) = 0 and VE ( s 1 ) > 0 for s 1 > 0 , indicating that the immune r esp onse v ariable is a predictive su rro- gate. T able 1 presents sim u lation resu lts for the MEL appr oac h in diﬀerent set- tings. It can b e seen that the metho d has excelle n t p erformance. T h ere are generally small biases, small v ariances of the estimates an d go o d p o we r of the test of H 0 : β 3 = 0 for surrogate v alue. As more C PV or auxiliary BIP information is a v ailable, b oth the accuracy and precision of the estimates impro v e. The eﬃciency of the BIP-in volv ed designs increases as the corre- lation b et ween the BIP an d S (1) increases. The C PV-alone design is less Fig. 2. Il lustr ation of the estimand VE ( s 1 ) as a function of the standar dize d p otential surr o gate S ( 1) over the r ange of observable values with di ﬀer ent values for β 3 . ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 15 T able 1 R esults f r om the MEL estimation ˆ β 1 ˆ β 2 ˆ β 3 Design ρ β Missing Bias SD SE RE Bias SE ASE RE Bias S D SE RE Po wer BIP 0.6 β (0) Large 0 . 004 0.13 0.14 26 0 . 007 0.22 0.24 9 − 0 . 013 0.28 0.29 15 5 Medium 0 . 009 0.13 0.14 26 − 0 . 001 0.22 0.22 9 − 0 . 009 0.27 0.27 16 5 β (4) Large − 0 . 001 0.08 0.08 84 0 . 007 0.18 0.18 14 − 0 . 01 0 0.21 0.20 32 52 Medium 0 . 001 0.08 0.08 93 0 . 005 0.16 0.15 18 − 0 . 006 0.19 0.18 41 62 β (7) Large − 0 . 001 0.09 0.08 78 − 0 . 017 0.18 0.18 14 0 . 011 0.22 0.21 28 89 Medium 0 . 000 0.08 0.08 95 0 . 002 0.15 0.15 21 − 0 . 007 0.18 0.18 43 97 0.9 β (0) Large 0 . 001 0.07 0.07 99 0 . 003 0.12 0.12 33 − 0 . 00 7 0.15 0.15 54 5 Medium − 0 . 002 0.07 0.07 94 0 . 003 0.10 0.10 45 − 0 . 00 4 0.13 0.13 68 4 β (4) Large − 0 . 002 0.08 0.07 93 0 . 005 0.12 0.11 33 − 0 . 00 7 0.16 0.15 58 78 Medium 0 . 000 0.07 0.07 100 0 . 006 0.10 0.10 45 − 0 . 007 0.14 0.13 77 86 β (7) Large − 0 . 004 0.08 0.08 86 − 0 . 009 0.12 0.12 32 0 . 004 0.17 0.15 49 99 Medium − 0 . 001 0.08 0.08 100 − 0 . 003 0.10 0.10 47 − 0 . 0 03 0.14 0.14 72 1 00 BIP + CPV 0.6 β (0) Large 0 . 000 0.08 0.07 82 − 0 . 002 0.14 0.13 24 − 0 . 007 0.19 0.18 34 6 Medium − 0 . 001 0.08 0.07 79 − 0 . 004 0.12 0.11 32 0 . 004 0.16 0.15 48 6 β (4) Large 0 . 001 0.08 0.08 88 − 0 . 003 0.13 0.13 26 0 . 002 0.18 0.18 44 61 Medium − 0 . 004 0.08 0.08 88 0 . 004 0.11 0.11 35 − 0 . 00 5 0.15 0.16 59 74 β (7) Large 0 . 003 0.08 0.08 88 − 0 . 001 0.13 0.13 26 0 . 007 0.18 0.18 41 96 Medium − 0 . 003 0.08 0.08 97 − 0 . 004 0.11 0.11 36 0 . 002 0.16 0.16 51 99 0.9 β (0) Large − 0 . 002 0.07 0.07 91 − 0 . 002 0.10 0.10 47 − 0 . 006 0.15 0.14 57 7 Medium − 0 . 004 0.07 0.07 89 0 . 001 0.09 0.08 60 − 0 . 00 2 0.13 0.13 69 8 β (4) Large − 0 . 001 0.08 0.07 95 0 . 002 0.10 0.10 49 − 0 . 00 2 0.14 0.14 69 81 Medium − 0 . 005 0.08 0.07 95 0 . 005 0.08 0.08 66 − 0 . 00 4 0.13 0.13 84 86 β (7) Large 0 . 002 0.08 0.08 95 0 . 000 0.09 0.10 53 0 . 006 0.14 0.15 67 100 Medium − 0 . 004 0.08 0.08 100 0 . 00 0 0.08 0.08 64 0 . 000 0.14 0.13 72 1 00 16 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI T able 1 (Continue d) ˆ β 1 ˆ β 2 ˆ β 3 Design ρ β Missing Bias SD SE RE Bias SE ASE RE Bias SD SE RE P ow er CPV β (0) Large 0 . 015 0.10 0.09 51 − 0 . 023 0.26 0.24 7 0 . 031 0.31 0.29 12 8 Medium 0 . 011 0.08 0.08 65 − 0 . 027 0.18 0.18 14 0 . 03 2 0.22 0.21 24 7 β (4) Large 0 . 001 0.09 0.09 62 − 0 . 005 0.24 0.24 8 − 0 . 003 0.29 0.29 17 24 Medium 0 . 001 0.09 0.08 69 − 0 . 012 0.18 0.18 13 0 . 00 8 0.22 0.21 29 47 β (7) Large 0 . 002 0.10 0.09 67 0 . 001 0.23 0.24 8 − 0 . 002 0.28 0.29 17 70 Medium − 0 . 001 0.09 0.09 73 0 . 001 0.17 0.18 16 − 0 . 004 0.21 0.21 30 91 Note . β (0) = ( − 0 . 693 , − 1 . 109 , 0); β (4) = ( − 0 . 849 , − 1 . 109 , − 0 . 4); β (7) = ( − 0 . 996 , − 1 . 109 , − 0 . 7) . S E = M on te Carlo standard error, ASE = a vera ge of the b ootstrap stand ard error from 50 b ootstrap samples; RE = rela tive eﬃciency (AS E(gold standard) 2 / ASE (missing) 2 ) × 100%; Po w er is for testing H 0 : β 3 = 0. “Large Missing” and “Medium Missing” patterns indicate the I C size of 25% or 50% with add itional 25% or 37.5% BIP d ata for designs with BIP , and include closeout S c (1) d ata from 25% or 50 % uninfected p lacebo recipients for designs with CPV, resp ectively . ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 17 Fig. 3. R elative eﬃciencies of p ar ameter estimators. F or designs with BIP, “ L ar ge Miss- ing ” and “ Me dium Missing ” p atterns indic ate the IC size of 25 % or 50 % wi th addi- tional 25 % or 37.5 % BI P data, r esp e ctively. F or the design with CPV, “ L ar ge missing ” and “ Me dium missing ” p atterns i nclude close out S c (1) data fr om 25 % or 50 % uninfe cte d plac eb o r e cipients, r esp e ctively. T rue values of β ( β (0) , β (4) , β (7) ) ar e as sp e ciﬁe d in T ab les 1 and 2 . eﬃcien t b ecause n one of the infected placeb o su b jects hav e S c (1) measured. Figure 3 disp la ys the relativ e eﬃciencies of the parameter estimators from the three designs w ith m issing S (1) with resp ect to the gold standard esti- mators. Ov erall th e relativ e eﬃciency increases as the amoun t of m easured imm u ne resp onses increases. Th e relativ e eﬃciency of ˆ β 2 is largely impacted b y th e amount of missin g data, while that of ˆ β 1 is less sensitiv e to the missing data pattern. These results conﬁ rm our design assumptions q u ite wel l. T able 2 lists r esults from b oth the MEL approac h and the A CEM algo- rithm un der the BIP-alone design and the medium missing case (the IC size of 50% with ad d itional 37.5% ﬁrst p hase BIP data). It demonstrates th at the p erformance of the A CEM metho d is ve ry sensitiv e to the prediction accuracy of the baseline p r edictor. When the BIP is a fairly inaccurate pre- dictor of S (1) ( ρ = 0 . 6), the ACEM metho d p ro duces large biases and do es 18 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI T able 2 Comp arison of r esults b etwe en the MEL and ACEM appr o aches for the BIP-alone design with the “ Me dium Mi ssing ” p attern (the IC size of 50 % with additional 37.5 % BIP data) ˆ β 1 ˆ β 2 ˆ β 3 ρ β Meth od Bias SE ASE RE Bias SE ASE RE Bias SE ASE RE P ow er 0.6 β (0) ACE M − 0 . 139 0.12 0.12 99 0 . 025 0.19 0.20 38 − 0 . 238 0.26 0.26 57 14 MEL 0 . 000 0.13 0.14 87 − 0 . 00 2 0.20 0.22 33 0 . 004 0.26 0.27 59 5 β (4) ACE M − 0 . 151 0.13 0.13 100 0 . 019 0.19 0.20 37 − 0 . 265 0.26 0.26 64 72 MEL 0 . 004 0.14 0.14 88 − 0 . 01 3 0.21 0.21 31 0 . 013 0.26 0.27 60 34 β (7) ACE M − 0 . 162 0.15 0.14 98 0 . 021 0.20 0.19 30 − 0 . 292 0.27 0.26 60 95 MEL 0 . 006 0.15 0.14 79 − 0 . 00 8 0.21 0.20 27 0 . 012 0.26 0.25 51 73 0.9 β (0) ACE M − 0 . 043 0.12 0.12 99 0 . 012 0.13 0.13 80 − 0 . 064 0.21 0.21 86 6 MEL − 0 . 002 0.12 0.13 98 − 0 . 004 0.14 0.14 75 0 . 007 0.21 0.21 88 6 β (4) ACE M − 0 . 046 0.13 0.13 99 0 . 008 0.13 0.13 74 − 0 . 072 0.22 0.21 89 58 MEL 0 . 000 0.13 0.13 97 − 0 . 00 8 0.14 0.14 69 0 . 008 0.21 0.21 87 45 β (7) ACE M − 0 . 054 0.15 0.14 96 0 . 012 0.14 0.13 70 − 0 . 094 0.23 0.22 86 95 MEL 0 . 002 0.14 0.13 89 − 0 . 00 4 0.14 0.13 64 0 . 008 0.21 0.20 79 88 Note . β (0) = ( − 0 . 693 , − 1 . 109 , 0); β (4) = ( − 0 . 849 , − 1 . 109 , − 0 . 4); β (7) = ( − 0 . 996 , − 1 . 109 , − 0 . 7) . S E = M on te Carlo standard error, ASE = a vera ge of the b ootstrap stand ard error from 50 b ootstrap samples; RE = rela tive eﬃciency (AS E(gold standard) 2 / ASE (missing) 2 ) × 100%; P o wer is for testing H 0 : β 3 = 0. ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 19 not con trol the type-I error rate; while if the calibration is reliable ( ρ = 0 . 9), then the A CEM algorithm can generally estimate well. T he MEL estimation outp erforms the A CEM in most settings, with a sligh t loss of eﬃciency d ue to the grouping of the surviv al times. 6. Discussion. W e ha ve pr op osed a framew ork for assessing an imm u no- logica l predictive surrogate in a v accine trial with a time to ev ent end p oin t and case-cohort sampling of the immunologic al biomarke r. While we hav e fo cused on the metho ds deve lopmen t for v accine trials, the prop osed prin- ciples are applicable for ev aluating p redictiv e su rrogate end p oint s in other biomedical app licatio ns. W e ha ve discussed study designs and estimation pro cedur es, and provided sim u lation results to demonstrate their v alidit y and applicabilit y un der as- sumptions. W e plan to apply the BIP-alone design to the three ongoing or p ending HIV v accine eﬃcacy trials. As demonstrated by the sim u lation study , if go o d baseline ir relev an t predictors exist, then a predictiv e surr ogate can b e ev aluated eﬀectiv ely . The CPV-alone design is also a u seful tool f or the assessmen t that is complimen tary to th e BIP-alone design. If resources p ermit, th e BIP + CPV d esign m erits consid eration b ecause it improv es ac- curacy and eﬃciency compared to the BIP-alone design if b aseline predictors are not closely correlated with the p oten tial predictive su rrogate, or if A4 app ears to b e violated (i.e., the BIP aﬀects clinical r isk after control ling for the p oten tial surrogate and ﬁr s t-phase baseline co v ariates). F or simp licit y , we assumed equal drop-out and r isk for eac h sub ject under assignmen t to v accine or placeb o o v er the time inte rv al [0 , t 1 ] (assumption A3), and restricted the analysis to sub jects at risk at the time the imm une resp onse is measured, t 1 . T o includ e all randomized sub jects, A3 can b e re- laxed by p ostulating th at the futur e immune resp onse th at will b e m easur ed at time t 1 impacts the risk of infection o ver [0 , t 1 ]. With the DFT Cox mo d el ( 1 ), the lik eliho o d con trib ution of a sub ject with early infection during [0 , t 1 ] can b e obtained as R { 1 − (1 − λ 01 ) exp { V i β 1 + sβ 2 + V i sβ 3 + W ′ i β 4 } } dP ( s ) , where P ( · ) is the marginal ( P ( s ) ) or cond itional distribu tion of S (1) ( P ( s | B i )) if the BIP B i is measur ed. Another w a y to p oten tially w eak en A3 w ould b e to assume equ al infection pr obabilities in [0 , t 1 ] for the v accine and placeb o groups, bu t not requ ire that the v accine has no eﬀect for eve r y in d ividual. A4 is a str ong untesta ble assumption. Because we assume B and S (1) are correlated, A4 implies th at the phase one co v ariates W capture all the causes of S (1) and the clinical end p oint [in the sense of P earl ( 2000 )]. F urthermore, it ma y b e diﬃ cu lt to ﬁnd a baseline cov ariate B that is kno wn to not aﬀect clinical risk after accoun ting f or S (1). W e su ggest three p oten tially useful B ’s for v accine trials. First, a study that v acci nated 75 ind ividuals sim ulta- neously with h epatitis A and B v accines show ed a lin ear correlation of 0.85 among A- and B-sp eciﬁc antibo d y titers [ Czesc hinsk i, Bind in g and Witting 20 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI ( 2000 )]. Given there is little cross-reactivit y among the hepatitis A and B proteins, B = hepatitis A titer ma y b e an excellen t baseline predictor for S (1) = h epatitis B titer that satisﬁes A4. F or HIV v accine trials, t w o a v ail- able scalar B ’s may p lausibly satisfy A4. First, F ollmann (2006) considered as B the an tib od y titer to a rabies glycoprotein v accine. Because rabies is not acquired sexually , it is p lausible th at ant i-rabies an tib o dies are inde- p endent of r isk of HIV inf ection giv en S (1). S econd, in the on going HIV v accine eﬃcacy trials, a current leading candidate B is the titer of antibo d - ies that neutralize the Adeno viru s serot yp e 5 ve ctor that carries the HIV genes in the v accine. This B has been sho w n to in versely correlate with the S (1) of primary interest (T cell r esp onse leve ls measured by ELISp ot) [ Catanzaro et al. ( 2006 )], and since Aden o virus 5 is a respiratory infection virus, A4 ma y plausibly hold. In general, though, it is desirable to relax A4, and fortunately this can b e done b y includin g B as a comp onent of W in the Co x mod el (1) and estimating its co eﬃcien t (as suggested by the Asso ciate E d itor). This extra co eﬃcien t for B is identi ﬁed by the data from v a ccine recipien ts with B measured. Based on the argumen t giv en by F ollmann (2006) and Gilb ert and Hudgens (2006) for the setting of the BIP-alone design and a dic hotomous clinical end p oint , we conjecture that the estimand VE ( s 1 ) w ill b e id en tiﬁed from the observ ed data as long as at least one of the inte raction terms of B with V or W with V is omitted f rom the Co x mo del. Our approac h sp eciﬁed a Cox regression w ith S i (1) and S i (1) V as co v ari- ates. Another appr oac h is to assume that th e immune resp on s e is measured with s ome “error,” S i (1) = S true i (1) + e i 1 and S c i (1) = S true i (1) + e i 2 (as is done in A5), b ut then to use the true immune resp onses S true i (1) and S true i (1) V as co v ariates in the Co x mo del. T o pro ceed with this mod el, one could obtain r ep licates of S i (1) and S c i (1), sa y , S i 1 (1) , S i 2 (1) and S c i 1 (1) , S c i 2 (1), and assume that the e i s follo w ed a Gaussian distribution with mean 0 and unknown v ariance τ 2 . Then a more complicated likelihoo d could b e writ- ten b y in tegrating S true i (1) o v er the distribution of S true i (1) | S i 1 (1) , S i 2 (1), S true i (1) | S c i 1 (1) , S c i 2 (1), or S true i (1) | B i as appr opriate. W e ha v e p resen ted estimated lik eliho o d based metho ds to accommod ate missing d ata in case-cohort designs, as w ell as a regression calibration b ased double-semiparametric EM algorithm that has reasonable p erformance w hen the regression calibration is reliable and the ev ent is r are. Th is approxi mate algorithm enjo ys the con ve nience of regression calibration to incorp orate auxiliary in formation, and h as faster and easier implementati on for the con- tin u ou s failure time mo d el. Alternativ e estimation metho ds such as multiple imputation ma y also b e useful, pr ovided the p osterior distribu tion can b e prop erly sp eciﬁed. I n add ition, a full lik eliho o d app r oac h that maximized o ve r ( β , λ 0 ) and the parameters of p ( s | w ) a nd p ( s | b, w ) all at once co uld b e used. While the full lik eliho o d should b e more eﬃcien t if the ent ire join t ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 21 mo del is correctly sp eciﬁed, MEL is simpler to implement and may b e more robust to join t mo del mis-sp eciﬁcatio n . APPENDIX: INTEGRAL CALCULA TION IN LIKELIHOOD ( ?? ) F or discrete S (1) and B , the in tegrations can b e replaced by ﬁnite sum- mations. When S (1) is con tin uous , the integ r ations can b e made easier by p ositing parametric mo dels. Assume S (1) ∼ N( µ ( · ) , σ ( · ) 2 ), wher e µ ( · ) , σ ( · ) 2 represent the ﬁ r st t wo m oments of p ( s ) or p ( s | b ). T hen for a giv en function g ( s ) of s , R g ( s ) p ( · ) ds = R g ( µ ( · ) + σ ( · ) u ) φ ( u ) du, where p ( · ) denotes p ( s ) or p ( s | b ) and φ ( u ) is the standard normal density fu nction. Because the in- tegrand g ( s ) in ( 4 ) is a smo oth function of s , numerica l metho d s suc h as Gaussian qu adrature can b e app lied to ev aluate the in tegration. Based on our exp erience, on ly a small n u m b er (around 15) of ev aluatio ns is needed to get stable quadrature resu lts. When B has discrete v alues b l , l = 1 , . . . , L , an alternativ e wa y to int egrate o ve r s is through the n onparametric represen tation of p ( s ) and p ( s | b ). The in tegrals R g ( s ) p ( · ) ds can b e ev aluated nonparametrically b y Z g ( s ) p ( s ) ds ≈ p 11 1 P i ∈ IC V δ i X i ∈ IC V δ i g ( S i (1)) + p 10 1 P i ∈ IC V (1 − δ i ) X i ∈ IC V (1 − δ i ) g ( S i (1)) , Z g ( s ) p ( s | b l ) ds ≈ p 11 1 P i ∈ IC V ,B i = b l δ i X i ∈ IC V ,B i = b l δ i g ( S i (1)) + p 10 1 P i ∈ IC V ,B i = b l (1 − δ i ) X i ∈ IC V ,B i = b l (1 − δ i ) g ( S i (1)) . Ac kn owledgmen ts. The authors than k Huayun Chen for pro v id ing his fortran cod e for the E M algorithm of Chen ( 2002 ), and the Ed itor, Asso ciate Editor and Referees for helpful commen ts. REFERENCES Borgan, O., Langholz, B., Samuelsen, O., Goldstein, L. and Pogod, J. (2000). Exp osure stratiﬁed case-cohort designs. Lifetime Data An al. 6 39–58. MR1767493 Ca t anzaro , A., Koup, R., Roederer, M. et al. (2006). Safet y an d immunogenici ty ev aluation of a multicl ade HIV-1 candidate va ccine delivered by a replication-defective recom b inant adeno virus vector. J. Infe ctious Dise ases 194 1638–1649 . Chan, I., W ang, W. and Heyse, J. (2003). V ac cine Clinic al T rials i n Encyclop e dia of Biopharmac eutic al Statistics , 2nd ed. Dek ker, New Y ork. 22 L. QIN , P . B. GILBER T, D . FOLLMANN AND D. LI Chen, H. Y. (2002). Double-semiparametric metho d for missing cov ariates in Co x regres- sion mo dels. J. Amer. Statist. Asso c. 97 565–576 . MR1941473 Chen, H. Y. and Little, R. J. A . (1999). Proportional hazards regression with missing co vari ates. J. Amer. Statist. Asso c. 94 896–908. MR1723311 Co x, D. R. (1972). Regression mod els and life-tables ( with discussion). J. R oy. Statist. So c. Ser. B 34 187–220. MR0341758 Czeschinski, P., Binding , N. and Witting, U. (2000). H epatitis A and hepatitis B v accinations: immunogenicit y of combined v accine and of simultaneously or separately applied single v accines. V ac cine 18 1074–1080 . DeGruttola, V. G., Clax , P., DeMe ts, D. L., D o wnin g, G. J., Ellenber g, S. S., Friedman, L., G ail, M . H., P rentice, R ., W ittes, J. and Zeger, S. L. (2002). Considerations in the ev aluation of surrogate endp oints in clinical trials. Contr ol le d Clinic al T rials 22 485–502. F ollmann, D. (2006). Augmented d esigns to assess immune resp onse in v accine trials. Biometrics 62 1161–1169. MR2307441 Frangakis, C. and Rubin, D. (2002). Principal stratiﬁcation in causal inference. Bio- metrics 58 21–29. MR1891039 Gilber t, P. B. and Hudgens, M. G. (2006). Eva luating causal eﬀect p redictiveness of candidate surrogate endp oints. W orking Pa p er 291, UW Biostatistics W orking Paper Series. Gilber t, P. B., Bosch, R. and Hudgens, M. G. (2003). Sensitivity analysis for th e assessmen t of causal va ccine eﬀects on viral loads in HIV v accine trials. Bi ometrics 59 531–541 . MR2004258 Gilber t, P. B., Peterson, M., Fol lma nn, D., Hud gens, M ., Francis, D., Gur with, M., Heyw ard, W., Job es, D., Popo vic, V., Self, S., S inangil, F., Burke, D. and Be rman, P. (2005). Correlatio n b etw een imm unologic resp onses to a recombinan t glycoprotein 120 v accine and incidence of HIV -1 infection in a phase 3 HI V-1 preven tive v accine trial. J. Inf e ctious Dise ases 191 666–677. Halloran, M. E. (1998). V accine stud ies. In Encyclop e dia of Bi ostatistics (P . Armitage and T. Colton, eds.) 4687–4694 . Wiley , New Y ork. Herring, A. H. and Ibrahim, J. G. (2001). Likel ihoo d-b ased metho ds for missing co- v ariates in the Cox p roportional hazards mo del. J. A m er. Statist. A sso c. 96 292–302. MR1952739 Holland, P. ( 1986). St atistics and causal inference (with discussion). J. Amer. Statist. Asso c. 81 945–970. MR0867618 Kulich, M. an d Lin, D. Y. (2004). Improving the eﬃciency of relative-risk estimation in case-cohort stu d ies. J. Amer. Statist. Asso c. 99 832–844. MR2090916 Lin, D., Fleming, T. R. and De Gruttola, V. (1997). Estimating the prop ortion of treatment eﬀect exp lained by a surrogate markers. Statistics in Me dicine 16 1515–1527 . Lin, D. Y. and Yin g, Z. ( 1993). Co x regression with incomplete cov ariate measuremen ts. J. Amer. Statist. Asso c. 88 1341–1349. MR1245368 Little, R. A. and Rub in, D. B. (2002). Statistic al Analysis of M i ssing Data , 2nd ed. Wiley , H oboken, NJ. MR1925014 Mehrotra, D. V., Li, X. and Gilber t, P. B. (2006). A comparison of eigh t method s for the dual-end p oint eval uation of eﬃcacy in a p roof-of-concept H I V v accine trial. Biometrics 62 893–900. MR2247219 Molenberghs, G., Buyse, M., Geys, H., Renard, D., Burz yk owski, T. and Alonso, A. (2002). Statistical challenges in the ev aluation of surrogate endp oints in rand omized trials. Contr ol le d Clinic al T rials 23 607–625. ASSESSI NG SUR ROGA TE ENDPOINTS IN V ACCINE TRIA LS 23 P aik, M. C. and Tsai, W. Y. (1997). On using the Co x prop ortional hazards mo del with missing cov ariates. Biometrika 84 579–593 . MR1603989 Pearl, J. (2000). Causality : Mo dels , R e asoning , and Infer enc e . Cam b ridge U niv. Press. MR1744773 Pepe, M. and Flemi ng, T . (1991). A nonparametric metho d for d ealing with mismea- sured cov ariate data. J. Amer. Statist. Asso c. 86 108–113. MR1137103 Prentice, R. L. (1982). Cov ariate measurement errors and parameter estimation in a failure time regression mod el. Biometrika 69 331–342. MR0671971 Prentice, R. L. (1986). A case-cohort design for epidemiologic cohort stu dies and disease preven tion trials. Biometrika 73 1–11. Prentice, R. L. (1989). S urrogate endp oints in clinical trials: Deﬁnition and operational criteria. Statistics in Me dicine 8 431–440. Ro bins, J. M., R otnitzky, A. and Zhao, L. P. (1994). Estimation of regression co- eﬃcien ts when some regressors are not alwa ys observed. J. Amer. Statist. A sso c. 89 846–866 . MR1294730 Rubin, D. B. (2005). Causal inference using p otential out comes: Design, mo deling, deci- sions. J. Amer. Statist. Asso c. 100 322–33 1. MR2166071 Scheike, T. H. and Mar tinussen, T. (2004). Maximum likelihoo d estimation for Cox’s regression mod el under case-cohort sampling. Sc and. J. Statist. 31 283–293. MR2066254 W ang, C. Y., Hsu, L. , Fen g, Z. D. and Prentice, R. L. (1997). Regression calibration in failure time regression. Biometrics 53 131–145. MR1450183 Weir, C. and W alley, R. (2006). St atistical ev aluation of b iomark ers as surrogate end- p oin ts: A literature review. Statistics in Me dicine 25 183–203. MR2222082 Zhou, H. B. and Pepe, M. S. (1995). Auxiliary cov ariate data in failure time regressio n. Biometrika 82 139–149. MR1332845 L. Qin P. B. Gilber t V accine and Infectious Disease Institute Fred Hutchinson Cancer Research Center Dep ar tment of Biost a tistics University of W ashing ton 1100 F air view A venu e Nor th, LE-400 Sea ttle, W ashington 9 8109 USA E-mail: lqin@scharp.org pgilb ert@sc harp. org D. Follmann Na tional Institute of Allergy and Infectious Diseases 6700B Rockledge Drive MSC 7609 Bethesda, Mar yland 20892 USA E-mail: dfollmann@niaid.nih.gov D. Li School of Ma thema tical Science Peking University Beijing 10087 1 P.R. China E-mail: ldf@math.pku.edu.cn

Assessing surrogate endpoints in vaccine trials with case-cohort sampling and the Cox model

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment