Dynamic likelihood hazard rate estimation

Dynamic lik eliho o d hazard rate estimation Nils Lid Hjort Univ ersity of Oslo and Universit y of Oxford Abstract. The best kno wn methods for estimating hazard rate functions in surviv al analysis models are either purely parametric or purely nonparametric. The parametric ones are sometimes to o biased while the nonparametric ones are sometimes too v ari- able. In the present pap er a certain semiparametric approach to hazard rate estimation, prop osed in Hjort (1991), is developed further, aiming to com bine parametric and non- parametric features. It uses a dynamic lo cal lik eliho o d approac h to ﬁt the lo cally most suitable mem ber in a given parametric class of hazard rates, and amounts to a v ersion of nonparametric parameter smo othing within the parametric class. Th us the parametric hazard rate estimate at time s inserts a parameter estimate that also dep ends on s . W e study bias and v ariance prop erties of the resulting estimator and methods for c ho osing the lo cal smoothing parameter. It is shown that dynamic lik eliho od estimation often leads to b etter p erformance than the purely nonparametric metho ds, while also ha ving capacit y for not losing m uc h to the parametric metho ds in cases where the model being smoothed is adequate. Key words: dynamic lik eliho od, hazard rate, kernel smoothing, lo cal goo dness of ﬁt, lo cal mo delling, semiparametric estimation 1. Introduction and summary . This paper concerns a class of semiparametric type methods of estimating hazard rate functions in mo dels for life history data. The b est kno wn metho ds for estimating such hazard rates are those that are either purely parametric or purely nonparametric. The parametric metho ds are usually biased since parametric mo dels are usually imp erfect, and the nonparametric methods often hav e high estimation v ariance. There should accordingly be ro om for metho ds that someho w lie betw een the parametric and the nonparametric ones. One migh t hop e that such metho ds are b etter than the nonparametric ones if the true hazard is in the vicinit y of the parametric mo del, while not b eing m uch w orse than the parametric ones if the parametric mo del is true. Although results can b e obtained in a more general framew ork of counting pro cess mo dels w e shall mainly b e conten t to illustrate and inv estigate ideas for the ‘random censorship’ mo del, whic h is the simplest and perhaps most important sp ecial case of such mo dels for censored life- time data. It p ostulates that life-times X 0 1 , . . . , X 0 n from a p opulation are i.i.d. with density f ( . ), cum ulative distribution F ( . ), and hazard rate function α ( . ) given by α ( s ) = f ( s ) /F [ s, ∞ ); α ( s ) d s is the probability of failing in [ s, s + d s ) giv en that an individual is still at risk at time s . The life-time X 0 i ma y not b e directly observ ed, ho w ever, b ecause of a p ossibly interfering censoring v ariable C i ; only X i = min( X 0 i , C i ) and the indicator v ariable δ i = I { X 0 i ≤ C i } are observed. F or simplicit y and concreteness w e stipulate that the C i ’s are independent of the life-times and i.i.d. according to a distribution with cumulativ e function G . In particular the n pairs ( X i , δ i ) are i.i.d. Finally we shall assume that data are obtained on a ﬁnite time horizon basis, say on [0 , T ] for a kno wn and ﬁnite T . This is conv enien t for some of the martingale conv ergence theory and is not a practical limitation. The parametric approac h is to postulate that α ( s ) = α ( s, θ ) for a suitable family , indexed b y some one- or multi-dimensional θ . Typical examples include the exponential, the W eibull, the simple frailt y mo del with α ( s ) = θ 1 / (1 + θ 2 s ), the piecewise constan t hazard rate mo del, the Gomp ertz– Mak eham distribution, the gamma, and the log-normal. Prop erties of the maximum lik eliho o d Nils Lid Hjort 1 April 1993 metho d for estimating θ with censored data ha ve been studied by Borgan (1984) and others under the condition that the model is correct, i.e. that there really is some θ 0 with α ( s ) = α ( s, θ 0 ) on [0 , T ]. In practice the mo del is never p erfect, how ev er, and it is useful to study estimation metho ds outside mo del conditions, where the b est parameter is to b e thought of as b eing ‘least false’ or ‘most suitable’, as opp osed to ‘true’. The large-sample b ehaviour of several estimation metho ds in this wider setting has been explored in Hjort (1992). Some results about this are reviewed in Section 2 and are used in later sections. In Section 3 a dynamic lik eliho od approach to parametric hazard rate estimation is presen ted. It tak es as its basis an y giv en parametric hazard function and consists of inserting a local parameter estimate b θ ( s ) in α ( s, θ ) at time s , pro ducing b α ( s ) = α ( s, b θ ( s )) , where the parameter estimate is obtained using only information on those individuals that ha v e surviv ed up to s − 1 2 h and what happ ens to them on [ s − 1 2 h, s + 1 2 h ]. This amoun ts to a kind of nonparametric parameter smo othing within a giv en parametric class. A more general estimator in volving smo othing with a k ernel function is also discussed. Bias and v ariance prop erties are studied in Section 3 for one-dimensional and in Section 4 for m ulti-dimensional families. It turns out that E b α ( s ) . = α ( s ) + 1 2 β K h 2 b ( s ) and V ar b α ( s ) . = γ K nh α ( s ) y ( s ) , where β K and γ K are characteristics of the kernel function used and y ( s ) is the limiting prop ortion of individuals still at risk at time s . The b ( s ) is a certain bias factor, the size of which dep ends on both α 00 ( s ) and characteristics of the underlying parametric mo del used. These results matc h closely those of the most usual nonparametric metho d, that of smo othing the empirical cum ulativ e hazard function, for which E e α ( s ) . = α ( s ) + 1 2 β K h 2 α 00 ( s ) and V ar e α ( s ) . = γ K nh α ( s ) y ( s ) . In Section 5 situations are c haracterised where the new metho d performs b etter than the traditional nonparametric metho d. Metho ds for c ho osing the local smo othing parameter h are discussed in Section 6, including the arduous one that for each s expands the s ± 1 2 h in terv al un til a go o dness of ﬁt criterion rejects the mo del. Overall it transpires that a suitable dynamic likelihoo d estimator often can p erform b etter than the purely nonparametric ones, while at the same tim e not losing m uch to parametric ones when the true hazard is close to the parametric hazard. Finally some supplemen ting results and remarks are oﬀered in Section 7. This pap er expands in sev eral wa ys on the basic results that w ere already presented in Hjort (1991). That pap er also proposed t w o further semiparametric estimation schemes, one using or- thogonal expansions to correct on an initial parametric guess, and one Ba y esian pro cedure that emplo ys a nonparametric prior around a given parametric hazard mo del. 2. Purely nonparametric and purely parametric estimation. This section in tro duces some basic notation and reviews properties of the Nelson–Aalen estimator for the cumulativ e haz- ard function in the nonparametric case and of the maxim um lik eliho od and maxim um w eighted Dynamic Likelihoo d 2 April 1993 lik eliho o d estimators in the parametric case. These will b e used in later sections. Since our ambi- tion is to go b eyond ordinary parametric metho ds the behaviour of these must be considered also outside the conditions of the p ostulated parametric mo del. 2.1. Nonp arametric estima tion. Let N ( t ) = P n i =1 I { X i ≤ t, δ i = 1 } b e the coun ting pro cess and Y ( s ) = P n i =1 I { X i ≥ s } the at risk pro cess, and form from these the Nelson–Aalen estimator b A ( t ) = Z t 0 d N ( s ) Y ( s ) = n X i =1 δ i Y ( X i ) I { X i ≤ t } (2 . 1) for the cumulativ e hazard rate A ( t ) = R t 0 α ( s ) d s . Its properties are b est explained using the martingale B ( t ) = N ( t ) − R t 0 Y ( s ) α ( s ) d s . Let y ( s ) b e the limit in probability of b y ( s ) = Y ( s ) /n , i.e. the limiting prop ortion of individuals under risk at time s , and equal to F [ s, ∞ ) G [ s, ∞ ) under presen t circumstances, where G ( . ) is the censoring distribution. A basic large-sample prop erty of B is that B ( t ) / √ n go es to a Gaußian martingale V ( t ) with indep enden t incremen ts and noise level V ar d V ( s ) = y ( s ) α ( s ) d s , and, more generally , that R t 0 H n ( s ) d B ( s ) / √ n tends to R t 0 h ( s ) d V ( s ) in distribution, in cases where H n ( . ) is previsible (its v alue at s is kno wn at s − ) and conv erges to the deterministic h ( . ). It follo ws from these facts that √ n { d b A ( s ) − d A ( s ) } = I { Y ( s ) ≥ 1 } b y ( s ) d B ( s ) √ n − I { Y ( s ) = 0 } d A ( s ) . = d b y ( s ) − 1 d B ( s ) / √ n → d y ( s ) − 1 d V ( s ) (2 . 2) in the large-sample limit. In particular d b A ( s ) is v ery nearly un biased for d A ( s ) and √ n { b A ( t ) − A ( t ) } tends to the Gaußian martingale R t 0 y ( s ) − 1 d V ( s ) with v ariance R t 0 y ( s ) − 1 α ( s ) d s . See for example the recen t bo ok Andersen, Borgan, Gill & Keiding (1993, Chapter I I) for more details. The usual nonparametric wa y of estimating the hazard rate itself is to smo oth the Nelson–Aalen and take the deriv ativ e, see (5.1). 2.2. Maximum likelihood estima tion. A parametric mo del is of the form α ( t ) = α ( t, θ ), where θ = ( θ 1 , . . . , θ p ) 0 is some p -dimensional parameter. The log-likelihoo d for the observed data can be written L n ( θ ) = R T 0 { log α ( t, θ ) d N ( t ) − Y ( t ) α ( t, θ ) d t } , see for example Andersen et al. (1993, Chapter VI). This deﬁnes the maximum likelihoo d estimator b θ . T o explain the large-sample behaviour of this estimator, let U n ( θ ) = n − 1 R T 0 ψ ( t, θ ) { d N ( t ) − Y ( t ) α ( t, θ ) } d t b e the p -v ector of ﬁrst partial deriv atives of n − 1 L n ( θ ), where we write ψ ( t, θ ) = ∂ ∂ θ log α ( t, θ ). Under natural regularity conditions U n ( θ ) tends in probability to u ( θ ) = R T 0 y ( t ) ψ ( t, θ ) { α ( t ) − α ( t, θ ) } d t , with y ( t ) as ab o v e. The maximum likelihoo d estimator, whic h solves U n ( b θ ) = 0, con v erges in probability to the particular parameter v alue θ 0 that solv es u ( θ 0 ) = 0. W e think of this as the ‘least false’ or ‘agnostic’ parameter v alue, and it minimises the distance measure d [ α, α ( ., θ )] = Z T 0 y  α { log α − log α ( ., θ ) } − { α − α ( ., θ ) }  d t (2 . 3) b et w een true model and approximating mo del. This is pro v ed in Hjort (1992). In later sections we shall also need the large-sample distribution, and quote the follo wing result from Hjort (1992). Con- sider the p × p -matrix ψ ∗ ( t, θ ) = ∂ 2 log α ( t, θ ) /∂ θ∂ θ 0 and the function E ( t ) = R t 0 y ( s ) ψ ( s, θ 0 ) { α ( s ) − Nils Lid Hjort 3 April 1993 α ( s, θ 0 ) } d s (in particular E (0) = E ( T ) = 0). Deﬁne p × p -matrices J = Z T 0 h y ( t ) ψ ( t, θ 0 ) ψ ( t, θ 0 ) 0 α ( t, θ 0 ) − y ( t ) ψ ∗ ( t, θ 0 ) { α ( t ) − α ( t, θ 0 ) } i d t, M = Z T 0 h y ( t ) ψ ( t, θ 0 ) ψ ( t, θ 0 ) 0 α ( t ) +  ψ ( t, θ 0 ) E ( t ) 0 + E ( t ) ψ ( t, θ 0 ) 0  α ( t, θ 0 ) i d t. Then √ n ( b θ − θ 0 ) → d N p { 0 , J − 1 M J − 1 } . Note that under model conditions α ( t ) is indeed equal to α ( t, θ 0 ), the expressions for J and M simplify and b ecome equal, and w e hav e the more familiar- lo oking limit distribution N p { 0 , J − 1 } , a result prov ed by Borgan (1984). 2.3. M-estima tors. W e shall also need some general results ab out w eigh ted lik eliho o d estimators, from Hjort (1992, Section 5). Consider R T 0 G n ( t ) { log α ( t, θ ) d N ( t ) − Y ( t ) α ( t, θ ) d t } instead of the ordinary log-likelihoo d (which uses G n ( t ) = 1), and let b θ g maximise. This estimator also solv es R T 0 G n ( t ) ψ ( t, θ ) { d N ( t ) − Y ( t ) α ( t, θ ) d t } = 0, and belongs to the class of M-estimators for this counting pro cess mo del, see Hjort (1985) and Andersen et al. (1993, Chapter VI). Assume that the weigh t function G n ( t ) is previsible and go es in probability to g ( t ). The ﬁrst result is that this estimator is consistent for the particular least false parameter v alue θ 0 ,g that minimises the distance function d g [ α, α ( ., θ )] = Z T 0 g y  α { log α − log α ( ., θ ) } − { α − α ( ., θ ) }  d t, (2 . 4) a generalisation of (2.3). It also solves R T 0 g ( t ) y ( t ) ψ ( t, θ ) { α ( t ) − α ( t, θ ) } d t = 0. Secondly , √ n ( b θ g − θ 0 ,g ) → d N p { 0 , J − 1 g M g J − 1 g } , (2 . 5) where J g and M g are appropriate generalisations of those app earing abov e. In fact J g = Z T 0 g y  ψ 0 ψ 0 0 α 0 − ψ ∗ 0 ( α − α 0 )  d t, M g = Z T 0  g 2 y ψ 0 ψ 0 0 α + g { ψ 0 E 0 g + E g ψ 0 0 } α 0  d t, (2 . 6) in which E g ( t ) = R t 0 g y ψ 0 ( α − α 0 ) d s , and where α 0 = α ( s, θ 0 ,g ), ψ 0 = ψ ( s, θ 0 ,g ). Note that both E g (0) and E g ( T ) are equal to 0, and that the expressions for J g and M g simplify when the model happ ens to be correct. 3. Dynamic lik eliho o d estimation. Of course the parametric estimation metho d of 2.2 w orks b est if the postulated mo del is adequate, i.e. if there really is a single θ 0 that secures α ( s ) . = α ( s, θ 0 ) throughout [0 , T ]. Otherwise there is mo delling bias present and it could for example b e adv an tageous to use diﬀerent θ 0 ’s in diﬀeren t sub-in terv als. W e shall pursue a somewhat more extreme version of this idea, namely to ﬁt a lo cal estimate b θ ( s ) for eac h s , and then use α ( s, b θ ( s )) in the end. 3.1. D ynamic likelihood. The dynamic or lo cal likelihoo d estimation prop osal is to use the M-estimator apparatus with a ‘windo w function’ G n ( t ) = g ( t ) = I { t ∈ W } , where W = [ s − 1 2 h, s + 1 2 h ] is a lo cal in terv al around a giv en ﬁxed s . So let b θ ( s ) maximise L W ( θ ) = Z W { log α ( t, θ ) d N ( t ) − Y ( t ) α ( t, θ ) d t } . (3 . 1) Dynamic Likelihoo d 4 April 1993 The resulting dynamic likeliho o d hazar d r ate estimator is b α ( s ) = α ( s, b θ ( s )) . (3 . 2) Note that L W ( θ ), the lo cal log-lik eliho o d at window W around s , is a b ona ﬁde log-lik eliho o d, namely that based on those individuals that hav e survived up to s − 1 2 h and information about what happ ens to them in [ s − 1 2 h, s + 1 2 h ]. Showing this is not diﬃcult by ﬁrst noting that this group of individuals hav e probabilit y density = α ( t, θ ) exp  − R t s − h/ 2 α ( u, θ ) d u  for t ∈ [ s − 1 2 h, s + 1 2 h ] , and chance = exp  − R s + h/ 2 s − h/ 2 α ( u, θ ) d u  of further surviving s + 1 2 h. Consciously disregarded, for example, is information ab out individuals failing in [0 , s − 1 2 h ). In- cluding such a [1 − exp {− A ( s − 1 2 h, θ ) } ] n 0 term would ha ve strengthened the lik eliho o d and made our θ estimator more precise – but only if the parametric form of the hazard is correct also to the left of s − 1 2 h . The crucial idea here is to only trust the parametric form lo cally , and this leads to the (3.1) log-likelihoo d. Of course if h is large, whic h should corresp ond to trusting the mo del ov er the full range, then w e get back the full log-lik eliho o d and ordinary maximum lik eliho o d. The b θ ( s ) estimator aims at the lo cally most suitable parameter v alue θ 0 ( W ) = θ 0 ( s ) that minimises (2.4) with g = I W , or, equiv alen tly , solv es R W y ( t ) ψ ( t, θ ) { α ( t ) − α ( t, θ ) } d t = 0. Its large-sample b ehaviour is describ ed b y (2.5), whic h suggests E b θ ( s ) . = θ 0 ( s ) , V AR b θ ( s ) . = J − 1 W M W J − 1 W /n, where J W and M W are as in (2.6) with g ( t ) = I { t ∈ W } . This transforms into corresponding prop erties for b α ( s ) by T a ylor expansions and delta-method argumen ts: E α ( s, b θ ( s )) . = α ( s, θ 0 ( s )) , V ar α ( s, b θ ( s )) . = α ( s, θ 0 ( s )) 2 ψ ( s, θ 0 ( s )) 0 J − 1 W M W J − 1 W ψ ( s, θ 0 ( s )) /n. (3 . 3) These approximations are v alid if h is ﬁxed and n is large. But we are also interested in b ecoming increasingly ﬁne-tuned ab out the s ± 1 2 h in terv al as n grows. In order to study the bias and v ariance prop erties more closely , observe ﬁrst that if z ( t ) is a t wice diﬀeren tiable function deﬁned in a neighbourho o d of s , then R W z ( t ) d t . = z ( s ) h + 1 24 z 00 ( s ) h 3 b y a simple T aylor argumen t. F rom this and the deﬁning equation for θ 0 ( s ) we see that y ( s ) ψ ( s, θ ) { α ( s ) − α ( s, θ ) } + 1 24  y ψ ( ., θ )( α − α ( s, θ ))  00 ( s ) h 2 . = 0 , for the particular v alue θ = θ 0 ( s ), where ( f g h ) 00 ( s ) means the second deriv ative of the f ( s ) g ( s ) h ( s ) function ev aluated at s . This implies generally that α ( s, θ 0 ( s )) = α ( s ) + O ( h 2 ). One can also sho w from this that E α ( s, b θ ( s )) = α ( s, θ 0 ( s )) + O (1 /n ) = α ( s ) + O ( h 2 + 1 /n ) . In order for the bias of the (3.2) estimator to go to zero it is therefore necessary that h → 0 as n → ∞ . Nils Lid Hjort 5 April 1993 A t the momen t we shall b e con tent to giv e a bias formula for the case of a one-parameter family α ( s, θ ), for which α ( s, θ 0 ( s )) . = α ( s ) + h 2 24 h α 00 ( s ) − α 00 0 ( s ) + 2 { α 0 ( s ) − α 0 0 ( s ) } n y 0 ( s ) y ( s ) + ψ 0 0 ( s ) ψ 0 ( s ) oi . (3 . 4) In this form ula α 0 0 ( s ) means the deriv ativ e of α ( s, θ ) w.r.t. s , and then inserted θ = θ 0 ( s ), and similarly for α 00 0 ( s ) and ψ 0 0 ( s ). The case of multi-parametric classes of hazard rates is handled in Section 4. T urning next to the v ariance matrix, one ﬁnds after using the (2.6) expressions and the pre- viously established O ( h 2 ) result for the bias that J W = y ( s ) ψ 0 ( s ) ψ 0 ( s ) 0 α ( s, θ 0 ( s )) h + O ( h 3 ) and M W = y ( s ) ψ 0 ( s ) ψ 0 ( s ) 0 α ( s ) h + O ( h 3 ), under smo othness assumptions on α ( . ) and y ( . ), and writ- ing for simplicit y ψ 0 ( s ) for ψ ( s, θ 0 ( s )). W e note here for the one-parameter case that V ar b θ ( s ) . = ( nh ) − 1 { y ( s ) α ( s ) ψ 0 ( s ) 2 } − 1 , which in its turn implies V ar α ( s, b θ ( s )) . = 1 nh α ( s ) y ( s ) . (3 . 5) Th us nh → ∞ is necessary for the v ariance to go to zero, and this together with h → 0 suﬃces for consistency of the (3.2) estimator. 3.2 Special case: estima ting the local const ant. The simplest mo del to try out is the one having α ( s, θ ) = θ , a constan t hazard. The lo cal hazard estimate and its limit in probability are b α ( s ) = b θ ( s ) = R W d N ( t ) R W Y ( t ) d t → p R W y ( t ) α ( t ) d t R W y ( t ) d t = θ 0 ( s ) , (3 . 6) again with W = [ s − 1 2 h, s + 1 2 h ]. The estimate is of the t yp e total o ccurrence ov er total exp osure, and the underlying local least false parameter is a local y -w eighted av erage of the true hazard rate. By earlier eﬀorts E b α ( s ) . = α ( s ) + h 2 24 n α 00 ( s ) + 2 α 0 ( s ) y 0 ( s ) y ( s ) o and V ar b α ( s ) . = 1 nh α ( s ) y ( s ) . (3 . 7) This can also b e veriﬁed directly . F urther attention to these details is giv en in the next subsection. A general remark about the dynamic lik elihoo d method is that the particular parametric model used should b e allo wed to b e quite crude, since w e only employ it as a lo cal appro ximation to the true hazard rate. This example illustrates this. (3.7) sho ws that ev en when α ( . ) simplistically is mo delled as being lo cally a constan t the result is a reasonable nonparametric estimator. 3.3. Kernel smoothed dynamic likelihood. The dynamic lik eliho o d method of Sections 3.1 and 3.2 can be generalised to kernel smo othed v ariants. Let K ( u ) b e a symmetric kernel function with support [ − 1 2 , 1 2 ] and integral 1. Deﬁne the local kernel smo othed likelihoo d estimator b θ ( s ) to maximise e L W ( θ ) = Z W K  h − 1 ( t − s )  log α ( t, θ ) d N ( t ) − Y ( t ) α ( t, θ ) d t  . (3 . 8) The hazard rate estimator is as in (3.2) with this more general estimate of θ . The previously deﬁned lo cal lik eliho o d estimate corresp onds to the special case K ( u ) = 1 on [ − 1 2 , 1 2 ]. This uniform c hoice has perhaps some sp ecial appeal since the dynamic log-likelihoo d L W ( θ ) then can b e interpreted Dynamic Likelihoo d 6 April 1993 as a gen uine log-lik eliho o d for a subgroup of the individuals under study . The current smo othed lik eliho o d is more of a mathematical construction, but turns out to pro duce estimators with sligh tly b etter prop erties, for go od c hoices of K ( u ). W e can dra w on the general results of 2.3 to ﬁnd appro ximate bias and v ariance for the maximiser of (3.8). Let β K = R u 2 K ( u ) d u and γ K = R K ( u ) 2 d u . (2.6) with T a ylor expansion quic kly giv es J W = y ( s ) ψ 0 ( s ) ψ 0 ( s ) 0 α ( s, θ 0 ( s )) h + O ( h 3 ) , M W = γ K y ( s ) ψ 0 ( s ) ψ 0 ( s ) 0 α ( s ) h + O ( h 3 ) , (3 . 9) The multi-parameter case requires more precise expansions, since the in verse of J W is needed and ψ 0 ( s ) ψ 0 ( s ) 0 has rank 1. Leaving the m ulti-parameter case for Section 4, consider an arbitrary one- parameter family α ( s, θ ), where b θ ( s ) solves R W K ( h − 1 ( t − s )) ψ ( t, θ ) { d N ( t ) − Y ( t ) α ( t, θ ) d t } = 0. It aims at the locally least false θ 0 = θ 0 ( W ) that solves R W K ( h − 1 ( t − s )) ψ ( t, θ ) y ( t ) { α ( t ) − α ( t, θ ) } d t = 0, or R K ( u ) ψ ( s + hu, θ ) y ( s + hu ) { α ( s + hu ) − α ( s + hu, θ ) } d u = 0. T aylor expansion sho ws that R K ( u ) z ( s + hu ) d u = z ( s ) + 1 2 β K h 2 z 00 ( s ) + O ( h 4 ) for smo oth z ( . ) functions, and this, in conjunction with (3.3) and (3.9), leads to E α ( s, b θ ( s )) . = α ( s ) + 1 2 h 2 β K b ( s ) and V ar α ( s, b θ ( s )) . = γ K nh α ( s ) y ( s ) , (3 . 10) where the bias factor is b ( s ) = α 00 ( s ) − α 00 0 ( s ) + 2 { α 0 ( s ) − α 0 0 ( s ) } n y 0 ( s ) y ( s ) + ψ 0 0 ( s ) ψ 0 ( s ) o . (3 . 11) The fact that R uK ( u ) d u = 0 is used here. When K ( u ) is uniform w e get bac k (3.4) and (3.5). Observ e that the appro ximate v ariance do es not dep end on the parametric family emplo y ed (to the order of appro ximation used). 3.4. Special case: local const ant with a kernel. Let us illustrate this for the sp ecial case where α ( s, θ ) = θ . Then b α ( s ) = b θ ( s ) = R W K ( h − 1 ( t − s )) d N ( t ) R W K ( h − 1 ( t − s )) Y ( t ) d t = P | x i − s |≤ h/ 2 K ( h − 1 ( x i − s )) δ i P n i =1 R W ∩ [0 ,x i ] K ( h − 1 ( t − s )) d t , (3 . 12) a locally weigh ted occurrence ov er locally w eighted exp osure estimate. Here and later on x i denotes the observed v alue of X i = min( X 0 i , C i ). Previous eﬀorts give E b α ( s ) = α ( s ) + 1 2 β K h 2  α 00 ( s ) + 2 α 0 ( s ) y 0 ( s ) /y ( s )  + O ( h 4 ) , and v ariance γ K ( nh ) − 1 α ( s ) /y ( s ) as b efore. This generalises (3.7). One theoretical adv an tage that (3.12) has o ver the (3.6) estimator is that it has smaller mean squared error, for several natural c hoices of k ernel K , see 6.1. A more immediate practical adv antage is that K can b e chosen to make it smoother than the (3.6) version, which is discontin uous at time p oin ts s where s ± 1 2 h is equal to observed failure times. (3.12) is contin uous when K ( ± 1 2 ) = 0, and has a contin uous deriv ativ e if K is chosen suc h that K 0 ( ± 1 2 ) = 0. 4. Dynamic lik eliho od for m ulti-parameter families. The dynamic likelihoo d and k ernel smoothed dynamic lik eliho od ideas of Section 3 can be applied for any smo oth parametric Nils Lid Hjort 7 April 1993 family of hazards, but the basic bias and v ariance prop erties hav e so far only b een derived for one-parameter families. W e sa w in (3.9), for example, that the m ulti-parameter case requires more careful expansions. It is not clear at the outset that w e gain in precision by smo othing e.g. a t wo-parameter hazard family . W e should perhaps exp ect larger windo ws to b e required to be able to estimate b oth parameters with reasonable precision. 4.1. A r unning Gomper tz estima tor. The hazard function mo del α ( t ) = a exp( β t ) is sometimes called the Gomp ertz mo del. Concen trating on a ﬁxed s with ﬁxed window W = s ± 1 2 h , w e ma y reparametrise the hazard as α ( t, θ , β ) = a exp( β s ) exp( β ( t − s )) = θ exp( β ( t − s )) for t ∈ [ s − 1 2 h, s + 1 2 h ] , (4 . 1) and interpret θ as the ‘local lev el’ and β as the ‘lo cal slop e’. Deﬁne b θ ( s ) and b β ( s ) as those maximising the kernel smo othed dynamic lik eliho o d e L W ( θ , β ) = Z W K ( h − 1 ( t − s ))  { log θ + β ( t − s ) } d N ( t ) − Y ( t ) θ exp( β ( t − s )) d t  . (4 . 2) One has b θ ( s, β ) = R W K ( h − 1 ( t − s )) d N ( t ) R W K ( h − 1 ( t − s )) Y ( t ) exp( β ( t − s )) d t , (4 . 3) and the resulting proﬁle dynamic likelihoo d can be sho wn to b e concav e in β , and accordingly not v ery diﬃcult to maximise. The maximiser found is then inserted in to (4.3) to give b θ ( s ). Note that the general dynamic likelihoo d recipe giv es b α ( s ) = α ( s, b θ ( s ) , b β ( s )) = b θ ( s ) , (4 . 4) simply , so the β parameter estimate is only somewhat silen tly present. F rom the general theory of Section 2.3 w e kno w that b θ ( s ) and b β ( s ) aim at certain appropriate least false parameter v alues θ 0 = θ 0 ( s ) and β 0 = β 0 ( s ), dep ending on the window W , and that √ n ( b θ ( s ) − θ 0 , b β ( s ) − β 0 ) go es to a zero-mean normal with cov ariance matrix J − 1 W M W J − 1 W . Here J W and M W are as in (2.6) with g = I W . W e now set out to pro vide informativ e approximations for these and for the least false lo cal parameters. The least false parameter v alues are suc h that they solve the t wo equations R W K ( h − 1 ( t − s )) ψ ( t, θ 0 , β 0 ) y ( t ) { α ( t ) − θ 0 exp( β 0 ( t − s )) } d s = 0, where ψ ( t, θ , β ) = (1 /θ , t − s ). The ﬁrst equation giv es θ 0 = R W K ( h − 1 ( t − s )) y ( t ) α ( t ) d t R W K ( h − 1 ( t − s )) y ( t ) exp( β 0 ( t − s )) d t = R K ( u ) y ( s + hu ) α ( s + hu ) d u R K ( u ) y ( s + hu ) exp( β 0 hu ) d u , where the latter integrals are ov er the support [ − 1 2 , 1 2 ] for the kernel function K ( u ). Up on using R K ( u ) z ( s + hu ) d u = z ( s ) + 1 2 β K h 2 z 00 ( s ) + O ( h 4 ) again, one ﬁnds after some calculations that θ 0 . = α ( s ) + 1 2 β K h 2 [ { y ( t ) α ( t ) } 00 ( s ) − α ( s ) { y ( t ) exp( β 0 ( t − s )) } 00 ( s )] /y ( s ) = α ( s ) + 1 2 β K h 2 b ( s, β 0 ) , sa y , up to O ( h 4 ) terms, where in fact b ( s, β 0 ) = α 00 ( s ) − α ( s ) β 2 0 + 2 { y 0 ( s ) /y ( s ) }{ α 0 ( s ) − α ( s ) β 0 } . Similarly the second equation gives R K ( u ) u y ( s + hu ) { α ( s + hu ) − θ 0 exp( β 0 hu ) } d u = 0, whic h Dynamic Likelihoo d 8 April 1993 up on using R K ( u ) u z ( s + hu ) d u = β K z 0 ( s ) h + O ( h 3 ) deliv ers β 0 = α 0 ( s ) /α ( s ) + O ( h 2 ). This can b e plugged in to b ( s, β 0 ) ab ov e to giv e α ( s, θ 0 ( s ) , β 0 ( s )) = θ 0 = α ( s ) + 1 2 β K h 2 { α 00 ( s ) − α 0 ( s ) 2 /α ( s ) } + O ( h 4 ) . (4 . 5) Note that the bias is only O ( h 4 ) at s if the true α ( . ) is lo cally like a Gompertz hazard. Next consider the matrices that determine the approximate v ariances for b θ ( s ) and b β ( s ). F rom (2.6), J W = Z W K ( h − 1 ( t − s )) y ( t ) h  1 /θ 2 0 ( t − s ) /θ 0 ( t − s ) /θ 0 ( t − s ) 2  θ 0 exp( β 0 ( t − s )) +  1 /θ 2 0 0 0 0  { α ( t ) − θ 0 exp( β 0 ( t − s )) } i d t. W e ﬁnd J 11 = h Z K ( u ) y ( s + hu ) θ − 2 0 α ( s + hu ) d u = hy ( s ) α ( s ) /θ 2 0 + 1 2 β K h 3 ( y α ) 00 ( s ) /θ 2 0 + O ( h 5 ) for the (1,1) element. Similar calculations give J W = h h  y ( s ) α ( s ) /θ 2 0 0 0 0  + β K h 2  a 11 a 12 a 12 a 22  i + O ( h 5 ) , where in fact a 11 = 1 2 ( y α ) 00 ( s ) /θ 2 0 , a 12 = y 0 ( s ) + y ( s ) β 0 , and a 22 = y ( s ) θ 0 . Next lo ok at M W = Z W h K ( h − 1 ( t − s )) 2 y ( t )  1 /θ 2 0 ( t − s ) /θ 0 ( t − s ) /θ 0 ( t − s ) 2  α ( t ) + K ( h − 1 ( t − s ))  2 E 1 ( t ) /θ 0 E 2 ( t ) /θ 0 + E 1 ( t )( t − s ) E 2 ( t ) /θ 0 + E 1 ( t )( t − s ) 2( t − s ) E 2 ( t )  i d t, where E 1 ( t ) and E 2 ( t ) are the components of the E ( t ) function deﬁned after (2.6). It turns out that E 1 ( t ) = O ( h 3 ) while E 2 ( t ) = O ( h 4 ), so the second part of the M W matrix is of a smaller size than the ﬁrst. W e ﬁnd after some expansion w ork that M W = h h  γ K y ( s ) α ( s ) /θ 2 0 0 0 0  + δ K h 2  b 11 b 12 b 12 b 22  i +  O ( h 4 ) O ( h 5 ) O ( h 5 ) O ( h 5 )  , where γ K = R K ( u ) 2 d u and δ K = R u 2 K ( u ) 2 d u , and where b 11 = 1 2 ( y α ) 00 ( s ) /θ 2 0 , b 12 = ( y α ) 0 ( s ) /θ 0 , and b 22 = y ( s ) α ( s ). T o reac h expressions for J − 1 W M W J − 1 W w e need to w ork with a matrix of the form ( cE 11 + h 2 A ) − 1 ( dE 11 + h 2 B )( cE 11 + h 2 A ) − 1 , where E 11 is the matrix with 1 as (1,1) elemen t and zeros elsewhere. The result, after length y but elementary calculations, is of the form J − 1 W M W J − 1 W =  h − 1 γ K α ( s ) /y ( s ) + c 11 h + O ( h 2 ) h − 1 c 12 + O ( h ) h − 1 c 12 + O ( h ) h − 3 ( δ K /β 2 K ) / { y ( s ) α ( s ) } + h − 1 c 22 + O (1)  , for certain c ij . W e are primarily interested in the appro ximate v ariance for the local b θ ( s ), in view of (4.4), and this is ( nh ) − 1 γ K α ( s ) /y ( s ) + O ( h/n ), precisely as in the one-dimensional case (3.10). Hence bias and v ariance prop erties are of the same form as in the one-dimensional case, but with a diﬀerent bias factor, inherited from the model one smo oths. Nils Lid Hjort 9 April 1993 4.2. Dynamic likelihood for a general mul ti-p arameter model. Supp ose the hazard rate model is of the t yp e α ( t ) = aγ ( t, β ), i.e. a constan t parameter a times a function whic h dep ends on a p ossibly multi-dimensional parameter β but not on a . Reparametrise lo cally to α ( t ) = aγ ( s, β ) { γ ( t, β ) /γ ( s, β ) } = θ exp { C ( t, β ) − C ( s, β ) } for t ∈ [ s − 1 2 h, s + 1 2 h ] . (4 . 6) The score type function of the mo del is ψ ( t, θ , β ) = (1 /θ , C ∗ ( t, β ) − C ∗ ( s, β )), where C ∗ ( t, β ) = ∂ ∂ β C ( t, β ). Notice that the lo cal estimate b β ( s ) is only ‘silently present’, in that it is used only in conjunction with ﬁnding the lo cal b θ ( s ), as with (4.3) and (4.4). No w the calculations of the Gomp ertz mo del ab o v e can be repeated with the necessary mo di- ﬁcations. As in that case one ﬁnds θ 0 = α ( s ) + 1 2 β K h 2 b ( s, β 0 ) with a similar b ( s, β 0 ), and also that θ 0 c ( s, β 0 ) = α 0 ( s ) + O ( h 2 ), where c ( t, β ) = ∂ ∂ t C ( t, β ). This leads to E b α ( s ) = α ( s ) + 1 2 β K h 2  α 00 ( s ) − α ( s ) { c ( s, β 0 ) 2 + c 0 ( s, β 0 ) } + 2 { y 0 ( s ) /y ( s ) }{ α 0 ( s ) − α ( s ) c ( s, β 0 ) }  + O ( h 4 ) = α ( s ) + 1 2 β K h 2  α 00 ( s ) − θ 0 { c ( s, β 0 ) 2 + c 0 ( s, β 0 ) } + 2 { y 0 ( s ) /y ( s ) } O ( h 2 )  + O ( h 4 ) = α ( s ) + 1 2 β K h 2 { α 00 ( s ) − α 00 0 ( s ) } + O ( h 4 ) , where α 00 0 ( s ) is the second deriv ativ e of the mo del’s hazard rate θ exp { C ( t, β ) − C ( s, β ) } w.r.t. t , ev aluated at s , and with the lo cal least false parameters θ 0 = θ 0 ( W ) and β 0 = β 0 ( W ) inserted. W e also ha ve α 00 0 ( s ) = θ 0 { c ( s, β 0 ) 2 + c 0 ( s, β 0 ) } = α 0 ( s ) 2 /α ( s ) + α ( s ) c 0 ( s, β 0 ) + O ( h 2 ). Note that (4.5) is a sp ecial case. One next ﬁnds that the (1,1) elemen t of the appropriate J − 1 W M W J − 1 W matrix is y et again equal to h − 1 γ K α ( s ) /y ( s ) + O ( h ), alb eit with a more in v olved expression for the constan t in the secondary O ( h ) term. The basic prop erties for the dynamic likelihoo d estimator are accordingly once more of the familiar type E b α ( s ) = α ( s ) + 1 2 β K h 2 { α 00 ( s ) − α 00 0 ( s ) } + O ( h 4 ) and V ar b α ( s ) . = γ K nh α ( s ) y ( s ) , (4 . 7) with a bias term 1 2 β K h 2 b ( s ) appropriate to the parametric model emplo y ed. As noted ab o ve there are also alternativ e useful expressions for the b ( s ) term, since we can mov e out O ( h 2 ) terms. 4.3. A running Weibull estima tor. As an example of the previous general machin- ery , consider the W eibull mo del, which uses α ( t ) = abt b − 1 for certain parameters a and b . W e reparametrise to α ( t ) = θ ( t/s ) β , where θ = abs b − 1 and β = b − 1. Let b θ ( s ) and b β ( s ) maximise the k ernel smoothed dynamic log-likelihoo d Z W K ( h − 1 ( t − s ))  { log θ + β (log t − log s ) } d N ( t ) − Y ( t ) θ ( t/s ) β d t  . Then use b α ( s ) = b θ ( s ) in the end. The results ab o v e imply θ 0 β 0 = sα 0 ( s ) + O ( h 2 ), and the bias is 1 2 β K h 2 { α 00 ( s ) − θ 0 ( β 2 0 − β 0 ) /s 2 } + O ( h 4 ) = 1 2 β K h 2 { α 00 ( s ) − α 0 ( s ) 2 /α ( s ) + α 0 ( s ) /s } + O ( h 4 ) . (4 . 8) The appro ximate v ariance is yet again γ K ( nh ) − 1 α ( s ) /y ( s ). Note that the bias is only O ( h 4 ) if the true hazard is lo cally a W eibull hazard. Dynamic Likelihoo d 10 April 1993 4.4. A dynamic nonincreasing estima tor. As a ﬁnal example, consider the simple frailty mo del with hazard rate a/ (1 + β t ). This is the hazard rate in a population where eac h individual has a constan t hazard rate but where these v ary in the p opulation according to a gamma distribution with mean a and v ariance β . The local parametrisation is θ (1 + β s ) / (1 + β t ) for t ∈ [ s − 1 2 h, s + 1 2 h ]. Ev en though the mo del can tolerate a small negativ e v alue for β we shall in this example tak e it a priori as a nonnegative quantit y . So let b θ ( s ) and b β ( s ) maximise Z W K ( h − 1 ( t − s ))  { log θ + log(1 + β s ) − log (1 + β t ) } d N ( t ) − Y ( t )(1 + β s ) d t/ (1 + β t )  , and use b α ( s ) = b θ ( s ) in the end. Then E b α ( s ) = α ( s ) + 1 2 β K h 2 { α 00 ( s ) − 2 α 0 ( s ) 2 /α ( s ) } + O ( h 4 ) and V ar b α ( s ) . = γ K nh α ( s ) y ( s ) . (4 . 9) 5. Comparison with the traditional k ernel estimator. Estimators dev elop ed in Sections 3 and 4 can no w be compared with the classical nonparametric estimator. 5.1. The smoothed Nelson–Aalen estima tor. The traditional nonparametric estimator is a k ernel smooth of the (2.1) estimator of the cumulativ e, e α ( s ) = Z W h − 1 K ( h − 1 ( t − s )) d b A ( t ) = X | x i − s |≤ h/ 2 h − 1 K ( h − 1 ( x i − s )) δ i / Y ( s i ) . (5 . 1) When K ( u ) is uniform this b ecomes { b A ( s + 1 2 h ) − b A ( s − 1 2 h ) } /h , for example. F rom the prop erties of b A reviewed in Section 2 it is not diﬃcult to derive E e α ( s ) . = α ( s ) + 1 2 β K h 2 α 00 ( s ) and V ar e α ( s ) . = γ K nh α ( s ) y ( s ) . (5 . 2) See also Ramlau-Hansen (1983), Y andell (1983), and T anner and W ong (1983), who all studied estimators of this t yp e, and Andersen et al. (1993, Chapter IV). It is remark able that the new estimators α ( s, b θ ( s )) and the traditional one hav e exactly the same approximate v ariance and the same type of approximate bias, when they use the same k ernel and the same bandwidth; see (3.10), (4.4) and (4.7). 5.2. When is the dynamic method al w a ys better? The dynamic k ernel smo othed lik eli- ho od estimator has approximate bias 1 2 β K h 2 b ( s ), with a b ( s ) function depending on the underlying parametric family used. In view of the comparison already made ab o v e it follo ws that the new metho d is alwa ys as goo d as or b etter than the Ramlau-Hansen–Y andell estimator, with the same k ernel and the same window size, provided only | b ( s ) | ≤ | α 00 ( s ) | . F or the one-parameter situation the question is whether | α 00 ( s ) − α 00 0 ( s ) + 2 { α 0 ( s ) − α 0 0 ( s ) }{ y 0 ( s ) /y ( s ) + ψ 0 0 ( s ) /ψ 0 ( s ) }| ≤ | α 00 ( s ) | . (5 . 3) This can easily happ en if the parametric family is only mo derately acceptable. F or the sp ecial case (3.12), for whic h α 0 0 ( s ), α 00 0 ( s ) and ψ 0 0 ( s ) are absen t, the inequalit y migh t tak e place in regions where α is conv ex and increasing, or conca ve and decreasing. When there is no censoring y = exp( − A ) Nils Lid Hjort 11 April 1993 and y 0 /y = − α , and then the criterion for when (3.12) is b etter than then the traditional (5.1) b ecomes 0 ≤ α ( s ) α 0 ( s ) /α 00 ( s ) ≤ 1. F or the multi-parametric families of Section 4 w e ha v e established b ( s ) = α 00 ( s ) − α 00 0 ( s ), with α 00 0 ( s ) stemming from the model used, and with certain useful alternativ e expressions. The dynamic lik eliho o d estimator is b etter than (5.1), when the same h is used, whenev er | α 00 ( s ) − α 00 0 ( s ) | ≤ | α 00 ( s ) | , which can be rewritten 0 ≤ α 00 0 ( s ) α 00 ( s ) ≤ 2 (5 . 4) when the second deriv ativ e of the true hazard is not zero. If the parametric mo del used is locally correct, then the ratio is 1 and the bias is O ( h 4 ) only . If we tak e ‘the parametric mo del is roughly adequate’ to mean (5.4), then indeed the dynamic lik eliho od estimator is alwa ys b etter than (5.1) under such circumstances, for each h and eac h K . F or the t wo-parametric running Gomp ertz estimator (4.4) the criterion is 0 ≤ α 0 ( s ) 2 / { α ( s ) α 00 ( s ) } ≤ 2 , (5 . 5) and the ratio is 1 exactly for Gomp ertz hazards a exp( β s ). If for example α ( s ) = a + be cs is of Gomp ertz–Mak eham form, then the ratio is be cs / ( a + be cs ) and w ell inside (0 , 2), sho wing that (4.4) will b e b etter than (5.1) for all suc h hazards. Similarly , for the tw o-parametric running W eibull estimator of 4.3, the new estimator is alwa ys b etter than (5.1) in regions where 0 ≤ α 0 ( s ) 2 / { α ( s ) α 00 ( s ) } − α 0 ( s ) / { sα 00 ( s ) } ≤ 2 , (5 . 6) and the function app earing in the middle is equal to 1 exactly for W eibull hazards. And ﬁnally the criterion for when the estimator of 4.4 is alw ays b etter than (5.1) is 0 ≤ α 0 ( s ) 2 / { α ( s ) α 00 ( s ) } ≤ 1 . (5 . 7) 5.3. Vicinity of p arametric model. As explained ab ov e one can exp ect the metho ds dev elop ed to p erform better than the traditional (5.1), and surely also b etter than other purely nonparametric estimators, if only the parametric mo del used is roughly adequate. So far this statemen t has b een referring to a comparison when the t w o metho ds hav e used the same window width h . But in these cases w e w ould really expect the new metho ds to perform not only b etter but m uch better, b y carefully c ho osing a go o d windo w width. When the bias is smaller we can select a larger windo w and b e rewarded with smaller v ariabilit y , cf. the mean squared error calculations of 6.1. It should also b e possible to improv e on the conv ergence rate if the true hazard lies suitably close to the parametric family . A mathematical framework to mak e this notion more precise could b e as follo ws. There is a sequence of exp eriments where at stage n there are data ( X i , δ i ) on n individuals coming from a distribution with true hazard α ( . ) = α n ( . ). Supp ose this is such that the bias factor is b ( s ) = b 0 ( s ) n −  , for some  ∈ [0 , 1 2 ). Then the best ac hiev able mean squared error is of size proportional to n − (4 − 2  ) / 5 , and this happ ens with h chosen as a suitable h 0 = cY n ( s ) − (1 − 2  ) / 5 . A cross v alidation or other clever h selection scheme will pic k this up, cf. the following section. The b est nonparametric conv ergence rate is n − 4 / 5 , for b oth p oint-wise and in tegrated mean squared error, and these calculations sho w that this can be impro ved up on for alternatives in the vicinit y Dynamic Likelihoo d 12 April 1993 of the parametric mo del. If α 00 ( s ) − α 00 0 ( s ) = O ( n − 1 / 4 ), for example, then the mean squared error is O ( n − 9 / 10 ), and alternativ es lying almost O ( n − 1 / 2 ) aw a y , in the ab ov e sense, are estimated with almost full parametric O ( n − 1 ) precision. The p oin t of comparison is that the traditional (5.1) estimator will still only accomplishes O ( n − 4 / 5 ) precision for these hazard rates. 6. Cho osing the smo othing parameter. W e hav e deﬁned b α ( s ) = α ( s, b θ ( s )) for given parametric family α ( s, θ ) and kernel K . The most decisiv e inﬂuence on the estimator is due to the smo othing parameter h . 6.1. Mean squared error calcula tions. By (3.10) and (4.7) the appro ximate mean squared error is of the form E { b α ( s ) − α ( s ) } 2 . = 1 4 h 4 β 2 K b ( s ) 2 + γ K nh α ( s ) y ( s ) , where b ( s ) is the appropriate bias factor stemming from the parametric recipe used. The mean squared error is minimised for h 0 ( s ) = n γ K β 2 K α ( s ) b ( s ) 2 o 1 / 5 1 { ny ( s ) } 1 / 5 . (6 . 1) The resulting minimal mean squared error is 5 4 ( β K γ 2 K ) 2 / 5 α ( s ) 4 / 5 b ( s ) 2 / 5 / { ny ( s ) } 4 / 5 . Diﬀeren t c hoices of reasonable k ernels give ab out the same result, but the b est choice, managing to minimise β K γ 2 K among kernels on [ − 1 2 , 1 2 ] with integral 1, is the Bartlett–Y epanec hniko v k ernel K 0 ( u ) = 3 2 (1 − 4 u 2 ) on [ − 1 2 , 1 2 ]. The resulting α ( s, b θ ( s )) estimator is con tinuous in s but its deriv ativ e will ha ve discon tinuities at p oints s where s ± 1 2 h hits an observed failure time. W e ha v e seen that the new metho ds can outp erform the traditional ones b y reducing the bias, sa y from b trad ( s ) = α 00 ( s ) of (5.1) to p ossibly smaller b ( s ) = α 00 ( s ) − α 00 0 ( s ) for those of Section 4. It is therefore of in terest to note that the squared bias makes up 20% and the v ariance 80% of the appro ximate mean squared error, so bias reduction can p erhaps not b e exp ected to giv e dramatic gains. If b ( s ) = 1 2 b trad ( s ), for example, then the b est theoretical window width becomes h 0 = 1 . 32 h 0 , trad , and the mean squared error is reduced with 24%. The h 0 form ula cannot b e put to direct use since it depends on the hazard rate itself, but the rate of conv ergence to zero of mean squared error b ecomes the optimal n − 4 / 5 when h is chosen prop ortional to n − 1 / 5 . The formula indicates that h should be c hosen proportional to Y ( s ) − 1 / 5 in practice. One p ossibilit y is to use h n = cY ( s ) − 1 / 5 and try to minimise a global criterion lik e E R T 0 w ( s ) { b α ( s ) − α ( s ) } 2 d s w.r.t. c . The result is a lo cal v ariable k ernel smoothed lik eliho o d estimator with h n = n γ K β 2 K R T 0 w ( s ) y ( s ) − 4 / 5 α ( s ) d s R T 0 w ( s ) y ( s ) − 4 / 5 b ( s ) 2 d s o 1 / 5 1 Y ( s ) 1 / 5 . (6 . 2) W e migh t for example c ho ose w eigh t function w ( s ) = y ( s ) 4 / 5 here, this b eing in v ersely prop ortional to the optimal mean squared error, and this simpliﬁes (6.2). The nominator integral can b e esti- mated with n 1 / 2 -precision. Some pilot estimate b α pil , lik e the (5.1) estimator with an o verall twice diﬀeren tiable k ernel K 2 and a somewhat large h 2 , can be used to estimate the denominator in tegral. A ﬁnal adjustmen t is needed since R T 0 b b ( s ) 2 d s will b e biased. W orking out expressions for the bias of R T 0 b b ( s ) 2 d s as an estimator of R T 0 b ( s ) 2 d s tak es some eﬀorts, for the most interesting estimators Nils Lid Hjort 13 April 1993 of Sections 3 and 4, but is within comfortable reac h of Ramlau-Hansen’s (1983) metho ds. In the end this pro duces a practical algorithm of ‘plug-in’ t yp e. The discussion ab o ve is v alid for one-parameter families and also for the class of multi- parameter families considered in Section 4, since the appro ximate v ariance of b α ( s ) also in these situations turned out to be of the form γ K ( nh ) − 1 α ( s ) /y ( s ). In the mo dels of Section 4 there is a ‘lo cal position’ parameter θ and a ‘lo cal slope’ parameter β . Note that the local slop e estimate b β ( s ) has quite larger v ariance than the lo cal position estimate b θ ( s ). In the running Gomp ertz case the slop e estimate has v ariance prop ortional to n − 1 h − 3 / { y ( s ) α ( s ) } , for example. The b est windo w size for β estimation is prop ortional to Y ( s ) − 1 / 7 , but the best size for θ estimation, whic h is our primary concern, is still prop ortional to Y ( s ) − 1 / 5 . These quantitativ e results are p erhaps as exp ected, in view of similar results from densit y estimation and nonparametric regression. They also suggest that the b β ( s ) that is inserted in b θ ( s, β ) of (4.3) to pro duce the ﬁnal (4.4) can b e quite v ariable if pro duced from Y ( s ) − 1 / 5 -windo ws, and it may b e adv antageous to use a separate estimation sc heme for estimation of this parameter, with somewhat larger windo ws. See also Remarks 7B and 7G. The reasoning that led to (6.1) and (6.2) is also p ertinent for the problem of choosing h in the Ramlau-Hansen–Y andell estimator (5.1), since the bias and v ariance structure are of the same t ype, only with b ( s ) = α 00 ( s ) instead. W e also note that there are other wa ys of obtaining a data-driven h n ( s ), like cross v alidation or b o otstrapping, but these are not discussed further here. References to cross v alidation tec hniques for the (5.1) estimator are Nielsen (1990) and Gr´ egoire (1993), and these techniques should carry ov er at least to the (3.12) estimator. 6.2. Local goodness of fit testing. If the parametric mo del do esn’t ﬁt well the dynamic lik eliho o d hazard estimator is still reasonable, and resembles the nonparametric Nelson–Aalen smo other (5.1) in p erformance. At the same time our metho d is able to outp erform (5.1) as w ell as other purely nonparametric metho ds in cases where the parametric family α ( s, θ ) used is only roughly acceptable, as explained in Section 5. In such cases the size of the bias is small, which b y (6.2) suggests using quite a large bandwidth h , whic h in its turn almost amounts to using an ordinary parametric metho d. A natural but somewhat elaborate strategy is to c ho ose h = b h ( s ) to be the smallest h for whic h some conv enien t goo dness of ﬁt criterion rejects the parametric model on s ± 1 2 h . The ultimate case is of course no detectable departure from the mo del ov er the full range [0 , T ], whic h then leads to using h = ∞ , i.e. ordinary parametric estimation α ( s, b θ [0 ,T ] ), say . Hjort (1985, 1990) has dev elop ed classes of goo dness of ﬁt tests for general parametric coun ting pro cess models, and these are indeed presented there as tests of v alidit y o ver the full range [0 , T ]. Similar mathematical tec hniques can ho wev er b e used to construct procedures that chec k mo del adequacy o ver a general [ a, b ] in terv al, and some suc h are presented next. This apparatus w ould then b e used with [ a, b ] = [ s − 1 2 h, s + 1 2 h ], mostly , but to get the running estimator started one w ould look for mo del adequacy ov er [0 , b ] in terv als ﬁrst, cf. Remark 7A. 6.3. One-p arameter f amilies. Consider dynamic smo othing of an arbitrary one-dimensio- nal parametric family α ( u, θ ). Let b θ [ a,b ] b e the lo cal maxim um likelihoo d estimator using only [ a, b ] information, i.e. it solves R b a ψ ( u, θ ) { d N ( u ) − Y ( u ) α ( u, θ ) d u } = 0. Let D n ( t ) = n − 1 / 2 Z t a ψ ( u, b θ [ a,b ] ) { d N ( u ) − Y ( u ) α ( u, b θ [ a,b ] ) d u } for t ∈ [ a, b ] . Dynamic Likelihoo d 14 April 1993 It uses the ‘basic martingale’ d N ( u ) − α ( u, θ ) d u and is able to pic k up departures from the para- metric mo del. Notice that D n ( . ) starts and ends at zero. Metho ds of Hjort (1990) can be used to prov e that D n ( . ), if indeed the model holds on [ a, b ], con verges to a zero-mean Gaußian pro cess D ( . ) with co v ariance function co v { D ( t 1 ) , D ( t 2 ) } = τ 2 ( b ) { p ( t 1 ∧ t 2 ) − p ( t 1 ) p ( t 2 ) } , in which τ 2 ( t ) = R t a y ( u ) ψ ( u, θ ) 2 α ( u, θ ) d u and p ( t ) = τ 2 ( t ) /τ 2 ( b ). But this sho ws that D ( . ) is distributed as a scaled and time-transformed Brownian bridge, τ ( b ) W 0 ( p ( . )). Consequently max a ≤ t ≤ b | D n ( t ) | / b τ ( b ) is asymptotically distributed as k W 0 k = max 0 ≤ s ≤ 1 | W 0 ( s ) | , where b τ 2 ( b ) = R b a n − 1 Y ( u ) ψ ( u, b θ [ a,b ] ) 2 α ( u, b θ [ a,b ] ) d u estimates τ 2 ( b ). A natural procedure is therefore to stretc h the [ a, b ] = [ s − 1 2 h, s + 1 2 h ] in terv al until n Z b a Y ( u ) ψ ( u, b θ [ a,b ] ) 2 α ( u, b θ [ a,b ] ) d u o − 1 / 2 max a ≤ t ≤ b    Z t a ψ ( u, b θ [ a,b ] ) { d N ( u ) − Y ( u ) α ( u, b θ [ a,b ] ) d u }    ≥ 1 . 225 , (6 . 3) sa y , 1.225 b eing the upper 10% p oin t of the distribution of k W 0 k . One might opt for 1.359 instead, the upp er 5% point. Observe that the maxim um v alue must be attained at one of the p oin ts x i of x i − , with a ≤ x i ≤ b , so the con tinuous maxim um is really only a ﬁnite maxim um, and is p erfectly feasible to compute eﬃciently , for given s ± 1 2 h window. When c ho osing window sizes for the (3.12) estimator, for example, which uses lo cal constan ts, the windows should be stretc hed un til N [ a, b ] − 1 / 2 max a ≤ t ≤ b    N [ a, t ] − Z t a Y ( u ) b θ [ a,b ] d u    ≥ 1 . 225 , (6 . 4) where in this case b θ [ a,b ] = N [ a, b ] / R b a Y ( u ) d u . 6.4. Local model adequa cy for mul ti-p arameter hazard ra tes. Next turn attention to dynamic lik eliho od smo othing of a m ulti-parametric class of hazards, with p ≥ 2 parameters. Let this time D n ( t ) = n − 1 / 2 n N [ a, t ] − Z t a Y ( u ) α ( u, b θ [ a,b ] ) d u o for t ∈ [ a, b ] , with a view tow ards using the maximal absolute v alue as a test for mo del adequacy . Here b θ [ a,b ] is the local maximum likelihoo d estimator using [ a, b ]-information, i.e. solving the p equations R b a ψ ( u, θ ) { d N ( u ) − Y ( u ) α ( u, θ ) d u } = 0. T ec hniques of Hjort (1990) can be used to demonstrate pro cess conv ergence of D n ( . ) tow ards D ( t ) = V [ a, t ] −  Z t a y ( u ) ψ ( u, θ ) α ( u, θ ) d u  0 Σ − 1 Z b a ψ ( u, θ ) d V ( s ) , where V ( . ) is a Gaußian martingale with noise level V ar d V ( u ) = y ( u ) α ( u, θ ) d u , and where Σ = R b a y ( u ) ψ ( u, θ ) ψ ( u, θ ) 0 α ( u, θ ) d u . The D 0 ( t ) = V [ a, t ] = R t a d V ( u ) pro cess is quite simple, it has indep enden t increments and hence is a scale- and time-transformed Bro wnian motion pro cess. The p oin t is now that if one considers the D 0 ( . ) process conditioned on the p ev en ts R b a ψ ( u, θ ) d V ( u ) = 0, then Gaußianeity and co v ariance calculations can b e furnished to demonstrate that this is exactly distributed as the D ( . ) process; cf. Remark 7F in Hjort (1990). This makes it possible to b ound the distribution of k D k = max a ≤ t ≤ b | D ( t ) | , even though the exact distribution might b e to o diﬃcult to obtain. Nils Lid Hjort 15 April 1993 W e no w sp ecialise to a class of hazards of the form α ( u, θ ) = θ γ ( u, β ), cf. (4.6), in which case the ψ ( . ) function has ﬁrst comp onen t 1 /θ and second comp onen t φ ( u, β ), sa y . In this case the limit pro- cess D ( . ) is distributed as D 0 ( . ), tied do wn ﬁrst with D 0 ( b ) = 0 and then with R b a φ ( u, β ) d V ( u ) = 0. Letting D ∗ ( . ) b e the result of tying down D 0 ( . ) with only the ﬁrst requirement, cov ariance calculations show that D ∗ ( . ) = d τ ( b ) W 0 ( p ( . )), this time with τ 2 ( t ) = R t a y ( u ) θ γ ( u, β ) d u and p ( t ) = τ 2 ( t ) /τ 2 ( b ). So the distribution of D ( . ) is that of tying do wn D ∗ ( . ) further, and it can b e seen that the distribution of k D k is stochastically smaller than the distribution of k D ∗ k , just as the distribution of a maximal absolute Bro wnian bridge is sto c hastically smaller than the dis- tribution of a maximal absolute Brownian motion. Here τ 2 ( b ) is estimated consistently with R b a n − 1 Y ( u ) b θ [ a,b ] γ ( u, b β [ a,b ] ) d u , and w e also ha ve b θ [ a,b ] = N [ a, b ] / R b a Y ( u ) γ ( u, b β [ a,b ] ) d u . The end result is to use N [ a, b ] − 1 / 2 max a ≤ t ≤ b    N [ a, t ] − Z t a Y ( u ) b θ [ a,b ] γ ( u, b β [ a,b ] ) d u    ≥ 1 . 225 (6 . 5) as a conserv ativ e 10% lev el test criterion for rejecting θ γ ( u, β ) as a mo del for the hazard on [ a, b ]. It is worth noting that the diﬀerence b etw een the distributions of k D k and k D ∗ k is small when the [ a, b ] interv al is not large, pro vided the model b eing tested has the local reparametrisation form (4.6). This can b e shown after expanding the Σ − 1 matrix here in a wa y similar to that for J − 1 W in Sections 4.1 and 4.2. Th us 1.225 ab ov e is mean t to be a conserv ative v alue but actually also an appro ximation to the real 0.90 point of the null distribution. Of course this approximation cannot b e expected to be ov erly precise, and some exp erimentation with the 1.225 rejection limit w ould b e needed. On the computational side we point out that the maxim um again must b e attained for one of t = x i or x i − with a ≤ x i ≤ b . F urthermore, Z x i a Y ( u ) b θ [ a,b ] γ ( u, b β [ a,b ] ) d u = X j : x j ≥ a b θ [ a,b ] { G ( x i ∧ x j , b β [ a,b ] ) − G ( a, b β [ a,b ] ) } , (6 . 6) where G ( t, β ) = R t 0 γ ( u, β ) d u . 6.5. Other tests for model adequacy on inter v als. There are naturally other possible go odness of ﬁt tests for in terv als, see Hjort (1990) for other D n ( . ) t yp e functions and for classes of c hi squared t yp e tests and Hjort and Lumley (1993) for normalised local hazard plots. Chi squared metho ds w ould be awkw ard to implement in a general w ay here, since the [ a, b ] interv als w ould often b e short. W e record a couple of potentially useful v ariations on the D n ( . ) theme, ho wev er, with a view to wards quic k calculations and decisions, since tests are to b e carried out on slowly expanding [ s − 1 2 h, s + 1 2 h ] interv als for each s . (6.3)–(6.5) arose as maxima of the D n ( . ) pro cess, and utilised con vergence to suitably scaled and time-transformed Brownian bridges, as with Kolmogorov–Smirno v type tests. Martingale tec hniques for the coun ting pro cess N can be used to show Z b a | D n ( t ) | q d N ( t ) /n → d Z b a | D ( t ) | q y ( t ) θ γ ( t, β ) d t ≤ d Z b a | τ ( b ) W 0 ( p ( t )) | q τ 2 ( b ) dp ( t ) = τ ( b ) 2+ q Z 1 0 | W 0 ( s ) | q d s, Dynamic Likelihoo d 16 April 1993 where ‘ ≤ d ’ means ‘stochastically smaller than’. F or q = 2 we hav e a Cram´ er–v on Mises t yp e test, with rejection criterion b τ ( b ) − 4 Z b a D n ( t ) 2 d N ( t ) /n = n − 1 P a ≤ x i ≤ b D n ( x i ) 2 δ i ( n − 1 N [ a, b ]) 2 = P a ≤ x i ≤ b { N [ a, x i ] − R x i a Y ( u ) b θ [ a,b ] γ ( u, b β [ a,b ] ) d u } 2 δ i N [ a, b ] 2 ≥ 0 . 347 (6 . 7) on the 10% signiﬁcance level. With wished for 5% lev el we would use 0.461 instead, the 0.95 quan tile of the R 1 0 W 0 ( s ) 2 d s distribution. The second v ariation is for q = 1, where we use b τ ( b ) − 3 Z b a | D n ( t ) | d N ( t ) /n = n − 1 P a ≤ x i ≤ b | D n ( x i ) | δ i ( n − 1 N [ a, b ]) 3 / 2 = P a ≤ x i ≤ b   N [ a, x i ] − R x i a Y ( u ) b θ [ a,b ] γ ( u, b β [ a,b ] ) d u   N [ a, b ] 3 / 2 ≥ 0 . 499 (6 . 8) for in tended 10% signiﬁcance level, and 0.582 for intended 5% signiﬁcance lev el. 0.499 and 0.582 are upp er quantiles of the R 1 0 | W 0 ( s ) | d s distribution. There are simpler one-parameter analogues to (6.7) and (6.8), essentially as in these form ulae but with γ ( u, β ) = 1. Note that (6.6) can b e used when computing any of these test statistics. Some experimentation with these b h = b h ( s ) selectors is necessary . One should a v oid using to o small windows since this w ould lead to to o irregular lo cal estimates. W e should therefore only searc h for acceptable windo ws s ± 1 2 h with h at least as large as some suitably determined h 0 ( s ). One p ossibility is to demand at least k observ ed x i ’s in the window, say with k = 10. Hence the (6.3)–(6.5) and (6.7)–(6.8) stopping criteria are to b e used with such a mo diﬁcation. Secondly the realised b h ( s ) could b e somewhat irregular as a function of s . A natural mo diﬁcation is to smo oth this curve ﬁrst, before ﬁnally computing the lo cal lik eliho od estimate α ( s, b θ [ s − b h ( s ) / 2 ,s + b h ( s ) / 2] ). 7. Supplementing remarks. 7A. St ar ting the estima tor. W e ha v e deﬁned b α ( s ) = α ( s, b θ ( s )) with parameter estimate obtained from s ± 1 2 h data, which also means that a separate deﬁnition is required for s ≤ 1 2 h . One natural strategy is to use the mo del adequacy on in terv als metho ds of Sections 6.3–6.5 to ﬁnd the smallest b for which the model is rejected on [0 , b ], and then use b α ( s ) = α ( s, b θ [0 ,b 0 ] ) for s ∈ [0 , b 0 ], with a somewhat smaller b 0 than b . Another p ossibilit y is to use α ( s, b θ ( 1 2 h )) on [0 , 1 2 h ]. 7B. Post-smoothing of p arameter estima tes. The basic estimator is b α ( s ) = α ( s, b θ ( s )) where b θ ( s ) uses only s ± 1 2 h information. It is useful in practical applications to display not only the ﬁnal b α ( s ) but also the parameter estimate function or functions b θ ( s ). Sometimes this function has discon tinuities, cf. (3.12) and the requiremen ts on K noted there to give smoothness. A general alternativ e is to post-smo oth the parameter estimates, b efore plugging in to giv e b α ( s ). Comments in 5.1, for example, suggest using p ost-smoothing of b β ( s ) in (4.3) and (4.4). 7C. Density estima tion with d ynamic likelihood. When α ( . ) is estimated one can of course also estimate other quan tities dep ending on α ( . ). The lo cal likelihoo d metho ds of Sections 3 and 4 therefore apply to nonparametric or semiparametric density estimation as well, via the f ( t ) = α ( t ) exp {− A ( t ) } connection. Metho ds given there can b e used to obtain a lo cally estimated Nils Lid Hjort 17 April 1993 normal densit y of the t yp e b f ( t ) = N { b µ ( t ) , b σ ( t ) 2 } ( t ), for example. There are at least tw o general immediate p ossibilities, namely b f 1 ( t ) = α ( t, b θ ( t )) exp {− A ( t, b θ ( t ) } and b f 2 ( t ) = h Y [0 ,t ) { 1 − α ( s, b θ ( s )) d s } i α ( t, b θ ( t )) = exp n − Z t 0 α ( s, b θ ( s )) d s o α ( t, b θ ( t )) . The simplest case would again be that of a lo cally constan t hazard, for which b f 1 ( t ) = b θ ( t ) exp {− b θ ( t ) t } and b f 2 ( t ) = exp n − Z t 0 b θ ( s ) d s o b θ ( t ) . These are somewhat cum b ersome densit y estimators. There are b etter sc hemes more directly geared to wards the densit y estimation problem, but still with the same local likelihoo d characteristics, see Hjort and Jones (1993). 7D. Regression models. Metho ds of this pap er can b e made to work in situations with co v ariate information. Consider the Co x regression mo del where individual i has hazard rate of the form α i ( s ) = α 0 ( s ) exp( β 0 z i ) for s ∈ [0 , T ] and i = 1 , . . . , n. The α 0 ( . ) function is the hazard rate for individuals with cov ariate vector z = 0, and is left unsp eciﬁed. This baseline hazard function can now b e estimated using dynamic lik eliho od. If w e ﬁt a lo cal constant on window W = s ± 1 2 h the recip e is to maximise the kernel smo othed log-lik eliho o d n X i =1 Z W K ( h − 1 ( t − s )) { (log θ + β 0 z i ) d N i ( t ) − Y i ( t ) θ exp( β 0 z i ) d t } , where d N i ( t ) = I { x i ∈ [ t, t + d t ] , δ i = 1 } and Y i ( t ) = I { x i ≥ t } are the 0–1 coun ting pro cess and at risk pro cess for individual i . This gives b α 0 ( s ) = P n i =1 R W K ( h − 1 ( t − s )) d N i ( t ) P n i =1 R W K ( h − 1 ( t − s )) Y i ( t ) exp( b β 0 z i ) d t . Here b β could be ev aluated only lo cally , but if one trusts the Cox mo del then β remains constan t o ver the [0 , T ] range, and we should accordingly use the same b β regardless of s . But this is the same as smo othing the traditional Breslow estimator. One can similarly construct a nonparametric α 0 ( . ) estimator b y ﬁtting a running W eibull θγ s γ − 1 , for example. The result is of the form b α 0 ( s ) = P n i =1 R W K ( h − 1 ( t − s )) d N i ( t ) P n i =1 R W K ( h − 1 ( t − s )) Y i ( t ) b γ ( s ) t b γ ( s ) − 1 exp( b β 0 z i ) d t . Dynamic lik eliho o d metho ds can also be dev elop ed in Aalen’s linear hazard rate regression mo del, by lo cal parametric modelling of the hazard factor functions. See Hjort (1993a). 7E. Modera tel y incorrect p arametric models. A parametric mo del does not hav e to b e fully p erfect in order for the metho ds based on it to b e b etter than more conserv ativ e ones. In Dynamic Likelihoo d 18 April 1993 Hjort (1993b) a ‘tolerance distance’ is calculated from a mo derately incorrect model to a wider and correct one; inside the tolerance radius estimators based on the incorrect mo del are b etter than those based on the correct mo del. F or an example, suppose the true mo del is the gamma one, with hazard function inherited from the density f ( s, θ , γ ) = { θ γ / Γ( γ ) } s γ − 1 exp( − θ s ). Then estimators based on the incorrect assumption of a constant rate (which corresponds to γ = 1) are b etter than the t wo-parameter metho ds if | γ − 1 | ≤ 1 . 245 / √ n (assuming no censoring). This can b e seen as y et another argumen t for not giving up simple parametric metho ds, ev en though the underlying mo dels might b e wrong. 7F. Counting process models. Metho ds and results of this paper can b e generalised in v arious directions. They could b e dev elop ed for Aalen’s general multiplicativ e in tensit y mo del for coun ting pro cesses, and hence b e used to estimate hazard transition rates in time-inhomogeneous Mark ov chains, for example. There will then b e a more complicated expression for the M W matrix of (2.6), but otherwise there will b e few complications. In another direction our results could b e extended to the full halfline [0 , ∞ ) with appropriate extra assumptions on the censoring mechanism. 7G. More theor y. In our presen tation w e ha v e concen trated on the perhaps most immediate asp ects of the dynamic likelihoo d estimation metho d. There are further natural questions to ask and further natural results to prov e. (i) One can pro ve uniform consistency of the (3.12) estimator without to o m uch w ork, for example. One can more generally establish max a ≤ s ≤ b | α ( s, b θ ( s )) − α ( s ) | → p 0 under natural conditions. (ii) And the appro ximate size and distribution of this maximal deviation quantit y are also of in terest. (iii) It is not diﬃcult to establish that ( nh ) 1 / 2 { b α ( s ) − α ( s ) − 1 2 β K α 00 ( s ) } has a limiting zero-mean normal distribution, when h → 0 and nh → ∞ . This can also be used to construct p oin t-wise appro ximate conﬁdence band for the α ( . ) function, for example incorp orating a bias correction − 1 2 β K b α 00 ( s ). (iv) One should work out a reliable cross v alidation method for minimising a nearly unbiased estimate of R T 0 w ( s ) { b α ( s ) − α ( s ) } 2 d s , say , as a function of the windo w width h , or as a function of c in h = cY n ( s ) − 1 / 5 . The crux is to estimate R T 0 w ( s ) b α ( s ) α ( s ) d s . (v) Theory can also b e work ed out for estimation of deriv ativ es of the hazard function, as touched on in 6.1. T aking the deriv ativ e of (3.12) to deﬁne b α 0 ( s ), with a smo oth k ernel function K , one can show that the bias is prop ortional to a h 2 b 1 ( s ) and that the v ariance is prop ortional to n − 1 h − 3 α ( s ) /y ( s ), but with a b 1 ( s ) function diﬀeren t from that of the deriv ativ e of the Ramlau-Hansen–Y andell estimator. 7H. Questions. A sim ulation study comparing the v arious estimators w ould b e w elcome. Some of the questions to answ er include: How muc h better are the new estimators than the purely nonparametric ones when the true hazard is in the vicinity of the parametric model used? Ho w m uch do they lose to the parametric ones on the latter’s home turf ? Are there signiﬁcan t adv antages to using multi-parameter mo dels for the dynamic likelihoo d metho ds of Sections 3 and 4? What are the most useful w ays of choosing windo w width h = h n ( s )? References Andersen, P .K., Borgan, Ø., Gill, R.D., and Keiding, N.L. (1993). Statistical Models Based on Coun ting Processes. Springer-V erlag, New Y ork. Borgan, Ø. (1984). Maxim um likelihoo d estimation in parametric coun ting pro cess mo dels, Nils Lid Hjort 19 April 1993 with applications to censored failure time data. Scand. J. Statist. 11 , 1–16. Corrigendum, ibid. p. 275. Gr ´ egoire, C. (1993). Least squares cross v alidation for coun ting pro cess in tensities. Scand. J. Sta- tist. , to app ear. Hjort, N.L. (1985). Contribution to the discussion of Andersen and Borgan’s ‘Counting pro cess mo dels for life history data: a review’. Scand. J. Statist. 12 , 141–150. Hjort, N.L. (1990). Go odness of ﬁt tests in mo dels for life history data based on cumulativ e hazard rates. Ann. Statist. 18 , 1221–1258. Hjort, N.L. (1991). Semiparametric estimation of parametric hazard rates. In Surviv al Analysis: State of the Art , Klu wer, Dordrech t, pp. 211–236. Pro ceedings of the NA TO Adv anced Study W orkshop on Surviv al Analysis and Related T opics , Colum bus, Ohio, eds. P .S. Go el and J.P . Klein. Hjort, N.L. (1992). On inference in parametric surviv al data mo dels. Int. Statist. Review 60 , 355–387. Hjort, N.L. (1993a). Eﬃciency of three estimators in Aalen’s linear hazard rate regression mo del. Submitted for publication. Hjort, N.L. (1993b). Estimation in mo derately misspeciﬁed mo dels. Submitted for publication. Hjort, N.L. and Jones, M.C. (1993). Locally parametric nonparametric density estimation. Man us- cript in progress. Hjort, N.L. and Lumley , T. (1993). Normalised lo cal hazard plots. Submitted for publication. Nielsen, J.P . (1990). Kernel estimation of densities and hazards: a counting pro cess approach. PhD. dissertation, Departmen t of Statistics, Univ ersit y of Berkeley , California. Ramlau-Hansen, H. (1983). Smo othing coun ting pro cess intensities b y means of kernel functions. Ann. Statist. 11 , 453–466. T anner, M.A. and W ong, W.H. (1983). The estimation of the hazard function from randomly censored data. Ann. Statist. 11 , 989–993. Y andell, B.S. (1983). Nonparametric inference for rates and densities with censored serial data. Ann. Statist. 11 , 1119–1135. Dynamic Likelihoo d 20 April 1993

Dynamic likelihood hazard rate estimation

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment