Generation interval contraction and epidemic data analysis

Eben Kenah^{1,2,*}, Marc Lipsitch^{1,3}, James M. Robins^{1,2}

1 Department of Epidemiology
2 Department of Biostatistics
3 Department of Immunology and Infectious Disease
Harvard School of Public Health
677 Huntington Ave., Boston, Massachusetts, USA
* Corresponding author: ekenah@hsph.harvard.edu

April and November, 2006
Revised January-June and October-November, 2007

Abstract

The generation interval is the time between the infection time of an infected person and the infection time of his or her infector. Probability density functions for generation intervals have been an important input for epidemic models and epidemic data analysis. In this paper, we specify a general stochastic SIR epidemic model and prove that the mean generation interval decreases when susceptible persons are at risk of infectious contact from multiple sources. The intuition behind this is that when a susceptible person has multiple potential infectors, there is a "race" to infect him or her in which only the first infectious contact leads to infection. In an epidemic, the mean generation interval contracts as the prevalence of infection increases. We call this global competition among potential infectors. When there is rapid transmission within clusters of contacts, generation interval contraction can be caused by a high local prevalence of infection even when the global prevalence is low. We call this local competition among potential infectors. Using simulations, we illustrate both types of competition. Finally, we show that hazards of infectious contact can be used instead of generation intervals to estimate the time course of the effective reproductive number in an epidemic.
This approach leads naturally to partial likelihoods for epidemic data that are very similar to those that arise in survival analysis, opening a promising avenue of methodological research in infectious disease epidemiology.

1 Introduction

In infectious disease epidemiology, the serial interval is the difference between the symptom onset time of an infected person and the symptom onset time of his or her infector [1]. This is sometimes called the "generation interval." However, we find it more useful to adopt the terminology of Svensson [2] and define the generation interval as the difference between the infection time of an infected person and the infection time of his or her infector. By these definitions, the serial interval is observable while the generation interval usually is not. We define infectious contact from $i$ to $j$ to be a contact that is sufficient to infect $j$ if $i$ is infectious and $j$ is susceptible, and we define a potential infector of person $i$ to be an infectious person who has positive probability of making infectious contact with $i$. Finally, we use the term hazard rather than force of infection to highlight the similarities between epidemic data analysis and survival analysis.

The generation interval has been an important input for epidemic models used to investigate the transmission and control of SARS [3, 4] and pandemic influenza [5, 6]. More recently, generation interval distributions have been used to calculate the incubation period distribution of SARS [7] and to estimate $R_0$ from the exponential growth rate at the beginning of an epidemic [8]. It is generally assumed that the generation interval distribution is characteristic of an infectious disease. In this paper, we show that this is not true. Instead, the expected generation interval decreases as the number of potential infectors of susceptibles increases.
During an epidemic, generation intervals tend to contract as the prevalence of infection increases. This effect was described by Svensson [2] for an SIR model with homogeneous mixing. In this paper, we extend this result to all time-homogeneous stochastic SIR models.

A simple thought experiment illustrates the intuition behind our main result. Imagine a susceptible person $j$ in a room. Place $m$ other persons in the room and infect them all at time $t = 0$. For simplicity, assume that infectious contact from $i$ to $j$ occurs with probability one, $i = 1, ..., m$. Let $t_{ij}$ be a continuous nonnegative random variable denoting the first time at which $i$ makes infectious contact with $j$. Person $j$ is infected at time $t_j = \min(t_{1j}, ..., t_{mj})$. Since all infectious persons were infected at time zero, $t_j$ is the generation interval. If we repeat the experiment with larger and larger $m$, the expected value of $\min(t_{1j}, ..., t_{mj})$ will decrease. When a susceptible person is at risk of infectious contact from multiple sources, there is a "race" to infect him or her in which only the first infectious contact leads to infection.

Generation interval contraction is an example of a well-known phenomenon in epidemiology: The expected time to an outcome, given that the outcome occurs, decreases in the presence of competing risks. In our thought experiment, the outcome is the infection of $j$ by a given $i$ and the competing risks are infectious contacts from all sources other than $i$. Adapting our thought experiment slightly, we see that the contraction of the generation interval is a consequence of the fact that the hazard of infection for $j$ increases as the number of potential infectors increases. Let $\lambda(t)$ be the hazard of infectious contact from any potential infector to $j$ at time $t$ and let $E[t_j \mid m]$ be the expected infection time of $j$ given $m$ potential infectors.
Then, writing $\Lambda(t) = \int_0^t \lambda(u)\,du$ for the cumulative hazard of infectious contact from a single potential infector,

$$E[t_j \mid m] = \int_0^\infty e^{-m\Lambda(t)}\,dt > \int_0^\infty e^{-(m+1)\Lambda(t)}\,dt = E[t_j \mid m + 1],$$

so the expected generation interval decreases as the number of potential infectors increases. A hazard of infection that increases with the number of potential infectors is a defining feature of most epidemic models, so generation interval contraction is a very general phenomenon. We note that a very similar phenomenon occurs in endemic diseases, where increased force of infection results in a decreased average age at first infection [9].

The rest of the paper is organized as follows: In Section 2, we describe a general stochastic SIR epidemic model. In Section 3, we use this model to show that the mean generation interval decreases as the number of potential infectors increases. As a corollary, we find that the mean serial interval also decreases. In Section 4, we consider the role of the population contact structure in generation interval contraction and illustrate the effects of global and local competition among potential infectors with simulations. In Section 5, we argue that hazards of infectious contact should be used instead of generation or serial interval distributions in the analysis of epidemic data. Section 6 summarizes our main results and conclusions.

2 General stochastic SIR model

We start with a very general stochastic "Susceptible-Infectious-Removed" (SIR) epidemic model. This model includes fully-mixed and network-based models as special cases, and it has been used previously to define a mapping from the final outcomes of stochastic SIR models to the components of semi-directed random networks [10, 11]. Each person $i$ is infected at his or her infection time $t_i$, with $t_i = \infty$ if $i$ is never infected.
Person $i$ recovers from infectiousness or dies at time $t_i + r_i$, where the recovery period $r_i$ is a positive random variable with the cumulative distribution function (cdf) $F_i(r)$. The recovery period $r_i$ may be the sum of a latent period, during which $i$ is infected but not infectious, and an infectious period, during which $i$ can transmit infection. We assume that all infected persons have a finite recovery period. If person $i$ is never infected, let $r_i = \infty$. Let $\mathrm{Sus}(t) = \{i : t_i > t\}$ be the set of susceptibles at time $t$.

When person $i$ is infected, he or she makes infectious contact with person $j$ after an infectious contact interval $\tau_{ij}$. Each $\tau_{ij}$ is a positive random variable with cdf $F_{ij}(\tau \mid r_i)$ and survival function $S_{ij}(\tau \mid r_i) = 1 - F_{ij}(\tau \mid r_i)$. Let $\tau_{ij} = \infty$ if person $i$ never makes infectious contact with person $j$, so the infectious contact interval distribution may have probability mass at $\infty$. Define

$$S_{ij}(\infty \mid r_i) = \lim_{\tau \to \infty} S_{ij}(\tau \mid r_i),$$

which is the conditional probability that $i$ never makes infectious contact with $j$ given $r_i$. Since a person cannot transmit disease before being infected or after recovering from infectiousness, $S_{ij}(\tau \mid r_i) = 1$ for all $\tau \le 0$ and $S_{ij}(\tau \mid r_i) = S_{ij}(\infty \mid r_i)$ for all $\tau \ge r_i$. Since a person cannot infect himself (or herself), $\tau_{ii} = \infty$ with probability one and $S_{ii}(\tau \mid r_i) = 1$ for all $\tau$.

The infectious contact time $t_{ij} = t_i + \tau_{ij}$ is the time at which person $i$ makes infectious contact with person $j$. If person $j$ is susceptible at time $t_{ij}$, then $i$ infects $j$ and $t_j = t_{ij}$. If $t_{ij} < \infty$, then $t_j \le t_{ij}$ because person $j$ avoids infection at time $t_{ij}$ only if he or she has already been infected. If person $i$ never makes infectious contact with person $j$, then $t_{ij} = \infty$ because $\tau_{ij} = \infty$. Figure 1 shows a schematic diagram of the relationships among $r_i$, $\tau_{ij}$, and $t_{ij}$.
The importation time $t_{0i}$ of person $i$ is the earliest time at which he or she receives infectious contact from outside the population. The importation time vector is $\mathbf{t}_0 = (t_{01}, ..., t_{0n})$. We assume that each infected person has a unique infector. Following [4], we let $v_i$ represent the index of the person who infected person $i$, with $v_i = 0$ for imported infections and $v_i = \infty$ if $i$ is never infected. If tied infectious contact times have nonzero probability, then $v_i$ can be chosen from all $j$ such that $t_{ji} = t_i < \infty$.

2.1 Epidemics

Let $t_{(1)} \le t_{(2)} \le ... \le t_{(m)}$ be the order statistics of all $t_1, ..., t_n$ less than infinity, and let $(k)$ be the index of the $k$th person infected. Before the epidemic begins, an importation time vector $\mathbf{t}_0$ is chosen. The epidemic begins at time $t_{(1)} = \min_i(t_{0i})$. Person $(1)$ is assigned a recovery period $r_{(1)}$. Every person $j \in \mathrm{Sus}(t_{(1)})$ is assigned an infectious contact time $t_{(1)j} = t_{(1)} + \tau_{(1)j}$. The second infection occurs at $t_{(2)} = \min_{j \in \mathrm{Sus}(t_{(1)})} \min(t_{0j}, t_{(1)j})$, which is the first infectious contact time after $t_{(1)}$. Person $(2)$ is assigned a recovery period $r_{(2)}$. After $k$ infections, the next infection occurs at $t_{(k+1)} = \min_{j \in \mathrm{Sus}(t_{(k)})} \min(t_{0j}, t_{(1)j}, ..., t_{(k)j})$. The epidemic stops after $m$ infections if and only if $t_{(m+1)} = \infty$.

3 Generation interval contraction

In this section, we show that the mean infectious contact interval $\tau_{ij}$ given that $i$ infects $j$ is no longer than the mean infectious contact interval given that $i$ makes infectious contact with $j$. In the notation from the previous section, $E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$ (note that $v_j = i$ implies $\tau_{ij} < \infty$ but not vice versa). In general, this inequality is strict when $j$ is at risk of infectious contact from any source other than $i$. This inequality implies the contraction of generation and serial intervals during an epidemic.
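The construction of Section 2.1 and the inequality just stated can be illustrated with a short simulation. The following is a minimal sketch, not the authors' Mathematica implementation. It assumes a fully-mixed population, a fixed recovery period $r_i = 1$, exponential infectious contact intervals truncated at $r_i$, and a single imported infection at time 0. The function name `simulate_epidemic` and its parameters are hypothetical.

```python
import heapq
import math
import random


def simulate_epidemic(n, hazard, seed=0):
    """Simulate one epidemic in a fully-mixed population with r_i = 1.

    Each tau_ij is exponential with rate `hazard`, truncated at r_i = 1
    (a draw of 1 or more means tau_ij = infinity). Person 0 is the single
    imported infection at time 0. Returns (t, v, finite_taus): t[j] is the
    infection time of j (math.inf if never infected), v[j] is the infector
    of j (None for the import and for the never-infected), and finite_taus
    lists every finite tau_ij that was drawn.
    """
    rng = random.Random(seed)
    t = [math.inf] * n
    v = [None] * n
    finite_taus = []
    queue = []  # min-heap of (infectious contact time t_ij, i, j)

    def infect(j, time, infector):
        t[j], v[j] = time, infector
        for k in range(n):
            if k == j or t[k] < math.inf:
                continue  # contacts with non-susceptibles change nothing
            tau = rng.expovariate(hazard)
            if tau < 1.0:  # contact occurs before j recovers
                finite_taus.append(tau)
                heapq.heappush(queue, (time + tau, j, k))

    infect(0, 0.0, None)
    while queue:
        time, i, j = heapq.heappop(queue)
        if t[j] == math.inf:  # j is still susceptible, so i infects j
            infect(j, time, i)
    return t, v, finite_taus


if __name__ == "__main__":
    n, r0 = 1000, 3.0
    t, v, finite_taus = simulate_epidemic(n, hazard=r0 / (n - 1), seed=1)
    # Generation intervals: tau_ij restricted to pairs with v_j = i.
    gen = [t[j] - t[v[j]] for j in range(n) if v[j] is not None]
    print("number infected:", sum(x < math.inf for x in t))
    if gen and finite_taus:
        print("mean generation interval:", sum(gen) / len(gen))
        print("mean finite contact interval:", sum(finite_taus) / len(finite_taus))
```

In a large outbreak, the first mean estimates $E[\tau_{ij} \mid v_j = i]$ and the second estimates $E[\tau_{ij} \mid \tau_{ij} < \infty]$, so the first is expected to be the smaller of the two. A run that dies out after a few infections gives too few pairs for the comparison to be informative.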
For background on the probability theory used in this section, please see Ref. [12] or any other probability text.

Lemma 1. $E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$.

Proof. We first show that $E[\tau_{ij} \mid r_i, v_j = i] \le E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]$ and then use the law of iterated expectation. If person $i$ was infected at time $t_i$ and has recovery period $r_i$, then the probability that $\tau_{ij} < \infty$ is $F_{ij}(\infty \mid r_i) = 1 - S_{ij}(\infty \mid r_i)$. Let

$$F^*_{ij}(\tau \mid r_i) = \frac{F_{ij}(\tau \mid r_i)}{F_{ij}(\infty \mid r_i)}$$

be the conditional cdf of $\tau_{ij}$ given $r_i$ and $\tau_{ij} < \infty$. Then

$$E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] = \int_0^{r_i} \tau \, dF^*_{ij}(\tau \mid r_i). \tag{1}$$

If person $j$ is susceptible at time $t_i$ and $\tau_{ij} < \infty$, then $v_j = i$ if and only if $j$ escapes infectious contact from all other infectious people during the time interval $(t_i, t_i + \tau_{ij})$. Let $S^*_j(t_i + \tau)$ be the probability that $j$ escapes infectious contact from all sources other than $i$ in the interval $(t_i, t_i + \tau)$. Given $r_i$ and $\tau_{ij} < \infty$, the conditional probability density for an infectious contact from $i$ to $j$ at time $t_i + \tau$ that leads to the infection of $j$ is proportional to $S^*_j(t_i + \tau)\, dF^*_{ij}(\tau \mid r_i)$. If we let

$$\psi = \int_0^{r_i} S^*_j(t_i + \tau)\, dF^*_{ij}(\tau \mid r_i),$$

then

$$E[\tau_{ij} \mid r_i, v_j = i] = \int_0^{r_i} \tau \, \frac{S^*_j(t_i + \tau)}{\psi}\, dF^*_{ij}(\tau \mid r_i).$$

Since $S^*_j(t_i + \tau_{ij})$ is a monotonically decreasing function of $\tau_{ij}$,

$$
\begin{aligned}
E[\tau_{ij} \mid r_i, v_j = i] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]
&= E\!\left[\tau_{ij} \frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\Big|\, r_i, \tau_{ij} < \infty\right] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]\, E\!\left[\frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\Big|\, r_i, \tau_{ij} < \infty\right] \\
&= \mathrm{Cov}\!\left(\tau_{ij}, \frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\Big|\, r_i, \tau_{ij} < \infty\right) \le 0.
\end{aligned}
$$

Therefore,

$$E[\tau_{ij} \mid r_i, v_j = i] \le E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]. \tag{2}$$

Since the same inequality holds for all $r_i$,

$$E[\tau_{ij} \mid v_j = i] = E\big[E[\tau_{ij} \mid r_i, v_j = i]\big] \le E\big[E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]\big] = E[\tau_{ij} \mid \tau_{ij} < \infty] \tag{3}$$

by the law of iterated expectation.
Equality holds in equation (2) if and only if $\tau_{ij}$ and $S^*_j(t_i + \tau_{ij})$ have covariance zero given $r_i$ and $\tau_{ij} < \infty$. Since $S^*_j(t_i + \tau_{ij})$ is a monotonically decreasing function of $\tau_{ij}$, this will occur if and only if $\tau_{ij}$ or $S^*_j(t_i + \tau_{ij})$ is constant given $r_i$ and $\tau_{ij} < \infty$. Equality holds in equation (3) if and only if equality holds in (2) with probability one in $r_i$. If $\tau_{ij}$ is constant, then clearly $S^*_j(t_i + \tau_{ij})$ is constant and their covariance is zero. If $j$ is not at risk of infectious contact from any source other than $i$, then $S^*_j(t_i + \tau_{ij})$ will be constant even when $\tau_{ij}$ is not. In the thought experiment from the Introduction, the expected infection time of the susceptible $j$ would remain constant in the following two scenarios: (i) all infectious persons make infectious contact with $j$ at a fixed time $t_0$, or (ii) $j$ is only at risk of infectious contact from a single person. Scenario (i) corresponds to a constant $\tau_{ij}$ and scenario (ii) corresponds to a constant $S^*_j(t_i + \tau_{ij})$.

The expected generation interval from $i$ to $j$ given $v_j = i$ will be shortest when the risk of infectious contact to $j$ from sources other than $i$ is greatest. More specifically, $E[\tau_{ij} \mid r_i, v_j = i] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]$ will be minimized when $S^*_j(t_i + \tau_{ij})$ decreases fastest in $\tau_{ij}$. The risk of infectious contact from other sources will generally be greatest when the prevalence of infection is highest, so we expect the greatest contraction of the generation interval during an epidemic to coincide with the peak prevalence of infection. In general, we expect to see the following pattern over the course of an epidemic: The mean generation interval decreases as the prevalence of infection increases, reaches a minimum as the prevalence of infection peaks, and increases again as the prevalence of infection decreases.
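A worked special case may make the size of the contraction concrete. It is a sketch under assumptions that are ours rather than the paper's: $\tau_{ij}$ is exponential with rate $\lambda$ truncated at $r_i = r$, and all sources other than $i$ together exert a constant hazard $\mu$ on $j$, so that $S^*_j(t_i + \tau) = e^{-\mu\tau}$.

```latex
% Mean of an exponential with rate a truncated to (0, r):
g(a) = \frac{\int_0^r \tau \, a e^{-a\tau} \, d\tau}{1 - e^{-ar}}
     = \frac{1}{a} - \frac{r}{e^{ar} - 1}.

% Without competition, equation (1) gives
E[\tau_{ij} \mid r_i = r,\ \tau_{ij} < \infty] = g(\lambda).

% With competition, the density of \tau_{ij} given v_j = i is proportional to
% e^{-\mu\tau} \cdot \lambda e^{-\lambda\tau} on (0, r), which is a truncated
% exponential with rate \lambda + \mu, so
E[\tau_{ij} \mid r_i = r,\ v_j = i] = g(\lambda + \mu) \le g(\lambda).
```

The function $g$ is strictly decreasing in its argument (its derivative is minus the variance of the truncated exponential), with $g(a) \to r/2$ as $a \to 0$ and $g(a) \to 0$ as $a \to \infty$. Equality therefore holds only when $\mu = 0$, which is scenario (ii), and the gap $g(\lambda + \mu) - g(\lambda)$ becomes more negative as the competing hazard $\mu$ grows.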
3.1 Types of generation intervals

In [2], Svensson discussed two types of generation intervals that are consistent with the verbal definition given in the Introduction. $T_p$ ($p$ for "primary") denotes $\tau_{ij}$ where $i$ is chosen at random from all persons who infect at least one other person and $j$ is chosen randomly from the set of persons $i$ infects. $T_s$ ($s$ for "secondary") denotes $\tau_{ij}$ where $j$ is chosen at random from all persons infected from within the population and $i = v_j$. $T_p$ and $T_s$ differ only in the sampling procedure used to obtain the ordered pair $ij$; $T_p$ samples primary cases (infectors) at random while $T_s$ samples secondary cases at random. Equation (3) implies that both $E[T_p]$ and $E[T_s]$ decrease when susceptible persons are at risk of infectious contact from multiple sources. This contraction occurs because the definitions of $T_p$ and $T_s$ include only $\tau_{ij}$ such that $i$ actually infected $j$.

3.2 Serial interval contraction

In an epidemic, infection times are generally unobserved. Instead, symptom onset times are observed. Recall that the time between the onset of symptoms in an infected person and the onset of symptoms in his or her infector is called the serial interval. Contraction of the mean generation interval implies contraction of the mean serial interval as well. The incubation period is the time from infection to the onset of symptoms [1]. Let $q_i$ be the incubation period in person $i$, and let $t^{sym}_i = t_i + q_i$ be the time of his or her onset of symptoms. If $v_j = i$, then the serial interval associated with person $j$ is

$$t^{sym}_j - t^{sym}_i = \tau_{ij} + q_j - q_i.$$

Therefore,

$$E[t^{sym}_j - t^{sym}_i \mid v_j = i] = E[\tau_{ij} \mid v_j = i] + E[q_j] - E[q_i] \le E[\tau_{ij} \mid \tau_{ij} < \infty] + E[q_j] - E[q_i],$$

with strict inequality whenever strict inequality holds for the corresponding generation interval.
Over the course of an epidemic, we expect the mean serial interval to follow a pattern very similar to that of the mean generation interval.

4 Simulations

We refer to the "race" to infect a susceptible person as competition among potential infectors. In this section, we illustrate two types of competition among potential infectors: Global competition among potential infectors results from a high global prevalence of infection. Local competition among potential infectors results from rapid transmission within clusters of contacts, which causes susceptibles to be at risk of infectious contact from multiple sources within their clusters even if the global prevalence of infection is low. In real epidemics, the prevalence of infection is usually low but there is clustering of contacts within households, hospital wards, schools, and other settings.

In this section, we use simulations to illustrate generation interval contraction under global and local competition among potential infectors. Each simulation is a single realization of a stochastic SIR model in a population of 10,000. We keep track of the infection times of the primary and secondary case in each infector/infectee pair and the prevalence of infection at the infection time of the secondary case, which is a proxy for the amount of competition to infect the secondary case. We then calculate a smoothed mean of the generation interval as a function of the infection time of the primary case in each pair.

Another valid approach would be to calculate the smoothed means from the results of many simulations. We did not take this approach for the following reasons: (i) Because of variation in the time course of different realizations of the same stochastic SIR model, many simulations would be required to obtain a curve that reliably approximates the asymptotic limit.
(ii) The smoothed mean over many simulations would show a pattern similar to that obtained in any single simulation. (iii) Generation interval contraction was proven in Section 3, so the simulations are intended primarily as illustrations.

All simulations were implemented in Mathematica 5.0.0.0 [© 1988-2003 Wolfram Research, Inc.]. All data analysis was done using Intercooled Stata 9.2 [© 1985-2007 StataCorp LP]. All smoothed means are running means with a bandwidth of 0.8 (the default for the Stata command lowess with the option mean). Similar results were obtained for larger and smaller bandwidths.

4.1 Global competition

To illustrate global competition among potential infectors, we use a fully-mixed model with population size $n = 10{,}000$ and basic reproductive number $R_0$. The infectious period is fixed, with $r_i = 1$ with probability one for all $i$. The infectious contact intervals $\tau_{ij}$ have an exponential distribution with hazard $R_0 (n-1)^{-1}$ truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-R_0 (n-1)^{-1} \tau}$ when $0 < \tau < 1$ and $\tau_{ij} = \infty$ with probability $e^{-R_0 (n-1)^{-1}}$. The epidemic starts with a single imported infection and no other imported infections occur. From equation (1), the mean infectious contact interval given that contact occurs is

$$E[\tau_{ij} \mid \tau_{ij} < \infty] = \int_0^1 \frac{e^{-R_0 \tau (n-1)^{-1}} - e^{-R_0 (n-1)^{-1}}}{1 - e^{-R_0 (n-1)^{-1}}}\, d\tau.$$

For $n = 10{,}000$, Table 1 shows this expected value at each $R_0$. For all $R_0$, $E[\tau_{ij} \mid \tau_{ij} < \infty] \approx .5$.

This model was run once at $R_0 = 1.25$, $1.5$, $2$, $3$, $4$, $5$, and $10$. For each simulation, we recorded $t_i$, $v_i$, $t_{v_i}$, and the prevalence of infection at time $t_i$ in each infector/infectee pair. Figure 2 shows smoothed mean curves for the generation interval versus the source infection time for $R_0 = 2, 3, 4, 5$.
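The expected value $E[\tau_{ij} \mid \tau_{ij} < \infty]$ for the fully-mixed model has a simple closed form. Writing $a = R_0 (n-1)^{-1}$, the integral above evaluates to $1/a - 1/(e^a - 1)$, which is approximately $1/2 - a/12$ for small $a$. The sketch below evaluates it at the values of $R_0$ used in the simulations; the function name `mean_contact_interval` is our own, and the printed values are computed here rather than copied from Table 1.

```python
import math


def mean_contact_interval(r0, n=10_000):
    """E[tau_ij | tau_ij < infinity] for an exponential contact interval
    with hazard a = r0 / (n - 1), truncated at r_i = 1.

    Closed form of the integral: 1/a - 1/(exp(a) - 1).
    """
    a = r0 / (n - 1)
    return 1.0 / a - 1.0 / math.expm1(a)


for r0 in (1.25, 1.5, 2, 3, 4, 5, 10):
    print(f"R0 = {r0:>5}: E[tau | tau < inf] = {mean_contact_interval(r0):.6f}")
```

Because $a$ is of order $10^{-4}$ to $10^{-3}$ here, every value lies within about $10^{-4}$ of $.5$, in agreement with the approximation stated in the text.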
There is a clear tendency for the mean generation interval to contract, with greater contraction at higher $R_0$. Figure 3 shows smoothed mean curves for the generation interval and the prevalence of infection versus the source infection time at each $R_0$; in each case, the greatest contraction of the generation interval coincides with the peak prevalence of infection (i.e., the greatest competition among potential infectors). Figure 4 shows the same curves for $R_0 = 1.25$ and $1.50$; in these cases, the generation interval stays relatively constant. These results are exactly in line with the argument of Section 3.

4.2 Local competition

To illustrate local competition among potential infectors, we grouped a population of $n = 9{,}000$ individuals into clusters of size $k$. As before, the infectious period is fixed at $r_i = 1$ for all $i$. When $i$ and $j$ are in the same cluster, the infectious contact interval $\tau_{ij}$ has an exponential distribution with hazard $\lambda_{within}$ truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{within} \tau}$ when $0 < \tau < 1$ and $\tau_{ij} = \infty$ with probability $e^{-\lambda_{within}}$. When $i$ and $j$ are in different clusters, $\tau_{ij}$ has an exponential distribution with hazard $\lambda_{between}$ truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{between} \tau}$ when $0 < \tau < 1$ and $\tau_{ij} = \infty$ with probability $e^{-\lambda_{between}}$.

We fixed the hazard of infectious contact between individuals in the same cluster at $\lambda_{within} = .4$. We tuned the hazard of infectious contact between individuals in different clusters to obtain $R$ mean infectious contacts by infectious individuals; specifically,

$$\lambda_{between} = \frac{R - (k-1)(1 - e^{-.4})}{n - k}.$$

We chose $\lambda_{within} = .4$ to obtain rapid transmission within clusters while retaining sufficient transmission between clusters to sustain an epidemic. Note that when $k > R(1 - e^{-.4})^{-1} + 1$, we get the implausible result that $\lambda_{between} < 0$.
Clearly, $R$ and $k$ must be chosen so that an infectious person makes an average of $R$ or fewer infectious contacts within his or her cluster, which guarantees that $\lambda_{between} \ge 0$.

At a given $R$, the mean infectious contact interval given that infectious contact occurs depends on the cluster size. If the entire population is infectious and the cluster size is $k$, then a given individual will receive an average of $R$ infectious contacts, of which $(k-1)(1 - e^{-.4})$ come from within his or her cluster. The mean infectious contact interval for within-cluster contacts is

$$\frac{1}{1 - e^{-.4}} \int_0^1 .4\,\tau e^{-.4\tau}\, d\tau,$$

and the mean infectious contact interval for between-cluster contacts is approximately $.5$ (as in the models for global competition). Therefore, the mean infectious contact interval given that contact occurs and the cluster size is $k$ is

$$E[\tau_{ij} \mid \tau_{ij} < \infty, k] \approx \left(1 - \frac{(k-1)(1 - e^{-.4})}{R}\right)(.5) + \frac{k-1}{R} \int_0^1 .4\,\tau e^{-.4\tau}\, d\tau.$$

To compare generation interval contraction for different cluster sizes, we calculated scaled generation intervals by dividing the observed generation intervals at each cluster size by $E[\tau_{ij} \mid \tau_{ij} < \infty, k]$. If the mean generation interval remained constant, we would expect the mean scaled generation interval to be approximately one throughout an epidemic.

For $R = 2$, we ran the model with cluster sizes of 1 through 6. For $R = 3$, we ran the model with cluster sizes of 2 through 8. For each simulation, we recorded $t_i$, $v_i$, $t_{v_i}$, and the prevalence of infection at time $t_i$ in each infector/infectee pair. Figure 5 shows smoothed mean curves for the generation interval and prevalence versus the source infection time for several cluster sizes at each $R$. As before, there is a clear tendency of the mean generation interval to contract.
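The tuning of $\lambda_{between}$ and the scaling constant $E[\tau_{ij} \mid \tau_{ij} < \infty, k]$ used in this subsection can be computed directly. The sketch below is an illustration with hypothetical function names. It uses the closed form $\int_0^1 \lambda \tau e^{-\lambda\tau}\, d\tau = (1 - e^{-\lambda}(1 + \lambda))/\lambda$ and, like the text, approximates the mean between-cluster interval by $.5$.

```python
import math

LAMBDA_WITHIN = 0.4  # within-cluster hazard fixed in the text


def lambda_between(R, k, n=9_000):
    """Between-cluster hazard that yields R mean infectious contacts."""
    p_within = 1.0 - math.exp(-LAMBDA_WITHIN)
    lam = (R - (k - 1) * p_within) / (n - k)
    if lam < 0:
        raise ValueError("cluster size k is too large for this R")
    return lam


def mean_contact_interval_clustered(R, k):
    """Approximate E[tau_ij | tau_ij < infinity, k] from the text."""
    lam = LAMBDA_WITHIN
    p_within = 1.0 - math.exp(-lam)
    # Closed form of the integral of lam * tau * exp(-lam * tau) over (0, 1).
    integral = (1.0 - math.exp(-lam) * (1.0 + lam)) / lam
    return (1.0 - (k - 1) * p_within / R) * 0.5 + (k - 1) / R * integral


def scaled_generation_intervals(intervals, R, k):
    """Divide observed generation intervals by the expected contact interval."""
    scale = mean_contact_interval_clustered(R, k)
    return [x / scale for x in intervals]


for k in range(1, 7):
    print(k, lambda_between(2, k), mean_contact_interval_clustered(2, k))
```

With $k = 1$ there are no within-cluster contacts and the scaling constant reduces to $.5$. As $k$ grows, a larger share of contacts are within-cluster contacts, whose mean interval is slightly below $.5$, so the scaling constant falls slightly.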
The degree of contraction is roughly the same for all cluster sizes, but this contraction is maintained at a lower global prevalence of infection in models with larger cluster sizes. Similar results were obtained for cluster sizes not shown. Again, these results are exactly in line with the argument of Section 3.

5 Consequences for estimation

The effect of generation interval contraction on parameter estimates obtained from models that assume a constant generation or serial interval distribution is difficult to assess. The assumption of a constant serial or generation interval distribution may be reasonable in the early stages of an epidemic with little clustering of contacts, in an epidemic with $R_0$ near one, or in an endemic situation. However, this ignores the more fundamental issue that estimates of these distributions are obtained from transmission events where the infector/infectee pairs are known (often because of transmission from a known patient within a household or hospital ward). Even in the early stages of an epidemic, the generation interval distribution in these settings may differ substantially from the generation interval distribution for transmission in the general population.

In this section, we argue that hazards of infectious contact can be used instead of generation or serial intervals in the analysis of epidemic data. As an example, we look at the estimator of $R(t)$ (the effective reproductive number at time $t$) derived by Wallinga and Teunis [4] and applied to data on the SARS outbreaks in Hong Kong, Vietnam, Singapore, and Canada in 2003. In their paper, the available data was the "epidemic curve" $\mathbf{t} = (t_{(1)}, ..., t_{(m)})$, where $t_{(i)}$ is the infection time of the $i$th person infected.
They assume a probability density function (pdf) $w(\tau \mid \theta)$ for the serial interval given a vector $\theta$ of parameters (note that this parameter vector applies to the population, not to individuals). The infector of person $(i)$ is denoted by $v_{(i)}$, with $v_{(i)} = 0$ for imported infections. The "infection network" is a vector $\mathbf{v} = (v_{(1)}, ..., v_{(m)})$ specifying the source of infection for each infected person. With these assumptions, the likelihood of $\mathbf{v}$ and $\theta$ given $\mathbf{t}$ is

$$L(\mathbf{v}, \theta \mid \mathbf{t}) = \prod_{i : v_{(i)} \ne 0} w(t_{(i)} - t_{v_{(i)}} \mid \theta).$$

The sum of this likelihood over the set $V$ of all infection networks consistent with the epidemic curve $\mathbf{t}$ is

$$L(\theta \mid \mathbf{t}) = \prod_{i : v_{(i)} \ne 0} \sum_{j \ne i} w(t_i - t_j \mid \theta).$$

Taking a likelihood ratio, Wallinga and Teunis argue that the relative likelihood that person $k$ was infected by person $j$ is

$$p^{(WT)}_{jk} = \frac{w(t_k - t_j \mid \theta)}{\sum_{i \ne k} w(t_k - t_i \mid \theta)}. \tag{4}$$

The number $R_j$ of secondary infections generated by person $j$ is a sum of Bernoulli random variables with expectation

$$E[R_j] = \sum_{k=1}^n p^{(WT)}_{jk}.$$

An estimate of the effective reproductive number $R(t)$ can be obtained by calculating a smoothed mean for a scatter plot of $(t_j, E[R_j])$. This analysis is ingenious, but it can be only approximately correct because the distribution of serial intervals varies systematically over the course of an epidemic.

5.1 Hazard-based estimator

A very similar result can be derived by applying the theory of order statistics (see Ref. [12]) to the general stochastic SIR model from Section 2. Specifically, we use the following results: If $X_1, ..., X_n$ are independent non-negative random variables, then their minimum $X_{(1)}$ has the hazard function

$$\lambda_{(1)}(t) = \sum_{i=1}^n \lambda_i(t).$$

Given that the minimum is $x_{(1)}$, the probability that $X_j = x_{(1)}$ (i.e., that the minimum was observed in the $j$th random variable) is

$$\frac{\lambda_j(x_{(1)})}{\sum_{i=1}^n \lambda_i(x_{(1)})}.$$

For simplicity, we assume that the infectious contact intervals $\tau_{ij}$ are absolutely continuous random variables. Let $\lambda_{ij}(\tau \mid r_i)$ be the conditional hazard function for $\tau_{ij}$ given $r_i$ and let $\lambda_{0i}(t)$ be the hazard function for infectious contact to $i$ from outside the population at time $t$. Since $\tau_{ij}$ is nonnegative, $\lambda_{ij}(\tau \mid r_i) = 0$ whenever $\tau < 0$. Let $H(t)$ denote the set of infection times and recovery periods for all $i$ such that $t_i \le t$. If person $k$ is susceptible at time $t$, his or her total hazard of infection at time $t$ given $H(t)$ is $\sum_{i=0}^n \lambda_{ik}(t - t_i \mid r_i)$, where we let $\lambda_{0k}(t - t_0 \mid r_0) = \lambda_{0k}(t)$ for simplicity of notation. If an infection occurs in person $k$ at time $t_k < \infty$, then the conditional probability that person $j$ infected person $k$ given $H(t_k)$ is

$$p_{jk} = \frac{\lambda_{jk}(t_k - t_j \mid r_j)}{\sum_{i=0}^n \lambda_{ik}(t_k - t_i \mid r_i)}, \tag{5}$$

which is the probability that $t_{jk} = \min(t_{0k}, t_{1k}, ..., t_{nk})$. This has the same form as equation (4) except that it uses hazards of infectious contact instead of a pdf for the serial interval. If the hazards of infectious contact in the underlying SIR model do not change over the course of an epidemic, then $p_{jk}$ can be estimated accurately throughout an epidemic. Unlike the assumption of a stable generation or serial interval distribution, this assumption is unaffected by competition among potential infectors. The rest of the estimation of $R(t)$ could proceed exactly as in Ref. [4], replacing $p^{(WT)}_{jk}$ with $p_{jk}$.

5.2 Partial likelihood for epidemic data

A partial likelihood for epidemic data can be derived using the same logic as that used to derive $p_{jk}$ in equation (5).
For each person $k$ such that $t_k < \infty$, the probability that the failure at time $t_k$ occurred in person $k$ given $H(t_k)$ is

$$\frac{\sum_{i=0}^n \lambda_{ik}(t_k - t_i \mid r_i)}{\sum_{j=1}^n \sum_{i=0}^n \lambda_{ij}(t_k - t_i \mid r_i)}, \tag{6}$$

where the numerator is the hazard of infection (from all sources) in person $k$ at time $t_k$ and the denominator is the total hazard of infection for all persons at risk of infection at time $t_k$.

If there is a vector of covariates $x_{ij}$ for each pair $ij$ (which may include individual-level covariates for $i$ and $j$ as well as pairwise covariates for the ordered pair $ij$) and a vector of parameters $\theta$ such that $\lambda_{ij}(\tau \mid r_i) = \lambda(\tau \mid r_i, x_{ij}, \theta)$, then a partial likelihood for $\theta$ can be obtained by multiplying equation (6) over all $m$ observed failure times. If $(k)$ denotes the index of the $k$th person infected, $\mathbf{t} = (t_1, ..., t_n)$, and $X = \{x_{ij} : i, j = 1, ..., n\}$, then the partial likelihood is

$$L_p(\theta \mid \mathbf{t}, X) = \prod_{k=1}^m \frac{\sum_{i=0}^n \lambda(t_{(k)} - t_i \mid r_i, x_{i(k)}, \theta)}{\sum_{j=1}^n \sum_{i=0}^n \lambda(t_{(k)} - t_i \mid r_i, x_{ij}, \theta)}. \tag{7}$$

This is very similar to partial likelihoods that arise in survival analysis, so many techniques from survival analysis may be adaptable for use in the analysis of epidemic data. The goal of such methods would be to allow statistical inference about the effects of individual and pairwise covariates on the hazard of infection in ordered pairs of individuals. In the ordered pair $ij$, the effects of individual covariates for $i$ and $j$ on $\lambda_{ij}(\tau \mid r_i)$ would reflect the infectiousness of $i$ and the susceptibility of $j$, respectively. Pairwise covariates could include such information as whether $i$ and $j$ are in the same household, the distance between their households, whether they are sexual partners, and any other aspects of their relationship to each other that may affect the hazard of infection from $i$ to $j$.
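Equations (5) through (7) can be made concrete with a toy example. The sketch below is illustrative only and rests on several assumptions of ours: the hazard is $\lambda(\tau \mid r_i, x_{ij}, \theta) = e^{\theta x_{ij}}$ while $i$ is infectious and zero otherwise, the single pairwise covariate $x_{ij}$ indicates a shared household, there is no external hazard after the imported infections (which are skipped in the product), and the inner sum over $j$ in equation (7) is restricted to persons still susceptible at $t_{(k)}$, following the verbal description of the denominator. All function names are hypothetical.

```python
import math


def hazard(tau, r_i, same_household, theta):
    """Assumed hazard: exp(theta * x_ij) while i is infectious, else 0."""
    if 0.0 < tau <= r_i:
        return math.exp(theta * same_household)
    return 0.0


def hazards_on(k, time, t, r, household, theta):
    """Hazard of infectious contact from each earlier-infected i to k at `time`."""
    return {
        i: hazard(time - t[i], r[i], household[i] == household[k], theta)
        for i in range(len(t))
        if i != k and t[i] < time
    }


def infector_probabilities(k, t, r, household, theta):
    """Equation (5): p_jk for each possible infector j of an infected k."""
    h = hazards_on(k, t[k], t, r, household, theta)
    total = sum(h.values())
    return {j: h_j / total for j, h_j in h.items() if h_j > 0.0}


def expected_secondary_cases(j, t, r, household, theta, imported):
    """E[R_j] as the sum of p_jk over infected, non-imported k."""
    return sum(
        infector_probabilities(k, t, r, household, theta).get(j, 0.0)
        for k in range(len(t))
        if k not in imported and t[k] < math.inf
    )


def log_partial_likelihood(theta, t, r, household, imported):
    """Log of equation (7), with one factor per non-imported infection."""
    n = len(t)
    log_lik = 0.0
    for k in range(n):
        if k in imported or t[k] == math.inf:
            continue
        at_risk = [j for j in range(n) if t[j] >= t[k]]  # susceptible just before t_k
        numerator = sum(hazards_on(k, t[k], t, r, household, theta).values())
        denominator = sum(
            sum(hazards_on(j, t[k], t, r, household, theta).values())
            for j in at_risk
        )
        log_lik += math.log(numerator / denominator)
    return log_lik


# Toy data: four people in two households; person 0 is imported, person 3 escapes.
t = [0.0, 0.3, 0.8, math.inf]
r = [1.0, 1.0, 1.0, 1.0]
household = [0, 0, 1, 1]
imported = {0}

print(infector_probabilities(2, t, r, household, theta=1.0))
print(expected_secondary_cases(0, t, r, household, 1.0, imported))
print(log_partial_likelihood(0.0, t, r, household, imported))
```

In this toy data set the log partial likelihood is $\theta - \log(e^{\theta} + 2) + \log(1/2)$, which increases with $\theta$ because the only within-household pair at risk did transmit. As in Cox regression, any common baseline multiplier of the hazard cancels from each factor, so only relative hazards are identified.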
This approach has several advantages over any approach based on a distribution of generation or serial intervals. First, it is not necessary to determine who infected whom in any subset of observed infections. If $v_j$ is known for some $j$, this knowledge can be incorporated in the partial likelihood by replacing the term for the failure time of person $j$ in (7) with $p_{v_j j}$ from equation (5). Second, this approach allows the use of individual-level and pairwise covariates for inference in a flexible and intuitive way. The resulting estimated hazard functions have a straightforward interpretation and can be incorporated naturally into a stochastic SIR model. Third, this approach allows theory and methods from survival analysis to be applied to the analysis of epidemic data.

6 Discussion

Generation and serial interval distributions are not stable characteristics of an infectious disease. When multiple infectious persons compete to infect a given susceptible person, infection is caused by the first person to make infectious contact. In Section 3, we showed that the mean infectious contact interval $\tau_{ij}$ given that $i$ actually infected $j$ is less than or equal to the mean $\tau_{ij}$ given that $i$ made infectious contact with $j$. That is,

$$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty],$$

with strict inequality when $\tau_{ij}$ is non-constant and $j$ is at risk of infectious contact from any source other than $i$ (more precise conditions are given in Section 3). This result holds for all time-homogeneous stochastic SIR models.

In an epidemic, the mean generation (and serial) intervals contract as the prevalence of infection increases and susceptible persons are at risk of infectious contact from multiple sources. In the simulations of Section 4, we saw that the degree of contraction increases with $R_0$.
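The inequality above can be checked with a small Monte Carlo experiment. The sketch below is only a toy illustration under simplifying assumptions that are not part of the general model: person $i$ and a single competing source become infectious at the same time, both infectious contact intervals are exponential and always finite, and the function name `simulate_contraction` and its parameters are hypothetical.

```python
import random


def simulate_contraction(rate_i=1.0, rate_other=1.0, n=200_000, seed=0):
    """Compare E[tau_ij] with E[tau_ij | i wins the race to infect j].

    Assumes both potential infectors start at the same time and have
    exponential infectious contact intervals with the given rates.
    Returns (mean over all draws, mean over draws where i infects j).
    """
    rng = random.Random(seed)
    total_all = 0.0
    total_win = 0.0
    wins = 0
    for _ in range(n):
        tau_i = rng.expovariate(rate_i)          # contact interval from i
        tau_other = rng.expovariate(rate_other)  # competing contact interval
        total_all += tau_i
        if tau_i < tau_other:
            # i made the first infectious contact, so v_j = i.
            total_win += tau_i
            wins += 1
    return total_all / n, total_win / wins


mean_all, mean_given_infection = simulate_contraction()
```

With equal rates of 1, the unconditional mean is 1, while the minimum of two independent exponentials has mean 1/2, so the conditional mean should come out near 0.5. Raising `rate_other` mimics more intense competition and shortens the conditional mean further.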
For models with clustering of contacts, generation interval contraction can occur even when the global prevalence of infection is low because susceptibles are at risk of infectious contact from multiple sources within their own clusters. In all of the simulations, the greatest serial interval contraction coincided with the peak prevalence of infection, when the risk of infectious contacts from multiple sources was highest. The mean generation interval increases again as the epidemic wanes, but this rebound may be small when $R_0$ is high.

The reason that generation and serial intervals contract during an epidemic is that their definition applies to pairs of individuals $ij$ such that $i$ actually transmitted infection to $j$. If we do not require that an infectious contact leads to the transmission of infection, we are led naturally to the concept of the infectious contact interval, which has a well-defined distribution throughout an epidemic. Similarly, we can define $R_0$ as the mean number of infectious contacts (i.e., finite infectious contact intervals) made by a primary case without reference to a completely susceptible population. Generation and serial intervals and the effective reproductive number can then be defined in terms of infectious contacts that actually lead to the transmission of infection. Many fundamental concepts in infectious disease epidemiology can be simplified usefully by defining them in terms of infectious contact rather than infection transmission.

Infectious contact hazards for ordered pairs of individuals can be used for many of the same types of analysis that have been attempted using generation or serial interval distributions. In Section 5, we derived a hazard-based estimator of $R(t)$ very similar to that developed by Wallinga and Teunis [4].
This derivation led naturally to a partial likelihood for epidemic data very similar to those that arise in survival analysis. We believe that the adaptation of methods and theory from survival analysis to infectious disease epidemiology will yield flexible and powerful tools for epidemic data analysis.

Acknowledgements: This work was supported by the US National Institutes of Health cooperative agreement 5U01GM076497 "Models of Infectious Disease Agent Study" (E.K. and M.L.) and Ruth L. Kirchstein National Research Service Award 5T32AI007535 "Epidemiology of Infectious Diseases and Biodefense" (E.K.). We also wish to thank Jacco Wallinga and the anonymous reviewers of Mathematical Biosciences for useful comments and suggestions.

References

[1] J. Giesecke (1994). Modern Infectious Disease Epidemiology. London: Edward Arnold.
[2] Å. Svensson (2007). A note on generation times in epidemic models. Mathematical Biosciences 208: 300-311.
[3] M. Lipsitch, T. Cohen, B. Cooper, et al. (2003). Transmission dynamics and control of Severe Acute Respiratory Syndrome. Science 300: 1966-1970.
[4] J. Wallinga and P. Teunis (2004). Different epidemic curves for Severe Acute Respiratory Syndrome reveal similar impacts of control measures. American Journal of Epidemiology 160(6): 509-516.
[5] C.E. Mills, J. Robins, and M. Lipsitch (2004). Transmissibility of 1918 pandemic influenza. Nature 432: 904.
[6] N.M. Ferguson, D.A.T. Cummings, S. Cauchemez, C. Fraser, S. Riley, A. Meeyai, S. Iamsirithaworn, and D. Burke (2005). Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature 437: 209-214.
[7] A.Y. Kuk and S. Ma (2005). Estimation of SARS incubation distribution from serial interval data using a convolution likelihood. Statistics in Medicine 24(16): 2525-37.
[8] J. Wallinga and M. Lipsitch (2007). How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B 274: 599-604.
[9] R.M. Anderson and R.M. May (1991). Infectious Diseases of Humans: Dynamics and Control. New York: Oxford University Press.
[10] E. Kenah and J. Robins (2007). Second look at the spread of epidemics on networks. Physical Review E 76: 036113.
[11] E. Kenah and J. Robins (2007). Network-based analysis of stochastic SIR epidemic models with random and proportionate mixing. Journal of Theoretical Biology 249(4): 706-722.
[12] A. Gut (1995). An Intermediate Course in Probability. New York: Springer-Verlag.

A Figures and tables

$R_0$    $E[\tau_{ij} \mid \tau_{ij} < \infty]$    $.5 - E[\tau_{ij} \mid \tau_{ij} < \infty]$
1.25     .49999                                    .00001
1.5      .499988                                   .000012
2        .499983                                   .000017
3        .499975                                   .000025
4        .499967                                   .000033
5        .499958                                   .000042

Table 1: Expected infectious contact interval given that infectious contact occurs in the models illustrating global competition among potential infectors. If the generation interval were constant, this would be the mean generation interval throughout an epidemic.

Figure 1: Schematic diagram of variables in the general stochastic SIR model for the ordered pair $ij$. Recall that $t_j \le t_{ij}$. As discussed in Section 3.2, person $i$ develops symptoms at time $t_i^{sym} = t_i + q_i$, where $q_i$ is the incubation period.

Figure 2: The smoothed mean generation interval as a function of the source infection time for $R_0 = 2, 3, 4, 5$. There is a clear tendency to contract, with greater contraction for higher $R_0$.

Figure 3: The smoothed mean generation interval (solid lines) and prevalence (dotted lines) as a function of the source infection time for $R_0 = 2, 3, 4, 5$. In all cases, the greatest contraction of the serial interval coincides with the peak prevalence of infection (i.e., the greatest competition among potential infectors).

Figure 4: The smoothed mean generation intervals (solid lines) and prevalence (dotted lines) as a function of the source infection time for $R_0 = 1.25$ and $1.50$. For $R_0$ near one, the mean generation interval stays relatively constant.

Figure 5: The smoothed mean scaled generation interval (SGI) and prevalence as a function of the source infection time for $R = 2$ and $R = 3$. With increasing cluster size, the degree of generation interval contraction is roughly the same even though the peak prevalence of infection is lower.
