Generation interval contraction and epidemic data analysis
The generation interval is the time between the infection time of an infected person and the infection time of his or her infector. Probability density functions for generation intervals have been an important input for epidemic models and epidemic d…
Authors: Eben Kenah, Marc Lipsitch, James M. Robins
Generation interval contraction and epidemic data analysis

Eben Kenah 1,2,*, Marc Lipsitch 1,3, James M. Robins 1,2

1 Department of Epidemiology
2 Department of Biostatistics
3 Department of Immunology and Infectious Disease
Harvard School of Public Health
677 Huntington Ave., Boston, Massachusetts, USA
* Corresponding author: ekenah@hsph.harvard.edu

April and November, 2006
Revised January-June and October-November, 2007

Abstract

The generation interval is the time between the infection time of an infected person and the infection time of his or her infector. Probability density functions for generation intervals have been an important input for epidemic models and epidemic data analysis. In this paper, we specify a general stochastic SIR epidemic model and prove that the mean generation interval decreases when susceptible persons are at risk of infectious contact from multiple sources. The intuition behind this is that when a susceptible person has multiple potential infectors, there is a "race" to infect him or her in which only the first infectious contact leads to infection. In an epidemic, the mean generation interval contracts as the prevalence of infection increases. We call this global competition among potential infectors. When there is rapid transmission within clusters of contacts, generation interval contraction can be caused by a high local prevalence of infection even when the global prevalence is low. We call this local competition among potential infectors. Using simulations, we illustrate both types of competition. Finally, we show that hazards of infectious contact can be used instead of generation intervals to estimate the time course of the effective reproductive number in an epidemic.
This approach leads naturally to partial likelihoods for epidemic data that are very similar to those that arise in survival analysis, opening a promising avenue of methodological research in infectious disease epidemiology.

1 Introduction

In infectious disease epidemiology, the serial interval is the difference between the symptom onset time of an infected person and the symptom onset time of his or her infector [1]. This is sometimes called the "generation interval." However, we find it more useful to adopt the terminology of Svensson [2] and define the generation interval as the difference between the infection time of an infected person and the infection time of his or her infector. By these definitions, the serial interval is observable while the generation interval usually is not. We define infectious contact from $i$ to $j$ to be a contact that is sufficient to infect $j$ if $i$ is infectious and $j$ is susceptible, and we define a potential infector of person $i$ to be an infectious person who has positive probability of making infectious contact with $i$. Finally, we use the term hazard rather than force of infection to highlight the similarities between epidemic data analysis and survival analysis.

The generation interval has been an important input for epidemic models used to investigate the transmission and control of SARS [3, 4] and pandemic influenza [5, 6]. More recently, generation interval distributions have been used to calculate the incubation period distribution of SARS [7] and to estimate $R_0$ from the exponential growth rate at the beginning of an epidemic [8]. It is generally assumed that the generation interval distribution is characteristic of an infectious disease. In this paper, we show that this is not true. Instead, the expected generation interval decreases as the number of potential infectors of susceptibles increases.
During an epidemic, generation intervals tend to contract as the prevalence of infection increases. This effect was described by Svensson [2] for an SIR model with homogeneous mixing. In this paper, we extend this result to all time-homogeneous stochastic SIR models.

A simple thought experiment illustrates the intuition behind our main result. Imagine a susceptible person $j$ in a room. Place $m$ other persons in the room and infect them all at time $t = 0$. For simplicity, assume that infectious contact from $i$ to $j$ occurs with probability one, $i = 1, ..., m$. Let $t_{ij}$ be a continuous nonnegative random variable denoting the first time at which $i$ makes infectious contact with $j$. Person $j$ is infected at time $t_j = \min(t_{1j}, ..., t_{mj})$. Since all infectious persons were infected at time zero, $t_j$ is the generation interval. If we repeat the experiment with larger and larger $m$, the expected value of $\min(t_{1j}, ..., t_{mj})$ will decrease. When a susceptible person is at risk of infectious contact from multiple sources, there is a "race" to infect him or her in which only the first infectious contact leads to infection.

Generation interval contraction is an example of a well-known phenomenon in epidemiology: The expected time to an outcome, given that the outcome occurs, decreases in the presence of competing risks. In our thought experiment, the outcome is the infection of $j$ by a given $i$ and the competing risks are infectious contacts from all sources other than $i$. Adapting our thought experiment slightly, we see that the contraction of the generation interval is a consequence of the fact that the hazard of infection for $j$ increases as the number of potential infectors increases. Let $\lambda(t)$ be the hazard of infectious contact from any potential infector to $j$ at time $t$ and let $E[t_j \mid m]$ be the expected infection time of $j$ given $m$ potential infectors.
Then, writing $\Lambda(t) = \int_0^t \lambda(u)\,du$ for the cumulative hazard, so that the probability that $j$ escapes infection until time $t$ is $e^{-m\Lambda(t)}$,

$$E[t_j \mid m] = \int_0^\infty e^{-m\Lambda(t)}\,dt > \int_0^\infty e^{-(m+1)\Lambda(t)}\,dt = E[t_j \mid m+1],$$

so the expected generation interval decreases as the number of potential infectors increases. A hazard of infection that increases with the number of potential infectors is a defining feature of most epidemic models, so generation interval contraction is a very general phenomenon. We note that a very similar phenomenon occurs in endemic diseases, where increased force of infection results in a decreased average age at first infection [9].

The rest of the paper is organized as follows: In Section 2, we describe a general stochastic SIR epidemic model. In Section 3, we use this model to show that the mean generation interval decreases as the number of potential infectors increases. As a corollary, we find that the mean serial interval also decreases. In Section 4, we consider the role of the population contact structure in generation interval contraction and illustrate the effects of global and local competition among potential infectors with simulations. In Section 5, we argue that hazards of infectious contact should be used instead of generation or serial interval distributions in the analysis of epidemic data. Section 6 summarizes our main results and conclusions.

2 General stochastic SIR model

We start with a very general stochastic "Susceptible-Infectious-Removed" (SIR) epidemic model. This model includes fully-mixed and network-based models as special cases, and it has been used previously to define a mapping from the final outcomes of stochastic SIR models to the components of semi-directed random networks [10, 11]. Each person $i$ is infected at his or her infection time $t_i$, with $t_i = \infty$ if $i$ is never infected.
Person $i$ recovers from infectiousness or dies at time $t_i + r_i$, where the recovery period $r_i$ is a positive random variable with the cumulative distribution function (cdf) $F_i(r)$. The recovery period $r_i$ may be the sum of a latent period, during which $i$ is infected but not infectious, and an infectious period, during which $i$ can transmit infection. We assume that all infected persons have a finite recovery period. If person $i$ is never infected, let $r_i = \infty$. Let $\mathrm{Sus}(t) = \{i : t_i > t\}$ be the set of susceptibles at time $t$.

When person $i$ is infected, he or she makes infectious contact with person $j$ after an infectious contact interval $\tau_{ij}$. Each $\tau_{ij}$ is a positive random variable with cdf $F_{ij}(\tau \mid r_i)$ and survival function $S_{ij}(\tau \mid r_i) = 1 - F_{ij}(\tau \mid r_i)$. Let $\tau_{ij} = \infty$ if person $i$ never makes infectious contact with person $j$, so the infectious contact interval distribution may have probability mass at $\infty$. Define

$$S_{ij}(\infty \mid r_i) = \lim_{\tau \to \infty} S_{ij}(\tau \mid r_i),$$

which is the conditional probability that $i$ never makes infectious contact with $j$ given $r_i$. Since a person cannot transmit disease before being infected or after recovering from infectiousness, $S_{ij}(\tau \mid r_i) = 1$ for all $\tau \le 0$ and $S_{ij}(\tau \mid r_i) = S_{ij}(\infty \mid r_i)$ for all $\tau \ge r_i$. Since a person cannot infect himself (or herself), $\tau_{ii} = \infty$ with probability one and $S_{ii}(\tau \mid r_i) = 1$ for all $\tau$.

The infectious contact time $t_{ij} = t_i + \tau_{ij}$ is the time at which person $i$ makes infectious contact with person $j$. If person $j$ is susceptible at time $t_{ij}$, then $i$ infects $j$ and $t_j = t_{ij}$. If $t_{ij} < \infty$, then $t_j \le t_{ij}$ because person $j$ avoids infection at time $t_{ij}$ only if he or she has already been infected. If person $i$ never makes infectious contact with person $j$, then $t_{ij} = \infty$ because $\tau_{ij} = \infty$. Figure 1 shows a schematic diagram of the relationships among $r_i$, $\tau_{ij}$, and $t_{ij}$.
The importation time $t_{0i}$ of person $i$ is the earliest time at which he or she receives infectious contact from outside the population. The importation time vector is $\mathbf{t}_0 = (t_{01}, ..., t_{0n})$. We assume that each infected person has a unique infector. Following [4], we let $v_i$ represent the index of the person who infected person $i$, with $v_i = 0$ for imported infections and $v_i = \infty$ if $i$ is never infected. If tied infectious contact times have nonzero probability, then $v_i$ can be chosen from all $j$ such that $t_{ji} = t_i < \infty$.

2.1 Epidemics

Let $t_{(1)} \le t_{(2)} \le ... \le t_{(m)}$ be the order statistics of all $t_1, ..., t_n$ less than infinity, and let $(k)$ be the index of the $k$th person infected. Before the epidemic begins, an importation time vector $\mathbf{t}_0$ is chosen. The epidemic begins at time $t_{(1)} = \min_i(t_{0i})$. Person $(1)$ is assigned a recovery period $r_{(1)}$. Every person $j \in \mathrm{Sus}(t_{(1)})$ is assigned an infectious contact time $t_{(1)j} = t_{(1)} + \tau_{(1)j}$. The second infection occurs at $t_{(2)} = \min_{j \in \mathrm{Sus}(t_{(1)})} \min(t_{0j}, t_{(1)j})$, which is the first infectious contact time after $t_{(1)}$. Person $(2)$ is assigned a recovery period $r_{(2)}$. After $k$ infections, the next infection occurs at $t_{(k+1)} = \min_{j \in \mathrm{Sus}(t_{(k)})} \min(t_{0j}, t_{(1)j}, ..., t_{(k)j})$. The epidemic stops after $m$ infections if and only if $t_{(m+1)} = \infty$.

3 Generation interval contraction

In this section, we show that the mean infectious contact interval $\tau_{ij}$ given that $i$ infects $j$ is shorter than the mean infectious contact interval given that $i$ makes infectious contact with $j$. In the notation from the previous section,

$$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$$

(note that $v_j = i$ implies $\tau_{ij} < \infty$ but not vice versa). In general, this inequality is strict when $j$ is at risk of infectious contact from any source other than $i$. This inequality implies the contraction of generation and serial intervals during an epidemic.
For background on the probability theory used in this section, please see Ref. [12] or any other probability text.

Lemma 1. $E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$.

Proof. We first show that $E[\tau_{ij} \mid r_i, v_j = i] \le E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]$ and then use the law of iterated expectation. If person $i$ was infected at time $t_i$ and has recovery period $r_i$, then the probability that $\tau_{ij} < \infty$ is $F_{ij}(\infty \mid r_i) = 1 - S_{ij}(\infty \mid r_i)$. Let

$$F^*_{ij}(\tau \mid r_i) = \frac{F_{ij}(\tau \mid r_i)}{F_{ij}(\infty \mid r_i)}$$

be the conditional cdf of $\tau_{ij}$ given $r_i$ and $\tau_{ij} < \infty$. Then

$$E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] = \int_0^{r_i} \tau \, dF^*_{ij}(\tau \mid r_i). \tag{1}$$

If person $j$ is susceptible at time $t_i$ and $\tau_{ij} < \infty$, then $v_j = i$ if and only if $j$ escapes infectious contact from all other infectious people during the time interval $(t_i, t_i + \tau_{ij})$. Let $S^*_j(t_i + \tau)$ be the probability that $j$ escapes infectious contact from all sources other than $i$ in the interval $(t_i, t_i + \tau)$. Given $r_i$ and $\tau_{ij} < \infty$, the conditional probability density for an infectious contact from $i$ to $j$ at time $t_i + \tau$ that leads to the infection of $j$ is proportional to

$$S^*_j(t_i + \tau) \, dF^*_{ij}(\tau \mid r_i).$$

If we let

$$\psi = \int_0^{r_i} S^*_j(t_i + \tau) \, dF^*_{ij}(\tau \mid r_i),$$

then

$$E[\tau_{ij} \mid r_i, v_j = i] = \int_0^{r_i} \tau \, \frac{S^*_j(t_i + \tau)}{\psi} \, dF^*_{ij}(\tau \mid r_i).$$

Since $S^*_j(t_i + \tau_{ij})$ is a monotonically decreasing function of $\tau_{ij}$,

$$\begin{aligned}
E[\tau_{ij} \mid r_i, v_j = i] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]
&= E\!\left[\tau_{ij} \frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\Big|\, r_i, \tau_{ij} < \infty\right] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] \, E\!\left[\frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\Big|\, r_i, \tau_{ij} < \infty\right] \\
&= \mathrm{Cov}\!\left(\tau_{ij}, \frac{S^*_j(t_i + \tau_{ij})}{\psi} \,\Big|\, r_i, \tau_{ij} < \infty\right) \le 0,
\end{aligned}$$

where the first equality uses $E[S^*_j(t_i + \tau_{ij})/\psi \mid r_i, \tau_{ij} < \infty] = 1$, which follows from the definition of $\psi$. Therefore,

$$E[\tau_{ij} \mid r_i, v_j = i] \le E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]. \tag{2}$$

Since the same inequality holds for all $r_i$,

$$E[\tau_{ij} \mid v_j = i] = E[E[\tau_{ij} \mid r_i, v_j = i]] \le E[E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]] = E[\tau_{ij} \mid \tau_{ij} < \infty] \tag{3}$$

by the law of iterated expectation.
Equality holds in equation (2) if and only if $\tau_{ij}$ and $S^*_j(t_i + \tau_{ij})$ have covariance zero given $r_i$ and $\tau_{ij} < \infty$. Since $S^*_j(t_i + \tau_{ij})$ is a monotonically decreasing function of $\tau_{ij}$, this will occur if and only if $\tau_{ij}$ or $S^*_j(t_i + \tau_{ij})$ is constant given $r_i$ and $\tau_{ij} < \infty$. Equality holds in equation (3) if and only if equality holds in (2) with probability one in $r_i$. If $\tau_{ij}$ is constant, then clearly $S^*_j(t_i + \tau_{ij})$ is constant and their covariance is zero. If $j$ is not at risk of infectious contact from any source other than $i$, then $S^*_j(t_i + \tau_{ij})$ will be constant even when $\tau_{ij}$ is not. In the thought experiment from the Introduction, the expected infection time of the susceptible $j$ would remain constant in the following two scenarios: (i) all infectious persons make infectious contact with $j$ at a fixed time $t_0$, or (ii) $j$ is only at risk of infectious contact from a single person. Scenario (i) corresponds to a constant $\tau_{ij}$ and scenario (ii) corresponds to a constant $S^*_j(t_i + \tau_{ij})$.

The expected generation interval from $i$ to $j$ given $v_j = i$ will be shortest when the risk of infectious contact to $j$ from sources other than $i$ is greatest. More specifically, $E[\tau_{ij} \mid r_i, v_j = i] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]$ will be minimized when $S^*_j(t_i + \tau_{ij})$ decreases fastest in $\tau_{ij}$. In general, the risk of infectious contact from other sources will be greatest when the prevalence of infection is highest, so we expect the greatest contraction of the serial interval during an epidemic to coincide with the peak prevalence of infection. In general, we expect to see the following pattern over the course of an epidemic: The mean generation interval decreases as the prevalence of infection increases, reaches a minimum as the prevalence of infection peaks, and increases again as the prevalence of infection decreases.
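The quantities in the proof of Lemma 1 can be evaluated numerically for a simple special case. The sketch below is an illustration only, not part of the original analysis. It assumes a constant hazard `h` of infectious contact from $i$ to $j$ truncated at $r_i = 1$, and a constant total hazard `c` of infectious contact to $j$ from all other sources, so that $S^*_j(t_i + \tau) = e^{-c\tau}$. The function names and the midpoint-rule integration are our own choices.

```python
import math

def truncated_exp_pdf(tau, h, r=1.0):
    """Density of tau_ij given r_i = r and tau_ij < infinity (constant hazard h)."""
    return h * math.exp(-h * tau) / (1.0 - math.exp(-h * r))

def mean_interval_given_infection(h, c, r=1.0, steps=20000):
    """E[tau_ij | r_i, v_j = i] by the midpoint rule.

    h: hazard of infectious contact from i to j on (0, r)
    c: ASSUMED constant hazard of contact to j from all other sources,
       so that S*_j(t_i + tau) = exp(-c * tau); c = 0 means no competition.
    """
    d = r / steps
    psi = 0.0       # normalizing constant psi from the proof
    weighted = 0.0  # integral of tau * S*_j(t_i + tau) dF*_ij(tau | r_i)
    for s in range(steps):
        tau = (s + 0.5) * d
        w = math.exp(-c * tau) * truncated_exp_pdf(tau, h, r) * d
        psi += w
        weighted += tau * w
    return weighted / psi

if __name__ == "__main__":
    h = 0.0004  # roughly R0 / (n - 1) for R0 = 4 and n = 10,000
    for c in (0.0, 1.0, 4.0, 16.0):
        print(c, round(mean_interval_given_infection(h, c), 4))
```

With `c = 0` the function returns $E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]$, and larger values of `c` give smaller means, which is the contraction described by inequality (2) under these assumptions.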
3.1 Types of generation intervals

In [2], Svensson discussed two types of generation intervals that are consistent with the verbal definition given in the Introduction. $T_p$ ($p$ for "primary") denotes $\tau_{ij}$ where $i$ is chosen at random from all persons who infect at least one other person and $j$ is chosen randomly from the set of persons $i$ infects. $T_s$ ($s$ for "secondary") denotes $\tau_{ij}$ where $j$ is chosen at random from all persons infected from within the population and $i = v_j$. $T_p$ and $T_s$ differ only in the sampling procedure used to obtain the ordered pair $ij$; $T_p$ samples primary cases (infectors) at random while $T_s$ samples secondary cases at random. Equation (3) implies that both $E[T_p]$ and $E[T_s]$ decrease when susceptible persons are at risk of infectious contact from multiple sources. This contraction occurs because the definitions of $T_p$ and $T_s$ include only $\tau_{ij}$ such that $i$ actually infected $j$.

3.2 Serial interval contraction

In an epidemic, infection times are generally unobserved. Instead, symptom onset times are observed. Recall that the time between the onset of symptoms in an infected person and the onset of symptoms in his or her infector is called the serial interval. Contraction of the mean generation interval implies contraction of the mean serial interval as well.

The incubation period is the time from infection to the onset of symptoms [1]. Let $q_i$ be the incubation period in person $i$, and let $t^{\mathrm{sym}}_i = t_i + q_i$ be the time of his or her onset of symptoms. If $v_j = i$, then the serial interval associated with person $j$ is

$$t^{\mathrm{sym}}_j - t^{\mathrm{sym}}_i = \tau_{ij} + q_j - q_i.$$

Therefore,

$$E[t^{\mathrm{sym}}_j - t^{\mathrm{sym}}_i \mid v_j = i] = E[\tau_{ij} \mid v_j = i] + E[q_j] - E[q_i] \le E[\tau_{ij} \mid \tau_{ij} < \infty] + E[q_j] - E[q_i],$$

with strict inequality whenever strict inequality holds for the corresponding generation interval.
Over the course of an epidemic, we expect the mean serial interval to follow a pattern very similar to that of the mean generation interval.

4 Simulations

We refer to the "race" to infect a susceptible person as competition among potential infectors. In this section, we illustrate two types of competition among potential infectors: Global competition among potential infectors results from a high global prevalence of infection. Local competition among potential infectors results from rapid transmission within clusters of contacts, which causes susceptibles to be at risk of infectious contact from multiple sources within their clusters even if the global prevalence of infection is low. In real epidemics, the prevalence of infection is usually low but there is clustering of contacts within households, hospital wards, schools, and other settings.

In this section, we use simulations to illustrate generation interval contraction under global and local competition among potential infectors. Each simulation is a single realization of a stochastic SIR model in a population of 10,000. We keep track of the infection times of the primary and secondary case in each infector/infectee pair and the prevalence of infection at the infection time of the secondary case, which is a proxy for the amount of competition to infect the secondary case. We then calculate a smoothed mean of the generation interval as a function of the infection time of the primary case in each pair.

Another valid approach would be to calculate the smoothed means from the results of many simulations. We did not take this approach for the following reasons: (i) Because of variation in the time course of different realizations of the same stochastic SIR model, many simulations would be required to obtain a curve that reliably approximates the asymptotic limit.
(ii) The smoothed mean over many simulations would show a pattern similar to that obtained in any single simulation. (iii) Generation interval contraction was proven in Section 3, so the simulations are intended primarily as illustrations.

All simulations were implemented in Mathematica 5.0.0.0 [© 1988-2003 Wolfram Research, Inc.]. All data analysis was done using Intercooled Stata 9.2 [© 1985-2007 StataCorp LP]. All smoothed means are running means with a bandwidth of 0.8 (the default for the Stata command lowess with the option mean). Similar results were obtained for larger and smaller bandwidths.

4.1 Global competition

To illustrate global competition among potential infectors, we use a fully-mixed model with population size $n = 10{,}000$ and basic reproductive number $R_0$. The infectious period is fixed, with $r_i = 1$ with probability one for all $i$. The infectious contact intervals $\tau_{ij}$ have an exponential distribution with hazard $R_0 (n-1)^{-1}$ truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-R_0 (n-1)^{-1} \tau}$ when $0 < \tau < 1$ and $\tau_{ij} = \infty$ with probability $e^{-R_0 (n-1)^{-1}}$. The epidemic starts with a single imported infection and no other imported infections occur. From equation (1), the mean infectious contact interval given that contact occurs is

$$E[\tau_{ij} \mid \tau_{ij} < \infty] = \int_0^1 \frac{e^{-R_0 \tau (n-1)^{-1}} - e^{-R_0 (n-1)^{-1}}}{1 - e^{-R_0 (n-1)^{-1}}} \, d\tau.$$

For $n = 10{,}000$, Table 1 shows this expected value at each $R_0$. For all $R_0$, $E[\tau_{ij} \mid \tau_{ij} < \infty] \approx .5$.

This model was run once at $R_0 = 1.25$, $1.5$, $2$, $3$, $4$, $5$, and $10$. For each simulation, we recorded $t_i$, $v_i$, $t_{v_i}$, and the prevalence of infection at time $t_i$ in each infector/infectee pair. Figure 2 shows smoothed mean curves for the generation interval versus the source infection time for $R_0 = 2, 3, 4, 5$.
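The fully-mixed model just described can be simulated directly from the construction in Section 2.1. The following is a minimal sketch, not the Mathematica code used for the figures. It assumes an event queue (a binary heap) of pending infectious contact times, draws each finite $\tau_{ij}$ by inverting the truncated exponential cdf, and only generates contacts to persons who are still susceptible when their potential infector is infected, since contacts to already-infected persons cannot cause infection. The function names and the `seed` argument are hypothetical. The closed form in `mean_contact_interval` follows from integrating the expression for $E[\tau_{ij} \mid \tau_{ij} < \infty]$ above.

```python
import heapq
import math
import random

def mean_contact_interval(R0, n):
    """E[tau_ij | tau_ij < infinity] for hazard R0/(n-1) truncated at r_i = 1."""
    h = R0 / (n - 1)
    return 1.0 / h - 1.0 / math.expm1(h)

def simulate_fully_mixed(R0, n, seed=0):
    """One realization; returns (source infection time, generation interval) pairs."""
    rng = random.Random(seed)
    h = R0 / (n - 1)
    p = 1.0 - math.exp(-h)   # probability that tau_ij is finite
    infected = [False] * n
    contacts = []            # heap of (contact time t_ij, i, t_i, j)
    pairs = []

    def infect(j, t_j):
        infected[j] = True
        for k in range(n):
            if not infected[k] and rng.random() < p:
                # inverse-cdf draw from the exponential truncated at r_j = 1
                tau = -math.log(1.0 - rng.random() * p) / h
                heapq.heappush(contacts, (t_j + tau, j, t_j, k))

    infect(0, 0.0)           # single imported infection at time 0
    while contacts:
        t_ij, i, t_i, j = heapq.heappop(contacts)
        if not infected[j]:  # only the first infectious contact infects j
            pairs.append((t_i, t_ij - t_i))
            infect(j, t_ij)
    return pairs
```

A call such as `simulate_fully_mixed(4.0, 1000)` returns one pair per within-population infection. Averaging the second element and comparing it with `mean_contact_interval(4.0, 1000)` gives a rough check of the contraction. A single realization can also die out after a few infections, so small outbreaks should be discarded before comparing.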
There is a clear tendency for the mean generation interval to contract, with greater contraction at higher $R_0$. Figure 3 shows smoothed mean curves for the generation interval and the prevalence of infection versus the source infection time at each $R_0$; in each case, the greatest contraction of the serial interval coincides with the peak prevalence of infection (i.e., the greatest competition among potential infectors). Figure 4 shows the same curves for $R_0 = 1.25$ and $1.50$; in these cases, the generation interval stays relatively constant. These results are exactly in line with the argument of Section 3.

4.2 Local competition

To illustrate local competition among potential infectors, we grouped a population of $n = 9{,}000$ individuals into clusters of size $k$. As before, the infectious period is fixed at $r_i = 1$ for all $i$. When $i$ and $j$ are in the same cluster, the infectious contact interval $\tau_{ij}$ has an exponential distribution with hazard $\lambda_{\mathrm{within}}$ truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{\mathrm{within}} \tau}$ when $0 < \tau < 1$ and $\tau_{ij} = \infty$ with probability $e^{-\lambda_{\mathrm{within}}}$. When $i$ and $j$ are in different clusters, $\tau_{ij}$ has an exponential distribution with hazard $\lambda_{\mathrm{between}}$ truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{\mathrm{between}} \tau}$ when $0 < \tau < 1$ and $\tau_{ij} = \infty$ with probability $e^{-\lambda_{\mathrm{between}}}$.

We fixed the hazard of infectious contact between individuals in the same cluster at $\lambda_{\mathrm{within}} = .4$. We tuned the hazard of infectious contact between individuals in different clusters to obtain $R$ mean infectious contacts by infectious individuals; specifically,

$$\lambda_{\mathrm{between}} = \frac{R - (k-1)(1 - e^{-.4})}{n - k}.$$

We chose $\lambda_{\mathrm{within}} = .4$ to obtain rapid transmission within clusters while retaining sufficient transmission between clusters to sustain an epidemic. Note that when $k > R(1 - e^{-.4})^{-1} + 1$, we get the implausible result that $\lambda_{\mathrm{between}} < 0$.
Clearly, $R$ and $k$ must be chosen so that an infectious person makes an average of $R$ or fewer infectious contacts within his or her cluster, which guarantees that $\lambda_{\mathrm{between}} \ge 0$.

At a given $R$, the mean infectious contact interval given that infectious contact occurs depends on the cluster size. If the entire population is infectious and the cluster size is $k$, then a given individual will receive an average of $R$ infectious contacts, of which $(k-1)(1 - e^{-.4})$ come from within his or her cluster. The mean infectious contact interval for within-cluster contacts is

$$\frac{1}{1 - e^{-.4}} \int_0^1 .4 \tau e^{-.4\tau} \, d\tau,$$

and the mean infectious contact interval for between-cluster contacts is approximately $.5$ (as in the models for global competition). Therefore, the mean infectious contact interval given that contact occurs and the cluster size is $k$ is

$$E[\tau_{ij} \mid \tau_{ij} < \infty, k] \approx \left(1 - \frac{(k-1)(1 - e^{-.4})}{R}\right)(.5) + \frac{k-1}{R} \int_0^1 .4 \tau e^{-.4\tau} \, d\tau.$$

To compare generation interval contraction for different cluster sizes, we calculated scaled generation intervals by dividing the observed generation intervals at each cluster size by $E[\tau_{ij} \mid \tau_{ij} < \infty, k]$. If the mean generation interval remained constant, we would expect the mean scaled generation interval to be approximately one throughout an epidemic.

For $R = 2$, we ran the model with cluster sizes of 1 through 6. For $R = 3$, we ran the model with cluster sizes of 2 through 8. For each simulation, we recorded $t_i$, $v_i$, $t_{v_i}$, and the prevalence of infection at time $t_i$ in each infector/infectee pair. Figure 5 shows smoothed mean curves for the generation interval and prevalence versus the source infection time for several cluster sizes at each $R$. As before, there is a clear tendency of the mean generation interval to contract.
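The tuning of $\lambda_{\mathrm{between}}$ and the scaling constant $E[\tau_{ij} \mid \tau_{ij} < \infty, k]$ described above can be written out as a short calculation. This sketch is for illustration only. The function names are hypothetical, and the integral $\int_0^1 a \tau e^{-a\tau} \, d\tau$ is replaced by its closed form $(1 - e^{-a})/a - e^{-a}$, obtained by integration by parts.

```python
import math

LAM_WITHIN = 0.4  # within-cluster hazard used in Section 4.2

def lambda_between(R, k, n=9000, lam_within=LAM_WITHIN):
    """Between-cluster hazard that gives R mean infectious contacts per person."""
    within_contacts = (k - 1) * (1.0 - math.exp(-lam_within))
    lam = (R - within_contacts) / (n - k)
    if lam < 0:
        raise ValueError("cluster size k is too large for this R")
    return lam

def within_integral(lam_within=LAM_WITHIN):
    """Closed form of the integral of lam * tau * exp(-lam * tau) over (0, 1)."""
    a = lam_within
    return (1.0 - math.exp(-a)) / a - math.exp(-a)

def mean_contact_interval_clustered(R, k, lam_within=LAM_WITHIN):
    """Approximate E[tau_ij | tau_ij < infinity, k], the scaling constant."""
    frac_within = (k - 1) * (1.0 - math.exp(-lam_within)) / R
    return (1.0 - frac_within) * 0.5 + (k - 1) / R * within_integral(lam_within)

if __name__ == "__main__":
    for k in range(1, 7):
        print(k, lambda_between(2, k), mean_contact_interval_clustered(2, k))
```

For a cluster size of one there are no within-cluster contacts, so the scaling constant reduces to the between-cluster value of .5. A scaled generation interval is then an observed interval divided by `mean_contact_interval_clustered(R, k)`.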
The degree of contraction is roughly the same for all cluster sizes, but this contraction is maintained at a lower global prevalence of infection in models with larger cluster sizes. Similar results were obtained for cluster sizes not shown. Again, these results are exactly in line with the argument of Section 3.

5 Consequences for estimation

The effect of generation interval contraction on parameter estimates obtained from models that assume a constant generation or serial interval distribution is difficult to assess. The assumption of a constant serial or generation interval distribution may be reasonable in the early stages of an epidemic with little clustering of contacts, in an epidemic with $R_0$ near one, or in an endemic situation. However, this ignores the more fundamental issue that estimates of these distributions are obtained from transmission events where the infector/infectee pairs are known (often because of transmission from a known patient within a household or hospital ward). Even in the early stages of an epidemic, the generation interval distribution in these settings may differ substantially from the generation interval distribution for transmission in the general population.

In this section, we argue that hazards of infectious contact can be used instead of generation or serial intervals in the analysis of epidemic data. As an example, we look at the estimator of $R(t)$ (the effective reproductive number at time $t$) derived by Wallinga and Teunis [4] and applied to data on the SARS outbreaks in Hong Kong, Vietnam, Singapore, and Canada in 2003. In their paper, the available data were the "epidemic curve" $\mathbf{t} = (t_{(1)}, ..., t_{(m)})$, where $t_{(i)}$ is the infection time of the $i$th person infected.
They assume a probability density function (pdf) $w(\tau \mid \theta)$ for the serial interval given a vector $\theta$ of parameters (note that this parameter vector applies to the population, not to individuals). The infector of person $(i)$ is denoted by $v_{(i)}$, with $v_{(i)} = 0$ for imported infections. The "infection network" is a vector $\mathbf{v} = (v_{(1)}, ..., v_{(m)})$ specifying the source of infection for each infected person. With these assumptions, the likelihood of $\mathbf{v}$ and $\theta$ given $\mathbf{t}$ is

$$L(\mathbf{v}, \theta \mid \mathbf{t}) = \prod_{i : v_{(i)} \ne 0} w(t_{(i)} - t_{v_{(i)}} \mid \theta).$$

The sum of this likelihood over the set $V$ of all infection networks consistent with the epidemic curve $\mathbf{t}$ is

$$L(\theta \mid \mathbf{t}) = \prod_{i : v_{(i)} \ne 0} \sum_{j \ne i} w(t_i - t_j \mid \theta).$$

Taking a likelihood ratio, Wallinga and Teunis argue that the relative likelihood that person $k$ was infected by person $j$ is

$$p^{(WT)}_{jk} = \frac{w(t_k - t_j \mid \theta)}{\sum_{i \ne k} w(t_k - t_i \mid \theta)}. \tag{4}$$

The number $R_j$ of secondary infections generated by person $j$ is a sum of Bernoulli random variables with expectation

$$E[R_j] = \sum_{k=1}^{n} p^{(WT)}_{jk}.$$

An estimate of the effective reproductive number $R(t)$ can be obtained by calculating a smoothed mean for a scatter plot of $(t_j, E[R_j])$. This analysis is ingenious, but it can be only approximately correct because the distribution of serial intervals varies systematically over the course of an epidemic.

5.1 Hazard-based estimator

A very similar result can be derived by applying the theory of order statistics (see Ref. [12]) to the general stochastic SIR model from Section 2. Specifically, we use the following results: If $X_1, ..., X_n$ are independent non-negative random variables, then their minimum $X_{(1)}$ has the hazard function

$$\lambda_{(1)}(t) = \sum_{i=1}^{n} \lambda_i(t).$$

Given that the minimum is $x_{(1)}$, the probability that $X_j = x_{(1)}$ (i.e.
that the minimum was observed in the $j$th random variable) is

$$\frac{\lambda_j(x_{(1)})}{\sum_{i=1}^{n} \lambda_i(x_{(1)})}.$$

For simplicity, we assume that the infectious contact intervals $\tau_{ij}$ are absolutely continuous random variables. Let $\lambda_{ij}(\tau \mid r_i)$ be the conditional hazard function for $\tau_{ij}$ given $r_i$ and let $\lambda_{0i}(t)$ be the hazard function for infectious contact to $i$ from outside the population at time $t$. Since $\tau_{ij}$ is nonnegative, $\lambda_{ij}(\tau \mid r_i) = 0$ whenever $\tau < 0$. Let $H(t)$ denote the set of infection times and recovery periods for all $i$ such that $t_i \le t$. If person $k$ is susceptible at time $t$, his or her total hazard of infection at time $t$ given $H(t)$ is $\sum_{i=0}^{n} \lambda_{ik}(t - t_i \mid r_i)$, where we let $\lambda_{0k}(t - t_0 \mid r_0) = \lambda_{0k}(t)$ for simplicity of notation. If an infection occurs in person $k$ at time $t_k < \infty$, then the conditional probability that person $j$ infected person $k$ given $H(t_k)$ is

$$p_{jk} = \frac{\lambda_{jk}(t_k - t_j \mid r_j)}{\sum_{i=0}^{n} \lambda_{ik}(t_k - t_i \mid r_i)}, \tag{5}$$

which is the probability that $t_{jk} = \min(t_{0k}, t_{1k}, ..., t_{nk})$. This has the same form as equation (4) except that it uses hazards of infectious contact instead of a pdf for the serial interval. If the hazards of infectious contact in the underlying SIR model do not change over the course of an epidemic, then $p_{jk}$ can be estimated accurately throughout an epidemic. Unlike the assumption of a stable generation or serial interval distribution, this assumption is unaffected by competition among potential infectors. The rest of the estimation of $R(t)$ could proceed exactly as in Ref. [4], replacing $p^{(WT)}_{jk}$ with $p_{jk}$.

5.2 Partial likelihood for epidemic data

A partial likelihood for epidemic data can be derived using the same logic as that used to derive $p_{jk}$ in equation (5).
For each person $k$ such that $t_k < \infty$, the probability that the failure at time $t_k$ occurred in person $k$ given $H(t_k)$ is

$$\frac{\sum_{i=0}^{n} \lambda_{ik}(t_k - t_i \mid r_i)}{\sum_{j=1}^{n} \sum_{i=0}^{n} \lambda_{ij}(t_k - t_i \mid r_i)}, \tag{6}$$

where the numerator is the hazard of infection (from all sources) in person $k$ at time $t_k$ and the denominator is the total hazard of infection for all persons at risk of infection at time $t_k$.

If there is a vector of covariates $x_{ij}$ for each pair $ij$ (which may include individual-level covariates for $i$ and $j$ as well as pairwise covariates for the ordered pair $ij$) and a vector of parameters $\theta$ such that $\lambda_{ij}(\tau \mid r_i) = \lambda(\tau \mid r_i, x_{ij}, \theta)$, then a partial likelihood for $\theta$ can be obtained by multiplying equation (6) over all $m$ observed failure times. If $(k)$ denotes the index of the $k$th person infected, $\mathbf{t} = (t_1, ..., t_n)$, and $X = \{x_{ij} : i, j = 1, ..., n\}$, then the partial likelihood is

$$L_p(\theta \mid \mathbf{t}, X) = \prod_{k=1}^{m} \frac{\sum_{i=0}^{n} \lambda(t_{(k)} - t_i \mid r_i, x_{i(k)}, \theta)}{\sum_{j=1}^{n} \sum_{i=0}^{n} \lambda(t_{(k)} - t_i \mid r_i, x_{ij}, \theta)}. \tag{7}$$

This is very similar to partial likelihoods that arise in survival analysis, so many techniques from survival analysis may be adaptable for use in the analysis of epidemic data. The goal of such methods would be to allow statistical inference about the effects of individual and pairwise covariates on the hazard of infection in ordered pairs of individuals. In the ordered pair $ij$, the effects of individual covariates for $i$ and $j$ on $\lambda_{ij}(\tau \mid r_i)$ would reflect the infectiousness of $i$ and the susceptibility of $j$, respectively. Pairwise covariates could include such information as whether $i$ and $j$ are in the same household, the distance between their households, whether they are sexual partners, and any other aspects of their relationship to each other that may affect the hazard of infection from $i$ to $j$.
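To make equations (5) and (7) concrete, the sketch below evaluates them for a small data set. It is an illustration under several assumptions that are not in the text. The hazard is taken to be $\lambda(\tau \mid r_i, x_{ij}, \theta) = e^{\theta x_{ij}}$ while $i$ is infectious and zero otherwise, with a single binary pairwise covariate (same household). There is no hazard from outside the population after the first case, so failures with zero within-population hazard are treated as imported and skipped. Following the description of the denominator of (6), the sum over $j$ is restricted to persons still at risk just before each failure time. All function names are hypothetical.

```python
import math

def hazard(tau, r_i, x_ij, theta):
    """ASSUMED hazard: exp(theta * x_ij) while i is infectious, else 0."""
    return math.exp(theta * x_ij) if 0.0 < tau <= r_i else 0.0

def infector_probabilities(k, t, r, x, theta):
    """p_jk from equation (5) for every candidate infector j of person k."""
    n = len(t)
    lam = [hazard(t[k] - t[i], r[i], x[i][k], theta) if i != k else 0.0
           for i in range(n)]
    total = sum(lam)
    if total == 0.0:  # no within-population source: imported infection
        return [0.0] * n
    return [value / total for value in lam]

def log_partial_likelihood(t, r, x, theta):
    """Log of equation (7), skipping failures treated as imported infections."""
    n = len(t)
    loglik = 0.0
    for k in range(n):
        if math.isinf(t[k]):
            continue  # person k was never infected, so contributes no failure
        def total_hazard(j):
            return sum(hazard(t[k] - t[i], r[i], x[i][j], theta)
                       for i in range(n) if i != j)
        numerator = total_hazard(k)
        if numerator == 0.0:
            continue  # imported infection (assumption stated above)
        at_risk = [j for j in range(n) if t[j] >= t[k]]  # susceptible before t_k
        denominator = sum(total_hazard(j) for j in at_risk)
        loglik += math.log(numerator / denominator)
    return loglik

if __name__ == "__main__":
    inf = float("inf")
    t = [0.0, 0.5, 0.8, inf]  # infection times; person 3 is never infected
    r = [1.0, 1.0, 1.0, 1.0]  # recovery periods
    x = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # same household
    print(infector_probabilities(2, t, r, x, 0.0))
    print(log_partial_likelihood(t, r, x, 0.0))
```

In this toy data set, persons 0 and 1 share a household, as do persons 2 and 3. Maximizing `log_partial_likelihood` over `theta` would estimate the log hazard ratio for within-household contact under the assumed hazard form.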
This approach has several advantages over any approach based on a distribution of generation or serial intervals. First, it is not necessary to determine who infected whom in any subset of observed infections. If $v_j$ is known for some $j$, this knowledge can be incorporated in the partial likelihood by replacing the term for the failure time of person $j$ in (7) with $p_{v_j j}$ from equation (5). Second, this approach allows the use of individual-level and pairwise covariates for inference in a flexible and intuitive way. The resulting estimated hazard functions have a straightforward interpretation and can be incorporated naturally into a stochastic SIR model. Third, this approach allows theory and methods from survival analysis to be applied to the analysis of epidemic data.

6 Discussion

Generation and serial interval distributions are not stable characteristics of an infectious disease. When multiple infectious persons compete to infect a given susceptible person, infection is caused by the first person to make infectious contact. In Section 3, we showed that the mean infectious contact interval $\tau_{ij}$ given that $i$ actually infected $j$ is less than or equal to the mean $\tau_{ij}$ given that $i$ made infectious contact with $j$. That is,
$$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty],$$
with strict inequality when $\tau_{ij}$ is non-constant and $j$ is at risk of infectious contact from any source other than $i$ (more precise conditions are given in Section 3). This result holds for all time-homogeneous stochastic SIR models.

In an epidemic, the mean generation (and serial) intervals contract as the prevalence of infection increases and susceptible persons are at risk of infectious contact from multiple sources. In the simulations of Section 4, we saw that the degree of contraction increases with $R_0$.
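The inequality above can be illustrated with a small Monte Carlo sketch, which is separate from the simulations of Section 4. It considers a single susceptible $j$ exposed at time zero to person $i$ and to a hypothetical number of rival infectors, and it assumes that all infectious contact intervals are independent and exponentially distributed with the same rate, so every interval is finite. The function name, its parameters, and the exponential assumption are introduced here for illustration only.

```python
import random


def simulate_contraction(n_rivals, rate=1.0, n_trials=50_000, seed=0):
    """Compare the mean contact interval with its mean given that i wins the race.

    Returns (mean of tau_ij over all trials,
             mean of tau_ij over trials in which i made the first contact).
    Assumes independent exponential contact intervals with a common rate.
    """
    rng = random.Random(seed)
    all_intervals = []
    winning_intervals = []
    for _ in range(n_trials):
        tau_i = rng.expovariate(rate)  # contact interval from i to j
        rivals = [rng.expovariate(rate) for _ in range(n_rivals)]
        all_intervals.append(tau_i)
        # i infects j only if i's infectious contact comes first
        if all(tau_i < tau_rival for tau_rival in rivals):
            winning_intervals.append(tau_i)
    if not winning_intervals:
        raise ValueError("i never won the race; increase n_trials")
    mean_all = sum(all_intervals) / len(all_intervals)
    mean_win = sum(winning_intervals) / len(winning_intervals)
    return mean_all, mean_win


# Hypothetical settings: compare no competition with three rival infectors.
for m in (0, 3):
    print(m, simulate_contraction(n_rivals=m))
```

Under these assumptions, the minimum of $m + 1$ independent exponential intervals with rate $\lambda$ has mean $1/((m+1)\lambda)$ regardless of which source wins the race, so the conditional mean falls from $1/\lambda$ as rivals are added.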
For models with clustering of contacts, generation interval contraction can occur even when the global prevalence of infection is low because susceptibles are at risk of infectious contact from multiple sources within their own clusters. In all of the simulations, the greatest serial interval contraction coincided with the peak prevalence of infection, when the risk of infectious contacts from multiple sources was highest. The mean generation interval increases again as the epidemic wanes, but this rebound may be small when $R_0$ is high.

The reason that generation and serial intervals contract during an epidemic is that their definition applies to pairs of individuals $ij$ such that $i$ actually transmitted infection to $j$. If we don't require that an infectious contact leads to the transmission of infection, we are led naturally to the concept of the infectious contact interval, which has a well-defined distribution throughout an epidemic. Similarly, we can define $R_0$ as the mean number of infectious contacts (i.e., finite infectious contact intervals) made by a primary case without reference to a completely susceptible population. Generation and serial intervals and the effective reproductive number can then be defined in terms of infectious contacts that actually lead to the transmission of infection. Many fundamental concepts in infectious disease epidemiology can be simplified usefully by defining them in terms of infectious contact rather than infection transmission.

Infectious contact hazards for ordered pairs of individuals can be used for many of the same types of analysis that have been attempted using generation or serial interval distributions. In Section 5, we derived a hazard-based estimator of $R(t)$ very similar to that developed by Wallinga and Teunis [4].
This derivation led naturally to a partial likelihood for epidemic data very similar to those that arise in survival analysis. We believe that the adaptation of methods and theory from survival analysis to infectious disease epidemiology will yield flexible and powerful tools for epidemic data analysis.

Acknowledgements: This work was supported by the US National Institutes of Health cooperative agreement 5U01GM076497 "Models of Infectious Disease Agent Study" (E.K. and M.L.) and Ruth L. Kirschstein National Research Service Award 5T32AI007535 "Epidemiology of Infectious Diseases and Biodefense" (E.K.). We also wish to thank Jacco Wallinga and the anonymous reviewers of Mathematical Biosciences for useful comments and suggestions.

References

[1] J. Giesecke. Modern Infectious Disease Epidemiology. London: Edward Arnold, 1994.
[2] Å. Svensson (2007). A note on generation times in epidemic models. Mathematical Biosciences, 208: 300-311.
[3] M. Lipsitch, T. Cohen, B. Cooper, et al. (2003). Transmission dynamics and control of Severe Acute Respiratory Syndrome. Science, 300: 1966-1970.
[4] J. Wallinga and P. Teunis (2004). Different epidemic curves for Severe Acute Respiratory Syndrome reveal similar impacts of control measures. American Journal of Epidemiology, 160(6): 509-516.
[5] C.E. Mills, J. Robins, and M. Lipsitch (2004). Transmissibility of 1918 pandemic influenza. Nature 432: 904.
[6] N.M. Ferguson, D.A.T. Cummings, S. Cauchemez, C. Fraser, S. Riley, A. Meeyai, S. Iamsirithaworn, and D. Burke. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature 437: 209-214.
[7] A.Y. Kuk and S. Ma (2005). Estimation of SARS incubation distribution from serial interval data using a convolution likelihood. Statistics in Medicine 24(16): 2525-37.
[8] J. Wallinga and M. Lipsitch (2007).
How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B, 274: 599-604.
[9] R. M. Anderson and R. M. May. Infectious Diseases of Humans: Dynamics and Control. New York: Oxford University Press, 1991.
[10] E. Kenah and J. Robins (2007). Second look at the spread of epidemics on networks. Physical Review E 76: 036113.
[11] E. Kenah and J. Robins (2007). Network-based analysis of stochastic SIR epidemic models with random and proportionate mixing. Journal of Theoretical Biology 249(4): 706-722.
[12] A. Gut (1995). An Intermediate Course in Probability. New York: Springer-Verlag.

A Figures and tables

$R_0$     $E[\tau_{ij} \mid \tau_{ij} < \infty]$     $.5 - E[\tau_{ij} \mid \tau_{ij} < \infty]$
1.25      .49999                                     .00001
1.5       .499988                                    .000012
2         .499983                                    .000017
3         .499975                                    .000025
4         .499967                                    .000033
5         .499958                                    .000042

Table 1: Expected infectious contact interval given that infectious contact occurs in the models illustrating global competition among potential infectors. If the generation interval were constant, this would be the mean generation interval throughout an epidemic.

Figure 1: Schematic diagram of variables in the general stochastic SIR model for the ordered pair $ij$. Recall that $t_j \le t_{ij}$. As discussed in Section 3.2, person $i$ develops symptoms at time $t^{sym}_i = t_i + q_i$, where $q_i$ is the incubation period.

Figure 2: The smoothed mean generation interval as a function of the source infection time for $R_0 = 2, 3, 4, 5$. There is a clear tendency to contract, with greater contraction for higher $R_0$.

Figure 3: The smoothed mean generation interval (solid lines) and prevalence (dotted lines) as a function of the source infection time for $R_0 = 2, 3, 4, 5$.
In all cases, the greatest contraction of the serial interval coincides with the peak prevalence of infection (i.e., the greatest competition among potential infectors).

Figure 4: The smoothed mean generation intervals (solid lines) and prevalence (dotted lines) as a function of the source infection time for $R_0 = 1.25$ and $1.50$. For $R_0$ near one, the mean generation interval stays relatively constant.

Figure 5: The smoothed mean scaled generation interval (SGI) and prevalence as a function of the source infection time for $R = 2$ and $R = 3$. With increasing cluster size, the degree of generation interval contraction is roughly the same even though the peak prevalence of infection is lower.