An Extreme Value Theory approach for the early detection of time clusters with application to the surveillance of Salmonella

We propose a method to generate a warning system for the early detection of time clusters applied to public health surveillance data. This new method relies on the evaluation of a return period associated to any new count of a particular infection re…

Authors: Armelle Guillou, Marie Kratz, Yann Le Strat

Armelle Guillou (a), Marie Kratz (b), Yann Le Strat (c)

(a) IRMA UMR 7501, Université de Strasbourg, France; Email: armelle.guillou@math.unistra.fr
(b) ESSEC Business School Paris & MAP5 UMR 8145, Univ. Paris Descartes, France; Email: kratz@essec.fr
(c) Institut de Veille Sanitaire, Département des Maladies Infectieuses, Saint-Maurice, France; Email: y.lestrat@invs.sante.fr

Abstract

We propose a method to generate a warning system for the early detection of time clusters applied to public health surveillance data. This new method relies on the evaluation of a return period associated to any new count of a particular infection reported to a surveillance system. The method is applied to Salmonella surveillance in France and compared to the model developed by Farrington et al.

This work was partially supported by the ESSEC Research Center.
2000 AMSC: 60G70; 62P10
Keywords: Extreme value theory, return period, outbreak detection, salmonella, surveillance

1 Introduction

Since the pioneering work of Serfling (see [19]), several statistical models have been proposed to detect time clusters from specific surveillance data. A time cluster is defined as a time interval in which the number of observed events is significantly higher than the expected number of events in a given geographic area. The term "event" is generic enough to include any event of interest such as a case of illness, an admission to an emergency department, a death or any other health event.

The published models can be classified into three broad approaches: regression methods, time series methods and statistical process control as proposed by some recent reviews (see [20],[7],[14]).
In most cases they are based on two steps: (i) the calculation of an expected value of the event of interest for the current time unit (generally a week or a day); (ii) a statistical comparison between this expected value and the observed value. A statistical alarm is triggered if the observed value is significantly different from the expected value. The first step is based on the past counts, or more often on a sample of the past counts, that takes the seasonality pattern(s) into account. Thus, the current count is compared to counts that occurred in the past during the same time periods, e.g. the same week plus or minus three weeks for the last five years. Alternatively, sinusoidal seasonal components can be incorporated into regression models to deal with the seasonality and to easily take secular trends into account. More rarely, models try to reduce the influence of weeks coinciding with past outbreaks. One solution to prevent such outbreaks from reducing the sensitivity of the model is to associate low weights to these weeks (see e.g. [6]).

The early prospective detection of time clusters represents a statistical challenge, as the models must take the main features of the data into account, such as secular trends, seasonality and past outbreaks, but are also faced with idiosyncrasies in reporting, such as delays, incomplete or inaccurate reporting or other artefacts of the surveillance systems. Reporting delays are particularly problematic for surveillance systems that are not based on electronic reporting. Non-specific surveillance systems encounter the same difficulties, except for the reporting delays, because these systems are mostly based on electronic reporting.

The intentional release of anthrax in the USA in October 2001 emphasized the need to develop new early warning surveillance systems (see [9],[18]).
These surveillance systems treat an increasing number of data provided from multiple sources of information (see [4]). One logical consequence was to perform statistical analyses with a daily frequency. Developing automated statistical prospective methods for the early detection of time clusters is thus essential. It is important for a public health surveillance agency to run several statistical methods concomitantly in order to compare the alarms generated by these methods. It is crucial to carry on the development of new methods because the combination of methods increases the sensitivity and the positive predictive value of the surveillance system. This is why we propose in this paper a new approach based on Extreme Value Theory (EVT) (see e.g. [5]) for the early detection of time clusters.

To illustrate the performance of the method, we applied it to the detection of time clusters from weekly counts of Salmonella isolates reported to the national surveillance system in France. Salmonellosis is a major cause of bacterial enteric illness in both humans and animals, caused by bacteria called Salmonella. In France, Salmonella is the leading cause of laboratory-confirmed bacterial gastroenteritis, of hospitalization and of death. In 2005, a study estimated that between 92 and 535 deaths attributable to nontyphoidal Salmonella occurred each year (see [22]).

The paper is organized as follows. The surveillance system and the data are presented in Section 2. A description of our method to check if each new observation corresponds to an unusual/extremal event is given in Section 3. Applications to counts of Salmonella as well as a comparison to the Farrington method (see [6]) are developed in Section 4. A discussion follows in the last section.

2 Data

The National Reference Center for Salmonella contributes to the surveillance of salmonellosis by performing serotyping of about 10000 clinical isolates each year. Salmonella surveillance is based on a network of 1500 medical laboratories that voluntarily send their isolates. Salmonella enterica serotypes Typhimurium and Enteritidis represent 70% of all Salmonella isolates in humans among many hundreds of serotypes; that is why we consider in this paper mainly these two serotypes. For illustrative purposes, four other less frequent serotypes (Manhattan, Derby, Agona and Virchow) are also considered; Figure 1 shows the weekly number of isolates for these six serotypes from January 1, 1995 to December 31, 2008. It highlights the great variability in terms of seasonality, secular trend, weekly number of reported isolates and frequency of unusual events.

Let $\{Y_t;\ t \in \mathbb{N}\}$ be the time series corresponding to the number of isolates at time point $t$ for a given serotype. As mentioned by many authors (see e.g. [15], [9]), seasonal effects may have a strong impact on the generation of a statistical alarm. A usual way to prepare the dataset is to select counts from comparable periods in past years, as described in the literature (see [21], [6]). The dataset is restricted to the counts that occurred within these comparable periods. For instance, if the current time is $t$ of year $y$, then only the counts for the $n = b(2w + 1)$ times from $t - w$ to $t + w$ of years from $y - b$ to $y - 1$ ($b > 1$, $w > 1$) are used.

From now on, $(X_t)$ will denote the resulting time series that will be considered in our study. As an illustration, Figure 2 represents the restricted dataset for Salmonella Typhimurium, for a given current week.
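The restriction to comparable periods can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the paper: the counts are stored in a flat list with one entry per week, every year is treated as exactly 52 weeks, and the function name `comparable_counts` is hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumption: every year is treated as 52 weeks


def comparable_counts(counts, current, b=5, w=3):
    """Return the counts of weeks current-w .. current+w in each of the b previous years.

    counts  : list of weekly counts, one entry per week (hypothetical layout)
    current : index of the current week in `counts`
    """
    selected = []
    for years_back in range(b, 0, -1):            # years y-b, ..., y-1
        centre = current - years_back * WEEKS_PER_YEAR
        for shift in range(-w, w + 1):            # weeks t-w, ..., t+w
            index = centre + shift
            if not 0 <= index < len(counts):
                raise ValueError("not enough history for the requested window")
            selected.append(counts[index])
    return selected


# Toy series in which each count equals its own week index.
history = list(range(6 * WEEKS_PER_YEAR))
sample = comparable_counts(history, current=5 * WEEKS_PER_YEAR + 10)
print(len(sample))  # n = b(2w + 1) = 35
```

With the default values b = 5 and w = 3 the restricted sample always has 35 entries, whatever the current week, provided enough history is available.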

3 An EVT approach

Suppose we have at our disposal $n$ successive observations that we consider as realizations of a sample $(X_i)$ of independent and identically distributed (i.i.d.) non-negative random variables defined on a probability space $(\Omega, \mathcal{A}, P)$, from a distribution function $F$.

Recall that a return level $z_T$ associated with a given return period $T$ corresponds to the level expected to be exceeded on average once every $T$ time units, i.e. such that

$$E\left(\sum_{i=1}^{T} \mathbf{1}_{\{X_i > z_T\}}\right) = 1,$$

where $\mathbf{1}_{\{A\}}$ represents the indicator function that is equal to 1 if $A$ is true and to 0 otherwise. The last equality can be rewritten as $1 - F(z_T) = 1/T$. Hence, the return level $z_T$ corresponds simply to a $p_T$-quantile with $p_T = 1 - 1/T$, $z_T = F^{\leftarrow}(1 - 1/T)$, $F^{\leftarrow}$ denoting the generalized inverse function of $F$.

The idea of the method is to associate with each observation $x_s$ a return period $T_s$ defined theoretically as $(1 - F(x_s))^{-1}$, so as to determine the return period $T_t$ associated with each new observation $x_t$ at time $t$, and then to look backwards (and not forwards as in the standard way) in the interval $(t - T_t;\ t)$ for the existence of an observation that would exceed $x_t$; if it exists, we generate a warning at time $t$, since on average we were not expecting a second exceedance on $(t - T_t;\ t]$.

Notice that in our discrete framework it will not be possible to estimate explicitly the return levels; instead, estimated bounds proposed in [11] will be considered.

Therefore, after a preliminary analysis of the data and definition of our sample, we will compute the estimated bounds of the return levels in order to obtain a graph of the return periods and levels. Then, we will allocate a return period to any new observation $x_t$ to test if $t$ corresponds to a warning time according to our definition.
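To make the two definitions above concrete, here is a minimal Python sketch that plugs the empirical distribution function in place of $F$. This plug-in choice, the made-up counts and the function names are assumptions for illustration only; the paper itself does not use this direct estimate and works with the bounds of [11] instead.

```python
from fractions import Fraction


def ecdf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return Fraction(sum(1 for value in sample if value <= x), len(sample))


def return_level(sample, period):
    """Generalized inverse of the empirical F at p = 1 - 1/period (period: int >= 1)."""
    p = 1 - Fraction(1, period)
    for value in sorted(sample):
        if ecdf(sample, value) >= p:
            return value


def return_period(sample, x):
    """1 / (1 - F(x)) under the empirical F; infinite if x is never exceeded."""
    tail = 1 - ecdf(sample, x)
    return float("inf") if tail == 0 else 1 / tail


counts = [0, 1, 1, 2, 3, 3, 4, 5, 7, 10]   # made-up weekly counts
print(return_level(counts, 5))    # smallest x with F(x) >= 4/5
print(return_period(counts, 5))   # 1 / (1 - F(5))
```

Exact fractions are used so that the comparison with $1 - 1/T$ is not affected by floating-point rounding. The sketch also shows the limitation mentioned above: with a discrete sample, the return period jumps between a few values and becomes infinite at the sample maximum.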

3.1 Bounds for the return level

Looking at extremal events leads us to the crucial problem of high quantile estimation. This problem has been extensively studied in the literature (see e.g. [5]), and the classical approach, in the i.i.d. setting, consists in using Extreme Value Theory, assuming that exceedances above a high threshold approximately follow a Generalized Pareto distribution (this is known as the Peak-Over-Threshold (POT) method). However, this theory is only valid when the underlying distribution function $F$ is continuous. This is not the case in the epidemiology context. Therefore, we propose to use instead upper and lower bounds for the return level $z_t$ and to estimate them, following the method developed by Guillou et al. (see [11]). This method has several advantages: the upper and lower bounds can be computed for any value of $t$ (in particular for large values), it works for both small and large samples, and for $F$ continuous or discrete. So this approach is well adapted to our context, when assuming the random variables associated with the observations to be i.i.d.

Let us recall the expression of those upper and lower bounds, given respectively by

$$\inf\left\{ b_t(u, v) : u \ge 0,\ v \ge 0 \text{ non-decreasing functions} \right\} \tag{1}$$

where

$$b_t(u, v) := u^{\leftarrow}\left( \frac{t\,\theta(u, v)}{v(1 - 1/t)} \right), \quad \text{with } \theta(u, v) := E\left[u(X)\,v(F(X))\right],$$

and

$$\sup\left\{ \ell_t(u, w, q) : u \ge 0 \text{ non-decreasing function},\ w \ge 0 \text{ non-increasing function},\ q > 1 \right\} \tag{2}$$

where

$$\ell_t(u, w, q) := u^{\leftarrow}\left( \frac{\theta^*(u, w) - t^{-1+1/q}\left(\theta^*(u^q, w^q)\right)^{1/q}}{w(1/t)\,(1 - 1/t)} \right); \quad \theta^*(u, w) := E\left[u(X)\,w(1 - F(X))\right].$$
Estimators of those two bounds (1) and (2) follow when considering natural estimators of $\theta(u, v)$ and $\theta^*(u, w)$, namely

$$\hat b_t(u, v) = u^{\leftarrow}\left[ \frac{t\,\hat\theta_n(u, v)}{v(1 - 1/t)} \right]$$

and

$$\hat\ell_t(u, w, q) = u^{\leftarrow}\left( \frac{\hat\theta^*_n(u, w) - t^{-1+1/q}\left(\hat\theta^*_n(u^q, w^q)\right)^{1/q}}{w(1/t)\,(1 - 1/t)} \right), \tag{3}$$

where, if $(X_{1,n} \le \dots \le X_{n,n})$ denote the order statistics from a given sample,

$$\hat\theta_n(u, v) = \frac{1}{n}\sum_{i=1}^{n} u(X_{i,n})\,v\!\left(\frac{i}{n}\right) \quad \text{and} \quad \hat\theta^*_n(u, w) = \frac{1}{n}\sum_{i=1}^{n} u(X_{i,n})\,w\!\left(1 - \frac{i}{n}\right).$$

Under some conditions on $u$, $v$ and $w$, asymptotic distributions are obtained for the bounds estimators, as well as asymptotic confidence intervals when using the delta method (see Section 3 in [11]). For instance, concerning the upper bound, we have the following asymptotic confidence interval:

$$b_t(u, v) \in \left[ \hat b_t(u, v) \pm q_{1-\alpha/2}\, \frac{t\,\hat\sigma}{\sqrt{n}\; v\!\left(1 - \frac{1}{t}\right)}\, (u^{\leftarrow})'\!\left( \frac{t\,\hat\theta_n(u, v)}{v\!\left(1 - \frac{1}{t}\right)} \right) \right] \tag{4}$$

where $q_{1-\alpha/2}$ denotes the quantile of order $1 - \alpha/2$ of the standard normal distribution and $\hat\sigma$ the empirical version of $\sigma$ defined, for $U$ uniformly distributed on $(0, 1)$, as

$$\sigma^2 = E\left( -v(U)\,u(F^{\leftarrow}(U)) + \theta(u, v) - \int_0^1 \left( \mathbf{1}_{\{U \le s\}} - s \right) v'(s)\,u(F^{\leftarrow}(s))\,ds \right)^2.$$

A similar confidence interval can be obtained for the lower bound.

Finally, since it is impossible to optimize in (1) and (2) over the whole family of non-negative and non-decreasing functions $u$ and $v$ and non-negative and non-increasing functions $w$, we reduce the problem by choosing the sub-class of power functions, since it seems adapted to our study, giving reasonable results (even if not optimal). We consider the functions defined by $u(x) = x^{\alpha}$, $v(x) = x^{\beta}$, $w(x) = x^{-\nu}$, with $\alpha$, $\beta$, $\nu$ positive real numbers, and set $q = 2$ (changing the value of this last parameter does not significantly affect the final result).
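For the power functions just introduced, the estimated upper bound has a closed form, since $u(x) = x^{\alpha}$ has inverse $u^{\leftarrow}(y) = y^{1/\alpha}$ on the non-negative reals. The following Python sketch evaluates $\hat\theta_n$ and $\hat b_t$ for fixed exponents; the function names and the toy sample are hypothetical, and no optimization over the exponents is attempted here.

```python
def theta_hat(sample, alpha, beta):
    """(1/n) * sum of u(X_(i)) * v(i/n) with u(x) = x**alpha and v(x) = x**beta."""
    ordered = sorted(sample)                      # order statistics X_(1) <= ... <= X_(n)
    n = len(ordered)
    return sum(x ** alpha * (i / n) ** beta
               for i, x in enumerate(ordered, start=1)) / n


def upper_bound_hat(sample, t, alpha, beta):
    """Estimated upper bound of the return level for a return period t > 1."""
    if t <= 1:
        raise ValueError("t must be greater than 1")
    ratio = t * theta_hat(sample, alpha, beta) / (1 - 1 / t) ** beta
    return ratio ** (1 / alpha)                   # inverse of u(x) = x**alpha


counts = [2, 2, 2, 2]                             # made-up constant sample
print(upper_bound_hat(counts, t=2, alpha=1, beta=1))
```

On this constant sample the bound with $\alpha = \beta = 1$ is well above the only observed value, which illustrates why the exponents are worth tuning rather than fixing arbitrarily.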
Then we solve numerically, for $\varepsilon$ close enough to 0, the following optimization problems

$$(\hat\alpha_t, \hat\beta_t) = \operatorname{argmin}\left\{ \hat b_t(x^{\alpha}, x^{\beta}) : \alpha \in [\varepsilon, \alpha_0],\ \beta \in [\varepsilon, \beta_0] \right\}$$

and

$$(\hat\alpha^*_t, \hat\nu^*_t) = \operatorname{argmax}\left\{ \hat\ell_t(x^{\alpha}, x^{-\nu}, 2) : \alpha \in [\varepsilon, \alpha_0],\ \nu \in [\varepsilon, \nu_0] \right\}, \tag{5}$$

in order to obtain the estimated upper and lower bounds equal, respectively, to

$$\hat b_t = \hat b_t\left(x^{\hat\alpha_t}, x^{\hat\beta_t}\right) \quad \text{and} \quad \hat\ell_t = \hat\ell_t\left(x^{\hat\alpha^*_t}, x^{-\hat\nu^*_t}, 2\right). \tag{6}$$

As already said, the choice for $u$, $v$, $w$ and $q$ does not necessarily correspond to the optimal bounds but covers a wide enough range of bounds that provides satisfying results when working on various epidemiology datasets, as we are going to see in Section 4.

3.2 Determination of an alarm time

Let us present our method to define an alarm system. It consists of three main steps. Note that using bounds for a return level $z_t$ implies that the return period defined theoretically by $(1 - F(z_t))^{-1}$ cannot be explicitly estimated, and we have

$$T_{\ell} \le (1 - F(z_t))^{-1} \le T_b \tag{7}$$

where $T_{\ell}$ and $T_b$ denote the return periods of the bounds $\ell_t(u, w, q)$ and $b_t(u, v)$ respectively.

Step 1: We draw the plot of the return period on the x-axis and the corresponding estimate of the upper bound of the return level (instead of the return level itself): $(t, \hat b_t)$.

Step 2: We allocate to each observation $x_{t_i}$, $i \ge 1$, a return time $T_i$ using the previous plot. Namely, $x_{t_i}$ corresponds to a value $\hat b_{T_i}$ on the y-axis of the plot, from which we deduce the associated return period $T_i$. Reading an observation as an upper bound of a return level means that $T_i$ is in fact a lower bound of the theoretical return period $(1 - F(x_{t_i}))^{-1}$ that should be associated with the observation $x_{t_i}$, because of (7). We justify our choice as follows.
Considering $\hat\ell_{T_i}$ instead of $\hat b_{T_i}$ in the above method would have led to overestimating the return period associated with the observation $x_{t_i}$, which could be a problem in the context of alarms (it is better to have more alarms than fewer), except if the plots $(t, \hat\ell_t)$ and $(t, \hat b_t)$ were close enough, but this is generally not the case for our data sets, where $\hat\ell_t$ appeared approximately as a constant function of time (see [2]).

Step 3: We use the fact that if $(X_j)$ are i.i.d. random variables, then we have for any time interval $I(T)$ with length $T$

$$E\left(\sum_{i=1}^{T} \mathbf{1}_{\{X_i > z_T\}}\right) = 1 \iff E\left(\sum_{i \in I(T)} \mathbf{1}_{\{X_i > z_T\}}\right) = 1. \tag{8}$$

This remark is important since we want to define for each new observation a warning time and not a predicted return period, which means looking backward in time. Hence for each new observation $x_{t_i}$, to which a return time $T_i$ has been associated (via Step 2), we look in the interval $(t_i - T_i;\ t_i)$ to see if there exists an observation exceeding $x_{t_i}$; if it does, we ring an alarm at this time $t_i$, since, as already said, on average we do not expect a second exceedance on $(t_i - T_i;\ t_i]$. Notice that we chose here to consider a non-strict inequality in the indicator set, and not a strict one as in (8), the level corresponding now to an observation.

To finish this section, let us summarize our method.

• Using the plot $(t, \hat b_t)$, associate a time $T_i$ to each observation $x_{t_i}$ ($i \ge 1$);
• for each new observation $x_{t_i}$, consider $I(T_i) = (t_i - T_i,\ t_i]$;
• look for the existence of an observation $x_t \ge \hat b_{T_i}$, for $t \in (t_i - T_i,\ t_i)$; if there exists at least one such observation, i.e. if $E\left(\sum_{j \in I(T_i)} \mathbf{1}_{\{X_j \ge \hat b_{T_i}\}}\right) > 1$, then generate a warning at time $t_i$.

Now to illustrate our method, let us consider the example of the maximum number of Salmonella Virchow isolates.
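Before turning to the figures, the three-step summary above can be sketched in Python. Several choices here are assumptions rather than the authors' implementation: the curve is given as a list of (t, estimated upper bound) pairs on a grid and is assumed non-decreasing in t, the return period of an observation is read as the largest grid value of t whose bound does not exceed it, time is an integer index, and all names are hypothetical. The pairs (7, 15) and (91, 20) echo the values quoted in the caption of Figure 3; the other points are made up.

```python
def return_period_for(x, curve):
    """Largest t on the grid whose estimated upper bound is <= x (None if there is none)."""
    period = None
    for t, level in curve:            # curve sorted by t, levels non-decreasing
        if level <= x:
            period = t
    return period


def is_warning_time(series, i, curve):
    """True if an earlier observation in (i - T_i, i) is >= the observation at time i."""
    x = series[i]
    period = return_period_for(x, curve)
    if period is None:                # x lies below the whole curve: nothing to look back on
        return False
    start = max(0, i - period + 1)    # integer times strictly greater than i - T_i
    return any(previous >= x for previous in series[start:i])


curve = [(2, 8.0), (7, 15.0), (91, 20.0)]    # illustrative (t, bound) pairs
recent_peak = [3, 16, 4, 5, 15]              # a count of 16 three weeks before the 15
distant_peak = [16, 1, 1, 1, 1, 1, 1, 1, 15] # the 16 lies outside the 7-week window

print(is_warning_time(recent_peak, 4, curve))
print(is_warning_time(distant_peak, 8, curve))
```

The second series shows the behaviour discussed later in the paper: a past peak that falls outside the look-back window of the current observation does not trigger a warning.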
In Figure 3, the x-axis corresponds to the values of $t$ from 2 to 100 weeks and the y-axis to $\hat b_t$; the two dashed lines indicate the 95% confidence interval bounds of $b_t$. The return level/return period graphs for the six serotypes presented in Section 2 are represented in Figure 4.

4 Applications

For each week from January 2000 to December 2008, the EVT method was applied to the six time series presented in Section 2. For any week $t$, counts of weeks from $t - 3$ to $t + 3$ of years from $y - 5$ to $y - 1$ were used. Moreover, in order to reduce the probability that an alarm could be triggered for a few sporadic cases, a standard rule has been adopted: an alarm at week $t$ is kept only if at least 5 cases were observed during the 4 weeks preceding $t$. This rule has already been applied at the Communicable Disease Surveillance Center (CDSC) using the method developed by Farrington et al. (see [6]).

Several statistical methods for the prospective change-point detection in time series of counts have already been applied to Salmonella infections by various authors (see [23],[6],[13]). We chose to compare the alarms generated by the EVT method with the ones generated by the Farrington method for the six serotypes and for each week since 2005, as the National Reference Center for Salmonella applies the Farrington method.

Comparisons between the two methods are shown in Table 1. The concordance is equal to 93.7% (Typhimurium), 95.9% (Derby), 96.8% (Agona), 97.4% (Virchow), 98.7% (Enteritidis) and 99.1% (Manhattan). When comparing the results outside of the diagonal with the real past epidemics, it appears that our method seems to fit reality better than Farrington's, providing fewer alarms when historically there was indeed no epidemic, and more alarms when there was. This makes our method quite promising.
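The concordance percentages can be re-derived from the cell counts of Table 1. In the Python sketch below, concordance is taken to be the share of weeks on which the two methods agree (both alarm or both silent); this reading is an assumption, although it does reproduce the figures quoted above. The dictionary layout and the function name are hypothetical.

```python
# Table 1 cells as (both -, Farrington - / EVT +, Farrington + / EVT -, both +).
TABLE_1 = {
    "Manhattan":   (440, 3, 1, 19),
    "Derby":       (441, 12, 7, 3),
    "Agona":       (427, 13, 2, 21),
    "Virchow":     (451, 1, 11, 0),
    "Typhimurium": (418, 1, 28, 16),
    "Enteritidis": (455, 0, 6, 2),
}


def concordance(cells):
    """Percentage of weeks on which both methods give the same answer."""
    both_negative, _, _, both_positive = cells
    return 100.0 * (both_negative + both_positive) / sum(cells)


for serotype, cells in TABLE_1.items():
    print(f"{serotype}: {concordance(cells):.1f}%")
```

Every two-way table sums to the same 463 weeks, so the percentages are directly comparable across serotypes.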
Nevertheless this comparison cannot replace a more standard procedure that systematically compares, for each week, the statistical alarms and the alerts identified by an epidemiologist. In this case, the epidemiologist's judgement is considered as the gold standard (even if the epidemiologist is not infallible).

Table 1: Two-way tables of frequency counts of non-alarms (-) and alarms (+) from the Farrington method and the EVT method, for each serotype.

Serotype      Farrington    EVT -    EVT +    Total
Manhattan     -               440        3      443
              +                 1       19       20
              Total           441       22      463
Derby         -               441       12      453
              +                 7        3       10
              Total           448       15      463
Agona         -               427       13      440
              +                 2       21       23
              Total           429       34      463
Virchow       -               451        1      452
              +                11        0       11
              Total           462        1      463
Typhimurium   -               418        1      419
              +                28       16       44
              Total           446       17      463
Enteritidis   -               455        0      455
              +                 6        2        8
              Total           461        2      463

Figures 5 and 6 represent both the alarms and the weekly counts over time for the serotypes Manhattan and Agona. Each triangle represents a statistical alarm. Triangles on the first line represent the alarms generated by the EVT method, whereas the ones generated by the Farrington method are given on the second line.

In Figure 5, the alarms generated by the two methods occurred in the same period, which corresponds to a documented outbreak, delimited by the dashed lines, for the serotype Manhattan (see [16]). From August 2005 to February 2006, a community-wide outbreak of Salmonella Manhattan infections occurred in France. The investigation incriminated pork products from a slaughterhouse as being the most likely source of this outbreak. There was a concordance between the temporal (October-December 2005) and the geographical (south-eastern France) occurrence of the majority of cases and the distribution of products from the slaughterhouse.
In Figure 6, alarms for the serotype Agona are distributed from 2000 to 2008. There is a concordance between the two methods during three periods. The first period, corresponding to 5 weeks in August and September 2003, was not documented as an outbreak. The second concordance period corresponds to 15 consecutive weeks from the last week in December 2004 to week 15 in April 2005. This second period is more interesting as it corresponds to the beginning of a large outbreak of infections in infants linked to the consumption of powdered infant formula (see [3]). The outbreak period, delimited by the two dashed lines, took place in two stages: the first stage from week 53 in December 2004 to week 10 in March 2005 and the second from week 11 to week 21. A total of 47 cases less than 12 months of age were identified during the first stage and 94 cases less than 12 months of age were identified during the second stage. The third period corresponds to week 29 in July 2008. It included five cases, two of them coming from a foodborne disease outbreak involving piglet consumption, and the three others being probably sporadic cases.

The EVT method was implemented using R version 2.9 [17]: see [2]. The function called algo.farrington, implemented in the R package 'surveillance' ([12]), was used to apply the Farrington method.

5 Discussion

We believe that the EVT method meets a number of requirements, listed by Farrington et al. (see [7]), for the outbreak detection algorithms implemented in surveillance systems. Indeed, this method is able to monitor a large number of time series, which has become an absolute necessity in modern computerized surveillance systems. It can deal with a wide range of events, as is the case for the Salmonella infections with the routine analysis of several hundred serotypes per week.
It can handle time series with great numbers of cases (such as Salmonella Enteritidis) or small numbers of cases (such as Salmonella Manhattan). Seasonality is taken into account by comparing counts over the same periods of time. Other methods propose a direct way to treat the past aberrations, for instance by associating low weights to the weeks coinciding with past outbreaks. There is no such need when using the EVT method since the return period is not a constant but depends on each observed count; alarms can then be generated even if past outbreaks exist. This is particularly the case with low counts, for which the return period is small and does not include the past outbreaks. Finally, the method is implemented in a function using the R language, allowing it to be run in an automated procedure with minimal user intervention.

Although the model developed by Farrington et al. (see [6]) has become a standard reference method, routinely used in France for many years and incorporated in several surveillance systems (human Salmonella, non-human Salmonella (see [1]), legionella (see [10]) and syndromic surveillance systems (see [8])), the EVT method also seems to be a valuable and interesting tool for the recognition of time clusters. It could be integrated in the family of outbreak detection algorithms used by the public health surveillance agencies, since developing effective computer-assisted outbreak detection systems still remains a necessity to ensure timely public health intervention.

Another possible way to proceed would be to transform the sample of discrete random variables into a continuous one in order to use standard EVT tools, instead of quantile bounds. This has been empirically studied by smoothing the data via a kernel transformation and provided promising results (see [2]); it will be the subject of future work.
Finally, we also plan to investigate the extension of such EVT methods for time or spatially dependent data.

Acknowledgements

The authors would like to thank Dr Francois-Xavier Weill, head of the National Reference Center for Salmonella in France, for providing the datasets and Anis Borchani for the implementation of the method using the R language.

Conflict of Interest: None declared.

References

[1] Baroukh, T., Le Strat, Y., Moury, F., Brisabois, A. and Danan, C. (2008). Use of statistical methods for a routinely unusual event surveillance in non human Salmonella. ESCAIDE: European Scientific Conference on Applied Infectious Diseases Epidemiology.
[2] Borchani, A. (2008). Statistiques des valeurs extrêmes dans le cas de lois discrètes. ESSEC working paper.
[3] Brouard, C., Espie, E., Weill, F.X., Kerouanton, A., Brisabois, A., Forgue, A.M., Vaillant, V. and de Valk, H. (2007). Two consecutive large outbreaks of Salmonella enterica serotype Agona infections in infants linked to the consumption of powdered infant formula. The Pediatric Infectious Disease Journal 26, 148–152.
[4] Centers for Disease Control and Prevention (2004). Syndromic Surveillance: Reports from a National Conference 2003. Morbidity and Mortality Weekly Report 53(Suppl).
[5] Embrechts, P., Kluppelberg, C. and Mikosch, T. (2001). Modelling extremal events for Insurance and Finance (Stochastic Modelling and Applied Probability). Springer.
[6] Farrington, C.P., Andrews, N.J., Beale, A.D. and Catchpole, M.A. (1996). A statistical algorithm for the early detection of outbreaks of infectious disease. Journal of the Royal Statistical Society Series A 159, 547–563.
[7] Farrington, C.P. and Andrews, N.J. (2004). Statistical aspects of detecting infectious disease outbreaks. In: Brookmeyer, R. and Stroup, D.F. (editors).
Monitoring the Health of Populations. Oxford University Press, 203–231.
[8] Fouillet, A., Golliot, F., Caillère, N., Flamand, C., Kamali, C., Le Strat, Y., Leon, L., Mandereau-Bruno, L., Pouey, J., Retel, O., Wagner, V. (2008). Comparison of the performances of statistical methods to detect outbreaks. Advances in Disease Surveillance 5, 30.
[9] Goldenberg, A., Shmueli, G., Caruana, R.A. and Fienberg, S.E. (2002). Early statistical detection of anthrax outbreaks by tracking over-the-counter medication sales. Proceedings of the National Academy of Sciences of the United States of America 99, 5237–5240.
[10] Grandesso, F. (2009). Early detection of excess legionella cases in France: evaluation performance of five automated methods. ESCAIDE: European Scientific Conference on Applied Infectious Diseases Epidemiology.
[11] Guillou, A., Naveau, P., Diebolt, J. and Ribereau, P. (2009). Return level bounds for discrete and continuous random variables. Test 18, 584–604.
[12] Hoehle, M. (2007). Surveillance: An R package for the surveillance of infectious diseases. Computational Statistics 22, 571–582.
[13] Hutwagner, L.C., Maloney, E.K., Bean, N.H., Slutsker, L. and Martin, S.M. (1997). Using laboratory-based surveillance data for prevention: an algorithm for detecting salmonella outbreaks. Emerging Infectious Diseases 3, 395–400.
[14] Le Strat, Y. (2005). Overview of temporal surveillance. In: Lawson, A.B. and Kleinman, K. (editors). Spatial and Syndromic Surveillance. Wiley, 13–29.
[15] Nobre, F.F., Monteiro, A.B.S., Telles, P.R. and Williamson, G.D. (2001). Dynamic linear model and SARIMA: a comparison of their forecasting performance in epidemiology. Statistics in Medicine 20, 3051–3069.
[16] Noel, H., Dominguez, M., Weill, F.X., Brisabois, A., Duchazeaubeneix, C., Kerouanton, A., Delmas, G., Pihier N. and Couturier, E. (2006).
Outbreak of Salmonella enterica serotype Manhattan infection associated with meat products, France, 2005. Eurosurveillance 11, 270–273.
[17] R: A Language and Environment for Statistical Computing (2006). R Foundation for Statistical Computing. Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.
[18] Reis, B.Y., Pagano, M. and Mandl, K.D. (2003). Using temporal context to improve biosurveillance. Proceedings of the National Academy of Sciences of the United States of America 100, 1961–1965.
[19] Serfling, R.E. (1963). Methods for current statistical analysis of excess pneumonia-influenza deaths. Public Health Reports 78, 494–506.
[20] Sonesson, C. and Bock, D. (2003). A review and discussion of prospective statistical surveillance in public health. Journal of the Royal Statistical Society Series A 166, 5–21.
[21] Stroup, D.F., Williamson, G.D. and Herndon, J.L. (1989). Detection of aberrations in the occurrence of notifiable diseases surveillance data. Statistics in Medicine 8, 323–329.
[22] Vaillant, V., de Valk, H., Baron, E., Ancelle, T., Colin, P., Delmas, M.C., Dufour, B., Pouillot, R., Le Strat, Y., Weinbreck, P., Jougla, E. and Desenclos, J.C. (2005). Burden of foodborne infections in France. Foodborne Pathogens and Disease 2, 221–232.
[23] Watier, L., Richardson, S. and Hubert, B. (1991). A time series construction of an alert threshold with application to S. bovismorbificans in France. Statistics in Medicine 10, 1493–1509.

Figure 1: Weekly counts of isolates reported to the National Reference Center for Salmonella in France, January 1, 1995 to December 31, 2008: (a) Salmonella Manhattan; (b) Salmonella Derby; (c) Salmonella Agona; (d) Salmonella Virchow; (e) Salmonella Typhimurium; (f) Salmonella Enteritidis.

Figure 2: Weekly counts of isolates for Salmonella Typhimurium. The current week, represented by the (blue) arrow, is the last week of December 2008. The counts that occurred in comparable periods (±3 weeks) in the five previous years, and used to generate or not an alarm, are represented by the (red) striped bands.

Figure 3: The return level/return period graph for Salmonella Virchow (x-axis: return period; y-axis: return level). The black curve represents the upper bound of the return level. Dashed curves represent the 95% confidence interval of this upper bound. The observation y = 20 (respectively y = 15) corresponds to $\hat b_{91}$ (respectively $\hat b_7$), from which we deduce on the x-axis a return period equal to T = 91 (respectively T = 7) weeks.

Figure 4: The return level/return period graphs calculated for the last week in 2008: (a) Salmonella Manhattan; (b) Salmonella Derby; (c) Salmonella Agona; (d) Salmonella Virchow; (e) Salmonella Typhimurium; (f) Salmonella Enteritidis.

Figure 5: Salmonella Manhattan: weekly counts from January 1, 2000 to December 31, 2008. Roman numerals refer to the quarters of the years. Alarms generated by the EVT method are represented by triangles on the first line. Alarms generated by the Farrington method are represented by triangles on the second line. The documented outbreak is delimited by the two dashed lines.

Figure 6: Salmonella Agona: weekly counts from January 1, 2000 to December 31, 2008. Roman numerals refer to the quarters of the years. Alarms generated by the EVT method are represented by triangles on the first line. Alarms generated by the Farrington method are represented by triangles on the second line. The documented outbreak is delimited by the two dashed lines.
