HIV with contact-tracing: a case study in Approximate Bayesian Computation

HIV with con tact-tracing: a case study in Appro xim ate Ba y esian Computation Mic hael G.B. Blum 1 , Vie t Chi T ran 2 1 CNRS, Lab oratoire TIMC-IMAG , U n iversi t´ e Joseph F ourier, Grenoble, F rance 2 Lab oratoire P . Painlev ´ e, U n iversi t´ e Lille 1, F rance Abstract Missing data is a r ecurrent issue in epidemio logy where the infection pro cess may b e partially obser ved. Approximate Bayesian Computation, a n alternative to data imputation metho ds suc h as Markov Cha in Mont e Car lo integration, is prop osed for making inference in epidemiologica l models. It is a likelihoo d-free metho d that r e lies exclusively on n umerica l simulations. ABC consists in computing a distance betw een simulated and obser ved summ a r y statistics and weigh ting the simulations according to this distanc e . W e prop ose an origina l extension of ABC to pa th-v alued summary statistics, cor resp onding to the cumulated num b er of detections as a function of time. F or a standar d compa rtmental mo del with Suceptible, Infectious and Recovered individuals (SIR), we show that the p os ter ior distributions o btained with ABC and MCMC a re similar. In a r eﬁned SIR mo del well-suited to the HIV c o ntact- tracing data in Cuba, we p erform a co mparison betw een ABC with full and binned detection times. F or the Cuban data, we ev alua te the eﬃciency of the detection system and predict the evolution of the HIV-AIDS dise ase. In par ticular, the p ercentage of undetected infectious individuals is found to b e of the order of 4 0 %. Keywor ds: Mathematical epidemiology; sto c hastic SIR mo d el; un observ ed infectious p opulation; sim ulation-based inference; lik eliho o d-free in ference. AMS Subj e ct Classiﬁc ation: 92D30, 62F15, 62M05 , 62N02, 60K99. 1 In tro duc tion Mathematica l mo delling p la ys an imp ortant r ole for u nderstand in g and pred icting the sp read of diseases, as we ll as for comparing and ev aluating pu blic health p olicies. Although deterministic mo delling can b e a guide for describing ep idemics, sto c hastic m o dels hav e their imp ortance in featuring realistic p ro cesses and in quan tifying conﬁdence in parameters estimates and prediction uncertain t y [4]. S tandard mo dels in epidemiology consist in compartmental mo dels in wh ich the p opulation is structur ed in diﬀerent classes comp osed of S usceptible, Infectious, and Reco vered (or Remo v ed) individuals (SIR). P arameter estimation for SIR mo dels is usually a diﬃcult task b ecause of missing obser v ations, w h ic h is a r ecurren t issue in epidemiology . When the infected p opulation is partially observ ed or when the infection times are missing, the computation of the lik eliho o d is n um erically inf easible b ecause it inv olves integrati on ov er all the unobs er ved ev en ts. Mark o v Chain Mon te Carlo (MCMC) metho ds that treat the missing data as extra pa- rameters, ha ve b ecome increasingly p opular for calibrating sto chastic epid emiologic al mo dels with missin g data [21, 20, 8]. Ho wev er, MCMC may b e compu tationally prohibitive for high- dimensional missing observ ations [9, 24] and ﬁne tun ing of the prop osal d istribution is requ ired for eﬃcien t algorithms [17]. In this pap er, we show that SI R mo dels with missin g obs erv ations can b e calibrated w ith Ap pro ximate Ba y esian Computation (ABC), an alternativ e to MCMC, originally prop osed for making inference in p opu lation genetics [3]. This appr oac h is not based on 1 the lik eliho o d fu nction bu t relies on numerical simulat ions and comparisons b et w een simulated and obs erv ed summary statistics. This w ork is motiv ated by the stud y of the Cuban HIV-AIDS database that conta ins the dates of d etection of th e 8,662 individu als that hav e b een f ound to b e HIV p ositiv e in Cu ba b et ween 1986 and 2007 [11]. T he database con tains additional co v ariates including the manner b y whic h an individu al has b een found to b e HIV p ositive . The individu als can b e detecte d either by r andom scr e ening (individ uals ‘sp onta neously’ tak e a detection test) or c ontact-tr acing . Con tact-tracing consists in testing the sexual con tacts of detected individu als [18]. T he tota l n umb er of infectious individuals as we ll as the infection times are u nknown. In Section 2 , we in tro du ce the sto c hastic SIR mo del with con tact-tracing deve lop ed b y [10]. Section 3 is dev oted to ABC metho ds when all detect ion times are known. W e prop ose an original extension of ABC to path-v alued summary statistics consisting of the cum ulated n u mb er of d etections through time. F or a simple SIR mo del, we compare n umerically the p osterior distributions obtained with ABC and MCMC. S ection 4 deals w ith p ossibly noisy or binn ed detection times for which the previous p ath-v alued statistics are una v ailable. W e in tro duce a ﬁnite dimensional v ector of su mmary statistics and compare the statistical prop erties of p oin t estimates and credib ility interv als obtained with full an d bin ned detection times. Finally , Section 5 concen trates on the analysis of the Cuban HIV-AIDS database. W e address sev eral questions concerning the dynamic of this epid emic: what is the p ercen tage of th e epidemic that is kno wn [18, 12 , 11]; how man y new cases of HIV are exp ected in the forthcoming yea rs ; and what is the prop ortion of detections that is exp ected in the con tact-trac ing pr ogram. 2 A sto c hastic SIR mo del for HIV-AIDS epidemics with con tact- tracing W e restrict our study to the sexually-transmitted epidemic of HIV in Cuba (90% of the epidemic, see [18 ]). F or mo delling the dynamics of th e num b er of k n o wn and u nknown HIV cases, w e consider a SI R mo del dev elop ed b y [10]. The popu lation is divided in to three main classes S , I and R corresp ond ing to the susc eptible , infe ctious , and dete cte d individuals considered as r emove d b ecause we assume that they do not transmit HIV anymore (see Figure 1). The p opulation of the susceptible individuals, of size S t , at time t > 0, consists of the sexually activ e seronegativ e individuals. Ind ividuals immigrate into the class S with a rate λ 0 and lea ve it b y dying/emigrating, with r ate µ 0 S t , or b y b ecoming inf ected. The class of inf ectious individuals, of s ize I t , corresp onds to the serop ositiv e individuals wh o hav e not tak en a detectio n test yet and may th us con taminate new susceptible individ u als. W e assume that eac h infected individual ma y transmit the disease to a susceptible individual at rate λ 1 so that the total rate of infection is equal to λ 1 S t I t . In dividuals lea v e the class I w hen they die/emigrate with a total rate of µ 1 I t , or wh en they are d etected to b e HIV p ositiv e. The class R of the d etected individuals, of size R t , is sub d ivided into t wo sub classes. W e denote b y R 1 t (resp. R 2 t ) the size of the remo ve d p opulation detected by r an d om s cr eenin g (resp. con tact-tracing). The tota l rate of detection by random screening is λ 2 I t . F or the rate of con tact-tracing detection, the mo del shall capture the fact that the con tribu tion of a remo v ed individual dep end s on the time elapsed since she/he has b een foun d to b e HIV p ositiv e. W e consider the tw o follo wing expressions for the total rate of con tact-trac ing detection λ 3 I t X i ∈ R e − c ( t − T i ) and λ 3 I t X i ∈ R e − c ( t − T i ) / ( I t + X i ∈ R e − c ( t − T i ) ) , (2.1) where T i denotes the time at whic h a remo ved individual i has b een dete cted. The w eight exp( − c ( t − T i )), with c > 0, determines the contribution of a remo ved individu al i to the 2 con tact-tracing according to the time t − T i she/he has b een d etected. Th e ﬁrs t rate in (2.1) corresp onds to a mass acti on pr inciple, and is prop ortional to the sum of the con tribu tions of the d etected in d ividuals. The second rate in (2.1) corresp onds to a mo del with fr equency dep end ence. In the follo w ing, θ = ( µ 1 , λ 1 , λ 2 , λ 3 , c ) d en otes the m ultiv ariate parameter of the mo del. The p arameters of interest are here λ 1 , λ 2 and λ 3 . 2.1 Connection b etw een the sto cha stic and the deterministic SIR mo del Here we consider the ﬁrs t rate of cont act-tracing detection in (2.1), but similar results can b e obtained for the second r ate. [10] sh o w ed that in a large p opulation limit, th e SIR p ro cess of Section 2 can b e mo delled w ith sto c hastic diﬀeren tial equations and con v erges to the solution of the follo wing system of ordinary d iﬀeren tial equations    ds t dt = λ 0 − µ 0 s t − λ 1 s t i t di t dt = λ 1 s t i t − ( µ 1 + λ 2 ) i t − λ 3 i t r t dr t dt = λ 2 i t + λ 3 i t r t − cr t , (2.2) where s t and i t denote th e size of the s usceptible and infectious p opulations, and where r t is the con tribution of th e remo ved ind ivid uals to th e con tact-tracing (see Section 1 of the supp lemen- tary material for more d etails). Apart from the inherent sto c hastic n ature of epidemic propagation that may b e particularly imp ortant for small p opulations [14 ], considering a s to c hastic S I R mo del rather than its d eter- ministic counterpart can present at least t w o imp ortan t adv antag es. First, it is qu ite straightfo r- w ard to p erform simula tions fr om the stochastic mo del (see Section 2 of the su pplementa ry ma- terial) and this is one motiv ation for considerin g ABC metho d s. Second, the individ ual-cen tered sto c hastic pro cess su its the formali sm of statistical method s , whic h are based on samples of individual data. Sin ce the estimates of the sto chasti c pro cess con verge to the parameters of the ODEs [10], ABC provi d es a new alternativ e for calibrating parameters of ODEs [1, 7]. 3 ABC with suﬃcien t summary statistics for epidemic mo dels 3.1 Main principles of ABC F or simplicit y , we deal here w ith densities and not general probability measures. Let x b e th e a v ailable d ata and π ( θ ) b e the prior. Tw o appr o ximations are at the core of ABC. Replacing obs erv ations with summary statist ics Instead of fo cusing on the posterior densit y p ( θ | x ), ABC a ims at a p ossibly less informativ e tar g et d ensit y p ( θ | S ( x ) = s obs ) ∝ P r ( s obs | θ ) π ( θ ) where S is a su mmary s tatistic that tak es its v alues in a normed space, and s obs denotes the obs erv ed summary statistic. The summary statistic S can b e a d -dimen s ional v ector or an inﬁnite-dimensional v ariable suc h as a L 1 function. Of course, if S is suﬃcien t, then the t w o conditional densities are the same. T h e target distrib ution will also b e coined as the p artial p osterior distribution . Sim ulation-based approximations of the p osterior On ce the summ ary statistics ha v e b een c hosen, the second appro ximation arises w hen estimating the partial p osterior densit y p ( θ | S ( x ) = s obs ) an d sampling from this distr ibution. This step in v olv es nonparametric k ernel estimation and p ossibly correction reﬁnement s giv en in S ection 4.2. 3 3.2 Sampling from the p osterior The ABC metho d with smo oth rejection generates random d ra ws from the target distr ibution as f ollo ws [3 ] 1. Generate N random draws ( θ i , s i ), i = 1 , . . . , N . The p arameter θ i is generated fr om the prior distr ibution π and th e v ector of summary statistics s i is calculated for the i th data set th at is simula ted from the generativ e mo del w ith p arameter θ i . 2. Asso ciate to the i th sim ulation the w eigh t W i = K δ ( s i − s obs ), wh ere δ is a tolerance threshold and K δ a (p ossibly multiv ariate) smo othing ke rn el. 3. The distribution ( P N i =1 W i δ θ i ) / ( P N i =1 W i ), in whic h δ θ denotes th e Dirac mass at θ , ap- pro ximates the target d istribution. 3.3 P oin t estimation and credibility in t erv als Once a sample fr om the target d istr ibution has b een obtained, sev eral estimators ma y b e consid - ered for p oint estimation of eac h one-dimensional parameter λ j , j = 1 , 2 , 3. Using the weigh ted sample ( λ j,i , W i ), i = 1 , . . . , N , the me an of the target d istribution p ( λ j | s obs ) is estimated by ˆ λ j = P N i =1 λ j,i W i P N i =1 W i = P N i =1 λ j,i K δ ( s i − s obs ) P N i =1 K δ ( s i − s obs ) , j = 1 , 2 , 3 (3.1) whic h is the well-kno wn Nadara ya- W atson regression estimator of the conditional exp ectation E ( λ j | s obs ). W e also co mp ute th e me dians , mo des , and 95% credibilit y in terv als (CI) of the marginal p osterior distribu tion (see Section 3 of the su pplement ary material). 3.4 Data and ch oice of summary st atistics Data Starting at the time of the ﬁr s t d etectio n in 1986, the Cub an HIV-AIDS data consist principally of the d etectio n times at whic h the ind ividuals h a v e b een f ound to b e HIV p ositiv e. A t the time of the last detection ev en t, in July 2007, there is a total of 8,662 in d ividuals in the database. F or eac h d etection eve nt, there is a lab el indicating if th e individ ual has b een detected b y random screening or con tact-tracing. Summary statistics W e consider th e tw o (in ﬁ nite-dimensional) statistics ( R 1 t , t ∈ [0 , T ]) and ( R 2 t , t ∈ [0 , T ]). Their sum is equal to the cumulated num b er of detections sin ce the b eginning of the epidemic. Because the data consist of the detection times for the t w o diﬀeren t t yp es of detection, these t wo statistics can simply b e view ed as a particular co ding of the whole dataset so that the partial p osterior d istribution p ( θ | R 1 , R 2 ) is equal to th e p osterior distribu tion p ( θ | x ). The L 1 -norm b et w een the i th sim ulated path R l i and th e observed one R l obs is k R l obs − R l i k 1 = Z T 0 | R l obs,s − R l i,s | ds , l = 1 , 2 , i = 1 , . . . , N . (3.2) F or computing the w eigh ts W i , we c ho ose a p ro du ct k ernel so that W i = K δ 1 ( k R 1 obs − R 1 i k 1 ) K δ 2 ( k R 2 obs − R 2 i k 1 ) where δ 1 , δ 2 are 2 tolerance thr esholds. Epanec hniko v k ernels are considered for K δ 1 and K δ 2 and δ 1 and δ 2 are f ound by accepting a give n p ercenta ge P δ 1 = P δ 2 of the simula tions. 4 3.5 Comparisons b etw een ABC and MCMC metho ds for a standard SIR mo del F ollo win g [3] a p erformance indicator for ABC techniques consists in their abilit y to replicate lik eliho o d-based results giv en b y MCMC. Here the s ituation is p articularly fa v orable for com- paring the t w o m etho ds since the partial and the full p osterior are the same. In the follo win g examples, we c ho ose samples of small sizes ( n = 3 and n = 29) so that the dimension of the missing data is reasonable and MCMC ac hiev es fast con v ergence. F or large sample sizes with high-dimensional missin g data, MCMC con vergence might in deed b e a serious issue and more thorough u p dating sc heme shall b e implement ed [9, 24]. W e consider the stand ard SIR mo d el with no con tact-tracing ( λ 3 = 0). W e c h o ose gamma distributions for the priors of λ 1 and λ 2 with a sh ap e parameters of 0 . 1 and rate parameters of 1 and 0 . 1. The data consist of the detectio n times and we assume that the infection times are not observ ed. W e implemen t the MCMC algorithm of [21]. A total of 10 , 000 steps are considered for MCMC with an initial burn -in of 5 , 000 steps. F or ABC, the su mmary statistic consists of the cumulativ e n umber of d etections as a fu nction of time. A total of 100 , 00 0 sim ulations are p erformed for ABC. The ﬁr st example was previously considered b y [21]. They simulated detection t imes b y considering one initial infectious individual and b y setting S 0 = 9, λ 1 = 0 . 12, and λ 2 = 1 (see Section 4 of th e supp lementary m aterial for the d ata). As display ed b y Figure 2, the p osterior distributions obtained with ABC are extremely close to th e ones obtained with MCMC provided that the tolerance rate is suﬃcien tly s m all. W e see that the tolerance rate changes imp ortan tly the p osterior d istr ibution obtained with ABC (see the p osterior distributions for λ 2 ). In a second example, w e sim ulate a standard SIR tra jectory w ith λ 1 = 0 . 12, λ 2 = 1, S 0 = 30 and I 0 = 1. The d ata no w consist of 29 detection times (see Section 4 of the supplementary material). Once again, Figure 2 shows that the ABC and MCMC p osteriors are close pro vided that the tolerance rate is small enough. ABC pro duces p osterior distr ibutions w ith larger tails compared to MCMC, even with the lo west tolerance rate of 0 . 1%. This can b e explained b y considering th e extreme scenario in which the tolerance threshold δ go es to inﬁn ity: eve ry sim ulation has a weig ht of 1 so that ABC targets the prior ins tead of th e p osterior. As the prior has typical ly larger tails than the p osterior, ABC inﬂates the p osterior tails. 4 Comparison b et w een AB C wi th full and binned detection times When there is noise or w hen the detection times hav e b een binn ed, the full observ ations ( R 1 t , t ∈ [0 , T ]) and ( R 2 t , t ∈ [0 , T ]) are una v ailable. Then , w e replace these summary statistics b y a v ector of su mmary statistics suc h as the num b ers of detections p er y ear during the observ ation p erio d . Since these new su mmary statistics are n ot suﬃcien t an ymore, the n ew partial p osterior distribution ma y b e diﬀeren t from the p osterior p ( θ | x ). In the follo w in g, w e compare p oin t estimates and CIs obtained f rom ABC with full (metho d of S ection 3) and bin ned detection times (metho d of Section 4.2). 4.1 A new set of summary statistics W e consider a d-dimensional vecto r of sum mary statistics of three diﬀerent t yp es. First, w e compute the num b ers R 1 T and R 2 T of individu als d etected b y rand om screening and con tact- tracing by the end of th e observ ation p erio d. Second, for eac h y ear j , w e compu te the n umb ers of ind ividuals that hav e b een found to b e HIV p ositiv e R l j +1 − R l j , l = 1 , 2. The third t yp e of summary statistics consists of the num b ers of new infectious for eac h of the the sixth ﬁr st y ears 5 I j +1 − I j for j = 0 , . . . , 5, as we ll as the mean time durin g whic h an individual is inf ected b ut has n ot b een detected yet. This mean time corresp on d s to the mean so journ time in the class I for the sixth ﬁr st y ears. These summary statistics are a v ailable here b ecause infectious times b efore 1992 are kno wn . In order to compute the weig hts W i , w e consid er the follo wing spherical k ernel K δ ( x ) ∝ K ( k H − 1 x k /δ ). Here K d enotes the one-dimens ional Epanechnik ov kernel, k . k is the Euclidian norm of R d and H − 1 a matrix. Bec ause the summary statistics ma y span diﬀerent scal es, H is tak en equal to the d iagonal matrix with the standard deviation of eac h one-dimens ional summary statistic on the d iagonal. 4.2 Curse of dimensionalit y and regr ession adjustmen ts In th e case of a d -dimensional vect or of s ummary statist ics, the estimator of th e cond itional mean (3.1) is con verge nt if the tolerance r ate satisﬁes lim N → + ∞ δ N = 0, so that its bias con v erges to 0, an d lim N → + ∞ N δ d N = + ∞ , so that its v ariance conv erges to 0 [13]. As d increases, a larger tolerance threshold shall b e c hosen to k eep the v ariance small. As a consequence, the b ias ma y increase with the num b er of s ummary statistics. Th is ph enomenon kno wn as the curse of dimensionality m ay b e an iss u e for the ABC-rejection approac h. The follo wing paragraph present s regression-based adjustments that cop e with the cur se of dimensionalit y . The adjustmen t principle is presen ted in a general setting within whic h the corrections of [3] and [6] c an b e deriv ed. Correction adj u stmen ts aim at obtaining f r om a random coup le ( θ i , s i ) a random v ariable distribu ted according to p ( θ | s obs ). The idea is to constr u ct a coupling b et ween the d istributions p ( θ | s i ) and p ( θ | s obs ), through which we can shrin k the θ i ’s to a sample of i.i.d. dra ws from p ( θ | s obs ). In the remainin g of this su bsection, we describ e ho w to p erform the corr ections for eac h of the one-dimensional comp on ents separately . F or θ ∈ R , correction adjustments are obtained by assu m ing a r elationship θ = G ( s, ε ) =: G s ( ε ) b et ween the parameter and the summary statistics. Here G is a (p ossibly complicated) f u nction and ε is a rand om v ariable with a distribution that d o es n ot dep end on s . A p ossibilit y is to c ho ose G s = F − 1 s , the (generalized) inv ers e of the cum ulativ e distribu tion function of p ( θ | s ). In this case, ε = F s ( θ ) is a u niform rand om v ariable on [0 , 1]. T he formula for adjustment is give n by θ ∗ i = G − 1 s obs ( G s i ( θ i )) i = 1 , . . . , N . (4.1) F or G s = F − 1 s , the fact t h at the θ ∗ i ’s are i.i.d. with densit y p ( θ | s obs ) arises from the s tan- dard in v ersion algorithm. Of course, the fun ction G shall b e appro ximated in practice. As a consequence, the adjusted simulat ions θ ∗ i , i = 1 , . . . , N , co ns titute an appro ximate sample of p ( θ | s obs ). Th e ABC algorithm with regression adjustment can b e describ ed as follo ws 1. Sim ulate, as in the rejection algorithm, a sample ( θ i , s i ), i = 1 , . . . , N . 2. By m aking use of the sample of the ( θ i , s i )’s weigh ted by the W i ’s, appro ximate the fun ction G su c h that θ i = G ( s i , ε i ) in the vicinit y of s obs . 3. Replace the θ i ’s b y the adjusted θ ∗ i ’s. T he resulting w eigh ted sample ( θ ∗ i , W i ), i = 1 , . . . , N , form a sample from the target distribu tion. Beaumon t et al. local linear regressions (LO C L) The case where G is approximat ed b y a linear mo d el G ( s, ε ) = α + s t β + ε , wa s considered b y [3]. T he p arameters α and β are inferred b y minimizing th e w eigh ted squared error P N i =1 K δ ( s i − s obs )( θ i − ( α + ( s i − s obs ) T β )) 2 . Using (4.1), the correction of [3 ] is derived as θ ∗ i = θ i − ( s i − s obs ) T ˆ β , i = 1 , . . . , N . (4.2) 6 Asymptotic consistency of the estimators of th e partial p osterior distribution with the correction (4.2) is obtained b y [5]. Blum and F ran¸ cois’ nonlinear conditional het eroscedastic regressions (NCH) T o r e- lax the assump tions of homosc edasticit y and linearit y inherent to local linear regression, [6] appro ximated G b y G ( s, ε ) = m ( s ) + σ ( s ) × ε where m ( s ) denotes the conditional exp ecta tion, and σ 2 ( s ) the conditional v ariance. Th e estimators ˆ m and log ˆ σ 2 are found b y adju sting t wo feed-forw ard neural netw orks using a regularized wei ghted squared error. F or the NCH mo del, parameter adjustmen t is p erformed as follo ws θ ∗ i = ˆ m ( s obs ) + ( θ i − ˆ m ( s i )) × ˆ σ ( s obs ) ˆ σ ( s i ) , i = 1 , . . . , N . In p ractical app lications of the NCH mo del, w e train L = 10 neural net works for eac h condi- tional regression (exp ecta tion and v ariance) and we a v erage the resu lts of the L neural netw orks to p r o vide the estimates ˆ m and log ˆ σ 2 . Reparameterization In b oth regression adjustment app roac h es, the regressions can b e p er- formed on transform ations of the resp onses θ i rather that on the resp onses themselve s. P a- rameters whose pr ior distributions h av e ﬁnite supp orts are transformed via the logit f unction and n on-negativ e parameters are transformed via th e logarithm function. T hese transform a- tions guaran tee that the θ ∗ i ’s lie in the supp ort of the p rior distribution and ha v e th e additional adv ant age of stabilizing the v ariance. 4.3 A comparison for sim ulated datasets In order to wo rk on data similar to the Cuban database, we simulat e M = 200 syn thetic data sets for the HIV-AIDS epidemic with µ 1 = 2 × 10 − 6 , λ 1 = 1 . 14 × 10 − 7 , λ 2 = 3 . 75 × 10 − 1 , λ 3 = 6 . 55 × 10 − 5 , and c = 1 [10]. T h e initial cond itions are set to S 0 = 6 × 10 6 , the size of the Cuban p opulation in th e age-group 15-49, I 0 = 232 and R 0 = 0 [11]. Here w e simulate only 6 y ears of the epidemics. W e study four v arian ts of ABC for estimating λ 1 , λ 2 , and λ 3 : one with the t w o path-v alued summary s tatistics and three with the ve ctor of summary statistics. When us in g the ﬁnite di- mensional vect or of su mmary statistics, we p erform the smo oth rejection app roac h as well as the LOCL and NC H corrections with a total of 21 summary statistics: th e 18 summ ary statistics corresp ondin g to th e y early increases of R 1 , R 2 , and I ; the ﬁnal num b ers of detected ind ividuals R 1 6 and R 2 6 ; and the m ean so journ time in the class I . Eac h of th e M = 200 estimations of the partial p osterior d istributions are p erformed usin g a total of N = 5000 sim ulations of the SIR mo del w ith the mass action p rinciple (ﬁ rst rate in (2.1)). Prior distribut ions The prior distribu tions f or µ 1 , λ 1 , λ 2 and λ 3 are c hosen to b e un if orm on a log scale. The c h oice of a log scale reﬂects our uncertain t y ab out the ord er of magnitude of the parameters. The pr ior distribution f or log 10 ( µ 1 ) is U ( − 6 , − 4) wh er e U ( a, b ) denotes the un iform distribution on the interv al ( a, b ). T h e p rior distribu tion is U ( − 9 , − 6) for log 10 ( λ 1 ), U ( − 4 , 3) for log 10 ( λ 2 ), and U ( − 8 , 2) for log 10 ( λ 3 ). The b ounds of the un iform distr ib utions are set to k eep the sim ulations from b eing degenerate. The prior for c is log(2) / U (1 / 12 , 5) so that the half-life of t 7→ e − ct , whic h measures the individual contribution to th e detection b y conta ct-tracing, is uniformly d istributed b et wee n 1 / 12 and 5 y ears. P oin t estimates of θ and credibilit y in terv als Figure 3 displays the b o x p lots of the 200 estimated mod es, m edians, 2 . 5% and 97 . 5 % qu an tiles of the p osterior distribu tion for λ 1 as a 7 function of th e tolerance r ate P δ (see Figures 1 and 2 of the su p plemen tary material for λ 2 and λ 3 ). First, w e ﬁ n d that the medians and mo d es are equiv alen t except for the rejection metho d with 21 summary statistics for whic h the mo de is less biased. F or the lo w est tolerance rates, the p oint estimates obtained with the four p ossible m etho ds are close to the v alue λ 1 = 1 . 14 × 10 − 7 used in the s im ulations, with smaller CI for the LOCL and NCH v ariants. When in creasing the tolerance rate, the bias of the p oin t estimates obtained with the rejection m etho d with 21 summary statistics slightl y increases. By con trast, up to tolerance rates smaller than 50%, the biases of th e p oin t estimates obtained with th e three other metho ds remain sm all. As can b e exp ected, the widths of the CI obtained with the rejection m etho ds increase with the tolerance rate wh ile they remain considerably less v ariable for the m etho ds with regression adjus tmen t. Mean square error F or further comparison of the diﬀeren t metho ds, w e compute the r escaled mean square errors (RMSEs). The RMSEs are computed on a log scale and rescaled by the range of the prior distribution so that RMSE( λ j ) = 1 M M X k =1 (log( ˆ λ k j ) − log ( λ j )) 2 Range(prior( λ j )) 2 , j = 1 , 2 , 3 , (4.3) where ˆ λ k j is a p oin t estimate ob tained with th e k th syn thetic data set. W e ﬁnd that the smallest v alues of the RMSEs are usu ally reac hed f or the lo west v alue of th e tolerance r ate (see Figure 4). F or λ 1 and λ 2 , the RMSEs of the p oint estimates obtained w ith the four diﬀeren t m etho ds are comparable for the lo we st tolerance rate. Ho wev er, the s mallest v alues of the RMS Es are alw a ys found w hen p erforming th e r ejection metho d with the t wo suﬃcient su mmary s tatistics R 1 and R 2 . Th is ﬁ n ding is ev en more pronounced for the parameter λ 3 . Rescaled mea n credibility in terv als T o compare the whole p osterior distribu tions obtained with the four diﬀerent metho ds, we additionally compute the diﬀerent CIs. Th e rescaled mean CI (RMCI) is deﬁned as follo ws RMCI = 1 M M X k =1 | I C k j | Range(prior( λ j )) , j = 1 , 2 , 3 , (4.4) where | I C k j | is th e length of the k th estimated 95% CI for the parameter λ j . As disp lay ed by Figure 3, the C Is obtained with sm o oth r ejection increase imp ortan tly with the tolerance rate whereas suc h an imp ortan t increase is n ot observed with regression adj u stmen t. In Figure 4, the CIs obtained with the NCH metho d are clearly the thinnest, th ose obtained with the rejection metho ds are the widest and those obtained with the LOC L metho d ha ve inte r m ediate width. In th e follo wing, we p er f orm the NCH correctio n when considering the ﬁnite dimen sional vecto r of summary statistics. This c hoice is motiv ated by the small RMSEs and RMCIs obtained with the NCH metho d (Figure 4). 5 Application to the Cuban HIV-AIDS epidemic W e calibrate the S IR m o del with con tact-tracing to the Cuban HIV-AIDS database. W e consider t w o metho ds: smo oth rejection ABC w ith the t wo path-v alued summary statistics (Section 3), and the NCH-ABC with the v ector of summary statistics (Section 4.2). F or the Cu ban d ata, this vect or is of dimension 51. 8 5.1 P arameter c alibration and go o dness of ﬁt F or the ABC algorithm, we p erform a total of 100,000 sim ulations, w e consid er the t wo diﬀerent rates of con tact-traci n g detection (2.1), and w e use the same initial conditions and priors as in Section 4.3. T o set the v alue of the tolerance rate P δ , w e consider the 15 ﬁrst yea rs of th e epidemic as the training data and c ho ose th e v alue of the tolerance rate P δ that minimizes the prediction error at the en d of the ep idemic ( T = 21 . 5) Pred Err or = E " | R 1 21 . 5 ( P δ ) − R 1 obs, 21 . 5 | R 1 obs, 21 . 5 + | R 2 21 . 5 ( P δ ) − R 2 obs, 21 . 5 | R 2 obs, 21 . 5 # . (5.1) F or the optimal tolerance rate P δ , w e in ve stigate the go o dn ess of ﬁt of the SIR-t yp e mo del. By simulat ing paths of th e SIR m o del asso ciated with p arameters θ sampled from the partial p osterior distribution, w e c hec k if the model repro d uces a p osteriori the observ ed summary statistics [16]. In Figure 5, w e display the P osterior Predictiv e Distribu tions (PPD) of diﬀerent summary statistics. Figure 5 has b een obtained b y considering the t w o su ﬃcien t sum m ary statistics R 1 and R 2 , using the optimal tole rance rates of P δ 1 = P δ 2 = 1%, and considering the mo del of frequency dep endence (second rate in (2.1)). Th e cumulat ed num b er of detected individuals are contai n ed in the ranges of th e PPDs. By con tr ast, the mean so journ time in the class I is not con tained in the PPD and the observ ed n umb er of inf ectious ind ivid uals is in the lo wer tail of the P PD. An explanation migh t b e that an age-structure has to b e tak en in to account for th e infection rate in order to capture the non-Mark o vian eﬀects ( e.g. [25]). A mo del with an in creasing in fection rate could d iminish the mean so journ time in the class I and increase by comp ensation the n u mb er of infections to maintai n the infection pressu re constan t. When considerin g th e m o del with a mass action principle (ﬁrst r ate in (2.1)), we observed (see Figure 3 of the supplement ary material) that the statistic R 1 obs, 21 . 5 is not conta ined in the PPD. With a mass action principle, the rate of con tact-tracing detection increases linearly with the con tribution of th e d etected individuals, and that is to o rapid in comparison with the data. Last, w e ﬁ nd that the PP Ds obtai ned with the NCH method h a v e ext remely wide supp orts for b oth rates of con tact-tracing detection (see Figure 4 of the supp lementary material ). These large PPDs su ggest that the summary statistics measuring th e detections and the in fections may con tain conﬂicting signals, w hic h results in a large v ariance of the partial p osterior d istribution. T o pro vide p oint estimates and CIs, w e consider th e mo d el that pro vides the b est ﬁt (mo d el with frequ ency dep enden ce ﬁtted with the t w o tra jectories R 1 and R 2 ). F or p oin t estimation, w e compute the p osterior mo de. The estimate of λ 1 is 5 . 4 × 10 − 8 (95%CI [3 . 9 × 10 − 8 ; 2 . 3 × 10 − 7 ]), the estimate of λ 2 is 0.13 (95%CI [0 . 007 ; 1 . 17]), and the estimate of λ 3 is 0 . 19 (95%CI [0 . 03; 0 . 82]). The p oin t estimate of the rate of in fection λ 1 implies that the net rate of infection p er infectious individuals λ 1 S is equal to 0 . 32 (95%CI [0 . 23; 1 . 37 ]). This means th at th e wait ing time b efore an infectious individu al, that has not b een detected y et, inf ects an other in dividual is 3 . 1 y ears (95%CI [0 . 72; 4 . 34]). 5.2 The dynamic of the Cuban HIV-AIDS epidemic Reconstruction of the cum ulativ e num b ers of de t ections Figure 6 displa ys the dynam- ics p redicted by the SIR mo del for th e num b ers of individu als detected b y random screening, con tact-tracing and for the n umb er of unknown infectious individ uals. Interestingl y , there is a really go o d ﬁ t b et w een th e real and pr edicted num b ers of individu als d etected by random screening except b et we en 1992 and 1995. This p eriod corresp onds to the p erio d of crisis that follo wed the collapse of the S oviet Union and du ring wh ich the HIV detection system r eceiv ed less atten tion. W e also ﬁn d a sligh t discrep an cy in the recen t y ears (2000-2007 ) b et wee n the real and pred icted num b ers of detections b y con tact-tracing, whic h ma y reve al a we ake nin g in 9 the con tact-tracing sys tem. An exp lanation is that a new mo d e of detectio n, related to conta ct- tracing but still counted as random detection, h as b een develo p ed. Th is new t yp e of detection is promoted by the family do ctors w ho ask to their p atien ts the names of individuals at risk (H. De Arazoza, p ersonal comm unication). P erformance of the con tact-tracing system When testing the p erform an ce of con tact- tracing, [18] computed the cov erage of the epid emic deﬁned as the p ercenta ge of infectious individuals that hav e b een detected ( R 1 + R 2 ) / ( I + R 1 + R 2 ). As displa y ed by Figure 6, th e SIR mo d el predicts a co verage of 62% (95%CI [36%; 66%]) in 200 0 that is muc h lo wer than a co verage of 83% (95%CI [75%; 87%]) as inferred by [18]. Ho w ever, since the P P Ds of Figure 5 sho w that the SIR mo d el pr edicts less infectious individu als than observ ed, the co v erage migh t still b e o ve restimated and would consequently b e ev en smaller than 62%. Using this estimation of the co v erage, w e can compare the r ates of d etectio n b y r andom screening and con tact-traci n g p er in fectious ind ividual. The estimated p er capita rate of r an- dom s creening is λ 2 = 0 . 13. The p er capita rate of con tact-tracing equals λ 3 P i ∈ R exp( − c ( t − T i )) / ( I t + P i ∈ R exp( − c ( t − T i ))). Using a zero-o rd er expansion, w e ﬁ n d that this rate can b e appro ximated by the p ro duct of λ 3 with the co verag e of the epid emic. Hence, the p er capita rate of con tact-tracing can b e estimated as 0 . 19 × 0 . 6 2 ≈ 0 . 12 that is almost equal to λ 2 . Predictions Additionally , simulat ions of the SI R mo del p ro vide predictions for the evo lu- tion of th e HIV d ynamic in th e forthcoming y ears. The SI R m o del pred icts that in 2015, 42 , 000 (95%CI [29 , 000; 107 , 000]) ind ividuals will b e infected since th e b eginnin g of the epi- demic in C u ba. Among these infect ed individuals, a p rop ortion of 45% (95%CI [29%; 46%]) will b e detected by random screening and a prop ortion of 21% (95%CI [10%; 22%]) by con tact- tracing. As displa yed by Figure 6, the S IR-t yp e mo del with con tact-traci n g pr edicts that the total prop ortion of ind ividuals detected by con tact-tracing will reac h an asymptote of 32% (95%CI [25%; 33%]) in 2015. The total n umb er of in f ected individu als in 2015 corresp ond s to 27 , 000 (95%CI [19 , 000; 80 , 00 0]) new cases of HIV b etw een July 2007 an d January 2015. In the same p erio d of time, th e SIR mo del predicts th at 12 , 000 individuals (95%CI [9 , 000; 24 , 000 ]) will b e detected by random screenin g and 6 , 000 individuals (95%CI [4 , 000; 8 , 000]) b y con tact- tracing. 6 Conclusions In the con text of temp oral epidemiological d ata, ABC tec hn iques can provide accurate estimates of the parameters of interest suc h as the in fection and detection rates [19, 28]. ABC relies on sim ulations of the mo d el and can therefore b e applied to v arious epid emiologic al mo dels. F or p artially observ ed p opulation and m iss ing infectious times, MCMC metho d s require to reconstruct the un kno wn data wh ic h can b e highly compu tationally intensiv e for large p opu- lations. F or instance, [21] and [25] considered MCMC algorithms for p opulations consisting of ab out 100 ind ividuals whereas the Cub an HIV-AIDS database con tains almost 10,000 kno wn HIV p ositive individu als, whic h m ak es th e total (kno wn an d unkno wn) num b er of in f ectious individuals ev en larger. When the dimension of the missing data, the infection times h ere, is b oth large and unknown, data imputation with MCMC can b e compu tationally v ery demand ing and takes several da ys on a parallelized system [24]. Ho w ev er, compared with the abund an t MCMC literature, the exp erience of stati sticians with ABC is still rather limited. F or MCMC algorithms, theoretical con v ergence results [26] as well as practical metho d s for monitoring n u m erical con v ergence are a v ailable [15]. Although theoretic al results are no w a v ailable for ABC [5], there is n o metho d for ev aluating the t wo appro ximations inherent to ABC (S ection 3.1). Here, we are in a fav orable situation sin ce the full su mmary 10 statistics are su ﬃcien t so that th e partial and the fu ll p osterior are the sa me. Ho wev er, for the second approxima tion, the practic e of co nd itional densit y estimation in high dimensions remains an issue. When comparing p osterior distrib utions obtained with MCMC and ABC for a standard SIR mo del, w e ﬁnd the same mo des provided that the tolerance r ate is small enough. Ho w ev er, ev en for the smallest tolerance rate, we ﬁnd that ABC generates p osterior distribu tions with larger tails compared with MCMC. More generally , ABC applications ha ve b een restricted to mo dels with mo derate num b er of paramete rs w hereas MCMC applications can in vol ve a v ery large n umb er of parameters ([22]). F or mo dels with a substantia l num b ers of parameters, adaptiv e ABC algorithms that use th e sim ulations to mo dify the sampling d istribution of th e parameter θ , might constitute interesting wa ys to explore f or the futur e of ABC in epidemiology [23, 2, 27]. In th is pap er, w e consider b oth ﬁnite and in ﬁnite dimensional summary statistics f or ABC. When comparing ABC w ith the tw o diﬀeren t sets of statistics, w e ﬁnd th at the p oin t estimates of the parameters λ 1 , λ 2 , λ 3 , with the smallest quadratic errors are obtained with the r ejection metho d based on the inﬁn ite-dimensional statistics. Ho w ever, the 95% CIs obtained w ith this metho d are large and criti cally dep end o n the tolerance rate. By con trast, regression-based adjustment m etho ds, and the NCH metho d more particularly , considerably shorten the CIs and are less sensitiv e to the tolerance rate. Applications of regression-based ABC metho ds constitute therefore a solution for ‘stabilizing’ the C Is. Ho w ev er, no ABC w ith regression adjustment ha ve b een dev elop ed so f ar for inﬁnite-dimensional summary statistics. In the last section of the pap er, w e calibrate the SIR mod el to the Cuban HIV-AIDS d ata. By comparing the p osterio r predictive distributions of the total num b er of detections, we ﬁnd that the mo del with a frequency-dep en den t rate of con tact-tracing p ro vides the b est ﬁt to the data. Using this mo del, we compare the p resen t-da y rates of cont act-tracing and random screening. W e ﬁnd that they are almost th e same and equal to 0 . 13/individual/y ear. Conv ertin g r ates of detection to w aiting times b efore d etectio n s , w e ﬁnd that the w aiting time b efore an individual infected to day w ill b e detected is equal to 1 / (2 × 0 . 13) ≈ 3 . 8 y ears. A t the time of detection, b oth t yp es of detection are equally p robable. Although it suggests that con tact-tracing d etection con tributes imp ortan tly to HIV screening in Cuba, w e ﬁn d that the screening migh t hav e b een largely incomplete. The p ercen tage of und etected individuals among the infectious ind ividuals migh t ha ve b een underestimated [18] and w ould b e of the order of 40%. Ac kno wledgmen ts The authors are grateful to Pr. H. De Arazoza of the Univ ersit y of La Hav ana and to Dr. J. P erez of the National Ins titute of T ropical Diseases in Cu ba for gran ting them access to the HIV- AIDS database. This w ork has b een supp orted by the ANR Viroscop y (ANR-08-SYSC-016 -03), the ANR MANEGE (ANR-09-BLAN-02 15-01), and by the Rhˆ one-Alp es Institute of Complex Systems (IXXI). The authors th ank the anonymous referees for their v aluable su ggestions. supplemen tary mate rials Supp lemen tary material is a v ailable at ht tp://www.b iostatistic s.oxfo rd journals.org References [1] H.T. Banks, J.A. Burn s, and E.M. Cliﬀ. Paramet er estimation and id en tiﬁcation for systems with d ela ys . SIAM Journal of Contr ol and Optimization , 19(6):791– 828, 1981. 11 [2] M.A. Bea umont, J.-M. Marin, J.-M. Corn uet, and C.P . Rob ert. Adaptiv e appro ximate ba y esian compu tation. Biometrika , 96(4):98 3–990, 2009. [3] M.A. Beaumont, W. Zhang, and D.J. Balding. Appro ximate Ba yesia n computatio n in p opulation genetics. Genetics , 162:202 5–2035, 2002. [4] N.G. Bec ker and T. Britton. Stati stical stud ies of in fectious disease incidence. Journa l of the R oyal Statistic al So ci ety: Series B , 61(2):2 87–307, 1999. [5] M.G. Blum. App ro ximate Ba y esian C omputation: a non-parametric p ersp ectiv e. Journal of the Americ an Statistic al Asso ciation , 2009. to app ear. [6] M.G.B. Blum a nd O. F ran¸ cois. Non-linear regression mo dels for Appro xim ate Ba yesia n Computation. Statistics and Computing , 20:63–73, 2010. [7] J.A. Burns, E.M. Cliﬀ, and S.E. Dough t y . Sensitivit y analysis and parameter estimatio n for a mo del of Chlamydia Trachomat is infection. Journal of Inverse Il l-Pose d Pr oblems , 15:19– 32, 2007. [8] S. Cauc hemez, F. Carrat, C. Vib oud, A. J. V alleron, and P . Y. Bo elle. A Ba ye sian MCMC approac h to study transmission of in ﬂuenza: application to h ousehold longitudinal d ata. Statistics in Me dicine , 23:3469 –3487, 2004. [9] S. Cauc hemez and N.M. F erguson. Lik eliho o d -based estimation of contin uous -time epidemic mo dels from time-series d ata: app lication to measles transmission in london. Journal of the R oyal So ciety Interfac e , 5:885–897 , 2008 . [10] S. Cl ´ emen¸ con, V.C. T ran, and H. De Arazoza. A s to c hastic S IR mo del with conta ct-tracing: large p opu lation limits and statistical inferen ce. Journal of Biolo gi c al D ynamics , 2(4):392 – 414, 2008. [11] H. de Arazoza, J. Joanes, R. Lounes, C. Legeai, S. Cl´ emen ¸ con, J. P erez, and B. Auvert. The HIV/AIDS epidemic in C uba: description and ten tativ e explanation of its lo w prev alence. BMC Infe ctious D ise ase , 7:130, 2007. [12] H. d e Arazoza, R. Loun es, J . P erez, and T. Hong. What p ercen tage of the cuban HIV-AIDS epidemic is known. R ev Cub ana Me d T r op , 55:30–37, 2003. [13] J. F an. Design-adaptiv e nonparametric regression. Journal of the Americ an Statistic al Asso ciation , 87(420) :998–1004, 1992. [14] B.F Fink enst¨ adt, O.N. Bjørnstad, and B.T. Gr enfell. A s to c hastic mo del f or extinction and recurrence of ep id emics: estimation and inference for measles outbr eaks. Biostatistics , 3:493– 510, 2002. [15] A. Gelman. Inference and monitoring con ve rgence. I n W.R. Gilks, S. Ric hardson, and D.J. Spiegelhalter, editors, Markov c hain M onte Carlo in pr actic e , p ages 131–1 44. Ch apman & Hall, 1996 . [16] A. Gelman and X-L Meng. Mo del chec king and model impro vment . In W.R. Gilks, S. Ric hardson, and D.J. S piegelhalter, editors, Markov chain Monte Carlo in pr actic e . Chapman & Hall, 1996. [17] W.R. Gilks and G.O. R ob erts. S trategies for improving MCMC. In W.R. Gilks, S. Ric hard- son, and D.J. Spiegelhalter, editors, Markov chain Monte Carlo in pr actic e . Ch apman & Hall, 1996 . 12 [18] Y.H. Hsieh, H. de Arazoza, S.M. Lee, and C.W. C hen. Estimating the num b er of Cubans infected sexually by h u man imm un o deﬁciency virus us in g co ntac t tr acing data. Int. J. Epidemiol. , 31(3) :679–83, 2002 . [19] T. McKinley , A. R. Co ok, and R. Deardon. Inference in epidemic mo dels without likeli- ho o ds. The International J ournal of Biostatistics , 5(1), 2009. [20] P .D. O’Neill. A tutorial in tro du ction to Ba yesia n inferen ce for sto chastic epidemic mo dels using Marko v chain Mont e C arlo metho ds. Mathema tic al Bioscienc es , 180:103–1 14, 2002. [21] P .D. O’Neill and G.O. Rob erts. Ba y esian inference f or partially observ ed sto c hastic epi- demics. Journal of the R oyal Statistic al So ciety: Series A , 162:121 –129, 1999. [22] J. K. Pritchard, M. Stephens, and P . Donnelly . In ference of p opu lation structure using m ultilo cus genot yp e data. Genetic s , 155:945–9 59, 2002 . [23] S.A. Sisson, Y. F an, and M. T anak a. Sequen tial Mon te Carlo without lik eliho o ds. Pr o c . Nat. A c ad. Sci. USA , 104:1760 –1765, 2007. [24] I. Chis Ster, B.K. Singh, and N.M. F erguson. E pidemiological inference for p artially ob- serv ed epidemics: the example of the 2001 fo ot and mouth epidemic in Great Britain. Epidemics , 1:21–34, 2009. [25] G. Streftaris and G.J. Gib s on. Ba y esian inference for sto c hastic ep id emics in closed p opu- lation. Statistic al Mo del ling , 4(1):63–75 , 200 4. [26] L. T ierney . Mark ov chains for exploring p osterior d istributions. Annals of Statistics , 22(4): 1701–1728 , 1994. [27] T. T oni, D. W elc h, N. Str elko wa , A. Ips en, and M.P . S tumpf. Ap pro ximate Ba y esian computation sc heme for parameter inference and mo del selection in dynamical sys tems. Journal of The R oyal So c i ety Interfac e , 6:187 –202, 2009. [28] D.M. W alk er, D. Allingham, H.W.J. Lee, and M. Small. P arameter inference in small w orld net wo rk disease mo dels w ith Approxima te Ba yesian Compu tational metho ds. P hysic a A , 389:54 0–548, 2010. 13 λ 0 λ 1 SI μ 0 S Susceptible Individuals Infectious Individuals λ 3 R ∈ i e Σ I λ 2 I 2 1 R R Contact Tracing Random Screening Detected Individuals S I μ 1 I R=R +R 1 2 -c(t-T ) i Figure 1: Sc hematic description of the SIR m o del with con tact-tracing. 14 −10 −8 −6 −4 −2 0 2 0.0 0.1 0.2 0.3 0.4 Log λ 1 Density Prior MCMC Posterior ABC Posterior, Tol. rate=1% ABC Posterior, Tol. rate=0.1% −10 −8 −6 −4 −2 0 2 0.0 0.1 0.2 0.3 0.4 0.5 Log λ 2 Density −4 −3 −2 −1 0 1 2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 Log λ 1 Density Prior MCMC Posterior ABC Posterior, Tol. rate=1% ABC Posterior, Tol. rate=0.1% −3 −2 −1 0 1 2 0.0 0.5 1.0 1.5 Log λ 2 Density a) b) Figure 2: Comparison of the p osterior densities obtained with MCMC and ABC. T h e vertica l lines corresp ond to the v alues of the p arameters used for generating the synt hetic data. a) The data consist of 3 d etectio n times that ha ve b een sim ulated by [21]. b) The data consist of 29 detection times that we sim ulated b y setting λ 1 = 0 . 12, λ 2 = 1, S 0 = 30, I 0 = 1, and T = 5 (see the supplementa ry material for the 29 d etectio n times). 15 Figure 3: Bo xplots of the M = 200 estimated mo des and quan tiles (2 . 5%, 50%, and 97 . 5%) of the partial p osterior distrib utions of λ 1 . F or eac h ABC metho d an d eac h v alue of the tolerance rate, 200 p osterior d istributions are computed for eac h of the 200 syn thetic data sets. The horizon tal lines corresp ond to the true v alue λ 1 = 1 . 14 × 10 − 7 used when sim ulating the 200 syn thetic data sets. The diﬀeren t tolerance r ates are 0.01, 0.05, 0.10, 0.25, 0.50, 0.50, 0.75, and 1 for all the ABC m etho ds except the rejection scheme with the tw o summary statistics. F or th e latter method , the tolerance rates are 0.007, 0.02, 0.06, 0.13, 0.27, 0.37, 0.45, 0.53, 0.66 , 0.80, 1. 16 ● ● ● ● ● ● ● 0.0 0.2 0.4 0.6 0.8 1.0 1e−06 1e−04 1e−02 1e+00 RMSE λ λ 1 ● ● ● ● ● ● ● 0.0 0.2 0.4 0.6 0.8 1.0 1e−05 1e−03 1e−01 λ λ 2 ● ● ● ● ● ● ● 0.0 0.2 0.4 0.6 0.8 1.0 0.001 0.005 0.010 0.050 0.100 0.500 1.000 λ λ 3 ● 21 one−dimensional sum stats Reject LOCL NCH 2 infinite dimensional sum stats Reject ● ● ● ● ● ● ● 0.0 0.2 0.4 0.6 0.8 1.0 0.001 0.005 0.010 0.050 0.100 0.500 1.000 RMCI λ λ 1 Tolerance rate ● ● ● ● ● ● ● 0.0 0.2 0.4 0.6 0.8 1.0 0.001 0.005 0.010 0.050 0.100 0.500 1.000 λ λ 2 Tolerance rate ● ● ● ● ● ● ● 0.0 0.2 0.4 0.6 0.8 1.0 0.01 0.02 0.05 0.10 0.20 0.50 1.00 λ λ 3 Tolerance rate Figure 4: Re scaled mean s q u ared error (RMSE) of the p osterior m o de and rescaled mean cred- ibilit y inte rv al (RMCI). 17 Figure 5: Bay esian p osterior predictiv e distrib utions of R 1 21 . 5 , R 2 21 . 5 , I 6 , and the mean so journ time in the class I. The S IR mo del corresp onds to the mo del with fr equency dep en d ence for con tact-tracing detection. The partial p osterior samples are obtained with the smo oth r ejection ABC algorithm b y making use of th e 2 inﬁnite-dimensional summary statisti cs R 1 and R 2 . T olerance rate of P δ 1 = P δ 2 = 1% are considered f or eac h summary statistic. 18 1 10 100 1000 10000 Years R 1 1990 2000 2010 observed trajectory median predicted trajectory 2.5% and 97.5% quantile trajectories 1 10 100 1000 10000 Years R 2 C o v er age 1990 2000 2010 0.0 0.1 0.2 0.3 0.4 0.5 Years R 2 ( ( R 1 + + R 2 ) ) 1990 2000 2010 Hsieh et al. estimate 0.0 0.2 0.4 0.6 0.8 1.0 1990 2000 2010 Coverage Years Figure 6: Median and 95% credibilit y in terv als of the p osterior p redictiv e distributions of R 1 t , R 2 t , R 1 t / ( R 1 t + R 2 t ) and co v erage R t / ( I t + R t ) from 1986 to 2015. T h e co v erage is deﬁned as the prop ortion of kno wn HIV p ositiv e individuals. The p osterior s amples are generated by the rejection scheme with the t wo summary statistic s. A tolerance rate of P δ = 1% is considered f or eac h summary s tatistic. 19

HIV with contact-tracing: a case study in Approximate Bayesian Computation

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment