On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo

Sarah Filippi¹, Chris P. Barnes¹, Julien Cornebise, Michael P.H. Stumpf¹
¹ Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7

November 26, 2024

Abstract

Approximate Bayesian computation (ABC) has gained popularity over the past few years for the analysis of complex models arising in population genetics, epidemiology and systems biology. Sequential Monte Carlo (SMC) approaches have become workhorses in ABC. Here we discuss how to construct the perturbation kernels that are required in ABC SMC approaches, in order to construct a set of distributions that start out from a suitably defined prior and converge towards the unknown posterior. We derive optimality criteria for different kernels, which are based on the Kullback-Leibler divergence between a distribution and the distribution of the perturbed particles. We will show that for many complicated posterior distributions, locally adapted kernels tend to show the best performance. In cases where it is possible to estimate the Fisher information we can construct particularly efficient perturbation kernels. We find that the moderate added cost of adapting kernel functions is easily regained through the higher acceptance rate. We demonstrate the computational efficiency gains in a range of toy examples which illustrate some of the challenges faced in real-world applications of ABC, before turning to two demanding parameter inference problems in molecular biology, which highlight the huge increases in efficiency that can be gained from the choice of optimal kernels. We conclude with a general discussion of the rational choice of perturbation kernels in ABC SMC settings.

1 Introduction

Statistical practice and theory tend to reflect scientific fashions (Stigler, 1986).
Today mathematical models in the biological sciences are becoming increasingly complex. This, together with the deluge of data being produced, typically in genetics and genomics, poses severe challenges to statistical inference (Efron, 2010). In particular, in many areas of computational biology, evaluation of the likelihood (Cox, 2006)

    L(θ) = f(x|θ),

where x are realizations of the data and θ is the (potentially vector-valued) parameter characterizing the data-generating process, often turns out to be impractical. Approximate Bayesian Computation (ABC) methods (Beaumont et al., 2002; Marin et al., 2011) were first conceived to allow (Bayesian) statistical inference in situations where the evaluation of the likelihood is too complicated or numerically too demanding (Pritchard et al., 1999; Tanaka et al., 2006; Lopes & Beaumont, 2010). Rather than evaluating the likelihood directly, ABC-based approaches use systematic comparisons between real and simulated data in order to arrive at approximations of the true (but unobtainable) posterior distribution,

    p(θ|x) ∝ f(x|θ) π(θ),

where π(θ) denotes the prior distribution of θ. Simulating from f(x|θ) is generally straightforward, even if obtaining a reliable numerical/functional representation of the model is not possible. We then compare the simulated data, y, with the real data, x, and accept only those simulations where some distance measure between the two, Δ(x, y), falls below a specified threshold, ε. If the data are too intricate or complicated, it is common to replace a comparison of the real and simulated data by a comparison of suitable summary statistics. This results in an often appreciable reduction of the dimension, but is fraught with problems if the summary statistics are not sufficient.
Given that sufficiency (Cox, 2006) is a rare quality indeed (Lehmann & Casella, 1998), and probably not given for any real-world problem of scientific interest, this problem is now attracting a lot of attention (Robert et al., 2011; Didelot et al., 2011; Fearnhead & Prangle, 2011). Here, however, we shall focus on the data directly; we thus seek to determine approximate posteriors of the form

    p(θ|x) ≈ p_ε(θ|x) ∝ ∫ f(y|θ) 1(Δ(x, y) ≤ ε) π(θ) dy,

where y is the data simulated from the model f(·|θ) for a given parameter, θ, drawn from the appropriate prior distribution, and x is the observed data.

The simple ABC scheme outlined above suffers from the same shortcomings as other rejection samplers: most of the samples are drawn from regions of parameter space which cannot give rise to simulation outputs that resemble the data. Therefore a number of computational schemes have been proposed that make ABC inference more efficient. These come in loosely three flavours: regression-adjusted ABC (Tallmon et al., 2004; Fagundes et al., 2007; Blum & François, 2010), Markov chain Monte Carlo ABC schemes (Marjoram & Molitor, 2003; Ratmann et al., 2007), and ABC implementing some variant of sequential importance sampling (SIS) or sequential Monte Carlo (SMC) (Sisson et al., 2007; Toni et al., 2009; Beaumont et al., 2009; Del Moral et al., 2011). Of these flavours, the first and the last have received the greatest attention, and it is an ABC scheme based on sequential importance sampling that we will focus on, as it offers greater flexibility and applicability and appears to be enjoying greater popularity in applications. We focus on the implementation of Toni et al. (2009) and Beaumont et al.
(2009) (called ABC SMC in the following), which, like other related SIS and SMC methods, works by constructing a series of intermediate distributions that start out from a suitably specified prior distribution and increasingly resemble the (unknown) approximate posterior distribution. These methods aim to sample sequentially from a sequence of distributions which increasingly resemble the target posterior; they are constructed by estimating intermediate distributions p_{ε_t}(θ|x) for a decreasing sequence {ε_t}_{1≤t≤T}. Each intermediate distribution is described by a weighted sample of parameter vectors. Successive distributions are constructed by sampling parameters from the previous population, perturbing them through some kernel function, θ̃ ∼ K_t(·|θ), generating simulated data, y ∼ f(·|θ̃), and, upon acceptance, calculating the corresponding new weights.

While this sequential ABC approach is computationally much more efficient than simple ABC rejection schemes, the overall computational burden depends not only on the complexity of the model and the amount of data at hand, but also on details of the chosen sequential scheme. In particular, the ε-schedule, {ε_1, ..., ε_T}, and the choice of perturbation kernels, K_t(·|·), exert considerable influence on the algorithmic complexity. As in many Monte Carlo settings (Gilks et al., 1996; Robert & Casella, 2004), problems tend to arise as the dimension of the parameter space increases and balancing convergence with an exhaustive exploration of the parameter space becomes harder. In this paper we focus on perturbation kernel selection and its effect on the efficiency of the algorithm.

The construction of suitable kernel functions has been a longstanding problem in importance sampling (Oh & Berger, 1993; Givens & Raftery, 1996), sequential importance sampling and population Monte Carlo (Douc et al., 2007; Cappé et al., 2008; Cornuet et al.
, 2009), as well as in sequential Monte Carlo for state space models (Pitt & Shephard, 1999; Van Der Merwe et al., 2001; Cornebise et al., 2008). However, it is still far from being solved, especially in an ABC context where formal and informal understanding of kernel choice remain areas of pressing concern. Especially for models which are computationally expensive to simulate, such as dynamical systems (Gutenkunst et al., 2007; Secrier et al., 2009; Erguler & Stumpf, 2011), the choice of the kernel will have a huge influence on the efficiency with which parameter spaces are explored and posterior estimates obtained. Here we will discuss a range of kernel functions, characterize their performance, and put forward some analytic results as to their optimality.

In the next section we discuss the ABC scheme in some detail before describing criteria for optimally choosing the perturbation kernels and outlining different classes of perturbation kernels. We then examine the performance of these kernels in a range of illustrative problems and compare their algorithmic complexity. We will then show that for two models in molecular biology with complex posterior parameter distributions, the choice of suitable kernels can vastly reduce the computational cost of ABC SMC inference.

2 The ABC SMC algorithm

The general scheme of ABC inference is as follows:

• sample a parameter vector θ (also called a particle) from the prior distribution π(θ),
• simulate a dataset y according to the generative model f(y|θ),
• compare the simulated dataset with the experimental data x: if Δ(x, y) ≤ ε, accept the particle.

This scheme is repeated until N particles are accepted; these form a sample from the distribution

    p_ε(θ|x) ∝ ∫ 1(Δ(x, y) ≤ ε) f(y|θ) π(θ) dy,

which is an approximation of the posterior distribution p(θ|x).
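The rejection scheme above can be sketched in a few lines of code; this is a minimal illustration, not the implementation used in the paper, and all function names and the interface (user-supplied prior sampler, simulator and distance) are our assumptions:

```python
import numpy as np

def abc_rejection(prior_sample, simulate, distance, x_obs, eps, n_particles, rng):
    """Plain ABC rejection: draw theta from the prior, simulate y ~ f(.|theta),
    and accept theta whenever Delta(x, y) <= eps."""
    accepted = []
    while len(accepted) < n_particles:
        theta = prior_sample(rng)        # theta ~ pi(.)
        y = simulate(theta, rng)         # y ~ f(.|theta)
        if distance(x_obs, y) <= eps:    # accept if Delta(x, y) <= eps
            accepted.append(theta)
    return np.array(accepted)
```

As the text notes, for small ε most draws land in regions of parameter space that cannot reproduce the data, which is precisely the inefficiency the sequential schemes below address.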
Over the past few years many improvements of these algorithms have been proposed. In particular, Marjoram & Molitor (2003) introduced a method based on Markov chain Monte Carlo, which consists in constructing a Markov chain whose stationary distribution is p_ε(θ|x). This algorithm is guaranteed to converge; however, it is very difficult to assess when the Markov chain reaches the stationary regime, and furthermore the chain may get trapped in local extrema. SIS and SMC samplers have then been introduced in the ABC framework by several authors (Sisson et al., 2007; Toni et al., 2009; Beaumont et al., 2009; Del Moral et al., 2011). These methods aim to sample sequentially from the distributions p_{ε_t}(θ|x) for a decreasing sequence {ε_t}_{1≤t≤T}. The scheme of the algorithm is as follows: first, the ABC algorithm described above is used to construct a sample from p_{ε_1}(θ|x), with a value of ε_1 sufficiently large that many particles are accepted. The ABC algorithm is then used again with ε_2 as a threshold; but instead of sampling parameters from the prior, they are sampled from the set of accepted particles at the previous stage and perturbed according to a suitable perturbation kernel. This way a sample from p_{ε_2}(θ|x) is built, and so on until our target posterior has been reached.

In this article, we focus on the implementation of Toni et al. (2009) and Beaumont et al. (2009) described in Algorithm 1. In the following we will refer to this implementation as the ABC SMC algorithm, according to Toni et al. (2009). The ABC Population Monte Carlo algorithm proposed by Beaumont et al. (2009) is similar to the ABC SMC algorithm, except that a specific perturbation kernel is used. It is, however, worth distinguishing between these algorithms and those of Del Moral et al. (2011) and Drovandi & Pettitt (2011) based on the SMC sampler of Del Moral et al. (2006).
When using SMC samplers, both a forward and a backward kernel need to be defined, which reduces the algorithmic complexity from O(N²) to O(N), where N is the number of particles. However, in many applications of interest, the most computationally expensive part of an ABC algorithm is the simulation of the data, which, although of complexity O(N), dominates the O(N²) term for any practically feasible value of N (Beaumont et al., 2009). In this paper we will not discuss kernel choice in the context of these approaches, although here, too, the choice of kernel will impact the numerical efficiency.

Algorithm 1 ABC SMC algorithm
1:  input: a threshold ε
2:  output: a weighted sample of particles from p_ε(θ|x)
3:  t ← 0
4:  repeat
5:    t ← t + 1
6:    determine the next threshold ε_t
7:    determine the parameters of the perturbation kernel K_t(·|·)
8:    i ← 1
9:    repeat
10:     if t = 1 then
11:       sample θ̃ from π(θ)
12:     else
13:       sample θ from the previous population {θ^(i,t−1)}_{1≤i≤N} with weights {ω^(i,t−1)}_{1≤i≤N}
14:       sample θ̃ from K_t(·|θ) such that π(θ̃) > 0
15:     end if
16:     sample y from f(·|θ̃)
17:     if Δ(y, x) ≤ ε_t then
18:       θ^(i,t) ← θ̃
19:       y^(i,t) ← y
20:       i ← i + 1
21:     end if
22:   until i = N + 1
23:   calculate the weights: for all 1 ≤ i ≤ N
24:     if t ≠ 1 then ω^(i,t) ← π(θ^(i,t)) / Σ_{j=1}^{N} ω^(j,t−1) K_t(θ^(i,t)|θ^(j,t−1))
25:     else ω^(i,1) ← 1
26:   end if
27:   normalize the weights
28: until ε_t ≤ ε
29: T ← t

The behaviour of the algorithm depends on its settings: in particular the decreasing sequence {ε_t}_t and the perturbation kernels {K_t(·|·)}_t.
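Algorithm 1 can be sketched in code as follows. This is a simplified sketch: the kernel is held fixed across populations (whereas line 7 of Algorithm 1 re-tunes it at every iteration), the ε-schedule is supplied up front, and all names and the callback interface are our assumptions:

```python
import numpy as np

def abc_smc(prior_pdf, prior_sample, simulate, distance, x_obs,
            eps_schedule, kernel_sample, kernel_pdf, n_particles, rng):
    """Sketch of Algorithm 1.  `kernel_sample(theta, rng)` draws a perturbed
    particle; `kernel_pdf(theta_new, theta_old)` evaluates K_t and may be
    called with an array of old particles."""
    particles = weights = None
    for t, eps in enumerate(eps_schedule):
        new_particles = []
        while len(new_particles) < n_particles:
            if t == 0:
                theta = prior_sample(rng)                    # line 11
            else:
                idx = rng.choice(n_particles, p=weights)     # line 13
                theta = kernel_sample(particles[idx], rng)   # line 14
                if prior_pdf(theta) == 0.0:                  # require pi(theta) > 0
                    continue
            y = simulate(theta, rng)                         # line 16
            if distance(x_obs, y) <= eps:                    # line 17
                new_particles.append(theta)
        new_particles = np.array(new_particles)
        if t == 0:
            new_weights = np.full(n_particles, 1.0 / n_particles)
        else:
            # line 24: w_i = pi(theta_i) / sum_j w_j K_t(theta_i | theta_j)
            denom = np.array([(weights * kernel_pdf(th, particles)).sum()
                              for th in new_particles])
            new_weights = np.array([prior_pdf(th) for th in new_particles]) / denom
        particles, weights = new_particles, new_weights / new_weights.sum()
    return particles, weights
```

The denominator loop is the O(N²) weight computation mentioned above; the simulation calls inside the acceptance loop are typically what dominates the run time in practice.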
The effect of the sequence of decreasing thresholds is easy to understand: if the difference between two successive tolerances ε_t and ε_{t+1} is small, the distributions p_{ε_t}(θ|x) and p_{ε_{t+1}}(θ|x) are similar, and a small number of simulations will be required to generate N draws from the next intermediate distribution, p_{ε_{t+1}}(θ|x), by sampling from the weighted population {θ^(i,t−1), ω^(i,t−1)}_{1≤i≤N}. But a slowly decreasing sequence of thresholds {ε_t}_{1≤t≤T} leads to a large number of iterations (a large value of T) in order to reach ε_T = ε. In practice, until recently, the sequence of tolerance thresholds was most often tuned by hand according to the model. An adaptive choice of the threshold schedule has been proposed by Del Moral et al. (2011) and Drovandi & Pettitt (2011). It consists in selecting the α-th quantile of the distances between the simulated data {y^(i,t)}_{1≤i≤N} and the observed data x. Selecting the threshold adaptively often significantly improves the efficiency of the ABC SMC algorithm. However, the efficiency strongly depends on the choice of α and, as argued in Silk et al. (2012), for some values of α the algorithm may not converge to the posterior distribution p_ε(·|x).

Similarly, the choice of the perturbation kernels {K_t(·|·)}_{1≤t≤T} exerts considerable influence on the computational complexity of the algorithm. A local perturbation kernel hardly moves the particles and has the advantage of producing new particles which are accepted with high probability if the successive values of ε are close enough; on the other hand, a widely spread-out or permissive perturbation kernel enables exploring the parameter space more fully, but does so at the cost of achieving only low acceptance rates.
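The adaptive threshold rule discussed above amounts to one line of code; this is a sketch (the function name is ours) of the quantile selection, not of the full schemes of Del Moral et al. (2011) or Drovandi & Pettitt (2011):

```python
import numpy as np

def next_threshold(distances, alpha):
    """Adaptive epsilon: the alpha-quantile of the distances Delta(x, y^(i,t))
    attained by the current population."""
    return float(np.quantile(distances, alpha))
```

A small α shrinks ε aggressively (few of the current simulations would survive the new threshold), while α close to 1 barely tightens it; the caveat of Silk et al. (2012) on convergence for some values of α applies regardless of how the quantile is computed.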
3 Properties of optimal kernels

In sequential importance sampling, a perturbation kernel K_t should fulfil several requirements to be computationally efficient. In particular, the joint proposal distribution, corresponding to picking a particle at random and perturbing it to obtain a new particle, should "resemble" in some sense the target joint distribution, corresponding to picking two particles independently. More precisely, consider the joint proposal distribution which samples first a particle θ^(t−1) ∼ p_{ε_{t−1}}(·|x), then a perturbed particle θ^(t) ∼ K_t(·|θ^(t−1)), and accepts the couple if and only if Δ(y, x) ≤ ε_t where y ∼ f(·|θ^(t)); it admits the density

    q_{ε_{t−1},t}(θ^(t−1), θ^(t)|x) = p_{ε_{t−1}}(θ^(t−1)|x) K_t(θ^(t)|θ^(t−1)) ∫ f(y|θ^(t)) 1(Δ(x, y) ≤ ε_t) dy / α(K_t, ε_{t−1}, ε_t, x).   (1)

The normalization factor

    α(K_t, ε_{t−1}, ε_t, x) = ∫∫∫ p_{ε_{t−1}}(θ^(t−1)|x) K_t(θ^(t)|θ^(t−1)) f(y|θ^(t)) 1(Δ(x, y) ≤ ε_t) dθ^(t−1) dθ^(t) dy   (2)

is the average acceptance probability, that is, the proportion of proposed particles that are not rejected. This joint proposal distribution should "resemble" in some sense the target product distribution, that of sampling θ^(t−1) and θ^(t) independently from, respectively, p_{ε_{t−1}}(·|x) and p_{ε_t}(·|x), whose density is

    q*_{ε_{t−1},t}(θ^(t−1), θ^(t)|x) = p_{ε_{t−1}}(θ^(t−1)|x) p_{ε_t}(θ^(t)|x).   (3)

As argued by several authors, e.g. Douc et al. (2007); Cappé et al. (2008); Cornebise et al. (2008); Beaumont et al. (2009), a mathematically convenient formal definition of this "resemblance" is the Kullback-Leibler (KL) divergence between the proposal distribution q_{ε_{t−1},t}(θ^(t−1), θ^(t)) and the target distribution q*_{ε_{t−1},t}(θ^(t−1), θ^(t)), i.e.
    KL(q_{ε_{t−1},t}; q*_{ε_{t−1},t}) = ∫∫ q*_{ε_{t−1},t}(θ^(t−1), θ^(t)) log [ q*_{ε_{t−1},t}(θ^(t−1), θ^(t)) / q_{ε_{t−1},t}(θ^(t−1), θ^(t)) ] dθ^(t−1) dθ^(t).   (4)

In order to determine the variance of a component-wise Gaussian kernel, Beaumont et al. (2009) also minimized this quantity, albeit considering only the special case where ε_{t−1} = ε_t = 0, for which the solution has a closed form. In particular, in this special case, α(K_t, 0, 0, x) = 1 for any K_t. However, in addition to minimizing KL(q_{ε_{t−1},t}; q*_{ε_{t−1},t}), an efficient ABC SMC proposal kernel K_t should have a high average acceptance rate α(K_t, ε_{t−1}, ε_t, x). When refining Beaumont et al. (2009) with ε_t ≠ ε_{t−1}, we remark that KL(q_{ε_{t−1},t}; q*_{ε_{t−1},t}) can be separated into three terms:

    KL(q_{ε_{t−1},t}; q*_{ε_{t−1},t}) = −Q(K_t, ε_{t−1}, ε_t, x) + log α(K_t, ε_{t−1}, ε_t, x) + C(ε_{t−1}, ε_t, x),   (5)

where

    Q(K_t, ε_{t−1}, ε_t, x) = ∫∫ p_{ε_{t−1}}(θ^(t−1)|x) p_{ε_t}(θ^(t)|x) log K_t(θ^(t)|θ^(t−1)) dθ^(t−1) dθ^(t)   (6)

can be maximized easily in some convenient cases (see Section 4), α(K_t, ε_{t−1}, ε_t, x) is the average acceptance probability already defined in (2), which is much harder to maximize directly, and C(ε_{t−1}, ε_t, x) does not depend on the kernel K_t. Therefore the two following maximization problems are equivalent:

    argmax_{K_t} Q(K_t, ε_{t−1}, ε_t, x) = argmax_{K_t} [ −KL(q_{ε_{t−1},t}; q*_{ε_{t−1},t}) + log α(K_t, ε_{t−1}, ε_t, x) ].   (7)

As we will show in Section 4, this problem is easy to solve since the left-hand side often admits a closed-form solution.
The most important remark is that the right-hand side is the solution of a multi-objective optimization problem, solving a trade-off between jointly minimizing the Kullback-Leibler divergence and maximizing the logarithm of the average acceptance probability. Multi-objective optimization using an additive combination of two distinct objective functions is common practice, see e.g. Section 4.7.5 of Boyd & Vandenberghe (2004). Although the weights of such an additive combination are here forced upon us, we note that the use of the logarithm of the acceptance probability strongly penalizes very low probabilities, while making moderate to large acceptance probabilities equally desirable, a reasonable preference from the computational point of view. The practical consequence of these theoretical arguments is to choose the proposal kernel

    K_t = argmax_{K_t} Q(K_t, ε_{t−1}, ε_t, x).   (8)

We have shown that this choice corresponds to a trade-off between two desirable properties of the kernel, namely the resemblance of the proposal distribution to the target in the sense of the KL divergence, and a high acceptance rate. Our criterion not only sheds new light on the justification of some existing, proven criteria, but additionally refines them. In the next section we will describe how to carry out this maximisation in practice from a set of particles for random walk kernels.

Remark 1 (Resampling and finite population size). For the sake of clarity, we have slightly simplified the equations above by omitting the resampling step of ABC SMC. In reality it is not possible to sample exactly θ^(t−1) from p_{ε_{t−1}}(·|x); rather, as described in Algorithm 1, θ^(t−1) is sampled with replacement from the previous population of particles {θ^(i,t−1)}, eventually sampling from the weighted empirical distribution of the previous population, noted p^N_{ε_{t−1}}(·|x).
Systematically replacing p_{ε_{t−1}}(·|x) with p^N_{ε_{t−1}}(·|x) throughout equations (1) to (8) leads to the exact optimality criterion that we used, properly defined for a finite population size N.

Remark 2 (Theoretical requirements for convergence). Convergence of importance sampling-based algorithms such as SMC samplers and ABC SMC places some precise conditions on the kernel: i) having a larger support than the target distribution, to guarantee asymptotic unbiasedness of the empirical mean by a law of large numbers; and ii) vanishing slowly enough in the tails of the target, equivalent to ensuring finite variance of the importance weights, to guarantee asymptotic normality and finite variance of the estimator by a central limit theorem. Fully investigating whether the optimal kernels satisfy such conditions is outside the scope of this article. However, the Gaussian kernels studied in the following sections have unbounded support, therefore invoking a law of large numbers. The direction of the asymmetric KL divergence that we consider in equation (4) facilitates (but does not guarantee) finite variance. Indeed, this direction is opposite to that of the KL divergence used in Variational Bayes, but similar to that used in Expectation Propagation algorithms (Barthelmé & Chopin, 2011). As explained by e.g. (Bishop, 2006, p. 468), this direction favours proposals with a good coverage of the target, therefore heuristically tending to satisfy the requirements for asymptotic normality.

4 Optimal choice of random walk kernels

Perturbing a particle θ^(j,t−1) consists of generating a new particle according to a probability distribution parametrized by θ^(j,t−1) and often centered on θ^(j,t−1).
In the ABC SMC algorithm, in addition to sampling from the kernel, we must be able to compute the transition density K_t(θ^(i,t)|θ^(j,t−1)) for any particles θ^(i,t) and θ^(j,t−1). Then, instead of choosing the perturbation kernel K_t that maximizes equation (6) over all possible kernels, the space of possible kernels is often restricted to a parametric family from which it is easy to sample and perform the optimization. Different probability models may be used, with the most common being the uniform and the Gaussian distributions (Sisson et al., 2007; Toni et al., 2009; Liepe et al., 2010). In the following, we outline different classes of Gaussian perturbation kernels and compare their efficiency in terms of acceptance rates and computational cost.

4.1 Component-wise perturbation kernel

In most cases the particle is moved component-wise: for each component 1 ≤ j ≤ d of the parameter vector θ = (θ_1, ..., θ_d), θ_j is perturbed independently according to a Gaussian distribution with mean θ_j and variance σ²_j. The parameters {σ_j}_{1≤j≤d} may be fixed in advance, but more frequently (Beaumont et al., 2009; McKinley et al., 2009; Toni & Stumpf, 2009; Jasra et al., 2010; Barnes et al., 2011; Didelot et al., 2011) adaptively chosen kernel widths, {σ^(t)_j}_{1≤j≤d}, are used which depend on the previous population; the scale or variance is then indexed by the population index, t. Considering a kernel of the form

    K_t(θ^(t)|θ^(t−1)) = ∏_{j=1}^{d} [1 / (√(2π) σ^(t)_j)] exp{ −(θ^(t)_j − θ^(t−1)_j)² / (2 (σ^(t)_j)²) }   (9)

and maximizing Q(K_t, 0, 0, x), or more precisely Q(K_t, ε_{t−1}, ε_{t−1}, x), Beaumont et al. (2009) showed that the optimal value of (σ^(t)_j)² is twice the empirical variance of the j-th component of the parameter vector in the previous population.
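The special-case rule of Beaumont et al. (2009) is straightforward to compute from a weighted population; the following sketch (function name is ours) returns the kernel standard deviations:

```python
import numpy as np

def componentwise_kernel_sigma(particles, weights):
    """Beaumont et al. (2009) tuning of the component-wise Gaussian kernel
    (special case eps_t = eps_{t-1}): the kernel variance for component j is
    twice the weighted empirical variance of that component in the previous
    population.  `particles` is an (N, d) array; returns the d values sigma_j."""
    mean = np.average(particles, axis=0, weights=weights)
    var = np.average((particles - mean) ** 2, axis=0, weights=weights)
    return np.sqrt(2.0 * var)
```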
Maximizing Q(K_t, ε_{t−1}, ε_t, x), however, leads to a slightly different choice of σ^(t)_j, namely

    σ^(t)_j = [ ∫∫ p_{ε_{t−1}}(θ^(t−1)|x) p_{ε_t}(θ^(t)|x) (θ^(t)_j − θ^(t−1)_j)² dθ^(t)_j dθ^(t−1)_j ]^{1/2}.   (10)

This quantity can be approximated from the set {θ^(i,t−1), ω^(i,t−1), y^(i,t−1)}_{1≤i≤N} as follows:

    σ^(t)_j ≈ [ Σ_{i=1}^{N} Σ_{k=1}^{N₀} ω^(i,t−1) ω̃^(k,t−1) (θ̃^(k,t−1)_j − θ^(i,t−1)_j)² ]^{1/2},   (11)

where

    {θ̃^(k,t−1), ω̃^(k,t−1)}_{1≤k≤N₀} = { (θ^(i,t−1), ω^(i,t−1)/ω̄) s.t. Δ(x, y^(i,t−1)) ≤ ε_t, 1 ≤ i ≤ N }   (12)

and ω̄ is a normalizing constant such that Σ_{k=1}^{N₀} ω̃^(k,t−1) = 1.

Another commonly used component-wise kernel is the uniform kernel, which consists of perturbing the j-th component of particle θ to any value in the interval [θ_j − σ^(t)_j; θ_j + σ^(t)_j] with density 1/(2σ^(t)_j). A natural choice is to set the parameter σ^(t)_j to the scale of the previous population, that is

    σ^(t)_j ≈ (1/2) [ max_{1≤k≤N} {θ^(k,t−1)_j} − min_{1≤k≤N} {θ^(k,t−1)_j} ].

Note that the main difference between the uniform and the component-wise normal kernels concerns their support: bounded vs. unbounded.

Figure 1: A population of particles and iso-density curves for a uniform kernel (left), a component-wise normal kernel (centre) and a multivariate normal perturbation kernel (right) around one particle (red point).

4.2 Multivariate normal perturbation kernels

Consider a population of two-dimensional parameters that are highly correlated. The perturbation of a particle according to the uniform kernel consists in sampling a parameter uniformly in a rectangle whose sides are parallel to the axes (see Figure 1, left). Similarly, the density levels of the component-wise normal kernel are ellipsoids whose principal axes are parallel to the parameter axes (see Figure 1, centre).
For highly correlated parameters, the use of component-wise perturbation kernels in the ABC SMC framework can lead to a small acceptance rate, because they can inadequately reflect the structure of the true posterior. Instead of using a component-wise kernel, it may thus be more efficient to take into account the correlations between the different elements of the parameter vectors, in effect perturbing the particles according to a multivariate normal distribution with a covariance matrix Σ^(t) which depends on the covariance of the previous population. Figure 1 (right) represents a multivariate normal perturbation kernel for a matrix Σ^(t) proportional to the covariance of the previous population. We observe that fewer particle proposals are likely to be rejected with this perturbation kernel compared to the uniform or component-wise normal ones.

The multivariate normal perturbation kernel relies on the covariance matrix Σ^(t), which depends on the previous population. As before, it is possible to calculate the optimal covariance matrix Σ^(t) using the Kullback-Leibler divergence minimization approach (see also Cappé et al. (2008)). Maximizing equation (6) for

    K_t(θ^(t)|θ^(t−1)) = (2π)^{−d/2} (det Σ^(t))^{−1/2} exp{ −(1/2) (θ^(t) − θ^(t−1))^T (Σ^(t))^{−1} (θ^(t) − θ^(t−1)) }

with respect to Σ^(t) leads to maximizing the real-valued function

    g(M) = log det(M) − ∫∫ p_{ε_{t−1}}(θ^(t−1)|x) p_{ε_t}(θ^(t)|x) (θ^(t) − θ^(t−1))^T M (θ^(t) − θ^(t−1)) dθ^(t) dθ^(t−1)

with respect to the symmetric d × d matrix M, and defining Σ^(t) = M^{−1}. We denote by v^T the transpose of a vector v. We then obtain that the covariance matrix Σ^(t) of the optimal kernel in the multivariate Gaussian family is

    Σ^(t) = ∫∫ p_{ε_{t−1}}(θ^(t−1)|x) p_{ε_t}(θ^(t)|x) (θ^(t) − θ^(t−1)) (θ^(t) − θ^(t−1))^T dθ^(t) dθ^(t−1).
Proceeding in a similar fashion to the component-wise case, if we assume that ε_t = ε_{t−1}, then the optimal covariance matrix can be approximated by twice the empirical covariance matrix of population t − 1. In the general case, however, the optimal choice of the covariance matrix Σ^(t) for the multivariate normal perturbation kernel is approximated by

    Σ^(t) ≈ Σ_{i=1}^{N} Σ_{k=1}^{N₀} ω^(i,t−1) ω̃^(k) (θ̃^(k) − θ^(i,t−1)) (θ̃^(k) − θ^(i,t−1))^T,   (13)

where {θ̃^(k)}_{1≤k≤N₀} and {ω̃^(k)}_{1≤k≤N₀} are defined by equation (12). In the following, if nothing else is specified, the multivariate normal kernel refers to the kernel with this choice of covariance matrix. This result generalises the work of Beaumont et al. (2009).

4.3 Local perturbation kernels

In many applications the parameters of the system are highly correlated, but in a non-linear way. The multivariate normal and the component-wise normal kernels discussed above may then behave similarly (see for example the toy model described in Section 5.3). Indeed, the covariance matrix based on all the previous particles yields only limited information about the local correlation among the individual components of the parameter vectors. In such cases it is interesting to consider the use of a local covariance matrix, which now may differ between particles. In the following we discuss three local perturbation kernels, for each of which a particle θ is perturbed according to a multivariate normal kernel whose covariance matrix Σ^(t)_θ is a function of θ.
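Before turning to the local variants, the global covariance estimate of equation (13) can be sketched numerically; the array layout (particles as an (N, d) array, with the previous population's distances stored alongside) and all names are our assumptions:

```python
import numpy as np

def optimal_mvn_covariance(particles, weights, distances, eps_t):
    """Approximation (13) of the optimal multivariate normal kernel covariance:
    Sigma ~= sum_i sum_k w_i w~_k (theta~_k - theta_i)(theta~_k - theta_i)^T,
    where the tilde particles are the previous-population members whose
    simulated data already satisfy Delta(x, y) <= eps_t (equation (12))."""
    keep = distances <= eps_t
    w_tilde = weights[keep] / weights[keep].sum()            # renormalized, eq. (12)
    diff = particles[keep][:, None, :] - particles[None, :, :]   # shape (N0, N, d)
    return np.einsum('k,i,kia,kib->ab', w_tilde, weights, diff, diff)
```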
4.3.1 The multivariate normal kernel with M nearest neighbours

The multivariate normal kernel with M neighbours follows this principle: for each particle θ ∈ {θ^(l,t−1), 1 ≤ l ≤ N}, the M nearest neighbours of θ are selected, and the perturbed particle is sampled according to a multivariate normal distribution with mean θ and with covariance the empirical covariance Σ^(t)_{θ,M} based on the M nearest neighbours of θ. The main drawback of this perturbation kernel is that the parameter M typically needs to be fixed in advance, before any of the intricacies of the posterior are known. Using too small a value may lead to a lack of exploration of parameter space, while too large a value of M would offer little or no advantage compared to the standard multivariate normal kernel. Ideally, a mixture of multivariate normal kernels with different values of M could be used; in practice, however, this solution is computationally too expensive.

4.3.2 The multivariate normal kernel with optimal local covariance matrix

The theoretical calculation of the optimal covariance matrix above (see Section 4.2) may be adapted to identify an optimal local covariance matrix. Let us consider a particle θ^(t−1) which has been sampled from the previous population {θ^(k,t−1)}_{1≤k≤N}.
To determine the covariance matrix Σ^(t)_{θ^(t−1)} of a multivariate normal perturbation kernel centred on θ^(t−1), we derive a criterion Q(K_t, ε_t, x) similar to the criterion defined in equation (6), except that the particle θ^(t−1) is fixed:

    Q(K_t, ε_t, x) = ∫ p_{ε_t}(θ^(t)|x) log K_t(θ^(t)|θ^(t−1)) dθ^(t),   (14)

which is equal to

    ∫ p_{ε_t}(θ^(t)|x) [ (1/2) log det((Σ^(t)_{θ^(t−1)})^{−1}) − (d/2) log(2π) − (1/2) (θ^(t) − θ^(t−1))^T (Σ^(t)_{θ^(t−1)})^{−1} (θ^(t) − θ^(t−1)) ] dθ^(t)   (15)

when K_t(θ^(t)|θ^(t−1)) is a multivariate normal kernel centred on θ^(t−1) with covariance matrix Σ^(t)_{θ^(t−1)}. Maximizing the equation above with respect to Σ^(t)_{θ^(t−1)}, the optimal local covariance matrix is

    Σ^(t)_{θ^(t−1)} = ∫ p_{ε_t}(θ^(t)|x) (θ^(t) − θ^(t−1)) (θ^(t) − θ^(t−1))^T dθ^(t),

which can be approximated from the set {θ^(i,t−1), ω^(i,t−1), y^(i,t−1)}_{1≤i≤N} as follows:

    Σ^(t)_{θ^(t−1)} ≈ Σ_{k=1}^{N₀} ω̃^(k) (θ̃^(k) − θ^(t−1)) (θ̃^(k) − θ^(t−1))^T,

where {θ̃^(k)}_{1≤k≤N₀} and {ω̃^(k)}_{1≤k≤N₀} are defined by equation (12). To better understand the covariance matrix Σ^(t)_{θ^(t−1)}, we remark, by a classical bias-variance decomposition, that it is equal to the covariance of the particles from the previous population corresponding to distances smaller than ε_t (i.e. the weighted particles denoted {θ̃^(k), ω̃^(k)}_{1≤k≤N₀} throughout this article) plus a bias term related to the discrepancy between the mean of this population and the particle of interest θ^(t−1):

    Σ^(t)_{θ^(t−1)} ≈ Σ_{k=1}^{N₀} ω̃^(k) (θ̃^(k) − m)(θ̃^(k) − m)^T + (m − θ^(t−1))(m − θ^(t−1))^T,

where m = Σ_{k=1}^{N₀} ω̃^(k) θ̃^(k).
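The per-particle covariance above reduces to a weighted scatter computation; a minimal sketch (names and array layout are ours):

```python
import numpy as np

def olcm_covariance(theta, particles, weights, distances, eps_t):
    """Optimal local covariance matrix for a multivariate normal kernel
    centred on `theta`: the weighted scatter of the tilde particles
    (previous-population members with Delta(x, y) <= eps_t) around theta
    itself, which equals their covariance about their own mean plus the
    squared bias between that mean and theta."""
    keep = distances <= eps_t
    w_tilde = weights[keep] / weights[keep].sum()
    diff = particles[keep] - theta          # shape (N0, d)
    return np.einsum('k,ka,kb->ab', w_tilde, diff, diff)
```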
The multivariate normal perturbation kernel with a covariance matrix as defined above will be referred to as the multivariate normal kernel with OLCM (where OLCM stands for optimal local covariance matrix). We now have a different covariance matrix Σ^(t)_{θ^(j,t−1)} for each particle θ^(j,t−1) of the previous population.

4.3.3 Perturbation kernels based on the Fisher information for models defined by ordinary or stochastic differential equations

Often in molecular biology the time evolution of species abundances is modelled by a system of ordinary differential equations (ODEs), stochastic differential equations (SDEs) or a chemical master equation (CME). For these types of models it is possible to use information from the generative model, via the Fisher Information Matrix (Rao, 1945; MacKay, 2003; Cox, 2006), to compute a local covariance matrix for the perturbation kernel. The Fisher Information Matrix (FIM), defined as

$$ I(\theta) = - \mathbb{E}_X \left[ \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right], $$

measures the amount of information that the observable random variable X carries about the parameter θ. As previously mentioned, the ABC algorithm is mainly used when the likelihood function f(·|θ) is not known, and so the Fisher Information Matrix cannot be computed exactly. Nevertheless, Komorowski et al. (2011) have developed a method that approximates the FIM for deterministic and stochastic dynamical systems represented by ODEs and by SDEs (using the linear noise approximation). This evaluation of the FIM can be applied as the ABC SMC procedure progresses, and so the perturbation kernels can adapt appropriately. Although the following kernels cannot be computed in the general case, we consider them here because of their potential use for inference in dynamical systems.
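To make the definition concrete, the expectation in I(θ) can be approximated by Monte Carlo whenever the likelihood can be evaluated. The toy model below, X ~ N(θ, σ²), for which the exact Fisher information is 1/σ², is purely illustrative and not part of the paper's setup.

```python
import numpy as np

def fim_mc_estimate(theta, sigma, n_samples=200000, rng=None):
    """Monte Carlo estimate of the (scalar) Fisher information for the toy
    model X ~ N(theta, sigma^2), using the score-based identity
    I(theta) = E[(d/dtheta log f(X|theta))^2]. Illustrative only."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.normal(theta, sigma, size=n_samples)
    score = (x - theta) / sigma**2       # d/dtheta log f(x|theta)
    return np.mean(score**2)
```

For θ = 1 and σ = 2 the estimate should be close to the exact value 1/σ² = 0.25.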
In the Laplace expansion, the eigenvectors and eigenvalues of the inverse of the FIM I(θ) map out ellipsoidal levels of equal density around the parameter θ. This immediately suggests the use of the matrix I^{−1}(θ) as the covariance matrix for a multivariate normal perturbation kernel. The directions of the eigenvectors of I^{−1}(θ) and the relative sizes of their eigenvalues are indeed both relevant for perturbing the parameter θ efficiently (see Figure 2). However, the determinant of I(θ) is a measure of the amount of information available around θ and may vary exponentially with θ (see Figure 2, right); its value is very small for some parameters θ, and this leads to a perturbation kernel with too large a covariance. On the other hand, if the determinant of I(θ) is large, additional information may be gained by moving only in the direct vicinity of θ, and a perturbation kernel based on the inverse of the FIM explores only the immediate neighbourhood of the parameter. We therefore propose scaling the matrix I^{−1}(θ) so that its determinant varies in a more controlled manner.

Figure 2: Local kernel based on the Fisher Information Matrix I(θ). Left and centre: the eigenvectors of I^{−1}(θ) (red arrows), with sizes proportional to the eigenvalues, for two different particles θ. Right: logarithm of the determinant of I^{−1}(θ) for each particle θ. The maximum value of det(I^{−1}(θ)) over the population is equal to 85745 (yellow points) and the minimum is equal to 0.14 (black points).

We consider two versions of the multivariate normal perturbation kernel based on the FIM. The first consists of normalizing the matrix so that its determinant equals the determinant of the empirical covariance matrix of the previous population:

$$ \Sigma^{(t)}_{\theta} = \left( \frac{\det \Sigma^{(t)}}{\det I^{-1}(\theta)} \right)^{1/d} I^{-1}(\theta), $$

where Σ^(t) is the matrix defined in equation (13). The second approach consists of normalizing the matrix so that its determinant equals the determinant of the empirical covariance matrix Σ^(t)_{θ,M} based on the M nearest neighbours of the particle:

$$ \Sigma^{(t)}_{\theta, M} = \left( \frac{\det \Sigma^{(t)}_{\theta, M}}{\det I^{-1}(\theta)} \right)^{1/d} I^{-1}(\theta). $$

The parameter M may, for instance, be set to 20% of the size of the previous population.

5 Numerical results

We first apply the ABC SMC algorithm with different kernels to three illustrative examples, which exhibit certain pathological features that highlight the differences between the perturbation kernels considered here. In order to analyze the impact of the kernel choice rather than any extraneous factors, we fix the threshold schedule {ε_t}_t to (160, 120, 80, 60, 40, 30, 20, 15, 10, 8, 6, 4, 3, 2, 1) and use the same schedule for every simulation. In practice, however, we strongly advise using an adaptive choice of the threshold. All simulations have been carried out using the software ABC-SysBio (Liepe et al., 2010), version 1.03 (which also allows for adaptive kernels), using a Euclidean distance to compare simulated and observed data.

Figure 3: A. Posterior distribution for an ellipsoid posterior. B. Average acceptance rate over 10 independent runs for 6 different kernels: the uniform kernel (green), the component-wise normal kernel (blue), the component-wise normal kernel proposed by Beaumont et al. (2009) (dashed blue), the multivariate normal kernel (black), the multivariate normal kernel with 50 neighbours (red), and the multivariate normal kernel with OLCM (cyan).
5.1 Ellipsoid shape

We begin with a toy example where the prior distribution of the two-dimensional parameter is a uniform distribution on the square [−50, 50] × [−50, 50] and the likelihood function is given by

$$ x \sim \mathcal{N}\left( (\theta_1 - 2\theta_2)^2 + (\theta_2 - 4)^2, \; 1 \right). $$

It is assumed that x = 0 is observed. The posterior density is then

$$ p(\theta \mid x) \propto \phi\left(0;\; (\theta_1 - 2\theta_2)^2 + (\theta_2 - 4)^2, \; 1\right) \, \mathbf{1}_{[-50,50] \times [-50,50]}(\theta), $$

where φ(x; μ, σ²) is the one-dimensional normal density with mean μ and variance σ²; the posterior is represented in Figure 3 A. The ABC SMC algorithm is used to estimate p_ε(θ|x) with N = 800 particles. We compare six different perturbation kernels: 1) the uniform kernel (Section 4.1); 2) the component-wise normal kernel (Section 4.1); 3) the component-wise normal kernel proposed by Beaumont et al. (2009); 4) the multivariate normal kernel with the covariance matrix computed from the whole previous population (Section 4.2); 5) the multivariate normal kernel whose covariance matrix is computed from the M nearest neighbours of each particle (Section 4.3.1), with M = 50; and 6) the multivariate normal kernel with OLCM (Section 4.3.2). We reiterate that the variance of the component-wise normal kernel proposed by Beaumont et al. (2009) is slightly different from the one we propose here: they suggest using twice the empirical variance of the previous population, whereas we also take into account the new threshold ε_t, as described in equation (11). Figure 3 B shows that the acceptance rate differs significantly between kernels. The uniform kernel has an acceptance rate roughly equal to that of the component-wise normal kernel. Moreover, the two versions of the component-wise normal kernel have similar acceptance rates, with a slightly better
performance for the one taking into account the difference between the successive threshold values. Given the shape of the posterior distribution, it is easy to understand that a multivariate normal kernel results in a larger acceptance rate than the other kernels. Since the two components of the parameter are strongly correlated, using an estimate of the covariance from the previous population instead of estimates of only the component-wise variances makes a marked difference to the acceptance rates. Both the multivariate normal kernel based on the 50 nearest neighbours and the one based on the OLCM result in acceptance rates over two times higher than the component-wise kernels.

Figure 4: A. Posterior distribution for the ring-shaped posterior. B. Average acceptance rate over 10 independent runs for 6 different kernels: the uniform kernel (green), the component-wise normal kernel (blue), the component-wise normal kernel proposed by Beaumont et al. (2009) (dashed blue), the multivariate normal kernel (black), the multivariate normal kernel with 50 neighbours (red), and the multivariate normal kernel with OLCM (cyan).

5.2 Ring shape

In the second toy example, the prior distribution of the two-dimensional parameter is again a uniform distribution on the square [−50, 50] × [−50, 50], but the likelihood function is now given by

$$ x \sim \mathcal{N}\left( \theta_1^2 + \theta_2^2, \; 0.5 \right). $$

Again we assume that x = 0 is observed; the posterior density is then

$$ p(\theta \mid x) \propto \phi\left(0;\; \theta_1^2 + \theta_2^2, \; 0.5\right) \, \mathbf{1}_{[-50,50] \times [-50,50]}(\theta). $$

As in the previous example, we use the ABC SMC algorithm with N = 800 particles and compare the same six perturbation kernels.
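For intuition, a single ABC rejection round for this ring model can be sketched as follows. This is a minimal illustration only, not the ABC SMC scheme used in the paper; the threshold and the number of prior draws are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
eps = 5.0                                          # illustrative threshold
theta = rng.uniform(-50, 50, size=(200000, 2))     # draws from the uniform prior
# Simulate x ~ N(theta_1^2 + theta_2^2, 0.5) for each prior draw
x = rng.normal(theta[:, 0]**2 + theta[:, 1]**2, np.sqrt(0.5))
# Keep the particles whose simulated data match the observation x = 0 within eps
accepted = theta[np.abs(x - 0.0) <= eps]
# The accepted particles concentrate where theta_1^2 + theta_2^2 is small
```

Only a tiny fraction of prior draws survives, which is precisely why the sequential scheme with well-chosen perturbation kernels is needed in practice.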
The posterior distribution, represented in Figure 4 A, has a ring shape centred around 0. In this case, in contrast to the previous example, the multivariate normal perturbation kernel using an estimate of the covariance based on the previous population, as well as the OLCM version of it, have an acceptance rate similar to that of the component-wise normal perturbation kernel. In this example the correlation between the two parameters, θ_1 and θ_2, at the whole-population level is weak. This kind of shape requires a more local perturbation kernel in order to obtain higher acceptance rates. This is the case for the perturbation kernel based on the covariance matrix computed from the 50 nearest neighbours.

Figure 5: A. Posterior distribution for the banana-shaped posterior. B. Average acceptance rate over 10 independent runs for 8 different kernels. C. Acceptance rate for multivariate normal kernels based on the M nearest neighbours, for M ∈ {50, 100, 200, 300, 400, 500, 600, 700, 800} (from red to yellow), and for the multivariate kernel with an estimated covariance based on the whole population (black). D. Legend for Figure B.
5.3 Banana shape

The third example we consider is one of the canonical examples of a posterior distribution which poses a challenge to simple kernels: the so-called "banana-shaped" distribution in two dimensions (Haario et al., 1999). The likelihood function is given by

$$ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim \mathcal{N}_2\left( \begin{pmatrix} \theta_1 \\ \theta_1 + \theta_2^2 \end{pmatrix}, \; \begin{pmatrix} 1 & 0 \\ 0 & 0.5 \end{pmatrix} \right), $$

and we use a uniform prior distribution on the square [−50, 50] × [−50, 50]. It is assumed that x = (0, 0) is observed. The posterior density is then

$$ p(\theta \mid x) \propto \Phi\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix};\; \begin{pmatrix} \theta_1 \\ \theta_1 + \theta_2^2 \end{pmatrix}, \; \begin{pmatrix} 1 & 0 \\ 0 & 0.5 \end{pmatrix} \right) \, \mathbf{1}_{[-50,50] \times [-50,50]}(\theta), $$

where Φ(x; v, Σ) is the multi-dimensional normal density with mean v and covariance Σ. We use the same ABC SMC settings and again compare the six previous perturbation kernels, as well as two versions of the multivariate normal perturbation kernel where the covariance matrix is proportional to the inverse of the FIM. Here the FIM is exactly computable:

$$ I(\theta) = \begin{pmatrix} 1.5 & \theta_2 \\ \theta_2 & 2\theta_2^2 \end{pmatrix}. $$

When θ_2 = 0 we replace it by a very small value, 10^{−4}, so that I(θ) is no longer singular; safeguarding against singular FIMs is straightforward and unproblematic, and a sensible precaution when running such algorithms without manual intervention. The posterior distribution is represented in Figure 5 A. As in the ring example, the multivariate normal perturbation kernel using the full estimated covariance of the whole previous population has an acceptance rate similar to that of the component-wise normal perturbation kernel with adaptive estimation of the variances (see Figure 5 B). The multivariate normal kernel with OLCM obtains slightly better results. The two versions of the perturbation kernel based on the FIM have significantly different acceptance rates. The most efficient version in terms of acceptance rate is the one using the M nearest neighbours, as might be expected.
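A sketch of this FIM-based proposal covariance for the banana example, including the θ_2 = 0 safeguard and the determinant rescaling of Section 4.3.3; the function names are ours, and the FIM entries are taken from the expression printed above.

```python
import numpy as np

def banana_fim(theta, floor=1e-4):
    """Exact FIM for the banana example as printed above, with theta_2
    replaced by a small value when it vanishes so that I(theta) stays
    non-singular (the safeguard described in the text)."""
    t2 = theta[1] if abs(theta[1]) > floor else floor
    return np.array([[1.5, t2], [t2, 2.0 * t2**2]])

def fim_proposal_cov(theta, reference_cov):
    """Inverse FIM rescaled so that its determinant matches a reference
    covariance (e.g. the whole-population covariance), as in the first
    FIM-based kernel of Section 4.3.3; names are illustrative."""
    inv_fim = np.linalg.inv(banana_fim(theta))
    d = inv_fim.shape[0]
    factor = (np.linalg.det(reference_cov) / np.linalg.det(inv_fim)) ** (1.0 / d)
    return factor * inv_fim
```

By construction the returned covariance has the same determinant as the reference matrix, so the kernel's overall scale is controlled while the FIM's directional information is retained.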
However, this kernel, like the multivariate normal kernel based on the M nearest neighbours, can show undesirable dependence on the chosen value of M. Figure 5 C represents the evolution of the acceptance rate for this perturbation kernel with different values of M. The acceptance rate diminishes considerably as M increases and the kernel becomes less "locally aware".

6 Applications in Molecular Biology

6.1 The Repressilator Model

To analyse the differences between the efficiencies of the perturbation kernels in biological applications, we first focus on simulated datasets for the repressilator model, a popular model for gene regulatory systems (Elowitz & Leibler, 2000). It consists of three genes connected in a feedback loop, where each gene transcribes the repressor protein for the next gene in the loop (see Figure 6 A). This model also exemplifies the challenges that are frequently encountered in attempts to reverse-engineer the structure and parameters of dynamical systems from data (Toni et al., 2009; Girolami & Calderhead, 2011). The evolution of the concentrations of the three proteins and mRNAs over time is described by a system of six ordinary differential equations parametrized by a four-dimensional parameter vector,
θ = (α_0, n, β, α), as follows:

$$ \frac{dm_1}{dt} = -m_1 + \frac{\alpha}{1 + p_3^n} + \alpha_0, \qquad \frac{dp_1}{dt} = -\beta (p_1 - m_1), $$

$$ \frac{dm_2}{dt} = -m_2 + \frac{\alpha}{1 + p_1^n} + \alpha_0, \qquad \frac{dp_2}{dt} = -\beta (p_2 - m_2), $$

$$ \frac{dm_3}{dt} = -m_3 + \frac{\alpha}{1 + p_2^n} + \alpha_0, \qquad \frac{dp_3}{dt} = -\beta (p_3 - m_3). $$

We denote by m_i and p_i the concentrations of the mRNA and protein products of gene i, respectively. The parameters of the model are the Hill coefficient n, the repression strength α, the basal expression rate α_0, and the ratio β of the protein decay rate to the mRNA decay rate; these are assumed to be the same for all three genes. We assume that only the mRNA (m_1, m_2, m_3) measurements are available, and set the initial species concentrations (m_1, p_1, m_2, p_2, m_3, p_3) to (0, 2, 0, 1, 0, 3); data are generated by simulating the model with (α_0, n, β, α) = (1, 2, 5, 1000) and measuring the concentrations of mRNA at the time points (0.0, 0.6, 4.2, 6.2, 8.6, 13.4, 16, 21.4, 27.6, 34.4, 39.8, 40.6, 45.2), subject to added zero-mean Gaussian noise with variance 5. The same problems obviously prevail for different datasets.

Figure 6: A. The repressilator model: three genes are connected in a feedback loop; each gene i transcribes the repressor protein p_i for the next gene in the loop. B. Posterior distribution for ε = 50. C. Marginal posterior distributions for ε = 35. D. Number of simulations required to obtain the posterior distribution, and running time of the algorithm (in minutes), for each kernel. The error bars on the bar plots show the variance of the number of simulations and of the running time over 10 (resp. 3) independent simulations for the multivariate normal kernels (resp. for the normal component-wise kernel). The mRNA measurements are given in supplementary table 1.
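The generation of this synthetic dataset can be sketched with SciPy; this is an illustrative reimplementation under our own variable names, not the ABC-SysBio code used in the paper.

```python
import numpy as np
from scipy.integrate import odeint

def repressilator(y, t, alpha0, n, beta, alpha):
    """Right-hand side of the six repressilator ODEs for (m1, p1, m2, p2, m3, p3)."""
    m1, p1, m2, p2, m3, p3 = y
    return [-m1 + alpha / (1.0 + p3**n) + alpha0, -beta * (p1 - m1),
            -m2 + alpha / (1.0 + p1**n) + alpha0, -beta * (p2 - m2),
            -m3 + alpha / (1.0 + p2**n) + alpha0, -beta * (p3 - m3)]

# Solve the model with the true parameters at the measurement time points,
# then add zero-mean Gaussian noise of variance 5 to the mRNA concentrations.
t_obs = [0.0, 0.6, 4.2, 6.2, 8.6, 13.4, 16, 21.4, 27.6, 34.4, 39.8, 40.6, 45.2]
y0 = [0.0, 2.0, 0.0, 1.0, 0.0, 3.0]
traj = odeint(repressilator, y0, t_obs, args=(1.0, 2.0, 5.0, 1000.0))
mrna_data = traj[:, [0, 2, 4]] + np.random.default_rng(0).normal(0.0, np.sqrt(5.0), (13, 3))
```

Each candidate parameter vector proposed by the ABC SMC algorithm triggers one such ODE solve, which is why the number of simulations dominates the running time reported in Figure 6 D.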
The ABC SMC algorithm is used to estimate p(θ | {m_1(k), m_2(k), m_3(k)}_k) with N = 1000 particles and a decreasing sequence of thresholds equal to (160, 150, 140, 130, 120, 100, 80, 50, 40, 37, 35). The marginal posterior distributions are represented in Figure 6 C and agree very well with what is known from previous studies (Toni et al., 2009). An intermediate posterior distribution is represented in Figure 6 B and shows a highly non-linear correlation between some parameters, in particular between n and β. In Figure 6 D, we compare the cumulative number of simulations over the course of the algorithm, as well as its running time, for the different perturbation kernels. Using the uniform kernel, up to 6 × 10^6 simulations are required to obtain an approximation of the posterior distribution, whereas fewer than 1 × 10^6 simulations are required if a multivariate normal kernel is used, and in particular only 1.6 × 10^5 simulations if the second version of the multivariate normal kernel based on the FIM is used. Moreover, we observe that the running time of the algorithm for each kernel is proportional to the number of simulations, so the main computational cost of the algorithm is due to the simulation of the data. Therefore, the time spent on defining the kernels, including determining the nearest neighbours of each particle or evaluating the FIM for each parameter, is small compared to the time saved by proposing new particles more efficiently. So even in this simple model a significant improvement is possible through the appropriate choice of perturbation kernel.

6.2 The Hes1 model

We now compare the efficiency of the perturbation kernels on an experimental dataset. We consider a dynamical system describing the expression level of the transcription factor Hes1, which plays an important role in cell differentiation and in the segmentation of vertebrate embryos.
Oscillations of the Hes1 expression level were observed in 2002 by Hirata et al. (2002). The Hes1 oscillator can be modelled by a system of three ordinary differential equations describing the evolution of the Hes1 mRNA, m, the Hes1 cytosolic protein, p_1, and the Hes1 nuclear protein, p_2, as follows (see Figure 7 A and Silk et al. (2011)):

$$ \frac{dm}{dt} = -k_{\mathrm{deg}} m + \frac{1}{1 + (p_2/P_0)^h}, \qquad \frac{dp_1}{dt} = -k_{\mathrm{deg}} p_1 + \nu m - k_1 p_1, \qquad \frac{dp_2}{dt} = -k_{\mathrm{deg}} p_2 + k_1 p_1. $$

The model is parametrized by the protein degradation rate k_deg, which is assumed to be the same for both cytoplasmic and nuclear protein; the rate k_1 of transport of Hes1 protein into the nucleus; the amount P_0 of Hes1 protein in the nucleus when the rate of transcription of Hes1 mRNA is at half its maximal value; the rate ν of translation of Hes1 mRNA; and the Hill coefficient h. To infer the parameters of this model we use the quantitative real-time PCR measurements published by Silk et al. (2011): the concentration of mRNA is measured every 30 minutes over 3 hours and is equal to {m}_k = (2, 1.20, 5.90, 4.58, 2.64, 5.38, 6.42, 5.60, 4.48) (see Figure 7). We aim at inferring the four parameters {P_0, ν, k_1, h}; we take the degradation rate k_deg to be equal to the experimentally determined value of 0.03. Based on the data and some preliminary results, we fix the initial concentration of mRNA to 2, the initial concentration of p_1 to 5, and that of p_2 to 3. The ABC SMC algorithm is used to estimate the posterior probability distribution with N = 1000 particles and a decreasing sequence of thresholds equal to (20, 13, 10, 6, 5, 4, 3, 2.8, 2.7, 2.6, 2.5). The posterior distribution is represented in Figure 7 and is identical for every perturbation kernel.
To compare the performance of the perturbation kernels, we show in Figure 7 D the number of simulations during the algorithm, as well as its running time, for each kernel independently. We observe that the multivariate normal kernels outperform the uniform and the normal component-wise kernels, with a speed-up of up to four-fold for the kernel based on the M nearest neighbours with M = 50. Contrary to the repressilator model studied in the previous section, in the Hes1 dynamical system the kernels based on the FIM do not perform well, with a running time comparable to that of the normal component-wise kernel. This may be explained by the fact that we use real data, which may not be entirely explained by the model. In such a situation the likelihood surface may not be smooth enough for the FIM to offer much help. By contrast, the performance of the OLCM perturbation kernel is similar to that of the 50-nearest-neighbours kernel, with the advantage of being free of any parameter choice. For such a model we would clearly recommend the use of the OLCM perturbation kernel.

7 Conclusion

In contrast to MCMC, where the pivotal role of perturbation kernels for convergence and mixing has been well documented (Gilks et al., 1996; Robert & Casella, 2004), for ABC SMC approaches there has been comparatively little work. In particular in the ABC context, which often relies on computationally costly simulation routines, a poor choice of the perturbation kernel will result in potentially prohibitive computational overheads. We have addressed this lack of suitable kernels here in a rigorous but non-exhaustive fashion by focusing on kernels that are based on uniform or normal/multivariate normal parametric families.
Importantly, in all the examples we were able to ensure that the different kernels had arrived at essentially identical posterior distributions, and for a fixed ε_t schedule we can use the acceptance rate as an objective criterion for the numerical efficiency of the different kernels. For all these models it is relatively straightforward to construct optimality criteria by reference to the KL divergence, following Beaumont et al. (2009).

Figure 7: A. The Hes1 model: the Hes1 protein migrates from the cytoplasm to the nucleus, where it regulates the transcription of the mRNA, which is then translated into the cytoplasmic protein. B. Posterior distribution. C. The time course of the mRNA concentration: the stars represent the data published in Silk et al. (2011); the lines represent the minimum and maximum values of the species trajectories for 1000 particles sampled from the posterior distribution. D. Number of simulations required to obtain the posterior distribution, and running time of the algorithm (in minutes), for each kernel. The error bars on the bar plots show the variance of the number of simulations and of the running time over 10 independent simulations for each perturbation kernel.

In higher-dimensional parameter spaces it is
important to take into account the potentially correlated nature of the parameters, and, not surprisingly, we find that component-wise perturbation of particles tends to perform poorly compared to the other approaches considered here. In more complicated cases, e.g. decidedly non-Gaussian posteriors, multimodal posteriors, or posteriors with ridges, we find that a straightforward multivariate normal kernel is in turn inferior to kernels that are conditioned on the local environment of a particle.

In most applications of interest, the computational cost of simulating the data exceeds the algorithmic complexity O(N²) of the ABC SMC scheme. We therefore argue that choosing a kernel with a high acceptance rate enables users to optimize the computational cost. However, when two kernels have the same acceptance rate (which may happen for some shapes of the posterior) it is more appropriate to select the one which is cheaper in terms of algorithmic complexity. The following table summarizes the computational cost of implementing the proposed perturbation kernels from a previous population of N particles of dimension d (the number of individual parameters). In the case of the multivariate normal kernel based on the FIM, we denote by C the computational cost of simulating an observation, e.g. by solving the set of ODEs or SDEs which defines the generative model.
Component-wise normal: O(dN²)
Multivariate normal based on the whole previous population: O(d²N²)
Multivariate normal based on the M nearest neighbours: O((d + M)N² + d²M²N)
Multivariate normal with OLCM: O(d²N²)
Multivariate normal based on the FIM (normalized with the entire population): O(dCN + d²N²)

As a general rule of thumb we would recommend the use of multivariate kernels with OLCM, which tends to have the highest acceptance rate in our examples and is relatively easy to implement at acceptable computational cost. For some probability models, in particular those describing dynamical systems, the FIM has attracted a lot of attention recently (Amari & Nagaoka, 2007; Arwini & Dodson, 2008; Secrier et al., 2009; Girolami & Calderhead, 2011; Erguler & Stumpf, 2011; Komorowski et al., 2011), and it appears likely that we will be able to exploit these notions, and those of information geometry more generally, fruitfully in ABC SMC. Therefore, where applicable, we tested the use of the FIM to drive the perturbation kernel. Theoretically, and as can be seen in the repressilator model, this has the advantage of exploring so-called neutral spaces more efficiently while maintaining a high acceptance rate. However, the results on the Hes1 model suggest that the perturbation kernels based on the FIM are not robust enough and can lead to high computational cost, in particular for relatively complex biological models with real data, which induce a non-smooth likelihood surface. The cost of local measures based on the M nearest neighbours may seem too high to contemplate their use. However, given the increase in acceptance rate that we have observed, and the generally high computational cost of simulating complex data, they can prove to be fruitful.
The user should be aware that the efficiency of such a measure strongly depends on the chosen value of M (the smaller the value of M, the higher the acceptance rate) and that too small a value of M can lead to too local a perturbation kernel, with a risk of not converging to the posterior distribution as ε goes to 0. This sensitivity to a user-defined parameter makes the M nearest neighbours kernel less user-friendly than the OLCM. The kernels discussed here may seem restrictive, especially to those from a background in evolutionary computation. We may, for example, wish to consider other perturbations to generate new candidate particles, such as recombination (Baragona et al., 2010), as is frequently done in global optimization. In principle it is possible to include this in ABC SMC approaches, as long as the weights for new particles can be calculated (which turns out to be relatively straightforward for recombination and different cross-over schemes). It has to be kept in mind, however, that these perturbations work best in cases where the parameter space is so under-sampled that random combinations of individual parameters are sufficiently likely to end up in a region with a more favourable cost function than a local, e.g. gradient-based, proposal would. While such strategies have been applied in many optimization settings, their use in Bayesian inference is rare, since here the optimum (by whichever criterion) parameter value is generally of less interest than the distribution as a whole. For maximum a posteriori inference such methods may be fruitfully applied, but here we do not see an obvious advantage (as is also borne out by simulation studies, data not shown). Kernel choice is one of the obvious means of speeding up ABC SMC inferences. Setting the ε schedule optimally is another.
The latter is straightforwardly automated by basing the next threshold ε_{t+1} on the acceptance rate obtained during the generation of the intermediate distribution p_{ε_t}(θ|x). But again there is a trade-off to be made between convergence and exhaustive exploration of the parameter space; in particular, too gentle a decrease in ε_t may result in loss of particle diversity (Silk et al., 2012). Here we believe that further investigation of FIMs may hold important clues as to how the ε_t are best chosen. This would, for example, resonate with the perspective on ε proposed by Ratmann et al. (2009).

Acknowledgements

SF is funded through an MRC Computational Biology Research Fellowship; CB and MPHS gratefully acknowledge financial support from the BBSRC (BB/G007934/1); MPHS is a Royal Society Wolfson Research Merit Award holder.

References

Amari, S. & Nagaoka, H. 2007 Methods of Information Geometry, volume 191. American Mathematical Society.

Arwini, K. & Dodson, C. 2008 Information Geometry. Springer.

Baragona, R., Battaglia, F. & Poli, I. 2010 Evolutionary Statistical Procedures. Springer Verlag.

Barnes, C. P., Silk, D., Sheng, X. & Stumpf, M. P. H. 2011 Bayesian design of synthetic biological systems. Proceedings of the National Academy of Sciences of the United States of America 108, 15190–15195.

Barthelmé, S. & Chopin, N. 2011 Expectation-propagation for summary-less, likelihood-free inference. arXiv preprint arXiv:1107.5959.

Beaumont, M., Zhang, W. & Balding, D. 2002 Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035.

Beaumont, M. A., Cornuet, J. M., Marin, J. M. & Robert, C. P. 2009 Adaptive approximate Bayesian computation. Biometrika 96, 983–990.

Bishop, C. 2006 Pattern Recognition and Machine Learning. Springer, New York.

Blum, M. G. B. & François, O.
2010 Non-linear regression models for Approximate Bayesian Computation. Statistics and Computing 20, 63–73.

Boyd, S. P. & Vandenberghe, L. 2004 Convex Optimization. Cambridge University Press.

Cappé, O., Douc, R., Guillin, A., Marin, J.-M. & Robert, C. P. 2008 Adaptive importance sampling in general mixture classes. Statistics and Computing 18, 447–459.

Cornebise, J., Moulines, É. & Olsson, J. 2008 Adaptive methods for sequential importance sampling with application to state space models. Statistics and Computing 18, 461–480.

Cornuet, J. M., Marin, J. M., Mira, A. & Robert, C. 2009 Adaptive multiple importance sampling. Scandinavian Journal of Statistics.

Cox, D. 2006 Principles of Statistical Inference. Cambridge University Press.

Del Moral, P., Doucet, A. & Jasra, A. 2006 Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 411–436.

Del Moral, P., Doucet, A. & Jasra, A. 2011 An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 1–12.

Didelot, X., Everitt, R. G., Johansen, A. M. & Lawson, D. J. 2011 Likelihood-free estimation of model evidence. Bayesian Analysis 6, 49–76.

Douc, R., Guillin, A., Marin, J.-M. & Robert, C. P. 2007 Convergence of adaptive mixtures of importance sampling schemes. The Annals of Statistics 35, 420–448.

Drovandi, C. C. & Pettitt, A. N. 2011 Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics 67, 225–233.

Efron, B. 2010 Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press.

Elowitz, M. B. & Leibler, S. 2000 A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338.

Erguler, K. & Stumpf, M. P. H.
2011 Practical limits for reverse engineering of dynamical systems: a statistical analysis of sensitivity and parameter inferabilit y in systems biology mo dels Mole cular BioSystems 7 , 1593–1602. F agundes, N., Ray , N., Beaumont, M., Neuensc hw ander, S. et al. 2007 Statistical ev aluation of alter- nativ e models of human ev olution Pr o c e e dings of the National A c ademy of Scienc es 104 , 17614. F earnhead, P . & Prangle, D. 2011 Constructing summary statistics for approximate bay esian compu- tation: semi-automatic approximate bay esian computation Journal of the R oyal Statistic al So ciety, Series B . Gilks, W., Ric hardson, S. & Spiegelhalter, D. 1996 Markov chain Monte Carlo in pr actic e Chapman & Hall/CRC. Girolami, M. & Calderhead, B. 2011 Riemann manifold langevin and hamiltonian monte carlo methods Journal of the R oyal Statistic al So ciety: Series B (Statistic al Metho dolo gy) 73 , 123–214. Giv ens, G. & Raftery , A. 1996 Lo cal adaptive imp ortance sampling for multiv ariate densities with strong nonlinear relationships Journal of the Americ an Statistic al Asso ciation 132–141. Gutenkunst, R. N., W aterfall, J. J., Casey , F. P ., Bro wn, K. S., My ers, C. R. & Sethna, J. P . 2007 Univ ersally Slopp y P arameter Sensitivities in Systems Biology Mo dels PL oS c omputational biolo gy 3 , e189. Haario, H., Saksman, E. & T amminen, J. 1999 Adaptiv e prop osal distribution for random walk metrop olis algorithm Computational Statistics 14 , 375–396. 22 Hirata, H., Y oshiura, S., Ohtsuk a, T., Bessho, Y. et al. 2002 Oscillatory expression of the bhlh factor hes1 regulated b y a negativ e feedback lo op Scienc e’s STKE 298 , 840. Jasra, A., Singh, S. S., Martin, J. S. & McCoy , E. 2010 Filtering via approximate Bay esian computation Statistics and Computing . Komoro wski, M., Costa, M. J., Rand, D. A. & Stumpf, M. P . H. 2011 Sensitivity , robustness, and iden- tifiabilit y in stochastic c hemical kinetics mo dels. 
Pr o c e e dings of the National A c ademy of Scienc es of the Unite d States of Americ a 108 , 8645–8650. Lehmann, E. & Casella, G. 1998 The ory of p oint estimation volume 31 Springer V erlag. Liep e, J., Barnes, C., Cule, E., Erguler, K. et al. 2010 ABC-SysBio-appro ximate Ba yesian computation in Python with GPU supp ort Bioinformatics 26 , 1797–1799. Lop es, J. S. & Beaumon t, M. A. 2010 ABC: A useful Bay esian to ol for the analysis of p opulation data Infe ction, Genetics and Evolution 10 , 825–832. MacKa y , D. 2003 Information the ory, infer enc e, and le arning algorithms Cam brige Univ ersit y Press. Marin, J., Pudlo, P ., Rob ert, C. & Ryder, R. 2011 Appro ximate ba yesian computational metho ds Statistics and Computing 1–14. Marjoram, P . & Molitor, J. 2003 Mark ov c hain Mon te Carlo without lik eliho o ds Pr o c e e dings of the National A c ademy of Scienc es of the Unite d States of Americ a 100 , 15324. McKinley , T., Co ok, A. R. & Deardon, R. 2009 Inference in Epidemic Mo dels without Likelihoo ds The International Journal of Biostatistics 5 . Oh, M. & Berger, J. 1993 In tegration of multimodal functions by monte carlo importance sampling Journal of the A meric an Statistic al Asso ciation 88 , 450–456. Pitt, M. K. & Shephard, N. 1999 Filtering via simulation: Auxiliary particle filters Journal of the A meric an Statistic al Asso ciation 590–599. Pritc hard, J., Seielstad, M., Perez-Lezaun, A. & F eldman, M. 1999 Population gro wth of human y chromosomes: a study of y c hromosome microsatellites. Mole cular Biolo gy and Evolution 16 , 1791–1798. Rao, C. R. 1945 Information and accuracy attainable in the estimation of statistical parameters Bul letin of the Calcutta Mathematic al So ciety 37 , 81 – 91. Ratmann, O., Jørgensen, O., Hinkley , T., Stumpf, M., Richardson, S. & Wiuf, C. 2007 Using lik eliho o d-free inference to compare ev olutionary dynamics of the protein netw orks of h. p ylori and p. 
falciparum PL oS Computational Biolo gy 3 , e230. Ratmann, O., Andrieu, C., Wiuf, C. & Richardson, S. 2009 Mo del criticism based on likelihoo d-free inference, with an application to protein netw ork ev olution Pr o c e e dings of the National A c ademy of Scienc es 106 , 10576–10581. Rob ert, C. & Casella, G. 2004 Monte Carlo statistic al metho ds Springer V erlag. Rob ert, C., Cornuet, J., Marin, J. & Pillai, N. 2011 Lac k of confidence in approximate bay esian computation mo del c hoice volume 108 15112–15117 National Acad Sciences. 23 Secrier, M., T oni, T. & Stumpf, M. P . H. 2009 The ABC of rev erse engineering biological signalling systems Mole cular BioSystems 5 , 1925–1935. Silk, D., Kirk, P ., Barnes, C., T oni, T. et al. 2011 Designing attractiv e models via automated identi- fication of c haotic and oscillatory dynamical regimes Natur e Communic ations 2 , 489. Silk, D., Filippi, S. & Stumpf, M. P . H. 2012 Optimizing Threshold - Schedules for Approximate Ba yesian Computation Sequen tial Monte Carlo Samplers: Applications to Molecular Systems arXiv 1210.3296 . Sisson, S. A., F an, Y. & T anak a, M. M. 2007 Sequen tial Monte Carlo without lik eliho o ds Pr o c e e dings of the National A c ademy of Scienc es of the Unite d States of Americ a 104 , 1760–1765. Stigler, S. 1986 The history of statistics: The me asur ement of unc ertainty b efor e 1900 Belknap Press. T allmon, D., Luik art, G. & Beaumon t, M. 2004 Comparative ev aluation of a new effective p opulation size estimator based on approximate ba yesian computation Genetics 167 , 977–988. T anak a, M., F rancis, A., Luciani, F. & Sisson, S. 2006 Using appro ximate ba yesian computation to estimate tub erculosis transmission parameters from genot yp e data Genetics 173 , 1511–1520. T oni, T. & Stumpf, M. P . H. 2009 Simulation-based mo del selection for dynamical systems in systems and p opulation biology Bioinformatics 26 , 104–110. T oni, T., W elch, D., Strelk ow a, N., Ipsen, A. & Stumpf, M. 
P . H. 2009 Appro ximate Bay esian com- putation scheme for parameter inference and mo del selection in dynamical systems. Journal of the R oyal So ciety, Interfac e / the R oyal So ciety 6 , 187–202. V an Der Merwe, R., Doucet, A., De F reitas, N. & W an, E. 2001 The unscented particle filter A dvanc es in Neur al Information Pr o c essing Systems 584–590. 24