Model choice versus model criticism
The new perspectives on ABC and Bayesian model criticisms presented in Ratmann et al. (2009) are challenging standard approaches to Bayesian model choice. We discuss here some issues arising from the authors' approach, including prior influence, model…
Authors: Christian P. Robert, Kerrie L. Mengersen, Carla Chen
Christian P. Robert (1,2), Kerrie Mengersen (3), and Carla Chen (3)
(1) Université Paris Dauphine, (2) CREST-INSEE, Paris, France, and (3) Queensland University of Technology, Brisbane, Australia

Abstract

The new perspectives on Bayesian model criticisms presented in Ratmann et al. (2009) are challenging standard approaches to Bayesian model choice. We discuss here some issues arising from the approach, including prior influence, model assessment and criticism, and the meaning of error.

Keywords: Approximate Bayesian Computation, Bayesian statistics, Bayesian model choice, Bayesian model criticism, Bayesian model comparison, computational statistics.

In Ratmann et al. (2009), the perception of the approximation error in the ABC algorithm (Pritchard et al., 1999, Beaumont et al., 2002, Marjoram et al., 2003) is radically modified, moving from a computational parameter that is calibrated by the user when balancing precision and computing time into a genuine parameter about which inferences can be made in the same manner as for the original parameter $\theta$. As stressed in Section S2 of Ratmann et al. (2009), this is indeed a change of perception rather than a modification of the ABC method in that the target in $\theta$ remains the same. (This should not be construed as a criticism in that the unification of most ABC representations proposed in Section 2 is immensely valuable.) Although the derivation of the distribution $\xi_{x_0,\theta}(\epsilon)$ is somewhat convoluted in Section S1, we note here that it is simply the distribution of the error $\rho(S(x), S(x_0))$ when $x \sim f(x|\theta)$, i.e. a projection of $f(x|\theta)$ in probabilistic terms.
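The description of $\xi_{x_0,\theta}$ as the distribution of $\rho(S(x), S(x_0))$ under $x \sim f(x|\theta)$ can be illustrated by simulation. The sketch below is only an illustration under assumptions that are not in the paper: a normal $N(\theta, 1)$ sampling model, the sample mean as $S$, and the signed difference as $\rho$. The function name `simulate_errors` is hypothetical.

```python
import random
from statistics import mean


def simulate_errors(x0, theta, n_sims, rng):
    """Draw n_sims realisations of the error rho(S(x), S(x0)), x ~ f(. | theta).

    Assumptions of this sketch (not taken from the paper):
    f(. | theta) is N(theta, 1), S is the sample mean, and
    rho is the signed difference S(x) - S(x0).
    """
    s_obs = mean(x0)  # observed summary statistic S(x0)
    errors = []
    for _ in range(n_sims):
        # Pseudo-data of the same size as the observed sample
        x = [rng.gauss(theta, 1.0) for _ in x0]
        errors.append(mean(x) - s_obs)
    return errors


if __name__ == "__main__":
    rng = random.Random(1)
    observed = [0.3, -0.1, 0.8, 0.4]
    errs = simulate_errors(observed, theta=1.0, n_sims=4000, rng=rng)
    # The empirical distribution of errs approximates xi_{x0, theta}
    print(round(mean(errs), 3))
```

For a fixed $\theta$, a histogram of the returned values is a Monte Carlo picture of $\xi_{x_0,\theta}$; nothing in the sketch depends on the ABC tolerance.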
Example: For a Poisson $x_0 \sim \mathcal{P}(\theta)$ model, a natural divergence is the difference $\epsilon = x - x_0$, which is distributed as a translated Poisson $\mathcal{P}(\theta) - x_0$ when conditional on $x_0$ and which is marginally distributed as the difference of two iid $\mathcal{P}(\theta)$ variables. Since $\epsilon$ thus is an integer-valued variable, the supplementary prior $\pi_\epsilon$ should reflect this feature. A natural solution is

$$\pi_\epsilon(k) \propto 1/(1+k^2),$$

since the series $\sum_k 1/k^2$ is converging, even though using a proper prior $\pi_\epsilon$ does not appear to be a necessary condition in Ratmann et al. (2009). ◀

The change of perception in Ratmann et al. (2009) is based on the underlying assumption that the data is informative about the error term $\epsilon$, which is not necessarily the case, as shown by the previous and following examples.

Example: For a location family, $x_0 \sim f(x - \theta)$, if we take $\epsilon = x - x_0$, the posterior distribution of $\epsilon$ is

$$\pi_\epsilon(\epsilon|x_0) \propto \int f(\epsilon + x_0 - \theta)\,\pi_\theta(\theta)\,\mathrm{d}\theta\;\pi_\epsilon(\epsilon)$$

and therefore a mostly flat prior $\pi_\theta(\theta)$ with a large support produces a posterior $\pi_\epsilon(\epsilon|x_0)$ identical to $\pi_\epsilon(\epsilon)$ for most values of $x_0$. Conversely, a highly concentrated prior $\pi_\epsilon(\epsilon)$ hardly modifies the posterior $\pi_\theta(\theta|x_0)$. ◀

Example: For the binomial model $x_0 \sim \mathcal{B}(n, \theta)$, assuming a uniform prior $\theta \sim \mathcal{U}(0, 1)$, we can consider $\epsilon = x - x_0$, in which case $\epsilon$ is supported on $\{-n, \ldots, n\}$. If we use a uniform prior on $\epsilon$ as well,

$$\pi_\epsilon(\epsilon|x_0) \propto \binom{n}{\epsilon + x_0} \int \theta^{\epsilon + x_0} (1-\theta)^{n - \epsilon - x_0}\,\mathrm{d}\theta\;\mathbb{I}_{\{-n,\ldots,n\}}(\epsilon) \propto \binom{n}{\epsilon + x_0}\,\frac{(\epsilon + x_0)!\,(n - \epsilon - x_0)!}{(n+1)!}\;\mathbb{I}_{\{-n,\ldots,n\}}(\epsilon) = \frac{1}{1 + 2n}\;\mathbb{I}_{\{-n,\ldots,n\}}(\epsilon)$$

and therefore the (Bayesian) model brings no information about $\epsilon$. ◀

Obviously, this example is not directly incriminating against the method of Ratmann et al. (2009), in that it only considers a single statistic, instead of several as in Ratmann et al. (2009) (which distinguishes this paper from the remainder of the literature, where $\epsilon$ is a single number).
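The key step of the binomial example, namely that the binomial coefficient times the Beta integral does not depend on $\epsilon$, can be checked in exact arithmetic. The following is a minimal sketch with hypothetical function names; as a choice of the sketch, it only loops over the values of $\epsilon$ for which $\epsilon + x_0$ lies in $\{0, \ldots, n\}$, so that the binomial coefficient and the factorials are well defined.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact value of the integral of theta^a (1 - theta)^b over (0, 1).

    For non-negative integers a and b this equals a! b! / (a + b + 1)!.
    """
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def unnormalised_error_posterior(n, x0):
    """C(n, eps + x0) * integral theta^(eps+x0) (1-theta)^(n-eps-x0) d theta.

    Computed for every eps such that 0 <= eps + x0 <= n
    (a restriction made by this sketch).
    """
    values = {}
    for eps in range(-x0, n - x0 + 1):
        k = eps + x0  # simulated count x = eps + x0
        values[eps] = comb(n, k) * beta_integral(k, n - k)
    return values


if __name__ == "__main__":
    post = unnormalised_error_posterior(n=10, x0=3)
    # Every admissible eps receives the same weight: the data do not
    # discriminate between values of the error under these priors.
    print(set(post.values()))
```

The same constant is obtained for every $x_0$, which is the sense in which the observation carries no information about $\epsilon$ under the two uniform priors.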
1 Bayesian model assessment

The paper chooses to assess the validity of the model based on the marginal likelihood $m(x)$ instead of the predictive $p(x|x_0)$. While this has the advantage of “using the data once”, it suffers from a strong impact of the prior modelling and of not conditioning on the observed data $x_0$. A more appropriate (if still ad-hoc) procedure is to relate the observed statistics $S(x_0)$ with statistics simulated from $p(x|x_0)$, as in, e.g., Verdinelli and Wasserman (1998). It may be argued that checking the prior adequacy is a good thing, but having no way to distinguish between prior and sampling model inadequacy is a difficulty, as seen in the Poisson example.

Example: For the location family, $x_0 \sim f(x - \theta)$, the joint posterior distribution of $(\theta, \epsilon)$ is

$$f(\epsilon + x_0 - \theta)\,\pi_\theta(\theta)\,\pi_\epsilon(\epsilon),$$

and therefore the difference $(\epsilon - \theta)$ is not identifiable from the data, solely from the prior(s). ◀

Note that, from an ABC perspective, using $p(x|x_0)$ instead of $m(x)$ does not imply a considerable increase in computing time. However, computing the Bayes factor (and therefore the evidence) using the acceptance rate of the ABC algorithm is even faster. Moreover, it provides a different answer.

Example: For the Poisson $\mathcal{P}(\theta)$ model, if we take as an example an exponential $\mathcal{E}(1)$ prior $\pi_\theta$, the evidence associated with the model is

$$\int \pi_\theta(\theta)\,f(x_0|\theta)\,\mathrm{d}\theta = \int \frac{\theta^{x_0} e^{-2\theta}}{x_0!}\,\mathrm{d}\theta = 2^{-x_0-1},$$

while the quantitative assessment of Ratmann et al. (2009) is

$$\sum_{k=-x_0}^{\infty} \pi_\epsilon(k|x_0)\,\mathbb{I}\{\pi_\epsilon(k|x_0) \le \pi_\epsilon(0|x_0)\}, \qquad (1)$$

with

$$\pi_\epsilon(\epsilon|x_0) \propto \int \frac{\theta^{\epsilon + x_0} e^{-2\theta}}{(\epsilon + x_0)!\,(1 + \epsilon^2)}\,\mathrm{d}\theta = \frac{2^{-\epsilon - x_0 - 1}}{1 + \epsilon^2}.$$

The numerical comparison of both functions of $x_0$ in Figure 1 shows a much slower decrease in $x_0$ for the $p$-value (1) than for the evidence, not to mention a frankly puzzling non-monotonicity of the $p$-value.
◀

2 Implications of model criticism

While the approach by Ratmann et al. (2009) provides an informal assessment that can be derived in an ABC setting, the Bayesian foundations of the method may be questioned. The core of the Bayesian approach is to incorporate all aspects of uncertainty and all aspects of decision consequences into a single inferential machine that provides the “optimal” solution. In the current case, while the consequences of rejecting the current model are not discussed, they would most likely include the construction of another model.

[Figure 1: Comparison of the decreasing rates of the evidence (blue) and of the $p$-value (black) derived from Ratmann et al. (2009) for a Poisson model. Horizontal axis: $x_0$; vertical axis: probability.]

In the first graph in the paper, several models are contrasted and this leads us to wonder about the gain compared with using the Bayes factor, which can be directly derived from the ABC simulation as well since the (accepted or rejected) proposed values are simulated from $\pi(\theta) f(x|\theta)$.

Example: For the Poisson $x_0 \sim \mathcal{P}(\theta)$ model, running ABC with no approximation (since this is a finite setting) produces an exact evaluation of the evidence. ◀

We also note that the non-parametric evaluation at the basis of the ABC$\mu$ algorithm of Ratmann et al. (2009) can equally be used for approximating the true marginal density $m(x)$. The smooth version of ABC$\mu$ presented in Section S1.5, eqn. [S8], is however far from being a density estimate of $\xi_{x_0,\theta}(\epsilon)$ since it is based on a single realisation from $f(x|\theta)$. It should rather be construed as a (further) smoothed version of its smooth ABC counterpart and this suggests integrating over $h$ as well.
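The two Poisson examples (the closed-form evidence against the $p$-value (1), and the evaluation of the evidence by ABC with exact matching) can be explored numerically. The code below is a hedged sketch, not the computation behind Figure 1: the truncation point `k_max`, the helper names, and the use of Knuth's multiplication method to draw Poisson variates are all choices made here for illustration.

```python
import math
import random


def evidence(x0):
    """Closed-form evidence 2^(-x0-1) for x0 ~ Poisson(theta), theta ~ Exp(1)."""
    return 2.0 ** (-x0 - 1)


def error_posterior(x0, k_max=200):
    """Normalised pi(k | x0), proportional to 2^(-k-x0-1) / (1 + k^2), k >= -x0.

    k_max is an assumed truncation point of the infinite support; the
    neglected tail decreases geometrically.
    """
    ks = range(-x0, k_max + 1)
    weights = [2.0 ** (-k - x0 - 1) / (1 + k * k) for k in ks]
    total = sum(weights)
    return {k: w / total for k, w in zip(ks, weights)}


def p_value(x0, k_max=200):
    """Quantity (1): posterior mass of the errors no more probable than k = 0."""
    post = error_posterior(x0, k_max)
    return sum(p for p in post.values() if p <= post[0])


def poisson_draw(theta, rng):
    """One Poisson(theta) draw by Knuth's multiplication method."""
    limit, k, prod = math.exp(-theta), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def abc_evidence(x0, n_sims, rng):
    """Acceptance rate of ABC with exact matching, an estimate of the evidence."""
    accepted = 0
    for _ in range(n_sims):
        theta = rng.expovariate(1.0)          # theta drawn from the Exp(1) prior
        if poisson_draw(theta, rng) == x0:    # accept only when x equals x0
            accepted += 1
    return accepted / n_sims


if __name__ == "__main__":
    rng = random.Random(0)
    for x0 in (0, 1, 5, 10):
        print(x0, evidence(x0), p_value(x0), abc_evidence(x0, 20000, rng))
```

Because acceptance requires $x = x_0$ exactly, the acceptance rate is an unbiased Monte Carlo estimate of the marginal probability of $x_0$, which is why no tolerance appears in `abc_evidence`.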
Unless some group structure can be exploited to avoid the repetition of simulations $x_b = x_b(\theta)$, the non-parametric estimator [S9] cannot be used as a practical device because either $B$ is small, in which case the non-parametric approximation is poor, or $B$ is large, in which case producing the $x_b$'s for every value of $\theta$ is too time-consuming. Obviously, using moderate $B$ is always feasible from a computational point of view and it can also be argued that the approximation of $f_\rho(\theta, \epsilon|x_0)$ by $\hat f_\rho(\theta, \epsilon|x_0)$ is not of major interest, since the former is only an approximation to the true target. (In a vaguely connected way, the rejection sampler of Subsection S1.8 does seem an approximation to exact rejection sampling, in that the choice of the upper bound

$$C = \max_i \min_k \hat\xi_k(\epsilon_{ik}, x_i)$$

over the samples simulated in Step 1 of the algorithm does not produce a true upper bound.)

3 On the meaning of the error

The error term $\epsilon$ is defined as part of the model, based on the marginal, with the additional input of a prior distribution $\pi_\epsilon(\epsilon)$. Since Ratmann et al. (2009) analyse this error based on the product of two densities, $\xi_{x_0,\theta}(\epsilon)\,\pi_\epsilon(\epsilon)$, this product is not properly defined from a probabilistic point of view. The authors choose to call $\xi_{x_0,\theta}(\epsilon)$ a “likelihood” by a fiducial argument, but this is (strictly speaking) not [proportional to] a density in $x_0$. Obviously, simulating from the density that is proportional to $\xi_{x_0,\theta}(\epsilon)\,\pi_\epsilon(\epsilon)\,\pi(\theta)$ is entirely possible as long as this function integrates in $(\theta, \epsilon)$ against the dominating measure, but it suffers from an undefined probabilistic background in that, for instance, it is not invariant under reparameterisation in $\epsilon$: changing $\epsilon$ to $\varepsilon$ introduces the squared Jacobian $|\mathrm{d}\epsilon/\mathrm{d}\varepsilon|^2$ in the “density”.
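The origin of the squared Jacobian can be spelled out in one short step. The lines below are a sketch under an assumption introduced here, namely a smooth one-to-one reparameterisation $\epsilon = g(\varepsilon)$ (the map $g$ and the tilde notation are not in the paper): each of the two factors is a density in $\epsilon$, so each one picks up its own Jacobian term.

```latex
% Assumption of this sketch: \epsilon = g(\varepsilon), with g smooth and one-to-one.
\begin{align*}
\tilde\xi_{x_0,\theta}(\varepsilon)
  &= \xi_{x_0,\theta}\bigl(g(\varepsilon)\bigr)\,
     \left|\frac{\mathrm{d}\epsilon}{\mathrm{d}\varepsilon}\right|, \\
\tilde\pi_\epsilon(\varepsilon)
  &= \pi_\epsilon\bigl(g(\varepsilon)\bigr)\,
     \left|\frac{\mathrm{d}\epsilon}{\mathrm{d}\varepsilon}\right|, \\
\tilde\xi_{x_0,\theta}(\varepsilon)\,\tilde\pi_\epsilon(\varepsilon)
  &= \xi_{x_0,\theta}\bigl(g(\varepsilon)\bigr)\,
     \pi_\epsilon\bigl(g(\varepsilon)\bigr)\,
     \left|\frac{\mathrm{d}\epsilon}{\mathrm{d}\varepsilon}\right|^{2}.
\end{align*}
```

A genuine density in $\epsilon$ would be multiplied by a single factor $|\mathrm{d}\epsilon/\mathrm{d}\varepsilon|$ under the same change of variable, which is the lack of invariance pointed out above.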
We acknowledge that most ABC strategies can be seen as using a formal “prior+likelihood” representation of the distribution of $\epsilon$, since

$$\pi_{\mathrm{ABC}}(\theta) = \int \pi_\epsilon(\epsilon)\,\xi(\epsilon|x_0, \theta)\,\mathrm{d}\epsilon\;\pi(\theta),$$

but this formal perspective does not turn $\epsilon$ into a “true” parameter and $\pi_\epsilon$ into its prior. For instance, non-parametric $\pi_\epsilon$'s may be based on the observations or on additional simulations. The denomination of “likelihood” is thus debatable in that $\xi_{x_0,\theta}(\epsilon)\,\pi_\epsilon(\epsilon)\,\pi(\theta)$ cannot always be turned into a density on $x_0$ (or even on a statistic $S(x_0)$).

Example: For the Poisson $x_0 \sim \mathcal{P}(\theta)$ model, $\xi_{x_0,\theta}(\epsilon)$ is the translated Poisson distribution $\mathcal{P}(\theta) - \epsilon$, truncated to positive values. While this is indeed a distribution on $x_0$, conditional on $(\theta, \epsilon)$, it cannot be used as the original Poisson distribution, because of the unidentifiability of $\epsilon$. ◀

We also think that comparing models via the (“posterior”) distributions of the errors does not provide a coherent setup in that this approach does not incorporate the model complexity penalisation that is at the heart of the Bayesian model comparison tools like the Bayes factor. First, a more complex (e.g., with more parameters) model will most likely have a more dispersed distribution on $\epsilon$. Second, returning to the first argument of this note, the choice of the prior $\pi_\epsilon(\epsilon)$ (and of the error itself) is model dependent (as stressed in the paper via the notation $\pi_\epsilon(\epsilon, M)$) and the comparison thus reflects possibly mostly the prior modelling instead of the data assessment, as shown, again, by the location parameter example. Using the same band of rejection for all models as in Figure 1 of Ratmann et al. (2009) thus does not seem possible nor recommendable on a general basis.
Acknowledgments

This work was partially supported by the Agence Nationale de la Recherche (ANR, 212, rue de Bercy 75012 Paris) through the 2005 project ANR-05-BLAN-0196-01 Misgepop and the 2009 project ANR-08-BLAN-0218 Big’MC (for C.P.R.). We are grateful to Oliver Ratmann for clarifying several points about his paper.

References

Beaumont, M., Zhang, W. and Balding, D. (2002). Approximate Bayesian computation in population genetics. Genetics, 162 2025–2035.

Marjoram, P., Molitor, J., Plagnol, V. and Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA, 100 15324–15328.

Pritchard, J., Seielstad, M., Perez-Lezaun, A. and Feldman, M. (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol., 16 1791–1798.

Ratmann, O., Andrieu, C., Wiuf, C. and Richardson, S. (2009). Model criticism based on likelihood-free inference, with an application to protein network evolution. PNAS, 106 1–6.

Verdinelli, I. and Wasserman, L. (1998). Bayesian goodness-of-fit testing using infinite-dimensional exponential families. Annals of Statistics, 26 1215–1241.