Model choice versus model criticism

Christian P. Robert (1, 2), Kerrie Mengersen (3), and Carla Chen (3)

(1) Université Paris Dauphine, (2) CREST-INSEE, Paris, France, and (3) Queensland University of Technology, Brisbane, Australia

Abstract

The new perspectives on Bayesian model criticism presented in Ratmann et al. (2009) challenge standard approaches to Bayesian model choice. We discuss here some issues arising from the approach, including prior influence, model assessment and criticism, and the meaning of error.

Keywords: Approximate Bayesian Computation, Bayesian statistics, Bayesian model choice, Bayesian model criticism, Bayesian model comparison, computational statistics.

In Ratmann et al. (2009), the perception of the approximation error $\epsilon$ in the ABC algorithm (Pritchard et al., 1999; Beaumont et al., 2002; Marjoram et al., 2003) is radically modified: $\epsilon$ moves from a computational parameter, calibrated by the user to balance precision against computing time, to a genuine parameter about which inferences can be made in the same manner as for the original parameter $\theta$. As stressed in Section S2 of Ratmann et al. (2009), this is indeed a change of perception rather than a modification of the ABC method, in that the target in $\theta$ remains the same. (This should not be construed as a criticism, since the unification of most ABC representations proposed in Section 2 of their paper is immensely valuable.) Although the derivation of the distribution $\xi_{x_0,\theta}(\epsilon)$ in Section S1 is somewhat convoluted, we note here that it is simply the distribution of the error $\rho(S(x), S(x_0))$ when $x \sim f(x|\theta)$, i.e. a projection of $f(x|\theta)$ in probabilistic terms.
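To make the last remark concrete, here is a minimal Python sketch of how $\xi_{x_0,\theta}$ arises by simulation. It assumes the Poisson model of the next example, the identity summary $S(x) = x$ and the signed difference $x - x_0$ in place of $\rho$ (both illustrative choices); the helper names are hypothetical. Simulating $x \sim f(x|\theta)$ and recording the error gives draws from $\xi_{x_0,\theta}$, whose exact form here is the $\mathcal{P}(\theta)$ pmf shifted by $x_0$.

```python
import math
import random
from collections import Counter


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_xi(x0, theta, n_sims, rng):
    """Monte Carlo draws of the error eps = S(x) - S(x0), with x ~ P(theta) and S = identity."""
    return [sample_poisson(theta, rng) - x0 for _ in range(n_sims)]


def xi_exact(eps, x0, theta):
    """Exact pmf of eps: the P(theta) pmf evaluated at x0 + eps (zero below -x0)."""
    x = x0 + eps
    if x < 0:
        return 0.0
    return math.exp(-theta) * theta**x / math.factorial(x)


rng = random.Random(1)
x0, theta = 3, 2.5
draws = simulate_xi(x0, theta, 100_000, rng)
freq = Counter(draws)
for eps in range(-3, 4):
    # empirical frequency of each error value next to its exact probability
    print(eps, round(freq[eps] / len(draws), 4), round(xi_exact(eps, x0, theta), 4))
```

The empirical frequencies match the translated Poisson probabilities up to Monte Carlo error, which is all that $\xi_{x_0,\theta}$ encodes about $f(x|\theta)$.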
Example. For a Poisson model $x_0 \sim \mathcal{P}(\theta)$, a natural divergence is the difference $\epsilon = x - x_0$, which, conditional on $x_0$, is distributed as a translated Poisson $\mathcal{P}(\theta) - x_0$, and which is marginally distributed as the difference of two iid $\mathcal{P}(\theta)$ variables. Since $\epsilon$ is thus an integer-valued variable, the supplementary prior $\pi_\epsilon$ should reflect this feature. A natural solution is

$$\pi_\epsilon(k) \propto \frac{1}{1+k^2}, \qquad k \in \mathbb{Z},$$

which is proper since the series $\sum_k 1/k^2$ converges, even though using a proper prior $\pi_\epsilon$ does not appear to be a necessary condition in Ratmann et al. (2009). ◀

The change of perception in Ratmann et al. (2009) rests on the underlying assumption that the data are informative about the error term $\epsilon$, which is not necessarily the case, as shown by the previous and following examples.

Example. For a location family, $x_0 \sim f(x-\theta)$, if we take $\epsilon = x - x_0$, the posterior distribution of $\epsilon$ is

$$\pi_\epsilon(\epsilon|x_0) \propto \pi_\epsilon(\epsilon)\int f(\epsilon + x_0 - \theta)\,\pi_\theta(\theta)\,d\theta,$$

and therefore a mostly flat prior $\pi_\theta(\theta)$ with a large support produces a posterior $\pi_\epsilon(\epsilon|x_0)$ essentially identical to $\pi_\epsilon(\epsilon)$ for most values of $x_0$, since the integral is then nearly constant in $\epsilon$. Conversely, a highly concentrated prior $\pi_\epsilon(\epsilon)$ hardly modifies the posterior $\pi(\theta|x_0)$. ◀

Example. For the binomial model $x_0 \sim \mathcal{B}(n,\theta)$ with a uniform prior $\theta \sim \mathcal{U}(0,1)$, we can consider $\epsilon = x - x_0$, in which case $\epsilon$ is supported on $\{-n,\ldots,n\}$. If we use a uniform prior on $\epsilon$ as well,

$$\pi_\epsilon(\epsilon|x_0) \propto \binom{n}{\epsilon+x_0}\int_0^1 \theta^{\epsilon+x_0}(1-\theta)^{n-\epsilon-x_0}\,d\theta\;\mathbb{I}_{\{-n,\ldots,n\}}(\epsilon) = \binom{n}{\epsilon+x_0}\frac{(\epsilon+x_0)!\,(n-\epsilon-x_0)!}{(n+1)!}\,\mathbb{I}_{\{-n,\ldots,n\}}(\epsilon) = \frac{1}{n+1}\,\mathbb{I}_{\{-x_0,\ldots,n-x_0\}}(\epsilon),$$

so that $\epsilon$ is uniform over the $n+1$ values compatible with $x_0$, and therefore the (Bayesian) model brings no information about $\epsilon$. ◀

Obviously, this example is not directly incriminating against the method of Ratmann et al. (2009), in that it only considers a single statistic, instead of several as in Ratmann et al.
(2009), which distinguishes this paper from the remainder of the literature, where $\epsilon$ is a single number.

1 Bayesian model assessment

The paper chooses to assess the validity of the model based on the marginal likelihood $m(x)$ instead of the predictive $p(x|x_0)$. While this has the advantage of "using the data once", it suffers from a strong impact of the prior modelling and from not conditioning on the observed data $x_0$. A more appropriate (if still ad hoc) procedure is to relate the observed statistics $S(x_0)$ to statistics simulated from $p(x|x_0)$, as in, e.g., Verdinelli and Wasserman (1998). It may be argued that checking the prior adequacy is a good thing, but having no way to distinguish between prior and sampling model inadequacy is a difficulty, as seen in the Poisson example.

Example. For the location family, $x_0 \sim f(x-\theta)$, the joint posterior distribution of $(\theta,\epsilon)$ is

$$\pi(\theta,\epsilon|x_0) \propto f(\epsilon + x_0 - \theta)\,\pi_\theta(\theta)\,\pi_\epsilon(\epsilon),$$

and therefore the data only inform the difference $\epsilon - \theta$: beyond this difference, $\theta$ and $\epsilon$ are not identifiable from the data, solely from the prior(s). ◀

Note that, from an ABC perspective, using $p(x|x_0)$ instead of $m(x)$ does not imply a considerable increase in computing time. However, computing the Bayes factor (and therefore the evidence) from the acceptance rate of the ABC algorithm is even faster. Moreover, it provides a different answer.

Example. For the Poisson $\mathcal{P}(\theta)$ model, if we take as an example an exponential $\mathcal{E}(1)$ prior $\pi_\theta$, the evidence associated with the model is

$$\int \pi_\theta(\theta)\,f(x_0|\theta)\,d\theta = \int_0^\infty \frac{\theta^{x_0}e^{-2\theta}}{x_0!}\,d\theta = 2^{-x_0-1},$$

while the quantitative assessment of Ratmann et al. (2009) is

$$\sum_{k=-x_0}^{\infty} \pi_\epsilon(k|x_0)\,\mathbb{I}\{\pi_\epsilon(k|x_0) \le \pi_\epsilon(0|x_0)\}, \qquad (1)$$

with

$$\pi_\epsilon(\epsilon|x_0) \propto \int_0^\infty \frac{\theta^{\epsilon+x_0}e^{-2\theta}}{(\epsilon+x_0)!\,(1+\epsilon^2)}\,d\theta = \frac{2^{-\epsilon-x_0-1}}{1+\epsilon^2}.$$
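Both functions of $x_0$ compared in this example are easy to evaluate. The Python sketch below is a hedged illustration, not the authors' code: it computes the closed-form evidence $2^{-x_0-1}$, checks it against the acceptance rate of an exact (zero-tolerance) ABC rejection sampler, and evaluates (1) by normalising the weights $2^{-k}/(1+k^2)$ over $k \ge -x_0$, the common factor $2^{-x_0-1}$ cancelling. The truncation point `k_max` of the infinite sum and all function names are assumptions made for illustration.

```python
import math
import random


def evidence(x0):
    """Closed-form marginal likelihood m(x0) = 2^(-x0-1) under the E(1) prior."""
    return 2.0 ** (-x0 - 1)


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def abc_evidence(x0, n_sims, rng):
    """Exact ABC (accept only x == x0): the acceptance rate is an unbiased estimate of m(x0)."""
    accepted = 0
    for _ in range(n_sims):
        theta = rng.expovariate(1.0)          # theta ~ E(1)
        if sample_poisson(theta, rng) == x0:  # x ~ P(theta), zero tolerance
            accepted += 1
    return accepted / n_sims


def p_value(x0, k_max=200):
    """Formula (1): posterior mass of error values no more probable than eps = 0.

    Unnormalised posterior weights 2^(-k) / (1 + k^2) on k = -x0, ..., k_max
    (hypothetical truncation); the common factor 2^(-x0-1) cancels in the ratio.
    """
    weights = [2.0 ** (-k) / (1 + k * k) for k in range(-x0, k_max + 1)]
    w0 = 1.0  # unnormalised weight at k = 0
    total = sum(weights)
    tail = sum(w for w in weights if w <= w0)
    return tail / total


rng = random.Random(2009)
for x0 in (0, 2, 5, 10, 20):
    print(x0, evidence(x0), round(p_value(x0), 4))
print("ABC acceptance rate:", abc_evidence(1, 100_000, rng), "exact:", evidence(1))
```

Under this reading of (1), the $p$-value equals 1 for $x_0 \le 4$, since every admissible error is then no more probable a posteriori than $\epsilon = 0$, and for larger $x_0$ it stays far above the evidence.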
The numerical comparison of both functions of $x_0$ in Figure 1 shows a much slower decrease in $x_0$ for the $p$-value (1) than for the evidence, not to mention a frankly puzzling non-monotonicity of the $p$-value. ◀

2 Implications of model criticism

While the approach of Ratmann et al. (2009) provides an informal assessment that can be derived in an ABC setting, the Bayesian foundations of the method may be questioned. The core of the Bayesian approach is to incorporate all aspects of uncertainty and all aspects of decision consequences into a single inferential machine that provides the "optimal" solution. In the current case, while the consequences of rejecting the current model are not discussed, they would most likely include the construction of another model.

Figure 1: Comparison of the decreasing rates of the evidence (blue) and of the $p$-value (black) derived from Ratmann et al. (2009) for a Poisson model. [Plot of probability against $x_0$, for $x_0$ from 0 to 50.]

In the first graph of Ratmann et al. (2009), several models are contrasted, and this leads us to wonder about the gain compared with using the Bayes factor, which can be directly derived from the ABC simulation as well, since the (accepted or rejected) proposed values are simulated from $\pi(\theta)f(x|\theta)$.

Example. For the Poisson model $x_0 \sim \mathcal{P}(\theta)$, running ABC with no approximation (possible since $x = x_0$ has positive probability in this discrete setting) produces an evaluation of the evidence that is exact up to Monte Carlo error. ◀

We also note that the non-parametric evaluation at the basis of the ABCμ algorithm of Ratmann et al. (2009) can equally be used for approximating the true marginal density $m(x)$. The smooth version of ABCμ presented in Section S1.5, eqn. [S8], is, however, far from being a density estimate of $\xi_{x_0,\theta}(\epsilon)$, since it is based on a single realisation from $f(x|\theta)$.
It should rather be construed as a (further) smoothed version of its smooth ABC counterpart, and this suggests integrating over $h$ as well. Unless some group structure can be exploited to avoid the repetition of simulations $x_b = x_b(\theta)$, the non-parametric estimator [S9] cannot be used as a practical device, because either $B$ is small, in which case the non-parametric approximation is poor, or $B$ is large, in which case producing the $x_b$'s for every value of $\theta$ is too time-consuming. Obviously, using a moderate $B$ is always feasible from a computational point of view, and it can also be argued that the approximation of $f_\rho(\theta,\epsilon|x_0)$ by $\hat f_\rho(\theta,\epsilon|x_0)$ is not of major interest, since the former is only an approximation to the true target. (In a vaguely connected way, the rejection sampler of Subsection S1.8 does seem to be only an approximation to exact rejection sampling, in that the choice of the upper bound

$$C = \max_i \min_k \hat\xi_k(\epsilon_{ik}, x_i)$$

over the samples simulated in Step 1 of the algorithm does not produce a true upper bound.)

3 On the meaning of the error

The error term $\epsilon$ is defined as part of the model, based on the marginal, with the additional input of a prior distribution $\pi_\epsilon(\epsilon)$. Ratmann et al. (2009) analyse this error through the product of two densities in $\epsilon$, $\xi_{x_0,\theta}(\epsilon)\pi_\epsilon(\epsilon)$, which is not properly defined from a probabilistic point of view. The authors choose to call $\xi_{x_0,\theta}(\epsilon)$ a "likelihood" by a fiducial argument, but this is (strictly speaking) not [proportional to] a density in $x_0$.
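Whatever its interpretation, the product $\xi_{x_0,\theta}(\epsilon)\pi_\epsilon(\epsilon)\pi_\theta(\theta)$ can be handled as an unnormalised target. In the Poisson example of Section 1 (prior $\mathcal{E}(1)$ on $\theta$, $\pi_\epsilon(k) \propto 1/(1+k^2)$), it is integrable and can be sampled exactly: marginally $\epsilon$ has weights proportional to $2^{-\epsilon}/(1+\epsilon^2)$ on $\epsilon \ge -x_0$, and $\theta$ given $\epsilon$ is Gamma with shape $x_0+\epsilon+1$ and rate 2. The Python sketch below does this with the standard library; the truncation `k_max` and the function names are hypothetical choices for illustration.

```python
import random
from collections import Counter


def error_weights(x0, k_max=200):
    """Normalised marginal of eps: proportional to 2^(-k) / (1 + k^2) on k = -x0..k_max."""
    support = list(range(-x0, k_max + 1))
    raw = [2.0 ** (-k) / (1 + k * k) for k in support]
    total = sum(raw)
    return support, [w / total for w in raw]


def sample_joint(x0, n, rng, k_max=200):
    """Exact draws of (theta, eps) from the density proportional to xi * pi_eps * pi_theta."""
    support, probs = error_weights(x0, k_max)
    eps_draws = rng.choices(support, weights=probs, k=n)
    # theta | eps ~ Gamma(shape = x0 + eps + 1, rate = 2); gammavariate takes a scale, 1/2
    return [(rng.gammavariate(x0 + e + 1, 0.5), e) for e in eps_draws]


rng = random.Random(42)
x0 = 3
draws = sample_joint(x0, 50_000, rng)
counts = Counter(e for _, e in draws)
support, probs = error_weights(x0)
print("P(eps = 0):", round(counts[0] / len(draws), 3), "vs", round(probs[support.index(0)], 3))
theta_given_0 = [t for t, e in draws if e == 0]
print("E[theta | eps = 0]:", round(sum(theta_given_0) / len(theta_given_0), 3), "vs", (x0 + 1) / 2)
```

The conditional mean of $\theta$ given $\epsilon = k$ is $(x_0+k+1)/2$, which offers a quick sanity check on the draws.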
Obviously, simulating from the density proportional to $\xi_{x_0,\theta}(\epsilon)\pi_\epsilon(\epsilon)\pi_\theta(\theta)$ is entirely possible as long as this function is integrable in $(\theta,\epsilon)$ with respect to the dominating measure, but it suffers from an undefined probabilistic background in that, for instance, it is not invariant under reparameterisation in $\epsilon$: since both $\xi_{x_0,\theta}$ and $\pi_\epsilon$ are densities in $\epsilon$, changing $\epsilon$ to $\varepsilon$ introduces the squared Jacobian $|d\epsilon/d\varepsilon|^2$ in the "density", instead of the single Jacobian a genuine density would carry. We acknowledge that most ABC strategies can be seen as using a formal "prior + likelihood" representation of the distribution of $\epsilon$, since

$$\pi_{\mathrm{ABC}}(\theta) = \int \pi_\epsilon(\epsilon)\,\xi(\epsilon|x_0,\theta)\,d\epsilon\;\pi(\theta),$$

but this formal perspective does not turn $\epsilon$ into a "true" parameter and $\pi_\epsilon$ into its prior. For instance, non-parametric $\pi_\epsilon$'s may be based on the observations or on additional simulations. The denomination of "likelihood" is thus debatable, in that $\xi_{x_0,\theta}(\epsilon)\pi_\epsilon(\epsilon)\pi(\theta)$ cannot always be turned into a density on $x_0$ (or even on a statistic $S(x_0)$).

Example. For the Poisson model $x_0 \sim \mathcal{P}(\theta)$, $\xi_{x_0,\theta}(\epsilon)$ is the translated Poisson distribution $\mathcal{P}(\theta) - \epsilon$, truncated to non-negative values. While this is indeed a distribution on $x_0$, conditional on $(\theta,\epsilon)$, it cannot be used as the original Poisson distribution, because of the unidentifiability of $\epsilon$. ◀

We also think that comparing models via the ("posterior") distributions of the errors $\epsilon$ does not provide a coherent setup, in that this approach does not incorporate the model complexity penalisation that is at the heart of Bayesian model comparison tools like the Bayes factor. First, a more complex (e.g., with more parameters) model will most likely have a more dispersed distribution on $\epsilon$.
Second, returning to the first argument of this note, the choice of the prior $\pi_\epsilon(\epsilon)$ (and of the error $\epsilon$ itself) is model dependent (as stressed in the paper via the notation $\pi(\epsilon, M)$), and the comparison may thus mostly reflect the prior modelling rather than the data assessment, as shown, again, by the location parameter example. Using the same rejection band for all models, as in Figure 1 of Ratmann et al. (2009), thus seems neither possible nor recommendable on a general basis.

Acknowledgments

This work was partially supported by the Agence Nationale de la Recherche (ANR, 212, rue de Bercy 75012 Paris) through the 2005 project ANR-05-BLAN-0196-01 Misgepop and the 2009 project ANR-08-BLAN-0218 Big'MC (for C.P.R.). We are grateful to Oliver Ratmann for clarifying several points about his paper.

References

Beaumont, M., Zhang, W. and Balding, D. (2002). Approximate Bayesian computation in population genetics. Genetics, 162 2025–2035.

Marjoram, P., Molitor, J., Plagnol, V. and Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA, 100 15324–15328.

Pritchard, J., Seielstad, M., Perez-Lezaun, A. and Feldman, M. (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol., 16 1791–1798.

Ratmann, O., Andrieu, C., Wiuf, C. and Richardson, S. (2009). Model criticism based on likelihood-free inference, with an application to protein network evolution. PNAS, 106 1–6.

Verdinelli, I. and Wasserman, L. (1998). Bayesian goodness-of-fit testing using infinite-dimensional exponential families. Annals of Statistics, 26 1215–1241.
