On resolving the Savage-Dickey paradox

The Savage-Dickey ratio is known as a specialised representation of the Bayes factor (O'Hagan and Forster, 2004) that allows for a functional plugging approximation of this quantity. We demonstrate here that the Savage-Dickey representation is in fac…

Authors: Jean-Michel Marin, Christian Robert

Electronic Journal of Statistics, ISSN: 1935-7524

Jean-Michel Marin and Christian P. Robert

Institut de Mathématiques et Modélisation de Montpellier, Université Montpellier 2, Case Courrier 51, 34095 Montpellier cedex 5, France, e-mail: jean-michel.marin@univ-montp2.fr

Université Paris-Dauphine, CEREMADE, 75775 Paris cedex 16, France, and CREST, 92245 Malakoff cedex, France, e-mail: xian@ceremade.dauphine.fr

Abstract: When testing a null hypothesis $H_0: \theta = \theta_0$ in a Bayesian framework, the Savage–Dickey ratio (Dickey, 1971) is known as a specific representation of the Bayes factor (O'Hagan and Forster, 2004) that only uses the posterior distribution under the alternative hypothesis at $\theta_0$, thus allowing for a plug-in version of this quantity. We demonstrate here that the Savage–Dickey representation is in fact a generic representation of the Bayes factor and that it fundamentally relies on specific measure-theoretic versions of the densities involved in the ratio, instead of being a special identity imposing some mathematically void constraints on the prior distributions. We completely clarify the measure-theoretic foundations of the Savage–Dickey representation as well as of the later generalisation of Verdinelli and Wasserman (1995). We furthermore provide a general framework that produces a converging approximation of the Bayes factor that is unrelated to the approach of Verdinelli and Wasserman (1995), and we propose a comparison of this new approximation with their version, as well as with bridge sampling and Chib's approaches.

Keywords and phrases: Bayesian model choice, Bayes factor, bridge sampling, conditional distribution, hypothesis testing, Savage–Dickey ratio, zero measure set.

1. Introduction

From a methodological viewpoint, testing a null hypothesis $H_0: x \sim f_0(x|\omega_0)$ versus the alternative $H_a: x \sim f_1(x|\omega_1)$ in a Bayesian framework requires the introduction of two prior distributions, $\pi_0(\omega_0)$ and $\pi_1(\omega_1)$, that are defined on the respective parameter spaces. In functional terms, the core object of the Bayesian approach to testing and model choice, the Bayes factor (Jeffreys, 1939; Robert, 2001; O'Hagan and Forster, 2004), is indeed a ratio of two marginal densities taken at the same observation $x$,

$$B_{01}(x) = \frac{\int \pi_0(\omega_0)\, f_0(x|\omega_0)\,\mathrm{d}\omega_0}{\int \pi_1(\omega_1)\, f_1(x|\omega_1)\,\mathrm{d}\omega_1} = \frac{m_0(x)}{m_1(x)}.$$

(This quantity $B_{01}(x)$ is then compared to 1 in order to decide about the strength of the support of the data in favour of $H_0$ or $H_a$.) It is thus mathematically clearly and uniquely defined, provided both integrals exist and differ from both $0$ and $\infty$. The practical computation of the Bayes factor has generated a large literature on approximation methods (see, e.g., Chib, 1995; Gelman and Meng, 1998; Chen et al., 2000; Chopin and Robert, 2010), seeking improvements in numerical precision.

The Savage–Dickey (Dickey, 1971) representation of the Bayes factor is primarily known as a special identity that relates the Bayes factor to the posterior distribution which corresponds to the more complex hypothesis. As described in Verdinelli and Wasserman (1995) and Chen et al. (2000, pages 164-165), this representation has practical implications as a basis for simulation methods. However, as stressed in Dickey (1971) and O'Hagan and Forster (2004), the foundation of the Savage–Dickey representation is clearly theoretical. More specifically, when considering a testing problem with an embedded model, $H_0: \theta = \theta_0$, and a nuisance parameter $\psi$, i.e. when $\omega_1$ can be decomposed as $\omega_1 = (\theta, \psi)$ and when $\omega_0 = (\theta_0, \psi)$, for a sampling distribution $f(x|\theta, \psi)$, the plug-in representation

$$B_{01}(x) = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}, \tag{1}$$

with the obvious notations for the marginal distributions

$$\pi_1(\theta) = \int \pi_1(\theta, \psi)\,\mathrm{d}\psi \quad\text{and}\quad \pi_1(\theta|x) = \int \pi_1(\theta, \psi|x)\,\mathrm{d}\psi,$$

holds under Dickey's (1971) assumption that the conditional prior density of $\psi$ under the alternative model, given $\theta = \theta_0$, $\pi_1(\psi|\theta_0)$, is equal to the prior density under the null hypothesis, $\pi_0(\psi)$,

$$\pi_1(\psi|\theta_0) = \pi_0(\psi). \tag{2}$$

Therefore, Dickey's (1971) identity (1) reduces the Bayes factor to the ratio of the posterior over the prior marginal densities of $\theta$ under the alternative model, taken at the tested value $\theta_0$. The Bayes factor is thus expressed as an amount of information brought by the data, and this helps in its justification as a model choice tool. (See also Consonni and Veronese, 2008.)

In order to illustrate the Savage–Dickey representation, consider the artificial example of computing the Bayes factor between the models

$$\mathfrak{M}_0: x|\psi \sim \mathcal{N}(\psi, 1), \quad \psi \sim \mathcal{N}(0, 1),$$

and

$$\mathfrak{M}_1: x|\theta, \psi \sim \mathcal{N}(\psi, \theta), \quad \psi|\theta \sim \mathcal{N}(0, \theta), \quad \theta \sim \mathcal{IG}(1, 1),$$

which is equivalent to testing the null hypothesis $H_0: \theta = \theta_0 = 1$ against the alternative $H_1: \theta \neq 1$ when $x|\theta, \psi \sim \mathcal{N}(\psi, \theta)$. In that case, model $\mathfrak{M}_0$ clearly is embedded in model $\mathfrak{M}_1$. We have

$$m_0(x) = \frac{\exp(-x^2/4)}{\sqrt{2}\,\sqrt{2\pi}} \quad\text{and}\quad m_1(x) = \frac{(1 + x^2/4)^{-3/2}\,\Gamma(3/2)}{\sqrt{2}\,\sqrt{2\pi}},$$

and therefore

$$B_{01}(x) = \Gamma(3/2)^{-1}\,(1 + x^2/4)^{3/2}\,\exp(-x^2/4).$$

Dickey's assumption (2) on the prior densities is satisfied, since

$$\pi_1(\psi|\theta_0) = \frac{1}{\sqrt{2\pi}}\exp(-\psi^2/2) = \pi_0(\psi).$$
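As a sanity check on the closed-form marginals stated above, the following sketch compares them with a plain midpoint-rule quadrature. It uses only the Python standard library; the function names, the integration ranges, the grid sizes and the observation $x = 1.5$ are illustrative choices that do not come from the paper, and the change of variable $u = 1/\theta$ is merely one convenient way of handling the inverse gamma prior.

```python
import math

# sqrt(2) * sqrt(2 pi): the normalising constant that appears in m0 and m1
SQRT_4PI = math.sqrt(2.0) * math.sqrt(2.0 * math.pi)


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x (second parameter is a variance)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def m0_closed(x):
    return math.exp(-x * x / 4.0) / SQRT_4PI


def m1_closed(x):
    return math.gamma(1.5) * (1.0 + x * x / 4.0) ** (-1.5) / SQRT_4PI


def bayes_factor_closed(x):
    return (1.0 + x * x / 4.0) ** 1.5 * math.exp(-x * x / 4.0) / math.gamma(1.5)


def m0_quadrature(x, half_width=12.0, n=4000):
    """Midpoint rule for the integral of N(x; psi, 1) N(psi; 0, 1) over psi."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        psi = -half_width + (i + 0.5) * h
        total += normal_pdf(x, psi, 1.0) * normal_pdf(psi, 0.0, 1.0)
    return total * h


def m1_quadrature(x, u_max=60.0, n=20000):
    """Midpoint rule for m1(x), with psi integrated out analytically.

    Given theta, x ~ N(0, 2 theta). With u = 1/theta, the IG(1, 1) prior
    theta^(-2) exp(-1/theta) d(theta) becomes exp(-u) du on (0, infinity),
    truncated here at u_max.
    """
    h = u_max / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += normal_pdf(x, 0.0, 2.0 / u) * math.exp(-u)
    return total * h


if __name__ == "__main__":
    x = 1.5  # arbitrary observation, for illustration only
    print("m0 :", m0_quadrature(x), m0_closed(x))
    print("m1 :", m1_quadrature(x), m1_closed(x))
    print("B01:", m0_quadrature(x) / m1_quadrature(x), bayes_factor_closed(x))
```

The two columns printed for each quantity should agree up to quadrature error, which is larger for $m_1$ because the integrand behaves like $\sqrt{u}$ near zero.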
Therefore, since

$$\pi_1(\theta) = \theta^{-2}\exp(-\theta^{-1}), \qquad \pi_1(\theta_0) = \exp(-1),$$

and

$$\pi_1(\theta|x) = \Gamma(3/2)^{-1}\,(1 + x^2/4)^{3/2}\,\theta^{-5/2}\exp\{-\theta^{-1}(1 + x^2/4)\}\,\mathbb{I}_{\theta > 0}, \qquad \pi_1(\theta_0|x) = \Gamma(3/2)^{-1}\,(1 + x^2/4)^{3/2}\exp\{-(1 + x^2/4)\},$$

we clearly recover the Savage–Dickey representation

$$B_{01}(x) = \Gamma(3/2)^{-1}\,(1 + x^2/4)^{3/2}\,\exp(-x^2/4) = \pi_1(\theta_0|x)\big/\pi_1(\theta_0).$$

While the difficulty with the representation (1) is usually addressed in terms of computational aspects, given that $\pi_1(\theta|x)$ is rarely available in closed form, we argue in the current paper that the Savage–Dickey representation faces challenges of a deeper nature that led us to consider it a 'paradox'. First, by considering both prior and posterior marginal distributions of $\theta$ uniquely under the alternative model, (1) seems to indicate that the posterior probability of the null hypothesis $H_0: \theta = \theta_0$ is contained within the alternative hypothesis posterior distribution, even though the set of $(\theta, \psi)$'s such that $\theta = \theta_0$ has a zero probability under this alternative distribution. Second, as explained in Section 2, an even more fundamental difficulty with assumption (2) is that it is meaningless when examined (as it should be) within the mathematical axioms of measure theory.

Having stated those mathematical difficulties with the Savage–Dickey representation, we proceed to show, also in Section 2, that similar identities hold under no constraint on the prior distributions. In Section 3, we derive computational algorithms that exploit these representations to approximate the Bayes factor, in an approach that differs from the earlier solution of Verdinelli and Wasserman (1995). The paper concludes with an illustration in the setting of variable selection within a probit model.

2. A measure-theoretic paradox

When considering a standard probabilistic setting where the dominating measure on the parameter space is the Lebesgue measure, rather than a counting measure, the conditional density $\pi_1(\psi|\theta)$ is rigorously (Billingsley, 1986) defined as the density of the conditional probability distribution or, equivalently, by the condition that

$$\mathbb{P}((\theta, \psi) \in A_1 \times A_2) = \int_{A_1}\int_{A_2} \pi_1(\psi|\theta)\,\mathrm{d}\psi\;\pi_1(\theta)\,\mathrm{d}\theta = \int_{A_1 \times A_2} \pi_1(\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta,$$

for all measurable sets $A_1 \times A_2$, when $\pi_1(\theta)$ is the associated marginal density of $\theta$. Therefore, this identity points out the well-known fact that the conditional density function $\pi_1(\psi|\theta)$ is defined up to a set of measure zero both in $\psi$ for every value of $\theta$ and in $\theta$. This implies that arbitrarily changing the value of the function $\pi_1(\cdot|\theta)$ for a negligible collection of values of $\theta$ does not impact the properties of the conditional distribution.

In the setting where the Savage–Dickey representation is advocated, the value $\theta_0$ to be tested is not determined from the observations but is instead given in advance, since this is a testing problem. Therefore the density function $\pi_1(\psi|\theta_0)$ may be chosen in a completely arbitrary manner, and measure theory provides no reason to single out a unique representation of $\pi_1(\psi|\theta_0)$. This implies that there always is a version of the conditional density $\pi_1(\psi|\theta_0)$ such that Dickey's (1971) condition (2) is satisfied and, conversely, that there are infinitely many versions for which it is not satisfied. As a result, from a mathematical perspective, condition (2) cannot be seen as an assumption on the prior $\pi_1$ without further conditions, contrary to what is stated in the original Dickey (1971) and later in O'Hagan and Forster (2004), Consonni and Veronese (2008) and Wetzels et al. (2010).
This difficulty is the first part of what we call the Savage–Dickey paradox, namely that, as stated, the representation (1) relies on a mathematically void constraint on the prior distribution. In the specific case of the artificial example introduced above, the choice of the conditional density $\pi_1(\psi|\theta_0)$ is therefore arbitrary: if we pick for this density the density of the $\mathcal{N}(0, 1)$ distribution, there is agreement between $\pi_1(\psi|\theta_0)$ and $\pi_0(\psi)$, while, if we select instead the function $\exp(+\psi^2/2)$, which is not a density, there is no agreement in the sense of condition (2). The paradox is that this disagreement has no consequence whatsoever in the Savage–Dickey representation.

The second part of the Savage–Dickey paradox is that the representation (1) is solely valid for a specific and unique choice of a version of the density for both the conditional density $\pi_1(\psi|\theta_0)$ and the joint density $\pi_1(\theta_0, \psi)$. When looking at the derivation of (1), the choices of some specific versions of those densities are indeed noteworthy: in the following development,

$$\begin{aligned}
B_{01}(x) &= \frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\int \pi_1(\theta, \psi)\, f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta} && \text{[by definition]} \\
&= \frac{\int \pi_1(\psi|\theta_0)\, f(x|\theta_0, \psi)\,\mathrm{d}\psi\;\pi_1(\theta_0)}{\int \pi_1(\theta, \psi)\, f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta\;\pi_1(\theta_0)} && \text{[using a specific version of } \pi_1(\psi|\theta_0)\text{]} \\
&= \frac{\int \pi_1(\theta_0, \psi)\, f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)\,\pi_1(\theta_0)} && \text{[using a specific version of } \pi_1(\theta_0, \psi)\text{]} \\
&= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}, && \text{[using a specific version of } \pi_1(\theta_0|x)\text{]}
\end{aligned}$$

the second equality depends on a specific choice of the version of $\pi_1(\psi|\theta_0)$ but not on the choice of the version of $\pi_1(\theta_0)$, while the third equality depends on a specific choice of the version of $\pi_1(\psi, \theta_0)$ as equal to $\pi_0(\psi)\,\pi_1(\theta_0)$, thus related to the choice of the version of $\pi_1(\theta_0)$.
The last equality leading to the Savage–Dickey representation relies on the choice of a specific version of $\pi_1(\theta_0|x)$ as well, namely that the constraint

$$\frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)}$$

holds, where the right hand side is equal to the Bayes factor $B_{01}(x)$ and is therefore independent of the version. This rigorous analysis implies that the Savage–Dickey representation is tautological, due to the availability of a version of the posterior density that makes it hold.

As an illustration, consider once again the artificial example above. As already stressed, the value to be tested, $\theta_0 = 1$, is set prior to the experiment. Thus, without modifying either the prior distribution under model $\mathfrak{M}_1$ or the marginal posterior distribution of the parameter $\theta$ under model $\mathfrak{M}_1$, and in a completely rigorous measure-theoretic framework, we can select

$$\pi_1(\theta_0) = 100 = \pi_1(\theta_0|x).$$

For that choice, we obtain

$$\pi_1(\theta_0|x)\big/\pi_1(\theta_0) = 1 \neq B_{01}(x) = \Gamma(3/2)^{-1}\,(1 + x^2/4)^{3/2}\,\exp(-x^2/4).$$

Hence, for this specific choice of the densities, the Savage–Dickey representation does not hold.

Verdinelli and Wasserman (1995) have proposed a generalisation of the Savage–Dickey density ratio when the constraint (2) on the prior densities is not verified (we stress again that this is a mathematically void constraint on the respective prior distributions).
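As a brief aside on the artificial example, its dependence on versions can be made concrete in a few lines of code. The sketch below, which uses hypothetical function names and an arbitrary observation $x = 1.5$, evaluates the ratio $\pi_1(\theta_0|x)/\pi_1(\theta_0)$ twice: once with the continuous versions of both densities, and once with versions that are altered at the single point $\theta_0 = 1$ to take the value 100. Both pairs define the same prior and posterior distributions, since they differ only on a set of Lebesgue measure zero.

```python
import math

THETA_0 = 1.0  # tested value, fixed before seeing the data


def bayes_factor(x):
    """Closed-form B01(x) of the artificial example."""
    return (1.0 + x * x / 4.0) ** 1.5 * math.exp(-x * x / 4.0) / math.gamma(1.5)


def prior_continuous(theta):
    """Continuous version of the IG(1, 1) prior density pi_1(theta)."""
    return theta ** -2.0 * math.exp(-1.0 / theta) if theta > 0 else 0.0


def posterior_continuous(theta, x):
    """Continuous version of pi_1(theta | x), an IG(3/2, 1 + x^2/4) density."""
    if theta <= 0:
        return 0.0
    b = 1.0 + x * x / 4.0
    return b ** 1.5 * theta ** -2.5 * math.exp(-b / theta) / math.gamma(1.5)


def prior_altered(theta):
    """Same distribution; the density is redefined at theta_0 only."""
    return 100.0 if theta == THETA_0 else prior_continuous(theta)


def posterior_altered(theta, x):
    """Same distribution; the density is redefined at theta_0 only."""
    return 100.0 if theta == THETA_0 else posterior_continuous(theta, x)


def savage_dickey_ratio(prior, posterior, x):
    """Posterior over prior density at theta_0, for the supplied versions."""
    return posterior(THETA_0, x) / prior(THETA_0)


if __name__ == "__main__":
    x = 1.5  # arbitrary observation, for illustration only
    print("B01(x)            :", bayes_factor(x))
    print("continuous version:", savage_dickey_ratio(prior_continuous, posterior_continuous, x))
    print("altered version   :", savage_dickey_ratio(prior_altered, posterior_altered, x))
```

Only the continuous versions return the Bayes factor; the altered versions return 1 whatever the value of $x$.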
Verdinelli and Wasserman (1995) state that

$$\begin{aligned}
B_{01}(x) &= \frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)} && \text{[by definition]} \\
&= \pi_1(\theta_0|x)\,\frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)\,\pi_1(\theta_0|x)} && \text{[for any version of } \pi_1(\theta_0|x)\text{]} \\
&= \pi_1(\theta_0|x) \int \frac{\pi_0(\psi)\, f(x|\theta_0, \psi)}{m_1(x)\,\pi_1(\theta_0|x)}\,\frac{\pi_1(\psi|\theta_0)}{\pi_1(\psi|\theta_0)}\,\mathrm{d}\psi && \text{[for any version of } \pi_1(\psi|\theta_0)\text{]} \\
&= \pi_1(\theta_0|x) \int \frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\,\frac{f(x|\theta_0, \psi)\,\pi_1(\psi|\theta_0)}{m_1(x)\,\pi_1(\theta_0|x)}\,\mathrm{d}\psi\;\frac{\pi_1(\theta_0)}{\pi_1(\theta_0)} && \text{[for any version of } \pi_1(\theta_0)\text{]} \\
&= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} \int \frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\,\pi_1(\psi|\theta_0, x)\,\mathrm{d}\psi && \text{[for a specific version of } \pi_1(\psi|\theta_0, x)\text{]} \\
&= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\psi|x, \theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right].
\end{aligned}$$

This representation of Verdinelli and Wasserman (1995) therefore remains valid for any choice of versions for $\pi_1(\theta_0|x)$, $\pi_1(\theta_0)$, $\pi_1(\psi|\theta_0)$, provided the conditional density $\pi_1(\psi|\theta_0, x)$ is defined by

$$\pi_1(\psi|\theta_0, x) = \frac{f(x|\theta_0, \psi)\,\pi_1(\psi|\theta_0)\,\pi_1(\theta_0)}{m_1(x)\,\pi_1(\theta_0|x)},$$

which obviously means that the Verdinelli–Wasserman representation

$$B_{01}(x) = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\psi|x, \theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right] \tag{3}$$

is dependent on the choice of a version of $\pi_1(\theta_0)$.

We now establish that an alternative representation of the Bayes factor is available and can be exploited for approximation purposes. When considering the Bayes factor

$$B_{01}(x) = \frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\int \pi_1(\theta, \psi)\, f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta}\;\frac{\pi_1(\theta_0)}{\pi_1(\theta_0)},$$

where the right hand side obviously is independent of the choice of the version of $\pi_1(\theta_0)$, the numerator can be seen as involving a specific version in $\theta = \theta_0$ of the marginal posterior density

$$\tilde\pi_1(\theta|x) \propto \int \pi_0(\psi)\, f(x|\theta, \psi)\,\mathrm{d}\psi\;\pi_1(\theta),$$

which is associated with the alternative prior $\tilde\pi_1(\theta, \psi) = \pi_1(\theta)\,\pi_0(\psi)$.
Indeed, this density $\tilde\pi_1(\theta|x)$ appears as the marginal posterior density of the posterior distribution defined by the density

$$\tilde\pi_1(\theta, \psi|x) = \frac{\pi_0(\psi)\,\pi_1(\theta)\, f(x|\theta, \psi)}{\tilde m_1(x)},$$

where $\tilde m_1(x)$ is the proper normalising constant of the joint posterior density. In order to guarantee a Savage–Dickey-like representation of the Bayes factor, the appropriate version of the marginal posterior density in $\theta = \theta_0$, $\tilde\pi_1(\theta_0|x)$, is obtained by imposing

$$\frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\, f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\tilde m_1(x)}, \tag{4}$$

where, once again, the right hand side of the equation is uniquely defined. This constraint amounts to imposing that Bayes' theorem holds in $\theta = \theta_0$ instead of almost everywhere (and thus not necessarily in $\theta = \theta_0$). It then leads to the alternative representation

$$B_{01}(x) = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)},$$

which holds for any value chosen for $\pi_1(\theta_0)$ provided condition (4) applies.

This new representation may seem to be only formal, since both $m_1(x)$ and $\tilde m_1(x)$ are usually unavailable in closed form, but we can take advantage of the fact that the bridge sampling identity of Torrie and Valleau (1977) (see also Gelman and Meng, 1998) gives an unbiased estimator of $\tilde m_1(x)/m_1(x)$ since

$$\mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\, f(x|\theta, \psi)}{\pi_1(\theta, \psi)\, f(x|\theta, \psi)}\right] = \mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right] = \frac{\tilde m_1(x)}{m_1(x)}.$$

In conclusion, we obtain the representation

$$B_{01}(x) = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\theta, \psi|x)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right], \tag{5}$$

whose expectation part is uniquely defined (in that it does not depend on the choice of a version of the densities involved therein), while the first ratio must satisfy condition (4).
We further note that this representation clearly differs from Verdinelli and Wasserman's (1995) representation:

$$B_{01}(x) = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\psi|x, \theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right], \tag{6}$$

since (6) uses a specific version of the marginal posterior density on $\theta$ in $\theta_0$, as well as a specific version of the full conditional posterior density of $\psi$ given $\theta_0$.

3. Computational solutions

In this Section, we consider the computational implications of the above representation in the specific case of latent variable models, namely under the practical possibility of a data completion by a latent variable $z$ such that

$$f(x|\theta, \psi) = \int f(x|\theta, \psi, z)\, f(z|\theta, \psi)\,\mathrm{d}z$$

when $\pi_1(\theta|x, \psi, z) \propto \pi_1(\theta)\, f(x|\theta, \psi, z)$ is available in closed form, including the normalising constant.

We first consider a computational solution that approximates the Bayes factor based on our novel representation (5). Given a sample $(\bar\theta^{(1)}, \bar\psi^{(1)}, \bar z^{(1)}), \ldots, (\bar\theta^{(T)}, \bar\psi^{(T)}, \bar z^{(T)})$ simulated from (or converging to) the augmented posterior distribution $\tilde\pi_1(\theta, \psi, z|x)$, the sequence

$$\frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x, \bar z^{(t)}, \bar\psi^{(t)})$$

converges to $\tilde\pi_1(\theta_0|x)$ in $T$ under the following constraint on the selected version of $\tilde\pi_1(\theta_0|x, z, \psi)$ used therein:

$$\frac{\tilde\pi_1(\theta_0|x, z, \psi)}{\pi_1(\theta_0)} = \frac{f(x, z|\theta_0, \psi)}{\int f(x, z|\theta, \psi)\,\pi_1(\theta)\,\mathrm{d}\theta},$$

which again amounts to imposing that Bayes' theorem holds in $\theta = \theta_0$ for $\tilde\pi_1(\theta|x, z, \psi)$ rather than almost everywhere. (Note once more that the right hand side is uniquely defined, i.e. that it does not depend on a specific version.) Therefore, provided iid or MCMC simulations from the joint target $\tilde\pi_1(\theta, \psi, z|x)$ are available, the converging approximation to the Bayes factor $B_{01}(x)$ is then

$$\frac{1}{T}\sum_{t=1}^{T} \frac{\tilde\pi_1(\theta_0|x, \bar z^{(t)}, \bar\psi^{(t)})}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)}.$$
(We stress that the simulated sample is produced for the artificial target $\tilde\pi_1(\theta, \psi, z|x)$ rather than the true posterior $\pi_1(\theta, \psi, z|x)$ if $\tilde\pi_1(\theta, \psi) \neq \pi_1(\theta, \psi)$.) Moreover, if $(\theta^{(1)}, \psi^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)})$ is a sample independently simulated from (or converging to) $\pi_1(\theta, \psi|x)$, then

$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\psi^{(t)})}{\pi_1(\psi^{(t)}|\theta^{(t)})}$$

is a convergent and unbiased estimator of $\tilde m_1(x)/m_1(x)$. Therefore, the computational solution associated with our representation (5) of $B_{01}(x)$ leads to the following unbiased estimator of the Bayes factor:

$$\widehat{B_{01}}^{\mathrm{MR}}(x) = \frac{1}{T}\sum_{t=1}^{T} \frac{\tilde\pi_1(\theta_0|x, \bar z^{(t)}, \bar\psi^{(t)})}{\pi_1(\theta_0)}\;\;\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\psi^{(t)})}{\pi_1(\psi^{(t)}|\theta^{(t)})}. \tag{7}$$

Note that

$$\mathbb{E}^{\tilde\pi_1(\theta, \psi|x)}\left[\frac{\pi_1(\theta, \psi)\, f(x|\theta, \psi)}{\pi_0(\psi)\,\pi_1(\theta)\, f(x|\theta, \psi)}\right] = \mathbb{E}^{\tilde\pi_1(\theta, \psi|x)}\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right] = \frac{m_1(x)}{\tilde m_1(x)}$$

implies that

$$T \Big/ \sum_{t=1}^{T} \frac{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}{\pi_0(\bar\psi^{(t)})}$$

is another convergent (if biased) estimator of $\tilde m_1(x)/m_1(x)$. The availability of two estimates of the ratio $\tilde m_1(x)/m_1(x)$ is a major bonus from a computational point of view, since the comparison of both estimators may allow for the detection of infinite variance estimators, as well as for a check on the coherence of the approximations. The first approach requires two simulation sequences, one from $\tilde\pi_1(\theta, \psi|x)$ and one from $\pi_1(\theta, \psi|x)$, but this is a void constraint in that, if $H_0$ is rejected, a sample from the alternative hypothesis posterior will be required no matter what. Although we do not pursue this possibility in the current paper, note that a comparison of the different representations (including Verdinelli and Wasserman's, 1995, as exposed below) could be conducted by expressing them in the bridge sampling formalism (Gelman and Meng, 1998).
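To make the two-sample structure of the estimator (7) concrete, here is a minimal sketch on the artificial example of the Introduction, where no latent variable $z$ is needed. It relies on conditional distributions that are not stated in the paper and are derived here as assumptions of the sketch: under $\pi_1$, $\theta|x \sim \mathcal{IG}(3/2, 1 + x^2/4)$ and $\psi|\theta, x \sim \mathcal{N}(x/2, \theta/2)$, which allows for exact simulation, while under $\tilde\pi_1$ a Gibbs sampler alternates between $\theta|\psi, x \sim \mathcal{IG}(3/2, 1 + (x - \psi)^2/2)$ and $\psi|\theta, x \sim \mathcal{N}(x/(1 + \theta), \theta/(1 + \theta))$. The function names, the seed, the burn-in, the sample size and the observation $x = 0.5$ are arbitrary. A direct calculation, also not taken from the paper, suggests that the weights $\pi_0(\psi)/\pi_1(\psi|\theta)$ have infinite variance in this example when $x \neq 0$, so the output should be read as an illustration of the mechanics rather than as evidence about the precision of (7).

```python
import math
import random


def bayes_factor_closed(x):
    """Closed-form B01(x) of the artificial example, used as a reference."""
    return (1.0 + x * x / 4.0) ** 1.5 * math.exp(-x * x / 4.0) / math.gamma(1.5)


def draw_inverse_gamma(rng, shape, scale):
    """If G ~ Gamma(shape, 1), then scale / G ~ IG(shape, scale)."""
    return scale / rng.gammavariate(shape, 1.0)


def weight_average(x, n_draws, rng):
    """Average of pi_0(psi) / pi_1(psi | theta) over exact draws from pi_1(theta, psi | x)."""
    total = 0.0
    for _ in range(n_draws):
        theta = draw_inverse_gamma(rng, 1.5, 1.0 + x * x / 4.0)
        psi = rng.gauss(x / 2.0, math.sqrt(theta / 2.0))
        # ratio of the N(0, 1) density to the N(0, theta) density at psi
        total += math.sqrt(theta) * math.exp(-0.5 * psi * psi * (1.0 - 1.0 / theta))
    return total / n_draws


def rao_blackwell_average(x, n_draws, rng, burn_in=500):
    """Average of tilde-pi_1(theta_0 | x, psi) / pi_1(theta_0), theta_0 = 1,
    along a Gibbs chain targeting tilde-pi_1(theta, psi | x)."""
    psi, total = 0.0, 0.0
    for t in range(burn_in + n_draws):
        theta = draw_inverse_gamma(rng, 1.5, 1.0 + (x - psi) ** 2 / 2.0)
        psi = rng.gauss(x / (1.0 + theta), math.sqrt(theta / (1.0 + theta)))
        if t >= burn_in:
            b = 1.0 + (x - psi) ** 2 / 2.0
            # IG(3/2, b) density at theta_0 = 1, divided by pi_1(1) = exp(-1)
            total += b ** 1.5 * math.exp(-(b - 1.0)) / math.gamma(1.5)
    return total / n_draws


def estimate_bayes_factor(x, n_draws=20000, seed=0):
    """Product of the two averages, mirroring the structure of estimator (7)."""
    rng = random.Random(seed)
    return rao_blackwell_average(x, n_draws, rng) * weight_average(x, n_draws, rng)


if __name__ == "__main__":
    x = 0.5  # arbitrary observation, for illustration only
    print("estimate   :", estimate_bayes_factor(x))
    print("closed form:", bayes_factor_closed(x))
```

The first average uses the version of $\tilde\pi_1(\theta_0|x, \psi)$ for which Bayes' theorem holds at $\theta_0$, which is the inverse gamma density evaluated at $\theta_0 = 1$; any other value assigned at that single point would break the estimator without changing the distribution.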
We now consider a computational solution that approximates the Bayes factor and is based on Verdinelli and Wasserman's (1995) representation (6). Given a sample $(\theta^{(1)}, \psi^{(1)}, z^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)}, z^{(T)})$ simulated from (or converging to) $\pi_1(\theta, \psi, z|x)$, the sequence

$$\frac{1}{T}\sum_{t=1}^{T} \pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})$$

converges to $\pi_1(\theta_0|x)$ under the following constraint on the selected version of $\pi_1(\theta_0|x, z, \psi)$ used there:

$$\frac{\pi_1(\theta_0|x, z, \psi)}{\pi_1(\theta_0)} = \frac{f(x, z|\theta_0, \psi)}{\int f(x, z|\theta, \psi)\,\pi_1(\theta)\,\mathrm{d}\theta}.$$

Moreover, if $(\tilde\psi^{(1)}, \tilde z^{(1)}), \ldots, (\tilde\psi^{(T)}, \tilde z^{(T)})$ is a sample generated from (or converging to) $\pi_1(\psi, z|x, \theta_0)$, the sequence

$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\tilde\psi^{(t)})}{\pi_1(\tilde\psi^{(t)}|\theta_0)}$$

converges to

$$\mathbb{E}^{\pi_1(\psi|x, \theta_0)}\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$

under the constraint

$$\pi_1(\psi, z|\theta_0, x) \propto f(x, z|\theta_0, \psi)\,\pi_1(\psi|\theta_0).$$

Therefore, the computational solution associated with Verdinelli and Wasserman's (1995) representation (6) of $B_{01}(x)$ leads to the following unbiased estimator of the Bayes factor:

$$\widehat{B_{01}}^{\mathrm{VW}}(x) = \frac{1}{T}\sum_{t=1}^{T} \frac{\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\;\;\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\tilde\psi^{(t)})}{\pi_1(\tilde\psi^{(t)}|\theta_0)}. \tag{8}$$

Although, at first sight, the approximations (7) and (8) may look very similar, the simulated sequences used in both approximations differ: approximation (7) involves simulations from $\tilde\pi_1(\theta, \psi, z|x)$ and from $\pi_1(\theta, \psi, z|x)$, respectively, while approximation (8) relies on simulations from $\pi_1(\theta, \psi, z|x)$ and from $\pi_1(\psi, z|x, \theta_0)$, respectively.

4. An illustration

Although our purpose in this note is far from advancing the superiority of the Savage–Dickey type representations for Bayes factor approximation, given the wealth of available solutions for embedded models (Chen et al., 2000; Marin and Robert, 2010), we briefly consider an example where both Verdinelli and Wasserman's (1995) proposal and ours apply. The setting is Bayesian inference on the regression coefficients of a probit model, following the prior modelling adopted in Marin and Robert (2007) that extends Zellner's (1986) $g$-prior to generalised linear models. We take as data the Pima Indian diabetes study, a dataset available in R (R Development Core Team, 2008) with 332 women registered, and build a probit model predicting the presence of diabetes from three predictors, the glucose concentration, the diastolic blood pressure and the diabetes pedigree function. We assess the impact of the diabetes pedigree function, i.e. we test the nullity of the coefficient $\theta$ associated with this variable. For more details on the statistical and computational issues, see Marin and Robert (2010), since that paper relies on the Pima Indian probit model as a benchmark.

This probit model is a natural setting for completion by a truncated normal latent variable (Albert and Chib, 1993). We can thus easily implement a Gibbs sampler to produce output from all the posterior distributions considered in the previous Section. Besides, in that case, the conditional distribution $\pi_1(\theta|x, \psi, z)$ is a normal distribution with closed form parameters. It is therefore straightforward to compute the unbiased estimators (7) and (8). Figure 1 compares the variation of these approximations with other standard solutions covered in Marin and Robert (2010) for the same example, namely the regular importance sampling approximation based on the MLE asymptotic distribution, Chib's version based on the same completion, and a bridge sampling (Gelman and Meng, 1998) solution completing $\pi_0(\cdot)$ with the full conditional being derived from the conditional MLE asymptotic distribution.
The boxplots are all based on 100 replicates of $T = 20{,}000$ simulations. While the estimators (7) and (8) are not as accurate as Chib's version and as the importance sampler in this specific case, their variabilities remain of a reasonable order and are very comparable. The R code and the reformatted datasets used in this Section are available at the following address: http://www.math.univ-montp2.fr/~marin/savage/dickey.html.

[Figure 1: boxplots for the five approximations, labelled Bridge, MR, VW, Chib and IS, with axis values between 2.8 and 3.4.]

Fig 1. Comparison of the variabilities of five approximations of the Bayes factor evaluating the impact of the diabetes pedigree covariate upon the occurrence of diabetes in the Pima Indian population, based on a probit modelling. The boxplots are based on 100 replicas and the Savage–Dickey representation proposed in the current paper is denoted by MR, while Verdinelli and Wasserman's (1995) version is denoted by VW.

Acknowledgements

The authors are grateful to H. Doss and J. Rousseau for helpful discussions, as well as to M. Kilbinger for bringing the problem to their attention. Comments from the editorial team were also most useful to improve our exposition of the Savage–Dickey paradox. The second author also thanks Geoff Nicholls for pointing out the bridge sampling connection at the CRiSM workshop at the University of Warwick, May 31, 2010. This work has been supported by the Agence Nationale de la Recherche (ANR, 212, rue de Bercy 75012 Paris) through the 2009-2012 project Big'MC.

References

Albert, J. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. American Statist. Assoc., 88 669–679.

Billingsley, P. (1986). Probability and Measure. 2nd ed. John Wiley, New York.

Chen, M., Shao, Q. and Ibrahim, J. (2000). Monte Carlo Methods in Bayesian Computation. Springer-Verlag, New York.

Chib, S. (1995). Marginal likelihood from the Gibbs output. J. American Statist. Assoc., 90 1313–1321.

Chopin, N. and Robert, C. (2010). Properties of evidence. Biometrika. To appear.

Consonni, G. and Veronese, P. (2008). Compatibility of prior specifications across linear models. Statist. Science, 23 332–353.

Dickey, J. (1971). The weighted likelihood ratio, linear hypotheses on normal location parameters. Ann. Mathemat. Statist., 42 204–223.

Gelman, A. and Meng, X. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statist. Science, 13 163–185.

Jeffreys, H. (1939). Theory of Probability. 1st ed. The Clarendon Press, Oxford.

Marin, J. and Robert, C. (2010). Importance sampling methods for Bayesian discrimination between embedded models. In Frontiers of Statistical Decision Making and Bayesian Analysis (M.-H. Chen, D. Dey, P. Müller, D. Sun and K. Ye, eds.). Springer-Verlag, New York. To appear.

Marin, J.-M. and Robert, C. (2007). Bayesian Core. Springer-Verlag, New York.

O'Hagan, A. and Forster, J. (2004). Kendall's Advanced Theory of Statistics: Bayesian Inference. Arnold, London.

R Development Core Team (2008). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Robert, C. (2001). The Bayesian Choice. 2nd ed. Springer-Verlag, New York.

Torrie, G. and Valleau, J. (1977). Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comp. Phys., 23 187–199.

Verdinelli, I. and Wasserman, L. (1995). Computing Bayes factors using a generalization of the Savage–Dickey density ratio. J. American Statist. Assoc., 90 614–618.

Wetzels, R., Grasman, R. and Wagenmakers, E.-J. (2010). An encompassing prior generalization of the Savage–Dickey density ratio. Comput. Statist. Data Anal., 54 2094–2102.

Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distribution regression using Bayesian variable selection. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti. North-Holland / Elsevier, 233–243.
