Generalization of Jeffreys' Divergence Based Priors for Bayesian Hypothesis Testing

M.J. Bayarri, University of Valencia
G. García-Donato, University of Castilla-La Mancha*

August 3, 2018

Abstract

In this paper we introduce objective proper prior distributions for hypothesis testing and model selection based on measures of divergence between the competing models; we call them divergence based (DB) priors. DB priors have simple forms and desirable properties, like information (finite sample) consistency; often, they are similar to other existing proposals like the intrinsic priors; moreover, in normal linear model scenarios, they exactly reproduce Jeffreys-Zellner-Siow priors. Most importantly, in challenging scenarios such as irregular models and mixture models, the DB priors are well defined and very reasonable, while alternative proposals are not. We derive approximations to the DB priors as well as MCMC and asymptotic expressions for the associated Bayes factors.

Keywords: Bayes factors; Information consistency; Intrinsic priors; Irregular models; Kullback-Leibler divergence; Mixture models.

1 Introduction

For data $y$ with density $f(y \mid \theta, \nu)$, we consider the hypothesis testing problem

$$H_1: \theta = \theta_0 \quad \text{vs.} \quad H_2: \theta \neq \theta_0, \qquad (1)$$

where $\theta_0 \in \Theta$ is a known value. This is equivalent to the model selection problem of choosing between the models

$$M_1: f_1(y \mid \nu_1) = f(y \mid \theta_0, \nu_1) \quad \text{vs.} \quad M_2: f_2(y \mid \theta, \nu_2) = f(y \mid \theta, \nu_2), \qquad (2)$$

where the notation reflects the fact that $\nu_1$ and $\nu_2$ often represent different quantities in each model.

*Address for correspondence: Gonzalo García-Donato, Department of Economy, Plaza Universidad 2, 02071 Albacete, Spain. Email: Gonzalo.GarciaDonato@uclm.es
In Jeffreys' scenarios (Jeffreys, 1961), $\nu_1$ and $\nu_2$ had the same meaning; he called $\theta$ the new parameter, and $\nu_1$ and $\nu_2$ the common parameters (also known as nuisance parameters). We revisit this issue in Section 4.

We aim for an objective Bayes solution to this model selection problem; that is, no 'external' (subjective) information is assumed, other than the data, $y$, and the information implicitly needed to pose the problem, choose the competing models, etc. An excellent exposition of the advantages of Bayesian methods, especially objective Bayes methods, for problems with model uncertainty is Berger and Pericchi (2001).

Usual Bayesian solutions (for 0-$k_i$ loss functions) to (1) (or, equivalently, to (2)) are based on the posterior odds

$$\frac{\Pr(H_1 \mid y)}{\Pr(H_2 \mid y)} = \frac{\Pr(H_1)}{\Pr(H_2)} \times B_{12},$$

where $\Pr(H_i)$, $i = 1, 2$, are the prior probabilities of the hypotheses, and $B_{12}$ is the Bayes factor of $H_1$ against $H_2$:

$$B_{12} = \frac{m_1(y)}{m_2(y)} = \frac{\int f_1(y \mid \nu_1)\, \pi_1(\nu_1)\, d\nu_1}{\int f_2(y \mid \theta, \nu_2)\, \pi_2(\theta, \nu_2)\, d\theta\, d\nu_2}, \qquad (3)$$

where $\pi_1(\nu_1)$ is the prior under $H_1$ and $\pi_2(\theta, \nu_2)$ the prior under $H_2$. That is, $B_{12}$ is the ratio of the marginal (averaged) likelihoods of the models. It is common practice in objective Bayes approaches to concentrate on derivation of the Bayes factors, leaving the ultimate choice (whether objective or subjective) of the prior model probabilities (and the derivation of the posterior odds) to the user. Bayes factors were extensively used by Jeffreys (1961) as a measure of evidence in favor of a model (see also Berger, 1985; Berger and Delampady, 1987; Berger and Sellke, 1987); Kass and Raftery (1995) is a good reference for reviews and applications. Bayes factors are also crucial ingredients of model averaging approaches (see Clyde, 1999; Hoeting et al., 1999).
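To fix ideas, (3) can be evaluated in closed form in conjugate settings. Below is a minimal sketch, assuming Python with scipy; it tests a Bernoulli point null against a Beta(a, b)-averaged alternative. The Beta prior and the helper name are placeholders for illustration only, not one of the priors derived later in the paper.

```python
import numpy as np
from scipy.special import betaln

def bayes_factor_bernoulli(successes, n, theta0, a=1.0, b=1.0):
    """B12 in (3): point-null likelihood over the Beta(a, b)-averaged likelihood."""
    # m1(y) = f(y | theta0), computed on the log scale
    log_m1 = successes * np.log(theta0) + (n - successes) * np.log(1 - theta0)
    # m2(y) = ∫ theta^s (1-theta)^(n-s) Be(theta | a, b) dtheta  (Beta-binomial identity)
    log_m2 = betaln(a + successes, b + n - successes) - betaln(a, b)
    return np.exp(log_m1 - log_m2)

print(bayes_factor_bernoulli(5, 10, 0.5))   # data at the null: B12 > 1, favoring H1
print(bayes_factor_bernoulli(9, 10, 0.5))   # data far from the null: B12 < 1
```

Note that only the *ratio* of marginal likelihoods matters; the prior model probabilities are left to the user, exactly as described above.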
In the rest of the paper we concentrate on the derivation of objective priors with which to compute Bayes factors. A main issue in deriving objective Bayes factors is the appropriate choice of $\pi_1(\nu_1)$ and $\pi_2(\theta, \nu_2)$ for use in (3). It is well known that the familiar improper objective priors (or non-informative priors) used for estimation problems (under a fixed model) are usually seriously inadequate in the presence of model uncertainty, generally producing arbitrary answers. (Interesting exceptions are studied in Berger, Pericchi and Varshavsky, 1998.) Of course, when improper priors cannot be used, resorting to arbitrarily vague (but proper) priors is not a cure, and is generally even worse. Another bad solution often encountered in practice is the use of an apparently 'innocuous', harmless, but yet arbitrary, proper prior, since it can severely dominate the likelihood in ways that are not anticipated (and cannot be investigated in high dimensional problems).

There are two basic approaches to computing Bayes factors when there is not enough information available for a trustworthy subjective assessment of $\pi_1(\nu_1)$ and $\pi_2(\theta, \nu_2)$. A very successful one is to directly derive the objective Bayes factors themselves, usually by 'training' and calibrating in several ways the inappropriate Bayes factors obtained from the usual objective improper priors (see Berger and Pericchi, 2001, for reviews and references). However, all these objective Bayes factors should ultimately be checked to correspond (approximately) to a genuine Bayes factor derived from a sensible prior. The alternative approach is to look for 'formal rules' for constructing 'objective' but proper priors that have nice properties and are appropriate for use in model selection; Bayes factors are then simply computed from these objective proper priors.
Whether these Bayes factors are appropriate can then be directly judged from the adequacy of the priors used.

Choice of prior distributions in scenarios of model uncertainty is still largely an open question, and only partial answers are known. Several methods have been proposed for use in general scenarios, like the arithmetic intrinsic (AI) priors (Berger and Pericchi, 1996; Moreno, Bertolino and Racugno, 1998); the fractional intrinsic (FI) priors (De Santis and Spezaferri, 1999; Berger and Mortera, 1999); the expected posterior (EP) priors (Pérez and Berger, 2002); the unit information priors (Kass and Wasserman, 1995); and predictively matched priors (Ibrahim and Laud, 1994; Laud and Ibrahim, 1995; Berger, Pericchi and Varshavsky, 1998; Berger and Pericchi, 2001). In the specific context of linear models, widely used priors with nice properties are the Jeffreys-Zellner-Siow (JZS) priors (Jeffreys, 1961; Zellner and Siow, 1980, 1984; Bayarri and García-Donato, 2007). An interesting generalization is the mixtures of g-priors (Liang et al., 2007).

All these methods are insightful, provide many interesting and useful ideas, and have indeed been shown to behave nicely in a number of testing and model selection problems. Nonetheless, except for the very specific scenario of linear models, nobody seems to have investigated the ramifications of Jeffreys' (1961) pioneering proposal (see the end of Section 2). His was indeed the first general derivation of objective priors for hypothesis testing, and was intended as a generalization of his proposal for testing a normal mean. Given the success of the generalization of this Jeffreys testing prior to linear models (Zellner and Siow, 1980, 1984; Bayarri and García-Donato, 2007), it is somewhat surprising that his general proposal has not been pursued.
We think that it is historically important to pursue this investigation, and we do so in this paper. Specifically, we generalize Jeffreys' pioneering suggestion, and use divergence measures between the competing models to derive the required (proper) priors. We call these priors divergence based (DB) priors. The main motivation was to generalize the useful JZS priors for use in scenarios other than the normal linear model, while at the same time extending Jeffreys' general proposal. We will show that the DB priors are indeed the JZS priors in linear model contexts; also, they are as easy to derive as (often easier than) other popular proposals (AI, FI or EP priors), being quite similar to them in many instances; most interestingly, they are well defined in certain scenarios where all of the other proposals fail.

For clarity of exposition, we first consider the case where there are no nuisance parameters. Development for the general case is delayed until Section 4, once the basic ideas have been introduced and the behavior of DB priors studied in this considerably simpler scenario.

2 DB priors

Assume first the problem without nuisance parameters:

$$M_1: f_1(y) = f(y \mid \theta_0) \quad \text{vs.} \quad M_2: f_2(y \mid \theta) = f(y \mid \theta). \qquad (4)$$

That is, the simpler model ($M_1$) involves no unknown parameters; hence only the prior for $\theta$ under $M_2$ is needed. We drop the subindex of the previous section and denote this prior simply by $\pi(\theta)$; clearly $\pi(\theta)$ has to be proper.

Our proposal for DB priors for $\theta$ will be in terms of divergence measures between the competing models $f(y \mid \theta_0)$ and $f(y \mid \theta)$, based on the Kullback-Leibler directed divergence

$$KL[\theta_0 : \theta] = \int \left[\log f(y \mid \theta) - \log f(y \mid \theta_0)\right] f(y \mid \theta)\, dy, \qquad (5)$$

(assuming continuous $y$ for simplicity).
$KL$ is a measure of the information in $y$ to discriminate between $\theta$ and $\theta_0$; it is designed to measure how far apart the two competing densities are in the sense of the likelihood (Schervish, 1995). We do not directly use $KL$ to define the DB prior because it is not symmetric in its arguments, and hence would likely result in nonsymmetric priors; however, symmetric measures of divergence can be derived by taking sums (which was Jeffreys' choice) or minima of $KL$ divergences. We define

$$D^S[\theta, \theta_0] = KL[\theta : \theta_0] + KL[\theta_0 : \theta], \qquad (6)$$

and

$$D^M[\theta, \theta_0] = 2 \times \min\{KL[\theta : \theta_0],\; KL[\theta_0 : \theta]\}. \qquad (7)$$

We multiply the minimum in the definition of $D^M$ by 2 so that both measures are on the same scale; indeed, in some symmetric models (like the normal scenario) both measures of divergence coincide. Generalizations of $KL$, $D^S$ and $D^M$ to include nuisance parameters are discussed in Section 4.

Note that $D^M$ is well defined even when one of the directed $KL$ divergences is not, which is the case when the competing models have different supports. Except for these irregular scenarios, $D^S$ is well defined and considerably easier to derive than $D^M$. Most of the derivations and properties to follow are common to both $D^S$ and $D^M$. To avoid tedious repetition, we then simply use $D$ to refer to either of them, using the superindex $S$ or $M$ only when necessary. It is well known that $D \geq 0$, with equality if and only if $\theta = \theta_0$, although it is not a metric (the triangle inequality does not hold).

Our proposal is based on unitary measures of divergence, $\bar D$, which we take to be $D$ divided by the effective sample size $n^*$: $\bar D = D / n^*$. For simple univariate i.i.d. data the effective sample size equals the number of scalar data points, but it need not do so in general.
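Both symmetrized divergences in (6) and (7) can be computed by quadrature from any pair of log densities. The following is a minimal sketch, assuming Python with scipy; it uses the one-parameter exponential model (which reappears in Section 3.2), for which the directed divergence has the closed form $\log(\mu_0/\mu) + \mu/\mu_0 - 1$. The helper names are ours.

```python
import numpy as np
from scipy.integrate import quad

def kl(logp, logq, a=0.0, b=np.inf):
    """Directed divergence ∫ p (log p - log q) dy, by numerical quadrature."""
    val, _ = quad(lambda y: np.exp(logp(y)) * (logp(y) - logq(y)), a, b)
    return val

def d_sum(logp, logq, a=0.0, b=np.inf):
    # D^S in (6): sum of the two directed divergences (Jeffreys' choice)
    return kl(logp, logq, a, b) + kl(logq, logp, a, b)

def d_min(logp, logq, a=0.0, b=np.inf):
    # D^M in (7): twice the minimum, putting both measures on the same scale
    return 2.0 * min(kl(logp, logq, a, b), kl(logq, logp, a, b))

# exponential density with mean mu, kept on the log scale for numerical stability
logf = lambda mu: (lambda y: -np.log(mu) - y / mu)

mu, mu0 = 2.0, 5.0
closed = np.log(mu0 / mu) + mu / mu0 - 1   # closed form for this model
print(kl(logf(mu), logf(mu0)), closed)     # the two should agree
print(d_sum(logf(mu), logf(mu0)), d_min(logf(mu), logf(mu0)))
```

Since $D^S$ and $D^M$ are symmetric in their two arguments, the direction of any single $KL$ term never matters in these computations, and by construction $D^M \leq D^S$.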
Indeed, in complex situations the effective sample size can be a difficult concept; although there have been several attempts in the literature to formalize it (see, e.g., Pauler, 1998; Pauler, Wakefield and Kass, 1999; Berger et al., 2007), no generally agreed definition seems to exist. In all of the examples in this paper it is quite clear what $n^*$ should be, so for now we rely on simple, intuitive interpretations.

2.1 Motivation: scalar location parameters

Suppose $y$ is a random sample from a univariate location family:

$$f(y \mid \theta) = \prod_{i=1}^n f(y_i \mid \theta) = \prod_{i=1}^n g(y_i - \theta), \quad \theta \in \mathbb{R}.$$

It has been argued (Berger and Delampady, 1987; Berger and Sellke, 1987) that in symmetric problems with $\Theta = \mathbb{R}$, objective testing priors $\pi(\theta)$ under $H_2: \theta \neq \theta_0$ should be unimodal and symmetric about $\theta_0$; such priors prevent introducing excessive bias toward $H_2$. Accordingly, we look for a proper $\pi(\theta)$ which, in this simple scenario, has these desirable characteristics and which is easily generalizable to other situations.

As before, let $\bar D$ be a unitary symmetrized divergence. We consider use of a function $h$ of $\bar D$ as a testing prior under $H_2$; that is, $\pi(\theta) \propto h(\bar D[\theta, \theta_0])$. Since $\pi$ has to be proper, $h(t)$ has to be a decreasing (non-increasing) function for $t > 0$. A first possibility could be to take $h(t) = \exp\{-q\,t\}$ for some $q > 0$, but this results in priors with short tails. Short-tailed priors are usually not adequate for model selection, since they tend to exhibit undesirable (finite sample) inconsistent behavior (see Liang et al., 2007). We explore instead use of the functions

$$h_q(t) = (1 + t)^{-q},$$

where $q > 0$ controls the thickness of the tails of $\pi(\theta)$. Let

$$c(q) = \int h_q(\bar D[\theta, \theta_0])\, d\theta = \int \left(1 + \frac{D[\theta, \theta_0]}{n^*}\right)^{-q} d\theta,$$

and define

$$\underline{q} = \inf\{q \geq 0 : c(q) < \infty\}, \qquad q^* = \underline{q} + 1/2.$$
For finite $\underline{q}$, our specific proposal for a DB prior in this location problem is

$$\pi^D(\theta) = c(q^*)^{-1} \left(1 + \frac{D[\theta, \theta_0]}{n^*}\right)^{-q^*} \propto h_{q^*}(\bar D[\theta, \theta_0]). \qquad (8)$$

Generalization to vector valued $\theta$ is trivial. We use $q^*$ instead of the more natural $\underline{q}$ because $\underline{q}$ is not guaranteed to produce proper priors. Of course, if $\underline{q}$ is finite, any $q = \underline{q} + \delta$ with $\delta > 0$ results in a proper prior, and hence could have been used to define a DB prior. Our specific proposal, $\delta = 1/2$, was chosen to reproduce the well known Jeffreys-Zellner-Siow prior in the normal context; in general this choice results in densities with heavy tails. Moreover, we have found that in general $0 < \delta < 1$ is a good choice, since it produces priors without moments, which in normal scenarios is needed to avoid undesirable behavior of conjugate $g$-priors (Liang et al., 2007).

The following lemma establishes the desired symmetry and unimodality of the DB prior. The proof follows easily from properties of $D$ in these location problems and is omitted.

Lemma 2.1. Assume $\underline{q} < \infty$; then $\pi^D(\theta)$ is unimodal and symmetric around $\theta_0$.

Definition of DB priors for scale parameters is also direct. Indeed, assume that $\theta$ is a scale parameter for a positive random variable $X$; then $\xi = \log \theta$ is a location parameter for $Y = \log X$, with density $f^*(y \mid \xi)$. Applying the definition in (8), the DB prior for $\xi$ is

$$\pi^D(\xi) \propto h_{q^*}(\bar D^*[\xi, \xi_0]), \qquad (9)$$

where $\xi_0 = \log(\theta_0)$ and $\bar D^*[\xi, \xi_0]$ is the unitary measure of divergence between $f^*(y \mid \xi_0)$ and $f^*(y \mid \xi)$. Therefore, in the original parameterization,

$$\pi^D(\theta) \propto h_{q^*}(\bar D^*[\log \theta, \log \theta_0])\, \frac{1}{\theta} = h_{q^*}(\bar D[\theta, \theta_0])\, \pi^N(\theta), \qquad (10)$$

where, because of the invariance of $\bar D$ under reparameterizations, $\bar D^*[\log \theta, \log \theta_0] = \bar D[\theta, \theta_0]$, and $\pi^N(\theta) = 1/\theta$ is the noninformative prior (right Haar invariant prior) for $\theta$.
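The construction (8) can be traced end to end in the simplest case. For a $N(\theta, 1)$ mean with $n^* = n$ (an assumption we make for illustration), $\bar D[\theta, \theta_0] = (\theta - \theta_0)^2$, $c(q)$ is finite exactly when $q > 1/2$, so $\underline{q} = 1/2$, $q^* = 1$, and (8) is exactly the Cauchy (JZS) prior. A small numerical sketch, assuming Python with scipy:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import cauchy

theta0 = 0.0
dbar = lambda th: (th - theta0) ** 2   # unitary divergence D/n* for a N(theta, 1) mean

def c(q):
    # normalizing integral c(q) = ∫ (1 + Dbar)^{-q} dtheta; finite exactly when q > 1/2
    val, _ = quad(lambda th: (1 + dbar(th)) ** (-q), -np.inf, np.inf)
    return val

q_star = 1.0   # q_underbar = 1/2, so q* = 1/2 + 1/2 = 1
pi_db = lambda th: (1 + dbar(th)) ** (-q_star) / c(q_star)

# with q* = 1 the DB prior is exactly the Cauchy(theta0, 1) density, i.e. the JZS prior
print(pi_db(0.0), cauchy.pdf(0.0, loc=theta0))
```

With the exponential choice $h(t) = e^{-qt}$ discussed above, the same construction would instead yield a short-tailed (normal-like) prior, which is precisely what the $h_q$ family is designed to avoid.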
The definition of DB priors for general parameters, formalized in the next section, is basically a generalization of (10).

2.2 General parameters

Consider the more general problem (4), let $\pi^N(\theta)$ be an objective (usually improper) 'estimation' prior (reference, invariant, Jeffreys, uniform, ... prior) for $\theta$, and let $\xi$ be a transformation such that $\pi^N(\xi) = 1$ for $\xi = \xi(\theta)$. We can then derive a DB prior for $\theta$ by considering $\xi$ as a "location parameter", applying the definition (8), and transforming back to $\theta$. This transformation was first proposed by Jeffreys (1961). Bernardo (2005) uses it with a reference prior $\pi^N$ for a scalar $\theta$, and notes that $\xi$ asymptotically behaves as a location parameter. Giving $\xi$ a DB prior for location parameters results in

$$\pi^D(\xi) \propto h_{q^*}(\bar D^*[\xi, \xi_0]), \qquad (11)$$

where, as before, $\bar D^*[\xi, \xi_0]$ denotes the 'unit' (symmetrized) discrepancy between $f^*(y \mid \xi)$ and $f^*(y \mid \xi_0)$, and $\xi_0 = \xi(\theta_0)$. Hence, the corresponding (DB) prior for $\theta$ is

$$\pi^D(\theta) \propto h_{q^*}(\bar D^*[\xi(\theta), \xi(\theta_0)])\, |J(\theta)| \propto h_{q^*}(\bar D[\theta, \theta_0])\, \pi^N(\theta), \qquad (12)$$

as long as $\pi^N$ is invariant under transformations; $J(\theta)$ is the Jacobian of the transformation. It should be noted from (12) that the explicit transformation to $\xi$ is not needed in order to derive the prior $\pi^D$. We can now formally define a DB prior as follows:

Definition 2.1. (General DB priors) For the model selection problem (4), let $\bar D[\theta, \theta_0]$ be a unitary measure of divergence between $f(y \mid \theta)$ and $f(y \mid \theta_0)$. Also let $\pi^N(\theta)$ be an objective (possibly improper) estimation prior for $\theta$ under the complex model, $M_2$, and let $h_q(\cdot)$ be a decreasing function. Define

$$\underline{q} = \inf\{q \geq 0 : c(q) < \infty\}, \qquad q^* = \underline{q} + 1/2,$$

where $c(q) = \int h_q(\bar D[\theta, \theta_0])\, \pi^N(\theta)\, d\theta$.
If $q^* < \infty$, then a divergence based prior under $M_2$ is defined as

$$\pi^D(\theta) = c(q^*)^{-1}\, h_{q^*}(\bar D[\theta, \theta_0])\, \pi^N(\theta). \qquad (13)$$

Note that, by definition, the DB priors either do not exist, or they are proper (and hence do not involve arbitrary constants).

Specific proposals. Definition 2.1 is very general, in that several definitions of $\bar D$, $h_q$ and $\pi^N$ could be explored (as well as different choices of $0 < \delta < 1$ in $q^* = \underline{q} + \delta$). We give specific choices which, in part, are based on previous explorations and desired properties of the resulting $\pi^D$; however, our specific choices are mainly intended to reproduce the JZS priors in normal scenarios, so that our proposals for DB priors can best be contemplated as extensions of JZS priors to non-normal scenarios. In what follows, we take $D$ to be either $D^S$ in (6) or $D^M$ in (7), and $h_q(t) = (1 + t)^{-q}$. Since we will explore both, we need different notations:

Definition 2.2. (Sum and minimum DB priors) The sum DB prior $\pi^S$ and the minimum DB prior $\pi^M$ are the DB priors given in Definition 2.1 with $h_q(t) = (1 + t)^{-q}$ and $D$ being, respectively, $D^S$ (see (6)) and $D^M$ (see (7)). When needed, we refer to their corresponding $c$'s and $q$'s as $c^S$, $\underline{q}^S$, $q^{S*}$, and $c^M$, $\underline{q}^M$, $q^{M*}$, respectively.

Since $D^M \leq D^S$ and $h_q$ is decreasing, it can easily be shown that $c^S(q) \leq c^M(q)$, so that, for regular problems (in which $\bar D^S < \infty$), $q^{M*} < \infty$ implies $q^{S*} < \infty$; therefore, in these problems, existence of $\pi^M$ implies existence of $\pi^S$.

It should be noted that, although we are not explicitly assuming a specific objective prior $\pi^N$ in the definition of DB priors, properties of $\pi^N$ are inherited by the DB prior $\pi^D$; some properties will be crucial for sensible DB priors, and hence the appropriate choice of $\pi^N$ becomes very important. We now explore some appealing properties of DB priors.
Since these are common to both proposals in Definition 2.2, we drop unneeded super- and subindexes and refer to the prior simply as $\pi^D$. This convention will be kept throughout the paper; the distinction between $\pi^S$ and $\pi^M$ will only be made when needed.

Local behavior of DB priors. It can easily be checked that, when $\pi^N(\theta) = 1$ (as when $\theta$ is a location parameter), the mode of $\pi^D$ is $\theta_0$ (so $\pi^D$ is 'centered' at the simplest model). We can also exploit the following (well known) approximate relationship between Kullback-Leibler divergence and Fisher information (see Kullback, 1968): for $\theta$ in a neighborhood of $\theta_0$,

$$KL[\theta_0, \theta] \approx \frac{1}{2} (\theta - \theta_0)^t J(\theta_0) (\theta - \theta_0),$$

where $J(\theta_0)$ is the expected Fisher information matrix evaluated at $\theta_0$. Hence, in a neighborhood of $\theta_0$, the DB priors approximately behave as $k$-variate Student distributions, centered at $\theta_0$ and scaled by the Fisher information matrix under the simpler model. That is,

$$\pi^D(\theta) \approx St_k\!\left(\theta_0,\; n^* J(\theta_0)^{-1}/d,\; d\right),$$

where $d = 2\underline{q} - k + 1$. Moreover, by definition of $q^*$, $d$ above is generally close to 1, and then the DB priors are approximately Cauchy. As highlighted in Section 4.3.2, the approximation above holds exactly in normal scenarios with $d = 1$, and hence the DB priors reproduce precisely the proposals of Jeffreys-Zellner-Siow.

Invariance under one-to-one transformations. An important question is whether the DB priors are invariant under reparameterizations of the problem. Suppose that $\xi = \xi(\theta)$ is a one-to-one monotone mapping $\xi: \Theta \to \Theta_\xi$. The model selection problem (4) now becomes

$$M_1^*: f_1^*(y) = f^*(y \mid \xi_0) \quad \text{vs.} \quad M_2^*: f_2^*(y \mid \xi) = f^*(y \mid \xi), \qquad (14)$$

where $f^*(y \mid \xi(\theta)) = f(y \mid \theta)$ and $\xi_0 = \xi(\theta_0)$. The next result shows that, if $\pi^N$ is invariant under the reparameterization $\xi(\theta)$, then so are the DB priors.

Proposition 1.
Let $\pi_\theta^D(\theta)$ and $\pi_\xi^D(\xi)$ denote the DB priors for the original (4) and reparameterized (14) problems, respectively. If

$$\pi_\theta^N(\theta) \propto \pi_\xi^N(\xi(\theta))\, |J_\xi(\theta)|,$$

where $J_\xi$ is the Jacobian of the transformation, then

$$\pi_\theta^D(\theta) = \pi_\xi^D(\xi(\theta))\, |J_\xi(\theta)|.$$

Proof. See Appendix.

Under the conditions of Proposition 1, Bayes factors computed from DB priors are not affected by reparameterizations. It is important to note that invariance of DB priors is a direct consequence of both the invariance of the divergence measure used and the invariance of $\pi^N$. Some objective priors $\pi^N$ invariant under reparameterizations are Jeffreys' priors and (partially) the reference priors.

Compatibility with sufficient statistics. DB priors are sometimes compatible with reduction of the data via sufficient statistics. This attractive property is not shared by other objective Bayesian methods, such as intrinsic Bayes factors.

Proposition 2. Let $t = t(y)$ be a sufficient statistic for $\theta$ in $f(y \mid \theta)$, with distribution $f^*(t \mid \theta)$. Assume that $\pi^N$ and $n^*$ remain the same in the problem defined by $f^*$; then the DB prior $\pi^D$ for the original problem (4) is the same as the DB prior for the (sufficiency-)reduced testing problem

$$M_1^*: f_1^*(t) = f^*(t \mid \theta_0) \quad \text{vs.} \quad M_2^*: f_2^*(t \mid \theta) = f^*(t \mid \theta). \qquad (15)$$

Proof. See Appendix.

DB priors and Jeffreys' general rule. Jeffreys (1961) proposed objective proper priors for testing situations other than the normal mean. Specifically, when $y$ is a random sample of size $n$, and for univariate $\theta$, he proposed the following model testing prior:

$$\pi^J(\theta) = \frac{1}{\pi} \frac{d}{d\theta} \tan^{-1}\!\left\{\frac{D^S[\theta, \theta_0]}{n}\right\}^{1/2} = \frac{1}{\pi} \left(1 + \frac{D^S[\theta, \theta_0]}{n}\right)^{-1} \frac{d}{d\theta}\left\{\frac{D^S[\theta, \theta_0]}{n}\right\}^{1/2}. \qquad (16)$$

This reduces to Jeffreys' Cauchy proposal when $\theta$ is a normal mean.
Also, when $|\theta - \theta_0|$ is small, $\pi^J(\theta)$ can be approximated by

$$\pi^J(\theta) \approx \frac{1}{\pi} \left(1 + \bar D^S[\theta, \theta_0]\right)^{-1} \pi_J^N(\theta), \qquad (17)$$

where $\pi_J^N(\theta)$ is Jeffreys' (estimation) prior (i.e., the square root of the expected Fisher information).

Note that $\pi^J$ can lead to improper priors and, at least in principle, cannot be applied to multivariate parameters. However, the approximation (17) was a main inspiration for the definition of DB priors, and there are clear similarities between them.

3 Comparative examples: simple null

In the spirit of Berger and Pericchi (2001), we investigate in this section the performance of DB priors in a series of situations chosen to be somehow representative of wider classes of statistical problems. We also explicitly derive well established alternative proposals for objective priors in Bayesian hypothesis testing and compare their performance with that of DB priors. We show that in simple standard situations, DB priors produce results similar to these alternative proposals. More interestingly, in more sophisticated situations where these proposals fail (models with irregular asymptotics or improper likelihoods), the DB priors are well defined and very sensible.

We will compute and compare Bayes factors derived with DB priors with those derived with two of the most popular general objective priors for objective Bayes model selection, namely:

1. Arithmetic intrinsic prior:

$$\pi^A(\theta) = \pi^N(\theta)\, E_{\theta}^{M_2}\!\left(B_{12}^N(y^*)\right),$$

where the Bayes factor $B^N$ is computed with the objective estimation prior $\pi^N$, and $y^*$ is an imaginary sample of minimal size such that $0 < m_2^N(y^*) < \infty$.

2. Fractional intrinsic prior:

$$\pi^F(\theta) = \pi^N(\theta)\, \frac{\exp\{m\, E_{\theta}^{M_2} \log f(y \mid \theta_0)\}}{\int \exp\{m\, E_{\theta}^{M_2} \log f(y \mid \tilde\theta)\}\, \pi^N(\tilde\theta)\, d\tilde\theta}.$$
In the iid case and asymptotically, $\pi^A$ produces the arithmetic intrinsic Bayes factor (Berger and Pericchi, 1996), and $\pi^F$ the fractional Bayes factor (O'Hagan, 1995) if the exponent of the likelihood is $b = m/n$ for a fixed $m$ (see De Santis and Spezaferri, 1999). Following the recommendation of Berger and Pericchi (2001), we take $m$ to be the size of the minimal training sample $y^*$.

In the examples of this section, $y$ is an iid sample of size $n$ from $f(y \mid \theta)$, and unless otherwise specified, $n^* = n$ ($n^*$ denotes effective sample size). We let $B_{12}^S$ denote the Bayes factor in favor of $H_1$ computed with $\pi^S$ (see Definition 2.2); $B_{12}^M$, $B_{12}^A$ and $B_{12}^F$ are defined similarly.

3.1 Bounded parameter space (Example 1)

We begin with a simple example in which the data are a random sample from a Bernoulli distribution, that is,

$$f(y \mid \theta) = \theta^y (1 - \theta)^{1-y}, \quad y \in \{0, 1\}, \quad \theta \in \Theta = [0, 1],$$

and we want to test $M_1: \theta = \theta_0$ versus $M_2: \theta \neq \theta_0$. The usual objective estimation prior (both reference and Jeffreys) in this problem is the beta density

$$\pi^N(\theta) = Be(\theta \mid 1/2, 1/2) \propto \theta^{-1/2} (1 - \theta)^{-1/2}.$$

In this case, since $\pi^N$ is proper, it would be tempting to use it as a testing prior. However, we will see that $\pi^S$, $\pi^M$, $\pi^A$ and $\pi^F$ all center around the null value $\theta_0$, whereas the estimation prior completely ignores it.

The DB prior for the sum-symmetrized divergence can be computed to be

$$\pi^S(\theta) \propto \left[1 + (\theta - \theta_0) \log \frac{\theta (1 - \theta_0)}{\theta_0 (1 - \theta)}\right]^{-1/2} \pi^N(\theta),$$

and the DB prior for the min-symmetrized divergence is

$$\pi^M(\theta) \propto \left(1 + \bar D^M[\theta, \theta_0]\right)^{-1/2} \pi^N(\theta),$$

where

$$\bar D^M[\theta, \theta_0] = \begin{cases} 2\, KL[\theta : \theta_0] & \text{if } \min\{\theta_0, 1 - \theta_0\} < \theta < \max\{\theta_0, 1 - \theta_0\}, \\ 2\, KL[\theta_0 : \theta] & \text{otherwise}, \end{cases}$$

and

$$KL[\theta : \theta_0] = \theta_0 \log \frac{\theta_0}{\theta} + (1 - \theta_0) \log \frac{1 - \theta_0}{1 - \theta}.$$

The intrinsic priors are derived in the next result. The proof is straightforward and hence omitted.
Lemma 3.1. The arithmetic intrinsic prior is

$$\pi^A(\theta) = \frac{2}{\pi} \left[(1 - \theta_0)(1 - \theta) + \theta_0\, \theta\right] \pi^N(\theta),$$

and the fractional intrinsic prior is

$$\pi^F(\theta) = \frac{\theta_0^{\theta}\, (1 - \theta_0)^{1 - \theta}}{\Gamma(\theta + 1/2)\, \Gamma(3/2 - \theta)}\, \pi^N(\theta).$$

By construction, $\pi^S$ and $\pi^M$ are proper priors; $\pi^A$ is proper, but $\pi^F$ is not. For instance, for $\theta_0 = 1/2$, $\pi^F$ integrates to 1.28, and for $\theta_0 = 3/4$, $\pi^F$ integrates to 1.18. This implies a small bias in the Bayes factor in favor of $M_2$.

In Figure 1 we display $\pi^S$, $\pi^M$, $\pi^A$ and $\pi^F$ for $\theta_0 = 1/2$ and $\theta_0 = 3/4$. They can be seen to be very similar. When $\theta_0 = 1/2$ they are also similar to the objective estimation prior $Be(\theta \mid 1/2, 1/2)$, but not for other values of $\theta_0$.

[Figure 1: In the Bernoulli example: $\pi^S$ (solid line), $\pi^M$ (dot-dashed line), $\pi^A$ (dots) and $\pi^F$ (dashed line), for the case $\theta_0 = 1/2$ (left) and $\theta_0 = 3/4$ (right).]

We also compute the Bayes factors for the four different priors when $\theta_0 = 1/2$, for two sample sizes, $n = 10$ and $n = 100$, and for different values of the MLE, $\hat\theta = \sum_{i=1}^n y_i / n$ (see Table 1). All the results are quite similar. As expected, $B_{12}^F$ gives the most support to $M_2$; $B_{12}^A$ gives the least. Both DB priors produce similar results, being slightly closer to $B_{12}^A$ than to $B_{12}^F$.

            $\hat\theta$   $B_{12}^S$   $B_{12}^M$   $B_{12}^A$   $B_{12}^F$
n = 10        0.50          3.26         3.44         4.06         2.68
              0.65          2.14         2.24         2.58         1.75
              0.80          0.55         0.57         0.60         0.44
n = 100       0.50          9.74        10.28        12.56         8.03
              0.55          5.93         6.26         7.61         4.89
              0.60          1.33         1.40         1.68         1.09
Conover                    19.38        20.20        20.79        16.02

Table 1: Bayes factors in favor of $M_1$ for the Bernoulli testing of $\theta_0 = 1/2$, for different values of the MLE and $n = 10$, $n = 100$. Also, Bayes factors for the Conover data.

Finally, we consider an application to real data taken from Conover (1971).
Under the hypothesis of simple Mendelian inheritance, a cross between two particular plants produces, in a proportion of $\theta = 3/4$, a species called 'giant'. To determine whether this assumption is true, Conover (1971) crossed $n = 925$ pairs of plants, obtaining $T = 682$ giant plants. The Bayes factors in favor of the Mendelian inheritance hypothesis (the simplest model) are also given in Table 1 for the four different priors. Again the results are very similar, with the fractional intrinsic prior providing the least support to $M_1$.

3.2 Scale parameter (Example 2)

We next consider another simple example, that of testing a scale parameter. Specifically, we assume that the data come from the one-parameter exponential model with mean $\mu$, that is,

$$f(y \mid \mu) = Exp(y \mid 1/\mu) = \frac{1}{\mu} \exp\left\{-\frac{y}{\mu}\right\}, \quad y > 0, \quad \mu > 0,$$

and that it is desired to test $H_1: \mu = \mu_0$ vs. $H_2: \mu \neq \mu_0$. Here $\pi^N(\mu) = \mu^{-1}$, and the DB priors are computed to be

$$\pi^S(\mu) \propto \left(1 + \frac{(\mu - \mu_0)^2}{\mu \mu_0}\right)^{-1/2} \mu^{-1}, \qquad \pi^M(\mu) \propto \left(1 + \bar D^M[\mu, \mu_0]\right)^{-3/2} \mu^{-1},$$

where

$$\bar D^M[\mu, \mu_0] = \begin{cases} 2\, KL[\mu_0 : \mu] & \text{if } \mu > \mu_0, \\ 2\, KL[\mu : \mu_0] & \text{if } \mu \leq \mu_0, \end{cases}$$

and $KL[\mu : \mu_0] = \log(\mu_0/\mu) - (\mu_0 - \mu)/\mu_0$. The intrinsic priors are given in the next lemma (the proof is straightforward and is omitted):

Lemma 3.2. The arithmetic and fractional intrinsic priors are

$$\pi^A(\mu) = \mu_0^{-1} \left(1 + \frac{\mu}{\mu_0}\right)^{-2}, \qquad \pi^F(\mu) = \mu_0^{-1} \exp\left\{-\frac{\mu}{\mu_0}\right\} = Exp(\mu \mid 1/\mu_0).$$

[Figure 2: $\pi^S$ (upper left), $\pi^M$ (upper right), $\pi^A$ (lower left) and $\pi^F$ (lower right) for the exponential testing of $\mu_0 = 5$.]

The four priors are shown in Figure 2 when testing $\mu_0 = 5$.
They all have similar shapes, although that of $\pi^M$ is somewhat unusual; they have some interesting properties:

1. In the log scale, both $\pi^M$ and $\pi^S$ are symmetric around $\log \mu_0$; this is in accordance with the proposals of Berger and Delampady (1987) and Berger and Sellke (1987), since $\log(\mu)$ is a location parameter.

2. All four priors are proper.

3. Neither the arithmetic intrinsic nor the DB priors have moments; the fractional intrinsic has all its moments.

4. $\pi^M$ has the heaviest tails, and $\pi^F$ the thinnest. $\pi^S$ has heavier tails than $\pi^A$.

5. All four priors are 'centered' at the null value $\mu_0$; indeed, $\mu_0$ is the median of the DB priors and of $\pi^A$, and it is the mean of $\pi^F$.

The four Bayes factors $B_{12}$ in favour of $M_1: \mu = 5$ appear in Table 2, for two values of $n$ ($n = 10$ and $n = 100$) and a few values of the MLE $\hat\mu = \sum_{i=1}^n y_i / n \in \{5, 7.5, 2.5\}$.

            $\hat\mu$   $B_{12}^S$             $B_{12}^M$             $B_{12}^A$           $B_{12}^F$
n = 10        5          5.65                   4.43                   5.13                 3.59
              7.5        2.36                   2.02                   2.09                 1.58
              2.5        0.95                   0.88                   0.82                 0.59
n = 100       5         17.28                  12.81                  15.98                10.89
              7.5       $14.6 \times 10^{-4}$   $12.2 \times 10^{-4}$   $13 \times 10^{-4}$   $9.4 \times 10^{-4}$
              2.5       $0.86 \times 10^{-7}$   $0.83 \times 10^{-7}$   $0.73 \times 10^{-7}$   $0.54 \times 10^{-7}$

Table 2: Bayes factors for the exponential testing with $\mu_0 = 5$ for different values of the MLE and $n = 10$, $n = 100$.

We again find very similar results for the different priors, with $B_{12}^S$ and $B_{12}^A$ providing slightly more support to $M_1$ than $B_{12}^M$ and $B_{12}^F$ when the data are compatible with $M_1$.

We next investigate a desirable property of Bayes factors which often fails when they are computed using conjugate priors (see Berger and Pericchi, 2001). It is natural to expect that, for any given sample size, $B_{12} \to 0$ as the evidence against the simpler model $M_1$ becomes overwhelming. When this property holds, we say that the Bayes factor is evidence consistent (or finite sample consistent).
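Both the $B_{12}^S$ column of Table 2 and this notion of evidence consistency can be checked by one-dimensional quadrature. The sketch below assumes Python with scipy and works on the log-likelihood scale so that $n = 100$ does not underflow; the function names are ours, and the normalization of $\pi^S$ is computed numerically rather than in closed form.

```python
import numpy as np
from scipy.integrate import quad

mu0 = 5.0

def pi_s_unnorm(mu):
    # sum-DB kernel for the exponential test: (1 + (mu-mu0)^2/(mu*mu0))^{-1/2} * mu^{-1}
    return (1 + (mu - mu0) ** 2 / (mu * mu0)) ** (-0.5) / mu

c_s, _ = quad(pi_s_unnorm, 0, np.inf, limit=200)   # normalizing constant of pi^S

def log_lik(mu, ybar, n):
    # log-likelihood of an iid Exp(mean mu) sample with sample mean ybar
    return -n * np.log(mu) - n * ybar / mu

def bayes_factor_S(ybar, n):
    shift = log_lik(ybar, ybar, n)   # value at the MLE, subtracted for numerical stability
    num = np.exp(log_lik(mu0, ybar, n) - shift)
    m2, _ = quad(lambda mu: np.exp(log_lik(mu, ybar, n) - shift) * pi_s_unnorm(mu) / c_s,
                 0, 200, points=[min(ybar, mu0), max(ybar, mu0)], limit=200)
    return num / m2

print(bayes_factor_S(5.0, 10))                              # Table 2 reports 5.65 here
print([bayes_factor_S(yb, 10) for yb in (2.5, 1.0, 0.25)])  # decays toward 0 as ybar -> 0
```

The second line of output illustrates evidence consistency directly: as the sample mean moves toward 0, the evidence against $M_1$ grows and $B_{12}^S$ vanishes for fixed $n$.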
It is easy to show that, if $\bar y \to \infty$ then $B_{12} \to 0$ for all $n$, no matter what prior is used to obtain the Bayes factor. The following lemma provides sufficient conditions for $B_{12} \to 0$ as $\bar y \to 0$.

Lemma 3.3. Let $B^\pi_{12}$ be the Bayes factor computed with $\pi(\mu)$. Then $B^\pi_{12} \to 0$ as $\bar y \to 0$, for all $n \ge k > 0$, if and only if

$\int_0^1 \mu^{-k}\, \pi(\mu)\, d\mu = \infty.$   (18)

Proof. See Appendix.

Figure 3: Upper bounds $B^0_{12}(n, \pi)$ of Bayes factors as a function of $n$ for the priors $\pi^S$ (solid line), $\pi^M$ (dot-dashed line), $\pi^F$ (dashed line), and $\pi^A$ (dots).

It follows that all four priors considered produce evidence consistent Bayes factors for all $n \ge 1$. Evidence consistency provides further insight into the behaviour of the DB priors. Indeed, recall that in the general definition of DB priors we used the power $q + \delta$, and then recommended the specific choice $\delta = 0.5$. Interestingly, if $\delta > 1$ were used instead, then $\pi^S$ would not be evidence consistent as $\bar y \to 0$.

Last, we study the behavior of $B_{12}$ as the evidence in favor of $M_1$ grows (that is, as $\bar y \to \mu_0$). For this example it is easy to show that, when $\bar y \to \mu_0$, $B_{12}$ grows to a constant, $B^0_{12}(n, \pi)$ say, that depends only on $n$ and the prior used. Of course, it then follows from the dominated convergence theorem that $B^0_{12}(n, \pi) \to \infty$ with $n$, but this also follows from the general consistency of Bayes factors (for proper, fixed priors), so it is not very interesting. Of more interest for our comparison is to study how fast $B^0_{12}(n, \pi)$ goes to $\infty$. In Figure 3 we show $B^0_{12}(n, \pi)$ for the four priors considered. It can be seen that $\pi^S$ is the one producing the largest values of $B^0_{12}$ for all values of $n$, with those for $\pi^A$ following very closely.

3.3 Location-scale (Example 3)

DB priors are defined in general for vector parameters $\theta$.
As an illustration, we next consider a most popular example, namely the normal distribution; here the 2-dimensional $\theta$ has two components of different nature (location and scale). Specifically, assume that $f(y \mid \mu, \sigma) = N(y \mid \mu, \sigma^2)$, and that we want to test $M_1: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $M_2: (\mu, \sigma) \neq (\mu_0, \sigma_0)$. This hypothesis testing problem occurs often in statistical process control, where a production process is considered 'in control' if its production outputs have a specified mean and standard deviation (the so-called nominal values); the question of interest is whether the process is in control, that is, whether the mean and variance are equal to the nominal values.

Figure 4: $\pi^S$ for the normal problem, with $\mu_0 = 0$, $\sigma_0 = 1$.

To compute the DB priors we use the reference prior $\pi^N(\mu, \sigma) = \sigma^{-1}$; for the sum-DB prior we get $\pi^S(\mu, \sigma) = \pi^S(\sigma)\, \pi^S(\mu \mid \sigma)$, with

$\pi^S(\sigma) \propto \sigma\, (\sigma_0^4 + \sigma^4)^{-1/2} (\sigma_0^2 + \sigma^2)^{-1/2},$

and

$\pi^S(\mu \mid \sigma) = Ca(\mu \mid \mu_0, \Sigma), \qquad \Sigma = \frac{\sigma_0^4 + \sigma^4}{\sigma_0^2 + \sigma^2},$

where $Ca$ represents the Cauchy density. In this example, the minimum-DB prior $\pi^M$ does not exist, since $q^M = \infty$. It can be checked that $\pi^S(\mu \mid \sigma)$ is symmetric around $\mu_0$, which is a location parameter in $\pi^S(\mu \mid \sigma)$; $\sigma_0$ is a scale parameter in $\pi^S(\sigma)$. The joint density $\pi^S$ is shown in Figure 4. The intrinsic priors, which have simpler forms and thinner tails, are derived next (the proof is omitted):

Lemma 3.4. The arithmetic intrinsic prior is $\pi^A(\mu, \sigma) = \pi^A(\sigma)\, \pi^A(\mu \mid \sigma)$, with

$\pi^A(\sigma) = \frac{2}{\pi} \frac{\sigma_0}{\sigma^2 + \sigma_0^2}, \qquad \pi^A(\mu \mid \sigma) = N\Big(\mu \,\Big|\, \mu_0, \frac{\sigma^2 + \sigma_0^2}{2}\Big),$

and the fractional intrinsic prior is

$\pi^F(\mu, \sigma) = N^+\Big(\sigma \,\Big|\, 0, \frac{\sigma_0^2}{2}\Big)\, N\Big(\mu \,\Big|\, \mu_0, \frac{\sigma_0^2}{2}\Big),$

where $N^+$ stands for the normal density truncated to the positive real line.
The intrinsic priors are proper; also, as with the sum-DB prior, $\mu_0$ and $\sigma_0$ are location and scale parameters for $\mu \mid \sigma$ and $\sigma$, respectively. Under the fractional intrinsic prior $\pi^F$, $\mu$ and $\sigma$ are independent a priori.

Values of $B_{12}$ for all three priors and different values of the sufficient statistic $(\bar y, S)$ are given in Table 3 when $(\mu_0, \sigma_0) = (0, 1)$.

Table 3: For the multidimensional parameter problem ($\mu_0 = 0$, $\sigma_0 = 1$), values of $B_{12}$ for different values of $(\bar y, S)$ with $n = 10$.

             ybar = 0                    ybar = 1                          ybar = 2
             B^S_12  B^A_12  B^F_12     B^S_12   B^A_12   B^F_12          B^S_12    B^A_12    B^F_12
  S = 0.5    2.30    1.35    0.70       0.03     0.02     0.01            3·10^-8   4·10^-8   6·10^-8
  S = 1      18.67   18.55   11.72      0.21     0.19     0.18            1·10^-7   2·10^-7   6·10^-7
  S = 2      0.006   0.006   0.017      5·10^-5  5·10^-5  21·10^-5        2·10^-11  2·10^-11  41·10^-11

The Bayes factors corresponding to the different priors can be seen to be quite similar, especially, once again, $B^S_{12}$ and $B^A_{12}$. For the three priors, we display in Figure 5 the marginal distributions of $\sigma$ and in Figure 6 the conditional distributions of $\mu$ given $\sigma$. It can clearly be seen that $\pi^F(\sigma)$ has thinner tails than $\pi^A_2$ and $\pi^S_2$ (recall that thicker tails seem to perform better for testing). Also, all conditional priors for $\mu$ are symmetric around their mode $\mu_0$, with $\pi^S(\mu \mid \sigma)$ having the heaviest tails.

Figure 5: Marginal distributions of $\sigma$ when $(\mu_0, \sigma_0) = (0, 1)$: $\pi^S_2(\sigma)$ (solid line), $\pi^A_2(\sigma)$ (dots), and $\pi^F_2(\sigma)$ (dashed line). The (mode, median) pairs for these priors are (0.81, 1.56) for $\pi^S$, (0, 1) for $\pi^A$, and (0, 0.48) for $\pi^F$.

Figure 6: Conditional distributions of $\mu$ given $\sigma = 1$ (left) and $\sigma = 3$ (right) when $(\mu_0, \sigma_0) = (0, 1)$: $\pi^S$ (solid), $\pi^A$ (dots), and $\pi^F$ (dashed).
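The shape of the sum-DB marginal for $\sigma$ can be checked directly. The sketch below (Python/SciPy; our own reconstruction of $\pi^S(\sigma) \propto \sigma[(\sigma_0^4+\sigma^4)(\sigma_0^2+\sigma^2)]^{-1/2}$, not code from the paper) normalizes it for $\sigma_0 = 1$ and recovers the (mode, median) pair reported in the Figure 5 caption:

```python
import numpy as np
from scipy import integrate, optimize

sigma0 = 1.0

def g(s):
    # unnormalized sum-DB marginal for sigma
    return s / np.sqrt((sigma0**4 + s**4) * (sigma0**2 + s**2))

Z = integrate.quad(g, 0, np.inf)[0]   # finite: the marginal is proper

# mode: maximize g (minimize -g) over a bounded bracket
mode = optimize.minimize_scalar(lambda s: -g(s), bounds=(1e-6, 10.0),
                                method="bounded").x

# median: solve F(m) = 1/2 for the normalized CDF F
median = optimize.brentq(
    lambda m: integrate.quad(g, 0, m)[0] / Z - 0.5, 0.1, 10.0)

print(round(mode, 2), round(median, 2))
```

The $\sigma^{-2}$ tail of this marginal explains why the median (about 1.56) sits well to the right of the mode (about 0.81).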
With respect to the evidence consistency of the Bayes factors, it is easy to show that when either $\bar y \to \infty$, $\bar y \to -\infty$ or $S \to \infty$ (the evidence against $M_1$ is very strong), then $B_{12} \to 0$, for all $n$ and for the three priors considered. When the evidence in favor of $M_1$ is largest (that is, $(\bar y, S) \to (\mu_0, \sigma_0)$), it can be seen (with a change of variables) that the Bayes factor in favor of $M_1$ grows to $B^1_{12}(n, \pi)$, with

$B^1_{12}(n, \pi) = \Big( \int \beta^{-n} \exp\Big\{ -n\, \frac{1 + \beta^2(\alpha^2 - 1)}{2\beta^2} \Big\}\, \pi^j_*(\alpha, \beta)\, d\alpha\, d\beta \Big)^{-1},$

a function only of $n$ and the prior ($j = A, F, S$) used. For the arithmetic intrinsic and fractional intrinsic priors, the mixing densities $\pi^j_*$ are

$\pi^A_*(\alpha, \beta) = \frac{2\beta}{\pi^{3/2} (1 + \beta^2)^{3/2}} \exp\Big\{ -\frac{\alpha^2 \beta^2}{1 + \beta^2} \Big\}, \qquad \pi^F_*(\alpha, \beta) = \frac{2\beta}{\pi} \exp\{ -\beta^2 (1 + \alpha^2) \},$

and for the sum-DB prior

$\pi^S_*(\alpha, \beta) = \frac{\beta^2}{\pi \kappa} \big( 1 + \beta^4 + \beta^2 \alpha^2 (1 + \beta^2) \big)^{-1}, \qquad \kappa = \int_0^\infty s \big( (1 + s^4)(1 + s^2) \big)^{-1/2}\, ds.$

Figure 7: Upper bounds $B^1_{12}(n, \pi)$ of Bayes factors as a function of $n$ for the priors $\pi^S$ (solid line), $\pi^F$ (dashed line), and $\pi^A$ (dots).

Figure 7 illustrates the rate at which $B^1_{12}(n, \pi) \to \infty$ as $n \to \infty$. It can clearly be seen that, as in the previous example, the DB and intrinsic priors behave very similarly, being more sensitive to the evidence in favor of $M_1$ than the fractional prior, substantially so unless $n$ is very small.

Finally, we compare the behavior of the three priors in a real example taken from Montgomery (2001). The example refers to controlling the piston rings for an automotive engine production process. The process was considered to be in control if the mean and the standard deviation of the inside diameter (in millimeters) of the piston rings were $\mu_0 = 74.001$ and $\sigma_0 = 0.0099$. At some specific time, the following sample was taken from the process: 74.035, 74.010, 74.012, 74.015, 74.026, and it had to be checked whether the process was in control. The Bayes factors are given in Table 4. $B^F_{12}$ provides about twice as much support to $M_1$ as $B^S_{12}$ and $B^A_{12}$, which are very similar to each other.

Table 4: Bayes factors $B_{12}$ for the Montgomery (2001) example.

  B^S_12   B^A_12   B^F_12
  0.004    0.005    0.011

3.4 Irregular models (Example 4)

There is an important class of models for which the parameter space is constrained by the data. These models do not have regular asymptotics, and hence solutions based on asymptotic theory (like the Bayesian information criterion, BIC) do not apply. Moreover, these models are very challenging for the intrinsic approach; indeed, as discussed in Berger and Pericchi (2001), the fractional Bayes factor is completely unreasonable (and hence the fractional intrinsic prior is useless), and the arithmetic intrinsic prior (which was only derived for the one-sided problem) is "something of a conjecture" (the authors' verbatim). We take here the simplest such model, namely an exponential distribution with unknown location. Accordingly, assume that $f(y \mid \theta) = \exp\{-(y - \theta)\}$, $y > \theta$, and that it is desired to test $H_1: \theta = \theta_0$ vs. $H_2: \theta \neq \theta_0$. To the best of our knowledge, no objective priors have been proposed for this testing problem in the literature.

In these situations, the sum-symmetrized Kullback-Leibler divergence $D^S[\theta, \theta_0]$ is $\infty$, so we have to use the minimum. It can be checked that $\bar D^M[\theta, \theta_0] = 2|\theta - \theta_0|$, a well defined divergence. Also, $\pi^N(\theta) = 1$, since $\theta$ is a location parameter. The minimum-DB prior is then given by

$\pi^M(\theta) = \frac{1}{2} \big( 1 + 2|\theta - \theta_0| \big)^{-3/2}, \quad \theta \in \mathbb{R},$

which is symmetric with respect to $\theta_0$ (as expected, since $\theta$ is a location parameter); also, $\pi^M$ has no moments. Figure 8 (left) shows $\pi^M(\theta)$ when $\theta_0 = 0$. We next investigate evidence consistency for any $n$.
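Two properties of this prior are easy to verify numerically: the expression above is already normalized (its density integrates exactly to one), and it has no moments, since the truncated mean integral grows without bound (like $\sqrt{B/2}$ at truncation point $B$). A sketch (Python/SciPy; our own check, not code from the paper):

```python
import numpy as np
from scipy import integrate

theta0 = 0.0

def prior_M(theta):
    # minimum-DB prior for the shifted-exponential location test
    return 0.5 * (1.0 + 2.0 * abs(theta - theta0)) ** -1.5

# total mass: (1/2)(1 + 2|u|)^(-3/2) integrates exactly to one
total = sum(integrate.quad(prior_M, a, b)[0]
            for a, b in [(-np.inf, theta0), (theta0, np.inf)])

# no moments: the truncated mean E[|theta|; |theta| < B] diverges with B
ms = [2 * integrate.quad(lambda t: t * prior_M(t), 0, B, limit=200)[0]
      for B in (1e2, 1e4, 1e6)]
print(round(total, 6), [round(m, 1) for m in ms])
```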
The sufficient statistic is $T = \min\{y_1, \ldots, y_n\}$. It is trivially true that $B_{12} \to 0$ as $T \to -\infty$ for any (proper) prior (in fact, $B_{12} = 0$ for $T < \theta_0$). The next lemma provides a sufficient condition on the prior to produce evidence consistency for all $n$, as $T \to \infty$.

Lemma 3.5. Let $\pi(\theta)$ be any proper prior (on $M_2$) and $B^\pi_{12}$ be the corresponding Bayes factor. If, for some integer $k > 0$,

$\int_{\theta_0}^{\infty} e^{k\theta}\, \pi(\theta)\, d\theta = \infty,$   (19)

then $B^\pi_{12} \to 0$ as $T \to \infty$ for all $n \ge k$.

Proof. See Appendix.

Figure 8: Irregular example, two-sided testing of $M_1: \theta = 0$. Left: the DB prior $\pi^M$; Right: $B^0_{12}(n)$ as a function of $n$.

It follows from the previous lemma that $\pi^M$ produces evidence consistent Bayes factors for all $n \ge 1$. We next investigate the situation for increasing evidence in favor of $M_1$, that is, as $T \to \theta_0^+$. Let $B^0_{12}(n) = \lim_{T \to \theta_0^+} B^{\pi^M}_{12}$; $B^0_{12}(n)$ is an upper bound of $B_{12}$ when the evidence in favor of $M_1$ is largest. It can be seen in Figure 8 (right) that $B^0_{12}(n)$ is nearly linear. Of course $B^0_{12}(n) \to \infty$ when $n \to \infty$.

As mentioned before, there do not seem to be any other proposals in the literature for the two-sided testing problem. However, Berger and Pericchi (2001) do consider the 'one-sided testing' version, namely testing $M_1: \theta = \theta_0$ vs. $M_2: \theta > \theta_0$; they conjecture that the arithmetic intrinsic prior for this problem is the proper density

$\pi^A(\theta) = -e^{\theta - \theta_0} \log\big( 1 - e^{\theta_0 - \theta} \big) - 1, \quad \theta > \theta_0,$

which is a decreasing function of $\theta$, unbounded as $\theta \to \theta_0^+$. We next compare the (minimum) DB prior for this problem with the Berger and Pericchi proposal. Although our original formulation appears to be in terms of two-sided testing (see (1)), in reality it suffices to define $\Theta$ appropriately to cover other testing situations.
For instance, in our one-sided testing we take $\Theta = [\theta_0, \infty)$. The (minimum) DB prior is

$\pi^M(\theta) = \big( 1 + 2(\theta - \theta_0) \big)^{-3/2}, \quad \theta > \theta_0.$

It can be checked that $\pi^A$ meets condition (19) for $k = 1$ and hence $\pi^A$ produces evidence consistent Bayes factors for all $n \ge 1$. The priors $\pi^A$ and $\pi^M$ are displayed in Figure 9. We find that also in this example $\pi^M$ has thicker tails.

Figure 9: Irregular, one-sided testing problem: $\pi^M$ (solid) and $\pi^A$ (dots) for the case $\theta_0 = 0$.

In this one-sided testing scenario (in sharp contrast to the behavior in the two-sided testing), the Bayes factor in favor of $M_1$ for every $n > 0$ does grow to $\infty$ as the evidence in favor of $M_1$ grows. Indeed, the Bayes factor $B_{12}$ is

$B_{12} = \Big( \int_{\theta_0}^{T} \exp\{ n(\theta - \theta_0) \}\, \pi(\theta)\, d\theta \Big)^{-1},$

so that $B_{12} \to \infty$ when $T \to \theta_0^+$, for all $n > 0$, no matter what prior is used. Note that here $\theta_0$ is on the boundary of the parameter space.

In Table 5 we report the Bayes factors computed with $\pi^A$ and $\pi^M$ when $\theta_0 = 0$, for various values of $T = \min\{y_1, \ldots, y_n\}$, and for $n = 10$ and $n = 20$. For small values of $T$ ($T < 0.20$), when the evidence supports $M_1$, $B^M_{12}$ is considerably larger than $B^A_{12}$, thus giving more support to $M_1$. For larger values of $T$ (that is, when the data contradict $M_1$), both priors result in very similar Bayes factors.

Table 5: Irregular models, one-sided testing. Values of $B_{12}$ for different values of $T$ and $n$, and for the two priors $\pi^A$, $\pi^M$, when testing $\theta_0 = 0$.

           T        0.02    0.05    0.10    0.20    0.50    1.00
  n = 10   B^M_12   46.56   16.66   6.83    2.19    0.16    0.002
           B^A_12   11.54   5.16    2.57    1.02    0.10    0.001
  n = 20   B^M_12   41.96   12.65   3.75    0.55    0.002   2·10^-7
           B^A_12   10.52   4.04    1.50    0.28    0.002   2·10^-7

3.5 Mixture models (Example 5)

Mixture models are among the most challenging scenarios for objective Bayesian methodology.
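The one-sided Bayes factors in Table 5 reduce to a single one-dimensional integral and are easy to reproduce. Below is a sketch (Python/SciPy; our own reconstruction of the formulas with $\theta_0 = 0$, not code from the paper):

```python
import numpy as np
from scipy import integrate

def prior_M(t):
    # one-sided minimum-DB prior, theta0 = 0, t > 0
    return (1.0 + 2.0 * t) ** -1.5

def prior_A(t):
    # Berger-Pericchi conjectured arithmetic intrinsic prior, theta0 = 0
    return -np.exp(t) * np.log(1.0 - np.exp(-t)) - 1.0

def B12(T, n, prior):
    # B12 = [ int_0^T exp(n*theta) prior(theta) d theta ]^(-1)
    val = integrate.quad(lambda t: np.exp(n * t) * prior(t), 0, T)[0]
    return 1.0 / val

for T in (0.02, 0.10, 0.50):
    print(T, round(B12(T, 10, prior_M), 2), round(B12(T, 10, prior_A), 2))
```

The logarithmic singularity of $\pi^A$ at $\theta_0$ is integrable, so adaptive quadrature handles it without special treatment, and the values agree with the $n = 10$ rows of Table 5.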
These models have improper likelihoods, i.e., likelihoods for which no improper prior yields a finite marginal density (integrated likelihood). Recently, Pérez and Berger (2001) have used expected posterior priors (see Pérez and Berger, 2002) to derive objective estimation priors, but basically no general method seems to exist for deriving objective priors for testing with these models. However, the divergence measures are well defined (although the integrals are now more involved), providing a reasonable DB prior to be used in model selection.

We consider a simple illustration. Assume

$f(y \mid \mu, p) = p\, N(y \mid 0, 1) + (1 - p)\, N(y \mid \mu, 1),$

and the testing of $H_1: \mu = 0$ vs. $H_2: \mu \neq 0$, where $p < 1$ is known (if $p = 1$, both hypotheses define the same model). As Berger and Pericchi (2001) point out, there is no minimal training sample for this problem and hence the intrinsic Bayes factor cannot be defined. The fractional Bayes factor does not exist either. The only prior we know of for this problem is the recommendation in Berger and Pericchi (2001) of using $\pi^{BP}(\mu) = Ca(\mu \mid 0, 1)$. Although there is no formal $\pi^N(\mu)$ here, $\pi^N(\mu) = 1$ is usually assumed (see for instance Pérez and Berger, 2002). It can be shown that $q^M = \infty$, and hence $\pi^M$ does not exist. Let

$G(p, \mu, \mu^*) = \int_{-\infty}^{\infty} \log\Big[ 1 + \frac{1 - p}{p}\, e^{y\mu - \mu^2/2} \Big]\, N(y \mid \mu^*, 1)\, dy.$   (20)

Then

$D^S[\mu, \mu_0] = n (1 - p) \big( G(p, \mu, \mu) - G(p, \mu, 0) \big).$

It can be shown that $q^S < \infty$, and hence that the sum-DB prior $\pi^S$ exists. The normalizing constant, however, cannot be derived in closed form. Numerical procedures could be used to derive the sum-DB prior exactly; we use instead a Laplace approximation (see Tanner, 1996) to (20) to get an approximate DB prior. Specifically,

$G(p, \mu, \mu^*) \approx \log\Big[ 1 + \frac{1 - p}{p}\, e^{\mu^* \mu - \mu^2/2} \Big] = G_L(p, \mu, \mu^*).$   (21)

Figure 10 shows $G(p, \mu, \mu) - G(p, \mu, 0)$ and its approximation $G_L(p, \mu, \mu) - G_L(p, \mu, 0)$ for $p = 0.5$ and $p = 0.75$. The approximation is very good as long as $p$ is not too extreme. We can now use this approximation to derive the DB prior.

Figure 10: $G(p, \mu, \mu) - G(p, \mu, 0)$ (solid) and its Laplace approximation $G_L(p, \mu, \mu) - G_L(p, \mu, 0)$ (dots). Left: $p = 0.50$. Right: $p = 0.75$.

Note that the natural effective sample size here is $n^* = n(1 - p)$, so that the unitary sum-symmetrized divergence is

$\bar D^S[\mu, \mu_0] = \frac{D^S[\mu, \mu_0]}{n(1 - p)} \approx \log\Bigg[ \frac{1 + \frac{1 - p}{p}\, e^{\mu^2/2}}{1 + \frac{1 - p}{p}\, e^{-\mu^2/2}} \Bigg] = \bar D^S_L[\mu, \mu_0].$

This approximation is specially appealing because it also keeps essential properties of the divergence measures. In particular, $\bar D^S_L(\mu, \mu_0) \ge \bar D^S_L(\mu_0, \mu_0) = 0$, so that the approximate DB prior

$\pi^S_L(\mu) \propto \big( 1 + \bar D^S_L(\mu, \mu_0) \big)^{-q^S_*}$

has a mode at zero. Since $q^S = 1/2$, we finally get

$\pi^S_L(\mu) \propto \big( 1 + \bar D^S_L(\mu, \mu_0) \big)^{-1}.$

Interestingly, the prior $\pi^S_L$ is close to a Cauchy density, which was the Berger and Pericchi proposal, although the scale differs. Indeed, a Taylor expansion of order 3 around $\mu = 0$ gives

$\bar D^S_L(\mu, \mu_0) \approx (1 - p)\, \mu^2,$   (22)

so that, unless $p$ is very close to 1, $\pi^S_L$ behaves around 0 as a $Ca(\mu \mid 0, 1/(1 - p))$; the approximation is excellent when $p$ is close to 0.5.

Figure 11: $\pi^S_L$ (solid line), $Ca(\mu \mid 0, 1/(1 - p))$ (dashed line) and $\pi^{BP}(\mu) = Ca(\mu \mid 0, 1)$ (dots) for $p = 0.50$ (left), $p = 0.75$ (middle) and $p = 0.25$ (right).

Figure 12: $B^0_{12}$ for $\pi^S_L$ (solid line) and $\pi^{BP}$ (dots) as a function of $n$, for $p = 0.5$.
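The quality of the Laplace approximation (21) can be checked by computing $G$ numerically. The sketch below (Python/SciPy; our own reconstruction, not the authors' code) compares the exact unitary divergence $G(p,\mu,\mu) - G(p,\mu,0)$ with its closed-form approximation. Note that for $p = 0.5$ the two differences agree exactly and both equal $\mu^2/2$, since $\log(1+e^x) - \log(1+e^{-x}) = x$ identically; this is why the approximation is best near $p = 0.5$:

```python
import numpy as np
from scipy import integrate

def G(p, mu, mu_star):
    # exact G(p, mu, mu*): expectation of the log term under N(mu*, 1)
    r = (1.0 - p) / p
    def f(y):
        return np.log1p(r * np.exp(y * mu - mu**2 / 2)) \
               * np.exp(-(y - mu_star)**2 / 2) / np.sqrt(2 * np.pi)
    return integrate.quad(f, mu_star - 10.0, mu_star + 10.0)[0]

def G_L(p, mu, mu_star):
    # Laplace approximation (21): evaluate the log term at y = mu*
    r = (1.0 - p) / p
    return np.log1p(r * np.exp(mu_star * mu - mu**2 / 2))

for p in (0.5, 0.75):
    for mu in (1.0, 2.0):
        exact = G(p, mu, mu) - G(p, mu, 0.0)
        approx = G_L(p, mu, mu) - G_L(p, mu, 0.0)
        print(p, mu, round(exact, 3), round(approx, 3))
```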
In the tails, on the other hand, we have that, as $|\mu| \to \infty$,

$\bar D^S_L(\mu, \mu_0) \approx \frac{\mu^2}{2},$   (23)

independently of $p$. Hence the tails of $\pi^S_L$ are close to those of a $Ca(\mu \mid 0, 2)$ density. Note that both approximations (22) and (23) coincide for $p = 0.5$.

The scale of the $Ca(\mu \mid 0, 1/(1 - p))$ makes intuitive sense. Indeed, the larger $p$ is, the fewer observations provide information about $\mu$, and the DB prior adjusts to a less informative likelihood by inflating its scale. Figure 11 displays $\pi^S_L$, its $Ca(\mu \mid 0, 1/(1 - p))$ approximation, and the proposal of Berger and Pericchi (2001) for different values of $p$. Notice that, for values of $p$ close to 0, $\pi^S_L$ (and its approximation $Ca(\mu \mid 0, 1/(1 - p))$) approximately behaves as a $Ca(\mu \mid 0, 1)$, the Berger and Pericchi proposal (see Figure 11, right). This has an interesting interpretation since, as $p \to 0$, the testing problem in this example essentially coincides with that of testing $H_1: \mu = 0$ vs. $H_2: \mu \neq 0$ when $\mu$ is the mean of a normal density, for which $Ca(\mu \mid 0, 1)$ is perhaps the most popular prior distribution for $\mu$ under $H_2$.

In this example, the DB prior (as well as the Berger and Pericchi proposal) again produces evidence consistent Bayes factors for all $n$. Indeed, it can be shown that if one of the $y_i$'s tends to $\infty$ or $-\infty$, then the corresponding Bayes factor tends to 0 no matter what prior is used. On the other hand, as the evidence for $H_1$ increases, we get a finite upper bound on $B_{12}$ for every fixed sample size $n$:

$B^0_{12}(n, p, \pi) = \lim_{y_i \to 0,\ \forall i} B_{12}.$

In Figure 12 we show $B^0_{12}$ for $\pi = \pi^S_L$ and $\pi = Ca(\mu \mid 0, 1)$ as a function of $n$ for $p = 0.5$. As in the previous examples, it is an immediate consequence that $B^0_{12}(n, p, \pi) \to \infty$ as $n \to \infty$ for both priors, but the support for $H_1$ is larger when $\pi^S_L$ is used, for every $n$.
In Table 6 we show the Bayes factors $B^{SL}_{12}$, $B^{ap}_{12}$ and $B^{BP}_{12}$, computed respectively with the priors $\pi^S_L$, its $Ca(\mu \mid 0, 1/(1 - p))$ approximation, and the $Ca(\mu \mid 0, 1)$ proposed by Berger and Pericchi. Since reduction by a sufficient statistic is not possible, the Bayes factors are computed for simulated samples of size $n = 20$, with mean $\mu \in \{0, 0.5, 1\}$ and $p \in \{0.25, 0.5, 0.75\}$. $B^{SL}_{12}$ and its approximation $B^{ap}_{12}$ are very close, demonstrating that the approximation is very good for the considered range of $p$. $B^{SL}_{12}$ and $B^{BP}_{12}$ are also very similar.

Table 6: Bayes factors $B_{12}$ for simulated samples of size $n = 20$ from the mixture model with various values of $p$ and $\mu$, and the priors $\pi^S_L$, its approximation $Ca(\mu \mid 0, 1/(1 - p))$ and $\pi^{BP}(\mu) = Ca(\mu \mid 0, 1)$.

            p = 0.25                     p = 0.5                      p = 0.75
  mu    B^SL_12  B^ap_12  B^BP_12    B^SL_12  B^ap_12  B^BP_12    B^SL_12  B^ap_12  B^BP_12
  0     5.49     4.97     4.39       2.56     2.56     2.01       2.37     2.90     1.87
  0.5   1.82     1.65     1.49       0.36     0.36     0.33       1.69     2.06     1.42
  1     0.07     0.06     0.06       0.04     0.04     0.04       0.01     0.01     0.01

4 Nuisance parameters

In this section we deal with more realistic problems in which the distribution of the data is not fully specified under the null (simplest model), but depends on some nuisance parameter. Assume that $y_i$, $i = 1, \ldots, n$, are independent (not necessarily i.i.d.) and that $y = (y_1, \ldots, y_n) \sim \{ f(y \mid \theta, \nu),\ \theta \in \Theta,\ \nu \in \Upsilon \}$. We want to test $H_1: \theta = \theta_0$ vs. $H_2: \theta \neq \theta_0$; equivalently, we want to solve the model selection problem (2), where it is carefully acknowledged that $\nu$ can have a different meaning in each model. However, from now on we assume, after suitable reparameterization if needed, that $\theta$ and $\nu$ are orthogonal (that is, that the Fisher information matrix is block diagonal).
It is then customary to assume that $\nu$ has the same meaning under both models (see Berger and Pericchi, 1996, for an asymptotic justification). This will be needed for the divergence measures to have intuitive meaning, and also to justify assessment of the same (possibly improper) prior for $\nu$ under both models, thus considerably simplifying the assessment task. The suitability of orthogonal parameters in the presence of model uncertainty was first exploited by Jeffreys (1961) and has been successfully used by many others (see for example Zellner and Siow, 1980, 1984, and Clyde, DeSimone and Parmigiani, 1996). For univariate $\theta$, Cox and Reid (1987) explicitly provide an orthogonal reparameterization.

Accordingly, we assume that the hypothesis testing problem above is equivalent to that of choosing between the competing models:

$M_1: f_1(y \mid \nu) = f(y \mid \theta_0, \nu)$ vs. $M_2: f_2(y \mid \theta, \nu) = f(y \mid \theta, \nu),$   (24)

where $\theta_0 \in \Theta$ is a specified value, and $\nu$ (the old parameter, in Jeffreys' terminology) is assumed to be common to both models, which only differ by the different value of the new parameter $\theta$ under $M_2$.

4.1 Divergence measures

The basic measure of discrepancy between $\theta$ and $\theta_0$ is again the Kullback-Leibler directed divergence (5), where $\nu$ is taken to be the same in both models:

$KL[(\theta_0, \nu) : (\theta, \nu)] = \int_{\mathcal{Y}} \big( \log f(y \mid \theta, \nu) - \log f(y \mid \theta_0, \nu) \big)\, f(y \mid \theta, \nu)\, dy.$

Note that using the same $\nu$ only makes intuitive sense if $\nu$ has the same meaning under both models, and hence can be considered common. Actually, Pérez (2005), using geometrical arguments, shows that under orthogonality $KL[(\theta_0, \nu) : (\theta, \nu)]$ can be interpreted as a measure of divergence between $f_1$ and $f_2$ due solely to the parameter of interest $\theta$.
This interpretation does not hold for other divergence measures, such as the intrinsic loss divergence defined in Bernardo and Rueda (2002). Similarly to Section 2, we symmetrize the Kullback-Leibler directed divergences by adding them or taking their minimum, resulting in the sum-divergence and min-divergence measures between $\theta$ and $\theta_0$ for a given $\nu$:

$D^S[(\theta, \theta_0) \mid \nu] = KL[(\theta, \nu) : (\theta_0, \nu)] + KL[(\theta_0, \nu) : (\theta, \nu)],$   (25)

and

$D^M[(\theta, \theta_0) \mid \nu] = 2 \times \min\big\{ KL[(\theta, \nu) : (\theta_0, \nu)],\ KL[(\theta_0, \nu) : (\theta, \nu)] \big\}.$   (26)

$D^M$ is used by Pérez (2005) to define what he calls the "orthogonal intrinsic loss". In what follows, many of the definitions and properties apply to both $D^S$ and $D^M$, in which case we again generically use $D$ to denote either of them. Their basic properties were discussed in Section 2. As before, the building block of the DB prior is the unitary measure of divergence $\bar D = D / n^*$, where $n^*$ is the equivalent sample size for $\theta$.

4.2 DB priors in the presence of nuisance parameters

For testing $H_1: \theta = \theta_0$ vs. $H_2: \theta \neq \theta_0$, or equivalently choosing between models $M_1$ and $M_2$ in (24), we need priors $\pi_1(\nu)$ under $M_1$ and $\pi_2(\nu, \theta)$ under $M_2$. In the spirit of Jeffreys (and many others after him) we take, under each of the models, the same objective (possibly improper) prior for the common parameter $\nu$, and a proper prior for the conditional distribution of the new parameter $\theta \mid \nu$ under $M_2$, which will be derived similarly to the DB priors in Section 2.2. Note that since $\nu$ occurs in the two models, if we take the same $\pi^N(\nu)$ in both, then the (common) arbitrary constants cancel when computing the Bayes factor; however, $\theta$, which only occurs in $M_2$, has to have a proper prior. A common prior for the old parameter only makes sense when $\nu$ has the same meaning in both models (another reason to take $\theta$ and $\nu$ orthogonal).
Moreover, it is well known that under orthogonality the specific common prior for $\nu$ has little impact on the resulting Bayes factor (see Jeffreys, 1961; Kass and Vaidyanathan, 1992), thus supporting the use of objective priors for common parameters.

Let $\pi^N(\nu)$ be an objective (usually either Jeffreys or reference) prior for model $f_1$ and $\pi^N(\theta, \nu)$ the corresponding one for model $f_2$ ($\theta$ is the parameter of interest if the reference prior is used). We define $\pi^N(\theta \mid \nu)$ such that $\pi^N(\theta, \nu) = \pi^N(\theta \mid \nu)\, \pi^N(\nu)$. To define the DB priors, let $D$ be either (25) or (26) (other appropriate divergence measures could also be explored). Then we define:

Definition 4.1. (DB priors) Let

$c(q, \nu) = \int \big( 1 + \bar D[(\theta, \theta_0) \mid \nu] \big)^{-q}\, \pi^N(\theta \mid \nu)\, d\theta,$

and

$q = \inf\{ q \ge 0 : c(q, \nu) < \infty \ \text{a.e.}\ \nu \in \Upsilon \}, \qquad q^* = q + 1/2.$

If $q < \infty$, the D-divergence based prior under $M_1$ is $\pi^D_1(\nu) = \pi^N(\nu)$, and under $M_2$ it is $\pi^D_2(\theta, \nu) = \pi^D(\theta \mid \nu)\, \pi^N(\nu)$, where the (proper) $\pi^D(\theta \mid \nu)$ is

$\pi^D(\theta \mid \nu) = c(q^*, \nu)^{-1} \big( 1 + \bar D[(\theta, \theta_0) \mid \nu] \big)^{-q^*}\, \pi^N(\theta \mid \nu).$

In this definition we are implicitly using the recommended non-increasing function $h_q(t) = (1 + t)^{-q}$, but again other non-increasing functions on $t \in [0, \infty)$ could be explored.

Definition 4.2. (Sum and minimum DB priors) The sum DB prior $\pi^S$ and the minimum DB prior $\pi^M$ are the DB priors given in Definition 4.1 with $D$ being respectively $D^S$ (see (25)) and $D^M$ (see (26)). When needed, we refer to their corresponding c's and q's as $c^S, q^S, q^S_*$ and $c^M, q^M, q^M_*$, respectively.

We next investigate whether the DB priors are invariant under reparameterizations. Suppose that $\xi = \xi(\theta)$ and $\eta = \eta(\nu)$ are, respectively, one-to-one monotone mappings $\xi: \Theta \to \Theta_\xi$, $\eta: \Upsilon \to \Upsilon_\eta$. Clearly, the reparameterization $(\xi, \eta)$ preserves orthogonality.
The original problem (24) in this parameterization becomes:

$M^*_1: f^*_1(y \mid \eta) = f^*(y \mid \xi_0, \eta)$ vs. $M^*_2: f^*_2(y \mid \xi, \eta) = f^*(y \mid \xi, \eta),$   (27)

where $f^*(y \mid \xi(\theta), \eta(\nu)) = f(y \mid \theta, \nu)$ and $\xi_0 = \xi(\theta_0)$. We next show that if $\pi^N(\nu)$ and $\pi^N(\theta, \nu)$ are invariant under these reparameterizations, so are the DB priors. (See Datta and Ghosh, 1995, for a detailed analysis of the invariance of several noninformative priors in the presence of nuisance parameters.)

Theorem 1. (Invariance under one-to-one transformations.) Let $\pi^D_\nu(\nu)$ and $\pi^D_\eta(\eta)$ be either the sum or the minimum DB priors under $M_1$ for the original (24) and reparameterized (27) problems, respectively, with similar notation for $\pi^D_{\theta,\nu}(\theta, \nu)$ and $\pi^D_{\xi,\eta}(\xi, \eta)$ under $M_2$. If

$\pi^N_\nu(\nu) = \kappa\, \pi^N_\eta(\eta(\nu))\, |J_\eta(\nu)|,$

where $\kappa$ is a constant, and

$\pi^N_{\theta,\nu}(\theta, \nu) \propto \pi^N_{\xi,\eta}(\xi(\theta), \eta(\nu))\, |J_{\xi,\eta}(\theta, \nu)|,$

then

$\pi^D_\nu(\nu) = \kappa\, \pi^D_\eta(\eta(\nu))\, |J_\eta(\nu)|, \qquad \pi^D_{\theta,\nu}(\theta, \nu) = \kappa\, \pi^D_{\xi,\eta}(\xi(\theta), \eta(\nu))\, |J_{\xi,\eta}(\theta, \nu)|.$

Proof. See Appendix.

As a consequence, DB Bayes factors are not affected by reparameterizations of the type considered. These are the most natural and interesting reparameterizations of the problem (and indeed other reparameterizations seem questionable). Also, the DB priors are compatible with reduction by sufficiency, in the same spirit as in Proposition 2.

4.3 Examples

We next demonstrate the behavior of DB priors and the corresponding Bayes factors in a couple of examples. The first is testing the mean of a gamma model, a difficult problem in general. The second discusses linear models.

4.3.1 Gamma model (Example 6)

Let $y = (y_1, \ldots, y_n)$ be an i.i.d. sample from a gamma model with mean $\mu$ and shape parameter $\alpha$, that is, from

$f(y \mid \alpha, \mu) = \Big( \frac{\alpha}{\mu} \Big)^{\alpha} \Gamma(\alpha)^{-1}\, y^{\alpha - 1}\, e^{-y\alpha/\mu}.$

It is desired to test $H_1: \mu = \mu_0$ vs.
$H_2: \mu \neq \mu_0$. It is easy to show that $\mu$ is orthogonal to $\alpha$. The objective (reference) priors are $\pi^N(\alpha) = (\psi^{(1)}(\alpha) - 1/\alpha)^{1/2}$ and $\pi^N(\mu, \alpha) = \mu^{-1} (\psi^{(1)}(\alpha) - 1/\alpha)^{1/2}$, where $\psi^{(1)}$ represents the trigamma function. Hence $\pi^N_2(\mu \mid \alpha) = \mu^{-1}$. The DB priors are $\pi^D(\alpha) = \pi^N(\alpha)$ under both hypotheses, for $D$ either the sum or min divergence. Under $H_2$, the conditional sum-DB prior for $\mu$ is

$\pi^S(\mu \mid \alpha) = c_s(\alpha)^{-1} \Big( 1 + \frac{\alpha (\mu - \mu_0)^2}{\mu \mu_0} \Big)^{-1/2} \frac{1}{\mu},$

where $c_s(\alpha)$ is the proportionality constant

$c_s(\alpha) = \int_0^{\infty} \Big( 1 + \frac{\alpha (t - 1)^2}{t} \Big)^{-1/2} \frac{1}{t}\, dt.$

The conditional min-DB prior is

$\pi^M(\mu \mid \alpha) = c_m(\alpha)^{-1} \big( 1 + \bar D^M[(\mu, \mu_0) \mid \alpha] \big)^{-3/2} \frac{1}{\mu},$

where

$\bar D^M[(\mu, \mu_0) \mid \alpha] = \begin{cases} 2\alpha \big( \log\frac{\mu}{\mu_0} - 1 + \frac{\mu_0}{\mu} \big) & \text{if } \mu > \mu_0, \\ 2\alpha \big( \log\frac{\mu_0}{\mu} - 1 + \frac{\mu}{\mu_0} \big) & \text{if } \mu \le \mu_0, \end{cases}$

and

$c_m(\alpha) = 2 \int_0^{\infty} \big( 1 + 2\alpha (t - 1 + e^{-t}) \big)^{-3/2}\, dt.$

In Table 7 we show the corresponding Bayes factors $B^S_{12}$ and $B^M_{12}$ for $n = 10$; the null value is $\mu_0 = 10$, and we have considered several combinations of $(\hat\mu, \hat\sigma)$, the maximum likelihood estimates of the mean and standard deviation. When $\hat\mu = 12$ (casting doubt on the null), both Bayes factors are very similar and increase with $\hat\sigma$, an intuitive behavior. When the data show the most support for the null, that is, when $\hat\mu = 10$, the Bayes factors differ, with the sum-DB prior giving the most support to the null.

Table 7: Values of $B_{12}$ for gamma mean testing with $\mu_0 = 10$; we use $n = 10$ and different values of $(\hat\mu, \hat\sigma)$.

                  mu-hat = 10        mu-hat = 11        mu-hat = 12
                  B^S_12   B^M_12    B^S_12   B^M_12    B^S_12    B^M_12
  sigma-hat=0.5   12.94    2.83      0.005    0.004     1·10^-5   3·10^-5
  sigma-hat=1     11.27    2.92      0.353    0.150     0.003     0.003
  sigma-hat=2     9.49     3.06      3.102    1.136     0.22      0.12

In contrast with the DB priors, it is not possible to derive relatively simple expressions for the intrinsic priors.
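Both conditional DB priors above require only the one-dimensional normalizing constants $c_s(\alpha)$ and $c_m(\alpha)$. The sketch below (Python/SciPy; our own numerical check of the printed integrals, not code from the paper) evaluates them and verifies the log-scale symmetry of $\pi^S(\mu \mid \alpha)$, which puts equal prior mass on each side of $\mu_0$ for every $\alpha$:

```python
import numpy as np
from scipy import integrate

def cs_integrand(t, alpha):
    # integrand of c_s(alpha), in the scaled variable t = mu / mu0
    return (1.0 + alpha * (t - 1.0) ** 2 / t) ** -0.5 / t

def c_s(alpha):
    return sum(integrate.quad(cs_integrand, a, b, args=(alpha,))[0]
               for a, b in [(0, 1.0), (1.0, np.inf)])

def c_m(alpha):
    f = lambda t: (1.0 + 2.0 * alpha * (t - 1.0 + np.exp(-t))) ** -1.5
    return 2.0 * integrate.quad(f, 0, np.inf)[0]

for alpha in (0.5, 1.0, 2.0):
    # both constants are finite, so the conditional DB priors are proper
    print(alpha, round(c_s(alpha), 4), round(c_m(alpha), 4))

# log-scale symmetry of the sum-DB conditional: the integrand is invariant
# under t -> 1/t with respect to the measure dt/t, so mu0 is the median
left = integrate.quad(cs_integrand, 0, 1.0, args=(1.0,))[0]
right = integrate.quad(cs_integrand, 1.0, np.inf, args=(1.0,))[0]
```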
Hence, in this example we compare the DB Bayes factors with the arithmetic intrinsic Bayes factor $IB^A_{12}$ (see Berger and Pericchi, 1996). Although $IB^A_{12}$ does not exactly correspond to a Bayes factor derived from a specific prior, it does asymptotically correspond to a Bayes factor derived with the arithmetic intrinsic prior. Since $IB^A_{12}$ is not defined with reduction by sufficiency, the comparisons are carried out for (specific) simulated samples with the given parameters. In Table 8 we show the arithmetic intrinsic and DB Bayes factors for testing $H_1: \mu = 10$, with $n = 10$ and samples generated from gamma distributions with $\mu \in \{10, 11, 12\}$ and $\sigma \in \{0.5, 1.0, 2.0\}$. The resulting MLEs $(\hat\mu, \hat\sigma)$, in lexicographical order, are: {(10.02, 0.52), (9.98, 0.99), (9.98, 1.97), (11.01, 0.48), (11.00, 0.99), (10.98, 1.99), (11.99, 0.51), (11.98, 0.99), (12.01, 1.99)}.

Table 8: For the gamma model problem and the test $H_1: \mu = 10$ vs. $H_2: \mu \neq 10$: in each cell, values of $B_{12}$ and the arithmetic intrinsic Bayes factor $IB^A_{12}$, associated with a sample of size $n = 10$ from a gamma model with mean $\mu$ and standard deviation $\sigma$.

              mu = 10                       mu = 11                       mu = 12
              B^S_12  B^M_12  IB^A_12      B^S_12  B^M_12  IB^A_12       B^S_12      B^M_12      IB^A_12
  sigma=0.5   13.17   2.93    0.08         0.004   0.003   0.001         1.4·10^-5   3.7·10^-5   0.1·10^-5
  sigma=1     11.15   2.88    0.55         0.33    0.14    0.07          0.003       0.003       0.001
  sigma=2     9.57    3.08    3.71         3.07    1.12    1.23          0.22        0.12        0.07

When $H_2$ is true ($\mu = 11$ or $\mu = 12$), the three measures are rather close. Similar values are also obtained when the 'null' model $H_1$ is true and $\sigma = 2$. In all these cases, the three measures provide support to the true model. Nevertheless, when $H_1$ is true
This behavior of IB^A_12 is likely due to the well-known instability of IB^A_12 when the sample size is small (worsened in this case because the variance is small).

4.3.2 Variable selection in linear models (Example 7).

We next briefly present the motivating example for this paper; specifically, we show how the DB prior reproduces the Jeffreys-Zellner-Siow prior for variable selection in linear models. More elaborate examples of testing in linear models can be found in Bayarri and García-Donato (2007). Derivations of DB priors for random effects models are given in García-Donato and Sun (2007). Consider the full-rank general linear model {N_n(y | X_1 β_1 + X_e β_e, σ² I_n)} and the problem of testing H_1 : β_e = 0. After the usual orthogonal reparameterization (see e.g. Zellner and Siow, 1984) and taking n* = n and π^N(β_1, β_e, σ) = σ^{−1}, the DB priors are

π^D_1(β_1, σ) = σ^{−1},   π^D_2(β_1, β_e, σ) = σ^{−1} Ca_{k_e}(β_e | 0, n* σ² (V^t V)^{−1}),

where k_e is the dimension of β_e and V = (I_n − P_1) X_e, with P_1 = X_1 (X^t_1 X_1)^{−1} X^t_1. Note that the exact matching of the JZS and DB priors only occurs if the effective sample size is n* = n. This 'coincidence' was the original motivation for the specific choice q + 1/2 in the definition of DB priors (see García-Donato, 2003, for details). However, n* might well depend on the design matrix (or covariates). For example, in the linear model Y = Xθ + ε, with X : n × 1 and θ scalar, it is intuitively clear that if X = (1, ..., 1)^t then n* should be n, but if X = (1, ε, ..., ε)^t with ε very small, then n* should be 1. The effective sample size defined in Berger et al. (2007) satisfies this requirement, but other definitions might not. Extended investigation of this issue is beyond the scope of this paper and will be pursued elsewhere.
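The orthogonalization behind this prior is easy to check numerically. The following sketch (our own, with a simulated design and assuming numpy is available; not code from the paper) builds V = (I_n − P_1)X_e and the Cauchy scale matrix n*σ²(V^tV)^{−1} with n* = n and σ = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, ke = 20, 2, 3                        # sample size and block dimensions
X1 = rng.standard_normal((n, k1))           # covariates common to both models
Xe = rng.standard_normal((n, ke))           # extra covariates tested by H1: beta_e = 0
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)  # projection onto the column space of X1
V = (np.eye(n) - P1) @ Xe                   # X_e orthogonalized against X1
scale = n * np.linalg.inv(V.T @ V)          # Cauchy scale matrix (times sigma^2)
```

By construction X_1^t V = 0, so the prior on β_e involves only the part of X_e not explained by X_1.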
Since comparisons among existing objective Bayesian testing procedures for the linear model have been given extensively in the literature, including Bayes factors derived with JZS priors, we skip them here (see for example Berger, Ghosh and Mukhopadhyay, 2003; Liang et al., 2007; Bayarri and García-Donato, 2007).

5 Approximations and computation

In this section, we derive simple approximations to DB priors and show their connections with already existing proposals. We also exploit the connection between DB Bayes factors and a corrected Bayes factor computed with the usual (possibly improper) non-informative priors to propose easy MCMC computation of DB Bayes factors.

5.1 Approximate DB priors

It is well known (see Kullback, 1968; Schervish, 1995) that the Kullback-Leibler divergence measures can be approximated up to second order using the expected Fisher information, so that

D_S[(θ, θ_0) | ν] ≈ (θ − θ_0)^t J_θ(θ_0, ν) (θ − θ_0) ≈ D_M[(θ, θ_0) | ν],

where J_θ(θ_0, ν) is the block of the Fisher information matrix corresponding to θ, evaluated at (θ_0, ν). Hence, for problem (24) (recall that θ and ν are orthogonal), the DB priors π^D (either π^S or π^M) can be approximated by π^D_1(ν) = π^N(ν) and

π^D(θ | ν) = c(q*, ν)^{−1} h_{q*}((θ − θ_0)^t [J_θ(θ_0, ν)/n*] (θ − θ_0)) π^N(θ | ν),   (28)

where now q* = q + 1/2, and q is the infimum of the values of q for which the conditional prior defined in (28) (in terms of Fisher information) is proper. The cases in which π^N(θ | ν) does not depend on θ (so that θ behaves asymptotically as a location parameter) are especially interesting.
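As a quick numerical check of this second-order expansion, consider a one-parameter Exponential model with rate θ, where both sides are available in closed form: the symmetrized (sum) divergence is θ/θ_0 + θ_0/θ − 2 and the Fisher information is J(θ_0) = 1/θ_0². (This worked example is ours, not from the paper.)

```python
# Exact sum divergence vs. its quadratic (Fisher-information) approximation
# for Exponential(rate theta) against the null value theta0.
theta0 = 2.0
for theta in (2.01, 2.1, 2.5):
    exact = theta / theta0 + theta0 / theta - 2.0    # D_S[(theta, theta0)]
    approx = (theta - theta0) ** 2 / theta0 ** 2     # (theta-theta0)^2 J(theta0)
    print(theta, exact, approx)
```

Close to θ_0 the two agree to several digits; at θ = 2.5 the quadratic term already overstates the divergence by roughly 20%, illustrating that (28) is a local approximation.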
It is then easy to show that q = k/2, where k is the dimension of θ, and hence

π^D(θ | ν) ≈ Ca_k(θ | θ_0, n* J_θ^{−1}(θ_0, ν)).   (29)

The conditional prior (29) has been interpreted by many authors (see for instance Kass and Wasserman, 1995) as the generalization of Jeffreys' ideas to multivariate problems. Moreover, if h_q(t) = e^{−qt} is used instead, then π^D would essentially be the normal unit information prior, as defined by Kass and Wasserman (1995) and further studied by Raftery (1998). Note that we have shown that these proposals can be interpreted as approximate DB priors only when θ is asymptotically a location parameter.

5.2 Computation of Bayes factors

Interestingly enough, and similarly to other objective Bayesian proposals (like the intrinsic and fractional Bayes factors), it can be shown that Bayes factors computed with DB priors, B^D_21, can be expressed as an (invalid) Bayes factor computed with non-informative (usually improper) priors, B^N_21, multiplied by a correction factor. This expression also allows for easy computation of DB Bayes factors when B^N_21 is easy to compute.

Lemma 5.1. For problem (24) (with θ and ν orthogonal), let B^N_21 denote the Bayes factor computed using π^N_1(ν) and π^N_2(θ, ν); then, for both the sum- and min-DB priors,

B^D_21 = B^N_21 × E^{π^N(θ,ν|y)}[c(q*, ν)^{−1} h_{q*}(D̄[(θ, θ_0) | ν])].   (30)

Proof. See Appendix.

Computation of B^N_21 is often simpler than computation of proper Bayes factors. Then a sample (usually MCMC) from the posterior distribution π^N(θ, ν | y) can be used to evaluate the expectation in (30), thus considerably simplifying computation of B^S_12 or B^M_12. This is actually how we computed the Bayes factors for Example 6 in Section 4.3.1.
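In code, the identity (30) amounts to averaging the correction term over posterior draws. A hedged, standard-library sketch (the function names and arguments are ours; dbar and c_qstar must be supplied for the model at hand):

```python
def db_bayes_factor(BN21, posterior_draws, dbar, c_qstar, qstar):
    """Lemma 5.1 correction: B^D_21 = B^N_21 * E[ h_{q*}(Dbar) / c(q*, nu) ],
    the expectation taken over draws (theta, nu) ~ pi^N(theta, nu | y).

    posterior_draws: iterable of (theta, nu) pairs (e.g. MCMC output);
    dbar(theta, nu): unitary divergence Dbar[(theta, theta_0) | nu];
    c_qstar(nu): normalizing constant c(q*, nu)."""
    h = lambda t: (1.0 + t) ** (-qstar)      # h_{q*}(t) = (1 + t)^{-q*}
    vals = [h(dbar(th, nu)) / c_qstar(nu) for th, nu in posterior_draws]
    return BN21 * sum(vals) / len(vals)
```

As a sanity check, if the divergence is zero at every draw and c ≡ 1, the correction factor is 1 and the DB and non-informative Bayes factors coincide.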
Moreover, if n is large (relative to the dimension of φ = (θ, ν), assumed fixed), we can approximate (30) using asymptotic expressions for the posterior distribution along with the approximate DB priors given in (28). We illustrate the approach in a simple setting. First we assume that the asymptotic posterior distribution is given by (see conditions in e.g. Berger, 1985)

π^N(θ, ν | y) ≈ N(φ̂, J^{−1}(φ̂)),

where φ̂ = (θ̂, ν̂) is the (assumed to exist) maximum likelihood estimate of (θ, ν) and J = J_θ ⊕ J_ν is the (block-diagonal) expected Fisher information matrix of f(y | θ, ν). Next we assume that π^N(θ | ν) does not depend on θ, so that the approximating (conditional) DB prior is the Cauchy prior in (29). As a notational device, it will then be convenient to write π^N(θ | ν) as π^N(θ_0 | ν). Expressing the Cauchy density (29) in the usual way as a scale mixture of a normal and an inverse gamma, and using the asymptotic posterior, the DB Bayes factor, as given in (30), can be approximated by

B^D_21 ≈ B^N_21 ∫∫ [1/π^N(θ_0 | ν)] N_k(θ̂ | θ_0, Σ(ν, t)) N_p(ν | ν̂, J_ν^{−1}(φ̂)) dν IGa(t | 1/2, 1/2) dt,

where p is the dimension of ν and Σ(ν, t) = (t/n) J_θ^{−1}(θ_0, ν) + J_θ^{−1}(φ̂). A similar asymptotic approximation to B^N_21 finally gives the desired asymptotic approximation to the DB Bayes factor:

B^D_21 ≈ [p(y | φ̂)/p(y | θ_0, ν̂)] (2π)^{k/2} [det J_θ(φ̂)]^{−1/2} × ∫∫ [π^N(θ̂ | ν̂)/π^N(θ_0 | ν)] N_k(θ̂ | θ_0, Σ(ν, t)) N_p(ν | ν̂, J_ν^{−1}(φ̂)) IGa(t | 1/2, 1/2) dν dt,

which is very easy to evaluate by simple Monte Carlo. Note that arbitrary constants in the possibly improper π^N(θ | ν) cancel out in the expression above.
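The scale-mixture step used above can itself be verified by Monte Carlo: a standard Cauchy density is the mixture of N(0, t) densities with t ~ IGa(1/2, 1/2). A small sketch of ours (assuming numpy), exploiting that 1/χ²_1 ~ IGa(1/2, 1/2):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200_000
t = 1.0 / rng.standard_normal(m) ** 2       # t ~ IGa(1/2, 1/2), i.e. an inverse chi^2_1
x = 0.5
mc = np.mean(np.exp(-x * x / (2.0 * t)) / np.sqrt(2.0 * np.pi * t))
cauchy = 1.0 / (np.pi * (1.0 + x * x))      # standard Cauchy density at x
print(mc, cauchy)                            # the two values should agree closely
```

The same mixture device is what turns the Cauchy prior (29) into nested Gaussian integrals, which is why the double integral above is cheap to evaluate by simulation.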
6 Summary and conclusions

Extending pioneering work by Jeffreys (1961), we propose a new class of priors for objective Bayesian hypothesis testing based on divergence measures, which we call 'divergence based' (DB) priors. For divergence measures, we propose the use of symmetrized versions (the sum and the minimum) of Kullback-Leibler divergences. The resulting DB priors are usually easy to compute and have a number of desirable properties, such as invariance under reparameterizations, evidence consistency and compatibility with sufficient statistics. We explore DB priors in a series of study examples, in which they are shown to be intuitively sound and to produce sensible Bayes factors. This is so even for irregular models and improper likelihoods, which are extremely challenging scenarios for other objective Bayesian testing methodologies. We recommend use of the sum-DB prior when it exists, because it is considerably easier to compute than the min-DB prior and seems to exhibit a nicer behavior. The DB priors seem to behave similarly to the arithmetic intrinsic prior (when defined). Also, in normal scenarios, they exactly reproduce the proposals of Jeffreys (1961) and Zellner and Siow (1980, 1984), so that they can be considered an extension of these classical proposals to non-normal situations. Approximations to DB priors are also shown to be connected with other proposals, such as the unit information priors. Finally, we also provide asymptotic approximations to DB Bayes factors for large sample sizes. The definition of DB priors is based on particular choices of both 1) an 'objective prior' π^N for estimation problems and 2) an equivalent sample size n*. Of course, there is no general agreement in the literature about a single definition for either of these concepts (and there might never be).
We think that any sensible proposal would produce nice results, but this is an issue that needs to be further investigated. We recommend, when possible, use of the reference prior (Berger and Bernardo, 1992) and of the equivalent sample size in Berger et al. (2007). Other apparently arbitrary choices that we made were those of h_q and of q*; however, they were based on some compelling arguments:

• The choice h_q(t) = (1 + t)^{−q} was specifically made to reproduce the Jeffreys-Zellner-Siow priors in the normal case, but there are other reasons for it. A compelling reason is that it is a simple function resulting in Bayes factors with nice properties; another simple function to use could be the exponential, but this results in normal priors that are not evidence consistent. Also, h_q results in priors with very heavy tails, which is important so as not to 'knock out' the likelihood when the data are not well explained by the null model. However, we do not rule out that other choices of functions h(t), decreasing for t ∈ [0, ∞) with maximum at zero and producing proper DB-type priors, could work better in specific scenarios.

• The choice q* = q + 1/2. In principle, any q + δ could be used. As a matter of fact, we do not expect that the specific choice of δ matters much as long as δ ∈ (0, 1) (needed to produce priors with heavy tails and no moments), but this again needs further investigation. We recommend use of δ = 1/2 because it is the value reproducing Jeffreys' proposal.

Acknowledgements

Comments by Jim Berger are gratefully acknowledged. This research was supported in part by the Spanish Ministry of Science and Technology, under Grant MTM2004-03290.

References

Bayarri, M.J. and García-Donato, G. (2007), "Extending conventional priors for testing general hypotheses in linear models," Biometrika, 94, 135-152.
Berger, J.O. (1985), Statistical Decision Theory and Bayesian Analysis (2nd ed.), New York: Springer-Verlag.

Berger, J.O. and Bernardo, J.M. (1992), "On the development of the reference prior method." In Bayesian Statistics 4 (eds J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith), pp. 35-60. Oxford: Oxford University Press.

Berger, J.O. and Delampady, M. (1987), "Testing precise hypotheses," Statistical Science, 3, 317-352.

Berger, J.O. and Mortera, J. (1999), "Default Bayes factors for nonnested hypothesis testing," Journal of the American Statistical Association, 94, 542-554.

Berger, J.O., Ghosh, J.K. and Mukhopadhyay, N. (2003), "Approximations to the Bayes factor in model selection problems and consistency issues," Journal of Statistical Planning and Inference, 112, 241-258.

Berger, J.O. and Pericchi, L.R. (1996), "The intrinsic Bayes factor for model selection and prediction," Journal of the American Statistical Association, 91, 109-122.

Berger, J.O. and Pericchi, L.R. (2001), "Objective Bayesian methods for model selection: introduction and comparison (with discussion)." In Model Selection (ed P. Lahiri), pp. 135-207. Institute of Mathematical Statistics Lecture Notes-Monograph Series, volume 38. Beachwood, Ohio.

Berger, J.O., Pericchi, L.R. and Varshavsky, J.A. (1998), "Bayes factors and marginal distributions in invariant situations," Sankhya A, 60, 307-321.

Berger, J.O. and Sellke, T. (1987), "Testing a point null hypothesis: the irreconcilability of P-values and evidence," Journal of the American Statistical Association, 82, 112-122.

Berger, J. et al. (2007), "Extensions and generalizations of BIC," ISDS Working Paper, in preparation.

Bernardo, J.M. and Rueda, R. (2002), "Bayesian hypothesis testing: A reference approach," International Statistical Review, 70, 351-372.

Bernardo, J.M.
(2005), "Intrinsic credible regions: An objective Bayesian approach to interval estimation," Test, 14, 317-384.

Clyde, M. (1999), "Bayesian model averaging and model search strategies (with discussion)." In Bayesian Statistics 6 (eds J.M. Bernardo, A.P. Dawid, J.O. Berger and A.F.M. Smith), pp. 157-185. Oxford: Oxford University Press.

Clyde, M., DeSimone, H. and Parmigiani, G. (1996), "Prediction via orthogonalized model mixing," Journal of the American Statistical Association, 91, 1197-1208.

Conover, W.J. (1971), Practical Nonparametric Statistics, New York: John Wiley and Sons.

Cox, D.R. and Reid, N. (1987), "Parameter orthogonality and approximate conditional inference," Journal of the Royal Statistical Society B, 49, 1-39.

Datta, G.S. and Ghosh, M. (1995), "On the invariance of noninformative priors," Annals of Statistics, 24, 141-159.

De Santis, F. and Spezzaferri, F. (1999), "Methods for default and robust Bayesian model comparison: The fractional Bayes factor approach," International Statistics Review, 67, 267-286.

García-Donato, G. (2003), Factores Bayes Convencionales: Algunos Aspectos Relevantes, unpublished PhD thesis, Department of Statistics, University of Valencia.

García-Donato, G. and Sun, D. (2007), "Objective priors for model selection in one-way random effects models," The Canadian Journal of Statistics, in press.

Hoeting, J.A., Madigan, D., Raftery, A.E. and Volinsky, C.T. (1999), "Bayesian model averaging: A tutorial," Statistical Science, 14, 382-417.

Ibrahim, J. and Laud, P. (1994), "A predictive approach to the analysis of designed experiments," Journal of the American Statistical Association, 89, 309-319.

Jeffreys, H. (1961), Theory of Probability, 3rd edn. London: Oxford University Press.

Kass, R.E. and Raftery, A.E.
(1995), "Bayes factors," Journal of the American Statistical Association, 90, 773-795.

Kass, R.E. and Vaidyanathan, S. (1992), "Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions," Journal of the Royal Statistical Society B, 54, 129-144.

Kass, R.E. and Wasserman, L. (1995), "A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion," Journal of the American Statistical Association, 90, 928-934.

Kullback, S. (1968), Information Theory and Statistics, New York: Dover Publications, Inc.

Laud, P.W. and Ibrahim, J. (1995), "Predictive model selection," Journal of the Royal Statistical Society B, 57, 247-262.

Liang, F., Paulo, R., Molina, G., Clyde, M. and Berger, J.O. (2007), "Mixtures of g-priors for Bayesian variable selection," Journal of the American Statistical Association, in press.

Montgomery, D. (2001), Introduction to Statistical Quality Control, 4th edn. John Wiley and Sons, Inc.

Moreno, E., Bertolino, F. and Racugno, W. (1998), "An intrinsic limiting procedure for model selection and hypotheses testing," Journal of the American Statistical Association, 93, 1451-1460.

O'Hagan, A. (1995), "Fractional Bayes factors for model comparison (with discussion)," Journal of the Royal Statistical Society B, 57, 99-138.

Pauler, D. (1998), "The Schwarz criterion and related methods for normal linear models," Biometrika, 85, 13-27.

Pauler, D.K., Wakefield, J.C. and Kass, R.E. (1999), "Bayes factors and approximations for variance component models," Journal of the American Statistical Association, 94, 1242-1253.

Pérez, J.M. and Berger, J.
(2001), "Analysis of mixture models using expected posterior priors, with application to classification of gamma ray bursts." In Bayesian Methods, with Applications to Science, Policy and Official Statistics (eds E. George and P. Nanopoulos), pp. 401-410. Official Publications of the European Communities, Luxembourg.

Pérez, J.M. and Berger, J.O. (2002), "Expected posterior prior distributions for model selection," Biometrika, 89, 491-512.

Pérez, S. (2005), Métodos Bayesianos objetivos de comparación de medias, unpublished PhD thesis, Department of Statistics, University of Valencia.

Raftery, A.E. (1998), "Bayes factors and BIC: comment on Weakliem," Technical Report 347, Department of Statistics, University of Washington.

Schervish, M.J. (1995), Theory of Statistics. New York: Springer-Verlag.

Tanner, M.A. (1996), Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 3rd edn. New York: Springer-Verlag.

Zellner, A. and Siow, A. (1980), "Posterior odds ratios for selected regression hypotheses." In Bayesian Statistics 1 (eds J.M. Bernardo, M.H. DeGroot, D.V. Lindley and A.F.M. Smith), pp. 585-603. Valencia: University Press.

Zellner, A. and Siow, A. (1984), Basic Issues in Econometrics. Chicago: University of Chicago Press.

Appendix. Proofs.

Proof of Proposition 1. Let D̄*[ξ, ξ_0] be the unitary measure of divergence between f*_1(y) and f*_2(y | ξ) in (14). It is well known that KL remains the same under one-to-one reparameterizations, and clearly D̄*[ξ(θ), ξ(θ_0)] = D̄[θ, θ_0]. Now, by the definition of DB priors, and using the relation between π^N_θ and π^N_ξ, it follows that

π^D_θ(θ) = c(q*)^{−1} h_{q*}(D̄[θ, θ_0]) π^N_θ(θ) = c(q*)^{−1} h_{q*}(D̄*[ξ(θ), ξ(θ_0)]) π^N_ξ(ξ(θ)) |J_ξ(θ)| = π^D_ξ(ξ(θ)) |J_ξ(θ)|.
Proof of Proposition 2. Let D*[θ, θ_0] be the symmetric divergence between f*_1(t) and f*_2(t | θ) in (15), and hence D*[θ, θ_0] = D[θ, θ_0]. The result now follows from the assumption that neither π^N nor n* changes when the problem is formulated in terms of sufficient statistics.

Proof of Lemma 3.3. First we show that (18) implies that B^π_12 → 0 as ȳ → 0. Assume ∫_0^1 μ^{−k} π(μ) dμ = ∞. Then

lim_{ȳ→0} m_2(y) = lim_{ȳ→0} ∫_0^∞ μ^{−n} e^{−nȳ/μ} π(μ) dμ ≥ ∫_0^1 μ^{−k} π(μ) dμ = ∞,

and the result follows. To show the converse, note that, since π(μ) is proper,

lim_{ȳ→0} ∫_1^∞ μ^{−n} e^{−nȳ/μ} π(μ) dμ < ∞.   (31)

Now, by contradiction, suppose that for n ≥ k, ∫_0^1 μ^{−k} π(μ) dμ < ∞, so that in particular ∫_0^1 μ^{−n} π(μ) dμ < ∞, and hence the limiting function g(μ) = μ^{−n} π(μ) is integrable; the Dominated Convergence Theorem then gives

lim_{ȳ→0} ∫_0^1 μ^{−n} e^{−nȳ/μ} π(μ) dμ = ∫_0^1 μ^{−n} π(μ) dμ < ∞,

which, jointly with (31), contradicts the assumption that B^π_12 → 0 as ȳ → 0, proving the result.

Proof of Lemma 3.5. It can easily be seen that, as T → ∞,

B^π_21 → e^{−nθ_0} ∫_{−∞}^∞ e^{nθ} π(θ) dθ.

Now, for all n ≥ k, it follows that

∫_{−∞}^∞ e^{nθ} π(θ) dθ ≥ ∫_{−∞}^∞ e^{kθ} π(θ) dθ ≥ ∫_{θ_0}^∞ e^{kθ} π(θ) dθ,

proving the lemma.

Proof of Theorem 1. By definition, the DB priors for the reparameterized problem are π^D_ν(ν) = π^N_ν(ν) and (recall that h_q(t) = (1 + t)^{−q})

π^D_{ξ,η}(ξ, η) = c*(q*, η)^{−1} h_{q*}(D̄*[(ξ, ξ_0) | η]) π^N_{ξ|η}(ξ | η) π^N_η(η),

where D̄*[(ξ, ξ_0) | η] is the corresponding unitary measure of divergence between the competing models f*_1 and f*_2 in (27), and

c*(q*, η) = ∫ h_{q*}(D̄*[(ξ, ξ_0) | η]) π^N_{ξ|η}(ξ | η) dξ.

It can easily be shown that D̄*[(ξ, ξ_0) | η] = D̄[(θ, θ_0) | ν].
Also, under the assumptions of the theorem,

π^N_{θ,ν}(θ, ν) = κ_2 π^N_{ξ,η}(ξ(θ), η(ν)) |J_{ξ,η}(θ, ν)|,

where κ_2 is a constant. Then

π^N_{θ|ν}(θ | ν) = (κ_2/κ) π^N_{ξ|η}(ξ(θ) | η(ν)) |J_ξ(θ)|,

and hence

c(q*, ν) = (κ_2/κ) c*(q*, η(ν)),

and the result follows.

Proof of Lemma 5.1. For i = 1, 2, let m^D_i(y) and m^N_i(y) denote the prior predictive marginals obtained with π^D_i and π^N_i, respectively. Since π^D_1 = π^N_1, we have m^D_1(y) = m^N_1(y), and hence

B^D_21 = m^D_2(y)/m^D_1(y) = [m^N_2(y)/m^N_1(y)] × [m^D_2(y)/m^N_2(y)] = B^N_21 × m^D_2(y)/m^N_2(y).

Finally,

m^D_2(y) = ∫ f(y | θ, ν) π^D(θ, ν) dθ dν = ∫ f(y | θ, ν) c(q*, ν)^{−1} h_{q*}(D̄[(θ, θ_0) | ν]) π^N(θ, ν) dθ dν = m^N_2(y) ∫ c(q*, ν)^{−1} h_{q*}(D̄[(θ, θ_0) | ν]) π^N(θ, ν | y) dθ dν,

and the result holds.