Comment: Bayesian Checking of the Second Levels of Hierarchical Models

Statistical Science 2007, Vol. 22, No. 3, 344–348
DOI: 10.1214/07-STS235C
Main article DOI: 10.1214/07-STS235
© Institute of Mathematical Statistics, 2007

M. Evans

M. Evans is Professor, Department of Statistics, University of Toronto, Toronto, Ontario M5S 3G3, Canada (e-mail: mevans@utstat.utoronto.ca). This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2007, Vol. 22, No. 3, 344–348. This reprint differs from the original in pagination and typographic detail.

Abstract. We discuss the methods of Evans and Moshonov [Bayesian Analysis 1 (2006) 893–914, Bayesian Statistics and Its Applications (2007) 145–159] concerning checking for prior-data conflict and their relevance to the method proposed in this paper.

Key words and phrases: Checking for prior-data conflict, sufficiency, ancillarity, noninformativity.

1. INTRODUCTION

This is an interesting paper dealing with an important topic. It is a logical continuation of the contributions found in Bayarri and Berger (2000). In particular, it continues the emphasis on avoiding the "double use of the data" and this is an important point that we agree with.

While it seems intuitively clear what "double use of the data" means, it would be nice to have a precise definition, as the phrase seems to be used a bit too freely by some at times, at least in our view. Intuitively, in model checking, this would seem to be the situation where the fitted model depends on a particular aspect of the data and then the model is checked by comparing the same aspect of the data with the fitted model. On the other hand, we have seen assertions that a "double use of the data" is being made in situations like computing a posterior (the first use) and then (the second use) computing a characteristic of that distribution like a mode or hpd region. While in some technical sense this seems like using the data twice, there does not seem to be anything wrong with it, at least to us. Rather than giving a definition, this paper, like Bayarri and Berger (2000) and Robins, van der Vaart and Ventura (2000), points to a negative consequence of double use of the data, in terms of the lack of uniformity of p-values. Perhaps the factorization in Section 2 of this discussion gives a general method of ensuring that components of the total information available to a statistician for an analysis are used appropriately, and so gives a general characterization for avoidance of "double use of the information."

This paper assumes a default or "objective" prior on the last level of a hierarchically specified prior. In general this will result in an improper prior. Part of the motivation for this seems to be that "model checking with informative priors cannot separate inadequacy of the prior from inadequacy of the model" and so the methodology proposed by Box (1980), which is based on proper priors, is not used. We disagree with the quoted statement. The methods discussed in Evans and Moshonov (2006, 2007) are a modification of Box's approach and are motivated precisely by the need to separate the two kinds of inadequacies in the context of proper, informative priors which, as they should, represent subjective beliefs. We briefly outline this approach in Section 2. Also, Evans and Moshonov (2006) includes methodology for checking the second level of a hierarchical model based on a factorization of the full information.
We discuss this in Section 3 and show that this methodology is also applicable when the first level is improper.

While we agree with the necessity to consider improper priors as part of a general theory of statistics, it is difficult for us to accept these as a basis from which statistical theory is built. It is our opinion that the core of statistics is represented by the proper prior context. As such, we feel that what is done outside of this core should be highly influenced, if not directed, by the central theory with proper priors. So our discussion reflects this and considers the implications for the situation discussed in this paper.

For us, checking the sampling model and the prior are important parts of a statistical analysis. A common complaint concerning the prior is that it is subjective, as it represents someone's personal beliefs about the true value of $\theta$. A common retort is that the sampling model is also subjective as it represents someone's belief that the true distribution is in this class, that is, it was someone's subjective choice. Of course, both these statements are correct as there is typically little "objective" about either choice. From another point of view, the fact that these choices are subjective is a good thing because they are (hopefully) informed choices and that should lead to better statistical analyses than if we made these choices arbitrarily, or based on convention. For us the way to reconcile the debate between objective and subjective is through checking that these ingredients make sense in light of what we know to be truly objective (at least if it is collected correctly), namely, the data. Others argue that no such checks should be made, as they lead us to be incoherent.
There is a wide diversity of opinion on these matters and we certainly acknowledge value in various points of view.

2. FACTORING THE FULL INFORMATION

Suppose we have prescribed a sampling model $\{P_\theta : \theta \in \Theta\}$, a proper prior $\Pi$, and have observed the data $x$. The sampling model and prior combine to give the joint model $P_\theta \times \Pi$ for $(x, \theta)$. We will suppose that this joint model and the observed data comprise the full information available to the analyst. We are not saying that further information may not be available in an analysis, but we will restrict our discussion to the situation where this is all we have. Further, denote the prior predictive measure by $M(B) = \int_\Theta P_\theta(B)\,\Pi(d\theta)$; for statistics $T$ and $U \circ T$ on the sample space let $M_T(\cdot \mid U \circ T)$ denote the conditional prior predictive distribution of $T$ given $U \circ T$, and $\Pi(\cdot \mid x)$ denote the posterior of $\theta$.

In Box's approach to model checking, the observed value of $x$ is compared with $M$ to see if there is model failure, that is, we check to see if $x$ is a surprising value from $M$. There would appear to be an illogicality involved in this, however, as we know, at least in the subjective Bayesian context, that $x$ was not generated from $M$. If our assertion was that $x$ was generated from $M$, perhaps as a random effects model, then it would make sense to check $x$ against $M$, as this is an assertion about the underlying data generating mechanism. It is clearly more appropriate, in a Bayesian context, however, to see if $x$ is not surprising for at least one of the distributions in $\{P_\theta : \theta \in \Theta\}$, that is, check $x$ against what we are asserting is the data generating mechanism, namely the sampling model.
As discussed in Evans and Moshonov (2006), there are two possibilities for failure in the Bayesian formulation: the sampling model may fail by $x$ being surprising for each distribution in the sampling model or, if the sampling model does not fail, the prior may conflict with the data by placing the bulk of its mass on those distributions in the sampling model for which the data is surprising. Note that it only makes sense to talk about prior-data conflict if the sampling model does not fail. Logically, checking the sampling model precedes checking for prior-data conflict.

How then should we check for prior-data conflict? Intuitively this arises when the effective supports of the likelihood and the prior do not overlap. As discussed in Evans and Moshonov (2006), however, the clearest approach to measuring this conflict comes from asking if the observed likelihood is a surprising value from its prior predictive distribution. Given that the likelihood map is minimal sufficient, this is equivalent to asking if the observed value $T(x)$ of a minimal sufficient statistic $T$ is surprising from its marginal prior predictive $M_T$. Further consideration shows that $T(x)$ can be surprising simply because some value $U(T(x))$ is surprising where $U \circ T$ is ancillary. When such ancillaries exist, this leads to comparing $T(x)$ to $M_T(\cdot \mid U \circ T)$ where $U \circ T$ is a maximal ancillary, as this conditioning removes the maximal amount of ancillary variation. Ancillary variation is clearly not relevant to assessing prior-data conflict as it does not depend on the parameter. Further, there is nothing to prevent us from using some function $S(T)$, and comparing its observed value to the distribution $M_{S(T)}(\cdot \mid U \circ T)$, to check for prior-data conflict. Of course, $S$ has to be chosen sensibly if we are going to make a meaningful check.
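As a minimal sketch of such a check, consider a setting that is assumed here purely for illustration and is not part of the discussion above: $x_1, \ldots, x_n$ i.i.d. $N(\theta, \sigma^2)$ with $\sigma^2$ known and a $N(\mu_0, \tau_0^2)$ prior on $\theta$. Then $T(x) = \bar{x}$ is minimal sufficient and $M_T$ is $N(\mu_0, \tau_0^2 + \sigma^2/n)$. The function name, the numerical values, and the choice of a two-sided tail probability as the measure of surprise are all hypothetical choices.

```python
import math


def prior_data_conflict_pvalue(xbar, n, sigma2, mu0, tau02):
    """Two-sided tail probability of the observed sample mean under its
    prior predictive N(mu0, tau02 + sigma2 / n).

    Illustrative only: assumes a normal location model with known
    variance sigma2 and a N(mu0, tau02) prior on the mean.
    """
    sd = math.sqrt(tau02 + sigma2 / n)   # prior predictive std. dev. of xbar
    z = abs(xbar - mu0) / sd             # standardized distance from prior mean
    # P(|Z| >= z) for a standard normal Z equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Sample mean close to where the prior puts its mass: little evidence of conflict.
p_agree = prior_data_conflict_pvalue(xbar=0.3, n=20, sigma2=1.0, mu0=0.0, tau02=1.0)
# Sample mean far out in the tail of the prior predictive: evidence of conflict.
p_conflict = prior_data_conflict_pvalue(xbar=6.0, n=20, sigma2=1.0, mu0=0.0, tau02=1.0)
print(p_agree, p_conflict)
```

A small value indicates that the observed $T(x)$ lies in the tails of $M_T$. In this sketch the check only makes sense once the sampling model itself has been accepted.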
This approach leads to the following factorization of the joint distribution:

$$P_\theta \times \Pi = P(\cdot \mid T) \times P_{U \circ T} \times M_T(\cdot \mid U \circ T) \times \Pi(\cdot \mid x), \tag{1}$$

where $P(\cdot \mid T)$ is the conditional distribution of the data given the minimal sufficient statistic $T$, and so does not involve $\theta$, and $P_{U \circ T}$ is the marginal distribution of $U \circ T$, which is also free of $\theta$. Each of the components in (1) plays a separate role in a statistical analysis. $P(\cdot \mid T)$ and $P_{U \circ T}$ are available for checking the sampling model, $M_T(\cdot \mid U \circ T)$ is available for checking for prior-data conflict and $\Pi(\cdot \mid x)$ [which really only depends on the data through $T(x)$] is for inference about $\theta$. We see that $M = P(\cdot \mid T) \times P_{U \circ T} \times M_T(\cdot \mid U \circ T)$, which explains how this is a modification of Box's approach, and it shows how to check for inadequacies in the prior as well as the sampling model.

It is our claim that effectively (1) shows us how to proceed to avoid double use of the information and, as such, avoid double use of the data. Of course, as mentioned in the paper, it may be difficult, with complicated models, to determine $P(\cdot \mid T)$ or $P_{U \circ T}$ in meaningful ways. Accordingly, it seems reasonable to weaken this requirement in such contexts to having this hold asymptotically in some sense. For example, a chi-squared goodness-of-fit test is asymptotically ancillary.

In the context of an improper prior that leads to a proper posterior, (1) is still available, but now the factor $M_T(\cdot \mid U \circ T)$ is not a probability measure and so it is not clear how we would check for prior-data conflict. As discussed in Evans and Moshonov (2006, 2007), a partial characterization of a noninformative prior is that it would never lead to evidence of a prior-data conflict existing no matter what data is obtained.
Thus the choice of an improper prior is an assertion that this choice avoids such a conflict. Noninformative sequences of priors are also discussed in Evans and Moshonov (2006, 2007) and these can provide a way to justify such a statement for a particular improper prior. In any case, the choice of an improper prior should not in any way change the role of the remaining factors if we follow the principle that the proper case is central. Although we do not have a formal proof, it would seem that the methods discussed in Bayarri and Berger (2000) will satisfy this asymptotically.

Further, any p-values computed according to this factorization will have the necessary uniform properties when assessed against the appropriate measures. For example, if $p(t) = M_T(h(T) > h(t))$ is a p-value for checking for prior-data conflict with no ancillary, then $p(T)$ will be uniformly distributed, at least in the continuous case, when $T \sim M_T$.

3. HIERARCHICAL MODELS

In Evans and Moshonov (2006, 2007) methods are discussed for checking hierarchically specified priors for $\theta = (\theta_1, \theta_2) \in \Theta_1 \times \Theta_2$, that is, we specify priors $\Pi_1$ and $\Pi_2$ so that $\Pi(d(\theta_1, \theta_2)) = \Pi_2(d\theta_2 \mid \theta_1) \times \Pi_1(d\theta_1)$. In such situations we would like to check the individual components of the prior separately, as this gives us more information about a prior-data conflict when this occurs. For example, it may be that $\Pi_1$ conflicts but $\Pi_2$ does not.

We distinguish two different situations. First, the parameters $\theta_1$ and $\theta_2$ may both be part of the likelihood function and second, only $\theta_2$ is part of the likelihood function. The second situation corresponds to hierarchical models and $\theta_1$ is a hyperparameter. Methods are presented in Evans and Moshonov (2006, 2007) for both of these situations, but we only discuss hierarchical models here.
With proper priors we have the prior $\Pi_2^*(d\theta_2) = \int_{\Theta_1} \Pi_2(d\theta_2 \mid \theta_1)\,\Pi_1(d\theta_1)$ for the model parameter, and the methods of Section 2, based on the minimal sufficient statistic $T$ for the model $\{P_{\theta_2} : \theta_2 \in \Omega_2\}$, are available to check whether or not $\Pi_2^*$ conflicts with the data. While this check is available, Evans and Moshonov (2006) develop a factorization that is appropriate for checking the components, such as the second level $\Pi_2(\cdot \mid \theta_1)$, of a hierarchical model.

To simplify the presentation of this, we will suppose there are no relevant ancillaries for $\{P_{\theta_2} : \theta_2 \in \Omega_2\}$ based on $T$, but note that these can be incorporated as well. We can formally generate another model for $x$ from the joint distribution, namely, via

$$M_{\theta_1}(dx) = \int_{\Omega_2} P_{\theta_2}(dx)\,\Pi_2(d\theta_2 \mid \theta_1) = P(dx \mid T)(t) \int_{\Omega_2} P_{\theta_2}^{T}(dt)\,\Pi_2(d\theta_2 \mid \theta_1) = P(dx \mid T)(t) \times M_{\theta_1}^{T}(dt).$$

This model is only formal, as, indeed, our model indicates that $x$ was not generated via $M_{\theta_1}$, for some value of $\theta_1$. Here $M_{\theta_1}$ is the conditional prior predictive distribution for $x$ given $\theta_1$ and $M_{\theta_1}^{T}$ is the conditional prior predictive distribution for $T$ given $\theta_1$. Note that when $\Pi_2(\cdot \mid \theta_1)$ is proper, as in the paper, then $M_{\theta_1}$ and $M_{\theta_1}^{T}$ are also proper.

Let $V(T)$ be a minimal sufficient statistic for the formal model for $T$ given by $\{M_{\theta_1}^{T} : \theta_1 \in \Omega_1\}$. We can factor $M_{\theta_1}^{T}$ as $M_T(\cdot \mid V) \times M_{\theta_1}^{V}$, where $M_T(\cdot \mid V)$ is the conditional prior predictive distribution of $T$ given $V$, and $M_{\theta_1}^{V}$ is the conditional prior predictive distribution of $V$ given $\theta_1$. Then the joint distribution of $(\theta_1, x)$ can be factored as

$$P(\cdot \mid T) \times M_T(\cdot \mid V) \times M_V \times \Pi_1(\cdot \mid V), \tag{2}$$

where $M_V$ is the prior predictive distribution of $V$ and $\Pi_1(\cdot \mid V)$ is the posterior distribution of $\theta_1$.

Consider how each of the factors in (2) is to be used.
First, $P(\cdot \mid T)$ is available for checking the basic sampling model $\{P_{\theta_2} : \theta_2 \in \Omega_2\}$. If no evidence is found against $\{P_{\theta_2} : \theta_2 \in \Omega_2\}$, we can proceed to check the formal model $\{M_{\theta_1}^{T} : \theta_1 \in \Omega_1\}$ for $T$ using $M_T(\cdot \mid V)$ and note that this does not depend on $\Pi_1$. Note also that $M_T(\cdot \mid V)$ is proper whenever $\Pi_2(\cdot \mid \theta_1)$ is proper for each value of $\theta_1$. If evidence is found against this model, then, because we have accepted the sampling model, and so consequently the model $\{P_{\theta_2}^{T} : \theta_2 \in \Omega_2\}$ for $T$, this must occur because of a conflict between the observed value $T(x)$ and $\Pi_2$. So a check of the formal model $\{M_{\theta_1}^{T} : \theta_1 \in \Omega_1\}$ using $M_T(\cdot \mid V)$ is a check for prior-data conflict with $\Pi_2$. Note that this check proceeds exactly as in the simpler situation described in Section 2. If we find no evidence against $\{M_{\theta_1}^{T} : \theta_1 \in \Omega_1\}$, then we can check for a conflict with $\Pi_1$ using $M_V$. Finally, if there is no conflict with $\Pi_1$, then $\Pi_1(\cdot \mid V)$ is available for inference about $\theta_1$. Of course, if there is no conflict with $\Pi_1$ and $\Pi_2$, then we can also make inference about the parameter of interest $\theta_2$.

The model $\{M_{\theta_1}^{V} : \theta_1 \in \Omega_1\}$ may have ancillaries. Let $W \circ V$ be such a maximal ancillary. We then have that $M_V$ factors as $M_V = M_{W \circ V} \times M_V(\cdot \mid W \circ V)$ so that (2) becomes

$$P(\cdot \mid T) \times M_T(\cdot \mid V) \times M_{W \circ V} \times M_V(\cdot \mid W \circ V) \times \Pi_1(\cdot \mid V). \tag{3}$$

In this case, given that we have accepted the sampling model, the factor $M_{W \circ V}$ is available for checking for prior-data conflict with $\Pi_2$, and $M_V(\cdot \mid W \circ V)$ is the appropriate factor for checking $\Pi_1$. The justification for this is exactly as in the simple case discussed in Section 2.

Note that in (3), the only distribution that will necessarily be improper when $\Pi_1$ is improper is $M_V(\cdot \mid W \circ V)$.
The measure $M_V(\cdot \mid W \circ V)$ is to be used only in the check for $\Pi_1$. Therefore, the choice of an improper $\Pi_1$ is really an assertion that this prior will never conflict with the data. Irrespective of whether or not $\Pi_1$ is improper, the factors $M_T(\cdot \mid V)$ and $M_{W \circ V}$ are available to check for prior-data conflict with $\Pi_2$, when it is proper.

We consider the implementation of this approach in the normal-normal hierarchical model presented in the paper.

Example (Normal–normal hierarchical model). We first consider a simpler model. In particular, we assume that the known $\sigma_i^2$ are all equal to $\sigma^2$ and that we have balance, namely, $n_1 = \cdots = n_I = n$. For this problem we have that $T(x) = (\bar{x}_1, \ldots, \bar{x}_I)' \sim N_I(\theta, (\sigma^2/n) I)$ and here $\theta$ is the model parameter (corresponding to $\theta_2$ in our parameterization of a hierarchical model above). Therefore, according to our factorization, we check the sampling model using $P(\cdot \mid T)$, which is effectively the distribution of the residuals.

Now $(\bar{x}_1, \ldots, \bar{x}_I)' = (\theta_1, \ldots, \theta_I)' + (\sigma/\sqrt{n})(z_1, \ldots, z_I)'$ where the $z_i$ are i.i.d. $N(0, 1)$ and, from the second level, $(\theta_1, \ldots, \theta_I)' \sim N_I(\mu \mathbf{1}, \tau^2 I)$, independent of $(z_1, \ldots, z_I)'$. Thus $(\mu, \tau^2)$ is the hyperparameter (corresponding to $\theta_1$ in our parameterization of a hierarchical model above). This implies that $M_{(\mu,\tau^2)}^{T}$ is given by $(\bar{x}_1, \ldots, \bar{x}_I)' \sim N_I(\mu \mathbf{1}, (\tau^2 + \sigma^2/n) I)$. It is then easy to see that $V(\bar{x}_1, \ldots, \bar{x}_I) = (\sum_{i=1}^{I} \bar{x}_i, \sum_{i=1}^{I} \bar{x}_i^2)$ is a minimal sufficient statistic for the model $\{M_{(\mu,\tau^2)}^{T} : \mu \in R^1, \tau^2 > 0\}$. Note also that $V$ is a complete minimal sufficient statistic so there are no relevant ancillaries $W$ that we need consider for the check for the second level.
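The form of $M_{(\mu,\tau^2)}^{T}$ stated in the example can be checked numerically. The following is a minimal Monte Carlo sketch, assuming illustrative values of $I$, $n$, $\sigma^2$, $\mu$ and $\tau^2$ that are not taken from the paper; the helper `V` is a hypothetical name for the statistic $V$ defined above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) settings: I groups, n observations per group.
I, n, sigma2 = 5, 10, 4.0
mu, tau2 = 1.0, 2.0          # one fixed value of the hyperparameter (mu, tau^2)
reps = 200_000               # number of simulated vectors of group means

# Second level: theta_i ~ N(mu, tau^2), independently for each group.
theta = mu + np.sqrt(tau2) * rng.standard_normal((reps, I))
# Group means given theta: xbar_i = theta_i + (sigma / sqrt(n)) * z_i.
xbar = theta + np.sqrt(sigma2 / n) * rng.standard_normal((reps, I))

# Each coordinate should have mean mu and variance tau^2 + sigma^2 / n.
print(xbar.mean(), xbar.var(), tau2 + sigma2 / n)


def V(xbar_row):
    """The statistic V: (sum of group means, sum of squared group means)."""
    xbar_row = np.asarray(xbar_row, dtype=float)
    return np.array([xbar_row.sum(), np.sum(xbar_row ** 2)])
```

The empirical mean and variance printed above should be close to $\mu$ and $\tau^2 + \sigma^2/n$, up to Monte Carlo error.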
To determine $M_T(\cdot \mid V)$ we need the conditional distribution of $(\bar{x}_1, \ldots, \bar{x}_I)'$ given $(\sum_{i=1}^{I} \bar{x}_i, \sum_{i=1}^{I} \bar{x}_i^2)$. This is clearly uniform on the intersection of the sphere of squared radius $\sum_{i=1}^{I} \bar{x}_i^2$ with the hyperplane of $R^I$ given by $\{(y_1, \ldots, y_I)' : \sum_{i=1}^{I} y_i = \sum_{i=1}^{I} \bar{x}_i\}$. This intersection is a sphere within the hyperplane, centered at $\bar{x}_{\cdot} \mathbf{1}$ with radius $r = (\sum_{i=1}^{I} (\bar{x}_i - \bar{x}_{\cdot})^2)^{1/2}$, where $\bar{x}_{\cdot} = I^{-1} \sum_{i=1}^{I} \bar{x}_i$. We can simulate from this distribution by generating $v_1, \ldots, v_{I-1}$ i.i.d. $N(0, 1)$, putting $u_i = v_i / (\sum_{j=1}^{I-1} v_j^2)^{1/2}$ and

$$(y_1, \ldots, y_I)' = \bar{x}_{\cdot} \mathbf{1} + r A (u_1, \ldots, u_{I-1})',$$

where $A \in R^{I \times (I-1)}$ is such that the matrix $(\mathbf{1}/\sqrt{I} \;\; A)$ is orthogonal. The centering and scaling ensure that both $\sum_{i=1}^{I} y_i$ and $\sum_{i=1}^{I} y_i^2$ match their observed values. Then for any particular discrepancy statistic, we can compute an appropriate p-value via simulation.

The above analysis also applies when the $\sigma_i^2/n_i$ are all equal. When they are not equal the analysis is more complicated, as the form of $V$ depends on which ones are equal. Further, it is not a complete minimal sufficient statistic and so there are relevant ancillaries.

Based on the factorization (3) we feel that $M_T(\cdot \mid V)$ and $M_{W \circ V}$ are appropriate distributions for computing p-values to assess the second level for a hierarchical model. Further, the uniformity of the corresponding p-values should be assessed against these distributions and this does not require that $\Pi_1$ be improper.

It is difficult to compare our approach with the proposal in the paper, but we note that our approach has the distinct advantage of not involving the prior for the first level. For our check on the second level we need say nothing about the prior for the first level and it can be improper. The intuition for this lies with conditioning on $V$, which completely removes the effect of $\Pi_1$ on the prior predictive for $T$, and the fact that $\Pi_2$ induces the ancillary $W \circ V$. Therefore any conflict that is found can only be due to $\Pi_2$.
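The simulation from $M_T(\cdot \mid V)$ in the balanced example can be sketched as follows. This is a minimal sketch rather than a definitive implementation: points are drawn uniformly from the set of vectors having the same sum and the same sum of squares as the observed group means, by centering at the grand mean and scaling by the root sum of squared deviations. The function names, the observed values, and the discrepancy (the maximum absolute deviation from the grand mean) are hypothetical choices made only for illustration.

```python
import numpy as np


def sample_given_V(xbar, size, rng):
    """Draw `size` points uniformly from
    {y : sum(y) = sum(xbar), sum(y**2) = sum(xbar**2)}."""
    xbar = np.asarray(xbar, dtype=float)
    I = xbar.size
    center = xbar.mean()
    radius = np.sqrt(np.sum((xbar - center) ** 2))
    # QR of a full-rank matrix whose first column is the vector of ones gives
    # an orthogonal Q with first column +/- 1/sqrt(I); the other columns form A.
    M = np.column_stack([np.ones(I), np.eye(I)[:, 1:]])
    Q, _ = np.linalg.qr(M)
    A = Q[:, 1:]                                        # I x (I - 1), A' 1 = 0
    v = rng.standard_normal((size, I - 1))
    u = v / np.linalg.norm(v, axis=1, keepdims=True)    # uniform on unit sphere
    return center + radius * u @ A.T


def conditional_pvalue(xbar, discrepancy, size=10_000, seed=0):
    """Monte Carlo p-value: share of simulated discrepancies that are at
    least as large as the observed one, under the conditional law given V."""
    rng = np.random.default_rng(seed)
    ys = sample_given_V(xbar, size, rng)
    sims = np.apply_along_axis(discrepancy, 1, ys)
    return float(np.mean(sims >= discrepancy(np.asarray(xbar, dtype=float))))


def max_abs_dev(y):
    """Hypothetical discrepancy: largest absolute deviation from the mean."""
    return np.max(np.abs(y - np.mean(y)))


xbar_obs = [1.2, 0.8, 1.1, 0.9, 1.0, 5.0]   # made-up group means, one outlier
p = conditional_pvalue(xbar_obs, max_abs_dev)
print(p)
```

Because every simulated vector shares the observed value of $V$, the discrepancy must not be a function of $V$ alone, otherwise the comparison is degenerate.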
It may be that the method proposed in the paper will satisfy (3) in an asymptotic sense but we do not have a proof of this.

4. CONCLUSIONS

It is sometimes suggested that model checking is a somewhat informal process. Partly this is because models can fail in many ways and some of these may be more relevant in certain situations than others. It seems impossible then to come up with a methodology that will check for all of the possibilities simultaneously. So it seems reasonable to ask that we specify a set of checks that we think are relevant, prior to seeing the data, and then implement only these, rather than going on a hunting expedition for defects. A similar approach seems appropriate for checking for prior-data conflict.

While selection of the actual checks is perhaps somewhat informal, we do not believe that there is complete freedom in this. Some general principles must apply. The ill effects of double use of the data, as discussed in this paper and Bayarri and Berger (2000), provide a good example of the need for such principles. In frequentist statistical theory, inference about parameters depends on the data only through the minimal sufficient statistic and what is left over in the data (the residual) is available for model checking. Mixing these up would seem to correspond to an inappropriate statistical analysis. We believe this is equally applicable in Bayesian formulations.

Checking for prior-data conflict seems to sit between model checking and inference. While it depends on the minimal sufficient statistic, the factorization given by (1) indicates that it really is separate from model checking and inference as it involves a separate component of the full information as expressed by the joint distribution.
In essence (1) prescribes how each component of the full information is to be used in a statistical analysis. If we mix these up, it would seem to us that we can expect illogical or incoherent behavior, for example, overly conservative p-values. Note that in a certain sense each component of (1) is independent of the others, as we could prescribe each probability measure separately and still end up with a valid joint distribution. Specification of each component of (1) is necessary and sufficient for the specification of a joint probability distribution for $(x, \theta)$.

Of course, this restriction could be weakened to requiring that a methodology only satisfy (1) in some asymptotic sense. The motivation for this would seem to arise from the complexity of some situations. Still, (1) can be implemented exactly with many models of considerable importance, so it is not just of theoretical relevance.

Similarly, we believe that (3) is the relevant factorization for model checking and checking for prior-data conflict in hierarchical models. From that perspective it would be important to see if the methods proposed in the paper satisfied this in some asymptotic sense. This would give us more confidence that these constituted an appropriate way to proceed in situations where they were felt to be necessary.

We also feel that our discussion of (3) shows that the choice of prior $\Pi_1$ for $\theta_1$ is irrelevant for checking $\Pi_2$ with hierarchical models. In particular, whether $\Pi_1$ is proper or improper, the check for $\Pi_2$ is the same and this is a satisfying result. This does not appear to be the case for the method proposed in the paper which depends, in particular, on which objective prior we use.
Perhaps this effect disappears as the amount of data increases, but then the relevance of checking for prior-data conflict disappears too, as the effect of the prior on inference disappears, at least under reasonable regularity conditions.

Overall, our purpose here is to suggest that there is a principled approach to the question addressed in the paper. We are not saying that using the partial posterior approach is in some way incorrect. We do think, however, that it would be worth investigating to what extent the partial posterior approach satisfied (3).

REFERENCES

Bayarri, M. J. and Berger, J. O. (2000). P values for composite null models (with discussion). J. Amer. Statist. Assoc. 95 1127–1142, 1157–1170. MR1804239

Evans, M. and Moshonov, H. (2006). Checking for prior-data conflict. Bayesian Analysis 1 893–914.

Evans, M. and Moshonov, H. (2007). Checking for prior-data conflict with hierarchically specified priors. Bayesian Statistics and Its Applications (A. K. Upadhyay, U. Singh and D. Dey, eds.) 145–159. Anamaya Publishers, New Delhi.

Robins, J. M., van der Vaart, A. and Ventura, V. (2000). Asymptotic distribution of p values in composite null models (with discussion). J. Amer. Statist. Assoc. 95 1143–1156, 1171–1172. MR1804240
