Comment: Bayesian Checking of the Second Level of Hierarchical Models: Cross-Validated Posterior Predictive Checks Using Discrepancy Measures

Statistical Science 2007, Vol. 22, No. 3, 359–362
DOI: 10.1214/07-STS235B
Main article DOI: 10.1214/07-STS235
© Institute of Mathematical Statistics, 2007

Michael D. Larsen and Lu Lu

M. D. Larsen is Associate Professor and L. Lu is a graduate student, Department of Statistics and Center for Survey Statistics & Methodology, Iowa State University, Snedecor Hall, Ames, Iowa 50011, USA (e-mail: larsen@iastate.edu; icyemma@iastate.edu).

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2007, Vol. 22, No. 3, 359–362. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

We compliment Bayarri and Castellanos (BC) on producing an interesting and insightful paper on model checking applied to the second level of hierarchical models. Distributions of test statistics (functions of the observed data not involving parameters) for judging the appropriateness of hierarchical models typically involve nuisance (i.e., unknown) parameters. BC (2007) focus on ways to remove the dependency on nuisance parameters so that test statistics can be used to assess models, either through p-values or Berger's relative predictive surprise (RPS). They show that posterior predictive checks and a posterior empirical Bayesian method have very low power. They also demonstrate better performance of their partial posterior predictive (ppp) method over a prior empirical Bayesian method. The methods of Dey et al. (1998), O'Hagan (2003) and Marshall and Spiegelhalter (2003) are also compared. The methods are contrasted in terms of whether they require proper prior distributions, how many measures of surprise (one per group or one in total) are produced, and the degree to which the data are used twice, in estimation and in testing. Their preferred method (ppp) can use improper prior distributions, which are referred to as objective, produces a single measure of surprise for each test statistic, and avoids double use of the data. For the models and statistics considered, and in comparison to the alternatives presented, ppp has a more uniform null distribution of p-values and more power against alternatives.

In this discussion, we suggest that cross-validated posterior predictive checks using discrepancy measures hold some promise for evaluating complex models. We apply them to O'Hagan's data example, provide some comments on the paper and discuss possible future work.

2. CROSS-VALIDATED POSTERIOR PREDICTIVE CHECKS USING DISCREPANCY MEASURES

Suppose there are data for $I$ groups: $X_i$, $i = 1, \dots, I$, where $X_i = (X_{ij}, j = 1, \dots, n_i)$. The unknown parameters in the first level in group $i$ are $\theta_i$, with $X_i \sim f(X_i \mid \theta_i)$ independently. The parameters in the second level of the model are $\eta$, with $\pi(\theta \mid \eta) = \prod_{i=1}^{I} \pi(\theta_i \mid \eta)$. The prior distribution on $\eta$ is $\pi(\eta)$. Let $D(X, \theta, \eta)$ be a generalized discrepancy measure. If $D(X, \theta, \eta) = D(X)$, then it is a test statistic. Examples are given in the next section for the normal-normal model considered by BC (2007). Cross-validated posterior predictive model checking using a discrepancy measure is implemented as follows. Separately for each $i = 1, \dots, I$:

1. Generate $M$ values ($m = 1, \dots, M$) from the posterior distribution of $\eta \mid X_{(-i)}$; call them $\eta^m_{(-i)}$, where $X_{(-i)}$ represents all the data without group $i$.
   Generating values of $\eta$ will in many cases be accomplished through iterative simulation methods that also generate values of $\theta_{(-i)}$, the collection of group parameters excluding group $i$:

   $$f(\eta \mid X_{(-i)}) = \int f(\eta, \theta_{(-i)} \mid X_{(-i)})\, d\theta_{(-i)} \propto \int \pi(\eta)\, \pi(\theta_{(-i)} \mid \eta)\, f(X_{(-i)} \mid \theta_{(-i)})\, d\theta_{(-i)}.$$

2. Generate values $\theta^m_i$ of $\theta_i$ given the hyperparameters $\eta^m_{(-i)}$ independently from $\pi(\theta_i \mid \eta^m_{(-i)})$, $m = 1, \dots, M$.
3. Generate replicate data $X^m_i$ independently from $f(X_i \mid \theta^m_i)$, $m = 1, \dots, M$.
4. Compute the proportion of times out of $M$ that $D(X^m_i, \theta^m_i, \eta^m_{(-i)})$ is greater than $D(x_i, \theta^m_i, \eta^m_{(-i)})$, $m = 1, \dots, M$, where $x_i$ denotes the observed data in group $i$. This proportion is the cross-validated posterior predictive p-value for group $i$.

Table 1
Posterior predictive p-values for individual groups and the whole population ($j$ ranges over $1, \dots, n_i$)

Discrepancy                        Group 1  Group 2  Group 3  Group 4  Group 5  Whole population
Overall $X^2$                        0.568    0.857    0.261    0.747    0.287    0.483
1st level $X^2$                      0.547    0.893    0.140    0.893    0.202    0.496
2nd level $X^2$                      0.512    0.594    0.567    0.518    0.403    0.513
$\max_j X_{ij}$                      0.476    0.851    0.060    0.847    0.143    –
$\max_j |X_{ij} - \theta_i|$         0.610    0.839    0.113    0.923    0.283    –
$\max_j |X_{ij} - \mu|$              0.682    0.820    0.286    0.897    0.151    –
$\max_i |\bar{X}_i - \mu|$           –        –        –        –        –        0.493

This proposal allows the use of objective prior distributions, is relatively easy to implement in many hierarchical models, avoids double use of the data in group $i$ when evaluating the model for group $i$, and allows many test statistics and discrepancy measures to be used based on one set of simulations of $\eta$ and $\theta$. On the negative side, this procedure may lose some power for some statistics compared with ppp, but likely much less than regular posterior predictive checks do. The use of more flexibly defined discrepancies, however, could produce relatively powerful evaluations of some aspects of some models.
The proposal requires more computing than regular posterior predictive checks and faces issues of multiplicity in testing. The method is applied in Section 3 and followed by discussion in Section 4.

3. O'HAGAN'S EXAMPLE

O'Hagan's data [see Section 5 of BC (2007)] are used to study the performance of model checking based on regular and cross-validated posterior predictive checks utilizing various discrepancy measures. The model being fit is a two-level normal-normal hierarchical model. Notation is the same as in BC (2007).

Different discrepancy measures relate to different parts of the model. The overall $X^2$ discrepancy, defined by $\sum_{j=1}^{n_i} (X_{ij} - \mu)^2 / (\sigma^2 + \tau^2)$ for group $i$, measures the adequacy of the two levels as a whole. The first- and second-level $X^2$ discrepancies, defined as $\sum_{j=1}^{n_i} (X_{ij} - \theta_i)^2 / \sigma^2$ and $(\theta_i - \mu)^2 / \tau^2$ for group $i$, are designed to detect inadequacy of the first- and second-level models, respectively. The three measures above can also be summed across groups, $i = 1, \dots, I$. The maximum absolute deviation of a group average from the overall center, $\max_i |\bar{X}_i - \mu|$, quantifies the fit of the whole model. The maximum value $\max_{1 \le j \le n_i} X_{ij}$ and the minimum value $\min_{1 \le j \le n_i} X_{ij}$ in group $i$ are sensitive to extremes within groups. The maximum absolute deviation of observations from the group mean in group $i$, $\max_{1 \le j \le n_i} |X_{ij} - \theta_i|$, relates to spread about the mean within group $i$. The maximum absolute deviation of observations from the overall mean in group $i$, $\max_{1 \le j \le n_i} |X_{ij} - \mu|$, relates to the adequacy of both levels of the model.

For the regular posterior predictive checks, noninformative prior distributions for the parameters $\sigma^2$, $\mu$ and $\tau^2$ were used: $\pi(\mu) \propto 1$, $\pi(\sigma^2) \propto 1/\sigma^2$ and $\pi(\tau^2) \propto 1/\tau$ (or equivalently $\pi(\tau) \propto 1$).
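The cross-validated check of Section 2 can be sketched in Python for this normal-normal model. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that the noninformative priors above are also used for the leave-one-group-out posterior in step 1 (the text states them only for the regular checks). It draws $\sigma^2$ jointly with $\eta = (\mu, \tau^2)$ by Gibbs sampling using the standard full conditionals, and the names `gibbs_normal_normal`, `cv_ppp` and `DISCREPANCIES` are hypothetical. Step 4 compares replicated and observed data under the same $\theta^m_i$, so the sketch includes only discrepancies that involve the data. The second-level $X^2$ is left out because, read literally, it would take the same value for both.

```python
import numpy as np


def gibbs_normal_normal(groups, n_keep, rng, burn_in=500):
    """Gibbs sampler (a sketch) for X_ij ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2),
    with pi(mu) propto 1, pi(sigma2) propto 1/sigma2 and pi(tau) propto 1.
    Returns an (n_keep, 3) array of retained draws of (mu, tau2, sigma2)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    I = len(groups)
    n = np.array([len(g) for g in groups])
    xbar = np.array([g.mean() for g in groups])
    N = n.sum()
    # Crude starting values taken from the data
    theta = xbar.copy()
    mu = theta.mean()
    sigma2 = np.mean(np.concatenate([g - g.mean() for g in groups]) ** 2) + 1e-6
    tau2 = theta.var() + 1e-6
    draws = np.empty((n_keep, 3))
    for t in range(burn_in + n_keep):
        # theta_i | rest: normal with a precision-weighted mean of xbar_i and mu
        prec = n / sigma2 + 1.0 / tau2
        theta = rng.normal((n * xbar / sigma2 + mu / tau2) / prec, np.sqrt(1.0 / prec))
        # mu | rest ~ N(mean(theta), tau2 / I) under the flat prior on mu
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / I))
        # sigma2 | rest ~ Inv-Gamma(N/2, SS/2); Inv-Gamma(a, b) = 1 / Gamma(shape=a, scale=1/b)
        ss = sum(np.sum((g - th) ** 2) for g, th in zip(groups, theta))
        sigma2 = 1.0 / rng.gamma(N / 2.0, 2.0 / ss)
        # tau2 | rest ~ Inv-Gamma((I - 1)/2, S/2) under the flat prior on tau
        s = np.sum((theta - mu) ** 2)
        tau2 = 1.0 / rng.gamma((I - 1) / 2.0, 2.0 / s)
        if t >= burn_in:
            draws[t - burn_in] = (mu, tau2, sigma2)
    return draws


# Data-dependent discrepancies of Section 3 for a single group x with parameters
DISCREPANCIES = {
    "overall_X2": lambda x, th, mu, tau2, s2: np.sum((x - mu) ** 2) / (s2 + tau2),
    "first_level_X2": lambda x, th, mu, tau2, s2: np.sum((x - th) ** 2) / s2,
    "max_X": lambda x, th, mu, tau2, s2: np.max(x),
    "max_abs_dev_theta": lambda x, th, mu, tau2, s2: np.max(np.abs(x - th)),
    "max_abs_dev_mu": lambda x, th, mu, tau2, s2: np.max(np.abs(x - mu)),
}


def cv_ppp(groups, i, disc, M=2000, seed=0):
    """Cross-validated posterior predictive p-value for group i (steps 1-4)."""
    rng = np.random.default_rng(seed)
    groups = [np.asarray(g, dtype=float) for g in groups]
    x_i = groups[i]
    others = groups[:i] + groups[i + 1:]
    # Step 1: M draws of (mu, tau2), plus sigma2, from the posterior given X_(-i)
    draws = gibbs_normal_normal(others, M, rng)
    exceed = 0
    for mu, tau2, s2 in draws:
        # Step 2: theta_i^m from the second-level model pi(theta_i | eta^m_(-i))
        th = rng.normal(mu, np.sqrt(tau2))
        # Step 3: replicate group-i data of the same size as the observed x_i
        x_rep = rng.normal(th, np.sqrt(s2), size=len(x_i))
        # Step 4: count how often the replicate discrepancy exceeds the observed one
        exceed += int(disc(x_rep, th, mu, tau2, s2) > disc(x_i, th, mu, tau2, s2))
    return exceed / M
```

For example, with `groups` a list of one-dimensional arrays (one per group), `cv_ppp(groups, 4, DISCREPANCIES["overall_X2"])` estimates the step 4 proportion for the fifth group. Because each group is left out in turn, the sampler runs $I$ times. This is the extra computing mentioned in Section 2.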
Table 1 shows the posterior predictive p-values for individual groups and the whole population. As BC (2007) observed, because the data are used twice, none of the discrepancy measures detects any evidence of incompatibility between the observed data and the null model, either for individual groups or for the population as a whole.

Table 2 shows the p-values based on cross-validated posterior predictive checks for individual groups. The model shows no evidence of misfit for groups 1, 2 and 4; all of their p-values exceed 0.16. For group 3, the p-value based on the first-level $X^2$ discrepancy is 0.016, which indicates slight inadequacy of the first-level model. This is not surprising given the extreme observation 4.10. Given a model with equal spread in each group, the impact of this unusual observation in group 3 is also detected by the discrepancy measure $\max_{1 \le j \le n_i} |X_{ij} - \theta_i|$, which has a p-value of 0.023. Despite the concern about the first-level model in group 3, discrepancy measures focused on the second level and on the model overall do not detect any problem. This is consistent with the fact that the mean and spread in group 3 are not extreme compared with the other groups.

Table 2
Cross-validated posterior predictive p-values for individual groups ($j$ ranges over $1, \dots, n_i$)

Discrepancy                        Group 1  Group 2  Group 3  Group 4  Group 5
Overall $X^2$                        0.653    0.804    0.520    0.730    0.007
1st level $X^2$                      0.168    0.315    0.016    0.291    0.000
2nd level $X^2$                      0.577    0.656    0.654    0.585    0.007
$\max_j X_{ij}$                      0.641    0.723    0.373    0.759    0.005
$\max_j |X_{ij} - \theta_i|$         0.203    0.333    0.023    0.411    0.002
$\max_j |X_{ij} - \mu|$              0.715    0.819    0.472    0.841    0.006

For group 5, all discrepancies detect the inadequacy of the hierarchical model.
This makes sense because group 5 has a very extreme group mean of 4.44, which is almost three times the other group means, and at least one relatively extreme observation, 6.32, which lies almost two overall within-group standard deviations away from the group mean. Note that even if the p-values for group 5 were multiplied by 5 (the number of groups) or 6 (the number of discrepancies in Table 2) to deal with multiplicity of testing, the results would still be less than 0.05 for all the discrepancies; the largest, 0.007, becomes 0.042 when multiplied by 6.

Now we consider improving the proposed hierarchical model by using more robust distributions to model the outlying group and the extreme observations. Since we have seen slight inadequacy in the first-level model for groups 3 and 5 and serious inadequacy in the second-level model for group 5, we might consider using Student-t distributions to accommodate the unusual observations and the extreme group mean parameter in the hierarchical model.

To perform a robust analysis, we replace the normal distributions with Student-t distributions with fixed degrees of freedom $\nu_1 = 3$ and $\nu_2 = 2.2$ in the first and second levels of the hierarchical model. The cross-validated posterior predictive p-values assuming Student-t distributions in both levels of the model are shown in Table 3. The two-level robust Student-t model successfully accommodates the unusual observation in group 3 (its first-level $X^2$ p-value rises from 0.016 to 0.081) and almost accommodates the extreme observation in group 5. However, it does not fully address the inadequacy of the second-level model for group 5's data: the second-level $X^2$ p-value for group 5 is still 0.022. Given this result, one might suggest treating group 5 as being generated from a normal distribution with a shifted location parameter or an inflated variance parameter. One could also consider another model, such as one of the alternative models in Section 3.6 of BC (2007).
If there were more groups with higher means, then fitting a mixture of normal distributions in the second level might be an option.

Degrees of freedom greater than 2 are used because such t-distributions have finite variances (a t-distribution with $\nu$ degrees of freedom and unit scale has variance $\nu/(\nu - 2)$ for $\nu > 2$). A little experimentation was done to choose the degrees of freedom. Larger degrees of freedom were slightly less successful in fitting the data, but made little difference to the posterior distributions of the parameters or to the results in Table 3. If the degrees of freedom are treated as parameters, then their posterior variance will be quite high with so few groups.

Table 3
Cross-validated posterior predictive p-values for individual groups assuming Student-t distributions for both levels of the hierarchical model ($j$ ranges over $1, \dots, n_i$)

Discrepancy                        Group 1  Group 2  Group 3  Group 4  Group 5
Overall $X^2$                        0.680    0.856    0.493    0.822    0.074
1st level $X^2$                      0.211    0.376    0.081    0.381    0.060
2nd level $X^2$                      0.636    0.676    0.667    0.639    0.022
$\max_j X_{ij}$                      0.581    0.664    0.320    0.734    0.070
$\max_j |X_{ij} - \theta_i|$         0.295    0.450    0.117    0.501    0.122
$\max_j |X_{ij} - \mu|$              0.732    0.877    0.440    0.891    0.134

4. SOME COMMENTS ON THE PAPER AND DISCUSSION

From the above analysis we can see that it is useful to employ various discrepancies to measure both the overall performance of the model and its specific assumptions. Cross-validated posterior predictive checking allows the use of many discrepancies focused on various aspects of the model and avoids double use of the data. It is also useful for identifying individual small groups or areas that are inconsistent with the model. Extensions to multilevel models, models with covariates and generalized linear models should be possible. See Gelman (2004), Gelman et al. (2005) and references therein for other examples of model diagnostics that use flexibility in defining evaluations to advantage.
A framework that checks models with test statistics only is less flexible and requires more effort; the test statistics of Section 3.3 of BC (2007) required some refinement of the procedures in their Appendix C. The authors should be commended for their efforts and explanations; in these applications their results show a definite advantage over the other methods in their article.

The authors state that they intend the model checks to be preliminary, in order to avoid model elaboration and (possibly) model averaging. We believe that such methods would also be valuable for further study of models beyond an initial stage. Indeed, some unusual patterns might be detectable only after models reach a certain level of complexity. We agree with the authors that assessing total uncertainty through an elaborate model selection and refinement procedure is a challenge that deserves more study.

An issue for future work on model assessment is multiplicity: the use of multiple test statistics or discrepancy measures to evaluate a single model, and tests concerning individual groups. Multiplicity in testing will affect power and the distribution of p-values. One could recommend selecting one discrepancy to assess each part of a model, avoiding too much overlap and redundancy. We agree with BC (2007) that in cases with many discrepancy measures and, in particular, many groups, simple Bonferroni corrections might decrease power too much; in such cases, investigation of methods from statistical genetics (small n, large p) might be helpful. As a side note, it would not be particularly hard to simulate p-value distributions and power for cross-validated posterior predictive p-values under the scenarios of BC (2007), with or without adjustment for multiplicity.
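The Bonferroni adjustment mentioned in Section 3, which multiplies the group 5 p-values by 5 or 6, is simple to compute. The sketch below applies it to the group 5 column of Table 2. It also applies Holm's step-down procedure, which we chose for illustration as one less conservative alternative; neither BC (2007) nor this comment discusses it. Holm's procedure controls the same family-wise error rate as Bonferroni and never gives larger adjusted values. The helper names `bonferroni` and `holm` are hypothetical. The reported 0.000 is a rounded proportion, so its adjusted value is only approximate.

```python
import numpy as np


def bonferroni(p, k):
    """Bonferroni adjustment: multiply each p-value by k, capped at 1."""
    return np.minimum(1.0, np.asarray(p, dtype=float) * k)


def holm(p):
    """Holm step-down adjusted p-values (family-wise error rate control)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p, kind="stable")
    # The j-th smallest p-value (j = 0, ..., m - 1) is multiplied by m - j
    stepped = (m - np.arange(m)) * p[order]
    # Running maximum keeps adjusted values monotone; then cap at 1
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(stepped))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # return to the original row order
    return adjusted


# Group 5 column of Table 2, in row order: overall X2, 1st level X2,
# 2nd level X2, max X_ij, max |X_ij - theta_i|, max |X_ij - mu|
group5 = [0.007, 0.000, 0.007, 0.005, 0.002, 0.006]
```

Here `bonferroni(group5, 6)` gives a largest value of 0.042, while `holm(group5)` gives a largest value of 0.020. Both stay below 0.05 for every discrepancy.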
To implement cross-validated posterior predictive checking, one must sample from the posterior distribution while leaving out groups of data. When the number of groups or areas is large, reanalyzing the model without each group or area could be time consuming. To avoid refitting the model without each group, methods such as importance weighting and importance resampling could be used to approximate the posterior distribution that would be obtained if the analysis were repeated with the group left out. See Stern and Cressie (2000), Marshall and Spiegelhalter (2003) and references therein in this regard.

Again we wish to thank the authors for a stimulating paper that demonstrates a method that seems quite effective and clearly states the issues involved.

ACKNOWLEDGMENTS

The authors wish to thank the editor Ed George for the opportunity to discuss this article and Bayarri and Castellanos for helpful comments on the discussion. This work was supported in part by Iowa's State Board of Education and a dissertation award from the American Education Research Association.

REFERENCES

Bayarri, M. J. and Castellanos, M. E. (2007). Bayesian checking of the second level of hierarchical models. Statist. Sci. 22 322–343.
Dey, D. K., Gelfand, A. E., Swartz, T. B. and Vlachos, A. K. (1998). A simulation-intensive approach for checking hierarchical models. Test 7 325–346.
Gelman, A. (2004). Exploratory data analysis for complex models. J. Comput. Graph. Statist. 13 755–779. MR2109052
Gelman, A., van Mechelen, I., Verbeke, G., Heitjan, D. F. and Meulders, M. (2005). Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics 61 74–85. MR2135847
Marshall, E. C. and Spiegelhalter, D. J. (2003). Approximate cross-validatory predictive checks in disease mapping models. Stat. Med. 22 1649–1660.
O'Hagan, A. (2003). HSSS model criticism (with discussion). In Highly Structured Stochastic Systems (P. J. Green, N. L. Hjort and S. T. Richardson, eds.) 423–445. Oxford Univ. Press. MR2082403
Stern, H. S. and Cressie, N. (2000). Posterior predictive model checks for disease mapping models. Stat. Med. 19 2377–2397.
