Fully Bayes factors with a generalized g-prior
For the normal linear model variable selection problem, we propose selection criteria based on a fully Bayes formulation with a generalization of Zellner's $g$-prior which allows for $p>n$. A special case of the prior formulation is seen to yield tra…
Authors: Yuzo Maruyama, Edward I. George
The Annals of Statistics 2011, Vol. 39, No. 5, 2740–2765. DOI: 10.1214/11-AOS917. © Institute of Mathematical Statistics, 2011.

By Yuzo Maruyama and Edward I. George, University of Tokyo and University of Pennsylvania.

For the normal linear model variable selection problem, we propose selection criteria based on a fully Bayes formulation with a generalization of Zellner's $g$-prior which allows for $p > n$. A special case of the prior formulation is seen to yield tractable closed forms for marginal densities and Bayes factors which reveal new model evaluation characteristics of potential interest.

Received September 2010; revised August 2011. AMS 2000 subject classifications. Primary 62F07, 62F15; secondary 62C10. Key words and phrases. Bayes factor, model selection consistency, ridge regression, singular value decomposition, variable selection. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2011, Vol. 39, No. 5, 2740–2765. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Suppose the normal linear regression model is used to relate $y$ to the potential predictors $x_1, \dots, x_p$,

$$y \sim N_n(\alpha 1_n + X_F \beta_F, \sigma^2 I_n), \tag{1.1}$$

where $\alpha$ is an unknown intercept parameter, $1_n$ is an $n \times 1$ vector each component of which is one, $X_F = (x_1, \dots, x_p)$ is an $n \times p$ design matrix, $\beta_F$ is a $p \times 1$ vector of unknown regression coefficients, $I_n$ is an $n \times n$ identity matrix and $\sigma^2$ is an unknown positive scalar. (The subscript $F$ denotes the full model.) We assume that the columns of $X_F$ have been standardized so that for $1 \le i \le p$, $x_i' 1_n = 0$ and $x_i' x_i / n = 1$.

We shall be particularly interested in the variable selection problem where we would like to select an unknown subset of the important predictors. It will be convenient throughout to index each of these $2^p$ possible subset choices by the vector $\gamma = (\gamma_1, \dots, \gamma_p)'$, where $\gamma_i = 0$ or $1$. We use $q_\gamma = \gamma' 1_p$ to denote the size of the $\gamma$th subset. The problem then becomes that of selecting a submodel of (1.1) which has a density of the form

$$p(y \mid \alpha, \beta_\gamma, \sigma^2, \gamma) = \phi_n(y; \alpha 1_n + X_\gamma \beta_\gamma, \sigma^2 I_n), \tag{1.2}$$

where $\phi_n(y; \mu, \Sigma)$ denotes the $n$-variate normal density with mean vector $\mu$ and covariance matrix $\Sigma$. In (1.2), $X_\gamma$ is the $n \times q_\gamma$ matrix whose columns correspond to the $\gamma$th subset of $x_1, \dots, x_p$, and $\beta_\gamma$ is a $q_\gamma \times 1$ vector of unknown regression coefficients. We assume throughout that $X_\gamma$ is of full rank, denoted $r_\gamma = \min\{q_\gamma, n - 1\}$. Last, let $\mathcal{M}_\gamma$ denote the submodel given by (1.2).

A Bayesian approach to this problem entails the specification of prior distributions on the models, $\pi_\gamma = \Pr(\mathcal{M}_\gamma)$, and on the parameters, $p(\alpha, \beta_\gamma, \sigma^2)$, of each model. For each such specification, of key interest is the posterior probability of $\mathcal{M}_\gamma$ given $y$,

$$\Pr(\mathcal{M}_\gamma \mid y) = \frac{\pi_\gamma m_\gamma(y)}{\sum_\gamma \pi_\gamma m_\gamma(y)} = \frac{\pi_\gamma \mathrm{BF}_{\gamma:N}}{\sum_\gamma \pi_\gamma \mathrm{BF}_{\gamma:N}}, \tag{1.3}$$

where $m_\gamma(y)$ is the marginal density of $y$ under $\mathcal{M}_\gamma$. In (1.3), $\mathrm{BF}_{\gamma:N}$ is the so-called "null-based Bayes factor" for comparing each of $\mathcal{M}_\gamma$ to the null model $\mathcal{M}_N$, defined as

$$\mathrm{BF}_{\gamma:N} = \frac{m_\gamma(y)}{m_N(y)},$$

where the null model $\mathcal{M}_N$ is given by $y \sim N_n(\alpha 1_n, \sigma^2 I_n)$ and $m_N(y)$ is the marginal density of $y$ under the null model. For model selection, a popular strategy is to select the model for which $\Pr(\mathcal{M}_\gamma \mid y)$ or $\pi_\gamma \mathrm{BF}_{\gamma:N}$ is largest.
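As an illustration of (1.3), the following minimal Python sketch turns null-based Bayes factors into posterior model probabilities. The function name `posterior_model_probs`, the log-scale inputs and the three-model example are assumptions made for illustration only; they are not part of the paper.

```python
import math

def posterior_model_probs(log_bf, prior):
    """Posterior probabilities Pr(M_gamma | y) as in (1.3).

    log_bf[k] : log of the null-based Bayes factor BF_{gamma:N} of model k
                (the null model itself has log BF = 0)
    prior[k]  : prior model probability pi_gamma of model k
    """
    # Work on the log scale and subtract the maximum to avoid overflow.
    log_w = [math.log(p) + lb for p, lb in zip(prior, log_bf)]
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

# Hypothetical example: the null model and two submodels under a uniform prior.
probs = posterior_model_probs([0.0, 2.0, 5.0], [1 / 3, 1 / 3, 1 / 3])
# Selecting the model with the largest posterior probability.
best = max(range(len(probs)), key=probs.__getitem__)
```

Working with logarithms is a design choice of this sketch: Bayes factors over many candidate models can differ by many orders of magnitude, and normalizing on the log scale keeps the computation stable.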
Our main focus in this paper is to propose and study specifications for the parameter prior for each submodel $\mathcal{M}_\gamma$, which we will consider to be of the form

$$p(\alpha, \beta_\gamma, \sigma^2) = p(\alpha) p(\sigma^2) p(\beta_\gamma \mid \sigma^2) = p(\alpha) p(\sigma^2) \int p(\beta_\gamma \mid \sigma^2, g) p(g) \, dg, \tag{1.4}$$

where $g$ is a hyperparameter. In Section 2 we explicitly describe our choices of prior forms for (1.4). Our key innovation there will be to use a generalization of

$$p(\beta_\gamma \mid \sigma^2, g) = \phi_{q_\gamma}(\beta_\gamma; 0, g \sigma^2 (X_\gamma' X_\gamma)^{-1}), \tag{1.5}$$

Zellner's (1986) $g$-prior, a normal conjugate form which leads to tractable marginalization; see, for example, George and Foster (2000), Fernández, Ley and Steel (2001) and Liang et al. (2008). Under (1.5) and a flat prior on $\alpha$, the marginal density of $y$ given $g$ and $\sigma^2$ under $\mathcal{M}_\gamma$ is given by

$$m_\gamma(y \mid g, \sigma^2) \propto \exp\left[ \frac{g}{g+1} \left\{ \max_{\alpha, \beta_\gamma} \log p(y \mid \alpha, \beta_\gamma, \sigma^2) - q_\gamma H(g) \right\} \right], \tag{1.6}$$

where $H(g) = (2g)^{-1} (g+1) \log(g+1)$, a special case of the key relation in George and Foster (2000). As they point out, for particular values of $g$, when $\sigma^2$ is known, the Bayesian strategy of choosing $\mathcal{M}_\gamma$ to maximize (1.6) corresponds to common fixed penalty selection criteria. For example, setting $H(g) = 2$, $\log n$ or $2 \log p$ (independently of $y$) would correspond to AIC [Akaike (1974)], BIC [Schwarz (1978)] or RIC [Foster and George (1994)], respectively. For a discussion of recommendations in the literature for choosing a fixed $g$ depending on $p$ and/or $n$, see Section 2.4 of Liang et al. (2008).

Although the correspondences to fixed penalty criteria are interesting, as a practical matter, it is necessary to deal with the uncertainty about $g$ and $\sigma^2$ to obtain useful criteria.
For this purpose, George and Foster (2000) proposed selecting the model maximizing $m_\gamma(y \mid g, \sigma^2)$ based on an empirical Bayes estimate of $g$ and the standard unbiased estimate of $\sigma^2$. More recently, Cui and George (2008) proposed marginalizing out $g$ with respect to a prior, and Liang et al. (2008) proposed marginalizing out $g$ and $\sigma^2$ with respect to priors. It should be noted that the first paper to effectively use a prior integrating out $g$ was Zellner and Siow (1980); they stated things in terms of multivariate Cauchy densities, which can always be expressed as a $g$-mixture of $g$-priors. All of these strategies lead to criteria that can be seen as adapting to the fixed penalty criterion which would be most suitable for the data at hand.

In this paper, we shall similarly follow a fully Bayes approach, but with a generalization of the $g$-prior (1.5) and an extension of the considered class of priors on $g$. After describing our prior forms in Section 2 and then calculating the marginals and Bayes factors in Section 3, we ultimately obtain our proposed $g$-prior Bayes factor ($g$BF), which is of the form (omitting the $\gamma$ subscripts for clarity)

$$g\mathrm{BF}_{\gamma:N} = \begin{cases} \left( \dfrac{\bar d}{d_q} \right)^{-q} \dfrac{\{1 - R^2 + d_q^2 \|\hat\beta_{LS}\|^2\}^{-1/4 - q/2}}{C_{n,q} (1 - R^2)^{(n-q)/2 - 3/4}}, & \text{if } q < n - 1, \\[2ex] \{\bar d \times \|\hat\beta^{MP}_{LS}\|\}^{-n+1}, & \text{if } q \ge n - 1, \end{cases} \tag{1.7}$$

where

$$C_{n,q} \equiv \frac{B(1/4, (n-q)/2 - 3/4)}{B(q/2 + 1/4, (n-q)/2 - 3/4)}$$

using the Beta function $B(\cdot, \cdot)$, $R^2$ is the familiar $R$-squared statistic under $\mathcal{M}_\gamma$, $\bar d$ and $d_r$ are, respectively, the geometric mean and minimum of the singular values of $X_\gamma$, $\|\cdot\|$ is the $L_2$ norm, and finally, for the standardized response $(y - \bar y 1_n) / \|y - \bar y 1_n\|$, $\hat\beta_{LS}$ is the usual least squares estimator, and $\hat\beta^{MP}_{LS}$ is the least squares estimator using the Moore–Penrose inverse matrix.
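Because (1.7) is in closed form, it can be evaluated with nothing more than log-gamma functions. The following is a minimal Python sketch, assuming the singular values of $X_\gamma$, the $R^2$ statistic and the squared norm of the (Moore–Penrose) least squares estimator for the standardized response have already been computed elsewhere; the function names `log_beta` and `log_gbf` and their argument layout are hypothetical.

```python
import math

def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def log_gbf(n, q, d, beta_norm_sq, r2=None):
    """log gBF_{gamma:N} following (1.7).

    n            : number of observations
    q            : number of predictors in the submodel
    d            : the r = min(q, n - 1) positive singular values of X_gamma
    beta_norm_sq : squared L2 norm of the LS (or Moore-Penrose LS) estimator
                   computed for the standardized response
    r2           : R-squared of the submodel (needed only when q < n - 1)
    """
    r = min(q, n - 1)
    if len(d) != r:
        raise ValueError("expected min(q, n - 1) singular values")
    log_dbar = sum(math.log(x) for x in d) / r  # log of the geometric mean
    d_min = min(d)
    if q < n - 1:
        if r2 is None:
            raise ValueError("r2 is required when q < n - 1")
        m = (n - q) / 2 - 0.75
        log_c = log_beta(0.25, m) - log_beta(q / 2 + 0.25, m)  # log C_{n,q}
        return (-q * (log_dbar - math.log(d_min))
                - (0.25 + q / 2) * math.log(1 - r2 + d_min ** 2 * beta_norm_sq)
                - log_c
                - m * math.log(1 - r2))
    # q >= n - 1: R^2 = 1, and only d-bar and the estimator norm remain.
    return -(n - 1) * (log_dbar + 0.5 * math.log(beta_norm_sq))
```

Returning the logarithm rather than the Bayes factor itself is a choice of this sketch; it avoids overflow when $n$ is large and plugs directly into a log-scale normalization of (1.3).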
Two immediately apparent features of (1.7) should be noted. First, in contrast to other fully Bayes factors for our selection problem, $g$BF is a closed form expression which allows for interpretation and straightforward calculation under any model. As will be seen in later sections, this transparency reveals that $g$BF not only rewards explained variation overall, but also rewards variation explained by the larger principal components of the design matrix. Second, $g$BF can be applied to all models even when the number of predictors $p$ exceeds the number of observations $n$, a case which is of increasing interest. This is not the case for (1.5), which requires $p \le n - 1$ so that $X_\gamma' X_\gamma$ will be invertible for all $q_\gamma$ (recall that $X_\gamma$ has rank at most $n - 1$ because its columns have been centered). Note also that when $p > n - 1$, penalized sum-of-squares criteria such as AIC, BIC and RIC will be unavailable for all submodels.

The organization of this paper is as follows. In Section 2 we propose prior forms including a generalized $g$-prior with a beta-prime prior for $g$. In Section 3 we derive general Bayes factor expressions, and propose default hyperparameter settings which yield $g$BF above. In Section 4 we discuss appealing consequences of our default specifications. In Section 5 we describe conditional shrinkage estimation with the generalized $g$-prior. In Section 6 we show that $g$BF is consistent for model selection as $n \to \infty$. In Section 7 we provide a simulation evaluation of $g$BF performance.

2. A fully Bayes prior formulation. We now proceed to describe the prior components that form $p(\alpha, \beta_\gamma, \sigma^2)$ in (1.4). Throughout the remainder of the paper, we will omit the subscript $\gamma$ for notational simplicity when there is no ambiguity.
However, it is important to remember throughout that our formulations are to be applied to all of the $2^p$ possible submodels in (1.2).

2.1. A generalized $g$-prior for $\beta$. To motivate our proposed generalization of Zellner's $g$-prior, we begin with a reconsideration of the original $g$-prior (1.5) for the case $p \le n - 1$. The covariance matrix of the $g$-prior, $g \sigma^2 (X'X)^{-1}$, is proportional to the covariance matrix of the least squares estimator $\hat\beta_{LS}$. As a consequence of this choice, the marginal likelihood with respect to the $g$-prior appealingly becomes a function only of the residual sum-of-squares, RSS. However, from the "matrix conditioning" viewpoint of Casella (1980, 1985), which advocates more shrinkage on higher variance estimates, the original $g$-prior may not be reasonable. To see why, let us rotate the problem by the $q \times q$ orthogonal matrix $W = (w_1, \dots, w_q)$ which diagonalizes $X'X$ as

$$W'(X'X)W = D^2, \tag{2.1}$$

where $D = \mathrm{diag}(d_1, \dots, d_q)$ with

$$d_1 \ge \cdots \ge d_q > 0. \tag{2.2}$$

Thus,

$$W' \hat\beta_{LS} \sim N_q(W'\beta, \sigma^2 D^{-2}).$$

Applying the $g$-prior (1.5) to these rotated coordinates would then induce the prior

$$W'\beta \sim N_q(0, g \sigma^2 D^{-2}),$$

which reveals the prior variances to be proportional to the sample variances of the elements of $W' \hat\beta_{LS}$. This contradicts Casella (1980), who states, "if the sampling information is good, it is reasonable to downweight the prior guess."

To remedy this situation, we propose consideration of priors on $\beta$ for which

$$W'\beta \sim N_q(0, \sigma^2 \Psi_q),$$

where the components of $\Psi_q = \mathrm{diag}(\psi_1, \dots, \psi_q)$ are in descending order, namely,

$$\psi_1 \ge \cdots \ge \psi_q > 0. \tag{2.3}$$

Note that this would be satisfied for $\Psi_q \propto I_q$, a consequence of the common assumption of exchangeable $\beta$ components.
In fact, a slightly weaker ordering of the form

$$d_1^2 \psi_1 \ge \cdots \ge d_q^2 \psi_q > 0 \tag{2.4}$$

would still be reasonable because the resulting Bayes estimator of $w_i'\beta$ would be of the form $(1 + \{d_i^2 \psi_i\}^{-1})^{-1} w_i' \hat\beta_{LS}$, so that under (2.4), the components of $W' \hat\beta_{LS}$ with larger variance would be shrunk more. We note that the original $g$-prior (1.5), for which $\psi_i = g d_i^{-2}$, satisfies only the extreme boundary of (2.4), namely, $d_1^2 \psi_1 = \cdots = d_q^2 \psi_q = g$. This violates (2.3) whenever $d_i > d_{i+1}$, in which case $\psi_i < \psi_{i+1}$.

An appealing general form for $\Psi_q$ is $\Psi_q(g, \nu) = \mathrm{diag}(\psi_1(g, \nu), \dots, \psi_q(g, \nu))$, where

$$\psi_i(g, \nu) = (1/d_i^2) \{\nu_i (1 + g) - 1\}, \tag{2.5}$$

$\nu = (\nu_1, \dots, \nu_q)'$ and $\nu_i \ge 1$ for any $i$, guaranteeing $\psi_i(g, \nu) > 0$. Note that $\Psi_q(g, \nu)$, like the original $g$-prior, is controlled by a single hyperparameter $g > 0$. When $\nu_1 = \cdots = \nu_q = 1$, $\sigma^2 \Psi_q(g, \nu)$ becomes $g \sigma^2 D^{-2}$, yielding the covariance structure of the original $g$-prior. Although (2.4) will be satisfied whenever $\nu_1 \ge \cdots \ge \nu_q \ge 1$, we shall ultimately be interested in a particular design dependent choice defined in Section 3.2.

In summary, when $q \le n - 1$, we propose a generalized $g$-prior for $\beta$ of the form

$$p(\beta \mid \sigma^2, g) = \phi_q(W'\beta; 0, \sigma^2 \Psi_q(g, \nu)), \tag{2.6}$$

where $\nu_1 \ge \cdots \ge \nu_q \ge 1$.

When $q > n - 1$ and the rank of $X$ is $n - 1$, there exists a $q \times (n - 1)$ matrix $W = (w_1, \dots, w_{n-1})$ which diagonalizes $X'X$ as

$$W'(X'X)W = D^2, \tag{2.7}$$

where $W'W = I_{n-1}$ and $D = \mathrm{diag}(d_1, d_2, \dots, d_{n-1})$ with $d_1 \ge d_2 \ge \cdots \ge d_{n-1} > 0$. For this case, we propose a generalized $g$-prior of the form

$$p(\beta \mid \sigma^2, g) = \phi_{n-1}(W'\beta; 0, \sigma^2 \Psi_{n-1}(g, \nu)) \, p_\#(W_\#'\beta), \tag{2.8}$$

where $\Psi_{n-1}(g, \nu) = \mathrm{diag}(\psi_1, \dots, \psi_{n-1})$ is again given by (2.5) and $\nu_1 \ge \cdots \ge \nu_{n-1} \ge 1$. Here, $W_\#$ is an arbitrary matrix which makes the $q \times q$ matrix $(W, W_\#)$ orthogonal, and $p_\#(\cdot)$ is an arbitrary probability density on $W_\#'\beta$. As will be seen, the choices of $W_\#$ and $p_\#$ have no effect on the selection criteria we obtain, thus we leave them as arbitrary.

Combining the above two cases by letting

$$r = \min\{q, n - 1\}, \tag{2.9}$$

our suggested generalized $g$-prior is of the form

$$p(\beta \mid g, \sigma^2) = \phi_r(W'\beta; 0, \sigma^2 \Psi_r(g, \nu)) \times \begin{cases} 1, & \text{if } q \le n - 1, \\ p_\#(W_\#'\beta), & \text{if } q > n - 1, \end{cases} \tag{2.10}$$

where the $q \times r$ matrix $W$ satisfies both $W'X'XW = \mathrm{diag}(d_1^2, \dots, d_r^2)$ and $W'W = I_r$, and $\Psi_r(g, \nu) = \mathrm{diag}(\psi_1(g, \nu), \dots, \psi_r(g, \nu))$ with (2.5).

Remark 2.1. In (2.1) and (2.7), let

$$U = (u_1, \dots, u_r) = (X w_1 / d_1, \dots, X w_r / d_r) = X W D^{-1}. \tag{2.11}$$

Then $U'U = I_r$ and

$$X = U D W' = \sum_{i=1}^r d_i u_i w_i'. \tag{2.12}$$

This is the nonnull part of the well-known singular value decomposition (SVD). The diagonal elements of $D = \mathrm{diag}(d_1, \dots, d_r)$ are the singular values of $X$, and the columns of $U = (u_1, \dots, u_r)$ are the normalized principal components of the column space of $X$. Note that the components of the rotated vector $W'\beta$ are the coefficients for the principal component regression of $y$ on $UD$. From the definition of $W$ and $U$ by (2.1), (2.7) and (2.11), the signs of $u_i w_i'$ are determinate although the signs of $w_i$ and $u_i$ for $1 \le i \le r$ are indeterminate. These indeterminacies can safely be ignored in our development.

2.2. Priors for $g$, $\alpha$ and $\sigma^2$.
Turning to the prior for the hyperparameter $g$, we propose

$$p(g) = \frac{g^b (1 + g)^{-a-b-2}}{B(a + 1, b + 1)} I_{(0, \infty)}(g) \tag{2.13}$$

with $a > -1$, $b > -1$, a Pearson Type VI or beta-prime distribution under which $1/(1 + g)$ has a Beta distribution $\mathrm{Be}(a + 1, b + 1)$. Choices for the hyperparameters $a$ and $b$ are discussed later.

Although Zellner and Siow (1980) did not explicitly use a $g$-prior formulation with a prior on $g$, their recommendation of a multivariate Cauchy form for $p(\beta \mid \sigma^2)$ implicitly corresponds to using a $g$-prior with an inverse Gamma prior

$$(n/2)^{1/2} \{\Gamma(1/2)\}^{-1} g^{-3/2} e^{-n/(2g)}$$

on $g$. Both Cui and George (2008) and Liang et al. (2008) proposed using $g$-priors with priors of the form

$$p(g) = (a + 1)(1 + g)^{-a-2}, \tag{2.14}$$

the subclass of (2.13) with $b = 0$, for which the normalizing constant is $1/B(a + 1, 1) = a + 1$. Cases for which $b = O(n)$ will be of interest to us in what follows.

For the parameters $\alpha$ and $\sigma^2$, we use the location invariant flat prior

$$p(\alpha) = I_{(-\infty, \infty)}(\alpha) \tag{2.15}$$

and the scale invariant prior

$$p(\sigma^2) = (\sigma^2)^{-1} I_{(0, \infty)}(\sigma^2), \tag{2.16}$$

respectively. Because $\alpha$ and $\sigma^2$ appear in every model, the use of these improper priors for Bayesian model selection is formally justified by Berger, Pericchi and Varshavsky (1998).

We note in passing that for the estimation of a multivariate normal mean, priors equivalent to (2.6), (2.13), (2.15) and (2.16) have been considered by Strawderman (1971) and extended by Maruyama and Strawderman (2005).

3. Marginal densities and Bayes factors.

3.1. General forms. The marginal densities of $y$ under $\mathcal{M}_\gamma$ ($\ne \mathcal{M}_N$) and $\mathcal{M}_N$ are, by definition,

$$m_\gamma(y) = \int_{-\infty}^{\infty} \int_{R^q} \int_0^\infty p(y \mid \alpha, \beta_\gamma, \sigma^2) \, p(\alpha, \beta_\gamma, \sigma^2) \, d\alpha \, d\beta_\gamma \, d\sigma^2, \qquad m_N(y) = \int_{-\infty}^{\infty} \int_0^\infty p(y \mid \alpha, \sigma^2) \, p(\alpha, \sigma^2) \, d\alpha \, d\sigma^2, \tag{3.1}$$

respectively.
Under the priors

$$p(\alpha, \beta_\gamma, \sigma^2) = p(\alpha) p(\sigma^2) \int_0^\infty p(\beta_\gamma \mid \sigma^2, g) p(g) \, dg$$

for $\mathcal{M}_\gamma$ ($\ne \mathcal{M}_N$) and $p(\alpha, \sigma^2) = p(\alpha) p(\sigma^2)$ for $\mathcal{M}_N$, where $p(\beta \mid \sigma^2, g)$, $p(\alpha)$ and $p(\sigma^2)$ are given by (2.10), (2.15) and (2.16), and $p(g)$ when $q < n - 1$ is given by (2.13) with $-1 < a < -1/2$ and $b = (n - 5)/2 - q/2 - a$ [$p(g)$ is arbitrary when $q \ge n - 1$], we have the following theorem about the Bayes factor, the ratio of the marginal densities under each of $\mathcal{M}_\gamma$ and $\mathcal{M}_N$.

Theorem 3.1. The Bayes factor for comparing each of $\mathcal{M}_\gamma$ to $\mathcal{M}_N$ is

$$\mathrm{BF}_{\gamma:N}(a, \nu) = \frac{m_\gamma(y)}{m_N(y)} = \begin{cases} \displaystyle \prod_{i=1}^q \nu_i^{-1/2} \, \frac{B(q/2 + a + 1, (n - q - 3)/2 - a)}{B(a + 1, (n - q - 3)/2 - a)} \times \frac{(1 - Q^2)^{-q/2 - a - 1}}{(1 - R^2)^{(n - q - 3)/2 - a}}, & \text{if } q < n - 1, \\[2ex] \displaystyle \prod_{i=1}^{n-1} \nu_i^{-1/2} \, (1 - Q^2)^{-(n-1)/2}, & \text{if } q \ge n - 1, \end{cases} \tag{3.2}$$

where $\nu_1 \ge \cdots \ge \nu_r \ge 1$, and $R^2$ and $Q^2$ are given by

$$R^2 = \sum_{i=1}^r \{\mathrm{cor}(u_i, y)\}^2, \qquad Q^2 = \sum_{i=1}^r (1 - \nu_i^{-1}) \{\mathrm{cor}(u_i, y)\}^2. \tag{3.3}$$

Note that $R^2$ and $Q^2$ are the usual and a modified version of the $R$-squared statistics and $\mathrm{cor}(u_i, y)$ is the correlation of the response $y$ and the $i$th principal component of $X$.

Proof of Theorem 3.1. Defining $v = y - \bar y 1_n$, where $\bar y$ is the mean of $y$, so that

$$\|y - \alpha 1_n - X\beta\|^2 = n(-\alpha + \bar y)^2 + \|v - X\beta\|^2,$$

we obtain

$$\int_{-\infty}^{\infty} p(y \mid \alpha, \beta, \sigma^2) \, d\alpha = \frac{n^{1/2}}{(2\pi\sigma^2)^{(n-1)/2}} \exp\left( -\frac{\|v - X\beta\|^2}{2\sigma^2} \right). \tag{3.4}$$

We make the following orthogonal transformation when integration with respect to $\beta$ is considered:

$$\beta \to \begin{cases} W'\beta \equiv \beta_*, & \text{if } q \le n - 1, \\[1ex] \begin{pmatrix} W'\beta \\ W_\#'\beta \end{pmatrix} \equiv \begin{pmatrix} \beta_* \\ \beta_\# \end{pmatrix}, & \text{if } q > n - 1, \end{cases} \tag{3.5}$$

so that

$$\int_{-\infty}^{\infty} \int_{R^q} p(y \mid \alpha, \beta, \sigma^2) \, p(\beta \mid \sigma^2, g) \, d\alpha \, d\beta = \frac{n^{1/2}}{(2\pi\sigma^2)^{(n-1)/2}} \frac{|\Psi|^{-1/2}}{(2\pi\sigma^2)^{r/2}} \int_{R^r} \exp\left( -\frac{\|v - UD\beta_*\|^2}{2\sigma^2} - \frac{\beta_*' \Psi^{-1} \beta_*}{2\sigma^2} \right) d\beta_* \times \begin{cases} 1, & \text{if } q \le n - 1, \\ \displaystyle \int_{R^{q-n+1}} p_\#(\beta_\#) \, d\beta_\# \ (= 1), & \text{if } q > n - 1. \end{cases}$$
Completing the square $\|v - UD\beta_*\|^2 + \beta_*' \Psi^{-1} \beta_*$ with respect to $\beta_*$, we have

$$\|v - UD\beta_*\|^2 + \beta_*' \Psi^{-1} \beta_* = \{\beta_* - (D^2 + \Psi^{-1})^{-1} D'U'v\}' (D^2 + \Psi^{-1}) \{\beta_* - (D^2 + \Psi^{-1})^{-1} D'U'v\} - v'UD(D^2 + \Psi^{-1})^{-1} D'U'v + v'v, \tag{3.6}$$

where the residual term is rewritten as

$$-v'UD(D^2 + \Psi^{-1})^{-1} D'U'v + v'v = -v' \left( \sum_{i=1}^r u_i u_i' \frac{d_i^2}{d_i^2 + \psi_i^{-1}} \right) v + v'v = \frac{g \|v\|^2}{g + 1} \left\{ 1 - \sum_{i=1}^r \frac{(u_i'v)^2}{\|v\|^2} \right\} + \frac{\|v\|^2}{1 + g} \left\{ 1 - \sum_{i=1}^r \left( 1 - \frac{1}{\nu_i} \right) \frac{(u_i'v)^2}{\|v\|^2} \right\}.$$

Hence, by

$$|\Psi| = \prod_{i=1}^r \frac{\nu_i + \nu_i g - 1}{d_i^2}, \qquad |D^2 + \Psi^{-1}| = \prod_{i=1}^r \frac{d_i^2 \nu_i (1 + g)}{\nu_i + \nu_i g - 1},$$

we have

$$\int_{-\infty}^{\infty} \int_{R^q} p(y \mid \alpha, \beta, \sigma^2) \, p(\beta \mid g, \sigma^2) \, d\alpha \, d\beta = \frac{n^{1/2}}{(2\pi\sigma^2)^{(n-1)/2}} \frac{(1 + g)^{-r/2}}{\prod_{i=1}^r \nu_i^{1/2}} \exp\left( -\frac{\|v\|^2 \{g(1 - R^2) + 1 - Q^2\}}{2\sigma^2 (g + 1)} \right), \tag{3.7}$$

where $R^2$ and $Q^2$ are given by (3.3).

Next we consider the integration with respect to $\sigma^2$. By (3.7), we have

$$\int_{-\infty}^{\infty} \int_{R^q} \int_0^\infty p(y \mid \alpha, \beta, \sigma^2) \, p(\beta \mid g, \sigma^2) \frac{1}{\sigma^2} \, d\alpha \, d\beta \, d\sigma^2 = \int_0^\infty \frac{n^{1/2}}{(2\pi\sigma^2)^{(n-1)/2}} \frac{(1 + g)^{-r/2}}{\prod_{i=1}^r \nu_i^{1/2}} \exp\left( -\frac{\|v\|^2 \{g(1 - R^2) + 1 - Q^2\}}{2\sigma^2 (g + 1)} \right) \frac{1}{\sigma^2} \, d\sigma^2 = \frac{K(n, y)}{\prod_{i=1}^r \nu_i^{1/2}} (1 + g)^{-r/2 + (n-1)/2} \{g(1 - R^2) + 1 - Q^2\}^{-(n-1)/2}, \tag{3.8}$$

where

$$K(n, y) = \frac{n^{1/2} \, \Gamma(\{n - 1\}/2)}{\pi^{(n-1)/2} \|y - \bar y 1_n\|^{n-1}}.$$

When $q \ge n - 1$, $R^2 = 1$ and $r = n - 1$ so that

$$\int_{-\infty}^{\infty} \int_{R^q} \int_0^\infty p(y \mid \alpha, \beta, \sigma^2) \, p(\beta \mid g, \sigma^2) \frac{1}{\sigma^2} \, d\alpha \, d\beta \, d\sigma^2 = \frac{K(n, y)}{\prod_{i=1}^{n-1} \nu_i^{1/2}} \{1 - Q^2\}^{-(n-1)/2}, \tag{3.9}$$

which does not depend on $g$. Hence, in this case, $m_\gamma(y)$ does not depend on the prior density of $g$.

When $q < n - 1$, we consider the prior (2.13) of $g$ with $-1 < a < -1/2$ and $b = (n - 5)/2 - q/2 - a$, where $b$ is guaranteed to be strictly greater than $-1$ for $q < n - 1$.
Then we have

$$\begin{aligned} m_\gamma(y) &= \frac{K(n, y)}{\prod_{i=1}^q \nu_i^{1/2} \, B(a + 1, b + 1)} \int_0^\infty \frac{g^b}{(1 + g)^{a+b+2}} \frac{\{g(1 - R^2) + 1 - Q^2\}^{-(n-1)/2}}{(1 + g)^{q/2 - (n-1)/2}} \, dg \\ &= \frac{K(n, y) (1 - Q^2)^{-(n-1)/2}}{\prod_{i=1}^q \nu_i^{1/2} \, B(a + 1, b + 1)} \int_0^\infty g^b \left\{ \frac{1 - R^2}{1 - Q^2} g + 1 \right\}^{-(n-1)/2} dg \\ &= \frac{K(n, y) (1 - Q^2)^{-(n-1)/2 + b + 1}}{\prod_{i=1}^q \nu_i^{1/2} \, \{1 - R^2\}^{b+1}} \frac{B(q/2 + a + 1, b + 1)}{B(a + 1, b + 1)} \\ &= \frac{K(n, y) (1 - Q^2)^{-q/2 - a - 1}}{\prod_{i=1}^q \nu_i^{1/2} \, \{1 - R^2\}^{(n - q - 3)/2 - a}} \frac{B(q/2 + a + 1, (n - q - 3)/2 - a)}{B(a + 1, (n - q - 3)/2 - a)}. \end{aligned} \tag{3.10}$$

In the same way, $m_N(y)$ for the null model is obtained as

$$m_N(y) = K(n, y). \tag{3.11}$$

From (3.9), (3.10) and (3.11), the theorem follows.

Remark 3.1. $R^2$ and $Q^2$ given by (3.3) are the usual and a modified form of the $R$-squared measure for multiple regression. They are here expressed in terms of $\{\mathrm{cor}(u_1, y)\}^2, \dots, \{\mathrm{cor}(u_r, y)\}^2$, the squared correlations of the response $y$ and the principal components $u_1, \dots, u_r$ of $X$. For fixed $q$ and $\nu$, the BF criterion is increasing in both $R^2$ and $Q^2$. The former is definitely reasonable. Larger $Q^2$ would also be reasonable when $\nu_1 \ge \cdots \ge \nu_r$ so that $Q^2$ would put more weight on those components of $W'\beta$ for which $d_i$ is larger and are consequently better estimated. In this sense, $Q^2$ would reward those models which are more stably estimated. Beyond their influence through $Q^2$, the choice of $\nu_1, \dots, \nu_r$ plays a further influential role in $\mathrm{BF}_{\gamma:N}$ through the $\prod_{i=1}^r \nu_i^{-1/2}$ terms in (3.2). In Section 3.2 below, a default choice is proposed which, through these terms, rewards stable estimation. Note that if $\nu_i = 1$ for all $i$ (i.e., the original $g$-prior), $Q^2$ becomes zero, $\prod_{i=1}^r \nu_i^{-1/2} \equiv 1$, and $\mathrm{BF}_{\gamma:N}$ becomes a function of just $R^2$ and $q$.
In this case, $\mathrm{BF}_{\gamma:N}$ will not distinguish between models for which $q \ge n - 1$.

Remark 3.2. The analytical simplification in (3.10) is a consequence of the choice $b = (n - 5)/2 - q/2 - a$, and results in a convenient closed form for our Bayes factor. Such a reduction is unavailable for other choices of $b$. For example, Liang et al. (2008) use Laplace approximations to avoid the evaluation of the special functions that arise in the resulting Bayes factor when $b = 0$. Another attractive feature of the choice $b = (n - 5)/2 - q/2 - a$ will be discussed in Section 4.2.

3.2. Default choices. At this point, we are ready to consider default choices for $a$ and $\nu$. For $a$, we recommend

$$a = -3/4, \tag{3.12}$$

the median of the range of values $(-1, -1/2)$ for which the marginal density is well defined for any choice of $q < n - 1$. In Section 4 we will explicitly see the appealing consequence of this choice on the asymptotic tail behavior of $p(\beta \mid \sigma^2)$. For $\nu$, we recommend

$$\nu = (d_1^2/d_r^2, d_2^2/d_r^2, \dots, 1)', \tag{3.13}$$

which coupled with (2.5) satisfies (2.4) since $\nu_1 \ge \cdots \ge \nu_q \ge 1$ for this choice. Inserting this $\nu$ into (3.3) yields

$$Q^2 = R^2 - d_r^2 \sum_{i=1}^r \frac{(u_i'v)^2}{d_i^2 \, v'v} = R^2 - d_r^2 \|D^{-1} U' \{v / \|v\|\}\|^2 = \begin{cases} R^2 - d_q^2 \|\hat\beta_{LS}\|^2, & \text{if } q < n - 1, \\ 1 - d_{n-1}^2 \|\hat\beta^{MP}_{LS}\|^2, & \text{if } q \ge n - 1, \end{cases} \tag{3.14}$$

where, for the standardized response $v / \|v\|$ with $v = y - \bar y 1_n$, $\hat\beta_{LS}$ is the usual LS estimator for $q < n - 1$, and $\hat\beta^{MP}_{LS}$ is the LS estimator based on the Moore–Penrose inverse matrix. The third equality in (3.14) follows from the fact that both $\hat\beta_{LS}$ and $\hat\beta^{MP}_{LS}$ for the response $v / \|v\|$ can be expressed as $\hat\beta = W D^{-1} U' \{v / \|v\|\}$, and from the orthogonality of $W$,

$$\|\hat\beta\|^2 = \|D^{-1} U' \{v / \|v\|\}\|^2.$$

It will also be useful to define

$$\bar d = \left( \prod_{i=1}^r d_i \right)^{1/r}, \tag{3.15}$$

the geometric mean of the singular values $d_1, \dots, d_r$. Inserting our default choices for $a$ and $\nu$ into $\mathrm{BF}_{\gamma:N}(a, \nu)$ in (3.2), and noting that

$$\prod_{i=1}^r \nu_i^{-1/2} = (\bar d / d_r)^{-r}, \tag{3.16}$$

we obtain our recommended Bayes factor in (1.7), which we denote by $g$BF ($g$-prior Bayes factor):

$$g\mathrm{BF}_{\gamma:N} = \begin{cases} \left( \dfrac{\bar d}{d_q} \right)^{-q} \dfrac{B(q/2 + 1/4, (n - q)/2 - 3/4)}{B(1/4, (n - q)/2 - 3/4)} \times \dfrac{(1 - R^2 + d_q^2 \|\hat\beta_{LS}\|^2)^{-1/4 - q/2}}{(1 - R^2)^{(n - q)/2 - 3/4}}, & \text{if } q < n - 1, \\[2ex] \{\bar d \times \|\hat\beta^{MP}_{LS}\|\}^{-(n-1)}, & \text{if } q \ge n - 1, \end{cases} \tag{3.17}$$

which is a function of the key quantities $q$, $R^2$, the LS estimators and the singular values of the design matrix.

Remark 3.3. Like traditional selection criteria such as AIC, BIC and RIC, the $g$BF criterion (3.17) rewards models for explained variation through $R^2$. However, $g$BF also rewards models for stability of estimation through smaller values of $\bar d / d_q$ and $d_q \|\hat\beta_{LS}\|$ for $q < n - 1$, and through smaller values of the product of $\bar d / d_{n-1}$ and $d_{n-1} \|\hat\beta^{MP}_{LS}\|$ for $q \ge n - 1$, the case where $R^2$ is unavailable. To see how these various quantities bear on stable estimation, note first that

$$\bar d / d_r = \left\{ \prod_{i=1}^r (d_i / d_r) \right\}^{1/r}, \tag{3.18}$$

which gets smaller as the $d_i / d_r$ ratios get smaller. Like the well-known condition number $d_1 / d_r$, smaller values of (3.18) indicate a more stable design matrix $X_\gamma$. For $d_q \|\hat\beta_{LS}\|$ and $d_{n-1} \|\hat\beta^{MP}_{LS}\|$, note that each of these can be expressed as

$$d_r^2 \|\hat\beta\|^2 = \sum_{i=1}^r \left( \frac{d_r}{d_i} \right)^2 \left( \frac{u_i'v}{\|u_i\| \|v\|} \right)^2 = \sum_{i=1}^r \left( \frac{d_r}{d_i} \right)^2 \{\mathrm{cor}(u_i, y)\}^2. \tag{3.19}$$

Thus, for a given set of $d_i / d_r$ ratios, (3.19) gets smaller if the larger correlations $\mathrm{cor}(u_i, y)$ correspond to the larger $d_i$. Again, this is a measure of stability, as the largest principal components $d_i u_i$ are the ones which are most stably estimated.

Remark 3.4.
The choice of $\nu$ in (3.13) will be especially sensitive to small values of $d_r$, which would lead to large prior variances in (2.10). Thus, one bad $x_i$ predictor variable could spoil the model. From an estimation point of view, this perhaps would be unwise. However, from a model selection point of view, a small $d_r$ would have the effect of downweighting the model, through the stability measures discussed in Remark 3.3, in favor of models which left out the offending $x_i$. Thus, any unstable submodel with at least one such $x_i$, but possibly more, would be downweighted.

4. The effect of the default choices of $a$ and $b$. In Section 3 we proposed the prior form $p(g)$ given by (2.13) with hyperparameters $a$ and $b$, recommending the choices $a = -3/4$ and $b = (n - q - 5)/2 - a$ for the case $q < n - 1$ where the prior on $g$ matters. In the following subsections, we show some appealing consequences of these choices.

4.1. The effect of $a$ on the tail behavior of $p(\beta \mid \sigma^2)$. Combining $p(\beta \mid g, \sigma^2)$ in (2.10) with $p(g)$ in (2.13), the probability density of $\beta$ given $\sigma^2$ is given by

$$p(\beta \mid \sigma^2) = \int_0^\infty \frac{\phi_q(W'\beta; 0, \sigma^2 \Psi_q(g, \nu))}{B(a + 1, b + 1)} \frac{g^b}{(1 + g)^{a+b+2}} \, dg. \tag{4.1}$$

To examine the asymptotic behavior of the density $p(\beta \mid \sigma^2)$ as $\|\beta\| \to \infty$, we appeal to the Tauberian theorem for the Laplace transform [see Geluk and de Haan (1987)], which tells us that the contribution of the integral (4.1) around zero becomes negligible as $\|\beta\| \to \infty$. Thus, we have only to consider the integration between $\nu_1$ and $\infty$ (the major term).
Since $d_1 \ge \cdots \ge d_q$, and assuming $\nu_1 \ge \cdots \ge \nu_q$, we have

$$\frac{d_q^2}{(\nu_1 + 1) g} \le \frac{d_i^2}{\nu_i + \nu_i g - 1} \le \frac{d_1^2}{\nu_q g} \tag{4.2}$$

for $g \ge \nu_1$ and any $i$, which implies

$$\frac{C d_q^q}{(\nu_1 + 1)^{q/2}} \int_{\nu_1}^\infty \left( \frac{g}{g + 1} \right)^{a+b+2} \frac{1}{g^{q/2 + a + 2}} \exp\left( -\frac{1}{g} \frac{d_1^2 \|W'\beta\|^2}{2 \nu_q \sigma^2} \right) dg \le \text{the major term of } p(\beta \mid \sigma^2) \le \frac{C d_1^q}{\nu_q^{q/2}} \int_{\nu_1}^\infty \left( \frac{g}{g + 1} \right)^{a+b+2} \frac{1}{g^{q/2 + a + 2}} \exp\left( -\frac{1}{g} \frac{d_q^2 \|W'\beta\|^2}{2 (\nu_1 + 1) \sigma^2} \right) dg,$$

where $C = \{B(a + 1, b + 1)\}^{-1} (2\pi\sigma^2)^{-q/2}$. Thus, by the Tauberian theorem, there exist $C_1 < C_2$ such that

$$C_1 < \frac{\|\beta\|^{q + 2a + 2}}{(\sigma^2)^{a+1}} \, p(\beta \mid \sigma^2) < C_2 \tag{4.3}$$

for sufficiently large $\|\beta\|$.

From (4.3), we see that the asymptotic tail behavior of $p(\beta \mid \sigma^2)$ is determined by $a$ and unaffected by $b$. Smaller $a$ yields flatter tail behavior, thereby diminishing the prior influence of $p(\beta \mid \sigma^2)$. For $a = -1/2$ the asymptotic tail behavior of $p(\beta \mid \sigma^2)$, $\|\beta\|^{-q-1}$, corresponds to that of the multivariate Cauchy distribution recommended by Zellner and Siow (1980). In contrast, the asymptotic tail behavior of our choice $a = -3/4$, $\|\beta\|^{-q-1/2}$, is even flatter than that of the multivariate Cauchy distribution.

4.2. The effect of $b$ on the implicit $O(n)$ choice of $g$. For implementations of the original $g$-prior (1.5), Zellner (1986) and others have recommended choices for which $g = O(n)$. This prevents the $g$-prior from asymptotically dominating the likelihood, which would occur if $g$ was unchanged as $n$ increased. The recommendation of choosing $g = O(n)$ also applies to the choice of a fixed $g$ for the generalized $g$-prior (2.10), where

$$\mathrm{tr}\{\mathrm{Var}(\beta \mid g, \sigma^2)\} = \sigma^2 \sum_{i=1}^q \frac{\nu_i + \nu_i g - 1}{d_i^2}.$$

Since $d_i^2 = O(n)$ for $1 \le i \le q$ by Lemma B.1, $\mathrm{tr}\{\mathrm{Var}(\beta \mid g, \sigma^2)\} = g \, O(n^{-1})$ if $\nu_i$ is bounded.
Therefore, the choice $g = O(n)$ will also prevent the generalized $g$-prior from asymptotically dominating the likelihood, and stabilize it in the sense that $\mathrm{tr}\{\mathrm{Var}(\beta \mid g, \sigma^2)\} = O(1)$ when $g = O(n)$.

For our fully Bayes case, where $g$ is treated as a random variable, our choice of $b$, in addition to yielding a closed form for the marginal density in (3.10), also yields an implicit $O(n)$ choice of $g$, in the sense that

$$[\text{mode of } g] = \frac{b}{a + 2} = \frac{2(n - q) - 7}{5}, \qquad \frac{1}{E[g^{-1}]} = \frac{b}{a + 1} = 2(n - q) - 7$$

for our recommended choices $a = -3/4$ and $b = (n - q - 5)/2 - a$. (Note that $E[g]$ does not exist under the choice $a = -3/4$.)

5. Shrinkage estimation conditionally on a model. In this section we consider estimation conditionally on a model $\mathcal{M}_\gamma$. Because $\beta$ is not identifiable when $q > n - 1$, and hence not estimable, we instead focus on estimation of $X\beta$, which is always estimable. For this purpose, we consider estimation of $X\beta$ under scaled quadratic loss $(\delta - X\beta)' Q (\delta - X\beta) / \sigma^2$ for positive-definite $Q$. The Bayes estimator under this loss for any $Q$ is of the form

$$X \hat\beta_B = X E[\sigma^{-2} \beta \mid y] / E[\sigma^{-2} \mid y]. \tag{5.1}$$

From calculations similar to those in Section 3, under our priors given in Section 2, a simple closed form can be obtained for this estimator as follows. In contrast, such a simple closed form is not available for the usual Bayes estimator, $X E[\beta_\gamma \mid y]$, the posterior mean under $(\delta - X\beta)' Q (\delta - X\beta)$, which does not scale for the variance $\sigma^2$.

Theorem 5.1. The Bayes estimator under scaled quadratic loss is given by

$$X \hat\beta_B = \sum_{i=1}^r (1 - H(y)/\nu_i) (u_i'y) u_i, \tag{5.2}$$

where

$$H(y) = \begin{cases} \left( 1 + \dfrac{1 - Q^2}{1 - R^2} \, \dfrac{(n - q - 3)/2 - a}{q/2 + a + 1} \right)^{-1}, & q < n - 1, \\[2ex] \{1 + E[g]\}^{-1}, & q \ge n - 1. \end{cases}$$

Proof. See the Appendix.
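The shrinkage factors in Theorem 5.1 can be evaluated directly from the singular values and the correlations $\mathrm{cor}(u_i, y)$. The following is a minimal Python sketch for the case $q < n - 1$, assuming the design-dependent choice $\nu_i = d_i^2 / d_r^2$ of (3.13) and $R^2 < 1$; the function name `shrinkage_weights` and its inputs are hypothetical and serve only to illustrate the formula.

```python
def shrinkage_weights(n, d, cors, a=-0.75):
    """Weights 1 - H(y)/nu_i that multiply (u_i' y) u_i in (5.2), for q < n - 1.

    n    : number of observations
    d    : singular values d_1 >= ... >= d_q > 0 of X_gamma
    cors : correlations cor(u_i, y), in the same order as d
    a    : hyperparameter of the prior on g, with -1 < a < -1/2
    """
    q = len(d)
    if q >= n - 1:
        raise ValueError("this sketch only covers q < n - 1")
    d_min = min(d)
    nu = [(x / d_min) ** 2 for x in d]  # assumed choice nu_i = d_i^2 / d_r^2
    r2 = sum(c * c for c in cors)  # R^2 as in (3.3)
    q2 = sum((1 - 1 / v) * c * c for v, c in zip(nu, cors))  # Q^2 as in (3.3)
    ratio = ((n - q - 3) / 2 - a) / (q / 2 + a + 1)
    h = 1 / (1 + (1 - q2) / (1 - r2) * ratio)  # H(y) of Theorem 5.1
    return [1 - h / v for v in nu]

# Hypothetical inputs: three principal components, n = 20 observations.
weights = shrinkage_weights(20, [6.0, 3.0, 1.5], [0.5, 0.3, 0.2])
```

In this example the weight is closest to one for the component with the largest singular value and smallest for the component with the smallest one, so the least stably estimated direction is shrunk the most.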
Thus, when $q \ge n-1$, we must specify the mean of the prior density of $g$, although no such specification was needed for model selection. A reasonable specification may be $E[g] = d_{n-1}^2/d_1^2$, a function of the condition number $d_1/d_{n-1}$ of the linear equation. For extremely large values of $d_1/d_{n-1}$, the coefficients of the first and the last terms in (5.2) become nearly 1 and 0, respectively. See Casella (1985) and Maruyama and Strawderman (2005) for further discussion of the condition number.

Thus, for our recommended choices of hyperparameters $a = -3/4$ and $\nu_i = d_i^2/d_r^2$ for $1 \le i \le r$, our recommended estimator of $X\beta$ for a given model $M_\gamma$ is

$$X\hat\beta_B = \sum_{i=1}^{r} \left(1 - \{d_r^2/d_i^2\}\,H(y)\right)(u_i'y)\,u_i, \tag{5.3}$$

where

$$H(y) = \begin{cases} \left(1 + \dfrac{1-R^2+d_q^2\|\hat\beta_{LS}\|^2}{1-R^2}\,\dfrac{n/2-q/2-3/4}{q/2+1/4}\right)^{-1}, & \text{if } q < n-1,\\[2ex] (1 + d_{n-1}^2/d_1^2)^{-1}, & \text{if } q \ge n-1. \end{cases} \tag{5.4}$$

Remark 5.1. As mentioned in Remark 3.4, a small value of $d_r$ could be problematic for estimation. This is reflected in (5.3), where a small $d_r$ would diminish overall shrinkage. However, the probability of such a model would be severely downweighted in the model selection context, and so this diminished shrinkage would be of little consequence.

6. Model selection consistency.

In this section we consider the model selection consistency in the case where $p$ is fixed and $n$ approaches infinity. Posterior consistency for model choice means

$$\operatorname*{plim}_{n\to\infty} \Pr(M_T \mid y) = 1 \quad \text{when } M_T \text{ is the true model},$$

where plim denotes convergence in probability under the true model $M_T$, namely, $y = \alpha_T 1_n + X_T\beta_T + \varepsilon$, where $X_T$ is the $n \times q_T$ true design matrix, $\beta_T$ is the true ($q_T \times 1$) coefficient vector and $\varepsilon \sim N_n(0, \sigma^2 I_n)$.
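To make the definition concrete, posterior model probabilities are obtained by normalizing Bayes factors (all taken against a common reference model) weighted by prior model probabilities. The sketch below is only illustrative: it assumes equal prior model probabilities by default, and the function name, model labels and toy numbers are hypothetical.

```python
import math


def posterior_model_probs(log_bf, log_prior=None):
    """Normalize Bayes factors (against a common reference model) into Pr(M | y)."""
    models = list(log_bf)
    if log_prior is None:
        # Equal prior model probabilities: an assumption made for this sketch.
        log_prior = {m: 0.0 for m in models}
    log_w = {m: log_bf[m] + log_prior[m] for m in models}
    top = max(log_w.values())  # shift by the maximum for numerical stability
    total = sum(math.exp(v - top) for v in log_w.values())
    return {m: math.exp(log_w[m] - top) / total for m in models}


# Toy numbers: the ratio BF_gamma / BF_T shrinks from 1/3 to 1e-6.
print(posterior_model_probs({"T": 0.0, "gamma": math.log(1.0 / 3.0)})["T"])
print(posterior_model_probs({"T": 0.0, "gamma": math.log(1e-6)})["T"])
```

With finitely many candidate models, $\Pr(M_T \mid y)$ approaches 1 exactly when every ratio of a competing Bayes factor to that of the true model approaches 0.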
Let us show that our general criterion, $\mathrm{BF}_{\gamma:N}(a,\nu)$ given by (3.2) with bounded $\nu_1$, is model selection consistent. This is clearly equivalent to

$$\operatorname*{plim}_{n\to\infty} \frac{\mathrm{BF}_{\gamma:N}(a,\nu)}{\mathrm{BF}_{T:N}(a,\nu)} = 0 \quad \forall\, M_\gamma \ne M_T. \tag{6.1}$$

Recall that we have already assumed that $x_i'1_n = 0$ and $x_i'x_i/n = 1$ for any $1 \le i \le p$. To obtain model selection consistency, we also assume the following:

(A1) The correlation between $x_i$ and $x_j$, $x_i'x_j/n$, has a limit as $n \to \infty$.
(A2) The limit of the correlation matrix of $x_1, \ldots, x_p$, $\lim_{n\to\infty} X_F'X_F/n$, is positive definite.

Assumption (A1) is the standard assumption which also appears in Knight and Fu (2000) and Zou (2006). Assumption (A2) is natural because the columns of $X_F$ are assumed to be linearly independent.

Our main consistency theorem is as follows. Note that our recommended choice $\nu_1 = d_1^2/d_q^2$ is bounded by Lemma B.1 in the Appendix.

Theorem 6.1. Under assumptions (A1) and (A2), if $\nu_1$ is bounded, then $\mathrm{BF}_{\gamma:N}(a,\nu)$ is consistent for model selection.

7. Simulated performance evaluations.

In this section we report on a number of simulated performance comparisons between our recommended Bayes factor $\mathrm{gBF}_{\gamma:N}$ and the following selection criteria:

$$\mathrm{ZE} = (1-R^2)^{-(n-q)/2+3/4}\,\frac{B(q/2+1/4,\,(n-q)/2-3/4)}{B(1/4,\,(n-q)/2-3/4)},$$

$$\mathrm{EB} = \max_g\, m_\gamma(y \mid g, \hat\sigma^2),$$

$$\mathrm{AIC} = -2 \times \text{maximum log likelihood} + 2(q+2),$$

$$\mathrm{AICc} = -2 \times \text{maximum log likelihood} + 2(q+2)\,\frac{n}{n-q-3},$$

$$\mathrm{BIC} = -2 \times \text{maximum log likelihood} + q\log n.$$

Here, ZE is the special case of $\mathrm{BF}_{\gamma:N}$ with $a = -3/4$ and $\nu_1 = \cdots = \nu_q = 1$ (corresponding to Zellner's $g$-prior). Note that comparisons of gBF with ZE should reveal the effect of our choice of descending $\nu$.
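For reference, the three information criteria above can be computed from the residual sum of squares alone. This is a minimal sketch, assuming the usual Gaussian maximum likelihood fit, for which the maximized log likelihood is $-\tfrac{n}{2}\{\log(2\pi\,\mathrm{RSS}/n) + 1\}$; the function names and the toy inputs are hypothetical.

```python
import math


def max_log_likelihood(rss, n):
    """Maximized Gaussian log likelihood of a linear model with residual sum of squares rss."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)


def aic(rss, n, q):
    # q + 2 parameters: q slopes, the intercept and the error variance.
    return -2.0 * max_log_likelihood(rss, n) + 2.0 * (q + 2)


def aicc(rss, n, q):
    return -2.0 * max_log_likelihood(rss, n) + 2.0 * (q + 2) * n / (n - q - 3)


def bic(rss, n, q):
    return -2.0 * max_log_likelihood(rss, n) + q * math.log(n)


# Toy inputs (made up): n = 30 observations, q = 4 predictors, RSS = 12.
print(aic(12.0, 30, 4), aicc(12.0, 30, 4), bic(12.0, 30, 4))
```

Smaller values are preferred for all three criteria. The AICc penalty equals the AIC penalty plus $2k(k+1)/(n-k-1)$ with $k = q+2$, so it requires $n > q+3$.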
EB is the empirical Bayes criterion of George and Foster (2000) in (1.6), also based on the original $g$-prior, with $\hat\sigma^2 = \mathrm{RSS}_\gamma/(n - q_\gamma - 1)$ plugged in. Finally, AICc is the well-known correction of AIC proposed by Hurvich and Tsai (1989).

For these comparisons, we consider data generated by submodels (1.2) of (1.1) with $p = 16$ potential predictors for two different choices of the underlying design matrix $X_F$. For the first choice, which we refer to as the correlated case, each row of the 16 predictors is generated as $x_1, \ldots, x_{13} \sim N(0,1)$, and $x_{14}, x_{15}, x_{16} \sim U(-1,1)$ (the uniform distribution) with the following pairwise correlations:

$$\overbrace{x_1, x_2}^{\mathrm{cor}=0.9},\ \underbrace{x_3, x_4}_{\mathrm{cor}=-0.7},\ \overbrace{x_5, x_6}^{\mathrm{cor}=0.5},\ \underbrace{x_7, x_8}_{\mathrm{cor}=-0.3},\ \overbrace{x_9, x_{10}}^{\mathrm{cor}=0.1} \tag{7.1}$$

and independently otherwise. For the second choice, which we refer to as the simple case, each row of the 16 predictors is generated as $x_1, \ldots, x_{16} \overset{\text{i.i.d.}}{\sim} N(0,1)$.

For our first set of comparisons, we set $n = 30$ (larger than $p = 16$) and considered 4 submodels where the true predictors are:

• $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}, x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, x_{16}$ ($q_T = 16$),
• $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}, x_{11}, x_{14}$ ($q_T = 12$),
• $x_1, x_2, x_5, x_6, x_9, x_{10}, x_{11}, x_{14}$ ($q_T = 8$),
• $x_1, x_2, x_5, x_6$ ($q_T = 4$)

(where $q_T$ denotes the number of true predictors) and the true model is given by

$$Y = 1 + 2\sum_{i \in \{\text{true}\}} x_i + \{\text{normal error term } N(0,1)\}. \tag{7.2}$$

In both cases, after generating pseudo random $x_1, \ldots, x_{16}$, we centered and scaled them as noted in Section 1.

Remark 7.1. With simulations of performance in Bayesian model selection, the answers primarily depend on the assumed prior. Here we have chosen all the $\beta_i = 2$, an extreme form of the assumption of exchangeability.

Table 1. Rank of the true model

| Case | Criterion | $q_T=16$: 1st | $q_T=16$: 1st–3rd | $q_T=12$: 1st | $q_T=12$: 1st–3rd | $q_T=8$: 1st | $q_T=8$: 1st–3rd | $q_T=4$: 1st | $q_T=4$: 1st–3rd |
|---|---|---|---|---|---|---|---|---|---|
| Correlated | gBF | 0.71 | 0.91 | 0.73 | 0.94 | 0.69 | 0.87 | 0.66 | 0.86 |
| Correlated | ZE | 0.40 | 0.70 | 0.63 | 0.89 | 0.68 | 0.89 | 0.67 | 0.87 |
| Correlated | EB | 0.41 | 0.71 | 0.63 | 0.90 | 0.67 | 0.88 | 0.66 | 0.85 |
| Correlated | AIC | 0.95 | 0.99 | 0.23 | 0.38 | 0.09 | 0.17 | 0.05 | 0.08 |
| Correlated | AICc | 0.25 | 0.45 | 0.67 | 0.90 | 0.52 | 0.75 | 0.25 | 0.44 |
| Correlated | BIC | 0.88 | 0.98 | 0.41 | 0.65 | 0.31 | 0.43 | 0.23 | 0.42 |
| Simple | gBF | 0.98 | 0.99 | 0.83 | 0.97 | 0.75 | 0.93 | 0.67 | 0.85 |
| Simple | ZE | 0.94 | 0.98 | 0.87 | 0.97 | 0.78 | 0.95 | 0.69 | 0.88 |
| Simple | EB | 0.95 | 0.98 | 0.87 | 0.98 | 0.76 | 0.95 | 0.65 | 0.87 |
| Simple | AIC | 1.00 | 1.00 | 0.22 | 0.37 | 0.08 | 0.13 | 0.05 | 0.08 |
| Simple | AICc | 0.82 | 0.87 | 0.85 | 0.97 | 0.55 | 0.80 | 0.24 | 0.46 |
| Simple | BIC | 0.99 | 1.00 | 0.41 | 0.65 | 0.27 | 0.46 | 0.22 | 0.39 |

Table 1 compares the criteria by how often the true model was selected as best, or in the top 3, among the $2^{16}$ candidate models across the $N = 500$ replications. We note the following:

• In the correlated cases, EB, ZE and gBF were very similar for $q_T = 4, 8$, but gBF was much better for $q_T = 12, 16$.
• In the simple cases, gBF, ZE and EB were very similar, suggesting no effect of our extension of Zellner's $g$-prior with descending $\nu$.
• In both the correlated and simple cases, AIC and BIC were poor for all cases except $q_T = 16$.
• In both the correlated and simple cases, AICc was poor for $q_T = 16$ and 4 but good for $q_T = 8, 12$.

Overall, Table 1 suggests that gBF is stable and good for most cases, and that our generalization of Zellner's $g$-prior is effective in the correlated case.

Table 2. Prediction error comparisons. Each cell gives Mean (LQ, UQ).

| Case | Criterion | $q_T=16$ | $q_T=12$ | $q_T=8$ | $q_T=4$ |
|---|---|---|---|---|---|
| Correlated | Oracle | 0.57 (0.43, 0.68) | 0.43 (0.31, 0.53) | 0.30 (0.20, 0.38) | 0.17 (0.09, 0.22) |
| Correlated | gBF | 0.70 (0.44, 0.78) | 0.52 (0.32, 0.61) | 0.37 (0.22, 0.47) | 0.26 (0.11, 0.35) |
| Correlated | ZE | 1.02 (0.53, 1.20) | 0.59 (0.35, 0.71) | 0.41 (0.23, 0.53) | 0.27 (0.11, 0.37) |
| Correlated | EB | 1.00 (0.52, 1.16) | 0.58 (0.35, 0.70) | 0.41 (0.23, 0.53) | 0.27 (0.11, 0.37) |
| Correlated | AIC | 0.56 (0.42, 0.67) | 0.54 (0.40, 0.65) | 0.51 (0.37, 0.62) | 0.48 (0.33, 0.59) |
| Correlated | AICc | 1.29 (0.65, 1.65) | 0.56 (0.34, 0.68) | 0.42 (0.25, 0.52) | 0.36 (0.22, 0.47) |
| Correlated | BIC | 0.58 (0.42, 0.69) | 0.53 (0.38, 0.64) | 0.46 (0.31, 0.58) | 0.39 (0.23, 0.51) |
| Simple | Oracle | 0.57 (0.43, 0.68) | 0.43 (0.31, 0.53) | 0.30 (0.20, 0.38) | 0.17 (0.09, 0.22) |
| Simple | gBF | 0.57 (0.41, 0.67) | 0.45 (0.33, 0.56) | 0.35 (0.21, 0.45) | 0.25 (0.12, 0.33) |
| Simple | ZE | 0.66 (0.42, 0.70) | 0.45 (0.32, 0.56) | 0.34 (0.21, 0.44) | 0.24 (0.12, 0.32) |
| Simple | EB | 0.65 (0.42, 0.69) | 0.45 (0.32, 0.56) | 0.35 (0.21, 0.45) | 0.25 (0.12, 0.34) |
| Simple | AIC | 0.56 (0.42, 0.67) | 0.54 (0.39, 0.65) | 0.51 (0.37, 0.63) | 0.48 (0.32, 0.60) |
| Simple | AICc | 0.98 (0.45, 0.83) | 0.46 (0.33, 0.55) | 0.39 (0.25, 0.50) | 0.35 (0.20, 0.47) |
| Simple | BIC | 0.56 (0.42, 0.67) | 0.52 (0.37, 0.64) | 0.45 (0.30, 0.57) | 0.38 (0.21, 0.50) |

On data from the same setup with $n = 30$ and $N = 500$, Table 2 compares the models selected by each criterion based on their (in-sample) predictive error

$$\frac{(\hat y_* - \alpha_T 1_n - X_T\beta_T)'(\hat y_* - \alpha_T 1_n - X_T\beta_T)}{n\sigma^2},$$

where $X_T$, $\alpha_T$ and $\beta_T$ are the true $n \times q_T$ design matrix, the true intercept and the true coefficients.
The prediction $\hat y_*$ for each selected model is given by $\bar y 1_n + X_{\gamma*}\hat\beta_{\gamma*}$, where $X_{\gamma*}$ is the selected design matrix, and $\hat\beta_{\gamma*}$ is the Bayes estimator for gBF, ZE and EB, and the least squares estimator for AIC, BIC and AICc. To aid in gauging these comparisons, we also included the "oracle" prediction error, namely, that based on the least squares estimate under the true model. The summary statistics reported in Table 2 are the mean predictive error, and the lower quantile (LQ) and upper quantile (UQ) of the predictive errors.

In terms of predictive performance, the comparisons are similar to those in Table 1. Overall, we see that gBF works well in this setting.

For our final evaluations, we use data again simulated from the simple form (7.2), but now with $x_1, x_2, \ldots, x_{12}, x_{14}, x_{15}$ as the true predictors ($q_T = 14$) and a small sample size $n = 12$ (smaller than $p = 16$). Since $p > q_T > n$, the true model is not identifiable here. Furthermore, AIC, BIC, AICc, ZE and EB cannot even be computed (because $p > n$) and so we confine our evaluations to gBF.

For this very difficult selection situation, gBF did not rank the complete true model of dimension $q_T = 14$ as best even once across the $N = 500$ iterations. In fact, as shown by the frequency of model sizes selected as best by gBF in Table 3, the top selected model was always of dimension less than $n = 12$, the dimension required for identifiability.

Table 3. Model size frequencies in the many predictors case

| Case | 0–6 | 7 | 8 | 9 | 10 | 11 | 12–16 |
|---|---|---|---|---|---|---|---|
| Correlated | 0.10 | 0.11 | 0.22 | 0.34 | 0.16 | 0.07 | 0.00 |
| Simple | 0.11 | 0.15 | 0.21 | 0.33 | 0.14 | 0.06 | 0.00 |

Table 4. The relative rank of the true model

| Case | Min | LQ | Median | Mean | UQ | Max |
|---|---|---|---|---|---|---|
| Correlated | 0.001 | 0.012 | 0.023 | 0.035 | 0.042 | 0.518 |
| Simple | 0.001 | 0.013 | 0.023 | 0.039 | 0.043 | 0.555 |
However, if one instead considers the overall gBF rankings across all possible models, a different picture emerges. As can be seen in Table 4, which summarizes the relative rank of the true model (rank$/2^{16}$) over the $N = 500$ iterations (smaller is better), gBF often ranked the true model relatively high. Indeed, the mean relative gBF rank of the true model was 0.035 in the correlated case and 0.039 in the simple structure case. Both of these mean ranks were the highest mean ranks achieved by any of the $2^{16} = 65{,}536$ candidate models! The true model ranks were evidently more stable than the other model ranks, which varied more from iteration to iteration. Rather than select a single top ranked model in this context, it would seem to be better to use gBF to restrict interest to a promising subset.

Table 5. Frequency that the true model was ranked highly among models with 14 predictors

| Case | 1st | 1st–2nd | 1st–3rd |
|---|---|---|---|
| Correlated | 0.14 | 0.22 | 0.26 |
| Simple | 0.13 | 0.20 | 0.26 |

Table 6. Predictor frequencies in the many predictors case. (T) marks a true predictor and (F) a false one.

| Predictor | Correlated | Simple |
|---|---|---|
| $x_1$ (T) | 0.65 | 0.54 |
| $x_2$ (T) | 0.63 | 0.54 |
| $x_3$ (T) | 0.44 | 0.54 |
| $x_4$ (T) | 0.46 | 0.54 |
| $x_5$ (T) | 0.62 | 0.54 |
| $x_6$ (T) | 0.60 | 0.57 |
| $x_7$ (T) | 0.56 | 0.55 |
| $x_8$ (T) | 0.56 | 0.55 |
| $x_9$ (T) | 0.59 | 0.54 |
| $x_{10}$ (T) | 0.58 | 0.56 |
| $x_{11}$ (T) | 0.58 | 0.52 |
| $x_{12}$ (T) | 0.60 | 0.50 |
| $x_{13}$ (F) | 0.40 | 0.34 |
| $x_{14}$ (T) | 0.43 | 0.55 |
| $x_{15}$ (T) | 0.45 | 0.57 |
| $x_{16}$ (F) | 0.40 | 0.39 |

Further, it should be noted that gBF performed best among the larger unidentified models as shown by Table 5, which reports the frequencies with which the true model was ranked highly among the $(16 \times 15)/2 = 120$ candidate models with exactly 14 predictors. To our knowledge, there is no other analytical selection criterion for choosing between models with $R^2 = 1$, which is the case here.
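The two kinds of ranking discussed above, the relative rank among all candidate models and the rank within models of one fixed size, can be computed with the same helper. This is a minimal sketch on a toy problem with $p = 3$ predictors and made-up scores; the function name, the model encoding and the tie-handling rule are assumptions of the example, not part of the paper.

```python
from itertools import combinations


def relative_rank(scores, true_model, candidates=None):
    """Rank of true_model (1 = best) divided by the number of candidate models.

    scores maps each model, encoded as a frozenset of predictor indices, to a
    criterion value where larger is better (for example a log Bayes factor).
    """
    pool = list(scores) if candidates is None else list(candidates)
    true_score = scores[true_model]
    # Ties are resolved in favour of the true model (an assumption).
    rank = 1 + sum(1 for m in pool if scores[m] > true_score)
    return rank / len(pool)


# Toy example with p = 3 predictors and made-up scores.
p = 3
models = [frozenset(c) for k in range(p + 1) for c in combinations(range(p), k)]
scores = {m: float(len(m)) - 0.1 * sum(m) for m in models}
truth = frozenset({0, 1})
same_size = [m for m in models if len(m) == len(truth)]

print(relative_rank(scores, truth))             # rank among all 2^p models
print(relative_rank(scores, truth, same_size))  # rank among models of the same size
```

In the setting of the paper the pool would contain all $2^{16}$ models, or the 120 models with exactly 14 predictors, and the score would be the gBF criterion.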
Finally, we call attention to Table 6, which reports the observed gBF predictor selection frequencies across the top ranked gBF models over the $N = 500$ iterations. These frequencies show that the top gBF models tended to at least be partially correct in the sense that, for the most part, the true individual predictors [designated by (T)] were selected more often than not.

Remark 7.2. The only variables that were under-selected by gBF in Table 6 were $(x_3, x_4)$ and $(x_{14}, x_{15})$ in the correlated case. Although $x_3$ and $x_4$ are true predictors, their under-selection may be explained by the high negative correlation between them. Interestingly, the under-selection of $x_{14}$ and $x_{15}$ is not explained by correlation (as they are independent in both the correlated and simple cases). Rather, since all predictors have been standardized, it suggests that in this setting, selection of $U(-1,1)$ predictors may be more difficult than $N(0,1)$ predictors (they are uniform in the correlated case and normal in the simple case).

APPENDIX A: PROOF OF THEOREM 5.1

We proceed by finding a simple closed form for $\hat\beta_B$ in (5.1). Making use of the transformation (3.5), and by the calculation in (3.6), $E[\beta_{\#} \mid y] = E[\beta_{\#}]$ (say, $\mu_{\#}$) and

$$W\,\frac{E[\sigma^{-2}\beta_{*} \mid y]}{E[\sigma^{-2} \mid y]} = \frac{1}{E[\sigma^{-2} \mid y]}\, E\left[\sigma^{-2}\sum_{i=1}^{r} \frac{u_i'y}{d_i}\left(1 - \frac{1}{\nu_i(1+g)}\right) w_i \,\Big|\, y\right] = \sum_{i=1}^{r} \frac{u_i'y}{d_i}\left(1 - \frac{H(y)}{\nu_i}\right) w_i,$$

where

$$H(y) = \frac{E[\sigma^{-2}(1+g)^{-1} \mid y]}{E[\sigma^{-2} \mid y]}. \tag{A.1}$$

Thus,

$$\hat\beta_B = \sum_{i=1}^{r} \frac{u_i'y}{d_i}\left(1 - \frac{H(y)}{\nu_i}\right) w_i + \begin{cases} 0, & \text{if } q \le n-1,\\ W_{\#}\mu_{\#}, & \text{if } q > n-1. \end{cases} \tag{A.2}$$

Since $\beta$ is not identifiable when $q > n-1$, it is not surprising that $\hat\beta_B$ is incompletely defined due to the arbitrariness of $W_{\#}\mu_{\#}$. However, because $XW_{\#} = 0$, this arbitrariness is not an issue for the estimation of $X\beta$, for which we obtain

$$X\hat\beta_B = \sum_{i=1}^{r} (u_i'y)\,u_i\left(1 - \frac{H(y)}{\nu_i}\right). \tag{A.3}$$

It now only remains to obtain a closed form for $H(y)$. As in (3.4), (3.7) and (3.8) in Section 3,

$$\int_{-\infty}^{\infty}\int_{R^q}\int_0^{\infty} \frac{1}{\sigma^2}\, p(y \mid \alpha, \beta, \sigma^2)\, p(\beta \mid g, \sigma^2)\, \frac{1}{\sigma^2}\, d\alpha\, d\beta\, d\sigma^2$$

$$= \int_0^{\infty} \frac{\{\sigma^2\}^{-(n+1)/2}}{n^{1/2}(2\pi)^{(n-1)/2}}\, \frac{(1+g)^{-r/2}}{\prod_{i=1}^{r}\nu_i^{1/2}} \exp\left(-\frac{\|v\|^2\{g(1-R^2)+1-Q^2\}}{2\sigma^2(g+1)}\right) \frac{1}{\sigma^2}\, d\sigma^2$$

$$= \frac{2}{n^{1/2}}\, \frac{\Gamma(\{n+1\}/2)}{\pi^{(n-1)/2}}\, \frac{\|v\|^{-n-1}}{\prod_{i=1}^{r}\nu_i^{1/2}}\, (1+g)^{-r/2+(n+1)/2}\, \{g(1-R^2)+1-Q^2\}^{-(n+1)/2}, \tag{A.4}$$

which differs slightly from (3.8) because of the extra $1/\sigma^2$ term in the first expression. Letting

$$L(y \mid g) = (1+g)^{-r/2+(n+1)/2}\{g(1-R^2)+1-Q^2\}^{-(n+1)/2}, \tag{A.5}$$

we have

$$H(y) = \frac{\int_0^{\infty} (1+g)^{-1} L(y \mid g)\, p(g)\, dg}{\int_0^{\infty} L(y \mid g)\, p(g)\, dg} = \frac{\int_0^{\infty} (1+g)^{-r/2+(n-1)/2}\{g(1-R^2)+1-Q^2\}^{-(n+1)/2}\, p(g)\, dg}{\int_0^{\infty} (1+g)^{-r/2+(n+1)/2}\{g(1-R^2)+1-Q^2\}^{-(n+1)/2}\, p(g)\, dg}.$$

When $q < n-1$, under the prior (2.13) used in Section 3, namely,

$$p(g) = \frac{g^{b}(1+g)^{-a-b-2}}{B(a+1,b+1)} = \frac{g^{b}(1+g)^{-(n-r-1)/2}}{B(a+1,b+1)},$$

where $b = (n-5)/2 - r/2 - a$, we have

$$H(y) = \frac{\int_0^{\infty} g^{b}\{g(1-R^2)+1-Q^2\}^{-(n+1)/2}\, dg}{\int_0^{\infty} g^{b}(1+g)\{g(1-R^2)+1-Q^2\}^{-(n+1)/2}\, dg}$$

$$= \left(1 + \frac{\int_0^{\infty} g^{b+1}\{g(1-R^2)+1-Q^2\}^{-(n+1)/2}\, dg}{\int_0^{\infty} g^{b}\{g(1-R^2)+1-Q^2\}^{-(n+1)/2}\, dg}\right)^{-1}$$

$$= \left(1 + \frac{1-Q^2}{1-R^2}\, \frac{B(q/2+a+1,\, b+2)}{B(q/2+a+2,\, b+1)}\right)^{-1} = \left(1 + \frac{1-Q^2}{1-R^2}\, \frac{(n-q-3)/2-a}{q/2+a+1}\right)^{-1}.$$

On the other hand, when $q \ge n-1$, it follows that $R^2 = 1$, $r = n-1$, $L(y \mid g) = (1+g)(1-Q^2)^{-(n+1)/2}$ and, hence,

$$H(y) = \frac{\int_0^{\infty} p(g)\, dg}{\int_0^{\infty} (1+g)\, p(g)\, dg} = \{1 + E[g]\}^{-1}. \tag{A.6}$$

APPENDIX B: PROOF OF THEOREM 6.1

B.1. Some preliminary lemmas.
Under the assumptions (A1) and (A2) in Section 6, we will give the following lemmas (Lemma B.1 on $X_T$ and $X_\gamma$ and Lemmas B.2, B.3 on $R_T^2$ and $R_\gamma^2$) for our main proof. See also Fernández, Ley and Steel (2001) and Liang et al. (2008). Note that (A2) implies that, for any model $M_\gamma$, there exists a positive definite matrix $H_\gamma$ such that

$$\lim_{n\to\infty} \frac{1}{n} X_\gamma'X_\gamma = H_\gamma. \tag{B.1}$$

Lemma B.1. (1) Let $d_1[\gamma]$ and $d_q[\gamma]$ be the maximum and minimum of singular values of $X_\gamma$. Then $\{d_1[\gamma]\}^2/n$ and $\{d_q[\gamma]\}^2/n$ approach the maximum and minimum eigenvalues of $H_\gamma$, respectively.

(2) The $q_T \times q_T$ limit

$$\lim_{n\to\infty} n^{-1} X_T'X_\gamma(X_\gamma'X_\gamma)^{-1}X_\gamma'X_T = H(T,\gamma) \tag{B.2}$$

exists.

(3) When $\gamma \nsupseteq T$, the rank of $H_T - H(T,\gamma)$ is given by the number of nonoverlapping predictors and $\beta_T'H_T\beta_T > \beta_T'H(T,\gamma)\beta_T$.

(4) $H_T - H(T,\gamma) = 0$ for $\gamma \supsetneq T$.

Lemma B.2. Let $\gamma \nsupseteq T$. Then

$$\operatorname*{plim}_{n\to\infty} R_\gamma^2 = \frac{\beta_T'H(T,\gamma)\beta_T}{\sigma^2 + \beta_T'H_T\beta_T} < \frac{\beta_T'H_T\beta_T}{\sigma^2 + \beta_T'H_T\beta_T}. \tag{B.3}$$

Proof. For the submodel $M_\gamma$, $1 - R_\gamma^2$ is given by

$$\|Q_\gamma(y - \bar y 1_n)\|^2 / \|y - \bar y 1_n\|^2$$

with $Q_\gamma = I - X_\gamma(X_\gamma'X_\gamma)^{-1}X_\gamma'$. The numerator and denominator are rewritten as

$$\|Q_\gamma(y - \bar y 1_n)\|^2 = \|Q_\gamma X_T\beta_T + Q_\gamma\check\varepsilon\|^2 = \beta_T'X_T'Q_\gamma X_T\beta_T + 2\beta_T'X_T'Q_\gamma\varepsilon + \check\varepsilon'Q_\gamma\check\varepsilon, \tag{B.4}$$

where $\check\varepsilon = \varepsilon - \bar\varepsilon 1_n$ and, similarly,

$$\|y - \bar y 1_n\|^2 = \beta_T'X_T'X_T\beta_T + 2\beta_T'X_T'\varepsilon + \|\check\varepsilon\|^2.$$

Hence, $1 - R_\gamma^2$ can be rewritten as

$$\frac{\beta_T'\{X_T'Q_\gamma X_T/n\}\beta_T + 2\beta_T'\{X_T'Q_\gamma\varepsilon/n\} + \|Q_\gamma\check\varepsilon\|^2/n}{\beta_T'\{X_T'X_T/n\}\beta_T + 2\beta_T'\{X_T'\varepsilon/n\} + \|\check\varepsilon\|^2/n}. \tag{B.5}$$

In (B.5), $\beta_T'X_T'\varepsilon/n$ approaches 0 in probability because $E[\varepsilon] = 0$, $\mathrm{var}[\varepsilon] = \sigma^2 I_n$, $E[X_T'\varepsilon/n] = 0$ and

$$\mathrm{var}(X_T'\varepsilon/n) = n^{-1}\sigma^2\{X_T'X_T/n\} \to 0. \tag{B.6}$$

Similarly, $\beta_T'\{X_T'Q_\gamma\varepsilon/n\} \to 0$ in probability.
Further, both $\|\check\varepsilon\|^2/n$ and $\|Q_\gamma\check\varepsilon\|^2/n$ for any $\gamma$ converge to $\sigma^2$ in probability. Therefore, by parts (2) and (3) of Lemma B.1, $R_\gamma^2$ for $\gamma \nsupseteq T$ approaches

$$\frac{\beta_T'H(T,\gamma)\beta_T}{\sigma^2 + \beta_T'H_T\beta_T} < \frac{\beta_T'H_T\beta_T}{\sigma^2 + \beta_T'H_T\beta_T}$$

in probability.

Lemma B.3. Let $\gamma \supsetneq T$. Then:

(1) $R_\gamma^2 \ge R_T^2$ for any $n$ and

$$\operatorname*{plim}_{n\to\infty} R_T^2 = \operatorname*{plim}_{n\to\infty} R_\gamma^2 = \frac{\beta_T'H_T\beta_T}{\sigma^2 + \beta_T'H_T\beta_T}. \tag{B.7}$$

(2) $\{(1 - R_T^2)/(1 - R_\gamma^2)\}^n$ is bounded from above in probability.

Proof. (1) When $\gamma \supsetneq T$, $Q_\gamma X_T = 0$. Hence, as in (B.5), we have

$$1 - R_\gamma^2 = \frac{\|Q_\gamma\check\varepsilon\|^2/n}{\beta_T'\{X_T'X_T/n\}\beta_T + 2\beta_T'\{X_T'\varepsilon/n\} + \|\check\varepsilon\|^2/n}, \qquad 1 - R_T^2 = \frac{\|Q_T\check\varepsilon\|^2/n}{\beta_T'\{X_T'X_T/n\}\beta_T + 2\beta_T'\{X_T'\varepsilon/n\} + \|\check\varepsilon\|^2/n}. \tag{B.8}$$

Since $\|Q_T\check\varepsilon\|^2/n > \|Q_\gamma\check\varepsilon\|^2/n$ for any $n$ and both approach $\sigma^2$ in probability, part (1) follows.

(2) By (B.8), $(1 - R_T^2)/(1 - R_\gamma^2)$ is given by $\|Q_T\check\varepsilon\|^2/\|Q_\gamma\check\varepsilon\|^2$. Further, we have

$$1 \le \frac{1 - R_T^2}{1 - R_\gamma^2} = \frac{\|Q_T\check\varepsilon\|^2}{\|Q_\gamma\check\varepsilon\|^2} \le \frac{\|\check\varepsilon\|^2}{\|Q_\gamma\check\varepsilon\|^2} = \frac{1}{W_\gamma},$$

where $W_\gamma \sim (1 + \chi^2_{q_\gamma}/\chi^2_{n-q_\gamma-1})^{-1}$, for independent $\chi^2_{n-q_\gamma-1}$ and $\chi^2_{q_\gamma}$. Hence,

$$W_\gamma^{n} = \{1 + \chi^2_{q_\gamma}/\chi^2_{n-q_\gamma-1}\}^{-n} = \{1 + \{n/\chi^2_{n-q_\gamma-1}\}\{\chi^2_{q_\gamma}/n\}\}^{-n} \sim \exp(-\chi^2_{q_\gamma})$$

as $n \to \infty$, since $\chi^2_{n-q_\gamma-1}/n \to 1$ in probability. Therefore, $W_\gamma^{-n}$ is bounded in probability from above and part (2) follows.

B.2. The proof of Theorem 6.1.

Note that

$$\nu_1^{-1} \le 1 - Q_\gamma^2 \le 1$$

by (3.3),

$$\nu_1^{-q/2} \le \prod_{i=1}^{q} \nu_i^{-1/2} \le 1,$$

because the $\nu_i$'s are descending,

$$\frac{B(q/2+a+1,\, (n-q-3)/2-a)}{B(a+1,\, (n-q-3)/2-a)} = \frac{\Gamma(q/2+a+1)}{\Gamma(a+1)}\, \frac{\Gamma(\{n-q-1\}/2)}{\Gamma(\{n-1\}/2)}$$

and

$$\lim_{n\to\infty} (n/2)^{q/2}\, \frac{\Gamma(\{n-q-1\}/2)}{\Gamma(\{n-1\}/2)} = 1$$
Then, by ( 3.2 ), ther e exist c 1 ( γ ) < c 2 ( γ ) (which do not dep end on n ) suc h that c 1 ( γ ) < { n q γ (1 − R 2 γ ) n } 1 / 2 BF γ : N ( a, ν ) (1 − R 2 γ ) ( q γ +3) / 2+ a < c 2 ( γ ) for su fficien tly large n . By Lemmas B.2 and B.3 , R 2 γ go es to some constan t in probability . Hence, to show consistency , it suffices to sh o w that plim n →∞ n q T − q γ 1 − R 2 T 1 − R 2 γ n = 0 . (B.9) Consider the follo wing tw o situations: (1) γ + T : by Lemmas B.2 and B.3 , (1 − R 2 T ) / (1 − R 2 γ ) is strictly less than 1 in probabilit y . Hence, { (1 − R 2 T ) / (1 − R 2 γ ) } n con v erges to zero in probabilit y exp onen tially fast with resp ect to n . Therefore, no m atter wh at v al ue q T − q γ tak es, ( B.9 ) is satisfied. (2) γ ) T : by Lemma B.3 , { (1 − R 2 T ) / (1 − R 2 γ ) } n is b ounded in p robabilit y . Since q γ > q T , ( B.9 ) is satisfied. Ac kno wledgment s. W e are v ery grateful to a referee for w onderfu l in - sigh ts wh ic h sub stan tially help ed us to strengthen this pap er. REFERENCES Akaike, H. (1974). A new lo ok at the statistical mo del identification. IEEE T r ans. Aut omat. Contr ol A C-19 716–723 . System identification and time-series analysis. MR0423716 Berger, J. O. , Pericchi, L. R. and V arsha vsky, J. A. (1998). Ba yes factors and marginal distributions in inv ariant situations. Sankhy¯ a Ser. A 60 307–321. MR1718789 Casella, G. (1980). Minima x ridge regression estimation. Ann. Statist. 8 1036–105 6. MR0585702 Casella, G . (1985). Condition numbers and minimax ridge regression estimators. J. A mer. Statist. Asso c. 80 753–758. MR0803264 Cui, W. and Ge or ge, E. I. (2008). Empirical Bay es vs. fully Ba yes v ariable selection. J. Statist. Plann. Infer enc e 138 888–900. MR2416869 Fern ´ andez, C. , Le y, E. and Steel, M. F. J. (2001). Benchmark priors for Bay esian mod el av eraging. J. Ec onometrics 100 381–427. MR1820410 F oster, D. P. and George, E. I. (1994). 
The risk inflation criterion for multiple regres- sion. Ann. Statist. 22 1947 –1975. MR1329177 Geluk, J. L. and de H aan, L. (1987). R e gular V ariation, Extens ions and T aub e- rian The or ems . CWI T r act 40 . Math. Centrum, Cen trum Wisk. Inform., Amsterdam. MR0906871 George, E. I. and Foster, D. P. (2000 ). Calibration and emp irical Ba yes v ariable selection. Biom etrika 87 731–747. MR1813972 Hur vich, C. M. and Tsai, C.-L. (1989). Regression and time series mo del selection in small samples. Biometrika 76 297–30 7. MR1016020 FULL Y BA YES F ACTORS 27 Knight, K. and Fu, W. (2000). Asymptotics for lasso-t yp e estimators. Ann. Statist. 28 1356–13 78. MR1805787 Liang, F. , P aulo, R. , Molina, G. , Cl y de, M. A. an d Berger, J. O. (2008). Mix- tures of g p riors for Bay esian v ariable selection. J. Amer. Statist. Asso c. 103 410–423. MR2420243 Maruy ama, Y. and Stra wderman, W. E. (2005). A new class of generalized Bay es minimax ridge regressio n estimators. An n. Statist. 33 1753–1770. MR2166561 Schw arz, G . (1978). Estima ting the dimension of a mo del. Ann. Statist. 6 461–46 4. MR0468014 Stra wderman, W. E. (1971). Prop er Bay es minimax estimators of th e multiv ariate nor- mal mean. Ann. Math. Statist. 42 385–388. MR0397939 Zellner, A. (1986). On assessing prior distributions and Ba yesian regression analysis with g -prior d istributions. I n Bayesian Infer enc e and De cision T e chniques . Stud. Bayesian Ec onometrics Statist. 6 233– 243. North- Holland, Amsterd am. MR0881437 Zellner, A. and S iow , A . (1980). Posterior odds ratios for selected regression hypotheses. In Bayesian Statist ics: Pr o c e e dings of the First International Me eting Held in V alencia (Sp ain) ( J. M. Bernard o , M. H. DeGroot , D. V. Li ndley and A. F. M . Smi th , eds.) 585–603. Univ. V alencia, V alencia. Zou, H. (2006). The adaptive lasso and its oracle prop erties. J. Amer. Statist. Asso c. 101 1418–14 29. 
MR2279469 Center for Sp a tial Informat ion S cience University of Tokyo 5-1-5 Kashiw anoha, Kashiw a-shi Chiba, 277 -8568 Jap an E-mail: maruya ma@csis.u- toky o.ac.jp Dep ar tment of S t a tist ics University of Pennsyl v ania 400 Jon M . Huntsman Hall 3730 W a ln ut Street Philadelphia, Pennsyl v ania 1910 4-6302 USA E-mail: edgeorge@wharton .up enn.edu