Case-deletion importance sampling estimators: Central limit theorems and related results

Case-deleted analysis is a popular method for evaluating the influence of a subset of cases on inference. The use of Monte Carlo estimation strategies in complicated Bayesian settings leads naturally to the use of importance sampling techniques to as…

Authors: Ilenia Epifani, Steven N. MacEachern, Mario Peruggia

Electronic Journal of Statistics, Vol. 2 (2008) 774–806. ISSN: 1935-7524. DOI: 10.1214/08-EJS259

Case-deletion importance sampling estimators: Central limit theorems and related results

Ilenia Epifani
Department of Mathematics, Politecnico di Milano, I-20133 Milano, Italy
e-mail: ilenia.epifani@polimi.it; url: www1.mate.polimi.it/~ileepi/

Steven N. MacEachern† and Mario Peruggia‡
Department of Statistics, The Ohio State University, Columbus, OH 43210-1247, USA
e-mail: snm@stat.osu.edu; peruggia@stat.osu.edu; url: www.stat.osu.edu/~snm/; www.stat.osu.edu/~peruggia/

Abstract: Case-deleted analysis is a popular method for evaluating the influence of a subset of cases on inference. The use of Monte Carlo estimation strategies in complicated Bayesian settings leads naturally to the use of importance sampling techniques to assess the divergence between full-data and case-deleted posteriors and to provide estimates under the case-deleted posteriors. However, the dependability of the importance sampling estimators depends critically on the variability of the case-deleted weights. We provide theoretical results concerning the assessment of the dependability of case-deleted importance sampling estimators in several Bayesian models. In particular, these results allow us to establish whether or not the estimators satisfy a central limit theorem. Because the conditions we derive are of a simple analytical nature, the assessment of the dependability of the estimators can be verified routinely before estimation is performed. We illustrate the use of the results in several examples.

AMS 2000 subject classifications: Primary 62F15, 62J20.
Keywords and phrases: Infinite Variance, Influence, Leverage, Marginal Residual Sum of Squares, Markov Chain Monte Carlo, Model Averaging, Moment Index, Tail Behavior.

Received July 2008.
∗ The author thanks the Department of Statistics at the Ohio State University for the kind hospitality during her visit.
† Supported in part by NSF Awards No. DMS-0072526, SES-0437251, and DMS-0605041.
‡ Supported in part by NSF Awards No. SES-0214574, SES-0437251 and DMS-0605052.

I. Epifani et al./Case-deletion importance sampling estimators

1. Introduction

Complex Bayesian models are fit with simulation techniques. A Monte Carlo method is used to generate a sample from the posterior distribution, and this sample is used to estimate many quantities, such as posterior means and variances of parameters, posterior probabilities of events, predictive distributions of future cases, etc.

For a complete analysis, one examines the data, looking for outliers and influential cases. One also considers information external to the model which suggests groups of cases that may depart from the model. When interesting groups of cases are found, they are dropped from the data set, and estimates are recomputed. The resulting case-deleted posterior distribution and the case-deleted estimates are of interest, as are the changes in the posterior and estimates. Substantial changes in posterior or estimates may lead to refinement of the model. Cross-validation also relies on case-deletion, as formalized by the conditional predictive ordinate (CPO) (see, for example, p. 47 and p. 284 of [4]).

Case-deleted posterior distributions are examined through importance sampling. The large sample from the full posterior distribution is reweighted, as suggested for example in [21] and [23], to compute summaries with respect to the case-deleted posterior distribution. Examples of this and similar approaches are presented in [3, 15, 16, 25, 26] and [27]. As shown in [13], it is essential for the importance sampling weights to have finite variance.
If the 2nd moment of the weights does not exist, typical estimators will not follow $n^{1/2}$ asymptotics, nor will they follow a central limit theorem. It is shown in [19] that, for the case of a popular Bayesian linear model with conjugate priors, whether or not the weight function for a single case-deletion has finite 2nd moment depends on simple conditions involving the scale parameter of the prior distribution of the error variance, the leverage of the observation being deleted, its residual, and the total residual sum of squares.

In this article, we expand upon the results of [19] in several directions. We first analyze the situation of multiple case-deletions and provide necessary and sufficient conditions for the $r$th ($r > 1$) posterior moment of the weight function to be finite. This allows us to treat a group of observations coherently, thereby capturing synergistic effects of similar cases. We extend the results to much broader classes of prior distributions, so that we can handle nonconjugate as well as conjugate priors. This is accomplished by formally defining classes of distributions that are thick or thin tailed with respect to the conjugate priors. This extension is coupled with two devices, bounding functions and adjustment of the prior, to allow us to establish a connection between a finite $r$th moment of the weight function and the finiteness of the 2nd moment for a variety of functions. The existence of two moments for these functions implies that a central limit theorem holds for an estimator. As in [19], the conditions are on sample size, leverage and an adjusted residual sum of squares.

In addition to the linear model, we provide results for the Michaelis-Menten (MM) model. The MM model is nonlinear, but has the property that, conditional on one parameter, the mean structure is linear in the remaining parameters.
Making use of conditional linearity, we develop uniform versions of the conditions for the linear model that ensure existence of the weight function's $r$th moment in the MM model. Many other models are conditionally linear (among them linear regression used in conjunction with Box-Cox transformations or linear regression along with Box-Tidwell transformations). We pursue further extensions of the linear model, deriving results for the logistic regression model.

Our results have a very practical implication. They let us determine, quickly and analytically, whether central limit theorems hold for particular functionals. If central limit theorems hold, then we can pursue the strategy of fitting the model to the full data set and using importance sampling to estimate the functionals under case-deleted posteriors. If central limit theorems do not hold, we must alter our inferential strategy, either using more sophisticated importance sampling techniques (such as the importance link function technique introduced in [17]) or fitting the model for particular case-deleted data sets with separate Monte Carlo simulations.

By providing conditions under which $r$ moments of the case-deleted weight function exist, our theorems go beyond the typical central limit theorem results that rely on the existence of second moments. This is important for two reasons. First, one may be interested in functionals where higher order moments of the case-deleted weight function come into play (see, for example, estimation of $\chi^2$ divergence in [26] and [27]). Second, the number of moments which exist for the deletion of particular cases can be used as a measure of their influence, thus allowing one to assess influence along a continuum.
The connection between influence and moment conditions is elucidated by applying results presented in [7] and [8], which, for an arbitrary, non-negative random variable $X$, contain the definition of a quantity called the moment index of $X$. Denoting by $W$ the weight function resulting from the deletion of a given set of cases, its moment index $r^*$ is the least upper bound on the number of moments which exist. This represents a quantitative summary of the limiting tail behavior of the case-deleted weight function in the sense that, as stated in [7] and [8], $r^* = \liminf_{t \to \infty} [\log P(W > t)] / [\log(1/t)]$. A larger moment index corresponds to a larger class of functions for which the central limit theorem holds. Practical illustrations of these ideas are presented in Sections 4 and 6.

This article is laid out as follows. Section 2 contains preliminary results and formal definitions of thick and thin tails. Section 3 provides conditions for the (non)existence of the $r$th moment of the case-deletion weight function in the linear model. Section 4 gives conditions on the existence of moments for the MM model, and Section 5 gives parallel results for the logistic regression model. Section 6 shows how the results can be used to establish central limit theorems. A summary of sufficient conditions on the weight function's moments to ensure a central limit theorem for several popular Bayesian measures of influence is presented in Table 2. The section also shows the results in action, investigating both measures of influence and their impact on model development in a multiple linear regression setting. The final section contains concluding remarks. Technical details of proofs are left to the appendix.

2. Notation and preliminary results

Each Bayesian model considered in this article depends on a finite dimensional parameter vector $s = (s_1, \ldots, s_k)$. Suppose that a set of observations $y = (y_1, \ldots, y_n)$ is collected and let $p(s) = p(s \mid y)$ denote the full posterior density for $s$. Let $\mathcal{I}$ denote the set of indices to be deleted from the analysis and let $I$ be its cardinality. Let $y_{\backslash\mathcal{I}}$ represent the $n - I$ observations remaining after the indices in $\mathcal{I}$ are omitted, with $p_{\backslash\mathcal{I}}(s) = p_{\backslash\mathcal{I}}(s \mid y_{\backslash\mathcal{I}})$ denoting the corresponding case-deleted posterior density. Furthermore, let $q(s) = q(s, y)$ and $q_{\backslash\mathcal{I}}(s) = q(s, y_{\backslash\mathcal{I}})$ denote functions computable at every point $(s, y)$ and proportional to the joint densities (e.g., prior × data likelihood) of $(s, y)$ and $(s, y_{\backslash\mathcal{I}})$, respectively.

Suppose that a sample $z_1, \ldots, z_M$ from $p(s)$ is available. In a typical application this will be either an independent sample or a dependent sample from an ergodic Markov chain. We wish to construct an estimate of $E_{p_{\backslash\mathcal{I}}}[g(s)] = \int g(s)\, p_{\backslash\mathcal{I}}(s)\, ds$, for some real valued function $g(s)$ such that $\int |g(s)|\, p_{\backslash\mathcal{I}}(s)\, ds < \infty$. This can be done by computing a Monte Carlo sum in which the individual elements $g(z_m)$ are reweighted. Typically, $p(s)$ and $p_{\backslash\mathcal{I}}(s)$ are not available because their normalizing constants are unknown and only $q(s)$ and $q_{\backslash\mathcal{I}}(s)$ are directly computable. In that case we can define the weight function $w_{\backslash\mathcal{I}}(s) = q_{\backslash\mathcal{I}}(s)/q(s)$ and estimate the expectation by:

\[
\hat{E}_{p_{\backslash\mathcal{I}}}[g(s)] = \left( \sum_{m=1}^{M} w_{\backslash\mathcal{I}}(z_m)\, g(z_m) \right) \Bigg/ \left( \sum_{m=1}^{M} w_{\backslash\mathcal{I}}(z_m) \right). \tag{2.1}
\]

The denominator in Equation (2.1) divided by $M$ estimates the ratio of the two unknown normalizing constants.
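As a minimal sketch of how the ratio in Equation (2.1) might be evaluated, the Python function below takes the log weights and the values $g(z_m)$ at the posterior draws. The function name and the toy inputs are hypothetical and chosen only for illustration; working on the log scale is an implementation choice made here, not something prescribed by the text.

```python
import numpy as np


def case_deleted_estimate(log_w, g_vals):
    """Self-normalized estimate of E[g(s)] under the case-deleted posterior.

    log_w  : log unnormalized weights, log w(z_m), for posterior draws z_1..z_M
    g_vals : g(z_m) evaluated at the same draws
    """
    log_w = np.asarray(log_w, dtype=float)
    g_vals = np.asarray(g_vals, dtype=float)
    # Rescaling all weights by a common constant leaves the ratio in (2.1)
    # unchanged, so subtract the largest log weight to avoid overflow.
    w = np.exp(log_w - log_w.max())
    return float(np.sum(w * g_vals) / np.sum(w))


# Toy inputs with hand-computable answers (illustrative values only).
equal = case_deleted_estimate([0.0, 0.0], [1.0, 3.0])            # plain average
tilted = case_deleted_estimate([0.0, np.log(3.0)], [0.0, 1.0])   # 3 / (1 + 3)
```

Because only the ratio matters, any common factor in the weights cancels, which is why the unnormalized $q_{\backslash\mathcal{I}}(s)/q(s)$ suffices in place of the normalized density ratio.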
Thus, if $p(s)$ and $p_{\backslash\mathcal{I}}(s)$ are available, $w_{\backslash\mathcal{I}}(s)$ can be replaced by $w^*_{\backslash\mathcal{I}}(s) = p_{\backslash\mathcal{I}}(s)/p(s)$ in the numerator and the denominator can be replaced by $M$, resulting in the related estimator that we denote by $\hat{E}^*_{p_{\backslash\mathcal{I}}}[g(s)]$. In both cases, the resulting estimators are consistent under mild assumptions (see [13] for the case of i.i.d. samples and [24] for the case of samples from ergodic Markov chains). Throughout the article we refer to estimators of the form $\hat{E}_{p_{\backslash\mathcal{I}}}[g(s)]$ and $\hat{E}^*_{p_{\backslash\mathcal{I}}}[g(s)]$ as case-deleted importance sampling estimators.

The prior distribution plays a large role in determining whether the Estimator (2.1) is asymptotically normal. To ensure asymptotic normality for $\hat{E}_{p_{\backslash\mathcal{I}}}[g(s)]$, we need both $\int w^2_{\backslash\mathcal{I}}(s)\, g^2(s)\, p(s)\, ds < \infty$ and $\int w^2_{\backslash\mathcal{I}}(s)\, p(s)\, ds < \infty$. Finiteness of these integrals is unchanged by substitution of $w^{*2}_{\backslash\mathcal{I}}(s)$ for $w^2_{\backslash\mathcal{I}}(s)$. (See Section 6 for further discussion of conditions for the asymptotic normality of both $\hat{E}_{p_{\backslash\mathcal{I}}}[g(s)]$ and $\hat{E}^*_{p_{\backslash\mathcal{I}}}[g(s)]$.) In many instances, a prior distribution with sharp enough tails will ensure that these integrals are finite while a flatter tailed prior will lead to infinite integrals.

The upcoming lemma enables us to work easily with priors having different tails. In particular, it enables us to derive preliminary results for conjugate prior distributions, and then to quickly extend the results to non-conjugate prior distributions. Use of the lemma is demonstrated in the examples.

To set up the lemma, we first define the basic notation. Let

\[
S_i = \int \frac{f(x)\, \pi_i(x)}{\int f(u)\, \pi_i(u)\, du}\, h(x)\, dx = c_i^{-1} \int f(x)\, \pi_i(x)\, h(x)\, dx, \quad \text{for } i = 0, 1.
\]

The functions $f$, $\pi_i$ and $h$ are assumed to be non-negative. The constants $c_i$ are assumed to be finite and positive.
Let $0 < b < B < \infty$.

Lemma 2.1. If, for all $x$, $\pi_0(x)/\pi_1(x) < B$, then $S_0 = \infty$ implies $S_1 = \infty$. If, for all $x$, $b < \pi_0(x)/\pi_1(x)$, then $S_0 < \infty$ implies $S_1 < \infty$. If, for all $x$, $b < \pi_0(x)/\pi_1(x) < B$, then $S_0 < \infty$ if and only if $S_1 < \infty$.

A device that we have found useful is a formal description of thinner and thicker tailed distributions. Since the prior distributions that we consider here are all absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^k$, we use a simple definition that suffices for our purposes. We describe the result in terms of a distribution for a parameter since that is how we will use the result.

Consider a parameter $s \in \mathcal{S}$. The parameter space $\mathcal{S}$ is taken to be $\mathbb{R}^k$. Let $\mathcal{F}$ represent a set of distributions on $s$, all of which have densities with respect to Lebesgue measure. The following definition concerns the relationship between another distribution, $g$, and the set of distributions $\mathcal{F}$.

Definition 2.1. The density $g$ is said to be thick-tailed with respect to $\mathcal{F}$ if, for each $f \in \mathcal{F}$ and for each sequence $s_t$ with $\|s_t\| \to \infty$ as $t \to \infty$, $\lim_{t \to \infty} g(s_t)/f(s_t) = \infty$.

Definition 2.2. The density $g$ is said to be thin-tailed with respect to $\mathcal{F}$ if, for each $f \in \mathcal{F}$ and for each sequence $s_t$ with $\|s_t\| \to \infty$ as $t \to \infty$, $\lim_{t \to \infty} g(s_t)/f(s_t) = 0$.

We note that these definitions capture the general notion of which distributions are thicker or thinner tailed than others. For example, a $t$ distribution will be thicker tailed than the class of normal distributions. A one-dimensional normal distribution will be thinner tailed than the Laplace distribution. A $t$ distribution with 5 degrees of freedom will be thicker tailed than a $t$ distribution with 7 degrees of freedom, etc. We also note that a normal distribution with variance $\sigma^2$ is thicker tailed than a normal distribution with variance $c\sigma^2$ if $c < 1$.

3. A Bayesian linear model

In [19] the author considers a standard specification of the Bayesian linear model and derives necessary and sufficient conditions for the variance of the case-deleted importance sampling weight function to be finite when a single observation is omitted. Loosely, the conditions for a finite variance stated in [19] can be described as (a) small leverage for the deleted case, (b) large enough sample size, and (c) small enough residual for the deleted case. In this section we extend the results of [19] in two different directions: we analyze the situation of multiple case-deletions and provide necessary and sufficient conditions for the $r$th ($r > 1$) posterior moment of the case-deleted weight function to be finite. Our conditions are also on leverage, sample size and residual. In addition, we extend the results to nonconjugate models by considering the tail behavior of the prior distribution. In Section 6, these results are used to establish central limit theorems for a broad class of importance sampling estimators.

Let the $n \times 1$ vector of observations $Y$ be distributed as

\[
Y \mid \theta, \sigma^2 \sim N\left(X\theta, \sigma^2 \mathbf{I}\right), \tag{3.1}
\]

where $\mathbf{I}$ denotes the identity matrix and $X$ denotes an $n \times k$ design matrix of rank $k$. Assume that the variance $\sigma^2$, having an inverse gamma prior distribution with known positive parameters $\alpha$ and $\beta$, is independent of the $k \times 1$ vector of regression parameters $\theta = (\theta_0, \ldots, \theta_{k-1})^T$ having a proper prior density $\pi_1$ with full support $\mathbb{R}^k$, i.e.,

\[
\theta \sim \pi_1 \;\perp\; \sigma^2 \sim IG(\alpha, \beta). \tag{3.2}
\]

To describe conditions under which moments of the case-deleted weight function exist, we introduce some additional quantities.
Let $H = X(X^T X)^{-1} X^T$ and $\mathrm{RSS} = y^T(\mathbf{I} - H)y$ denote the projection matrix and the residual sum of squares from the least squares fit of the full data set, respectively. The index set, $\mathcal{I}$, consists of the indices of the $I$ cases to be deleted. Given the index set $\mathcal{I}$, let $Y_{\mathcal{I}}$ be the $I \times 1$ random vector of observations $Y_i$, with $i \in \mathcal{I}$, and let $X_{\mathcal{I}}^T$ be the $I \times k$ submatrix of the $I$ rows of $X$ indexed by $\mathcal{I}$. Define the leverage of set $\mathcal{I}$ to be the principal submatrix of $H$ corresponding to $\mathcal{I}$: $H_{\mathcal{I}} = X_{\mathcal{I}}^T (X^T X)^{-1} X_{\mathcal{I}}$, and define $e_{\mathcal{I}}$ to be the $I \times 1$ vector of the elements indexed by $\mathcal{I}$ in the vector of the ordinary residuals $e = (\mathbf{I} - H)y$, i.e., $e_{\mathcal{I}} = y_{\mathcal{I}} - X_{\mathcal{I}}^T (X^T X)^{-1} X^T y$. Finally, for each $r > 0$, if the $I \times I$ matrix $(\mathbf{I} - rH_{\mathcal{I}})$ is non-singular, let

\[
\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) = \mathrm{RSS} - r\, e_{\mathcal{I}}^T (\mathbf{I} - rH_{\mathcal{I}})^{-1} e_{\mathcal{I}}.
\]

When $I = 1$, so that $\mathcal{I} = \{i\}$, $H_{\mathcal{I}} = x_i^T (X^T X)^{-1} x_i$ is the leverage of the $i$th observation, say $h_{ii}$, $e_i = y_i - x_i^T (X^T X)^{-1} X^T y$ is the residual of observation $i$, and $\mathrm{RSS}^*_{\backslash i}(r) = \mathrm{RSS} - r e_i^2/(1 - r h_{ii})$. When $r = 1$, $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r)$ is the residual sum of squares from the least squares fit of the case-deleted data set.

Letting $s = (\theta, \sigma^2)$, the unnormalized importance sampling weight function resulting from the deletion of the $I$ cases indexed by $\mathcal{I}$ is given by

\[
w_{\backslash\mathcal{I}}(s) = (\sigma^2)^{I/2} \exp\left\{ \frac{1}{2\sigma^2} (Y_{\mathcal{I}} - X_{\mathcal{I}}^T \theta)^T (Y_{\mathcal{I}} - X_{\mathcal{I}}^T \theta) \right\}. \tag{3.3}
\]

This functional form of the weight results from ignoring normalizing constants not depending on the model parameters and from canceling the common factors in the numerator and the denominator represented by the prior and by the portion of the Gaussian likelihood which corresponds to the undeleted cases. For the Bayesian linear model specified by Equations (3.1) and (3.2) the following theorem holds.

Theorem 3.1. Let $Y \mid \theta, \sigma^2 \sim N(X\theta, \sigma^2 \mathbf{I})$. Let $\lambda_1 \le \cdots \le \lambda_I$ denote the eigenvalues of $H_{\mathcal{I}}$ and assume that $\lambda_i \ne 1/r$, for all $i = 1, \ldots, I$.

(i) If the prior distribution follows specification (3.2), then the case-deleted weight function $w_{\backslash\mathcal{I}}(s)$ has a finite $r$th moment with respect to the full posterior $p(s)$ if (a) $\lambda_I < 1/r$ and (b) $n/2 + \alpha > rI/2$ and (c) $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) > -2/\beta$. Conversely, the $r$th moment of $w_{\backslash\mathcal{I}}(s)$ is infinite if (a′) $\lambda_I > 1/r$ or (b′) $n/2 + \alpha \le rI/2$ or (c′) $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) < -2/\beta$.

(ii) If the noninformative prior $\pi(\theta, \sigma^2) \propto 1/\sigma^2$ is used, then conditions (a) and (a′) remain unchanged, and conditions (b), (c), (b′) and (c′) become: (b) $n > rI + k$, (b′) $n \le rI + k$, (c) $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) > 0$ and (c′) $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) < 0$.

Remark 3.1. Theorem 3.1 includes the problem investigated in [19] as a special case. There, the author takes $r = 2$ and specifies the prior distribution on $(\theta, \sigma^2)$ as $\theta \mid \Sigma \sim N(\theta_0, \Sigma)$, $\sigma^2 \sim IG(\alpha, \beta)$ and $\Sigma \sim IW(\nu R, \nu)$, with conditional independence at all stages of the model. The parameter $\theta_0 \in \mathbb{R}^k$ is a known mean vector, $\alpha$ and $\beta$ are known positive constants, and $IW(\nu R, \nu)$ is an inverse Wishart distribution with $\nu$ a known integer greater than or equal to $k$ and $R$ a known $k \times k$ positive definite matrix.

Remark 3.2. The statement of Theorem 3.1 involves the eigenvalues of the $I \times I$ matrix $H_{\mathcal{I}}$. In typical applications, the cardinality $I$ of the set of observations being deleted will be fairly small and the calculation of the eigenvalues can be accomplished quickly with standard software. For the illustrative examples presented in the article, we computed all eigenvalues using the R function eigen().
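The conditions of Theorem 3.1 can be checked mechanically once $H_{\mathcal{I}}$, $e_{\mathcal{I}}$ and $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r)$ are computed. The sketch below is one possible way to do so in Python with NumPy (the article itself used R's eigen()). The function name, its return format and the toy data are assumptions made for illustration; the residual vector in the toy data is chosen orthogonal to the columns of $X$ so that the quantities can be verified by hand.

```python
import numpy as np


def check_theorem_3_1(X, y, idx, r, alpha=None, beta=None):
    """Evaluate the conditions of Theorem 3.1 for deleting the rows in idx.

    With alpha and beta given, part (i) is used (sigma^2 ~ IG(alpha, beta));
    otherwise part (ii) is used (prior proportional to 1 / sigma^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = list(idx)
    n, k = X.shape
    n_del = len(idx)

    H = X @ np.linalg.solve(X.T @ X, X.T)          # projection matrix
    e = y - H @ y                                  # ordinary residuals
    rss = float(e @ e)                             # equals y^T (I - H) y
    H_I = H[np.ix_(idx, idx)]
    e_I = e[idx]

    lam = np.linalg.eigvalsh((H_I + H_I.T) / 2.0)  # eigenvalues, ascending
    out = {"lambda_max": float(lam[-1]), "rss": rss, "rss_star": None}
    if np.any(np.isclose(lam, 1.0 / r)):
        out["verdict"] = "undetermined"            # theorem assumes lambda_i != 1/r
        return out

    rss_star = rss - r * float(e_I @ np.linalg.solve(np.eye(n_del) - r * H_I, e_I))
    out["rss_star"] = rss_star

    if alpha is not None and beta is not None:     # part (i)
        size_ok = n / 2.0 + alpha > r * n_del / 2.0
        threshold = -2.0 / beta
    else:                                          # part (ii)
        size_ok = n > r * n_del + k
        threshold = 0.0

    if lam[-1] < 1.0 / r and size_ok and rss_star > threshold:
        out["verdict"] = "finite"
    elif lam[-1] > 1.0 / r or not size_ok or rss_star < threshold:
        out["verdict"] = "infinite"
    else:
        out["verdict"] = "undetermined"            # boundary: rss_star == threshold
    return out


# Hypothetical toy data: simple linear regression with known residuals.
x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
resid = np.array([1.0, -2.0, 1.0, 1.0, -2.0, 1.0])   # orthogonal to 1 and x
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + resid

small = check_theorem_3_1(X, y, [3], r=2)                        # small residual
large = check_theorem_3_1(X, y, [1], r=2)                        # large residual
conj = check_theorem_3_1(X, y, [1], r=2, alpha=1.0, beta=0.1)    # part (i)
```

In this toy example both deleted cases have leverage below $1/2$, so the verdicts differ only through the residual condition: the case with the larger residual fails (c) under the noninformative prior but satisfies the weaker bound $-2/\beta$ under the illustrative conjugate prior.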
Theorem 3.1 holds for any proper prior distribution on $\theta$ having full support on $\mathbb{R}^k$, provided the parameters $\theta$ and $\sigma^2$ are independent and the prior for $\sigma^2$ is $IG(\alpha, \beta)$. This follows from the form of the likelihood function, which, for fixed $\sigma^2$, is an exponential function with quadratic argument in $\theta$, and, for fixed $\theta$, is the product of a power and an exponential function in $1/\sigma^2$. Recognizing a connection with the integral needed to normalize the kernel of an inverse gamma density suggests how to extend the results to the case of non-conjugate prior distributions. The next two corollaries make this extension, placing the focus on the tails of the prior distribution.

The corollaries assume independence between $\theta$ and $\sigma^2$, and so we consider their tail behavior separately. Let $\pi_{11}$ denote a (proper) prior distribution on $\theta$, and let $\mathcal{F}_1$ be the family of all nondegenerate multivariate normal distributions on $\mathbb{R}^k$. Corollary 3.2 distinguishes between priors that are thick-tailed with respect to $\mathcal{F}_1$ and those that are not. Let $\pi_{12}$ denote a (proper) prior distribution on $\sigma^2$, and let $\mathcal{F}_2$ be the family of all inverse gamma distributions, $IG(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$. Exploiting the connection mentioned in the previous paragraph, the proof of Theorem 3.1 shows that conditions (a), (a′), (c) and (c′) determine the integrability (or lack thereof) of a certain function of $\sigma^2$ in a neighborhood of zero. For $\sigma^2$ going to infinity, a suitable number of observations guarantees integrability. Thus, the corollaries focus on the tail for $\sigma^2$ near 0, or the tail for the precision, $1/\sigma^2$, tending to $\infty$. A distribution $\pi_{12}$ which is thick-tailed with respect to $\mathcal{F}_2$ has the property that $\lim_{\sigma^2 \to 0} \pi_{12}(\sigma^2)/\pi_{02}(\sigma^2) = \infty$, for all $\pi_{02} \in \mathcal{F}_2$; a distribution that is thin-tailed with respect to $\mathcal{F}_2$ satisfies $\lim_{\sigma^2 \to 0} \pi_{12}(\sigma^2)/\pi_{02}(\sigma^2) = 0$, for all $\pi_{02} \in \mathcal{F}_2$.

Before proceeding, we summarize the notational conventions just introduced and the assumptions common to both corollaries.

1. $\mathcal{F}_1$ denotes the family of all nondegenerate multivariate normal distributions on $\mathbb{R}^k$.
2. $\mathcal{F}_2$ denotes the family of all inverse gamma distributions.
3. $\theta$ and $\sigma^2$ are assumed to be independent.
4. $\pi_{11}$ denotes a prior distribution for $\theta$ having full support $\mathbb{R}^k$.
5. $\pi_{12}$ denotes a prior distribution for $\sigma^2$ such that $\int (\sigma^2)^{(n - rI)/2}\, \pi_{12}(\sigma^2)\, d\sigma^2 < \infty$.
6. $\lambda_1 \le \cdots \le \lambda_I$ denote the eigenvalues of $H_{\mathcal{I}}$, assumed to satisfy $\lambda_i \ne 1/r$, for all $i = 1, \ldots, I$.

The first corollary deals with thick-tailed prior distributions $\pi_{12}$ on $\sigma^2$ and covers the case of all proper prior distributions $\pi_{11}$ on $\theta$.

Corollary 3.1. Assume 1.–6. above and let $\pi_{12}(\sigma^2)$ be thick-tailed with respect to $\mathcal{F}_2$. If $\lambda_I < 1/r$ and $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) > 0$, then the case-deleted weight function has finite $r$th moment with respect to the full posterior distribution. On the other hand, if $\lambda_I > 1/r$ or $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) < 0$, then the $r$th moment of the case-deleted weight function is infinite.

The next corollary applies to thin-tailed distributions $\pi_{12}(\sigma^2)$. It provides only a sufficient condition if $\pi_{11}(\theta)$ is thin-tailed and necessary and sufficient conditions if $\pi_{11}(\theta)$ is thick-tailed with respect to $\mathcal{F}_1$.

Corollary 3.2. Assume 1.–6. above and let $\pi_{12}(\sigma^2)$ be thin-tailed with respect to $\mathcal{F}_2$. If $\lambda_I < 1/r$, then the case-deleted weight function has finite $r$th moment with respect to the full posterior distribution.
If $\pi_{11}(\theta)$ is thick-tailed with respect to $\mathcal{F}_1$, then $\lambda_I > 1/r$ implies that the case-deleted weight function has infinite $r$th moment.

If $\lambda_I > 1/r$ and both the prior $\pi_{11}$ on $\theta$ and the prior $\pi_{12}$ on $\sigma^2$ are thin-tailed, we cannot draw any conclusions about the finiteness of the full posterior $r$th moment of $w_{\backslash\mathcal{I}}(\theta, \sigma^2)$, as shown in the following example.

Example 3.1. Consider the univariate regression model $y_j \sim N(\theta x_j, \sigma^2)$ with no intercept and with prior distribution on $\theta$, $\pi_{11}(\theta) \propto \exp\{-(\theta - \theta_0)^4\}$. Suppose we observe a sample with $i$th leverage $h_{ii} = 1/2 + 1/\sum_{j=1}^{n} x_j^2$. Although $h_{ii} > 1/2$, if the prior distribution on $\sigma^2$ is $\pi_{12}(\sigma^2) \propto \exp\{-(\sigma^2)^{-2} - \sigma^2\}$, the posterior second moment $E(w^2_{\backslash i}(\theta, \sigma^2) \mid y)$ is finite. On the other hand, if the prior distribution on $\sigma^2$ is $\pi_{22}(\sigma^2) \propto \exp\{-(\sigma^2)^{-3/2} - \sigma^2\}$, then $E(w^2_{\backslash i}(\theta, \sigma^2) \mid y) = \infty$. Both prior distributions $\pi_{12}$ and $\pi_{22}$ are thin-tailed with respect to $\mathcal{F}_2$.

Finally, consider the case of a prior distribution $\pi_{11}(\theta)$ having bounded support. Arguing as in the proof of Corollary 3.2, it is easy to verify that the $r$th moment of $w_{\backslash\mathcal{I}}$, $E(w^r_{\backslash\mathcal{I}}(\theta, \sigma^2) \mid y)$, always exists if $\pi_{12}(\sigma^2)$ is thin-tailed with respect to $\mathcal{F}_2$. On the other hand, if $\pi_{12}(\sigma^2)$ is either in $\mathcal{F}_2$ or is thick-tailed with respect to $\mathcal{F}_2$, the finiteness of $E(w^r_{\backslash\mathcal{I}}(\theta, \sigma^2) \mid y)$ depends essentially on the value of $\mathrm{RSS}^*_{\backslash\mathcal{I}}$. More precisely, let

\[
M = \min_{\theta \in \mathrm{support}(\pi_{11})} \left[ \theta^T (X^T X - r X_{\mathcal{I}} X_{\mathcal{I}}^T)\, \theta - 2 (y^T X - r y_{\mathcal{I}}^T X_{\mathcal{I}}^T)\, \theta \right].
\]
Then one can prove the following:

– if $\pi_{12}(\sigma^2) = IG(\alpha, \beta)$, then $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) > -(2/\beta + M)$ and $n/2 + \alpha > rI/2$ imply $E(w^r_{\backslash\mathcal{I}}(\theta, \sigma^2) \mid y) < \infty$, while $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) < -(2/\beta + M)$ or $n/2 + \alpha \le rI/2$ implies $E(w^r_{\backslash\mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$;
– if $\pi_{12}(\sigma^2)$ is thick-tailed with respect to $\mathcal{F}_2$, then $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) > -M$ and $\int (\sigma^2)^{-(n - rI)/2}\, \pi_{12}(\sigma^2)\, d\sigma^2 < \infty$ imply $E(w^r_{\backslash\mathcal{I}}(\theta, \sigma^2) \mid y) < \infty$, while $\mathrm{RSS}^*_{\backslash\mathcal{I}}(r) < -M$ or $\int (\sigma^2)^{-(n - rI)/2}\, \pi_{12}(\sigma^2)\, d\sigma^2 = \infty$ implies $E(w^r_{\backslash\mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$.

4. A nonlinear model

To illustrate some of the issues that arise when the fitted model is nonlinear, we revisit a Bayesian analysis of the Puromycin data presented in [17]. The data come from a biochemical reaction and are described in [5], p. 425. For a group of cells not treated with the drug Puromycin, there are $n = 11$ measurements of the initial velocity of a reaction, $V_i$, obtained when the concentration of the substrate was set at a given positive value, $c_i$. The observations are recorded in Table 1 and plotted in Figure 1.

The Bayesian model fit in [17] assumes a nonlinear regression of velocity on concentration given by the Michaelis-Menten (MM) relation:

\[
E(V_i) = (m c_i)/(\kappa + c_i). \tag{4.1}
\]

According to this relation, when the concentration of the substrate equals the Michaelis parameter, $\kappa$, the velocity reaches half of its maximal value, $m$, which is also the limiting velocity as the concentration goes to infinity.

Following [17], we model the $n$ observations as independent realizations from normal distributions with means given by Equation (4.1) and common variance $\sigma^2$. All three parameters $m$, $\kappa$, and $\sigma^2$ are constrained to be positive and their prior distribution is specified as $\pi(m, \kappa, \sigma^2) = \pi_1(m, \sigma^2)\, \pi_2(\kappa)$, with $\pi_1(m, \sigma^2) \propto 1/\sigma^2$ representing a noninformative prior density for $(m, \sigma^2)$ and $\pi_2$ representing a proper prior density for $\kappa$ such that $\int_0^\infty \kappa\, \pi_2(\kappa)\, d\kappa < \infty$. This requirement guarantees that the posterior is proper.

Table 1. The Puromycin Data and Related Case-Deleted Quantities. The bottom row contains the moment index $r^*$, i.e., the least upper bound on the value of $r$ such that $E(w^r_{\backslash i}(m, \sigma^2, \kappa) \mid v) < \infty$.

| Case No. $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Concentration | 0.02 | 0.02 | 0.06 | 0.06 | 0.11 | 0.11 | 0.22 | 0.22 | 0.56 | 0.56 | 1.1 |
| Velocity | 67 | 51 | 84 | 86 | 98 | 115 | 131 | 124 | 144 | 158 | 160 |
| $\sum_{j \notin \{i\}} c_j^2 - c_i^2$ | 1.97 | 1.97 | 1.96 | 1.96 | 1.94 | 1.94 | 1.87 | 1.87 | 1.34 | 1.34 | −0.45 |
| Moment index $r^*$ | 1.59 | 2.79 | 4.48 | 5.17 | 2.86 | 5.19 | 6.38 | 5.26 | 3.77 | 2.81 | 1.32 |

Fig 1. Scatterplot of the Puromycin Data Set (velocity against concentration, with cases labeled 1 to 11). The curve represents a fit of the expected velocity based on the posterior means of $m$ and $\kappa$.

The MM model is, conditional on $\kappa$, a linear regression model with no intercept and covariate $x_i(\kappa) := c_i/(\kappa + c_i)$, for $i = 1, \ldots, n$. Thus, for fixed $\kappa$, if the support of $m$ were $\mathbb{R}^1$, we could apply Theorem 3.1(ii). The case-deleted weight function is

\[
w_{\backslash\mathcal{I}}(m, \kappa, \sigma^2) = (\sigma^2)^{I/2} \exp\left\{ \sum_{i \in \mathcal{I}} [v_i - m x_i(\kappa)]^2 / (2\sigma^2) \right\}.
\]

Noting that $x_i(\kappa)$, $h_{ii}$ and $e_i$ are continuous functions of $\kappa$, we see that when Theorem 3.1(ii) indicates an infinite conditional $r$th moment at some value $\kappa_0$, it also indicates an infinite conditional moment in an open interval about $\kappa_0$. If the prior on $\kappa$ has full support, this interval receives positive posterior probability, and so the unconditional $r$th moment is infinite.
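For concreteness, the conditional covariate $x_i(\kappa)$ and the logarithm of the case-deleted weight function for the MM model can be written as follows. This is a sketch only: the function names are hypothetical, cases are indexed from 0 as is usual in Python, and the data arrays are copied from Table 1.

```python
import numpy as np

# Puromycin data for the untreated cells, as listed in Table 1.
conc = np.array([0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.1])
vel = np.array([67, 51, 84, 86, 98, 115, 131, 124, 144, 158, 160], dtype=float)


def mm_covariate(kappa):
    """Conditional covariate x_i(kappa) = c_i / (kappa + c_i)."""
    return conc / (kappa + conc)


def mm_log_weight(m, kappa, sigma2, idx):
    """Log of the unnormalized case-deleted weight for deleted cases idx (0-based)."""
    idx = list(idx)
    resid = vel[idx] - m * mm_covariate(kappa)[idx]
    # log w = (I / 2) log(sigma^2) + sum of squared residuals / (2 sigma^2)
    return 0.5 * len(idx) * np.log(sigma2) + float(resid @ resid) / (2.0 * sigma2)


# Illustrative parameter values (not posterior draws): deleting case 1.
lw_exact_fit = mm_log_weight(134.0, 0.02, 1.0, [0])   # residual is zero here
lw_other = mm_log_weight(130.0, 0.02, 4.0, [0])       # residual is 2
```

Evaluated at each posterior draw of $(m, \kappa, \sigma^2)$, these log weights are the inputs a self-normalized importance sampling estimator would need.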
The analog of Theorem 3.1 for the Michaelis-Menton mo del will impo s e conditions on le verage, sample size and residual. The unconditional r th moment may b e infinite for a different reason: the finite conditiona l r th moment s may integrate to infinit y . Thus, the conditions will need to b e strengthened to ensure a finite r th moment. T o a void this route to infinit y , the conditions o n lev erage and residual are applied unifor mly in κ . Finally , an appa rent infinite mo men t will sometimes b e finite due to the restriction on the supp ort of m . Define the conditional design matrix X ( κ ). Pro c eeding as in Section 3 , define the matrix H I ( κ ), and co ncent rate on its lar gest eigenv alue. The conditiona l leverage, l ( I , κ ) = P i ∈I x 2 i ( κ ) / P n i =1 x 2 i ( κ ), is the only non-z e r o eigen v alue. The condition o n the r e s idual can b e expressed in terms of simpler functions which will prov e useful later. Define A ( I , r, κ ) = P i / ∈I x 2 i ( κ ) − ( r − 1) P i ∈I x 2 i ( κ ), B ( I , r, κ ) = P i / ∈I x i ( k ) v i − ( r − 1 ) P i ∈I x i ( κ ) v i , and C ( I , r ) = P i / ∈I v 2 i − ( r − 1) P i ∈I v 2 i . The adjusted, conditiona l residua l sum o f sq ua res is RSS ∗ \I ( r , κ ) = C ( I , r ) − B 2 ( I , r , κ ) / A ( I , r, κ ). The set of zero es of A ( I , r, κ ), with I and r held I. Epifani et al./Ca se-deletion imp ortanc e sampling est i mators 784 fixed, contains at most 2( n − 1 ) po ints, a set of Lebesgue mea sure 0, and so we need not worry ab out the apparent division by 0. One last qua n tit y is needed to handle the partial suppo rt of m . Define g ( I , κ ) = P i ∈I x i ( κ ) v i / P n i =1 x i ( κ ) v i , the pro duct of cov aria te a nd resp ons e summed over the deleted cas es divided by the same quantit y summed ov er all cases. The results on the finiteness of E ( w r I ( m, σ 2 , κ ) | v ) are summarized in the following theor em. Theorem 4.1. 
Let $\pi_2$ be a proper prior distribution on $\kappa$ such that $\int_0^\infty \kappa\, \pi_2(\kappa)\, d\kappa < \infty$. Suppose that (a) there exists a measurable set $\mathcal{N}$ such that $\pi_2(\mathcal{N}) = 0$ and $\sup_{\kappa \in \mathcal{N}^c} l(\mathcal{I}, \kappa) < 1/r$ and (b) $n > rI + 1$. If, in addition, (c) $\inf_{\kappa \in \mathcal{N}^c} \{\mathrm{RSS}^*_{\setminus\mathcal{I}}(r, \kappa)\} > 0$ or (d) $C(\mathcal{I}, r) > 0$ and $\inf_{\kappa \in \mathcal{N}^c} g(\mathcal{I}, \kappa) > 1/r$ holds, then the case-deleted weight function $w_{\setminus\mathcal{I}}(m, \kappa, \sigma^2)$ has finite $r$th moment with respect to the full posterior $p(m, \kappa, \sigma^2)$. On the other hand, either of the conditions (e) $A(\mathcal{I}, r, \kappa) m^2 - 2B(\mathcal{I}, r, \kappa) m + C(\mathcal{I}, r) < 0$ for all $(m, \kappa)$ in some non-negligible set or (f) $n \le rI + 1$ is sufficient for the $r$th moment of the importance sampling weight function to be infinite.

Remark 4.1. The sufficient conditions of Theorem 4.1 are essentially necessary for the $r$th posterior moment of the case-deleted weight function to be finite. If there exists a non-negligible set of values of $\kappa$ such that (a′) $l(\mathcal{I}, \kappa) > 1/r$ or (c′) $\mathrm{RSS}^*_{\setminus\mathcal{I}}(r, \kappa) < 0$ and (d′) $C(\mathcal{I}, r) < 0$ or $g(\mathcal{I}, \kappa) < 1/r$, then condition (e) is satisfied, and vice versa. This is because $l(\mathcal{I}, \kappa) > 1/r$ if and only if $A(\mathcal{I}, r, \kappa) < 0$, $B^2(\mathcal{I}, r, \kappa) - C(\mathcal{I}, r) A(\mathcal{I}, r, \kappa)$ is the discriminant of the quadratic equation $A(\mathcal{I}, r, \kappa) m^2 - 2B(\mathcal{I}, r, \kappa) m + C(\mathcal{I}, r) = 0$, and $g(\mathcal{I}, \kappa) < 1/r$ if and only if $B(\mathcal{I}, r, \kappa) > 0$.

Remark 4.2. For a general $r > 1$, upper bounds for the leverages $l(\mathcal{I}, \kappa)$ and lower bounds for the functions $g(\mathcal{I}, \kappa)$ and for the marginal residual sums of squares $\mathrm{RSS}^*_{\setminus\mathcal{I}}(r, \kappa)$ are hard to derive analytically, but numerical verification of the conditions of Theorem 4.1 is rather simple. In fact, $l(\mathcal{I}, \kappa) \ne 1/r$ if and only if $A(\mathcal{I}, r, \kappa) \ne 0$ and $g(\mathcal{I}, \kappa) \ne 1/r$ if and only if $B(\mathcal{I}, r, \kappa) \ne 0$.
Moreover, for $\kappa > 0$, $A(\mathcal{I}, r, \kappa)$, $B(\mathcal{I}, r, \kappa)$ and $C(\mathcal{I}, r) A(\mathcal{I}, r, \kappa) - [B(\mathcal{I}, r, \kappa)]^2$ are continuous functions that approach zero as $\kappa$ goes to infinity and that can only have a finite number of extrema. The latter can be found among the real positive roots of polynomials of degree $2n - 3$, $n - 2$ and $2n - 3$ respectively.

Remark 4.3. The strategy applied to the MM model applies to an array of models that are, conditional on some set of parameters, linear. We impose the leverage and residual conditions uniformly across the parameters that render the model nonlinear. Important classes of models are linear regression models that allow for Box-Cox transformation of the response variable and/or Box-Tidwell transformation of the explanatory variables.

The authors of [17] specified a $t$ distribution on 3 degrees of freedom restricted to $[0, +\infty)$ as a prior $\pi_2$ for the parameter $\kappa$ and fit the model to the Puromycin data using the program BUGS (see [22]). (Due to some technical restrictions of BUGS, they had to use approximations for some of the prior specifications.) They considered deletion of single cases and computed the corresponding case-deleted weight functions. They reported detailed estimation results based on the deletion of case 1, an observation that produces highly variable realized weight functions, and illustrated how a transformation-based approach (the Importance Link Function method) can effectively reduce the variability of the weight functions and lead to improved estimation.

We discuss the implications of the results developed in this section on the analysis presented in [17]. We consider, as was done in [17], deletion of single observations and focus on the case $r = 2$, so that the sample size condition (b) of Theorem 4.1 is satisfied for all cases.
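The numerical verification mentioned in Remark 4.2 can be sketched as follows for the Puromycin data of Table 1. This is an illustration only: the log-spaced grid of $\kappa$ values, the function names and the 1-based case numbering are assumptions made here, and a grid scan approximates the suprema and infima in Theorem 4.1 rather than certifying them.

```python
# Puromycin data as listed in Table 1.
CONC = [0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.1]
VEL = [67, 51, 84, 86, 98, 115, 131, 124, 144, 158, 160]

# Log-spaced grid of kappa values from 1e-4 to 1e4 (an arbitrary choice).
KAPPA_GRID = [10.0 ** (k / 50.0) for k in range(-200, 201)]


def quantities(deleted, r, kappa):
    """Return l, g, A, B, C and RSS* for the 1-based case numbers in `deleted`."""
    x = [c / (kappa + c) for c in CONC]
    idx = {i - 1 for i in deleted}
    sxx_all = sum(xi * xi for xi in x)
    sxv_all = sum(xi * vi for xi, vi in zip(x, VEL))
    svv_all = sum(vi * vi for vi in VEL)
    sxx_del = sum(x[i] ** 2 for i in idx)
    sxv_del = sum(x[i] * VEL[i] for i in idx)
    svv_del = sum(VEL[i] ** 2 for i in idx)
    # (sum over kept) - (r - 1) * (sum over deleted) = total - r * (deleted)
    a = sxx_all - r * sxx_del
    b = sxv_all - r * sxv_del
    c = svv_all - r * svv_del
    rss = c - b * b / a if a != 0 else float("nan")
    return {"l": sxx_del / sxx_all, "g": sxv_del / sxv_all,
            "A": a, "B": b, "C": c, "RSS": rss}


def scan(deleted, r, grid=KAPPA_GRID):
    """Grid approximations to the extrema of l and g over kappa."""
    rows = [quantities(deleted, r, kappa) for kappa in grid]
    return {
        "sup_l": max(q["l"] for q in rows),
        "sup_g": max(q["g"] for q in rows),
        "inf_g": min(q["g"] for q in rows),
    }
```

For example, `quantities([11], 2, 2.0)["l"]` evaluates the conditional leverage of case 11 at $\kappa = 2$, and `quantities([1], 2, 0.08)["RSS"]` evaluates the adjusted residual sum of squares for case 1 at $\kappa = 0.08$.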
An examination of the leverage condition (a) shows that observation 11 has large leverage (for $\kappa = 2$, $l(\mathcal{I}, \kappa) = 0.5065 > 1/r = 1/2$), and so by Remark 4.1, the posterior variance of the case-deleted weight function for case 11 is infinite. All remaining cases have leverages that are bounded away from 0 and strictly bounded above by $1/2$, and so satisfy condition (a). Turning to the residual conditions (c) and (d), we find that all observations other than 1 and 11 satisfy condition (c), thus ensuring finite variances for their case-deleted weight functions. For observation 1, the adjusted residual sum of squares is negative for values of $\kappa$ near 0.08, violating condition (c). Condition (d) is also violated since $\sup_{\kappa > 0} g(1, \kappa) = 0.05501 < 1/2$. Consequently, Theorem 4.1 implies that the case-deleted weight function for observation 1 has infinite variance.

In addition to $r = 2$, we can examine other moments of the case-deleted weight functions. Table 1 displays, for every case-deletion $i$, the moment index $r^*$, i.e., the least upper bound on the value of $r$ for which the $r$th moment exists (see [7] and [8]). If the influence of the $i$th observation on the posterior distribution $p(m, \sigma^2, \kappa)$ is assessed by the $\chi^2$ divergence measure between the case-deleted and full posteriors,
$$\chi^2 = \int \left[ \frac{p_{\setminus i}(m, \sigma^2, \kappa)}{p(m, \sigma^2, \kappa)} - 1 \right]^2 p(m, \sigma^2, \kappa)\, dm\, d\sigma^2\, d\kappa,$$
then, as suggested in [26] and [27], we can estimate $\chi^2$ by means of the Monte Carlo sum appearing in Table 2. As indicated in Section 6, this estimator is asymptotically normal if $E(w^4_{\setminus i}(s) \mid y) < \infty$. According to the values of $r^*$ displayed in Table 1, a central limit theorem holds only for the estimators of $\chi^2$ corresponding to observations 3, 4, 6, 7, and 8.

Table 2. Influence Measures.
This table presents a selection of influence measures and sufficient conditions for their estimators to follow a central limit theorem (see Section 6). $KL$ represents Kullback-Leibler divergence, $L_1$ is integrated $L_1$ loss, $L_2$ is integrated $L_2$ loss, $\Delta 1$ is change in first moment of a parameter $\theta$, $\Delta 2$ is change in second moment of a parameter $\theta$, $Hel$ is Hellinger distance, $ChSq$ is chi-square distance, $CPO$ is the Conditional Predictive Ordinate, and $Bdd$ is a bounded function. As a shorthand for the notation introduced in Section 2, a subscript $m$ means that a function is evaluated at $z_m$ (e.g., $w_m = w(z_m)$, etc.). The symbol $L_{\mathcal{I}}(s)$ represents the likelihood of the observations in $\mathcal{I}$ evaluated at the point $s$, $L_{\setminus\mathcal{I}}$ represents the likelihood of the observations not in set $\mathcal{I}$, and $\pi$ represents the prior density. The expression $2 + \delta$ in the table means that it is sufficient, for some $\delta > 0$, that $2 + \delta$ moments exist. $\hat{R} = \sum_{m=1}^M w_m / M$. $\hat{C}$ is an estimator of $C = \int q(s)\, ds$. There are many estimators of $C$, with some based on a different simulation than that used to fit the model. In lines 2 and 3 of the table, we assume that $\hat{C}$ is sufficiently well behaved that it does not prevent the estimators from following central limit theorems. In line 8 of the table, we presume that $w(s) = 1/L_{\mathcal{I}}(s)$.

| Meas | Estimand | Estimator | Mom's | Adjmnt | Adj-Mom's |
|---|---|---|---|---|---|
| $KL$ | $\int \log\!\left(\frac{p(s)}{p_{\setminus\mathcal{I}}(s)}\right) p_{\setminus\mathcal{I}}(s)\, ds$ | $-\hat{R}^{-1} \sum_{m=1}^M w_m \log(w_m)/M - \log(\hat{R})$ | $2 + \delta$ | n.a. | n.a. |
| $L_1$ | $\int \lvert p_{\setminus\mathcal{I}}(s) - p(s) \rvert\, p_{\setminus\mathcal{I}}(s)\, ds$ | $\hat{C}^{-1} \hat{R}^{-1} \sum_{m=1}^M q_m w_m \lvert \hat{R}^{-1} w_m - 1 \rvert / M$ | 2 | $\pi^2 L^2_{\setminus\mathcal{I}}$ | 2 |
| $L_2$ | $\int (p_{\setminus\mathcal{I}}(s) - p(s))^2\, p_{\setminus\mathcal{I}}(s)\, ds$ | $\hat{C}^{-2} \hat{R}^{-1} \sum_{m=1}^M q_m^2 (\hat{R}^{-1} w_m - 1)^2 w_m / M$ | 2 | $\pi^4 L^4_{\setminus\mathcal{I}}$ | 4 |
| $\Delta 1$ | $\int \theta\, p_{\setminus\mathcal{I}}(s)\, ds - \int \theta\, p(s)\, ds$ | $\sum_{m=1}^M \theta_m (\hat{R}^{-1} w_m - 1) / M$ | 2 | $\theta^2$ | 2 |
| $\Delta 2$ | $\int \theta^2 p_{\setminus\mathcal{I}}(s)\, ds - \int \theta^2 p(s)\, ds$ | $\sum_{m=1}^M \theta_m^2 (\hat{R}^{-1} w_m - 1) / M$ | 2 | $\theta^4$ | 2 |
| $Hel$ | $\int \left(\sqrt{p(s)} - \sqrt{p_{\setminus\mathcal{I}}(s)}\right)^2 ds$ | $2 - 2 \sqrt{\hat{R}^{-1}} \sum_{m=1}^M \sqrt{w_m} / M$ | 2 | n.a. | n.a. |
| $ChSq$ | $\int \left(\frac{p_{\setminus\mathcal{I}}(s)}{p(s)} - 1\right)^2 p(s)\, ds$ | $\sum_{m=1}^M (\hat{R}^{-1} w_m - 1)^2 / M$ | 4 | n.a. | n.a. |
| $CPO$ | $\int L_{\mathcal{I}}(s)\, p_{\setminus\mathcal{I}}(s)\, ds$ | $\hat{R}^{-1}$ | 2 | n.a. | n.a. |
| $Bdd$ | $\int g(s)\, p_{\setminus\mathcal{I}}(s)\, ds$ | $\sum_{m=1}^M \hat{R}^{-1} g_m w_m / M$ | 2 | n.a. | n.a. |

5. Bayesian logistic regression

We now switch our focus to generalized linear models, concentrating on the study of a logistic regression model. Assume that, for each of $n$ subjects, we have available a $k \times 1$ vector of covariate information, $x_i$, and we observe a 0-1 outcome, $Y_i$. Suppose that the $Y_i$ are independently distributed as Bernoulli random variables taking on value 1 with probability $p_i = \exp\{\beta^T x_i\} / [1 + \exp\{\beta^T x_i\}]$. The case-deleted weight function is proportional to
$$w_{\setminus\mathcal{I}}(\beta) = \frac{q_{\setminus\mathcal{I}}(\beta)}{q(\beta)} = \prod_{i \in \mathcal{I}} \frac{1 + \exp\{\beta^T x_i\}}{\exp\{\beta^T x_i y_i\}}. \tag{5.1}$$
The following theorem covers prior distributions with the exponential tails that match the logistic regression likelihood. Subsequent corollaries cover thinner and thicker tailed prior distributions.

Theorem 5.1. Let the data follow the logistic regression model just described, and assume that we have a prior distribution for $\beta$ with density proportional to $\pi(\beta) = \exp\{-\epsilon |\beta^T|_1\}$, where $\epsilon > 0$ is given and $|\beta^T|_1 := \sum_{j=0}^{k-1} |\beta_j|$.
Define
$$h(\beta, r, \epsilon) = \beta^T \left( \sum_{i \notin \mathcal{I}} x_i y_i - (r-1) \sum_{i \in \mathcal{I}} x_i y_i - \sum_{i \notin \mathcal{I}} x_i I(\beta^T x_i > 0) + (r-1) \sum_{i \in \mathcal{I}} x_i I(\beta^T x_i > 0) \right) - \epsilon |\beta^T|_1 .$$
If, for all vectors $\beta$ such that $|\beta^T|_1 = 1$, $h(\beta, r, \epsilon) < 0$, then the case-deleted weight function $w_{\setminus\mathcal{I}}(\beta)$ has finite $r$th moment with respect to the full posterior $p(\beta)$. If, for some vector $\beta$ such that $|\beta^T|_1 = 1$, $h(\beta, r, \epsilon) > 0$, then the case-deleted weight function has infinite $r$th moment.

The theorem can be applied to prior distributions proportional to $\exp\{-|\beta^T| \epsilon\}$, where $\epsilon$ is a vector of positive numbers. In this instance, a rescaling of the covariates to obtain a prior with a single real $\epsilon$ results in the type of prior for which the theorem is stated.

The theorem may be strengthened somewhat by explicitly considering the case of $\max_{\beta : |\beta^T|_1 = 1} h(\beta, r, \epsilon) = 0$, although the statement of precise conditions under which the case-deleted $r$th moment is infinite becomes messy.

The conditions in Theorem 5.1 are easy to check since the maximum of $h(\beta, r, \epsilon)$ may be found via linear programming methods.

As in the case of the linear model, we will investigate the $r$th moment of the case-deleted weight function under thick-tailed and thin-tailed prior distributions. The main tool for the proofs is, once again, Lemma 2.1. The first corollary deals with thick-tailed distributions.

Corollary 5.1. Let the prior distribution on $\beta$ have thick tails with respect to the class of distributions $\mathcal{F} = \{\pi(\beta) : \pi(\beta) = c \exp(-\epsilon |\beta^T|_1) \text{ and } \epsilon > 0\}$. Then, if $h(\beta, r, 0) < 0$ for all $\beta$ such that $|\beta^T|_1 = 1$, the case-deleted weight function has finite $r$th moment ($r > 0$) with respect to the full posterior $p(\beta)$.
If, for some vector $\beta$ such that $|\beta^T|_1 = 1$, $h(\beta, r, 0) > 0$, then the case-deleted weight function has infinite $r$th moment.

The next corollary applies to thin-tailed distributions.

Corollary 5.2. Let the prior distribution on $\beta$ have thin tails with respect to the class of distributions $\mathcal{F} = \{\pi(\beta) : \pi(\beta) = c \exp(-\epsilon |\beta^T|_1) \text{ and } \epsilon > 0\}$. Then the case-deleted weight function has finite $r$th moment with respect to the full posterior $p(\beta)$ for all $r > 0$.

5.1. Applying the corollaries

The preceding corollaries enable us to determine quickly whether the case-deleted weight function has finite or infinite $r$th moment. Consider an arbitrary logistic regression model where the prior distribution on $\beta$ is taken to be the normal distribution with mean $\beta_0$ and variance $\Sigma$, with $\Sigma$ of full rank. This distribution is thin-tailed with respect to the family of prior distributions used in Theorem 5.1. To verify this, write the ratio of priors, with $g$ representing the normal prior density and $f$ representing the prior density under a member of the exponential-tailed class:
$$\begin{aligned}
\frac{g(\beta)}{f(\beta)} &= \frac{(2\pi)^{-k/2} |\Sigma|^{-1/2} \exp(-(\beta - \beta_0)^T \Sigma^{-1} (\beta - \beta_0)/2)}{c \exp(-\epsilon |\beta^T|_1)} \\
&\le (2\pi)^{-k/2} |\Sigma|^{-1/2} c^{-1} \exp\left(-\frac{1}{2\lambda_1} (\beta - \beta_0)^T (\beta - \beta_0) + \epsilon |\beta^T|_1\right) \\
&= (2\pi)^{-k/2} |\Sigma|^{-1/2} c^{-1} \exp\left(-\frac{1}{2\lambda_1} \sum_{i=1}^k (\beta_i - \beta_{0i})^2 + \sum_{i=1}^k |\beta_i| \epsilon\right),
\end{aligned}$$
where $\lambda_1$ is the largest eigenvalue of $\Sigma$. Applying Corollary 5.2 with a normal prior distribution, we find that all positive moments of the case-deleted weight function are finite. This result holds even if all of the cases are deleted.

Suppose instead that the prior distribution on $\beta$ is taken to be a multivariate $t$ distribution with $\nu$ degrees of freedom, location vector $\beta_0$ and scale matrix $\Sigma$, with $\Sigma$ of full rank.
This $t$ distribution is thick-tailed with respect to the family of prior distributions used in Theorem 5.1. A formal verification of this follows from an examination of the ratio of prior density functions. To establish finiteness or infiniteness of the case-deleted moments, use the criterion of Theorem 5.1 with $\epsilon = 0$, as in Corollary 5.1.

We note that Theorem 5.1 can be of help in establishing whether or not the $r$th moment of the case-deleted weight function will be infinite, even when the prior distribution is improper. If the prior density for $\beta$ is uniform on $\mathbb{R}^k$, for example, we merely apply the theorem with $\epsilon = 0$. The conclusion of a finite case-deleted $r$th moment is conditional upon the propriety of the posterior distribution. This propriety is not guaranteed, as use of the uniform prior distribution may lead to an improper posterior distribution (see [12] and [18]). However, since the weight function $w_{\setminus\mathcal{I}}(\beta)$ in Equation (5.1) always exceeds one, if the first moment of the case-deleted weight function is finite, so is the normalizing constant for the posterior: the posterior distribution is proper if any case-deleted weight function has finite first moment.

6. Central limit theorems

The previous sections provide results that enable us to calculate the number of moments which exist for the case-deleted weight function. The results apply to classes of prior distributions, and so can be quickly used to establish asymptotic normality of the importance sampling estimator $\hat{E}_{p_{\setminus\mathcal{I}}}[g(s)]$ given in Equation (2.1) and of the related estimator $\hat{E}^*_{p_{\setminus\mathcal{I}}}[g(s)]$. In this section, we indicate how these results apply to a variety of measures of case influence. We also present two techniques which are generally useful for applying the results.
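To make the objects concrete, here is a minimal Python sketch of the logistic case-deleted weight of Equation (5.1), computed on the log scale, together with a self-normalized importance sampling estimate of a case-deleted posterior expectation. The self-normalized form follows the $Bdd$ row of Table 2; treating it as the estimator of Equation (2.1) is an assumption, since that equation lies outside this section, and the function names are illustrative.

```python
import math


def log_weight_logistic(beta, deleted_x, deleted_y):
    """log w(beta) = sum over deleted cases of log(1 + exp(eta_i)) - y_i * eta_i,
    where eta_i = beta^T x_i, as in Equation (5.1)."""
    total = 0.0
    for x, y in zip(deleted_x, deleted_y):
        eta = sum(b * xi for b, xi in zip(beta, x))
        # Numerically stable log(1 + exp(eta)).
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += softplus - y * eta
    return total


def self_normalized_estimate(log_weights, g_values):
    """sum_m w_m g_m / sum_m w_m, with weights exponentiated after a shift
    by the largest log weight to avoid overflow."""
    shift = max(log_weights)
    w = [math.exp(lw - shift) for lw in log_weights]
    return sum(wi * gi for wi, gi in zip(w, g_values)) / sum(w)
```

Each term of the log weight is positive for $y_i \in \{0, 1\}$, which is the fact used above to link a finite first moment of the weight to propriety of the posterior.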
Central limit theorems (CLTs) for importance sampling estimators, when the parameter vectors $s$ are generated as i.i.d. samples or arise from a uniformly ergodic Markov chain, are described in [13] and [24], respectively. Under either source for the sample, the estimator $\hat{E}^*_{p_{\setminus\mathcal{I}}}[g(s)]$ is asymptotically normal if and only if
$$\int w^2_{\setminus\mathcal{I}}(s)\, g^2(s)\, p(s \mid y)\, ds < \infty. \tag{6.1}$$
Sufficient conditions for $\hat{E}_{p_{\setminus\mathcal{I}}}[g(s)]$ to be asymptotically normal are that condition (6.1) holds and that
$$\int w^2_{\setminus\mathcal{I}}(s)\, p(s \mid y)\, ds < \infty. \tag{6.2}$$
These conditions are explicitly presented for i.i.d. samples in [13]. A slight technical extension of the CLT in [24] helps to establish the result for ergodic samples. The extension consists of an application of the Cramér-Wold device to establish the joint asymptotic normality of the estimator of the normalizing constant for the weight function and of an estimator proportional to $\hat{E}^*_{p_{\setminus\mathcal{I}}}[g(s)]$, followed by an application of the delta method (e.g., see [9]).

The first technique for establishing a CLT recognizes that the $g^2(s)$ term in the integral in condition (6.1) can be grouped with $p(s \mid y)$, yielding, say, $p^*(s \mid y)$. The quantity $p^*(s \mid y)$ is the formal posterior distribution for $s$ given the data, provided that it is integrable. It corresponds to a proper Bayesian analysis with $g^2(s)$-adjusted prior distribution proportional to $g^2(s)\pi(s)$, provided that $0 < \int g^2(s)\pi(s)\, ds < \infty$. We note that this integral is against the prior distribution, and so is typically easy to evaluate.

To facilitate application of the theorems we wish to preserve full support of the function-adjusted prior distribution. To check the asymptotic normality of $\hat{E}^*_{p_{\setminus\mathcal{I}}}[g(s)]$ we need only verify condition (6.1), provided $g(s)$ is never equal to zero so that the $g^2(s)$-adjusted prior has full support. In all other cases we act as if the prior distribution had density $(1 + g^2(s))\pi(s)$. This preserves full support of the function-adjusted prior distribution in case $g(s)$ is not always different from zero. This also allows us to verify at once conditions (6.1) and (6.2) when we wish to determine if $\hat{E}_{p_{\setminus\mathcal{I}}}[g(s)]$ is asymptotically normal.

The second technique that we find useful is to establish the finiteness of integrals in case-deleted posteriors for a bounding function which then implies finiteness for interesting classes of functions. The relation $\log^2(x) \le (C_\epsilon + x^{-\epsilon} + x^{\epsilon})^2$ for some constant $C_\epsilon$ and all $x > 0$ connects moments of the case-deletion weight function to finiteness of integrals for several influence measures. The moment generating function is also a useful bounding function. Hence, we consider $g(s) = \exp(s^T t)$ for all $t$ in some open neighborhood of $0$, say, $U$. If $\int w^2_{\setminus\mathcal{I}}(s) \exp(s^T t)\, p(s \mid y)\, ds < \infty$ for all $t \in U$, then condition (6.1) is satisfied for any polynomial in $s_1^{n_1} \cdots s_k^{n_k}$ and any constant. Since constants are included, condition (6.1) implies condition (6.2), and the CLT applies to the importance sampling estimators of any mixed and marginal moment of $s$.

Formal Bayesian techniques that describe the influence of a set of cases on an analysis focus on a one-dimensional summary of the difference between the case-deleted posterior distribution and the full posterior distribution. Bayesian measures of model fit focus on case-deleted measures of predictive accuracy and cross-validation. A plethora of summaries exist. In this subsection, we show how our results can be used to verify that a CLT holds for the summaries estimated on the basis of a Monte Carlo sample.
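As an illustration of how such summaries are computed from a Monte Carlo sample, the sketch below evaluates three of the estimators listed in Table 2 (chi-square distance, Hellinger distance and CPO) from a list of unnormalized case-deleted weights $w_m$. The function name and dictionary keys are illustrative, and the CPO entry is meaningful only under the presumption stated in the Table 2 caption, $w(s) = 1/L_{\mathcal{I}}(s)$.

```python
import math


def influence_estimates(weights):
    """Monte Carlo estimates of selected Table 2 influence measures.

    `weights` are the unnormalized case-deleted weights w_m evaluated at
    draws from the full posterior.
    """
    big_m = len(weights)
    r_hat = sum(weights) / big_m                    # R-hat = sum_m w_m / M
    normalized = [w / r_hat for w in weights]       # R-hat^{-1} w_m
    chi_sq = sum((u - 1.0) ** 2 for u in normalized) / big_m
    hellinger = 2.0 - 2.0 * math.sqrt(1.0 / r_hat) * sum(
        math.sqrt(w) for w in weights) / big_m
    return {
        "R_hat": r_hat,
        "ChSq": chi_sq,
        "Hel": hellinger,
        "CPO": 1.0 / r_hat,  # valid only if w(s) = 1 / L_I(s)
    }
```

The estimates are only as dependable as the weights allow: by Table 2, the chi-square estimate needs four finite moments of the weight function for a CLT, while the Hellinger and CPO estimates need two.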
We illustrate this point with a discussion at the end of Example 6.1 concerning estimation of the conditional predictive ordinate (CPO).

This approach can be applied to many Bayesian case influence measures. Table 2 contains a summary of results. Each row of the table corresponds to a measure of influence. The measure is given under the column headed Estimand, and a formula for estimation is given under the heading Estimator. The last three columns present sufficient conditions for the estimator to follow a CLT. The column headed Mom's gives a number of moments of the case-deleted weight function; the column headed Adjmnt presents the function used to adjust the prior distribution, if needed, and the column headed Adj-Mom's gives a number of moments of the case-deleted weight function against the function-adjusted prior distribution. If the given numbers of moments and adjusted moments both exist, then a CLT holds for the estimator.

6.1. Examples

Example 6.1. This example illustrates the practical use of the results presented in Section 3. We fit a linear model to data assembled by the authors of [20] to investigate growth rates across mammalian species. Gestational time is known to be an important factor in determining growth rate. The data set contains 96 entries with complete information on growth rates and possibly related covariates for mammalian species. There is one marsupial that we excluded from our analysis. Three of the remaining species exhibit delayed implantation, a phenomenon by which the blastocyst, after reaching the uterus, remains dormant and unattached to the uterine lining for an extended period of time.
An examination of the covariate gestation time (in days) led us to conclude that the recorded gestational time for the grizzly and polar bears (ursus arctos and thalarctos maritimus) included the preimplantation time while the recorded gestational time for the nine-banded armadillo (dasypus novemcinctus) did not. This last gestational time was adjusted to include preimplantation. After this adjustment, the recorded gestation time for each species included in the analysis covers the time from egg fertilization to birth.

The response variable is a species' advancement, defined as the ratio of neonatal to adult body weight. We built a linear model including an intercept and three covariates: the natural logarithms of gestation time, litter size, and adult body weight (centering all three covariates around their respective means). The least squares fit of this model yields a multiple $R$-square of 0.4344. Based on a Bayesian analysis with noninformative prior distributions for the model parameters, the 95% highest posterior density (hpd) intervals for the coefficients for log litter size and log body weight include only negative values while the 95% hpd interval for the coefficient for log gestation time includes only positive values. This indicates that heavier species, species with larger litter sizes, and species with shorter gestation times give birth to relatively immature offspring.

We use the theoretical results of the preceding sections for three purposes: we examine the influence of a preselected group of cases on inference, we screen all groups of cases of a certain size for their influence, and we verify the stability of cross-validatory estimators of summary measures.

First, consider the three species with delayed implantation.
We interpret the moment index $r^*_{\mathcal{I}}$, i.e., the cut-off value for the existence or non-existence of the $r$th moment of the case deletion weights (see [7] and [8]), as a measure of influence of the cases being excluded. This cut-off value is given by the minimum of the cut-off value $r^*_{a,\mathcal{I}}$ between the leverage conditions (a) and (a′) and the cut-off value $r^*_{c,\mathcal{I}}$ between the distance conditions (c) and (c′). Dropping the three species leads to the values $r^*_{a,\mathcal{I}} = 4.74$ and $r^*_{c,\mathcal{I}} = 2.93$. Thus $r^*_{\mathcal{I}} = 2.93$ for this set of species. This number is small, suggesting that this group of cases is influential. A glance at Table 2 shows us that a central limit theorem will not hold for the chi-square distance, but that it will hold for the other measures listed in the table.

As with traditional measures of influence, we consider where our set of observations falls on the measures $r^*_{a,\mathcal{I}}$ and $r^*_{c,\mathcal{I}}$ with respect to other sets of similar size. We scanned all triples of species, computing cut-offs for each triple. Ordering the triples of dropped species according to their increasing values of $r^*_{a,\mathcal{I}}$, we found that the nine-banded armadillo belongs to 99 of the top 100 triples (all but the 31st), while the grizzly bear belongs to 2 of the top 100 triples (the 18th and the 99th), and the polar bear belongs to one of the top 100 triples (the 38th). Our three species in combination rank 1343rd out of the 138415 triples, with an $r^*_{a,\mathcal{I}}$ value of 4.74. Ordering the triples of dropped species according to their increasing values of $r^*_{c,\mathcal{I}}$, we find that both bear species belong to each of the top 93 triples and that the grizzly bear belongs to each of the top 167 triples. Dropping all three species with delayed implantation at once yields the 6th smallest value for $r^*_{c,\mathcal{I}}$.
From this comparative analysis, we conclude that the three species with delayed implantation may be influential, the nine-banded armadillo due mainly to its leverage and the two bear species due mainly to their outlyingness. This set of three species stands out, as there is a common underlying factor that may differentiate them from the other species.

Pursuing the potential influence of our triple of cases, we examine whether inclusion of the dormant period in the total gestation time affects the conclusions that we draw from the model. To answer this question we adjusted the gestation times of these species to account only for the period of actual development and reconsidered the linear model described above. The least squares fit now yields a multiple $R$-square of 0.5267. The 95% hpd interval for the coefficient for log litter size now contains 0, suggesting possible simplification of the model, although the qualitative interpretations of the impact of species weight, litter size, and gestation time remain the same.

Repeating the earlier exercise of dropping triples of cases, we find that the leverage of the nine-banded armadillo is a little diminished, as it now enters only in 7 of the top 100 triples for $r^*_{a,\mathcal{I}}$. The grizzly bear has a little more leverage, as it now belongs to 4 of the top 100 triples (the 7th, 10th, 17th and 65th), while the polar bear belongs to just one of the top 100 triples (the 98th). The three species in combination rank 1916th with an $r^*_{a,\mathcal{I}}$ value of 5.24. The smallest value of $r^*_{c,\mathcal{I}}$, which now equals 3.65, is attained when the three species papio papio, ursus arctos, and thalarctos maritimus are dropped. Both bear species belong to each of the top 22 triples, the grizzly bear belongs to 96 of the top 100 triples and the polar bear belongs to 97 of the top 100 triples.
Dropping all three species with delayed implantation at once yields the 23rd smallest value for $r^*_{c,\mathcal{I}}$. Thus, the three species with delayed implantation are still influential when the model is fit to the adjusted gestation times, although the extent of their influence is slightly diminished.

According to both analyses, the two bear species are highly influential, due mainly to their large residuals. This not only confirms the well known fact that bears have an unusually small advancement but also reveals that the dormant gestation period by itself cannot account for it. Quoting from a January 27, 2004 New York Times article (see [1]):

Polar bears share with all bears an extreme disparity between the size of their mother, in the quarter-ton range, and that of a newborn cub, about a pound. “It’s dramatic trait in the bear family,” Dr. Peatkau said. “They are off the chart among placental mammals, and closer to marsupials like the kangaroo.”

Model fit is commonly assessed via $k$-fold cross validation. The data are partitioned at random into $k$ subsets of approximately equal sizes and each of the $k$ subsets is used in turn as a test set with the union of the remaining $k - 1$ subsets serving as a training set. The model is fit to the data in the training sets and its predictions are compared to the actual values of the observations in the test sets by computing some measure of predictive ability averaged over the $k$ sets of predictions. In a Bayesian analysis, $CPO$ provides a measure of overall predictive ability. The cross-validated $CPO$ can be estimated with draws from the full posterior and importance sampling weights. However, as noted in Table 2, for a given partition, the central limit theorem will not hold if $r^*_{\mathcal{I}}$ drops below 2 when any of the $k$ subsets of observations is excluded.
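The screening of a cross-validation partition can be sketched in a few lines of Python. The function `moment_index` below is a hypothetical placeholder for a routine that returns $r^*_{\mathcal{I}}$ for a given set of deleted cases (for the linear model it would be built from the conditions of Theorem 3.1, which are not reproduced here); only the partitioning and counting logic is shown.

```python
import random


def random_partition(n, k, rng):
    """Split the indices 0..n-1 at random into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[j::k] for j in range(k)]


def count_failing_folds(folds, moment_index, threshold=2.0):
    """Number of folds whose deletion gives a moment index below `threshold`.

    `moment_index` is a user-supplied (hypothetical) callable mapping a list
    of deleted case indices to r*_I.
    """
    return sum(1 for fold in folds if moment_index(fold) < threshold)


def screen_partitions(n, k, moment_index, n_partitions, seed=0):
    """Tally how many random partitions have 0, 1, 2, ... failing folds."""
    rng = random.Random(seed)
    tally = {}
    for _ in range(n_partitions):
        bad = count_failing_folds(random_partition(n, k, rng), moment_index)
        tally[bad] = tally.get(bad, 0) + 1
    return tally
```

A partition with a nonzero count is one for which the importance sampling estimate of the cross-validated $CPO$ is not covered by the central limit theorem.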
To investigate how often the central limit theorem breaks down for $CPO$, we considered the case of 5-fold cross validation for the model fit to the data used in the first analysis and simulated 10,000 random partitions of the data into 5 subsets of size 19 each. For each split we computed five values of $r^*_{\mathcal{I}}$. Out of the total 10,000 simulated partitions, there were 658 partitions where $r^*_{\mathcal{I}}$ dropped below 2 for exactly one of the five case deletions and there was one partition where it dropped below 2 for two of the five case deletions. The value of $r^*_{\mathcal{I}}$ never dropped below 2 for more than two of the five case deletions.

If it is established that, for a particular partition, no central limit theorem holds for importance sampling estimation of $CPO$, then the analyst must turn to other methods of estimation. For example, sampling from a mixture distribution with components given by the full posterior and by the case-deleted posteriors conditional on those subsets for which $r^*_{\mathcal{I}} \le 2$ ensures the existence of a central limit theorem for the estimate of $CPO$.

Example 6.2. The authors of [11], in their influential paper on Bayesian model selection/model averaging, put a prior distribution over a collection of Bayesian linear models. There have been a host of extensions of their model, most of which are amenable to the treatment below. Formally, we describe a prior distribution having the form of Equation (3.2). The likelihood for the model follows Equation (3.1). The prior distribution on the error variance is $\sigma^2 \sim IG(\alpha, \beta)$. The prior distribution on the regression coefficients is described in two stages. At the first stage, there is an indicator vector of whether a regressor, $\theta_j$, “appears in” the model. The indicators are independent Bernoulli($p_j$) variates.
If the regressor does not, then the conditional prior distribution on $\theta_j$ is $N(0, \tau^2)$ with small $\tau$; if the regressor does, then the conditional prior distribution on $\theta_j$ is $N(0, c\tau^2)$, with large $c > 1$. Marginalizing over the indicator, the prior distribution on an individual regressor is $\theta_j \sim (1 - p_j) N(0, \tau^2) + p_j N(0, c\tau^2)$. The resulting prior distribution remains absolutely continuous with respect to Lebesgue measure while effectively allowing regressors to be included in or excluded from the model.

The regression analysis is used to estimate the regression coefficients and the associated posterior expected loss. Pursuing a decision theoretic approach, we ask when the case-deleted importance sampling estimators follow CLTs. We use the standard sum-of-squared error loss, so that $L(\theta, a) = \sum_{j=0}^{k-1} (\theta_j - a_j)^2$. The Bayes action, $a$, is the posterior mean vector. Here, we focus on the posterior expected loss. The posterior expected loss is $E[L(\theta, a) \mid y] = \sum_{j=0}^{k-1} \mathrm{Var}(\theta_j \mid y)$, and then, for the asymptotic normality of $\hat{E}_{p_{\setminus\mathcal{I}}}[g(s)]$ or $\hat{E}^*_{p_{\setminus\mathcal{I}}}[g(s)]$, the function $g(\theta)$ to be considered is $g(\theta) = \sum_{j=0}^{k-1} \theta_j^2$.

We now proceed with the technique. First, we verify that the function-adjusted prior distribution is proper. Since the prior distribution on the regression coefficients is a finite mixture of normals,
$$\int (1 + g^2(\theta))\, \pi(\theta)\, d\theta < \infty. \tag{6.3}$$
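For the two-component normal mixture prior, the integral in (6.3) has a closed form, which the following Python sketch evaluates. It assumes that the coefficients $\theta_j$ are independent a priori (an assumption made here for the illustration) and uses the normal moments $E\theta^2 = s^2$ and $E\theta^4 = 3s^4$ for a $N(0, s^2)$ variate; the function name is illustrative.

```python
def adjusted_prior_mass(p, tau2, c):
    """Value of the integral of (1 + g^2(theta)) pi(theta), g = sum_j theta_j^2.

    `p` lists the inclusion probabilities p_j; each theta_j has prior
    (1 - p_j) N(0, tau2) + p_j N(0, c * tau2), independently across j
    (independence is an assumption of this sketch).
    """
    # Second and fourth moments of each mixture component average.
    m2 = [tau2 * ((1.0 - pj) + pj * c) for pj in p]
    m4 = [3.0 * tau2 ** 2 * ((1.0 - pj) + pj * c ** 2) for pj in p]
    s2 = sum(m2)
    # Sum over j != l of E[theta_j^2] E[theta_l^2].
    cross = s2 * s2 - sum(v * v for v in m2)
    # Since pi is proper, the integral equals 1 + E[g^2].
    return 1.0 + sum(m4) + cross
```

The value is finite for any finite $\tau^2$ and $c$, which is the content of (6.3); with a $t$ component in place of the second normal, the fourth moment exists only when the degrees of freedom exceed 4.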
If conditions (a), (b) and (c) of the theorem hold for a particular case-deletion, then the case-deleted weight function has a finite second moment, or equivalently, $\int w^2_{\setminus \mathcal{I}}(\theta, \sigma^2)(1 + g^2(\theta))\, p(\theta, \sigma^2 \mid y)\, d\theta\, d\sigma^2 < \infty$, establishing conditions (6.1) and (6.2) and hence a CLT for the estimators $\hat{E}_{p_{\setminus \mathcal{I}}}[g(s)]$ and $\hat{E}^*_{p_{\setminus \mathcal{I}}}[g(s)]$. (Finiteness of the previous integral implies finiteness of $\int \sum_{j=0}^{k-1} w^2_{\setminus \mathcal{I}}(\theta, \sigma^2)\, \theta_j^4\, p(\theta, \sigma^2 \mid y)\, d\theta\, d\sigma^2$.) We note that conditions (a), (b) and (c) involve leverage, residuals from a least squares regression including all regressors and number of cases deleted. They do not directly involve the prior distribution, beyond the parameters $\alpha$ and $\beta$ of the prior on $\sigma^2$.

The impact of the prior distribution's tail behavior on decision rules is discussed in [2]. Robustness considerations suggest that it is often wise to use a prior distribution with thicker tails than the likelihood. For MCMC algorithms, a convenient replacement of the normal distribution is a $t$-distribution; see, for example, [4]. The technique used above can be directly applied and yields the same results when the prior distribution for $\theta_j$ is $(1 - p_j) N(0, \tau^2) + p_j T(d, 0, c\tau^2)$, with the latter term in the mixture a $t$-distribution with $d > 4$ degrees of freedom, center 0 and scale $c\tau^2$. The requirement $d > 4$ guarantees that condition (6.3) holds.

Example 6.3. The results of a study used to estimate the survival distribution for leukemia patients are presented in [10]. The response variable is survival time (from diagnosis), and explanatory variables are white blood cell count at diagnosis (WBC) and whether "Auer rods and/or significant granulature of the leukemic cells in the bone marrow at diagnosis" were present (AG positive) or absent (AG negative).
The authors of [10] develop estimates of the survival distribution based upon presumed exponential distributions which are allowed to depend on the covariates. The authors of [6] dichotomize the survival times by defining a new response which indicates survival past 50 weeks. They analyze the data with the frequentist counterpart to the logistic regression model described in Section 5, where there are $k = 3$ covariates: an intercept, WBC and AG. The authors of [6] identify one case, a patient with a high WBC count and a survival time of more than 50 weeks, as having extremely large influence. They also note that altering the model (to predict survival based on log(WBC) and AG) can reduce the influence of the case.

We examine influence under a product of double-exponential prior distributions for $\beta$. The distribution has scale parameter 10 in each direction (and hence a prior distribution with mean for $\beta_i \mid (\beta_i > 0)$ of 10). Case 15, diagnosed in [6] as an influential observation, is easily found to have an infinite variance for its case-deleted weight function. The value of the criterion $h(\beta, 2, 0)$ is found to be $h((0, -1, 0)^T, 2, -0.1) = 45.15$. This value is well in excess of 0, and indicates that the choice of $\epsilon = 0.1$ for the prior distribution has little to do with why the case-deleted weight function has infinite variance. On the other hand, the value of the criterion in the positive direction for $\beta_2$ is less than 0, indicating that this tail of the distribution of $\beta_2$ is well-behaved. No other case results in an infinite variance for its case-deleted weight function.

For case 15 condition (6.2) does not hold and the estimators $\hat{E}_{p_{\setminus \mathcal{I}}}[g(s)]$ and $\hat{E}^*_{p_{\setminus \mathcal{I}}}[g(s)]$ are not asymptotically normal. Condition (6.2) holds for all remaining observations.
Thus we can establish the asymptotic normality of their associated estimators by showing that condition (6.1) holds as well. We do this by using the bounding strategy described above, showing that $h(\beta, 2, \epsilon) < 0$, for all $\beta : |\beta^T|_1 = 1$, implies the existence of an open neighborhood $U$ of 0 such that $\int w^2_{\setminus \mathcal{I}}(s) \exp(s^T t)\, p(s \mid y)\, ds < \infty$, for all $t \in U$. Hence, it follows that $\int \exp\{h(\beta, 2, \epsilon) + 2\beta^T t\}\, d\beta$ is finite for all $t$ in $U$, which in turn, arguing as in the proof of Theorem 5.1, implies that $\int w^2_{\setminus \mathcal{I}}(s) \exp(s^T t)\, p(s \mid y)\, ds < \infty$, for all $t \in U$.

It is interesting to note that the analysis above is not strictly connected to the particular choice of the prior distribution as a product of double exponentials. Indeed, in light of Corollary 5.1, if the (proper) prior distribution on $\beta$ is thick-tailed with respect to the family of product of double exponential distributions, then $h(\beta, 2, 0) < 0$ for all $\beta : |\beta^T|_1 = 1$ still implies both conditions (6.1) and (6.2). This is true even when the noninformative prior distribution $\pi(\beta) \sim 1$ is assigned. Finally, if $\pi(\beta)$ is thinner-tailed than any product of double exponentials, then conditions (6.1) and (6.2) are always satisfied.

7. Conclusions

The development of effective computational tools for fitting hierarchical models has spurred the growth of Bayesian data analysis. As with its classical counterpart, a complete Bayesian data analysis investigates sensitivity of inferences to changes in the data set, with particular consideration given to excluding observations from the analysis. This exclusion is most often accomplished through the use of importance sampling based on case-deleted weight functions. The theoretical results in Sections 3 through 5 provide conditions under which importance sampling estimators of various functionals will follow central limit theorems.
Further results along these lines may be obtained for other likelihoods (particularly those in the exponential family) and for other specific model structures (as in Section 4). The techniques in Section 6 provide a simple means of verifying the conditions of the earlier theorems. We have found that the combination of these techniques and the theorems allows us to easily verify (or disprove) asymptotic normality of many estimators.

The results can be used to evaluate computational strategies. In many situations, computations can be hastened by sampling from a formal model that uses a nicely structured prior distribution, say $\pi_s(s)$, in place of the actual prior distribution, $\pi(s)$. This change may be motivated by the speed of programming conjugate calculations or by the speed of execution of the algorithm (e.g., see [17]) used to fit the model. With the altered model, inference is made through use of importance sampling with weights $w_p(s) = \pi(s)/\pi_s(s)$. When concerned about the effects of groups of cases, these importance sampling weights can be combined with the case-deletion weights to produce inference under the case-deleted posterior distribution. The weights are $w(s) = w_p(s)\, w_{\setminus \mathcal{I}}(s)$. Suppose that the weights due to the prior distribution have $r_p$ moments and the case-deletion weights have $r_{\mathcal{I}}$ moments (under the model with prior distribution $\pi_s$). Then a straightforward calculation shows that the combined weights have at least $(r_p^{-1} + r_{\mathcal{I}}^{-1})^{-1}$ moments. Thus, the suitability for quick and efficient data analysis based on the computational strategy where $\pi$ is replaced by $\pi_s$ for the sampling algorithm can be evaluated.
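The moment count for the combined weights can be read as an application of Hölder's inequality: if $1/r = 1/r_p + 1/r_{\mathcal{I}}$, then $E|w_p w_{\setminus \mathcal{I}}|^r \le (E|w_p|^{r_p})^{r/r_p} (E|w_{\setminus \mathcal{I}}|^{r_{\mathcal{I}}})^{r/r_{\mathcal{I}}}$. The short Python sketch below only does the resulting bookkeeping. The function name `combined_moment_bound` and the example moment counts are hypothetical, and the comparison with 2 reflects the finite-variance threshold used in the examples of Section 6 rather than any additional result.

```python
def combined_moment_bound(r_p: float, r_i: float) -> float:
    """Lower bound (1/r_p + 1/r_i)^(-1) on the number of finite moments
    of the combined weight w = w_p * w_case_deleted.

    r_p : number of finite moments of the prior-swap weights w_p
    r_i : number of finite moments of the case-deletion weights
    """
    if r_p <= 0 or r_i <= 0:
        raise ValueError("moment counts must be positive")
    return 1.0 / (1.0 / r_p + 1.0 / r_i)


if __name__ == "__main__":
    # Hypothetical moment counts, chosen only for illustration.
    for r_p, r_i in [(4.0, 4.0), (6.0, 3.0), (3.0, 3.0)]:
        r = combined_moment_bound(r_p, r_i)
        # At least two moments are guaranteed only when the bound reaches 2.
        print(f"r_p={r_p}, r_I={r_i}: bound={r:.3f}, two moments guaranteed: {r >= 2.0 - 1e-12}")
```

Because the bound is only a guarantee, a value below 2 does not by itself show that the combined weights have infinite variance; it shows that finiteness has to be checked directly.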
There is a strong connection between the tail of the prior distribution relative to the likelihood and the robustness of inference based on the model. Sentiment generally favors prior distributions with thicker tails than the likelihood. With a thick-tailed prior distribution, when there is a clash between likelihood and prior, inference is dominated by the likelihood (e.g., see [2], Chapter 4). Our preference is to select a prior distribution that reflects the analyst's beliefs. Often, this will be a thick-tailed prior distribution, leading to simplified conditions such as those in Corollaries 3.1 or 5.1. While our preference is to select the prior distribution on the basis of modeling considerations, we do note that the results of this paper could be used to select a prior with tails thin enough to guarantee existence of some targeted $r$ moments.

The results we derive apply to broad classes of models. As an example, the specification of the normal theory linear model in (3.1) and (3.2) can mask a much richer hierarchical model. The richer model may include further parameters, say $\gamma$, where the prior distribution on $\theta$ depends on $\gamma$. As long as the likelihood is a function only of $\theta$ and $\sigma^2$, the case-deleted weight function will also be a function of these parameters. The theorems are applied with the marginal prior distribution of $\theta$ and $\sigma^2$. The prior specifications in [11] and [19] may be viewed in this light.

Models which combine different studies provide a less evident match for these theorems. A typical linear model used for such combination will allow the regression coefficients to vary from study to study. Such variation is captured with a hierarchical model that links the coefficients across studies by means of hyperparameters. The overall model can be expressed in graphical form as a hierarchical model.
The advantage of the general conditions in the theorems that describe only the tail behavior of the prior distribution becomes apparent in this setting. For case deletions involving only one study, and referring to the notation of the previous paragraph, $\gamma$ includes the parameters specific to the other studies, the data specific to the other studies, and the hyperparameters. Thus the marginal prior distribution on $\theta$ and $\sigma^2$ to be used in the theorems is the marginal distribution on these parameters, posterior to the data from the other studies. While this distribution is usually inaccessible in closed form, one can often verify that its tails behave like some (unspecified) normal distribution or that they are thicker than the class of normal distributions. This is sufficient for application of the theoretical results.

APPENDIX

Proof of Lemma 2.1

To prove the first part of the lemma, note that
\[ S_1 = c_1^{-1} \int f(x)\, \pi_1(x)\, h(x)\, dx > c_1^{-1} \int f(x)\, B^{-1} \pi_0(x)\, h(x)\, dx = c_1^{-1} B^{-1} c_0 S_0 = \infty. \]
To prove the second part of the lemma, note that
\[ S_1 = c_1^{-1} \int f(x)\, \pi_1(x)\, h(x)\, dx < c_1^{-1} \int f(x)\, b^{-1} \pi_0(x)\, h(x)\, dx = c_1^{-1} b^{-1} c_0 S_0 < \infty. \]
The third part of the lemma follows from the first two parts.

The proof of Theorem 3.1 relies on the following two lemmas.

Lemma A.1. Let $\lambda_1 \le \cdots \le \lambda_I$ denote the eigenvalues of $H_{\mathcal{I}}$. The matrix $(I - rH_{\mathcal{I}})$ is non-singular if and only if $\lambda_i \ne 1/r$, for every $i = 1, \ldots, I$. If $(I - rH_{\mathcal{I}})$ is non-singular, then it is positive definite if and only if $\lambda_I < 1/r$.

Proof. Because for all $l \in \mathbb{R}$ and for all $r > 0$, $[I - rH_{\mathcal{I}} - lI] = -r[H_{\mathcal{I}} - ((1 - l)/r) I]$, then the $I$ eigenvalues of $(I - rH_{\mathcal{I}})$ are $1 - r\lambda_1 \ge \cdots \ge 1 - r\lambda_I$ and the statements in the lemma follow directly.

Lemma A.2.
(i) $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)$ is singular if and only if $(I - rH_{\mathcal{I}})$ is singular. (ii) $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)$ is positive definite if and only if $(I - rH_{\mathcal{I}})$ is positive definite.

Proof. To prove the lemma we use a formula for matrix inversion given in [14]. For every square matrix $W$ and any conforming rectangular matrices $U$ and $V$, assuming that each of the stated inverses exists:
\[ (W + U^T V)^{-1} = W^{-1} - W^{-1} U^T (I + V W^{-1} U^T)^{-1} V W^{-1}. \tag{A.1} \]
By applying formula (A.1) to the matrices $W = X^T X$, $U = -rX_{\mathcal{I}}^T$ and $V = X_{\mathcal{I}}^T$, an expression for the inverse of $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)$, when it exists, is given by:
\[ (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)^{-1} = (X^T X)^{-1} + r (X^T X)^{-1} X_{\mathcal{I}} (I - rH_{\mathcal{I}})^{-1} X_{\mathcal{I}}^T (X^T X)^{-1}. \tag{A.2} \]
On the other hand, if we substitute $W = I$, $U = -r(X^T X)^{-1} X_{\mathcal{I}}$ and $V = X_{\mathcal{I}}$ into Equation (A.1), an expression for $(I - rH_{\mathcal{I}})^{-1}$ is given by
\[ (I - rH_{\mathcal{I}})^{-1} = I + rX_{\mathcal{I}}^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)^{-1} X_{\mathcal{I}}. \tag{A.3} \]
Thus, we can use formula (A.2) to verify the "if" part of proposition (i) and formula (A.3) to verify the "only if" part.

With regard to proposition (ii) of the lemma, observe that if $(I - rH_{\mathcal{I}})^{-1}$ is positive definite, then $X_{\mathcal{I}} (I - rH_{\mathcal{I}})^{-1} X_{\mathcal{I}}^T$ is positive semi-definite and Equation (A.2) shows that $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)^{-1}$ can be written as the sum of a positive definite matrix, $(X^T X)^{-1}$, and a positive semi-definite matrix. As such, it is positive definite and $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)$ must be positive definite as well. Looking at Equation (A.3) and arguing in a similar manner, the necessary condition in proposition (ii) may be proved.

Proof of Theorem 3.1

Part (i). The assumption that $\lambda_i \ne 1/r$ for all $i = 1, \ldots, I$ implies that
\[ \tilde{\theta} = (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)^{-1} (X^T y - rX_{\mathcal{I}}\, y_{\mathcal{I}}) \tag{A.4} \]
is well defined in view of formula (A.2) and Lemma A.1, and the posterior $r$th moment of $w_{\setminus \mathcal{I}}(s)$, $E(w^r_{\setminus \mathcal{I}}(s) \mid y)$, is proportional to
\[
\begin{aligned}
\int w^r_{\setminus \mathcal{I}}(s)\, q(s)\, ds = \int & (\sigma^2)^{-(n - rI)/2 - \alpha - 1} \\
& \times \exp\Big\{ -\frac{1}{2\sigma^2} \big[ y^T y - r\, y_{\mathcal{I}}^T y_{\mathcal{I}} - \tilde{\theta}^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T) \tilde{\theta} + 2/\beta \big] \Big\} \\
& \times \exp\Big\{ -\frac{1}{2\sigma^2} (\theta - \tilde{\theta})^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T) (\theta - \tilde{\theta}) \Big\}\, \pi_1(\theta)\, d\theta\, d\sigma^2.
\end{aligned}
\tag{A.5}
\]
If condition (a) holds, then, by Lemma A.2, $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)$ is positive definite, and
\[
E(w^r_{\setminus \mathcal{I}}(s) \mid y) \le \mathrm{const} \times \int (\sigma^2)^{-(n - rI)/2 - \alpha - 1} \exp\Big\{ -\frac{1}{2\sigma^2} \big[ y^T y - r\, y_{\mathcal{I}}^T y_{\mathcal{I}} - \tilde{\theta}^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T) \tilde{\theta} + 2/\beta \big] \Big\}\, d\sigma^2.
\tag{A.6}
\]
Using the expression for $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)^{-1}$ given in Equation (A.2) and the property that $H_{\mathcal{I}}$ commutes with $(I - H_{\mathcal{I}})$, we obtain:
\[
\begin{aligned}
y^T y - r\, y_{\mathcal{I}}^T y_{\mathcal{I}} - \tilde{\theta}^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T) \tilde{\theta}
= {} & y^T (I - H) y - r\, y_{\mathcal{I}}^T \big[ I + rH_{\mathcal{I}} + r^2 H_{\mathcal{I}} (I - rH_{\mathcal{I}})^{-1} H_{\mathcal{I}} \big] y_{\mathcal{I}} \\
& + 2r\, y_{\mathcal{I}}^T \big[ I + rH_{\mathcal{I}} (I - rH_{\mathcal{I}})^{-1} \big] X_{\mathcal{I}}^T (X^T X)^{-1} X^T y \\
& - r\, y^T X (X^T X)^{-1} X_{\mathcal{I}} (I - rH_{\mathcal{I}})^{-1} X_{\mathcal{I}}^T (X^T X)^{-1} X^T y \\
= {} & y^T (I - H) y - r\, e_{\mathcal{I}}^T (I - rH_{\mathcal{I}})^{-1} e_{\mathcal{I}} = \mathrm{RSS}^*_{\setminus \mathcal{I}}(r).
\end{aligned}
\]
Thus, the integrand in Equation (A.6) is proportional to an inverse gamma density if conditions (b) and (c) hold. Sufficiency of conditions (a)-(c) is proved.

Suppose now that any of conditions (a′) or (b′) or (c′) holds. If (b′) is true then, as $\sigma^2 \to \infty$, the integrand in (A.5) goes to zero too slowly and it is not integrable. On the other hand, if (c′) holds, because quadratic forms are continuous and because $\pi_1$ has full support, then there exists a neighborhood $C_1$ of $\tilde{\theta}$ having positive Lebesgue measure such that $\mathrm{RSS}^*_{\setminus \mathcal{I}}(r) + 2/\beta + (\theta - \tilde{\theta})^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)(\theta - \tilde{\theta}) < 0$.
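The role played by the matrix $X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T$ in this argument can be illustrated for a single deleted case ($I = 1$), where $H_{\mathcal{I}}$ reduces to the leverage $h_{ii}$ and Lemmas A.1 and A.2 together say that the matrix is positive definite exactly when $h_{ii} < 1/r$. The following Python sketch checks this on a simple linear regression with an intercept. The covariate values, the choice $r = 2$, and all function names are hypothetical and serve only as an illustration; the leverage formula $h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}$ is the standard one for this design.

```python
def leverages(x):
    """Leverages h_ii for simple linear regression with design rows (1, x_j)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xj - xbar) ** 2 for xj in x)
    return [1.0 / n + (xj - xbar) ** 2 / sxx for xj in x]


def downdated_matrix(x, i, r):
    """Entries (a, b, d) of the symmetric 2x2 matrix X^T X - r x_i x_i^T,
    where x_i = (1, x[i])^T, written as [[a, b], [b, d]]."""
    a = len(x) - r
    b = sum(x) - r * x[i]
    d = sum(xj * xj for xj in x) - r * x[i] ** 2
    return a, b, d


def is_positive_definite_2x2(a, b, d):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return a > 0 and a * d - b * b > 0


# Hypothetical covariate values; the last case has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 20.0]
r = 2
h = leverages(x)
checks = [
    (h[i] < 1.0 / r, is_positive_definite_2x2(*downdated_matrix(x, i, r)))
    for i in range(len(x))
]

if __name__ == "__main__":
    for i, (lev_ok, pd_ok) in enumerate(checks):
        print(f"case {i}: h_ii={h[i]:.3f}, h_ii < 1/r: {lev_ok}, positive definite: {pd_ok}")
```

In this toy data set the two indicators agree for every case, and only the high-leverage case fails both, which is the situation in which the proof finds a set of positive Lebesgue measure where the quadratic form is negative.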
Also, when (a′) holds, because $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)$ is a non-positive-definite, non-singular matrix, we can find a set $C_2$, depending on $\beta$ and $\mathrm{RSS}^*_{\setminus \mathcal{I}}(r)$, with positive Lebesgue measure, such that $\mathrm{RSS}^*_{\setminus \mathcal{I}}(r) + 2/\beta + (\theta - \tilde{\theta})^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)(\theta - \tilde{\theta}) < 0$, for all $\theta \in C_2$. Thus, under either of conditions (a′) or (c′), the integrand in (A.5) approaches infinity at an exponential rate as $\sigma^2 \to 0$ for every $\theta$ belonging to a set with positive Lebesgue measure. It follows that $E(w^r_{\setminus \mathcal{I}}(s) \mid y) = \infty$.

Part (ii). If the standard noninformative prior $\pi(\theta, \sigma^2) \propto 1/\sigma^2$ is used, we can obtain an expression for $\int w^r_{\setminus \mathcal{I}}(s)\, q(s)\, ds$ by setting $\alpha = 0$ and $\pi_1(\theta) \propto 1$ and letting $\beta$ tend to infinity in Equation (A.5). Then, if condition (a) holds, we have
\[ \int \exp\Big\{ -\frac{1}{2\sigma^2} (\theta - \tilde{\theta})^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)(\theta - \tilde{\theta}) \Big\}\, \pi_1(\theta)\, d\theta = \big( 2\pi\sigma^2\, |X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T|^{-1} \big)^{k/2}, \]
where here $|\cdot|$ denotes the determinant of its argument, and
\[ E(w^r_{\setminus \mathcal{I}}(s) \mid y) \propto \big( 2\pi\, |X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T|^{-1} \big)^{k/2} \times \int (\sigma^2)^{-(n - rI - k)/2 - 1} \exp\Big\{ -\frac{1}{2\sigma^2} \mathrm{RSS}^*_{\setminus \mathcal{I}}(r) \Big\}\, d\sigma^2. \tag{A.7} \]
The integral on the right-hand side is finite if conditions (b) and (c) (as given in the statement of part (ii)) hold and sufficiency in part (ii) is shown. The proof of the "only if" part proceeds as in part (i).

Proof of Corollary 3.1

Let $E_j(w^r_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y)$ denote the posterior $r$th moment of the weight function when the prior distribution for $(\theta, \sigma^2)$ is given by $\pi_{11}(\theta) \times \pi_{j2}(\sigma^2)$, for $j = 0, 1$. If $\lambda_i \ne 1/r$ for all $i = 1, \ldots, I$, then $E_j(w^r_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y)$ is proportional to
\[ \int (\sigma^2)^{-(n - rI)/2} \exp\Big\{ -\frac{1}{2\sigma^2} \big[ \mathrm{RSS}^*_{\setminus \mathcal{I}}(r) + (\theta - \tilde{\theta})^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)(\theta - \tilde{\theta}) \big] \Big\}\, \pi_{11}(\theta)\, \pi_{j2}(\sigma^2)\, d\theta\, d\sigma^2 \tag{A.8} \]
where $\tilde{\theta}$ is (well) defined in Equation (A.4).
As shown in the proof of Theorem 3.1, if $\lambda_I < 1/r$, then $0 < \exp\{ -(1/(2\sigma^2)) (\theta - \tilde{\theta})^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)(\theta - \tilde{\theta}) \} \le 1$ so that
\[ E_j(w^r_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y) \le \mathrm{const} \times \int (\sigma^2)^{-(n - rI)/2} \exp\Big\{ -\frac{1}{2\sigma^2} \mathrm{RSS}^*_{\setminus \mathcal{I}}(r) \Big\}\, \pi_{j2}(\sigma^2)\, d\sigma^2, \quad j = 0, 1. \tag{A.9} \]
Applying inequality (A.9) with $j = 1$ and using the assumption that $\mathrm{RSS}^*_{\setminus \mathcal{I}}(r) > 0$, we have
\[ E_1(w^r_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y) \le \mathrm{const} \times \int (\sigma^2)^{-(n - rI)/2}\, \pi_{12}(\sigma^2)\, d\sigma^2, \]
and the latter integral is finite by assumption.

To prove the second part of the corollary, we first note that Theorem 3.1 implies that if $\lambda_I > 1/r$, then $E_0(w^r_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$ for any $\pi_{11}$ and $\pi_{02} \in \mathcal{F}_2$, whereas, if $\mathrm{RSS}^*_{\setminus \mathcal{I}}(r) < 0$, then $E_0(w^2_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$ for any $\pi_{02}$ in $\mathcal{F}_2$ having $\beta > -2/\mathrm{RSS}^*_{\setminus \mathcal{I}}(r)$. As we noted in the proof of Theorem 3.1, in both cases we can find a subset $C$ of $\mathbb{R}^k$ having positive Lebesgue measure such that, for any $\theta \in C$, $E_0(w^r_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y)$ is infinite because the integral with respect to $\sigma^2$ does not exist in any neighborhood of zero. Because $\pi_{12}$ is thick-tailed with respect to $\mathcal{F}_2$, then, for every fixed $B > 0$, there exists a $\sigma_0^2$ such that $\pi_{12}(\sigma^2) > B\, \pi_{02}(\sigma^2)$ for any $\sigma^2 < \sigma_0^2$. Thus, by Lemma 2.1, $E_0(w^2_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$ for some $\pi_{02}$ in $\mathcal{F}_2$ implies $E_1(w^2_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$ as well.

Proof of Corollary 3.2

If $\lambda_I < 1/r$, inequality (A.9) holds for both the prior $\pi_{11}(\theta) \times \pi_{02}(\sigma^2)$ and $\pi_{11}(\theta) \times \pi_{12}(\sigma^2)$. Furthermore, if $\pi_{02}(\sigma^2)$ is a prior distribution in $\mathcal{F}_2$ with $\alpha > -(n - rI)/2$ and with $\beta$ such that $\mathrm{RSS}^*_{\setminus i}(2) > -2/\beta$, then, if $\lambda_I < 1/r$, $\int (\sigma^2)^{-(n - rI)/2} \exp\{ -(1/(2\sigma^2))\, \mathrm{RSS}^*_{\setminus \mathcal{I}}(r) \}\, \pi_{02}(\sigma^2)\, d\sigma^2$ is finite.
By assumption, for any fixed $b > 0$ there exists a $\delta > 0$ such that $\pi_{12}(\sigma^2) < b\, \pi_{02}(\sigma^2)$ for any $\sigma^2 < \delta$. Next, split the integral on the right-hand side in Equation (A.9) into the two portions over $(0, \delta)$ and $[\delta, \infty)$. By Lemma 2.1, $\int_0^\delta (\sigma^2)^{-(n - rI)/2} \exp\{ -(1/(2\sigma^2))\, \mathrm{RSS}^*_{\setminus \mathcal{I}}(r) \}\, \pi_{02}(\sigma^2)\, d\sigma^2 < \infty$ implies $\int_0^\delta (\sigma^2)^{-(n - rI)/2} \exp\{ -(1/(2\sigma^2))\, \mathrm{RSS}^*_{\setminus \mathcal{I}}(r) \}\, \pi_{12}(\sigma^2)\, d\sigma^2 < \infty$. For the portion over $[\delta, \infty)$, it is enough to observe that $\int_\delta^\infty (\sigma^2)^{-(n - rI)/2} \exp\{ -(1/(2\sigma^2))\, \mathrm{RSS}^*_{\setminus \mathcal{I}}(r) \}\, \pi_{12}(\sigma^2)\, d\sigma^2 < \mathrm{const} \times \int_\delta^\infty (\sigma^2)^{-(n - rI)/2}\, \pi_{12}(\sigma^2)\, d\sigma^2$, which is finite by assumption.

Assume now that $\pi_{11}(\theta)$ is thick-tailed with respect to $\mathcal{F}_1$ and that $\lambda_I > 1/r$. It follows from $\lambda_I > 1/r$ together with $\lambda_i \ne 1/r$ for all $i = 1, \ldots, I - 1$ that $(X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)/\sigma^2$ is a non-positive-definite, non-singular matrix, $\forall \sigma^2 > 0$. Thus, there exists a sequence $\{\theta_t^0\}$ with $\|\theta_t^0\| \to \infty$, as $t \to \infty$, and a vector $\epsilon = (\epsilon_0, \ldots, \epsilon_{k-1})$, with $\epsilon_j > 0$ for all $j = 0, 1, \ldots, k - 1$, such that $\lim_{t \to \infty} (1/(2\sigma^2)) (\theta_t - \tilde{\theta})^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)(\theta_t - \tilde{\theta}) = -\infty$, for all sequences $\{\theta_t\}$ such that $\theta_t^0 - \epsilon < \theta_t < \theta_t^0 + \epsilon$. Keeping in mind that $\pi_{11}(\theta)$ is thick-tailed with respect to $\mathcal{F}_1$, then $\lim_{t \to \infty} \exp\{ -(1/(2\sigma^2)) (\theta_t - \tilde{\theta})^T (X^T X - rX_{\mathcal{I}} X_{\mathcal{I}}^T)(\theta_t - \tilde{\theta}) \}\, \pi_{11}(\theta_t) = \infty$, for all sequences $\{\theta_t\}$ such that $\theta_t^0 - \epsilon < \theta_t < \theta_t^0 + \epsilon$. It follows from Equation (A.8) that $E_1(w^r_{\setminus \mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$.

Proof of Example 3.1

To avoid heavy algebra, we consider only the case $\theta_0 = \tilde{\theta}$, although the result is true for an arbitrary $\theta_0$. If $h_{ii} = 1/2 + 1/\sum_{j=1}^n x_j^2$, then $X^T X - 2 x_i x_i^T = -2$.
Some algebraic manipulations yield
\[ E(w^2_{\setminus i}(\theta, \sigma^2) \mid y) \propto \int_0^\infty (\sigma^2)^{-(n-2)/2} \exp\{ -\mathrm{RSS}^*_{\setminus i}/(2\sigma^2) \} \times \Big[ \int_0^\infty \exp\{ x/\sigma^2 - x^2 \}\, x^{-1/2}\, dx \Big]\, \pi_{12}(\sigma^2)\, d\sigma^2 \]
and for the interior integral with respect to $x$ the following bounds hold:
\[ \int_0^1 \exp\{ x/\sigma^2 - x^2 \}\, dx \le \int_0^\infty \exp\{ x/\sigma^2 - x^2 \}\, x^{-1/2}\, dx \le \exp\{ 1/\sigma^2 \} \int_0^1 x^{-1/2}\, dx + \int_1^\infty \exp\{ x/\sigma^2 - x^2 \}\, dx. \tag{A.10} \]
Furthermore, $\int_a^b \exp\{ x/\sigma^2 - x^2 \}\, dx \propto \exp\{ (2\sigma^2)^{-2} \}$, for all $-\infty \le a < b \le \infty$. This fact and the second inequality in (A.10) imply that if $\pi_{12} \propto \exp(-(\sigma^2)^{-2} - \sigma^2)$ then
\[ E(w^2_{\setminus i}(\theta, \sigma^2) \mid y) < \mathrm{const} \times \int_0^1 (\sigma^2)^{-(n-2)/2} \exp\{ -(\sigma^2)^{-2} - \sigma^2 - (\mathrm{RSS}^*_{\setminus i}/2 - 1)/\sigma^2 \}\, d\sigma^2 + \mathrm{const} \times \int_0^1 (\sigma^2)^{-(n-2)/2} \exp\{ -\tfrac{3}{4} (\sigma^2)^{-2} - \sigma^2 - \mathrm{RSS}^*_{\setminus i}/(2\sigma^2) \}\, d\sigma^2 < \infty. \]
On the other hand, if $\pi_{12}(\sigma^2) \propto \exp(-(\sigma^2)^{-3/2} - \sigma^2)$, then the first inequality in (A.10) yields
\[ E(w^2_{\setminus i}(\theta, \sigma^2) \mid y) > \mathrm{const} \times \int_0^\infty (\sigma^2)^{-(n-2)/2} \exp\{ -(\sigma^2)^{-3/2} - \sigma^2 - \mathrm{RSS}^*_{\setminus i}/(2\sigma^2) + (2\sigma^2)^{-2} \}\, d\sigma^2 = \infty. \]

Proof of Theorem 4.1

To simplify the notation, in this proof we will write $A(\kappa)$ for $A(\mathcal{I}, r, \kappa)$, $B(\kappa)$ for $B(\mathcal{I}, r, \kappa)$ and $C$ for $C(\mathcal{I}, r)$. Simple algebraic manipulations show that
\[
\begin{aligned}
E(w^r_{\setminus \mathcal{I}}(m, \sigma^2, \kappa) \mid v) & \propto \int w^r_{\setminus \mathcal{I}}(m, \sigma^2, \kappa)\, q(m, \sigma^2, \kappa)\, dm\, d\sigma^2\, d\kappa \\
& = \int (\sigma^2)^{-(n - rI)/2 - 1} \exp\Big\{ -\frac{1}{2\sigma^2} \big[ A(\kappa) m^2 - 2B(\kappa) m + C \big] \Big\}\, \pi_2(\kappa)\, dm\, d\sigma^2\, d\kappa \qquad \text{(A.11)} \\
& = \int (\sigma^2)^{-(n - rI)/2 - 1} \exp\Big\{ -\frac{1}{2\sigma^2} \Big[ C - \frac{B^2(\kappa)}{A(\kappa)} \Big] \Big\} \exp\Big\{ -\frac{A(\kappa)}{2\sigma^2} \Big[ m - \frac{B(\kappa)}{A(\kappa)} \Big]^2 \Big\}\, \pi_2(\kappa)\, dm\, d\sigma^2\, d\kappa. \qquad \text{(A.12)}
\end{aligned}
\]
Suppose first that conditions (a), (b) and (c) are satisfied.
It follows from (a) that $A(\kappa) > 0$ for almost all $\kappa$, so that $\exp\{ -(A(\kappa)/(2\sigma^2)) [m - B(\kappa)/A(\kappa)]^2 \}$ is proportional to the normal density with mean $B(\kappa)/A(\kappa)$ and variance $\sigma^2/A(\kappa)$. Then, denoting by $\Phi$ the standard normal cumulative distribution function, integral (A.12) reduces to
\[ (2\pi)^{1/2} \int (\sigma^2)^{-(n - rI - 1)/2 - 1} \exp\Big\{ -\frac{1}{2\sigma^2} \Big[ C - \frac{B^2(\kappa)}{A(\kappa)} \Big] \Big\} \Big[ 1 - \Phi\Big( \frac{-B(\kappa)}{(\sigma^2 A(\kappa))^{1/2}} \Big) \Big] A^{-1/2}(\kappa)\, \pi_2(\kappa)\, d\sigma^2\, d\kappa \tag{A.13} \]
\[ \le \int (2\pi)^{1/2} (\sigma^2)^{-(n - rI - 1)/2 - 1} \exp\Big\{ -\frac{1}{2\sigma^2} \Big[ C - \frac{B^2(\kappa)}{A(\kappa)} \Big] \Big\} A^{-1/2}(\kappa)\, \pi_2(\kappa)\, d\sigma^2\, d\kappa. \tag{A.14} \]
Under conditions (b) and (c), the integrand in (A.14) is proportional to an inverse gamma density for almost all $\kappa$ and integral (A.14) is proportional to
\[ \int \Big[ C - \frac{B^2(\kappa)}{A(\kappa)} \Big]^{-(n - rI - 1)/2} A^{-1/2}(\kappa)\, \pi_2(\kappa)\, d\kappa. \tag{A.15} \]
Moreover, condition (c) implies that $[C - B^2(\kappa)/A(\kappa)]^{-(n - rI - 1)/2}$ is a bounded (continuous) function of $\kappa$ on $\mathcal{N}^c$ so that if $\int A^{-1/2}(\kappa)\, \pi_2(\kappa)\, d\kappa < \infty$ then integral (A.15) is finite. Condition (a) implies that $\sum_{i \in \mathcal{I}} c_i^2 / \sum_{i=1}^n c_i^2 = \lim_{\kappa \to \infty} l(\mathcal{I}, \kappa) < 1/r$ or, equivalently, that $\sum_{i \notin \mathcal{I}} c_i^2 - (r - 1) \sum_{i \in \mathcal{I}} c_i^2 > 0$ so that, as $\kappa$ tends to infinity, $A(\kappa)$ behaves like $1/\kappa^2$. Hence, the finiteness of $\int A^{-1/2}(\kappa)\, \pi_2(\kappa)\, d\kappa$ is guaranteed by $\int \kappa\, \pi_2(\kappa)\, d\kappa < \infty$. Sufficiency of conditions (a), (b) and (c) follows.

Assume now that conditions (a), (b) and (d) hold. Then $E(w^r_{\setminus \mathcal{I}}(m, \kappa, \sigma^2) \mid v)$ is still proportional to integral (A.13). We will prove that under conditions (b) and (d) the integral is finite.
It follows from condition (d) that $B(\kappa) < 0$ almost surely and, for every fixed $\epsilon > 0$, we can find a constant $M_1 > 0$ such that
\[ 1 - \Phi\Big( \frac{-B(\kappa)}{(\sigma^2 A(\kappa))^{1/2}} \Big) \le \frac{1 + \epsilon}{\sqrt{2\pi}} \times \frac{(\sigma^2 A(\kappa))^{1/2}}{|B(\kappa)|} \times \exp\Big\{ -\frac{1}{2\sigma^2} \frac{B^2(\kappa)}{A(\kappa)} \Big\}, \]
$\forall \sigma^2 < (1/M_1^2)\, B^2(\kappa)/A(\kappa)$. Therefore, an upper bound for integral (A.13) is
\[
\begin{aligned}
& (1 + \epsilon) \int_0^\infty \int_0^{M(\kappa)} (\sigma^2)^{-(n - rI)/2} \exp\Big\{ -\frac{C}{2\sigma^2} \Big\} |B(\kappa)|^{-1}\, \pi_2(\kappa)\, d\sigma^2\, d\kappa \\
& \quad + \int_0^\infty \int_{M(\kappa)}^\infty (\sigma^2)^{-(n - rI - 1)/2 - 1} \exp\Big\{ -\frac{1}{2\sigma^2} \Big[ C - \frac{B^2(\kappa)}{A(\kappa)} \Big] \Big\} A^{-1/2}(\kappa)\, \pi_2(\kappa)\, d\sigma^2\, d\kappa := I_1 + I_2
\end{aligned}
\]
where $M(\kappa) := B^2(\kappa)/[M_1^2 A(\kappa)]$. With regard to integral $I_1$, observe that
\[ I_1 \le (1 + \epsilon) \int_0^\infty \int_0^{M(\kappa)} M(\kappa)\, (\sigma^2)^{-(n - rI)/2 - 1} \exp\Big\{ -\frac{C}{2\sigma^2} \Big\} |B(\kappa)|^{-1}\, \pi_2(\kappa)\, d\sigma^2\, d\kappa. \]
Under conditions (b) and (d), $(\sigma^2)^{-(n - rI)/2 - 1} \exp\{ -C/(2\sigma^2) \}$ is proportional to an inverse gamma density so that $I_1 \le M_2 \int_0^\infty (M(\kappa)/|B(\kappa)|)\, \pi_2(\kappa)\, d\kappa = (M_2/M_1^2) \int_0^\infty (|B(\kappa)|/A(\kappa))\, \pi_2(\kappa)\, d\kappa$, for some constant $M_2 > 0$. Moreover, as $\kappa$ tends to $\infty$, it follows from condition (d) that $|B(\kappa)|$ behaves like $1/\kappa$ and, as seen earlier, it follows from condition (a) that $A(\kappa)$ behaves like $1/\kappa^2$. Hence, we conclude that $\int \kappa\, \pi_2(\kappa)\, d\kappa < \infty$ implies $\int (|B(\kappa)|/A(\kappa))\, \pi_2(\kappa)\, d\kappa < \infty$. With regard to $I_2$, under condition (b) we obtain:
\[ I_2 \le \int_0^\infty M(\kappa)^{-(n - rI - 1)/2} \max\Big\{ 1, \exp\Big\{ -\frac{C - B^2(\kappa)/A(\kappa)}{2M(\kappa)} \Big\} \Big\} A^{-1/2}(\kappa)\, \pi_2(\kappa)\, d\kappa. \]
Conditions (a) and (d) together yield $\sup_{\kappa \in \mathcal{N}^c} B^2(\kappa)/A(\kappa) < \infty$ and $\inf_{\kappa \in \mathcal{N}^c} M(\kappa) > 0$, and the previous integral is finite if $\int A^{-1/2}(\kappa)\, \pi_2(\kappa)\, d\kappa < \infty$. Sufficiency of conditions (a), (b), (d) follows.
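The normal tail bound used in the argument above is a version of the classical Mills-ratio inequality $1 - \Phi(x) \le \varphi(x)/x$ for $x > 0$, applied at $x = |B(\kappa)|/(\sigma^2 A(\kappa))^{1/2}$, so that the restriction $\sigma^2 < B^2(\kappa)/[M_1^2 A(\kappa)]$ corresponds to $x > M_1$. The following Python sketch evaluates the ratio of the two sides on a small grid. The grid values and function names are illustrative assumptions, not quantities from the paper.

```python
import math


def normal_upper_tail(x: float) -> float:
    """1 - Phi(x) for the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_bound(x: float) -> float:
    """phi(x) / x, the classical upper bound on 1 - Phi(x) for x > 0."""
    return math.exp(-0.5 * x * x) / (x * math.sqrt(2.0 * math.pi))


def tail_ratio(x: float) -> float:
    """Ratio (1 - Phi(x)) / (phi(x) / x); values below 1 confirm the bound."""
    return normal_upper_tail(x) / mills_bound(x)


# Illustrative grid of x = |B| / (sigma^2 A)^(1/2) values.
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
ratios = [tail_ratio(x) for x in xs]

if __name__ == "__main__":
    for x, ratio in zip(xs, ratios):
        print(f"x={x}: (1 - Phi(x)) / (phi(x)/x) = {ratio:.4f}")
```

On this grid the ratio stays below 1 and increases toward 1 as $x$ grows, which is consistent with a bound that holds up to a factor $(1 + \epsilon)$ once $x$ exceeds a threshold $M_1$.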
Conversely, if (e) holds, then the integrand in integral (A.11) approaches infinity at an exponential rate as $\sigma^2$ goes to zero, whereas if $n - rI \le 0$, the integrand approaches zero too slowly as $\sigma^2$ goes to infinity. Both (e) and $n - rI \le 0$ imply that integral (A.11) is infinite. Actually, nonintegrability follows even if $0 < n - rI \le 1$. To show this, suppose that $A(\mathcal{I}, r, \kappa) m^2 - 2B(\mathcal{I}, r, \kappa) m + C(\mathcal{I}, r) > 0$ for almost all $\kappa$ and that $n - rI > 0$. Thus, integral (A.11) is proportional to
\[ \int \big[ A(\mathcal{I}, r, \kappa) m^2 - 2B(\mathcal{I}, r, \kappa) m + C(\mathcal{I}, r) \big]^{-(n - rI)/2}\, \pi_2(\kappa)\, dm\, d\kappa, \]
but the interior integral with respect to $m$ is infinite if $(n - rI) \le 1$. Thus condition (f) implies $E(w^r_{\setminus \mathcal{I}}(m, \kappa, \sigma^2) \mid v) = \infty$.

The proof of Theorem 5.1 relies on the following lemma which relates a bound in terms of polar coordinates to the finiteness of the integral.

Lemma A.3. Suppose that $f(\beta)$ is continuous in $\beta$, $\beta \in \mathbb{R}^k$, and that, for some $M < \infty$ and $b < 0$, $|f(\beta)| \le \exp(b \|\beta\|)$ for all $\beta$ such that $\|\beta\| \ge M$. Then $\int_{\mathbb{R}^k} |f(\beta)|\, d\beta < \infty$.

Proof. Split the integral into two portions. For $\beta$ such that $\|\beta\| \le M$, we have the integral of a continuous function over a compact set. This integral is finite. The integral over the remaining portion of the space is also finite:
\[ \int_{\|\beta\| > M} |f(\beta)|\, d\beta \le \int_{\|\beta\| > M} \exp(b \|\beta\|)\, d\beta = \int_M^\infty c_k r^{k-1} \exp(br)\, dr < \infty, \]
where $c_k r^{k-1}$ is the surface area of the $k$ dimensional sphere of radius $r$.

Proof of Theorem 5.1

The expected $r$th moment of the case-deleted weight function can be written as an integral against the prior times the likelihood:
\[ \int w^r_{\setminus \mathcal{I}}(\beta)\, \pi(\beta)\, f(y \mid x, \beta)\, d\beta = \int \prod_{i \in \mathcal{I}} \frac{(1 + \exp\{\beta^T x_i\})^{r-1}}{\exp\{(r - 1) \beta^T x_i y_i\}}\, \exp(-\epsilon |\beta^T|_1) \prod_{i \notin \mathcal{I}} \frac{\exp\{\beta^T x_i y_i\}}{1 + \exp\{\beta^T x_i\}}\, d\beta. \]
In order to apply Lemma A.3, we consider a ray emanating from the origin in an arbitrary direction, specified by a particular $\beta$ under the constraint that $|\beta^T|_1 = 1$. In this fixed direction, the rate of decay (or increase) of the tail is determined by the maximum contribution, either 1 or $\exp\{\beta^T x_i\}$, from each term of the form $1 + \exp\{\beta^T x_i\}$ in the products above. Collecting terms, we have that the rate of decay is governed by
\[ \exp\Big( \sum_{i \notin \mathcal{I}} \beta^T x_i y_i - (r - 1) \sum_{i \in \mathcal{I}} \beta^T x_i y_i - \sum_{i \notin \mathcal{I}} \max(0, \beta^T x_i) + (r - 1) \sum_{i \in \mathcal{I}} \max(0, \beta^T x_i) - \epsilon |\beta^T|_1 \Big) = \exp(h(\beta, r, \epsilon)). \]
We consider the expression above, and note that we can obtain a (decreasing) exponential bound on the tail whenever the term inside the exponential is negative. If the corresponding expression is negative for every direction, we can construct a uniform bound which satisfies the assumption of the lemma which, in turn, allows us to conclude that the $r$th moment of the case-deleted weight function is finite.

The infinite $r$th moment case involves a positive value for some direction specified by $\beta$. In this event, since $h(\beta, r, \epsilon)$ is continuous in $\beta$, we conclude that there is a neighborhood of directions in which the integral along a ray is infinite. Thus, the integral is infinite, and so is the $r$th moment of the case-deleted weight function.

References

[1] Angier, N. Built for the Arctic: A Species' Splendid Adaptations. The New York Times January 27, 2004.
[2] Berger, J.O. (1985). Statistical Decision Theory and Bayesian Analysis (2nd edition). Springer Verlag, New York. MR0804611
[3] Bradlow, E. T., and Zaslavsky, A. M. (1997).
Case Influence Analysis in Bayesian Inference. J. Comput. Graph. Statist. 6 314–331.
[4] Carlin, B.P., and Louis, T.A. (2000). Bayes and Empirical Bayes methods for data analysis (2nd edition). Chapman & Hall, New York. MR1427749
[5] Chambers, J.M., and Hastie, T. (1991). Statistical models in S. Duxbury Press, North Scituate, MA.
[6] Cook, R.D., and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, New York. MR0675263
[7] Daley, D.J. (2001). The Moment Index of Minima. Journal of Applied Probability 38 33–36. MR1915531
[8] Daley, D.J., and Goldie, C.M. (2006). The Moment Index of Minima, II. Stat. Probab. Letters 76 831–837. MR2266097
[9] Doss, H. (1994). Discussion of the paper by L. Tierney: Markov Chains for Exploring Posterior Distributions. The Ann. Statist. 22 1701–1762. MR1329166
[10] Feigl, P., and Zelen, M. (1965). Estimation of exponential probabilities with concomitant information. Biometrics 21 826–838.
[11] George, E. I., and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881–889.
[12] Gelfand, A. E., and Sahu, S. K. (1999). Identifiability, improper priors, and Gibbs sampling for generalized linear models. J. Amer. Statist. Assoc. 94 247–253. MR1689229
[13] Geweke, J. (1989). Bayesian Inference in Econometric Models Using Monte Carlo Integration. Econometrica 57 1317–1339. MR1035115
[14] Henderson, H. V., and Searle, S. R. (1981). On deriving the inverse of a sum of matrices. SIAM Review 23 53–60. MR0605440
[15] Hodges, J. (1998). Some algebra and geometry for hierarchical models, applied to diagnostics. With discussion and a reply by the author. J. Roy. Statist. Soc. Ser. B 60 497–536. MR1625954
[16] Kong, A., Liu, J. S., and Wong, W. H. (1994).
Sequen tial Imputation and Bay e sian Mis sing Data Problems. J. Amer. St atist. Asso c. 8 9 27 8–288 . [17] Ma cEachern, S. N., and P er uggia, M. (2000). Imp ortance Link F unc- tion Estimation for Marko v Chain Mon te Carlo Methods. J. Comput. Gr ap h. Statist. 9 9 9–121 . MR18198 67 [18] Na t arajan, R., a nd McCulloch, C. E. (1995). A note on the exis tence of the p os terior distribution for a class o f mixed mo dels for binomial r esp onses. Biometrika 82 6 39–64 3. MR13662 87 [19] Per uggia, M. (1997). On the V ariability of Ca se-Deletion Imp ortance Sampling W eights in the Bayesian Linear Mo del. J. Amer. Statist. Asso c. 92 199– 207. MR14361 08 [20] Sa cher, G. A. and St affeldt, E. F. (1974). Relation of Gestation Time to Brain W eight for Placental Mammals: Implications for the Theo ry of V er tebrate Growth. A meric an N atu r alist 108 593– 6 15. [21] Smith, A. F. M., and Gelf and, A. E . (1 9 92). Bayesian Statistics With- out T ears: A Sampling-Resampling Perspective. A mer. Statist. 46 84 – 88. MR11655 66 [22] Spiegelhal ter, D. J., Thomas, A., Best, N.G., and Gilks, W. R. (1996). BUGS : Bayesia n infer enc e Using Gibbs Sampling, V ersion 0.5, (version ii), Cambridge, UK: MR C Biostatistics Unit. [23] T anner, M. A. (199 6). T o ols for statistic al infer enc e. Metho ds for t he ex- plor ation of p osterior distributions and likeli ho o d functions (3nd edition). Springer-V erlag, New Y ork. MR13963 11 [24] Tierney, L. (1994 ). Mar ko v Chains for Explo ring P osteriors Distributions. With discussion and a rejoinder b y the author. Ann. Statist. 22 1701– 1762. I. Epifani et al./Ca se-deletion imp ortanc e sampling est i mators 806 MR13291 66 [25] Weiss, R. (1992 ). Influence Diag nostics with the Gibbs Sampler. Comput- ing Scienc e and St atistics , ed. Newton, H. J., 2 4 266– 2 70. [26] Weiss, R. (1996 ). An Approach to Bay esian Sensitivity Analy sis. J. R oy. Statist. So c. S er. B. 58 739 –750. MR14101 88 [27] Weiss, R., a nd Cho, M. 
(1998). Bay esian Mar ginal Influence Ass essment. J. Statist. Plann. Infer enc e 71 163–1 77. MR16518 04
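The direction-wise criterion used in the proof above can be explored numerically. The following is a minimal sketch, not the authors' procedure: it evaluates the exponent $h(\beta, r, \epsilon)$ for a logistic regression with binary responses $y_i \in \{0, 1\}$ and searches randomly sampled directions with $|\beta^T|_1 = 1$ for a positive value. The function names (`h`, `random_l1_direction`, `max_h_over_directions`) and the toy data are hypothetical and introduced only for illustration. A random search can only detect a positive value of $h$; a negative maximum over the sampled directions suggests, but does not prove, that $h$ is negative in every direction.

```python
import random


def h(beta, r, eps, X, y, deleted):
    """Exponent h(beta, r, eps) governing the tail along the ray through beta.

    X is a list of covariate rows, y a list of 0/1 responses, and
    deleted the set of indices of the deleted cases (the set I).
    """
    total = -eps * sum(abs(b) for b in beta)  # prior term: -eps * |beta|_1
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        eta = sum(b * v for b, v in zip(beta, x_i))  # beta^T x_i
        if i in deleted:
            # deleted case: -(r-1) * eta * y_i + (r-1) * max(0, eta)
            total += (r - 1) * (max(0.0, eta) - eta * y_i)
        else:
            # retained case: eta * y_i - max(0, eta)
            total += eta * y_i - max(0.0, eta)
    return total


def random_l1_direction(dim, rng):
    """Draw a random direction and rescale it so that |beta|_1 = 1."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(abs(v) for v in raw)
    return [v / norm for v in raw]


def max_h_over_directions(r, eps, X, y, deleted, n_dirs=20000, seed=0):
    """Largest value of h found over randomly sampled unit-L1 directions."""
    rng = random.Random(seed)
    dim = len(X[0])
    return max(
        h(random_l1_direction(dim, rng), r, eps, X, y, deleted)
        for _ in range(n_dirs)
    )


# Hypothetical toy data: intercept plus one covariate, case 3 deleted.
X_toy = [[1.0, 0.0], [1.0, 1.0], [1.0, -1.0], [1.0, 2.0]]
y_toy = [0, 1, 0, 1]
deleted_toy = {3}
```

For example, `max_h_over_directions(3, 0.5, X_toy, y_toy, deleted_toy)` can be compared with zero: a positive result exhibits a direction along which the integrand grows, so the third moment of the case-deleted weight is infinite for these toy inputs. Because $h$ is positively homogeneous in $\beta$, its sign along a ray depends only on the direction, which is why the search can be restricted to $|\beta^T|_1 = 1$.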
