Bayesian computation for statistical models with intractable normalizing constants
Authors: Yves F. Atchadé, Nicolas Lartillot, Christian P. Robert
(April 2008)

Abstract: This paper deals with some computational aspects of the Bayesian analysis of statistical models with intractable normalizing constants. In the presence of intractable normalizing constants in the likelihood function, traditional MCMC methods cannot be applied. We propose an approach to sample from such posterior distributions. The method can be thought of as a Bayesian version of the MCMC-MLE approach of [8]. To the best of our knowledge, this is the first general and asymptotically consistent Monte Carlo method for such problems. We illustrate the method with examples from image segmentation and social network modeling. We also study the asymptotic behavior of the algorithm and obtain a strong law of large numbers for empirical averages.

AMS 2000 subject classifications: Primary 60C05, 60J27, 60J35, 65C40.
Keywords and phrases: Monte Carlo methods, Adaptive MCMC, Bayesian inference, Ising model, Image segmentation, Social network modeling.

Affiliations: Yves F. Atchadé (Department of Statistics, University of Michigan; yvesa@umich.edu), Nicolas Lartillot (LIRMM, Université de Montpellier 2; nicolas.lartillot@lirmm.fr), Christian P. Robert (CEREMADE, Université Paris-Dauphine, and CREST, INSEE; xian@ceremade.dauphine.fr).

1. Introduction

Statistical inference for models with intractable normalizing constants poses a major computational challenge. This problem occurs in the statistical modeling of many scientific problems; examples include the analysis of spatial point processes ([13]), image analysis ([10]), protein design ([11]) and many others. The problem can be described as follows. Suppose we have a dataset $x_0 \in (\mathcal{X},\mathcal{B})$ generated from a statistical model $e^{E(x,\theta)}\lambda(dx)/Z(\theta)$ with parameter $\theta \in (\Theta,\Xi)$, where the normalizing constant $Z(\theta) = \int_{\mathcal{X}} e^{E(x,\theta)}\lambda(dx)$ depends on $\theta$ and is not available in closed form. Let $\mu$ be the prior density of the parameter $\theta \in (\Theta,\Xi)$. The posterior distribution of $\theta$ given $x_0$ is then

$$\pi(\theta) \propto \frac{1}{Z(\theta)}\, e^{E(x_0,\theta)}\, \mu(\theta). \qquad (1)$$

When $Z(\theta)$ cannot be easily evaluated, Monte Carlo simulation from this posterior distribution is problematic even using Markov chain Monte Carlo (MCMC). [14] uses the term doubly intractable distribution to refer to posterior distributions of the form (1). Current Monte Carlo sampling methods do not allow one to deal with such models in a Bayesian framework. For example, a Metropolis-Hastings algorithm with proposal kernel $Q$ and target distribution $\pi$ would have acceptance ratio

$$\min\left(1,\; \frac{e^{E(x_0,\theta')}}{e^{E(x_0,\theta)}}\, \frac{Z(\theta)}{Z(\theta')}\, \frac{Q(\theta',\theta)}{Q(\theta,\theta')}\right),$$

which cannot be computed, as it involves the intractable normalizing constant $Z$ evaluated at $\theta$ and $\theta'$.

An early attempt to deal with this problem is the pseudo-likelihood approximation of Besag ([4]), which approximates the model $e^{E(x,\theta)}$ by a more tractable model. Pseudo-likelihood inference provides a first approximation but typically performs poorly (see e.g. [5]).
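To fix ideas, here is a minimal sketch of the pseudo-likelihood idea for an Ising-type model on a grid with spins in $\{-1,+1\}$: each site's conditional distribution given its neighbors is tractable even though $Z(\theta)$ is not. The function names and the free-boundary convention are ours, for illustration only.

```python
import numpy as np

def neighbor_sum(x):
    """Sum of the four nearest-neighbor spins at each site (free boundary)."""
    s = np.zeros_like(x, dtype=float)
    s[1:, :] += x[:-1, :]
    s[:-1, :] += x[1:, :]
    s[:, 1:] += x[:, :-1]
    s[:, :-1] += x[:, 1:]
    return s

def ising_log_pseudolikelihood(theta, x):
    """log PL(theta) = sum_s log p(x_s | neighbors, theta), with
    p(x_s = a | neighbors) proportional to exp(theta * a * neighbor sum)."""
    s = neighbor_sum(x)
    # log sigmoid(2 * theta * x_s * s), written stably with logaddexp
    return float(np.sum(-np.logaddexp(0.0, -2.0 * theta * x * s)))
```

Maximizing this criterion over $\theta$ (for instance by a one-dimensional grid search) avoids $Z(\theta)$ entirely, which is the appeal of the approach; the resulting inference, however, can be poor, as noted above.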
Maximum likelihood inference is possible: MCMC-MLE, a maximum likelihood approach using MCMC, was developed in the 1990s ([7, 8]). Another related approach to finding MLE estimates is Younes' algorithm ([18]), based on stochastic approximation. An interesting simulation study comparing these three methods is presented in [10].

Comparatively little work has been done to develop asymptotically exact methods for the Bayesian approach to this problem, but various approximate algorithms exist in the literature, often based on path sampling ([6]). Recently, [12] have shown that if exact sampling of $X$ from $e^{E(x,\theta)}/Z(\theta)$ (as a density on $(\mathcal{X},\mathcal{B})$) is possible, then a valid MCMC algorithm to sample from (1) can be developed; see also [14] for some improvements. Their approach uses a clever auxiliary variable algorithm. But intractable normalizing constants often occur in models for which exact sampling of $X$ is not possible or is very expensive. Another recent development is the approximate Bayesian computation scheme of Plagnol and Tavaré ([15]), which samples only approximately from the posterior distribution.

In this paper, we propose an adaptive Monte Carlo approach to sample from (1). Our algorithm generates a stochastic process (not Markov in general) $\{(X_n,\theta_n),\, n \ge 0\}$ in $\mathcal{X} \times \Theta$ such that as $n \to \infty$, the marginal distribution of $\theta_n$ converges to (1). It is clear that any method to sample from (1) will have to deal with the intractable normalizing constant $Z(\theta)$. In the auxiliary variable method of [12], computing $Z(\theta)$ is replaced, in a sense, by perfect sampling from $e^{E(x,\theta)}/Z(\theta)$. This strategy works well so long as perfect sampling is feasible and inexpensive. In the present work, we take another approach, building on the idea of estimating the entire function $Z$ from a single Monte Carlo sampler.

The starting point of the method is importance sampling. Suppose that for some $\theta^{(0)} \in \Theta$, we can sample (perhaps by MCMC) from the density $e^{E(x,\theta^{(0)})}/Z(\theta^{(0)})$ on $(\mathcal{X},\mathcal{B})$. Using this sample, we can certainly estimate $Z(\theta)/Z(\theta^{(0)})$ for any $\theta \in \Theta$. This is the same idea behind the MCMC-MLE algorithm of [8]. But it is well known that these estimates are typically very poor as $\theta$ gets far from $\theta^{(0)}$. Now, suppose that instead of a single point $\theta^{(0)}$, we generate a population $\{\theta^{(i)},\, i=1,\dots,d\}$ in $\Theta$ and that we can sample from $\Lambda^*(x,i) \propto e^{E(x,\theta^{(i)})}/Z(\theta^{(i)})$ on $\mathcal{X} \times \{1,\dots,d\}$. Then, as we show, efficient estimation of $Z(\theta)$ is in principle possible for any $\theta \in \Theta$. Building on [3] and the ideas sketched above, we propose an algorithm that generates a random process $\{(X_n,\theta_n),\, n \ge 0\}$ such that the marginal distribution of $X_n$ converges to $\Lambda^*$ and the marginal distribution of $\theta_n$ converges to (1). This random process is not a Markov chain in general, but we show (from first principles) that $\{\theta_n\}$ has limiting distribution $\pi$ and satisfies a strong law of large numbers.
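As a sketch of this starting point (the callable `E` and the sample list are placeholders in our notation): since $Z(\theta)/Z(\theta^{(0)})$ is the expectation of $e^{E(X,\theta)-E(X,\theta^{(0)})}$ under $e^{E(x,\theta^{(0)})}/Z(\theta^{(0)})$, draws from a single MCMC run give

```python
import numpy as np

def log_Z_ratio(E, theta, theta0, samples):
    """Estimate log(Z(theta) / Z(theta0)) from draws X_k of the density
    exp(E(x, theta0)) / Z(theta0), using the importance-sampling identity
    Z(theta) / Z(theta0) = E[ exp(E(X, theta) - E(X, theta0)) ]."""
    logw = np.array([E(x, theta) - E(x, theta0) for x in samples])
    # log-mean-exp, computed stably
    return float(np.logaddexp.reduce(logw) - np.log(len(logw)))
```

The variance of the weights $e^{E(X,\theta)-E(X,\theta^{(0)})}$ grows quickly as $\theta$ moves away from $\theta^{(0)}$, which is precisely the weakness just noted and the reason for working with a population $\{\theta^{(i)}\}$.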
The paper is organized as follows. A full description of the method, including practical implementation details, is given in Section 2. We illustrate the algorithm with three examples: the Ising model, a Bayesian image segmentation example, and a Bayesian model of social networks. The examples are presented in Section 4. Some theoretical aspects of the method are discussed in Section 3, with the proofs postponed to Section 6.

2. Sampling from posterior distributions with intractable normalizing constants

Throughout, we fix the sample space $(\mathcal{X},\mathcal{B},\lambda)$ and the parameter space $(\Theta,\Xi)$. The problem of interest is to sample from the posterior distribution (1) with

$$Z(\theta) = \int_{\mathcal{X}} e^{E(x,\theta)}\, \lambda(dx). \qquad (2)$$

Let $\{\theta^{(i)},\, i=1,\dots,d\}$ be a sequence of $d$ points in $\Theta$. Let $\Lambda^*$ be the probability measure on $\mathcal{X} \times \{1,\dots,d\}$ given by

$$\Lambda^*(x,i) = \frac{e^{E(x,\theta^{(i)})}}{d\, Z(\theta^{(i)})}, \qquad x \in \mathcal{X},\ i \in \{1,\dots,d\}. \qquad (3)$$

Let $\kappa(\theta,\theta')$ be a similarity kernel on $\Theta \times \Theta$ such that $\sum_{i=1}^d \kappa(\theta,\theta^{(i)}) = 1$ for all $\theta \in \Theta$. The starting point of the algorithm is the following decomposition of the partition function:

$$Z(\theta) = \int_{\mathcal{X}} e^{E(x,\theta)}\lambda(dx) = \sum_{i=1}^d \kappa(\theta,\theta^{(i)}) \int_{\mathcal{X}} e^{E(x,\theta)-E(x,\theta^{(i)})}\, e^{E(x,\theta^{(i)})}\, \lambda(dx) = d\sum_{i=1}^d \kappa(\theta,\theta^{(i)})\, Z(\theta^{(i)}) \int_{\mathcal{X}} e^{E(x,\theta)-E(x,\theta^{(i)})}\, \frac{e^{E(x,\theta^{(i)})}}{d\, Z(\theta^{(i)})}\, \lambda(dx) = \sum_{i=1}^d \int_{\mathcal{X}} \Lambda^*(x,i)\, h_\theta(x,i)\, \lambda(dx), \qquad (4)$$

where

$$h_\theta(x,i) = d\, \kappa(\theta,\theta^{(i)})\, Z(\theta^{(i)})\, e^{E(x,\theta)-E(x,\theta^{(i)})}. \qquad (5)$$

The interest of the decomposition (4) is that $\{Z(\theta^{(i)})\}$ and $\Lambda^*$ do not depend on $\theta$. Therefore, using samples from $\Lambda^*$, this decomposition gives an approach to estimate $Z(\theta)$ for all $\theta \in \Theta$. This estimate should be reliable provided $\theta$ is close to at least one particle $\theta^{(i)}$. The problem of sampling from probability measures such as $\Lambda^*$ has recently been considered by [3], building on the Wang-Landau algorithm of [17]. We follow and improve that approach here. The resulting estimate of $Z(\theta)$ can then continuously be fed to a second Monte Carlo sampler that carries out the simulation with respect to $\pi$. This suggests an adaptive Monte Carlo sampler to sample from (1), which we develop next.

For any $c = (c(1),\dots,c(d)) \in \mathbb{R}^d$, we define the following density function on $\mathcal{X} \times \{1,\dots,d\}$:

$$\Lambda_c(x,i) \propto e^{E(x,\theta^{(i)}) - c(i)}. \qquad (6)$$

With $c = \log Z$, $\Lambda_c = \Lambda^*$. The reader should think of $c$ as an estimate of $z$, with $z(i) := \log Z(\theta^{(i)})$. The algorithm will adaptively adjust $c$ such that the marginal distribution on $\{1,\dots,d\}$ is approximately uniform, in which case we should have $c(i) = \log Z(\theta^{(i)})$ (up to an additive constant).

Let $\{\gamma_n\}$ be a sequence of (possibly random) positive numbers. We propose a non-Markovian adaptive sampler that lives on $\mathcal{X} \times \{1,\dots,d\} \times \mathbb{R}^d \times \Theta$. We start from an initial state $(X_0, I_0, c_0, \theta_0) \in \mathcal{X} \times \{1,\dots,d\} \times \mathbb{R}^d \times \Theta$, where $c_0 \in \mathbb{R}^d$ is the initial estimate of $z$; for example, $c_0 \equiv 0$. At time $n$, given $(X_n, I_n, c_n, \theta_n)$, we first generate $X_{n+1}$ from $P_{I_n}(X_n,\cdot)$, where $P_i$ is a transition kernel on $(\mathcal{X},\mathcal{B})$ with invariant distribution $e^{E(x,\theta^{(i)})}/Z(\theta^{(i)})$. Next, we generate $I_{n+1}$ from the distribution on $\{1,\dots,d\}$ proportional to $e^{E(X_{n+1},\theta^{(i)}) - c_n(i)}$.
Then we update the current estimate of $\log Z$ to $c_{n+1}$, given by

$$c_{n+1}(i) = c_n(i) + \gamma_n\, \frac{e^{E(X_{n+1},\theta^{(i)}) - c_n(i)}}{\sum_{j=1}^d e^{E(X_{n+1},\theta^{(j)}) - c_n(j)}}, \qquad i = 1,\dots,d. \qquad (7)$$

In view of (4), we can estimate $Z(\theta)$ by

$$Z_{n+1}(\theta) = \sum_{i=1}^d \kappa(\theta,\theta^{(i)})\, e^{c_{n+1}(i)} \left[ \frac{\sum_{k=1}^{n+1} e^{E(X_k,\theta)-E(X_k,\theta^{(i)})}\, \mathbf{1}_i(I_k)}{\sum_{k=1}^{n+1} \mathbf{1}_i(I_k)} \right], \qquad (8)$$

with the convention that $0/0 = 0$. Finally, for any positive function $\zeta: \Theta \to (0,\infty)$, let $Q_\zeta$ be a transition kernel on $(\Theta,\Xi)$ with invariant distribution

$$\pi_\zeta(\theta) \propto \frac{1}{\zeta(\theta)}\, e^{E(x_0,\theta)}\, \mu(\theta). \qquad (9)$$

Given $(X_{n+1}, I_{n+1}, c_{n+1}, Z_{n+1}, \theta_n)$, we generate $\theta_{n+1}$ from $Q_{Z_{n+1}}(\theta_n,\cdot)$, where $Z_{n+1}$ is the function defined by (8). The algorithm can be summarized as follows.

Algorithm 2.1. Let $(X_0, I_0, c_0, \theta_0) \in \mathcal{X} \times \{1,\dots,d\} \times \mathbb{R}^d \times \Theta$ be the initial state of the algorithm. Let $\{\gamma_n\}$ be a (possibly random) sequence of positive numbers. At time $n$, given $(X_n, I_n, c_n, \theta_n)$:

1. Generate $X_{n+1}$ from $P_{I_n}(X_n,\cdot)$, where $P_i$ is any ergodic kernel on $(\mathcal{X},\mathcal{B})$ with invariant distribution $e^{E(x,\theta^{(i)})}/Z(\theta^{(i)})$.
2. Generate $I_{n+1}$ by sampling from the distribution on $\{1,\dots,d\}$ proportional to $e^{E(X_{n+1},\theta^{(i)}) - c_n(i)}$.
3. Compute $c_{n+1}$, the new estimate of $z$, using (7).
4. Using the function $Z_{n+1}$ defined by (8), generate $\theta_{n+1}$ from $Q_{Z_{n+1}}(\theta_n,\cdot)$.
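The following is a minimal sketch of steps 1-3 and the estimator (8), written for energies of the linear form $E(x,\theta) = S(x)\cdot\theta$ discussed in Remark 2.1 below, and with the Gaussian choice of $\kappa$ used in the simulations. The class, the user-supplied kernel `mcmc_step` and all names are our illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(v):
    w = np.exp(v - np.max(v))
    return w / w.sum()

class AdaptiveSampler:
    """Sketch of steps 1-3 of Algorithm 2.1 plus the estimator (8), assuming
    E(x, theta) = S(x) . theta. `mcmc_step(x, theta)` is a user-supplied
    ergodic kernel targeting exp(E(x, theta)) / Z(theta)."""

    def __init__(self, thetas, S, mcmc_step, x0, h=0.1, rng=None):
        self.thetas = np.asarray(thetas, dtype=float)   # particles, shape (d, q)
        self.S, self.mcmc_step, self.h = S, mcmc_step, h
        self.rng = rng or np.random.default_rng()
        self.c = np.zeros(len(self.thetas))  # estimate of log Z(theta^(i)), up to a constant
        self.x, self.i = x0, 0
        self.stats = [[] for _ in range(len(self.thetas))]  # S(X_k), grouped by label I_k

    def step(self, gamma):
        # Step 1: move X with the kernel attached to the current label I_n.
        self.x = self.mcmc_step(self.x, self.thetas[self.i])
        s = np.asarray(self.S(self.x), dtype=float)
        # Step 2: draw I_{n+1} proportional to exp(E(X_{n+1}, theta^(i)) - c_n(i)).
        p = softmax(self.thetas @ s - self.c)
        self.i = self.rng.choice(len(p), p=p)
        self.stats[self.i].append(s)
        # Step 3: Rao-Blackwellized Wang-Landau update (7).
        self.c += gamma * p

    def log_Z_hat(self, theta):
        """log of the estimate (8) of Z(theta), with a Gaussian kernel kappa."""
        theta = np.asarray(theta, dtype=float)
        kappa = softmax(-0.5 * np.sum((self.thetas - theta) ** 2, axis=1) / self.h ** 2)
        terms = []
        for i, group in enumerate(self.stats):
            if not group:              # 0/0 = 0 convention: skip unvisited labels
                continue
            lw = np.asarray(group) @ (theta - self.thetas[i])  # E(X_k,theta) - E(X_k,theta^(i))
            terms.append(np.log(kappa[i]) + self.c[i]
                         + np.logaddexp.reduce(lw) - np.log(len(lw)))
        return np.logaddexp.reduce(terms) if terms else -np.inf
```

Step 4 then makes one move of a Metropolis kernel on $\theta$ whose target is proportional to $e^{E(x_0,\theta) - \log Z_{n+1}(\theta)}\mu(\theta)$, with $\log Z_{n+1}$ given by `log_Z_hat`.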
Remark 2.1.

1. The algorithm can be seen as an MCMC-MCMC analog of the MCMC-MLE of [8]. Indeed, with $d = 1$, the decomposition (4) becomes $Z(\theta)/Z(\theta^{(1)}) = \mathbb{E}\left[e^{E(X,\theta)-E(X,\theta^{(1)})}\right]$, where the expectation is taken with respect to the density $e^{E(x,\theta^{(1)})}/Z(\theta^{(1)})$. But as discussed in the introduction, when $E(X,\theta)-E(X,\theta^{(1)})$ has a large variance, the resulting estimate is terribly poor.

2. We introduce $\kappa$ to serve as a smoothing factor, so that the particles $\theta^{(i)}$ close to $\theta$ contribute more to the estimation of $Z(\theta)$. We expect this smoothing step to reduce the variance of the overall estimate of $Z(\theta)$. In the simulations we choose

$$\kappa(\theta,\theta^{(i)}) = \frac{e^{-\frac{1}{2h^2}\|\theta-\theta^{(i)}\|^2}}{\sum_{j=1}^d e^{-\frac{1}{2h^2}\|\theta-\theta^{(j)}\|^2}}.$$

The value of the smoothing parameter $h$ is set by trial and error for each example.

3. The implementation of the algorithm requires keeping track of all the samples $X_k$ that are generated (Equation (8)). $\mathcal{X}$ can be a very high-dimensional space, and we are aware that in practice this bookkeeping can significantly slow down the algorithm. But in many cases the function $E$ takes the form $E(x,\theta) = \sum_{l=1}^K S_l(x)\,\theta_l$ for some real-valued functions $S_l$. In these cases, we only need to keep track of the statistics $\{(S_1(X_n),\dots,S_K(X_n)),\, n \ge 0\}$. All the examples discussed in the paper fall in this latter category.

4. As mentioned earlier, the update of $(X_n, I_n, c_n)$ is essentially the Wang-Landau algorithm of [3], with the following important difference: [3] propose to update $c_n$ one component per iteration, $c_{n+1}(i) = c_n(i) + \gamma_n \mathbf{1}_{\{i\}}(I_{n+1})$. We improve on this scheme in (7) by Rao-Blackwellization, where we integrate out $I_{n+1}$.

5. As mentioned above, and we stress this again, this algorithm is not Markovian in any way. The process $\{(X_n, I_n, c_n)\}$ is not a Markov chain, although it becomes a nonhomogeneous Markov chain if we let $\{\gamma_n\}$ be a deterministic sequence. The main random process of interest, $\{\theta_n\}$, is not a Markov chain either. Nevertheless, the marginal distribution of $\theta_n$ will typically converge to $\pi$. This is because $Q_{Z_n}$, the conditional distribution of $\theta_{n+1}$ given $\mathcal{F}_n$, converges to $Q_Z$ as $n \to \infty$, and $Q_Z$ is a kernel with invariant distribution $\pi$. We make this precise by showing that a strong law of large numbers holds for additive functionals of $\{\theta_n\}$.
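For concreteness, here is the Gaussian similarity kernel of item 2 above as a standalone function (a sketch in our notation; $h$ is the smoothing parameter set by trial and error):

```python
import numpy as np

def kappa(theta, particles, h):
    """Gaussian similarity weights kappa(theta, theta^(i)), normalized so that
    they sum to one over the particle population, as required by (4)."""
    d2 = np.sum((np.asarray(particles) - np.asarray(theta)) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / h ** 2)
    return w / w.sum()
```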
We now discuss the choice of the parameters of the algorithm.

2.1. Choosing d and the particles {θ^(i)}

We do not have any general approach to choosing $d$ and $\{\theta^{(i)}\}$, but we give some guidelines. The general idea is that the particles $\{\theta^{(i)}\}$ need to cover reasonably well the range of the density $\pi$, and be such that for any $\theta \in \Theta$ the density $e^{E(x,\theta)}/Z(\theta)$ on $\mathcal{X}$ can be well approximated by at least one of the densities $e^{E(x,\theta^{(i)})}/Z(\theta^{(i)})$. One possibility is to sample the $\theta^{(i)}$ independently from the prior distribution $\mu$, some tempered version of it, or some other similar distribution. We follow this approach in the examples below. Another possibility is to use a grid of points in $\Theta$. The value of $d$, the number of particles, should depend on the size of $\Theta$. We need to choose $d$ such that the distributions $e^{E(x,\theta^{(i)})}/Z(\theta^{(i)})$ (on $\mathcal{X}$) overlap; otherwise, estimating the constants $\{Z(\theta^{(i)})\}$ can be difficult. This implies that $d$ should not be too small. In the simulation examples below we choose $d$ between 100 and 500.

2.2. Choosing the step-size {γ_n}

It is shown in [3] that the recursion (7) can also be written as a stochastic approximation algorithm with step-size $\{\gamma_n\}$, so that in theory any positive sequence $\{\gamma_n\}$ such that $\sum \gamma_n = \infty$ and $\sum \gamma_n^2 < \infty$ can be used. But the convergence of $c_n$ to $\log Z$ is very sensitive to the choice of $\{\gamma_n\}$. If the $\gamma_n$ are overly small, the recursive equation (7) will make very small steps; if they are overly large, the algorithm will have a large variance. In both cases, convergence to the solution will be slow. Overall, choosing the right step-size for a stochastic approximation algorithm is a difficult problem. Here we follow [3], which elaborated on a heuristic approach to this problem originally proposed by [17]. The main idea of the method is that, typically, the larger $\gamma_n$, the easier it is for the algorithm to move around the state space. Therefore, $\gamma_0$ is initially set at a relatively large value. This value is kept until $\{I_n\}$ has visited all the points of $\{1,\dots,d\}$ equally well. Let $\tau_1$ be the first time the occupation measure of $\{1,\dots,d\}$ by $\{I_n\}$ is approximately uniform. Then we set $\gamma_{\tau_1+1}$ to some smaller value (for example $\gamma_{\tau_1+1} = \gamma_{\tau_1}/2$), and the process is iterated until $\gamma_n$ becomes sufficiently small, at which point we can choose to switch to a deterministic sequence of the form $\gamma_n = n^{-1/2-\varepsilon}$. Combining this idea with Algorithm 2.1, we get the following.

Algorithm 2.2. Let $\gamma > \varepsilon_1 > 0$, $\varepsilon_2 > 0$ be constants and let $(X_0, I_0, c_0, \theta_0)$ be some arbitrary initial state of the algorithm. Set $v = 0 \in \mathbb{R}^d$ and $n = 0$. While $\gamma > \varepsilon_1$ and given $\mathcal{F}_n = \sigma\{(X_k, I_k, c_k, \theta_k),\, k \le n\}$:

1. Generate $(X_{n+1}, I_{n+1}, c_{n+1}, \theta_{n+1})$ as in Algorithm 2.1.
2. For $i = 1,\dots,d$: set $v(i) = v(i) + \mathbf{1}_i(I_{n+1})$.
3. If $\max_i \left| \frac{v(i)}{\sum_j v(j)} - \frac{1}{d} \right| \le \frac{\varepsilon_2}{d}$, then set $\gamma = \gamma/2$ and set $v = 0 \in \mathbb{R}^d$.

Remark 2.2. We use this algorithm in the examples below with the following specifications. We set the initial $\gamma$ to 1 and $\varepsilon_2 = 0.2$. We run $\{(X_n, I_n, c_n)\}$ until $\gamma \le \varepsilon_1 = 0.001$ before starting $\{\theta_n\}$ and switching to the deterministic sequence $\gamma_n = \varepsilon_1/n^{0.7}$.
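A sketch of this flat-histogram schedule, wrapped around the hypothetical `AdaptiveSampler` from the sketch after Algorithm 2.1 (the constants follow Remark 2.2; the uniformity test implements the criterion of step 3 under our reading of it):

```python
import numpy as np

def run_flat_histogram(sampler, d, gamma0=1.0, eps1=1e-3, eps2=0.2):
    """Algorithm 2.2 schedule: halve gamma each time the occupation of the
    labels {1,...,d} by {I_n} is approximately uniform (a flat histogram)."""
    gamma, v = gamma0, np.zeros(d)
    while gamma > eps1:
        sampler.step(gamma)          # steps 1-3 of Algorithm 2.1
        v[sampler.i] += 1.0          # occupation counts of the labels
        if np.max(np.abs(v / v.sum() - 1.0 / d)) <= eps2 / d:
            gamma, v = gamma / 2.0, np.zeros(d)
    return sampler
```

After the loop, one starts the chain $\{\theta_n\}$ and switches to the deterministic sequence $\gamma_n = \varepsilon_1/n^{0.7}$, as in Remark 2.2.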
3. Convergence analysis

In this section, we derive a law of large numbers under some verifiable conditions. The process of interest here is $\{\theta_n\}$. Let $(\Omega, \mathcal{F}, \Pr)$ be the reference probability triplet. We equip $(\Omega, \mathcal{F}, \Pr)$ with the filtration $\{\mathcal{F}_n\}$, where $\mathcal{F}_n = \sigma\{(X_{k+1}, I_{k+1}, c_{k+1}, \theta_k),\, k \le n\}$. Note that $\mathcal{F}_n$ includes $(X_{n+1}, I_{n+1}, c_{n+1})$, since these random variables are used in generating $\theta_{n+1}$. From the definition of the algorithm, we have

$$\Pr(\theta_{n+1} \in A \mid \mathcal{F}_n) = Q_{Z_{n+1}}(\theta_n, A), \quad \Pr\text{-a.s.} \qquad (10)$$

We see from (10) that $\{\theta_n, \mathcal{F}_n\}$ is an adaptive Monte Carlo algorithm with varying target distribution. In analyzing $\{\theta_n, \mathcal{F}_n\}$ here, we do not strive for the most general result but restrict ourselves to conditions that can be easily checked in the examples considered in the paper. We assume that $\Theta$ is a compact subset of $\mathbb{R}^q$, the $q$-dimensional Euclidean space equipped with its Borel $\sigma$-algebra and the Lebesgue measure. First, we assume that the function $E$ is bounded:

(A1): There exist $m, M \in \mathbb{R}$ such that

$$m \le E(x,\theta) \le M, \quad x \in \mathcal{X},\ \theta \in \Theta. \qquad (11)$$

In many applications, and this is the case for the examples discussed below, $\mathcal{X}$ is a finite set (typically very large) and $\Theta$ is a compact set. In these cases, (A1) is easily checked.

In order to proceed any further, we need some notation. A transition kernel $P$ on $(\mathcal{X},\mathcal{B})$ operates on measurable real-valued functions $f$ as $Pf(x) = \int P(x,dy) f(y)$, and the product of two transition kernels $P_1$ and $P_2$ is the transition kernel defined as $P_1 P_2(x,A) = \int P_1(x,dy) P_2(y,A)$. We can then define recursively $P^n = P P^{n-1}$, $n \ge 1$, $P^0(x,A) = \mathbf{1}_A(x)$. For two probability measures $\mu, \nu$, the total variation distance between $\mu$ and $\nu$ is defined as $\|\mu - \nu\|_{TV} := \sup_A |\mu(A) - \nu(A)|$. We say that a transition kernel $P$ is geometrically ergodic if $P$ is $\varphi$-irreducible, aperiodic and has an invariant distribution $\pi$ such that

$$\|P^n(x,\cdot) - \pi\|_{TV} \le M(x)\, \rho^n, \quad n \ge 0,$$

for some $\rho \in (0,1)$ and some function $M: \mathcal{X} \to (0,\infty]$. Our next assumption involves the transition kernel $Q_\zeta$.

(A2): For $\zeta: \Theta \to (0,\infty)$, $Q_\zeta$ is a Metropolis kernel with invariant distribution $\pi_\zeta$ and proposal kernel density $p$. There exist $\varepsilon > 0$ and an integer $n_0 \ge 1$ such that for all $\theta, \theta' \in \Theta$:

$$p^{n_0}(\theta,\theta') \ge \varepsilon. \qquad (12)$$

Remark 3.1.

1. The condition (12) clearly holds for most symmetric proposal kernels $p(\theta,\theta')$, provided that $p(\theta,\theta')$ remains bounded away from 0 on some ball centered at $\theta$.

2. (12) often implies that $Q_\zeta$ is uniformly ergodic:

$$Q_\zeta(\theta,A) \ge \int_A \min\left(1,\; e^{E(x_0,\theta')-E(x_0,\theta)}\, \frac{\zeta(\theta)}{\zeta(\theta')}\right) p(\theta,\theta')\, d\theta' \ge e^{m-M} \inf_{\theta,\theta' \in \Theta} \frac{\zeta(\theta)}{\zeta(\theta')} \int_A p(\theta,\theta')\, d\theta'.$$

Therefore, provided $\inf_{\theta,\theta' \in \Theta} \zeta(\theta)/\zeta(\theta') > 0$, if (12) holds then $Q_\zeta^{n_0}(\theta,A) \ge \varepsilon' \mu_{Leb}(A)$ for some $\varepsilon' > 0$.

(A3): $\{\gamma_n\}$ is a random sequence, adapted to $\{\mathcal{F}_n\}$, such that $\gamma_n > 0$, $\sum \gamma_n = \infty$ and $\sum \gamma_n^2 < \infty$, $\Pr$-a.s.

Theorem 3.1. Assume (A1)-(A3). Assume also that each kernel $P_i$ on $(\mathcal{X},\mathcal{B})$ is geometrically ergodic. Let $h: (\Theta,\Xi) \to \mathbb{R}$ be such that $|h| \le 1$. Then

$$\frac{1}{n}\sum_{k=1}^n h(\theta_k) \to \pi(h), \quad \Pr\text{-a.s.} \qquad (13)$$

Proof. See Section 6.

4. Examples

4.1. Ising model

We test the algorithm with the Ising model on a rectangular lattice. This is a simulated example. The model is given by $e^{\theta E(x)}/Z(\theta)$, where

$$E(x) = \sum_{i=1}^m \sum_{j=1}^{n-1} x_{ij} x_{i,j+1} + \sum_{i=1}^{m-1} \sum_{j=1}^n x_{ij} x_{i+1,j}, \qquad (14)$$

and $x_{ij} \in \{-1, 1\}$. We use $m = n = 64$. We generate the data $x_0$ from $e^{\theta E(x)}/Z(\theta)$ with $\theta = 0.40$ by perfect sampling using the Propp-Wilson algorithm. Using $x_0$ and postulating the model $e^{\theta E(x)}/Z(\theta)$, we would like to infer $\theta$. We use the prior $\mu(\theta) = \mathbf{1}_{(0,1)}(\theta)$. We set $d = 100$ and generate the points $\{\theta^{(i)}\}$ independently and uniformly in $(0,1)$. As described in Section 2.2, we use the flat-histogram approach in selecting $\{\gamma_n\}$, with an initial value $\gamma_0 = 1$, until $\gamma_n$ becomes smaller than 0.001. Then we start feeding the adaptive chain $\{\theta_n\}$, which is run for 10,000 iterations. In updating $\theta_n$, we use a Random Walk Metropolis sampler with proposal distribution $U(\theta_n - b, \theta_n + b)$ (with reflection at the boundaries) for some $b > 0$. We adaptively update $b$ so as to reach an acceptance rate of 30% (see e.g. [2]). We discard the first 1,999 points as a burn-in period. The results are plotted in Figure 1. As we can see from these plots, the sampler appears to have converged to the posterior distribution $\pi$. The mixing rate of the algorithm, as inferred from the autocorrelation function, seems fairly good. In addition, the algorithm yields an estimate of the partition function $\log Z(\theta)$, which can be re-used in other sampling problems.

[Figure 1: Output for the Ising model, $\theta = 0.40$, $m = n = 64$. (a): estimation of $\log Z(\theta)$ up to an additive constant; (b)-(d): trace plot, histogram and autocorrelation function of the adaptive sampler $\{\theta_n\}$.]
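For reference, a sketch of the two Ising ingredients used in this example: the energy (14), and a single-site heat-bath (Gibbs) sweep that can play the role of the kernel $P_i$ in Algorithm 2.1. Perfect sampling by Propp-Wilson, which the authors use to generate $x_0$, is not reproduced here; the names are ours.

```python
import numpy as np

def ising_energy(x):
    """E(x) of (14): sum of products of horizontally and vertically adjacent spins."""
    return float(np.sum(x[:, :-1] * x[:, 1:]) + np.sum(x[:-1, :] * x[1:, :]))

def gibbs_sweep(x, theta, rng):
    """One heat-bath sweep targeting exp(theta * E(x)) / Z(theta), x_ij in {-1,+1}."""
    m, n = x.shape
    for i in range(m):
        for j in range(n):
            s = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                    if 0 <= a < m and 0 <= b < n)
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * theta * s))  # P(x_ij = +1 | rest)
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x
```

Since $E(x,\theta) = \theta E(x)$ here, the sufficient statistic for the bookkeeping of Remark 2.1 is simply $S(x) = E(x)$; for instance, `S = lambda x: np.array([ising_energy(x)])` plugs into the `AdaptiveSampler` sketch above.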
4.2. An application to image segmentation

We use the Ising model above to illustrate an application of the methodology to image segmentation. In image segmentation, the goal is the reconstruction of images from noisy observations (see e.g. [9, 10]). We represent the image by a vector $x = \{x_i,\, i \in S\}$, where $S$ is an $m \times n$ lattice and $x_i \in \{1,\dots,K\}$. Each $i \in S$ represents a pixel, and $x_i$ is the color of pixel $i$; $K$ is the number of colors. Here we assume that $K = 2$ and $x_i \in \{-1, 1\}$ is either black or white. In the image segmentation problem, we do not observe $x$ but a noisy approximation $y$. We assume that

$$y_i \mid x, \sigma^2 \overset{\text{ind}}{\sim} N(x_i, \sigma^2), \qquad (15)$$

for some unknown parameter $\sigma^2$. Even though (15) is a continuous model, it has been shown to provide a relatively good framework for image segmentation problems with multiple additive sources of noise ([10]). We assume that the true image $x$ is generated from an Ising model (see Section 4.1) with interaction parameter $\theta$. We assume that $\theta$ follows a uniform prior distribution on $(0,1)$ and that $\sigma^2$ has a prior distribution proportional to $\frac{1}{\sigma^2}\mathbf{1}_{(0,\infty)}(\sigma^2)$. The posterior distribution of $(\theta, \sigma^2, x)$ is then given by

$$\pi(\theta, \sigma^2, x \mid y) \propto \left(\frac{1}{\sigma^2}\right)^{\frac{|S|}{2}+1} \frac{e^{\theta E(x)}}{Z(\theta)}\, e^{-\frac{1}{2\sigma^2}\sum_{s \in S}(y(s)-x(s))^2}\, \mathbf{1}_{(0,1)}(\theta)\, \mathbf{1}_{(0,\infty)}(\sigma^2), \qquad (16)$$

where $E$ is as in (14). We sample from this posterior distribution using the adaptive chain $\{(y_n, i_n, c_n, \theta_n, \sigma^2_n, x_n)\}$. The chain $\{(y_n, i_n, c_n)\}$ is updated following steps 1-3 of Algorithm 2.1 and is used to form the adaptive estimate of $Z(\theta)$ given by (8) (with $\{y_n, i_n\}$ replacing $\{X_n, I_n\}$). These estimates are used to update $(\theta_n, \sigma^2_n, x_n)$ using a Metropolis-within-Gibbs scheme. More specifically, given $(\sigma^2_n, x_n)$, we sample $\theta_{n+1}$ with a Random Walk Metropolis step with proposal $U(\theta_n - b, \theta_n + b)$ (with reflection at the boundaries) and target proportional to $e^{\theta E(x_n)}/Z_n(\theta)$. Given $(\theta_{n+1}, x_n)$, we generate $\sigma^2_{n+1}$ by sampling from the Inverse-Gamma distribution with parameters $\left(\frac{|S|}{2},\, \frac{1}{2}\sum_{s \in S}(y(s)-x(s))^2\right)$. And given $(\theta_{n+1}, \sigma_{n+1})$, we sample each $x_{n+1}(s)$ from its conditional distribution given $\{x(u),\, u \ne s\}$. This conditional distribution is given by

$$p(x(s) = a \mid x(u),\, u \ne s) \propto \exp\left(\theta a \sum_{u \sim s} x(u) - \frac{1}{2\sigma^2}(y(s)-a)^2\right), \quad a \in \{-1,1\},$$

where $u \sim s$ means that pixels $u$ and $s$ are neighbors.

To test this algorithm, we generate a simulated dataset $y$ according to model (15), with $x$ generated from $e^{\theta E(x)}/Z(\theta)$ by perfect sampling. We use $m = n = 64$, $\theta = 0.40$ and $\sigma = 0.5$. For the implementation details of the algorithm, we make exactly the same choices as in Example 4.1 above; in particular, we choose $d = 100$ and generate $\{\theta^{(i)}\}$ uniformly in $(0,1)$. The results are given in Figure 2. Once again, the sample path obtained from $\{\theta_n\}$ clearly suggests that the distribution of $\theta_n$ has converged to $\pi$, with a good mixing rate as inferred from the autocorrelation plots.

[Figure 2: Output for the image segmentation model. (a)-(c): plots of $\{\theta_n\}$; (d)-(f): plots of $\{\sigma_n\}$.]
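A sketch of the single-pixel conditional update just described (names ours; `rng` is a NumPy random generator):

```python
import numpy as np

def update_pixel(x, y, s, theta, sigma2, rng):
    """Sample x(s) in {-1,+1} from the conditional of Section 4.2:
    p(x(s)=a | rest) prop. to exp(theta*a*(neighbor sum) - (y(s)-a)^2 / (2*sigma2))."""
    i, j = s
    m, n = x.shape
    nbr = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
              if 0 <= a < m and 0 <= b < n)
    # log p(+1) - log p(-1) = 2*theta*nbr + 2*y(s)/sigma2
    logit = 2.0 * theta * nbr + 2.0 * y[i, j] / sigma2
    p_plus = 1.0 / (1.0 + np.exp(-logit))  # P(x(s) = +1 | rest)
    x[i, j] = 1 if rng.random() < p_plus else -1
```

Together with the Inverse-Gamma draw for $\sigma^2$ and the Random Walk Metropolis step for $\theta$, this completes one sweep of the Metropolis-within-Gibbs scheme.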
4.3. Social network modeling

We now give an application of the method to a Bayesian analysis of social networks. Statistical modeling of social networks is a growing subject in social science (see e.g. [16] and the references therein for more details). The set-up is the following. We have $n$ actors $I = \{1,\dots,n\}$. For each pair $(i,j) \in I \times I$, define $y_{ij} = 1$ if actor $i$ has ties with actor $j$ and $y_{ij} = 0$ otherwise. In the example below, we only consider the case of a symmetric relationship, where $y_{ij} = y_{ji}$ for all $i, j$. The ties referred to here can be of various natures. For example, we might be interested in modeling how friendships build up between co-workers or how research collaboration takes place between colleagues. Another interesting example, from political science, is modeling co-sponsorship ties (for a given piece of legislation) between members of a house of representatives or parliament. In this example we study the Medici business network dataset taken from [16], which describes the business ties between 16 Florentine families. Numbering the families arbitrarily from 1 to 16, we plot the observed social network in Figure 3. The dataset contains relatively few ties between families, and even fewer transitive ties.

[Figure 3: Business relationships between 16 Florentine families.]

One of the most popular models for social networks is the class of exponential random graph models. In these models, we assume that $\{y_{ij}\}$ is a sample generated from the distribution

$$p(y \mid \theta_1,\dots,\theta_K) \propto \exp\left(\sum_{i=1}^K \theta_i S_i(y)\right), \qquad (17)$$

for some parameters $\theta_1,\dots,\theta_K$, where $S_i(y)$ is a statistic used to capture some aspect of the network. For this example, and following [16], we consider a 4-dimensional model where $S_1(y) = \sum_{i<j} y_{ij}$ counts the ties and $S_2, S_3, S_4$ are higher-order statistics of the network (see [16] for their definitions).
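To make (17) concrete, here is a sketch of the sufficient statistics and the unnormalized log-density for a symmetric binary network. The tie count corresponds to $S_1$; the triangle count is an illustrative stand-in for the higher-order statistics, whose exact definitions are in [16].

```python
import numpy as np

def ergm_statistics(y):
    """Statistics of a symmetric binary adjacency matrix y:
    number of ties, and number of triangles (illustrative choices)."""
    ties = np.triu(y, k=1).sum()
    triangles = np.trace(y @ y @ y) / 6.0
    return np.array([ties, triangles])

def ergm_log_density_unnormalized(theta, y):
    """The exponent of (17), i.e. log p(y | theta) + log Z(theta);
    Z(theta) itself is intractable, which Algorithm 2.1 is designed to handle."""
    return float(theta @ ergm_statistics(y))
```

A single-edge Metropolis or Gibbs flip of $y_{ij}$ only needs the change in these statistics, so the kernels $P_i$ remain cheap even though $Z(\theta)$ is intractable.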
We place a prior on $\theta = (\theta_1,\dots,\theta_4)$ and sample from the resulting posterior distribution (18) using the adaptive algorithm. We update $\theta_n$ with a Random Walk Metropolis sampler with Gaussian proposal $N(\theta_n, \sigma^2\Sigma)$, for some $\sigma > 0$ and a positive definite matrix $\Sigma$. We adaptively set $\sigma$ so as to reach an acceptance rate of 30%. Ideally, we would like to choose $\Sigma = \Sigma_\pi$, the variance-covariance matrix of $\pi$, which of course is not available; we adaptively estimate $\Sigma_\pi$ during the simulation as in [2]. As before, we run $(X_n, I_n, c_n)$ until $\gamma_n < 0.001$; then we start $\{\theta_n\}$ and run the full chain $(X_n, I_n, c_n, \theta_n)$ for a total of 25,000 iterations. The posterior distributions of the parameters are given in Figures 4a-4d. In Table 1, we give the sample posterior mean together with the 2.5% and 97.5% quantiles of the posterior distribution. Overall, these results are consistent with the maximum likelihood estimates obtained by [16] using MCMC-MLE. The main difference appears in $\theta_4$, which we find here not to be significant. As a by-product, the sampler gives an estimate of the variance-covariance matrix of the posterior distribution $\pi$:

$$\Sigma_\pi = \begin{pmatrix} 1.67 & -0.41 & 0.27 & -0.07 \\ -0.41 & 1.83 & -0.47 & -0.02 \\ 0.27 & -0.47 & 1.78 & -0.03 \\ -0.07 & -0.02 & -0.03 & 1.65 \end{pmatrix}. \qquad (19)$$

Table 1: Summary of the posterior distribution of the parameters: posterior means with 2.5% and 97.5% quantiles.

    Parameter    Post. mean    Post. quantiles
    θ1           -2.14         (-3.32, -0.81)
    θ2            0.94         (-0.43, 2.49)
    θ3           -1.06         (-2.72, 0.04)
    θ4            0.09         (-1.39, 1.07)

[Figures 4a-4d: The adaptive MCMC output from (18); trace plots, histograms and autocorrelation functions for $\{\theta_1\},\dots,\{\theta_4\}$, each based on 25,000 iterations.]

5. Conclusion

Sampling from posterior distributions with intractable normalizing constants is a difficult computational problem. Thus far, all methods proposed in the literature but one entail approximations that do not vanish asymptotically, and the only exception ([12]) requires exact sampling in the data space, which is only possible in very specific cases. In this work, we propose an approach that is both more general than [12] and satisfies a strong law of large numbers with limiting distribution equal to the target distribution. The few applications we have presented here suggest that the method is promising. It remains to be determined how the method will scale with the dimensionality and the size of the problems, although in this respect adaptations of the method involving annealing/tempering schemes are easy to imagine, which would allow large problems to be analysed properly.

Acknowledgements

The research of the third author has been partly supported by the Agence Nationale de la Recherche (ANR, 212, rue de Bercy, 75012 Paris) through the 2006-2008 project Adap'MC.

6. Proof of Theorem 3.1

Proof. Throughout the proof, $C$ will denote a finite constant whose actual value can change from one equation to the next. The convergence of the Wang-Landau algorithm has been studied in [3]. It is shown in this work that under the conditions of Theorem 3.1, $\min_i \sum_{k=1}^\infty \mathbf{1}_{\{i\}}(I_k) = \infty$ and,
more importantly, $e^{c_n(i)}/\sum_{j=1}^d e^{c_n(j)}$ converges almost surely to $Z(\theta^{(i)})$ (up to a multiplicative constant). Define

$$\omega_n(i) = \frac{e^{c_n(i)}}{\sum_{j=1}^d e^{c_n(j)}}, \qquad v_{n,i}(\theta) = \frac{\sum_{k=1}^n e^{E(X_k,\theta)-E(X_k,\theta^{(i)})}\, \mathbf{1}_i(I_k)}{\sum_{k=1}^n \mathbf{1}_i(I_k)},$$

and

$$\tilde{Z}_n(\theta) := \frac{Z_n(\theta)}{\sum_{j=1}^d e^{c_n(j)}} = \sum_{i=1}^d \omega_n(i)\, v_{n,i}(\theta). \qquad (20)$$

Instead of $Z_n$, we work with $\tilde{Z}_n$. This is equivalent because $\sum_{j=1}^d e^{c_n(j)}$ does not depend on $\theta$ and $Z_n$ always appears in $Q_{Z_n}$ as a ratio. We have

$$\inf_{\theta \in \Theta} \tilde{Z}_n(\theta) \ge e^{m-M}, \qquad (21)$$

$$\inf_{\theta,\theta' \in \Theta} \frac{\tilde{Z}_n(\theta)}{\tilde{Z}_n(\theta')} \ge e^{2(m-M)}. \qquad (22)$$

Combining (22), (12) and part 2 of Remark 3.1, we deduce that there exists $\varepsilon_0 > 0$ such that for all $n, j \ge 0$,

$$\sup_{|h| \le 1} \left| Q_{\tilde{Z}_n}^j h(\theta) - \pi_{\tilde{Z}_n}(h) \right| \le 2(1-\varepsilon_0)^j, \quad \Pr\text{-a.s.} \qquad (23)$$

We introduce the notation $\bar{Q}_n = Q_{\tilde{Z}_n} - \pi_{\tilde{Z}_n}$. It follows from (23) that for any $n \ge 1$ the following function $g_n$ is well defined:

$$g_n(\theta) = \sum_{j=1}^\infty \bar{Q}_n^j h(\theta).$$

Moreover, $|g_n(\theta)| \le 2/\varepsilon_0$ for all $\theta \in \Theta$, and $g_n$ satisfies Poisson's equation for $\bar{Q}_n$ and $h - \pi_{\tilde{Z}_n}(h)$:

$$g_n(\theta) - \bar{Q}_n g_n(\theta) = h(\theta) - \pi_{\tilde{Z}_n}(h). \qquad (24)$$

Using this, we can rewrite $\frac{1}{n}\sum_{k=1}^n \left(h(\theta_k) - \pi_{\tilde{Z}_k}(h)\right)$ as

$$\frac{1}{n}\sum_{k=1}^n \left(h(\theta_k) - \pi_{\tilde{Z}_k}(h)\right) = \frac{1}{n}\sum_{k=1}^n \left(g_k(\theta_k) - \bar{Q}_k g_k(\theta_{k-1})\right) + \frac{1}{n}\sum_{k=1}^n \left(\bar{Q}_k g_k(\theta_{k-1}) - \bar{Q}_{k-1} g_{k-1}(\theta_{k-1})\right) + \frac{1}{n}\left(\bar{Q}_0 g_0(\theta_0) - \bar{Q}_n g_n(\theta_n)\right). \qquad (25)$$

Since $\sup_{\theta \in \Theta} |g_n(\theta)| \le 2/\varepsilon_0$, a similar bound holds for $\bar{Q}_n g_n$, and we conclude that $\frac{1}{n}\left(\bar{Q}_0 g_0(\theta_0) - \bar{Q}_n g_n(\theta_n)\right)$ converges almost surely to 0 as $n \to \infty$. Writing $D_k = g_k(\theta_k) - \bar{Q}_k g_k(\theta_{k-1})$, it is easily seen that $\{D_k, \mathcal{F}_k\}$ is a martingale difference with bounded increments, and we deduce from martingale theory that $\frac{1}{n}\sum_{k=1}^n \left(g_k(\theta_k) - \bar{Q}_k g_k(\theta_{k-1})\right)$ converges almost surely to 0 as $n \to \infty$.

Since $Q_Z$ is a Metropolis kernel, and using the fact that $|\min(1,ax) - \min(1,ay)| \le a|x-y|$ for all $a, x, y \ge 0$, we deduce that for any function $h: \Theta \to \mathbb{R}$ such that $|h| \le 1$,

$$\left|(Q_{\tilde{Z}_n} - Q_{\tilde{Z}_{n-1}}) h(\theta)\right| \le \int \left| \frac{\tilde{Z}_n(\theta)}{\tilde{Z}_n(\theta')} - \frac{\tilde{Z}_{n-1}(\theta)}{\tilde{Z}_{n-1}(\theta')} \right| e^{E(x_0,\theta')-E(x_0,\theta)}\, p(\theta,\theta')\, \left|h(\theta') - h(\theta)\right|\, d\theta' \le 2 e^{M-m} \sup_{\theta,\theta' \in \Theta} \left| \frac{\tilde{Z}_n(\theta)}{\tilde{Z}_n(\theta')} - \frac{\tilde{Z}_{n-1}(\theta)}{\tilde{Z}_{n-1}(\theta')} \right| \le C \sup_{\theta \in \Theta} \left| \tilde{Z}_n(\theta) - \tilde{Z}_{n-1}(\theta) \right|, \qquad (26)$$

using (21)-(22), for some finite constant $C$. Combining (23) and (26), we have the following well-known consequence: there exists $C < \infty$ such that for all $n \ge 1$:

$$\sup_{|h| \le 1} \left| \pi_{\tilde{Z}_n}(h) - \pi_{\tilde{Z}_{n-1}}(h) \right| \le C \sup_{\theta \in \Theta} \left| \tilde{Z}_n(\theta) - \tilde{Z}_{n-1}(\theta) \right|. \qquad (27)$$

The stability of Poisson's equation for geometrically ergodic transition kernels is well known (see e.g. [1, 3]). Combining (23), (26) and (27), we can find a finite constant $C$ such that for all $k \ge 1$:

$$\left| \bar{Q}_k g_k(\theta_{k-1}) - \bar{Q}_{k-1} g_{k-1}(\theta_{k-1}) \right| \le C \sup_{\theta \in \Theta} \left| \tilde{Z}_k(\theta) - \tilde{Z}_{k-1}(\theta) \right|. \qquad (28)$$

Given the expression for $\tilde{Z}_n(\theta)$ in (20), it is not very hard to show that there exists $C < \infty$ such that

$$\sup_{\theta \in \Theta} \left| \tilde{Z}_k(\theta) - \tilde{Z}_{k-1}(\theta) \right| \le C \left( d\gamma_k + \frac{1}{\min_i \sum_{l=1}^k \mathbf{1}_{\{i\}}(I_l)} \right) \to 0 \qquad (29)$$

as $k \to \infty$, as discussed above. It follows that $\frac{1}{n}\sum_{k=1}^n \left(h(\theta_k) - \pi_{\tilde{Z}_k}(h)\right)$ converges a.s. to 0. Given that $\tilde{Z}_n(\theta) \to C Z(\theta)$ almost surely for some finite constant $C$,

$$\pi_{\tilde{Z}_n}(h) = \frac{\int \frac{e^{E(x_0,\theta)}}{\tilde{Z}_n(\theta)}\, h(\theta)\, d\theta}{\int \frac{e^{E(x_0,\theta)}}{\tilde{Z}_n(\theta)}\, d\theta} \longrightarrow \frac{\int \frac{e^{E(x_0,\theta)}}{Z(\theta)}\, h(\theta)\, d\theta}{\int \frac{e^{E(x_0,\theta)}}{Z(\theta)}\, d\theta} = \pi(h)$$

as $n \to \infty$, by Lebesgue's dominated convergence theorem.

References

[1] Andrieu, C. and Moulines, É. (2006). On the ergodicity properties of some adaptive MCMC algorithms. Ann. Appl. Probab., 16 1462-1505.
[2] Atchadé, Y. F. (2006). An adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift. Methodol. Comput. Appl. Probab., 8 235-254.
[3] Atchadé, Y. F. and Liu, J. S. (2004). The Wang-Landau algorithm for Monte Carlo computation in general state spaces. Technical Report.
[4] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B, 36 192-236. With discussion by D. R. Cox, A. G. Hawkes, P. Clifford, P. Whittle, K. Ord, R. Mead, J. M. Hammersley, and M. S. Bartlett, and with a reply by the author.
[5] Cucala, L., Marin, J.-M., Robert, C. and Titterington, D. (2008). A Bayesian reassessment of nearest-neighbour classification. Tech. rep., CEREMADE, Université Paris Dauphine. arXiv:0802.1357.
[6] Gelman, A. and Meng, X.-L. (1998). Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statist. Sci., 13 163-185.
[7] Geyer, C. J. (1994). On the convergence of Monte Carlo maximum likelihood calculations. J. Roy. Statist. Soc. Ser. B, 56 261-274.
[8] Geyer, C. J. and Thompson, E. A. (1992). Constrained Monte Carlo maximum likelihood for dependent data. J. Roy. Statist. Soc. Ser. B, 54 657-699. With discussion and a reply by the authors.
[9] Hurn, M., Husby, O. and Rue, H. (2003). A tutorial on image analysis. Lecture Notes in Statistics, 173 87-141.
[10] Ibanez, M. V. and Simo, A. (2003). Parameter estimation in Markov random field image modeling with imperfect observations: a comparative study. Pattern Recognition Letters, 24 2377-2389.
[11] Kleinman, A., Rodrigue, N., Bonnard, C. and Philippe, H. (2006). A maximum likelihood framework for protein design. BMC Bioinformatics, 7.
[12] Møller, J., Pettitt, A. N., Reeves, R. and Berthelsen, K. K. (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93 451-458.
[13] Møller, J. and Waagepetersen, R. P. (2004). Statistical Inference and Simulation for Spatial Point Processes, vol. 100 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Boca Raton, FL.
[14] Murray, I., Ghahramani, Z. and MacKay, D. (2006). MCMC for doubly-intractable distributions. Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI).
[15] Plagnol, V. and Tavaré, S. (2004). Approximate Bayesian computation and MCMC. In Monte Carlo and Quasi-Monte Carlo Methods 2002. Springer, Berlin, 99-113.
[16] Robins, G., Pattison, P., Kalish, Y. and Lusher, D. (2007). An introduction to exponential random graph models for social networks. Social Networks, 29 173-191.
[17] Wang, F. and Landau, D. P. (2001). Efficient, multiple-range random walk algorithm to calculate the density of states. Physical Review Letters, 86 2050-2053.
[18] Younes, L. (1988). Estimation and annealing for Gibbsian fields. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 24 269-294.