A Gibbs sampler on the $n$-simplex

We determine the mixing time of a simple Gibbs sampler on the unit simplex, confirming a conjecture of Aldous. The upper bound is based on a two-step coupling, where the first step is a simple contraction argument and the second step is a non-Markovi…

Authors: Aaron Smith

The Annals of Applied Probability 2014, Vol. 24, No. 1, 114–130. DOI: 10.1214/12-AAP916. © Institute of Mathematical Statistics, 2014.

A GIBBS SAMPLER ON THE $n$-SIMPLEX

By Aaron Smith, ICERM, Brown University

We determine the mixing time of a simple Gibbs sampler on the unit simplex, confirming a conjecture of Aldous. The upper bound is based on a two-step coupling, where the first step is a simple contraction argument and the second step is a non-Markovian coupling. We also present an MCMC-based perfect sampling algorithm based on our proof which can be applied with Gibbs samplers that are harder to analyze.

1. Introduction. Given a measure $\mu$ on a convex body $K \subset \mathbb{R}^n$, how can we efficiently obtain independent samples from the distribution of $\mu$? This problem arises in the computational sciences, and a frequently used tool is Markov chain Monte Carlo (MCMC) [5]. Because MCMC methods produce nearly independent samples only after a lengthy mixing period, a long-standing mathematical question is to analyze the mixing times of the MCMC algorithms in common use. The analysis of discrete MCMC algorithms is very advanced, with precise bounds for many difficult problems as well as some general theory that has received recent exposition in [1, 11]. For samplers on continuous state spaces, there has been some general theory based on geometric or coupling arguments (see [12, 13, 24] and [20]), but many of the techniques built for discrete chains seem to run into technical difficulties. There are also very few well-understood simple chains, in stark contrast to the discrete theory, which has been built on many detailed analyses of specific chains; though, see [16, 17] for some very nice analyses of two slower walks on the simplex, [18, 19] for group walks, and [10] for some applications.
This paper is an attempt to carefully analyze a simple continuous chain, namely a Gibbs sampler on the $n$-simplex. In addition, it illustrates the use of two powerful techniques from the discrete theory: non-Markovian coupling [2, 4, 7] and coupling from the past [15]. The ideas in this paper can be used for a number of other problems. The analysis was initially motivated by a simpler version of Kac's random walk on $S(n)$ or $SO(n)$ (see [8, 14, 23] and especially [9] for recent progress). It is also a stepping stone toward the analysis of Gibbs samplers on more complicated convex sets, such as contingency tables. In the author's thesis and a forthcoming note, we use the technique in this paper to improve existing analyses of these samplers and some others [22, 23]; there is still substantial room for improvement.

Received August 2011; revised December 2012. Supported by a Stanford Graduate Fellowship courtesy of the Hewlett Foundation. AMS 2000 subject classifications: 60J10, 65C04. Key words and phrases: Markov chain, Gibbs sampler, perfect sampling.

In this paper, we will discuss mixing in terms of the popular total variation distance. For a Markov chain with transition kernel $K$ on a measurable space $(\Omega, \Sigma)$ and unique stationary distribution $\pi$, the total variation distance to stationarity after $t$ steps of a Markov chain started at $\omega \in \Omega$ is given by $\sup_{A \in \Sigma} |K^t(\omega, A) - \pi(A)|$.
Most of this paper will be concerned with a specific Gibbs sampler $X_t$ on the $n$-simplex $\Delta_n = \{X \in \mathbb{R}^n \mid \sum_{i=1}^n X[i] = 1,\ X[i] \ge 0\}$ whose stationary distribution is the uniform distribution on $\Delta_n$. To take a move in this Markov chain, begin by choosing $1 \le i < j \le n$ and $\lambda \in [0,1]$ independently and uniformly. Then set
$$X_{t+1}[i] = \lambda(X_t[i] + X_t[j]), \qquad X_{t+1}[j] = (1-\lambda)(X_t[i] + X_t[j]), \qquad X_{t+1}[k] = X_t[k] \quad (k \ne i, j). \eqno(1.1)$$
This sampler was first mentioned in [1], where the mixing time was shown to be $O(n^2 \log n)$. Aldous suggested in his list of open problems that the correct mixing time was $O(n \log n)$, and we confirm this, also demonstrating a pre-cutoff window of moderate size:

Theorem 1.1 (Simplex mixing time). Fix $C > 3$ and $n$ satisfying $n > \max(4096, 2C + \frac{7}{2})$ and $n \log n > \frac{3(1/2 + C)C}{C/2 - 1/4}$. If $K_n^t$ is the $t$-step transition kernel for the Gibbs sampler described above, and $U_n$ is the uniform distribution on $\Delta_n$, then for all $t > 10Cn\log n$, $x \in \Delta_n$ and $A \subset \Delta_n$ measurable,
$$|K_n^t(x, A) - U_n(A)| < n^{3-C} + 2n^{-C/2-1/4} + 4n^{11/4-C}.$$
On the other hand, for $0 < C < 1$ and $t < (1-C)n\log n$,
$$\liminf_{n \to \infty} \sup_{x \in \Delta_n} \sup_{A \subset \Delta_n} |K_n^t(x, A) - U_n(A)| = 1.$$

The conditions on the constant $C$ are not onerous. Choosing $C = 4$ gives a mixing time of at most $40n\log n$ that is effective for $n > 4096$. Sections 2–4 are devoted to proving the upper bound of Theorem 1.1, and Section 5 proves the lower bound. In Section 6, we briefly discuss applications of our method to closely related Markov chains. In Section 7, we use the ideas of the proof to develop a perfect sampling algorithm with wider applicability.

2. Notation, basic lemmas and strategy.
We recall that a coupling of Markov chains with transition kernel $K$ is a process $(X_t, Y_t)$ such that marginally, both $X_t$ and $Y_t$ are Markov chains with transition kernel $K$. The proof relies on the following standard lemma (see [11], Theorem 5.2; they work in a discrete space, but their proof does not rely on this assumption):

Lemma 2.1 (Fundamental coupling lemma). Assume $(X_t, Y_t)$ is a coupling of Markov chains such that if $X_s = Y_s$, then $X_t = Y_t$ for all $t > s$. Assume also that $X_0 = x$ and $Y_0$ is distributed according to the stationary distribution of $K$. Define the random time $\tau$ to be the first time at which $X_t = Y_t$. Then
$$\sup_{A \in \Sigma} |K^t(x, A) - \pi(A)| \le P[\tau > t].$$

Throughout this note, we are interested in a coupling of Markov chains $(X_t, Y_t)$, where $X_0$ starts according to some distribution of our choosing, $Y_0$ starts uniformly over the simplex, and both marginally evolve as the Gibbs sampler being studied. We will describe a joint evolution of our two chains $X_t$ and $Y_t$ such that at a specific time, the probability of having coupled is very high. The method for proving this is slightly unusual. In most coupling proofs, including the non-Markovian coupling in [7], there is an attempt to make the two chains get closer throughout the process. In our method, we attempt to couple only at a specific final time, and include many moves that are likely to increase the distance between the chains by a large amount. In fact, our joint distribution will generally assign 0 probability to coupling at any prior time. In order to develop our global joint coupling, we describe two possible one-step couplings of $X_t$ and $Y_t$: the "proportional" coupling and the "subset" coupling.
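Both one-step couplings below drive two copies of the basic move (1.1). As a concrete point of reference, a single move can be sketched in a few lines of Python (our own illustrative code, not from the paper; the function name is ours):

```python
import random

def gibbs_step(x, rng=random):
    """One move of the simplex Gibbs sampler (1.1): pick a pair of distinct
    coordinates i, j and lambda uniformly, then redistribute the mass
    x[i] + x[j] between them."""
    n = len(x)
    i, j = rng.sample(range(n), 2)
    lam = rng.random()
    total = x[i] + x[j]
    y = list(x)
    y[i] = lam * total
    y[j] = (1.0 - lam) * total
    return y

# The move preserves the simplex: entries stay nonnegative and sum to 1.
x = [0.5, 0.3, 0.2]
for _ in range(100):
    x = gibbs_step(x)
assert abs(sum(x) - 1.0) < 1e-9 and min(x) >= 0.0
```

Since the update only touches two coordinates and conserves their sum, stationarity of the uniform distribution follows from the uniform choice of $\lambda$.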
Throughout, we will always choose to update entries at the same coordinates $i, j$ in both $X_t$ and $Y_t$ at every step; only the uniform variable $\lambda$ used in representation (1.1) sometimes differs. Because of this, we often describe the couplings by describing only how the update variables $\lambda$ are coupled.

In the proportional coupling, we choose $i, j$ and $\lambda$ for $Y_t$, and then use the same choices for $X_t$ in representation (1.1), so that, for example, entry $i$ in $Y_t$ is updated to $\lambda(Y_t[i] + Y_t[j])$ while entry $i$ in $X_t$ is updated to $\lambda(X_t[i] + X_t[j])$.

The subset coupling is slightly more complicated. As before, we choose two coordinates $i, j$ to be updated in both chains. Next, define the weight $w(S, X)$ that a vector $X$ gives to a subset $S \subset [n]$ to be $w(S, X) = \sum_{s \in S} X[s]$. A subset coupling of $X_t$ and $Y_t$ will always be with respect to some specific subset $S \subset [n] = \{1, 2, \ldots, n\}$. If either $i, j \in S$ or $i, j \notin S$, perform a proportional coupling. Otherwise, assume without loss of generality that $i \in S$ and $j \notin S$, and also that $X_t[i] + X_t[j] \ge Y_t[i] + Y_t[j]$. In this case, call a coupling of $X_{t+1}$ and $Y_{t+1}$ conditioned on $X_t$, $Y_t$, $i$ and $j$ a subset coupling if
$$P[w(S, X_{t+1}) = w(S, Y_{t+1})] \ge \frac{Y_t[i] + Y_t[j] - |\sum_{k \in S \setminus \{i\}} (Y_t[k] - X_t[k])|}{X_t[i] + X_t[j]}.$$
We will say a subset coupling has succeeded if $w(S, X_{t+1}) = w(S, Y_{t+1})$, and that it has failed otherwise. We will generally not be concerned with what happens when a subset coupling has failed.

We will check now that such a coupling exists. Note that, conditioned on $X_t$ and the coordinates $i, j$ updated at time $t$, the weight $w(S, X_{t+1})$ is uniformly distributed on $[\sum_{k \in S \setminus \{i\}} X_t[k],\ \sum_{k \in S \setminus \{i\}} X_t[k] + X_t[i] + X_t[j]]$.
Similarly, conditioned on $Y_t$, $i$ and $j$, $w(S, Y_{t+1})$ is uniformly distributed on $[\sum_{k \in S \setminus \{i\}} Y_t[k],\ \sum_{k \in S \setminus \{i\}} Y_t[k] + Y_t[i] + Y_t[j]]$.

Lemma 2.2 (Total variation distance of two uniform distributions). Let $U$ be distributed uniformly on $[a, a+b]$ and let $U'$ be distributed uniformly on $[a', a'+b']$. Assume $b \le b'$. Then
$$\|\mathcal{L}(U) - \mathcal{L}(U')\|_{\mathrm{TV}} \le 1 - \frac{b - |a - a'|}{b'}.$$

Proof. Note that $U$ has density $f(x) = \frac{1}{b}\mathbf{1}_{x \in [a, a+b]}$, and $U'$ has density $g(x) = \frac{1}{b'}\mathbf{1}_{x \in [a', a'+b']}$. Thus, the total variation distance between them is given by
$$\|\mathcal{L}(U) - \mathcal{L}(U')\|_{\mathrm{TV}} = 1 - \int_x \min(f(x), g(x))\,dx = 1 - \frac{1}{b'}\int_{x \in [a,a+b] \cap [a',a'+b']} 1\,dx = 1 - \frac{1}{b'}[\min(a+b, a'+b') - \max(a, a')] \le 1 - \frac{1}{b'}[b + \min(a, a') - \max(a, a')] = 1 - \frac{b - |a - a'|}{b'}. \qquad \square$$

Since it is always possible to couple two random variables $W, Z$ so that $P[Z = W] = 1 - \|\mathcal{L}(Z) - \mathcal{L}(W)\|_{\mathrm{TV}}$, Lemma 2.2 implies that subset couplings exist.

We now give a rough and nonrigorous description of the proof strategy, which proceeds by describing a two-step coupling of $X_t$ and $Y_t$. For the first $T_1$ steps, $X_t$ and $Y_t$ always evolve under the proportional coupling. This coupling is Markovian, and we prove that under this coupling, the two chains are very close in sup-norm with high probability after about $n\log n$ steps. In the next phase, we record the updated coordinates $(i(t), j(t))$ from time $T_1$ until a specified time $T = T_1 + T_2$. This information is used to construct a nested sequence $P_t$ of partitions of the set of coordinates $[n]$. With high probability, the sequence will satisfy $P_{T_1} = \{[n]\}$ and $P_{T_1 + T_2} = \{\{1\}, \{2\}, \ldots, \{n\}\}$.
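As a quick numerical sanity check of Lemma 2.2 (our own illustrative code, not part of the paper), the exact total variation distance between two uniform laws is determined by the overlap of their supports, and the bound of the lemma dominates it:

```python
def tv_uniform(a, b, a2, b2):
    """Exact TV distance between Uniform[a, a+b] and Uniform[a2, a2+b2],
    computed as 1 minus the integral of the pointwise minimum density."""
    lo, hi = max(a, a2), min(a + b, a2 + b2)
    overlap = max(0.0, hi - lo)
    # On the overlap, the smaller density is 1/max(b, b2).
    return 1.0 - overlap / max(b, b2)

def lemma_2_2_bound(a, b, a2, b2):
    """Upper bound from Lemma 2.2 (assumes b <= b2)."""
    return 1.0 - (b - abs(a - a2)) / b2

# The bound dominates the exact distance on a small grid of examples.
for (a, b, a2, b2) in [(0, 1, 0.2, 1.5), (0, 0.5, 0.1, 1.0), (2, 1, 2, 1)]:
    assert tv_uniform(a, b, a2, b2) <= lemma_2_2_bound(a, b, a2, b2) + 1e-12
```

A maximal coupling of the two uniforms then succeeds with probability exactly one minus this distance, which is how the subset coupling is realized.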
We will then couple $X_t$ to $Y_t$ step by step, using only information about the future that is contained in $P_t$, using a proportional coupling for some steps and a subset coupling for others. We then show that it is possible to keep $w(S, X_t) = w(S, Y_t)$ for all $S \in P_t$ with high probability. If all of these high-probability events occur, then the final partition consists only of singletons, and this implies that $X_T[i] = Y_T[i]$ for $1 \le i \le n$. The two main difficulties are constructing the partition and showing that $X_t$ and $Y_t$ remain close throughout the second phase.

It is worth pointing out that the dependence of the coupling on the future is in fact necessary to get the correct mixing time, or indeed any bound that is $o(n^2)$. This is analogous to the well-known fact that no Markovian coupling of the random transposition walk on $S_n$ can give a coupling time that is $o(n^2)$. See Lemma 8 of [2] for a short proof of this fact for the walk on $S_n$, which applies essentially without modification to this Gibbs sampler.

Here is a list of some commonly used variables that have been reserved, for reference while reading:
- $X_t$, the Markov chain of interest.
- $Y_t$, another instance of the Markov chain, started at stationarity.
- $P_t$, a set partition of $[n]$.
- $S$, a piece of a partition.
- $i, j$, coordinates we update.
- $\lambda, \lambda_x, \lambda_y$, uniform random variables used to update a chain, or chains $X_t$ and $Y_t$.
- $w(S, X)$, the weight assigned by vector $X$ to a subset $S \subset [n]$.

3. First coupling stage. Define $Z_t = \|X_t - Y_t\|_2^2$. The following provides an upper bound for $E[Z_t]$ under the proportional coupling described above:

Lemma 3.1 (Burn-in). Let $X_t$ and $Y_t$ be two copies of the Markov chain coupled by the proportional coupling, and $Z_t$ defined as above. After $s \ge \frac{3}{2}dn\log n$ steps of the proportional coupling, $E[Z_s] \le 2n^{-d}$.

Proof.
The proof is by a one-step contraction estimate. Assume $X_t$ and $Y_t$ are coupled by the proportional coupling from time 0 onwards. Let $\mathcal{F}_t$ be the $\sigma$-algebra generated by the random variables $X_t$ and $Y_t$; note that $Z_t$ is $\mathcal{F}_t$-measurable. Then, under the proportional coupling, the following equality comes from conditioning on the coordinates $(i, j) = (i(t), j(t))$ updated at time $t$:
$$E[Z_{t+1} \mid \mathcal{F}_t] = \frac{1}{n(n-1)} \sum_{i \ne j} E\Big[\lambda^2 (X_t[i] + X_t[j] - Y_t[i] - Y_t[j])^2 + (1-\lambda)^2 (X_t[i] + X_t[j] - Y_t[i] - Y_t[j])^2 + \sum_{k \ne i,j} (X_t[k] - Y_t[k])^2\Big].$$
Note $E[\lambda^2] = E[(1-\lambda)^2] = \frac{1}{3}$. Expanding the above, we obtain
$$E[Z_{t+1} \mid \mathcal{F}_t] = \frac{1}{n(n-1)} \sum_{i \ne j} \Big[\frac{2}{3}(X_t[i] - Y_t[i])^2 + \frac{2}{3}(X_t[j] - Y_t[j])^2 + \frac{4}{3}(X_t[i] - Y_t[i])(X_t[j] - Y_t[j]) + \sum_{k \ne i,j} (X_t[k] - Y_t[k])^2\Big].$$
Collecting coefficients of $Z_t$ and using the fact that $Z_t = \sum_k (X_t[k] - Y_t[k])^2$, this equals
$$\Big(1 - \frac{2}{3n}\Big) Z_t + \frac{4}{3n(n-1)} \sum_{i \ne j} (X_t[i] - Y_t[i])(X_t[j] - Y_t[j]).$$
Noting that $\sum_{i=1}^n (X_t[i] - Y_t[i]) = 0$, the last term can be rewritten as
$$\sum_{i \ne j} (X_t[i] - Y_t[i])(X_t[j] - Y_t[j]) = \Big(\sum_{i=1}^n (X_t[i] - Y_t[i])\Big)^2 - \sum_{i=1}^n (X_t[i] - Y_t[i])^2 = -Z_t.$$
Putting this together, we find that
$$E[Z_{t+1} \mid \mathcal{F}_t] = \Big(1 - \frac{2}{3n} - \frac{4}{3n(n-1)}\Big) Z_t.$$
And so in particular,
$$E[Z_t \mid \mathcal{F}_0] = E[E[Z_t \mid \mathcal{F}_{t-1}] \mid \mathcal{F}_0] \le \Big(1 - \frac{2}{3n}\Big) E[Z_{t-1} \mid \mathcal{F}_0].$$
By induction on $t$, it is then easy to see that
$$E[Z_t \mid \mathcal{F}_0] \le \Big(1 - \frac{2}{3n}\Big)^t Z_0.$$
Bound $Z_0$ by
$$Z_0 \le \sum_k (X_0[k]^2 + Y_0[k]^2) \le \sum_k X_0[k] + Y_0[k] = 2.$$
We conclude that at times $t \ge \frac{3}{2}dn\log n$, $E[Z_t] \le 2n^{-d}$.
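The one-step identity above can be checked numerically (our own sketch, not part of the paper): averaging over all ordered pairs $(i, j)$ and using $E[\lambda^2] = E[(1-\lambda)^2] = \frac{1}{3}$ reproduces the claimed contraction factor exactly.

```python
import random

def exact_conditional_z(x, y):
    """Exact E[Z_{t+1} | F_t] under the proportional coupling: average
    over ordered pairs (i, j), using E[lam^2] = E[(1-lam)^2] = 1/3, so
    the pair (i, j) contributes (2/3) * d^2 with d the pooled difference."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = (x[i] + x[j]) - (y[i] + y[j])
            rest = sum((x[k] - y[k])**2 for k in range(n) if k not in (i, j))
            total += (2.0 / 3.0) * d * d + rest
    return total / (n * (n - 1))

# Both vectors must lie on the simplex so that sum(x - y) = 0.
random.seed(1)
n = 6
x = [random.random() for _ in range(n)]
s = sum(x); x = [v / s for v in x]
y = [random.random() for _ in range(n)]
s = sum(y); y = [v / s for v in y]
z = sum((a - b)**2 for a, b in zip(x, y))
factor = 1 - 2 / (3 * n) - 4 / (3 * n * (n - 1))
assert abs(exact_conditional_z(x, y) - factor * z) < 1e-12
```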
$\square$

Using the obvious inequality $|X_t[i] - Y_t[i]| \le \sqrt{Z_t}$ and Markov's inequality, $P[|X_t[i] - Y_t[i]| > \delta] \le \delta^{-1} n^{-d/2}$ for all $\delta > 0$ and $i \in [n]$.

4. Second coupling stage. Let $T = (\frac{1}{2} + \varepsilon)n\log n$ be fixed, for some $\varepsilon > 0$ to be decided later. Let $Y_0$ be chosen from the uniform distribution on the simplex, and let $X_0$ satisfy $\|X_0 - Y_0\|_1 \le n^{-d}$. We describe a coupling $(X_t, Y_t)$ from time 0 to time $T$ with the property that $X_T = Y_T$ with high probability as $n$ goes to infinity, for any fixed $\varepsilon > 0$ and $d$ sufficiently large.

First, we choose a sequence of pairs of distinct elements $1 \le i(t) \ne j(t) \le n$ independently and uniformly for times $0 \le t \le T$. These pairs $(i(t), j(t))$ will be the coordinates updated at time $t$ in both $X_t$ and $Y_t$. Then define a sequence of graphs $G_t$ for $0 \le t \le T-1$ to have vertex set $[n]$ and edge set $E_t = \{(i(t), j(t)), (i(t+1), j(t+1)), \ldots, (i(T-1), j(T-1))\}$, throwing out repeated edges, if any. We also define $G_T$ to be the graph on $[n]$ with no edges. From this sequence, construct a sequence of partitions of $[n]$, $P(0), P(1), \ldots, P(T)$, by letting the sets in $P_t$ be exactly the connected components of $G_t$. Since the edges satisfy $E_s \subset E_t$ for every $s > t$, it is clear that for any $A \in P_s$, there must be some $B \in P_t$ with $A \subset B$. In this sense, the sequence of partitions is nested. Also note from the construction that either $P_t$ and $P_{t+1}$ are the same, or they differ by having a single set in $P_t$ split into two sets in $P_{t+1}$. Define the sequence of marked times $0 \le t_1 < \cdots < t_k = T-1$ as the times at which $P_{t_\ell} \ne P_{t_\ell + 1}$. Then, for a marked time $t_\ell$, define $S(t_\ell, 1)$ and $S(t_\ell, 2)$ to be the two sets that were split apart at time $t_\ell$, labeled so that $|S(t_\ell, 1)| \le |S(t_\ell, 2)|$.
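The partition process is easy to compute in practice. The following sketch (our own illustrative code; the function names are not from the paper) builds $P_t$ as the connected components of the graph whose edges are the updates from time $t$ to $T-1$, and reads off the marked times:

```python
def components(n, edges):
    """Connected components of a graph on vertices 0..n-1 (depth-first search)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, parts = [False] * n, []
    for v in range(n):
        if not seen[v]:
            stack, comp = [v], set()
            seen[v] = True
            while stack:
                u = stack.pop()
                comp.add(u)
                for w in adj[u]:
                    if not seen[w]:
                        seen[w] = True
                        stack.append(w)
            parts.append(frozenset(comp))
    return set(parts)

def partition_process(n, pairs):
    """P_t = components of the graph with edge set {(i(s), j(s)) : s >= t};
    P_T is the all-singletons partition. Returns the list [P_0, ..., P_T]."""
    T = len(pairs)
    parts = [components(n, pairs[t:]) for t in range(T)]
    parts.append({frozenset([v]) for v in range(n)})
    return parts

# Marked times are exactly the steps at which the partition refines.
pairs = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 1)]
P = partition_process(4, pairs)
marked = [t for t in range(len(pairs)) if P[t] != P[t + 1]]
assert P[0] == {frozenset({0, 1, 2, 3})}      # G_0 is connected
assert all(len(S) == 1 for S in P[-1])        # final partition is singletons
```

In this small example there are exactly $n - 1 = 3$ marked times, matching the observation below that this happens precisely when $G_0$ is connected.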
Note that there are at most $n-1$ marked times, and that there are exactly $n-1$ if and only if $P_0 = \{[n]\}$. Note also that $P_0 = \{[n]\}$ if and only if $G_0$ is connected. The question of whether or not the random graph $G_0$ is connected is a classical question in random graph theory. The following result, found in [3] among other places, is good enough for our purposes:

Lemma 4.1 (Connectedness for Erdős–Rényi graphs). Let $\varepsilon > 0$ be fixed, and let $T = T_\varepsilon$ be the first time that $(\frac{1}{2} + \varepsilon)n\log n$ distinct edges have been chosen. Then the probability that $G_0$ is connected is at least $1 - n^{-\varepsilon}$.

This has the immediate corollary:

Lemma 4.2 (Connectedness for $G_0$). Let $\varepsilon > 0$, and assume $n > 4$ satisfies $n\log n > \frac{3(1+2\varepsilon)(1/2+2\varepsilon)}{\varepsilon}$, and let $T > (\frac{1}{2} + 2\varepsilon)n\log n$. Then the probability that $G_0$ is connected is at least $1 - 2n^{-\varepsilon}$.

Proof. Ignoring the ordering of vertices in edges, define
$$A_t = \mathbf{1}_{(i(t), j(t)) \notin \{(i(0), j(0)), \ldots, (i(t-1), j(t-1))\}},$$
the indicator that the edge chosen at time $t$ is new. The number of repeated edges $\sum_{t < T}(1 - A_t)$ is stochastically dominated by a binomial random variable $B$ with $T$ trials and success probability at most $2T/(n(n-1))$; in particular, $P[\sum_{t < T}(1 - A_t) > x] \le P[B > x]$ for all $x > 0$. For $n$ satisfying $n\log n > \frac{3(1+2\varepsilon)(1/2+2\varepsilon)}{\varepsilon}$, Chernoff's inequality gives the bound
$$P[T_\varepsilon > (\tfrac{1}{2} + 2\varepsilon)n\log n] \le P[B > \varepsilon n\log n] \le e^{-n\varepsilon/2},$$
which is less than $n^{-\varepsilon}$ for $n \ge 4$. Let $E_T$ be the event that $G_0$ is disconnected. Since $P[E_T] \le P[T_\varepsilon > T] + P[E_T \mid T_\varepsilon \le T]$, the result follows immediately from this bound on $T_\varepsilon$ and Lemma 4.1. $\square$

Having constructed this partition, we now couple $X_t$ and $Y_t$ for time $0 \le t \le T$. First, we need to choose the coordinates to update; we do this by updating coordinates $i(t)$ and $j(t)$ at time $t$ in both chains. Next, we must describe the coupling of the coordinates. If $t$ is a marked time, then perform a subset coupling for the set $S(t, 1)$. Otherwise, do a proportional coupling.
Note that if $t$ is a marked time, then one of $i(t)$ or $j(t)$ is in $S(t, 1)$ and the other is in $S(t, 2)$, so the coupling proceeds according to the description in Section 2 in the case $i \in S$, $j \in S^c$. We claim that this couples the two walks by time $T$ with high probability:

Lemma 4.3 (Coupling for close chains). For $\varepsilon > 0$, $d > \frac{11}{2}$, $n > \max(d + \frac{3}{2}, 4096)$, $n\log n > \frac{3(1+2\varepsilon)(1/2+2\varepsilon)}{\varepsilon}$ and $T > (\frac{1}{2} + 2\varepsilon)n\log n$, the coupling described in this section has the property
$$P[X_T \ne Y_T] \le 2n^{-\varepsilon} + 5n^{(15-2d)/4}.$$

We begin by showing that subset couplings succeed with high probability:

Lemma 4.4 (Subset coupling). Assume $n \ge 6$, and let $(X_t, Y_t)$ be a pair of elements of $\Delta_n$ satisfying $\sup_k |X_t[k] - Y_t[k]| = n^{-f}$ and $\inf_k X_t[k], \inf_k Y_t[k] \ge 2n^{-b}$, with $f \ge b + 1$. Then for all $S \subset [n]$ and update coordinates $i \in S$, $j \notin S$,
$$P[w(S, X_{t+1}) = w(S, Y_{t+1})] \ge 1 - 3n^{b+1-f}$$
under the subset coupling.

Proof. Assume that $X_t[i] + X_t[j] \ge Y_t[i] + Y_t[j]$. Then, from its definition, the subset coupling succeeds with probability at least
$$P[w(S, X_{t+1}) = w(S, Y_{t+1})] \ge \frac{Y_t[i] + Y_t[j] - |\sum_{k \in S \setminus \{i\}}(Y_t[k] - X_t[k])|}{X_t[i] + X_t[j]} \ge \frac{Y_t[i] + Y_t[j] - 2|S|n^{-f}}{Y_t[i] + Y_t[j] + 4n^{-f}} \ge (1 - 2n^{1-f+b})(1 - 4n^{-f+b} - 8n^{-2f+2b}),$$
which, for $n \ge 6$, is at least $1 - 3n^{b+1-f}$. $\square$

Having bounded the probability of failure when $X_t, Y_t$ are close, we must show that they remain close as long as all subset couplings succeed. For $S \subset [n]$, define $\|X\|_S = \sum_{s \in S} |X[s]|$. Then:

Lemma 4.5 (Closeness). Let $X_t, Y_t$ be coupled as described above, and assume that $P_0 = \{[n]\}$, that all subset couplings up to time $t$ have succeeded and that $\|X_0 - Y_0\|_1 < \varepsilon$.
Then $\|X_t - Y_t\|_S < \varepsilon$ for every $S \in P_t$.

Proof. There are two types of coupling to take care of. For a proportional coupling with coordinates $i$ and $j$,
$$|X_{t+1}[i] - Y_{t+1}[i]| + |X_{t+1}[j] - Y_{t+1}[j]| = \lambda_t |X_t[i] + X_t[j] - Y_t[i] - Y_t[j]| + (1 - \lambda_t)|X_t[i] + X_t[j] - Y_t[i] - Y_t[j]| \le |X_t[i] - Y_t[i]| + |X_t[j] - Y_t[j]|.$$
Since $i$ and $j$ always connect elements of the same set in $P_t$, this shows that proportional couplings never increase $\|X_t - Y_t\|_S$.

Otherwise, assume that at time $t$ we had a successful subset coupling for the subset $S$ along edge $i, j$. Without loss of generality, assume that $i \in S := S(t, 1)$ and $j \in R := S(t, 2)$. Since $w([n], X_0) = w([n], Y_0) = 1$, and all subset couplings up to time $t$ have succeeded, we have $w(Q, X_t) = w(Q, Y_t)$ for all $Q \in P_t$. In particular, $w(S \cup R, X_t) = w(S \cup R, Y_t)$. Then we note that
$$X_{t+1}[i] - Y_{t+1}[i] = \sum_{s \in S \setminus \{i\}}(Y_t[s] - X_t[s]) = X_t[i] - Y_t[i] + \sum_{s \in R}(X_t[s] - Y_t[s]),$$
and so
$$|X_{t+1}[i] - Y_{t+1}[i]| \le |X_t[i] - Y_t[i]| + \|X_t - Y_t\|_R,$$
which immediately implies that $\|X_{t+1} - Y_{t+1}\|_S \le \|X_t - Y_t\|_{S \cup R}$. An analogous calculation shows that $\|X_{t+1} - Y_{t+1}\|_R \le \|X_t - Y_t\|_{R \cup S}$ as well. By induction on $t$, this implies that $\|X_{t+1} - Y_{t+1}\|_S \le \|X_0 - Y_0\|_1$ and $\|X_{t+1} - Y_{t+1}\|_R \le \|X_0 - Y_0\|_1$. $\square$

Lemma 4.6 (Largeness). $P[\inf_{1 \le i \le n} \inf_{0 \le t \le n^2 - 1} Y_t[i] \le n^{-4.5-k}] \le 2n^{-k}$ for $n > \max(2k, 4096)$.

Proof. Let $q_1, \ldots, q_n$ be independent random variables chosen from the exponential distribution with mean 1, and let $Q = \sum_{i=1}^n q_i$. It is well known (see, e.g., Algorithm 2.7.1 of [21]) that $(\frac{q_1}{Q}, \ldots, \frac{q_n}{Q})$ is distributed uniformly on the simplex $\Delta_n$. In particular, $Y_t \stackrel{D}{=} (\frac{q_1}{Q}, \ldots, \frac{q_n}{Q})$.
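The exponential representation just invoked doubles as a practical sampler (our own sketch, not part of the paper): normalizing i.i.d. mean-1 exponentials by their sum gives a uniform point on $\Delta_n$.

```python
import random

def uniform_simplex(n, rng=random):
    """Sample uniformly from the n-simplex by normalizing n i.i.d.
    Exponential(1) variables q_i by their sum Q."""
    q = [rng.expovariate(1.0) for _ in range(n)]
    Q = sum(q)
    return [qi / Q for qi in q]

random.seed(2)
y = uniform_simplex(10)
assert abs(sum(y) - 1.0) < 1e-12 and min(y) > 0.0
```

By exchangeability, each coordinate has mean $1/n$, which is easy to confirm empirically.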
Taking a union bound over $1 \le i \le n$ and $0 \le t \le n^2 - 1$, it is thus sufficient to show
$$P\Big[\frac{q_1}{Q} \le n^{-1.5-k}\Big] \le 2n^{-k}.$$
Let $E$ be the event that $\frac{q_1}{Q} < n^{-1.5-k}$, $E_1$ the event that $q_1 < n^{-k-0.25}$, and $E_2$ the event that $Q > n^{1.25}$, and observe that $E \subset E_1 \cup E_2$. It is immediate that
$$P[E_1] = 1 - e^{-n^{-k-0.25}} \le n^{-k-0.25} + \tfrac{1}{2}n^{-2k-0.5}.$$
For $n > 4096$, this is certainly less than $n^{-k}$. To bound the probability that $Q$ is large, note that for all $0 < \theta < 1$,
$$E[e^{\theta Q}] = E[e^{\theta q_1}]^n = \frac{1}{(1-\theta)^n}.$$
Setting $\theta = 1 - n^{-0.25}$, Markov's inequality gives
$$P[E_2] \le e^{(1/4)n\log n + n - n^{1.25}}.$$
It is straightforward to check that, for $n > \max(2k, 4096)$, this is less than $n^{-k}$. Since $P[E] \le P[E_1] + P[E_2]$, this proves the lemma. $\square$

Finally, it is possible to prove that in fact most couplings will succeed:

Lemma 4.7 (Weight lemma). Fix $d > \frac{11}{2}$ and $n > \max(d + \frac{3}{2}, 4096)$. Assume $P_0 = \{[n]\}$ and that $\|X_0 - Y_0\|_1 \le n^{-d}$. Let $E$ be the event that the equality
$$w(S, X_t) = w(S, Y_t) \eqno(4.1)$$
holds for all $0 \le t \le T$ and all $S \in P_t$. Then $P[E] \ge 1 - 5n^{(15-2d)/4}$.

Proof. The equality (4.1) clearly holds at time 0. Also note that if it holds at an unmarked time $t$, it must also hold at time $t+1$, since at unmarked times the weights of parts $S$ of the partition $P_t$ cannot change in either $X_t$ or $Y_t$. So, assume that equality (4.1) holds for all times $t \le t_k$ for some marked time $t_k$. If the subset coupling is successful at time $t_k$, then $w(S(t_k, 1), X_{t_k+1}) = w(S(t_k, 1), Y_{t_k+1})$ by construction. Moreover, by the assumption that equality (4.1) holds until time $t_k$, $w(S(t_k, 1) \cup S(t_k, 2), X_{t_k}) = w(S(t_k, 1) \cup S(t_k, 2), Y_{t_k})$.
Since $w(A \cup B, X) = w(A, X) + w(B, X)$ for any disjoint sets $A, B$ and any vector $X$, this implies $w(S(t_k, 2), X_{t_k+1}) = w(S(t_k, 2), Y_{t_k+1})$ as well. Since none of the other parts of $P_{t_k}$ change weight, this implies that $w(S, X_{t_k+1}) = w(S, Y_{t_k+1})$ holds for all $S \in P_{t_k+1}$.

It remains to bound only the probability that the first subset coupling to fail occurs at time $t_k$. By Lemma 4.5 and the assumption of this lemma,
$$\|X_{t_k} - Y_{t_k}\|_1 \le \sum_{S \in P_{t_k}} \|X_{t_k} - Y_{t_k}\|_S \le \sum_{S \in P_{t_k}} n^{-d} \le n^{1-d}. \eqno(4.2)$$
Set $q = \frac{d}{2} + \frac{3}{4}$. By Lemma 4.6, $\inf_{i,t} Y_t[i] \ge n^{-q}$ with probability at least $1 - 2n^{4.5-q}$. Assuming this holds, Lemma 4.4 along with inequality (4.2) implies that any particular subset coupling succeeds with probability at least $1 - 3n^{2+q-d}$. Taking a union bound over all at most $n-1$ subset couplings, all subset couplings succeed with probability at least $1 - 3n^{3+q-d} - 2n^{4.5-q} = 1 - 5n^{(15-2d)/4}$. $\square$

It is now time to prove Lemma 4.3. Recall that if $P_0 = \{[n]\}$ and all components $Q \in P_t$ satisfy $w(Q, X_t) = w(Q, Y_t)$ for $0 \le t \le T$, then at time $T$ the two walks have coupled. There are only two ways for this to fail to happen. The first is the event $E_1$ that $P_0 \ne \{[n]\}$. By Lemma 4.2, $P[E_1] \le 2n^{-\varepsilon}$. The second is the event $E_2$ that at least one subset coupling fails. By Lemma 4.7 and the fact that $T < n^2$ (which follows from our assumptions on $n$ and $\varepsilon$), we have the bound $P[E_2] \le 5n^{(15-2d)/4}$. Combining these two bounds proves the lemma.

Finally, we prove Theorem 1.1. We will run the proportional coupling until time $T_1 = 9Cn\log n$, and then we will run the second-phase coupling from time $T_1$ until time $T = 10Cn\log n$. There are only two ways to have $X_T \ne Y_T$. The first is the event $E_1$ that $\|X_{T_1} - Y_{T_1}\|_1 > n^{-(2C+2)}$.
By the comment immediately after Lemma 3.1, $P[E_1] \le n^{3-C}$. The second is the event $E_2$ that the second-phase coupling fails. By Lemma 4.3, $P[E_2] \le 2n^{-C/2-1/4} + 4n^{11/4-C}$. Combining these two bounds proves the theorem. We also note that it is possible to improve the top of the pre-cutoff window from 30 to 12 by being more careful in the above proofs, but there is no hope of actually proving a cutoff without a substantially new argument.

5. Lower bound. Since our walk is over a continuous space, the total variation distance to stationarity of the Markov chain at time $t$ must be at least the probability that not all coordinates have been chosen by time $t$. Since only two coordinates are chosen at a time, the classical coupon-collector results in [6] tell us that at time $T = \frac{1}{2}n(\log n - c)$,
$$\sup_{A \in \Sigma} |K_n^T(x, A) - \pi(A)| \ge 1 - \exp(-\exp(c)) + o(1)$$
as $n$ goes to infinity. It is possible to improve the constant a little bit. Let $X_0 = (1, 0, \ldots, 0)$, and let $Q_t \in \{0, 1\}^n$ be a vector keeping track of updates in $X_t$, started at $Q_0 = (0, 0, \ldots, 0)$. If coordinates $i$ and $j$ are updated in $X_t$ at time $t$, set $Q_{t+1}[i] = Q_{t+1}[j] = 1$ if at least one of $X_t[i]$, $X_t[j]$ is nonzero, and set $Q_{t+1}[k] = Q_t[k]$ for all $k \ne i, j$. If $X_t[i] = X_t[j] = 0$, then set $Q_{t+1} = Q_t$. Next, let $\tau_j = \inf\{t \mid Q_{t+1} \ne Q_t,\ t > \tau_{j-1}\}$ with $\tau_0 = 0$. We note that $E[\tau_1] = \frac{n}{2}$, and for $j > 1$,
$$E[\tau_j] = \frac{n(n-1)}{2j(n-j)}.$$
Thus, letting $\tau = \sum_{j=1}^{n-1} \tau_j$,
$$E[\tau] = \frac{n}{2} + \frac{n(n-1)}{2}\sum_{j=2}^{n-1} \frac{1}{j(n-j)} = n\log n + O(n).$$
Similarly, since $\tau_i$ and $\tau_j$ are independent for $i \ne j$, it is easy to calculate that the variance satisfies $V[\tau] \le 6n^2$. By Chebyshev's inequality, for all $\varepsilon > 0$ and $n$ sufficiently large,
$$P[\tau < (1 - \varepsilon)n\log n] = O\Big(\frac{1}{(\log n)^2}\Big).$$
(5.1)

Finally, observe that for $t < \tau$, at least one entry of $X_t$ is 0, and so taking $H_j = \{X \in \Delta_n \mid X[j] = 0\}$ and $A \in \Sigma$ to be $\bigcup_j H_j$, we find $|K_n^T((1, 0, \ldots, 0), A) - U_n(A)| \ge P[T < \tau]$. Combined with inequality (5.1), this proves the lower bound on the mixing time.

6. Closely related walks. It is worth pointing out a small number of cases where the above argument goes through with very few changes. The first allows us to go from sampling from the uniform distribution to sampling from a large class of distributions on the simplex, including symmetric Dirichlet distributions. At each step of the random walk, instead of choosing $\lambda$ according to the uniform distribution on $[0, 1]$, choose it according to some other distribution with twice-differentiable cdf $F$ satisfying $F[x] = 1 - F[1-x]$ for all $0 \le x \le \frac{1}{2}$. Then the above arguments show that the total mixing time is $O(n\log n\,\frac{\|F''\|_\infty + 1}{1 - 2E[\lambda^2]})$, essentially without modification.

It is also possible to apply this argument to the discrete analogue of the simplex, in which $M$ indistinguishable balls are stored in $n$ boxes; these are known as $M$-compositions of $n$. The analogous Markov chain involves choosing two boxes, holding $N$ balls between them, at every step, and putting $0 \le k \le N$ of them in the first box with probability $\frac{1}{N+1}$, and the remainder in the second box. The arguments given above apply to the discrete chain, giving a mixing bound of order $O(n\log n)$, but there need to be enough balls for the continuous approximation to be good at each step. A straightforward step-through of the argument gives a bound of $O(n\log n)$ for $M > n^{18.5}$ above. Aldous' greedy argument, which gives an upper bound of $O(n^2\log n)$, holds for $M > n^{5.5}$.
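One step of the discrete analogue is easy to sketch (our own illustrative code, not from the paper): the two chosen boxes pool their balls and split them uniformly over the $N+1$ possibilities.

```python
import random

def discrete_step(boxes, rng=random):
    """One move of the discrete chain on M-compositions: choose two boxes,
    pool their N balls, and put k of them in the first box, with k chosen
    uniformly from {0, 1, ..., N}."""
    n = len(boxes)
    i, j = rng.sample(range(n), 2)
    N = boxes[i] + boxes[j]
    k = rng.randint(0, N)
    out = list(boxes)
    out[i], out[j] = k, N - k
    return out

# The total number of balls M is conserved by every move.
random.seed(4)
b = [5, 0, 3, 2]
for _ in range(100):
    b = discrete_step(b)
assert sum(b) == 10 and min(b) >= 0
```

As $N$ grows, the uniform choice of $k$ approximates the uniform choice of $\lambda$ in (1.1), which is the continuous approximation the mixing bound requires.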
The follow-up paper [22] will discuss a wider variety of related walks, requiring larger modifications.

7. Perfect sampling on the simplex. In this section, we discuss how the two-chain coupling described above can be modified into a grand coupling, and how to use this fact to create a perfect sampling algorithm. Before describing the algorithm, we mention that it is not a practical way to obtain uniform points on the simplex. However, the same algorithm can be used to obtain samples from the other distributions on the simplex mentioned in Section 6, many of which are a priori much harder to sample from. The method is also of some interest as a relatively rare instance of a coupling from the past (CFTP) algorithm which does not use monotonicity or antimonotonicity.

To begin, we recall the CFTP algorithm, described in greater detail in [15]. First, choose some large time $T$, and start a copy of the Markov chain $X_{-T}^\omega$ for each $\omega$ in the sample space $\Omega$. Next, couple all of the chains from time $-T$ to time 0. If the chains have coalesced by time 0, the resulting single value is distributed according to the stationary distribution of the chain. If not, we couple chains started at all points from $-2T$ to $-T$, keeping the evolution from $-T$ to 0; then from $-3T$ to $-2T$, keeping the evolution from $-2T$ to 0; and so on until coalescence at 0 has occurred.

For Markov chains on a finite state space, it is easy in theory to construct a grand coupling that will eventually coalesce, though bad couplings are very inefficient. In practice, even on finite chains, CFTP is only used if the chain has some very special properties. The most popular properties are monotonicity and its twin, antimonotonicity.
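For intuition, here is a minimal CFTP sketch (our own illustration on a toy two-state chain, not the simplex sampler of this section): random update functions are drawn for each time step, reused across restarts, and a sample is read off once all starting states coalesce at time 0.

```python
import random

def cftp(update, states, rng=random):
    """Coupling from the past [15]: draw randomness for times -1, -2, ...,
    reusing it across restarts, until the composed map sends every
    starting state to the same value at time 0."""
    us = []        # us[k] is the randomness for time -(k+1)
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        # Run all chains from time -T to 0 using the stored randomness.
        vals = {s: s for s in states}
        for k in reversed(range(T)):   # time -(k+1) uses us[k]
            vals = {s: update(v, us[k]) for s, v in vals.items()}
        if len(set(vals.values())) == 1:
            return next(iter(vals.values()))
        T *= 2

# Toy two-state chain whose stationary distribution is uniform on {0, 1}.
def toy_update(x, u):
    if u < 0.3:
        return 0
    if u > 0.7:
        return 1
    return x

random.seed(5)
draws = [cftp(toy_update, [0, 1]) for _ in range(2000)]
freq = draws.count(0) / len(draws)
assert abs(freq - 0.5) < 0.05
```

The essential point, reused below for the simplex, is that the randomness for each time step is fixed once and for all; only the starting time is pushed further into the past.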
Briefly, we introduce a partial order ≤ on Ω, and say that a coupling of two chains X_t, Y_t is monotone if X_0 ≤ Y_0 implies X_t ≤ Y_t for all t > 0. It is then easy to see that if our grand coupling is monotone, it is sufficient to keep track of chains started at maximal and minimal elements of the poset: if they have coupled, all states have coupled. For Markov chains on infinite state spaces, many grand couplings will never coalesce, and of course we cannot keep track of all of the starting values on a computer. Some chains have a monotonicity property, but such a property is not obvious for the simplex model. Despite this, there is a fairly efficient perfect sampling algorithm that requires tracking only n + 1 points (and a little extra overhead each time an epoch of length T fails to coalesce).

Let X^v_t be a copy of the Markov chain started at v = (v[1], v[2], ..., v[n]) at time 0, and let e_j be the jth standard unit basis vector. We construct a grand coupling of the chains X^v_t as follows. For time 0 < t < T_1, do a proportional coupling. That is, at each time t, choose coordinates i(t), j(t) and parameter λ(t), and update all chains using these three numbers. We claim that for each t, there exists a matrix M_t[i, j] such that for any v, X^v_t[i] = Σ_{j=1}^n M_t[i, j] v[j]. To see this, observe that X_{t+1} = M_{i(t), j(t), λ(t)} X_t, where

M_{i(t), j(t), λ(t)}[i(t), i(t)] = M_{i(t), j(t), λ(t)}[i(t), j(t)] = λ(t),
M_{i(t), j(t), λ(t)}[j(t), i(t)] = M_{i(t), j(t), λ(t)}[j(t), j(t)] = 1 − λ(t),
M_{i(t), j(t), λ(t)}[k, k] = 1 for k ∉ {i(t), j(t)},

and all other entries are 0.
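The matrix form of the single-step update can be checked numerically; the following is a small sketch using NumPy (the helper name is ours):

```python
import numpy as np

def update_matrix(n, i, j, lam):
    """The matrix M_{i,j,lambda} from the text: row i receives a
    fraction lambda of the pooled mass X[i] + X[j], row j the
    remaining (1 - lambda); all other coordinates are untouched."""
    M = np.eye(n)
    M[i, i] = M[i, j] = lam
    M[j, i] = M[j, j] = 1.0 - lam
    return M

# Applying M reproduces the coordinate-wise Gibbs update and
# preserves the total mass, so the chain stays on the simplex.
x = np.array([0.4, 0.3, 0.2, 0.1])
y = update_matrix(4, 0, 2, 0.25) @ x
```

Since each step is linear in the starting point, the composition of the steps up to time t is itself a matrix, which is the claim made above.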
We can then write M_t = ∏_{s<t} M_{i(s), j(s), λ(s)}. For t > (3/2) d n log n,

P[‖X^{e_j}_t − X^{e_k}_t‖_1 > n^{−k}] < 2 n^{2k+1−d},

and so, taking a union bound and applying the inequality proved just above,

P[sup_{v,w ∈ Δ_n} ‖X^v_t − X^w_t‖_1 > n^{3−k}] < 2 n^{2k+3−d}.

This tells us that after O(n log n) steps, the L¹ distance between any pair of points is extremely small with high probability.

The second step of the coupling is almost identical to the algorithm given in Section 4 of this note. Run X^{(1/n, ..., 1/n)}_t from time T_1 to time T, recording all choices of i(t), j(t) and λ(t) from representation (1.1). Then form the same partition process, and use it to attempt subset couplings of all variables to this special chain. We will perform these couplings in such a way that, with high probability, all chains simultaneously have successful subset couplings, rather than merely having a high probability of a substantial fraction of the subset couplings succeeding. At each subset coupling stage, use the update variable λ(t) for the chain X^{(1/n, ..., 1/n)}_t. For each other chain X^v_t, there will be some probability p(t, v) that X^v_t performs a successful subset coupling with X^{(1/n, ..., 1/n)}_t. Let p be a known lower bound on inf_{v ∈ Δ_n} p(t, v); this can be obtained from Lemma 4.4 and inequality (7.1). To determine the update value of X^v_t, choose a single uniform random variable U. If U < p, let X^v_t have a successful subset coupling, in which case the change to X^v_{t+1} depends only on i(t), j(t) and X^{(1/n, ..., 1/n)}_{t+1}, not the particular value of U. Otherwise, update with λ taken from the (U − p)/(1 − p)'th quantile of the remainder distribution.
When (X^v_t[i] + X^v_t[j])/(X^{(1/n, ..., 1/n)}_t[i] + X^{(1/n, ..., 1/n)}_t[j]) ≤ 1, this has density

f(λ) = C (1 − (X^v_t[i] + X^v_t[j])/(X^{(1/n, ..., 1/n)}_t[i] + X^{(1/n, ..., 1/n)}_t[j]) 1[g^{−1}(λ) ∈ [0, 1]])

for

g(λ) = λ (X^{(1/n, ..., 1/n)}_t[i] + X^{(1/n, ..., 1/n)}_t[j])/(X^v_t[i] + X^v_t[j]) + (1/(X^v_t[i] + X^v_t[j])) Σ_{s ∈ S ∖ {i}} (X^{(1/n, ..., 1/n)}_t[s] − X^v_t[s]),

and C a normalizing constant. An analogous formula holds when (X^v_t[i] + X^v_t[j])/(X^{(1/n, ..., 1/n)}_t[i] + X^{(1/n, ..., 1/n)}_t[j]) > 1.

Under this grand coupling, all subset couplings succeed together with probability at least p. As long as the n points X^{e_j}_{T_1} are close as measured in the L¹ metric, and X^{(1/n, ..., 1/n)}_t[i] remains far from 1 and 0, the proof of Lemma 4.4 tells us that all of the subset couplings succeed with high probability. Finally, if a single subset coupling fails at time t, then all chains should be coupled according to the proportional coupling for times s > t.

It remains to determine what to do if one of the above subset couplings fails. In order to obtain a perfect sample, it will be necessary to look at a grand coupling for the epoch −2T ≤ t ≤ −T. Assume for now that the grand coupling described above succeeds for the chain started at −2T. It is necessary to determine the value at time 0 of the chain started from (1/n, ..., 1/n) at time −2T. Assume that at time −T, this chain is at v ∈ Δ_n. Then our sample will be X^v_0. Fortunately, from the above description, it is possible to calculate this value from v and the values of i, j, λ and U used during the first epoch. Thus, it is sufficient to record those O(T) pieces of information in each failed epoch. A longer discussion of this algorithm, with pseudocode, may be found in [23].
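Generically, the shared-uniform mechanism used in the subset coupling stage looks like the following sketch (the names and the toy remainder distribution in the test are ours; in the algorithm itself, the remainder quantile is the inverse cdf of the remainder density f(λ) described above):

```python
import random

def shared_uniform_update(p, coupled_move, remainder_quantile, rng=random):
    """One update driven by a single uniform U, as in the text: if
    U < p the chain takes the successful subset-coupling move, whose
    outcome does not depend on the particular value of U; otherwise
    lambda is read off from the remainder distribution at its
    (U - p)/(1 - p)'th quantile."""
    U = rng.random()
    if U < p:
        return coupled_move()
    return remainder_quantile((U - p) / (1.0 - p))
```

Because one U is shared by all the chains, the single event {U < p} makes every subset coupling succeed simultaneously, which is exactly what the grand coupling requires.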
It should be noted that, for other target distributions on the simplex, such as those in Section 6, the above algorithm can also be used without a rigorous bound on the mixing time, and can be used to rigorously check an estimated bound of time T. Simply run the algorithm with epoch size T; the number of failed runs k out of a total of N runs is distributed as a binomial random variable with some unknown probability q, where q is an upper bound on the total variation distance to stationarity at time T.

Acknowledgments. The author thanks David Aldous for mentioning the problem, and Olena Blumberg, Persi Diaconis, Bob Hough, Daniel Jerison and John Jiang for many helpful conversations. The author also thanks the reviewers for friendly and useful comments.

REFERENCES

[1] Aldous, D. and Fill, J. (1994). Reversible Markov Chains and Random Walks on Graphs. Available at http://www.stat.berkeley.edu/~aldous/RWG/book.html.
[2] Blumberg, O. (2011). A coupling proof for random transpositions. Preprint.
[3] Bollobás, B. (2001). Random Graphs, 2nd ed. Cambridge Studies in Advanced Mathematics 73. Cambridge Univ. Press, Cambridge. MR1864966
[4] Burton, R. and Kovchegov, Y. (2011). Mixing times via super-fast coupling. Preprint.
[5] Diaconis, P. (2009). The Markov chain Monte Carlo revolution. Bull. Amer. Math. Soc. (N.S.) 46 179–205. MR2476411
[6] Erdős, P. and Rényi, A. (1961). On a classical problem of probability theory. Publ. Math. Inst. Hung. Acad. Sci., Ser. A 6 215–220.
[7] Hayes, T. and Vigoda, E. (2003). A non-Markovian coupling for randomly sampling colorings. In FOCS Proceedings.
[8] Jiang, J. (2012). Total variation bound for Kac's random walk. Preprint.
[9] Jiang, J. (2012). Polynomial mixing time of the Kac random walk on the orthogonal group. Preprint.
[10] Jones, G. L. and Hobert, J. P. (2001).
Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statist. Sci. 16 312–334. MR1888447
[11] Levin, D. A., Peres, Y. and Wilmer, E. L. (2009). Markov Chains and Mixing Times. Amer. Math. Soc., Providence, RI. MR2466937
[12] Lovász, L. (1999). Hit-and-run mixes fast. Math. Program. 86 443–461. MR1733749
[13] Lovász, L. and Vempala, S. (2003). Hit and run is fast and fun. Technical report, Microsoft Research.
[14] Oliveira, R. I. (2009). On the convergence to equilibrium of Kac's random walk on matrices. Ann. Appl. Probab. 19 1200–1231. MR2537204
[15] Propp, J. G. and Wilson, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9 223–252.
[16] Randall, D. and Winkler, P. (2005). Mixing points on an interval. In Proceedings of ANALCO. SIAM, Philadelphia.
[17] Randall, D. and Winkler, P. (2005). Mixing points on a circle. In Approximation, Randomization and Combinatorial Optimization. Lecture Notes in Computer Science 3624 426–435. Springer, Berlin. MR2193706
[18] Rosenthal, J. S. (1994). Random rotations: Characters and random walks on SO(N). Ann. Probab. 22 398–423. MR1258882
[19] Rosenthal, J. S. (1995). On generalizing the cut-off phenomenon for random walks on groups. Adv. in Appl. Math. 16 306–320. MR1342831
[20] Rosenthal, J. S. (1995). Minorization conditions and convergence rates for Markov chain Monte Carlo. J. Amer. Statist. Assoc. 90 558–566. MR1340509
[21] Rubinstein, R. Y. and Melamed, B. (1998). Modern Simulation and Modeling. Wiley, New York. MR1607871
[22] Smith, A. (2012). Analysis of convergence rates of some Gibbs samplers on continuous state spaces. Preprint.
[23] Smith, A. (2012). Some analyses of Markov chains by the coupling method. Ph.D. thesis, Stanford Univ., Stanford, CA.
[24] Yuen, W. K. (2001).
Applications of geometric bounds to convergence rates of Markov chains and Markov processes on R^n. Ph.D. thesis, Univ. Toronto. MR2702043

ICERM
Brown University
121 South Main Street
Providence, Rhode Island 02912
USA
E-mail: asmith3@math.stanford.edu
