Irreversible Monte Carlo Algorithms for Efficient Sampling

Irr eve rsible Monte Carlo Algorithms f or Efﬁcient Sampling K onstantin S. T uritsyn ( a,b,c ) , Michael Chertkov ( b,d ) , Marija V ucelja ( d,b ) a J ames F rank Institute, University of Chicago, C hicago , IL 60615, USA b Center for Nonlinear Studies & Theor eti cal Division, LANL, Los A lamos, NM 87545 , USA c Landau Institute for Theor etical Physics, Moscow 142432 , Russia d Department of Physics of Complex Systems, W eizmann Institute of Sciences, Rehovot 7610 0, Israel Equilibrium systems e volv e according t o Detailed Balance (DB). This principe guided dev elopment of the Monte-Carlo sampling techniques, of which Metropolis-Hastings (MH) algorithm is the famous representativ e. It is also kno wn that DB i s sufﬁcient but not necessary . W e construct irre versible deformation of a gi v en rev ersible algorithm capable of dramatic i mprov ement of sa mpling from kno wn distrib ution. Our transformation modiﬁes transition rates keepin g the structure of t ransitions intact. T o illustrate the general scheme we design an Irrev ersible version of Metropolis-Hastings (IMH) and test it on example of a spin cluster . Standard MH for the model suf fers from the critical slo wdo wn, while IMH is free from critical slowdo wn. P A CS numbers: 02 .70.Tt, 05.10.Ln, 64.75.Ef Recent decades h av e been marked by f ruitful in teraction be- tween physics and com puter science, with one of the m ost striking examples of such in teraction going back to 40s wh en physicists proposed a Mar kov Chain Mo nte Carlo (MCMC) algorithm [ 1, 2]. MCMC ev aluates large sums, or integrals, approx imately , in a sense im itating how nature would do ef ﬁ- cient sampling itself. Dev elopment o f this idea has becam e wide spread and prolifer ated a g reat variety of disciplines. (See [3, 4, 5] for a sample set of reviews in physics and com - puter science.) If one fo rmally follows the letter of the orig- inal MCMC sug gestion on e o ught to ensure that the Detailed Balance ( DB) c ondition is satisﬁed. This condition reﬂects microscop ic reversibility of the underly ing equilibr ium d y- namics. A re ader, impressed with in disputable su ccess o f th e reversible MCMC techniques, may still wonder i f the equilib- rium dy namics is the m ost efﬁcient strategy for sampling and ev aluating the integrals? I n this letter we argue that typically the answer is NO. Let u s try to illustrate the id eas on a sim- ple everyday life example. Consider m ixing sugar in a cup of coffee, which is similar to sampling, as long as the sugar particles h av e to explore the entire interio r of the cup. DB dynamics corre sponds to diffusion taking an enormou s mix - ing time. This is certainly not the best way to mix. More- over , o ur ev eryday experien ce sugg ests a b etter solution – enhance mixin g with a spoon. Spoon steering g enerates an out-of -equilibriu m external ﬂow which signiﬁcantly acceler- ates mixing, while ac hieving the same ﬁnal resu lt – uniform distribution of su gar con centration over the cup. In this letter we show constructiv ely , with a practical algorithm suggested , that similar strategy can be used to dec rease mix ing time o f any kno wn re versible MC MC algorithm s. There are two main obstacles wh ich prevent fast mixing by traditional MCMC methods. First, th e effecti ve en ergy lan d- scape can hav e high barriers, separating the energy minima. I n this case mixing time is do minated by r are pro cesses of over- coming th e bar riers. Sec ond, slow mixin g can origin ate from the high en tropy o f the states basin (too many co mparably im - portant s tates) p roviding major c ontribution to the sy stem p ar- tition function . In the later case mixin g time is determined by the numb er o f steps it takes for r ev ersible ( diffusi ve) random walk to e xplore all the relev ant s tates. MCMC algor ithms are best described on d iscrete example. Consider a graph with vertices i = 1 , . . . , N each labeling a state of th e system and edges ( i → j ) co rrespond ing to “allowed” transitions between the states. F or in stance, an N - dimensiona l h ypercub e cor respond s to a system of N spins (with N = 2 N states) with single- spin ﬂips allowed. An MCMC algorith m can b e describ ed in terms of the tran sition matrix T ij representin g the probab ility of a single MCMC step from state j to state i . Proba bility of ﬁnding the system in state i at time t , P t i , ev olves according to th e f ollowing Mas- ter Equation (ME): P t +1 i = P j T ij P t j . Stationary solution of ME, P t i = π i , satisﬁes the Balance Condition (BC): X j ( T ij π j − T j i π i ) = 0 . (1) Q ij = T ij π j from th e lhs of Eq. (1 ) ca n b e inter preted as the stationary probability ﬂux from state i to state j . Obviously , stationarity of th e p robab ility ﬂow reads as the co ndition fo r incoming and outgoin g ﬂu xes at any state to sum up to zero . Note also, that Eq. (1) is n othing but incompressibility condi- tion of the stationary probab ility ﬂow . The D B used in traditiona l MCMC algorithms is a more stringent cond ition, a s it requires the piecewise balan ce of terms in th e sum ( 1): f or any pair of states with allowed tran- sitions one requ ires, T ij π j = T j i π i . The main reason for DB to be so o ften used in p ractice originate s from its tre men- dous simplicity . Oth erwise, DB-con sistent schemes co nstitute only a small subset o f all other MCMC schemes co n vergent to the same stationa ry distrib ution π . From the hydrod ynamic point of view reversible MCMC co rrespond s to irrotation al probab ility ﬂows, while irreversibility relates to n onzero r o- tational part, e.g. correspond ent to vortices contained in the ﬂow . Putting it for mally , in the irreversible case antisym- metric part of the ergod ic ﬂow matrix is nonzero and it ac- tually allows the following cycle deco mposition, Q ij − Q j i = P α J α ( C α ij − C α j i ) , wh ere in dex α enumerates cycles on the graph of states with the adjacency matrices C α ij . Then , J α 2 stands for the magnitude of the p robability ﬂux ﬂowing over cycle α . This cycle r epresentation sug gests how an irre versible MCMC algo rithms can be constru cted. One simply add s cy- cles to a given reversible Markov Chain, ensuring that th e transition prob abilities r emain po siti ve. In some simp le ex- amples this straightf orward ap proach can be rath er efﬁcient. Consider, for example, th e problem of samplin g unifo rm dis- tribution on a two-d imensional lattice, with the character istic size L ≫ 1 . Reversible Mo nte Carlo algorithm s would re- quire appro ximately T ∼ L 2 steps to c on verge. On the other hand, imposing a kind of super-lattice o f size l cycles, where 1 ≪ l ≪ L , o ne observes that typical mixing time is now determined b y an interplay of usual an d “turbulent” d iffusion respectively . One estimates, T ∼ l 2 + L 2 /l , wh ere the two terms correspo nd to dyna mics on small (sub vortex-size) and large scales. The minimal value of T is ach iev ed at l ∼ L 2 / 3 , thus resulting in T ∼ L 4 / 3 . M oreover , one can reduce T even further to T ∼ L with the help of an additional liftin g o pera- tion, see e.g . [6] for r elated d iscussion. T he impo rtant lesson we draw fro m this example is that k nowing the st ate spa ce and carefully planting irreversible cycles one can indeed ach iev e a signiﬁcant acceleration of mixing. Howe ver prom ising it looks, the cycle-based pro cedure has two serio us caveats making it difﬁcult to implement. First, the num ber of states in m ajority of inter esting problem s is (at least) exponential in the num ber of physical degrees o f free- dom. (For e xample an Ising system consisting of N spin s has 2 N states.) Thus, one expects th at th e n umber o f cycles suf- ﬁcient for essential mixing imp rovement is also exponen tial, i.e. too large o f a n umber for alg orithmically feasible imple- mentation. Second, n ot all cycles a re eq ually desirab le, m ak- ing optimization over cycle plac ements to be a task of even higher algorithm ic co mplexity . Therefo re, aiming to achiev e practical and ﬂexible im- plementation , we h av e to aban don the cycle-ba sed idea an d instead focus on a better alternative – building irreversible MCMC algorithms v ia controlled deformatio n o f a n existent reversible MCMC. T o be more speciﬁc, in the remaind er of this Letter we ado pt and develop replication /lifting trick dis- cussed in [6, 7]. The main id ea beh ind o ur strategy is as f ol- lows. In stead o f p lanting into the system an irreversible proba- bility ﬂu x, corr esponden t to an “ incompr essible” BC , we add a m ixing desirable “co mpressible” ﬂux , and compensate for its com pressibility by building an add itional replica with r e- versed ﬂux and allowing some inter-replica transitions. T o enforce BC on e tunes the re plica switching probabilities com- puted “on the ﬂy ” ( and locally). The replication idea is illus- trated in Fig. 1. Ackn owledging generality of the setting, we focus in this Letter on explaining o ne relati vely simple imple- mentation of th is idea. Generalization an d modiﬁcations of the procedu re will be analyzed and discussed else where. Consider reversible MCMC algorithm character ized by the transition matrix T ij which (a) o beys the DB con dition, an d (b) c onv erges to the equ ilibrium distribution π i . Assume that each state has duplicate s in two replicas, m arked by ± . Fol- FIG. 1: Schematic representation of the replication deformation. Dashed lines represent replica swi tching transitions, which compen- sate fo r com pressibility of the probability ﬂows associated with so lid lines. lowing some local r ule (an example will be provided be low) one intro duces a sp lit between states within eac h o f the rep li- cas, T ij = T (+) ij + T ( − ) ij , such that a ll T ( ± ) ij are positi ve and satisfy , ∀ i 6 = j , T (+) ij π j = T ( − ) j i π i , to be called the skew DB condition . The total transition matrix, ˆ T =  ˆ T (+) ˆ Λ (+ , − ) ˆ Λ ( − , +) ˆ T ( − )  , (2) also contains nonzero and p ositiv e ( as probabilities) inter- replica terms, Λ ( ± , ∓ ) ii , allowing transitions only b etween two r eplicas of the same state. One tunes the inter- replica ter ms to ensure conv ergence to the giv en steady dis- tribution, π i . This is ac hieved by choo sing, Λ ( ± , ∓ ) ii = max n 0 , P j T ( ± ) ij − T ( ∓ ) ij o > 0 , and the d iagonal terms T ( ± ) ii are ﬁxed accordin g to the stoch asticity cond ition: T ( ± ) ii = 1 − P j T ( ± ) j i − Λ ( ∓ , ± ) ii . This de scription co mpletes our con struction of an irreversible MCMC alg orithm from a giv en reversible one. Note that this c onstruction is not un ique, and in gen eral multiple choices of Λ ( ± , ∓ ) ii are possible. The propo sed scheme is illustrated below on examp le of a simple spin system, with th e Metro polis-Hastings (MH)- Glauber al- gorithm chosen as the respective reversible p rototyp e. MH [8] is the most popular rev ersible MCMC algor ithm. MH-transition f rom a curren t state i is de ﬁned in two steps. (A) A n ew state j is selected random ly . (B) The proposed state is accepted with prob ability p acc = min(1 , π j /π i ) or re- jected wit h the probab ility 1 − p acc respectively . Selecting the propo sed state i.i.d. randomly from all possible single spin ﬂips cor respond s to the Glau ber dyna mics popu lar in simula- tions of sp in systems. Let u s n ow explain h ow to build an ir- reversible M CMC algorithm fo r sp in systems based on th e re- versible MH-Gla uber alg orithm. One consid ers separ ation in two replica s accordin g to the sign value, + or − , of th e spin to be ﬂipped . T hen, our irreversible MH-Glauber scheme works as follows. Spin α is selected i.i.d. randomly from the po ol of 3 2 4 6 8 10 12 14 2 4 6 8 10 12 14 16 18 20 log 2 N log 2 T FIG. 2: Correlation time of the total spin de-correlation in the spin cluster model. Dots correspon d to t he direct diagonalization of the transition matri ces. Crosses are correlation times found from re- specti ve MCMC simulations. Blue and red colors correspond to re- versible and irrev ersible algorithms respecti vely . Best ﬁtting slopes are giv en by T r ev ∼ N 1 . 43 and T irr ∼ N 0 . 85 . all other spins of th e system ha ving + or − values, d ependin g on the sign of th e replica where the sy stem stays. The selected spin is ﬂipped with th e pro bability p acc = min(1 , π j /π i ) , in which case the system stay s in the same r eplica. If the ﬂip is not accep ted the state is switch ed to its cou nterpart of the other replica wit h probability Λ ( ∓ , ± ) ii / (1 − P j T ( ± ) j i ) . ( These transitions are indicated as dash lines in Fig. 1.) Note, that in the case of th e Gla uber dynam ics both Λ ( ∓ , ± ) ii and P j T ( ± ) j i are local quantities d epending only o n the cu rrent state of the system, and calculating tr ansition prob abilities co nstitutes an insigniﬁcant computatio nal overhea d. W e cho ose N -spins ferr omagne tic clu ster (eq ual strength interaction between all the spin s) as a testbed an d discuss sampling from resp ectiv e station ary distribution, π s 1 ...s N ∼ exp h ( J / 2 N ) P k,k ′ s k s k ′ i . Note, that a state of th e sim- ple system is completely characterized by its global spin, S = P k s k , a nd respective prob ability distribution, P ( S ) ∼ N ! N + ! N − ! exp  J S 2 / (2 N )  , where N ± = ( N ± S ) / 2 is the number of positive/negati ve spins. Consider ed in the ther- modyn amic limit, N → ∞ , the system undergoes a phase transition at J = 1 . A way f rom the transition in the param- agnetic p hase, J < 1 , P ( S ) is cente red ar ound S = 0 and the width o f the distribution is estimated by δ S ∼ p N /J , which changes to δ S ∼ N 3 / 4 at the critical poin t J = 1 . One imp ortant consequence of the d istribution b roaden ing is a slowdown o bserved at the critical po int for reversible MH- Glauber sampling. T hen chara cteristic co rrelation time of S (measured in the nu mber of Markov chain steps) is estimated as T r ev ∼ ( δ S ) 2 , and the com putational overhead associated with the critical slo wdown is ∼ √ N . W e b rough t th is sim- ple mo del to illustrate advantage of using irrev ersibility . As shown below , the irreversible modiﬁcation of the MH-Glaube r algorithm applied to the spin cluster problem achieves c om- plete removal of the critical slowdown. T o estimate the corre- lation time in the irre versible case we ﬁrst note that, switching from one r eplica to an other the system a lw ays g o throu gh the S = 0 state. ( Th is follows directly fr om the o bservation that Λ (+ , − ) ii = 0 for the states with S > 0 and Λ ( − , +) ii = 0 for the states with S < 0 .) The Markovian nature o f the alg orithm implies tha t all the tr ajectories c onnecting two consequen t S = 0 -swip es are statistically i ndepen dent, and t herefor e co r- relation tim e is roughly equal to the t ypical number of steps in each of these trajectories. Recalling that insid e a replica (i.e. in b etween two consecutive swipes) dy namics o f S is strictly monoto nous, one estimates T irr ∼ δ S . Th is estimate suggests a sig niﬁcant a cceleration: T irr ∼ √ T r ev ≪ T r ev . No te, that one expects to observe signiﬁcant acceleratio n ev en outside o f the critical domain, for larger and smaller v alues of J . W e veriﬁed the cor relation time estimation via nume rical tests. Imp lementing reversible and irreversible versions of the MH-Glauber algorithm we, ﬁrst, analyzed decay o f the pair corre lation fu nction, h S (0) S ( t ) i , with time. Respective correlation time was r econstructed by ﬁtting the large time asymptotics with exponential f unction, exp( − t/T r ev ) , and exponential-o scillatory fu nction, exp( − t/T irr ) cos( ω t − ϕ ) , in the r ev ersible and irr ev ersible cases resp ectiv ely . Sec- ond, for both MH and IMH alg orithms we con structed tran- sition matrix correspon ding to the ran dom walk in S , cal- culated spectral gap, ∆ , related to the corr elation tim e as, T = 1 / R e ∆ . In b oth tests we analyz ed cr itical poin t J = 1 and used dif ferent v alues o f N rangin g fr om 16 to 40 96 . Simulation results are shown in Fig. 2. Th e results found for two settings are consistent with each o ther . Numerica l values ( T r ev ∼ N 1 . 43 and T irr ∼ N 0 . 85 ) are also in a reasonable ag reement with respec ti ve theoretical p redictions ( T r ev ∼ N 3 / 2 and T irr ∼ N 3 / 4 ) while a slight d iscrep- ancy can be attributed to ﬁnite size effects. Note, that in the irreversible case c orrelation time of the global spin correla- tion function (numb er of resp ectiv e MC steps) grows with the nu mber of sp ins, N , but does it slower than linearly . I n other words, mixing be comes so ef ﬁcient that equilibration of the global spin corr elations is observed even before all spin s of the sy stems are ﬂipped . One conclu des, that pe rforman ce of the irreversible scheme is at least as good as th e one of the cluster alg orithms [9, 1 0] tested o n the spin c luster m odel [11, 12]. (W e note, h owe ver , that direct comparison of the two algorithm s is no t straightforward, as the cluster algo rithm ﬂips many spins at o nce a nd there fore its convergence is norma lly stated in renorm alized un its.) Let us now discuss relation of the proposed algor ithm to previous studies. Alth ough potential power of alg orithms with broken DB has been re alized for already a while, only ha ndful of irreversible examples have bee n propo sed so f ar . One of th e examples is the sequen tial updating algo rithm [13] designed to simulate two-d imensional Ising system. In the essence, the algor ithm con sists of a num ber o f subsystems (replicas) with internal dy namics, each cha racterized by its own transi- tion matrix. In a great contrast with our a lgorithm, the s ystem switches between r eplicas in a predeﬁn ed d eterministic fash- ion. Similar id ea of breaking DB by switching irre versibly 4 but periodically betwe en reversible po rtions was impleme nted in the suc cessi ve over-relaxation algorithm o f [14]. Impor- tant sampling algorith m with DB broken is the Hybrid Monte Carlo of [15], where Hamiltonian dynamics is used to accel- erate samp ling. Once ag ain th e story h ere relates to r eplicas, each par ameterized by distinct momen tum, with switches be- tween the r eplicas contro lled deterministically by th e under- lying Hamiltonian. It is also app ropriate to cite relev ant ef- forts origina ted in statistics [7], m athematics [6] and comp uter science [17]. Sev eral simp le examples o f irrev ersible algo- rithms were discussed and analyzed in [7]. [6] showed that improvement in mixing, provided by a multi-replica lifting, does not allow reduc tion stronger than the one observed in the diffusi ve-to-ballistic scen ario, T → √ T , w here T is th e mix - ing time of the und erlying reversible algor ithm. The grain of salt here is that the acceleration was achie ved via a replication of an extremely h igh, ∼ k 2 , degree where k is the n umber of states. [1 7] showed t hat some co mplemen tary idea s, from the ﬁeld of d istributed networks, allo ws to reduce this rep lication scaling a bit. W e also ﬁnd it useful to brie ﬂy discuss reversible alg o- rithms showing certain similarity to the algorith m and ideas of the Letter . First of all, it is imp ortant to mention again cluster algorithms [9, 10, 1 6] wh ich wer e most successful in biting the odds of the critical slowdo wn in th e regular sys- tems of the Ising type. The trick here is to explore duality of the model, which allo ws two alterna ti ve representatio ns re- lated to each other via a state- non-lo cal tr ansformatio n. The cluster algo rithm switches between two dual representatio ns, thus realizin g lon g jum ps in the ph ase space. Note that suc h jumps would be fo rbidden by a phase- space local dynamics in either of the two represen tations. Best algorithm s of the clus- ter typ e ach iev es very impressive ra te of con vergence. The downside is in the fact that the cluster algor ithms are mod el speciﬁc and rather difﬁ cult in implemen tation b ecause o f ex- treme p hase spa ce no n-locality of th e steps. W orm algorithm of [18] allows essential re duction in the critical slo wdo wn via mapping to a high-temp erature-in spired lo op repr esentation and m aking local m oves there. The last but n ot the least, we mention the simulated an nealing algorith m of [19] b uilt o n a temperatur e-grad ed replication consistent with DB. On e inter- esting d irection for f uture research is to explore if ( and under which condition s) additiona l irreversibility can im prove al- ready go od mixing perfo rmance provided with in each of these reversible algo rithms. T o su mmarize, this le tter describes ho w to up grade a re- versible M C into an irrev ersible MC c on verging to the same distribution faster . T o prove the con cept we design a spin- problem speciﬁc irreversible algorithm, and te sted it on the mean-ﬁeld spin-cluster model. W e showed on this exam- ple that the irreversible modiﬁcation c an lead to d ramatic ac- celeration o f MC mixing. Our results sug gest th at the ir re- versible MC alg orithms are esp ecially beneﬁcial for accel- eration of mixing in systems containing multiple soft and zero mod es, howe ver u naccessible fo r standard ( reversible) schemes. This situation occur s typically in systems exper i- encing cr itical slowdown in the vicinity of a ph ase tran sition, and it is also an inh erent pro perty of system s possessing in- ternal sym metries o f h igh d egree. E ntropic degeneracy is the main factor limiting th e conver gence o f regular MCMC alg o- rithm in these pro blems. T o co nclude, th e ideas discussed in the letter might be usefu l in stud ies o f pha se transitions, soft matter dynam ics, pr otein structures and granula r m edia. The au thors ar e g rateful to V . Ch ernyak, F . Krzakala, J. Machta, D. Shah and T . W itten f or insp iring d iscussions and useful remarks. The work at LANL was carried out under the au spices of the Nationa l Nuclear Security Administration of the U.S. D epartment of Ene rgy at Los Alamos National Laborato ry u nder Contract No. DE-AC 52-06 N A25396. MC also acknowledges the W esto n V isiting Professorship Pro- gram supportin g his stay at th e W eizmann In stitute, where part of this work was done. [1] N. Metropolis and S . Ulam, J. Am. Stat. Assoc., 44 , 247, 335- 341 (1949). [2] N. Metropolis, A. W . Rosenbluth, M.N. Rosenbluth, A.H. T eller, and E. T eller . J. of Chem. Phys. 21 (6), 1087-1092 (1953). [3] D.P . Land au, K. Binder , A guide to Monte Carlo Simulations in Statistical Physics , Cambridge Univ ersity Press (2000) [4] W . Krauth, Statistical Mechanics: Algorithms and Computa- tions , Oxford Univ ersity Press (2006) [5] M. Jerrum, A. Sinclair, Appr oximation Algorithms for NP-har d Pr oblems , PWS Publishing (1996) [6] F . Chen, L. Lov asz, I. Pak, Lifting Marko v Chains to Speed up Mixing , Proceedings of the A CM symposium on Theory of Computing, 275 - 281 (1999) [7] P . Diaconis, S. Holmes, R. M. Neal, T echnical Report BU- 1385-M, Cornell Univ ersity (1997) [8] W .K. Hastings, Biometrika 57 , 97 - 109 (1970) [9] R.H. Swendsen, J.-S . W ang, Phys. Re v . Lett. 58 , 86–88 (1987) [10] U. W olff, Phys. Re v . Lett. 62 , 361 (1989) [11] T .S. Ray , P . T amao , and W . Klein, Phys. Rev . A, 39 , 5949 (1989) [12] N. Persky et. al., Phys. Re v . E, 54 , 2351 (1996) [13] R. Ren, G. Orkoulas, J. Chem. Phys. 124 , 0641 09 (2006) [14] S.L. Adler , Phys. Rev . D 23 , 290 1 (1981) [15] I. Horvath, A.D. K enned y , Nucl. Phys. B 510 , 367 (1998) [16] R. Edwards, A.D. Sokal, Phys. Rev . D 38 , 200 9–2012 (1988) [17] K. Jung, D. Shah, J. Shin, http://web.mi t.edu/devavra t/www/liftedm c.pdf [18] N. Prok of ’ev , B. S vistunov , Phys. R e v . Lett., 87 , 160601 (2001) [19] S. Kirkpatrick, C. D. Gelatt, M. P . V ecchi, S cience, 220 , 4598, 671-680 (1983)

Irreversible Monte Carlo Algorithms for Efficient Sampling

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment