Coarse-to-Fine Sequential Monte Carlo for Probabilistic Programs

Coarse-to-Fine Sequen tial Mon te Carlo for Probabilistic Programs Andreas Stuhlm ¨ uller Rob ert X.D. Ha wkins N. Siddharth Noah D. Go o dman Departmen t of Psyc hology Stanford Univ ersit y { astu, rxdh, nsid, ngo o dman } @stanford.edu Abstract Man y practical tec hniques for probabilistic inference require a sequence of distributions that interpolate b etw een a tractable distribu- tion and an in tractable distribution of interest. Usually , the sequences used are simple, e.g., based on geometric a verages b etw een distri- butions. When mo dels are expressed as prob- abilistic programs, the models themselves are highly structured ob jects that can be used to derive annealing sequences that are more sensitiv e to domain structure. W e prop ose an algorithm for transforming probabilistic pro- grams to c o arse-to-ﬁne pr o gr ams which hav e the same marginal distribution as the original programs, but generate the data at increas- ing levels of detail, from coarse to ﬁne. W e apply this algorithm to an Ising mo del, its depth-from-disparit y v ariation, and a factorial hidden Marko v mo del. W e sho w preliminary evidence that the use of coarse-to-ﬁne mo dels can make existing generic inference algorithms more eﬃcient. 1 INTR ODUCTION Imagine watc hing a tennis tournamen t. Y our visual system makes fast and accurate inferences ab out the depth-ﬁeld (how far aw a y are diﬀerent patc hes?), the ob jects (is that a ball or rack et?), their tra jectories, and many other properties of the scene. A p ow erful in tuition is that such feats of inference are enabled by c o arse-to-ﬁne reasoning: ﬁrst getting a rough sense of where the action is in the scene, ab out ho w far a wa y it is, and so on; later reﬁning this impression to pic k out details. The app eal of coarse-to-ﬁne reasoning is manifold. First, there is introspection: When faced with a complex reasoning task, it often helps to tak e a step back, try to understand the big picture, and then fo cus on what seems most promising. The big 2 8 10 3 1 1 1 1 3 2 5 2 1 Figure 1: Incremen tal coarsening reduces surprise in SMC. Particles (red) are directed tow ards high- probabilit y regions (light) step by step as we reﬁne the state space from coarse to ﬁne. The n umbers indi- cate how man y particles are asso ciated with a particular state. picture tends to ha ve few er moving parts, and its parts tend to b e easier to understand. Neuroscience pro- vides another angle: for instance, face pro cessing in the high-level visual cortex plausibly follows coarse-to- ﬁne principles (Goﬀaux et al., 2010), and stereoscopic depth p erception similarly pro ceeds from large to small spatial scales (Menz and F reeman, 2003). Finally , there is a rich set of existing applications of coarse-to-ﬁne tec hniques for sp eciﬁc applications in a diverse set of areas including ph ysical c hemistry (Lyman and Zuck- erman, 2006), sp eec h pro cessing (T ang et al., 2006), PCF G parsing (Charniak et al., 2006), and machine translation (Petro v et al., 2008). Despite the success and app eal of coarse-to-ﬁne ideas , they hav e been dif- ﬁcult to apply in general settings. Here w e propose a system for deriving coarse-to-ﬁne inference from any mo del written as a probabilistic program. W e do this b y lev eraging program structure to transform the initial program into a m ulti-level coarse-to-ﬁne program that Coarse-to-Fine Sequen tial Monte Carlo for Probabilistic Programs can b e used with existing inference algorithms. Probabilistic programming languages pro vide a uni- v ersal and high-level representation for probabilistic mo dels, separating the burdens of mo deling from those of inference. Y et the diﬃculty of inference can grow quic kly as the state space (num b er of program execu- tions) grows large. A widely-used technique for infer- ence in large state spaces is Sequential Monte Carlo (SMC), a class of algorithms based on constructing a sequence of distributions, b eginning with an easy-to- sample distribution and ending with the distribution of interest, with each distribution serving as an im- p ortance sampler for the next. The success of SMC rests on the quality of the approximating sequence. W e presen t a generic metho d for deriving coarse-to- ﬁne sequences of approximating distributions from a probabilistic program. Our approach can b e seen as building a hierarchical mo del from an initial mo del, where each stage of the hierarc hy resolves more details of the state space than the one b efore. W e additionally augment eac h lev el of the hierarch y with a coarse approximation to the evi- dence (implemented via heuristic factors , see Section 3.1), in order to specify a useful conditional distribution at each level. In practice we create the hierarchical mo del and heuristic factors at once b y sp ecifying ho w to “lift” each elemen t of the program—elementary distribu- tions, factors, primitive functions, and constants—to the coarser levels. The resulting mo del supports coarse- to-ﬁne inference by SMC, where the nth distribution in the sequence is simply the state space of the n-coarsest lev els; this is correct inference for the original model b ecause, b y construction, the marginal distribution o ver the ﬁnest level is the original distribution. Mo del transformations let us directly use existing se- quen tial inference algorithms to p erform coarse-to-ﬁne inference, rather than prop osing a new inference al- gorithm p er se . This is in con trast to essentially all prior work on coarse-to-ﬁne inference, including Kiddon and Domingos (2011) and Steinhardt and Liang (2014). One b eneﬁt of this mo dular approac h is that adv ances in SMC algorithms immediately yield improv ements to coarse-to-ﬁne inference. Another b eneﬁt is the concep- tual clarity that comes from an explicit representation of the coarse-to-ﬁne mo del. In the following, w e ﬁrst review probabilistic programs and Sequen tial Monte Carlo. W e then describ e our coarse-to-ﬁne program transform and how it lifts ran- dom v ariables, primitive functions, and factors to op- erate on m ultiple levels of abstraction. W e apply this transform to tw o mo dels in the domain of tracking partially observ able ob jects ov er time giv en visual in- formation, a depth-from-disparit y mo del and a factorial hidden Marko v model, and sho w preliminary evidence that it may help reduce inference time in these do- mains. Finally , we discuss the current limitations of this framework, the circumstances where our approach to coarse-to-ﬁne inference is a go o d ﬁt, and outline researc h questions raised b y this new approac h. 2 BA CK GROUND 2.1 PR OBABILISTIC PROGRAMS Probabilistic programs are mo dels expressed in T uring- complete languages that supply primitives for random sampling and probabilistic inference (e.g., Go o dman et al., 2008; Koller et al., 1997; Pfeﬀer, 2007). Many existing probabilistic mo dels hav e been expressed con- cisely as probabilistic programs. A distinguishing fea- ture of probabilistic programming as a mac hine learning tec hnique is that it separates inference techniques from mo deling assumptions. Th us, any adv ances in algo- rithms provide b eneﬁts for a wide range of applications at once. While we demonstrate our technique for a small set of mo dels chosen for their p edagogical v alue, w e emphasize that the technique can b e applied to a m uch wider range of mo dels without mo diﬁcation. W e express probabilistic programs in WebPPL (Go o d- man and Stuhlm¨ uller, 2014), a small probabilistic language embedded in Ja v ascript. This language is univ ersal, and feature-ric h, so w e exp ect the tech- niques to generalize straigh tforw ardly to other lan- guages. In this language, all random c hoices are mark ed b y sample ; the argument to sample is a distribution ob ject (also called Elementary R andom Primitive , or ERP ), its return v alue a sample from this distribution. Calls to functions suc h as flip(0.5) are shorthand for sample(bernoulliERP, [0.5]) . T o enable probabilistic conditioning, the language sup- p orts factor statemen ts. The argument to factor is a score: a num b er that is added to the log-probability of a program execution, th us increasing or decreasing its relative p osterior probabilit y . This includes hard conditioning on evidence as a sp ecial case (scores 0 and −∞ ). Finally , the language supp orts inference primitiv es suc h as ParticleFilter and MH (Metrop olis- Hastings). Eac h of these takes as an argument a thunk, that is, a sto chastic function that itself takes no argu- men ts. And each of these computes or estimates the distribution on return v alues of this thunk (its mar ginal distribution ), taking into account the re-weigh ting in- duced by factor statements. Figure 2a sho ws a program that implement s a simpli- ﬁed one-step version of multiple ob ject trac king: the noisily observed v alue 7 could hav e b een produced by either x or y , each of which is uniformly chosen from Andreas Stuhlm ¨ uller Rob ert X.D. Hawkins N. Siddharth Noah D. Go o dman v a r n o i s y O b s e r v e = f u n c t i o n ( o b s ) { v a r s c o r e = - 3 * d i s t a n c e ( o b s , 7 ) f a c t o r ( s c o r e ) } v a r m o d e l = f u n c t i o n ( ) { v a r x = s a m p l e ( u n i f o r m E R P ) v a r y = s a m p l e ( u n i f o r m E R P ) v a r o b s e r v a t i o n = f l i p ( . 5 ) ? x : y n o i s y O b s e r v e ( o b s e r v a t i o n ) r e t u r n [ x , y ] } (a) A probabilistic program v a r n o i s y O b s e r v e = f u n c t i o n ( o b s ) { v a r s c o r e = - 3 * d i s t a n c e ( o b s , 7 ) f a c t o r ( s c o r e ) } v a r m o d e l = f u n c t i o n ( ) { v a r x = s a m p l e ( u n i f o r m E R P ) v a r h e u r i s t i c S c o r e = - d i s t a n c e ( x , 7 ) f a c t o r ( h e u r i s t i c S c o r e ) v a r y = s a m p l e ( u n i f o r m E R P ) v a r o b s e r v a t i o n = f l i p ( . 5 ) ? x : y n o i s y O b s e r v e ( o b s e r v a t i o n ) f a c t o r ( - h e u r i s t i c S c o r e ) r e t u r n [ x , y ] } (b) Rewritten using heuristic factors Figure 2: Tw o probabilistic programs with the same marginal distribution (shown in the ﬁnal panel in Figure 1). { 1 , 2 , . . . , 8 } . The ﬁnal panel in Figure 1 shows the marginal distribution on [ x, y ] for this program. In probabilistic programs, the same syntactic v ariable can b e used multiple times. The protot ypical example is the geometric distribution: v a r g e o m e t r i c = f u n c t i o n ( ) { r e t u r n f l i p ( 0 . 1 ) ? 0 : 1 + g e o m e t r i c ( ) } The call to flip(0.1) ma y occur an unbounded num- b er of times. F or many purposes, it is necessary to distinguish and refer to these diﬀerent calls. In the con text of MCMC, Wingate et al. (2011) in tro duced a suitable naming sc heme based on stac k addresses. The address of a random choice is a list of syn tactic lo ca- tions, one for each function on the function call stac k at the time when the random v ariable w as sampled. W e will build on this scheme to associate corresp onding random choices on diﬀerent levels of coarsening with eac h other, and use address in the following to refer to the curren t stac k address. 2.2 SEQUENTIAL MONTE CARLO Supp ose our target distribution is X with probabil- it y mass function p . Importance sampling generates samples from an appro ximating distribution Y (with probabilit y mass function q ) and re-weigh ts the samples to account for the diﬀerence b etw een true and appro x- imating w ( x ) = p ( x ) /q ( x ). T o compute estimates of ψ = E x ∼ X [ f ( x )] given samples y 1 , . . . , y n w e use ˆ ψ = P n i =1 w ( y i ) f ( y i ) P n i =1 w ( y i ) T o generate appro ximate samples from p ( x ), w e re- sample from the set of samples in prop ortion to the imp ortance weigh ts. If we iterate this pro cedure with a sequence of ap- pro ximating distributions q 1 , . . . , q k , we get Se quential Imp ortanc e Sampling . If we resample at each stage, we get Se quential Imp ortanc e R esampling . If we addition- ally apply MCMC “rejuv enation” steps at each stage i with a transition kernel that lea ves the distribution q i in v ariant, w e get Se quential Monte Carlo . F or Sequential Imp ortance Sampling, the sum of the KL divergences b et ween successive distributions con- trols the diﬃculty of sampling (F reer et al., 2010). If w e can sample from the right coarse-grained distribu- tions, we can reduce this diﬃculty , as illustrated in Figure 1. With rejuv enation steps (SMC), the picture is more complex, but empirically , it is still the case that distributions that are closer together in KL gener- ally make the sampling problem easier. In particular, w e exp ect that goo d coarse-to-ﬁne sequences lead to b etter co verage of regions with high p osterior proba- bilit y , and that they enable more eﬃcien t pruning of lo w-probability regions. A ﬁnite set of ﬁne-grained par- ticles ma y not co ver the entire region, which can lead to a situation where all particles assign lo w probability to the next ﬁltering step (particle decay). A particle that has not b een reﬁned yet corresp onds to distributions on ﬁne-grained states, th us each suc h particle can cov er a bigger region (Steinhardt and Liang, 2014). Go od coarse-to-ﬁne sequences can allow us to prune entire parts of the state space in one go, only considering re- ﬁnemen ts of abstract states that hav e suﬃcien tly high p osterior probability (Kiddon and Domingos, 2011). 3 ALGORITHM Giv en a probabilistic program, our algorithm builds a coarse-to-ﬁne program with the same marginal dis- tribution as the original program, but with additional Coarse-to-Fine Sequen tial Monte Carlo for Probabilistic Programs laten t structure corresp onding to coarsened v ersions of the program. W e will assume that the user provides a coarsenValue function that describ es how v alues map to more ab- stract v alues. Iterating this function leads to m ultiple lev els of coarsened v alues. Our goal then is to construct a v ersion of the original program that op erates o ver v alues coarsened N times. W e will preserve the basic ﬂo w structure of the program, and th us w e only need to sp ecify ho w each primitive construct in the program is lifted to the space of coarsened v alues. The tricky part is to construct these lifted comp onen ts suc h that the ﬁnal marginal distribution is preserved. W e use t wo ideas to accomplish this. First, we replace each unconditional elementary distribution at a given lo ca- tion with a distribution that dep ends on the coarser v alue of the same lo cation, but such that the marginal o ver this coarser v alue yields the original distribution. Second, we treat lifted factor s as only appro ximations useful for guiding inference, which are then canceled by an extra factor inserted at the next-ﬁner level. With this scheme, only the ﬁnest-lev el factors contribute to the ﬁnal score. This gains us ﬂexibility ov er the lifting of primitive functions: lifted functions (that ultimately ﬂo w only to factor statements) only need to hav e similar b eha vior to their original; deviations will b e corrected b y the cancellation of factors. In the next few subsections, we introduce heuristic factors, the inputs that the program transform requires, ho w the mo del syntax is transformed, and how each of the comp onents of the lifted mo del w orks: constants, random v ariables, factors, and primitive functions. 3.1 HEURISTIC F A CTORS A heuristic factor is a factor that is in tro duced for the purp ose of guiding incremental inference algorithms suc h as particle ﬁltering and b est-ﬁrst enumeration (Go odman and Stuhlm ¨ uller, 2014). Its distinguishing c haracteristic is that an equiv alent, canceling factor is inserted at a later p osition in the program in order to leav e the program’s distribution inv ariant. In other w ords, the pair of statemen ts factor(s) and factor(-s) together has no eﬀect on the meaning of a mo del; its only eﬀect is in controlling how inference algorithms explore the state space. F or example, Figure 2b shows a wa y to rewrite the program in Figure 2a in a wa y that initially assigns higher weigh t to program executions where x is close to the true observ ation 7. This is a heuristic, since— dep ending on the outcome of the coin ﬂip—it may b e y whic h is observ ed, in which case there is no pressure for x to b e close to 7. The coarse-to-ﬁne transform in tro duces heuristic fac- tors that guide sampling on coarse lev els tow ards high- probabilit y regions of the state space without c hanging the program’s distribution. 3.2 PREREQUISITES The main inputs to the transform are a mo del, giv en as co de for a probabilistic program, and a pair of functions coarsenValue and refineValue . The main constraint on the mo del is that all ERPs need to b e indep endent, i.e., do not tak e parameters that dep end on other ERPs. If the supp ort of each ERP is known, this can b e achiev ed using a simple transform that replaces eac h dep endent ERP with a maxim um-entrop y ERP , and adds a dep endent factor that corrects the score. That is, we transform v a r x = s a m p l e ( o r i g i n a l E R P , p a r a m s ) to v a r x = s a m p l e ( m a x e n t E R P ) f a c t o r ( o r i g i n a l E R P . s c o r e ( x , p a r a m s ) - m a x e n t E R P . s c o r e ( x ) ) This transform lea ves the mo del’s distribution un- c hanged and greatly simpliﬁes the coarsening of ERPs, but reduces the statistical eﬃciency of the mo del. This statistical ineﬃciency can p otentially b e ad- dressed by merging sample and factor statemen ts into sampleWithFactor (Go odman and Stuhlm¨ uller, 2014) after the coarse-to-ﬁne transform =. The mo del is annotated with the name of the main mo del function (which deﬁnes the marginal distribu- tion of interest) and a list of names of ERPs, constan ts, and functions (comp ound, primitiv e, score, and p oly- morphic; see b elo w) to b e lifted. The main parameters that control the coarse distribu- tions are the user-sp eciﬁed functions coarsenValue and refineValue . The function coarsenValue maps a v alue to a coarser v alue; the function refineValue maps a coarse v alue to a set of ﬁner v alues. T o generate v al- ues on abstraction level i , we iterate the coarsenValue function i times. W e require that the t wo functions are in verses in the sense that v ∈ refineValue ( V ) ⇔ coarsenValue ( v ) = V for all v and V . If a mo del has multiple diﬀeren t t yp es of v ariables for whic h inference is needed, it is easy to deﬁne a p oly- morphic coarsening function that implements diﬀerent b eha viors for diﬀerent t yp es of v alues. In cases where diﬀeren t v ariables ha ve the same type but diﬀerent meaning or scale, we recommend using wrapp er types to control which coarsening is used. This pap er do es not address the task of ﬁnding go o d v alue coarsening functions. Instead, w e ask: if such a Andreas Stuhlm ¨ uller Rob ert X.D. Hawkins N. Siddharth Noah D. Go o dman v a r l i f t e d U n i f o r m E R P = l i f t E R P ( u n i f o r m E R P ) v a r l i f t e d D i s t a n c e = l i f t S c o r e r ( d i s t a n c e ) v a r n o i s y O b s e r v e = f u n c t i o n ( o b s ) { v a r s c o r e = - 3 * l i f t e d D i s t a n c e ( o b s , 7 ) l i f t e d F a c t o r ( s c o r e ) } v a r m o d e l = f u n c t i o n ( ) { s t o r e . b a s e = g e t S t a c k A d d r e s s ( ) v a r x = l i f t e d U n i f o r m E R P ( ) v a r y = l i f t e d U n i f o r m E R P ( ) v a r o b s e r v a t i o n = f l i p ( 0 . 5 ) ? x : y n o i s y O b s e r v e ( o b s e r v a t i o n ) r e t u r n [ x , y ] } v a r c o a r s e T o F i n e M o d e l = f u n c t i o n ( l e v e l ) { s t o r e . l e v e l = l e v e l v a r m a r g i n a l V a l u e = m o d e l ( ) i f ( l e v e l = = = 0 ) { r e t u r n m a r g i n a l V a l u e } e l s e { r e t u r n c o a r s e T o F i n e M o d e l ( l e v e l - 1 ) } } Figure 3: The coarse-to-ﬁne mo del corresp onding to the mo del in Figure 2a. This mo del has the same marginal distribution as 2a and 2b, but samples it using the hierarc hical pro cess shown in Figure 1. function is given, how c an we use it to coarsen en tire programs so that we produce a sequence of coarse-to- ﬁne mo dels that is useful for SMC? 3.3 MODEL TRANSF ORM The transform adds a wrapp er coarseToFineModel that calls the mo del once for eac h coarsening lev el, from coarse to ﬁne, each time setting the (dynamically scop ed) v ariable store.level (in the following, lev el ) to the current level. The transform also replaces all ERPs, factors, primitive functions, and score functions with lifted versions that act diﬀeren tly depending on lev el . The coarse mo dels only aﬀect the ﬁne-grained mo dels through the side-eﬀect of storing the v alues of their random choices and the w eights of their factors in store , whic h is used b y the ﬁner-grained mo dels to conditionally sample their random c hoices and compute their factor w eights. The syntactic transform itself pro ceeds as follo w: 1. F or eac h ERP , primitive and score function, insert the corresponding lifted deﬁnition b efore the model deﬁnition. F or example: var liftedPlus = liftPrimitive(plus) 2. Rename all ERPs, factors, primitive and score functions to their corresp onding lifted names in mo del and compound functions. F or example, replace plus with liftedPlus . 3. W rap all constants. F or example, replace c with liftConstant( c ) . 4. As the ﬁrst statemen t in the mo del, store the cur- ren t address , which is needed to compute relative addresses of random choices and factors later on: store.base = getStackAddress() 5. Add a wrapp er coarseToFineModel that calls the mo del once for eac h coarsening level (see Figure 3). W e will now describ e the mechanisms b ehind lifted constan ts, random v ariables, factors, and primitive functions. 3.3.1 Lifting Constan ts T o lift a constan t, w e simply rep eatedly coarsen it to the current level: Algorithm 2: Lifting constants pro cedure liftedConst ant ( c ) for i =0; i < level ; i ++ do c = coarsenV alue( c ) end for return c end pro cedure 3.3.2 Lifting Random V ariables W e lift each random v ariable to a sequence of v ariables, from coarse to ﬁne, such that (1) on the coarsest level, w e unconditionally sample from the original distribu- tion, coarsened N times; (2) at eac h lev el n < N , we sample from the set of v alid reﬁnemen ts of the v alue of the next-coarser v ariable; (3) marginalizing out all coarser v ariables, the distribution at the ﬁnest level reco vers the distribution of the original uncoarsened v ariable. Let D 0 b e the domain of an ERP with distribution p 0 ( x ), and let D n = cv n ( D 0 ) be the set of v alues arriv ed at by rep eatedly applying the coarsenValue function (written cv for short). W e w ould like to decomp ose the original distribution p 0 ( x ) in to a sequence of conditional distributions q ( x n | x n +1 ) for random v ariables x n ∈ D n . If we take: q ( x n | x n +1 ) ∝ X x 0 p ( x 0 ) 1 cv n ( x 0 )= x n ∧ cv( x n )= x n +1 Coarse-to-Fine Sequen tial Monte Carlo for Probabilistic Programs Algorithm 1: Lifting ERPs pro cedure sampleLiftedERP ( e 0 , l ) v 1 = store[erpName( address , l + 1)] if v 1 is undeﬁned then if l is 0 then return e 0 .sample() else return coarsenV alue(sampleLiftedERP( e 0 , l − 1)) end if else ~ v = reﬁneV alue( v 1 ) ~ p = ~ v .map( λ (v) { return getERPScore( e 0 , v , l ) } ) return sampleDiscrete( ~ v, ~ p ) end if end pro cedure pro cedure liftERP ( e 0 ) e 1 = makeERP( λ () { sampleLiftedERP ( e 0 , level ) } ) return λ () { v = sample( e 1 ) store[erpName( address , level )] = v return v } end pro cedure and for the coarsest lev el, N , q ( x N ) = X x 0 p ( x 0 ) 1 cv N ( x 0 )= x N then it is clear that w e preserve the marginal distribu- tion on x 0 . That is: p ( x 0 ) = X x 1 ,...,x N q ( x 0 | x 1 ) · · · q ( x N − 1 | x N ) q ( x N ) . Algorithm 1 sho ws how we implemen t sampling from suc h a decomp osed ERP at a giv en lev el. Note that, to lo ok up the existing v alue at the next-coarser level, w e iden tify random v ariables on diﬀeren t levels based on relativ e stack addresses (via erpName ); this is critical for mo dels such as grammars with an un b ounded num b er of random choices. The implementation for computing the score of lifted ERPs is analogous (although this score is not needed for pure particle ﬁltering, so we omit the details). Both are parameterized b y a function getERPScore that estimates the total probability of the equiv alence class of v alues that map to a given coarse v alue. These ERP scores for coarse v alues can be estimated by sampling reﬁnements, via user-sp eciﬁed scoring functions (as in Section 4.1), or using exact computation (as in Section 4.2). 3.3.3 Lifting F actors W e treat the lifted counterpart to factors as heuristic factors: the score on the next-higher level is subtracted out on the current level. By canceling out these fac- tors when inference pro ceeds to ﬁner-grained levels, w e ensure that the ov erall distribution of the mo del remains unchanged—ultimately , only the base-level factors count. This incremental scoring process formal- izes the intuition of increasing attention to detail as w e mov e down the abstraction ladder. Lik e random v ariables, w e identify factors based on relative stack addresses. Algorithm 3: Lifting factors pro cedure liftedF actor ( s ) s 1 = store[factorName( address , level + 1)] ∨ 0 factor( s − s 1 ) store[factorName( address , level )] = s end pro cedure 3.3.4 Lifting Primitiv e F unctions When a primitive function f is applied to a base-level v alue, it is deterministic. Now we are interested in lifting primitive functions to op erate on coarse v alues. Ho wev er, eac h coarse v alue corresp onds to a set of base-lev el v alues. F or diﬀeren t elements in this set, f ma y return diﬀerent v alues. This suggests that lifted v ersions of f ma y be sto c hastic. W e wish to preserve the marginal distribution of the entire program, but b ecause we hav e required ERPs to b e unconditional and treat coarse factors as canceling heuristics, we ha v e some latitude in how to lift the primitive functions. Algorithm 4 shows one approach to the computation of such coarsened primitives. This algorithm is pa- rameterized by a function marginalize , which may be implemen ted using exact computation, sampling, etc, and which may cac he its computations. Algorithm 4: Lifting primitive functions pro cedure liftPrimitive ( f ) return λ ( ~ x ) { e = marginalize( λ () { ~ x 0 = ~ x .map((uniformDra w ◦ reﬁneV alue) level ) v = f ( ~ x 0 ) return coarsenV alue level ( v ) } ) return sample( e ); } end pro cedure Scoring functions—those that directly compute a score to b e consumed by factor —are a sp ecial case of prim- itiv e functions. W e kno w that they return a n umber, so instead of sampling from the return distribution, w e can simply take the exp ectation. W e ﬁnd that this leads to more stable, and hence useful, heuristic factors. Algebraic data type constructors are another sp ecial case. In man y circumstances (including Section 4.2), they can b e treated as transparent with resp ect to coarsening. F or example, it is frequently useful to Andreas Stuhlm ¨ uller Rob ert X.D. Hawkins N. Siddharth Noah D. Go o dman deﬁne coarsenValue([ x 1 , x 2 , . . . ]) as equiv alen t to [ coarsenValue( x 1 ), coarsenValue( x 2 ), . . . ] . More gen- erally , if a function can apply to coarsened ob jects directly , it can be marked as p olymorphic . In that case, no lifting is necessary . F or example, if w e hav e a function that computes the mean of a list of n um- b ers, and if our coarsening maps lists of num b ers to shorter lists (without changing the t yp e of the list ele- men ts), then w e ha ve the option to mark this function as p olymorphic. 4 EMPIRICAL EV ALUA TION W e introduced t wo complemen tary ideas: (1) Coarse-to- ﬁne SMC (which can be useful even for a single random v ariable with a big state space, giv en an appropriate coarsening function) and (2) constructing coarse-to- ﬁne mo dels from probabilistic programs (using a v alue coarsening function to lift eac h of the comp onents of a probabilistic program to op erate on coarsened v al- ues without c hanging the marginal distribution of the program). Our exp erimen ts mirror this structure: In the ﬁrst set of exp erimen ts, w e ev aluate the b eneﬁts of coarse-to-ﬁne SMC on an Ising model and its depth-from-disparit y v ariation. These mo dels use a single matrix-v alued random v ariable and a single global factor (the energy function). In the second set of experiments—on a F actorial Hidden Mark ov Mo del—we mak e lib eral use of probabilistic program constructs: the mo del is deﬁned as a recursiv e sto c hastic function, and the state transition function is implemented using the higher-order function map . This set of exp eriments fully exercises our program transform, including the iden tiﬁcation of corresp onding coarse and ﬁne random v ariables using relative stack addresses. All mo dels are expressed as programs, and all experi- men ts are implemented within the same framework. F or eac h experiment, we sho w ho w the av erage imp ortance w eight—essen tially a low er b ound on the normalization constan t (Grosse, 2013)—b ehav es ov er time. Note that these exp erimen ts are preliminary: They do not yet employ rejuvenation steps, and hence only test the qualit y of the sequence of coarse-to-ﬁne distribu- tions in the setting of Sequen tial Imp ortance Resam- pling. While the use of a coarse-to-ﬁne sequence of imp ortance distributions results in b etter p erformance than baseline, we ha ve only ev aluated the p erformance compared to a relatively weak baseline (imp ortance sampling for the MRF mo dels, particle ﬁltering for the F actorial HMM mo del), and expect that the results are not impressive on an absolute scale. Indeed, as describ ed in Section 5.2, we b eliev e that some mo diﬁ- cations and further developmen ts are needed to make this approach useful in practice. 4.1 MARK OV RANDOM FIELDS A num b er of applications in ph ysics, biology , and com- puter vision can b e mo deled as Mark ov Random Fields (MRFs). These problems are uniﬁed in sp ecifying a global energy function whic h, b y virtue of the Mark ov prop ert y , depends only on the lo cal neighborho o ds of elemen ts. Once this energy function is speciﬁed, it can b e diﬃcult to minimize; sp ecialized optimization algorithms hav e b een developed for particular domains (Szeliski et al., 2008), but there is no generally applica- ble solution. The lo cal neigh b orho o d structure of the energy func- tion, how ev er, suggests that a coarse-to-ﬁne transfor- mation may b e useful: if neigh b orho o ds are coarsened in to single represen tativ e v alues, then the energy can be minimized in this smaller space, using heuristic factors to guide search in the original space. W e do not claim that our mo del transformation constitutes a solution in itself, but can b e used in tandem with other algorithms to eﬀectively reduce the search space. In this situa- tion, we demonstrate the coarse-to-ﬁne transformation on tw o simple MRFs: the Ising mo del and the stereo matc hing task. 4.1.1 Ising mo del Coarse-to-ﬁne transformations hav e a long history of applications in physics. When studying systems which in teract across multiple orders of magnitude, such as ﬂuids, ferromagnets, and metal alloys, it is intractable to work at the most ﬁne-grained level. Since exact solutions do not exist, physicists developed a method called the r enormalization gr oup (Wilson, 1975, 1979), whic h eﬀectiv ely maps the ﬁne-grained representation of a system onto a coarser but identically parameterized represen tation with similar prop erties. One of the simplest testb eds for renormalization group metho ds is the 2-dimensional Ising mo del. The state space is an n × n lattice of cells, eac h of whic h can take one of tw o spin v alues, σ i ∈ {− 1 , +1 } . The ener gy of a particular conﬁguration of spins σ is given b y the Hamiltonian: H ( σ ) = J X h ij i σ i σ j where J is the inter action c onstant and h ij i indicates summing ov er all p ossible pairs of neighbors. Note that the num b er of p ossible conﬁgurations grows ex- p onen tially in n , rendering an exhaustive search for lo w-energy states imp ossible. Coarse-to-Fine Sequen tial Monte Carlo for Probabilistic Programs 0 1000 2000 3000 4000 5000 0 200 400 600 800 n = 27, T = 1 Seconds Estimated log Z No coarsening 30 lev els 90 lev els (a) Ising at low temp erature ( T = 1) 0 500 1500 2500 3500 −50 0 50 100 150 n = 27, T = 2.39 Seconds Estimated log Z No coarsening 30 lev els 90 lev els (b) Ising at the critical state ( T = 2 . 39) 0 100 200 300 400 500 6300 6500 6700 n = 10, m = 40, d = 15 Seconds Estimated log Z No coarsening 50 lev els 100 lev els (c) Depth-from-disparity Figure 4: Quantitativ e inference results for Marko v Random Field mo dels The interaction constant can b e written as J = 1 /T where T is the temp er atur e of the system. The conﬁg- uration distribution tak es diﬀeren t forms at diﬀeren t temp eratures: we will conduct exp erimen ts at T = 1, a lo w-temp erature condition where spins prefer to glob- ally align, and T = 2 . 39, the critic al temp er atur e , where long-range correlations dominate. Abov e the critical temp erature (e.g. for T > 3), cells become uncoupled and the energy distribution across conﬁgurations con- v erges to uniform, so we fo cus on lo wer temp eratures. W e implemented the Ising energy-minimization prob- lem as a simple probabilistic program, whic h ﬁrst sam- ples a set of spins and then factors based on the energy of that conﬁguration. T o apply our coarse-to-ﬁne trans- formation, w e used the spin-blo ck ma jorit y-rule for our coarsenValue and refineValue functions. T o coarsen, this rule replaced eac h 3 × 3 sub-lattice with its mo dal v alue (see Figure 5). T o reﬁne a single cell in the coarse matrix, we considered the space of all 256 p ossible 3 × 3 matrices that could coarsen to that v alue. Note that our sequen tial reﬁnemen t – making man y small choices instead of one big one – diﬀers from the typical renormalization group approac h, which si- m ultaneously replaces all sublattices and rew eights the in teraction constan t J accordingly . F or example, consider a ﬁne-grained v alue such as 1 : A =         0 1 0 0 0 0 1 1 0 1 0 0 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1         1 W e use spins { 1 , 0 } here instead of { 1 , − 1 } to mak e the matrices more readable. Coarser v alues are partially-coarsened matrices, which can b e represen ted as pairs of matrices, e.g. cv ( A ) =                 ∗ ∗ ∗ 0 0 0 ∗ ∗ ∗ 1 0 0 ∗ ∗ ∗ 1 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1         ,  1 ∗ ∗ ∗          where each entry in the second matrix corresponds to a 3x3 blo c k in the ﬁrst matrix. In each coarsening step, w e only coarsen a single suc h blo c k. As a consequence, there are 90 (= 81 + 9) steps when w e incrementally coarsen a 27x27 matrix ﬁrst to 9x9, then to 3x3. T o facilitate this sequen tial reﬁnemen t, w e implemen ted a p olymorphic energy function, which can directly re- turn a score for such partially coarsened matrices with- out needing to reﬁne all the wa y down to the most ﬁne-grained level. When this energy function is applied to a matrix that contains cells at diﬀerent levels of abstraction, the energy computation counts a single coarse cell as a neighbor for each surrounding ﬁner cell (and vice v ersa). W e ran tw o exp eriments on 27 × 27 lattices to demon- strate the p erformance of our coarsened program at diﬀeren t lev els of coarsening. In our ﬁrst exp eriment, w e set the temperature to T = 1 and a verage 10 runs for b oth coarse-to-ﬁne ﬁltering (with 90 and 30 levels) and ﬂat imp ortance sampling. The 90 level coarsening condition fully reduces the 27 × 27 lattice to a 3 × 3 lattice, and the 30 lev el condition yields a partially coarsened matrix. In our second exp eriment, w e set the temp erature to T = 2 . 39 and run the same set of conditions. Figures 4a and 4b show the av erage im- p ortance weigh t for diﬀeren t le v els of coarsening. W e see that even in termediate levels of coarsening p erform Andreas Stuhlm ¨ uller Rob ert X.D. Hawkins N. Siddharth Noah D. Go o dman 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 Figure 5: Coarsening at the critical state ( T = 2 . 39) b etter than no coarsening, and that a full coarsening p erforms dramatically b etter than the other conditions. The b est solution from the second exp eriment is shown in the right-most panel of Figure 5, along with the tw o coarser conﬁgurations from which it w as reﬁned. This displa ys the characteristic local structure of the Ising mo del at critical temp eratures. 4.1.2 Stereo matc hing Another common application of MRFs is the stereo matc hing task (Boyk o v et al., 2001; Scharstein and Szeliski, 2002). The goal is to estimate the disparity b et ween t wo images, I and I 0 , captured from slightly shifted viewp oints. This disparity map can b e used to reco ver a rough measure of depth. As in the Ising case, w e implemen t this task as a probabilistic program b y sampling a lattice of disparit y v alues, and factoring on its energy . The energy function for a particular set of disparity v alues has tw o parts: (1) a smo othing term penalizing distance b etw een the v alues of neigh b ors and (2) a data c ost term p enalizing each particular disparity v alue for discrepancies with the true data (as measured b y comparing the diﬀerence in pixel intensities at the giv en discrepancy): H ( d ) = X { p,q }∈N V p,q ( d p , d q ) + X p C ( p, d p ) W e denote the intensit y of pixel p in image I b y I p . Since corresp onding pixels should ha ve similar in tensi- ties, we set our data cost te rm C ( p, d p ) as suggested by Bo yko v et al. (2001), taking the absolute diﬀerence be- t ween I p and I 0 p + d p . T o reduce sensitivity to v ariability in image sampling, we interpolate b etw een neigh b oring in tensities in the neighborho o d x ∈ ( d − 0 . 5 , d + 0 . 5) and tak e the minimum. F or our smo othing function, w e use the truncated squared error: V p,q ( d p , d q ) = min(( d p − d q ) 2 , V max ) with V max = 5. W e implemented energy minimization for the stereo matc hing mo del analogous to the Ising mo del, but with diﬀeren t coarsening and reﬁnement functions: coars- ening replaces a 2 × 2 sublattice with its mean and standard deviation; reﬁnement returns the set of all p ossible 2 × 2 lattices with the given mean and stan- dard deviation that could hav e o ccurred at the giv en abstraction level. Figure 4c sho ws that in a comparison betw een coarse-to- ﬁne SMC and imp ortance sampling, SMC ﬁnds low er- energy states more eﬃcien tly for a 10x40 cropp ed pair of images from the Middlebury dataset (Sc harstein and Szeliski, 2002), although w e exp ect that the absolute qualit y of the states found is not very goo d for either algorithm. 4.2 F A CTORIAL HMM In our second example, we test the hypothesis that abstractions are useful as a means to av oid particle collapse in large state spaces. F or this reason, we chose the F actorial HMM, a model with a large eﬀectiv e state size even within a single particle ﬁlter step. The F actorial HMM is a HMM where the state factors in to m ultiple v ariables (Ghahramani and Jordan, 1997). If there are M p ossible v alues for eac h latent state, and k state v ariables p er time step, then the eﬀective state size is M k . If only few of these hav e high probabilit y , then even for mo derate M and k it is p ossible that there are not suﬃciently many ﬁne-grained particles to co ver all regions of high p osterior probability . In our ﬁrst exp eriment, w e use a F actorial HMM with 3 v ariables p er step, 256 possible state v alues p er v ariable, and 6 observed time steps. W e run b oth coarse-to- ﬁne ﬁltering (with 6 and 8 abstraction lev els) and ﬂat ﬁltering and a verage 10 runs. W e coarsen the F actorial HMM by merging some state and observ ation symbols. T o test the h yp othesis that coarse-to-ﬁne inference will work b est when abstrac- tions matc h the dynamics of the mo del, w e generate transition and observ ation matrices with appro ximately hierarc hical structure as follows. En umerate state 1 to N . F or states i and j , we let the transition probabilit y b e approximately prop ortional to 2 −| i − j | . Similarly , for state i , the probability of generating observ ation k is prop ortional to 2 −| i − k | . Eac h state consists of three substates, and each sub- state is c hosen from { 1 , 2 , . . . , 256 } . Coarsening maps n umbers to successively greater in terv als. F or example, coarsening maps (5 , 11 , 56) to ([5 , 6] , [11 , 12] , [55 , 56]), and on the next-coarser level to ([5 , 8] , [11 , 14] , [55 , 58]). Figure 6a shows that plain particle ﬁltering consistently underestimates the true normalization constant relative to coarse-to-ﬁne ﬁltering. In our second exp erimen t (Figure 6b), we compare the b eha vior of ﬂat and coarse-to-ﬁne ﬁltering as the n um- b er of HMM states increases from 2 3 to 256 3 . As before, Coarse-to-Fine Sequen tial Monte Carlo for Probabilistic Programs 0 50 100 150 200 −350 −300 −250 −200 Seconds Estimated log Z No coarsening 6 le vels 8 le vels (a) SMC using a coarse-to-ﬁne mo del ﬁnds more proba- ble samples earlier on than SMC without coarsening. −300 −200 −100 0 Number of HMM states Estimated log Z 50 3 100 3 150 3 200 3 250 3 Coarse−to−fine SMC Plain SMC (b) F or small n umbers of states, coarse-to-ﬁne is indis- tinguishable from plain particle ﬁltering. As the n um b er of states grows, coarse-to-ﬁne is able to provide b etter solutions in the same amoun t of time. Figure 6: Inference results for a F actorial HMM w e keep the runtime constant. F or small n umbers of states, plain and coarse-to-ﬁne SMC give v ery similar estimates. As the num b er of states grows, the diﬀer- ence b etw een coarse-to-ﬁne and plain ﬁltering grows as well, indicating that coarse-to-ﬁne is most useful in large state spaces. 5 DISCUSSION 5.1 WHEN DOES CO ARSE-TO-FINE INFERENCE HELP? It is generally diﬃcult to compute or estimate the p os- terior probabilit y of a set of (program) states. How ev er, this is precisely what is required for coarse-to-ﬁne infer- ence to work: when we ev aluate a program on a coarse lev el, we need to estimate for each coarse v alue how lik ely its reﬁnements are under the p osterior. This sug- gests that settings where coarse-to-ﬁne inference works ha ve sp ecial c haracteristics that make such estimation feasible. W e now name a few. First, the giv en program may satisfy independence as- sumptions that mak e estimating p osterior probabilities feasible. F or example, for the program sho wn in Figures 1 and 2a, the score function only dep ends on one of x and y at a time; hence, w e can indep endently compute the estimated score for the reﬁnements of x and y , and use this information in computing the estimated scores for abstract v alues of b oth. Second, we may b e in a setting where the t yp e of a coarse v alue matc hes the t yp e of its reﬁnements. In that situation, “p olymorphic” score- and primitive functions may b e a c heap heuristic for estimating the p osterior probability of a coarse state. F or the Ising mo del, the energy function satisﬁes this criterion up to parameterization. Third, coarse-to-ﬁne may be particularly useful in the amortized setting (Stuhlm ¨ uller et al., 2013). Learn- ing the conditional distributions asso ciated with lifted primitiv e functions is one instan tiation of “learning to do inference”. This is particularly feasible for smooth state spaces, where one can eﬀectively estimate en tire distributions from a few samples. Another answer to the question of when coarse-to-ﬁne helps is to p oint out that this depends on what inference algorithm is used. F or inference b y en umeration, exact coarsening (i.e., coarsening within v alues that hav e the same p osterior probability) is useful for increasing computational eﬃciency . By contrast, for sequential Mon te Carlo metho ds, it is frequen tly more desirable to merge states with diﬀer ent p osterior probabilit y , as this smo othes the state space and th us increases statistical eﬃciency . 5.2 CURRENT LIMIT A TIONS The system as presented has three main limitations. W e will now describ e these limitations and outline how they can b e addressed. First, as discussed in Section 3.2, the coarse-to-ﬁne transform can only b e applied to mo dels where all ERPs are indep endent. W e ha v e describ ed how to transform mo dels into this form and referred to the tec hnique of merging sample and factor statemen ts describ ed in Go o dman and Stuhlm ¨ uller (2014) as a Andreas Stuhlm ¨ uller Rob ert X.D. Hawkins N. Siddharth Noah D. Go o dman to ol for reco vering statistical eﬃciency lost in such a transform. W e hav e not yet emplo yed this merging in our exp eriments, which mak es the results more diﬃcult to in terpret. W e exp ect that it would be feasible to dev elop a version of the coarse-to-ﬁne transform that directly op erates on dep endent ERPs. Second, single-site MCMC rejuvenation steps are of v ery limited use in the curren t setup. This is due to the combination of (a) computing coarsened v al- ues by binning ﬁne-grained v alues (using the user- pro vided coarsenValue function) and (b) using this coarsening to build a hierarchical mo del to which stan- dard SMC algorithms can b e applied. In this hierarchi- cal mo del, each ﬁne-grained v alue v only has non-zero probabilit y if the corresponding coarse v alue is set to coarsenValue( v ) . As a c onsequence, w e reject all MCMC steps that c hange this coarse-grained v alue to an ything else, and likewise all steps that change v to v 0 suc h that coarsenValue( v ) 6 = coarsenValue( v 0 ) . These strong dep endencies b et ween levels of abstraction can b e a voided if we only use each level in the sequence of coarse-to-ﬁne models as an imp ortance sampler for the next lev el instead of constructing a coarse-to-ﬁne mo del. Third, coarsening only applies to individual v ariables and v alues, not to sets of v ariables. Consider the Ising mo del: while w e implemented it using a matrix-v alued random v ariable, a more natural implementation w ould use indep endent Bernoulli v ariables for each of the ma- trix entries. Our current scheme cannot express coars- enings that apply to such sets of v ariables. While it is easy to see how our scheme can b e extended to coars- ening multiple v ariables within the same con trol-ﬂow blo c k (by transforming multiple v ariables to a single tuple-v alued v ariable), generic coarsening of v ariables that are created in diﬀerent control ﬂow blo cks will require new inno v ations. 5.3 RELA TED AND FUTURE WORK The work in this pap er is related to and inspired by a broad background of work on coarse-to-ﬁne and lifted inference, including (but not limited to) w ork by Char- niak et al. (2006) on coarse-to-ﬁne inference in PCF Gs, w ork on coarse-to-ﬁne inference for ﬁrst-order proba- bilistic mo dels by Kiddon and Domingos (2011), and attempts to “fatten” particles for ﬁltering with broader co verage (Kulk arni et al., 2014; Steinhardt and Liang, 2014). Outside of machine learning, we tak e inspiration from Appr oximate Bayesian Computation (condition- ing on summary statistics) and r enormalization gr oup approac hes to inference in Ising mo dels (Lyman and Zuc kerman, 2006). This approach op ens up man y exciting researc h di- rections, including (1) understanding the relation to abstract interpretation, and Galois Connections sp ecif- ically (e.g., Cousot and Monerau, 2012; Monniaux, 2000), (2) automatically deriving coarsenings for hier- arc hical Ba y esian mo dels, (3) learning go o d coarsenings, and eﬃcient learning of approximations for coarsened primitiv e and score functions, and (4) coarsening (merg- ing) m ultiple v ariables across blo c ks, potentially via ﬂo w analysis. W e exp ect that the most in teresting applications of coarse-to-ﬁne approaches to eﬃcient inference are yet to come. Ac knowledgemen ts This material is based on researc h sp onsored b y DARP A under agreement num b er F A8750-14-2-0009. The U.S. Go vernmen t is authorized to repro duce and distribute reprin ts for Go vernmen tal purp oses notwithstanding an y cop yright notation thereon. RXDH is supp orted by a NSF Graduate Researc h F el- lo wship and a Stanford Graduate F ello wship. References Y uri Boyk ov, Olga V eksler, and Ramin Zabih. F ast appro ximate energy minimization via graph cuts. Pattern Analysis and Machine Intel ligenc e, IEEE T r ansactions on , 23(11):1222–1239, 2001. Eugene Charniak, Mark Johnson, Mic ha Elsner, Joseph Austerw eil, David Ellis, Isaac Haxton, Catherine Hill, R. Shriv aths, Jerem y Mo ore, Mic hael Pozar, and Theresa V u. Multilev el Coarse-to-ﬁne PCFG Pars- ing. Pr o c e e dings of the Human L anguage T e chnolo gy Confer enc e of the NAACL, Main Confer enc e , pages 168–175, 2006. P atrick Cousot and Mic hael Monerau. Probabilistic abstract interpretation. In Pr o gr amming L anguages and Systems , volume 7211 of L e ctur e Notes in Com- puter Scienc e , pages 169–193. Springer, 2012. Cameron E F reer, Vik ash K Mansinghk a, and Daniel M Ro y . When are probabilistic programs probably com- putationally tractable? NIPS 2010 Workshop on Monte Carlo Metho ds in Mo dern Applic ations , 2010. Zoubin Ghahramani and Mic hael I Jordan. F actorial Hidden Marko v Mo dels. Machine L e arning , 29(2-3): 245–273, Nov ember 1997. V alerie Goﬀaux, Judith Peters, Julie Haubrech ts, Chris- tine Schiltz, Bernadette Jansma, and Rainer Go ebel. F rom coarse to ﬁne? Spatial and temp oral dynamics of cortical face pro cessing. Cer ebr al Cortex , page bhq112, 2010. Noah D Goo dman and Andreas Stuhlm ¨ uller. The Design and Implemen tation of Probabilistic Pro- Coarse-to-Fine Sequen tial Monte Carlo for Probabilistic Programs gramming Languages. http://dippl.org , 2014. Ac- cessed: 2015-4-16. Noah D Go o dman, Vik ash Mansinghk a, Daniel M Roy , K A Bona witz, and Josh ua B T enenbaum. Churc h: a language for generative mo dels. Pr o c e e dings of the 24th Confer enc e in Unc ertainty in A rtiﬁcial Intel li- genc e , pages 220–229, 2008. Roger Grosse. Unbiased estimators of partition func- tions are basically lo wer b ounds. http://hips.seas. harvard.edu/ , January 2013. Chlo ´ e Kiddon and P edro Domingos. Coarse-to-Fine Inference and Learning for First-Order Probabilistic Mo dels. In Pr o c e e dings of the Twenty-Fifth AAAI Confer enc e on A rtiﬁcial Intel ligenc e , pages 1049– 1056, August 2011. Daphne Koller, David McAllester, and Avi Pfeﬀer. Ef- fectiv e Ba yesian Inference for Sto chastic Programs. In Pr o c e e dings of the National Confer enc e on Artiﬁ- cial Intel ligenc e , pages 740–747, 1997. T ejas D Kulk arni, Ardav an Saeedi, and Samuel Ger- shman. V ariational Particle Approximations. arXiv pr eprint arXiv:1402.5715 , pages 1–11, 2014. Edw ard Lyman and Daniel M. Zuc kerman. Resolution exc hange sim ulation with incremental coarsening. Journal of Chemic al The ory and Computation , 2: 656–666, 2006. ISSN 15499618. Mic hael D Menz and Ralph D F reeman. Stereoscopic depth processing in the visual cortex: a coarse-to-ﬁne mec hanism. Natur e neur oscienc e , 6(1):59–65, 2003. Da vid Monniaux. Abstract in terpretation of probabilis- tic seman tics. In Static Analysis , volume 1824 of L e ctur e Notes in Computer Scienc e , pages 322–339. Springer, 2000. Sla v P etrov, Aria Haghighi, and Dan Klein. Coarse- to-ﬁne syntactic machine translation using language pro jections. Empiric al Metho ds in Natur al L anguage , pages 108–116, 2008. Avi Pfeﬀer. 14 The Design and Implemen tation of IBAL: A General-Purp ose Probabilistic Language. Intr o duction to statistic al r elational le arning , page 399, 2007. Daniel Scharstein and Richard Szeliski. A taxonomy and ev aluation of dense t wo-frame stereo correspon- dence algorithms. International journal of c omputer vision , 47(1-3):7–42, 2002. Jacob Steinhardt and P ercy Liang. Filtering with Abstract P articles. In Pr o c e e dings of The Thirty- First International Confer enc e on Machine L e arning , pages 727–735, June 2014. Andreas Stuhlm ¨ uller, Jessica T a ylor, and Noah D Go o d- man. Learning Sto chastic In verses. A dvanc es in Neur al Information Pr o c essing Systems (NIPS) 27 , 2013. Ric hard Szeliski, Ramin Zabih, Daniel Scharstein, Olga V eksler, Vladimir Kolmogorov, Aseem Agarwala, Marshall T appen, and Carsten Rother. A com- parativ e study of energy minimization metho ds for mark ov random ﬁelds with smo othness-based priors. Pattern Analysis and Machine Intel ligenc e, IEEE T r ansactions on , 30(6):1068–1080, 2008. Y un T ang, W en-Ju Liu, Hua Zhang, Bo Xu, and Guo- Hong Ding. One-pass coarse-to-ﬁne segmental sp eec h deco ding algorithm. In A c oustics, Sp e e ch and Signal Pr o c essing, 2006. ICASSP 2006 Pr o c e e dings. 2006 IEEE International Confer enc e on , v olume 1, pages I–I. IEEE, 2006. Kenneth G Wilson. The renormalization group: Criti- cal phenomena and the Kondo problem. R eviews of Mo dern Physics , 47(4):773, 1975. Kenneth G. Wilson. Problems in Ph ysics with many Scales of Length. Scientiﬁc Americ an , 241:158–179, 1979. ISSN 0036-8733. Da vid Wingate, Andreas Stuhlm¨ uller, and Noah D Go odman. Light weigh t Implementations of Prob- abilistic Programming Languages Via T ransforma- tional Compilation. In Pr o c e e dings of the 14th in- ternational c onfer enc e on Artiﬁcial Intel ligenc e and Statistics , 2011.

Coarse-to-Fine Sequential Monte Carlo for Probabilistic Programs

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment