Generating Functions Meet Occupation Measures: Invariant Synthesis for Probabilistic Loops (Extended Version)
A fundamental computational task in probabilistic programming is to infer a program’s output (posterior) distribution from a given initial (prior) distribution. This problem is challenging, especially for expressive languages that feature loops or unbounded recursion. While most of the existing literature focuses on statistical approximation, in this paper we address the problem of mathematically exact inference. To achieve this for programs with loops, we rely on a relatively underexplored type of probabilistic loop invariant, which is linked to a loop’s so-called occupation measure. The occupation measure associates program states with their expected number of visits, given the initial distribution. Based on this, we derive the notion of an occupation invariant. Such invariants are essentially dual to probabilistic martingales, the predominant technique for formal probabilistic loop analysis in the literature. A key feature of occupation invariants is that they can take the initial distribution into account and often yield a proof of positive almost sure termination as a by-product. Finally, we present an automatic, template-based invariant synthesis approach for occupation invariants by encoding them as generating functions. The approach is implemented and evaluated on a set of benchmarks.
💡 Research Summary
The paper tackles the challenging problem of exact inference for probabilistic programs that contain loops. While most prior work relies on statistical approximation, the authors aim for mathematically precise output (posterior) distributions given an initial (prior) distribution. Their key insight is to use a relatively under‑explored class of loop invariants based on the occupation measure of a loop. The occupation measure assigns to each program state the expected number of times that state is visited during the execution of the loop, starting from the given initial distribution.
The authors first relate Kozen’s measure‑transformer semantics to occupation measures, proving (Theorem 2) that the final distribution of a loop while (φ) { C } can be obtained by restricting the occupation measure to the states that violate the guard φ:
K⟦while (φ){C}⟧(μ) = J_{¬φ} (OM).
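For intuition about why this restriction recovers the loop's output, here is a short derivation. It is our own reconstruction, not the paper's proof, and it assumes the standard unrolling characterization of Kozen's semantics. T = K⟦C⟧ ∘ J_φ denotes one guarded iteration, so Tⁿ(μ) is the sub-distribution at the loop head after n iterations and OM = Σₙ Tⁿ(μ).

```latex
\begin{align*}
  \mathcal{K}[\![\texttt{while}(\varphi)\{C\}]\!](\mu)
    &= \sum_{n \ge 0} J_{\neg\varphi}\bigl(T^{n}(\mu)\bigr)
    && \text{exit after exactly } n \text{ iterations}\\
    &= J_{\neg\varphi}\Bigl(\sum_{n \ge 0} T^{n}(\mu)\Bigr) = J_{\neg\varphi}(\mathrm{OM})
    && J_{\neg\varphi} \text{ is a restriction, hence countably additive}\\
  \mathrm{OM} = \sum_{n \ge 0} T^{n}(\mu)
    &= \mu + T(\mathrm{OM}) = \mu + \mathcal{K}[\![C]\!]\bigl(J_{\varphi}(\mathrm{OM})\bigr)
    && \text{split off } n = 0 \text{; } T \text{ is linear and continuous}
\end{align*}
```

Iterating the last equation from the zero measure (Kleene iteration) converges to its least solution. This is why the occupation measure is characterized as a least fixed point.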
Thus, instead of solving Kozen’s functional fixed‑point equation, which is independent of the initial distribution, they solve a distribution‑dependent fixed‑point equation for the occupation measure:
OM = μ + K⟦C⟧ ( J_φ (OM) ) (‡)
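To make (‡) concrete, the following sketch runs Kleene iteration on a small hypothetical loop of our own, not taken from the paper. The iteration starts from OM₀ = 0 and computes OMₙ₊₁ = μ + K⟦C⟧(J_φ(OMₙ)). The loop is while (x == 1) { c := c + 1; { x := 0 } [1/2] { x := 1 } }, started in the point distribution on (x, c) = (1, 0). Measures are dictionaries from states to exact Fraction weights. The helper names guard, body, restrict and kleene_om are illustrative. The iterates approach the occupation measure from below, so finitely many steps only give an approximation. This is one motivation for checking a guessed closed form instead.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical loop over states (x, c), NOT taken from the paper:
#   while (x == 1) { c := c + 1; { x := 0 } [1/2] { x := 1 } }

def guard(state):
    """The loop guard phi: x == 1."""
    return state[0] == 1

def body(measure):
    """K[[C]]: push a (sub-)measure through one execution of the loop body."""
    out = defaultdict(Fraction)
    for (_, c), w in measure.items():
        out[(0, c + 1)] += w / 2  # c := c + 1, then x := 0 with probability 1/2
        out[(1, c + 1)] += w / 2  # c := c + 1, then x := 1 with probability 1/2
    return dict(out)

def restrict(measure, pred):
    """J_pred: keep only the weight on states that satisfy pred."""
    return {s: w for s, w in measure.items() if pred(s)}

def kleene_om(mu, steps):
    """Return OM_steps, where OM_0 = 0 and OM_{n+1} = mu + K[[C]](J_phi(OM_n))."""
    om = {}
    for _ in range(steps):
        nxt = defaultdict(Fraction, mu)
        for s, w in body(restrict(om, guard)).items():
            nxt[s] += w
        om = dict(nxt)
    return om

mu = {(1, 0): Fraction(1)}                          # start in x = 1, c = 0
om = kleene_om(mu, steps=20)
out = restrict(om, lambda s: not guard(s))          # J_{not phi}(OM_20)
print(om[(1, 3)], out[(0, 3)])                      # 1/8 1/8
print(sum(out.values()) == 1 - Fraction(1, 2**19))  # True: exit mass just below 1
```

After n steps the iterate is exact on all states with c < n. Here the weight of (1, k) and of the exit state (0, k) is 2^(−k), so the output is a geometric distribution on c, and the exit mass approaches 1 as n grows.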
Equation (‡) may have several solutions; the true occupation measure is its least one. The authors propose a “guess‑and‑check” methodology: (1) guess a candidate measure M (an invariant), and (2) verify that M satisfies (‡). Any such M is an upper bound on the true occupation measure. If M additionally has finite total mass, that mass bounds the expected number of loop iterations (the expected runtime), thereby yielding a proof of positive almost‑sure termination (PAST). In that case the bound J_{¬φ}(OM) ≤ J_{¬φ}(M) becomes an equality. J_{¬φ}(M), which is a probability distribution (mass 1) whenever the loop body terminates almost surely, is then exactly the output distribution.
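The step from "finite mass" to "exact output" can be checked with a short argument. The sketch below is our reconstruction rather than the paper's proof. It assumes that K⟦C⟧ is linear and never increases total mass, and writes |ν| for the total mass of a measure ν.

```latex
\begin{align*}
  \mathbb{E}[\#\text{iterations}] = |J_{\varphi}(\mathrm{OM})|
    &\le |\mathrm{OM}| \le |M| < \infty
    && \text{hence PAST}\\
  h &:= M - \mathrm{OM} \ge 0
    && \text{well defined since } \mathrm{OM} \le M \text{ and } |M| < \infty\\
  h &= \mathcal{K}[\![C]\!]\bigl(J_{\varphi}(h)\bigr)
    && \text{subtract } (\ddagger) \text{ for } \mathrm{OM} \text{ from } (\ddagger) \text{ for } M\\
  |h| &\le |J_{\varphi}(h)| = |h| - |J_{\neg\varphi}(h)|
    && \mathcal{K}[\![C]\!] \text{ does not increase mass}\\
  \Longrightarrow\quad J_{\neg\varphi}(M) &= J_{\neg\varphi}(\mathrm{OM})
    && \text{since } |J_{\neg\varphi}(h)| = 0 \text{ and } h \ge 0
\end{align*}
```

The finiteness of M is what makes the subtraction legitimate. For an infinite-mass solution, J_{¬φ}(M) can strictly exceed the true output distribution.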
Because occupation measures are defined over an infinite state space, the paper encodes them as generating functions (GFs). For programs over non‑negative integer variables, a GF is a formal power series in which each formal variable (e.g., X, C) stands for a program variable. Exponents encode concrete values, and coefficients encode weights: probabilities for a distribution, expected visit counts for an occupation measure. Many occupation measures admit rational closed forms; for example, the occupation measure of the running example has the GF 1 + 2X²/(2 − C). This compact representation enables symbolic manipulation and automated verification of (‡).
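As a sketch of how a rational GF stores infinitely many weights in finite form, the Python below expands the running example's GF 1 + 2X²/(2 − C) coefficient by coefficient. It then evaluates the GF at X = C = 1 to obtain its total mass. We assume that X tracks a variable x and C a variable c. The GF is grouped by powers of X, and each part is represented as a pair of coefficient lists in C. The helpers series, coefficient, evaluate and total_mass are our own illustration, not the paper's tool.

```python
from fractions import Fraction

def series(num, den, n):
    """First n power-series coefficients of num(C) / den(C), assuming den[0] != 0.

    Polynomials are coefficient lists, lowest degree first.  The recurrence comes
    from comparing coefficients in  quotient * den == num."""
    coeffs = []
    for k in range(n):
        acc = Fraction(num[k]) if k < len(num) else Fraction(0)
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * coeffs[k - j]
        coeffs.append(acc / den[0])
    return coeffs

# 1 + 2 X^2 / (2 - C), grouped by the power of X: {x_exponent: (numerator, denominator)}
G = {0: ([1], [1]), 2: ([2], [2, -1])}

def coefficient(gf, x_exp, c_exp):
    """Weight of the state x = x_exp, c = c_exp (coefficient of X^x_exp C^c_exp)."""
    if x_exp not in gf:
        return Fraction(0)
    num, den = gf[x_exp]
    return series(num, den, c_exp + 1)[c_exp]

def evaluate(poly, point):
    """Evaluate a coefficient-list polynomial at a point."""
    return sum(Fraction(a) * point**i for i, a in enumerate(poly))

def total_mass(gf):
    """G(1, 1): sum of all weights.  Only valid if the series converge at C = 1,
    as they do here (the pole of 2 / (2 - C) is at C = 2)."""
    return sum(evaluate(num, 1) / evaluate(den, 1) for num, den in gf.values())

print([str(coefficient(G, 2, k)) for k in range(4)])  # ['1', '1/2', '1/4', '1/8']
print(total_mass(G))                                   # 3
```

Under the variable reading assumed above, the state x = 0, c = 0 carries weight 1 and the state x = 2, c = k carries weight 2^(−k). The total mass, i.e. the expected number of loop-head visits summed over all states, is 3.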
The synthesis engine works with template‑based rational GFs. A user supplies a parametric template (e.g., a rational function with unknown coefficients). The fixed‑point condition (‡) is translated into algebraic constraints on the parameters. These constraints are solved using SMT or algebraic solvers, yielding concrete GFs that satisfy (‡). The approach thus automates both the discovery of occupation invariants and their verification.
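The translation from (‡) to parameter constraints can be sketched on a toy problem. The Python below uses our own hypothetical loop while (x == 1) { c := c + 1; { x := 0 } [1/2] { x := 1 } }, with μ concentrated on (x, c) = (1, 0). It splits the GF as G = G0(C) + X·G1(C) and picks the template G1 = a/(1 + bC), G0 = dC/(1 + bC). The loop, the template and the helper names (residuals, linear_constraints, solve) are assumptions for illustration. After clearing the denominator, this particular template yields constraints that are linear in a, b, d, so exact Gauss-Jordan elimination suffices. Richer templates generally produce non-linear constraints, which is where the SMT or algebraic solvers mentioned above come in.

```python
from fractions import Fraction

# Hypothetical loop (illustration only):
#   while (x == 1) { c := c + 1; { x := 0 } [1/2] { x := 1 } },  mu = X^1 C^0
# With G(X, C) = G0(C) + X * G1(C), condition (‡) splits by the power of X into
#   G1 = 1 + (1/2) C G1     (x = 1: initial mass plus the "stay" branch)
#   G0 = (1/2) C G1         (x = 0: the "exit" branch)
PARAMS = ("a", "b", "d")  # template: G1 = a / (1 + b C), G0 = d C / (1 + b C)

def residuals(p, c):
    """Both equations of (‡), multiplied through by (1 + b C) and moved to one side.
    Each residual is a polynomial of degree <= 1 in C and affine in (a, b, d)."""
    a, b, d = p
    r1 = a - (1 + b * c) - Fraction(1, 2) * a * c
    r0 = d * c - Fraction(1, 2) * a * c
    return [r1, r0]

def linear_constraints(samples=(0, 1)):
    """A degree-1 polynomial is identically zero iff it vanishes at 2 points, so
    sample C twice; affinity lets us read off coefficients via unit vectors."""
    rows, zero = [], (0,) * len(PARAMS)
    for c in samples:
        base = residuals(zero, c)
        cols = [residuals(tuple(int(i == j) for j in range(len(PARAMS))), c)
                for i in range(len(PARAMS))]
        for r in range(len(base)):
            # sum_i coeff_i * p_i + base == 0  becomes  sum_i coeff_i * p_i == -base
            rows.append(([col[r] - base[r] for col in cols], -base[r]))
    return rows

def solve(rows):
    """Gauss-Jordan elimination over the rationals; returns the unique solution."""
    m = [[Fraction(v) for v in coeffs] + [Fraction(rhs)] for coeffs, rhs in rows]
    top = 0  # index of the next pivot row
    for col in range(len(PARAMS)):
        pr = next((r for r in range(top, len(m)) if m[r][col] != 0), None)
        if pr is None:
            raise ValueError("template parameters are not uniquely determined")
        m[top], m[pr] = m[pr], m[top]
        pivot = m[top][col]
        m[top] = [v / pivot for v in m[top]]
        for r in range(len(m)):
            if r != top and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[top])]
        top += 1
    # Remaining rows have all-zero coefficients; a nonzero right-hand side is a contradiction.
    if any(row[-1] != 0 for row in m[top:]):
        raise ValueError("no instance of the template satisfies (‡)")
    return {name: m[i][-1] for i, name in enumerate(PARAMS)}

sol = solve(linear_constraints())
print({k: str(v) for k, v in sol.items()})  # {'a': '1', 'b': '-1/2', 'd': '1/2'}
a, b, d = sol["a"], sol["b"], sol["d"]
# G1(1) and G0(1); the only pole is at C = -1/b = 2, so the series converge at C = 1.
print(a / (1 + b), d / (1 + b))             # 2 1  (expected iterations, exit mass)
```

Plugging the solution back gives G1 = 1/(1 − C/2) and G0 = (C/2)/(1 − C/2). The candidate has finite total mass 3, and its exit part G0 is a geometric distribution on c with mass 1.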
Implementation details are presented in Section 7. The prototype, named OCCGF, was evaluated on a benchmark suite of around 30 probabilistic loops, covering geometric distributions, loops with non‑terminating branches, unbounded variable growth, and nested probabilistic choices. In most cases the tool automatically synthesized a correct occupation invariant, proved PAST, and derived the exact output distribution. The experiments demonstrate that the method handles programs beyond the reach of traditional martingale‑based techniques, especially those with non‑linear updates or complex probabilistic branching. Performance scales with the size of the template and the complexity of the resulting algebraic constraints, but remains practical for the tested benchmarks.
Limitations include the need for manually chosen templates, lack of support for continuous distributions or real‑valued variables, and potential scalability issues for highly non‑linear GFs. The authors suggest future work on automatic template generation, extensions to continuous domains, and hybrid approaches that combine occupation invariants with martingale reasoning.
In summary, the paper introduces a novel semantic foundation (occupation measures) for exact loop analysis, shows how to encode these measures as rational generating functions, and provides an automatic, template‑based synthesis technique for such invariants. This advances the state of the art in probabilistic program verification by enabling exact inference and termination proofs for a broader class of loops.