A Unification of Discrete, Gaussian, and Simplicial Diffusion
To model discrete sequences such as DNA, proteins, and language using diffusion, practitioners must choose between three major methods: diffusion in discrete space, Gaussian diffusion in Euclidean space, or diffusion on the simplex. Despite their shared goal, these models have disparate algorithms, theoretical structures, and tradeoffs: discrete diffusion has the most natural domain, Gaussian diffusion has more mature algorithms, and diffusion on the simplex in principle combines the strengths of the other two but in practice suffers from numerically unstable stochastic processes. Ideally, we could view each of these models as instances of the same underlying framework and enable practitioners to switch between models for downstream applications. However, previous theories have only considered connections in special cases. Here we build a theory unifying all three methods of diffusion as different parameterizations of the same underlying process: the Wright-Fisher population genetics model. In particular, we find simplicial and Gaussian diffusion as two large-population limits. Our theory formally connects the likelihoods and hyperparameters of these models and leverages decades of mathematical genetics literature to unlock stable simplicial diffusion. Finally, we relieve the practitioner of balancing model trade-offs by demonstrating it is possible to train a single model that can perform diffusion in any of these three domains at test time. Our experiments show that Wright-Fisher simplicial diffusion is more stable and outperforms previous simplicial diffusion models on conditional DNA generation. We also show that we can train models on multiple domains at once that are competitive with models trained on any individual domain.
💡 Research Summary
The paper tackles a fundamental fragmentation in the field of diffusion models for discrete sequences such as DNA, proteins, and natural language. Practitioners currently choose among three distinct families: (i) discrete diffusion that operates directly on categorical symbols, (ii) Gaussian diffusion that corrupts continuous embeddings, and (iii) simplicial diffusion that works on probability vectors lying on the simplex. Each family enjoys its own strengths—natural domain for discrete diffusion, mature training and sampling algorithms for Gaussian diffusion, and a theoretically appealing combination of the two for simplicial diffusion—but they also suffer from separate algorithmic pipelines, incompatible likelihoods, and, in the case of simplicial diffusion, severe numerical instability.
The authors propose a unifying theoretical framework based on the Wright‑Fisher (WF) population genetics model, a continuous‑time Markov process that describes the stochastic evolution of allele frequencies in a finite population. By interpreting each token in a sequence as a “population” of size ζ, they show that:
- When ζ = 1, the WF dynamics reduce to a pure mutation process governed by a rate matrix L, which is exactly the forward process used in discrete diffusion.
- When ζ → ∞ and the reproduction rate is set to zero, the law of large numbers concentrates the empirical frequency vector around the stationary distribution π of L, and, by the central‑limit theorem, the residual fluctuations become Gaussian in the subspace spanned by the leading eigenvectors of L. This limit reproduces the Brownian motion used in Gaussian diffusion.
- When ζ → ∞ with a positive reproduction rate, the WF process yields a diffusion on the simplex (Jacobi or Cox‑Ingersoll‑Ross type), matching simplicial diffusion.
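The population-size interpretation can be made concrete with a minimal simulation. The sketch below is not the paper's exact forward process; the rate matrix `Q`, the first-order Euler mutation step, and the specific ζ values are illustrative assumptions. It shows the two extremes: at ζ = 1 each generation returns a one-hot vector (a pure jump process on the simplex vertices, as in discrete diffusion), while at large ζ the sampling noise shrinks like 1/√ζ and the frequencies track the mutation dynamics with small Gaussian fluctuations.

```python
import numpy as np

rng = np.random.default_rng(0)

def wright_fisher_step(freqs, zeta, Q, dt):
    """One discrete generation of a Wright-Fisher model with mutation:
    a first-order mutation step using the rate matrix Q (rows sum to
    zero), followed by multinomial resampling of zeta individuals
    (genetic drift)."""
    p = freqs + dt * (freqs @ Q)   # Euler step of the mutation ODE
    p = np.clip(p, 0.0, None)
    p /= p.sum()
    counts = rng.multinomial(zeta, p)
    return counts / zeta

Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])        # symmetric 2-allele mutation rates
f0 = np.array([0.5, 0.5])

# zeta = 1: the "population" is a single individual, so the output is
# a one-hot vector -- a categorical jump process (discrete diffusion).
x_discrete = wright_fisher_step(f0, 1, Q, 0.01)

# Large zeta: fluctuations around the mutation ODE are O(1/sqrt(zeta)),
# approaching the Gaussian / simplicial diffusion limits.
x_large = wright_fisher_step(f0, 10**6, Q, 0.01)
```

Repeating `wright_fisher_step` over many generations, with the reproduction rate entering through how `dt` scales with ζ, is what distinguishes the Gaussian and simplicial limits in the paper's framework.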
Crucially, the authors formalize the connection between the three families at the level of the evidence lower bound (ELBO). They introduce a “time‑dilation” function τ(t) that aligns the continuous‑time WF process with the discrete timesteps used in training. Under this alignment, the ELBOs of the three models become mathematically comparable, provided a specific “hollow parameterization” is used for the discrete case—a subtle choice that previous work overlooked. This overturns the long‑standing belief that continuous‑space likelihoods cannot be directly compared with discrete‑space likelihoods.
The paper also addresses the notorious instability of simplicial diffusion. By leveraging decades of results from mathematical genetics, the authors derive a “sufficient‑statistic parameterization” that re‑expresses the WF dynamics in terms of the empirical frequency vector’s sufficient statistics. This re‑parameterization yields drift and diffusion coefficients that remain bounded even as the diffusion time t approaches zero, eliminating the exploding loss terms that plagued earlier simplex‑based methods. Empirically, the new WF‑based simplicial diffusion achieves substantially lower KL divergence and higher log‑likelihood on conditional DNA generation tasks compared with prior simplex models.
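The paper's sufficient‑statistic parameterization is not reproduced here, but the boundedness it targets is a classical property of the Wright‑Fisher diffusion itself. As background, here is a minimal Euler‑Maruyama simulation of the two‑allele WF (Jacobi‑type) diffusion with mutation; the mutation rates `theta1`/`theta2` and the clipping of numerical overshoot are our own illustrative choices:

```python
import numpy as np

def jacobi_em(x0, theta1, theta2, T, n_steps, seed=0):
    """Euler-Maruyama simulation of the two-allele Wright-Fisher
    (Jacobi-type) diffusion with mutation:
        dx = (theta1*(1 - x) - theta2*x) dt + sqrt(x*(1 - x)) dW.
    Both the drift and the diffusion coefficient are bounded on [0, 1]
    -- the kind of boundedness the paper's reparameterization preserves
    in the learned reverse process as t -> 0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        drift = theta1 * (1.0 - x) - theta2 * x
        diffusion = np.sqrt(max(x * (1.0 - x), 0.0))
        x += drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal()
        x = min(max(x, 0.0), 1.0)  # clip discretization overshoot onto [0, 1]
    return x

x_T = jacobi_em(x0=0.9, theta1=0.5, theta2=0.5, T=1.0, n_steps=1000)
```

Earlier simplex‑based methods instead worked with quantities (e.g. scores or logits) whose magnitudes blow up near the simplex boundary and at small t, which is the instability the sufficient‑statistic view sidesteps.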
Beyond theoretical unification, the authors demonstrate a practical benefit: a single neural network can be trained to perform any of the three diffusion processes at inference time. By sharing the encoder‑decoder architecture and using the sufficient‑statistic representation, the model learns a domain‑agnostic denoising function. During testing, the practitioner can select ζ = 1 for discrete diffusion, ζ → ∞ with zero reproduction for Gaussian diffusion, or ζ → ∞ with positive reproduction for simplicial diffusion. Experiments show that this multi‑domain model matches or slightly trails dedicated single‑domain models (within 0.1‑0.3% in performance), while offering the flexibility to switch domains without retraining.
In summary, the contributions are:
- A rigorous proof that discrete, Gaussian, and simplicial diffusion are special cases of the Wright‑Fisher process, with precise conditions on population size and reproduction rate.
- A resolution of the likelihood comparison problem via the hollow parameterization, establishing when ELBOs are directly comparable.
- A mathematically grounded stabilization of simplicial diffusion using sufficient‑statistic parameterization, leading to state‑of‑the‑art performance on DNA generation.
- Demonstration that a single model can be trained once and deployed across all three diffusion domains, removing the need for practitioners to commit to a specific forward process during model development.
The work bridges machine learning and population genetics, opening avenues for future research such as incorporating selection pressures, migration, or other genetic mechanisms into diffusion models for richer generative capabilities.