Inferring stochastic dynamics with growth from cross-sectional data
Time-resolved single-cell omics data offers high-throughput, genome-wide measurements of cellular states, which are instrumental to reverse-engineer the processes underpinning cell fate. Such technologies are inherently destructive, allowing only cross-sectional measurements of the underlying stochastic dynamical system. Furthermore, cells may divide or die in addition to changing their molecular state. Collectively, these present a major challenge to inferring realistic biophysical models. We present a novel approach, unbalanced probability flow inference, that addresses this challenge for biological processes modelled as stochastic dynamics with growth. By leveraging a Lagrangian formulation of the Fokker-Planck equation, our method accurately disentangles drift from intrinsic noise and growth. We showcase the applicability of our approach through evaluation on a range of simulated and real single-cell RNA-seq datasets. Compared with several existing methods, our method achieves higher accuracy while using a simple two-step training scheme.
💡 Research Summary
The paper tackles a fundamental problem in single‑cell analysis: how to infer the underlying stochastic dynamics of cellular states when only destructive, cross‑sectional measurements (e.g., scRNA‑seq snapshots) are available, and when cells can proliferate or die. Classical approaches either ignore birth/death, assume constant isotropic diffusion, or require external information such as known growth rates or lineage tracing. The authors propose a new method called Unbalanced Probability Flow Inference (UPFI) that extends the earlier Probability Flow Inference (PFI) framework to handle non‑conserved mass due to growth and death.
Mathematically, each cell follows an Itô diffusion
dXₜ = vₜ(Xₜ) dt + σₜ(Xₜ) dBₜ,
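with division and death rates bₜ(Xₜ) and dₜ(Xₜ). The population density ρ(t,x) then satisfies a Fokker‑Planck equation with a source term g(t,x) = b − d:
∂ₜρ = −∇·(vρ) + ∇·∇·(Dρ) + gρ, with D = ½σσᵀ.
Since ∇·(Dρ) = ρ(D∇log ρ + ∇·D), the diffusion term can be rewritten as a transport term that involves the score ∇log ρ(t,x). The authors therefore move to a Lagrangian frame where the PDE reduces to an ODE system for particle positions x(t) and their associated masses m(t):
dx/dt = v(t,x) − D(t,x)∇log ρ(t,x) − ∇·D(t,x), dm/dt = g(t,x) m.
For a state-independent diffusion tensor the ∇·D term vanishes and the particle velocity is simply v − D∇log ρ.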
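A minimal sketch of the weighted probability-flow ODE, assuming a hypothetical 1-D Ornstein‑Uhlenbeck drift v(x) = −a x, constant σ and a constant growth rate g. In this setting the marginal stays Gaussian, so its exact score can stand in for the learned s_φ. Particles move deterministically along v − D∇log ρ (constant D) while their weights grow as e^{gt}; the cloud's mean and variance should therefore track the closed-form OU moments even though no noise is ever sampled. All names and values are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical 1-D setting (illustrative values): OU drift v(x) = -a x,
# constant noise sigma and a state-independent growth rate g.
a, sigma, g = 1.0, 0.8, 0.3
D = 0.5 * sigma**2          # diffusion coefficient D = sigma^2 / 2
m0, s0 = 2.0, 0.25          # mean and variance of the initial Gaussian density

def gaussian_moments(t):
    """Closed-form mean and variance of the OU marginal at time t."""
    decay = np.exp(-a * t)
    return m0 * decay, s0 * decay**2 + sigma**2 / (2 * a) * (1 - decay**2)

def score(t, x):
    """Exact score of the Gaussian marginal; stands in for a learned s_phi(t, x)."""
    m, s = gaussian_moments(t)
    return -(x - m) / s

def push_forward(x, w, T=1.0, dt=1e-3):
    """Euler steps of dx/dt = v(x) - D * score(t, x) and dw/dt = g * w."""
    t = 0.0
    for _ in range(int(round(T / dt))):
        x = x + dt * (-a * x - D * score(t, x))
        w = w * np.exp(g * dt)          # exact update, since g is constant here
        t += dt
    return x, w

rng = np.random.default_rng(0)
z = rng.standard_normal(5000)
z = (z - z.mean()) / z.std()            # cloud starts with exactly mean m0, variance s0
x0 = m0 + np.sqrt(s0) * z
w0 = np.full_like(x0, 1.0 / x0.size)    # unit total mass at t = 0
xT, wT = push_forward(x0, w0)
```

At T = 1 the particle mean and variance should agree with gaussian_moments(1.0) up to the O(dt) Euler error, and the total weight wT.sum() equals e^{g}.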
The key observation is that the score depends only on the observed marginals, not on the drift and growth parameters being inferred, so it can be learned offline from the snapshots using denoising score matching (DSM). Once a neural network s_φ(t,x) ≈ ∇log ρ(t,x) is obtained, the ODE system can be integrated forward for each observed cell, updating both its state and its weight according to the learned drift v_θ(t,x) and growth g_θ(t,x).
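To show what DSM optimizes, here is a minimal sketch for a single snapshot that swaps the neural network s_φ for a linear score model s(x) = W x + c (an assumption made for illustration; it is exact only for Gaussian data). The DSM objective E‖s(x + σₙz) + z/σₙ‖² then becomes an ordinary least-squares problem, and its solution is the score of the noise-smoothed data, W = −(Σ̂ + σₙ²I)⁻¹. The function name and the noise level are hypothetical choices.

```python
import numpy as np

def dsm_linear_score(x, noise_std=0.5, n_noise=20, rng=None):
    """Fit a linear score model s(x) = W x + c by denoising score matching.

    Minimizes the mean of || s(x + noise_std * z) + z / noise_std ||^2 over W, c,
    which for a model linear in its parameters is ordinary least squares.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = x.shape
    x_rep = np.repeat(x, n_noise, axis=0)          # several noise draws per cell
    z = rng.standard_normal(x_rep.shape)
    x_tilde = x_rep + noise_std * z                # noised samples
    target = -z / noise_std                        # DSM regression target
    design = np.hstack([x_tilde, np.ones((x_tilde.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    W, c = coef[:d].T, coef[d]                     # rows of coef: W^T, then intercept c
    return W, c
```

Smaller noise_std brings the fit closer to the score of the data itself but raises the variance of the regression target (its variance is 1/noise_std²), which is the usual DSM trade-off.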
Training proceeds in two stages. First, DSM provides a time‑dependent score estimator. Second, the drift and growth networks are optimized by minimizing a discrepancy between the predicted weighted distribution at time t_{i+1}, obtained by pushing forward the weighted particles observed at t_i, and the observed snapshot at t_{i+1}. The authors adopt the unbalanced Sinkhorn divergence S_{ε,γ} as the loss because it handles measures with different total mass and works directly with discrete point clouds, avoiding any density estimation.
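A minimal NumPy sketch of an unbalanced Sinkhorn divergence between two weighted point clouds. It assumes a squared-Euclidean cost C = ½|x − y|², entropic regularization ε measured by KL(π | α⊗β), KL marginal penalties of strength γ, and the debiased form S = OT(α,β) − ½OT(α,α) − ½OT(β,β) + (ε/2)(m(α) − m(β))². The scaling updates u = (a/Kv)^{γ/(γ+ε)} are the standard unbalanced Sinkhorn iterations; the paper's cost, parameter values and solver may differ, and the helper names are hypothetical.

```python
import numpy as np

def kl_gen(p, q):
    """Generalized KL: sum(p * log(p / q)) - sum(p) + sum(q), with 0 log 0 = 0."""
    p, q = np.ravel(p), np.ravel(q)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) - p.sum() + q.sum()

def unbalanced_ot(x, a, y, b, eps=0.5, gamma=1.0, n_iter=2000):
    """Entropic unbalanced OT cost between weighted clouds (x, a) and (y, b)."""
    C = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # C_ij = |x_i - y_j|^2 / 2
    ref = np.outer(a, b)                  # reference measure a (x) b for the entropy term
    K = ref * np.exp(-C / eps)            # Gibbs kernel relative to the reference
    fi = gamma / (gamma + eps)            # damping exponent from the KL marginal penalties
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):               # unbalanced Sinkhorn scaling iterations
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    pi = u[:, None] * K * v[None, :]      # (approximately) optimal transport plan
    return (np.sum(C * pi) + eps * kl_gen(pi, ref)
            + gamma * kl_gen(pi.sum(axis=1), a) + gamma * kl_gen(pi.sum(axis=0), b))

def unbalanced_sinkhorn_divergence(x, a, y, b, eps=0.5, gamma=1.0):
    """Debiased divergence; the last term compensates the mass terms of the entropy."""
    ot_ab = unbalanced_ot(x, a, y, b, eps, gamma)
    ot_aa = unbalanced_ot(x, a, x, a, eps, gamma)
    ot_bb = unbalanced_ot(y, b, y, b, eps, gamma)
    return ot_ab - 0.5 * (ot_aa + ot_bb) + 0.5 * eps * (a.sum() - b.sum()) ** 2
```

In the second training stage, (x, a) would be the pushed-forward particles and weights and (y, b) the next snapshot; gradient-based training would need the same computation in an autodiff framework rather than plain NumPy.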
The paper also analyses identifiability. For an Ornstein‑Uhlenbeck (OU) process with quadratic fitness, they derive closed‑form expressions for the evolution of the mean, covariance, and total mass. They prove that many different combinations of drift and growth produce identical marginal distributions, showing that drift and growth are fundamentally confounded without additional constraints. This theoretical insight explains why naive inference can mistakenly attribute proliferation effects to drift, as observed with the original PFI.
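To make the confounding concrete, here is a hypothetical 1-D construction made for illustration (not the paper's derivation). It uses the standard moment equations for a Gaussian population with linear fitness g(t,x) = g₀(t) + h x, the zero-curvature special case of quadratic fitness: dm/dt = −a(m − μ*) + h s, ds/dt = −2a s + σ², d log M/dt = g₀ + h m. Model A pulls cells toward μ* = 0 and lets cells at larger x divide faster. Model B pulls them toward μ* = h s₀/a with state-independent growth g₀ + h m(t). Started at the stationary variance s₀ = σ²/(2a), both give the same mean, variance and total mass at every time, hence identical snapshots.

```python
import numpy as np

def gaussian_population_moments(a, mu_star, sigma, h, g0, m0, s0, T=2.0, dt=1e-3):
    """Euler-integrate mean m, variance s and log-mass of a 1-D OU population.

    Drift is -a (x - mu_star), noise is sigma, fitness is g0 + h x.
    g0 may be a scalar or an array with one value per time step.
    Returns an array of shape (n_steps + 1, 3) with columns (m, s, log M).
    """
    n = int(round(T / dt))
    g0 = np.broadcast_to(np.asarray(g0, dtype=float), (n,))
    m, s, log_mass = m0, s0, 0.0
    traj = []
    for k in range(n):
        traj.append((m, s, log_mass))
        dm = -a * (m - mu_star) + h * s      # drift plus selection (Cov(x, g) = h s)
        ds = -2 * a * s + sigma**2           # linear fitness leaves the variance unchanged
        dlog_mass = g0[k] + h * m            # population growth rate E[g]
        m, s, log_mass = m + dt * dm, s + dt * ds, log_mass + dt * dlog_mass
    traj.append((m, s, log_mass))
    return np.array(traj)

# Model A: drift toward 0, growth favouring large x (illustrative parameters).
A = gaussian_population_moments(a=1.0, mu_star=0.0, sigma=1.0, h=0.5, g0=0.1, m0=1.5, s0=0.5)
# Model B: drift toward h * s0 / a = 0.25, growth that depends on time only.
B = gaussian_population_moments(a=1.0, mu_star=0.25, sigma=1.0, h=0.0,
                                g0=0.1 + 0.5 * A[:-1, 0], m0=1.5, s0=0.5)
```

Both trajectories coincide, so snapshots alone cannot tell whether the population moves because individual cells drift or because cells in one region proliferate faster.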
Empirically, the authors evaluate UPFI on three settings: (1) a 2‑D bistable system where growth is biased toward one branch, (2) a high‑dimensional OU‑type system with time‑varying growth, and (3) real mouse embryonic stem‑cell scRNA‑seq data. Across all experiments, UPFI outperforms PFI, DeepRUOT, and unbalanced optimal‑transport baselines in terms of MMD, Wasserstein, and Sinkhorn losses. Visualizations of probability‑flow lines demonstrate that UPFI correctly separates branches that PFI erroneously merges due to mass imbalance. Moreover, the two‑step training scheme proves stable and requires fewer hyper‑parameter tricks than multi‑stage methods like DeepRUOT.
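For reference, a minimal sketch of one of the reported metrics: a biased (V-statistic) estimate of squared MMD with a Gaussian kernel between two unweighted point clouds. The bandwidth value and the use of unweighted samples are assumptions; the summary does not state the paper's metric settings.

```python
import numpy as np

def gaussian_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    def kernel(p, q):
        sq = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
        return np.exp(-sq / (2 * bandwidth**2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```

Because the estimate is the squared RKHS distance between the two empirical mean embeddings, it is non-negative and is zero when both clouds coincide.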
In summary, the contribution of the paper is threefold: (i) a principled reduction of the non‑conservative Fokker‑Planck equation to a tractable ODE system via a Lagrangian formulation, (ii) the use of denoising score matching to obtain a parameter‑free estimate of the score field, and (iii) the integration of an unbalanced Sinkhorn divergence to jointly learn drift and growth from cross‑sectional snapshots. This framework enables accurate reconstruction of stochastic cellular dynamics—including intrinsic noise and proliferation—using only destructive single‑cell measurements, opening the door to more realistic biophysical modeling of development, reprogramming, and disease processes.