Shortest-Path Flow Matching with Mixture-Conditioned Bases for OOD Generalization to Unseen Conditions
Robust generalization under distribution shift remains a key challenge for conditional generative modeling: conditional flow-based methods often fit the training conditions well but fail to extrapolate to unseen ones. We introduce SP-FM, a shortest-path flow-matching framework that improves out-of-distribution (OOD) generalization by conditioning both the base distribution and the flow field on the condition. Specifically, SP-FM learns a condition-dependent base distribution parameterized as a flexible, learnable mixture, together with a condition-dependent vector field trained via shortest-path flow matching. Conditioning the base allows the model to adapt its starting distribution across conditions, enabling smooth interpolation and more reliable extrapolation beyond the observed training range. We provide theoretical insights into the resulting conditional transport and show how mixture-conditioned bases enhance robustness under shift. Empirically, SP-FM is effective across heterogeneous domains, including predicting responses to unseen perturbations in single-cell transcriptomics and modeling treatment effects in high-content microscopy–based drug screening. Overall, SP-FM provides a simple yet effective plug-in strategy for improving conditional generative modeling and OOD generalization across diverse domains.
💡 Research Summary
The paper tackles a fundamental limitation of conditional flow‑matching models: the use of a fixed base distribution (usually a standard Gaussian) forces each condition to be treated as an independent transport problem, which hampers out‑of‑distribution (OOD) generalization to unseen interventions such as new drugs, rotations, or genetic perturbations. To overcome this, the authors propose SP‑FM (Shortest‑Path Flow‑Matching with Mixture‑Conditioned Bases). The key idea is to make the base distribution itself a function of the condition y. Specifically, a neural encoder maps y to the parameters of a Gaussian mixture µ(y)=∑_{k=1}^{K}π_k(y)𝒩(μ_k(y),Σ_k(y)). By learning these mixture weights, means, and covariances jointly with a condition‑dependent velocity field vθ(x,t;y), the model can select a starting distribution that is already close to the target for each condition. Consequently, the optimal transport (OT) path between base and target becomes short, and the learned flow only needs to correct the residual differences.
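The mixture-conditioned base described above can be sketched in a few lines. In this minimal numpy sketch, `encode_mixture` is a hypothetical linear stand-in for the paper's neural encoder (its weight matrix `W`, the softmax temperature, and the diagonal covariances are illustrative assumptions, not the actual architecture); it maps a condition vector `y` to mixture weights π_k(y), means μ_k(y), and per-dimension scales.

```python
import numpy as np

def encode_mixture(y, W, K, D):
    """Hypothetical linear 'encoder' standing in for the learned network
    that maps a condition vector y to Gaussian-mixture parameters."""
    h = W @ y                              # feature map of the condition
    logits = h[:K]
    pi = np.exp(logits - logits.max())
    pi = pi / pi.sum()                     # mixture weights pi_k(y), sum to 1
    mu = h[K:K + K * D].reshape(K, D)      # component means mu_k(y)
    log_sig = h[K + K * D:].reshape(K, D)
    sigma = np.exp(0.2 * log_sig)          # diagonal scales (Sigma_k(y) assumed diagonal)
    return pi, mu, sigma

def sample_base(pi, mu, sigma, n, rng):
    """Draw n starting points from the condition-dependent mixture base."""
    ks = rng.choice(len(pi), size=n, p=pi)          # pick components
    eps = rng.standard_normal((n, mu.shape[1]))
    return mu[ks] + sigma[ks] * eps                  # reparameterized draws
```

Because all parameters are functions of `y`, nearby conditions yield nearby base distributions, which is what enables interpolation and extrapolation across the condition space.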
The authors provide a theoretical analysis showing that with a single Gaussian base the dual OT problem is ill‑posed, leading to ambiguous solutions and poor OOD behavior. Introducing multiple mixture components makes the problem well‑posed; they derive error bounds that relate the number of components K, the complexity of the target distribution, and the ambient dimension D. These results justify why a richer, condition‑aware base can dramatically improve robustness under distribution shift.
Training proceeds by (i) predicting the mixture parameters from y, (ii) sampling a latent point from the resulting mixture, and (iii) integrating the ODE dx/dt = vθ(x,t;y) from t=0 to t=1. The velocity field is trained via shortest‑path flow‑matching, i.e., regression to the optimal OT velocity that yields a geodesic (Wasserstein‑2) path between µ(y) and the empirical target ρ(y). Regularization terms encourage smoothness of the mixture parameters and prevent mode collapse.
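The training objective and sampling loop in steps (i)-(iii) can be sketched roughly as follows, assuming the straight-line (constant-velocity) interpolant as the shortest path between paired base and target samples; `v` stands in for the condition-dependent network vθ(x, t; y), and the Euler integrator is an illustrative solver choice, not necessarily the one used in the paper.

```python
import numpy as np

def fm_loss(v, x0, x1, y, rng):
    """Shortest-path flow-matching loss: regress v(x_t, t; y) onto the
    constant velocity x1 - x0 along the straight-line path x_t."""
    t = rng.random((x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1        # point on the linear interpolant
    u = x1 - x0                          # target displacement velocity
    return np.mean((v(xt, t, y) - u) ** 2)

def euler_sample(v, x0, y, steps=100):
    """Integrate dx/dt = v(x, t; y) from t=0 to t=1 with Euler steps,
    starting from a draw x0 of the condition-dependent mixture base."""
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        t = np.full((x.shape[0], 1), i * dt)
        x = x + dt * v(x, t, y)
    return x
```

With a base already close to the target, `x1 - x0` is small on average, so the regression target is easy and the integrated path stays short, which is the intuition behind the shortest-path formulation.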
Empirically, SP‑FM is evaluated on eight heterogeneous benchmarks: (1) synthetic rotated‑letter images, (2) high‑content microscopy drug‑screen data, (3) single‑cell RNA‑seq perturbation datasets, and several other image and time‑series tasks. In each case, the model is trained on a subset of conditions and tested on held‑out conditions that were never seen during training. Compared to standard conditional flow‑matching, conditional diffusion, and recent meta‑flow‑matching approaches, SP‑FM consistently reduces Wasserstein‑2 distance and Fréchet Inception Distance (FID), and improves downstream metrics such as Pearson correlation for gene‑expression prediction. For example, on a drug‑response microscopy benchmark, SP‑FM lowers FID from 0.85 to 0.62 and reduces morphology‑specific error by roughly 15%. On a large Perturb‑Seq dataset, the average Pearson correlation rises from 0.71 to 0.78, with the largest gains on unseen drugs that are chemically similar to training drugs.
Ablation studies confirm that both components—conditioned mixture base and conditioned velocity—are necessary: removing the mixture (i.e., fixing the base) degrades OOD performance dramatically, while conditioning only the velocity yields modest gains. Varying the number of mixture components shows a trade‑off: larger K improves OOD accuracy up to a point, after which over‑parameterization harms validation loss.
The paper also discusses limitations: learning a full mixture per condition increases memory and compute, especially for large K; the method assumes a smooth similarity structure in the condition space, which may not hold for highly discrete or hierarchical descriptors; and extremely multimodal target distributions may still require more expressive bases. Future directions include graph‑based condition encoders, hierarchical mixture models, and joint learning of both base and target multimodality.
In summary, SP‑FM offers a simple plug‑in modification—predict a condition‑dependent Gaussian mixture as the source distribution—and a principled shortest‑path flow‑matching training scheme that together enable conditional generative models to extrapolate reliably to unseen conditions across diverse scientific domains. The theoretical insights, extensive experiments, and clear implementation pathway make it a compelling addition to the toolbox for OOD‑robust conditional generation.