Generative Model via Quantile Assignment
Deep generative models (DGMs) play two key roles in modern machine learning: (i) producing new data (e.g., image synthesis) and (ii) reducing dimensionality. However, traditional architectures often rely on auxiliary networks, such as encoders in Variational Autoencoders (VAEs) or discriminators in Generative Adversarial Networks (GANs), which introduce training instability, computational overhead, and risks like mode collapse. We present NeuroSQL, a new generative paradigm that eliminates the need for auxiliary networks by learning low-dimensional latent representations implicitly. NeuroSQL leverages an asymptotic approximation that expresses the latent variables as the solution to an optimal transport problem. Specifically, NeuroSQL learns the latent variables by solving a linear assignment problem and then passes the latent information to a standalone generator. We benchmark its performance against GANs, VAEs, and a budget-matched diffusion baseline on four datasets: handwritten digits (MNIST), faces (CelebA), animal faces (AFHQ), and brain images (OASIS). Compared to VAEs, GANs, and diffusion models: (1) in terms of image quality, NeuroSQL achieves overall lower mean pixel distance between synthetic and authentic images and stronger perceptual/structural fidelity; (2) computationally, NeuroSQL requires the least training time; and (3) practically, NeuroSQL provides an effective solution for generating synthetic data with limited training samples. By embracing quantile assignment rather than an encoder, NeuroSQL provides a fast, stable, and robust way to generate synthetic data with minimal information loss.
💡 Research Summary
The paper introduces NeuroSQL, a novel deep generative modeling framework that completely removes the need for auxiliary networks such as encoders (used in VAEs) or discriminators (used in GANs). The core idea is to treat the latent variables as a permutation of a fixed set of multivariate quantiles derived from the prior distribution. By leveraging recent optimal‑transport formulations of multivariate quantiles, the authors construct a deterministic lattice Qₙ = {Q₁,…,Qₙ} in the latent space. For a dataset of n samples {Xᵢ}, they define a cost matrix Cᵢₖ = ℓ(Xᵢ, Gθ(Qₖ)), where ℓ is a pixel‑wise loss (e.g., L2) and Gθ is a neural generator. Solving the linear assignment problem minπ∈Sₙ ∑ᵢ Cᵢ,π(i) yields the optimal permutation π, which in turn provides the estimated latent codes Ẑ = Qₙπ.
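The cost-matrix construction and assignment step can be sketched numerically. This is a toy illustration, not the paper's implementation: the "generator" is a fixed random linear map standing in for Gθ, the lattice Q is just a set of fixed latent points standing in for the center-outward quantile grid, and the data X are random vectors standing in for images.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d_latent, d_data = 8, 2, 16

# Toy "generator" G_theta: a fixed random linear map (placeholder only).
W = rng.normal(size=(d_data, d_latent))

# Stand-ins for the quantile lattice Q_n and the dataset {X_i}.
Q = rng.normal(size=(n, d_latent))
X = rng.normal(size=(n, d_data))

# Cost matrix C[i, k] = ||X_i - G_theta(Q_k)||^2 (pixel-wise L2 loss).
GQ = Q @ W.T                                        # generator output for each Q_k
C = ((X[:, None, :] - GQ[None, :, :]) ** 2).sum(axis=-1)

# Solve min_pi sum_i C[i, pi(i)] exactly (Hungarian algorithm).
rows, pi = linear_sum_assignment(C)

# Estimated latent codes: each X_i is matched to the quantile Q_{pi(i)}.
Z_hat = Q[pi]
```

`scipy.optimize.linear_sum_assignment` is the exact O(n³)-class solver; the paper's greedy and mini-batch variants trade exactness for speed on large n.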
Training proceeds by alternating two steps: (i) with a fixed permutation, update the generator parameters θ by minimizing the supervised loss L(θ,π) = (1/n)∑ᵢ ℓ(Xᵢ, Gθ(Ẑᵢ)) + λR(θ); (ii) with the updated generator, recompute the cost matrix and solve the assignment problem to obtain a new π. A momentum update (Ẑᵗ = ρ Qₙπᵗ + (1−ρ) Ẑᵗ⁻¹) stabilizes the latent estimates across iterations. The inner assignment can be solved exactly by the Hungarian algorithm (O(n³)) or approximately by a greedy O(n²) scheme; the authors also propose a mini‑batch variant with O(m²) cost, making the method scalable to large n.
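The alternating scheme above can be sketched end to end with the same toy linear "generator". All shapes and hyperparameters (learning rate, momentum ρ, step count) are illustrative assumptions, and the exact gradient of the squared loss replaces backpropagation through a real network.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n, d_latent, d_data = 16, 2, 8
lr, rho, steps = 0.05, 0.8, 50       # illustrative hyperparameters

X = rng.normal(size=(n, d_data))           # toy data
Q = rng.normal(size=(n, d_latent))         # fixed quantile lattice Q_n
W = rng.normal(size=(d_data, d_latent)) * 0.1  # generator parameters theta

Z = Q.copy()                               # initial latent estimates
init_loss = ((Z @ W.T - X) ** 2).sum() / n

for t in range(steps):
    # (i) Generator update with latents fixed:
    #     minimize (1/n) * sum_i ||X_i - W Z_i||^2 by one gradient step.
    resid = Z @ W.T - X
    grad_W = (2.0 / n) * resid.T @ Z       # exact gradient for the linear map
    W -= lr * grad_W

    # (ii) Recompute the cost matrix and re-solve the assignment.
    GQ = Q @ W.T
    C = ((X[:, None, :] - GQ[None, :, :]) ** 2).sum(axis=-1)
    _, pi = linear_sum_assignment(C)

    # (iii) Momentum update: Z^t = rho * Q_n pi^t + (1 - rho) * Z^{t-1}.
    Z = rho * Q[pi] + (1.0 - rho) * Z

final_loss = ((Z @ W.T - X) ** 2).sum() / n
```

The momentum step keeps the latent estimates from jumping discontinuously when the assignment changes between iterations, which is what stabilizes training in the absence of an encoder.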
Theoretical contributions include: (1) a proof that in the univariate case the order statistics of the true latent variables converge to the quantile grid, yielding an error of Oₚ(1/√n); (2) an extension to the multivariate case using the center‑outward distribution function F±, which maps a uniform grid on the unit ball to the desired prior. Proposition 1 shows that the minimal L₂ distance between the true latent matrix Z and the permuted quantile matrix Qₙπ converges to zero almost surely as n → ∞. These results guarantee that the permutation‑based approximation becomes exact in the large‑sample limit.
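The univariate claim admits a quick numerical sanity check: order statistics of i.i.d. samples approach the quantile grid at roughly a 1/√n rate. Uniform(0,1) samples are used here so the quantiles are simply i/(n+1); this is an illustration of the stated asymptotics, not the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(42)

def max_grid_error(n):
    """Max gap between sorted samples (order statistics) and the grid i/(n+1)."""
    x = np.sort(rng.uniform(size=n))
    grid = np.arange(1, n + 1) / (n + 1)
    return np.abs(x - grid).max()

# The error should shrink as n grows, consistent with O_p(1/sqrt(n)).
errs = {n: max_grid_error(n) for n in (100, 10_000)}
```

The same intuition underlies the multivariate case, with the center-outward distribution function F± playing the role of the univariate quantile function.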
Empirically, NeuroSQL is benchmarked against VAEs, GANs, and a budget‑matched denoising diffusion probabilistic model (DDPM) on four datasets of increasing complexity: MNIST (handwritten digits), CelebA (human faces), AFHQ (animal faces), and OASIS (brain MRI). All methods share the same generator backbones (ConvNet, ResNet, U‑Net), training schedules, and computational budgets to isolate the effect of the learning principle. Evaluation metrics include mean pixel distance, LPIPS, SSIM, and FID. NeuroSQL consistently achieves the lowest pixel distance and highest structural similarity across all datasets, while requiring the least wall‑clock training time: approximately 30–50% faster than VAEs/GANs and more than five times faster than DDPMs. In the low‑sample OASIS scenario (a few hundred images), NeuroSQL still produces realistic synthetic scans, demonstrating robustness to data scarcity.
Latent space visualizations reveal that NeuroSQL forms well‑separated, class‑specific clusters, unlike VAEs, where the KL regularizer forces most codes toward the origin, causing substantial overlap. The deterministic assignment to a high‑variance lattice spreads latent codes more uniformly across the latent space.