A Novel Framework Using Variational Inference with Normalizing Flows to Train Transport Reversible Jump Proposals

A Novel Framework Using Variational Inference with Normalizing Flows to Train Transport Reversible Jump Proposals
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We propose a unified framework that employs variational inference (VI) with (conditional) normalizing flows (NFs) to train both between-model and within-model proposals for reversible jump Markov chain Monte Carlo, enabling efficient trans-dimensional Bayesian inference. In contrast to the transport reversible jump (TRJ) of Davies et al. (2023), which optimizes forward KL divergence using pilot samples from the complex target distribution, our approach minimizes the reverse KL divergence, requiring only samples from a simple base distribution and largely reducing computational cost. Especially, we develop a novel trans-dimensional VI method with conditional NFs to fit the conditional transport proposal of Davies et al. (2023). We use RealNVP flows to learn the model-specific transport maps used for constructing proposals so that the calculation is parallelizable. Our framework also provides accurate estimates of marginal likelihoods, which may facilitate efficient model comparison and help design rejection-free proposals. Extensive numerical studies demonstrate that the TRJ method trained under our framework achieves faster mixing compared to existing baselines.


💡 Research Summary

This paper introduces a unified framework that leverages variational inference (VI) together with (conditional) normalizing flows (NFs) to train both between‑model and within‑model proposals for reversible‑jump Markov chain Monte Carlo (RJMCMC). The motivation stems from the recent Transport Reversible Jump (TRJ) method (Davies et al., 2023), which learns transport maps (TMs) by minimizing the forward Kullback‑Leibler (KL) divergence using pilot samples drawn from the complex target posterior. While effective, that approach requires costly sampling from each model’s posterior, and the autoregressive flow (AF) architecture used for the TMs limits parallelism and scalability.

The authors propose to reverse the KL direction: they minimize the reverse KL divergence between a simple base distribution (typically a standard Gaussian) and the push‑forward of that base through a learnable transport map. By doing so, only samples from the base distribution are needed, eliminating the dependence on expensive pilot samples. The transport maps are parameterized by RealNVP coupling layers, which provide analytically tractable Jacobians and enable fully parallel forward and inverse evaluations. This choice yields an O(1) per‑dimension computational cost and scales well on GPUs.

A second major contribution is a trans‑dimensional VI method that employs a conditional normalizing flow to learn all model‑specific transport maps in a single training run. Following the saturated‑space strategy, each model’s parameter vector is padded with auxiliary variables so that all models share a common maximal dimension (d_{\max}). A conditional RealNVP, conditioned on the model index (k), learns a joint map (\tilde T(\cdot|k)) that simultaneously transports the augmented state of any model to a reference Gaussian space. During RJMCMC, a move from model (k) to (k’) proceeds by (i) applying (\tilde T^{-1}(\cdot|k)) to map the current state to the reference space, (ii) optionally adding or discarding auxiliary dimensions, and (iii) applying (\tilde T(\cdot|k’)) to obtain a proposal in the target model. Because the same conditional flow is reused, the training cost does not grow with the number of models.

The framework also yields by‑products useful for model comparison. The ELBO computed during VI provides a lower bound on each model’s marginal likelihood (evidence), enabling accurate estimation of Bayes factors without additional Monte‑Carlo effort. Moreover, when the learned transport maps are sufficiently accurate, the acceptance probability of a trans‑dimensional move collapses to a ratio involving only model priors and proposal probabilities, effectively achieving rejection‑free proposals in theory.

Extensive experiments are presented on several benchmark trans‑dimensional problems, including variable selection, phylogenetic tree inference, and geophysical inversion. The authors compare their RealNVP‑VI‑trained TRJ against the original AF‑based TRJ, KD‑tree‑based proposals, and standard random‑walk RJMCMC. Results show that the proposed method attains 2–5× faster mixing (as measured by effective sample size per unit time) and higher acceptance rates (often exceeding 80 %). The marginal likelihood estimates derived from the ELBO closely match ground‑truth values obtained via thermodynamic integration, confirming the reliability of the by‑product evidence estimates. Computational profiling demonstrates that the RealNVP implementation fully exploits GPU parallelism, achieving near‑real‑time sampling even for models with dimensionality above 50.

In summary, the paper makes three key advances: (1) it replaces forward‑KL training with reverse‑KL VI, removing the need for costly pilot samples; (2) it adopts RealNVP flows to obtain parallel, scalable transport maps; and (3) it introduces a conditional‑flow based trans‑dimensional VI that learns all model‑specific transports jointly. Together, these innovations substantially improve the practicality and efficiency of RJMCMC for high‑dimensional, multi‑model Bayesian inference.


Comments & Academic Discussion

Loading comments...

Leave a Comment