Discrete Variational Autoencoding via Policy Search
Discrete latent bottlenecks in variational autoencoders (VAEs) offer high bit efficiency and can be modeled with autoregressive discrete distributions, enabling parameter-efficient multimodal search with transformers. However, discrete random variables do not allow for exact differentiable parameterization; therefore, discrete VAEs typically rely on approximations, such as Gumbel-Softmax reparameterization or straight-through gradient estimates, or employ high-variance gradient-free methods such as REINFORCE that have had limited success on high-dimensional tasks such as image reconstruction. Inspired by popular techniques in policy search, we propose a training framework for discrete VAEs that leverages the natural gradient of a non-parametric encoder to update the parametric encoder without requiring reparameterization. Our method, combined with automatic step size adaptation and a transformer-based encoder, scales to challenging datasets such as ImageNet and outperforms both approximate reparameterization methods and quantization-based discrete autoencoders in reconstructing high-dimensional data from compact latent spaces.
💡 Research Summary
The paper introduces Discrete Autoencoding via Policy Search (DAPS), a novel training framework for variational autoencoders (VAEs) that employ discrete latent variables. Traditional discrete VAEs rely on approximations such as the Gumbel-Softmax reparameterization, straight-through (ST) estimators, or high-variance gradient-free methods like REINFORCE. These approaches suffer from temperature sensitivity, large memory footprints, vanishing or exploding gradients in autoregressive settings, and poor scalability to high-dimensional data such as ImageNet.
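For context, the Gumbel-Softmax relaxation that the paper contrasts against can be sketched in a few lines: Gumbel noise is added to the category logits, and the non-differentiable argmax is replaced by a temperature-controlled softmax so that gradients can flow through the approximate sample. This is a minimal numpy sketch of the standard technique, not code from the paper; the function name and temperature value are illustrative.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a relaxed one-hot sample from a categorical distribution.

    Gumbel-Softmax replaces argmax(logits + Gumbel noise), which has
    zero gradient almost everywhere, with softmax((logits + noise)/tau).
    As tau -> 0 the sample approaches a discrete one-hot vector; larger
    tau gives smoother (but more biased) samples.
    """
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-9, 1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))                # Gumbel(0, 1) noise
    y = (np.asarray(logits) + g) / tau
    y = np.exp(y - y.max())                # numerically stable softmax
    return y / y.sum()

# A relaxed sample over 3 categories: a probability vector that
# concentrates on a single entry as tau shrinks.
sample = gumbel_softmax(np.log([0.7, 0.2, 0.1]), tau=0.5)
```

The temperature `tau` is exactly the sensitivity the summary mentions: too high and the relaxation is far from discrete, too low and gradients explode.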
DAPS reframes the ELBO maximization problem as an entropy-regularized return maximization, borrowing concepts from reinforcement learning (RL). The ELBO

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z) + \log p(z)\right] + \mathcal{H}\!\left(q_\phi(z \mid x)\right)$$

can then be read as an expected return plus an entropy bonus, with the encoder $q_\phi(z \mid x)$ playing the role of the policy and $\log p_\theta(x \mid z) + \log p(z)$ acting as the return for a sampled latent $z$.
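The entropy-regularized reading of the ELBO is an algebraic rewrite of the familiar "reconstruction minus KL" form, which can be checked numerically on a toy categorical latent. The numbers below are illustrative toy values, not from the paper.

```python
import numpy as np

# Toy model: one observation x, a categorical latent z with 3 values.
log_p_x_given_z = np.log([0.8, 0.3, 0.1])  # likelihood p(x|z) for each z
p_z = np.array([0.5, 0.3, 0.2])            # prior over z
q_z = np.array([0.6, 0.3, 0.1])            # encoder posterior q(z|x)

# Form 1: expected reconstruction minus KL(q || p).
elbo_kl = q_z @ log_p_x_given_z - np.sum(q_z * np.log(q_z / p_z))

# Form 2 (RL reading): expected "return" log p(x|z) + log p(z),
# plus the entropy of the encoder/policy q.
ret = log_p_x_given_z + np.log(p_z)
entropy_q = -np.sum(q_z * np.log(q_z))
elbo_rl = q_z @ ret + entropy_q

assert np.isclose(elbo_kl, elbo_rl)  # the two decompositions agree
```

Because the second form needs only samples of the return and the policy's entropy, it is the natural starting point for policy-search-style updates that avoid reparameterizing the discrete sample.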