A Hitchhiker's Guide to Poisson Gradient Estimation


Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation and Gumbel-SoftMax (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.


💡 Research Summary

This paper tackles the long‑standing problem of differentiating through discrete Poisson samples in latent variable models, a challenge that appears in computational neuroscience applications such as Poisson variational autoencoders (P‑VAEs) and partially observable generalized linear models (POGLMs). Two recent path‑wise gradient estimators—Exponential Arrival Time (EAT) and Gumbel‑SoftMax (GSM)—provide continuous relaxations of the Poisson distribution, but both depend critically on a temperature hyper‑parameter τ. The original EAT uses a sigmoid to smooth the hard indicator that counts arrivals before a unit time; GSM maps Poisson probabilities onto a Gumbel‑SoftMax concrete distribution. In practice, the choice of τ dramatically influences distributional fidelity, gradient bias, variance, and training stability, yet prior work offered no systematic guidance.
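To make the EAT construction concrete, here is a minimal NumPy sketch of the sigmoid-relaxed sampler described above. The truncation at `max_events` and the exact argument scaling are our assumptions for illustration, not the paper's reference implementation.

```python
import numpy as np

def eat_relaxed_poisson(rate, tau=0.05, n_samples=10000, max_events=60, rng=None):
    """Sigmoid-relaxed Poisson sampling via exponential arrival times (EAT).

    A hard Poisson(rate) draw equals the number of arrival times
    T_k = E_1 + ... + E_k (with E_i ~ Exp(rate)) that land before time 1.
    EAT replaces the hard indicator 1[T_k <= 1] with a sigmoid of
    temperature tau, so the count becomes differentiable w.r.t. rate
    through the reparameterized draws E_i = -log(U_i) / rate.
    """
    rng = np.random.default_rng(rng)
    u = rng.random((n_samples, max_events))
    inter_arrival = -np.log(u) / rate              # reparameterized Exp(rate)
    arrivals = np.cumsum(inter_arrival, axis=1)    # T_1, T_2, ...
    z = np.clip((1.0 - arrivals) / tau, -50.0, 50.0)
    soft_indicator = 1.0 / (1.0 + np.exp(-z))      # smoothed 1[T_k <= 1]
    return soft_indicator.sum(axis=1)              # relaxed counts
```

In an autodiff framework the same construction yields path-wise gradients of the sample with respect to the rate; `max_events` only needs to be large enough that arrivals beyond it essentially never fall before time 1.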

The authors’ primary technical contribution is a modification of EAT, termed EAT cubic, which replaces the sigmoid with a compact cubic Hermite (smoothstep) interpolation. By leveraging Campbell’s theorem they derive closed‑form expressions for the first and second moments of the relaxed distribution as functions of τ and the smoothing function. They prove that the cubic interpolant yields an unbiased first moment (exactly matching the Poisson rate λ) and substantially reduces the second‑moment bias compared with the sigmoid. Moreover, the compact support of the cubic function makes the relaxation far less sensitive to τ, allowing a broad range of temperatures to produce near‑Poisson behavior.
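The cubic replacement itself is only a few lines. The window [1 − τ, 1 + τ] below is our parameterization, but it illustrates the key mechanism: because smoothstep is symmetric about the window's midpoint, the soft indicator integrates to exactly 1 on [0, ∞), so by Campbell's theorem the relaxed count has mean exactly λ.

```python
import numpy as np

def smoothstep_indicator(t, tau):
    """Cubic Hermite (smoothstep) relaxation of the indicator 1[t <= 1].

    Falls smoothly from 1 to 0 over the compact window [1 - tau, 1 + tau]
    (this window parameterization is an assumption for illustration).
    Since smoothstep is symmetric about t = 1, the function integrates
    to exactly 1 on [0, inf), so a relaxed count built from it has mean
    exactly equal to the Poisson rate.
    """
    x = np.clip((1.0 + tau - t) / (2.0 * tau), 0.0, 1.0)
    return x * x * (3.0 - 2.0 * x)   # smoothstep: 3x^2 - 2x^3
```

Dropping this in place of the sigmoid in an EAT sampler leaves everything else unchanged; the compact support is also what makes the relaxation insensitive to the exact value of τ.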

Empirical evaluation is conducted on two benchmark tasks:

  1. Poisson VAE (P‑VAE) – MNIST and CIFAR‑10 images are converted to count data. The authors compare ELBO convergence, reconstruction loss, KL divergence, and sample quality across EAT cubic, original EAT (sigmoid), GSM, a score‑function estimator, and exact gradients (when tractable). EAT cubic matches or exceeds exact‑gradient performance across a wide τ interval (≈0.1–0.3), while the other methods require finely tuned τ to avoid bias or instability.

  2. Partially Observable GLM (POGLM) – Real neural recordings with 200 observed and 800 latent neurons are used to infer latent spike trains and connectivity matrices. Metrics include log‑likelihood, AUC for connectivity recovery, and F1‑score for spike reconstruction. Again, EAT cubic delivers the highest scores and remains stable for τ up to 0.4, whereas GSM only works at very low τ (≈0.05) and suffers from gradient explosions, and the original EAT shows systematic over‑estimation of firing rates.

A five‑axis rating table (distributional fidelity, gradient bias, gradient variance, temperature robustness, and generalizability) assigns near‑perfect scores to EAT cubic, moderate scores to GSM (good smoothness but high temperature sensitivity), and low scores to the original EAT (biased moments). The authors also explore automatic τ selection and find no simple heuristic; the optimal τ depends non‑linearly on λ and the loss landscape, reinforcing the value of a method that is intrinsically robust to τ.

Practical recommendations distilled from the study are:

  • Default choice for Poisson‑only models: EAT cubic, due to unbiased mean, low variance, and broad temperature tolerance.
  • When non‑Poisson discrete distributions are needed: GSM, which generalizes to arbitrary finite categorical supports but requires careful τ tuning.
  • If existing code already uses sigmoid‑based EAT: swapping the sigmoid for the cubic Hermite interpolant yields immediate benefits with negligible computational overhead.
  • Hyper‑parameter guidance: start with τ in the 0.1–0.3 range for EAT cubic; monitor ELBO or validation loss to fine‑tune if necessary.
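For comparison with the EAT route, the GSM alternative can be sketched as a Gumbel-noise-perturbed softmax over a truncated Poisson support. The truncation point `k_max` and this exact parameterization are our assumptions; the paper's implementation may differ.

```python
import numpy as np

def gsm_relaxed_poisson(rate, tau=0.05, k_max=25, n_samples=10000, rng=None):
    """Gumbel-SoftMax relaxation of a (truncated) Poisson distribution.

    Poisson(rate) log-probabilities over counts 0..k_max are perturbed
    with Gumbel(0, 1) noise and pushed through a softmax of temperature
    tau; the relaxed sample is the softmax-weighted average of the
    support.  Truncation at k_max is an assumption for illustration.
    """
    rng = np.random.default_rng(rng)
    k = np.arange(k_max + 1)
    log_fact = np.cumsum(np.log(np.maximum(k, 1)))        # log k!
    log_probs = k * np.log(rate) - rate - log_fact        # truncated Poisson log-pmf
    g = -np.log(-np.log(rng.random((n_samples, k_max + 1))))  # Gumbel(0,1) noise
    logits = (log_probs + g) / tau
    logits -= logits.max(axis=1, keepdims=True)           # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ k                                          # relaxed counts
```

At a τ this low the softmax is nearly one-hot, so samples sit close to integers; raising τ smooths them but biases the relaxation, which is the temperature sensitivity noted above.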

In summary, the paper delivers the first systematic comparison of Poisson gradient estimators, introduces a theoretically grounded and empirically superior modification to EAT, and provides clear, actionable guidance for researchers and engineers working with Poisson latent variable models in NeuroAI and related fields.

