Tubular Riemannian Laplace Approximations for Bayesian Neural Networks

Reading time: 5 minutes

📝 Original Info

  • Title: Tubular Riemannian Laplace Approximations for Bayesian Neural Networks
  • ArXiv ID: 2512.24381
  • Date: 2025-12-30
  • Authors: Rodrigo Pereira David

📝 Abstract

Laplace approximations are among the simplest and most practical methods for approximate Bayesian inference in neural networks, yet their Euclidean formulation struggles with the highly anisotropic, curved loss surfaces and large symmetry groups that characterize modern deep models. Recent work has proposed Riemannian and geometric Gaussian approximations to adapt to this structure. Building on these ideas, we introduce the Tubular Riemannian Laplace (TRL) approximation. TRL explicitly models the posterior as a probabilistic tube that follows a low-loss valley induced by functional symmetries, using a Fisher/Gauss-Newton metric to separate prior-dominated tangential uncertainty from data-dominated transverse uncertainty. We interpret TRL as a scalable reparametrised Gaussian approximation that uses implicit curvature estimates to operate in high-dimensional parameter spaces. Our empirical evaluation on ResNet-18 (CIFAR-10 and CIFAR-100) demonstrates that TRL achieves excellent calibration, matching or exceeding the reliability of Deep Ensembles (in terms of ECE) while requiring only a fraction (1/5) of the training cost. TRL effectively bridges the gap between single-model efficiency and ensemble-grade reliability.

💡 Deep Analysis

📄 Full Content

Modern deep learning systems are increasingly deployed in safety-critical settings such as medical diagnosis, autonomous driving, and scientific discovery. In these regimes, calibrated uncertainty quantification is as important as accuracy: models should know when they do not know (Gal, 2016; Kendall & Gal, 2017). Bayesian neural networks (BNNs) offer a principled way to reason about uncertainty by placing priors over weights and maintaining a posterior distribution over parameters and induced functions (Rasmussen & Williams, 2006; Gal, 2016).

Unfortunately, exact Bayesian inference in deep networks is intractable, and practical Bayesian deep learning relies on approximate methods such as variational inference, Monte Carlo sampling, and Laplace approximations. Among these, the Laplace approximation remains attractive because it is conceptually simple, computationally cheap, and easy to integrate into existing training pipelines (Daxberger et al., 2021). The Euclidean Laplace approximation (ELA) fits a Gaussian in parameter space around the maximum a posteriori (MAP) estimate using the Hessian of the negative log-posterior. Despite recent advances in scalable Hessian approximation (Daxberger et al., 2021), ELA is known to underfit and produce poorly calibrated uncertainty for deep networks.
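For concreteness, the ELA described above amounts to the following standard second-order construction (written in our own shorthand, not taken verbatim from the paper), where L(θ) is the negative log-posterior and the generalised Gauss-Newton (GGN) matrix plus the prior precision λI is the usual scalable surrogate for the exact Hessian:

```latex
q_{\mathrm{ELA}}(\theta) \;=\; \mathcal{N}\!\big(\theta \,\big|\, \theta_{\mathrm{MAP}},\; \Sigma\big),
\qquad
\Sigma \;=\; \Big(\nabla^2_{\theta} L(\theta)\big|_{\theta_{\mathrm{MAP}}}\Big)^{-1}
\;\approx\; \big(\mathrm{GGN}(\theta_{\mathrm{MAP}}) + \lambda I\big)^{-1}.
```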

A recent line of work addresses part of this problem by moving from parameter space to function space. The linearised Laplace approximation (LLA) linearises the network around the MAP and propagates a Gaussian weight posterior through the Jacobian, yielding a Gaussian process in function space that better captures predictive uncertainty (e.g. Daxberger et al., 2021; Roy et al., 2024). At the same time, information geometry and Riemannian optimisation have highlighted that neural network parameter spaces carry rich geometric structure when equipped with metrics derived from the Fisher information or Gauss-Newton matrices (Amari, 1998; Bonnabel, 2013; Kristiadi et al., 2023). This has led to Riemannian Laplace approximations (RLA) that sample from approximate posteriors using geodesic flows with respect to such metrics (Bergamin et al., 2023; Yu et al., 2024).
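The linearised Laplace step referenced here follows the standard recipe (again in our own shorthand): linearise the network output in the weights at the MAP and push the Gaussian weight posterior through the resulting Jacobian, which yields a Gaussian process predictive distribution:

```latex
f_{\mathrm{lin}}(x, \theta) \;=\; f(x, \theta_{\mathrm{MAP}}) + J(x)\,(\theta - \theta_{\mathrm{MAP}}),
\qquad
J(x) \;=\; \partial_\theta f(x, \theta)\big|_{\theta_{\mathrm{MAP}}},
\\[4pt]
\theta \sim \mathcal{N}(\theta_{\mathrm{MAP}}, \Sigma)
\;\;\Longrightarrow\;\;
\mathrm{Cov}\big[f_{\mathrm{lin}}(x),\, f_{\mathrm{lin}}(x')\big] \;=\; J(x)\,\Sigma\,J(x')^{\top}.
```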

A complementary perspective comes from Da Costa et al. (2025), who introduce geometric Gaussian approximations and show that families of posterior approximations obtained by pushing forward Gaussian base measures through diffeomorphisms (ReparamGA) or Riemannian exponential maps (RiemannGA) are universal under mild regularity conditions. This unifies many existing Laplace-like methods under a common geometric umbrella, and clarifies the role of the metric and parametrisation.
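In that framework (written here in our own notation rather than the paper's), both families are pushforwards of a Gaussian base measure: ReparamGA through a diffeomorphism T, and RiemannGA through the Riemannian exponential map at the MAP estimate:

```latex
q_{\mathrm{Reparam}} \;=\; T_{\#}\,\mathcal{N}(0, I),
\qquad
q_{\mathrm{Riemann}} \;=\; \big(\mathrm{Exp}_{\theta_{\mathrm{MAP}}}\big)_{\#}\,\mathcal{N}(0, \Sigma).
```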

This paper. We build on these advances and propose the Tubular Riemannian Laplace (TRL) approximation, a geometric posterior approximation tailored to Bayesian neural networks. TRL is motivated by two empirical observations about deep networks: (i) the posterior mass is often organised along high-dimensional loss valleys or tunnels (Dold et al., 2025) generated by functional symmetries in weight space, and (ii) the curvature of the negative log-posterior is extremely anisotropic, with nearly flat directions along these valleys and sharp directions across them. Standard Gaussian approximations are ill-suited to this geometry: fitting an ellipsoid to a long, curved valley is inherently inadequate.

Instead of a single Gaussian “bubble” centred at the MAP, TRL models the approximate posterior as a probabilistic tube that follows a low-loss curve through parameter space. The tube has three key ingredients: (i) an axis given by a curve that traces an approximate invariance valley, (ii) a transverse covariance determined by the Fisher/Gauss-Newton metric, and (iii) a tangential variance governed primarily by the prior. Sampling from the tube is implemented as a pushforward of an isotropic Gaussian in a latent space containing a one-dimensional valley coordinate and transverse coordinates.
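To make the pushforward construction concrete, here is a minimal sampling sketch under simplifying assumptions: the spine is a precomputed list of points tracing the valley, the transverse covariance is given by a low-rank factor, and the tangential latent is mapped onto the spine via its Gaussian CDF. All names (sample_tube, spine, transverse_factor) are illustrative, not the paper's implementation.

```python
import math
import numpy as np

def sample_tube(spine, transverse_factor, n_samples, rng=None):
    """Draw parameter vectors from a probabilistic tube (illustrative sketch).

    spine:             (K, D) points tracing the low-loss valley (the tube axis).
    transverse_factor: (D, r) low-rank factor L so that L @ L.T approximates the
                       transverse covariance (e.g. from a Fisher/GGN eigendecomposition).
    """
    rng = rng or np.random.default_rng()
    K, _ = spine.shape
    z_dim = transverse_factor.shape[1]
    samples = []
    for _ in range(n_samples):
        # Tangential (prior-dominated) latent: a standard normal mapped onto the
        # spine through its CDF, spreading mass along the whole valley axis.
        u = rng.normal()
        idx = int(round((K - 1) * 0.5 * (1.0 + math.erf(u / math.sqrt(2)))))
        # Transverse (data-dominated) latents: isotropic Gaussian pushed through
        # the low-rank covariance factor.
        z = rng.normal(size=z_dim)
        samples.append(spine[idx] + transverse_factor @ z)
    return np.stack(samples)
```

Each sample is then mapped back to the network's weight tensors and used for Monte Carlo predictive averaging, as with any Laplace-style posterior.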

We make the following contributions:

• We introduce Tubular Riemannian Laplace (TRL), a geometric posterior approximation that models posterior mass as a probabilistic tube along symmetry-induced low-loss valleys, separating prior-dominated tangential uncertainty from data-dominated transverse uncertainty under a Fisher/Gauss-Newton metric.

• We derive a scalable implementation combining a stochastic spine construction with implicit curvature estimation via Lanczos and Hessian-vector products, enabling TRL to operate in high-dimensional networks without explicit Hessian or Jacobian materialization, at a cost comparable to standard training (a minimal sketch of such a curvature routine follows after this list).

• We provide an empirical evaluation ranging from synthetic manifolds to ResNet-18 on CIFAR-100 (main) and CIFAR-10 (Appendix B.3). TRL achieves strong calibration in high-dimensional regimes, matching Deep Ensembles in ECE while requiring only a fraction (1/5) of the training cost.
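The implicit curvature estimation mentioned in the second contribution can be sketched as follows: a Lanczos iteration that touches the Hessian only through Hessian-vector products (double backprop), returning the leading Ritz pairs that approximate the sharpest curvature directions. This is a generic sketch, assuming the parameters are flattened into a single tensor and loss_fn rebuilds the loss from that flat vector; it is not the paper's code. Under the GGN/Fisher metric the same routine applies with a GGN-vector product in place of hvp.

```python
import torch

def hvp(loss_fn, params, vec):
    """Hessian-vector product via double backprop (no explicit Hessian).
    params: flat 1-D tensor with requires_grad=True; loss_fn(params) -> scalar loss."""
    loss = loss_fn(params)
    grad, = torch.autograd.grad(loss, params, create_graph=True)
    return torch.autograd.grad(grad @ vec, params)[0]

def lanczos_curvature(loss_fn, params, n_iter=30):
    """Lanczos iteration on the Hessian using only Hessian-vector products.
    Returns Ritz values (approximate extremal eigenvalues) and Ritz vectors."""
    d = params.numel()
    q = torch.randn(d, dtype=params.dtype, device=params.device)
    q = q / q.norm()
    Q, alphas, betas = [q], [], []
    beta, q_prev = 0.0, torch.zeros_like(q)
    for _ in range(n_iter):
        w = hvp(loss_fn, params, Q[-1]) - beta * q_prev
        alpha = torch.dot(w, Q[-1])
        w = w - alpha * Q[-1]
        for qi in Q:                      # full reorthogonalisation for stability
            w = w - torch.dot(w, qi) * qi
        alphas.append(alpha)
        beta = w.norm()
        if beta < 1e-8:                   # invariant subspace reached
            break
        betas.append(beta)
        q_prev = Q[-1]
        Q.append(w / beta)
    m = len(alphas)
    T = torch.diag(torch.stack(alphas))   # tridiagonal Lanczos matrix
    if m > 1:
        off = torch.stack(betas[:m - 1])
        T = T + torch.diag(off, 1) + torch.diag(off, -1)
    evals, evecs = torch.linalg.eigh(T)
    ritz_vectors = torch.stack(Q[:m], dim=1) @ evecs
    return evals, ritz_vectors
```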

Throughout, we focus on the geometric formulation and algorithmic design of TRL, providing empirical validation.

Reference

This content is AI-processed based on open access ArXiv data.
