Improving the Linearized Laplace Approximation via Quadratic Approximations

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Deep neural networks (DNNs) often produce overconfident out-of-distribution predictions, motivating Bayesian uncertainty quantification. The Linearized Laplace Approximation (LLA) achieves this by linearizing the DNN and applying Laplace inference to the resulting model; importantly, the linear model is also used for prediction. We argue that this linearization of the posterior may degrade fidelity to the true Laplace approximation. To alleviate this problem without significantly increasing the computational cost, we propose the Quadratic Laplace Approximation (QLA). QLA approximates each second-order factor in the approximate Laplace log-posterior with a rank-one factor obtained via efficient power iterations. QLA is expected to yield a posterior precision closer to that of the full Laplace approximation without forming the full Hessian, which is typically intractable. For prediction, QLA also uses the linearized model. Empirically, QLA yields modest yet consistent improvements in uncertainty estimation over LLA on five regression datasets.


💡 Research Summary

The paper addresses the well‑known problem that deep neural networks (DNNs) tend to produce over‑confident predictions on out‑of‑distribution (OOD) inputs, motivating Bayesian approaches for uncertainty quantification. The classical Laplace approximation (LA) provides a Gaussian posterior centered at the maximum a posteriori (MAP) estimate, but computing the full Hessian of the log‑likelihood is infeasible for modern DNNs. The Linearized Laplace Approximation (LLA) circumvents this by linearizing the network around the MAP parameters, applying a Generalized Gauss‑Newton (GGN) approximation to the Hessian, and using the resulting linear model both for posterior inference and for prediction. While LLA is computationally cheap and yields stable predictive distributions, the linearization of the posterior may degrade the fidelity of the posterior covariance, potentially harming uncertainty calibration.
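For orientation, the GGN posterior precision that LLA builds can be written in a few lines of NumPy. The sketch below is an illustrative toy, assuming a scalar-output regression model: the per-sample Jacobians, noise precision, and isotropic prior precision are invented stand-ins, not the paper's implementation.

```python
import numpy as np

def lla_precision(jacobians, noise_precision, prior_precision):
    """GGN posterior precision: sum_n lam * j_n j_n^T + prior_precision * I.

    jacobians: (N, P) array; row n is the per-sample Jacobian j_n = df(x_n)/dtheta
    evaluated at the MAP parameters (toy stand-in here).
    """
    P = jacobians.shape[1]
    ggn = noise_precision * jacobians.T @ jacobians  # sum of rank-one J terms
    return ggn + prior_precision * np.eye(P)

# Toy usage: 5 samples, 3 parameters.
J = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 1.0],
              [2.0, 0.0, 0.5]])
Sigma_inv = lla_precision(J, noise_precision=4.0, prior_precision=1.0)
Sigma = np.linalg.inv(Sigma_inv)  # posterior covariance used by LLA
```

The prior term keeps the precision positive definite even when the Jacobians are rank-deficient, which is what makes the inversion safe.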

To improve posterior fidelity without a large increase in computational cost, the authors propose the Quadratic Laplace Approximation (QLA). QLA retains the quadratic Taylor expansion of the network output around the MAP point, i.e., it adds a second‑order term involving the per‑sample Hessian of the network. The posterior precision matrix would then be Σ⁻¹_QTE = Σₙ ∇²_θ log p(yₙ | f_θ*^quad(xₙ,θ)) + S₀⁻¹, where the Hessian term contains both the Jacobian‑Jacobian product (as in GGN) and a term with the true Hessian H_θ*(x). Directly forming H_θ*(x) is prohibitive, but modern autodiff frameworks can compute Hessian‑vector products efficiently.
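The difference between the linear and quadratic Taylor models can be made concrete on a toy scalar function standing in for the network output. Everything below (`f`, `theta_star`, the hand-derived gradient and Hessian) is an illustrative invention, not the paper's architecture; it only shows why keeping the second-order term improves fidelity near the expansion point.

```python
import numpy as np

# Toy "network output" f(theta) = sin(t0) * t1^2, expanded around a MAP-like
# point theta_star. Gradient and Hessian are written out analytically.
def f(theta):
    return np.sin(theta[0]) * theta[1] ** 2

theta_star = np.array([0.3, 1.5])
grad = np.array([np.cos(theta_star[0]) * theta_star[1] ** 2,      # df/dt0
                 2.0 * np.sin(theta_star[0]) * theta_star[1]])    # df/dt1
hess = np.array([[-np.sin(theta_star[0]) * theta_star[1] ** 2,
                  2.0 * np.cos(theta_star[0]) * theta_star[1]],
                 [2.0 * np.cos(theta_star[0]) * theta_star[1],
                  2.0 * np.sin(theta_star[0])]])

def f_lin(theta):   # first-order model, as used by LLA
    d = theta - theta_star
    return f(theta_star) + grad @ d

def f_quad(theta):  # second-order model adds 0.5 * d^T H d, as in QLA's expansion
    d = theta - theta_star
    return f_lin(theta) + 0.5 * d @ hess @ d

theta = theta_star + np.array([0.1, -0.05])
# Near theta_star the quadratic model tracks f more closely than the linear one.
```

Both models agree with `f` exactly at `theta_star`; the gap only opens as the parameters move away from the MAP point, which is precisely where posterior curvature matters.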

QLA therefore approximates the dominant eigen‑direction of the per‑sample Hessian‑based matrix A = ∇²_θ log p(y | f_θ*^quad) using a few iterations of the power method. The initial vector is set to the Jacobian J_θ*(x); each iteration computes Az = r H_θ*(x)z − J_θ*(x)ΛJ_θ*ᵀ(x)z, where r is the residual and Λ the noise precision. Because only Hessian‑vector products are required, the full Hessian is never materialized. After roughly ten iterations the dominant eigenvector ẑ converges. The matrix A is then approximated by the rank‑one factor ẑẑᵀ, yielding a posterior precision Σ⁻¹_QTE ≈ Σₙ ẑₙẑₙᵀ + S₀⁻¹. This provides a refined estimate of the posterior covariance that is closer to the true Laplace posterior than the GGN used in LLA, yet remains cheap to compute.
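The matrix-free power iteration can be sketched as follows. Here `matvec` plays the role that Hessian-vector products play in QLA, and the toy symmetric operator with a known spectrum is an invented stand-in so convergence is easy to check; in this sketch the rank-one surrogate is scaled by the Rayleigh quotient.

```python
import numpy as np

def power_iteration(matvec, v0, n_iter=10):
    """Estimate the dominant eigenpair of a symmetric operator given only
    matrix-vector products -- the operator itself is never materialized."""
    z = v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        w = matvec(z)
        z = w / np.linalg.norm(w)
    lam = z @ matvec(z)  # Rayleigh quotient at the converged direction
    return lam, z

# Toy symmetric matrix with a known spectrum standing in for the per-sample
# curvature term; matvec hides A, mirroring Hessian-vector products.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag([6.0, 3.0, 2.0, 1.5, 1.0, 0.5]) @ Q.T
lam, z = power_iteration(lambda v: A @ v, v0=np.ones(6), n_iter=50)

# Rank-one surrogate of A from the dominant eigenpair.
A_rank1 = lam * np.outer(z, z)
```

Convergence speed is governed by the gap between the top two eigenvalues, which is why a handful of iterations suffices when the leading curvature direction clearly dominates.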

A subtle issue arises when using the full quadratic model for prediction: the predictive mean no longer equals the MAP network output, and the predictive variance reverts to a form similar to that of the standard Laplace approximation, which can again allocate excessive probability mass to unsupported regions. To avoid this, the authors retain the linearized model for prediction, exactly as in LLA. Consequently, the predictive mean stays at f(x,θ*), while the variance becomes J_θ*ᵀ(x) Σ_QTE J_θ*(x), directly reflecting the improved posterior precision obtained from the rank‑one updates.
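Under the linearized predictive model, the mean and variance take a simple closed form. The sketch below uses toy numbers for the posterior covariance, test-point Jacobian, and observation noise; for regression, the observation-noise variance is added on top of the functional term Jᵀ Σ J.

```python
import numpy as np

def linearized_predictive(f_map, jacobian, Sigma, noise_var):
    """Predictive mean/variance of the linearized model: the mean stays at the
    MAP output; the variance is j^T Sigma j plus observation noise."""
    mean = f_map
    var = jacobian @ Sigma @ jacobian + noise_var
    return mean, var

# Toy numbers: 3-parameter posterior covariance and a test-point Jacobian.
Sigma = np.diag([0.5, 0.2, 0.1])
j = np.array([1.0, -2.0, 0.5])
mean, var = linearized_predictive(f_map=0.7, jacobian=j, Sigma=Sigma, noise_var=0.05)
# var = 1*0.5 + 4*0.2 + 0.25*0.1 + 0.05 = 1.375
```

The only place QLA differs from LLA at prediction time is the covariance `Sigma` that is plugged in; the functional form above is shared by both methods.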

Empirical evaluation is performed on five standard UCI regression benchmarks (Boston Housing, Energy Efficiency, Yacht Hydrodynamics, Concrete Compressive Strength, Wine Quality). For each dataset the authors create “in‑between” splits that hold out the middle third of sorted inputs as test data, thereby simulating OOD conditions. A single DNN architecture per dataset is trained with hyper‑parameters selected by inner cross‑validation. Both LLA and QLA are then applied to the trained network; QLA uses the prior variance estimated by LLA and ten power‑iteration steps. Predictive performance is assessed with the Negative Log Likelihood (NLL) and the Continuous Ranked Probability Score (CRPS), both proper scoring rules for which lower values indicate better‑calibrated predictive distributions.
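For reference, both scoring rules have closed forms when the predictive distribution is Gaussian, as it is here. The sketch below implements the standard formulas (the Gaussian CRPS closed form is due to Gneiting and Raftery); it is not the paper's evaluation code.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log likelihood of observation y under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS for a Gaussian predictive distribution."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))

# Both scores are lower for a prediction centered on the observation.
good = gaussian_crps(y=1.0, mu=1.0, sigma=0.5)
bad = gaussian_crps(y=1.0, mu=2.0, sigma=0.5)
```

Unlike NLL, CRPS stays finite for badly miscalibrated predictions and is expressed in the units of the target, which is why the two scores are usually reported together.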

Results (Table 1) show that QLA consistently yields slightly lower NLL and CRPS than LLA across all datasets, with the most pronounced gains on Boston Housing and Energy Efficiency. The improvements are modest (often on the order of 0.001–0.02) but consistent across datasets, indicating that the rank‑one refinement of the posterior precision translates into more realistic predictive variances without harming the mean predictions.

The paper acknowledges limitations: QLA is currently demonstrated only for regression with scalar outputs, and extending it to classification or multivariate regression will require additional work. Moreover, using only the leading eigenvector gives a rank‑one approximation per factor; richer low‑rank or inducing‑point strategies could capture more curvature information for very large networks. The authors suggest these directions for future work.

In summary, the Quadratic Laplace Approximation offers a principled, computationally efficient enhancement to the Linearized Laplace Approximation. By incorporating dominant second‑order information through power‑iteration based rank‑one updates, QLA refines the posterior precision, leading to modest but consistent improvements in uncertainty calibration while preserving the cheap linear predictive model that mitigates over‑confidence. This contribution advances practical Bayesian deep learning by bridging the gap between tractable approximations and more faithful posterior representations.

