GeoIB: Geometry-Aware Information Bottleneck via Statistical-Manifold Compression
Information Bottleneck (IB) is widely used, but in deep learning it is usually implemented through tractable surrogates, such as variational bounds or neural mutual information (MI) estimators, rather than by directly controlling the MI I(X;Z) itself. The looseness of these surrogates and their estimator-dependent bias mean that IB “compression” is only indirectly controlled and optimization can be fragile. We revisit the IB problem through the lens of information geometry and propose a Geometric Information Bottleneck (GeoIB) that dispenses with MI estimation. We show that I(X;Z) and I(Z;Y) admit exact projection forms as minimal Kullback-Leibler (KL) distances from the joint distributions to their respective independence manifolds. Guided by this view, GeoIB controls information compression with two complementary terms: (i) a distribution-level Fisher-Rao (FR) discrepancy, which matches KL to second order and is reparameterization-invariant; and (ii) a geometry-level Jacobian-Frobenius (JF) term that provides a local capacity-type upper bound on I(Z;X) by penalizing pullback volume expansion of the encoder. We further derive a natural-gradient optimizer consistent with the FR metric and prove that the standard additive natural-gradient step is first-order equivalent to the geodesic update. In extensive experiments, GeoIB achieves a better trade-off between prediction accuracy and compression in the information plane than mainstream IB baselines on popular datasets. GeoIB improves invariance and optimization stability by unifying distributional and geometric regularization under a single bottleneck multiplier. The source code of GeoIB is released at https://anonymous.4open.science/r/G-IB-0569.
💡 Research Summary
The paper introduces GeoIB, a novel formulation of the Information Bottleneck (IB) principle that avoids explicit mutual information (MI) estimation by leveraging information geometry. Traditional deep‑learning IB methods rely on variational bounds (e.g., VIB) or neural MI estimators (e.g., MINE, CLUB), which introduce bias, variance, and instability, especially under strong compression. GeoIB re‑expresses both I(X;Z) and I(Z;Y) as exact projection distances: each is the minimal Kullback‑Leibler (KL) divergence from the joint distribution to its corresponding independence manifold. This geometric view turns the IB Lagrangian into a difference of two KL‑projection terms, providing a clear “push‑pull” interpretation of compression versus predictive utility.
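The projection identity the summary appeals to is the standard information-geometric fact that mutual information equals the KL divergence from the joint to the product of its marginals, which is in turn the KL projection onto the independence manifold. A sketch of this view (with the usual IB Lagrangian; sign and multiplier conventions vary across papers):

```latex
\begin{align}
I(X;Z) &= \mathrm{KL}\!\left(p(x,z)\,\middle\|\,p(x)\,p(z)\right)
        = \min_{q_X,\,q_Z} \mathrm{KL}\!\left(p(x,z)\,\middle\|\,q_X(x)\,q_Z(z)\right),\\
\mathcal{L}_{\mathrm{IB}} &= I(X;Z) \;-\; \beta\, I(Z;Y),
\end{align}
```

so the IB objective becomes a difference of two KL-projection distances: the compression term pulls the joint toward its independence manifold, while the prediction term pushes it away from the manifold where Z and Y are independent.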
To control compression without estimating MI, GeoIB adds two complementary regularizers. The first is a distribution‑level Fisher‑Rao (FR) discrepancy, which approximates KL by the squared geodesic distance induced by the Fisher‑Rao metric. For smooth parametric families, KL(pθ′‖pθ) = ½ΔᵀF(θ)Δ + o(‖Δ‖²) = ½d_FR²(pθ′,pθ) + o(‖Δ‖²). In the common diagonal‑Gaussian encoder case, the FR distance matches the exact KL up to second order, and the authors provide a closed‑form expression. The second regularizer is a geometry‑level Jacobian‑Frobenius (JF) term. By re‑parameterizing the encoder as z = fφ(x) + ε with ε∼N(0,Σ(x)), the pullback metric on the input space is g_x = J_f(x)ᵀ Σ(x)⁻¹ J_f(x). Linearizing fφ around each x yields a local Gaussian channel whose capacity provides an upper bound on I(X;Z). Under a unit‑energy constraint on the input covariance, this bound simplifies to ½ E_x Tr[g_x] = ½ E_x ‖Σ(x)^{−1/2} J_f(x)‖_F², i.e., the Jacobian‑Frobenius penalty (using log det(I + A) ≤ Tr A).
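Both regularizers are straightforward to compute with automatic differentiation. The sketch below, in JAX, is an illustrative stand-in rather than the authors' released code: `f_phi`, `log_sigma2`, and the toy MLP parameterization are assumptions, but the JF penalty ½ Tr[JᵀΣ⁻¹J] and the second-order (Fisher-information) approximation of a 1-D Gaussian KL follow the formulas quoted above.

```python
import jax
import jax.numpy as jnp

# --- Hypothetical encoder: z = f_phi(x) + eps, eps ~ N(0, Sigma(x)) ---
# f_phi is a tiny MLP and Sigma(x) is diagonal; both are illustrative
# stand-ins, not the paper's architecture.

def f_phi(params, x):
    W1, b1, W2, b2 = params
    h = jnp.tanh(W1 @ x + b1)
    return W2 @ h + b2                      # mean of z

def log_sigma2(params_s, x):
    Ws, bs = params_s
    return Ws @ x + bs                      # log of diag(Sigma(x))

def jf_penalty(params, params_s, x):
    """Jacobian-Frobenius term (1/2) Tr[J^T Sigma^{-1} J]
    = (1/2) ||Sigma^{-1/2} J_f(x)||_F^2. It upper-bounds the local
    Gaussian-channel capacity (1/2) log det(I + g_x), since
    log det(I + A) <= Tr A."""
    J = jax.jacfwd(lambda u: f_phi(params, u))(x)      # shape (dz, dx)
    inv_sigma = jnp.exp(-log_sigma2(params_s, x))      # shape (dz,)
    return 0.5 * jnp.sum(inv_sigma[:, None] * J**2)

def jf_bound(params, params_s, xs):
    """Monte-Carlo estimate of the geometry-level bound E_x[jf_penalty]."""
    return jnp.mean(jax.vmap(lambda x: jf_penalty(params, params_s, x))(xs))

# --- FR second-order approximation of KL, 1-D Gaussian case ---

def kl_gauss_1d(mu_p, s_p, mu_q, s_q):
    """Exact KL(N(mu_p, s_p^2) || N(mu_q, s_q^2))."""
    return jnp.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q)**2) / (2 * s_q**2) - 0.5

def fr_quadratic_1d(dmu, ds, s):
    """(1/2) Delta^T F Delta with Fisher information F = diag(1/s^2, 2/s^2)
    at theta = (mu, s): the squared-FR surrogate for KL at small Delta."""
    return 0.5 * (dmu**2 + 2.0 * ds**2) / s**2
```

For small parameter perturbations, `fr_quadratic_1d` tracks `kl_gauss_1d` to second order, which is the sense in which the FR discrepancy replaces KL; the JF bound only needs one Jacobian per input, so it avoids any MI estimator entirely.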