Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds
The high-dimensional parameter space of deep neural networks – the neuromanifold – is endowed with a unique metric tensor defined by the Fisher information. Reliable and scalable computation of this metric tensor is valuable for theorists and practitioners. Focusing on neural classifiers, we return to a low-dimensional space of probability distributions, which we call the core space, and examine the spectrum and envelopes of its Fisher information matrix. We extend our discoveries there to deterministic bounds for the metric tensor on the neuromanifold. We introduce an unbiased random estimator based on Hutchinson’s trace method and derive related bounds. It can be evaluated efficiently with a single backward pass per batch, with a standard deviation bounded by the true value up to scaling.
💡 Research Summary
The paper tackles the problem of efficiently and accurately estimating the Fisher Information Matrix (FIM) that endows the high‑dimensional parameter space of deep neural networks—referred to as the “neuromanifold”—with a Riemannian‑like metric. The authors focus on classifier networks, where the output probabilities lie on a low‑dimensional statistical simplex (the “core space”). By pulling back the metric from this core space to the full parameter space, they derive deterministic upper and lower bounds for the neuromanifold metric and propose a novel unbiased stochastic estimator based on Hutchinson’s trace trick.
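Hutchinson's trace trick, on which the paper's stochastic estimator rests, replaces an exact trace with an expectation tr(A) = E[vᵀAv] over random probe vectors v with E[vvᵀ] = I. A minimal numpy sketch, using a hypothetical PSD matrix as a stand-in for a Fisher information matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric PSD matrix standing in for a FIM (illustration only).
A = rng.standard_normal((50, 50))
A = A @ A.T

def hutchinson_trace(A, n_samples, rng):
    """Unbiased trace estimate: tr(A) = E[v^T A v] when E[v v^T] = I."""
    d = A.shape[0]
    estimates = []
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=d)  # Rademacher probe vector
        estimates.append(v @ A @ v)
    return np.mean(estimates)

est = hutchinson_trace(A, n_samples=2000, rng=rng)
print(est, np.trace(A))  # estimate should be close to the exact trace
```

Rademacher probes are a common choice because they minimize the estimator's variance among zero-mean isotropic distributions; the paper's single-backward-pass estimator applies the same identity implicitly, without ever materializing the matrix.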
Core‑space analysis.
For a C‑class classifier, the core‑space FIM is IΔ(z)=diag(p)−ppᵀ, where p=Softmax(z) are the class probabilities. Theorem 1 characterizes its eigen‑spectrum: λ₁=0 (the all‑ones direction lies in the kernel), and the largest eigenvalue λC is bounded between max_i p_i(1−p_i) and 1−‖p‖², with tighter bounds expressed via the second‑largest and largest probabilities p(C−1) and p(C). Lemma 2 shows that IΔ(z) is sandwiched between a rank‑1 matrix λC vC vCᵀ (the tightest rank‑1 lower envelope) and the diagonal matrix diag(p) (the tightest diagonal upper envelope). Lemma 3 quantifies the Frobenius distance from IΔ(z) to each envelope, revealing that the diagonal bound always incurs at least 1/C error, whereas the rank‑1 bound can be arbitrarily accurate when the probability vector is near‑uniform.
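The spectral claims above are easy to check numerically. A short sketch, using hypothetical logits for a C=5 problem, that builds IΔ(z)=diag(p)−ppᵀ and verifies the zero eigenvalue and the stated bounds on the largest one:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits for a C = 5 class problem (illustrative values).
z = rng.standard_normal(5)
p = softmax(z)

# Core-space Fisher information matrix I_Δ(z) = diag(p) - p p^T.
I = np.diag(p) - np.outer(p, p)

eigvals = np.linalg.eigvalsh(I)
lam_min, lam_max = eigvals[0], eigvals[-1]

# Smallest eigenvalue is 0: the all-ones vector is in the kernel,
# since I @ 1 = p - p (sum p) = 0.
print(np.allclose(I @ np.ones(5), 0))

# Largest eigenvalue obeys max_i p_i(1 - p_i) <= λ_C <= 1 - ||p||^2
# (the upper bound is just the trace, as all eigenvalues are nonnegative).
print(max(p * (1 - p)) <= lam_max <= 1 - p @ p)
```

The diagonal-envelope statement can be probed the same way by comparing eigenvalues of diag(p) − IΔ(z), which are nonnegative since ppᵀ is PSD.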
Deterministic bounds on the neuromanifold.
Using the pull‑back relation F(θ)=E_x[J(θ;x)ᵀ IΔ(z(θ;x)) J(θ;x)], where J=∂z/∂θ is the Jacobian of the network's logit map, the core‑space envelopes transfer directly to the neuromanifold: sandwiching IΔ(z) between its rank‑1 lower envelope and its diagonal upper envelope yields deterministic lower and upper bounds on the metric tensor F(θ).
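The pull-back structure can be sketched on a toy model. Below, a hypothetical linear "network" z = Wx stands in for a deep classifier (only the pull-back algebra matters); the neuromanifold metric is assembled as a batch average of Jᵀ IΔ(z) J terms:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy linear "network": z = W x, with θ = vec(W) (row-major).
# Hypothetical sizes: C = 3 classes, D = 4 input features.
C, D = 3, 4
W = rng.standard_normal((C, D))
xs = rng.standard_normal((8, D))  # small batch standing in for E_x

def pullback_fim(W, xs):
    C, D = W.shape
    F = np.zeros((C * D, C * D))
    for x in xs:
        z = W @ x
        p = softmax(z)
        I_core = np.diag(p) - np.outer(p, p)  # core-space FIM at z
        # Jacobian ∂z/∂vec(W): z_i depends only on row i of W,
        # so J is block-diagonal with x in each block.
        J = np.kron(np.eye(C), x[None, :])    # shape (C, C*D)
        F += J.T @ I_core @ J                 # pull-back J^T I_Δ J
    return F / len(xs)

F = pullback_fim(W, xs)
print(F.shape)              # (12, 12)
print(np.allclose(F, F.T))  # the metric is symmetric (and PSD)
```

For a real network, J would come from automatic differentiation rather than a Kronecker product, but the envelope bounds act on the I_core factor and so carry over unchanged.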