Manifold Diffusion Geometry: Curvature, Tangent Spaces, and Dimension


We introduce novel estimators for computing the curvature, tangent spaces, and dimension of data from manifolds, using tools from diffusion geometry. Although classical Riemannian geometry is a rich source of inspiration for geometric data analysis and machine learning, it has historically been hard to implement these methods in a way that performs well statistically. Diffusion geometry lets us develop Riemannian geometry methods that are accurate and, crucially, also extremely robust to noise and low-density data. The methods we introduce here are comparable to the existing state-of-the-art on ideal dense, noise-free data, but significantly outperform them in the presence of noise or sparsity. In particular, our dimension estimate improves on the existing methods on a challenging benchmark test when even a small amount of noise is added. Our tangent space and scalar curvature estimates do not require parameter selection and substantially improve on existing techniques.


💡 Research Summary

The paper presents a novel framework for estimating intrinsic geometric quantities—dimension, tangent spaces, and various curvature tensors (scalar, Ricci, and Riemann)—of data assumed to lie on a manifold, by leveraging diffusion geometry. Traditional approaches that rely on hard neighborhood definitions suffer from abrupt boundary changes, sensitivity to noise, and the need for manually tuned scale parameters. In contrast, the authors adopt a soft‑kernel diffusion maps construction, which naturally incorporates all data points with smoothly decaying weights, thereby providing robustness to both noise and non‑uniform sampling.

The core of the method is an estimator Δ̂ of the Laplace–Beltrami operator, obtained from the data via a variable-bandwidth Gaussian kernel Kε(p_i,p_j)=exp(−‖p_i−p_j‖²/(ε ρ(p_i) ρ(p_j))). An automatic bandwidth selection scheme (from prior work) yields a parameter-free Δ̂ that converges to the true Δ with provable rates, independent of the underlying sampling density. Using the “carré du champ” identity g(∇f,∇h)=½(Δ(fh) − fΔh − hΔf), the Riemannian metric can then be estimated by applying Δ̂ to products of coordinate functions, and from the estimated metric the tangent spaces, dimension, and curvature estimates follow.
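As a rough illustration of this pipeline, the sketch below builds a variable-bandwidth kernel and a graph-Laplacian estimate of Δ on points sampled from a circle, then applies the discrete carré du champ to the ambient coordinate functions to recover tangent lines. The kNN-based bandwidth ρ, the constants `k` and `eps`, and the diffusion-maps-style density normalisation are illustrative assumptions; the paper's automatic, parameter-free bandwidth selection is not reproduced here.

```python
import numpy as np

# Toy data: 300 points on the unit circle in R^2 (intrinsic dimension 1).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 300)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Variable-bandwidth Gaussian kernel
#   K_eps(p_i, p_j) = exp(-||p_i - p_j||^2 / (eps * rho(p_i) * rho(p_j))).
# rho is a simple kNN-distance bandwidth and eps is fixed by hand (both are
# illustrative choices, not the paper's automatic selection scheme).
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
k, eps = 10, 0.5
rho = np.sqrt(np.sort(D2, axis=1)[:, k])       # distance to k-th neighbour
K = np.exp(-D2 / (eps * np.outer(rho, rho)))

# Density-normalised graph Laplacian (diffusion-maps style), a standard
# discrete estimator of the Laplace-Beltrami operator up to scale.
q = K.sum(1)
W = K / np.outer(q, q)                          # remove sampling-density bias
P = W / W.sum(1, keepdims=True)                 # Markov normalisation
L = (P - np.eye(len(X))) / eps                  # L f ~ c * (Delta f)

# Carre du champ: g(grad f, grad h) = (Delta(fh) - f Delta h - h Delta f) / 2,
# applied to the ambient coordinate functions to estimate the metric.
def carre_du_champ(L, f, h):
    return 0.5 * (L @ (f * h) - f * (L @ h) - h * (L @ f))

x, y = X[:, 0], X[:, 1]
G = np.stack([np.stack([carre_du_champ(L, x, x), carre_du_champ(L, x, y)], -1),
              np.stack([carre_du_champ(L, x, y), carre_du_champ(L, y, y)], -1)], 1)

# On a 1-D manifold each pointwise 2x2 matrix G[i] is (approximately) rank 1;
# its top eigenvector spans the estimated tangent line at p_i.
evals, evecs = np.linalg.eigh(G)
tangents = evecs[:, :, -1]
```

Expanding the discrete carré du champ shows it equals a Gaussian-weighted local covariance, ½ Σ_j P_ij (p_j − p_i)(p_j − p_i)ᵀ / ε, so it is positive semidefinite by construction and its top eigenvectors can be checked against the true circle tangents (−sin θ, cos θ).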

