Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guarantees for a few emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; the spectral clustering method of Ng, Jordan and Weiss; and hierarchical clustering with single linkage. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first two methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
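The spectral method of Ng, Jordan and Weiss, with its scale set by local scaling in the style of Zelnik-Manor and Perona, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the default neighbor index k = 7, the deterministic farthest-point k-means, and the rule "pick the number of clusters K that maximizes lambda_K - lambda_{K+1}" are choices made for this example. The paper itself provides a lower bound on the spectral gap rather than this exact procedure.

```python
import numpy as np


def local_scaling_affinity(X, k=7):
    """Affinity with per-point scales, in the style of Zelnik-Manor and Perona.

    sigma_i is the distance from x_i to its k-th nearest neighbor and
    A_ij = exp(-|x_i - x_j|^2 / (sigma_i * sigma_j)), with A_ii = 0.
    """
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    k = min(k, len(X) - 1)
    # After sorting each row, column 0 is the point itself (distance 0).
    sigma = np.sqrt(np.sort(sq, axis=1)[:, k])
    sigma = np.maximum(sigma, 1e-12)  # guard against duplicated points
    A = np.exp(-sq / np.outer(sigma, sigma))
    np.fill_diagonal(A, 0.0)
    return A


def farthest_point_kmeans(Y, n_clusters, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def njw_spectral_clustering(X, n_clusters=None, k=7, max_clusters=10):
    """NJW spectral clustering on a locally scaled affinity; returns (labels, K).

    If n_clusters is None, K is the index of the largest gap
    lambda_K - lambda_{K+1} among the top eigenvalues (eigengap heuristic).
    """
    A = local_scaling_affinity(X, k)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), np.finfo(float).tiny))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # now descending
    if n_clusters is None:
        m = min(max_clusters, len(eigvals) - 1)
        gaps = eigvals[:m] - eigvals[1:m + 1]
        n_clusters = int(np.argmax(gaps)) + 1
    # Rows of the top-K eigenvectors, normalized to unit length (NJW step).
    Y = eigvecs[:, :n_clusters]
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return farthest_point_kmeans(Y, n_clusters), n_clusters


if __name__ == "__main__":
    # Three tight 4-point squares placed far apart (hypothetical toy data).
    squares = [(ox + dx, dy) for ox in (0.0, 5.0, 10.0)
               for dx in (0.0, 0.1) for dy in (0.0, 0.1)]
    labels, K = njw_spectral_clustering(squares, k=3)
    print(K, labels)
```

On three tight four-point squares placed far apart, the cross-cluster affinities underflow to zero, the eigenvalue 1 of D^{-1/2} A D^{-1/2} has multiplicity three, and the eigengap rule returns K = 3 with one label per square.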

💡 Research Summary

The paper introduces a general generative model for clustered data in a Euclidean space, in which each cluster may have its own intrinsic dimension, size, density, and sampling noise. Formally, the ambient space is the unit hypercube.
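To make the model concrete, the sketch below builds a toy instance in the unit square (D = 2) with a one-dimensional cluster (a segment) and a two-dimensional cluster (a square patch) of different sizes and spacings, then recovers them with the simplest method from the abstract: join points at distance at most r and take connected components. The cluster shapes, the jittered-grid sampling (used instead of i.i.d. sampling so the output is reproducible), and the radius r = 0.05 are illustrative assumptions, not the paper's settings.

```python
import math
import random


def sample_mixed_dimensions(seed=0, jitter=0.002):
    """Toy instance in [0, 1]^2 with one 1-D and one 2-D cluster.

    Jittered-grid (stratified) sampling stands in for i.i.d. sampling so the
    example is reproducible; all sizes and spacings are hypothetical.
    """
    rng = random.Random(seed)

    def noisy(x, y):
        return (x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter))

    # 1-D cluster: 81 points along the segment y = 0.2, 0.1 <= x <= 0.9 (spacing 0.01).
    segment = [noisy(0.1 + 0.01 * i, 0.2) for i in range(81)]
    # 2-D cluster: 11 x 11 grid on [0.5, 0.8]^2 (spacing 0.03, coarser than the segment).
    patch = [noisy(0.5 + 0.03 * i, 0.5 + 0.03 * j)
             for i in range(11) for j in range(11)]
    truth = [0] * len(segment) + [1] * len(patch)
    return segment + patch, truth


def neighborhood_components(points, r):
    """Label the connected components of the graph joining points at distance <= r."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


if __name__ == "__main__":
    points, truth = sample_mixed_dimensions()
    labels = neighborhood_components(points, r=0.05)
    print(len(set(labels)), labels == truth)
```

With r = 0.05 the two recovered components match the ground-truth labels; with r = 0.005 every point is isolated, which shows how strongly this method depends on the scale r. Cutting a single-linkage dendrogram at height r yields the same partition, so SciPy's `scipy.cluster.hierarchy.linkage(X, method='single')` followed by `fcluster(Z, t=r, criterion='distance')` can serve as a cross-check.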
