PCA of probability measures: Sparse and Dense sampling regimes

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

A common approach to performing PCA on probability measures is to embed them into a Hilbert space where standard functional PCA techniques apply. While convergence rates for estimating the embedding of a single measure from $m$ samples are well understood, the literature has not addressed the setting involving multiple measures. In this paper, we study PCA in a double asymptotic regime where $n$ probability measures are observed, each through $m$ samples. We derive convergence rates of the form $n^{-1/2} + m^{-\alpha}$ for the empirical covariance operator and the PCA excess risk, where $\alpha > 0$ depends on the chosen embedding. This characterizes the relationship between the number $n$ of measures and the number $m$ of samples per measure, revealing a sparse (small $m$) to dense (large $m$) transition in the convergence behavior. Moreover, we prove that the dense-regime rate is minimax optimal for the empirical covariance error. Our numerical experiments validate these theoretical rates and demonstrate that appropriate subsampling preserves PCA accuracy while reducing computational cost.

💡 Research Summary

This paper investigates principal component analysis (PCA) for collections of probability measures by first embedding each measure into a Hilbert space and then applying standard functional‑PCA techniques. While the statistical behavior of a single embedded measure estimated from m samples is well understood, the authors address the practically relevant scenario where n independent measures are observed, each through m independent draws. They develop a double‑asymptotic framework in which both n (the number of measures) and m (the number of intra‑measure samples) tend to infinity, and they derive explicit convergence rates for two key objects: the empirical covariance operator and the excess risk of the resulting PCA projection.
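The embed-then-PCA pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: the summary does not say which embedding the authors use, so the sketch swaps in a quantile-function embedding of one-dimensional measures, and the function names (`simulate_measures`, `quantile_embedding`, `empirical_pca`) and the Gaussian location-scale model are hypothetical choices, not taken from the paper.

```python
import numpy as np


def simulate_measures(n, m, rng):
    """Draw n random 1-D Gaussian measures and observe m samples from each.

    The location-scale Gaussian model is an assumption made for illustration.
    Returns an array of shape (n, m).
    """
    means = rng.normal(0.0, 1.0, size=n)      # one random location per measure
    scales = rng.uniform(0.5, 1.5, size=n)    # one random scale per measure
    return means[:, None] + scales[:, None] * rng.standard_normal((n, m))


def quantile_embedding(samples, grid):
    """Embed each empirical measure as its quantile function on a grid of levels.

    This embedding is an assumed stand-in; the paper's embedding may differ.
    Returns an array of shape (n, len(grid)).
    """
    return np.quantile(samples, grid, axis=1).T


def empirical_pca(embedded, k):
    """Empirical covariance of the embedded measures and its top-k eigenpairs."""
    centered = embedded - embedded.mean(axis=0)
    cov = centered.T @ centered / embedded.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]     # keep the k largest
    return cov, eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 200, 50                            # n measures, m samples per measure
    samples = simulate_measures(n, m, rng)
    grid = np.linspace(0.05, 0.95, 19)        # quantile levels
    embedded = quantile_embedding(samples, grid)
    cov, eigvals, eigvecs = empirical_pca(embedded, k=2)
    print("top eigenvalues:", eigvals)
```

Rerunning the script with different values of `n` and `m` gives a rough feel for how the estimated covariance changes with the number of measures and the number of samples per measure, which is the trade-off the paper quantifies.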

Main contributions

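Covariance operator error – Under a fourth‑moment bound on the embedding, they prove that the error of the empirical covariance operator converges at a rate of the form $n^{-1/2} + m^{-\alpha}$, where $\alpha > 0$ depends on the chosen embedding.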
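The rate $n^{-1/2} + m^{-\alpha}$ quoted in the abstract suggests where the sparse-to-dense transition sits. As a back-of-the-envelope reading (an assumption of this sketch, which treats the rate as sharp and ignores constants), the two terms balance when

$$
m^{-\alpha} = n^{-1/2} \quad\Longleftrightarrow\quad m = n^{1/(2\alpha)}.
$$

Under that reading, for $m \ll n^{1/(2\alpha)}$ the sampling term $m^{-\alpha}$ dominates (sparse regime), while for $m \gtrsim n^{1/(2\alpha)}$ the error is of order $n^{-1/2}$ (dense regime), so additional samples per measure no longer improve the rate. For instance, with a hypothetical $\alpha = 1/2$ the balance point is $m = n$, and with $\alpha = 1/4$ it is $m = n^{2}$. This is consistent with the abstract's remark that appropriate subsampling can preserve PCA accuracy while reducing computational cost.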
