Identifying Topological Differences in Two Populations of Random Geometric Objects
Authors: Satish Kumar, Subhra Sankar Dhar
Satish Kumar¹ and Subhra Sankar Dhar²
Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Kanpur 208016, India
Emails: satsh@iitk.ac.in¹, subhra@iitk.ac.in²
March 17, 2026

Abstract
We propose a statistical framework to identify topological differences in two populations of random geometric objects. The proposed framework first associates a topological signature with each random geometric object and then performs a two-sample test on the observed topological signatures. We associate persistence barcodes, a topological signature from topological data analysis, with each observed random geometric object. This, in turn, yields a two-sample problem on the space of persistence barcodes. As the space of persistence barcodes is not suitable for standard statistical analysis, we translate this two-sample problem into one on a suitable subset of a Euclidean space. To this end, we embed the topological signatures in an ordered convex cone in a Euclidean space using functions from tropical geometry. We show that the embedding is a sufficient statistic for the persistence barcodes. This leads to a two-sample test based on this sufficient statistic, and we establish its equivalence to the two-sample problem on the barcode space. Finally, we study the consistency of the proposed test.

Keywords: Topological Data Analysis, Persistent Homology, Random Geometric Objects, Tropical Embedding, Tropical Sufficient Statistics, Energy Statistics, Permutation Test.

1 Introduction
Topological data analysis (TDA) (see, e.g., [11], [9], [17], [48] and references therein) is an emerging field that uses techniques from algebraic topology to analyze complex and high-dimensional data.
The foundation of TDA rests on the so-called "manifold hypothesis" ([27]), which conjectures that high-dimensional data are sampled from a low-dimensional smooth manifold. The theme of TDA is that "data has shape", and the shape (or the manifold) underlying the data may reveal useful insights about the process that generates the data, particularly when the data are high-dimensional and have a complex structure.

One of the key tools in TDA is persistent homology (see, e.g., [24], [25], [49]), a multi-scale extension of homology, which is a classical topological invariant from algebraic topology (see, e.g., [39], [30]). Intuitively, homology characterizes a topological space through its connected components and its holes in higher dimensions by associating with it a sequence of abelian groups, called homology groups. Persistent homology adapts homology to data points sampled from geometric objects, where the data are represented as finite metric spaces called point clouds. Persistent homology summarizes the topological features of a data set by associating with it a multiset of intervals in the real line, called a barcode. Barcodes provide a geometric and topological summary of the data-generating mechanism (see, e.g., [12], [19], [29]).

In the standard TDA framework, a point cloud sampled from an unknown geometric object is given, and the goal is to infer the geometric and topological features of the underlying latent geometric object (see, e.g., [41], [4], [26], [5], [15], [14], [13]). However, TDA tools can also be applied in a different framework, where we consider a random sample of geometric objects instead of a point cloud sampled from a single unknown geometric object. The aim here is to draw statistical inference on the probability distribution of the sampled geometric objects from the viewpoint of persistent homology.
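To make the notion of a barcode concrete, the following is a minimal sketch, not the authors' pipeline, that computes the degree-0 barcode of a finite point cloud. It assumes a Vietoris–Rips filtration in which an edge enters at its Euclidean length, and the function name `h0_barcode` is hypothetical. In degree 0 every point is born at scale 0 and a bar dies whenever two connected components merge, so the finite death times are exactly the edge lengths selected by Kruskal's minimum-spanning-tree algorithm. Bars are returned here as (birth, death) pairs; loops and higher-dimensional holes require a full persistent-homology library and are not covered.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Degree-0 barcode of the Vietoris-Rips filtration of a finite point cloud.

    Assumption: an edge enters the filtration at its Euclidean length.
    Returns (birth, death) pairs; every bar is born at 0 and exactly one
    bar (the component that survives) has an infinite death time.
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Kruskal: scan edges by increasing length; each merge kills one component.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # the component that never dies
    return bars


if __name__ == "__main__":
    # Two nearby points and one distant point: a short bar, a long bar,
    # and the infinite bar of the single surviving component.
    print(h0_barcode([(0, 0), (1, 0), (0, 3)]))
    # [(0.0, 1.0), (0.0, 3.0), (0.0, inf)]
```

A point cloud with N points always yields N bars in degree 0, one of them infinite; how infinite bars are treated before any embedding is not specified in this section.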
Precisely, we view the observed geometric objects through the lens of persistent homology by associating a barcode with each geometric object in the sample. We then use the resulting random sample of barcodes to infer the probability distribution of the observed barcodes. Note that the probability distribution of barcodes is well defined (see, e.g., [36], [4]). In this article, the goal is to distinguish between two independent collections of random geometric objects up to persistent homology. This amounts to performing a two-sample test for topological signatures of geometric objects computed from persistent homology.

We consider a two-step hypothesis testing procedure to distinguish two independent collections of random geometric objects. In the first step, we quantify the observed geometric objects using persistence barcodes. In the second step, we formulate a two-sample test for independent random samples of barcodes. The first step towards such a testing procedure is to define the class of geometric objects of interest. Therefore, in the following subsection, we first define the geometric objects of interest and then construct an appropriate probability space to incorporate randomness in geometric objects.

1.1 Random Geometric Objects
The class of geometric objects under consideration consists of compact metric spaces that are subsets of Euclidean spaces admitting a triangulation. This class can be defined using the notion of o-minimal structures from [21] to define tame sets (see Definition 6.1), which are triangulable by the Triangulation Theorem in [22]. In particular, we define the class of geometric objects $\mathcal{X}$ as
$$\mathcal{X} := \{ X \subset \mathbb{R}^d : X \text{ is a tame and compact metric space} \}. \tag{1.1}$$
Next, we define random geometric objects as elements of an appropriate probability space $(\Omega, \mathcal{F}, P)$, which can be constructed as follows.
First, one can take $\Omega = \mathcal{X}$ and $\mathcal{F} = \mathcal{B}(\mathcal{X})$, where $\mathcal{B}(\mathcal{X})$ denotes the Borel σ-algebra generated by the topology induced by a suitable metric on $\mathcal{X}$. In this regard, we can use the Gromov–Hausdorff distance (see Definition 6.2) to define a metric on $\mathcal{X}$. However, the Gromov–Hausdorff distance is only a pseudo-metric on $\mathcal{X}$; it is a metric on the set of isometry classes of geometric objects in $\mathcal{X}$ (see [46]). Therefore, we define Ω as
$$\Omega := \{ [X] : X \in \mathcal{X} \}, \tag{1.2}$$
where $[X]$ denotes the isometry class of $X$. Thus, using the Gromov–Hausdorff metric on Ω, we define the sample space $(\Omega, \mathcal{F})$ on which a suitable probability measure $P$ can be defined. Here $\mathcal{F}$ is the Borel σ-algebra generated by the topology induced by the Gromov–Hausdorff metric on Ω. In what follows, we view random geometric objects as elements of a probability space $(\Omega, \mathcal{F}, P)$, where $P$ is unknown.

1.2 Literature Review
Statistical inference on random geometric objects proceeds through the probability distribution of some suitable geometric summary. In classical statistical shape analysis, geometric objects are represented by a set of user-specified points, known as landmarks ([32]). The specification of landmarks requires domain knowledge and is subject to bias ([6]). Moreover, landmarks are not well suited to comparing geometric objects, as each geometric object must be represented by the same number of landmarks ([28]). [23] proposes an alternative to landmark-based approaches for comparing geometric objects. However, the approach proposed by [23] relies on the assumption that the geometric objects under comparison are diffeomorphic, which may not hold in practice for many data sets. TDA provides summaries of geometric objects that can be used to distinguish between two collections of geometric objects without specifying landmarks or assuming that the geometric objects are diffeomorphic.
In this direction, the following two approaches are relevant. Recently, [35] proposed a two-way ANOVA testing procedure in the functional data analysis framework by representing geometric objects using the smooth Euler characteristic transform ([20]). In the context of a time series of random geometric objects, two-sample location tests have been proposed in [46] for a suitable class of geometric summaries, including dendrograms and persistence diagrams.

1.3 Our Contribution
We introduce a statistical framework for conducting statistical inference on random geometric objects using embeddings from tropical geometry. The proposed framework involves two key steps. First, we quantify geometric objects using persistence barcodes, a topological signature from TDA. Second, we embed the barcodes into a finite-dimensional subset of Euclidean space using embeddings from tropical geometry. The proposed framework differs from the conventional statistical shape analysis framework, which is usually based on manual landmarking of shapes or on the assumption that shapes are diffeomorphic. Thus, the proposed framework paves the way for statistical inference on random geometric objects without relying on landmark- or diffeomorphism-based approaches. Moreover, the proposed framework makes it possible to analyze metric-space-valued data within a standard statistical framework on a subset of Euclidean space.

Here, we present a two-sample test to detect topological and geometric differences between two populations of random geometric objects. A two-step testing procedure is adopted. In the first step, we associate barcodes with the observed random geometric objects, yielding a two-sample problem on the space of barcodes.
In the second step, we embed the barcodes in a subset of Euclidean space, since statistical analysis directly on the space of barcodes is prohibitive for the reasons explained in the next subsection. To facilitate the testing procedure, we propose a sufficient statistic for persistence barcodes. The proposed sufficient statistic is based on the tropical embeddings proposed by [31] and refined by [38] to embed barcodes in a finite-dimensional Euclidean space. It complements the main results of [38] on the sufficiency of tropical embeddings by allowing statistical inference for a wider class of probability distributions (see Section 3.1). Thus, the proposed sufficient statistic translates the two-sample problem on the barcode space into a two-sample problem on an ordered convex cone $C_d$ in $\mathbb{R}^d$. We establish that the two-sample problem on the space of barcodes is equivalent to the two-sample problem on $C_d$. Finally, we propose a test based on the manifold energy distance to perform two-sample testing on the manifold $C_d$, and we establish the consistency of the proposed test. The proposed test can be useful in geometric morphometrics to identify morphological variations between two groups of shapes, and in similar data analyses.

1.4 Mathematical Challenges
We address the following main challenges. We present two-sample tests on the space of barcodes, which enable us to distinguish two independent collections of random geometric objects up to persistent homology. In the testing framework, a barcode is associated with each observed random geometric object, yielding two independent collections of random barcodes. We then test the hypothesis that the two probability distributions generating these two collections of random barcodes are equal.
However, performing this test directly is prohibitive due to the unusual nature of barcodes. Barcodes are collections of intervals in the real line rather than numeric quantities, so conventional mathematical operations such as addition and multiplication are not available in this testing framework. We circumvent this issue by embedding the barcodes using functions from tropical geometry. In this regard, we use the statistical sufficiency (see Theorem 3.1) of the tropical embeddings proposed by [38] to embed the barcodes in a finite-dimensional Euclidean space. However, these tropical embeddings are not suitable for two-sample tests, as they require stronger assumptions on the distributions of the tropical embeddings. In particular, tropical embeddings are sufficient statistics for barcodes only if the class of distributions of tropical embeddings is restricted to the class of exchangeable distributions on Euclidean spaces (see Section 3.1). Therefore, we propose a sufficient statistic built from these tropical embeddings that allows us to perform two-sample tests under standard assumptions on the class of distributions of tropical embeddings. Further, this allows us to establish the equivalence of the two-sample problem on the barcode space to the two-sample problem on the ordered convex cone
$$C_d := \{ (x_1, \dots, x_d) \in \mathbb{R}^d : x_1 \le \dots \le x_d \}.$$
Thus, we can perform two-sample tests on $C_d$ under standard assumptions on the class of distributions of the proposed sufficient statistic. Furthermore, to perform a two-sample test for manifold-valued data, we propose a permutation test based on manifold energy statistics ([18]) and establish its consistency.

1.5 Organization
The rest of the article is organized as follows. Section 2 provides the necessary background on the concepts required for the development of the results in the article.
Section 3 formulates the hypothesis testing problem of interest and establishes its equivalence to a two-sample problem on a subset of a Euclidean space. Section 4 presents a testing procedure based on energy statistics and establishes its consistency. Some concluding remarks are provided in Section 5. Finally, technical details such as definitions and proofs of the main results are provided in the appendix in Section 6.

2 Preliminaries
This section formalizes the concepts used in the paper, such as the space of barcodes, tropical functions, and tropical coordinates on the space of barcodes. We refer to [33] and [34] for a concise treatment of the other required concepts, such as simplicial complexes, simplicial homology, and persistent homology.

2.1 Barcode Space
Persistent homology is an adaptation of homology to a filtration of topological spaces. A (right-continuous) filtration of topological spaces is a nested collection $\mathcal{F} := \{\mathcal{F}_\epsilon : \epsilon \ge 0\}$ such that $\mathcal{F}_\epsilon \subseteq \mathcal{F}_t$ for $\epsilon \le t$ and $\mathcal{F}_\epsilon = \cap$ […] 0), and $\mathbf{1}$ denotes the indicator function.

Note that restricting $\mathcal{B}_{\le n}$ to $\mathcal{B}^m_{\le n}$ does not pose any limitation in practice, as important properties of the functions defined in Theorem 2.1, such as Lipschitz continuity, that hold on $\mathcal{B}_{\le n}$ also remain intact on $\mathcal{B}^m_{\le n}$. The advantage of working with $\mathcal{B}^m_{\le n}$ instead of $\mathcal{B}_{\le n}$ as the barcode space is that the functions defined in Theorem 2.1 provide an embedding of barcodes into a finite-dimensional Euclidean space. Moreover, for a given finite set of barcodes, the choice of $m$ is straightforward. For example, we can take $m = \lceil \max_{1 \le i \le s} (b_i/\ell_i) \rceil$, where $s = \sum_{i=1}^{n} \mathbf{1}(\ell_i > 0)$ and $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x \in \mathbb{R}$. Therefore, [38] presented a modified version of Theorem 2.1 for the regularized barcode space $\mathcal{B}^m_{\le n}$ and for the tropical functions parameterized by $i, j \in \{0, \dots, n\}$ alone, so that the k-factor of (1, 0) rows is redundant. In particular, for a fixed $m \in \mathbb{N}$, consider the family of functions $\{T_{(i,j)} : i, j \in \{0, \dots, n\},\ 1 \le i + j \le n\}$ on $\mathcal{B}^m_{\le n}$, defined by
$$T_{(i,j)}\big([(b_1, \ell_1, \dots, b_n, \ell_n)]\big) := \Gamma_{[(0,1)^i, (1,1)^j]}\big([(b_1 \oplus m\ell_1, \ell_1, \dots, b_n \oplus m\ell_n, \ell_n)]\big). \tag{2.7}$$
The functions defined in Equation (2.7) separate nonequivalent barcodes in $\mathcal{B}^m_{\le n}$ and are Lipschitz with respect to the bottleneck distance (see Definition 6.3). In what follows, we denote this family of functions by $\{T_1, \dots, T_d\}$, where $d$ is determined by $2d = 2n + n(n+1)$ (see [38]), and we call $T_1, \dots, T_d$ the tropical coordinates on $\mathcal{B}^m_{\le n}$. Thus, given a barcode $B \in \mathcal{B}^m_{\le n}$ and tropical coordinates $\{T_1, \dots, T_d\}$ on $\mathcal{B}^m_{\le n}$, we have an embedding $\mathcal{T} : \mathcal{B}^m_{\le n} \to \mathbb{R}^d$, defined as
$$\mathcal{T}(B) := \big(T_{\pi(1)}(B), \dots, T_{\pi(d)}(B)\big)^\top, \tag{2.8}$$
where $\pi$ is a fixed permutation of $\{1, \dots, d\}$ and $d = n + 0.5\,n(n+1)$. In general, we have $d!$ embeddings in $\mathbb{R}^d$ for a barcode $B \in \mathcal{B}^m_{\le n}$, one for each permutation $\pi$. This fact is crucial for the development of the methodology proposed in this paper. The following example illustrates the computation of tropical coordinates on a regularized barcode space, and thereby of an embedding in a Euclidean space.

Example 2.2. Recall from Example 2.1 that, for $n = 2$, the set of orbits under the row permutation action of the symmetric group $S_2$ on $A_2$ (see Equation (2.2)), denoted by $A_2/S_2$, is given in Equation (2.5). According to Proposition 2.8 of [38], it suffices to work with the following subset of $A_2/S_2$ to define tropical coordinates on the barcode space:
$$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix},\quad \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix},\quad \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix},\quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},\quad \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \tag{2.9}$$
Now, suppose we are given two barcodes in $\mathcal{B}_{\le 2}$, denoted by $B_1$ and $B_2$, where $B_1 = [[2, 1), [3, 1)]$ and $B_2 = [[4, 4)]$, each bar being written through its birth time and persistence. In the first step, we compute $m$ for the observed bars. In this example $m = 3$, since $\lceil \max_{1 \le i \le 3} (b_i/\ell_i) \rceil = \lceil \max(2/1, 3/1, 4/4) \rceil = 3$. This implies that $B_1, B_2 \in \mathcal{B}^3_{\le 2}$. Note that the functions $T_{(i,j)}$ (see Equation (2.7)) are obtained from the orbits in Equation (2.9). In particular, the first orbit in Equation (2.9), with $i = 1$ (the number of (0 1) rows) and $j = 1$ (the number of (1 1) rows), yields $T_{(1,1)}$, which we denote by $T_1$. Similarly, we obtain $T_2, T_3, T_4, T_5$ from the remaining orbits in Equation (2.9). Let $B = [(b_1, \ell_1, b_2, \ell_2)] \in \mathcal{B}^3_{\le 2}$; then the tropical coordinates $T_1, \dots, T_5$ on $\mathcal{B}^3_{\le 2}$ are obtained as follows:
$$\begin{aligned}
T_1(B) &:= T_{(1,1)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(0,1),(1,1)]}\big[(b_1 \oplus 3\ell_1, \ell_1, b_2 \oplus 3\ell_2, \ell_2)\big] \\
&= \ell_1 \odot (b_2 \oplus 3\ell_2) \odot \ell_2 \boxplus (b_1 \oplus 3\ell_1) \odot \ell_1 \odot \ell_2 \\
&= \max\big(\ell_1 \odot (b_2 \oplus 3\ell_2) \odot \ell_2,\ (b_1 \oplus 3\ell_1) \odot \ell_1 \odot \ell_2\big) \\
&= \max\big(\ell_1 \odot \min(b_2, 3\ell_2) \odot \ell_2,\ \min(b_1, 3\ell_1) \odot \ell_1 \odot \ell_2\big) \\
&= \max\big(\ell_1 + \min(b_2, 3\ell_2) + \ell_2,\ \min(b_1, 3\ell_1) + \ell_1 + \ell_2\big), \\
T_2(B) &:= T_{(0,1)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(1,1)]}[(b_1, \ell_1, b_2, \ell_2)] = \max\big(\min(b_1, 3\ell_1) + \ell_1,\ \min(b_2, 3\ell_2) + \ell_2\big), \\
T_3(B) &:= T_{(2,0)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(0,1)^2]}[(b_1, \ell_1, b_2, \ell_2)] = \ell_1 + \ell_2, \\
T_4(B) &:= T_{(1,0)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(0,1)]}[(b_1, \ell_1, b_2, \ell_2)] = \max(\ell_1, \ell_2), \\
T_5(B) &:= T_{(0,2)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(1,1)^2]}[(b_1, \ell_1, b_2, \ell_2)] = \min(b_1, 3\ell_1) + \ell_1 + \min(b_2, 3\ell_2) + \ell_2.
\end{aligned}$$
Thus, we have tropical coordinates $\{T_1, \dots, T_5\}$ on $\mathcal{B}^3_{\le 2}$.
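The coordinates of Example 2.2 can be checked numerically. The sketch below is an illustration under stated assumptions: each bar is a (birth, persistence) pair, a barcode is padded with zero bars up to $n$, $b \oplus m\ell$ is read as $\min(b, m\ell)$ as in the expansions above, and $\Gamma_{[(0,1)^i,(1,1)^j]}$ is evaluated as the maximum, over all assignments of the $i$ rows (0, 1), the $j$ rows (1, 1), and the remaining zero rows to the bars, of the summed contributions $\ell$, $\min(b, m\ell) + \ell$, and $0$. This reading reproduces $T_1, \dots, T_5$ above for $n = 2$; treating it as the general definition for larger $n$ is an assumption. All function names are hypothetical, and the coordinate order follows Example 2.2 ($\pi(i) = i$).

```python
import math
from itertools import permutations

# Coordinate order of Example 2.2: T1 = T_(1,1), T2 = T_(0,1), T3 = T_(2,0),
# T4 = T_(1,0), T5 = T_(0,2); this corresponds to pi(i) = i in Equation (2.8).
EXAMPLE_ORDER = [(1, 1), (0, 1), (2, 0), (1, 0), (0, 2)]


def index_pairs(n):
    """All (i, j) with 1 <= i + j <= n; there are d = n + n(n+1)/2 of them."""
    return [(i, j) for i in range(n + 1) for j in range(n + 1) if 1 <= i + j <= n]


def choose_m(barcodes):
    """m = ceil(max b/l) over all bars with positive persistence l."""
    ratios = [b / l for bars in barcodes for (b, l) in bars if l > 0]
    return math.ceil(max(ratios))


def gamma(i, j, bars, m):
    """Max-plus evaluation of Gamma_[(0,1)^i, (1,1)^j] on regularized bars.

    Each bar (b, l) receives one row type: (0,1) contributes l,
    (1,1) contributes min(b, m*l) + l, and the zero row contributes 0.
    The result is the maximum total over all distinct assignments.
    """
    assert i + j <= len(bars), "need at least i + j bars (after padding)"
    row_types = (1,) * i + (2,) * j + (0,) * (len(bars) - i - j)
    best = -math.inf
    for assignment in set(permutations(row_types)):
        total = 0
        for row, (b, l) in zip(assignment, bars):
            if row == 1:
                total += l
            elif row == 2:
                total += min(b, m * l) + l
        best = max(best, total)
    return best


def tropical_embedding(bars, n, m, order):
    """Embed a barcode with at most n bars as the vector (T_(i,j)) in `order`."""
    padded = list(bars) + [(0, 0)] * (n - len(bars))  # pad with zero bars
    return tuple(gamma(i, j, padded, m) for (i, j) in order)


if __name__ == "__main__":
    B1 = [(2, 1), (3, 1)]  # bars written as (birth, persistence)
    B2 = [(4, 4)]
    m = choose_m([B1, B2])
    print(m)  # 3
    print(tropical_embedding(B1, 2, m, EXAMPLE_ORDER))  # (5, 4, 2, 1, 7)
    print(tropical_embedding(B2, 2, m, EXAMPLE_ORDER))  # (8, 8, 4, 4, 8)
```

Enumerating every row assignment costs up to $n!$ evaluations per coordinate, so this brute-force sketch is meant only for checking small examples such as $n = 2$, not for large barcodes.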
Consequently, the barcodes $B_1 = \{[2, 1), [3, 1)\}$ and $B_2 = \{[4, 4)\}$ can be embedded in $\mathbb{R}^5$ as $(5, 4, 2, 1, 7)^\top$ and $(8, 8, 4, 4, 8)^\top$, respectively, by taking $\pi(i) = i$, $i = 1, \dots, 5$, in Equation (2.8).

3 Problem Formulation
Let $X_1, \dots, X_{n_1} \overset{\text{i.i.d.}}{\sim} P_1$ and $Y_1, \dots, Y_{n_2} \overset{\text{i.i.d.}}{\sim} P_2$ be two independent random samples of geometric objects, where the probability measures $P_1$ and $P_2$ are defined on the measurable space $(\Omega, \mathcal{F})$. Here $n_1$ and $n_2$ denote the sample sizes, which may or may not be equal. Recall that Ω is defined in Equation (1.2), and $\mathcal{F} = \mathcal{B}(\Omega)$, where $\mathcal{B}(\Omega)$ is the Borel σ-algebra generated by the topology induced by the Gromov–Hausdorff metric (denoted by $d_{GH}$; see Definition 6.2) on Ω. That is, $\mathcal{B}(\Omega)$ is the smallest σ-algebra that contains all open sets in the metric space $(\Omega, d_{GH})$, with the topology generated by open balls.

We aim to detect topological differences between two independent collections of random geometric objects. Our approach quantifies each geometric object using a topological signature and then performs a two-sample hypothesis test on the probability distributions of the topological signatures. In this paper, we quantify the topological content of geometric objects using persistence barcodes. Let $\gamma : (\Omega, \mathcal{F}) \to (\mathcal{B}_{\le n}, \sigma(\mathcal{B}_{\le n}))$ be a measurable transformation, where $\sigma(\mathcal{B}_{\le n})$ is the smallest σ-algebra that contains all open sets in the metric space $(\mathcal{B}_{\le n}, \delta_B)$ with the topology generated by open balls. Here, $\delta_B$ denotes the bottleneck distance (see Definition 6.3). Then, for a geometric object $X \in \Omega$, $\gamma(X)$ is a topological signature representing the persistence barcode of $X$, and if $X \sim P$, then $\gamma(X) \sim P \circ \gamma^{-1}$, where $P \circ \gamma^{-1}$ denotes the push-forward of $P$ under $\gamma$. We are interested in the following two-sample hypothesis testing problem:
$$H_0' : \mu(A) = \nu(A) \text{ for all } A \in \sigma(\mathcal{B}_{\le n}) \quad \text{vs.} \quad H_1' : \mu(A) \neq \nu(A) \text{ for some } A \in \sigma(\mathcal{B}_{\le n}), \tag{3.1}$$
where $\mu \equiv P_1 \circ \gamma^{-1}$ and $\nu \equiv P_2 \circ \gamma^{-1}$.

In essence, the testing problem in Equation (3.1) is a two-sample problem on the barcode space for the two independent random samples of barcodes $B_1, \dots, B_{n_1} \overset{\text{i.i.d.}}{\sim} \mu$ and $\tilde{B}_1, \dots, \tilde{B}_{n_2} \overset{\text{i.i.d.}}{\sim} \nu$. However, devising a testing procedure that uses barcodes directly as data points is prohibitive due to the complex nature of barcodes. In particular, the usual mathematical operations, such as addition and multiplication, cannot be applied to a collection of intervals in $\mathbb{R}$. Therefore, to place the testing framework on a standard statistical footing in Euclidean space, we formulate an equivalent hypothesis on Euclidean space by regularizing the observed barcodes for a suitable $m \in \mathbb{N}$ and then applying the tropical embeddings (see Equation (2.8)) on the regularized barcode space $\mathcal{B}^m_{\le n}$ (see Equation (2.6)).

3.1 Equivalent Hypothesis Formulation
We translate the two-sample problem on the barcode space defined in Equation (3.1) into a two-sample problem on Euclidean space using the tropical embeddings defined in Equation (2.8). In this context, the following result from [38] on the statistical sufficiency of tropical embeddings will be useful.

Theorem 3.1. (Theorem 3.5 of [38]) Consider a statistical model on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$ with a family of probability measures $\mathcal{P}$ dominated by a σ-finite measure λ. Then, for a barcode $B \sim \vartheta \in \mathcal{P}$, the embedding $B \mapsto \mathcal{T}(B) = (T_{\pi(1)}(B), \dots, T_{\pi(d)}(B))^\top \in \mathbb{R}^d$ (see Equation (2.8)), for a fixed permutation $\pi$ of $\{1, \dots, d\}$, is a sufficient statistic for $\mathcal{P}$.
In other words, for each $\vartheta \in \mathcal{P}$, the Radon–Nikodym derivative $f_\vartheta \equiv d\vartheta/d\lambda$ admits the factorization $f_\vartheta(B) = h(B)\, g_\vartheta(\mathcal{T}(B))$, where $h$ is a non-negative measurable function on $\mathcal{B}^m_{\le n}$, $g_\vartheta$ is a non-negative measurable function on $\mathbb{R}^d$, $d = n + 0.5\,n(n+1)$, $\sigma(\mathcal{B}^m_{\le n})$ denotes the smallest σ-algebra that contains all open sets in the metric space $(\mathcal{B}^m_{\le n}, \delta_B)$ with the topology generated by open balls, and $\delta_B$ denotes the bottleneck distance (see Definition 6.3).

In view of Theorem 3.1, we can regularize the barcode space $\mathcal{B}_{\le n}$ for a suitable choice of $m \in \mathbb{N}$ and consider the two random samples of barcodes $B_1, \dots, B_{n_1} \overset{\text{i.i.d.}}{\sim} \mu$ and $\tilde{B}_1, \dots, \tilde{B}_{n_2} \overset{\text{i.i.d.}}{\sim} \nu$ such that $\mu$ and $\nu$ are defined on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$. Then, Theorem 3.1 can be used to formulate a hypothesis equivalent to Equation (3.1) based on the random samples $\mathcal{T}(B_1), \dots, \mathcal{T}(B_{n_1}) \overset{\text{i.i.d.}}{\sim} F$ and $\mathcal{T}(\tilde{B}_1), \dots, \mathcal{T}(\tilde{B}_{n_2}) \overset{\text{i.i.d.}}{\sim} G$, where $F$ and $G$ are probability distributions on $\mathbb{R}^d$, with $d \ge 2$ for $n \ge 1$ by the equation $2d = 2n + n(n+1)$. Recall that $n$ denotes the maximum number of features (bars) in a barcode $B \in \mathcal{B}_{\le n}$. Note that $F$ and $G$ could be considered continuous probability distributions, since tropical embeddings are continuous owing to the Lipschitz continuity of tropical coordinates with respect to the bottleneck distance (see [31]).

However, Theorem 3.1 is valid only under the assumption that the probability distributions $F$ and $G$ belong to the class of exchangeable distributions. This is because, for any two tropical embeddings $\mathcal{T}_\pi$ and $\mathcal{T}_\sigma$ corresponding to two different permutations $\pi$ and $\sigma$ of $\{1, \dots, d\}$, respectively, Theorem 3.1 yields the following, taking $h(B) \equiv 1$ without loss of generality:
$$f_\vartheta(B) = g_\vartheta(\mathcal{T}_\pi(B)) = \tilde{g}_\vartheta(\mathcal{T}_\sigma(B)), \tag{3.2}$$
where $g_\vartheta$ and $\tilde{g}_\vartheta$ are the probability densities of $\mathcal{T}_\pi(B) := (T_{\pi(1)}(B), \dots, T_{\pi(d)}(B))^\top$ and $\mathcal{T}_\sigma(B) := (T_{\sigma(1)}(B), \dots, T_{\sigma(d)}(B))^\top$, respectively. Here, $\tilde{g}_\vartheta \equiv g_\vartheta \circ \phi$, where $\phi$ is a bijection such that $\phi(\mathcal{T}_\sigma(B)) := \mathcal{T}_\pi(B)$, and the $T_i$'s are the tropical coordinates defined in Equation (2.7). Thus, if the densities $g_\vartheta$ and $\tilde{g}_\vartheta$ with respect to the induced probability measures $\vartheta \circ \mathcal{T}_\pi^{-1}$ and $\vartheta \circ \mathcal{T}_\sigma^{-1}$, respectively, are not the same, then a point $B \in \mathcal{B}^m_{\le n}$ has two images under $f_\vartheta$. This implies that $f_\vartheta$ is not a well-defined map from $\mathcal{B}^m_{\le n}$ to $(0, \infty)$. Consequently, Theorem 3.1 fails, as $f_\vartheta$ is then not a probability density of the measure $\vartheta \in \mathcal{P}$ defined on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$. Hence, the distributions induced by $\vartheta \circ \mathcal{T}_\pi^{-1}$ and $\vartheta \circ \mathcal{T}_\sigma^{-1}$ need to be exchangeable for Theorem 3.1 to be valid.

Note that the condition in Equation (3.2) trivially holds if we assume that the tropical coordinates are independent and identically distributed (i.i.d.). However, in the present context, the i.i.d. assumption for the tropical coordinates is too restrictive, as an individual tropical coordinate does not represent a barcode in the barcode space. In addition, the class of exchangeable distributions excludes some important classes of distributions, such as $\{\mathcal{N}(\boldsymbol{\theta}_{d \times 1}, \Sigma) : \boldsymbol{\theta}_{d \times 1} \neq \theta \mathbf{1}_{d \times 1},\ \Sigma = \sigma^2 I_{d \times d},\ \theta \in \mathbb{R},\ \sigma^2 > 0,\ d \ge 2\}$. Moreover, the exchangeability assumption would itself have to be validated by testing whether the observed vector representations are exchangeable. Therefore, to allow a testing framework for a wider class of probability distributions, we propose an embedding based on the tropical coordinates defined in Equation (2.7) and establish that this embedding is a sufficient statistic. In the following theorem, we define the proposed embedding and state its statistical sufficiency.

Theorem 3.2.
Consider a statistical model on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$ with a family of probability measures $\mathcal{P}$ dominated by a σ-finite measure λ. Then, for a barcode $B \sim \vartheta \in \mathcal{P}$, the map $B \mapsto \mathcal{V}(B) := (\mathcal{V}_1(B), \dots, \mathcal{V}_d(B))^\top \in C_d$, with $\mathcal{V}_k(B) := k\text{-}\min\{T_1(B), \dots, T_d(B)\}$, $k = 1, \dots, d$, is a sufficient statistic for $\mathcal{P}$, where $C_d := \{(x_1, \dots, x_d)^\top \in \mathbb{R}^d : x_1 \le x_2 \le \dots \le x_d\}$ and $k$-min denotes the $k$-th smallest value among the tropical coordinates $\{T_1(B), \dots, T_d(B)\}$ (see Equation (2.7)).

We refer to Section 6.2 for the proof of Theorem 3.2. In fact, the embedding $\mathcal{V}$ is a minimal sufficient statistic among all sufficient statistics generated by the tropical embeddings defined in Equation (2.8). This is because, for every sufficient statistic generated by tropical embeddings, there exists a measurable function $\Psi$ such that $\mathcal{V}(B) = \Psi(\mathcal{T}(B))$, where $\Psi$ is the map that sorts the entries of the vector $\mathcal{T}(B)$ in increasing order. Hence, by the definition of a minimal sufficient statistic (see, e.g., Definition 2.5 of [43]), $\mathcal{V}$ is a minimal sufficient statistic. Thus, as an application of Theorem 3.2, we propose to use this minimal sufficient statistic to formulate a hypothesis equivalent to the hypothesis in Equation (3.1). This allows us to test the hypothesis in Equation (3.1) by performing two-sample tests on the manifold $C_d$. Note that we do not require the probability distributions on $C_d$ to be exchangeable to perform two-sample tests on the barcode space. We now present the main result, which states that two-sample tests on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$ can be performed using probability measures on the manifold $C_d$, via the minimal sufficient statistic from Theorem 3.2.

Theorem 3.3. Suppose we observe two independent samples of barcodes $B_1, \dots, B_{n_1} \overset{\text{i.i.d.}}{\sim} \mu$ and $\tilde{B}_1, \dots, \tilde{B}_{n_2} \overset{\text{i.i.d.}}{\sim} \nu$, where $\mu$ and $\nu$ are probability measures defined on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$. Let the tropical representations of the barcodes be $\mathcal{V}(B_1), \dots, \mathcal{V}(B_{n_1}) \overset{\text{i.i.d.}}{\sim} F$ and $\mathcal{V}(\tilde{B}_1), \dots, \mathcal{V}(\tilde{B}_{n_2}) \overset{\text{i.i.d.}}{\sim} G$, where $F$ and $G$ are supported on the manifold $C_d$ (see Theorem 3.2). Then testing $H_0'$ (see Equation (3.1)) is equivalent to testing the following hypothesis:
$$H_0 : F = G \quad \text{vs.} \quad H_1 : F \neq G. \tag{3.3}$$

4 Procedure: Test of Hypothesis
This section presents a test statistic to perform a two-sample hypothesis test for $H_0$ defined in Equation (3.3). The proposed test statistic is based on the energy distance ([18]) between the two distributions $F$ and $G$ supported on a $D$-dimensional compact smooth submanifold $M$ of $\mathbb{R}^d$, $d \ge D$. Let $\rho$ be a metric on $M$, and consider independent random variables $X, X', Y, Y'$ such that $X \overset{D}{=} X'$ and $Y \overset{D}{=} Y'$, where $X \sim F$, $Y \sim G$, and $U \overset{D}{=} U'$ indicates that the random variables $U$ and $U'$ are identically distributed. Then, the energy distance between $F$ and $G$, denoted by $\mathcal{E}(F, G)$, is defined as
$$\mathcal{E}(F, G) := 2\,\mathbb{E}(\rho(X, Y)) - \mathbb{E}(\rho(X, X')) - \mathbb{E}(\rho(Y, Y')), \tag{4.1}$$
where $\mathbb{E}(U)$ denotes the expectation of a random variable $U$.

Two-sample tests on Euclidean spaces based on the energy distance have been considered in the literature (see, e.g., [44]) and are shown to be consistent provided $\mathcal{E}(F, G)$ is a metric on the class of distributions under consideration. Recently, [18] extended this testing framework to manifold-valued data and provided sufficient conditions for $\mathcal{E}(F, G)$ to be a metric. In the present context, the manifold under consideration is $C_d$ (see Theorem 3.2) with the standard Euclidean metric. However, we require the following assumptions for $C_d$ to be a compact smooth submanifold of $\mathbb{R}^d$.

Assumption 1. $C_d$ is a compact submanifold of $\mathbb{R}^d$, $d \ge 2$.
Assumption 2. The class of pr ob ability distributions on the manifold C d is define d as: C := { F : F is absolutely c ontinuous } . (4.2) Then, under Assumption 1 and Assumption 2 , the following prop osition asserts that E ( F , G ) is a metric. Prop osition 4.1. E ( F , G ) = 0 if and only if F = G , for F , G ∈ C . W e refer to Section 6.2 for the pro of of Proposition 4.1 , whic h relies on the condition of strong negative type ([ 42 ]) for the metric space ( C d , ∥ . ∥ ), where ∥ . ∥ denotes the standard Euclidean metric in R d , d ≥ 2. Now, we define the test statistic based on the energy statistic, whic h is the sample counterpart of E ( F , G ). Let Z := S 1 ∪ S 2 denote the p o oled sample obtained from the tw o samples defined in Theorem 3.2 , that is, S 1 := {V ( B 1 ) , . . . , V ( B n 1 ) } and S 2 := {V ( ˜ B 1 ) , . . . , V ( ˜ B n 2 ) } . Then, the 22 energy statistic denoted by E n 1 ,n 2 ( Z ), is defined as E n 1 ,n 2 ( Z ) := X ( X,Y ) ∈Z ×Z 2( n 1 n 2 ) − 1 ∥ X − Y ∥ − X ( X,X ′ ) ∈S 1 ×S 1 n − 2 1 ∥ X − X ′ ∥ − X ( Y ,Y ′ ) ∈S 2 ×S 2 n − 2 2 ∥ Y − Y ′ ∥ , (4.3) where A × B denotes the Cartesian pro duct of the tw o sets A and B , and ∥ . ∥ denotes the standard Euclidean metric in R d , d ≥ 2. Then, w e prop ose the follo wing test for H 0 (see Equation ( 3.3 )) based on E n 1 ,n 2 ( Z ). Let α ∈ (0 , 1) b e a fixed level of significance. W e prop ose to reject H 0 at the level of significance α , if the observ ed v alue E n 1 ,n 2 ( Z obs ) ≥ C n 1 ,n 2 ( α ), where Z obs denotes the p o oled sample con taining observ ed v alues from the samples S 1 and S 2 , and C n 1 ,n 2 ( α ) denotes the (1 − α )th quan tile of the distribution of E n 1 ,n 2 ( Z ) under H 0 . T o accomplish the prop osed testing procedure, w e compute C n 1 ,n 2 ( α ) using the permutation distribution of the test statistic E n 1 ,n 2 ( Z ) under H 0 . Note that, under H 0 , the random v ariables in Z , sa y , Z := { Z 1 , . . . 
$, Z_N\}$, where $N = n_1 + n_2$, are exchangeable. Consequently, under $H_0$ and conditionally on the observed pooled sample, each of the $N!$ permutations of $\{Z_1, \ldots, Z_N\}$ is equally likely, and so is the corresponding value of the test statistic. Thus, under $H_0$,
$$\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \sim \mathrm{Unif}\{\mathcal{E}_{n_1,n_2}(\mathcal{Z}^{\pi}_{\text{obs}}) : \pi \in S_N\}.$$
Here $\mathcal{Z}^{\pi}_{\text{obs}}$ denotes the observed pooled sample $\mathcal{Z}_{\text{obs}}$ with its elements ordered according to the permutation $\pi$: the first $n_1$ elements form the first sample and the remaining $n_2$ elements form the second. $S_N$ denotes the symmetric group on $\{1, \ldots, N\}$. This yields, under $H_0$, for $k = \lceil (1 - \alpha)\, N! \rceil$,
$$C_{n_1,n_2}(\alpha) = k\text{-}\min\{\mathcal{E}_{n_1,n_2}(\mathcal{Z}^{\pi}_{\text{obs}}) : \pi \in S_N\}, \tag{4.4}$$
where $k\text{-}\min(A)$ denotes the $k$-th smallest value of $A$, counted with multiplicity.

4.1 Asymptotic Properties of the Test

This subsection establishes the consistency of the proposed test; that is, we show that the power of the proposed test tends to 1 as $\min(n_1, n_2) \to \infty$.

Before the asymptotic analysis, however, we note that the topological signatures of the observed random geometric objects are regularized through a choice of $m \in \mathbb{N}$. That is, the observed random barcodes lie in a regularized subset $\mathcal{B}^m_{\le n}$ of the barcode space $\mathcal{B}_{\le n}$. Recall that the proposed sufficient statistic $\mathcal{V}$ is a measurable transformation from the regularized barcode space $\mathcal{B}^m_{\le n}$ to $\mathcal{C}_d$ (see Theorem 3.2). A data-driven choice of $m$ would vary with the sample size, making the domain of $\mathcal{V}$ sample-dependent and complicating a rigorous asymptotic analysis. To remedy this, we propose a universal value of $m$ for regularizing the barcode space. First, recall from the definition of $\mathcal{B}^m_{\le n}$ (see Equation (2.6)) that we retain only those barcodes whose topological features satisfy the following condition for $m \in \mathbb{N}$: $b_i \le m\,\ell_i = m(d_i - b_i) \iff b_i \le \frac{m}{m+1}\, d_i$ for all $i = 1, \ldots$
$, n$, (4.5)
where $d_i$ is the death time of the $i$-th feature in the barcode, related to its persistence $\ell_i$ by $\ell_i = d_i - b_i$. The condition in Equation (4.5) can be interpreted as retaining only the features that lie in an $m$-dependent subregion of the persistence diagram. In a typical persistence diagram, the birth of a topological feature always precedes its death, so all features lie in the region $\{(b, d) \in \mathbb{R}^2 : 0 < b \le d\}$. The condition in Equation (4.5) shrinks this region by scaling the death times by $m/(m+1)$. Thus, the larger the value of $m$, the wider the regularized region and the more features close to the diagonal it retains. For a small value, say $m = 1$, the region is $\{(b, d) : b \le 0.5\, d\}$ and most features close to the diagonal are discarded. For a large value, say $m = 100$, the region $\{(b, d) : b \le (100/101)\, d\} \approx \{(b, d) : b \le 0.99\, d\}$ is close to $\{(b, d) : b \le d\}$. A suitably large value such as $m = 100$ therefore retains most of the features in the persistence diagram. We propose $m = 100$ as an appropriate universal choice for drawing random samples from $\mathcal{B}^m_{\le n}$.

We now state the consistency of the proposed permutation test.

Theorem 4.1. Let the sample sizes $n_1$ and $n_2$ be such that $n_1/(n_1 + n_2) \to \lambda \in (0, 1)$ as $\min(n_1, n_2) \to \infty$. Then, under Assumption 1 and Assumption 2, the test based on $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$ for $H_0$ (see Equation (3.3)) is consistent; that is, under $H_1$,
$$P_{H_1}\big(\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \ge C_{n_1,n_2}(\alpha)\big) \to 1 \quad \text{as } \min(n_1, n_2) \to \infty.$$

5 Conclusion

We propose a two-sample testing framework to detect topological differences in random geometric objects.
In the course of this study, we propose a sufficient statistic, derived from tropical embeddings of barcodes, that places the testing framework on a standard statistical footing. As an application of Theorem 3.2, we establish that performing a two-sample test on the barcode space is equivalent to solving a two-sample problem on the ordered convex cone $\mathcal{C}_d$ in $\mathbb{R}^d$ (see Theorem 3.3). We then propose a two-sample test on the manifold $\mathcal{C}_d$ based on manifold energy statistics and establish its consistency. The proposed framework generalizes the hypothesis testing frameworks of [41] and [4]; in particular, it can be adapted to ensembles of point cloud data. It also provides an alternative to the testing framework proposed by [35]. A natural direction for future work is to extend the proposed framework to time series of random geometric objects.

6 Appendix

6.1 Definitions

Definition 6.1 (Tame Set). We use the notion of o-minimal structures from [21] to define tame sets. Let $\mathcal{P}(\mathbb{R}^d)$ denote the power set of $\mathbb{R}^d$, and let $A \times B$ denote the Cartesian product of two sets $A$ and $B$. An o-minimal structure is a collection $\mathcal{O} := \{\mathcal{O}_d : d \ge 1\}$, where each $\mathcal{O}_d \subseteq \mathcal{P}(\mathbb{R}^d)$ satisfies the following conditions:
1. $\mathcal{O}_d$ is closed under finite intersections and complements.
2. For any set $A \in \mathcal{O}_d$, we have $A \times \mathbb{R} \in \mathcal{O}_{d+1}$ and $\mathbb{R} \times A \in \mathcal{O}_{d+1}$.
3. Let $\pi : \mathbb{R}^{d+1} \to \mathbb{R}^d$ be an axis-aligned projection map. Then, for any set $A \in \mathcal{O}_{d+1}$, we have $\pi(A) \in \mathcal{O}_d$.
4. $\mathcal{O}$ is closed with respect to all the operations of $\mathbb{R}$ that make it an ordered field, that is, operations such as comparison ($<$), addition, and multiplication.
5. The only sets in $\mathcal{O}_1$ are the finite unions of points and open intervals of $\mathbb{R}$.
The elements of $\mathcal{O}$ are called tame sets.

Definition 6.2.
(Gromov–Hausdorff distance) We define the Gromov–Hausdorff distance between two metric spaces $(X, d_X)$ and $(Y, d_Y)$ in terms of correspondences, as in [46]. A correspondence is a subset $C \subset X \times Y$ that satisfies the following:
1. For all $x \in X$, there exists $y \in Y$ such that $(x, y) \in C$.
2. For all $y \in Y$, there exists $x \in X$ such that $(x, y) \in C$.
The distortion of the correspondence $C$ is defined as
$$\operatorname{dist}(C) = \sup_{(x_1, y_1), (x_2, y_2) \in C} |d_X(x_1, x_2) - d_Y(y_1, y_2)|.$$
Then the Gromov–Hausdorff distance $d_{GH}(X, Y)$ between the metric spaces $X$ and $Y$ is defined as
$$d_{GH}(X, Y) = \frac{1}{2} \inf\{\operatorname{dist}(C) : C \in \mathfrak{C}\},$$
where $\mathfrak{C}$ denotes the class of all correspondences between $X$ and $Y$.

Definition 6.3 (Bottleneck distance [8]). Let $B_1$ and $B_2$ be two barcodes in $\mathcal{B}_{\le n}$ (see Equation (2.1)). Then $B_1$ and $B_2$ can be written as finite collections of intervals, that is, $B_1 := \{I_i : i \in [N]\}$ and $B_2 := \{J_i : i \in [M]\}$ for some positive integers $N$ and $M$ such that $\max(N, M) \le n$. Recall that $[n]$ represents the set $\{1, \ldots, n\}$ for any $n \in \mathbb{N}$. To define the bottleneck distance, we first specify two distances. The first is the distance between two features in a barcode. The second is the distance between a feature and the diagonal $\Delta = \{[b, b) : b \ge 0\}$, which contains the bars of length 0. We define the distance between two features $I := [b_1, b_1 + \ell_1)$ and $J := [b_2, b_2 + \ell_2)$ as
$$\delta_\infty(I, J) := \max\big(|b_1 - b_2|,\ |(b_1 + \ell_1) - (b_2 + \ell_2)|\big),$$
where $b_i$ represents the birth time and $\ell_i$ the persistence of the $i$-th feature, $i = 1, 2$. The distance between a feature $[b, b + \ell)$ and the diagonal $\Delta$ is defined as
$$\delta_\infty([b, b + \ell), \Delta) := \ell/2.$$
Now, consider a bijection $\phi : A \to B$, where $A \subseteq [N]$ and $B \subseteq [M]$, and define the penalty $\rho(\phi)$ of $\phi$ as
$$\rho(\phi) := \max\Big\{\max_{i \in A} \delta_\infty\big(I_i, J_{\phi(i)}\big),\ \max_{i \in [N] \setminus A} \delta_\infty(I_i, \Delta),\ \max_{i \in [M] \setminus B} \delta_\infty(J_i, \Delta)\Big\}.$$
Then the bottleneck distance between $B_1$ and $B_2$ is denoted by $\delta_B(B_1, B_2)$ and defined as
$$\delta_B(B_1, B_2) := \min_{\phi} \rho(\phi),$$
where the minimum is taken over all subsets $A \subseteq [N]$, $B \subseteq [M]$ and all bijections $\phi : A \to B$.

Definition 6.4 (General Factorization Theorem [2]). Let $(\mathcal{X}, \mathcal{F})$ be a measurable space with a family of probability measures $\mathcal{M}$ dominated by a $\sigma$-finite measure $\lambda$. Then a statistic $T$ is sufficient for $\mathcal{M}$ if and only if two conditions hold: there exists a non-negative measurable function $h$ on $\mathcal{X}$, and there exists a set of non-negative measurable functions $\{g_\vartheta : \vartheta \in \mathcal{M}\}$ on the range of $T$, such that, for each $\vartheta \in \mathcal{M}$, the Radon–Nikodym derivative $f_\vartheta \equiv d\vartheta/d\lambda$ admits the factorization
$$f_\vartheta(x) = h(x)\, g_\vartheta(T(x)), \quad x \in \mathcal{X}.$$

6.2 Proofs of Theorems and Propositions

Proof of Theorem 3.2. We show that, for a barcode $B \sim \vartheta \in \mathcal{P}$, the Radon–Nikodym derivative $f_\vartheta \equiv d\vartheta/d\lambda$ factors as
$$f_\vartheta(B) = h(B)\, g_\vartheta(\mathcal{V}(B)),$$
where $h$ is a non-negative measurable function on $\mathcal{B}^m_{\le n}$ and $g_\vartheta$ is a non-negative measurable function on $\mathcal{C}_d$. Recall that $\mathcal{C}_d := \{(x_1, \ldots, x_d)^\top \in \mathbb{R}^d : x_1 \le x_2 \le \cdots \le x_d\}$. Then, by the general factorization theorem (see Definition 6.4), the map $\mathcal{V}$ is a sufficient statistic.

First, we use the property that the map $\mathcal{V} : \mathcal{B}^m_{\le n} \to \mathcal{C}_d$ is injective. This holds because the tropical coordinates defined in Equation (2.7) separate the barcodes in $\mathcal{B}^m_{\le n}$, by Proposition 2.8 and Theorem 2.1 of [38]. Hence, for any two distinct points $B_1$ and $B_2$ in $\mathcal{B}^m_{\le n}$, we have $\mathcal{V}(B_1) \neq \mathcal{V}(B_2)$. Consequently, $\mathcal{V}$ is a bijection onto its image. Therefore there exists a function $\eta$ such that $\eta \circ \mathcal{V}$ is the identity map on $\mathcal{B}^m_{\le n}$ and $\mathcal{V} \circ \eta$ is the identity map on the image $\mathcal{V}(\mathcal{B}^m_{\le n}) \subseteq \mathcal{C}_d$.
Thus, we can write $f_\vartheta(B) = h(B)\, g_\vartheta(\mathcal{V}(B))$ with $h(B) = 1$ and $g_\vartheta \equiv f_\vartheta \circ \eta$. Clearly, $h$ is non-negative, and $g_\vartheta$ is non-negative because $f_\vartheta$ is a probability density on $\mathcal{B}^m_{\le n}$.

We now verify the measurability of $h$ and $g_\vartheta$ in order to apply the general factorization theorem (see Definition 6.4). The map $h(B) = 1$ is constant and hence continuous; therefore, by Theorem 1.5 of [40], $h$ is measurable. Next, to show that $g_\vartheta \equiv f_\vartheta \circ \eta$ is measurable, we use the fact that the composition of two measurable maps is measurable. Since $f_\vartheta$ is measurable by the Radon–Nikodym theorem, it suffices to show that $\eta$ is measurable. We use the Kuratowski theorem (see Chapter 3 of [40]), which states that the inverse of an injective measurable map between complete and separable metric spaces is measurable. Let $\delta$ denote the standard Euclidean metric in $\mathbb{R}^d$. The metric spaces $(\mathcal{B}^m_{\le n}, \delta_B)$ and $(\mathcal{C}_d, \delta)$ are closed subspaces of the complete and separable metric spaces $(\mathcal{B}_{\le n}, \delta_B)$ and $(\mathbb{R}^d, \delta)$, respectively, and hence are themselves complete and separable. We refer to Theorem 3.2 of [4] for the completeness and separability of $(\mathcal{B}_{\le n}, \delta_B)$. Consequently, the inverse $\eta$ of $\mathcal{V}$ is measurable. Hence, the embedding $\mathcal{V}$ is a sufficient statistic for $\mathcal{P}$ by the general factorization theorem (see Definition 6.4). This completes the proof of Theorem 3.2.

Proof of Theorem 3.3. We use Theorem 3.2 and apply the general factorization theorem (see Definition 6.4) with $h(B) \equiv 1$, without loss of generality, to establish Theorem 3.3. Let $f_\mu$, $f_\nu$, $g_\mu$ and $g_\nu$ denote the probability densities of $\mu$, $\nu$, $F$ and $G$, respectively. Recall that $F$ and $G$ are the probability distributions corresponding to the induced probability measures $\mu \circ \mathcal{V}^{-1}$ and $\nu \circ \mathcal{V}^{-1}$, respectively.
We need to show that any statistical decision (accept or reject) for $H_0'$ (see Equation (3.1)) is valid for $H_0$ (see Equation (3.3)), and vice versa. Consider first the situation in which we accept the null hypothesis $H_0'$. This implies that, for every $A \in \sigma(\mathcal{B}^m_{\le n})$ and $B \in \mathcal{B}^m_{\le n}$, we have
$$\mu(A) = \nu(A) \iff f_\mu(B) = f_\nu(B), \tag{6.1}$$
where Equation (6.1) follows from the fact that the probability density of a random variable uniquely characterizes its probability measure. Now, using the sufficiency of the tropical embedding $\mathcal{V}$ from Theorem 3.2, we have
$$f_\mu(B) = f_\nu(B) \iff g_\mu(\mathcal{V}(B)) = g_\nu(\mathcal{V}(B)) \iff F(\mathcal{V}(B)) = G(\mathcal{V}(B)). \tag{6.2}$$
Therefore, by Equations (6.1) and (6.2), for arbitrary $B \in \mathcal{B}^m_{\le n}$ and $A \in \sigma(\mathcal{B}^m_{\le n})$, we have
$$\mu(A) = \nu(A) \iff F(\mathcal{V}(B)) = G(\mathcal{V}(B)). \tag{6.3}$$
Now consider the situation in which we reject $H_0'$, that is, there exist $A \in \sigma(\mathcal{B}^m_{\le n})$ and $B \in \mathcal{B}^m_{\le n}$ such that
$$\mu(A) \neq \nu(A) \iff f_\mu(B) \neq f_\nu(B). \tag{6.4}$$
By Theorem 3.2, this further implies that
$$f_\mu(B) \neq f_\nu(B) \iff g_\mu(\mathcal{V}(B)) \neq g_\nu(\mathcal{V}(B)) \iff F(\mathcal{V}(B)) \neq G(\mathcal{V}(B)). \tag{6.5}$$
Therefore, by Equations (6.4) and (6.5), there exist $A \in \sigma(\mathcal{B}^m_{\le n})$ and $B \in \mathcal{B}^m_{\le n}$ such that
$$\mu(A) \neq \nu(A) \iff F(\mathcal{V}(B)) \neq G(\mathcal{V}(B)). \tag{6.6}$$
Thus, Equations (6.3) and (6.6) establish that the hypotheses $H_0'$ (see Equation (3.1)) and $H_0$ (see Equation (3.3)) are equivalent. This establishes the statement of Theorem 3.3.

Proof of Proposition 4.1. We need to show that the metric space $(\mathcal{C}_d, \|\cdot\|)$ has strong negative type ([42]). Then, by Proposition 3 of [45], $\mathcal{E}(F, G)$ is a metric on the class of distribution functions $\mathscr{C}$ (see Equation (4.2)). The metric space $(\mathcal{C}_d, \|\cdot$
$\|)$ has strong negative type if the following holds for any two probability distributions $F$ and $G$ supported on $\mathcal{C}_d$. Let $X$, $X'$, $Y$ and $Y'$ be independent random variables such that $X \overset{D}{=} X'$ and $Y \overset{D}{=} Y'$, where $X \sim F$ and $Y \sim G$. Then
$$2\,\mathbb{E}\|X - Y\| - \mathbb{E}\|X - X'\| - \mathbb{E}\|Y - Y'\| \ge 0, \tag{6.7}$$
with equality in Equation (6.7) if and only if $F = G$. The condition of strong negative type holds for the manifold $\mathcal{C}_d$ under the Euclidean metric by Theorem 2.1 of [3]. Hence, the metric space $(\mathcal{C}_d, \|\cdot\|)$ has strong negative type, and by Proposition 3 of [45], $\mathcal{E}(F, G)$ is a metric on $\mathscr{C}$. This establishes the assertion of Proposition 4.1.

Proof of Theorem 4.1. The proof proceeds in two steps. First, we show that, under $H_1$, the energy statistic $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$ converges in probability to its population counterpart $\mathcal{E}(F, G)$, that is,
$$\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \xrightarrow{P} \mathcal{E}(F, G) \quad \text{as } \min(n_1, n_2) \to \infty. \tag{6.8}$$
Equation (6.8) follows directly from the asymptotic theory of U-statistics. In particular, we apply Theorem 12.6 of [47] to the first term of $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$ and Theorem 12.3 of [47] to the remaining two terms. Second, we use Lemma A.2 of [18], which establishes that, for every $\epsilon > 0$, there exists $0 < M < \infty$ such that, for any permutation $\pi \in S_N$,
$$\liminf_{\min(n_1, n_2) \to \infty} P\big((n_1 + n_2)\,\mathcal{E}_{n_1,n_2}(\mathcal{Z}^{\pi}) < M\big) \ge 1 - \epsilon, \tag{6.9}$$
where $\mathcal{Z}^{\pi}$ denotes the pooled sample with its elements ordered according to the permutation $\pi$ on $\{1, \ldots, n_1 + n_2\}$.
Now, consider the following probability under $H_1$:
$$\begin{aligned}
P_{H_1}\big(\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \ge C_{n_1,n_2}(\alpha)\big) &= P_{H_1}\big((n_1 + n_2)\,\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \ge (n_1 + n_2)\,C_{n_1,n_2}(\alpha)\big) \\
&\ge P_{H_1}\big((n_1 + n_2)\,\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \ge M\big) \qquad \text{(E.1)} \\
&\to P_{H_1}\big(\mathcal{E}(F, G) \ge 0\big) \quad \text{as } \min(n_1, n_2) \to \infty \qquad \text{(E.2)} \\
&= 1,
\end{aligned}$$
since $\mathcal{E}(F, G) > 0$ under $H_1$ by Proposition 4.1. Here, (E.1) follows from applying Equation (6.9) to Equation (4.4), and (E.2) follows from Equation (6.8). This establishes the consistency of the proposed test.

References

[1] Adcock, A., Carlsson, E., and Carlsson, G. (2016). The Ring of Algebraic Functions on Persistence Barcodes. Homology, Homotopy and Applications, 18(1):381–402.
[2] Bahadur, R. R. (1954). Sufficiency and Statistical Decision Functions. The Annals of Mathematical Statistics, 25(3):423–462.
[3] Baringhaus, L. and Franz, C. (2004). On a new multivariate two-sample test. Journal of Multivariate Analysis, 88(1):190–206.
[4] Blumberg, A. J., Gal, I., Mandell, M. A., and Pancia, M. (2014). Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces. Foundations of Computational Mathematics, 14(4):745–789.
[5] Bobrowski, O., Mukherjee, S., and Taylor, J. E. (2017). Topological consistency via kernel estimation. Bernoulli, 23(1):288–328.
[6] Boyer, D. M., Lipman, Y., Clair, E. S., Puente, J., Patel, B. A., Funkhouser, T., Jernvall, J., and Daubechies, I. (2011). Algorithms to automatically quantify the geometric similarity of anatomical surfaces. Proceedings of the National Academy of Sciences, 108(45):18221–18226.
[7] Bubenik, P., Carlsson, G., Kim, P., and Luo, Z.-M. (2010). Statistical topology via Morse theory persistence and nonparametric estimation. Algebraic Methods in Statistics and Probability II, Contemporary Mathematics, 516:75–92.
[8] Carlsson, G. (2014).
Topological pattern recognition for point cloud data. Acta Numerica, 30:289–368.
[9] Carlsson, G. (2020). Topological methods for data modelling. Nature Reviews Physics, 2(12):697–708.
[10] Carlsson, G. and Kališnik Verovšek, S. (2016). Symmetric and r-symmetric tropical polynomials and rational functions. Journal of Pure and Applied Algebra, 220(11):3610–3627.
[11] Carlsson, G. and Vejdemo-Johansson, M. (2021). Topological Data Analysis with Applications. Cambridge University Press.
[12] Carlsson, G., Zomorodian, A., Collins, A., and Guibas, L. (2004). Persistence barcodes for shapes. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, SGP '04, pages 124–135, New York, NY, USA. Association for Computing Machinery.
[13] Chazal, F., Cohen-Steiner, D., and Mérigot, Q. (2011a). Geometric inference for measures based on distance functions. Foundations of Computational Mathematics, 11(6):733–751.
[14] Chazal, F., Fasy, B., Lecci, F., Michel, B., Rinaldo, A., and Wasserman, L. (2017). Robust topological inference: Distance to a measure and kernel distance. J. Mach. Learn. Res., 18(1):5845–5884.
[15] Chazal, F., Glisse, M., Labruère, C., and Michel, B. (2014). Convergence rates for persistence diagram estimation in topological data analysis. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML'14, pages I-163–I-171. JMLR.org.
[16] Chazal, F., Guibas, L. J., Oudot, S. Y., and Skraba, P. (2011b). Scalar field analysis over point cloud data. Discrete & Computational Geometry, 46(4):743–775.
[17] Chazal, F. and Michel, B. (2021). An introduction to topological data analysis: Fundamental and practical aspects for data scientists. Frontiers in Artificial Intelligence, 4.
[18] Chu, L. and Dai, X. (2024). Manifold energy two-sample test.
Electronic Journal of Statistics, 18(1):145–166.
[19] Collins, A., Zomorodian, A., Carlsson, G., and Guibas, L. J. (2004). A barcode shape descriptor for curve point cloud data. Computers & Graphics, 28:881–894.
[20] Crawford, L., Monod, A., Chen, A. X., Mukherjee, S., and Rabadán, R. (2020). Predicting clinical outcomes in glioblastoma: An application of topological and functional data analysis. Journal of the American Statistical Association, 115(531):1139–1150.
[21] Curry, J., Mukherjee, S., and Turner, K. (2022). How many directions determine a shape and other sufficiency results for two topological transforms. Transactions of the American Mathematical Society, Series B, 9:1006–1043.
[22] Dries, L. P. D. v. d. (1998). Tame Topology and O-minimal Structures. London Mathematical Society Lecture Note Series. Cambridge University Press.
[23] Dupuis, P. and Grenander, U. (1998). Variational problems on flows of diffeomorphisms for image matching. Q. Appl. Math., LVI(3):587–600.
[24] Edelsbrunner, H. and Harer, J. (2008). Persistent homology—a survey. Contemporary Mathematics 453, 26:257–282.
[25] Edelsbrunner, H., Letscher, D., and Zomorodian, A. (2002). Topological persistence and simplification. Discrete & Computational Geometry, 28(4):511–533.
[26] Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., and Singh, A. (2014). Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301–2339.
[27] Fefferman, C., Mitter, S., and Narayanan, H. (2016). Testing the manifold hypothesis. Journal of the American Mathematical Society, 29:983–1049. Published electronically: February 9, 2016.
[28] Gao, T., Kovalsky, S. Z., Boyer, D. M., and Daubechies, I. (2019). Gaussian process landmarking for three-dimensional geometric morphometrics. SIAM Journal on Mathematics of Data Science, 1(1):237–267.
[29] Ghrist, R. (2008).
Barcodes: The persistent topology of data. American Mathematical Society (New Series), 45:61–75.
[30] Hatcher, A. (2002). Algebraic Topology. Cambridge University Press, Cambridge.
[31] Kališnik, S. (2019). Tropical coordinates on the space of persistence barcodes. Foundations of Computational Mathematics, 19(1):101–129.
[32] Kendall, D. G. (1989). A Survey of the Statistical Theory of Shape. Statistical Science, 4(2):87–99.
[33] Kumar, S. and Dhar, S. S. (2025). A novel characterization of structures in smooth regression curves: from a viewpoint of persistent homology.
[34] Kumar, S. and Dhar, S. S. (2026). Testing homological equivalence using Betti numbers: Probabilistic properties. Theory of Probability & Its Applications, 71(1). To appear.
[35] Meng, K., Wang, J., Crawford, L., and Eloyan, A. (2025). Randomness of shapes and statistical inference on shapes via the smooth Euler characteristic transform. Journal of the American Statistical Association, 120(549):498–510.
[36] Mileyko, Y., Mukherjee, S., and Harer, J. (2011). Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007.
[37] Milnor, J. (1963). Morse Theory, volume 51 of Annals of Mathematics Studies. Princeton University Press, Princeton, NJ.
[38] Monod, A., Kališnik, S., Patiño Galindo, J. A., and Crawford, L. (2019). Tropical sufficient statistics for persistent homology. SIAM Journal on Applied Algebra and Geometry, 3(2):337–371.
[39] Munkres, J. (1984). Elements of Algebraic Topology. Westview Press; 1st edition.
[40] Parthasarathy, K. (1967). Probability and mathematical statistics: A series of monographs and textbooks. In Probability Measures on Metric Spaces, Probability and Mathematical Statistics: A Series of Monographs and Textbooks, page ii. Academic Press.
[41] Robinson, A. and Turner, K. (2017). Hypothesis testing for topological data analysis.
Journal of Applied and Computational Topology, 1(2):241–261.
[42] Schoenberg, I. J. (1938). Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44(3):522–536.
[43] Shao, J. (2003). Mathematical Statistics. Springer Texts in Statistics. Springer, New York, NY, 2nd edition.
[44] Székely, G. and Rizzo, M. (2004). Testing for equal distributions in high dimension. InterStat, 5.
[45] Székely, G. J. and Rizzo, M. L. (2017). The energy of data. Annual Review of Statistics and Its Application, 4(1):447–479.
[46] van Delft, A. and Blumberg, A. J. (2025). A statistical framework for analyzing shape in a time series of random geometric objects. The Annals of Statistics, 53(2):561–588.
[47] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.
[48] Wasserman, L. (2018). Topological data analysis. Annual Review of Statistics and Its Application, 5(1):501–532.
[49] Zomorodian, A. and Carlsson, G. (2005). Computing persistent homology. Discrete Comput. Geom., 33(2):249–274.