Identifying Topological Differences in Two Populations of Random Geometric Objects

Satish Kumar¹ and Subhra Sankar Dhar²
Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Kanpur 208016, India
Emails: satsh@iitk.ac.in¹, subhra@iitk.ac.in²
March 17, 2026

Abstract
We propose a statistical framework to identify topological differences in two populations of random geometric objects. The proposed framework involves first associating a topological signature with random geometric objects and then performing a two-sample test using the observed topological signatures. We associate persistence barcodes, a topological signature from topological data analysis, with each observed random geometric object. This, in turn, yields a two-sample problem on the space of persistence barcodes. As the space of persistence barcodes is not suitable for standard statistical analysis, we translate the two-sample problem into one on a suitable subset of a Euclidean space. In the course of this study, we embed the topological signatures in an ordered convex cone in a Euclidean space using functions from tropical geometry. We show that the embedding is a sufficient statistic for the persistence barcodes. This fact leads to the proposal of a two-sample test based on this sufficient statistic, and its equivalence to the two-sample problem on the barcode space is established. Finally, the consistency of the proposed test is studied.

Keywords: Topological Data Analysis, Persistent Homology, Random Geometric Objects, Tropical Embedding, Tropical Sufficient Statistics, Energy Statistics, Permutation Test.

1 Introduction
Topological data analysis (TDA) (see, e.g., [11], [9], [17], [48] and references therein) is an emerging field that utilizes algebraic topological techniques to analyze complex and high-dimensional data.
The foundation of TDA is laid on the so-called "manifold hypothesis" ([27]), which conjectures that high-dimensional data are sampled from a smooth manifold. The theme of TDA is that "data has shape", and the shape (or the manifold) underlying the data may reveal stimulating insights about the process that generates the data, particularly when the data are high-dimensional and admit a complex structure. One of the key tools in TDA is persistent homology (see, e.g., [24], [25], [49]), a multi-scale extension of homology, which is a classical topological invariant from algebraic topology (see, e.g., [39], [30]). Intuitively, homology characterizes a topological space through its connected components and holes in higher dimensions by associating with it a sequence of abelian groups, called homology groups. Persistent homology is an adaptation of homology to data points sampled from geometric objects, where data are represented as finite metric spaces called point clouds. Persistent homology summarizes topological features of data sets by associating with them a multi-set of intervals in the real line, called barcodes. Barcodes provide a geometric and topological summary of the data-generating mechanism (see, e.g., [12], [19], [29]). In the standard TDA framework, a point cloud is given from an unknown geometric object, and the goal is to infer the geometric and topological features of the underlying latent geometric object (see, e.g., [41], [4], [26], [5], [15], [14], [13]). However, TDA tools can also be applied in a different framework, where we consider a random sample of geometric objects instead of a point cloud sampled from an unknown geometric object. The aim here is to provide statistical inference on the probability distribution of the sampled geometric objects from the viewpoint of persistent homology.
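To make the notion of a barcode concrete, the following minimal Python sketch computes the 0-dimensional barcode (connected components) of a point cloud under a Vietoris–Rips filtration, using a union-find structure over edges sorted by length. It rests on our own illustrative assumptions (Euclidean points, an edge enters the filtration at its length, and the single infinite bar is dropped); it is not the barcode computation used in this paper, and higher-dimensional homology would require a dedicated TDA library. The function name is hypothetical.

```python
import itertools
import math


def h0_barcode(points):
    """Finite H0 bars (birth, death) of a Vietoris-Rips filtration.

    Every point is born at scale 0; when two components merge at scale r,
    one of them dies at r. The bar of the last surviving component is
    infinite and is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            bars.append((0.0, r))
    return bars


print(h0_barcode([(0, 0), (1, 0), (5, 0)]))  # [(0.0, 1.0), (0.0, 4.0)]
```

For three collinear points at 0, 1 and 5, two components die when the gaps of length 1 and 4 close, which gives two finite bars.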
Precisely, we view observed geometric objects through the lens of persistent homology by associating barcodes with each geometric object in the sample. Then we consider the random sample of barcodes associated with random geometric objects to infer the probability distribution of the observed barcodes. Note that the probability distribution of barcodes is well defined (see, e.g., [36], [4]). In this article, the goal is to distinguish between two independent collections of random geometric objects up to persistent homology. This amounts to performing a two-sample test for topological signatures of geometric objects computed from persistent homology. We consider a two-step hypothesis testing procedure to distinguish two independent collections of random geometric objects. In the first step, we quantify the observed geometric objects using persistence barcodes. Subsequently, in the second step, we formulate a two-sample test for independent collections of random samples of barcodes. The first step towards such testing procedures is to define the class of geometric objects of interest. Therefore, in the following subsection, we first define the geometric objects of interest and then construct an appropriate probability space to incorporate randomness in geometric objects.

1.1 Random Geometric Objects
The class of geometric objects under consideration consists of compact metric spaces that are subsets of Euclidean spaces admitting a triangulation. This class can be defined using the notion of o-minimal structures from [21], which defines tame sets (see Definition 6.1) that are triangulable by the Triangulation Theorem in [22]. In particular, we define the class of geometric objects $\mathcal{X}$ as
$$\mathcal{X} := \{ X \subset \mathbb{R}^d : X \text{ is a tame and compact metric space} \}. \tag{1.1}$$
Next, we define random geometric objects as elements of an appropriate probability space $(\Omega, \mathcal{F}, P)$ that can be constructed as follows.
First, one can take $\Omega = \mathcal{X}$ and $\mathcal{F} = \mathcal{B}(\mathcal{X})$, where $\mathcal{B}(\mathcal{X})$ denotes the Borel σ-algebra generated by the topology induced by a suitable metric on $\mathcal{X}$. In this regard, we can use the Gromov–Hausdorff distance (see Definition 6.2) to define a metric on $\mathcal{X}$. However, the Gromov–Hausdorff distance is only a pseudo-metric on $\mathcal{X}$; it is a metric on the set of isometry classes of geometric objects in $\mathcal{X}$ (see [46]). Therefore, we define $\Omega$ as
$$\Omega := \{ [X] : X \in \mathcal{X} \}, \tag{1.2}$$
where $[X]$ denotes the isometry class of $X$. Thus, using the Gromov–Hausdorff metric on $\Omega$, we define the sample space $(\Omega, \mathcal{F})$ on which a suitable probability measure $P$ can be defined. Note that $\mathcal{F}$ is the Borel σ-algebra generated by the topology induced by the Gromov–Hausdorff metric on $\Omega$. In what follows, we view random geometric objects as elements of a probability space $(\Omega, \mathcal{F}, P)$, where $P$ is unknown.

1.2 Literature Review
Statistical inference on random geometric objects proceeds through the probability distribution of some suitable geometric summary. In classical statistical shape analysis, geometric objects are represented by a set of user-specified points, known as landmarks ([32]). The specification of landmarks requires domain knowledge and is subject to bias ([6]). Moreover, landmarks are not suitable for comparing geometric objects, as each geometric object must be represented by an equal number of landmarks ([28]). [23] proposes an alternative to landmark-based approaches for comparing geometric objects. However, the approach proposed by [23] relies on the assumption that the geometric objects under comparison are diffeomorphic, which may not hold in practice for many data sets. TDA provides summaries of geometric objects that can be used to distinguish between two collections of geometric objects without specifying landmarks or relying on the assumption that geometric objects are diffeomorphic.
In this direction, the following two approaches are relevant. Recently, [35] proposed a two-way ANOVA testing procedure in the functional data analysis framework by representing geometric objects using the smooth Euler characteristic transform ([20]). In the context of a time series of random geometric objects, two-sample location tests have been proposed in [46] for a suitable class of geometric summaries, including dendrograms and persistence diagrams.

1.3 Our Contribution
We introduce a statistical framework for conducting statistical inference on random geometric objects using embeddings from tropical geometry. The proposed framework involves two key steps. First, we quantify geometric objects using persistence barcodes, a topological signature from TDA. Second, we embed the barcodes into a finite-dimensional subset of Euclidean space using embeddings from tropical geometry. The proposed framework differs from the conventional statistical shape analysis framework, which is usually based on manual landmarking of shapes or on the assumption of diffeomorphism of shapes. Thus, the proposed framework paves the way for statistical inference on random geometric objects without relying on landmark- or diffeomorphism-based approaches. Moreover, the proposed framework facilitates the analysis of metric-space-valued data within a standard statistical framework on a subset of Euclidean space. We present here a two-sample test to detect topological and geometric differences in two populations of random geometric objects. A two-step testing procedure is adopted. In the first step, we associate barcodes with the observed random geometric objects, yielding a two-sample problem on the space of barcodes.
In the second step, we embed the barcodes in a subset of Euclidean space, since statistical analysis directly on the space of barcodes is hindered for the reasons explained in the next subsection. To facilitate the testing procedure, a sufficient statistic for persistence barcodes is proposed. The proposed sufficient statistic is based on the tropical embeddings proposed by [31] and refined by [38] to embed the barcodes in a finite-dimensional Euclidean space. The proposed sufficient statistic complements the main results of [38] regarding the sufficiency of tropical embeddings by allowing statistical inference for a wider class of probability distributions (see Section 3.1). Thus, the proposed sufficient statistic translates the two-sample problem on the barcode space into a two-sample problem on an ordered convex cone $C_d$ in $\mathbb{R}^d$. We establish that solving the two-sample problem on the space of barcodes is equivalent to solving a two-sample problem on $C_d$. Finally, we propose a test based on the manifold energy distance to perform two-sample testing on the manifold $C_d$, and we establish the consistency of the proposed test. The proposed test can be useful in geometric morphometrics to identify morphological variations between two groups of shapes.

1.4 Mathematical Challenges
We address the following main challenges. We present two-sample tests on the space of barcodes, which enable us to distinguish two independent collections of random geometric objects up to persistent homology. In the testing framework, barcodes are associated with each observed random geometric object, yielding two independent collections of random barcodes. Then, we test the hypothesis of equality of the two probability distributions that generate the two independent collections of random barcodes.
However, performing the aforementioned test directly is hindered by the unusual nature of barcodes. Barcodes are collections of intervals in the real line rather than numeric quantities, which makes conventional mathematical operations such as addition and multiplication unavailable to the aforementioned testing framework. We circumvent this issue by embedding the barcodes using functions from tropical geometry. In this regard, we use the statistical sufficiency (see Theorem 3.1) of the tropical embeddings proposed by [38] to embed the barcodes in a finite-dimensional Euclidean space. However, these tropical embeddings are not suitable for two-sample tests, as they require stronger assumptions on the distributions of tropical embeddings. In particular, tropical embeddings are sufficient statistics for barcodes if the class of distributions of tropical embeddings is restricted to the class of exchangeable distributions on Euclidean spaces (see Section 3.1). Therefore, we propose a sufficient statistic built from these tropical embeddings that allows us to perform two-sample tests under the standard assumptions on the class of distributions of tropical embeddings. Further, this allows us to establish the equivalence of the two-sample problem on the barcode space to the two-sample problem on the ordered convex cone $C_d := \{(x_1, \ldots, x_d) \in \mathbb{R}^d : x_1 \le \cdots \le x_d\}$. Thus, we can perform two-sample tests on $C_d$ with standard assumptions on the class of distributions of the proposed sufficient statistic. Furthermore, to perform a two-sample test for manifold-valued data, we propose a permutation test based on manifold energy statistics ([18]) and establish its consistency.

1.5 Organization
The rest of the article is organized as follows. Section 2 provides the necessary background on the concepts required for the development of the results in the article.
Section 3 formulates the hypothesis testing problem of interest and establishes its equivalence to a two-sample problem on a subset of a Euclidean space. Section 4 presents a testing procedure based on energy statistics and establishes its consistency. Some concluding remarks are provided in Section 5. Finally, technical details such as definitions and proofs of the main results are provided in the appendix in Section 6.

2 Preliminaries
This section formalizes the necessary concepts used in the paper, such as the space of barcodes, tropical functions, and tropical coordinates on the space of barcodes. We refer to [33] and [34] for a concise treatment of the other concepts required for this paper, such as simplicial complexes, simplicial homology, and persistent homology.

2.1 Barcode Space
Persistent homology is an adaptation of homology to a filtration of topological spaces. A (right-continuous) filtration of topological spaces is a nested collection $\mathcal{F} := \{\mathcal{F}_\epsilon : \epsilon \ge 0\}$ such that $\mathcal{F}_\epsilon \subseteq \mathcal{F}_t$ for $\epsilon \le t$ and $\mathcal{F}_\epsilon = \bigcap_{t > \epsilon} \mathcal{F}_t$ … 0), and $\mathbf{1}$ denotes the indicator function. Note that restricting $\mathcal{B}_{\le n}$ to $\mathcal{B}^m_{\le n}$ does not pose any limitations in practice, as important properties, such as the Lipschitz continuity of the functions defined in Theorem 2.1, that hold on $\mathcal{B}_{\le n}$ also remain intact on $\mathcal{B}^m_{\le n}$. The advantage of working with $\mathcal{B}^m_{\le n}$ instead of $\mathcal{B}_{\le n}$ as the barcode space is that the functions defined in Theorem 2.1 provide an embedding of barcodes into a finite-dimensional Euclidean space. Moreover, for a given finite set of barcodes, a choice of $m$ is straightforward. For example, we can take $m = \lceil \max_{1 \le i \le s} (b_i/\ell_i) \rceil$, where $s = \sum_{i=1}^{n} \mathbf{1}(\ell_i > 0)$, and $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$ for any $x \in \mathbb{R}$. Therefore, [38] presented a modified version of Theorem 2.1 for the regularized barcode space $\mathcal{B}^m_{\le n}$ and for tropical functions parameterized by $i, j \in \{0, \ldots, n\}$ alone, so that the $k$-factor of $(1, 0)$ rows is redundant. In particular, for a fixed $m \in \mathbb{N}$, consider the family of functions $\{T_{(i,j)} : i, j \in \{0, \ldots, n\},\ 1 \le i + j \le n\}$ on $\mathcal{B}^m_{\le n}$, defined by
$$T_{(i,j)}([(b_1, \ell_1, \ldots, b_n, \ell_n)]) := \Gamma_{[(0,1)^i, (1,1)^j]}\big[(b_1 \oplus \ell_1^m, \ell_1, \ldots, b_n \oplus \ell_n^m, \ell_n)\big]. \tag{2.7}$$
The functions defined in Equation (2.7) separate nonequivalent barcodes in $\mathcal{B}^m_{\le n}$ and are Lipschitz with respect to the bottleneck distance (see Definition 6.3). In what follows, we denote this family of functions by $\{T_1, \ldots, T_d\}$, where $d = n + \tfrac{1}{2} n(n+1)$, or equivalently $2d = 2n + n(n+1)$ (see [38]), and call the functions $T_1, \ldots, T_d$ tropical coordinates on $\mathcal{B}^m_{\le n}$. Thus, given a barcode $B \in \mathcal{B}^m_{\le n}$ and tropical coordinates $\{T_1, \ldots, T_d\}$ on $\mathcal{B}^m_{\le n}$, we have an embedding $\mathcal{T} : \mathcal{B}^m_{\le n} \to \mathbb{R}^d$, defined as
$$\mathcal{T}(B) := \big(T_{\pi(1)}(B), \ldots, T_{\pi(d)}(B)\big)^\top, \tag{2.8}$$
where $\pi$ is a fixed permutation of $\{1, \ldots, d\}$. In general, there are $d!$ such embeddings of a barcode $B \in \mathcal{B}^m_{\le n}$ in $\mathbb{R}^d$, one for each permutation $\pi$. This fact is crucial for the development of the methodology proposed in this paper. The following example illustrates the computation of tropical coordinates on a regularized barcode space, and thereby of an embedding in a Euclidean space.

Example 2.2. Recall from Example 2.1 that, for $n = 2$, the set of orbits under the row permutation action of the symmetric group $S_2$ on $A_2$ (see Equation (2.2)), denoted by $A_2/S_2$, is given in Equation (2.5). According to Proposition 2.8 of [38], it suffices to work with the following subset of $A_2/S_2$ to define tropical coordinates on the barcode space:
$$\left\{ \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \right\}. \tag{2.9}$$
Now, suppose we are given two barcodes in $\mathcal{B}_{\le 2}$, denoted by $B_1$ and $B_2$, where $B_1 = [[2, 1), [3, 1)]$ and $B_2 = [[4, 4)]$. In the first step, we compute $m$ for the observed bars. In this example, $m = 3$, since $\lceil \max_{1 \le i \le 3} (b_i/\ell_i) \rceil = \lceil \max(2/1, 3/1, 4/4) \rceil = 3$. This implies that $B_1, B_2 \in \mathcal{B}^3_{\le 2}$. Note that the functions $T_{(i,j)}$ (see Equation (2.7)) are obtained from the orbits in Equation (2.9). In particular, the first orbit in Equation (2.9), with $i = 1$ (the number of $(0\ 1)$ rows) and $j = 1$ (the number of $(1\ 1)$ rows), gives $T_{(1,1)}$, which we denote by $T_1$. Similarly, we obtain $T_2, T_3, T_4, T_5$ from the remaining orbits in Equation (2.9). Let $B = [(b_1, \ell_1, b_2, \ell_2)] \in \mathcal{B}^3_{\le 2}$; then the tropical coordinates $T_1, \ldots, T_5$ on $\mathcal{B}^3_{\le 2}$ are obtained as follows:
$$
\begin{aligned}
T_1(B) &:= T_{(1,1)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(0,1),(1,1)]}\big[(b_1 \oplus \ell_1^3, \ell_1, b_2 \oplus \ell_2^3, \ell_2)\big] \\
&= \big(\ell_1 \odot (b_2 \oplus \ell_2^3) \odot \ell_2\big) \boxplus \big((b_1 \oplus \ell_1^3) \odot \ell_1 \odot \ell_2\big) \\
&= \max\big(\ell_1 \odot (b_2 \oplus \ell_2^3) \odot \ell_2,\ (b_1 \oplus \ell_1^3) \odot \ell_1 \odot \ell_2\big) \\
&= \max\big(\ell_1 \odot \min(b_2, \ell_2^3) \odot \ell_2,\ \min(b_1, \ell_1^3) \odot \ell_1 \odot \ell_2\big) \\
&= \max\big(\ell_1 + \min(b_2, \ell_2^3) + \ell_2,\ \min(b_1, \ell_1^3) + \ell_1 + \ell_2\big) \\
&= \max\big(\ell_1 + \min(b_2, 3\ell_2) + \ell_2,\ \min(b_1, 3\ell_1) + \ell_1 + \ell_2\big), \\
T_2(B) &:= T_{(0,1)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(1,1)]}\big[(b_1 \oplus \ell_1^3, \ell_1, b_2 \oplus \ell_2^3, \ell_2)\big] = \max\big(\min(b_1, 3\ell_1) + \ell_1,\ \min(b_2, 3\ell_2) + \ell_2\big), \\
T_3(B) &:= T_{(2,0)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(0,1)^2]}\big[(b_1 \oplus \ell_1^3, \ell_1, b_2 \oplus \ell_2^3, \ell_2)\big] = \ell_1 + \ell_2, \\
T_4(B) &:= T_{(1,0)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(0,1)]}\big[(b_1 \oplus \ell_1^3, \ell_1, b_2 \oplus \ell_2^3, \ell_2)\big] = \max(\ell_1, \ell_2), \\
T_5(B) &:= T_{(0,2)}[(b_1, \ell_1, b_2, \ell_2)] = \Gamma_{[(1,1)^2]}\big[(b_1 \oplus \ell_1^3, \ell_1, b_2 \oplus \ell_2^3, \ell_2)\big] = \min(b_1, 3\ell_1) + \ell_1 + \min(b_2, 3\ell_2) + \ell_2.
\end{aligned}
$$
Thus, we have tropical coordinates $\{T_1, \ldots, T_5\}$ on $\mathcal{B}^3_{\le 2}$.
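The computations in Example 2.2 can be checked numerically. The Python sketch below evaluates $T_{(i,j)}$ for general $n$ by brute force. It assumes, consistently with the formulas above, that $\Gamma_{[(0,1)^i,(1,1)^j]}$ is the maximum, over all ways of assigning the $i + j$ rows to distinct bars, of the sum of $\ell_k$ for each $(0,1)$ row and $(b_k \oplus \ell_k^m) + \ell_k$ for each $(1,1)$ row, with $\oplus$ read as min and missing bars padded by $(0, 0)$. All function names are hypothetical, and the enumeration is exponential in $n$, so it is only meant for small examples.

```python
import itertools
import math


def choose_m(barcodes):
    """m = ceil(max b/l) over all bars with positive persistence l (at least 1)."""
    ratios = [b / l for barcode in barcodes for (b, l) in barcode if l > 0]
    return max(1, math.ceil(max(ratios, default=0)))


def all_index_pairs(n):
    """All (i, j) with 1 <= i + j <= n; there are d = n + n(n+1)/2 of them."""
    return [(i, j) for i in range(n + 1) for j in range(n + 1) if 1 <= i + j <= n]


def gamma(i, j, bars):
    """Brute-force max-plus symmetrization over assignments of rows to distinct bars.

    bars: list of (x_k, l_k) with x_k = min(b_k, m * l_k).
    """
    best = -math.inf
    for chosen in itertools.permutations(range(len(bars)), i + j):
        total = sum(bars[k][1] for k in chosen[:i])                # (0, 1) rows
        total += sum(bars[k][0] + bars[k][1] for k in chosen[i:])  # (1, 1) rows
        best = max(best, total)
    return best


def tropical_coordinates(barcode, n, m, index_pairs):
    """Evaluate T_(i,j) of Eq. (2.7) for each (i, j) in index_pairs."""
    padded = list(barcode) + [(0, 0)] * (n - len(barcode))
    bars = [(min(b, m * l), l) for (b, l) in padded]
    return tuple(gamma(i, j, bars) for (i, j) in index_pairs)


# Order (T_1, ..., T_5) used in Example 2.2.
EXAMPLE_ORDER = [(1, 1), (0, 1), (2, 0), (1, 0), (0, 2)]
B1 = [(2, 1), (3, 1)]  # bars written as (b_i, l_i)
B2 = [(4, 4)]
m = choose_m([B1, B2])                                # 3
print(tropical_coordinates(B1, 2, m, EXAMPLE_ORDER))  # (5, 4, 2, 1, 7)
print(tropical_coordinates(B2, 2, m, EXAMPLE_ORDER))  # (8, 8, 4, 4, 8)
```

Choosing a different order of `index_pairs` corresponds to choosing a different permutation $\pi$ in Equation (2.8).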
Consequently, the barcodes $B_1 = \{[2, 1), [3, 1)\}$ and $B_2 = \{[4, 4)\}$ can be embedded in $\mathbb{R}^5$ as $(5, 4, 2, 1, 7)^\top$ and $(8, 8, 4, 4, 8)^\top$, respectively, by taking $\pi(i) = i$, $i = 1, \ldots, 5$, in Equation (2.8).

3 Problem Formulation
Let $X_1, \ldots, X_{n_1} \overset{\text{i.i.d.}}{\sim} P_1$ and $Y_1, \ldots, Y_{n_2} \overset{\text{i.i.d.}}{\sim} P_2$ be two independent random samples of geometric objects, where the probability measures $P_1$ and $P_2$ are defined on the measurable space $(\Omega, \mathcal{F})$. Here $n_1$ and $n_2$ denote the sample sizes, which may or may not be equal. Recall that $\Omega$ is defined in Equation (1.2), and $\mathcal{F} = \mathcal{B}(\Omega)$, where $\mathcal{B}(\Omega)$ is the Borel σ-algebra generated by the topology induced by the Gromov–Hausdorff metric (denoted by $d_{GH}$; see Definition 6.2) on $\Omega$. That is, $\mathcal{B}(\Omega)$ is the smallest σ-algebra that contains all open sets in the metric space $(\Omega, d_{GH})$ with the topology generated by open balls. We aim to detect topological differences between two independent collections of random geometric objects. Our approach quantifies each geometric object using topological signatures and then performs a two-sample hypothesis test for the probability distributions of the topological signatures. In this paper, we quantify the topological content of geometric objects using persistence barcodes. Let $\gamma : (\Omega, \mathcal{F}) \to (\mathcal{B}_{\le n}, \sigma(\mathcal{B}_{\le n}))$ be a measurable transformation, where $\sigma(\mathcal{B}_{\le n})$ is the smallest σ-algebra that contains all open sets in the metric space $(\mathcal{B}_{\le n}, \delta_B)$ with the topology generated by open balls. Here, $\delta_B$ denotes the bottleneck distance (see Definition 6.3). Then, for a geometric object $X \in \Omega$, $\gamma(X)$ is a topological signature representing the persistence barcode of $X$, and if $X \sim P$, then $\gamma(X) \sim P \circ \gamma^{-1}$, where $P \circ \gamma^{-1}$ denotes the push-forward of $P$ under $\gamma$. We are interested in the following two-sample hypothesis testing problem:
$$H_0' : \mu(A) = \nu(A) \text{ for all } A \in \sigma(\mathcal{B}_{\le n}) \quad \text{vs.} \quad H_1' : \mu(A) \ne \nu(A) \text{ for some } A \in \sigma(\mathcal{B}_{\le n}), \tag{3.1}$$
where $\mu \equiv P_1 \circ \gamma^{-1}$ and $\nu \equiv P_2 \circ \gamma^{-1}$.

In essence, the testing problem in Equation (3.1) is a two-sample problem on the barcode space for the two independent random samples of barcodes $B_1, \ldots, B_{n_1} \overset{\text{i.i.d.}}{\sim} \mu$ and $\tilde{B}_1, \ldots, \tilde{B}_{n_2} \overset{\text{i.i.d.}}{\sim} \nu$. However, devising a testing procedure using barcodes as data points is prohibitive due to the complex nature of barcodes. In particular, the usual mathematical operations, such as addition and multiplication, cannot be applied to a collection of intervals in $\mathbb{R}$. Therefore, to place the testing framework on a standard statistical footing in Euclidean space, we formulate an equivalent hypothesis on Euclidean space by regularizing the observed barcodes for a suitable $m \in \mathbb{N}$ and then using the tropical embeddings (see Equation (2.8)) on the regularized barcode space $\mathcal{B}^m_{\le n}$ (see Equation (2.6)).

3.1 Equivalent Hypothesis Formulation
We translate the two-sample problem on the barcode space defined in Equation (3.1) into a two-sample problem on a Euclidean space using the tropical embeddings defined in Equation (2.8). In this context, the following result from [38] regarding the statistical sufficiency of tropical embeddings will be useful.

Theorem 3.1 (Theorem 3.5 of [38]). Consider a statistical model on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$ with a family of probability measures $\mathcal{P}$ dominated by a σ-finite measure $\lambda$. Then, for a barcode $B \sim \vartheta \in \mathcal{P}$, the embedding $B \mapsto \mathcal{T}(B) = (T_{\pi(1)}(B), \ldots, T_{\pi(d)}(B))^\top \in \mathbb{R}^d$ (see Equation (2.8)), for a fixed permutation $\pi$ of $\{1, \ldots, d\}$, is a sufficient statistic for $\mathcal{P}$. In other words, for each $\vartheta \in \mathcal{P}$, the Radon–Nikodym derivative $f_\vartheta \equiv d\vartheta/d\lambda$ admits the factorization $f_\vartheta(B) = h(B)\, g_\vartheta(\mathcal{T}(B))$, where $h$ is a non-negative measurable function on $\mathcal{B}^m_{\le n}$, $g_\vartheta$ is a non-negative measurable function on $\mathbb{R}^d$, $d = n + \tfrac{1}{2} n(n+1)$, $\sigma(\mathcal{B}^m_{\le n})$ denotes the smallest σ-algebra that contains all open sets in the metric space $(\mathcal{B}^m_{\le n}, \delta_B)$ with the topology generated by open balls, and $\delta_B$ denotes the bottleneck distance (see Definition 6.3).

In view of Theorem 3.1, we can regularize the barcode space $\mathcal{B}_{\le n}$ for a suitable choice of $m \in \mathbb{N}$ and consider the two random samples of barcodes $B_1, \ldots, B_{n_1} \overset{\text{i.i.d.}}{\sim} \mu$ and $\tilde{B}_1, \ldots, \tilde{B}_{n_2} \overset{\text{i.i.d.}}{\sim} \nu$ such that $\mu$ and $\nu$ are defined on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$. Then, Theorem 3.1 can be used to formulate a hypothesis equivalent to Equation (3.1) based on the random samples $\mathcal{T}(B_1), \ldots, \mathcal{T}(B_{n_1}) \overset{\text{i.i.d.}}{\sim} F$ and $\mathcal{T}(\tilde{B}_1), \ldots, \mathcal{T}(\tilde{B}_{n_2}) \overset{\text{i.i.d.}}{\sim} G$, where $F$ and $G$ are probability distributions on $\mathbb{R}^d$ with $d = n + \tfrac{1}{2} n(n+1) \ge 2$ for $n \ge 1$. Recall that $n$ denotes the maximum number of features (bars) in a barcode $B \in \mathcal{B}_{\le n}$. Note that $F$ and $G$ can be regarded as continuous probability distributions, since tropical embeddings are continuous owing to the Lipschitz continuity of tropical coordinates with respect to the bottleneck distance (see [31]).

However, Theorem 3.1 is valid under the assumption that the probability distributions $F$ and $G$ belong to the class of exchangeable distributions. This is because, for any two tropical embeddings $\mathcal{T}_\pi$ and $\mathcal{T}_\sigma$ corresponding to two different permutations $\pi$ and $\sigma$ of $\{1, \ldots, d\}$, respectively, Theorem 3.1 yields the following by taking $h(B) \equiv 1$ without loss of generality:
$$f_\vartheta(B) = g_\vartheta(\mathcal{T}_\pi(B)) = \tilde{g}_\vartheta(\mathcal{T}_\sigma(B)), \tag{3.2}$$
where $g_\vartheta$ and $\tilde{g}_\vartheta$ are the probability densities of $\mathcal{T}_\pi(B) := (T_{\pi(1)}(B), \ldots, T_{\pi(d)}(B))^\top$ and $\mathcal{T}_\sigma(B) := (T_{\sigma(1)}(B), \ldots, T_{\sigma(d)}(B))^\top$, respectively. Here, $\tilde{g}_\vartheta \equiv g_\vartheta \circ \varphi$, where $\varphi$ is a bijection such that $\varphi(\mathcal{T}_\sigma(B)) := \mathcal{T}_\pi(B)$, and the $T_i$'s are the tropical coordinates defined in Equation (2.7).
Thus, if the densities $g_\vartheta$ and $\tilde{g}_\vartheta$ with respect to the induced probability measures $\vartheta \circ \mathcal{T}_\pi^{-1}$ and $\vartheta \circ \mathcal{T}_\sigma^{-1}$, respectively, are not the same, then a point $B \in \mathcal{B}^m_{\le n}$ has two images under $f_\vartheta$. This implies that $f_\vartheta$ is not a well-defined map from $\mathcal{B}^m_{\le n}$ to $(0, \infty)$. Consequently, Theorem 3.1 is not valid, as $f_\vartheta$ is not a probability density with respect to the probability measure $\vartheta \in \mathcal{P}$ defined on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$. Hence, the probability distributions with respect to the induced measures $\vartheta \circ \mathcal{T}_\pi^{-1}$ and $\vartheta \circ \mathcal{T}_\sigma^{-1}$ need to be exchangeable for Theorem 3.1 to be valid. Note that the condition in Equation (3.2) holds trivially if we assume that the tropical coordinates are independent and identically distributed (i.i.d.). However, in the present context, the i.i.d. assumption for the tropical coordinates is too restrictive, as an individual component of the tropical coordinates does not represent a barcode in the barcode space. In addition, the class of exchangeable distributions excludes some important classes of distributions, such as
$$\{\mathcal{N}(\theta_{d \times 1}, \Sigma) : \theta_{d \times 1} \ne \theta\, \mathbf{1}_{d \times 1},\ \Sigma \ne \sigma^2 I_{d \times d},\ \theta \in \mathbb{R},\ \sigma^2 > 0,\ d \ge 2\}.$$
Moreover, we would need to validate the assumption by testing whether the observed vector representations are exchangeable. Therefore, to allow a testing framework for a wider class of probability distributions, we propose an embedding based on the tropical coordinates defined in Equation (2.7) and establish that the embedding is a sufficient statistic. In the following theorem, we define the proposed embedding and state its statistical sufficiency.

Theorem 3.2. Consider a statistical model on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$ with a family of probability measures $\mathcal{P}$ dominated by a σ-finite measure $\lambda$. Then, for a barcode $B \sim \vartheta \in \mathcal{P}$, the map $B \mapsto \mathcal{V}(B) := (\mathcal{V}_1(B), \ldots, \mathcal{V}_d(B))^\top \in C_d$, where $\mathcal{V}_k(B) := k\text{-}\min\{T_1(B), \ldots, T_d(B)\}$, $k = 1, \ldots, d$, is a sufficient statistic for $\mathcal{P}$. Here, $C_d := \{(x_1, \ldots, x_d)^\top \in \mathbb{R}^d : x_1 \le x_2 \le \cdots \le x_d\}$, and $k\text{-}\min$ denotes the $k$-th smallest value among the tropical coordinates $\{T_1(B), \ldots, T_d(B)\}$ (see Equation (2.7)).

We refer to Section 6.2 for the proof of Theorem 3.2. In fact, the embedding $\mathcal{V}$ is a minimal sufficient statistic among all sufficient statistics generated by the tropical embeddings defined in Equation (2.8). This is because, for every sufficient statistic generated by tropical embeddings, there exists a measurable function $\Psi$ such that $\mathcal{V}(B) = \Psi(\mathcal{T}(B))$, where $\Psi$ is the map that sorts the elements of the vector $\mathcal{T}(B)$ in increasing order. Hence, by the definition of a minimal sufficient statistic (see, e.g., Definition 2.5 of [43]), $\mathcal{V}$ is a minimal sufficient statistic. Thus, as an application of Theorem 3.2, we propose to use this minimal sufficient statistic to formulate a hypothesis equivalent to the hypothesis in Equation (3.1). This allows us to perform two-sample tests for the hypothesis in Equation (3.1) through two-sample tests on the manifold $C_d$. Note that we do not require probability distributions on $C_d$ to be exchangeable to perform two-sample tests on the barcode space. We now present the main result, which states that two-sample tests on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$ can be performed using probability measures on the manifold $C_d$, via the minimal sufficient statistic from Theorem 3.2.

Theorem 3.3. Suppose we observe two independent samples of barcodes $B_1, \ldots, B_{n_1} \overset{\text{i.i.d.}}{\sim} \mu$ and $\tilde{B}_1, \ldots, \tilde{B}_{n_2} \overset{\text{i.i.d.}}{\sim} \nu$, where $\mu$ and $\nu$ are probability measures defined on $(\mathcal{B}^m_{\le n}, \sigma(\mathcal{B}^m_{\le n}))$. Let the tropical representations of the barcodes be $\mathcal{V}(B_1), \ldots, \mathcal{V}(B_{n_1}) \overset{\text{i.i.d.}}{\sim} F$ and $\mathcal{V}(\tilde{B}_1), \ldots, \mathcal{V}(\tilde{B}_{n_2}) \overset{\text{i.i.d.}}{\sim} G$, where $F$ and $G$ are supported on the manifold $C_d$ (see Theorem 3.2). Then testing $H_0'$ (see Equation (3.1)) is equivalent to testing the following hypothesis:
$$H_0 : F = G \quad \text{vs.} \quad H_1 : F \ne G. \tag{3.3}$$

4 Procedure: Test of Hypothesis
This section presents a test statistic to perform a two-sample hypothesis test for $H_0$ defined in Equation (3.3). The proposed test statistic is based on the energy distance ([18]) between the two distributions $F$ and $G$ supported on a $D$-dimensional compact smooth submanifold $\mathcal{M}$ of $\mathbb{R}^d$, $d \ge D$. Let $\rho$ be a metric on $\mathcal{M}$, and let $X, X' \overset{\text{i.i.d.}}{\sim} F$ and $Y, Y' \overset{\text{i.i.d.}}{\sim} G$ be mutually independent random variables. Then, the energy distance between $F$ and $G$, denoted by $\mathcal{E}(F, G)$, is defined as
$$\mathcal{E}(F, G) := 2\,\mathbb{E}\big(\rho(X, Y)\big) - \mathbb{E}\big(\rho(X, X')\big) - \mathbb{E}\big(\rho(Y, Y')\big), \tag{4.1}$$
where $\mathbb{E}(U)$ denotes the expectation of a random variable $U$.

Two-sample tests on Euclidean spaces based on the energy distance have been considered in the literature (see, e.g., [44]) and are shown to be consistent provided $\mathcal{E}(F, G)$ is a metric on the class of distributions under consideration. Recently, [18] extended the testing framework to manifold-valued data and provided sufficient conditions for $\mathcal{E}(F, G)$ to be a metric. In the present context, the manifold under consideration is $C_d$ (see Theorem 3.2) with the standard Euclidean metric. However, we require the following assumptions for $C_d$ to be a compact smooth submanifold of $\mathbb{R}^d$.

Assumption 1. $C_d$ is a compact submanifold of $\mathbb{R}^d$, $d \ge 2$.

Assumption 2. The class of probability distributions on the manifold $C_d$ is defined as
$$\mathcal{C} := \{F : F \text{ is absolutely continuous}\}. \tag{4.2}$$

Then, under Assumption 1 and Assumption 2, the following proposition asserts that $\mathcal{E}(F, G)$ is a metric.

Proposition 4.1.
$\mathcal{E}(F, G) = 0$ if and only if $F = G$, for $F, G \in \mathcal{C}$.

We refer to Section 6.2 for the proof of Proposition 4.1, which relies on the strong negative type condition ([42]) for the metric space $(C_d, \|\cdot\|)$, where $\|\cdot\|$ denotes the standard Euclidean metric in $\mathbb{R}^d$, $d \ge 2$. Now, we define the test statistic based on the energy statistic, which is the sample counterpart of $\mathcal{E}(F, G)$. Let $\mathcal{Z} := \mathcal{S}_1 \cup \mathcal{S}_2$ denote the pooled sample obtained from the two samples defined in Theorem 3.3, that is, $\mathcal{S}_1 := \{\mathcal{V}(B_1), \ldots, \mathcal{V}(B_{n_1})\}$ and $\mathcal{S}_2 := \{\mathcal{V}(\tilde{B}_1), \ldots, \mathcal{V}(\tilde{B}_{n_2})\}$. Then, the energy statistic, denoted by $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$, is defined as
$$\mathcal{E}_{n_1,n_2}(\mathcal{Z}) := \sum_{(X,Y) \in \mathcal{S}_1 \times \mathcal{S}_2} \frac{2}{n_1 n_2}\|X - Y\| - \sum_{(X,X') \in \mathcal{S}_1 \times \mathcal{S}_1} \frac{1}{n_1^2}\|X - X'\| - \sum_{(Y,Y') \in \mathcal{S}_2 \times \mathcal{S}_2} \frac{1}{n_2^2}\|Y - Y'\|, \tag{4.3}$$
where $A \times B$ denotes the Cartesian product of two sets $A$ and $B$, and $\|\cdot\|$ denotes the standard Euclidean metric in $\mathbb{R}^d$, $d \ge 2$. We then propose the following test for $H_0$ (see Equation (3.3)) based on $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$. Let $\alpha \in (0, 1)$ be a fixed level of significance. We propose to reject $H_0$ at level $\alpha$ if the observed value satisfies $\mathcal{E}_{n_1,n_2}(\mathcal{Z}_{\text{obs}}) \ge C_{n_1,n_2}(\alpha)$, where $\mathcal{Z}_{\text{obs}}$ denotes the pooled sample containing the observed values from the samples $\mathcal{S}_1$ and $\mathcal{S}_2$, and $C_{n_1,n_2}(\alpha)$ denotes the $(1 - \alpha)$-th quantile of the distribution of $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$ under $H_0$. To carry out the proposed testing procedure, we compute $C_{n_1,n_2}(\alpha)$ using the permutation distribution of the test statistic $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$ under $H_0$. Note that, under $H_0$, the random variables in $\mathcal{Z}$, say $\mathcal{Z} := \{Z_1, \ldots, Z_N\}$ with $N = n_1 + n_2$, are exchangeable. This implies that, under $H_0$, the values of the test statistic across all $N!$ permutations of $\{Z_1, \ldots, Z_N\}$ are equally likely.
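The statistic in Equation (4.3) and its permutation calibration can be sketched in a few lines of Python. The sketch assumes that each sample is a list of points already mapped into $C_d$; a helper for $\mathcal{V}$ (sorting the tropical coordinates, as in Theorem 3.2) is included for preparing them. Because enumerating all $N!$ permutations is infeasible beyond very small samples, the sketch substitutes the common Monte Carlo approximation based on randomly drawn permutations. This substitution, the function names, and the default number of permutations are our assumptions, not part of the procedure stated here.

```python
import math
import random


def sufficient_statistic(t):
    """V(B) of Theorem 3.2: the tropical coordinates sorted in increasing order."""
    return tuple(sorted(t))


def mean_distance(a, b):
    """Average Euclidean distance over all ordered pairs in a x b."""
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))


def energy_statistic(s1, s2):
    """Sample energy statistic of Eq. (4.3): cross term minus within-sample terms."""
    return 2 * mean_distance(s1, s2) - mean_distance(s1, s1) - mean_distance(s2, s2)


def permutation_test(s1, s2, num_perm=999, seed=0):
    """Monte Carlo permutation p-value for H0: F = G (reject when p <= alpha)."""
    rng = random.Random(seed)
    pooled = list(s1) + list(s2)
    n1 = len(s1)
    observed = energy_statistic(s1, s2)
    exceed = 0
    for _ in range(num_perm):
        rng.shuffle(pooled)  # a uniformly random relabelling of the pooled sample
        if energy_statistic(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (num_perm + 1)


s1 = [sufficient_statistic(t) for t in [(5, 4, 2, 1, 7), (6, 4, 2, 1, 8)]]
print(s1)  # [(1, 2, 4, 5, 7), (1, 2, 4, 6, 8)]
```

Rejecting $H_0$ when the returned p-value is at most $\alpha$ approximates the rule based on $C_{n_1,n_2}(\alpha)$; adding one to both the count and the number of permutations is a standard convention that keeps the Monte Carlo p-value valid.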
Thus, under $H_0$,

$$\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \sim \mathrm{Unif}\{\mathcal{E}_{n_1,n_2}(\mathcal{Z}^{\pi}_{\mathrm{obs}}) : \pi \in S_N\},$$

where $\mathcal{Z}^{\pi}_{\mathrm{obs}}$ denotes the observed pooled sample $\mathcal{Z}_{\mathrm{obs}}$ with elements ordered according to the permutation $\pi$ (its first $n_1$ elements playing the role of $\mathcal{S}_1$ and the remaining $n_2$ that of $\mathcal{S}_2$), the set is understood as a multiset of $N!$ values, and $S_N$ denotes the symmetric group on $\{1, \ldots, N\}$. Consequently, under $H_0$, for $k = \lceil (1-\alpha)\, N! \rceil$, we have

$$C_{n_1,n_2}(\alpha) = k\text{-}\min\{\mathcal{E}_{n_1,n_2}(\mathcal{Z}^{\pi}_{\mathrm{obs}}) : \pi \in S_N\}, \tag{4.4}$$

where $k\text{-}\min(A)$ denotes the $k$-th smallest value of the multiset $A$.

4.1 Asymptotic Properties of Test

This subsection establishes the consistency of the proposed test; that is, we show that the power of the proposed test tends to 1 as $\min(n_1,n_2) \to \infty$. Before the asymptotic analysis, however, we highlight that the topological signatures of the observed random geometric objects are regularized for a suitable choice of $m \in \mathbb{N}$. That is, the observed random samples of barcodes lie in a regularized subset $\mathcal{B}^m_{\le n}$ of the barcode space $\mathcal{B}_{\le n}$. Recall that the proposed sufficient statistic $\mathcal{V}$ is a measurable transformation from the regularized barcode space $\mathcal{B}^m_{\le n}$ to $\mathcal{C}_d$ (see Theorem 3.2). A data-driven choice of $m$ would vary with the sample size, rendering the domain of $\mathcal{V}$ sample-dependent and thereby complicating a rigorous asymptotic analysis. To remedy this, we propose a universal value of $m$ for regularizing the barcode space.

First, recall from the definition of $\mathcal{B}^m_{\le n}$ (see Equation (2.6)) that we retain only those barcodes whose topological features satisfy the following condition for $m \in \mathbb{N}$:

$$b_i \leq m(d_i - b_i) \iff b_i \leq \frac{m}{m+1}\, d_i \quad \text{for all } i = 1, \ldots, n, \tag{4.5}$$

where $b_i$ and $d_i$ are the birth and death times of the $i$-th feature in the barcode, related to its persistence $\ell_i$ by $\ell_i = d_i - b_i$. The condition in Equation (4.5) can be interpreted as selecting features from a subset of the persistence diagram that depends on $m$.
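Condition (4.5) is straightforward to check numerically. The helpers below are a hypothetical sketch, not code from the paper; they assume a barcode is stored as two arrays of birth and death times with $b_i \leq d_i$. One function flags which features satisfy $b_i \leq m\,\ell_i$, and the other tests whether the whole barcode lies in $\mathcal{B}^m_{\le n}$, i.e., whether every feature satisfies the condition.

```python
import numpy as np


def satisfies_condition(births, deaths, m=100):
    """Boolean mask of the features satisfying Equation (4.5).

    A feature [b_i, d_i) passes when b_i <= m * (d_i - b_i), which is
    equivalent to b_i <= m / (m + 1) * d_i.
    """
    b = np.asarray(births, dtype=float)
    d = np.asarray(deaths, dtype=float)
    persistence = d - b  # ell_i = d_i - b_i
    return b <= m * persistence


def in_regularized_space(births, deaths, m=100):
    """True when every feature satisfies (4.5), i.e. the barcode is in B^m_{<=n}."""
    return bool(np.all(satisfies_condition(births, deaths, m)))
```

For instance, a feature born at 0.99 that dies at 1.0 fails the condition for $m = 1$ (region $b \leq d/2$) but passes for $m = 100$ (region $b \leq (100/101)\,d$), which mirrors the discussion of small and large $m$ that follows.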
Note that in a typical persistence diagram, the birth of a topological feature always precedes its death. Therefore, all the topological features of a persistence diagram lie in the region $\{(b,d) \in \mathbb{R}^2 : 0 < b \leq d\}$. The condition in Equation (4.5) shrinks this region by scaling the death times by $m/(m+1)$. Thus, the higher the value of $m$, the wider the regularized region and the more features close to the diagonal it retains in the regularized subsets of barcodes. If we choose a small value of $m$, say $m = 1$, the region becomes $\{(b,d) : b \leq d/2\}$ and we leave out most of the features that are close to the diagonal, whereas if we choose a large value of $m$, say $m = 100$, the region $\{(b,d) : b \leq (100/101)\, d\} \approx \{(b,d) : b \leq 0.99\, d\}$ is close to the region $\{(b,d) : b \leq d\}$. Thus, a suitably large value such as $m = 100$ allows us to retain most of the features of the persistence diagram. Hence, an appropriate universal choice of $m$ for drawing random samples from $\mathcal{B}^m_{\le n}$ could be $m = 100$.

Now, we state the consistency of the proposed permutation test in the following theorem.

Theorem 4.1. Let the sample sizes $n_1$ and $n_2$ be such that $n_1/(n_1+n_2) \to \lambda \in (0,1)$ as $\min(n_1,n_2) \to \infty$. Then, under Assumption 1 and Assumption 2, the test based on $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$ for $H_0$ (see Equation (3.3)) is consistent, that is, under $H_1$,

$$P_{H_1}\big(\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \geq C_{n_1,n_2}(\alpha)\big) \to 1 \quad \text{as } \min(n_1,n_2) \to \infty.$$

5 Conclusion

We propose a two-sample testing framework to detect topological differences in random geometric objects. In the course of this study, we propose a sufficient statistic derived from tropical embeddings of barcodes to place the testing framework on a standard statistical footing.
As an application of Theorem 3.2, we establish that performing a two-sample test on the barcode space is equivalent to solving a two-sample problem on the ordered convex cone $\mathcal{C}_d$ in $\mathbb{R}^d$ (see Theorem 3.3). We propose a two-sample test on the manifold $\mathcal{C}_d$ based on the manifold energy statistic and derive its consistency. The proposed testing framework generalizes the hypothesis testing frameworks proposed by [41] and [4]. In particular, it can be adapted to ensembles of point cloud data. Moreover, it provides an alternative to the testing framework proposed by [35]. As future work, it would be interesting to explore extending the proposed framework to a time series of random geometric objects.

6 Appendix

6.1 Definitions

Definition 6.1. (Tame Set) We use the notion of o-minimal structures from [21] to define tame sets. Let $\mathcal{P}(\mathbb{R}^d)$ denote the power set of $\mathbb{R}^d$, and let $A \times B$ denote the Cartesian product of two sets $A$ and $B$. An o-minimal structure is defined as $\mathcal{O} := \{\mathcal{O}_d : d \geq 1\}$, where each $\mathcal{O}_d \subseteq \mathcal{P}(\mathbb{R}^d)$ satisfies the following conditions:
1. Sets in $\mathcal{O}_d$ are closed under finite intersection and complement.
2. For any set $A \in \mathcal{O}_d$, we have $A \times \mathbb{R} \in \mathcal{O}_{d+1}$ and $\mathbb{R} \times A \in \mathcal{O}_{d+1}$.
3. Let $\pi : \mathbb{R}^{d+1} \to \mathbb{R}^d$ be an axis-aligned projection map. Then, for any set $A \in \mathcal{O}_{d+1}$, we have $\pi(A) \in \mathcal{O}_d$.
4. $\mathcal{O}$ is closed with respect to all the operations of $\mathbb{R}$ that make it an ordered field, that is, comparison ($<$), addition, and multiplication.
5. The sets in $\mathcal{O}_1$ are exactly the finite unions of points and open intervals of $\mathbb{R}$.
The elements of $\mathcal{O}$ are called tame sets.

Definition 6.2. (Gromov–Hausdorff distance) We define the Gromov–Hausdorff distance between two metric spaces $(X, d_X)$ and $(Y, d_Y)$ in terms of correspondences as in [46]. A correspondence is a subset $C \subset X \times Y$ that satisfies the following:
1. For all $x \in X$, there exists $y \in Y$ such that $(x,y) \in C$.
2. For all $y \in Y$, there exists $x \in X$ such that $(x,y) \in C$.
The distortion of the correspondence $C$ is defined as

$$\mathrm{dist}(C) = \sup_{(x_1,y_1),\,(x_2,y_2) \in C} \big|d_X(x_1,x_2) - d_Y(y_1,y_2)\big|.$$

Then the Gromov–Hausdorff distance $d_{GH}(X,Y)$ between the metric spaces $X$ and $Y$ is defined as

$$d_{GH}(X,Y) = \frac{1}{2} \inf\{\mathrm{dist}(C) : C \in \mathscr{C}\},$$

where $\mathscr{C}$ denotes the class of all correspondences between $X$ and $Y$.

Definition 6.3. (Bottleneck distance) [8] Let $B_1$ and $B_2$ be two barcodes in $\mathcal{B}_{\le n}$ (see Equation (2.1)). Then $B_1$ and $B_2$ can be written as finite collections of intervals, that is, $B_1 := \{I_i : i \in [N]\}$ and $B_2 := \{J_i : i \in [M]\}$ for some positive integers $N$ and $M$ such that $\max(N,M) \leq n$. Recall that $[n]$ represents the set $\{1, \ldots, n\}$ for any $n \in \mathbb{N}$. To define the bottleneck distance, we first need to specify the distance between two features in a barcode, as well as the distance between a feature and the diagonal $\Delta = \{[b,b) : b \geq 0\}$ containing bars of length 0. We define the distance between two features $I := [b_1, \ell_1)$ and $J := [b_2, \ell_2)$ as

$$\delta_\infty(I,J) := \max\big(|b_1 - b_2|,\ |(b_1 + \ell_1) - (b_2 + \ell_2)|\big),$$

where $b_i$ represents the birth time and $\ell_i$ represents the persistence of the $i$-th feature, $i = 1, 2$. The distance between a feature $[b,\ell)$ and the diagonal $\Delta$ is defined as

$$\delta_\infty([b,\ell), \Delta) := \frac{\ell}{2}.$$
Now, consider a bijection $\phi : A \to B$, where $A \subseteq [N]$ and $B \subseteq [M]$, and define the penalty $\rho(\phi)$ of $\phi$ as

$$\rho(\phi) := \max\Big\{\max_{i \in A} \delta_\infty\big(I_i, J_{\phi(i)}\big),\ \max_{i \in [N] \setminus A} \delta_\infty(I_i, \Delta),\ \max_{j \in [M] \setminus B} \delta_\infty(J_j, \Delta)\Big\}.$$

Then the bottleneck distance between $B_1$ and $B_2$ is denoted by $\delta_B(B_1,B_2)$ and defined as

$$\delta_B(B_1,B_2) := \min_{\phi} \rho(\phi),$$

where the minimum is taken over all such bijections $\phi$.

Definition 6.4. (General Factorization Theorem [2]) Let $(\mathcal{X}, \mathcal{F})$ be a measurable space with a family of probability measures $\mathcal{M}$ dominated by a $\sigma$-finite measure $\lambda$. Then a statistic $T$ is sufficient for $\mathcal{M}$ if and only if there exist a non-negative measurable function $h$ on $\mathcal{X}$ and a set of non-negative measurable functions $\{g_\vartheta : \vartheta \in \mathcal{M}\}$ on the range of $T$ such that, for each $\vartheta \in \mathcal{M}$, the Radon–Nikodym derivative $f_\vartheta \equiv d\vartheta/d\lambda$ admits the factorization

$$f_\vartheta(x) = h(x)\, g_\vartheta(T(x)), \quad x \in \mathcal{X}.$$

6.2 Proofs of Theorems and Propositions

Proof of Theorem 3.2. We proceed by establishing that, for a barcode $B \sim \vartheta \in \mathcal{P}$, the Radon–Nikodym derivative $f_\vartheta \equiv d\vartheta/d\lambda$ factors as $f_\vartheta(B) = h(B)\, g_\vartheta(\mathcal{V}(B))$, where $h$ is a non-negative measurable function on $\mathcal{B}^m_{\le n}$ and $g_\vartheta$ is a non-negative measurable function on $\mathcal{C}_d$. Recall that $\mathcal{C}_d := \{(x_1, \ldots, x_d)^\top \in \mathbb{R}^d : x_1 \leq x_2 \leq \cdots \leq x_d\}$. Then, by the general factorization theorem (see Definition 6.4), the map $\mathcal{V}$ is a sufficient statistic.

First, we use the property that the map $\mathcal{V} : \mathcal{B}^m_{\le n} \to \mathcal{C}_d$ is injective. This holds because the tropical coordinates defined in Equation (2.7) separate the barcodes in $\mathcal{B}^m_{\le n}$, by Proposition 2.8 and Theorem 2.1 of [38]. Hence, for any two distinct points $B_1$ and $B_2$ in $\mathcal{B}^m_{\le n}$, we have $\mathcal{V}(B_1) \neq \mathcal{V}(B_2)$. Consequently, $\mathcal{V}$ is an embedding, and therefore there exists a function $\eta$ such that $\eta \circ \mathcal{V}$ is the identity map on $\mathcal{B}^m_{\le n}$ and $\mathcal{V} \circ \eta$ is the identity map on the image $\mathcal{V}(\mathcal{B}^m_{\le n}) \subseteq \mathcal{C}_d$.
Thus, we can write $f_\vartheta(B) = h(B)\, g_\vartheta(\mathcal{V}(B))$ with $h(B) = 1$ and $g_\vartheta \equiv f_\vartheta \circ \eta$. Both $h$ and $g_\vartheta$ are non-negative: $h$ is constant, and $g_\vartheta$ is non-negative because $f_\vartheta$ is a probability density on $\mathcal{B}^m_{\le n}$.

Next, we verify the measurability of the maps $h$ and $g_\vartheta$ in order to apply the general factorization theorem (see Definition 6.4). The map $h(B) = 1$ is constant, hence continuous; therefore, by Theorem 1.5 of [40], $h$ is measurable. To show that $g_\vartheta \equiv f_\vartheta \circ \eta$ is measurable, we use the fact that the composition of two measurable maps is measurable. Since $f_\vartheta$ is measurable by the Radon–Nikodym theorem, it remains to show that $\eta$ is measurable. We use the Kuratowski theorem (see Chapter 3 in [40]), which states that the inverse of an injective measurable map between complete and separable metric spaces is measurable. Let $\delta$ denote the standard Euclidean metric in $\mathbb{R}^d$. Since $(\mathcal{B}^m_{\le n}, \delta_B)$ and $(\mathcal{C}_d, \delta)$ are closed subspaces of the complete and separable metric spaces $(\mathcal{B}_{\le n}, \delta_B)$ and $(\mathbb{R}^d, \delta)$, respectively, both are themselves complete and separable. We refer to Theorem 3.2 of [4] for the completeness and separability of $(\mathcal{B}_{\le n}, \delta_B)$. Consequently, the inverse of $\mathcal{V}$, that is, the map $\eta$, is measurable. Hence, the embedding $\mathcal{V}$ is a sufficient statistic for $\mathcal{P}$ by the general factorization theorem (see Definition 6.4). This completes the proof of Theorem 3.2.

Proof of Theorem 3.3. We use Theorem 3.2 and apply the general factorization theorem (see Definition 6.4) with $h(B) \equiv 1$ without loss of generality to establish Theorem 3.3. Let $f_\mu$, $f_\nu$, $g_\mu$ and $g_\nu$ denote the probability densities of $\mu$, $\nu$, $F$ and $G$, respectively. Recall that $F$ and $G$ are the probability distributions corresponding to the induced probability measures $\mu \circ \mathcal{V}^{-1}$ and $\nu \circ \mathcal{V}^{-1}$, respectively.
We need to show that any statistical decision (accept or reject) for $H'_0$ (see Equation (3.1)) is valid for $H_0$ (see Equation (3.3)), and vice versa. Consider the situation when we accept the null hypothesis $H'_0$. This implies that, for every $A \in \sigma(\mathcal{B}^m_{\le n})$ and $B \in \mathcal{B}^m_{\le n}$, we have

$$\mu(A) = \nu(A) \iff f_\mu(B) = f_\nu(B), \tag{6.1}$$

where Equation (6.1) follows from the fact that the probability density of a random variable uniquely characterizes its probability measure. Now, using the sufficiency of the tropical embedding $\mathcal{V}$ from Theorem 3.2, we have

$$f_\mu(B) = f_\nu(B) \iff g_\mu(\mathcal{V}(B)) = g_\nu(\mathcal{V}(B)) \iff F(\mathcal{V}(B)) = G(\mathcal{V}(B)). \tag{6.2}$$

Therefore, by Equations (6.1) and (6.2), for any arbitrary $B \in \mathcal{B}^m_{\le n}$ and $A \in \sigma(\mathcal{B}^m_{\le n})$, we have

$$\mu(A) = \nu(A) \iff F(\mathcal{V}(B)) = G(\mathcal{V}(B)). \tag{6.3}$$

Now, consider the situation when we reject $H'_0$, that is, there exist $A \in \sigma(\mathcal{B}^m_{\le n})$ and $B \in \mathcal{B}^m_{\le n}$ such that

$$\mu(A) \neq \nu(A) \iff f_\mu(B) \neq f_\nu(B). \tag{6.4}$$

This further implies, by Theorem 3.2, that

$$f_\mu(B) \neq f_\nu(B) \iff g_\mu(\mathcal{V}(B)) \neq g_\nu(\mathcal{V}(B)) \iff F(\mathcal{V}(B)) \neq G(\mathcal{V}(B)). \tag{6.5}$$

Therefore, by Equations (6.4) and (6.5), there exist $A \in \sigma(\mathcal{B}^m_{\le n})$ and $B \in \mathcal{B}^m_{\le n}$ such that

$$\mu(A) \neq \nu(A) \iff F(\mathcal{V}(B)) \neq G(\mathcal{V}(B)). \tag{6.6}$$

Thus, Equations (6.3) and (6.6) establish that the hypotheses $H'_0$ (see Equation (3.1)) and $H_0$ (see Equation (3.3)) are equivalent. This establishes the statement in Theorem 3.3.

Proof of Proposition 4.1. We need to show that the metric space $(\mathcal{C}_d, \|\cdot\|)$ has strong negative type ([42]). Then, by Proposition 3 of [45], $\mathcal{E}(F,G)$ is a metric on the class of distribution functions $\mathcal{C}$ (see Equation (4.2)). The metric space $(\mathcal{C}_d, \|\cdot\|)$ has strong negative type if, for any two probability distributions $F$ and $G$ supported on $\mathcal{C}_d$ and for mutually independent random variables $X$, $X'$, $Y$ and $Y'$ such that $X \overset{D}{=} X'$ and $Y \overset{D}{=} Y'$, where $X \sim F$ and $Y \sim G$, we have

$$2\,\mathbb{E}\|X - Y\| - \mathbb{E}\|X - X'\| - \mathbb{E}\|Y - Y'\| \geq 0, \tag{6.7}$$

with equality in Equation (6.7) if and only if $F = G$. The condition of strong negative type for the manifold $\mathcal{C}_d$ under the Euclidean metric holds by Theorem 2.1 of [3]. Hence, the metric space $(\mathcal{C}_d, \|\cdot\|)$ has strong negative type, and by Proposition 3 of [45], $\mathcal{E}(F,G)$ is a metric on $\mathcal{C}$. This establishes the assertion in Proposition 4.1.

Proof of Theorem 4.1. The proof proceeds in two steps. First, we show that the energy statistic $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$ converges in probability to its population counterpart $\mathcal{E}(F,G)$ under $H_1$, that is,

$$\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \xrightarrow{P} \mathcal{E}(F,G) \quad \text{as } \min(n_1,n_2) \to \infty. \tag{6.8}$$

Equation (6.8) follows directly from the asymptotic theory of U-statistics. In particular, we apply Theorem 12.6 of [47] to the first term of $\mathcal{E}_{n_1,n_2}(\mathcal{Z})$ and Theorem 12.3 of [47] to the remaining two terms. Second, we use Lemma A.2 of [18], which establishes that for every $\varepsilon > 0$ there exists $0 < M < \infty$ such that, for any permutation $\pi \in S_N$,

$$\liminf_{\min(n_1,n_2) \to \infty} P\big((n_1 + n_2)\, \mathcal{E}_{n_1,n_2}(\mathcal{Z}^{\pi}) < M\big) \geq 1 - \varepsilon, \tag{6.9}$$

where $\mathcal{Z}^{\pi}$ denotes the pooled sample with elements ordered according to the permutation $\pi$ on $\{1, \ldots, n_1 + n_2\}$.
Now, consider the following probability under $H_1$:

$$\begin{aligned}
P_{H_1}\big(\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \geq C_{n_1,n_2}(\alpha)\big)
&= P_{H_1}\big((n_1+n_2)\,\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \geq (n_1+n_2)\,C_{n_1,n_2}(\alpha)\big) \\
&\geq P_{H_1}\big((n_1+n_2)\,\mathcal{E}_{n_1,n_2}(\mathcal{Z}) \geq M\big) \qquad \text{(E.1)} \\
&\to P_{H_1}\big(\mathcal{E}(F,G) \geq 0\big) \quad \text{as } \min(n_1,n_2) \to \infty \qquad \text{(E.2)} \\
&= 1,
\end{aligned}$$

since $\mathcal{E}(F,G) > 0$ under $H_1$ by Proposition 4.1, where (E.1) follows from applying Equation (6.9) to Equation (4.4), and (E.2) follows from Equation (6.8). This establishes the consistency of the proposed test.

References

[1] Adcock, A., Carlsson, E., and Carlsson, G. (2016). The Ring of Algebraic Functions on Persistence Barcodes. Homology, Homotopy and Applications, 18(1):381–402.
[2] Bahadur, R. R. (1954). Sufficiency and Statistical Decision Functions. The Annals of Mathematical Statistics, 25(3):423–462.
[3] Baringhaus, L. and Franz, C. (2004). On a new multivariate two-sample test. Journal of Multivariate Analysis, 88(1):190–206.
[4] Blumberg, A. J., Gal, I., Mandell, M. A., and Pancia, M. (2014). Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces. Foundations of Computational Mathematics, 14(4):745–789.
[5] Bobrowski, O., Mukherjee, S., and Taylor, J. E. (2017). Topological consistency via kernel estimation. Bernoulli, 23(1):288–328.
[6] Boyer, D. M., Lipman, Y., Clair, E. S., Puente, J., Patel, B. A., Funkhouser, T., Jernvall, J., and Daubechies, I. (2011). Algorithms to automatically quantify the geometric similarity of anatomical surfaces. Proceedings of the National Academy of Sciences, 108(45):18221–18226.
[7] Bubenik, P., Carlsson, G., Kim, P., and Luo, Z.-M. (2010). Statistical topology via Morse theory persistence and nonparametric estimation. Algebraic Methods in Statistics and Probability II. Contemporary Mathematics, 516:75–92.
[8] Carlsson, G. (2014). Topological pattern recognition for point cloud data. Acta Numerica, 30:289–368.
[9] Carlsson, G. (2020). Topological methods for data modelling. Nature Reviews Physics, 2(12):697–708.
[10] Carlsson, G. and Kališnik Verovšek, S. (2016). Symmetric and r-symmetric tropical polynomials and rational functions. Journal of Pure and Applied Algebra, 220(11):3610–3627.
[11] Carlsson, G. and Vejdemo-Johansson, M. (2021). Topological Data Analysis with Applications. Cambridge University Press.
[12] Carlsson, G., Zomorodian, A., Collins, A., and Guibas, L. (2004). Persistence barcodes for shapes. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, SGP '04, pages 124–135, New York, NY, USA. Association for Computing Machinery.
[13] Chazal, F., Cohen-Steiner, D., and Mérigot, Q. (2011a). Geometric inference for measures based on distance functions. Foundations of Computational Mathematics, 11(6):733–751.
[14] Chazal, F., Fasy, B., Lecci, F., Michel, B., Rinaldo, A., and Wasserman, L. (2017). Robust topological inference: Distance to a measure and kernel distance. J. Mach. Learn. Res., 18(1):5845–5884.
[15] Chazal, F., Glisse, M., Labruère, C., and Michel, B. (2014). Convergence rates for persistence diagram estimation in topological data analysis. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML'14, pages I–163–I–171. JMLR.org.
[16] Chazal, F., Guibas, L. J., Oudot, S. Y., and Skraba, P. (2011b). Scalar field analysis over point cloud data. Discrete & Computational Geometry, 46(4):743–775.
[17] Chazal, F. and Michel, B. (2021). An introduction to topological data analysis: Fundamental and practical aspects for data scientists. Frontiers in Artificial Intelligence, 4.
[18] Chu, L. and Dai, X. (2024). Manifold energy two-sample test. Electronic Journal of Statistics, 18(1):145–166.
[19] Collins, A., Zomorodian, A., Carlsson, G., and Guibas, L. J. (2004). A barcode shape descriptor for curve point cloud data. Computers & Graphics, 28:881–894.
[20] Crawford, L., Monod, A., Chen, A. X., Mukherjee, S., and Rabadán, R. (2020). Predicting clinical outcomes in glioblastoma: An application of topological and functional data analysis. Journal of the American Statistical Association, 115(531):1139–1150.
[21] Curry, J., Mukherjee, S., and Turner, K. (2022). How many directions determine a shape and other sufficiency results for two topological transforms. Transactions of the American Mathematical Society, Series B, 9:1006–1043.
[22] Dries, L. P. D. v. d. (1998). Tame Topology and O-minimal Structures. London Mathematical Society Lecture Note Series. Cambridge University Press.
[23] Dupuis, P. and Grenander, U. (1998). Variational problems on flows of diffeomorphisms for image matching. Q. Appl. Math., LVI(3):587–600.
[24] Edelsbrunner, H. and Harer, J. (2008). Persistent homology—a survey. Contemporary Mathematics 453, 26:257–282.
[25] Edelsbrunner, H., Letscher, D., and Zomorodian, A. (2002). Topological persistence and simplification. Discrete & Computational Geometry, 28(4):511–533.
[26] Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., and Singh, A. (2014). Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301–2339.
[27] Fefferman, C., Mitter, S., and Narayanan, H. (2016). Testing the manifold hypothesis. Journal of the American Mathematical Society, 29:983–1049. Published electronically: February 9, 2016.
[28] Gao, T., Kovalsky, S. Z., Boyer, D. M., and Daubechies, I. (2019). Gaussian process landmarking for three-dimensional geometric morphometrics. SIAM Journal on Mathematics of Data Science, 1(1):237–267.
[29] Ghrist, R. (2008). Barcodes: The persistent topology of data. Bulletin of the American Mathematical Society (New Series), 45:61–75.
[30] Hatcher, A. (2002). Algebraic Topology. Cambridge University Press, Cambridge.
[31] Kališnik, S. (2019). Tropical coordinates on the space of persistence barcodes. Foundations of Computational Mathematics, 19(1):101–129.
[32] Kendall, D. G. (1989). A Survey of the Statistical Theory of Shape. Statistical Science, 4(2):87–99.
[33] Kumar, S. and Dhar, S. S. (2025). A novel characterization of structures in smooth regression curves: from a viewpoint of persistent homology.
[34] Kumar, S. and Dhar, S. S. (2026). Testing homological equivalence using Betti numbers: Probabilistic properties. Theory of Probability & Its Applications, 71(1). To appear.
[35] Meng, K., Wang, J., Crawford, L., and Eloyan, A. (2025). Randomness of shapes and statistical inference on shapes via the smooth Euler characteristic transform. Journal of the American Statistical Association, 120(549):498–510.
[36] Mileyko, Y., Mukherjee, S., and Harer, J. (2011). Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007.
[37] Milnor, J. (1963). Morse Theory, volume 51 of Annals of Mathematics Studies. Princeton University Press, Princeton, NJ.
[38] Monod, A., Kališnik, S., Patiño Galindo, J. A., and Crawford, L. (2019). Tropical sufficient statistics for persistent homology. SIAM Journal on Applied Algebra and Geometry, 3(2):337–371.
[39] Munkres, J. (1984). Elements of Algebraic Topology. Westview Press; 1st edition.
[40] Parthasarathy, K. (1967). Probability and mathematical statistics: A series of monographs and textbooks. In Probability Measures on Metric Spaces, Probability and Mathematical Statistics: A Series of Monographs and Textbooks, page ii. Academic Press.
[41] Robinson, A. and Turner, K. (2017). Hypothesis testing for topological data analysis. Journal of Applied and Computational Topology, 1(2):241–261.
[42] Schoenberg, I. J. (1938). Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44(3):522–536.
[43] Shao, J. (2003). Mathematical Statistics. Springer Texts in Statistics. Springer, New York, NY, 2nd edition.
[44] Szekely, G. and Rizzo, M. (2004). Testing for equal distributions in high dimension. InterStat, 5.
[45] Székely, G. J. and Rizzo, M. L. (2017). The energy of data. Annual Review of Statistics and Its Application, 4(1):447–479.
[46] van Delft, A. and Blumberg, A. J. (2025). A statistical framework for analyzing shape in a time series of random geometric objects. The Annals of Statistics, 53(2):561–588.
[47] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.
[48] Wasserman, L. (2018). Topological data analysis. Annual Review of Statistics and Its Application, 5(1):501–532.
[49] Zomorodian, A. and Carlsson, G. (2005). Computing persistent homology. Discrete Comput. Geom., 33(2):249–274.
