Medians of populations of persistence diagrams

Katharine Turner
February 7, 2019

Abstract
Persistence diagrams are common objects in the field of Topological Data Analysis. They are topological summaries that capture both topological and geometric structure within data. Recently there has been a surge of interest in developing tools to statistically analyse populations of persistence diagrams, a process hampered by the complicated geometry of the space of persistence diagrams. In this paper we study the median of a set of diagrams, defined as the minimizer of an appropriate cost function analogous to the sum of distances used for samples of real numbers. We then characterize the local minima of this cost function and in doing so characterize the median. We also do some comparative analysis of the properties of the median and the mean.

1 Introduction
Topological data analysis (TDA) is a rapidly growing field that uses multi-scale topological features to find and analyse structure in data. An important use of TDA is as a preprocessing tool, providing topological summaries that may be more tractable than the raw information and that may highlight geometric and topological features of particular interest. Examples of applications include the analysis of the shape of human jaws [14], plant root systems [3], shapes of the calcanei of various primates [29] and retrieval of trademark symbols [8]. Persistence diagrams are a common topological summary statistic. Each diagram is a discrete summary of how the homology evolves over a nested sequence of topological spaces parameterized by some tuning parameter. Homology computes features such as the number of connected components, loops, tunnels, and voids. Although persistent homology is defined via homology, the tuning parameter captures geometric information.
As part of the growing field of object oriented data analysis, we wish to take the structure of the space of persistence diagrams into account when performing statistical analysis. This is complicated by the geometry of the space of persistence diagrams: it is infinite dimensional and has no upper bound on curvature. At no point does it locally look Euclidean. The papers [22, 28] study the geometric and analytic properties of the space of persistence diagrams equipped with the metric analogous to the 2-Wasserstein distance between probability measures and to the $L_2$ distance between functions on a discrete set. We denote this space $(\mathcal{D}, d_2)$. In [22] it was shown that it is possible to define the mean and variance of "nice" distributions via the Fréchet function. In [28] it was shown that $(\mathcal{D}, d_2)$ is a non-negatively curved Alexandrov space, and this structure was used to characterize the mean of a population of persistence diagrams and to provide an algorithm to compute it.

After the mean, the median is the next most common statistic used to describe the center of a distribution. Furthermore, the median is often more robust than the mean. The purpose of this paper is to provide an analogous analysis for the space of persistence diagrams under the corresponding metric for $p = 1$ (instead of $p = 2$) and to characterize the median of a population of persistence diagrams. We are also interested in comparing the properties of the mean and the median of populations of persistence diagrams.

In Section 2 we establish geometric properties of the space of persistence diagrams, such as curvature and connected components, for a natural family of metrics $\{(\mathcal{D}, d_p)\}$ that are analogous to the $p$-Wasserstein distances on the space of probability measures and to the $L_p$ distances on the space of functions on a discrete set.
Just as spaces of functions equipped with an $L_2$ metric have nicer properties than those with an $L_p$ metric for $p \neq 2$, we see that although $(\mathcal{D}, d_2)$ is a non-negatively curved Alexandrov space, this does not hold for $(\mathcal{D}, d_p)$ when $p \neq 2$.

The median of a population is the location that minimizes the average distance to each of the members of the population. In Section 3 we study the median of a population of persistence diagrams. Unfortunately, the proofs in [28] for characterizing and computing the mean require the extra Alexandrov space structure available when $p = 2$ (rather than $p = 1$, which is the scenario for the median). We therefore needed to develop new methods in order to characterize the median of a population of persistence diagrams. In the appendix we summarize how these new methods provide a new proof of previous results about the mean. Another aspect in which the analysis here differs from that in [28] is that we consider persistence diagrams containing points with infinite persistence.

Unlike the mean, the median of a population of even size is not unique. For $N$ odd, the median of a set of real numbers $a_1, a_2, \ldots, a_N$, written in non-decreasing order, is $a_{(N+1)/2}$. If $N$ is even, then every number in the interval $[a_{N/2}, a_{(N+2)/2}]$ minimizes the average distance to the sample set and hence would be a valid choice of median. To overcome this lack of uniqueness, the general convention is to declare the median to be the midpoint of $[a_{N/2}, a_{(N+2)/2}]$. Analogous conventions could be applied for medians of an even number of persistence diagrams. However, the statements of the theory quickly become more complicated and the exposition less clear.
For this reason we will restrict our attention to the case when the size of the population is odd. In Section 4 we compare various properties of the median to those of the mean. We bound the number of off-diagonal points in the median compared to the mean, show that the median is more robust than the mean, and prove that the mean is generically unique while the median is not.

1.1 Related work
Statistical analysis of persistence diagrams is an example of object oriented data analysis. This is an emerging area of statistics where the goal is to develop and apply statistical methods to objects such as functions, images, graphs or trees (e.g. [19, 21, 20, 25, 27, 30]) while respecting the structure of these objects. There has been a growing movement to develop statistical methods for understanding populations of persistence diagrams. This includes a significant body of work (for example see [6, 4, 9, 10, 13]) which studies statistical methodology using the bottleneck distance, which is effectively the $L_\infty$ distance within the space of persistence diagrams. There has also been a parallel movement of statistical analysis in TDA that converts persistence diagrams into other functional summaries. These new topological summaries lie in spaces where it is easier to adapt traditional statistical methods, such as computing means or performing t-tests. However, this can come at the expense of making it harder to provide topological interpretations of the results. Examples of such functional summaries include persistence landscapes [7] and persistent homology rank functions [26]. Other related work investigates the homology or persistent homology of random topological objects (for example [1, 5, 15, 16, 32, 31]), including limit theorems and analysis of simulations.
2 Geometry of the space of Persistence Diagrams
2.1 Background theory on persistent homology and persistence diagrams
Here we provide a very brief introduction to persistent homology. A more complete coverage can be found in [12]. Although homology can be computed over any ring, to compute persistent homology we need a field. This is usually $\mathbb{F}_2$ for computational purposes. To define persistent homology, we start with a nested sequence of topological spaces,
$$X_0 \subseteq X_1 \subseteq X_2 \subseteq \cdots \subseteq X_n = X. \tag{2.1}$$
Often this sequence arises from the sublevel sets of a function $f : X \to \mathbb{R}$, with $X_i = f^{-1}(-\infty, t_i]$ for a sequence $-\infty \le t_0 \le t_1 \le \cdots \le t_n \le \infty$. The sequence induces linear maps on homology for any dimension $r$:
$$H_r(X_0) \to H_r(X_1) \to \cdots \to H_r(X_n).$$
We are interested in when homology classes appear and disappear in this sequence. Let $\phi_i^j : H_r(X_i) \to H_r(X_j)$ be the linear map on homology induced by the inclusion $X_i \hookrightarrow X_j$. Observe that if $i < j < k$ then $\phi_j^k \circ \phi_i^j = \phi_i^k$. The homology class $\gamma \in H_r(X_i)$ is said to be born at $X_i$ if it is not in the image of $\phi_{i-1}^i$. This same class is said to die at $X_j$ if its image in $H_r(X_{j-1})$ is not in the image of $\phi_{i-1}^{j-1}$, but its image in $H_r(X_j)$ is in the image of $\phi_{i-1}^j$. In the case that the spaces arose from the sublevel sets of a function $f$ as defined above, we say that $\gamma$ is born at time $t_i$, dies at time $t_j$, and its lifetime is $t_j - t_i$. A discrete description of the persistent homology of (2.1) is the multiset of points in the extended plane where we include $(t_i, t_j)$ with multiplicity equal to the dimension of the space of homology classes born at $t_i$ and dying at $t_j$. This is the information recorded as the ($r$th dimensional) persistence diagram.
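To make the sublevel-set construction concrete, here is a minimal sketch (not taken from this paper) of 0-dimensional persistence for a function sampled on the vertices of a path graph, where vertex $i$ enters at time `values[i]` and the edge between consecutive vertices enters at the larger of its two endpoint values. It uses a union-find structure and the standard "elder rule": when two components merge, the one born later dies. The function name `sublevel_persistence_0d` is hypothetical, classes with zero lifetime are dropped, and the component that never dies is reported with death time `math.inf`.

```python
import math


def sublevel_persistence_0d(values):
    """0-dimensional persistence pairs (birth, death) of the sublevel-set filtration
    of a function on a path graph, where values[i] is the value at vertex i."""
    n = len(values)
    parent = list(range(n))
    birth = {}               # root of a component -> birth time of that component
    added = [False] * n

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda v: values[v]):
        added[i] = True      # vertex i enters the filtration at time values[i]
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the later birth dies at this time.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:   # skip zero-lifetime classes
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    if n:
        pairs.append((min(values), math.inf))  # the oldest component never dies
    return sorted(pairs)


print(sublevel_persistence_0d([0, 3, 1, 4, 2]))
# -> [(0, inf), (1, 3), (2, 4)]
```

In the usage example the local minima at values 1 and 2 are born as separate components and die when they merge into the older component at values 3 and 4; the global minimum gives the single class of infinite lifetime.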
Although we have here restricted our definition of persistent homology to that of a finite nested sequence, it can be defined more generally for nested parameterized families of topological spaces $\{X_t : t \in \mathbb{R}\}$ with the condition that $X_s \subseteq X_t$ whenever $s \le t$ (alongside some technical finiteness conditions). We can think of this as a filtered space with filtration parameter $t$. For "nice" filtered spaces the persistence diagram can be defined analogously to the finite case. For details see [11]. It is worth observing that although we are computing homology, we are often capturing geometric information through the filtration parameter. For example, if we consider a filtration of $\mathbb{R}^2$ by the distance from a circle of radius $R$, then the 1-dimensional persistent homology has exactly one class, born at 0 (the time the circle first appears) and dying at $R$ (the time when the circle is filled in).

Before giving a technical definition of a persistence diagram we need to introduce some notation. Let $\mathbb{R}^{2+} = \{(a, b) \in \mathbb{R}^2 : a < b\}$. This is the part of the plane above the diagonal. Let $\Delta$ denote an abstract element representing the diagonal $\{(x, x) : x \in \mathbb{R}\}$. Since we also want to consider persistent homology classes of infinite duration, we also include lines at infinity. Let $L_{-\infty} := \{(-\infty, b) : b \in \mathbb{R}\}$ and $L_\infty := \{(a, \infty) : a \in \mathbb{R}\}$. Persistence diagrams will be multisets of points in $L_\infty \cup L_{-\infty} \cup \mathbb{R}^{2+} \cup \Delta$. For tractability we impose some finiteness conditions, namely that only finitely many classes have infinite lifetimes and that the sum of all the finite lifetimes is finite. This restriction is not onerous in applications, where we generally have finite sized data as input.

Definition 1.
A persistence diagram $X$ is a multiset of $L_\infty \cup L_{-\infty} \cup \mathbb{R}^{2+} \cup \Delta$ such that
• the number of elements in $X|_{L_\infty}$ and in $X|_{L_{-\infty}}$ is finite,
• $\sum_{(x_i, y_i) \in X|_{\mathbb{R}^{2+}}} (y_i - x_i) < \infty$,
• there are countably infinitely many copies of $\Delta$.

2.2 Metrics on the space of persistence diagrams
In this paper we differ slightly from the historical definition of a persistence diagram. Instead of including every point along the diagonal of $\mathbb{R}^2$ with infinite multiplicity, we include countably many copies of the diagonal. This alternative description yields the same distances between different persistence diagrams but simplifies statements, arguments and calculations, because we do not need to specify which point on the diagonal is being used.

Let $\mathcal{D}$ denote the space of all persistence diagrams. We consider a family of metrics which are analogous to the $p$-Wasserstein distances on the space of probability measures and to the $L_p$ distances on the space of functions on a discrete set. $\mathbb{R}^{2+}$ inherits natural $L_p$ distances from $\mathbb{R}^2$. For $p \in [1, \infty)$ we have $\|(a_1, b_1) - (a_2, b_2)\|_p^p = |a_1 - a_2|^p + |b_1 - b_2|^p$ and $\|(a_1, b_1) - (a_2, b_2)\|_\infty = \max\{|a_1 - a_2|, |b_1 - b_2|\}$. Recall that $\Delta$ represents the diagonal in $\mathbb{R}^2$. With a slight abuse of notation we write $\|(a, b) - \Delta\|_p$ to denote the shortest $L_p$ distance from $(a, b)$ to a point in the diagonal set in $\mathbb{R}^2$. Thus
$$\|(a, b) - \Delta\|_p = \inf_{t \in \mathbb{R}} \|(a, b) - (t, t)\|_p = 2^{\frac{1}{p} - 1}|b - a| \quad \text{for } p < \infty,$$
and $\|(a, b) - \Delta\|_\infty = \inf_{t \in \mathbb{R}} \|(a, b) - (t, t)\|_\infty = |b - a|/2$. Both $L_{-\infty}$ and $L_\infty$ inherit natural $L_p$ distances from the $L_p$ metric on $\mathbb{R}$: $\|(-\infty, b_1) - (-\infty, b_2)\|_p = |b_1 - b_2|$ and $\|(a_1, \infty) - (a_2, \infty)\|_p = |a_1 - a_2|$. We should also think of $L_\infty$, $L_{-\infty}$ and $\mathbb{R}^{2+} \cup \Delta$ as three separate disjoint parts of a larger space.
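The closed form for $\|(a, b) - \Delta\|_p$ can be sanity-checked numerically. The sketch below (illustrative function names, not from the paper) compares it with a brute-force minimum of $\|(a, b) - (t, t)\|_p$ over an evenly spaced grid of $t \in [a, b]$; restricting to $[a, b]$ loses nothing, because moving $t$ outside that interval increases both coordinate gaps.

```python
import math


def lp_norm(v, p):
    """L_p norm of a 2-vector; p may be math.inf."""
    if math.isinf(p):
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def dist_to_diagonal(point, p):
    """Closed form ||(a, b) - Delta||_p."""
    a, b = point
    if math.isinf(p):
        return abs(b - a) / 2
    return 2 ** (1.0 / p - 1) * abs(b - a)


def dist_to_diagonal_grid(point, p, steps=2001):
    """Brute-force min over t in [a, b] of ||(a, b) - (t, t)||_p.
    An odd number of steps puts the midpoint (a + b) / 2 on the grid."""
    a, b = point
    best = math.inf
    for k in range(steps):
        t = a + (b - a) * k / (steps - 1)
        best = min(best, lp_norm((a - t, b - t), p))
    return best


for p in (1, 1.5, 2, 3, math.inf):
    print(p, dist_to_diagonal((1, 4), p), dist_to_diagonal_grid((1, 4), p))
```

For $p = 1$ every $t \in [a, b]$ attains the minimum $|b - a|$, while for $p > 1$ (including $p = \infty$) the unique minimizer is the midpoint $t = (a + b)/2$.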
Given persistence diagrams $X$ and $Y$ we can consider all the bijections from the set of off-diagonal points and copies of $\Delta$ in $X$ to the set of off-diagonal points and copies of $\Delta$ in $Y$. This set is non-empty as it contains the bijection which matches everything to a copy of $\Delta$ in the other diagram. Each bijection provides a transport plan from $X$ to $Y$. Analogous to the definition of Wasserstein distances, we define our family of metrics in terms of the cost of the most efficient transport plan. For each $p \in [1, \infty)$ define
$$d_p(X, Y) = \left(\inf_{\text{bijections } \phi : X \to Y} \sum_{x \in X} \|x - \phi(x)\|_p^p\right)^{1/p}$$
and
$$d_\infty(X, Y) = \inf_{\text{bijections } \phi : X \to Y} \sup_{x \in X} \|x - \phi(x)\|_\infty.$$
These distances may be infinite; for example, if $X$ and $Y$ contain a different number of points in $L_\infty$ then $d_p(X, Y) = \infty$ for all $p$.

Figure 1: The dashed lines indicate an optimal bijection from the persistence diagram containing the square points (and copies of $\Delta$) to the persistence diagram containing the triangle points (and copies of $\Delta$).

We call a bijection between diagrams optimal for $d_p$ if it achieves the infimum in the definition of $d_p$ and this distance is finite. Figure 1 illustrates an example of an optimal bijection. Given the same pair of diagrams but different values of $p$, different bijections may be optimal. Furthermore, optimal bijections for a given $p$ are not necessarily unique. For example, let $X$ and $Y$ be diagrams containing pairs of opposite corners of a square located far from the diagonal. By symmetry, matching the points vertically or horizontally involves the same cost. This example works for every $p \in [1, \infty]$. Other examples of non-uniqueness can involve $\Delta$: it may be equally efficient to match two points to each other as it is to match both with a copy of $\Delta$.
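For very small diagrams $d_p$ can be computed by brute force, which is handy for checking examples such as the square above. This sketch relies on a standard reduction (an assumption of the sketch, not a construction taken from the paper): pad $X$ with one copy of $\Delta$ per off-diagonal point of $Y$ and vice versa, so that every bijection between the padded lists is a transport plan. The names `dp_brute_force` and `DIAG` are hypothetical, only finite $p$ and diagrams without points at infinity are handled, and the running time is factorial in the number of points.

```python
import itertools
import math

DIAG = None  # hypothetical marker for a copy of the diagonal Delta


def pp_cost(u, v, p):
    """||u - v||_p^p, where u or v may be a copy of Delta (p finite)."""
    if u is DIAG and v is DIAG:
        return 0.0
    if u is DIAG or v is DIAG:
        a, b = v if u is DIAG else u
        return (2 ** (1 / p - 1) * abs(b - a)) ** p   # ||(a, b) - Delta||_p^p
    return abs(u[0] - v[0]) ** p + abs(u[1] - v[1]) ** p


def dp_brute_force(X, Y, p):
    """d_p(X, Y) for finite diagrams given as lists of off-diagonal points.

    Returns (distance, optimal_plans); each plan records, for every point of X,
    the index of its partner in Y, or None if it is matched to Delta."""
    nx, ny = len(X), len(Y)
    xs = list(X) + [DIAG] * ny
    ys = list(Y) + [DIAG] * nx
    best, plans = math.inf, set()
    for perm in itertools.permutations(range(nx + ny)):
        cost = sum(pp_cost(xs[i], ys[j], p) for i, j in enumerate(perm))
        # Permuting copies of Delta does not change the plan, so record only
        # where each off-diagonal point of X is sent.
        plan = tuple(j if j < ny else None for j in perm[:nx])
        if cost < best - 1e-12:
            best, plans = cost, {plan}
        elif abs(cost - best) <= 1e-12:
            plans.add(plan)
    return best ** (1 / p), plans


X = [(0, 10), (1, 11)]   # opposite corners of a unit square, far from the diagonal
Y = [(1, 10), (0, 11)]   # the other two corners
print(dp_brute_force(X, Y, 1))
```

For the square this should report $d_1 = 2$ together with two optimal plans: `(0, 1)` (horizontal matching) and `(1, 0)` (vertical matching). For anything beyond toy sizes one would instead solve the padded assignment problem with a polynomial-time method such as the Hungarian algorithm.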
In theory, for every pair $p, q \in [1, \infty]$ one could construct a distance function of the form
$$\left(\inf_{\phi : X \to Y} \sum_{x \in X} \|x - \phi(x)\|_q^p\right)^{1/p}$$
with $p$ and $q$ different. Some of the computational topology literature uses a family of metrics $d^W_p$ where $p$ varies but $q = \infty$ is fixed. The families $\{d_p\}$ and $\{d^W_p\}$ share many properties. The metrics $d_p$ and $d^W_p$ are bi-Lipschitz equivalent: for any $x, y \in \mathbb{R}^2$ we have $\|x - y\|_\infty \le \|x - y\|_p \le 2\|x - y\|_\infty$ (and the same inequalities hold for distances to $\Delta$, since $2^{1/p - 1} \in (1/2, 1]$), implying $d^W_p(X, Y) \le d_p(X, Y) \le 2 d^W_p(X, Y)$. Any stability results for $\{d_p\}$ or $\{d^W_p\}$ would extend (with minor changes in constants) to stability results for the other.

We feel that the choice of setting $q = p$ is cleaner in theory and in practice. The coordinates of the points within a persistence diagram have particular meanings: one is the birth time and one is the death time. They are often infinitesimally independent (even though not globally so). For example, if we have generated our persistence diagram from the distance function to a point cloud, then each persistence class has its birth and death times (infinitesimally) determined by the locations of two pairs of points, which are often distinct. Whenever these pairs are distinct, moving any of these four points will change either the birth or the death but not both. Treating birth and death times as separate quantities may seem more philosophically pleasing to the reader in the setting of barcodes. Another argument in favor of using $p = q$ is computational power. The mean and the median are defined to be the minimizers of cost functions involving $d_2$ and $d_1$ respectively. Computationally, the mean and median have far nicer characterizations, with easy algorithms to find them, when they are defined using the metrics where $p = q$. This is discussed later in Section 3.
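The two-sided bound $\|x - y\|_\infty \le \|x - y\|_p \le 2\|x - y\|_\infty$ is easy to probe numerically. The sketch below (hypothetical helper names; random sampling, not a proof) records the range of the ratio $\|v\|_p / \|v\|_\infty$ over random $v \in \mathbb{R}^2$. For vectors in $\mathbb{R}^2$ the sharp upper constant is $2^{1/p}$, which never exceeds 2.

```python
import math
import random


def lp_norm(v, p):
    """L_p norm of a vector; p may be math.inf."""
    if math.isinf(p):
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def ratio_range(p, trials=10_000, seed=0):
    """Smallest and largest observed ||v||_p / ||v||_inf over random nonzero v in R^2."""
    rng = random.Random(seed)
    lo, hi = math.inf, 0.0
    for _ in range(trials):
        v = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if lp_norm(v, math.inf) == 0:
            continue
        r = lp_norm(v, p) / lp_norm(v, math.inf)
        lo, hi = min(lo, r), max(hi, r)
    return lo, hi


for p in (1, 2, 3):
    lo, hi = ratio_range(p)
    # Expect 1 <= lo and hi <= 2**(1/p) <= 2, matching the inequality in the text.
    print(p, lo >= 1 - 1e-12, hi <= 2 ** (1 / p) + 1e-12)
```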
2.3 Geometry of the space of persistence diagrams
In this section we describe the connected components of $\mathcal{D}$ and show that $\mathcal{D}$ is a geodesic space, but first we need to introduce some notation. For each $p < \infty$ we define the distance between finite multisets $A, B$ of points in $L_\infty$ by
$$d^{L_\infty}_p(A, B)^p = \inf_{\text{bijections } \phi : A \to B} \sum_{a \in A} \|a - \phi(a)\|_p^p$$
when $|A| = |B|$, and $d^{L_\infty}_p(A, B) = \infty$ otherwise. We also set
$$d^{L_\infty}_\infty(A, B) = \inf_{\text{bijections } \phi : A \to B} \sup_{a \in A} \|a - \phi(a)\|_\infty$$
whenever $|A| = |B|$, and infinity otherwise. We define $d^{L_{-\infty}}_p$ similarly. Let $\mathcal{D}(k, l)$ denote the space of persistence diagrams containing exactly $k$ points in $L_{-\infty}$ and exactly $l$ points in $L_\infty$.

Lemma 2. Let $p \in [1, \infty]$. The connected components of $(\mathcal{D}, d_p)$ are the $\mathcal{D}(k, l)$, with $k, l$ non-negative integers. Let $X, Y \in \mathcal{D}(k, l)$. For $p < \infty$,
$$d_p(X, Y)^p = d_p(X|_{\mathbb{R}^{2+} \cup \Delta}, Y|_{\mathbb{R}^{2+} \cup \Delta})^p + d^{L_\infty}_p(X|_{L_\infty}, Y|_{L_\infty})^p + d^{L_{-\infty}}_p(X|_{L_{-\infty}}, Y|_{L_{-\infty}})^p$$
and
$$d_\infty(X, Y) = \max\{d_\infty(X|_{\mathbb{R}^{2+} \cup \Delta}, Y|_{\mathbb{R}^{2+} \cup \Delta}),\ d^{L_\infty}_\infty(X|_{L_\infty}, Y|_{L_\infty}),\ d^{L_{-\infty}}_\infty(X|_{L_{-\infty}}, Y|_{L_{-\infty}})\}.$$
The proof of this lemma follows from the observations that the connected components of $L_\infty \cup L_{-\infty} \cup \mathbb{R}^{2+} \cup \Delta$ are $L_\infty$, $L_{-\infty}$ and $\mathbb{R}^{2+} \cup \Delta$, and that $d_p(X|_{\mathbb{R}^{2+} \cup \Delta}, Y|_{\mathbb{R}^{2+} \cup \Delta})$ is always finite.

Let $(\mathcal{X}, d)$ be a metric space. A curve $\lambda : [0, 1] \to \mathcal{X}$ is called a geodesic if there exists a constant $C > 0$ such that $d(\lambda(t_1), \lambda(t_2)) = C|t_1 - t_2|$ for all $t_1, t_2 \in [0, 1]$ with $|t_1 - t_2|$ sufficiently small. $(\mathcal{X}, d)$ is called a geodesic space if for every $x, y \in \mathcal{X}$ there exists a geodesic $\lambda : [0, 1] \to \mathcal{X}$ such that $\lambda(0) = x$ and $\lambda(1) = y$.

Proposition 3. $(\mathcal{D}, d_p)$ is a geodesic space for all $p \in [1, \infty]$.

Proof. Fix $p \in [1, \infty)$ and $X, Y \in \mathcal{D}$ with $d_p(X, Y) < \infty$.
We want to find a bijection $\phi$ such that $d_p(X, Y)^p = \sum_{x \in X} \|x - \phi(x)\|_p^p$, and from this bijection to construct a geodesic from $X$ to $Y$. Let $\{\phi_i\}$ be a sequence of bijections such that $\lim_{i \to \infty} \sum_{x \in X} \|x - \phi_i(x)\|_p^p = d_p(X, Y)^p$. Fix some off-diagonal point $\hat{x} \in X$. The sequence $\{\phi_i(\hat{x})\}$ must have a convergent subsequence $\{\phi_{i_j}(\hat{x})\}$ which converges either to an off-diagonal point or to $\Delta$. This limit point must be in $Y$, and we set $\phi(\hat{x})$ to be this limit point. Observe that our subsequence also satisfies $\lim_{j \to \infty} \sum_{x \in X} \|x - \phi_{i_j}(x)\|_p^p = d_p(X, Y)^p$. We now replace our original sequence of bijections $\{\phi_i\}$ with the subsequence $\{\phi_{i_j}\}$. In this manner we can determine a choice $\phi(x)$ for each off-diagonal point $x \in X$. Similarly we can determine $\phi^{-1}(y)$ for all the off-diagonal points $y \in Y$. Since we always consider subsequences of previous subsequences, our choices are consistent. Since there are only countably many points off the diagonal in the diagrams $X$ and $Y$ combined, we can find an optimal bijection $\phi : X \to Y$. Let $X_t$ be the diagram with off-diagonal points $\{(1 - t)x + t\phi(x) : x \in X\}$ (where, if one of $x$ and $\phi(x)$ is a copy of $\Delta$, it is replaced by a closest point of the diagonal to the other) and define the path $\lambda : [0, 1] \to \mathcal{D}$ by $\lambda(t) = X_t$. By construction $X_0 = X$, $X_1 = Y$, and $\lambda$ is a geodesic.

The $p = \infty$ case is similar. Let $\{\phi_i\}$ be a sequence of bijections such that $\lim_{i \to \infty} \sup_{x \in X} \|x - \phi_i(x)\|_\infty = d_\infty(X, Y)$, and proceed as in the case $p \in [1, \infty)$ to produce a bijection $\phi$, by assigning the values of $\phi(x)$ and restricting to appropriate subsequences, such that $d_\infty(X, Y) = \sup_{x \in X} \|x - \phi(x)\|_\infty$. This bijection determines a geodesic by the same reasoning as in the case $p \in [1, \infty)$.

2.4 Curvature bounds on the space of persistence diagrams
In order to understand the space of persistence diagrams it is useful to analyze its curvature.
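Curvature is a statement about geodesics, and the geodesics built in the proof of Proposition 3 are straight-line interpolations along an optimal bijection. The sketch below (hypothetical helper names) represents a bijection as a list of pairs of off-diagonal points, with `None` standing for a copy of $\Delta$; a point matched to $\Delta$ moves towards its projection $((a+b)/2, (a+b)/2)$, which is a closest diagonal point for every $p$ (for $p = 1$ it is one of many). Applied to the square example of Section 2.2, the horizontal and vertical optimal bijections give two different midpoints, which is the kind of non-uniqueness of geodesics that matters for CAT-type curvature bounds.

```python
def project_to_diagonal(pt):
    """Closest diagonal point (m, m) to pt = (a, b), with m the midpoint of a and b."""
    m = (pt[0] + pt[1]) / 2
    return (m, m)


def interpolate(pairs, t):
    """Off-diagonal points of X_t = {(1 - t) x + t phi(x)} for a bijection given as
    (x, phi(x)) pairs of off-diagonal points; None stands for a copy of Delta."""
    points = []
    for x, y in pairs:
        if x is None and y is None:
            continue                      # Delta matched to Delta: nothing moves
        if y is None:
            y = project_to_diagonal(x)    # x travels to the diagonal
        if x is None:
            x = project_to_diagonal(y)    # y emerges from the diagonal
        pt = ((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1])
        if pt[0] < pt[1]:                 # points on the diagonal are not listed
            points.append(pt)
    return sorted(points)


# Square example: X = {(0, 10), (1, 11)}, Y = {(1, 10), (0, 11)}.
horizontal = [((0, 10), (1, 10)), ((1, 11), (0, 11))]
vertical = [((0, 10), (0, 11)), ((1, 11), (1, 10))]
print(interpolate(horizontal, 0.5))  # [(0.5, 10.0), (0.5, 11.0)]
print(interpolate(vertical, 0.5))    # [(0.0, 10.5), (1.0, 10.5)]
```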
Alexandrov spaces are geodesic spaces with curvature bounds. They come in two different forms: either their curvature is bounded from above (also known as CAT spaces) or their curvature is bounded from below. A bound on curvature in a geodesic space $(\mathcal{X}, d)$ is defined using comparison triangles. For each $\kappa \in \mathbb{R}$ there is a model space $M_\kappa$ with constant curvature $\kappa$. We compare triangles in $\mathcal{X}$ to triangles in $M_\kappa$. Take three points $x, y, z$. If $\kappa > 0$ we require $d(x, y) + d(y, z) + d(z, x) < 2\pi/\sqrt{\kappa}$. These define a triangle $\Delta(x, y, z)$ in $\mathcal{X}$. We can build a comparison triangle $\Delta(\tilde{x}, \tilde{y}, \tilde{z})$ in the model space $M_\kappa$ whose sides have the same lengths as the sides of $\Delta(x, y, z)$. The curvature of $\mathcal{X}$ is bounded from below (above) by $\kappa$ if, for every triangle $\Delta(x, y, z)$ in $\mathcal{X}$, the distances between points on $\Delta(x, y, z)$ are greater than or equal to (respectively less than or equal to) the distances between the corresponding points on the comparison triangle $\Delta(\tilde{x}, \tilde{y}, \tilde{z})$ in $M_\kappa$. For more details see [18].

A CAT(k) space is a geodesic space whose curvature is bounded from above by $k$. CAT spaces, in particular CAT(0) spaces, have desirable properties. For example, the barycenter of any measure in a CAT(0) space is unique, and in a CAT(k) space there is a length $D_k$ such that balls of radius $D_k$ are contractible. We first confirm that $(\mathcal{D}, d_p)$ is not a CAT space.

Proposition 4. For all $k > 0$ and $p \in [1, \infty]$, $(\mathcal{D}, d_p)$ is not CAT(k).

Proof. If $(\mathcal{D}, d_p)$ is a CAT(k) space then there is a constant $K > 0$ such that for all pairs $X, Y \in \mathcal{D}$ with $d_p(X, Y)^2 < K$ there is a unique geodesic between them [18]. However, we can find pairs of diagrams $X$ and $Y$ which are arbitrarily close such that there are two distinct geodesics between them.
One example is obtained by taking $X$ to be a diagram with two diagonally opposite corners of a square set far from the diagonal, and $Y$ the diagram with the other two corners. This is illustrated in Figure 2. The horizontal and vertical paths are equally optimal, and we may choose the square to be as small as we wish.

Figure 2: Two different optimal bijections between the triangle and the square diagrams.

Alexandrov spaces have nice properties. For example, for Alexandrov spaces we can define tangent cones (analogous to tangent planes) and exponential maps. From [24] we know that a geodesic space $(\mathcal{X}, d)$ is an Alexandrov space with curvature bounded from below by zero if, and only if, for every geodesic $\gamma : [0, 1] \to \mathcal{X}$ from $X$ to $Y$, and every $Z \in \mathcal{X}$, we have
$$d(Z, \gamma(t))^2 \ge t\, d(Z, Y)^2 + (1 - t)\, d(Z, X)^2 - t(1 - t)\, d(X, Y)^2. \tag{2.2}$$
Using this characterization, [28] showed that $(\mathcal{D}, d_2)$ is an Alexandrov space with curvature bounded below by zero. In contrast, using different counterexamples for $p \in [1, 2)$ and for $p \in (2, \infty]$, we can show that this curvature bound does not hold when $p \neq 2$. These counterexamples are illustrated in Figure 3.

Let $p \in [1, 2)$ and $t = 1/2$. Let $X$, $Y$ and $Z$ be persistence diagrams, each with only one off-diagonal point, at $x = (1, 4)$, $y = (1, 6)$ and $z = (0, 5)$ respectively. The midway point between $X$ and $Y$ (playing the role of $\gamma(1/2)$) is the diagram with the point $w = (1, 5)$. This set up is shown in Figure 3a. We can calculate $d_p(Z, \gamma(1/2))^p = \|z - w\|_p^p = 1$, $d_p(Z, X)^p = \|z - x\|_p^p = 2$, $d_p(Z, Y)^p = \|z - y\|_p^p = 2$ and $d_p(X, Y)^p = \|x - y\|_p^p = 2^p$. Together these imply
$$\tfrac{1}{2} d_p(Z, Y)^2 + \tfrac{1}{2} d_p(Z, X)^2 - \tfrac{1}{4} d_p(X, Y)^2 = 2^{2/p} - 1.$$
But $2^{2/p} - 1 > 1 = d_p(Z, \gamma(1/2))^2$ when $1 \le p < 2$.
This contradicts equation (2.2), and hence $(\mathcal{D}, d_p)$ is not an Alexandrov space with curvature bounded below by zero.

Now let $p \in (2, \infty)$ and $t = 1/2$. Let $X$, $Y$ and $Z$ be persistence diagrams, each with only one off-diagonal point, at $x = (0, 4)$, $y = (2, 6)$ and $z = (0, 6)$ respectively. The midway point between $X$ and $Y$ (playing the role of $\gamma(1/2)$) is the diagram with the point $w = (1, 5)$. This set up is shown in Figure 3b. Here $d_p(Z, \gamma(1/2))^p = \|z - w\|_p^p = 2$, $d_p(Z, X)^p = \|z - x\|_p^p = 2^p$, $d_p(Z, Y)^p = \|z - y\|_p^p = 2^p$, and $d_p(X, Y)^p = \|x - y\|_p^p = 2^{p+1}$. Together these imply
$$\tfrac{1}{2} d_p(Z, Y)^2 + \tfrac{1}{2} d_p(Z, X)^2 - \tfrac{1}{4} d_p(X, Y)^2 = 2^2 - 2^{2/p}.$$
But $2^2 - 2^{2/p} > 2^{2/p} = d_p(Z, \gamma(1/2))^2$ when $p > 2$. This contradicts equation (2.2), and hence $(\mathcal{D}, d_p)$ is not an Alexandrov space with curvature bounded below by zero.

With this same set up we can calculate $d_\infty(Z, \gamma(1/2)) = \|z - w\|_\infty = 1$, $d_\infty(Z, X) = \|z - x\|_\infty = 2$, $d_\infty(Z, Y) = \|z - y\|_\infty = 2$, and $d_\infty(X, Y) = \|x - y\|_\infty = 2$. Hence
$$\tfrac{1}{2} d_\infty(Z, Y)^2 + \tfrac{1}{2} d_\infty(Z, X)^2 - \tfrac{1}{4} d_\infty(X, Y)^2 = 3 > 1 = d_\infty(Z, \gamma(1/2))^2.$$
This contradicts equation (2.2), and hence $(\mathcal{D}, d_\infty)$ is not an Alexandrov space with curvature bounded below by zero.

[Figure 3, panel (a): Counterexample for $p \in [1, 2)$, with $x = (1, 4)$, $y = (1, 6)$, $z = (0, 5)$, $w = (1, 5)$. Panel (b): Counterexample for $p \in (2, \infty]$, with $x = (0, 4)$, $y = (2, 6)$, $z = (0, 6)$, $w = (1, 5)$.]
Figure 3: In both (a) and (b) the geodesic $\gamma$ from $X = \{x\}$ to $Y = \{y\}$ has midpoint $\gamma(1/2) = \{w\}$. We consider the distances to $Z = \{z\}$. Here we describe each persistence diagram by its list of off-diagonal points. In both examples $d_p(Z, \gamma(1/2))^2 < \frac{1}{2} d_p(Z, Y)^2 + \frac{1}{2} d_p(Z, X)^2 - \frac{1}{4} d_p(X, Y)^2$, contradicting (2.2). Thus $(\mathcal{D}, d_p)$ is not an Alexandrov space with curvature bounded below by zero when $p \in [1, 2) \cup (2, \infty]$.
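The arithmetic in both counterexamples can be reproduced numerically. The sketch below (hypothetical helper names) computes $d_p$ between diagrams with a single off-diagonal point each, taking into account the option of matching both points to $\Delta$, and evaluates the right-hand side minus the left-hand side of (2.2) at $t = 1/2$; a positive value means (2.2) fails. At $p = 2$ both configurations give a gap of zero up to rounding, which is consistent with $(\mathcal{D}, d_2)$ being non-negatively curved.

```python
import math


def pp_cost(u, v, p):
    """||u - v||_p^p for points of R^2 (p finite)."""
    return abs(u[0] - v[0]) ** p + abs(u[1] - v[1]) ** p


def diag_cost(u, p):
    """||u - Delta||_p^p = (2^(1/p - 1) |b - a|)^p (p finite)."""
    return (2 ** (1 / p - 1) * abs(u[1] - u[0])) ** p


def d_single(x, y, p):
    """d_p between diagrams with one off-diagonal point each: either match x to y,
    or send both to the diagonal."""
    if math.isinf(p):
        direct = max(abs(x[0] - y[0]), abs(x[1] - y[1]))
        via_diagonal = max(abs(x[1] - x[0]), abs(y[1] - y[0])) / 2
        return min(direct, via_diagonal)
    return min(pp_cost(x, y, p), diag_cost(x, p) + diag_cost(y, p)) ** (1 / p)


def gap(x, y, z, w, p):
    """Right-hand side minus left-hand side of (2.2) at t = 1/2, with w the midpoint.
    A positive value means (2.2) fails."""
    rhs = (0.5 * d_single(z, y, p) ** 2 + 0.5 * d_single(z, x, p) ** 2
           - 0.25 * d_single(x, y, p) ** 2)
    return rhs - d_single(z, w, p) ** 2


example_a = ((1, 4), (1, 6), (0, 5), (1, 5))   # Figure 3a: x, y, z, w
example_b = ((0, 4), (2, 6), (0, 6), (1, 5))   # Figure 3b: x, y, z, w
for p in (1, 1.5):
    print(p, gap(*example_a, p))          # positive: 2^(2/p) - 2
for p in (3, 10, math.inf):
    print(p, gap(*example_b, p))          # positive: 4 - 2^(1 + 2/p), and 2 for p = inf
```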
3 The median of a population of persistence diagrams
Measures of central tendency (such as the mean and the median), and their corresponding measures of variability or dispersion (the variance and the average cost respectively), are common statistics used to describe distributions. Central tendencies are solutions for optimizing different cost functions which are based on $p$-Wasserstein metrics. The median corresponds to the case $p = 1$.

The median of a set of real numbers $a_1, a_2, \ldots, a_N$, written in non-decreasing order, is the number $m$ which minimizes the mean absolute deviation function $F^{\mathbb{R}}_1(x) = \frac{1}{N}\sum_{i=1}^N |a_i - x|$. The average cost of moving a point in the sample data to the median is $F^{\mathbb{R}}_1(m)$. For $N$ odd the median is unique and equals $a_{(N+1)/2}$. If $N$ is even then every number in the interval $[a_{N/2}, a_{(N+2)/2}]$ minimizes $F^{\mathbb{R}}_1$ and would be a valid choice of median. To overcome this lack of uniqueness, the general convention is to declare the median to be the midpoint of $[a_{N/2}, a_{(N+2)/2}]$. More generally, the median of a population $\{x_1, x_2, \ldots, x_N\}$ within a connected metric space $\mathcal{X}$ is a minimizer of the function $F_1(y) = \frac{1}{N}\sum_{i=1}^N d(x_i, y)$, where $d$ is an appropriate metric on $\mathcal{X}$.

To define the median of a population of persistence diagrams we need to fix a metric on the space of persistence diagrams. Unlike on the real line, there are multiple reasonable options, as explored in Section 2.2. In this paper we argue that a well-motivated metric is $d_1$, for two main reasons. Firstly, the coordinates each have separate meanings and are infinitesimally independent. Thus, when taking the median, the birth should be the median of the relevant births and the death the median of the relevant deaths. Secondly, the computations become significantly easier.
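The one-dimensional problem that these computations ultimately reduce to is easy to check directly. This small sketch (hypothetical helper names) evaluates the mean absolute deviation $F^{\mathbb{R}}_1$, confirms that for odd $N$ the sorted middle element $a_{(N+1)/2}$ beats nearby points, and shows the flat interval of minimizers $[a_{N/2}, a_{(N+2)/2}]$ when $N$ is even.

```python
def mean_abs_dev(sample, x):
    """F_1^R(x) = (1/N) * sum |a_i - x|."""
    return sum(abs(a - x) for a in sample) / len(sample)


def median_odd(sample):
    """a_{(N+1)/2} for odd N (1-based), i.e. index (N-1)//2 after sorting."""
    s = sorted(sample)
    if len(s) % 2 == 0:
        raise ValueError("use an odd-sized sample, or pick a point of [a_{N/2}, a_{(N+2)/2}]")
    return s[(len(s) - 1) // 2]


odd = [5, 1, 9, 3, 7]
m = median_odd(odd)
print(m, mean_abs_dev(odd, m))                         # 5 2.4
print(mean_abs_dev(odd, 4.5), mean_abs_dev(odd, 5.5))  # 2.5 2.5

even = [1, 3, 8, 10]
print([mean_abs_dev(even, x) for x in (2, 3, 5, 8, 9)])  # [4.0, 3.5, 3.5, 3.5, 4.0]
```

For the even sample every point of $[3, 8]$ attains the minimum 3.5; the midpoint convention would report 5.5.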
If we were to use $d_2$ from Section 2.2, then we would need to compute, at some stage, the geometric median of a set of points in the plane. This is a problem, as it was shown in [2] that in general there is no exact algorithm to find the geometric median of a set of $k$ points in the plane.

The median is the persistence diagram $m$ which minimizes the cost function $F_1(Y) = \frac{1}{N}\sum_{i=1}^N d_1(X_i, Y)$. The total cost is $N F_1(m)$ and the average cost is $F_1(m)$. It is worth observing that the median is only defined for populations $\{X_1, \ldots, X_N\}$ that lie within the same connected component of $\mathcal{D}$, as otherwise for every $Y$ there is some $X_i$ such that $d_1(X_i, Y) = \infty$.

Proposition 5. Fix non-negative integers $k, l$. Let $\mathbb{X} = \{X_1, X_2, \ldots, X_N\}$ be a population of persistence diagrams in $\mathcal{D}(k, l)$. For each $i$ let $X_i^{\mathbb{R}^{2+}}$ denote $X_i|_{\mathbb{R}^{2+} \cup \Delta}$, $X_i^{\infty}$ denote $X_i|_{L_\infty}$ and $X_i^{-\infty}$ denote $X_i|_{L_{-\infty}}$. The diagram $Y$ is a median of $\mathbb{X}$ if and only if $Y = Y^{\mathbb{R}^{2+}} \cup Y^{\infty} \cup Y^{-\infty}$, where $Y^{\mathbb{R}^{2+}}$ is a median of $\{X_1^{\mathbb{R}^{2+}}, \ldots, X_N^{\mathbb{R}^{2+}}\}$, $Y^{\infty}$ is a median of $\{X_1^{\infty}, \ldots, X_N^{\infty}\}$ and $Y^{-\infty}$ is a median of $\{X_1^{-\infty}, \ldots, X_N^{-\infty}\}$.

Proof. Let $Z \in \mathcal{D}(k, l)$. From Lemma 2 we know that
$$d_1(Z, X_i) = d_1(Z|_{\mathbb{R}^{2+} \cup \Delta}, X_i^{\mathbb{R}^{2+}}) + d^{L_\infty}_1(Z|_{L_\infty}, X_i^{\infty}) + d^{L_{-\infty}}_1(Z|_{L_{-\infty}}, X_i^{-\infty}),$$
and hence we can write $F_1$ as a sum of three independent sums
$$F_1(Z) = \frac{1}{N}\sum_{i=1}^N d_1(Z|_{\mathbb{R}^{2+} \cup \Delta}, X_i^{\mathbb{R}^{2+}}) + \frac{1}{N}\sum_{i=1}^N d^{L_\infty}_1(Z|_{L_\infty}, X_i^{\infty}) + \frac{1}{N}\sum_{i=1}^N d^{L_{-\infty}}_1(Z|_{L_{-\infty}}, X_i^{-\infty}).$$
Since each sum depends only on the restriction of $Z$ to one of the three disjoint regions, $Z$ minimizes $F_1$ if and only if it minimizes each of these sums.

In the remainder of this section we characterize the median of a population of multisets in $L_\infty$ or in $L_{-\infty}$. The next subsection addresses the more complicated characterization of medians of populations in $\mathcal{D}(0, 0)$.

Lemma 6. Fix $N$ odd. Let $A_1, A_2, \ldots, A_N$ each be multisets of exactly $k$ real numbers.
Label the elements of each $A_i$ so that $A_i = \{a_{i,1}, a_{i,2}, \ldots, a_{i,k}\}$ with $a_{i,1} \le a_{i,2} \le \cdots \le a_{i,k}$. Set $B = \{b_1, b_2, \ldots, b_k\}$ where $b_j$ is the median of $\{a_{1,j}, \ldots, a_{N,j}\}$. Then $B$ is the unique multiset of $k$ real numbers that minimizes
$$f_1 : Y \mapsto \sum_{i=1}^N \left(\inf_{\phi : A_i \to Y \text{ bijection}} \sum_{a \in A_i} |a - \phi(a)|\right).$$

Proof. The key to this proof is that for $X = \{x_1, x_2, \ldots, x_k\}$ and $Y = \{y_1, y_2, \ldots, y_k\}$ (each written in non-decreasing order) we have
$$\inf_{\phi : X \to Y \text{ bijection}} \sum_{j=1}^k |x_j - \phi(x_j)| = \sum_{j=1}^k |x_j - y_j|.$$
We are not claiming that $\phi : x_j \mapsto y_j$ is the unique optimal transport from $X$ to $Y$ (this is not always true), just that it achieves this optimum. Let $Y = \{y_1, y_2, \ldots, y_k\}$ (written in non-decreasing order) be any multiset of $k$ real numbers. The observation above implies
$$f_1(Y) = \sum_{i=1}^N \sum_{j=1}^k |a_{i,j} - y_j| = \sum_{j=1}^k \left(\sum_{i=1}^N |a_{i,j} - y_j|\right).$$
Since $a_{i,j} \le a_{i,j+1}$ for every $i$, the medians satisfy $b_j \le b_{j+1}$, so $B$ is also written in non-decreasing order and $f_1(B) = \sum_{j=1}^k \sum_{i=1}^N |a_{i,j} - b_j|$. Since $N$ is odd, $\sum_{i=1}^N |a_{i,j} - b_j| \le \sum_{i=1}^N |a_{i,j} - y_j|$ with equality if and only if $b_j = y_j$. Hence $f_1(B) \le f_1(Y)$, with equality if and only if $y_j = b_j$ for all $j$; that is, $B$ is the unique minimizer of $f_1$.

If $N$ is even, then the median is not unique, even if we use the midpoint convention for populations of real values. For example, if $A_1 = \{0, 2\}$ and $A_2 = \{6, 12\}$, two medians would be $\{\text{midpt}[0, 6] = 3, \text{midpt}[2, 12] = 7\}$ and $\{\text{midpt}[0, 12] = 6, \text{midpt}[2, 6] = 4\}$; both have $f_1$ value 16.

3.1 The mean and median of multisets of points in the plane and copies of the diagonal
We split our analysis into the different regions $\mathbb{R}^{2+} \cup \Delta$, $L_\infty$ and $L_{-\infty}$. This is because these are different connected components, and all geodesics keep the points of the persistence diagrams in these different regions separate.
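Before moving on to $\mathbb{R}^{2+} \cup \Delta$, the recipe of Lemma 6 can be checked by brute force. The sketch below (hypothetical helper names) computes the one-dimensional matching cost by pairing in sorted order, as in the proof, confirms that shortcut against an exhaustive search over bijections, builds $B$ column by column with `statistics.median`, and compares it with an exhaustive search over integer multisets. It also evaluates the two even-$N$ medians from the example in the text.

```python
from itertools import combinations_with_replacement, permutations
from statistics import median


def matching_cost_1d(A, Y):
    """inf over bijections A -> Y of sum |a - phi(a)|, attained by the sorted matching."""
    return sum(abs(a - y) for a, y in zip(sorted(A), sorted(Y)))


def matching_cost_brute(A, Y):
    """The same infimum by trying every bijection (for checking only)."""
    return min(sum(abs(a - y) for a, y in zip(A, perm)) for perm in permutations(Y))


def f1(population, Y):
    """The cost f_1(Y) from Lemma 6."""
    return sum(matching_cost_1d(A, Y) for A in population)


def multiset_median(population):
    """Lemma 6 (N odd): b_j is the median of the j-th smallest entries a_{1,j}, ..., a_{N,j}."""
    if len(population) % 2 == 0:
        raise ValueError("Lemma 6 assumes an odd number of multisets")
    columns = zip(*(sorted(A) for A in population))
    return [median(col) for col in columns]


population = [[0, 5, 9], [2, 3, 11], [1, 7, 8]]
B = multiset_median(population)
print(B, f1(population, B))                        # [1, 5, 9] 9

# Exhaustive search over multisets of three integers in [0, 12].
costs = {Y: f1(population, Y) for Y in combinations_with_replacement(range(13), 3)}
best = min(costs.values())
print([Y for Y, c in costs.items() if c == best])  # [(1, 5, 9)]

# Even-N example from the text: both candidate medians have cost 16.
print(f1([[0, 2], [6, 12]], [3, 7]), f1([[0, 2], [6, 12]], [6, 4]))  # 16 16
```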
Thus in this section we focus on the points in $\mathbb{R}^{2+} \cup \Delta$, that is, those corresponding to persistent homology classes with finite lifetimes. Before considering populations of multisets in $\mathbb{R}^{2+} \cup \Delta$, we first investigate the simpler problem of populations of singletons in $\mathbb{R}^{2+} \cup \Delta$. Here we use the definition of the median as the minimizer of the sum of $d_1$ distances. Given a population $S$, set $f_S : \mathbb{R}^{2+} \to \mathbb{R}$ by $f_S(z) = \frac{1}{N}\sum_{w \in S} \|z - w\|_1$. Thus a median $y \in \mathbb{R}^{2+}$ of $S$ is a minimizer of $f_S$.

In the proposition below, the candidate minimizer is the point whose $x$- and $y$-coordinates are each the median of a set of numbers constructed from the $x$- and $y$-coordinates of the points in $S$ and from the copies of the diagonal. The idea is that whenever a point $(x, y)$ is matched with the diagonal, it is effectively matched with a point $(t, t)$ with $x \le t \le y$. When calculating the median of the $x$-coordinates, provided $S$ contains more off-diagonal points than copies of the diagonal, a contribution of $t \in [x, y]$ and a contribution of $\infty$ have the same effect. Similarly, when calculating the median of the $y$-coordinates, a contribution of $t \in [x, y]$ has the same effect as that of $-\infty$. This means that, for the purposes of calculation, we can use $\pm\infty$ in our lists of coordinates in the proposition below.

Proposition 7. Fix $N$ odd and suppose $k > N/2$. Let $S$ consist of the points $(a_1, b_1), (a_2, b_2), \ldots, (a_k, b_k)$ together with $N - k$ copies of $\Delta$ (where the $(a_i, b_i) \in \mathbb{R}^{2+}$). Define $f = N f_S$ on $\mathbb{R}^{2+}$ (so $f$ and $f_S$ have the same minimizers), that is,
$$f((x,y)) = \sum_{i=1}^k \|(x,y) - (a_i, b_i)\|_1 + \sum_{i=k+1}^N \|(x,y) - \Delta\|_1.$$
Let $(\tilde{x}, \tilde{y})$ be the point in $\mathbb{R}^2$ where $\tilde{x}$ is the median of $\{a_1, a_2, \ldots, a_k\}$ together with $N - k$ copies of $\infty$, and $\tilde{y}$ is the median of $\{b_1, b_2, \ldots, b_k\}$ together with $N - k$ copies of $-\infty$. If $\tilde{x} < \tilde{y}$, then $(\tilde{x}, \tilde{y})$ is the point in $\mathbb{R}^{2+}$ which minimizes $f$.
If $\tilde{x} \ge \tilde{y}$, then $f((x,y)) > \sum_{i=1}^k \|\Delta - (a_i, b_i)\|_1$ for all $(x, y) \in \mathbb{R}^{2+}$.

Proof. Since $k > N/2$, we know that $\tilde{x}$ and $\tilde{y}$ are finite. Suppose that $\tilde{x} < \tilde{y}$. We want to show that $(\tilde{x}, \tilde{y})$ is the minimum of $f$. Since $f$ is a convex function over $\mathbb{R}^{2+}$, it is sufficient to show that $(\tilde{x}, \tilde{y})$ is a local minimum. Consider pairs $(u, v)$ such that
$$|u| < \min_{a_i \ne \tilde{x}} |\tilde{x} - a_i|, \qquad |v| < \min_{b_i \ne \tilde{y}} |\tilde{y} - b_i|, \qquad |u| + |v| < \|(\tilde{x}, \tilde{y}) - \Delta\|_1.$$
These hold for sufficiently small $u$ and $v$. For such $(u, v)$ we have
$$\sum_{i=1}^k \|(\tilde{x}+u, \tilde{y}+v) - (a_i, b_i)\|_1 - \sum_{i=1}^k \|(\tilde{x}, \tilde{y}) - (a_i, b_i)\|_1 = |\{i : a_i < \tilde{x}\}|\,u - |\{i : a_i > \tilde{x}\}|\,u + |\{i : a_i = \tilde{x}\}|\,|u| + |\{i : b_i < \tilde{y}\}|\,v - |\{i : b_i > \tilde{y}\}|\,v + |\{i : b_i = \tilde{y}\}|\,|v|$$
and
$$\|(\tilde{x}+u, \tilde{y}+v) - \Delta\|_1 - \|(\tilde{x}, \tilde{y}) - \Delta\|_1 = ((\tilde{y}+v) - (\tilde{x}+u)) - (\tilde{y} - \tilde{x}) = v - u.$$
Together these imply that
$$f((\tilde{x}+u, \tilde{y}+v)) - f((\tilde{x}, \tilde{y})) = |\{i : a_i < \tilde{x}\}|\,u - |\{i : a_i > \tilde{x}\}|\,u + |\{i : a_i = \tilde{x}\}|\,|u| - (N-k)\,u + |\{i : b_i < \tilde{y}\}|\,v - |\{i : b_i > \tilde{y}\}|\,v + |\{i : b_i = \tilde{y}\}|\,|v| + (N-k)\,v.$$
Since $\tilde{x}$ is the median of $\{a_1, a_2, \ldots, a_k\}$ together with $N - k$ copies of $\infty$, we know that
$$\Big|\, \big(|\{i : a_i > \tilde{x}\}| + (N-k)\big) - |\{i : a_i < \tilde{x}\}| \,\Big| < |\{i : a_i = \tilde{x}\}|.$$
This implies that if $u \ne 0$ then
$$|\{i : a_i < \tilde{x}\}|\,u - |\{i : a_i > \tilde{x}\}|\,u + |\{i : a_i = \tilde{x}\}|\,|u| - (N-k)\,u > 0.$$
Similarly, if $v \ne 0$ then
$$|\{i : b_i < \tilde{y}\}|\,v - |\{i : b_i > \tilde{y}\}|\,v + |\{i : b_i = \tilde{y}\}|\,|v| + (N-k)\,v > 0.$$
Thus $f((\tilde{x}+u, \tilde{y}+v)) > f((\tilde{x}, \tilde{y}))$ for $(\tilde{x}+u, \tilde{y}+v)$ sufficiently near, but not equal to, $(\tilde{x}, \tilde{y})$, and so $(\tilde{x}, \tilde{y})$ is a local minimum. The convexity of $f$ further implies that $(\tilde{x}, \tilde{y})$ is the global minimum of $f$ over the domain $\mathbb{R}^{2+}$.
Remember that we are not including the diagonal as a candidate location for the minimum here, as $f$ is a function over $\mathbb{R}^{2+}$.

Now suppose that $(\tilde{x}, \tilde{y})$ lies on or below the diagonal. Let $(x, y) \in \mathbb{R}^{2+}$. Then either $x < \tilde{x}$ or $y > \tilde{y}$, since otherwise $\tilde{x} \le x < y \le \tilde{y}$. Suppose that $x < \tilde{x}$. Let $x' \in (x, \tilde{x})$ with $(x', y) \in \mathbb{R}^{2+}$. Then
$$f((x,y)) - f((x',y)) = \sum_{i=1}^k \big(|x - a_i| - |x' - a_i|\big) + \sum_{i=k+1}^N (x' - x) > 0,$$
as $|x - a_i| - |x' - a_i| = x' - x$ whenever $a_i \ge \tilde{x}$, $|x - a_i| - |x' - a_i| \ge -(x' - x)$ for every $i$, and $|\{i : a_i \ge \tilde{x}\}| + (N-k) > |\{i : a_i < \tilde{x}\}|$ as $\tilde{x}$ is the median of the $a_i$ together with $N - k$ copies of $\infty$.

A similar argument shows that if $y > \tilde{y}$ and $y' \in (\tilde{y}, y)$ with $(x, y') \in \mathbb{R}^{2+}$, then $f((x,y)) > f((x,y'))$. Thus $f$ decreases as $(x, y)$ travels towards $(\tilde{x}, \tilde{y})$ while staying within $\mathbb{R}^{2+}$. By considering limits, there is some $t$ such that
$$f((x,y)) > \sum_{i=1}^k \|(t,t) - (a_i, b_i)\|_1 + (N-k)\,\|(t,t) - \Delta\|_1.$$
The observation that $\|(t,t) - (a_i, b_i)\|_1 \ge \|\Delta - (a_i, b_i)\|_1$ for all $i$ and $t$ (and that $\|(t,t) - \Delta\|_1 = 0$) completes the proof.

Lemma 8. If $k < N/2$, then
$$\sum_{i=1}^k \|(x,y) - (a_i, b_i)\|_1 + \sum_{i=k+1}^N \|(x,y) - \Delta\|_1 > \sum_{i=1}^k \|\Delta - (a_i, b_i)\|_1$$
for every point $(x, y) \in \mathbb{R}^{2+}$.

Proof. Since $k < N/2$ and $\|(x,y) - \Delta\|_1 > 0$, we have $\sum_{i=k+1}^N \|(x,y) - \Delta\|_1 > \sum_{i=1}^k \|(x,y) - \Delta\|_1$. Then use the triangle inequality $\|(x,y) - (a_i, b_i)\|_1 + \|(x,y) - \Delta\|_1 \ge \|\Delta - (a_i, b_i)\|_1$.

Using Proposition 7 and Lemma 8 we can characterize the median of an odd-sized population of points in $\mathbb{R}^{2+} \cup \Delta$.

Corollary 9. Let $N$ be odd and $S$ the population $\{(a_1, b_1), \ldots, (a_k, b_k), \Delta, \ldots, \Delta\}$ containing $N - k$ copies of $\Delta$. Let $\tilde{x}$ be the median of $\{a_1, a_2, \ldots, a_k\}$ together with $N - k$ copies of $\infty$, and let $\tilde{y}$ be the median of $\{b_1, b_2, \ldots, b_k\}$ together with $N - k$ copies of $-\infty$.
If $(\tilde{x}, \tilde{y})$ lies above the diagonal, then the median of $S$ is either $(\tilde{x}, \tilde{y})$ or $\Delta$ (depending on whether $f((\tilde{x}, \tilde{y}))$ or $f(\Delta)$ is smaller). If $(\tilde{x}, \tilde{y})$ lies on or below the diagonal (or is $(\infty, -\infty)$, which philosophically lies below the diagonal), then $\Delta$ is the median of $S$.

Figure 4: The median of the square, triangle, diamond, circle and pentagon points. The order of the $x$-coordinates is {pentagon, square, diamond, triangle, circle} and hence the $x$-coordinate of the median is that of the diamond. The order of the $y$-coordinates is {pentagon, diamond, square, circle, triangle} and hence the $y$-coordinate of the median is that of the square.

$S$ has at most two medians, as the median of an odd number of (extended) real numbers is unique. In contrast, if $N$ is even, then we have the same uniqueness issues as for sets of real numbers. For example, given the set $X = \{2, 3, 4, 10\}$, every point in the interval $[3, 4]$ minimizes the sum of distances to the points of $X$. However, by convention, we generally say that $3.5$ is the median of $X$. When considering the median of an even number of points in $\mathbb{R}^{2+} \cup \Delta$, instead of a unique point in $\mathbb{R}^{2+} \cup \Delta$ we have a rectangle of options, in which every point minimizes $f_S$. We could in theory adopt an analogous convention by using the barycenter of this rectangle, though the right choice may be application dependent. However, for clarity of exposition, and to make the statements of theorems much cleaner, we restrict our attention to $N$ odd.

3.2 Characterizing the median(s) of sets of diagrams

In [28] there is a complete characterization of the local minima of $F_2$ when the observations are finitely many persistence diagrams, each with only finitely many off-diagonal points. However, the proof used the Alexandrov space structure of $(\mathcal{D}, d_2)$, and hence we cannot adapt it to characterize the local minima of $F_1$.
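The selection median of Corollary 9 can be computed directly from coordinatewise medians. The Python sketch below is an illustration under stated assumptions: points are tuples $(x, y)$ with $x < y$, `None` is our own stand-in marker for a copy of $\Delta$ (not notation from the text), $N$ is odd, and ties between $(\tilde{x}, \tilde{y})$ and $\Delta$ are broken in favour of the off-diagonal point, matching the convention used for $m_S$ in this section. The three calls mirror the situations of Figure 11(a), Figure 6 and Figure 7.

```python
import math
from statistics import median

DIAGONAL = None  # illustrative marker for a copy of the diagonal Δ (not from the text)

def l1_cost(z, p):
    # ||z - p||_1, where a copy of Δ is at L1 distance y - x from (x, y).
    if z is DIAGONAL and p is DIAGONAL:
        return 0
    if z is DIAGONAL:
        return p[1] - p[0]
    if p is DIAGONAL:
        return z[1] - z[0]
    return abs(z[0] - p[0]) + abs(z[1] - p[1])

def selection_median(selection):
    # Corollary 9: an odd-sized selection of points (x, y) and copies of Δ.
    assert len(selection) % 2 == 1, "Corollary 9 assumes N is odd"
    # Copies of Δ contribute +inf to the x-list and -inf to the y-list.
    xs = [math.inf if p is DIAGONAL else p[0] for p in selection]
    ys = [-math.inf if p is DIAGONAL else p[1] for p in selection]
    candidate = (median(xs), median(ys))
    if not candidate[0] < candidate[1]:
        # On or below the diagonal, including (inf, -inf): the median is Δ.
        return DIAGONAL
    # Above the diagonal: compare f((x~, y~)) with f(Δ), preferring the point on ties.
    f = lambda z: sum(l1_cost(z, p) for p in selection)
    return candidate if f(candidate) <= f(DIAGONAL) else DIAGONAL

print(selection_median([(0, 2), (1, 5), DIAGONAL]))                      # (1, 2)
print(selection_median([(0, 4), (1, 3), DIAGONAL, DIAGONAL, DIAGONAL]))  # None, i.e. Δ
print(selection_median([(0, 1), (5, 6), (10, 11), DIAGONAL, DIAGONAL]))  # None, i.e. Δ
```

In the last call the candidate $(\tilde{x}, \tilde{y}) = (10, 1)$ lies below the diagonal, as in Figure 7.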
Figure 5: The median of the square, triangle and diamond points alongside two copies of the diagonal. The order of the $x$-coordinates is {square, diamond, triangle, $\infty$, $\infty$} and hence the $x$-coordinate of the median is that of the triangle. The order of the $y$-coordinates is {$-\infty$, $-\infty$, diamond, square, triangle} and hence the $y$-coordinate of the median is that of the diamond.

Figure 6: The median of the square and diamond points alongside three copies of the diagonal. The order of the $x$-coordinates is {square, diamond, $\infty$, $\infty$, $\infty$} and hence the median $x$-coordinate is "$\infty$". The order of the $y$-coordinates is {$-\infty$, $-\infty$, $-\infty$, diamond, square} and hence the median $y$-coordinate is $-\infty$. This implies that the median is a copy of the diagonal.

Figure 7: The median of the square, diamond and triangle points alongside two copies of the diagonal. The order of the $x$-coordinates is {square, diamond, triangle, $\infty$, $\infty$} and hence the candidate median has the $x$-coordinate of the triangle. The order of the $y$-coordinates is {$-\infty$, $-\infty$, diamond, square, triangle} and hence the candidate median has the $y$-coordinate of the diamond. However, this candidate lies below the diagonal and hence the median is a copy of the diagonal.

Here we use an alternative approach to prove analogous necessary and sufficient conditions for a persistence diagram to be a local minimum of $F_1$. The characterization of local minima of $F_2$ was rephrased in [23] in terms of selections, groupings and optimal pairings, and here we characterize the local minima of $F_1$ with this same terminology. Given a set of diagrams $X_1, \ldots, X_N$, a selection is a choice of one point from each diagram (where that point can be $\Delta$). A grouping is a set of selections such that every off-diagonal point of every diagram is part of exactly one selection. Our notation is as follows.
If $S$ is a selection, then let $m_S$ be the median of that selection (chosen to be the off-diagonal point if it is not unique). A grouping $G$ of $X_1, \ldots, X_N$ is a set of selections $G = \{S_j\}$. Let $m(G)$ be the persistence diagram which contains $\{m_{S_j} : S_j \in G\}$. Each grouping $G$ produces a candidate $m(G)$ for the median. We will show that any median of $X_1, \ldots, X_N$ must be $m(G)$ for some grouping $G$ of $X_1, \ldots, X_N$.

We consider two groupings equivalent if they differ only by selections containing only copies of the diagonal. Note that equivalent groupings produce the same persistence diagram as their median candidate. This implies that in theorems and algorithms we can restrict to groupings in which each selection contains at least one off-diagonal point.

Since any minimum is also a local minimum, we now focus on characterizing the local minima of $F_1$. We first show that if $Y$ is a median of $\{X_1, \ldots, X_N\}$, then $Y = m(G)$ whenever $G$ is an appropriate grouping constructed using optimal bijections $\phi_i : Y \to X_i$.

Theorem 10. Let $X_1, \ldots, X_N$ be persistence diagrams in $\mathcal{D}(0,0)$. Let $Y = \{y_j\} \in \mathcal{D}(0,0)$. For each $i$ let $\phi_i : Y \to X_i$ be an optimal bijection between $Y$ and $X_i$ using the distance function $d_1$. For each $y \in Y$ we have a selection $\{\phi_i(y)\}$ (to make this well defined, we think of the copies of $\Delta$ with $\phi_i^{-1}(x_j) = \Delta$ as each disjoint). Let $G$ be the grouping $\{\{\phi_i(y)\} : y \in Y\}$. If $Y$ is a local minimum of $F_1 : Z \mapsto \frac{1}{N}\sum_{i=1}^N d_1(X_i, Z)$, then $Y = m(G)$.

Proof. Suppose that $Y \ne m(G)$, and thus $y \ne m\{\phi_i(y)\}$ for some $y \in Y$. We split into cases depending on whether $y$ is the diagonal and, if not, on whether more than half of the points $\phi_i(y)$ are copies of the diagonal. If $y = \Delta$, then $\{\phi_i(y)\}$ contains at most one off-diagonal point. By Lemma 8 we know that $m\{\phi_i(y)\} = \Delta = y$, so this case cannot occur.
Suppose now that $y \ne \Delta$ and that more than half of $\{\phi_i(y)\}$ are copies of the diagonal. As $z$ moves from $y$ to a closest point on the diagonal, $\sum_{\{i : \phi_i(y) \ne \Delta\}} \|z - \phi_i(y)\|_1$ increases by less than $\sum_{\{i : \phi_i(y) = \Delta\}} \|z - \Delta\|_1$ decreases, and hence $\sum_i \|z - \phi_i(y)\|_1$ must be decreasing. Let $Z = \{z\} \cup Y \setminus \{y\}$. Since $d_1(Z, X_i)$ is at most the cost of the bijection obtained from $\phi_i$ by replacing $y$ with $z$, $F_1(Z)$ drops below $F_1(Y)$ as $z$ moves from $y$ towards the diagonal. Thus $Y$ cannot be a local minimum.

Finally, suppose that $y \ne \Delta$ and more than half of the points of $\{\phi_i(y)\}$ are off the diagonal. Consider the point $(\tilde{x}, \tilde{y}) \in \mathbb{R}^2$ introduced in Proposition 7 for this selection. If $(\tilde{x}, \tilde{y})$ lies above the diagonal, then by Proposition 7 we know that $\sum_i \|z - \phi_i(y)\|_1$ decreases as $z$ travels along a straight line from $y$ to $m\{\phi_i(y)\}$. If $(\tilde{x}, \tilde{y})$ lies on or below the diagonal, then the proof of Proposition 7 shows that $\sum_i \|z - \phi_i(y)\|_1$ decreases as $z$ moves from $y$ to $\Delta = m\{\phi_i(y)\}$. In both cases, by the same bijection argument, $F_1$ also drops below $F_1(Y)$ as $z$ travels from $y$ towards $m\{\phi_i(y)\}$. Having exhausted all the possibilities, we conclude that $Y$ is not a local minimum.

In the above theorem we made no assumption about the uniqueness of the optimal bijections $\phi_i : Y \to X_i$; the necessary condition holds for any set of optimal bijections. This is slightly different to the scenario of the mean. In [28] it was shown that if $Y$ is a local minimum of $F_2$, then the $\phi_i$ are essentially unique (up to relabelling points in the persistence diagrams at the same location in $\mathbb{R}^{2+}$). However, this uniqueness does not hold for the local minima of $F_1$. This is because shifting an observation $a_i$ within a population $\{a_1, \ldots, a_N\}$ of real numbers does not affect the median unless $\operatorname{sgn}(a_i - \operatorname{median}\{a_1, \ldots, a_N\})$ changes. Figure 8 provides an explicit example.
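As a rough illustration of how Theorem 10 reduces the search for a median to finitely many candidates, the Python sketch below enumerates the groupings of very small diagrams, forms $m(G)$ from the selection medians of Corollary 9, and keeps the candidates with the smallest $F_1$. It is a toy under explicit assumptions. Diagrams are lists of off-diagonal $(x, y)$ tuples, `None` is our own marker for a copy of $\Delta$, and $N$ is odd. Both the grouping enumeration and the computation of $d_1$ are brute force and exponential, so it is only usable for a handful of points. The first two calls use the configuration of Figure 11 with $z = 5$ and $z = 4$. The third uses the configuration in the proof of Proposition 16 with the illustrative values $a_1, a_2, c_1, c_2, b_1, b_2, d_1, d_2 = 0, 1, 2, 3, 4, 5, 6, 7$.

```python
import math
from itertools import permutations, product
from statistics import median

DIAGONAL = None  # illustrative marker for a copy of the diagonal Δ (not from the text)

def ground_cost(p, q):
    # L1 ground metric; matching (x, y) with Δ costs y - x.
    if p is DIAGONAL and q is DIAGONAL:
        return 0
    if p is DIAGONAL:
        return q[1] - q[0]
    if q is DIAGONAL:
        return p[1] - p[0]
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d1(X, Y):
    # Exact d_1 by brute force over bijections, padding each diagram with
    # diagonal copies; exponential, so only for a handful of points.
    A = list(X) + [DIAGONAL] * len(Y)
    B = list(Y) + [DIAGONAL] * len(X)
    return min(sum(ground_cost(a, b) for a, b in zip(A, perm)) for perm in permutations(B))

def selection_median(selection):
    # Corollary 9 (odd N), preferring the off-diagonal point on ties.
    xs = [math.inf if p is DIAGONAL else p[0] for p in selection]
    ys = [-math.inf if p is DIAGONAL else p[1] for p in selection]
    cand = (median(xs), median(ys))
    if not cand[0] < cand[1]:
        return DIAGONAL
    cost = lambda z: sum(ground_cost(z, p) for p in selection)
    return cand if cost(cand) <= cost(DIAGONAL) else DIAGONAL

def F1(Y, diagrams):
    return sum(d1(X, Y) for X in diagrams) / len(diagrams)

def brute_force_medians(diagrams):
    # Every median is m(G) for some grouping G (Theorem 10), so enumerate
    # groupings by aligning padded copies of each diagram with the first one.
    K = sum(len(X) for X in diagrams)  # at most K selections hold an off-diagonal point
    padded = [list(X) + [DIAGONAL] * (K - len(X)) for X in diagrams]
    candidates = set()
    for perms in product(*(permutations(P) for P in padded[1:])):
        selections = zip(padded[0], *perms)  # each column is one selection
        medians = [selection_median(s) for s in selections]
        candidates.add(tuple(sorted(m for m in medians if m is not DIAGONAL)))
    costs = {Y: F1(Y, diagrams) for Y in candidates}
    best = min(costs.values())
    return best, sorted(Y for Y, c in costs.items() if c == best)

print(brute_force_medians([[(0, 2), (3, 5)], [(1, 5)], []]))  # (2.0, [((3, 5),)])
print(brute_force_medians([[(0, 2), (3, 5)], [(1, 4)], []]))  # (2.0, [((1, 2),), ((3, 4),)])
print(brute_force_medians([[(0, 4), (1, 5)], [(2, 6), (3, 7)], []]))
# (4.0, [((2, 4), (3, 5)), ((2, 5), (3, 4))])
```

At $z = 4$ the two groupings of Figure 11 tie, and in the third call both $m_{G_1}$ and $m_{G_2}$ are returned as medians.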
The following is a sufficient condition for a persistence diagram to be a local minimum of the function $F_1 : Z \mapsto \frac{1}{N}\sum_{i=1}^N d_1(X_i, Z)$ when we restrict to input diagrams $X_i$ containing only finitely many off-diagonal points.

Theorem 11. Let $X_1, \ldots, X_N \in \mathcal{D}(0,0)$ be persistence diagrams with only finitely many off-diagonal points. Let $Y = \{y_j\} \in \mathcal{D}$. Suppose that whenever $\phi_i : Y \to X_i$ are optimal bijections,
1. $y = m\{\phi_i(y)\}$ whenever $y \in Y$ is off the diagonal, and
2. for any selection $S = \{x_1, x_2, \ldots, x_N\}$ such that $x_i \in X_i$ and $\phi_i^{-1}(x_i) = \Delta$, we have $m_S = \Delta$.
Then $Y$ is a local minimum of $F_1 : Z \mapsto \frac{1}{N}\sum_{i=1}^N d_1(X_i, Z)$.

Figure 8: $Y$ (black) is a local minimum of $F_1$. It is the unique median of the "triangle", "square" and "diamond" diagrams. There are two optimal bijections $\phi_3$ from $Y$ to the "diamond" diagram, creating two groupings $G_1$ and $G_2$. Here $Y = m(G_1) = m(G_2)$. (a) A grouping where $y_2$ is the median of the selection containing the diamond on the left. (b) A grouping where $y_2$ is the median of the selection containing the diamond on the right.

Proof. Let $\phi_i : Y \to X_i$ be optimal bijections. Assume that $y = m\{\phi_i(y)\}$ whenever $y \in Y$ is off the diagonal. Since each $X_i$ contains only finitely many off-diagonal points, there can only be finitely many selections of $\{X_1, X_2, \ldots, X_N\}$ containing some off-diagonal point. Thus there can only be finitely many off-diagonal points in $Y$.

Suppose that $Y$ is not a local minimum. Then there exists a sequence $Y_n$ that converges to $Y$ such that $F_1(Y_n) < F_1(Y)$ for all $n$. For each $Y_n$ fix optimal bijections $\psi_n : Y \to Y_n$. Fix an off-diagonal point $y \in Y$.
Since $\|y - \Delta\|_1 > 0$ and $d_1(Y, Y_n) \to 0$, we know that $\psi_n(y) \ne \Delta$ for large enough $n$. For each $i$ choose optimal bijections $\phi_i^n : Y_n \to X_i$. Consider the sequence $(\phi_i^n \circ \psi_n)(y) \in X_i$. Since $X_i$ has only finitely many off-diagonal points, this sequence has a constant subsequence (here we think of a sequence containing only copies of the diagonal as constant). By taking subsequences of subsequences (there are finitely many off-diagonal $y$ and finitely many $i$) we can find a subsequence $\hat{Y}_l$ of $Y_n$ such that $(\phi_i^l \circ \psi_l)(y)$ is constant for all off-diagonal $y \in Y$ and all $i$. Construct $\beta_i : Y \to X_i$ by $\beta_i(y) = (\phi_i^l \circ \psi_l)(y)$ for off-diagonal $y$, and $\beta_i^{-1}(x) = \Delta$ for any remaining unmatched points $x \in X_i$.

We will show that these $\beta_i : Y \to X_i$ are optimal bijections. For each bijection $\tau : A \to B$ let $C(\tau) = \sum_{a \in A} \|a - \tau(a)\|_1$ denote the 1-Wasserstein transportation cost via the bijection $\tau$. Thus $\tau : A \to B$ is an optimal bijection if and only if $d_1(A, B) = C(\tau)$. Suppose that $\beta_i : Y \to X_i$ is not optimal. Then there is some bijection $\alpha : Y \to X_i$ and $\epsilon > 0$ with $C(\alpha) < C(\beta_i) - \epsilon$. Since $\lim_{l \to \infty} \hat{Y}_l = Y$, there is some $l$ with $C(\psi_l) = d_1(Y, \hat{Y}_l) < \epsilon/3$. Let $\hat{\beta}_i = \alpha \circ \psi_l^{-1} : \hat{Y}_l \to X_i$ be the transportation plan that first transports $\hat{Y}_l$ via $\psi_l^{-1}$ to $Y$ and then transports via $\alpha$ to $X_i$. By construction, $C(\hat{\beta}_i) \le C(\alpha) + C(\psi_l) < C(\beta_i) - 2\epsilon/3$. At the same time, $\beta_i$ agrees with the composition of $\psi_l$ and the optimal bijection $\phi_i^l$, so $C(\beta_i) \le C(\psi_l) + C(\phi_i^l) \le C(\psi_l) + C(\hat{\beta}_i) < C(\hat{\beta}_i) + \epsilon/3$. Together these inequalities imply that $C(\beta_i) < C(\beta_i) - \epsilon/3$, which is impossible. Thus the $\beta_i = \phi_i^l \circ \psi_l : Y \to X_i$ are optimal bijections for all $i$.
Now
$$F_1(Y) = \frac{1}{N}\sum_{i=1}^N \left( \sum_{\{y \in Y : y \ne \Delta\}} \|y - \beta_i(y)\|_1 + \sum_{\{x \in X_i : \beta_i^{-1}(x) = \Delta\}} \|x - \Delta\|_1 \right)$$
and
$$F_1(\hat{Y}_l) = \frac{1}{N}\sum_{i=1}^N \left( \sum_{\{y \in Y : y \ne \Delta\}} \|\psi_l(y) - \phi_i^l(\psi_l(y))\|_1 + \sum_{\{\hat{y} \in \hat{Y}_l : \psi_l^{-1}(\hat{y}) = \Delta\}} \|\hat{y} - \phi_i^l(\hat{y})\|_1 \right) = \frac{1}{N}\sum_{i=1}^N \left( \sum_{\{y \in Y : y \ne \Delta\}} \|\psi_l(y) - \beta_i(y)\|_1 + \sum_{\{\hat{y} \in \hat{Y}_l : \psi_l^{-1}(\hat{y}) = \Delta\}} \|\hat{y} - \phi_i^l(\hat{y})\|_1 \right).$$
By assumption $y = m\{\beta_i(y)\}$, and hence
$$\sum_{i=1}^N \sum_{\{y \in Y : y \ne \Delta\}} \|y - \beta_i(y)\|_1 \le \sum_{i=1}^N \sum_{\{y \in Y : y \ne \Delta\}} \|\psi_l(y) - \beta_i(y)\|_1.$$
Thus for $F_1(\hat{Y}_l) < F_1(Y)$ to hold it must be true that
$$\sum_{i=1}^N \sum_{\{x \in X_i : \beta_i^{-1}(x) = \Delta\}} \|x - \Delta\|_1 > \sum_{i=1}^N \sum_{\{\hat{y} \in \hat{Y}_l : \psi_l^{-1}(\hat{y}) = \Delta\}} \|\hat{y} - \phi_i^l(\hat{y})\|_1.$$
Since $(\phi_i^l)^{-1}$ maps $\{x \in X_i : \beta_i^{-1}(x) = \Delta\}$ into $\{\hat{y} \in \hat{Y}_l : \psi_l^{-1}(\hat{y}) = \Delta\}$, we further know that
$$\sum_{i=1}^N \sum_{\{x \in X_i : \beta_i^{-1}(x) = \Delta\}} \|x - \Delta\|_1 > \sum_{i=1}^N \sum_{\{x \in X_i : \beta_i^{-1}(x) = \Delta\}} \|x - (\phi_i^l)^{-1}(x)\|_1.$$
This implies that there is a selection $S = \{x_1, x_2, \ldots, x_N\}$ such that $x_i \in X_i$, $\beta_i^{-1}(x_i) = \Delta$ and $m_S \ne \Delta$, contradicting our second condition (which applies since the $\beta_i$ are optimal bijections).

Theorem 10 provides us with an (admittedly very slow) algorithm to find the median. We can consider the set of all groupings $G$ up to equivalence and their corresponding candidates $m(G)$. The median is one of these $m(G)$, so we only need to compare $F_1(m(G))$ over all groupings $G$. We only need the necessary condition, as it is always allowable to check (finitely many) extra options when looking for global minima, as long as we know that all the local minima are in our list to check. Alternatively, we could use a gradient descent algorithm analogous to that used in [28]. The only modifications needed are replacing the optimal pairings using $d_2$ with optimal pairings using $d_1$, and replacing the means of the selections with the medians of the selections.
This algorithm terminates in finite time, as at each iteration the cost function $F_1$ decreases and a new grouping is used (of which there are only finitely many). This does not guarantee finding the global minimum; rather, the algorithm terminates at a local minimum. Running it multiple times from different initial locations can improve the estimate.

4 Comparing the median and the mean

4.1 Robustness of the median

For real numbers, the median is a robust measure of central tendency, while the mean is not. One measure of robustness is the breakdown point, which is a bound on the proportion of incorrect observations (e.g. arbitrarily large observations) that an estimator can handle before giving an incorrect (e.g. arbitrarily large) result. The median has a breakdown point of 50%, while the mean has a breakdown point of 0% (a single large observation can throw it off). In this section we prove that the breakdown points for the median and the mean of persistence diagrams are the same as those for real numbers. For persistence diagrams we replace "arbitrarily large" with "an arbitrarily large finite distance away".

Lemma 12. The breakdown point for the mean of a population of persistence diagrams lying in $\mathcal{D}(0,0)$ is 0%.

Proof. Let $X_1, X_2, \ldots, X_N \in \mathcal{D}(0,0)$ with mean $Y$. There is some $M > 0$ such that every point in $Y$ is at most distance $M$ from the diagonal. Fix $K > 0$ and let $\tilde{X}$ be a diagram with a single off-diagonal point $p = (0, \sqrt{2}(KN + MN))$. Observe that $p$ is at distance $KN + MN$ from the diagonal. Let $Z$ be a mean of $\{\tilde{X}, X_2, \ldots, X_N\}$. Using the characterization of the mean, $Z$ must contain a point at least distance $(KN + MN)/N = K + M$ from the diagonal. This implies that $d_2(Z, Y) \ge K$. By choosing $K$ arbitrarily large we can ensure that $Y$ and $Z$ are arbitrarily far apart.
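The contrast behind Lemma 12 is already visible on a single selection. The Python sketch below looks at five off-diagonal points (for instance, one point from each of five one-point diagrams) with no copies of the diagonal. The selection mean is then the ordinary coordinatewise average, and by Corollary 9 the selection median comes from coordinatewise medians. The base points and the outlier height `h` are arbitrary illustrative values, not data from the paper. As `h` grows, the $y$-coordinate of the mean grows like $h/5$, while the median stays at $(1, 4)$.

```python
from statistics import mean, median

def l1(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def selection_mean(points):
    # With no copies of the diagonal (k = N), the selection mean is the plain average.
    return (mean([x for x, _ in points]), mean([y for _, y in points]))

def selection_median(points):
    # Corollary 9 with no copies of the diagonal (N odd); None stands for Δ.
    cand = (median([x for x, _ in points]), median([y for _, y in points]))
    if not cand[0] < cand[1]:
        return None
    cost_cand = sum(l1(cand, p) for p in points)
    cost_diag = sum(y - x for x, y in points)  # every point matched to Δ
    return cand if cost_cand <= cost_diag else None

base = [(0, 2), (1, 3), (2, 5), (1, 4)]
for h in (10, 100, 1000):
    population = base + [(0, h)]  # one corrupted observation far from the diagonal
    print(h, selection_mean(population), selection_median(population))
```

This only illustrates one selection; the lemma itself concerns whole diagrams and the distance $d_2$.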
The breakdown point for the median of a population of persistence diagrams lying in $\mathcal{D}(0,0)$ is 50%. Let $\emptyset$ denote the persistence diagram containing only copies of the diagonal.

Theorem 13. Let $X_1, X_2, \ldots, X_{n+1} \in \mathcal{D}(0,0)$, each with only finitely many off-diagonal points. Then there exists a constant $M$ (dependent on $X_1, X_2, \ldots, X_{n+1}$) such that for any $X_{n+2}, \ldots, X_{2n+1} \in \mathcal{D}(0,0)$, any median of $\{X_1, X_2, \ldots, X_{n+1}, X_{n+2}, \ldots, X_{2n+1}\}$ is at distance at most $M$ from the persistence diagram containing only copies of the diagonal.

Proof. Set our bound $M$ as
$$M = \max_{\text{groupings } G \text{ of } X_1, \ldots, X_{n+1}} \; \sum_{\text{selections } s \in G} \big( \max\{y\text{-coords in } s\} - \min\{x\text{-coords in } s\} \big),$$
which depends only on the diagrams $\{X_1, X_2, \ldots, X_{n+1}\}$. Let $Y = \{(a_j, b_j)\}$ be a median of $\{X_1, \ldots, X_{n+1}, X_{n+2}, \ldots, X_{2n+1}\}$. Given optimal bijections $\phi_i : Y \to X_i$, denote the coordinates of $\phi_i((a_j, b_j))$ by $(x_{i,j}, y_{i,j})$ (writing $(\infty, -\infty)$ when $\phi_i((a_j, b_j)) = \Delta$). Since $a_j$ is the median of $\{x_{1,j}, x_{2,j}, \ldots, x_{2n+1,j}\}$, at most $n$ of these values are larger than $a_j$, so $a_j \ge \min\{x_{1,j}, x_{2,j}, \ldots, x_{n+1,j}\}$. Similarly, $b_j \le \max\{y_{1,j}, y_{2,j}, \ldots, y_{n+1,j}\}$. This implies that for each $j$ we have the bound
$$b_j - a_j \le \max\{y_{1,j}, \ldots, y_{n+1,j}\} - \min\{x_{1,j}, \ldots, x_{n+1,j}\}.$$
Thus
$$d_1(Y, \emptyset) = \sum_j (b_j - a_j) \le \sum_j \big( \max\{y_{1,j}, \ldots, y_{n+1,j}\} - \min\{x_{1,j}, \ldots, x_{n+1,j}\} \big).$$
From our characterization of the median of a set of persistence diagrams, we know that $Y = m(\hat{G})$ for some grouping $\hat{G}$ of $\{X_1, X_2, \ldots, X_{2n+1}\}$. Let $\hat{G}_{\mathrm{res}}$ be the restriction of $\hat{G}$ to the subset of diagrams $\{X_1, X_2, \ldots, X_{n+1}\}$. By construction,
$$M \ge \sum_{\text{selections } s \in \hat{G}_{\mathrm{res}}} \big( \max\{y\text{-coords in } s\} - \min\{x\text{-coords in } s\} \big) = \sum_j \big( \max\{y_{1,j}, \ldots, y_{n+1,j}\} - \min\{x_{1,j}, \ldots, x_{n+1,j}\} \big) \ge d_1(Y, \emptyset).$$

4.2 Number of points in the mean compared to the median

One qualitative difference between the mean and the median is the presence or absence of points with small persistence. In some applications, such as when we have point cloud samples of an underlying shape of interest, these are heuristically the result of noise. The mean of a selection containing at least one point off the diagonal is a point off the diagonal. It is possible for the mean of $N$ diagrams, each with $K$ points, to contain $NK$ off-diagonal points. In comparison, the median of any selection in which more than half of the points are copies of the diagonal is always a copy of the diagonal. Overall, this can add up to many extra points off the diagonal in the mean persistence diagram when compared to the median.

Lemma 14. Let $X_1, \ldots, X_N$ be persistence diagrams such that the average number of off-diagonal points in the $X_i$ is $K$. If $Y$ is a median of the $X_i$, then $Y$ has fewer than $2K$ points off the diagonal.

Proof. Let $y_1, y_2, \ldots, y_n$ be the off-diagonal points in $Y$. Let $\phi_i$ be optimal bijections between $Y$ and the $X_i$. By Theorem 10 we know that $y_j$ is the median of $\{\phi_i(y_j)\}$ for each $j$. By Lemma 8 we know that for each $j$ the set $\{\phi_i(y_j)\}$ must contain at least $(N+1)/2$ off-diagonal points. This implies that $\bigcup_j \{\phi_i(y_j)\}$ must contain at least $(N+1)n/2$ off-diagonal points. Since the total number of off-diagonal points in the $X_i$ is $NK$, we conclude that $(N+1)n/2 \le NK$ and hence $n < 2K$.

We illustrate this significant advantage of the median with a simulated geometric example. We generated point clouds of the unit circle by drawing 25 points from the uniform measure on the unit circle convolved with Gaussian noise with variance $\sigma^2$.
We then built the $H_1$ persistence diagrams from the corresponding Rips filtration of each point cloud (described in the appendix). For the five persistence diagrams thus produced we then computed the mean and the median. These are illustrated in Figure 9. Since the underlying shape of interest is a circle, there should be one point in each persistence diagram far from the diagonal, corresponding to the $H_1$ class of the circle, alongside extra "noisy" points near the diagonal (with more of these as the noise parameter in the sampling process increases). In the simulated data the mean and the median each have one point far from the diagonal (and these are close to each other), but with larger noise the mean diagram has more extra points near the diagonal than the median diagram.

4.3 Discontinuities and non-uniqueness of the mean and the median

Unfortunately, both the mean and the median are neither continuous nor always unique. There can be discontinuities when the grouping $G$ which provides the optimal candidate for the mean or the median switches. This is illustrated in Figures 10 and 11. In these examples we have three diagrams: one consists only of copies of the diagonal, one contains off-diagonal points $x_1 = (0,2)$ and $x_2 = (3,5)$ denoted by squares, and the other contains the off-diagonal point $(1,z)$ denoted by a triangle. As $z$ increases, the optimal grouping changes from $\{\{x_1, (1,z), \Delta\}, \{x_2, \Delta, \Delta\}\}$ to $\{\{x_1, \Delta, \Delta\}, \{x_2, (1,z), \Delta\}\}$, leading to a discontinuity in both the mean and the median (note that the value of $z$ where the switch occurs is different for the mean and the median). At the value where it switches, both groupings are equally optimal, and hence we have non-uniqueness.

Figure 9: Panels (a) to (f) correspond to $\sigma = 0, 0.05, 0.1, 0.15, 0.2, 0.25$. For each standard deviation $\sigma$ we randomly generated five noisy point clouds of the circle, each of 25 points drawn i.i.d. from the uniform measure on the unit circle convolved with Gaussian noise with standard deviation $\sigma$. From these five point clouds we constructed five Rips filtrations and their $H_1$ persistence diagrams. The corresponding median diagram is depicted using circles and the mean diagram using triangles.

The mean is generically unique but the median is not. To show this rigorously, we restrict ourselves to the case where we have $N$ diagrams, each with only finitely many off-diagonal points. Let $k_1, k_2, \ldots, k_N$ be non-negative integers. Let $U(k_1, k_2, \ldots, k_N)$ denote the space of sets of diagrams $\mathcal{X} = \{X_1, X_2, \ldots, X_N\}$ such that $X_i$ has $k_i$ off-diagonal points. $U(k_1, k_2, \ldots, k_N)$ is the quotient of $(\mathbb{R}^{2+})^{k_1 + k_2 + \cdots + k_N}$ by a finite group of symmetries $\Gamma$. There is a quotient map
$$q : (\mathbb{R}^{2+})^{k_1 + k_2 + \cdots + k_N} \to U(k_1, k_2, \ldots, k_N) = (\mathbb{R}^{2+})^{k_1 + k_2 + \cdots + k_N} / \Gamma.$$
Let $\lambda$ be Lebesgue measure on $(\mathbb{R}^{2+})^{k_1 + k_2 + \cdots + k_N}$ and let $\rho = q_*(\lambda)$ be the pushforward of Lebesgue measure onto $U(k_1, k_2, \ldots, k_N)$.

Proposition 15. The set of sets of diagrams in $U(k_1, k_2, \ldots, k_N)$ which do not have a unique mean has measure zero.

Proof. Let $\tilde{A}$ be the set of sets of diagrams in $U(k_1, k_2, \ldots, k_N)$ which do not have a unique mean. Then $A = q^{-1}(\tilde{A})$ is the set of vectors of labelled diagrams (in $(\mathbb{R}^{2+})^{k_1 + k_2 + \cdots + k_N}$) which do not have a unique mean. Since $\rho(\tilde{A}) = \lambda(q^{-1}(\tilde{A}))$, it is sufficient to show that $\lambda(A) = 0$. Let $S$ be a selection containing the points $\{(a_1, b_1), (a_2, b_2), \ldots, (a_k, b_k)\}$ together with $N - k$ copies of the diagonal. In the appendix we define the mean of the selection $S$ as the minimizer of $f_S(y) = \sum_{x \in S} \|x - y\|_2^2$.
The minimizer is
$$\mu_S = \left( \frac{1}{N}\left( k\hat{x} + (N-k)\frac{\hat{x} + \hat{y}}{2} \right), \; \frac{1}{N}\left( k\hat{y} + (N-k)\frac{\hat{x} + \hat{y}}{2} \right) \right),$$
where $\hat{x}$ and $\hat{y}$ are the means of $a_1, a_2, \ldots, a_k$ and $b_1, b_2, \ldots, b_k$ respectively. For each pair of distinct groupings $G_1$ and $G_2$ of the labelled diagrams, let
$$A(G_1, G_2) = \Big\{ \mathcal{X} = (X_1, X_2, \ldots, X_N) : \sum_{S \in G_1} f_S(\mu_S) = \sum_{S \in G_2} f_S(\mu_S) \Big\}.$$
Any $\mathcal{X} \in A(G_1, G_2)$ must satisfy a quadratic equation, so either $A(G_1, G_2) = (\mathbb{R}^{2+})^{k_1 + k_2 + \cdots + k_N}$ or $\lambda(A(G_1, G_2)) = 0$. Since there clearly exists a vector of labelled persistence diagrams $\mathcal{X} = (X_1, X_2, \ldots, X_N) \in (\mathbb{R}^{2+})^{k_1 + k_2 + \cdots + k_N}$ such that $\mathcal{X} \notin A(G_1, G_2)$, we conclude that $\lambda(A(G_1, G_2)) = 0$. If $\mathcal{X}$ has more than one mean, then by Proposition 18 there must be groupings $G_1, G_2$ such that $\mu_{G_1} \ne \mu_{G_2}$ but
$$\sum_{S \in G_1} f_S(\mu_S) = F_2(\mu_{G_1}) = F_2(\mu_{G_2}) = \sum_{S \in G_2} f_S(\mu_S).$$
This implies $A \subseteq \bigcup_{G_1 \ne G_2 \text{ groupings}} A(G_1, G_2)$. There are only finitely many groupings, so $\lambda(A) = 0$.

Figure 10: We have three diagrams: one consists only of copies of the diagonal, one contains off-diagonal points $(0,2)$ and $(3,5)$ denoted by squares, and the other contains the off-diagonal point $(1,z)$ denoted by a triangle. (a) The mean for $z \le 3.99071$, with off-diagonal points $((z+7)/12, (11+5z)/12)$ and $(11/3, 13/3)$. (b) The mean for $z \ge 3.99071$, with off-diagonal points $(2/3, 4/3)$ and $((25+z)/12, (29+5z)/12)$. In (a) $F_2(\text{circles}) = \frac{8639 - 3995z + 1268z^2}{6534}$ and in (b) $F_2(\text{circles}) = \frac{191 - 58z + 7z^2}{36}$. When $z < 3.99070$ the optimal grouping is $\{(0,2), (1,z), \Delta\}$ and $\{(3,5), \Delta, \Delta\}$ (used in (a)). When $z > 3.99072$ the optimal grouping is $\{(0,2), \Delta, \Delta\}$ and $\{(3,5), (1,z), \Delta\}$ (used in (b)). Both groupings are optimal when $z \simeq 3.99071$, and as a result we do not have a unique mean.
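The closed form for $\mu_S$ can be checked numerically. The Python sketch below (helper names are ours) evaluates $\mu_S$ for a selection with $k \ge 1$ off-diagonal points and $N - k$ copies of the diagonal. It also evaluates the cost $f_S$, in which a copy of the diagonal contributes the squared Euclidean distance $(v - u)^2 / 2$ from $(u, v)$ to the diagonal, and a central-difference gradient of $f_S$ at $\mu_S$, which should vanish because $f_S$ is a convex quadratic. For the selection $\{(0,2), (1,z), \Delta\}$ it reproduces the point $((z+7)/12, (11+5z)/12)$ labelled in Figure 10(a).

```python
def mu_S(points, N):
    # Closed form above for a selection with k = len(points) >= 1
    # off-diagonal points and N - k copies of the diagonal.
    k = len(points)
    x_hat = sum(a for a, _ in points) / k
    y_hat = sum(b for _, b in points) / k
    mid = (x_hat + y_hat) / 2
    return ((k * x_hat + (N - k) * mid) / N, (k * y_hat + (N - k) * mid) / N)

def f_S(z, points, N):
    # Sum of squared Euclidean distances; the squared distance from (u, v)
    # to the diagonal is (v - u)^2 / 2.
    u, v = z
    total = sum((u - a) ** 2 + (v - b) ** 2 for a, b in points)
    return total + (N - len(points)) * (v - u) ** 2 / 2

def numeric_gradient(z, points, N, h=1e-6):
    # Central differences; f_S is quadratic, so this is accurate up to rounding.
    u, v = z
    gu = (f_S((u + h, v), points, N) - f_S((u - h, v), points, N)) / (2 * h)
    gv = (f_S((u, v + h), points, N) - f_S((u, v - h), points, N)) / (2 * h)
    return gu, gv

z = 5
m = mu_S([(0, 2), (1, z)], N=3)
print(m, ((z + 7) / 12, (11 + 5 * z) / 12))       # (1.0, 3.0) (1.0, 3.0)
print(numeric_gradient(m, [(0, 2), (1, z)], N=3))  # both components close to 0
```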
Figure 11: We have three diagrams: one consists only of copies of the diagonal, one contains off-diagonal points $(0,2)$ and $(3,5)$ denoted by squares, and the other contains the off-diagonal point $(1,z)$ denoted by a triangle. (a) The median for $z \le 4$, with off-diagonal point $(1,2)$. (b) The median for $z \ge 4$, with off-diagonal point $(3,z)$. When $z < 4$ the optimal grouping is $\{(0,2), (1,z), \Delta\}$ and $\{(3,5), \Delta, \Delta\}$ (the grouping used in (a)). When $z > 4$ the optimal grouping is $\{(0,2), \Delta, \Delta\}$ and $\{(3,5), (1,z), \Delta\}$ (the grouping used in (b)). Both are optimal when $z = 4$, and as a result we do not have a unique median.

This generic uniqueness of the mean contrasts sharply with the case of the median, which is not generically unique.

Proposition 16. Let $N \ge 3$ be an odd number. Let $k_1, k_2, \ldots, k_{(N+1)/2} \ge 2$. The set of sets of diagrams in $U(k_1, k_2, \ldots, k_N)$ which do not have a unique median has positive measure.

Proof. We first illustrate this with the case $U(2,2,0)$, which shows the idea of the general case. Suppose that $X_1$ and $X_2$ contain two off-diagonal points each, $\{(a_1, b_1), (a_2, b_2)\}$ and $\{(c_1, d_1), (c_2, d_2)\}$ respectively, and that $X_3$ has no off-diagonal points. Further suppose that $a_1, a_2 < c_1, c_2 < b_1, b_2 < d_1, d_2$. First consider the grouping $G_1 = \{S(1,2), S(2,1)\}$ where $S(1,2) := \{(a_1, b_1), (c_2, d_2), \Delta\}$ and $S(2,1) := \{(a_2, b_2), (c_1, d_1), \Delta\}$. The median of the selection $S(1,2)$ is $(c_2, b_1)$ and the median of the selection $S(2,1)$ is $(c_1, b_2)$. This implies that $m_{G_1}$ has off-diagonal points $\{(c_2, b_1), (c_1, b_2)\}$. Also consider the grouping $G_2 = \{S(1,1), S(2,2)\}$ where $S(1,1) := \{(a_1, b_1), (c_1, d_1), \Delta\}$ and $S(2,2) := \{(a_2, b_2), (c_2, d_2), \Delta\}$. Analogous calculations show that the off-diagonal points of $m_{G_2}$ are $\{(c_1, b_1), (c_2, b_2)\}$.
These groupings are illustrated in Figures 12a and 12b. Now $F_1(m_{G_1}) = -a_1 + d_1 - a_2 + d_2 = F_1(m_{G_2})$ and $F_1(m_G) \geq -a_1 + d_1 - a_2 + d_2$ for all other groupings $G$. This implies that $m_{G_1}$ and $m_{G_2}$ are both medians of $X$. If $b_1 \neq b_2$ and $c_1 \neq c_2$ these medians are distinct, and thus we do not have a unique median. The set of such triples of diagrams $\{X_1, X_2, X_3\}$ has non-zero measure in $U(2,2,0)$. The extension of this example to the case $k_1, k_2, \ldots, k_{(N+1)/2} > 2$ is illustrated in Figure 12c. We need to find an example of a non-zero measure set of $(X_1, X_2, \ldots, X_N) \in (\mathbb{R}^2_+)^{k_1 + k_2 + \cdots + k_N}$ with $k_1, k_2, \ldots, k_{(N+1)/2} > 2$ with non-unique medians.

[Figure 12. Panel (a): $m_{G_1}$. Panel (b): $m_{G_2}$. Panel (c): the different regions of $\mathbb{R}^2_+$ (labelled 1, 2 and 3) for constructing sets of populations of persistence diagrams with non-unique medians.]

We will require that:
• $X_1$ has two points $(a_1,b_1)$ and $(a_2,b_2)$ in region 1,
• $X_2$ has two points $(c_1,d_1)$ and $(c_2,d_2)$ in region 2,
• $X_3, X_4, \ldots, X_{(N+1)/2}$ each contain two points in region 3, and
• every other off-diagonal point in the $X_i$ lies in the region patterned by crosshatch.

Note that this set of populations of persistence diagrams has non-zero measure in $U(k_1, k_2, \ldots, k_N)$. Every median of $\{X_i\}$ can be written as $m_G$, where each selection in $G$ contains either points in the cross-hatched region (and potentially copies of the diagonal), or one point each from regions 1 and 2, $(N-3)/2$ points from region 3, and $(N-1)/2$ copies of the diagonal. There is a median $m$ with off-diagonal points $\{(c_1,b_1), (c_2,b_2)\}$ alongside other points determined by the points in the cross-hatched region.
Another median $\tilde{m}$ is the same as $m$ but with $\{(c_1,b_1), (c_2,b_2)\}$ replaced by $\{(c_2,b_1), (c_1,b_2)\}$.

5 Discussion and further directions

There are many parallels between the mean and median of populations of persistence diagrams. This suggests that some future directions could involve extending work that has been done on the mean to corresponding results for the median. For example, in [23] the authors explore an alternative probabilistic definition of the mean which combines the traditional mean with the notion of a shaking hand equilibrium in game theory. This alternative definition is unique and continuous. We believe a similar idea would work to create a probabilistic definition of the median.

Another future direction is to combine the median with sampling theorems to find conditions under which the correct homology can be inferred with high probability. For example, the homology of a set can be inferred from the persistence diagram corresponding to a point cloud with small Hausdorff distance to the original set. Under certain sampling conditions we can ensure that this Hausdorff distance is small with high probability. Perhaps the median of independently obtained persistence diagrams under such sampling conditions will provide the correct homology with even higher probability.

There is scope for further developments in algorithms, both in design and implementation. In this paper we have only briefly referred to some of the computational aspects, including mentioning a brute force algorithm and a gradient descent approach. There could perhaps be significant improvements from exploiting geometry, analogous to the work in [17], where the authors show that by exploiting the inherent geometry of the points in persistence diagrams lying in a plane one can approximate Wasserstein distances much faster.
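To make the brute-force idea concrete on tiny inputs, here is a minimal Python sketch of our own; it is not the algorithm from this paper. It assumes the L1 ground metric, under which the distance from $(x, y)$ to the diagonal is $y - x$ (consistent with the cost $F_1(m_{G_1}) = -a_1 + d_1 - a_2 + d_2$ in the proof of Proposition 16), and it assumes $N$ is odd. Under that assumption, and as our own derivation, the median of a selection is found by taking the median of the $x$-coordinates with every diagonal copy counted as $+\infty$ and the median of the $y$-coordinates with every diagonal copy counted as $-\infty$, falling back to the diagonal when that is cheaper. The search then enumerates every grouping, so its running time grows factorially. The names `selection_median` and `brute_force_medians` are hypothetical.

```python
import math
from itertools import permutations, product

DIAG = None  # stands for one copy of the diagonal


def selection_median(selection):
    """Return (cost, point) for one selection under the assumed L1 ground metric.

    `point` is DIAG when the cheapest choice is the diagonal itself.
    Assumes len(selection) is odd.
    """
    pts = [p for p in selection if p is not DIAG]
    m = len(selection) - len(pts)  # number of diagonal copies
    diag_cost = sum(y - x for x, y in pts)  # every point matched to the diagonal
    if not pts:
        return 0, DIAG
    mid = len(selection) // 2
    # Diagonal copies pull x upwards (+inf) and y downwards (-inf).
    mx = sorted([x for x, _ in pts] + [math.inf] * m)[mid]
    my = sorted([y for _, y in pts] + [-math.inf] * m)[mid]
    if math.isfinite(mx) and math.isfinite(my) and mx < my:
        cost = sum(abs(mx - x) + abs(my - y) for x, y in pts) + m * (my - mx)
        if cost < diag_cost:
            return cost, (mx, my)
    return diag_cost, DIAG


def brute_force_medians(diagrams, tol=1e-9):
    """Minimise the summed selection cost over all groupings.

    Returns (optimal cost, set of median diagrams), each diagram being a sorted
    tuple of its off-diagonal points.
    """
    total_pts = sum(len(d) for d in diagrams)
    # Pad every diagram with diagonal copies so all have the same length.
    padded = [list(d) + [DIAG] * (total_pts - len(d)) for d in diagrams]
    others = [set(permutations(p)) for p in padded[1:]]
    best, medians = math.inf, set()
    for arrangement in product(*others):
        total, points = 0, []
        for selection in zip(padded[0], *arrangement):
            cost, point = selection_median(selection)
            total += cost
            if point is not DIAG:
                points.append(point)
        diagram = tuple(sorted(points))
        if total < best - tol:
            best, medians = total, {diagram}
        elif abs(total - best) <= tol:
            medians.add(diagram)
    return best, medians
```

For the diagrams of Figure 11 with $z = 4$, `brute_force_medians([[(0, 2), (3, 5)], [(1, 4)], []])` returns a total cost of 6 with two distinct median diagrams, `((1, 2),)` and `((3, 4),)`, in line with the non-uniqueness at $z = 4$. With $(a_1,b_1), (a_2,b_2), (c_1,d_1), (c_2,d_2) = (0,4), (1,5), (2,6), (3,7)$ in the configuration of Proposition 16, it returns the cost $12 = -a_1 + d_1 - a_2 + d_2$ and exactly the two medians $\{(c_1,b_1), (c_2,b_2)\}$ and $\{(c_2,b_1), (c_1,b_2)\}$.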
References
[1] Robert J Adler, Omer Bobrowski, and Shmuel Weinberger. Crackle: The homology of noise. Discrete & Computational Geometry, 52(4):680–704, 2014.
[2] Chanderjit Bajaj. The algebraic degree of geometric optimization problems. Discrete & Computational Geometry, 3(2):177–191, 1988.
[3] Paul Bendich, Herbert Edelsbrunner, and Michael Kerber. Computing robustness and persistence for images. IEEE Transactions on Visualization and Computer Graphics, 16(6):1251–1260, 2010.
[4] Andrew J Blumberg, Itamar Gal, Michael A Mandell, and Matthew Pancia. Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces. Foundations of Computational Mathematics, 14(4):745–789, 2014.
[5] Omer Bobrowski and Robert J Adler. Distance functions, critical points, and topology for some random complexes. arXiv preprint arXiv:1107.4775, 2011.
[6] Peter Bubenik, Gunnar Carlsson, Peter T Kim, and Zhi-Ming Luo. Statistical topology via Morse theory persistence and nonparametric estimation. Algebraic Methods in Statistics and Probability II, 516:75–92, 2010.
[7] Peter Bubenik and Pawel Dlotko. A persistence landscapes toolbox for topological statistics. 2013.
[8] Andrea Cerri, Massimo Ferri, and Daniela Giorgi. Retrieval of trademark images by means of size functions. Graphical Models, 68(5):451–471, 2006.
[9] Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. On the bootstrap for persistence diagrams and landscapes. arXiv preprint arXiv:1311.0376, 2013.
[10] Frédéric Chazal, Marc Glisse, Catherine Labruère, and Bertrand Michel. Optimal rates of convergence for persistence diagrams in topological data analysis. arXiv preprint arXiv:1305.6239, 2013.
[11] William Crawley-Boevey. Decomposition of pointwise finite-dimensional persistence modules. Journal of Algebra and Its Applications, 14(05):1550066, 2015.
[12] H. Edelsbrunner and J. L. Harer. Computational Topology: An Introduction. Amer. Math. Soc., 2010.
[13] Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, Aarti Singh, et al. Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301–2339, 2014.
[14] Jennifer Gamble and Giseon Heo. Exploring uses of persistent homology for statistical analysis of landmark-based shape data. Journal of Multivariate Analysis, 101(9):2184–2199, 2010.
[15] Matthew Kahle. Random geometric complexes. Discrete & Computational Geometry, 45(3):553–573, 2011.
[16] Matthew Kahle, Elizabeth Meckes, et al. Limit theorems for Betti numbers of random simplicial complexes. Homology, Homotopy and Applications, 15(1):343–374, 2013.
[17] Michael Kerber, Dmitriy Morozov, and Arnur Nigmetov. Geometry helps to compare persistence diagrams. Journal of Experimental Algorithmics (JEA), 22:1–4, 2017.
[18] William A Kirk. Geodesic geometry and fixed point theory. In Seminar of Mathematical Analysis (Malaga/Seville, 2002/2003), volume 64, pages 195–225, 2003.
[19] Xiaosun Lu, JS Marron, and Perry Haaland. Object-oriented data analysis of cell images. Journal of the American Statistical Association, 109(506):548–559, 2014.
[20] J Steve Marron and Andrés M Alonso. Overview of object oriented data analysis. Biometrical Journal, 56(5):732–753, 2014.
[21] JS Marron, James O Ramsay, Laura M Sangalli, Anuj Srivastava, et al. Functional data analysis of amplitude and phase variation. Statistical Science, 30(4):468–484, 2015.
[22] Yuriy Mileyko, Sayan Mukherjee, and John Harer. Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007, 2011.
[23] Elizabeth Munch, Katharine Turner, Paul Bendich, Sayan Mukherjee, Jonathan Mattingly, John Harer, et al. Probabilistic Fréchet means for time varying persistence diagrams. Electronic Journal of Statistics, 9(1):1173–1204, 2015.
[24] Shin-ichi Ohta. Barycenters in Alexandrov spaces of curvature bounded below. Adv. Geom., 12:571–587, 2012.
[25] James O Ramsay. Functional data analysis. Wiley Online Library, 2006.
[26] Vanessa Robins and Katharine Turner. Principal component analysis of persistent homology rank functions with case studies of spatial point patterns, sphere packing and colloids. Physica D: Nonlinear Phenomena, (to appear).
[27] Laura M Sangalli, Piercesare Secchi, and Simone Vantini. Object oriented data analysis: a few methodological challenges. Biometrical Journal, 56(5):774–777, 2014.
[28] Katharine Turner, Yuriy Mileyko, Sayan Mukherjee, and John Harer. Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry, 52(1):44–70, July 2014.
[29] Katharine Turner, Sayan Mukherjee, and Doug M Boyer. Persistent homology transform for modeling shapes and surfaces. Information and Inference, 3(4):310–344, 2014.
[30] Haonan Wang, JS Marron, et al. Object oriented data analysis: Sets of trees. The Annals of Statistics, 35(5):1849–1873, 2007.
[31] D Yogeshwaran, Robert J Adler, et al. On the topology of random complexes built over stationary point processes. The Annals of Applied Probability, 25(6):3338–3380, 2015.
[32] D Yogeshwaran, Eliran Subag, and Robert J Adler. Random geometric complexes in the thermodynamic regime. Probability Theory and Related Fields, pages 1–36, 2014.
A Rips filtration

The Rips filtration $R$ is the filtration of the flag complex on 25 vertices where, at time $t$, $R_t$ contains all the vertices $[v]$, the edges $[v_0, v_1]$ whenever $\|v_0 - v_1\| < t$, the 2-simplices $[v_0, v_1, v_2]$ whenever $[v_0, v_1]$, $[v_1, v_2]$ and $[v_2, v_0]$ are all included in $R_t$, and so on, including higher dimensional simplices whenever all their boundary faces are in $R_t$. In other words, $R_t$ is the flag complex (also known as the clique complex) on the graph containing all edges of length less than $t$.

B Mean diagram

The methods here provide a proof of the necessary condition for a persistence diagram to be a local minimum of the Fréchet function which is far simpler than that in [28]. We also extend the results to persistence diagrams containing points in $L_\infty$ and $L_{-\infty}$. Due to the similarities to the earlier material we omit many of the details.

For the mean we also split our analysis into the restrictions to $\mathbb{R}^2_+ \cup \Delta$, $L_\infty$ and $L_{-\infty}$. If $X_1, X_2, \ldots, X_N \in D(k,l)$ then $Y$ is a mean of the $X_i$ if and only if $Y|_{\mathbb{R}^2_+ \cup \Delta}$, $Y|_{L_\infty}$ and $Y|_{L_{-\infty}}$ are means of the $X_i$, each restricted to the appropriate domain. As $L_\infty$ and $L_{-\infty}$ are effectively copies of $\mathbb{R}$, we can easily characterize the means of populations of multisets in them. Suppose $A_1, A_2, \ldots, A_N$ are each multisets of exactly $k$ real numbers, and label the elements of each $A_i$ so that $A_i = \{a_{i,1}, a_{i,2}, \ldots, a_{i,k}\}$ with $a_{i,1} \leq a_{i,2} \leq \cdots \leq a_{i,k}$. Set $B = \{b_1, b_2, \ldots, b_k\}$ where $b_j$ is the mean of $\{a_{1,j}, \ldots, a_{N,j}\}$. Then $B$ is the unique multiset of $k$ real numbers that minimizes
$$f_2 : Y \mapsto \sum_{i=1}^{N} \left( \inf_{\varphi : A_i \to Y \text{ bijection}} \sum_{a \in A_i} |a - \varphi(a)|^2 \right),$$
and hence $B$ is the mean. Characterizing the means of populations of persistence diagrams in $D(0,0)$ is analogously achieved through the means of selections.

Lemma 17. Let $(a_1,b_1), (a_2,b_2), \ldots, (a_k,b_k)$ be points in the plane and let $N \geq k$. Let $\hat{x}$ be the mean of $a_1, a_2, \ldots, a_k$ and $\hat{y}$ the mean of $b_1, b_2, \ldots, b_k$. Then
$$(\tilde{x}, \tilde{y}) := \left( \frac{1}{N}\left( k\hat{x} + (N-k)\frac{\hat{x}+\hat{y}}{2} \right), \frac{1}{N}\left( k\hat{y} + (N-k)\frac{\hat{x}+\hat{y}}{2} \right) \right)$$
is the unique point in $\mathbb{R}^2_+$ which minimizes
$$f(x,y) = \sum_{i=1}^{k} \|(x,y) - (a_i,b_i)\|_2^2 + \sum_{i=k+1}^{N} \|(x,y) - \Delta\|_2^2.$$

Let $S$ be a multiset in $\mathbb{R}^2_+ \cup \Delta$ containing $(a_1,b_1), (a_2,b_2), \ldots, (a_k,b_k)$ and $N-k$ copies of the diagonal, and let $(x,y)$ be the point in $\mathbb{R}^2_+$ found in Lemma 17. We call this $(x,y)$ the mean of $S$ and denote it by $\mu_S$. Let $\mu_G$ be the persistence diagram which contains $\{\mu_{S_j} : S_j \in G\}$. Each grouping $G$ produces a candidate $\mu_G$ for the mean. We will show that any mean must be $\mu_G$ for some grouping $G$.

Theorem 18. Let $X_1, \ldots, X_N, Y \in D(0,0)$ be persistence diagrams, each with finitely many off-diagonal points. Let $F_2 : Z \mapsto \frac{1}{N}\sum_{i=1}^{N} d_2(X_i, Z)^2$. For each $i$ fix an optimal bijection $\varphi_i : Y \to X_i$ (for $d_2$). For each $y \in Y$ we have a selection $\{\varphi_i(y)\}$. Let $G_Y$ be the grouping $\{\{\varphi_i(y)\} : y \in Y\}$. If $Y$ is a local minimum of $F_2$ then $Y = \mu_{G_Y}$.

Proof. Suppose that $Y \neq \mu_{G_Y}$, and thus $y_0 \neq \mu_{\{\varphi_i(y_0)\}}$ for some $y_0 \in Y$. Set $Y_t$ to be the diagram which agrees with $Y$ except that the point $y_0$ is replaced with $(1-t)y_0 + t\mu_{\{\varphi_i(y_0)\}}$. Then
$$F_2(Y_t) = \frac{1}{N}\sum_{i=1}^{N} \inf_{\varphi_i^t : Y_t \to X_i} \sum_{y_t \in Y_t} \|y_t - \varphi_i^t(y_t)\|^2 \leq \frac{1}{N}\sum_{i=1}^{N} \left( \sum_{y \in Y,\, y \neq y_0} \|y - \varphi_i(y)\|^2 + \left\|(1-t)y_0 + t\mu_{\{\varphi_i(y_0)\}} - \varphi_i(y_0)\right\|^2 \right).$$
Since each $\varphi_i$ is optimal for $Y = Y_0$, we can conclude that for all $t \in (0,1)$
$$F_2(Y_t) - F_2(Y_0) \leq \frac{1}{N}\sum_{i=1}^{N} \left( \left\|(1-t)y_0 + t\mu_{\{\varphi_i(y_0)\}} - \varphi_i(y_0)\right\|^2 - \|y_0 - \varphi_i(y_0)\|^2 \right),$$
which is negative by the proof of Lemma 17, since moving $y_0$ part of the way towards the unique minimizer $\mu_{\{\varphi_i(y_0)\}}$ of the convex function $f$ for the selection $\{\varphi_i(y_0)\}$ strictly decreases $f$. This implies that $Y_0 = Y$ cannot be a local minimum.
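The characterization of means of multisets of real numbers (the $L_\infty$ and $L_{-\infty}$ parts) stated at the start of this appendix is easy to check numerically. The minimal Python sketch below is our own illustration; `sorted_mean` and `f2` are hypothetical names, and `f2` brute-forces the infimum over bijections, so it is only suitable for very small $k$.

```python
from itertools import permutations


def sorted_mean(multisets):
    """b_j is the mean of the j-th smallest elements of A_1, ..., A_N."""
    columns = zip(*(sorted(a) for a in multisets))
    return [sum(col) / len(col) for col in columns]


def f2(multisets, y):
    """f_2(Y): for each A_i, the cheapest bijection A_i -> Y under squared cost."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(a_i, perm)) for perm in permutations(y))
        for a_i in multisets
    )


A = [[0, 4], [1, 3], [5, 2]]
B = sorted_mean(A)
print(B, f2(A, B))  # [1.0, 4.0] 4.0
```

Sorting each $A_i$ first is what pairs the $j$-th smallest elements; averaging the unsorted lists would, for example, pair 5 with 0 and 1 in the third multiset and give a worse value of $f_2$.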
Figure 13: We want the mean of the three points marked by circles alongside two copies of the diagonal. The gray square is the arithmetic mean of the three points marked by circles. The diamond is the point on the diagonal closest to the square. The triangle is the mean of the circles and two copies of the diamond; it is the weighted average of the square and the diamond.