Persistence Barcodes versus Kolmogorov Signatures: Detecting Modes of One-Dimensional Signals

We investigate the problem of estimating the number of modes (i.e., local maxima) - a well known question in statistical inference - and we show how to do so without presmoothing the data. To this end, we modify the ideas of persistence barcodes by f…

Authors: Ulrich Bauer, Axel Munk, Hannes Sieling, Max Wardetzky

Persistence Barcodes versus Kolmogorov Signatures: Detecting Modes of One-Dimensional Signals*

Ulrich Bauer (1), Axel Munk (2,3), Hannes Sieling (2), and Max Wardetzky (4)

(1) Technische Universität München (TUM)
(2) Institute for Mathematical Stochastics, University of Göttingen
(3) Max Planck Institute for Biophysical Chemistry, Göttingen
(4) Institute of Numerical and Applied Mathematics, University of Göttingen

January 27, 2015

Abstract

We investigate the problem of estimating the number of modes (i.e., local maxima), a well known question in statistical inference, and we show how to do so without presmoothing the data. To this end, we modify the ideas of persistence barcodes by first relating persistence values in dimension one to distances (with respect to the supremum norm) to the sets of functions with a given number of modes, and subsequently working with norms different from the supremum norm. As a particular case we investigate the Kolmogorov norm. We argue that this modification has certain statistical advantages. We offer confidence bands for the attendant Kolmogorov signatures, thereby allowing for the selection of relevant signatures with a statistically controllable error. As a result of independent interest, we show that taut strings minimize the number of critical points for a very general class of functions. We illustrate our results by several numerical examples.

AMS subject classification: Primary 62G05, 62G20; secondary 62H12

1 Introduction

Persistent homology [16, 17] provides a quantitative notion of the stability or robustness of critical values of a (sufficiently nice) real valued function $f$ on a topological space: the persistence of a critical value is a lower bound on the amount of perturbation (in the supremum norm) required for its elimination.
Persistence measures the life span of homological features in terms of the difference between birth and death of such features, according to the filtration of the underlying topological space that arises from the sublevel sets of $f$. Birth and death of homological features of $f$ can be encoded in a barcode diagram, see [17]. In this article, we consider what we call persistence signatures, defined as (half) the life span (or persistence) of critical values, i.e., persistence signatures correspond to (half) the lengths of persistence barcodes and (when properly ordered) give rise to a descending sequence

$$s_{0,\infty}(f) \ge s_{1,\infty}(f) \ge s_{2,\infty}(f) \ge \cdots, \tag{1}$$

where we appropriately account for multiplicity of critical values. In our setup, $s_{0,\infty}(f)$ denotes the largest finite persistence value of $f$, and we extend the sequence by zeros beyond the smallest positive persistence value of $f$.

* Research partially supported by DFG FOR 916, Volkswagen Foundation, and the Toposys project FP7-ICT-318493-STREP.

We consider one dimensional signals $f\colon [0,1] \to \mathbb{R}$. For the moment, to illustrate our results, let $X$ denote the space of piecewise constant real-valued functions on a (variable) equipartition of $[0,1]$. (Later in our exposition, we also consider more general function spaces.) Let $X_k \subset X$ denote the space of functions with at most $k$ modes, i.e., local maxima, where we only count inner local maxima. Our point of departure is the observation that

$$s_{k,\infty}(f) = \operatorname{dist}_\infty(f, X_k),$$

i.e., $s_{k,\infty}(f)$ equals the distance of $f$ to the space of functions with at most $k$ modes with respect to the sup norm. This follows from the combination of two facts. First, from the celebrated stability theorem in persistent homology [10], which asserts that

$$|s_{k,\infty}(f) - s_{k,\infty}(g)| \le \|f - g\|_\infty \quad \text{for all } k \ge 0.$$
Second, from the fact that in order to eliminate all positive persistence signatures of $f\colon [0,1] \to \mathbb{R}$ with value less than or equal to $\delta$, it suffices to change $f$ by $\delta$ in the sup norm, see [2] (footnote 1).

Footnote 1: Note that this result no longer holds in dimensions greater than two.

The fact that persistence signatures correspond to distances (with respect to the sup norm) to sets of functions with at most $k$ modes leads us to considering norms different from the sup norm. Our motivation is to ask how signatures arising from different norms compare in a statistical sense. To this end, consider an arbitrary metric $d$ on $X$ and define the metric signatures

$$s_k(f) := \operatorname{dist}(f, X_k)$$

with respect to $d$. Then $(s_k(f))$ is a descending sequence as in (1), see Fig. 1. Moreover, since distance to sets in metric spaces is 1-Lipschitz, stability is immediate:

$$|s_k(f) - s_k(g)| \le d(f, g) \quad \text{for all } k \ge 0.$$

The resulting signatures $s_k(f)$ will in general be different from persistence signatures.

The aim of this article is to analyze, from a statistical and algorithmic point of view, one particular example: the Kolmogorov metric $d_K$ and its resulting Kolmogorov signatures $s_K$. For one dimensional signals $f\colon [0,1] \to \mathbb{R}$ the Kolmogorov norm is defined as the $L^\infty$-norm of the antiderivative $F$ of $f$, subject to $F(0) = 0$. The Kolmogorov norm plays a prominent role in probability and statistics, see, e.g., [29].

Our approach is based on the observation that if $s_k(f) = 0$, then (the unknown function) $f$ has at most $k$ modes. This provides a link between mode hunting, a widely studied problem in statistics [23, 19, 22, 12, 30], and the robust estimation of signatures. Most related to our approach is [12], where the Kolmogorov norm has been used for mode hunting in the context of density estimation.
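To make the Kolmogorov norm concrete, the following minimal Python sketch evaluates it for a piecewise constant function on an equipartition of $[0,1]$, given only the list of cell values. It relies on the observation that the antiderivative $F$ is then piecewise linear, so $\sup |F|$ is attained at a cell boundary. The function names `sup_norm` and `kolmogorov_norm` and the example signal are illustrative assumptions, not part of the original text.

```python
from itertools import accumulate


def sup_norm(values):
    """Sup norm of a piecewise constant function given by its cell values."""
    return max(abs(v) for v in values)


def kolmogorov_norm(values):
    """Kolmogorov norm of a piecewise constant function on an equipartition.

    With F(0) = 0, the antiderivative F is piecewise linear and
    F(k/n) = (v_1 + ... + v_k) / n, so sup |F| is the largest absolute
    partial sum divided by the number of cells n.
    """
    n = len(values)
    return max(abs(s) for s in accumulate(values)) / n


# Hypothetical example: a single outlier of height 10 among n = 100 cells.
spike = [0.0] * 100
spike[37] = 10.0
print(sup_norm(spike))         # 10.0
print(kolmogorov_norm(spike))  # 0.1
```

The isolated spike has sup norm 10 but Kolmogorov norm only 0.1, which matches the intuition that the Kolmogorov norm damps isolated large values.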
Figure 1: Illustration of metric signatures, i.e., distances of some $f \in X$ to the sets $X_k$ containing those functions with at most $k$ modes.

In the sequel we consider the following basic statistical additive regression model. Suppose that $f\colon [0,1] \to \mathbb{R}$ is corrupted by random noise $\epsilon$ and observed by a finite number of (equidistantly sampled) measurements $(Y_i)_{i=0}^{n}$, i.e.,

$$Y_i = f(t_i) + \epsilon_i, \qquad t_i = i/n. \tag{2}$$

Throughout we assume that the noise $(\epsilon_i)$ is independently distributed with mean zero such that for some $\kappa > 0$, $v > 0$ and all $m \ge 2$,

$$\mathbb{E}\,|\epsilon_i|^m \le v\, m!\, \kappa^{m-2} / 2 \quad \text{for all } i = 1, \dots, n. \tag{3}$$

We are concerned with the following question:

With what probability can one estimate the number of modes of $f$ (or provide bounds for its under- and overestimation) from the observations $Y$?

In dimension one, where mode hunting is intimately related to persistent homology, this question has been addressed in topological data analysis (TDA). A well known problem in this context is the fact that the stability theorem of persistent homology is based on the sup norm, which potentially makes this approach non-robust to outliers or unbounded noise. Therefore, several methods have been recently suggested to overcome this problem in various settings [1, 3, 5, 6, 8, 9, 25, 28]. Roughly speaking, these methods have in common that they first regularize or filter the data in one form or another (in order to improve stability with respect to the sup norm) and then work with the persistence diagram of the so obtained preprocessed result. This is based on the initial estimation of $f$ itself. From a statistical perspective, however, having to estimate $f$ in a first step somewhat weakens the potential appeal of TDA. Already in dimension one of the underlying space, estimating $f$ by any regularization technique leads to difficult problems, e.g., data driven smoothing or parameter thresholding.
We stress that in addition, this noticeably affects the resulting persistence properties in a statistically hard to control manner, see, e.g., [1, 6, 18] for the case of a kernel estimator. In fact, presmoothing with a kernel estimator leads to what has been sometimes called the notorious bandwidth selection problem, which does not possess a widely accepted solution since the optimal bandwidth (e.g., in the sense of minimizing the mean squared error between $f$ and its kernel estimate), although theoretically known, depends on unknown characteristics of $f$, such as its curvature (see [32] among many others). Hence, we argue that a conceptual simplification and a computational advantage of TDA would result from circumventing explicit estimation of $f$.

One aim of this paper is to show that direct estimation of topological properties of $f$ without having to estimate $f$ itself is indeed a doable task by using Kolmogorov signatures. We confine ourselves to dimension one because using the Kolmogorov norm in this case lends itself to an efficient algorithm ($O(n \log n)$ in the number of data points). We stress that our statistical analysis carries over to higher dimensions.

A second aim of this paper is to provide confidence statements on the empirical Kolmogorov signatures with a controllable statistical error, similar in spirit to [18], where asymptotic confidence bands for the empirical (sup norm based) persistence diagram are given for data on a manifold. Their approach, however, is based on presmoothing for unbounded noise using a kernel density estimator, which we avoid in this paper.

Inference for Kolmogorov signatures. Using the Kolmogorov metric and the resulting Kolmogorov signatures, we investigate how well the empirical signatures $s_k(Y)$, obtained by interpreting $Y$ as a piecewise constant function, estimate the signatures $s_k(f)$.
As a starting point, Theorem 1 asserts that under the moment condition (3), for any $\delta > 0$ one has

$$\mathbb{P}\Bigl( \max_{k \in \mathbb{N}_0} |s_k(Y) - s_k(f)| \ge \delta \Bigr) \le 2 \exp\Bigl( -\frac{\delta^2 n}{2v + 2\kappa\delta} \Bigr).$$

Using this, Theorem 2 asserts that for a given probability $\alpha \in (0,1)$, one can construct non-asymptotic confidence regions for the entire sequence $(s_k(f))$ of signatures in the sense that

$$\mathbb{P}\Bigl( s_k(f) \in \bigl[ (s_k(Y) - \tau_n(\alpha))_+ ,\; s_k(Y) + \tau_n(\alpha) \bigr] \text{ for all } k \in \mathbb{N}_0 \Bigr) \ge 1 - \alpha, \tag{4}$$

where $(x)_+ = \max(0, x)$. Here $\tau_n(\alpha)$ depends in an explicit manner on $n$, $\alpha$, $\kappa$, and $v$, which are known constants or can be easily estimated from the data. We drop the dependence of $\tau_n$ on $\kappa$ and $v$ by considering $\kappa$ and $v$ fixed since we are mainly concerned with the dependence on $n$ and $\alpha$. For fixed $\alpha$, $\kappa$, $v$, one asymptotically has $\tau_n(\alpha) \approx 1/\sqrt{n}$.

The parameter $\tau_n(\alpha)$ can be used to threshold the empirical signatures $s_k(Y)$ by defining

$$k_\epsilon(Y) = \max\{ k \in \mathbb{N}_0 : s_{k-1}(Y) \ge \epsilon \},$$

where, as a convention, we define $s_{-1}(Y) = \infty$. Then Theorem 3 asserts that for all $k \in \mathbb{N}_0$, $f \in X_k$, and $\alpha \in (0,1)$, one has

$$\mathbb{P}\bigl( k_{\tau_n(\alpha)}(Y) > k \bigr) \le \alpha,$$

i.e., the threshold parameter $\tau_n(\alpha)$ controls the probability of overestimating the number of modes for any function $f \in X$. Notice that $\tau_n(\alpha)$ is independent of the number and magnitude of the modes of $f$, so in this sense the result is universal.

Obtaining a universal result in the other direction, i.e., controlling the probability of underestimating the number of modes, is a more delicate task. Indeed, as pointed out in [14], obtaining such results is in general impossible if the modes of $f$ are allowed to become arbitrarily small. As a consequence, without a priori information on the "smallest scales" of $f$, no method can provide a control for their underestimation. Therefore, it is only possible to provide a bound for underestimating those signatures of $f$ that are larger than a certain threshold.
Theorem 4 asserts that for any $k \in \mathbb{N}_0$, $f \in X_k$, and $\alpha \in (0,1)$, one has

$$\mathbb{P}\bigl( k_{\tau_n(\alpha)}(Y) < k_{2\tau_n(\alpha)}(f) \bigr) \le \alpha.$$

Combining the latter results, we obtain two sided bounds for the estimated number of modes. More precisely, for any $f \in X_k$ and any $\alpha \in (0,1)$ we obtain that

$$\mathbb{P}\bigl( k_{2\tau_n(\alpha/2)}(f) \le k_{\tau_n(\alpha/2)}(Y) \le k \bigr) \ge 1 - \alpha.$$

As mentioned before, for fixed $\alpha$, $\kappa$, $v$, one has $\tau_n(\alpha) \approx 1/\sqrt{n}$. Therefore there exists a constant $C$ such that asymptotically (for large enough $n$) by thresholding at $C/\sqrt{n}$, it can be guaranteed at a level $\alpha$ that all signatures above this threshold are detected.

Notice that so far we have not made use of any a priori information about $f$. This changes with Theorem 5, which asserts that if $f \in X_k$ and $s_{k-1}(f) \ge \epsilon$, then

$$\mathbb{P}\bigl( k_{\epsilon/2}(Y) = k \bigr) \ge 1 - 2 \exp\Bigl( -\frac{\epsilon^2 n}{8v + 4\kappa\epsilon} \Bigr), \tag{5}$$

i.e., the number of modes of $f$ can be estimated exponentially fast (in the number of samples) by thresholding the empirical signature provided that one has a priori lower bounds on the magnitude (in the Kolmogorov norm) of the smallest mode of $f$. Notice that this result is independent of the number of modes of $f$.

Kolmogorov signatures vs. persistence signatures. Kolmogorov signatures offer an alternative to persistence signatures, since they behave more robustly for large errors $\epsilon_i$. The intuitive reason is that the Kolmogorov norm damps these errors, while they remain dominant using the sup norm without prefiltering. This is relevant, e.g., for unbounded noise (such as normally distributed errors, which are included in our noise model (3)) or for data with outliers.

Nevertheless, Kolmogorov signatures are not always superior to persistence signatures in terms of statistical efficiency. This can be seen by comparing their probabilities to detect a non vanishing signature from the data. To this end, we consider two limiting scenarios.
The first comprises sparse signals with high peaks and small support, while the second comprises weak signals with large support. To illustrate these scenarios, we consider functions with one single mode and i.i.d. normal errors with variance one, i.e., $\epsilon_i \sim N(0,1)$.

In the first scenario, we consider a sequence of functions

$$f_n(x) = \begin{cases} (1 + \varepsilon)\sqrt{2 \log n} & \text{if } x \in [j/n, (j+1)/n), \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$

for some $\varepsilon > 0$ and for some $j \in \{0, \dots, n-1\}$ that is a priori not known. We show in Theorem 6 that asymptotically (as $n \to \infty$) it is impossible to distinguish $f_n$ from the zero function by thresholding Kolmogorov signatures at $\tau_n(\alpha)$ as above. In contrast, for such signals, sup norm based thresholding of the vector $(Y_1, \dots, Y_n)$ is known to behave asymptotically minimax efficient in the sense of detecting a non vanishing mode with probability tending to one as $n \to \infty$, see, e.g., [13, 24]. Whether this efficiency carries over to persistence signatures is unknown to us.

In the second scenario, we consider a sequence of functions

$$f_n(x) = \begin{cases} \delta_n & \text{if } x \in [1/3, 2/3), \\ 0 & \text{otherwise,} \end{cases} \tag{7}$$

with $\delta_n \to 0$. It is well known that it is possible to detect the single mode of $f_n$ with probability tending to one as $n \to \infty$ if $\delta_n \sqrt{n} \to \infty$, see, e.g., [31]. From (5) it follows, using $\epsilon = \delta_n$, that Kolmogorov signatures can correctly detect the single mode of signals in (7) by thresholding signatures at $\delta_n / 2$. In contrast, for persistence signatures, there exists no thresholding strategy that can detect the single mode with probability tending to one. To be precise, let again $\delta_n \sqrt{n} \to \infty$, and assume additionally $\delta_n \sqrt{\log n} \to 0$. Then Theorem 7 asserts that for an arbitrary sequence $(q_n)$ of reals one has

$$\limsup_{n \to \infty} \mathbb{P}\bigl( k_{q_n}(Y) = 1 \bigr) < 1.$$
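The role of the bound (5) in the second scenario can be illustrated numerically. The Python sketch below simply evaluates the right-hand side of (5) for $\epsilon = \delta_n$. The function name `detection_lower_bound`, the choice $\delta_n = n^{-1/4}$ (which satisfies $\delta_n \sqrt{n} \to \infty$ and $\delta_n \sqrt{\log n} \to 0$), and the constants $v = \kappa = 1$ are assumptions made for illustration only; the text does not specify the constants for any particular noise distribution.

```python
import math


def detection_lower_bound(n, eps, v, kappa):
    """Right-hand side of (5): 1 - 2 exp(-eps^2 n / (8 v + 4 kappa eps)).

    v and kappa are the constants of the moment condition (3); they must be
    supplied by the user and are not derived here.
    """
    return 1.0 - 2.0 * math.exp(-(eps ** 2) * n / (8.0 * v + 4.0 * kappa * eps))


# Illustrative choice (assumption): delta_n = n^(-1/4), v = kappa = 1.
for n in (10**2, 10**4, 10**6):
    delta_n = n ** -0.25
    print(n, delta_n, detection_lower_bound(n, delta_n, v=1.0, kappa=1.0))
```

For small $n$ the bound may be far from one (or even negative, in which case it carries no information); it approaches one as $\delta_n^2 n$ grows.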
Efficient computation using taut strings. While our approach can in principle be extended to metrics different from the ones induced by the sup or Kolmogorov norms, not every metric lends itself to an efficient computation of the requisite signatures. The difficulty is to compute the distance of a given function to the set of functions with at most $k$ modes. Using taut strings (which are intimately related to total variation (TV) minimization [20, 21, 11, 26]), we prove that the set of Kolmogorov signatures can be computed in $O(n \log n)$ time, where $n$ is the number of observations.

Given $f \in L^\infty([0,1])$ and $\alpha \ge 0$, the taut string, $F_\alpha$, is the function whose graph has minimal total length (as a curve) among all absolutely continuous functions in the $\alpha$-tube around the antiderivative $F$ of $f$. Letting $f_\alpha = F_\alpha'$ denote the derivative of the taut string, Theorem 8 provides a result of independent interest that has been implicitly used several times in the existing literature but has never been proven rigorously to our knowledge: $f_\alpha$ minimizes the number of modes among all $L^\infty$-functions in the (closed) $\alpha$-ball around $f$ with respect to the Kolmogorov norm. Indeed, our result generalizes previous results on the mode-minimizing property of $f_\alpha$, which were shown in the special context of piecewise constant functions using the Kolmogorov norm, see, e.g., [11, 12, 22, 26].

2 Modes and signatures

Modes. Let $f\colon [0,1] \to \mathbb{R}$ be an arbitrary function. In order to define the number of modes (local maxima) of $f$, consider a finite partition $\mathcal{P} = \{t_0, \dots, t_{|\mathcal{P}|}\}$ of $[0,1]$ such that $0 = t_0 < t_1 < \cdots < t_{|\mathcal{P}|-1} < t_{|\mathcal{P}|} = 1$. For each $0 < i < |\mathcal{P}|$ let

$$M(f, \mathcal{P}, i) = \begin{cases} 1 & \text{if } \max(f(t_{i-1}), f(t_{i+1})) < f(t_i), \\ 0 & \text{else.} \end{cases}$$

Define the number of modes of $f$ with respect to $\mathcal{P}$ and the total number of modes of $f$ by

$$M(f, \mathcal{P}) = \sum_{i=1}^{|\mathcal{P}|-1} M(f, \mathcal{P}, i) \quad \text{and} \quad M(f) = \sup_{\mathcal{P}} M(f, \mathcal{P}),$$

respectively.
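The definition of $M(f, \mathcal{P})$ above translates directly into code once $f$ has been evaluated at the partition points. The Python sketch below implements that count; the second helper, which merges runs of equal consecutive values before counting, is an assumption of this sketch about how the supremum over partitions can be realised for a piecewise constant function given by its cell values, and is not taken from the text. Both function names are hypothetical.

```python
def modes_wrt_partition(samples):
    """M(f, P), where samples[i] = f(t_i) for a partition t_0 < ... < t_|P|.

    An interior index i counts as a mode if both neighbouring samples are
    strictly smaller than samples[i].
    """
    return sum(
        1
        for i in range(1, len(samples) - 1)
        if max(samples[i - 1], samples[i + 1]) < samples[i]
    )


def modes_piecewise_constant(values):
    """Mode count for a piecewise constant function given by its cell values.

    Assumption of this sketch: collapsing runs of equal consecutive values and
    counting strict interior peaks attains the supremum over partitions.
    """
    merged = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    return modes_wrt_partition(merged)


print(modes_wrt_partition([0, 1, 0, 2, 0]))    # 2
print(modes_wrt_partition([0, 1, 1, 0]))       # 0: the plateau is missed
print(modes_piecewise_constant([0, 1, 1, 0]))  # 1: plateau merged first
```

The plateau example shows why the supremum over partitions matters: a fixed partition can miss a mode that a coarser choice of points detects. Maxima at the boundary, as in `[2, 1, 2]`, are not counted, in line with counting only inner local maxima.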
It is easy to see that if $f$ is constant, then $M(f) = 0$, and if $f$ is a Morse function in the classical sense (i.e., a smooth function with only nondegenerate critical points), then $M(f)$ equals the (possibly infinite) number of local maxima of $f$ on the open interval $(0,1)$. Notice that, in contrast to Morse theory, we are not concerned with critical values or critical points of functions; $M(f)$ merely counts the number of modes, without referring to their individual positions or values.

Metric signatures. We denote by $L^\infty$ the linear space of Lebesgue-measurable essentially bounded functions on $[0,1]$. Notice that we do not regard $L^\infty$ as a space of equivalence classes of functions. Throughout this article we work with functions in some (to be specified) set $X \subset L^\infty$. For example, $X$ may consist of functions of bounded variation or piecewise polynomial functions. We do not a priori require $X$ to be a linear space. By $(X, d)$ we denote $X$ together with some metric, but we do not require $(X, d)$ to be a complete metric space. Additionally, we allow that $d$ attains the value $\infty$. Particular choices of $(X, d)$ will be specified below.

Definition 1 (Metric signatures) Let $X_k$ denote the subset of functions in $X$ with at most $k$ modes, i.e., $X_k := \{ f \in X : M(f) \le k \}$. Define the $k$th metric signature of $f \in X$ as

$$s_k(f) := d(f, X_k) = \inf_{g \in X_k} d(f, g) \quad \text{for } k \in \mathbb{N}_0,$$

i.e., the distance of $f$ to the set of functions with at most $k$ modes.

Clearly, $X_k \subseteq X_{k+1}$ are nested models; hence, the sequence $(s_k(f))_{k \in \mathbb{N}}$ is monotonically decreasing, and $s_k(f)$ measures the minimal distance by which $f$ needs to be moved (with respect to the metric $d$) in order to remove all but its $k$ most significant modes. What is considered significant and what is not, however, heavily depends on the choice of metric. In any case, so far we have not excluded pathologies, i.e., situations where $M(f) > k$ but $s_k(f) = d(f, X_k) = 0$.
Hence:

Definition 2 (Descriptive metric) $(X, d)$ is called descriptive if $M(f) > k$ implies that $s_k(f) > 0$ for every $f \in X$ and all $k \in \mathbb{N}_0$.

Stability. Regardless of the concrete choice of metric, notice that distance to (arbitrary) sets in metric spaces is 1-Lipschitz; therefore stability essentially comes for free:

Lemma 1 (Stability of signatures) Let $f, g \in X$. Then

$$|s_k(f) - s_k(g)| \le d(f, g) \quad \text{for all } k \in \mathbb{N}_0.$$

Stability implies that a small perturbation of $f$ results in a small perturbation of the signatures $s_k(f)$.

3 Persistence signatures and Kolmogorov signatures

In our setting a "good" metric is one that leads to signatures that clearly separate significant modes (with respect to a given noise model) from insignificant ones. We investigate two choices.

Persistence signatures. One possible choice of metric is the one induced by the sup norm, i.e.,

$$d_\infty(f, g) = \sup_x |f(x) - g(x)|,$$

which leads to signatures that have an interpretation in the context of persistent homology, as we show below.

Lemma 2 $(X, d_\infty)$ is descriptive for every $X \subset L^\infty$.

Proof. Being descriptive is equivalent to $X_k$ being closed in $X$ for all $k$. Suppose that there exists $k \in \mathbb{N}_0$ such that $X_k$ is not closed, i.e., there exist $f \in X \setminus X_k$ and a sequence $(f_n)$ in $X_k$ with $d_\infty(f_n, f) \to 0$. Since $f \notin X_k$, there exists a partition $\mathcal{P} = \{t_0, \dots, t_{|\mathcal{P}|}\}$ of $[0,1]$ and some index set $I$ with $k < |I| < |\mathcal{P}|$ such that $\max(f(t_{i-1}), f(t_{i+1})) < f(t_i)$ for all $i \in I$. Since $d_\infty(f_n, f) \to 0$, there exists $N \in \mathbb{N}$ such that $\max(f_n(t_{i-1}), f_n(t_{i+1})) < f_n(t_i)$ for all $n \ge N$ and all $i \in I$. Hence $M(f_n) > k$ for $n \ge N$, which contradicts $f_n \in X_k$. □

The following lemma makes precise the relation between topological persistence and our notion of metric signatures for the sup norm.
Lemma 3 Let $X$ be a space of tame functions, i.e., $H_*(f^{-1}(-\infty, t])$ has finite rank for all $f \in X$ and all $t \in \mathbb{R}$, and every $f \in X$ has a finite number of homologically critical values. Order the finite persistence values (counted with multiplicity) of some $f \in X$ according to their persistence, from highest to lowest, yielding a persistence sequence $(p_k(f))_{k \ge 1}$. Using $d_\infty$ yields

$$p_k(f) = 2 s_{k-1}(f) \quad \text{for all } k \ge 1.$$

Proof. Let $k \ge 1$. We first claim that $p_k(f) \le 2 s_{k-1}(f)$. Let $(f_n)$ be a sequence in $X_{k-1}$ with $d_\infty(f_n, f) \le s_{k-1}(f) + \frac{1}{n}$. Notice that $p_k(g) = 0$ for all $g \in X_{k-1} \subset X$. By the stability theorem for persistence diagrams [10], one has $|p_k(g) - p_k(f)| \le 2 d_\infty(f, g)$ for all $f, g \in X$. Together these facts imply that

$$p_k(f) = |p_k(f) - p_k(f_n)| \le 2 d_\infty(f, f_n) \le 2 s_{k-1}(f) + \frac{2}{n},$$

which proves the first claim. To see that $p_k(f) \ge 2 s_{k-1}(f)$, observe that the bound provided by the stability theorem is tight in dimensions less than or equal to 2, see [2]. Indeed, if $f$ is tame, then by moving $f$ by at most $\delta$ in the sup norm, it is possible to remove all its persistence pairs with persistence less than or equal to $2\delta$ without increasing the number of remaining persistence pairs. Hence there exists a function $g \in X_{k-1}$ with $d_\infty(g, f) \le \frac{1}{2} p_k(f)$, which implies that

$$s_{k-1}(f) = d_\infty(f, X_{k-1}) \le d_\infty(f, g) \le \tfrac{1}{2} p_k(f).$$ □

Kolmogorov signatures. For reasons that will become evident in the next section, we propose an alternative to persistence signatures, which we call Kolmogorov signatures. Let $L^1$ denote the space of Lebesgue-integrable functions on $[0,1]$. Due to compactness of $[0,1]$, we have that $L^\infty \subset L^1$. The Kolmogorov distance, $d_K$, is defined as follows. Let $f, g \in L^\infty$, and let $F$, $G$ denote the respective antiderivatives, where, as a convention, we require that $F(0) = G(0) = 0$.
Define

$$d_K(f, g) := d_\infty(F, G).$$

Notice that $d_K$ does not induce a metric on arbitrary subsets $X \subset L^\infty$ since if $f = g$ almost everywhere (a.e.), then $d_K(f, g) = 0$. Therefore, we work with a unique representative in each equivalence class of a.e. identical functions by requiring that

$$X \subset \mathcal{L} := \Bigl\{ f \in L^\infty : f(t) = \lim_{\epsilon \to 0}\, \inf_{0 < \delta < \epsilon} \frac{1}{t^+(\delta) - t^-(\delta)} \int_{t^-(\delta)}^{t^+(\delta)} f(s)\, ds \ \text{ for all } t \in [0,1] \Bigr\}, \tag{8}$$

where $t^-(\delta) = \max(0, t - \delta)$ and $t^+(\delta) = \min(1, t + \delta)$.

There indeed exists a (unique) representative in $\mathcal{L}$ for every equivalence class of a.e. identical functions in $L^\infty$, since the right hand side of (8) exists (and is finite) for all $t \in [0,1]$ and all $f \in L^\infty$, and since Lebesgue's differentiation theorem asserts that every $f \in L^1$ satisfies

$$f(t) = \lim_{\delta \to 0} \frac{1}{2\delta} \int_{t-\delta}^{t+\delta} f(s)\, ds \quad \text{a.e. on } (0,1).$$

We thus obtain a projection operator $P\colon L^\infty \to \mathcal{L} \subset L^\infty$. Notice, however, that $\mathcal{L}$ is not a linear space, since $f \in \mathcal{L}$ does not necessarily imply that $-f \in \mathcal{L}$. Nonetheless, we may of course choose linear subspaces $X \subset \mathcal{L}$ for specific applications. The following lemma further motivates our choice of $\mathcal{L}$.

Lemma 4 For any class $[f]$ of a.e. identical functions in $L^\infty$, its unique representative $P(f) \in \mathcal{L}$ minimizes the number of modes within that class.

Proof. Let $f \in L^\infty$ with representative $\tilde f := P(f) \in \mathcal{L}$. We show that $M(\tilde f) \le M(f)$. Consider any finite partition $\mathcal{P}$ of $[0,1]$ and assume that $t_i$ counts a mode of $\tilde f$, i.e., $\tilde f(t_i) - \max(\tilde f(t_{i-1}), \tilde f(t_{i+1})) > \epsilon$ for some $\epsilon > 0$. Consider any open neighborhood $U_i$ of $t_i$. Since

$$\tilde f(t_i) = \lim_{\epsilon \to 0}\, \inf_{0 < \delta < \epsilon} \frac{1}{t_i^+(\delta) - t_i^-(\delta)} \int_{t_i^-(\delta)}^{t_i^+(\delta)} f(s)\, ds,$$

there must be some $t \in U_i$ with $f(t) \ge \tilde f(t_i) - \epsilon/2$. Since $U_i$ can be chosen arbitrarily small, there exists $t_i'$ arbitrarily close to $t_i$ such that $f(t_i') \ge \tilde f(t_i) - \epsilon/2$.
By the same argument, there exist $t_{i-1}'$ and $t_{i+1}'$ arbitrarily close to $t_{i-1}$ and $t_{i+1}$, respectively, such that $f(t_{i-1}') \le \tilde f(t_{i-1}) + \epsilon/2$ and $f(t_{i+1}') \le \tilde f(t_{i+1}) + \epsilon/2$. By our choice of $\epsilon$ this implies $f(t_i') > \max(f(t_{i-1}'), f(t_{i+1}'))$. Continuing this way yields a partition $\mathcal{P}'$ with $M(f, \mathcal{P}') \ge M(\tilde f, \mathcal{P})$. □

Figure 2: A function with exactly two modes (left) and its closest function with exactly one mode w.r.t. the Kolmogorov norm (right, in purple). Notice that the attendant Kolmogorov signature, $s_1$, for removing the smallest mode of $f$, can be read off from the light-blue areas. The purple function is computed using taut strings (see Section 5).

Lemma 5 $(X, d_K)$ is descriptive for every $X \subset \mathcal{L}$.

Proof. We show that $X \setminus X_k$ is open with respect to the Kolmogorov metric. Let $f \in X \setminus X_k$. Then there exists a finite partition $\mathcal{P} = \{t_0, \dots, t_{|\mathcal{P}|}\}$ of $[0,1]$ and some index set $I$ with $k < |I| < |\mathcal{P}|$ such that $f(t_i) - \max(f(t_{i-1}), f(t_{i+1})) > \epsilon$ for all $i \in I$ and some small enough $\epsilon > 0$. Without loss of generality, we assume that $t_{i-1} > 0$ and $t_{i+1} < 1$ for all $i \in I$. Let $\delta > 0$ be small enough such that for all $i \in I$ the intervals $[t_{i-1} - \delta, t_{i-1} + \delta]$, $[t_i - \delta, t_i + \delta]$, and $[t_{i+1} - \delta, t_{i+1} + \delta]$ are contained in $[0,1]$ and are mutually disjoint. Additionally, for all $i \in I$ and all $j \in \{-1, 0, 1\}$ let $\delta_{i,j} \le \delta$ be such that

$$\Bigl| f(t_{i+j}) - \frac{1}{2\delta_{i,j}} \int_{t_{i+j} - \delta_{i,j}}^{t_{i+j} + \delta_{i,j}} f(s)\, ds \Bigr| < \frac{\epsilon}{4}.$$

Let $\delta' = \min_{i \in I,\, j \in \{-1,0,1\}} \delta_{i,j}$. Let $g \in X$ with $d_K(f, g) < \frac{1}{4} \epsilon \delta'$. Then

$$\Bigl| \int_a^b (f(s) - g(s))\, ds \Bigr| < \frac{1}{2} \epsilon \delta'$$

for all $0 \le a < b \le 1$. Hence,

$$\Bigl| f(t_{i+j}) - \frac{1}{2\delta_{i,j}} \int_{t_{i+j} - \delta_{i,j}}^{t_{i+j} + \delta_{i,j}} g(s)\, ds \Bigr| < \frac{\epsilon}{4} + \frac{\epsilon \delta'}{4 \delta_{i,j}} \le \frac{\epsilon}{2}$$

for all $i \in I$ and all $j \in \{-1, 0, 1\}$.
Therefore, there exists $t_i' \in [t_i - \delta_{i,0}, t_i + \delta_{i,0}]$ with $g(t_i') > f(t_i) - \frac{\epsilon}{2}$. Likewise, there exist $t_{i \pm 1}' \in [t_{i \pm 1} - \delta_{i,\pm 1}, t_{i \pm 1} + \delta_{i,\pm 1}]$ with $g(t_{i \pm 1}') < f(t_{i \pm 1}) + \frac{\epsilon}{2}$. Thus there exists a partition $\mathcal{P}'$ of $[0,1]$ with $M(g, \mathcal{P}') > k$, i.e., $g \in X \setminus X_k$. Since $\epsilon$ and $\delta'$ only depend on $f$ and since $g$ was chosen arbitrarily in the open Kolmogorov-ball of radius $\frac{1}{4} \epsilon \delta'$ around $f$, this ball is contained in $X \setminus X_k$. □

Figure 2 offers a visualization for a function with two modes and its closest function with a single mode with respect to the Kolmogorov norm. Before elaborating on how to compute Kolmogorov signatures, though, we examine their statistical properties.

4 Statistical perspective

Throughout this section we assume that the noise $(\epsilon_i)$ in Model (2) is independently distributed with mean zero such that for some $\kappa > 0$, $v > 0$ and all $m \ge 2$,

$$\mathbb{E}\bigl( |\epsilon_i|^m \bigr) \le v\, m!\, \kappa^{m-2} / 2 \quad \text{for all } i = 1, \dots, n. \tag{9}$$

Distributions which satisfy (9) include the centered normal distribution with variance $\sigma^2 > 0$, the (centered) Poisson distribution with intensity $\lambda$, or the Laplace distribution with variance $2\lambda^2$. Moreover, any symmetric distribution around zero with compact support is covered by (9), including the uniform distribution on an interval $[-B, B]$.

4.1 Thresholding Kolmogorov signatures

In this subsection we prove an exponential deviation inequality for the empirical Kolmogorov signatures (Theorem 1), which allows us to construct uniform confidence bands for the unknown signatures $(s_j(f))_{j \in \mathbb{N}_0}$. More precisely, we provide a data dependent sequence of intervals $(I^{(n)}_j(\alpha))_{j \in \mathbb{N}_0}$ that covers the (unknown) signatures with probability at least $1 - \alpha$.

Let $f \in X_k \subset X \subset \mathcal{L}$, with $\mathcal{L}$ defined in (8), have (an unknown number of) exactly $k$ modes.
As stressed in the introduction, we do not aim at estimating the regression function $f$ itself but rather at inferring directly the sequence of signatures $s_j(f)$ together with the number of modes $k$ in such a way that estimates for these quantities can be provided at a prespecified error rate. This can be achieved by properly thresholding the sequence of empirical signatures.

In our analysis we consider equidistant sampling points $i/n$ and piecewise constant functions $f^{(n)}\colon [0,1] \to \mathbb{R}$ defined as

$$f^{(n)}(t) = \sum_{i=0}^{n-1} \mathbf{1}_{[\frac{i}{n}, \frac{i+1}{n})}(t)\, f\Bigl(\frac{i}{n}\Bigr).$$

We define $X^{(n)}_j$ as the corresponding set of piecewise constant functions with at most $j$ modes, and we call $s_j(f^{(n)}) = \operatorname{dist}(f^{(n)}, X^{(n)}_j)$ the quantized signature of $f$. Further, for the observation vector $Y = (Y_1, \dots, Y_n)$ we define the piecewise constant function

$$Y^{(n)}(t) = \sum_{i=1}^{n} \mathbf{1}_{[\frac{i-1}{n}, \frac{i}{n})}(t)\, Y_i.$$

In the following, we call $s_j(Y^{(n)})$ the empirical signatures.

Function spaces. In principle, the results of this subsection hold for any function space $X \subset \mathcal{L}$ as long as one can control the distance $d_K(f, f^{(n)})$ between $f$ and the quantized function $f^{(n)}$. Accordingly, all subsequent results are formulated for the quantized signatures $s_j(f^{(n)})$. From those, the corresponding statements concerning $s_j(f)$ can be obtained along the following reasoning. Consider the (deterministic) approximation error between $f^{(n)}$ and $f$ in terms of the Kolmogorov metric

$$d_K(f, f^{(n)}) = \sup_{s \in [0,1]} \Bigl| \int_0^s \bigl( f(t) - f^{(n)}(t) \bigr)\, dt \Bigr|. \tag{10}$$

Then, due to Lemma 1 and the triangle inequality, it follows that

$$\max_{j \in \mathbb{N}_0} |s_j(Y^{(n)}) - s_j(f)| \le \max_{j \in \mathbb{N}_0} |s_j(Y^{(n)}) - s_j(f^{(n)})| + d_K(f, f^{(n)}).$$
Therefore, if $d_K(f, f^{(n)})$ is known, then the subsequent estimates on $|s_j(Y^{(n)}) - s_j(f^{(n)})|$ can readily be modified to obtain estimates on $|s_j(Y^{(n)}) - s_j(f)|$. E.g., if $f$ is Hölder continuous, i.e.,

$$|f(x) - f(y)| \le C |x - y|^\gamma \quad \text{for all } x, y \in [0,1], \quad \gamma > 0,$$

then

$$d_K(f, f^{(n)}) \le \frac{C}{\gamma + 1}\, n^{-\gamma}, \tag{11}$$

so that the approximation error is of order $n^{-\gamma}$. Hence, due to Lemma 1,

$$\max_{j \in \mathbb{N}_0} |s_j(Y^{(n)}) - s_j(f)| \le \max_{j \in \mathbb{N}_0} |s_j(Y^{(n)}) - s_j(f^{(n)})| + O(n^{-\gamma}),$$

and all subsequent estimates and results can be modified accordingly.

Statistical inference of signatures and modes without a priori information. We return to our initial goal of providing tools for statistical inference on the signatures and modes. We start with investigating how well the empirical signatures $s_j(Y^{(n)})$ estimate the quantized signatures $s_j(f^{(n)})$. To this end, we control $d_K(f^{(n)}, Y^{(n)})$ by the following exponential deviation bound, which is a direct consequence of [27, Theorem B.2].

Theorem 1 Assume the moment condition in (9). Then, for any $\delta > 0$ and any $f \in X$, one has

$$\mathbb{P}\Bigl( \max_{j \in \mathbb{N}_0} |s_j(Y^{(n)}) - s_j(f^{(n)})| \ge \delta \Bigr) \le 2 \exp\Bigl( -\frac{\delta^2 n}{2v + 2\kappa\delta} \Bigr).$$

Proof. By stability of metric signatures (Lemma 1), we have that

$$\mathbb{P}\Bigl( \max_{j \in \mathbb{N}_0} |s_j(Y^{(n)}) - s_j(f^{(n)})| \ge \delta \Bigr) \le \mathbb{P}\bigl( d_K(f^{(n)}, Y^{(n)}) \ge \delta \bigr).$$

Let $S_k = \sum_{i=1}^{k} \epsilon_i$ and observe that $d_K(Y^{(n)}, f^{(n)}) = \max_k |S_k| / n$. From [27, Theorem B.2] we obtain

$$\mathbb{P}\bigl( d_K(Y^{(n)}, f^{(n)}) \ge \delta \bigr) = \mathbb{P}\Bigl( \max_k |S_k| \ge \delta n \Bigr) \le 2 \exp\Bigl( -\frac{\delta^2 n}{2v + 2\kappa\delta} \Bigr).$$ □

This result shows that the empirical signatures $s_j(Y^{(n)})$ are close to the quantized signatures $s_j(f^{(n)})$ with high probability simultaneously for all $j \in \mathbb{N}_0$.

Remark 1 (Sharpness of bound) Figure 3 offers two examples of how the signatures of $Y^{(n)}$ deviate from those of $f^{(n)}$.
Notice that in these examples, the signatures of $f^{(n)}$ are almost indistinguishable from the highest signatures of $Y^{(n)}$; indeed, their difference is less than what is predicted by Theorem 1. The reason is that, while the bound in Theorem 1 is sharp in general (since stability of metric signatures provides a sharp bound in general), it may be arbitrarily suboptimal for concrete examples, i.e., if $|s_k(f) - s_k(Y)|$ is small while $d_K(f, Y)$ is large.

A useful application of Theorem 1 is that for a given probability $\alpha$, we can construct a non-asymptotic and honest (uniform) confidence region covering the signatures $s_j(f^{(n)})$ with probability at least $1 - \alpha$, as shown in the following theorem.

Theorem 2. Fix some $\alpha \in (0,1)$ and let
\[ \tau_n(\alpha) := \frac{1}{n} \left( \sqrt{ \log(\alpha/2) \left( \log(\alpha/2)\, \kappa^2 - 2 n v \right) } - \kappa \log(\alpha/2) \right). \]
Assume the regression model (2) and the moment condition in (9). Then
\[ \inf_{f \in X} P\left( s_j(f^{(n)}) \in \left[ \left( s_j(Y^{(n)}) - \tau_n(\alpha) \right)_+ ,\ s_j(Y^{(n)}) + \tau_n(\alpha) \right] \text{ for all } j \in \mathbb{N}_0 \right) \ge 1 - \alpha, \]
where $(x)_+ = \max(0, x)$.

Proof. From Theorem 1 we obtain
\[ P\left( |s_j(Y^{(n)}) - s_j(f^{(n)})| \le \tau_n(\alpha) \text{ for all } j \in \mathbb{N}_0 \right) \ge 1 - 2 \exp\left( - \frac{\tau_n(\alpha)^2\, n}{2v + 2\kappa\, \tau_n(\alpha)} \right) = 1 - \alpha. \]
Since $s_j(f^{(n)}) \ge 0$ for all $j \in \mathbb{N}_0$, this completes the proof. □

Figure 3: Noisy samples of signals (top) and signatures of both original signal and sample (bottom, log-scale). Left: Function generated by random sampling and smoothing. Right: signal bumps [15]. Sampling noise normally distributed with standard deviation $\sigma = 1$.
Notice that the largest signatures of signal and sample are very close (almost indistinguishable), and that there is a clear gap between the smallest signature of the signal (left: $k = 4$, right: $k = 5$) and the next signature of the noisy sample.

Note that $\tau_n(\alpha)$ is a quantity that only depends on the values $n$, $\kappa$, $v$, and the confidence level $\alpha$. Here we assume for simplicity that $\kappa$ and $v$ are known. In practice this might not be the case, but these numbers can be estimated from the data; e.g., in the case of a normal distribution, such an estimate boils down to estimating the variance $\sigma^2$. Fixing $\alpha$, we obtain a (random) sequence of intervals
\[ \left[ \left( s_j(Y^{(n)}) - \tau_n(\alpha) \right)_+ ,\ s_j(Y^{(n)}) + \tau_n(\alpha) \right], \]
which, according to Theorem 2, cover the sequence of true quantized signatures $s_j(f^{(n)})$ with confidence level $1 - \alpha$. For smaller values of $\alpha$, i.e., for larger confidence, these intervals become wider. Notice that for a fixed error $\alpha \in (0,1)$, the interval lengths $2\tau_n(\alpha)$ behave like $1/\sqrt{n}$ as $n \to \infty$.

Theorem 1 shows that $s_j(Y^{(n)})$ approximates $s_j(f^{(n)})$ well in the sup norm. However, the number of estimated signatures greater than zero might still be large. Consequently, $s(Y^{(n)})$ does not directly indicate which signatures are significantly larger than zero and hence will be of limited use for estimating the number of modes of $f$. Nonetheless, such an estimate can readily be obtained by thresholding the empirical signatures. Define
\[ k_{\epsilon}(Y^{(n)}) = \max\{ j \in \mathbb{N}_0 : s_{j-1}(Y^{(n)}) \ge \epsilon \}, \tag{12} \]
where, as a convention, we define $s_{-1}(Y^{(n)}) = \infty$.

The threshold parameter $\tau_n(\alpha)$ has an immediate statistical interpretation: It controls the probability of overestimating the number of modes for any function $f \in X$.

Theorem 3. Let $f \in X$, assume the regression model (2) and the moment condition (9), let $\alpha \in (0,1)$, and let $k \in \mathbb{N}_0$ be such that $f^{(n)} \in X_k$.
Then
\[ P\left( k_{\tau_n(\alpha)}(Y^{(n)}) > k \right) \le \alpha. \]

Proof. First, observe from the definition of $k_{\tau_n(\alpha)}(Y^{(n)})$ in (12) that
\[ P\left( k_{\tau_n(\alpha)}(Y^{(n)}) > k \right) = P\left( s_k(Y^{(n)}) \ge \tau_n(\alpha) \right). \]
Notice that $f^{(n)} \in X_k$ implies that $s_k(f^{(n)}) = 0$. Therefore, for $f^{(n)} \in X_k$, Theorem 1 and the definition of $\tau_n(\alpha)$ imply (similar to the proof of Theorem 2) that
\[ P\left( s_k(Y^{(n)}) \ge \tau_n(\alpha) \right) \le \alpha. \]
□

Hence, whatever the number of modes of $f^{(n)}$ might be, the thresholding index $k_{\tau_n(\alpha)}(Y^{(n)})$ overestimates this number with probability at most $\alpha$. Notice that the thresholding parameter $\tau_n(\alpha)$ is independent of the number and magnitude of the modes of $f$, so in that sense, this result is universal. As mentioned in the introduction, obtaining a universal result in the other direction, i.e., controlling the probability of underestimating the number of modes, is a more delicate task since modes can become arbitrarily small. Recalling the definition of $k_{\epsilon}(f)$ as in (12), we find:

Theorem 4. Let $f \in X$, assume the regression model (2) and the moment condition (9), let $\alpha \in (0,1)$, and let $k \in \mathbb{N}_0$ be such that $f^{(n)} \in X_k$. Then
\[ P\left( k_{\tau_n(\alpha)}(Y^{(n)}) < k_{2\tau_n(\alpha)}(f^{(n)}) \right) \le \alpha. \tag{13} \]

Proof. Let $f^{(n)} \in X_k$ and let $l$ denote the largest integer such that $s_{l-1}(f^{(n)}) \ge 2\tau_n(\alpha)$, i.e., $l = k_{2\tau_n(\alpha)}(f^{(n)})$. If $l = 0$, then (13) is trivially satisfied, since $k_{\tau_n(\alpha)}(Y^{(n)}) \ge 0$. So suppose that $l > 0$. Then
\begin{align*}
P\left( k_{\tau_n(\alpha)}(Y^{(n)}) < l \right) &= P\left( s_{l-1}(Y^{(n)}) < \tau_n(\alpha) \right) \\
&\le P\left( s_{l-1}(f^{(n)}) - s_{l-1}(Y^{(n)}) > \tau_n(\alpha) \right) \\
&\le P\left( |s_{l-1}(f^{(n)}) - s_{l-1}(Y^{(n)})| > \tau_n(\alpha) \right) \le \alpha,
\end{align*}
where the last inequality follows from Theorem 1 and the definition of $\tau_n(\alpha)$. □

We have thus expressed the underestimation error of the number of modes as an explicit function of the signature threshold $2\tau_n(\alpha)$.
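As a hedged illustration of the thresholding rule (12) and of the threshold $\tau_n(\alpha)$ from Theorem 2, the following Python sketch evaluates both. It assumes that the empirical signatures $s_0(Y^{(n)}), s_1(Y^{(n)}), \dots$ have already been computed and are passed as a finite list (entries beyond the list are treated as zero, so the threshold must be positive), and that $v$ and $\kappa$ are known. The function names `tau_n` and `mode_count` and the sample values are hypothetical and not part of the paper.

```python
import math


def tau_n(alpha, n, v, kappa):
    """Threshold tau_n(alpha) as in Theorem 2 (v and kappa assumed known)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    log_term = math.log(alpha / 2.0)  # negative for alpha in (0, 1)
    root = math.sqrt(log_term * (log_term * kappa ** 2 - 2.0 * n * v))
    return (root - kappa * log_term) / n


def mode_count(signatures, eps):
    """Thresholding index k_eps as in (12), for eps > 0.

    signatures[j] plays the role of s_j(Y^(n)); the convention
    s_{-1} = infinity makes the index j = 0 always admissible.
    """
    if eps <= 0.0:
        raise ValueError("eps must be positive for a finite list")
    admissible = [0]
    for j, s in enumerate(signatures):
        if s >= eps:
            admissible.append(j + 1)  # s_j >= eps admits the index j + 1
    return max(admissible)


if __name__ == "__main__":
    # Made-up signature values, purely for illustration.
    sigs = [0.9, 0.5, 0.3, 0.01, 0.002]
    eps = tau_n(alpha=0.05, n=1000, v=1.0, kappa=1.0)
    print(eps, mode_count(sigs, eps))
```

By construction, `tau_n` returns the positive root of $2\exp(-\delta^2 n/(2v + 2\kappa\delta)) = \alpha$ in $\delta$, which is how the threshold is tied to the deviation bound of Theorem 1.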
Combining the latter results, we obtain two-sided bounds for the estimated number of modes. More precisely, for any $f$ and $k$ with $f^{(n)} \in X_k$ and any $\alpha \in (0,1)$ we have that
\[ P\left( k_{2\tau_n(\alpha/2)}(f^{(n)}) \le k_{\tau_n(\alpha/2)}(Y^{(n)}) \le k \right) \ge 1 - \alpha. \]
As mentioned above, for fixed $\alpha, \kappa, v$ one has $\tau_n(\alpha) \approx 1/\sqrt{n}$. Therefore there exists a constant $C$ such that asymptotically (for large enough $n$) by thresholding at $C/\sqrt{n}$, it can be guaranteed at a level $\alpha$ that all signatures above this threshold are detected. Based on the previous results we now construct confidence intervals for $k_{\epsilon}(f^{(n)})$, i.e., for the number of modes whose signatures exceed a certain size $\epsilon$.

Corollary 1. Assume the regression model (2), the moment condition (9), let $\epsilon \ge 0$, and let $f^{(n)} \in X_k$. Define
\[ l(\alpha, \epsilon) = \begin{cases} \max\{ j \in \mathbb{N}_0 : s_j(Y^{(n)}) > \epsilon + \tau_n(\alpha) \} & \text{if } \epsilon < s_0(Y^{(n)}) - \tau_n(\alpha), \\ 0 & \text{otherwise,} \end{cases} \]
and
\[ u(\alpha, \epsilon) = \begin{cases} \min\{ j \in \mathbb{N}_0 : s_j(Y^{(n)}) < \epsilon - \tau_n(\alpha) \} & \text{if } \epsilon > \tau_n(\alpha), \\ \infty & \text{otherwise.} \end{cases} \]
Then
\[ P\left( k_{\epsilon}(f^{(n)}) \in [\, l(\alpha, \epsilon),\ u(\alpha, \epsilon) \,] \right) \ge 1 - \alpha. \]

Proof. Suppose, for the moment, that
\[ d_K(Y^{(n)}, f^{(n)}) \le \tau_n(\alpha). \tag{14} \]
Since $s_j(f^{(n)}) \ge \epsilon$ for all $j < k_{\epsilon}(f^{(n)})$, stability of metric signatures implies that $s_j(Y^{(n)}) \ge \epsilon - \tau_n(\alpha)$ for all $j < k_{\epsilon}(f^{(n)})$. Hence, by the definition of $u(\alpha, \epsilon)$, we have $u(\alpha, \epsilon) \ge k_{\epsilon}(f^{(n)})$. Further, while still assuming (14), $s_j(Y^{(n)}) > \epsilon + \tau_n(\alpha)$ implies that $s_j(f^{(n)}) > \epsilon$. Hence, by the definition of $l(\alpha, \epsilon)$, we find that $s_{l(\alpha, \epsilon)}(f^{(n)}) > \epsilon$. This in turn implies $k_{\epsilon}(f^{(n)}) \ge l(\alpha, \epsilon)$. Therefore, we have so far shown that (14) implies that $l(\alpha, \epsilon) \le k_{\epsilon}(f^{(n)}) \le u(\alpha, \epsilon)$. Since (14) holds with probability $\ge 1 - \alpha$ (see proof of Theorem 1), this proves the assertion. □

Note that the upper bound for $k_{\epsilon}$ jumps to $\infty$ if $\epsilon \le \tau_n(\alpha)$.
This reflects the fact that meaningful upper bounds cannot be provided for signatures whose size is of the order of the noise level.

Remark 2 (Distribution of signatures). Assume the setting of Theorem 1 and suppose that $X_k$ is scaling invariant for all $k \in \mathbb{N}_0$, i.e., $\{ \lambda g : g \in X_k \} = X_k$ for all $0 < \lambda \in \mathbb{R}$. Assume for simplicity that $f \equiv 0$, the general case still being unknown. Then, for any $k \in \mathbb{N}_0$, we have that
\begin{align*}
\sqrt{n} \left( s_k(Y^{(n)}) - s_k(f^{(n)}) \right) &= \sqrt{n} \inf_{g \in X_k} \sup_{s \in [0,1]} \left| \int_0^s \left( \epsilon^{(n)}(t) - g(t) \right) dt \right| \\
&= \inf_{g \in X_k} \sup_{s \in [0,1]} \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{\lceil ns \rceil} \epsilon_i - \int_0^s \sqrt{n}\, g(t) \, dt \right| \\
&= \inf_{g \in X_k} \sup_{s \in [0,1]} \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{\lceil ns \rceil} \epsilon_i - \int_0^s g(t) \, dt \right|,
\end{align*}
where the last equality follows from the scaling invariance of $X_k$. Noting that
\[ \frac{1}{\sqrt{n}} \max_{m = 1, \dots, n} \left| \sum_{i=1}^{m} \epsilon_i \right| \xrightarrow{D} \sup_{0 \le x \le 1} |B(x)|, \]
where $B$ denotes a standard Brownian motion on $[0,1]$, and using that $f \equiv 0$, it follows that
\[ \sqrt{n}\, s(Y^{(n)}) \xrightarrow{D} s(B'), \]
where $B'$ denotes the derivative of a standard Brownian motion on $[0,1]$ in a weak sense. This follows from the continuity of the functional $s$ with respect to the Kolmogorov norm.

Remark 3 (Gaussian observation). If the noise $\epsilon$ in (2) is Gaussian with mean zero and variance $\sigma^2$, then Theorem 1 can be sharpened, due to a refined large deviation result for Gaussian observations (see, e.g., [4]):
\[ P\left( d_K(f^{(n)}, Y^{(n)}) \ge \delta \right) \le 2 \exp\left( - \frac{\delta^2 n}{2\sigma^2} \right). \tag{15} \]
Hence, in the Gaussian case, all results of Section 4 remain true if $\tau_n(\alpha)$ is replaced by the simpler (and slightly sharper) threshold
\[ \tilde{\tau}_n(\alpha) = \sqrt{ - \frac{2\sigma^2}{n} \log(\alpha/2) }. \]

Obtaining the correct number of modes using a priori information

Notice that so far we have not made any a priori assumption about $f^{(n)}$.
If, however, $f^{(n)} \in X_k$, and if we impose prior information on the smallest strictly positive signature $s_{k-1}(f^{(n)})$, then we obtain an explicit bound for the probability that the number of modes is estimated correctly.

Theorem 5. Assume the regression model (2) and the moment condition (9). Let $f^{(n)} \in X_k$ be such that $s_{k-1}(f^{(n)}) \ge \epsilon$. Then
\[ P\left( k_{\epsilon/2}(Y^{(n)}) = k \right) \ge 1 - 2 \exp\left( - \frac{\epsilon^2 n}{8v + 4\kappa\epsilon} \right). \tag{16} \]

Proof. First suppose that $k > 0$. Notice that by (12) we have that $k_{\epsilon/2}(Y^{(n)}) = k$ if and only if $s_{k-1}(Y^{(n)}) \ge \frac{\epsilon}{2}$ and $s_k(Y^{(n)}) < \frac{\epsilon}{2}$. Furthermore, by assumption we have that $s_{k-1}(f^{(n)}) \ge \epsilon$ and $s_k(f^{(n)}) = 0$. Therefore, $k_{\epsilon/2}(Y^{(n)}) \ne k$ implies that
\[ |s_{k-1}(f^{(n)}) - s_{k-1}(Y^{(n)})| \ge \frac{\epsilon}{2} \quad \text{or} \quad |s_k(f^{(n)}) - s_k(Y^{(n)})| \ge \frac{\epsilon}{2}. \]
For $k = 0$, by a similar argument, we have that $k_{\epsilon/2}(Y^{(n)}) \ne 0$ implies that $|s_k(f^{(n)}) - s_k(Y^{(n)})| \ge \frac{\epsilon}{2}$. Thus, for all $k \ge 0$, Theorem 1 (applied with $\delta = \epsilon/2$) implies that
\[ P\left( k_{\epsilon/2}(Y^{(n)}) \ne k \right) \le P\left( \max_{j \in \mathbb{N}_0} |s_j(Y^{(n)}) - s_j(f^{(n)})| \ge \frac{\epsilon}{2} \right) \le 2 \exp\left( - \frac{\epsilon^2 n}{8v + 4\kappa\epsilon} \right). \]
□

We stress that the bound in Theorem 5 is remarkably simple, as it depends on the signatures $s_j(f^{(n)})$, $j \in \{0, \dots, k-1\}$, only through $s_{k-1}(f^{(n)})$, which in a sense represents the signature that is hardest to detect. Notice furthermore that the bound in Theorem 5 does not depend on the (unknown) number of signatures $k$.

Limitations of Kolmogorov signatures

Kolmogorov signatures are by no means suitable for all kinds of signals. Indeed, as might be expected intuitively, Kolmogorov signatures are not well suited for sparse signals that have high peaks with small support (the needle in a haystack problem). In order to illustrate this effect, consider signals of the following kind:
\[ f_n(x) = \begin{cases} (1 + \varepsilon) \sqrt{2 \log n} & \text{if } x \in [\, j/n, (j+1)/n \,), \\ 0 & \text{otherwise,} \end{cases} \tag{17} \]
for some $\varepsilon > 0$ and for some $j \in \{0, \dots, n-1\}$ that is a priori not known. Note that there exists no statistical testing procedure that can asymptotically (as the number of observations $n \to \infty$) detect signals with intensity as in (17) for $\varepsilon < 0$ with positive detection power, see, e.g., [13]. For $\varepsilon > 0$, sup norm based thresholding is known to achieve the optimal detection boundary [13]. In contrast, Kolmogorov signature based thresholding at $\tau_n(\alpha)$ as described above is not able to detect signals of the type (17) for any $\varepsilon > 0$:

Theorem 6 (Kolmogorov signatures and sparse signals). Let $f_n : [0,1] \to \mathbb{R}$ be as in (17), and let $Y_i = f_n(\frac{i}{n}) + \epsilon_i$, where $\epsilon_1, \dots, \epsilon_n \overset{\text{i.i.d.}}{\sim} N(0,1)$. Then for any $\alpha \in (0,1)$ one has
\[ \lim_{n \to \infty} P\left( k_{\tau_n(\alpha)}(Y^{(n)}) = 1 \right) = 0, \]
i.e., it is impossible to detect the single mode of $f_n$ when thresholding Kolmogorov signatures at $\tau_n(\alpha)$.

Proof. We have
\[ P\left( k_{\tau_n(\alpha)}(Y^{(n)}) = 1 \right) \le P\left( k_{\tau_n(\alpha)}(Y^{(n)}) \ge 1 \right) = P\left( s_0(Y^{(n)}) \ge \tau_n(\alpha) \right) \le P\left( d_K(Y^{(n)}, 0) \ge \tau_n(\alpha) \right), \]
where $d_K(Y^{(n)}, 0)$ denotes the Kolmogorov distance of the observations $Y^{(n)}$ to the zero function. Let $\mu_n := (1 + \varepsilon) \sqrt{2 \log n}$. Then the last term can be further estimated as
\begin{align*}
P\left( d_K(Y^{(n)}, 0) \ge \tau_n(\alpha) \right) &= P\left( \frac{1}{n} \max_{m = 1, \dots, n} \left| \sum_{i=1}^{m} \epsilon_i + \mu_n \right| \ge \tau_n(\alpha) \right) \\
&\le P\left( \frac{1}{n} \max_{m = 1, \dots, n} \left| \sum_{i=1}^{m} \epsilon_i \right| + \frac{\mu_n}{n} \ge \tau_n(\alpha) \right) \\
&= P\left( \frac{1}{\sqrt{n}} \max_{m = 1, \dots, n} \left| \sum_{i=1}^{m} \epsilon_i \right| + \frac{\mu_n}{\sqrt{n}} \ge \sqrt{n}\, \tau_n(\alpha) \right).
\end{align*}
The claim now follows from the fact that with $n \to \infty$ one has $\mu_n / \sqrt{n} \to 0$, $\sqrt{n}\, \tau_n(\alpha) \to \infty$, and
\[ \frac{1}{\sqrt{n}} \max_{m = 1, \dots, n} \left| \sum_{i=1}^{m} \epsilon_i \right| \xrightarrow{D} \sup_{0 \le x \le 1} |B(x)|, \]
where $B$ denotes a standard Brownian motion on $[0,1]$.
 17 0.2 0.4 0.6 0.8 1.0 - 4 - 2 2 4 6 8 10 0.2 0.4 0.6 0.8 1.0 1 2 3 4 5 Figure 4: Left: signal blocks ; Right: signal b umps [ 15 ]. 4.2 Simulations using K olmogor ov signatur es W e illustrate the v alidity of our approach by means of a simulation study for the signals bloc ks and bumps [ 15 ], which are sho wn in Fig. 4 . Concerning detection of modes, the tw o signals are of di ff erent types as they contain modes of di ff erent lengths. F or a function f with k modes and observ ations Y from ( 2 ) the theory in the previous Section shows that the number of modes k can be estimated by thresholding of the empirical signatures. This approach clearly relies on the fact that s k − 1 ( Y ( n ) ) and s k ( Y ( n ) ) can be distinguished with high probability . Here, we in vestig ate this empirically by considering the quantity ∆ ( Y ) = s k − 1 ( Y ( n ) ) s k ( Y ( n ) ) . For our simulation we consider independent Gaussian noise. W e note that the bound in Remark 3 is constant for increasing n if the variance is linearly increasing in n . This suggests that the expected v alue of ∆ ( Y ) is also constant in this case. W e chose σ = √ n / 16 for blocks and σ = √ n / 256 for bumps and computed the a verage v alue of ∆ ( Y ) in 1000 Monte-Carlo simulations. The results in Fig. 4 show that ∆ ( Y ) is approximately constant for n ≥ 1024. Further , for both signals the ratio ∆ ( Y ) is bounded aw ay from 1, which empirically confirms that the number of modes can be estimated by thresholding. n blocks bumps 256 1.28726 2.08565 1024 1.57086 1.8708 4096 1.52344 1.85699 16384 1.52735 1.84809 65536 1.52647 1.83197 T able 1: A verage v alues of ∆ ( Y ) for blocks and b umps as in Fig. 4 . The results are obtained from 1000 simulations with independent Gaussian noise with σ = √ n / 16 and σ = √ n / 256 for blocks and bumps , respecti vely . As becomes e vident from Fig. 4 , the correct number of modes of the signal is k = 5 and k = 11, respecti vely . 
4.3 Sup norm based persistence signatures

We contrast the results of the previous sections with what holds true for persistence signatures. Throughout this section, let $s_{j,\infty}$ denote the signatures with respect to the sup norm. For simplicity we restrict our exposition to functions with one single mode. More precisely, we consider functions of the type
\[ f_n(x) = \begin{cases} \delta_n & \text{if } x \in [1/3, 2/3), \\ 0 & \text{otherwise,} \end{cases} \tag{18} \]
with $\delta_n \to 0$. It is well known that it is possible to detect the single mode of $f_n$ with probability tending to one as $n \to \infty$ if
\[ \delta_n \sqrt{n} \to \infty, \tag{19} \]
see, e.g., [7, 31]. From Theorem 5 it follows, using $\epsilon = \delta_n$, that Kolmogorov signatures can correctly detect the single mode of signals in (7) by thresholding signatures at $\delta_n / 2$. In contrast, for persistence signatures, there exists no thresholding strategy that can detect the single mode with probability tending to one:

Theorem 7. Let $Y_i = f_n(i/n) + \epsilon_i$, where $\epsilon_1, \dots, \epsilon_n \overset{\text{i.i.d.}}{\sim} N(0,1)$, and let $f_n : [0,1] \to \mathbb{R}$ be as in (18) with $\delta_n$ such that $\delta_n \sqrt{\log n} \to 0$. For any arbitrary sequence $q_n \in \mathbb{R}$,
\[ \limsup_{n \to \infty} P\left( k^{\infty}_{q_n}(Y) = 1 \right) < 1. \]

The proof of Theorem 7 requires some preparation. First, recall that a sequence of random variables $Z_1, \dots, Z_n$ follows a Gumbel extreme value limit (GEVL) with sequences $a_n$ and $b_n$ if
\[ \lim_{n \to \infty} P\left( \max_{1 \le i \le n} Z_i \le a_n + b_n x \right) = e^{-e^{-x}}. \]
A sequence of i.i.d. standard normal random variables follows a GEVL with
\[ a_n = \sqrt{2 \log n} - \frac{\tfrac{1}{2} \log\log n + \log 2\sqrt{\pi}}{\sqrt{2 \log n}}, \qquad b_n = \frac{1}{\sqrt{2 \log n}}. \tag{20} \]
Another essential ingredient of the proof of Theorem 7 is the following lemma.

Lemma 6. Let $m \in \mathbb{N}$, assume $\epsilon_1, \dots, \epsilon_{2m} \overset{\text{i.i.d.}}{\sim} N(0,1)$, and set
\[ \Delta_m = \min_{h \in \mathbb{R}^{2m} :\, h_1 \le h_2 \le \dots \le h_{2m}} \| \epsilon - h \|_{\infty}. \]
Then, with $a_m$ and $b_m$ as in (20),
\[ \lim_{m \to \infty} P\left( \Delta_m \le a_m + b_m x \right) \le e^{-e^{-x}}. \]

Proof (of Lemma 6). Consider a fixed vector $h \in \mathbb{R}^{2m}$ such that $h_1 \le h_2 \le \dots \le h_{2m}$.
In particular, $h_j \le h_m$ for all $j \le m$ and $h_j \ge h_m$ for all $j \ge m$. Let $M^{(1)} = \max_{i = 1, \dots, m} \epsilon_i$ and $M^{(2)} = \min_{i = m+1, \dots, 2m} \epsilon_i$, and observe
\[ \| \epsilon - h \|_{\infty} \ge \max\left\{ M^{(1)} - h_m,\ h_m - M^{(2)} \right\}. \]
Hence,
\[ \Delta_m \ge \min_{\zeta \in \mathbb{R}} \max\left\{ M^{(1)} - \zeta,\ \zeta - M^{(2)} \right\} = \frac{1}{2} \left( M^{(1)} - M^{(2)} \right) \overset{D}{=} M^{(1)}, \]
where $A \overset{D}{=} B$ means that $A$ and $B$ have the same distribution. This implies that
\[ \lim_{m \to \infty} P\left( \Delta_m \le a_m + b_m x \right) \le \lim_{m \to \infty} P\left( M^{(1)} \le a_m + b_m x \right) = e^{-e^{-x}}, \tag{21} \]
because $M^{(1)}$ is the maximum of $m$ independent standard normal random variables and follows a GEVL with $a_m$ and $b_m$. □

Proof (of Theorem 7). To ease notation, we assume that $n = 6m$ for some $m \in \mathbb{N}$ and hence $m = m(n)$. First, we observe that
\[ P\left( k^{\infty}_{q_n}(Y^{(n)}) \ge 1 \right) = P\left( s_{0,\infty}(Y^{(n)}) \ge q_n \right) \quad \text{and} \quad P\left( k^{\infty}_{q_n}(Y^{(n)}) > 1 \right) = P\left( s_{1,\infty}(Y^{(n)}) \ge q_n \right). \tag{22} \]
Since $s_{0,\infty}(Y^{(n)}) \le d_{\infty}(f_n, Y) + s_{0,\infty}(f_n)$ (by Lemma 1) and $s_{0,\infty}(f_n) = \delta_n / 2$, it holds that
\begin{align*}
P\left( k^{\infty}_{q_n}(Y^{(n)}) \ge 1 \right) &\le P\left( d_{\infty}(f_n, Y) \ge q_n - \delta_n / 2 \right) \tag{23} \\
&= P\left( \frac{d_{\infty}(f_n, Y) - a_n}{b_n} \ge \frac{q_n - a_n}{b_n} - \frac{\delta_n}{2 b_n} \right) \\
&= P\left( \frac{d_{\infty}(f_n, Y) - a_n}{b_n} \ge \frac{q_n - a_n}{b_n} + o(1) \right),
\end{align*}
with $a_n$ and $b_n$ as in (20). Since $d_{\infty}(f_n, Y) = \max_{i = 1, \dots, n} |\epsilon_i|$, it follows that for any $x \in \mathbb{R}$ one has $P(d_{\infty}(f_n, Y) \ge x) \le 2 P(\max_{i = 1, \dots, n} \epsilon_i \ge x)$ by symmetry. Therefore,
\[ \lim_{n \to \infty} P\left( \frac{d_{\infty}(f_n, Y) - a_n}{b_n} \ge x \right) \le 2 \left( 1 - e^{-e^{-x}} \right). \tag{24} \]
Further, for $i = 0, \dots, 5$ we define
\[ \Delta^{\pm}_i = \min_{h \in \mathbb{R}^{m} :\, h_1 \le h_2 \le \dots \le h_m} \left\| \pm (Y_{im+1}, \dots, Y_{(i+1)m}) - h \right\|_{\infty}. \]
Recall that $s_{1,\infty}(Y^{(n)}) = \inf_{g \in X_1} d_{\infty}(g, Y)$. Observe that any $g \in X_1$ is either monotonically increasing or decreasing on $[i/6, (i+1)/6]$ for some $0 \le i \le 5$. Otherwise $g$ would have two modes, which contradicts $g \in X_1$. For this reason, we find
\[ s_{1,\infty}(Y^{(n)}) \ge \min\left\{ \Delta^{-}_0, \Delta^{+}_0, \dots, \Delta^{-}_5, \Delta^{+}_5 \right\}. \]
Note that $\Delta^{-}_0, \Delta^{+}_0, \dots, \Delta^{-}_5, \Delta^{+}_5$ are identically distributed and asymptotically independent. Therefore,
\begin{align*}
\lim_{n \to \infty} P\left( k^{\infty}_{q_n}(Y^{(n)}) > 1 \right) &= \lim_{n \to \infty} P\left( s_{1,\infty}(Y^{(n)}) \ge q_n \right) \\
&\ge \lim_{n \to \infty} P\left( \min\left\{ \Delta^{-}_0, \Delta^{+}_0, \dots, \Delta^{-}_5, \Delta^{+}_5 \right\} \ge q_n \right) \\
&= \lim_{n \to \infty} \left( 1 - P\left( \Delta^{-}_0 < q_n \right) \right)^{12}.
\end{align*}
In order to prove the assertion, we show that for some $\beta \in (0,1)$,
\[ \lim_{n \to \infty} P\left( k^{\infty}_{q_n}(Y^{(n)}) \ge 1 \right) \ge 1 - \beta \]
already implies
\[ \lim_{n \to \infty} P\left( k^{\infty}_{q_n}(Y^{(n)}) > 1 \right) > 0 \]
for any sequence $q_n \in \mathbb{R}$. In other words, no thresholding procedure can estimate the number of true modes $k = 1$ with probability tending to one. Combining (23) and (24) shows that $\lim_{n \to \infty} P( k^{\infty}_{q_n}(Y^{(n)}) \ge 1 ) \ge 1 - \beta$ implies $q_n \le a_n + b_n z_{\beta} + o(b_n)$, where $z_{\beta}$ is defined by $2 (1 - \exp(-\exp(-z_{\beta}))) = \beta$ (it is assumed w.l.o.g. that $\beta < e^{1/6}/2$). We then find from (13) that
\begin{align*}
\lim_{n \to \infty} P\left( k^{\infty}_{q_n}(Y^{(n)}) > 1 \right) &\ge \lim_{n \to \infty} \left( 1 - P\left( \Delta^{-}_0 < a_n + b_n z_{\beta} + o(b_n) \right) \right)^{12} \\
&= \lim_{n \to \infty} \left( 1 - P\left( \frac{\Delta^{-}_0 - a_m}{b_m} < \frac{a_n - a_m}{b_m} + \frac{b_n}{b_m} z_{\beta} \right) \right)^{12} \\
&\ge \left( 1 - e^{-e^{-z_{\beta} - \log 6}} \right)^{12}.
\end{align*}
Here the last inequality follows from Lemma 6 together with $\frac{b_n}{b_m} \to 1$ and $\frac{a_n - a_m}{b_m} \to \log 6$. The proof is then completed by observing that $z_{\beta} + \log 6 < \infty$, which yields $1 - e^{-e^{-z_{\beta} - \log 6}} > 0$. □

5 Taut strings

In order to compute Kolmogorov signatures, we require some well known and also some less known results about taut strings, see e.g. [11, 26]. We prove a result that is central for our exposition and appears to be interesting in its own right: Taut strings minimize the number of critical points within a certain (quite general) class of functions. For a given $f \in L$ with antiderivative $F$, consider the $d_{\infty}$-ball $D_{\alpha}(F)$ of radius $\alpha \ge 0$ around $F$. We refer to $D_{\alpha}(F)$ as the $\alpha$-tube around $F$.
The taut string, denoted by $F_{\alpha}$, is the unique function in $D_{\alpha}(F)$ whose graph, regarded as a curve in $\mathbb{R}^2$, has minimal total curve length, subject to the boundary conditions $F_{\alpha}(0) = F(0)$ and $F_{\alpha}(1) = F(1)$. For existence and uniqueness, we refer to [20, 21]. $F_{\alpha}$ is Lipschitz continuous for all $\alpha > 0$ (see [20], proof of Lemma 2); thus its derivative $f_{\alpha}$ (defined a.e.) is in $L^{\infty}$ and we may hence choose $f_{\alpha} \in L$. Therefore, the properties that $F_{\alpha} \in D_{\alpha}(F)$ and that the graph of $F_{\alpha}$ has minimal curve length are equivalent to
\[ d_K(f, f_{\alpha}) \le \alpha \quad \text{and} \quad \int_0^1 \sqrt{1 + f_{\alpha}^2(t)} \, dt = \min, \]
respectively. The aim of this section is to show the following result.

Theorem 8. For all $f \in L$ and all $\alpha > 0$, the derivative $f_{\alpha} \in L$ of the taut string $F_{\alpha}$ minimizes the number of modes among all functions $g \in L$ with $d_K(f, g) \le \alpha$.

Figure 5: Taut string $F_{\alpha}$ (purple) in the $\alpha$-tube around $F$ (top) and its derivative $f_{\alpha}$ (bottom).

The proof requires some preparation. Let the top and bottom functions of the $\alpha$-tube around the antiderivative $F$ of $f \in L$ be denoted by
\[ T_{\alpha}(t) := F(t) + \alpha \quad \text{and} \quad B_{\alpha}(t) := F(t) - \alpha, \]
respectively. Furthermore, let
\[ S_{T,\alpha} = \{ t \in [0,1] : F_{\alpha}(t) = T_{\alpha}(t) \} \quad \text{and} \quad S_{B,\alpha} = \{ t \in [0,1] : F_{\alpha}(t) = B_{\alpha}(t) \} \]
denote the sets where the taut string touches the top (resp. bottom) of the $\alpha$-tube.

Lemma 7 (Grasmair and Obereder [21]). For every $\alpha > 0$, the taut string $F_{\alpha}$ is the unique function in $D_{\alpha}(F)$ with $F_{\alpha}(0) = F(0)$ and $F_{\alpha}(1) = F(1)$ that is convex on every connected component of $(0,1) \setminus S_{B,\alpha}$ and concave on every connected component of $(0,1) \setminus S_{T,\alpha}$. In particular, $F_{\alpha}$ is piecewise affine outside of $S_{B,\alpha} \cup S_{T,\alpha}$.

Lemma 7 gives rise to a characterization of the modes of the derivative of a taut string (see Lemma 10 below).
This characterization resembles the fact that an isolated local maximum (local minimum) of $f_{\alpha}$ corresponds to a point (or interval) where its antiderivative $F_{\alpha}$ changes from being locally convex to locally concave (concave to convex), see Fig. 5. Accordingly, we define:

Definition 3 (maximally concave, convex, and affine intervals). Fix $\alpha > 0$. An interval $I = [a, b] \subset [0,1]$ is called maximally affine if $F_{\alpha}$ is affine on $I$ but not on any interval that properly contains $I$. An interval $I = [a, b] \subset [0,1]$ that is not maximally affine is called maximally convex (concave) if $F_{\alpha}$ is convex (concave) on $I$ but not on any interval that properly contains $I$.

Observe that by Lemma 7, if $F_{\alpha}$ is not affine on all of $[0,1]$, then every $t \in [0,1]$ is contained in a maximally concave or a maximally convex interval (or possibly both). By construction, maximally convex (concave) intervals are mutually disjoint (within their respective classes).

Definition 4 (positive and negative inflection intervals). Fix $\alpha > 0$. An interval $I = [a, b] \subset (0,1)$ is called a positive (negative) inflection interval of $F_{\alpha}$ if $I$ is a maximally affine interval of $F_{\alpha}$ and $F_{\alpha}$ is convex (concave) on some nonempty neighborhood of $a$ and concave (convex) on some nonempty neighborhood of $b$.

Notice that we deliberately require that $a > 0$ and $b < 1$ in our definition of inflection intervals. As a direct consequence of Lemma 7 we obtain:

Lemma 8. Fix $\alpha > 0$. If $[a, b]$ is a positive inflection interval of $F_{\alpha}$, then $F_{\alpha}(a) = T_{\alpha}(a)$ and $F_{\alpha}(b) = B_{\alpha}(b)$; if it is a negative inflection interval, then $F_{\alpha}(a) = B_{\alpha}(a)$ and $F_{\alpha}(b) = T_{\alpha}(b)$.

Moreover we have:

Lemma 9. Fix $\alpha > 0$. Then $F_{\alpha}$ has the following properties:
(i) The number of maximally convex, the number of maximally concave, and the number of inflection intervals of $F_{\alpha}$ is finite.
(ii) Maximally convex and maximally concave intervals are interleaved, i.e., the set of points between two consecutive maximally convex (concave) intervals belongs to a maximally concave (convex) interval.
(iii) The intersection of a maximally convex (concave) with an immediately consecutive maximally concave (convex) interval is a positive (negative) inflection interval, and every inflection interval arises in this way.

Proof. Let $T_{\alpha}$ and $B_{\alpha}$ denote the top and bottom of the $\alpha$-tube around $F$, respectively. Since $F$ is continuous, the graphs of $T_{\alpha}$ and $B_{\alpha}$ are compact sets. Let $I$ be a maximally concave, a maximally convex, or an inflection interval of $F_{\alpha}$. By Definitions 3 and 4 and Lemma 7, the graph of $F_{\alpha}$ restricted to $I$ must then contain an affine segment that connects $T_{\alpha}$ with $B_{\alpha}$ (or $B_{\alpha}$ with $T_{\alpha}$). Therefore, the arc length of the graph of $F_{\alpha}$, restricted to $I$, is bounded from below by the Euclidean distance $d_{\alpha}$ between the graphs of $B_{\alpha}$ and $T_{\alpha}$. Since these sets are compact and disjoint, one has $d_{\alpha} > 0$. Since $d_{\alpha}$ is independent of $I$, and since $F_{\alpha}$ is Lipschitz, it follows that the length of $I$ is bounded from below by a number that only depends on $\alpha$ and the Lipschitz constant of $F_{\alpha}$. Hence, since maximally convex (concave) intervals are mutually disjoint, there can only exist finitely many of them. Likewise, since positive (negative) inflection intervals are disjoint, there can only exist finitely many of those. Properties (ii) and (iii) are then a straightforward consequence of Lemma 7. □

The next lemma states the promised characterization of the modes of the derivative of a taut string.

Lemma 10. Fix $\alpha > 0$ and define
\[ f_{\alpha}(t) = \lim_{\epsilon \to 0} \inf_{0 < \delta < \epsilon} \frac{F_{\alpha}(t + \delta) - F_{\alpha}(t - \delta)}{2\delta} \quad \text{if } 0 < t < 1 \]
and $f_{\alpha}(t) = \lim_{s \to t} f_{\alpha}(s)$ for $t \in \{0, 1\}$. Then the number of positive inflection intervals of $F_{\alpha}$ equals the number of modes of $f_{\alpha}$, and this number is finite.
Proof. First notice that the definition of $f_{\alpha}(0)$ and $f_{\alpha}(1)$ is meaningful since $F_{\alpha}$ is affine in some neighborhood of 0 and 1. If $F_{\alpha}$ is affine on all of $[0,1]$, then there is nothing to show. So suppose that this is not the case. Consider a finite partition $P = \{ t_0, \dots, t_{|P|} \}$ of $[0,1]$. Notice that $f_{\alpha}$ is nowhere decreasing (nowhere increasing) on intervals where $F_{\alpha}$ is convex (concave). Hence, for $t_i$ to count a mode of $f_{\alpha}$, i.e., $M(f_{\alpha}, P, i) = 1$, the pair $(t_{i-1}, t_i)$ must not belong to the same maximally concave interval and the pair $(t_i, t_{i+1})$ must not belong to the same maximally convex interval of $F_{\alpha}$. Since, by assumption, $F_{\alpha}$ is not affine on all of $[0,1]$, every $t \in [0,1]$ belongs to a maximally concave or maximally convex interval (or both). Therefore, by property (ii) of Lemma 9, to each mode of $f_{\alpha}$ counted by $P$ there corresponds at least one change from a maximally convex to an immediately consecutive maximally concave interval. By property (iii) of Lemma 9, the total number of such changes is equal to the number of positive inflection intervals, which we denote by $I_{+}(F_{\alpha})$. It follows that $I_{+}(F_{\alpha}) \ge M(f_{\alpha})$. Vice versa, by considering a partition of $[0,1]$ such that there exists (apart from $t_0 = 0$ and $t_{|P|} = 1$) exactly one point in each positive and each negative inflection interval, it is straightforward to show that $I_{+}(F_{\alpha}) \le M(f_{\alpha})$. Finally, finiteness of $M(f_{\alpha})$ follows from the fact that there are only finitely many positive inflection intervals. □

With these preparations, we are now in the position to prove Theorem 8.

Proof (of Theorem 8). Let $g \in L$ with antiderivative $G$ such that $d_K(f, g) \le \alpha$. Consider a positive inflection interval $[a, b]$ of $F_{\alpha}$. By Lemma 8, $F_{\alpha}(a) = T_{\alpha}(a)$, $F_{\alpha}(b) = B_{\alpha}(b)$, and $F_{\alpha}$ is affine on $[a, b]$.
In particular, $G(a) \le F_{\alpha}(a)$ and $G(b) \ge F_{\alpha}(b)$, and thus
\[ f_{\alpha}(t) = \frac{F_{\alpha}(b) - F_{\alpha}(a)}{b - a} \le \frac{G(b) - G(a)}{b - a} \quad \text{for all } t \in (a, b). \]
For every Lebesgue-integrable $g : [a, b] \to \mathbb{R}$ with $G(t) = G(a) + \int_a^t g(s) \, ds$, there exist sets $C_1, C_2 \subset [a, b]$ of positive Lebesgue measure such that
\[ g(c_1) \le \frac{G(b) - G(a)}{b - a} \le g(c_2) \]
for all $c_1 \in C_1$ and all $c_2 \in C_2$. Hence, for every positive inflection interval $[a, b]$ there exists $t \in (a, b)$ such that $g(t) \ge f_{\alpha}(t)$. By a similar argument, for every negative inflection interval $[a, b]$ there exists $t \in (a, b)$ such that $g(t) \le f_{\alpha}(t)$. By Lemma 10, whenever $M(f_{\alpha}) > 0$ (otherwise there is nothing to show), the set of positive inflection intervals of $F_{\alpha}$ is not empty. Therefore, one can choose a partition $P = \{ t_0, \dots, t_{|P|} \}$ of $[0,1]$ that contains (apart from $t_0 = 0$ and $t_{|P|} = 1$) exactly one point in the interior of each inflection interval of $F_{\alpha}$ such that $g(t_i) \ge f_{\alpha}(t_i)$ whenever $t_i$ lies in a positive inflection interval and $g(t_i) \le f_{\alpha}(t_i)$ whenever $t_i$ lies in a negative inflection interval. By the proof of Lemma 10, $M(f_{\alpha}) = M(f_{\alpha}, P)$ for any partition $P$ that contains (apart from $t_0 = 0$ and $t_{|P|} = 1$) exactly one point in the interior of each inflection interval. Such partitions $P$ count a mode of $f_{\alpha}$ precisely for every positive inflection interval of $F_{\alpha}$. Since positive and negative inflection intervals are interleaved and their interiors are disjoint, we obtain that $M(g, P) \ge M(f_{\alpha}, P) = M(f_{\alpha})$. Thus $M(g) \ge M(f_{\alpha})$. □

6 Computing Kolmogorov signatures

The results of the previous section lead to an efficient algorithm for computing Kolmogorov signatures. Let $X \subset L$ be some subset, and let $f \in X$ with antiderivative $F$. Suppose that $X$ contains the derivatives $f_{\alpha}$ of the taut strings $F_{\alpha}$ for all $\alpha \ge 0$.
For example, let $X$ be the space of piecewise constant functions. For $\alpha$ large enough, $F_{\alpha}$ is affine on all of $[0,1]$, and its derivative $f_{\alpha}$ has no modes. If $f$ has any modes at all, then by lowering $\alpha$ continuously, $F_{\alpha}$ will at some point develop a positive inflection interval below some threshold $\alpha_0 > 0$. By Theorem 8 and Lemma 10, the value of $\alpha_0$ is precisely the distance of $f$ to the set of functions in $X$ with zero modes, i.e., $s_0(f) = \alpha_0$. Continuing this way, and defining $\alpha_k$ as the smallest $\alpha$ for which $f_{\alpha}$ has at most $k$ modes, one finds that $s_k(f) = \alpha_k$ for all $k$.

The idea of the algorithm below is to reverse this observation: Starting from $f = f_0$, we incrementally compute the values of $\alpha$ (in increasing order) at which the number of modes of $f_{\alpha}$ decreases. To this end, we work with the space $X$ of piecewise constant functions on a fixed partition $0 = t_0 < t_1 < \dots < t_n = 1$ of $[0,1]$. Notice that since we require $X \subset L$, we have $f(t_i) = \frac{1}{2} \left( f|_{(t_{i-1}, t_i)} + f|_{(t_i, t_{i+1})} \right)$ for all non-boundary points $t_i$ of the partition. Our starting point is a reformulation of Lemma 7 for piecewise constant functions.

Lemma 11. Let $f$ be a piecewise constant function with antiderivative $F$. Then the taut string $F_{\alpha}$ is the unique continuous piecewise linear function in $D_{\alpha}(F)$ with $F_{\alpha}(0) = F(0)$ and $F_{\alpha}(1) = F(1)$ such that if $t$ is an increasing (decreasing) discontinuity of $f_{\alpha} = F'_{\alpha}$, then $F_{\alpha}(t) = F(t) + \alpha$ ($F_{\alpha}(t) = F(t) - \alpha$).

Fix $\alpha \ge 0$. Let $I = (a, b) \subseteq (0,1)$ be an open interval, and let $f_{\alpha}$ be constant on $(a, b)$. We call $I$ regular for $f_{\alpha}$ if either $a = 0$ and $b = 1$, or $a > 0$, $b < 1$, and there exists $\epsilon > 0$ such that for all $0 < \delta \le \epsilon$ either $f_{\alpha}(a) > f_{\alpha}(a - \delta)$ and $f_{\alpha}(b) < f_{\alpha}(b + \delta)$, or $f_{\alpha}(a) < f_{\alpha}(a - \delta)$ and $f_{\alpha}(b) > f_{\alpha}(b + \delta)$.
We call $I = (a, b)$ maximal (respectively minimal) for $f_\alpha$ if $a > 0$, $b < 1$, and there exists $\varepsilon > 0$ such that for all $0 < \delta \le \varepsilon$ one has $f_\alpha(a) > f_\alpha(a - \delta)$ and $f_\alpha(b) > f_\alpha(b + \delta)$ (respectively $f_\alpha(a) < f_\alpha(a - \delta)$ and $f_\alpha(b) < f_\alpha(b + \delta)$). We call $I$ critical if it is minimal or maximal. Finally, we call $I = (a, b)$ a boundary interval for $f_\alpha$ if either $a = 0$ and $b < 1$ or $a > 0$ and $b = 1$, and $(a, b)$ is the largest such interval on which $f_\alpha$ is constant. As a consequence of Lemma 11 we obtain:

Corollary 2 Away from discontinuities, $f_\alpha$ has the following form: either
• $t$ lies on a regular interval $I = (a, b)$ of $f_\alpha$ with value $f_\alpha(t) = \frac{F(b) - F(a)}{b - a}$,
• $t$ lies on a locally minimal/maximal interval $I = (a, b)$ of $f_\alpha$ with value $f_\alpha(t) = \frac{F(b) - F(a) \pm 2\alpha}{b - a}$ (with $+$ for a minimal and $-$ for a maximal interval), or
• $t$ lies on a boundary interval of $f_\alpha$ with value $f_\alpha(t) = \frac{F(b) - F(a) \pm \alpha}{b - a}$. $\square$

This corollary is central for our computation of Kolmogorov signatures. First observe that the value of a maximal interval is continuously decreasing with growing $\alpha$, the value of a minimal interval is continuously increasing, and the value of a regular interval remains unchanged. Moreover, if $\alpha$ is increased only slightly, then the discontinuities of $f_\alpha$ remain unchanged; indeed:

Lemma 12 Let $F$ be piecewise linear. For every $\alpha \ge 0$ there is $\delta > 0$ such that the points of discontinuity of $f_\beta$ coincide with those of $f_\alpha$ for all $\beta$ with $\alpha \le \beta < \alpha + \delta$. Moreover, if $t$ lies on a regular interval of $f_\alpha$, then $f_\beta(t) = f_\alpha(t)$. $\square$

Proof Define $G_\beta$ by the properties of Lemma 11, using the discontinuities of $f_\alpha$, i.e., if $t$ is an increasing (decreasing) discontinuity of $f_\alpha$, then define $G_\beta(t) := F(t) + \beta$ (resp. $G_\beta(t) := F(t) - \beta$); set $G_\beta(0) := F(0)$ and $G_\beta(1) := F(1)$, and interpolate linearly. Then $\|G_\beta - F_\alpha\|_\infty = \beta - \alpha$ and thus, since $F_\alpha \in D_\alpha(F)$, we have that $\|G_\beta - F\|_\infty \le \beta$, i.e., $G_\beta \in D_\beta(F)$.
For $\delta$ sufficiently small, the discontinuities of $g_\beta = G_\beta'$ have the same type as those of $f_\alpha$. But since $F_\beta$ is uniquely defined by the properties of Lemma 11 with respect to these discontinuities, we must have $F_\beta = G_\beta$. $\square$

As a consequence, for every $\alpha \ge 0$, there exists a minimal number $\mu(\alpha) > \alpha$ such that $f_\beta$ and $f_\alpha$ have the same points of discontinuity for all $\beta$ with $\alpha \le \beta < \mu(\alpha)$ but the set of points of discontinuity of $f_{\mu(\alpha)}$ is different from that of $f_\alpha$. We call $\mu(\alpha)$ the merge value of $\alpha$. The merge value is the smallest number strictly greater than $\alpha$ for which a critical interval or a boundary interval of $f_{\mu(\alpha)}$ reaches the value of an adjacent constant interval, and the corresponding discontinuity vanishes. Each discontinuity of $f_\alpha$ that is incident to a critical or a boundary interval is a possible candidate for such an event.

Consider such a discontinuity $b$ between two consecutive constant intervals $I = [a, b]$ and $J = [b, c]$ of $f_\alpha$. For an interval $I = [a, b]$, let $F_I := F(b) - F(a)$. As a consequence of Corollary 2 and Lemma 12, we obtain that the merge value $\mu(\alpha)$ is the smallest number among all merge value candidates $m_{I,J}$ of $f_\alpha$, which are computed as follows.

If $I$ is critical and $J$ is regular or vice versa (in the latter case with the roles of $I$ and $J$ exchanged in the formula), then the merge value candidate is
\[ m_{I,J} = \frac{1}{2} \left| F_I - \frac{|I|}{|J|} F_J \right|. \]

If both $I$ and $J$ are critical, then the merge value candidate is
\[ m_{I,J} = \left| \frac{|I| F_J - |J| F_I}{2(|I| + |J|)} \right|. \]

If $I$ is critical and $J$ is a boundary interval, then the merge value candidate is
\[ m_{I,J} = \left| \frac{|I| F_J - |J| F_I}{|I| + 2|J|} \right|. \]

If $I$ is a boundary interval and $J$ is critical, then the merge value candidate is
\[ m_{I,J} = \left| \frac{|I| F_J - |J| F_I}{2|I| + |J|} \right|. \]

If $I$ is a boundary interval and $J$ is regular or vice versa (again with the roles of $I$ and $J$ exchanged in the latter case), then the merge value candidate is
\[ m_{I,J} = \left| F_I - \frac{|I|}{|J|} F_J \right|. \]
If both $I$ and $J$ are boundary intervals, then the merge value candidate is
\[ m_{I,J} = \left| \frac{|I| F_J - |J| F_I}{|I| + |J|} \right|. \]

We define the sequence of merge values $\mu_1 < \mu_2 < \mu_3 < \dots$ of $f$ as follows. Starting from $\mu_1 := \mu(0)$, let $\mu_{i+1} := \mu(\mu_i)$. By construction, the values $\alpha = \mu_i$ are precisely those values where the number of discontinuities of $f_\alpha$ decreases with increasing $\alpha$. Observe that the merge value candidates of $f_{\mu_{i+1}}$ are equal to those of $f_{\mu_i}$ except only for the merged intervals $I$ and $J$, i.e., those intervals that have the same value for $f_{\mu_{i+1}}$ but did not have the same value for $f_{\mu_i}$.

This suggests an efficient way for computing Kolmogorov signatures of $f$ in reverse order. Starting with $\alpha = 0$, we iterate in increasing order through the sequence of merge values of $f$. In a min-priority queue, we maintain the merge value candidates $m_{I,J}$. In each iteration $i$, the lowest merge value candidate is the next value $\mu_i$. Upon a merge, the corresponding discontinuity is removed, and the merge value candidates of the neighboring discontinuities are recomputed and updated in the priority queue. The discontinuities are organized in a linked list to allow fast access to the neighbors. If the number of modes of $f_\alpha$ has decreased upon a merge, the value $\alpha$ is prepended to the sequence of computed signatures. This can only occur if one of the merged intervals is maximal. The method is summarized in pseudocode in Algorithm 1. Using an appropriate heap data structure, the running time is $O(n \log n)$, where $n$ is the number of function values of $f$.
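The merge procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that $f$ is given by its values on a uniform partition of $[0, 1]$. It does not count boundary intervals as modes. It uses lazy deletion of stale heap entries in place of a priority queue with in-place updates. The function name `kolmogorov_signatures` and all helper names are hypothetical. The six merge value candidates are evaluated through a single expression: each constant interval has value $(F_I + k_I \alpha)/|I|$ with $k_I \in \{-2, -1, 0, 1, 2\}$ determined by the types of its two end discontinuities, and equating the values of $I$ and $J$ reproduces each case listed above.

```python
import heapq
from fractions import Fraction


def kolmogorov_signatures(values):
    """Return [s_0, s_1, ...] for a step function on a uniform grid of [0, 1].

    Assumption: values[i] is the value of f on the i-th of len(values) cells.
    """
    if not values:
        return []
    h = Fraction(1, len(values))
    # Collapse runs of equal values into the constant intervals of f_0 = f.
    level, length = [], []
    for v in values:
        if level and level[-1] == v:
            length[-1] += h
        else:
            level.append(v)
            length.append(h)
    m = len(level)
    mass = [level[i] * length[i] for i in range(m)]  # F_I = F(b) - F(a)
    # jump[i]: sign of the discontinuity at the right end of interval i
    # (+1 increasing, -1 decreasing, 0 at the right end of the domain).
    jump = [(level[i + 1] > level[i]) - (level[i + 1] < level[i])
            for i in range(m - 1)] + [0]
    left = list(range(-1, m - 1))   # doubly linked list of intervals
    right = list(range(1, m)) + [-1]
    alive = [True] * m
    version = [0] * m               # used for lazy deletion in the heap

    def left_sign(i):
        return jump[left[i]] if left[i] >= 0 else 0

    def is_max(i):
        return left_sign(i) == 1 and jump[i] == -1

    def candidate(i):
        """Merge value candidate of the jump between i and right[i]."""
        j = right[i]
        k_i = jump[i] - left_sign(i)  # value of I is (F_I + k_i * alpha) / |I|
        k_j = jump[j] - jump[i]
        denom = k_i * length[j] - k_j * length[i]
        if denom == 0:                # both intervals regular: no merge event
            return None
        return abs((length[i] * mass[j] - length[j] * mass[i]) / denom)

    heap = []
    for i in range(m - 1):
        c = candidate(i)
        if c is not None:
            heapq.heappush(heap, (c, i, 0))

    found = []  # merge values at which a mode disappears, in increasing order
    while heap:
        alpha, i, ver = heapq.heappop(heap)
        if not alive[i] or ver != version[i]:
            continue                  # stale entry, skip it
        j = right[i]
        had_max = is_max(i) or is_max(j)
        # Merge J into I and drop the discontinuity between them.
        length[i] += length[j]
        mass[i] += mass[j]
        jump[i] = jump[j]
        right[i] = right[j]
        if right[j] >= 0:
            left[right[j]] = i
        alive[j] = False
        if had_max and not is_max(i):
            found.append(alpha)       # a maximal interval has disappeared
        # Recompute the candidates of the two neighbouring discontinuities.
        for k in (left[i], i):
            if k >= 0 and right[k] >= 0:
                version[k] += 1
                c = candidate(k)
                if c is not None:
                    heapq.heappush(heap, (c, k, version[k]))
    return found[::-1]                # s_0 first
```

With integer or `Fraction` inputs the arithmetic is exact, since the cell widths are stored as `Fraction`. For example, `kolmogorov_signatures([0, 2, 1, 3, 0])` should return `[Fraction(6, 25), Fraction(1, 20)]`, i.e. $s_0 = 6/25$ and $s_1 = 1/20$. Each merge pushes at most two new heap entries, so the sketch keeps the $O(n \log n)$ running time.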
Algorithm 1 Computing Kolmogorov signatures
procedure KolmogorovSequence($f$: list of function values)
    $\alpha$ = 0
    $S$ = empty sequence
    $L$ = jumps of $f$ (linked list)
    $Q$ = merge values of the jumps (priority queue)
    while the priority queue $Q$ is not empty do
        let $\alpha$ be the smallest merge value in $Q$
        let $I = [a, b]$ and $J = [b, c]$ be the corresponding intervals
        if $I$ and $J$ are minimum/maximum or boundary/maximum of $f_\alpha$ then
            prepend $\alpha$ to $S$
        remove $b$ from the list $L$ of discontinuities
        remove $\alpha$ from the priority queue $Q$
        recompute merge values of $a$ and $c$ and update priority queue $Q$
    return $S$

Acknowledgements

We would like to thank the anonymous reviewers for their very helpful suggestions for revising our manuscript and Carola Schoenlieb for inspiring discussions.
