Estimating the reach of a manifold via its convexity defect function

The reach of a submanifold is a crucial regularity parameter for manifold learning and geometric inference from point clouds. This paper relates the reach of a submanifold to its convexity defect function. Using the stability properties of convexity …

Authors: Clément Berenfeld, John Harvey, Marc Hoffmann

ESTIMATING THE REACH OF A MANIFOLD VIA ITS CONVEXITY DEFECT FUNCTION

CLÉMENT BERENFELD, JOHN HARVEY, MARC HOFFMANN, KRISHNAN SHANKAR

ABSTRACT. The reach of a submanifold is a crucial regularity parameter for manifold learning and geometric inference from point clouds. This paper relates the reach of a submanifold to its convexity defect function. Using the stability properties of convexity defect functions, along with some new bounds and the recent submanifold estimator of Aamari and Levrard [Ann. Statist. 47 177–204 (2019)], an estimator for the reach is given. A uniform expected loss bound over a $C^k$ model is found. Lower bounds for the minimax rate for estimating the reach over these models are also provided. The estimator almost achieves these rates in the $C^3$ and $C^4$ cases, with a gap given by a logarithmic factor.

Mathematics Subject Classification (2010): 62C20, 62G05, 53A07, 53C40.

Keywords: Point clouds, manifold reconstruction, minimax estimation, convexity defect function, reach.

1. INTRODUCTION

1.1. Motivation. The reach of a submanifold $M \subseteq \mathbb{R}^D$ is a geometric invariant which measures how tightly the submanifold folds in on itself. Dating back to Federer [Fed59], it encodes both local curvature conditions and global ‘bottlenecks’. These arise from two regions of the manifold that are far apart in the manifold’s intrinsic metric but close in the ambient Euclidean metric.

The reach is a key regularity parameter in the estimation of other geometric information. Methods and algorithms from topological data analysis often use the reach as a ‘tuning parameter’, and the correctness of their results depends on setting this parameter correctly. Statistical inference from point clouds has become an active area.
In a probabilistic framework, a reach condition is usually necessary in order to obtain minimax inference results in manifold learning. Such a condition means that the reach of the submanifold under study is bounded below. These results include homology inference [NSW08, BRSSW12], curvature [AL19], reach estimation itself [AKCMRW19] and manifold estimation [GPVW12, KRW19, AL19]. In this context, there is a risk of algorithms being applied as ‘black boxes’ without attention to their underlying assumptions. Efficient reach estimation would be a vital addition to this field, providing a so-called sanity test of other results.

Date: March 29, 2021.

In this direction, Aamari, Kim et al. paved the way. In [AKCMRW19], under some specific assumptions, they proposed and studied an estimator of the reach. The observation there is an $n$-sample of a smooth probability distribution supported on an unknown $d$-dimensional submanifold $M$ of a Euclidean space $\mathbb{R}^D$, together with the tangent spaces at each sampled point. For certain types of $C^3$-regularity models, the estimator achieves the rate $n^{-2/(3d-1)}$. It is based on a representation of the reach in terms of points of $M$ and its tangent spaces (Theorem 4.18 in [Fed59]). A lower bound for the minimax rate of convergence is given by $n^{-1/d}$. In the special case when the reach of $M$ is attained at a bottleneck, the algorithm in [AKCMRW19] achieves this rate. However, in general, one does not know a priori whether this condition is satisfied.

In this paper, we continue the study of reach estimation by taking a completely different route: we use the relationship between the reach of a submanifold of $\mathbb{R}^D$ and its convexity defect function. This function was introduced by Attali, Lieutier and Salinas in [ALS13]. It measures how far a (bounded) subset $X \subseteq \mathbb{R}^D$ is from being convex at a given scale.
It is a powerful geometric tool that has other applications such as manifold reconstruction; see the recent work by Divol [Div20]. We establish certain new quantitative properties of the convexity defect function of a submanifold $M \subseteq \mathbb{R}^D$ that relate to both its curvature and its bottlenecks. From these we show that the convexity defect function can be used to compute the reach of a submanifold. This yields a method which transforms an estimator of $M$, along with information on its error, into a new estimator of the reach. The recent results of Aamari and Levrard in [AL19] provide an estimator of $M$ which is optimal to within logarithmic terms. Transforming this into an estimator of the reach, we obtain new convergence results over general $C^k$-regularity models ($k \ge 3$). These rates improve upon the previous work of [AKCMRW19]. By establishing lower bounds for the minimax rates of convergence, we prove that our results are optimal up to logarithmic terms in the cases $k = 3$ and $k = 4$.

1.2. Main results. We present here one of several possible definitions of the reach. Given a submanifold $M \subseteq \mathbb{R}^D$, consider its $\delta$-offset, given by the open set
$$M^\delta = \bigcup_{p \in M} B_\delta(p) \subseteq \mathbb{R}^D.$$
Here $B_\delta(p)$ denotes the open Euclidean ball centered at $p$ with radius $\delta$. For small enough $\delta$, the following property holds: for all $y \in M^\delta \setminus M$ there is a unique straight line from $y$ to a point in $M$ realizing the distance from $y$ to $M$. A uniform choice for such $\delta$ exists in general only when $M$ is compact. In other words, the metric projection $\pi: M^\delta \to M$ is well defined.

Definition 1.1 (Federer [Fed59]). The reach of a submanifold $M$ is
$$\sup\{\delta > 0 : \text{the nearest-point projection } \pi: M^\delta \to M \text{ is well defined}\}.$$
We denote the reach by $R(M)$, or simply $R$ when the context is clear.

Other equivalent characterizations of the reach exist.
For example, in Section 4.1 below, we use the characterization from Theorem 4.18 in [Fed59]. More recently, Theorem 1 in [BLW19] characterized the reach in terms of the metric distortion.

Our main results are obtained for a statistical model with three ingredients. It imposes certain standard regularity conditions on the manifolds being considered, and it requires that they are compact and connected. It also imposes conditions on the distributions supported on those manifolds. The set of distributions satisfying these constraints on $C^k$ manifolds is denoted in the results below by $\mathcal{P}^k$. These constraints are elaborated upon in Sections 3 and 6.

Theorem 1. For $d$-dimensional submanifolds of regularity $C^k$ with $k \ge 3$, and for sufficiently large $n$, there exists an estimator $\hat R$ that satisfies
$$\sup_{P \in \mathcal{P}^k} \mathbb{E}_{P^{\otimes n}}\big[|\hat R - R|\big] \le C \begin{cases} \left(\log(n)\, n^{-1}\right)^{1/d}, & k = 3,\\[2pt] \left(\log(n)\, n^{-1}\right)^{k/(2d)}, & k \ge 4. \end{cases}$$
The estimator $\hat R$ is explicitly constructed in Section 6 below. It estimates the reach $R = R(M)$ from an $n$-sample $(X_1, \dots, X_n)$ of independent random variables with common distribution $P \in \mathcal{P}^k$. The quantity $C > 0$ depends on $d$, $k$ and the regularity parameters that define the class $\mathcal{P}^k$. The notation $\mathbb{E}_{P^{\otimes n}}[\cdot]$ refers to the expectation operator under the distribution $P^{\otimes n}$ of the $n$-sample $(X_1, \dots, X_n)$.

We also provide a lower bound for the minimax convergence rate. In the cases $k = 3, 4$, our estimators are almost optimal, with a gap given by a logarithmic factor.

Theorem 2. For certain values of the regularity parameters (depending only on $d$ and $k$),
$$\inf_{\hat R} \sup_{P \in \mathcal{P}^k} \mathbb{E}_{P^{\otimes n}}\big[|\hat R - R|\big] \ge c\, n^{-(k-2)/d},$$
where the infimum is taken over all estimators $\hat R = \hat R(X_1, \dots, X_n)$. The constant $c > 0$ depends on $d$, $k$ and the regularity parameters.

These results are of an entirely theoretical nature.
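The following Python sketch compares the two rates to show where the logarithmic gap between Theorems 1 and 2 comes from. The unknown constants $C$ and $c$ are set to 1, an assumption made only so that the rates can be compared; the theorems give no values for them. The sample size and dimension are also hypothetical. For $k = 3$ and $k = 4$, the ratio of the upper rate to the lower rate is $(\log n)^{1/d}$ and $(\log n)^{2/d}$ respectively. For $k \ge 5$, the ratio is $(\log n)^{k/(2d)}\, n^{(k-4)/(2d)}$, which grows polynomially in $n$.

```python
import math

def upper_rate(n: int, d: int, k: int) -> float:
    """Rate in the upper bound of Theorem 1, with the unknown constant C set to 1."""
    if k < 3:
        raise ValueError("Theorem 1 is stated for k >= 3")
    exponent = 1 / d if k == 3 else k / (2 * d)
    return (math.log(n) / n) ** exponent

def lower_rate(n: int, d: int, k: int) -> float:
    """Rate in the minimax lower bound of Theorem 2, with the constant c set to 1."""
    return n ** (-(k - 2) / d)

n, d = 10**6, 2  # hypothetical sample size and intrinsic dimension
for k in (3, 4, 5):
    u, l = upper_rate(n, d, k), lower_rate(n, d, k)
    # For k = 3, 4 the ratio is a power of log(n); for k >= 5 it also grows like n^((k-4)/(2d)).
    print(f"k={k}: upper={u:.3e}  lower={l:.3e}  upper/lower={u / l:.3f}")
```

This comparison matches the statement that near-optimality holds only in the cases $k = 3$ and $k = 4$.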
The question of practical implementation remains, although it is not of primary interest for the paper. Starting from a point cloud $X_1, \dots, X_n$, one may implement the following protocol:
• Estimate $M$ from the data $X_1, \dots, X_n$ by the best available manifold reconstruction method $\hat M$, or, indeed, by any other method.
• Compute $h_{\hat M}$ (Definition 4.1) and derive $\hat R$ using Definition 6.7.
The only inputs are $\hat M$ and the regularity parameters that define the class $\mathcal{P}^k$. It is common practice in statistics to assume some prior knowledge of the class in order to constrain the problem. However, the quantities $C_{d,k}$, $R_{\min}$ and $C$ in Theorem 6.8 are unknown. This creates difficulties in deriving the accuracy of the estimator $\hat R$ and, for example, in calculating a confidence interval. This difficulty is common to every minimax result. In practice, it could be treated by estimating the variance of the estimator via any conventional computational method such as the bootstrap [ET94]. A more detailed discussion lies outside the scope of the present paper.

1.3. Organization of the paper. The paper is divided into two halves: a first half that is mainly geometric in flavor and a second half which employs mainly statistical techniques.
• Sections 2, 3 and 4 describe the geometric setting of this paper in some detail.
• Section 5 discusses the approximation of the reach in a deterministic setting.
• Section 6 shows that the new algorithm proposed to estimate the reach achieves the rates stated in Theorem 1.
• Section 7 proves the lower bound for the minimax rate stated in Theorem 2.

Section 2: We elaborate on the geometry of the reach. We recall a dichotomy due to Aamari, Kim et al.
[AKCMRW19] in Theorem 2.1. We then study in particular the distinction between the global reach, or weak feature size, in Definition 2.2 and the local reach in Definition 2.3, according to the terminology of [AKCMRW19]. This distinction is not apparent in the classical Definition 1.1 of Federer.

Section 3: A geometrical framework is given for studying reach estimation. We describe precisely a class $\mathcal{C}^k_{R_{\min}, L}$ of submanifolds, following Aamari and Levrard [AL19]. Manifolds $M$ in this class admit a local parametrization at all points $p \in M$ by the tangent space $T_pM$. This parametrization is the inverse of the projection to the tangent space and satisfies certain $C^k$ bounds.

Section 4: This section is devoted to the study of the convexity defect function $h_M$ of $M$, as introduced in [ALS13], and its properties. In Proposition 4.3 we show how the local reach can be calculated from the values of $h_M$ near the origin. We also show how the weak feature size (the global reach) appears as a discontinuity point of $h_M$ whenever it is smaller than the local reach. This is done by proving an upper bound on $h_M$ in Proposition 4.4. Propositions 4.3 and 4.4 are central to the results of the paper.

Section 5: When we attempt to estimate the reach in later sections, we will not know $M$ exactly. Instead, we will know it up to some statistical error coming from an estimator. Propositions 5.1 and 5.3 give approximations of the local reach and the weak feature size, respectively, calculated from some proxy $\tilde M$. The errors of the approximations are given in terms of the Hausdorff distance $H(M, \tilde M)$.

Section 6: Building on the definitions in Section 3, we describe a statistical framework within which we study reach estimation in a minimax setting. This defines a class $\mathcal{P}^k$ of admissible distributions $P$. Each such $P$ has support $M$, the submanifold of interest, and $M$ belongs to the class $\mathcal{C}^k_{R_{\min}, L}$.
To apply the results of the previous section, we may use the Aamari–Levrard estimator $\hat M$ of $M$ [AL19], computed from a sample $(X_1, \dots, X_n)$, as the proxy $\tilde M$ for $M$. This estimator is almost optimal over the class $\mathcal{P}^k$. In Section 6 this yields estimators of the local reach and finally of the reach $R(M)$. We then prove the upper bounds announced in Theorem 1 above in Theorems 6.5–6.8.

Section 7: Using the classical Le Cam testing argument, we obtain minimax lower bounds as announced in Theorem 2.

2. GEOMETRY OF THE REACH

The reach of a submanifold $M$, which we will denote by $R(M)$, or simply $R$, is an unusual invariant. Definition 1.1 conceals what is almost a dichotomy: the reach of a submanifold can be realised in two very different ways. This is made precise by the following result.

Theorem 2.1 ([AKCMRW19, Theorem 3.4]). Let $M \subseteq \mathbb{R}^D$ be a compact submanifold with reach $R(M) > 0$. At least one of the following two assertions holds.
• (Global case) $M$ has a bottleneck, that is, there exist $q_1, q_2 \in M$ such that $(q_1 + q_2)/2 \in \mathrm{Med}(M)$ and $\|q_1 - q_2\| = 2R(M)$.
• (Local case) There exist $q_0 \in M$ and an arc-length parametrized geodesic $\gamma$ such that $\gamma(0) = q_0$ and $\|\gamma''(0)\| = 1/R(M)$.

Here, $\mathrm{Med}(M)$ is the medial axis of $M$. This is the subset of $\mathbb{R}^D$ on which the nearest-point projection onto $M$ is ill-defined, namely
$$\mathrm{Med}(M) = \{z \in \mathbb{R}^D \mid \exists\, p, q \in M,\ p \ne q,\ d(z, p) = d(z, q) = d(z, M)\}.$$
We say that the result above is only ‘almost’ a dichotomy because both conditions may hold simultaneously, in two different ways. The curve $\gamma$ could be one half of a circle with radius $R(M)$ joining $q_1$ and $q_2$, in which case the term ‘bottleneck’ might be considered a misnomer. Alternatively, the points $q_1$ and $q_2$ might not lie on $\gamma$ at all, so that the two assertions hold completely independently. This situation invites us to consider two separate invariants.
The first, the weak feature size $R_{\mathrm{wfs}}$, is a widely studied invariant encoding large-scale information such as bottlenecks. The second, which we will call the local reach $R_\ell$ following [AKCMRW19], encodes curvature information. Theorem 2.1 states that the minimum of these two invariants is the reach, $R = \min\{R_\ell, R_{\mathrm{wfs}}\}$. In Riemannian geometry, the local reach is referred to as the focal radius of $M$, while the reach itself is often referred to as the normal injectivity radius of $M$.

2.1. The weak feature size. The weak feature size is defined in terms of critical points of the distance function from $M$, in the sense of Grove and Shiohama (see for instance [Gro94], p. 360). Consider the function $d_M: \mathbb{R}^D \to \mathbb{R}$ defined by $d_M(y) = \inf_{p \in M} \|y - p\|$. Note that $M = d_M^{-1}(0)$. Following [ALS13], let
$$\Gamma_M(y) = \{x \in M : d_M(y) = \|x - y\|\},$$
i.e., the set of points $x \in M$ realizing the distance between $y$ and $M$. Then we define a generalized gradient as
$$\nabla_M(y) := \frac{y - \mathrm{Center}(\Gamma_M(y))}{d_M(y)}.$$
Here $\mathrm{Center}(\sigma)$ is the center of the smallest (Euclidean) ball enclosing the bounded subset $\sigma \subseteq \mathbb{R}^D$. This generalized gradient $\nabla_M$ coincides with the usual gradient of $d_M$ wherever $d_M$ is differentiable. We say that a point $y \in \mathbb{R}^D \setminus M$ is a critical point of $d_M$ if $\nabla_M(y) = 0$. For example, let $y$ be the midpoint of a chord whose endpoints meet the submanifold perpendicularly. Then from $y$ there are two shortest paths to $M$ which travel in opposite directions. It follows that $y$ is a critical point.

Definition 2.2. Given a submanifold $M$ of $\mathbb{R}^D$, let $\mathcal{C}$ denote the set of critical points of the distance function $d_M$. The weak feature size, denoted $R_{\mathrm{wfs}}(M)$ or simply $R_{\mathrm{wfs}}$, is then defined as
$$R_{\mathrm{wfs}} := \inf\{d_M(y) : y \in \mathcal{C}\}.$$
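The generalized gradient can be illustrated on a finite point set $X$ standing in for $M$. This substitution is purely for illustration, since the definitions above are stated for a submanifold. The Python sketch below uses the hypothetical function name `generalized_gradient` and tolerance `tol`. It handles only the case where $\Gamma_X(y)$ has one or two points. In that case the smallest enclosing ball is centered at the point itself or at the midpoint of the two points. The general case needs a smallest-enclosing-ball algorithm, which this sketch does not attempt. At the midpoint of the chord joining $(0,1)$ and $(0,-1)$ the gradient vanishes, mirroring the chord example above.

```python
import numpy as np

def generalized_gradient(y, X, tol=1e-9):
    """Generalized gradient of d_X at y, for a finite point set X (one point per row).

    Gamma_X(y) is found up to the tolerance `tol`. Only |Gamma_X(y)| <= 2 is
    handled: the smallest enclosing ball of one point is centered at that point,
    and that of two points at their midpoint.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X - y, axis=1)
    d = dists.min()
    if d <= tol:
        raise ValueError("y lies on X; the gradient is defined on R^D minus X")
    gamma = X[dists <= d + tol]  # nearest points of X to y
    if len(gamma) > 2:
        raise NotImplementedError("requires a general smallest-enclosing-ball routine")
    center = gamma.mean(axis=0)  # Center(Gamma_X(y)) when |Gamma_X(y)| <= 2
    return (y - center) / d

X = [(0.0, 1.0), (0.0, -1.0), (5.0, 0.0)]
print(generalized_gradient((0.0, 0.0), X))  # midpoint of the chord [(0,-1), (0,1)]: critical
print(generalized_gradient((0.0, 0.5), X))  # a single nearest point: unit gradient
```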
By Theorem 2.1, if the reach is realised globally, then the first critical point will be the midpoint of a shortest chord which meets $M$ perpendicularly at both ends. In that case the weak feature size is equal to the reach.

2.2. The local reach. In the local case, Theorem 2.1 tells us that the reach is determined by the maximum value of $\|\gamma''\|$ over all arc-length parametrised geodesics $\gamma$. This can be formulated more concisely using the second fundamental form II, which measures how the submanifold $M$ curves in the ambient Euclidean space $\mathbb{R}^D$. We refer the reader to a standard text in Riemannian geometry such as [Car92] for a precise definition of the second fundamental form. Informally, the second fundamental form is defined as follows. Take a pair of vector fields tangent to $M$. The (Euclidean) derivative of one with respect to the other is not usually tangent to $M$. Its tangential component is the Levi–Civita connection of the induced (Riemannian) metric on $M$. Its normal, or perpendicular, component yields a symmetric bilinear form, namely the second fundamental form, denoted by $\mathrm{II}_p$. In particular, if the norm of $\mathrm{II}_p$ is small, then $M$ is nearly flat near $p$. If the norm is large, then $p$ lies in a region of high curvature.

Definition 2.3. Given a submanifold $M$ of $\mathbb{R}^D$, let $\mathrm{II}_p$ denote the second fundamental form at $p \in M$. The local reach of $M$, denoted $R_\ell(M)$ or simply $R_\ell$, is the quantity
$$R_\ell = \inf_{p \in M} \left\{ \frac{1}{\|\mathrm{II}_p\|_{\mathrm{op}}} \right\}.$$
We use the term ‘local reach’ here to reflect the fact that this quantity is generated entirely by the local geometry. In the differential geometry literature the local reach is referred to as the focal radius of the submanifold.

3. GEOMETRICAL FRAMEWORK

We define a class of manifolds which are suitable for the task of reach estimation.
This class is the same as that considered by Aamari and Levrard [AL19] for other problems in minimax geometric inference. It is the class of $C^k$ submanifolds, but with some additional regularity requirements. These requirements guarantee two things: the existence of a Taylor expansion of the embedding of the submanifold with bounded coefficients, and a uniform lower bound on the reach.

Definition 3.1 (see [AL19]). Fix two natural numbers $d < D$, some $k \ge 3$, some $R_{\min} > 0$ and some $L = (L_\perp, L_3, \dots, L_k)$. We let $\mathcal{C}^k_{R_{\min}, L}$ denote the set of $d$-dimensional, compact, connected submanifolds $M$ of $\mathbb{R}^D$ such that:
(i) $R(M) \ge R_{\min}$;
(ii) for all $p \in M$, there exists a local one-to-one parametrization $\psi_p$ of the form
$$\psi_p: B_{T_pM}(0, r) \subseteq T_pM \to M, \qquad v \mapsto p + v + N_p(v),$$
for some $r \ge \frac{1}{4L_\perp}$. Here $N_p \in C^k(B_{T_pM}(0, r), \mathbb{R}^D)$ satisfies $N_p(0) = 0$, $d_0N_p = 0$, and $\|d^2_vN_p\|_{\mathrm{op}} \le L_\perp$ for all $\|v\| \le \frac{1}{4L_\perp}$;
(iii) the differentials $d^i_vN_p$ satisfy $\|d^i_vN_p\|_{\mathrm{op}} \le L_i$ for all $3 \le i \le k$ and $\|v\| \le \frac{1}{4L_\perp}$.

We define subclasses of $\mathcal{C}^k_{R_{\min}, L}$ using the gap $R_\ell - R_{\mathrm{wfs}}$ between the local reach and the weak feature size. For fixed values of $R_{\min}$ and $L$, we define
$$\mathcal{M}^k_0 = \{M \in \mathcal{C}^k_{R_{\min}, L} \mid R_{\mathrm{wfs}}(M) \ge R_\ell(M)\}$$
and
$$\mathcal{M}^k_\alpha = \{M \in \mathcal{C}^k_{R_{\min}, L} \mid R_{\mathrm{wfs}}(M) \le R_\ell(M) - \alpha\}, \qquad \alpha > 0.$$
Note that $\mathcal{C}^k_{R_{\min}, L} = \bigcup_{\alpha \ge 0} \mathcal{M}^k_\alpha$.

Manifolds in $\mathcal{C}^k_{R_{\min}, L}$ admit a second parametrization. It represents the manifold locally as the graph of a function over the tangent space, so that the first non-zero term in the Taylor expansion is of degree two and is given by the second fundamental form. These parametrizations in general satisfy weaker bounds than $L$. The degree $k$ Taylor polynomial then gives an algebraic approximation of the manifold, which will be very useful in later calculations.
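As a worked instance of this graph representation, take a circle of radius $R$ in the plane, tangent to the horizontal axis at $p = (0,0)$ and centered at $(0, R)$. This is a hypothetical example; a circle has $\|\mathrm{II}_p\|_{\mathrm{op}} = 1/R$ at every point. Over $T_pM$ the circle is the graph of
$$v \mapsto R - \sqrt{R^2 - v^2} = \frac{v^2}{2R} + \frac{v^4}{8R^3} + O(v^6).$$
The quadratic term is therefore $\frac12 \mathrm{II}_p(v,v)$, and the cubic term vanishes. The Python sketch below checks numerically that the degree-3 approximation, in the sense of Definition 3.3 below, is accurate to order $v^4$. The names `height`, `height_deg3` and `truncation_error` are illustrative only.

```python
import math

R = 2.0  # hypothetical radius; for a circle, R = R_l = R_wfs

def height(v: float) -> float:
    """Normal component of Phi_p(v) for the circle of radius R centered at (0, R),
    written as a graph over T_pM (the x-axis). Equals R - sqrt(R^2 - v^2),
    computed in an algebraically equivalent form that avoids cancellation."""
    return v * v / (R + math.sqrt(R * R - v * v))

def height_deg3(v: float) -> float:
    """Degree-3 approximation: (1/2) II_p(v, v) with |II_p| = 1/R; here T_3 = 0."""
    return v * v / (2 * R)

def truncation_error(v: float) -> float:
    return height(v) - height_deg3(v)

for v in (0.4, 0.2, 0.1, 0.05):
    # The error should scale like v^4, with error / v^4 -> 1 / (8 R^3).
    print(f"v={v:<5} error={truncation_error(v):.3e}  error/v^4={truncation_error(v) / v**4:.6f}")
print("1/(8 R^3) =", 1 / (8 * R**3))
```

Halving $v$ should divide the error by roughly 16, which is the $\|v\|^4$ behaviour expected once the degree-2 and degree-3 terms are exact.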
The following lemma from [AL19] describes the Taylor expansion of a local parametrization at every point $p \in M$.

Lemma 3.2 ([AL19, Lemma 2]). Let $k \ge 3$, $M \in \mathcal{C}^k_{R_{\min}, L}$ and $r = \frac14 \min\{R_{\min}, L_\perp^{-1}\}$. Then for all $p \in M$ there is a local one-to-one parametrization $\Phi_p: U \to M$ around $p$, for some $U \subset T_pM$, with the following properties. Its image contains $B(p, r) \cap M$, it satisfies $\mathrm{pr}_{T_pM} \circ \Phi_p(v) = v$ on its domain, and it takes the form
$$\Phi_p(v) = p + v + \frac12 T_2(v^{\otimes 2}) + \frac16 T_3(v^{\otimes 3}) + \dots + \frac{1}{(k-1)!} T_{k-1}(v^{\otimes (k-1)}) + R_k(v),$$
where $\|R_k(v)\| \le C\|v\|^k$. Furthermore, $T_2 = \mathrm{II}_p$ and $\|T_i\|_{\mathrm{op}} \le L'_i$, where $L'_i$ and $C$ depend on $d$, $k$, $R_{\min}$ and $L$. The terms $T_2, \dots, T_{k-1}, R_k$ are all normal to $T_pM$.

Definition 3.3. We call the degree $j$ truncation of the parametrization $\Phi_p$ given in Lemma 3.2 the approximation of degree $j$ to $M$ around $p$, and write it
$$\Phi^j_p(v) = p + v + \frac12 T_2(v^{\otimes 2}) + \frac16 T_3(v^{\otimes 3}) + \dots + \frac{1}{j!} T_j(v^{\otimes j}).$$

4. CONVEXITY DEFECT FUNCTIONS

The convexity defect function was originally introduced by Attali, Lieutier and Salinas [ALS13]. It measures how far a subset $X \subseteq \mathbb{R}^D$ is from being convex at scale $t$. The goal of this section is to establish a relationship between the convexity defect function and the reach. The definition is valid for any compact subset of $\mathbb{R}^D$. In this section we will principally consider the case of a closed submanifold $M$ as before, but in the sequel we will need this function in greater generality.

We recall the definition. Any compact subset $\sigma \subseteq X$ is contained in a smallest enclosing closed ball in $\mathbb{R}^D$. We define $\mathrm{Rad}(\sigma)$ to be the radius of this ball. We denote by $\mathrm{Hull}(\sigma)$ the convex hull of $\sigma$ in $\mathbb{R}^D$. Then we define the convex hull of $X$ at scale $t$ to be the following subset of $\mathbb{R}^D$:
$$\mathrm{Hull}(X, t) = \bigcup_{\substack{\sigma \subseteq X \\ \mathrm{Rad}(\sigma) \le t}} \mathrm{Hull}(\sigma).$$
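For very small planar sets $\mathrm{Rad}(\sigma)$ is easy to compute, which makes the definition of $\mathrm{Hull}(X, t)$ concrete: $\mathrm{Hull}(\sigma)$ enters $\mathrm{Hull}(X, t)$ exactly when $\mathrm{Rad}(\sigma) \le t$. The Python sketch below uses a hypothetical function `rad`, limited to one, two or three points in $\mathbb{R}^2$. It relies on a standard fact about triangles. The smallest enclosing disk of an acute triangle is its circumcircle. For a right, obtuse or degenerate triangle, it is the disk having the longest side as a diameter.

```python
import math
from typing import Sequence, Tuple

def rad(sigma: Sequence[Tuple[float, float]]) -> float:
    """Rad(sigma): radius of the smallest closed disk enclosing 1 to 3 points in the plane."""
    if not 1 <= len(sigma) <= 3:
        raise ValueError("this sketch only handles 1, 2 or 3 points")
    if len(sigma) == 1:
        return 0.0
    if len(sigma) == 2:
        return math.dist(sigma[0], sigma[1]) / 2
    p, q, s = sigma
    a, b, c = sorted((math.dist(p, q), math.dist(q, s), math.dist(p, s)))
    if c * c >= a * a + b * b:
        # Right, obtuse or degenerate triangle: the longest side is a diameter.
        return c / 2
    # Acute triangle: the circumcircle is the smallest enclosing circle,
    # with circumradius abc / (4 * area) and the area from Heron's formula.
    half = (a + b + c) / 2
    area = math.sqrt(half * (half - a) * (half - b) * (half - c))
    return a * b * c / (4 * area)

# The equilateral triangle inscribed in the unit circle has Rad = 1, so its hull
# (which contains the center of the circle) only enters Hull(X, t) once t >= 1.
tri = [(math.cos(a), math.sin(a)) for a in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
print(rad(tri))
```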
For two compact subsets $A$ and $B$ of $\mathbb{R}^D$, we define the asymmetric distance $H(A \mid B) = \sup_{a \in A} d(a, B)$. Then $H(A, B) = \max\{H(A \mid B), H(B \mid A)\}$ is the symmetric Hausdorff distance.

Definition 4.1. Given a compact subset $X \subseteq \mathbb{R}^D$, we define the convexity defect function $h_X: \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ by
$$h_X(t) = H(\mathrm{Hull}(X, t), X) = H(\mathrm{Hull}(X, t) \mid X).$$
The second equality holds because $X \subseteq \mathrm{Hull}(X, t)$, as singletons have radius $0$.

FIGURE 1. The convex hull at scale $t$, $\mathrm{Hull}(X, t)$ (in blue), of a curve $X$ (in black). Enclosed between the dotted curves is the minimal tubular neighborhood around $X$ that contains $\mathrm{Hull}(X, t)$; its width is the convexity defect function $h_X(t)$.

We recall here from [ALS13] some useful properties of $h_X$.
1. $h_X(0) = 0$.
2. $h_X$ is non-decreasing on the interval $[0, \mathrm{Rad}(X)]$ and constant thereafter.
3. If $\tilde X \subseteq \mathbb{R}^D$ satisfies $H(X, \tilde X) < \varepsilon$, where $H$ is the Hausdorff distance, then $h_{\tilde X}(t - \varepsilon) - 2\varepsilon \le h_X(t) \le h_{\tilde X}(t + \varepsilon) + 2\varepsilon$ for any $t \ge \varepsilon$.
4. $h_X(t) \le t$ for all $t \ge 0$. Moreover, $h_X(t_0) = t_0$ if and only if $t_0$ is a critical value of the distance function $d_X$.
5. If the reach $R = R(X)$ is positive, then on $[0, R)$ the function $h_X$ is bounded above by a quarter-circle of radius $R$ centered on $(0, R)$. In other words, $h_X(t) \le R - \sqrt{R^2 - t^2}$ for $t \in [0, R)$.

The following proposition is immediate from item 4 and the definition of the weak feature size in terms of critical points of the distance function.

Proposition 4.2. If $M$ is a submanifold of $\mathbb{R}^D$, then $R_{\mathrm{wfs}} = \inf\{t > 0 : h_M(t) = t\}$.

We can also relate the local reach to the convexity defect function with the following proposition, which we will prove in Section 4.2.

Proposition 4.3. Let $k \ge 4$.
There exists a constant $C$ (depending on $R_{\min}$, $L$, $d$ and $k$) with the following property. For any sufficiently small real $t$, $0 \le t \le t_{R_{\min}, L, d, k}$, and any $M \in \mathcal{C}^k_{R_{\min}, L}$, we have
$$\left| h_M(t) - \frac{t^2}{2R_\ell} \right| \le C t^4.$$
In the case $k = 3$, there exists a constant $C'$ (depending on $R_{\min}$, $L$, $d$) with the following property. For any sufficiently small real $t$, $0 \le t \le t_{R_{\min}, L, d}$, and any $M \in \mathcal{C}^3_{R_{\min}, L}$, we have
$$\left| h_M(t) - \frac{t^2}{2R_\ell} \right| \le C' t^3.$$

We will write, somewhat informally, $R_\ell = 1/h''_M(0)$. The function $h_M$ is not actually twice differentiable; $h''_M(0)$ here is a ‘pointwise second derivative’. Since $R = \min\{R_\ell, R_{\mathrm{wfs}}\}$, these two propositions show how the convexity defect function yields the reach. Proposition 4.3 will be proven in Section 4.2. First, we need to refine the upper bound in item 5 of the properties listed after Definition 4.1, for the case where $X$ is a submanifold.

4.1. Upper bounds on the convexity defect function. The two aspects of the reach relate to the convexity defect function in quite different ways. This naturally leads one to wonder which aspect of the reach is responsible for item 5 of the properties of $h_X$ listed after Definition 4.1. In this subsection we improve the upper bound by increasing the radius of the bounding circle from $R$ to $R_\ell$, though the bound still only holds on the interval $[0, R)$ (compare with Lemma 12 in [ALS13]). See Figure 2 for an illustration.

Proposition 4.4. Let $M \in \mathcal{C}^k_{R_{\min}, L}$ and let $R = R(M)$ be its reach. Then on $[0, R)$ the function $h_M$ is bounded above by a quarter-circle of radius $R_\ell$ centered on $(0, R_\ell)$. In other words,
$$h_M(t) \le R_\ell - \sqrt{R_\ell^2 - t^2}.$$
For submanifolds in the class $\mathcal{M}^k_0$ (where $R_{\mathrm{wfs}} \ge R_\ell$), this result adds nothing, since then $R = R_\ell$ and the bound reduces to item 5. Now consider manifolds in $\mathcal{M}^k_\alpha$, i.e., manifolds for which $R_{\mathrm{wfs}} \le R_\ell - \alpha$ for some $\alpha > 0$. For these the bound is sharper, with the following consequence.

Corollary 4.5.
If $M \in \mathcal{M}^k_\alpha$ for some $\alpha > 0$, then $h_M$ is discontinuous at $R(M)$.

Proof. Since $\alpha > 0$, we have $R(M) = R_{\mathrm{wfs}} < R_\ell$. For $t < R_{\mathrm{wfs}}$, the bound $h_M(t) \le R_\ell - \sqrt{R_\ell^2 - t^2}$ from Proposition 4.4 holds. On the other hand, for $t = R_{\mathrm{wfs}}$ we have $h_M(t) = t$. The one-sided limit exists because $h_M$ is non-decreasing. Moreover, $R_\ell - \sqrt{R_\ell^2 - t^2} < t$ for $0 < t < R_\ell$. Therefore
$$\lim_{t \nearrow R_{\mathrm{wfs}}} h_M(t) \le R_\ell - \sqrt{R_\ell^2 - R_{\mathrm{wfs}}^2} < R_{\mathrm{wfs}} = h_M(R_{\mathrm{wfs}}),$$
and the function is discontinuous. □

FIGURE 2. A curve $X$ (left) and its convexity defect function $h_X(t)$ (right). The function lies below the quarter-circle of radius $R_\ell$ for $t < R(X) = R_{\mathrm{wfs}}$. Since $R_{\mathrm{wfs}} < R_\ell$, we observe a discontinuity at $t = R_{\mathrm{wfs}}$.

The proof of Proposition 4.4 will require a few steps. We can isolate the local reach by studying sets of the form $M' = M \cap B(z, r)$, where $z \in \mathbb{R}^D$, $0 < r < R(M)$ and $B(z, r)$ is a closed ball. Lemma 4.6 will show that subsets of this type have no bottlenecks. We would therefore expect the reach of such a subset to be generated by the local geometry. Lemma 4.9 quantifies this point: the reach of $M'$ is determined by the behaviour of the second fundamental form on $M'$. The principal difficulty here relates to the boundary of the sets $M'$. The proposition then follows because $h_M(t)$ can be bounded using the functions $h_{M'}(t)$. The bound is therefore determined by the second fundamental form, and in particular by $R_\ell$.

Lemma 4.6. Let $A \subseteq \mathbb{R}^D$ be a compact set. Let $0 < s < R(A)$, $z \in \mathbb{R}^D$, and $A' = A \cap B(z, s)$, where $B$ is a closed ball. If $A' \ne \emptyset$, then $A'$ cannot have any bottlenecks. That is, there is no pair $p, q \in A'$ with $\|p - q\| = 2R(A')$ and $(p + q)/2 \in \mathrm{Med}(A')$.

Proof. Suppose for a contradiction that a bottleneck exists. Then it is a chord of length $2R(A')$. Since $\mathrm{diam}\, A' \le 2s$, we obtain $2R(A') \le 2s < 2R(A) \le 2R(A')$, the last inequality holding by [AL15, Lemma 5].
□

We now consider the case where $A = M$ is a submanifold, together with the intersections $M'$. Our goal is to find the reach of the intersections $M'$ in order to bound $h_{M'}$ and hence $h_M$. We will use the following characterisation of the reach due to Federer [Fed59]:
$$\frac{1}{R(A)} = \sup_{\substack{p, q \in A \\ p \ne q}} \frac{2\, d(q - p, C_pA)}{\|q - p\|^2}.$$
Here $C_pA$ is the tangent cone at $p$, which Federer showed always exists for a set of positive reach. This quotient can be related to the second fundamental form as follows (cf. [AKCMRW19, Lemma 3.3]; see also the work of Lytchak [Lyt04] for more general results).

Lemma 4.7. Let $k \ge 3$ and $M \in \mathcal{C}^k_{R_{\min}, L}$. Let $M' = M \cap B(z, r)$, where $z \in \mathbb{R}^D$, $0 < r < R(M)$ and $B$ is a closed ball. Suppose $M'$ contains more than a single point. Then for any $p \in M'$, the norm of the second fundamental form is given by
$$\|\mathrm{II}_p\|_{\mathrm{op}} = \limsup_{\substack{q \to p \\ q \in M'}} \frac{2\, d(q - p, C_pM')}{\|q - p\|^2},$$
where $C_pM'$ is the tangent cone at $p$ in $M'$. In particular, $1/R(M') \ge \sup_{p \in M'} \|\mathrm{II}_p\|_{\mathrm{op}}$.

Proof. We claim that $\partial M'$ is a $C^k$ submanifold of $M$. Consider the distance function to the central point $z \in \mathbb{R}^D$, $f(y) = d(z, y)$. This function is smooth on $\mathbb{R}^D \setminus \{z\}$, and its restriction $f|_M$ is $C^k$ on $M \setminus \{z\}$. For any $p \in \partial M'$, $f(p) = r$. Note that $r$ is a critical value of $f|_M$ precisely when the distance sphere $\partial B(z, r)$ is tangent to $M$ at some $p \in M$. However, this cannot happen for $r < R(M)$. Indeed, $r$ is less than the focal radius at $p$, so $M$ would have to lie in the exterior of $B(z, r)$. This in turn implies that $M' = \{p\}$, which contradicts the assumption that $M'$ is not a singleton. Therefore $r$ is a regular value of the $C^k$ function $f|_M$. Its pre-image $\partial M'$ is thus an embedded submanifold without boundary, as claimed. As a consequence, $M'$ is an embedded submanifold of $M$ of full dimension with boundary.
The tangent cone $C_pM'$ in $\mathbb{R}^D$ is $T_pM$ for $p$ in the interior of $M'$. For $p \in \partial M'$, it is a half-space of $T_pM$, namely
$$C_pM' = T_pM \cap \{u \mid \langle p - z, u \rangle \le 0\},$$
where $z$ is the center of the ball containing $M'$. We now consider some other point $q \in M'$, $q \ne p$, and show that the projection of $q$ to $T_pM$ lies in $C_pM'$. Suppose first that $p \in \partial M' \subseteq \partial B$. Consider the affine hyperplane $H^{D-1}$ through $p$ perpendicular to the line $pz$. Since $q \in B$, $q$ lies on the same side of $H$ as $z$, and therefore the projection of $q$ to $T_pM$ lies in $C_pM'$. If $p \notin \partial M'$, then $T_pM = C_pM'$, so the statement holds automatically.

Let us now assume that $q$ is close to $p$, with $\|q - p\| \le \frac14 \min\{R_{\min}, L_\perp^{-1}\}$, so that $q$ lies in the image of the parametrization $\Phi_p$ of Lemma 3.2. In particular, if $v$ is the projection of $q$ onto $T_pM$, we may write
$$q - p = v + \frac12 \mathrm{II}_p(v, v) + R_3(v),$$
where the remainder $R_3(v)$ is of order $O(\|v\|^3)$. We have $v \in C_pM'$, and $\frac12 \mathrm{II}_p(v, v) + R_3(v)$ is normal to $T_pM$. Therefore
$$d(q - p, C_pM') = \left\| \frac12 \mathrm{II}_p(v, v) + R_3(v) \right\|.$$
We can then calculate the Federer quotient:
$$\frac{2\, d(q - p, C_pM')}{\|q - p\|^2} = \frac{\|\mathrm{II}_p(v, v) + 2R_3(v)\|}{\|v\|^2 + \left\| \frac12 \mathrm{II}_p(v, v) + R_3(v) \right\|^2} = \frac{1}{\dfrac{\|v\|^2}{\|\mathrm{II}_p(v, v) + 2R_3(v)\|} + \dfrac14 \|\mathrm{II}_p(v, v) + 2R_3(v)\|}.$$
As $q \to p$ we see that $v \to 0$. To compute the lim sup, we may assume that a sequence of points $q_i$ is chosen so that $\|\mathrm{II}_p(v_i, v_i)\|/\|v_i\|^2$ is maximized. In the denominator, every term goes to zero except the ratio $\|v_i\|^2 / \|\mathrm{II}_p(v_i, v_i)\|$. Hence
$$\limsup_{\substack{q \to p \\ q \in M'}} \frac{2\, d(q - p, C_pM')}{\|q - p\|^2} = \lim_{i \to \infty} \frac{\|\mathrm{II}_p(v_i, v_i)\|}{\|v_i\|^2}.$$
We would like to claim that $\lim_{i \to \infty} \|\mathrm{II}_p(v_i, v_i)\| / \|v_i\|^2 = \|\mathrm{II}_p\|_{\mathrm{op}}$. However, $p$ may lie on the boundary of $M'$, so we must check that a suitable sequence of points $q_i \in M'$ can be found.
Since $C_pM'$ is a half-space and $\mathrm{II}_p$ is a symmetric bilinear form, there is some unit vector $w \in C_pM'$ such that $\|\mathrm{II}_p(w, w)\| = \|\mathrm{II}_p\|_{\mathrm{op}}$. Indeed, if a unit vector $w$ attains the norm, so does $-w$, and one of $\pm w$ lies in the half-space. We can then choose a sequence $q_i \in M'$ whose projections are $t_iv_i$, where the $v_i$ are unit vectors in $C_pM'$ with $v_i \to w$, and the $t_i$ are positive numbers with $t_i \to 0$. The existence of such a sequence is equivalent to the fact that $w \in C_pM'$. The final statement then follows from
$$\|\mathrm{II}_p\|_{\mathrm{op}} = \limsup_{\substack{q \to p \\ q \in M'}} \frac{2\, d(q - p, C_pM')}{\|q - p\|^2} \le \sup_{\substack{p, q \in M' \\ p \ne q}} \frac{2\, d(q - p, C_pM')}{\|q - p\|^2} = \frac{1}{R(M')}.$$
□

Remark 4.8. The regularity assumption $k \ge 3$ in the previous lemma may possibly be weakened to $k \ge 2$. The assumption comes from Lemma 3.2, which in turn derives it from the regularity assumption in [AL19, Lemma 2]. However, this is not needed in the sequel, so we do not pursue it further.

Lemma 4.9. Let $k \ge 3$ and $M \in \mathcal{C}^k_{R_{\min}, L}$. Let $M' = M \cap B(z, r)$, where $z \in \mathbb{R}^D$, $0 < r < R(M)$ and $B$ is a closed ball. Suppose $M'$ contains more than a single point. Then $1/R(M') = \sup_{p \in M'} \|\mathrm{II}_p\|_{\mathrm{op}}$.

Proof. We have already shown in Lemma 4.7 that $1/R(M') \ge \sup_{p \in M'} \|\mathrm{II}_p\|_{\mathrm{op}}$. By Lemma 4.6, $M'$ does not contain any bottlenecks. It follows that the reach is attained in one of two ways, and we examine each case.

Case 1: The reach of $M'$ is attained by a pair of points $q, r \in M'$ with $\|q - r\| < 2R(M')$. In this case we apply [AKCMRW19, Lemma 3.2] to obtain, in $M'$, an arc of a circle of radius $R(M')$. That lemma is stated for manifolds, but its proof only requires a set of positive reach. Then, for any point $p$ on the reach-attaining arc, we obtain
$$\frac{1}{R(M')} \le \|\mathrm{II}_p\|_{\mathrm{op}} \le \sup_{s \in M'} \|\mathrm{II}_s\|_{\mathrm{op}}.$$

Case 2: The reach of $M'$ is attained at a single point $p \in M'$.
It follows, using Lemma 4.7, that

$$\frac{1}{R(M_0)} = \limsup_{q\to p,\ q\in M_0}\frac{2\,d(q-p,C_pM_0)}{\|q-p\|^2} = \|\mathrm{II}_p\|_{\mathrm{op}} \le \sup_{s\in M_0}\|\mathrm{II}_s\|_{\mathrm{op}}.$$

Combining the two cases, we also have that $1/R(M_0) \le \sup_{s\in M_0}\|\mathrm{II}_s\|_{\mathrm{op}}$, completing the proof. ∎

Proof of Proposition 4.4. Let $M_0 = M\cap B(z,r)$, where $z\in\mathbb{R}^D$, $0 < r < R(M)$ and $B$ is a closed ball. Recall that on $[0, R(M_0))$ we have $h_{M_0}(t) \le R(M_0) - \sqrt{R(M_0)^2 - t^2}$. By Lemma 4.9, if $M_0$ is not a single point we have

$$\frac{1}{R_\ell} = \sup_{s\in M}\|\mathrm{II}_s\|_{\mathrm{op}} \ge \sup_{s\in M_0}\|\mathrm{II}_s\|_{\mathrm{op}} = \frac{1}{R(M_0)},$$

and this entails the bound $h_{M_0}(t) \le R_\ell - \sqrt{R_\ell^2 - t^2}$ on $[0, R(M_0))$. If $M_0$ is a point then $h_{M_0}(t) = 0$ for all $t$, and so the same bound holds. Recalling that $R(M_0) \ge R(M)$ for every $M_0$ with $\mathrm{Rad}(M_0) < R(M)$, we have, for $0 < t \le r < R(M)$,

$$\sup_{M_0 = M\cap B(z,r)} h_{M_0}(t) \le R_\ell - \sqrt{R_\ell^2 - t^2}.$$

Now for every $\sigma\subset M$ with $\mathrm{Rad}(\sigma)\le t\le r$, there is some $M_0 = M\cap B(z,r)$ with $\sigma\subset M_0$, and it follows that

$$h_M(t) \le \sup_{M_0 = M\cap B(z,r)} h_{M_0}(t).$$

Setting $t = r$ and combining the two inequalities, we have, for $0 < t < R(M)$,

$$h_M(t) \le R_\ell - \sqrt{R_\ell^2 - t^2}.$$

∎

4.2. The convexity defect function near zero. We have seen in the previous section how, for $M\subseteq\mathbb{R}^D$ a compact submanifold, the function $h_M$ on $[0,R)$ obeys an upper bound determined by $R_\ell$. We now study $h_M$ in greater detail in a neighborhood of zero to obtain a Taylor polynomial, identifying $R_\ell$ with $1/h''_M(0)$, the reciprocal of the 'pointwise second derivative' of $h_M$ at zero. More formally, we prove Proposition 4.3, which states that, for any sufficiently small $t$,

$$\Big| h_M(t) - \frac{t^2}{2R_\ell}\Big| \le C t^{k\wedge 4}.$$

Once more, we approach $h_M$ by considering sets $M_0$, which are the intersections of $M$ with small closed balls. We introduce a new function $h^{\mathrm{loc}}_{M_0}(p, r_1, r_2; t)$, which contains information on the convexity of $M_0$.
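For a numerical feel for the expansion stated in Proposition 4.3, one can look at the quarter-circle profile $\varphi_R(t) = R - \sqrt{R^2 - t^2}$ that bounds $h_M$ from above in Proposition 4.4. The short Python sketch below is only an illustration built on this closed form (it does not compute $h_M$ for any particular manifold): it checks that $\varphi_R(t) - t^2/(2R)$ behaves like $t^4/(8R^3)$ near zero, and that $\varphi_R(t) \le t^2/(2R) + t^4/(2R^3)$ on $[0, R]$, the elementary inequality used at the end of the proof of Proposition 4.3. The names `quarter_circle` and `taylor_gap` are hypothetical helpers introduced here.

```python
import math

def quarter_circle(t, R):
    """Quarter-circle profile R - sqrt(R^2 - t^2) from Proposition 4.4, for 0 <= t <= R."""
    # Algebraically equal to t^2 / (R + sqrt(R^2 - t^2)); this form avoids
    # catastrophic cancellation when t is small compared with R.
    return t * t / (R + math.sqrt(R * R - t * t))

def taylor_gap(t, R):
    """Deviation of the profile from its quadratic term t^2 / (2R)."""
    return quarter_circle(t, R) - t * t / (2 * R)

R = 2.0
for t in (0.1, 0.01, 0.001):
    # As t -> 0 this ratio tends to 1 / (8 R^3), i.e. the deviation is O(t^4).
    print(t, taylor_gap(t, R) / t**4, 1 / (8 * R**3))

# Check phi_R(t) <= t^2/(2R) + t^4/(2R^3) on a grid of [0, R] (equality at t = R).
grid = [R * i / 1000 for i in range(1001)]
ok = all(quarter_circle(t, R) <= t**2 / (2 * R) + t**4 / (2 * R**3) + 1e-12 for t in grid)
print(ok)
```

The quadratic coefficient $1/(2R)$ is exactly what the proposition identifies with $h''_M(0)/2$ when $R = R_\ell$.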
Lemma 4.10 shows how $h_M$ can be determined from all the $h^{\mathrm{loc}}_{M_0}(p, r_1, r_2; t)$. Recall from Lemma 3.2 that such sets $M_0$ can be written as the graphs of functions over $T_pM$ and that these functions have Taylor expansions. Lemma 4.12 will set a lower bound on $h^{\mathrm{loc}}$ for the degree 3 approximation to $M$ around $p$, which Lemma 4.14 translates into a lower bound on $h^{\mathrm{loc}}_{M_0}(p, r_1, r_2; t)$ itself. Varying $M_0$, we obtain a lower bound on $h_M(t)$ for small $t$, which we combine with the upper bound from Proposition 4.4 to prove the result.

Lemma 4.10. Let $B$ denote a closed ball. Then, for any compact set $X\subset\mathbb{R}^D$ and any $r_1, r_2, t > 0$ satisfying $r_1\ge 2t$ and $r_2\ge t + r_1$, we have

$$h_X(t) = \sup_{p\in X} h^{\mathrm{loc}}_X(p, r_1, r_2; t),$$

where

$$h^{\mathrm{loc}}_X(p, r_1, r_2; t) = \mathrm{H}\Big(\bigcup_{\substack{\sigma\subseteq X\cap B(p,r_1)\\ \mathrm{Rad}\,\sigma\le t}}\mathrm{Hull}\,\sigma \,\Big|\, X\cap B(p,r_2)\Big).$$

Proof. We begin by showing $h_X(t)\ge\sup_{p\in X}h^{\mathrm{loc}}_X(p,r_1,r_2;t)$. We have immediately, for any $p\in X$ and any $r_1, t > 0$,

$$h_X(t) = \mathrm{H}\Big(\bigcup_{\substack{\sigma\subseteq X\\ \mathrm{Rad}\,\sigma\le t}}\mathrm{Hull}\,\sigma\,\Big|\,X\Big) \ge \mathrm{H}\Big(\bigcup_{\substack{\sigma\subseteq X\cap B(p,r_1)\\ \mathrm{Rad}\,\sigma\le t}}\mathrm{Hull}\,\sigma\,\Big|\,X\Big),$$

and so all that is necessary is to check that

$$\mathrm{H}\Big(\bigcup_{\substack{\sigma\subseteq X\cap B(p,r_1)\\ \mathrm{Rad}\,\sigma\le t}}\mathrm{Hull}\,\sigma\,\Big|\,X\Big) = \mathrm{H}\Big(\bigcup_{\substack{\sigma\subseteq X\cap B(p,r_1)\\ \mathrm{Rad}\,\sigma\le t}}\mathrm{Hull}\,\sigma\,\Big|\,X\cap B(p,r_2)\Big) = h^{\mathrm{loc}}_X(p,r_1,r_2;t).$$

Let the asymmetric distance $\mathrm{H}\big(\bigcup_{\sigma\subseteq X\cap B(p,r_1),\ \mathrm{Rad}\,\sigma\le t}\mathrm{Hull}\,\sigma\,\big|\,X\big)$ be realized by the data $\sigma\subseteq X\cap B(p,r_1)$, $y\in\mathrm{Hull}\,\sigma$, $p'\in X$. We have $d(p',y)\le t$ and $d(y,p)\le r_1$, so that $d(p',p)\le r_1 + t\le r_2$. Hence $p'\in X\cap B(p,r_2)$, and the two asymmetric distances agree.

Now we check that $h_X(t)\le\sup_{p\in X}h^{\mathrm{loc}}_X(p,r_1,r_2;t)$. If $\sigma\subset X$, we have $\sigma\subset B(p, 2\,\mathrm{Rad}\,\sigma)$ for any $p\in\sigma$. By requiring $\mathrm{Rad}\,\sigma\le t$, we obtain $\mathrm{H}(\mathrm{Hull}\,\sigma\,|\,X)\le h^{\mathrm{loc}}_X(p,r_1,r_2;t)$ for any $p\in\sigma$, provided $r_1\ge 2t$.
∎

For a bilinear map $S:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}^{D-d}$ and a trilinear map $T:\mathbb{R}^d\times\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}^{D-d}$, we denote

$$\mathcal{M}(S,T) = \big\{\big(v,\ S(v^{\otimes 2}) + T(v^{\otimes 3})\big) \,\big|\, v\in\mathbb{R}^d\big\}\subseteq\mathbb{R}^D,$$

which is a $C^\infty$ submanifold of $\mathbb{R}^D$ of dimension $d$. By setting $S$ and $T$ to be the coefficients of $\Phi^3_p$, the approximation of degree 3 to a manifold $M$ around $p\in M$ (see Definition 3.3), we can easily see that, near $p$, $\mathcal{M}(S,T)$ is Hausdorff close to $M$. This assumes that $p = 0$ and that $T_pM$ is the subspace spanned by the first $d$ coordinates. This assumption, which is used in the statement of the lemma below, is for convenience only: for each $p\in M$ there is an isometry of $\mathbb{R}^D$ which causes it to be satisfied.

Lemma 4.11. Let $M\in\mathcal{C}^k_{R_{\min},\mathbf{L}}$. Suppose that $p = 0\in M$ and $T_pM = \mathbb{R}^d\subseteq\mathbb{R}^D$. If $k\ge 4$, we have, for $s\le s_1$ with $s_1$ depending only on $R_{\min},\mathbf{L},k,d$,

$$\mathrm{H}\big(M\cap B(0,s),\ \mathcal{M}(S,T)\cap B(0,s)\big)\le Cs^4,$$

where $S$ and $T$ are obtained from the degree 3 approximation $\Phi^3_0$ given in Definition 3.3 by

$$S = \tfrac12 d^2_0\Phi^3_0 = \tfrac12\mathrm{II}_0,\qquad T = \tfrac16 d^3_0\Phi^3_0,$$

and the constant $C = C_{R_{\min},\mathbf{L},k,d}$. When $k = 3$ we can use the degree 2 approximation $\Phi^2_0$ and pick $T\equiv 0$ to obtain

$$\mathrm{H}\big(M\cap B(0,s),\ \mathcal{M}(S,0)\cap B(0,s)\big)\le C's^3.$$

Proof. Let us initially take $s_1 = \min\{R_{\min}, L_\perp^{-1}\}/4$. Then for any point $q\in M\cap B(0,s)$, if $v = \mathrm{pr}_{T_0M}(q)$ then

$$q = \Phi_0(v) = v + S(v^{\otimes 2}) + T(v^{\otimes 3}) + R(v),$$

where $\Phi_0$ is the expansion given in Lemma 3.2 and $\|R(v)\|\le\frac{L_4'}{24}\|v\|^4$, unless $k = 3$. In case $k = 3$, if we wish to control the remainder we can only use the degree 2 polynomial approximation $\Phi^2_0$. It is therefore clear that, for the point $q = \Phi_0(v)\in M\cap B(0,s)$, there is a corresponding point $\Phi^3_0(v)\in\mathcal{M}(S,T)$ within the required distance and, conversely, for any point $\Phi^3_0(v)\in\mathcal{M}(S,T)\cap B(0,s)$, there is a corresponding point $\Phi_0(v)\in M$. The constant $C$ may be chosen to be $C = L_4'/24$.
However, the corresponding point is not guaranteed to lie in the ball $B(0,s)$. In the next paragraph we establish that there is a vector $v'$ very close to $v$ such that $\Phi^3_0(v')$ or $\Phi_0(v')$, as appropriate, will be sufficiently close.

Let us continue to assume $k\ge 4$, since the case $k = 3$ is essentially identical. We first consider the possibility that $\|\Phi^3_0(v)\|\le s$ but $\|\Phi_0(v)\| > s$. It is clear that, for sufficiently small $s$, $\|\Phi_0(v)\|^2\le s^2 + C_0s^6$, where $C_0$ depends on $R_{\min}, L_\perp, L_3$ and $L_4$. It follows that $\|\Phi_0(v)\|\le s + C_1s^5$. Consider now a vector $v' = (1-\lambda)v$, with $\lambda\approx 0$, chosen so that $\|\Phi_0(v')\| = s$. For small enough $s$ we have $\lambda\le C_2s^4$. It follows immediately that $\Phi_0(v')$ lies within $C_3s^4$ of $\Phi_0(v)$, and hence within $Cs^4$ of $\Phi^3_0(v)$. The case where $\|\Phi_0(v)\|\le s$ but $\|\Phi^3_0(v)\| > s$ is dealt with similarly. ∎

The utility of $\mathcal{M}(S,T)$ is that, since it is algebraic, we can compute explicit bounds for $h^{\mathrm{loc}}_X$, where $X = \mathcal{M}(S,T)$.

Lemma 4.12. Let $r_1\le r_2\le\frac{13^{1/4}}{2}\|T\|_{\mathrm{op}}^{-1/2}$, and let $X = \mathcal{M}(S,T)$. Then for any $t\le\min\big(\frac12\|S\|_{\mathrm{op}}^{-1}, \frac{2}{\sqrt{13}}r_1\big)$ we have

$$h^{\mathrm{loc}}_X(0, r_1, r_2; t)\ge\Big(t - \tfrac12 t^5\|T\|^2_{\mathrm{op}}\Big)^2\|S\|_{\mathrm{op}}\ge t^2\|S\|_{\mathrm{op}} - t^6\|S\|_{\mathrm{op}}\|T\|^2_{\mathrm{op}}.$$

Proof. Let $v$ be a unit vector in $\mathbb{R}^d$ such that $\|S(v^{\otimes 2})\| = \|S\|_{\mathrm{op}}$. Let $z\le\min\big(\frac12\|S\|^{-1}_{\mathrm{op}}, \frac{2}{\sqrt{13}}r_1\big)$. Note that the upper bound on $r_1$ gives a third upper bound for $z$, namely $z\le 13^{-1/4}\|T\|^{-1/2}_{\mathrm{op}}\le\|T\|^{-1/2}_{\mathrm{op}}$. We set

$$p_1 = \big(zv,\ S((zv)^{\otimes 2}) + T((zv)^{\otimes 3})\big),\qquad p_2 = \big(-zv,\ S((-zv)^{\otimes 2}) + T((-zv)^{\otimes 3})\big),$$

and denote the two-point set containing them by $\sigma = \{p_1, p_2\}$. In order to use $\sigma$ to bound $h^{\mathrm{loc}}_X$ we must (1) check $\sigma\subseteq X\cap B(0,r_1)$, (2) find the radius of $\sigma$ and (3) determine $\mathrm{H}(\mathrm{Hull}\,\sigma\,|\,X\cap B(0,r_2))$.
Firstly, since $\sigma\subseteq\mathcal{M}(S,T)$, it is enough to show that $\|p_1\|^2, \|p_2\|^2\le r_1^2$. Using all three bounds on $z$, we can check

$$\begin{aligned}
\|p_1\|^2, \|p_2\|^2 &\le z^2 + z^4\|S\|^2_{\mathrm{op}} + 2z^5\|S\|_{\mathrm{op}}\|T\|_{\mathrm{op}} + z^6\|T\|^2_{\mathrm{op}}\\
&\le 2z^2 + 2z^3\|S\|_{\mathrm{op}} + z^4\|S\|^2_{\mathrm{op}} &&\text{by } z\|T\|^{1/2}_{\mathrm{op}} < 1\\
&\le \tfrac{13}{4}z^2 &&\text{by } z\|S\|_{\mathrm{op}}\le\tfrac12\\
&\le r_1^2 &&\text{by } z\le\tfrac{2}{\sqrt{13}}r_1.
\end{aligned}$$

Secondly, we obtain the radius as

$$\mathrm{Rad}\,\sigma = \tfrac12\sqrt{(2z)^2 + \big(2z^3\|T(v^{\otimes 3})\|\big)^2} = z\sqrt{1 + z^4\|T(v^{\otimes 3})\|^2}\le z\Big(1 + \tfrac12 z^4\|T\|^2_{\mathrm{op}}\Big) = z + \tfrac12 z^5\|T\|^2_{\mathrm{op}},$$

using $\sqrt{1+x}\le 1 + \frac12 x$ for $x\ge 0$.

Thirdly, we place a lower bound on $\mathrm{H}(\mathrm{Hull}\,\sigma\,|\,X\cap B(0,r_2))$. Let $q = \frac12(p_1 + p_2)\in\mathrm{Hull}\,\sigma$. For any $p = (w, S(w^{\otimes 2}) + T(w^{\otimes 3}))\in X$ satisfying $\|w\|\le r_2$, we have

$$d(q,p)^2 = \|w\|^2 + \big\|S(w^{\otimes 2}) + T(w^{\otimes 3}) - z^2S(v^{\otimes 2})\big\|^2\ge z^4\|S\|^2_{\mathrm{op}} + \|w\|^2\big(1 - 2z^2\|S\|^2_{\mathrm{op}} - 2z^2r_2\|S\|_{\mathrm{op}}\|T\|_{\mathrm{op}}\big).$$

Since $z\|S\|_{\mathrm{op}}\le 1/2$ we have $2z^2\|S\|^2_{\mathrm{op}}\le\frac12$. The same condition, together with $z\le 13^{-1/4}\|T\|^{-1/2}_{\mathrm{op}}$ and $r_2\le\frac{13^{1/4}}{2}\|T\|^{-1/2}_{\mathrm{op}}$, also allows us to see that $2z^2r_2\|S\|_{\mathrm{op}}\|T\|_{\mathrm{op}}\le zr_2\|T\|_{\mathrm{op}}\le\frac12$. It follows that $d(q,p)^2\ge z^4\|S\|^2_{\mathrm{op}} = d(q,0)^2$, from which we obtain the bound

$$\mathrm{H}\big(\mathrm{Hull}\,\sigma\,\big|\,X\cap B(0,r_2)\big)\ge z^2\|S\|_{\mathrm{op}}.$$

These three calculations yield

$$h^{\mathrm{loc}}_X\Big(0, r_1, r_2; z + \tfrac12 z^5\|T\|^2_{\mathrm{op}}\Big)\ge z^2\|S\|_{\mathrm{op}}.$$

Now we may reparametrize the argument by setting $t = z + \frac12 z^5\|T\|^2_{\mathrm{op}}$. Obviously $t\ge z$, so we can invert to obtain $z = t - \frac12 z^5\|T\|^2_{\mathrm{op}}\ge t - \frac12 t^5\|T\|^2_{\mathrm{op}}$, and so

$$h^{\mathrm{loc}}_X(0, r_1, r_2; t)\ge\Big(t - \tfrac12 t^5\|T\|^2_{\mathrm{op}}\Big)^2\|S\|_{\mathrm{op}}\ge\big(t^2 - t^6\|T\|^2_{\mathrm{op}}\big)\|S\|_{\mathrm{op}}.$$

If the bounds given in the statement hold for $t$, then they also hold for the smaller value $z$, and so the result is proved. ∎

We are now in a position to convert this bound for an algebraic approximation to $M$ into one for the small patch of $M$ itself. We need a stability result first.

Lemma 4.13.
Let $X, Y$ be two subsets of $\mathbb{R}^D$ and let $r_1, r_2, t > 0$ be such that $r_1\le r_2$. Then, if $p\in X\cap Y$ and $\mathrm{H}(X\cap B(p,r_2), Y\cap B(p,r_2))\le\varepsilon$, we have

$$h^{\mathrm{loc}}_X(p, r_1, r_2; t)\le h^{\mathrm{loc}}_Y(p, r_1+\varepsilon, r_2; t+\varepsilon) + 2\varepsilon.$$

Proof. This is a straightforward adaptation of the proof of Lemma 5 in [ALS13]. Indeed, let $\sigma\subset X\cap B(p,r_1)$ be such that $\mathrm{Rad}\,\sigma\le t$. Let $\xi = Y\cap B(p,r_2)\cap\sigma^\varepsilon$. Since $\mathrm{H}(X\cap B(p,r_2), Y\cap B(p,r_2))\le\varepsilon$, $\xi$ is not empty and satisfies $\mathrm{H}(\xi,\sigma)\le\varepsilon$. Thus $\xi\subset Y\cap B(p, r_1+\varepsilon)$, and furthermore, by Lemma 16 in [ALS13], we have $\mathrm{Rad}\,\xi\le t+\varepsilon$. We conclude using that

$$\mathrm{Hull}\,\sigma\subset\mathrm{Hull}(\xi^\varepsilon) = (\mathrm{Hull}\,\xi)^\varepsilon\subset\big(Y\cap B(p,r_2)\big)^{h^{\mathrm{loc}}_Y(p,r_1+\varepsilon,r_2;t+\varepsilon)+\varepsilon}\subset\big(X\cap B(p,r_2)\big)^{h^{\mathrm{loc}}_Y(p,r_1+\varepsilon,r_2;t+\varepsilon)+2\varepsilon}.$$

∎

Lemma 4.14. Let $k\ge 4$. There exists $s_2 > 0$ depending only on $R_{\min},\mathbf{L},k,d$ such that for any $r_2\le s_2$ and for any $r_1, t > 0$ such that both $r_1\le r_2$ and

$$C_0r_2^4\le t\le\frac{2}{\sqrt{13}}r_1$$

for some constant $C_0$ depending on $R_{\min},\mathbf{L},k,d$, we have, for all $M\in\mathcal{C}^k_{R_{\min},\mathbf{L}}$ and all $p\in M$,

$$h^{\mathrm{loc}}_M(p, r_1, r_2; t)\ge\tfrac12 t^2\|\mathrm{II}_p\|_{\mathrm{op}} - Cr_2^4,$$

where $C$ is a constant depending on $R_{\min},\mathbf{L},k,d$. In case $k = 3$, we have, for all $M\in\mathcal{C}^k_{R_{\min},\mathbf{L}}$ and all $p\in M$,

$$h^{\mathrm{loc}}_M(p, r_1, r_2; t)\ge\tfrac12 t^2\|\mathrm{II}_p\|_{\mathrm{op}} - C'r_2^3,$$

where $C'$ is a constant depending on $R_{\min},\mathbf{L},d$.

Proof. By applying an isometry of $\mathbb{R}^D$, we may assume that $p = 0$ and that $T_pM = \mathbb{R}^d\subseteq\mathbb{R}^D$. The result will then follow from Lemmata 4.11 and 4.12, in addition to the Hausdorff stability property for $h^{\mathrm{loc}}$ (Lemma 4.13). Take $r_2 > 0$ smaller than $s_1$ (from Lemma 4.11) and than $\frac{13^{1/4}}{2}(L_3')^{-1/2}$ (from Lemma 4.12).
In the case $k\ge 4$, where $\Phi_p$ is the expansion described in Lemma 3.2, $S = \frac12 d^2_0\Phi_p = \frac12\mathrm{II}_p$, $T = \frac16 d^3_0\Phi_p$ and $C_0$ is the constant from the statement of Lemma 4.11, we have, applying Lemma 4.13 with $\varepsilon = C_0r_2^4$,

$$\begin{aligned}
h^{\mathrm{loc}}_M(0, r_1, r_2; t) &\ge h^{\mathrm{loc}}_{\mathcal{M}(S,T)}\big(0, r_1 - C_0r_2^4, r_2; t - C_0r_2^4\big) - 2C_0r_2^4\\
&\ge\big(t - C_0r_2^4\big)^2\|S\|_{\mathrm{op}} - \big(t - C_0r_2^4\big)^6\|S\|_{\mathrm{op}}\|T\|^2_{\mathrm{op}} - 2C_0r_2^4\\
&\ge\tfrac12\|\mathrm{II}_p\|_{\mathrm{op}}t^2 - Cr_2^4,
\end{aligned}$$

where $C$ depends only on $R_{\min},\mathbf{L},d,k$. The first inequality only holds if $C_0r_2^4\le t$. In the case $k = 3$ the result is obtained similarly. ∎

We conclude with the proof of Proposition 4.3.

Proof of Proposition 4.3. By taking $t\le\frac14 s_2\wedge(4^4C_0)^{-1/3}$ (from Lemma 4.14), and setting $r_1 = 2t$ and $r_2 = 3t$, we have that $C_0r_2^4\le t\le\frac{2}{\sqrt{13}}r_1$ and $t + r_1\le r_2$, so that the hypotheses of both Lemma 4.10 and Lemma 4.14 hold. It is now immediate that, if $k\ge 4$,

$$h_M(t) = \sup_{p\in M}h^{\mathrm{loc}}_M(p, r_1, r_2; t)\ge\sup_{p\in M}\Big(\tfrac12\|\mathrm{II}_p\|_{\mathrm{op}}t^2 - Cr_2^4\Big) = \frac{t^2}{2R_\ell} - 3^4Ct^4,$$

where $C$ is a constant depending on $R_{\min},\mathbf{L},d,k$, while if $k = 3$,

$$h_M(t)\ge\frac{1}{2R_\ell}t^2 - C't^3,$$

where $C'$ is a constant depending on $R_{\min},\mathbf{L}$. On the other hand, Proposition 4.4 provides an upper bound which holds for all $t < R_{\min}$:

$$h_M(t)\le R_\ell - \sqrt{R_\ell^2 - t^2}\le\frac{t^2}{2R_\ell} + \frac{t^4}{2R_\ell^3}\le\frac{t^2}{2R_\ell} + \frac{t^4}{2R_{\min}^3}.$$

∎

5. APPROXIMATING THE REACH

Recall item 3 of the properties of $h_X$ given after Definition 4.1, which guarantees that the convexity defect function is stable with respect to perturbations of the manifold which are small in the Hausdorff distance. This allows one to approximate the reach of a submanifold $M\subseteq\mathbb{R}^D$ from a nearby subset $\widetilde M$. Given a submanifold $M$ and another subset $\widetilde M$ (not necessarily a manifold) such that $\mathrm{H}(M,\widetilde M) < \varepsilon$, we can calculate the convexity defect function $h_{\widetilde M}$.
This can then be used to approximate $R_\ell = (h''_M(0))^{-1}$ and $R_{\mathrm{wfs}} = \inf\{t > 0 : h_M(t) = t\}$. We can approximate the local reach via

$$h''_M(0)\approx\frac{2h_{\widetilde M}(\Delta)}{\Delta^2}$$

for some choice of step size $\Delta$. Proposition 4.3 gives the following bound on the error.

Proposition 5.1. Let $M\in\mathcal{C}^k_{R_{\min},\mathbf{L}}$. Let $0 < \varepsilon < \Delta < 1$ be such that $\varepsilon + \Delta$ is small enough to satisfy the hypotheses constraining the variable $t$ in Proposition 4.3. Let $\widetilde M\subseteq\mathbb{R}^D$ be such that $\mathrm{H}(M,\widetilde M) < \varepsilon$. Then

• If $k\ge 4$,
$$\Big|h''_M(0) - \frac{2h_{\widetilde M}(\Delta)}{\Delta^2}\Big|\le A\varepsilon\Delta^{-2} + B\Delta^2$$
and, in particular, if $\Delta = \varepsilon^{1/4}$,
$$\Big|h''_M(0) - \frac{2h_{\widetilde M}(\Delta)}{\Delta^2}\Big|\le(A+B)\varepsilon^{1/2}.$$

• If $k = 3$,
$$\Big|h''_M(0) - \frac{2h_{\widetilde M}(\Delta)}{\Delta^2}\Big|\le A\varepsilon\Delta^{-2} + B\Delta$$
and, in particular, if $\Delta = \varepsilon^{1/3}$,
$$\Big|h''_M(0) - \frac{2h_{\widetilde M}(\Delta)}{\Delta^2}\Big|\le(A+B)\varepsilon^{1/3},$$

where the constants $A$ and $B$ depend only on $R_{\min},\mathbf{L}$.

Proof. Set $\kappa = h''_M(0)$ and $\tilde\kappa = 2h_{\widetilde M}(\Delta)/\Delta^2$. Comparing $M$ to $\widetilde M$, we obtain from stability that

$$\frac{2\big(h_M(\Delta-\varepsilon) - 2\varepsilon\big)}{\Delta^2}\le\tilde\kappa\le\frac{2\big(h_M(\Delta+\varepsilon) + 2\varepsilon\big)}{\Delta^2}.$$

In the case $k\ge 4$, Proposition 4.3 states that $\big|h_M(t) - \frac{\kappa}{2}t^2\big|\le Ct^4$ for some constant $C$ depending only on $R_{\min},\mathbf{L}$. It follows that

$$\frac{\kappa(\Delta-\varepsilon)^2 - 2C(\Delta-\varepsilon)^4 - 4\varepsilon}{\Delta^2}\le\tilde\kappa\le\frac{\kappa(\Delta+\varepsilon)^2 + 2C(\Delta+\varepsilon)^4 + 4\varepsilon}{\Delta^2}.$$

Expanding and using that $\varepsilon, \Delta < 1$, we obtain

$$|\kappa - \tilde\kappa|\le 2C\Delta^2 + (3\kappa + 30C + 4)\varepsilon\Delta^{-2}.$$

Similarly, in the case $k = 3$, we obtain

$$|\kappa - \tilde\kappa|\le 2C'\Delta + (3\kappa + 14C' + 4)\varepsilon\Delta^{-2},$$

where $C'$ is again a constant depending only on $R_{\min},\mathbf{L}$. Since $\kappa\le 1/R_{\min}$, the constants may be chosen to be $A = \max\{3/R_{\min} + 30C + 4,\ 3/R_{\min} + 14C' + 4\}$ and $B = \max\{2C, 2C'\}$. They depend only on $R_{\min},\mathbf{L}$. Now set $\Delta = \varepsilon^p$ and seek the $p$ yielding the fastest rate of convergence of the error bound to zero.
Since the exponent $1 - 2p$ of the first term decreases with $p$ while that of the second ($2p$ when $k\ge 4$, $p$ when $k = 3$) increases, the fastest rate is obtained by requiring the two exponents to be equal, so that $p = 1/4$ for $k\ge 4$ and $p = 1/3$ for $k = 3$. ∎

At the weak feature size the convexity defect function satisfies $h_M(t) = t$. The stability given by item 3 of the properties of $h_X$ given after Definition 4.1 guarantees that the graph of $h_{\widetilde M}$ lies close to that of $h_M$, but this alone cannot be used to approximate the first intersection of the graph of $h_M$ with the diagonal. The graph of $h_M$ could approach the diagonal very slowly before intersecting it, so that the error in approximating an intersection time based on the graph of $h_{\widetilde M}$ is not necessarily small. However, we are only interested in approximating the weak feature size if it yields the reach, i.e. when $R_{\mathrm{wfs}} < R_\ell$. Corollary 4.5 guarantees the existence of a discontinuity in $h_M$ at $R_{\mathrm{wfs}}$; in this case the function $h_M$ must jump at $R_{\mathrm{wfs}}$ from being bounded above by a quarter circle of radius $R_\ell$ to intersecting the diagonal. This feature makes it possible to bound the error in an approximation. We begin with a simple lemma.

Lemma 5.2. Fix $R > 0$. Let the intersection points of the line $y = x - 6\varepsilon$ and the quarter-circle $y = R - \sqrt{R^2 - x^2}$ be $(x_0, y_0)$ and $(x_1, y_1)$. Then there is some $\varepsilon_0$, which depends only on $R$, such that for $0 < \varepsilon < \varepsilon_0$ the bounds $x_0\le\frac{25}{4}\varepsilon$ and $x_1\ge R - \frac{\varepsilon}{4}$ hold.

Proof. The equation $x - 6\varepsilon = R - \sqrt{R^2 - x^2}$ can be rearranged to give the quadratic

$$2x^2 - (2R + 12\varepsilon)x + (36\varepsilon + 12R)\varepsilon = 0$$

with solutions

$$x = \frac{2R + 12\varepsilon\pm\sqrt{(2R - 12\varepsilon)^2 - 288\varepsilon^2}}{4}.$$

For sufficiently small values of $\varepsilon$, we have the bound

$$2R - 13\varepsilon\le 2R - 12\varepsilon - \frac{288\varepsilon^2}{2R - 12\varepsilon}\le\sqrt{(2R - 12\varepsilon)^2 - 288\varepsilon^2},$$

the second inequality being $a - b/a\le\sqrt{a^2 - b}$ for $0\le b\le a^2$, so that the solutions $x_0$ and $x_1$ are bounded by

$$x_0\le\frac{2R + 12\varepsilon - (2R - 13\varepsilon)}{4} = \frac{25}{4}\varepsilon,\qquad x_1\ge\frac{2R + 12\varepsilon + (2R - 13\varepsilon)}{4} = R - \frac{\varepsilon}{4}.$$
∎

It is clear from the proof that for any $\delta > 0$ there is an $\varepsilon_0 > 0$ such that the bounds can be taken to be $(6+\delta)\varepsilon$ and $R - \delta\varepsilon$. It is sufficient to proceed with $\delta = 1/4$, and so we will do so.

Proposition 5.3. Let $M$ be such that $R(M)\ge R_{\min}$ and let $\varepsilon < \frac29 R_{\min}$ be a positive number small enough that the conclusion of Lemma 5.2 holds for $R = R_{\min}$. Let $\widetilde M\subseteq\mathbb{R}^D$ be such that $\mathrm{H}(M,\widetilde M) < \varepsilon$. Now suppose further that $M$ is such that $R_\ell - R_{\mathrm{wfs}} > \frac94\varepsilon$. Then the value

$$r = \inf\Big\{t > \tfrac{22}{4}\varepsilon : h_{\widetilde M}(t)\ge t - 3\varepsilon\Big\}$$

satisfies the bound $|R_{\mathrm{wfs}} - r|\le\varepsilon$.

Proof. We first claim that $r\le R_{\mathrm{wfs}} + \varepsilon$. To see this, suppose that $R_{\mathrm{wfs}} + \varepsilon < r$. Then, by the definition of $r$, either $R_{\mathrm{wfs}} + \varepsilon < \frac{22}{4}\varepsilon$, which by the assumption on $\varepsilon$ cannot happen, or $h_{\widetilde M}(R_{\mathrm{wfs}} + \varepsilon) < R_{\mathrm{wfs}} - 2\varepsilon$, in which case

$$R_{\mathrm{wfs}} = h_M(R_{\mathrm{wfs}})\le h_{\widetilde M}(R_{\mathrm{wfs}} + \varepsilon) + 2\varepsilon < R_{\mathrm{wfs}},$$

which is a contradiction.

Now let us seek a lower bound for $r$, which relies on the fact that $R = R_{\mathrm{wfs}}$. Note that

$$h_M(r + \varepsilon)\ge h_{\widetilde M}(r) - 2\varepsilon\ge r - 5\varepsilon.$$

If the additional inequality

$$r - 5\varepsilon > R_\ell - \sqrt{R_\ell^2 - (r+\varepsilon)^2}$$

holds, so that $h_M(r+\varepsilon) > R_\ell - \sqrt{R_\ell^2 - (r+\varepsilon)^2}$, then by Proposition 4.4 we would have $r + \varepsilon\ge R = R_{\mathrm{wfs}}$, providing the required lower bound $r\ge R_{\mathrm{wfs}} - \varepsilon$ and completing the proof. By Lemma 5.2, this additional inequality holds whenever

$$\tfrac{25}{4}\varepsilon\le r + \varepsilon\le R_\ell - \tfrac{\varepsilon}{4}.$$

The first bound is true by the definition of $r$. The second follows from the upper bound for $r$ and the gap between $R_{\mathrm{wfs}}$ and $R_\ell$: $r\le R_{\mathrm{wfs}} + \varepsilon\le R_\ell - \frac54\varepsilon$. ∎

6. MINIMAX RATES FOR REACH ESTIMATORS: UPPER BOUNDS

Every submanifold has a natural uniform probability distribution given by its volume measure. We consider probability distributions with density bounded above and below with respect to this volume measure.
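As a concrete instance of such a distribution, consider the uniform law on a round $d$-sphere of radius $r$, the support used in the proof of Theorem 7.1. Its density with respect to the volume measure is the constant $(r^d s_d)^{-1}$, where $s_d = 2\pi^{(d+1)/2}/\Gamma\big(\frac{d+1}{2}\big)$ is the volume of the unit $d$-sphere. The Python sketch below assumes these textbook facts and samples the law by normalising standard Gaussian vectors, a standard trick that is not part of the estimation procedure of this paper; the helper names are hypothetical.

```python
import math
import random

def unit_sphere_volume(d):
    """d-dimensional volume s_d of the unit sphere S^d in R^(d+1)."""
    return 2 * math.pi ** ((d + 1) / 2) / math.gamma((d + 1) / 2)

def uniform_density(d, r):
    """Constant density of the uniform law on a d-sphere of radius r w.r.t. its volume measure."""
    return 1.0 / (r**d * unit_sphere_volume(d))

def sample_sphere(n, d, r, center=None, seed=0):
    """Draw n points uniformly from the d-sphere of radius r in R^(d+1)."""
    rng = random.Random(seed)
    center = center or [0.0] * (d + 1)
    points = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d + 1)]
        norm = math.sqrt(sum(x * x for x in g))
        # A normalised standard Gaussian vector is uniformly distributed on the unit sphere.
        points.append([c + r * x / norm for c, x in zip(center, g)])
    return points

d, r = 2, 3.0
# Sphere centred at -r e_{d+1}, as in the proof of Theorem 7.1.
pts = sample_sphere(1000, d, r, center=[0.0, 0.0, -r])
print(uniform_density(d, r))  # equals 1 / (4 * pi * 9) for the 2-sphere of radius 3
```

Such a law has density bounded above and below by the same constant, so it belongs to a class of the form $\mathcal{P}^k_{r,\mathbf{L}^*}(a^*, a^*)$ with $a^* = (r^ds_d)^{-1}$.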
Recall the class of manifolds $\mathcal{C}^k_{R_{\min},\mathbf{L}}$ studied in [AL19]: $d$-dimensional compact, connected submanifolds of $\mathbb{R}^D$ with a lower bound on the reach and admitting a local parametrization with bounded terms in the Taylor expansion (see Definition 3.1).

Definition 6.1. For $k\ge 3$, $R_{\min} > 0$, $\mathbf{L} = (L_\perp, L_3, \dots, L_k)$ and $0 < f_{\min}\le f_{\max} < \infty$, we let $\mathcal{P}^k_{R_{\min},\mathbf{L}}(f_{\min}, f_{\max})$ denote the set of distributions $P$ supported on some $M\in\mathcal{C}^k_{R_{\min},\mathbf{L}}$ which are absolutely continuous with respect to the volume measure $\mu_M$, with density $f$ taking values $\mu_M$-a.s. in $[f_{\min}, f_{\max}]$. This will be abbreviated by $\mathcal{P}^k$ where there is no ambiguity. We define the submodels $\mathcal{P}^k_\alpha$ to be those distributions supported on elements of $\mathcal{M}^k_\alpha$ (the classes defined in Section 3). These submodels are such that $\mathcal{P}^k = \bigcup_{\alpha\ge 0}\mathcal{P}^k_\alpha$.

The following lemma shows that the uniform lower bound $f_{\min}$ on the density of elements of $\mathcal{P}^k$ provides an upper bound $R_{\max}$ for both $R_\ell$ and $R_{\mathrm{wfs}}$, which we will use in our estimators later in the section.

Lemma 6.2. There exists $R_{\max}$ depending on $d, f_{\min}, R_{\min}$ such that, if $P\in\mathcal{P}^k$ has support $M$, then $R_\ell, R_{\mathrm{wfs}}\le R_{\max}$.

Proof. Due to the relationship between curvature and volume, we have, by Point (3) on [Alm86, p. 2], that $R_\ell\le(\mathrm{vol}\,M/\omega_d)^{1/d}\le(f_{\min}\omega_d)^{-1/d}$, where $\omega_d$ is the volume of the $d$-dimensional sphere of radius 1; the second inequality holds because the density of $P$ is at least $f_{\min}$ and integrates to one over $M$, so that $\mathrm{vol}\,M\le 1/f_{\min}$. Furthermore, Aamari and Levrard have shown [AL18, Lemma 2.2] that, for some constant $C$ depending only on dimension,

$$\mathrm{diam}(M)\le C(d)f_{\min}^{-1}R_{\min}^{1-d}.$$

Since $R_{\mathrm{wfs}}\le\frac12\mathrm{diam}(M)$, we have $R_{\mathrm{wfs}}\le\frac12 C(d)f_{\min}^{-1}R_{\min}^{1-d}$. Setting

$$R_{\max} := \max\Big\{(f_{\min}\omega_d)^{-1/d},\ \tfrac12 C(d)f_{\min}^{-1}R_{\min}^{1-d}\Big\},$$

we have the result. ∎

In [AL19] the authors construct an estimator $\widehat M$ out of polynomial patches, from a sample $(X_1, \dots, X_n)$ of random variables with common distribution $P\in\mathcal{P}^k$, supported on a submanifold $M\in\mathcal{C}^k_{R_{\min},\mathbf{L}}$. That estimator has the following convergence property. (Note that the $T^*_i$ referred to below are $i$-linear maps from $T_pM$ to $\mathbb{R}^D$ which are the $i$th order terms in the Taylor expansion of the submanifold discussed in Section 3.)

Theorem 6.3 (Theorem 6 in [AL19]). Let $k\ge 3$. Set

$$\theta = \Big(C_{d,k}\,\frac{\log(n)\,f_{\max}^2}{(n-1)\,f_{\min}^3}\Big)^{1/d}$$

for $C_{d,k}$ large enough. If $n$ is large enough that $0 < \theta\le\frac18\min\{R_{\min}, L_\perp^{-1}\}$ and $\theta^{-1}\ge C_{d,k,R_{\min},\mathbf{L}}\ge\sup_{2\le i\le k}\|T^*_i\|_{\mathrm{op}}$, then with probability at least $1 - 2(1/n)^{k/d}$, we have

$$\mathrm{H}(\widehat M, M)\le C_\star\theta^k$$

for some $C_\star > 0$. In particular, for $n$ large enough,

$$\sup_{P\in\mathcal{P}^k}\mathbb{E}_{P^{\otimes n}}\big[\mathrm{H}(\widehat M, M)\big]\le C\Big(\frac{\log(n)}{n-1}\Big)^{k/d},$$

where $C = C_{d,k,R_{\min},\mathbf{L},f_{\min},f_{\max}}$.

Note that the estimator depends on the value of $\theta\approx n^{-1/d}$ (to within logarithmic terms), which serves as a bandwidth. The convergence rate of this estimator is very close to the currently established lower bound for estimating the reach $R$, which is $n^{-k/d}$; see Theorem 7.1 in Section 7 below.

6.1. Estimating the local reach.

Definition 6.4. We define an estimator for $R_\ell(M)$, the local reach of a submanifold $M$, by

$$\widehat R_\ell = \min\Big\{\Big(\frac{2h_{\widehat M}(\Delta)}{\Delta^2}\Big)^{-1},\ R_{\max}\Big\},$$

where $\widehat M$ is the Aamari–Levrard estimator of $M$ discussed at the beginning of Section 6 above, $\varepsilon = C_\star\theta^k$ as in Theorem 6.3, $\Delta = \varepsilon^{1/3}$ if $k = 3$ or $\Delta = \varepsilon^{1/4}$ if $k\ge 4$, and $R_{\max}$ is as in Lemma 6.2.

Theorem 6.5. Let $k\ge 3$, let $\theta$ be as in Theorem 6.3 and set $\varepsilon = C_\star\theta^k$. Then with probability at least $1 - 2(1/n)^{k/d}$, we have

$$\big|\widehat R_\ell - R_\ell\big|\le C_{d,k,R_{\min},\mathbf{L},f_{\min}}\,\varepsilon^{1/3},$$

and, where $k\ge 4$, the bound improves to $C_{d,k,R_{\min},\mathbf{L},f_{\min}}\,\varepsilon^{1/2}$.
Moreover, for $n$ large enough, we have

$$\sup_{P\in\mathcal{P}^k}\mathbb{E}_{P^{\otimes n}}\big[|\widehat R_\ell - R_\ell|\big]\le C\Big(\frac{\log(n)}{n-1}\Big)^{\frac{k}{3d}},\quad\text{or, for } k\ge 4,\quad C\Big(\frac{\log(n)}{n-1}\Big)^{\frac{k}{2d}},$$

where $C = C_{d,k,R_{\min},\mathbf{L},f_{\min},f_{\max}}$.

A glance at the proof shows that we actually control $|\widehat R_\ell^{-1} - R_\ell^{-1}|$ rather than $|\widehat R_\ell - R_\ell|$. This has no impact, since $R_\ell\le R_{\max}$ is uniformly bounded and we do not seek fine control on $C$. Changing the parametrization $R\mapsto 1/R$ in our statistical problem and estimating $1/R$ instead of $R$ would enable us to remove the projection onto $[0, R_{\max}]$ that we use to define $\widehat R_\ell$.

Proof. By construction, $\widehat R_\ell\le R_{\max}$, and it is also clear that

$$\Big|\frac{1}{\widehat R_\ell} - \frac{1}{R_\ell}\Big|\le\Big|\frac{2h_{\widehat M}(\Delta)}{\Delta^2} - \frac{1}{R_\ell}\Big|.$$

We derive

$$\big|\widehat R_\ell - R_\ell\big| = \widehat R_\ell R_\ell\Big|\frac{1}{\widehat R_\ell} - \frac{1}{R_\ell}\Big|\le R_{\max}^2\Big|\frac{2h_{\widehat M}(\Delta)}{\Delta^2} - \frac{1}{R_\ell}\Big|.$$

The first statement of Theorem 6.5 is then a straightforward consequence of Proposition 5.1 together with Theorem 6.3. Next, we have

$$\begin{aligned}
\mathbb{E}_{P^{\otimes n}}\big[|\widehat R_\ell - R_\ell|\big] &\le C_{d,k,R_{\min},f_{\min},\mathbf{L}}\,\varepsilon^{1/3} + 2R_{\max}\,P^{\otimes n}\big(|\widehat R_\ell - R_\ell| > C_{d,k,R_{\min},f_{\min},\mathbf{L}}\,\varepsilon^{1/3}\big)\\
&\le C_{d,k,R_{\min},f_{\min},\mathbf{L}}\,\varepsilon^{1/3} + 4R_{\max}n^{-k/d}
\end{aligned}$$

thanks to the first part of Theorem 6.5. This term is of order $(\log n/n)^{k/(3d)}$. For $k\ge 4$, we have the improvement to $\varepsilon^{1/2}$ and the order becomes $(\log n/n)^{k/(2d)}$, which establishes the second part of the theorem for all values of $k\ge 3$ and completes the proof. ∎

For $k = 3, 4$, then, the constructed estimator is optimal up to a $\log(n)$ factor, as follows from Theorem 7.1 below.

6.2. Estimating the global reach. By the earlier discussion, it is not possible in general to give a convergence guarantee when estimating the weak feature size, i.e. the first positive critical value of $d_M$. However, in the case where $R = R_{\mathrm{wfs}}$, that is, when $R_{\mathrm{wfs}} < R_\ell$, this is possible.
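The optimality remark for $k = 3, 4$ can be checked mechanically by comparing exponents: Theorem 6.5 (and Theorem 6.8 below) gives rates $(\log n/n)^{k/(3d)}$ for $k = 3$ and $(\log n/n)^{k/(2d)}$ for $k\ge 4$, while Theorem 7.1 gives the lower bound $n^{-(k-2)/d}$ on the whole model $\mathcal{P}^k$. The Python sketch below ignores constants and logarithmic factors and uses hypothetical helper names; it simply tabulates both exponents, which coincide for $k = 3$ and $k = 4$, while for $k\ge 5$ the upper-bound exponent $k/(2d)$ is strictly smaller than $(k-2)/d$.

```python
from fractions import Fraction

def upper_exponent(k, d):
    """Exponent a in the upper bound (log n / n)^a of Theorems 6.5 and 6.8."""
    return Fraction(k, 3 * d) if k == 3 else Fraction(k, 2 * d)

def lower_exponent(k, d):
    """Exponent b in the minimax lower bound n^(-b) over P^k from Theorem 7.1."""
    return Fraction(k - 2, d)

d = 2
for k in range(3, 8):
    a, b = upper_exponent(k, d), lower_exponent(k, d)
    status = "matches up to log factors" if a == b else "gap"
    print(f"k={k}: upper exponent {a}, lower exponent {b}, {status}")
```

Exact rational arithmetic via `fractions.Fraction` avoids spurious floating-point mismatches when testing the exponents for equality.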
Accordingly, we now define an estimator for $R_{\mathrm{wfs}}$ and hence an estimator for the reach itself.

Definition 6.6. We define an estimator for $R_{\mathrm{wfs}}$, the weak feature size of a submanifold $M$, by

$$\widehat R_{\mathrm{wfs}} = \min\Big\{\inf\big\{t\in\mathbb{R} : \tfrac{22}{4}\varepsilon < t,\ h_{\widehat M}(t)\ge t - 3\varepsilon\big\},\ R_{\max}\Big\},$$

where $\widehat M$ is the Aamari–Levrard estimator of $M$ discussed at the beginning of Section 6 above, $\varepsilon = C_\star\theta^k$ as in Theorem 6.3 and $R_{\max}$ is as in Lemma 6.2.

Our estimator for the reach is then the lesser of the two individual estimators.

Definition 6.7. Let $C_\star, \theta$ be as in Theorem 6.3 and set $\varepsilon = C_\star\theta^k$. We define an estimator for $R(M)$, the reach of a submanifold $M$, by

$$\widehat R = \min\big\{\widehat R_{\mathrm{wfs}}, \widehat R_\ell\big\}.$$

Note that we could just as well use $\widehat R_\ell$ in place of $R_{\max}$ to cap the value of $\widehat R_{\mathrm{wfs}}$, since we do not analyse the error in the case $\widehat R_\ell < \widehat R_{\mathrm{wfs}}$. However, Definition 6.6 appears more natural as a stand-alone estimator of $R_{\mathrm{wfs}}$.

Theorem 6.8. Let $k\ge 3$, let $C_\star, \theta$ be as in Theorem 6.3, and set $\varepsilon = C_\star\theta^k$, with $\varepsilon$ such that $\frac{22}{4}\varepsilon < \min(R_{\min}, 1)$, which is always satisfied for large enough $n > 1$. Then with probability at least $1 - 4n^{-k/d}$, we have

$$\big|\widehat R - R\big|\le C_{d,k,R_{\min},\mathbf{L}}\,\varepsilon^{1/3},$$

and, where $k\ge 4$, the bound improves to $C_{d,k,R_{\min},\mathbf{L}}\,\varepsilon^{1/2}$. In particular, for $n$ large enough,

$$\sup_{P\in\mathcal{P}^k}\mathbb{E}_{P^{\otimes n}}\big[|\widehat R - R|\big]\le C\Big(\frac{\log(n)}{n-1}\Big)^{\frac{k}{3d}},\quad\text{or, for } k\ge 4,\quad C\Big(\frac{\log(n)}{n-1}\Big)^{\frac{k}{2d}},$$

where $C = C_{d,k,R_{\min},\mathbf{L},f_{\min},f_{\max}}$.

Proof. We will prove the result in three steps. In Step 1 we provide a bound in the case $\widehat R_\ell < \widehat R_{\mathrm{wfs}}$ which holds with high probability. Then in Step 2 we provide a bound in the complementary case $\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}$. Finally, in Step 3, we combine the two bounds, proving the first statement, and use it to obtain the bound on the expected loss. In the following, we use the letters $C$ and $C'$ to denote positive numbers that do not depend on $n$ and that may vary at each occurrence.

Step 1).
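We have

$$\begin{aligned}
|\widehat R - R|\,\mathbf{1}_{\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}} &= \big|\widehat R_\ell - \min(R_\ell, R_{\mathrm{wfs}})\big|\,\mathbf{1}_{\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}}\\
&\le|\widehat R_\ell - R_\ell| + |\widehat R_\ell - R_{\mathrm{wfs}}|\,\mathbf{1}_{(R_{\mathrm{wfs}} < R_\ell)}\mathbf{1}_{\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}}\\
&\le 2|\widehat R_\ell - R_\ell| + |R_\ell - R_{\mathrm{wfs}}|\,\mathbf{1}_{(R_{\mathrm{wfs}} < R_\ell)}\mathbf{1}_{\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}}
\end{aligned}$$

by the triangle inequality. For $C_1 > 0$, introduce the events

$$\Omega_1 = \big\{|\widehat R_\ell - R_\ell|\le C_1\varepsilon^{1/3}\big\}\quad\text{and}\quad\Omega_2 = \big\{\mathrm{H}(\widehat M, M)\le\varepsilon\big\}.$$

On $\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}$, we have

$$\forall t\in\big(\tfrac{22}{4}\varepsilon, \widehat R_\ell\big]:\ h_{\widehat M}(t) < t - 3\varepsilon;$$

therefore, on $\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}\cap\Omega_1$, we infer that

$$\forall t\in\big(\tfrac{22}{4}\varepsilon, R_\ell - C_1\varepsilon^{1/3}\big]:\ h_{\widehat M}(t) < t - 3\varepsilon.$$

By item 3 of the properties of the convexity defect function given after Definition 4.1, on $\Omega_2$, we have $h_{\widehat M}(t)\ge h_M(t-\varepsilon) - 2\varepsilon$. Putting the last two estimates together, we obtain on $\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}\cap\Omega_1\cap\Omega_2$ the bound

$$\forall t\in\big(\tfrac{22}{4}\varepsilon, R_\ell - C_1\varepsilon^{1/3}\big]:\ h_M(t-\varepsilon) < t - 3\varepsilon + 2\varepsilon,$$

or equivalently

$$\forall t\in\big(\big(\tfrac{22}{4} - 1\big)\varepsilon, R_\ell - C_1\varepsilon^{1/3} - \varepsilon\big]:\ h_M(t) < t.$$

Therefore $h_M(t) < t$ for $0 < t\le R_\ell - C_1\varepsilon^{1/3} - \varepsilon$ (for $0 < t\le\frac{18}{4}\varepsilon$ this already follows from Proposition 4.4, since $\frac{22}{4}\varepsilon < R_{\min}$), and this implies in turn $R_{\mathrm{wfs}}\ge R_\ell - C_1\varepsilon^{1/3} - \varepsilon$. We have thus proved

$$|R_\ell - R_{\mathrm{wfs}}|\,\mathbf{1}_{(R_{\mathrm{wfs}} < R_\ell)}\mathbf{1}_{\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}}\mathbf{1}_{\Omega_1\cap\Omega_2}\le C_1\varepsilon^{1/3} + \varepsilon\le C\varepsilon^{1/3}.$$

Finally

$$|\widehat R - R|\,\mathbf{1}_{\{\widehat R_\ell < \widehat R_{\mathrm{wfs}}\}}\mathbf{1}_{\Omega_1\cap\Omega_2}\le C\varepsilon^{1/3}.$$

Step 2). We have

$$|\widehat R - R|\,\mathbf{1}_{\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}}\le T_1 + T_2 + T_3,$$

with

$$\begin{aligned}
T_1 &= |\widehat R_{\mathrm{wfs}} - R_{\mathrm{wfs}}|\,\mathbf{1}_{(R_{\mathrm{wfs}} + \frac94\varepsilon < R_\ell)}\mathbf{1}_{\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}},\\
T_2 &= |\widehat R_{\mathrm{wfs}} - R_{\mathrm{wfs}}|\,\mathbf{1}_{(R_{\mathrm{wfs}}\le R_\ell\le R_{\mathrm{wfs}} + \frac94\varepsilon)}\mathbf{1}_{\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}},\\
T_3 &= |\widehat R_{\mathrm{wfs}} - R_\ell|\,\mathbf{1}_{(R_\ell < R_{\mathrm{wfs}})}\mathbf{1}_{\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}}.
\end{aligned}$$

By Proposition 5.3, we have $T_1\le\varepsilon$ on $\Omega_2$. We turn to the term $T_2$. We have $h_{\widehat M}(\widehat R_{\mathrm{wfs}})\ge\widehat R_{\mathrm{wfs}} - 3\varepsilon$ on $\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}$ by construction.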
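Thanks to item 3 of the properties of the convexity defect function given after Definition 4.1, we also have

$$h_{\widehat M}(\widehat R_{\mathrm{wfs}})\le h_M(\widehat R_{\mathrm{wfs}} + \varepsilon) + 2\varepsilon\quad\text{on }\Omega_2;$$

therefore $\widehat R_{\mathrm{wfs}} - 5\varepsilon\le h_M(\widehat R_{\mathrm{wfs}} + \varepsilon)$ holds true on $\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}\cap\Omega_2$. Introduce now the event $\Omega_3 = \{\widehat R_{\mathrm{wfs}} + \varepsilon < R_{\mathrm{wfs}}\}$. By Proposition 4.4, it follows that

$$\widehat R_{\mathrm{wfs}} - 5\varepsilon\le R_\ell - \sqrt{R_\ell^2 - (\widehat R_{\mathrm{wfs}} + \varepsilon)^2}$$

on $\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}\cap\Omega_2\cap\Omega_3$. Solving this inequality when $R_\ell\ge\widehat R_{\mathrm{wfs}} + \varepsilon$ yields $\widehat R_{\mathrm{wfs}}\ge R_\ell - C\varepsilon$ for some $C > 0$ that depends on $R_\ell$ only. Otherwise, $R_\ell - \varepsilon\le\widehat R_{\mathrm{wfs}}$ directly. Replacing $C$ by $\max\{1, C\}$, we infer

$$R_\ell - C\varepsilon\le\widehat R_{\mathrm{wfs}}\le\widehat R_\ell\le R_\ell + C_1\varepsilon^{1/3}$$

on $\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}\cap\Omega_1\cap\Omega_2\cap\Omega_3$, hence $|\widehat R_{\mathrm{wfs}} - R_\ell|\le C\varepsilon^{1/3}$ on that event. Combining this estimate with the condition $|R_\ell - R_{\mathrm{wfs}}|\le\frac94\varepsilon$ in the definition of $T_2$ implies $|\widehat R_{\mathrm{wfs}} - R_{\mathrm{wfs}}|\le C\varepsilon^{1/3} + \frac94\varepsilon$. We have thus proved

$$T_2\,\mathbf{1}_{\bigcap_{i=1}^3\Omega_i}\le C\varepsilon^{1/3} + \tfrac94\varepsilon\le C'\varepsilon^{1/3}.$$

On the complementary event $\Omega_3^c = \{\widehat R_{\mathrm{wfs}} + \varepsilon\ge R_{\mathrm{wfs}}\}$, we have, on the one hand, $R_{\mathrm{wfs}} - \widehat R_{\mathrm{wfs}}\le\varepsilon$. But on the other hand, on $\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}\cap\Omega_1$, we have

$$\widehat R_{\mathrm{wfs}} - R_{\mathrm{wfs}}\le\widehat R_\ell - R_{\mathrm{wfs}}\le R_\ell - R_{\mathrm{wfs}} + C_1\varepsilon^{1/3}\le\tfrac94\varepsilon + C_1\varepsilon^{1/3}\le C\varepsilon^{1/3},$$

thanks to the condition $|R_\ell - R_{\mathrm{wfs}}|\le\frac94\varepsilon$ in the definition of $T_2$. Combining these bounds, we obtain

$$T_2\,(1 - \mathbf{1}_{\Omega_3})\,\mathbf{1}_{\Omega_1}\le C\varepsilon^{1/3}.$$

Putting together this estimate and the bound $T_2\,\mathbf{1}_{\bigcap_{i=1}^3\Omega_i}\le C\varepsilon^{1/3}$ established previously, we derive

$$T_2\,\mathbf{1}_{\Omega_1\cap\Omega_2}\le C\varepsilon^{1/3}.$$

We finally turn to the term $T_3$. On $\{\widehat R_{\mathrm{wfs}}\ge R_\ell\}$ intersected with $\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}\cap\Omega_1$, we have

$$0 < R_\ell\le\widehat R_{\mathrm{wfs}}\le\widehat R_\ell\le R_\ell + C_1\varepsilon^{1/3},$$

which yields the estimate $|\widehat R_{\mathrm{wfs}} - R_\ell|\le C_1\varepsilon^{1/3}$ on $\{\widehat R_{\mathrm{wfs}}\ge R_\ell\}\cap\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}\cap\Omega_1$.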
Alternatively, on the complementary event $\{\widehat R_{\mathrm{wfs}} < R_\ell\}$ intersected with $\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}\cap\Omega_2$, we have

$$\widehat R_{\mathrm{wfs}} - 5\varepsilon\le R_\ell - \sqrt{R_\ell^2 - (\widehat R_{\mathrm{wfs}} + \varepsilon)^2}$$

in the same way as for the term $T_2$, provided $\widehat R_{\mathrm{wfs}} + \varepsilon < R_\ell$. This implies $\widehat R_{\mathrm{wfs}}\ge R_\ell - C\varepsilon$. Otherwise $\widehat R_{\mathrm{wfs}} + \varepsilon\ge R_\ell$ holds true. In any event, we obtain $-C\varepsilon\le\widehat R_{\mathrm{wfs}} - R_\ell$. Since $\widehat R_{\mathrm{wfs}} - R_\ell\le C_1\varepsilon^{1/3}$ on $\Omega_1$, we conclude

$$|\widehat R_{\mathrm{wfs}} - R_\ell|\le C\varepsilon + C_1\varepsilon^{1/3}\le C\varepsilon^{1/3}$$

on $\{\widehat R_{\mathrm{wfs}} < R_\ell\}\cap\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}\cap\Omega_1\cap\Omega_2$. Combining these two bounds for $|\widehat R_{\mathrm{wfs}} - R_\ell|$, we finally derive

$$T_3\,\mathbf{1}_{\Omega_1\cap\Omega_2}\le C\varepsilon^{1/3}.$$

Putting together our successive estimates for $T_1$, $T_2$ and $T_3$, we have proved

$$|\widehat R - R|\,\mathbf{1}_{\{\widehat R_\ell\ge\widehat R_{\mathrm{wfs}}\}}\mathbf{1}_{\Omega_1\cap\Omega_2}\le\varepsilon + 2C\varepsilon^{1/3}\le C'\varepsilon^{1/3}.$$

Step 3). Combining Step 1) and Step 2) yields

$$|\widehat R - R|\,\mathbf{1}_{\Omega_1\cap\Omega_2}\le C\varepsilon^{1/3}.$$

By Theorem 6.5, we have $P^{\otimes n}(\Omega_1)\ge 1 - 2n^{-k/d}$ as soon as $C_1\ge C_{d,k,R_{\min},f_{\min},\mathbf{L}}$. By Theorem 6.3, we have $P^{\otimes n}(\Omega_2)\ge 1 - 2n^{-k/d}$. The first estimate in Theorem 6.8 follows for $k\ge 3$. The improvement in the case $k\ge 4$ is obtained in exactly the same way and we omit it. Finally, since $\widehat R$ and $R$ both lie in $[0, R_{\max}]$ (by construction and by Lemma 6.2), integrating, we obtain

$$\mathbb{E}_{P^{\otimes n}}\big[|\widehat R - R|\big]\le C\varepsilon^{1/3} + R_{\max}\big(P^{\otimes n}(\Omega_1^c) + P^{\otimes n}(\Omega_2^c)\big)\le C\varepsilon^{1/3} + 4R_{\max}n^{-k/d}\le C'\varepsilon^{1/3},$$

and the second statement of Theorem 6.8 is proved for $k\ge 3$. The improvement in the case $k\ge 4$ follows in similar fashion. ∎

7. MINIMAX RATES FOR REACH ESTIMATORS: LOWER BOUNDS

We fix $R_{\min}, \mathbf{L}, k, f_{\min}$ and $f_{\max}$ and recall the classes $\mathcal{P}^k_\alpha$ which were defined in Section 6, parametrized by the gap $\alpha\le R_\ell - R_{\mathrm{wfs}}$. These sub-models are such that $\mathcal{P}^k = \bigcup_{\alpha\ge 0}\mathcal{P}^k_\alpha$.

Theorem 7.1.
If $f_{\min}$ is small enough and $f_{\max}$, $L$ are large enough (depending on $R_{\min}$, and on $\alpha$ for the second statement), then we have the following lower bounds for the reach estimation problem:
$$\liminf_{n\to\infty}\ n^{(k-2)/d}\inf_{\hat R}\sup_{P\in\mathcal P^k_0}\mathbb E_{P^{\otimes n}}\big[|\hat R-R|\big]\ge C_0>0$$
and
$$\liminf_{n\to\infty}\ n^{k/d}\inf_{\hat R}\sup_{P\in\mathcal P^k_\alpha}\mathbb E_{P^{\otimes n}}\big[|\hat R-R|\big]\ge C_\alpha>0\qquad\forall\alpha>0,$$
with $C_0$ depending on $R_{\min}$ and $C_\alpha$ depending on $R_{\min}$ and $\alpha$. In particular, the minimax rate on the whole model $\mathcal P^k$ is of order $n^{-(k-2)/d}$.

To prove this theorem, we will make use of Le Cam's Lemma, restated in our context.

Lemma 7.2 (Le Cam Lemma, [Yu97]). For any two $P_1,P_2\in\mathcal P$, where $\mathcal P$ is a model of manifold-supported probability measures, we have
$$\inf_{\hat R}\sup_{P\in\mathcal P}\mathbb E_{P^{\otimes n}}\big[|\hat R-R|\big]\ge\frac12|R_1-R_2|\big(1-\mathrm{TV}(P_1,P_2)\big)^n,$$
where TV denotes the total variation distance between measures and $R_1$ (respectively $R_2$) denotes the reach of the support of $P_1$ (resp. $P_2$).

Therefore, one needs to compute the total variation distance between two given manifold-supported measures. When these measures are uniform over their supports, we have the following convenient formula.

Lemma 7.3. Let $M_1,M_2$ be two compact $d$-dimensional submanifolds of $\mathbb R^D$ and let $P_1,P_2$ be the uniform distributions over $M_1$ and $M_2$. Then
$$\mathrm{TV}(P_1,P_2)=\frac{\mathcal H^d(M_2\setminus M_1)}{\operatorname{vol}M_2}\quad\text{if }\operatorname{vol}M_2\ge\operatorname{vol}M_1,$$
where $\mathcal H^d$ denotes the $d$-dimensional Hausdorff measure on $\mathbb R^D$.

Proof. First note that $P_1$ and $P_2$ are absolutely continuous with respect to $\mathcal H^d$, with densities $\frac1{\operatorname{vol}M_1}\mathbf 1_{M_1}$ and $\frac1{\operatorname{vol}M_2}\mathbf 1_{M_2}$ respectively. Since $\operatorname{vol}M_2\ge\operatorname{vol}M_1$, we have the following chain of equalities:
$$\begin{aligned}
\mathrm{TV}(P_1,P_2)&=\frac12\int\left|\frac{\mathbf 1_{M_1}}{\operatorname{vol}M_1}-\frac{\mathbf 1_{M_2}}{\operatorname{vol}M_2}\right|d\mathcal H^d\\
&=\frac{\mathcal H^d(M_1\setminus M_2)}{2\operatorname{vol}M_1}+\frac{\mathcal H^d(M_2\setminus M_1)}{2\operatorname{vol}M_2}+\frac12\mathcal H^d(M_1\cap M_2)\left(\frac1{\operatorname{vol}M_1}-\frac1{\operatorname{vol}M_2}\right)\\
&=\frac12\left(1+\frac{\mathcal H^d(M_2\setminus M_1)-\mathcal H^d(M_1\cap M_2)}{\operatorname{vol}M_2}\right)\\
&=\frac{\mathcal H^d(M_2\setminus M_1)}{\operatorname{vol}M_2},
\end{aligned}$$
where the third equality uses $\mathcal H^d(M_1\setminus M_2)+\mathcal H^d(M_1\cap M_2)=\operatorname{vol}M_1$, and the last one uses $\mathcal H^d(M_1\cap M_2)=\operatorname{vol}M_2-\mathcal H^d(M_2\setminus M_1)$.
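Counting measure on a finite set is the $0$-dimensional Hausdorff measure, so finite point sets give the $d=0$ instance of Lemma 7.3, where the formula can be checked exactly. The sketch below is our own illustration, with hypothetical helper names `tv_uniform` and `tv_formula`: it computes the total variation distance between two uniform distributions directly from its definition, in exact rational arithmetic, and compares it with $|M_2\setminus M_1|/|M_2|$ for the larger set $M_2$.

```python
from fractions import Fraction


def tv_uniform(a, b):
    """Total variation distance between the uniform laws on finite sets a and b,
    computed from the definition (1/2) * sum_x |p_a(x) - p_b(x)|."""
    a, b = set(a), set(b)
    pa, pb = Fraction(1, len(a)), Fraction(1, len(b))
    return sum(abs((pa if x in a else 0) - (pb if x in b else 0)) for x in a | b) / 2


def tv_formula(a, b):
    """Counting-measure version of Lemma 7.3: |M2 minus M1| / |M2|, M2 being the larger set."""
    m1, m2 = sorted((set(a), set(b)), key=len)
    return Fraction(len(m2 - m1), len(m2))


A = {0, 1, 2}
B = {1, 2, 3, 4}
print(tv_uniform(A, B), tv_formula(A, B))  # 1/2 1/2
```

Swapping the two arguments leaves both values unchanged, as it should since TV is symmetric; the formula only has to be applied with the larger set in the role of $M_2$.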
∎

Before proving Theorem 7.1, we need the following technical result.

Lemma 7.4. Let $\Phi:\mathbb R^d\to\mathbb R$ be a smooth function and let $M=\{(v,\Phi(v))\mid v\in\mathbb R^d\}\subseteq\mathbb R^{d+1}$ be its graph. The second fundamental form of $M$ at the point $x=(v,\Phi(v))\in M$ is given by
$$\mathrm{II}_x(u,w)=\frac{d^2\Phi(v)[\mathrm{pr}(u),\mathrm{pr}(w)]}{\sqrt{1+\|\nabla\Phi(v)\|^2}}$$
for all $u,w\in T_xM$, where $\mathrm{pr}$ is the linear projection onto $\mathbb R^d\subseteq\mathbb R^{d+1}$.

Proof. We define $\Psi:v\in\mathbb R^d\mapsto(v,\Phi(v))\in\mathbb R^{d+1}$, so that $M$ is the image of $\mathbb R^d$ under the embedding $\Psi$. Let $x\in M$ and let $v\in\mathbb R^d$ be such that $x=\Psi(v)$. The tangent space $T_xM$ is given by
$$T_xM=\{d\Psi(v)[h]=(h,\langle h,\nabla\Phi(v)\rangle)\mid h\in\mathbb R^d\},$$
so that a unit normal vector field on $M$ is given by
$$n(x)=\left(\frac{-\nabla\Phi(v)}{\sqrt{1+\|\nabla\Phi(v)\|^2}},\ \frac1{\sqrt{1+\|\nabla\Phi(v)\|^2}}\right)\in\mathbb R^{d+1}.$$
For $u\in T_xM$ with $h=\mathrm{pr}\,u$, we have
$$dn(x)[u]=-\left(\frac{H_\Phi(v)h}{\sqrt{1+\|\nabla\Phi(v)\|^2}},\ 0\right)-\frac{\langle H_\Phi(v)h,\nabla\Phi(v)\rangle}{1+\|\nabla\Phi(v)\|^2}\,n(x),$$
where $H_\Phi$ denotes the Hessian of $\Phi$. Now for $w\in T_xM$ and $\eta=\mathrm{pr}\,w$, since $n(x)$ is orthogonal to $w$, we have
$$\begin{aligned}
\mathrm{II}_x(u,w)=-\langle dn(x)[u],w\rangle&=\left\langle\left(\frac{H_\Phi(v)h}{\sqrt{1+\|\nabla\Phi(v)\|^2}},\ 0\right),\ (\eta,\langle\eta,\nabla\Phi(v)\rangle)\right\rangle\\
&=\left\langle\frac{H_\Phi(v)h}{\sqrt{1+\|\nabla\Phi(v)\|^2}},\ \eta\right\rangle=\frac{d^2\Phi(v)[h,\eta]}{\sqrt{1+\|\nabla\Phi(v)\|^2}},
\end{aligned}$$
concluding the proof. ∎

We are now ready to prove Theorem 7.1.

Proof of Theorem 7.1. Step 1: The case of $\mathcal P^k_0$. Let $M$ be the $d$-dimensional sphere in $\mathbb R^{d+1}$ of radius $r$ centered at $-re_{d+1}$, where $e_{d+1}=(0,\dots,0,1)$. We choose $r$ such that $r\ge2R_{\min}$. Since $M$ is smooth, there exists $L^*\in\mathbb R^{k-2}$ (depending on $r$) such that $M\in\mathcal C^k_{r,L^*}$, and thus the uniform probability $P$ on $M$ is in $\mathcal P^k_{r,L^*}(a^*,a^*)$ (see Definition 6.1) with $a^*=(r^ds_d)^{-1}$, $s_d$ being the volume of the unit $d$-dimensional sphere. Let us now perturb $M$ to $M_\gamma$, as illustrated in Figure 3.
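Lemma 7.4 can be sanity-checked numerically on the sphere $M$ just introduced, which near $0$ is the graph of $v\mapsto\sqrt{r^2-\|v\|^2}-r$. With the upward unit normal used in the proof of Lemma 7.4 (here the outward normal), every tangent vector should satisfy $\mathrm{II}_x(u,u)=-\|u\|^2/r$. The Python sketch below is only an illustration under our own choices ($d=2$, $r=2$, a sample point $v$ and direction $h$, central finite differences for $\nabla\Phi$ and $d^2\Phi$); the helper names `grad_hess`, `second_fundamental_form` and `cap` are hypothetical.

```python
import math


def grad_hess(phi, v, step=1e-4):
    """Central finite-difference gradient and Hessian of phi: R^d -> R at v."""
    d = len(v)

    def at(*moves):
        # Evaluate phi at v shifted by the given (coordinate index, increment) pairs.
        w = list(v)
        for i, s in moves:
            w[i] += s
        return phi(w)

    grad = [(at((i, step)) - at((i, -step))) / (2 * step) for i in range(d)]
    hess = [[(at((i, step), (j, step)) - at((i, step), (j, -step))
              - at((i, -step), (j, step)) + at((i, -step), (j, -step))) / (4 * step * step)
             for j in range(d)] for i in range(d)]
    return grad, hess


def second_fundamental_form(phi, v, h_vec, eta_vec):
    """II_x(u, w) at x = (v, phi(v)) via the formula of Lemma 7.4; the tangent
    vectors enter only through h_vec = pr(u) and eta_vec = pr(w)."""
    grad, hess = grad_hess(phi, v)
    d = len(v)
    quad = sum(h_vec[i] * hess[i][j] * eta_vec[j] for i in range(d) for j in range(d))
    return quad / math.sqrt(1.0 + sum(g * g for g in grad))


r = 2.0  # illustrative radius


def cap(v):
    """Upper cap of the sphere of radius r centred at -r e_{d+1}, as a graph over R^d."""
    return math.sqrt(r * r - sum(x * x for x in v)) - r


v = [0.5, -0.3]      # sample base point (d = 2)
h_vec = [1.0, 0.7]   # projection pr(u) of a sample tangent vector u
grad, _ = grad_hess(cap, v)
# |u|^2 for u = (h, <h, grad phi(v)>) in R^{d+1}
norm_u2 = sum(x * x for x in h_vec) + sum(x * g for x, g in zip(h_vec, grad)) ** 2
ratio = second_fundamental_form(cap, v, h_vec, h_vec) / norm_u2
print(ratio)  # close to -1/r = -0.5
```

Because the sphere is umbilic, the ratio does not depend on the chosen $v$ or $h$; any other sample point inside the cap should give the same value up to finite-difference error.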
Define for any $\gamma>0$
$$\Phi_\gamma:\ \mathbb R^{d+1}\to\mathbb R^{d+1},\qquad z\mapsto z+\gamma^k\Psi(z/\gamma)\,e_{d+1},$$
where $\Psi(z)=\psi(\|z\|)$ and where $\psi:\mathbb R\to\mathbb R$ is a smooth, even, non-trivial, positive map supported on $[-1,1]$, decreasing on $[0,1]$, and with $\psi''(0)<0$. The above map is a global diffeomorphism as soon as $\gamma^{k-1}\|d\Psi\|_{op,\infty}<1$. Moreover, we have $\|d\Phi_\gamma-I_D\|_{op,\infty}=\gamma^{k-1}\|d\Psi\|_{op,\infty}$ and $\|d^j\Phi_\gamma\|_{op,\infty}\le\gamma^{k-j}\|d^j\Psi\|_{op,\infty}$, so that, provided $\|d^k\Psi\|_{op,\infty}$ is chosen small enough (depending on $r$) and $\gamma$ is small enough (depending again on $r$), we can apply Proposition A.5 from the supplementary material of [AL19] to show that the submanifold $M_\gamma=\Phi_\gamma(M)$ is in $\mathcal C^k_{r/2,2L^*}$.

FIGURE 3. The submanifolds $M$ and $M_\gamma$ used in the proof of the first part of the lower bound.

Then we have
$$\operatorname{vol}M_\gamma=\int_{M_\gamma}d\operatorname{vol}_{M_\gamma}(x)=\int_M|\det d\Phi_\gamma(z)|\,d\operatorname{vol}_M(z).$$
Since $\frac12\le|\det d\Phi_\gamma|\le2$ for $\gamma$ small enough (depending on $r$), it follows that $\frac12\operatorname{vol}M\le\operatorname{vol}M_\gamma\le2\operatorname{vol}M$ for the same values of $\gamma$, so that the uniform distribution $P_\gamma$ on $M_\gamma$ is in $\mathcal P^k_{r/2,2L^*}(a^*/2,2a^*)$. If we assume that $2L^*\le L$, $f_{\min}\le a^*/2$ and $2a^*\le f_{\max}$ (which we do from now on), then we immediately have $P\in\mathcal P^k_0$ and $P_\gamma\in\mathcal P^k_0$, provided that $R_{\mathrm{wfs}}(M_\gamma)\ge R_\ell(M_\gamma)$. We claim that the latter inequality holds. Around 0, simple geometric considerations show that $M_\gamma$ can be viewed as the graph of the function
$$\xi_\gamma:\ \mathbb R^d\to\mathbb R,\qquad v\mapsto\sqrt{r^2-\|v\|^2}-r+\gamma^k\psi\!\left(\frac r\gamma\sqrt{2-2\sqrt{1-\|v\|^2/r^2}}\right).$$
Writing $\xi_\gamma(v)=\zeta_\gamma(\|v\|)$ with $\zeta_\gamma:\mathbb R\to\mathbb R$, and noting that the argument of $\psi$ equals $\|v\|/\gamma+O(\|v\|^3)$, a series of computations shows that
$$\zeta_\gamma''(0)=-\frac1r+\gamma^{k-2}\psi''(0).$$
Setting $c=-\psi''(0)>0$ (which depends on $r$), we have, using Lemma 7.4 and the elementary inequality $1/(1+x)\le1-x/2$ for $x\in[0,1]$,
$$R_\ell(M_\gamma)\le\frac1{|\zeta_\gamma''(0)|}=\frac1{\frac1r+c\gamma^{k-2}}\le r-\frac12cr^2\gamma^{k-2}$$
as soon as $cr\gamma^{k-2}\le1$. Now let us turn to the control of $R_{\mathrm{wfs}}(M_\gamma)$.
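The value of $\zeta_\gamma''(0)$ and the resulting bound on $R_\ell(M_\gamma)$ lend themselves to a quick numerical check. The sketch below assumes one specific bump, $\psi(t)=\exp(-1/(1-t^2))$ on $(-1,1)$ and $0$ elsewhere, which is smooth, even, positive on $(-1,1)$, decreasing on $[0,1]$ and satisfies $\psi''(0)=-2/e$; the values $r=2$, $\gamma=1/2$, $k=3$ are arbitrary illustrative choices, not taken from the proof. It compares a central finite difference of the profile $\zeta_\gamma$ at $0$ with $-1/r+\gamma^{k-2}\psi''(0)$, and then checks $1/(1/r+c\gamma^{k-2})\le r-\frac12cr^2\gamma^{k-2}$ for $c=-\psi''(0)$.

```python
import math


def psi(t):
    # Hypothetical bump: smooth, even, positive on (-1, 1), zero outside,
    # decreasing on [0, 1], with psi''(0) = -2/e.
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))


PSI_SECOND_AT_0 = -2.0 / math.e


def zeta(s, r, gamma, k):
    """Profile of xi_gamma: height of M_gamma above a point v with |v| = s."""
    # sqrt(r^2 - s^2) - r, rewritten to avoid cancellation for small s.
    sphere_part = -s * s / (math.sqrt(r * r - s * s) + r)
    # Distance from 0 to the sphere point above v, r * sqrt(2 - 2 sqrt(1 - s^2/r^2)),
    # rewritten with 1 - sqrt(1 - x) = x / (1 + sqrt(1 - x)).
    x = (s / r) ** 2
    dist = r * math.sqrt(2.0 * x / (1.0 + math.sqrt(1.0 - x)))
    return sphere_part + gamma ** k * psi(dist / gamma)


r, gamma, k = 2.0, 0.5, 3
h = 1e-4
fd = (zeta(h, r, gamma, k) - 2.0 * zeta(0.0, r, gamma, k) + zeta(-h, r, gamma, k)) / h ** 2
predicted = -1.0 / r + gamma ** (k - 2) * PSI_SECOND_AT_0
print(fd, predicted)  # both close to -0.8679

c = -PSI_SECOND_AT_0
assert c * r * gamma ** (k - 2) <= 1.0  # condition under which the bound is claimed
local_reach_bound = 1.0 / (1.0 / r + c * gamma ** (k - 2))
print(local_reach_bound <= r - 0.5 * c * r ** 2 * gamma ** (k - 2))  # True
```

Choosing $r\neq1$ matters here: it makes the check sensitive to how the factor $r$ enters the curvature term.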
We will show that the distance between any pair of bottleneck points is bounded below by $2r$. Let $x,y\in M_\gamma$ form a bottleneck pair. First notice that $x$ and $y$ cannot both lie in $B(0,\gamma)$, because $M_\gamma\cap B(0,\gamma)$ can be seen as a graph. If $x,y\in M_\gamma\setminus B(0,\gamma)$, then necessarily $d(x,y)=2r$. If, say, $x\in B(0,\gamma)$ and $y\in M_\gamma\setminus B(0,\gamma)$, then the open segment $(x,y)$ crosses $M$ at a single point $z\in M$. Therefore $d(x,y)=d(x,z)+d(z,y)$. But since $[x,y]$ is normal to $M_\gamma$ at the point $y$, the segment $[z,y]$ is a diameter of $M$, so that $d(z,y)=2r$ and thus $d(x,y)\ge2r$. We have shown that $R_{\mathrm{wfs}}(M_\gamma)\ge r>R_\ell(M_\gamma)$ for $\gamma$ small enough, and thus $M_\gamma\in\mathcal M^k_0$ and $P_\gamma\in\mathcal P^k_0$.

Now, by Lemma 7.3, we have $\mathrm{TV}(P,P_\gamma)=\mathcal H^d(M_\gamma\setminus M)/\operatorname{vol}M_\gamma\le C\gamma^d$ for some constant $C$ depending on $r$. Since $R(M)=r$ and $R(M_\gamma)\le R_\ell(M_\gamma)\le r-\frac12cr^2\gamma^{k-2}$, we have $R(M)-R(M_\gamma)\ge\frac12cr^2\gamma^{k-2}$, and Le Cam's Lemma (Lemma 7.2) yields
$$\inf_{\hat R}\sup_{P\in\mathcal P^k_0}\mathbb E_{P^{\otimes n}}\big[|\hat R-R|\big]\ge\frac14cr^2\gamma^{k-2}\big(1-C\gamma^d\big)^n.$$
Setting $\gamma=(Cn)^{-1/d}$, so that $C\gamma^d=1/n$ and $(1-1/n)^n\ge1/4$ for $n\ge2$, we obtain, for $n$ large enough (depending on $r$),
$$\inf_{\hat R}\sup_{P\in\mathcal P^k_0}\mathbb E_{P^{\otimes n}}\big[|\hat R-R|\big]\ge\frac1{16}cr^2(Cn)^{-(k-2)/d}.$$
Setting $r=2R_{\min}$, the first statement of Theorem 7.1 follows.

Step 2: The case of $\mathcal P^k_\alpha$. We next turn to the second part of the theorem. We fix $\alpha>0$ and construct a manifold $M\in\mathcal C^k$ as follows. We consider the two parallel disks $B(0,2r)\subseteq\mathbb R^d\subseteq\mathbb R^{d+1}$ and $B(2re_{d+1},2r)\subseteq2re_{d+1}+\mathbb R^d\subseteq\mathbb R^{d+1}$, with $r\ge2R_{\min}$, and link them together so that $M$ satisfies the following:
• $M$ is a smooth submanifold of $\mathbb R^{d+1}$,
• $M$ has reach $r$, and $(0,2re_{d+1})$ is a reach-attaining pair,
• $R_\ell(M)\ge r+\alpha$.
See Figure 4 for a schematic picture of such an $M$, visualized with $d=1$.

FIGURE 4.
The submanifolds $M$ and $M_\gamma$ used in the proof of the second part of the lower bound.

Furthermore, there exists $L^*$ (depending on $r$ and $\alpha$) such that $M\in\mathcal C^k_{r,L^*}$ and $P\in\mathcal P^k_{r,L^*}(a^*,a^*)$, where $a^*=1/\operatorname{vol}M$ and $P$ is the uniform probability over $M$. We again consider the map
$$\Phi_\gamma:\ \mathbb R^{d+1}\to\mathbb R^{d+1},\qquad z\mapsto z+\gamma^k\Psi(z/\gamma)\,e_{d+1}.$$
As in the first part of the proof, for $\gamma$ small enough (depending on $\alpha$ and $r$), $M_\gamma=\Phi_\gamma(M)$ is a smooth submanifold in $\mathcal C^k_{r/2,2L^*}$ and the uniform distribution $P_\gamma$ over $M_\gamma$ lies in $\mathcal P^k_{r/2,2L^*}(a^*/2,2a^*)$. Again, assuming that $L\ge2L^*$, $f_{\min}\le a^*/2$ and $f_{\max}\ge2a^*$, we have $P\in\mathcal P^k_\alpha$ and, furthermore, $P_\gamma\in\mathcal P^k_\alpha$, provided that $R_\ell(M_\gamma)\ge R_{\mathrm{wfs}}(M_\gamma)+\alpha$. We claim that the latter inequality holds. Since $\Psi$ is maximal at 0, $(\gamma^k\psi(0)e_{d+1},2re_{d+1})$ is still a bottleneck pair; its length is $2r-\gamma^k\psi(0)$, and thus $R_{\mathrm{wfs}}(M_\gamma)\le r-c\gamma^k$, where we set $c=\psi(0)/2$ (which depends on $\alpha$ and $r$). For the curvature, notice that it is unchanged outside of $B(0,\gamma)$ and that $M_\gamma$ is just the graph of $v\mapsto\gamma^k\Psi(v/\gamma)$ within this ball. Using Lemma 7.4, we thus have $R_\ell(M_\gamma)\ge\min\{r+\alpha,(C\gamma^{k-2})^{-1}\}$, with $C$ depending on $\alpha$ and $r$, so that $R_\ell(M_\gamma)\ge R_{\mathrm{wfs}}(M_\gamma)+\alpha$ for $\gamma$ small enough (depending on $\alpha$ and $r$), and therefore $M_\gamma\in\mathcal M^k_\alpha$ and $P_\gamma\in\mathcal P^k_\alpha$.

Using Lemma 7.3, we have $\mathrm{TV}(P,P_\gamma)=\mathcal H^d(M_\gamma\setminus M)/\operatorname{vol}M_\gamma\le\delta\gamma^d$ for some constant $\delta$ depending on $r$. Applying Le Cam's Lemma (Lemma 7.2) and noticing that $R(M)-R(M_\gamma)\ge c\gamma^k$, we get
$$\inf_{\hat R}\sup_{P\in\mathcal P^k_\alpha}\mathbb E_{P^{\otimes n}}\big[|\hat R-R|\big]\ge\frac12c\gamma^k\big(1-\delta\gamma^d\big)^n.$$
Setting $\gamma=(\delta n)^{-1/d}$, we know that for $n$ large enough (depending on $r$ and $\alpha$),
$$\inf_{\hat R}\sup_{P\in\mathcal P^k_\alpha}\mathbb E_{P^{\otimes n}}\big[|\hat R-R|\big]\ge\frac18c(\delta n)^{-k/d}.$$
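Both parts of the proof end with the same calibration of the bump width: with $\gamma=(\delta n)^{-1/d}$ (or $(Cn)^{-1/d}$ in Step 1) the total variation term becomes $1/n$, and $(1-1/n)^n\ge1/4$ for $n\ge2$, so Le Cam's bound becomes a multiple of $\gamma^a\asymp n^{-a/d}$, with $a=k-2$ in Step 1 and $a=k$ in Step 2. The Python sketch below, with hypothetical function names and unit constants of our own choosing, evaluates the right-hand side of Lemma 7.2 along this calibration and checks it against the floor $\frac18\,g\,(\delta n)^{-a/d}$, where $g$ denotes the constant in a reach gap of the form $g\gamma^a$.

```python
import math


def le_cam_bound(gap, tv, n):
    """Right-hand side of Lemma 7.2: (1/2) * |R_1 - R_2| * (1 - TV)^n."""
    return 0.5 * gap * (1.0 - tv) ** n


def calibrated_bound(n, d, a, gap_const, tv_const):
    """Le Cam bound when |R(M) - R(M_gamma)| = gap_const * gamma**a and
    TV(P, P_gamma) = tv_const * gamma**d, with gamma = (tv_const * n)**(-1/d)."""
    gamma = (tv_const * n) ** (-1.0 / d)
    return le_cam_bound(gap_const * gamma ** a, tv_const * gamma ** d, n)


# Illustrative choices: d = 2, k = 3, unit constants.
d, k = 2, 3
for a in (k - 2, k):                  # a = k - 2 mimics Step 1, a = k mimics Step 2
    for n in (10, 1_000, 100_000):
        bound = calibrated_bound(n, d, a, gap_const=1.0, tv_const=1.0)
        floor = n ** (-a / d) / 8.0   # (1/8) * gap_const * (tv_const * n)^(-a/d)
        print(a, n, bound >= floor, n ** (a / d) * bound, 1 / (2 * math.e))
```

With unit constants the rescaled value $n^{a/d}\times(\text{bound})$ tends to $1/(2e)\approx0.184$, since $(1-1/n)^n\to e^{-1}$; the factor $\frac18$ is the cruder constant that holds for every $n\ge2$.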
Setting $r=2R_{\min}$ yields the result, completing the proof of Theorem 7.1. ∎

Acknowledgments

It is a pleasure to thank the University of Oklahoma and the University of Paris–Dauphine for providing ideal working conditions, and for their support. J. Harvey was supported by a Daphne Jackson Fellowship sponsored by the U.K. Engineering and Physical Sciences Research Council and Swansea University. K. Shankar was supported by the U.S. National Science Foundation during the completion of this work. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors express their gratitude to the referees for the great care and attention shown to the manuscript, which has greatly improved the exposition.

REFERENCES

[AKCMRW19] Aamari, E., J. Kim, F. Chazal, B. Michel, A. Rinaldo, and L. Wasserman (2019). Estimating the reach of a manifold, Electron. J. Stat. 13 (1), 1359–1399.
[AL18] Aamari, E. and C. Levrard (2018). Stability and Minimax Optimality of Tangential Delaunay Complexes for Manifold Reconstruction, Discrete Comput. Geom. 59, 923–971.
[AL19] Aamari, E. and C. Levrard (2019). Nonasymptotic rates for manifold, tangent space and curvature estimation, Ann. Statist. 47 (1), 177–204.
[Alm86] Almgren, F. (1986). Optimal isoperimetric inequalities, Indiana Univ. Math. J. 35 (3), 451–547.
[AL15] Attali, D. and A. Lieutier (2015). Geometry-driven collapses for converting a Čech complex into a triangulation of a nicely triangulable shape, Discrete Comput. Geom. 54 (4), 798–825.
[ALS13] Attali, D., A. Lieutier, and D. Salinas (2013). Vietoris–Rips complexes also provide topologically correct reconstructions of sampled shapes, Comput. Geom. 46 (4), 448–465.
[BRSSW12] Balakrishnan, S., A. Rinaldo, D. Sheehy, A. Singh, and L. Wasserman (2012). Minimax rates for homology inference, in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, pp. 64–72.
[BLW19] Boissonnat, J.-D., A. Lieutier, and M. Wintraecken (2019). The reach, metric distortion, geodesic convexity and the variation of tangent spaces, J. Appl. Comput. Topology 3, 29–58.
[Div20] Divol, V. (2020). Minimax adaptive estimation in manifold inference.
[Car92] do Carmo, M. (1992). Riemannian geometry. Boston, MA: Birkhäuser.
[ET94] Efron, B. and R.J. Tibshirani (1994). An introduction to the Bootstrap. Monograph on Statistics and Probability 57. London: Chapman & Hall.
[Fed59] Federer, H. (1959). Curvature measures, Trans. Amer. Math. Soc. 91, 418–491.
[Gro94] Grove, K. (1994). Critical point theory for distance functions, in Proceedings of Symposia in Pure Mathematics 54 (3), pp. 357–386. Providence, RI: American Mathematical Society.
[GPVW12] Genovese, C., M. Perone-Pacifico, I. Verdinelli, and L. Wasserman (2012). Minimax manifold estimation, J. Mach. Learn. Res. 13, 1263–1291.
[KRW19] Kim, J., A. Rinaldo, and L. Wasserman (2019). Minimax rates for estimating the dimension of a manifold, J. Comput. Geom. 10 (1), 42–95.
[Lyt04] Lytchak, A. (2004). On the geometry of subsets of positive reach, Manuscripta Math. 115, 199–205.
[NSW08] Niyogi, P., S. Smale, and S. Weinberger (2008). Finding the homology of submanifolds with high confidence from random samples, Discrete Comput. Geom. 39 (1), 419–441.
[Yu97] Yu, B. (1997). Assouad, Fano, and Le Cam, in Festschrift for Lucien Le Cam, pp. 423–435. New York, NY: Springer.
CLÉMENT BERENFELD, UNIVERSITÉ PARIS-DAUPHINE PSL, CEREMADE, PLACE DU MARÉCHAL DE LATTRE DE TASSIGNY, 75016 PARIS, FRANCE
Email address: berenfeld@ceremade.dauphine.fr

JOHN HARVEY, DEPARTMENT OF MATHEMATICS, SWANSEA UNIVERSITY, FABIAN WAY, SWANSEA, SA1 8EN, U.K.
Email address: j.m.harvey@swansea.ac.uk

MARC HOFFMANN, UNIVERSITÉ PARIS-DAUPHINE PSL, CEREMADE, PLACE DU MARÉCHAL DE LATTRE DE TASSIGNY, 75016 PARIS, FRANCE
Email address: hoffmann@ceremade.dauphine.fr

KRISHNAN SHANKAR, NATIONAL SCIENCE FOUNDATION, 2415 EISENHOWER AVENUE, ALEXANDRIA, VA 22314, U.S.A.
Email address: Krishnan.Shankar-1@ou.edu
