Tight Lower Bounds for Homology Inference

The homology groups of a manifold are important topological invariants that provide an algebraic summary of the manifold. These groups contain rich topological information, for instance, about the connected components, holes, tunnels and sometimes th…

Authors: Sivaraman Balakrishnan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman

By Sivaraman Balakrishnan, Alessandro Rinaldo, Aarti Singh and Larry Wasserman
School of Computer Science and Statistics Department, Carnegie Mellon University

The homology groups of a manifold are important topological invariants that provide an algebraic summary of the manifold. These groups contain rich topological information, for instance, about the connected components, holes, tunnels and sometimes the dimension of the manifold. In earlier work [1], we have considered the statistical problem of estimating the homology of a manifold from noiseless samples and from noisy samples under several different noise models. We derived upper and lower bounds on the minimax risk for this problem. In this note we revisit the noiseless case. In [1], we used Le Cam's lemma to establish the lower bound (see footnote 1)

$$R_n = \Omega\left(\exp\left(-n\tau^d\right)\right)$$

for $d \geq 1$ and $D > d$. In the noiseless case the upper bound follows from the work of [2], who show that

$$R_n = O\left(\frac{1}{\tau^d}\exp\left(-n\tau^d\right)\right).$$

In this note we use a different construction based on the direct analysis of the likelihood ratio test to show that

$$R_n = \Omega\left(\frac{1}{\tau^d}\exp\left(-n\tau^d\right)\right)$$

as $n \to \infty$, thus establishing rate-optimal asymptotic minimax bounds for the problem. The techniques we use here extend in a straightforward way to the noisy settings considered in [1]. Although we do not consider the extension here, non-asymptotic bounds are also straightforward.

1. Introduction.

Let $M$ be a $d$-dimensional manifold embedded in $\mathbb{R}^D$ where $d \leq D$. The homology groups $H(M)$ of $M$ (see [3]) are an algebraic summary of the properties of $M$. The homology groups of a manifold describe its topological features such as its connected components, holes, tunnels, etc. In this note we study the problem of estimating the homology of a manifold $M$ from a sample $X = \{X_1, \ldots, X_n\}$.
Specifically, we bound the minimax risk

$$R_n \equiv \inf_{\widehat{H}} \sup_{Q \in \mathcal{Q}} Q^n\left(\widehat{H} \neq H(M)\right) \tag{1}$$

where the infimum is over all estimators $\widehat{H}$ of the homology of $M$ and the supremum is over appropriately defined classes of distributions $\mathcal{Q}$ for the sample $X$. Note that $0 \leq R_n \leq 1$, with $R_n = 1$ meaning that the problem is hopeless. Bounding the minimax risk is equivalent to bounding the sample complexity of the best possible estimator, defined by $n(\epsilon) = \min\{n : R_n \leq \epsilon\}$ where $0 < \epsilon < 1$.

Footnote 1. The asymptotic notation in both the upper and lower bounds hides constants that could depend on the dimensions $d$ and $D$.

We assume that the sample $X \subset \mathbb{R}^D$ constitutes a set of observations of an unknown $d$-dimensional manifold $M$, with $d < D$, whose homology we seek to estimate. The distribution of the sample depends on the properties of the manifold $M$ as well as on the distribution of points on $M$. We consider the collection $\mathcal{P} \equiv \mathcal{P}(\mathcal{M}) \equiv \mathcal{P}(\mathcal{M}, a)$ of all probability distributions supported over manifolds $M$ in $\mathcal{M}$ having densities $p$ with respect to the volume form on $M$ uniformly bounded from below by a constant $a > 0$, i.e. $0 < a \leq p(x) < \infty$ for all $x \in M$.

Manifold Assumptions. We assume that the unknown manifold $M$ is a $d$-dimensional smooth compact Riemannian manifold without boundary embedded in the compact set $\mathcal{X} = [0, 1]^D$. We further assume that the volume of the manifold is bounded from above by a constant which can depend on the dimensions $d, D$, i.e. we assume $\mathrm{vol}(M) \leq C_{D,d}$. We will also make the further assumption that $D > d$.

The main regularity condition we impose on $M$ is that its condition number be not too large. The condition number $\kappa(M)$ (see [2]) is $1/\tau$, where $\tau$ is the largest number such that the open normal bundle about $M$ of radius $r$ is imbedded in $\mathbb{R}^D$ for every $r < \tau$.
For $\tau > 0$ let $\mathcal{M} \equiv \mathcal{M}(\tau) = \{M : \kappa(M) \leq 1/\tau\}$ denote the set of all such manifolds with condition number no larger than $1/\tau$. A manifold with small condition number does not come too close to being self-intersecting.

1.1. Lower bounding the minimax risk.

In this note we will lower bound the minimax risk by considering a related testing problem. Before describing the hypotheses we describe the null and alternate manifolds.

The null manifold $M_0$ is a collection of $m$ $d$-spheres of radius $\tau$, denoted $S_1, \ldots, S_m$, with centers on one face of the unit hypercube in $d + 1$ dimensions ($M_0$ is embedded in a space of dimension $D$, which is at least $d + 1$), with spacing $4\tau$ between adjacent centers. It is easy to see that $m = O\left(\frac{1}{(4\tau)^d}\right)$ because the manifold must lie completely in $[0, 1]^D$, and that the manifold has condition number at most $1/\tau$. We will use $m = \Theta\left(\frac{1}{(4\tau)^d}\right)$ in this note. Let $P_0$ denote the uniform distribution on $M_0$.

The alternate manifolds are a collection $\{M_{1i} : i \in \{1, \ldots, m\}\}$, where $M_{1i}$ is $M_0$ with $S_i$ removed. Let $\pi$ denote the uniform distribution on $\{1, \ldots, m\}$, and $P_{1i}$ denote the uniform distribution on $M_{1i}$.

We need to ensure that the density $p$ is lower bounded by a constant. Note that the total $d$-dimensional volume of $M_0$ is $v_d \tau^d m$, and so

$$p(x) \geq \frac{1}{v_d \tau^d m}$$

where $v_d$ is the volume of the $d$-dimensional unit ball. This is $\Omega(1)$ as desired, because $\tau^d m = \Theta(1)$ for our choice of $m$. A similar argument works for $M_{1i}$.

Consider the following testing problem:

$$H_0 : X \sim P_0 \qquad H_1 : X \sim P_{1i} \text{ with } i \sim \pi.$$

A test $T$ is a measurable function of $X$, in particular $T : X \to \{0, 1\}$, and its risk is defined as

$$R^T_n := P_{H_0}(T(X) = 1) + P_{H_1}(T(X) = 0).$$

The relationship between testing and estimation is standard [4].
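The testing risk defined above can be explored numerically. The following is a minimal sketch, not the construction's analysis itself: it assumes that only the index of the sphere each sample point lands on matters (the spheres have equal radius, so under a uniform distribution each available sphere is hit with equal probability), and it uses a simple illustrative test, "reject $H_0$ if some sphere is empty", rather than the likelihood ratio test. The names `draw_labels`, `empty_sphere_test` and `estimate_risk` are hypothetical.

```python
import random


def draw_labels(n, m, removed=None, rng=random):
    """Return the index of the sphere hit by each of n sample points.

    removed=None draws from P_0 (uniform over all m spheres);
    removed=i draws from P_1i (uniform over the m-1 spheres other than S_i).
    Positions within a sphere are ignored (an assumption of this sketch).
    """
    spheres = [k for k in range(m) if k != removed]
    return [rng.choice(spheres) for _ in range(n)]


def empty_sphere_test(labels, m):
    """Illustrative test: return 1 (reject H_0) iff some sphere has no points."""
    return 1 if len(set(labels)) < m else 0


def estimate_risk(test, n, m, trials=2000, seed=0):
    """Monte Carlo estimate of (Type I error, Type II error, their sum)."""
    rng = random.Random(seed)
    # Type I error: reject although the sample comes from P_0.
    type1 = sum(test(draw_labels(n, m, None, rng), m) for _ in range(trials)) / trials
    # Type II error: accept although the sample comes from P_1i with i ~ pi.
    accepted = 0
    for _ in range(trials):
        i = rng.randrange(m)
        accepted += 1 - test(draw_labels(n, m, i, rng), m)
    type2 = accepted / trials
    return type1, type2, type1 + type2


if __name__ == "__main__":
    print(estimate_risk(empty_sphere_test, n=30, m=5))
```

Under the alternative one sphere is always empty, so this particular test never commits a Type II error; its risk is entirely the probability that a sample from $P_0$ misses at least one sphere, which is the coupon collector quantity.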
In our case it is easy to see that the estimation minimax risk of Equation (1) satisfies

$$R^T_n \leq 2 R_n$$

and so it suffices to lower bound $R^T_n$ to obtain a lower bound on $R_n$. This relation is a straightforward consequence of the fact that $H(M_0) \neq H(M_{1i})$ for every $i$ (since they have different numbers of connected components), and so any estimator can be used in the testing problem described.

The optimal test for the hypothesis testing problem described is the likelihood ratio test: $T(X) = 0$ if and only if $L(X) \leq 1$, where

$$L(X) = \frac{L_1(X)}{L_0(X)}$$

and $L_1(X)$ and $L_0(X)$ are the likelihoods of the data under the alternate and null respectively.

1.2. Coupon collector lower bound.

We begin with a theorem from [5].

Lemma 1 (Theorem 3.8 of [5]). Let the random variable $X$ denote the number of trials for collecting each of the $n$ types of coupons. Then for any constant $c \in \mathbb{R}$, and $m = n \log n - cn$,

$$\lim_{n \to \infty} P(X > m) = 1 - \exp(-\exp(c)).$$

2. Main result.

Theorem 2. For any constant $\delta < 1$, we have

$$R_n = \Omega\left(\min\left(\frac{1}{\tau^d}\exp\left(-n\tau^d\right), \delta\right)\right)$$

as $n \to \infty$.

Proof. Notice that since $m = \Theta\left(\frac{1}{(4\tau)^d}\right)$, the theorem is implied by the statement that

$$n = m \log m + m \log\left(\frac{1}{\delta}\right) \implies R_n \geq c\delta$$

for some constant $c$. We will focus on proving this claim.

Let us consider the case when samples are drawn according to $P_0$. Applying Lemma 1 with the $m$ spheres in the role of the coupon types and the $n$ sample points in the role of the trials, we have that if $n = m \log m + m \log\left(\frac{1}{\delta}\right)$ then the probability with which we do not see a point in each of the $m$ spheres is asymptotically

$$1 - \exp(-\exp(-\log(1/\delta))) = 1 - \exp(-\delta) \geq c\delta$$

since $\delta < 1$, for some constant $c$.

It is easy to see that if we do not see a point in each of the $m$ spheres then

$$L(X) \geq \frac{1}{m} \cdot \frac{1}{(1 - 1/m)^n} := T_{m,n}.$$

Indeed, if no sample point falls in $S_i$, then every point has density ratio $m/(m-1)$ under $P_{1i}$ relative to $P_0$, so the $i$-th term of the mixture over $\pi$ contributes $\frac{1}{m}\left(\frac{m}{m-1}\right)^n$ to $L(X)$.

When $n = m \log m + m \log\left(\frac{1}{\delta}\right)$, $T_{m,n} \to \frac{1}{\delta} > 1$, so asymptotically the likelihood ratio test always rejects the null whenever some sphere is missed. From this we can see that the probability of a Type I error is asymptotically at least $c\delta$, and $R^T_n \geq c\delta$, which gives $R_n \geq \frac{c}{2}\delta$ as desired.
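The two asymptotic facts used in the proof can be checked numerically for finite $m$. The sketch below is only an illustration under the same simplification as the construction (a sample is summarized by the sphere index of each point); the function names are hypothetical, and the exact miss probability is computed by inclusion-exclusion rather than taken from Lemma 1. It evaluates $T_{m,n}$ at $n = m \log m + m \log(1/\delta)$, the mixture likelihood ratio $L(X)$, and the probability that a sample from $P_0$ leaves some sphere empty. The concavity bound $1 - e^{-\delta} \geq (1 - e^{-1})\delta$ on $(0, 1]$ gives one admissible value of the constant $c$.

```python
import math


def sample_size(m, delta):
    """n = m log m + m log(1/delta), rounded up to an integer."""
    return math.ceil(m * math.log(m) + m * math.log(1.0 / delta))


def lr_lower_bound(m, n):
    """T_{m,n} = (1/m) * (1 - 1/m)^(-n)."""
    return (1.0 / m) * (1.0 - 1.0 / m) ** (-n)


def likelihood_ratio(labels, m):
    """L(X) for the mixture alternative, from the sphere index of each point.

    Each empty sphere S_i contributes (1/m) * (m/(m-1))^n; occupied spheres
    contribute 0 because P_1i puts no mass on S_i.
    """
    n = len(labels)
    empty = m - len(set(labels))
    return (empty / m) * (m / (m - 1)) ** n


def miss_probability(m, n):
    """Exact P_0(some sphere receives none of the n points), by inclusion-exclusion."""
    return sum(
        (-1) ** (k + 1) * math.comb(m, k) * (1.0 - k / m) ** n
        for k in range(1, m + 1)
    )


if __name__ == "__main__":
    m, delta = 200, 0.5
    n = sample_size(m, delta)
    print("T_{m,n}:", lr_lower_bound(m, n), "vs 1/delta:", 1.0 / delta)
    print("P(miss):", miss_probability(m, n), "vs limit:", 1.0 - math.exp(-delta))
```

For moderate $m$ the finite-sample values are already close to the limits $1/\delta$ and $1 - e^{-\delta}$, which is consistent with the remark that finite-sample versions of the bound are straightforward.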
3. Discussion.

In this note we have established tight minimax rates for the problem of homology inference in the noiseless case. The intuition behind the construction extends to the noisy cases considered in [1] in a straightforward way. Although the bound we have shown is an asymptotic lower bound, a finite sample lower bound follows in a straightforward way by replacing the asymptotic calculation in Lemma 1 with finite sample estimates.

We also expect similar constructions to be useful in establishing tight lower bounds for the problems of manifold estimation in Hausdorff distance considered in [6, 7], and for the problem of estimation of persistence diagrams in bottleneck distance considered in [8].

REFERENCES

[1] Sivaraman Balakrishnan, Alessandro Rinaldo, Don Sheehy, Aarti Singh, and Larry Wasserman. Minimax rates for homology inference. AISTATS, 2012.
[2] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1-3):419–441, 2008.
[3] Allen Hatcher. Algebraic Topology. Cambridge University Press, 2002.
[4] E.L. Lehmann and J.P. Romano. Testing Statistical Hypotheses. Springer Texts in Statistics. Springer, 2005.
[5] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[6] Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Manifold estimation and singular deconvolution under Hausdorff loss. Ann. Statist., 40(2):941–963, 2012.
[7] Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Minimax manifold estimation. Journal of Machine Learning Research, 13:1263–1291, 2012.
[8] Frédéric Chazal, Marc Glisse, Catherine Labruère, and Bertrand Michel. Optimal rates of convergence for persistence diagrams in topological data analysis, 2013.

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
E-mail: sbalakri@cs.cmu.edu
E-mail: aarti@cs.cmu.edu

Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213
E-mail: arinaldo@cmu.edu
E-mail: larry@cmu.edu
