Unsupervised nonparametric detection of unknown objects in noisy images based on percolation theory

Mikhail Langovoy*
Machine Learning and Optimization Laboratory
EPFL, Station 14
Lausanne, CH-1015 Switzerland
e-mail: mikhail.langovoy@epfl.ch

and

Olaf Wittich
Lehrstuhl A für Mathematik
RWTH Aachen, 52056 Aachen
e-mail: olaf.wittich@matha.rwth-aachen.de

and

Laurie Davies
Department of Statistics, University of California at Davis,
Davis CA, 95616-8572, USA
e-mail: laurie.davies@uni-due.de

Abstract: We develop an unsupervised, nonparametric, and scalable statistical learning method for detection of unknown objects in noisy images. The method uses results from percolation theory and random graph theory. We present an algorithm that detects objects of unknown shapes and sizes in the presence of nonparametric noise of unknown level. The noise density is assumed to be unknown and can be very irregular. The algorithm has linear complexity and exponential accuracy and is appropriate for real-time systems. We prove strong consistency and scalability of our method in this setup with minimal assumptions.

Keywords and phrases: Nonparametric learning, unsupervised learning, object detection, image analysis, noisy image, percolation, extreme noise, nonparametric hypothesis testing.

* Corresponding author.

imsart-generic ver. 2007/04/13 file: Detection_Percolation_Version_6.tex date: July 16, 2018
M. Langovoy et al./Unsupervised detection and percolation

1. Introduction

Detection of objects in noisy images is the most basic problem of image analysis. Indeed, when one looks at a noisy image, the first question to ask is whether there is any object hidden behind the noise at all.
This is also a primary question of interest in such diverse fields as, for example, cancer detection (Ricci-Vitiani et al., 2007), automated urban analysis (Negri et al., 2006), detection of cracks in buried pipes (Sinha and Fieguth, 2006), and other possible applications in astronomy, electron microscopy and neurology. Moreover, if there is just random noise in the picture, it doesn't make sense to run computationally intensive procedures for image reconstruction for this particular picture. This is especially relevant in modern-day applications to Internet data and in automated image processing systems, where one has to mine billions of images under time constraints. Surprisingly, the vast majority of image analysis methods, both in statistics and in engineering, avoid the pure detection problem and start immediately with the more challenging and computationally intensive task of image reconstruction.

1.1. Related work

As pixels in digital images can be viewed as network nodes with attributes, many applications from image processing, such as road tracking (Geman and Jedynak, 1996) or medical tumor detection (McInerney and Terzopoulos, 1996), can be treated within the framework of community detection in networks. More specifically, these setups correspond to detection of communities hidden in large networks with noisy node attributes, where one decides on the existence of communities using both the network topology and the network's content represented by node attributes (see, e.g., (Ruan et al., 2013) or (Yang et al., 2013) for related types of setups). We are only concerned with the existence of an object in the image, and not with estimating the object. We also rely heavily on the fact that images are processed on computers only in a discretized form.
For this problem and setup, a new line of research recently emerged in which discrete probability methods of statistical physics were applied to unsupervised community detection in discrete structures such as pixelized images or lattices (Langovoy and Wittich, 2009), (Langovoy and Wittich, 2013a), (Langovoy and Wittich, 2013b), (Arias-Castro and Grimmett, 2013). The idea of applying percolation theory to the study of hidden communities in networks, combined with variations of k-NN scans, proved useful in application areas like anomaly detection and automated detection of unknown objects in extremely noisy images (Langovoy et al., 2011b), (Langovoy et al., 2011a).

However, in (Arias-Castro and Grimmett, 2013) only the case of parametric noise from a one-parameter exponential family was considered, while the present paper deals with the general case of nonparametric noise. Our bulk condition on the object interior is also more general than the conditions on cluster sizes in (Arias-Castro and Grimmett, 2013). The algorithm in the present paper has linear computational complexity, irrespective of the shape of the object. Papers (Langovoy and Wittich, 2009) and (Langovoy and Wittich, 2013a) treated different kinds of underlying lattices and had to resort to the case of nonparametric noise of bounded level, while (Langovoy and Wittich, 2013b) had to limit the class of possible noise distributions via additional smoothness assumptions. In this paper, we establish a strong form of consistent detection for a much wider class of noise distributions, and completely remove smoothness assumptions on the noise.

1.2. Main contributions

From the statistical point of view, we treat the object detection problem as a nonparametric hypothesis testing problem within the class of statistical inverse problems on networks. We assume that the noise density is completely unknown, and that it is not necessarily smooth or continuous.

In this paper, we propose an algorithmic solution for this nonparametric hypothesis testing problem. We prove that our algorithm has linear complexity in terms of the number of pixels on the screen, and that this procedure is not only asymptotically consistent, but also has accuracy that grows exponentially with the "number of pixels" in the object of detection. The algorithm has a built-in data-driven stopping rule, so there is no need for human assistance to stop the algorithm at an appropriate step.

The crucial difference of our method is that we do not impose any shape or smoothness assumptions on the boundary of the object. This permits the detection of nonsmooth, irregular or disconnected objects in noisy images, under very mild assumptions on the object's interior. This is especially suitable, for example, if one has to detect a highly irregular non-convex object in a noisy image. This is usually the case, for example, in the aforementioned fields of automated urban analysis, cancer detection and detection of cracks in materials. Although our detection procedure works for regular images as well, it is precisely the class of irregular images with unknown shape where our method can be very advantageous.

1.3. Outline

The paper is organized as follows. Our statistical model is described in detail in Section 2.
The proper type of thresholding for noisy images is crucial in our method and allows us to apply percolation theory to our learning task. Both thresholding and percolation are described in Section 3. An algorithm for object detection is presented in Section 4. Theorem 1 establishes the strong consistency and scalability of the method in the setup with minimal assumptions on both the noise and the object of interest. An example illustrating possible applications of our method is given in Section 5. Section 6 is devoted to the proof of the main theorem.

2. Statistical model

Assume we observe a noisy digital image on a screen of $N \times N$ pixels. Object detection and image reconstruction for noisy images are two of the cornerstone problems in image analysis. In this paper, we develop an efficient, scalable and robust technique for quick detection of objects in noisy images.

In the present paper we are interested in detection of objects that have a known colour. This colour has to be different from the colour of the background. Mathematically, this is equivalent to assuming that the true (non-noisy) images are black-and-white, where the object of interest is black and the background is white.

Without loss of generality, we are free to assume that all the pixels that belong to the meaningful object within the digitalized image have the value 1 attached to them. We call this value a black colour. Additionally, assume that the value 0 is attached to those and only those pixels that do not belong to the object in the non-noisy image. If the number 0 is attached to a pixel, we call this pixel white.

It is also assumed that on each pixel we have random noise that has the unknown distribution function $F$; the noise at each pixel is completely independent from the noise on other pixels.
It is important that we consider the case of fully nonparametric noise of unknown level and with an unknown distribution.

More formally, we have an $N \times N$ array of observations, i.e. we observe $N^2$ real numbers $\{Y_{ij}\}_{i,j=1}^{N}$. Denote the true value on the pixel $(i,j)$, $1 \le i,j \le N$, by $Im_{ij}$, and the corresponding noise by $\sigma\varepsilon_{ij}$. According to the above,

$$Y_{ij} = Im_{ij} + \sigma\varepsilon_{ij}, \tag{1}$$

where $1 \le i,j \le N$, the $\{\varepsilon_{ij}\}$, $1 \le i,j \le N$, are i.i.d., and

$$Im_{ij} = \begin{cases} 1, & \text{if } (i,j) \text{ belongs to the object;} \\ 0, & \text{if } (i,j) \text{ does not belong to the object.} \end{cases} \tag{2}$$

To stress the dependence on the noise level $\sigma$, we write our assumption on the noise in the following way:

$$\varepsilon_{ij} \sim F, \quad E\,\varepsilon_{ij} = 0, \quad \mathrm{Var}\,\varepsilon_{ij} = 1. \tag{3}$$

Throughout this paper we will additionally assume that the following non-degeneracy assumption holds.

$\langle A \rangle$
$$F(t) \equiv C \text{ for all } t \in (a,b) \;\Rightarrow\; b - a < 1. \tag{4}$$

The null hypothesis is $H_0$: $Im_{ij} = 0$ for all $i,j$. The alternative hypothesis is $H_1$: $Im_{ij} \neq 0$ for some $i,j$.

Now we can proceed to preliminary quantitative estimates. If a pixel $(i,j)$ is white in the original image, let us denote the corresponding probability distribution of $Y_{ij}$ by $P_0$. For a black pixel $(i,j)$ we denote the corresponding distribution of $Y_{ij}$ by $P_1$. We are free to omit the dependency of $P_0$ and $P_1$ on $i$ and $j$ in our notation, since all the noises are independent and identically distributed.

The following simple observation will be used to link community detection in triangular networks and percolation theory.

Proposition 1. If $\langle A \rangle$ holds and the noise distribution is symmetric, then

$$P_0(Y_{ij} \ge 1/2) < 1/2, \tag{5}$$
$$1/2 < P_1(Y_{ij} \ge 1/2). \tag{6}$$

Proof. (Proposition 1) Since the noise is symmetric, assumption $\langle A \rangle$ yields

$$P_1(Y_{ij} \ge 1/2) = P(\varepsilon + 1 > 1/2) = P(\varepsilon > -1/2) = P(\varepsilon < 1/2) > P(\varepsilon < 0) = 1/2.$$

For the other part, we have in view of the previous calculation

$$P_0(Y_{ij} \ge 1/2) = P(\varepsilon \ge 1/2) = 1 - P(\varepsilon < 1/2) < 1/2.$$

This completes the proof.

3. Thresholding and percolation on triangular lattices

As was shown in (Langovoy and Wittich, 2013a), (Langovoy and Wittich, 2013b), (Arias-Castro and Grimmett, 2013), percolation-based detection methods are applicable to more general types of networks than lattices or regular graphs. In this paper, in order to obtain strong stability against a very wide nonparametric class of noise distributions, we decided to stick to graphs with critical probability 0.5, of which the triangular lattice is the most natural example.

3.1. Thresholding

Now we are ready to describe one of the main ingredients of our method: the thresholding. The idea of the thresholding is as follows: in the noisy grayscale image $\{Y_{ij}\}_{i,j=1}^{N}$, we pick some pixels that look as if their real colour was black. Then we colour all those pixels black, irrespective of the exact value that was observed on them. We take into account the intensity observed at those pixels only once, at the beginning of our procedure. The idea is to think that some pixel "seems to have a black colour" when it is not very likely to obtain the observed grey value when adding a "reasonable" noise to a white pixel.

We colour white all the pixels that weren't coloured black at the previous step. At the end of this procedure, we would have a transformed vector of 0's and 1's, call it $\{\overline{Y}_{i,j}\}_{i,j=1}^{N}$.
We will be able to analyse this transformed picture by using certain results from the mathematical theory of percolation.

Let us fix, for each $N$, a real number $\alpha_0(N) > 0$, $\alpha_0(N) \le 1$, such that there exists $\theta(N) \in \mathbb{R}$ satisfying the following condition:

$$P_0(Y_{ij} \ge \theta(N)) \le \alpha_0(N). \tag{7}$$

In this paper we will always pick $\alpha_0(N) \equiv \alpha_0$ for all $N \in \mathbb{N}$, for some constant $\alpha_0 > 0$.

As a first step, we transform the observed noisy image $\{Y_{i,j}\}_{i,j=1}^{N}$ in the following way: for all $1 \le i,j \le N$,

1. If $Y_{ij} \ge \theta(N)$, set $\overline{Y}_{ij} := 1$ (i.e., in the transformed picture the corresponding pixel is coloured black).
2. If $Y_{ij} < \theta(N)$, set $\overline{Y}_{ij} := 0$ (i.e., in the transformed picture the corresponding pixel is coloured white).

Definition 1. The above transformation is called thresholding at the level $\theta(N)$. The resulting array $\{\overline{Y}_{i,j}\}_{i,j=1}^{N}$ of $N^2$ values (0's and 1's) is called a thresholded picture.

3.2. Percolation

One can think of pixels from $\{\overline{Y}_{i,j}\}_{i,j=1}^{N}$ as vertices of a planar graph. Let us colour these $N^2$ vertices with the same colours as the corresponding pixels. We obtain a graph $G_N$ with $N^2$ black or white vertices and (so far) no edges.

We add edges to $G_N$ in the following way. If any two black vertices are "neighbours" (in a way to be specified below), we connect these two vertices with a black edge. If any two white vertices are neighbours, we connect them with a white edge. We will not add any edges between non-neighbouring points, and we will not connect vertices of different colours to each other.

It is crucial how one defines neighbourhoods for vertices of $G_N$: different definitions can lead to testing procedures with very different properties.
The first and very natural way is to view $G_N$ as an $N \times N$ square subset of the $\mathbb{Z}^2$ lattice. The method works in this case, see (Langovoy and Wittich, 2009) and (Langovoy and Wittich, 2013a). However, it turns out that the method becomes especially robust when we view our black and white pixelized picture as a collection of black and white clusters on an $N \times N$ subset of the triangular lattice $\mathbb{T}^2$ (obtained from the $\mathbb{Z}^2$ lattice by adding a diagonal of the same orientation to every square on the lattice). In the present paper, we will work exclusively with triangular lattices.

We perform $\theta(N)$-thresholding of the noisy image $\{Y_{i,j}\}_{i,j=1}^{N}$ using a very special value of $\theta(N)$. Our goal is to choose $\theta(N)$ (and the corresponding $\alpha_0(N)$, see (7)) such that:

$$P_0(Y_{ij} \ge \theta(N)) < p_c^{site}, \tag{8}$$
$$p_c^{site} < P_1(Y_{ij} \ge \theta(N)), \tag{9}$$

where $p_c^{site}$ is the critical probability for site percolation on $\mathbb{T}^2$ (see (Grimmett, 1999), (Kesten, 1982)).

Since $G_N$ is random, we actually observe the so-called site percolation on black vertices within the subset of $\mathbb{T}^2$. From this point, we can use results from percolation theory to predict formation of black and white clusters on $G_N$, as well as to estimate the number of clusters and their sizes and shapes. Relations (8) and (9) are crucial here.

To explain this more formally, let us split the set of vertices $V_N$ of the graph $G_N$ into two groups: $V_N = V_N^{im} \cup V_N^{out}$, where $V_N^{im} \cap V_N^{out} = \emptyset$, and $V_N^{im}$ consists of those and only those vertices that correspond to pixels belonging to the original object, while $V_N^{out}$ is left for the pixels from the background. Denote by $G_N^{im}$ the subgraph of $G_N$ with vertex set $V_N^{im}$, and by $G_N^{out}$ the subgraph of $G_N$ with vertex set $V_N^{out}$.
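The thresholding of Definition 1 and the triangular-lattice neighbourhood can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `threshold_image` and `triangular_neighbours` are hypothetical, and the choice of which diagonal is added to each unit square (here the one joining $(i,j)$ to $(i+1,j+1)$) is an assumption made for the sketch.

```python
def threshold_image(noisy, theta=0.5):
    """Return the thresholded picture: 1 (black) where Y_ij >= theta, else 0 (white)."""
    return [[1 if value >= theta else 0 for value in row] for row in noisy]


def triangular_neighbours(i, j, n):
    """Yield the neighbours of pixel (i, j) in the n x n subset of the triangular lattice.

    The lattice is Z^2 plus one diagonal per unit square; the orientation
    (i, j) -- (i + 1, j + 1) is an assumption of this sketch.
    """
    offsets = ((-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (1, 1))
    for di, dj in offsets:
        a, b = i + di, j + dj
        if 0 <= a < n and 0 <= b < n:
            yield a, b


# A small hand-made "noisy image" thresholded at theta = 1/2.
noisy = [[0.7, -0.2, 0.5],
         [1.3, 0.1, 0.49],
         [-0.6, 2.0, 0.9]]
picture = threshold_image(noisy)
```

An interior pixel has six neighbours in this embedding, against four on the square lattice, which is what makes the site percolation threshold equal to 1/2.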
If (8) and (9) are satisfied, we will observe a so-called supercritical percolation of black clusters on $G_N^{im}$, and a subcritical percolation of black clusters on $G_N^{out}$. Without going into much detail on percolation theory (the necessary introduction can be found in (Grimmett, 1999) or (Kesten, 1982)), we mention that there will be a high probability of forming relatively large black clusters on $G_N^{im}$, but there will be only small and scarce black clusters on $G_N^{out}$. The difference between the two regions will be striking, and this is the main component in our image analysis method.

In this paper, mathematical percolation theory will be used to derive quantitative results on the behaviour of clusters for both cases. We will apply those results to build efficient randomized algorithms that will be able to detect and estimate the object $\{Im_{i,j}\}_{i,j=1}^{N}$ using the difference in percolation phases on $G_N^{im}$ and $G_N^{out}$.

But when can the key inequalities (8) and (9) be simultaneously satisfied for an appropriate threshold $\theta$? The following important proposition shows that, under very mild conditions, our method is asymptotically consistent for any noise level.

Proposition 2. On the triangular lattice (8) and (9) are always satisfied for $\theta = 1/2$.

Proof. (Proposition 2) For the planar triangular lattice one has $p_c^{site} = 1/2$ (see (Kesten, 1982)). The statement follows from Proposition 1.

Proposition 2 explains the main reason for working with the triangular lattice: for this lattice, the method is asymptotically consistent for any noise level, and the natural threshold $\theta(N) = 1/2$ is always appropriate. As we will see in the following section, this makes our testing procedure applicable in the case of unknown and nonsmooth nonparametric noise.
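The contrast between the two percolation phases described in this section can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: Gaussian noise is used purely for convenience (the method itself does not assume it), the function names `largest_black_cluster` and `simulate_thresholded` are hypothetical, and the diagonal orientation of the lattice is chosen arbitrarily. Both pictures are generated from the same seed, so they share one noise realization and differ only in the presence of the object.

```python
import random


def largest_black_cluster(picture):
    """Size of the largest black cluster, found by an iterative depth-first search."""
    n = len(picture)
    # Z^2 plus one diagonal per unit square (orientation is an assumption of this sketch).
    offsets = ((-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (1, 1))
    seen = [[False] * n for _ in range(n)]
    best = 0
    for si in range(n):
        for sj in range(n):
            if picture[si][sj] != 1 or seen[si][sj]:
                continue
            seen[si][sj] = True
            stack, size = [(si, sj)], 0
            while stack:
                i, j = stack.pop()
                size += 1
                for di, dj in offsets:
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n and picture[a][b] == 1 and not seen[a][b]:
                        seen[a][b] = True
                        stack.append((a, b))
            best = max(best, size)
    return best


def simulate_thresholded(n, sigma, square=None, seed=0):
    """Thresholded picture of Y = Im + sigma * eps with Gaussian eps and theta = 1/2.

    `square` = (top, left, side) marks a black square object; None means pure noise.
    """
    rng = random.Random(seed)
    picture = []
    for i in range(n):
        row = []
        for j in range(n):
            inside = square is not None and (
                square[0] <= i < square[0] + square[2]
                and square[1] <= j < square[1] + square[2]
            )
            y = (1.0 if inside else 0.0) + sigma * rng.gauss(0.0, 1.0)
            row.append(1 if y >= 0.5 else 0)
        picture.append(row)
    return picture


with_object = largest_black_cluster(simulate_thresholded(60, 0.5, square=(15, 15, 30)))
noise_only = largest_black_cluster(simulate_thresholded(60, 0.5))
print(with_object, noise_only)
```

With these illustrative settings, the largest black cluster in the picture containing the object should be far larger than the largest cluster in the pure-noise picture, which is the gap a detection rule based on cluster sizes can exploit.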
4. Object detection

We either observe a blank white screen with accidental noise or there is an actual object in the blurred picture. In this section, we propose an algorithm to decide which of the two possibilities is true. This algorithm is a statistical testing procedure. It is designed to solve the question of testing $H_0$: $Im_{ij} = 0$ for all $1 \le i,j \le N$ versus $H_1$: $Im_{ij} = 1$ for some $i,j$.

Let us choose $\alpha(N) \in (0,1)$, the probability of false detection of an object. More formally, $\alpha(N)$ is the maximal probability that the algorithm finishes its work with the decision that there was an object in the picture, while in fact there was just noise. In statistical terminology, $\alpha(N)$ is the probability of an error of the first kind.

We allow $\alpha$ to depend on $N$; $\alpha(N)$ is connected with the complexity (and expected working time) of our randomized algorithm.

Since in our method it is crucial to observe some kind of percolation in the picture (at least within the image), the image has to be "not too small" in order to be detectable by the algorithm: one can't observe anything percolation-like on just a few pixels. We will use percolation theory to determine how "large" precisely the object has to be in order to be detectable. Some size assumption has to be present in any detection problem, though: for example, it is mathematically hopeless to detect a single point on a very large screen even in the case of moderate noise.

We proceed with the following weak assumption about the object's interior part:

$\langle B \rangle$ The object contains a black square with the side of size at least $\varphi_{im}(N)$ pixels, where

$$\lim_{N \to \infty} \frac{\log \frac{1}{\alpha(N)}}{\varphi_{im}(N)} = 0, \tag{10}$$
$$\lim_{N \to \infty} \frac{\varphi_{im}(N)}{\log N} = \infty. \tag{11}$$

Assumption $\langle B \rangle$ is a sufficient condition for the algorithm to work.
For example, it is possible to relax (11) and to replace the square in $\langle B \rangle$ by a triangle-shaped figure. Although conditions (10) and (11) are of asymptotic character, most of the estimates used in our method are valid for finite $N$ as well. For obvious consistency reasons, $\varphi_{im}(N) \le N$.

Now we are ready to formulate our Detection Algorithm (see Algorithm 1). Fix the false detection rate $\alpha(N)$ before running the algorithm.

Algorithm 1 Detection
  Step 0. Find an optimal $\theta(N)$ (in our framework $\theta(N) := 1/2$).
  Step 1. Perform $\theta(N)$-thresholding of the noisy picture $\{Y_{i,j}\}_{i,j=1}^{N}$.
  Step 2.
    repeat
      Run depth-first search (Tarjan, 1972) on the graph $G_N$ of the $\theta(N)$-thresholded picture $\{\overline{Y}_{i,j}\}_{i,j=1}^{N}$
    until {{black cluster of size $\varphi_{im}(N)$ is found} or {all black clusters are found}}
  Step 3.
    if black cluster of size $\varphi_{im}(N)$ was found then
      Output: an object was detected
    else
      Output: there is no object.
    end if

At Step 2 our algorithm finds and stores not only sizes of black clusters, but also coordinates of the pixels constituting each cluster. We recall that $\theta(N)$ is defined as in (7), $G_N$ and $\{\overline{Y}_{i,j}\}_{i,j=1}^{N}$ were defined in Section 3, and $\varphi_{im}(N)$ is any function satisfying (10). The depth-first search algorithm is a standard procedure used for finding connected components of graphs. This procedure is a deterministic algorithm. The detailed description and rigorous complexity analysis can be found in (Tarjan, 1972), or in the classic book (Aho et al., 1975), Chapter 5.

Let us prove that Algorithm 1 works, and determine its complexity.

Theorem 1. Suppose assumptions $\langle A \rangle$ and $\langle B \rangle$ are satisfied and the noise is symmetric. Then
1. Algorithm 1 finishes its work in $O(N^2)$ steps, i.e. is linear.
2.
If there was an object in the picture, Algorithm 1 detects it with probability at least $(1 - \exp(-C_1(\sigma)\varphi_{im}(N)))$.
3. The probability of false detection doesn't exceed $\min\{\alpha(N), \exp(-C_2(\sigma)\varphi_{im}(N))\}$ for all $N > N(\sigma)$.
The constants $C_1 > 0$, $C_2 > 0$ and $N(\sigma) \in \mathbb{N}$ depend only on $\sigma$.

Theorem 1 means that Algorithm 1 is of quickest possible order: it is linear in the input size in the worst case. Theorem 1 also implies that the algorithm has computational complexity $O(\varphi_{im}(N))$ if the starting point of the depth-first search was close enough to the object. It is difficult to think of an algorithm working quicker in this problem. Indeed, if the image is very small and located in an unknown place on the screen, or if there is no image at all, then any algorithm solving the detection problem will have to at least upload information about $O(N^2)$ pixels, i.e. under the general assumptions of Theorem 1, any detection algorithm will have at least linear complexity.

Another important point is that Algorithm 1 is not only consistent, but that it has an exponential rate of accuracy, typically achievable only for parametric or sufficiently smooth models.

Remark 1. It is also interesting to remark here that, although it is assumed that the object of interest contains a $\varphi_{im}(N) \times \varphi_{im}(N)$ black square, one cannot use the very natural idea of simply considering sums of values on all squares of size $\varphi_{im}(N) \times \varphi_{im}(N)$ in order to detect an object. Nor can some sort of thresholding be avoided, in general. Indeed, although this simple idea works very well for normal noise, it cannot be used in the case of unknown and possibly irregular or heavy-tailed noise.
For example, for heavy-tailed noise, detection based on non-thresholded sums of values over subsquares will lead to a high probability of false detection, whereas the method of the present paper still works.

5. Example

In this section, we outline an example illustrating possible applications of our method. We start with a real greyscale picture of a neuron (see Fig. 1). This neuron is an irregular object with unknown shape, and our method can be very advantageous in situations like this.

Based on this real picture, we perform the following simulation study. We add Gaussian noise of level $\sigma = 1.8$ independently to each pixel in the image, and then we run Algorithm 1 on this noisy picture. A typical version of a noisy picture with this relatively strong noise can be seen in Fig. 2. We run the algorithm on 1000 simulated pictures. Note that we used Gaussian noise for illustrative purposes only. We made use neither of the fact that the noise is normal nor of our knowledge of the actual noise level. As a result, the neuron was detected in 96.8% of all cases. At the same time, the probability of false detection was shown to be below 5%. Now we describe our experiment in more detail.

The starting picture (see Fig. 1) was $450 \times 450$ pixels. White pixels have value 0 and black pixels have value 1. Some pixels were grey already in the original picture, but in practice this doesn't spoil the detection procedure. We used as a threshold $\theta = 0.5$. The thresholded version of Fig. 2 is shown in Fig. 3. As follows from Theorem 1, our testing procedure is asymptotically consistent. We have chosen $\sigma = 1.8$ in our simulation study. In practice, Algorithm 1 can be consistently used for stronger noise levels for images of this size.

Fig 1. A part of a real neuron.

Suppose the null hypothesis is true, i.e. there is no signal in the original picture. By running Algorithm 1 on empty pictures of size $450 \times 450$ with simulated noise of level $\sigma = 1.8$ and $\theta = 0.5$, one can find that with probability more than 95% there will be no black cluster of size 304 or more on the thresholded picture. Therefore, we considered as significant only those clusters that had more than 304 pixels. A different and much more efficient way of calculating $\varphi(N)$ for moderate sizes of $N$ is proposed in (Langovoy and Wittich, 2010).

For moderate sample sizes, the algorithm is applicable in many situations that are not covered by Theorem 1. The object, of course, doesn't have to contain a square of size $303 \times 303$ in order to be detectable. In particular, for noise level $\sigma = 1.8$, even objects containing a $40 \times 40$ square are consistently detected. The neuron in Fig. 1 passes this criterion, and Algorithm 1 detected the neuron 968 times out of 1000 runs.

We remark here that the algorithm is also very quick in practice. For example, its realization in Python typically requires less than 1 second to process a 4000 by 4000 image on a personal computer.

Fig 2. A noisy picture.

Fig 3. A thresholded picture.

6. Proofs

Before proving the main result, we first state the following theorem about subcritical site percolation on the standard triangular lattice $\mathbb{T}^2$.

Theorem 2. Consider site percolation with probability $p_0$ on $\mathbb{T}^2$. There exists a constant $\lambda_{site} = \lambda_{site}(p_0) > 0$ such that

$$P_{p_0}(|C| \ge n) \le e^{-n\lambda_{site}(p_0)}, \quad \text{for all } n \ge N(p_0). \tag{12}$$

Here $C$ denotes the open cluster containing the origin.

Proof. (Theorem 2): The triangular lattice satisfies the conditions of Theorem 5.1 in (Kesten, 1982), p.83. Therefore, the second part of that Theorem (see equations (5.12)-(5.14) and the conclusion following them) ensures that there exist constants $C_1 = C_1(p_0) > 0$, $C_2 = C_2(p_0) > 0$ such that

$$P_{p_0}(|C| \ge n) \le C_1(p_0)\, e^{-n C_2(p_0)}, \quad \text{for all } n \ge 1. \tag{13}$$

If $C_1 \le 1$, then (12) follows immediately. Otherwise, (12) follows from (13) for all $n \ge N(p_0)$ and any $\lambda_{site}(p_0) := C_3 = C_3(p_0) > 0$ such that $N(p_0)$ and $C_3$ satisfy the inequality

$$N(p_0)\,(C_2 - C_3) \ge \log C_1. \tag{14}$$

We will also need to use the celebrated FKG inequality (see (Fortuin et al., 1971), or (Grimmett, 1999), Theorem 2.4, p.34; see also Grimmett's book for some explanation of the terminology).

Theorem 3. If $A$ and $B$ are both increasing (or both decreasing) events on the same measurable pair $(\Omega, \mathcal{F})$, then $P(A \cap B) \ge P(A)\,P(B)$.

Define $F_N(n)$ as the event that there is an erroneously marked black cluster of size greater than or equal to $n$, lying in the square of size $N \times N$ corresponding to the screen. (An erroneously marked black cluster is a black cluster on $G_N$ such that each of the pixels in the cluster was wrongly coloured black after the $\theta$-thresholding.) Denote

$$p_{out}(N) := P(Y_{ij} \ge 1/2 \mid Im_{ij} = 0), \tag{15}$$

the probability of erroneously marking a white pixel outside of the image as black.

The next theorem is particularly useful when studying percolation on finite sublattices of the initial infinite lattice.

Theorem 4. Suppose that $0 < p_{out}(N) < p_c^{site}$. There exists a constant $C_3 = C_3(p_{out}(N)) > 0$ such that

$$P_{p_{out}(N)}(F_N(n)) \le \exp(-n\, C_3(p_{out}(N))), \quad \text{for all } n \ge \varphi_{im}(N). \tag{16}$$

Proof. (Theorem 4): Denote by $C(i,j)$ the largest black cluster in the $N \times N$ screen (triangulated by diagonals of one orientation) containing the pixel with coordinates $(i,j)$, and by $C(0)$ the largest black cluster on the same $N \times N$ screen containing 0. It doesn't matter for this proof which point is denoted by 0. By Theorem 2, for all $i,j$ with $1 \le i,j \le N$:

$$P_{p_{out}(N)}(|C(0)| \ge n) \le e^{-n\lambda_{site}(p_{out})}, \tag{17}$$
$$P_{p_{out}(N)}(|C(i,j)| \ge n) \le e^{-n\lambda_{site}(p_{out})}.$$

Obviously, restricting our clusters to a finite subset instead of the whole lattice $\mathbb{T}^2$ only helps inequalities (12) and (17). On a side note, there is no symmetry anymore between arbitrary points of the $N \times N$ finite subset of the triangular lattice; luckily, this doesn't affect the present proof.

Since $\{|C(0)| \ge n\}$ and $\{|C(i,j)| \ge n\}$ are increasing events (on the measurable pair corresponding to the standard random-graph model on $G_N$), we have that $\{|C(0)| < n\}$ and $\{|C(i,j)| < n\}$ are decreasing events for all $i,j$. By the FKG inequality for decreasing events,

$$P_{p_{out}(N)}\big(|C(i,j)| < n \text{ for all } i,j,\ 1 \le i,j \le N\big) \ge \prod_{1 \le i,j \le N} P_{p_{out}(N)}(|C(i,j)| < n) \ge \text{(by (17))} \ge \big(1 - e^{-n\lambda_{site}(p_{out})}\big)^{N^2}.$$
It follows that

$$P_{p_{out}(N)}(F_N(n)) = P_{p_{out}(N)}\big(\exists\, (i,j),\ 1 \le i,j \le N : |C(i,j)| \ge n\big) \le 1 - \left(1 - e^{-n \lambda_{site}(p_{out})}\right)^{N^2}$$

$$= 1 - \sum_{k=0}^{N^2} (-1)^k\, C_{N^2}^{k}\, e^{-n \lambda_{site}(p_{out})\, k} = \sum_{k=1}^{N^2} (-1)^{k-1}\, C_{N^2}^{k}\, e^{-n \lambda_{site}(p_{out})\, k} = N^2 e^{-n \lambda_{site}(p_{out})} + o\!\left(N^2 e^{-n \lambda_{site}(p_{out})}\right),$$

because we assumed in (16) that $n \ge \varphi_{im}(N)$, and $\varphi_{im}(N) \gg \log N$. Moreover, we see immediately that Theorem 4 now follows with some $C_3$ such that $0 < C_3(p_{out}(N)) < \lambda_{site}(p_{out}(N))$.

Now we establish the following useful lemma. Let $G_N$ denote the $N \times N$ subset of $T^2$, as defined in Section 3 of the present paper, and denote its canonical matching graph by $G_N^*$. We recall that $T^2$ is self-matching, and refer to (Kesten, 1982), Section 2.2 for the necessary definitions. Assuming that $n \le N$, let $A_n$ be the event that there is an open (i.e., black) path in the rectangle $[0,n] \times [0,n]$ joining some vertex on its left side to some vertex on its right side. Similarly, let $B_n$ denote the event that there exists a closed (i.e., white) path on $G_n^*$ joining a vertex on the top side of $G_n^*$ to a vertex on its bottom side.

Remark 2. When speaking about black or white crossings of rectangles, we are free to assume that $T^2$ is embedded in the plane as a $Z^2$ lattice with diagonals. See (Kesten, 1982) for a discussion of connections between percolation and various planar embeddings of regular lattices.

Lemma 1. Let $0 < p < 1$ be a real number. Consider standard site percolation with probability $p$ on the triangular lattice. Then
1. Either $A_n$ or $B_n$ occurs. Moreover, $A_n \cap B_n = \emptyset$.
2. $P_p(A_n) + P_p(B_n) = 1$. (18)
3. $P_p(A_n) + P_{1-p}(A_n) = 1$. (19)
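Statement 1 of Lemma 1 can be checked empirically on small boards. The sketch below is an illustration only and is not taken from the paper. It assumes the embedding of Remark 2 with diagonals of one orientation, uses the same adjacency for the white crossing (relying on $T^2$ being self-matching), and represents the rectangle $[0,n] \times [0,n]$ as an $m \times m$ array of vertices with $m = n + 1$. The names `has_crossing`, `random_colouring` and `check_duality` are hypothetical.

```python
import random
from collections import deque

# Six neighbours of a vertex of Z^2 with diagonals of one orientation
# (one planar embedding of the triangular lattice; an assumption of this sketch).
OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]


def has_crossing(grid, colour, left_right):
    """Return True if vertices of `colour` join two opposite sides of the grid.

    left_right=True  -> left column to right column (event A_n for black);
    left_right=False -> top row to bottom row (event B_n for white).
    """
    m = len(grid)
    # Start the search from every vertex of the given colour on the first side.
    if left_right:
        starts = [(i, 0) for i in range(m) if grid[i][0] == colour]
    else:
        starts = [(0, j) for j in range(m) if grid[0][j] == colour]
    seen = set(starts)
    queue = deque(starts)
    while queue:
        i, j = queue.popleft()
        if (j if left_right else i) == m - 1:
            return True  # reached the opposite side
        for di, dj in OFFSETS:
            a, b = i + di, j + dj
            if (0 <= a < m and 0 <= b < m
                    and (a, b) not in seen and grid[a][b] == colour):
                seen.add((a, b))
                queue.append((a, b))
    return False


def random_colouring(m, p, rng):
    """Colour each of the m x m vertices black (1) with probability p, else white (0)."""
    return [[1 if rng.random() < p else 0 for _ in range(m)] for _ in range(m)]


def check_duality(m, p, trials, seed=0):
    """Count the colourings in which exactly one of A_n, B_n occurs."""
    rng = random.Random(seed)
    exactly_one = 0
    for _ in range(trials):
        grid = random_colouring(m, p, rng)
        a = has_crossing(grid, 1, left_right=True)    # black left-right crossing
        b = has_crossing(grid, 0, left_right=False)   # white top-bottom crossing
        exactly_one += (a != b)
    return exactly_one
```

If Statement 1 holds for this embedding, `check_duality(m, p, trials)` should return `trials` for every choice of `m` and `p`. Averaging `has_crossing(grid, 1, True)` over many colourings gives a Monte Carlo estimate of $P_p(A_n)$, which can be compared against (19).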
Proof. (Lemma 1). Statement 1 of the Lemma follows directly from Proposition 2.2 of (Kesten, 1982) (see also pp. 398-402 of that book, where a rigorous proof of this proposition is presented, including the necessary topological considerations). Statement 2 is an immediate consequence of Statement 1 and the definitions of percolation measures on $G_n$ and $G_n^*$. To complete the proof, note that $G_N$ and $G_N^*$ are isomorphic, by Example (iii), pp. 19-20 of (Kesten, 1982). Since by definition a vertex of $G_n^*$ is black with probability $1 - p$, we have that

$$P_p(B_n) = P_{1-p}(A_n). \tag{20}$$

This proves (19).

First we prove the following theorem:

Theorem 5. Consider site percolation on the $T^2$ lattice with percolation probability $p > p_c^{site} = 1/2$. Let $A_n$ be the event that there is an open path in the rectangle $[0,n] \times [0,n]$ joining some vertex on its left side to some vertex on its right side. Let $M_n$ be the maximal number of vertex-disjoint open left-right crossings of the rectangle $[0,n] \times [0,n]$. Then there exist constants $C_4 = C_4(p) > 0$, $C_5 = C_5(p) > 0$, $C_6 = C_6(p) > 0$ such that

$$P_p(A_n) \ge 1 - (n+1)\, e^{-C_4 n}, \tag{21}$$

$$P_p(M_n \le C_5 n) \le e^{-C_6 n}, \tag{22}$$

and both inequalities hold for all $n \ge N_1(p)$.

Proof. (Theorem 5): Let $LR_k(n)$, $0 \le k \le n$, be the event that the point $(0,k)$ of $G_n$ is connected by a white (in other words, closed) path (that lies in the interior of $G_n$) to some vertex on the right border of $G_n$. Denote by $LR(n)$ the event that there exists a closed left-right crossing of $G_n$. Let $C((0,k))$ denote the white cluster containing the point $(0,k)$, where we adopt the convention that this cluster is considered on the whole lattice $T^2$. Then obviously

$$LR_k(n) \subseteq \{\omega : |C((0,k))| \ge n\} \tag{23}$$
and

$$LR(n) \subseteq \bigcup_{k=0}^{n} LR_k(n). \tag{24}$$

Now (24) gives us

$$P_{1-p}(LR(n)) \le \sum_{k=0}^{n} P_{1-p}(LR_k(n)) \le (n+1) \max_{0 \le k \le n} P_{1-p}(LR_k(n)). \tag{25}$$

Since $1 - p < p_c^{site}$, we get from (23) and Theorem 2 that for all $k$

$$P_{1-p}(LR_k(n)) \le P_{1-p}(\{|C((0,k))| \ge n\}) \le e^{-C_4 n}. \tag{26}$$

Combining (25) and (26) yields

$$P_{1-p}(A_n) = P_{1-p}(LR(n)) \le (n+1)\, e^{-C_4 n}. \tag{27}$$

Altogether, (19) and (27) imply (21). This proves the first half of Theorem 5.

As for the second part of the proof, (22) is deduced from (21) with the help of Theorem 2.45 of (Grimmett, 1999). The derivation itself is presented on pp. 49-50 of (Grimmett, 1999); the only difference is that in our case one has to replace "edges" by "vertices" in the proof from the book. Everything else works the same, since Theorem 2.45 is valid for all Bernoulli product measures on regular lattices; in particular, Theorem 2.45 applies to site percolation as well. This completes the proof of Theorem 5.

Proof. (Theorem 1): I. First we prove the complexity result. The $\theta(N)$-thresholding gives us $\{Y_{i,j}\}_{i,j=1}^{N}$ and $G_N$ in $O(N^2)$ operations. This finishes the analysis of Step 1. As for Step 2, it is known (see, for example, (Aho et al., 1975), Chapter 5, or (Tarjan, 1972)) that the standard depth-first search also finishes its work in $O(N^2)$ steps. It takes no more than $O(N^2)$ operations to save the positions of all pixels in all clusters to memory, since one has no more than $N^2$ positions and clusters. This completes the analysis of Step 2 and shows that Algorithm 1 is linear in the size of the input data.
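The two steps counted in Part I can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The pixel-wise rule `value >= theta` stands in for the $\theta(N)$-thresholding, the neighbourhood is the grid triangulated by diagonals of one orientation, the image is assumed square, and `min_size` plays the role of $\varphi_{im}(N)$. All function names are hypothetical. Each pixel is pushed onto the stack at most once and has at most six neighbours, which is where the $O(N^2)$ bound comes from.

```python
def threshold(image, theta):
    """Step 1: mark a pixel black (1) if its grey value is at least theta, else white (0)."""
    return [[1 if value >= theta else 0 for value in row] for row in image]


def black_clusters(y):
    """Step 2: list all black clusters of the triangulated N x N grid (iterative DFS)."""
    n = len(y)
    # Neighbours on Z^2 with diagonals of one orientation (assumed embedding).
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]
    visited = [[False] * n for _ in range(n)]
    clusters = []
    for i in range(n):
        for j in range(n):
            if y[i][j] != 1 or visited[i][j]:
                continue
            # Start a new cluster from an unvisited black pixel.
            visited[i][j] = True
            stack = [(i, j)]
            cluster = []
            while stack:
                a, b = stack.pop()
                cluster.append((a, b))
                for da, db in offsets:
                    c, d = a + da, b + db
                    if (0 <= c < n and 0 <= d < n
                            and y[c][d] == 1 and not visited[c][d]):
                        visited[c][d] = True  # mark on push: each pixel enters once
                        stack.append((c, d))
            clusters.append(cluster)
    return clusters


def detect(image, theta, min_size):
    """Report an object if some black cluster has at least min_size pixels."""
    clusters = black_clusters(threshold(image, theta))
    return any(len(cluster) >= min_size for cluster in clusters)
```

Marking a pixel as visited when it is pushed, rather than when it is popped, keeps the stack from ever holding the same pixel twice, so memory use is also bounded by $N^2$ positions.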
II. Now we prove the bound on the probability of false detection. Denote

$$p_{out}(N) := P(Y_{ij} \ge 1/2 \mid \mathrm{Im}_{ij} = 0), \tag{28}$$

the probability of erroneously marking a white pixel outside of the image as black. Under the assumptions of Theorem 1, $p_{out}(N) < p_c^{site}$. The exponential bound on the probability of false detection follows directly from Theorem 4.

III. It remains to prove the lower bound on the probability of true detection. Suppose that we have an object in the picture that satisfies the assumptions of Theorem 1. Consider any $\varphi_{im}(N) \times \varphi_{im}(N)$ square in this image. After $\theta$-thresholding of the picture by Algorithm 1, we observe on the selected square a site percolation with probability

$$p_{im}(N) := P(Y_{ij} \ge 1/2 \mid \mathrm{Im}_{ij} = 1) > p_c^{site}.$$

Then, by (21) of Theorem 5, there exists $C_4 = C_4(p_{im}(N))$ such that there will be at least one cluster of size not less than $\varphi_{im}(N)$ (for example, one could take any of the existing left-right crossings as a part of such a cluster), provided that $N$ is bigger than a certain $N_1(p_{im}(N))$; and all that happens with probability at least

$$1 - n\, e^{-C_4 n} > 1 - e^{-C_3 n},$$

for some $C_3$ with $0 < C_3 < C_4$. Note that one can always weaken the constant $C_3$ above in such a way that the estimate above holds for all $n \ge 1$. Theorem 1 is proved.

Acknowledgments. The authors would like to thank Remco van der Hofstad, Artem Sapozhnikov and Shota Gugushvili for helpful discussions.

References

Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The design and analysis of computer algorithms. Addison-Wesley Publishing Co., Reading, Mass.-London-Amsterdam, 1975. Second printing, Addison-Wesley Series in Computer Science and Information Processing.
E. Arias-Castro and G. Grimmett.
Cluster detection in networks using percolation. Bernoulli, 19(2):676–719, 2013.
C. M. Fortuin, P. W. Kasteleyn, and J. Ginibre. Correlation inequalities on some partially ordered sets. Comm. Math. Phys., 22:89–103, 1971. ISSN 0010-3616.
Donald Geman and Bruno Jedynak. An active testing model for tracking roads in satellite images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1):1–14, 1996.
Geoffrey Grimmett. Percolation, volume 321 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, second edition, 1999. ISBN 3-540-64902-6.
Harry Kesten. Percolation theory for mathematicians, volume 2 of Progress in Probability and Statistics. Birkhäuser Boston, Mass., 1982. ISBN 3-7643-3107-0.
M. Langovoy and O. Wittich. Detection of objects in noisy images and site percolation on square lattices. EURANDOM Report No. 2009-035. EURANDOM, Eindhoven, 2009.
M. Langovoy and O. Wittich. Computationally efficient algorithms for statistical image processing. Implementation in R. EURANDOM Report No. 2010-053. EURANDOM, Eindhoven, 2010.
M. Langovoy and O. Wittich. Randomized algorithms for statistical image analysis and site percolation on square lattices. Statistica Neerlandica, 67(3):337–353, 2013a. ISSN 1467-9574. URL http://dx.doi.org/10.1111/stan.12010.
M. Langovoy and O. Wittich. Robust nonparametric detection of objects in noisy images. Journal of Nonparametric Statistics, 25(2):409–426, 2013b.
M. Langovoy, M. Habeck, and B. Schoelkopf. Adaptive nonparametric detection in cryo-electron microscopy. In Proceedings of the 58-th World Statistical Congress, Session: High Dimensional Data, pages 4456–4461, 2011a.
M. Langovoy, M. Habeck, and B. Schoelkopf. Spatial statistics, image analysis and percolation theory.
In The Joint Statistical Meetings Proceedings, Time Series and Network Section, pages 5571–5581, American Statistical Association, Alexandria, VA, 2011b.
Tim McInerney and Demetri Terzopoulos. Deformable models in medical image analysis: a survey. Medical image analysis, 1(2):91–108, 1996.
M. Negri, P. Gamba, G. Lisini, and F. Tupin. Junction-aware extraction and regularization of urban road networks in high-resolution SAR images. Geoscience and Remote Sensing, IEEE Transactions on, 44(10):2962–2971, Oct. 2006. ISSN 0196-2892.
Lucia Ricci-Vitiani, Dario G. Lombardi, Emanuela Pilozzi, Mauro Biffoni, Matilde Todaro, Cesare Peschle, and Ruggero De Maria. Identification and expansion of human colon-cancer-initiating cells. Nature, 445(7123):111–115, Oct. 2007. ISSN 0028-0836.
Yiye Ruan, David Fuhry, and Srinivasan Parthasarathy. Efficient community detection in large networks using content and links. In Proceedings of the 22nd international conference on World Wide Web, pages 1089–1098. ACM, 2013.
Sunil K. Sinha and Paul W. Fieguth. Automated detection of cracks in buried concrete pipe images. Automation in Construction, 15(1):58–72, 2006. ISSN 0926-5805.
Robert Tarjan. Depth-first search and linear graph algorithms. SIAM J. Comput., 1(2):146–160, 1972. ISSN 0097-5397.
Jaewon Yang, Julian McAuley, and Jure Leskovec. Community detection in networks with node attributes. In Data Mining (ICDM), 2013 IEEE 13th international conference on, pages 1151–1156. IEEE, 2013.
