Suboptimality of Nonlocal Means for Images with Sharp Edges


Authors: Arian Maleki, Manjari Narayan, Richard G. Baraniuk

Arian Maleki (arian.maleki@rice.edu)∗, Manjari Narayan (manjari@rice.edu), Richard G. Baraniuk (richb@rice.edu)∗∗

Dept. of Electrical and Computer Engineering, Rice University, MS-380, 6100 Main Street, Houston, TX 77005, USA

∗ Corresponding author. ∗∗ Principal corresponding author.

Abstract

We conduct an asymptotic risk analysis of the nonlocal means image denoising algorithm for the Horizon class of images that are piecewise constant with a sharp edge discontinuity. We prove that the mean square risk of an optimally tuned nonlocal means algorithm decays according to $n^{-1}\log^{1/2+\epsilon} n$, for an $n$-pixel image with $\epsilon > 0$. This decay rate is an improvement over some of the predecessors of this algorithm, including the linear convolution filter, the median filter, and the SUSAN filter, each of which provides a rate of only $n^{-2/3}$. It is also within a logarithmic factor of optimally tuned wavelet thresholding. However, it is still substantially lower than the optimal minimax rate of $n^{-4/3}$.

Keywords: denoising, minimax risk, Horizon class, nonlocal means, linear filter, SUSAN filter, wavelet thresholding

Preprint submitted to Applied and Computational Harmonic Analysis, September 26, 2018

1. Introduction

1.1. Image denoising

The long history of image denoising is testimony to its central importance in image processing.
A wide range of algorithms has been developed, ranging from simple linear convolution and median filtering to total variation denoising [1] and sparsity-exploiting algorithms such as wavelet shrinkage [2]. Due to the sensitivity of the human visual system to edges, the ability to preserve sharp edges is an important criterion for noise removal algorithms. Korostelev and Tsybakov therefore proposed a framework to characterize the performance of image denoisers on edges [3]. Based on this framework, we aim to characterize the performance of several denoising algorithms that represent the current state of the art in image enhancement. In particular, we will focus on the popular and powerful nonlocal means (NLM) algorithm.

1.2. The minimax framework

In this paper, we are interested in estimating a function $f : [0,1]^2 \to \mathbb{R}$ from noisy pixel-level observations. Define $\mathrm{Pixel}(i,j) = [\frac{i}{n}, \frac{i+1}{n}) \times [\frac{j}{n}, \frac{j+1}{n})$, and let $x_{i,j} = \mathrm{Ave}(f \mid \mathrm{Pixel}(i,j))$ be the pixel-level averages of $f$. We observe the samples $y_{i,j} = x_{i,j} + z_{i,j}$, where $z_{i,j}$ is iid $N(0, \sigma^2)$. The goal is to recover the original pixel values $x_{i,j}$ from the observations $y_{i,j}$, based on some information about the function $f$. For a given function $f$ and an estimator $\hat{f}$, we define the risk function as
$$R_n(f, \hat{f}) = E\left( \frac{1}{n^2} \sum_i \sum_j (x_{i,j} - \hat{f}_{i,j})^2 \right). \tag{1}$$
The risk can also be written as
$$R_n(f, \hat{f}) = \frac{1}{n^2} \sum_i \sum_j (x_{i,j} - E\hat{f}_{i,j})^2 + E\left( \frac{1}{n^2} \sum_i \sum_j (\hat{f}_{i,j} - E\hat{f}_{i,j})^2 \right), \tag{2}$$
where the first and second terms correspond to the bias and variance of the estimator $\hat{f}$, respectively. Let $f$ belong to a class of functions $\mathcal{F}$, e.g., a class of edge-like images that represent edges with different shapes and orientations. The risk defined in (1) depends on the specific choice of $f$.
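As a concrete illustration of (1) and (2), the following sketch estimates the risk of a simple 3×3 box-filter estimator on a pixelized Horizon-type image and checks that the empirical risk splits exactly into the bias and variance terms. The filter, image size, and noise level are illustrative choices, not parameters taken from the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 32, 0.5, 2000

# Pixel-level averages of the Horizon image f(t1, t2) = 1{t2 <= 1/2}:
# rows index t1, columns index t2; value 1 on one side of the edge.
x = (np.arange(n) < n // 2).astype(float)[None, :] * np.ones((n, 1))

def box_filter(y):
    """3x3 box average with edge padding (a simple linear filter)."""
    yp = np.pad(y, 1, mode="edge")
    return sum(yp[i:i + n, j:j + n] for i in range(3) for j in range(3)) / 9.0

# Monte Carlo estimate of the risk (1) and of its decomposition (2).
est = np.empty((trials, n, n))
for t in range(trials):
    est[t] = box_filter(x + sigma * rng.standard_normal((n, n)))

risk = np.mean((est - x) ** 2)                 # E (1/n^2) sum (x - f_hat)^2
bias2 = np.mean((est.mean(axis=0) - x) ** 2)   # squared-bias term of (2)
var = np.mean(est.var(axis=0))                 # variance term of (2)
print(risk, bias2 + var)  # identical up to floating-point error
```

For the empirical moments, the decomposition holds as an exact algebraic identity, which is what the final print confirms.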
We define the risk of an estimator $\hat{f}$ on the class $\mathcal{F}$ as the risk of the worst-case signal, i.e.,
$$R_n(\mathcal{F}, \hat{f}) = \sup_{f \in \mathcal{F}} R_n(f, \hat{f}).$$
The minimax risk over functions in $\mathcal{F}$ is then defined as the risk of the best possible estimator, i.e.,
$$R^*_n(\mathcal{F}) = \inf_{\hat{f}} \sup_{f \in \mathcal{F}} R_n(f, \hat{f}).$$
The minimax risk is a lower bound on the performance of all measurable estimators for signals in $\mathcal{F}$. In this paper we are interested in the asymptotic setting where the number of pixels $n \to \infty$. For all of the estimators we consider, $R_n(\mathcal{F}, \hat{f}) \to 0$ as $n \to \infty$. Therefore, we consider the decay rate of the risk as the performance measure. We will derive the minimax risk for several popular image denoising techniques below.

We will use the following asymptotic notation in this paper.

Definition 1. $f(n) = O(g(n))$ as $n \to \infty$ if and only if there exist $n_0$ and $c$ such that for any $n > n_0$, $|f(n)| \le c\,|g(n)|$. Likewise, $f(n) = \Omega(g(n))$ as $n \to \infty$ if and only if there exist $n_0$ and $c$ such that for any $n > n_0$, $|f(n)| \ge c\,|g(n)|$. Finally, $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. We may interchangeably use $f(n) \asymp g(n)$ for $f(n) = \Theta(g(n))$.

Definition 2. $f(n) = o(g(n))$ if and only if $\lim_{n \to \infty} f(n)/g(n) = 0$.

1.3. Horizon edge model

Several different image edge models have been developed in the image processing and denoising literature. Here we will use the Horizon model, which contains piecewise constant images with edges that are smooth in the direction of the edge contour but discontinuous in the direction orthogonal to the edge contour [3, 4]. Specifically, let $\mathrm{H\ddot{o}lder}^\alpha(C)$ be the class of Hölder smooth functions on $\mathbb{R}$, defined as follows: $h \in \mathrm{H\ddot{o}lder}^\alpha(C)$ if and only if $|h^{(k)}(t_1) - h^{(k)}(t'_1)| \le C\,|t_1 - t'_1|^{\alpha-k}$, where $k = \lfloor \alpha \rfloor$.
Given a one-dimensional smooth edge contour function $h$, we define the image $f_h : [0,1]^2 \to \mathbb{R}$ as $f_h(t_1, t_2) = 1\{t_2 \le h(t_1)\}$. The Horizon class $H_\alpha(C)$ is the set of images $f_h$ with $h \in \mathrm{H\ddot{o}lder}^\alpha(C)$.

Wavelet thresholding denoising with threshold $\theta$ corresponds to
$$\hat{f}^W_\theta = I_W(T_\theta(W(y))),$$
where $W$ denotes the wavelet transform, $T_\theta$ the thresholding operator, and $I_W$ the inverse wavelet transform. Donoho and Johnstone have proven that $\sup_{f \in H_\alpha(C)} R_n(f, \hat{f}^W) = \Omega(n^{-1})$ [4], [10]. Even though this rate is an improvement over the above algorithms, it is still far from the optimal achievable rate of $n^{-4/3}$ for $\alpha = 2$. This suboptimality spurred the development of other sparsity-inducing transformations, including curvelets [11], wedgelets [4], shearlets [12], and contourlets [13]. Among these transforms, wedgelet denoising provably achieves the optimal rate of $n^{-4/3}$ for $\alpha = 2$ [4]. However, wedgelet denoising performs poorly on textures, which has limited its application in practice to date.

2. Nonlocal means denoising

The YF estimator sets its weights according to the noisy pixel values and their spatial vicinity; however, neither of these two features is reliable for noisy, edgy images. In contrast, the nonlocal means (NLM) algorithm sets its weights according to the proximity of the image patch surrounding each noisy pixel to the other patches in the image [14]. Define the $\delta_n$-neighborhood distance $d_{\delta_n}(y_{i,j}, y_{n,p})$ between two observations as
$$d^2_{\delta_n}(y_{i,j}, y_{n,p}) = \frac{1}{\rho^2_n} \left( \sum_{m=-\delta_n}^{\delta_n} \sum_{\ell=-\delta_n}^{\delta_n} |y_{i+\ell, j+m} - y_{n+\ell, p+m}|^2 - |y_{i,j} - y_{n,p}|^2 \right),$$
where $\rho^2_n = (2\delta_n + 1)^2 - 1$. Note that, in contrast to the definition in [14], we have removed the center element $|y_{i,j} - y_{n,p}|^2$ from the summation. Since we assume that $\delta_n \to \infty$ as $n \to \infty$, the effect on the asymptotic performance is negligible. But, as we will see in Section 4, removing the center element simplifies the calculations considerably.
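Since the center term is removed, each of the $\rho^2_n$ remaining terms of the distance has expectation $|x - x'|^2 + 2\sigma^2$, so for two patches drawn from the same constant region the distance concentrates around $2\sigma^2$. A small Monte Carlo sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
delta, sigma, trials = 3, 0.4, 20000
rho2 = (2 * delta + 1) ** 2 - 1  # center element removed

def d2(patch_a, patch_b):
    """Squared delta_n-neighborhood distance with the center term removed."""
    c = delta  # index of the center pixel in a (2*delta+1)^2 patch
    diff2 = (patch_a - patch_b) ** 2
    return (diff2.sum() - diff2[c, c]) / rho2

# Two noise-free patches from the same constant region, so d^2(x, x') = 0.
xa = np.zeros((2 * delta + 1, 2 * delta + 1))
xb = np.zeros_like(xa)

vals = [d2(xa + sigma * rng.standard_normal(xa.shape),
           xb + sigma * rng.standard_normal(xb.shape))
        for _ in range(trials)]
# Expected value: d^2(x, x') + 2*sigma^2 = 2*sigma^2 here.
print(np.mean(vals), 2 * sigma ** 2)
```

The sample mean should match $2\sigma^2$ up to Monte Carlo error, consistent with the expectation identity used to set the NLM threshold below.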
NLM uses the neighborhood distances to estimate
$$\hat{f}^N_{i,j} = \frac{\sum_{(m,\ell) \in S} w^N_{i,j}(m,\ell)\, y_{m,\ell}}{\sum_{(m,\ell) \in S} w^N_{i,j}(m,\ell)}, \tag{7}$$
where $S = \{1, 2, \ldots, n\} \times \{1, 2, \ldots, n\}$ and $w^N_{i,j}(m,\ell)$ is set according to the $\delta_n$-neighborhood distance between $y_{i,j}$ and $y_{m,\ell}$. For simplicity of notation, in cases where both the reference pixel $(i,j)$ and the algorithm are obvious from the context, we will omit the superscript and subscript of the weight and use the simplified notation $w_{m,\ell}$ instead of $w^N_{i,j}(m,\ell)$. It is straightforward to verify that
$$E\big(d^2_{\delta_n}(y_{i,j}, y_{m,\ell})\big) = d^2_{\delta_n}(x_{i,j}, x_{m,\ell}) + 2\sigma^2,$$
which suggests the following strategy for setting the weights:
$$w^N_{i,j}(m,\ell) = \begin{cases} 1 & \text{if } d^2_{\delta_n}(y_{i,j}, y_{m,\ell}) \le 2\sigma^2 + t_n, \\ 0 & \text{otherwise}, \end{cases} \tag{8}$$
where $t_n$ is the threshold parameter. Soft/tapered versions of setting the weights have been explored and are often used in practice [14]. However, the above untapered weights capture the essence of the algorithm while simplifying the analysis. We postpone the discussion of tapered weights until Section 5.

There are two main differences between the NLM and YF algorithms. First, the pixels that contribute to the NLM averaging are not necessarily in the local neighborhood of the reference pixel (hence the moniker "nonlocal"). Second, the NLM weights depend not on the difference between the pixel values but on the distance between the pixel neighborhoods. In other words, the pixel neighborhood is even more important than the pixel value.

To derive a lower bound for the risk of NLM, we will analyze two algorithms that set the weights using some degree of oracle information regarding the true value of the signal.
The full oracle NLM (FNLM) has access to $E(d^2_{\delta_n}(y_{i,j}, y_{m,\ell}))$ in setting the weights $w_{m,\ell}$ in (7) and thus sets them using the noise-free values of the pixels:
$$w^F_{i,j}(m,\ell) = \begin{cases} 1 & \text{if } d^2_{\delta_n}(x_{i,j}, x_{m,\ell}) \le t_n, \\ 0 & \text{otherwise}. \end{cases} \tag{9}$$
The semi-oracle NLM (SNLM) differs only slightly from the standard NLM in that it uses the semi-oracle neighborhood distance
$$\bar{d}^2_{\delta_n}(y_{i,j}, y_{n,p}) \triangleq \frac{1}{\rho^2_n} \left( \sum_{m=-\delta_n}^{\delta_n} \sum_{\ell=-\delta_n}^{\delta_n} |x_{i+\ell, j+m} - y_{n+\ell, p+m}|^2 - (x_{i,j} - y_{n,p})^2 \right), \tag{10}$$
and then sets the weights in (7) according to
$$w^S_{i,j}(m,\ell) = \begin{cases} 1 & \text{if } \bar{d}^2_{\delta_n}(y_{i,j}, y_{m,\ell}) \le \sigma^2 + t_n, \\ 0 & \text{otherwise}. \end{cases} \tag{11}$$
Unlike FNLM, SNLM assumes that just one-half of the noise is removed from the distance estimates. Therefore, the distances calculated in the SNLM are more accurate than in the standard NLM but less accurate than in the FNLM. In the rest of the paper, we will use $\hat{f}^N$, $\hat{f}^S$, and $\hat{f}^F$ to denote the NLM, SNLM, and FNLM estimators, respectively.

3. Main Results

Our first result, proved in Section 4.3, establishes an upper bound on the risk of NLM.

Theorem 4. Fix $\epsilon > 0$ and consider NLM denoising with $\delta_n = 2\log^{1/2+\epsilon} n$ and $t_n = 2\sigma^2 \log^{-\epsilon/2} n$. The risk of this algorithm over the class $H_\alpha(C)$ is
$$\sup_{f \in H_\alpha(C)} R(f, \hat{f}^N) = O\left( \frac{\log^{1/2+\epsilon} n}{n} \right). \tag{12}$$

Before we discuss the implications of this theorem, it is important to note that, while we can improve the decay rate as close as we desire to $O(n^{-1}\log^{1/2} n)$, the constants involved in the big-$O$ notation grow as $\epsilon$ decreases. Therefore, in practice very small values of $\epsilon$ are not desirable. Comparing the upper bound (12) with the optimal minimax risk (4) indicates that NLM is suboptimal for $\alpha > 1$. In other words, NLM cannot exploit the smoothness of edge contours in images.
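A toy implementation of the estimator (7) with the hard-threshold weights (8), run on the Horizon image $1\{t_2 < 0.5\}$, illustrates the mechanism. The image size, patch radius, noise level, and threshold are illustrative; the search space is all interior patches (hence "nonlocal"), and restricting to interior pixels is a boundary simplification not in the analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta, sigma, tn = 24, 2, 0.3, 0.1
rho2 = (2 * delta + 1) ** 2 - 1

# Horizon image 1{t2 < 0.5} and its noisy observation.
x = (np.arange(n) < n // 2).astype(float)[None, :] * np.ones((n, 1))
y = x + sigma * rng.standard_normal((n, n))

# Extract the (2*delta+1)^2 patch around every interior pixel.
side = n - 2 * delta
patches = np.empty((side * side, (2 * delta + 1) ** 2))
centers = []
for i in range(delta, n - delta):
    for j in range(delta, n - delta):
        patches[len(centers)] = y[i - delta:i + delta + 1,
                                  j - delta:j + delta + 1].ravel()
        centers.append((i, j))

# Squared neighborhood distances with the center term removed, as in the text.
cidx = (2 * delta + 1) ** 2 // 2
diff2 = (patches[:, None, :] - patches[None, :, :]) ** 2
d2 = (diff2.sum(axis=2) - diff2[:, :, cidx]) / rho2

w = d2 <= 2 * sigma ** 2 + tn                           # hard weights (8)
yc = np.array([y[i, j] for i, j in centers])
fhat = (w * yc[None, :]).sum(axis=1) / w.sum(axis=1)    # estimate (7)

xc = np.array([x[i, j] for i, j in centers])
mse_nlm = np.mean((fhat - xc) ** 2)
mse_noisy = np.mean((yc - xc) ** 2)
print(mse_noisy, mse_nlm)
```

Far from the edge, many same-side patches pass the threshold and the noise is averaged out; the interest of the theorems above is precisely what happens in the band of pixels near the edge.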
The bound in Theorem 4 is for a specific choice of parameters, and it is natural to ask whether NLM can achieve the optimal rate with some other choice of parameters. To answer this question, we consider SNLM, which outperforms standard NLM in general. We make the following mild assumptions:

A1: The window size $\delta_n \to \infty$ as $n \to \infty$. This assumption is critical to ensuring good performance of any NLM estimator.

A2: The threshold is set to $\sigma^2 + t_n$ as explained in (11), with $t_n > 0$. This ensures that if the neighborhood of pixel $(m,\ell)$ is exactly the same as the neighborhood of pixel $(i,j)$, then $w_{m,\ell} = 1$ with high probability.

A3: The threshold $t_n$ is set such that, if the noise-free neighborhoods differ in more than half of their pixels, i.e., if $d^2(x_{i,j}, x_{m,\ell}) \ge \frac{1}{2}$, then $P(w^F_{i,j}(m,\ell) = 1) = o(n^{-1})$.

A4: $\delta_n = O(n^\beta)$ for some $\beta \le 0.3$.

The following theorem provides a lower bound on the performance of SNLM.

Theorem 5. Suppose that $\delta_n$ and $t_n$ satisfy A1–A4. The risk of the SNLM over the class $H_\alpha(C)$ is
$$\inf_{\delta_n, t_n} \sup_{f \in H_\alpha(C)} R(f, \hat{f}^S) = \Omega(n^{-1}).$$

This bound is still suboptimal compared to the $n^{-4/3}$ minimax rate for $\alpha = 2$. In the words of John Cornyn III, the junior United States Senator from Texas, "The problem with a mini-deal is we have a maxi-problem" [15]. Remarkably, this lower bound is achieved on a very simple image on which NLM would be assumed to work very well: $1\{t_2 < 0.5\}$ (see Figure 2). Here is what goes wrong. Consider the estimation of an "edge" pixel $(i,j)$ that satisfies $j = \lceil n h(\frac{i}{n}) \rceil$. Define the set $J = \{(m,\ell) \mid \ell = \lfloor n h(\frac{m}{n}) \rfloor\}$ as the set of pixels just below the edge. We will prove that the probability that a pixel in $J$ contributes to the NLM estimate ($w_{i,j}(m,\ell) = 1$) is larger than $p_0$, where $p_0$ does not depend on $n$. This happens due to the low "signal-to-noise ratio" in the distance estimates.
Hence $\Theta(n)$ pixels of $J$ will contribute to the NLM estimate. Since these pixels have $x_{m,\ell} = 1$, they introduce a large bias in the estimate. In fact, we show below that the bias, as defined in (2), will be larger than $\frac{np_0}{n + np_0}$. Here $np_0$ corresponds to the pixels below the edge that pass the threshold. This shows that the bias is clearly $\Theta(1)$. Since there are $n$ edge pixels, the risk of the estimator over the entire image is $\Omega(n^{-1})$.

4. Proofs of the Main Theorems

4.1. Proof of Theorem 2

The proof has two main steps. The first step is to prove that there exists a linear filter for which the supremum risk is upper bounded by $O(n^{-2/3})$. For this step we use Theorems 3.1 and 3.2 from [5], which establish the same upper bound for the box filter. The second and more challenging step is to prove that no other linear filter can improve on this decay rate. The rest of this section is dedicated to the proof of this fact.

Figure 2: The simple Horizon image $1\{t_2 < 0.5\}$ used for proving the various lower bounds.

Consider the function $f_h(t_1, t_2)$ for $h(t) = \frac{1}{2}$ and suppose that $n$ is even. This function is displayed in Figure 2. Let
$$X(k_1, k_2) = \frac{1}{n} \sum_{\ell_1} \sum_{\ell_2} x_{\ell_1, \ell_2}\, e^{-j 2\pi k_1 \ell_1 / n}\, e^{-j 2\pi k_2 \ell_2 / n}$$
represent the Discrete Fourier Transform (DFT) of a two-dimensional discrete signal $x$. Since $y = x + z$, the DFT of $\hat{f}^{LF}_g$ equals
$$\hat{F}^{LF}_g(k_1, k_2) = Y(k_1, k_2) \cdot G(k_1, k_2) = X(k_1, k_2)\, G(k_1, k_2) + Z(k_1, k_2)\, G(k_1, k_2),$$
where $Z$ is again iid $N(0, \sigma^2)$. For $f_h(t_1, t_2)$ with $h(t_1) = \frac{1}{2}$, $X(k_1, k_2)$ satisfies
$$X(k_1, k_2) = \begin{cases} 0 & \text{if } k_1 \ne 0, \\ \dfrac{1 - e^{-j\pi k_2}}{1 - e^{-j 2\pi k_2 / n}} & \text{if } k_1 = 0. \end{cases} \tag{13}$$
It is easy to see that $R_n(f, \hat{f}^{LF}_g) = \frac{1}{n^2} E(\|X - \hat{F}^{LF}_g\|^2_F)$, where $\|Y\|^2_F \triangleq \sum_{k_1, k_2} |Y(k_1, k_2)|^2$.
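The DFT expression (13) can be verified numerically. For odd $k_2$, the numerator equals 2, so $|X(0,k_2)|^2 = 1/\sin^2(\pi k_2/n)$, the quantity that appears in the bias bound below. A deterministic check (the image size is an arbitrary even value):

```python
import numpy as np

n = 16  # even, as assumed in the text

# Pixel averages of f_h with h(t1) = 1/2: rows index l1, columns index l2,
# and the value is 1 on one side of the horizontal edge.
x = np.zeros((n, n))
x[:, : n // 2] = 1.0

# X(k1,k2) = (1/n) sum_{l1,l2} x e^{-j 2 pi k1 l1 / n} e^{-j 2 pi k2 l2 / n}
X = np.fft.fft2(x) / n

# (13): X vanishes off the k1 = 0 row ...
print(np.max(np.abs(X[1:, :])))

# ... and |X(0, k2)|^2 = 1 / sin^2(pi k2 / n) for odd k2.
for k2 in (1, 3, 5):
    print(k2, abs(X[0, k2]) ** 2, 1.0 / np.sin(np.pi * k2 / n) ** 2)
```

Note that `numpy.fft.fft2` uses the unnormalized forward transform, so dividing by $n$ matches the $\frac{1}{n}$ convention of the text.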
If we define $B(\hat{f})$ as the bias of the estimator $\hat{f}$, then we have
$$B^2(\hat{f}^{LF}_g) = \frac{1}{n^2} \sum_{1 \le k_2 \le n,\ \mathrm{odd}} |1 - G(0,k_2)|^2 \frac{1}{\sin^2 \frac{\pi k_2}{n}} \ \ge\ \frac{1}{n^2} \sum_{1 \le k_2 \le n^{2/3},\ \mathrm{odd}} |1 - G(0,k_2)|^2 \frac{1}{\sin^2 \frac{\pi k_2}{n}}.$$
The variance of the estimator is
$$\mathrm{Var}(\hat{f}^{LF}_g) = \frac{1}{n^2} \sum_{k_1, k_2} |G(k_1,k_2)|^2 \sigma^2.$$
We know that
$$\frac{1}{n^2} \sum_{k_1, k_2} |G(k_1,k_2)|^2 = \iint |\hat{G}(\omega_1, \omega_2)|^2 \, d\omega_1 d\omega_2 + O(n^{-1}), \tag{14}$$
where $\hat{G}$ is the continuous Fourier transform of $g$ and satisfies $\|\mathrm{grad}(\hat{G})\|_2 \le C$. Since $g$ is isotropic, there exists $F : \mathbb{R} \to \mathbb{C}$ such that $\hat{G}(\omega_1, \omega_2) = F\big(\sqrt{\omega_1^2 + \omega_2^2}\big)$. Changing the variables of integration in (14) to the polar coordinates, radius $\omega_r = \sqrt{\omega_1^2 + \omega_2^2}$ and angle $\theta$, we have
$$\iint |\hat{G}(\omega_1, \omega_2)|^2 \, d\omega_1 d\omega_2 \ge 2\pi \int_{r=0}^{2\pi} r\, |F(r)|^2 \, dr = 2\pi \int_{\omega_2=0}^{2\pi} \omega_2\, |\hat{G}(0, \omega_2)|^2 \, d\omega_2. \tag{15}$$
Combining (14) and (15), we have
$$\mathrm{Var}(\hat{f}^{LF}_g) = \frac{1}{n^2} \sum_{k_1, k_2} |G(k_1,k_2)|^2 \sigma^2 \ \ge\ \frac{4\pi^2}{n^2} \sum_{k_2} k_2\, |G(0,k_2)|^2 \sigma^2 - O(n^{-1}).$$
Summing the lower bounds for the bias and variance of this estimator, we obtain the following lower bound for the risk of linear filtering:
$$R_n(f, \hat{f}^{LF}) = B^2(\hat{f}^{LF}_g) + \mathrm{Var}(\hat{f}^{LF}_g) \ge \frac{1}{n^2} \sum_{1 \le k_2 \le n^{2/3},\ \mathrm{odd}} |1 - G(0,k_2)|^2 \frac{1}{\sin^2 \frac{\pi k_2}{n}} + \frac{4\pi^2}{n^2} \sum_{k_2} k_2\, |G(0,k_2)|^2 \sigma^2 - O(n^{-1})$$
$$\ge \frac{1}{n^2} \sum_{1 \le k_2 \le n^{2/3},\ \mathrm{odd}} |1 - G(0,k_2)|^2 \frac{n^2}{\pi^2 k_2^2} + \frac{4\pi^2}{n^2} \sum_{k_2} k_2\, |G(0,k_2)|^2 \sigma^2 - O(n^{-1}).$$
Minimizing the dominant term of the lower bound over the filter weights provides $G^*(0,k_2) = \frac{1}{1 + 4\pi^4 \sigma^2 k_2^3 / n^2}$ for odd values of $k_2$ and zero for even values of $k_2$. To find a lower bound we calculate the bias term with these optimal weights:
$$B^2(\hat{f}^{LF}_{g^*}) = \frac{1}{n^2} \sum_{1 \le k_2 \le n^{2/3},\ \mathrm{odd}} |1 - G^*(0,k_2)|^2 \frac{n^2}{\pi^2 k_2^2} = \frac{1}{n^2} \sum_{1 \le k_2 \le n^{2/3},\ \mathrm{odd}} \left( \frac{4\pi^4 \sigma^2 k_2^3 / n^2}{1 + 4\pi^4 \sigma^2 k_2^3 / n^2} \right)^2 \frac{n^2}{\pi^2 k_2^2}$$
$$\ge \frac{1}{n^2} \sum_{1 \le k_2 \le n^{2/3},\ \mathrm{odd}} \left( \frac{4\pi^4 \sigma^2 k_2^3 / n^2}{1 + 4\pi^4 \sigma^2} \right)^2 \frac{n^2}{\pi^2 k_2^2} = \frac{1}{\pi^2 n^4} \left( \frac{4\pi^4 \sigma^2}{1 + 4\pi^4 \sigma^2} \right)^2 \sum_{1 \le k_2 \le n^{2/3},\ \mathrm{odd}} k_2^4 = \left( \frac{4\pi^4 \sigma^2}{1 + 4\pi^4 \sigma^2} \right)^2 \left( \frac{n^{-2/3}}{10\pi^2} + o(n^{-2/3}) \right),$$
where the middle inequality uses $k_2^3/n^2 \le 1$ for $k_2 \le n^{2/3}$. This completes the proof.

4.2. Proof of Theorem 3

In this section, denote the pixel to be estimated as $x_{i,j}$. For clarity we use the notation $w_{m,\ell}$ instead of $w^{SY}_{i,j}(m,\ell)$. We first characterize some of the properties of the SYF weights.

Lemma 1. Suppose that $x_{i,j} = 0$. If $x_{m,\ell} = x_{i,j}$, then $E(w_{m,\ell}\, y_{m,\ell}) = 0$. Furthermore, if $|x_{i,j} - x_{m,\ell}| = 1$, then
$$E(w_{m,\ell}\, y_{m,\ell}) > \frac{\tau}{\sqrt{\sigma^2 + \tau^2}}\, e^{-\frac{1}{2(\sigma^2 + \tau^2)}}.$$

Proof. The first claim is clear from symmetry. To prove the second claim, we observe that $E(w_{m,\ell}\, y_{m,\ell}) = E(w_{m,\ell}\, x_{m,\ell}) + E(w_{m,\ell}\, z_{m,\ell})$. Since $x_{m,\ell} = 1$, we calculate $E(w_{m,\ell})$ and $E(w_{m,\ell}\, z_{m,\ell})$. It is clear that
$$E(w_{m,\ell}\, z_{m,\ell}) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} z_{m,\ell}\, e^{-\frac{(z_{m,\ell}-1)^2}{2\tau^2} - \frac{z_{m,\ell}^2}{2\sigma^2}} \, dz_{m,\ell} \ge 0.$$
Therefore we calculate
$$E(w_{m,\ell}) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(z_{m,\ell}-1)^2}{2\tau^2} - \frac{z_{m,\ell}^2}{2\sigma^2}} \, dz_{m,\ell} = \frac{e^{-\frac{1}{2(\sigma^2+\tau^2)}}}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{\sigma^2+\tau^2}{2\sigma^2\tau^2}\left( z_{m,\ell}^2 - \frac{2\sigma^2}{\sigma^2+\tau^2} z_{m,\ell} + \frac{\sigma^4}{(\sigma^2+\tau^2)^2} \right)} \, dz_{m,\ell} = \frac{e^{-\frac{1}{2(\sigma^2+\tau^2)}}}{\sigma} \sqrt{\frac{\sigma^2\tau^2}{\sigma^2+\tau^2}} = \frac{\tau\, e^{-\frac{1}{2(\sigma^2+\tau^2)}}}{\sqrt{\sigma^2+\tau^2}}.$$
This completes the proof.

Define the $\Delta$-neighborhood of a pixel $(m,\ell)$ as $\mathcal{C}^\Delta_{m,\ell} = \{(i,j) : |i-m| \le \Delta,\ |j-\ell| \le \Delta\} \cap S$.

Lemma 2. Let $\Omega_n = (2\Delta_n + 1)^2$. We then have
$$P\left( \frac{1}{\Omega_n} \left| \sum_{(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}} w^{SY}_{m,\ell} - \sum_{(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}} E w^{SY}_{m,\ell} \right| \ge t \right) \le 2\, e^{-2\Omega_n t^2}.$$
The proof is a simple application of the Hoeffding inequality.

Proof of Theorem 3.
The first claim is that the optimal neighborhood size satisfies $\Delta_n = \Omega(\log n)$. We prove this by contradiction. Suppose that $\Delta_n = O(\log n)$ and consider the performance of the SYF on the image $x_{i,j} = 0$ for every $(i,j)$. It is clear that the bias is zero. However, the variance is lower bounded by $\Omega\left(\frac{1}{\log^2 n}\right)$. This is far from the optimal performance of the linear filters analyzed in Theorem 2. Therefore $\Delta_n = \Omega(\log n)$.

Now consider the example image shown in Figure 2, with $f_h(t_1,t_2) = 1\{t_2 < 0.5\}$. For notational simplicity we assume $n$ is even, so that the value of each pixel is either 0 or 1. Define the two regions $P_1 = \{(i,j) : \frac{n}{2} \le j \le \frac{n}{2} + \frac{\Delta_n}{2}\}$ and $P_2 = \{(i,j) : j > \frac{n}{2} + \Delta_n\}$. At least $1/4$ of the pixels in the neighborhood of the pixels in $P_1$ have the noise-free value 1. All pixels in the neighborhood of the pixels in $P_2$ have noise-free pixel values equal to 1. Over each region we will find a lower bound for the risk of SYF and then sum them to obtain a lower bound for the risk over the entire image.

Case I – $(i,j) \in P_1$: From the Jensen inequality we have
$$E\left( x_{i,j} - \frac{\sum_{(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}} w_{m,\ell}\, y_{m,\ell}}{\sum_{(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}} w_{m,\ell}} \right)^2 \ge \left( E\left( \frac{\sum_{(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}} w_{m,\ell}\, y_{m,\ell}}{\sum_{(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}} w_{m,\ell}} \right) \right)^2.$$
Define the following two constants:
$$m_0 = E(w^{SY}_{i,j}(m,\ell) \mid x_{i,j} = 0,\, x_{m,\ell} = 0), \qquad m_1 = E(w^{SY}_{i,j}(m,\ell) \mid x_{i,j} = 0,\, x_{m,\ell} = 1).$$
It is clear that $m_0 > m_1$. Let the event $A$ be
$$A = \left\{ \left| \sum_{(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}} w_{m,\ell} - \sum_{(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}} E w_{m,\ell} \right| \le \Delta_n^{2-\epsilon} \right\} \tag{16}$$
for some $\epsilon > 0$. We have
$$E\left( \frac{\sum w_{m,\ell}\, y_{m,\ell}}{\sum w_{m,\ell}} \right) \ge E\left( \frac{\sum w_{m,\ell}\, y_{m,\ell}}{\sum w_{m,\ell}} \,\middle|\, A \right) P(A) \overset{(a)}{\ge} E\left( \frac{\sum w_{m,\ell}\, y_{m,\ell}}{4\Delta_n^2 m_0 + \Delta_n^{2-\epsilon}} \,\middle|\, A \right) P(A) \ge E\left( \frac{\sum w_{m,\ell}\, y_{m,\ell}}{4\Delta_n^2 m_0 + \Delta_n^{2-\epsilon}} \right) - P(A^c) \overset{(b)}{\ge} \frac{\Delta_n^2\, c_0}{4\Delta_n^2 m_0 + \Delta_n^{2-\epsilon}} - P(A^c),$$
where all summations are over $(m,\ell) \in \mathcal{C}^{\Delta_n}_{i,j}$. Inequality (a) uses Lemma 2 and the fact that $m_0 \ge m_1$. Inequality (b) uses Lemma 1, and therefore $c_0 = \frac{\tau}{\sqrt{\sigma^2+\tau^2}} e^{-\frac{1}{2(\sigma^2+\tau^2)}}$. Since $\mathcal{C}^{\Delta_n}_{i,j}$ has $(2\Delta_n+1)^2$ pixels, at least $\Delta_n^2$ of them have the noise-free pixel value 1. Since $\Delta_n = \Omega(\log n)$, Lemma 2 proves that $P(A^c) = o(1)$ and, therefore, the bias is lower bounded by $\Theta(1)$ for all of the pixels in $P_1$.

Case II – $(i,j) \in P_2$: As mentioned before, all pixels in the neighborhood of the pixels in $P_2$ have noise-free pixel values equal to 1. Hence, we have
$$E\left( x_{i,j} - \frac{\sum w_{m,\ell}\, y_{m,\ell}}{\sum w_{m,\ell}} \right)^2 = E\left( \frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum w_{m,\ell}} \right)^2.$$
Defining the event $A$ as in (16), we have
$$E\left( \left( \frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum w_{m,\ell}} \right)^2 \,\middle|\, A \right) P(A) \ge E\left( \left( \frac{\sum w_{m,\ell}\, z_{m,\ell}}{4\Delta_n^2 m_0 + \Delta_n^{2-\epsilon}} \right)^2 \,\middle|\, A \right) P(A) \ge E\left( \frac{\sum w_{m,\ell}\, z_{m,\ell}}{4\Delta_n^2 m_0 + \Delta_n^{2-\epsilon}} \right)^2 - P(A^c) = \frac{4\Delta_n^2\, E(w_{m,\ell} z_{m,\ell})^2}{(4 m_0 \Delta_n^2 + \Delta_n^{2-\epsilon})^2} - P(A^c).$$
If the neighborhood size is larger than $c \log n$ for some constant $c$, then Lemma 2 implies that $P(A^c) = o\left(\frac{1}{n^2}\right)$. Therefore, the dominant term in the above expression is of the form $\frac{\gamma}{\Delta_n^2}$. Combining the lower bounds for $P_1$ and $P_2$, we obtain a lower bound of the form $\frac{\beta \Delta_n}{n} + \frac{\gamma}{\Delta_n^2}$. Optimizing over $\Delta_n$ proves that
$$\inf_{\Delta_n, \tau} R_n(f, \hat{f}^{SY}) = \Omega(n^{-2/3}).$$
This completes the proof.

It is clear from the proof above that the neighborhood size is the main parameter that controls the decay rate of the risk of the YF. The Gaussian term in the YF weights enables an improvement in the constants but does not play any role in the decay rate.
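The final optimization over the neighborhood size can be checked numerically: minimizing a bound of the form $\frac{\beta\Delta_n}{n} + \frac{\gamma}{\Delta_n^2}$ gives $\Delta_n^* = (2\gamma n/\beta)^{1/3} \propto n^{1/3}$, and the minimum value scales as $n^{-2/3}$. The constants $\beta = \gamma = 1$ below are stand-ins for the (unspecified) constants of the $P_1$ and $P_2$ bounds.

```python
import numpy as np

beta, gamma = 1.0, 1.0  # illustrative constants for the P1 and P2 terms

def bound(delta, n):
    """Risk lower bound of the form beta*delta/n + gamma/delta^2."""
    return beta * delta / n + gamma / delta ** 2

results = []
for n in (1e4, 1e6, 1e8):
    d_star = (2.0 * gamma * n / beta) ** (1.0 / 3.0)  # calculus minimizer
    grid = np.logspace(0.0, np.log10(n), 20001)       # grid-search check
    d_grid = grid[np.argmin(bound(grid, n))]
    scaled = bound(d_star, n) * n ** (2.0 / 3.0)      # should be n-independent
    results.append((n, d_star, d_grid, scaled))
    print(f"n={n:.0e}  delta*={d_star:.1f}  grid={d_grid:.1f}  "
          f"risk*n^(2/3)={scaled:.6f}")
```

With $\beta = \gamma = 1$, the rescaled minimum equals the constant $2^{1/3} + 2^{-2/3} \approx 1.8899$ for every $n$, confirming the $n^{-2/3}$ rate.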
In the extreme case of $\Delta_n = n$, when all of the image pixels can potentially contribute to the estimation of a pixel, the decay rate of the YF degrades to $\Theta(1)$. This algorithm is called the range filter, and [8] observed in practice that it performs much worse than even linear filters, as the above analysis confirms. Interestingly, NLM addresses this issue, and therefore its search space can be the entire image. This is the main reason for its improved performance.

The lower bound proved in Theorem 3 is the same as the upper bound we derived for the performance of linear filtering. Therefore, we have the following theorem.

Theorem 6. The risk of the SYF satisfies
$$\inf_{\Delta_n, \tau} \sup_{f \in H_\alpha(C)} R_n(f, \hat{f}^{SY}) \asymp n^{-2/3}.$$

4.3. Proof of Theorem 4

The proof has two main steps. First, we show that the risk of the pixels far from the edge is $O(\log^{1+2\epsilon}(n)/n^2)$. Second, we show that the risk of the pixels whose $\delta_n$-neighborhood senses the edge is constant; however, there are at most $O(n\delta_n)$ of these pixels. The following two lemmas will play key roles in our analysis.

Lemma 3. Let $Z \sim N(0, \sigma^2)$. For $\lambda < \frac{1}{2\sigma^2}$, we have $E(e^{\lambda Z^2}) = \frac{1}{\sqrt{1 - 2\lambda\sigma^2}}$.

Proof. The proof is a simple integral calculation:
$$E(e^{\lambda Z^2}) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{(\lambda - \frac{1}{2\sigma^2}) Z^2} \, dZ = \frac{1}{\sigma \sqrt{\frac{1}{\sigma^2} - 2\lambda}}.$$

Lemma 4. Let $Z_1, Z_2, \ldots, Z_n$ be iid $N(0,1)$ random variables. The $\chi^2_n$ random variable defined as $\sum_{i=1}^n Z_i^2$ concentrates around its mean with high probability, i.e.,
$$P\left( \frac{1}{n} \sum_i Z_i^2 - 1 > t \right) \le e^{-\frac{n}{2}(t - \ln(1+t))}, \qquad P\left( \frac{1}{n} \sum_i Z_i^2 - 1 < -t \right) \le e^{\frac{n}{2}(t + \ln(1-t))}.$$
(Note that $t + \ln(1-t) < 0$ for $0 < t < 1$, so both exponents are negative.)

Proof. Here we prove just the first claim; the proof of the second claim follows along very similar lines. From Markov's inequality, we have
$$P\left( \frac{1}{n} \sum_{i=1}^n Z_i^2 - 1 > t \right) \le e^{-\lambda t - \lambda}\, E\left( e^{\frac{\lambda}{n} \sum_{i=1}^n Z_i^2} \right) = e^{-\lambda t - \lambda} \left( E\left( e^{\frac{\lambda}{n} Z_1^2} \right) \right)^n = e^{-\lambda t - \lambda} \left( 1 - \frac{2\lambda}{n} \right)^{-\frac{n}{2}}. \tag{17}$$
The last equality follows from Lemma 3. The upper bound proved above holds for any $\lambda < \frac{n}{2}$. To obtain the lowest upper bound we minimize $e^{-\lambda t - \lambda} (1 - \frac{2\lambda}{n})^{-\frac{n}{2}}$ over $\lambda$. The optimal value of $\lambda$ is
$$\lambda^\star = \arg\min_\lambda\ e^{-\lambda t - \lambda} \left( 1 - \frac{2\lambda}{n} \right)^{-\frac{n}{2}} = \frac{nt}{2(t+1)}.$$
Plugging $\lambda^\star$ into (17) proves the result.

Figure 3: An example of a Horizon image. The $\delta_n$-neighborhood of pixel $(i_a, j_a) \in S_4$ does not intersect the edge contour, while the $\delta_n$-neighborhood of pixel $(i_b, j_b) \in S_3$ intersects the edge contour.

Proof of Theorem 4. We will consider the following partition of the image pixels. Let $S = \{1, 2, \ldots, n\} \times \{1, 2, \ldots, n\}$. For a given Horizon function $f_h(t_1, t_2)$, define
$$S_1 = \left\{(i,j) \,\middle|\, \tfrac{j}{n} > h(\tfrac{i}{n}) + \tfrac{2\delta_n}{n}\right\}, \qquad S_2 = \left\{(i,j) \,\middle|\, h(\tfrac{i}{n}) < \tfrac{j}{n} \le h(\tfrac{i}{n}) + \tfrac{2\delta_n}{n}\right\},$$
$$S_3 = \left\{(i,j) \,\middle|\, h(\tfrac{i}{n}) - \tfrac{2\delta_n}{n} \le \tfrac{j}{n} \le h(\tfrac{i}{n})\right\}, \qquad S_4 = \left\{(i,j) \,\middle|\, \tfrac{j}{n} < h(\tfrac{i}{n}) - \tfrac{2\delta_n}{n}\right\}.$$
These regions are displayed in Figure 3. The $\delta_n$-neighborhoods of the pixels in $S_1$ and $S_4$ do not intersect the edge, while the $\delta_n$-neighborhoods of the other pixels may contain pixels from both sides of the edge. For notational simplicity we write $\sum_{(i,j) \in S_\ell}$ for the double summation over $i, j$ where $j$ satisfies the constraints specified for $S_\ell$.

Consider a pixel $(i,j) \in S_1$. The risk of NLM at this pixel is
$$E\left( x_{i,j} - \frac{\sum w_{m,\ell}\, y_{m,\ell}}{\sum w_{m,\ell}} \right)^2,$$
where $x_{i,j} = 0$, since $(i,j) \in S_1$. Define the set of oracle weights
$$w^\star_{m,\ell} = \begin{cases} 1 & \text{if } \frac{\ell}{n} > h(\frac{m}{n}), \\ 0 & \text{otherwise}. \end{cases} \tag{18}$$
Define $U \triangleq \left( \frac{\sum w_{m,\ell}\, y_{m,\ell}}{\sum w_{m,\ell}} \right)^2$, and let the event $A = \{ w_{m,\ell} = w^\star_{m,\ell},\ \forall (m,\ell) \in S_1 \cup S_4 \}$. We then have
$$E(U) = E(U \mid A)\, P(A) + E(U \mid A^c)\, P(A^c) \le E(U \mid A)\, P(A) + P(A^c), \tag{19}$$
where the last inequality is due to the fact that the risk of the estimator is bounded by 1.
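The event $A$ is controlled through the $\chi^2$ concentration of Lemma 4 (via the union bound used below). A quick empirical sanity check of those bounds, with illustrative dimension, deviation, and trial count:

```python
import numpy as np

rng = np.random.default_rng(3)
n, t, trials = 100, 0.5, 50000

# Empirical tail frequencies of the normalized chi-square statistic.
z2 = rng.standard_normal((trials, n)) ** 2
emp_upper = np.mean(z2.mean(axis=1) - 1 > t)
emp_lower = np.mean(z2.mean(axis=1) - 1 < -t)

# Chernoff bounds from Lemma 4 (t + ln(1 - t) < 0 for 0 < t < 1).
bound_upper = np.exp(-n / 2 * (t - np.log(1 + t)))
bound_lower = np.exp(n / 2 * (t + np.log(1 - t)))

print(emp_upper, bound_upper)
print(emp_lower, bound_lower)
```

Both empirical frequencies fall below the corresponding Chernoff bounds, and the lower-tail bound is markedly smaller, reflecting the left-skew of the $\chi^2$ distribution.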
We now calculate each term of (19) separately. Define $S_{14} = S_1 \cup S_4$ and $S_{23} = S_2 \cup S_3$. Then we have
$$E(U \mid A)\, P(A) = E\left( \left( \frac{\sum_{(m,\ell) \in S_{14}} w^\star_{m,\ell}\, y_{m,\ell} + \sum_{(m,\ell) \in S_{23}} w_{m,\ell}\, y_{m,\ell}}{\sum_{(m,\ell) \in S_{14}} w^\star_{m,\ell} + \sum_{(m,\ell) \in S_{23}} w_{m,\ell}} \right)^2 \,\middle|\, A \right) P(A) \le E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, y_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, y_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2$$
$$\le E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, x_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, x_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2 + E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2 + 2 \sqrt{ E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, x_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, x_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2 } \times \sqrt{ E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2 }. \tag{20}$$
The last inequality is due to Cauchy–Schwarz. In the next two lemmas we bound the last three terms of (20).

Lemma 5. Let $w_{m,\ell}$ be the weights of NLM with $\delta_n = \log^{1/2+\epsilon} n$ and $t_n = 2\sigma^2 \log^{-\epsilon/2} n$ for $\epsilon > 0$. Also, let $w^\star_{m,\ell}$ be the oracle weights introduced in (18). Then
$$E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, x_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, x_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2 = O\left( \frac{\delta_n^2}{n^2} \right).$$

Proof. Define $S_f$ as the set of indices of the pixels whose noise-free value is neither zero nor one. Since the images are chosen from the Horizon class, the cardinality of this set is at most $2n$. Plugging in the values of $x_{m,\ell}$, we have
$$E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, x_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, x_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2 \overset{(a)}{=} E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, x_{m,\ell} + \sum_{S_3 \setminus S_f} w_{m,\ell} + \sum_{S_f} w_{m,\ell}\, x_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_3 \setminus S_f} w_{m,\ell} + \sum_{S_2 \setminus S_f} w_{m,\ell} + \sum_{S_f} w_{m,\ell}} \right)^2$$
$$\overset{(b)}{\le} E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, x_{m,\ell} + \sum_{S_3} 1 + \sum_{S_f} w_{m,\ell}\, x_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_3} 1 + \sum_{S_f} w_{m,\ell}} \right)^2 \le E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, x_{m,\ell} + \sum_{S_3} 1 + 2n}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_3} 1} \right)^2 = O\left( \frac{\delta_n^2}{n^2} \right),$$
where Inequality (b) is due to the fact that the expression after Equality (a) is an increasing function of $\sum_{S_3 \setminus S_f} w_{m,\ell}$ and a decreasing function of $\sum_{S_2 \setminus S_f} w_{m,\ell}$. Therefore, we set $w_{m,\ell} = 1$ for $(m,\ell) \in S_3$ and $w_{m,\ell} = 0$ for $(m,\ell) \in S_2$.

Lemma 6. Let $w_{m,\ell}$ be the weights of NLM with $\delta_n = \log^{1/2+\epsilon} n$ and $t_n = 2\sigma^2 \log^{-\epsilon/2} n$ for $\epsilon > 0$. Also, let $w^\star_{m,\ell}$ be the oracle weights introduced in (18). Then we have
$$E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2 = O\left( \frac{1}{n^2} \right).$$

Proof. Since $\sum_{S_{23}} w_{m,\ell} \ge 0$ and we are interested in an upper bound on the risk, we can remove it from the denominator to obtain
$$E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell} + \sum_{S_{23}} w_{m,\ell}} \right)^2 \le E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell} + \sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2$$
$$= E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2 + E\left( \frac{\sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2 + 2\, E\left( \left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right) \left( \frac{\sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right) \right). \tag{21}$$
Since $\frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}}$ is the average of iid random variables, it is not hard to prove that $E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2 = O\left(\frac{\sigma^2}{n^2}\right)$. To bound the other two terms in (21) we use the notation defined in the last section: $\mathcal{C}^\Delta_{m,\ell} = \{(i,j) : |i-m| \le \Delta,\ |j-\ell| \le \Delta\} \cap S$. We also define $E(\cdot \mid \mathcal{C}^\Delta_{m,\ell})$ as the conditional expectation given the variables in $\mathcal{C}^\Delta_{m,\ell}$.
We then have
$$E\left( \frac{\sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2 = E\left( E\left( \left( \frac{\sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2 \,\middle|\, \mathcal{C}^{\delta_n}_{i,j} \right) \right) = E\left( \frac{ E\left( \sum_{(m',\ell') \in S_{23}} \sum_{(m,\ell) \in S_{23}} w_{m,\ell}\, z_{m,\ell}\, w_{m',\ell'}\, z_{m',\ell'} \,\middle|\, \mathcal{C}^{\delta_n}_{i,j} \right) }{ \left( \sum_{S_{14}} w^\star_{m,\ell} \right)^2 } \right)$$
$$= E\left( \frac{ E\left( \sum_{(m',\ell') \in \mathcal{C}^{2\delta_n}_{m,\ell}} \sum_{(m,\ell) \in S_{23}} w_{m,\ell}\, z_{m,\ell}\, w_{m',\ell'}\, z_{m',\ell'} \,\middle|\, \mathcal{C}^{\delta_n}_{i,j} \right) }{ \left( \sum_{S_{14}} w^\star_{m,\ell} \right)^2 } \right) = \frac{ \sum_{(m',\ell') \in \mathcal{C}^{2\delta_n}_{m,\ell}} \sum_{(m,\ell) \in S_{23}} E( w_{m,\ell}\, z_{m,\ell}\, w_{m',\ell'}\, z_{m',\ell'} ) }{ \left( \sum_{S_{14}} w^\star_{m,\ell} \right)^2 } \le O\left( \frac{\delta_n^3}{n^3} \right);$$
terms with $(m',\ell')$ outside $\mathcal{C}^{2\delta_n}_{m,\ell}$ vanish because the corresponding weights and noise variables are independent and zero mean. For the last inequality we have used the Cauchy–Schwarz inequality to prove that $E(w_{m,\ell}\, z_{m,\ell}\, w_{m',\ell'}\, z_{m',\ell'}) \le 3\sigma^2$. Although we could derive a looser bound for $E\left( \frac{\sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2$ and still draw the same conclusion, we used the above technique since we have to use it in the proof of Theorem 7. The last term we have to bound in (21) is
$$E\left( \left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right) \left( \frac{\sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right) \right) \le \sqrt{ E\left( \frac{\sum_{S_{14}} w^\star_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2 } \sqrt{ E\left( \frac{\sum_{S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{S_{14}} w^\star_{m,\ell}} \right)^2 } = O\left( \frac{1}{n^2} \right).$$
This proves the lemma.

Using Lemma 5 and Lemma 6 in (20) proves that
$$E(U \mid A)\, P(A) = O\left( \frac{\delta_n^2}{n^2} \right). \tag{22}$$
Finally, using Lemma 4 and the union bound, it is easy to show that
$$P(A^c) = O\left( \frac{1}{n^2} \right). \tag{23}$$
It is important to note that the constants of this probability are hidden in the $O$ notation. These constants depend on $\epsilon$ and increase as $\epsilon$ decreases. Therefore, we cannot set $\epsilon = 0$. Plugging (23) and (22) into (19) results in
$$E\left( x_{i,j} - \frac{\sum w_{m,\ell}\, y_{m,\ell}}{\sum w_{m,\ell}} \right)^2 = O\left( \frac{\log^{1+2\epsilon}(n)}{n^2} \right) \qquad \forall (i,j) \in S_1.$$
Now consider $(i,j) \in S_2 \cup S_3$. In this region we can bound the error by the worst possible risk, which is 1.
We will discuss the sharpness of this bound in the next section, where we develop a lower bound for the risk. Using the bounds provided above for the risks of the pixels in $S_1$, $S_2$, $S_3$, and $S_4$, we can now calculate the final upper bound for the risk of the NLM as
\[
\sup_{f\in H^\alpha(C)} R(f,\hat f^{NL}) = \frac{1}{n^2}\sum_i\sum_j \mathbb{E}(x_{i,j}-\hat f^N_{i,j})^2
\le \frac{\log^{1+2\epsilon}(n)\,(|S_1|+|S_4|)}{n^4} + \frac{|S_2|+|S_3|}{n^2}
\le O\!\left(\frac{\log^{1/2+\epsilon}(n)}{n}\right).
\]
To derive the last inequality we noted that, since $h(t_1)\in \mathrm{H\ddot{o}lder}^1(1)$, the cardinalities of $S_2$ and $S_3$ are $O(n\,\delta_n) = O(n\log^{1/2+\epsilon}(n))$. This completes the proof of Theorem 4.

4.4. Proof of Theorem 5

Suppose that the parameters of SNLM satisfy assumptions A1–A4. To derive a lower bound, we consider the performance of the SNLM algorithm on the simple image in Figure 2. For notational simplicity we assume that $n$ is even, and hence all of the pixel values are either 0 or 1. The proof follows four main steps:

1. We consider the pixels that are just above the edge, i.e., $(i,\lceil n/2\rceil)$, and prove that the risk of the NLM on these pixels is lower bounded by a constant that does not depend on $n$.

2. Using asymptotic arguments we prove that the probability that a pixel just below the edge passes the threshold $t_n>0$ is larger than $p_0$, where $p_0$ is a nonzero probability independent of $n$. Based on this, we use a concentration argument to prove that $\Theta(n)$ of the pixels just below the edge will pass the threshold with high probability. See the formal statement in Proposition 1.

3. Using symmetry arguments we prove that a pixel that is $\ell < \delta_n/2$ rows⁵ above the edge and a pixel $\ell$ rows below the edge pass the threshold with equal probability. This is formally stated in Lemma 8.

4.
Combining the outcomes of Steps 2 and 3, we show that the risk is minimized if all the pixels just above the edge pass the threshold and the probability that the other pixels pass the threshold is as low as possible. If more zero-valued pixels above the edge pass the threshold, then more pixels with noise-free value 1 will also pass the threshold, and this makes the bias large. Therefore we assume that $p_{n,\ell}$, the probability that a pixel at distance $\ell$ from the edge passes the threshold, is equal to zero for $\ell>1$. However, we have already proven that for $\ell=1$ the probability is larger than $p_0$. Theorem 5 uses this fact to show that the risk of this estimator is larger than a constant independent of $n$.

⁵The $\ell$th row of an image is the set of all pixels of the form $(i,\ell)$.

Proposition 1. Let $j^* = \lceil n/2\rceil$. For any pixel with coordinates of the form $(i^*,j^*)$, there exists a nonzero constant probability $p_0$ such that for any $\delta_n$ and $t_n$,
\[
P\left(\sum_m w_{m,j^*-1} - np_0 < -t\right) \le 4\delta_n\, e^{-\frac{t^2}{4n\delta_n}}.
\]

Proof. For notational simplicity we use $i=i^*$ and $j=j^*$ in the proof. We have
\[
P\big(\bar d^2_{\delta_n}(y_{i,j}, y_{m,j-1}) \le \sigma^2 + t_n\big)
= P\left(\frac{1}{\rho_n^2}\left(\sum_{\ell,p} |x_{i+p,j+\ell} - y_{m+p,j-1+\ell}|^2 - (x_{i,j}-y_{m,j-1})^2\right) \le \sigma^2 + t_n\right)
\]
\[
= P\left(\frac{1}{\rho_n^2}\sum_{\ell,p}\big(s_{\ell,p}^2-\sigma^2\big) - \frac{2}{\rho_n^2}\sum_\ell s_{\ell,0} \le -\frac{1}{\rho_n} + t_n\right)
\ge P\left(\frac{1}{\rho_n^2}\sum_{\ell,p}\big(s_{\ell,p}^2-\sigma^2\big) - \frac{2}{\rho_n^2}\sum_\ell s_{\ell,0} \le -\frac{1}{\rho_n}\right),
\]
where $s_{\ell,p} = z_{m+\ell, j-1+p}$. According to the Berry-Esseen central limit theorem for independent, non-identically distributed random variables [16], we know that
\[
P\left(\frac{1}{\rho_n^2}\sum_\ell\sum_p\big(s_{\ell,p}^2-\sigma^2\big) - \frac{2}{\rho_n^2}\sum_\ell s_{\ell,0} \le -\frac{1}{\rho_n}\right) \ge P(G\le -1) - \frac{C}{\rho_n},
\]
where $G$ is a Gaussian random variable with mean zero and bounded standard deviation. In fact, it is not difficult to confirm that
\[
\mathbb{E}(G^2) = 2\sigma^4 + \frac{8\sigma^2\delta_n - 2\sigma^4}{(2\delta_n+1)^2}.
\]
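For completeness, the stated value of $\mathbb{E}(G^2)$ can be recovered by a direct variance computation. The following sketch is a reconstruction under the assumptions that the center term $\ell=p=0$ is excluded from the first sum (consistent with the subtracted center term in the distance above), that the second sum runs over the $2\delta_n$ offsets $\ell\neq 0$, and that the $s_{\ell,p}$ are iid $N(0,\sigma^2)$ with $\rho_n = 2\delta_n+1$:

```latex
% Variances of the two pieces of the normalized statistic; the
% cross-covariance vanishes because E(z^3) = 0 for Gaussian noise.
\mathrm{Var}\!\left(\frac{1}{\rho_n}\sum_{(\ell,p)\neq(0,0)}\big(s_{\ell,p}^2-\sigma^2\big)\right)
  = \frac{(\rho_n^2-1)\,2\sigma^4}{\rho_n^2}
  = 2\sigma^4-\frac{2\sigma^4}{(2\delta_n+1)^2},
\qquad
\mathrm{Var}\!\left(\frac{2}{\rho_n}\sum_{\ell\neq 0}s_{\ell,0}\right)
  = \frac{4\,(2\delta_n)\,\sigma^2}{\rho_n^2}
  = \frac{8\sigma^2\delta_n}{(2\delta_n+1)^2}.

% Summing the two contributions recovers the stated second moment:
\mathbb{E}(G^2)
  = 2\sigma^4-\frac{2\sigma^4}{(2\delta_n+1)^2}+\frac{8\sigma^2\delta_n}{(2\delta_n+1)^2}
  = 2\sigma^4+\frac{8\sigma^2\delta_n-2\sigma^4}{(2\delta_n+1)^2}.
```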
Since $P(G\le -1)\ge 2p_0$ is nonzero (here $2p_0 = P(G'\le -1)$, where $G'\sim N(0,2\sigma^4)$), for large values of $n$ we can ensure that $C/\rho_n < p_0$ and therefore that $P\big(\bar d^2_{\delta_n}(y_{i,j}, y_{m,j-1}) \le \sigma^2 + t_n\big) > p_0$.

We now prove that, even though the weights are correlated, $\Theta(n)$ of the weights will be equal to 1 with very high probability. Define $u_i = w_{i,j-1}$ and define the process $U = (u_1,\dots,u_n)$. Break this sequence into $2\delta_n$ subsequences
\[
U_i = (u_i,\, u_{i+2\delta_n},\, u_{i+4\delta_n},\, \dots,\, u_{n-2\delta_n+i}).
\]
Each $U_i$ has independent and identically distributed elements. Therefore, according to the Hoeffding inequality, we have
\[
P\left(\left|\sum_{u_j\in U_i} u_j - \frac{n}{2\delta_n}\mathbb{E}(u_i)\right| > t\right) \le 2e^{-\frac{t^2\delta_n}{n}}.
\]
On the other hand, we know that $\mathbb{E}(u_i) > p_0$. Therefore,
\[
P\left(\sum_{u_j\in U_i} u_j < \frac{n}{2\delta_n}p_0 - t\right) \le 2e^{-\frac{t^2\delta_n}{n}}.
\]
Finally, we use the union bound to obtain
\[
P\left(\sum_i u_i - np_0 \le -t\right)
\le P\left(\bigcup_i \left\{\omega : \sum_{u_j\in U_i} u_j - \frac{n}{2\delta_n}p_0 \le -\frac{t}{2\delta_n}\right\}\right)
\le 4\delta_n\, e^{-\frac{t^2}{4n\delta_n}}.
\]

Define the set $J = \{(i,j) : j = \lfloor n\, h(i/n)\rfloor\}$. It is clear that $|J| = n$. The following corollary to Proposition 1 shows that NLM sets the weights of most of the pixels in $J$ to 1.

Corollary 1. Consider the image displayed in Figure 2, and let $\delta_n = O(n^\alpha)$ for $\alpha<1$. For any such $\delta_n$ and any $t_n>0$, $\Theta(n)$ of the pixels in $J$ will pass the threshold $t_n$ with very high probability.

Proof. Set $t = n^{(3+\alpha)/4}$ in Proposition 1.

Remarkably, the above corollary holds in a very general setting, even if the assumptions A1–A4 do not hold. In other words, NLM in its most general form is not able to distinguish the pixels right above the edge from the pixels right below the edge. This is due to the fact that the "signal to noise ratio" in the $\delta_n$-neighborhood distance estimates is too low at the edge pixels.
This is the result of the isotropic neighborhoods used to form the weight estimates.

Lemma 7. If $|m-i^*| > \delta_n/2$ and $|m'-i^*| > \delta_n/2$, then
\[
P\big(\bar d^2_{\delta_n}(y_{i^*,j^*}, y_{m,j^*-\ell}) \le \sigma^2+t_n\big) = P\big(\bar d^2_{\delta_n}(y_{i^*,j^*}, y_{m',j^*-\ell}) \le \sigma^2+t_n\big)
\]
for any $\ell$, $m$, $m'$.

The proof of this lemma is obvious and is skipped here.

Lemma 8. For $\ell < \delta_n/2$,
\[
P\big(\bar d^2_{\delta_n}(y_{i^*,j^*}, y_{m,j^*-\ell}) \le \sigma^2+t_n\big) = P\big(\bar d^2_{\delta_n}(y_{i^*,j^*}, y_{m,j^*+\ell}) \le \sigma^2+t_n\big).
\]
The proof of this lemma follows from symmetry and is also skipped here.

We can now prove Theorem 5, which provides a lower bound for the risk of SNLM.

Proof of Theorem 5. We derive a lower bound for the risk of SNLM on the image displayed in Figure 2. To do so, we consider the pixels just above the edge and prove that the SNLM algorithm has risk $\Theta(1)$ at these pixels. Since there are $\Theta(n)$ such pixels, the risk over the entire image is larger than $\Theta(n^{-1})$.

Consider a pixel $(i^*,j^*)$ with $j^* = \lceil n/2\rceil$; its noise-free value is $f_{i^*,j^*} = 0$, so the risk of the SNLM is bounded below by the squared bias:
\[
\mathbb{E}\left(f_{i^*,j^*} - \frac{\sum\sum w_{m,\ell}\, y_{m,\ell}}{\sum\sum w_{m,\ell}}\right)^2
\ge \left(\mathbb{E}\left(\frac{\sum\sum w_{m,\ell}\, y_{m,\ell}}{\sum\sum w_{m,\ell}}\right)\right)^2. \tag{24}
\]
Note that $w_{m,\ell}$ is independent of $y_{m,\ell}$ according to the construction of the SNLM weights in (10). Let $p_{n,\ell}$ be the probability $P(w_{m,\ell}=1)$ for $\ell\in\{j^*-\delta_n,\, j^*-\delta_n+1,\, \dots,\, j^*+\delta_n\}$. We can partition the row $\{(i,\ell) : 1\le i\le n\}$ into $2\delta_n+1$ subsequences and apply the Hoeffding inequality on each subsequence. Combining the results of the different subsequences with the union bound proves that
\[
P\left(\left|\sum_m w_{m,\ell} - np_{n,\ell}\right| > t\right) \le 4\delta_n\, e^{-\frac{t^2}{4n\delta_n}}. \tag{25}
\]
Define the event $A$ as
\[
A = \left\{\left|\sum_m w_{m,\ell} - np_{n,\ell}\right| < n^{0.66} \quad \forall\,\ell,\ |\ell-j^*|\le\delta_n\right\}.
\]
Using the union bound and (25), we have
\[
P(A^c) \le 8\delta_n^2\, e^{-\frac{n^{1.32}}{4n\delta_n}}.
\]
Any lower bound on the bias of the estimator leads to a lower bound on its risk.
Therefore, we find a lower bound for the bias as follows:
\[
\mathbb{E}\left(\frac{\sum\sum w_{m,\ell}\, y_{m,\ell}}{\sum\sum w_{m,\ell}}\right)
\ge \mathbb{E}\left(\left.\frac{\sum\sum w_{m,\ell}\, y_{m,\ell}}{\sum\sum w_{m,\ell}}\,\right|\, A\right) P(A)
\ge \mathbb{E}\left(\left.\frac{\sum\sum w_{m,\ell}\, y_{m,\ell}}{\sum_\ell np_{n,\ell} + n^{0.66}\delta_n}\,\right|\, A\right) P(A)
\ge \mathbb{E}\left(\frac{\sum\sum w_{m,\ell}\, y_{m,\ell}}{\sum_\ell np_{n,\ell} + n^{0.66}\delta_n}\right) - P(A^c),
\]
where for the last inequality we have used the fact that the risk of SNLM is bounded by 1. Since, from the construction of SNLM in (10), $w_{m,\ell}$ is independent of $z_{m,\ell}$, we have
\[
\mathbb{E}\left(\frac{\sum\sum w_{m,\ell}\, y_{m,\ell}}{\sum_\ell np_{n,\ell} + n^{0.66}\delta_n}\right) - P(A^c)
= \mathbb{E}\left(\frac{\sum\sum w_{m,\ell}\, x_{m,\ell}}{\sum_\ell np_{n,\ell} + n^{0.66}\delta_n}\right) - P(A^c),
\]
and the numerator of the last expectation equals $\sum_{\ell<j^*} np_{n,\ell}$, since $x_{m,\ell}=1$ exactly for the pixels below the edge. By combining this fact with Lemma 8, we obtain that the bias is larger than some constant $c$ independent of $n$.

B4: If $d^2(x_{i,j}, x_{m,\ell}) = 1$, then $\mathbb{E}(w_{i,j}(m,\ell)) = O\!\big(\frac{1}{\sqrt{n}}\big)$. It should be emphasized that a slower decay rate of this expectation results in a slower decay rate for the NLM algorithm.

Theorem 7. If the weight assignment policy in NLM satisfies properties B1–B4, then
\[
\sup_{f\in H^\alpha(C)} R(f,\hat f^N) = O\!\left(\frac{\log n}{n}\right).
\]

Proof. Consider the four partitions $S_1$–$S_4$ defined in the proof of Theorem 4. Our goal is to obtain an upper bound for the risk of the pixels in each region. The risk of the pixels in $S_2$ and $S_3$ will be bounded by the strategy we employed in Theorem 4. Here, we just explain how we bound the risk of the pixels in $S_1$ and $S_4$. Since the proof for $S_4$ is the same as the proof for $S_1$, we consider just $S_1$.

Let $(i,j)\in S_1$. Therefore $x_{i,j}=0$ and
\[
\mathbb{E}(x_{i,j}-\hat f^N_{i,j})^2
= \mathbb{E}\left(\frac{\sum w_{m,\ell}\, y_{m,\ell}}{\sum w_{m,\ell}}\right)^2
= \mathbb{E}\left(\frac{\sum w_{m,\ell}\, x_{m,\ell}}{\sum w_{m,\ell}}\right)^2
+ \mathbb{E}\left(\frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum w_{m,\ell}}\right)^2
+ 2\,\mathbb{E}\left(\frac{\sum w_{m,\ell}\, x_{m,\ell}}{\sum w_{m,\ell}}\cdot\frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum w_{m,\ell}}\right). \tag{26}
\]
To obtain an upper bound for the risk, we will find upper bounds for the last three terms in (26). Lemmas 9 and 10 below summarize the upper bounds.

Lemma 9. Let $w_{m,\ell}$ be the weights of NLM satisfying B1–B4. Then
\[
\mathbb{E}\left(\frac{\sum w_{m,\ell}\, x_{m,\ell}}{\sum w_{m,\ell}}\right)^2 = O\!\left(\frac{1}{n}\right).
\]
Proof.
Define $S_f$ as the set of the indices of the pixels whose noise-free value is neither 0 nor 1, and plug in the actual values of $x_{m,\ell}$ to obtain
\[
\mathbb{E}\left(\frac{\sum_{(m,\ell)\in S_1\cup S_4} w_{m,\ell} x_{m,\ell} + \sum_{(m,\ell)\in S_2\cup S_3\setminus S_f} w_{m,\ell} x_{m,\ell} + \sum_{(m,\ell)\in S_f} w_{m,\ell} x_{m,\ell}}{\sum_{(m,\ell)\in S_1\cup S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_2\cup S_3\setminus S_f} w_{m,\ell} + \sum_{(m,\ell)\in S_f} w_{m,\ell}}\right)^2
\]
\[
\le \mathbb{E}\left(\frac{\sum_{(m,\ell)\in S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_3\setminus S_f} w_{m,\ell} + \sum_{(m,\ell)\in S_f} w_{m,\ell} x_{m,\ell}}{\sum_{(m,\ell)\in S_1\cup S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_2\cup S_3} w_{m,\ell} + \sum_{(m,\ell)\in S_f} w_{m,\ell}}\right)^2
\le \mathbb{E}\left(\frac{\sum_{(m,\ell)\in S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_3}\alpha + 2n\alpha}{\sum_{(m,\ell)\in S_1\cup S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_3}\alpha}\right)^2. \tag{27}
\]
To derive the last inequality we use the following facts, which are easy to check:

1. The expression on the left hand side of (27) is an increasing function of $\sum_{(m,\ell)\in S_3\setminus S_f} w_{m,\ell}$.

2. The expression on the left hand side of (27) is a decreasing function of $\sum_{(m,\ell)\in S_2\setminus S_f} w_{m,\ell}$.

3. $|S_f| \le 2n$; i.e., $S_f$ contains at most $2n$ pixels.

Our next claim is that $\sum_{(m,\ell)\in S_4} w_{m,\ell}$ and $\sum_{(m,\ell)\in S_1} w_{m,\ell}$ concentrate around their means. We establish this in a manner very similar to the proof of Theorem 5. We first break $\sum_{(m,\ell)\in S_4} w_{m,\ell}$ into $(4\delta_n+2)^2$ subsequences such that each subsequence contains only independent random variables; in other words, if $x_{m,\ell}$ is in one summation, then no other element of $C^{4\delta_n+2}_{m,\ell}$ will be in that summation. Therefore, for each summation we can apply the Hoeffding inequality.
Finally, we use the union bound as explained in the proof of Theorem 5 to show that
\[
P\left(\left|\sum_{(m,\ell)\in S_1} w_{m,\ell} - \sum_{(m,\ell)\in S_1}\mathbb{E}(w_{m,\ell})\right| > t\right)
\le 2(4\delta_n+2)^2 \exp\!\left(\frac{-2t^2}{(4\delta_n+2)^4\,\sum_{(m,\ell)\in S_1}\alpha^2}\right),
\]
\[
P\left(\left|\sum_{(m,\ell)\in S_4} w_{m,\ell} - \sum_{(m,\ell)\in S_4}\mathbb{E}(w_{m,\ell})\right| > t\right)
\le 2(4\delta_n+2)^2 \exp\!\left(\frac{-2t^2}{(4\delta_n+2)^4\,\sum_{(m,\ell)\in S_4}\alpha^2}\right).
\]
It is straightforward to prove that by setting $t = 32\alpha n\log^{2.5}(n)$ we have
\[
P\left(\left|\sum_{(m,\ell)\in S_1} w_{m,\ell} - \sum_{(m,\ell)\in S_1}\mathbb{E}(w_{m,\ell})\right| > 32\alpha n\log^{2.5}(n)\right) \le O\!\left(\frac{\delta_n^2}{n^8}\right),
\]
\[
P\left(\left|\sum_{(m,\ell)\in S_4} w_{m,\ell} - \sum_{(m,\ell)\in S_4}\mathbb{E}(w_{m,\ell})\right| > 32\alpha n\log^{2.5}(n)\right) \le O\!\left(\frac{\delta_n^2}{n^8}\right). \tag{28}
\]
Define the event $F$ as
\[
F = \left\{\left|\sum_{(m,\ell)\in S_1} w_{m,\ell} - \sum_{(m,\ell)\in S_1}\mathbb{E}(w_{m,\ell})\right| < 32\alpha n\log^{2.5} n\right\}
\cap \left\{\left|\sum_{(m,\ell)\in S_4} w_{m,\ell} - \sum_{(m,\ell)\in S_4}\mathbb{E}(w_{m,\ell})\right| < 32\alpha n\log^{2.5} n\right\}.
\]
It is clear from (28) that
\[
P(F^c) = O\!\left(\frac{\delta_n^2}{n^8}\right). \tag{29}
\]
Using (27), (28), and (29), we have
\[
\mathbb{E}\left(\frac{\sum_{(m,\ell)\in S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_3\setminus S_f}\alpha + 2n\alpha}{\sum_{(m,\ell)\in S_1\cup S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_3\setminus S_f}\alpha}\right)^2
\le \mathbb{E}\left(\left.\left(\frac{\sum_{(m,\ell)\in S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_3\setminus S_f}\alpha + 2n\alpha}{\sum_{(m,\ell)\in S_1\cup S_4} w_{m,\ell} + \sum_{(m,\ell)\in S_3\setminus S_f}\alpha}\right)^2\,\right|\, F\right) P(F) + P(F^c)
\le O\!\left(\frac{1}{n}\right).
\]
The last inequality is due to Assumptions B3 and B4. This completes the proof of the lemma.

Lemma 10. Let $w_{m,\ell}$ be the weights of NLM with $\delta_n = \log(n)$, and assume that the weights are set according to B1–B4. We then have
\[
\mathbb{E}\left(\frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum w_{m,\ell}}\right)^2 = O\!\left(\frac{\log^2(n)}{n^2}\right).
\]

Proof. We first condition on the event $F$ introduced in the proof of Lemma 9:
\[
\mathbb{E}\left(\frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum w_{m,\ell}}\right)^2
\le \mathbb{E}\left(\left.\left(\frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum w_{m,\ell}}\right)^2\,\right|\, F\right) P(F) + P(F^c)
\le \mathbb{E}\left(\left.\left(\frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum\mathbb{E}(w_{m,\ell}) - 32\alpha n\log^{2.5}(n)}\right)^2\,\right|\, F\right) P(F) + P(F^c)
\le \mathbb{E}\left(\frac{\sum w_{m,\ell}\, z_{m,\ell}}{\sum\mathbb{E}(w_{m,\ell}) - 32\alpha n\log^{2.5}(n)}\right)^2
+ P(F^c)
\le O\!\left(\frac{\log^2(n)}{n^2}\right).
\]
The last inequality is due to the fact that
\[
\mathbb{E}\left(\left.\left(\sum_{(m,\ell)\in S_{14}} w_{m,\ell}\, z_{m,\ell}\right)^2\,\right|\, C^{\delta_n}_{i,j}\right)
= \mathbb{E}\left(\left.\sum_{(m,\ell)\in S_{14}}\sum_{(m',\ell')\in S_{14}} w_{m,\ell} z_{m,\ell}\, w_{m',\ell'} z_{m',\ell'}\,\right|\, C^{\delta_n}_{i,j}\right)
= \sum_{(m,\ell)\in S_{14}}\sum_{(m',\ell')\in C^{2\delta_n}_{m,\ell}} \mathbb{E}\big(w_{m,\ell} z_{m,\ell}\, w_{m',\ell'} z_{m',\ell'}\mid C^{\delta_n}_{i,j}\big)
= O(n^2\delta_n^2).
\]
Therefore,
\[
\mathbb{E}\left(\frac{\sum_{(m,\ell)\in S_{23}} w_{m,\ell}\, z_{m,\ell}}{\sum_{(m,\ell)\in S_{14}} w_{m,\ell}}\right)^2 \le O\!\left(\frac{\delta_n^2}{n^2}\right). \tag{30}
\]

Using the bounds derived in Lemmas 9 and 10, we can complete the proof of the main theorem:
\[
\sup_{f\in H^\alpha(C)} R(f,\hat f^{NL}) = \frac{1}{n^2}\sum_i\sum_j \mathbb{E}(x_{i,j}-\hat f^N_{i,j})^2
\le \frac{\log^2(n)\,(|S_1|+|S_4|)}{n^4} + \frac{|S_2|+|S_3|}{n^2}
\le O\!\left(\frac{\log(n)}{n}\right).
\]

6. Discussion

We have provided the first asymptotic result on the risk analysis of the nonlocal means (NLM) algorithm on smooth images with sharp edges. In contrast to most other filtering approaches, NLM does not consider the spatial vicinity of the pixels as a feature for setting the weights; instead, it exploits more global features, which leads to improved performance. In spite of this success, we have shown that the performance of NLM is within a logarithmic factor of that of wavelet thresholding and still significantly below the optimal achievable rate. This is due to the fact that the isotropic nature of the NLM neighborhoods does not allow the algorithm to discriminate the pixels that are close to but below the edge from the pixels that are close to but above the edge. This leads to a blurring effect that results in high bias along the edge. Exploring the performance of NLM with anisotropic neighborhoods may address this issue and is left for future research.

7.
Acknowledgements

This work was supported by Grants NSF CCF-0431150, CCF-0728867, and CCF-0926127; DARPA/ONR N66001-08-1-2065, N66001-11-1-4090, and N66001-11-C-4092; ONR N00014-08-1-1112 and N00014-10-1-0989; AFOSR FA9550-09-1-0432; ARO MURI W911NF-07-1-0185 and MURI W911NF-09-1-0383; and by the Texas Instruments Leadership University Program.

References

[1] L. I. Rudin, S. Osher, E. Fatemi, Physica D: Nonlinear Phenomena 60 (1992) 259–268.
[2] D. L. Donoho, I. M. Johnstone, Annals of Statistics 26 (1998) 879–921.
[3] A. Korostelev, A. Tsybakov, Minimax Theory of Image Reconstruction, Lecture Notes in Statistics, Springer-Verlag, 1993.
[4] D. L. Donoho, Annals of Statistics 27 (1999) 859–897.
[5] E. Arias-Castro, D. L. Donoho, Annals of Statistics 37 (2009) 1172–1206.
[6] L. Yaroslavsky, Digital Image Processing: An Introduction, Springer-Verlag, 1985.
[7] S. M. Smith, J. M. Brady, International Journal of Computer Vision 23 (1997) 45–78.
[8] C. Tomasi, R. Manduchi, in: International Conference on Computer Vision, 1998, pp. 839–846.
[9] J.-S. Lee, Computer Vision, Graphics, and Image Processing 24 (1983) 255–269.
[10] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, CA, 1997.
[11] E. Candes, D. L. Donoho, Curvelets: A Surprisingly Effective Nonadaptive Representation of Objects with Edges, Technical Report, 1999.
[12] G. Kutyniok, D. Labate, Journal on Wavelet Theory and Applications 1 (2007) 1–10.
[13] M. N. Do, M. Vetterli, IEEE Transactions on Image Processing 14 (2005) 2091–2106.
[14] A. Buades, B. Coll, J. M. Morel, SIAM Journal on Multiscale Modeling and Simulation 4 (2005) 490–530.
[15] J. M. Broder, New York Times (2011).
[16] C. Stein, Approximate Computation of Expectations, Institute of Mathematical Statistics, 1986.
