A nonlinear Stein based estimator for multichannel image denoising

1 A Nonlinear Stein Based Estimator f or Multichannel Image Denoising Caroline Chaux, Member IEEE , Laurent Duval, Member IEEE, Amel Benazza-B enyahia, Member IEEE, and Jean-Christophe Pesquet, Senior Member IEEE Abstract The use o f m ulticompo nent images h as beco me widespr ead with the improvement of multisensor systems having i ncreased spatial and spectral reso lutions. Ho wev er , the observed images are often corrup ted by an additive Gaussian noise. In th is paper, we are interested in multichann el image denoising based on a multiscale representatio n of the images. A multivariate statistical ap proach is a dopted to take into a ccount both the spatial an d th e inter-componen t cor relations existing betwe en the different wa velet subband s. Mo re pr ecisely , we propose a ne w parametric no nlinear estimator which generalizes many reported deno ising methods. The d eriv ation of the optimal p arameters is achieved by applying Stein’ s principle in the multiv ariate case. Experiments perform ed on mu ltispectral remote sensing images clearly indicate that our me thod outperfo rms conventional wav elet deno ising tech niques. Index T erms Multicompo nent image, multichannel noise, denoising, multi variate estimation, block esti mate, Stein’ s principle, non linear estimation, thresholding, M -band wa velet tran sform, du al-tree wavelet transfor m, frames. C. Chaux and J.-C. Pesquet are with the Institut Gaspard Monge and CNRS -UMR 8049, Université de Paris-Est, 77454 Marne-la-V allée Cedex 2, France. E-mail: {chaux,pesquet}@u niv-mlv.fr , phone: +33 1 60 95 77 39. L. Duv al is with the Institut F rançais du Pét role (IFP), T echnology , Computer Science and Applied M athematics Division, 92500 Rueil-Malmaison, France. E-mail: laurent.duval@ifp .fr , phone: +33 1 47 52 61 02. A. Benazza-Beny ahia i s with the Unité de Recherche en Imagerie Satellitaire et ses Applications (URISA), École Supérieure des Communications de Tunis, Route de R aoued 3.5 Km, 2083 Ariana, Tu nisia. E- mail: benazza.amel@supc om.rnu.tn , phone: +216 1755621. May 30, 2018 DRAFT 2 I . I N T R O D U C T I O N Many real world imag es are co ntaminated by noise during their a cquisition and/or trans mission. In particular , multichannel imaging is prone to quality degradation due t o the imperfectness of the sensors often operating in dif fere nt spectral ranges [1], [2]. In order to alleviate the inﬂ uence of suc h d isturbing artif acts on subseq uent a nalysis proc edures, de noising app ears as a c rucial initial step in multicomponent image enh anceme nt. In this context, attention ha s bee n paid to dev eloping ef ﬁcient den oising methods . Usually , the noise remov al p roblem is considered as a regression problem. The challenge thus resides in ﬁnding realistic statistical models which lea d to both efﬁcient an d tractable deno ising approac hes. T o this respect, linearly transforming the signal fr om the spatial domain to a more suitable one may drastically improve the d enoising pe rformance. The rationale for such a transformation is the observation that some represe ntations posse ssing goo d energy concen tration and decorrelation prope rties tend to simplify the statistical analysis of many natural imag es. For instance, the Disc rete W avelet Transform (DWT) constitutes a powerful tool for image d enoising [3], [4]. The DWT , c omputed for each c hannel/comp onent separately , usua lly yields “larger” coe f ﬁcients for signal features and “smaller” ones for no ise since it forms an u nconditional basis for s ev eral classes of regular signals [5]. For mono channe l signals or image s, the seminal work of Donoho and Johns tone has shown tha t a me re wa velet c oefﬁcient thresholding constitutes a simple ye t e f fecti ve technique for noise redu ction [6]. Based on Stein’ s Unbias ed Risk Estimator (SURE), they have propos ed the SUREshrink technique [7]. Subse quently , several e xtensions of their work have be en performed, e.g. in [8]–[11]. Recen tly , the de noising problem in the wavelet domain has gained more attention in the ca se of multichanne l images. Indee d, the increa sing need for multicomponent images in several applications s uch as medical imaging and remote sen sing has moti vated a grea t interest in de signing tractable d enoising methods ded icated to this kind of images . Componentwise process ing c an be perfor med for each modality , b ut a joint denoising shou ld be preferred in order to exploit the cross-chan nel s imilariti es in an efﬁcient way [12]. The problem of a joint estimation in the wa velet domain has b een formulated in [13]. More precise ly , the us e of joint threshold e stimators was in ves tigated in two situations: overcomplete represe ntations of a noisy image [14] and multiple obs ervations of the same image [13]. A s cale-adap ti ve wav elet thresholding was d esigned for multichanne l images in the c ase of an i.i.d. (independe nt identically d istrib ute d) Ga ussian vec tor noise whose c omponents are indepe ndent and ha ve the same variance [15]. In a Bayes ian framework, several prior models ha ve been considered such as multivariate Be rnoulli-Gaussian on es [16]. A generalized Gau ssian distributi on was also con sidered for modelling the marginal distributi on of each subb and in each channe l and a simple s hrinkage was app lied May 30, 2018 DRAFT 3 depend ing on the loc al spec tral activi ty [17]. A vector -base d leas t-square approa ch was also in vestigated in the wavelet domain [18]. Rec ently , the application o f Stein’ s principle [19]–[21 ] in the multiv a riate case has motiv a ted the design of a no nlinear estimator in [22]. In this paper , links existing between the propos ed nonlinear e stimator an d Bayes ian a pproache s were discussed. In pa rticular , the structure of the estimator w as moti vated by a multi variate Bernoulli-Gaussian mode l r eﬂecting the sparseness o f the wa velet representation a s well as the statistical dep endenc ies existing b etween the dif ferent componen ts. W e point out that the form of the estimator in [22 ] is not the s ame as the on e p roposed in this pap er . In particular , the estimator i n [22] does not i n volve any thresholding operation. Moreover , the estimator does n ot a llo w to take into ac count spa tial de penden cies but on ly those existing be tween the multichan nel data at a gi ven position. In parallel to these works, the idea of pe rforming a joint spatial denoising of the coefﬁcients, rather than using a con ventional t erm-by-term proces sing, ha s emer ged in statistics. This idea, stemming from an ince nti ve for cap turing statistical depe ndence s b etween spa tial neighboring wavelet coef ﬁc ients, was ﬁrst inv e stigated for single c omponent image s in both no n-Bayesian and Baye sian c ases [23 ], [24 ]. A succe ssful extension was also carried out in the ca se of multichann el imag es by c onsidering hybrid (spectral and spatial) neighborhoods [25 ]. In this paper , we aim at building a new estimator allowing t o take into account the vari ous correlations existing in multichan nel image data. This estimator also pro vides a unifying framework for several denoising methods proposed in the literature. More precisely , our co ntrib u tions are the foll owing. • The metho d ap plies to a ny vector-v alued data embedde d in a multi variate Gauss ian no ise. As illustrated later on, many examp les of such mu lti variate co ntexts (inter -compo nent, spatial a nd inter-scale) c an be found. They na turally include multiv a riate denoising obtained with vectors of samples s haring the same spatial position in dif ferent chann els. • The e stimator can be computed in any image repres entation domain. For instan ce, in ad dition to wavelet domains, usually considered in con ventional denoising methods, we propose to exploit more general frame decompo sitions suc h as t he dual-tree wa velet transform [26], [ 27]. • The co mputation of the es timated value can be p erformed with the help o f various o bservations. Again, our method include s most of the reported estimation me thods acting in that way . Furthermore, it off ers a great ﬂexibili ty in the c hoice of these a uxiliary data. • The form of the proposed estimator is q uite gene ral. More prec isely , we focus on deriving thres holding estimators inc luding an expon ent parameter and a linear p art. Optimal pa rameters are de ri ved from S tein’ s principle. May 30, 2018 DRAFT 4 • The den oising approach allo ws to handle any covariance matrix between the multichann el noise compone nts. Notwithstanding it s generality , the propose d approach rema ins tracta ble a nd compares quite f avorably with state-of-the-art methods. The paper is o r ganized as follo w s. In Section II, we present the rele vant backgrou nd and intr oduce notations for a g eneral formulation of the estimator , based on the conce pt of Reference Observ ation V ector . In Section III, we d escribe the propos ed multi variate nonlinear estimator . In Sec tion IV, w e give the s peciﬁc form taken b y this new e stimator for multichannel image s deco mposed by a wavelet transform or an M -band dual-tree wav elet trans form. In Section V , experimental results are giv e n for remote s ensing image s showing that the proposed es timator outpe rforms existing on es and some concluding remarks are d rawn in Se ction VI. Throughou t this paper , the follo w ing notations will be use d: let M be an integer greater than or equ al to 2, N M = { 0 , . . . , M − 1 } and N ⋆ M = { 1 , . . . , M − 1 } ; Z , R an d R + are the sets of integers, reals and positiv e reals; ⌈ . ⌉ de notes roun ding towards the immediate upper integer . Besides, b a denotes the Fourier transform of a function a , ( δ m ) m ∈ Z is the Kron ecker se quence (equal to 1 if m = 0 and 0 othe rwise), ( f ) + = f if f > 0 and 0 otherwise and 1 { A } = 1 if cond ition A is true and 0 otherwise. I I . B AC K G RO U N D A. Gen eral formu lation of the multichannel estimator In multisensor imaging, B vectors of observed data samples ( r ( 1 ) ( k )) k ∈ K , . . . , ( r ( B ) ( k )) k ∈ K , are provided where B is the number of e f fec ti ve se nsors and K is a se t of sp atial indices ( K ⊂ Z 2 ). Gen erally , t hese data correspon d to noisy realizations of B unknown s ignals ( s ( 1 ) ( k )) k ∈ K , . . . , ( s ( B ) ( k )) k ∈ K , respectively . Subseq uently , our task will co nsist in devising method s to reduce the noise present in the obs ervations. T wo a lternati ves ca n be en visag ed in this context. On the one hand, a monochann el approac h b uilds an es timator f s ( b ) ( k ) o f s ( b ) ( k ) o nly from the obs ervations ( r ( b ) ( k )) k ∈ K , for each ch annel b ∈ { 1 , . . . , B } . On the o ther hand, a multiv ariate technique attempts to e stimate s ( b ) ( k ) by accounting not only for the individual data set { r ( b ) ( k ) } k ∈ K , but also for the remaining one s { r ( 1 ) ( k ) } k ∈ K , . . . , { r ( b − 1 ) ( k ) } k ∈ K , { r ( b + 1 ) ( k ) } k ∈ K , . . . , { r ( B ) ( k ) } k ∈ K . Thus, o ne of the simples t relevant denoising a pproach cons ists in calculating the estimated value f s ( b ) ( k ) of s ( k ) as f s ( b ) ( k ) = f ( r ( b ) ( k )) (1) where f is a scalar function deﬁne d on the rea l line. For instance , a shrinkage fun ction can be used, possibly in volving s ome threshold value. Such a technique is commonly used in regression, whe n ou tliers May 30, 2018 DRAFT 5 have to be removed in order to improve the represe ntati vity of the ﬁ t [28]. Although r ( b ) ( k ) does no t neces sarily depend on other obse rved samples, for structured signal or image analysis, neighboring samples often presen t s ome correlations. Conseq uently , an impro vement can be expected if f s ( b ) ( k ) is calculated with the help of a subse t R ( b ) ref ( k ) o f obse rved sample locations. A verage o r me dian ﬁltering [29, p. 243–245 ] are exa mples where the estimated sample depends on its neighborho od. As a r esult, a more general estimation rule is: f s ( b ) ( k ) = f  ( r ( b ) ( k ′ )) k ′ ∈ R ( b ) ref ( k )  . (2) W ith underlying Markovian assu mptions, the context s et { r ( b ) ( k ′ ) } k ′ ∈ R ( b ) ref ( k ) can b e restricted to a limited number of values around the samp le loca tion k . The se v alue s c an be ga thered in a vector r ( b ) ( k ) which will be designated as the Reference Obs ervation V ector (R O V). W e have then f s ( b ) ( k ) = f ( r ( b ) ( k )) . (3) The mu lti variate c ase ca n also be d escribed by such a formula if we allow the RO V to con tain add itional samples from the r emaining chann els in order to exploit the inter-component statisti cal depend encies. Another d egree of freedom lies in the choice of a suitable domain for data representation. While virtually any transform can b e c hosen, special attention has been pa id to multiscale transforms. For example, if a decomposition onto an M -band wa velet b asis ( M ≥ 2) [30] is p erformed, the observed images are represented by c oefﬁcients r ( b ) j , m ( k ) deﬁne d at resolution level j ≥ 1 a nd subb and index m ∈ N 2 M and the correspond ing R O V will b e deno ted r ( b ) j , m ( k ) . Since the noise is usu ally less c orrelated than the data, the DWT is app lied in order to provide a spa rser repres entation of the data of interest, before further analysis [3], [4]. The goal become s to generate estimates f s ( b ) j , m ( k ) o f the unknown wa velet coefﬁcients s ( b ) j , m ( k ) of t he original images: f s ( b ) j , m ( k ) = f ( r ( b ) j , m ( k )) . (4) Then, the in verse DWT is applied to the estimated c oefﬁcients in order to reconstruct the e stimated s ignal f s ( b ) ( k ) in the spatial d omain. In the li terature c oncerning denoising , two key issues have bee n address ed. The ﬁrst on e lies in the de ﬁnition of the R O V . The s econd one conc erns the ch oice of an appropriate function f o r , in othe r words, a suitable expres sion of the estimator . In the next subse ction, we giv e a brief ov erview of the main R O Vs propos ed until no w . May 30, 2018 DRAFT 6 B. Rep orted R O Vs in the D WT doma in Popular compo nentwise me thods operating in the D WT domain are V is ushrink [5] and SUREsh rink [7]. They b oth employ a very basic R O V reduced to a scalar v a lue: r ( b ) j , m ( k ) = r ( b ) j , m ( k ) . (5) Similarly to what can b e d one in the sp atial doma in, the wa velet coefﬁcients can also be proce ssed by block rather than indi v idually , again in a mono-chann el way [23], [24], [31]–[33]. The main motiv ation for this tec hnique is to exploit the s patial similarities be tween ne ighboring coefﬁcients in a given subban d. The introduc tion of d − 1 spatial ne ighbors k 1 ,. . . , k d − 1 of the current s ample indexed by k in the R O V allows to take into account the s patial dependencies : r ( b ) j , m ( k ) = [ r ( b ) j , m ( k ) , r ( b ) j , m ( k 1 ) , . . . , r ( b ) j , m ( k d − 1 )] ⊤ . (6) For higher dimens ional data, the R O V may also c onsist of c oefﬁcients sharing similar orientations, possibly within d if feren t scales [34]. Another generalization of the scalar c ase takes into accoun t the inter -scale similarities b etween the current co efﬁcient and the homologous ones deﬁne d at other scales . Based on empirical observations in image compression [35], it h as been proposed to us e the c urrent coefﬁcient ance stors at coa rser sca les j + 1, j + 2, . . . , j m ev entually u p to the coarses t level J [36]–[38]: the R O V r ( b ) j , m ( k ) thus includes t he correspon ding j m − j + 1 coefﬁcients at location k , in subba nd m , at resolution le vel j . In the cas e of multicomponent d ata, add itional samples borrowed from the dif ferent ch annels can b e included in the R O Vs, as shown in [34], [39] for color imag e as we ll as for multispectral image denoising. Basically , the inter -componen t correlations can be taken into account through the following R O V [22 ]: r ( b ) j , m ( k ) = [ r ( 1 ) j , m ( k ) , . . . , r ( B ) j , m ( k )] ⊤ . (7) Such a RO V includes all the coefﬁcients o f all ch annels at the same spatial loca tion, in the s ame subb and m a nd a t the s ame res olution lev e l j . In [25], a more soph isticated multicomponen t R O V r ( b ) j , m ( k ) has been deﬁned which c ombines both spatial and multichanne l neigh bors. As particular ca ses, su ch an R O V en compass es the R O V in (7) a nd, also the R O V in (6). In addition, the R O V may include coef ﬁcients fr om dif feren t subbands. A ﬁnal potential extension of the R O Vs is rela ted to the choice of the transform. Indee d, it has been l ong observed that a decomposition onto a wa velet basis suffers from a lack of shift-in variance as well as a p oor directionality , resu lting in denoising a rtif a cts at low signal to n oise ratios. A simple way for alleviating May 30, 2018 DRAFT 7 these problems is to use a frame decompos ition built from a union o f wa velet bas es. In particular , a number of papers [40]–[42] ha ve demonstrated signiﬁcant improvements in scalar shrinkage when resorting to a translation-in variant wav e let representa tion. The latter can be viewed as a deco mposition onto a union of shifted versions of a unique wavelet basis. M -band dua l-tree wavelet dec ompositions [27] constitute another exa mple o f a union of 2 (resp . 4) wavelet base s in the real (resp. complex) case. T he correspond ing mother wav e lets are the n deriv ed from the ﬁ rst one by Hilbert transforms, wh ich res ults in an impro ved directiona l analysis. F or such fr ame decompo sitions, one can extend the no tion of R O V to includ e samples produce d by the diff erent wavelet basis dec ompositions ope rating in pa rallel. The se facts will be further developed to moti vate the app lication of the gene ral estimator p roposed in this pa per to an M -band dual-tree w av elet fr ame [27]. C. A unifying framework for shrink age functions In the a forementioned works, the estimation is often performed by shrinkage , so exp loiting the s parse- ness of the represen tation. The most well-kno wn metho d was proposed in the pioneering works of Donoho and Johnstone [5]. T he estimating function f is then given by f ( r ( b ) j , m ( k )) = s ign ( r ( b ) j , m ( k )) max {| r ( b ) j , m ( k ) | − λ , 0 } (8) for a soft thres holding with thres hold value λ ≥ 0, where s ign ( · ) is the signum func tion. Equ i valently , by using the R O V in (5), the estimating function can b e expressed as f ( r ( b ) j , m ( k )) = | r ( b ) j , m ( k ) | − λ | r ( b ) j , m k ) | ! + r ( b ) j , m ( k ) . (9) Some works [43] have focus ed on the improvement of the s calar s hrinkage rule, yielding for instance smoother functions such as the ga rrote shrinkage ba sed on [44 ], which is deﬁned a s: f ( r ( b ) j , m ( k )) = | r ( b ) j , m ( k ) | 2 − λ | r ( b ) j , m ( k ) | 2 ! + r ( b ) j , m ( k ) . (10) Several authors h av e propos ed vector -like generalizations to the scalar s hrinkage. Cai a nd Silverman [23], hav e proposed a bloc k estimator which takes into acc ount information on neighboring co efﬁcients in eac h subband , as expressed as in (6). Th is estimator domina tes the maximum likelihood e stimator when the b lock size is g reater than 2. This method, na med “Ne ighBlock”, consists in ap plying the following shrinkage rule: f s ( b ) j , m ( k ) = k r ( b ) j , m ( k ) k 2 − ¯ λ d σ 2 k r ( b ) j , m ( k ) k 2 ! + r ( b ) j , m ( k ) (11) May 30, 2018 DRAFT 8 where ¯ λ > 0, d is the number of c omponents in the R O V , r ( b ) j , m ( k ) is a sub part of the R O V , f s ( b ) j , m ( k ) is the assoc iated vector of e stimated values, k . k de notes the classical Euclidean n orm o f R d and σ 2 denotes the noise v arianc e. S uch a function is clearly reminiscent of the s calar garrote shrinka ge deﬁne d in (10). Based o n an a symptotic minimax s tudy , Cai and Silverman sugge sted approp riate values for ¯ λ and d . They c onsidered both overlapping and n on-overlapping variants of this ap proach. In pa rticular , the s o- called “Ne ighCoeff ” method correspond s to the cas e when f s ( b ) j , m ( k ) red uces to a s calar es timate. Then, the correspond ing estimating function is: f ( r ( b ) j , m ( k )) = k r ( b ) j , m ( k ) k 2 − ¯ λ d σ 2 k r ( b ) j , m ( k ) k 2 ! + r ( b ) j , m ( k ) . (12) In the mea ntime, ¸ Sendur a nd Selesnick [45] introduced a Bayesian ap proach all owing to model inter- scale d epende ncies between t wo c onsec uti ve levels. These authors con sequen tly formulated the problem in the 2-ba nd wa velet domain. In their approach, the R O V is gi ven by r ( b ) j , m ( k ) = [ r ( b ) j , m ( k ) , r ( b ) j + 1 , m ( ⌈ k 2 ⌉ )] ⊤ , r ( b ) j + 1 , m ( ⌈ k 2 ⌉ ) being the “paren t” of r ( b ) j , m ( k ) (at the next c oarser res olution). By cons idering as a prior model the non-Gauss ian biv a riate probability dens ity f unction p ( s ( b ) j , m ( k ) , s ( b ) j + 1 , m ( ⌈ k 2 ⌉ )) ∝ e xp  − √ 3 σ s r   s ( b ) j , m ( k )   2 +   s ( b ) j + 1 , m ( ⌈ k 2 ⌉ )   2  , σ s > 0 (13) the follo wing Maximum A Posteriori (MAP) estimator was de ri ved: f ( r ( b ) j , m ( k )) = k r ( b ) j , m ( k ) k − √ 3 σ 2 σ s k r ( b ) j , m ( k ) k ! + r ( b ) j , m ( k ) (14) where the noise variance is again denoted b y σ 2 . More recently , in the context of signal restoration problems, Combe ttes and W ajs [46] hav e studied the properties o f proximity operators corresp onding to the solutions of some c on vex regularization prob lems. In particular , an interpretation o f o ne o f their results is the followi ng. L et us adopt a Baye sian formulation by assuming t hat the vector r ( b ) j , m ( k ) is a noisy observation of the vector s ( b ) j , m ( k ) o f multi channe l coe f ﬁcients at location k , embedded in wh ite Gaussian noise with variance σ 2 . Further assume that the vectors s ( b ) j , m ( k ) are ind epende nt of the noise, mutually i ndepen dent an d ha ve a prior distrib ution p roportional to exp ( − λ k · k ) with λ > 0. The MAP es timation o f s ( b ) j , m is found by solving the optimization problem: min u ∈ R B λ k u k + 1 2 σ 2 k u − r ( b ) j , m ( k ) k 2 . (15) It is shown in [46] that the minimizer of the MAP criterion is f s ( b ) j , m =  k r ( b ) j , m ( k ) k − λ σ 2 k r ( b ) j , m ( k ) k  + r ( b ) j , m ( k ) . (16) May 30, 2018 DRAFT 9 The three pre vious block-thresh olding estimators have b een derived from dif ferent perspectiv es and they have also been applied in dif ferent ways. H owe ver , it is p ossible to d escribe them through a ge neral shrinkage factor η λ ( k ¯ r ( b ) j , m k β ) , where ∀ τ ∈ R + , η λ ( τ ) =  τ − λ τ  + (17) and β > 0 and λ ≥ 0 take speciﬁc values in each of the aforementioned b lock e stimators. W e a lso remark that this generalized shrinkage obviously en compass es the s oft an d garrote thres holdings provided in ( 9) and (10). I I I . P RO P O S E D N O N L I N E A R E S T I M A T O R A. Notations W e will now prop ose a more ge neral adap ti ve estimator that can be applied in any representation domain. W e will therefore drop the ind ices j and m and we will consider the general s ituation where an observation se quence ( r ( k )) k ∈ Z 2 of d -dimensional rea l-v alued vectors ( d ∈ N , d > 1) i s deﬁned as ∀ k ∈ Z 2 , r ( k ) = s ( k ) + n ( k ) , (18) where ( n ( k )) k ∈ Z 2 is a N ( 0 , Γ Γ Γ ( n ) ) noise and ( s ( k )) k ∈ Z 2 is a n iden tically distr ibuted seco nd-order random sequen ce which is independen t of ( n ( k )) k ∈ Z 2 . W e will a ssume that the covariance matr ix Γ Γ Γ ( n ) is in vertible. These random vectors are deco mposed as r ( k ) =   r ( k ) ˜ r ( k )   , s ( k ) =   s ( k ) ˜ s ( k )   , n ( k ) =   n ( k ) ˜ n ( k )   (19) where r ( k ) , s ( k ) and n ( k ) a re scalar random variables. W e aim a t e stimating the ﬁrst c omponent s ( k ) of the vector s ( k ) using an observation sequence ( r ( k )) k ∈ K where K is a ﬁnite su bset of Z 2 . W e recall that, although (18) do es not introduce a n explicit depe ndence b etween s ( k ) and the vector ˜ r ( k ) of the last d − 1 comp onents of r ( k ) , such a statistical depe ndenc e may exist, due to the dep endenc e betwee n the componen ts of s ( k ) themselves. The estimated seque nce will be denoted by ( f s ( k )) k ∈ K . B. F orm of the adaptive estimator In order to ga in more ﬂexibility in the denoising proc edure, the follo w ing generalized form of sh rinkage estimate will be considered: f s ( k ) = η λ ( k r ( k ) k β ) q ⊤ r ( k ) , (20) May 30, 2018 DRAFT 10 where t he function η λ ( · ) i s gi ven by (17) with λ ≥ 0, β > 0 and q ∈ R d . The vector q corresponds to a linear parameter . W e n otice, in particular , that if the thresh old value λ is set to ze ro, the considered estimator reduces to f s ( b ) ( k ) = q ⊤ ¯ r ( k ) . T his shows that li near estimators cons titute a subset of t he considered c lass of estimators. In addition, by an app ropriate c hoice of the vector q , estimators co nsisting o f a preliminary decorrelation of the data followed by a thresholding step also a ppear as spec ial cases of the proposed estimator . Note that, in c on ventional multichanne l data ana lysis, it is c ustomary to de correlate the da ta before processing. The most common examp les are ﬁxed chan nel conv ersions (like those from stereo to mono in sound process ing or from RGB to luminan ce/chrominanc e co mponents in c olor image or video process ing). When the data modalities a re less stand ardized (for instanc e in satellite imaging), a daptiv e methods such as the Karhun en-Loève transform or Indepe ndent Component Analysis (ICA) [47] can be use d. Th e latter adaptive transforms can also b e performed in the trans formed domain, e.g. in each subban d. Furthermore, in order to limit the co mputational c omplexity in the implemen tation of the e stimator , it can be use ful to constrain t he vector q to belong to some vector subspace of reduced dimens ion d ′ ≤ d . Let P ∈ R d × d ′ be the matr ix whose column v e ctors f orm a basis of this subspa ce. W e have the n q = P a where a ∈ R d ′ . As a simple examp le, by choo sing P =   I d ′ O   where I d ′ denotes the identity matrix of size d ′ × d ′ , we see tha t we only introduc e in the es timator a linear combination of the ﬁrst d ′ compone nts o f the vector r ( k ) . In summary , the proposed form of t he estimator is parameterized by λ , β and a for a gi ven choice of P . Our o bjectiv e is to ﬁnd the optimal pa rameters tha t minimize the qua dratic risk deﬁne d as R ( λ , β , a ) = E [ | s ( k ) − f s ( k ) | 2 ] , for a predeﬁned value of P . It is easy to show that the risk reads: R ( λ , β , a ) = E [ | s ( k ) − f s ( k ) | 2 ] = E [ | s ( k ) | 2 ] + E [ | η λ ( k r ( k ) k β ) a ⊤ P ⊤ r ( k ) | 2 ] − 2 E [ η λ ( k r ( k ) k β ) a ⊤ P ⊤ r ( k ) s ( k ) ] . (21) The minimization of the risk is not obvious for any obs ervation mod el. I ndeed, since the s ( k ) a re unknown, it see ms imposs ible to express the rightmost term E [ η λ ( k r ( k ) k β ) a ⊤ P ⊤ r ( k ) s ( k ) ] . Howe ver , in the case o f a Gauss ian no ise, it is possible to a pply an extens ion of Stein’ s principle [19] for deriving an explicit expression. I n the next subs ection, we will state and prove su ch e xtended Stein’ s formula. May 30, 2018 DRAFT 11 C. Stein’ s for mula Pr opo sition 1: Let f : R d → R be a continuous, almost ev erywhere differentiable function such tha t: ∀ θ θ θ ∈ R d , lim k t k→ + ∞ f ( t ) exp  − ( t − θ θ θ ) ⊤ ( Γ Γ Γ ( n ) ) − 1 ( t − θ θ θ ) 2  = 0; (22) E [ | f ( r ( k )) | 2 ] < + ∞ and E    ∂ f ( r ( k )) ∂ r ( k )    < + ∞ . (23) Then, E [ f ( r ( k )) s ( k )] = E [ f ( r ( k )) r ( k )] − E h ∂ f ( r ( k )) ∂ r ( k ) i ⊤ E [ n n ] . (24) Pr oof: Let T : R d → R d be a continuous, almost everywhere differentiable function such that ∀ θ θ θ ∈ R d , lim k t k→ + ∞ T ( t ) exp  − ( t − θ θ θ ) ⊤ ( Γ Γ Γ ( n ) ) − 1 ( t − θ θ θ ) 2  = 0 ; (25) E [ k T ( r ( k )) k 2 ] < + ∞ and E h   ∂ T ( r ( k )) ∂ r ⊤ ( k )   F i < + ∞ . (26) where k · k F is the Frobenius norm. In this multiv ariate c ontext, Stein’ s principle [19] ca n be express ed as E [ T ( r ( k )) s ⊤ ( k )] = E [ T ( r ( k )) r ⊤ ( k )] − E [ ∂ T ( r ( k )) ∂ r ⊤ ( k ) ] Γ Γ Γ ( n ) . (27) Eq. (24) follows by choo sing T : t 7→ [ f ( t ) , 0 , . . . , 0 ] ⊤ and focusing o n the top-left e lement of matrix E [ T ( r ( k )) s ⊤ ( k )] . D. Risk expression W e de ﬁne the function f : u 7→ η λ ( k u k β ) a ⊤ P ⊤ u . It is easy to check that this function f satisﬁes the conditions of Prop. 1. Con sequen tly , the last term can be calculated tha nks to (24). This y ields E [ s ( k ) f ( r ( k ))] = E [ r ( k ) f ( r ( k ))] − E h ∂ f  r ( k )  ∂ r ( k ) i ⊤ Γ Γ Γ ( n , n ) (28) where Γ Γ Γ ( n , n ) = E [ n ( k ) n ( k )] . W e then have ∂ f  r ( k )  ∂ r ( k ) = a ⊤ P ⊤ r ( k ) ∂η λ ( k r ( k ) k β ) ∂ r ( k ) + η λ ( k r ( k ) k β ) ∂ a ⊤ P ⊤ r ( k ) ∂ r ( k ) = λβ k r ( k ) k β + 1 1 {k r ( k ) k β > λ } ∂ k r ( k ) k ∂ r ( k ) r ⊤ ( k ) Pa + η λ ( k r ( k ) k β ) Pa = η λ ( k r ( k ) k β ) Pa + λ r ( k ) ξ ξ ξ ⊤ ( k ) Pa (29) where ξ ξ ξ ( k ) = 1 {k r ( k ) k β > λ } β k r ( k ) k β + 2 r ( k ) . This leads to the following expression of the risk: R ( λ , β , a ) = E [ | r ( k ) − f ( r ( k )) | 2 ] + 2 E [ η λ ( k r ( k ) k β )] a ⊤ P ⊤ Γ Γ Γ ( n , n ) + 2 λ a ⊤ P ⊤ E [ ξ ξ ξ ( k ) r ⊤ ( k )] Γ Γ Γ ( n , n ) − σ 2 (30) where σ 2 = E [ | n ( k ) | 2 ] . May 30, 2018 DRAFT 12 W e will now loo k for parameters λ , β and a that minimize the ri sk e xpression (30) for a given cho ice of P . E. Deter mination of t he parameter a W e ﬁrst aim a t calcu lating the value of a tha t minimizes the risk (30). B y noticing that the risk is a quadratic co n vex function of a , the minimization c an be pe rformed by differentiati ng w .r .t. a an d then ﬁnding a ∗ ( λ , β ) such that ∂ R / ∂ a  λ , β , a ∗ ( λ , β )  = 0. It readily follo ws that a ∗ ( λ , β ) =  P ⊤ E [ η 2 λ ( k r ( k ) k β ) r ( k ) r ⊤ ( k )] P  − 1 P ⊤  E [ η λ ( k r ( k ) k β ) r ( k ) r ( k )] − E [ η λ ( k r ( k ) k β )] Γ Γ Γ ( n , n ) − λ E [ ξ ξ ξ ( k ) r ⊤ ( k )] Γ Γ Γ ( n , n )  . (31) F . Deter mination of the parameters λ a nd β Starting from (30), the ri sk R ( λ , β , a ) can be re-expressed as R ( λ , β , a ) = E [ ρ λ , β , a ( k )] where ρ λ , β , a ( k ) = α 2 ( k ) λ 2 + α 1 ( k ) λ + α 0 ( k ) (32) and α 0 ( k ) = r 2 ( k ) − σ 2 + 1 {k r ( k ) k β > λ } a ⊤ P ⊤  2 Γ Γ Γ ( n , n ) +  a ⊤ P ⊤ r ( k ) − 2 r ( k )  r ( k )  α 1 ( k ) = 2 a ⊤ P ⊤   r ( k ) − a ⊤ P ⊤ r ( k )  r ( k ) − Γ Γ Γ ( n , n ) k r ( k ) k β + β r ( k ) r ⊤ ( k ) Γ Γ Γ ( n , n ) k r ( k ) k β + 2  1 {k r ( k ) k β > λ } α 2 ( k ) = 1 {k r ( k ) k β > λ }  a ⊤ P ⊤ r ( k )  2 k r ( k ) k 2 β . In practice, under standard mixing assump tions for ( n ( k )) k ∈ Z 2 and ( s ( k )) k ∈ Z 2 [48], R ( λ , β , a ) c an be estimated via a n emp irical average f R ( λ , β , a ) computed over K , p rovided that the d ata len gth K = card ( K ) is large eno ugh. Follo wing a proce dure s imilar to the sea rch implemen ted for the SUREshrink estimator , we will s ubsequ ently determine op timal values of λ and β for this c onsistent risk es timate. More precisely , the no rms of the R O Vs ( k r ( k ) k ) k ∈ K are ﬁrst sorted in descen ding order , so that k r ( k 1 ) k ≥ k r ( k 2 ) k ≥ . . . ≥ k r ( k K ) k . T o study the variations of f R ( λ , β , a ) w .r .t. λ , we c onsider the c ase whe n λ ∈ I i 0 with i 0 ∈ { 1 , . . . , K + 1 } and I i 0 =               k r ( k 1 ) k β , ∞  if i 0 = 1  k r ( k i 0 ) k β , k r ( k i 0 − 1 ) k β  if i 0 ∈ { 2 , . . . , K }  0 , k r ( k K ) k β  if i 0 = K + 1 . (33) May 30, 2018 DRAFT 13 On the interv al I i 0 , the ri sk estimate then takes the following form: 1 f R ( λ , β , a ) = 1 K  i 0 − 1 ∑ i = 1 ρ λ , β , a ( k i ) + K ∑ i = i 0 ρ λ , β , a ( k i )  (34) = 1 K  λ 2 i 0 − 1 ∑ i = 1 α 2 ( k i ) + λ i 0 − 1 ∑ i = 1 α 1 ( k i ) + i 0 − 1 ∑ i = 1 α 0 ( k i ) + K ∑ i = i 0 r 2 ( k i ) − ( K + 1 − i 0 ) σ 2  . (35) In othe r words, f R ( λ , β , a ) is a piecewise second-orde r p olynomial function of λ . Assume now that i 0 ∈ { 2 , . . . , K } . For given values of β an d a , the minimum over R of the polynomial in (35) is r eache d at ˜ λ i 0 ( β , a ) = − ∑ i 0 − 1 i = 1 α 1 ( k i ) 2 ∑ i 0 − 1 i = 1 α 2 ( k i ) . (36) The minimum ov e r  k r ( k i 0 ) k β , k r ( k i 0 − 1 ) k β  of the estimated ri sk is therefore gi ven by λ ∗ i 0 ( β , a ) =              ˜ λ i 0 ( β , a ) if [ k r ( k i 0 ) k β ≤ ˜ λ i 0 ( β , a ) ≤ k r ( k i 0 − 1 ) k β k r ( k i 0 ) k β if ˜ λ i 0 ( β , a ) < k r ( k i 0 ) k β k r ( k i 0 − 1 ) k β if ˜ λ i 0 ( β , a ) > k r ( k i 0 − 1 ) k β . (37) The minimi zers λ ∗ 1 ( β , a ) a nd λ ∗ K + 1 ( β , a ) o f t he estimated risk over I 1 and I K + 1 can be found in a similar way . The global minimizer λ ∗ ( β , a ) of t he estimated risk is s ubsequ ently compu ted a s λ ∗ ( β , a ) = arg min ( λ ∗ i 0 ( β , a )) 1 ≤ i 0 ≤ K + 1 f R ( λ ∗ i 0 ( β , a ) , β , a ) . (38) T o d etermine the o ptimal value β ∗ ( a ) o f the exponent β , we can the n proc eed to an exhaustiv e se arch over a set V o f possible v alues for this parameter b y choosing β ∗ ( a ) = arg min β ∈ V f R ( λ ∗ ( β , a ) , β , a ) . (39) In our experiments, it w a s observed tha t a restricted set of a fe w sea rch v a lues is suf ﬁcient to get good results. G. Iterative optimization algorithm The optimal expression o f the vector a is derived in a closed form in Section III-E as a function of the pa rameters λ and β . In this way , the optimization problem simply reduc es to the determination of the latter tw o parameters. On the other hand, gi ven a , a procedure for determining the optimal values of λ an d β is des cribed in Se ction III-F. In order to get optimized values of the e stimator parame ters, we therefore propose to apply t he follo wing iterativ e optimization approach: 1 W e adopt here the con vention ∑ 0 i = 1 · = ∑ K i = K + 1 · = 0. May 30, 2018 DRAFT 14 1) Initialization: Fix P and V . Se t t he iteration number p = 1 and a ( 0 ) = [ 1 , 0 , . . . , 0 ] ⊤ ∈ R d ′ 2) Iteration p a) Set β ( p ) = β ∗ ( a ( p − 1 ) ) and λ ( p ) = λ ∗ ( β ( p ) , a ( p − 1 ) ) as described in Se ction II I-F. b) Set a ( p ) = a ∗ ( λ ( p ) , β ( p ) ) using (31) w here the expectations are replaced by sample estimates. 3) Set p ← p + 1 and g oto step 2 u ntil con ver gence. 4) Return the optimized v a lues ( λ ( p ) , β ( p ) , a ( p ) ) of the parameters. W e point out t hat, alt hough we were not able to prov e the c on vergence of the optimized parameters, the generated sequenc e ( f R ( λ ( p ) , β ( p ) , a ( p ) )) p is a decreasing con vergent seq uence. This means tha t improved parameters are generated at e ach iterati on of the algorithm. I V . M U L T I C O M P O N E N T W A V E L E T D E N O I S I N G Our objectiv e he re is to apply the no nlinear estimator dev e loped in the previous se ction to noise reduction in d egraded multicomponen t images by c onsidering wavelet-based approac hes. The original multichannel image is c omposed o f B ∈ N ∗ compone nts s ( b ) of size L × L , with b ∈ { 1 , . . . , B } . Ea ch image compo nent s ( b ) is corrupted by an additi ve n oise n ( b ) , which is assu med ind epende nt of the images of interest. Consequen tly , we obtain the follo wing noisy obse rv ation ﬁ eld r ( b ) deﬁned by: ∀ k ∈ K , r ( b ) ( k ) = s ( b ) ( k ) + n ( b ) ( k ) , (40) where K = { 1 , . . . , L } 2 . F ollowi ng a multi variate approach, we d eﬁne: ∀ k ∈ K ,          s ( k ) △ = [ s ( 1 ) ( k ) , . . . , s ( B ) ( k )] ⊤ n ( k ) △ = [ n ( 1 ) ( k ) , . . . , n ( B ) ( k )] ⊤ r ( k ) △ = [( r ( 1 ) ( k ) , . . . , r ( B ) ( k )] ⊤ . (41) Obviously , the observation model (40) can be rewr itten as ∀ k ∈ K , r ( k ) = s ( k ) + n ( k ) . In many optical systems , the noise stems from a combination of photonic and elec tronic no ises cumulated with quantization errors. Su bseque ntly , we wi ll assume that t he noise vector proces s n is zero-mean iid Gaus sian with c ov ariance matrix Γ Γ Γ ( n ) . In [1] a nd [2], this was shown to c onstitute a realistic a ssumption for satellite systems. It is w orth noticing that a n on diagonal matrix Γ Γ Γ ( n ) indicates that inter -comp onent correlations exist b etween co-located noise s amples. Hereafter , we will use two decompositions. T he ﬁrst o ne consists in a critically d ecimated M -ban d wa velet transform wherea s the secon d one, correspon ds to an M -ban d dual-tree wavelet decomposition we recently proposed [27 ] which permits a d irectional analysis of i mages. May 30, 2018 DRAFT 15 A. M -band wavelet basis estimation 1) Mo del: W e ﬁrst cons ider an M -band orthonormal discrete wav elet transform (D WT) [30] over J resolution lev els applied, for each chan nel b , to the obs ervation ﬁeld r ( b ) . This deco mposition produces M 2 − 1 wa velet su bband seque nces r ( b ) j , m , m ∈ N 2 M \ { ( 0 , 0 ) } , each o f s ize L j × L j (where L j = L / M j ) 2 , a t ev ery reso lution level j and a n additional app roximation seq uence r ( b ) J , 0 of size L J × L J , at reso lution level J . r ( b ) j , m W T W T n ( b ) j , m W T s ( b ) j , m r ( b ) iid N ( 0 , ( n ) ) n ( b ) s ( b ) ( s ( b ) j , m , s H ( b ) j , m ) ( r ( b ) j , m , r H ( b ) j , m ) D T T D T T D T T ( n ( b ) j , m , n H ( b ) j , m ) r ( b ) iid N ( 0 , ( n ) ) n ( b ) s ( b ) Fig. 1. Considered models in the wav elet tr ansform domain ( left) and in the dual-tree transform domain (right). On the one ha nd, the linea rity of the D WT yields (see . Fig. 1 ): ∀ k ∈ K j , r j , m ( k ) = s j , m ( k ) + n j , m ( k ) where K j = { 1 , . . . , L j } 2 and s j , m ( k ) △ = [ s ( 1 ) j , m ( k ) , . . . , s ( B ) j , m ( k )] ⊤ , r j , m ( k ) △ = [ r ( 1 ) j , m ( k ) , . . . , r ( B ) j , m ( k )] ⊤ , n j , m ( k ) △ = [ n ( 1 ) j , m ( k ) , . . . , n ( B ) j , m ( k )] ⊤ . On the other han d, the orthono rmality of the DWT preserves the spa tial whiteness of n j , m . More speciﬁca lly , i t is easily sho wn that the latter ﬁeld i s an i.i.d. N ( 0 , Γ Γ Γ ( n ) ) random v ector process. A ﬁnal requ ired a ssumption is that the random vectors ( s j , m ( k )) k ∈ K are identically dis trib uted for any giv e n value of ( j , m ) . 2) As sociated estimator: As desc ribed in Section III, our estimator ca n be directly a pplied to the M -ban d DW T c oefﬁcients. As in con ven tional a pproache s, the approximation coefﬁcients ( i.e. j = J and m = ( 0 , 0 ) ) a re kept untouch ed. The parameters λ j , m , β j , m and q j , m can be d etermined a daptiv ely , for ev ery subband ( j , m ) a nd ev ery component b . In this c ase, the R O V c an be sca lar , spatial, inter- componen t or combined spatial/inter -compo nent. More d etailed examples will be giv en i n Section V. 2 For simplicity , L i s assumed to be di visible by M J . May 30, 2018 DRAFT 16 B. M -band dual-tr ee wavelet frame es timation G ∗ 0 G ∗ 1 G ∗ M − 1 ↓ M ↓ M ↓ M ↑ M ↑ M ↑ M G 0 G 1 G M − 1 H ∗ 0 H ∗ 1 H ∗ M − 1 ↓ M ↓ M ↑ M ↑ M ↑ M H 0 H 1 H M − 1 ↓ M Fig. 2. Pair of analysis/synthesis M -band para-unitary ﬁlter banks. 0000 1111 r STEP 1 STEP 2 STEP 3 u 2 , m 1 , m 2 ﬁlter bank ( F 1 ) Preﬁltering Preﬁltering ( F 2 ) Linear combinaison of the subbands M -band ﬁlter bank Linear combinaison of the subbands r 0 , 0 , 0 r 1 , 0 , 0 r H 1 , 0 , 0 r H 0 , 0 , 0 "Dual" M -band ﬁlter bank ﬁlter bank "Dual" M -band ( u H 1 , m 1 , m 2 ) ( m 1 , m 2 ) 6 =( 0 , 0 ) ( u 1 , m 1 , m 2 ) ( m 1 , m 2 ) 6 =( 0 , 0 ) u H 2 , m 1 , m 2 M -band Fig. 3. Dual-tree 2 D . 1) A brie f ove rview of the decomp osition: The M -band rea l dual-tree trans form (DTT) c onsists in performing two separab le M -band orthonormal wa velet decompositions in parallel as illustrated by Fig. 2. The one-dimens ional wav e lets ( ψ m ) m ∈ N ⋆ M correspond ing to the primal tree (upp er b ranch) a re a ssumed known a nd the “dual tree” o nes ( ψ H m ) m ∈ N ⋆ M (used in the lower b ranch) a re built so that they deﬁne Hilbert pairs with the p rimal ones. This reads in the frequency do main: ∀ m ∈ N ⋆ M , b ψ H m ( ω ) = − ı sign ( ω ) b ψ m ( ω ) . Details of co nstruction are given in [27] and the global sch eme o f the dec omposition is shown in Fig. 3. An important point is that t he dual-tree dec omposition includ es a pos t-processing, co nsisting of a linear isometric c ombination of the primal/dual subban ds (see Fig. 3). This post-proces sing constitutes a n essen tial step for obtaining a directional a nalysis. Finally , two sets of c oefﬁcients (primal an d dual one s) are obtained, which means that this represen tation in volves a limi ted redundan cy of a f a ctor two. May 30, 2018 DRAFT 17 2) Mo del: Ap plying this decompos ition to a multichan nel image having B compone nts and using similar notations to S ection IV -A.1, we ob tain the followi ng c oefﬁcients for the original data , the obse rved ones and the noise, resp ectiv ely: • before post-processing : ( s j , m ( k ) , s H j , m ( k )) , ( r j , m ( k ) , r H j , m ( k )) , ( n j , m ( k ) , n H j , m ( k )) ; • after post-processing: ( v j , m ( k ) , v H j , m ( k )) , ( u j , m ( k ) , u H j , m ( k )) , ( w j , m ( k ) , w H j , m ( k )) . Note that a post-proce ssing is not applied to all subband s (see [27]) as the Hilbert condition is only veriﬁed by mothe r wavelets. As a c onsequ ence, the linear is ometric c ombination is n ot p erformed for subban ds process ed by low pass ﬁ lters. More precisely , the post-processing consists o f the follo wing unitary transform of the de tail coef ﬁcients: f or all m ∈ N ⋆ 2 M , ∀ k ∈ K j , w j , m ( k ) = 1 √ 2  n j , m ( k ) + n H j , m ( k )  and w H j , m ( k ) = 1 √ 2  n j , m ( k ) − n H j , m ( k )  . (42) Similar relations hold for the original and obse rved data. Furthermore, inv oking the linearity prop erty of the transform, these coef ﬁc ients are related by (see. Fig. 1 (r ight)): ∀ k ∈ K j , r j , m ( k ) = s j , m ( k ) + n j , m ( k ) and u j , m ( k ) = v j , m ( k ) + w j , m ( k ) r H j , m ( k ) = s H j , m ( k ) + n H j , m ( k ) u H j , m ( k ) = v H j , m ( k ) + w H j , m ( k ) . (43) 3) No ise statistical properties: In o ur recent work [49], [50], a detailed an alysis o f the noise statistical properties after such a dual tree decomposition has been perf ormed. In the sequel, some of the main results we obtained are brieﬂy s ummarized. Le t us recall the deﬁn ition of the d eterministic cros s-correlation function between the pri mal and dual wa velets: for all ( m , m ′ ) ∈ N 2 M , ∀ τ ∈ R , γ m , m ′ ( τ ) = Z ∞ − ∞ ψ m ( x ) ψ H m ′ ( x − τ ) d x . (44) W e have obtained the followi ng expressions for the cov arianc e ﬁelds: f or all j ∈ Z , m = ( m 1 , m 2 ) ∈ N 2 M , m ′ = ( m ′ 1 , m ′ 2 ) ∈ N 2 M , k = ( k 1 , k 2 ) ∈ K j and k ′ = ( k ′ 1 , k ′ 2 ) ∈ K j , E [ n j , m ( k )( n j , m ′ ( k ′ )) ⊤ ] E [ n H j , m ( k )( n H j , m ′ ( k ′ )) ⊤ ]      = Γ Γ Γ ( n ) δ m 1 − m ′ 1 δ m 2 − m ′ 2 δ k 1 − k ′ 1 δ k 2 − k ′ 2 E [ n j , m ( k )( n H j , m ′ ( k ′ )) ⊤ ] = Γ Γ Γ ( n ) γ m 1 , m ′ 1 ( k ′ 1 − k 1 ) γ m 2 , m ′ 2 ( k ′ 2 − k 2 ) . It can be further noticed tha t, for m 6 = 0 , the random vectors n j , m ( k ) and n H j , m ( k ) at a gi ven location k are mutually uncorrelated. After post-proce ssing, the c ov ariances of the transformed noise c oefﬁcient ﬁelds can be easily ded uced May 30, 2018 DRAFT 18 from (42): for a ll ( m , m ′ ) ∈ N ⋆ 2 M and ( k , k ′ ) ∈ K 2 j , E [ w j , m ( k )( w j , m ′ ( k ′ )) ⊤ ] = E [ n j , m ( k )( n j , m ′ ( k ′ )) ⊤ ] + E [ n j , m ( k )( n H j , m ′ ( k ′ )) ⊤ ] (45) E [ w H j , m ( k )( w H j , m ′ ( k ′ )) ⊤ ] = E [ n j , m ( k )( n j , m ′ ( k ′ )) ⊤ ] − E { n j , m ( k )( n H j , m ′ ( k ′ )) ⊤ ] (46) E [ w j , m ( k )( w H j , m ′ ( k ′ )) ⊤ ] = 0 . (47) In summary , n oise coef ﬁ cients are inter -tree correlated before the post-transform w hereas aft er the po st- transform, they are s patially correlated. This con stitutes an important conseq uence o f the po st-processing stage. 4) As sociated e stimator: In the M -band DTT cas e, the primal a nd d ual co efﬁcients are b oth es timated. For each compone nt b ∈ { 1 , . . . , B } , the estimator rea ds: for the s ubband s which a re not linearly co mbined ( m 6∈ N ⋆ M ), f s ( b ) j , m ( k ) = η λ ( b ) j , m ( k r ( b ) j , m ( k ) k β ( b ) j , m ) ( q ( b ) j , m ) ⊤ r ( b ) j , m ( k ) (48) f s H ( b ) j , m ( k ) = η λ H ( b ) j , m ( k ( r H ( b ) j , m ( k )) k β H ( b ) j , m )  q H ( b ) j , m  ⊤ r H ( b ) j , m ( k ) , (49) and, for the combined s ubband s ( m ∈ N ⋆ M ), f v ( b ) j , m ( k ) = η λ ( b ) j , m ( k u ( b ) j , m ( k ) k β ( b ) j , m ) ( q ( b ) j , m ) ⊤ u ( b ) j , m ( k ) (50) f v H ( b ) j , m ( k ) = η λ H ( b ) j , m ( k ( u H ( b ) j , m ( k )) k β H ( b ) j , m )  q H ( b ) j , m  ⊤ u H ( b ) j , m ( k ) , (51) where r H ( b ) j , m ( k ) and r H ( b ) j , m ( k ) (resp. u ( b ) j , m ( k ) and u H ( b ) j , m ( k ) ) are the R O Vs for the primal a nd d ual co efﬁcients before (resp . after) p ost-transformation. Similarly to the D WT cas e, ( λ j , m , β j , m , q j , m ) a nd ( λ H j , m , β H j , m , q H j , m ) can be adaptiv e ly determined by minimizing the qu adratic risk over the frame coefﬁcients for every subban d ( j , m ) and every c omponent b in each tree. Fu rthermore, the approximation coefﬁcients a re also kept untouche d. The denoised multichannel images are then obtaine d from the es timated wa velet coefﬁcients by in verting the DTT us ing the optimal reconstruction developed in [27]. In this case, a great ﬂ exibility exists in the choice of the R O V since the latter ca n be s calar , s patial, inter-component, inter -tree or combined spatial/inter -compo nent/inter -tree as wil l be illustrated in t he next section. V . N U M E R I C A L R E S U LT S W e now pro vide numerical examples sh owing the ef ﬁciency of the proposed method. In our simulations, we consider dif ferent multichannel remote sensing images. For the sake of clarity , we only provide experimental results c oncerning two multispectral images. T he ﬁrst one designa ted as T u nis correspo nds May 30, 2018 DRAFT 19 to a part of a SPO T3 scene depicting a urban area of the city of T unis ( B = 3). T he second one named T ren to is a Lands at The matic Mapp er image h aving initially sev en cha nnels. Th e thermal compone nt (the sixth c omponen t) has been discarded sinc e it is not simi lar t o the remaining one s. Hence, the test image T ren to is a B = 6 componen t image. In order to obtain reliable results from a statistical viewpoint, Mon te Carlo simulations h av e bee n c onducted . According to our experiments, a veraging the me an square error over ﬁve noise realizations is suf ﬁc ient to o btain consistent quantitati ve e valuations. In the f ollowi ng, we discuss se veral topics: in particular , we compare our method with other recently proposed estimators, possibly h aving a multi vari ate structure. Then, w e conside r diff erent pre-proces sings that can be pe rformed on the multichannel data before ap plying the estimator , thus expec ting impro ved results. The R O V being deﬁn ed in a generic way in the p revi ous section, we also study the inﬂ uence of s peciﬁc choices of this R O V on the d enoising pe rformance a s well a s the inﬂu ence of the wa velet choice (conside ring various M -band ﬁlter ban ks). When dif ferent decompositions are p erformed, we set the maximum d ecompos ition le vel so that the s ize of the ap proximation ﬁe lds remain the sa me. Consequ ently , we d ecompos e the images over 2 le vels for a 4-band ﬁlt er bank structure and 4 l ev els for a dyadic one. If σ ( b ) denotes the standa rd deviation of the clea n mu ltichannel compone nt s ( b ) (of s ize L 1 × L 2 ) we deﬁne the initial and t he ﬁnal signal to noise ratios SNR ( b ) initial and, SNR ( b ) ﬁnal in the b -th channel as: SNR ( b ) initial △ = 10 log 10  ( σ ( b ) ) 2 L 1 L 2 k s ( b ) − r ( b ) k 2  , and SNR ( b ) ﬁnal △ = 10 log 10  ( σ ( b ) ) 2 L 1 L 2 k s ( b ) − ˆ s ( b ) k 2  . (52) Then, all the B channel contributions a re averaged into global values o f the initial and ﬁn al s ignal to noise ratio SNR initial and, SNR ﬁnal . A. Comp arison with existing methods W e aim in this section at co mparing the prop osed app roach with several e xisting de noising metho ds which are brieﬂy described in T able I. T ests are pe rformed on a 512 × 512 SP O T3 image o f T unis city ( B = 3) (as some multi vari ate methods are limited to 3-band images) corrupted by a n additiv e zero-mea n white Gau ssian noise with covariance matr ix Γ Γ Γ ( n ) 1 = σ 2 I B , whe re I B denotes the identity matrix of size B × B . W e ﬁrst study techniqu es that use orthog onal w avelet transforms. W e employ Daube chies wav elets of order 4 in all t he follo wing es timators: 1) the Bi variate sh rinkage, which takes into account inter- scale depen dencies , the last level being process ed by in verting children and parent role [45]; May 30, 2018 DRAFT 20 T ABLE I B R I E F DE S C R I P T I O N O F T H E T E S T E D M E T H O D S . Acrony m Description Ref. Acrony m Description Ref. Biv . Biv ariate shrinkage method [45] Multi v ariate methods BLS-GSM Bayes ian Least Squares (BL S) [34] ProbShrink Multi variate method for 3-band images using [39] Gaussian Scale Mixture (GSM) ( . × . ) critically decimated DWT and taking into using critically decimated DWT account a ( . × . ) neighborho od in each channel BLS-GSM BLS-GSM using critically [34] ProbShrink Multi variate method for 3-band images [39] + parent decimated DWT and taking into red. ( . × . ) using undecimated DWT and taking into account the parent coefﬁcient account a ( . × . ) neighborhood in each channel BLS-GSM BLS-GSM using a full [34] Surev ect Estimator based on an extended S URE [22] red. steerable pyramid approach using a critically decimated DWT (redundant transform) Curvelets Block estimator using curvelet [51] transform: 7 . 5 times redundant 2) the BLS-GSM method dev eloped in [34] including or no t the parent neighborhoo d an d co nsidering a 3 × 3 spa tial neighborhood; 3 3) the ProbS hrink estimator [39] for multi variate data with a 3 × 3 spatial neighborhood (in each channe l); 4 4) the Su rev e ct estimator [22], which on ly takes into a ccount multicompon ent s tatistical de penden cies; 5) the prop osed estimator wh ere the s et of values taken by β ( b ) j , m is V = { 0 . 5 , 1 , 1 . 5 , 2 } , the R O V is represented in Fig. 5(b). A subspace constraint is ad ded on the vector q ( b ) j , m so that ( q ( b ) j , m ) ⊤ r ( b ) j , m ( k ) reduces to a linea r c ombination o f the multichannel d ata a t the con sidered loca tion a nd the 4 s patial nearest neighbors. The o btained results are p rovided in T a ble II (the initial SNRs may be dif feren t in eac h chan nel althou gh the no ise variance is ﬁxed ). For the ﬁrst three methods, d enoising has bee n performed for ea ch compo nent of the multichannel da ta. For orthogona l wavelets, ProbShrink lead s to b etter res ults when it is as sociated to a spatial neighborhood than wh en considering only the pixel v alue to be estimated. It pe rforms q uite similarly to the Bi variate shrinkage. The BLS-GSM estimator outperforms thes e two methods p roviding a gain of approx imati vely 0 . 2 dB (up to 0 . 3 dB by including the pa rent c oefﬁcient in the neighbo rhood). Nev ertheless, the Surevect estimator brings more signiﬁcan t improvements and it can be observed that 3 W e use the toolbox available from Portilla’ s website http://www.io.csi c.es/PagsPers/JPor tilla/ . 4 W e use the toolbox available from Pi ˘ zurica’ s website http://telin.rug. ac.be/ ∼ sanja/ . May 30, 2018 DRAFT 21 T ABLE II D E N O I S I N G R E S U LT S ( A V E R AG E V A L U E S C O M P U T E D OV E R 3 C H A N N E L S ) O N T U N I S I M A G E U S I N G N O N R E D U N D A N T O RT H O G O N A L T R A N S F O R M S ( S E E T A B . I ) W I T H D AU B E C H I E S WA V E L E T S O F O R D E R 4 ( L E N G T H 8 ) . σ 2 SNR init Biv Probshrink BLS-GSM BLS-GSM S ure vect Proposed (3 × 3) + parent DWT DWT 650.3 5.081 11 .85 11.86 12.05 12.14 13.08 13.42 410.3 7.081 12 .89 12.84 13.11 13.21 14.12 14.52 258.9 9.081 13 .99 13.91 14.26 14.36 15.24 15.70 163.3 11.08 15 .19 15.08 15.49 15.60 16.43 16.96 103.1 13.08 16 .49 16.37 16.81 16.93 17.70 18.28 65.03 15.08 17 .88 17.54 18.22 18.35 19.04 19.65 our me thod leads to even better nume rical resu lts whatever the initial n oise level is. The new structure of the estimator coupled with a s patial a nd spectral bloc k proce ssing may explain s uch an improvement. Furthermore, the g ain increas es as the initial SNR increase s, which is interes ting in s atellite imag ing where the noise is often of low intens ity . T o b e fair , we would like to me ntion that, althoug h Biv ariate shrinkage, Prob shrink and BLS-GSM were designe d for mo nochan nel image denoising, exten sions of these methods to t he multi vari ate case could p robably be en visaged . In the monochan nel ca se, it has been reported that the us e of redundant transforms o ften brings noticeable improvements in denoising [51 ]. W e subs equen tly compare metho ds that have be en proved to be very efﬁcient when combined with a redund ant a nalysis: 1) the curvelet de noising [51 ] using a c urvelet frame with a redun dancy app roximati vely equal to 7 . 5 and a block t hresholding; 5 2) the BLS-GSM method using steerable pyramids with 8 orientations, includ ing the paren t neighb or - hood and a 3 × 3 spatial neighborhood as des cribed i n [34], 3) the ProbShrink estimator for multiv ariate data using undec imated wav elet transform [39] (with Daubec hies wavelets of length 8) and taking into ac count a 3 × 3 or n o spatial neighborhood; 4) the Surev ect estimator [22], extended to DTT (with Daub echies w avelets o f length 8); 5) the prop osed e stimator us ing a DTT wh ere V = { 0 . 5 , 1 , 1 . 5 , 2 } , the R O V is repre sented in Fig. 6(b). The vector q ( b ) j , m (resp. q H ( b ) j , m ) is suc h that it introduces a linear co mbination of the multichanne l data in the primal (resp . dual) tr ee at the considered location a nd the 4 s patial nearest neighbors. 5 W e employ the Curvelab 2.0 toolbox which can be down loaded f rom http://www.curvel et.org . May 30, 2018 DRAFT 22 T ABLE III D E N O I S I N G R E S U LT S ( A V E R AG E V A L U E S C O M P U T E D OV E R 3 C H A N N E L S ) O N T U N I S I M A G E U S I N G R E D U N D A N T T R A N S F O R M S ( S E E T A B . I ) W I T H D A U B E C H I E S WA V E L E T S O F O R D E R 4 ( L E N G T H 8 ) . σ 2 SNR init Curvelets BLS-GSM red P robshrink r ed Probshrink red Surev ect P roposed + parent (3 × 3) (1 × 1) DTT DTT 650.3 5.081 11 .91 12.92 13.00 13.33 13.54 13.72 410.3 7.081 12 .94 14.00 14.04 14.38 14.59 14.80 258.9 9.081 14 .04 15.15 15.13 15.50 15.70 15.97 163.3 11.081 15.17 16.38 16.28 16.68 16.87 17.21 103.1 13.081 16.33 17.68 17.51 17.92 18.11 18.52 65.03 15.081 17.56 19.04 18.76 19.20 19.42 19.88 It is worth p ointing out that the sa me noisy imag es as u sed in the n on red undant ca se have been p rocessed by the red undant transforms. As sh own in T ab le III, curvelets do no t seem rea lly appropriate in this multichannel co ntext in spite o f the ir p romising resu lts in the monoc hannel one. ProbShrink and BL S- GSM method s are very efﬁcient in the redund ant case and ProbShrink s hows it s su periority when using an inter -component neighborhood. The methods u sing a DTT outperform the existing one s in all the case s. W e point o ut that the DTT has a limited redunda ncy of a factor 2 c ompared with the othe r conside red redundan t decompositions. It can be noticed that our me thod provides better res ults than Su rev e ct. The observed ga in increases as the initial SNR increas es and we o btain signiﬁcant improvements with respe ct to critically decimated transforms of abou t 0 . 25 dB. It is also interesting to note that the obse rved gain in terms of SNR leads to q uite visible dif ferenc es. In Fig. 4, c ropped versions of the ﬁrst channel of the T unis image a re display ed, for a low value of the initial SNR (4 . 66 dB). W e can n otice that the proposed method (see Fig. 4-(f) ) allo ws to b etter r ecover e dges w hereas t he t hree others (see Fig. 4-(c,d,e )) res ult in more blurred images, w here some of the ori ginal structures are missing. Th is is es pecially visible for the image denoised w ith the BLS-GSM estimator (s ee Fig. 4 -(d)). In the foll owing, we focus on the method introduced in this paper and more speciﬁca lly on the vari ations of its performance according to the parameter setup. B. Pre-pr oces sing stage In order to improve the den oising performance in the multichannel co ntext, additional linear p rocedures can be applied. Actually , dif ferent linear pre-processing s of the compone nts may be en visag ed: • The simplest idea consists in dec orrelating the spectral co mponents of the image to be estimated in May 30, 2018 DRAFT 23 (a) (b) (c) (d) (e) (f) Fig. 4. Cropped versions of Tunis image (channel b = 1, initial SNR equal to 4 . 66 dB) and (a) Original image, (b) Noisy image, (c) Denoised image using ProbShrink red. ( 1 × 1 ) , (d) Denoised imag e usin g BLS-GSM red. + parent method, (e) Denoised image using curvelets and (f) Denoised i mage using our method (employing a DT T). order to process them sepa rately . Kn owi ng t he noise covariance ma trix Γ Γ Γ ( n ) , we can deduce the original data cov a riance matrix (assumed h ere to be spatially constant): Γ Γ Γ ( s ) = Γ Γ Γ ( r ) − Γ Γ Γ ( n ) , from the obs erved data covariance matrix Γ Γ Γ ( r ) . More precis ely , by performing a n eigende composition of Γ Γ Γ ( s ) , we se ek for an orthogona l matrix U ( s ) such that: Γ Γ Γ ( s ) = U ( s ) D ( s ) ( U ( s ) ) ⊤ where D ( s ) is a diagon al matrix. The n, the transformed multichannel image is (( U ( s ) ) ⊤ r ( k )) k and it is corrupted by a spa tially white ze ro- mean Gaussian noise with covari ance matrix ( U ( s ) ) ⊤ Γ Γ Γ ( n ) U ( s ) . W e then proc eed to the nonlinear wa velet estimation of the decorrelated co mponents as described i n the pre viou s sections. • Instead of dec orrelating the componen ts, we may try to make the m statistically independ ent or , a t least, a s indepe ndent as p ossible. A number of ICA (Inde penden t Co mponent Analysis) methods have been developed f or this p urpose in recent y ears [47]. In this case , a linear transform V ( s ) (which is no t neces sarily orthogo nal) is applied to the multichannel data. The proposed es timator alrea dy inc ludes an optimized linear combination o f s ome of the componen ts of the R O V . It is therefore expe cted to p rovide compe titi ve results w .r .t. techniqu es in volving some linear pre-process ing. In order to make fair comparisons and ev aluate the improvements resulting from the May 30, 2018 DRAFT 24 optimization of the linear part of the estimator , we provide simulations where the R O V is the same whatever the pre-process ing is (we h av e cho sen the same R O V as in the previous se ctions). I n addition, when a decorrelation or an ICA is employed, the linea r part of the e stimator is chose n equ al to the identity . W e ﬁnally prop ose to compare these results with a s imple linear MSE es timator b ased on a linear combination of coef ﬁcients from different channels. Numerical results display ed in T able IV allow us to evaluate the proposed app roach w ithout optimiza- tion of the linea r parameter vector , the sa me estimator combined with an ICA of the multichanne l data (using the J ADE algorithm [47]) o r a pre-deco rrelation s tage and, ﬁnally our app roach with a n optimized linear part. From thes e resu lts, it is clear that including s ome linea r p rocessing is us eful for multichann el T ABLE IV I N FL U E N C E O F D I FF E R E N T P R E - P RO C E S S I N G S O N T U N I S I M A G E D E N O I S I N G ( σ 2 = 258 . 9) . S Y M L E T S O F L E N G T H 16 A R E U S E D . T r ansform Channel SNR init W ithout transf. ICA Decorrelation MSE Lin. Opt. lin. b = 1 8.664 13.84 14.66 15.15 15.18 15 .75 DWT b = 2 9.653 14.39 15.03 15.36 15.28 15 .90 b = 3 8.926 15.15 13.85 15.11 15.26 15 .86 A verage 9.081 14.46 14.51 15.21 15.24 15 .84 b = 1 8.664 14.13 14.37 15.42 15.42 15 .95 DTT b = 2 9.653 14.66 14.67 15.64 15.53 16 .10 b = 3 8.926 15.38 14.26 15.25 15.52 16 .01 A verage 9.081 14.72 14.43 15.44 15.49 16.02 image denoising. The ICA only brings slight i mprovements, po ssibly due to the f ac t that the associated transform is not orthogona l. Pre -decorrelating the data signiﬁcantly increa ses the SNR, h owe ver the fully optimized version o f our e stimator remains the most ef fectiv e method. C. Inﬂue nce of t he neighbor hoods The R O V can be deﬁned as de sired a nd plays a prominen t role in the c onstruction of ou r estimator . W e study here the inﬂuence of dif ferent choices of the RO V : 1) R O V1 c orresponds to an inter -c omponent neighbo rhood. When a DWT is e mployed (see Fig. 5(a)), we have r ( b ) j , m ( k ) = [  r ( b ′ ) j , m ( k )  b ′ ] ⊤ , while for a DTT (see Fig. 6(a)), we use r ( b ) j , m ( k ) = [  r ( b ′ ) j , m ( k )  b ′ ,  r H ( b ′ ) j , m ( k )  b ′ ] ⊤ and u ( b ) j , m ( k ) = [  u ( b ′ ) j , m ( k )  b ′ ] ⊤ (53) r H ( b ) j , m ( k ) = [  r H ( b ′ ) j , m ( k )  b ′ ,  r ( b ′ ) j , m ( k )  b ′ ] ⊤ u H ( b ) j , m ( k ) = [  u H ( b ′ ) j , m ( k )  b ′ ] ⊤ . (54) May 30, 2018 DRAFT 25 2) R O V2 co rresponds to a c ombination of a spatial 3 × 3 and a n inter -componen t neighbo rhood as considered in the pre vious sec tions a nd shown in Figs. 5(b) and 6(b). B spectral components B spectral components (a) (b) Fig. 5. Representation of the different considered R O Vs in the DWT domain (the black triangle will be estimated taking into account the white ones); (a) RO V1 the purely inter-component one and (b) R OV2 combining inter-compo nent and spatial dependen cies. The linear part of the e stimator is deﬁned as in Section V -A. The corresponding results are given i n T a ble V. T ABLE V I N FL U E N C E O F T H E N E I G H B O R H O O D I N T U N I S I M A G E D E N O I S I N G ( A V E R AG E V A L U E S C O M P U T E D O V E R 3 C H A N N E L S A R E P R O V I D E D A N D σ 2 = 258 . 9) U S I N G S Y M L E T S ( L E N G T H 16 ) ( T O P ) A N D A C FI LT E R B A N K ( L E N G T H 16) ( B O T T O M ) . T r ansform SNR init R O V1 RO V2 T r ansform SNR init R O V1 RO V2 DWT (symlets) 9.081 15.42 15.84 DWT (A C) 9.081 15.48 15.77 DTT (symlets) 9.081 15.77 16.02 DTT (AC) 9.081 15.88 16.02 In order to compare different po ssible wav elet cho ices, the resu lts are provided both for s ymlets of length 16 and a 4-band ﬁlter bank gi ven in [52] which is de noted b y AC. Thes e results can a lso b e compared with the ones given i n Section V -A whe re Daubechies ﬁlters of length 8 are used. Concerning the neighborhoo d inﬂuence, we no te that tak ing into accoun t spatial depend ence leads to a signiﬁcant improvement w .r .t. inter-component depe ndence . In all cases, combining spectral an d spa tial neighborhoo d leads to the best results. May 30, 2018 DRAFT 26 B + B dual (dashed) Post−processing components B + B dual (dashed) Post−processing components (a) (b) Fig. 6. Representation of the dif ferent considered RO Vs in the DTT domain, with and without post-processing stage (the black triangle will be estimated taking into account the white ones); (a) R O V1 the purely inter-comp onent one and (b) R O V2 combining inter-compon ent and spatial dependencies. Concerning the wa velet choice, it appears that the 4-band A C wav elets yield slightly better results than the d yadic symlets choosing R O V1 and equivalent results choosing R O V3. Both outperform Daubechies wa velets wathever the R O V cho sen. D. V arious noise levels In this s ection, we cons ider that the image cha nnels are corrupted at diff erent noise levels. Thus, the noise is spatially whit e, zero-mean, Gaussian with cov ariance matrix Γ Γ Γ ( n ) 2 = Diag ( σ 2 1 , . . . , σ 2 B ) . T ABLE VI D E N O I S I N G R E S U LT S O N T U N I S I M A G E C O N S I D E R I N G Γ Γ Γ ( n ) 2 A N D U S I N G S Y M L E T S ( L E N G T H 16 ) . Channel σ 2 b SNR init Surev ect Proposed DWT Sure vect DTT P roposed DTT b = 1 25.89 18.66 2 0.58 21 .15 20.84 21.24 b = 2 258.9 9.653 1 8.53 18 .63 18.76 18.84 b = 3 491.9 6.138 1 4.20 14 .57 14.55 14.71 A verage 11.49 17.76 1 8.12 18.05 18.26 The resulting nu merical resu lts are displaye d in T able VI with the correspon ding noise levels, when May 30, 2018 DRAFT 27 our es timator is used with R O V2 . N oticeable differences can b e obs erved whe n co mparing Surevect with our method both considering DWT and DT T transforms. E. Increased n umber of c hannels A strong a dvantage of the proposed method is that, u nlike many multicompo nent a pproache s limited to RGB (3 components) imag es, it may proc ess a ny kind of multichannel images, whate ver the nu mber of channe ls is. W e consider here the 6 channel T rento image. W e apply the Su rev ect e stimator ( both using D WT and DTT), the BLS-GSM es timator (taking into a ccount the parent coef ﬁc ient), and our estimator using R O V2 . From the results provided in T a ble VII, we se e that, while the number of channels is T ABLE VII R E S U LT S O B TA I N E D A P P LY I N G D I FF E R E N T E S T I M ATO R S O N T R E N T O I M A G E ( σ 2 = 258 . 9) . Channel SNR init Surev ect Pr oposed BLS-GSM red Surev ect Proposed DWT + parent DTT DT T b = 1 -2.907 8.661 8.945 8.311 8.984 9.255 b = 2 -6.878 8.375 8.430 6.536 8.805 8.876 b = 3 -3.836 8.288 8.430 7.341 8.647 8.749 b = 4 2.428 9.525 9.796 9.836 9.901 10.00 b = 5 4.765 11.18 11.53 11.38 11.61 11.78 b = 6 -1.560 9.545 9.700 8.167 9.945 10.02 A verage -1.331 9.262 9.472 8.596 9.649 9.780 increased , our method still outperforms the other ones e specially when a DTT is used. W ith the inc rease of the n umber of channels , the reduced red undancy of the DTT become s another attractiv e feature of the proposed approach. V I . C O N C L U S I O N In this pa per , we h av e proposed a nonlinear Stein bas ed estimator for wa velet denoising of multichanne l data. Due to its ﬂexible f orm, the considered es timator generalizes many existi ng methods, in particular block-base d ones. Although the prop osed app roach has been applied t o s atellite i mages, i t could also be used in any multi variate signal d enoising problem. B esides, the estimator has been used in conjun ction with real dual-tr ee wa velet transforms but complex ones or other frame decompositions could be en visag ed as we ll. In the context of frame represe ntations, it s hould howev e r b e no ticed that the proposed estimator minimizes the risk over the frame co efﬁcients and not on the recon structed signal, which may be suboptimal [21], [53]. Another question that should be in ves tigated in a future work is the ability of May 30, 2018 DRAFT 28 the proposed frame work to exploit inter -scale depe ndencies in add ition to spa tial an d inter-component ones, a s conside red in [21] for the mono-chan nel c ase. In order to obtain an interscale d enoising method, an appropriate R O V s hould be d eﬁned and the intersc ale statistics of t he noise should be a vailable. R E F E R E N C E S [1] M. C. Abrams and S. C . Cain, “Sampling, r adiometry and image reconstruction for polar and geostationary meteorological remote sensing systems, ” i n P r oc. of SPIE, Image Reconstruction fr om Incomplete Data II , P . J. Bones, M. A. Fiddy , and R. P . Millane, Eds., vol. 4792, Dec. 2002, pp. 207–215. [2] B. R. Corner , R. M. Narajanan, and S . E. Reichenbach, “Noise estimation i n remote sensing imagery using data masking, ” Int. J. Remote Sensing , vol. 24, no. 4, pp. 689–70 2, 2003. [3] I. Daubechies, T en Lectur es on W avelets . Philadelphia, P A, USA: CBMS-NS F , S IAM L ecture Seri es, 1992. [4] S . Mallat, A wavelet tour of signal pr ocessing . San Diego, CA, US A: Academic Press, 1998. [5] D. L. Donoho, “Unconditional bases are optimal bases for data compression and for statistical estimation, ” Appl. and Comp. H arm. Analysis , vol. 1, no. 1, pp. 100–1 15, Dec. 1993. [6] D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wav elet shrinkage, ” Biometrika , vol. 81, no. 3, pp. 425–455, Aug. 1994. [7] — —, “ Adapting to unkno wn smoothness via wav elet shrinkage, ” J. American Statist. Ass. , vol. 90, pp. 1200–1224, Dec. 1995. [8] G. P . Nason, “W avelet shrinkage using cross-validation, ” J. Royal Statistical Society B , vol. 58, pp. 463–479, 1996. [9] Y . W ang, “Function esti mation via wavelet shrinkage for long-memory data, ” The Annals of Statistics , vol. 24, pp. 466–484 , 1996. [10] N. W eyrich and G. T . W arhola, “W av elet shrinkage and generalized cross validation for image denoising, ” IE EE T rans . on Imag e Proc. , vol. 7, no. 1, pp. 82–90, Jan. 1998. [11] H. Krim, D. T ucker , S. Mallat, and D. Donoho, “On denoising and best signal represe ntation, ” IEEE T rans. on Inform. Theory , vol. 45, no. 7, pp. 2225–2238, Nov . 1999. [12] K. T ang, J. Astola, and Y . Neuvo, “Nonlinear multiv ariate image ﬁlteri ng techniques, ” IEEE Tr ans. on Im ag e Pro c. , vol. 4, no. 6, pp. 788–798 , Jun. 1995. [13] A. K. Fletcher , V . Goyal, and K. Ramchandran, “On multiv ariate estimation by thresholding, ” in Pr oc. Int. Conf. on Image Pr ocessing , vol. 1, Barcelona, Spain, Sep. 14-17 2003, pp. 61–64. [14] A. K. F letcher and K. Ramchandran, “Estimation error bounds for denoising by sparse approximation, ” i n Pro c. Int. Conf. on Imag e Pro cessing , vol. 1, Barcelona, S pain, S ep. 4-17 2003, pp. 113–116 . [15] P . S cheunders, “W ave let thresholding of multiv alued i mages, ” IEEE T rans. on Image P r oc. , vol. 13, no. 4, pp. 475–483, Apr . 2004. [16] A. Benazza-Benyahia and J. -C. Pesquet, “W avelet-based multispectral image denoising wit h Bernoulli-Gaussian models, ” in IEEE-EURA SIP W orkshop on Nonlinear Signal and Image Processin g , Grado, Italy , Jun. 8-11 2003. [17] A. P i ˘ zurica, W . Philips, I. Lemahieu, and M. Acheroy , “ A joint inter- and intrascale statistical model for bayesian wav elet based image denoising, ” IEEE T rans. on Image Pro c. , vol. 11, no. 5, pp. 545–557, May 2002. [18] P . S cheunders and J. Driesen, “Least-squares interband denoising of color and multispectral images, ” in Proc. Int. Conf. on Imag e Pro cessing , vol. 2, Si ngapore, Oct. 24-27 2004, pp. 985–988. May 30, 2018 DRAFT 29 [19] C. Stein, “Estimation of the mean of a multiv ariate normal distribution, ” Annals of Statistics , vol. 9, no. 6, pp. 1135–1151, 1981. [20] J.-C. Pesquet and D. Leporini, “ A new wa velet esti mator for image denoising, ” in IE E Sixth International Confer ence on Imag e Processin g and its Applications , vol. 1, Dublin, Ireland, Jul. 14-17 1997, pp. 249–253. [21] F . Luisier , T . Blu, and M. Unser, “ A new SURE approach to image denoising: Inter-scale orthonorma l wavelet thresho lding, ” IEEE T rans. on Imag e Pr oc. , vol. 16, no. 3, pp. 593–606, Mar . 2007. [22] A. Benazza-Ben yahia and J.-C. Pesquet, “Building robust w av elet estimators for multicomponent images using Stein’ s principle, ” IEEE T rans. on Image Pr oc. , vol. 14, no. 11, pp. 1814–183 0, Nov . 2005. [23] T . T . Cai and B. W . Silverman, “Incorporating information on neighboring coefﬁcients into wav el et estimation, ” Sankhya , vol. 63, S eries B., pp. 127–148, 2001. [24] F . Abramo vich, P . Besbeas, and T . Sapatinas, “Empirical Bayes approach to block w av elet function estimation, ” Computational Statistics & Data Analysis , vol. 39, no. 4, pp. 435–451, Jun. 2002. [25] C. C haux, A. Benazza-Benyahia, and J.-C . Pesquet, “ A block-thresholding method for multispectral image denoising, ” in Pr oc. SPIE , Ann. Meeting, W avelets X , vol. 5914, San Diego, CA, US A, Aug. 2005, pp. 1–H1,1–H13. [26] I. W . Selesnick, R. G. Baraniuk, and N. G. Kingsbury , “The dual-tree complex wav elet transform, ” IEEE Signal Proc essing Maga zine , vol. 22, no. 6, pp. 123–151, Nov . 2005. [27] C. Chaux, L. Duv al, and J.-C. Pesquet, “Image analysis using a dual-tree M -band wav elet transform, ” IEEE T rans. on Imag e Proc. , vol. 15, no. 8, pp. 2397– 2412, Aug. 2006. [28] A. Antoniadis, D. Leporini, and J. -C. Pesquet, “W avelet thresholding for some classes of non-Gaussian noise, ” Statistica Neerlandica , vol. 56, no. 4, pp. 434–453, Dec. 2002. [29] H. Maître, E d., Le t raitement d’images . Paris, France: H ermès Sci ence Publications, 2003. [30] P . St ef fen, P . N. Heller , R. A. Go pinath, an d C . S. Burrus, “Theory o f regular M -band wa velet ba ses, ” IEEE T rans. on Signal Pr oc. , vo l. 41, no. 12, pp. 3497–3511, Dec. 1993. [31] P . Hall, S . Pene v , G. K erkyacharian, and D. Picard, “Numerical performan ce of block thresho lded wa velet estimators, ” Statist. Comput. , vol. 7, no. 2, pp. 115–124, 1997. [32] P . Hall, G. Kerk yacharian, and D. Picard, “Block threshold rules for curve estimation using kernel and wavelet methods, ” Ann. Statist. , vol. 26, no. 3, pp. 922–942, 1998. [33] ——, “On t he minimax optimality of block thresholded wav elet estimators, ” Statist. Sinica , vol. 9, pp. 33–49, 1999. [34] J. Portilla, V . Strela, M. J. W ainwright, and E. P . Si moncelli, “Image denoising using scale mixtures of Gaussians in the wa velet domain, ” IEEE T rans. on Image Proc . , vol. 12, no. 11, pp. 1338–1 351, Nov . 2003. [35] J. M . Shapiro, “Embedded imag e coding using zerotrees of wavelet coef ﬁ cients, ” IEEE T rans. on Signal Pr oc. , vol. 41, pp. 3445–346 2, Dec. 1993. [36] J. K. Romberg, H. Choi, and R. G. Baraniu k, “Bayesian tree-structured image modeling using wa velet-domain hidden Marko v models, ” IEEE T rans. on Image Pro c. , vol. 10, no. 7, pp. 1056–1068, Jul. 2001. [37] L. ¸ Sendur and I. W . Selesnick, “Biv ariate shrinkage functions for wa velet-based denoising ex ploiting interscale depe ndenc y , ” IEEE T rans. on Signal P r oc. , vol. 50, no. 11, pp. 2744–275 6, Nov . 2002. [38] A. Benazza-Beny ahia and J.-C. Pesquet, “ An interscale mu ltiv ariate MAP estimation of multispectral images, ” in P r oc. Eur . Sig. and Imag e Proc. Conferen ce , Wien, Austria, Sep. 6-10 2004. [39] A. P i ˘ zurica and W . Phili ps, “Estimating probability of presence of a signal of interest in multiresolution single- and multiband image denoising, ” IE EE Tr ans. on Image Proc . , vol. 15, no. 3, pp. 654–665, Mar . 2006. May 30, 2018 DRAFT 30 [40] R. Coifman and D. Donoho, T ranslation-in variant de-noising , ser . Lecture Notes in Statistics. New Y ork, NY , USA: Springer V erlag, 1995, vol. 103, pp. 125–15 0. [41] G. P . Nason and B. W . Silverman, The stationary wavelet transform and some statistical applications , ser . Lecture Notes in Statistics. New Y ork, NY , USA: Springer V erlag, 1995, vol. 103, pp. 281–300. [42] J.-C. Pesquet, H. Krim, and H. Carfatan, “T i me-in variant orthogonal wav elet representations, ” I EEE T rans. on Signal Pro c. , vol. 44, no. 8, pp. 1964–1970, A ug. 1996. [43] H.-Y . Gao, “W av elet shrinkage denoising u sing th e no n-neg ativ e garrote, ” J . Comput. Gr aph. Statist. , vol. 7 , no. 4, pp. 469–48 8, 1998. [44] L. Breiman, “Bett er subset regression using the nonneg ativ e garrote, ” T echnome trics , vol. 37, no. 4, pp. 373–384, 1995. [45] L. ¸ Sendur and I . W . Selesnick, “Biv ariate shrinkage with local v ariance estimation, ” Signal P r ocessing Letters , vol. 9 , no. 12, pp. 438–441, Dec. 2002. [46] P . L. Combettes and V . R. W ajs, “Signal recovery by proximal forwa rd-backwa rd splitting, ” SIAM J . on Mult. Model. Simul. , vol. 4, pp. 1168–1200 , N ov . 2005. [47] J.-F . Cardoso and A. Souloumiac, “Blind beamforming for non Gaussian signals, ” IE E Procee dings F , vol. 140, no. 6, pp. 362–37 0, Dec. 1993. [48] X. Guyon, Champs aléatoires sur un réseau, modélisations, statistiques et applications . Paris: Masson, 1993. [49] C. Chaux, L. Duv al, and J.-C. Pesque t, “Étude du bruit dans une an alyse M -bandes en arbre du al, ” in Pr oc. GRETSI , Louv ain, Belgique, Sep. 2005, pp. 229–232. [50] C. Chau x, J.-C. P esquet, and L. Duval, “Noise c ov ariance prop erties i n dual-tree wav elet decompositions, ” IEEE Tr ans. on Inform. Theory , 2007, accepted. [51] E. Cand ès, L. Demanet, D. Dono ho, and L. Ying , “Fast discrete curv elet tr ansforms, ” SIAM J . on Mult. Model. S imul. , vol. 5, no. 3, pp. 861–899, Mar . 2006. [52] O. Alkin and H. Caglar, “Design of efﬁcient M -band coders with l inear-pha se and perfect-reconstruction properties, ” IEE E T rans. on Signal P r oc. , vol. 43, no. 7, pp. 1579–1590, Jul. 1995. [53] M. Raphan and E . P . Simoncelli, “Optimal denoising in redundan t bases, ” in P r oc. Int. Conf. on Imag e Pr ocessing , vol. III, San Antonio, TX, USA, Sep. 16-19 2007, pp. 113–11 6. May 30, 2018 DRAFT

A nonlinear Stein based estimator for multichannel image denoising

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment