On MMSE and MAP Denoising Under Sparse Representation Modeling Over a Unitary Dictionary

J.S. Turek, I. Yavneh, M. Protter, M. Elad
Technion, Israel Institute of Technology, Computer Science Department, Haifa 32000, Israel
Email addresses: javiert@cs.technion.ac.il (J.S. Turek), irad@cs.technion.ac.il (I. Yavneh), matanpr@cs.technion.ac.il (M. Protter), elad@cs.technion.ac.il (M. Elad)

Abstract

Among the many ways to model signals, a recent approach that draws considerable attention is sparse representation modeling. In this model, the signal is assumed to be generated as a random linear combination of a few atoms from a pre-specified dictionary. In this work we analyze two Bayesian denoising algorithms – the Maximum-A-posteriori Probability (MAP) and the Minimum-Mean-Squared-Error (MMSE) estimators, under the assumption that the dictionary is unitary. It is well known that both these estimators lead to a scalar shrinkage on the transformed coefficients, albeit with a different response curve. In this work we start by deriving closed-form expressions for these shrinkage curves and then analyze their performance. Upper bounds on the MAP and the MMSE estimation errors are derived. We tie these to the error obtained by a so-called oracle estimator, where the support is given, establishing a worst-case gain-factor between the MAP/MMSE estimation errors and the oracle's performance. These denoising algorithms are demonstrated on synthetic signals and on true data (images).

Keywords: Sparse representations, MAP, MMSE, Unitary dictionary, Shrinkage, Bayesian estimation, Oracle

(This research was supported by the European Community's FP7-FET program, SMALL project, under grant agreement no. 225913, and by the Israel Science Foundation (ISF) grant number 1031/08.)

Preprint submitted to Applied and Computational Harmonic Analysis.

1. Introduction

A classical and long-studied subject in signal processing is denoising. This task considers a given measurement signal $y \in \mathbb{R}^n$ obtained from a clean signal $w \in \mathbb{R}^n$ by an additive contamination of the form $y = w + v$. We shall restrict our discussion to zero-mean i.i.d. Gaussian noise vectors $v \in \mathbb{R}^n$, with each entry drawn at random from the normal distribution $\mathcal{N}(0, \sigma^2)$. The denoising goal is to recover $w$ from $y$.

An effective denoising algorithm assumes knowledge about the noise characteristics, like the above description, and introduces some assumptions about the class of signals to which $w$ belongs, that is, a-priori knowledge about the signal. There is a great number of algorithms today, corresponding to a variety of signal models. Among these, a recently emerging group of techniques relies on sparse and redundant representations for modeling the signals [4]. A signal $w$ is said to have a sparse representation over a known dictionary $D \in \mathbb{R}^{n \times m}$ if there exists a sparse vector $x \in \mathbb{R}^m$ such that $w = Dx$. The vector $x$ is the representation of $w$, having a number of non-zeros, $\|x\|_0 = k$, which is much smaller than its length, $m$. Thus, $x$ describes how to construct $w$ as a linear combination of a few columns (also referred to as atoms) of $D$. In general, the dictionary may be redundant, containing more atoms than the signal dimension ($m \ge n$).
Assuming that $w = Dx$ with a sparse representation $x$, how can one recover $w$ from the noisy measurement $y$? By posing a prior probability density function over $x$, one can derive the exact Maximum-A-posteriori Probability (MAP) estimator for this task. This becomes a search for the support of the sparse representation $\hat{x}$ that maximizes the posterior probability. This problem is computationally complex, as it generally requires an exponential sweep over all the possible sparse supports [17]. Therefore, approximation methods are often employed, such as the Orthogonal Matching Pursuit (OMP) [15] and the Basis Pursuit (BP) [6].

While MAP estimation promotes seeking a single sparse representation to explain the measurements, recent work has shown that better results¹ are possible using the Minimum Mean Square Error (MMSE) estimator [14, 20, 12]. These works develop MMSE estimators, showing that they lead to a weighted average of all the possible representations that may explain the signal, with weights related to their probabilities. Just like MAP in the general setting, this estimation is infeasible to compute, and thus various approximations are proposed [14, 20, 12].

¹ In the $\ell_2$-error sense, which is often the measure used to assess performance.

A well-known and celebrated result in signal processing is the fact that the MAP estimator mentioned above admits a simple closed-form formula in the special case where the dictionary $D$ is square and unitary [11, 21, 16]. This formula, known as a shrinkage operation, yields the estimate $\hat{w}$ by applying a simple 1D operation on the entries of the vector $D^T y$. The denoised signal is then obtained by multiplication by $D$. Shrinkage tends to eliminate small entries, while leaving larger ones almost intact.

Our recent work reported in [18, 19] aimed to develop an MMSE closed-form formula for the unitary case. With a specific prior model on $x$, a recursive formula for this task was developed. Thus, at least in principle, the implication of this work is that one need not turn to approximations, as this formula is easily computable, leading to the exact MMSE. While such a result is very encouraging, it does not provide a truly simple technique of the form that MAP enjoys. Furthermore, due to its recursive nature, this algorithm suffers from instability problems that hinder its use for high-dimensional signals.

In the present work, we propose a modified prior model for the sparse representation vector $x$. We show that this change leads to a simplified MMSE formula, which, just as for the MAP, becomes a scalar shrinkage, albeit with a different response curve. As such, this exact MMSE denoising exhibits no numerical sensitivities as in [18, 19], and thus it can operate easily in any dimension. The core idea that MMSE estimation for the unitary case leads to a shrinkage algorithm has been observed before [7, 8, 9, 1, 2]. Here we adopt a distinct approach in the derivation, which also gives us exact and simple expressions for the MAP and MMSE shrinkage curves, and their expected $\ell_2$-errors. We use these as a stepping-stone towards the development of upper bounds on the MAP and the MMSE estimation errors.

A fundamental and key question that has attracted attention in recent years is the proximity between practical pursuit² results and the oracle performance.

² Pursuit is a generic name given to algorithms that aim to estimate $x$.
The oracle is an estimator that knows the true support, thus giving an ultimate result which can be used as a gold-standard for assessing practical pursuit performance. For example, the work reported in [5] shows that the Dantzig Selector algorithm is a constant (and log) factor away from the oracle result. Similar claims for the BP, the OMP, and even the thresholding algorithms, are made in [3]. In both these papers, the analysis is deterministic and non-Bayesian, which is different from the point of view taken in this paper. In this work we tie the MAP and the MMSE errors for the unitary case to the error obtained by an oracle estimator. We establish worst-case gain-factors of the MAP and the MMSE errors relative to the oracle error. This gives a clear ranking of these algorithms, and states clearly their nearness to the ideal performance.

The paper is organized as follows. In Section 2 we describe the signal model we shall use throughout this work. For completeness of the presentation, we also derive the MAP and MMSE estimators for the general case in this section. In Section 3 we turn to the unitary case and present the ideal MAP and MMSE estimators, showing how both lead to shrinkage operations. Section 4 is devoted to the development of the performance behavior of the MAP and MMSE estimates, and the upper bounds on their errors. Section 5 presents numerical experiments, demonstrating the proposed algorithms in action. In Section 6 we conclude the paper.

2. Background

2.1. The Signal Model

We consider a generative signal model that resembles the one presented in [20]. In this model, each atom has a prior probability $P_i$ of participating in the support of each signal, and $(1 - P_i)$ of not appearing. One can think of the support selection stage as performing biased coin-tosses of $m$ coins, with the $i$-th coin having a probability $P_i$ of "heads" and $(1 - P_i)$ for "tails". The coins that turn up "heads" constitute the support $S$ for this signal. Thus, the a-priori probability for any support $S$ is given by

$$ P(S) = \prod_{i \in S} P_i \cdot \prod_{j \notin S} (1 - P_j). \qquad (1) $$

It is important to note that, as opposed to the model used in [12, 18, 19], here it is not possible to explicitly prescribe the cardinality of the support, nor is it possible to limit it (as even the empty and full supports may arise by chance). If, for some $i$, $P_i$ equals 0, all the supports that contain element $i$ have zero probability. Similarly, if we have $P_i = 1$, then all the supports that do not select the $i$-th atom also have zero probability. Hence, in our study we only need to consider values $0 < P_i < 1$ for all $i$, and this is assumed henceforth.

We further assume that, given the support $S$, the coefficients in $x$ on this support are drawn as i.i.d. Gaussian random variables³ with zero mean and variance $\sigma_x^2$,

$$ x | S \sim \mathcal{N}\left(0, \sigma_x^2 I_{|S|}\right), \qquad (2) $$

where $I_{|S|}$ is the identity matrix of size $|S|$. We measure the vector $y \in \mathbb{R}^n$, a noisy linear combination of atoms from $D$ with coefficients $x \in \mathbb{R}^m$, namely, $y = Dx + v$, where the noise $v$ is assumed to be white Gaussian with variance $\sigma^2$, i.e., $v \sim \mathcal{N}(0, \sigma^2 I_n)$, and the columns of $D$ are normalized.

³ In fact, we may suggest a broader model of the form $P(x_i) \sim \exp\{-f(x_i/\sigma_x)\}$, for an arbitrary function $f(\cdot)$, thus keeping the model very general. It appears that with this change one can still obtain MMSE-shrinkage. Furthermore, one may also study the sensitivity of MMSE/MAP shrinkage-curves under perturbations of $f(\cdot)$, and even find the worst choice of this function, that leads to the maximal expected error in MMSE – all these are left to future work, as we mainly focus here on the Gaussian model.
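To make the generative side of this model concrete, the following sketch (ours, not from the paper; the helper name generate_signal is hypothetical) draws a support by biased coin-tosses as in Equation (1), fills it with Gaussian coefficients as in Equation (2), and adds the measurement noise. It assumes only that D has normalized columns; for the unitary case one may pass, e.g., an orthonormal basis.

    import numpy as np

    def generate_signal(D, P, sigma_x, sigma, rng):
        """Draw y = D x + v with Bernoulli(P_i) support and N(0, sigma_x^2) coefficients."""
        n, m = D.shape
        support = rng.random(m) < P                           # biased coin-tosses, Eq. (1)
        x = np.zeros(m)
        x[support] = rng.normal(0.0, sigma_x, support.sum())  # Eq. (2)
        v = rng.normal(0.0, sigma, n)                         # white Gaussian noise
        return D @ x + v, x, support

    rng = np.random.default_rng(0)
    y, x, support = generate_signal(np.eye(64), 0.1, 1.0, 0.5, rng)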
From the model assumptions made above, it can be seen [13] that $y$ and $x$ are jointly Gaussian for a given support,

$$ \begin{bmatrix} y \\ x \end{bmatrix} \Big|\, S \sim \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} C_S & \sigma_x^2 D_S \\ \sigma_x^2 D_S^T & \sigma_x^2 I_{|S|} \end{bmatrix}\right), \qquad (3) $$

where

$$ C_S = \sigma_x^2 D_S D_S^T + \sigma^2 I_n, \qquad (4) $$

and $D_S \in \mathbb{R}^{n \times |S|}$ is comprised of the columns of the matrix $D$ that appear in the support $S$. Hence, the marginal p.d.f. $P(y|S)$ is Gaussian and it is given by

$$ y | S \sim \mathcal{N}(0, C_S). \qquad (5) $$

Using properties of the multivariate Gaussian p.d.f. (see [13, p. 325]), we have that the likelihood $P(y|x, S)$ and the posterior p.d.f. $P(x|y, S)$ are also Gaussian, namely

$$ y | x, S \sim \mathcal{N}\left(D_S x_S, \sigma^2 I_n\right) \qquad (6) $$
$$ x | y, S \sim \mathcal{N}\left(\frac{1}{\sigma^2} Q_S^{-1} D_S^T y, \; Q_S^{-1}\right), \qquad (7) $$

where the sub-vector $x_S$ is comprised of the elements of $x$ whose indices are in the support $S$, and

$$ Q_S = \frac{1}{\sigma_x^2} I_{|S|} + \frac{1}{\sigma^2} D_S^T D_S. \qquad (8) $$

There is a direct link between the matrices $Q_S$ and $C_S$, expressed using the matrix inversion lemma,

$$ C_S^{-1} = \frac{1}{\sigma^2} I_n - \frac{1}{\sigma^4} D_S Q_S^{-1} D_S^T. \qquad (9) $$

2.2. MAP/MMSE Estimators – The General Case

2.2.1. The Oracle Estimator

The first estimator we derive is the oracle. This estimator assumes knowledge of the chosen support for $x$, information that is unknown in the actual problem. Therefore it cannot be obtained in practice. Nevertheless, it gives us a reference performance quality to compare against. The oracle can target the minimization of the MSE⁴. A well-known and classical result states that the MMSE estimator is equal to the conditional mean of the unknown, conditioned on the known parts, and thus in our case it is $E\{x|y, S\}$. As the support $S$ is known, we need to estimate $x_S$, the sub-vector of non-zero entries of $x$, so the estimator is given by

$$ \hat{x}_S^{Oracle} = E\{x_S | y, S\} = \frac{1}{\sigma^2} Q_S^{-1} D_S^T y, \qquad (10) $$

where this equality comes from the expectation of the probability distribution in (7).

⁴ Or MAP – in fact, the two are the same in this case due to the Gaussianity of $x|y, S$.

2.2.2. Maximum A-Posteriori Estimator (MAP)

The MAP estimator proposes an estimate $\hat{x}$ that maximizes the posterior probability. As the model mixes discrete probabilities $P_i$ with continuous ones $P(x|S)$, the MAP should be carefully formulated; otherwise, the most probable estimate would be the zero vector. Thus, we choose instead to maximize the posterior of the support,

$$ S^{MAP} = \arg\max_S P(S|y), \qquad (11) $$

and only then compute the corresponding estimate $\hat{x}_{S^{MAP}}$. We know from Equation (7) that $x|y, S$ behaves as a normal distribution, and thus the estimate $\hat{x}^{MAP}$ is given by the oracle in (10) with the specific support $S^{MAP}$. Using Bayes's rule, Equation (11) leads to

$$ P(S|y) = \frac{P(y|S) P(S)}{P(y)}. \qquad (12) $$

Since $P(y)$ does not depend on $S$, it affects this expression only as a normalizing factor.
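As a sanity check of Equation (10), a direct implementation of the oracle for a known support might read as follows (a minimal sketch of ours; the function name is hypothetical, and it returns only the sub-vector on the support):

    import numpy as np

    def oracle_estimate(D, y, support, sigma_x, sigma):
        """Oracle estimate of x_S given the true support, Eqs. (8) and (10)."""
        D_S = D[:, support]
        k = D_S.shape[1]
        Q_S = np.eye(k) / sigma_x**2 + (D_S.T @ D_S) / sigma**2   # Eq. (8)
        return np.linalg.solve(Q_S, D_S.T @ y) / sigma**2         # Eq. (10)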
Using the expressions of the probabilities in the numerator that are given by Equations (5) and (1), respectively, we obtain

$$ P(S|y) \propto \frac{1}{\sqrt{\det(C_S)}} \exp\left\{-\frac{1}{2} y^T C_S^{-1} y\right\} \cdot \prod_{i \in S} P_i \cdot \prod_{j \notin S} (1 - P_j) \equiv t_S, \qquad (13) $$

where we have introduced the notation $t_S$ for brevity of later expressions. Returning to our MAP goal posed in Equation (11), applying a few simple algebraic steps on the expression for $P(S|y)$ leads to the following penalty function, which should be maximized with respect to the support $S$,

$$ \mathrm{Val}(S) = \frac{1}{2}\left\|\frac{1}{\sigma^2} Q_S^{-1/2} D_S^T y\right\|_2^2 - \frac{1}{2}\log\det(C_S) + \sum_{i \in S}\log(P_i) + \sum_{j \notin S}\log(1 - P_j), \qquad (14) $$

over all $2^m$ possible supports. Once found, we obtain the MAP estimation by using the oracle formula from Equation (10), which computes $\hat{x}_S$ for this support.

2.2.3. Minimum Mean Square Error Estimator (MMSE)

The MMSE estimate is given by the conditional expectation, $E\{x|y\}$,

$$ \hat{x}^{MMSE} = E\{x|y\} = \int_x x\, P(x|y)\, dx. \qquad (15) $$

Marginalizing the posterior probability $P(x|y)$ over all possible supports $S \in \Omega$, we have

$$ P(x|y) = \sum_{S \in \Omega} P(x|y, S) P(S|y). \qquad (16) $$

Plugging Equation (16) into Equation (15) yields

$$ \hat{x}^{MMSE} = \sum_{S \in \Omega} P(S|y) \int_x x\, P(x|y, S)\, dx = \sum_{S \in \Omega} P(S|y)\, E\{x|y, S\} = \sum_{S \in \Omega} P(S|y)\, \hat{x}_S^{Oracle}. \qquad (17) $$

Equation (17) shows that the MMSE estimator is a weighted average of all the "oracle" solutions, each with a different support and weighted by its probability. Finally, we substitute the expression $t_S$ developed in Equation (13) into Equation (17), and get the formula for MMSE estimation,

$$ \hat{x}^{MMSE} = \frac{1}{t} \sum_{S \in \Omega} t_S \cdot \hat{x}_S^{Oracle}, \qquad (18) $$

where $t = \sum_{S \in \Omega} t_S$ is the overall normalizing factor.
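For intuition, the exact MAP and MMSE can be evaluated by brute force when $m$ is tiny – the $2^m$ sweep is exactly what Section 3 avoids. A sketch of ours (function name hypothetical; unnormalized weights per Equation (13), suitable for very small m only):

    import numpy as np
    from itertools import combinations

    def exact_estimators(D, y, P, sigma_x, sigma):
        """Exhaustive MAP/MMSE over all 2^m supports, per Eqs. (13), (11), (18)."""
        n, m = D.shape
        best_val, S_map = -np.inf, ()
        x_mmse, t_total = np.zeros(m), 0.0
        for k in range(m + 1):
            for S in combinations(range(m), k):
                D_S = D[:, list(S)]
                C_S = sigma_x**2 * D_S @ D_S.T + sigma**2 * np.eye(n)   # Eq. (4)
                log_t = (-0.5 * np.linalg.slogdet(C_S)[1]               # Eq. (13)
                         - 0.5 * y @ np.linalg.solve(C_S, y)
                         + sum(np.log(P[i]) for i in S)
                         + sum(np.log(1 - P[j]) for j in range(m) if j not in S))
                if log_t > best_val:
                    best_val, S_map = log_t, S                           # Eq. (11)
                x_S = np.zeros(m)
                if k > 0:                                                # oracle, Eq. (10)
                    Q_S = np.eye(k) / sigma_x**2 + (D_S.T @ D_S) / sigma**2
                    x_S[list(S)] = np.linalg.solve(Q_S, D_S.T @ y) / sigma**2
                t_S = np.exp(log_t)
                x_mmse += t_S * x_S                                      # Eq. (18)
                t_total += t_S
        return S_map, x_mmse / t_total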
2.3. Estimator Performance – The General Case

We conclude this background section by discussing the expected Mean-Squared-Error (MSE) induced by each of the estimators developed above. Our goal is to obtain clear expressions for these errors, which will later serve when we develop similar and simpler expressions for the unitary case. We start with the performance of the oracle estimator, as the oracle is central to the derivation of the MAP and MMSE errors. The oracle's expected MSE is given by

$$ E\left\{\|\hat{x}_S^{Oracle} - x_S\|_2^2 \,\big|\, y\right\} = E\left\{\left\|\frac{1}{\sigma^2} Q_S^{-1} D_S^T y - x_S\right\|_2^2 \,\Big|\, y\right\} = E\left\{\left\|\frac{1}{\sigma^2} Q_S^{-1} D_S^T (D_S x_S + v) - x_S\right\|_2^2 \,\Big|\, y\right\} = \mathrm{trace}\left(Q_S^{-1}\right), \qquad (19) $$

where we have used Equation (8), and the fact that $y = D_S x_S + v$.

Our analysis continues with the expected error for a general estimate $\hat{x}$, observing that it can be written as

$$ E\left\{\|\hat{x} - x\|_2^2 \,\big|\, y\right\} = \int_{x \in \mathbb{R}^m} \|\hat{x} - x\|_2^2\, P(x|y)\, dx = \sum_{S \in \Omega} P(S|y) \int_{x \in \mathbb{R}^m} \|\hat{x} - x\|_2^2\, P(x|y, S)\, dx, \qquad (20) $$

where we have used the marginalization proposed in Equation (16). We add and subtract the oracle estimate $\hat{x}_S^{Oracle}$ that corresponds to the support $S$ inside the norm term, yielding

$$ \int_{x \in \mathbb{R}^m} \|\hat{x} - x\|_2^2\, P(x|y, S)\, dx = \int_{x \in \mathbb{R}^m} \|\hat{x}_S^{Oracle} - x\|_2^2\, P(x|y, S)\, dx + \int_{x \in \mathbb{R}^m} \|\hat{x} - \hat{x}_S^{Oracle}\|_2^2\, P(x|y, S)\, dx. \qquad (21) $$

Note that the integral over the cross-term $(\hat{x}_S^{Oracle} - x)^T (\hat{x} - \hat{x}_S^{Oracle})$ vanishes, since the term $\hat{x} - \hat{x}_S^{Oracle}$ is deterministic and can thus be moved outside the integration, while the expression remaining inside the integral is zero, since the oracle estimate is the expected $x$ over this domain and with this support.

Continuing with Equation (21), the first term represents the MSE of an oracle for a given support $S$, as derived in Equation (19). In the second term, the norm factor does not depend on the integration variable $x$, and thus it may be pulled outside the integration. The remaining part is equal to one. Therefore,

$$ \int_{x \in \mathbb{R}^m} \|\hat{x} - x\|_2^2\, P(x|y, S)\, dx = \mathrm{trace}\left(Q_S^{-1}\right) + \|\hat{x} - \hat{x}_S^{Oracle}\|_2^2. \qquad (22) $$

Returning to the overall expected MSE as in Equation (20), using the fact that $P(S|y) = t_S / t$, as developed in Equation (13), we have

$$ E\left\{\|\hat{x} - x\|_2^2\right\} = \frac{1}{t} \sum_{S \in \Omega} t_S \cdot \left[\mathrm{trace}\left(Q_S^{-1}\right) + \|\hat{x} - \hat{x}_S^{Oracle}\|_2^2\right]. \qquad (23) $$

By plugging $\hat{x} = \hat{x}^{MMSE}$ into this expression, we get the MMSE error. Note that if we minimize the above with respect to $\hat{x}$, we get the MMSE estimate formula exactly, as expected, since the MMSE is the solution that leads to the smallest error. Observe that (23) can be written differently by adding and subtracting $\hat{x}^{MMSE}$ inside the norm term, giving

$$ E\left\{\|\hat{x} - x\|_2^2\right\} = \frac{1}{t} \sum_{S \in \Omega} t_S \cdot \mathrm{trace}\left(Q_S^{-1}\right) + \frac{1}{t} \sum_{S \in \Omega} t_S \left\|\hat{x} - \hat{x}^{MMSE} + \hat{x}^{MMSE} - \hat{x}_S^{Oracle}\right\|_2^2 = \frac{1}{t} \sum_{S \in \Omega} t_S \cdot \mathrm{trace}\left(Q_S^{-1}\right) + \|\hat{x} - \hat{x}^{MMSE}\|_2^2 + \frac{1}{t} \sum_{S \in \Omega} t_S \|\hat{x}^{MMSE} - \hat{x}_S^{Oracle}\|_2^2 = \|\hat{x} - \hat{x}^{MMSE}\|_2^2 + E\left\{\|\hat{x}^{MMSE} - x\|_2^2 \,\big|\, y\right\}. \qquad (24) $$

In this derivation, the cross-term $(\hat{x} - \hat{x}^{MMSE})^T (\hat{x}^{MMSE} - \hat{x}_S^{Oracle})$ drops out, since in this summation the term $(\hat{x} - \hat{x}^{MMSE})^T$ can be positioned outside the summation, and then, using Equation (18), it is easily shown that we are left with an expression that equals $\hat{x}^{MMSE} - \hat{x}^{MMSE} = 0$. We thus have a general error formula for any estimator, given by Equation (24). In particular, this means that the error for the MAP estimate can be calculated by

$$ E\left\{\|\hat{x}^{MAP} - x\|_2^2 \,\big|\, y\right\} = \|\hat{x}^{MAP} - \hat{x}^{MMSE}\|_2^2 + E\left\{\|\hat{x}^{MMSE} - x\|_2^2 \,\big|\, y\right\}. \qquad (25) $$

3. MAP & MMSE Estimators for a Unitary Dictionary

The derivation of MAP and MMSE for a general dictionary leads to prohibitive computational tasks. As we shall see next, when using unitary dictionaries, we are able to avoid these demanding computations, and instead obtain closed-form solutions for each one of the estimators. Furthermore, the two resulting algorithms are very similar, both having a shrinkage structure. While this claim about MAP and MMSE leading to shrinkage is not new [7, 8, 9, 1, 2], our distinct development of the closed-form shrinkage formulae will lead to a simple computational process for the evaluation of the MAP and the MMSE, which will facilitate the performance analysis derived in Section 4.

3.1. The Oracle

Just as for the general dictionary case, we start by deriving an expression for the oracle estimation. In this case, we assume that the dictionary $D$ is a unitary matrix, and thus $D^T D = I$. Moreover, it is easily seen that $D_S^T D_S = I_{|S|}$, which will simplify our expressions. We start by simplifying the matrix $Q_S$ defined in (8),

$$ Q_S = \frac{1}{\sigma_x^2} I_{|S|} + \frac{1}{\sigma^2} D_S^T D_S = \frac{\sigma_x^2 + \sigma^2}{\sigma_x^2 \sigma^2} I_{|S|}. \qquad (26) $$
The oracle solution, as given in Equation (10), becomes

$$ \hat{x}^{Oracle} = \frac{1}{\sigma^2} Q_S^{-1} D_S^T y = c^2 \beta_S, \qquad (27) $$

where we have defined the constant $c^2 = \sigma_x^2 / (\sigma_x^2 + \sigma^2)$ and the vector $\beta_S = D_S^T y$. The oracle estimator has thus been reduced to a simple matrix-by-vector multiplication.

3.2. The MAP – Unitary Case

We turn to the MAP estimation, which requires to first find the optimal support $S$ based on Equations (13) and (14), and then plug it into the oracle expression as given in Equation (10) to get the estimate. We proceed by simplifying the expression $\det(C_S)$ in Equations (13) and (14). The matrix $C_S$ is defined in Equation (4) as $C_S = \sigma_x^2 D_S D_S^T + \sigma^2 I_n$. Denoting by $W_S$ a diagonal matrix with ones and zeros on its main diagonal matching the support⁵ $S$, we obtain

$$ \det(C_S) = \det\left(\sigma_x^2 D_S D_S^T + \sigma^2 I_n\right) = \det(D) \cdot \det\left(\sigma_x^2 W_S + \sigma^2 I_n\right) \cdot \det(D^T) = \left(\sigma_x^2 + \sigma^2\right)^{|S|} \left(\sigma^2\right)^{n - |S|} = \left(1 - c^2\right)^{-|S|} \left(\sigma^2\right)^n. \qquad (28) $$

⁵ $(W_S)_{ii}$ is 1 if $i \in S$, and 0 elsewhere.

Plugging this result into Equation (13), and using the relation between $C_S$ and $Q_S$ in Equation (9), yields

$$ P(S|y) \propto \exp\left\{\frac{c^2}{2\sigma^2} y^T D_S D_S^T y - \frac{1}{2}\log\left(1 - c^2\right)^{-|S|}\right\} \prod_{i \in S} P_i \cdot \prod_{j \notin S} (1 - P_j) \propto \exp\left\{\frac{c^2}{2\sigma^2}\|\beta_S\|_2^2\right\} \prod_{i \in S} P_i \sqrt{1 - c^2} \cdot \prod_{j \notin S} (1 - P_j) \propto \prod_{i \in S} \exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\} P_i \sqrt{1 - c^2} \cdot \prod_{j \notin S} (1 - P_j). \qquad (29) $$

Taking into account that $0 < P_i < 1$, we can rewrite this expression as

$$ P(S|y) \propto \prod_{i \in S} \exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\} \frac{P_i}{1 - P_i} \sqrt{1 - c^2} \cdot \prod_{j=1}^{n} (1 - P_j) \propto \prod_{i \in S} \exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\} \frac{P_i}{1 - P_i} \sqrt{1 - c^2} = \prod_{i \in S} q_i, \qquad (30) $$

where we have defined

$$ q_i = \exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\} \frac{P_i}{1 - P_i} \sqrt{1 - c^2}. \qquad (31) $$

We further define $g_i = q_i / (1 + q_i)$ (which implies that $q_i = g_i / (1 - g_i)$), and substitute this into Equation (30). Adding now the necessary normalization factor, and dividing both the numerator and the denominator by $\prod_{k=1}^n (1 - g_k)$, we get

$$ P(S|y) = \left(\sum_{S^* \in \Omega} \prod_{i \in S^*} q_i\right)^{-1} \prod_{i \in S} q_i = \left(\sum_{S^* \in \Omega} \prod_{i \in S^*} \frac{g_i}{1 - g_i}\right)^{-1} \prod_{i \in S} \frac{g_i}{1 - g_i} = \left(\sum_{S^* \in \Omega} \prod_{i \in S^*} g_i \prod_{j \notin S^*} (1 - g_j)\right)^{-1} \prod_{i \in S} g_i \prod_{j \notin S} (1 - g_j). \qquad (32) $$

The following observation will facilitate a further simplification of this expression:

Proposition 1. Let $\Omega$ be the set of all possible subsets of $n$ indices, and let $g_i$ be values associated with each index, such that $0 \le g_i \le 1$. Then,

$$ \sum_{S \in \Omega} \prod_{i \in S} g_i \cdot \prod_{j \notin S} (1 - g_j) = 1. \qquad (33) $$

Proof. Consider the following experiment: a set of $n$ independent coins are tossed, with the $i$-th coin having a probability $g_i$ for "heads" and $(1 - g_i)$ for "tails". The probability of a specific set of $S$ coins turning up "heads" (and the rest turning up "tails") is $\prod_{i \in S} g_i \cdot \prod_{j \notin S} (1 - g_j)$. For any one toss of the $n$ coins, exactly one of these combinations will be the outcome. Therefore, the sum of these probabilities over all the combinations must be 1. ∎

Using this proposition, the normalization term in Equation (32) vanishes, as it is equal to 1 ($0 < g_i < 1$, since $g_i = q_i/(1 + q_i)$ and $q_i > 0$ for every $i$). We therefore obtain

$$ P(S|y) = \prod_{i \in S} g_i \prod_{j \notin S} (1 - g_j). \qquad (34) $$
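Proposition 1 (and hence the fact that Equation (34) is properly normalized) is easy to check numerically; a throwaway verification of ours:

    import numpy as np
    from itertools import combinations

    g = np.random.default_rng(1).random(6)     # arbitrary values in [0, 1)
    total = sum(np.prod(g[list(S)]) * np.prod(1 - np.delete(g, list(S)))
                for k in range(7)
                for S in combinations(range(6), k))
    print(np.isclose(total, 1.0))              # True, as Proposition 1 states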
The optimization task (11) can now be written as

$$ S^{MAP} = \arg\max_{S \in \Omega} \prod_{i \in S} g_i \prod_{j \notin S} (1 - g_j) = \arg\max_{S \in \Omega} \prod_{i \in S} \frac{q_i}{1 + q_i} \prod_{j \notin S} \left(1 - \frac{q_j}{1 + q_j}\right) = \arg\max_{S \in \Omega} \frac{\prod_{i \in S} q_i \cdot \prod_{j \notin S} 1}{\prod_{k=1}^n (1 + q_k)} = \arg\max_{S \in \Omega} \prod_{i \in S} \exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\} \frac{P_i}{1 - P_i} \sqrt{1 - c^2}. \qquad (35) $$

Interpreting this expression, we see that every element in the support influences the penalty in one of two ways:

- If $i$ is part of the support: multiply the expression by $\sqrt{1 - c^2} \cdot \frac{P_i}{1 - P_i} \cdot \exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\}$, or
- If $i$ is not in the support: multiply the expression by 1.

As we aim to maximize the expression in Equation (35), the support will contain all the elements $i$ such that $\sqrt{1 - c^2} \cdot \frac{P_i}{1 - P_i} \cdot \exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\} > 1$. (In the case that no such element exists, the support should be empty and the solution is therefore $\hat{x}^{MAP} = 0$.) Once these elements are found, all we have to do is to multiply their value $\beta_i$ by $c^2$, and this is the MAP estimate. Stated differently, this means that after computing the transformed vector $\beta = D^T y$, we test each of its entries, and set the MAP estimate for the $i$-th entry to be

$$ \hat{x}_i^{MAP} = \psi^{MAP}(\beta_i) = \begin{cases} c^2 \beta_i & |\beta_i| > \frac{\sqrt{2}\,\sigma}{c}\sqrt{\log\frac{1 - P_i}{\sqrt{1 - c^2}\, P_i}} \\ 0 & \text{otherwise.} \end{cases} \qquad (36) $$

This is the shrinkage algorithm mentioned earlier – each entry is handled independently of the others, passing through a scalar shrinkage curve that nulls small entries and keeps large ones intact (up to the multiplication by $c^2$). There is no trace of the exhaustive and combinatorial search that characterizes MAP in the general case, and this simple algorithm yields the exact MAP estimation.
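The threshold test in Equation (36) is equivalent to testing $q_i > 1$ for the $q_i$ of Equation (31), which sidesteps the logarithm (and any concern about its sign); a minimal sketch of ours:

    import numpy as np

    def map_shrinkage(beta, P, sigma_x, sigma):
        """Hard shrinkage of Eq. (36): keep (and scale) entries whose q_i of Eq. (31) exceeds 1."""
        c2 = sigma_x**2 / (sigma_x**2 + sigma**2)
        q = (P / (1 - P)) * np.sqrt(1 - c2) * np.exp(c2 * beta**2 / (2 * sigma**2))
        return np.where(q > 1, c2 * beta, 0.0)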
3.3. The MMSE – The Unitary Case

Equation (18) shows the presence of the oracle in the MMSE estimation. Similarly to MAP, we make use of the unitary oracle estimate in Equation (27). Note that $\beta_S$ may be written as

$$ \beta_S = \sum_{k=1}^{n} I_S(k)\, \beta_k e_k, \qquad (37) $$

where $e_k$ is the $k$-th vector in the canonical basis, and $I_S(k)$ is an indicator function ($I_S(k) = 1$ if $k \in S$, and zero otherwise). While this may seem like a cumbersome change, it will prove valuable in later derivations. Starting from Equation (18), substituting the expression developed for $P(S|y)$ in Equation (34) into Equation (18), and using Equation (37), we obtain the following expression for the unitary MMSE estimator,

$$ \hat{x}^{MMSE} = \sum_{S \in \Omega} \left[\prod_{i \in S} g_i \prod_{j \notin S} (1 - g_j)\right] \left(c^2 \sum_{k=1}^{n} I_S(k)\, \beta_k e_k\right) = c^2 \sum_{k=1}^{n} \left[\sum_{S \in \Omega} I_S(k) \prod_{i \in S} g_i \prod_{j \notin S} (1 - g_j)\right] \beta_k e_k. \qquad (38) $$

We now introduce another observation, similar to the one posed in Proposition 1. This will be used to further simplify the above expression.

Proposition 2. Let $\Omega$ be the set of all possible subsets of $n$ indices, and let $g_i$ be values associated with each index, such that $0 \le g_i \le 1$. Then,

$$ \sum_{S \in \Omega} I_S(k) \prod_{i \in S} g_i \cdot \prod_{j \notin S} (1 - g_j) = g_k. \qquad (39) $$

Proof. In the spirit of the coin-tossing interpretation described in the proof of Proposition 1, the multiplication by the expression $I_S(k)$ implies that only toss outcomes where the $k$-th coin turns up "heads" are included in the summation. Thus, the overall probability of those is exactly the probability that the $k$-th coin turns up "heads", which is $g_k$ as claimed. A somewhat more formal way to pose this rationale is by observing that

$$ \sum_{S \in \Omega} I_S(k) \prod_{i \in S} g_i \cdot \prod_{j \notin S} (1 - g_j) = \sum_{S \in \Omega \text{ s.t. } k \in S} \prod_{i \in S} g_i \cdot \prod_{j \notin S} (1 - g_j) = g_k \cdot \sum_{S \in \Omega_k} \prod_{i \in S} g_i \cdot \prod_{j \notin S} (1 - g_j). $$

The last summation is over the set $\Omega_k$, which contains all the supports in $\Omega$ that do not contain the $k$-th entry. Thus, for the remaining $n - 1$ elements, this summation is complete, just as posed in Proposition 1, and therefore the overall expression equals $g_k$. ∎

Returning to the MMSE expression in Equation (38), and using this equality, we get a far simpler MMSE expression of the form

$$ \hat{x}^{MMSE} = c^2 \sum_{k=1}^{n} g_k \beta_k e_k = c^2 \sum_{k=1}^{n} \frac{q_k}{1 + q_k} \beta_k e_k. \qquad (40) $$

This is an explicit formula for MMSE estimation. The estimation is computed by first calculating $\beta = D^T y$, and then simply multiplying each entry $\beta_k$ by $c^2 q_k / (1 + q_k)$ (which is a function of $\beta_k$ as well). Explicitly, the MMSE estimate is given elementwise by

$$ \hat{x}_i^{MMSE} = \psi^{MMSE}(\beta_i) = \frac{\exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\}\frac{P_i}{1 - P_i}\sqrt{1 - c^2}}{1 + \exp\left\{\frac{c^2}{2\sigma^2}\beta_i^2\right\}\frac{P_i}{1 - P_i}\sqrt{1 - c^2}} \cdot c^2 \beta_i. \qquad (41) $$

[Figure 1: Shrinkage functions for the MMSE and MAP estimators ($P_i = 0.1$, $\sigma_x = 1$), shown for (a) $\sigma = 0.1$, (b) $\sigma = 0.5$, and (c) $\sigma = 1$.]

This operation has the form of a scalar shrinkage operation, just like MAP. For $|\beta_i| \ll \sigma/c$ this formula leads to $\hat{x}_i^{MMSE} \approx 0$, whereas for $|\beta_i| \gg \sigma/c$ the outcome is $\hat{x}_i^{MMSE} \approx c^2 \beta_i$ (just like the MAP). Thus, the expression multiplying $c^2 \beta_i$ here serves as a soft-shrinkage⁶ operation, which replaces the hard-shrinkage practiced in the MAP. Figure 1 shows the various shrinkage functions obtained for each estimator.

⁶ This should not be confused with the term soft-thresholding obtained when minimizing an $\ell_1$ penalty.
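Equation (41) translates just as directly into code. A sketch of ours, computed in a numerically stable way through the log of $q_k$ (we use scipy's expit, i.e., the logistic function, as an assumed convenience; for very large $|\beta_k|$ the naive exponential would overflow even though $g_k \to 1$):

    import numpy as np
    from scipy.special import expit

    def mmse_shrinkage(beta, P, sigma_x, sigma):
        """Soft shrinkage of Eqs. (31), (40), (41), applied elementwise to beta = D^T y."""
        c2 = sigma_x**2 / (sigma_x**2 + sigma**2)
        log_q = (c2 * beta**2 / (2 * sigma**2)
                 + np.log(P / (1 - P)) + 0.5 * np.log(1 - c2))   # log of q_k, Eq. (31)
        g = expit(log_q)                                         # g_k = q_k/(1+q_k), stably
        return g * c2 * beta                                     # Eq. (40)

Plotting this function and map_shrinkage from the previous sketch against $\beta$ reproduces the qualitative behavior in Figure 1: both curves vanish near the origin and approach the line $c^2\beta$ for large $|\beta|$.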
4. Performance Analysis

4.1. Deriving the Estimators' MSE

Our main goal in this work is to develop error expressions for the different estimators in the unitary regime, exploiting the general derivations of Section 2.3. We start by calculating the error for an oracle solution $\hat{x}_S^{Oracle}$. Using Equation (26) we obtain

$$ E\left\{\|\hat{x}_S^{Oracle} - x\|_2^2\right\} = \mathrm{trace}\left(Q_S^{-1}\right) = |S|\, c^2 \sigma^2 = \sum_{k=1}^{n} I_S(k)\, c^2 \sigma^2, \qquad (42) $$

where the indicator function is the same as previously used in (37). The last equality will become useful for our later development.

Turning to the MMSE estimator, recall the general expected-MSE expression in Equation (23),

$$ E\left\{\|\hat{x}^{MMSE} - x\|_2^2\right\} = \sum_{S \in \Omega} P(S|y) \cdot \left[|S|\, c^2 \sigma^2 + \|\hat{x}^{MMSE} - \hat{x}_S^{Oracle}\|_2^2\right]. \qquad (43) $$

Using the unitary MMSE estimator expression in Equation (40) and that of the oracle solution in Equation (27), we further develop the second term in the expression above, and obtain

$$ \|\hat{x}^{MMSE} - \hat{x}_S^{Oracle}\|_2^2 = \left\|c^2 \sum_{k=1}^{n} g_k \beta_k e_k - c^2 \sum_{k=1}^{n} I_S(k)\beta_k e_k\right\|_2^2 = \sum_{k=1}^{n} c^4 \left[g_k - I_S(k)\right]^2 \beta_k^2 = \sum_{k=1}^{n} c^4 \left[g_k^2 - 2 g_k I_S(k) + I_S(k)\right] \beta_k^2, \qquad (44) $$

where the last step uses $I_S(k)^2 = I_S(k)$. Plugging this expression back into Equation (43), together with the expression for $P(S|y)$ in Equation (34), gives

$$ \mathrm{MSE}(\hat{x}^{MMSE}) = \sum_{S \in \Omega} P(S|y)\left\{c^2 \sigma^2 \sum_{k=1}^{n} I_S(k) + c^4 \sum_{k=1}^{n} \left[g_k^2 - 2 g_k I_S(k) + I_S(k)\right]\beta_k^2\right\} = \sum_{k=1}^{n} \left[c^2 \sigma^2 g_k + c^4 \beta_k^2 \left(g_k - g_k^2\right)\right]. \qquad (45) $$

Here we have exploited Proposition 2. Interestingly, the property $|S| = \sum_{k=1}^{n} I_S(k)$ and Proposition 2 yield the relationship

$$ E\{|S|\} = \sum_{S \in \Omega} P(S|y)\, |S| = \sum_{S \in \Omega} \left(\sum_{k=1}^{n} I_S(k)\right) \prod_{i \in S} g_i \prod_{j \notin S}(1 - g_j) = \sum_{k=1}^{n} \sum_{S \in \Omega} I_S(k) \prod_{i \in S} g_i \prod_{j \notin S}(1 - g_j) = \sum_{k=1}^{n} g_k. \qquad (46) $$

This implies that the MMSE error can be alternatively written as

$$ \mathrm{MSE}(\hat{x}^{MMSE}) = c^2 \sigma^2 E\{|S|\} + c^4 \sum_{k=1}^{n} \beta_k^2 \left(g_k - g_k^2\right), \qquad (47) $$

suggesting that the error is composed of an "oracle" error⁷, and an additional part that is necessarily positive (since $0 < g_k < 1$). As an extreme example, if the elements of the vector $\beta$ tend to be either very high or very low (compared to $\sigma/c$), then the $g_k$ tend to the extremes as well. In such a case, the second term nearly vanishes, and the performance is close to that of the oracle.

⁷ See the similarity between the first term here and the one posed in Equation (42).

We next study the MAP performance. Recall Equation (23), and note that $\hat{x}^{MAP}$ may be written as

$$ \hat{x}^{MAP} = \sum_{k=1}^{n} I_{MAP}(k)\, c^2 \beta_k e_k, \qquad (48) $$

where $I_{MAP}(k)$ is an indicator function for the MAP support. Exploiting Propositions 1 and 2, we obtain the following expression for the MAP mean-squared-error,

$$ \mathrm{MSE}(\hat{x}^{MAP}) = \sum_{S \in \Omega} P(S|y)\left\{c^2 \sigma^2 \sum_{k=1}^{n} I_S(k) + c^4 \sum_{k=1}^{n} \left[I_{MAP}(k) - 2 I_S(k) I_{MAP}(k) + I_S(k)\right]\beta_k^2\right\} = \sum_{k=1}^{n} \left[c^2 \sigma^2 g_k + c^4 \beta_k^2 \big(g_k + I_{MAP}(k)(1 - 2 g_k)\big)\right]. \qquad (49) $$

Analyzing the difference between the MMSE and MAP errors, in Equations (45) and (49) respectively, we find that only the last terms in each are different: $-g_k^2$ versus $I_{MAP}(k)(1 - 2 g_k)$, respectively. Obviously, this implies $\mathrm{MSE}(\hat{x}^{MMSE}) \le \mathrm{MSE}(\hat{x}^{MAP})$, because $-g_k^2 \le I_{MAP}(k)(1 - 2 g_k)$ for any $k$, and regardless of the value of $I_{MAP}(k)$ (zero or one).

In order to further understand the estimators' performance given in Equations (45) and (49), we turn now to a further analysis of these expressions and derive worst-case upper-bounds for them. The bounds we are about to build do not depend on the dimension of the signal, but rather on the problem parameters ($\sigma$, $\sigma_x$, $P_i$) alone. We begin with the MMSE, then turn to the MAP, and finally compare and discuss the resulting bounds.
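Since $g_k$ and $I_{MAP}(k)$ are explicit functions of $\beta_k$, the error expressions (45) and (49) can be evaluated directly for any given $y$; a sketch of ours (function name hypothetical):

    import numpy as np

    def theoretical_mse(beta, P, sigma_x, sigma):
        """Expected MSE of the MMSE and MAP estimates given y, via Eqs. (45) and (49)."""
        c2 = sigma_x**2 / (sigma_x**2 + sigma**2)
        q = (P / (1 - P)) * np.sqrt(1 - c2) * np.exp(c2 * beta**2 / (2 * sigma**2))
        g = q / (1 + q)
        i_map = (q > 1).astype(float)     # atom k enters the MAP support iff q_k > 1
        mse_mmse = np.sum(c2 * sigma**2 * g + c2**2 * beta**2 * (g - g**2))       # Eq. (45)
        mse_map = np.sum(c2 * sigma**2 * g
                         + c2**2 * beta**2 * (g + i_map * (1 - 2 * g)))           # Eq. (49)
        return mse_mmse, mse_map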
4.2. MMSE Performance Bound

Referring to Equation (45), which describes the error associated with the MMSE approximation, we shall denote by $MSE_1$ the first term,

$$ MSE_1 = c^2 \sigma^2 \sum_{k=1}^{n} g_k. \qquad (50) $$

As mentioned before, this is the expected MSE of the oracle (given $y$). The second term, denoted by $MSE_2$, is given by

$$ MSE_2 = c^4 \sum_{k=1}^{n} \beta_k^2\, g_k (1 - g_k). \qquad (51) $$

This is the additional error due to the fact that the support is unknown. We would like to bound the ratio $r = MSE_2 / MSE_1$, as this immediately yields a bound $(r + 1)$ on the MMSE error in terms of the expected oracle error. Our goal is thus to characterize the worst ratio

$$ \max_{y \in \mathbb{R}^n} r = \max_{y \in \mathbb{R}^n} \frac{MSE_2}{MSE_1}, \qquad (52) $$

that is, the worst (largest) ratio over all conceivable signals $y$, where the dependence on $y$ enters via the $\beta_k$'s. In order to characterize this ratio, we shall need the following simple lemma:

Lemma 3. Let $(a_k, b_k)$, $k = 1, \ldots, n$, be pairs of positive real numbers. Let $m$ be the index of a pair whose ratio is maximal, i.e.,

$$ \frac{a_k}{b_k} \le \frac{a_m}{b_m} \quad \text{for all } k \ge 1. \qquad (53) $$

Then $\frac{\sum_{k=1}^n a_k}{\sum_{j=1}^n b_j} \le \frac{a_m}{b_m}$, with equality occurring only if $\frac{a_1}{b_1} = \frac{a_2}{b_2} = \cdots = \frac{a_n}{b_n}$.

Proof. By (53), $a_k b_m \le a_m b_k$ for all $k \ge 1$, with equality obtained only if $a_k / b_k = a_m / b_m$. Summing up all these inequalities, we obtain $b_m \sum_{k=1}^n a_k \le a_m \sum_{j=1}^n b_j$, hence $\frac{\sum_k a_k}{\sum_j b_j} \le \frac{a_m}{b_m}$, as claimed, with equality occurring only if $\frac{a_i}{b_i} = \frac{a_m}{b_m}$ for every $i$. ∎

Returning to our task of bounding $MSE_2 / MSE_1$, we observe that this ratio can be written as

$$ \frac{MSE_2}{MSE_1} = \frac{c^4 \sum_{k=1}^{n} \beta_k^2 g_k (1 - g_k)}{c^2 \sigma^2 \sum_{k=1}^{n} g_k}, \qquad (54) $$

which is of the same form as the ratio appearing in the Lemma. This leads us to the following Theorem:

Theorem 4. Denote $G_k = \sqrt{1 - c^2}\, P_k / (1 - P_k)$, and let $m$ be the index corresponding to an a-priori least likely atom, i.e., $P_m = \min_{1 \le k \le n} P_k$ and hence $G_m = \min_{1 \le k \le n} G_k$. Denote $f_{MMSE}(s) = \frac{2s}{1 + G_m e^s}$, and define (implicitly) $s^\star = \arg\max_{s \ge 0} f_{MMSE}(s)$. Then

1. $r^\star = f_{MMSE}(s^\star)$ is an upper-bound on the ratio $MSE_2 / MSE_1$.
2. The worst ratio, $r^\star$, satisfies the explicit bound

$$ r^\star \le \begin{cases} 2 \ln\frac{1}{4 G_m} & G_m < \frac{1}{4} e^{-2} \approx 0.034 \\ \frac{2}{e\sqrt{G_m}} & G_m \ge \frac{1}{4} e^{-2} \approx 0.034. \end{cases} \qquad (55) $$

Proof. Starting with the first claim, we embark from Equation (54) and exploit Lemma 3 to obtain

$$ \frac{MSE_2}{MSE_1} = \frac{c^4 \sum_{k=1}^{n} \beta_k^2 g_k (1 - g_k)}{c^2 \sigma^2 \sum_{k=1}^{n} g_k} \le \frac{c^2}{\sigma^2} \cdot \max_{1 \le k \le n} \frac{\beta_k^2 g_k (1 - g_k)}{g_k} \le \frac{c^2}{\sigma^2} \cdot \max_{1 \le k \le n} \beta_k^2 (1 - g_k). \qquad (56) $$

Recalling that $g_k = q_k / (1 + q_k)$, the definition of $q_k$ in (31), and the definition of $G_k$ above, we have

$$ 1 - g_k = \frac{1}{1 + q_k} = \frac{1}{1 + \frac{P_k}{1 - P_k}\sqrt{1 - c^2} \exp\left\{\frac{c^2}{2\sigma^2}\beta_k^2\right\}} = \frac{1}{1 + G_k \exp\left\{c^2 \beta_k^2 / 2\sigma^2\right\}}. \qquad (57) $$

Plugging this into Equation (56) and denoting $s = c^2 \beta_k^2 / 2\sigma^2$, we obtain

$$ \frac{MSE_2}{MSE_1} \le \max_{1 \le k \le n} \frac{2s}{1 + G_k \exp\{s\}}. \qquad (58) $$

This is a monotonically decreasing function of $G_k$ for any fixed value of $s \ge 0$ (note that $s$ must be non-negative, due to its definition). Thus, the maximum over the indices $1 \le k \le n$ is obtained for the index $m$ for which $G_k$ is the smallest. Therefore,

$$ \max_{\beta} \frac{MSE_2}{MSE_1} \le \max_{s \ge 0} \frac{2s}{1 + G_m \exp\{s\}} = f_{MMSE}(s^\star) = r^\star, \qquad (59) $$

as claimed.

Turning to the second claim of the theorem, we desire to bound $f_{MMSE}(s)$ from above. To this end, we maximize the alternative function $f(s)$ that bounds $f_{MMSE}(s)$ from above point-wise:

$$ f_{MMSE}(s) = \frac{2s}{1 + G_m e^s} \le \frac{2s}{\max\left(1,\, 2\sqrt{G_m e^s}\right)} \equiv f(s). \qquad (60) $$

Here we have used the facts that (i) the arithmetic mean $(1 + G_m e^s)/2$ is necessarily larger than the geometric one, $\sqrt{G_m e^s}$, and (ii) $1 + G_m e^s \ge 1$. The switch-over in the denominator of $f(s)$ occurs when $G_m e^s = 1/4$, which takes place for $s = s_0 \equiv \ln(1/4G_m)$. For $s \le s_0$, $f(s) = 2s$, which is monotonically increasing. For $s \ge s_0$, $f(s) = s/\sqrt{G_m e^s}$, whose derivative is given by $f'(s) = (1 - s/2)/\sqrt{G_m e^s}$. Thus, if $s_0 \ge 2$, the maximum of $f(s)$ occurs at $s = s_0$, being $f(s_0) = 2 s_0 = 2\ln(1/4G_m)$. Otherwise, the maximum occurs at $s = 2$, being $f(2) = 2/(e\sqrt{G_m})$. This proves the explicit upper bound on $r^\star$, as given in Equation (55). ∎
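The two-branch bound of Equation (55) is straightforward to tabulate; a sketch of ours that exposes the switch point at $G_m = \frac{1}{4}e^{-2}$:

    import numpy as np

    def r_star_mmse_bound(G_m):
        """Upper bound on the worst MMSE-to-oracle excess ratio r*, Eq. (55)."""
        if G_m < 1 / (4 * np.e**2):        # switch point ~ 0.034
            return 2 * np.log(1 / (4 * G_m))
        return 2 / (np.e * np.sqrt(G_m))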
Figure 2 shows the function $f_{MMSE}(s)$ and its upper bound $f(s)$ for two possible values of $G_m$: 0.01 and 0.1. These two cases correspond to the two options covered in Equation (55). As can be seen, for $G_m = 0.01 < 0.034$, the maximum point is obtained on the linear part of $f(s)$, whereas in the case of $G_m = 0.1 > 0.034$, the maximum is obtained for $s = 2$. Figure 3 presents the value of $r^\star$ as a function of $G_m$. This figure also shows the upper-bound on this value as given in Equation (55), and the two sub-functions that comprise it.

[Figure 2: Graph plot of the function $f_{MMSE}(s)$ and its upper-bounding function $f(s)$ (the solid and the dashed lines, respectively), exhibiting the two cases, where the maximum changes given the value $G_m$.]

[Figure 3: The worst ratio $r^\star$ and its upper bound, as given in Equation (55). This graph also shows the two portions of this bounding function, and the location of the switch between them.]

Corollary 5. The expected error for the MMSE estimator is bounded for any signal $y$ by

$$ \mathrm{MSE}(\hat{x}^{MMSE}) \le \mathrm{MSE}(\hat{x}^{Oracle}) \cdot \begin{cases} 1 + 2\ln\frac{1}{4 G_m} & G_m \le \frac{1}{4} e^{-2} \\ 1 + \frac{2}{e\sqrt{G_m}} & G_m \ge \frac{1}{4} e^{-2}. \end{cases} \qquad (61) $$

Proof. Follows from Theorem 4. ∎

What happens when all the probabilities $P_k$ are equal? In such a case we obtain that $G_1 = G_2 = \cdots = G_n$. From Equation (56), which uses Lemma 3, it is obvious that the worst ratio $r^\star$ becomes a tight upper-bound on $MSE_2 / MSE_1$, since all the terms in the numerator and the denominator summations are equal. Furthermore, the worst-case $\beta_k$'s are all equal to $\pm \sigma \sqrt{2 s^\star} / c$.

4.3. MAP Performance Bound

We next develop an upper-bound on the error associated with the MAP estimate in Equation (49). While $MSE_1$ remains the same as in Equation (50), the term that corresponds to $MSE_2$ for the MAP becomes

$$ MSE_2 = c^4 \sum_{k=1}^{n} \beta_k^2\, g_k \left[1 + I_{MAP}(k)\frac{1 - 2 g_k}{g_k}\right]. $$

Continuing with the same definitions as in the previous section, we prove a similar theorem for the expected MSE of the MAP estimator.

Theorem 6. Denote $G_k = \sqrt{1 - c^2}\, P_k / (1 - P_k)$, and let $m$ be the index corresponding to an a-priori least likely atom, i.e., $P_m = \min_{1 \le k \le n} P_k$ and hence $G_m = \min_{1 \le k \le n} G_k$. Define the function

$$ f_{MAP}(s) = \begin{cases} 2s & G_m e^s < 1 \\ \frac{2s}{G_m e^s} & G_m e^s \ge 1, \end{cases} \qquad (62) $$

and define (implicitly) $s^\star = \arg\max_{s \ge 0} f_{MAP}(s)$. Then

1. $r^\star = f_{MAP}(s^\star)$ is an upper-bound on the ratio $MSE_2 / MSE_1$.
2. The worst ratio, $r^\star$, satisfies the explicit bound

$$ r^\star = \begin{cases} 2\ln\frac{1}{G_m} & G_m < e^{-1} \approx 0.368 \\ \frac{2}{G_m e} & G_m \ge e^{-1} \approx 0.368. \end{cases} \qquad (63) $$

Proof. The proof follows the same lines as that of Theorem 4. Starting with the ratio $r$, we exploit Lemma 3 and obtain

$$ \frac{MSE_2}{MSE_1} = \frac{c^4 \sum_{k=1}^{n} \beta_k^2 g_k \left[1 + I_{MAP}(k)\frac{1 - 2 g_k}{g_k}\right]}{c^2 \sigma^2 \sum_{k=1}^{n} g_k} \le \frac{c^2}{\sigma^2} \cdot \max_{1 \le k \le n} \frac{\beta_k^2 g_k \left[1 + I_{MAP}(k)\frac{1 - 2 g_k}{g_k}\right]}{g_k} \le \frac{c^2}{\sigma^2} \cdot \max_{1 \le k \le n} \beta_k^2 \left[1 + I_{MAP}(k)\frac{1 - 2 g_k}{g_k}\right]. \qquad (64) $$
Again using the relation $g_k = q_k / (1 + q_k)$ and the definition of $q_k$ from (31), we have that

$$ \frac{1 - 2 g_k}{g_k} = \frac{1}{q_k} - 1 = \frac{1}{G_k \exp\left\{\frac{c^2 \beta_k^2}{2\sigma^2}\right\}} - 1 = \frac{1}{G_k \exp\{s\}} - 1, \qquad (65) $$

where we have used the definition of $s$ as before ($s = c^2 \beta_k^2 / 2\sigma^2$). Plugged back into Equation (64), we obtain

$$ \frac{MSE_2}{MSE_1} \le \max_{1 \le k \le n} 2s \left[1 + I_{MAP}(k)\left(\frac{1}{G_k \exp\{s\}} - 1\right)\right]. \qquad (66) $$

For any fixed value of $s$, the maximum over the indices $1 \le k \le n$ is obtained for the index $m$ for which $G_k$ is the smallest. Therefore, maximizing this expression with respect to both $k$ and $s$ yields

$$ \max_{\beta} \frac{MSE_2}{MSE_1} \le \max_{s \ge 0} 2s \left[1 + I_{MAP}(m)\left(\frac{1}{G_m \exp\{s\}} - 1\right)\right] \le \max_{s \ge 0} \begin{cases} 2s & G_m \exp\{s\} < 1 \\ \frac{2s}{G_m e^s} & G_m \exp\{s\} \ge 1. \end{cases} \qquad (67) $$

Here we have used the fact that $I_{MAP}(m) = 1$ when the atom $m$ is part of the MAP support, which takes place if $q_m \ge 1$ (see the discussion after Equation (35)).

We turn to the second claim of the theorem, and calculate explicitly the value $s^\star$ for which $f_{MAP}(s^\star) = r^\star$ is maximized. The switch-over between the two cases of $f_{MAP}(s)$ occurs when $G_m e^s = 1$, that is, $s = s_0 \equiv \ln(1/G_m)$. For $s \le s_0$, $f_{MAP}(s) = 2s$, which is monotonically increasing. For $s \ge s_0$, $f_{MAP}(s) = 2s/(G_m e^s)$, whose derivative is given by $f'(s) = (2 - 2s)/(G_m e^s)$. Thus, if $s_0 > 1$, the maximum of $f$ occurs at $s^\star = s_0$, that is, $f_{MAP}(s_0) = 2\ln(1/G_m)$. Otherwise, the maximum occurs at $s^\star = 1$ with $f_{MAP}(1) = 2/(G_m e)$. This proves the explicit upper bound $r^\star$ as given in Equation (63). ∎

Figure 4 shows two examples of $f_{MAP}(s)$ for two possible values of $G_m$: 0.2 and 0.8. These two cases correspond to the two options covered in Equation (63). As can be seen, for $G_m = 0.2 < 0.368$, the maximum point of $f_{MAP}$ is obtained at the switch-over point, whereas in the case of $G_m = 0.8 > 0.368$, the maximum is found at $s = 1$. Figure 5 presents the value of $r^\star$ as a function of $G_m$ for both the MAP and the MMSE. This figure also shows the two sub-functions that construct $r^\star$ for the MAP, as described in Equation (63).

[Figure 4: A plot of the function $f_{MAP}(s)$, exhibiting the two cases, where the maximum changes character according to the value $G_m$.]

[Figure 5: The worst ratio $r^\star$ for MAP, as given in Equation (63). This graph also shows the two portions of this function, the location of the switch between them, and the MMSE ratio $r^\star$.]

Corollary 7. The expected MSE error for the MAP estimator is bounded for any signal $y$ by

$$ \mathrm{MSE}(\hat{x}^{MAP}) \le \mathrm{MSE}(\hat{x}^{Oracle}) \cdot \begin{cases} 1 + 2\ln\frac{1}{G_m} & G_m \le e^{-1} \\ 1 + \frac{2}{G_m e} & G_m \ge e^{-1}. \end{cases} \qquad (68) $$

Proof. Follows from Theorem 6. ∎

When all the probabilities $P_i$ are equal, and hence $G_1 = \cdots = G_n$, we get again that the worst ratio $r^\star$ becomes a tight upper bound on $MSE_2 / MSE_1$, following the same reasoning as explained in the MMSE case. The worst-case $\beta_k$'s are all given by

$$ \beta_k = \begin{cases} \pm\sqrt{\frac{2\sigma^2}{c^2}\ln\frac{1}{G_m}} & G_m < e^{-1} \\ \pm\sqrt{\frac{2\sigma^2}{c^2}} & G_m \ge e^{-1}. \end{cases} $$
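The MAP bound of Equation (63) differs from the MMSE one only in its branches; comparing the two numerically (a sketch of ours, reusing r_star_mmse_bound from the earlier sketch) makes the ranking in Corollaries 5 and 7 concrete:

    import numpy as np

    def r_star_map_bound(G_m):
        """Worst MAP-to-oracle excess ratio r*, Eq. (63)."""
        if G_m < 1 / np.e:                 # switch point ~ 0.368
            return 2 * np.log(1 / G_m)
        return 2 / (G_m * np.e)

    for G_m in (0.01, 0.1, 0.5):
        # gain-factors (1 + r*) over the oracle MSE, Corollaries 5 and 7
        print(G_m, 1 + r_star_mmse_bound(G_m), 1 + r_star_map_bound(G_m))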
4.4. MMSE and MAP Bounds – A Summary

The bounds developed above suggest that both the MMSE and the MAP estimators lead in the unitary case to a mean-squared error that is at worst a constant times the oracle MSE. The analysis given above provides exact expressions for these ratios. We should note that the bounds developed above are based on a worst-case scenario. A more practical goal would be to bound the average case, as this should tell us more about the behavior of real-life signals. We leave this topic to future work.

As a last point in this section, we consider the following question: When are the MAP and MMSE nearly equivalent? Recall that the errors of these two estimators are given in Equations (47) and (49) as

$$ \mathrm{MSE}(\hat{x}^{MMSE}) = \sum_{k=1}^{n} c^2 \sigma^2 g_k + c^4 \sum_{k=1}^{n} \beta_k^2 \left(g_k - g_k^2\right) $$
$$ \mathrm{MSE}(\hat{x}^{MAP}) = \sum_{k=1}^{n} c^2 \sigma^2 g_k + c^4 \sum_{k=1}^{n} \beta_k^2 \left[g_k + I_{MAP}(k)(1 - 2 g_k)\right]. $$

In order for these two errors to be close, we should therefore impose for all $k$

$$ g_k - g_k^2 \approx g_k + I_{MAP}(k)(1 - 2 g_k) \;\Longrightarrow\; g_k^2 - 2 I_{MAP}(k) g_k + I_{MAP}(k) \approx 0. \qquad (69) $$

If $P_k \to 0$, this leads to $g_k \to 0$, since $g_k = q_k/(1 + q_k)$ and $q_k = \sqrt{1 - c^2}\, P_k/(1 - P_k) \cdot \exp\{c^2 \beta_k^2 / 2\sigma^2\}$. From Equation (36) we also have that $I_{MAP}(k) = 0$, implying that this index is not part of the MAP support. Returning to the requirement posed in Equation (69), we obtain the condition $g_k^2 \approx 0$, which is readily satisfied. Thus, we conclude that one case where the two estimators, MAP and MMSE, align, is when $P_k \to 0$.

When $P_k \to 1$, this leads to $g_k \to 1$. Relying again on Equation (36), we also have that $I_{MAP}(k) = 1$ this time, implying that this index is now part of the MAP support. Returning to the requirement posed in Equation (69), we obtain the condition $g_k^2 - 2 g_k + 1 = (g_k - 1)^2 \approx 0$, again satisfied (since $g_k$ is close to 1). Thus, another case where the two estimators align is when $P_k \to 1$.

5. Experimental Results

Here we demonstrate the MAP and MMSE estimators for unitary dictionaries and provide both synthetic and real-signal experiments to illustrate these algorithms.

5.1. Synthetic Experiments

In the first experiment we use a 2D Wavelet dictionary $D$ (Daubechies-5 filters) [10], with 3 levels of resolution. We choose all the atom probabilities $P_i$ and all the variances $\sigma_i$ to be the same in this test. We use $P = 0.1$ and $\sigma_x = 1$.

Generating a two-dimensional signal according to the proposed model is done by first randomly choosing whether each atom is part of the support or not with probability $P$. For the selected atoms, coefficients $x_i$ are drawn independently from a normal distribution $\mathcal{N}(0, \sigma_x^2)$. The resulting sparse vector of coefficients is multiplied by the unitary dictionary to obtain the ground-truth two-dimensional signal. Each entry is independently contaminated by white Gaussian noise $\mathcal{N}(0, \sigma^2)$ to create the input signal $y$. The values of the additive noise power, $\sigma$, are varied in the range $[0.1, 1]$ to demonstrate the effect of the noise level on the overall performance.

Each of the (noisy) signals is then approximated using the following estimators (a condensed sketch of this pipeline appears after the list):
1. Empirical Oracle estimation and its MSE. This estimator appears in Equation (27).
2. Theoretical Oracle estimation error, as given in Equation (42).
3. Empirical MMSE estimation and its MSE. We use Equation (40) in order to compute the estimation, and then assess its error empirically.
4. Theoretical MMSE estimation error, using Equation (45) directly.
5. Empirical MAP estimation and its MSE. We use the closed-form solution given in Equation (36).
6. Theoretical MAP estimation error, as given in Equation (49).

The above process is repeated for 1000 randomly generated signals of size $128 \times 128$, and the mean $L_2$ error is averaged over all signals to obtain an estimate of the expected quality of each estimator. Figure 6 shows the relative denoising effect (compared to the original noisy signal) achieved by each estimator. The improved performance of the MMSE estimator over the MAP is clearly seen, as well as a clear validation of the theoretical derivations.

[Figure 6: Empirical and theoretical evaluations of the MSE as a function of the input noise for synthetic signals ($P = 0.1$, $\sigma_x = 1$, and $n = 128 \times 128$).]
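A condensed sketch of ours for the synthetic pipeline (smaller dimension than in the paper for brevity; with a unitary D, one may work directly in the transform domain, here with D = I):

    import numpy as np

    def run_trial(D, P, sigma_x, sigma, rng):
        """One trial: generate a signal, denoise by MAP/MMSE shrinkage, record errors."""
        n = D.shape[0]
        support = rng.random(n) < P
        x = np.where(support, rng.normal(0.0, sigma_x, n), 0.0)
        y = D @ x + rng.normal(0.0, sigma, n)
        beta = D.T @ y
        c2 = sigma_x**2 / (sigma_x**2 + sigma**2)
        q = (P / (1 - P)) * np.sqrt(1 - c2) * np.exp(c2 * beta**2 / (2 * sigma**2))
        x_mmse = (q / (1 + q)) * c2 * beta               # Eq. (41)
        x_map = np.where(q > 1, c2 * beta, 0.0)          # Eq. (36)
        return np.sum((x_mmse - x)**2), np.sum((x_map - x)**2)

    rng = np.random.default_rng(0)
    D = np.eye(256)
    errs = np.mean([run_trial(D, 0.1, 1.0, 0.3, rng) for _ in range(1000)], axis=0)
    print("empirical MSE (MMSE, MAP):", errs)            # MMSE below MAP, as in Figure 6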
5.2. Real-World Signals

Next, we experiment with real-world signals – images. The unitary dictionary for this experiment is the same 2D Wavelet Transform dictionary used in the synthetic experiment. This dictionary is known to serve natural image content adequately (i.e., sparsify image content). There are two main obstacles when aiming to operate on non-synthetic signals:

1. The assumption that all the non-zero entries in $x$ share the same variance is inadequate, and we should generalize the above discussion to a heteroscedastic model.
2. The parameters that describe the signal model are unknown and need to be estimated from the corrupted signal.

Our handling of these two issues is described in detail in Appendix A. It is important to note that our main goal in this experiment is to demonstrate the power of the MMSE and the MAP estimators, and their comparison. We do not attempt to compare these results to state-of-the-art image denoising algorithms, as the current model is too limited for this comparison to be fair, due to the non-adaptiveness and the unitarity of the dictionary.

We experiment with the image Peppers shown in Figure 10. The noise levels considered are: $5, 10, 15, \ldots, 70$, where the pixel values are in the range $[0, 255]$. The relative MSE of the cleaned image compared to the noisy one appears in Figure 7, as a function of the input noise power. Per each $\sigma$, the parameters are estimated, and then used within the MAP and the MMSE estimators. Clearly, the MMSE outperforms the MAP for all the noise levels, the gap being bigger for high SNR levels. Nevertheless, it is also evident from this graph that the difference between the two is relatively small.

[Figure 7: Relative denoising achieved by the MAP and the MMSE estimators, obtained for the image Peppers, with varying input noise power.]

Figure 8 shows the estimated parameters learnt from the noisy frame for each band, and the values of these parameters may provide an explanation for this phenomenon.

Figure 8: Estimated parameters for the "Peppers" image with noise $\sigma = 10$.
(a) $P_i$:      1.00, 0.60, 0.37, 0.32, 0.19, 0.09, 0.03, 0.01, 0.00, 0.00
(b) $\sigma_i$: 741.85, 98.77, 88.48, 58.36, 54.60, 48.96, 46.27, 44.90, 45.85, 46.13

As we have observed in the previous section, the gap between the MMSE and the MAP is expected to be negligible if the $P_k$ are nearly zeros or ones. This means that among the 10 bands in the wavelet transform, the three high-resolution and the single low-resolution bands are expected to give the same performance for both estimators. This suggests that the difference between the MAP and the MMSE is only due to the image energy that resides in the 6 middle bands. Figure 9 shows the actual errors per band, as obtained by the MMSE and the MAP, and indeed, as expected, the difference in these errors exists mostly in the 6 middle bands.

Figure 9: The error per band for the MMSE and the MAP estimators.
(a) MMSE: 102.11, 101.00, 97.36, 98.03, 77.93, 64.33, 51.36, 32.46, 17.51, 6.80
(b) MAP:  102.11, 118.05, 117.90, 120.52, 98.86, 82.39, 62.30, 37.14, 18.41, 7.06

Finally, a visual comparison of the results of the different estimators is presented in Figure 10 for the image Peppers, to which white Gaussian noise with $\sigma = 10$ is added. As expected, the MMSE result shows a small visual improvement over the MAP.

[Figure 10: Visual comparison of the reconstructed image by the MAP and MMSE estimators ($\sigma = 10$): (a) ground-truth image; (b) noisy image (PSNR 28.12dB); (c) MAP (PSNR 32.33dB); (d) MMSE (PSNR 33.01dB).]

6. Summary and Conclusions

In this work we have studied a model where each atom has a given probability to be part of the support. This model assumes that all the supports are possible, thus avoiding assumptions on the (generally unknown) support size. We study MAP and MMSE estimators for the model with a general dictionary, including an overview of their performance. Then, we focus on unitary dictionaries, for which both estimators have simple and accurate closed formulas for their computation. After developing the closed-form MAP and MMSE estimators, it is shown how they can be interpreted in terms of shrinkage. We describe the relation of the MAP and MMSE estimators in this model to existing models appearing in the literature. This development is extended by looking at the theoretical performance of the estimators. Here, analytical bounds on the worst-case denoising performance are shown. Finally, synthetic and real-world experiments show the performance of the estimators, and the clear advantage of the MMSE estimator over the MAP estimator.

Appendix A – Handling Images

As mentioned in Section 5, in order to handle a given noisy image, we should extend the model to allow for distinct variances for the different atoms, and we should also estimate the model parameters from the image. This appendix describes these two tasks.

A.1. Extension to Heteroscedastic Model

In the derivations in this paper we have assumed that all the non-zero entries in $x$ have the same variance. As this is rarely the case for natural images, we treat now a more general problem, where this variance is atom-dependent. Such a model is known as heteroscedastic. Our goal is to show that most of the results remain of similar form, with modest changes. Thus, we shall keep the discussion in this section brief, and only state the main results.
We change the covariance matrix in Equation (2) to be a more general diagonal matrix $V_S$, given by

$$ V_S = \mathrm{diag}\left(\sigma_{S_1}^2, \ldots, \sigma_{S_k}^2\right), \qquad (A-1) $$

where $k = |S|$. For the general estimators developed in Section 2.2, the changes due to this generalization are all absorbed in the matrices $Q_S$ and $C_S$, becoming

$$ C_S = D_S V_S D_S^T + \sigma^2 I_n, \qquad Q_S = V_S^{-1} + \frac{1}{\sigma^2} D_S^T D_S, $$

and the relation between them in Equation (9) is still valid. Moving to the unitary case, the matrix $Q_S$ is a diagonal matrix of the form

$$ Q_S = \mathrm{diag}\left(\frac{\sigma_{S_1}^2 + \sigma^2}{\sigma_{S_1}^2 \sigma^2}, \ldots, \frac{\sigma_{S_k}^2 + \sigma^2}{\sigma_{S_k}^2 \sigma^2}\right). \qquad (A-2) $$

Its inversion, $Q_S^{-1}$, can easily be calculated, and the oracle solution becomes

$$ \hat{x}^{Oracle} = \mathrm{diag}\left(c_{S_1}^2, \ldots, c_{S_k}^2\right) \cdot \beta_S, \qquad (A-3) $$

where $c_{S_i}^2 = \sigma_{S_i}^2 / (\sigma_{S_i}^2 + \sigma^2)$. The support of the MAP estimator is given by

$$ S^{MAP} = \arg\max_{S \in \Omega} \prod_{i \in S} \sqrt{1 - c_i^2} \cdot \frac{P_i}{1 - P_i} \cdot \exp\left\{\frac{c_i^2}{2\sigma^2}\beta_i^2\right\} \prod_{j \notin S} 1. \qquad (A-4) $$

Lastly, the unitary MMSE estimate presented in Equation (40) becomes

$$ \hat{x}^{MMSE} = \sum_{k=1}^{n} c_k^2 \frac{q_k}{1 + q_k} \beta_k e_k, \qquad (A-5) $$

where $q_k = \frac{P_k}{1 - P_k}\sqrt{1 - c_k^2}\exp\left\{\frac{c_k^2}{2\sigma^2}\beta_k^2\right\}$.
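In code, the heteroscedastic shrinkage of Equation (A-5) is the homoscedastic one with a per-atom constant $c_k^2$; a minimal sketch of ours, where sigma_x and P are now vectors with one entry per atom (e.g., expanded from per-band values):

    import numpy as np

    def mmse_shrinkage_hetero(beta, P, sigma_x, sigma):
        """Heteroscedastic MMSE shrinkage, Eq. (A-5); P and sigma_x are per-atom arrays."""
        c2 = sigma_x**2 / (sigma_x**2 + sigma**2)      # c_k^2, one value per atom
        q = (P / (1 - P)) * np.sqrt(1 - c2) * np.exp(c2 * beta**2 / (2 * sigma**2))
        return (q / (1 + q)) * c2 * beta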
A.2. Parameter Estimation

The parameters of the image generation model are not known in advance, and thus they must be estimated. We shall assume that each band in the wavelet transform is characterized by a pair of parameters $(\sigma_i, P_i)$, and that there are $r$ such bands overall (10 in the experiment reported in Section 5). We propose to estimate these parameters directly from the noisy image by performing the following optimization task:

$$\arg\max_{\{P_i, \sigma_i\}_{i=1}^r} P\left(y \mid \{P_i, \sigma_i\}_{i=1}^r\right). \qquad \text{(A-6)}$$

Marginalization of this likelihood term with respect to the support of the image in the wavelet domain reads

$$P\left(y \mid \{P_i, \sigma_i\}_{i=1}^r\right) = \sum_S P\left(y \mid S, \{P_i, \sigma_i\}_{i=1}^r\right) \cdot P\left(S \mid \{P_i, \sigma_i\}_{i=1}^r\right). \qquad \text{(A-7)}$$

As maximization of this summation may be computationally difficult, we approximate it by considering only one item, the dominant one within the sum. Thus, we propose to solve

$$\arg\max_{\{P_i, \sigma_i\}_{i=1}^r} P\left(y \mid \{P_i, \sigma_i\}_{i=1}^r\right) \approx \arg\max_{\{P_i, \sigma_i\}_{i=1}^r,\, S} P\left(y \mid S, \{P_i, \sigma_i\}_{i=1}^r\right) \cdot P\left(S \mid \{P_i, \sigma_i\}_{i=1}^r\right) \cdot P(S), \qquad \text{(A-8)}$$

where we maximize with respect to the support as well. Note that we have introduced a prior on the support size, $P(S)$. We shall use the form

$$P(S) = \prod_{i=1}^r \exp\{-\lambda_i |S_i|\},$$

with $S_i$ the support in the $i$-th band. This prior controls the support sparsity in each band, and as we show next, it stabilizes the estimation procedure. The values $\lambda_i$ are set high for low-frequency bands, and decrease for the higher-frequency bands.

We use the model definitions in Section 2.1 to develop an expression that depends only on the parameters of the $r$ bands. Starting with $P(S \mid \{P_i, \sigma_i\}_{i=1}^r)$, we get

$$P\left(S \mid \{P_i, \sigma_i\}_{i=1}^r\right) = \prod_{i=1}^r P_i^{|S_i|} (1 - P_i)^{n_i - |S_i|}, \qquad \text{(A-9)}$$

where $n_i$ is the size of the $i$-th band. Using the fact that the wavelet dictionary is unitary and exploiting Equation (A-2), we have

$$\det(C_S) = \sigma^{2n} \prod_{i \in S} \frac{\sigma_i^2 + \sigma^2}{\sigma^2} = \sigma^{2n} \prod_{i=1}^r \left(\frac{\sigma_i^2 + \sigma^2}{\sigma^2}\right)^{|S_i|}. \qquad \text{(A-10)}$$

Plugging Equation (5) and the above expressions into (A-8), the parameter estimation task becomes

$$\arg\max_{S, \{P_i, \sigma_i\}_{i=1}^r} \prod_{i=1}^r \left(\frac{\sigma_i^2 + \sigma^2}{\sigma^2}\right)^{-\frac{|S_i|}{2}} P_i^{|S_i|} (1 - P_i)^{n_i - |S_i|} \exp\left\{\frac{1}{2\sigma^2} \frac{\sigma_i^2}{\sigma^2 + \sigma_i^2} \|\beta_{S_i}\|^2 - \lambda_i |S_i|\right\}.$$

Two important features of this expression deserve our attention. First, rather than seeking the support $S$ itself, this expression reveals that all we need are the cardinalities $|S_i|$ within each band. Second, this expression is separable with respect to the $r$ bands, implying that we can estimate $(P_i, \sigma_i)$ for the $i$-th band by solving

$$\arg\max_{|S_i|, P_i, \sigma_i} \left(\frac{\sigma_i^2 + \sigma^2}{\sigma^2}\right)^{-\frac{|S_i|}{2}} P_i^{|S_i|} (1 - P_i)^{n_i - |S_i|} \exp\left\{\frac{1}{2\sigma^2} \frac{\sigma_i^2}{\sigma^2 + \sigma_i^2} \|\beta_{S_i}\|^2 - \lambda_i |S_i|\right\}.$$

Taking the log of the above expression, we obtain an alternative function to maximize,

$$f(|S_i|, P_i, \sigma_i) = -\frac{|S_i|}{2} \log\left(\frac{\sigma_i^2 + \sigma^2}{\sigma^2}\right) + |S_i| \log P_i + (n_i - |S_i|) \log(1 - P_i) + \frac{1}{2\sigma^2} \frac{\sigma_i^2}{\sigma^2 + \sigma_i^2} \|\beta_{S_i}\|^2 - \lambda_i |S_i|. \qquad \text{(A-11)}$$

To obtain the estimates for $\sigma_i$ and $P_i$ we differentiate $f$ with respect to these unknowns. The derivative with respect to $P_i$ leads to

$$0 = \frac{\partial f(|S_i|, P_i, \sigma_i)}{\partial P_i} = \frac{|S_i|}{P_i} - \frac{n_i - |S_i|}{1 - P_i} \implies P_i = \frac{|S_i|}{n_i}. \qquad \text{(A-12)}$$

Similarly, the derivative with respect to $\sigma_i$ gives

$$0 = \frac{\partial f(|S_i|, P_i, \sigma_i)}{\partial \sigma_i} = -\frac{|S_i|\, \sigma_i}{\sigma_i^2 + \sigma^2} + \frac{\sigma_i}{(\sigma^2 + \sigma_i^2)^2} \|\beta_{S_i}\|^2 \implies \sigma_i^2 = \frac{\|\beta_{S_i}\|^2}{|S_i|} - \sigma^2. \qquad \text{(A-13)}$$

The last step in this estimation process is to discover the cardinality $|S_i|$. Returning to the expression to be maximized in Equation (A-11), we can plug in the solutions obtained for $P_i$ and $\sigma_i$, both being functions of $|S_i|$. The overall expression is thus a function of the scalar $|S_i|$ alone, and the maximizing value can be found by a simple sweep of this unknown over the range $[0, n_i]$. We should note that for every value tested, we should also update the vector $\beta_{S_i}$ to include only the coefficients selected for $S_i$; since we are maximizing $f(|S_i|, P_i, \sigma_i)$, we should choose the largest entries (in absolute value) within the band. After this exhaustive process is done, we pick the support size and the respective calculated parameters that maximize the optimization task (A-11).
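As an illustration of this sweep, here is a short numpy sketch for a single band, under our own conventions: it accumulates the energies of the coefficients sorted by magnitude, plugs the closed forms (A-12) and (A-13) into (A-11) for every candidate cardinality, and skips cardinalities for which (A-13) would yield a negative variance. The function name and the return convention for an empty support are ours, not from the paper.

```python
import numpy as np

def estimate_band_params(beta_band, sigma, lam):
    """Sweep the cardinality |S_i| to maximize f of Equation (A-11) for one band.

    beta_band -- noisy transform coefficients of the i-th band (length n_i)
    sigma     -- noise standard deviation
    lam       -- sparsity weight lambda_i of this band
    Returns (P_i, sigma_i, k); (0, 0, 0) stands for the empty support.
    """
    n_i = beta_band.size
    # For a fixed cardinality k, f grows with ||beta_{S_i}||^2, so the optimal
    # support collects the k largest coefficients in absolute value.
    energy = np.cumsum(np.sort(beta_band**2)[::-1])   # ||beta_{S_i}||^2, k = 1..n_i

    best_f, best = -np.inf, (0.0, 0.0, 0)
    for k in range(1, n_i + 1):
        sigma_i2 = energy[k - 1] / k - sigma**2       # Equation (A-13)
        if sigma_i2 <= 0.0:
            continue                                  # infeasible: skip this k
        P_i = k / n_i                                 # Equation (A-12)
        f = -0.5 * k * np.log((sigma_i2 + sigma**2) / sigma**2)
        f += k * np.log(P_i)
        if k < n_i:                                   # this term vanishes at k = n_i
            f += (n_i - k) * np.log(1.0 - P_i)
        f += energy[k - 1] * sigma_i2 / (2.0 * sigma**2 * (sigma_i2 + sigma**2))
        f -= lam * k                                  # the prior term -lambda_i |S_i|
        if f > best_f:
            best_f, best = f, (P_i, np.sqrt(sigma_i2), k)
    return best
```

Running this per band yields the $r$ pairs $(P_i, \sigma_i)$ required by the heteroscedastic shrinkage of Appendix A.1.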