A proximal iteration for deconvolving Poisson noisy images using sparse representations


Authors: François-Xavier Dupé (GREYC), Jalal Fadili (GREYC), Jean-Luc Starck (SEDI)

F.-X. Dupé (a), M. J. Fadili (a) and J.-L. Starck (b)
(a) GREYC UMR CNRS 6072, 14050 Caen, France
(b) DAPNIA/SEDI-SAP, CEA-Saclay, 91191 Gif-sur-Yvette, France

Abstract. We propose an image deconvolution algorithm when the data is contaminated by Poisson noise. The image to restore is assumed to be sparsely represented in a dictionary of waveforms such as the wavelet or curvelet transforms. Our key contributions are the following. First, we handle the Poisson noise properly by using the Anscombe variance stabilizing transform, leading to a non-linear degradation equation with additive Gaussian noise. Second, the deconvolution problem is formulated as the minimization of a convex functional with a data-fidelity term reflecting the noise properties, and a non-smooth sparsity-promoting penalty over the image representation coefficients (e.g. the ℓ1-norm). An additional term is also included in the functional to ensure positivity of the restored image. Third, a fast iterative forward-backward splitting algorithm is proposed to solve the minimization problem. We derive existence and uniqueness conditions for the solution, and establish convergence of the iterative algorithm. Finally, a GCV-based model selection procedure is proposed to objectively select the regularization parameter. Experimental results are carried out to show the striking benefits gained from taking into account the Poisson statistics of the noise. These results also suggest that using sparse-domain regularization may be tractable in many deconvolution applications with Poisson noise, such as astronomy and microscopy.

Index Terms: Deconvolution, Poisson noise, proximal iteration, forward-backward splitting, iterative thresholding, sparse representations.
I. INTRODUCTION

Deconvolution is a longstanding problem in many areas of signal and image processing (e.g. biomedical imaging [1], [2], astronomy [3], remote sensing, to quote a few). For example, research in astronomical image deconvolution has recently seen considerable work, partly triggered by the Hubble Space Telescope (HST) optical aberration problem at the beginning of its mission. In biomedical imaging, researchers are also increasingly relying on deconvolution to improve the quality of images acquired by confocal microscopes [2]. Deconvolution may then prove crucial for exploiting images and extracting scientific content.

There is an extensive literature on deconvolution problems; one might refer to well-known dedicated monographs on the subject [4]–[6]. In the presence of Poisson noise, several deconvolution methods have been proposed, such as the Tikhonov-Miller inverse filter and Richardson-Lucy (RL) algorithms; see [1], [3] for a comprehensive review. The RL algorithm has been used extensively in many applications because it is adapted to Poisson noise. It does, however, amplify noise after a few iterations, which can be avoided by introducing regularization. In [7], the authors presented a Total Variation (TV)-regularized RL algorithm. In the astronomical imaging literature, several authors advocated the use of wavelet-regularized RL algorithms [8]–[10]. In the context of biological imaging deconvolution, wavelets have also been used as a regularization scheme when deconvolving biomedical images: [11] presents a version of RL combined with wavelet denoising, and [12] uses the thresholded Landweber iteration introduced in [13]. The latter approach implicitly assumes that the contaminating noise is Gaussian.
Another recent attempt at solving Poisson linear inverse problems is the Bayesian multi-scale framework proposed in [14], based on a multi-scale factorization of the Poisson likelihood function associated with a recursive partitioning of the underlying intensity. Regularization of the solution is accomplished by imposing prior probability distributions in a Bayesian paradigm, and the maximum a posteriori solution is computed using the expectation-maximization algorithm. However, the multiscale analysis of the above authors is only tractable with the Haar wavelet. Similarly, the work in [15] on hard threshold estimators in the tomographic data framework has shown that for a particular operator (the Radon operator) an extension of the wavelet-vaguelette decomposition (WVD) method [16] to Poisson data is theoretically feasible, but the authors do not provide any computational algorithm for computing the estimate. Inspired by the WVD method, the authors in [17] explored an alternative approach via wavelet-based decompositions combined with thresholding strategies that address adaptivity issues. Specifically, their framework extends the wavelet-Galerkin methods of [18] to the Poisson setting. In order to ensure the positivity of the estimated intensity, the log-intensity is expanded in a wavelet basis. This method is however limited to standard orthogonal wavelet bases.

In the context of deconvolution with Gaussian white noise, sparsity-promoting regularization over orthogonal wavelet coefficients has recently been proposed [13], [19], [20]. A generalization to frames was proposed in [21], [22]. In [23], the authors presented an image deconvolution algorithm by iterative thresholding in an overcomplete dictionary of transforms, and [24] designed a deconvolution method that combines both the wavelet and curvelet transforms.
However, sparsity-based approaches published so far have mainly focused on Gaussian noise. In this paper, we propose an image deconvolution algorithm for data blurred and contaminated by Poisson noise. The Poisson noise is handled properly by using the Anscombe variance stabilizing transform (VST), leading to a non-linear degradation equation with additive Gaussian noise, see (1). The deconvolution problem is then formulated as the minimization of a convex functional combining a non-linear data-fidelity term reflecting the noise properties, and a non-smooth sparsity-promoting penalty over the representation coefficients of the image to restore. Such representations include not only the orthogonal wavelet transform but also overcomplete representations such as translation-invariant wavelets, curvelets, or a union of wavelets and curvelets. Since Poisson intensity functions are nonnegative by definition, an additional term is also included in the minimized functional to ensure the positivity of the restored image. Inspired by the work in [20], a fast proximal iterative algorithm is proposed to solve the minimization problem. Experimental results are carried out on a set of simulated and real images to compare our approach to some competitors. We show the striking benefits gained from taking into account the Poisson nature of the noise and the morphological structures involved in the image through overcomplete sparse multiscale transforms.

A. Relation to prior work

A naive solution to this deconvolution problem would be to apply traditional approaches designed for Gaussian noise. But this would be awkward since (i) the noise tends to Gaussian only for large mean intensities (central limit theorem), and (ii) the noise variance depends on the mean anyway.
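Point (ii) is easy to check numerically. The following sketch (our illustration, not part of the paper) verifies with NumPy that the empirical variance of Poisson samples tracks their mean, so no single Gaussian noise level fits all intensities:

```python
import numpy as np

rng = np.random.default_rng(0)

# For a Poisson random variable, Var = E[.]: the noise level depends on the
# (unknown) local intensity, unlike additive white Gaussian noise.
for lam in (2.0, 10.0, 50.0):
    samples = rng.poisson(lam, size=200_000)
    print(lam, samples.mean(), samples.var())
```

Each printed variance stays close to the corresponding mean intensity, which is precisely the mean-variance coupling that the Anscombe transform of Section II is designed to remove.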
A more suitable approach is to adopt a Bayesian framework with an appropriate anti-log-likelihood score (the negative of the log-likelihood function) to obtain a data-fidelity term reflecting the Poisson statistics of the noise. The data-fidelity term is derived from the conditional distribution of the observed data given the original image, which is known to be governed by physical considerations concerning the data-acquisition device and the noise generating process (e.g. Poisson here). Unfortunately, doing so, we would end up with a functional which does not satisfy a key property: the data-fidelity term does not have a Lipschitz-continuous gradient as required in [20], hence preventing us from using the forward-backward splitting proximal algorithm to solve the optimization problem. To circumvent this difficulty, we propose to handle the noise statistical properties by using the Anscombe VST. Some previous authors [25] have already suggested to apply the Anscombe VST and then deconvolve with wavelet-domain regularization as if the stabilized observation were linearly degraded and contaminated by additive Gaussian noise. But this is not valid, as standard results on the Anscombe VST lead to a non-linear degradation equation because of the square root, see (1).

B. Organization of this paper

The organization of the paper is as follows: we first formulate our deconvolution problem under Poisson noise (Section II), and then recall some necessary material about overcomplete sparse representations (Section III). The core of the paper lies in Section IV, where we state the deconvolution optimization problem, characterize it, and solve it using monotone operator splitting iterations. We also focus on the choice of the two main parameters of the algorithm and propose some solutions. In Section V, experimental results are reported and discussed.
The proofs of our main results are deferred to the appendix for the sake of presentation.

C. Notation and terminology

Let H be a real Hilbert space, here a finite-dimensional real vector space. We denote by ‖·‖ the norm associated with the inner product ⟨·,·⟩ in H, and by I the identity operator on H. x and α are respectively reordered vectors of image samples and transform coefficients. A real-valued function f is coercive if lim_{‖x‖→+∞} f(x) = +∞. The domain of f is defined by dom f = {x ∈ H : f(x) < +∞}, and f is proper if dom f ≠ ∅. We say that a real-valued function f is lower semi-continuous (lsc) if lim inf_{x→x₀} f(x) ≥ f(x₀). Lower semi-continuity is weaker than continuity, and plays an important role for the existence of solutions in minimization problems [26, Page 17]. Γ₀(H) is the class of all proper lsc convex functions from H to (−∞, +∞]. The subdifferential of a function f ∈ Γ₀(H) at x ∈ H is the set ∂f(x) = {u ∈ H | ∀y ∈ H, f(y) ≥ f(x) + ⟨u, y − x⟩}. An element u of ∂f(x) is called a subgradient. A comprehensive account of subdifferentials can be found in [26]. An operator A acting on H is κ-Lipschitz continuous if ∀x, y ∈ H, ‖A(x) − A(y)‖ ≤ κ‖x − y‖, where κ is the Lipschitz constant. The spectral operator norm of A is given by ‖A‖₂ = max_{x≠0} ‖Ax‖/‖x‖. We denote by ı_C the indicator function of the convex set C: ı_C(x) = 0 if x ∈ C, and +∞ otherwise. We denote weak convergence by ⇀.

II. PROBLEM STATEMENT

Consider the image formation model where an input image of n pixels x is blurred by a point spread function (PSF) h and contaminated by Poisson noise. The observed image is then a discrete collection of counts y = (y_i)_{1≤i≤n} which are bounded, i.e. y ∈ ℓ∞.
Each count y_i is a realization of an independent Poisson random variable with mean (h ⊛ x)_i, where ⊛ is the circular convolution operator. Formally, this writes y_i ∼ P((h ⊛ x)_i). The deconvolution problem at hand is to restore x from the observed count image y. A natural way to attack this problem would be to adopt a maximum a posteriori (MAP) Bayesian framework with an appropriate likelihood function (the distribution of the observed data y given an original x) reflecting the Poisson statistics of the noise. But, as stated above, this would prevent us from using the forward-backward splitting proximal algorithm to solve the MAP optimization problem, since the gradient of the data-fidelity term is not Lipschitz-continuous. Indeed, the forward-backward iteration is essentially a generalization of the classical gradient projection method [27] for constrained convex optimization and monotone variational inequalities, and it inherits restrictions similar to those methods. For such methods, Lipschitz continuity of the gradient is classical [27, Theorem 8.6-2]. The latter property is crucial for the iterates in (13) to be determined uniquely, and for the forward-backward splitting algorithm to converge; see Theorem 1 and also [28]. For this reason, we propose to handle the noise statistical properties by using the Anscombe VST [29], defined as

z_i = 2 √((h ⊛ x)_i + 3/8) + ε,  ε ∼ N(0, 1),   (1)

where ε is an additive white Gaussian noise of unit variance¹. In words, z is non-linearly related to x. In Section IV, we provide an elegant optimization problem and a fixed point algorithm taking into account such a non-linearity.
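The stabilization behind (1) can be observed numerically. The sketch below (our illustration, under the paper's asymptotic caveat) applies the Anscombe transform 2√(y + 3/8) to Poisson samples and checks that the stabilized variance stays close to 1 regardless of the underlying intensity, once the intensity is moderately large:

```python
import numpy as np

rng = np.random.default_rng(0)

def anscombe(y):
    # Anscombe variance stabilizing transform: z = 2*sqrt(y + 3/8).
    return 2.0 * np.sqrt(y + 3.0 / 8.0)

# Whatever the mean intensity (10, 30, 100), the stabilized samples have
# variance close to 1, so Gaussian-noise machinery becomes applicable.
variances = []
for lam in (10.0, 30.0, 100.0):
    z = anscombe(rng.poisson(lam, size=200_000))
    variances.append(z.var())
print(variances)
```

The stabilization is only asymptotic (footnote 1): for very low counts the variance drifts away from 1, which is one reason the degradation equation must be treated as non-linear rather than as an exactly Gaussian linear model.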
III. SPARSE IMAGE REPRESENTATION

Let x ∈ H be a √n × √n image. x can be written as the superposition of elementary atoms ϕ_γ parameterized by γ ∈ I according to the following linear generative model:

x = Σ_{γ∈I} α_γ ϕ_γ = Φα,  |I| = L ≥ n.   (2)

We denote by Φ the dictionary, i.e. the n × L matrix whose columns are the generating waveforms (ϕ_γ)_{γ∈I}, all normalized to a unit ℓ₂-norm. The forward (analysis) transform is then defined by a not-necessarily square matrix T = Φᵀ ∈ ℝ^{L×n} with L ≥ n. When L > n, the dictionary is said to be redundant or overcomplete. In the case of a simple orthogonal basis, the inverse (synthesis) transform is trivially Φ = Tᵀ, whereas assuming that Φ is a tight frame implies that the frame operator satisfies ΦΦᵀ = cI, where c > 0 is the tight frame constant. For tight frames, the pseudo-inverse reconstruction (synthesis) operator reduces to c⁻¹Φ. In the sequel, the dictionary Φ will correspond either to an orthobasis or to a tight frame of H.

Owing to recent advances in modern harmonic analysis, many redundant systems, like the undecimated wavelet transform, curvelets, contourlets, etc., were shown to be very effective in sparsely representing images. By sparsity, we mean that we are seeking a good representation of x with only a few significant coefficients. In the rest of the paper, the dictionary Φ is built by taking the union of one or several transforms, each corresponding to an orthogonal basis or a tight frame. Choosing an appropriate dictionary is a key step towards a good sparse representation, hence restoration. A core idea here is the concept of morphological diversity: when several transforms are amalgamated in the dictionary, they have to be chosen such that each leads to a sparse representation over the parts of the image it is serving, while being inefficient in representing the other image content. As popular examples, one may think of wavelets for smooth images with isotropic singularities [30, Section 9.3], curvelets for representing piecewise smooth C² images away from C² contours [31], [32], and wave atoms or the local DCT to represent locally oscillating textures [30], [33].
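As a small numerical sketch of the tight frame identities above (ours, not from the paper), the union of two orthonormal bases of ℝⁿ forms a tight frame with constant c = 2, and the pseudo-inverse synthesis c⁻¹Φ inverts the analysis transform exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16

# Two orthonormal bases of R^n (orthogonal factors of random matrices).
B1, _ = np.linalg.qr(rng.standard_normal((n, n)))
B2, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Their union is an n x 2n dictionary: a tight frame with constant c = 2,
# i.e. Phi @ Phi.T = 2 I (columns already have unit l2-norm).
Phi = np.hstack([B1, B2])
c = 2.0
assert np.allclose(Phi @ Phi.T, c * np.eye(n))

# Analysis coefficients alpha = Phi^T x; the pseudo-inverse synthesis
# x = c^{-1} Phi alpha recovers the image exactly.
x = rng.standard_normal(n)
alpha = Phi.T @ x
x_rec = Phi @ alpha / c
assert np.allclose(x_rec, x)
print("tight frame verified")
```

The same identities are what Lemma 1 later exploits to obtain a closed-form projector P_{C′} for tight frames.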
¹ Rigorously speaking, the equation is to be understood in an asymptotic sense.

IV. SPARSE ITERATIVE DECONVOLUTION

A. Optimization problem

In this section, we show that the class of minimization problems we are interested in, see (5), can be stated in the general form

min_{α ∈ ℝᴸ} f₁(α) + f₂(α),   (3)

where f₁ ∈ Γ₀(ℝᴸ), f₂ ∈ Γ₀(ℝᴸ), and f₁ is differentiable with a κ-Lipschitz gradient. We denote by M the set of solutions of (3).

From (1), we immediately deduce the data-fidelity term F ∘ H ∘ Φ(α), with

F : η ∈ ℝⁿ ↦ Σ_{i=1}ⁿ f(η_i),  f(η_i) = (1/2)(z_i − 2√(η_i + 3/8))²,   (4)

where H denotes the (circular) convolution operator. From a statistical perspective, (4) corresponds to the anti-log-likelihood score. Note that for bias correction reasons [34], the value 1/8 may be used instead of 3/8 in (4). However, this alternate choice has implications on the Lipschitz constant in (8), and consequently it can be seen from Theorem 1 that it would have an unfavorable impact on the convergence speed of the deconvolution algorithm.

Adopting a Bayesian framework and using a standard MAP rule, our goal is to minimize the following functional with respect to the representation coefficients α:

(P_{λ,ψ}) :  min_α J(α),   (5)

J : α ↦ F ∘ H ∘ Φ(α) + ı_C ∘ Φ(α) + λ Σ_{i=1}ᴸ ψ(α_i),

where the first term is f₁(α), the last two terms form f₂(α), and we implicitly assumed that the (α_i)_{1≤i≤L} are independent and identically distributed with a Gibbsian density ∝ e^{−λψ(α_i)}. The penalty function ψ is chosen to enforce sparsity, λ > 0 is a regularization parameter, and ı_C is the indicator function of the convex set C. In our case, C is the positive orthant. The role of the term ı_C ∘ Φ is to impose the positivity constraint on the restored image, because we are fitting Poisson intensities, which are positive by nature.
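A quick self-check (our illustration): the scalar data-fidelity term f of (4) can be coded directly, and its derivative, which reappears coordinate-wise in the gradient formula of Proposition 1, agrees with a finite-difference approximation:

```python
import numpy as np

def f(eta, z):
    # Scalar data-fidelity term from (4): 0.5 * (z - 2*sqrt(eta + 3/8))^2.
    return 0.5 * (z - 2.0 * np.sqrt(eta + 3.0 / 8.0)) ** 2

def df(eta, z):
    # Its derivative in eta: -z / sqrt(eta + 3/8) + 2, cf. (7).
    return -z / np.sqrt(eta + 3.0 / 8.0) + 2.0

eta, z, h = 4.0, 3.5, 1e-6
fd = (f(eta + h, z) - f(eta - h, z)) / (2.0 * h)   # central difference
print(df(eta, z), fd)
```

The two printed values coincide up to discretization error, confirming the sign and form of the residual term used throughout the algorithm.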
We also define the set C′ = {α | Φα ∈ C}, that is, ı_{C′} = ı_C ∘ Φ. From (5), we have the following.

Proposition 1.
(i) f₁ is a convex function. It is strictly convex if Φ is an orthobasis and ker(H) = ∅ (i.e. the spectrum of the PSF does not vanish within the Nyquist band).
(ii) The gradient of f₁ is

∇f₁(α) = Φᵀ ∘ Hᵀ ∘ ∇F ∘ H ∘ Φ(α),   (6)

with

∇F(η) = (−z_i / √(η_i + 3/8) + 2)_{1≤i≤n}.   (7)

(iii) f₁ is continuously differentiable with a κ-Lipschitz gradient, where

κ ≤ (2/3)^{3/2} · 4c ‖H‖₂² ‖z‖_∞ < +∞.   (8)

(iv) (P_{λ,ψ}) is a particular case of problem (3).

A proof can be found in the appendix.

B. Characterization of the solution

Since J is coercive and convex, the following holds.

Proposition 2.
1) Existence: (P_{λ,ψ}) has at least one solution, i.e. M ≠ ∅.
2) Uniqueness: (P_{λ,ψ}) has a unique solution if Φ is an orthobasis and ker(H) = ∅, or if ψ is strictly convex.

C. Proximal iteration

We first define the notion of a proximity operator, which was introduced in [35] as a generalization of the notion of a convex projection operator.

Definition 1 (Moreau [35]). Let ϕ ∈ Γ₀(H). Then, for every x ∈ H, the function y ↦ ϕ(y) + ‖x − y‖²/2 achieves its infimum at a unique point denoted by prox_ϕ x. The operator prox_ϕ : H → H thus defined is the proximity operator of ϕ. Moreover, for all x, p ∈ H,

p = prox_ϕ x ⟺ x − p ∈ ∂ϕ(p) ⟺ ⟨y − p, x − p⟩ + ϕ(p) ≤ ϕ(y) ∀y ∈ H.   (9)

(9) means that prox_ϕ is the resolvent of the subdifferential of ϕ [36]. Recall that the resolvent of the subdifferential ∂ϕ is the single-valued operator J_{∂ϕ} : H → H such that ∀x ∈ H, x − J_{∂ϕ}(x) ∈ ∂ϕ(J_{∂ϕ}(x)), i.e. J_{∂ϕ} = (I + ∂ϕ)⁻¹. It will also be convenient to introduce the reflection operator rprox_ϕ = 2 prox_ϕ − I. For notational simplicity, we denote by Ψ the function α ↦ Σ_i ψ(α_i).
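For instance (our illustration, not in the paper), when ϕ = ı_C is the indicator of the positive orthant, Definition 1 reduces to the familiar convex projection: prox_{ı_C} x = P_C x = max(x, 0) coordinate-wise. This can be checked against a brute-force search over the minimization in Definition 1:

```python
import numpy as np

def prox_indicator_positive(x):
    # For phi = indicator of C = positive orthant, prox_phi is the
    # orthogonal projector P_C: coordinate-wise clipping at zero.
    return np.maximum(x, 0.0)

x = np.array([1.5, -0.7, 0.0, -2.3])
p = prox_indicator_positive(x)

# Brute-force check: among many feasible candidates y >= 0, none achieves a
# smaller value of ||x - y||^2 / 2 than p does.
rng = np.random.default_rng(0)
candidates = np.abs(rng.standard_normal((10_000, x.size)))
vals = 0.5 * np.sum((x - candidates) ** 2, axis=1)
best = 0.5 * np.sum((x - p) ** 2)
print(best <= vals.min())
```

This is exactly the projector P_C that appears in Lemma 1 below; for a general dictionary Φ, the constraint acts through C′ and the projector acquires the frame-dependent form given there.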
Our goal now is to express the proximity operator associated with f₂, which will be needed in the iterative deconvolution algorithm. The difficulty stems from the definition of f₂, which combines both the positivity constraint and the regularization. Unfortunately, we can show that even with a separable penalty Ψ(α), the operator prox_{f₂} = prox_{ı_C∘Φ + λΨ} has no explicit form in general, except in the case where Φ = I. We therefore propose to replace the explicit evaluation of prox_{f₂} by a sequence of calculations that activate separately prox_{ı_C∘Φ} and prox_{λΨ}. We will show that these two proximity operators have closed-form expressions. Such a strategy is known as a splitting method for maximal monotone operators [36], [37]. As both ı_{C′} and Ψ belong to Γ₀(H) and are non-differentiable, our splitting method is based on the Douglas-Rachford algorithm [28], [36], [37]. The following lemma summarizes our scheme.

Lemma 1. Let Φ be an orthobasis or a tight frame with constant c. Recall that C′ = {α | Φα ∈ C}.
1) If α ∈ C′, then prox_{f₂}(α) = prox_{λΨ}(α).
2) Otherwise, let (ν_t)_t be a sequence in (0, 1) such that Σ_t ν_t(1 − ν_t) = +∞. Take γ⁰ ∈ H, and define the sequence of iterates

γ^{t+1} = γ^t + ν_t (rprox_{λΨ + ½‖·−α‖²} ∘ rprox_{ı_{C′}} − I)(γ^t),   (10)

where

prox_{λΨ + ½‖·−α‖²}(γ^t) = (prox_{(λ/2)ψ}((α_i + γ^t_i)/2))_{1≤i≤L},

P_{C′} = prox_{ı_{C′}} = c⁻¹Φᵀ ∘ P_C ∘ Φ + (I − c⁻¹Φᵀ ∘ Φ),

and P_C is the projector onto the positive orthant, (P_C η)_i = max(η_i, 0). Then γ^t ⇀ γ and

prox_{f₂}(α) = P_{C′}(γ).   (11)

The proof is detailed in the appendix. Note that when Φ is an orthobasis, P_{C′} = Φᵀ ∘ P_C ∘ Φ. To implement the above iteration, we need to express prox_{λψ}, which is given by the following result for a wide class of penalties ψ.

Lemma 2.
Suppose that ψ satisfies:
(i) ψ is convex, even-symmetric, non-negative and non-decreasing on [0, +∞), with ψ(0) = 0;
(ii) ψ is twice differentiable on ℝ \ {0};
(iii) ψ is continuous on ℝ; it is not necessarily smooth at zero, and it admits a positive right derivative at zero, ψ′₊(0) = lim_{h→0⁺} ψ(h)/h > 0.
Then the proximity operator prox_{δΨ}(γ) has exactly one continuous solution decoupled in each coordinate γ_i:

prox_{δψ}(γ_i) = 0 if |γ_i| ≤ δψ′₊(0), and prox_{δψ}(γ_i) = γ_i − δψ′(ᾱ_i) if |γ_i| > δψ′₊(0),   (12)

where ᾱ_i denotes prox_{δψ}(γ_i) itself, i.e. the solution of the implicit equation ᾱ_i + δψ′(ᾱ_i) = γ_i. A proof of this lemma can be found in [38]; a similar result has also recently appeared in [39]. Among the most popular penalty functions ψ satisfying the above requirements is ψ(α_i) = |α_i|, in which case the associated proximity operator is the well-known soft-thresholding.

We are now ready to state our main proximal iterative algorithm to solve the minimization problem (P_{λ,ψ}).

Theorem 1. For t ≥ 0, let (µ_t)_t be a sequence in (0, +∞) such that 0 < inf_t µ_t ≤ sup_t µ_t < (3/2)^{3/2} / (2c ‖H‖₂² ‖z‖_∞), let (β_t)_t be a sequence in (0, 1] such that inf_t β_t > 0, and let (a_t)_t and (b_t)_t be sequences in H such that Σ_t ‖a_t‖ < +∞ and Σ_t ‖b_t‖ < +∞. Fix α⁰ ∈ ℝᴸ and, for every t ≥ 0, set

α^{t+1} = α^t + β_t (prox_{µ_t f₂}(α^t − µ_t(∇f₁(α^t) + b_t)) + a_t − α^t),   (13)

where ∇f₁ and prox_{µ_t f₂} are given by Proposition 1(ii) and Lemma 1. Then (α^t)_{t≥0} converges to a solution of (P_{λ,ψ}).

This is the most general convergence result known for the forward-backward iteration. The name of the iteration is inspired by well-established techniques from numerical linear algebra.
The words "forward" and "backward" refer respectively to the standard notions of a forward difference (here, an explicit gradient descent) step and of a backward difference (here, an implicit proximity) step in numerical analysis. The sequences a_t and b_t play a prominent role, as they formally establish the robustness of the algorithm to numerical errors when computing the gradient ∇f₁ and the proximity operator prox_{f₂}. This remark will allow us to accelerate the algorithm by running the sub-iteration (10) for only a few iterations (see implementation details in IV-F).

For illustration, take Ψ as the ℓ₁-norm, in which case prox_{λΨ} is the component-wise soft-thresholding with threshold λ; take a_t = b_t ≡ 0, β_t ≡ 1 and µ_t ≡ µ in (13), and ν_t ≡ 1/2 in (10). Bringing all the pieces together, the deconvolution algorithm dictated by iterations (13) and (10) is summarized in Algorithm 1.

Algorithm 1
Task: Image deconvolution with Poisson noise; solve (5).
Parameters: The observed count image y, the dictionary Φ, the number of iterations N_FB in (13) and N_DR in sub-iteration (10), the relaxation parameter µ, the regularization parameter λ.
Initialization:
• Apply the VST: z = 2√(y + 3/8).
• Initial solution α⁰ = 0.
Main iteration: For t = 0 to N_FB − 1,
• Compute the blurred estimate η^t = HΦα^t.
• Compute the "residuals" r^t = (−z_i / √(η^t_i + 3/8) + 2)_{1≤i≤n}.
• Move along the descent direction: ξ^t = α^t − µΦᵀHᵀr^t.
• Initialize γ⁰ = ξ^t, and start the sub-iteration: for m = 0 to N_DR − 1,
 – Compute ζ^m = c⁻¹Φᵀ(min(Φγ^m, 0)), so that P_{C′}(γ^m) = γ^m − ζ^m.
 – Update γ^{m+1} by soft-thresholding: γ^{m+1} = ST_{λ/2}((ξ^t + γ^m)/2 − ζ^m) + ζ^m.
• Update α^{t+1} = γ^{N_DR} − c⁻¹Φᵀ(min(Φγ^{N_DR}, 0)).
End main iteration
Output: Deconvolved image x⋆ = Φα^{N_FB}.
D. Choice of µ

The relaxation (or descent) parameter µ has an important impact on the convergence speed of the algorithm. The upper bound provided by Theorem 1, derived from the Lipschitz constant (8), is only a sufficient condition for (13) to converge, and may be pessimistic in some applications. To circumvent this drawback, Tseng proposed in [40] an extension of the forward-backward algorithm with an iteration that adaptively estimates a "good" value of µ. The main result provided hereafter is an adaptation of Tseng's result [40] to our context. We state it in full for the sake of completeness and the reader's convenience.

Theorem 2. Let C′ be as defined above (IV-A). Choose any α⁰ ∈ C′. Let (µ_t)_{t∈ℕ} be a sequence such that ∀t ≥ 0, µ_t ∈ (0, ∞). Let f₁ be as defined in (5). Then the sequence (α^t)_{t∈ℕ} of iterates

α^{t+1/2} = prox_{λΨ}(α^t − µ_t ∇f₁(α^t)),
α^{t+1} = P_{C′}(α^{t+1/2} − µ_t(∇f₁(α^{t+1/2}) − ∇f₁(α^t)))   (14)

converges to a minimizer of J.

As ∇f₁ is Lipschitz-continuous, the update of the relaxation sequence µ_t is rather easy. Indeed, using an Armijo-Goldstein-type stepsize rule, we can compute and update µ_t at each iteration by taking µ_t to be the largest µ ∈ {σ, τσ, τ²σ, ...} satisfying

µ ‖∇f₁(α^{t+1/2}) − ∇f₁(α^t)‖ ≤ θ ‖α^{t+1/2} − α^t‖,   (15)

where τ ∈ (0, 1), θ ∈ (0, 1) and σ > 0 are constants; τ = 1/2 is a typical choice. It is worth noting that for tight frames, this algorithm somewhat simplifies the computation of prox_{f₂}, removing the need for the Douglas-Rachford sub-iteration (10). But, whatever the transform, this comes at the price of keeping track of the gradient of f₁ at the points α^{t+1/2} and α^t, and the need to check (15) several times.
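The backtracking rule (15) can be sketched generically as follows (our illustration; the quadratic f₁ and the function names `backtrack_mu`, `grad`, `prox` are stand-ins, not the paper's notation):

```python
import numpy as np

def backtrack_mu(alpha, grad, prox, sigma=1.0, tau=0.5, theta=0.9, max_tries=30):
    # Armijo-Goldstein-type rule (15): try mu = sigma, tau*sigma, tau^2*sigma, ...
    # and keep the first (hence largest) mu such that
    #   mu * ||grad(a_half) - grad(alpha)|| <= theta * ||a_half - alpha||.
    g0 = grad(alpha)
    mu = sigma
    for _ in range(max_tries):
        a_half = prox(alpha - mu * g0)
        lhs = mu * np.linalg.norm(grad(a_half) - g0)
        rhs = theta * np.linalg.norm(a_half - alpha)
        if lhs <= rhs:
            return mu, a_half
        mu *= tau
    return mu, prox(alpha - mu * g0)

# Smoke test on a quadratic f1(a) = 0.5*||A a - b||^2 with a soft-threshold prox.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
grad = lambda a: A.T @ (A @ a - b)
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.1, 0.0)
mu, a_half = backtrack_mu(np.zeros(5), grad, soft)
print(mu)
```

Since ∇f₁ is globally Lipschitz, the geometric search always terminates, and the accepted µ is typically much larger than the worst-case bound derived from (8).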
E. Choice of λ

As usual in regularized inverse problems, the choice of λ is crucial, as it represents the desired balance between sparsity (regularization) and deconvolution (data fidelity). For a given application and corpus of images (e.g. confocal microscopy), a naive brute-force approach would consist in testing several values of λ and taking the best one by visual assessment of the deconvolution quality. This is however cumbersome in the general case. We propose to objectively select the regularization parameter λ based on an adaptive model selection criterion such as generalized cross-validation (GCV) [41]; other criteria are possible as well, including the AIC [42] or the BIC [43]. GCV attempts to provide a data-driven estimate of λ by minimizing

GCV(λ) = ‖z − 2√(HΦα⋆ + 3/8)‖² / (n − df)²,   (16)

where α⋆(z) denotes the solution arrived at by iteration (13) (or (14)), and df is the effective number of degrees of freedom. Deriving a closed-form expression of df is very challenging in our case, as it faces two main difficulties: (i) the observation model (1) is non-linear, and (ii) the solution α⋆(z) is not known in closed form but given by the iterative forward-backward algorithm.

Degrees of freedom is a familiar phrase in statistics. In (overdetermined) linear regression, df is the number of estimated predictors. More generally, degrees of freedom is often used to quantify the model complexity of a statistical modeling procedure. Generally speaking, however, there is no exact correspondence between the degrees of freedom df and the number of parameters in the model. In penalized solutions of inverse problems where the estimator is linear in the observation, e.g. Tikhonov regularization or ridge regression in statistics, df is simply the trace of the so-called influence (or hat) matrix.
But in general, it is difficult to derive the analytical expression of df for nonlinear modeling procedures such as ours; this remains a challenging and active area of research. Stein's unbiased risk estimation (SURE) theory [44] gives a rigorous definition of the degrees of freedom for any fitting procedure. Following our notation, given the solution α⋆ provided by our deconvolution algorithm, let z⋆(z) = 2√(HΦα⋆(z) + 3/8) represent the model fit from the observation z. As Z | α ∼ N(2√(HΦα + 3/8), 1), it follows from [45] that the degrees of freedom of our procedure is

df(λ) = Σ_{i=1}ⁿ Cov(z⋆_i(z), z_i),

a quantity also called the optimism of the estimator z⋆(z). If the estimation algorithm is such that z⋆(z) is almost-differentiable [44] with respect to z, so that its divergence is well-defined in the weak sense (as is the case if z⋆(z) were uniformly Lipschitz-continuous), Stein's lemma yields the so-called divergence formula

df(λ) = Σ_{i=1}ⁿ Cov(z⋆_i(z), z_i) = E_Z[div(z⋆(z))] = E_Z[Σ_{i=1}ⁿ ∂z⋆_i(z)/∂z_i],   (17)

where the expectation E_Z is taken under the distribution of Z. The df is then the sum of the sensitivities of each fitted value with respect to the corresponding observed value. For example, the last expression of this formula has been used in [46] for orthogonal wavelet denoising. However, it is notoriously difficult to derive a closed-form analytical expression of df from the above formula for general nonlinear modeling procedures. To overcome the analytical difficulty, the bootstrap [47] can be used to obtain an (asymptotically) unbiased estimator of df. Ye [48] and Shen and Ye [49] proposed using a data perturbation technique to numerically compute an (approximately) unbiased estimate of df when the analytical form of z⋆(z) is unavailable.
From (17), the estimator of df takes the form

d̂f(λ) = E_{V₀} ⟨v₀, (z⋆(z + τv₀) − z⋆(z))/τ⟩,  V₀ ∼ N(0, I),
       = (1/τ²) ∫ ⟨v, z⋆(z + v)⟩ φ(v; τ²I) dv,  V ∼ N(0, τ²I),   (18)

where φ(v; τ²I) is the n-dimensional density of N(0, τ²I). It can be shown that this formula remains valid if V is replaced by any vector of random variables with finite higher-order moments. The author in [48] proved that this is an unbiased estimate of df as τ → 0, that is, lim_{τ→0} E_Z[d̂f(λ)] = df(λ). It can be computed by Monte-Carlo integration with τ near 0.6 as devised in [48]. However, both the bootstrap and Ye's method, although general and applicable to any Ψ ∈ Γ₀(ℝᴸ), are computationally prohibitive. This is the main reason why we will not use them here.

Zou et al. [50] recently studied the degrees of freedom of the Lasso² in the framework of SURE. They showed that for any given λ, the number of nonzero coefficients in the model is an unbiased and consistent estimate of df. However, for their results to hold rigorously, the matrix A = HΦ in the Lasso must be over-determined (L < n) with Rank(A) = L. Nonetheless, one can show that their intuitive estimator can be extended to the under-determined case (i.e. L ≥ n) under the so-called (UC) condition of [51]; see Theorem 2 in that reference. This yields an unbiased estimator of df, but consistency would be much harder to prove, since it requires that the Gram matrix AᵀA be positive-definite, which only happens in the special case of Φ an orthogonal basis and ker(H) = ∅.

² The Lasso model corresponds to the case of (5) where the degradation model in (1) is linear and Ψ is the ℓ₁-norm.
Furthermore, even with the $\ell_1$-norm, extending this simple estimator rigorously to our setting faces two additional serious difficulties beside the under-determinacy of $\mathbf{A}$: namely, the non-linearity of the degradation equation (1) and the positivity constraint in (5). Following this discussion, it appears clearly that estimating $df$ is either computationally intensive (bootstrap or perturbation techniques) or analytically difficult. In this paper, in the same vein as [50], we conjecture that a simple estimator of $df$ is given by the cardinality of the support of $\alpha^\star$. That is, from (12)-(13),
$$ \widehat{df}(\lambda) = \mathrm{Card}\left\{ i = 1, \ldots, L \;\middle|\; |\alpha^\star_i| \geq \lambda\mu \right\}. \quad (19) $$
With such a simple formula at hand, the expression of the model selection criterion GCV in (16) is readily available.

Figure 1. GCV for the Cameraman (a) and the Neuron phantom (b) as a function of $\lambda$. The translation-invariant discrete wavelet transform was used with the Cameraman image, and the curvelet transform with the Neuron phantom. The solid line represents the GCV, the dashed line the MSE and the dashed-dotted line the MAE.

Although this formula is only an approximation, in all our experiments it performed reasonably well. This is supported by Fig. 1(a) and (b), which respectively show the behavior of the GCV as a function of $\lambda$ for two images: the Cameraman portrayed in Fig. 4(a) and the Neuron phantom shown in Fig. 2(a). As the ground truth is known in the simulation, we computed for each $\lambda$ the mean absolute error (MAE), which is well adapted to Poisson noise as it is closely related to the squared Hellinger distance [52], as well as the mean square error (MSE) between the deconvolved and true images. We can clearly see that the GCV reaches its minimum close to those of the MAE and the MSE.
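With the conjectured estimator (19), the GCV selection of $\lambda$ reduces to a one-dimensional search. Since (16) is not reproduced in this excerpt, the sketch below assumes the standard GCV form $\mathrm{GCV}(\lambda) = \frac{\|z - z^\star(z)\|^2 / n}{\left(1 - df(\lambda)/n\right)^2}$ and uses a toy soft-thresholding denoiser in place of the full deconvolution algorithm:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def gcv(z, lam):
    """GCV with the conjectured df = support size, as in (19).
    The overall GCV form is assumed (standard), not quoted from (16)."""
    n = z.size
    fit = soft_threshold(z, lam)
    df = np.count_nonzero(fit)
    if df >= n:                      # guard against a vanishing denominator
        return np.inf
    return (np.sum((z - fit) ** 2) / n) / (1.0 - df / n) ** 2

# Sparse ground truth observed in unit-variance Gaussian noise
# (i.e. the variance-stabilized domain).
rng = np.random.default_rng(0)
alpha = np.zeros(400)
alpha[:20] = 5.0
z = alpha + rng.normal(size=400)

grid = np.linspace(0.1, 4.0, 40)
scores = [gcv(z, lam) for lam in grid]
lam_gcv = grid[int(np.argmin(scores))]
```

Small $\lambda$ drives $df$ toward $n$ and blows up the denominator, while large $\lambda$ inflates the residual, so the minimizer lands at an intermediate value, mirroring the behavior seen in Fig. 1.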
The regularization parameter dictated by the GCV criterion is, however, slightly higher than that of the MSE, which may lead to a somewhat over-smoothed estimate.

F. Computational complexity and implementation details

The bulk of the computation of our deconvolution algorithm is invested in applying $\boldsymbol{\Phi}$ (resp. $\mathbf{H}$) and its adjoint $\boldsymbol{\Phi}^T$ (resp. $\mathbf{H}^T$). These operators are never constructed explicitly; rather, they are implemented as fast implicit operators taking a vector $x$ and returning $\boldsymbol{\Phi} x$ (resp. $\boldsymbol{\Phi}^T x$) and $\mathbf{H} x$ (resp. $\mathbf{H}^T x$). Multiplication by $\mathbf{H}$ or $\mathbf{H}^T$ costs two FFTs, that is, $2 n \log n$ operations ($n$ denotes the number of pixels). The complexity of $\boldsymbol{\Phi}$ and $\boldsymbol{\Phi}^T$ depends on the transforms in the dictionary: for example, the orthogonal wavelet transform costs $O(n)$ operations, while the translation-invariant discrete wavelet transform (TI-DWT) and the curvelet transform each cost $O(n \log n)$. Let $V_\Phi$ denote the complexity of applying the analysis or synthesis operator. Define $N_{FB}$ and $N_{DR}$ as the numbers of iterations of the forward-backward algorithm and the Douglas-Rachford sub-iteration, and recall that $L$ is the number of coefficients. The computational complexities of our iterations (13) and (14) are summarized below:

Algorithm | $\boldsymbol{\Phi}$ orthobasis | $\boldsymbol{\Phi}$ tight frame
(13) | $N_{FB}\left(4 n \log n + N_{DR}(2 V_\Phi + O(n))\right)$ | $N_{FB}\left(4 n \log n + 2 V_\Phi + N_{DR}(2 V_\Phi + O(L))\right)$
(14) | $N_{FB}\left(8 n \log n + 2 V_\Phi + O(n)\right)$ | $N_{FB}\left(8 n \log n + 6 V_\Phi + O(L)\right)$

The orthobasis case requires fewer multiplications by $\boldsymbol{\Phi}$ and $\boldsymbol{\Phi}^T$ because, in that case, $\boldsymbol{\Phi}$ is a bijective linear operator. Thus, the optimization problem (5) can be equivalently written in terms of image samples instead of coefficients, reducing the computations in the corresponding iterations (13) and (14). For our implementation, as in Algorithm 1, we have taken $a_t = b_t \equiv 0$ and $\beta_t \equiv 1$ in (13), and $\nu_t \equiv 1/2$ in (10).
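A minimal sketch of such an implicit operator, assuming circular boundary conditions (the paper does not specify its boundary handling): multiplication by $\mathbf{H}$ or $\mathbf{H}^T$ is realized with two FFTs, and the adjoint can be verified numerically through the inner-product identity $\langle \mathbf{H}x, y\rangle = \langle x, \mathbf{H}^T y\rangle$:

```python
import numpy as np

def make_convolver(psf_kernel, shape):
    """Implicit circular-convolution operator H and its adjoint H^T via FFT.
    Each application costs two FFTs, i.e. O(n log n) for n pixels."""
    h = np.zeros(shape)
    k = np.array(psf_kernel, dtype=float)
    h[:k.shape[0], :k.shape[1]] = k
    # center the kernel so that blurring does not shift the image
    h = np.roll(h, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    H_hat = np.fft.fft2(h)
    H  = lambda x: np.real(np.fft.ifft2(np.fft.fft2(x) * H_hat))
    Ht = lambda x: np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(H_hat)))
    return H, Ht

# 7x7 moving-average PSF normalized to a unit sum, as in the experiments
psf = np.ones((7, 7)) / 49.0
H, Ht = make_convolver(psf, (64, 64))

rng = np.random.default_rng(0)
x, y = rng.normal(size=(64, 64)), rng.normal(size=(64, 64))
adjoint_gap = abs(np.sum(H(x) * y) - np.sum(x * Ht(y)))
```

Because the kernel is non-negative with unit sum, the operator has unit DC gain, consistent with $\|\mathbf{H}\|_2^2 = 1$ for such a low-pass PSF.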
As the PSF $h$ in our experiments is low-pass and normalized to a unit sum, $\|\mathbf{H}\|_2^2 = 1$. $\Psi$ was the $\ell_1$-norm, leading to soft-thresholding. Furthermore, in order to accelerate the computation of $\mathrm{prox}_{f_2}$ in (13), the Douglas-Rachford sub-iteration (10) was only run once (i.e. $N_{DR} = 1$), starting with $\gamma_0 = \alpha$. In this case, one can check that if $\gamma_0 \in \mathcal{C}'$, this leads to the "natural" formula
$$ \mathrm{prox}_{f_2}(\alpha) = P_{\mathcal{C}'} \circ \mathrm{prox}_{\frac{\lambda}{2}\Psi}(\alpha). $$
In our experimental studies, the GCV-based selection of $\lambda$ was run using the forward-backward algorithm (13), which has a lower computational burden than (14) (see the table above for computational complexities). Once $\lambda$ was objectively chosen by the GCV procedure, the deconvolution algorithm was applied using (14) to exempt the user from the choice of the relaxation parameter $\mu$.

V. RESULTS

A. Simulated data

The performance of our approach has been assessed on several test images: a 128×128 neuron phantom [53], a 370×370 confocal microscopy image of micro-vessel cells [54], the Cameraman (256×256), and a 512×512 simulated astronomical image from the Hubble Space Telescope Wide Field Camera of a distant cluster of galaxies [3]. Our algorithm was compared to RL with total variation regularization (RL-TV [7]), RL with multi-resolution support wavelet regularization (RL-MRS [9]), fast translation-invariant tree-pruning reconstruction combined with an EM algorithm (FTITPR [55]), and the naive proximal method that treats the noise as if it were Gaussian (NaiveGauss [12]). For all results presented, each algorithm was run with $N_{FB} = 200$ iterations, enough to reach convergence. For all results below, $\lambda$ was selected using the GCV criterion for our algorithm. For a fair comparison to [12], $\lambda$ was also chosen by adapting our GCV formula to the Gaussian noise. Fig. 2(a) depicts a phantom of a neuron with a mushroom-shaped spine.
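In the simplest setting where $\boldsymbol{\Phi} = \mathbf{I}$, so that $\mathcal{C}'$ is the nonnegative orthant and $P_{\mathcal{C}'}$ is a pointwise projection (an illustrative assumption, since the paper works with frames, where $P_{\mathcal{C}'}$ follows (24)), the "natural" formula above reads as soft-thresholding followed by a positivity projection:

```python
import numpy as np

def soft_threshold(a, t):
    """prox of t * l1-norm, the proximity operator used when Psi is the l1-norm."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def prox_f2_natural(alpha, lam):
    """Single Douglas-Rachford pass started at alpha (N_DR = 1), toy case Phi = I:
    prox_f2(alpha) ~ P_C' o prox_{(lam/2) Psi}(alpha),
    i.e. soft-threshold at lam/2, then project onto the nonnegative orthant."""
    return np.maximum(soft_threshold(alpha, lam / 2.0), 0.0)

alpha = np.array([-1.0, -0.2, 0.1, 0.6, 2.0])
out = prox_f2_natural(alpha, lam=0.4)
```

Negative coefficients are first shrunk, then clipped by the positivity projection, so the output is both sparse and nonnegative.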
The maximum intensity is 30. Its blurred (using a 7×7 moving average) and blurred+noisy versions are shown in (b) and (c). With this neuron, and for NaiveGauss and our approach, the dictionary $\boldsymbol{\Phi}$ contained the curvelet tight frame [32]. The deconvolution results are shown in Fig. 2(d)-(h). As expected at this intensity level, the NaiveGauss algorithm performs quite badly, as it does not fit the noise model in this intensity regime. It turns out that NaiveGauss under-regularizes the estimate, and the signal-dependent Poisson noise is not always under control. This behavior of NaiveGauss, which was predictable at this intensity level, is observed on almost all tested images. RL-TV does a good job at deconvolution, but the background is dominated by artifacts, and the restored neuron exhibits the staircase-like artifacts typical of TV regularization. Our approach provides a visually pleasant deconvolution result on this example. It efficiently restores the spine, although the background is not fully cleaned. RL-MRS also exhibits good deconvolution results. On this image, FTITPR provides a well-smoothed estimate but with almost no deconvolution. These qualitative visual results are confirmed by quantitative measures of the quality of deconvolution, where we used both the MAE and the traditional MSE criteria. At each intensity value, 10 noisy and blurred replications were generated, and the MAE was computed for each deconvolution algorithm. The average MAE over the 10 replications is given in Fig. 6 (similar results were obtained for the MSE, not shown here). In general, our algorithm performs very well at all intensity regimes (especially at medium to low). NaiveGauss is among the worst algorithms at low intensity levels. Its performance becomes better as the intensity increases, which was expected.
RL-MRS is effective at low and medium intensity levels, and is even better than our algorithm on the Cell image. RL-TV underperforms all algorithms at low intensity. We suspect the staircase-like artifacts of TV regularization to be responsible for this behavior. At high intensity, RL-TV becomes competitive and its MAE comparable to ours. The same experiment as above was carried out with the confocal microscopy cell image; see Fig. 3. In this experiment, the PSF was a 7×7 moving average. For NaiveGauss and our approach, the dictionary $\boldsymbol{\Phi}$ contained the TI-DWT.

Figure 2. Deconvolution of a simulated neuron (maximum intensity 30). (a) Original, (b) Blurred, (c) Blurred & noisy, (d) RL-TV [7], (e) NaiveGauss [12], (f) RL-MRS [3], (g) FTITPR [55], (h) Our algorithm.

The NaiveGauss deconvolution result is spoiled by artifacts. RL-TV produces a good restoration of small isolated details, but with dominating staircase-like artifacts. FTITPR yields a somewhat over-smoothed estimate, whereas our approach provides a sharper deconvolution result. This visual inspection is in agreement with the MAE measures of Fig. 6. In particular, one can notice that RL-MRS shows the best behavior, and the performance of our approach compared to the other methods on this cell image is roughly the same as on the previous neuron image.

Figure 3. Deconvolution of a microscopy cell image (maximum intensity 30). (a) Original, (b) Blurred, (c) Blurred & noisy, (d) RL-TV [7], (e) NaiveGauss [12], (f) RL-MRS [3], (g) FTITPR [55], (h) Our algorithm.

Fig. 4 depicts the result of the experiment on the Cameraman with a maximum intensity of 30. The PSF was the same as above. Again, the dictionary contained the TI-DWT frame. One may notice that the degradation in Fig. 4(c) is quite severe.
Our algorithm provides the most visually pleasing result, with a good balance between regularization and deconvolution, although some artifacts persist. RL-MRS manages to deconvolve the image with more artifacts than our approach, and suffers from a loss of photometry. Again, FTITPR gives an over-smoothed estimate with many missing details. Both RL-TV and NaiveGauss yield results with many artifacts. This visual impression is in agreement with the MAE values in Fig. 6.

Figure 4. Deconvolution of the Cameraman (maximum intensity 30). (a) Original, (b) Blurred, (c) Blurred & noisy, (d) RL-TV [7], (e) NaiveGauss [12], (f) RL-MRS [3], (g) FTITPR [55], (h) Our algorithm.

To assess the computational cost of the compared algorithms, Tab. I summarizes the execution times on the Cameraman image with an Intel PC Core 2 Duo 2 GHz, 2 Gb RAM. Except for RL-MRS, which is written in C++, all other algorithms were implemented in Matlab.

Method | Time (in s)
Our method | 88
NaiveGauss | 71
RL-MRS | 99.5
RL-TV | 15.5

Table I. Execution times for the simulated 256×256 Cameraman image using the TI-DWT ($N_{FB} = 200$).

The same experimental protocol was applied to a simulated Hubble Space Telescope Wide Field Camera image of a distant cluster of galaxies, portrayed in Fig. 5(a). We used the Hubble Space Telescope PSF as given in [3]. The maximum intensity of the blurred image was 5000. For NaiveGauss and our approach, the dictionary contained the TI-DWT frame. For this image, RL-MRS is clearly the best, as it was exactly designed to handle Poisson noise for such images. Most faint structures are recovered by RL-MRS, and large bright objects are well deconvolved. Our approach also yields a good deconvolution result and preserves most faint objects that are hardly visible in the degraded image.
But the background is less clean than that of RL-MRS. At this high intensity regime, NaiveGauss provides satisfactory results, comparable to ours on the galaxies. FTITPR manages to properly recover the most significant structures with a very clean background, but many faint objects are lost. RL-TV gives a deconvolution result comparable to ours on the brightest objects, but the background is dominated by spurious faint structures. We also quantified the influence of the dictionary on the deconvolution performance on three test images. We first show in Fig. 7 the results of an experiment on a simulated 128×128 image containing point-like sources (upper left), Gaussians and lines. In this experiment, the maximum intensity of the original image is 30, and we used the 7×7 moving-average PSF. The TI-DWT, depicted in Fig. 7(d), does a good job at recovering the isotropic structures (point-like sources and Gaussians), but the lines are not well restored. This drawback is overcome when using the curvelet transform, as seen in Fig. 7(e), but, as expected, the faint point-like source in the upper-left part is sacrificed. Visually, using a dictionary with both transforms seems to take the best of both worlds; see Fig. 7(f). Fig. 8 shows the MAE, here normalized to the maximum intensity of the original image for the sake of legibility, as a function of the maximal intensity level for three test images: Neuron phantom, Cell and LinesGaussians. As above, three dictionaries were used: TI-DWT (solid line), curvelets (dashed line), and a dictionary built by merging both transforms (dashed-dotted line). For the Neuron phantom, which is piecewise-smooth, the best performance is given by the TI-DWT+curvelets dictionary at medium and high intensities, even though the differences between the dictionaries are less salient at low intensity levels.
For the Cell image, which contains many localized structures, the TI-DWT seems to provide the best behavior, especially as the intensity increases. Finally, the behavior observed for the LinesGaussians image is just the opposite of that of the Cell. More precisely, the curvelet and TI-DWT+curvelets dictionaries show the best performance, with an advantage to the latter.

Figure 5. Deconvolution of the simulated sky. (a) Original, (b) Blurred, (c) Blurred & noisy, (d) RL-TV [7], (e) NaiveGauss [12], (f) RL-MRS [3], (g) FTITPR [55], (h) Our algorithm.

However, this limited set of experiments does not allow us to conclude that a dictionary built by amalgamating several transforms is the best strategy in general. Such a choice strongly depends on the morphological content of the image.

Figure 6. Average MAE of all algorithms as a function of the intensity level. (a) Cameraman, (b) Neuron phantom, (c) Cell.

Figure 7. Impact of the dictionary on the deconvolution of the simulated LinesGaussians image with maximum intensity 30. (a) Original, (b) Blurred, (c) Blurred & noisy, (d) restored with TI-DWT, (e) restored with curvelets, (f) restored with a dictionary containing both transforms.

Figure 8. Impact of the dictionary on deconvolution performance as a function of the maximal intensity level for several test images: (a) Neuron phantom, (b) Cell and (c) LinesGaussians.
Solid lines indicate the TI-DWT, dashed lines the curvelet transform, and dashed-dotted lines the dictionary built by merging both wavelets and curvelets.

B. Real data

Finally, we applied our algorithm to a real 512×512 confocal microscopy image of neurons. Fig. 9(a) depicts the observed image³ using the GFP fluorescent protein. The optical PSF of the fluorescence microscope was modeled using the Gaussian approximation described in [56]. Fig. 9(b) shows the restored image using our algorithm with the wavelet transform. The images are shown in log-scale for better visual rendering. We can notice that the background has been cleaned and some structures have reappeared. The spines are well restored, and part of the dendritic tree is reconstructed. However, some information may be lost (see the tiny holes). We suspect that this result could be improved using a more accurate PSF model.

³Courtesy of the GIP Cycéron, Caen, France.

Figure 9. Deconvolution of a real neuron. (a) Original noisy image, (b) Restored with our algorithm.

C. Reproducible research

Following the philosophy of reproducible research [57], a toolbox is made freely available for download at the first author's webpage http://www.greyc.ensicaen.fr/~fdupe. This toolbox is a collection of Matlab functions, scripts and datasets for image deconvolution under Poisson noise. It requires at least WaveLab 8.02 [57]. The toolbox implements the proposed algorithms and contains all the scripts to reproduce most of the figures included in this paper.

VI. CONCLUSION

In this paper, a novel sparsity-based fast iterative thresholding deconvolution algorithm that takes account of the presence of Poisson noise was presented. The Poisson noise was handled properly through the Anscombe variance stabilizing transform. A careful theoretical study of the optimization problem and a characterization of the iterative algorithm were provided.
The choice of the regularization parameter was also attacked using a GCV-based procedure. Several experimental tests have shown the capabilities of our approach, which compares favorably with some state-of-the-art algorithms. Encouraging preliminary results were also obtained on real confocal microscopy images. The present work may be extended along several lines. For example, it is worth noting that our approach generalizes straightforwardly to any non-linearity in (1) other than the square root, provided that the corresponding data fidelity term as in (4) is convex and has a Lipschitz-continuous gradient. This is, for instance, the case if a generalization of the Anscombe VST [58] is applied to Poisson plus Gaussian noise, which is a realistic noise model for data obtained from a CCD detector. For such a noise, one can easily show results similar to those proved in our work. In this paper, the simple expression of the degrees of freedom $df$ was conjectured without a rigorous proof. Deriving the exact analytical expression of $df$, if possible, needs further investigation and a very careful analysis that we leave for future work. On the applicative side, the extension to 3D to handle confocal microscopy volumes is under investigation. Extension to multi-valued images is also an important aspect that will be the focus of future research.

REFERENCES

[1] P. Sarder and A. Nehorai, "Deconvolution Methods for 3-D Fluorescence Microscopy Images," IEEE Sig. Pro. Mag., vol. 23, pp. 32-45, 2006.
[2] J. Pawley, Handbook of Confocal Microscopy. Plenum Press, 2005.
[3] J.-L. Starck and F. Murtagh, Astronomical Image and Data Analysis. Springer, 2006.
[4] H. C. Andrews and B. R. Hunt, Digital Image Restoration. Prentice-Hall, 1977.
[5] P. Jansson, Image Recovery: Theory and Application. New York: Academic Press, 1987.
[6] ——, Deconvolution of Images and Spectra.
New York: Academic Press, 1997.
[7] N. Dey, L. Blanc-Féraud, C. Zimmer, Z. Kam, J.-C. Olivo-Marin, and J. Zerubia, "A deconvolution method for confocal microscopy with total variation regularization," in IEEE ISBI, 2004.
[8] J.-L. Starck and F. Murtagh, "Image restoration with noise suppression using the wavelet transform," Astronomy and Astrophysics, vol. 288, pp. 343-348, 1994.
[9] J.-L. Starck, A. Bijaoui, and F. Murtagh, "Multiresolution support applied to image filtering and deconvolution," CVGIP: Graphical Models and Image Processing, vol. 57, pp. 420-431, 1995.
[10] G. Jammal and A. Bijaoui, "Dequant: a flexible multiresolution restoration framework," Signal Processing, vol. 84, pp. 1049-1069, 2004.
[11] J. B. de Monvel et al., "Image Restoration for Confocal Microscopy: Improving the Limits of Deconvolution, with Application to the Visualization of the Mammalian Hearing Organ," Biophysical Journal, vol. 80, pp. 2455-2470, 2001.
[12] C. Vonesch and M. Unser, "A fast iterative thresholding algorithm for wavelet-regularized deconvolution," in IEEE ISBI, 2007.
[13] I. Daubechies, M. Defrise, and C. D. Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Comm. Pure Appl. Math., vol. 57, pp. 1413-1457, 2004.
[14] R. Nowak and E. Kolaczyk, "A statistical multiscale framework for Poisson inverse problems," IEEE Transactions on Information Theory, vol. 46, pp. 2794-2802, 2000.
[15] L. Cavalier and J.-Y. Koo, "Poisson intensity estimation for tomographic data using a wavelet shrinkage approach," IEEE Transactions on Information Theory, vol. 48, pp. 2794-2802, 2002.
[16] D. L. Donoho, "Nonlinear solution of inverse problems by wavelet-vaguelette decomposition," Applied and Computational Harmonic Analysis, vol. 2, pp. 101-126, 1995.
[17] A. Antoniadis and J. Bigot, "Poisson inverse problems," Annals of Statistics, vol. 34, pp.
1811-1825, 2006.
[18] A. Cohen, M. Hoffman, and M. Reiss, "Adaptive wavelet-Galerkin methods for inverse problems," SIAM J. Numerical Analysis, vol. 42, pp. 1479-1501, 2004.
[19] M. Figueiredo and R. Nowak, "An EM algorithm for wavelet-based image restoration," IEEE Transactions on Image Processing, vol. 12, pp. 906-916, 2003.
[20] P. L. Combettes and V. R. Wajs, "Signal recovery by proximal forward-backward splitting," SIAM Multiscale Model. Simul., vol. 4, no. 4, pp. 1168-1200, 2005.
[21] G. Teschke, "Multi-frame representations in linear inverse problems with mixed multi-constraints," Applied and Computational Harmonic Analysis, vol. 22, no. 1, pp. 43-60, 2007.
[22] C. Chaux, P. L. Combettes, J.-C. Pesquet, and V. R. Wajs, "A variational formulation for frame-based inverse problems," Inv. Prob., vol. 23, pp. 1495-1518, 2007.
[23] M. J. Fadili and J.-L. Starck, "Sparse representation-based image deconvolution by iterative thresholding," in ADA IV. France: Elsevier, 2006.
[24] J.-L. Starck, M. Nguyen, and F. Murtagh, "Wavelets and curvelets for image deconvolution: a combined approach," Signal Processing, vol. 83, pp. 2279-2283, 2003.
[25] C. Chaux, L. Blanc-Féraud, and J. Zerubia, "Wavelet-based restoration methods: application to 3D confocal microscopy images," in SPIE Wavelets XII, 2007.
[26] C. Lemaréchal and J.-B. Hiriart-Urruty, Convex Analysis and Minimization Algorithms I, 2nd ed. Springer, 1996.
[27] P. G. Ciarlet, Introduction à l'analyse numérique matricielle et à l'optimisation. Masson, Paris, 1985.
[28] P. L. Combettes, "Solving monotone inclusions via compositions of nonexpansive averaged operators," Optimization, vol. 53, pp. 475-504, 2004.
[29] F. J. Anscombe, "The Transformation of Poisson, Binomial and Negative-Binomial Data," Biometrika, vol. 35, pp. 246-254, 1948.
[30] S. G. Mallat, A Wavelet Tour of Signal Processing, 2nd ed.
Academic Press, 1998.
[31] E. J. Candès and D. L. Donoho, "Curvelets – a surprisingly effective nonadaptive representation for objects with edges," in Curve and Surface Fitting: Saint-Malo 1999, A. Cohen, C. Rabut, and L. Schumaker, Eds. Nashville, TN: Vanderbilt University Press, 1999.
[32] E. Candès, L. Demanet, D. Donoho, and L. Ying, "Fast discrete curvelet transforms," SIAM Multiscale Model. Simul., vol. 5, pp. 861-899, 2005.
[33] L. Demanet and L. Ying, "Wave atoms and sparsity of oscillatory patterns," Applied and Computational Harmonic Analysis, vol. 23, no. 3, pp. 368-387, 2007.
[34] M. Jansen, "Multiscale Poisson data smoothing," J. Royal Stat. Soc. B., vol. 68, no. 1, pp. 27-48, 2006.
[35] J.-J. Moreau, "Fonctions convexes duales et points proximaux dans un espace hilbertien," CRAS Sér. A Math., vol. 255, pp. 2897-2899, 1962.
[36] J. Eckstein and D. P. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Math. Programming, vol. 55, pp. 293-318, 1992.
[37] P.-L. Lions and B. Mercier, "Splitting algorithms for the sum of two nonlinear operators," SIAM J. Numer. Anal., vol. 16, pp. 964-979, 1979.
[38] M. Fadili, J.-L. Starck, and F. Murtagh, "Inpainting and zooming using sparse representations," The Computer Journal, 2006, in press.
[39] P. L. Combettes and J.-C. Pesquet, "Proximal thresholding algorithm for minimization over orthonormal bases," SIAM Journal on Optimization, vol. 18, no. 4, pp. 1351-1376, November 2007.
[40] P. Tseng, "A modified forward-backward splitting method for maximal monotone mappings," SIAM J. Control & Optim., vol. 38, pp. 431-446, 2000.
[41] G. H. Golub, M. Heath, and G. Wahba, "Generalized cross-validation as a method for choosing a good ridge parameter," Technometrics, vol. 21, no. 2, pp. 215-223, 1979.
[42] H. Akaike, B. N. Petrox, and F.
Caski, "Information theory and an extension of the maximum likelihood principle," in Second International Symposium on Information Theory. Akademiai Kiado, Budapest, pp. 267-281, 1973.
[43] G. Schwarz, "Estimating the dimension of a model," Annals of Statistics, vol. 6, pp. 461-464, 1978.
[44] C. Stein, "Estimation of the mean of a multivariate normal distribution," Annals of Statistics, vol. 9, pp. 1135-1151, 1981.
[45] B. Efron, "How biased is the apparent error rate of a prediction rule," Journal of the American Statistical Association, vol. 81, pp. 461-470, 1986.
[46] M. Jansen, M. Malfait, and A. Bultheel, "Generalized cross validation for wavelet thresholding," Signal Processing, vol. 56, no. 1, pp. 33-44, 1997.
[47] B. Efron, "The estimation of prediction error: covariance penalties and cross-validation," Journal of the American Statistical Association, vol. 99, pp. 619-642, 2004.
[48] J. Ye, "On measuring and correcting the effects of data mining and model selection," Journal of the American Statistical Association, vol. 93, pp. 120-131, 1998.
[49] X. Shen and J. Ye, "Adaptive model selection," Journal of the American Statistical Association, vol. 97, pp. 210-221, 2002.
[50] H. Zou, T. Hastie, and R. Tibshirani, "On the "degrees of freedom" of the lasso," Annals of Statistics, vol. 35, pp. 2173-2192, 2007.
[51] C. Dossal, "A necessary and sufficient condition for exact recovery by ℓ1 minimization," 2007, preprint, hal-00164738. [Online]. Available: http://hal.archives-ouvertes.fr/hal-00164738/en/
[52] A. Barron and T. Cover, "Minimum complexity density estimation," IEEE Transactions on Information Theory, vol. 37, pp. 1034-1054, 1991.
[53] Rebecca Willett's homepage. [Online]. Available: http://www.ee.duke.edu/~willett/
[54] ImageJ website. [Online]. Available: http://rsb.info.nih.gov/ij/
[55] R. Willett and R.
Nowak, "Fast multiresolution photon-limited image reconstruction," in IEEE ISBI, 2004.
[56] B. Zhang, J. Zerubia, and J.-C. Olivo-Marin, "Gaussian approximations of fluorescence microscope PSF models," Applied Optics, vol. 46, no. 10, pp. 1819-1829, 2007.
[57] J. Buckheit and D. Donoho, "Wavelab and reproducible research," in Wavelets and Statistics, A. Antoniadis, Ed. Springer, 1995.
[58] F. Murtagh, J.-L. Starck, and A. Bijaoui, "Image restoration with noise suppression using a multiresolution support," Astronomy and Astrophysics, Supplement Series, vol. 112, pp. 179-189, 1995.
[59] P. L. Combettes and J.-C. Pesquet, "A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 564-574, 2007.

APPENDIX

A. Proof of Proposition 1

Proof: (i) $f_1$ is obviously convex, as $\boldsymbol{\Phi}$ and $\mathbf{H}$ are bounded linear operators and $F$ is convex. (ii) The computation of the gradient of $f_1$ is straightforward. (iii) For any $\alpha, \alpha' \in \mathcal{H}$, we have
$$ \left\| \nabla f_1(\alpha) - \nabla f_1(\alpha') \right\| \leq \|\boldsymbol{\Phi}\|_2 \|\mathbf{H}\|_2 \left\| \nabla F \circ \mathbf{H} \circ \boldsymbol{\Phi}(\alpha) - \nabla F \circ \mathbf{H} \circ \boldsymbol{\Phi}(\alpha') \right\|. \quad (20) $$
The function $\eta_i \mapsto -\frac{z_i}{\sqrt{\eta_i + 3/8}} + 2$ is one-to-one increasing on $[0, +\infty)$, with derivative uniformly bounded above by $\frac{z_i}{2}\left(\frac{8}{3}\right)^{3/2}$. Thus,
$$ \left\| \nabla F \circ \mathbf{H} \circ \boldsymbol{\Phi}(\alpha) - \nabla F \circ \mathbf{H} \circ \boldsymbol{\Phi}(\alpha') \right\| \leq \left(\frac{8}{3}\right)^{3/2} \frac{\|z\|_\infty}{2} \left\| \mathbf{H} \circ \boldsymbol{\Phi}(\alpha) - \mathbf{H} \circ \boldsymbol{\Phi}(\alpha') \right\| \leq \left(\frac{8}{3}\right)^{3/2} \frac{\|z\|_\infty}{2} \|\boldsymbol{\Phi}\|_2 \|\mathbf{H}\|_2 \left\| \alpha - \alpha' \right\|. \quad (21) $$
Using the fact that $\|\boldsymbol{\Phi}\|_2^2 = \left\|\boldsymbol{\Phi}\boldsymbol{\Phi}^T\right\|_2 = c$ for a tight frame, and that $z$ is bounded since $y \in \ell_\infty$ by assumption, we conclude that $\nabla f_1$ is Lipschitz-continuous with the constant given in (8).

B. Proof of Proposition 2

Proof: Existence is obvious because $J$ is coercive. If $\boldsymbol{\Phi}$ is an orthobasis and $\ker(\mathbf{H}) = \{0\}$, then $f_1$ is strictly convex and so is $J$, leading to a strict minimum. Similarly, if $\psi$ is strictly convex, so is $f_2$, hence $J$.

C. Proof of Lemma 1

Proof: 1) Let $g: \gamma \mapsto \frac{1}{2}\|\alpha - \gamma\|^2 + \lambda \Psi(\gamma)$. From Definition 1, $\mathrm{prox}_{\lambda \Psi}(\alpha)$ is the unique minimizer of $g$, whereas $\mathrm{prox}_{f_2}(\alpha)$ is the unique minimizer of $g + \imath_{\mathcal{C}'}$. If $\alpha \in \mathcal{C}'$, then $\mathrm{prox}_{f_2}(\alpha)$ is also the unique minimizer of $g$, as obviously $\imath_{\mathcal{C}'}(\alpha) = 0$ in this case. That is, $\mathrm{prox}_{f_2}(\alpha) = \mathrm{prox}_{\lambda \Psi}(\alpha)$.

2) Let's now turn to the general case. We have to find the unique solution of the following minimization problem:
$$ \mathrm{prox}_{f_2}(\alpha) = \arg\min_{\gamma} \; g(\gamma) + \imath_{\mathcal{C}} \circ \boldsymbol{\Phi}(\gamma) = \arg\min_{\gamma \in \mathcal{C}'} \; g(\gamma). $$
As both $\imath_{\mathcal{C}}$ and $g$ belong to $\Gamma_0\left(\mathbb{R}^L\right)$ but are non-differentiable, we use the Douglas-Rachford splitting method [28], [36]. Its iteration is given by
$$ \gamma_{t+1} = \gamma_t + \nu_t \left( \mathrm{rprox}_{\lambda \Psi + \frac{1}{2}\|\cdot - \alpha\|^2} \circ \mathrm{rprox}_{\imath_{\mathcal{C}'}} - \mathbf{I} \right)(\gamma_t), \quad (22) $$
where the sequence $\nu_t$ satisfies the condition of the lemma. From [28, Corollary 5.2], and by strict convexity, we deduce that the sequence of iterates $\gamma_t$ converges to a unique point $\gamma$, and $P_{\mathcal{C}'}(\gamma)$ is the unique proximity point $\mathrm{prox}_{f_2}(\alpha)$. It remains now to explicitly express $\mathrm{prox}_{\lambda \Psi + \frac{1}{2}\|\cdot - \alpha\|^2}$ and $\mathrm{prox}_{\imath_{\mathcal{C}'}}$. The former is the proximity operator of a quadratic perturbation of $\lambda \Psi$, which is related to $\mathrm{prox}_{\lambda \Psi}$ by
$$ \mathrm{prox}_{\lambda \Psi + \frac{1}{2}\|\cdot - \alpha\|^2}(\cdot) = \mathrm{prox}_{\frac{\lambda}{2}\Psi}\left(\frac{\alpha + \cdot}{2}\right); \quad (23) $$
see [20, Lemma 2.6]. Using [59, Proposition 11], we have
$$ \mathrm{prox}_{\imath_{\mathcal{C}} \circ \boldsymbol{\Phi}} = \mathbf{I} + c^{-1}\boldsymbol{\Phi}^T \circ \left(P_{\mathcal{C}} - \mathbf{I}\right) \circ \boldsymbol{\Phi} = c^{-1}\boldsymbol{\Phi}^T \circ P_{\mathcal{C}} \circ \boldsymbol{\Phi} + \left(\mathbf{I} - c^{-1}\boldsymbol{\Phi}^T \boldsymbol{\Phi}\right). \quad (24) $$
This completes the proof.

D. Proof of Theorem 1

Proof: The most general result on the convergence of the forward-backward algorithm is due to [20, Theorem 3.4]. Hence, combining this theorem with Lemma 1, Lemma 2 and Proposition 1, the result follows.
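The scalar derivative bound invoked in the proof of Proposition 1, between (20) and (21), can be checked numerically. The grid and the value of $z_i$ below are arbitrary illustration choices: the derivative of $\eta \mapsto -z/\sqrt{\eta + 3/8} + 2$ is $\frac{z}{2}(\eta + 3/8)^{-3/2}$, which is positive (so the map is increasing) and maximized at $\eta = 0$, where it attains $\frac{z}{2}\left(\frac{8}{3}\right)^{3/2}$:

```python
import numpy as np

# Scalar map from the proof of Proposition 1: g(eta) = -z / sqrt(eta + 3/8) + 2.
# Its derivative (z/2) * (eta + 3/8)^(-3/2) should be positive on [0, +inf)
# and bounded above by (z/2) * (8/3)^(3/2), attained at eta = 0.
z = 5.0                                   # arbitrary positive observation value
eta = np.linspace(0.0, 50.0, 10_000)      # arbitrary grid on [0, 50]
deriv = (z / 2.0) * (eta + 3.0 / 8.0) ** (-1.5)
bound = (z / 2.0) * (8.0 / 3.0) ** 1.5
max_deriv = deriv.max()
```

Since the derivative is decreasing in $\eta$, its supremum over the grid sits at $\eta = 0$ and coincides with the stated bound up to floating-point rounding, which is exactly what makes the Lipschitz constant in (8) finite.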
