GAN Based Image Deblurring Using Dark Channel Prior
Authors: Shuang Zhang, Ada Zhen, Robert L. Stevenson
Shuang Zhang*, Ada Zhen, Robert L. Stevenson; University of Notre Dame; Notre Dame, Indiana, 46556

Abstract
A conditional generative adversarial network (GAN) is proposed for the image deblurring problem. It is tailored for image deblurring rather than being a straightforward application of a GAN. To this end, the dark channel prior is carefully chosen and incorporated into the loss function for network training. To make it more compatible with neural networks, its original non-differentiable form is discarded and the L2 norm is adopted instead. On both synthetic datasets and noisy natural images, the proposed network shows improved deblurring performance and robustness to image noise, both qualitatively and quantitatively. Additionally, compared to existing end-to-end deblurring networks, our network structure is lightweight, which ensures less training and testing time.

Introduction
Blur is a common artifact in images taken by hand-held cameras. It is mostly caused by object motion, hand shake, or defocus. A blurry image is often modeled as the convolution of a sharp image with a blur kernel, and the goal of deblurring is to restore the latent sharp image from the blurry one. Single-image deblurring, however, is a highly ill-posed problem, since the blurry image contains insufficient information to recover a unique sharp image. In the past few years, assorted constraints and regularization schemes have been proposed to exclude implausible solutions. Priors such as the total variation prior [1], sparse image prior [2], heavy-tailed gradient prior [3], and dark channel prior [4] are combined with an L1/L2-norm image regularization term to suppress ringing artifacts and improve quality. Zhen [5] takes advantage of inertial sensor data to gain extra information and estimate spatially varying blur kernels.
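As a concrete illustration of the convolution blur model above, here is a small NumPy sketch; the 5x5 box kernel and the impulse image are arbitrary choices for illustration, not taken from the paper.

```python
import numpy as np

def blur(sharp, kernel):
    """Synthesize a blurry image as the 2-D convolution of a sharp
    image with a blur kernel (zero-padded, 'same'-sized output)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(sharp, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]          # flip the kernel for true convolution
    out = np.zeros_like(sharp)
    for i in range(sharp.shape[0]):
        for j in range(sharp.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

kernel = np.ones((5, 5)) / 25.0           # crude box kernel standing in for a motion kernel
sharp = np.zeros((32, 32))
sharp[16, 16] = 1.0                       # a single bright point
blurry = blur(sharp, kernel)              # the point is spread over a 5x5 region
```

Deblurring is the inverse of this operation; the ill-posedness mentioned above comes from the fact that many (sharp image, kernel) pairs explain the same blurry observation.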
However, since blur kernels in reality are more complicated than this model, kernel estimation is inaccurate, which causes ringing artifacts. Furthermore, these methods, based on iterative optimization techniques, are computationally intensive.

Recently, convolutional neural networks (CNNs) and deep learning techniques have drawn great attention in computer vision and image processing, and their applications to image deblurring demonstrate promising results. Sun [7] and Schuler [6] use a CNN to estimate the spatially-invariant blur kernel and obtain the latent image through a traditional pipeline. Chakrabarti [13] trained a neural network to predict complex Fourier coefficients of the motion kernel. Recently, kernel-free end-to-end deblurring methods have been proposed by Nah et al. [8] and Kupyn et al. [9]. Nah [8] adopted a multi-scale network to mimic conventional coarse-to-fine optimization methods, and proposed a new realistic blurry image dataset with ground truth sharp images. The work of Kupyn [9] trains the popular generative adversarial network (GAN) on the same dataset with fewer parameters, gains higher PSNR values than Nah et al. [8] on the GOPRO dataset, and beats the others on the Köhler dataset [10] in terms of SSIM. Although [9] performs well on metric scores, visually its deblurred results suffer from grid artifacts, as illustrated in Fig. 1.

Figure 1. Comparison. (a) Input blurry image. (b) Result of [9]. (c) Our result.

To address this artifact, we utilize the dark channel prior. The dark channel is defined as the minimal intensity among the three color channels of pixels in a local area. It was first proposed by He et al. [11] for the dehazing problem, based on the statistic that haze-free outdoor images have a smaller dark channel than hazy images. Pan et al. [4] applied the dark channel prior to image deblurring. They theoretically and empirically showed that, compared with blurry images, the dark channel of a sharp image is more sparse.
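The sparsity statistic can be reproduced with a small NumPy sketch; the toy image, patch size, and box-blur stand-in below are assumptions for illustration only. Computing the dark channel of a synthetic sharp image and of its blurred version shows that blurring destroys the zeros in the dark channel.

```python
import numpy as np

def dark_channel(img, patch=7):
    """Min over the three color channels, then min over a local patch (brute force)."""
    m = img.min(axis=2)
    r = patch // 2
    h, w = m.shape
    out = np.empty_like(m)
    for i in range(h):
        for j in range(w):
            out[i, j] = m[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].min()
    return out

def box_blur(img, k=5):
    """Per-channel k x k box blur, a crude stand-in for motion blur."""
    r = k // 2
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        p = np.pad(img[:, :, c], r, mode="edge")
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                out[i, j, c] = p[i:i + k, j:j + k].mean()
    return out

# A bright scene with scattered pure-black pixels: its dark channel is mostly zero.
sharp = np.ones((40, 40, 3))
sharp[::6, ::6, :] = 0.0
blurry = box_blur(sharp)

zeros_sharp = int((dark_channel(sharp) < 1e-6).sum())
zeros_blurry = int((dark_channel(blurry) < 1e-6).sum())
# Blurring averages each black pixel with bright neighbors, so the blurred image
# no longer contains exact zeros and its dark channel is far less sparse.
```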
Their results demonstrate that the dark channel prior contributes to suppressing ringing and other artifacts. In order to enforce this sparsity, they utilize an L0-norm regularization term that counts the non-zero elements of the dark channel map. Unfortunately, the L0 norm is not differentiable, which makes it hard to use in the back-propagation of neural networks. Instead of the L0 norm, we adopt the L2 norm to directly compute the difference between the dark channel maps of ground truth sharp images and deblurred images.

In this paper, we present a GAN based image deblurring network that uses the dark channel difference as a loss function. The proposed technique is not just a straightforward application of a GAN; it focuses on how to combine traditional knowledge with deep learning to make the network achieve better performance. Compared to the previous GAN-based deblurring network, the proposed network has fewer layers and weights, which leads to less training and testing time and, more importantly, achieves favorable results. In addition, the original GOPRO training dataset consists of artificially created blurry images without noise, which differ from real blurry images. To improve the quality of our trained network on more realistic blurry images and increase its robustness, we add random Gaussian noise with variance in a limited range to the training image patches. Comparison experiments show that our network outperforms Kupyn et al. [9] on both the GOPRO test dataset and real noisy blurry images.

Related Work
Conditional Generative Adversarial Networks

Figure 2. Proposed network. The proposed CGAN based network has two sub-networks: a generator G and a discriminator D. The generator restores a sharp image I_S from the input blurry image I_B; Ī_S represents the ground truth image. The discriminator regards the input pair (I_S, I_B) as "fake" and (Ī_S, I_B) as "real". Except for the first layers of the generator and discriminator, each block consists of a convolutional layer, a batch normalization step [21], and a LeakyReLU activation function [23]; the first layers are not normalized. The digit denotes the number of filters in each block. Dotted lines are skip connections in the decoder which come from layers of the same size in the encoder.

GAN was first proposed by Goodfellow et al. [14] to train a generative network in an adversarial process. It consists of two networks: a generator G and a discriminator D. The generator produces a fake sample from input noise z, while the discriminator estimates the probability that a sample comes from the training data rather than from the generator. The two networks are trained simultaneously until the discriminator cannot tell whether a sample is real or fake. This process can be summarized as a two-player min-max game with the following value function:

\min_G \max_D \; \mathbb{E}_{\bar{x} \sim P_{data}(\bar{x})}[\log D(\bar{x})] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))],   (1)

where P_data denotes the distribution over the training data x̄ and P_z is the distribution of the input noise z. GAN has been applied to various image restoration problems such as super-resolution [16] and texture transfer [17].

Mirza et al. [15] extend GAN into a conditional model (eq. (2)), called Conditional Generative Adversarial Nets (CGAN), so that GAN can make use of auxiliary information to direct both the generator and the discriminator. Isola et al. [18] adopt the CGAN architecture to achieve general image-to-image translation. In [18], rather than just random noise z, a similar image y is added as input to the generator, where y and x̄ share part of their features; y and x̄ can be pairs of hazy and clear images of the same scene, or buildings of different colors with the same structure. Based on the network architecture of [18], Kupyn et al. [9] utilize the Wasserstein loss [19] and a perceptual loss [20] to train a CGAN for the deblurring problem.
\min_G \max_D \; \mathbb{E}_{\bar{x} \sim P_{data}(\bar{x})}[\log D(\bar{x}, y)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z, y), y))].   (2)

Dark Channel Prior
For an image I, the dark channel of a pixel p is defined by He et al. [11] as

D_c(p) = \min_{q \in N(p)} \min_{c \in \{r, g, b\}} I^c(q),   (3)

where p and q are pixel locations, N(p) denotes the image patch centered at p, and I^c is the c-th color channel. As eq. (3) shows, the dark channel describes the minimum intensity in an image patch. He et al. [11] observe that the dark channel map D(I) of a haze-free image tends to be zero. Pan et al. [4] use the less restrictive assumption that the dark channel map D(I) is sparse rather than zero. Inspired by this, they adopt an L0 regularization term to enforce a sparse dark channel in the deblurring process, where the L0 norm counts the non-zero elements of the dark channel map.

Proposed Method
Network Architecture
The proposed network aims at obtaining a generator that restores a sharp image I_S from an input blurry image I_B. This generator is trained together with a discriminator using pairs of blurry image I_B and ground truth sharp image Ī_S. The structure is shown in Fig. 2. Except for the first layers of the discriminator and generator, each block in both networks consists of a convolutional layer, a batch normalization step [21], and a LeakyReLU activation function [23] with leaking rate α = 0.2. The first layers are not normalized.

Generator. The proposed generator adopts an encoder-decoder framework to achieve image-to-image performance. Similar to [18], the encoder consists of a sequence of convolutional layers with stride 2 and kernel size 5, and the decoder has a chain of transposed-convolutional layers with the same stride and kernel size. The encoder represents the input image as a bottleneck vector, and the decoder recovers an image of the same size as the input from this bottleneck vector.
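As a sanity check on this encoder-decoder layout, a tiny shape-bookkeeping sketch shows how 'same'-padded stride-2 convolutions shrink a 256 x 256 training patch toward a bottleneck, and how the matching transposed convolutions restore the input size. The number of blocks used here is an assumption; the paper does not state it in this excerpt.

```python
def conv_out(size, stride=2):
    """'Same'-padded strided convolution: output size = ceil(size / stride)."""
    return -(-size // stride)

def deconv_out(size, stride=2):
    """'Same'-padded stride-2 transposed convolution doubles the size back."""
    return size * stride

size = 256                 # training patch size used in the paper
n_blocks = 6               # assumed number of encoder blocks (illustrative only)
sizes = [size]
for _ in range(n_blocks):
    sizes.append(conv_out(sizes[-1]))      # 256 -> 128 -> 64 -> 32 -> 16 -> 8 -> 4
for _ in range(n_blocks):
    sizes.append(deconv_out(sizes[-1]))    # 4 -> 8 -> 16 -> 32 -> 64 -> 128 -> 256
```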
A skip architecture is applied by inserting layers of the same size from the encoder after each layer of the decoder. These skip connections refine the details of the output image by combining deep, coarse, semantic information with shallow, fine, appearance information [22]. Dropout is also included in the decoder to avoid over-fitting.

Discriminator. The proposed discriminator contains a series of convolutional layers with stride 2 and kernel size 5. The output of the discriminator is a scalar, followed by a sigmoid function.

Loss Functions
According to eq. (2), we train the discriminator and generator alternately. The loss function of the discriminator is the adversarial loss:

L_d = \mathbb{E}_{\bar{x}, y}[\log D(\bar{x}, y)] + \mathbb{E}_{y, z}[\log(1 - D(G(z, y), y))].   (4)

In the deblurring setting, y and x̄ denote the blurry and sharp image, respectively. The generator loss is defined as a combination of adversarial loss, content loss, and dark channel loss:

L_g = \mathbb{E}_{y, z}[\log(1 - D(G(z, y), y))] + \lambda_1 L_{content} + \lambda_2 L_{dark\,channel},   (5)

where λ_1 = 100 and λ_2 = 250 in our experiments.

Figure 3. Comparison with DeblurGAN [9]. From top to bottom: an image from the GOPRO dataset and a real natural image. From left to right: blurry image, deblurred result of [9], and our result.

Content loss. We adopt the traditional content loss to direct the output of the generator toward the ground truth. Although both the L1 and L2 norms are commonly used, the L1 norm is chosen since it produces less blurry results [18]:

L_{content} = \mathbb{E}_{\bar{x}, y, z}[\, \|\bar{x} - G(y, z)\|_1 \,].   (6)

Dark channel loss. In order to suppress ringing and grid artifacts, the dark channel prior is specifically chosen. Pan et al. [4] exploit the L0 norm to count the non-zero elements of the dark channel map D_c(I) of an image I. Since the L0 norm is not differentiable, the L2 norm is utilized instead, which measures the distance between the dark channel maps of the ground truth and the deblurred image.
L_{dark\,channel} = \mathbb{E}_{\bar{x}, y, z}[\, \|D_c(\bar{x}) - D_c(G(y, z))\|_2 \,].   (7)

Unlike [9], we discard the perceptual loss [20]. Kupyn et al. [9] employ the difference of one VGG-19 [24] feature map between the ground truth and restored images as a perceptual loss. GAN is known for its ability to preserve the perceptual features of an image, so adding an extra perceptual loss appears to be an ineffective repetition. Our experiments show that the perceptual loss does not improve the result; on the contrary, it leads to worse performance.

Experiments
Our network is implemented in Python based on TensorFlow [25].

Datasets
The GOPRO dataset [8] is utilized for training and testing our network. It contains 2103 pairs of blurry and ground truth images in the training set and 1111 pairs in the test set. The resolution of the images is 720p. Each blurry image is generated by averaging a sequence (7-15 frames) of consecutive sharp images, and the sharp image in the middle of the sequence is regarded as the ground truth. The GOPRO dataset is treated as a benchmark by many deblurring algorithms, such as [8] and [9]. Although the GOPRO dataset is widely used, it only contains noise-free images, whereas for natural images noise always accompanies blur. To test our model on more realistic images, we add Gaussian noise with variance 0.001 to the original GOPRO Large dataset and create a new GOPRO-noise dataset with 1111 image pairs. A synthetic dataset from [9] is also adopted for training. Following the combined version of DeblurGAN in [9], we use both the GOPRO training dataset and the synthetic dataset to train our network.

Training Process
The proposed network is trained on an NVIDIA GeForce GTX 1080 Ti GPU and tested on a Mac Pro with a 2.7 GHz Intel Core i5 CPU. Similar to [9], each input training pair is randomly cropped to size 256 × 256 after being downsampled by a factor of two. Weights are initialized from a Gaussian distribution with zero mean and standard deviation 0.02.
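The patch preparation described above can be sketched as follows. This is a minimal NumPy version: the random arrays stand in for real GOPRO frames, and the downsampling-by-two step is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_training_patch(blurry, sharp, crop=256, noise_var=0.001):
    """Randomly crop an aligned 256 x 256 pair and add Gaussian noise
    (variance 0.001) to the blurry input, as in the noise-augmented
    training described above."""
    h, w = blurry.shape[:2]
    i = rng.integers(0, h - crop + 1)
    j = rng.integers(0, w - crop + 1)
    b = blurry[i:i + crop, j:j + crop].copy()
    s = sharp[i:i + crop, j:j + crop].copy()
    b += rng.normal(0.0, np.sqrt(noise_var), size=b.shape)  # std ~ 0.0316
    return np.clip(b, 0.0, 1.0), s

# Stand-ins for a half-resolution 720p frame pair (360 x 640).
blurry_frame = rng.random((360, 640, 3))
sharp_frame = rng.random((360, 640, 3))
b, s = make_training_patch(blurry_frame, sharp_frame)
```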
For each iteration of the optimization, one step is performed on the discriminator D, followed by two steps on the generator G, to prevent the discriminator loss L_d from reaching zero. The model is trained for 15 epochs within 2 days, compared with 200 epochs over 6 days in [9]. Furthermore, despite the instability of GAN training, our method converges to a similar result in each and every training run, which demonstrates the robustness of our GAN architecture.

Table 1. Average PSNR and SSIM.

Dataset   Metric   [9]      dc0      dc250    dc250p
Original  PSNR     26.63    26.70    27.01    26.45
          SSIM     0.8701   0.8798   0.8813   0.8680
Noisy     PSNR     26.32    26.53    26.83    26.31
          SSIM     0.8524   0.8697   0.8707   0.8604

Results and Comparison
Our test results are mainly compared with the state-of-the-art GAN based deblurring network DeblurGAN [9], which outperforms the deep learning networks [7] and [8] on the GOPRO dataset. Since the authors posted their code online¹, we compare our network with DeblurGAN by directly adopting the uploaded network and its latest trained weights. We test our model on the GOPRO and GOPRO-noise test datasets. Fig. 3 illustrates the deblurred results of [9] and of our model. The blurry image in the first row is taken from the GOPRO-noise dataset, and the one in the second row is a real natural image with motion blur taken by a camera. As the local patches show, although [9] can deal with blur, its results suffer from grid artifacts, while our model with the dark channel loss achieves sharper images without grid artifacts. Furthermore, for the motion-blurred image (second row), the sharp part of the input image remains unchanged in our deblurred result, while extra grid artifacts are added in the result of [9]. The quantitative performance of the proposed network on the two datasets, GOPRO and GOPRO-noise, is shown in Tab. 1. In our experiments, the coefficient of the dark channel loss is λ_2 = 250 (dc250). The results are compared with the same network without the dark channel loss (dc0), the same network with an extra perceptual loss (dc250p), and DeblurGAN [9].
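The PSNR values reported in Table 1 follow the standard definition; for reference, here is a minimal sketch assuming images scaled to [0, 1].

```python
import numpy as np

def psnr(reference, restored, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference - restored) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 0.1 gives MSE = 0.01 and therefore a PSNR of 20 dB.
a = np.zeros((8, 8))
b = np.full((8, 8), 0.1)
value = psnr(a, b)
```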
All test images are downsampled by a factor of two. The perceptual loss follows that of [20]. The proposed model performs best among the comparisons on both the noise-free and the noisy dataset. DeblurGAN performs less well owing to its grid artifacts. The perceptual loss leads to a worse result: since GAN is already good at preserving perceptual features, the perceptual loss brings no extra constraint to the network. The comparison with dc0 demonstrates that the dark channel loss contributes to a better result.

Conclusion
To address the deblurring problem with a CGAN based architecture, and to tackle the grid artifacts of GAN based deblurring methods, this paper incorporates a dark channel prior. The dark channel prior is employed through an L2 norm rather than an L0 norm in order to make it more amenable to network training. To validate the deblurring results on more natural images, a noise-involved dataset is proposed. The proposed network shows strong deblurring performance on both synthetic and real blurry images.

¹ https://github.com/KupynOrest/DeblurGAN

References
[1] Nicolas Dey, Laure Blanc-Feraud, C. Zimmer, Z. Kam, J.-C. Olivo-Marin, and J. Zerubia. A deconvolution method for confocal microscopy with total variation regularization. In Biomedical Imaging: Nano to Macro, IEEE International Symposium on, pg. 1223-1226, (2004).
[2] Anat Levin, Rob Fergus, Fredo Durand, and William T. Freeman. Image and depth from a conventional camera with a coded aperture. In ACM Transactions on Graphics (TOG), volume 26, pg. 70, (2007).
[3] Qi Shan, Jiaya Jia, and Aseem Agarwala. High-quality motion deblurring from a single image. In ACM Trans. Graph., volume 27, pg. 73, (2008).
[4] Jinshan Pan, Deqing Sun, Hanspeter Pfister, and Ming-Hsuan Yang. Blind image deblurring using dark channel prior. In CVPR, pg. 1628-1636, (2016).
[5] Ruiwen Zhen and Robert L. Stevenson. Multi-image motion deblurring aided by inertial sensors.
Journal of Electronic Imaging, 25(1):013027, (2016).
[6] Christian J. Schuler, Michael Hirsch, Stefan Harmeling, and Bernhard Schölkopf. Learning to deblur. IEEE Trans. Pattern Anal. Mach. Intell., 38(7):1439-1451, (2016).
[7] Jian Sun, Wenfei Cao, Zongben Xu, and Jean Ponce. Learning a convolutional neural network for non-uniform motion blur removal. In CVPR, pg. 769-777, (2015).
[8] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In CVPR, pg. 3883-3891, (2017).
[9] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In CVPR, (2018).
[10] Rolf Köhler, Michael Hirsch, Betty Mohler, Bernhard Schölkopf, and Stefan Harmeling. Recording and playback of camera shake: Benchmarking blind deconvolution with a real-world database. In ECCV, pg. 27-40, (2012).
[11] Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior. In CVPR, pg. 1956-1963, (2009).
[12] Tae Hyun Kim, Seungjun Nah, and Kyoung Mu Lee. Dynamic scene deblurring using a locally adaptive linear blur model. arXiv preprint arXiv:1603.04265, (2016).
[13] Ayan Chakrabarti. A neural approach to blind motion deblurring. In ECCV, pg. 221-235. Springer, (2016).
[14] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, pg. 2672-2680, (2014).
[15] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. CoRR, abs/1411.1784, (2014).
[16] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. Photo-realistic single image super-resolution using a generative adversarial network.
arXiv preprint arXiv:1609.04802, (2016).
[17] Chuan Li and Michael Wand. Precomputed real-time texture synthesis with Markovian generative adversarial networks. In ECCV, (2016).
[18] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. CoRR, abs/1611.07004, (2016).
[19] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875v3, (2017).
[20] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, (2016).
[21] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML '15), vol. 37, pg. 448-456, (2015).
[22] E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pg. 640-651, (2017).
[23] Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, (2015).
[24] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint, Sept. (2014).
[25] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zhang. TensorFlow: A system for large-scale machine learning. CoRR, abs/1605.08695, (2016).