Nonlinear Prediction of Multidimensional Signals via Deep Regression with Applications to Image Coding
Authors: Xi Zhang, Xiaolin Wu
Xi Zhang⋆, Xiaolin Wu⋆†
⋆ Department of Electronic Engineering, Shanghai Jiao Tong University
† Department of Electrical & Computer Engineering, McMaster University

ABSTRACT

Deep convolutional neural networks (DCNN) have enjoyed great successes in many signal processing applications because they can learn complex, non-linear causal relationships from input to output. In this light, DCNNs are well suited for the task of sequential prediction of multidimensional signals, such as images, and have the potential of improving the performance of traditional linear predictors. In this research we investigate how far DCNNs can push the envelope in terms of prediction precision. We propose, in a case study, a two-stage deep regression DCNN framework for nonlinear prediction of two-dimensional image signals. In the first-stage regression, the proposed deep prediction network (PredNet) takes the causal context as input and emits a prediction of the present pixel. Three PredNets are trained with the regression objectives of minimizing the ℓ1, ℓ2 and ℓ∞ norms of the prediction residuals, respectively. The second-stage regression combines the outputs of the three PredNets to generate an even more precise and robust prediction. The proposed deep regression model is applied to lossless predictive image coding, and it outperforms the state-of-the-art linear predictors by an appreciable margin.

Index Terms — Deep regression, nonlinear prediction, lossless image coding.

1. INTRODUCTION

Sequential prediction of signals plays an important role in many applications, ranging from economics to image/video processing. Practically all existing predictors used in image/video processing and computer vision are linear. This linearity is not due to the nature of the underlying physical problems; instead, it is only the result of operational expediency.
Optimal design of linear predictors is computationally intractable. Linear prediction is effective for decorrelating stationary Gaussian random processes, and is widely used in predictive coding of multidimensional signals. The classical linear predictors for image coding can be found in [1, 2, 3, 4, 5, 6].

Even living with the limitation of linear predictors, there is another difficulty hindering the optimal design of linear predictors of image signals: the choice of the causal context for predicting the current pixel. The standard practice is to use the template that contains the K closest known pixels to the current pixel. The order K of the prediction model is fixed throughout the sequential prediction process and chosen empirically. The 2-D prediction context is simply a rectangular causal region of size K anchored at the current pixel x_i. Justifying this design is the assumption that the correlation between two samples increases as they get closer to each other in space/time. Although the assumption may be true for many 1-D signals (e.g., ECG, audio), it does not hold for multidimensional signals, as sample dependencies in natural signals are anisotropic in general. As such, a signal-independent prediction context must be suboptimal, because it includes irrelevant past samples and misses relevant ones.

Wu et al. [7] proposed an adaptive, piecewise autoregressive (PAR) prediction model for multidimensional signals. It uses the correlation, instead of the Euclidean distance, between a past sample x_{i-t} and the current sample x_i to order past samples into spatially nested causal prediction contexts for different orders of the PAR model. For each x_i, the order of the PAR model is determined by a minimum description length (MDL) criterion. To estimate the PAR model parameters for x_i, the authors also developed a technique to choose a causal training set of past samples and the associated prediction context.
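For concreteness, the fixed causal template described above can be sketched in code. This is an illustrative reconstruction, not the paper's implementation; the function name and the (2k+1)-wide half-window parameterization are assumptions, with the context taken as all pixels in the window that precede the current pixel in raster-scan order:

```python
import numpy as np

def causal_context(img, i, j, k=2):
    """Collect the causal context of pixel (i, j) under a raster scan:
    all pixels in the (2k+1) x (2k+1) window around (i, j) that are
    already scanned (rows above, plus same-row pixels to the left).
    Positions outside the image are skipped."""
    h, w = img.shape
    ctx = []
    for r in range(i - k, i + 1):
        for c in range(j - k, j + k + 1):
            if r < 0 or c < 0 or c >= w:
                continue          # outside the image
            if r == i and c >= j:
                break             # reached the current pixel
            ctx.append(img[r, c])
    return np.array(ctx)

img = np.arange(25).reshape(5, 5)
print(causal_context(img, 2, 2, k=1))  # pixels 6, 7, 8 (row above) and 11 (left)
```

A signal-adaptive scheme such as PAR would instead rank these candidate pixels by correlation rather than by spatial distance.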
The MDL model is optimally designed on a sample-by-sample basis, and it beats all of its predecessors by achieving the lowest entropy of prediction residuals to date. However, the MDL optimization process proposed in [7] has a prohibitively high computational complexity, requiring 8 hours to perform sequential prediction of a 512 × 512 image.

This research is inspired by the great successes of deep learning in various signal processing applications, and aims to use the new tool to improve the performance of existing predictors for multidimensional signals. Our goal is well within reach because deep convolutional neural networks can learn complex, non-linear causal relationships, provided that a large amount of paired input and output data is available. In addition to breaking the linearity limit, a DCNN prediction model also circumvents the difficulty of finding a suitable prediction context, because it can, via the training process with a sparsity constraint, discover effective features that contribute to accurate prediction.

Operationally, a deep learning based predictor also has an advantage. Although the training of the DCNN prediction model is computationally expensive, it is only an off-line process. At the on-line inference stage, the new method runs faster than the state-of-the-art adaptive MDL predictor.

In this paper, the technical developments are presented mostly around 2-D image signals. However, the ideas and results can be easily extended to signals of other dimensions. We propose a two-stage deep regression DCNN for nonlinear prediction of 2-D signals. In the first-stage regression, the prediction network (PredNet), consisting of a convolutional module and a regression module, is designed to take the causal context as input and output the prediction of the current pixel.
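The raster-scan prediction loop that any such predictor plugs into can be sketched as follows. This is illustrative only; the two-neighbour average used here is merely a placeholder for a trained PredNet, and all names are hypothetical:

```python
import numpy as np

def sequential_predict(img, predictor):
    """Raster-scan sequential prediction: every pixel is predicted from
    already-scanned neighbours, and the residual image is returned."""
    h, w = img.shape
    residual = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            residual[i, j] = img[i, j] - predictor(img, i, j)
    return residual

def average_predictor(img, i, j):
    """Placeholder for a trained PredNet: the average of the west and
    north causal neighbours (treated as absent at the image border)."""
    west = img[i, j - 1] if j > 0 else 0.0
    north = img[i - 1, j] if i > 0 else 0.0
    n = (j > 0) + (i > 0)
    return (west + north) / n if n else 0.0

img = np.tile(np.arange(4.0), (4, 1))   # each row ramps 0, 1, 2, 3
res = sequential_predict(img, average_predictor)  # small residuals on a smooth ramp
```

In lossless predictive coding, the encoder and decoder run this same loop in lockstep, so only the residuals need to be entropy-coded.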
Three PredNets are trained with the different regression objectives of minimizing the ℓ1, ℓ2 and ℓ∞ norms of the prediction residuals, respectively. In the second-stage regression, called refinement regression, the different predictions from the three trained PredNets are fed into a new regression network to generate a more precise and robust prediction for the current pixel.

To validate the effectiveness of the proposed two-stage deep regression DCNN, we apply it to lossless image coding and evaluate the self-entropy of the prediction residuals. The new deep learning prediction method achieves the lowest entropy of prediction residuals among all predictors published to date.

2. SEQUENTIAL PREDICTION VIA DEEP REGRESSION

2.1. Problem Formulation

For an image signal modeled as a 2-D Markov field, the sequential prediction of the current pixel x is made in a suitable causal neighborhood, that is:

    x̂ = F(C(x))    (1)

where C(x) is a causal context consisting of past pixels that have effects on x; a simple prediction context of nearest neighbors is illustrated in Fig. 1. In what follows, we investigate how the predictor F can be realized by a prediction neural network model (PredNet) of deep learning.

Given a set S of training samples {x_i; C(x_i)}, the PredNet can be optimized by solving the following minimization problem:

    F = arg min_F  E_{x∈S} ‖F(C(x)) − x‖_ℓ    (2)

where E represents the expectation over the training set S.

Fig. 1. Illustration of the 2-D predictor and the corresponding causal context; each circle represents one pixel in the image.

2.2. Minimum-entropy prediction

For the training of the DCNN prediction model F, any ℓ-norm of the prediction residuals can be used in the objective function. Different ℓ-norms can be chosen to serve different design purposes.
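In code, the training objective of Eq. (2) reduces, for the three PredNets, to the following losses on the prediction residuals. This is a sketch of the loss functions only; the ℓ8 case is included because the experiments in Sec. 3 substitute it as a smooth surrogate for ℓ∞:

```python
import numpy as np

def prediction_loss(pred, target, norm):
    """Training objectives of Eq. (2): an l-norm of the prediction
    residuals, reduced over a batch of samples."""
    e = np.abs(pred - target)
    if norm == "l1":
        return float(e.mean())                   # mean absolute error
    if norm == "l2":
        return float(np.sqrt((e ** 2).mean()))   # root mean squared error
    if norm == "l8":
        return float((e ** 8).mean() ** 0.125)   # smooth surrogate for l-inf
    if norm == "linf":
        return float(e.max())                    # worst-case error
    raise ValueError(norm)

pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.0, 1.0, 5.0])
print(prediction_loss(pred, target, "l1"))   # 1.0
```

As the exponent grows, the loss interpolates between the average-error and worst-case-error regimes, which is why the three trained PredNets behave differently on the measures reported later in Table 1.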
In lossless image compression, for instance, the ultimate goal is to minimize the entropy of the prediction residuals. Designing a minimum-entropy predictor, despite its practical value in data compression, has hardly been studied, apparently because of the difficulty of the problem. The only known work is a linear minimum-entropy predictor by Wang and Wu [8], which is computed by convex or quasiconvex programming. Now, with the new tool of DCNNs, we embark on designing non-linear minimum-entropy predictors, which has remained a hard nut to crack thus far. For most natural images, prediction residuals obey a Laplacian distribution [9]; hence, minimizing the ℓ1-norm is equivalent to minimizing the entropy of the prediction residuals.

For the sake of completeness, besides the ℓ1-norm selected as a proxy for minimizing the entropy, we also design non-linear DCNN predictors of minimum ℓ2 and ℓ∞ norms. The three prediction networks (PredNets) trained with the criteria of minimum ℓ1, ℓ2 and ℓ∞ are called PredNet-ℓ1, PredNet-ℓ2 and PredNet-ℓ∞, respectively.

2.3. Network Architecture

The proposed PredNet consists of a convolutional module and a regression module. The convolutional module is designed to extract, from the causal context, features that contribute to the prediction, and the regression module applies regression to these extracted features to emit the prediction. As illustrated in Fig. 2 and Fig. 3, the convolutional module contains 16 residual units. Each unit consists of two convolutional layers, each followed by a batch-normalization layer and a LeakyReLU activation layer. For LeakyReLU, the slope of the leak is set to 0.2. The regression module contains a flatten layer and a fully-connected regression layer with linear activation.

Fig. 2. The architecture of the deep prediction network (PredNet): a convolutional module of 16 residual units followed by a regression module.

Fig. 3. The detailed configuration of the residual unit.

2.4. Sparse Regularization

If the image is modeled as a Markov random field, then the input of the PredNet, i.e., the causal prediction context, should be sufficiently large to contain all the pixels that influence the current pixel, but at the same time it may include some irrelevant pixels. We rely on the deep regression of the DCNN to discover useful features that contribute to the prediction and, in the process, discard the irrelevant pixels. In the spirit of MDL, and to reduce the risk of overfitting, we promote a compact DCNN prediction model by requiring the coefficients of the regression module to be sparse. Noting that the weights w_r of the regression layer behave like a selection function in the pixel domain, we include a model-cost regularization term R(w_r) in the objective function for training the prediction network F:

    F = arg min_F  E_{x∈S} ‖F(C(x)) − x‖_ℓ + λ R(w_r)    (3)

where the scalar λ is a Lagrangian multiplier. Here we adopt the most common form of sparse regularization in neural networks, the ℓ1-norm of the neuron weights, namely,

    R(w_r) = ‖w_r‖_1    (4)

2.5. Refinement Regression

In the first-stage regression, three deep prediction networks are trained to minimize different norms of the prediction residuals. For a pixel x, the three predictions optimized under the different criteria are denoted by x̂_ℓ1, x̂_ℓ2 and x̂_ℓ∞, respectively. The goal of the second-stage regression is to train a refined DCNN prediction model that takes x̂_ℓ1, x̂_ℓ2 and x̂_ℓ∞ as input and emits an improved prediction x̃.
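The second-stage idea can be illustrated with a toy combiner. The actual refinement network is itself a DCNN trained on a separate sample set; the synthetic pixel data, the three noise scales, and the plain least-squares fit below are all simplifying assumptions, used only to show that a learned combination of three first-stage predictions can beat the best single one:

```python
import numpy as np

# Synthetic stand-in data: the three first-stage predictions are modelled
# as the true pixel value plus Laplacian noise of different scales
# (hypothetical proxies for the PredNet-l1/l2/linf outputs).
rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=1000)                    # true pixel values
preds = np.stack([x + rng.laplace(scale=s, size=x.size)
                  for s in (4.0, 6.0, 8.0)], axis=1)

# Toy refinement stage: fit linear combination weights by least squares
# (a deliberately simplified substitute for the refinement DCNN).
w, *_ = np.linalg.lstsq(preds, x, rcond=None)
x_tilde = preds @ w                                   # refined prediction

err_best_single = np.abs(preds - x[:, None]).mean(axis=0).min()
err_refined = np.abs(x_tilde - x).mean()
```

Because the three estimates carry (partly) independent errors, the fitted combination has a lower mean absolute error than any single estimate, which is the intuition behind feeding all three PredNet outputs into the refinement network.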
Using another training set S′ (different from S, to prevent overfitting), the refined regression network F can be trained by minimizing the following cost function:

    F = arg min_F  E_{x∈S′} ‖F(x̂_ℓ1, x̂_ℓ2, x̂_ℓ∞) − x‖_ℓ    (5)

In the interest of lossless image compression, our goal is to design a DCNN minimum-entropy predictor; thus the ℓ1-norm of the prediction residuals is minimized in the training of the refined regression network, PredNet-R.

3. EXPERIMENTS

To validate the effectiveness of the proposed deep prediction network (PredNet), we compare the prediction residuals on four measures (ℓ1, ℓ2, ℓ∞ and the self-entropy) with the state-of-the-art predictor MDL-PAR [7] and the gradient-adjusted prediction (GAP) used in the well-known lossless compression algorithm CALIC [10]. For training, we collect thousands of 2K-resolution high-quality images from three public datasets, DIV2K [11], CLIC [12] and Flickr2K [13], and then randomly extract millions of patches from these images as the training set. The test images used in our experiments are from the Kodak lossless image dataset [14].

To facilitate the computation, we use the ℓ8-norm as an alternative to the ℓ∞-norm in training PredNet-ℓ∞. The hyper-parameters used for training the PredNets are as follows: the size of the causal context is 21 × 21; the learning rate is fixed at 10⁻⁴; the weighting coefficient λ is set to 0.2; and the parameters of the Adam optimizer are β1 = 0.9, β2 = 0.99, ε = 10⁻⁸.

The performance comparisons with the state-of-the-art predictors are listed in Table 1. As the table shows, in the first-stage regression the DCNN predictors PredNet-ℓ1, PredNet-ℓ2 and PredNet-ℓ∞ each outperform the state-of-the-art predictor MDL-PAR in their respective error criterion. PredNet-ℓ1 not only has the smallest ℓ1-norm, it also has the lowest entropy of the prediction residuals. This result indicates that the prediction residuals indeed obey a Laplacian distribution, and hence that minimizing the ℓ1-norm is equivalent to minimizing the entropy of the prediction residuals.

Table 1. Performance comparisons with the state-of-the-art predictors. Blue numbers indicate the best performance achieved in the first-stage regression; red numbers indicate the best performance after the second-stage regression.

Predictors   | ℓ1   | ℓ2    | ℓ∞     | entropy | ρ_max
GAP          | 5.68 | 10.36 | 188.25 | 4.69    | 0.27
MDL-PAR      | 5.34 | 9.65  | 181    | 4.40    | 0.24
PredNet-ℓ1   | 4.51 | 9.40  | 241.29 | 4.32    | 0.20
PredNet-ℓ2   | 4.62 | 8.32  | 228.20 | 4.38    | 0.21
PredNet-ℓ∞   | 5.65 | 8.50  | 160.29 | 4.58    | 0.24
PredNet-R    | 4.48 | 8.82  | 230.12 | 4.25    | 0.18

In the refinement regression, the DCNN predictor PredNet-R further reduces the ℓ1 error and the entropy of the prediction residuals by combining the different predictions of the first-stage regression. PredNet-R exhibits the power and advantages of deep learning in multidimensional signal prediction over traditional methods by breaking the record for the lowest achievable entropy, previously held by the MDL-PAR predictor. This achievement is remarkable considering the extremely high complexity of the MDL-PAR predictor, which needs to solve one optimization problem per pixel and, as a result, requires hours to perform sequential prediction of a 512 × 512 image. In contrast, the proposed deep learning predictors are designed off line, and they run much faster than the MDL-PAR predictor at inference time, taking 30 seconds per 512 × 512 image.

Fig. 4. Residual images for 'motocross bikes' from the Kodak image dataset: (a) original image, (b) GAP, (c) MDL-PAR, (d) PredNet-R.

Fig. 5. Residual images for 'shuttered windows' from the Kodak image dataset: (a) original image, (b) GAP, (c) MDL-PAR, (d) PredNet-R.

In addition to the entropy, the performance of an image predictor can be measured by the lack of correlation between the prediction residual and the original image signal.
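Two of the residual quality measures reported in Table 1, the self-entropy and the local maximum correlation ρ_max, can be sketched as follows. The exact estimators are not spelled out in the text; the histogram-based entropy over integer-rounded residuals and the patchwise Pearson correlation over non-overlapping 16 × 16 patches below are assumptions:

```python
import numpy as np

def self_entropy_bits(residual):
    """Empirical self-entropy (bits/sample) of integer-rounded prediction
    residuals, estimated from their normalized histogram (assumed
    estimator for the 'entropy' column of Table 1)."""
    _, counts = np.unique(np.rint(residual), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def local_max_correlation(residual, image, patch=16):
    """Assumed form of rho_max: the largest absolute Pearson correlation
    between co-located patches of the residual and the original image."""
    h, w = image.shape
    rho = 0.0
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            r = residual[i:i + patch, j:j + patch].ravel()
            x = image[i:i + patch, j:j + patch].ravel()
            if r.std() > 0 and x.std() > 0:
                rho = max(rho, abs(np.corrcoef(r, x)[0, 1]))
    return rho

print(self_entropy_bits(np.array([0.2, -1.4, 0.9, 0.1])))  # 1.5 bits
```

Under a Laplacian residual model with scale b, the differential entropy 1 + ln(2b) grows monotonically with the mean absolute residual, which is the maximum-likelihood estimate of b; this is the sense in which minimizing the ℓ1 error minimizes the entropy.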
We compute the local maximum correlation (denoted by ρ_max) between the prediction residuals and the input image for each predictor in the comparison group, and include the results in Table 1. The local maximum correlation refers to the largest of the correlation coefficients between corresponding patches extracted from the prediction residuals and the original image.

The superiority of PredNet-R can be visualized by the absence of image structures in the residual image. Figs. 4 and 5 show sample residual images of GAP, MDL-PAR and PredNet-R. It is evident that the residual images of PredNet-R contain the least amount of visible signal structure.

4. CONCLUSIONS

In this work, DCNNs establish new performance records in sequential prediction of image signals. The proposed deep learning signal prediction models may find applications in signal compression, denoising and analysis.

5. REFERENCES

[1] X. Wu, E. Barthel, and W. Zhang, "Piecewise 2D autoregression for predictive image coding," in Proceedings of the 1998 International Conference on Image Processing (ICIP). IEEE, 1998, pp. 901–904.
[2] X. Li and M. T. Orchard, "Edge-directed prediction for lossless compression of natural images," IEEE Transactions on Image Processing, vol. 10, no. 6, pp. 813–817, 2001.
[3] H. Takeda, S. Farsiu, P. Milanfar, et al., "Kernel regression for image processing and reconstruction," 2006.
[4] C. Kervrann and J. Boulanger, "Local adaptivity to variable smoothness for exemplar-based image regularization and representation," International Journal of Computer Vision, vol. 79, no. 1, pp. 45–69, 2008.
[5] N. Memon, D. L. Neuhoff, and S. Shende, "An analysis of some common scanning techniques for lossless image coding," IEEE Transactions on Image Processing, vol. 9, no. 11, pp. 1837–1848, 2000.
[6] A. Akimov, A. Kolesnikov, and P. Franti, "Lossless compression of color map images by context tree modeling," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 114–120, 2007.
[7] X. Wu, G. Zhai, X. Yang, and W. Zhang, "Adaptive sequential prediction of multidimensional signals with applications to lossless image coding," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 36–42, 2011.
[8] X. Wang and X. Wu, "On design of linear minimum-entropy predictor," in IEEE 9th Workshop on Multimedia Signal Processing (MMSP 2007). IEEE, 2007, pp. 199–202.
[9] E. Y. Lam and J. W. Goodman, "A mathematical analysis of the DCT coefficient distributions for images," IEEE Transactions on Image Processing, vol. 9, no. 10, pp. 1661–1666, 2000.
[10] X. Wu and N. Memon, "Context-based, adaptive, lossless image coding," IEEE Transactions on Communications, vol. 45, no. 4, pp. 437–444, 1997.
[11] E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: Dataset and study," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
[12] William T. F., "Workshop and challenge on learned image compression (CLIC)," http://www.compression.cc/, 2018.
[13] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
[14] Rich F., "Kodak lossless true color image suite," http://r0k.us/graphics/kodak/, 1999.