Super-Resolution PET Imaging Using Convolutional Neural Networks

A Preprint

Tzu-An Song¹, Samadrita R. Chowdhury¹, Fan Yang¹, Joyita Dutta¹,²

¹Department of Electrical and Computer Engineering, University of Massachusetts Lowell, Lowell, MA 01854
²Gordon Center for Medical Imaging, Massachusetts General Hospital, Boston, MA 01720

June 11, 2019

Abstract

Positron emission tomography (PET) suffers from severe resolution limitations that compromise its quantitative accuracy. In this paper, we present a super-resolution (SR) imaging technique for PET based on convolutional neural networks (CNNs). To facilitate the resolution recovery process, we incorporate high-resolution (HR) anatomical information based on magnetic resonance (MR) imaging. We introduce the spatial location information of the input image patches as additional CNN inputs to accommodate the spatially-variant nature of the blur kernels in PET. We compared the performance of shallow (3-layer) and very deep (20-layer) CNNs with various combinations of the following inputs: low-resolution (LR) PET, radial locations, axial locations, and HR MR. To validate the CNN architectures, we performed both realistic simulation studies using the BrainWeb digital phantom and clinical neuroimaging data analysis. For both simulation and clinical studies, the LR PET images were based on the Siemens HR+ scanner. Two different scenarios were examined in simulation: one where the target HR image is the ground-truth phantom image and another where the target HR image is based on the Siemens HRRT scanner, a high-resolution dedicated brain PET scanner. The latter scenario was also examined using clinical neuroimaging datasets. A number of factors affected the relative performance of the different CNN designs examined, including network depth, target image quality, and the resemblance between the target and anatomical images. In general, however, all CNNs outperformed classical penalized deconvolution techniques by large margins both qualitatively (e.g., edge and contrast recovery) and quantitatively (as indicated by two metrics: peak signal-to-noise ratio and structural similarity index).

Keywords: Super-resolution · CNN · Deep learning · PET · MRI · Multimodality imaging · Partial volume correction

1 Introduction

Positron emission tomography (PET) is a 3D medical imaging modality that allows in vivo quantitation of molecular targets. While oncology [1] and neurology [2] are perhaps the fields where PET is of the greatest relevance, its applications are expanding to many other clinical domains [3, 4]. The quantitative capabilities of PET are confounded by a number of degrading factors, the most prominent of which are low signal-to-noise ratio (SNR) and intrinsically limited spatial resolution. While the former is largely driven by tracer dose and detector sensitivity, the latter is driven by a number of factors, including both physical and hardware-limited constraints and software issues. The resolution limitations pose an even greater challenge when the target regions-of-interest (ROIs) are small.
Physical and hardware-related factors limiting the spatial resolution include the non-collinearity of the emitted photon pairs, intercrystal scatter, crystal penetration, and the non-zero positron range of PET radionuclides [5–7]. On the software front, resolution reductions are largely a product of smoothing regularizers and filters commonly used within or after reconstruction to lower the noise levels in the final images [8]. Together, image blurring and tissue fractioning (due to spatial sampling for image digitization) lead to the so-called partial volume effect, which is embodied by spillover of estimated activity across different ROIs [9].

Broadly, efforts to address the resolution challenge encompass both within-reconstruction and post-reconstruction corrections. The former family includes methods that incorporate image-domain or sinogram-domain point spread functions (PSFs) in the PET image reconstruction framework [10–12] and/or smoothing penalties that preserve edges by incorporating anatomical information [13–19] or other transform-domain information [20–22]. The latter family of post-reconstruction filtering techniques includes both non-iterative corrections [9, 23–27] and techniques that rely on an iterative deconvolution backbone [28–30] stabilized by different edge-guided or anatomically-guided penalty or prior functions [31, 32].

Unlike partial volume correction, strategies for which are often modality-specific, super-resolution (SR) imaging is a more general problem in image processing and computer vision. SR imaging refers to the task of converting a low-resolution (LR) image to a high-resolution (HR) one. The problem is inherently ill-posed, as multiple HR images may correspond to any given LR image. Classical approaches to SR imaging involve collating multiple LR images with subpixel shifts and applying motion-estimation techniques to combine them into an HR image frame [33, 34]. Less computationally intensive modern approaches to SR encompass the family of so-called "example-based" techniques [35–41], which exploit self-similarities or recurring redundancies within the same image by searching for similar "patches" or sub-images within a given image. These methods are particularly effective for super-resolving natural images with fine and repetitive textures but are less meaningful for most medical image types. With the proliferation of deep learning techniques, many deep SR models have been proposed and have demonstrated state-of-the-art performance at SR tasks. Deep SR models commenced with a paper that proposed a 3-layer CNN architecture (commonly referred to in the literature as the SRCNN) [42]. Subsequently, a very deep SR (VDSR) CNN architecture [43], which had 20 layers and used residual learning [44], was demonstrated to yield much-improved performance over the shallower SRCNN approach. More recently, SR performance has been further boosted by leveraging generative adversarial networks (GANs) [45], although GAN training remains notoriously difficult.

Our previous work on SR PET spans a wide gamut, including both (classical) penalized deconvolution based on joint entropy [32, 46] and a deep learning approach based on the VDSR CNN [47]. This work is an extension of the latter effort. In this paper, we design, implement, and validate several CNN architectures for SR PET imaging, including both shallow and very deep varieties.
As a key adaptation of these models to PET imaging, we supplement the LR PET input image with its HR anatomical counterpart, e.g., a T1-weighted MR image. Unlike uniformly-blurred natural images, PET images are blurred in a spatially-variant manner [11, 48]. We, therefore, further adjust the network to accommodate spatial location details as inputs to assist the SR process. "Ground-truth" HR images are required for the training phase of all supervised learning methods. Though simulation studies are not constrained by this demand, it is a challenge for clinical studies, where it is usually infeasible to obtain the "ground-truth" HR counterparts for LR PET scans of human subjects. To ensure the clinical utility of this method, we extend it to training based on imperfect target images derived from a higher-resolution clinical scanner, thereby exploiting the SR framework to establish a mapping from an LR scanner's image domain to an HR scanner's image domain.

In section 2 of this paper, we present the underlying network architecture. In section 3, we describe the simulation data generation steps and the network training and validation procedures. In section 4, we present simulation and clinical results comparing the performance of CNN-based SR with two well-studied reference approaches. A discussion of the results and the limitations of this work appears in section 5, and a summary of our work is presented in section 6.

2 Theory

2.1 Network Design

2.1.1 CNN Basics

CNNs typically contain three types of layers: convolutional layers, nonlinear activation layers, and pooling layers. Pooling layers, which ensure shift invariance, are useful for classification and object recognition tasks. For SR imaging, which is an estimation/regression task, it suffices for the CNN to contain only convolutional and activation layers. A convolutional layer contains an array of convolutional kernels which extract linear features from an input based on local connectivity. The pixel intensity at location $(i, j)$ in the $k$th feature map of the $l$th layer, $z_{ij}^{lk}$, can be mathematically represented as [49]:

$$z_{ij}^{lk} = {\mathbf{w}^{lk}}^{T} \mathbf{x}_{ij}^{l} + b^{lk}, \qquad (1)$$

where $\mathbf{x}_{ij}^{l}$ is the input patch at the $(i, j)$th pixel location and $\mathbf{w}^{lk}$ and $b^{lk}$ are the kernel weights and bias, respectively. An activation layer introduces nonlinearities that enable the extraction of nonlinear features by the CNN. In deep neural networks, the rectified linear unit (ReLU) is a popular choice for the activation function because, unlike alternatives such as the sigmoid function, it does not exhibit the vanishing gradient problem [50]. The ReLU activation function is defined as:

$$\mathbf{z} = \max(\mathbf{x}, \mathbf{0}), \qquad (2)$$

where $\mathbf{x}$ is the input vector and $\mathbf{z}$ is the output vector. In addition, ReLU activations can accelerate computation of a network model because they set all negative input values to zero.
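For concreteness, the following minimal PyTorch sketch expresses one Conv + ReLU pair per Eqs. (1) and (2). PyTorch is the platform reported in section 3.4; the channel counts and kernel size mirror the settings given there, while the tensor shapes and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# One Conv + ReLU pair: the convolution computes z = w^T x + b at every
# pixel location (Eq. 1); the ReLU applies z = max(x, 0) elementwise (Eq. 2).
# 4 input channels and 64 filters follow sections 2.2 and 3.4.
layer = nn.Sequential(
    nn.Conv2d(in_channels=4, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

patches = torch.randn(10, 4, 64, 64)  # illustrative batch of multi-channel patches
features = layer(patches)             # shape: (10, 64, 64, 64)
```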
2.1.2 Residual Learning

The first CNN applied to SR imaging (SRCNN) [42] sought to directly estimate the SR image as the network output. In contrast to such methods, which compute the latent clean image directly, we adopt the residual learning strategy, in which the network estimates the difference between the latent clean image and the blurry observation. This strategy was originally described in the ResNet architecture [44] for image recognition and has proven particularly beneficial for very deep networks, for which training accuracy otherwise degrades with increasing network depth. A generalization of this idea based on residual blocks demonstrated even higher efficacy at solving the SR problem [51].

2.2 Network Inputs

One key contribution of this paper is the tailoring of the CNN inputs to address the needs of SR PET imaging. All CNNs implemented here have the LR PET as the main input. This is similar to the two papers that inspired this work [42, 43], both of which had LR single- or multi-channel images as their inputs. To further assist the resolution recovery process, additional inputs are incorporated as described below.

2.2.1 Anatomical Inputs

In generating the SR PET images, we seek to exploit the similarities between the PET image and its high-resolution anatomical counterpart. Most clinical and preclinical PET scanners come equipped with anatomical imaging capabilities, in the form of computed tomography (CT) or MR imaging, to complement the functional information in PET with structural information. As illustrated in the schematic in Fig. 1, we employ CNNs with multi-channel inputs that include LR PET and HR MR input channels.

2.2.2 Spatial Inputs

We provide location details to the network via additional input channels in the form of patches representing radial and axial coordinate locations. In light of the cylindrical symmetry of PET scanners, we deem these sufficient for learning the spatially-variant structure of the blurring operator directly from the training data.

2.2.3 Fusion

For natural RGB images, where the three input channels typically have a high degree of structural similarity, the same set of kernels is usually effective for feature extraction across channels. In contrast, our CNN architecture exhibits a higher degree of input heterogeneity. Our initial experiments indicated the need for greater network width to accommodate a diverse set of features based on the very different input channels. We found an efficient solution by using separate kernels at the lower levels and fusing this information at higher levels, as demonstrated in Fig. 1.
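To illustrate this input design, the sketch below assembles the four channels for a single transverse slice. The helper name, the [-1, 1] coordinate normalization, and the assumption that the scanner axis passes through the transaxial center are ours; the paper does not specify the exact encoding of the location patches.

```python
import torch

def make_multichannel_input(lr_pet, hr_mr, axial_frac):
    """Hypothetical helper: stack the four CNN input channels (section 2.2)
    for one transverse slice. lr_pet and hr_mr are (H, W) tensors on a
    common 1 mm grid; axial_frac is the normalized slice location in [0, 1].
    Patch extraction is omitted for brevity."""
    H, W = lr_pet.shape
    yy, xx = torch.meshgrid(
        torch.linspace(-1.0, 1.0, H), torch.linspace(-1.0, 1.0, W), indexing="ij"
    )
    radial = torch.sqrt(xx**2 + yy**2)      # distance from the scanner axis
    axial = torch.full((H, W), axial_frac)  # constant axial-location patch
    return torch.stack([lr_pet, hr_mr, radial, axial])  # shape: (4, H, W)
```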
2.3 Network Depth

A key finding of the VDSR paper was that the deeper the network, the better its performance. The paper showed that VDSR outperforms SRCNN by a great margin in terms of PSNR [43]. Here, we implement networks with different depths.

Figure 1: CNN architecture for SR PET. The network uses up to 4 inputs: (i) LR PET (the main input), (ii) HR MR, (iii) radial locations, and (iv) axial locations. The networks include alternating convolutional (Conv) and nonlinear activation (ReLU) layers. The network predicts the residual PET image, which is later added to the input LR PET to generate the SR PET.

2.3.1 Shallow CNN

We designed and implemented a set of "shallow" networks loosely inspired by the 3-layer SRCNN [42]. The modifications used here are (1) residual learning and (2) the modified inputs described in section 2.2. Each shallow network has only three convolutional layers, each followed by a ReLU, except for the last output layer. While, as a rule of thumb, the qualifier "deep" is applied to networks with three or more layers, we use the word "shallow" in this paper in a relative sense.

2.3.2 Very Deep CNN

We designed and implemented a series of "very deep" CNNs based on the VDSR architecture in [43], but with the different input design described in section 2.2. The very deep SR networks all have 20 convolutional layers, each followed by a ReLU, except for the last output layer.

2.4 Network Types

We implemented, validated, and compared both shallow (3-layer) and very deep (20-layer) CNN architectures with varying numbers of inputs. For the rest of the paper, we refer to these configurations as S1, S2, S3, S4, V1, V2, V3, and V4, as summarized in Table 1.

Table 1: CNN architectures

Network  Number of layers  Input types
S1       3                 LR PET
V1       20                LR PET
S2       3                 LR PET, HR MR
V2       20                LR PET, HR MR
S3       3                 LR PET, radial locations, axial locations
V3       20                LR PET, radial locations, axial locations
S4       3                 LR PET, HR MR, radial locations, axial locations
V4       20                LR PET, HR MR, radial locations, axial locations
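The sketch below outlines one plausible realization of the V4 configuration under the fusion scheme of section 2.2.3: separate low-level kernels per input channel, concatenation-based fusion, a deep Conv + ReLU trunk, and a residual output added back to the LR PET channel. The layer count and filter settings follow sections 2.3.2 and 3.4, but the branch depth and the exact fusion point are assumptions, as the paper does not specify them.

```python
import torch
import torch.nn as nn

class FusedVDSR(nn.Module):
    """Sketch of a V4-style network: per-channel feature branches fused by
    concatenation, a deep Conv + ReLU trunk, and residual learning."""

    def __init__(self, n_inputs=4, n_filters=64, depth=20):
        super().__init__()
        # Separate low-level kernels for each heterogeneous input channel.
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(1, n_filters, 3, padding=1), nn.ReLU(inplace=True))
             for _ in range(n_inputs)]
        )
        trunk = [nn.Conv2d(n_inputs * n_filters, n_filters, 3, padding=1),
                 nn.ReLU(inplace=True)]
        for _ in range(depth - 3):  # middle layers; total conv depth per path = `depth`
            trunk += [nn.Conv2d(n_filters, n_filters, 3, padding=1), nn.ReLU(inplace=True)]
        trunk += [nn.Conv2d(n_filters, 1, 3, padding=1)]  # last layer: 1 filter, no ReLU
        self.trunk = nn.Sequential(*trunk)

    def forward(self, x):                        # x: (B, 4, H, W)
        feats = [b(x[:, i:i + 1]) for i, b in enumerate(self.branches)]
        residual = self.trunk(torch.cat(feats, dim=1))
        return x[:, 0:1] + residual              # add residual to the LR PET channel
```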
3 Methods

3.1 Overview

In the following, we describe two simulation studies using the BrainWeb digital phantom and the 18F-FDG radiotracer and a clinical patient study, also based on 18F-FDG. The inputs and targets for the studies are summarized in Table 2. All LR PET images were based on the Siemens ECAT EXACT HR+ scanner. For the first simulation study, the HR PET images were the "ground-truth" images generated from segmented anatomical templates. For the second simulation study and the clinical study, the HR PET images were based on the Siemens HRRT scanner. Bicubic interpolation was used to resample all input and target images to the same voxel size of 1 mm × 1 mm × 1 mm on a 256 × 256 × 207 grid. The LR and HR scanner properties are summarized in Table 3.

Table 2: Simulation and experimental studies

Study index  Study type  LR image (input)  HR image (target)
1            Simulation  HR+ PET           True PET
2            Simulation  HR+ PET           HRRT PET
3            Clinical    HR+ PET           HRRT PET

Table 3: LR and HR image sources

Image type  Scanner  Spatial resolution  Bore diameter  Axial length
LR           HR+      4.3–8.3 mm          562 mm         155 mm
HR           HRRT     2.3–3.4 mm          312 mm         250 mm

3.2 Simulation Setup

3.2.1 HR+ PSF Measurement

An experimental measure of the true PSF was made by placing 0.5 mm diameter sources filled with 18F-FDG inside the HR+ scanner bore, which is 56.2 cm in diameter and 15.5 cm in length. The PSF images were reconstructed using ordered subsets expectation maximization (OSEM) with post-smoothing using a Gaussian filter. The PSFs were fitted with Gaussian kernels. We assumed radial and axial symmetry and calculated the PSFs at all in-between locations as linear combinations of the PSFs measured at the nearest measurement locations. Interpolation weights for the experimental datasets were determined by means of bilinear interpolation over an irregular grid consisting of the quadrilaterals formed by the nearest radial and axial PSF sampling locations from a given point.
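Since the PSF model fits Gaussian kernels at sampled locations and interpolates between them, the sketch below shows the interpolation step in isolation. The sample positions and sigma values are made-up placeholders, and a regular (rather than irregular) sample grid is assumed for simplicity.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical Gaussian-sigma samples (mm) at a few radial/axial point-source
# positions inside the HR+ bore (section 3.2.1); values grow with radial and
# axial distance purely for illustration.
radial_mm = np.array([0.0, 50.0, 100.0, 150.0, 200.0])
axial_mm = np.array([0.0, 50.0, 100.0, 155.0])
sigma_samples = 1.8 + 0.008 * radial_mm[:, None] + 0.002 * axial_mm[None, :]

# Bilinear interpolation over the (radial, axial) measurement grid yields the
# PSF width at any in-between location, per the symmetry assumption above.
sigma_at = RegularGridInterpolator((radial_mm, axial_mm), sigma_samples)
print(sigma_at([[75.0, 120.0]]))  # sigma at r = 75 mm, z = 120 mm
```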
3.2.2 Input Image Generation for Studies 1 and 2

Realistic simulations were performed using the 3D BrainWeb digital phantom (http://brainweb.bic.mni.mcgill.ca/brainweb/). 20 distinct atlases with 1 mm isotropic resolution were used to generate a set of "ground-truth" PET images. The atlases contained the following region labels: gray matter, white matter, blood pool, and cerebrospinal fluid. Static PET images were generated based on a 1-hour-long 18F-FDG scan as described in our earlier paper [32]. This "ground-truth" static PET is referred to as "true PET" for the rest of the paper. The geometric model of the HR+ scanner was used to generate sinogram data. Noisy data were generated using Poisson deviates of the projected sinograms, a noise model widely accepted in the PET imaging community [52]. The Poisson deviates were generated with a mean of 10^8 counts for the full scan duration of 3640 s. The data were then reconstructed using the OSEM algorithm (6 iterations, 16 subsets). The images were subsequently blurred using the measured, spatially-variant PSF to generate the LR PET images. To match the HR PET image grid size, the LR PET images were interpolated from the HR+ output size of 128 × 128 × 64 to 256 × 256 × 207 using bicubic interpolation. T1-weighted MR images with 1 mm isotropic resolution derived directly from the BrainWeb database were used as HR MR inputs.

3.2.3 Target Image Generation for Study 1

In Study 1, our purpose was to train the networks to map LR PET images to the "ground-truth" image domain. The target HR PET images are, therefore, the true PET images described in section 3.1.

3.2.4 Target Image Generation for Study 2

In Study 2, we trained the networks using simulated HRRT PET images as our target HR images. The geometric model of the HRRT scanner was used to generate sinogram data. Poisson noise realizations were generated for the projected sinograms with a mean of 10^8 counts for a scan duration of 3640 s. The images were then reconstructed using the OSEM algorithm (6 iterations, 16 subsets). OSEM reconstruction results typically appear grainy due to noise. We, therefore, performed post-filtering with a 2.4 mm full width at half maximum (FWHM) 3D Gaussian filter. Since the intrinsic resolution of the HRRT scanner is in the 2.3–3.4 mm range, this step improves image quality without any appreciable reduction in resolution.

3.3 Experimental Setup

Clinical neuroimaging datasets for this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI, http://adni.loni.usc.edu/) database, a public repository containing images and clinical data from 2000+ human datasets. We selected 20 HRRT PET scans and the corresponding anatomical T1-weighted MPRAGE MR scans for clinical validation of our method. 10 of the 20 subjects were from the cognitively normal category; the remaining 10 subjects had mild cognitive impairment. The full scan duration was 30 minutes (6 × 5-minute frames). The OSEM algorithm (6 iterations, 16 subsets) was used for reconstruction.

3.3.1 Target Image Generation for Study 3

As in Study 2 (section 3.2.4), the OSEM-reconstructed HRRT PET images, which were grainy, were post-filtered using a 2.4 mm FWHM 3D Gaussian filter to suppress some of the noise without substantially reducing the image resolution. This yielded the target HR PET images for the clinical study.

3.3.2 Input Image Generation for Study 3

The LR counterparts of the HRRT images were generated by applying the measured spatially-variant PSF of the HR+ scanner described in section 3.2.1 to the OSEM-reconstructed HRRT images. While not directly derived from the HR+ scanner, the use of a measured image-domain PSF ensured parity in spatial resolution with true HR+ images. Rigidly co-registered T1-weighted MR images with 1 mm × 1 mm × 1 mm voxels were used as HR MR inputs. Cross-modality registration was performed using FSL (https://fsl.fmrib.ox.ac.uk) [53, 54].

3.4 Network Implementation, Training, and Validation

All networks were implemented on the PyTorch platform. Training was GPU-accelerated using an NVIDIA GTX 1080Ti graphics card. An L1 loss function was used for network training. Training was performed using Adam, an algorithm for optimizing stochastic objective functions via adaptive estimates of lower-order moments [55]. The cohort size (20 subjects in total) was the same for all three studies. For all studies, training was performed using data from 15 of the 20 available subjects to predict a residual image, the estimated difference between the input LR PET and the target HR PET. The stride and padding for the convolution kernels were both set to 1. The kernel size was set to 3 × 3. All convolutional layers had 64 filters except for the last layer, which had only 1 filter. The batch size was 10. The learning rate was set to 3 × 10^-4. The networks were trained for 400 epochs. We validated our results using the data from the remaining 5 subjects.
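A minimal training-loop sketch consistent with these settings follows. Here `model` and `loader` are placeholders for the SR network and a patch data loader, and the convention that residual addition happens inside the model (as in the earlier architecture sketch) is an assumption.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=400, lr=3e-4, device="cuda"):
    """Training sketch per section 3.4: Adam optimizer, L1 loss, and
    residual learning (the network predicts the HR - LR difference)."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    l1 = nn.L1Loss()
    for _ in range(epochs):
        for inputs, hr_target in loader:  # batches of 10 multi-channel patches
            inputs, hr_target = inputs.to(device), hr_target.to(device)
            sr = model(inputs)            # LR PET plus the predicted residual
            loss = l1(sr, hr_target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```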
3.5 Reference Approaches and Evaluation Metrics

The reference approaches used for comparison are penalized deconvolution methods based on two well-studied penalty functions: the anatomically-guided joint entropy (JE) penalty and the total variation (TV) penalty. For the SR PET image, denoted by a vector $\mathbf{x} \in \mathbb{R}^N$, and the HR MR image, denoted by a vector $\mathbf{y} \in \mathbb{R}^N$, where $N$ is the number of voxels, the JE penalty is defined as:

$$\Phi_{\mathrm{JE}}(\mathbf{x} \mid \mathbf{y}) = -\sum_{i=1}^{M} \sum_{j=1}^{M} \delta u \, \delta v \, p(u_i, v_j) \log p(u_i, v_j). \qquad (3)$$

Here $\mathbf{u} \in \mathbb{R}^M$ and $\mathbf{v} \in \mathbb{R}^M$ are intensity histogram vectors based on the PET and MR images, respectively, and $M$ is the number of intensity bins. The TV penalty is defined as:

$$\Phi_{\mathrm{TV}}(\mathbf{x}) = \|\Delta_1 \mathbf{x}\|_1 + \|\Delta_2 \mathbf{x}\|_1 + \|\Delta_3 \mathbf{x}\|_1, \qquad (4)$$

where $\Delta_k$ ($k = 1, 2, 3$) are finite difference operators along the three Cartesian coordinate directions.

The evaluation metrics used here are defined below. The true and estimated images are denoted $\mathbf{x}$ and $\hat{\mathbf{x}}$, respectively. We use the notation $\mu_{\mathbf{x}}$ and $\sigma_{\mathbf{x}}$ for the mean and standard deviation of $\mathbf{x}$, respectively.

3.5.1 Peak Signal-to-Noise Ratio (PSNR)

The PSNR is the ratio of the maximum signal power to the noise power and is defined as:

$$\mathrm{PSNR}(\hat{\mathbf{x}}, \mathbf{x}) = 20 \log_{10} \left( \frac{\max(\hat{\mathbf{x}})}{\mathrm{RMSE}(\hat{\mathbf{x}}, \mathbf{x})} \right), \qquad (5)$$

where the root-mean-square error (RMSE) is defined as:

$$\mathrm{RMSE}(\hat{\mathbf{x}}, \mathbf{x}) = \sqrt{\frac{1}{N} \sum_{k} (\hat{x}_k - x_k)^2}. \qquad (6)$$

3.5.2 Structural Similarity Index (SSIM)

The SSIM is a well-accepted measure of perceived image quality and is defined as:

$$\mathrm{SSIM}(\hat{\mathbf{x}}, \mathbf{x}) = \frac{(2 \mu_{\mathbf{x}} \mu_{\hat{\mathbf{x}}} + c_1)(2 \sigma_{\mathbf{x}\hat{\mathbf{x}}} + c_2)}{(\mu_{\mathbf{x}}^2 + \mu_{\hat{\mathbf{x}}}^2 + c_1)(\sigma_{\mathbf{x}}^2 + \sigma_{\hat{\mathbf{x}}}^2 + c_2)}. \qquad (7)$$

Here $c_1$ and $c_2$ are parameters stabilizing the division.
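Both metrics follow directly from Eqs. (5)-(7); a minimal numpy sketch is shown below. The SSIM here is computed over a single global window with illustrative stabilizer values, since the paper does not state its windowing scheme or constants.

```python
import numpy as np

def psnr(est, ref):
    """PSNR per Eqs. (5)-(6); est and ref are arrays of equal shape."""
    rmse = np.sqrt(np.mean((est - ref) ** 2))
    return 20.0 * np.log10(est.max() / rmse)

def ssim_global(est, ref, c1=1e-4, c2=9e-4):
    """Single-window SSIM per Eq. (7); c1 and c2 are illustrative."""
    mu_x, mu_e = ref.mean(), est.mean()
    cov = np.mean((ref - mu_x) * (est - mu_e))
    num = (2 * mu_x * mu_e + c1) * (2 * cov + c2)
    den = (mu_x**2 + mu_e**2 + c1) * (ref.var() + est.var() + c2)
    return num / den
```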
4 Results

4.1 Simulation Results: Study 1

Fig. 2a showcases results from Study 1: transverse slices from the HR MR, HR PET (the same as the true PET in this case), LR PET, and the SR imaging results from the two deconvolution methods and the eight CNNs described in Table 1. The images are for a given subject from the validation dataset. Magnified subimages in Fig. 2b highlight artifacts/inaccuracies, indicated by purple arrows, that are observed in all the techniques lacking anatomical guidance, namely TV, S1, V1, S3, and V3. Comparison of the subimage pairs (S1, S3) and (V1, V3) illustrates that, in the absence of anatomical information, spatial information greatly enhances image quality. Comparison of the subimage pairs (S1, V1) and (S3, V3) also shows that the addition of more convolutional layers (increased network depth) is very effective in the absence of anatomical information. Since JE incorporates anatomical information, it shows better edge recovery than TV. The CNN results show better gray-to-white contrast than both JE and TV.

The PSNR and SSIM for the different methods are tabulated in Table 4. In terms of PSNR, the supervised CNNs outperform the classical approaches by a wide margin. The networks with anatomical guidance (S2, S4, V2, and V4) perform better on both PSNR and SSIM than the networks without it (S1, V1, S3, and V3). The PSNR and SSIM figures for S1 vs. S3 show that these metrics increase noticeably with spatial information for the shallow case.

Figure 2: Simulation results from the validation set: Study 1. (a) Transverse slices from the T1-weighted HR MR image, true PET image (also the HR image for this case), LR PET image (HR+ scanner), JE-penalized deconvolution result, TV-penalized deconvolution result, and the SR outputs from the following CNNs: S1, V1, S2, V2, S3, V3, S4, and V4. The blue box on the MR image indicates the region that is magnified for closer inspection. (b) The corresponding magnified subimages. Purple arrows indicate areas in the white-matter background region where prominent noise-induced artifacts arise for TV and for the CNNs without anatomical inputs, namely S1, V1, S3, and V3.

Table 4: Study 1: Performance comparison

Metric  Reference  LR     TV     JE     S1     V1     S2     V2     S3     V3     S4     V4
PSNR    True       21.47  22.35  22.37  26.86  27.71  37.17  37.61  27.17  27.98  37.27  37.93
SSIM    True       0.74   0.85   0.86   0.74   0.86   0.97   0.97   0.83   0.87   0.97   0.98

4.2 Simulation Results: Study 2

Fig. 3a showcases results from Study 2: transverse slices from the HR MR, true PET, HR PET, LR PET, and the SR imaging results from the two deconvolution methods and the eight CNNs described in Table 1. The images are for a given subject from the validation dataset. As with Study 1, the CNNs with MR-based anatomical inputs (S2, V2, S4, and V4) still produce the best results. However, since the target image is now a corrupt image with diminished structural similarity to the MR image, the margin of gain from using anatomical information is reduced. Magnified subimages in Fig. 3b highlight artifacts/inaccuracies, indicated by purple arrows, that are more prevalent in this study than in Study 1. Interestingly, for this more challenging problem, the deeper networks (V2 and V4) have lower background noise variation than their shallower counterparts (S2 and S4), as indicated by the purple arrows in the latter. The JE and TV images, which are unsupervised, are the same as those showcased in Fig. 2.

The PSNR and SSIM for the different methods are tabulated in Table 5. An additional goal for this study was to understand the variability in results that could be anticipated when imperfect HR images are used for training. We, therefore, computed two sets of PSNR and SSIM measures: one with respect to the target HR PET and another with respect to the true PET. Our results show an overall reduction in performance when the true PET is used as the reference. This is expected because the HR PET used for training deviates substantially from the true PET. A key observation, however, is that the CNNs exhibit consistent relative levels of accuracy for the two reference images. As with Study 1, anatomically-guided networks showed better performance than the non-anatomically-guided networks and the two classical methods.

Figure 3: Simulation results from the validation set: Study 2. (a) Transverse slices from the T1-weighted HR MR image, true PET image, HR PET image (HRRT scanner), LR PET image (HR+ scanner), JE-penalized deconvolution result, TV-penalized deconvolution result, and the SR outputs from the following CNNs: S1, V1, S2, V2, S3, V3, S4, and V4. The blue box on the MR image indicates the region that is magnified for closer inspection. (b) The corresponding magnified subimages. Purple arrows indicate areas in the white-matter background region where prominent noise-induced artifacts arise. These artifacts are the least prominent for V2 and V4, the very deep CNNs with anatomical inputs.

Table 5: Study 2: Performance comparison

Metric  Reference  LR     TV     JE     S1     V1     S2     V2     S3     V3     S4     V4
PSNR    HRRT       27.83  26.68  27.48  35.11  35.69  38.37  38.48  35.48  35.92  38.44  38.69
PSNR    True       21.47  22.35  22.37  22.69  23.06  22.46  23.92  23.07  23.48  23.78  24.29
SSIM    HRRT       0.76   0.83   0.82   0.82   0.80   0.88   0.88   0.85   0.82   0.89   0.90
SSIM    True       0.74   0.85   0.86   0.77   0.75   0.86   0.86   0.79   0.77   0.86   0.87

4.3 Experimental Results: Study 3

Fig. 4a showcases results from Study 3: the HR MR, HR PET, LR PET, and the SR imaging results from the two deconvolution methods and the eight CNNs described in Table 1. For this study, the deeper networks (V1, V2, V3, and V4) produced visually sharper images than their shallower counterparts (S1, S2, S3, and S4). This is clearly evident from the magnified subimages in Fig. 4b. This is consistent with our previous observation that, in the absence of a strong contribution from the anatomical inputs, the extra layers lead to a stronger margin of improvement. That said, V4, which is deeper and uses anatomical information, led to the highest levels of gray matter contrast, as highlighted by red arrows. The PSNR and SSIM for the different methods are tabulated in Table 6. As displayed in the table, V4 continues to exhibit the best performance in terms of both PSNR and SSIM. As in Studies 1 and 2, all CNN-based methods outperformed TV and JE.

Figure 4: Clinical results from the validation set: Study 3. (a) Transverse slices from the T1-weighted HR MR image, HR PET image (HRRT scanner), LR PET image (HR+ scanner), JE-penalized deconvolution result, TV-penalized deconvolution result, and the SR outputs from the following CNNs: S1, V1, S2, V2, S3, V3, S4, and V4. The blue box on the MR image indicates the region that is magnified for closer inspection. (b) The corresponding magnified subimages. The very deep CNNs (V1, V2, V3, and V4) yield sharper images than their shallow counterparts (S1, S2, S3, and S4). The red arrows point to bright gray matter areas in the HR image that are recovered with the highest contrast by V4.

Table 6: Study 3: Performance comparison

Metric  Reference  LR     TV     JE     S1     V1     S2     V2     S3     V3     S4     V4
PSNR    HRRT       29.93  34.98  33.32  35.75  36.93  36.00  37.27  36.29  37.42  36.46  38.02
SSIM    HRRT       0.86   0.93   0.88   0.94   0.95   0.95   0.95   0.95   0.95   0.95   0.96

5 Discussion

Overall, our results indicate that CNNs vastly outperform penalized deconvolution at the SR imaging task. Among the different CNN architectures, relative performance depends on the problem at hand. Our simulation and clinical studies all agree that deep CNNs outperform shallow CNNs and that the additional channels contribute to improving overall performance. The relative importance of the anatomical and spatial input channels depends on the underlying structural similarity between the HR MR and the true PET.
It should be noted that the utilization of anatomical images in conjunction with functional images requires co-registration. To ensure robustness, a well-tested, standardized registration tool was used for this task. Since the registration is intra-subject, rigid registration based on mutual information suffices for this application.

A limitation of the CNNs designed and validated in this paper is that they are specific to inputs with noise and blur levels similar to those used in training. In other words, they lack portability. Even for a given LR-HR scanner pair, such as the HR+ and HRRT, the SR performance is expected to drop when the LR inputs are based on a much lower or higher tracer dose than the datasets used for training. Our future work will characterize the performance of these networks when input noise levels are varied. One remedial approach to address noise sensitivity is the use of transfer learning for easy retraining of the networks with a much smaller training dataset containing LR inputs with altered SNR. Another, more sophisticated strategy is to use the CNN output as a prior in reconstruction. We have previously used this approach with promising results for denoising [56] and anticipate that it will work for deblurring and SR by extension. Another limitation of CNN-based SR, and perhaps the most significant one, is that it relies on supervised learning and, therefore, requires paired LR and HR PET images for training. This requirement is easy to address when training on simulated datasets, but paired LR and HR clinical scans are rare. To address this limitation, we are currently exploring self-supervised learning strategies based on adversarial training of generative adversarial networks, which circumvent the need for paired training inputs.

6 Conclusion

We have designed, implemented, and validated a family of CNN-based SR PET imaging techniques. To facilitate the resolution recovery process, we incorporated both anatomical and spatial information. In the studies presented here, the anatomical information was provided as an HR MR image. To easily provide spatial information, we supplied patches containing the radial and axial coordinates of each voxel as additional input channels. This strategy is consistent with standard CNN multi-channel input formats and, therefore, convenient to implement. Both simulation and clinical studies showed that the CNNs greatly outperform penalized deconvolution both qualitatively (e.g., edge and contrast recovery) and quantitatively (as indicated by PSNR and SSIM). As future work, we will develop new networks based on self-supervised learning that will enable us to circumvent the need for paired training datasets. We will also explore avenues to ensure the applicability of this method to super-resolving inputs with noise levels different from those in the training data. From an applications perspective, we are interested in using this technique to super-resolve PET images of tau tangles, a neuropathological hallmark of Alzheimer's disease, with the goal of developing sensitive image-based biomarkers for tau.

References

[1] H. Sotoudeh, A. Sharma, K. J. Fowler, J. McConathy, and F. Dehdashti. Clinical application of PET/MRI in oncology. J. Magn. Reson. Imaging, 44(2):265–276, Aug 2016.

[2] C. Catana, A. Drzezga, W. D. Heiss, and B. R. Rosen. PET/MRI for neurologic applications. J. Nucl. Med., 53(12):1916–1925, Dec 2012.
[3] B. M. Salata and P. Singh. Role of cardiac PET in clinical practice. Curr. Treat. Options Cardiovasc. Med., 19(12):93, Nov 2017.

[4] S. Hess, A. Alavi, and S. Basu. PET-based personalized management of infectious and inflammatory disorders. PET Clin., 11(3):351–361, Jul 2016.

[5] R. M. Leahy and J. Qi. Statistical approaches in quantitative positron emission tomography. Stat. Comput., 10(2):147–165, Apr 2000.

[6] J. Dutta, S. Ahn, and Q. Li. Quantitative statistical methods for image quality assessment. Theranostics, 3(10):741–756, Oct 2013.

[7] J. Dutta, R. M. Leahy, and Q. Li. Non-local means denoising of dynamic PET images. PLoS ONE, 8(12):e81390, Dec 2013.

[8] J. Qi and R. M. Leahy. Resolution and noise properties of MAP reconstruction for fully 3-D PET. IEEE Trans. Med. Imaging, 19(5):493–506, May 2000.

[9] O. G. Rousset, Y. Ma, and A. C. Evans. Correction for partial volume effects in PET: principle and validation. J. Nucl. Med., 39(5):904–911, May 1998.

[10] A. J. Reader, P. J. Julyan, H. Williams, D. L. Hastings, and J. Zweit. EM algorithm system modeling by image-space techniques for PET reconstruction. IEEE Trans. Nucl. Sci., 50(5):1392–1397, Oct 2003.

[11] A. M. Alessio, P. E. Kinahan, and T. K. Lewellen. Modeling and incorporation of system response functions in 3-D whole body PET. IEEE Trans. Med. Imaging, 25(7):828–837, Jul 2006.

[12] V. Y. Panin, F. Kehren, C. Michel, and M. Casey. Fully 3-D PET reconstruction with system matrix derived from point source measurements. IEEE Trans. Med. Imaging, 25(7):907–921, Jul 2006.

[13] R. Leahy and X. Yan. Incorporation of anatomical MR data for improved functional imaging with PET. In Inf. Process. Med. Imaging, volume 511, pages 105–120. Springer, 1991.

[14] C. Comtat, P. E. Kinahan, J. A. Fessler, T. Beyer, D. W. Townsend, M. Defrise, and C. Michel. Clinically feasible reconstruction of 3D whole-body PET/CT data using blurred anatomical labels. Phys. Med. Biol., 47(1):1–20, Jan 2002.

[15] J. E. Bowsher, H. Yuan, L. W. Hedlund, T. G. Turkington, G. Akabani, A. Badea, W. C. Kurylo, C. T. Wheeler, G. P. Cofer, M. W. Dewhirst, et al. Utilizing MRI information to estimate F18-FDG distributions in rat flank tumors. In IEEE Nucl. Sci. Symp. Conf. Rec., volume 4, pages 2488–2492. IEEE, 2004.

[16] K. Baete, J. Nuyts, W. Van Paesschen, P. Suetens, and P. Dupont. Anatomical-based FDG-PET reconstruction for the detection of hypo-metabolic regions in epilepsy. IEEE Trans. Med. Imaging, 23(4):510–519, Apr 2004.

[17] F. Bataille, C. Comtat, S. Jan, F. C. Sureau, and R. Trebossen. Brain PET partial-volume compensation using blurred anatomical labels. IEEE Trans. Nucl. Sci., 54(5):1606–1615, Apr 2007.

[18] S. Pedemonte, A. Bousse, B. F. Hutton, S. Arridge, and S. Ourselin. 4-D generative model for PET/MRI reconstruction. Med. Image Comput. Comput. Assist. Interv., 14(Pt 1):581–588, 2011.

[19] S. Somayajula, C. Panagiotou, A. Rangarajan, Q. Li, S. R. Arridge, and R. M. Leahy. PET image reconstruction using information theoretic anatomical priors. IEEE Trans. Med. Imaging, 30(3):537–549, Mar 2011.

[20] G. Wang and J. Qi. Penalized likelihood PET image reconstruction using patch-based edge-preserving regularization. IEEE Trans. Med. Imaging, 31(12):2194–2204, Dec 2012.
[21] K. Kim, Y. D. Son, Y. Bresler, Z. H. Cho, J. B. Ra, and J. C. Ye. Dynamic PET reconstruction using temporal patch-based low rank penalty for ROI-based brain kinetic analysis. Phys. Med. Biol., 60(5):2019–2046, Mar 2015.

[22] G. Wang and J. Qi. Edge-preserving PET image reconstruction using trust optimization transfer. IEEE Trans. Med. Imaging, 34(4):930–939, Apr 2015.

[23] C. C. Meltzer, J. P. Leal, H. S. Mayberg, H. N. Wagner, and J. J. Frost. Correction of PET data for partial volume effects in human cerebral cortex by MR imaging. J. Comput. Assist. Tomogr., 14(4):561–570, Jul–Aug 1990.

[24] H. W. Müller-Gärtner, J. M. Links, J. L. Prince, R. N. Bryan, E. McVeigh, J. P. Leal, C. Davatzikos, and J. J. Frost. Measurement of radiotracer concentration in brain gray matter using positron emission tomography: MRI-based correction for partial volume effects. J. Cereb. Blood Flow Metab., 12(4):571–583, Jul 1992.

[25] M. Soret, S. L. Bacharach, and I. Buvat. Partial-volume effect in PET tumor imaging. J. Nucl. Med., 48(6):932–945, Jun 2007.

[26] B. A. Thomas, K. Erlandsson, M. Modat, L. Thurfjell, R. Vandenberghe, S. Ourselin, and B. F. Hutton. The importance of appropriate partial volume correction for PET quantification in Alzheimer's disease. Eur. J. Nucl. Med. Mol. Imaging, 38(6):1104–1119, Jun 2011.

[27] A. Bousse, S. Pedemonte, B. A. Thomas, K. Erlandsson, S. Ourselin, S. Arridge, and B. F. Hutton. Markov random field and Gaussian mixture for segmented MRI-based partial volume correction in PET. Phys. Med. Biol., 57(20):6681, 2012.

[28] P. H. van Cittert. Zum Einfluß der Spaltbreite auf die Intensitätsverteilung in Spektrallinien II. Z. Phys., 69(5–6):298–308, May 1931.

[29] W. H. Richardson. Bayesian-based iterative method of image restoration. J. Opt. Soc. Am., 62(1):55–59, Jan 1972.

[30] L. B. Lucy. An iterative technique for the rectification of observed distributions. Astron. J., 79:745, Jun 1974.

[31] J. Yan, J. C. Lim, and D. W. Townsend. MRI-guided brain PET image filtering and partial volume correction. Phys. Med. Biol., 60(3):961–976, Feb 2015.

[32] T.-A. Song, F. Yang, S. R. Chowdhury, K. Kim, K. A. Johnson, G. El Fakhri, Q. Li, and J. Dutta. PET image deblurring and super-resolution with an MR-based joint entropy prior. IEEE Trans. Comput. Imaging, 2019.

[33] K. Nasrollahi and T. B. Moeslund. Super-resolution: a comprehensive survey. Mach. Vis. Appl., 25:1423–1468, 2014.

[34] D. Wallach, F. Lamare, G. Kontaxakis, and D. Visvikis. Super-resolution in respiratory synchronized positron emission tomography. IEEE Trans. Med. Imaging, 31(2):438–448, 2012.

[35] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In Proc. IEEE Int. Conf. Comput. Vis., pages 349–356, 2009.

[36] K. I. Kim and Y. Kwon. Single-image super-resolution using sparse regression and natural image prior. IEEE Trans. Pattern Anal. Mach. Intell., 32(6):1127–1133, Jun 2010.

[37] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Trans. Image Process., 19(11):2861–2873, Nov 2010.

[38] R. Timofte, V. De Smet, and L. Van Gool. Anchored neighborhood regression for fast example-based super-resolution. In Proc. IEEE Int. Conf. Comput. Vis., pages 1920–1927, 2013.

[39] J. Yang, Z. Lin, and S. Cohen. Fast image super-resolution based on in-place example regression. In Proc. IEEE Int. Conf. Comput. Vis., pages 1059–1066, 2013.
[40] K. Jia, X. Wang, and X. Tang. Image transformation based on learning dictionaries across image spaces. IEEE Trans. Pattern Anal. Mach. Intell., 35(2):367–380, Feb 2013.

[41] S. Schulter, C. Leistner, and H. Bischof. Fast and accurate image upscaling with super-resolution forests. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pages 3791–3799, 2015.

[42] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell., 38(2):295–307, 2016.

[43] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pages 1646–1654, 2016.

[44] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pages 770–778, 2016.

[45] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pages 4681–4690, 2017.

[46] J. Dutta, G. El Fakhri, X. Zhu, and Q. Li. PET point spread function modeling and image deblurring using a PET/MRI joint entropy prior. In Proc. IEEE Int. Symp. Biomed. Imaging, pages 1423–1426. IEEE, 2015.

[47] T.-A. Song, S. R. Chowdhury, K. Kim, K. Gong, G. El Fakhri, Q. Li, and J. Dutta. Super-resolution PET using a very deep convolutional neural network. In Proc. IEEE Nucl. Sci. Symp. Med. Imag. Conf. IEEE, 2018.

[48] C. Cloquet, F. C. Sureau, M. Defrise, G. Van Simaeys, N. Trotta, and S. Goldman. Non-Gaussian space-variant resolution modelling for list-mode reconstruction. Phys. Med. Biol., 55(17):5045–5066, Sep 2010.

[49] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen. Recent advances in convolutional neural networks. Pattern Recognition, 77:354–377, 2018.

[50] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, 1998.

[51] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, pages 1132–1140, 2017.

[52] K. Lange. Convergence of EM image reconstruction algorithms with Gibbs smoothing. IEEE Trans. Med. Imaging, 9(4):439–446, 1990.

[53] M. Jenkinson and S. Smith. A global optimisation method for robust affine registration of brain images. Med. Image Anal., 5(2):143–156, Jun 2001.

[54] M. Jenkinson, P. Bannister, M. Brady, and S. Smith. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage, 17(2):825–841, Oct 2002.

[55] D. P. Kingma and J. Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[56] K. Kim, D. Wu, K. Gong, J. Dutta, J. H. Kim, Y. D. Son, H. K. Kim, G. El Fakhri, and Q. Li. Penalized PET reconstruction using deep learning prior and local linear fitting. IEEE Trans. Med. Imaging, 37(6):1478–1487, Jun 2018.
