Deep Variational Networks with Exponential Weighting for Learning Computed Tomography


Authors: Valery Vishnevskiy, Richard Rau, Orcun Goksel

Computer-assisted Applications in Medicine Group, ETH Zurich, Switzerland
valeryv@vision.ee.ethz.ch

Abstract. Tomographic image reconstruction is relevant for many medical imaging modalities, including X-ray, ultrasound (US) computed tomography (CT), and photoacoustics, for which access to full-angular-range tomographic projections might not be available in clinical practice due to physical or time constraints. Reconstruction from incomplete data in a low signal-to-noise-ratio regime is a challenging and ill-posed inverse problem that usually leads to unsatisfactory image quality. While informative image priors may be learned using generic deep neural network architectures, the artefacts caused by an ill-conditioned design matrix often have global spatial support and cannot be efficiently filtered out by means of convolutions. In this paper we propose to learn an inverse mapping in an end-to-end fashion via unrolling optimization iterations of a prototypical reconstruction algorithm. We herein introduce a network architecture that performs filtering jointly in both the sinogram and spatial domains. To efficiently train such a deep network, we propose a novel regularization approach based on deep exponential weighting. Experiments on US and X-ray CT data show that our proposed method is qualitatively and quantitatively superior to conventional non-linear reconstruction methods as well as state-of-the-art deep networks for image reconstruction. The fast inference time of the proposed algorithm allows for sophisticated reconstructions in real-time-critical settings, demonstrated with US SoS imaging of an ex vivo bovine phantom.
1 Introduction

Tomographic image reconstruction with sparse or limited-angle (LA) data arises in a number of applications, including image-guided interventions [13], photoacoustics [7], and US speed-of-sound (SoS) imaging [11,2]. Such underdetermined problems usually require suitable problem-specific regularization for meaningful reconstructions, e.g. free from streaking artefacts. Setting regularization parameters manually can be cumbersome and often generalizes poorly. Using learning-based methods as in [18] can greatly improve reconstruction accuracy and account for non-Gaussian noise models. Unfortunately, the method in [18] is based on patch-based clustering, leading to very slow reconstruction. Straightforward application of a computationally efficient convolutional network directly to the measurements might be tempting, but is unjustified, because sinogram values have global spatial dependence on image intensities. In practice, such generic networks are not likely to generalize well for LA-CT problems [9]. Many deep-learning-inspired methods employ artificial neural networks to learn filtering or weighting [17,12] of the input sinograms prior to the backprojection step, after which the result might be postprocessed by another network [4]. Unfortunately, such sinogram preprocessing requires problem-specific weighting schemes, which would constrain the applicable acquisition geometries. Variational Networks (VN) employ the adjoint of the projection operator to learn convolutional filters in the spatial domain. For compressed sensing in MRI, a landmark VN architecture was introduced by Hammernik et al. [3], which in practice relies on the unitarity of the Fourier transform. This was addressed in [15] with more sophisticated unrolled iterations that improved USCT reconstructions and allowed detection of coarse blob-like inclusions.
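The global spatial dependence of sinogram values on image intensities, noted above, can be illustrated with a toy ray-discretization (a minimal sketch; the matrix construction here is purely illustrative and not the acquisition geometry used in the paper):

```python
import numpy as np

# Toy 8x8 image and a single horizontal ray through pixel row 3.
# The corresponding row of the ray-discretization matrix L assigns a
# unit intersection length to every pixel the ray crosses.
n = 8
weights = np.zeros((n, n))
weights[3, :] = 1.0            # the ray crosses all n pixels of row 3
L_row = weights.ravel()        # one row of L, shape (n*n,)

x = np.random.rand(n, n)       # arbitrary image
b_i = L_row @ x.ravel()        # a single sinogram sample

# b_i depends on pixels spanning the full image width: its spatial
# support is global, so a small convolution applied to the sinogram
# cannot localize the contribution of any single image region.
assert np.count_nonzero(L_row) == n
```

A convolutional layer applied directly to b would thus need receptive fields covering the whole image extent, which motivates incorporating L and its adjoint into the network instead.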
In order to allow for accurate image reconstruction with ill-conditioned spatial encoding operators, in this paper we extend the VN architecture by introducing sinogram filters that are learned as preconditioners, inspired by filtered backprojection. We also propose an efficient network regularization scheme, inspired by Landweber iterations [6] and deep supervision [8], that allows stable training.

2 Methods

The tomographic reconstruction problem involves estimating image intensities x from a set of measurements (sinogram) b_i (e.g., time-of-flight or ray attenuation) that are modelled, e.g., as line integrals b_i = ∫_{ray_i} x(r) ds. Given a set of measurements b ∈ R^M, algebraic reconstruction methods solve the discretized spatial encoding equations in a maximum-a-posteriori (MAP) sense, i.e.:

    x̂(b; λ, p) = argmin_x ‖Lx − b‖_p^p + λ R(x),                      (1)

where x ∈ R^{n_1 n_2} is the n_1 × n_2 image and L ∈ R^{M × n_1 n_2} is a sparse ray-discretization matrix that depends on the acquisition geometry. The norm p determines the data-inconsistency penalty: e.g., p=2 assumes Gaussian acquisition noise, while p=1 assumes Laplace noise, the latter of which is often considered more robust. The non-negative weight λ controls the influence of the regularizer. Similarly to [5], we herein consider total variation R_TV(x) = ‖∇x‖_1 and total generalized variation R_TGV(x) = min_u ‖∇x − u‖_1 + 2‖Eu‖_1 regularizers. Here ∇ denotes the first-order forward finite-difference matrix and E is the symmetrized vector-field derivative operator. Both TV and TGV regularizers yield convex optimization problems for p ∈ {1, 2}, which we hereafter refer to as LpTV and LpTGV.

A Variational Network can be seen as a sequence of K unrolled iterations of a numerical optimization scheme, inspired by a prototypical objective as in (1). For additional learning capacity, these iterations are further relaxed, e.g.
by adding variable filters and activations that can be tuned during training. As illustrated in Fig. 1, we initialize VN inference by L^T b. Then, at each network layer (iteration) k, a gradient term g^(k) is accumulated via a running average governed by the momentum coefficient α^(k) and used to update the current image estimate x^(k) as follows:

    g^(k) ← (P^(k) L Q^(k))^T W_d^(k) φ_d^(k){ W_d^(k) ( P^(k) L Q^(k) x^(k−1) − b ) }
            + (D^(k))^T W_r^(k) φ_r^(k){ W_r^(k) D^(k) x^(k−1) },
    s^(k) ← α^(k) s^(k−1) + g^(k),
    x^(k) ← x^(k−1) − s^(k).                                          (2)

Fig. 1. Structure of the variational network and its training strategy.

We define the gradient term g^(k) with the following tunable operations that all allow backpropagation of gradients: (i) multiplication with a diagonal preconditioner W_d^(k) and spatial regularization weighting W_r^(k); (ii) convolution with left (P^(k)) and right (Q^(k)) preconditioners, and several (herein, n_f = 50) regularization filters D^(k); (iii) nonlinear data (φ_d^(k)) and regularization (φ_r^(k)) activation functions that are parametrized via linear interpolation on a regular grid, herein of size n_g = 35; i.e. φ{t} = (1 − t + ⌊t⌋) φ_⌊t⌋ + (t − ⌊t⌋) φ_⌊t⌋+1. To avoid bilinear ambiguities, every n_k × n_k (herein, n_k = 7) convolution D, Q, P with kernel d is reparametrized to be zero-centered and unit-norm, i.e.
d = n_k (d′ − mean(d′)) / std(d′), while diagonal terms are also ensured to be non-negative and bounded via a sigmoid: W = diag(σ(w_i)). Stochastic minimization of the exponentially weighted ℓ1 reconstruction loss is then conducted to tune the parameter set Θ = {P^(k), Q^(k), D^(k), W_r^(k), W_d^(k), φ_r^(k), φ_d^(k), α^(k)}:

    min_Θ  E_{(b, x*) ∈ T}  Σ_{k=1}^{K}  e^{−τ(K−k)} ‖x^(k)(b; Θ) − x*‖_1,          (3)

on the training set T. Here τ ≥ 0 controls the regularization of the network: at τ = 0, the reconstructions on all layers are weighted equally, therefore all network parameters have low variance of gradients, which allows for stable training. For τ → +∞, only the last network output x^(K) is used for training, which allows the network to be tuned accurately for the final objective. We accordingly increase τ during the training procedure to gradually relax constraints on the network. Intuitively, such regularization encourages the VN to provide a reconstruction as early as possible, which is inspired by early stopping, a common image reconstruction strategy that avoids degenerate solutions and can be shown to be equivalent to Tikhonov regularization in certain cases [6].

Fig. 2. Performance comparison of VN architectures on X-ray CT and SoS simulations. Proposed architectures are highlighted in green.

Training. We employ a K = 10 layer VN and perform 5·10^4 iterations of the Adam algorithm (learning rate 10^−3, β1 = 0.85, β2 = 0.98, batch size of 10) for training, during which we continually adjust τ = j · 10^−3, with j being the iteration number.
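The exponentially weighted loss of Eq. (3) and the linear τ schedule can be sketched as follows (a minimal numpy illustration for a single training sample; the function and variable names are ours, not from the paper's implementation):

```python
import numpy as np

def exp_weighted_l1_loss(layer_outputs, x_star, tau):
    """Eq. (3) for one sample: sum over layers k=1..K of
    e^{-tau(K-k)} * ||x^(k) - x*||_1."""
    K = len(layer_outputs)
    return sum(np.exp(-tau * (K - k)) * np.abs(x_k - x_star).sum()
               for k, x_k in enumerate(layer_outputs, start=1))

def tau_schedule(j):
    """tau = j * 1e-3, with j the Adam iteration number."""
    return j * 1e-3

# At tau=0 all K layer outputs contribute equally (stable gradients);
# as tau grows, the loss concentrates on the final output x^(K).
x_star = np.zeros(4)
outs = [np.ones(4), 2.0 * np.ones(4)]      # per-layer l1 errors: 4 and 8
assert abs(exp_weighted_l1_loss(outs, x_star, 0.0) - 12.0) < 1e-9
assert abs(exp_weighted_l1_loss(outs, x_star, 50.0) - 8.0) < 1e-6
```

Increasing τ over training thus interpolates smoothly between deep supervision of every layer and supervision of the final reconstruction only.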
To avoid overfitting to the discretization scheme employed in the simulation, we use higher-resolution images x_HR to compute line integrals defined by L_HR, while training for these to be reconstructed at the desired resolution defined by the discretization of L. The network was implemented using the TensorFlow framework, where multiplication with the design matrix L was carried out as a generic sparse matrix-vector multiplication. For comparison with iterative reconstruction, we employ the ADMM algorithm [1] to solve (1), approximating the regularized inversion of L with 5 iterations of the LSQR solver [10]. For each experimental scenario, an optimal value of λ was tuned on a single test image.

3 Experiments and Results

X-ray CT dataset. We used the 3DIRCADb dataset [14], which consists of 3080 axial CT scans from 22 patients with in-plane resolution varying from 0.56² to 0.961² mm² and slice thickness of 1.6 to 4 mm. The forward simulation was conducted on the original 512×512 images, which were then downsampled to a 256×256 grid, yielding the ground truth (GT). We simulated a parallel-beam acquisition geometry for limited-angle (LA) and sparse-view (SV) scenarios. In the LA scenario we simulated angular ranges of 120°, 90°, and 60°, where projections were acquired in 1° increments. For the SV scenario we simulated a 180° range with 60, 30, and 15 uniformly-acquired projections. To simulate realistic acquisition noise, we follow [18] and employ a Poisson+Gaussian model, i.e. b = log( Poisson(I_0 exp(−b*)) + Gauss(0, σ_E) ), with I_0 = 2·10^4 and σ_E = 8·Unif(0, 1) to allow variable SNR. We used 20 patient scans for training and two for testing.

Table 1. Mean reconstruction RMSE computed on corresponding training sets, with standard deviations indicated in parentheses. Proposed methods are highlighted.

Dataset    | L2TV        | L1TV        | L2TGV       | VN0         | VN1         | PC-VN       | expPC-VN
LA-CT 90°  | 216 (9)     | 238 (16)    | 205 (6)     | 210 (9)     | 128 (15)    | 119 (19)    | 103 (8)
SV-CT 30   | 93 (5)      | 91 (4)      | 98 (6)      | 97 (4)      | 87 (8)      | 88 (9)      | 65 (5)
SoS USCT   | 1.48 (0.44) | 1.62 (0.48) | 1.89 (0.35) | 1.80 (0.33) | 1.35 (0.43) | 1.19 (0.41) | 1.00 (0.35)

US Speed-of-Sound Tomography. We follow [15] to simulate reflector-based USCT reconstruction with a 128-element transducer and a square imaging field-of-view. Synthetic inclusion masks were generated at 256×256 resolution as level sets of random, spatially-smooth functions. The inclusion SoS values were then randomly sampled from [1350, 1650] m/s. Acquisition noise was modelled as Gaussian with σ_N = 2·10^−8, and the reconstruction was conducted on a coarser 64×64 grid to avoid overfitting to the discretization. The training set contains 15000 random synthetic inclusions, while the test set includes 13 geometric primitives consisting of oval and polygonal inclusions.

Evaluation. For quantitative comparison of reconstruction and ground truth, we calculated the structural similarity index measure (SSIM) [16], root mean square error (RMSE), and peak signal-to-noise ratio (PSNR) as follows:

    RMSE(x, y) = √( ‖x − y‖_2^2 / N ),    PSNR(x, y) = 10 log_10( R^2 N / ‖x − y‖_2^2 ),          (4)

where R is the dynamic range of the ground-truth image. The corresponding values are reported in Hounsfield units (HU) and m/s for the X-ray CT and USCT SoS experiments, respectively. We denote the architectures employed in [3] and [15] as VN0 and VN1, respectively. As seen in Fig.
2, the proposed preconditioned network (PC-VN) with sinogram convolutions improves reconstruction accuracy and quality for all X-ray CT acquisition scenarios and for USCT, as suggested by RMSE and SSIM values. As reported in Tab. 1, training the proposed network using the exponentially weighted loss (expPC-VN) defined in Eq. (3) further improves reconstruction quality and reduces the variance of the error, which can be explained by the introduced regularization effect. Fig. 3 shows that expPC-VN outperforms iterative methods both in terms of accuracy and image quality. Namely, compared to nonlinear reconstruction methods, we observe an improvement in RMSE of 49% in the CT-LA-60° scenario and an increase in SSIM of 38% in the CT-SV-15 experiment. Quantitative results from Tab. 1 and Fig. 3 show that the proposed reconstruction method outperforms all considered iterative and deep-learning-based approaches.

Fig. 3. Reconstruction results for X-ray CT and USCT acquisitions. For sparse-view and limited-angle experiments, the acquired angular positions of the projections are depicted in the GT column.

In Fig. 4 (a), we present a USCT SoS reconstruction of ex vivo bovine skeletal muscle tissue embedded in a gelatin phantom. Compared to the conventional B-mode image, we could accurately identify the inclusion location and provide quantitative estimates of local tissue SoS. Reconstruction of a single image takes 0.03 s with the VN on an NVIDIA Titan Xp GPU, and 1-4 min with iterative methods on a 6-core 3.7 GHz Intel CPU. In order to demonstrate potential 3D imaging applications of our method, we also conducted X-ray CT reconstruction for the SV-60 acquisition scenario with test images rotated by 90° in the axial plane, and show cross-sectional views from the reconstructed 3D volume in Fig. 4 (b). We observe high spatial coherence and contrast in the coronal and sagittal planes, which asserts the high generalization ability of the proposed expPC-VN method.

Fig. 4. (a) expPC-VN SoS reconstruction of the ex vivo phantom and corresponding B-mode image. The red contour shows the inclusion segmentation from the B-mode image. (b) Axial, sagittal, and coronal views of the slice-wise reconstruction by the proposed expPC-VN network. Corresponding slice positions are indicated with red dashed lines.

4 Conclusions

In this paper we have presented a network architecture for preconditioned reconstruction and a regularization scheme for its efficient training via exponential weighting. The proposed network has been shown to outperform conventional algebraic and learning-based reconstruction methods in terms of accuracy and image quality for various challenging X-ray CT and SoS USCT scenarios. The effectiveness and versatility of our approach suggest its potential for solving other optimization and inverse problems outside of the image reconstruction field.

References

1. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in ML 3(1), 1–122 (2011)
2. Cheng, A., Kim, Y., Anas, E.M., Rahmim, A., Boctor, E.M., Seifabadi, R., Wood, B.J.: Deep learning image reconstruction method for limited-angle ultrasound tomography in prostate cancer. In: Procs SPIE Medical Imaging. p. 1095516 (2019)
3. Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock, T., Knoll, F.: Learning a variational network for reconstruction of accelerated MRI data.
MRM 79(6), 3055–3071 (2018)
4. Hammernik, K., Würfl, T., Pock, T., Maier, A.: A deep learning architecture for limited-angle computed tomography reconstruction. In: Bildverarbeitung für die Medizin 2017, pp. 92–97 (2017)
5. Knoll, F., Bredies, K., Pock, T., Stollberger, R.: Second order total generalized variation (TGV) for MRI. MRM 65(2), 480–491 (2011)
6. Landweber, L.: An iteration formula for Fredholm integral equations of the first kind. American Journal of Mathematics 73(3), 615–624 (1951)
7. Lin, H., Azuma, T., Unlu, M.B., Takagi, S.: Evaluation of adjoint methods in photoacoustic tomography with under-sampled sensors. In: MICCAI. pp. 73–81 (2018)
8. Liu, Y., Lew, M.S.: Learning relaxed deep supervision for better edge detection. In: CVPR. pp. 231–240 (2016)
9. Maier, A., Syben, C., Lasser, T., Riess, C.: A gentle introduction to deep learning in medical image processing. Zeitschrift für Medizinische Physik (2019)
10. Paige, C.C., Saunders, M.A.: LSQR: An algorithm for sparse linear equations and sparse least squares. ACM TOMS 8(1), 43–71 (1982)
11. Sanabria, S.J., Goksel, O.: Hand-held sound-speed imaging based on ultrasound reflector delineation. In: MICCAI. pp. 568–576 (2016)
12. Schwab, J., Antholzer, S., Haltmeier, M.: Learned backprojection for sparse and limited view photoacoustic tomography. In: Procs SPIE Photons Plus Ultrasound: Imaging and Sensing. p. 1087837 (2019)
13. Siewerdsen, J., Daly, M., Bachar, G., Moseley, D., Bootsma, G., Brock, K., Ansell, S., Wilson, G., Chhabra, S., Jaffray, D., et al.: Multimode C-arm fluoroscopy, tomosynthesis, and cone-beam CT for image-guided interventions: from proof of principle to patient protocols. In: Procs SPIE Medical Imaging. p. 65101A (2007)
14. Soler, L., Hostettler, A., Agnus, V., Charnoz, A., Fasquel, J., Moreau, J., Osswald, A., Bouhadjar, M., Marescaux, J.: 3D image reconstruction for comparison of algorithm database: A patient specific anatomical and medical image database. IRCAD, Strasbourg, France, Tech. Rep. (2010)
15. Vishnevskiy, V., Sanabria, S.J., Goksel, O.: Image reconstruction via variational network for real-time hand-held sound-speed imaging. MLMIR, pp. 120–128 (2018)
16. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., et al.: Image quality assessment: from error visibility to structural similarity. IEEE Trans Imag Proc 13(4), 600–612 (2004)
17. Würfl, T., Hoffmann, M., Christlein, V., Breininger, K., Huang, Y., Unberath, M., Maier, A.K.: Deep learning computed tomography: learning projection-domain weights from image domain in limited angle problems. IEEE Trans Med Imag 37(6), 1454–1463 (2018)
18. Zheng, X., Ravishankar, S., Long, Y., Fessler, J.A.: PWLS-ULTRA: An efficient clustering and learning-based approach for low-dose 3D CT image reconstruction. IEEE Trans Med Imag 37(6), 1498–1510 (2018)
