A Splitting-Based Iterative Algorithm for GPU-Accelerated Statistical Dual-Energy X-Ray CT Reconstruction
Fangda Li, Ankit Manerikar, Tanmay Prakash, Avinash Kak
School of Electrical and Computer Engineering, Purdue University
{li1208, amanerik, tprakash, kak}@purdue.edu

ABSTRACT

When dealing with material classification in baggage at airports, Dual-Energy Computed Tomography (DECT) allows characterization of any given material with coefficients based on two attenuative effects: Compton scattering and photoelectric absorption. However, straightforward projection-domain decomposition methods for this characterization often yield poor reconstructions due to the high dynamic range of material properties encountered in an actual luggage scan. Hence, for better reconstruction quality under a timing constraint, we propose a splitting-based, GPU-accelerated, statistical DECT reconstruction algorithm. Compared to prior art, our main contribution lies in the significant acceleration made possible by separating reconstruction and decomposition within an ADMM framework. Experimental results, on both synthetic and real-world baggage phantoms, demonstrate a significant reduction in the time required for convergence.

Index Terms: DECT, ADMM, GPU

1. INTRODUCTION

X-ray Computed Tomography (CT) is a widely deployed technique for threat detection in baggage at airport security checkpoints. However, using a single X-ray spectrum limits the reconstruction to only that of LAC (linear attenuation coefficient) or HU (Hounsfield Unit) images, which are, at best, an approximation to the underlying energy-dependent characteristics of the materials. While LAC might suffice for discriminating between different types of tissues in medical imaging, in the adversarial case of airport baggage scanning, materials commonly used for homemade explosives can possess LAC values that are nearly the same as those of benign materials.
As a result, the technique of DECT, in which projections from two X-ray spectra are collected simultaneously at each angle, has been proposed for material discrimination: it allows the material property to be recovered in an additional dimension by using an energy-dependent attenuation model. (This research is funded by the BAA 17-03 AATR contract with the Department of Homeland Security.)

The most commonly used DECT model represents X-ray attenuation as the combined effect of Compton scattering and photoelectric absorption, which can be written as:

    \mu(E, x) = x_c f_{KN}(E) + x_p f_p(E),    (1)

where f_{KN}(E) and f_p(E) denote the energy-dependent multipliers (see [1] for their detailed formulation) to the Compton coefficient x_c and the photoelectric (PE) coefficient x_p, respectively. The goal of DECT is therefore to recover both the Compton and the PE coefficients of the object using the projections m_h, m_l \in R^M measured at two different X-ray spectra. This amounts to solving simultaneously for x_c, x_p \in R^N from the following two equations, written compactly as:

    m_{h/l} = -ln \int S_{h/l}(E) e^{-R\mu(E)} dE + ln \int S_{h/l}(E) dE,    (2)

where R denotes the forward projection matrix, and S_h(E) and S_l(E) denote, respectively, the photon distribution across energy levels in the high- and low-energy spectra.

Solving the two equations shown above is made challenging by the fact that the two phenomena that contribute to X-ray attenuation, Compton and PE, occur at grossly different scales. At most applicable energy levels, Compton scattering is the dominant contributor to attenuation, and this disparity between the two phenomena worsens with increasing energy since f_p(E) decays cubically with E. As a result, stable recovery of the PE coefficients requires more sophisticated inversion algorithms, such as those described in [2, 3, 4, 5].
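To make the forward model concrete, the following sketch evaluates (1)-(2) numerically for a handful of rays. The bell-shaped spectrum S(E) and the coefficient values are toy stand-ins, not calibrated quantities; f_p follows the 1/E^3 decay noted above and f_kn uses the (unnormalized) Klein-Nishina form.

```python
import numpy as np

E = np.linspace(20.0, 130.0, 111)   # energy grid in keV
dE = E[1] - E[0]

def f_p(E):
    # Photoelectric multiplier: decays as 1/E^3.
    return 1.0 / E**3

def f_kn(E):
    # Klein-Nishina function of alpha = E / 510.975 keV (electron rest energy),
    # up to a constant factor.
    a = E / 510.975
    t1 = (1 + a) / a**2 * (2 * (1 + a) / (1 + 2 * a) - np.log(1 + 2 * a) / a)
    t2 = np.log(1 + 2 * a) / (2 * a)
    t3 = -(1 + 3 * a) / (1 + 2 * a) ** 2
    return t1 + t2 + t3

def project(S, Rxc, Rxp):
    """Polychromatic log-projection m of Eq. (2) for line integrals Rxc, Rxp."""
    # Attenuation line integral at each energy, for each ray: shape (rays, energies).
    mu = np.outer(Rxc, f_kn(E)) + np.outer(Rxp, f_p(E))
    I = np.sum(S * np.exp(-mu), axis=1) * dE   # transmitted intensity per ray
    I0 = np.sum(S) * dE                        # open-beam intensity
    return -np.log(I) + np.log(I0)

S = np.exp(-0.5 * ((E - 70.0) / 20.0) ** 2)    # toy bell-shaped spectrum
m = project(S, Rxc=np.array([0.5, 1.0]), Rxp=np.array([1e4, 2e4]))
```

Since exp(-mu) <= 1 for non-negative coefficients, the resulting log-projections are non-negative and grow with the line integrals, which is what makes (2) invertible in principle.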
Unfortunately, the algorithms cited above tend to be iterative and, with run-of-the-mill computing hardware, take a long time to return results. A straightforward implementation of these algorithms is therefore not appropriate for the end goal that motivates our research: high-throughput baggage screening at airports. The focus of this paper is on improving the computational efficiency of an ADMM-based statistical inversion algorithm.

2. RELATED WORK

DECT involves two key steps: dual-energy decomposition and tomographic reconstruction. The two tasks can be done either sequentially, as in projection-wise decomposition, or in a unified step using iterative statistical approaches.

Projection-wise Decomposition: One of the earliest approaches for DECT decomposition, the Constrained Decomposition Method (CDM) [1], directly decomposes the dual-energy projections into Compton and PE line integrals, followed by FBP reconstructions of the two. In [6], CDM was also extended to Multi-Energy CT. A major disadvantage of this method is that it guards poorly against artifacts, especially in the PE coefficients, but it remains a preferred approach because it enables parallel implementation.

Iterative Statistical Approaches: The statistical methods for DECT solve for the MAP estimates, finding the Compton and PE coefficients that best correspond to the measurements and any prior knowledge. The literature on Multi-Energy CT has focused largely on designing the models and priors that best leverage the structural similarity across bases. In [4], Compton/PE images are reconstructed on a set of exponential basis functions with an edge-correlation penalty term. This idea of encouraging geometric similarity between the more stable Compton image and the PE image is further explored in [5].
For this purpose, the authors proposed a Non-Local Means (NLM) regularizer on the PE image and laid out an ADMM formulation that scales up to problem sizes of practical interest. By treating the energy level as a third dimension, the contributions in [2, 3] adopt a tensor-based model for reconstruction using sparse priors.

Despite being able to produce high-quality DECT, statistical reconstructions generally fail drastically with regard to the timing constraints of practical applications. Compared to LAC reconstructions, DECT has to deal with the added computational burden associated with the decomposition step. As a result, in existing approaches, solving decomposition and reconstruction in one step becomes inefficient due to the combined complexity and high dimensionality of the problem.

We therefore propose a statistical DECT approach that combines the best of the projection-wise decomposition methods and the iterative statistical methods. More specifically, we employ a splitting-based MAP estimation method embedded in an ADMM framework. As we will show in Section 4, our new splitting scheme not only provides a better convergence rate, but also allows for powerful hardware-enabled acceleration.

3. PROPOSED METHOD

3.1. Problem Formulation

To describe our proposed method, we first define the forward model for X-ray attenuation, i.e., the nonlinear transform f(.) from Compton/PE coefficients to logarithmic projections:

    f_{h/l}(\tilde{R}x) = C_{h/l} - ln \int S_{h/l}(E) e^{-f_{KN}(E) R x_c - f_p(E) R x_p} dE,    (3)

where C_{h/l} are constants, x = [x_c^T, x_p^T]^T, and \tilde{R} = diag(R, R).
Given a pair of line-integral measurements m = [m_h^T, m_l^T]^T corrupted by Poisson photon noise and possibly other artifacts, we construct a MAP estimate with a Total Variation (TV) term and a non-negativity prior:

    \hat{x}_{MAP} = argmin_x (1/2) ||f(\tilde{R}x) - m||_\Sigma^2 + \lambda |x|_{TV} + g(x),    (4)

where \Sigma is a diagonal matrix with photon counts as its diagonal elements, as proposed in [7]; g is the indicator of the non-negative orthant, with g(x_i) = 0 for x_i \in R_+ and g(x_i) = +\infty otherwise; and \lambda is the regularization parameter. While the unconstrained optimization in (4) is highly inefficient to solve directly, the Alternating Direction Method of Multipliers (ADMM) provides a flexible splitting-based framework for both optimality and convergence [5].

3.2. Formulation for ADMM

ADMM begins with the conversion of (4) to its constrained equivalent. By introducing two new auxiliary variables y and z for the TV and non-negativity terms, respectively, and by posing x as the primal variable, the MAP minimization can be transformed into:

    \hat{x}_{MAP} = argmin_{x,y,z} (1/2) ||f(\tilde{R}x) - m||_\Sigma^2 + \lambda |y| + g(z),
    s.t. [y_{c/p}; z_{c/p}] = [D; I] x_{c/p} = C x_{c/p},    (5)

where D denotes the finite-difference operator. The corresponding Augmented Lagrangian (AL) can then be written as:

    L(x, y, z) = (1/2) ||f(\tilde{R}x) - m||_\Sigma^2 + \lambda |y| + g(z) + \sum_{\beta \in \{c,p\}} (\rho_\beta / 2) ||[y_\beta; z_\beta] - C x_\beta + u_\beta||^2,    (6)

where u denotes the dual variable, or the scaled Lagrange multiplier, and \rho denotes the penalty parameter. ADMM splits the minimization of the AL into subproblems that are solved separately in an iterative framework. One such intuitive splitting scheme, proposed in [5], carries out the iterative updates according to (7) through (11) below. First, the primal variables are updated using:

    \hat{x}_c = argmin_{x_c} (1/2) ||f(R x_c, R x_p) - m||_\Sigma^2 + (\rho_c / 2) ||[y_c; z_c] - C x_c + u_c||^2,    (7)

    \hat{x}_p = argmin_{x_p} (1/2) ||f(R \hat{x}_c, R x_p) - m||_\Sigma^2 + (\rho_p / 2) ||[y_p; z_p] - C x_p + u_p||^2.    (8)
Note that the update to x_p is made subsequent to \hat{x}_c, since we can expect \hat{x}_c to stabilize the recovery of the more noise-prone x_p. Second, we update the auxiliary variables y and z, corresponding to the TV and non-negativity terms, respectively, using:

    \hat{y}_{c/p} = shrinkage(D \hat{x}_{c/p} - u^y_{c/p}, \lambda_{c/p} / \rho_{c/p}),    (9)

    \hat{z}_{c/p} = max(0, \hat{x}_{c/p} - u^z_{c/p}),    (10)

where the shrinkage function is defined in [8, Eq. 9.35]. Then the dual variable u is updated with:

    \hat{u}_{c/p} = u_{c/p} + [\hat{y}_{c/p}; \hat{z}_{c/p}] - C \hat{x}_{c/p}.    (11)

Lastly, we update \rho adaptively using a method described in [9] that is based on the primal and dual residuals. While (9)-(11) can be realized by straightforward element-wise operations, solving (7)-(8) requires a nonlinear least-squares algorithm such as Levenberg-Marquardt (LM). Despite its robustness, LM is a sequential algorithm, offering little room for parallelization. Intuitively, the overall computational inefficiency can be attributed to (7) and (8), each of which addresses at once two problems of very different nature: nonlinear dual-energy decomposition and ordinary linear tomographic reconstruction.

3.3. Proposed Splitting Scheme

For the sped-up implementation presented in this paper, we further decompose (7)-(8) into two simpler subproblems: tomographic reconstruction followed by dual-energy decomposition. This is achieved by viewing measurement fitting as an additional constraint, a = \tilde{R}x, where a is a new auxiliary variable introduced for that purpose. Incorporating this constraint with the others, for the first subproblem we pose tomographic reconstruction as an unweighted, unconstrained least-squares problem:

    \hat{x}_{c/p} = argmin_{x_{c/p}} (\rho_{c/p} / 2) || [a_{c/p}; y_{c/p}; z_{c/p}] - [R; C] x_{c/p} + [u^a_{c/p}; u^y_{c/p}; u^z_{c/p}] ||^2,    (12)

and we can now deal separately with the decomposition, i.e., solving for a, in the following subproblem.
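The element-wise updates (9)-(11) map directly onto array operations. A minimal NumPy sketch for a single basis (Compton or PE), treating the image as a flat vector and D as a finite-difference matrix (the tiny 1-D example at the bottom is illustrative only):

```python
import numpy as np

def shrinkage(v, kappa):
    # Soft thresholding: the proximal operator of the l1 norm (cf. [8, Eq. 9.35]).
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def auxiliary_and_dual_updates(x_hat, D, u_y, u_z, lam, rho):
    """One pass of Eqs. (9)-(11) for one basis.

    x_hat : current primal image estimate, flattened
    D     : finite-difference (gradient) operator as a matrix
    """
    y_hat = shrinkage(D @ x_hat - u_y, lam / rho)   # Eq. (9): TV prox
    z_hat = np.maximum(0.0, x_hat - u_z)            # Eq. (10): non-negativity
    # Eq. (11): dual ascent on the stacked constraint [y; z] = [D; I] x
    u_y_new = u_y + y_hat - D @ x_hat
    u_z_new = u_z + z_hat - x_hat
    return y_hat, z_hat, u_y_new, u_z_new

# Tiny 1-D example: forward differences on a 4-pixel signal.
n = 4
D = np.eye(n, k=1) - np.eye(n)
x = np.array([0.0, 1.0, 1.0, 3.0])
y, z, uy, uz = auxiliary_and_dual_updates(x, D, np.zeros(n), np.zeros(n),
                                          lam=0.1, rho=1.0)
```

Every operation here is element-wise (or a sparse matrix-vector product), which is why these steps are cheap compared to the primal updates.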
We have chosen a CG solver for the minimization shown above because our system is huge and sparse. Additionally, since the projection-wise weights \Sigma are now absorbed into the decomposition step described below, the linear system corresponding to (12) is shift-invariant. Therefore, for an improved convergence rate, each CG iteration lends itself well to preconditioning with a high-pass filter [10]. In our implementation, we experimented with the ramp filter as the preconditioner.

The second subproblem, that of dual-energy decomposition, can be solved by finding the pair of Compton/PE coefficients that minimizes the cost function at each ray projection independently. The decomposition is therefore achieved with the Unconstrained Decomposition Method (UDM):

    [\hat{a}_c; \hat{a}_p] = argmin_{a_c, a_p} (1/2) ||f([a_c; a_p]) - m||_\Sigma^2 + \sum_{\beta \in \{c,p\}} (\rho_\beta / 2) ||a_\beta - R \hat{x}_\beta + u^a_\beta||^2.    (13)

Table 1. Minimum number of operations per ADMM iteration. In LM(m)xCG(n), n CG iterations are taken to compute the update per LM iteration. Note that LM may require more operations than listed to tune the damping parameter.

                    |  R        |  R^T      |  f(.)
    LM(m)xCG(n)     |  2m(n+1)  |  2m(n+1)  |  4m
    UDM(m)-PCG(n)   |  2n       |  2(n+1)   |  4m

Compared to (7) and (8), the update formulas in (12) and (13) have the following advantages. First, treating reconstruction and decomposition separately allows us to use two entirely different algorithms, each tailored to its subproblem, which in our case are Preconditioned CG (PCG) and UDM. As an additional benefit, while LM is still used in UDM, backtracking of the damping parameter, which is necessary for dealing with nonlinear optimization, no longer involves expensive tomographic operations. Lastly, computational efficiency is improved both in the number of operations required, as listed in Table 1, and in parallelization.
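Because (13) decouples across rays, each ray reduces to a two-variable nonlinear least-squares problem. The paper solves these in parallel on the GPU with Gpufit; the structure can be sketched serially with SciPy as below. The energy multipliers, spectra, and coefficient values are toy stand-ins (in particular, f_kn here is not the real Klein-Nishina function), and udm_ray is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import least_squares

E = np.linspace(20.0, 130.0, 56)
dE = E[1] - E[0]
f_kn = 1.0 / (1.0 + E / 511.0)   # toy stand-in for the Klein-Nishina multiplier
f_p = 1.0 / E**3                 # photoelectric multiplier ~ 1/E^3

def f_ray(a_c, a_p, S):
    """Toy polychromatic log-projection for one ray (cf. Eq. (3))."""
    I = np.sum(S * np.exp(-a_c * f_kn - a_p * f_p)) * dE
    return -np.log(I) + np.log(np.sum(S) * dE)

def udm_ray(m_hl, Rx, u_a, S_h, S_l, w, rho):
    """Eq. (13) restricted to a single ray: a 2-variable nonlinear LSQ."""
    def residuals(a):
        r_fit = np.sqrt(w) * np.array([f_ray(a[0], a[1], S_h) - m_hl[0],
                                       f_ray(a[0], a[1], S_l) - m_hl[1]])
        r_reg = np.sqrt(rho) * (a - Rx + u_a)   # coupling to reconstruction step
        return np.concatenate([r_fit, r_reg])
    # Warm-start from the reconstruction estimate; LM as in the paper.
    return least_squares(residuals, x0=Rx - u_a, method="lm").x

# Round trip on synthetic data: decompose measurements made from known coefficients.
S_h = np.exp(-0.5 * ((E - 85.0) / 15.0) ** 2)
S_l = np.exp(-0.5 * ((E - 60.0) / 15.0) ** 2)
a_true = np.array([0.8, 2.0e5])
m_hl = np.array([f_ray(a_true[0], a_true[1], S_h),
                 f_ray(a_true[0], a_true[1], S_l)])
a_hat = udm_ray(m_hl, Rx=a_true, u_a=np.zeros(2), S_h=S_h, S_l=S_l,
                w=1.0, rho=1e-8)
```

Since every ray's problem is independent and tiny, this is exactly the shape of workload that batched GPU curve fitting handles well.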
We can now not only execute (12) independently for both bases, but also leverage massively parallel hardware such as GPUs to solve (13).

4. EXPERIMENTAL RESULTS

We now present qualitative and quantitative results on three phantoms: a simulated phantom (sim18), a real-world water bottle phantom (Water), and an actual baggage phantom (Clutter). We have implemented the following algorithms and evaluated them on these three phantoms: (i) CDM-FBP: CDM [1] with 16 CPU threads; (ii) LM(m)xCG(n): ADMM as in [5]; and (iii) UDM(m)-(P)CG(n): the proposed ADMM method with m UDM iterations and n (P)CG iterations. Our Python implementations use the ASTRA toolbox [11] for GPU-based calculations of forward and backward projections. Additionally, as a key factor for acceleration, we use Gpufit [12] to parallelize the decomposition step. All experiments are carried out on a single computing-cluster node with 16 cores, 20 GB RAM, and an Nvidia 1080Ti GPU. Regarding initialization, we have chosen the CDM output as the initial Compton estimate, since it is generally stable. For the initial PE coefficients, we use a scaled version of the Compton estimate, similar to what is done in [5]. For both Compton and PE, we used \lambda = 10^-5 as the TV regularization parameter and \rho = 10^-3 as the initial penalty parameter. Furthermore, for scale-invariant evaluation, quantitative reconstruction quality is assessed with the normalized l2-distance between x and the ground truth x*:

    \xi(x) = 20 log_10 ( ||x - x*||_2 / ||x*||_2 ).    (14)

Fig. 1. Compton and PE reconstructions by CDM-FBP and UDM1-PCG1. Note that we clipped the values in the PE images of sim18 to show the streaking artifacts in the CDM-FBP reconstruction. Such artifacts, appearing also in the CDM-FBP reconstructions of Water and Clutter, are significantly suppressed by the UDM1-PCG1 method.

Fig. 2. \xi(x) vs. iteration for PE, and average seconds per iteration, for (a) sim18 and (b) Water. The same plots for Compton are omitted since they exhibit similar trends.

The first phantom, sim18, contains seven circular regions of different materials. The reconstructed images, of size 512x512, are constructed from 720 angles, each containing 725 parallel-ray projections. The first row of Figure 1 shows the reconstructions by CDM-FBP and the proposed UDM-PCG algorithm. In Figure 2a, we display the average computational time per iteration inside parentheses in the inset box and plot \xi(x) versus iteration to compare the convergence rates. In general, the ADMM algorithm with the proposed splitting scheme requires a significantly shorter total execution time than LMxCG to reach the same error.

For the two real-world phantoms, the parallel-beam projection data was collected on the Imatron C300 CT scanner. The X-ray source emits 1.8x10^5 and 1.7x10^5 photons per ray with two energy spectra at 95 keV and 130 keV, respectively. The high- and low-energy sinograms are subsampled by two, resulting in 360 angles, each with 512 bins. The reconstruction results for the Water phantom are shown in the second row of Figure 1. For quantitative evaluation, in Figure 2b we plot \xi(x) versus iteration within the ROI, the central circular region occupied by distilled water. Our proposed UDM-PCG algorithm is significantly more efficient than LMxCG, not only in the number of iterations needed but also in the time per iteration.

The real-world baggage phantom Clutter is particularly challenging because it contains metallic objects, and it illustrates well the need for a statistical reconstruction.
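As an aside, the error metric in (14) is simply a normalized l2-distance expressed in decibels; a minimal implementation:

```python
import numpy as np

def xi(x, x_star):
    """Normalized l2-distance in dB between a reconstruction and ground truth (Eq. 14)."""
    return 20.0 * np.log10(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))

# A reconstruction whose error norm is 10% of the ground-truth norm scores -20 dB.
x_star = np.ones(100)
score = xi(1.1 * x_star, x_star)
```

More negative values of \xi mean a better reconstruction, and the metric is invariant to an overall rescaling of both images.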
As shown in the third row of Figure 1, while the CDM-FBP PE reconstruction is completely overshadowed by streaking artifacts, UDM1-PCG1 is able to recover the object shapes to a reasonable degree.

5. CONCLUSIONS AND FUTURE WORK

In this paper, we have proposed a new splitting scheme for implementing ADMM to reconstruct Compton and PE coefficient images from dual-energy projection data. By separating the reconstruction and decomposition steps, the proposed GPU-accelerated ADMM algorithm achieves a significant speedup when compared to the prior state of the art. Future work will be aimed at generalization to 3D reconstruction and at improving initialization heuristics.

6. REFERENCES

[1] Zhengrong Ying, Ram Naidu, and Carl R. Crawford, "Dual energy computed tomography for explosive detection," Journal of X-ray Science and Technology, vol. 14, no. 4, pp. 235-256, 2006.
[2] Oguz Semerci, Ning Hao, Misha E. Kilmer, and Eric L. Miller, "Tensor-based formulation and nuclear norm regularization for multienergy computed tomography," IEEE Transactions on Image Processing, vol. 23, no. 4, pp. 1678-1693, 2014.
[3] Yanbo Zhang, Xuanqin Mou, Ge Wang, and Hengyong Yu, "Tensor-based dictionary learning for spectral CT reconstruction," IEEE Transactions on Medical Imaging, vol. 36, no. 1, pp. 142-154, 2017.
[4] Oguz Semerci and Eric L. Miller, "A parametric level-set approach to simultaneous object identification and background reconstruction for dual-energy computed tomography," IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2719-2734, 2012.
[5] Brian H. Tracey and Eric L. Miller, "Stabilizing dual-energy x-ray computed tomography reconstructions using patch-based regularization," Inverse Problems, vol. 31, no. 10, p. 105004, 2015.
[6] Yaoshen Yuan, Brian Tracey, and Eric Miller, "Robust x-ray based material identification using multi-energy sinogram decomposition," in Anomaly Detection and Imaging with X-Rays (ADIX), International Society for Optics and Photonics, 2016, vol. 9847, p. 98470V.
[7] Charles A. Bouman and Ken Sauer, "A unified approach to statistical tomography using coordinate descent optimization," IEEE Transactions on Image Processing, vol. 5, no. 3, pp. 480-492, 1996.
[8] Charles A. Bouman, "Model Based Image Processing," 2013.
[9] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, 2011.
[10] Jeffrey A. Fessler and Scott D. Booth, "Conjugate-gradient preconditioning methods for shift-variant PET image reconstruction," IEEE Transactions on Image Processing, vol. 8, no. 5, pp. 688-699, 1999.
[11] Wim van Aarle, Willem Jan Palenstijn, Jeroen Cant, Eline Janssens, Folkert Bleichrodt, Andrei Dabravolski, Jan De Beenhouwer, K. Joost Batenburg, and Jan Sijbers, "Fast and flexible x-ray tomography using the ASTRA toolbox," Optics Express, vol. 24, no. 22, pp. 25129-25147, 2016.
[12] Adrian Przybylski, Björn Thiel, Jan Keller-Findeisen, Bernd Stock, and Mark Bates, "Gpufit: An open-source toolkit for GPU-accelerated curve fitting," Scientific Reports, vol. 7, no. 1, p. 15722, 2017.