3DFPN-HS$^2$: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection


Authors: Jingya Liu, Liangliang Cao, Oguz Akin

Jingya Liu$^1$, Liangliang Cao$^{2,3}$, Oguz Akin$^4$, and Yingli Tian$^{1,*}$

$^1$ The City College of New York, New York, NY 10031
$^2$ UMass CICS, Amherst, MA 01002
$^3$ Google AI, New York, NY 10011
$^4$ Memorial Sloan-Kettering Cancer Center, New York, NY 10065

Abstract. Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans. Although many deep learning-based algorithms have made great progress in improving the accuracy of nodule detection, the high false positive rate is still a challenging problem which limits automatic diagnosis in routine clinical practice. In this paper, we propose a novel pulmonary nodule detection framework based on a 3D Feature Pyramid Network (3DFPN) that improves the sensitivity of nodule detection by employing multi-scale features to increase the resolution of nodules, as well as a parallel top-down path to transmit high-level semantic features to complement low-level general features. Furthermore, a High Sensitivity and Specificity (HS$^2$) network is introduced to eliminate falsely detected nodule candidates by tracking the appearance changes across continuous CT slices of each nodule candidate. The proposed framework is evaluated on the public Lung Nodule Analysis (LUNA16) challenge dataset. Our method accurately detects lung nodules at high sensitivity and specificity, achieving 90.4% sensitivity with 1/8 false positives per scan, which outperforms the state-of-the-art results by 15.6%.

Keywords: Lung Nodule Detection · False Positive Reduction · CT · Deep Learning

1 Introduction

Lung cancer is one of the leading cancer killers around the world, which makes the study of lung cancer diagnosis eminently crucial.
Computer-aided diagnosis (CAD) systems provide assistance for radiologists to accelerate the diagnostic process. Many efforts [3,6,11] have been made for lung nodule detection by generalizing recent powerful deep detection models from computer vision. Although these efforts made good progress in accurately detecting pulmonary nodules from CT scans, the false positive rate is still very high, which limits real application in routine clinical practice. For example, most of the previous work [3,6,11,9] obtained less than 75% sensitivity with 1/8 false positives per scan. To reach sensitivity scores as high as 95.8%, these models would bear about eight false positives per scan, which prevents their use in routine clinical practice.

We believe two main challenges prevent the existing models from accurate lung nodule detection. 1) Some normal tissues have similar morphological appearances to nodules in CT images, which causes high false positives when these tissues are wrongly detected as nodules. 2) The high discrepancy in volume between a nodule and the whole CT scan may cause real nodules to be missed. For example, the volume of a nodule 10mm in diameter occupies only 0.059% of the volume of a whole CT scan (on average 213 × 293 pixels with 281 slices). Furthermore, the size of pulmonary nodules can vary by as much as 10 times; nodule diameters range from 3mm to 30mm in the LUNA16 dataset. Therefore, it is particularly crucial to design methods which can detect small-volume nodules from large-volume CT scans as well as differentiate nodules from tissues with similar appearances in CT images.

To address the above two challenges, this paper integrates the most recent progress in computer vision with domain expert knowledge in medical imaging.

* Corresponding author. Email: ytian@ccny.cuny.edu
Motivated by the state-of-the-art image detector using the 2D Feature Pyramid Network (FPN) [7], we develop a 3D feature pyramid network (3DFPN) for small nodule detection by appending the low-level high-resolution features to the high-level strong semantic features, while multi-scale feature prediction guarantees wide-scale nodule detection. In addition, we carefully analyze the difference between nodules and the tissues that are wrongly detected as nodule candidates, and find that although they look similar on a single CT slice, their spatial variances across continuous CT slices are distributed differently. This unique insight motivates us to design a novel refinement network based on the location history image over continuous CT slices. The final model benefits from both the powerful deep network and medical imaging insights, and removes a significant amount of the previously falsely detected nodule candidates. Our model achieves a 90.4% sensitivity at 1/8 false positives per scan, which significantly outperforms the state-of-the-art method by 15.6%.

2 Our Method

As shown in Fig. 1, the input of our lung nodule detection framework is the whole 3D volume of a CT scan, which is fed into a 3D Feature Pyramid ConvNet (3DFPN) to detect the 3D locations of nodule candidates. After the nodule candidates are detected, we crop a 3D cube region centered at each candidate and develop a High Sensitivity and Specificity (HS$^2$) network to further recognize whether the detected nodule candidates are real nodules or falsely detected tissues, which effectively reduces false positives. In this framework, the 3DFPN benefits from the progress of state-of-the-art deep learning, and the HS$^2$ network benefits from insights of medical imaging. We will discuss them respectively.

Fig. 1.
The proposed 3DFPN-HS$^2$ framework for high sensitivity and specificity lung nodule detection combines a 3D Feature Pyramid ConvNet (3DFPN) with an HS$^2$ network. A whole CT scan is fed into the 3DFPN to predict nodule candidates. For the detected nodule candidates, the HS$^2$ network eliminates the mis-predicted normal tissues based on the location variance across continuous CT slices. A more detailed structure of the proposed 3DFPN network can be found in the supplementary document.

2.1 3D Feature Pyramid ConvNet

The recent progress in computer vision suggests that feature pyramid networks (FPNs) are good at detecting objects at different scales [7]. However, traditional FPNs are designed for 2D images. Here, we propose a 3DFPN network to detect 3D locations of lung nodules from 3D volumetric CT scans. Different from [7], which only concatenates the upper-level features in the feature pyramid, we use a dense pyramid network to integrate both the low-level high-resolution features and the high-level high-semantic features, which enriches the location details and strong semantics for nodule detection. Table 1 highlights the main differences between the 2DFPN and our 3DFPN.

Table 1. Comparison between the 2DFPN [7] and our proposed 3DFPN. We take a 3D volume as input, and the feature pyramid layers are integrated with lateral connections of all the high-level and low-level features.

| Method    | Input 3D volume | Lateral connections | Integrate upper layer | Integrate lower layer | Upsample higher layer | Downsample lower layer |
| 2DFPN [7] | No              | √                   | √                     |                       | √                     |                        |
| 3DFPN     | √               | √                   | √                     | √                     | √                     | √                      |

The bottom-up network extracts features from convolution layers 2–5, referred to as C2, C3, C4, and C5, each followed by a convolution layer with kernel size 1 × 1 × 1 to convert the feature maps to the same number of channels.
The feature pyramid network contains four layers, P2, P3, P4, and P5, which integrate the low-level features via a max pooling layer and the high-level features via deconvolution. The 3DFPN predicts each location with four parameters [x, y, z, d], where [x, y] are the spatial coordinates on each CT slice, z is the CT slice number, and d is the nodule diameter, along with a confidence score for each candidate.

Fig. 2. The proposed Location History Images (LHI) distinguish tissues from nodules among the predicted nodule candidates. (a) Similar appearance of true nodules (green boxes) and falsely detected tissues (orange boxes). (b) The orientations of the location variances for nodules and tissues present differently in LHIs. True nodules generally have a circular region indicating the spatial changes, with either a brighter center (when nodule sizes in the following CT slices are smaller) or a darker center (when nodule sizes in the following CT slices are bigger). On the other hand, the location variance of falsely detected tissues usually tends to change in certain directions, such as a gradually changing trajectory line.

2.2 HS$^2$ Network

Due to the low resolution and the noise of CT images, as shown in Fig. 2(a), some tissues (orange boxes) appear to have similar features to real nodules (green boxes) and are therefore very likely to be detected as nodule candidates. This leads to a large number of false positives. As shown in Table 2, we further analyzed 300 false positives predicted by the 3DFPN and observed that 241 false positives (FPs) are caused by the high similarity of tissues (80.3%), 33 are caused by inaccurate size detection, and 26 FPs are due to inaccurate location detection.

Table 2. Statistical analysis of false positive nodule candidates.
| Cause      | Tissue | Inaccurate Size | Inaccurate Location |
| Percentage | 80.3%  | 11%             | 8.7%                |

It is crucial to find a major difference that distinguishes similar tissues from nodules for false positive reduction. By observing continuous slices, we discover that for tissues, the orientation of the location changes can be tracked in certain patterns, while the variance of true nodules tends to expand beyond the contour or diminish toward the center across continuous CT slices. For instance, Fig. 2(b) shows the appearance of nodules and tissues across continuous slices. The gray-scale value of each pixel represents the most recent change of that pixel in the nodule region within a series of CT slices; with the gray value orthogonal to the movement, we obtain the location variance of the candidates.

Inspired by the Motion History Image (MHI) [2][10], we define the Location History Image (LHI) as f. For any pixel location (x, y) on a CT slice s, f(x, y, s) represents the intensity value of the LHI within the slice range (1, τ). The LHI is fed to an HS$^2$ (high sensitivity and specificity) network, which is a feed-forward neural network with 2 convolution layers followed by 3 fully connected layers. The outputs of the HS$^2$ network are refined predicted labels of true nodules and tissues. Sensitivity is defined as the ratio of true positives over the total number of true positives and false negatives. Specificity is the ratio of true negatives over the total number of true negatives and false positives. The intensity of the LHI is calculated as in Eq. (1):

$$
f(x, y, s) =
\begin{cases}
\tau & \text{if } \psi(x, y, s) = 1 \\
\max(0,\, f(x, y, s-1) - 1) & \text{otherwise}
\end{cases}
\tag{1}
$$

where the update function ψ(x, y, s) is given by thresholding the difference in pixel intensity between two continuous CT slices. The algorithm has the following steps.
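The LHI update in Eq. (1) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation; the function name, array layout, and the way the threshold realizes ψ are our assumptions:

```python
import numpy as np

def location_history_image(patches, tau, threshold):
    """Build a Location History Image from a stack of candidate patches.

    patches: float array of shape (S, H, W), one cropped patch per CT slice.
    A pixel whose intensity changes by more than `threshold` between two
    continuous slices is reset to tau (psi = 1); otherwise its LHI value
    decays by 1 per slice, floored at 0, as in Eq. (1).
    """
    patches = np.asarray(patches, dtype=np.float32)
    lhi = np.zeros_like(patches[0])
    for s in range(1, patches.shape[0]):
        psi = np.abs(patches[s] - patches[s - 1]) > threshold
        lhi = np.where(psi, float(tau), np.maximum(0.0, lhi - 1.0))
    return lhi
```

Pixels that changed recently thus carry high LHI values while stationary pixels fade to zero, which is what lets the network read off the orientation of the location variance.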
1) If |I(x, y, s) − I(x, y, s−1)| is larger than a threshold, ψ(x, y, s) = 1; otherwise, ψ(x, y, s) = 0.
2) For the current slice, if ψ(x, y, s) = 1, f = τ. Otherwise, if f(x, y, s) is not zero, it is attenuated by 1. If f(x, y, s) equals zero, it remains zero.
3) Repeat steps 1) and 2) until all the slices are processed.

Therefore, the location variance across continuous CT slices and its change patterns can be effectively represented by our proposed LHIs.

3 Experimental Results and Discussions

3.1 Dataset and Evaluation

The performance of the proposed framework is evaluated on the popular LUNA16 challenge dataset [1], which consists of 1186 nodules between 3 and 30 mm in size from 888 CT scans, each annotated by agreement of at least 3 out of 4 radiologists. It is divided into 10 subsets. To conduct a fair comparison with other methods, we follow the same cross-validation process, using 9 subsets for training and the remaining subset for testing, and obtain the final results by averaging over the 10 experiments. Data augmentation is applied by flipping and resizing the CT scans. As in other methods, Free-Response Receiver Operating Characteristic (FROC) analysis [8] and the Competition Performance Metric (CPM) of detection sensitivity at 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan are employed to measure performance. The CPM score is calculated as the average sensitivity over all levels of false positives per scan.

3.2 Experimental Results

Experimental Settings. The framework takes the whole CT scan as input, while a volume of 96 × 96 × 96 pixels is selected by a sliding-window method as the input of the 3DFPN network. This size is selected based on experiments; it is big enough to contain a whole nodule even at the largest size (about 30 mm).
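The CPM score defined in Sec. 3.1 is simply the mean sensitivity over the seven FP-per-scan levels; a minimal sketch (the function name is ours):

```python
def cpm_score(sensitivities):
    """Competition Performance Metric: the average detection sensitivity
    over the seven standard FP-per-scan levels (1/8, 1/4, 1/2, 1, 2, 4, 8)."""
    if len(sensitivities) != 7:
        raise ValueError("expected one sensitivity per FP level (7 values)")
    return sum(sensitivities) / 7.0
```

For example, averaging the seven sensitivities of the 3DFPN row in Table 3 gives approximately 0.919, matching its reported CPM column.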
Table 3. FROC performance comparison with the state of the art: sensitivity (recall) at 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan. Our 3DFPN-HS$^2$ method achieves the best performance (with > 90% sensitivity) at all false positive levels and significantly outperforms the others, especially at the low false positive levels (1/8 and 1/4).

| Methods               | 1/8   | 1/4   | 1/2   | 1     | 2     | 4     | 8     | CPM score |
| Dou et al. [4]        | 0.659 | 0.745 | 0.819 | 0.865 | 0.906 | 0.933 | 0.946 | 0.839     |
| Zhu et al. [11]       | 0.692 | 0.769 | 0.824 | 0.865 | 0.893 | 0.917 | 0.933 | 0.842     |
| Wang et al. [9]       | 0.676 | 0.776 | 0.879 | 0.949 | 0.958 | 0.958 | 0.958 | 0.878     |
| Ding et al. [3]       | 0.748 | 0.853 | 0.887 | 0.922 | 0.938 | 0.944 | 0.946 | 0.891     |
| Khosravan et al. [6]  | 0.709 | 0.836 | 0.921 | 0.953 | 0.953 | 0.953 | 0.953 | 0.897     |
| 3DFPN (Ours)          | 0.848 | 0.876 | 0.905 | 0.933 | 0.943 | 0.957 | 0.970 | 0.919     |
| 3DFPN-HS$^2$ (Ours)   | 0.904 | 0.914 | 0.933 | 0.957 | 0.971 | 0.971 | 0.971 | 0.952     |

The anchor sizes employed in our 3DFPN to obtain the candidate regions from feature maps are [$3^3$, $5^3$, $10^3$, $15^3$, $20^3$, $25^3$, $30^3$] pixels. For all the anchors, the corresponding regions obtained from all 3D feature map levels are gathered to predict the positions of nodules. In the training phase, regions with an Intersection-over-Union (IoU) with the ground-truth regions of less than 0.02 are treated as negative samples, and regions with an IoU greater than 0.4 as positive samples. Samples in between are ignored to avoid ambiguity between positive and negative samples. A classification layer is used to predict a confidence score for the candidate class, and a region regression layer is applied to learn the offset between the positions of the region proposals and the ground truth. We adopt the Smooth L1 loss [5] and the binary cross-entropy loss (BCE loss) for location regression and classification scoring, respectively. At test time, for each region proposal, a confidence score is calculated by the classification layer.
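The positive/negative sampling rule above can be sketched as follows; the function name, the ignore label of −1, and the NumPy realization are illustrative assumptions rather than the authors' code:

```python
import numpy as np

def assign_anchor_labels(ious, pos_thresh=0.4, neg_thresh=0.02):
    """Label candidate regions by their IoU with the ground-truth nodule box.

    Returns 1 for positives (IoU > pos_thresh), 0 for negatives
    (IoU < neg_thresh), and -1 for the in-between anchors, which are
    ignored during training.
    """
    ious = np.asarray(ious, dtype=np.float32)
    labels = np.full(ious.shape, -1, dtype=np.int64)  # ignored by default
    labels[ious > pos_thresh] = 1
    labels[ious < neg_thresh] = 0
    return labels
```

Anchors labeled −1 would contribute to neither the classification nor the regression loss.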
Proposals with a probability larger than 0.1 are chosen as nodule candidates. Non-maximum suppression is further applied to eliminate multiple predicted candidates for one nodule. The two convolution layers of the HS$^2$ network are set to (1, 30) and (30, 50) dimensions and are followed by three fully connected layers with channel sizes of (2048, 1024, 512). The cross-entropy loss is applied for classification during training. Image patches aligned with each predicted nodule candidate region, but with twice the size (in both x and y directions), are selected from 11 continuous CT slices (5 slices before and 5 after the current slice of the nodule candidate). The LHI of these patches is extracted and resized to 48 × 48 pixels as input to the HS$^2$ network. In training, the learning rate starts from 0.01 and is decreased to 1/10 every 500 epochs. A total of 2,000 epochs is conducted for the framework. The average prediction time for a whole CT scan is about 0.53 min/scan on a server with one GeForce GTX 1080 GPU using PyTorch 2.7.

Comparison with Other Methods. Table 3 shows the FROC evaluation results at the (1/8, 1/4, 1/2, 1, 2, 4, 8) false positive levels of our proposed method compared with state-of-the-art methods. The highlighted numbers in the table indicate the best performance within each column. All methods are tested on the LUNA16 dataset following the same FROC evaluation.

Fig. 3. Left: Comparison between the proposed 3DFPN and 3DFPN-HS$^2$ (with the High Sensitivity and Specificity network for false positive reduction). 3DFPN-HS$^2$ greatly improves the performance of the 3DFPN at all FP levels. Right: The number of false positives is reduced from 629 to 97 over a total of 88 CT scans with confidence scores above 0 after the HS$^2$ network is applied.
More visualized detection results are provided in the supplementary document.

As shown in Table 3, our framework outperforms the best result of the other methods by 5.5% in average sensitivity. In addition, the proposed framework achieves the best performance at every FP level. As previously mentioned, a CAD system requires not only high sensitivity but also high specificity. Table 3 demonstrates that the false positives are greatly reduced by the proposed HS$^2$ network. 3DFPN-HS$^2$ obtains the highest sensitivity of 97.14% at 2 FPs per scan. In addition, at the 1/8, 1/4, and 1/2 FP-per-scan levels, the proposed framework still maintains a high sensitivity above 90%. The experimental results show that 3DFPN-HS$^2$ reaches state-of-the-art performance for high sensitivity and specificity lung nodule detection.

Effectiveness of HS$^2$ for FP Reduction. Two experiments are conducted to demonstrate the advantages of the HS$^2$ network. As shown in Fig. 3(a), compared with the 3DFPN without the HS$^2$ network, the result of 3DFPN-HS$^2$ with false positive reduction is increased by more than 5% at the 1/8 FP level. In addition, the numbers of FPs with (blue bar) and without (orange bar) the HS$^2$ network for all the predicted nodule candidates from a total of 88 CT scans (subset 9) are further compared in Fig. 3(b). By applying HS$^2$, the 3DFPN-HS$^2$ is able to distinguish the falsely detected tissues from true nodules, and therefore significantly reduces FPs, by 84.5%. It is worth noting that our proposed 3DFPN without the HS$^2$ network still achieves 97% sensitivity at 8 FPs per scan and a 91.9% CPM, which surpasses the other state-of-the-art methods (see Table 3).

4 Conclusion

In this paper, we have proposed an effective framework, 3DFPN-HS$^2$, employing a 3D feature pyramid network with local and global feature enrichment for small-volume and multi-scale nodule detection.
The HS$^2$ network is introduced to reduce false positives based on the different patterns of location variance for nodules and tissues across continuous CT slices. The proposed framework significantly outperforms the state-of-the-art methods and achieves high sensitivity and specificity, giving it great potential in routine clinical practice.

Acknowledgements. This material is based upon work supported by the National Science Foundation under award number IIS-1400802 and Memorial Sloan-Kettering Cancer Center Support Grant/Core Grant P30 CA008748. Oguz Akin, MD serves as a scientific advisor for Ezra AI, Inc., which is developing artificial intelligence algorithms and software unrelated to the research being reported.

References

1. Setio, A.A.A., Traverso, A., de Bel, T., Berens, M.S.N., van den Bogaard, C., Cerello, P., Chen, H., Dou, Q., Fantacci, M.E., Geurts, B., et al.: Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge. Medical Image Analysis 42, 1–13 (2017)
2. Davis, J.W.: Hierarchical motion history images for recognizing human motion. In: Proceedings IEEE Workshop on Detection and Recognition of Events in Video, pp. 39–46 (2001)
3. Ding, J., Li, A., Hu, Z., Wang, L.: Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 559–567 (2017)
4. Dou, Q., Chen, H., Jin, Y., Lin, H., Qin, J., Heng, P.A.: Automated pulmonary nodule detection via 3D ConvNets with online sample filtering and hybrid-loss residual learning. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 630–638 (2017)
5. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
6. Khosravan, N., Bagci, U.: S4ND: Single-shot single-scale lung nodule detection. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 794–802 (2018)
7. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
8. Setio, A.A., Ciompi, F., Litjens, G., Gerke, P., Jacobs, C., van Riel, S., Winkler Wille, M.M., Naqibullah, M., Sánchez, C., van Ginneken, B.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Transactions on Medical Imaging 35(5), 1160–1169 (2016)
9. Wang, B., Qi, G., Tang, S., Zhang, L., Deng, L., Zhang, Y.: Automated pulmonary nodule detection: High sensitivity with few candidates. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 759–767. Springer (2018)
10. Yang, X., Zhang, C., Tian, Y.: Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 1057–1060 (2012)
11. Zhu, W., Liu, C., Fan, W., Xie, X.: DeepLung: Deep 3D dual path nets for automated pulmonary nodule detection and classification. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 673–681 (2018)
