Multi-task Localization and Segmentation for X-ray Guided Planning in Knee Surgery
X-ray based measurement and guidance are commonly used tools in orthopaedic surgery to facilitate a minimally invasive workflow. Typically, a surgical planning is first performed using knowledge of bone morphology and anatomical landmarks. Informatio…
Authors: Florian Kordon (1, 2, 3)
Florian Kordon 1,2,3, Peter Fischer 1,2, Maxim Privalov 4, Benedict Swartman 4, Marc Schnetzke 4, Jochen Franke 4, Ruxandra Lasowski 3, Andreas Maier 1, and Holger Kunze 2

1 Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany (florian.kordon@fau.de)
2 Advanced Therapies, Siemens Healthcare GmbH, Forchheim, Germany
3 Faculty of Digital Media, Hochschule Furtwangen, Furtwangen, Germany
4 Department for Trauma and Orthopaedic Surgery, BG Trauma Center Ludwigshafen, Ludwigshafen, Germany

Abstract. X-ray based measurement and guidance are commonly used tools in orthopaedic surgery to facilitate a minimally invasive workflow. Typically, a surgical planning is first performed using knowledge of bone morphology and anatomical landmarks. Information about bone location then serves as a prior for registration during overlay of the planning on intra-operative X-ray images. Performing these steps manually however is prone to intra-rater/inter-rater variability and increases task complexity for the surgeon. To remedy these issues, we propose an automatic framework for planning and subsequent overlay. We evaluate it on the example of femoral drill site planning for medial patellofemoral ligament reconstruction surgery. A deep multi-task stacked hourglass network is trained on 149 conventional lateral X-ray images to jointly localize two femoral landmarks, to predict a region of interest for the posterior femoral cortex tangent line, and to perform semantic segmentation of the femur, patella, tibia, and fibula with adaptive task complexity weighting.
On 38 clinical test images, the framework achieves a median localization error of 1.50 mm for the femoral drill site and mean IOU scores of 0.99, 0.97, 0.98, and 0.96 for the femur, patella, tibia, and fibula, respectively. The demonstrated approach consistently performs surgical planning at expert-level precision without the need for manual correction.

Keywords: Landmark localization · Multi-label bone segmentation · MPFL · X-ray guidance · Orthopaedics · Surgical planning.

1 Introduction

In orthopaedics, X-ray imaging is frequently used to facilitate planning and operative guidance for surgical interventions. By capturing patient-specific characteristics and contextual information prior to and during the procedure, such image-based tools benefit a more reliable and minimally invasive workflow at reduced risk for the patient. To this end, typical assessment involves geometric measurements of patient anatomy, verification of correct positioning of surgical tools and implants, as well as navigational guidance with help of anatomical landmarks and bone morphology. In current clinical practice, several methodologies have been established which leverage this toolset to standardize routine procedures. One example is the Schoettle planning methodology for reconstruction surgery of a ruptured medial patellofemoral ligament (MPFL) [8]. To restore the anatomically correct biomechanics and to forestall recurrent injuries, the optimal fixation area on the femur is approximated by the Schoettle Point, which can be derived from several osseous landmarks (Fig. 1). Unfortunately, execution of such a methodology faces several clinical and technical challenges [5,9].
First, many orthopaedic surgeries target anatomical regions which are not directly inferable from the image but rely on auxiliary structures derived from anatomical landmarks, leading to inter-rater and intra-rater differences. Secondly, the overlay of the planning result on subsequent intra-operational images requires registration to compensate for motion which should be restricted to the anatomical region of interest (ROI), in the case of MPFL, the femoral bone. And lastly, manual intra-operational planning and interaction with a guidance application in a sterile setting are disruptive in the doctor's surgical workflow.

Fig. 1. Approximation of the Schoettle Point $p_{sp}$ [8] as the center of the inner circle of three lines. These lines can be derived from osseous landmarks on a lateral radiograph.

Using MPFL reconstruction as an example, we present a framework which allows fully-automatic localization of anatomical landmarks, semantic segmentation of bone structures, and prediction of ROIs for geometric line features on X-ray images. Building upon the ideas of [1], we exploit recent advances in sequential deep learning architectures in form of deep stacked hourglass networks (SHGN) [7] to refine predictions based on the learned residual information between the ground truth and intermediate estimates. We propose an extension to a multi-task learning approach to incorporate cross-task information for an enriched and more general feature set, which proved to be beneficial in X-ray based segmentation tasks [2]. To automatically weight the single task loss terms, our framework introduces a novel adaptation of gradient normalization [3] for stacked network architectures by integrating it with a deeply supervised optimization scheme.
We evaluate this approach for femoral attachment site planning in MPFL reconstruction surgery, which is a clinically relevant and common procedure. We demonstrate expert-level performance of our proposed solution with a comprehensive evaluation including clinical data and an inter-rater study with multiple surgeons. The achieved results enable direct integration into the operative workflow and in almost all cases allow the number of manual planning steps to be limited to the confirmation of the planning proposal, so that the surgeon can remain sterile throughout the procedure.

2 Methods

2.1 Multi-task Stacked Hourglass Network

A SHGN is a multi-stage convolutional network architecture which sequentially arranges $l = 1, 2, \ldots, L$ symmetrical Fully Convolutional Networks referred to as hourglass modules (HG) [7]. By cascaded inference, several iterations of bottom-up and top-down processing of data and features are performed to capture and combine the input morphology at various scales and abstraction levels. At the end of the expanding path of each HG, features are fed into an additional bottleneck residual unit before being distributed for individual task processing. For each task $t = 1, 2, \ldots, T$ we introduce a separate prediction module to facilitate task-specific discriminative power, allow for intermediate estimates, and exploit iterative refinement by reinjection (Fig. 2).

Fig. 2. Proposed multi-task network based on the SHGN architecture with intermediate GradNorm weight balancing, instance normalization (IN), and pre-activation layout for residual bottleneck units [3,4,7]. Here, $C_{in} = 1$, $C_1 = 64$, $C_2 = 128$, and $C_* = 256$.

Weighted Multi-label Segmentation Loss. X-ray images are superimposition projections, which leads to ambiguities in class assignment for overlapping bones with similar imaging characteristics.
Instead of using multinomial pixel-wise classification, we therefore define bone segmentation as a multi-class/multi-label problem and perform separate binomial classifications for each bone in the target region to allow a pixel to be assigned multiple labels. We further exploit this multi-label information to penalize errors in overlap regions, which we derive by a characteristic function $g_{bij} = [\sum_c y_{bcij} > 1]$ and incorporate into the loss function with scaling factor $\beta$. $y$ corresponds to a 4th-order tensor $(B, C, H, W)$, where the task-specific ground truth maps are stacked along $C$. Each tensor element is indexed with $b \in [1, B]$, $c \in [1, C]$, $i \in [1, H]$, and $j \in [1, W]$, where $B$, $C$, $H$, $W$ mark batch, channel, height, and width dimensions. The resulting segmentation loss for prediction $\hat{y}^{[l]}$ with sigmoid nonlinearity $\sigma$ is computed by

\[ L_{seg}^{[l]} = \frac{1}{BHW} \sum_{(b,c,i,j)} (1 + \beta g_{bij}) \left[ -y \log \sigma\left(\hat{y}^{[l]}\right) - (1 - y) \log\left(1 - \sigma\left(\hat{y}^{[l]}\right)\right) \right]_{bcij}. \tag{1} \]

Landmark and Region of Interest Loss. Landmarks and ROIs are represented as heat-maps, which encode the localization likelihood as a spatial intensity distribution. For landmark ground truth, a 2D unnormalized Gaussian with a standard deviation of 6 pixels is centered on the annotated position. Likewise, the line's ROI ground truth is derived by placing equidistant pseudo landmarks along the ground truth cortical line. For loss calculation, heat-map matching is performed as

\[ L_{\{lm,roi\}}^{[l]} = \frac{1}{BCHW} \sum_{(b,c,i,j)} \left( \hat{y}^{[l]}_{bcij} - y_{bcij} \right)^2. \tag{2} \]

We derive landmark positions by performing the argmax operation on the predicted likelihood scores. For the posterior femoral cortex tangent line LM1 (Fig. 1), we mask the femur segmentation outline with the predicted ROI and discard features with a likelihood below 0.5 prior to a least squares regression.
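
The two loss terms of Eq. (1) and Eq. (2), the Gaussian heat-map encoding, and the argmax decoding can be illustrated with a minimal plain-Python sketch. It is not the authors' PyTorch implementation: nested lists stand in for the $(B, C, H, W)$ tensors, all function names are hypothetical, and the small `eps` inside the logarithms is an added assumption for numerical safety.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def overlap_mask(y):
    """g[b][i][j] = 1 where more than one bone label is active at a pixel, else 0."""
    B, C, H, W = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    return [[[1 if sum(y[b][c][i][j] for c in range(C)) > 1 else 0
              for j in range(W)]
             for i in range(H)]
            for b in range(B)]


def segmentation_loss(logits, y, beta=0.6, eps=1e-12):
    """Eq. (1): per-channel binary cross-entropy, scaled by (1 + beta) in overlap regions.

    The sum runs over (b, c, i, j) and is normalized by B*H*W as in the text.
    eps is an assumption of this sketch, not part of Eq. (1).
    """
    B, C, H, W = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    g = overlap_mask(y)
    total = 0.0
    for b in range(B):
        for c in range(C):
            for i in range(H):
                for j in range(W):
                    p = sigmoid(logits[b][c][i][j])
                    t = y[b][c][i][j]
                    bce = -t * math.log(p + eps) - (1 - t) * math.log(1 - p + eps)
                    total += (1 + beta * g[b][i][j]) * bce
    return total / (B * H * W)


def gaussian_heatmap(height, width, row, col, sigma=6.0):
    """Unnormalized 2D Gaussian (peak value 1) centered on the annotated position."""
    return [[math.exp(-((i - row) ** 2 + (j - col) ** 2) / (2 * sigma ** 2))
             for j in range(width)]
            for i in range(height)]


def heatmap_loss(pred, y):
    """Eq. (2): squared error averaged over all B*C*H*W elements."""
    B, C, H, W = len(y), len(y[0]), len(y[0][0]), len(y[0][0][0])
    total = sum((pred[b][c][i][j] - y[b][c][i][j]) ** 2
                for b in range(B) for c in range(C)
                for i in range(H) for j in range(W))
    return total / (B * C * H * W)


def argmax_2d(heatmap):
    """Return (row, col) of the first maximum of a 2D heat-map."""
    best_value, best_pos = float("-inf"), (0, 0)
    for i, row_values in enumerate(heatmap):
        for j, value in enumerate(row_values):
            if value > best_value:
                best_value, best_pos = value, (i, j)
    return best_pos
```

In this sketch, a pixel labeled as both femur and patella contributes its cross-entropy with weight $1 + \beta$, while a pixel with a single label keeps weight 1, which mirrors the role of $g_{bij}$ in Eq. (1).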
Task Weighting and Total Loss. We utilize deep supervision to ease gradient flow across multiple stages of the SHGN and to facilitate faster training. For this purpose, individual task losses are calculated and summed up for both intermediate and final HGs to form a single loss value. To respect imbalanced task difficulties and to avoid overfitting to only a subset of tasks, we use gradient normalization (GradNorm) and adapt it to a deeply supervised setting [3]. GradNorm generally tackles task imbalance by reducing the variance across the tasks' training rates. For this purpose, task-specific loss weighting factors are learned by jointly reducing an additional multi-task loss function on the basis of gradient magnitudes to adaptively adjust the gradient norm at each update step [3]. The GradNorm weights for each supervised HG are based on the last shared bottleneck layer before branching off to the prediction modules (Fig. 2). The resulting balanced loss is given by

\[ L_{total} = \sum_{l=1}^{L} \left( w_{seg}^{[l]} L_{seg}^{[l]} + w_{lm}^{[l]} L_{lm}^{[l]} + w_{roi}^{[l]} L_{roi}^{[l]} \right). \]

2.2 Dataset and Training Procedures

Training and validation data consist of 185 lateral X-ray projections of the knee joint acquired prior to reconstruction surgery. The data was split with ratio 0.8/0.2 for training and validation (149/36 images). For evaluation, 38 separate test images with standardized measuring spheres of 30 mm diameter were used. Annotation of the ground truth landmark positions and line reference points on training and validation images was performed by one orthopaedic surgeon with an interactive proprietary tool (Fig. 1). To allow for an estimate on inter-rater variability, annotation on the test data was extended to three orthopaedic surgeons from the same hospital.
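
The deeply supervised total loss from the task-weighting paragraph, together with the per-task target gradient norms that GradNorm [3] steers the weights towards, can be sketched as follows. This is a hedged illustration only: the function names and the dictionary layout are hypothetical, the gradient norms are assumed to be measured at the last shared bottleneck layer of one HG, and the learned weight update itself is not shown.

```python
def total_loss(stage_losses, stage_weights):
    """Sum over all HG stages of w_seg*L_seg + w_lm*L_lm + w_roi*L_roi.

    stage_losses and stage_weights are lists with one dict per hourglass,
    keyed by task name (hypothetical layout chosen for this sketch).
    """
    total = 0.0
    for losses, weights in zip(stage_losses, stage_weights):
        for task in ("seg", "lm", "roi"):
            total += weights[task] * losses[task]
    return total


def gradnorm_targets(grad_norms, losses, initial_losses, alpha=1.0):
    """Per-task target gradient norms following the GradNorm formulation [3].

    Each task's loss ratio L(t)/L(0) is divided by the mean ratio to obtain its
    relative inverse training rate; the target is the mean gradient norm times
    that rate raised to the asymmetry hyper-parameter alpha.
    """
    ratios = {task: losses[task] / initial_losses[task] for task in losses}
    mean_ratio = sum(ratios.values()) / len(ratios)
    mean_norm = sum(grad_norms.values()) / len(grad_norms)
    return {task: mean_norm * (ratios[task] / mean_ratio) ** alpha
            for task in ratios}
```

A task whose loss has dropped less than average (ratio above the mean) receives a target gradient norm above the mean norm, so its weight is pushed up; in the described setting this balancing would be carried out separately for each supervised HG.
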
Ground truth segmentation masks for the femur, patella, tibia, and fibula were created by the first author. A basic set of data augmentations (rotation, scaling, horizontal flipping) as well as linear contrast scaling with a probability of $p = 0.5$ each were applied during training. After augmentation, the variably sized images were zero-padded to square spatial dimensions and subsequently downsampled to a resolution of 256 × 256 pixels.

We devise a multi-task SHGN with $L = 4$ HGs and introduce instance normalization layers for approximate contrast invariance and to smooth the optimization landscape [6]. We consider $T = 3$ tasks and hence construct three prediction modules at the end of each HG. The network was implemented with PyTorch (v0.4.1) and trained on a NVIDIA Quadro P5000 over 250 epochs with batch size 2. The network parameters and GradNorm task weights were optimized with RMSProp at learning rates of 0.00025 and 0.025 respectively. The learning rate for network parameters was halved every 60 epochs. Based on prior hyper-parameter optimization, GradNorm's asymmetry hyper-parameter was set to $\alpha = 1$ at each HG, and the penalizing weight factor for multi-label segmentation was assigned to $\beta = 0.6$.

3 Evaluation

Bone Segmentation. The model consistently yields high overlap- and contour-based metric results and successfully delineates all target bone structures (Tab. 1). Qualitative assessment indicates successful disambiguation in overlapping areas, in narrow interarticular joint spaces, and in low-contrast regions (Fig. 3). Also, uncommon image characteristics like osteophytes along joint contours as well as aberrant lateral projections are resolved with high precision. However, subpar performance is observed for the fibula due to the proximal part being mostly overlapped by the tibia with seemingly no intensity shifts.
Likewise, wrongful assignment of spherical markers to the tibia or the femur leads to high contour distances.

Table 1. Segmentation performance on 38 test images for all bones in target region.

Anatomy   IOU (mean ± STD)   Average Surface Distance (mean ± STD, mm)   Hausdorff Distance (mean ± STD, mm)
Femur     0.99 ± 0.01        0.12 ± 0.61                                 2.96 ± 7.51
Patella   0.97 ± 0.02        0.02 ± 0.02                                 0.62 ± 0.56
Tibia     0.98 ± 0.02        0.23 ± 0.85                                 3.76 ± 10.77
Fibula    0.96 ± 0.02        0.14 ± 0.68                                 2.38 ± 5.41

Fig. 3. Automatic results for multi-task segmentation (a,c) and localization (b,d). In (a), false-positive assignment of a spherical marker to the tibia is observed.

Line and Landmark Localization. Predictions for the landmarks $p_{blum}$ and $p_{tmc}$ are spatially precise with median Euclidean distance (ED) errors of 1.18, CI 80% [0.99, 1.74] mm and 2.14, CI 80% [1.71, 2.63] mm respectively (Fig. 3). In general, it can be observed that localization of $p_{tmc}$ is less robust due to its dependence on true-lateral imaging. Slight deviations from a true-lateral projection lead to non-overlapping femoral epicondyles, which necessitates three-dimensional reasoning and compensation for correct spatial positioning. For measuring the alignment of the cortical extension line, ED of the ground truth points $p_{prox}$ and $p_{dist}$ to the predicted line are averaged, yielding a median score of 0.62, CI 80% [0.48, 0.79] mm.

Adaptive Task Weighting. The learned GradNorm task weights generally reduce the segmentation training rate across all modules in exchange for increased landmark and ROI loss contributions (Fig. 4). With advanced training time, balancing slightly converges, which indicates harmonization of the task-specific loss magnitudes and gradients. Especially in early HGs, optimization towards a single task is observed.

Fig. 4. Development of GradNorm task weights for each HG during training. (Panels: HG 1 to HG 4; x-axis: epoch; y-axis: task weighting factor; curves: landmarks, region of interest, segmentation.)

Fig. 5. Visualization of errors between individually planned Schoettle Points and the experts' centroid. Distance scores are standardized w.r.t. pixel/mm spacing and to the mean orientation of LM1 based on expert annotations. Per image, all results are visually aligned to a reference planning, whose Schoettle Point corresponds to this centroid. For comparison with the original Schoettle area of r = 2.50 mm, the average confidence circle as planned by the experts (bounded by LM1, LM2 and LM3) is overlayed.

Inter-rater Analysis and Automatic Planning. The observed inter-rater EDs of the constructed Schoettle Points are generally within a circular confidence area with radius r = 2.50 mm, which describes an anatomically correct femoral MPFL insertion site [8]. In comparison, automatic plannings are equally reliable and can reduce variability caused by differences in planning strategy between expert raters (Tab. 2, Fig. 5). However, morphological variations of the femoral cortex or a too short predicted line ROI frequently lead to slight anterior shifting. Likewise, projection of individual ED errors onto the longitudinal and anteroposterior axes indicates an error tendency towards the anterior direction, which underlines difficulties in correct assessment of the femoral bow (Fig. 5).

Table 2. Inter-rater variability and comparison with proposed automatic planning. The median ED at 80% confidence level (mm) between raters is reported for full dataset and subset of images agreed to be suitable for surgical planning by all raters.
First rater   Second rater                Schoettle Point (38/38 test images)   Schoettle Point (29/38 suitable test images)
1             2                           2.35, [1.94, 2.85]                    2.68, [2.09, 3.13]
1             3                           2.31, [1.91, 2.79]                    2.49, [1.91, 2.95]
2             3                           1.67, [1.37, 2.22]                    1.62, [1.07, 2.12]
Autom.        1                           2.41, [1.97, 2.99]                    2.64, [1.64, 3.10]
Autom.        2 (gr. truth training)      1.46, [1.00, 1.85]                    1.33, [0.91, 1.59]
Autom.        3                           1.61, [1.45, 1.87]                    1.56, [1.27, 1.73]
Autom.        Expert centroid             1.50, [1.41, 2.07]                    1.41, [1.28, 1.52]

4 Discussion and Conclusion

We proposed an automatic framework for joint prediction of segmentations and heatmaps for spatial localization of landmarks and line features. On the example of MPFL reconstruction surgery, we could show that we can facilitate surgical planning by providing planning proposals at expert-rater precision. We see limitations in that the proposed method was only trained and tested on pre-operational, conventional X-ray images which depict a large portion of the femoral shaft and typically have high contrast. For seamless integration into a clinical workflow with subsequent overlay of the planning result on live images, the framework performance must be evaluated on fluoroscopic image data. This modality however imposes additional difficulties onto automated prediction by superimposed surgical tools and by greater overall heterogeneity in image characteristics and acquisition settings. This could partially be solved by overlay of simulated tools and implants in the image domain during training, which has been shown to increase robustness of learning-based algorithms [6]. As shown in this work, planning methods like the Schoettle methodology might also inherently tolerate variability in assessment strategy and typically cannot be securely validated due to absence of anatomical ground truth.
An automatic solution should therefore utilize an adequate encoding of this variability to alleviate overfitting to a certain type of annotation strategy by a single rater. Furthermore, while we experience satisfactory results in estimating a line by masking segmentation contour features, such cross-task coupling introduces additional failure points in the planning pipeline. As for estimation of the posterior femoral cortex, strongly curved femoral shafts allow an anterior shift of the fitted line, which is directly conditioned by the longitudinal extent of the predicted ROI. To this end, we aim to look at ways to derive confidence estimates for each task and to keep the number of interdependent tasks at a reasonable level. In future work, we also seek to adapt our approach to different anatomies by exploiting the highly generalizable concept of heat-map matching for a direct representation of arbitrarily shaped features.

Disclaimer. The methods and information presented here are based on research and are not commercially available.

References

1. Bier, B., et al.: X-ray-transform invariant anatomical landmark detection for pelvic trauma surgery. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. LNCS, vol. 11073, pp. 55–63. Springer, Cham (2018)
2. Breininger, K., et al.: Multiple device segmentation for fluoroscopic imaging using multi-task learning. In: Stoyanov, D., et al. (eds.) Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. pp. 19–27. Springer, Cham (2018)
3. Chen, Z., Badrinarayanan, V., Lee, C., Rabinovich, A.: Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In: Proc Int Conf Mach Learn. pp. 793–802 (2018)
4. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – European Conference on Computer Vision 2016. pp. 630–645. Springer, Cham (2016)
5. Joskowicz, L., Hazan, E.J.: Computer aided orthopaedic surgery: Incremental shift or paradigm change? Med Image Anal 33, 84–90 (2016)
6. Kordon, F., Lasowski, R., Swartman, B., Franke, J., Fischer, P., Kunze, H.: Improved x-ray bone segmentation by normalization and augmentation strategies. In: Handels, H., Deserno, T.M., Maier, A., Maier-Hein, K.H., Palm, C., Tolxdorff, T. (eds.) Bildverarbeitung für die Medizin 2019. pp. 104–109. Springer, Wiesbaden (2019)
7. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – European Conference on Computer Vision 2016. pp. 483–499. Springer, Cham (2016)
8. Schöttle, P.B., Schmeling, A., Rosenstiel, N., Weiler, A.: Radiographic landmarks for femoral tunnel placement in medial patellofemoral ligament reconstruction. Am J Sports Med 35(5), 801–804 (2007)
9. Székely, G., Nolte, L.P.: Image guidance in orthopaedics and traumatology: A historical perspective. Med Image Anal 33, 79–83 (2016)