TopoGate: Quality-Aware Topology-Stabilized Gated Fusion for Longitudinal Low-Dose CT New-Lesion Prediction
Longitudinal low-dose CT follow-ups vary in noise, reconstruction kernels, and registration quality. These differences destabilize subtraction images and can trigger false new lesion alarms. We present TopoGate, a lightweight model that combines the …
Authors: Seungik Cho
Seungik Cho, Department of Physics & Astronomy, Rice University, USA

ABSTRACT
Longitudinal low-dose CT follow-ups vary in noise, reconstruction kernels, and registration quality. These differences destabilize subtraction images and can trigger false "new-lesion" alarms. We present TopoGate, a lightweight model that combines the follow-up appearance view with the subtraction view and controls their influence through a learned, quality-aware gate. The gate is driven by three case-specific signals: CT appearance quality, registration consistency, and stability of anatomical topology measured with topological metrics. On the NLST–New-Lesion–LongCT cohort of 152 pairs from 122 patients, TopoGate improves discrimination and calibration over single-view baselines. It achieves an area under the ROC curve (AUROC) of 0.65 ± 0.05 (mean ± SD) and a Brier score of 0.14. Removing corrupted or low-quality pairs, identified by the quality scores, further increases the AUROC from 0.62 to 0.68 and reduces the Brier score from 0.14 to 0.12. The gate responds predictably to degradation, placing more weight on appearance as noise grows, which mirrors radiologist practice. The approach is simple, interpretable, and practical for reliable longitudinal LDCT triage.

Index Terms: Low-dose CT, new-lesion prediction, quality-aware fusion, topological stability

1. INTRODUCTION
Low-dose CT (LDCT) screening and longitudinal follow-up are essential for early lung-cancer detection and management. Yet serial scans frequently differ in noise, reconstruction kernels, and acquisition protocols [1, 2]. These factors alter image statistics and downstream features, which complicates case-wise comparison across time.
Temporal subtraction is the voxel-wise difference between the follow-up CT and the registered baseline CT. It is highly sensitive to misalignment, respiration, and contrast differences, and often yields spurious "new-lesion" signals. In routine practice, radiologists implicitly trust the more reliable evidence stream [3]: the appearance on the follow-up view when subtraction is unstable, or the image difference when alignment is clean. Our goal is to model this behavior explicitly.

Longitudinal change detection commonly relies on non-rigid registration followed by subtraction or $\Delta$-radiomics, so performance hinges on registration fidelity and acquisition consistency. Data-quality control (QC) is typically implemented as pre-filtering: scans failing heuristic thresholds are excluded to avoid bias. Similarity indices such as the structural similarity index (SSIM) are widely used to summarize structural agreement between paired images and serve as pragmatic consistency checks [4].

Fig. 1. Longitudinal LDCT pair. (Left) Follow-up (FU); (right) registered baseline ($\mathrm{BL}_{\mathrm{reg}}$). Differences in reconstruction and residual misalignment can destabilize the temporal subtraction $\Delta$, motivating quality-aware fusion.

In parallel, topological descriptors [7] have emerged as robust, interpretable measures of shape and structural change, with resilience to certain intensity perturbations [5]. Existing longitudinal models usually treat all inputs uniformly after a QC gate that accepts or discards scans based on fixed thresholds. Such hard filtering may improve headline metrics, but it discards potentially useful evidence and can bias cohorts [6]. What is missing is a quality-aware fusion mechanism that continuously modulates reliance on (i) the appearance of the follow-up ROI versus (ii) the temporal difference ($\Delta$), in proportion to case-specific reliability.
Moreover, registration quality is not the only driver of reliability. Acquisition-induced variability (e.g., kernel choice) and structural stability should also influence how much we trust $\Delta$ versus appearance. To address these challenges, we propose TopoGate, a quality-aware gated fusion framework for longitudinal LDCT new-lesion prediction. We construct a quality vector $q = [q_{\mathrm{ct}}, q_{\mathrm{reg}}, q_{\mathrm{topo}}] \in [0,1]^3$ that captures three signals:
(a) CT appearance quality (no-reference sharpness/entropy measures);
(b) registration consistency (slice-wise SSIM between FU and the registered baseline);
(c) topology stability under controlled perturbations.
We introduce a constrained gate $\alpha(q)$ that increases reliance on appearance as CT quality and topological stability rise, and decreases reliance on $\Delta$ when registration consistency degrades, mirroring radiologist heuristics. On a longitudinal cohort derived from NLST, the gate improves discrimination and calibration over single-view baselines. Learning to weight by quality also confers robustness to noise and mis-registration, and simple QC filtering provides additional gains.

2. METHODOLOGY
The overall framework of our method is illustrated in Fig. 2. The primary goal is to reduce false "new-lesion" alarms and improve calibrated detection in longitudinal LDCT. We do this by adaptively fusing the appearance and temporal-difference views according to per-case image and registration quality.

2.1. Task and Notation
For case (scan pair) $i$, let the regions of interest (ROIs) be $\mathrm{ROI}^{(i)}_{\mathrm{FU}}, \mathrm{ROI}^{(i)}_{\mathrm{BL}_{\mathrm{reg}}} \in \mathbb{R}^{H \times W \times D}$, and let $\Delta^{(i)} = \mathrm{ROI}^{(i)}_{\mathrm{FU}} - \mathrm{ROI}^{(i)}_{\mathrm{BL}_{\mathrm{reg}}}$. Labels are $y_i \in \{0, 1\}$ (1: real-new; 0: pseudo-new). From the dataset, we read an FU-space lesion centroid $p^{(i)} \in \mathbb{R}^3$ and a flag indicating whether the lesion is pseudo-disease. We map this flag so that $y_i = 1$ denotes real-new and $y_i = 0$ denotes pseudo-new.
The baseline is deformably registered to FU to obtain $\mathrm{BL}_{\mathrm{reg}}$. Both ROIs are extracted as fixed-size cubic crops (after isotropic resampling) centered at $p^{(i)}$, which ensures an identical field of view for FU and $\mathrm{BL}_{\mathrm{reg}}$.

2.2. Dual-View Encoders
Rationale. Appearance and temporal subtraction fail for different reasons: noise and blur on one side, mis-registration on the other. We therefore decouple them into two experts with the same light capacity, so the gate can learn when to trust each view without being confounded by model size. We extract two feature vectors with a shallow 3D CNN of shared design (two $3 \times 3 \times 3$ blocks followed by global average pooling):

$$ f^{(i)}_{\mathrm{app}} = F_{\mathrm{app}}\big(\mathrm{ROI}^{(i)}_{\mathrm{FU}}\big), \tag{1} $$
$$ f^{(i)}_{\Delta} = F_{\Delta}\big(\Delta^{(i)}\big), \tag{2} $$

with each feature vector in $\mathbb{R}^K$. This design separates reliable appearance cues from potentially noisy temporal differences.

2.3. Quality Vector
We compute a bounded quality vector $q^{(i)} = [q^{(i)}_{\mathrm{ct}}, q^{(i)}_{\mathrm{reg}}, q^{(i)}_{\mathrm{topo}}]^{\top} \in [0,1]^3$:

$$ q^{(i)}_{\mathrm{ct}} = \tanh\!\left( \frac{\sigma^2\big(\nabla^2 \mathrm{ROI}^{(i)}_{\mathrm{FU}}\big)}{\kappa_{\mathrm{ct}}} \right), \tag{3} $$
$$ q^{(i)}_{\mathrm{reg}} = \frac{1}{D} \sum_{d=1}^{D} \mathrm{SSIM}\big(I^{(d)}_{\mathrm{FU}}, I^{(d)}_{\mathrm{BL}_{\mathrm{reg}}}\big), \tag{4} $$
$$ q^{(i)}_{\mathrm{topo}} = \exp\!\Big( -\tau\, W_{\infty}\big(\mathcal{D}(\mathrm{FU}), \mathcal{D}(\mathrm{BL}_{\mathrm{reg}})\big) \Big). \tag{5} $$

Notation. $D$ is the number of axial slices, and $I^{(d)}_{\mathrm{FU}}$ and $I^{(d)}_{\mathrm{BL}_{\mathrm{reg}}}$ are the $d$-th slices of the FU and registered-baseline ROIs. $\sigma^2(\nabla^2 X)$ is the variance of the Laplacian of volume $X$, a sharpness measure. $\kappa_{\mathrm{ct}} > 0$ normalizes its scale, and $\tanh(\cdot)$ maps the non-negative result to $[0, 1)$. $\mathrm{SSIM}(\cdot,\cdot) \in [0,1]$; in practice the average in Eq. (4) is taken over non-constant slices only. $\mathcal{D}(\cdot)$ denotes persistence diagrams from the Euler characteristic transform, $W_{\infty}$ is the bottleneck distance, and $\tau > 0$ controls the stability mapping. All components are thus comparable on $[0,1]$.

2.4. Quality-Aware Gate and Prediction
Fusion Design. To mirror radiologist heuristics, the fusion should increase reliance on appearance when CT quality and topology are stable, and decrease it when registration is strong. A sign-constrained sigmoid gate provides this monotone, interpretable behavior and keeps the fusion weight bounded in $(0, 1)$.
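The quality signals of Eqs. (3)–(5) and the sign-constrained gate motivated above can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several choices are assumptions:
- The SSIM is a simplified single-window (global) version computed per axial slice, rather than the sliding-window SSIM of [4].
- The SSIM data range is taken as the width of the HU window $[-1000, 400]$.
- Negative SSIM values are clipped to 0.
- The bottleneck distance of Eq. (5) is assumed to be precomputed, for example with a persistent-homology library such as GUDHI.
- Non-negativity of $w_1, w_2, w_3$ is enforced by simple projection.
- The view-specific logits $g_{\mathrm{app}}(f_{\mathrm{app}})$ and $g_{\Delta}(f_{\Delta})$ are passed in as plain numbers.

All function names are hypothetical.

```python
import numpy as np

HU_RANGE = 1400.0  # assumed SSIM data range: width of the HU window [-1000, 400]


def laplacian_3d(x: np.ndarray) -> np.ndarray:
    """6-neighbour discrete Laplacian with edge (replicate) padding at the borders."""
    p = np.pad(x, 1, mode="edge")
    return (p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1]
            + p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1]
            + p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2]
            - 6.0 * p[1:-1, 1:-1, 1:-1])


def q_ct(roi_fu: np.ndarray, kappa: float) -> float:
    """Eq. (3): tanh(var(Laplacian(ROI_FU)) / kappa), which lies in [0, 1)."""
    return float(np.tanh(laplacian_3d(roi_fu).var() / kappa))


def global_ssim(a: np.ndarray, b: np.ndarray, data_range: float = HU_RANGE,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM over a whole slice (a simplification of windowed SSIM)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return float(num / den)


def q_reg(roi_fu: np.ndarray, roi_bl: np.ndarray) -> float:
    """Eq. (4): mean SSIM over non-constant axial slices (last axis = D)."""
    scores = [global_ssim(roi_fu[:, :, d], roi_bl[:, :, d])
              for d in range(roi_fu.shape[2])
              if roi_fu[:, :, d].std() > 0 and roi_bl[:, :, d].std() > 0]
    # Assumption: no usable slice gives 0; negative SSIM is clipped to 0.
    return float(np.clip(np.mean(scores), 0.0, 1.0)) if scores else 0.0


def q_topo(bottleneck_dist: float, tau: float) -> float:
    """Eq. (5): map a precomputed bottleneck distance to (0, 1]."""
    return float(np.exp(-tau * bottleneck_dist))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fuse(q, w, b, logit_app, logit_delta):
    """Eqs. (6)-(8): sign-constrained gate alpha and fused probability y_hat."""
    qct, qreg, qtopo = q                   # same order as q = [q_ct, q_reg, q_topo]
    w1, w2, w3 = np.maximum(w, 0.0)        # project weights onto w >= 0
    alpha = sigmoid(w1 * qct + w2 * qtopo - w3 * qreg + b)
    s = alpha * logit_app + (1.0 - alpha) * logit_delta
    return float(alpha), float(sigmoid(s))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fu = rng.normal(-600.0, 100.0, size=(32, 32, 16))   # synthetic FU ROI in HU
    bl = fu + rng.normal(0.0, 50.0, size=fu.shape)      # imperfectly matched baseline
    q = (q_ct(fu, kappa=1e6), q_reg(fu, bl), q_topo(0.3, tau=1.0))
    alpha, y_hat = fuse(q, w=(1.0, 1.0, 1.0), b=0.0, logit_app=0.8, logit_delta=-0.4)
    print(f"q={tuple(round(v, 3) for v in q)} alpha={alpha:.3f} y_hat={y_hat:.3f}")
```

The values of `kappa`, `tau`, the weights, and the logits are hypothetical. With equal weights, lowering $q_{\mathrm{reg}}$ raises $\alpha$ and so shifts weight from the $\Delta$ head to the appearance head. This is the monotone behavior the sign constraint is meant to guarantee.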
The gate increases reliance on appearance as CT and topology quality rise, and reduces it when registration is strong:

$$ \alpha_i = \sigma\big( w_1 q^{(i)}_{\mathrm{ct}} + w_2 q^{(i)}_{\mathrm{topo}} - w_3 q^{(i)}_{\mathrm{reg}} + b \big), \tag{6} $$
$$ s_i = \alpha_i\, g_{\mathrm{app}}\big(f^{(i)}_{\mathrm{app}}\big) + (1 - \alpha_i)\, g_{\Delta}\big(f^{(i)}_{\Delta}\big), \tag{7} $$
$$ \hat{y}_i = \sigma(s_i), \tag{8} $$

with $w_1, w_2, w_3 \ge 0$ for interpretability. Here $\sigma(\cdot)$ is the logistic sigmoid, and $g_{\mathrm{app}}$ and $g_{\Delta}$ are view-specific prediction heads that output scalar logits.

2.5. Loss and Implementation
Loss. Clinical use demands reliable probabilities as well as discrimination. We therefore train with a cross-entropy term plus an explicit calibration term, and keep the implementation lightweight for practical deployment. We optimize

$$ \mathcal{L} = \mathrm{BCE}(\hat{y}_i, y_i) + \lambda_{\mathrm{brier}} (\hat{y}_i - y_i)^2 \tag{9} $$

with an optional monotonicity penalty on $\alpha$.

Preprocessing: NIfTI conversion, HU clipping to $[-1000, 400]$, isotropic resampling, and fixed-size ($H \times W \times D$) ROI crops centered at the physician-annotated 3D points.

Training: Adam (learning rate $10^{-4}$), batch size 8, early stopping on validation AUROC, and patient-level $K$-fold cross-validation.

Fig. 2. TopoGate framework. We deformably register the baseline CT to the follow-up to obtain $\mathrm{BL}_{\mathrm{reg}}$. We then crop aligned 3D ROIs around each lesion point $p^{(i)}$ and compute the temporal difference $\Delta = \mathrm{FU} - \mathrm{BL}_{\mathrm{reg}}$. Two 3D encoders extract appearance and difference embeddings $(f_{\mathrm{app}}, f_{\Delta})$. A quality vector $q = [q_{\mathrm{ct}}, q_{\mathrm{reg}}, q_{\mathrm{topo}}] \in [0,1]^3$ controls a monotonic gate $\alpha = \sigma(w_1 q_{\mathrm{ct}} + w_2 q_{\mathrm{topo}} - w_3 q_{\mathrm{reg}} + b)$. The gate adaptively fuses the two view-specific predictions, $s = \alpha\, g_{\mathrm{app}}(f_{\mathrm{app}}) + (1 - \alpha)\, g_{\Delta}(f_{\Delta})$, producing the calibrated output $\hat{y} = \sigma(s)$.

3. EXPERIMENTS AND RESULTS

3.1. Dataset and Setup
We evaluate on the publicly curated NLST–New-Lesion–LongCT longitudinal cohort [11]. It comprises 152 follow-up (FU) scan pairs drawn from 126 LDCT studies and 122 unique patients. Each FU has a deformably registered baseline ($\mathrm{BL}_{\mathrm{reg}}$).
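The 152 pairs come from only 122 patients, so some patients contribute more than one pair. The patient-level $K$-fold cross-validation of Section 2.5 must therefore keep all pairs of a patient in the same fold, to avoid leakage between training and validation.

Below is a minimal pure-Python sketch of such a grouped split. The function names, the round-robin assignment of shuffled patients to folds, and the absence of label stratification are assumptions for illustration. scikit-learn's `GroupKFold` is a library alternative.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def patient_level_folds(pair_patient_ids: Sequence[Hashable], k: int,
                        seed: int = 0) -> List[List[int]]:
    """Split pair indices into k folds so that no patient spans two folds.

    Patients are shuffled and assigned round-robin, so folds are balanced in
    number of patients, not necessarily in number of pairs or labels.
    """
    patients = sorted(set(pair_patient_ids), key=str)   # deterministic order
    random.Random(seed).shuffle(patients)
    fold_of = {pid: n % k for n, pid in enumerate(patients)}
    folds = defaultdict(list)
    for idx, pid in enumerate(pair_patient_ids):
        folds[fold_of[pid]].append(idx)
    return [folds[f] for f in range(k)]


def train_val_indices(folds: List[List[int]], f: int) -> Tuple[List[int], List[int]]:
    """Use fold f for validation and all remaining folds for training."""
    val = folds[f]
    train = [i for g, fold in enumerate(folds) if g != f for i in fold]
    return train, val


# Usage with hypothetical patient IDs: pairs 0-1 and 3-5 share a patient.
ids = ["P01", "P01", "P02", "P03", "P03", "P03", "P04"]
folds = patient_level_folds(ids, k=3)
train, val = train_val_indices(folds, 0)
print(folds, train, val)
```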
Lesion annotations are provided as FU-space 3D centroids $p^{(i)} \in \mathbb{R}^3$, together with a flag indicating whether the lesion was already present at baseline. We convert this flag to the target label $y_i \in \{0, 1\}$, where 1 = real-new and 0 = pseudo-new. All volumes are converted to NIfTI, clipped to the Hounsfield-unit window $[-1000, 400]$, and resampled to isotropic spacing. We then crop fixed-size cubic ROIs (edge length $L$ voxels; $H = W = D = L$) from both FU and $\mathrm{BL}_{\mathrm{reg}}$, centered at $p^{(i)}$, yielding identical fields of view per pair.

3.2. Baselines and Training
We compare five models:
(i) App-only (FU appearance branch only);
(ii) $\Delta$-only (temporal-difference branch only);
(iii) Topo-only (topological descriptors only);
(iv) TopoGate (our model, with quality-aware fusion);
(v) Gate Fusion + All features (a sanity check for overfitting).
All models share the same shallow 3D CNN capacity (two $3 \times 3 \times 3$ conv blocks with BN–ReLU and global average pooling) and identical preprocessing. All use the Adam optimizer (learning rate $10^{-4}$), batch size 8, and early stopping on validation AUROC.

3.3. Main Performance
Table 1 summarizes discrimination on the full cohort. The proposed method surpasses the single-view baselines and produces better-calibrated probabilities.

Table 1. Full-cohort performance (AUROC, mean ± SD).
Model                    AUROC (± SD)
App-only                 0.55 ± 0.09
$\Delta$-only            0.57 ± 0.08
Topology-only            0.61 ± 0.06
TopoGate                 0.65 ± 0.05
Gate + All features      0.58 ± 0.06

3.4. Gate Behavior and Interpretability
We analyze how the learned gate weight $\alpha$ varies with per-case quality. Figure 3 shows a clear monotonic trend: higher CT appearance quality ($q_{\mathrm{ct}}$) and higher topology stability ($q_{\mathrm{topo}}$) correspond to larger $\alpha$. The model thus relies more on appearance when subtraction is likely unstable.

Fig. 3. Gate response vs. quality. Larger $\alpha$ (color) is associated with higher CT quality and topology stability, increasing trust in the appearance branch.

3.5. Quality Filtering Study
We identify low-quality pairs (e.g., constant slices, or registration failures indicated by low $q_{\mathrm{reg}}$) and re-evaluate on the clean subset. Figure 4 shows that AUROC improves from 0.62 to 0.68 and the Brier score decreases from 0.14 to 0.12. This indicates that scan quality directly affects reliability, and that TopoGate benefits from cleaner inputs without any change to the model.

Fig. 4. Effect of quality filtering. Removing low-quality pairs increases AUROC and reduces the Brier score, improving reliability.

3.6. Robustness to Degradation
To test sensitivity, we add incremental noise to the FU view and record the mean gate weight. Figure 5 shows a monotonic rise of $\alpha$ with noise level. The model down-weights $\Delta$ when subtraction becomes unreliable and shifts trust to appearance. This behavior argues against overfitting and supports stable deployment under variable image quality.

Fig. 5. Robustness. As simulated noise increases, the mean gate weight $\alpha$ increases monotonically, shifting weight from $\Delta$ to appearance.

4. CONCLUSION
TopoGate provides an inherently interpretable, quality-aware fusion for longitudinal LDCT. The gate output ($\alpha$) and the per-case quality vector ($q$) expose why the model favors appearance or temporal difference for a given patient, reducing false "new-lesion" alarms while improving calibration. The design emphasizes built-in interpretability rather than post-hoc explanations, in line with recommendations that high-stakes clinical AI prefer transparent mechanisms over black-box models [8]. In practice, $\alpha$ and $q$ offer case-level rationales (e.g., low registration SSIM down-weights $\Delta$). These can be surfaced alongside reliability diagrams or decision thresholds to support human-in-the-loop review and safety monitoring [10].
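Both the Brier score reported in Section 3 and the reliability diagrams mentioned above can be computed from out-of-fold predictions $\hat{y}_i$ and labels $y_i$. The Brier score is simply the mean of the squared error that appears as the calibration term in Eq. (9).

Below is a minimal NumPy sketch. It assumes equal-width probability bins; the bin count is an arbitrary choice, not taken from the paper. The function names are hypothetical.

```python
import numpy as np


def brier_score(p, y) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))


def reliability_bins(p, y, n_bins: int = 5):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_idx = np.digitize(p, inner_edges)  # 0 .. n_bins-1; p == 1.0 falls in the last bin
    rows = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            rows.append((float(p[mask].mean()), float(y[mask].mean()), int(mask.sum())))
    return rows


# Usage with toy predictions (not results from the paper):
probs = [0.1, 0.15, 0.7, 0.9]
labels = [0, 0, 1, 1]
print(brier_score(probs, labels))
print(reliability_bins(probs, labels))
```

Plotting the first two entries of each row against each other, with the diagonal as reference, gives a reliability diagram.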
Beyond LDCT nodules, the same gate could extend to other paired or longitudinal settings where view reliability varies, by swapping encoders while retaining the quality channels. Examples include therapy response assessment (CT/MR), surveillance after resection, PET/CT with heterogeneous reconstructions, and mammography with vendor shifts. Because TopoGate is lightweight, it can serve as a deployable front-end that filters, calibrates, and prioritizes cases before heavier models. Its continuous quality weighting complements, rather than replaces, standard QC prefilters. Future work will include external validation across scanners and institutions, and prospective reader studies measuring time-to-decision and override rates. It will also include integration with stronger foundation encoders while preserving quality-modulated fusion and its explanatory signals [9].

5. COMPLIANCE WITH ETHICAL STANDARDS
This retrospective study used de-identified, open-access human-subject imaging data from The Cancer Imaging Archive (TCIA). Specifically, it used the "NLST–New-Lesion–LongCT" analysis result (DOI: 10.7937/eyvh-ag54) derived from the NLST image collection. Ethical approval was not required, as confirmed by the public license and data-use terms attached to these TCIA resources. No new human or animal experiments were performed.

6. ACKNOWLEDGMENTS
This work was accepted at IEEE ISBI (International Symposium on Biomedical Imaging) 2026. We thank the National Cancer Institute (NCI) and ACRIN for conducting and releasing the NLST, and TCIA for hosting and curating both the NLST collection and the NLST–New-Lesion–LongCT analysis result. The analyses and conclusions are solely those of the authors and do not necessarily represent the views of NCI, ACRIN, or TCIA.

7. REFERENCES
[1] S. Xu, S. Jiang, and W. Min, "No-reference/blind image quality assessment: A survey," IETE Technical Review, vol. 34, no. 3, pp. 223–245, 2017. doi: 10.1080/02564602.2016.1151385.
[2] D. Mackin, X. Fave, L. Zhang, D. Fried, J. Yang, B. Taylor, E. Rodriguez-Rivera, C. Dodge, A. K. Jones, and L. Court, "Measuring computed tomography scanner variability of radiomics features," Investigative Radiology, vol. 50, no. 11, pp. 757–765, Nov. 2015. doi: 10.1097/RLI.0000000000000180.
[3] S. Klein, M. Staring, K. Murphy, M. A. Viergever, and J. P. W. Pluim, "elastix: A toolbox for intensity-based medical image registration," IEEE Trans. Med. Imaging, vol. 29, no. 1, pp. 196–205, Jan. 2010. doi: 10.1109/TMI.2009.2035616.
[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004. doi: 10.1109/TIP.2003.819861.
[5] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, "Stability of persistence diagrams," in Proc. 21st ACM Symp. Computational Geometry (SoCG), Pisa, Italy, Jun. 6–8, 2005, pp. 263–271. doi: 10.1007/s00454-006-1276-5.
[6] S. Pertuz, D. Puig, and M. A. Garcia, "Analysis of focus measure operators for shape-from-focus," Pattern Recognit., vol. 46, no. 5, pp. 1415–1432, May 2013. doi: 10.1016/j.patcog.2012.11.011.
[7] E. Munch, "An invitation to the Euler characteristic transform," arXiv preprint arXiv:2310.10395, Oct. 2023. doi: 10.48550/arXiv.2310.10395.
[8] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature Machine Intelligence, vol. 1, no. 5, pp. 206–215, May 2019. doi: 10.1038/s42256-019-0048-x.
[9] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, "On calibration of modern neural networks," in Proc. 34th Int. Conf. Machine Learning (ICML), Sydney, Australia, 2017, pp. 1321–1330.
[10] K. K. L. Wong, Y. Han, Y. Cai, W. Ouyang, H. Du, and C. Liu, "From trust in automation to trust in AI in healthcare: A 30-year longitudinal review and an interdisciplinary framework," Bioengineering, vol. 12, no. 10, p. 1070, Oct. 2025. doi: 10.3390/bioengineering12101070.
[11] A. Gong, M. Daly, J. Goldin, M. Brown, M. McNitt-Gray, and K. Ruchalski, "New Lung Lesions in Low-dose CT: A newly annotated longitudinal dataset derived from the National Lung Screening Trial Dataset (NLST-New-lesion-LongCT) Version 1 [Dataset]," The Cancer Imaging Archive, 2025. doi: 10.7937/eyvh-ag54.