Imaging foundation model for universal enhancement of non-ideal measurement CT
Non-ideal measurement computed tomography (NICT) employs suboptimal imaging protocols to expand CT applications. However, the resulting trade-offs degrade image quality, limiting clinical acceptability. Although deep learning methods have been used to enhance NICT images, their reliance on large training datasets and limited generalizability across diverse settings hinder practical use. We propose the multi-scale integrated Transformer AMPlifier (TAMP), the first imaging foundation model for universal NICT enhancement. Pre-trained on 10.8 million physics-driven simulated NICT images, TAMP generalizes effectively across various NICT settings, defect degrees, and body regions. Moreover, a parameter-efficient fine-tuning strategy enables TAMP to adapt to specific clinical scenarios using only a few slices. Extensive experiments, including radiologist evaluations and real-world validations, demonstrate that TAMP consistently improves image quality and clinical acceptability, underscoring its significant potential to advance CT imaging and broaden NICT applications in clinical practice.
💡 Research Summary
The paper addresses the longstanding challenge of improving image quality in non‑ideal measurement computed tomography (NICT), which includes low‑dose CT (LDCT), sparse‑view CT (SVCT), and limited‑angle CT (LACT). While these sub‑optimal acquisition protocols reduce radiation exposure, accelerate scans, or accommodate restricted patient postures, they inevitably introduce noise, streaking, and angular artifacts that compromise diagnostic utility. Existing deep‑learning solutions are typically specialized to a single NICT setting or anatomical region, requiring large, task‑specific datasets and extensive model redesign. Consequently, they lack the flexibility needed for rapid clinical deployment when new scanners or protocols emerge.
To overcome these limitations, the authors propose TAMP (Transformer AMPlifier), the first imaging foundation model (FM) designed for universal NICT enhancement. The approach consists of three major contributions:
- SimNICT – a massive physics‑driven simulated dataset. Starting from 9,638 ideal‑measurement CT (ICT) volumes sourced from ten public repositories, the authors simulate NICT acquisitions by applying physically accurate degradation models (dose reduction, sparse angular sampling, limited angular range). This yields 10.8 million paired NICT‑ICT images covering three NICT modalities, four anatomical regions (head, chest, abdomen, lower limbs), and three defect severity levels (low, medium, high). Compared with prior works, SimNICT is over 360× larger, providing the scale necessary for foundation‑model pre‑training while preserving realistic artifact characteristics.
- Multi‑Scale Integrated Transformer Network (MITNet). TAMP's backbone is a hierarchical transformer architecture that processes the input at multiple spatial resolutions simultaneously. Each scale employs self‑attention blocks with large receptive fields, enabling the model to capture both fine‑grained noise (typical of LDCT) and coarse‑scale angular artifacts (typical of LACT). A Dual‑Domain Enhancement Learning (DDEL) scheme jointly optimizes losses in the image domain and the projection (sinogram) domain, encouraging consistency across reconstruction steps and reducing residual streaks.
- Parameter‑efficient adaptation via LoRA. After pre‑training on SimNICT, TAMP can be fine‑tuned to a specific clinical scenario by updating only low‑rank adaptation (LoRA) matrices attached to each transformer layer. The authors demonstrate that with as few as five NICT‑ICT image pairs and 20 training epochs, TAMP‑S (the adapted version) matches or exceeds the performance of models trained from scratch on the same limited data. This dramatically lowers the data and compute barriers for site‑specific deployment.
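The physics‑driven degradations behind SimNICT (dose reduction, sparse angular sampling, limited angular range) all operate in the projection domain. The paper does not publish its simulation code, so the following is only a toy numpy sketch of the three degradation types on a synthetic sinogram; the photon count `I0`, the subsampling factors, and the sinogram itself are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def simulate_low_dose(sinogram, I0=1e4, rng=None):
    """Insert Poisson noise into an ideal sinogram of line integrals.

    Models photon counting under the Beer-Lambert law: detected counts
    N ~ Poisson(I0 * exp(-p)); the noisy line integral is -log(N / I0).
    I0 is the incident photon count per detector bin (lower = noisier).
    """
    rng = np.random.default_rng(rng)
    counts = rng.poisson(I0 * np.exp(-sinogram))
    counts = np.maximum(counts, 1)  # guard log(0) under photon starvation
    return -np.log(counts / I0)

def simulate_sparse_view(sinogram, keep_every=4):
    """Sparse-view CT: keep every k-th projection angle (rows = angles)."""
    return sinogram[::keep_every]

def simulate_limited_angle(sinogram, angles_deg, max_angle=90.0):
    """Limited-angle CT: drop projections beyond a restricted angular range."""
    mask = angles_deg <= max_angle
    return sinogram[mask]

# Toy sinogram: 180 angles x 128 detector bins of moderate attenuation.
angles = np.arange(180, dtype=float)
ideal = 0.02 * np.ones((180, 128)) * np.arange(1, 129)

noisy   = simulate_low_dose(ideal, I0=1e4, rng=0)        # LDCT-like noise
sparse  = simulate_sparse_view(ideal, keep_every=4)      # 45 of 180 views
limited = simulate_limited_angle(ideal, angles, 90.0)    # 0-90 degree arc
```

Pairing each degraded sinogram's reconstruction with the ideal reconstruction is what yields the NICT‑ICT training pairs described above.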
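The LoRA adaptation in the last bullet freezes the pre‑trained weights and trains only a low‑rank residual. A minimal numpy sketch of the idea (the dimensions, rank, and `alpha` scaling here are illustrative, not TAMP's actual configuration):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Linear layer with a LoRA adapter: effective weight W + (alpha/r) * B @ A.

    Only A (r x d_in) and B (d_out x r) are trained; the frozen pre-trained
    weight W (d_out x d_in) is left untouched.
    """
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))       # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero init

x = rng.standard_normal((1, d_in))
# B = 0 at init, so the adapter starts as an exact no-op on the frozen model.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

frozen = W.size                 # 262,144 frozen parameters
trainable = A.size + B.size     # 8,192 trainable parameters (~3% of full)
```

Zero‑initializing `B` means fine‑tuning starts exactly from the pre‑trained model, which is why adaptation from only a handful of slices remains stable.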
Experimental validation spans 27 distinct NICT enhancement tasks (3 modalities × 3 defect levels × 3 anatomical regions). Baselines include six specialized CNN/transformer models trained from scratch (RED‑CNN, FBPConvNet, SPECIAL, TransCT, the newly introduced MITNet, and ProCT), as well as ProCT adapted as a foundation model. Evaluation metrics comprise PSNR, RMSE, SSIM, and LPIPS, supplemented by radiologist visual assessments and real‑world clinical scans. Key findings:
- Without any task‑specific fine‑tuning, TAMP outperforms all baselines on PSNR in 16/27 tasks (≈59 %) and on LPIPS in 23/27 tasks (≈85 %).
- After LoRA adaptation (TAMP‑S), TAMP achieves the highest PSNR in 26/27 tasks (≈96 %) and the best LPIPS in all tasks, surpassing even the task‑specific models that were trained on the full training set.
- Quantitatively, TAMP delivers average PSNR gains of 4–7 dB, SSIM improvements of 2–5 %, and LPIPS reductions of >30 % relative to the strongest baselines.
- Radiologists report that TAMP‑enhanced images retain fine anatomical details, exhibit fewer streaks, and are more readily interpretable for diagnosis.
- Real‑world validation on clinical LDCT, SVCT, and LACT scans confirms that the physics‑driven simulation pipeline generalizes well to actual patient data, despite the absence of real NICT‑ICT pairs in training.
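The PSNR and RMSE figures above follow the standard definitions; a minimal sketch (the image size and error level are illustrative only):

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error between two images."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def psnr(pred, target, data_range=1.0):
    """PSNR in dB: 10 * log10(MAX^2 / MSE), MAX = image dynamic range."""
    mse = np.mean((pred - target) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

target = np.zeros((64, 64))
pred = target + 0.01   # uniform error of 0.01 on a [0, 1]-range image
# rmse = 0.01; psnr = 10 * log10(1 / 1e-4) = 40 dB
```

Because PSNR is logarithmic in MSE, the reported 4–7 dB gains correspond to substantial reductions in pixel‑level error.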
The authors also release the SimNICT dataset publicly, establishing a benchmark for future NICT research and addressing ethical concerns related to repeated patient scanning for data collection.
In summary, TAMP combines (i) large‑scale physics‑based pre‑training for universal artifact representation, (ii) a multi‑scale transformer backbone with dual‑domain loss to handle diverse defect scales, and (iii) LoRA‑based low‑cost adaptation for site‑specific fine‑tuning. This trifecta enables a single model to deliver state‑of‑the‑art image quality across a wide spectrum of non‑ideal CT protocols, dramatically reducing the time, data, and computational resources required for clinical adoption. The work represents a significant step toward truly universal, foundation‑model‑driven medical imaging enhancement.