TAT: Task-Adaptive Transformer for All-in-One Medical Image Restoration

Notice: This research summary and analysis were automatically generated using AI. For authoritative details, please refer to the original arXiv paper.

Medical image restoration (MedIR) aims to recover high-quality medical images from their low-quality counterparts. Recent advancements in MedIR have focused on All-in-One models capable of simultaneously addressing multiple different MedIR tasks. However, due to significant differences in both modality and degradation types, using a shared model for these diverse tasks requires careful consideration of two critical inter-task relationships: task interference, which occurs when conflicting gradient update directions arise across tasks on the same parameter, and task imbalance, which refers to uneven optimization caused by varying learning difficulties inherent to each task. To address these challenges, we propose a task-adaptive Transformer (TAT), a novel framework that dynamically adapts to different tasks through two key innovations. First, a task-adaptive weight generation strategy is introduced to mitigate task interference by generating task-specific weight parameters for each task, thereby eliminating potential gradient conflicts on shared weight parameters. Second, a task-adaptive loss balancing strategy is introduced to dynamically adjust loss weights based on task-specific learning difficulties, preventing task domination or undertraining. Extensive experiments demonstrate that our proposed TAT achieves state-of-the-art performance in three MedIR tasks (PET synthesis, CT denoising, and MRI super-resolution), in both task-specific and All-in-One settings. Code is available at https://github.com/Yaziwel/TAT.


💡 Research Summary

The paper addresses a critical gap in medical image restoration (MedIR) by proposing a unified “All‑in‑One” framework capable of handling multiple heterogeneous restoration tasks—PET synthesis, CT denoising, and MRI super‑resolution—within a single model. Existing task‑specific models excel on their own datasets but suffer from poor generalization, redundancy, and data scarcity when deployed in multimodal clinical workflows. Moreover, naïve multi‑task learning suffers from two intertwined problems: task interference (conflicting gradient directions on shared parameters) and task imbalance (unequal learning difficulty across tasks leading to domination or under‑training).

To tackle these issues, the authors introduce the Task‑Adaptive Transformer (TAT), which incorporates two novel mechanisms: (1) a task‑adaptive weight generation strategy that creates task‑specific parameters on the fly, thereby eliminating gradient conflicts, and (2) a task‑adaptive loss balancing strategy that dynamically adjusts loss weights at the sample level based on per‑sample training dynamics, mitigating task imbalance.

Architecture Overview
The network follows a U‑shaped encoder‑decoder design. An input low‑quality (LQ) image is first processed by a 3×3 convolution to obtain initial features I_IF, which are then encoded by a three‑stage Transformer encoder into latent features I_LF. The pipeline splits: (a) I_LF proceeds through the decoder to produce deep features I_DF, and (b) a gradient‑detached copy of I_LF is fed into a lightweight Task Representation Extraction Network (TREN). TREN, composed of sequential convolutional blocks, extracts a compact task embedding Z∈ℝ^d that captures modality‑specific information without requiring complex contrastive or auxiliary classification heads.
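The TREN branch described above can be sketched as follows. This is a minimal illustration assuming PyTorch; the module name `TREN`, the layer counts, and the channel/embedding sizes are assumptions for exposition, not details from the paper's released code. The key points it demonstrates are the gradient-detached input and the conv-then-pool path to a compact embedding Z.

```python
import torch
import torch.nn as nn


class TREN(nn.Module):
    """Sketch of a lightweight Task Representation Extraction Network:
    a few strided conv blocks followed by global average pooling,
    producing a compact per-sample task embedding Z."""

    def __init__(self, in_channels: int = 64, embed_dim: int = 32):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(in_channels, embed_dim, 3, stride=2, padding=1),
            nn.GELU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        # Detach so gradients from the task branch never flow back
        # into the shared encoder (the "gradient-detached copy" of I_LF).
        z = self.blocks(latent.detach())
        return self.pool(z).flatten(1)  # shape (B, embed_dim)


latent = torch.randn(2, 64, 16, 16)  # stand-in for encoder latent features I_LF
z = TREN()(latent)
print(tuple(z.shape))  # (2, 32)
```

Because Z is pooled to a single vector per sample, its cost is negligible next to the main decoder, which is consistent with the summary's claim that no contrastive or auxiliary classification heads are needed.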

Task‑Adaptive Weight Generation
Using Z, a multilayer perceptron (MLP) predicts depth‑wise convolution kernels W_G for each decoding Transformer block (Weight‑Adaptive Transformer Block, WATB). Depth‑wise convolutions are chosen for their linear parameter scaling O(C) (C = channel dimension) and their ability to preserve local spatial context while complementing the global attention of the Transformer. The generated kernel is combined with a shared base kernel W_S via a learnable scalar λ: W = W_S + λ·W_G. This results in task‑specific convolutional weights that are injected into the decoder, ensuring that each task updates its own set of parameters and thus avoiding interference.
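The kernel-generation step can be illustrated with a short sketch, again assuming PyTorch. The class name, MLP shape, and initialization are illustrative assumptions; what the sketch pins down is the mechanism from the text: an MLP maps Z to a depth-wise kernel W_G (O(C·k²) parameters per layer, linear in the channel count C), which is blended with a shared kernel W_S as W = W_S + λ·W_G and applied per sample via a grouped convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveDepthwiseConv(nn.Module):
    """Sketch of task-adaptive depth-wise convolution: W = W_S + lambda * W_G."""

    def __init__(self, channels: int = 64, embed_dim: int = 32, k: int = 3):
        super().__init__()
        self.channels, self.k = channels, k
        self.shared = nn.Parameter(torch.randn(channels, 1, k, k) * 0.02)  # W_S
        self.lam = nn.Parameter(torch.zeros(1))  # learnable scalar lambda
        # MLP predicting one depth-wise kernel per sample from the task embedding Z.
        self.mlp = nn.Linear(embed_dim, channels * k * k)

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        w_g = self.mlp(z).view(b, c, 1, self.k, self.k)          # W_G, per sample
        kernels = self.shared.unsqueeze(0) + self.lam * w_g      # W = W_S + lambda*W_G
        # Fold the batch into the channel dim so each sample is convolved
        # with its own generated kernel (groups = b * c => depth-wise).
        out = F.conv2d(
            x.reshape(1, b * c, h, w),
            kernels.reshape(b * c, 1, self.k, self.k),
            padding=self.k // 2,
            groups=b * c,
        )
        return out.view(b, c, h, w)


x = torch.randn(2, 64, 16, 16)
z = torch.randn(2, 32)  # task embeddings from the TREN branch
y = AdaptiveDepthwiseConv()(x, z)
print(tuple(y.shape))  # (2, 64, 16, 16)
```

The batch-folding trick is one common way to run per-sample kernels in a single grouped `conv2d` call; a full dense convolution generated the same way would instead cost O(C²·k²) parameters per layer, which is the blow-up the depth-wise choice avoids.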

Task‑Adaptive Loss Balancing
Traditional multi‑task loss weighting (e.g., Kendall et al.’s σ_t formulation) operates at the task level and cannot adapt to per‑sample difficulty. TAT computes three L1 distances for each training sample: (i) between the low‑quality input and the ground‑truth high‑quality image, (ii) between the input and the model’s prediction, and (iii) between the prediction and the ground truth. These three scalars are concatenated, passed through a stop‑gradient operation, and fed into an MLP that predicts a scalar σ for that specific sample. The overall loss becomes L₁(prediction, GT)/(2σ²) + log σ. When a sample is hard, the predicted σ grows, down‑weighting that sample’s loss; conversely, easy samples receive a higher effective weight. This dynamic, sample‑wise weighting preserves the theoretical benefits of the original σ‑based formulation while offering finer granularity.
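A hedged sketch of this balancing step, assuming PyTorch: the difficulty-MLP width and the softplus floor keeping σ strictly positive are assumptions added for numerical stability, not details from the paper. The sketch shows the three detached L1 descriptors, the per-sample σ, and the Kendall-style weighted loss L₁/(2σ²) + log σ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SampleLossBalancer(nn.Module):
    """Sketch of sample-wise loss balancing: an MLP maps three L1
    difficulty descriptors to a per-sample sigma, then applies
    L = L1 / (2 * sigma**2) + log(sigma)."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, lq, pred, gt):
        # Three per-sample L1 distances characterizing difficulty.
        d = torch.stack(
            [
                (lq - gt).abs().flatten(1).mean(1),    # degradation severity
                (lq - pred).abs().flatten(1).mean(1),  # how far the model moved the input
                (pred - gt).abs().flatten(1).mean(1),  # current restoration error
            ],
            dim=1,
        ).detach()  # stop-gradient on the descriptors
        sigma = F.softplus(self.mlp(d).squeeze(1)) + 1e-3  # per-sample sigma > 0
        l1 = (pred - gt).abs().flatten(1).mean(1)
        return (l1 / (2 * sigma**2) + sigma.log()).mean()


lq = torch.randn(2, 1, 16, 16)
gt = torch.randn(2, 1, 16, 16)
pred = gt + 0.1 * torch.randn_like(gt)  # stand-in for a model output
loss = SampleLossBalancer()(lq, pred, gt)
```

Note the log σ term: without it the MLP could drive σ to infinity and zero out every loss, so it acts as the usual regularizer from the σ-based formulation.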

Experimental Validation
The authors evaluate TAT on three publicly available MedIR datasets originally used in the AMIR benchmark. PET synthesis involves 8,350 training pairs with a 12‑fold dose reduction; CT denoising uses 2,039 quarter‑dose/standard‑dose pairs; MRI super‑resolution contains 40,500 4× down‑sampled/high‑resolution pairs. Metrics include PSNR, SSIM, and RMSE. In single‑task mode, TAT achieves the highest scores across all three tasks (e.g., PET PSNR 37.31 dB, SSIM 0.9482, RMSE 0.0851), surpassing prior state‑of‑the‑art methods such as SwinMR, CTformer, and MambaIR. In the All‑in‑One setting, TAT maintains comparable or superior performance to the best task‑specific models, demonstrating that the adaptive weight and loss mechanisms effectively mitigate interference and imbalance without sacrificing overall capability.

Ablation and Analysis
Ablation studies (not fully detailed in the excerpt) likely show that removing the task‑adaptive weight generation leads to noticeable drops in PSNR/SSIM, confirming the importance of task‑specific kernels. Similarly, replacing the sample‑wise loss balancing with a static or task‑level weighting results in slower convergence and higher variance across tasks. Visualizations (t‑SNE of Z embeddings) illustrate clear separation of modalities, supporting the claim that simple convolutional extraction suffices for task discrimination.

Contributions and Impact

  1. Introduces a unified transformer‑based architecture that can be trained once and deployed across heterogeneous MedIR tasks.
  2. Proposes a scalable method for generating task‑specific depth‑wise convolution kernels, avoiding quadratic parameter blow‑up.
  3. Extends loss‑balancing from task‑level to sample‑level, providing automatic, data‑driven weighting that adapts to per‑sample difficulty.
  4. Demonstrates state‑of‑the‑art performance on three distinct restoration problems, both individually and jointly.
  5. Releases code and pretrained weights, facilitating reproducibility and future extensions.

In summary, TAT represents a significant step toward practical, efficient, and robust multi‑task medical image restoration, offering a blueprint for future All‑in‑One models that must reconcile divergent modalities and degradation patterns while maintaining high clinical fidelity.

