Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

Cross-modal image translation remains brittle and inefficient. Standard diffusion approaches often rely on a single, global linear transfer between domains. We find that this shortcut forces the sampler to traverse off-manifold, high-cost regions, inflating the correction burden and inviting semantic drift. We refer to this shared failure mode as fixed-schedule domain transfer. In this paper, we embed domain-shift dynamics directly into the generative process. Our model predicts a spatially varying mixing field at every reverse step and injects an explicit, target-consistent restoration term into the drift. This in-step guidance keeps large updates on-manifold and shifts the model’s role from global alignment to local residual correction. We provide a continuous-time formulation with an exact solution form and derive a practical first-order sampler that preserves marginal consistency. Empirically, across translation tasks in medical imaging, remote sensing, and electroluminescence semantic mapping, our framework improves structural fidelity and semantic consistency while converging in fewer denoising steps.


💡 Research Summary

The paper tackles a fundamental inefficiency in diffusion‑based cross‑modal image translation: the reliance on a fixed, globally linear schedule that linearly interpolates between source and target modalities during reverse diffusion. Such a schedule forces the sampler to traverse high‑energy off‑manifold regions, burdening the diffusion model with large corrective updates and often causing semantic drift, especially when the source and target domains differ substantially in texture, intensity, or structure.
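The fixed schedule criticized here can be made concrete with a small sketch. The code below is illustrative, not from the paper: it assumes a scalar weight λₜ = t/T that blends the source and target images identically at every pixel, which is exactly the "single, global linear transfer" the authors argue forces off-manifold trajectories.

```python
import numpy as np

def fixed_linear_mixture(x_src, x_tgt, t, T):
    """Fixed-schedule domain transfer: one global scalar lam = t / T
    blends source and target, identically at every pixel."""
    lam = t / T  # spatially constant mixing weight, blind to local content
    return lam * x_src + (1.0 - lam) * x_tgt

# Toy 2x2 "images": halfway through the schedule, every pixel gets the
# same 50/50 blend regardless of texture or structure.
x_src = np.full((2, 2), 4.0)
x_tgt = np.zeros((2, 2))
mid = fixed_linear_mixture(x_src, x_tgt, t=500, T=1000)
```

Because λₜ carries no spatial information, regions where the two modalities differ sharply (e.g. bone vs. soft tissue in medical imaging) receive the same blend as regions where they already agree, which is the inefficiency the proposed mixing field addresses.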

To overcome this, the authors propose Cross‑Domain Translation SDE (CDTSDE), which embeds a learnable, spatially‑varying, channel‑aware mixing field Λₜ into every reverse‑diffusion step. At each time step t, a lightweight convolutional network Sθ receives a base linear schedule λ_linₜ and a sinusoidal positional encoding π(p) and outputs a modulation hₜ,c(p). After zero‑centering and logistic squashing, this yields Λₜ,c(p)∈(0,1), which defines a per‑pixel mixture
 dₜ = Λₜ ⊙ x̂_src + (1−Λₜ) ⊙ x₀.
The forward marginal is then q(xₜ | x₀, x̂_src) = 𝒩(xₜ; √ᾱₜ · dₜ, σ²ₜ I), i.e., the usual Gaussian noise schedule but centered on the dynamic mixture. The reverse transition incorporates an explicit “restoration drift” term proportional to (Λₜ − Λₜ₋₁) ⊙ (x̂_src − x₀), which pushes the sample toward the target domain at every step.
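The pipeline above can be sketched in NumPy. This is a minimal illustration under my own assumptions (the raw modulation hₜ is taken as given rather than produced by the network Sθ; function names are mine): zero-center and logistic-squash hₜ to get Λₜ ∈ (0, 1), form the per-pixel mixture dₜ, sample the forward marginal centered on it, and compute the restoration-drift term.

```python
import numpy as np

def mixing_field(h):
    """Zero-center the raw modulation h, then logistic-squash so that
    Lambda_t lies in (0, 1) at every pixel (sketch of the Lambda construction)."""
    h = h - h.mean()
    return 1.0 / (1.0 + np.exp(-h))

def forward_sample(x0, x_src, h_t, alpha_bar_t, sigma_t, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * d_t, sigma_t^2 I), where
    d_t = Lambda_t * x_src + (1 - Lambda_t) * x0 is the dynamic mixture."""
    lam = mixing_field(h_t)
    d_t = lam * x_src + (1.0 - lam) * x0
    x_t = np.sqrt(alpha_bar_t) * d_t + sigma_t * rng.standard_normal(x0.shape)
    return x_t, lam

def restoration_drift(lam_t, lam_prev, x_src, x0):
    """Explicit restoration term (Lambda_t - Lambda_{t-1}) * (x_src - x0)
    injected into the reverse drift at each step."""
    return (lam_t - lam_prev) * (x_src - x0)

rng = np.random.default_rng(0)
x0 = np.zeros((4, 4))          # stand-in for the target-domain clean image
x_src = np.ones((4, 4))        # stand-in for the source-domain estimate
h_t = rng.standard_normal((4, 4))
x_t, lam_t = forward_sample(x0, x_src, h_t, alpha_bar_t=0.5, sigma_t=0.1, rng=rng)
lam_prev = mixing_field(rng.standard_normal((4, 4)))
drift = restoration_drift(lam_t, lam_prev, x_src, x0)
```

Because Λₜ varies per pixel, the drift term moves each location toward the target domain at its own rate, in contrast to the globally uniform shift of the fixed linear schedule.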

The authors formalize a path-energy functional E over sampling trajectories.

