Universal Multi-Domain Translation via Diffusion Routers
Multi-domain translation (MDT) aims to learn translations between multiple domains, yet existing approaches either require fully aligned tuples or can only handle domain pairs seen in training, limiting their practicality and excluding many cross-domain mappings. We introduce universal MDT (UMDT), a generalization of MDT that seeks to translate between any pair of $K$ domains using only $K-1$ paired datasets with a central domain. To tackle this problem, we propose Diffusion Router (DR), a unified diffusion-based framework that models all central$\leftrightarrow$non-central translations with a single noise predictor conditioned on the source and target domain labels. DR enables indirect non-central translations by routing through the central domain. We further introduce a novel scalable learning strategy with a variational-bound objective and an efficient Tweedie refinement procedure to support direct non-central mappings. Through evaluation on three large-scale UMDT benchmarks, DR achieves state-of-the-art results for both indirect and direct translations, while lowering sampling cost and unlocking novel tasks such as sketch$\leftrightarrow$segmentation. These results establish DR as a scalable and versatile framework for universal translation across multiple domains.
💡 Research Summary
The paper tackles the problem of translating among many data domains when only a limited set of paired data is available. Traditional multi‑domain translation (MDT) either requires fully aligned tuples across all domains—a prohibitive data collection burden—or it learns only the mappings between a single “central” domain and each peripheral domain, leaving translations between peripheral domains unsupported. To address these limitations, the authors define Universal Multi‑Domain Translation (UMDT). In UMDT, K domains are considered, but only K − 1 paired datasets linking each peripheral domain to a chosen central domain are required. The goal is to learn a model capable of translating between any pair of domains, even those never seen together during training.
The proposed solution, Diffusion Router (DR), is a unified diffusion‑based framework built on a single noise‑prediction network ε_θ. Unlike conventional conditional diffusion models that condition on a single label or class, DR conditions on both the source and target domain identifiers (src, tgt). The design mirrors a network router, which forwards packets based on source and destination IP addresses: given the (src, tgt) pair, the same network infers the correct translation path for any noisy input x_t. Consequently, all bidirectional central↔peripheral mappings are learned jointly, eliminating the need for O(K) separate models.
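The joint (src, tgt) conditioning can be illustrated with a minimal NumPy sketch. The one-hot concatenation and the linear `ToyNoisePredictor` below are illustrative stand-ins, not the paper's architecture; the point is only that a single network receives both domain labels alongside x_t and the timestep.

```python
import numpy as np

def domain_condition(src: int, tgt: int, num_domains: int) -> np.ndarray:
    """Concatenate one-hot codes for the source and target domain labels.
    One DR network receives this joint code, so a single model covers every
    central<->peripheral direction (encoding scheme here is illustrative)."""
    code = np.zeros(2 * num_domains)
    code[src] = 1.0
    code[num_domains + tgt] = 1.0
    return code

class ToyNoisePredictor:
    """Minimal linear stand-in for eps_theta(x_t, t, src, tgt)."""
    def __init__(self, dim: int, num_domains: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.num_domains = num_domains
        # Input = noisy sample + domain code + timestep scalar.
        self.W = rng.normal(0.0, 0.01, size=(dim, dim + 2 * num_domains + 1))

    def __call__(self, x_t: np.ndarray, t: float, src: int, tgt: int) -> np.ndarray:
        inp = np.concatenate([x_t, domain_condition(src, tgt, self.num_domains), [t]])
        return self.W @ inp
```

Swapping src and tgt flips the predicted translation direction without changing any weights, which is what makes one network sufficient for both directions of each pair.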
Indirect translation between two peripheral domains X_i and X_j is performed in two stages: (1) generate a central‑domain sample x_c conditioned on the source sample x_i (or x_j), and (2) generate the target sample x_j (or x_i) conditioned on the intermediate x_c. Probabilistically, this corresponds to p(x_j|x_i)=∫p(x_j|x_c)p(x_c|x_i)dx_c, assuming conditional independence given the central domain. The authors show that a single DR model can learn all required conditional distributions p(x_k|x_c) and p(x_c|x_k) using a paired loss (Eq. 5) that randomly swaps source and target roles during training.
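The two-stage routing described above can be sketched as two conditional reverse-diffusion passes chained through the central domain. The crude Euler-style denoising update and the `CENTRAL = 0` index below are illustrative assumptions, not the paper's sampler.

```python
import numpy as np

CENTRAL = 0  # index of the central domain (illustrative choice)

def reverse_diffusion(eps_fn, cond, src, tgt, dim, steps=10, seed=0):
    """Toy reverse pass conditioned on a source sample `cond` and the
    (src, tgt) domain labels; eps_fn is any noise predictor with the
    signature eps_fn(x_t, cond, t, src, tgt)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)          # start from pure noise
    for t in range(steps, 0, -1):
        eps = eps_fn(x, cond, t / steps, src, tgt)
        x = x - 0.1 * eps                 # crude denoising step (illustrative)
    return x

def indirect_translate(eps_fn, x_i, src, tgt, dim):
    """Route x_i -> central -> x_j: sample x_c ~ p(x_c|x_i), then
    x_j ~ p(x_j|x_c), matching the factorization
    p(x_j|x_i) = ∫ p(x_j|x_c) p(x_c|x_i) dx_c."""
    x_c = reverse_diffusion(eps_fn, x_i, src, CENTRAL, dim)
    x_j = reverse_diffusion(eps_fn, x_c, CENTRAL, tgt, dim)
    return x_j
```

Because both stages call the same `eps_fn` with different (src, tgt) labels, no extra model is needed for the intermediate hop.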
Direct peripheral‑to‑peripheral translation is more challenging because it would require sampling from the latent central distribution, which is computationally expensive. The authors introduce a variational‑upper‑bound objective that minimizes the KL divergence between the true indirect distribution and a direct model pθ(x_j|x_i). By approximating the intractable term with the already‑trained central‑to‑target distribution p_ref(x_j|x_c), they derive a tractable loss (Eq. 8) that aligns the direct conditional with the indirect one. This loss is implemented as a noise‑prediction objective (Eq. 10) that fine‑tunes the same DR network while keeping the original central↔peripheral mappings intact. A weighted combination of the paired and unpaired losses (Eq. 11) balances learning new direct routes against preserving existing ones.
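The training objective can be sketched as follows. These toy losses are illustrative stand-ins for Eqs. (5), (10), and (11): the noising schedule, conditioning interface, and the way the frozen reference predictor enters the direct loss are all simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def paired_loss(eps_fn, x_src, x_tgt, src, tgt, t, eps, rng):
    """Eq. (5)-style paired objective: randomly swap source/target roles so
    one network learns both directions of each central<->peripheral pair."""
    if rng.random() < 0.5:
        x_src, x_tgt, src, tgt = x_tgt, x_src, tgt, src
    x_t = np.sqrt(1 - t) * x_tgt + np.sqrt(t) * eps   # toy noising schedule
    return np.mean((eps_fn(x_t, x_src, t, src, tgt) - eps) ** 2)

def direct_loss(eps_fn, eps_ref, x_i, x_c, t, eps, src, tgt, central=0):
    """Variational-bound surrogate (stand-in for Eq. (10)): align the direct
    predictor, conditioned on the peripheral source x_i, with a frozen
    reference predictor conditioned on the central intermediate x_c."""
    x_t = np.sqrt(1 - t) * x_c + np.sqrt(t) * eps
    return np.mean((eps_fn(x_t, x_i, t, src, tgt)
                    - eps_ref(x_t, x_c, t, central, tgt)) ** 2)

def total_loss(l_paired, l_direct, lam=0.5):
    """Eq. (11)-style weighted sum balancing new direct routes against
    preserving the learned central<->peripheral mappings."""
    return l_paired + lam * l_direct
```

The weight `lam` is the knob mentioned in the text: raising it favors learning direct routes, lowering it favors protecting the original paired mappings.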
To further reduce sampling cost, the paper proposes Tweedie refinement, a lightweight procedure that adjusts a noisy sample at a given timestep t toward the conditional mean using a closed‑form Tweedie estimator. Instead of running a full reverse diffusion from T to t for each sample, a few refinement steps (e.g., n = 0–7) suffice to obtain high‑quality conditional samples, dramatically speeding up both training and inference.
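The refinement loop can be sketched with the standard Tweedie estimate under an eps-parameterized DDPM, x̂_0 = (x_t − √(1 − ᾱ_t) ε_θ) / √ᾱ_t. The alternation of estimate and re-noising below is a minimal sketch of the idea; the exact schedule and stopping rule are assumptions, not the paper's procedure.

```python
import numpy as np

def tweedie_x0(x_t, eps, alpha_bar_t):
    """Closed-form Tweedie estimate of the clean sample under the
    eps-parameterization: x0_hat = (x_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t)."""
    return (x_t - np.sqrt(1 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

def tweedie_refine(eps_fn, x_t, t, alpha_bar_t, n=4, seed=0):
    """Instead of running the full reverse chain from T down to t, alternate
    a Tweedie estimate with re-noising back to timestep t for a few steps
    (the paper reports n = 0-7 sufficing); loop structure is illustrative."""
    rng = np.random.default_rng(seed)
    for _ in range(n):
        x0_hat = tweedie_x0(x_t, eps_fn(x_t, t), alpha_bar_t)
        noise = rng.standard_normal(x_t.shape)
        # Re-noise the estimate back to timestep t via the forward process.
        x_t = np.sqrt(alpha_bar_t) * x0_hat + np.sqrt(1 - alpha_bar_t) * noise
    return tweedie_x0(x_t, eps_fn(x_t, t), alpha_bar_t)
```

Each refinement step costs one network evaluation, so n refinement steps replace the T − t reverse-diffusion steps a naive sampler would spend per conditional sample.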
Experiments are conducted on three newly constructed large‑scale UMDT benchmarks: Shoes‑UMDT, Faces‑UMDT‑Latent, and Sketch‑Segmentation. The authors compare DR against strong baselines from GANs (StarGAN), normalizing flows (Rectified Flow), and other diffusion models (UniDiffuser, iDR/dDR). Evaluation uses Fréchet Inception Distance (FID) and sampling time. DR consistently achieves the lowest FID scores for both indirect and direct translations, often by a large margin, and requires fewer diffusion steps thanks to Tweedie refinement. Notably, DR can produce direct peripheral‑to‑peripheral translations that many baselines cannot handle at all. The method also generalizes to more complex spanning‑tree topologies with multiple central nodes.
In summary, the contributions are: (1) formalizing the UMDT problem, (2) introducing the Diffusion Router that conditions on source and target domain labels, (3) deriving a variational bound‑based training scheme for direct cross‑domain mappings, (4) presenting Tweedie refinement for efficient sampling, and (5) constructing large‑scale benchmarks that demonstrate state‑of‑the‑art performance. Limitations include potential degradation when the central domain shares little information with peripherals and the bias introduced by the variational approximation. Future work may explore dynamic label embeddings, non‑star routing graphs, and extensions to non‑image modalities such as text or audio.