CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Continuous-time generative models have achieved remarkable success in image restoration and synthesis. However, controlling the composition of multiple pre-trained models remains an open challenge. Current approaches largely treat composition as an algebraic combination of probability densities, for example via products or mixtures of experts. This perspective assumes the target distribution is known explicitly, which is almost never the case. In this work, we propose a different paradigm that formulates compositional generation as a cooperative stochastic optimal control (SOC) problem. Rather than combining probability densities, we treat pre-trained diffusion models as interacting agents whose diffusion trajectories are jointly steered, via optimal control, toward a shared objective defined on their aggregated output. We validate our framework on conditional MNIST generation and compare it against a naive inference-time DPS-style baseline that replaces learned cooperative control with per-step gradient guidance.

💡 Research Summary

The paper introduces CMAD (Cooperative Multi‑Agent Diffusion), a novel framework for compositional generation that treats multiple pretrained diffusion models as interacting agents whose reverse‑time trajectories are jointly steered toward a task‑specific objective. Traditional compositional methods rely on algebraic combinations of probability densities (e.g., product of experts, geometric averages), which presuppose an explicit target density, a requirement rarely satisfied in practice. CMAD departs from this paradigm by formulating the problem as a stochastic optimal control (SOC) task. Each diffusion model is modeled as an independent agent i with a controlled stochastic differential equation (SDE): $dX^{u,i}_t = b_i(X^{u,i}_t, t)\,dt + g(t)\,u_i(X^{u,i}_t, t; \{X^{u,j}_t\})\,dt + g(t)\,dW_t$. The control $u_i$ depends on the states of all agents, creating a coupled SDE system that mirrors classic SOC formulations.
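To make the coupled dynamics concrete, the following is a minimal sketch of an Euler-Maruyama simulation of the controlled SDE above for scalar agents. The function name `simulate_coupled`, the toy drift, the toy control, and the use of an independent Gaussian increment per agent are all assumptions made for illustration; the summary does not describe the authors' discretisation or network parameterisation.

```python
import math
import random

def simulate_coupled(drifts, controls, g, x0, n_steps=100, T=1.0, seed=0):
    """Euler-Maruyama for dX_i = b_i(X_i, t) dt + g(t) u_i(X_i, t, X_all) dt + g(t) dW.

    drifts[i](x, t)          -> drift b_i of agent i (hypothetical stand-in for a pretrained model)
    controls[i](x, t, joint) -> control u_i, which may read the joint state of all agents
    """
    rng = random.Random(seed)
    dt = T / n_steps
    xs = list(x0)
    path = [tuple(xs)]
    for k in range(n_steps):
        t = k * dt
        joint = tuple(xs)  # every agent sees the same pre-step joint state
        new_xs = []
        for i, x in enumerate(joint):
            noise = rng.gauss(0.0, math.sqrt(dt))  # assumption: independent noise per agent
            step = drifts[i](x, t) * dt + g(t) * controls[i](x, t, joint) * dt + g(t) * noise
            new_xs.append(x + step)
        xs = new_xs
        path.append(tuple(xs))
    return path

# Toy usage: two agents whose controls push the sum of their states toward a target.
target = 1.0
drifts = [lambda x, t: -x, lambda x, t: -x]
controls = [lambda x, t, joint: target - sum(joint)] * 2
path = simulate_coupled(drifts, controls, g=lambda t: 1.0, x0=[0.0, 0.0])
```

Because each control reads `joint`, changing one agent's state changes the drift of every other agent, which is the coupling the formulation relies on.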

The overall objective J combines three terms: (1) a quadratic control cost $\sum_i \lambda_i \|u_i\|^2$ to regularize the magnitude of the controls, (2) a running cost $c(Y_t, t)$ that provides dense‑time gradients, and (3) a terminal cost $\Psi(Y_T)$ evaluated on the aggregated output $Y_t = \varphi(\{X^{u,i}_t\})$, where $\varphi$ is a possibly learnable aggregation operator. The running cost is implemented as a time‑scaled surrogate of the terminal loss, $c(Y_t, t) = \alpha_t \Psi(\hat{Y}_0)$, with $\hat{Y}_0$ being a Tweedie‑based denoised estimate. This design ensures that the agents receive informative feedback throughout the reverse diffusion process, not only at the final step.
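As an illustration of the running cost, here is a small sketch of a Tweedie-based denoised estimate and the time-scaled surrogate built on it. It assumes a variance-preserving forward process x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, which the summary does not specify; `abar_t`, `weight_t` (standing in for the time weight α_t), and the quadratic `psi` are hypothetical names and choices, and the example works on a scalar rather than an aggregated image.

```python
import math

def tweedie_x0(x_t, score, abar_t):
    """Posterior-mean estimate of x_0, assuming x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    return (x_t + (1.0 - abar_t) * score) / math.sqrt(abar_t)

def running_cost(x_t, score, abar_t, weight_t, psi):
    """c(Y_t, t) = weight_t * psi(Y0_hat), with Y0_hat taken from Tweedie's formula."""
    return weight_t * psi(tweedie_x0(x_t, score, abar_t))

# Sanity check on a case with a known answer: if x_0 ~ N(0, 1), then x_t ~ N(0, 1),
# so the exact score is -x_t and the posterior mean is sqrt(abar_t) * x_t.
x_t, abar_t = 0.8, 0.25
y0_hat = tweedie_x0(x_t, -x_t, abar_t)
cost = running_cost(x_t, -x_t, abar_t, weight_t=0.5, psi=lambda y: (y - 1.0) ** 2)
```

In practice the score would come from the pretrained diffusion model rather than a closed form; the Gaussian case is used here only because the result can be checked by hand.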

Optimization proceeds via a coordinate‑wise scheme called Control‑wise Optimisation with Iterative Diffusion Optimisation (IDO). In the outer loop, a single agent i is selected while the controls of all other agents are held fixed. The inner loop repeatedly samples trajectories of the coupled SDEs, computes a Monte‑Carlo estimate of J and its gradients with respect to uᵢ (and optionally the aggregation parameters ϑ), and updates uᵢ by stochastic gradient descent. This “control‑wise” update yields a sequence of single‑agent SOC sub‑problems, each solved efficiently with IDO. The authors also experiment with a joint update where all controls are optimized simultaneously.
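The control-wise scheme can be sketched on a toy problem in which everything is simplified enough to check by hand. In this sketch each control is a single constant `theta[i]` (not a neural network), the drift is zero, g(t) = 1, the aggregation is a sum, the terminal cost is a squared distance to `target`, and the path-wise gradient is written out analytically instead of obtained by automatic differentiation. All names and hyperparameters are assumptions for illustration and are not taken from the paper.

```python
import math
import random

def sample_grad(theta, i, lam, target, rng, n_steps=10, T=1.0):
    """One-trajectory estimate of dJ/dtheta_i for constant controls u_j = theta_j.

    J = E[ lam * sum_j theta_j^2 * T + (Y_T - target)^2 ],  Y_T = sum_j X_j(T).
    """
    dt = T / n_steps
    x = [0.0 for _ in theta]
    sens = 0.0  # path-wise sensitivity dX_i(T)/dtheta_i, accumulated along the path
    for _ in range(n_steps):
        for j in range(len(theta)):
            x[j] += theta[j] * dt + rng.gauss(0.0, math.sqrt(dt))  # b = 0, g = 1
        sens += dt
    y_T = sum(x)  # toy aggregation operator: sum of agent states
    return 2.0 * lam * theta[i] * T + 2.0 * (y_T - target) * sens

def control_wise_fit(n_agents=2, sweeps=10, inner_steps=20, batch=32,
                     lr=0.05, lam=0.1, target=2.0, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * n_agents
    for _ in range(sweeps):
        for i in range(n_agents):            # outer loop: select one agent
            for _ in range(inner_steps):     # inner loop: SGD on theta[i], others frozen
                grad = sum(sample_grad(theta, i, lam, target, rng)
                           for _ in range(batch)) / batch
                theta[i] -= lr * grad
    return theta

theta = control_wise_fit()
```

For this toy objective the minimiser has theta[0] + theta[1] ≈ 1.905 (slightly below the target of 2 because of the control penalty). A joint update would instead step every `theta[i]` from the same batch of trajectories.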

Empirical validation is performed on a proof‑of‑concept task using MNIST digits. The authors split each 28×28 image into horizontal stripes and assign each stripe to a distinct agent (2 or 3 agents in total). The aggregation operator simply concatenates the non‑overlapping stripes. The terminal loss Ψ is the negative log‑likelihood of a pretrained MNIST classifier for a target digit, while a seam‑continuity loss encourages smooth transitions across stripe boundaries. As a baseline, they implement a DPS‑style inference‑time guidance method (CDPS) that approximates the optimal control by a scaled gradient of Ψ at each step. Results show that CMAD (both joint and control‑wise variants) achieves comparable or higher classification accuracy (≈99 %) while substantially reducing the terminal loss compared to CDPS. Qualitatively, CMAD produces more realistic digit shapes with fewer artifacts, though it exhibits slightly reduced sample diversity.
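The stripe setup can be illustrated with a short sketch of the split, the concatenating aggregation operator, and one possible seam-continuity penalty. The exact stripe heights for three agents and the form of the seam loss are not given in the summary; the near-even split and the mean squared jump across each boundary used below are assumptions, as are the function names.

```python
def split_rows(image, n_agents):
    """Split a list of rows into n_agents contiguous horizontal stripes (near-even heights)."""
    base, extra = divmod(len(image), n_agents)
    stripes, start = [], 0
    for i in range(n_agents):
        height = base + (1 if i < extra else 0)
        stripes.append(image[start:start + height])
        start += height
    return stripes

def aggregate(stripes):
    """Aggregation operator: stack the non-overlapping stripes back into one image."""
    return [row for stripe in stripes for row in stripe]

def seam_loss(stripes):
    """Assumed penalty: mean squared jump between the two rows meeting at each seam."""
    total, count = 0.0, 0
    for upper, lower in zip(stripes, stripes[1:]):
        for a, b in zip(upper[-1], lower[0]):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0

# Toy 28x28 "image" whose row r is filled with the value r.
image = [[float(r)] * 28 for r in range(28)]
stripes = split_rows(image, 3)
restored = aggregate(stripes)
seam = seam_loss(stripes)
```

In the full method the classifier loss would be evaluated on the aggregated image, with a seam term of this kind added so that neighbouring agents agree at their shared boundaries.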

The contributions of the paper are threefold: (1) a new multi‑agent SOC formulation for compositional diffusion that does not require an explicit algebraic target density, (2) a control‑wise optimization scheme, built on IDO, that can handle the coupled dynamics of multiple agents, and (3) empirical evidence that directly optimizing a task‑defined loss yields better fidelity than heuristic gradient guidance. Limitations include the focus on low‑dimensional MNIST data, the lack of theoretical convergence guarantees for the control‑wise updates, and the need for more sophisticated path‑wise gradient estimators to scale to high‑resolution images. Future work outlined by the authors involves extending the method to high‑dimensional settings, investigating adjoint‑matching or other efficient gradient techniques, exploring connections to differential games and fictitious‑play dynamics, and learning the aggregation operator φ jointly with the controls. Overall, CMAD opens a promising direction for flexible, goal‑driven composition of multiple diffusion models without relying on predefined density algebra.
