CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control

Alexander Denker; Francisco Vargas; Riccardo Barbano; Runchang Li; Zeljko Kereta

arxiv: 2602.10933 · v2 · pith:YBRS44DQnew · submitted 2026-02-11 · 💻 cs.LG

Pith reviewed 2026-05-21 13:35 UTC · model grok-4.3

keywords diffusion models · stochastic optimal control · multi-agent systems · compositional generation · image synthesis · trajectory steering

The pith

Pre-trained diffusion models can be treated as interacting agents and jointly steered by stochastic optimal control to compose outputs without knowing the target distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes formulating compositional generation with multiple pre-trained diffusion models as a cooperative stochastic optimal control problem. Instead of algebraically combining probability densities, the models are viewed as agents whose individual diffusion trajectories are controlled together to optimize a shared goal based on their combined results. This approach is tested on conditional MNIST generation and compared to a simple gradient guidance baseline. A sympathetic reader would care because it sidesteps the need for explicit knowledge of the target distribution, which most real-world composition tasks lack.

Core claim

Rather than combining probability densities, we treat pre-trained diffusion models as interacting agents whose diffusion trajectories are jointly steered, via optimal control, toward a shared objective defined on their aggregated output.

What carries the argument

Cooperative stochastic optimal control applied to the diffusion trajectories of multiple pre-trained models treated as agents.
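
A generic cooperative SOC problem of this kind can be written as below. This is a hedged sketch in standard SOC notation, not the paper's exact formulation: the agent index $i$, the number of agents $N$, the aggregation operator $\mathcal{A}$, the running cost $f$, the terminal cost $\Phi$, and the time convention (terminal time $T$) are all assumptions made for illustration.

```latex
\begin{aligned}
  dX_t^{u,i} &= \bigl[\, b_i(X_t^{u,i}, t) + g(t)\, u_i(X_t^{u,1}, \dots, X_t^{u,N}, t) \,\bigr]\, dt + g(t)\, dW_t^i,
      \qquad i = 1, \dots, N, \\
  Y_t &= \mathcal{A}\bigl(X_t^{u,1}, \dots, X_t^{u,N}\bigr), \\
  \min_{u_1, \dots, u_N} J(u) &= \mathbb{E}\Bigl[ \int_0^T \Bigl( \tfrac{1}{2} \sum_{i=1}^N \lVert u_i \rVert^2 + f(Y_t, t) \Bigr)\, dt + \Phi(Y_T) \Bigr].
\end{aligned}
```

Each agent keeps its own pre-trained drift $b_i$. The agents are coupled only through the controls $u_i$ and the cost terms evaluated on the aggregated state $Y_t$, which is why no explicit target density appears.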

If this is right

Compositional generation becomes possible without explicit target distribution knowledge.
Individual models require no major architectural changes.
Joint steering of the agents' trajectories optimizes a shared objective defined on the aggregated output.
The framework is validated on conditional MNIST generation and compared against a naive inference-time DPS-style baseline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar control approaches might apply to other generative models like GANs or VAEs for composition tasks.
This framework could enable more flexible multi-modal generation by defining appropriate aggregated objectives.
Testing on larger image datasets would reveal scalability of the optimal control steering.

Load-bearing premise

Pre-trained diffusion models can be treated as interacting agents whose trajectories can be jointly steered via optimal control toward an aggregated objective without requiring explicit knowledge of the target distribution or major changes to the individual models.

What would settle it

Observing that the cooperative control method fails to outperform the naive per-step gradient guidance baseline on conditional MNIST generation tasks would falsify the practical advantage of the approach.

Figures

Figures reproduced from arXiv: 2602.10933 by Alexander Denker, Francisco Vargas, Riccardo Barbano, Runchang Li, Zeljko Kereta.

**Figure 1.** A single sample generated with 3 agents for the target 3. Every agent controls one horizontal stripe ( coded) of the aggregated state Y_t. We show the state X_0^{u,i} for every agent

**Figure 2.** The schematic above illustrates linear stacking induced by a non-overlapping selection

**Figure 3.** Two Agents (joint): Aggregated state in a two-agent compositional diffusion setup with

**Figure 4.** Two Agents (control-wise): Aggregated state in a two-agent compositional diffusion

**Figure 5.** Three Agents (joint): Aggregated state in a three-agent compositional diffusion setup with

**Figure 6.** Three Agents (control-wise): Aggregated state in a three-agent compositional diffusion

**Figure 7.** Inference-time CDPS composition on MNIST. Top: two-agent setup with non-overlapping

**Figure 8.** A single sample for CMAD and CDPS generated with

**Figure 9.** A single sample for CMAD and CDPS generated with
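
The stripe-wise aggregation mentioned in the captions of Figures 1 and 2 (each agent controls one horizontal stripe, with a non-overlapping selection) can be illustrated with a small NumPy sketch. The function names `stripe_masks` and `aggregate`, the 28x28 image size, and the masked-sum form of the aggregation are assumptions for illustration, not the paper's code.

```python
import numpy as np

def stripe_masks(n_agents: int, height: int = 28, width: int = 28) -> np.ndarray:
    """Binary masks of shape (n_agents, height, width): one horizontal stripe per agent."""
    masks = np.zeros((n_agents, height, width))
    # np.array_split tolerates heights that are not divisible by n_agents.
    for i, rows in enumerate(np.array_split(np.arange(height), n_agents)):
        masks[i, rows, :] = 1.0
    return masks

def aggregate(states: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Assumed linear aggregation: Y = sum_i M_i * X_i, summed over the agent axis."""
    return (masks * states).sum(axis=0)

rng = np.random.default_rng(0)
masks = stripe_masks(n_agents=3)
states = rng.standard_normal((3, 28, 28))  # stand-ins for the agents' states
y = aggregate(states, masks)               # aggregated state, shape (28, 28)
```

Because the masks are non-overlapping and cover every pixel, each row of the aggregate comes from exactly one agent.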

Original abstract

Continuous-time generative models have achieved remarkable success in image restoration and synthesis. However, controlling the composition of multiple pre-trained models remains an open challenge. Current approaches largely treat composition as an algebraic composition of probability densities, such as via products or mixtures of experts. This perspective assumes the target distribution is known explicitly, which is almost never the case. In this work, we propose a different paradigm that formulates compositional generation as a cooperative Stochastic Optimal Control problem. Rather than combining probability densities, we treat pre-trained diffusion models as interacting agents whose diffusion trajectories are jointly steered, via optimal control, toward a shared objective defined on their aggregated output. We validate our framework on conditional MNIST generation and compare it against a naïve inference-time DPS-style baseline replacing learned cooperative control with per-step gradient guidance.
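
The kind of per-step gradient guidance the abstract attributes to the DPS-style baseline can be sketched in a toy setting. Everything here is an assumption chosen so the example runs without a trained network: each "pre-trained model" is a Gaussian N(m_i, s^2 I) with an analytic score under variance-exploding noising, the shared cost is a quadratic on the aggregated denoised estimate, and the names `dps_style_cost` and `zeta` are hypothetical. It is not the paper's CDPS implementation.

```python
import numpy as np

def dps_style_cost(zeta, seed=0, n_agents=2, dim=16, n_steps=200, s=0.5):
    """Run guided reverse diffusion for all agents; return the final shared cost."""
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(10.0, 0.01, n_steps + 1)   # decreasing noise levels
    means = np.zeros((n_agents, dim))                # toy data means m_i
    masks = np.zeros((n_agents, dim))                # non-overlapping selection
    for i, idx in enumerate(np.array_split(np.arange(dim), n_agents)):
        masks[i, idx] = 1.0
    target = np.ones(dim)                            # toy target for the aggregate

    x = means + sigmas[0] * rng.standard_normal((n_agents, dim))
    for k in range(n_steps):
        var = s**2 + sigmas[k] ** 2
        score = -(x - means) / var                   # analytic score of N(m_i, var I)
        x0_hat = x + sigmas[k] ** 2 * score          # Tweedie denoised estimate
        y_hat = (masks * x0_hat).sum(axis=0)         # aggregated estimate
        # Gradient of 0.5 * ||y_hat - target||^2 w.r.t. x; the factor s^2 / var is
        # d(x0_hat)/dx for this Gaussian toy (autodiff would supply it in general).
        grad = (s**2 / var) * masks * (y_hat - target)
        step = sigmas[k] ** 2 - sigmas[k + 1] ** 2
        noise = rng.standard_normal(x.shape)
        # Unconditional reverse step plus the per-step guidance correction.
        x = x + step * score + np.sqrt(step) * noise - zeta * grad

    y = (masks * x).sum(axis=0)
    return 0.5 * np.sum((y - target) ** 2)

unguided_cost = dps_style_cost(zeta=0.0)
guided_cost = dps_style_cost(zeta=0.5)
```

With `zeta=0.0` the agents sample independently from their own priors. With a positive `zeta`, each step nudges every agent toward a lower shared cost, but nothing is learned: the correction is recomputed greedily at each step, which is the contrast with learned cooperative control that the abstract draws.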

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CMAD, a framework that formulates compositional generation with pre-trained diffusion models as a cooperative Stochastic Optimal Control (SOC) problem. Pre-trained models are treated as interacting agents whose diffusion trajectories are jointly steered via optimal control toward a shared objective defined on their aggregated output, rather than combining probability densities explicitly. The approach is validated on conditional MNIST generation and compared against a naive inference-time DPS-style baseline that uses per-step gradient guidance.

Significance. If the central claim holds with supporting evidence, this work could offer a meaningful new perspective on controlling compositions of diffusion models without requiring explicit knowledge of the target distribution, which addresses a key limitation in current algebraic composition methods. The cooperative SOC formulation and treatment of diffusion models as agents represent a potentially useful bridge between stochastic control and generative modeling.

major comments (2)

[Abstract] Abstract: The manuscript states that the framework is validated on conditional MNIST and compared to a DPS-style baseline, yet supplies no quantitative results, error bars, ablation details, derivation steps, or performance metrics. This absence leaves the central empirical claim unsupported and makes it impossible to evaluate whether the SOC paradigm improves upon the baseline.
[Abstract / Formulation] Formulation (as described in the abstract and introduction): The claim that a shared objective on aggregated output can be specified and optimized via SOC without explicit target distribution knowledge or major model changes is not demonstrated for non-trivial compositions beyond simple conditional MNIST (e.g., class label mismatch). For logical combinations of concepts from separate pre-trained models, constructing such an objective without reintroducing density-level information remains unclear, which is load-bearing for the paradigm-shift argument.

minor comments (2)

[Abstract] The abstract contains a typographical issue: 'naïve' is rendered as 'na\"ive'.
[Method] The manuscript would benefit from explicit statements of the SOC cost function, the aggregation operator, and the resulting optimality conditions to allow readers to assess the derivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major point below, indicating planned revisions to the manuscript.

Referee: [Abstract] Abstract: The manuscript states that the framework is validated on conditional MNIST and compared to a DPS-style baseline, yet supplies no quantitative results, error bars, ablation details, derivation steps, or performance metrics. This absence leaves the central empirical claim unsupported and makes it impossible to evaluate whether the SOC paradigm improves upon the baseline.

Authors: We agree that the abstract would benefit from explicit quantitative support. The current manuscript presents the validation on conditional MNIST primarily through qualitative examples in the main text and a comparison to the DPS-style baseline in the experiments section, with derivation steps for the cooperative SOC formulation provided in the appendix. In the revision we will incorporate key performance metrics (e.g., classification accuracy on generated samples and a simple distance measure to the target condition) together with error bars from multiple random seeds directly into the abstract and expand the main-text experiments to include the requested ablation details. revision: yes
Referee: [Abstract / Formulation] Formulation (as described in the abstract and introduction): The claim that a shared objective on aggregated output can be specified and optimized via SOC without explicit target distribution knowledge or major model changes is not demonstrated for non-trivial compositions beyond simple conditional MNIST (e.g., class label mismatch). For logical combinations of concepts from separate pre-trained models, constructing such an objective without reintroducing density-level information remains unclear, which is load-bearing for the paradigm-shift argument.

Authors: We appreciate the referee highlighting the scope of the current demonstration. Conditional MNIST is used as a minimal setting in which the shared objective is defined directly on the aggregated output (via a pre-trained classifier score on the combined image) without requiring the explicit target density. For logical combinations such as AND/OR of concepts, the same principle applies: the objective can be realized as a function of classifier outputs on the aggregated sample (e.g., product of scores for conjunction) without reverting to density-level operations. We will add a dedicated paragraph in the revised introduction and a short illustrative example in the experiments section to clarify this construction and better support the paradigm-shift argument. revision: partial
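
The conjunction objective described in the response above (a product of classifier scores evaluated on the aggregated sample) could look like the following sketch. The function `conjunction_cost`, the use of softmax probabilities as "scores", and the negative-log form are assumptions for illustration. The logits stand in for the outputs of hypothetical pre-trained classifiers applied to the aggregated image.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def conjunction_cost(logits_a, label_a, logits_b, label_b, eps=1e-12):
    """Negative log of the product of two classifier scores (concept A AND concept B)."""
    p_a = softmax(logits_a)[..., label_a]
    p_b = softmax(logits_b)[..., label_b]
    # -log(p_a * p_b) = -(log p_a + log p_b); eps guards against log(0).
    return -(np.log(p_a + eps) + np.log(p_b + eps))

confident = np.array([5.0, 0.0, 0.0])   # classifier strongly predicts class 0
mistaken = np.array([0.0, 5.0, 0.0])    # classifier strongly predicts class 1

cost_both_satisfied = conjunction_cost(confident, 0, confident, 0)
cost_one_violated = conjunction_cost(confident, 0, mistaken, 0)
```

The cost is low only when both classifiers assign high probability to their target concept, and it depends on the sample only through classifier outputs, so no density of the composed distribution is needed.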

Circularity Check

0 steps flagged

No circularity: new SOC multi-agent framing is independent of target result

The paper introduces compositional generation as a cooperative stochastic optimal control problem in which pre-trained diffusion models act as agents whose trajectories are steered toward an objective on aggregated output. This is explicitly contrasted with algebraic density composition and does not derive any quantity from itself. Validation on conditional MNIST employs a straightforward cost (e.g., label mismatch) that is external to the models; no equation reduces by construction to a fitted parameter, self-citation chain, or renamed input. The derivation therefore remains self-contained against external SOC theory and the stated empirical benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review supplies no concrete free parameters, axioms, or invented entities; the central framing rests on the domain assumption that diffusion models can be cast as controllable agents.

axioms (1)

domain assumption: Pre-trained diffusion models can be treated as interacting agents whose trajectories are jointly steerable via optimal control.
This premise is invoked when the abstract replaces density composition with cooperative control.

pith-pipeline@v0.9.0 · 5669 in / 1160 out tokens · 54811 ms · 2026-05-21T13:35:41.186836+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "formulates compositional generation as a cooperative Stochastic Optimal Control problem... shared objective defined on their aggregated output"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 6 internal anchors

[1] Denis Blessing, Julius Berner, Lorenz Richter, Carles Domingo-Enrich, Yuanqi Du, Arash Vahdat, and Gerhard Neumann. Trust region constrained measure transport in path space for stochastic optimal control and inference. arXiv preprint arXiv:2508.12511.
[2] Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. arXiv preprint arXiv:2209.14687.
[3] Kevin Clark, Paul Vicol, Kevin Swersky, and David J Fleet. Directly fine-tuning diffusion models on differentiable rewards. arXiv preprint arXiv:2309.17400.
[4] doi: 10.52202/079017-0620. Alexander Denker, Shreyas Padhy, Francisco Vargas, and Johannes Hertrich. Iterative importance fine-tuning of diffusion models. arXiv preprint arXiv:2502.04468.
[5] Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, and Ricky TQ Chen. Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. arXiv preprint arXiv:2409.08861, 2024a. Carles Domingo-Enrich, Jiequn Han, Brandon Amos, Joan Bruna, and Ricky T. Q. Chen. Stochastic optimal control matching. In The Thirt...
[6] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239.
[7] Ruimeng Hu. Deep fictitious play for stochastic differential games. arXiv preprint arXiv:1903.09376.
[8] Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, and Stefano Ermon. The principles of diffusion models. arXiv preprint arXiv:2510.21890.
[9] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
[10] Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian, Roberto Bondesan, Alán Aspuru-Guzik, Arnaud Doucet, Rob Brekelmans, Alexander Tong, and Kirill Neklyudov. Feynman-Kac correctors in diffusion: Annealing, guidance, and product of experts. arXiv preprint arXiv:2503.02819, 2025.
[11] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
[12] James Thornton, Louis Béthune, Ruixiang Zhang, Arwen Bradley, Preetum Nakkiran, and Shuangfei Zhai. Composition and control with distilled energy diffusion models and sequential Monte Carlo. arXiv preprint arXiv:2502.12786.
[13] Qinsheng Zhang and Yongxin Chen. Path integral sampler: a stochastic control approach for sampling. arXiv preprint arXiv:2111.15141.
[14] A Related Work. Much of the literature on compositional generation focuses on sampling methods for drawing from specific combinations of the underlying diffusion model densities (Liu et al., 2022; Du et al., 2023; Skreta et al., 2025). For example, Du et al. (2023) considers sampling from product-of-experts (PoE), mixtures of densities or negation. Conce...
[15] Such approximations are known to lead to poor generation quality in practice (Du et al., 2023; Liu et al., 2022). This observation further motivates our approach. Rather than correcting the sampling procedure via MCMC, we adopt a pragmatic control-based formulation that avoids diffusion-time PoE sampling. B Background. B.1 Generative Models as Continuous-T...
[16] combined with the fact that time-reverse diffusion process is a well-defined Markov diffusion process first shown in Haussmann & Pardoux (1986). Another equivalent view is to consider the SOC as optimizing a measure on trajectories. Let P denote the law of uncontrolled SDE dX_t = b(X_t, t) dt + g(t) dW_t, X_0 ∼ p_0. Girsanov theorem gives the Radon-Nikodym derivat...
[17] The complete backpropagation-through-time update is also detailed in Algorithm 3, which computes Monte Carlo estimates of the control energy, path-wise cost, and terminal cost from sampled controlled trajectories and differentiates the resulting objective with respect to the control parameters. In the following paragraphs, we discuss key technical de...
