
arxiv: 2606.17192 · v1 · pith:2PTAUQSSnew · submitted 2026-06-15 · 💻 cs.LG

Constrained Diffusion Models with Primal-Dual Inference

Pith reviewed 2026-06-27 03:40 UTC · model grok-4.3

classification 💻 cs.LG
keywords constrained diffusion models · primal-dual inference · entropy-regularized optimization · Gibbs distributions · dual ascent · average constraints · constrained sampling

The pith

Diffusion models sample constrained optimal distributions by jointly inferring the dual variable during generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method for sampling from the optimal distributions of entropy-regularized problems that have average constraints. Rather than solving separately for the dual multiplier and then freezing it, the approach trains one score network conditioned on the dual variable and performs dual ascent updates at each reverse diffusion step using the constraint violation of the current samples. This produces a trajectory whose average dual variable converges to a neighborhood of the optimum, with the mismatch effect on the terminal distribution controlled by schedule-dependent stability factors.
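For orientation, the setup described above can be written out as follows. This is a sketch in assumed notation: the symbols f (objective), g (constraint functions), β (temperature), λ (dual variable), and η (dual step size), as well as the minimization sign convention, are illustrative choices and are not taken from the paper.

```latex
% Assumed notation and sign convention (illustrative, not the paper's own).
\begin{aligned}
&\min_{p}\; \mathbb{E}_{x\sim p}[f(x)] - \beta\, H(p)
  \quad \text{s.t.} \quad \mathbb{E}_{x\sim p}[g(x)] \le 0, \\
&p_{\lambda}(x) \;\propto\; \exp\!\Big(-\tfrac{1}{\beta}\big(f(x) + \lambda^{\top} g(x)\big)\Big),
  \qquad \lambda \ge 0, \\
&d(\lambda) = -\beta \log \int \exp\!\Big(-\tfrac{1}{\beta}\big(f(x) + \lambda^{\top} g(x)\big)\Big)\,dx,
  \qquad \nabla d(\lambda) = \mathbb{E}_{x\sim p_{\lambda}}[g(x)], \\
&\lambda_{k+1} = \big[\lambda_k + \eta\, \mathbb{E}_{x\sim p_{\lambda_k}}[g(x)]\big]_{+}.
\end{aligned}
```

Under this convention the gradient of the dual function is the expected constraint value under the current Gibbs distribution, which is why the constraint violation of the current samples can serve as the ascent direction.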

Core claim

Primal-dual inference performs constrained diffusion sampling by denoising with the score field of the current dual-conditioned Gibbs distribution and then updating the dual multiplier via ascent on the observed constraint violation, so that the time-averaged dual converges to a neighborhood of the optimum while the effect of any residual mismatch on the final distribution remains bounded by schedule-dependent factors.
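The alternation of a denoising step and a dual-ascent step can be illustrated on a toy problem where the dual-conditioned score is known in closed form. The sketch below is not the paper's implementation: it assumes a quadratic objective f(x) = ||x||²/2 and a single linear constraint E[a·x] ≥ b, so the Gibbs family is Gaussian, N(λa, temp·I), and no network is needed. The function names, the linear noise schedule, the step size `eta`, and the use of a Tweedie-style clean-sample estimate for the constraint violation are all assumptions made for illustration.

```python
import numpy as np

def gibbs_score(x, abar, lam, a, temp):
    """Exact noisy score of N(lam * a, temp * I) under a variance-preserving forward process."""
    var = abar * temp + (1.0 - abar)
    return -(x - np.sqrt(abar) * lam * a) / var

def primal_dual_inference(a, b, temp=1.0, T=500, eta=0.05, n=4000, seed=0):
    """Toy primal-dual sampler for the average constraint E[a.x] >= b (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)      # assumed DDPM-style linear schedule
    alphas = 1.0 - betas
    abars = np.cumprod(alphas)
    x = rng.standard_normal((n, a.size))    # start from the Gaussian prior
    lam, lam_hist = 0.0, []
    for t in range(T - 1, -1, -1):
        # Score of the Gibbs distribution indexed by the current dual variable.
        s = gibbs_score(x, abars[t], lam, a, temp)
        # Tweedie-style estimate of the clean sample, used only to estimate the violation.
        x0_hat = (x + (1.0 - abars[t]) * s) / np.sqrt(abars[t])
        # Primal step: one ancestral reverse-diffusion update.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = (x + betas[t] * s) / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * noise
        # Dual step: projected ascent on the batch estimate of g(x) = b - a.x.
        violation = np.mean(b - x0_hat @ a)
        lam = max(0.0, lam + eta * violation)
        lam_hist.append(lam)
    return x, np.array(lam_hist)

a = np.array([0.6, 0.8])                    # unit-norm constraint direction
samples, lam_hist = primal_dual_inference(a, b=1.0)
print("time-averaged dual:", lam_hist.mean())
print("terminal mean of a.x:", np.mean(samples @ a))
```

In this toy the optimal multiplier is b/||a||² = 1, so the printed time-averaged dual and the terminal mean of a·x can be compared against 1 directly. A learned dual-conditioned score network would replace `gibbs_score` in a realistic setting.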

What carries the argument

The dual-conditioned score network, trained once over the family of Gibbs distributions indexed by the dual variables encountered during inference, which supplies the score field for each reverse step and enables the joint primal-dual trajectory.

If this is right

  • The terminal samples satisfy the average constraints up to an error governed by the schedule-dependent stability factors.
  • The same trained network supports sampling for any dual value encountered in the trajectory without retraining.
  • The method applies directly to wireless resource allocation and portfolio management problems with average constraints.
  • Convergence holds for the time average of the dual variables rather than requiring pointwise convergence at every step.
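The distinction between time-averaged and pointwise convergence in the last bullet can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the paper's analysis: it uses a scalar dual variable whose noisy ascent direction is b − c·λ plus Gaussian noise, a constant step size, and many independent chains. All names and parameter values are hypothetical.

```python
import numpy as np

def noisy_dual_ascent(b=1.0, c=1.0, eta=0.1, sigma=1.0, steps=2000, chains=200, seed=0):
    """Run independent projected dual-ascent chains; return last iterates and time averages."""
    rng = np.random.default_rng(seed)
    lam = np.zeros(chains)
    running_sum = np.zeros(chains)
    for _ in range(steps):
        # Noisy estimate of the constraint violation at the current dual value.
        grad = b - c * lam + sigma * rng.standard_normal(chains)
        # Projected ascent keeps the multiplier nonnegative.
        lam = np.maximum(0.0, lam + eta * grad)
        running_sum += lam
    return lam, running_sum / steps

last, avg = noisy_dual_ascent()
print("spread of last iterates:", np.std(last))
print("spread of time averages:", np.std(avg))
```

With a constant step size the last iterate keeps fluctuating around the optimum b/c, while the time average settles much closer to it, which is the sense in which a neighborhood guarantee on the average is the natural statement.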

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same conditional-score idea could be tested in non-diffusion generative models that admit a score or energy formulation.
  • Online dual updates during generation may reduce the need for separate constraint-projection layers in other sampling pipelines.
  • Schedule-dependent stability bounds suggest that faster or slower diffusion schedules could be chosen to tighten the final constraint satisfaction.

Load-bearing premise

One network conditioned on the dual variable can accurately approximate the score for every Gibbs distribution that arises along the inference trajectory.
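What this premise asks of training can be sketched with a denoising score-matching batch in which the dual variable is sampled alongside the data. The example below is an assumption-laden toy, not the paper's procedure: the Gibbs family is taken to be Gaussian, N(λa, temp·I), the dual is drawn uniformly from [0, lam_max], and the closed-form score stands in for a trained network. The helper names and the (1 − ᾱ) loss weighting are illustrative choices.

```python
import numpy as np

def make_dsm_batch(rng, a, abars, n, lam_max=2.0, temp=1.0):
    """Build one denoising score-matching batch with a sampled dual per example."""
    lam = rng.uniform(0.0, lam_max, size=(n, 1))        # assumed training range for the dual
    x0 = lam * a + np.sqrt(temp) * rng.standard_normal((n, a.size))  # toy Gibbs samples
    t = rng.integers(0, abars.size, size=n)              # random noise level per example
    abar = abars[t][:, None]
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps  # forward-noised sample
    target = -eps / np.sqrt(1.0 - abar)                  # denoising score-matching target
    return xt, abar, lam, target

def dsm_loss(score_fn, xt, abar, lam, target):
    """Weighted squared error between a candidate score and the denoising target."""
    resid = score_fn(xt, abar, lam) - target
    return float(np.mean((1.0 - abar) * resid ** 2))

def exact_score(a, temp=1.0):
    """Closed-form dual-conditioned score for the toy family (stands in for a network)."""
    def score(xt, abar, lam):
        var = abar * temp + (1.0 - abar)
        return -(xt - np.sqrt(abar) * lam * a) / var
    return score

rng = np.random.default_rng(0)
a = np.array([0.6, 0.8])
abars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 500))
batch = make_dsm_batch(rng, a, abars, n=20000)
conditioned = dsm_loss(exact_score(a), *batch)
ignores_dual = dsm_loss(lambda xt, abar, lam: exact_score(a)(xt, abar, 0.0 * lam), *batch)
print(conditioned, ignores_dual)
```

A score that ignores the dual input incurs a higher loss on this batch than the dual-conditioned one. Whether a single network keeps that advantage for every dual value visited at inference depends on how well the sampled range covers the trajectory.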

What would settle it

Run the method on a constrained mixture-of-Gaussians task until the time-averaged dual is within the proven neighborhood of the optimum, then measure whether the empirical constraint violation of the terminal samples exceeds the schedule-dependent bound; a consistent excess would falsify the stability claim.
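The measurement step of such a test could be sketched as below. This is a hypothetical helper, not part of the paper: it assumes the constraint functions are evaluated on the terminal samples as an array of shape (samples, constraints) with feasibility meaning a nonpositive average, and it takes the schedule-dependent bound as a user-supplied number, since its value is not given here. A bootstrap lower confidence limit is used so that sampling noise alone does not trigger a flag.

```python
import numpy as np

def violation_check(g_values, bound, n_boot=500, seed=0):
    """Flag constraints whose average violation confidently exceeds a given bound."""
    rng = np.random.default_rng(seed)
    n = g_values.shape[0]
    # Empirical average violation per constraint, clipped at zero when feasible.
    mean_violation = np.maximum(g_values.mean(axis=0), 0.0)
    # Bootstrap resamples of the sample set: shape (n_boot, n_constraints).
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = np.maximum(g_values[idx].mean(axis=1), 0.0)
    lower = np.quantile(boot, 0.025, axis=0)   # lower end of a 95% interval
    exceeds = lower > bound
    return mean_violation, lower, exceeds

rng = np.random.default_rng(1)
g = rng.normal(loc=[-0.1, 0.5], scale=0.2, size=(500, 2))  # synthetic constraint values
mean_violation, lower, exceeds = violation_check(g, bound=0.1)
print(mean_violation, lower, exceeds)
```

A constraint is flagged only when even the lower confidence limit of its average violation exceeds the bound, which matches the "consistent excess" criterion described above.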

Figures

Figures reproduced from arXiv: 2606.17192 by Alejandro Ribeiro, Samar Hadou, Yigit Berkay Uslu.

Figure 1. Wireless power allocation under ergodic minimum-rate rmin constraints. Nodes represent users and colors indicate how well the rate requirement of each user is met, with red marking those that fall short. Here, average constraints are meaningful because mutual interference prevents any single solution from satisfying every requirement at once. Optimal solutions must therefore alternate samples (i.e., power …
Figure 2. Mixture of Gaussians (K = 12 modes with M = 10 constraints in a 30-dimensional space). While PDI, PDL and PDM maintain full feasibility, PDI exhibits the best objective and mode diversity.
Figure 3. Ablation. Comparisons between our base PDI-Net model, plotted in blue, and (left) a PDI-Net while setting λt to a fixed value along the denoising trajectory, (middle) a PDI-Net with different noise schedules, and (right) DT checkpoints corresponding to different training epochs and different estimates of the dual variable.
Figure 4. Out-of-distribution constraints. The models are trained under rmin = 0.6 and tested under different values of constraint levels. PDI degrades more gracefully than DT, showing more robustness to distribution shifts.
Figure 5. Comparison of Constrained Energy-based Diffusion-sampler (CED) in enforcing the constraints in expectation (top) vs per sample (bottom). Per-sample enforcement produces conservative samples that satisfy the constraints individually but fall short in providing a competitive objective.
Figure 6. Initial to final transition matrix. The matrix is almost diagonal in the PD Langevin case, reflecting the fact that samples converge to their nearest modes.
Figure 7. Dual Training. Cosine similarity and L2 distance between the dual variables across the DT training iterations and the last-iterate PDI dual variables. The shaded region indicates the training iterations where the cosine similarity peaks above 0.95. The metrics then drift slightly afterwards.
Figure 8. Performance of DT checkpoints along the training trajectory. The DT models fail to attain the same 5th percentile rates, mean violations and feasibility percentage as PDI. All the metrics are evaluated over the test set.
Figure 9. Temperature effect. Two-dimensional scatter plots of generated samples of two test networks. The title shows the mean rates and feasibility of the network. Higher temperature (left) promotes more diverse allocations while lower temperature (right) generates more degenerate samples. The value β = 1 provides a good balance between mean rates and feasibility.
Figure 10. Annealing effect of PDI. A scatter plot of the predicted power allocation of four pairs of agents. PDI discovers more transmission modes, allowing more agents to transmit with higher power to achieve higher rates on average. PDL concentrates the samples in the same modes.
Figure 11. Performance of PDI while fixing λ along the diffusion steps. The time-average dual variable gives better objective and constraint satisfaction compared to the last iterate. However, running the dual updates beats both of them and does not depend on the initial value.
Figure 12. Frozen λ̄ tail during sampling. The effect of running the sampler with a fixed dual variable equal to the time-average of the tail dual iterates.
Figure 13. Temperature effect. Lower temperature β, equivalently larger β⁻¹ (yellow), helps improve returns and constraint satisfaction.
Figure 14. Effect of noise schedules. Standard DDPM cosine and linear schedules give better feasibility. Schedules that reconstruct the clean sample too early make the sampler sensitive to perturbations in the dual variable before the dual variable has stabilized.
Original abstract

This paper develops constrained diffusion models with primal-dual inference (PDI) to sample from optimal distributions of entropy-regularized optimization problems with average constraints. We formalize constrained sampling in the Lagrangian dual domain, where the optimal distribution takes the form of a Gibbs distribution indexed by the optimal dual variable. Rather than estimating this dual multiplier before sampling and freezing it throughout generation, PDI jointly infers the optimal primal distribution and its parametrizing dual variable. Each reverse diffusion step denoises using the score field associated with the current multiplier and then updates the multiplier through dual ascent using the estimated constraint violation of the denoised samples. To enable this conditional score field, we train a single dual-conditioned score network over the family of Gibbs distributions induced by the dual variables encountered during inference. We prove that the time average of the dual variables generated along the inference trajectory converges to a neighborhood of the dual optimum and bound the effect of residual dual mismatch on the terminal distribution through schedule-dependent stability factors. We evaluate PDI on constrained sampling from a mixture of Gaussians, wireless resource allocation, and portfolio management.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces primal-dual inference (PDI) for constrained diffusion models to sample from entropy-regularized problems with average constraints. It formulates the problem in the Lagrangian dual domain, where the target is a Gibbs distribution indexed by the optimal dual variable. PDI performs joint inference by alternating score-based denoising steps (using a dual-conditioned score) with dual-ascent updates on estimated constraint violations. A single dual-conditioned score network is trained over the induced family of Gibbs distributions. The central theoretical result states that the time average of dual variables along the inference trajectory converges to a neighborhood of the dual optimum, with the effect of residual mismatch on the terminal distribution bounded by schedule-dependent stability factors. Empirical evaluation is reported on a mixture of Gaussians, wireless resource allocation, and portfolio management.

Significance. If the convergence result holds with controlled approximation error, the approach would offer a principled alternative to pre-computing and freezing dual multipliers in constrained diffusion sampling. This could improve flexibility for average-constraint problems in generative modeling, with direct relevance to the three evaluated application domains. The combination of a dual-conditioned network, trajectory-based dual updates, and explicit stability bounds on the terminal distribution constitutes the main technical contribution.

major comments (2)
  1. [Abstract] Abstract and the convergence claim: the proof that the time average of dual variables converges to a neighborhood of the optimum (and the subsequent bound on terminal-distribution mismatch via schedule-dependent stability factors) requires that the single dual-conditioned score network supplies sufficiently accurate scores for every dual value visited along the inference trajectory. No quantitative bound on score approximation error, no restriction on the dual range, and no propagation of that error into the neighborhood size or stability factors are provided.
  2. [Method (training of dual-conditioned network)] The training procedure for the dual-conditioned score network: the manuscript states that the network is trained over the family of Gibbs distributions induced by dual variables encountered during inference, but supplies no explicit mechanism (e.g., sampling schedule, range restriction, or error metric) that guarantees coverage or controls the approximation quality for duals arising at inference time.
minor comments (2)
  1. [Method] Notation for the dual-conditioned score field and the precise form of the dual-ascent update should be introduced with explicit equations rather than descriptive text only.
  2. [Experiments] The three experimental tasks would benefit from an ablation that isolates the effect of joint dual inference versus a fixed dual estimated in advance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on the convergence claim and training procedure. We address each major comment below, indicating where revisions will be made.

Point-by-point responses
  1. Referee: [Abstract] Abstract and the convergence claim: the proof that the time average of dual variables converges to a neighborhood of the optimum (and the subsequent bound on terminal-distribution mismatch via schedule-dependent stability factors) requires that the single dual-conditioned score network supplies sufficiently accurate scores for every dual value visited along the inference trajectory. No quantitative bound on score approximation error, no restriction on the dual range, and no propagation of that error into the neighborhood size or stability factors are provided.

    Authors: The stated convergence result and stability bounds are derived under the assumption of exact scores from the dual-conditioned network. We agree that the manuscript does not provide quantitative error bounds or propagate approximation error into the neighborhood size. In revision we will add an explicit statement of this assumption in the abstract and theory section, along with a remark noting that error propagation is left for future analysis. revision: partial

  2. Referee: [Method (training of dual-conditioned network)] The training procedure for the dual-conditioned score network: the manuscript states that the network is trained over the family of Gibbs distributions induced by dual variables encountered during inference, but supplies no explicit mechanism (e.g., sampling schedule, range restriction, or error metric) that guarantees coverage or controls the approximation quality for duals arising at inference time.

    Authors: The current description relies on dual variables sampled from ranges observed in preliminary runs, but we acknowledge the lack of an explicit, formalized mechanism such as a fixed sampling schedule or coverage metric. In the revision we will expand the method section to specify the dual sampling distribution used during training and report the empirical range covered on each task. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard duality and explicit training assumption

full rationale

The central claim is a mathematical convergence result for the time-averaged dual variables along the PDI trajectory, together with a bound on terminal-distribution mismatch via schedule-dependent stability factors. This rests on Lagrangian duality and dual-ascent updates applied to estimated constraint violations. The training procedure for the single dual-conditioned score network is stated as covering the family of Gibbs distributions induced by encountered duals, but the convergence statement itself is not obtained by fitting a parameter to the same data or by redefining a quantity in terms of its own output. No self-citation chain, uniqueness theorem imported from prior author work, or ansatz smuggled via citation appears in the provided text. The result is therefore self-contained against external benchmarks of convex duality and diffusion score matching.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed from abstract only; ledger entries are therefore limited to those explicitly named or strongly implied by the abstract text.

axioms (2)
  • [standard math] Standard assumptions of diffusion processes and score matching hold for the family of dual-conditioned distributions.
    Implicit in any diffusion-model training claim.
  • [domain assumption] Dual ascent on the estimated constraint violation produces a convergent trajectory under the chosen step-size schedule.
    Required for the stated convergence result on the time-averaged dual variables.

pith-pipeline@v0.9.1-grok · 5722 in / 1260 out tokens · 52200 ms · 2026-06-27T03:40:41.898336+00:00 · methodology


Reference graph

Works this paper leans on

62 extracted references · 3 linked inside Pith

  1. [1]

    Deep unsupervised learning using nonequilibrium thermodynamics,

    J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in Proceedings of the 32nd International Conference on Machine Learning (F. Bach and D. Blei, eds.), vol. 37 of Proceedings of Machine Learning Research, (Lille, France), pp. 2256–2265, PMLR, 07–09 Jul 2015

  2. [2]

    Denoising diffusion probabilistic models,

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” arXiv preprint arXiv:2006.11239, 2020

  3. [3]

    Score-based generative modeling through stochastic differential equations,

    Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2021

  4. [4]

    High-Resolution Image Synthesis with Latent Diffusion Models,

    R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (Los Alamitos, CA, USA), pp. 10674–10685, IEEE Computer Society, June 2022

  5. [5]

    Diffwave: A versatile diffusion model for audio synthesis,

    Z. Kong, W. Ping, J. Huang, K. Zhao, and B. Catanzaro, “Diffwave: A versatile diffusion model for audio synthesis,” in International Conference on Learning Representations, 2021

  6. [6]

    Video diffusion models,

    J. Ho, T. Salimans, A. Gritsenko, W. Chan, M. Norouzi, and D. Fleet, “Video diffusion models,” in Advances in Neural Information Processing Systems (S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, eds.), vol. 35, pp. 8633–8646, Curran Associates, Inc., 2022

  7. [7]

    Generative diffusion models for resource allocation in wireless networks,

    Y. B. Uslu, S. Hadou, S. S. Bidokhti, and A. Ribeiro, “Generative diffusion models for resource allocation in wireless networks,” in 2025 IEEE 10th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 201–205, IEEE, 2025

  8. [8]

    Diffusion model based resource allocation strategy in ultra-reliable wireless networked control systems,

    A. Babazadeh Darabi and S. Coleri, “Diffusion model based resource allocation strategy in ultra-reliable wireless networked control systems,” IEEE Communications Letters, vol. 29, no. 1, pp. 85–89, 2025

  9. [9]

    Generative diffusion model for risk-neutral derivative pricing,

    N. Tiwari, “Generative diffusion model for risk-neutral derivative pricing,” 2026

  10. [10]

    Factor-based conditional diffusion model for portfolio optimization,

    M. He, X. He, and X. Gao, “Factor-based conditional diffusion model for portfolio optimization,” in NeurIPS 2025 Workshop: Generative AI in Finance, 2025

  11. [11]

    Path integral sampler: a stochastic control approach for sampling,

    Q. Zhang and Y. Chen, “Path integral sampler: a stochastic control approach for sampling,” in International Conference on Learning Representations, 2022

  12. [12]

    Denoising diffusion samplers,

    F. Vargas, W. S. Grathwohl, and A. Doucet, “Denoising diffusion samplers,” in The Eleventh International Conference on Learning Representations, 2023

  13. [13]

    An optimal control perspective on diffusion-based generative modeling,

    J. Berner, L. Richter, and K. Ullrich, “An optimal control perspective on diffusion-based generative modeling,” Transactions on Machine Learning Research, 2024

  14. [14]

    Improved sampling via learned diffusions,

    L. Richter and J. Berner, “Improved sampling via learned diffusions,” in The Twelfth International Conference on Learning Representations, 2024

  15. [15]

    Constrained diffusion models via dual training,

    S. Khalafi, D. Ding, and A. Ribeiro, “Constrained diffusion models via dual training,” Advances in Neural Information Processing Systems, vol. 37, pp. 26543–26576, 2024

  16. [16]

    Composition and alignment of diffusion models using constrained learning,

    S. Khalafi, I. Hounie, D. Ding, and A. Ribeiro, “Composition and alignment of diffusion models using constrained learning,” arXiv preprint arXiv:2508.19104, 2025

  17. [17]

    Tweedie’s formula and selection bias,

    B. Efron, “Tweedie’s formula and selection bias,” Journal of the American Statistical Association, vol. 106, no. 496, pp. 1602–1614, 2011

  18. [18]

    Diffusion models beat GANs on image synthesis,

    P. Dhariwal and A. Q. Nichol, “Diffusion models beat GANs on image synthesis,” in Advances in Neural Information Processing Systems (A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, eds.), 2021

  19. [19]

    Classifier-free diffusion guidance,

    J. Ho and T. Salimans, “Classifier-free diffusion guidance,” arXiv preprint arXiv:2207.12598, 2022

  20. [20]

    Glide: Towards photorealistic image generation and editing with text-guided diffusion models,

    A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “Glide: Towards photorealistic image generation and editing with text-guided diffusion models,” 2022

  21. [21]

    Reflected diffusion models,

    A. Lou and S. Ermon, “Reflected diffusion models,” in International Conference on Machine Learning, pp. 22675–22701, PMLR, 2023

  22. [22]

    Mirror diffusion models for constrained and watermarked generation,

    G.-H. Liu, T. Chen, E. Theodorou, and M. Tao, “Mirror diffusion models for constrained and watermarked generation,” Advances in Neural Information Processing Systems, vol. 36, pp. 42898–42917, 2023

  23. [23]

    Riemannian score-based generative modelling,

    V. De Bortoli, E. Mathieu, M. Hutchinson, J. Thornton, Y. W. Teh, and A. Doucet, “Riemannian score-based generative modelling,” Advances in Neural Information Processing Systems, vol. 35, pp. 2406–2422, 2022

  24. [24]

    Metropolis sampling for constrained diffusion models,

    N. Fishman, L. Klarner, E. Mathieu, M. Hutchinson, and V. De Bortoli, “Metropolis sampling for constrained diffusion models,” Advances in Neural Information Processing Systems, vol. 36, pp. 62296–62331, 2023

  25. [25]

    Constrained synthesis with projected diffusion models,

    J. K. Christopher, S. Baek, and F. Fioretto, “Constrained synthesis with projected diffusion models,” Advances in Neural Information Processing Systems, vol. 37, pp. 89307–89333, 2024

  26. [26]

    Constrained diffusion with trust sampling,

    W. Huang, Y. Jiang, T. Van Wouwe, and C. K. Liu, “Constrained diffusion with trust sampling,” NeurIPS, 2024

  27. [27]

    Constrained sampling with primal-dual langevin monte carlo,

    L. F. Chamon, M. R. Karimi, and A. Korba, “Constrained sampling with primal-dual langevin monte carlo,” Advances in Neural Information Processing Systems, vol. 37, pp. 29285–29323, 2024

  28. [28]

    Diffusion posterior sampling for general noisy inverse problems,

    H. Chung, J. Kim, M. T. Mccann, M. L. Klasky, and J. C. Ye, “Diffusion posterior sampling for general noisy inverse problems,” in International Conference on Learning Representations (ICLR), 2023

  29. [29]

    A projection method of the cimmino type for linear algebraic systems,

    F. Sloboda, “A projection method of the cimmino type for linear algebraic systems,” Parallel Computing, vol. 17, no. 4, pp. 435–442, 1991

  30. [30]

    Graph signal diffusion models for wireless resource allocation,

    Y. B. Uslu, S. Hadou, S. S. Bidokhti, and A. Ribeiro, “Graph signal diffusion models for wireless resource allocation,” 2026

  31. [31]

    Estimation of non-normalized statistical models by score matching,

    A. Hyvärinen and P. Dayan, “Estimation of non-normalized statistical models by score matching,” Journal of Machine Learning Research, vol. 6, no. 4, 2005

  32. [32]

    Generative modeling by estimating gradients of the data distribution,

    Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” Advances in Neural Information Processing Systems, vol. 32, 2019

  33. [33]

    Denoising diffusion implicit models,

    J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in International Conference on Learning Representations, 2021

  34. [34]

    Improved denoising diffusion probabilistic models,

    A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in International Conference on Machine Learning, pp. 8162–8171, PMLR, 2021

  35. [35]

    Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps,

    C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu, “Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps,” arXiv preprint arXiv:2206.00927, 2022

  36. [36]

    Elucidating the design space of diffusion-based generative models,

    T. Karras, M. Aittala, T. Aila, and S. Laine, “Elucidating the design space of diffusion-based generative models,” in Proc. NeurIPS, 2022

  37. [37]

    Sequential controlled langevin diffusions,

    J. Chen, L. Richter, J. Berner, D. Blessing, G. Neumann, and A. Anandkumar, “Sequential controlled langevin diffusions,” 2024

  38. [38]

    Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control,

    C. Domingo-Enrich, M. Drozdzal, B. Karrer, and R. T. Chen, “Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control,” arXiv preprint arXiv:2409.08861, 2024

  39. [39]

    Improved off-policy training of diffusion samplers,

    M. Sendera, M. Kim, S. Mittal, P. Lemos, L. Scimeca, J. Rector-Brooks, A. Adam, Y. Bengio, and N. Malkin, “Improved off-policy training of diffusion samplers,” Advances in Neural Information Processing Systems, vol. 37, pp. 81016–81045, 2024

  40. [40]

    Gradient guidance for diffusion models: An optimization perspective,

    Y. Guo, H. Yuan, Y. Yang, M. Chen, and M. Wang, “Gradient guidance for diffusion models: An optimization perspective,” in Advances in Neural Information Processing Systems (A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, eds.), vol. 37, pp. 90736–90770, Curran Associates, Inc., 2024

  41. [41]

    Diffusion models as constrained samplers for optimization with unknown constraints,

    L. Kong, Y. Du, W. Mu, K. Neklyudov, V. D. Bortoli, D. Wu, H. Wang, A. Ferber, Y.-A. Ma, C. P. Gomes, and C. Zhang, “Diffusion models as constrained samplers for optimization with unknown constraints,” 2025

  42. [42]

    Training diffusion models with reinforcement learning,

    K. Black, M. Janner, Y. Du, I. Kostrikov, and S. Levine, “Training diffusion models with reinforcement learning,” in The Twelfth International Conference on Learning Representations, 2024

  43. [43]

    DIFUSCO: Graph-based diffusion solvers for combinatorial optimization,

    Z. Sun and Y. Yang, “DIFUSCO: Graph-based diffusion solvers for combinatorial optimization,” in Thirty-seventh Conference on Neural Information Processing Systems, 2023

  44. [44]

    Distributionally robust optimization via diffusion ambiguity modeling,

    J. Wen and J. Yang, “Distributionally robust optimization via diffusion ambiguity modeling,” arXiv preprint arXiv:2510.22757, 2025

  45. [45]

    Robust optimization with diffusion models for green security,

    L. Kong, H. Wang, Y. Pan, C. W. Kim, M. Song, A. Nguyen, T. Wang, H. Xu, and M. Tambe, “Robust optimization with diffusion models for green security,” in Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence (S. Chiappa and S. Magliacane, eds.), vol. 286 of Proceedings of Machine Learning Research, pp. 2325–2344, PMLR, 21–25 Jul 2025

  46. [46]

    Conditional diffusion guidance under hard constraint: A stochastic analysis approach,

    Z. Guo, W. Tang, and R. Xu, “Conditional diffusion guidance under hard constraint: A stochastic analysis approach,” arXiv preprint arXiv:2602.05533, 2026

  47. [47]

    Mimo channel estimation using score-based generative models,

    M. Arvinte and J. I. Tamir, “Mimo channel estimation using score-based generative models,” IEEE Transactions on Wireless Communications, vol. 22, no. 6, pp. 3698–3713, 2023

  48. [48]

    Joint channel estimation and data detection in massive mimo systems based on diffusion models,

    N. Zilberstein, A. Swami, and S. Segarra, “Joint channel estimation and data detection in massive mimo systems based on diffusion models,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 13291–13295, 2024

  49. [49]

    Deterministic score-based diffusion model for channel estimation in ris-assisted mimo systems,

    Z. He, F. Héliot, and Y. Ma, “Deterministic score-based diffusion model for channel estimation in ris-assisted mimo systems,” in 2025 IEEE 26th International Workshop on Signal Processing and Artificial Intelligence for Wireless Communications (SPAWC), pp. 1–5, 2025

  50. [50]

    Generating high dimensional user-specific wireless channels using diffusion models,

    T. Lee, J. Park, H. Kim, and J. G. Andrews, “Generating high dimensional user-specific wireless channels using diffusion models,” IEEE Transactions on Wireless Communications, vol. 25, pp. 2907–2921, 2026

  51. [51]

    Residual diffusion models for joint source channel coding of mimo csi,

    S. K. Ankireddy, H. Kim, and H. Kim, “Residual diffusion models for joint source channel coding of mimo csi,” in 2025 59th Asilomar Conference on Signals, Systems, and Computers, pp. 55–62, IEEE, 2025

  52. [52]

    Generative diffusion model-based compression of mimo csi,

    H. Kim, T. Lee, H. Kim, G. De Veciana, M. A. Arfaoui, A. Koc, P. Pietraski, G. Zhang, and J. Kaewell, “Generative diffusion model-based compression of mimo csi,” in ICC 2025 - IEEE International Conference on Communications, pp. 6323–6328, 2025

  53. [53]

    Diffusion model for multiple antenna communication,

    J. Guo, X. Xu, Y. Liu, and A. Nallanathan, “Diffusion model for multiple antenna communication,” IEEE Communications Magazine, vol. 63, no. 10, pp. 44–50, 2025

  54. [54]

    Channel-aware conditional diffusion model for secure mu-miso communications,

    T. Hui, X. Tang, Y. Wang, Q. Du, D. Niyato, and Z. Han, “Channel-aware conditional diffusion model for secure mu-miso communications,” IEEE Transactions on Vehicular Technology, pp. 1–6, 2026

  55. [55]

    Improve the training efficiency of drl for wireless communication resource allocation: The role of generative diffusion models,

    X. Zhang and J. Yu, “Improve the training efficiency of drl for wireless communication resource allocation: The role of generative diffusion models,” IEEE Transactions on Wireless Communications, vol. 25, pp. 11593–11608, 2026

  56. [56]

    Generation of synthetic financial time series by diffusion models,

    T. Takahashi and T. Mizuno, “Generation of synthetic financial time series by diffusion models,” Quantitative Finance, vol. 25, no. 10, pp. 1507–1516, 2025

  57. [57]

    Diffusion-augmented reinforcement learning for robust portfolio optimization under stress scenarios,

    H. Choudhary, A. Orra, and M. Thakur, “Diffusion-augmented reinforcement learning for robust portfolio optimization under stress scenarios,” 2025

  58. [58]

    Forecasting implied volatility surface with generative diffusion models,

    C. Jin and A. Agarwal, “Forecasting implied volatility surface with generative diffusion models,” 2025

  59. [59]

    Iterated denoising energy matching for sampling from Boltzmann densities,

    T. Akhound-Sadegh, J. Rector-Brooks, A. J. Bose, S. Mittal, P. Lemos, C.-H. Liu, M. Sendera, S. Ravanbakhsh, G. Gidel, Y. Bengio, N. Malkin, and A. Tong, “Iterated denoising energy matching for sampling from Boltzmann densities,” in International Conference on Machine Learning (ICML), 2024

  60. [60]

    Fast state-augmented learning for wireless resource allocation with dual variable regression,

    Y. B. Uslu, N. NaderiAlizadeh, M. Eisen, and A. Ribeiro, “Fast state-augmented learning for wireless resource allocation with dual variable regression,” 2025
