From Denoising to Decision Making: A Survey on Diffusion Model-Enabled Deep Reinforcement Learning for Wireless Networks

Bo Ma; Dusit Niyato; Jie Cao; Min Xu; Nguyen Cong Luong; Nguyen Duc Duy Anh; Nguyen Duc Hai; Nguyen Quoc Khanh; Qiushi Zhao; Shaohan Feng

arxiv: 2605.25531 · v1 · pith:NOLD5NFInew · submitted 2026-05-25 · 📡 eess.SP


Nguyen Cong Luong, Zeping Sui, Jie Cao, Min Xu, Nguyen Duc Hai, Zhihao Dong, Nguyen Duc Duy Anh, Qiushi Zhao, Nguyen Quoc Khanh, Zhe Fu, Shaohan Feng, Dusit Niyato, Bo Ma


Pith reviewed 2026-06-29 20:47 UTC · model grok-4.3

classification 📡 eess.SP

keywords: diffusion models · deep reinforcement learning · wireless networks · resource management · multimodal policies · computation offloading · UAV systems · survey


The pith

Diffusion models integrated with deep reinforcement learning capture multimodal action structures to improve wireless resource management decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys the emerging integration of diffusion models with deep reinforcement learning for sequential decision problems in wireless networks. It argues that conventional DRL is limited by unimodal policy distributions and inefficient exploration in high-dimensional spaces, while DM-enabled policies model complex, discontinuous, and multimodal actions more effectively. Applications reviewed include computation offloading in mobile edge computing, UAV-assisted and vehicular systems, AIGC-driven networks, resource allocation, physical-layer security, and robotics planning. A reader would care because this points to a concrete path for handling the dynamic and heterogeneous nature of wireless environments through generative modeling of decisions.

Core claim

The integration of diffusion models and deep reinforcement learning opens a new research direction in which DM-enabled policies substantially enhance decision quality by capturing the complex, discontinuous, and multimodal action structures inherent in wireless resource management.

What carries the argument

DM-enabled DRL policies that generate actions via a denoising process to represent multimodal distributions instead of unimodal ones.
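To make the denoising mechanism concrete, here is a minimal NumPy sketch of DDPM-style reverse sampling of a scalar action conditioned on a state. It is not taken from the survey: the linear noise schedule, `T = 50`, and all function names are assumptions, and `toy_noise_predictor` is a hypothetical closed-form stand-in for a trained noise-prediction network, built for a toy task in which the two actions `-state` and `+state` are equally good. A unimodal Gaussian policy fitted to such actions would tend to centre between them; the denoising chain instead places its samples around both.

```python
import numpy as np


def linear_beta_schedule(T=50, beta_min=1e-4, beta_max=0.2):
    """DDPM-style linear noise schedule and its cumulative products (assumed values)."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(x, alpha_bar, state, spread=0.1):
    """Hypothetical stand-in for a trained network eps_theta(x_t, t, s).

    Assumes the clean action distribution is a 50/50 mixture of
    N(-state, spread^2) and N(+state, spread^2). Its noised marginal is
    known in closed form, so the optimal noise prediction is exact.
    """
    means = np.array([-state, state]) * np.sqrt(alpha_bar)  # mode centres after noising
    var = alpha_bar * spread**2 + (1.0 - alpha_bar)          # shared per-mode variance
    diff = x[:, None] - means[None, :]                       # shape (n, 2)
    logits = -0.5 * diff**2 / var
    resp = np.exp(logits - logits.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)                  # posterior weight of each mode
    score = -(resp * diff).sum(axis=1) / var                 # gradient of log q_t(x)
    return -np.sqrt(1.0 - alpha_bar) * score                 # eps_hat = -sqrt(1 - abar) * score


def sample_actions(state, n, T=50, seed=0):
    """Reverse (denoising) process: start from Gaussian noise, refine into actions."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = linear_beta_schedule(T)
    x = rng.standard_normal(n)  # x_T ~ N(0, 1)
    for t in reversed(range(T)):
        eps = toy_noise_predictor(x, alpha_bars[t], state)
        # DDPM posterior mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(n) if t > 0 else 0.0  # no noise on the final step
        x = mean + np.sqrt(betas[t]) * noise
    return x


actions = sample_actions(state=1.0, n=2000)
print("share near +1:", np.mean(np.abs(actions - 1.0) < 0.5))
print("share near -1:", np.mean(np.abs(actions + 1.0) < 0.5))
```

In a DM-enabled DRL method the closed-form predictor would be replaced by a learned network, trained with a noise-prediction loss and, in some actor-critic variants, an additional value-based term. Each action then costs `T` network evaluations.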

If this is right

DM-DRL algorithms can be applied to computation offloading in mobile edge computing systems to handle heterogeneous user demands.
UAV-assisted and vehicular networks gain improved adaptability through policies that explore multimodal action spaces.
Wireless resource allocation and physical-layer security problems benefit from better modeling of discontinuous decision boundaries.
AIGC-driven systems and robotics planning tasks see enhanced performance from generative action sampling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Scalability of the denoising process may need approximation techniques for real-time wireless control loops.
Similar multimodal policy benefits could appear in other sequential decision domains such as power systems or autonomous driving.
Hybrid training that combines DMs with existing DRL exploration bonuses might reduce sample complexity further.
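One common way to cut the cost of the denoising process is to run the reverse chain on a strided subset of timesteps, as DDIM-style samplers do. The sketch below only does the budgeting arithmetic: given an assumed per-evaluation cost and decision-slot budget (integer microseconds, which avoids floating-point floor-division surprises such as `1.0 // 0.2 == 4.0`), it picks how many denoiser calls fit and which timesteps to keep. The function names and the numbers in the usage line are illustrative assumptions, not measurements from the surveyed papers.

```python
import numpy as np


def max_denoising_steps(slot_budget_us: int, eval_cost_us: int, full_steps: int) -> int:
    """Largest number of denoiser evaluations that fits in one decision slot (at least 1)."""
    if eval_cost_us <= 0:
        raise ValueError("eval_cost_us must be positive")
    return max(1, min(full_steps, slot_budget_us // eval_cost_us))


def strided_timesteps(full_steps: int, k: int) -> np.ndarray:
    """k evenly spaced, strictly decreasing timesteps, assuming 1 <= k <= full_steps.

    For k >= 2 the schedule runs from full_steps - 1 down to 0.
    """
    return np.linspace(full_steps - 1, 0, k).round().astype(int)


# Illustrative assumptions: 1 ms slot, 200 us per denoiser call, 50-step training schedule.
k = max_denoising_steps(slot_budget_us=1000, eval_cost_us=200, full_steps=50)
print(k, strided_timesteps(50, k))
```

Skipping steps this way is only valid with a sampler designed for it, such as DDIM's update rule; feeding a strided subset into the plain DDPM update with the original per-step betas would not sample from the same distribution.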

Load-bearing premise

Conventional DRL methods are fundamentally limited by unimodal policies and inefficient exploration, and diffusion models can reliably overcome these limitations in wireless settings.
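The mechanism behind this premise can be shown in a few lines: when good actions sit at two separated values, a maximum-likelihood unimodal Gaussian policy centres itself between them. The sketch below uses a hypothetical toy reward with optima at `a = -1` and `a = +1`; it illustrates mode averaging only and says nothing about how often real wireless tasks have this structure, which is exactly what the premise leaves open.

```python
import numpy as np


def toy_reward(a):
    """Hypothetical reward with two equally good actions at a = -1 and a = +1."""
    return np.exp(-(a - 1.0) ** 2 / 0.02) + np.exp(-(a + 1.0) ** 2 / 0.02)


rng = np.random.default_rng(0)
# "Good" actions split evenly between the two optima, with small jitter.
good_actions = np.concatenate([rng.normal(-1.0, 0.05, 500), rng.normal(1.0, 0.05, 500)])

# Maximum-likelihood fit of a unimodal Gaussian policy N(mu, sigma^2).
mu, sigma = good_actions.mean(), good_actions.std()

# Acting with the fitted Gaussian: its mean lands between the two modes.
gauss_actions = rng.normal(mu, sigma, 5000)
print(f"mu={mu:.3f}, sigma={sigma:.3f}")
print("reward at mu:", toy_reward(mu))
print("mean reward, good actions:", toy_reward(good_actions).mean())
print("mean reward, Gaussian policy:", toy_reward(gauss_actions).mean())
```

A multimodal policy class, such as a mixture model or a diffusion policy, can keep both optima instead of averaging them.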

What would settle it

A head-to-head empirical comparison in which standard DRL methods match or exceed DM-enabled variants on wireless tasks such as resource allocation or offloading while using less computation.
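One way to operationalise this test, assuming returns (higher is better) are recorded per matched seed or scenario for both methods, is a paired bootstrap confidence interval on the mean return difference, combined with a compute comparison. The function names, the 95% level, and the choice of a percentile bootstrap are assumptions for illustration. Treating "lower CI bound ≤ 0" as "the baseline matches or exceeds" is a simplification; a formal equivalence test with a tolerance margin would be stricter.

```python
import numpy as np


def paired_bootstrap_ci(dm_returns, drl_returns, n_boot=10_000, level=0.95, seed=0):
    """Mean of (DM - DRL) over matched runs, with a percentile bootstrap CI."""
    diffs = np.asarray(dm_returns, dtype=float) - np.asarray(drl_returns, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))  # resample runs with replacement
    boot_means = diffs[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_means, [tail, 1.0 - tail])
    return float(diffs.mean()), float(lo), float(hi)


def baseline_settles_it(dm_returns, drl_returns, dm_cost, drl_cost):
    """True if there is no reliable DM advantage while the DRL baseline is cheaper.

    dm_cost / drl_cost: any shared compute measure, e.g. inference time per decision.
    """
    _, lo, _ = paired_bootstrap_ci(dm_returns, drl_returns)
    return lo <= 0.0 and drl_cost < dm_cost
```

Pairing by seed or scenario removes variation that both methods share, which usually gives a tighter interval than comparing two independent sets of runs.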

Figures

Figures reproduced from arXiv: 2605.25531 by Bo Ma, Dusit Niyato, Jie Cao, Min Xu, Nguyen Cong Luong, Nguyen Duc Duy Anh, Nguyen Duc Hai, Nguyen Quoc Khanh, Qiushi Zhao, Shaohan Feng, Zeping Sui, Zhe Fu, Zhihao Dong.

**Figure 1.** Theoretical foundations of diffusion models and conditional generation mechanisms. The diagram illustrates the discrete-time forward and reverse …

**Figure 2.** The overall architecture and training workflow of the proposed off…

**Figure 3.** Illustration of multi-satellite cooperative computation offloading …

**Figure 4.** The diagram of the proposed GenAI-DRL scheme in […]

**Figure 5.** Applications of diffusion models in spectrum allocation and wireless …

**Figure 6.** The diagram of the proposed human-in-the-loop RL with diffusion …

**Figure 7.** The diagram of the proposed DM-enabled DRL scheme for multi…

Original abstract

Deep reinforcement learning (DRL) has long been a promising solution for sequential resource management in wireless networks. However, conventional DRL methods are fundamentally limited by their reliance on unimodal policy distributions, inefficient exploration in high-dimensional action spaces, and poor adaptability to dynamic and heterogeneous environments. Meanwhile, diffusion models (DMs), one of the most powerful families of generative AI, have demonstrated remarkable capabilities in modeling complex, multimodal data distributions across diverse domains. The integration of DMs and DRL has opened a new and rapidly growing research direction, in which DM-enabled policies substantially enhance decision quality by capturing the complex, discontinuous, and multimodal action structures inherent in wireless resource management. In this paper, we present a comprehensive survey of DM-enabled DRL algorithms and their applications to various issues in wireless networks. In particular, we first provide the theoretical background of DMs and present different DM-enabled DRL algorithms. We then systematically review applications of DM-enabled DRL across computation offloading in mobile edge computing, UAV-assisted, vehicular, and AIGC-driven systems, as well as wireless resource allocation, physical-layer security, and robotics and UAV planning. We conclude the paper by highlighting future research directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

This is a straightforward catalog of existing DM-DRL papers for wireless tasks, but it presents the performance gains as established without pulling together comparative evidence.


This survey collects papers that combine diffusion models with DRL for wireless problems like offloading, UAV control, resource allocation, and security. It starts with background on both techniques, then lists algorithm variants and maps them to application areas.

That organization is the main value. Someone new to the intersection can use it to see what combinations have been tried and where.

The limitation is that the opening claim about DM-enabled policies substantially improving decision quality rests on the individual cited papers. The survey does not aggregate results, compare baselines across works, or note cases where the advantage does not appear. It treats the multimodal-action benefit as given rather than testing how consistently it shows up.
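As an illustration of the aggregation this note finds missing, the sketch below normalises each paper's reported DM-versus-baseline result into a signed relative gain and counts the cases with no advantage. The record type and field names are hypothetical, and relative gains across different metrics, scenarios, and baselines are only loosely comparable; a real meta-analysis would also need matched settings and variance information.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReportedComparison:
    """Hypothetical record: one DM-enabled method vs the DRL baseline in the same paper."""
    study: str
    dm_value: float
    baseline_value: float
    higher_is_better: bool = True  # False for costs such as latency or energy


def relative_gain(c: ReportedComparison) -> float:
    """Signed relative improvement of the DM variant; > 0 means DM did better."""
    if c.baseline_value == 0:
        raise ValueError(f"{c.study}: baseline value is zero, relative gain undefined")
    delta = c.dm_value - c.baseline_value
    if not c.higher_is_better:
        delta = -delta  # for costs, a lower DM value is an improvement
    return delta / abs(c.baseline_value)


def summarize(comparisons):
    """Median gain plus a count of comparisons where DM showed no advantage."""
    gains = [relative_gain(c) for c in comparisons]
    return {
        "n": len(gains),
        "median_gain": median(gains),
        "n_no_advantage": sum(g <= 0 for g in gains),
    }
```

Reporting `n_no_advantage` alongside the median makes null or negative results visible instead of letting them disappear into an average.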

No new derivations or data appear, which fits a survey format, but it leaves the central advantage unexamined at the meta level.

The paper is mainly for researchers already active in wireless DRL who need a quick map of diffusion-model extensions. It deserves peer review so the coverage and balance can be checked by people in the subfield.

Referee Report

2 major / 2 minor

Summary. This survey paper examines the integration of diffusion models (DMs) with deep reinforcement learning (DRL) for wireless network resource management. It posits that conventional DRL methods are limited by unimodal policy distributions, inefficient exploration, and poor adaptability, while DMs enable modeling of complex, multimodal, and discontinuous action spaces, thereby substantially improving decision quality. The manuscript covers DM theoretical background, DM-enabled DRL algorithms, and applications in areas including mobile edge computing offloading, UAV-assisted systems, vehicular networks, AIGC-driven systems, wireless resource allocation, physical-layer security, and robotics/UAV planning, concluding with future research directions.

Significance. If the survey delivers a balanced, evidence-based synthesis of the cited works rather than restating individual claims, it could usefully map an emerging intersection between generative models and wireless DRL, highlighting algorithmic patterns and open problems in a fast-growing area. The paper's value would rest on whether it identifies consistent performance patterns, failure modes, or conditions under which DM advantages materialize across the reviewed wireless scenarios.

major comments (2)

[Abstract] The claim that 'DM-enabled policies substantially enhance decision quality by capturing the complex, discontinuous, and multimodal action structures' is asserted as established fact and used to structure the survey, yet the described organization (background, algorithms, applications) provides no indication of a meta-analysis, aggregated performance metrics, or critical assessment of when these advantages hold versus fail across the cited papers.
[Abstract, limitations paragraph] The statement that conventional DRL methods are 'fundamentally limited' by unimodal policies and inefficient exploration is presented without reference to specific counter-examples or successful DRL deployments in wireless settings that would justify the 'fundamental' qualifier; this framing underpins the motivation for the entire survey.

minor comments (2)

[Abstract] Typo 'demonstrted' should be 'demonstrated'.
[Abstract] Typo 'higlight' should be 'highlight'.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful comments on our survey. We address the two major comments on the abstract below and will revise the manuscript to qualify the claims more carefully while preserving the survey's focus on synthesizing the emerging literature.


Referee: [Abstract] The claim that 'DM-enabled policies substantially enhance decision quality by capturing the complex, discontinuous, and multimodal action structures' is asserted as established fact and used to structure the survey, yet the described organization (background, algorithms, applications) provides no indication of a meta-analysis, aggregated performance metrics, or critical assessment of when these advantages hold versus fail across the cited papers.

Authors: We agree that the abstract phrasing presents the performance benefits too definitively. As a survey, the manuscript reviews and organizes existing works rather than conducting a new meta-analysis or aggregating raw performance metrics across papers (which would require data not publicly available in most cited studies). We will revise the abstract to state that DM-enabled policies 'have demonstrated potential to enhance' decision quality in the reviewed literature, and we will add a brief discussion in the introduction or conclusion noting the current lack of cross-paper comparative benchmarks and the conditions under which advantages appear most consistent. revision: yes
Referee: [Abstract, limitations paragraph] The statement that conventional DRL methods are 'fundamentally limited' by unimodal policies and inefficient exploration is presented without reference to specific counter-examples or successful DRL deployments in wireless settings that would justify the 'fundamental' qualifier; this framing underpins the motivation for the entire survey.

Authors: The word 'fundamentally' is too strong and does not adequately acknowledge successful conventional DRL applications in wireless networks. We will change the wording to 'face significant challenges, including' unimodal policies and inefficient exploration in high-dimensional settings, and we will include citations to both limitation-highlighting papers and representative successful DRL deployments in the revised introduction to provide balanced motivation. revision: yes

Circularity Check

0 steps flagged

No significant circularity: survey aggregates external literature without self-referential derivations


This is a survey paper reviewing DM-enabled DRL methods and applications in wireless networks. The abstract and structure present background, algorithms, and applications drawn from cited external works. No new equations, fitted parameters, or derivations are introduced that reduce by construction to the paper's own inputs. Claims of enhancement are framed as summaries of the surveyed literature rather than internally derived results. No self-citation chains, ansatzes, or uniqueness theorems are invoked in a load-bearing manner within the paper itself. The paper is self-contained as a review against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Survey contains no new free parameters, axioms, or invented entities; the abstract relies on standard background in DRL and diffusion models without introducing ad-hoc constructs.

pith-pipeline@v0.9.1-grok · 5792 in / 962 out tokens · 18676 ms · 2026-06-29T20:47:34.466468+00:00 · methodology


Reference graph

Works this paper leans on

105 extracted references · 22 canonical work pages · 7 internal anchors

[1] D. Ye, S. Cai, H. Du, J. Kang, Y. Liu, R. Yu, and D. Niyato, “Optimizing aigc services by prompt engineering and edge computing: A generative diffusion model-based contract theory approach,” IEEE Transactions on Vehicular Technology, vol. 74, no. 1, pp. 571–586, 2024.

[2] Y. Wang and J. Zhao, “A survey of mobile edge computing for the metaverse: Architectures, applications, and challenges,” in 2022 IEEE 8th international conference on collaboration and internet computing (CIC). IEEE, 2022, pp. 1–9.

[3] M. Zhang, X. Shen, J. Cao, Z. Cui, and S. Jiang, “Edgeshard: Efficient llm inference via collaborative edge computing,” IEEE Internet of Things Journal, vol. 12, no. 10, pp. 13119–13131, 2024.

[4] L. Liu, M. Zhao, M. Yu, M. A. Jan, D. Lan, and A. Taherkordi, “Mobility-aware multi-hop task offloading for autonomous driving in vehicular edge computing and networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 2, pp. 2169–2182, 2022.

[5] N. Huang, C. Dou, Y. Wu, L. Qian, B. Lin, and H. Zhou, “Unmanned-aerial-vehicle-aided integrated sensing and computation with mobile-edge computing,” IEEE Internet of Things Journal, vol. 10, no. 19, pp. 16830–16844, 2023.

[6] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

[7] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.

[8] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International conference on machine learning. PMLR, 2018, pp. 1861–1870.

[9] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.

[10] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” Advances in neural information processing systems, vol. 30, 2017.

[11] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in International conference on machine learning. PMLR, 2015, pp. 2256–2265.

[12] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.

[13] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE transactions on pattern analysis and machine intelligence, vol. 45, no. 9, pp. 10850–10869, 2023.

[14] H. Zou, Z. M. Kim, and D. Kang, “A survey of diffusion models in natural language processing,” arXiv preprint arXiv:2305.14671, 2023.

[15] R. Jiang, G.-C. Zheng, T. Li, T.-R. Yang, J.-D. Wang, and X. Li, “A survey of multimodal controllable diffusion models,” Journal of Computer Science and Technology, vol. 39, no. 3, pp. 509–541, 2024.

[16] X. Du and X. Fang, “An integrated communication and computing scheme for wi-fi networks based on generative ai and reinforcement learning,” in GLOBECOM 2024-2024 IEEE Global Communications Conference. IEEE, 2024, pp. 2009–2014.

[17] Z. Rao, Z. Zhu, Y. Yao, Y. Xu, Y. Cheng, and H. Du, “Computation-offloading optimization for satellite edge computing via diffusion and lyapunov-based deep reinforcement learning,” IEEE Internet of Things Journal, 2025.

[18] Z. Wang, M. Ding, Y. Zhao, C. Qiu, Q. Ye, and X. Wang, “Dmais: Diffusion model-based scheduling in edge-cloud systems,” in GLOBECOM 2024 - 2024 IEEE Global Communications Conference, 2024, pp. 4612–4617.

[19] H. Du, Z. Li, D. Niyato, J. Kang, Z. Xiong, H. Huang, and S. Mao, “Diffusion-based reinforcement learning for edge-enabled ai-generated content services,” IEEE Transactions on Mobile Computing, vol. 23, no. 9, pp. 8902–8918, 2024.

[20] Z. Yao, Z. Tang, W. Yang, and W. Jia, “Enhancing llm qos through cloud-edge collaboration: A diffusion-based multi-agent reinforcement learning approach,” IEEE Transactions on Services Computing, 2025.

[21] Y. Yang, W. Ma, W. Sun, J. He, Y. Fu, C. Yuen, and Y. Zhang, “Diffusion-based multi-agent reinforcement learning for semantic vehicular edge computing,” IEEE Transactions on Services Computing, 2025.

[22] X. Zhang and J. Yu, “Improve the training efficiency of drl for wireless communication resource allocation: The role of generative diffusion models,” arXiv preprint arXiv:2502.07211, 2025.

[23] K. Wang, X. Wang, N. Zhao, X. Yang, H. Fang, and D. Niyato, “Uplink rsma in leo satellite communications: A perspective from generative artificial intelligence,” IEEE Transactions on Vehicular Technology, 2025.

[24] F. Khoramnejad and E. Hossain, “Carrier aggregation, load balancing, and backhauling in non-terrestrial networks: Generative diffusion model-based optimization,” IEEE Transactions on Wireless Communications, 2025.

[25] C. Zhang, G. Sun, J. Li, Q. Wu, J. Wang, D. Niyato, and Y. Liu, “Multi-objective aerial collaborative secure communication optimization via generative diffusion model-enabled deep reinforcement learning,” IEEE Transactions on Mobile Computing, 2024.

[26] S. Liang, M. Yin, W. Xie, Z. Sun, J. Li, J. Wang, and H. Du, “Uav-enabled secure data collection and energy transfer in iot via diffusion model-enhanced deep reinforcement learning,” IEEE Internet of Things Journal, 2024.

[27] H. Shi, R. Wang, C. Pan, F. Gao, H. Tang, and L. Chen, “Diffusion model enhanced deep reinforcement learning for traffic control in 6g networks,” IEEE Communications Magazine, vol. 63, no. 7, pp. 41–47, 2025.

[28] Y. Qiao, J. Miao, and X. Huang, “A combined diffusion model and reinforcement learning approach for solving the vehicle routing problem with multiple soft time windows,” IEEE Access, 2025.

[29] H. Du, R. Zhang, Y. Liu, J. Wang, Y. Lin, Z. Li, D. Niyato, J. Kang, Z. Xiong, S. Cui et al., “Enhancing deep reinforcement learning: A tutorial on generative diffusion models in network optimization,” IEEE Communications Surveys & Tutorials, vol. 26, no. 4, pp. 2611–2646, 2024.

[30] N. C. Luong, N. D. Hai, D. Van Le, H. T. Nguyen, T.-H. Vu, T. Huynh-The, R. Zhang, N. D. D. Anh, D. Niyato, M. Di Renzo et al., “Diffusion models for future networks and communications: A comprehensive survey,” arXiv preprint arXiv:2508.01586, 2025.

[31] D. Fan, R. Meng, X. Xu, Y. Liu, G. Nan, C. Feng, S. Han, S. Gao, B. Xu, D. Niyato et al., “Generative diffusion models for wireless networks: Fundamental, architecture, and state-of-the-art,” IEEE Communications Surveys & Tutorials, 2026.

[32] Z. Zhu, H. Zhao, H. He, Y. Zhong, S. Zhang, H. Guo, T. Chen, and W. Zhang, “Diffusion models for reinforcement learning: A survey,” arXiv preprint arXiv:2311.01223, 2023.

[33] M. Letafati, S. Ali, and M. Latva-aho, “Diffusion models for wireless communications,” arXiv preprint arXiv:2310.07312, 2023.

[34] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE communications surveys & tutorials, vol. 21, no. 4, pp. 3133–3174, 2019.

[35] Y. Bai, H. Zhao, X. Zhang, Z. Chang, R. Jäntti, and K. Yang, “Toward autonomous multi-uav wireless network: A survey of reinforcement learning-based approaches,” IEEE Communications Surveys & Tutorials, vol. 25, no. 4, pp. 3038–3067, 2023.

[36] W. Zhao, S. Cui, W. Qiu, Z. He, Z. Liu, X. Zheng, B. Mao, and N. Kato, “A survey on drl based uav communications and networking: Drl fundamentals, applications and implementations,” IEEE Communications Surveys & Tutorials, 2025.

[37] A. Alwarafy, M. Abdallah, B. S. Ciftler, A. Al-Fuqaha, and M. Hamdi, “Deep reinforcement learning for radio resource allocation and management in next generation heterogeneous wireless networks: A survey,” arXiv preprint arXiv:2106.00574, 2021.

[38] A. Hazra, V. M. R. Tummala, N. Mazumdar, D. K. Sah, and M. Adhikari, “Deep reinforcement learning in edge networks: Challenges and future directions,” Physical Communication, vol. 66, p. 102460, 2024.

[39] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” arXiv preprint arXiv:2011.13456, 2020.

[40] J. Ho and T. Salimans, “Classifier-free diffusion guidance,” 2022. [Online]. Available: https://arxiv.org/abs/2207.12598

[41] Z. Wang, J. J. Hunt, and M. Zhou, “Diffusion policies as an expressive policy class for offline reinforcement learning,” arXiv preprint arXiv:2208.06193, 2022.

[42] Y. Wang, L. Wang, Y. Jiang, W. Zou, T. Liu, X. Song, W. Wang, L. Xiao, J. Wu, J. Duan et al., “Diffusion actor-critic with entropy regulator,” Advances in Neural Information Processing Systems, vol. 37, pp. 54183–54204, 2024.

[43] Y. Liu, S. Li, X. Lin, X. Chen, G. Li, Y. Liu, B. Liao, and J. Li, “Qos-aware multi-aigc service orchestration at edges: An attention-diffusion-aided drl method,” IEEE Transactions on Cognitive Communications and Networking, 2025.

[44] Y. Liu, X. Lin, S. Li, G. Li, Q. Mao, and J. Li, “Towards multi-task generative-ai edge services with an attention-based diffusion drl approach,” in 2024 9th IEEE International Conference on Smart Cloud (SmartCloud), 2024, pp. 60–65.

[45] Z. Zhang, J. Wang, J. Chen, H. Fu, Z. Tong, and C. Jiang, “Diffusion-based reinforcement learning for cooperative offloading and resource allocation in multi-uav assisted edge-enabled metaverse,” IEEE Transactions on Vehicular Technology, 2025.

[46] S. Xiong, S. He, G. Chen, C. Zhang, and Y. Huang, “Diffusion model and digital twin enhanced deep reinforcement learning for radio resource management in ran slicing,” in 2025 IEEE Wireless Communications and Networking Conference (WCNC), 2025, pp. 1–6.

[47] M. Tian, Z. Liu, C. Hou, C. Qiu, X. Wang, D. Niyato, and V. C. Leung, “Accelerating ai-generated content collaborative inference via transfer reinforcement learning in dynamic edge networks,” IEEE Transactions on Cloud Computing, 2025.

[48] S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International conference on machine learning. PMLR, 2018, pp. 1587–1596.

[49] W. Xie, G. Sun, J. Wang, H. Du, J. Kang, K. Huang, and V. Leung, “Multi-objective aerial irs-assisted isac optimization via generative ai-enhanced deep reinforcement learning,” arXiv preprint arXiv:2502.10687, 2025.

[50] H. Cao and B. Xiao, “Joint computing offloading and resource allocation in mec-enabled iot: A diffusion-based reinforcement learning approach,” in 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2024, pp. 890–896.

[51] F. You, H. Du, J. Kang, W. Ni, D. Niyato, and A. Jamalipour, “Generative ai-aided reinforcement learning for computation offloading and privacy protection in vr-based multi-access edge computing,” in 2024 IEEE Smart World Congress (SWC), 2024, pp. 2209–2214.

[52] X. Tang, Q. Chen, W. Weng, B. Liao, J. Wang, X. Cao, and X. Li, “Dnn task assignment in uav networks: A generative ai enhanced multi-agent reinforcement learning approach,” IEEE Internet of Things Journal, 2025.

[53] X. Tang, Q. Chen, W. Weng, C. Jin, Z. Liu, J. Wang, G. Sun, X. Li, and D. Niyato, “Task assignment and exploration optimization for low altitude uav rescue via generative ai enhanced multi-agent reinforcement learning,” arXiv preprint arXiv:2504.13554, 2025.

[54] T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multi-agent reinforcement learning,” Journal of Machine Learning Research, vol. 21, no. 178, pp. 1–51, 2020.

[55]

Diffusion-based deep reinforcement learning for resource man- agement in connected construction equipment networks: A hierarchical framework,

P. Ning, H. Wang, T. Tang, J. Zhang, H. Du, D. Niyato, and F. R. Yu, “Diffusion-based deep reinforcement learning for resource man- agement in connected construction equipment networks: A hierarchical framework,”IEEE Transactions on Wireless Communications, 2025

2025
[56]

Multi-agent conditional diffusion model with mean field communication as wireless resource allocation planner,

K. Meng, S. Zhang, R. Li, X. Meng, C. Wang, M. Lei, and Z. Zhao, “Multi-agent conditional diffusion model with mean field communication as wireless resource allocation planner,”arXiv preprint arXiv:2510.22969, 2025

work page arXiv 2025
[57]

Dnn partitioning, task offloading, and resource allocation in dynamic vehicular networks: A lyapunov-guided diffusion-based reinforcement learning approach,

Z. Liu, H. Du, J. Lin, Z. Gao, L. Huang, S. Hosseinalipour, and D. Niyato, “Dnn partitioning, task offloading, and resource allocation in dynamic vehicular networks: A lyapunov-guided diffusion-based reinforcement learning approach,”IEEE Transactions on Mobile Com- puting, 2024

2024
[58] M. Janner, Y. Du, J. B. Tenenbaum, and S. Levine, “Planning with diffusion for flexible behavior synthesis,” in International Conference on Machine Learning, 2022. [Online]. Available: https://api.semanticscholar.org/CorpusID:248965046

[59] Z. Dong, J. Hao, Y. Yuan, F. Ni, Y. Wang, P. Li, and Y. Zheng, “DiffuserLite: Towards real-time diffusion planning,” Advances in Neural Information Processing Systems, vol. 37, pp. 122556–122583, 2024.

[60] X. Li, V. Belagali, J. Shang, and M. S. Ryoo, “Crossway diffusion: Improving diffusion-based visuomotor policy via self-supervised learning,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 16841–16849.

[61] H. He, C. Bai, K. Xu, Z. Yang, W. Zhang, D. Wang, B. Zhao, and X. Li, “Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning,” Advances in Neural Information Processing Systems, vol. 36, pp. 64896–64917, 2023.

[62] W. He, H. Yao, X. Ren, T. Ouyang, Z. Xiong, Y. He, and Y. Liu, “Dual-circulation generative AI for optimizing resource allocation in multi-granularity heterogeneous federated learning,” IEEE Transactions on Cognitive Communications and Networking, 2025.

[63] F. You, H. Du, X. Hou, Y. Ren, and K. Huang, “DRESS: Diffusion reasoning-based reward shaping scheme for intelligent networks,” arXiv preprint arXiv:2503.07433, 2025.

[64] J. Wu, X. Fang, D. Niyato, J. Wang, and J. Wang, “DRL optimization trajectory generation via wireless network intent-guided diffusion models for resource allocation,” IEEE Internet of Things Journal, 2025.

[65] S. Nouri, M. K. Motalleb, and V. Shah-Mansouri, “Diffusion-RL for scalable resource allocation for 6G networks,” arXiv preprint arXiv:2506.07880, 2025.

[66] A. B. Darabi and S. Coleri, “Diffusion model based resource allocation strategy in ultra-reliable wireless networked control systems,” IEEE Communications Letters, 2024.

[67] H. Wang, L. Qi, and Y. Sun, “Integrating failures in robot skill acquisition with offline action-sequence diffusion RL,” in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5.

[68] C. Xu, J. Guo, Y. Liang, H. Zou, J. Zeng, H. Dai, W. Jia, J. Cao, and T. Wang, “Enhancing QoE in collaborative edge systems with feedback diffusion generative scheduling,” IEEE Transactions on Mobile Computing, 2025.

[69] H. Chen, X. Fu, Q. Yuan, Z. Zhuang, J. Kang, Z. Liu, J. Wang, and D. Niyato, “Trust model-based consensus optimization for vehicle platooning networks: A novel deep reinforcement learning approach with GenAI,” IEEE Transactions on Intelligent Transportation Systems, 2025.

[70] Y. Tong, J. Kang, J. Chen, M. Xu, G. Li, W. Zhang, and X. Yan, “Diffusion-based reinforcement learning for dynamic UAV-assisted vehicle twins migration in vehicular metaverses,” in GLOBECOM 2024 - 2024 IEEE Global Communications Conference. IEEE, 2024, pp. 5156–5161.

[71] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep Q-learning with model-based acceleration,” in International Conference on Machine Learning. PMLR, 2016, pp. 2829–2838.

[72] Y. Peng, H. Peng, and W. Wang, “Decentralized request dispatch for edge-clouds: A diffusion-based reinforcement learning paradigm,” IEEE Transactions on Services Computing, 2025.

[73] Y. Zhao, C. H. Liu, T. Yi, G. Li, and D. Wu, “Energy-efficient ground-air-space vehicular crowdsensing by hierarchical multi-agent deep reinforcement learning with diffusion models,” IEEE Journal on Selected Areas in Communications, 2024.

[74] X. Wang, H. Du, L. Feng, and K. Huang, “Energy-efficient RSMA-enabled low-altitude MEC optimization via generative AI-enhanced deep reinforcement learning,” arXiv preprint arXiv:2507.12910, 2025.

[75] Z. Zhang, J. Wang, J. Chen, Z. Fang, C. Jiang, and Z. Han, “A priority-aware AI-generated content resource allocation method for multi-UAV aided metaverse,” in 2025 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2025, pp. 1–6.

[76] Y. Zhong, J. Kang, J. Wen, D. Ye, J. Nie, D. Niyato, X. Gao, and S. Xie, “Generative diffusion-based contract design for efficient AI twin migration in vehicular embodied AI networks,” IEEE Transactions on Mobile Computing, 2025.

[77] X. Huang, X. Qin, M. Li, C. Huang, and X. Shen, “Adaptive digital twin-assisted 3C management for QoE-driven MSVs: A GAI-based DRL approach,” IEEE Transactions on Cognitive Communications and Networking, 2024.

[78] Y. Huang, M. Xu, X. Zhang, D. Niyato, Z. Xiong, S. Wang, and T. Huang, “AI-generated network design: A diffusion model-based learning approach,” IEEE Network, vol. 38, no. 3, pp. 202–209, 2023.

[79] J. Zhang, Z. Liu, X. Feng, H. Yang, and S. Liang, “Enhanced secure beamforming for IRS-assisted IoT communication using a generative diffusion model-enabled optimization approach,” IEEE Internet of Things Journal, 2025.

[80] X. Wang, K. Wang, D. Zhang, J. Li, M. Zhou, and T. Hämäläinen, “Generative diffusion model-based deep reinforcement learning for uplink rate-splitting multiple access in LEO satellite networks,” in 2024 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2024, pp. 1–8.


[1] D. Ye, S. Cai, H. Du, J. Kang, Y. Liu, R. Yu, and D. Niyato, “Optimizing AIGC services by prompt engineering and edge computing: A generative diffusion model-based contract theory approach,” IEEE Transactions on Vehicular Technology, vol. 74, no. 1, pp. 571–586, 2024.

[2] Y. Wang and J. Zhao, “A survey of mobile edge computing for the metaverse: Architectures, applications, and challenges,” in 2022 IEEE 8th International Conference on Collaboration and Internet Computing (CIC). IEEE, 2022, pp. 1–9.

[3] M. Zhang, X. Shen, J. Cao, Z. Cui, and S. Jiang, “EdgeShard: Efficient LLM inference via collaborative edge computing,” IEEE Internet of Things Journal, vol. 12, no. 10, pp. 13119–13131, 2024.

[4] L. Liu, M. Zhao, M. Yu, M. A. Jan, D. Lan, and A. Taherkordi, “Mobility-aware multi-hop task offloading for autonomous driving in vehicular edge computing and networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 2, pp. 2169–2182, 2022.

[5] N. Huang, C. Dou, Y. Wu, L. Qian, B. Lin, and H. Zhou, “Unmanned-aerial-vehicle-aided integrated sensing and computation with mobile-edge computing,” IEEE Internet of Things Journal, vol. 10, no. 19, pp. 16830–16844, 2023.

[6] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

[7] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.

[8] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International Conference on Machine Learning. PMLR, 2018, pp. 1861–1870.

[9] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.

[10] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” Advances in Neural Information Processing Systems, vol. 30, 2017.

[11] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in International Conference on Machine Learning. PMLR, 2015, pp. 2256–2265.

[12] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.

[13] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10850–10869, 2023.

[14] H. Zou, Z. M. Kim, and D. Kang, “A survey of diffusion models in natural language processing,” arXiv preprint arXiv:2305.14671, 2023.

[15] R. Jiang, G.-C. Zheng, T. Li, T.-R. Yang, J.-D. Wang, and X. Li, “A survey of multimodal controllable diffusion models,” Journal of Computer Science and Technology, vol. 39, no. 3, pp. 509–541, 2024.

[16] X. Du and X. Fang, “An integrated communication and computing scheme for Wi-Fi networks based on generative AI and reinforcement learning,” in GLOBECOM 2024 - 2024 IEEE Global Communications Conference. IEEE, 2024, pp. 2009–2014.

[17] Z. Rao, Z. Zhu, Y. Yao, Y. Xu, Y. Cheng, and H. Du, “Computation-offloading optimization for satellite edge computing via diffusion and Lyapunov-based deep reinforcement learning,” IEEE Internet of Things Journal, 2025.

[18] Z. Wang, M. Ding, Y. Zhao, C. Qiu, Q. Ye, and X. Wang, “DMAIS: Diffusion model-based scheduling in edge-cloud systems,” in GLOBECOM 2024 - 2024 IEEE Global Communications Conference, 2024, pp. 4612–4617.

[19] H. Du, Z. Li, D. Niyato, J. Kang, Z. Xiong, H. Huang, and S. Mao, “Diffusion-based reinforcement learning for edge-enabled AI-generated content services,” IEEE Transactions on Mobile Computing, vol. 23, no. 9, pp. 8902–8918, 2024.

[20] Z. Yao, Z. Tang, W. Yang, and W. Jia, “Enhancing LLM QoS through cloud-edge collaboration: A diffusion-based multi-agent reinforcement learning approach,” IEEE Transactions on Services Computing, 2025.

[21] Y. Yang, W. Ma, W. Sun, J. He, Y. Fu, C. Yuen, and Y. Zhang, “Diffusion-based multi-agent reinforcement learning for semantic vehicular edge computing,” IEEE Transactions on Services Computing, 2025.

[22] X. Zhang and J. Yu, “Improve the training efficiency of DRL for wireless communication resource allocation: The role of generative diffusion models,” arXiv preprint arXiv:2502.07211, 2025.

[23] K. Wang, X. Wang, N. Zhao, X. Yang, H. Fang, and D. Niyato, “Uplink RSMA in LEO satellite communications: A perspective from generative artificial intelligence,” IEEE Transactions on Vehicular Technology, 2025.

[24] F. Khoramnejad and E. Hossain, “Carrier aggregation, load balancing, and backhauling in non-terrestrial networks: Generative diffusion model-based optimization,” IEEE Transactions on Wireless Communications, 2025.

[25] C. Zhang, G. Sun, J. Li, Q. Wu, J. Wang, D. Niyato, and Y. Liu, “Multi-objective aerial collaborative secure communication optimization via generative diffusion model-enabled deep reinforcement learning,” IEEE Transactions on Mobile Computing, 2024.

[26] S. Liang, M. Yin, W. Xie, Z. Sun, J. Li, J. Wang, and H. Du, “UAV-enabled secure data collection and energy transfer in IoT via diffusion model-enhanced deep reinforcement learning,” IEEE Internet of Things Journal, 2024.

[27] H. Shi, R. Wang, C. Pan, F. Gao, H. Tang, and L. Chen, “Diffusion model enhanced deep reinforcement learning for traffic control in 6G networks,” IEEE Communications Magazine, vol. 63, no. 7, pp. 41–47, 2025.

[28] Y. Qiao, J. Miao, and X. Huang, “A combined diffusion model and reinforcement learning approach for solving the vehicle routing problem with multiple soft time windows,” IEEE Access, 2025.

[29] H. Du, R. Zhang, Y. Liu, J. Wang, Y. Lin, Z. Li, D. Niyato, J. Kang, Z. Xiong, S. Cui et al., “Enhancing deep reinforcement learning: A tutorial on generative diffusion models in network optimization,” IEEE Communications Surveys & Tutorials, vol. 26, no. 4, pp. 2611–2646, 2024.

[30] N. C. Luong, N. D. Hai, D. Van Le, H. T. Nguyen, T.-H. Vu, T. Huynh-The, R. Zhang, N. D. D. Anh, D. Niyato, M. Di Renzo et al., “Diffusion models for future networks and communications: A comprehensive survey,” arXiv preprint arXiv:2508.01586, 2025.

[31] D. Fan, R. Meng, X. Xu, Y. Liu, G. Nan, C. Feng, S. Han, S. Gao, B. Xu, D. Niyato et al., “Generative diffusion models for wireless networks: Fundamental, architecture, and state-of-the-art,” IEEE Communications Surveys & Tutorials, 2026.

[32] Z. Zhu, H. Zhao, H. He, Y. Zhong, S. Zhang, H. Guo, T. Chen, and W. Zhang, “Diffusion models for reinforcement learning: A survey,” arXiv preprint arXiv:2311.01223, 2023.

[33] M. Letafati, S. Ali, and M. Latva-aho, “Diffusion models for wireless communications,” arXiv preprint arXiv:2310.07312, 2023.

[34] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3133–3174, 2019.

[35] Y. Bai, H. Zhao, X. Zhang, Z. Chang, R. Jäntti, and K. Yang, “Toward autonomous multi-UAV wireless network: A survey of reinforcement learning-based approaches,” IEEE Communications Surveys & Tutorials, vol. 25, no. 4, pp. 3038–3067, 2023.

[36] W. Zhao, S. Cui, W. Qiu, Z. He, Z. Liu, X. Zheng, B. Mao, and N. Kato, “A survey on DRL based UAV communications and networking: DRL fundamentals, applications and implementations,” IEEE Communications Surveys & Tutorials, 2025.

[37] A. Alwarafy, M. Abdallah, B. S. Ciftler, A. Al-Fuqaha, and M. Hamdi, “Deep reinforcement learning for radio resource allocation and management in next generation heterogeneous wireless networks: A survey,” arXiv preprint arXiv:2106.00574, 2021.

[38] A. Hazra, V. M. R. Tummala, N. Mazumdar, D. K. Sah, and M. Adhikari, “Deep reinforcement learning in edge networks: Challenges and future directions,” Physical Communication, vol. 66, p. 102460, 2024.

[39] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” arXiv preprint arXiv:2011.13456, 2020.

[40] J. Ho and T. Salimans, “Classifier-free diffusion guidance,” 2022. [Online]. Available: https://arxiv.org/abs/2207.12598

[41] Z. Wang, J. J. Hunt, and M. Zhou, “Diffusion policies as an expressive policy class for offline reinforcement learning,” arXiv preprint arXiv:2208.06193, 2022.

[42] Y. Wang, L. Wang, Y. Jiang, W. Zou, T. Liu, X. Song, W. Wang, L. Xiao, J. Wu, J. Duan et al., “Diffusion actor-critic with entropy regulator,” Advances in Neural Information Processing Systems, vol. 37, pp. 54183–54204, 2024.

[43] Y. Liu, S. Li, X. Lin, X. Chen, G. Li, Y. Liu, B. Liao, and J. Li, “QoS-aware multi-AIGC service orchestration at edges: An attention-diffusion-aided DRL method,” IEEE Transactions on Cognitive Communications and Networking, 2025.

[44] Y. Liu, X. Lin, S. Li, G. Li, Q. Mao, and J. Li, “Towards multi-task generative-AI edge services with an attention-based diffusion DRL approach,” in 2024 9th IEEE International Conference on Smart Cloud (SmartCloud), 2024, pp. 60–65.

[45] Z. Zhang, J. Wang, J. Chen, H. Fu, Z. Tong, and C. Jiang, “Diffusion-based reinforcement learning for cooperative offloading and resource allocation in multi-UAV assisted edge-enabled metaverse,” IEEE Transactions on Vehicular Technology, 2025.

[46] S. Xiong, S. He, G. Chen, C. Zhang, and Y. Huang, “Diffusion model and digital twin enhanced deep reinforcement learning for radio resource management in RAN slicing,” in 2025 IEEE Wireless Communications and Networking Conference (WCNC), 2025, pp. 1–6.

[47] M. Tian, Z. Liu, C. Hou, C. Qiu, X. Wang, D. Niyato, and V. C. Leung, “Accelerating AI-generated content collaborative inference via transfer reinforcement learning in dynamic edge networks,” IEEE Transactions on Cloud Computing, 2025.

[48] S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International Conference on Machine Learning. PMLR, 2018, pp. 1587–1596.

[49] [49]

Multi-objective aerial irs-assisted isac optimization via gen- erative ai-enhanced deep reinforcement learning,

W. Xie, G. Sun, J. Wang, H. Du, J. Kang, K. Huang, and V . Le- ung, “Multi-objective aerial irs-assisted isac optimization via gen- erative ai-enhanced deep reinforcement learning,”arXiv preprint arXiv:2502.10687, 2025

work page internal anchor Pith review arXiv 2025

[50] [50]

Joint computing offloading and resource allo- cation in mec-enabled iot: A diffusion-based reinforcement learning approach,

H. Cao and B. Xiao, “Joint computing offloading and resource allo- cation in mec-enabled iot: A diffusion-based reinforcement learning approach,” in2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2024, pp. 890–896

2024

[51] [51]

Generative ai-aided reinforcement learning for computation offloading and privacy protection in vr-based multi-access edge computing,

F. You, H. Du, J. Kang, W. Ni, D. Niyato, and A. Jamalipour, “Generative ai-aided reinforcement learning for computation offloading and privacy protection in vr-based multi-access edge computing,” in 2024 IEEE Smart World Congress (SWC), 2024, pp. 2209–2214

2024

[52] [52]

Dnn task assignment in uav networks: A generative ai enhanced multi-agent reinforcement learning approach,

X. Tang, Q. Chen, W. Weng, B. Liao, J. Wang, X. Cao, and X. Li, “Dnn task assignment in uav networks: A generative ai enhanced multi-agent reinforcement learning approach,”IEEE Internet of Things Journal, 2025

2025

[53] [53]

Task assignment and exploration optimization for low altitude uav rescue via generative ai enhanced multi-agent reinforcement learning,

X. Tang, Q. Chen, W. Weng, C. Jin, Z. Liu, J. Wang, G. Sun, X. Li, and D. Niyato, “Task assignment and exploration optimization for low altitude uav rescue via generative ai enhanced multi-agent reinforcement learning,”arXiv preprint arXiv:2504.13554, 2025

work page arXiv 2025

[54] [54]

Monotonic value function factorisation for deep multi- agent reinforcement learning,

T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multi- agent reinforcement learning,”Journal of Machine Learning Research, vol. 21, no. 178, pp. 1–51, 2020

2020

[55] [55]

Diffusion-based deep reinforcement learning for resource man- agement in connected construction equipment networks: A hierarchical framework,

P. Ning, H. Wang, T. Tang, J. Zhang, H. Du, D. Niyato, and F. R. Yu, “Diffusion-based deep reinforcement learning for resource man- agement in connected construction equipment networks: A hierarchical framework,”IEEE Transactions on Wireless Communications, 2025

2025

[56] [56]

Multi-agent conditional diffusion model with mean field communication as wireless resource allocation planner,

K. Meng, S. Zhang, R. Li, X. Meng, C. Wang, M. Lei, and Z. Zhao, “Multi-agent conditional diffusion model with mean field communication as wireless resource allocation planner,”arXiv preprint arXiv:2510.22969, 2025

work page arXiv 2025

[57] [57]

Dnn partitioning, task offloading, and resource allocation in dynamic vehicular networks: A lyapunov-guided diffusion-based reinforcement learning approach,

Z. Liu, H. Du, J. Lin, Z. Gao, L. Huang, S. Hosseinalipour, and D. Niyato, “Dnn partitioning, task offloading, and resource allocation in dynamic vehicular networks: A lyapunov-guided diffusion-based reinforcement learning approach,”IEEE Transactions on Mobile Com- puting, 2024

2024

[58] [58]

Planning with diffusion for flexible behavior synthesis,

M. Janner, Y . Du, J. B. Tenenbaum, and S. Levine, “Planning with diffusion for flexible behavior synthesis,” inInternational Conference on Machine Learning, 2022. [Online]. Available: https: //api.semanticscholar.org/CorpusID:248965046

2022

[59] [59]

Dif- fuserlite: Towards real-time diffusion planning,

Z. Dong, J. Hao, Y . Yuan, F. Ni, Y . Wang, P. Li, and Y . Zheng, “Dif- fuserlite: Towards real-time diffusion planning,”Advances in Neural Information Processing Systems, vol. 37, pp. 122 556–122 583, 2024

2024

[60] [60]

Crossway diffu- sion: Improving diffusion-based visuomotor policy via self-supervised learning,

X. Li, V . Belagali, J. Shang, and M. S. Ryoo, “Crossway diffu- sion: Improving diffusion-based visuomotor policy via self-supervised learning,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 16 841–16 849

2024

[61] [61]

Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning,

H. He, C. Bai, K. Xu, Z. Yang, W. Zhang, D. Wang, B. Zhao, and X. Li, “Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning,”Advances in neural information processing systems, vol. 36, pp. 64 896–64 917, 2023

2023

[62] [62]

Dual-circulation generative ai for optimizing resource allocation in multi-granularity heterogeneous federated learning,

W. He, H. Yao, X. Ren, T. Ouyang, Z. Xiong, Y . He, and Y . Liu, “Dual-circulation generative ai for optimizing resource allocation in multi-granularity heterogeneous federated learning,”IEEE Transactions on Cognitive Communications and Networking, 2025

2025

[63] [63]

Dress: Diffusion reasoning-based reward shaping scheme for intelligent networks,

F. You, H. Du, X. Hou, Y . Ren, and K. Huang, “Dress: Diffusion reasoning-based reward shaping scheme for intelligent networks,”arXiv preprint arXiv:2503.07433, 2025

work page arXiv 2025

[64] [64]

Drl optimization tra- jectory generation via wireless network intent-guided diffusion models for resource allocation,

J. Wu, X. Fang, D. Niyato, J. Wang, and J. Wang, “Drl optimization tra- jectory generation via wireless network intent-guided diffusion models for resource allocation,”IEEE Internet of Things Journal, 2025

2025

[65] [65]

Diffusion-rl for scalable resource allocation for 6g networks,

S. Nouri, M. K. Motalleb, and V . Shah-Mansouri, “Diffusion-rl for scalable resource allocation for 6g networks,”arXiv preprint arXiv:2506.07880, 2025

work page arXiv 2025

[66] [66]

Diffusion model based resource allocation strategy in ultra-reliable wireless networked control systems,

A. B. Darabi and S. Coleri, “Diffusion model based resource allocation strategy in ultra-reliable wireless networked control systems,”IEEE Communications Letters, 2024

2024

[67] [67]

Integrating failures in robot skill acquisition with offline action-sequence diffusion rl,

H. Wang, L. Qi, and Y . Sun, “Integrating failures in robot skill acquisition with offline action-sequence diffusion rl,” inICASSP 2025- 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5

2025

[68] [68]

Enhancing qoe in collaborative edge systems with feed- back diffusion generative scheduling,

C. Xu, J. Guo, Y . Liang, H. Zou, J. Zeng, H. Dai, W. Jia, J. Cao, and T. Wang, “Enhancing qoe in collaborative edge systems with feed- back diffusion generative scheduling,”IEEE Transactions on Mobile Computing, 2025

2025

[69] H. Chen, X. Fu, Q. Yuan, Z. Zhuang, J. Kang, Z. Liu, J. Wang, and D. Niyato, “Trust model-based consensus optimization for vehicle platooning networks: A novel deep reinforcement learning approach with GenAI,” IEEE Transactions on Intelligent Transportation Systems, 2025.

[70] Y. Tong, J. Kang, J. Chen, M. Xu, G. Li, W. Zhang, and X. Yan, “Diffusion-based reinforcement learning for dynamic UAV-assisted vehicle twins migration in vehicular metaverses,” in GLOBECOM 2024 - 2024 IEEE Global Communications Conference. IEEE, 2024, pp. 5156–5161.

[71] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep Q-learning with model-based acceleration,” in International Conference on Machine Learning. PMLR, 2016, pp. 2829–2838.

[72] Y. Peng, H. Peng, and W. Wang, “Decentralized request dispatch for edge-clouds: A diffusion-based reinforcement learning paradigm,” IEEE Transactions on Services Computing, 2025.

[73] Y. Zhao, C. H. Liu, T. Yi, G. Li, and D. Wu, “Energy-efficient ground-air-space vehicular crowdsensing by hierarchical multi-agent deep reinforcement learning with diffusion models,” IEEE Journal on Selected Areas in Communications, 2024.

[74] X. Wang, H. Du, L. Feng, and K. Huang, “Energy-efficient RSMA-enabled low-altitude MEC optimization via generative AI-enhanced deep reinforcement learning,” arXiv preprint arXiv:2507.12910, 2025.

[75] Z. Zhang, J. Wang, J. Chen, Z. Fang, C. Jiang, and Z. Han, “A priority-aware AI-generated content resource allocation method for multi-UAV aided metaverse,” in 2025 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2025, pp. 1–6.

[76] Y. Zhong, J. Kang, J. Wen, D. Ye, J. Nie, D. Niyato, X. Gao, and S. Xie, “Generative diffusion-based contract design for efficient AI twin migration in vehicular embodied AI networks,” IEEE Transactions on Mobile Computing, 2025.

[77] X. Huang, X. Qin, M. Li, C. Huang, and X. Shen, “Adaptive digital twin-assisted 3C management for QoE-driven MSVs: A GAI-based DRL approach,” IEEE Transactions on Cognitive Communications and Networking, 2024.

[78] Y. Huang, M. Xu, X. Zhang, D. Niyato, Z. Xiong, S. Wang, and T. Huang, “AI-generated network design: A diffusion model-based learning approach,” IEEE Network, vol. 38, no. 3, pp. 202–209, 2023.

[79] J. Zhang, Z. Liu, X. Feng, H. Yang, and S. Liang, “Enhanced secure beamforming for IRS-assisted IoT communication using a generative diffusion model-enabled optimization approach,” IEEE Internet of Things Journal, 2025.

[80] X. Wang, K. Wang, D. Zhang, J. Li, M. Zhou, and T. Hämäläinen, “Generative diffusion model-based deep reinforcement learning for uplink rate-splitting multiple access in LEO satellite networks,” in 2024 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2024, pp. 1–8.