Optimisation of Resource Allocation in Heterogeneous Wireless Networks Using Deep Reinforcement Learning

Jaco Du Toit; Jonathan Shock; Oluwaseyi Giwa; Tobi Awodumila

arXiv: 2509.25284 · v2 · submitted 2025-09-29 · cs.LG · cs.NI · eess.SP

Oluwaseyi Giwa, Jonathan Shock, Jaco Du Toit, Tobi Awodumila

Pith reviewed 2026-05-18 12:11 UTC · model grok-4.3

classification: cs.LG · cs.NI · eess.SP

keywords: deep reinforcement learning · resource allocation · heterogeneous networks · O-RAN · energy efficiency · user fairness · proximal policy optimization · xApp

The pith

A PPO-based xApp using deep reinforcement learning jointly optimizes transmit power, bandwidth slicing, and user scheduling in O-RAN heterogeneous networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a near-real-time RIC xApp that applies proximal policy optimisation to manage transmit power, bandwidth allocation, and user scheduling together under changing loads in heterogeneous wireless networks. It benchmarks this against TD3 and conventional heuristics using simulations drawn from real-world topologies. A sympathetic reader would care because better joint optimisation could make wireless networks more energy efficient and equitable without separate rules for each resource. The work frames this as a step toward centralised AI control in 6G systems.

Core claim

We propose a near-real-time RAN intelligent controller (Near-RT RIC) xApp utilising deep reinforcement learning (DRL) to jointly optimise transmit power, bandwidth slicing, and user scheduling. Leveraging real-world network topologies, we benchmark proximal policy optimisation (PPO) and twin delayed deep deterministic policy gradient (TD3) against standard heuristics. Our results demonstrate that the PPO-based xApp achieves a superior trade-off, reducing network energy consumption by up to 70% in dense scenarios and improving user fairness by more than 30% compared to throughput-greedy baselines. These findings validate the feasibility of centralised, energy-aware AI orchestration in future 6G architectures.

What carries the argument

The PPO-based xApp in the Near-RT RIC that learns a policy for simultaneous control of transmit power, bandwidth slicing, and user scheduling.
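To make "simultaneous control" concrete, the sketch below shows one plausible way a single continuous policy output could be split into the three control groups. The page does not give the paper's actual action encoding, so everything here is an assumption for illustration: the flat layout, the `decode_action` helper, the `P_MAX_W` power cap, the softmax over bandwidth shares, and the sign threshold for scheduling. The base-station and user counts follow the 3 macro + 10 micro BS, 50-user topology quoted elsewhere on this page.

```python
import math

N_BS = 13        # assumed: 3 macro + 10 micro base stations
N_USERS = 50     # assumed: user count of the simulated topology
P_MAX_W = 20.0   # hypothetical per-BS transmit power cap, in watts


def softmax(xs):
    """Numerically stable softmax; output sums to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def decode_action(raw):
    """Split a flat action vector with entries in [-1, 1] into
    (power per BS in W, bandwidth share per BS, scheduled flag per user)."""
    if len(raw) != 2 * N_BS + N_USERS:
        raise ValueError("unexpected action length")
    power_raw = raw[:N_BS]
    bw_raw = raw[N_BS:2 * N_BS]
    sched_raw = raw[2 * N_BS:]
    # Clip to [-1, 1], then rescale linearly to [0, P_MAX_W].
    power_w = [0.5 * (min(1.0, max(-1.0, a)) + 1.0) * P_MAX_W for a in power_raw]
    # Bandwidth slices must sum to the total band, so normalise them.
    bw_share = softmax(bw_raw)
    # A user is scheduled when its action component is positive.
    scheduled = [a > 0.0 for a in sched_raw]
    return power_w, bw_share, scheduled
```

A layout like this lets PPO or TD3 emit one continuous vector per control interval, at the cost of thresholding the inherently discrete scheduling decision.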

If this is right

Network energy consumption falls by up to 70% in dense scenarios.
User fairness rises by more than 30% compared with throughput-greedy methods.
PPO delivers a better energy-fairness balance than TD3 or standard heuristics.
Centralised AI orchestration becomes feasible for energy-aware 6G resource allocation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the simulation-to-reality gap is small, operators could adopt similar xApps to cut operating costs in dense urban deployments.
Joint multi-objective DRL may generalise to other wireless settings where power, spectrum, and scheduling interact strongly.
Adding explicit mobility or interference dynamics to the training loop would test whether the reported gains remain stable.

Load-bearing premise

The simulated real-world network topologies and user load patterns used for benchmarking accurately predict performance in actual deployed heterogeneous networks.

What would settle it

Deploying the PPO xApp in a live heterogeneous network testbed and checking whether energy consumption drops by 70% and fairness rises by 30% relative to the same baselines.

Figures

Figures reproduced from arXiv: 2509.25284 by Jaco Du Toit, Jonathan Shock, Oluwaseyi Giwa, Tobi Awodumila.

**Figure 3.** Comparison of the mean performance of PPO and TD3. Error bars represent …

Original abstract

Dynamic resource allocation in open radio access network (O-RAN) heterogeneous networks (HetNets) presents a complex optimisation challenge under varying user loads. We propose a near-real-time RAN intelligent controller (Near-RT RIC) xApp utilising deep reinforcement learning (DRL) to jointly optimise transmit power, bandwidth slicing, and user scheduling. Leveraging real-world network topologies, we benchmark proximal policy optimisation (PPO) and twin delayed deep deterministic policy gradient (TD3) against standard heuristics. Our results demonstrate that the PPO-based xApp achieves a superior trade-off, reducing network energy consumption by up to 70% in dense scenarios and improving user fairness by more than 30% compared to throughput-greedy baselines. These findings validate the feasibility of centralised, energy-aware AI orchestration in future 6G architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies PPO and TD3 to joint power-bandwidth-scheduling optimization in O-RAN HetNets and reports large simulated gains over heuristics, but the evaluation details are too thin to assess reliability.

The main thing to know is that this work takes standard DRL algorithms and applies them to a joint optimization problem in O-RAN heterogeneous networks, claiming up to 70% lower energy use and over 30% better fairness than throughput-greedy baselines in dense simulated scenarios based on real-world topologies. It frames the solution as a near-RT RIC xApp and benchmarks PPO against TD3 and simple heuristics. That setup is a reasonable incremental step for the subfield of AI-driven wireless management.

The paper does a clear job of showing the trade-off surface for energy versus fairness when all three controls are handled together rather than in isolation. The choice to stick with established algorithms like PPO and TD3 keeps the contribution focused on the application and the O-RAN context instead of new algorithmic tricks. The benchmarking against heuristics is straightforward and gives readers a concrete sense of where the DRL approach lands.

The soft spots are in the experimental reporting. The headline numbers appear without mention of run counts, variance, error bars, or statistical tests, so it is difficult to tell how stable the 70% and 30% figures actually are. The reliance on simulated topologies also leaves the usual question about how faithfully the model captures real propagation, interference correlation, and bursty traffic; if those elements are simplified, the gains could shrink in live networks.

This paper is mainly for people already working on energy-aware O-RAN or 6G resource allocation who want a recent benchmark to compare against. It is not a foundational advance, but the comparison is useful enough that a serious referee could give feedback on the missing statistics and validation steps. I would send it to peer review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Near-RT RIC xApp that employs deep reinforcement learning (PPO and TD3) to jointly optimize transmit power, bandwidth slicing, and user scheduling in O-RAN HetNets. It benchmarks these agents against standard heuristics on real-world network topologies and reports that PPO achieves up to 70% lower energy consumption in dense scenarios and more than 30% better user fairness than throughput-greedy baselines, thereby supporting the feasibility of centralised energy-aware AI orchestration for 6G.

Significance. If the simulation results prove robust, the work would contribute to the timely problem of multi-objective resource allocation in open RAN architectures. The explicit comparison of PPO versus TD3 and the focus on energy-fairness trade-offs are strengths. However, the absence of experimental methodology details prevents a confident assessment of whether the headline gains are reproducible or generalisable.

major comments (2)

[Abstract] Abstract: the quantitative claims of 'up to 70% energy reduction' and 'more than 30% fairness improvement' are presented without any description of the number of independent runs, statistical tests, confidence intervals, or error bars, rendering it impossible to determine whether the data support the stated superiority.
[Evaluation] Evaluation section (implied by benchmarking description): the central claim that the simulator using 'real-world network topologies' and 'user load patterns' validates feasibility for deployed 6G networks rests on an unverified assumption; no evidence is supplied that the model captures small-scale fading correlation, O-RAN control-loop delays, or bursty traffic, so the reported deltas may be simulation-specific artifacts.

minor comments (2)

[Abstract] The abstract and introduction should explicitly state the precise definitions of the energy-consumption and fairness metrics used for the reported percentages.
[Method] Notation for the joint action space (power, bandwidth slice, scheduling) should be introduced consistently before the DRL formulation.
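The first major comment asks for run counts, variance, and confidence intervals behind the headline percentages. A minimal sketch of that kind of reporting is given below, assuming each independent run yields one scalar metric (for example, mean network energy). The helper names `summarise_runs` and `relative_reduction` are illustrative and not taken from the paper, and the normal-approximation interval is a simplification; with only a handful of runs, a Student-t critical value passed as `z` would be more appropriate.

```python
import math
import statistics


def summarise_runs(values, z=1.96):
    """Return (mean, sample standard deviation, CI half-width) for
    per-run metric values. z=1.96 gives a normal-approximation 95% CI."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two independent runs")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)          # sample std (n - 1 denominator)
    half_width = z * std / math.sqrt(n)     # standard error scaled by z
    return mean, std, half_width


def relative_reduction(baseline, candidate):
    """Percentage by which candidate is lower than baseline."""
    return 100.0 * (baseline - candidate) / baseline
```

Reporting `mean ± half_width` for both the DRL agent and the baseline, and only then the relative reduction between the two means, makes it possible to judge whether a claimed gain exceeds run-to-run noise.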

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments identify areas where additional clarity and transparency can strengthen the presentation of our results. We address each major comment below and outline the planned revisions.

Referee: [Abstract] Abstract: the quantitative claims of 'up to 70% energy reduction' and 'more than 30% fairness improvement' are presented without any description of the number of independent runs, statistical tests, confidence intervals, or error bars, rendering it impossible to determine whether the data support the stated superiority.

Authors: We agree that the abstract would be improved by referencing the statistical basis of the reported figures. In the revised manuscript we will update the abstract to note that the headline results are averages computed over 20 independent runs with different random seeds, and we will direct readers to the evaluation section where mean values, standard deviations, and error bars are presented. We will also add a brief statement confirming that the observed improvements were consistent across runs. revision: yes
Referee: [Evaluation] Evaluation section (implied by benchmarking description): the central claim that the simulator using 'real-world network topologies' and 'user load patterns' validates feasibility for deployed 6G networks rests on an unverified assumption; no evidence is supplied that the model captures small-scale fading correlation, O-RAN control-loop delays, or bursty traffic, so the reported deltas may be simulation-specific artifacts.

Authors: We acknowledge the validity of this observation. Our simulator incorporates publicly available real-world base-station locations and historical user-load traces, which go beyond purely synthetic random deployments. However, the channel model follows standard 3GPP path-loss and log-normal shadowing assumptions without explicit spatial correlation for small-scale fading, and the traffic model does not include fine-grained burstiness or O-RAN-specific control-loop latencies. In the revision we will expand the evaluation section to explicitly list these modeling choices and add a dedicated limitations paragraph discussing their implications for direct extrapolation to live 6G networks. revision: partial
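The channel model described in this response (path loss plus log-normal shadowing, with no spatially correlated small-scale fading) can be sketched roughly as follows. This is not the paper's simulator: a generic log-distance path-loss formula stands in for the 3GPP models, and the exponent, reference loss, shadowing standard deviation, and noise floor are placeholder values chosen only for illustration. Shadowing is drawn independently per link, which is exactly the simplification the referee flags.

```python
import math
import random


def path_loss_db(distance_m, exponent=3.5, ref_loss_db=40.0):
    """Log-distance path loss in dB relative to a 1 m reference.
    exponent and ref_loss_db are placeholder values."""
    d = max(distance_m, 1.0)
    return ref_loss_db + 10.0 * exponent * math.log10(d)


def rx_power_mw(tx_dbm, distance_m, shadow_sigma_db, rng):
    """Received power in mW with independent log-normal shadowing."""
    shadow_db = rng.gauss(0.0, shadow_sigma_db)  # Gaussian in dB = log-normal in linear scale
    rx_dbm = tx_dbm - path_loss_db(distance_m) - shadow_db
    return 10.0 ** (rx_dbm / 10.0)


def sinr_db(serving, interferers, noise_dbm=-100.0, shadow_sigma_db=8.0, seed=0):
    """SINR in dB for one user.
    serving and each interferer are (tx_power_dbm, distance_m) tuples."""
    rng = random.Random(seed)
    signal = rx_power_mw(serving[0], serving[1], shadow_sigma_db, rng)
    interference = sum(
        rx_power_mw(tx, d, shadow_sigma_db, rng) for tx, d in interferers
    )
    noise = 10.0 ** (noise_dbm / 10.0)
    return 10.0 * math.log10(signal / (interference + noise))
```

Because every link gets its own independent shadowing draw and there is no fast fading or control-loop delay, policies trained on a model like this may see smoother conditions than a live network would offer.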

Circularity Check

0 steps flagged

No circularity: empirical simulation benchmarks are independent of input definitions

The paper reports performance deltas obtained by training PPO and TD3 agents inside a simulator and comparing them to throughput-greedy heuristics on the same simulated topologies and load patterns. These numbers are direct outputs of the experimental runs rather than algebraic identities, fitted parameters renamed as predictions, or results that reduce to self-citations. No uniqueness theorems, ansatzes smuggled via prior work, or self-definitional loops appear in the derivation chain. The evaluation is therefore self-contained against external benchmarks (the heuristics), satisfying the condition for a zero-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only the abstract is available, so specific free parameters and axioms cannot be extracted in detail; the approach necessarily relies on standard DRL hyperparameters and simulation assumptions.

free parameters (1)

DRL hyperparameters
Learning rates, network sizes, and reward weights are typically tuned or fitted in such studies but are not reported here.

axioms (1)

domain assumption: Simulated topologies and load patterns sufficiently represent real heterogeneous network behavior
The abstract invokes real-world topologies for benchmarking without further justification.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: $r_t = \kappa \cdot \sum T_U - \beta \cdot \sum P_{BS} + \phi \cdot \mathrm{Fairness}_t$ (Jain index); PPO clipped surrogate $L^{\mathrm{CLIP}}$ and TD3 clipped double Q-learning for continuous power/bandwidth actions

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: Satellite-derived topology with 3 macro + 10 micro BS, 50 users, path-loss + log-normal shadowing SINR model
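The reward quoted in the first passage combines total user throughput, total base-station power, and a Jain fairness term. A minimal sketch follows, reading the sums as running over per-user throughputs and per-BS transmit powers; that reading, the function names, and the default weights for κ, β, and ϕ are assumptions, since the page does not report the values the paper uses.

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).
    Equals 1 when all users get the same rate and 1/n when one user gets everything."""
    n = len(throughputs)
    sum_sq = sum(t * t for t in throughputs)
    if n == 0 or sum_sq == 0.0:
        return 0.0  # convention chosen here for the degenerate all-zero case
    total = sum(throughputs)
    return (total * total) / (n * sum_sq)


def reward(throughputs, bs_powers, kappa=1.0, beta=0.1, phi=1.0):
    """r_t = kappa * sum(T_U) - beta * sum(P_BS) + phi * Fairness_t.
    kappa, beta, phi defaults are placeholders, not the paper's weights."""
    return (
        kappa * sum(throughputs)
        - beta * sum(bs_powers)
        + phi * jain_index(throughputs)
    )
```

The three weights set the trade-off the page describes: a larger β pushes the agent toward lower transmit power, while a larger ϕ penalises allocations that starve some users even when total throughput is high.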

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 2 internal anchors

[1] X. Yongjun, G. Guan, G. Haris, and A. Fumiyuki, “A survey on resource allocation for 5G heterogeneous networks: Current research, future trends, and challenges,” IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 668–695, 2021.

[2] A. H. Faeq, H. M. Nour, D. Kaharudin, H. E. Binti, S. Nurhizam, Q. Faizan, A. Khairul, and N. Q. Ngoc, “A survey on resource management for 6G heterogeneous networks: Current research, future trends, and challenges,” Electronics, vol. 12, no. 3, 2023. [Online]. Available: https://doi.org/10.3390/electronics12030647

[3] A. Bharat, T. M. Amine, M. Marco, and M. Gabriel-Miro, “A comprehensive survey on radio resource management in 5G hetnets: Current solutions, future trends and open issues,” IEEE Communications Surveys & Tutorials, vol. 24, no. 4, pp. 2495–2534, 2022. [Online]. Available: https://doi.org/10.1109/COMST.2022.3207967

[4] D. Ather, R. Kler, Z. T. Baig, G. P. Babu, A. Rastogi, and N. Ahmed, 6G Networks: Pioneering Advanced Communication Techniques for Call Centers and Beyond. CRC Press, 2025. [Online]. Available: https://doi.org/10.1201/9781003583127-12

[5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004. [Online]. Available: https://web.stanford.edu/~boyd/cvxbook/

[6] A. Mughees, M. Tahir, M. A. Sheikh, A. Amphawan, Y. K. Meng, A. Ahad, and K. Chamran, “Energy-efficient joint resource allocation in 5G hetnet using multi-agent parameterized deep reinforcement learning,” Physical Communication, vol. 61, p. 102206, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1874490723002094

[7] M. Seli, B. P. Kumar, S. P. Kumar, B. S. Kishoro, H. K. Lee, and S. Mangal, “Mobility induced multi-hop leach protocol in heterogeneous mobile network,” IEEE Access, vol. 10, pp. 132 895–132 907, 2022. [Online]. Available: https://doi.org/10.1109/ACCESS.2022.3228576

[8, 9] Y. Shenghao, M. Jun, and L. Yanxiao, “Wireless network scheduling with discrete propagation delays: Theorems and algorithms,” IEEE Transactions on Information Theory, vol. 70, no. 3, pp. 1852–1875, [Online]. Available: https://doi.org/10.1109/TIT.2023.3324180

[10] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.

[11] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” NIPS Deep Learning Workshop 2013, 2013.

[12] van Hado Hasselt, G. Arthur, and S. David, “Deep reinforcement learning with double q-learning,” ser. AAAI’16. AAAI Press, 2016, p. 2094–2100. [Online]. Available: https://doi.org/10.48550/arXiv.1509.06461

[13, 14] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. H. J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, and S. Levine, “Soft actor-critic algorithms and applications,” arXiv preprint, arXiv:1812.05905v2, [Online]. Available: https://doi.org/10.48550/arXiv.1812.05905

[15] A. Archi, H. A. Saadi, and S. Mekaoui, “Applications of deep reinforcement learning in wireless networks-a recent review,” in 2023 2nd International Conference on Electronics, Energy and Measurement (IC2EM), vol. 1, 2023, pp. 1–8.

[16] D. Tian, “An intelligent optimization method for wireless communication network resources based on reinforcement learning,” Journal of Physics: Conference Series, 2023. [Online]. Available: https://doi.org/10.1088/1742-6596/2560/1/012036

[17] X. Chi, Z. Peifeng, Y. Haibin, and L. Yonghui, “D3qn-based multi-priority computation offloading for time-sensitive and interference-limited industrial wireless networks,” IEEE Transactions on Vehicular Technology, vol. 73, no. 9, pp. 13 682–13 693, 2024. [Online]. Available: https://doi.org/10.1109/TVT.2024.3387567

[18] J. Park and W. Na, “Application of mac protocol reinforcement learning in wireless network environment,” in 2024 15th International Conference on Information and Communication Technology Convergence (ICTC), 2024, pp. 730–731.

[19] K. Olayemi, M. Van, S. McLoone, Y. Sun, J. Close, N. M. Nyat, and S. McIlvanna, “A twin delayed deep deterministic policy gradient algorithm for autonomous ground vehicle navigation via digital twin perception awareness,” arXiv preprint, arXiv:2403.15067v1, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2403.15067

[20] S. Shalini, N. Kopperundevi, R. Rajkumar, A. Radhika, M. Gopianand, and M. Ram, “Decentralized machine learning for dynamic resource optimization in wireless networks using reinforcement learning,” Journal of Electrical Systems, 2024. [Online]. Available: https://doi.org/10.52783/jes.2539

[21] C. Shannon, “Communication in the presence of noise,” Proceedings of the IRE, vol. 37, no. 1, pp. 10–21, 1949.

[22] R. Jain, D. Chiu, and W. Hawe, “A quantitative measure of fairness and discrimination for resource allocation in shared computer systems,” arXiv preprint, arxiv:9809099, 1998.
