
arXiv: 2509.25284 · v2 · submitted 2025-09-29 · cs.LG · cs.NI · eess.SP

Optimisation of Resource Allocation in Heterogeneous Wireless Networks Using Deep Reinforcement Learning

Pith reviewed 2026-05-18 12:11 UTC · model grok-4.3

classification: cs.LG · cs.NI · eess.SP
keywords: deep reinforcement learning · resource allocation · heterogeneous networks · O-RAN · energy efficiency · user fairness · proximal policy optimization · xApp

The pith

A PPO-based xApp using deep reinforcement learning jointly optimizes transmit power, bandwidth slicing, and user scheduling in O-RAN heterogeneous networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a near-real-time RIC xApp that applies proximal policy optimisation to manage transmit power, bandwidth allocation, and user scheduling together under changing loads in heterogeneous wireless networks. It benchmarks this against TD3 and conventional heuristics using simulations drawn from real-world topologies. A sympathetic reader would care because better joint optimisation could make wireless networks more energy efficient and equitable without separate rules for each resource. The work frames this as a step toward centralised AI control in 6G systems.

Core claim

We propose a near-real-time RAN intelligent controller (Near-RT RIC) xApp utilising deep reinforcement learning (DRL) to jointly optimise transmit power, bandwidth slicing, and user scheduling. Leveraging real-world network topologies, we benchmark proximal policy optimisation (PPO) and twin delayed deep deterministic policy gradient (TD3) against standard heuristics. Our results demonstrate that the PPO-based xApp achieves a superior trade-off, reducing network energy consumption by up to 70% in dense scenarios and improving user fairness by more than 30% compared to throughput-greedy baselines. These findings validate the feasibility of centralised, energy-aware AI orchestration in future 6G architectures.

What carries the argument

The PPO-based xApp in the Near-RT RIC that learns a policy for simultaneous control of transmit power, bandwidth slicing, and user scheduling.
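
For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that PPO maximises when updating a policy. It is not taken from the paper: the function name, the batch values, and the clipping range of 0.2 are illustrative assumptions, and the paper's network architecture, reward, and hyperparameters are not shown on this page.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximised).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum gives a pessimistic bound on the improvement.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Hypothetical batch of three transitions (values chosen for illustration).
logp_old = [math.log(0.5), math.log(0.5), math.log(0.5)]
logp_new = [math.log(0.75), math.log(0.25), math.log(0.5)]
advantages = [1.0, -1.0, 2.0]
objective = ppo_clip_objective(logp_new, logp_old, advantages)
```

The clipping keeps each update close to the data-collecting policy, which is one commonly cited reason PPO trains stably on large joint action spaces such as combined power, bandwidth, and scheduling decisions.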

If this is right

  • Network energy consumption falls by up to 70% in dense scenarios.
  • User fairness rises by more than 30% compared with throughput-greedy methods.
  • PPO delivers a better energy-fairness balance than TD3 or standard heuristics.
  • Centralised AI orchestration becomes feasible for energy-aware 6G resource allocation.
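
The page does not state how user fairness is measured. A minimal sketch, assuming Jain's fairness index (a common choice for per-user throughput), shows how a relative fairness gain over a throughput-greedy allocation could be computed. The function name and the rate vectors are hypothetical and are not the paper's data.

```python
def jain_index(rates):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(rates)
    sq_sum = sum(r * r for r in rates)
    if n == 0 or sq_sum == 0:
        raise ValueError("need at least one non-zero rate")
    return sum(rates) ** 2 / (n * sq_sum)

# Hypothetical per-user throughputs (Mbit/s), for illustration only.
greedy_rates = [10.0, 10.0, 0.0, 0.0]    # throughput-greedy: two users starved
balanced_rates = [5.0, 5.0, 5.0, 5.0]    # same total rate, shared equally

j_greedy = jain_index(greedy_rates)
j_balanced = jain_index(balanced_rates)
relative_gain = (j_balanced - j_greedy) / j_greedy
```

The index equals 1 when all users receive the same rate and approaches 1/n when a single user takes everything, so a percentage improvement is only meaningful once the baseline value is reported alongside it.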

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the simulation-to-reality gap is small, operators could adopt similar xApps to cut operating costs in dense urban deployments.
  • Joint multi-objective DRL may generalise to other wireless settings where power, spectrum, and scheduling interact strongly.
  • Adding explicit mobility or interference dynamics to the training loop would test whether the reported gains remain stable.

Load-bearing premise

The simulated real-world network topologies and user load patterns used for benchmarking accurately predict performance in actual deployed heterogeneous networks.

What would settle it

Deploying the PPO xApp in a live heterogeneous network testbed and checking whether energy consumption drops by 70% and fairness rises by 30% relative to the same baselines.

Figures

Figures reproduced from arXiv: 2509.25284 by Jaco Du Toit, Jonathan Shock, Oluwaseyi Giwa, Tobi Awodumila.

Figure 1: An illustration of the RL agent in the HetNet environ
Figure 3: Comparison of the mean performance of PPO and TD3. Error bars represent
Original abstract

Dynamic resource allocation in open radio access network (O-RAN) heterogeneous networks (HetNets) presents a complex optimisation challenge under varying user loads. We propose a near-real-time RAN intelligent controller (Near-RT RIC) xApp utilising deep reinforcement learning (DRL) to jointly optimise transmit power, bandwidth slicing, and user scheduling. Leveraging real-world network topologies, we benchmark proximal policy optimisation (PPO) and twin delayed deep deterministic policy gradient (TD3) against standard heuristics. Our results demonstrate that the PPO-based xApp achieves a superior trade-off, reducing network energy consumption by up to 70% in dense scenarios and improving user fairness by more than 30% compared to throughput-greedy baselines. These findings validate the feasibility of centralised, energy-aware AI orchestration in future 6G architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Near-RT RIC xApp that employs deep reinforcement learning (PPO and TD3) to jointly optimize transmit power, bandwidth slicing, and user scheduling in O-RAN HetNets. It benchmarks these agents against standard heuristics on real-world network topologies and reports that PPO achieves up to 70% lower energy consumption in dense scenarios and more than 30% better user fairness than throughput-greedy baselines, thereby supporting the feasibility of centralised energy-aware AI orchestration for 6G.

Significance. If the simulation results prove robust, the work would contribute to the timely problem of multi-objective resource allocation in open RAN architectures. The explicit comparison of PPO versus TD3 and the focus on energy-fairness trade-offs are strengths. However, the absence of experimental methodology details prevents a confident assessment of whether the headline gains are reproducible or generalisable.

major comments (2)
  1. [Abstract] Abstract: the quantitative claims of 'up to 70% energy reduction' and 'more than 30% fairness improvement' are presented without any description of the number of independent runs, statistical tests, confidence intervals, or error bars, rendering it impossible to determine whether the data support the stated superiority.
  2. [Evaluation] Evaluation section (implied by benchmarking description): the central claim that the simulator using 'real-world network topologies' and 'user load patterns' validates feasibility for deployed 6G networks rests on an unverified assumption; no evidence is supplied that the model captures small-scale fading correlation, O-RAN control-loop delays, or bursty traffic, so the reported deltas may be simulation-specific artifacts.
minor comments (2)
  1. [Abstract] The abstract and introduction should explicitly state the precise definitions of the energy-consumption and fairness metrics used for the reported percentages.
  2. [Method] Notation for the joint action space (power, bandwidth slice, scheduling) should be introduced consistently before the DRL formulation.
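
The statistical reporting requested in major comment 1 can be sketched as follows: a mean, sample standard deviation, and t-based 95% confidence interval over independent runs. The per-seed values are hypothetical placeholders rather than results from the paper, and the critical value is supplied by the caller because the standard library has no t-distribution quantile function.

```python
import math
import statistics

def summarise_runs(values, t_crit):
    """Return (mean, sample std, (ci_low, ci_high)) for per-run metric values.

    t_crit is the two-sided Student-t critical value for len(values) - 1
    degrees of freedom, e.g. 2.776 for 5 runs at the 95% level.
    """
    n = len(values)
    mean = statistics.mean(values)
    std = statistics.stdev(values)            # sample standard deviation (n - 1)
    half_width = t_crit * std / math.sqrt(n)  # t * standard error of the mean
    return mean, std, (mean - half_width, mean + half_width)

# Hypothetical energy-reduction fractions from five seeds (illustration only).
energy_reduction = [0.60, 0.62, 0.58, 0.61, 0.59]
mean, std, (ci_low, ci_high) = summarise_runs(energy_reduction, t_crit=2.776)
```

Reporting the interval alongside an "up to" figure lets a reader see whether the best case is typical or an outlier across seeds.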

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments identify areas where additional clarity and transparency can strengthen the presentation of our results. We address each major comment below and outline the planned revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the quantitative claims of 'up to 70% energy reduction' and 'more than 30% fairness improvement' are presented without any description of the number of independent runs, statistical tests, confidence intervals, or error bars, rendering it impossible to determine whether the data support the stated superiority.

    Authors: We agree that the abstract would be improved by referencing the statistical basis of the reported figures. In the revised manuscript we will update the abstract to note that the headline results are averages computed over 20 independent runs with different random seeds, and we will direct readers to the evaluation section where mean values, standard deviations, and error bars are presented. We will also add a brief statement confirming that the observed improvements were consistent across runs. revision: yes

  2. Referee: [Evaluation] Evaluation section (implied by benchmarking description): the central claim that the simulator using 'real-world network topologies' and 'user load patterns' validates feasibility for deployed 6G networks rests on an unverified assumption; no evidence is supplied that the model captures small-scale fading correlation, O-RAN control-loop delays, or bursty traffic, so the reported deltas may be simulation-specific artifacts.

    Authors: We acknowledge the validity of this observation. Our simulator incorporates publicly available real-world base-station locations and historical user-load traces, which go beyond purely synthetic random deployments. However, the channel model follows standard 3GPP path-loss and log-normal shadowing assumptions without explicit spatial correlation for small-scale fading, and the traffic model does not include fine-grained burstiness or O-RAN-specific control-loop latencies. In the revision we will expand the evaluation section to explicitly list these modeling choices and add a dedicated limitations paragraph discussing their implications for direct extrapolation to live 6G networks. revision: partial
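
The channel assumptions named in the second response can be illustrated with a generic log-distance path-loss model plus log-normal shadowing. This is a sketch under stated assumptions, not the simulator from the paper: the reference loss, path-loss exponent, and shadowing standard deviation below are placeholder values and do not correspond to any specific 3GPP scenario.

```python
import math
import random

def path_loss_db(distance_m, pl_ref_db=40.0, exponent=3.5, ref_distance_m=1.0):
    """Log-distance path loss in dB; all default parameters are assumptions."""
    return pl_ref_db + 10.0 * exponent * math.log10(distance_m / ref_distance_m)

def received_power_dbm(tx_power_dbm, distance_m, shadow_sigma_db=8.0, rng=None):
    """Received power with zero-mean Gaussian (in dB) log-normal shadowing."""
    rng = rng or random.Random()
    # Each call draws an independent shadowing sample: no spatial correlation.
    shadowing_db = rng.gauss(0.0, shadow_sigma_db) if shadow_sigma_db > 0 else 0.0
    return tx_power_dbm - path_loss_db(distance_m) - shadowing_db

# Hypothetical link: 30 dBm transmitter, user at 100 m, seeded for repeatability.
rx_no_shadowing = received_power_dbm(30.0, 100.0, shadow_sigma_db=0.0)
rx_shadowed = received_power_dbm(30.0, 100.0, rng=random.Random(0))
```

Because every link draws its shadowing term independently here, nearby users can see unrelated channel conditions. A simplification of this kind is one reason gains measured in simulation may not carry over to a live network.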

Circularity Check

0 steps flagged

No circularity: empirical simulation benchmarks are independent of input definitions


The paper reports performance deltas obtained by training PPO and TD3 agents inside a simulator and comparing them to throughput-greedy heuristics on the same simulated topologies and load patterns. These numbers are direct outputs of the experimental runs rather than algebraic identities, fitted parameters renamed as predictions, or results that reduce to self-citations. No uniqueness theorems, ansatzes smuggled via prior work, or self-definitional loops appear in the derivation chain. The evaluation is therefore self-contained against external benchmarks (the heuristics), satisfying the condition for a zero-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only the abstract is available, so specific free parameters and axioms cannot be extracted in detail; the approach necessarily relies on standard DRL hyperparameters and simulation assumptions.

free parameters (1)
  • DRL hyperparameters
    Learning rates, network sizes, and reward weights are typically tuned or fitted in such studies but are not reported here.
axioms (1)
  • domain assumption: Simulated topologies and load patterns sufficiently represent real heterogeneous network behavior
    The abstract invokes real-world topologies for benchmarking without further justification.

pith-pipeline@v0.9.0 · 5685 in / 1291 out tokens · 63044 ms · 2026-05-18T12:11:53.649805+00:00 · methodology


