arxiv: 2510.24869 · v3 · submitted 2025-10-28 · 💻 cs.NI · cs.SY · eess.SY

Deep Reinforcement Learning Approach to QoS-Aware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty

Pith reviewed 2026-05-18 02:45 UTC · model grok-4.3

classification 💻 cs.NI · cs.SY · eess.SY
keywords deep reinforcement learning · load balancing · 5G networks · PPO · QoS management · mobility · cell individual offset

The pith

PPO reinforcement learning agent improves 5G load balancing by dynamically adjusting cell individual offsets under mobility and noisy observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep reinforcement learning system that uses Proximal Policy Optimization (PPO) to manage load balancing in 5G cellular networks. The agent learns to set Cell Individual Offset (CIO) values that steer user-cell associations, aiming to improve several quality-of-service metrics at once. The approach is evaluated in a custom Python simulator that models user movement and measurement noise, where the authors report better KPI trends than the rule-based ReBuHa and A3 schemes and the learning-based CDQL baseline. A sympathetic reader would care because effective load balancing is essential for maintaining service quality as networks become denser and user mobility makes them more dynamic.

Core claim

The PPO-based agent, trained end-to-end in the simulator, produces policies that increase aggregate throughput and Jain's fairness index while decreasing latency, jitter, packet loss, and the number of handovers compared to rule-based approaches like ReBuHa and A3, and the CDQL baseline. The learning shows stable convergence across hundreds of episodes and generalizes better as user density rises.
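Jain's fairness index, one of the KPIs in the claim above, has a standard closed form: the squared sum of per-user allocations divided by n times the sum of their squares. The following is a minimal sketch of that formula only; the function name and the all-zero convention are assumptions made for illustration and are not taken from the paper's simulator.

```python
import numpy as np

def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), ranging from 1/n to 1."""
    x = np.asarray(throughputs, dtype=float)
    if x.size == 0:
        raise ValueError("need at least one throughput value")
    denom = x.size * np.sum(x ** 2)
    if denom == 0.0:
        # Assumed convention: an all-zero allocation is treated as perfectly equal.
        return 1.0
    return float(np.sum(x) ** 2 / denom)

# Equal shares give 1.0; one user taking everything among four gives 1/4.
print(jain_fairness([5, 5, 5, 5]))
print(jain_fairness([10, 0, 0, 0]))
```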

What carries the argument

The actor-critic neural network inside the PPO algorithm carries the argument. It learns a policy that periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations, guided by a multi-objective reward that combines throughput, latency, jitter, packet loss, fairness, and handover count.
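To make the CIO mechanism concrete, here is a hedged sketch of how an offset can bias association. It simplifies the A3-style comparison to "biased neighbour exceeds biased serving cell by a hysteresis margin" and uses one offset per cell. The function name, the array layout, and the 2 dB default are illustrative assumptions, not details from the paper.

```python
import numpy as np

def associate(rsrp_dbm, cio_db, serving, hysteresis_db=2.0):
    """Return the serving cell index per user after one CIO-biased decision.

    rsrp_dbm: (n_users, n_cells) measured signal power in dBm
    cio_db:   (n_cells,) offsets chosen by the controller (hypothetical layout)
    serving:  (n_users,) current serving cell index
    """
    rsrp = np.asarray(rsrp_dbm, dtype=float)
    serving = np.asarray(serving, dtype=int)
    biased = rsrp + np.asarray(cio_db, dtype=float)  # CIO shifts each cell's apparent strength
    users = np.arange(rsrp.shape[0])
    best = np.argmax(biased, axis=1)
    # Hand over only when the best biased cell beats the biased serving cell by the margin.
    gain = biased[users, best] - biased[users, serving]
    return np.where(gain > hysteresis_db, best, serving)

rsrp = [[-90.0, -92.0]]
print(associate(rsrp, [0.0, 0.0], [0]))  # user stays on cell 0
print(associate(rsrp, [0.0, 5.0], [0]))  # a +5 dB offset on cell 1 pulls the user over
```

Raising the offset of a lightly loaded cell makes it look stronger to nearby users, which is how a controller can offload a congested neighbour without changing transmit power.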

If this is right

  • The learned policy maintains smoother training dynamics than the CDQL baseline.
  • It exhibits stronger generalization when user load increases in stress tests.
  • Overall KPI improvements are consistent across 500+ training episodes.
  • The framework uses an entirely Python-based toolchain for both simulation and training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the policy transfers well, it could reduce the need for manual tuning of network parameters in live 5G deployments.
  • Extending the simulator with more complex mobility models or real-world trace data could test robustness further.
  • Combining this with other RL applications in RAN optimization might lead to integrated autonomous network management systems.

Load-bearing premise

That training in the lightweight Python simulator with Gauss-Markov mobility and added noise produces policies that will perform similarly when deployed in actual 5G radio access networks.
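For readers unfamiliar with the premise, the sketch below shows one common form of a Gauss-Markov mobility update together with additive Gaussian observation noise. All parameter values, the function names, and the clipping of negative speeds are assumptions for illustration; the paper's simulator may differ.

```python
import numpy as np

def gauss_markov_step(pos, speed, heading, rng, alpha=0.8, mean_speed=1.5,
                      mean_heading=0.0, sigma_speed=0.5, sigma_heading=0.3, dt=1.0):
    """Advance users one step. alpha in [0, 1] sets memory: 1 is straight-line, 0 is memoryless."""
    k = np.sqrt(1.0 - alpha ** 2)
    speed = (alpha * speed + (1.0 - alpha) * mean_speed
             + k * sigma_speed * rng.standard_normal(speed.shape))
    heading = (alpha * heading + (1.0 - alpha) * mean_heading
               + k * sigma_heading * rng.standard_normal(heading.shape))
    speed = np.maximum(speed, 0.0)  # assumed simplification: no negative speeds
    step = np.stack([np.cos(heading), np.sin(heading)], axis=1) * (speed * dt)[:, None]
    return pos + step, speed, heading

def noisy_observation(value, rng, sigma=1.0):
    """Additive zero-mean Gaussian measurement noise with standard deviation sigma."""
    value = np.asarray(value, dtype=float)
    return value + sigma * rng.standard_normal(value.shape)

rng = np.random.default_rng(0)
pos, speed, heading = np.zeros((3, 2)), np.full(3, 1.5), np.zeros(3)
for _ in range(5):
    pos, speed, heading = gauss_markov_step(pos, speed, heading, rng)
print(pos.shape)
```

A model of this kind has only a handful of tunable parameters, which is part of why the premise matters: real user traces can be burstier and more spatially correlated than such a process produces.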

What would settle it

Deploy the trained PPO policy in a commercial 5G testbed or a more detailed simulator like ns-3 with real base station parameters and measure whether the reported KPI improvements over baselines hold under equivalent user mobility patterns.

Original abstract

Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) for autonomous, QoS-aware load balancing implemented end-to-end in a lightweight, pure-Python simulation environment. The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations. A multi-objective reward captures key performance indicators (aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count), so the learned policy explicitly balances efficiency and stability under user mobility and noisy observations. The PPO agent uses an actor-critic neural network trained from trajectories generated by the Python simulator with configurable mobility (e.g., Gauss-Markov) and stochastic measurement noise. Across 500+ training episodes and stress tests with increasing user density, the PPO policy consistently improves KPI trends (higher throughput and fairness, lower delay, jitter, packet loss, and handovers) and exhibits rapid, stable convergence. Comparative evaluations show that PPO outperforms rule-based ReBuHa and A3 as well as the learning-based CDQL baseline across all KPIs while maintaining smoother learning dynamics and stronger generalization as load increases. These results indicate that PPO's clipped policy updates and advantage-based training yield robust, deployable control for next-generation RAN load balancing using an entirely Python-based toolchain.
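The abstract credits PPO's clipped policy updates for the reported robustness. As a reminder of what that term means, here is a minimal NumPy sketch of the standard clipped surrogate objective. The function name and the 0.2 clip value are illustrative assumptions; the paper's actual hyperparameters are not stated in the text above.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = pi_new / pi_old."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Taking the minimum removes any incentive to move the ratio outside the clip range.
    return float(np.mean(np.minimum(unclipped, clipped)))

# A ratio of 1.5 with positive advantage is capped at 1.2 times the advantage.
print(ppo_clip_objective([np.log(1.5)], [0.0], [1.0]))
```

In training this quantity is maximized (or its negative minimized) by gradient steps on the actor, usually alongside a value loss and an entropy bonus.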

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a PPO-based deep reinforcement learning framework for QoS-aware load balancing in 5G networks. The control task is cast as an MDP in which an agent adjusts Cell Individual Offset (CIO) values to steer user associations. A multi-objective reward incorporates aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count. Training occurs inside a lightweight pure-Python simulator that uses Gauss-Markov mobility and additive stochastic measurement noise. The authors report that, across 500+ episodes and stress tests with rising user density, the learned policy improves all KPIs relative to the rule-based ReBuHa and A3 baselines and the learning-based CDQL baseline, while exhibiting stable convergence and better generalization under load.

Significance. If the simulator trajectories prove statistically representative of live 5G RAN behavior, the work would supply a fully reproducible Python toolchain for multi-objective, mobility-aware load balancing that explicitly balances efficiency against stability. The explicit multi-KPI reward and the use of PPO's clipped updates for robustness are constructive elements. The absence of any quantitative grounding against real RAN traces or established simulators, however, confines the demonstrated gains to the custom environment and limits immediate claims of deployability.

major comments (2)
  1. [Abstract] Abstract: the central empirical claim states that PPO 'consistently improves KPI trends' and 'outperforms' ReBuHa, A3, and CDQL 'across all KPIs' yet supplies no numerical KPI values, confidence intervals, statistical tests, random-seed counts, or re-implementation details for the baselines, rendering the magnitude and reliability of the reported gains impossible to evaluate from the given text.
  2. [Simulation Environment] Simulation setup (as described in the abstract and skeptic note): the transfer and deployability claims rest on the untested assumption that trajectories generated by the pure-Python simulator with Gauss-Markov mobility and stochastic noise produce KPI statistics sufficiently close to real 5G radio access networks; no quantitative comparison of handover rates, throughput distributions, delay tails, or other metrics against operator traces, ns-3 5G modules, or field data is provided.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'entirely Python-based toolchain' would benefit from explicit listing of the neural-network and optimization libraries employed, to clarify reproducibility requirements.
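The first major comment asks for confidence intervals and seed counts. As a rough illustration of the kind of summary that would answer it, the sketch below computes a mean and an approximate 95% interval over per-seed KPI values. The function name is hypothetical, and the 1.96 multiplier is a normal approximation; with only a few seeds a Student-t quantile would give a wider and more honest interval.

```python
import math
import statistics

def mean_ci95(samples):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    half = 1.96 * sem
    return mean, mean - half, mean + half

# Made-up per-seed values, purely to show the call shape.
print(mean_ci95([1.0, 2.0, 3.0, 4.0, 5.0]))
```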

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central empirical claim states that PPO 'consistently improves KPI trends' and 'outperforms' ReBuHa, A3, and CDQL 'across all KPIs' yet supplies no numerical KPI values, confidence intervals, statistical tests, random-seed counts, or re-implementation details for the baselines, rendering the magnitude and reliability of the reported gains impossible to evaluate from the given text.

    Authors: We agree that the abstract would benefit from greater specificity. In the revised version we will incorporate key quantitative results from our experiments, including representative KPI improvements with associated confidence intervals and details on the number of random seeds and episodes used.
    Revision: yes

  2. Referee: [Simulation Environment] Simulation setup (as described in the abstract and skeptic note): the transfer and deployability claims rest on the untested assumption that trajectories generated by the pure-Python simulator with Gauss-Markov mobility and stochastic noise produce KPI statistics sufficiently close to real 5G radio access networks; no quantitative comparison of handover rates, throughput distributions, delay tails, or other metrics against operator traces, ns-3 5G modules, or field data is provided.

    Authors: The referee correctly identifies that our evaluation is performed entirely within the custom simulator. We will expand the manuscript with an explicit limitations subsection that discusses the modeling assumptions of the Gauss-Markov mobility model and additive noise, clarifies that the work does not claim direct equivalence to live RAN traces, and outlines future validation steps against ns-3 or operator data.
    Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical training results independent of inputs

Full rationale

The paper defines an MDP for periodic CIO adjustment, specifies a multi-objective reward function over measurable KPIs (throughput, latency, jitter, packet loss, fairness, handovers), and trains a PPO actor-critic policy on trajectories generated by an external pure-Python simulator with Gauss-Markov mobility and additive noise. All reported performance gains versus ReBuHa, A3, and CDQL are obtained by executing this training and evaluation loop inside the simulator; no equation, fitted parameter, or self-citation is shown to reduce algebraically to the target KPI improvements by construction. The derivation chain therefore remains self-contained and non-circular.
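One plausible shape for the multi-objective reward described above is a weighted sum that rewards throughput and fairness and penalizes the other four KPIs. The sketch below assumes every KPI has already been normalized to [0, 1]. The weights, the class and function names, and the linear form are all hypothetical, since the text above does not give the paper's actual reward equation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical equal weights; the paper's values are not given in this text.
    throughput: float = 1.0
    fairness: float = 1.0
    latency: float = 1.0
    jitter: float = 1.0
    packet_loss: float = 1.0
    handovers: float = 1.0

def reward(kpis, w=RewardWeights()):
    """Weighted gain minus weighted cost over KPIs normalized to [0, 1]."""
    gain = w.throughput * kpis["throughput"] + w.fairness * kpis["fairness"]
    cost = (w.latency * kpis["latency"] + w.jitter * kpis["jitter"]
            + w.packet_loss * kpis["packet_loss"] + w.handovers * kpis["handovers"])
    return gain - cost

kpis = dict(throughput=0.8, fairness=0.9, latency=0.2,
            jitter=0.1, packet_loss=0.05, handovers=0.1)
print(reward(kpis))
```

Because the reward is built from the same KPIs that are later reported, the weights act as additional free parameters: changing them changes which trade-offs the learned policy makes.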

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The reported performance gains rest on the fidelity of the custom simulator and the appropriateness of the chosen reward components; no new physical entities are postulated.

free parameters (1)
  • PPO clip ratio, learning rate, and network architecture sizes
    Standard reinforcement-learning hyperparameters selected to stabilize training in the described environment.
axioms (1)
  • domain assumption: The Python simulator with Gauss-Markov mobility and additive measurement noise produces statistically representative trajectories for real 5G radio conditions.
    All comparative KPI improvements are measured inside this simulator; transfer to hardware depends on this modeling assumption.

