Deep Reinforcement Learning Approach to QoS-Aware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty

Hossein Soleimani; Mehrshad Eskandarpour

arXiv: 2510.24869 · v3 · submitted 2025-10-28 · cs.NI · cs.SY · eess.SY

Pith reviewed 2026-05-18 02:45 UTC · model grok-4.3

keywords: deep reinforcement learning · load balancing · 5G networks · PPO · QoS management · mobility · cell individual offset

The pith

A PPO reinforcement learning agent improves 5G load balancing in simulation by dynamically adjusting cell individual offsets under mobility and noisy observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep reinforcement learning system that uses Proximal Policy Optimization (PPO) to manage load balancing in 5G cellular networks. The agent learns to set Cell Individual Offset (CIO) values that direct user connections, with the aim of improving several quality of service metrics at once. The approach is tested in a custom Python simulator that models user movement and measurement errors, where it reportedly outperforms the rule-based ReBuHa and A3 schemes and the learning-based CDQL baseline. A sympathetic reader would care because effective load balancing is essential for maintaining service quality as networks become denser and more dynamic with user mobility.
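How a CIO steers user connections can be shown with a deliberately simplified sketch. It assumes a user attaches to the cell with the largest biased signal level (RSRP plus CIO) and ignores hysteresis, time-to-trigger, and the other terms of a real handover rule. The function name and the dBm values are illustrative assumptions, not taken from the paper.

```python
def associate(rsrp_dbm, cio_db):
    """Return the index of the cell with the largest RSRP + CIO.

    Simplified on purpose: real handover rules also apply hysteresis
    and time-to-trigger, which this sketch leaves out.
    """
    biased = [r + c for r, c in zip(rsrp_dbm, cio_db)]
    return max(range(len(biased)), key=biased.__getitem__)


# Illustrative values only: cell 0 is 2 dB stronger than cell 1.
rsrp = [-90.0, -92.0]
no_bias = associate(rsrp, [0.0, 0.0])    # the user stays on the stronger cell
with_bias = associate(rsrp, [0.0, 3.0])  # a 3 dB offset makes cell 1 look stronger
```

Raising the offset of a lightly loaded cell makes it more attractive to users near the cell edge, which is the lever the agent adjusts.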

Core claim

Trained end-to-end in the simulator, the PPO-based agent produces policies that increase aggregate throughput and Jain's fairness index while decreasing latency, jitter, packet loss, and the number of handovers. The comparison is against the rule-based ReBuHa and A3 approaches and the learning-based CDQL baseline. Learning converges stably across hundreds of episodes, and the policy generalizes better as user density rises.
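For readers unfamiliar with Jain's fairness index, it is defined for n non-negative allocations x_i as (sum x_i)^2 / (n * sum x_i^2). It equals 1 when all users receive the same allocation and 1/n when a single user receives everything. The sketch below is a minimal implementation; returning 0.0 for empty or all-zero input is a convention chosen here, not something the paper specifies.

```python
def jain_fairness(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(allocations)
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    if n == 0 or sum_sq == 0:
        return 0.0  # convention of this sketch for degenerate input
    return (total * total) / (n * sum_sq)
```

Four users with equal throughput give an index of 1, while one user holding all throughput among four gives 0.25.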

What carries the argument

The argument rests on the actor-critic neural network inside the PPO algorithm, which learns a policy for periodically adjusting Cell Individual Offset (CIO) values to steer user-cell associations. The policy is trained against a multi-objective reward function that combines throughput, latency, jitter, packet loss, fairness, and handover count.
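One plausible shape for such a multi-objective reward is a weighted sum that credits throughput and fairness and penalizes the other four KPIs. This page does not give the paper's functional form or weights, so everything below is hypothetical: the `Kpis` fields, the assumption that each KPI is already normalized to [0, 1], and the weight values.

```python
from dataclasses import dataclass


@dataclass
class Kpis:
    """KPIs assumed to be pre-normalized to [0, 1] (an assumption of this sketch)."""
    throughput: float
    fairness: float
    latency: float
    jitter: float
    packet_loss: float
    handovers: float


# Hypothetical weights, not taken from the paper.
WEIGHTS = {
    "throughput": 1.0, "fairness": 1.0,
    "latency": 0.5, "jitter": 0.5, "packet_loss": 0.5, "handovers": 0.25,
}


def reward(k, w=WEIGHTS):
    """Weighted sum: reward throughput and fairness, penalize the rest."""
    gain = w["throughput"] * k.throughput + w["fairness"] * k.fairness
    cost = (w["latency"] * k.latency + w["jitter"] * k.jitter
            + w["packet_loss"] * k.packet_loss + w["handovers"] * k.handovers)
    return gain - cost
```

The handover term is what lets a scalar reward discourage ping-pong behaviour: a policy that gains throughput by moving users constantly pays for it on every step.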

If this is right

The learned policy maintains smoother training dynamics than the CDQL baseline.
It exhibits stronger generalization when user load increases in stress tests.
Overall KPI improvements are consistent across 500+ training episodes.
The framework uses an entirely Python-based toolchain for both simulation and training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the policy transfers well, it could reduce the need for manual tuning of network parameters in live 5G deployments.
Extending the simulator with more complex mobility models or real-world trace data could test robustness further.
Combining this with other RL applications in RAN optimization might lead to integrated autonomous network management systems.

Load-bearing premise

That training in the lightweight Python simulator with Gauss-Markov mobility and added noise produces policies that will perform similarly when deployed in actual 5G radio access networks.
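To make the premise concrete, the sketch below shows one update step of the textbook Gauss-Markov mobility model, in which speed and direction are each a blend of their previous value, a long-run mean, and Gaussian noise. The memory parameter `alpha` ranges from 0 (memoryless) to 1 (constant motion). The function names, parameter names, and the clamping of speed at zero are assumptions of this sketch; the paper's exact parameterization is not given here.

```python
import math
import random


def gauss_markov_step(speed, direction, mean_speed, mean_direction,
                      alpha, sigma_speed, sigma_direction, rng):
    """One Gauss-Markov update of speed (m/s) and direction (radians)."""
    root = math.sqrt(1.0 - alpha * alpha)
    new_speed = (alpha * speed + (1.0 - alpha) * mean_speed
                 + root * rng.gauss(0.0, sigma_speed))
    new_direction = (alpha * direction + (1.0 - alpha) * mean_direction
                     + root * rng.gauss(0.0, sigma_direction))
    # Clamping at zero is a choice of this sketch to avoid negative speeds.
    return max(new_speed, 0.0), new_direction


def advance(x, y, speed, direction, dt):
    """Move a user for dt seconds along the current heading."""
    return (x + speed * math.cos(direction) * dt,
            y + speed * math.sin(direction) * dt)


rng = random.Random(0)  # fixed seed so a trajectory can be reproduced
speed, direction = gauss_markov_step(1.5, 0.0, 1.5, 0.0, 0.8, 0.3, 0.2, rng)
x, y = advance(0.0, 0.0, speed, direction, dt=1.0)
```

How closely trajectories like these resemble real pedestrian or vehicular movement is exactly what the premise leaves untested.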

What would settle it

Deploy the trained PPO policy in a commercial 5G testbed or a more detailed simulator like ns-3 with real base station parameters and measure whether the reported KPI improvements over baselines hold under equivalent user mobility patterns.

Original abstract

Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) for autonomous, QoS-aware load balancing implemented end-to-end in a lightweight, pure-Python simulation environment. The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations. A multi-objective reward captures key performance indicators (aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count), so the learned policy explicitly balances efficiency and stability under user mobility and noisy observations. The PPO agent uses an actor-critic neural network trained from trajectories generated by the Python simulator with configurable mobility (e.g., Gauss-Markov) and stochastic measurement noise. Across 500+ training episodes and stress tests with increasing user density, the PPO policy consistently improves KPI trends (higher throughput and fairness, lower delay, jitter, packet loss, and handovers) and exhibits rapid, stable convergence. Comparative evaluations show that PPO outperforms rule-based ReBuHa and A3 as well as the learning-based CDQL baseline across all KPIs while maintaining smoother learning dynamics and stronger generalization as load increases. These results indicate that PPO's clipped policy updates and advantage-based training yield robust, deployable control for next-generation RAN load balancing using an entirely Python-based toolchain.
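The clipped policy update that the abstract credits for robustness has a standard per-sample form in the PPO literature: min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between the new and old policy and A is the advantage estimate. The sketch below evaluates that term for a single sample. The default `clip_eps=0.2` is a commonly used value and an assumption here, not a setting reported for this paper.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum makes the objective a pessimistic bound, so a
    # large policy change earns no extra credit beyond the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage the objective stops growing once the ratio exceeds 1 + eps, and with a negative advantage it stops improving once the ratio falls below 1 - eps. In both cases the incentive to move the policy further in a single update is removed.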

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PPO for 5G CIO load balancing works inside their Python simulator but the results rest on unvalidated assumptions about how well that simulator matches real networks.

The paper applies Proximal Policy Optimization to adjust Cell Individual Offsets for QoS-aware load balancing in 5G networks. It does this inside a lightweight pure-Python simulator that models user mobility with Gauss-Markov and adds stochastic noise to observations. The agent is trained to balance several KPIs at once and reportedly outperforms a couple of rule-based methods and another learning baseline.

What stands out is the end-to-end setup with a multi-objective reward that covers throughput, latency, jitter, packet loss, fairness, and handover count. They run training across more than 500 episodes and test under increasing user density, showing stable convergence and better generalization at higher loads. The pure-Python choice keeps things simple and accessible.

The soft spots center on evaluation. The abstract and summary give no concrete KPI numbers, confidence intervals, or details on how many seeds were used, which makes it hard to assess how reliable the outperformance is. More importantly, the simulator's outputs are not checked against real 5G deployment data or more detailed simulators, so the transferability of the policy remains an open question.

This would be relevant for researchers working on reinforcement learning applications in mobile networks who need a concrete implementation example. A reader focused on practical deployment would need additional validation steps before considering it seriously. I recommend sending it to peer review. The core formulation and training approach are solid enough that referees can help address the experimental gaps.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a PPO-based deep reinforcement learning framework for QoS-aware load balancing in 5G networks. The control task is cast as an MDP in which an agent adjusts Cell Individual Offset (CIO) values to steer user associations. A multi-objective reward incorporates aggregate throughput, latency, jitter, packet loss, Jain fairness, and handover count. Training occurs inside a lightweight pure-Python simulator that uses Gauss-Markov mobility and additive stochastic measurement noise. The authors report that, across 500+ episodes and stress tests with rising user density, the learned policy improves all KPIs relative to rule-based ReBuHa and A3 baselines as well as the CDQL learning baseline, while exhibiting stable convergence and better generalization under load.

Significance. If the simulator trajectories prove statistically representative of live 5G RAN behavior, the work would supply a fully reproducible Python toolchain for multi-objective, mobility-aware load balancing that explicitly balances efficiency against stability. The explicit multi-KPI reward and the use of PPO's clipped updates for robustness are constructive elements. The absence of any quantitative grounding against real RAN traces or established simulators, however, confines the demonstrated gains to the custom environment and limits immediate claims of deployability.

major comments (2)

[Abstract] Abstract: the central empirical claim states that PPO 'consistently improves KPI trends' and 'outperforms' ReBuHa, A3, and CDQL 'across all KPIs' yet supplies no numerical KPI values, confidence intervals, statistical tests, random-seed counts, or re-implementation details for the baselines, rendering the magnitude and reliability of the reported gains impossible to evaluate from the given text.
[Simulation Environment] Simulation setup (as described in the abstract and skeptic note): the transfer and deployability claims rest on the untested assumption that trajectories generated by the pure-Python simulator with Gauss-Markov mobility and stochastic noise produce KPI statistics sufficiently close to real 5G radio access networks; no quantitative comparison of handover rates, throughput distributions, delay tails, or other metrics against operator traces, ns-3 5G modules, or field data is provided.
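The per-seed reporting requested in the first major comment could take the following shape: run each method under several random seeds, then report the mean of a KPI with a confidence interval. This is a minimal sketch using a normal approximation (z = 1.96 for roughly 95% coverage). With only a handful of seeds a Student t critical value would give a wider and more honest interval. The function name is hypothetical and no values from the paper are used.

```python
import math
import statistics


def mean_ci(samples, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    if len(samples) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.mean(samples)
    # Sample standard deviation divided by sqrt(n) is the standard error.
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width
```

Reporting `mean ± half_width` per KPI and per method would let a reader judge whether the gaps between PPO and the baselines exceed seed-to-seed variation.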

minor comments (1)

[Abstract] Abstract: the phrase 'entirely Python-based toolchain' would benefit from explicit listing of the neural-network and optimization libraries employed, to clarify reproducibility requirements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate planned revisions to the manuscript.

Referee: [Abstract] Abstract: the central empirical claim states that PPO 'consistently improves KPI trends' and 'outperforms' ReBuHa, A3, and CDQL 'across all KPIs' yet supplies no numerical KPI values, confidence intervals, statistical tests, random-seed counts, or re-implementation details for the baselines, rendering the magnitude and reliability of the reported gains impossible to evaluate from the given text.

Authors: We agree that the abstract would benefit from greater specificity. In the revised version we will incorporate key quantitative results from our experiments, including representative KPI improvements with associated confidence intervals and details on the number of random seeds and episodes used. revision: yes

Referee: [Simulation Environment] Simulation setup (as described in the abstract and skeptic note): the transfer and deployability claims rest on the untested assumption that trajectories generated by the pure-Python simulator with Gauss-Markov mobility and stochastic noise produce KPI statistics sufficiently close to real 5G radio access networks; no quantitative comparison of handover rates, throughput distributions, delay tails, or other metrics against operator traces, ns-3 5G modules, or field data is provided.

Authors: The referee correctly identifies that our evaluation is performed entirely within the custom simulator. We will expand the manuscript with an explicit limitations subsection that discusses the modeling assumptions of the Gauss-Markov mobility model and additive noise, clarifies that the work does not claim direct equivalence to live RAN traces, and outlines future validation steps against ns-3 or operator data. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical training results independent of inputs

The paper defines an MDP for periodic CIO adjustment, specifies a multi-objective reward function over measurable KPIs (throughput, latency, jitter, packet loss, fairness, handovers), and trains a PPO actor-critic policy on trajectories generated by an external pure-Python simulator with Gauss-Markov mobility and additive noise. All reported performance gains versus ReBuHa, A3, and CDQL are obtained by executing this training and evaluation loop inside the simulator; no equation, fitted parameter, or self-citation is shown to reduce algebraically to the target KPI improvements by construction. The derivation chain therefore remains self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The reported performance gains rest on the fidelity of the custom simulator and the appropriateness of the chosen reward components; no new physical entities are postulated.

free parameters (1)

PPO clip ratio, learning rate, and network architecture sizes
Standard reinforcement-learning hyperparameters selected to stabilize training in the described environment.

axioms (1)

Domain assumption: The Python simulator with Gauss-Markov mobility and additive measurement noise produces statistically representative trajectories for real 5G radio conditions.
All comparative KPI improvements are measured inside this simulator; transfer to hardware depends on this modeling assumption.

pith-pipeline@v0.9.0 · 5812 in / 1491 out tokens · 50447 ms · 2026-05-18T02:45:27.663744+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values... multi-objective reward captures... throughput, latency, jitter, packet loss rate, Jain’s fairness index, and handover count

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: User mobility follows configurable stochastic models (e.g., Gauss–Markov)... measurements are corrupted with controlled noise

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages

[1]

strongest - signal

(6) We apply light temporal filtering (e.g., EMA) and running normalization on these channels before feeding the learning agent. MDP Formulation We define an episodic Markov Decision Process (𝑆,𝐴,𝑃,𝑅,𝛾). State At decision time t the controller observes per-BS aggregates: 𝜂(𝑡) = [𝜂1(𝑡),…,𝜂𝑀(𝑡)] (7) 𝑇(𝑡) = [𝑇̅1(𝑡),…,𝑇̅𝑀(𝑡)] (8) 𝐽(𝑡) = [𝐽1̅(𝑡),…,𝐽𝑀̅̅̅(𝑡)] (9...

work page 2048
[2]

Hossein Soleimani and Azzedine Boukerche. 2014. CAMS transmission rate adaptation for vehicular safety application in LTE. In Proceedings of the fourth ACM international symposium on Development and analysis of intelligent vehicular networks and applications (DIVANet '14). Association for Comput ing Machinery, New York, NY, USA, 47 –52. https://doi.org/10...

work page doi:10.1145/2656346.2656347 2014
[3]

Eskandarpour, Mehrshad & Soleimani, Hossein. (2025). Enhancing Lifetime and Reliability in WSNs: Complementary of Dual‐Battery Systems Energy Management Strategy. International Journal of Distributed Sensor Networks. 2025. 10.1155/dsn/5870686

work page doi:10.1155/dsn/5870686 2025
[4]

Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks,

H. Jiang, G. Li, J. Xie and J. Yang, "Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks," in IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 4, pp. 5269-5279, April 2024, doi: 10.1109/TNNLS.2022.3203024

work page doi:10.1109/tnnls.2022.3203024 2024
[5]

An Empirical Study on Bias Reduction: Clipped Double Q vs. Multi-Step Methods,

Y. Bai, "An Empirical Study on Bias Reduction: Clipped Double Q vs. Multi-Step Methods," 2021 International Conference on Computer Information Science and Artificial Intelligence (CISAI), Kunming, China, 2021, pp. 1063-1068, doi: 10.1109/CISAI54367.2021.00213

work page doi:10.1109/cisai54367.2021.00213 2021
[6]

Power Allocation in 5G Wireless Communication,

Z. Chen and Q. Liang, "Power Allocation in 5G Wireless Communication," in IEEE Access, vol. 7, pp. 60785 -60792, 2019, doi: 10.1109/ACCESS.2019.2915099

work page doi:10.1109/access.2019.2915099 2019
[7]

Energy -Efficient Joint User and Power Allocation in 5G Millimeter Wave Networks: A Genetic Algorithm - Based Approach,

A. Fayad and T. Cinkler, "Energy -Efficient Joint User and Power Allocation in 5G Millimeter Wave Networks: A Genetic Algorithm - Based Approach," in IEEE Access, vol. 12, pp. 20019-20030, 2024, doi: 10.1109/ACCESS.2024.3361660

work page doi:10.1109/access.2024.3361660 2024
[8]

Hassani, Alireza & Delir Haghighi, Pari & Jayaraman, Prem Prakash & Zaslavsky, Arkady & Medvedev, Alexey. (2016). CDQL: A Generic Context Representation and Querying Approach for Internet of Things Applications. 79-88. 10.1145/3007120.3007137

work page doi:10.1145/3007120.3007137 2016
[9]

Hill, Mary & Laughter, Melissa & Harmange, Cecile & Dellavalle, Robert & Rundle, Chandler & Dunnick, Cory. (2021). Development of the CDQL: a comprehensive quality of life measure for patients with contact dermatitis (Preprint). 10.2196/preprints.30620

work page doi:10.2196/preprints.30620 2021
[10]

Minh Do, Canh & Takagi, Tsubasa & Ogata, Kazuhiro. (2024). Automated Quantum Protocol Verification Based on Concurrent Dynamic Quantum Logic. ACM Transactions on Software Engineering and Methodology. 34. 10.1145/3708475

work page doi:10.1145/3708475 2024
[11]

Power Allocation and Parameter Estimation for Multipath -Based 5G Positioning,

A. Kakkavas, H. Wymeersch, G. Seco-Granados, M. H. C. García, R. A. Stirling-Gallacher and J. A. Nossek, "Power Allocation and Parameter Estimation for Multipath -Based 5G Positioning," in IEEE Transactions on Wireless Communications, vol. 20, no. 11, pp. 7302-7316, Nov. 2021, doi: 10.1109/TWC.2021.3082581

work page doi:10.1109/twc.2021.3082581 2021
[12]

Joint Time and Power Allocation for 5G NR Unlicensed Systems,

H. Bao, Y. Huo, X. Dong and C. Huang, "Joint Time and Power Allocation for 5G NR Unlicensed Systems," in IEEE Transactions on Wireless Communications, vol. 20, no. 9, pp. 6195-6209, Sept. 2021, doi: 10.1109/TWC.2021.3072553

work page doi:10.1109/twc.2021.3072553 2021
[13]

Iturria Rivera, Pedro & Elsayed, Medhat & Bavand, Majid & Gaigalas, Raimundas & Furr, Steve & Erol Kantarci, Melike. (2023). Hierarchical Deep Q -Learning Based Handover in Wireless Networks with Dual Connectivity. 10.48550/arXiv.2301.05391

work page doi:10.48550/arxiv.2301.05391 2023
[14]

Kavosi, Daruosh & Karimi, Abbas & Zarafshan, Faraneh. (2024). SELF- QMM: An Self -directed Model Based -on Extended Q -Learning and Markov Model to Estimate MTTF in Multiprocessor Platform of Embedded Systems. 10.21203/rs.3.rs-5327542/v1

work page doi:10.21203/rs.3.rs-5327542/v1 2024
[15]

Evolved Universal Terrestrial Radio Access (E -UTRA); Radio Resource Control (RRC); Protocol specification,

3GPP, “Evolved Universal Terrestrial Radio Access (E -UTRA); Radio Resource Control (RRC); Protocol specification,” 3GPP TS 36.331, Release 15, Dec. 2020. [Online]. Available: https://www.3gpp.org/DynaReport/36331.htm

work page 2020
[16]

Load Balancing in Cellular Networks: A Reinforcement Learning Approach,

K. Attiah, M. Alsheikh, N. Saeed, and T. Y. Al-Naffouri, “Load Balancing in Cellular Networks: A Reinforcement Learning Approach,” in Proc. IEEE Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, Jan. 2020, pp. 1 –6. doi: 10.1109/CCNC46108.2020.9045533

work page doi:10.1109/ccnc46108.2020.9045533 2020
[17]

Load Balancing for Ultra- Dense Networks: A Deep Reinforcement Learning -Based Approach,

Y. Xu, Q. Wu, R. Atat, Y. Zhao, and Z. Ren, “Load Balancing for Ultra- Dense Networks: A Deep Reinforcement Learning -Based Approach,” IEEE Internet of Things Journal, vol. 8, no. 7, pp. 5141–5155, Apr. 2021. doi: 10.1109/JIOT.2020.3035289

work page doi:10.1109/jiot.2020.3035289 2021
[18]

Handover Management in 5G Networks Using Reinforcement Learning,

V. Yajnanarayana, A. Gupta, and P. Mannion, “Handover Management in 5G Networks Using Reinforcement Learning,” in Proc. IEEE 5G World Forum (5GWF), Bangalore, India, Sept. 2020, pp. 1 –6. doi: 10.1109/5GWF49715.2020.9221346

work page doi:10.1109/5gwf49715.2020.9221346 2020
[19]

Trust in 5g open rans through machine learning: Rf fingerprinting on the powder pawr plat- form,

Z.-H. Huang, K. -W. Lu, and C. -L. Wang, “Efficient Handover in 5G Using Deep Learning,” in Proc. IEEE Global Communications Conference (GLOBECOM), Taipei, Taiwan, Dec. 2020, pp. 1 –6. doi: 10.1109/GLOBECOM42002.2020.9322453

work page doi:10.1109/globecom42002.2020.9322453 2020
[20]

Reinforcement Learning-Based Beam Management and Interference Mitigation in mmWave Networks,

L. He, Y. Xu, R. Atat, N. Mastronarde, and Y. Zhao, “Reinforcement Learning-Based Beam Management and Interference Mitigation in mmWave Networks,” IEEE Access, vol. 9, pp. 12345–12357, Jan. 2021. doi: 10.1109/ACCESS.2021.3051195

work page doi:10.1109/access.2021.3051195 2021
[21]

Hierarchical Reinforcement Learning for Mobility Management in 5G Ultra-Dense Networks,

J. Chen, Y. Wang, and X. Chu, “Hierarchical Reinforcement Learning for Mobility Management in 5G Ultra-Dense Networks,” IEEE Transactions on Network and Service Management, vol. 18, no. 1, pp. 778 –790, Mar

work page
[22]

doi: 10.1109/TNSM.2020.3045406

work page doi:10.1109/tnsm.2020.3045406 2020
[23]

QoS -Aware Multi -Objective Reinforcement Learning for User Association in 5G Networks,

Z. Li, W. Saad, and M. Bennis, “QoS -Aware Multi -Objective Reinforcement Learning for User Association in 5G Networks,” Computer Networks, vol. 210, p. 107905, Mar. 2022. doi: 10.1016/j.comnet.2022.107905

work page doi:10.1016/j.comnet.2022.107905 2022
[24]

Energy and Latency Optimization for Edge Intelligence via Deep Reinforcement Learning,

A. Rahmati, A. Azari, and C. Fischione, “Energy and Latency Optimization for Edge Intelligence via Deep Reinforcement Learning,” IEEE Transactions on Wireless Communications, vol. 21, no. 6, pp. 4116–4129, Jun. 2022. doi: 10.1109/TWC.2021.3136266

work page doi:10.1109/twc.2021.3136266 2022
[25]

Addressing Function Approximation Error in Actor -Critic Methods,

S. Fujimoto, H. van Hoof, and D. Meger, “Addressing Function Approximation Error in Actor -Critic Methods,” in Proc. International Conference on Machine Learning (ICML), Stockholm, Sweden, Jul. 2018, pp. 1587 –1596. [Online]. Available: https://proceedings.mlr.press/v80/fujimoto18a.html

work page 2018
[26]

Handover in LTE-advanced wireless networks: state of art and survey of decision algorithm,

R. Ahmad, E. A. Sundararajan, N. E. Othman, and M. Ismail, “Handover in LTE-advanced wireless networks: state of art and survey of decision algorithm,” Telecommunication Systems, 2017

work page 2017
[27]

A Survey on Handover Management: From LTE to NR,

M. Tayyab, X. Gelabert, and R. Jantti, “A Survey on Handover Management: From LTE to NR,” 2019

work page 2019
[28]

5G Handover using Reinforcement Learning,

V. Yajnanarayana, H. Ryden, and L. Hevizi, “5G Handover using Reinforcement Learning,” in Proc. IEEE 3rd 5G World Forum (5GWF), 2020

work page 2020
[29]

Efficient Handover Algorithm in 5G Networks using Deep Learning,

Z.-H. Huang, Y. -L. Hsu, P. -K. Chang, and M. -J. Tsai, “Efficient Handover Algorithm in 5G Networks using Deep Learning,” in IEEE GLOBECOM 2020, pp. 1–6, Dec. 2020

work page 2020
[30]

Load Balancing for Ultra - dense Networks: A Deep Reinforcement Learning -Based Approach,

Y. Xu, W. Xu, Z. Wang, J. Lin, and S. Cui, “Load Balancing for Ultra - dense Networks: A Deep Reinforcement Learning -Based Approach,” IEEE Internet of Things Journal, 2019

work page 2019
[31]

Load Balancing in Cellular Networks: A Reinforcement Learning Approach,

K. Attiah, K. Banawan, A. Gaber, A. Elezabi, K. Seddik, Y. Gadallah, and K. Abdullah, “Load Balancing in Cellular Networks: A Reinforcement Learning Approach,” in Proc. IEEE CCNC, 2020

work page 2020
[32]

Stackelberg game -based deployment design and radio resource allocation in coordinated UAVs -assisted vehicular communication networks,

M. Hosseini and R. Ghazizadeh, “Stackelberg game -based deployment design and radio resource allocation in coordinated UAVs -assisted vehicular communication networks,” IEEE Trans. Veh. Technol., vol. 72, no. 1, pp. 1196–1210, Jan. 2023, doi: 10.1109/TVT.2022.3206145

work page doi:10.1109/tvt.2022.3206145 2023
[33]

A joint power and bandwidth allocation method based on deep reinforcement learning for V2V communications in 5G,

X. Hu, S. Xu, L. Wang, Y. Wang, Z. Liu, L. Xu, Y. Li, and W. Wang, “A joint power and bandwidth allocation method based on deep reinforcement learning for V2V communications in 5G,” China Communications, vol. 18, no. 7, pp. 25–35, Jul. 2021

work page 2021
[34]

A deep reinforcement learning based D2D relay selection and power level allocation in mmWave vehicular networks,

H. Zhang, S. Chong, X. Zhang, and N. Lin, “A deep reinforcement learning based D2D relay selection and power level allocation in mmWave vehicular networks,” IEEE Wireless Commun. Lett., vol. 9, no. 3, pp. 416–419, Mar. 2020

work page 2020
[35]

Knowledge -driven resource allocation for wireless networks: A WMMSE unrolled graph neural network approach,

H. Yang, N. Cheng, R. Sun, W. Quan, R. Chai, K. Aldubaikhy, A. Alqasir, and X. Shen, “Knowledge -driven resource allocation for wireless networks: A WMMSE unrolled graph neural network approach,” IEEE Internet of Things Journal, vol. 11, no. 10, pp. 189–…, 2024

work page 2024
[36]

Joint power control and channel allocation for interference mitigation based on reinforcement learning,

G. Zhao, Y. Li, C. Xu, Z. Han, Y. Xing, and S. Yu, “Joint power control and channel allocation for interference mitigation based on reinforcement learning,” IEEE Access, vol. 7, pp. 177254–177265, 2019

work page 2019
[37]

Joint optimization of handover control and power allocation based on multi -agent deep reinforcement learning,

D. Guo, L. Tang, X. Zhang, and Y. -C. Liang, “Joint optimization of handover control and power allocation based on multi -agent deep reinforcement learning,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 13124–13138, Nov. 2020

work page 2020
[38]

Resource management in future millimeter wave small -cell networks: Joint PHY - MAC layer design,

J. Shi, H. Pervaiz, P. Xiao, W. Liang, Z. Li, and Z. Ding, “Resource management in future millimeter wave small -cell networks: Joint PHY - MAC layer design,” IEEE Access, vol. 7, pp. 76910–76919, 2019

work page 2019
[39]

Self -organizing mm-wave networks: A power allocation scheme based on machine learning,

R. Amiri and H. Mehrpouyan, “Self -organizing mm-wave networks: A power allocation scheme based on machine learning,” in Proc. 11th Global Symp. Millim. Waves (GSMM), 2018

work page 2018
[40]

A model -driven deep reinforcement learning heuristic algorithm for resource allocation in ultra- dense cellular networks,

X. Liao, J. Shi, Z. Li, L. Zhang, and B. Xia, “A model -driven deep reinforcement learning heuristic algorithm for resource allocation in ultra- dense cellular networks,” IEEE Trans. Veh. Technol., vol. 69, no. 1, pp. 983–997, Jan. 2020

work page 2020
[41]

Self -adaptive power control with deep reinforcement learning for millimeter-wave Internet-of- vehicles video caching,

D. Kwon, J. Kim, D. A. Mohaisen, and W. Lee, “Self -adaptive power control with deep reinforcement learning for millimeter-wave Internet-of- vehicles video caching,” Journal of Communications and Networks, vol. 22, no. 4, pp. 326–337, Aug. 2020

work page 2020
[42]

A survey on uplink resource allocation in OFDMA wireless networks,

E. Yaacoub and Z. Dawy, “A survey on uplink resource allocation in OFDMA wireless networks,” IEEE Commun. Surveys & Tutorials, vol. 14, no. 2, pp. 322–337, 2nd Quart., 2012

work page 2012
[43]

Experience -driven power allocation using multi -agent deep reinforcement learning for millimeter -wave high -speed railway systems,

J. Xu and B. Ai, “Experience -driven power allocation using multi -agent deep reinforcement learning for millimeter -wave high -speed railway systems,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 6, pp. 5490–5500, Jun. 2022

work page 2022
[44]

V2X offloading and resource allocation in SDN -assisted MEC -based vehicular networks,

H. Zhang, Z. Wang, and K. Liu, “V2X offloading and resource allocation in SDN -assisted MEC -based vehicular networks,” China Communications, vol. 17, no. 5, pp. 266–283, May 2020

work page 2020
[45]

Federated reinforcement learning -based resource allocation in D2D -enabled 6G,

Q. Guo, F. Tang, and N. Kato, “Federated reinforcement learning -based resource allocation in D2D -enabled 6G,” IEEE Network, vol. 37, no. 5, pp. 89–95, Sep. 2023

work page 2023
[46]

Resource allocation for high -reliability low-latency vehicular communications with packet retransmission,

C. Guo, L. Liang, and G. Y. Li, “Resource allocation for high -reliability low-latency vehicular communications with packet retransmission,” IEEE Trans. Veh. Technol., vol. 68, no. 7, pp. 6219–6230, Jul. 2019

work page 2019
[47]

Deep Reinforcement Learning for 5G Networks: Joint Beamforming, Power Control, and Interference Coordination,

F. B. Mismar, B. L. Evans and A. Alkhateeb, "Deep Reinforcement Learning for 5G Networks: Joint Beamforming, Power Control, and Interference Coordination," in IEEE Transactions on Communications, vol. 68, no. 3, pp. 1581 -1592, March 2020, doi: 10.1109/TCOMM.2019.2961332

work page doi:10.1109/tcomm.2019.2961332 2020
[48]

Massive MIMO With Joint Power Control,

J. Choi, “Massive MIMO With Joint Power Control,” IEEE Wireless Communications Letters, vol. 3, no. 4, pp. 329–332, Aug. 2014

work page 2014
[49]

Joint Power Control and Beamforming for Uplink Non -Orthogonal Multiple Access in 5G Millimeter -Wave Communications,

L. Zhu, J. Zhang, Z. Xiao, X. Cao, D. O. Wu, and X. Xia, “Joint Power Control and Beamforming for Uplink Non -Orthogonal Multiple Access in 5G Millimeter -Wave Communications,” IEEE Trans. on Wireless Communications, vol. 17, no. 9, pp. 6177–6189, Sep. 2018

work page 2018
[50]

Online Power Control for 5G Wireless Communications: A Deep Q-Network Approach,

C. Luo, J. Ji, Q. Wang, L. Yu, and P. Li, “Online Power Control for 5G Wireless Communications: A Deep Q-Network Approach,” in Proc. IEEE ICC, May 2018

work page 2018
[51]

Joint optimal power control and beamforming in wireless networks using antenna arrays,

F. Rashid-Farrokhi, L. Tassiulas, and K. J. R. Liu, “Joint optimal power control and beamforming in wireless networks using antenna arrays,” IEEE Trans. on Communications , vol. 46, no. 10, pp. 1313 –1324, Oct. 1998

[52]

3GPP, “Evolved Universal Terrestrial Radio Access (E-UTRA); Overall description,” TS 36.300, Jan. 2019

[53]

R. Kim, Y. Kim, N. Y. Yu, S. Kim, and H. Lim, “Online Learning-based Downlink Transmission Coordination in Ultra-Dense Millimeter Wave Heterogeneous Networks,” IEEE Trans. on Wireless Communications, vol. 18, no. 4, pp. 2200–2214, Mar. 2019

[54]

S. Yun and C. Caramanis, “Reinforcement Learning for Link Adaptation in MIMO-OFDM Wireless Systems,” in Proc. IEEE GLOBECOM, Dec. 2010

[55]

M. Bennis and D. Niyato, “A Q-learning Based Approach to Interference Avoidance in Self-Organized Femtocell Networks,” in Proc. IEEE Globecom Workshops, Dec. 2010

[56]

F. B. Mismar and B. L. Evans, “Q-Learning Algorithm for VoLTE Closed Loop Power Control in Indoor Small Cells,” in Proc. Asilomar Conf. on Signals, Systems, and Computers, Oct. 2018

[57]

S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep Reinforcement Learning for Dynamic Multichannel Access in Wireless Networks,” IEEE Trans. on Cognitive Communications and Networking, vol. 4, no. 2, pp. 257–265, Jun. 2018

[58]

Y. Wang, M. Liu, J. Yang, and G. Gui, “Data-Driven Deep Learning for Automatic Modulation Recognition in Cognitive Radios,” IEEE Trans. on Vehicular Technology, vol. 68, no. 4, pp. 4074–4077, Apr. 2019

[59]

H. S. Jang, H. Lee, and T. Q. S. Quek, “Deep learning-based power control for non-orthogonal random access,” IEEE Communications Letters, pp. 1–1, Aug. 2019

[60]

M. K. Sharma, A. Zappone, M. Debbah, and M. Assaad, “Deep Learning Based Online Power Control for Large Energy Harvesting Networks,” in Proc. IEEE ICASSP, May 2019, pp. 8429–8433

[61]

W. Lee, M. Kim, and D. Cho, “Deep power control: Transmit power control scheme based on convolutional neural network,” IEEE Communications Letters, vol. 22, no. 6, pp. 1276–1279, Jun. 2018

[62]

A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu, and D. Tujkovic, “Deep learning coordinated beamforming for highly-mobile millimeter wave systems,” IEEE Access, vol. 6, pp. 37328–37348, Jun. 2018

[63]

M. Alrabeiah and A. Alkhateeb, “Deep Learning for TDD and FDD Massive MIMO: Mapping Channels in Space and Frequency,” in Proc. Asilomar Conf. on Signals, Systems and Computers, May 2019. (Also: arXiv:1905.03761)

[64]

T. Maksymyuk, J. Gazda, O. Yaremko, and D. Nevinskiy, “Deep Learning Based Massive MIMO Beamforming for 5G Mobile Network,” in Proc. IEEE International Symposium on Wireless Systems, Sep. 2018, pp. 241–244

[65]

F. B. Mismar, J. Choi, and B. L. Evans, “A Framework for Automated Cellular Network Tuning with Reinforcement Learning,” IEEE Trans. on Communications, vol. 67, no. 10, pp. 7152–7167, Oct. 2019

[66]

P. Zhou, X. Fang, X. Wang, Y. Long, R. He, and X. Han, “Deep Learning-Based Beam Management and Interference Coordination in Dense mmWave Networks,” IEEE Trans. on Vehicular Technology, vol. 68, no. 1, pp. 592–603, Jan. 2019

[67]

W. Xia, G. Zheng, Y. Zhu, J. Zhang, J. Wang, and A. P. Petropulu, “A Deep Learning Framework for Optimization of MISO Downlink Beamforming,” Jan. 2019. (arXiv:1901.00354)

[68]

P. E. Iturria-Rivera and M. Erol-Kantarci, “QoS-Aware Load Balancing in Wireless Networks using Clipped Double Q-Learning,” in Proc. IEEE MASS, Denver, CO, USA, 2021, pp. 10–16, doi: 10.1109/MASS52906.2021.00011

