arxiv: 2601.12659 · v2 · submitted 2026-01-19 · 📡 eess.SP

Two-Layer Reinforcement Learning-Assisted Joint Beamforming and Trajectory Optimization for Multi-UAV Downlink Communications

Ruiqi Wang, Essra M. Ghoura, Omar Alhussein, Yuzhi Yang, Jing Ren, Shizhong Xu, Sami Muhaidat

Pith reviewed 2026-05-16 13:44 UTC · model grok-4.3

classification 📡 eess.SP

keywords UAV communications · beamforming optimization · trajectory planning · graph neural networks · multi-agent reinforcement learning · downlink sum rate · 6G non-terrestrial networks


The pith

A two-layer AI system pairs graph neural networks for instant beamforming with multi-agent reinforcement learning for trajectory planning to raise sum rates in multi-UAV downlink networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a hierarchically decoupled optimization method that splits the joint beamforming and trajectory problem into short-timescale and long-timescale layers. On the short scale a topology-aware graph neural network produces beamforming vectors in sub-millisecond time by treating UAV-user links as a time-varying heterogeneous graph. On the long scale a multi-agent proximal policy optimization solver plans UAV trajectories under a decentralized partially observable Markov decision process. Simulations show the combined approach exceeds both conventional optimization heuristics and other deep-learning baselines on total data rate, speed of convergence, and ability to generalize across different user placements.
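The "total data rate" metric can be made concrete with a minimal sketch. It assumes the standard linear-beamforming downlink model, where user k's SINR is its own beam power divided by the other beams' leakage plus noise. All transmit antennas are collapsed into a single M-dimensional array, which is a simplification; the paper's own signal model is not reproduced on this page, and the name `sum_rate` and its arguments are illustrative.

```python
import numpy as np

def sum_rate(H, W, noise_power):
    """Downlink sum rate in bit/s/Hz under an assumed textbook SINR model.

    H: (K, M) complex array, row k is the channel h_k to user k.
    W: (K, M) complex array, row k is the beamforming vector w_k for user k.
    noise_power: receiver noise variance (linear scale).
    """
    # G[k, j] = |h_k^H w_j|^2, the power user k receives from beam j.
    G = np.abs(H.conj() @ W.T) ** 2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    sinr = signal / (interference + noise_power)
    return float(np.sum(np.log2(1.0 + sinr)))

# Two orthogonal unit channels with matched beams and unit noise power.
H = np.eye(2, dtype=complex)
W = np.eye(2, dtype=complex)
print(sum_rate(H, W, noise_power=1.0))  # 2.0
```

In the example each user sees an SINR of 1 and no interference, so each contributes exactly 1 bit/s/Hz.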

Core claim

The central claim is that modeling the dynamic interference topology explicitly with GraphNorm-augmented GNNs for beamforming and solving the trajectory sub-problem via centralized-training decentralized-execution multi-agent RL produces higher system sum rates than either numerical solvers or standard deep-learning baselines while meeting real-time latency constraints.

What carries the argument

A hierarchically decoupled framework that uses a topology-aware GNN beamformer on the short timescale and multi-agent proximal policy optimization for decentralized trajectory planning on the long timescale.
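For readers unfamiliar with GraphNorm, the normalization step can be sketched as follows. This follows the published GraphNorm definition (Cai et al., ICML 2021): subtract a learnable fraction `alpha` of the per-graph feature mean, divide by the standard deviation of the shifted features, then apply an affine transform. It is a NumPy illustration for a single graph, not the paper's beamformer; the function name and the `eps` default are assumptions.

```python
import numpy as np

def graph_norm(x, alpha, gamma, beta, eps=1e-5):
    """GraphNorm over the nodes of one graph.

    x: (num_nodes, num_features) node features.
    alpha, gamma, beta: (num_features,) learnable parameters.
    """
    mean = x.mean(axis=0, keepdims=True)
    # Learnable mean shift: alpha = 1 removes the mean fully, alpha = 0 keeps it.
    shifted = x - alpha * mean
    # Scale by the root mean square of the shifted features.
    std = np.sqrt((shifted ** 2).mean(axis=0, keepdims=True) + eps)
    return gamma * shifted / std + beta

x = np.array([[1.0, 2.0], [3.0, 6.0]])
ones, zeros = np.ones(2), np.zeros(2)
print(graph_norm(x, alpha=ones, gamma=ones, beta=zeros))
```

With `alpha` set to one this reduces to a per-graph standardization of each feature; letting the network learn `alpha` is what allows it to retain mean information when that information is useful.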

If this is right

Beamforming decisions can be computed at sub-millisecond latency without solving non-convex programs at every slot.
UAVs can learn cooperative trajectory policies that improve collective sum rate without requiring a central controller at inference time.
The same separation of timescales can be reused for other coupled resource-allocation tasks that mix fast radio-frequency variables with slow physical movement variables.
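The timescale separation in these points can be pictured as a nested loop: one slow movement decision, then many fast radio decisions at the resulting position. The sketch below is a toy single-UAV version under loud assumptions. A straight-line waypoint rule stands in for the learned trajectory policy, and an interference-free line-of-sight rate stands in for the learned beamformer; `run_two_timescale`, `altitude`, and `snr_ref` are hypothetical names and values, not taken from the paper.

```python
import numpy as np

def run_two_timescale(start, dest, users, num_steps, slots_per_step,
                      altitude=100.0, snr_ref=1e6):
    """Toy loop: move once per long step, serve users every short slot."""
    pos = np.asarray(start, dtype=float)
    dest = np.asarray(dest, dtype=float)
    users = np.asarray(users, dtype=float)
    total_rate = 0.0
    for step in range(num_steps):
        # Long timescale: stand-in for the trajectory policy.
        # The UAV covers an equal share of the remaining distance.
        pos = pos + (dest - pos) / (num_steps - step)
        for _ in range(slots_per_step):
            # Short timescale: stand-in for the beamformer.
            # Each user gets an interference-free LoS link at this position.
            d2 = np.sum((users - pos) ** 2, axis=1) + altitude ** 2
            total_rate += np.sum(np.log2(1.0 + snr_ref / d2))
    return pos, total_rate / (num_steps * slots_per_step)

final_pos, avg_rate = run_two_timescale(
    start=[0.0, 0.0], dest=[100.0, 0.0], users=[[50.0, 0.0]],
    num_steps=10, slots_per_step=5)
print(final_pos, avg_rate)
```

Swapping either stand-in for a learned component leaves the loop structure unchanged, which is the property the reuse argument relies on.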

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework may extend to uplink or full-duplex UAV scenarios if the GNN graph is redefined to include uplink interference edges.
Adding a third layer for energy or regulatory constraints could be tested by extending the Markov decision process state with battery or no-fly-zone indicators.
Real-world validation would require replacing the paper's perfect-CSI assumption with online channel estimation and checking whether the GNN still generalizes.

Load-bearing premise

That performance measured under idealized channel models and perfect channel-state information will remain high when real hardware imperfections and imperfect channel estimates are present.

What would settle it

A field trial on actual UAV hardware that records a drop in achieved sum rate below the simulated baseline once measured channel-state information error exceeds the level assumed in the paper's simulations.

Figures

Figures reproduced from arXiv: 2601.12659 by Essra M. Ghoura, Jing Ren, Omar Alhussein, Ruiqi Wang, Sami Muhaidat, Shizhong Xu, Yuzhi Yang.

**Figure 1.** System model. User $k$ is fixed at $l^{U}_{k} = [x^{u}_{k}, y^{u}_{k}, 0]^{T}$. Meanwhile, the time-varying position of UAV $n$ at time slot $t$ is denoted by $l^{A}_{n}[t] = [x^{a}_{n}[t], y^{a}_{n}[t], H]^{T}$. For the trajectory planning task, each UAV $n$ is required to travel from a predefined starting point $l^{S}_{n} = [x^{s}_{n}, y^{s}_{n}, H]^{T}$ to a destination point $l^{D}_{n} = [x^{d}_{n}, y^{d}_{n}, H]^{T}$ within the maximum mission duration $T_{\max}$, while si…

**Figure 2.** Overall framework of the proposed GNN-enabled beamforming and MAPPO-based UAV trajectory optimization.

**Figure 3.** Average sum rate on training and validation datasets.

**Figure 5.** Sum rate vs. noise power.

**Figure 8.** Average computation time per beamforming decision.

**Figure 9.** Performance evaluation of the proposed MAPPO-based trajectory planning algorithm: (a) Training convergence comparison,

**Figure 10.** Projected 2D trajectories of three UAVs navigating from start (S) to destination (D) under varying user densities.

Original abstract

Unmanned aerial vehicles (UAVs) are pivotal for future 6G non-terrestrial networks, yet their high mobility creates a complex coupled optimization problem for beamforming and trajectory design. Existing numerical methods suffer from prohibitive latency, while standard deep learning often ignores dynamic interference topology, limiting scalability. To address these issues, this paper proposes a hierarchically decoupled framework synergizing graph neural networks (GNNs) with multi-agent reinforcement learning. Specifically, on the short timescale, we develop a topology-aware GNN beamformer by incorporating GraphNorm. By modeling the dynamic UAV-user association as a time-varying heterogeneous graph, this method explicitly extracts interference patterns to achieve sub-millisecond inference. On the long timescale, trajectory planning is modeled as a decentralized partially observable Markov decision process and solved via the multi-agent proximal policy optimization algorithm under the centralized training with decentralized execution paradigm, facilitating cooperative behaviors. Extensive simulation results demonstrate that the proposed framework significantly outperforms state-of-the-art optimization heuristics and deep learning baselines in terms of system sum rate, convergence speed, and generalization capability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper's main contribution is a practical hierarchical split using a GraphNorm GNN for fast beamforming and MAPPO for UAV trajectories, which beats some baselines in ideal simulations, but the gains rest on perfect CSI with no robustness checks.


The core idea here is a two-layer setup: a short-timescale GNN beamformer that treats the UAV-user links as a time-varying graph and uses GraphNorm to pull out interference patterns, paired with a long-timescale MAPPO planner that runs under centralized training with decentralized execution. This decoupling targets the latency problem in multi-UAV downlink, where full joint optimization is too slow. The approach is new enough in this specific combination for the multi-UAV case, and the graph modeling step is a reasonable way to make the beamformer topology-aware rather than treating channels as a flat vector.

Simulations reportedly show higher sum rates, quicker convergence, and better generalization than standard heuristics and other deep learning methods, which is the main evidence offered. That part is useful for anyone already working on RL or GNN applications in wireless systems.

The evaluation stays within perfect instantaneous CSI and standard path-loss models, with no reported tests under CSI estimation error, phase noise, or hardware limits. Without those, it's hard to know whether the reported advantages survive the mismatch between assumed and actual interference that occurs in real deployments. The abstract also skips details on exact baselines, number of runs, or how generalization was quantified, so the strength of the performance claims is difficult to judge from what's given.

This is for researchers focused on UAV-assisted 6G networks or on applying graph and multi-agent methods to communications optimization. A reader in that niche could take the framework and adapt it, but would probably need to run their own robustness experiments first. I would send it to peer review because the methods are grounded and the hierarchical idea addresses a real bottleneck, even if the simulation gaps need referee attention.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a hierarchically decoupled two-layer framework for joint beamforming and trajectory optimization in multi-UAV downlink communications. Short-timescale beamforming is handled by a topology-aware GNN incorporating GraphNorm to model dynamic UAV-user associations as time-varying heterogeneous graphs for sub-millisecond inference. Long-timescale trajectory planning is formulated as a decentralized POMDP and solved using multi-agent proximal policy optimization (MAPPO) under centralized training with decentralized execution. Extensive simulations are reported to demonstrate significant gains over optimization heuristics and deep learning baselines in sum rate, convergence speed, and generalization.

Significance. The hierarchical GNN-MAPPO decoupling addresses a relevant scalability challenge in 6G non-terrestrial networks by separating interference-aware beamforming from cooperative trajectory planning. The explicit use of graph structure for dynamic interference topology and the CTDE paradigm for multi-agent cooperation are technically sound ideas that could enable real-time operation if the performance advantages hold. However, the idealized simulation regime limits the assessed significance for practical deployment.

major comments (2)

[Simulation results section] The central claims of superior sum rate, convergence, and generalization rest on simulations under perfect instantaneous CSI and standard path-loss models with no reported error bars, no description of baseline implementation details, and no data exclusion rules. This prevents independent verification of the stated gains and is load-bearing for the outperformance claim.
[Proposed framework section] The short-timescale GNN beamformer and long-timescale MAPPO planner are evaluated only under perfect CSI; no robustness curves are provided under CSI estimation error (e.g., 5-15% normalized MSE) or hardware impairments. This mismatch between assumed and actual interference topology directly affects whether the hierarchical decoupling retains its reported advantage.
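The robustness check requested in the second comment can be set up with a small helper that perturbs a channel matrix to a target normalized MSE. This is a sketch under stated assumptions: NMSE is taken to mean E[||H_hat - H||^2] / ||H||^2, and the estimation error is modeled as i.i.d. circularly symmetric complex Gaussian noise. The report does not say which definition or error model is intended, and `add_csi_error` is a hypothetical name.

```python
import numpy as np

def add_csi_error(H, nmse, rng):
    """Return a noisy estimate of H whose expected normalized MSE is `nmse`.

    Assumed definition: NMSE = E[||H_hat - H||_F^2] / ||H||_F^2.
    """
    H = np.asarray(H, dtype=complex)
    if nmse == 0:
        return H.copy()
    # Per-entry error variance so the total expected error energy
    # equals nmse * ||H||_F^2.
    var = nmse * np.linalg.norm(H) ** 2 / H.size
    noise = rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)
    # Each complex noise entry has variance 2, hence the var / 2 scaling.
    return H + np.sqrt(var / 2.0) * noise

rng = np.random.default_rng(0)
H = rng.standard_normal((200, 200)) + 1j * rng.standard_normal((200, 200))
H_hat = add_csi_error(H, nmse=0.10, rng=rng)
measured = np.linalg.norm(H_hat - H) ** 2 / np.linalg.norm(H) ** 2
print(round(measured, 3))
```

A robustness curve would then feed `H_hat` to the beamformer while scoring the resulting beams against the true `H`, sweeping `nmse` over the range the referee names.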

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the clarity and verifiability of our work. We address each major comment below and have revised the manuscript accordingly.


Referee: [Simulation results section] The central claims of superior sum rate, convergence, and generalization rest on simulations under perfect instantaneous CSI and standard path-loss models with no reported error bars, no description of baseline implementation details, and no data exclusion rules. This prevents independent verification of the stated gains and is load-bearing for the outperformance claim.

Authors: We agree that additional details are required for independent verification. In the revised manuscript, we have added error bars (mean ± one standard deviation over 100 independent Monte Carlo runs) to all sum-rate, convergence, and generalization plots in Section IV. We have also expanded the simulation setup subsection with complete baseline implementation details, including the exact optimization algorithms, iteration limits, and neural-network hyperparameters used for the deep-learning baselines. We explicitly state that no data exclusion rules were applied; all simulation runs are retained and averaged. These changes are now included in the main text and a new appendix.

revision: yes

Referee: [Proposed framework section] The short-timescale GNN beamformer and long-timescale MAPPO planner are evaluated only under perfect CSI; no robustness curves are provided under CSI estimation error (e.g., 5-15% normalized MSE) or hardware impairments. This mismatch between assumed and actual interference topology directly affects whether the hierarchical decoupling retains its reported advantage.

Authors: The original evaluations assume perfect CSI to isolate the benefits of the hierarchical GNN-MAPPO decoupling. To address robustness, we have added a new figure and accompanying analysis in the revised simulation section showing sum-rate performance under CSI estimation errors with normalized MSE ranging from 0% to 15%. The results confirm that the proposed framework retains its advantage over baselines, although the margin narrows at higher error levels. For hardware impairments, we have inserted a limitations paragraph explaining that incorporating specific models (e.g., phase noise or quantization) would require extending the channel model beyond the paper’s scope; we discuss how the framework could be adapted in future work. This constitutes a partial but substantive revision.

revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents a hierarchical framework with GNN-based short-timescale beamforming and MAPPO-based long-timescale trajectory planning. Claims of outperformance rest on simulation comparisons to external baselines under idealized CSI and channel models. No equations reduce outputs to inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing steps rely on self-citations that collapse the central result. The derivation remains self-contained against the stated simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard domain assumptions from wireless communications and reinforcement learning; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption: Dynamic UAV-user associations can be represented as time-varying heterogeneous graphs whose interference patterns are extractable by GNNs.
Invoked for the short-timescale beamformer.
domain assumption: Trajectory planning constitutes a decentralized partially observable Markov decision process solvable by MAPPO under CTDE.
Invoked for the long-timescale planner.
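The second assumption leans on the PPO update that MAPPO applies per agent, so a minimal sketch of the clipped surrogate objective may help. It shows the standard PPO form only. In a CTDE setup the advantages would typically come from a centralized critic that sees global state, while each actor's log-probabilities depend on local observations alone; the function name and the `clip_eps` default are illustrative, and the paper's hyperparameters are not given on this page.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate objective (to be maximized), averaged over samples."""
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = np.exp(np.asarray(logp_new, dtype=float)
                   - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # The elementwise minimum removes any incentive to push the ratio
    # outside the clipping range.
    return float(np.mean(np.minimum(unclipped, clipped)))

# Ratios of 1.5 and 0.5 with advantages +1 and -1.
value = ppo_clip_objective(np.log([1.5, 0.5]), [0.0, 0.0], [1.0, -1.0])
print(round(value, 6))  # 0.2
```

In the example the first sample is capped at 1.2 and the second is held at -0.8, so the mean is 0.2 rather than the unclipped 0.5.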


