pith. the verified trust layer for science.

arxiv: 2601.12659 · v2 · submitted 2026-01-19 · 📡 eess.SP

Two-Layer Reinforcement Learning-Assisted Joint Beamforming and Trajectory Optimization for Multi-UAV Downlink Communications

Pith reviewed 2026-05-16 13:44 UTC · model grok-4.3

classification 📡 eess.SP
keywords: UAV communications · beamforming optimization · trajectory planning · graph neural networks · multi-agent reinforcement learning · downlink sum rate · 6G non-terrestrial networks

The pith

A two-layer AI system pairs graph neural networks for fast (sub-millisecond) beamforming with multi-agent reinforcement learning for trajectory planning to raise sum rates in multi-UAV downlink networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a hierarchically decoupled optimization method that splits the joint beamforming and trajectory problem into short-timescale and long-timescale layers. On the short scale a topology-aware graph neural network produces beamforming vectors in sub-millisecond time by treating UAV-user links as a time-varying heterogeneous graph. On the long scale a multi-agent proximal policy optimization solver plans UAV trajectories under a decentralized partially observable Markov decision process. Simulations show the combined approach exceeds both conventional optimization heuristics and other deep-learning baselines on total data rate, speed of convergence, and ability to generalize across different user placements.
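The heterogeneous-graph view of UAV-user links can be made concrete with a small sketch. It rests on three assumptions that do not come from the paper. There is one edge per user-UAV pair. Each edge is typed by whether that UAV serves the user, and it carries channel power as its feature. The learned GNN layers are replaced by a fixed, edge-type-specific mean aggregation. `build_hetero_graph` and `aggregate_user_features` are hypothetical names, not the authors' code.

```python
import numpy as np


def build_hetero_graph(H, assoc):
    """Build an illustrative UAV-user heterogeneous graph.

    H     : complex array (K, N, M); H[k, n] is the channel from UAV n
            (M antennas) to user k.
    assoc : int array (K,); assoc[k] is the UAV serving user k.
    Returns one edge per (user, UAV) pair, typed 'serve' when the UAV
    serves the user and 'interfere' otherwise, with channel power as
    the edge feature.
    """
    K, N, _ = H.shape
    edges = []
    for k in range(K):
        for n in range(N):
            etype = "serve" if assoc[k] == n else "interfere"
            gain = float(np.linalg.norm(H[k, n]) ** 2)
            edges.append((k, n, etype, gain))
    return edges


def aggregate_user_features(edges, K):
    """One typed mean-aggregation step. Each user node pools its serving-link
    power and the mean power of its interfering links separately, a fixed
    stand-in for learned, edge-type-specific message passing."""
    serve = np.zeros(K)
    interf_sum = np.zeros(K)
    interf_cnt = np.zeros(K)
    for k, _, etype, gain in edges:
        if etype == "serve":
            serve[k] += gain
        else:
            interf_sum[k] += gain
            interf_cnt[k] += 1
    interf_mean = np.divide(interf_sum, interf_cnt,
                            out=np.zeros(K), where=interf_cnt > 0)
    return np.stack([serve, interf_mean], axis=1)  # (K, 2) user features
```

Every user-UAV pair is an edge, so a handover in this sketch only flips an edge type. For a given K and N, the node and edge counts stay the same from slot to slot. This is one plausible reason a graph encoding suits a time-varying association.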

Core claim

The central claim is that modeling the dynamic interference topology explicitly with GraphNorm-augmented GNNs for beamforming and solving the trajectory sub-problem via centralized-training decentralized-execution multi-agent RL produces higher system sum rates than either numerical solvers or standard deep-learning baselines while meeting real-time latency constraints.
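Both layers ultimately serve one objective, the downlink sum rate. The sketch below shows how that objective could be computed under a generic multi-UAV MISO model. In this assumed model, user k is served by UAV a(k) through beamformer w_k. Its SINR is |h_{k,a(k)}^H w_k|^2 divided by the sum of |h_{k,a(j)}^H w_j|^2 over all j ≠ k, plus the noise power. The paper's exact signal model, power constraints and association rule are not reproduced here.

```python
import numpy as np


def sum_rate(H, W, assoc, noise_power):
    """Downlink sum rate in bit/s/Hz under an assumed multi-UAV MISO model.

    H     : complex (K, N, M); H[k, n] is the channel from UAV n to user k.
    W     : complex (K, M); W[j] is transmitted by UAV assoc[j] for user j.
    assoc : int (K,); serving-UAV index of each user.
    """
    K = H.shape[0]
    rate = 0.0
    for k in range(K):
        # h^H w for every stream j as received at user k (np.vdot conjugates h).
        amps = np.array([np.vdot(H[k, assoc[j]], W[j]) for j in range(K)])
        powers = np.abs(amps) ** 2
        signal = powers[k]
        interference = powers.sum() - signal
        rate += np.log2(1.0 + signal / (interference + noise_power))
    return rate
```

In the paper, the GNN maps a graph snapshot to the beamformers W that such an objective would score. A numerical baseline would instead have to search for W slot by slot.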

What carries the argument

A hierarchically decoupled framework that uses a topology-aware GNN beamformer on the short timescale and multi-agent proximal policy optimization for decentralized trajectory planning on the long timescale.
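The timescale separation can be pictured as two nested loops. The skeleton below is only a sketch of that control flow. A greedy straight-line move stands in for the learned MAPPO policy, and maximum-ratio transmission stands in for the GNN beamformer. `greedy_move`, `mrt_beamformer`, `run_two_timescale` and the `channel_fn` callback are hypothetical placeholders, not the paper's components.

```python
import numpy as np


def greedy_move(pos, dest, max_step):
    """Placeholder long-timescale policy: fly straight toward the
    destination, covering at most max_step (NOT the learned MAPPO policy)."""
    delta = dest - pos
    dist = np.linalg.norm(delta)
    if dist <= max_step:
        return dest.copy()
    return pos + delta / dist * max_step


def mrt_beamformer(h, p_max):
    """Placeholder short-timescale beamformer: maximum-ratio transmission
    at full power (NOT the paper's GNN)."""
    return np.sqrt(p_max) * h / np.linalg.norm(h)


def run_two_timescale(uav_pos, dest, n_long, n_short, max_step,
                      channel_fn, p_max=1.0):
    """Outer loop: one trajectory update per long slot.
    Inner loop: a fresh beamforming decision every short slot while the
    UAV positions are held fixed. Returns the final positions, the number
    of beamforming decisions made, and the last set of beamformers."""
    decisions = 0
    W = None
    for _ in range(n_long):
        for _ in range(n_short):
            H = channel_fn(uav_pos)  # per-user channel vectors, shape (K, M)
            W = np.array([mrt_beamformer(h, p_max) for h in H])
            decisions += 1
        uav_pos = np.array([greedy_move(p, d, max_step)
                            for p, d in zip(uav_pos, dest)])
    return uav_pos, decisions, W
```

The skeleton highlights the mismatch in update rates. The beamformer runs n_short times for every trajectory update, which is why the per-decision latency of the short-timescale layer matters most.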

If this is right

  • Beamforming decisions can be computed at sub-millisecond latency without solving non-convex programs at every slot.
  • UAVs can learn cooperative trajectory policies that improve collective sum rate without requiring a central controller at inference time.
  • The same separation of timescales can be reused for other coupled resource-allocation tasks that mix fast radio-frequency variables with slow physical movement variables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework may extend to uplink or full-duplex UAV scenarios if the GNN graph is redefined to include uplink interference edges.
  • Adding a third layer for energy or regulatory constraints could be tested by extending the Markov decision process state with battery or no-fly-zone indicators.
  • Real-world validation would require replacing the paper's perfect-CSI assumption with online channel estimation and checking whether the GNN still generalizes.

Load-bearing premise

That performance measured under idealized channel models and perfect channel-state information will remain high when real hardware imperfections and imperfect channel estimates are present.

What would settle it

A field trial on actual UAV hardware that records a drop in achieved sum rate below the simulated baseline once measured channel-state information error exceeds the level assumed in the paper's simulations.

Figures

Figures reproduced from arXiv: 2601.12659 by Essra M. Ghoura, Jing Ren, Omar Alhussein, Ruiqi Wang, Sami Muhaidat, Shizhong Xu, Yuzhi Yang.

Figure 1. System model. User $k$ is fixed at $\mathbf{l}^{U}_{k} = [x^{u}_{k}, y^{u}_{k}, 0]^{T}$. Meanwhile, the time-varying position of UAV $n$ at time slot $t$ is denoted by $\mathbf{l}^{A}_{n}[t] = [x^{a}_{n}[t], y^{a}_{n}[t], H]^{T}$. For the trajectory planning task, each UAV $n$ is required to travel from a predefined starting point $\mathbf{l}^{S}_{n} = [x^{s}_{n}, y^{s}_{n}, H]^{T}$ to a destination point $\mathbf{l}^{D}_{n} = [x^{d}_{n}, y^{d}_{n}, H]^{T}$ within the maximum mission duration $T_{\max}$, while si…
Figure 2. Overall framework of the proposed GNN-enabled beamforming and MAPPO-based UAV trajectory optimization.
Figure 3. Average sum rate on training and validation datasets.
Figure 5. Sum rate vs. noise power.
Figure 8. Average computation time per beamforming decision.
Figure 9. Performance evaluation of the proposed MAPPO-based trajectory planning algorithm: (a) Training convergence comparison, …
Figure 10. Projected 2D trajectories of three UAVs navigating from start (S) to destination (D) under varying user densities.
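The Figure 1 caption places users on the ground and has UAVs fly at a constant altitude H from a start point to a destination within T_max. The sketch below covers only that geometric side and adds two assumptions. The first is a free-space line-of-sight power gain beta0 / d^2, a common choice in the UAV literature that the caption does not confirm. The second is a per-slot speed limit v_max · slot_len, since the caption is cut off before its remaining constraints. `los_gain`, `trajectory_feasible`, `beta0`, `v_max` and `slot_len` are hypothetical.

```python
import numpy as np


def los_gain(uav_pos, user_pos, beta0=1e-3):
    """Free-space LoS power gain beta0 / d^2 (ASSUMED model); beta0 is the
    gain at a 1 m reference distance, value chosen for illustration."""
    d = np.linalg.norm(uav_pos - user_pos)
    return beta0 / d**2


def trajectory_feasible(traj, start, dest, altitude, v_max, slot_len, t_max):
    """Check a discretised trajectory traj of shape (T+1, 3) against:
    fixed altitude, start/destination points, mission duration t_max,
    and an ASSUMED per-slot speed limit v_max * slot_len."""
    n_slots = len(traj) - 1
    if n_slots * slot_len > t_max:
        return False
    if not np.allclose(traj[:, 2], altitude):
        return False
    if not (np.allclose(traj[0], start) and np.allclose(traj[-1], dest)):
        return False
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    return bool(np.all(steps <= v_max * slot_len + 1e-9))
```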
Original abstract

Unmanned aerial vehicles (UAVs) are pivotal for future 6G non-terrestrial networks, yet their high mobility creates a complex coupled optimization problem for beamforming and trajectory design. Existing numerical methods suffer from prohibitive latency, while standard deep learning often ignores dynamic interference topology, limiting scalability. To address these issues, this paper proposes a hierarchically decoupled framework synergizing graph neural networks (GNNs) with multi-agent reinforcement learning. Specifically, on the short timescale, we develop a topology-aware GNN beamformer by incorporating GraphNorm. By modeling the dynamic UAV-user association as a time-varying heterogeneous graph, this method explicitly extracts interference patterns to achieve sub-millisecond inference. On the long timescale, trajectory planning is modeled as a decentralized partially observable Markov decision process and solved via the multi-agent proximal policy optimization algorithm under the centralized training with decentralized execution paradigm, facilitating cooperative behaviors. Extensive simulation results demonstrate that the proposed framework significantly outperforms state-of-the-art optimization heuristics and deep learning baselines in terms of system sum rate, convergence speed, and generalization capability.
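The abstract credits GraphNorm for part of the beamformer's behaviour. GraphNorm (Cai et al., ICML 2021) normalises each feature across the nodes of a single graph. It uses a learnable weight alpha to control how much of the per-graph mean is subtracted, followed by a learnable scale gamma and shift beta. The NumPy forward pass below is a sketch of that normalisation alone. Where the paper places it inside its GNN layers is not described in this review, and `graph_norm` is a hypothetical name.

```python
import numpy as np


def graph_norm(h, gamma, beta, alpha, eps=1e-5):
    """GraphNorm forward pass over the nodes of ONE graph.

    h                  : (n_nodes, d) node features of a single graph
    gamma, beta, alpha : (d,) learnable scale, shift and mean-shift weights
    out = gamma * (h - alpha * mu) / sigma + beta, where mu is the per-feature
    mean over nodes and sigma^2 = mean((h - alpha * mu)^2).
    """
    mu = h.mean(axis=0)            # per-feature mean over this graph's nodes
    shifted = h - alpha * mu       # remove a learnable fraction of the mean
    sigma = np.sqrt((shifted ** 2).mean(axis=0) + eps)
    return gamma * shifted / sigma + beta
```

The statistics are computed per graph rather than per mini-batch. Each time slot's snapshot is therefore normalised on its own, which plausibly helps when user placements vary between snapshots.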

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a hierarchically decoupled two-layer framework for joint beamforming and trajectory optimization in multi-UAV downlink communications. Short-timescale beamforming is handled by a topology-aware GNN incorporating GraphNorm to model dynamic UAV-user associations as time-varying heterogeneous graphs for sub-millisecond inference. Long-timescale trajectory planning is formulated as a decentralized POMDP and solved using multi-agent proximal policy optimization (MAPPO) under centralized training with decentralized execution. Extensive simulations are reported to demonstrate significant gains over optimization heuristics and deep learning baselines in sum rate, convergence speed, and generalization.

Significance. The hierarchical GNN-MAPPO decoupling addresses a relevant scalability challenge in 6G non-terrestrial networks by separating interference-aware beamforming from cooperative trajectory planning. The explicit use of graph structure for dynamic interference topology and the CTDE paradigm for multi-agent cooperation are technically sound ideas that could enable real-time operation if the performance advantages hold. However, the idealized simulation regime limits the assessed significance for practical deployment.

major comments (2)
  1. [Simulation results section] The central claims of superior sum rate, convergence, and generalization rest on simulations under perfect instantaneous CSI and standard path-loss models with no reported error bars, no description of baseline implementation details, and no data exclusion rules. This prevents independent verification of the stated gains and is load-bearing for the outperformance claim.
  2. [Proposed framework section] The short-timescale GNN beamformer and long-timescale MAPPO planner are evaluated only under perfect CSI; no robustness curves are provided under CSI estimation error (e.g., 5-15% normalized MSE) or hardware impairments. This mismatch between assumed and actual interference topology directly affects whether the hierarchical decoupling retains its reported advantage.
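The first comment asks for error bars. The simulated rebuttal reports them as mean ± one standard deviation over 100 independent Monte Carlo runs, with no runs excluded. The sketch below shows one way to produce such numbers. It uses independent random streams from NumPy's `SeedSequence.spawn` and the sample standard deviation (ddof=1). The choice of estimator is an assumption, since the rebuttal does not specify one, and `monte_carlo` and `run_fn` are hypothetical names.

```python
import numpy as np


def monte_carlo(run_fn, n_runs=100, seed=0):
    """Call run_fn(rng) once per run with an independent random stream and
    return (mean, sample std) per output point. Every run is kept: there is
    no data-exclusion step."""
    children = np.random.SeedSequence(seed).spawn(n_runs)
    results = np.array([run_fn(np.random.default_rng(c)) for c in children],
                       dtype=float)
    return results.mean(axis=0), results.std(axis=0, ddof=1)
```

Spawning child seeds from one root seed keeps the runs statistically independent. The whole batch also remains reproducible from a single number, which makes independent verification easier.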

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the clarity and verifiability of our work. We address each major comment below and have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [Simulation results section] The central claims of superior sum rate, convergence, and generalization rest on simulations under perfect instantaneous CSI and standard path-loss models with no reported error bars, no description of baseline implementation details, and no data exclusion rules. This prevents independent verification of the stated gains and is load-bearing for the outperformance claim.

    Authors: We agree that additional details are required for independent verification. In the revised manuscript, we have added error bars (mean ± one standard deviation over 100 independent Monte Carlo runs) to all sum-rate, convergence, and generalization plots in Section IV. We have also expanded the simulation setup subsection with complete baseline implementation details, including the exact optimization algorithms, iteration limits, and neural-network hyperparameters used for the deep-learning baselines. We explicitly state that no data exclusion rules were applied; all simulation runs are retained and averaged. These changes are now included in the main text and a new appendix. revision: yes

  2. Referee: [Proposed framework section] The short-timescale GNN beamformer and long-timescale MAPPO planner are evaluated only under perfect CSI; no robustness curves are provided under CSI estimation error (e.g., 5-15% normalized MSE) or hardware impairments. This mismatch between assumed and actual interference topology directly affects whether the hierarchical decoupling retains its reported advantage.

    Authors: The original evaluations assume perfect CSI to isolate the benefits of the hierarchical GNN-MAPPO decoupling. To address robustness, we have added a new figure and accompanying analysis in the revised simulation section showing sum-rate performance under CSI estimation errors with normalized MSE ranging from 0% to 15%. The results confirm that the proposed framework retains its advantage over baselines, although the margin narrows at higher error levels. For hardware impairments, we have inserted a limitations paragraph explaining that incorporating specific models (e.g., phase noise or quantization) would require extending the channel model beyond the paper’s scope; we discuss how the framework could be adapted in future work. This constitutes a partial but substantive revision. revision: partial
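A robustness sweep over normalized MSE from 0% to 15% needs a model of imperfect CSI. The sketch below assumes the common additive model H_hat = H + E. Here E is i.i.d. circularly symmetric complex Gaussian, scaled so that the expected NMSE, E||H_hat − H||² / ||H||², equals the target. The paper's actual error model is not stated in this review, and `noisy_csi` is a hypothetical helper.

```python
import numpy as np


def noisy_csi(H, nmse, rng):
    """Return an imperfect channel estimate H_hat = H + E whose expected
    normalized MSE E||H_hat - H||^2 / ||H||^2 equals `nmse`.
    E is i.i.d. circularly symmetric complex Gaussian (ASSUMED error model)."""
    per_entry_var = nmse * np.mean(np.abs(H) ** 2)
    noise = (rng.standard_normal(H.shape)
             + 1j * rng.standard_normal(H.shape))  # unit variance per entry / 2 per part
    return H + np.sqrt(per_entry_var / 2) * noise
```

In such a sweep, the beamformer would be fed H_hat while the sum rate is evaluated on the true H. The gap between the two measures how much the learned interference topology depends on accurate CSI.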

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents a hierarchical framework with GNN-based short-timescale beamforming and MAPPO-based long-timescale trajectory planning. Claims of outperformance rest on simulation comparisons to external baselines under idealized CSI and channel models. No equations reduce outputs to inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing steps rely on self-citations that collapse the central result. The derivation remains self-contained against the stated simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard domain assumptions from wireless communications and reinforcement learning; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Dynamic UAV-user associations can be represented as time-varying heterogeneous graphs whose interference patterns are extractable by GNNs.
    Invoked for the short-timescale beamformer.
  • domain assumption Trajectory planning constitutes a decentralized partially observable Markov decision process solvable by MAPPO under CTDE.
    Invoked for the long-timescale planner.
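The second axiom relies on MAPPO under centralized training with decentralized execution. As described by Yu et al. (NeurIPS 2022), MAPPO trains each agent's actor, which sees only local observations, with PPO's clipped surrogate objective. A centralized critic with access to global state supplies the advantages during training. The sketch below shows only the clipped objective, with advantages passed in directly. `ppo_clip_objective` is a hypothetical name, and clip_eps = 0.2 is an illustrative default.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate, averaged over samples (and agents, if stacked).

    logp_new / logp_old : log-probabilities of the taken actions under the
                          current and behaviour policies (local observations)
    advantages          : estimates that, in MAPPO, come from a centralized
                          critic during training; here they are just inputs
    Returns the objective to be MAXIMISED.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))
```

Taking the minimum of the clipped and unclipped terms caps the gain from moving the policy too far in a single update. The centralized critic is needed only during training, which is why the UAVs can act without a central controller at inference time.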

pith-pipeline@v0.9.0 · 5514 in / 1178 out tokens · 39515 ms · 2026-05-16T13:44:47.428381+00:00 · methodology

