Queue-Aware Graph Reinforcement Learning for UAV-ISAC-Assisted Maritime Data Collection
Pith reviewed 2026-07-02 00:48 UTC · model grok-4.3
The pith
In simulation, a graph-MARL policy for UAVs in maritime ISAC raises cumulative queue-weighted data-collection utility by about 106 percent over a rate-driven deterministic decoder.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper reports that a heterogeneous graph encoder producing candidate-edge logits, combined with masked sequential b-matching that enforces UAV-load and buoy-cluster constraints, yields a policy whose cumulative queue-weighted collection utility exceeds that of a rate-driven deterministic decoder by about 106 percent in simulated congested maritime scenarios. The advantage persists across sea-state sweeps, medium-to-heavy traffic loads, and larger networks.
What carries the argument
The structured feasible-association graph-MARL framework, in which a heterogeneous graph encoder generates logits for legal UAV-buoy edges and a masked sequential b-matching sampler produces constraint-satisfying associations inside a MAPPO training loop.
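The masked sequential b-matching step can be illustrated with a short NumPy sketch. It is not the authors' implementation: it assumes a dense UAV-by-buoy logit matrix, a per-UAV load cap, at most one UAV per buoy, and a per-cluster cap on associated buoys (a hypothetical reading of the buoy-cluster constraint). The function name `sample_b_matching` and the explicit stop logit are illustrative choices.

```python
import numpy as np


def sample_b_matching(logits, uav_cap, buoy_cluster, cluster_cap, rng, stop_logit=0.0):
    """Sequentially sample UAV-buoy edges, masking every edge that would break a constraint.

    Hypothetical constraint model (an assumption, not the paper's exact definition):
      - UAV u serves at most uav_cap[u] buoys (load constraint),
      - each buoy is associated with at most one UAV,
      - cluster c contributes at most cluster_cap[c] associated buoys.
    Returns the sampled edge list and its total log-probability (used in PPO ratios).
    """
    n_uav, n_buoy = logits.shape
    load = np.zeros(n_uav, dtype=int)
    cluster_load = np.zeros(len(cluster_cap), dtype=int)
    buoy_taken = np.zeros(n_buoy, dtype=bool)
    edges, log_prob = [], 0.0
    while True:
        # Feasibility mask: UAV has spare capacity, buoy is free, buoy's cluster is below its cap.
        uav_ok = load < uav_cap
        buoy_ok = ~buoy_taken & (cluster_load[buoy_cluster] < cluster_cap[buoy_cluster])
        mask = uav_ok[:, None] & buoy_ok[None, :]
        if not mask.any():
            break  # no legal edge remains
        # Masked edge scores plus one "stop" option, normalized with a stable softmax.
        scores = np.append(np.where(mask, logits, -np.inf).ravel(), stop_logit)
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        k = rng.choice(scores.size, p=probs)
        log_prob += np.log(probs[k])
        if k == scores.size - 1:
            break  # the policy chose to stop adding edges
        u, b = divmod(int(k), n_buoy)  # row-major index -> (UAV, buoy)
        edges.append((u, b))
        load[u] += 1
        buoy_taken[b] = True
        cluster_load[buoy_cluster[b]] += 1
    return edges, log_prob
```

Because infeasible edges receive zero probability before every draw, each sampled association satisfies the assumed constraints by construction, and the accumulated log-probability remains well defined for policy-gradient training.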
If this is right
- The policy maintains its advantage across sea-state sweeps and medium-to-heavy buoy traffic loads.
- The same trained network transfers to larger networks without additional fine-tuning.
- Long-horizon queue weighting produces better backlog management than instantaneous rate objectives under energy and safety limits.
- The feasible-association sampling step removes the need to replace learning with an external deterministic optimizer.
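The contrast between queue weighting and instantaneous-rate selection can be shown with a toy single-slot example. This sketch assumes a standard backlog recursion and a max-weight rule in the sense of Tassiulas and Ephremides [41], scoring each buoy by backlog times rate. The function names and numbers are hypothetical, and the paper's learned policy optimizes a long-horizon objective rather than this one-step rule.

```python
import numpy as np


def update_backlog(q, served, arrivals):
    """Per-buoy backlog recursion: Q(t+1) = max(Q(t) - served(t), 0) + arrivals(t)."""
    return np.maximum(q - served, 0.0) + arrivals


def serve(q, rate, idx):
    """Data drained when only buoy idx is served for one slot: min(rate, backlog)."""
    served = np.zeros_like(q)
    served[idx] = min(rate[idx], q[idx])
    return served


def pick_buoy(q, rate, queue_aware=True):
    """Max-weight choice (backlog x rate) if queue_aware, else a pure max-rate choice."""
    score = q * rate if queue_aware else rate
    return int(np.argmax(score))


def queue_weighted_utility(q, served):
    """Assumed one-step queue-weighted utility: sum_i Q_i * served_i."""
    return float(np.sum(q * served))
```

With backlogs q = [10, 1] and rates [2, 5], the max-rate rule serves the nearly empty second buoy while the max-weight rule serves the congested first one, which is the qualitative behaviour a queue-weighted objective is meant to encourage.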
Where Pith is reading between the lines
- The same graph-MARL structure could be applied to other queue-based data-collection tasks, such as disaster-zone sensor data gathering, where associations must remain feasible.
- If the sea-state models are updated with real-time measurements, the policy might adapt online without full retraining.
- Extending the critic to include explicit communication latency between UAVs and the HAP could further tighten the gap between simulation and deployment.
Load-bearing premise
The simulated sea-patch field, patch-aware buoy dynamics, RCS- and clutter-aware sensing, fused posterior bounds, and propulsion-energy models accurately represent physical maritime conditions.
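For reference, a widely used propulsion-energy model for rotary-wing UAVs is the one of Zeng, Xu, and Zhang [11]. Whether the paper adopts exactly this form is an assumption; the default parameter values below are the illustrative ones reported in [11], not values taken from this paper.

```python
import math


def propulsion_power(v, p0=79.86, p_i=88.63, u_tip=120.0, v0=4.03,
                     d0=0.6, rho=1.225, s=0.05, area=0.503):
    """Rotary-wing propulsion power (W) at level-flight speed v (m/s).

    Form assumed from Zeng-Xu-Zhang [11]; defaults are that paper's example values.
    """
    blade = p0 * (1.0 + 3.0 * v**2 / u_tip**2)  # blade profile power
    inner = math.sqrt(1.0 + v**4 / (4.0 * v0**4)) - v**2 / (2.0 * v0**2)
    induced = p_i * math.sqrt(max(inner, 0.0))  # induced power (guard against round-off)
    parasite = 0.5 * d0 * rho * s * area * v**3  # fuselage drag (parasite) power
    return blade + induced + parasite


def segment_energy(v, duration_s):
    """Propulsion energy (J) for flying at constant speed v for duration_s seconds."""
    return propulsion_power(v) * duration_s
```

At v = 0 the model reduces to the hover power p0 + p_i, and at moderate speeds it dips below hover power, which is why energy-limited trajectories in such models tend to avoid prolonged hovering.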
What would settle it
A side-by-side field trial that measures actual cumulative queue-weighted utility when the learned policy flies real rotary-wing UAVs over instrumented drifting buoys versus the rate-driven decoder under comparable sea states and traffic.
Original abstract
This paper studies high-altitude platform (HAP)-assisted sparse cooperative integrated sensing and communication (ISAC) for UAV-enabled ocean monitoring. A fleet of rotary-wing UAVs senses drifting buoys, collects their monitoring data, and reports local posterior estimates to a HAP that performs fusion and sparse cooperation control. The model explicitly accounts for a spatially correlated sea-patch field, patch-aware buoy dynamics, RCS- and clutter-aware echo sensing, fused posterior Cramér-Rao bounds (PCRBs), and propulsion-energy-limited UAV mobility. The long-horizon objective is cast as a queue-weighted buffered-collection Markov decision process rather than instantaneous throughput, where each buoy maintains a backlog of buffered observations. The resulting long-horizon design is formulated as a mixed discrete-continuous problem with sensing, communication, mobility, safety, buffered-collection, and onboard-energy constraints. To address the combinatorial association component without replacing learning by a deterministic optimizer, we propose a structured feasible-association graph-MARL framework. A heterogeneous graph encoder produces candidate-edge logits, and a masked sequential b-matching policy samples legal UAV-buoy associations while exactly satisfying UAV-load and buoy-cluster constraints. A MAPPO-style training procedure, an independent queue-state value critic, and a consistency-verification protocol are then specified to support reproducible training. Simulation results on congested maritime scenarios show that the proposed policy improves the cumulative queue-weighted collection utility by about 106% over the rate-driven deterministic decoder, maintains a large margin across sea-state sweeps and medium-to-heavy traffic loads, and transfers to larger networks without fine-tuning.
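The abstract does not state the HAP's fusion rule, so the following is only a hedged sketch of one standard option, information-form fusion of local posteriors. It assumes the UAVs' measurements are conditionally independent given the buoy state and that all UAVs share a common prior, so each local posterior information already contains the prior and the prior must be counted once. The function `fused_pcrb` is hypothetical.

```python
import numpy as np


def fused_pcrb(j_prior, j_local_list):
    """Fuse per-UAV posterior information at the HAP; return the PCRB matrix and its trace.

    Assumes conditionally independent UAV measurements and a common prior, so each
    local posterior information is J_k = J_prior + J_meas,k and
        J_fused = sum_k J_k - (K - 1) * J_prior.
    The trace of inv(J_fused) lower-bounds the total mean-squared estimation error.
    """
    k = len(j_local_list)
    if k == 0:
        raise ValueError("need at least one local posterior")
    j_fused = sum(j_local_list) - (k - 1) * j_prior
    pcrb = np.linalg.inv(j_fused)
    return pcrb, float(np.trace(pcrb))
```

Under this model, adding a UAV whose local posterior carries new measurement information can only shrink the fused bound, which is the mechanism a sparse cooperation controller would trade off against communication cost.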
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a heterogeneous graph-MARL framework (MAPPO with masked sequential b-matching) for a fleet of rotary-wing UAVs performing ISAC-based maritime data collection under a HAP. The problem is formulated as a queue-weighted buffered-collection MDP that incorporates spatially correlated sea-patch fields, patch-aware buoy dynamics, RCS/clutter-aware sensing, fused PCRBs, and propulsion-energy limits. The central empirical claim is that the learned policy yields approximately 106% higher cumulative queue-weighted collection utility than a rate-driven deterministic decoder baseline, with robustness across sea-state and traffic-load sweeps plus zero-shot transfer to larger networks.
Significance. If the reported simulation margins hold under the stated models, the work provides a concrete, constraint-exact approach to long-horizon combinatorial association in maritime UAV networks that avoids replacing learning with an external optimizer. The consistency-verification protocol and independent queue-state critic are positive contributions toward reproducible MARL training in constrained multi-agent settings.
major comments (3)
- [Abstract / Simulation Results] Abstract and Simulation Results section: the 106% cumulative utility improvement is reported without error bars, training-run standard deviations, or the number of independent seeds; this leaves the statistical reliability of the central performance claim unquantified.
- [Policy formulation / Training procedure] Policy and Training sections: no numerical verification (e.g., fraction of trajectories or episodes) is supplied showing that the masked sequential b-matching policy satisfies UAV-load and buoy-cluster constraints on every sampled trajectory, which is load-bearing for the feasibility guarantee.
- [Simulation Results] Experimental evaluation: no ablation is presented on the queue-weighting parameters (explicitly listed among the free parameters) or on the contribution of the queue-aware critic versus a standard value function, making it impossible to isolate the source of the reported margin over the rate-driven baseline.
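The first major comment asks for seed-level statistics. A minimal sketch of how the headline number and its spread could be reported, assuming per-seed cumulative utilities are available for the learned policy and the rate-driven decoder on matched seeds; the function name is hypothetical and no values from the paper are used. An improvement of about 106% means the learned policy's utility is roughly 2.06 times the baseline's.

```python
import statistics


def improvement_summary(proposed, baseline):
    """Per-seed relative improvement (%) of proposed over baseline: (mean, sample std).

    proposed, baseline: cumulative queue-weighted utilities from matched seeds.
    """
    if len(proposed) != len(baseline) or len(proposed) < 2:
        raise ValueError("need at least two matched seeds of equal length")
    gains = [100.0 * (p - b) / b for p, b in zip(proposed, baseline)]
    return statistics.mean(gains), statistics.stdev(gains)
```

Computing the gain per seed before averaging keeps the comparison paired, so seed-to-seed variation in scenario difficulty does not inflate the reported spread.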
minor comments (1)
- [System Model] Notation for the heterogeneous graph encoder and the PCRB fusion step could be clarified with an explicit diagram or pseudocode to aid readers unfamiliar with the maritime ISAC setting.
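As an illustration of what such pseudocode might look like, here is a hypothetical NumPy sketch of a heterogeneous graph encoder that maps UAV and buoy node features to candidate-edge logits. The single message-passing round, mean aggregation over candidate edges, bilinear edge scorer, and all weight names are assumptions; the paper's encoder may differ in depth, attention, and feature set.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def edge_logits(x_uav, x_buoy, cand, params):
    """One round of heterogeneous message passing followed by bilinear edge scoring.

    x_uav: (U, Fu) UAV features; x_buoy: (B, Fb) buoy features.
    cand: (U, B) boolean mask of candidate UAV-buoy edges.
    params: dict of weight matrices (hypothetical names and shapes).
    Returns (U, B) logits, with -inf on non-candidate edges.
    """
    h_u = relu(x_uav @ params["W_u"])   # type-specific projection of UAV nodes -> (U, H)
    h_b = relu(x_buoy @ params["W_b"])  # type-specific projection of buoy nodes -> (B, H)
    a = cand.astype(float)
    deg_u = np.maximum(a.sum(axis=1, keepdims=True), 1.0)    # (U, 1)
    deg_b = np.maximum(a.sum(axis=0, keepdims=True).T, 1.0)  # (B, 1)
    # Each node aggregates the mean embedding of its candidate neighbours of the other type.
    m_u = (a @ h_b) / deg_u    # (U, H)
    m_b = (a.T @ h_u) / deg_b  # (B, H)
    z_u = relu(h_u @ params["S_u"] + m_u @ params["M_u"])
    z_b = relu(h_b @ params["S_b"] + m_b @ params["M_b"])
    logits = z_u @ params["E"] @ z_b.T  # bilinear edge score -> (U, B)
    return np.where(cand, logits, -np.inf)
```

Keeping separate projections per node type is what makes the encoder heterogeneous; the -inf entries let any downstream masked sampler ignore non-candidate edges without special handling.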
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below and indicate planned revisions.
-
Referee: [Abstract / Simulation Results] Abstract and Simulation Results section: the 106% cumulative utility improvement is reported without error bars, training-run standard deviations, or the number of independent seeds; this leaves the statistical reliability of the central performance claim unquantified.
Authors: We agree that statistical reliability should be quantified. In the revised manuscript we will report the number of independent seeds (5), the mean and standard deviation of cumulative utility across runs, and add error bars to the relevant figures. revision: yes
-
Referee: [Policy formulation / Training procedure] Policy and Training sections: no numerical verification (e.g., fraction of trajectories or episodes) is supplied showing that the masked sequential b-matching policy satisfies UAV-load and buoy-cluster constraints on every sampled trajectory, which is load-bearing for the feasibility guarantee.
Authors: The masked sequential b-matching policy enforces the constraints exactly by construction through its masking mechanism. To supply the requested numerical evidence, we will add to the revised Training section results from the consistency-verification protocol reporting the fraction of trajectories that satisfy the constraints (expected to be 100%). revision: yes
-
Referee: [Simulation Results] Experimental evaluation: no ablation is presented on the queue-weighting parameters (explicitly listed among the free parameters) or on the contribution of the queue-aware critic versus a standard value function, making it impossible to isolate the source of the reported margin over the rate-driven baseline.
Authors: We acknowledge the value of such ablations. In the revised Simulation Results section we will add an ablation study on the queue-weighting parameters and a direct comparison of the queue-aware critic against a standard value function. revision: yes
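The verification promised in the second response can be as simple as re-checking every sampled association against the constraints and reporting the fraction of trajectories that pass. A hedged sketch, assuming each trajectory is a list of per-step edge lists and a hypothetical constraint model (per-UAV load cap, at most one UAV per buoy, per-cluster cap); the function names are illustrative.

```python
from collections import Counter


def step_feasible(edges, uav_cap, buoy_cluster, cluster_cap):
    """Check one step's UAV-buoy association against the assumed constraints."""
    uav_load = Counter(u for u, _ in edges)
    buoy_use = Counter(b for _, b in edges)
    cluster_use = Counter(buoy_cluster[b] for _, b in edges)
    return (all(n <= uav_cap[u] for u, n in uav_load.items())
            and all(n <= 1 for n in buoy_use.values())
            and all(n <= cluster_cap[c] for c, n in cluster_use.items()))


def feasible_fraction(trajectories, uav_cap, buoy_cluster, cluster_cap):
    """Fraction of trajectories whose every step satisfies all constraints."""
    ok = sum(all(step_feasible(step, uav_cap, buoy_cluster, cluster_cap) for step in traj)
             for traj in trajectories)
    return ok / len(trajectories)
```

Running such a check independently of the sampler, rather than trusting the mask, is what turns "exact by construction" into a reportable number.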
Circularity Check
No circularity: empirical simulation result stands on independent baseline comparison.
full rationale
The paper formulates a queue-weighted MDP from first-principles maritime models (sea-patch field, buoy dynamics, RCS/clutter sensing, PCRB fusion, energy limits) and trains a graph-MARL policy with masked b-matching. The reported 106% utility gain is obtained by direct comparison against an external rate-driven deterministic decoder baseline inside the same simulator. No equation reduces the claimed gain to a fitted parameter by construction, no load-bearing premise rests on self-citation, and the central performance claim is not equivalent to its inputs. This is the normal non-circular case for a simulation study.
Axiom & Free-Parameter Ledger
free parameters (1)
- queue weights and MAPPO hyperparameters
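The MAPPO hyperparameters act mainly through PPO's clipped surrogate objective, as used in MAPPO [43], and through the advantage estimator. A minimal NumPy sketch, assuming per-sample log-probabilities (for example, the summed log-probability of a sampled association) and advantages computed from a centralized critic's values; the clip range, gamma, lambda, and function names are illustrative defaults, not the paper's settings, and episode termination is not handled.

```python
import numpy as np


def clipped_surrogate_loss(logp_new, logp_old, adv, clip_eps=0.2):
    """PPO/MAPPO clipped policy loss (to be minimized), averaged over samples."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -float(np.mean(np.minimum(unclipped, clipped)))


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation; values holds len(rewards) + 1 entries (bootstrap)."""
    adv = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```

In a queue-weighted setting, the per-step reward fed to `gae` would be the collection utility, so the queue weights shape the advantages and, through them, every policy update.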
axioms (2)
- domain assumption: spatially correlated sea-patch field and patch-aware buoy dynamics accurately represent real ocean conditions
- domain assumption: RCS- and clutter-aware echo sensing and fused posterior Cramér-Rao bounds correctly capture sensing performance
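To make the first domain assumption concrete, one simple way to generate a spatially correlated sea-patch field is a Gaussian random field over patch centres with an exponential covariance, sampled by Cholesky factorization. This is offered only as an assumption, since the paper's covariance model is not stated here; the correlation length, variance, and function names are hypothetical.

```python
import numpy as np


def exp_covariance(centres, sigma=1.0, corr_len=500.0):
    """Exponential covariance sigma^2 * exp(-d / corr_len) between patch centres (assumed kernel)."""
    diff = centres[:, None, :] - centres[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return sigma**2 * np.exp(-dist / corr_len)


def sample_sea_patch_field(centres, sigma=1.0, corr_len=500.0, n_draws=1, rng=None, jitter=1e-9):
    """Draw zero-mean correlated field values, one per patch: shape (n_draws, N).

    centres: (N, 2) patch-centre coordinates in metres.
    A small diagonal jitter keeps the Cholesky factorization numerically stable.
    """
    rng = np.random.default_rng() if rng is None else rng
    cov = exp_covariance(centres, sigma, corr_len)
    chol = np.linalg.cholesky(cov + jitter * np.eye(len(centres)))
    z = rng.standard_normal((len(centres), n_draws))
    return (chol @ z).T
```

Patches one correlation length apart then have correlation of about exp(-1), roughly 0.37, so neighbouring buoys experience similar sea states while distant ones decorrelate.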
Reference graph
Works this paper leans on
[1] T. Xia, M. M. Wang, J. Zhang, and L. Wang, “Maritime internet of things: Challenges and solutions,” IEEE Wireless Communications, vol. 27, no. 2, pp. 188–196, 2020.
[2] N. Nomikos, P. K. Gkonis, P. S. Bithas, and P. Trakadas, “A survey on UAV-aided maritime communications: Deployment considerations, applications, and future challenges,” IEEE Open Journal of the Communications Society, vol. 4, pp. 56–78, 2023.
[3] Z. Liu, X. Meng, Y. Yang, K. Ma, and X. Guan, “Energy-efficient UAV-aided ocean monitoring networks: Joint resource allocation and trajectory design,” IEEE Internet of Things Journal, vol. 9, no. 18, pp. 17871–17884, 2022.
[4] F. Liu, Y. Cui, C. Masouros, J. Xu, T. Han, Y. C. Eldar, and S. Buzzi, “Integrated sensing and communications: Toward dual-functional wireless networks for 6G and beyond,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 6, pp. 1728–1767, 2022.
[5] G. Karabulut Kurt, M. G. Khoshkholgh, S. Alfattani, A. Ibrahim, T. S. J. Darwish, M. S. Alam, H. Yanikomeroglu, and A. Yongacoglu, “A vision and framework for the high altitude platform station (HAPS) networks of the future,” IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 729–779, 2021.
[6] B. Li, J. Liu, Y. Liang, Q. Li, H. Liu, Y. Zhang, J. Mu, S. Mumtaz, and S. Chen, “UAV-enabled joint sensing, communication, powering, and backhaul transmission in maritime monitoring networks,” IEEE Internet of Things Journal, vol. 13, no. 4, pp. 7473–7486, 2026.
[7] Z. Lyu, G. Zhu, and J. Xu, “Joint maneuver and beamforming design for UAV-enabled integrated sensing and communication,” IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2424–2440, 2023.
[8] Y. Pan, R. Li, X. Da, H. Hu, M. Zhang, D. Zhai, K. Cumanan, and O. A. Dobre, “Cooperative trajectory planning and resource allocation for UAV-enabled integrated sensing and communication systems,” IEEE Transactions on Vehicular Technology, vol. 73, no. 5, pp. 6502–6516, 2024.
[9] Y. Zhang, J. Lyu, and L. Fu, “Energy-efficient trajectory design for UAV-aided maritime data collection in wind,” IEEE Transactions on Wireless Communications, vol. 21, no. 12, pp. 10871–10886, 2022.
[10] K. Meng, Q. Wu, S. Zhang, W. Chen, K. Huang, and J. Xu, “Throughput maximization for UAV-enabled integrated periodic sensing and communication,” IEEE Transactions on Wireless Communications, vol. 22, no. 1, pp. 671–687, 2023.
[11] Y. Zeng, J. Xu, and R. Zhang, “Energy minimization for wireless communication with rotary-wing UAV,” IEEE Transactions on Wireless Communications, vol. 18, no. 4, pp. 2329–2345, 2019.
[12] K. Meng, Q. Wu, J. Xu, W. Chen, Z. Feng, R. Schober, and A. L. Swindlehurst, “UAV-enabled integrated sensing and communication: Opportunities and challenges,” IEEE Wireless Communications, vol. 31, no. 2, pp. 97–104, 2024.
[13] Y. Jiang, Q. Wu, W. Chen, and K. Meng, “UAV-enabled integrated sensing and communication: Tracking design and optimization,” IEEE Communications Letters, vol. 28, no. 5, pp. 1024–1028, 2024.
[14] T. Zhang, K. Zhu, S. Zheng, D. Niyato, and N. C. Luong, “Trajectory design and power control for joint radar and communication enabled multi-UAV cooperative detection systems,” IEEE Transactions on Communications, vol. 71, no. 1, pp. 158–172, 2023.
[15] L. P. Qian, H. Zhang, Q. Wang, Y. Wu, and B. Lin, “Joint multi-domain resource allocation and trajectory optimization in UAV-assisted maritime IoT networks,” IEEE Internet of Things Journal, vol. 10, no. 1, pp. 539–552, 2023.
[16] B. Li, J. Liu, J. Mu, P. Xiao, and S. Chen, “UAV-enabled integrated sensing and communication in maritime emergency networks,” IEEE Internet of Things Journal, vol. 12, no. 24, pp. 53997–54012, 2025.
[17] J. Zhang, G. Wang, H. Yang, B. Liu, and B. Li, “Multiuser maritime integrated sensing and communication shipboard base station deployment optimization,” IEEE Internet of Things Journal, vol. 11, no. 18, pp. 29375–29386, 2024.
[18] X. Cao, S. Wang, and Y. Zhang, “Intelligent reflecting surface enhanced maritime joint sensing and communication systems: Performance optimization,” IEEE Transactions on Communications, 2024.
[19] H. Yang, K. Lin, L. Xiao, Y. Zhao, Z. Xiong, and Z. Han, “Energy harvesting UAV-RIS-assisted maritime communications based on deep reinforcement learning against jamming,” IEEE Transactions on Wireless Communications, vol. 23, no. 8, pp. 9854–9868, 2024.
[20] X. Fu, X. Huang, Q. Pan, P. Pace, G. Aloi, and G. Fortino, “Cooperative data collection for UAV-assisted maritime IoT based on deep reinforcement learning,” IEEE Transactions on Vehicular Technology, vol. 73, no. 7, pp. 10333–10349, 2024.
[21] X. Liu, J. Wu, C. Zhao, and Z. Liu, “Integrated sensing and communications for UAV assisted internet of things based on deep reinforcement learning,” IEEE Transactions on Vehicular Technology, vol. 74, no. 6, pp. 9604–9616, 2025.
[22] X. Ye, H. Lin, X. Song, Y. Wu, and L. Fu, “Multi-objective ISAC for low-altitude economy based on multi-task deep reinforcement learning with mixture of experts,” IEEE Transactions on Mobile Computing, 2026.
[23] S. Mao, L. Liu, X. Hou, M. Atiquzzaman, and K. Yang, “Multi-domain resource management for space–air–ground integrated sensing, communication, and computation networks,” IEEE Journal on Selected Areas in Communications, vol. 42, no. 12, pp. 3380–3394, 2024.
[24] B. Hu, H. Liu, J. Du, M. López-Benítez, C. Wu, X. Chu, and D. Niyato, “MADQN-enhanced computation offloading and resource allocation for 6G low-altitude economy vehicular networks,” IEEE Transactions on Cognitive Communications and Networking, 2025.
[25] Y. Shen, J. Zhang, S. H. Song, and K. B. Letaief, “Graph neural networks for wireless communications: From theory to practice,” IEEE Transactions on Wireless Communications, vol. 22, no. 5, pp. 3554–3569, 2023.
[26] Z. Liu, J. Zhang, E. Shi, Z. Liu, B. Ai, Y. Shen, and D. W. K. Ng, “Graph neural network meets multi-agent reinforcement learning: Fundamentals, applications, and future directions,” IEEE Wireless Communications, 2024.
[27] Z. Liu, J. Zhang, Y. Zhu, E. Shi, D. W. K. Ng, and B. Ai, “Mobile cell-free massive MIMO with multi-agent reinforcement learning: A scalable framework,” IEEE Transactions on Wireless Communications, 2024.
[28] X. Zhang, H. Zhao, J. Wei, C. Yan, J. Xiong, and X. Liu, “Cooperative trajectory design of multiple UAV base stations with heterogeneous graph neural networks,” IEEE Transactions on Wireless Communications, vol. 22, no. 3, pp. 1495–1509, 2023.
[29] H. Liu, Z. Xie, J. Liu, and B. Li, “Heterogeneous graph neural network for beamforming design in cell-free massive MIMO with underlaid D2D maritime systems,” in Proc. IEEE 101st Veh. Technol. Conf. (VTC2025-Spring), 2025, pp. 1–6.
[30] J. Wang, X. Zhang, Z. Wei, F. Sun, Y. Li, Z. Feng, and J. Lu, “Multi-UAV collaborative ISAC with dynamic resource allocation: A hierarchical graph multi-agent reinforcement learning approach,” IEEE Journal of Selected Topics in Signal Processing, 2026.
[31] L. H. Holthuijsen, Waves in Oceanic and Coastal Waters. Cambridge University Press, 2007.
[32] L. Cavaleri, J.-H. G. M. Alves, F. Ardhuin, A. Babanin, M. Banner, K. Belibassakis, M. Benoit, M. Donelan, J. Groeneweg, T. H. C. Herbers et al., “Wave modelling - the state of the art,” Progress in Oceanography, vol. 75, no. 4, pp. 603–674, 2007.
[33] E. Hanert, D. Y. Le Roux, V. Legat, and E. Deleersnijder, “Advection schemes for unstructured grid ocean modelling,” Ocean Modelling, vol. 7, no. 1–2, pp. 39–58, 2004.
[34] R. A. Singer, “Estimating optimal tracking filter performance for manned maneuvering targets,” IEEE Transactions on Aerospace and Electronic Systems, vol. AES-6, no. 4, pp. 473–483, 1970.
[35] C. Uluışık, G. Çakır, M. Çakır, and L. Sevgi, “Radar Cross Section (RCS) Modeling and Simulation, Part 1: A Tutorial Review of Definitions, Strategies, and Canonical Examples,” IEEE Antennas and Propagation Magazine, vol. 50, no. 1, pp. 115–126, 2008.
[36] ETSI ISG ISAC, “Integrated Sensing And Communications (ISAC); Channel Modelling, Measurements and Evaluation Methodology,” ETSI, Tech. Rep. ETSI GR ISC 002 V1.1.1, Aug. 2025. [Online]. Available: https://www.etsi.org/deliver/etsi_gr/ISC/001_099/002/01.01.01_60/gr_ISC002v010101p.pdf
[37] T. K. Le, “Discussion on ISAC Channel Modeling,” 3GPP TSG RAN WG1 Meeting #119, Athens, Greece, Tech. Rep. R1-2500414, Feb. 2025. [Online]. Available: https://www.eurecom.fr/publication/8092/download/comsys-publi-8092.pdf
[38] V. Gregers-Hansen and R. Mital, “An improved empirical model for radar sea clutter reflectivity,” IEEE Transactions on Aerospace and Electronic Systems, vol. 48, no. 4, pp. 3512–3524, 2012.
[39] N. Gao, Y. Zeng, J. Wang, D. Wu, C. Zhang, Q. Song, J. Qian, and S. Jin, “Energy model for UAV communications: Experimental validation and model generalization,” China Communications, vol. 18, no. 7, pp. 253–264, 2021.
[40] B. Duo, Q. Wu, X. Yuan, and R. Zhang, “Energy efficiency maximization for full-duplex UAV secrecy communication,” IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 4590–4595, 2020.
[41] L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks,” IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936–1948, 1992.
[42] M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
[43] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of PPO in cooperative multi-agent games,” Advances in Neural Information Processing Systems, vol. 35, pp. 24611–24624, 2022.
[44] C. S. de Witt, T. Gupta, D. Makoviichuk, V. Makoviychuk, P. H. S. Torr, M. Sun, and S. Whiteson, “Is independent learning all you need in the StarCraft multi-agent challenge?” arXiv preprint arXiv:2011.09533, 2020.