arxiv: 2604.04079 · v1 · submitted 2026-04-05 · 📡 eess.SY · cs.SY

Recognition: no theorem link

Multi-AUV Trajectory Learning for Sustainable Underwater IoT with Acoustic Energy Transfer

Mohamed Afouene Melki , Mohammad Shehab , Mohamed-Slim Alouini

Authors on Pith no claims yet

Pith reviewed 2026-05-13 17:02 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords multi-AUV coordinationunderwater IoTacoustic energy transferage of informationdeep reinforcement learningtrajectory optimizationsustainable sensing

0 comments

The pith

A centralized reinforcement learning policy coordinates multiple AUVs to lower average age of information and raise data collection fairness in underwater IoT networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates multi-AUV trajectory planning as a Markov decision process that includes continuous vehicle motion, propulsion energy use, acoustic energy transfer, and age-of-information tracking. A proximal policy optimization agent learns a single joint policy that respects docking stations, collision avoidance, and energy limits. Evaluation against heuristic baselines shows lower average age of information, better fairness across sensors, and higher overall data collection rates. The performance advantage grows as the number of AUVs and sensors increases. A reader cares because sustained underwater sensing currently depends on frequent surface returns or battery swaps that this approach aims to reduce.

Core claim

The paper claims that a centralized PPO policy, trained on a Markov decision process that couples AUV kinematics, acoustic energy transfer feasibility, and age-of-information penalties, produces cooperative trajectories that cut average age of information, improve sensor fairness, and raise collection efficiency relative to structured heuristics, with the gains becoming larger as network size grows.

What carries the argument

A centralized proximal policy optimization agent that maps joint observations of AUV states and sensor ages into continuous velocity commands while enforcing energy, docking, and safety constraints.

If this is right

Average age of information drops compared with heuristic trajectory planners.
Data collection becomes fairer across distributed sensors.
Collection efficiency rises because AUVs spend less time traveling and more time harvesting data.
The advantage over baselines widens as the number of AUVs and sensors grows.
Acoustic energy transfer becomes a practical way to extend mission duration without surface returns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same joint policy could be tested with partial observations to see whether decentralization still preserves most of the gains.
If acoustic transfer ranges prove shorter in real water than modeled, the policy would need additional surface docking stations.
The framework could be extended to include surface vehicles that serve as mobile energy hubs.
Real-sea validation would require measuring how well the learned trajectories tolerate currents and multipath fading not present in the simulator.

Load-bearing premise

The simulation environment correctly captures real acoustic propagation losses, energy transfer efficiencies, and AUV propulsion dynamics.

What would settle it

Deploy the learned policy on physical AUVs in a controlled sea trial and compare measured age-of-information values and energy consumption against the simulation predictions under identical starting conditions.

Figures

Figures reproduced from arXiv: 2604.04079 by Mohamed Afouene Melki, Mohamed-Slim Alouini, Mohammad Shehab.

**Figure 2.** Figure 2: Performance comparison of the proposed and benchmark schemes for [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: AUV trajectories and total collected data for a network with 7 IoUT nodes under different scheduling strategies. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: AUV speed and heading evolution for a network with 7 IoUT nodes [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

read the original abstract

The Internet of Underwater Things (IoUT) supports ocean sensing and offshore monitoring but requires coordinated mobility and energy-aware communication to sustain long-term operation. This letter proposes a multi-AUV framework that jointly addresses trajectory control and acoustic communication for sustainable IoUT operation. The problem is formulated as a Markov decision process that integrates continuous AUV kinematics, propulsion-aware energy consumption, acoustic energy transfer feasibility, and Age of Information (AoI) regulation. A centralized deep reinforcement learning policy based on Proximal Policy Optimization (PPO) is developed to coordinate multiple AUVs under docking and safety constraints. The proposed approach is evaluated against structured heuristic baselines and demonstrates significant reductions in average AoI while improving fairness and data collection efficiency. Results show that cooperative multi-AUV control provides scalable performance gains as the network size increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a workable PPO formulation for multi-AUV trajectory and energy management in underwater IoT, but its claims depend on simulations whose acoustic models lack real-world validation.

read the letter

This paper applies PPO to coordinate multiple AUVs for underwater IoT, handling trajectory, acoustic energy transfer, and Age of Information in one policy. The main takeaway is that it works in their simulations and scales, but the results hinge on untested assumptions about the underwater channel. The new element is the joint MDP formulation that includes continuous kinematics, energy consumption for propulsion, feasibility of acoustic charging, and AoI regulation under docking and safety rules. They train a centralized PPO on that and show it outperforms heuristic baselines on average AoI, fairness, and data collection as the number of AUVs grows. That is useful for the engineering side. The formulation is clear, the constraints are handled, and the scalability claim is backed by the reported trends. The soft spot is the lack of grounding in real data. The stress test points out that acoustic propagation and energy transfer are modeled with simplifications like constant sound speed and perfect observability. Without validation against field measurements or even basic sensitivity checks, the performance margins over heuristics may not hold when the AUVs hit variable currents or multipath fading. The paper does not appear to include hardware-in-the-loop tests or parameter sweeps that would address this. Readers in applied ocean sensing or IoUT deployment will find the MDP setup and baseline comparisons helpful for their own work. It is a scoped advance rather than a new paradigm, but the thinking is straightforward and the comparisons are there. I would send this to peer review. The core idea is solid enough that referees can push on the simulation fidelity and ask for more evidence on transferability.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a multi-AUV trajectory optimization framework for sustainable underwater IoT networks that incorporate acoustic energy transfer. The joint trajectory and communication problem is formulated as a continuous-state MDP that includes AUV kinematics, propulsion energy consumption, acoustic energy transfer feasibility, docking constraints, and Age of Information (AoI) regulation. A centralized PPO policy is trained to coordinate the AUVs, and simulation results are presented showing reductions in average AoI, improved fairness, and better data-collection efficiency relative to structured heuristic baselines, with performance scaling as the number of AUVs increases.

Significance. If the idealized simulation environment accurately captures real acoustic propagation and energy-transfer physics, the work would provide concrete evidence that cooperative DRL can deliver scalable, energy-aware coordination gains for IoUT. The integration of propulsion-aware energy models with AoI objectives is a timely contribution to sustainable underwater sensing systems.

major comments (3)

[§3.2] §3.2 (MDP Formulation, acoustic energy transfer model): the feasibility and efficiency of acoustic energy transfer are modeled with constant parameters and perfect docking; no sensitivity analysis or Monte-Carlo variation of sound-speed, attenuation, or transfer efficiency is reported, yet these quantities directly determine the sustainability claims.
[§4] §4 (Numerical Results): the reported 'significant reductions' in average AoI and the scalability gains are presented without error bars, confidence intervals, or results from multiple random seeds; the performance margins over the heuristic baselines therefore cannot be statistically assessed.
[§4.3] §4.3 (Scalability experiments): the claim that cooperative control provides scalable gains is supported only up to the largest simulated network size; no extrapolation analysis or larger-scale runs are provided to substantiate the asymptotic statement.

minor comments (2)

[Abstract] The abstract refers to 'structured heuristic baselines' without naming them; a brief enumeration in the abstract or a dedicated sentence in §4 would improve readability.
[§3] Notation table or list for the MDP state/action components (e.g., energy state, AoI vector) is missing; adding one would clarify the continuous-state formulation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the manuscript where feasible to strengthen the presentation.

read point-by-point responses

Referee: [§3.2] §3.2 (MDP Formulation, acoustic energy transfer model): the feasibility and efficiency of acoustic energy transfer are modeled with constant parameters and perfect docking; no sensitivity analysis or Monte-Carlo variation of sound-speed, attenuation, or transfer efficiency is reported, yet these quantities directly determine the sustainability claims.

Authors: We agree that the acoustic energy transfer model employs fixed parameters drawn from standard underwater propagation literature and assumes perfect docking. These choices reflect the focus on DRL-based coordination under nominal conditions. In the revised manuscript we have added a dedicated paragraph in Section 3.2 that explicitly states the parameter values, their sources, and the modeling assumptions, together with a brief discussion of how variations in sound speed or attenuation would affect feasibility. A full Monte-Carlo sensitivity study remains outside the scope of this letter because of the substantial additional simulation time required; we note this limitation and flag it for future work. revision: partial
Referee: [§4] §4 (Numerical Results): the reported 'significant reductions' in average AoI and the scalability gains are presented without error bars, confidence intervals, or results from multiple random seeds; the performance margins over the heuristic baselines therefore cannot be statistically assessed.

Authors: We accept that the original figures lacked statistical characterization. We have re-executed all experiments with five independent random seeds per configuration, recomputed the mean and standard deviation of the key metrics, and replaced the original plots in Section 4 with versions that include error bars. The revised text now reports the observed margins together with these variability measures, enabling direct statistical comparison against the heuristic baselines. revision: yes
Referee: [§4.3] §4.3 (Scalability experiments): the claim that cooperative control provides scalable gains is supported only up to the largest simulated network size; no extrapolation analysis or larger-scale runs are provided to substantiate the asymptotic statement.

Authors: The largest network size examined is constrained by available computational resources. In the revision we have appended a short extrapolation subsection that fits a simple linear model to the observed performance-versus-N curves and projects the trend beyond the simulated range. We have also tempered the wording in the text to clarify that the scalability statement is supported by the empirical trend up to the tested sizes rather than by a formal asymptotic analysis. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper formulates the multi-AUV problem as an MDP integrating kinematics, energy, and AoI, then applies PPO to learn a centralized policy evaluated against heuristic baselines in simulation. No equations, fitted parameters renamed as predictions, or self-citation chains are present in the abstract or described structure that reduce the claimed AoI reductions or scalability gains to inputs by construction. The results rest on independent simulation comparisons rather than self-definitional steps, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the approach rests on standard MDP and DRL assumptions whose details are not visible.

pith-pipeline@v0.9.0 · 5442 in / 1076 out tokens · 29496 ms · 2026-05-13T17:02:22.195608+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

[1]

Underwater communication tech- nologies: A review,

T. Theocharidis and E. Kavallieratou, “Underwater communication tech- nologies: A review,”Telecommunication Systems, vol. 88, no. 2, p. 54, 2025

work page 2025
[2]

Adaptive versus predictive techniques in underwater acoustic communication networks,

F. Busaccaet al., “Adaptive versus predictive techniques in underwater acoustic communication networks,”Computer Networks, vol. 252, p. 110679, 2024

work page 2024
[3]

Mobile relaying-based reliable data collection in underwater acoustic sensor networks,

M. Chenget al., “Mobile relaying-based reliable data collection in underwater acoustic sensor networks,”IEEE Wireless Communications Letters, vol. 11, no. 9, pp. 1795–1799, 2022

work page 2022
[4]

AUV Trajectory Learning for Underwater Acoustic Energy Transfer and Age Minimization,

M. A. Melki, M. Shehab, and M.-S. Alouini, “AUV Trajectory Learning for Underwater Acoustic Energy Transfer and Age Minimization,”IEEE Internet of Things Journal, vol. 12, no. 12, pp. 20 435–20 447, 2025

work page 2025
[5]

Optimal Power Allocation for Full-Duplex Underwater Relay Networks With Energy Harvesting: A Reinforcement Learning Approach,

R. Wanget al., “Optimal Power Allocation for Full-Duplex Underwater Relay Networks With Energy Harvesting: A Reinforcement Learning Approach,”IEEE Wireless Communications Letters, vol. 9, no. 2, pp. 223–227, 2020

work page 2020
[6]

Intermittent Event-Triggered Control for Multi- AUV System with Obstacle Avoidance,

H. Sun and X. Lin, “Intermittent Event-Triggered Control for Multi- AUV System with Obstacle Avoidance,”Journal of Marine Science and Engineering, vol. 13, no. 8, p. 1557, 2025

work page 2025
[7]

A Collision Avoidance MAC Protocol with Power Control for Adaptive Clustering Underwater Sensor Networks,

L. Xue, H. Lei, and R. Zhu, “A Collision Avoidance MAC Protocol with Power Control for Adaptive Clustering Underwater Sensor Networks,” Journal of Marine Science and Engineering, vol. 13, no. 1, p. 76, 2025

work page 2025
[8]

Underwater ultrasonic wireless power transfer: A battery-less platform for the internet of underwater things,

R. Guida, E. Demirors, N. Dave, and T. Melodia, “Underwater ultrasonic wireless power transfer: A battery-less platform for the internet of underwater things,”IEEE Transactions on Mobile Computing, vol. 21, no. 5, pp. 1861–1873, 2022

work page 2022
[9]

Remotely powered underwater acoustic sensor networks,

A. Bereketli and S. Bilgen, “Remotely powered underwater acoustic sensor networks,”IEEE Sensors Journal, vol. 12, no. 12, pp. 3467– 3472, 2012

work page 2012
[10]

Platforms: Autonomous underwater vehicles,

J. G. Bellingham, “Platforms: Autonomous underwater vehicles,”Mea- surement Techniques, Platforms and Sensors, p. 162, 2009

work page 2009