Multi-Objective-Optimization Assisted Data Collection Framework for IoUT Based on Offline Reinforcement

Guanwen Xie; Jingzehua Xu; Weiyi Liu; Xinqi Wang; Yi Li; Yimian Ding

arXiv: 2410.11282 · v2 · pith:RGSHH3EOnew · submitted 2024-10-15 · eess.SY · cs.SY


Pith reviewed 2026-05-23 19:07 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords multi-agent offline reinforcement learning · data collection framework · underwater networks · AUV · multi-objective optimization · collision avoidance · value of information


The pith

A multi-AUV framework uses multi-agent offline RL to maximize underwater data rate and value of information while minimizing energy and avoiding collisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a data collection system for Information Updating Networks that coordinates multiple autonomous underwater vehicles through offline multi-agent reinforcement learning. It replaces online RL methods, which incur high computation costs and poor data use in turbulent seas, by training policies on prior environmental and equipment data to meet four goals at once. A semi-communication decentralized training with decentralized execution paradigm and a multi-agent independent conservative Q-learning algorithm are introduced to handle the joint optimization. If the approach works, underwater sensor networks could operate more efficiently and practically without needing constant live interaction during missions.

Core claim

The proposed multi-AUV assisted data collection framework based on multi-agent offline RL maximizes data rate and the value of information, minimizes energy consumption, and ensures collision avoidance by utilizing environmental and equipment status data from prior operations through the SC-DTDE paradigm and MAICQL algorithm.

What carries the argument

The multi-agent independent conservative Q-learning (MAICQL) algorithm operating under the semi-communication decentralized training with decentralized execution (SC-DTDE) paradigm, which trains policies offline from historical data and executes them without central coordination.
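For orientation, the conservative Q-learning penalty that MAICQL builds on can be sketched for a single agent with discrete actions. This is the generic CQL loss, not the paper's implementation: the function names, the discrete action space, the agent labels, and the values of `gamma` and `alpha` are all assumptions made for illustration. Under an independent-learner scheme, each AUV would minimize such a loss on its own logged transitions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def cql_loss(q_values, action, reward, next_q_values,
             gamma=0.99, alpha=1.0, done=False):
    """Generic discrete-action CQL loss for one logged transition.

    q_values / next_q_values: Q(s, .) and Q(s', .) as lists, one entry per action.
    action: index of the action stored in the offline dataset.
    gamma, alpha: illustrative defaults, not values taken from the paper.
    """
    # Standard one-step TD target; no bootstrap on terminal transitions.
    target = reward + (0.0 if done else gamma * max(next_q_values))
    bellman = 0.5 * (q_values[action] - target) ** 2
    # Conservative term: push Q down over all actions, up on the logged one.
    conservative = logsumexp(q_values) - q_values[action]
    return bellman + alpha * conservative


def agent_loss(batch, **kwargs):
    """Mean CQL loss over one agent's batch of transition dicts."""
    return sum(cql_loss(**t, **kwargs) for t in batch) / len(batch)


# Independent learners: each (hypothetical) AUV evaluates only its own batch.
batches = {
    "auv_0": [dict(q_values=[1.0, 1.0], action=0, reward=0.0,
                   next_q_values=[0.0, 0.0])],
    "auv_1": [dict(q_values=[0.2, 0.9], action=1, reward=1.0,
                   next_q_values=[0.5, 0.1])],
}
losses = {name: agent_loss(batch, gamma=0.9) for name, batch in batches.items()}
```

The conservative term is never negative, so it only lowers Q-values for actions the dataset does not support. It does not by itself guarantee good behavior when deployment states fall outside the logged data.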

If this is right

Simulations show the framework maintains robustness across dynamic underwater environments.
Data collection efficiency increases relative to online RL baselines.
The four objectives are achieved simultaneously without requiring live environment interaction during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The offline multi-agent structure could transfer to other fleets of robots operating in unpredictable settings such as aerial or ground disaster zones.
Periodic incorporation of new mission data might be needed to keep policies current when ocean currents or equipment states change beyond the original dataset.

Load-bearing premise

Data from earlier operations is representative enough of real turbulent ocean conditions for the offline policies to remain effective during actual use.

What would settle it

A field deployment in ocean conditions where the learned policies produce more collisions or lower data rates than predicted because of distribution shift from the training data.

Figures

Figures reproduced from arXiv: 2410.11282 by Guanwen Xie, Jingzehua Xu, Weiyi Liu, Xinqi Wang, Yi Li, Yimian Ding.

**Figure 2.** The curves of the cumulative reward, sum data rate, and sum VoI under different noise intensities: (a) Cumulative reward. (b) Sum data rate. (c) Sum

**Figure 3.** Performance comparison of MAISAC, BC, GAIL and MAICQL

**Figure 4.** Trajectories of AUVs for the data collection task in the turbulence-free

**Figure 5.** Trajectories of AUVs for the data collection task in the turbulent

Original abstract

The Information Updating Networks (IUNs) offers significant potential for ocean exploration but encounters challenges due to dynamic underwater environments and severe system attenuation. Current methods relying on Autonomous Underwater Vehicles (AUVs) based on online reinforcement learning (RL) lead to high computational costs and low data utilization. To address these issues and the constraints of turbulent ocean environments, we propose a multi-AUV assisted data collection framework for IUNs based on multi-agent offline RL. This framework maximizes data rate and the value of information (VoI), minimizes energy consumption, and ensures collision avoidance by utilizing environmental and equipment status data. We introduce a semi-communication decentralized training with decentralized execution (SC-DTDE) paradigm and a multi-agent independent conservative Q-learning algorithm (MAICQL) to effectively tackle the problem. Extensive simulations demonstrate the high applicability, robustness, and data collection efficiency of the proposed framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies multi-agent offline RL to multi-AUV IoUT data collection via SC-DTDE and MAICQL, but the simulation claims rest on an untested assumption that prior data covers turbulent ocean dynamics.


The paper takes offline RL and applies it to a multi-AUV setup for collecting data in underwater information updating networks. The new pieces are the SC-DTDE training paradigm and the MAICQL algorithm, which together handle the multi-objective goals of higher data rate and VoI, lower energy, and collision avoidance using historical environmental and equipment data. This is a reasonable response to the high cost and low data efficiency of online RL in these settings, and the simulations are presented as showing solid gains in efficiency and robustness. That part is straightforward and addresses a real constraint in ocean monitoring applications.

The soft spot is the load-bearing assumption that the offline trajectories are representative enough for deployment. Ocean turbulence creates non-stationary conditions that offline methods often handle poorly, and the work does not describe explicit out-of-distribution testing or stronger regularization to bound the risk. The evaluation details are also limited, with no clear picture of baselines, statistical tests, or how the simulation environment matches real attenuation and current effects.

This paper is mainly for people already working on RL applications in underwater or multi-agent IoT systems. A reader looking for a practical extension of conservative Q-learning to this domain would find it worth reading, though anyone focused on deployment would want more on transfer.

I would send it for peer review because the problem is relevant and the algorithmic framing is coherent enough to benefit from referee scrutiny on the evaluation and assumptions.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a multi-AUV assisted data collection framework for Information Updating Networks (IUNs) in dynamic underwater environments. It uses multi-agent offline RL with a semi-communication decentralized training with decentralized execution (SC-DTDE) paradigm and a multi-agent independent conservative Q-learning (MAICQL) algorithm. The framework aims to maximize data rate and value of information (VoI), minimize energy consumption, and ensure collision avoidance by leveraging environmental and equipment status data from prior operations. Extensive simulations are reported to demonstrate high applicability, robustness, and efficiency compared to online RL baselines.

Significance. If the results hold under the stated conditions, the work could offer a practical alternative to online RL for IoUT data collection by reducing real-time computational demands and improving data utilization in attenuated underwater channels. The SC-DTDE and MAICQL contributions provide a structured way to handle multi-agent coordination offline. However, the significance is tempered by the reliance on simulation-based validation without explicit handling of non-stationarity, limiting immediate transfer to turbulent ocean deployments.

major comments (2)

[Simulation setup] Simulation setup (Section 4/5, implied by 'extensive simulations' in abstract): No description is given of how the offline dataset is generated, its state-action coverage, or any out-of-distribution testing for non-stationary effects such as time-varying currents or attenuation fluctuations. This is load-bearing for the robustness claim, as standard CQL penalties in MAICQL do not inherently bound extrapolation error in such environments.
[Section 3] Methods (Section 3, SC-DTDE and MAICQL definitions): The multi-objective aspect from the title is not explicitly connected to the RL objective; it is unclear whether VoI, data rate, and energy are combined via fixed weights, adaptive weighting, or a Pareto front, which affects whether the claimed simultaneous maximization/minimization is achieved or traded off.
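The state-action coverage requested in the first major comment could be reported with a simple grid count. The sketch below is one possible diagnostic, not something the paper describes: the function name, the uniform grid, the per-dimension bounds, and the toy transitions are assumptions for illustration.

```python
def state_action_coverage(transitions, low, high, bins, n_actions):
    """Fraction of (state cell, action) pairs seen in an offline dataset.

    transitions: iterable of (state, action); state is a tuple of floats,
                 action is an int in [0, n_actions).
    low, high:   per-dimension bounds used to grid the state space.
    bins:        number of grid cells per state dimension (hypothetical choice).
    """
    visited = set()
    for state, action in transitions:
        cell = []
        for x, lo, hi in zip(state, low, high):
            frac = (x - lo) / (hi - lo)
            # Clamp so out-of-range values land in the nearest edge cell.
            cell.append(min(bins - 1, max(0, int(frac * bins))))
        visited.add((tuple(cell), action))
    total = (bins ** len(low)) * n_actions
    return len(visited) / total


# Toy dataset: 2-D positions in the unit square, two discrete actions.
toy = [((0.1, 0.1), 0), ((0.9, 0.9), 1), ((0.15, 0.12), 0)]
ratio = state_action_coverage(toy, low=(0.0, 0.0), high=(1.0, 1.0),
                              bins=2, n_actions=2)
```

A low ratio would indicate that large parts of the state-action space are unsupported by the data, which is where extrapolation error is most likely. Grid counts scale poorly with dimension, so this is only practical for low-dimensional projections of the state.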

minor comments (2)

[Abstract] Abstract: 'IUNs' is used before its expansion as Information Updating Networks; first-use definition would improve readability.
[Section 3] Notation: The relationship between the semi-communication mechanism in SC-DTDE and the independent Q-learning in MAICQL could be clarified with a diagram or pseudocode to show information flow during decentralized execution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and indicating revisions made to strengthen the paper.


Referee: [Simulation setup] Simulation setup (Section 4/5, implied by 'extensive simulations' in abstract): No description is given of how the offline dataset is generated, its state-action coverage, or any out-of-distribution testing for non-stationary effects such as time-varying currents or attenuation fluctuations. This is load-bearing for the robustness claim, as standard CQL penalties in MAICQL do not inherently bound extrapolation error in such environments.

Authors: We agree that explicit details on offline dataset generation are necessary to support the robustness claims. The original manuscript referenced data from prior operations but omitted specifics on collection, coverage, and non-stationarity testing. In the revised manuscript, we have added a dedicated subsection in Section 4 describing: the generation of the offline dataset from historical multi-AUV missions in simulated IUN environments; quantitative analysis of state-action coverage; and additional out-of-distribution experiments evaluating MAICQL under time-varying currents and attenuation fluctuations. These additions directly address extrapolation concerns beyond standard CQL penalties. revision: yes
Referee: [Section 3] Methods (Section 3, SC-DTDE and MAICQL definitions): The multi-objective aspect from the title is not explicitly connected to the RL objective; it is unclear whether VoI, data rate, and energy are combined via fixed weights, adaptive weighting, or a Pareto front, which affects whether the claimed simultaneous maximization/minimization is achieved or traded off.

Authors: The multi-objective optimization is incorporated via a scalarized reward function within the MAICQL algorithm and SC-DTDE paradigm, where data rate, VoI, energy consumption, and collision avoidance are combined using fixed weights determined by domain priorities. This produces a single objective for Q-learning while achieving the stated simultaneous goals through the weighted trade-off. We have revised Section 3 to explicitly connect the title's multi-objective aspect to this reward formulation, including the specific weighting scheme and rationale for fixed weights over alternatives like Pareto fronts. revision: yes
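The fixed-weight scalarization described in this response can be illustrated with a short sketch. The weight values, the field names, and the form of the collision penalty are hypothetical; the text here does not state the actual weights or reward terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative fixed weights; not the values used in the paper."""
    data_rate: float = 1.0
    voi: float = 1.0
    energy: float = 0.5
    collision: float = 10.0


def scalarized_reward(data_rate, voi, energy, collided, w=RewardWeights()):
    """Collapse four objectives into one scalar reward for Q-learning.

    Objectives to maximize enter with a plus sign; energy use and
    collisions enter as penalties.
    """
    penalty = w.collision if collided else 0.0
    return w.data_rate * data_rate + w.voi * voi - w.energy * energy - penalty


safe_step = scalarized_reward(data_rate=2.0, voi=1.0, energy=4.0, collided=False)
crash_step = scalarized_reward(data_rate=2.0, voi=1.0, energy=4.0, collided=True)
```

With fixed weights the learned policy sits at a single trade-off point, so changing the priorities means retraining with new weights. A Pareto-front method would instead expose the whole set of trade-offs.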

Circularity Check

0 steps flagged

No circularity: algorithmic proposals and simulation claims are self-contained


The paper proposes MAICQL and SC-DTDE as new algorithmic components within a multi-agent offline RL framework for AUV data collection. Claims rest on simulation results comparing performance metrics (data rate, VoI, energy, collisions) rather than any derivation that reduces by construction to fitted inputs, self-citations, or renamed empirical patterns. No equations, parameter-fitting steps, or load-bearing self-citation chains appear in the provided text; the central contributions are presented as extensions of standard offline RL techniques with independent simulation validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework implicitly assumes offline data suffices for policy learning in dynamic environments.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AoI-MDP: An AoI Optimized Markov Decision Process (Student Abstract)
eess.SY 2026-05 unverdicted novelty 5.0

AoI-MDP integrates age of information into MDP state, action, and reward to optimize decision-making under observation delays for underwater autonomous vehicles.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] R. A. Khalil, N. Saeed, M. I. Babar, and T. Jan, “Toward the internet of underwater things: Recent developments and future challenges,” IEEE Consumer Electronics Magazine, vol. 10, no. 6, pp. 32–37, 2020
[2] Z. Wang, J. Xu, Y. Feng, Y. Wang, G. Xie, X. Hou, W. Men, and Y. Ren, “Fisher-information-matrix-based usbl cooperative location in usv–auv networks,” Sensors, vol. 23, no. 17, 2023
[3] F. Senel, K. Akkaya, M. Erol-Kantarci, and T. Yilmaz, “Self-deployment of mobile underwater acoustic sensor networks for maximized coverage and guaranteed connectivity,” Ad Hoc Networks, vol. 34, pp. 170–183, 2015
[4] A. Amar, G. Avrashi, and M. Stojanovic, “Low complexity residual doppler shift estimation for underwater acoustic multicarrier communication,” IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 2063–2076, 2016
[5] C.-C. Hsu, H.-H. Liu, J. L. G. Gómez, and C.-F. Chou, “Delay-sensitive opportunistic routing for underwater sensor networks,” IEEE Sensors Journal, vol. 15, no. 11, pp. 6584–6591, 2015
[6] H. Harb, A. Makhoul, and R. Couturier, “An enhanced k-means and anova-based clustering approach for similarity aggregation in underwater wireless sensor networks,” IEEE Sensors Journal, vol. 15, no. 10, pp. 5483–5493, 2015
[7] G. Han, J. Jiang, N. Bao, L. Wan, and M. Guizani, “Routing protocols for underwater wireless sensor networks,” IEEE Communications Magazine, vol. 53, no. 11, pp. 72–78, 2015
[8] J. J. Kartha and L. Jacob, “Network lifetime-aware data collection in underwater sensor networks for delay-tolerant applications,” Sādhanā, vol. 42, pp. 1645–1664, 2017
[9] D. Zhu, X. Cao, B. Sun, and C. Luo, “Biologically inspired self-organizing map applied to task assignment and path planning of an auv system,” IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 2, pp. 304–313, 2017
[10] C. Lin, G. Han, J. Du, Y. Bi, L. Shu, and K. Fan, “A path planning scheme for auv flock-based internet-of-underwater-things systems to enable transparent and smart ocean,” IEEE Internet of Things Journal, vol. 7, no. 10, pp. 9760–9772, 2020
[11] X. Hou, J. Wang, T. Bai, Y. Deng, Y. Ren, and L. Hanzo, “Environment-aware auv trajectory design and resource management for multi-tier underwater computing,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 2, pp. 474–490, 2023
[12] Z. Zhang, J. Xu, G. Xie, J. Wang, Z. Han, and Y. Ren, “Environment and energy-aware auv-assisted data collection for the internet of underwater things,” IEEE Internet of Things Journal, 2024
[13] S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,” arXiv preprint arXiv:2005.01643, 2020
