AoI-MDP: An AoI Optimized Markov Decision Process (Student Abstract)
Pith reviewed 2026-05-19 21:28 UTC · model grok-4.3
The pith
Incorporating age of information into the state space and adding a wait action lets reinforcement learning produce better policies for underwater vehicles facing observation delays.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AoI-MDP models observation delay as signal delay and places it inside the state space. It introduces a wait action and integrates age of information into the reward functions, enabling reinforcement learning to optimize both information freshness and decision quality. The resulting policies outperform those of the standard MDP in simulated underwater scenarios.
What carries the argument
Age-of-information-augmented MDP that adds delay to the state vector and a wait action to the action set so reinforcement learning can trade off freshness against task progress.
Load-bearing premise
Adding age of information to the state and a wait action will produce better policies without introducing instability or requiring extensive new hyper-parameter tuning in the reinforcement learning process.
What would settle it
A set of underwater-task simulations in which AoI-MDP achieves no higher cumulative reward or success rate than standard MDP across varied delay distributions would falsify the central claim.
Figures
read the original abstract
Ocean exploration places high demands on autonomous underwater vehicles, especially when there's observation delay. We propose age of information optimized Markov decision process (AoI-MDP) to enhance underwater tasks by modeling observation delay as signal delay and including it in the state space. AoI-MDP also introduces wait time in the action space and integrates AoI with reward functions, optimizing information freshness and decision-making using reinforcement learning. Simulations show AoI-MDP outperforms the standard MDP, demonstrating superior performance, feasibility, and generalization in underwater tasks. To accelerate relevant research, we have made the codes available as open-source at https://github.com/Xiboxtg/AoI-MDP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes AoI-MDP, an extension of the standard Markov decision process for autonomous underwater vehicles that incorporates age of information (AoI) to model observation delays. AoI is added to the state space, a wait action is introduced in the action space, and AoI is integrated into the reward function. Reinforcement learning is used to derive policies that optimize information freshness alongside task performance. Simulations are reported to show that AoI-MDP outperforms the baseline MDP, with claims of superior performance, feasibility, and generalization; the implementation code is released as open source.
Significance. If the performance advantage can be shown under matched training conditions, the approach offers a practical way to handle communication delays in underwater robotic control, a setting where observation staleness directly affects decision quality. The open-source code release is a clear strength that supports reproducibility and extension by other researchers.
major comments (2)
- [Abstract / Simulation results] The central performance claim (abstract: 'Simulations show AoI-MDP outperforms the standard MDP') is unsupported by any quantitative metrics, tables, or figures. No values for cumulative reward, average AoI, task success rate, or other measures are supplied, nor are error bars or statistical tests across seeds reported. This absence prevents verification that the observed gap is attributable to the AoI augmentation rather than implementation artifacts.
- [Method and simulation setup] No information is given on training parity between AoI-MDP and the baseline MDP. The manuscript does not state whether the same number of episodes, wall-clock budget, or hyper-parameter tuning effort was used, despite AoI-MDP expanding the state space (by the AoI variable) and action space (by the wait action). Without explicit confirmation of comparable training resources, the superiority claim cannot be isolated to the modeling choice.
minor comments (2)
- [Overall] The manuscript is brief, consistent with a student abstract format; a short paragraph describing the specific underwater task (e.g., navigation, target tracking) and the simulation environment parameters would improve readability.
- [Code availability] The GitHub link is provided; confirming that the repository contains the exact environment, reward definitions, and training scripts used for the reported simulations would strengthen the reproducibility claim.
Simulated Author's Rebuttal
Thank you for the constructive comments on our student abstract. We address the major concerns below and will incorporate revisions to provide more quantitative evidence and clarify the experimental setup.
read point-by-point responses
-
Referee: [Abstract / Simulation results] The central performance claim (abstract: 'Simulations show AoI-MDP outperforms the standard MDP') is unsupported by any quantitative metrics, tables, or figures. No values for cumulative reward, average AoI, task success rate, or other measures are supplied, nor are error bars or statistical tests across seeds reported. This absence prevents verification that the observed gap is attributable to the AoI augmentation rather than implementation artifacts.
Authors: We agree with the referee that the abstract lacks specific quantitative metrics to support the performance claims. Given the constraints of the student abstract format, we will revise the manuscript to include key simulation results in a table format, reporting metrics such as average cumulative reward, mean AoI, and task success rates for AoI-MDP versus the baseline MDP, including standard deviations from multiple runs. revision: yes
-
Referee: [Method and simulation setup] No information is given on training parity between AoI-MDP and the baseline MDP. The manuscript does not state whether the same number of episodes, wall-clock budget, or hyper-parameter tuning effort was used, despite AoI-MDP expanding the state space (by the AoI variable) and action space (by the wait action). Without explicit confirmation of comparable training resources, the superiority claim cannot be isolated to the modeling choice.
Authors: We acknowledge the need for explicit details on the experimental setup. The training was performed with equivalent resources: the same number of episodes and hyper-parameter settings were used for both models. We will update the manuscript to describe the simulation parameters in detail, confirming the matched training conditions and noting any differences due to the increased state-action space. revision: yes
Circularity Check
No circularity in AoI-MDP proposal or simulation claims
full rationale
The paper proposes AoI-MDP by augmenting a standard MDP state space with an age-of-information variable, adding a wait action, and folding AoI into the reward function, then reports simulation outperformance versus baseline MDP. No equations, parameter-fitting steps, self-citations, or uniqueness theorems are present that would reduce any claimed result to an input by construction. The approach relies on conventional RL components applied to the modified formulation, rendering the work self-contained against external benchmarks with no load-bearing circular reductions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Vol and Energy-Aware AUV-Assisted Data Collection for Internet of Underwater Things , year=
Xu, Jingzehua and Zhang, Zekai and Wang, Ziyuan and Wang, Jingjing and Rent, Yong , booktitle=. Vol and Energy-Aware AUV-Assisted Data Collection for Internet of Underwater Things , year=
-
[2]
Noncooperative Mobile Target Tracking Using Multiple AUVs in Anchor-Free Environments , year=
Li, Yichen and Liu, Lingya and Yu, Wenbin and Wang, Yiyin and Guan, Xinping , journal=. Noncooperative Mobile Target Tracking Using Multiple AUVs in Anchor-Free Environments , year=
-
[3]
Zhang, Zekai and Xu, Jingzehua and Xie, Guanwen and Wang, Jingjing and Han, Zhu and Ren, Yong , journal=. Environment and Energy-Aware AUV-Assisted Data Collection for the Internet of Underwater Things , year=
-
[4]
Wei, Wei and Wang, Jingjing and Du, Jun and Fang, Zhengru and Ren, Yong and Chen, C. L. Philip , journal=. Differential Game-Based Deep Reinforcement Learning in Underwater Target Hunting Task , year=
-
[5]
Wu, Jiehong and Song, Chengxin and Ma, Jian and Wu, Jinsong and Han, Guangjie , journal=. Reinforcement Learning and Particle Swarm Optimization Supporting Real-Time Rescue Assignments for Multiple Autonomous Underwater Vehicles , year=
-
[6]
Optimizing Information Freshness in Wireless Networks Under General Interference Constraints , year=
Talak, Rajat and Karaman, Sertac and Modiano, Eytan , journal=. Optimizing Information Freshness in Wireless Networks Under General Interference Constraints , year=
-
[7]
Yates, Roy D. and Sun, Yin and Brown, D. Richard and Kaul, Sanjit K. and Modiano, Eytan and Ulukus, Sennur , journal=. Age of Information: An Introduction and Survey , year=
-
[8]
Jiang, Bingqing and Du, Jun and Jiang, Chunxiao and Han, Zhu and Debbah, Merouane , journal=. Underwater Searching and Multiround Data Collection via AUV Swarms: An Energy-Efficient AoI-Aware MAPPO Approach , year=
-
[9]
Altman, Eitan and Nain, Philippe , title =. Proceedings of the 1992 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems , pages =. 1992 , isbn =
work page 1992
-
[10]
Katsikopoulos, K.V. and Engelbrecht, S.E. , journal=. Markov decision processes with delays and asynchronous cost collection , year=
-
[11]
Multi-Objective-Optimization Multi-AUV Assisted Data Collection Framework for IoUT Based on Offline Reinforcement Learning , author=. arXiv preprint arXiv:2410.11282 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[12]
Xu, Jingzehua and Ding, Yimian and Zhang, Zekai and Xie, Guanwen and Wang, Ziyuan and Zeng, Yongming and Li, Gang , booktitle=. Multi-AUV Assisted Seamless Underwater Target Tracking Relying on Deep Learning and Reinforcement Learning , year=
-
[13]
IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) , pages=
Deep reinforcement learning for fresh data collection in UAV-assisted IoT networks , author=. IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) , pages=. 2020 , organization=
work page 2020
- [14]
-
[15]
Altman, Eitan and Nain, Philippe , title =. SIGMETRICS Perform. Eval. Rev. , month =. 1992 , issue_date =
work page 1992
-
[16]
UAV-UGV-Based System for AoI minimization in IoT Networks , year=
Messaoudi, Kaddour and Oubbati, Omar Sami and Rachedi, Abderrezak and Bendouma, Tahar , booktitle=. UAV-UGV-Based System for AoI minimization in IoT Networks , year=
-
[17]
Cooperative Transmission for AoI-Penalty Aware State Estimation in Marine IoT Systems , year=
Lyu, Ling and Dai, Yanpeng and Cheng, Nan and Zhu, Shanying and Ding, Zhengtao and Guan, Xinping , booktitle=. Cooperative Transmission for AoI-Penalty Aware State Estimation in Marine IoT Systems , year=
-
[18]
Sun, Yin and Uysal-Biyikoglu, Elif and Yates, Roy D. and Koksal, C. Emre and Shroff, Ness B. , journal=. Update or Wait: How to Keep Your Data Fresh , year=
-
[19]
IEEE Transactions on Vehicular Technology , volume=
3U: Joint design of UAV-USV-UUV networks for cooperative target hunting , author=. IEEE Transactions on Vehicular Technology , volume=. 2022 , publisher=
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.