AoI-MDP: An AoI Optimized Markov Decision Process (Student Abstract)

Guanwen Xie; Jingzehua Xu; Shuai Zhang; Xinqi Wang; Yimian Ding; Yiyuan Yang

arxiv: 2605.16777 · v1 · pith:N7WH75Z4new · submitted 2026-05-16 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-19 21:28 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords: Age of Information · Markov Decision Process · Reinforcement Learning · Underwater Vehicles · Observation Delay · Autonomous Systems · Information Freshness

The pith

Incorporating age of information into the state space and adding a wait action lets reinforcement learning produce better policies for underwater vehicles facing observation delays.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes AoI-MDP to address observation delays in autonomous underwater vehicle tasks by treating those delays as signal age and embedding them directly in the MDP state. It further expands the action set with an explicit wait option and folds age of information into the reward signal so that reinforcement learning jointly optimizes freshness and task performance. A reader would care because many real ocean missions suffer from significant sensor or communication lag, and any method that improves decisions under such lag has direct practical value. Simulations indicate the augmented process beats the ordinary MDP on the same tasks while remaining stable and generalizable.

Core claim

AoI-MDP models observation delay as signal delay and places it inside the state space. It introduces a wait action and integrates age of information into the reward functions, enabling reinforcement learning to optimize both information freshness and decision quality. The resulting policies outperform those of the standard MDP in simulated underwater scenarios.

What carries the argument

Age-of-information-augmented MDP that adds delay to the state vector and a wait action to the action set so reinforcement learning can trade off freshness against task progress.
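The three ingredients named above (age in the state, a wait action, age in the reward) can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: the 1-D task, the class name `AoIToyEnv`, the uniform delay model, and the parameters `aoi_weight` and `max_delay` are all assumptions made for this example.

```python
import random
from dataclasses import dataclass

LEFT, RIGHT, WAIT = 0, 1, 2  # WAIT is the extra action added to the base action set


@dataclass
class AoIToyEnv:
    """Hypothetical 1-D task: reach `goal` while observations may be stale."""
    goal: int = 5
    aoi_weight: float = 0.1  # assumed weight of the AoI penalty in the reward
    max_delay: int = 3       # assumed upper bound on the observation delay (in steps)
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        self.reset()

    def reset(self):
        self.pos = 0         # true position, never shown directly to the agent
        self.history = [0]   # past true positions, used to serve delayed observations
        self.aoi = 0         # age (in steps) of the observation the agent holds
        return (self.history[-1], self.aoi)

    def step(self, action):
        if action != WAIT:
            self.pos += 1 if action == RIGHT else -1
        self.history.append(self.pos)
        # Assumption: waiting yields a fresh sample, moving yields one of random age.
        delay = 0 if action == WAIT else self.rng.randint(0, self.max_delay)
        delay = min(delay, len(self.history) - 1)
        observed_pos = self.history[-1 - delay]
        self.aoi = delay
        done = self.pos == self.goal
        # Reward trades task progress against information staleness.
        reward = (1.0 if done else 0.0) - self.aoi_weight * self.aoi
        # The augmented state pairs the (possibly stale) observation with its age.
        return (observed_pos, self.aoi), reward, done
```

Under these assumptions, a learning agent sees how old its observation is and can choose `WAIT` when the age is high, which is the trade-off between freshness and task progress that the summary describes.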

Load-bearing premise

Adding age of information to the state and a wait action will produce better policies without introducing instability or requiring extensive new hyper-parameter tuning in the reinforcement learning process.

What would settle it

A set of underwater-task simulations in which AoI-MDP achieves no higher cumulative reward or success rate than standard MDP across varied delay distributions would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.16777 by Guanwen Xie, Jingzehua Xu, Shuai Zhang, Xinqi Wang, Yimian Ding, Yiyuan Yang.

**Figure 1.** Illustration of the AoI model. • We utilize statistical delay modeling (SDM) for delay-oriented modeling of observation delay via sensor-based model, yielding realistic results. • Comprehensive experiments in the underwater data collection task show AoI-MDP’s superior feasibility and performance in balancing multi-objective optimization. Methodology AoI Optimized Markov Decision Process. As illustrated in…

**Figure 2.** Diagram of the heading and time delay estimation.

**Figure 3.** Comparison of experimental results of RL training.

Original abstract

Ocean exploration places high demands on autonomous underwater vehicles, especially when there's observation delay. We propose age of information optimized Markov decision process (AoI-MDP) to enhance underwater tasks by modeling observation delay as signal delay and including it in the state space. AoI-MDP also introduces wait time in the action space and integrates AoI with reward functions, optimizing information freshness and decision-making using reinforcement learning. Simulations show AoI-MDP outperforms the standard MDP, demonstrating superior performance, feasibility, and generalization in underwater tasks. To accelerate relevant research, we have made the codes available as open-source at https://github.com/Xiboxtg/AoI-MDP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper offers a targeted AoI extension to MDPs for AUVs with open code but the outperformance claim lacks detailed evidence.

The one thing to know about this paper is that it proposes an AoI-MDP where age of information is added to the state, a wait time is included in the actions, and AoI influences the reward for reinforcement learning in autonomous underwater vehicle tasks. This targets observation delays in ocean exploration.

What stands out as new is the particular integration of these AoI elements into the MDP components for this underwater setting. The authors treat the delay as signal age and use RL to optimize for it. They also provide open-source code, which is a positive step for a student abstract and allows others to reproduce or extend the work. The paper does well in identifying a practical problem where information freshness matters for decision making in AUVs. The abstract explains the motivation clearly and positions the approach as improving performance, feasibility, and generalization.

On the soft spots, the evaluation is the main issue. It claims superior simulation performance over standard MDP but supplies no quantitative results, no details on the simulation setup, and no indication of how the increased state and action space was handled during training. As the stress-test note suggests, without matching the number of episodes, wall-clock time, or re-tuning hyperparameters for the baseline, any advantage might not be due to the AoI additions alone. If the full paper includes proper statistical tests and controls for these factors, that would address the gap. Otherwise the central claim rests on unverified assertions.

This work is aimed at researchers in systems and control who focus on age of information or RL for robotic applications in challenging environments. A reader interested in applying MDP and RL to delay-sensitive underwater tasks could gain some value from the modeling ideas and the released code. It engages honestly with the literature on AoI and MDPs, so it qualifies for a serious referee.

I recommend sending it for peer review, with attention to bolstering the experimental comparisons.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes AoI-MDP, an extension of the standard Markov decision process for autonomous underwater vehicles that incorporates age of information (AoI) to model observation delays. AoI is added to the state space, a wait action is introduced in the action space, and AoI is integrated into the reward function. Reinforcement learning is used to derive policies that optimize information freshness alongside task performance. Simulations are reported to show that AoI-MDP outperforms the baseline MDP, with claims of superior performance, feasibility, and generalization; the implementation code is released as open source.

Significance. If the performance advantage can be shown under matched training conditions, the approach offers a practical way to handle communication delays in underwater robotic control, a setting where observation staleness directly affects decision quality. The open-source code release is a clear strength that supports reproducibility and extension by other researchers.

major comments (2)

[Abstract / Simulation results] The central performance claim (abstract: 'Simulations show AoI-MDP outperforms the standard MDP') is unsupported by any quantitative metrics, tables, or figures. No values for cumulative reward, average AoI, task success rate, or other measures are supplied, nor are error bars or statistical tests across seeds reported. This absence prevents verification that the observed gap is attributable to the AoI augmentation rather than implementation artifacts.
[Method and simulation setup] No information is given on training parity between AoI-MDP and the baseline MDP. The manuscript does not state whether the same number of episodes, wall-clock budget, or hyper-parameter tuning effort was used, despite AoI-MDP expanding the state space (by the AoI variable) and action space (by the wait action). Without explicit confirmation of comparable training resources, the superiority claim cannot be isolated to the modeling choice.
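The kind of evidence both major comments ask for can be sketched briefly: per-seed returns for each model, gathered under the same training budget, summarized with a mean, a standard deviation, and a test statistic. The snippet below is a sketch under stated assumptions, using only the Python standard library; the function names are hypothetical and the numbers are invented placeholders, not results from the paper.

```python
import math
import statistics


def summarize(returns):
    """Mean and sample standard deviation of per-seed returns."""
    return statistics.mean(returns), statistics.stdev(returns)


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Placeholder per-seed returns, invented for illustration only.
# In a real comparison both lists would come from runs with the same
# seeds, episode count, and hyper-parameter tuning effort.
aoi_mdp_returns = [10.0, 12.0, 11.0, 13.0]
baseline_returns = [9.0, 10.0, 8.0, 9.0]

print("AoI-MDP  mean/std:", summarize(aoi_mdp_returns))
print("Baseline mean/std:", summarize(baseline_returns))
print("Welch t statistic:", welch_t(aoi_mdp_returns, baseline_returns))
```

Welch's statistic is chosen here because the two models need not have equal return variance; turning it into a p-value additionally requires the Welch-Satterthwaite degrees of freedom, which this sketch leaves out.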

minor comments (2)

[Overall] The manuscript is brief, consistent with a student abstract format; a short paragraph describing the specific underwater task (e.g., navigation, target tracking) and the simulation environment parameters would improve readability.
[Code availability] The GitHub link is provided; confirming that the repository contains the exact environment, reward definitions, and training scripts used for the reported simulations would strengthen the reproducibility claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive comments on our student abstract. We address the major concerns below and will incorporate revisions to provide more quantitative evidence and clarify the experimental setup.

Referee: [Abstract / Simulation results] The central performance claim (abstract: 'Simulations show AoI-MDP outperforms the standard MDP') is unsupported by any quantitative metrics, tables, or figures. No values for cumulative reward, average AoI, task success rate, or other measures are supplied, nor are error bars or statistical tests across seeds reported. This absence prevents verification that the observed gap is attributable to the AoI augmentation rather than implementation artifacts.

Authors: We agree with the referee that the abstract lacks specific quantitative metrics to support the performance claims. Given the constraints of the student abstract format, we will revise the manuscript to include key simulation results in a table format, reporting metrics such as average cumulative reward, mean AoI, and task success rates for AoI-MDP versus the baseline MDP, including standard deviations from multiple runs.

revision: yes

Referee: [Method and simulation setup] No information is given on training parity between AoI-MDP and the baseline MDP. The manuscript does not state whether the same number of episodes, wall-clock budget, or hyper-parameter tuning effort was used, despite AoI-MDP expanding the state space (by the AoI variable) and action space (by the wait action). Without explicit confirmation of comparable training resources, the superiority claim cannot be isolated to the modeling choice.

Authors: We acknowledge the need for explicit details on the experimental setup. The training was performed with equivalent resources: the same number of episodes and hyper-parameter settings were used for both models. We will update the manuscript to describe the simulation parameters in detail, confirming the matched training conditions and noting any differences due to the increased state-action space.

revision: yes

Circularity Check

0 steps flagged

No circularity in AoI-MDP proposal or simulation claims

The paper proposes AoI-MDP by augmenting a standard MDP state space with an age-of-information variable, adding a wait action, and folding AoI into the reward function, then reports simulation outperformance versus baseline MDP. No equations, parameter-fitting steps, self-citations, or uniqueness theorems are present that would reduce any claimed result to an input by construction. The approach relies on conventional RL components applied to the modified formulation, rendering the work self-contained against external benchmarks with no load-bearing circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are described. The approach relies on standard MDP transition and reward structures plus the addition of AoI terms whose functional form is not specified.
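For orientation, one common way to write such terms is shown below. The definition of age of information as the time elapsed since the freshest received observation was generated is standard; the augmented state, the action set, and the linear reward penalty with weight $\lambda$ are assumptions made for illustration, since the functional form used by the paper is not given here.

```latex
% AoI at time t, where u(t) is the generation time of the freshest received observation
\Delta(t) = t - u(t)

% Assumed AoI-augmented state, action set, and reward (\lambda \ge 0 is a hypothetical weight)
\tilde{s}_t = \bigl(s_{u(t)},\, \Delta(t)\bigr), \qquad
\tilde{\mathcal{A}} = \mathcal{A} \cup \{\text{wait}\}, \qquad
\tilde{r}_t = r(s_t, a_t) - \lambda\, \Delta(t)
```

Under this assumed form, $\lambda$ would count as a free parameter of the reward, and setting $\lambda = 0$ removes the freshness term from the reward while keeping the age in the state.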
