Enhancing Information Freshness: An AoI Optimized Markov Decision Process

Guanwen Xie; Jingzehua Xu; Shuai Zhang; Yimian Ding; Yiyuan Yang

arxiv: 2409.02424 · v4 · pith:7ARNLNSDnew · submitted 2024-09-04 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-23 21:02 UTC · model grok-4.3

keywords: age of information · markov decision process · autonomous underwater vehicles · reinforcement learning · observation delay · information freshness · multi-agent data collection

The pith

Modeling observation delays as statistical timing in MDP states, plus wait actions and AoI rewards, lets RL jointly minimize information age for multi-AUV underwater tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that underwater AUV tasks often fail because of observation delays from limited information in updating networks. It addresses this by building an AoI-MDP that treats those delays as timing components via a statistical formulation added to the state space, adds wait time to the action space, and folds AoI into the reward function. A sympathetic reader would care because the result is claimed to be joint optimization of freshness and decision quality during RL training. Simulations in a multi-AUV data collection scenario show the model reduces AoI and outperforms standard approaches.

Core claim

AoI-MDP models observation delay as timing delay through statistical delay formulation, includes this delay as a new component in the state space, introduces wait time in the action space, and integrates AoI with reward functions to achieve joint optimization that minimizes AoI while showing superior performance in the multi-AUV data collection task.

What carries the argument

The AoI-MDP, which augments MDP states with statistical delay components and rewards with AoI values to enable joint optimization of freshness and control via RL.
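
As a rough illustration of that structure, the sketch below wires the three additions into a toy environment: the delay is appended to the state, the action carries a wait time, and the reward subtracts an AoI penalty. Everything in it is an assumption made for illustration and is not taken from the paper or its released code: the names `AoIState` and `ToyAoIMDP`, the exponential delay model, the `aoi_weight` coefficient, and the one-dimensional task.

```python
import random
from dataclasses import dataclass


@dataclass
class AoIState:
    distance: float  # hypothetical task observation: distance left to a target
    delay: float     # latest observation delay, appended to the state
    aoi: float       # age of the information the agent currently holds


class ToyAoIMDP:
    """Toy environment showing the structure only; not the paper's simulator."""

    def __init__(self, mean_delay=1.0, aoi_weight=0.1, seed=0):
        self.mean_delay = mean_delay  # assumed mean of the delay distribution
        self.aoi_weight = aoi_weight  # assumed trade-off coefficient
        self.rng = random.Random(seed)
        self.state = AoIState(distance=10.0, delay=0.0, aoi=0.0)

    def reset(self):
        self.state = AoIState(distance=10.0, delay=0.0, aoi=0.0)
        return self.state

    def step(self, move, wait):
        """Apply control `move` after waiting `wait` >= 0 time units."""
        if wait < 0:
            raise ValueError("wait must be non-negative")
        # Assumed statistical delay model: exponential with a fixed mean.
        delay = self.rng.expovariate(1.0 / self.mean_delay)
        elapsed = wait + delay
        # Age grows linearly from its previous value until the new observation
        # arrives, so its mean over the step is the start value plus elapsed / 2.
        mean_age = self.state.aoi + elapsed / 2.0
        distance = max(0.0, self.state.distance - move)
        reward = -distance - self.aoi_weight * mean_age
        # On arrival, the new observation is already `delay` old.
        self.state = AoIState(distance=distance, delay=delay, aoi=delay)
        return self.state, reward, {"elapsed": elapsed, "mean_age": mean_age}


env = ToyAoIMDP(seed=0)
env.reset()
state, reward, info = env.step(move=1.0, wait=0.5)
print(state, reward, info)
```

In this toy, waiting only ever adds age, so it does not show when a nonzero wait pays off. It only shows where each component sits in the environment interface that an RL agent would train against.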

If this is right

The joint optimization produces lower AoI values during RL training for the data collection task.
The augmented MDP yields superior task performance metrics compared with standard RL formulations in the multi-AUV scenario.
The approach remains feasible under the modeled underwater conditions as demonstrated by the simulation results.
Open-sourced simulation code allows direct reproduction and extension of the AoI-MDP for similar networked control problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same state-augmentation technique could be tested on other delay-sensitive RL domains such as remote sensor networks or satellite tasking.
Real AUV hardware trials would reveal whether the statistical delay model remains predictive when acoustic channel statistics vary with depth and temperature.
Replacing the statistical delay with measured packet timestamps might tighten the state representation and further reduce residual AoI.
Extending the wait-time action to include variable transmission power levels could produce additional trade-offs between energy and freshness.

Load-bearing premise

The statistical delay formulation accurately represents the observation delay caused by information limitation in underwater updating networks.

What would settle it

A multi-AUV simulation run in which the AoI-MDP produces no lower average AoI or no higher task success rate than a baseline MDP that omits the statistical delay state component.
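
One way to make that check concrete is sketched below. It computes the time-average AoI from generation and delivery timestamps, using the standard definition of age as the current time minus the generation time of the freshest delivered update, and then applies the criterion stated above. The function names, the dictionary keys, and the timestamp traces are hypothetical and do not come from the paper's results.

```python
def time_average_aoi(generated, delivered, horizon):
    """Time-average AoI over [delivered[0], horizon] for a sawtooth age process.

    generated[i] is when update i was generated, delivered[i] when it arrived.
    Updates are assumed to arrive in order, with delivered[i] >= generated[i].
    """
    if len(generated) != len(delivered) or not generated:
        raise ValueError("need equal-length, non-empty traces")
    if horizon <= delivered[0]:
        raise ValueError("horizon must lie after the first delivery")
    area = 0.0
    for i, (g, d) in enumerate(zip(generated, delivered)):
        end = delivered[i + 1] if i + 1 < len(delivered) else horizon
        # Age rises from (d - g) to (end - g) on [d, end): a trapezoid.
        area += ((end - g) ** 2 - (d - g) ** 2) / 2.0
    return area / (horizon - delivered[0])


def claim_is_undermined(candidate, baseline):
    """True if the candidate shows no lower AoI or no higher success rate."""
    no_lower_aoi = candidate["avg_aoi"] >= baseline["avg_aoi"]
    no_higher_success = candidate["success_rate"] <= baseline["success_rate"]
    return no_lower_aoi or no_higher_success


# Hypothetical traces and success rates, for illustration only.
candidate = {
    "avg_aoi": time_average_aoi([0, 2, 4], [1, 3, 5], horizon=7),
    "success_rate": 0.9,
}
baseline = {
    "avg_aoi": time_average_aoi([0, 3], [2, 6], horizon=7),
    "success_rate": 0.8,
}
print(candidate["avg_aoi"], baseline["avg_aoi"])
print(claim_is_undermined(candidate, baseline))
```

A single pair of runs would not settle anything on its own; the comparison is only meaningful when both numbers are averaged over repeated runs.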

Figures

Figures reproduced from arXiv: 2409.02424 by Guanwen Xie, Jingzehua Xu, Shuai Zhang, Yimian Ding, Yiyuan Yang.

**Figure 2.** Illustration of the azimuth and time delay estimation.

**Figure 4.** Comparison of experimental results using online and

**Figure 5.** The AUV trajectories using the expert policy trained

Original abstract

Ocean exploration utilizing autonomous underwater vehicles (AUVs) via reinforcement learning (RL) has emerged as a significant research focus. However, underwater tasks have mostly failed due to the observation delay caused by information limitation in the information updating networks. In this study, we present an AoI optimized Markov decision process (AoI-MDP) to improve the performance of underwater tasks. Specifically, AoI-MDP models observation delay as timing delay through statistical delay formulation, and includes this delay as a new component in the state space. Additionally, we introduce wait time in the action space, and integrate AoI with reward functions to achieve joint optimization of information freshness and decision-making for AUVs leveraging RL for training. Finally, we apply this approach to the multi-AUV data collection task scenario as an example. Simulation results highlight the feasibility of AoI-MDP, which effectively minimizes AoI while showcasing superior performance in the task. To accelerate relevant research in this field, we have made the simulation codes available as open-source.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper folds AoI into an MDP for multi-AUV data collection by adding a statistical delay to the state and AoI to the reward, but the delay model itself is asserted without derivation or data backing.

The main takeaway is that this work extends MDP with AoI components for underwater AUV tasks, yet the central modeling step for observation delay rests on an unexamined statistical formulation. They treat the delay as a timing delay, insert it into the state, add wait time to the action space, and blend AoI into the reward so the RL agent optimizes both freshness and task success in a data-collection scenario. The code is released, which lets others inspect the implementation directly. That is the concrete addition here: an application-specific MDP that tries to handle information aging in a setting where acoustic links are slow and lossy.

Prior AoI-MDP papers exist in wireless control, so the novelty is mainly the domain choice rather than a new framework. The open code and the focus on a documented failure mode in AUV RL are the parts that hold up.

The soft spot is exactly the one flagged in the stress test. The statistical delay distribution is introduced as the new state element without showing how it comes from channel physics, without fitting to measured underwater traces, and without sensitivity runs. If that distribution does not match the actual information-limitation effects, the MDP state is mis-specified and any performance gain is hard to interpret. The abstract claims superior simulation results, but supplies no baseline details, run counts, or variance, so the size of the improvement cannot be judged yet.

This paper is aimed at people already working on RL for ocean vehicles or on AoI in networked control. A reader who needs an example of how to wire AoI into an MDP reward and state can extract the structure and adapt the delay part themselves. It is incremental rather than foundational, but the problem is real and the code is available, so it clears the bar for peer review. I would send it to referees with a request to strengthen the justification and validation of the delay model.

Referee Report

2 major / 1 minor

Summary. The paper proposes an AoI-optimized Markov decision process (AoI-MDP) for multi-AUV underwater data collection. It models observation delay caused by information limitation via a statistical delay formulation, augments the MDP state space with this delay, adds wait time to the action space, and incorporates AoI into the reward function. RL is used to train policies that jointly minimize AoI and optimize task performance; simulations are reported to show feasibility and superiority over unspecified baselines, with code released as open source.

Significance. If the statistical delay model is shown to be accurate and the performance gains are reproducible, the framework could offer a structured way to trade off information freshness against task metrics in delay-sensitive RL settings for underwater networks. The open-source simulation code is a concrete strength that supports reproducibility.

major comments (2)

[Abstract] The central modeling step asserts that observation delay is captured by a 'statistical delay formulation' and inserted directly as a state component, yet no derivation from acoustic channel physics, no citation to empirical underwater data, and no sensitivity analysis are supplied. This formulation is load-bearing for the claimed state-space augmentation and joint optimization.
[Abstract] The claim of 'superior performance' in the multi-AUV task rests on simulation results, but the abstract supplies neither the baseline policies, the quantitative metrics (e.g., average AoI, task completion rate), nor any error bars or statistical tests, preventing assessment of whether the reported gains are robust.
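
The kind of reporting asked for in the second comment can be sketched in a few lines. The example below summarizes per-seed average AoI as a mean with its standard error and computes Welch's t statistic between two policies. The per-seed values are hypothetical placeholders and not results from the paper, and the helper names `summarize` and `welch_t` are chosen here for illustration.

```python
import math
import statistics


def summarize(values):
    """Return (mean, standard error of the mean) for per-seed results."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a, sem_a = summarize(a)
    mean_b, sem_b = summarize(b)
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


# Hypothetical per-seed average AoI values, for illustration only.
aoi_mdp_runs = [2.0, 2.2, 1.8, 2.1, 1.9]
baseline_runs = [3.0, 3.1, 2.9, 3.2, 2.8]

for name, runs in [("AoI-MDP", aoi_mdp_runs), ("baseline", baseline_runs)]:
    mean, sem = summarize(runs)
    print(f"{name}: {mean:.3f} +/- {sem:.3f} (n={len(runs)})")
print("Welch t:", welch_t(aoi_mdp_runs, baseline_runs))
```

Turning the statistic into a p-value needs the Welch-Satterthwaite degrees of freedom and a t distribution, which the standard library does not provide, so that step is left out here.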

minor comments (1)

[Abstract] The abstract states that codes are made available but does not provide the repository URL or license information.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate where revisions will be made to the manuscript.

Referee: [Abstract] The central modeling step asserts that observation delay is captured by a 'statistical delay formulation' and inserted directly as a state component, yet no derivation from acoustic channel physics, no citation to empirical underwater data, and no sensitivity analysis are supplied. This formulation is load-bearing for the claimed state-space augmentation and joint optimization.

Authors: We agree that the statistical delay formulation is central and would benefit from additional grounding. The full manuscript introduces it as a general abstraction for observation delays arising from information limitations in underwater networks. We will revise the manuscript to add a brief motivation paragraph linking the formulation to acoustic propagation effects, include citations to empirical underwater channel studies, and incorporate a sensitivity analysis on the delay parameters to assess impact on the state augmentation and optimization. Revision: yes.

Referee: [Abstract] The claim of 'superior performance' in the multi-AUV task rests on simulation results, but the abstract supplies neither the baseline policies, the quantitative metrics (e.g., average AoI, task completion rate), nor any error bars or statistical tests, preventing assessment of whether the reported gains are robust.

Authors: The abstract's length constraints prevented inclusion of these specifics. We will revise the abstract to name the baseline policies (standard MDP without AoI components), report key quantitative metrics such as average AoI reduction and task completion rates, and note that results are averaged over multiple runs with error bars presented in the full simulation section. This will allow better evaluation of robustness. Revision: yes.
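
A sensitivity analysis of the kind promised in the first response could start from something as small as the sweep below. It varies the mean of an assumed exponential delay and reports the resulting time-average AoI under a zero-wait policy, where each new update is requested the moment the previous one arrives. The delay model, the policy, and the function names are assumptions for illustration and are not the paper's setup.

```python
import random


def zero_wait_average_aoi(mean_delay, n_updates=20000, seed=0):
    """Time-average AoI when each update is requested as the last one arrives."""
    rng = random.Random(seed)
    # Assumed delay model: i.i.d. exponential delays with the given mean.
    delays = [rng.expovariate(1.0 / mean_delay) for _ in range(n_updates)]
    area = 0.0
    total_time = 0.0
    for prev, cur in zip(delays, delays[1:]):
        # Between consecutive deliveries the age grows from prev to prev + cur
        # over an interval of length cur, which is a trapezoid of this area.
        area += prev * cur + cur * cur / 2.0
        total_time += cur
    return area / total_time


def sweep(mean_delays):
    """Map each candidate mean delay to its simulated average AoI."""
    return {m: zero_wait_average_aoi(m) for m in mean_delays}


results = sweep([0.5, 1.0, 2.0])
for mean_delay, avg_aoi in results.items():
    print(f"mean delay {mean_delay}: average AoI {avg_aoi:.3f}")
```

For i.i.d. exponential delays under zero wait, the time-average AoI should come out near twice the mean delay, which gives a quick sanity check on the simulation before a learned policy or a different delay distribution is swapped in.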

Circularity Check

0 steps flagged

No circularity; additive MDP extension is self-contained

The paper constructs AoI-MDP by adding a statistical delay term to the state, wait time to the action space, and AoI to the reward, then applies RL. No equations, fitted parameters, or self-citations are shown that reduce the claimed joint optimization or performance gains to the inputs by construction. The statistical delay formulation is asserted as an input rather than derived from prior results of the same authors, so the derivation chain does not collapse.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; no explicit free parameters, invented physical entities, or non-standard axioms are identifiable beyond routine MDP assumptions.

axioms (1)

domain assumption: A Markov decision process formulation is appropriate for modeling AUV decision-making under uncertainty and delays.
The paper builds the entire method on an MDP whose state and action spaces are augmented for the underwater task.

pith-pipeline@v0.9.0 · 5719 in / 1200 out tokens · 34368 ms · 2026-05-23T21:02:46.394103+00:00 · methodology

