pith. sign in

arxiv: 2605.21016 · v1 · pith:CA5J7NDNnew · submitted 2026-05-20 · 💻 cs.IT · cs.NI· math.IT

Partially Observable Restless Bandits for Age-Optimal Scheduling over Markov Channels

Pith reviewed 2026-05-21 02:18 UTC · model grok-4.3

classification 💻 cs.IT cs.NImath.IT
keywords age of informationrestless banditsWhittle indexMarkov channelspartially observableschedulingIoTbandwidth constraint
0
0 comments X

The pith

Threshold structure on belief states proves indexability and enables Whittle index computation for minimizing age of information in partially observable Markov channels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how to schedule status updates from multiple IoT devices to a central controller when the time-varying channels are Markovian but their states cannot be observed before each decision, and only a limited number of transmissions are allowed per slot. It casts the resulting age-of-information minimization task as a constrained partially observable restless bandit problem. Lagrangian relaxation removes the hard bandwidth limit and splits the problem into independent single-device subproblems whose optimal policies turn out to be threshold rules on the posterior belief that the channel is good. This threshold property establishes indexability, so Whittle indices can be calculated and used to rank devices for scheduling; a simpler closed-form index is also supplied to lower complexity. Simulations show the resulting policies outperform common baselines and stay close to the relaxed performance bound, especially when bandwidth is tight or the network is large.

Core claim

The optimal policy for each decoupled subproblem after Lagrangian relaxation is a threshold policy on the belief state that tracks the probability the Markov channel is in the good state. Because this structure holds, each subproblem is indexable, so a Whittle index exists and can be computed numerically for the age objective; a closed-form Whittle-like index is further derived for low-complexity use. The indices produce a scheduling rule that approximately solves the original bandwidth-constrained problem.

What carries the argument

The threshold structure of the optimal policy on the belief state for each decoupled subproblem, which establishes indexability and permits direct computation of Whittle's index.

If this is right

  • The index policy can be computed once and then used for low-complexity online scheduling decisions.
  • Performance stays near the relaxed lower bound and improves relative to baselines when bandwidth is scarce.
  • The closed-form index variant further cuts computation while preserving most of the gain.
  • Gains become more pronounced as the number of devices increases or available slots decrease.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same threshold argument may apply to other partially observable scheduling objectives such as energy or delay.
  • In deployed networks the belief state could be maintained from delayed acknowledgments or occasional probes.
  • The framework suggests index policies remain practical even when full channel observation is impossible.
  • Periodic recomputation of indices from fresh channel statistics would keep the policy current in time-varying environments.

Load-bearing premise

The policy obtained from the relaxed problem remains close enough in performance to the true optimum under the original hard bandwidth constraint.

What would settle it

Solve the exact POMDP for a small instance with two or three devices, compute the long-run average AoI under both the Whittle index policy and the true optimal policy, and check whether the gap stays small; a large gap would show the indices do not reliably approximate the constrained optimum.

Figures

Figures reproduced from arXiv: 2605.21016 by Chao Xu, Shuying Gan, Xiang Chen, Xiaoyu Zhao, Xijun Wang, Yanzhi Huang.

Figure 1
Figure 1. Figure 1: Illustration of system model. of the channel between device i and the destination in slot t. Particularly, if the channel of device i is in the GOOD state in slot t, denoted by hi(t) = 1, the status update packet will be successfully received by the destination; while if the channel of device i is in the BAD state in slot t, denoted by hi(t) = 0, the transmission will be failed. If the status update of dev… view at source ↗
Figure 2
Figure 2. Figure 2: An illustration of the dynamics of AoI and belief in a time-slotted [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The state transition of the DTMC induced by policy [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The state transition of the DTMC induced by policy [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The states transition under a threshold policy with the threshold [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: The total time-average AoI versus the number of scheduled devices [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The total time-average AoI versus the number of total devices [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: The total time-average AoI varying with time [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗
read the original abstract

There is a surge of need for fresh information with the overwhelming proliferation of the Internet of Things (IoT) applications. To characterize the information freshness perceived by the destination, the age of information (AoI) has been proposed. In this paper, we consider an IoT system with multiple devices sending status update packets to a central controller through time-correlated Markov channels and assume that the instantaneous channel states are not available to the central controller before making scheduling decisions. To ensure information freshness, we investigate a timely scheduling problem that minimizes the total expected time-average AoI under a strict communications bandwidth constraint. We formulate this problem as a partially observable restless multi-armed bandit problem. Using Lagrangian relaxation, we decouple the relaxed problem into multiple sub-problems and prove the threshold structure of their optimal policies. Armed with this property, we establish the indexability for the decoupled problem and design an algorithm to compute the Whittle's index. To reduce implementation complexity, we further derive the Whittle-like index in closed-form for low-complexity scheduling. Simulation results show that the proposed index-based policies outperform the baselines, remain close to the optimal policy or relaxed lower bound, and are especially effective when scheduling resources are limited or the network size is large.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper formulates AoI minimization under a hard bandwidth constraint and partial observability of Markov channels as a POMDP restless bandit problem. It applies Lagrangian relaxation to obtain decoupled subproblems, proves that their optimal policies have a threshold structure on the belief state, uses this to establish indexability, computes Whittle indices, derives a closed-form Whittle-like index for low complexity, and reports simulation results showing that the resulting index policies outperform baselines and stay close to the relaxed lower bound.

Significance. If the threshold-structure and indexability proofs hold and the indices remain effective under the original cardinality constraint, the work supplies a scalable, theoretically justified scheduler for large-scale IoT status-update systems with correlated but unobservable channels. The closed-form index derivation and the explicit handling of partial observability are concrete strengths that would be useful to the community.

major comments (2)
  1. [Section on application of the index policy to the constrained problem (after index computation)] The central claim that the derived Whittle indices yield an effective policy for the original hard-constrained problem rests on the unproven assumption that the Lagrangian relaxation remains sufficiently tight when the bandwidth limit is active. No bound on the duality gap is provided, and the continuous belief-state dynamics make accumulation of violations possible; this step is load-bearing for the practical contribution.
  2. [Proof of threshold structure and indexability] The threshold-structure proof and indexability result are stated only for the decoupled subproblems obtained after Lagrangian relaxation. The manuscript does not supply a separate argument showing that the same structural properties survive the re-imposition of the cardinality constraint, which is required to justify direct use of the indices on the original problem.
minor comments (2)
  1. [Simulation results] The simulation section does not report the number of Monte Carlo runs, the exact procedure used to fit the Markov channel parameters, or any data-exclusion criteria; these details are needed to assess reproducibility of the reported performance gaps.
  2. [Formulation and Lagrangian relaxation] Notation for the belief-state update and the subsidy parameter in the relaxed subproblems should be introduced with an explicit equation reference to avoid ambiguity when the index computation is described.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope of our theoretical results. We address the major comments point by point below.

read point-by-point responses
  1. Referee: [Section on application of the index policy to the constrained problem (after index computation)] The central claim that the derived Whittle indices yield an effective policy for the original hard-constrained problem rests on the unproven assumption that the Lagrangian relaxation remains sufficiently tight when the bandwidth limit is active. No bound on the duality gap is provided, and the continuous belief-state dynamics make accumulation of violations possible; this step is load-bearing for the practical contribution.

    Authors: We agree that no explicit bound on the duality gap is derived for the original cardinality-constrained POMDP. In the Whittle-index literature for restless bandits, the Lagrangian relaxation is used to obtain a lower bound and to compute per-arm indices; the resulting index policy is then applied directly to the constrained problem, with performance typically validated through simulation or asymptotic analysis as the number of arms increases. Our simulations show that the index policy stays close to the relaxed lower bound and outperforms baselines, especially under tight bandwidth. In the revision we will add an explicit remark in the discussion section acknowledging the absence of a duality-gap bound for the continuous-belief case and noting that the practical effectiveness is supported by the reported numerical results. revision: partial

  2. Referee: [Proof of threshold structure and indexability] The threshold-structure proof and indexability result are stated only for the decoupled subproblems obtained after Lagrangian relaxation. The manuscript does not supply a separate argument showing that the same structural properties survive the re-imposition of the cardinality constraint, which is required to justify direct use of the indices on the original problem.

    Authors: The threshold structure and indexability are proved for the individual relaxed subproblems; this is the standard route to establishing Whittle indexability in restless-bandit problems. The indices are then used to rank arms and select the top subset that satisfies the bandwidth constraint, which is the conventional definition of the Whittle index policy for the original constrained problem. We do not assert that the threshold property holds for the joint constrained POMDP. We will revise the manuscript to state this distinction more explicitly in the section introducing the index policy and to reference the standard application of Whittle indices to cardinality-constrained problems. revision: partial

Circularity Check

0 steps flagged

Derivation of threshold structure and Whittle indices remains self-contained for the relaxed subproblems

full rationale

The paper formulates the AoI scheduling task as a POMDP restless bandit, applies Lagrangian relaxation to decouple into per-device subproblems, and proves threshold structure on the belief state for the optimal policies of those subproblems. Indexability follows directly from the monotonicity properties established in the relaxed value functions, and the Whittle index is computed from the resulting subsidy thresholds. The subsequent use of the index policy on the original hard-constrained problem is presented as an approximation whose performance is validated by simulation against the relaxed lower bound; no equation or step equates the final index policy to a fitted parameter or to a self-referential definition of the target metric. The Lagrange multiplier is selected to satisfy the average bandwidth constraint in the relaxed sense, which is a standard external tuning step rather than an internal reduction of the claimed result.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard MDP and POMDP theory plus the modeling choice that channels are finite-state Markov and that the Lagrangian multiplier can be tuned to enforce the bandwidth constraint. No new physical entities are postulated.

free parameters (1)
  • Lagrange multiplier
    Chosen to satisfy the average bandwidth constraint; its value is not derived from first principles but adjusted to meet the resource limit.
axioms (2)
  • domain assumption Channel states evolve as time-homogeneous Markov chains even when the device is not scheduled.
    Invoked in the system model to justify the restless-bandit formulation.
  • standard math The belief state is a sufficient statistic for the partially observable MDP.
    Standard POMDP result used when reducing the problem to belief-state thresholds.

pith-pipeline@v0.9.0 · 5767 in / 1584 out tokens · 28139 ms · 2026-05-21T02:18:22.093708+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1]

    User scheduling for infor- mation freshness over correlated markov channels,

    Y . Huang, X. Wang, X. Sun, and X. Chen, “User scheduling for infor- mation freshness over correlated markov channels,” inProc. IEEE/CIC ICCC, Chongqing, China, Aug. 2020, pp. 1080–1085

  2. [2]

    Latency Critical IoT Applications in 5G: Perspective on the Design of Radio Interface and Network Architecture,

    P. Schulz, M. Matthe, H. Klessig, M. Simsek, G. Fettweis, J. Ansari, S. A. Ashraf, B. Almeroth, J. V oigt, I. Riedel, A. Puschmann, A. Mitschele-Thiel, M. Muller, T. Elste, and M. Windisch, “Latency Critical IoT Applications in 5G: Perspective on the Design of Radio Interface and Network Architecture,”IEEE Commun. Mag., vol. 55, no. 2, pp. 70–78, Feb. 2017

  3. [3]

    Critical Internet of Things: An Interworking Solution to Improve Service Reliability,

    C. K. Wu, K. F. Tsang, Y . Liu, H. Zhu, H. Wang, and Y . Wei, “Critical Internet of Things: An Interworking Solution to Improve Service Reliability,”IEEE Commun. Mag., vol. 58, no. 1, pp. 74–79, Jan. 2020

  4. [4]

    AI-Assisted Low Information Latency Wireless Networking,

    Z. Jiang, S. Fu, S. Zhou, Z. Niu, S. Zhang, and S. Xu, “AI-Assisted Low Information Latency Wireless Networking,”IEEE Wirel. Commun., vol. 27, no. 1, pp. 108–115, Feb. 2020

  5. [5]

    Real-time status: How often should one update?

    S. Kaul, R. Yates, and M. Gruteser, “Real-time status: How often should one update?” inProc. IEEE INFOCOM, Orlando, FL, USA, Mar. 2012, pp. 2731–2735

  6. [6]

    Optimizing Infor- mation Freshness in Computing-Enabled IoT Networks,

    C. Xu, H. H. Yang, X. Wang, and T. Q. S. Quek, “Optimizing Infor- mation Freshness in Computing-Enabled IoT Networks,”IEEE Internet Things J., vol. 7, no. 2, pp. 971–985, Feb. 2020

  7. [7]

    UA V-Aided Data Collection for Information Freshness in Wireless Sensor Networks,

    J. Liu, P. Tong, X. Wang, B. Bai, and H. Dai, “UA V-Aided Data Collection for Information Freshness in Wireless Sensor Networks,” IEEE Trans. Wirel. Commun., vol. 20, no. 4, pp. 2368–2382, 2021

  8. [8]

    Under- standing Age of Information in Large-Scale Wireless Networks,

    H. H. Yang, C. Xu, X. Wang, D. Feng, and T. Q. S. Quek, “Under- standing Age of Information in Large-Scale Wireless Networks,”IEEE Trans. Wirel. Commun., vol. 20, no. 5, pp. 3196–3210, 2021

  9. [9]

    Optimal Status Update for Caching Enabled IoT Networks: A Dueling Deep R-Network Approach,

    C. Xu, Y . Xie, X. Wang, H. H. Yang, D. Niyato, and T. Q. S. Quek, “Optimal Status Update for Caching Enabled IoT Networks: A Dueling Deep R-Network Approach,”IEEE Trans. Wirel. Commun., vol. 20, no. 12, pp. 8438–8454, 2021

  10. [10]

    When to Preprocess? Keeping Information Fresh for Computing Enable Internet of Things,

    X. Wang, M. Fang, C. Xu, H. H. Yang, X. Sun, X. Chen, and T. Q. S. Quek, “When to Preprocess? Keeping Information Fresh for Computing Enable Internet of Things,”IEEE Internet Things J., vol. 9, no. 6, pp. 4303–4317, 2022

  11. [11]

    Age of Changed Information: Content-Aware Status Updating in the Internet of Things,

    X. Wang, W. Lin, C. Xu, X. Sun, and X. Chen, “Age of Changed Information: Content-Aware Status Updating in the Internet of Things,” IEEE Trans. Commun., vol. 70, no. 1, pp. 578–591, 2022

  12. [12]

    Scheduling Policies for Minimizing Age of Information in Broadcast Wireless Networks,

    I. Kadota, A. Sinha, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, “Scheduling Policies for Minimizing Age of Information in Broadcast Wireless Networks,”IEEE/ACM Trans. Netw., vol. 26, no. 6, pp. 2637– 2650, 2018

  13. [13]

    Minimizing the Age of Information in Wire- less Networks with Stochastic Arrivals,

    I. Kadota and E. Modiano, “Minimizing the Age of Information in Wire- less Networks with Stochastic Arrivals,” inProc. Mobihoc. Catania, Italy: ACM Press, 2019, pp. 221–230

  14. [14]

    Closed-Form Whittle’s Index-Enabled Random Access for Timely Status Update,

    J. Sun, Z. Jiang, B. Krishnamachari, S. Zhou, and Z. Niu, “Closed-Form Whittle’s Index-Enabled Random Access for Timely Status Update,” IEEE Trans. Commun., vol. 68, no. 3, pp. 1538–1551, Mar. 2020

  15. [15]

    Scheduling Algorithms for Minimizing Age of Information in Wireless Broadcast Networks with Random Arrivals,

    Y .-P. Hsu, E. Modiano, and L. Duan, “Scheduling Algorithms for Minimizing Age of Information in Wireless Broadcast Networks with Random Arrivals,”IEEE Trans. Mob. Comput., vol. 19, no. 12, pp. 2903– 2915, 2020

  16. [16]

    On The Optimality of The Whittle’s Index Policy For Minimizing The Age of Information,

    A. Maatouk, S. Kriouile, M. Assaad, and A. Ephremides, “On The Optimality of The Whittle’s Index Policy For Minimizing The Age of Information,”IEEE Trans. Wirel. Commun., vol. 20, no. 2, pp. 1263– 1277, 2021

  17. [17]

    Scheduling algorithms for optimizing age of information in wireless networks with throughput constraints,

    I. Kadota, A. Sinha, and E. Modiano, “Scheduling algorithms for optimizing age of information in wireless networks with throughput constraints,”IEEE/ACM Trans. Netw., vol. 27, no. 4, pp. 1359–1372, 2019

  18. [18]

    Minimum Age of Information in the Internet of Things With Non-Uniform Status Packet Sizes,

    B. Zhou and W. Saad, “Minimum Age of Information in the Internet of Things With Non-Uniform Status Packet Sizes,”IEEE Trans. Wirel. Commun., vol. 19, no. 3, pp. 1933–1947, Mar. 2020

  19. [19]

    A Reinforcement Learning Approach to Age of Information in Multi-User Networks With HARQ,

    E. T. Ceran, D. Gunduz, and A. Gyorgy, “A Reinforcement Learning Approach to Age of Information in Multi-User Networks With HARQ,” IEEE J. Sel. Areas Commun., vol. 39, no. 5, pp. 1412–1426, May 2021

  20. [20]

    Minimizing Age of Information via Scheduling over Heterogeneous Channels,

    J. Pan, A. M. Bedewy, Y . Sun, and N. B. Shroff, “Minimizing Age of Information via Scheduling over Heterogeneous Channels,” ArXiv201209403 Cs Math, Jan. 2021

  21. [21]

    Age-Optimal Low-Power Status Update over Time-Correlated Fading Channel,

    G. Yao, A. M. Bedewy, and N. B. Shroff, “Age-Optimal Low-Power Status Update over Time-Correlated Fading Channel,”ArXiv201202958 Cs Math, Dec. 2020

  22. [22]

    Minimizing Age of Information With Power Constraints: Multi-User Opportunistic Scheduling in Multi- State Time-Varying Channels,

    H. Tang, J. Wang, L. Song, and J. Song, “Minimizing Age of Information With Power Constraints: Multi-User Opportunistic Scheduling in Multi- State Time-Varying Channels,”IEEE J. Sel. Areas Commun., vol. 38, no. 5, pp. 854–868, May 2020

  23. [23]

    Channel Sensing and Communication Over a Time-Correlated Channel With an Energy Harvesting Transmitter,

    M. Salehi Heydar Abad, O. Ercetin, and D. Gündüz, “Channel Sensing and Communication Over a Time-Correlated Channel With an Energy Harvesting Transmitter,”IEEE Trans. Green Commun. Netw., vol. 2, no. 1, pp. 114–126, Mar. 2018

  24. [24]

    Indexability of restless bandit problems and optimality of whittle index for dynamic multichannel access,

    K. Liu and Q. Zhao, “Indexability of restless bandit problems and optimality of whittle index for dynamic multichannel access,”IEEE Trans. Inf. Theory, vol. 56, no. 11, pp. 5547–5567, 2010

  25. [25]

    Bandit processes and dynamic allocation indices,

    J. C. Gittins, “Bandit processes and dynamic allocation indices,”J. R. Stat. Soc., Ser. B, vol. 41, no. 2, pp. 148–177, 1979

  26. [26]

    Restless Bandits: Activity Allocation in a Changing World,

    P. Whittle, “Restless Bandits: Activity Allocation in a Changing World,” J. Appl. Probab., vol. 25, pp. 287–298, 1988

  27. [27]

    Restless bandits, partial conservation laws and indexa- bility,

    J. Niño-Mora, “Restless bandits, partial conservation laws and indexa- bility,”Adv. Appl. Probab., vol. 33, no. 1, pp. 76–98, 2001

  28. [28]

    Dynamic priority allocation via restless bandit marginal produc- tivity indices,

    ——, “Dynamic priority allocation via restless bandit marginal produc- tivity indices,”TOP, vol. 15, no. 2, pp. 161–198, 2007

  29. [29]

    Asymptotically optimal downlink scheduling over markovian fading channels,

    W. Ouyang, A. Eryilmaz, and N. B. Shroff, “Asymptotically optimal downlink scheduling over markovian fading channels,” inProc. IEEE INFOCOM, Orlando, FL, USA, Mar. 2012, pp. 1224–1232

  30. [30]

    The complexity of optimal queuing network control,

    C. H. Papadimitriou and J. N. Tsitsiklis, “The complexity of optimal queuing network control,”Math Oper. Res., vol. 24, no. 2, pp. 293–305, 1999

  31. [31]

    The optimal control of partially observable markov processes over a finite horizon,

    R. D. Smallwood and E. J. Sondik, “The optimal control of partially observable markov processes over a finite horizon,”Oper. Res., vol. 21, no. 5, pp. 1071–1088, 1973

  32. [32]

    Average Cost Optimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs,

    L. I. Sennott, “Average Cost Optimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs,”Oper. Res., vol. 37, no. 4, pp. 626–633

  33. [33]

    A counterexample on the optimality equation in markov decision chains with the average cost criterion,

    R. Cavazos-Cadena, “A counterexample on the optimality equation in markov decision chains with the average cost criterion,”Syst. Control. Lett., vol. 16, no. 5, pp. 387–392, 1991

  34. [34]

    Bertsekas,Dynamic Programming and Optimal Control-II, 3rd ed

    Dimitri P. Bertsekas,Dynamic Programming and Optimal Control-II, 3rd ed. Athena Scientific, 2007, vol. II

  35. [35]

    On computing average cost optimal policies with appli- cation to routing to parallel queues,

    L. I. Sennott, “On computing average cost optimal policies with appli- cation to routing to parallel queues,”Math Method Oper. Res., vol. 45, no. 1, pp. 45–62, 1997