pith. machine review for the scientific record.

arxiv: 2605.08674 · v1 · submitted 2026-05-09 · 📡 eess.SP

Recognition: no theorem link

Fair and Efficient Scheduling for Sensor Networks via Online Whittle Index Policy


Pith reviewed 2026-05-12 01:02 UTC · model grok-4.3

classification 📡 eess.SP
keywords sensor networks · wake-up radio · age of incorrect information · whittle index · restless multi-armed bandit · online scheduling · energy efficiency

The pith

An online Whittle index policy using Age of Incorrect Information cuts sensor network transmissions by up to 70 percent compared to round-robin polling while keeping estimation errors within acceptable limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to reduce energy use and storage demands in wake-up radio sensor networks by polling only those nodes whose data meaningfully corrects the remote monitor's view of the monitored process. It casts the choice of which nodes to poll as a restless multi-armed bandit problem and replaces the usual requirement for known transition probabilities with an online state-estimation step that learns the necessary indices on the fly. This produces two policies, WAoII and its fair variant FWAoII, that adapt polling to actual information value rather than fixed rotation. Experiments on both real-world traces and synthetic data show the resulting schedule transmits far fewer packets than round-robin while the monitor's root-mean-square error stays inside application tolerances.

Core claim

The paper establishes that an online state-estimation procedure can compute Whittle indices for the Age of Incorrect Information metric without prior knowledge of transition dynamics, yielding WAoII and FWAoII policies that schedule node polling in wake-up radio networks. These policies reduce packet transmissions by up to 70 percent relative to round-robin polling while keeping root-mean-square error within acceptable application tolerances on both real and synthetic data sets.
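
As a rough illustration of the metric behind this claim, the sketch below tracks AoII in its common linear-penalty form: the age grows by one in every slot where the monitor's estimate disagrees with the source, and resets to zero once they agree. The function name `aoii_trace` and the tolerance parameter `tol` are assumptions made for illustration; the paper's exact penalty function may differ.

```python
def aoii_trace(true_values, estimates, tol=0.0):
    """Return the per-slot AoII for paired source values and monitor estimates.

    Assumption: linear time penalty, and an estimate counts as "incorrect"
    when it deviates from the source by more than `tol` (hypothetical knob).
    """
    age = 0
    trace = []
    for x, x_hat in zip(true_values, estimates):
        if abs(x - x_hat) > tol:
            age += 1   # estimate is still wrong: the penalty keeps growing
        else:
            age = 0    # estimate is correct again: the penalty resets
        trace.append(age)
    return trace


source = [1, 1, 2, 2, 2, 1]
monitor = [1, 1, 1, 1, 2, 2]
print(aoii_trace(source, monitor))  # [0, 0, 1, 2, 0, 1]
```

Unlike plain Age of Information, the value stays at zero for as long as the monitor's view is correct, however old the last update is.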

What carries the argument

The online Whittle Index AoII (WAoII) policy, derived by estimating unknown transition dynamics from observed states and then applying the index policy of the resulting restless multi-armed bandit formulation of AoII minimization.
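
The selection step of an index policy can be sketched as follows, assuming the per-node indices have already been computed by some other routine. The sketch does not compute Whittle indices; `select_nodes_to_poll`, `round_robin`, and `budget` are hypothetical names, and the index values in the example are made up.

```python
def select_nodes_to_poll(indices, budget):
    """Index policy step: poll the `budget` nodes with the largest index.

    Ties are broken in favour of the lower node id (Python's sort is stable).
    """
    ranked = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(ranked[:budget])


def round_robin(n_nodes, budget, slot):
    """Baseline: poll nodes in fixed rotation, `budget` per slot.

    Assumes budget <= n_nodes so that no node is listed twice in a slot.
    """
    start = (slot * budget) % n_nodes
    return [(start + j) % n_nodes for j in range(budget)]


indices = [0.2, 1.5, 0.0, 0.9]           # illustrative index values only
print(select_nodes_to_poll(indices, 2))  # [1, 3]
print(round_robin(4, 2, slot=1))         # [2, 3]
```

The contrast is that round-robin ignores the state of each node entirely, whereas the index policy re-ranks nodes every slot.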

Load-bearing premise

The online state-estimation step recovers enough information about the unknown transition dynamics to produce reliable Whittle indices that correctly rank which nodes to poll.
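
The simulated rebuttal further down this page describes the estimator as online frequency counts of observed transitions. A minimal sketch of that idea, assuming a finite state space and that consecutive observations of a node are available, might look like this. The class `TransitionEstimator` and its `prior` smoothing parameter are assumptions for illustration, not the authors' implementation.

```python
class TransitionEstimator:
    """Estimate a Markov transition matrix from observed (s, s_next) pairs."""

    def __init__(self, n_states, prior=0.0):
        self.n = n_states
        self.prior = prior  # optional additive smoothing (an assumption)
        self.counts = [[0] * n_states for _ in range(n_states)]

    def update(self, s, s_next):
        """Record one observed transition from state s to state s_next."""
        self.counts[s][s_next] += 1

    def matrix(self):
        """Return row-normalised counts; unvisited rows fall back to uniform."""
        rows = []
        for row in self.counts:
            total = sum(row) + self.prior * self.n
            if total == 0:
                rows.append([1.0 / self.n] * self.n)
            else:
                rows.append([(c + self.prior) / total for c in row])
        return rows


est = TransitionEstimator(n_states=2)
observed = [0, 0, 1, 0, 1, 1, 0, 0]
for s, s_next in zip(observed, observed[1:]):
    est.update(s, s_next)
print(est.matrix()[0])  # [0.5, 0.5]
```

In a polled network a node's state is only seen when it is polled, so rarely polled nodes accumulate counts slowly; that is one reason the premise above is load-bearing.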

What would settle it

A controlled deployment in which the state estimator converges to inaccurate transition estimates and the resulting WAoII policy either transmits at least as many packets as round-robin or produces root-mean-square error above the stated application tolerance.
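
The criterion above can be written as a simple check. This is a sketch under stated assumptions: the helper names `rmse`, `transmission_reduction`, and `claim_refuted` are hypothetical, and the tolerance has to come from the application because the page does not give a value.

```python
import math


def rmse(true_values, reconstructed):
    """Root-mean-square error between the source series and the monitor's view."""
    n = len(true_values)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(true_values, reconstructed)) / n)


def transmission_reduction(tx_policy, tx_rr):
    """Fraction of packets saved relative to round-robin (0.7 means 70 percent)."""
    return 1.0 - tx_policy / tx_rr


def claim_refuted(tx_policy, tx_rr, rmse_value, tolerance):
    """True if the policy sends at least as many packets as round-robin,
    or if its RMSE exceeds the application tolerance."""
    return tx_policy >= tx_rr or rmse_value > tolerance


# Illustrative numbers only, not results from the paper.
print(claim_refuted(tx_policy=300, tx_rr=1000, rmse_value=0.4, tolerance=0.5))  # False
```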

Figures

Figures reproduced from arXiv: 2605.08674 by Anita Khadka, Saurav Sthapit, Seong Ki Yoo, Sokipriala Jonah.

Figure 1: WSNs with wake-up radio, where the edge node can
Figure 2: Polling distribution for Scenario One: RR and AoI
Figure 3: Polling distribution for Scenario Two: The RR and AoI
Figure 5: Time series reconstruction from the synthetic data at
Figure 6: Comparison of rewards for various scheduling techniques under different values of
Figure 7: Example time series reconstruction from the tempera
Figure 8: Example time series reconstruction from the humidity
Original abstract

Wake-Up Radio (WUR) enables resource-constrained, battery-powered sensor nodes to remain in a low-power deep sleep state while continuously listening for a Wake-Up Signal (WUS). Sensor nodes only wake and transmit data after receiving the WUS, significantly reducing energy consumption. However, polling nodes whose transmitted data provides little or no meaningful update to the remote monitor can still result in unnecessary energy usage and increased storage overhead. To address this issue, this paper uses the Age of Incorrect Information (AoII) metric to prioritise the polling of nodes that provide informative updates to the remote monitor. Determining the optimal set of nodes to poll based on AoII can be formulated as a Restless Multi-Armed Bandit (RMAB) problem, which traditionally requires prior knowledge of the monitored process transition dynamics. Since such dynamics are often unknown in practical deployments, we propose an online learning framework based on state estimation to derive Whittle Index AoII (WAoII) and Fair Whittle Index AoII (FWAoII) policies without assuming known transition probabilities. The proposed policies efficiently schedule node polling while adapting to unknown process behaviour. Experimental evaluation using both real-world and synthetic datasets demonstrates that the proposed online WAoII policy can reduce packet transmissions by up to 70% compared to the widely used Round Robin (RR) polling strategy, while maintaining Root Mean Squared Error (RMSE) values within acceptable application error tolerances. These results demonstrate the effectiveness of WAoII and FWAoII as energy-efficient polling techniques for low-power WUR sensor networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper formulates polling scheduling for Wake-Up Radio sensor networks as a Restless Multi-Armed Bandit (RMAB) problem using the Age of Incorrect Information (AoII) metric to prioritize informative updates. It proposes online WAoII and FWAoII policies that use state estimation to compute Whittle indices without assuming known transition probabilities, and reports that these policies reduce packet transmissions by up to 70% versus Round Robin while keeping RMSE within acceptable tolerances on real-world and synthetic datasets.

Significance. If the online state estimation reliably recovers the underlying dynamics, the work provides a practical, adaptive scheduling method that extends battery life in resource-constrained WUR networks without requiring prior process models. The experimental results on both real and synthetic traces constitute a concrete strength, demonstrating measurable transmission savings while respecting application-level error bounds.

major comments (2)
  1. [online learning framework and WAoII/FWAoII policy derivation] The online learning framework (state estimation for unknown transition probabilities) provides no convergence guarantees, error bounds, or robustness analysis for the recovered dynamics used to compute Whittle indices. This is load-bearing for the central claim, as inaccurate indices would invalidate the prioritization that produces the reported 70% transmission reduction.
  2. [Experimental Evaluation] The manuscript reports RMSE values within tolerances and up to 70% savings versus RR but supplies no quantitative comparison of estimated versus true transition probabilities, no statistical significance tests across runs, and no tests under non-stationarity or observation noise. Without these, it is unclear whether the performance generalizes beyond the specific traces.
minor comments (2)
  1. [Abstract] The abstract states that RMSE remains 'within acceptable application error tolerances' but does not define or justify those tolerances or link them to specific application requirements.
  2. [Proposed online learning framework] Notation for the estimated state and the online estimator could be clarified with an explicit algorithm box or pseudocode to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to the manuscript.

Point-by-point responses
  1. Referee: [online learning framework and WAoII/FWAoII policy derivation] The online learning framework (state estimation for unknown transition probabilities) provides no convergence guarantees, error bounds, or robustness analysis for the recovered dynamics used to compute Whittle indices. This is load-bearing for the central claim, as inaccurate indices would invalidate the prioritization that produces the reported 70% transmission reduction.

    Authors: We agree that the manuscript does not include formal convergence guarantees, error bounds, or a dedicated robustness analysis for the state estimation step. The estimation uses online frequency counts of observed transitions, a standard method for learning unknown Markov dynamics, but we did not derive Whittle-index-specific bounds or prove convergence rates in the RMAB setting. In the revision we will add a dedicated subsection on the estimation procedure, recall its known asymptotic consistency under standard ergodicity assumptions, and include empirical plots of estimation error versus sample size on the synthetic traces. We will also discuss how index computation is affected by moderate estimation error. These additions will clarify the practical reliability of the approach while acknowledging that a full theoretical analysis remains future work. revision: partial

  2. Referee: [Experimental Evaluation] The manuscript reports RMSE values within tolerances and up to 70% savings versus RR but supplies no quantitative comparison of estimated versus true transition probabilities, no statistical significance tests across runs, and no tests under non-stationarity or observation noise. Without these, it is unclear whether the performance generalizes beyond the specific traces.

    Authors: We accept that the current experimental section lacks these quantitative checks. In the revised manuscript we will: (i) add direct comparisons (tables and plots) of estimated versus ground-truth transition probabilities on all synthetic datasets, reporting L1 or total-variation error; (ii) repeat all experiments over 20 independent runs and report mean performance with standard deviation together with paired statistical significance tests (t-tests or Wilcoxon signed-rank) against Round-Robin; (iii) introduce new experiments that inject controlled non-stationarity (abrupt or gradual changes in transition matrices) and additive observation noise, measuring degradation in transmission savings and RMSE. These results will be placed in an expanded experimental section to support claims of generalizability. revision: yes
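
Item (i) of the response above proposes reporting L1 or total-variation error between estimated and ground-truth transition probabilities. A minimal sketch of that measurement is shown here, assuming both matrices are stored as lists of probability rows. The function names are hypothetical and the matrices are illustrative values, not data from the paper.

```python
def total_variation(p, q):
    """Total-variation distance between two discrete distributions
    (half of the L1 distance)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def worst_row_tv(p_est, p_true):
    """Largest per-state total-variation error across the rows of a
    transition matrix."""
    return max(total_variation(r_est, r_true) for r_est, r_true in zip(p_est, p_true))


p_true = [[0.9, 0.1], [0.3, 0.7]]  # illustrative ground truth
p_est = [[0.5, 0.5], [0.3, 0.7]]   # illustrative estimate
print(round(worst_row_tv(p_est, p_true), 6))  # 0.4
```

Reporting the worst row rather than the average is a design choice: a single badly estimated state can be enough to mis-rank a node.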

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via standard RMAB theory plus empirical validation

Full rationale

The paper formulates AoII-based polling as an RMAB, adopts the standard Whittle index policy, and augments it with an online state-estimation procedure to handle unknown transition probabilities. The reported 70% transmission reduction is an empirical outcome measured on held-out real-world and synthetic traces, not a quantity that reduces by construction to parameters fitted inside the same experiment or to a self-citation chain. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the derivation; the online estimator is presented as an independent approximation whose accuracy is tested externally rather than assumed tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the standard RMAB formulation being a valid model for AoII-based polling and on the online estimator being able to substitute for unknown transition probabilities.

axioms (1)
  • domain assumption Polling decisions in WUR sensor networks can be modeled as a Restless Multi-Armed Bandit problem.
    Invoked in the abstract to justify the Whittle Index approach.

pith-pipeline@v0.9.0 · 5593 in / 1166 out tokens · 56806 ms · 2026-05-12T01:02:24.399607+00:00 · methodology

