Efficient Algorithm Design of Dynamic Spectrum Access by Whittle Index

Keqin Liu; Qizhen Jia; Yiying Zhang; Zhi Ding

arxiv: 2501.00236 · v2 · pith:FG4I5CSRnew · submitted 2024-12-31 · 🧮 math.OC

Efficient Algorithm Design of Dynamic Spectrum Access by Whittle Index

Keqin Liu , Qizhen Jia , Yiying Zhang , Zhi Ding This is my paper

Pith reviewed 2026-05-23 06:40 UTC · model grok-4.3

classification 🧮 math.OC

keywords dynamic spectrum accessWhittle indexrestless multi-armed banditsthreshold policiesbelief-state MDPCQI feedbackchannel allocationsubsidy problem

0 comments

The pith

Belief-state value function bounds prove threshold optimality for single-channel subsidy problems, yielding an iterative closed-form Whittle index for dynamic spectrum access.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper models dynamic spectrum access as a restless multi-armed bandit problem using channel quality indicator feedback to update channel beliefs. It derives tight bounds on the derivatives of the value functions in the belief-state Markov decision process. These bounds establish that threshold policies are optimal for the single-channel problem with subsidy. The result allows construction of a closed-form channel index function via an iterative method that approximates the Whittle index policy. This approximation provides an efficient way to rank and allocate among channels whose states are not directly observable.

Core claim

The paper proves the optimality of threshold policies for the single-channel subsidy problem in a belief-state MDP arising from CQI-based observations, and uses this to develop an iterative method that computes a closed-form approximation to the Whittle index, enabling low-complexity ranking of available channels in the dynamic spectrum access setting.

What carries the argument

Tight bounds on derivatives of value functions in the belief-state MDP, which establish threshold optimality and support the iterative computation of the Whittle index approximation.

If this is right

Offers a low-complexity solution for ranking currently available channels.
Demonstrates superior performance and robustness through numerical studies.
Handles complex belief transition behaviors in infinite state space from CQI feedback.
Maximizes expected long-term return while minimizing interference to the parent network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The derivative bound technique may apply to other RMAB formulations with continuous observations.
The index approximation could be tested for convergence rate against standard value iteration in larger channel sets.
Deployment in actual wireless systems might show reduced spectrum waste compared to binary feedback methods.
Similar threshold structures could simplify index policies in related subsidy problems outside spectrum access.

Load-bearing premise

The value functions of the belief-state MDP possess sufficient structure to admit tight bounds on their derivatives.

What would settle it

Finding a single-channel subsidy problem instance where the optimal policy is not a threshold policy, or where the derivative bounds do not hold.

Figures

Figures reproduced from arXiv: 2501.00236 by Keqin Liu, Qizhen Jia, Yiying Zhang, Zhi Ding.

**Figure 1.** Figure 1: The system model the channel allocation problem. In Section III, we introduce the basic concepts of indexability and Whittle index, prove some important properties of value functions that will be useful in the following analysis, and derive sufficient conditions for indexability and the threshold structure of the optimal policy for the decoupled single-arm problem. In Section IV, under the conditions deriv… view at source ↗

**Figure 2.** Figure 2: Average discounted return using the myopic, AWI0 policies and our AWI policies in different iteration step sizes under the condition (52) of discount [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: Average discounted return using the myopic, AWI0 policies and our AWI policies in different iteration step sizes when discount factor [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

read the original abstract

Dynamic spectrum access problem is an important problem that allows a wireless sub-network to use channels temporarily unoccupied by the parent network for minimizing the spectrum waste. Previous work has shown that the sequential channel allocation problem for the sub-network can be formulated within the restless multi-armed bandits (RMAB) framework. The objective is to maximize the expected long-term return over an infinite horizon while minimizing interference to the parent network. Different from the previous work that exploits a binary feedback (e.g., ACK/NAK) to compensate for sensing errors, we leverage the finer and more robust channel quality indicator (CQI) feedback to update the information state (belief vector) of the sub-network. However, the implementation of CQI-based observation model yields significantly more complex belief transition behaviors in an infinite state space and worsens the curse of dimensionality of dynamic programming. To overcome this challenge, we dive into the rich structures of the value functions and obtain tight bounds on their derivatives. These results lead to the proof of optimality of threshold policies on a single-channel problem with subsidy and subsequently a closed-form channel index function using an iterative method to approximate the well-known Whittle index policy, which offers a low-complexity solution for ranking the currently available channels whose states are never directly observable. Through extensive numerical studies, we demonstrate the superior performance and robustness of our proposed algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adapts Whittle index to CQI observations in spectrum RMAB and claims threshold optimality via value-function derivative bounds, but that bound step is the part that needs the closest look.

read the letter

The core contribution is extending the Whittle index framework from binary feedback to a CQI observation model for dynamic spectrum access. They formulate the problem as an RMAB, focus on the single-channel subsidy version, and use monotonicity and convexity properties of the value functions to derive bounds on their first and second derivatives. Those bounds are then used to prove that the optimal policy is threshold in the belief state, after which they construct an iterative closed-form approximation to the Whittle index itself. The result is a low-complexity ranking rule for channels whose states are only partially observed through CQI reports. Numerical experiments reportedly show gains over prior methods in both performance and robustness to sensing errors. This is the part that actually moves the needle for the subfield: a concrete index policy that scales beyond the binary case while staying inside the RMAB structure. The derivation stays grounded in the MDP value-function analysis rather than fitting or self-referential tricks, which is a plus. The main soft spot is exactly the derivative-bound argument. The CQI belief transitions are continuous and more intricate than the binary ACK/NAK kernel, so the induction that propagates the bounds across the infinite horizon has to be tight and uniform. If any step in that chain is loose, the threshold result does not follow and the index construction loses its optimality justification. The abstract states the bounds exist, but the letter would need the full induction and the explicit bound expressions to confirm they hold without additional assumptions. The paper is aimed at people already working on index policies for partially observed bandits in communications or resource allocation. Someone looking for a practical, computable ranking rule for spectrum channels would get direct value from the index formula and the numerics. It is worth sending to a serious referee because the problem is well-motivated, the technical path is clear, and the claimed improvement is testable, even though the central proof step will require careful verification.

Referee Report

2 major / 1 minor

Summary. The paper formulates dynamic spectrum access as a restless multi-armed bandit problem using CQI feedback to maintain a continuous belief-state MDP. It derives tight bounds on the first and second derivatives of the value functions, uses these to prove optimality of threshold policies for the single-channel subsidy problem, constructs a closed-form Whittle index approximation via an iterative method, and reports superior numerical performance over baselines for ranking unobservable channels.

Significance. If the derivative bounds and resulting threshold optimality hold, the work supplies a low-complexity, index-based policy for RMAB instances with continuous belief states and richer observation models than binary ACK/NAK. This is a meaningful technical contribution to scalable DSA algorithms, with the iterative index construction offering practical implementability.

major comments (2)

[value-function analysis] The central step establishing threshold optimality (value-function analysis section) invokes tight bounds on the derivatives of the infinite-horizon value functions under the CQI-induced belief transition kernel. The induction argument that propagates monotonicity and convexity properties across value-iteration steps must be verified explicitly for the continuous-state kernel; failure of uniform bounds would invalidate the threshold claim and collapse the subsequent index construction.
[index construction] The iterative method claimed to produce a closed-form channel index function (index construction section) is presented as an approximation to the Whittle index. Explicit convergence guarantees or error bounds relative to the true Whittle index (computed, e.g., via the subsidy formulation) are required; without them the ranking procedure lacks a proven performance guarantee.

minor comments (1)

[Abstract] The abstract states that 'tight bounds' are obtained but does not list the precise structural assumptions (e.g., on the CQI transition probabilities) needed for the derivative bounds; adding one sentence would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the positive assessment of the technical contribution. We address each major comment below.

read point-by-point responses

Referee: [value-function analysis] The central step establishing threshold optimality (value-function analysis section) invokes tight bounds on the derivatives of the infinite-horizon value functions under the CQI-induced belief transition kernel. The induction argument that propagates monotonicity and convexity properties across value-iteration steps must be verified explicitly for the continuous-state kernel; failure of uniform bounds would invalidate the threshold claim and collapse the subsequent index construction.

Authors: We appreciate the referee drawing attention to the need for explicit verification. The derivative bounds in Section III are obtained directly from the continuous belief-state transition kernel induced by the CQI observation model. The induction step shows that if monotonicity and convexity hold for the iterate V_k, they are preserved under the Bellman operator because the belief update is a convex combination that respects the ordering of the value-function derivatives. In the revision we will insert an additional lemma that isolates the uniform bound propagation for the continuous kernel and verifies it holds uniformly over the state space. revision: yes
Referee: [index construction] The iterative method claimed to produce a closed-form channel index function (index construction section) is presented as an approximation to the Whittle index. Explicit convergence guarantees or error bounds relative to the true Whittle index (computed, e.g., via the subsidy formulation) are required; without them the ranking procedure lacks a proven performance guarantee.

Authors: The iterative construction in Section IV yields a closed-form index by repeatedly solving the single-channel subsidy problem; the resulting expression is exact at the boundary subsidy values and inherits the threshold structure proved earlier. While we do not supply an analytical convergence rate to the exact Whittle index (which is intractable to compute directly for the continuous-state MDP), the approximation preserves the correct channel ranking, as confirmed by the extensive numerical comparisons against optimal and baseline policies. In the revision we will add a new subsection that reports the empirical index error on a discretized grid and discusses its effect on the overall ranking performance. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation proceeds from MDP structure analysis.

full rationale

The paper derives tight bounds on value-function derivatives directly from the CQI belief-state MDP transition kernel and monotonicity/convexity properties to establish threshold optimality for the single-channel subsidy problem, then uses an iterative approximation for the Whittle index. This chain is self-contained in the MDP formulation and subsidy setup; no step reduces by construction to a fitted input, self-citation, imported uniqueness theorem, or renamed known result. The provided abstract and reader's assessment confirm the central claims rest on independent structural analysis rather than self-referential inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on unproven structural properties of the infinite-state value functions under the CQI transition kernel; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Value functions admit tight bounds on their derivatives with respect to the belief state.
Invoked to prove threshold optimality of the single-channel subsidy policy.

pith-pipeline@v0.9.0 · 5774 in / 1205 out tokens · 88313 ms · 2026-05-23T06:40:20.850935+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

[1]

Security of Home Node B (HNB) / Home evolved Node B (HeNB) (Release 18),

3GPP, “Security of Home Node B (HNB) / Home evolved Node B (HeNB) (Release 18),” Mar. 2024

work page 2024
[2]

GTI 5G Low-Cost Series Products - 5G Femto Technical Re- quirements White Paper,

GTI, “GTI 5G Low-Cost Series Products - 5G Femto Technical Re- quirements White Paper,” Sep. 2023

work page 2023
[3]

A Machine Learning Solution for Video Delivery to Mitigate Co-Tier Interference in 5G HetNets,

D. Anand, M. A. Togou, and G.-M. Muntean, “A Machine Learning Solution for Video Delivery to Mitigate Co-Tier Interference in 5G HetNets,” IEEE Transactions on Multimedia , vol. 25, pp. 5117–5129, 2023. 12 (a) System-1 (b) System-2 (c) System-3 (d) System-4 Fig. 2. Average discounted return using the myopic, AWI0 policies and our AWI policies in different...

work page 2023
[4]

On Interfer- ence Aware Power Adjustment and Scheduling in Femtocell Networks,

M. Lin, N. Bartolini, M. Giallorenzo, and T. F. La Porta, “On Interfer- ence Aware Power Adjustment and Scheduling in Femtocell Networks,” IEEE/ACM Transactions on Networking , vol. 28, no. 2, pp. 736–749, Apr. 2020

work page 2020
[5]

An Energy- Quality Utility-Based Adaptive Scheduling Solution for Mobile Users in Dense Networks,

P. Scopelliti, A. Tropeano, G.-M. Muntean, and G. Araniti, “An Energy- Quality Utility-Based Adaptive Scheduling Solution for Mobile Users in Dense Networks,” IEEE Transactions on Broadcasting , vol. 66, no. 1, pp. 47–55, Mar. 2020

work page 2020
[6]

Novel Fast User-Placement Ushering Algorithms and Performance Analysis for LTE Femtocell Networks,

L. Pu, H.-C. Wu, C. Wang, S.-H. Fang, S. Mukhopadhyay, and C. Busch, “Novel Fast User-Placement Ushering Algorithms and Performance Analysis for LTE Femtocell Networks,”IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 1, pp. 381–393, Mar. 2020

work page 2020
[7]

Joint Mode Selection and Resource Allocation for D2D and Femtocell Users in Dense Hetero- geneous Networks With Full Frequency Reuse,

L. Eslami, G. Mirjalily, and T. N. Davidson, “Joint Mode Selection and Resource Allocation for D2D and Femtocell Users in Dense Hetero- geneous Networks With Full Frequency Reuse,” IEEE Transactions on Vehicular Technology, vol. 72, no. 11, pp. 14 364–14 379, Nov. 2023

work page 2023
[8]

OFDRA: Optimal Femtocell Deployment for Accurate Indoor Positioning of RIS- Mounted A Vs,

A. Famili, T. O. Atalay, A. Stavrou, H. Wang, and J.-M. Park, “OFDRA: Optimal Femtocell Deployment for Accurate Indoor Positioning of RIS- Mounted A Vs,” IEEE Journal on Selected Areas in Communications , vol. 41, no. 12, pp. 3783–3798, Dec. 2023

work page 2023
[9]

Secrecy Rate Max- imization in THz-Aided Heterogeneous Networks: A Deep Reinforce- ment Learning Approach,

H. Sharma, N. Kumar, I. Budhiraja, and A. Barnawi, “Secrecy Rate Max- imization in THz-Aided Heterogeneous Networks: A Deep Reinforce- ment Learning Approach,” IEEE Transactions on Vehicular Technology, vol. 72, no. 10, pp. 13 490–13 505, Oct. 2023

work page 2023
[10]

Joint Coordinated Beamforming and Power Splitting Ratio Optimization in MU-MISO SWIPT-Enabled HetNets: A Multi-Agent DDQN-Based Approach,

R. Zhang, K. Xiong, Y . Lu, B. Gao, P. Fan, and K. B. Letaief, “Joint Coordinated Beamforming and Power Splitting Ratio Optimization in MU-MISO SWIPT-Enabled HetNets: A Multi-Agent DDQN-Based Approach,”IEEE Journal on Selected Areas in Communications, vol. 40, no. 2, pp. 677–693, Feb. 2022

work page 2022
[11]

Robust Max-Min Energy Efficiency for RIS-Aided HetNets With Distortion Noises,

Y . Xu, H. Xie, Q. Wu, C. Huang, and C. Yuen, “Robust Max-Min Energy Efficiency for RIS-Aided HetNets With Distortion Noises,”IEEE Transactions on Communications , vol. 70, no. 2, pp. 1457–1471, Feb. 2022

work page 2022
[12]

Distributed Generalized Nash Equilibrium Seeking and Its Application to Femtocell Networks,

Z. Li, Z. Li, and Z. Ding, “Distributed Generalized Nash Equilibrium Seeking and Its Application to Femtocell Networks,” IEEE Transactions on Cybernetics, vol. 52, no. 4, pp. 2505–2517, Apr. 2022

work page 2022
[13]

Online Learning of Time-Varying Unbalanced Net- works in Non-Convex Environments: A Multi-Armed Bandit Approach,

O. T. Odeyomi, “Online Learning of Time-Varying Unbalanced Net- works in Non-Convex Environments: A Multi-Armed Bandit Approach,” IEEE Access, vol. 11, pp. 15 567–15 577, 2023

work page 2023
[14]

Radar Enhanced Multi-Armed Bandit for Rapid Beam Selection in Millimeter Wave Communications,

A. Sneh, S. Darak, S. S. Ram, and M. Hanawal, “Radar Enhanced Multi-Armed Bandit for Rapid Beam Selection in Millimeter Wave Communications,” IEEE Communications Letters , vol. 27, no. 9, pp. 2441–2445, Sep. 2023

work page 2023
[15]

Learning and Com- munications Co-Design for Remote Inference Systems: Feature Length Selection and Transmission Scheduling,

M. K. C. Shisher, B. Ji, I.-H. Hou, and Y . Sun, “Learning and Com- munications Co-Design for Remote Inference Systems: Feature Length Selection and Transmission Scheduling,” IEEE Journal on Selected Areas in Information Theory , vol. 4, pp. 524–538, 2023

work page 2023
[16]

On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples,

W. R. Thompson, “On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples,” Biometrika, vol. 25, no. 3/4, pp. 285–294, 1933

work page 1933
[17]

Bandit Processes and Dynamic Allocation Indices,

J. C. Gittins, “Bandit Processes and Dynamic Allocation Indices,” Journal of the Royal Statistical Society: Series B (Methodological) , vol. 41, no. 2, pp. 148–164, 1979

work page 1979
[18]

A Dynamic Allocation Index for the Discounted Multiarmed Bandit Problem,

J. C. Gittins and D. M. Jones, “A Dynamic Allocation Index for the Discounted Multiarmed Bandit Problem,” Biometrika, vol. 66, no. 3, pp. 561–565, 1979

work page 1979
[19]

Restless Bandits: Activity Allocation in a Changing World,

P. Whittle, “Restless Bandits: Activity Allocation in a Changing World,” Journal of Applied Probability , vol. 25, pp. 287–298, 1988

work page 1988
[20]

Index Policies and Performance Bounds for Dynamic Selection Problems,

D. B. Brown and J. E. Smith, “Index Policies and Performance Bounds for Dynamic Selection Problems,” Management Science, vol. 66, no. 7, pp. 3029–3050, Jul. 2020. 13 (a) System-1 (b) System-2 (c) System-3 (d) System-4 Fig. 3. Average discounted return using the myopic, AWI0 policies and our AWI policies in different iteration step sizes when discount fact...

work page 2020
[21]

A Whittle Index Approach to Minimiz- ing Age of Multi-Packet Information in IoT Network,

M. Chen, K. Wu, and L. Song, “A Whittle Index Approach to Minimiz- ing Age of Multi-Packet Information in IoT Network,” IEEE Access , vol. 9, pp. 31 467–31 480, 2021

work page 2021
[22]

On an index policy for restless bandits,

R. R. Weber and G. Weiss, “On an index policy for restless bandits,” Journal of Applied Probability , vol. 27, no. 3, pp. 637–648, Sep. 1990

work page 1990
[23]

An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits,

G. Zayas-Cab ´an, S. Jasin, and G. Wang, “An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits,” Advances in Applied Probability , vol. 51, no. 3, pp. 745–772, Sep. 2019

work page 2019
[24]

The complexity of optimal queueing network control,

C. Papadimitriou and J. Tsitsiklis, “The complexity of optimal queueing network control,” in Proceedings of IEEE 9th Annual Conference on Structure in Complexity Theory , Jun. 1994, pp. 318–322

work page 1994
[25]

Spectrum sharing and resource allocation for energy-efficient heterogeneous cognitive radio networks with fem- tocells,

R. Xie, F. R. Yu, and H. Ji, “Spectrum sharing and resource allocation for energy-efficient heterogeneous cognitive radio networks with fem- tocells,” in 2012 IEEE International Conference on Communications (ICC), Jun. 2012, pp. 1661–1665

work page 2012
[26]

Power Minimization Based Resource Allocation for Interference Mitigation in OFDMA Femtocell Networks,

D. Lopez-Perez, X. Chu, A. V . Vasilakos, and H. Claussen, “Power Minimization Based Resource Allocation for Interference Mitigation in OFDMA Femtocell Networks,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 2, pp. 333–344, Feb. 2014

work page 2014
[27]

Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players,

K. Liu and Q. Zhao, “Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players,” in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing , Mar. 2010, pp. 3010–3013

work page 2010
[28]

Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access,

——, “Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access,” IEEE Transactions on Information Theory , vol. 56, no. 11, pp. 5547–5567, Nov. 2010

work page 2010
[29]

Distributed Stochastic Online Learning Policies for Opportunistic Spectrum Access,

Y . Gai and B. Krishnamachari, “Distributed Stochastic Online Learning Policies for Opportunistic Spectrum Access,” IEEE Transactions on Signal Processing, vol. 62, no. 23, pp. 6184–6193, Dec. 2014

work page 2014
[30]

Femtocell Scheduling as a Restless Multiarmed Bandit Problem Using Partial Channel State Ob- servation,

H. M. Elmaghraby, K. Liu, and Z. Ding, “Femtocell Scheduling as a Restless Multiarmed Bandit Problem Using Partial Channel State Ob- servation,” in 2018 IEEE International Conference on Communications (ICC), May 2018, pp. 1–6

work page 2018
[31]

The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs,

E. J. Sondik, “The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs,” Operations Research, vol. 26, no. 2, pp. 282–304, 1978

work page 1978
[32]

Low-Complexity Algorithm for Restless Bandits with Imperfect Observations,

K. Liu, R. Weber, and C. Zhang, “Low-Complexity Algorithm for Restless Bandits with Imperfect Observations,” May 2024

work page 2024
[33]

An elementary proof of the one-dimensional Rademacher theorem,

L. Zaj ´ıˇcek, “An elementary proof of the one-dimensional Rademacher theorem,” Mathematica Bohemica, vol. 117, no. 2, pp. 133–136, 1992

work page 1992

[1] [1]

Security of Home Node B (HNB) / Home evolved Node B (HeNB) (Release 18),

3GPP, “Security of Home Node B (HNB) / Home evolved Node B (HeNB) (Release 18),” Mar. 2024

work page 2024

[2] [2]

GTI 5G Low-Cost Series Products - 5G Femto Technical Re- quirements White Paper,

GTI, “GTI 5G Low-Cost Series Products - 5G Femto Technical Re- quirements White Paper,” Sep. 2023

work page 2023

[3] [3]

A Machine Learning Solution for Video Delivery to Mitigate Co-Tier Interference in 5G HetNets,

D. Anand, M. A. Togou, and G.-M. Muntean, “A Machine Learning Solution for Video Delivery to Mitigate Co-Tier Interference in 5G HetNets,” IEEE Transactions on Multimedia , vol. 25, pp. 5117–5129, 2023. 12 (a) System-1 (b) System-2 (c) System-3 (d) System-4 Fig. 2. Average discounted return using the myopic, AWI0 policies and our AWI policies in different...

work page 2023

[4] [4]

On Interfer- ence Aware Power Adjustment and Scheduling in Femtocell Networks,

M. Lin, N. Bartolini, M. Giallorenzo, and T. F. La Porta, “On Interfer- ence Aware Power Adjustment and Scheduling in Femtocell Networks,” IEEE/ACM Transactions on Networking , vol. 28, no. 2, pp. 736–749, Apr. 2020

work page 2020

[5] [5]

An Energy- Quality Utility-Based Adaptive Scheduling Solution for Mobile Users in Dense Networks,

P. Scopelliti, A. Tropeano, G.-M. Muntean, and G. Araniti, “An Energy- Quality Utility-Based Adaptive Scheduling Solution for Mobile Users in Dense Networks,” IEEE Transactions on Broadcasting , vol. 66, no. 1, pp. 47–55, Mar. 2020

work page 2020

[6] [6]

Novel Fast User-Placement Ushering Algorithms and Performance Analysis for LTE Femtocell Networks,

L. Pu, H.-C. Wu, C. Wang, S.-H. Fang, S. Mukhopadhyay, and C. Busch, “Novel Fast User-Placement Ushering Algorithms and Performance Analysis for LTE Femtocell Networks,”IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 1, pp. 381–393, Mar. 2020

work page 2020

[7] [7]

Joint Mode Selection and Resource Allocation for D2D and Femtocell Users in Dense Hetero- geneous Networks With Full Frequency Reuse,

L. Eslami, G. Mirjalily, and T. N. Davidson, “Joint Mode Selection and Resource Allocation for D2D and Femtocell Users in Dense Hetero- geneous Networks With Full Frequency Reuse,” IEEE Transactions on Vehicular Technology, vol. 72, no. 11, pp. 14 364–14 379, Nov. 2023

work page 2023

[8] [8]

OFDRA: Optimal Femtocell Deployment for Accurate Indoor Positioning of RIS- Mounted A Vs,

A. Famili, T. O. Atalay, A. Stavrou, H. Wang, and J.-M. Park, “OFDRA: Optimal Femtocell Deployment for Accurate Indoor Positioning of RIS- Mounted A Vs,” IEEE Journal on Selected Areas in Communications , vol. 41, no. 12, pp. 3783–3798, Dec. 2023

work page 2023

[9] [9]

Secrecy Rate Max- imization in THz-Aided Heterogeneous Networks: A Deep Reinforce- ment Learning Approach,

H. Sharma, N. Kumar, I. Budhiraja, and A. Barnawi, “Secrecy Rate Max- imization in THz-Aided Heterogeneous Networks: A Deep Reinforce- ment Learning Approach,” IEEE Transactions on Vehicular Technology, vol. 72, no. 10, pp. 13 490–13 505, Oct. 2023

work page 2023

[10] [10]

Joint Coordinated Beamforming and Power Splitting Ratio Optimization in MU-MISO SWIPT-Enabled HetNets: A Multi-Agent DDQN-Based Approach,

R. Zhang, K. Xiong, Y . Lu, B. Gao, P. Fan, and K. B. Letaief, “Joint Coordinated Beamforming and Power Splitting Ratio Optimization in MU-MISO SWIPT-Enabled HetNets: A Multi-Agent DDQN-Based Approach,”IEEE Journal on Selected Areas in Communications, vol. 40, no. 2, pp. 677–693, Feb. 2022

work page 2022

[11] [11]

Robust Max-Min Energy Efficiency for RIS-Aided HetNets With Distortion Noises,

Y . Xu, H. Xie, Q. Wu, C. Huang, and C. Yuen, “Robust Max-Min Energy Efficiency for RIS-Aided HetNets With Distortion Noises,”IEEE Transactions on Communications , vol. 70, no. 2, pp. 1457–1471, Feb. 2022

work page 2022

[12] [12]

Distributed Generalized Nash Equilibrium Seeking and Its Application to Femtocell Networks,

Z. Li, Z. Li, and Z. Ding, “Distributed Generalized Nash Equilibrium Seeking and Its Application to Femtocell Networks,” IEEE Transactions on Cybernetics, vol. 52, no. 4, pp. 2505–2517, Apr. 2022

work page 2022

[13] [13]

Online Learning of Time-Varying Unbalanced Net- works in Non-Convex Environments: A Multi-Armed Bandit Approach,

O. T. Odeyomi, “Online Learning of Time-Varying Unbalanced Net- works in Non-Convex Environments: A Multi-Armed Bandit Approach,” IEEE Access, vol. 11, pp. 15 567–15 577, 2023

work page 2023

[14] [14]

Radar Enhanced Multi-Armed Bandit for Rapid Beam Selection in Millimeter Wave Communications,

A. Sneh, S. Darak, S. S. Ram, and M. Hanawal, “Radar Enhanced Multi-Armed Bandit for Rapid Beam Selection in Millimeter Wave Communications,” IEEE Communications Letters , vol. 27, no. 9, pp. 2441–2445, Sep. 2023

work page 2023

[15] [15]

Learning and Com- munications Co-Design for Remote Inference Systems: Feature Length Selection and Transmission Scheduling,

M. K. C. Shisher, B. Ji, I.-H. Hou, and Y . Sun, “Learning and Com- munications Co-Design for Remote Inference Systems: Feature Length Selection and Transmission Scheduling,” IEEE Journal on Selected Areas in Information Theory , vol. 4, pp. 524–538, 2023

work page 2023

[16] [16]

On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples,

W. R. Thompson, “On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples,” Biometrika, vol. 25, no. 3/4, pp. 285–294, 1933

work page 1933

[17] [17]

Bandit Processes and Dynamic Allocation Indices,

J. C. Gittins, “Bandit Processes and Dynamic Allocation Indices,” Journal of the Royal Statistical Society: Series B (Methodological) , vol. 41, no. 2, pp. 148–164, 1979

work page 1979

[18] [18]

A Dynamic Allocation Index for the Discounted Multiarmed Bandit Problem,

J. C. Gittins and D. M. Jones, “A Dynamic Allocation Index for the Discounted Multiarmed Bandit Problem,” Biometrika, vol. 66, no. 3, pp. 561–565, 1979

work page 1979

[19] [19]

Restless Bandits: Activity Allocation in a Changing World,

P. Whittle, “Restless Bandits: Activity Allocation in a Changing World,” Journal of Applied Probability , vol. 25, pp. 287–298, 1988

work page 1988

[20] [20]

Index Policies and Performance Bounds for Dynamic Selection Problems,

D. B. Brown and J. E. Smith, “Index Policies and Performance Bounds for Dynamic Selection Problems,” Management Science, vol. 66, no. 7, pp. 3029–3050, Jul. 2020. 13 (a) System-1 (b) System-2 (c) System-3 (d) System-4 Fig. 3. Average discounted return using the myopic, AWI0 policies and our AWI policies in different iteration step sizes when discount fact...

work page 2020

[21] [21]

A Whittle Index Approach to Minimiz- ing Age of Multi-Packet Information in IoT Network,

M. Chen, K. Wu, and L. Song, “A Whittle Index Approach to Minimiz- ing Age of Multi-Packet Information in IoT Network,” IEEE Access , vol. 9, pp. 31 467–31 480, 2021

work page 2021

[22] [22]

On an index policy for restless bandits,

R. R. Weber and G. Weiss, “On an index policy for restless bandits,” Journal of Applied Probability , vol. 27, no. 3, pp. 637–648, Sep. 1990

work page 1990

[23] [23]

An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits,

G. Zayas-Cab ´an, S. Jasin, and G. Wang, “An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits,” Advances in Applied Probability , vol. 51, no. 3, pp. 745–772, Sep. 2019

work page 2019

[24] [24]

The complexity of optimal queueing network control,

C. Papadimitriou and J. Tsitsiklis, “The complexity of optimal queueing network control,” in Proceedings of IEEE 9th Annual Conference on Structure in Complexity Theory , Jun. 1994, pp. 318–322

work page 1994

[25] [25]

Spectrum sharing and resource allocation for energy-efficient heterogeneous cognitive radio networks with fem- tocells,

R. Xie, F. R. Yu, and H. Ji, “Spectrum sharing and resource allocation for energy-efficient heterogeneous cognitive radio networks with fem- tocells,” in 2012 IEEE International Conference on Communications (ICC), Jun. 2012, pp. 1661–1665

work page 2012

[26] [26]

Power Minimization Based Resource Allocation for Interference Mitigation in OFDMA Femtocell Networks,

D. Lopez-Perez, X. Chu, A. V . Vasilakos, and H. Claussen, “Power Minimization Based Resource Allocation for Interference Mitigation in OFDMA Femtocell Networks,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 2, pp. 333–344, Feb. 2014

work page 2014

[27] [27]

Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players,

K. Liu and Q. Zhao, “Distributed learning in cognitive radio networks: Multi-armed bandit with distributed multiple players,” in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing , Mar. 2010, pp. 3010–3013

work page 2010

[28] [28]

Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access,

——, “Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access,” IEEE Transactions on Information Theory , vol. 56, no. 11, pp. 5547–5567, Nov. 2010

work page 2010

[29] [29]

Distributed Stochastic Online Learning Policies for Opportunistic Spectrum Access,

Y . Gai and B. Krishnamachari, “Distributed Stochastic Online Learning Policies for Opportunistic Spectrum Access,” IEEE Transactions on Signal Processing, vol. 62, no. 23, pp. 6184–6193, Dec. 2014

work page 2014

[30] [30]

Femtocell Scheduling as a Restless Multiarmed Bandit Problem Using Partial Channel State Ob- servation,

H. M. Elmaghraby, K. Liu, and Z. Ding, “Femtocell Scheduling as a Restless Multiarmed Bandit Problem Using Partial Channel State Ob- servation,” in 2018 IEEE International Conference on Communications (ICC), May 2018, pp. 1–6

work page 2018

[31] [31]

The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs,

E. J. Sondik, “The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs,” Operations Research, vol. 26, no. 2, pp. 282–304, 1978

work page 1978

[32] [32]

Low-Complexity Algorithm for Restless Bandits with Imperfect Observations,

K. Liu, R. Weber, and C. Zhang, “Low-Complexity Algorithm for Restless Bandits with Imperfect Observations,” May 2024

work page 2024

[33] [33]

An elementary proof of the one-dimensional Rademacher theorem,

L. Zaj ´ıˇcek, “An elementary proof of the one-dimensional Rademacher theorem,” Mathematica Bohemica, vol. 117, no. 2, pp. 133–136, 1992

work page 1992