Lagrangian index policy for restless bandits with average reward
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: background (1)
Polarities: contest (1)
Citing papers
- Restless Bandits with Individual Penalty Constraints: Near-Optimal Indices and Deep Reinforcement Learning
  The POW index policy for restless multi-armed bandits with per-arm penalty constraints is asymptotically optimal, computable offline per user, and learnable via deep RL.
- Lagrange Index based Scheduling for Minimizing Age of Updates from Heterogeneous Sources
  Lagrange index heuristic for RMAB-SMDP scheduling minimizes weighted AoI under non-preemptive heterogeneous updates in wireless networks.