Contextual Bandit Algorithms with Supervised Learning Guarantees
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CE (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Robust Restless Multi-Armed Bandit for Data Center Flexibility Services Through Virtual Machine Scheduling
A mixed Thompson-sampling and global-UCB strategy for Whittle-index policies in restless multi-armed bandits applied to data-center VM scheduling for grid demand response outperforms pure TW and EXP4 baselines.