Restless bandits: Activity allocation in a changing world
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach
A framework that generates training data from a numerical solver for fluid restless multi-armed bandit problems (FRMABPs), applies nonlinear feature transforms, and learns time-dependent policies via OCT-H, achieving speed-ups of up to 26 million times on test problems.