The MLA-UCB algorithm uses ML-generated surrogate rewards from auxiliary data to provably lower cumulative regret in multi-armed bandits, achieving asymptotic optimality under joint Gaussian assumptions without requiring knowledge of the true-surrogate covariance.
Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen
1 Pith paper cites this work. Polarity classification is still indexing.
fields: math.ST (1)
years: 2025 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards