An algorithm for multi-armed bandits that interpolates between reward maximization and accurate mean estimation, supported by matching upper and lower regret bounds.
Bui, Ramesh Johari, and Shie Mannor
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Trading off rewards and errors in multi-armed bandits