1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Finite-Time Regret Analysis of Retry-Aware Bandits
ReMax achieves the first sublinear finite-time regret bound for Gaussian bandits with M=2 by deriving an expected-improvement balance condition for its optimal sampling distribution and separating saturation from underestimation effects.