ReMax achieves the first sublinear regret bound for Gaussian rewards at M=2 by characterizing the optimal sampling distribution via an expected-improvement balance condition and separating saturation from underestimation effects.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Finite-Time Regret Analysis of Retry-Aware Bandits