KLinf-UCB is extended to nonparametric rewards, with asymptotically optimal expected regret and a tight upper bound on the regret tail probability that recovers prior results for the bounded and finite-support cases.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.IT (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- Regret Tail Characterization of Optimal Bandit Algorithms with Generic Rewards