Prudent-Banker achieves pseudo-regret Õ(√T + √D) and Õ(1) regret vs. safe comparator in adversarial bandits both with and without delays, matching new lower bounds up to logs.
In standard bandits, verifying that the baseline/comparator is suboptimal reduces to a relatively direct hypothesis test
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Prudent-Banker: No Extra Fees for Baseline Safety in Adversarial Bandits With and Without Delays
Prudent-Banker achieves pseudo-regret Õ(√T + √D) and Õ(1) regret vs. safe comparator in adversarial bandits both with and without delays, matching new lower bounds up to logs.