Prudent-Banker achieves pseudo-regret Õ(√T + √D) and Õ(1) regret vs. safe comparator in adversarial bandits both with and without delays, matching new lower bounds up to logs.
The pareto regret frontier.Advances in Neural Information Processing Systems, 26, 2013
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2representative citing papers
citing papers explorer
-
Prudent-Banker: No Extra Fees for Baseline Safety in Adversarial Bandits With and Without Delays
Prudent-Banker achieves pseudo-regret Õ(√T + √D) and Õ(1) regret vs. safe comparator in adversarial bandits both with and without delays, matching new lower bounds up to logs.
- Learning Safely Without Knowing the World:COMPASS-Hedge