Prudent-Banker achieves Õ(√T + √D) pseudo-regret and Õ(1) regret against a safe comparator in adversarial bandits, both with and without delays, matching new lower bounds up to logarithmic factors.
The Pareto regret frontier. Advances in Neural Information Processing Systems, 26.
2 Pith papers cite this work. Polarity classification is still indexing.