Prudent-Banker achieves Õ(√T + √D) pseudo-regret and Õ(1) regret against a safe comparator in adversarial bandits, both with and without delays, matching new lower bounds up to logarithmic factors.
Best of both worlds: Regret minimization versus minimax play
2 Pith papers cite this work.