Prudent-Banker achieves pseudo-regret Õ(√T + √D) and Õ(1) regret vs. safe comparator in adversarial bandits both with and without delays, matching new lower bounds up to logs.
Logarithmic regret for matrix games against an adversary with noisy bandit feedback.arXiv preprint arXiv:2306.13233, 2023
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Prudent-Banker: No Extra Fees for Baseline Safety in Adversarial Bandits With and Without Delays
Prudent-Banker achieves pseudo-regret Õ(√T + √D) and Õ(1) regret vs. safe comparator in adversarial bandits both with and without delays, matching new lower bounds up to logs.