Best of both worlds: Regret minimization versus minimax play. arXiv preprint arXiv:2502.11673, 2025.
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- Prudent-Banker: No Extra Fees for Baseline Safety in Adversarial Bandits With and Without Delays
Prudent-Banker achieves pseudo-regret Õ(√T + √D) and Õ(1) regret against a safe comparator in adversarial bandits, both with and without delays, matching new lower bounds up to logarithmic factors.
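To make the two kinds of guarantee concrete, here is a minimal sketch that computes, for one realized run, a learner's regret against the best fixed arm in hindsight and against a designated baseline arm. It assumes the full loss table is available for offline evaluation (a bandit learner itself only sees the loss of the arm it plays) and that the safe comparator is a single fixed arm; the function name `regret_vs_best_and_baseline` is hypothetical. It measures realized regret rather than pseudo-regret (which is defined via expectations) and says nothing about how Prudent-Banker actually chooses its actions.

```python
from typing import Sequence, Tuple


def regret_vs_best_and_baseline(
    losses: Sequence[Sequence[float]],  # losses[t][k]: loss of arm k at round t
    actions: Sequence[int],             # arm the learner played at round t
    baseline: int,                      # index of the (assumed fixed) baseline arm
) -> Tuple[float, float]:
    """Return (regret vs. best fixed arm in hindsight, regret vs. baseline arm)."""
    if len(losses) != len(actions):
        raise ValueError("need exactly one action per round")
    num_arms = len(losses[0])
    # Total loss the learner actually incurred.
    learner_loss = sum(row[a] for row, a in zip(losses, actions))
    # Total loss each fixed arm would have incurred over all rounds.
    arm_totals = [sum(row[k] for row in losses) for k in range(num_arms)]
    regret_best = learner_loss - min(arm_totals)
    regret_baseline = learner_loss - arm_totals[baseline]
    return regret_best, regret_baseline


if __name__ == "__main__":
    losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.5]]
    actions = [1, 0, 0, 0]
    print(regret_vs_best_and_baseline(losses, actions, baseline=1))  # (1.0, -0.5)
```

Note that regret against the baseline can be negative, as in the run above; an Õ(1) guarantee against the comparator means the learner can never fall more than a polylogarithmic amount behind it.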
- Learning Safely Without Knowing the World: COMPASS-Hedge
COMPASS-Hedge is presented as the first parameter-free, anytime full-information algorithm that simultaneously achieves minimax-optimal adversarial regret, instance-optimal stochastic regret, and Õ(1) regret relative to a baseline policy.
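For orientation only, the following is a minimal sketch of the classical Hedge (exponential-weights) update in the full-information setting, which the name COMPASS-Hedge suggests as a starting point; it is not COMPASS-Hedge. Unlike a parameter-free, anytime method, this sketch needs a fixed learning rate `eta` chosen in advance. The function names `hedge_distributions` and `expected_regret` are hypothetical, and losses are assumed to lie in [0, 1].

```python
import math
from typing import List, Sequence


def hedge_distributions(losses: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Run classical Hedge with a fixed learning rate eta.

    losses[t][k] is the loss of expert k at round t (assumed in [0, 1]).
    Returns the distribution the learner commits to at each round,
    computed before that round's losses are revealed.
    """
    num_experts = len(losses[0])
    cumulative = [0.0] * num_experts  # total loss of each expert observed so far
    distributions = []
    for row in losses:
        # Shift by the smallest cumulative loss so exp() cannot underflow to all zeros.
        best = min(cumulative)
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        total = sum(weights)
        distributions.append([w / total for w in weights])
        # Full information: every expert's loss is observed, not just the one played.
        for k, loss in enumerate(row):
            cumulative[k] += loss
    return distributions


def expected_regret(losses: Sequence[Sequence[float]],
                    distributions: Sequence[Sequence[float]],
                    comparator: int) -> float:
    """Learner's expected total loss minus the total loss of a fixed comparator expert."""
    learner = sum(
        sum(p * l for p, l in zip(dist, row))
        for dist, row in zip(distributions, losses)
    )
    return learner - sum(row[comparator] for row in losses)


if __name__ == "__main__":
    T, K = 100, 2
    losses = [[0.0, 1.0] if t % 3 else [1.0, 0.0] for t in range(T)]
    # Textbook tuning; it requires knowing the horizon T in advance.
    eta = math.sqrt(8 * math.log(K) / T)
    dists = hedge_distributions(losses, eta)
    print("regret vs expert 0:", expected_regret(losses, dists, comparator=0))
    print("regret vs expert 1:", expected_regret(losses, dists, comparator=1))
```

With losses in [0, 1] and K experts, the tuning η = √(8 ln K / T) bounds the expected regret against the best expert by √((T/2) ln K). Because that choice depends on T and gives no special protection to any single baseline, it illustrates the tuning and comparator dependence that an anytime, parameter-free method with a baseline guarantee is meant to remove.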