An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

Peter Auer , Chao-Kai Chiang

Authors on Pith no claims yet

classification 💻 cs.LG

keywords banditspseudo-regretadversarialstochasticalgorithmsqrtexpectedoptimal

read the original abstract

We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is $O(K\sqrt{n \log n})$ and against stochastic bandits the pseudo-regret is $O(\sum_i (\log n)/\Delta_i)$. We also show that no algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversarial bandits. This complements previous results of Bubeck and Slivkins (2012) that show $\tilde{O}(\sqrt{n})$ expected adversarial regret with $O((\log n)^2)$ stochastic pseudo-regret.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Spectral bandits
stat.ML 2026-04 unverdicted novelty 7.0

Spectral bandits achieve scalable regret in graph-structured recommendation by using an effective dimension to learn good policies from few node evaluations.
Budgeted Online Influence Maximization
cs.LG 2026-04 unverdicted novelty 7.0

A new algorithm for online influence maximization under a total budget constraint using the independent cascade model and edge-level semi-bandit feedback, with improved regret bounds for both budgeted and cardinality ...
Best of both worlds: Stochastic & adversarial best-arm identification
stat.ML 2026-04 unverdicted novelty 7.0

No algorithm can be optimal in both stochastic and adversarial best-arm identification; a new parameter-free algorithm matches the derived lower bound up to log factors in stochastic cases while handling adversarial rewards.