pith. machine review for the scientific record. sign in

arxiv: 1605.08722 · v1 · submitted 2016-05-27 · 💻 cs.LG

Recognition: unknown

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

Authors on Pith no claims yet
classification 💻 cs.LG
keywords banditspseudo-regretadversarialstochasticalgorithmsqrtexpectedoptimal
0
0 comments X
read the original abstract

We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is $O(K\sqrt{n \log n})$ and against stochastic bandits the pseudo-regret is $O(\sum_i (\log n)/\Delta_i)$. We also show that no algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversarial bandits. This complements previous results of Bubeck and Slivkins (2012) that show $\tilde{O}(\sqrt{n})$ expected adversarial regret with $O((\log n)^2)$ stochastic pseudo-regret.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Spectral bandits

    stat.ML 2026-04 unverdicted novelty 7.0

    Spectral bandits achieve scalable regret in graph-structured recommendation by using an effective dimension to learn good policies from few node evaluations.

  2. Budgeted Online Influence Maximization

    cs.LG 2026-04 unverdicted novelty 7.0

    A new algorithm for online influence maximization under a total budget constraint using the independent cascade model and edge-level semi-bandit feedback, with improved regret bounds for both budgeted and cardinality ...

  3. Best of both worlds: Stochastic & adversarial best-arm identification

    stat.ML 2026-04 unverdicted novelty 7.0

    No algorithm can be optimal in both stochastic and adversarial best-arm identification; a new parameter-free algorithm matches the derived lower bound up to log factors in stochastic cases while handling adversarial rewards.