Adversarial Attacks on Stochastic Bandits

Kwang-Sung Jun; Lihong Li; Xiaojin Zhu; Yuzhe Ma

arXiv: 1810.12188 · v1 · pith:IES6HBWB · submitted 2018-10-29 · cs.LG · cs.AI · cs.CR · stat.ML


keywords: bandit, actions, adversarial, algorithm, attack, attacker, attacks, bandits


Abstract

We study adversarial attacks that manipulate the reward signals to control the actions chosen by a stochastic multi-armed bandit algorithm. We propose the first attack against two popular bandit algorithms, $\epsilon$-greedy and UCB, without knowledge of the mean rewards. The attacker is able to spend only logarithmic effort, multiplied by a problem-specific parameter that becomes smaller as the bandit problem gets easier to attack. This result means the attacker can easily hijack the behavior of the bandit algorithm to promote or obstruct certain actions, say, a particular medical treatment. As bandits are seeing increasingly wide use in practice, our study exposes a significant security threat.
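To make the idea of reward manipulation concrete, here is a minimal Python sketch of a reward-poisoning attack against $\epsilon$-greedy. It is a simplified illustration under stated assumptions, not the authors' algorithm. Rewards are assumed Gaussian with unit variance. The exploration rate decays as min(1, c·K/t), a common schedule. Whenever a non-target arm is pulled, the attacker lowers the observed reward just enough to keep that arm's empirical mean a fixed `margin` below the target arm's empirical mean. The attacker uses only the rewards it observes, never the true means. The function name `attacked_epsilon_greedy` and the parameters `c` and `margin` are hypothetical. A fixed margin is a simplification; a real attack would plausibly use a confidence-width term that shrinks as arms are pulled.

```python
import random


def attacked_epsilon_greedy(means, target, horizon, c=1.0, margin=0.5, seed=0):
    """Simulate epsilon-greedy under a simplified reward-poisoning attack.

    Illustrative assumptions (not the paper's exact algorithm):
    - rewards are Gaussian with unit variance around the hidden `means`;
    - the attacker observes realized rewards but never reads `means`;
    - `margin` is a hypothetical fixed gap standing in for a shrinking
      confidence-width term.
    Returns (pull counts per arm, total attack cost).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k  # sums of post-attack rewards, i.e. what the learner sees
    cost = 0.0
    for t in range(horizon):
        eps = min(1.0, c * k / (t + 1))  # decaying exploration rate
        if t < k:
            arm = t  # pull every arm once to initialise the estimates
        elif rng.random() < eps:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i])  # exploit
        reward = rng.gauss(means[arm], 1.0)
        if arm != target and counts[target] > 0:
            # Largest observed reward that keeps this arm's empirical mean
            # exactly `margin` (or more) below the target arm's empirical mean.
            target_mean = sums[target] / counts[target]
            cap = (counts[arm] + 1) * (target_mean - margin) - sums[arm]
            if reward > cap:
                cost += reward - cap  # attack cost = size of the perturbation
                reward = cap
        counts[arm] += 1
        sums[arm] += reward
    return counts, cost


counts, cost = attacked_epsilon_greedy(means=[1.0, 0.0], target=1, horizon=2000)
print(counts, round(cost, 2))
```

In this usage the target is the worse arm (true mean 0 versus 1). The learner should still end up pulling it on the large majority of rounds. The attacker only perturbs the comparatively few rounds in which the other arm is pulled, mostly during the shrinking exploration phase.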


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates
cs.LG · 2025-09 · unverdicted · novelty 6.0

A novel robust asynchronous Q-learning algorithm achieves finite-time convergence rates that match clean-data bounds up to an additive term proportional to the corruption fraction, with a matching information-theoreti...