Data poisoning attacks on stochastic bandits
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2025 (2)
Verdicts: Unverdicted (2)
Citing papers
- Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates
  A novel robust asynchronous Q-learning algorithm achieves finite-time convergence rates that match clean-data bounds up to an additive term proportional to the corruption fraction, with a matching information-theoretic lower bound.
- Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection
  Introduces bounded fake data injection attacks that force a class of stochastic bandit algorithms to select a target arm in nearly all rounds at sublinear attack cost.