The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

Csaba Szepesvari; Tor Lattimore

arxiv: 1610.04491 · v1 · pith:GZYCQBNPnew · submitted 2016-10-14 · 📊 stat.ML · cs.LG

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

Tor Lattimore , Csaba Szepesvari This is my paper

classification 📊 stat.ML cs.LG

keywords banditsfinite-armedlinearoptimismasymptoticoptimalsamplingsimple

0 comments

read the original abstract

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the optimism principle and Thompson sampling. While prior work has mostly been in the worst-case setting, we analyse the asymptotic instance-dependent regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate, and indeed, can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation. For example, for generalised linear bandits and reinforcement learning.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sequential Experimental Design for Transductive Linear Bandits
stat.ML 2019-06 unverdicted novelty 7.0

Introduces transductive linear bandits, gives instance-dependent lower bounds, and presents an algorithm matching them up to logarithmic factors, including the first non-asymptotic near-optimal method for standard lin...