Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits

Akshay Krishnamurthy; Haipeng Luo; Robert E. Schapire; Vasilis Syrgkanis

arxiv: 1606.00313 · v1 · pith:Q5RE6UNGnew · submitted 2016-06-01 · 💻 cs.LG




keywords frac, number, adversarial, algorithm, barrier, contexts, contextual, oracle-based


We give an oracle-based algorithm for the adversarial contextual bandit problem, where either the contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is computationally efficient, assuming access to an offline optimization oracle, and enjoys a regret of order $O((KT)^{\frac{2}{3}}(\log N)^{\frac{1}{3}})$, where $K$ is the number of actions, $T$ is the number of iterations, and $N$ is the number of baseline policies. Our result is the first to break the $O(T^{\frac{3}{4}})$ regret barrier attained by recently introduced algorithms. Breaking this barrier was left as a major open problem. Our analysis is based on the recent relaxation-based approach of Rakhlin and Sridharan (2016).
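To make the two rates in the abstract concrete, the following is a minimal Python sketch that evaluates them with all constants dropped. The function names are hypothetical and chosen only for illustration. The abstract does not state how the $T^{\frac{3}{4}}$ bound depends on $K$ and $N$, so that rate is evaluated as a function of $T$ alone; the numbers show growth rates, not actual regret values.

```python
import math


def new_rate(K: int, T: int, N: int) -> float:
    """(K*T)^(2/3) * (log N)^(1/3), with the hidden constant dropped."""
    return (K * T) ** (2 / 3) * math.log(N) ** (1 / 3)


def barrier_rate(T: int) -> float:
    """T^(3/4), with constants and any K, N dependence omitted (assumption)."""
    return T ** 0.75


def average_new_rate(K: int, T: int, N: int) -> float:
    """Per-round version of the new rate; it decays like T^(-1/3)."""
    return new_rate(K, T, N) / T


if __name__ == "__main__":
    K, N = 4, 100  # illustrative values, not taken from the paper
    for T in (10**3, 10**6, 10**9):
        print(T, new_rate(K, T, N), barrier_rate(T), average_new_rate(K, T, N))
```

Multiplying $T$ by 8 multiplies the new rate by $8^{2/3} = 4$, while multiplying $T$ by 16 multiplies the $T^{\frac{3}{4}}$ rate by $16^{3/4} = 8$, which is the sense in which the exponent $\frac{2}{3}$ improves on $\frac{3}{4}$ as $T$ grows.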
