Stochastic convex optimization with bandit feedback

Alekh Agarwal; Alexander Rakhlin; Daniel Hsu; Dean P. Foster; Sham M. Kakade

arxiv: 1107.1744 · v2 · pith:5PAUWGGBnew · submitted 2011-07-08 · 🧮 math.OC · cs.LG· cs.SY

Stochastic convex optimization with bandit feedback

Alekh Agarwal , Dean P. Foster , Daniel Hsu , Sham M. Kakade , Alexander Rakhlin This is my paper

classification 🧮 math.OC cs.LGcs.SY

keywords algorithmfunctionconvexregretbanditfeedbackmodeloptimal

0 comments

read the original abstract

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\xset$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \xset$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\otil(\poly(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.

This paper has not been read by Pith yet.

Stochastic convex optimization with bandit feedback

discussion (0)