pith. sign in

arxiv: 1207.3024 · v4 · pith:7F7FQIEInew · submitted 2012-07-12 · 💻 cs.DS · cs.GT

A Time and Space Efficient Algorithm for Contextual Linear Bandits

classification 💻 cs.DS cs.GT
keywords achievealgorithmcomplexityalgorithmscomputationcontextscontextualgrow
0
0 comments X
read the original abstract

We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve $O(\log T)$ regret after $T$ time steps. However, proposed methods either have a computation complexity per iteration that scales linearly with $T$ or achieve regrets that grow linearly with the number of contexts $|\myset{X}|$. We propose an $\epsilon$-greedy type of algorithm that solves both limitations. In particular, when contexts are variables in $\reals^d$, we prove that our algorithm has a constant computation complexity per iteration of $O(poly(d))$ and can achieve a regret of $O(poly(d) \log T)$ even when $|\myset{X}| = \Omega (2^d) $. In addition, unlike previous algorithms, its space complexity scales like $O(Kd^2)$ and does not grow with $T$.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.