pith. sign in

arxiv: 1304.5793 · v4 · pith:XYUQ75UEnew · submitted 2013-04-21 · 💻 cs.LG

Continuum armed bandit problem of few variables in high dimensions

classification 💻 cs.LG
keywords alphaarmedcontinuumhighregrettimevariablesacross
0
0 comments X
read the original abstract

We consider the stochastic and adversarial settings of continuum armed bandits where the arms are indexed by [0,1]^d. The reward functions r:[0,1]^d -> R are assumed to intrinsically depend on at most k coordinate variables implying r(x_1,..,x_d) = g(x_{i_1},..,x_{i_k}) for distinct and unknown i_1,..,i_k from {1,..,d} and some locally Holder continuous g:[0,1]^k -> R with exponent 0 < alpha <= 1. Firstly, assuming (i_1,..,i_k) to be fixed across time, we propose a simple modification of the CAB1 algorithm where we construct the discrete set of sampling points to obtain a bound of O(n^((alpha+k)/(2*alpha+k)) (log n)^((alpha)/(2*alpha+k)) C(k,d)) on the regret, with C(k,d) depending at most polynomially in k and sub-logarithmically in d. The construction is based on creating partitions of {1,..,d} into k disjoint subsets and is probabilistic, hence our result holds with high probability. Secondly we extend our results to also handle the more general case where (i_1,...,i_k) can change over time and derive regret bounds for the same.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.