Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays
classification
💻 cs.LG
keywords
boundsbanditlowermulti-armedmultiplenon-equivalentplaysproblem
read the original abstract
We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms, but also their order, which influences reward distribution. In several problem formulations with different assumptions, we provide lower bounds for regret with standard asymptotics $O(\log{t})$ but novel coefficients and provide optimal algorithms, thus proving that these bounds cannot be improved.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.