pith. sign in

arxiv: 1601.03855 · v1 · pith:5JS75PXTnew · submitted 2016-01-15 · 💻 cs.LG

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

classification 💻 cs.LG
keywords algorithmproblemrelativeadversarialbanditboundduelingexploitation
0
0 comments X
read the original abstract

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.