Maximin Action Identification: A New Bandit Framework for Games

Aur\'elien Garivier (IMT); Emilie Kaufmann (CRIStAL; SEQUEL); Wouter Koolen (CWI)

arxiv: 1602.04676 · v1 · pith:QVZ75EUFnew · submitted 2016-02-15 · 🧮 math.ST · cs.GT· stat.ML· stat.TH

Maximin Action Identification: A New Bandit Framework for Games

Aur\'elien Garivier (IMT) , Emilie Kaufmann (CRIStAL , SEQUEL) , Wouter Koolen (CWI) This is my paper

classification 🧮 math.ST cs.GTstat.MLstat.TH

keywords actionactionsbanditsamplealgorithmanalysisbestbound

0 comments

read the original abstract

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower-and upper-confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.

This paper has not been read by Pith yet.

Maximin Action Identification: A New Bandit Framework for Games

discussion (0)