Doing Better Than UCT: Rational Monte Carlo Sampling in Trees

arxiv: 1108.3711 · v2 · pith:ULOYLN4Mnew · submitted 2011-08-18 · 💻 cs.AI

Doing Better Than UCT: Rational Monte Carlo Sampling in Trees

David Tolpin , Solomon Eyal Shimony This is my paper

classification 💻 cs.AI

keywords regretmctssamplingsimplealgorithmcarlocumulativemonte

0 comments p. Extension

pith:ULOYLN4M Add to your LaTeX paper

What is a Pith Number?

\usepackage{pith}
\pithnumber{ULOYLN4M}

Prints a linked pith:ULOYLN4M badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

UCT, a state-of-the art algorithm for Monte Carlo tree sampling (MCTS), is based on UCB, a sampling policy for the Multi-armed Bandit Problem (MAB) that minimizes the accumulated regret. However, MCTS differs from MAB in that only the final choice, rather than all arm pulls, brings a reward, that is, the simple regret, as opposite to the cumulative regret, must be minimized. This ongoing work aims at applying meta-reasoning techniques to MCTS, which is non-trivial. We begin by introducing policies for multi-armed bandits with lower simple regret than UCB, and an algorithm for MCTS which combines cumulative and simple regret minimization and outperforms UCT. We also develop a sampling scheme loosely based on a myopic version of perfect value of information. Finite-time and asymptotic analysis of the policies is provided, and the algorithms are compared empirically.

This paper has not been read by Pith yet.

Doing Better Than UCT: Rational Monte Carlo Sampling in Trees

discussion (0)