pith. machine review for the scientific record. sign in

arxiv: 1207.5879 · v1 · pith:BCAIBP6Dnew · submitted 2012-07-25 · 💻 cs.AI

Selecting Computations: Theory and Applications

classification 💻 cs.AI
keywords decisionproblemsactionbanditbayesiancarlocasesderive
0
0 comments X
read the original abstract

Sequential decision problems are often approximately solvable by simulating possible future action sequences. {\em Metalevel} decision procedures have been developed for selecting {\em which} action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian {\em selection problems}, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.