pith. machine review for the scientific record. sign in

arxiv: 1408.2028 · v1 · submitted 2014-08-09 · 💻 cs.AI

Recognition: unknown

Bandit Algorithms for Tree Search

Authors on Pith no claims yet
classification 💻 cs.AI
keywords treebanditsearchconfidencehighalgorithmsbranchesefficient
0
0 comments X
read the original abstract

Bandit based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of go [6]. Their efficient exploration of the tree enables to re- turn rapidly a good value, and improve preci- sion if more time is provided. The UCT algo- rithm [8], a tree search method based on Up- per Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, a modification of UCT us- ing a confidence sequence that scales expo- nentially in the horizon depth is analyzed. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST) which takes into account ac- tual smoothness of the rewards for perform- ing efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree expansion which applies when the full tree is too big (possibly in- finite) to be entirely represented and show that with high probability, only the optimal branches are indefinitely developed. We illus- trate these methods on a global optimization problem of a continuous function, given noisy values.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Planning in entropy-regularized Markov decision processes and games

    cs.LG 2026-04 unverdicted novelty 7.0

    SmoothCruiser achieves O~(1/epsilon^4) problem-independent sample complexity for value estimation in entropy-regularized MDPs and games via a generative model.

  2. Scale-free adaptive planning for deterministic dynamics & discounted rewards

    cs.LG 2026-04 unverdicted novelty 7.0

    Platypoos is a scale-free adaptive planning algorithm with sample complexity bounds that hold simultaneously across discount factors and reward scales, accompanied by a matching lower bound.