pith.

arXiv: 1805.01367 · v2 · pith:M4S3XKMZ · submitted 2018-05-03 · cs.LG · stat.ML

Open Loop Execution of Tree-Search Algorithms, extended version

classification: cs.LG, stat.ML
keywords: action, algorithms, decision, loop, method, open, optimality, planning

In the context of tree-search stochastic planning algorithms where a generative model is available, we consider on-line planning algorithms that build trees in order to recommend an action. We investigate how to avoid re-planning at subsequent decision steps by directly using sub-trees as action recommenders. First, we propose a method for open loop control via a new algorithm that decides at each time step whether or not to re-plan, based on an analysis of the statistics of the sub-tree. Second, we show that the probability of selecting a suboptimal action at any depth of the tree can be upper bounded and converges towards zero. Moreover, this upper bound decays logarithmically between subsequent depths. This leads to a distinction between node-wise optimality and state-wise optimality. Finally, we empirically demonstrate that our method achieves a compromise between loss of performance and computational gain.
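The abstract does not spell out the re-planning criterion, so the following is only a minimal sketch under stated assumptions. It assumes an open-loop tree whose nodes are indexed by action sequences and store visit counts and summed returns, and it uses one hypothetical rule: reuse the sub-tree under the executed action only when its most valuable child has been visited at least `min_visits` times and leads the runner-up by at least `min_gap`; otherwise, re-plan from scratch. `Node`, `should_replan`, `recommend` and both threshold values are illustrative names, not the authors' algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical open-loop tree node: identified by the action sequence
    from the root, not by the observed state."""
    visits: int = 0
    value_sum: float = 0.0
    children: Dict[int, "Node"] = field(default_factory=dict)

    def mean(self) -> float:
        # Empirical mean return; 0.0 for an unvisited node.
        return self.value_sum / self.visits if self.visits else 0.0


def should_replan(node: Optional[Node], min_visits: int = 50,
                  min_gap: float = 0.1) -> bool:
    """Hypothetical criterion: re-plan unless the sub-tree is well explored
    and its best action clearly dominates the second-best one."""
    if node is None or not node.children:
        return True  # nothing to reuse
    ranked = sorted(node.children.values(), key=lambda c: c.mean(), reverse=True)
    best = ranked[0]
    if best.visits < min_visits:
        return True  # estimate of the best action is too noisy
    if len(ranked) > 1 and best.mean() - ranked[1].mean() < min_gap:
        return True  # best and runner-up are too close to call
    return False


def recommend(node: Node) -> int:
    """Recommend the child action with the highest empirical mean."""
    return max(node.children, key=lambda a: node.children[a].mean())
```

In a control loop, after executing action `a` at the root, one would take `sub = root.children.get(a)` and act on `recommend(sub)` when `should_replan(sub)` is false, rebuilding the tree otherwise. Raising `min_visits` or `min_gap` makes re-planning more frequent, which trades computation for a lower risk of acting on a poorly estimated sub-tree, in the spirit of the compromise described in the abstract.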

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

    cs.AI 2026-05 conditional novelty 7.0

    State-conditioned commitment depth in a vision-language policy Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, raising solve rates by up to 12.5 points while using 25% fewer actions and beating l...

  2. When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

    cs.AI 2026-05 conditional novelty 7.0

    A vision-language policy learns state-conditioned commitment depth to Pareto-dominate fixed-depth baselines on long-horizon puzzles, achieving up to 12.5 pp higher solve rate with 25% fewer actions.

  3. When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

    cs.AI 2026-05 unverdicted novelty 6.0

    Learns state-conditioned commitment depth in a 7B vision-language policy that jointly predicts actions and replan intervals, outperforming fixed-depth baselines and larger models on Sliding Puzzle and Sokoban while pr...