pith. machine review for the scientific record.

arxiv: 1904.04700 · v1 · submitted 2019-04-09 · 💻 cs.LG · stat.ML

Recognition: unknown

Practical Open-Loop Optimistic Planning

keywords: open-loop, planning, algorithm, complexity, optimistic, practical, propose, access

We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies (i.e., sequences of actions), and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
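To illustrate why a KL-based upper-confidence bound can be less conservative than a Hoeffding-style one, here is a minimal Python sketch for rewards in [0, 1]. It is not the paper's implementation: the function names, the `threshold` parameter (standing in for a log-budget term), and the bisection search are assumptions made for illustration, and the exact constants used by OLOP and KL-OLOP may differ.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def hoeffding_ucb(mean, n, threshold):
    """Hoeffding-style bound: mean plus a symmetric bonus. It can exceed 1."""
    return mean + math.sqrt(threshold / (2 * n))


def kl_ucb(mean, n, threshold, iters=50):
    """Largest q in [mean, 1] with n * KL(mean, q) <= threshold, found by bisection."""
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if n * bernoulli_kl(mean, mid) <= threshold:
            lo = mid  # mid is still plausible, search higher
        else:
            hi = mid  # mid is ruled out, search lower
    return lo


if __name__ == "__main__":
    # Hypothetical numbers: 20 samples of an action sequence, threshold = log(100).
    threshold = math.log(100)
    for mean in (0.1, 0.5, 0.9):
        print(mean, hoeffding_ucb(mean, 20, threshold), kl_ucb(mean, 20, threshold))
```

By Pinsker's inequality, the KL bound is never larger than the Hoeffding bound for the same threshold, and it always stays within [0, 1], which is one way to see how an optimistic planner could prune sequences sooner.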

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Planning in entropy-regularized Markov decision processes and games

    cs.LG 2026-04 unverdicted novelty 7.0

    SmoothCruiser achieves O~(1/epsilon^4) problem-independent sample complexity for value estimation in entropy-regularized MDPs and games via a generative model.

  2. Scale-free adaptive planning for deterministic dynamics & discounted rewards

    cs.LG 2026-04 unverdicted novelty 7.0

    Platypoos is a scale-free adaptive planning algorithm with sample complexity bounds that hold simultaneously across discount factors and reward scales, accompanied by a matching lower bound.