Practical Open-Loop Optimistic Planning
We consider the problem of online planning in a Markov Decision Process given only access to a generative model, restricted to open-loop policies - i.e. sequences of actions - and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
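The "tighter upper-confidence bounds" mentioned in the abstract are KL-based bounds in the style of kl-UCB: instead of adding a Hoeffding-type exploration term to the empirical mean, one takes the largest mean consistent with the observations under a Bernoulli KL-divergence constraint. Below is a minimal illustrative sketch of that computation, not code from the paper; the function names `kl_bernoulli` and `kl_ucb` and the bisection tolerance are assumptions for the example.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clamp away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(mean, count, exploration, tol=1e-6):
    """KL-based upper confidence bound for a [0, 1]-valued arm:
    the largest q >= mean with count * kl(mean, q) <= exploration,
    found by bisection (kl is increasing in q on [mean, 1])."""
    if count == 0:
        return 1.0  # no observations: the bound is vacuous
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count * kl_bernoulli(mean, mid) <= exploration:
            lo = mid  # mid is still consistent; push the bound up
        else:
            hi = mid
    return lo
```

By Pinsker's inequality, this bound is never looser than the Hoeffding-style bound `mean + sqrt(exploration / (2 * count))`, which is the sense in which KL confidence intervals are tighter in practice.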
Forward citations
Cited by 2 Pith papers
- Planning in entropy-regularized Markov decision processes and games: SmoothCruiser achieves Õ(1/ε⁴) problem-independent sample complexity for value estimation in entropy-regularized MDPs and games via a generative model.
- Scale-free adaptive planning for deterministic dynamics & discounted rewards: Platypoos is a scale-free adaptive planning algorithm with sample complexity bounds that hold simultaneously across discount factors and reward scales, accompanied by a matching lower bound.