Practical Open-Loop Optimistic Planning
We consider the problem of online planning in a Markov Decision Process given only access to a generative model, restricted to open-loop policies - i.e. sequences of actions - and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
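The "tighter upper-confidence bounds" mentioned in the abstract are KL-based bounds in the style of kl-UCB: instead of adding a Hoeffding-type exploration term to the empirical mean, one takes the largest mean consistent with the observations under a Bernoulli KL-divergence constraint. Below is a minimal illustrative sketch of that computation, not code from the paper; the function names `kl_bernoulli` and `kl_ucb` and the bisection tolerance are assumptions for the example.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clamp away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(mean, count, exploration, tol=1e-6):
    """KL-based upper confidence bound for a [0, 1]-valued arm:
    the largest q >= mean with count * kl(mean, q) <= exploration,
    found by bisection (kl is increasing in q on [mean, 1])."""
    if count == 0:
        return 1.0  # no observations: the bound is vacuous
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count * kl_bernoulli(mean, mid) <= exploration:
            lo = mid  # mid is still consistent; push the bound up
        else:
            hi = mid
    return lo
```

By Pinsker's inequality, this bound is never looser than the Hoeffding-style bound `mean + sqrt(exploration / (2 * count))`, which is the sense in which KL confidence intervals are tighter in practice.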
Forward citations
Cited by 2 Pith papers
- Planning in entropy-regularized Markov decision processes and games: SmoothCruiser achieves Õ(1/ε⁴) problem-independent sample complexity for value estimation in entropy-regularized MDPs and games via a generative model.
- Scale-free adaptive planning for deterministic dynamics & discounted rewards: Platypoos is a scale-free adaptive planning algorithm with sample complexity bounds that hold simultaneously across discount factors and reward scales, accompanied by a matching lower bound.