
arxiv: 1809.06098 · v2 · pith:UHEBMECInew · submitted 2018-09-17 · 💻 cs.LG · cs.AI · stat.ML

Policy Optimization via Importance Sampling

classification 💻 cs.LG · cs.AI · stat.ML
keywords optimization, policy, algorithm, continuous, control, function, importance, objective

Policy optimization is an effective reinforcement learning approach for solving continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel, model-free policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
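The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: reuse one batch of samples to score candidate policies offline through importance sampling, while penalizing estimates that are likely to have high variance. Everything here is an assumption made for illustration: the 1-D Gaussian "policies", the hypothetical `toy_return` function, the names `penalized_is_objective` and `penalty_coef`, and the choice of the empirical second moment of the weights as the uncertainty term. It is not the paper's bound or the POIS algorithm itself.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def importance_weights(samples, target_mean, behavior_mean, std):
    """w_i = p_target(x_i) / p_behavior(x_i), computed in log space."""
    return [
        math.exp(gaussian_logpdf(x, target_mean, std)
                 - gaussian_logpdf(x, behavior_mean, std))
        for x in samples
    ]


def penalized_is_objective(weights, returns, penalty_coef):
    """Importance sampling estimate of the return minus an uncertainty penalty.

    ASSUMPTION: the penalty uses the empirical second moment of the weights,
    which grows as the target moves away from the behavioral distribution.
    This is an illustrative stand-in, not the bound derived in the paper.
    """
    n = len(weights)
    is_estimate = sum(w * r for w, r in zip(weights, returns)) / n
    second_moment = sum(w * w for w in weights) / n
    return is_estimate - penalty_coef * math.sqrt(second_moment / n)


def toy_return(x):
    """Hypothetical return function, maximized at x = 2."""
    return -(x - 2.0) ** 2


# Online step: collect one batch from the behavioral distribution.
rng = random.Random(0)
behavior_mean, std = 0.0, 1.0
samples = [rng.gauss(behavior_mean, std) for _ in range(200)]
returns = [toy_return(x) for x in samples]

# Offline step: score candidate target means on that same batch.
candidates = [i * 0.1 for i in range(41)]
scores = {
    m: penalized_is_objective(
        importance_weights(samples, m, behavior_mean, std), returns, 1.0)
    for m in candidates
}
best_mean = max(scores, key=scores.get)
```

In this toy setup, a larger `penalty_coef` lowers the score of candidates whose weights are widely spread, which is the qualitative behavior one would want from a surrogate that accounts for estimator variance. In an alternating scheme, `best_mean` would then become the behavioral mean for the next batch.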
