arxiv: 1807.05827 · v4 · pith:HE7BB5FRnew · submitted 2018-07-16 · 💻 cs.LG · stat.ML

Remember and Forget for Experience Replay

classification 💻 cs.LG stat.ML
keywords policy · ReF-ER · replay · experience · experiences · gradient · off-policy · algorithms

Experience replay (ER) is a fundamental component of off-policy deep reinforcement learning (RL). ER recalls experiences from past iterations to compute gradient estimates for the current policy, increasing data-efficiency. However, the accuracy of such updates may deteriorate when the policy diverges from past behaviors and can undermine the performance of ER. Many algorithms mitigate this issue by tuning hyper-parameters to slow down policy changes. An alternative is to actively enforce the similarity between policy and the experiences in the replay memory. We introduce Remember and Forget Experience Replay (ReF-ER), a novel method that can enhance RL algorithms with parameterized policies. ReF-ER (1) skips gradients computed from experiences that are too unlikely with the current policy and (2) regulates policy changes within a trust region of the replayed behaviors. We couple ReF-ER with Q-learning, deterministic policy gradient and off-policy gradient methods. We find that ReF-ER consistently improves the performance of continuous-action, off-policy RL on fully observable benchmarks and partially observable flow control problems.
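The two ReF-ER rules named in the abstract can be illustrated with a minimal sketch. The abstract does not say how "too unlikely" is measured or how the trust region is enforced, so everything specific here is an assumption made for illustration: unlikeliness is judged by the importance ratio between the current policy and the behavior that produced the sample, falling outside a band [1/c_max, c_max], and the trust region is approximated by subtracting a penalty gradient (for example, of a divergence from the replayed behavior) weighted by 1 - beta. The names `importance_ratio`, `is_near_policy`, `refer_gradient`, `c_max`, and `beta` are hypothetical.

```python
import math


def importance_ratio(logp_current, logp_behavior):
    """Ratio pi(a|s) / mu(a|s), computed from log-probabilities."""
    return math.exp(logp_current - logp_behavior)


def is_near_policy(ratio, c_max):
    """Assumed rule 1 test: the sample is usable only if its ratio
    lies strictly inside the band (1 / c_max, c_max)."""
    return 1.0 / c_max < ratio < c_max


def refer_gradient(grad_rl, grad_penalty, ratio, c_max, beta):
    """Combine the RL gradient with a trust-region penalty gradient.

    Assumed rule 1: drop the RL gradient for samples outside the band.
    Assumed rule 2: always subtract the penalty gradient, scaled by
    (1 - beta), to pull the policy back toward replayed behaviors.
    Gradients are plain lists of floats to keep the sketch dependency-free.
    """
    if is_near_policy(ratio, c_max):
        return [beta * g - (1.0 - beta) * p
                for g, p in zip(grad_rl, grad_penalty)]
    # Sample judged too unlikely: only the penalty term contributes.
    return [-(1.0 - beta) * p for p in grad_penalty]


if __name__ == "__main__":
    ratio = importance_ratio(math.log(0.5), math.log(0.1))  # about 5
    print(is_near_policy(ratio, c_max=4.0))
    print(refer_gradient([1.0, 2.0], [0.5, 0.5], ratio, c_max=4.0, beta=0.8))
```

In this sketch a sample whose ratio is about 5 is rejected when `c_max` is 4, so only the penalty term moves the parameters. How `c_max` and `beta` are chosen or adapted is outside what the abstract states.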

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Experience Replay Optimization

    cs.LG · 2019-06 · unverdicted · novelty 6.0

    ERO alternates updates between an agent policy maximizing cumulative reward and a replay policy selecting useful experiences, with experiments showing improved performance on continuous control tasks.