pith. machine review for the scientific record.
sign in

arxiv: 1712.01275 · v3 · pith:LKQIUL5Anew · submitted 2017-12-04 · 💻 cs.LG · cs.AI

A Deeper Look at Experience Replay

classification 💻 cs.LG cs.AI
keywords replayexperiencebufferhyper-parameterlargeshowcasesimpleutility
0
0 comments X
read the original abstract

Recently experience replay is widely used in various deep reinforcement learning (RL) algorithms, in this paper we rethink the utility of experience replay. It introduces a new hyper-parameter, the memory buffer size, which needs carefully tuning. However unfortunately the importance of this new hyper-parameter has been underestimated in the community for a long time. In this paper we did a systematic empirical study of experience replay under various function representations. We showcase that a large replay buffer can significantly hurt the performance. Moreover, we propose a simple O(1) method to remedy the negative influence of a large replay buffer. We showcase its utility in both simple grid world and challenging domains like Atari games.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODACER) for Safe Reinforcement Learning in Optimal Control

    eess.SY 2026-01 unverdicted novelty 7.0

    SODACER uses fast and slow buffers with adaptive clustering for experience replay in safe RL, integrated with CBFs and Sophia optimizer to achieve faster convergence and safety on nonlinear systems like HPV transmission.

  2. AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

    cs.DC 2026-03 unverdicted novelty 5.0

    AGMARL-DKS uses per-node multi-agent RL with GNN state representations and stress-aware lexicographical ordering to outperform the default Kubernetes scheduler on fault tolerance, utilization, and cost for batch and m...