arxiv: 1611.01224 · v2 · pith:NVVW6T56new · submitted 2016-11-03 · 💻 cs.LG

Sample Efficient Actor-Critic with Experience Replay

Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

classification 💻 cs.LG

keywords actor-critic · efficient · experience · including · replay · sample · several · achieve

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
cs.LG 2026-05 unverdicted novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
cs.LG 2026-05 accept novelty 7.0

Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.

Beyond Importance Sampling: Rejection-Gated Policy Optimization
cs.LG 2026-04 unverdicted novelty 6.0

RGPO replaces importance sampling with a smooth [0,1] acceptance gate in policy gradients, unifying TRPO/PPO/REINFORCE, bounding variance for heavy-tailed ratios, and showing gains in online RLHF experiments.

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
cs.LG 2019-10 conditional novelty 6.0

AWR learns policies via advantage-weighted supervised regression on actions, achieving competitive off-policy performance on Gym tasks and strong results from static data alone.

Polychromic Objectives for Reinforcement Learning
cs.LG 2025-09 unverdicted novelty 5.0

Introduces polychromic objectives adapted into PPO via vine sampling and modified advantages, showing higher success rates and better coverage under perturbations on BabyAI, Minigrid, and algorithmic tasks.

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
cs.LG 2020-05 unverdicted novelty 2.0

Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.