pith. machine review for the scientific record.

arxiv: 1611.01224 · v2 · submitted 2016-11-03 · 💻 cs.LG

Recognition: unknown

Sample Efficient Actor-Critic with Experience Replay

classification 💻 cs.LG
keywords actor-critic, efficient, experience, including, replay, sample, several, achieve

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
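
To make "truncated importance sampling with bias correction" concrete, here is a minimal sketch, assuming a small discrete action space where expectations can be computed exactly. The function names, the example probabilities, and the threshold `c` are hypothetical choices for illustration and are not taken from the paper's code. The idea shown: the importance ratio rho = pi/mu is clipped at `c` to bound variance, and a correction term weighted by max(0, 1 - c/rho) under the target policy restores the part that clipping removed.

```python
def truncated_is_terms(pi, mu, c=10.0):
    """Per-action truncated weights and bias-correction coefficients.

    pi, mu: target and behaviour action probabilities (mu must be > 0).
    c: truncation threshold (hypothetical default, chosen for illustration).
    """
    rho = [p / m for p, m in zip(pi, mu)]      # importance ratio per action
    rho_bar = [min(r, c) for r in rho]         # truncated weight, bounded by c
    # Correction coefficient max(0, 1 - c / rho); nonzero only where rho > c.
    correction = [max(0.0, 1.0 - c / r) if r > 0 else 0.0 for r in rho]
    return rho_bar, correction


def corrected_expectation(f, pi, mu, c=10.0):
    """E_pi[f] rebuilt from a truncated term (under mu) plus a correction term (under pi)."""
    rho_bar, correction = truncated_is_terms(pi, mu, c)
    truncated_term = sum(m * w * v for m, w, v in zip(mu, rho_bar, f))
    correction_term = sum(p * k * v for p, k, v in zip(pi, correction, f))
    return truncated_term + correction_term


# Illustrative numbers only: the first action has ratio 7, well above c = 2.
pi = [0.7, 0.2, 0.1]
mu = [0.1, 0.3, 0.6]
f = [1.0, -2.0, 0.5]

estimate = corrected_expectation(f, pi, mu, c=2.0)
target = sum(p * v for p, v in zip(pi, f))     # direct expectation under pi
```

With exact expectations the two terms sum to the on-policy value, so `estimate` matches `target` up to floating-point error. In a sampled setting the first term would use actions drawn from the behaviour policy and the second would be evaluated under the current policy.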

This paper has not been read by Pith yet.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

    cs.LG 2026-05 unverdicted novelty 7.0

    DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

  2. Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters

    cs.LG 2026-05 accept novelty 7.0

    Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.

  3. Beyond Importance Sampling: Rejection-Gated Policy Optimization

    cs.LG 2026-04 unverdicted novelty 6.0

    RGPO replaces importance sampling with a smooth [0,1] acceptance gate in policy gradients, unifying TRPO/PPO/REINFORCE, bounding variance for heavy-tailed ratios, and showing gains in online RLHF experiments.

  4. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

    cs.LG 2019-10 conditional novelty 6.0

    AWR learns policies via advantage-weighted supervised regression on actions, achieving competitive off-policy performance on Gym tasks and strong results from static data alone.

  5. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

    cs.LG 2020-05 unverdicted novelty 2.0

    Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.