pith. machine review for the scientific record. sign in

arXiv preprint arXiv:1611.01224 , year=

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it
abstract

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.

fields

cs.LG 5

representative citing papers

Beyond Importance Sampling: Rejection-Gated Policy Optimization

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

RGPO replaces importance sampling with a smooth [0,1] acceptance gate in policy gradients, unifying TRPO/PPO/REINFORCE, bounding variance for heavy-tailed ratios, and showing gains in online RLHF experiments.

citing papers explorer

Showing 5 of 5 citing papers.