pith. machine review for the scientific record.
arxiv: 1904.06312 · v1 · pith:P6AEGILSnew · submitted 2019-04-12 · 💻 cs.LG · cs.AI · stat.ML

Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

classification 💻 cs.LG · cs.AI · stat.ML
keywords agents · learning · performance · reinforcement · variability · common · environment · learned
Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself, has led researchers to report the performance of learned agents using aggregate metrics of performance over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make common summary statistics an unsound measure of performance. Our experiments demonstrate the variability of common agents used in the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution, rather than a point estimate.
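
The contrast between a point estimate and a distribution can be illustrated with a minimal sketch. The two lists of episode returns below are hypothetical numbers invented for illustration, not results from the paper, and the `summarize` helper is an assumed name rather than anything from OpenAI Baselines. Both hypothetical agents share the same mean return, yet their quartiles differ sharply.

```python
import statistics


def summarize(returns):
    """Return a point estimate (mean) alongside a five-number summary."""
    ordered = sorted(returns)
    # statistics.quantiles with n=4 yields the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "mean": statistics.mean(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Hypothetical post-training episode returns for two agents (illustrative only).
steady = [190, 195, 198, 200, 200, 202, 205, 210]
erratic = [40, 45, 50, 55, 345, 350, 355, 360]

steady_summary = summarize(steady)
erratic_summary = summarize(erratic)

for name, summary in (("steady", steady_summary), ("erratic", erratic_summary)):
    print(name, summary)
```

In this sketch the mean alone cannot separate the two agents, while the interquartile range and the extremes show that the second agent alternates between very low and very high returns.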

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Replicable Reinforcement Learning with Linear Function Approximation

    cs.LG · 2025-09 · unverdicted · novelty 7.0

    Introduces replicable random design regression and covariance estimation tools to enable the first provably efficient replicable RL algorithms for linear MDPs in generative and episodic settings.

  2. RAPTOR: A Foundation Policy for Quadrotor Control

    cs.RO · 2025-09 · unverdicted · novelty 6.0

    A 2084-parameter recurrent policy trained by distilling 1000 RL teacher policies enables zero-shot control across 10 real quadrotors differing in mass, motors, frames, propellers, and flight controllers.