pith. machine review for the scientific record.

arxiv: 1511.05952 · v4 · submitted 2015-11-18 · 💻 cs.LG

Recognition: unknown

Prioritized Experience Replay

Authors on Pith: no claims yet
classification: 💻 cs.LG
keywords: replay, experience, prioritized, transitions, games, learning, reinforcement
original abstract

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently. We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games.
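The abstract leaves the prioritization scheme implicit. In the paper itself, the proportional variant samples transition i with probability P(i) = p_i^α / Σ_k p_k^α, where the priority p_i = |δ_i| + ε comes from the TD error, and corrects the resulting bias with importance-sampling weights w_i = (N · P(i))^(−β), normalized by their maximum. A minimal Python sketch of that variant (the paper uses a sum-tree for O(log N) sampling; plain arrays are used here for clarity, making sampling O(N)):

```python
# Minimal sketch of proportional prioritized replay (Schaul et al., 2015).
# Transitions are sampled with P(i) = p_i^alpha / sum_k p_k^alpha, where
# p_i = |TD error| + eps, and the bias is corrected with IS weights
# w_i = (N * P(i))^(-beta), normalized by the max weight in the batch.

import numpy as np


class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities shape sampling (0 = uniform)
        self.beta = beta    # IS correction strength (annealed toward 1 in the paper)
        self.eps = eps      # keeps every transition sampleable
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0
        self.size = 0

    def add(self, transition):
        # New transitions get the current max priority so each one is
        # replayed at least once before being down-weighted.
        max_p = self.priorities[:self.size].max() if self.size > 0 else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        scaled = self.priorities[:self.size] ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(self.size, batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias.
        weights = (self.size * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # After a learning step, priorities become |TD error| + eps.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In the DQN training loop, the returned weights multiply each sample's TD loss, and update_priorities is called with the fresh TD errors after every gradient step.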

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work, sorted by Pith novelty score.

  1. Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    ATD(λ) adapts TD(λ) in MARL via a density ratio estimator on past/current replay buffers to assign λ per state-action pair, yielding competitive or better results than fixed-λ QMIX and MAPPO on SMAC and Gfootball.

  2. Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation

    cs.RO 2026-05 conditional novelty 7.0

    A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing truncation bias from finite horizons.

  3. Disagreement-Regularized Importance Sampling for Adversarial Label Corruption

    cs.LG 2026-05 unverdicted novelty 7.0

    DR-IS selects low-contamination subsets via bounded rank-disagreement in proxy ensembles under an ε-contamination model, with O(√(log(N/δ)/K)) concentration rates that certify separation when the expectation gap Δ' is...

  4. Replay-buffer engineering for noise-robust quantum circuit optimization

    quant-ph 2026-04 unverdicted novelty 7.0

    Treating the replay buffer as a central lever in RL for quantum circuit optimization yields 4-32x sample efficiency gains, up to 67.5% faster episodes, and 85-90% fewer steps to accuracy on noisy molecular and compila...

  5. Mastering Diverse Domains through World Models

    cs.AI 2023-01 unverdicted novelty 7.0

    DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

  6. Mastering Atari with Discrete World Models

    cs.LG 2020-10 accept novelty 7.0

    DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

  7. Error whitening: Why Gauss-Newton outperforms Newton

    cs.LG 2026-05 conditional novelty 6.0

    Gauss-Newton descent whitens errors by projecting Newton directions or gradients onto the tangent space, replacing JJ^T with the identity and removing parameterization distortions that affect Newton descent.

  8. When Does Non-Uniform Replay Matter in Reinforcement Learning?

    cs.LG 2026-05 unverdicted novelty 6.0

    Non-uniform replay helps off-policy RL mainly at low replay volumes, high-entropy sampling matters even at similar recency, and Truncated Geometric replay offers a low-overhead practical solution (a toy sketch of recency-biased geometric sampling follows this list).

  9. Experience Constrained Hierarchical Federated Reinforcement Learning for Large-scale UAV Teams in Hazardous Environments

    cs.LG 2026-05 unverdicted novelty 6.0

    In experience-constrained federated RL for UAVs, learning performance depends primarily on experience reuse and minibatch size rather than the number of participating learners.

  10. AutoREC: A software platform for developing reinforcement learning agents for equivalent circuit model generation from electrochemical impedance spectroscopy data

    cs.LG 2026-04 unverdicted novelty 6.0

    AutoREC uses a Double Deep Q-Network agent to generate equivalent circuit models from EIS data, reporting over 99.6% success on synthetic sets and generalization to experimental battery, corrosion, and catalysis data.

  11. Preventing Latent Rehearsal Decay in Online Continual SSL with SOLAR

    cs.LG 2026-04 unverdicted novelty 6.0

    SOLAR prevents latent rehearsal decay in online continual SSL by adaptively managing replay buffers with deviation proxies and an explicit overlap loss, delivering both fast convergence and state-of-the-art final accuracy.

  12. Data Warmup: Complexity-Aware Curricula for Efficient Diffusion Training

    cs.LG 2026-04 conditional novelty 6.0

    Data Warmup accelerates diffusion training on ImageNet by scheduling images from low to high complexity via a foreground-based metric and temperature-controlled sampler, improving FID and IS scores faster than uniform...

  13. DeepMind Control Suite

    cs.AI 2018-01 accept novelty 6.0

    The DeepMind Control Suite supplies a standardized collection of continuous control tasks with interpretable rewards for benchmarking reinforcement learning agents.

  14. When Does Non-Uniform Replay Matter in Reinforcement Learning?

    cs.LG 2026-05 unverdicted novelty 5.0

    Non-uniform replay improves RL sample efficiency mainly in low replay-volume regimes, with high-entropy sampling being key even at comparable recency.

  15. Distributional Value Estimation Without Target Networks for Robust Quality-Diversity

    cs.LG 2026-04 unverdicted novelty 5.0

    QDHUAC is a distributional, target-free QD-RL method that enables stable high-UTD training and competitive performance on Brax locomotion tasks using far fewer environment steps than prior approaches.

  16. Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production

    cs.AI 2026-04 unverdicted novelty 5.0

    PF-CD3Q uses online particle filtering to estimate fatigue parameters and constrains a deep Q-learning agent to solve fatigue-aware human-robot task planning as a CMDP.

  17. Rainbow Deep Q-Learning with Kinematics-Aware Design for Cooperative Delta and 3-RRS Parallel Robot Insertion

    cs.RO 2026-05 unverdicted novelty 4.0

    Rainbow DQN with kinematics-aware design optimization enables reliable cooperative insertion by Delta and 3-RRS robots in a high-fidelity simulator.

  18. XekRung Technical Report

    cs.CR 2026-04 unverdicted novelty 3.0

    XekRung achieves state-of-the-art performance on cybersecurity benchmarks among same-scale models via tailored data synthesis and multi-stage training while retaining strong general capabilities.
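Item 8 above names Truncated Geometric replay without further detail on this page. The sketch below is a hypothetical reading of that name: transition ages are drawn from a geometric distribution truncated at the current buffer size, biasing replay toward recent experience while keeping older transitions sampleable. The function name, the parameter p, and the age convention are illustrative assumptions, not the cited paper's specification.

```python
# Hypothetical sketch of recency-biased "truncated geometric" replay, as a
# reading of item 8 above; the cited paper's actual scheme may differ.
# Age 0 is the newest transition; ages follow Geometric(p) truncated at the
# current buffer size, so recent experience is replayed more often while
# older transitions retain nonzero probability.

import numpy as np


def truncated_geometric_sample(buffer_size, batch_size, p=0.001, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    ages = np.arange(buffer_size)
    probs = (1.0 - p) ** ages    # unnormalized geometric pmf over age
    probs /= probs.sum()         # truncation: renormalize over the buffer
    return rng.choice(buffer_size, size=batch_size, p=probs)  # sampled ages
```

For a circular buffer, a sampled age maps back to a slot as index = (write_pos - 1 - age) % capacity.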