Reinforcement learning through asynchronous advantage actor-critic on a GPU

· 2016 · cs.LG · arXiv 1611.06256

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed up compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C .
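The abstract mentions a system of queues that feeds the GPU. As a rough illustration only, the sketch below shows one way a prediction queue could collect requests from several agent threads and answer them in batches. It is not taken from the GA3C repository: `batched_policy`, `Predictor`, `run_demo`, and the `max_batch` parameter are hypothetical names, the "model" is a plain Python function standing in for a batched GPU forward pass, and the training queue and dynamic scheduling described in the abstract are left out.

```python
import queue
import threading


def batched_policy(states):
    # Hypothetical stand-in for one batched forward pass on the GPU.
    return [2 * s for s in states]


class Predictor(threading.Thread):
    """Drains a shared request queue and answers requests in batches."""

    def __init__(self, request_q, reply_qs, max_batch=4):
        super().__init__(daemon=True)
        self.request_q = request_q
        self.reply_qs = reply_qs
        self.max_batch = max_batch  # assumed batch limit, not from the paper

    def run(self):
        while True:
            item = self.request_q.get()  # block until the first request arrives
            if item is None:             # sentinel: shut down
                return
            batch = [item]
            stop = False
            # Opportunistically add whatever else is already waiting.
            while len(batch) < self.max_batch:
                try:
                    nxt = self.request_q.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            ids, states = zip(*batch)
            outputs = batched_policy(list(states))
            # Route each output back to the agent that asked for it.
            for agent_id, out in zip(ids, outputs):
                self.reply_qs[agent_id].put(out)
            if stop:
                return


def run_demo(num_agents=3, steps=5):
    request_q = queue.Queue()
    reply_qs = [queue.Queue() for _ in range(num_agents)]
    predictor = Predictor(request_q, reply_qs)
    predictor.start()
    results = [[] for _ in range(num_agents)]

    def agent(agent_id):
        for t in range(steps):
            state = agent_id * 100 + t           # toy "observation"
            request_q.put((agent_id, state))     # ask for a prediction
            results[agent_id].append(reply_qs[agent_id].get())

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(num_agents)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    request_q.put(None)  # stop the predictor once all agents are done
    predictor.join()
    return results
```

Calling `run_demo()` returns one list of outputs per agent. Because each agent waits for its reply before sending the next request, the per-agent order is preserved no matter how the requests end up grouped into batches.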

representative citing papers

ERPPO: Entropy Regularization-based Proximal Policy Optimization

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

Biologically Inspired Event-Based Perception and Sample-Efficient Learning for High-Speed Table Tennis Robots

cs.RO · 2026-04-06 · unverdicted · novelty 5.0

Event-based perception combined with progressive low-to-high speed training improves robotic table tennis return accuracy by 35.8% using the same number of training episodes.

citing papers explorer

Showing 2 of 2 citing papers.

ERPPO: Entropy Regularization-based Proximal Policy Optimization

cs.LG · 2026-05-13 · unverdicted · none · ref 39 · internal anchor

ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

Biologically Inspired Event-Based Perception and Sample-Efficient Learning for High-Speed Table Tennis Robots

cs.RO · 2026-04-06 · unverdicted · none · ref 44

Event-based perception combined with progressive low-to-high speed training improves robotic table tennis return accuracy by 35.8% using the same number of training episodes.
