pith. machine review for the scientific record.

arxiv: 1611.06256 · v3 · submitted 2016-11-18 · 💻 cs.LG

Recognition: unknown

Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

Authors on Pith: no claims yet
classification 💻 cs.LG
keywords: asynchronous · actor-critic · advantage · computational · hybrid · introduce · learning · other
Abstract

We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed up compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C .
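The abstract's "system of queues and a dynamic scheduling strategy" refers to GA3C's core idea: many CPU agents enqueue prediction requests, and a predictor drains the queue into a single batched GPU call. A minimal sketch of that pattern, assuming a stand-in `model_fn` in place of the real TensorFlow network (the function and queue names here are illustrative, not the repository's API):

```python
import queue

def batch_predict(pred_queue, model_fn, max_batch=4):
    """Drain up to max_batch requests and answer them with one batched call.

    Each request is (state, reply_queue); model_fn maps a list of states
    to a list of actions, standing in for a single GPU forward pass.
    """
    states, replies = [], []
    # Block for the first request, then greedily drain without blocking,
    # so the batch size adapts to the current load.
    state, reply = pred_queue.get()
    states.append(state)
    replies.append(reply)
    while len(states) < max_batch:
        try:
            state, reply = pred_queue.get_nowait()
        except queue.Empty:
            break
        states.append(state)
        replies.append(reply)
    actions = model_fn(states)  # one batched "GPU" call for the whole group
    for action, reply in zip(actions, replies):
        reply.put(action)

# Usage: three agents enqueue states; one predictor pass serves them all.
pq = queue.Queue()
replies = [queue.Queue() for _ in range(3)]
for i, r in enumerate(replies):
    pq.put((i, r))
batch_predict(pq, lambda xs: [x * 2 for x in xs])
results = [r.get() for r in replies]
print(results)  # [0, 2, 4]
```

Batching this way is what lets the GPU amortize its per-call overhead; the dynamic part of the paper's scheduling adjusts the number of predictor and trainer threads at run time to keep the queues balanced.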

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ERPPO: Entropy Regularization-based Proximal Policy Optimization

    cs.LG 2026-05 unverdicted novelty 5.0

    ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

  2. Biologically Inspired Event-Based Perception and Sample-Efficient Learning for High-Speed Table Tennis Robots

    cs.RO 2026-04 unverdicted novelty 5.0

    Event-based perception combined with progressive low-to-high speed training improves robotic table tennis return accuracy by 35.8% using the same number of training episodes.