pith. sign in

Mixed citations

Distributed distributional deterministic policy gradients

Mixed citation behavior. Most common role is background (60%).

7 Pith papers citing it
Background 60% of classified citations
abstract

This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improvements such as the use of $N$-step returns and prioritized experience replay. Experimentally we examine the contribution of each of these individual components, and show how they interact, as well as their combined contributions. Our results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks the D4PG algorithm achieves state of the art performance.

citation-role summary

background 4 method 1

citation-polarity summary

representative citing papers

Solving Rubik's Cube with a Robot Hand

cs.LG · 2019-10-16 · accept · novelty 7.0

Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.

Path-Coupled Bellman Flows for Distributional Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

Path-Coupled Bellman Flows use source-consistent Bellman-coupled paths and a lambda-parameterized control-variate to learn return distributions via flow matching, improving fidelity and stability over prior DRL approaches.

A Generalist Agent

cs.AI · 2022-05-12 · accept · novelty 7.0

Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.

Dream to Control: Learning Behaviors by Latent Imagination

cs.LG · 2019-12-03 · accept · novelty 7.0

Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

D2 Actor Critic: Diffusion Actor Meets Distributional Critic

cs.LG · 2025-10-03 · unverdicted · novelty 5.0

D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.

Learning to Solve a Rubik's Cube with a Dexterous Hand

cs.RO · 2019-07-26 · unverdicted · novelty 5.0

Hierarchical RL combines a model-based cube solver with a model-free hand controller to solve Rubik's cubes in simulation, achieving 90.3% success on 1400 random scrambles.

citing papers explorer

Showing 7 of 7 citing papers.

  • Solving Rubik's Cube with a Robot Hand cs.LG · 2019-10-16 · accept · none · ref 7 · internal anchor

    Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.

  • Path-Coupled Bellman Flows for Distributional Reinforcement Learning cs.LG · 2026-05-07 · unverdicted · none · ref 27

    Path-Coupled Bellman Flows use source-consistent Bellman-coupled paths and a lambda-parameterized control-variate to learn return distributions via flow matching, improving fidelity and stability over prior DRL approaches.

  • A Provably Robust Multi-Jet Framework applied to Active Flow Control of an Airfoil in Weakly Compressible Flow physics.flu-dyn · 2026-04-29 · unverdicted · none · ref 46

    A new injective multi-jet framework for RL flow control provides jet-count-independent running cost upper bounds and enables superior coordinated jet strategies, achieving drag suppression beyond symmetric ideals on cylinders and aerodynamic efficiency gains from 53% to 73% on airfoils.

  • A Generalist Agent cs.AI · 2022-05-12 · accept · none · ref 8

    Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.

  • Dream to Control: Learning Behaviors by Latent Imagination cs.LG · 2019-12-03 · accept · none · ref 3

    Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

  • D2 Actor Critic: Diffusion Actor Meets Distributional Critic cs.LG · 2025-10-03 · unverdicted · none · ref 3 · internal anchor

    D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.

  • Learning to Solve a Rubik's Cube with a Dexterous Hand cs.RO · 2019-07-26 · unverdicted · none · ref 9 · internal anchor

    Hierarchical RL combines a model-based cube solver with a model-free hand controller to solve Rubik's cubes in simulation, achieving 90.3% success on 1400 random scrambles.