Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

Plappert, M · 2018 · cs.LG · arXiv 1802.09464

14 Pith papers cite this work. Polarity classification is still indexing.

abstract

The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
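The sparse binary reward and the goal given as an additional input can be illustrated with a small sketch. This is not the environments' actual code: the function names, the 0.05 tolerance, and the toy 2-D goals are assumptions chosen for illustration. The reward convention (0 when the goal is reached within a tolerance, -1 otherwise) and the dictionary keys `observation`, `achieved_goal`, and `desired_goal` follow the Gym multi-goal interface. The second function shows one simple form of the relabeling idea behind Hindsight Experience Replay, where the goal actually reached at the end of an episode is substituted for the goal that was originally requested.

```python
import math


def sparse_reward(achieved_goal, desired_goal, tolerance=0.05):
    """Binary reward: 0.0 if the goal is reached within `tolerance`, else -1.0.

    `tolerance` is an illustrative value, not taken from the abstract.
    """
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance <= tolerance else -1.0


def relabel_with_final_goal(episode, tolerance=0.05):
    """Hindsight relabeling sketch: treat the last achieved goal as the target.

    Returns new transition dicts; the input episode is left unchanged.
    """
    new_goal = episode[-1]["achieved_goal"]
    return [
        {
            **step,
            "desired_goal": new_goal,
            "reward": sparse_reward(step["achieved_goal"], new_goal, tolerance),
        }
        for step in episode
    ]


# Toy episode: the agent was asked to reach (1.0, 1.0) but ended at (0.6, 0.8).
target = (1.0, 1.0)
episode = [
    {"observation": [0.0, 0.0], "achieved_goal": (0.0, 0.0), "desired_goal": target},
    {"observation": [0.3, 0.4], "achieved_goal": (0.3, 0.4), "desired_goal": target},
    {"observation": [0.6, 0.8], "achieved_goal": (0.6, 0.8), "desired_goal": target},
]

# Rewards under the original goal: every step fails, so the signal is uninformative.
original_rewards = [
    sparse_reward(step["achieved_goal"], step["desired_goal"]) for step in episode
]

# Rewards after relabeling: the final step now counts as a success.
relabeled_rewards = [step["reward"] for step in relabel_with_final_goal(episode)]

print(original_rewards)
print(relabeled_rewards)
```

With the original goal every transition carries the same failure reward, which is what makes sparse-reward tasks hard to learn from; after relabeling, the same trajectory contains at least one successful transition for the substituted goal.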

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Solving Rubik's Cube with a Robot Hand

cs.LG · 2019-10-16 · accept · novelty 7.0

Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.

Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Introduces RAPCs and a contraction Bellman operator for cost-optimal policies that satisfy probabilistic reach-avoid specifications in stochastic MDPs, with almost-sure convergence to local optima.

Learning Optimal Strategies for Temporal Tasks in Stochastic Games

cs.AI · 2021-02-08 · unverdicted · novelty 6.0

Model-free RL learns optimal strategies in stochastic games for LTL specs by constructing a product with DPA and assigning rewards/discounts from acceptance conditions.

Disentangled Skill Embeddings for Reinforcement Learning

cs.LG · 2019-06-21 · unverdicted · novelty 6.0

Disentangled Skill Embeddings (DSE) is a variational inference framework for multi-task RL using shared parameters and task-specific latent embeddings for generalization to unseen conditions and as skills in hierarchical RL.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

Trajectory-Level Data Augmentation for Offline Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

Trajectory-based data augmentation exploits geometric relationships between rewards, values, and logging policies to enable effective offline RL from few suboptimal trajectories.

When a Robot is More Capable than a Human: Learning from Constrained Demonstrators

cs.RO · 2025-10-10 · unverdicted · novelty 5.0

Robots outperform constrained human demonstrations by inferring state-only rewards from demos and using temporal interpolation to label and explore better trajectories, achieving 10x faster task completion on a real robotic arm than behavioral cloning.

D2 Actor Critic: Diffusion Actor Meets Distributional Critic

cs.LG · 2025-10-03 · unverdicted · novelty 5.0

D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.

Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

cs.LG · 2025-06-26 · unverdicted · novelty 5.0

SSE improves long-horizon goal-conditioned RL by using failure and partial-success transitions to identify unreliable subgoals, streamline high-level planning, and outperform prior hierarchical methods on benchmarks.

Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning

cs.RO · 2024-10-08 · unverdicted · novelty 5.0

A Goal-Conditioned Decision Transformer is adapted for offline multi-goal RL and shown to outperform online baselines on a new Franka Emika Panda dataset.

Middle-mile logistics through the lens of goal-conditioned reinforcement learning

stat.ML · 2026-05-04 · unverdicted · novelty 4.0

Middle-mile logistics is cast as a multi-object goal-conditioned MDP and solved by combining graph neural networks with model-free RL via extraction of small feature graphs.

citing papers explorer

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
cs.LG · 2026-05-14 · unverdicted · none · ref 106 · internal anchor

Solving Rubik's Cube with a Robot Hand
cs.LG · 2019-10-16 · accept · none · ref 84 · internal anchor

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic
cs.LG · 2026-05-09 · unverdicted · none · ref 39

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
cs.LG · 2026-05-15 · unverdicted · none · ref 144 · internal anchor

Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning
cs.LG · 2026-05-12 · unverdicted · none · ref 7 · 2 links · internal anchor

Learning Optimal Strategies for Temporal Tasks in Stochastic Games
cs.AI · 2021-02-08 · unverdicted · none · ref 40 · internal anchor

Disentangled Skill Embeddings for Reinforcement Learning
cs.LG · 2019-06-21 · unverdicted · none · ref 7 · internal anchor

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
cs.LG · 2026-05-03 · unverdicted · none · ref 152

Trajectory-Level Data Augmentation for Offline Reinforcement Learning
cs.LG · 2026-05-13 · unverdicted · none · ref 28 · internal anchor

When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
cs.RO · 2025-10-10 · unverdicted · none · ref 11 · internal anchor

D2 Actor Critic: Diffusion Actor Meets Distributional Critic
cs.LG · 2025-10-03 · unverdicted · none · ref 27 · internal anchor

Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
cs.LG · 2025-06-26 · unverdicted · none · ref 25 · internal anchor

Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning
cs.RO · 2024-10-08 · unverdicted · none · ref 40 · internal anchor

Middle-mile logistics through the lens of goal-conditioned reinforcement learning
stat.ML · 2026-05-04 · unverdicted · none · ref 15
