Solving Rubik's Cube with a Robot Hand

OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew · 2019 · cs.LG · arXiv 1910.07113

Canonical reference. 78% of citing Pith papers cite this work as background.

64 Pith papers citing it

abstract

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR), and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/
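To make the phrase "a distribution over randomized environments of ever-increasing difficulty" concrete, here is a minimal sketch of a range-widening loop in the spirit of ADR. It is an illustration under stated assumptions, not the paper's implementation: the names `ParamRange`, `sample_env`, and `update`, the buffer size, the step size, and the two thresholds are all hypothetical, and the score passed to `update` stands in for whatever performance measure the training setup reports.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ParamRange:
    """Randomization range for one simulator parameter (hypothetical helper)."""
    lo: float
    hi: float
    step: float = 0.1                      # assumed widening/narrowing increment
    lo_buf: list = field(default_factory=list)
    hi_buf: list = field(default_factory=list)


def sample_env(ranges, rng, boundary_prob=0.5):
    """Sample every parameter uniformly from its current range.

    With probability boundary_prob, pin one parameter to one of its
    boundaries so that performance at the edge of the range can be probed.
    Returns (params, probe) where probe is (name, side) or None.
    """
    params = {name: rng.uniform(r.lo, r.hi) for name, r in ranges.items()}
    probe = None
    if rng.random() < boundary_prob:
        name = rng.choice(sorted(ranges))
        side = rng.choice(["lo", "hi"])
        params[name] = getattr(ranges[name], side)
        probe = (name, side)
    return params, probe


def update(ranges, probe, score, buffer_size=4, t_low=0.2, t_high=0.8):
    """Record a boundary score and widen or narrow the probed range.

    Thresholds and buffer size are assumed values for illustration.
    """
    if probe is None:
        return
    name, side = probe
    r = ranges[name]
    buf = r.lo_buf if side == "lo" else r.hi_buf
    buf.append(score)
    if len(buf) < buffer_size:
        return
    mean = sum(buf) / len(buf)
    buf.clear()
    if mean >= t_high:
        delta = r.step       # policy copes at the boundary: widen the range
    elif mean <= t_low:
        delta = -r.step      # policy struggles at the boundary: narrow it
    else:
        return
    if side == "lo":
        r.lo = min(r.lo - delta, r.hi)
    else:
        r.hi = max(r.hi + delta, r.lo)
```

In use, a training loop would call `sample_env` before each episode, run the policy in the resulting environment, and pass the episode's score back through `update`. Starting every range as a single point (`lo == hi`) means the environment distribution grows only as fast as the policy can handle it.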

citation-role summary

background 8 baseline 1

citation-polarity summary

background 7 baseline 1 unclear 1

representative citing papers

Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation

cs.LG · 2026-05-04 · unverdicted · novelty 8.0

The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Generative Language Modeling for Automated Theorem Proving

cs.LG · 2020-09-07 · unverdicted · novelty 8.0

GPT-f, a transformer-based prover for Metamath, generated new short proofs that were accepted into the main library—the first such contribution from a deep-learning system.

Labimus: A Simulation and Benchmark for Humanoid Dexterous Manipulation in Chemical Laboratory

cs.RO · 2026-06-30 · unverdicted · novelty 7.0

Labimus is the first benchmark for humanoid dexterous manipulation in organic chemistry laboratories, exposing a gap between task completion and required experimental precision.

DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand

cs.RO · 2026-06-26 · unverdicted · novelty 7.0

DexCompose achieves 77.4% average success on 16 composite dexterous tasks by using role-aware residual composition with explicit finger ownership to combine pretrained policies without destructive interference.

Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity

cs.LG · 2026-06-09 · conditional · novelty 7.0

Non-quadratic Mirror Descent exhibits exponential initialization sensitivity in convex settings, shown via 3D constructions and KL-regularized simplex examples, with Bregman anchoring proposed for stabilization.

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

OrderGrad supplies unbiased likelihood-ratio and reparameterization gradient estimators for finite-sample L-statistics by applying a rank-based reward transformation usable in standard policy-gradient updates.

Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation

cs.RO · 2026-05-27 · unverdicted · novelty 7.0

CoP tactile representation with differentiable calibration enables zero-shot sim-to-real transfer and outperforms binary and raw-taxel baselines on peg-in-hole insertion and ball balancing with a multi-fingered hand.

Learning Robust Dexterous In-Hand Manipulation from Joint Sensors with Proprioceptive Transformer

cs.RO · 2026-05-20 · conditional · novelty 7.0

A transformer policy distilled from a privileged RL teacher enables 3.1x faster real-world cube rotation on the ORCA hand using solely joint sensor data by extracting implicit object state from temporal joint patterns.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.

Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers

cs.NI · 2026-05-06 · unverdicted · novelty 7.0

ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.

HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness

cs.RO · 2026-04-28 · unverdicted · novelty 7.0

HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.

Betting for Sim-to-Real Performance Evaluation

cs.RO · 2026-04-27 · unverdicted · novelty 7.0

Betting mechanisms can yield provably more accurate and efficient estimates of real-world robot behavior than Monte Carlo sampling under specified conditions, with practical approximations demonstrated on synthetic data and a robotic manipulator task.

SynthPID: P&ID digitization from Topology-Preserving Synthetic Data

cs.CV · 2026-04-15 · conditional · novelty 7.0

Topology-preserving synthetic P&IDs generated by seeding from real drawings enable models trained solely on synthetics to achieve 63.8% edge mAP on real P&ID benchmarks, closing most of the gap to real-data training.

Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum

cs.LG · 2026-02-02 · unverdicted · novelty 7.0

Single-timescale actor-critic with STORM momentum and a recent-sample buffer achieves optimal O(ε^{-2}) sample complexity for ε-optimal policies in finite discounted MDPs.

Learning to Play Piano in the Real World

cs.RO · 2025-03-19 · unverdicted · novelty 7.0

A Sim2Real2Sim learning pipeline enables a real-world dexterous robot to play piano pieces including Happy Birthday and Ode to Joy with an average F1-score of 0.881.

Dota 2 with Large Scale Deep Reinforcement Learning

cs.LG · 2019-12-13 · accept · novelty 7.0

OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.

Efficient Sim-to-Real Transfer of World-Action Models from Synthetic Priors

cs.RO · 2026-06-30 · unverdicted · novelty 6.0

A world-action model trained on ~800 synthetic demonstrations per task achieves 35% zero-shot success on real-robot manipulation tasks.

MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

cs.RO · 2026-06-06 · unverdicted · novelty 6.0

Introduces MuJoCo-Drones-Gym, a modular Gymnasium-compatible multi-drone simulator on MuJoCo with GPU acceleration and seven example tasks for control and RL.

Task diversity produces systematic transfer but inhibits continual reinforcement learning

cs.LG · 2026-05-30 · unverdicted · novelty 6.0

Task diversity along map, object, and hierarchy axes produces local transfer across shifts in a new continual RL benchmark but fails to sustain learning as the number of shifts grows.

Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

Physical Atari is a robust under-$1000 hardware platform combining a bearing-based robot arm, Atari controller actuator, screen renderer, and camera for real-time RL experiments directly on physical hardware.

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

cs.RO · 2026-05-28 · unverdicted · novelty 6.0

UniLab is a CPU/GPU heterogeneous system for robot RL training using MuJoCoUni and MotrixSim backends that reports 3-10x end-to-end efficiency improvements and cross-platform compatibility beyond CUDA.

Fishbone: From One 3D Asset to a Million Controllable Edits

cs.CV · 2026-05-24 · unverdicted · novelty 6.0

Fishbone introduces a unified rib-spine representation computed via adaptive heat method, iso-contour ribs, and geometry-aware spine that enables real-time parametric deformation, reduced-space simulation, and animation on general meshes.

citing papers explorer

Showing 14 of 64 citing papers.

HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies

cs.RO · 2026-03-12 · unverdicted · none · ref 29 · 2 links · internal anchor

HandelBot refines simulation policies via physical rollouts and residual RL to achieve precise bimanual piano playing, outperforming direct sim transfer by 1.8x with only 30 minutes of real data across five songs.

UniCon: A Unified System for Efficient Robot Learning Transfers

cs.RO · 2026-01-21 · unverdicted · none · ref 18 · internal anchor

UniCon standardizes states and control logic into modular execution graphs for efficient transfer of learning controllers across heterogeneous robots, with lower latency than ROS.

RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation

cs.RO · 2025-10-20 · unverdicted · none · ref 13 · internal anchor

RESample uses exploratory sampling guided by a lightweight Coverage Function to expand VLA training data coverage, yielding 12% performance gains on LIBERO and real-world tasks with 10-20% added samples.

Learning to Act Through Contact: A Unified View of Multi-Task Robot Learning

cs.RO · 2025-10-04 · unverdicted · none · ref 3 · internal anchor

A single goal-conditioned RL policy trained on contact plans performs multiple gaits and bimanual manipulation tasks on quadruped and humanoid robots.

Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands

cs.RO · 2025-09-22 · unverdicted · none · ref 46 · internal anchor

GD2P generates and learns dexterous hand poses for nonprehensile pushing and pulling by combining contact-guided sampling, physics-based filtering, and a geometry-conditioned diffusion model, demonstrated on Allegro and LEAP hands in real-world tests.

Apple: Toward General Active Perception via Reinforcement Learning

cs.RO · 2025-05-09 · unverdicted · none · ref 1 · internal anchor

APPLE is an RL framework that jointly optimizes a transformer perception module and policy via a unified objective for general active perception, with evaluations on tactile MNIST regression and classification tasks.

Analyzing Adversarial Inputs in Deep Reinforcement Learning

cs.LG · 2024-02-07 · unverdicted · none · ref 1 · internal anchor

Introduces the Adversarial Rate metric and associated tools to systematically evaluate and visualize the impact of adversarial inputs on DRL policies using formal verification.

Position: Deployed Reinforcement Learning should be Continual

cs.LG · 2026-06-01 · unverdicted · none · ref 1 · internal anchor

Deployed RL agents receiving evaluative rewards face inherent non-stationarity and should engage in continual learning rather than following a train-then-fix approach.

Plasticity Loss in Deep Reinforcement Learning: A Survey

cs.AI · 2024-11-07 · unverdicted · none · ref 88 · internal anchor

Survey unifies the definition of plasticity loss in DRL, taxonomizes over 50 mitigations, identifies evaluation gaps, and finds general regularization often outperforms domain-specific methods.

Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manipulation Tasks

cs.RO · 2022-06-13 · unverdicted · none · ref 21 · internal anchor

A benchmark study finds that increased randomization improves Sim2Real transfer in robotic RL despite trade-offs in simulation learning, with full randomization and fine-tuning outperforming other approaches on the real robot.

Learning Stable In-Grasp Manipulation in a Non-Dropping Action Space

cs.RO · 2026-06-26 · unverdicted · none · ref 13 · internal anchor

Decomposing dexterous skills into physics-constrained components enables efficient and stable RL for in-grasp manipulation across objects, noise, latency, and friction conditions.

Robustness of Robotic Manipulation: Foundations and Frontiers

cs.RO · 2026-06-30 · unverdicted · none · ref 13 · internal anchor

A survey that formalizes manipulation robustness from probabilistic and control perspectives and reviews mechanisms, metrics, and open problems across robotics subfields.

Learning Dexterous Grasping from Sparse Taxonomy Guidance

cs.RO · 2026-04-05 · unreviewed · ref 10 · internal anchor

PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation

cs.RO · 2026-03-04 · unreviewed · ref 16 · internal anchor
