The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.
Solving Rubik's Cube with a Robot Hand
Canonical reference. 78% of citing Pith papers cite this work as background.
abstract
We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR), and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/
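The abstract describes ADR only as generating a distribution over randomized environments of ever-increasing difficulty. A minimal sketch of one way such a scheme could work is given here, assuming each environment parameter has a sampling range that widens when the policy does well at the range boundary and narrows when it does poorly. The names `ParamRange`, `adr_step`, and `evaluate`, along with the thresholds and step size, are hypothetical choices for illustration and are not taken from the abstract.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Sampling range [low, high] for one environment parameter."""
    low: float
    high: float


def adr_step(ranges, evaluate, step=0.1, expand_at=0.8, shrink_at=0.4, rng=random):
    """Run one simplified ADR-style update (illustrative, not the authors' code).

    One parameter is pinned at one of its range boundaries, the others are
    sampled uniformly, and `evaluate(env)` returns a score in [0, 1].
    A high score moves the boundary outward (harder distribution); a low
    score moves it inward (easier distribution).
    """
    name = rng.choice(sorted(ranges))
    side = rng.choice(["low", "high"])
    env = {k: rng.uniform(r.low, r.high) for k, r in ranges.items()}
    env[name] = getattr(ranges[name], side)
    score = evaluate(env)

    r = ranges[name]
    outward = -1.0 if side == "low" else 1.0
    if score >= expand_at:
        setattr(r, side, getattr(r, side) + outward * step)
    elif score <= shrink_at:
        moved = getattr(r, side) - outward * step
        # Clamp so the lower bound never passes the upper bound.
        if side == "low":
            r.low = min(moved, r.high)
        else:
            r.high = max(moved, r.low)
    return name, side, score


if __name__ == "__main__":
    # Hypothetical parameter: cube friction, starting with no randomization.
    ranges = {"friction": ParamRange(1.0, 1.0)}
    for _ in range(20):
        adr_step(ranges, evaluate=lambda env: 1.0)  # stand-in for a policy rollout
    print(ranges["friction"])
```

In this sketch a single score stands in for what would, in practice, be an average over many rollouts, so the update is noisier than a buffered version would be.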
representative citing papers
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
GPT-f, a transformer-based prover for Metamath, generated new short proofs that were accepted into the main library—the first such contribution from a deep-learning system.
CoP tactile representation with differentiable calibration enables zero-shot sim-to-real transfer and outperforms binary and raw-taxel baselines on peg-in-hole insertion and ball balancing with a multi-fingered hand.
A transformer policy distilled from a privileged RL teacher enables 3.1x faster real-world cube rotation on the ORCA hand using solely joint sensor data by extracting implicit object state from temporal joint patterns.
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.
HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.
Betting mechanisms can yield provably more accurate and efficient estimates of real-world robot behavior than Monte Carlo sampling under specified conditions, with practical approximations demonstrated on synthetic data and a robotic manipulator task.
Topology-preserving synthetic P&IDs generated by seeding from real drawings enable models trained solely on synthetics to achieve 63.8% edge mAP on real P&ID benchmarks, closing most of the gap to real-data training.
Single-timescale actor-critic with STORM momentum and a recent-sample buffer achieves optimal O(ε^{-2}) sample complexity for ε-optimal policies in finite discounted MDPs.
A Sim2Real2Sim learning pipeline enables a real-world dexterous robot to play piano pieces including Happy Birthday and Ode to Joy with an average F1-score of 0.881.
OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
Task diversity along map, object, and hierarchy axes produces local transfer across shifts in a new continual RL benchmark but fails to sustain learning as the number of shifts grows.
UniLab is a CPU/GPU heterogeneous system for robot RL training using MuJoCoUni and MotrixSim backends that reports 3-10x end-to-end efficiency improvements and cross-platform compatibility beyond CUDA.
Fishbone introduces a unified rib-spine representation computed via adaptive heat method, iso-contour ribs, and geometry-aware spine that enables real-time parametric deformation, reduced-space simulation, and animation on general meshes.
A VAE-based latent task representation enables automatic curriculum generation in CRL for non-Euclidean navigation tasks, outperforming interpolation and GAN-based methods in experiments.
The paper decomposes simulator value errors into identifiable shifts and irreducible residuals, shows passive learning fails on reachability, and introduces Fisher-SEP to minimize posterior value variance via targeted experiments.
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for consistent environments.
A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.
Differentiable simulation enables torque-sensor-free actuator model identification from trajectory data, achieving 1.88x better position tracking than a stand-trained baseline and 46% longer travel in downstream locomotion policies.
citing papers explorer
- Learning to Play Piano in the Real World
  A Sim2Real2Sim learning pipeline enables a real-world dexterous robot to play piano pieces including Happy Birthday and Ode to Joy with an average F1-score of 0.881.
- Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
  Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.
- VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
  VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.
- RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation
  RESample uses exploratory sampling guided by a lightweight Coverage Function to expand VLA training data coverage, yielding 12% performance gains on LIBERO and real-world tasks with 10-20% added samples.
- Learning to Act Through Contact: A Unified View of Multi-Task Robot Learning
  A single goal-conditioned RL policy trained on contact plans performs multiple gaits and bimanual manipulation tasks on quadruped and humanoid robots.
- Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands
  GD2P generates and learns dexterous hand poses for nonprehensile pushing and pulling by combining contact-guided sampling, physics-based filtering, and a geometry-conditioned diffusion model, demonstrated on Allegro and LEAP hands in real-world tests.
- Apple: Toward General Active Perception via Reinforcement Learning
  APPLE is an RL framework that jointly optimizes a transformer perception module and policy via a unified objective for general active perception, with evaluations on tactile MNIST regression and classification tasks.