
Solving Rubik's Cube with a Robot Hand

OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew · 2019 · cs.LG · arXiv 1910.07113

Canonical reference. 78% of citing Pith papers cite this work as background.

66 Pith papers citing it

abstract

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR), and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/
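
The abstract's description of ADR, a distribution over randomized environments that grows in difficulty, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ParamRange`, `sample_environment`, and `adr_update`, the thresholds `t_low` and `t_high`, the fixed `step`, and the `evaluate` callback are all hypothetical choices made for illustration. The sketch assumes each simulator parameter is sampled uniformly from a range, and that a range is widened when the policy performs well with that parameter pinned at a boundary and narrowed when it performs poorly.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Uniform sampling range [low, high] for one simulator parameter."""
    low: float
    high: float
    step: float       # hypothetical amount by which a boundary moves
    min_low: float    # hard limits (assumed) so a range cannot grow forever
    max_high: float


def sample_environment(ranges, rng):
    """Draw one randomized environment from the current distribution."""
    return {name: rng.uniform(r.low, r.high) for name, r in ranges.items()}


def adr_update(ranges, evaluate, rng, t_low=0.2, t_high=0.8, n_eval=20):
    """Probe one randomly chosen boundary and widen or narrow it.

    `evaluate(env)` is a hypothetical callback returning a score in [0, 1]
    for the current policy in the environment described by `env`.
    """
    name = rng.choice(sorted(ranges))
    side = rng.choice(["low", "high"])
    r = ranges[name]

    scores = []
    for _ in range(n_eval):
        env = sample_environment(ranges, rng)
        env[name] = getattr(r, side)  # pin the probed parameter at its boundary
        scores.append(evaluate(env))
    mean = sum(scores) / len(scores)

    if mean >= t_high:
        # Policy copes at the boundary: make the distribution harder.
        if side == "low":
            r.low = max(r.min_low, r.low - r.step)
        else:
            r.high = min(r.max_high, r.high + r.step)
    elif mean <= t_low:
        # Policy struggles at the boundary: make the distribution easier,
        # without letting the two boundaries cross.
        if side == "low":
            r.low = min(r.high, r.low + r.step)
        else:
            r.high = max(r.low, r.high - r.step)
    return name, side, mean


if __name__ == "__main__":
    rng = random.Random(0)
    ranges = {"friction": ParamRange(1.0, 1.0, 0.1, 0.1, 3.0)}
    for _ in range(50):
        adr_update(ranges, lambda env: 1.0, rng)  # stand-in policy that always succeeds
    print(ranges["friction"])
```

In this sketch the range starts as a single point (no randomization) and expands only as fast as the evaluated policy allows, which is one simple way to obtain environments of increasing difficulty without hand-tuning the ranges.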

citation-role summary

background 8 · baseline 1

citation-polarity summary

background 7 · baseline 1 · unclear 1

representative citing papers

Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation

cs.LG · 2026-05-04 · unverdicted · novelty 8.0

The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Generative Language Modeling for Automated Theorem Proving

cs.LG · 2020-09-07 · unverdicted · novelty 8.0

GPT-f, a transformer-based prover for Metamath, generated new short proofs that were accepted into the main library—the first such contribution from a deep-learning system.

Labimus: A Simulation and Benchmark for Humanoid Dexterous Manipulation in Chemical Laboratory

cs.RO · 2026-06-30 · unverdicted · novelty 7.0

Labimus is the first benchmark for humanoid dexterous manipulation in organic chemistry laboratories, exposing a gap between task completion and required experimental precision.

DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand

cs.RO · 2026-06-26 · unverdicted · novelty 7.0

DexCompose achieves 77.4% average success on 16 composite dexterous tasks by using role-aware residual composition with explicit finger ownership to combine pretrained policies without destructive interference.

Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity

cs.LG · 2026-06-09 · conditional · novelty 7.0

Non-quadratic Mirror Descent exhibits exponential initialization sensitivity in convex settings, shown via 3D constructions and KL-regularized simplex examples, with Bregman anchoring proposed for stabilization.

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

OrderGrad supplies unbiased likelihood-ratio and reparameterization gradient estimators for finite-sample L-statistics by applying a rank-based reward transformation usable in standard policy-gradient updates.

Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation

cs.RO · 2026-05-27 · unverdicted · novelty 7.0

CoP tactile representation with differentiable calibration enables zero-shot sim-to-real transfer and outperforms binary and raw-taxel baselines on peg-in-hole insertion and ball balancing with a multi-fingered hand.

Learning Robust Dexterous In-Hand Manipulation from Joint Sensors with Proprioceptive Transformer

cs.RO · 2026-05-20 · conditional · novelty 7.0

A transformer policy distilled from a privileged RL teacher enables 3.1x faster real-world cube rotation on the ORCA hand using solely joint sensor data by extracting implicit object state from temporal joint patterns.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.

Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers

cs.NI · 2026-05-06 · unverdicted · novelty 7.0

ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.

HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness

cs.RO · 2026-04-28 · unverdicted · novelty 7.0

HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.

Betting for Sim-to-Real Performance Evaluation

cs.RO · 2026-04-27 · unverdicted · novelty 7.0

Betting mechanisms can yield provably more accurate and efficient estimates of real-world robot behavior than Monte Carlo sampling under specified conditions, with practical approximations demonstrated on synthetic data and a robotic manipulator task.

SynthPID: P&ID digitization from Topology-Preserving Synthetic Data

cs.CV · 2026-04-15 · conditional · novelty 7.0

Topology-preserving synthetic P&IDs generated by seeding from real drawings enable models trained solely on synthetics to achieve 63.8% edge mAP on real P&ID benchmarks, closing most of the gap to real-data training.

Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum

cs.LG · 2026-02-02 · unverdicted · novelty 7.0

Single-timescale actor-critic with STORM momentum and a recent-sample buffer achieves optimal O(ε^{-2}) sample complexity for ε-optimal policies in finite discounted MDPs.

Learning to Play Piano in the Real World

cs.RO · 2025-03-19 · unverdicted · novelty 7.0

A Sim2Real2Sim learning pipeline enables a real-world dexterous robot to play piano pieces including Happy Birthday and Ode to Joy with an average F1-score of 0.881.

Dota 2 with Large Scale Deep Reinforcement Learning

cs.LG · 2019-12-13 · accept · novelty 7.0

OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.

Actuator Reality Shaping for Zero-Shot Sim-to-Real Robot Learning

cs.RO · 2026-07-02 · conditional · novelty 6.0

Actuator reality shaping uses a 2DOF controller to align real actuator closed-loop behavior with idealized simulation reference dynamics, enabling zero-shot sim-to-real policy deployment across multiple robot platforms.

Efficient Sim-to-Real Transfer of World-Action Models from Synthetic Priors

cs.RO · 2026-06-30 · unverdicted · novelty 6.0

A world-action model trained on ~800 synthetic demonstrations per task achieves 35% zero-shot success on real-robot manipulation tasks.

MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

cs.RO · 2026-06-06 · unverdicted · novelty 6.0

Introduces MuJoCo-Drones-Gym, a modular Gymnasium-compatible multi-drone simulator on MuJoCo with GPU acceleration and seven example tasks for control and RL.

Task diversity produces systematic transfer but inhibits continual reinforcement learning

cs.LG · 2026-05-30 · unverdicted · novelty 6.0

Task diversity along map, object, and hierarchy axes produces local transfer across shifts in a new continual RL benchmark but fails to sustain learning as the number of shifts grows.

Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

Physical Atari is a robust under-$1000 hardware platform combining a bearing-based robot arm, Atari controller actuator, screen renderer, and camera for real-time RL experiments directly on physical hardware.

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

cs.RO · 2026-05-28 · unverdicted · novelty 6.0

UniLab is a CPU/GPU heterogeneous system for robot RL training using MuJoCoUni and MotrixSim backends that reports 3-10x end-to-end efficiency improvements and cross-platform compatibility beyond CUDA.

citing papers explorer

Showing 50 of 66 citing papers.

Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation cs.LG · 2026-05-04 · unverdicted · none · ref 24 · internal anchor
The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 246 · internal anchor
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
Generative Language Modeling for Automated Theorem Proving cs.LG · 2020-09-07 · unverdicted · none · ref 16 · internal anchor
GPT-f, a transformer-based prover for Metamath, generated new short proofs that were accepted into the main library—the first such contribution from a deep-learning system.
Labimus: A Simulation and Benchmark for Humanoid Dexterous Manipulation in Chemical Laboratory cs.RO · 2026-06-30 · unverdicted · none · ref 25 · internal anchor
Labimus is the first benchmark for humanoid dexterous manipulation in organic chemistry laboratories, exposing a gap between task completion and required experimental precision.
DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand cs.RO · 2026-06-26 · unverdicted · none · ref 10 · internal anchor
DexCompose achieves 77.4% average success on 16 composite dexterous tasks by using role-aware residual composition with explicit finger ownership to combine pretrained policies without destructive interference.
Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity cs.LG · 2026-06-09 · conditional · none · ref 115 · internal anchor
Non-quadratic Mirror Descent exhibits exponential initialization sensitivity in convex settings, shown via 3D constructions and KL-regularized simplex examples, with Bregman anchoring proposed for stabilization.
OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation cs.LG · 2026-06-04 · unverdicted · none · ref 70 · internal anchor
OrderGrad supplies unbiased likelihood-ratio and reparameterization gradient estimators for finite-sample L-statistics by applying a rank-based reward transformation usable in standard policy-gradient updates.
Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation cs.RO · 2026-05-27 · unverdicted · none · ref 4 · internal anchor
CoP tactile representation with differentiable calibration enables zero-shot sim-to-real transfer and outperforms binary and raw-taxel baselines on peg-in-hole insertion and ball balancing with a multi-fingered hand.
Learning Robust Dexterous In-Hand Manipulation from Joint Sensors with Proprioceptive Transformer cs.RO · 2026-05-20 · conditional · none · ref 11 · internal anchor
A transformer policy distilled from a privileged RL teacher enables 3.1x faster real-world cube rotation on the ORCA hand using solely joint sensor data by extracting implicit object state from temporal joint patterns.
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling cs.LG · 2026-05-14 · unverdicted · none · ref 40 · internal anchor
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift cs.LG · 2026-05-09 · unverdicted · none · ref 35 · 2 links · internal anchor
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers cs.NI · 2026-05-06 · unverdicted · none · ref 5 · internal anchor
ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.
HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness cs.RO · 2026-04-28 · unverdicted · none · ref 20 · internal anchor
HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.
Betting for Sim-to-Real Performance Evaluation cs.RO · 2026-04-27 · unverdicted · none · ref 43 · internal anchor
Betting mechanisms can yield provably more accurate and efficient estimates of real-world robot behavior than Monte Carlo sampling under specified conditions, with practical approximations demonstrated on synthetic data and a robotic manipulator task.
SynthPID: P&ID digitization from Topology-Preserving Synthetic Data cs.CV · 2026-04-15 · conditional · none · ref 1 · internal anchor
Topology-preserving synthetic P&IDs generated by seeding from real drawings enable models trained solely on synthetics to achieve 63.8% edge mAP on real P&ID benchmarks, closing most of the gap to real-data training.
Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum cs.LG · 2026-02-02 · unverdicted · none · ref 42 · internal anchor
Single-timescale actor-critic with STORM momentum and a recent-sample buffer achieves optimal O(ε^{-2}) sample complexity for ε-optimal policies in finite discounted MDPs.
Learning to Play Piano in the Real World cs.RO · 2025-03-19 · unverdicted · none · ref 7 · internal anchor
A Sim2Real2Sim learning pipeline enables a real-world dexterous robot to play piano pieces including Happy Birthday and Ode to Joy with an average F1-score of 0.881.
Dota 2 with Large Scale Deep Reinforcement Learning cs.LG · 2019-12-13 · accept · none · ref 26 · internal anchor
OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
Actuator Reality Shaping for Zero-Shot Sim-to-Real Robot Learning cs.RO · 2026-07-02 · conditional · none · ref 7 · internal anchor
Actuator reality shaping uses a 2DOF controller to align real actuator closed-loop behavior with idealized simulation reference dynamics, enabling zero-shot sim-to-real policy deployment across multiple robot platforms.
Efficient Sim-to-Real Transfer of World-Action Models from Synthetic Priors cs.RO · 2026-06-30 · unverdicted · none · ref 7 · internal anchor
A world-action model trained on ~800 synthetic demonstrations per task achieves 35% zero-shot success on real-robot manipulation tasks.
MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning cs.RO · 2026-06-06 · unverdicted · none · ref 16 · internal anchor
Introduces MuJoCo-Drones-Gym, a modular Gymnasium-compatible multi-drone simulator on MuJoCo with GPU acceleration and seven example tasks for control and RL.
Task diversity produces systematic transfer but inhibits continual reinforcement learning cs.LG · 2026-05-30 · unverdicted · none · ref 25 · internal anchor
Task diversity along map, object, and hierarchy axes produces local transfer across shifts in a new continual RL benchmark but fails to sustain learning as the number of shifts grows.
Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots cs.RO · 2026-05-29 · unverdicted · none · ref 2 · internal anchor
Physical Atari is a robust under-$1000 hardware platform combining a bearing-based robot arm, Atari controller actuator, screen renderer, and camera for real-time RL experiments directly on physical hardware.
UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms cs.RO · 2026-05-28 · unverdicted · none · ref 11 · internal anchor
UniLab is a CPU/GPU heterogeneous system for robot RL training using MuJoCoUni and MotrixSim backends that reports 3-10x end-to-end efficiency improvements and cross-platform compatibility beyond CUDA.
Fishbone: From One 3D Asset to a Million Controllable Edits cs.CV · 2026-05-24 · unverdicted · none · ref 2 · internal anchor
Fishbone introduces a unified rib-spine representation computed via adaptive heat method, iso-contour ribs, and geometry-aware spine that enables real-time parametric deformation, reduced-space simulation, and animation on general meshes.
Curriculum reinforcement learning with measurable task representation learning cs.LG · 2026-05-22 · unverdicted · none · ref 1 · internal anchor
A VAE-based latent task representation enables automatic curriculum generation in CRL for non-Euclidean navigation tasks, outperforming interpolation and GAN-based methods in experiments.
Mind the Sim-to-Real Gap & Think Like a Scientist cs.AI · 2026-05-20 · unverdicted · none · ref 14 · internal anchor
The paper decomposes simulator value errors into identifiable shifts and irreducible residuals, shows passive learning fails on reachability, and introduces Fisher-SEP to minimize posterior value variance via targeted experiments.
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing cs.LG · 2026-05-15 · unverdicted · none · ref 164 · internal anchor
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching cs.RO · 2026-05-10 · unverdicted · none · ref 4 · internal anchor
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning cs.RO · 2026-04-28 · unverdicted · none · ref 2 · internal anchor
GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for consistent environments.
ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation cs.RO · 2026-04-13 · unverdicted · none · ref 1 · internal anchor
A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.
Trajectory-based actuator identification via differentiable simulation cs.RO · 2026-04-11 · unverdicted · none · ref 25 · internal anchor
Differentiable simulation enables torque-sensor-free actuator model identification from trajectory data, achieving 1.88x better position tracking than a stand-trained baseline and 46% longer travel in downstream locomotion policies.
ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling cs.RO · 2026-03-23 · unverdicted · none · ref 10 · internal anchor
ROBOGATE applies adaptive boundary-focused sampling in simulation to discover robot policy failure boundaries, revealing a 97.65 percentage point performance gap for a VLA model between LIBERO and industrial scenarios.
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning cs.RO · 2025-11-06 · unverdicted · none · ref 73 · internal anchor
Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning cs.RO · 2025-05-24 · conditional · none · ref 2 · internal anchor
VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.
SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks cs.LG · 2024-11-19 · unverdicted · none · ref 1 · internal anchor
SkillTree reduces continuous action spaces to discrete skills via a differentiable decision tree in a hierarchical policy, achieving comparable performance to neural skill methods with added skill-level explainability in robotic arm tasks.
Proximal Policy Distillation cs.LG · 2024-07-21 · conditional · none · ref 3 · internal anchor
PPD integrates PPO into policy distillation so the student collects and uses its own rewards, yielding better sample efficiency and robustness than standard student-distill or teacher-distill on Atari, MuJoCo, and Procgen tasks.
Continual Domain Randomization cs.RO · 2024-03-18 · unverdicted · none · ref 14 · internal anchor
Continual Domain Randomization trains RL policies sequentially on randomization parameter subsets with continual learning to achieve robust sim-to-real transfer in robotic reaching and grasping.
Scaling Robot Learning with Semantically Imagined Experience cs.RO · 2023-02-22 · unverdicted · none · ref 16 · internal anchor
Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 92 · internal anchor
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 37 · internal anchor
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning cs.RO · 2021-08-24 · conditional · none · ref 11 · internal anchor
Isaac Gym achieves 2-3 orders of magnitude faster robot policy training by keeping physics simulation and PyTorch-based RL entirely on GPU with direct buffer sharing.
Scaling Laws for Transfer cs.LG · 2021-02-02 · unverdicted · none · ref 182 · internal anchor
Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.
LNN-Fly: Continuous-Time UAV Navigation for Robust Obstacle Avoidance under Timing Mismatch cs.RO · 2026-06-27 · unverdicted · none · ref 27 · internal anchor
LNN-Fly is a structured recurrent policy for continuous-time UAV obstacle avoidance trained with perturbed differentiable rollouts that shows improved tolerance to timing issues and zero-shot transfer to physical hardware with 100% success in real tests.
LUCID: Learning Embodiment-Agnostic Intent Models from Unstructured Human Videos for Scalable Dexterous Robot Skill Acquisition cs.RO · 2026-06-10 · unverdicted · none · ref 71 · internal anchor
LUCID learns embodiment-agnostic intent models from unstructured human videos to train dexterous robot policies in simulation, enabling zero-shot transfer on real-world tasks like stirring and wiping.
An Agency-Transferring Model-Free Policy Enhancement Technique cs.LG · 2026-06-08 · unverdicted · none · ref 4 · internal anchor
A model-free RL method arbitrates between a functional baseline policy and a learning policy, transferring agency over time to yield a standalone policy with high goal-reaching rates and competitive returns on continuous-control tasks.
Autonomous Aerial Manipulation via Contextual Contrastive Meta Reinforcement Learning cs.LG · 2026-06-07 · unverdicted · none · ref 17 · internal anchor
Aco2 trains a quadrotor policy in simulation that adapts to diverse payload dynamics via latent context encoding and contrastive structuring, enabling zero-shot real-world deployment for autonomous aerial delivery.
Representation Learning Enables Scalable Multitask Deep Reinforcement Learning cs.LG · 2026-06-04 · unverdicted · none · ref 26 · internal anchor
MR.Q combines predictive auxiliary tasks with high-capacity value functions in a model-free architecture to achieve strong multitask RL performance without planning.
Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions cs.LG · 2026-06-02 · unverdicted · none · ref 51 · internal anchor
GTR introduces a bounded non-monotonic Gaussian trust region and Mixture Gaussian Anchor to enable effective behavior transitions in non-stationary RL where standard PPO fails.
Closed-Loop Sim-to-Real Reinforcement Learning for Deformable Microfiber Shape Control cs.RO · 2026-05-20 · unverdicted · none · ref 13 · internal anchor
A closed-loop sim-to-real RL policy trained in a simplified frictionless simulator achieves sub-millimeter microfiber shape control on physical hardware via visual feedback without retraining.
