arXiv: 1910.07113 · v1 · submitted 2019-10-16 · cs.LG · cs.AI · cs.CV · cs.RO · stat.ML

Recognition: 2 theorem links


Solving Rubik's Cube with a Robot Hand

OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang


Pith reviewed 2026-05-15 09:33 UTC · model grok-4.3

classification: cs.LG, cs.AI, cs.CV, cs.RO, stat.ML

keywords: automatic domain randomization, sim-to-real transfer, robot manipulation, Rubik's cube, reinforcement learning, humanoid hand, domain randomization, meta-learning


The pith

Models trained only in simulation solve Rubik's cube with a real robot hand

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that models trained only in simulation can solve a manipulation problem of unprecedented complexity on a real robot. This relies on automatic domain randomization to create a distribution of randomized environments with ever-increasing difficulty, paired with a custom robot platform. If correct, complex robotic tasks involving precise control and state estimation can be learned without any real-world data or fine-tuning. Readers would care because it removes the need for costly and risky real-robot trials when developing advanced manipulation skills.

Core claim

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by automatic domain randomization (ADR), which automatically generates a distribution over randomized environments of ever-increasing difficulty, and a robot platform built for machine learning. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. Memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand.

What carries the argument

Automatic domain randomization (ADR), an algorithm that generates distributions of randomized simulation environments of increasing difficulty to train policies and estimators that transfer to reality.
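The widening mechanism can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `adr_step`, the thresholds `t_low` and `t_high`, the step size `delta`, the parameter names, and the toy `toy_success` evaluator are all hypothetical. Each parameter has a range `[lo, hi]`; one boundary is probed per step, pushed outward when the policy does well with the parameter pinned there, and pulled back when it does poorly.

```python
import random


def adr_step(ranges, evaluate, rng, t_low=0.2, t_high=0.8, delta=0.05):
    """Probe one boundary of one parameter range and move it.

    ranges   -- dict: name -> [lo, hi], mutated in place
    evaluate -- callable(env_params: dict) -> success rate in [0, 1]
                (a stand-in for running the policy in simulation)
    """
    name = rng.choice(sorted(ranges))
    side = rng.choice((0, 1))  # 0 = lower boundary, 1 = upper boundary

    # Sample every parameter uniformly from its current range ...
    env = {k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()}
    # ... except the probed one, which is pinned to the chosen boundary.
    env[name] = ranges[name][side]
    score = evaluate(env)

    lo, hi = ranges[name]
    outward = -delta if side == 0 else delta
    if score >= t_high:
        ranges[name][side] += outward  # policy copes: widen the range
    elif score < t_low:
        moved = ranges[name][side] - outward  # policy struggles: narrow it
        # Never let the two boundaries cross.
        ranges[name][side] = min(moved, hi) if side == 0 else max(moved, lo)
    return name, side, score


def toy_success(env):
    """Hypothetical policy that only copes with friction in [0.6, 1.4]."""
    return 1.0 if 0.6 <= env["friction"] <= 1.4 else 0.0


# Start from a single point per parameter and let the ranges grow.
ranges = {"friction": [1.0, 1.0], "cube_size": [1.0, 1.0]}
rng = random.Random(0)
for _ in range(500):
    adr_step(ranges, toy_success, rng)
print(ranges)
```

In this toy run the friction range grows only as far as the evaluator tolerates, which is the sense in which the distribution tracks what the policy can currently handle rather than a hand-picked interval.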

If this is right

Memory-augmented models exhibit emergent meta-learning when trained on ADR distributions.
Vision state estimators achieve improved sim-to-real transfer with ADR.
The method solves a Rubik's cube task on a humanoid robot hand using only simulation-trained models.
Both control and state estimation problems are addressed without real-world data collection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

ADR may extend to training policies for other dexterous manipulation tasks that require fine motor control.
Greater reliance on simulation could lower the time and cost of developing new robotic skills.
Emergent meta-learning hints that ADR produces policies capable of adapting during execution.

Load-bearing premise

The physics simulator, even when randomized over a wide distribution via ADR, captures enough of the real robot's dynamics, friction, and sensor characteristics for the policy to transfer successfully without real-world fine-tuning.

What would settle it

The physical robot hand failing to solve the Rubik's cube while the same policy succeeds in the ADR-trained simulation would show that transfer has not occurred.

Original abstract

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

OpenAI shows real-robot Rubik's cube solves from pure simulation via automatic domain randomization, but the coverage of real dynamics remains the weakest link.


The main point is that they trained a policy entirely in simulation and got a physical Shadow hand to solve a Rubik's cube. Automatic domain randomization (ADR) is the new piece: it starts narrow and automatically widens the distribution of environments as the policy improves, rather than hand-tuning ranges. Paired with memory-augmented networks, this produced policies that transferred zero-shot and showed some emergent adaptation at test time. The hardware videos are the strongest evidence; they actually demonstrate the cube being solved on the real platform, which is a clear step up in task complexity from earlier sim-to-real manipulation work.

Referee Report

1 major / 2 minor

Summary. The paper claims that Automatic Domain Randomization (ADR) enables training of control policies and vision estimators entirely in simulation that transfer zero-shot to a custom humanoid robot hand, allowing it to solve a Rubik's cube. This is supported by real-robot experiments and videos, with memory-augmented models showing emergent meta-learning.

Significance. If the result holds, this is a significant demonstration of scalable sim-to-real transfer for complex, long-horizon manipulation without real-world data or fine-tuning. The real-robot experiments and videos provide direct empirical grounding, and the emergent meta-learning observation is a notable byproduct of the ADR training regime.

major comments (1)

§3 (ADR algorithm and randomization): The central claim that ADR produces a distribution bracketing real dynamics relies on the assumption that final randomization ranges cover real joint friction, contact stiffness, motor backlash, and camera parameters, yet the manuscript provides no system-identification measurements or direct comparisons confirming that hardware values lie inside the converged ADR support. This leaves open the possibility that success is due to platform-simulator proximity rather than ADR's automatic expansion.
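The containment check requested here is easy to state once measurements exist. A minimal sketch, assuming hypothetical parameter names and placeholder numbers (neither the ranges nor the "measured" values come from the paper), and assuming each simulator parameter maps to a quantity that can actually be measured on hardware, which may not hold in practice:

```python
def outside_support(adr_ranges, measured):
    """Return the parameters whose measured value lies outside its ADR range.

    adr_ranges -- dict: name -> (lo, hi), the converged randomization range
    measured   -- dict: name -> value from system identification
    """
    violations = {}
    for name, value in measured.items():
        lo, hi = adr_ranges[name]
        if not (lo <= value <= hi):
            violations[name] = (value, lo, hi)
    return violations


# Placeholder numbers for illustration only; none are taken from the paper.
adr_ranges = {"joint_friction": (0.5, 1.5), "motor_backlash": (0.0, 0.2)}
measured = {"joint_friction": 1.1, "motor_backlash": 0.35}
print(outside_support(adr_ranges, measured))
```

An empty result would show only that the measured values fall inside the final ranges; it would not by itself separate the effect of ADR's expansion from simulator proximity.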

minor comments (2)

Abstract and §5 (Results): Success rates, number of independent trials, and statistical details on solve reliability are only summarized at a high level; adding quantitative tables or confidence intervals would strengthen verifiability of the transfer claims.
§4 (Vision and state estimation): The interaction between ADR-trained vision models and control policies is described qualitatively; a clearer ablation isolating the contribution of vision randomization would help readers assess robustness.
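For the first minor comment, one common way to attach a confidence interval to a success rate from a small number of robot trials is the Wilson score interval. The sketch below is an illustration of that standard formula, not something the paper reports; the function name and the trial counts in the usage line are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: 6 successful solves out of 10 attempts.
low, high = wilson_interval(6, 10)
print(f"success rate 0.60, 95% interval [{low:.3f}, {high:.3f}]")
```

With only ten trials the interval is wide, which is the practical reason the number of independent trials matters as much as the headline success rate.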

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the work and for the constructive comment on Section 3. We address the point directly below.


Referee: §3 (ADR algorithm and randomization): The central claim that ADR produces a distribution bracketing real dynamics relies on the assumption that final randomization ranges cover real joint friction, contact stiffness, motor backlash, and camera parameters, yet the manuscript provides no system-identification measurements or direct comparisons confirming that hardware values lie inside the converged ADR support. This leaves open the possibility that success is due to platform-simulator proximity rather than ADR's automatic expansion.

Authors: We thank the referee for this observation. ADR initializes a narrow parameter distribution and automatically widens a range only when the current policy's performance in simulation at that range's boundary exceeds an upper threshold; it narrows the range again if performance falls below a lower threshold. In the Rubik's cube experiments, policies trained on the initial narrow ranges failed to transfer, while the same architecture trained after ADR expansion succeeded zero-shot on the physical hand. This controlled progression indicates that the automatic widening, rather than static simulator fidelity, is responsible for the observed transfer. We did not perform separate system-identification measurements to obtain precise hardware values for joint friction, contact stiffness, backlash, or camera intrinsics and then verify containment within the final ADR intervals. The zero-shot real-robot results nevertheless constitute empirical evidence that the real dynamics lie inside the final support. We will add a short clarifying paragraph in the revised manuscript that makes this design rationale and the empirical validation explicit.

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical hardware validation independent of any fitted derivation


The paper's core claim rests on direct physical experiments: policies trained in simulation with ADR are deployed zero-shot on a real Shadow Hand robot and successfully solve Rubik's cubes. No equations, predictions, or uniqueness theorems are presented that reduce by construction to the training distribution or to self-cited parameters. ADR is introduced as an algorithmic procedure whose coverage is tested by observed transfer success rather than assumed; the result is falsifiable against the external benchmark of real-robot performance. Self-citations (if any) are not load-bearing for the central result. This is the standard case of an experimental paper whose validity is measured outside its own fitted values.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The work rests on standard reinforcement learning assumptions plus the domain assumption that randomized simulation can approximate real dynamics sufficiently for policy transfer.

free parameters (2)

ADR randomization ranges and difficulty schedule: Specific bounds on physics parameters and the rate at which difficulty increases are chosen by the authors to cover real-world variation.
RL training hyperparameters: Learning rates, network sizes, and memory-augmentation parameters are tuned for the simulated environments.

axioms (1)

Domain assumption: A physics simulator with randomized parameters can produce trajectories whose distribution overlaps sufficiently with real robot behavior. This is the core premise enabling sim-only training to transfer.

pith-pipeline@v0.9.0 · 5518 in / 1260 out tokens · 70591 ms · 2026-05-15T09:33:24.267536+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.DimensionForcing eight_tick_forces_D3

Relation between the paper passage and the cited Recognition theorem: unclear

ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation
cs.LG · 2026-05 · unverdicted · novelty 8.0
The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
cs.LG · 2026-05 · unverdicted · novelty 7.0
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
cs.LG · 2026-05 · unverdicted · novelty 7.0
SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.

Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers
cs.NI · 2026-05 · unverdicted · novelty 7.0
ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.

HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness
cs.RO · 2026-04 · unverdicted · novelty 7.0
HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.

Betting for Sim-to-Real Performance Evaluation
cs.RO · 2026-04 · unverdicted · novelty 7.0
Betting mechanisms can yield provably more accurate and efficient estimates of real-world robot behavior than Monte Carlo sampling under specified conditions, with practical approximations demonstrated on synthetic da...

SynthPID: P&ID digitization from Topology-Preserving Synthetic Data
cs.CV · 2026-04 · conditional · novelty 7.0
Topology-preserving synthetic P&IDs generated by seeding from real drawings enable models trained solely on synthetics to achieve 63.8% edge mAP on real P&ID benchmarks, closing most of the gap to real-data training.

Dota 2 with Large Scale Deep Reinforcement Learning
cs.LG · 2019-12 · accept · novelty 7.0
OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.

Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching
cs.RO · 2026-05 · unverdicted · novelty 6.0
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.

GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning
cs.RO · 2026-04 · unverdicted · novelty 6.0
GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...

ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation
cs.RO · 2026-04 · unverdicted · novelty 6.0
A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.

Trajectory-based actuator identification via differentiable simulation
cs.RO · 2026-04 · unverdicted · novelty 6.0
Differentiable simulation enables torque-sensor-free actuator model identification from trajectory data, achieving 1.88x better position tracking than a stand-trained baseline and 46% longer travel in downstream locom...

Learning Dexterous Grasping from Sparse Taxonomy Guidance
cs.RO · 2026-04 · unverdicted · novelty 6.0
GRIT learns dexterous grasping from sparse taxonomy guidance, achieving 87.9% success and better generalization to novel objects via a two-stage prediction-plus-policy approach.

ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling
cs.RO · 2026-03 · unverdicted · novelty 6.0
ROBOGATE applies adaptive boundary-focused sampling in simulation to discover robot policy failure boundaries, revealing a 97.65 percentage point performance gap for a VLA model between LIBERO and industrial scenarios.

Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
cs.RO · 2025-11 · unverdicted · novelty 6.0
Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.

Language Models (Mostly) Know What They Know
cs.CL · 2022-07 · unverdicted · novelty 6.0
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment
cs.CL · 2021-12 · conditional · novelty 6.0
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning
cs.RO · 2021-08 · conditional · novelty 6.0
Isaac Gym achieves 2-3 orders of magnitude faster robot policy training by keeping physics simulation and PyTorch-based RL entirely on GPU with direct buffer sharing.

You're Pushing My Buttons: Instrumented Learning of Gentle Button Presses
cs.RO · 2026-04 · unverdicted · novelty 5.0
Training-time instrumentation with audio and privileged button-state signals produces contact policies that match success rates but apply lower forces using only vision and audio at inference.

Reference graph

Works this paper leans on

123 extracted references · 123 canonical work pages · cited by 19 Pith papers · 44 internal anchors

[1]

Abell and M

T. Abell and M. A. Erdmann. Stably supported rotations of a planar polygon with two frictionless contacts. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1995, August 5 - 9, 1995, Pittsburgh, PA, USA, pages 411–418, 1995

work page 1995
[2]

Aiyama, M

Y . Aiyama, M. Inaba, and H. Inoue. Pivoting: A new method of graspless manipulation of object by robot ﬁngers. In Proceedings of 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1993, Tokyo, Japan, July 26 - 30, 1993, pages 136–143, 1993

work page 1993
[3]

Reinforcement Learning for Pivoting Task

R. Antonova, S. Cruciani, C. Smith, and D. Kragic. Reinforcement learning for pivoting task. CoRR, abs/1703.00472, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[4]

Task-Oriented Hand Motion Retargeting for Dexterous Manipulation Imitation

D. Antotsiou, G. Garcia-Hernando, and T. Kim. Task-oriented hand motion retargeting for dexterous manipulation imitation. CoRR, abs/1810.01845, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[5]

Asfour, J

T. Asfour, J. Schill, H. Peters, C. Klas, J. Bücker, C. Sander, S. Schulz, A. Kargov, T. Werner, and V . Bartenbach. Armar-4: A 63 dof torque controlled humanoid robot. In 2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 390–396. IEEE, 2013

work page 2013
[6]

Bai and C

Y . Bai and C. K. Liu. Dexterous manipulation using both palm and ﬁngers. In 2014 IEEE International Conference on Robotics and Automation, ICRA 2014, Hong Kong, China, May 31 - June 7, 2014 , pages 1560–1565, 2014

work page 2014
[7]

Distributed Distributional Deterministic Policy Gradients

G. Barth-Maron, M. W. Hoffman, D. Budden, W. Dabney, D. Horgan, D. TB, A. Muldal, N. Heess, and T. P. Lillicrap. Distributed distributional deterministic policy gradients. CoRR, abs/1804.08617, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[8]

A. Beer. Fastest robot to solve a Rubik’s Cube. https://www.guinnessworldrecords.com/ world-records/fastest-robot-to-solve-a-rubiks-cube , 2016

work page 2016
[9]

A. Bicchi. Hands for dexterous manipulation and robust grasping: a difﬁcult road toward simplicity. IEEE Trans. Robotics and Automation, 16(6):652–662, 2000

work page 2000
[10]

Bicchi and R

A. Bicchi and R. Sorrentino. Dexterous manipulation through rolling. In Proceedings of the 1995 International Conference on Robotics and Automation, Nagoya, Aichi, Japan, May 21-27, 1995, pages 452–457, 1995

work page 1995
[11]

Botvinick, S

M. Botvinick, S. Ritter, J. X. Wang, Z. Kurth-Nelson, C. Blundell, and D. Hassabis. Reinforcement learning, fast and slow. Trends in cognitive sciences, 2019

work page 2019
[12]

Exploration by Random Network Distillation

Y . Burda, H. Edwards, A. Storkey, and O. Klimov. Exploration by random network distillation.arXiv preprint arXiv:1810.12894, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[13]

Carter, D

S. Carter, D. Ha, I. Johnson, and C. Olah. Experiments in handwriting with a neural network. Distill, 2016

work page 2016
[14]

Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

Y . Chebotar, A. Handa, V . Makoviychuk, M. Macklin, J. Issac, N. Ratliff, and D. Fox. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience. arXiv preprint 1810.05687, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

Cherif and K

M. Cherif and K. K. Gupta. Planning quasi-static ﬁngertip manipulations for reconﬁguring objects.IEEE Trans. Robotics and Automation, 15(5):837–848, 1999

work page 1999
[16]

ORRB -- OpenAI Remote Rendering Backend

M. Chociej, P. Welinder, and L. Weng. Orrb – openai remote rendering backend.arXiv preprint 1906.11633, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906
[17]

Christen, S

S. Christen, S. Stevsic, and O. Hilliges. Demonstration-guided deep reinforcement learning of control policies for dexterous human-robot interaction. CoRR, abs/1906.11695, 2019

work page arXiv 1906
[18]

P. F. Christiano, Z. Shah, I. Mordatch, J. Schneider, T. Blackwell, J. Tobin, P. Abbeel, and W. Zaremba. Transfer from simulation to real world through learning deep inverse dynamics model. CoRR, abs/1610.03518, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[19]

Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning

I. Clavera, A. Nagabandi, R. S. Fearing, P. Abbeel, S. Levine, and C. Finn. Learning to adapt: Meta-learning for model-based control. CoRR, abs/1803.11347, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[20]

J. Clune. Ai-gas: Ai-generating algorithms, an alternate paradigm for producing general artiﬁcial intelligence. arXiv preprint arXiv:1905.10985, 2019

work page arXiv 1905
[21]

E. D. Cubuk, B. Zoph, D. Mane, V . Vasudevan, and Q. V . Le. AutoAugment: Learning Augmentation Policies from Data. arXiv preprint 1805.09501, may 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[22]

Cully, J

A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret. Robots that can adapt like animals. Nature, 521(7553):503, 2015

work page 2015
[23]

Distilling Policy Distillation

W. Czarnecki, R. Pascanu, S. Osindero, S. M. Jayakumar, G. Swirszcz, and M. Jaderberg. Distilling policy distillation. ArXiv, abs/1902.02186, 2019. 32 A PREPRINT - OCTOBER 17, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902
[24]

N. C. Daﬂe and A. Rodriguez. Sampling-based planning of in-hand manipulation with external pushes. CoRR, abs/1707.00318, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[25]

N. C. Daﬂe, A. Rodriguez, R. Paolini, B. Tang, S. S. Srinivasa, M. A. Erdmann, M. T. Mason, I. Lundberg, H. Staab, and T. A. Fuhlbrigge. Extrinsic dexterity: In-hand manipulation with external forces. In 2014 IEEE International Conference on Robotics and Automation, ICRA 2014, Hong Kong, China, May 31 - June 7, 2014, pages 1578–1585, 2014

work page 2014
[26]

Doulgeri and L

Z. Doulgeri and L. Droukas. On rolling contact motion by robotic ﬁngers via prescribed performance control. In 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, May 6-10, 2013, pages 3976–3981, 2013

work page 2013
[27]

Y . Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. RL2: Fast reinforcement learning via slow reinforcement learning. CoRR, abs/1611.02779, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[28]

Dynamics

B. Dynamics. Atlas. https://www.bostondynamics.com/atlas, 2013

work page 2013
[29]

M. A. Erdmann. An exploration of nonprehensile two-palm manipulation. I. J. Robotics Res., 17(5):485–503, 1998

work page 1998
[30]

M. A. Erdmann and M. T. Mason. An exploration of sensorless manipulation. IEEE J. Robotics and Automation, 4(4):369–379, 1988

work page 1988
[31]

Falco, A

P. Falco, A. Attawia, M. Saveriano, and D. Lee. On policy learning robust to irreversible events: An application to robotic in-hand manipulation. IEEE Robotics and Automation Letters, 3(3):1482–1489, 2018

work page 2018
[32]

R. S. Fearing. Implementing a force strategy for object re-orientation. In Proceedings of the 1986 IEEE International Conference on Robotics and Automation, San Francisco, California, USA, April 7-10, 1986, pages 96–102, 1986

work page 1986
[33]

C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. CoRR, abs/1703.03400, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[34]

D. Gilday. MindCuber. https://mindcuber.com/, 2013

work page 2013
[35]

Graves, M

A. Graves, M. G. Bellemare, J. Menick, R. Munos, and K. Kavukcuoglu. Automated curriculum learning for neural networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pages 1311–1320. JMLR.org, 2017

work page 2017
[36]

S. Gu, E. Holly, T. P. Lillicrap, and S. Levine. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, Singapore, May 29 - June 3, 2017, pages 3389–3396, 2017

work page 2017
[37]

M. Guo, A. Haque, D.-A. Huang, S. Yeung, and L. Fei-Fei. Dynamic task prioritization for multitask learning. In Proceedings of the European Conference on Computer Vision (ECCV), pages 270–287, 2018

work page 2018
[38]

Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning

A. Gupta, C. Devin, Y . Liu, P. Abbeel, and S. Levine. Learning invariant feature spaces to transfer skills with reinforcement learning. CoRR, abs/1703.02949, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[39]

Gupta, B

A. Gupta, B. Eysenbach, C. Finn, and S. Levine. Unsupervised meta-learning for reinforcement learning. CoRR, abs/1806.04640, 2018

work page arXiv 2018
[40]

Soft Actor-Critic Algorithms and Applications

T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V . Kumar, H. Zhu, A. Gupta, P. Abbeel, and S. Levine. Soft actor-critic algorithms and applications. CoRR, abs/1812.05905, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[41]

L. Han, Y . Guan, Z. X. Li, S. Qi, and J. C. Trinkle. Dextrous manipulation with rolling contacts. InProceedings of the 1997 IEEE International Conference on Robotics and Automation, Albuquerque, New Mexico, USA, April 20-25, 1997, pages 992–997, 1997

work page 1997
[42]

Han and J

L. Han and J. C. Trinkle. Dextrous manipulation by rolling and ﬁnger gaiting. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA-98, Leuven, Belgium, May 16-20, 1998 , pages 730–735, 1998

work page 1998
[43]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

work page 2016
[44]

R. Higo, Y . Yamakawa, T. Senoo, and M. Ishikawa. Rubik’s cube handling using a high-speed multi-ﬁngered hand and a high-speed vision system. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6609–6614. IEEE, 2018

work page 2018
[45]

Hochreiter and J

S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997

work page 1997
[46]

W. H. Huang and M. T. Mason. Mechanics, planning, and control for tapping.I. J. Robotics Res., 19(10):883–894, 2000. 33 A PREPRINT - OCTOBER 17, 2019

work page 2000
[47]

Humplik, A

J. Humplik, A. Galashov, L. Hasenclever, P. A. Ortega, Y . W. Teh, and N. Heess. Meta reinforcement learning as task inference. CoRR, abs/1905.06424, 2019

work page arXiv 1905
[48]

Hwangbo, J

J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V . Tsounis, V . Koltun, and M. Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):eaau5872, 2019

work page 2019
[49]

Kalashnikov, A

D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V . Vanhoucke, and S. Levine. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation. ArXiv e-prints, June 2018

work page 2018
[50]

A. Kar, A. Prakash, M.-Y . Liu, E. Cameracci, J. Yuan, M. Rusiniak, D. Acuna, A. Torralba, and S. Fidler. Meta-Sim: Learning to Generate Synthetic Datasets. arXiv preprint 1904.11621, apr 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904
[51]

D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[52]

Learning Dexterous Manipulation Policies from Experience and Imitation

V . Kumar, A. Gupta, E. Todorov, and S. Levine. Learning dexterous manipulation policies from experience and imitation. CoRR, abs/1611.05095, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[53]

Kumar, E

V . Kumar, E. Todorov, and S. Levine. Optimal control with learned local models: Application to dexterous manipulation. In 2016 IEEE International Conference on Robotics and Automation, ICRA 2016, Stockholm, Sweden, May 16-21, 2016, pages 378–383, 2016

work page 2016
[54]

L. Lan, Z. Li, X. Guan, and P. Wang. Meta reinforcement learning with task embedding and shared policy. In IJCAI, 2019

work page 2019
[55]

N. C. Landolﬁ, G. Thomas, and T. Ma. A model-based approach for sample-efﬁcient multi-task reinforcement learning. CoRR, abs/1907.04964, 2019

work page arXiv 1907
[56]

Levine and V

S. Levine and V . Koltun. Guided policy search. InProceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pages 1–9, 2013

work page 2013
[57]

Levine, P

S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. I. J. Robotics Res., 37(4-5):421–436, 2018

work page 2018
[58]

Levine, N

S. Levine, N. Wagener, and P. Abbeel. Learning contact-rich manipulation skills with guided policy search. In IEEE International Conference on Robotics and Automation, ICRA 2015, Seattle, WA, USA, 26-30 May, 2015, pages 156–163, 2015

[59]

M. Li, Y. Bekiroglu, D. Kragic, and A. Billard. Learning of grasp adaptation through experience and tactile sensing. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, September 14-18, 2014, pages 3339–3346, 2014

[60]

M. Li, H. Yin, K. Tahara, and A. Billard. Learning object-level impedance control for robust grasping and dexterous manipulation. In 2014 IEEE International Conference on Robotics and Automation, ICRA 2014, Hong Kong, China, May 31 - June 7, 2014, pages 6784–6791, 2014

[61]

Q. Li, M. Meier, R. Haschke, H. J. Ritter, and B. Bolder. Rotary object dexterous manipulation in hand: a feedback-based method. IJMA, 3(1):36–47, 2013

[62]

T. Li, W. Xi, M. Fang, J. Xu, and M. Qing-Hu Meng. Learning to Solve a Rubik’s Cube with a Dexterous Hand. arXiv preprint arXiv:1907.11388, Jul 2019

[63]

C. Lynch, M. Khansari, T. Xiao, V. Kumar, J. Tompson, S. Levine, and P. Sermanet. Learning latent plans from play. CoRR, abs/1903.01973, 2019

[64]

R. R. Ma and A. M. Dollar. On dexterity and dexterous manipulation. In 15th International Conference on Advanced Robotics: New Boundaries for Robotics, ICAR 2011, Tallinn, Estonia, June 20-23, 2011, pages 1–7, 2011

[65]

J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg. Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. In Robotics: Science and Systems XIII, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, July 12-16, 2017, 2017

[66]

J. Mahler, M. Matl, X. Liu, A. Li, D. V. Gealy, and K. Goldberg. Dex-net 3.0: Computing robust robot suction grasp targets in point clouds using a new analytic model and deep learning. CoRR, abs/1709.06670, 2017

[67]

T. Matiisen, A. Oliver, T. Cohen, and J. Schulman. Teacher-student curriculum learning. arXiv preprint arXiv:1707.00183, 2017

[68]

B. Mehta, M. Diaz, F. Golemo, C. J. Pal, and L. Paull. Active Domain Randomization. arXiv preprint arXiv:1904.04762, 2019

[69]

N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. Meta-learning with temporal convolutions. CoRR, abs/1707.03141, 2017

[70]

I. Mordatch, Z. Popovic, and E. Todorov. Contact-invariant optimization for hand manipulation. In Proceedings of the 2012 Eurographics/ACM SIGGRAPH Symposium on Computer Animation, SCA 2012, Lausanne, Switzerland, 2012, pages 137–144, 2012

[71]

A. Nagabandi, K. Konolige, S. Levine, and V. Kumar. Deep Dynamics Models for Learning Dexterous Manipulation. In Conference on Robot Learning (CoRL), 2019

[72]

V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010

[73]

Nordic Semiconductor. nRF52832 Product Specification v1.1. Technical report, Nordic Semiconductor, 2016

[74]

A. M. Okamura, N. Smaby, and M. R. Cutkosky. An overview of dexterous manipulation. In Proceedings of the 2000 IEEE International Conference on Robotics and Automation, ICRA 2000, April 24-28, 2000, San Francisco, CA, USA, pages 255–262, 2000

[75]

C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, and A. Mordvintsev. The building blocks of interpretability. Distill, 2018. https://distill.pub/2018/building-blocks

[76]

OpenAI. OpenAI Five. https://blog.openai.com/openai-five/, 2018

[77]

OpenAI, M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba. Learning dexterous in-hand manipulation. CoRR, 2018

[78]

Otvinta. 3d-printed rubik’s cube robot. http://www.rcr3d.com/, 2017

[79]

E. Parisotto, S. Ghosh, S. B. Yalamanchi, V. Chinnaobireddy, Y. Wu, and R. Salakhutdinov. Concurrent meta reinforcement learning. CoRR, abs/1903.02710, 2019

[80]

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. CoRR, abs/1710.06537, 2017

