pith. machine review for the scientific record.

arxiv: 1910.07113 · v1 · submitted 2019-10-16 · 💻 cs.LG · cs.AI · cs.CV · cs.RO · stat.ML

Recognition: 2 theorem links

Solving Rubik's Cube with a Robot Hand

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 09:33 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV · cs.RO · stat.ML
keywords: automatic domain randomization, sim-to-real transfer, robot manipulation, Rubik's cube, reinforcement learning, humanoid hand, domain randomization, meta-learning

The pith

Models trained only in simulation solve Rubik's cube with a real robot hand

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that models trained only in simulation can solve a manipulation problem of unprecedented complexity on a real robot. This relies on two components:

  • automatic domain randomization, which creates a distribution of randomized environments of ever-increasing difficulty;
  • a custom robot platform.

If correct, complex robotic tasks involving precise control and state estimation can be learned without any real-world training data or fine-tuning. Readers would care because it suggests that advanced manipulation skills can be learned without costly and risky real-robot training runs.

Core claim

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by automatic domain randomization (ADR), which automatically generates a distribution over randomized environments of ever-increasing difficulty, and a robot platform built for machine learning. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. Memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems.

What carries the argument

Automatic domain randomization (ADR), an algorithm that generates distributions of randomized simulation environments of increasing difficulty to train policies and estimators that transfer to reality.
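The paper describes ADR as a loop over per-parameter ranges. One parameter is pinned to the lower or upper edge of its current range, and the policy's performance there is recorded in a buffer for that edge. Once the buffer holds `m` results, the edge moves:

  • outward by a step `delta` if mean performance is at least `t_high`;
  • inward if mean performance is at most `t_low`.

Below is a minimal single-process sketch under those assumptions. The class and function names, the 50/50 choice of edge, the every-step boundary evaluation and the default thresholds are all illustrative, not values from the paper. The real system runs this across many distributed workers.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ADRParam:
    """One randomized simulator parameter whose range [low, high] ADR adapts (hypothetical)."""
    name: str
    low: float
    high: float
    delta: float                                      # boundary step size
    low_buffer: list = field(default_factory=list)    # scores recorded at `low`
    high_buffer: list = field(default_factory=list)   # scores recorded at `high`


def adr_step(params, evaluate, m=10, t_low=0.2, t_high=0.8):
    """One ADR iteration: pin one boundary, score the policy there, maybe move it.

    `evaluate` is a stand-in for running the policy in a simulator configured
    with the given parameter values; it must return a score in [0, 1].
    """
    # Sample every parameter uniformly from its current range...
    values = {p.name: random.uniform(p.low, p.high) for p in params}
    # ...then pin one randomly chosen parameter to its lower or upper edge.
    param = random.choice(params)
    at_high = random.random() < 0.5
    values[param.name] = param.high if at_high else param.low
    buffer = param.high_buffer if at_high else param.low_buffer
    buffer.append(evaluate(values))

    if len(buffer) < m:
        return
    mean_score = sum(buffer) / len(buffer)
    buffer.clear()
    if mean_score >= t_high:
        # The policy copes at this edge: widen the range outward.
        if at_high:
            param.high += param.delta
        else:
            param.low -= param.delta
    elif mean_score <= t_low:
        # The policy struggles at this edge: narrow the range, never inverting it.
        if at_high:
            param.high = max(param.low, param.high - param.delta)
        else:
            param.low = min(param.high, param.low + param.delta)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy setup: the "policy" succeeds whenever friction is
    # within 0.5 of its nominal value 1.0; the range starts collapsed at 1.0.
    friction = ADRParam("friction", low=1.0, high=1.0, delta=0.05)
    toy_evaluate = lambda v: 1.0 if abs(v["friction"] - 1.0) < 0.5 else 0.0
    for _ in range(5000):
        adr_step([friction], toy_evaluate)
    print(f"friction range after ADR: [{friction.low:.2f}, {friction.high:.2f}]")
```

In the toy run, the range starts collapsed at 1.0 and grows until its edges settle near 0.5 and 1.5, the limits of what the toy policy can handle. A real ADR run would instead keep widening the ranges for as long as the policy keeps improving.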

If this is right

  • Memory-augmented models exhibit emergent meta-learning when trained on ADR distributions.
  • Vision state estimators achieve improved sim-to-real transfer with ADR.
  • The method solves a Rubik's cube task on a humanoid robot hand using only simulation-trained models.
  • Both control and state estimation problems are addressed without real-world data collection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • ADR may extend to training policies for other dexterous manipulation tasks that require fine motor control.
  • Greater reliance on simulation could lower the time and cost of developing new robotic skills.
  • Emergent meta-learning hints that ADR produces policies capable of adapting during execution.

Load-bearing premise

The physics simulator, even when randomized over a wide distribution via ADR, captures enough of the real robot's dynamics, friction, and sensor characteristics for the policy to transfer successfully without real-world fine-tuning.

What would settle it

The physical robot hand failing to solve the Rubik's cube while the same policy succeeds in the ADR-trained simulation would show that transfer has not occurred.
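This test can be made quantitative by comparing solve rates for the same policy in simulation and on hardware. A minimal sketch follows. It assumes independent trials and uses a pooled two-proportion z-test, a standard technique chosen here for illustration rather than one the paper reports. With only a handful of hardware trials, an exact test (such as Fisher's) would be the safer choice.

```python
import math
from typing import Tuple


def sim_real_gap_test(sim_solved: int, sim_trials: int,
                      real_solved: int, real_trials: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test for 'solve rate in sim > solve rate on hardware'.

    Returns (z, one_sided_p). A large z and small p indicate a sim-to-real gap.
    """
    if sim_trials <= 0 or real_trials <= 0:
        raise ValueError("trial counts must be positive")
    p_sim = sim_solved / sim_trials
    p_real = real_solved / real_trials
    pooled = (sim_solved + real_solved) / (sim_trials + real_trials)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sim_trials + 1 / real_trials))
    if se == 0:
        # Every trial in both settings had the same outcome: no evidence of a gap.
        return 0.0, 0.5
    z = (p_sim - p_real) / se
    one_sided_p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under the null
    return z, one_sided_p
```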

Original abstract

We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that Automatic Domain Randomization (ADR) enables training of control policies and vision estimators entirely in simulation that transfer zero-shot to a custom humanoid robot hand, allowing it to solve a Rubik's cube. This is supported by real-robot experiments and videos, with memory-augmented models showing emergent meta-learning.

Significance. If the result holds, this is a significant demonstration of scalable sim-to-real transfer for complex, long-horizon manipulation without real-world data or fine-tuning. The real-robot experiments and videos provide direct empirical grounding, and the emergent meta-learning observation is a notable byproduct of the ADR training regime.

major comments (1)
  1. §3 (ADR algorithm and randomization): The central claim that ADR produces a distribution bracketing real dynamics relies on the assumption that final randomization ranges cover real joint friction, contact stiffness, motor backlash, and camera parameters, yet the manuscript provides no system-identification measurements or direct comparisons confirming that hardware values lie inside the converged ADR support. This leaves open the possibility that success is due to platform-simulator proximity rather than ADR's automatic expansion.
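Once both quantities exist, the check the referee asks for is mechanical. Compare each system-identification estimate for the hardware against the corresponding final ADR interval. A minimal sketch follows. The parameter names and values used with it are hypothetical placeholders, since the manuscript reports no system-identification measurements. Besides the inside/outside flag, it reports where each value sits within its interval (0 at the lower bound, 1 at the upper), which indicates how much margin the randomization leaves on each side.

```python
import math
from typing import Dict, Tuple


def check_containment(
    measured: Dict[str, float],
    adr_ranges: Dict[str, Tuple[float, float]],
) -> Dict[str, Tuple[bool, float]]:
    """Compare hypothetical sysid estimates against final ADR intervals.

    Returns, per parameter, (inside, relative_position) where relative_position
    is 0.0 at the lower bound, 1.0 at the upper bound, and outside [0, 1] when
    the measured value falls outside the ADR support.
    """
    report = {}
    for name, value in measured.items():
        low, high = adr_ranges[name]  # KeyError if the parameter was never randomized
        inside = low <= value <= high
        width = high - low
        if width > 0:
            rel = (value - low) / width
        elif value == low:
            rel = 0.0  # degenerate (unrandomized) range that the value hits exactly
        else:
            rel = math.inf if value > high else -math.inf
        report[name] = (inside, rel)
    return report
```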
minor comments (2)
  1. Abstract and §5 (Results): Success rates, number of independent trials, and statistical details on solve reliability are only summarized at a high level; adding quantitative tables or confidence intervals would strengthen verifiability of the transfer claims.
  2. §4 (Vision and state estimation): The interaction between ADR-trained vision models and control policies is described qualitatively; a clearer ablation isolating the contribution of vision randomization would help readers assess robustness.
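For the first minor comment, an interval on a solve rate needs only the number of successful solves and the number of attempts. A minimal sketch follows, using the Wilson score interval. This is a standard choice for binomial proportions: it stays inside [0, 1] and behaves better than the plain normal approximation at small sample sizes. It is offered for illustration and is not taken from the manuscript.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial success rate.

    z = 1.96 gives an approximately 95% two-sided interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    # Clamp only to absorb floating-point round-off at the 0 and 1 edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

For instance, `wilson_interval(0, 10)` is roughly (0.0, 0.28). Even ten straight failures leave an upper bound near 28% at the 95% level, which is why the number of trials matters as much as the headline rate.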

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the work and for the constructive comment on Section 3. We address the point directly below.

Point-by-point responses
  1. Referee: §3 (ADR algorithm and randomization): The central claim that ADR produces a distribution bracketing real dynamics relies on the assumption that final randomization ranges cover real joint friction, contact stiffness, motor backlash, and camera parameters, yet the manuscript provides no system-identification measurements or direct comparisons confirming that hardware values lie inside the converged ADR support. This leaves open the possibility that success is due to platform-simulator proximity rather than ADR's automatic expansion.

    Authors: We thank the referee for this observation. ADR starts from a narrow parameter distribution and adjusts each range automatically. A boundary is widened when the policy's performance at that boundary in simulation exceeds an upper threshold, and narrowed when it falls below a lower threshold, so the ranges keep growing for as long as the policy can cope with them. In the Rubik's cube experiments, policies trained on the initial narrow ranges failed to transfer, while the same architecture trained after ADR expansion succeeded zero-shot on the physical hand. This controlled progression indicates that the automatic widening, rather than static simulator fidelity, is responsible for the observed transfer. We did not perform separate system-identification measurements to obtain precise hardware values for joint friction, contact stiffness, backlash, or camera intrinsics, and then verify containment within the final ADR intervals. The zero-shot real-robot results nevertheless provide indirect empirical evidence that the real dynamics lie inside the final support. We will add a short clarifying paragraph in the revised manuscript that makes this design rationale and the empirical validation explicit. Revision: partial.

Circularity Check

0 steps flagged

No circularity: empirical hardware validation independent of any fitted derivation

full rationale

The paper's core claim rests on direct physical experiments: policies trained in simulation with ADR are deployed zero-shot on a real Shadow Hand robot and successfully solve Rubik's cubes. No equations, predictions, or uniqueness theorems are presented that reduce by construction to the training distribution or to self-cited parameters. ADR is introduced as an algorithmic procedure whose coverage is tested by observed transfer success rather than assumed; the result is falsifiable against the external benchmark of real-robot performance. Self-citations (if any) are not load-bearing for the central result. This is the standard case of an experimental paper whose validity is measured outside its own fitted values.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The work rests on standard reinforcement learning assumptions plus the domain assumption that randomized simulation can approximate real dynamics sufficiently for policy transfer.

free parameters (2)
  • ADR schedule hyperparameters
    The initial parameter values, boundary step size, performance thresholds, and buffer size that govern how ADR widens or narrows each range are chosen by the authors. The final randomization bounds are produced by the algorithm rather than set by hand.
  • RL training hyperparameters
    Learning rates, network sizes, and memory-augmentation parameters are tuned for the simulated environments.
axioms (1)
  • Domain assumption: a physics simulator with randomized parameters can produce trajectories whose distribution overlaps sufficiently with real robot behavior.
    This is the core premise enabling sim-only training to transfer.

pith-pipeline@v0.9.0 · 5518 in / 1260 out tokens · 70591 ms · 2026-05-15T09:33:24.267536+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith.Foundation.DimensionForcing eight_tick_forces_D3 (relation: unclear)

    Relation between the paper passage and the cited Recognition theorem.

    ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation

    cs.LG 2026-05 unverdicted novelty 8.0

    The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.

  2. Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

    cs.LG 2026-05 unverdicted novelty 7.0

    DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

  3. Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

    cs.LG 2026-05 unverdicted novelty 7.0

    SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.

  4. Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers

    cs.NI 2026-05 unverdicted novelty 7.0

    ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.

  5. HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness

    cs.RO 2026-04 unverdicted novelty 7.0

    HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.

  6. Betting for Sim-to-Real Performance Evaluation

    cs.RO 2026-04 unverdicted novelty 7.0

    Betting mechanisms can yield provably more accurate and efficient estimates of real-world robot behavior than Monte Carlo sampling under specified conditions, with practical approximations demonstrated on synthetic da...

  7. SynthPID: P&ID digitization from Topology-Preserving Synthetic Data

    cs.CV 2026-04 conditional novelty 7.0

    Topology-preserving synthetic P&IDs generated by seeding from real drawings enable models trained solely on synthetics to achieve 63.8% edge mAP on real P&ID benchmarks, closing most of the gap to real-data training.

  8. Dota 2 with Large Scale Deep Reinforcement Learning

    cs.LG 2019-12 accept novelty 7.0

    OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.

  9. Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching

    cs.RO 2026-05 unverdicted novelty 6.0

    DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.

  10. GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

    cs.RO 2026-04 unverdicted novelty 6.0

    GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...

  11. ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation

    cs.RO 2026-04 unverdicted novelty 6.0

    A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.

  12. Trajectory-based actuator identification via differentiable simulation

    cs.RO 2026-04 unverdicted novelty 6.0

    Differentiable simulation enables torque-sensor-free actuator model identification from trajectory data, achieving 1.88x better position tracking than a stand-trained baseline and 46% longer travel in downstream locom...

  13. Learning Dexterous Grasping from Sparse Taxonomy Guidance

    cs.RO 2026-04 unverdicted novelty 6.0

    GRIT learns dexterous grasping from sparse taxonomy guidance, achieving 87.9% success and better generalization to novel objects via a two-stage prediction-plus-policy approach.

  14. ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling

    cs.RO 2026-03 unverdicted novelty 6.0

    ROBOGATE applies adaptive boundary-focused sampling in simulation to discover robot policy failure boundaries, revealing a 97.65 percentage point performance gap for a VLA model between LIBERO and industrial scenarios.

  15. Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

    cs.RO 2025-11 unverdicted novelty 6.0

    Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.

  16. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  17. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  18. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

    cs.RO 2021-08 conditional novelty 6.0

    Isaac Gym achieves 2-3 orders of magnitude faster robot policy training by keeping physics simulation and PyTorch-based RL entirely on GPU with direct buffer sharing.

  19. You're Pushing My Buttons: Instrumented Learning of Gentle Button Presses

    cs.RO 2026-04 unverdicted novelty 5.0

    Training-time instrumentation with audio and privileged button-state signals produces contact policies that match success rates but apply lower forces using only vision and audio at inference.

Reference graph

Works this paper leans on

123 extracted references · 123 canonical work pages · cited by 19 Pith papers · 44 internal anchors

  1. [1]

    T. Abell and M. A. Erdmann. Stably supported rotations of a planar polygon with two frictionless contacts. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1995, August 5 - 9, 1995, Pittsburgh, PA, USA, pages 411–418, 1995

  2. [2]

    Y. Aiyama, M. Inaba, and H. Inoue. Pivoting: A new method of graspless manipulation of object by robot fingers. In Proceedings of 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1993, Tokyo, Japan, July 26 - 30, 1993, pages 136–143, 1993

  3. [3]

    R. Antonova, S. Cruciani, C. Smith, and D. Kragic. Reinforcement learning for pivoting task. CoRR, abs/1703.00472, 2017

  4. [4]

    D. Antotsiou, G. Garcia-Hernando, and T. Kim. Task-oriented hand motion retargeting for dexterous manipulation imitation. CoRR, abs/1810.01845, 2018

  5. [5]

    T. Asfour, J. Schill, H. Peters, C. Klas, J. Bücker, C. Sander, S. Schulz, A. Kargov, T. Werner, and V. Bartenbach. ARMAR-4: A 63 DOF torque controlled humanoid robot. In 2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 390–396. IEEE, 2013

  6. [6]

    Y. Bai and C. K. Liu. Dexterous manipulation using both palm and fingers. In 2014 IEEE International Conference on Robotics and Automation, ICRA 2014, Hong Kong, China, May 31 - June 7, 2014, pages 1560–1565, 2014

  7. [7]

    G. Barth-Maron, M. W. Hoffman, D. Budden, W. Dabney, D. Horgan, D. TB, A. Muldal, N. Heess, and T. P. Lillicrap. Distributed distributional deterministic policy gradients. CoRR, abs/1804.08617, 2018

  8. [8]

    A. Beer. Fastest robot to solve a Rubik’s Cube. https://www.guinnessworldrecords.com/world-records/fastest-robot-to-solve-a-rubiks-cube, 2016

  9. [9]

    A. Bicchi. Hands for dexterous manipulation and robust grasping: a difficult road toward simplicity. IEEE Trans. Robotics and Automation, 16(6):652–662, 2000

  10. [10]

    A. Bicchi and R. Sorrentino. Dexterous manipulation through rolling. In Proceedings of the 1995 International Conference on Robotics and Automation, Nagoya, Aichi, Japan, May 21-27, 1995, pages 452–457, 1995

  11. [11]

    M. Botvinick, S. Ritter, J. X. Wang, Z. Kurth-Nelson, C. Blundell, and D. Hassabis. Reinforcement learning, fast and slow. Trends in cognitive sciences, 2019

  12. [12]

    Y. Burda, H. Edwards, A. Storkey, and O. Klimov. Exploration by random network distillation. arXiv preprint arXiv:1810.12894, 2018

  13. [13]

    S. Carter, D. Ha, I. Johnson, and C. Olah. Experiments in handwriting with a neural network. Distill, 2016

  14. [14]

    Y. Chebotar, A. Handa, V. Makoviychuk, M. Macklin, J. Issac, N. Ratliff, and D. Fox. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience. arXiv preprint 1810.05687, 2018

  15. [15]

    M. Cherif and K. K. Gupta. Planning quasi-static fingertip manipulations for reconfiguring objects. IEEE Trans. Robotics and Automation, 15(5):837–848, 1999

  16. [16]

    M. Chociej, P. Welinder, and L. Weng. ORRB – OpenAI Remote Rendering Backend. arXiv preprint 1906.11633, 2019

  17. [17]

    S. Christen, S. Stevsic, and O. Hilliges. Demonstration-guided deep reinforcement learning of control policies for dexterous human-robot interaction. CoRR, abs/1906.11695, 2019

  18. [18]

    P. F. Christiano, Z. Shah, I. Mordatch, J. Schneider, T. Blackwell, J. Tobin, P. Abbeel, and W. Zaremba. Transfer from simulation to real world through learning deep inverse dynamics model. CoRR, abs/1610.03518, 2016

  19. [19]

    I. Clavera, A. Nagabandi, R. S. Fearing, P. Abbeel, S. Levine, and C. Finn. Learning to adapt: Meta-learning for model-based control. CoRR, abs/1803.11347, 2018

  20. [20]

    J. Clune. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence. arXiv preprint arXiv:1905.10985, 2019

  21. [21]

    E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. AutoAugment: Learning Augmentation Policies from Data. arXiv preprint 1805.09501, May 2018

  22. [22]

    A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret. Robots that can adapt like animals. Nature, 521(7553):503, 2015

  23. [23]

    W. Czarnecki, R. Pascanu, S. Osindero, S. M. Jayakumar, G. Swirszcz, and M. Jaderberg. Distilling policy distillation. ArXiv, abs/1902.02186, 2019. 32 A PREPRINT - OCTOBER 17, 2019

  24. [24]

    N. C. Dafle and A. Rodriguez. Sampling-based planning of in-hand manipulation with external pushes. CoRR, abs/1707.00318, 2017

  25. [25]

    N. C. Dafle, A. Rodriguez, R. Paolini, B. Tang, S. S. Srinivasa, M. A. Erdmann, M. T. Mason, I. Lundberg, H. Staab, and T. A. Fuhlbrigge. Extrinsic dexterity: In-hand manipulation with external forces. In 2014 IEEE International Conference on Robotics and Automation, ICRA 2014, Hong Kong, China, May 31 - June 7, 2014, pages 1578–1585, 2014

  26. [26]

    Z. Doulgeri and L. Droukas. On rolling contact motion by robotic fingers via prescribed performance control. In 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, May 6-10, 2013, pages 3976–3981, 2013

  27. [27]

    Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. RL²: Fast reinforcement learning via slow reinforcement learning. CoRR, abs/1611.02779, 2016

  28. [28]

    Boston Dynamics. Atlas. https://www.bostondynamics.com/atlas, 2013

  29. [29]

    M. A. Erdmann. An exploration of nonprehensile two-palm manipulation. I. J. Robotics Res., 17(5):485–503, 1998

  30. [30]

    M. A. Erdmann and M. T. Mason. An exploration of sensorless manipulation. IEEE J. Robotics and Automation, 4(4):369–379, 1988

  31. [31]

    P. Falco, A. Attawia, M. Saveriano, and D. Lee. On policy learning robust to irreversible events: An application to robotic in-hand manipulation. IEEE Robotics and Automation Letters, 3(3):1482–1489, 2018

  32. [32]

    R. S. Fearing. Implementing a force strategy for object re-orientation. In Proceedings of the 1986 IEEE International Conference on Robotics and Automation, San Francisco, California, USA, April 7-10, 1986, pages 96–102, 1986

  33. [33]

    C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. CoRR, abs/1703.03400, 2017

  34. [34]

    D. Gilday. MindCuber. https://mindcuber.com/, 2013

  35. [35]

    A. Graves, M. G. Bellemare, J. Menick, R. Munos, and K. Kavukcuoglu. Automated curriculum learning for neural networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pages 1311–1320. JMLR.org, 2017

  36. [36]

    S. Gu, E. Holly, T. P. Lillicrap, and S. Levine. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, Singapore, May 29 - June 3, 2017, pages 3389–3396, 2017

  37. [37]

    M. Guo, A. Haque, D.-A. Huang, S. Yeung, and L. Fei-Fei. Dynamic task prioritization for multitask learning. In Proceedings of the European Conference on Computer Vision (ECCV), pages 270–287, 2018

  38. [38]

    A. Gupta, C. Devin, Y. Liu, P. Abbeel, and S. Levine. Learning invariant feature spaces to transfer skills with reinforcement learning. CoRR, abs/1703.02949, 2017

  39. [39]

    A. Gupta, B. Eysenbach, C. Finn, and S. Levine. Unsupervised meta-learning for reinforcement learning. CoRR, abs/1806.04640, 2018

  40. [40]

    T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, and S. Levine. Soft actor-critic algorithms and applications. CoRR, abs/1812.05905, 2018

  41. [41]

    L. Han, Y. Guan, Z. X. Li, S. Qi, and J. C. Trinkle. Dextrous manipulation with rolling contacts. In Proceedings of the 1997 IEEE International Conference on Robotics and Automation, Albuquerque, New Mexico, USA, April 20-25, 1997, pages 992–997, 1997

  42. [42]

    L. Han and J. C. Trinkle. Dextrous manipulation by rolling and finger gaiting. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA-98, Leuven, Belgium, May 16-20, 1998, pages 730–735, 1998

  43. [43]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

  44. [44]

    R. Higo, Y. Yamakawa, T. Senoo, and M. Ishikawa. Rubik’s cube handling using a high-speed multi-fingered hand and a high-speed vision system. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6609–6614. IEEE, 2018

  45. [45]

    S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997

  46. [46]

    W. H. Huang and M. T. Mason. Mechanics, planning, and control for tapping. I. J. Robotics Res., 19(10):883–894, 2000

  47. [47]

    J. Humplik, A. Galashov, L. Hasenclever, P. A. Ortega, Y. W. Teh, and N. Heess. Meta reinforcement learning as task inference. CoRR, abs/1905.06424, 2019

  48. [48]

    J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):eaau5872, 2019

  49. [49]

    D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke, and S. Levine. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation. ArXiv e-prints, June 2018

  50. [50]

    A. Kar, A. Prakash, M.-Y. Liu, E. Cameracci, J. Yuan, M. Rusiniak, D. Acuna, A. Torralba, and S. Fidler. Meta-Sim: Learning to Generate Synthetic Datasets. arXiv preprint 1904.11621, Apr 2019

  51. [51]

    D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014

  52. [52]

    V. Kumar, A. Gupta, E. Todorov, and S. Levine. Learning dexterous manipulation policies from experience and imitation. CoRR, abs/1611.05095, 2016

  53. [53]

    V. Kumar, E. Todorov, and S. Levine. Optimal control with learned local models: Application to dexterous manipulation. In 2016 IEEE International Conference on Robotics and Automation, ICRA 2016, Stockholm, Sweden, May 16-21, 2016, pages 378–383, 2016

  54. [54]

    L. Lan, Z. Li, X. Guan, and P. Wang. Meta reinforcement learning with task embedding and shared policy. In IJCAI, 2019

  55. [55]

    N. C. Landolfi, G. Thomas, and T. Ma. A model-based approach for sample-efficient multi-task reinforcement learning. CoRR, abs/1907.04964, 2019

  56. [56]

    S. Levine and V. Koltun. Guided policy search. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pages 1–9, 2013

  57. [57]

    S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. I. J. Robotics Res., 37(4-5):421–436, 2018

  58. [58]

    S. Levine, N. Wagener, and P. Abbeel. Learning contact-rich manipulation skills with guided policy search. In IEEE International Conference on Robotics and Automation, ICRA 2015, Seattle, WA, USA, 26-30 May, 2015, pages 156–163, 2015

  59. [59]

    M. Li, Y. Bekiroglu, D. Kragic, and A. Billard. Learning of grasp adaptation through experience and tactile sensing. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, September 14-18, 2014, pages 3339–3346, 2014

  60. [60]

    M. Li, H. Yin, K. Tahara, and A. Billard. Learning object-level impedance control for robust grasping and dexterous manipulation. In 2014 IEEE International Conference on Robotics and Automation, ICRA 2014, Hong Kong, China, May 31 - June 7, 2014, pages 6784–6791, 2014

  61. [61]

    Q. Li, M. Meier, R. Haschke, H. J. Ritter, and B. Bolder. Rotary object dexterous manipulation in hand: a feedback-based method. IJMA, 3(1):36–47, 2013

  62. [62]

    T. Li, W. Xi, M. Fang, J. Xu, and M. Qing-Hu Meng. Learning to Solve a Rubik’s Cube with a Dexterous Hand. arXiv e-prints, page arXiv:1907.11388, Jul 2019

  63. [63]

    C. Lynch, M. Khansari, T. Xiao, V. Kumar, J. Tompson, S. Levine, and P. Sermanet. Learning latent plans from play. CoRR, abs/1903.01973, 2019

  64. [64]

    R. R. Ma and A. M. Dollar. On dexterity and dexterous manipulation. In 15th International Conference on Advanced Robotics: New Boundaries for Robotics, ICAR 2011, Tallinn, Estonia, June 20-23, 2011, pages 1–7, 2011

  65. [65]

    J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg. Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. In Robotics: Science and Systems XIII, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, July 12-16, 2017, 2017

  66. [66]

    J. Mahler, M. Matl, X. Liu, A. Li, D. V. Gealy, and K. Goldberg. Dex-net 3.0: Computing robust robot suction grasp targets in point clouds using a new analytic model and deep learning. CoRR, abs/1709.06670, 2017

  67. [67]

    T. Matiisen, A. Oliver, T. Cohen, and J. Schulman. Teacher-student curriculum learning. arXiv preprint arXiv:1707.00183, 2017

  68. [68]

    B. Mehta, M. Diaz, F. Golemo, C. J. Pal, and L. Paull. Active Domain Randomization. arXiv preprint 1904.04762, 2019

  69. [69]

    N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. Meta-learning with temporal convolutions. CoRR, abs/1707.03141, 2017

  70. [70]

    I. Mordatch, Z. Popovic, and E. Todorov. Contact-invariant optimization for hand manipulation. In Proceedings of the 2012 Eurographics/ACM SIGGRAPH Symposium on Computer Animation, SCA 2012, Lausanne, Switzerland, 2012, pages 137–144, 2012

  71. [71]

    A. Nagabandi, K. Konolige, S. Levine, and V. Kumar. Deep Dynamics Models for Learning Dexterous Manipulation. In Conference on Robot Learning (CoRL), 2019

  72. [72]

    V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010

  73. [73]

    Nordic Semiconductor. nRF52832 Product Specification v1.1. Technical report, Nordic Semiconductor, 2016

  74. [74]

    A. M. Okamura, N. Smaby, and M. R. Cutkosky. An overview of dexterous manipulation. In Proceedings of the 2000 IEEE International Conference on Robotics and Automation, ICRA 2000, April 24-28, 2000, San Francisco, CA, USA, pages 255–262, 2000

  75. [75]

    C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, and A. Mordvintsev. The building blocks of interpretability. Distill, 2018. https://distill.pub/2018/building-blocks

  76. [76]

    OpenAI. OpenAI Five. https://blog.openai.com/openai-five/, 2018

  77. [77]

    OpenAI, M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba. Learning dexterous in-hand manipulation. CoRR, 2018

  78. [78]

    Otvinta. 3D-printed Rubik’s cube robot. http://www.rcr3d.com/, 2017

  79. [79]

    E. Parisotto, S. Ghosh, S. B. Yalamanchi, V. Chinnaobireddy, Y. Wu, and R. Salakhutdinov. Concurrent meta reinforcement learning. CoRR, abs/1903.02710, 2019

  80. [80]

    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. CoRR, abs/1710.06537, 2017

Showing first 80 references.