{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2019:4AMK3ARX5LB2I4PTWPWQA24N33","short_pith_number":"pith:4AMK3ARX","schema_version":"1.0","canonical_sha256":"e018ad8237eac3a471f3b3ed006b8dded26fe0e3a574bbf9010e87e67771ab1f","source":{"kind":"arxiv","id":"1910.07113","version":1},"attestation_state":"computed","paper":{"title":"Solving Rubik's Cube with a Robot Hand","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Models trained only in simulation solve Rubik's cube with a real robot hand","cross_cats":["cs.AI","cs.CV","cs.RO","stat.ML"],"primary_cat":"cs.LG","authors_text":"Alex Paino, Arthur Petron, Bob McGrew, Glenn Powell, Ilge Akkaya, Jerry Tworek, Jonas Schneider, Lei Zhang, Lilian Weng, Maciek Chociej, Marcin Andrychowicz, Mateusz Litwin, Matthias Plappert, Nikolas Tezak, OpenAI, Peter Welinder, Qiming Yuan, Raphael Ribas, Wojciech Zaremba","submitted_at":"2019-10-16T00:59:05Z","abstract_excerpt":"We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distributio"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"1910.07113","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2019-10-16T00:59:05Z","cross_cats_sorted":["cs.AI","cs.CV","cs.RO","stat.ML"],"title_canon_sha256":"38993c8ea71777b31231b14848cf409b3a16488a22b97efdab9f1f8b22df5b8b","abstract_canon_sha256":"eeac08547f78310ddd26cc3f2bdad3357474e66415eac5b34bafff58cf684458"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:52.880483Z","signature_b64":"8ijoDIKxL0glb4TBZ2+vE7/Q7vvnFIbZD7TssJ87IEO6bppdbklF6JQW3WWmItLjiQuXQPwa5z6/MrrP8eVKBg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"e018ad8237eac3a471f3b3ed006b8dded26fe0e3a574bbf9010e87e67771ab1f","last_reissued_at":"2026-05-17T23:38:52.879730Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:52.879730Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Solving Rubik's Cube with a Robot Hand","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Models trained only in simulation solve Rubik's cube with a real robot hand","cross_cats":["cs.AI","cs.CV","cs.RO","stat.ML"],"primary_cat":"cs.LG","authors_text":"Alex Paino, Arthur Petron, Bob McGrew, Glenn Powell, Ilge Akkaya, Jerry Tworek, Jonas Schneider, Lei Zhang, Lilian Weng, Maciek Chociej, Marcin Andrychowicz, Mateusz Litwin, Matthias Plappert, Nikolas Tezak, OpenAI, Peter Welinder, Qiming Yuan, Raphael Ribas, Wojciech Zaremba","submitted_at":"2019-10-16T00:59:05Z","abstract_excerpt":"We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distributio"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the physics simulator, even when randomized over a wide distribution via ADR, captures enough of the real robot's dynamics, friction, and sensor characteristics for the policy to transfer successfully without real-world fine-tuning or additional data.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Models trained only in simulation solve Rubik's cube with a real robot hand","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"5583030dea1a8e198935968899a65b61cc289b323bbd57d236b5a97e58a912fb"},"source":{"id":"1910.07113","kind":"arxiv","version":1},"verdict":{"id":"1b7e2b39-b36c-414c-8f6f-17d96f2a207c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T09:33:24.267536Z","strongest_claim":"We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot.","one_line_summary":"Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the physics simulator, even when randomized over a wide distribution via ADR, captures enough of the real robot's dynamics, friction, and sensor characteristics for the policy to transfer successfully without real-world fine-tuning or additional data.","pith_extraction_headline":"Models trained only in simulation solve Rubik's cube with a real robot hand"},"references":{"count":123,"sample":[{"doi":"","year":1995,"title":"T. Abell and M. A. Erdmann. Stably supported rotations of a planar polygon with two frictionless contacts. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1","work_id":"f920fb59-691d-4fb9-b837-eb753645edd0","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1993,"title":"Y . Aiyama, M. Inaba, and H. Inoue. Pivoting: A new method of graspless manipulation of object by robot ﬁngers. In Proceedings of 1993 IEEE/RSJ International Conference on Intelligent Robots and Syste","work_id":"36e4f045-9acc-4db9-93b1-7b94d394ef2e","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Reinforcement Learning for Pivoting Task","work_id":"d460bc67-912f-41ab-92fe-1ef31c8ad186","ref_index":3,"cited_arxiv_id":"1703.00472","is_internal_anchor":true},{"doi":"","year":2018,"title":"Task-Oriented Hand Motion Retargeting for Dexterous Manipulation Imitation","work_id":"b792cb8d-2321-4883-aa54-9e2d9800d6f0","ref_index":4,"cited_arxiv_id":"1810.01845","is_internal_anchor":true},{"doi":"","year":2013,"title":"T. Asfour, J. Schill, H. Peters, C. Klas, J. Bücker, C. Sander, S. Schulz, A. Kargov, T. Werner, and V . Bartenbach. Armar-4: A 63 dof torque controlled humanoid robot. In 2013 13th IEEE-RAS Internati","work_id":"4bfeb805-2c90-4c57-9f90-e8a6a216fe56","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":123,"snapshot_sha256":"e41db157a8308bd8e0ef25d6099025b8b46bc192859ae0dad8243b2b1f057017","internal_anchors":44},"formal_canon":{"evidence_count":2,"snapshot_sha256":"19a60a838adc59678c5bd8cc12e58317b4419a7cba7de1d83363205a6e314b55"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"1910.07113","created_at":"2026-05-17T23:38:52.879842+00:00"},{"alias_kind":"arxiv_version","alias_value":"1910.07113v1","created_at":"2026-05-17T23:38:52.879842+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.1910.07113","created_at":"2026-05-17T23:38:52.879842+00:00"},{"alias_kind":"pith_short_12","alias_value":"4AMK3ARX5LB2","created_at":"2026-05-18T12:33:10.108867+00:00"},{"alias_kind":"pith_short_16","alias_value":"4AMK3ARX5LB2I4PT","created_at":"2026-05-18T12:33:10.108867+00:00"},{"alias_kind":"pith_short_8","alias_value":"4AMK3ARX","created_at":"2026-05-18T12:33:10.108867+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":40,"internal_anchor_count":40,"sample":[{"citing_arxiv_id":"2411.04832","citing_title":"Plasticity Loss in Deep Reinforcement Learning: A Survey","ref_index":88,"is_internal_anchor":true},{"citing_arxiv_id":"2009.03393","citing_title":"Generative Language Modeling for Automated Theorem Proving","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2503.15481","citing_title":"Learning to Play Piano in the Real World","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2505.06182","citing_title":"Apple: Toward General Active Perception via Reinforcement Learning","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21688","citing_title":"Closed-Loop Sim-to-Real Reinforcement Learning for Deformable Microfiber Shape Control","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2603.12243","citing_title":"HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21458","citing_title":"Mind the Sim-to-Real Gap & Think Like a Scientist","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21330","citing_title":"Learning Robust Dexterous In-Hand Manipulation from Joint Sensors with Proprioceptive Transformer","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09183","citing_title":"Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16520","citing_title":"Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing","ref_index":164,"is_internal_anchor":true},{"citing_arxiv_id":"2509.18455","citing_title":"Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands","ref_index":46,"is_internal_anchor":true},{"citing_arxiv_id":"2510.03599","citing_title":"Learning to Act Through Contact: A Unified View of Multi-Task Robot Learning","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2510.17640","citing_title":"RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2102.01293","citing_title":"Scaling Laws for Transfer","ref_index":182,"is_internal_anchor":true},{"citing_arxiv_id":"2302.11550","citing_title":"Scaling Robot Learning with Semantically Imagined Experience","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2601.14617","citing_title":"UniCon: A Unified System for Efficient Robot Learning Transfers","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2505.18719","citing_title":"VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2602.01505","citing_title":"Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2309.16797","citing_title":"Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution","ref_index":246,"is_internal_anchor":true},{"citing_arxiv_id":"2603.04531","citing_title":"PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2603.12243","citing_title":"HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14350","citing_title":"Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2603.22126","citing_title":"ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.04138","citing_title":"Learning Dexterous Grasping from Sparse Taxonomy Guidance","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"1912.06680","citing_title":"Dota 2 with Large Scale Deep Reinforcement Learning","ref_index":26,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33","json":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33.json","graph_json":"https://pith.science/api/pith-number/4AMK3ARX5LB2I4PTWPWQA24N33/graph.json","events_json":"https://pith.science/api/pith-number/4AMK3ARX5LB2I4PTWPWQA24N33/events.json","paper":"https://pith.science/paper/4AMK3ARX"},"agent_actions":{"view_html":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33","download_json":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33.json","view_paper":"https://pith.science/paper/4AMK3ARX","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=1910.07113&json=true","fetch_graph":"https://pith.science/api/pith-number/4AMK3ARX5LB2I4PTWPWQA24N33/graph.json","fetch_events":"https://pith.science/api/pith-number/4AMK3ARX5LB2I4PTWPWQA24N33/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33/action/timestamp_anchor","attest_storage":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33/action/storage_attestation","attest_author":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33/action/author_attestation","sign_citation":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33/action/citation_signature","submit_replication":"https://pith.science/pith/4AMK3ARX5LB2I4PTWPWQA24N33/action/replication_record"}},"created_at":"2026-05-17T23:38:52.879842+00:00","updated_at":"2026-05-17T23:38:52.879842+00:00"}