Evaluating Real-World Robot Manipulation Policies in Simulation
Pith reviewed 2026-05-13 10:59 UTC · model grok-4.3
The pith
Simplified simulated environments reliably predict real-world performance of robot manipulation policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Paired sim-and-real evaluations demonstrate that policy performance in SIMPLER environments correlates strongly with real-world outcomes and that the simulations accurately capture real policy behavior modes, including sensitivity to distribution shifts.
What carries the argument
SIMPLER, a collection of simulated environments created by targeted mitigation of control and visual disparities between sim and real setups.
If this is right
- Policy rankings obtained in SIMPLER match real-world rankings.
- Simulated evaluations can reveal which policies are sensitive to particular distribution shifts before real-world deployment.
- Open-sourced environments allow standardized benchmarking of generalist manipulation policies.
- Researchers can use simulation for rapid iteration and only validate top candidates on hardware.
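The ranking claim in the first bullet is directly checkable: given paired sim and real success rates for a set of policies, rank agreement reduces to a Spearman correlation. A minimal sketch in plain Python, where the policy names and success rates are illustrative placeholders rather than numbers from the paper:

```python
# Hypothetical paired success rates for four policies;
# names and values are illustrative, not taken from the paper.
sim_success  = {"rt1-a": 0.82, "rt1-b": 0.55, "octo": 0.40, "rt2": 0.90}
real_success = {"rt1-a": 0.78, "rt1-b": 0.60, "octo": 0.35, "rt2": 0.88}

def ranks(scores):
    """Map each policy to its rank (0 = best) under the given scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {policy: i for i, policy in enumerate(ordered)}

def spearman(a, b):
    """Spearman rank correlation for tie-free rankings over the same keys."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[p] - rb[p]) ** 2 for p in ra)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman(sim_success, real_success))  # 1.0: sim ranking matches real
```

A coefficient near 1 means simulation preserves the real-world ordering of policies; values well below 1 would flag setups where the sim ranking is misleading.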
Where Pith is reading between the lines
- If the correlation generalizes, simulation could become the default first filter for most manipulation policy development.
- The same mitigation approach might extend to evaluation of policies on other robot hardware or non-manipulation tasks.
- Future experiments could check whether SIMPLER environments also predict performance on entirely novel tasks not used in the original paired tests.
- This evaluation-focused result could inform training pipelines that already rely on simulation by showing when simplified environments suffice for reliable assessment.
Load-bearing premise
The proposed mitigations for control and visual differences are sufficient to produce reliable performance correlations across a wide range of policies, tasks, and shifts without full-fidelity digital twins.
What would settle it
Running a new, previously untested manipulation policy through both a SIMPLER environment and the corresponding real robot setup: significant divergence in success rates or failure modes would refute the premise, while close agreement on such a held-out policy would strengthen it.
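Such a paired test can be scored with an ordinary Pearson correlation between per-task success rates in sim and real. A hedged sketch, where both the data and the divergence threshold are illustrative assumptions rather than values from the paper:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-task success rates for one new policy (illustrative).
sim  = [0.9, 0.7, 0.4, 0.6]
real = [0.85, 0.65, 0.35, 0.55]

r = pearson(sim, real)
# Illustrative decision rule: treat r below ~0.8 as significant divergence.
print("diverges" if r < 0.8 else "agrees")  # agrees for this data (r = 1.0)
```

The threshold 0.8 is a placeholder; any serious use would also compare failure modes, not only aggregate rates.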
Original abstract
The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliable simulated evaluation and propose approaches for mitigating these gaps without needing to craft full-fidelity digital twins of real-world environments. We then employ these approaches to create SIMPLER, a collection of simulated environments for manipulation policy evaluation on common real robot setups. Through paired sim-and-real evaluations of manipulation policies, we demonstrate strong correlation between policy performance in SIMPLER environments and in the real world. Additionally, we find that SIMPLER evaluations accurately reflect real-world policy behavior modes such as sensitivity to various distribution shifts. We open-source all SIMPLER environments along with our workflow for creating new environments at https://simpler-env.github.io to facilitate research on general-purpose manipulation policies and simulated evaluation frameworks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SIMPLER, a collection of simulated environments for evaluating real-world robot manipulation policies. It identifies control and visual disparities as key challenges, proposes mitigation approaches that avoid full-fidelity digital twins, and reports paired sim-and-real evaluations demonstrating strong correlation in policy performance and accurate reflection of real-world behavior modes such as sensitivity to distribution shifts. The environments and workflow are open-sourced.
Significance. If the reported correlations hold under broader testing, the work would be significant for enabling scalable, reproducible evaluation of generalist manipulation policies, addressing key barriers in real-world robotics research. The open-sourcing of environments and creation workflow is a clear strength that could facilitate follow-on studies.
major comments (2)
- [Abstract] Abstract and evaluation sections: the central claim of 'strong correlation' between SIMPLER and real-world performance rests on paired evaluations, but the manuscript provides no details on the number of policies tested, exact success metrics, statistical significance tests, or controls for confounding factors such as task selection bias; this prevents verification of whether the correlation is robust or load-bearing for the claim.
- [Mitigation approaches] Mitigation approaches section: the proposed methods for closing control and visual gaps (e.g., action scaling, camera adjustments, domain randomization) must be shown to generalize beyond the evaluated policy set rather than being calibrated or validated only on the tested policies; otherwise the observed correlation does not establish that SIMPLER works out-of-the-box for new generalist policies as claimed.
minor comments (2)
- [Abstract] The abstract and introduction would benefit from a brief table summarizing the specific policies, tasks, and distribution shifts evaluated in the paired experiments.
- [SIMPLER environments] Notation for environment parameters (e.g., control gains or visual rendering settings) should be defined consistently when first introduced to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We have carefully addressed each major comment below and revised the paper to improve clarity, add missing details, and strengthen the presentation of our results.
Point-by-point responses
-
Referee: [Abstract] Abstract and evaluation sections: the central claim of 'strong correlation' between SIMPLER and real-world performance rests on paired evaluations, but the manuscript provides no details on the number of policies tested, exact success metrics, statistical significance tests, or controls for confounding factors such as task selection bias; this prevents verification of whether the correlation is robust or load-bearing for the claim.
Authors: We agree that the original manuscript did not present these details with sufficient explicitness, which hinders independent verification. In the revised version we have expanded the evaluation section to report the full set of policies evaluated, the precise definition of success metrics used in both sim and real, the statistical tests performed on the observed correlations (including correlation coefficients and significance levels), and a discussion of task selection criteria with explicit checks for selection bias. The abstract has also been updated to summarize these elements. These changes directly address the concern and make the strength of the correlation claim verifiable. revision: yes
-
Referee: [Mitigation approaches] Mitigation approaches section: the proposed methods for closing control and visual gaps (e.g., action scaling, camera adjustments, domain randomization) must be shown to generalize beyond the evaluated policy set rather than being calibrated or validated only on the tested policies; otherwise the observed correlation does not establish that SIMPLER works out-of-the-box for new generalist policies as claimed.
Authors: We acknowledge the importance of demonstrating that the mitigation strategies are not overfitted to the policies used in our initial experiments. The approaches were derived from general principles of control and visual domain gaps rather than policy-specific tuning; however, we agree that explicit validation on held-out policies is necessary to support the out-of-the-box claim. In the revision we have added experiments applying the same mitigation pipeline to additional policies that were not used during the development of the environments. We also provide explicit guidelines and parameter ranges for applying the methods to new policies. This strengthens the evidence that SIMPLER can be used for previously unseen generalist policies. revision: yes
Circularity Check
No significant circularity; central claim is direct empirical measurement
Full rationale
The paper advances no mathematical derivation or first-principles result whose output reduces to its inputs by construction. Its core claim—that SIMPLER environments exhibit strong correlation with real-world policy performance—is established solely through paired sim-and-real rollouts on the same policies, tasks, and distribution shifts. Mitigation techniques for control and visual gaps are presented as engineering choices that are then validated by those independent measurements; no parameter is fitted on a subset and then relabeled as a prediction, and no self-citation chain is invoked to justify uniqueness or force the result. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Control and visual disparities are the primary sim-to-real gaps, and they can be mitigated without full-fidelity digital twins.
Forward citations
Cited by 34 Pith papers
-
RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies
RoboLab is a new simulation benchmark with 120 tasks across visual, procedural, and relational axes that quantifies generalization gaps and perturbation sensitivity in task-generalist robotic policies.
-
DiffPhD: A Unified Differentiable Solver for Projective Heterogeneous Materials in Elastodynamics with Contact-Rich GPU-Acceleration
DiffPhD delivers a unified differentiable projective dynamics solver for heterogeneous hyperelastic elastodynamics with contact that achieves up to 10x speedup and stable convergence on 100x stiffness contrasts while ...
-
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
-
Thinking in Text and Images: Interleaved Vision--Language Reasoning Traces for Long-Horizon Robot Manipulation
A multimodal transformer generates and caches interleaved text-image traces to guide closed-loop actions, achieving 92.4% success on LIBERO-Long and 95.5% average on LIBERO.
-
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...
-
JailWAM: Jailbreaking World Action Models in Robot Control
JailWAM is the first dedicated jailbreak framework for World Action Models, achieving 84.2% attack success rate on LingBot-VA in RoboTwin simulation and enabling safety evaluation of robotic AI.
-
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
VP-VLA decouples high-level reasoning from low-level control in VLA models by rendering spatial anchors as visual prompts directly in the RGB observation space, outperforming end-to-end baselines.
-
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness
HeiSD delivers up to 2.45x faster inference for embodied VLA models by hybridizing speculative decoding with kinematic boundary detection and error-mitigation tricks while preserving task success rates.
-
Towards Generalizable Robotic Manipulation in Dynamic Environments
DOMINO dataset and PUMA architecture enable better dynamic robotic manipulation by incorporating motion history, delivering 6.3% higher success rates than prior VLA models.
-
Robotic Control via Embodied Chain-of-Thought Reasoning
Training VLAs to perform embodied chain-of-thought reasoning about plans, sub-tasks, motions, and grounded visual features before acting raises OpenVLA success rates by 28% on challenging generalization tasks without ...
-
Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
GTA-VLA conditions VLA models on user spatial priors to produce a unified spatial-visual chain-of-thought, reaching 81.2% success on SimplerEnv WidowX and improving performance under out-of-distribution shifts.
-
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
A retrieve-then-steer method stores successful robot actions in memory and uses them to steer a frozen VLA's flow-matching sampler for better test-time reliability without parameter updates.
-
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
Retrieve-then-steer stores successful observation-action segments in memory, retrieves relevant chunks, filters them, and uses an elite prior with confidence-adaptive guidance to steer a flow-matching action sampler f...
-
Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation
Anchor-Centric Adaptation escapes the diversity trap by prioritizing repeated demonstrations at core anchors over broad coverage, yielding higher success rates under fixed data budgets in robotic manipulation.
-
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
-
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
Empirical study on robosuite tasks reveals a dominant-skill effect in compositions and shows that an atomic probe approximates full revalidation for skill updates at much lower cost.
-
Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training
DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist...
-
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.
-
Exploring High-Order Self-Similarity for Video Understanding
The MOSS module learns and combines multi-order space-time self-similarity features to enhance temporal dynamics modeling in videos across action recognition, VQA, and robotic tasks.
-
Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models
State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks...
-
ST-π: Structured SpatioTemporal VLA for Robotic Manipulation
ST-π structures VLA models by having a spatiotemporal VLM produce causally ordered chunk-level prompts that guide a dual-generator action expert to jointly handle spatial and temporal control in robotic manipulation.
-
OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation
OFlow unifies temporal foresight and object-aware reasoning inside a shared latent space via flow matching to improve VLA robustness in robotic manipulation under distribution shifts.
-
Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction
COIN provides 50 interactive robotic tasks, a 1000-demonstration dataset collected via AR teleoperation, and metrics showing that CodeAsPolicy, VLA, and H-VLA models fail at causally-dependent interactive reasoning du...
-
Grounded World Model for Semantically Generalizable Planning
A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.
-
Ψ-Map: Panoptic Surface Integrated Mapping Enables Real2Sim Transfer
Ψ-Map combines plane-constrained Gaussian surfels from LiDAR with end-to-end panoptic lifting to deliver high-precision geometric and semantic reconstruction in large-scale environments at real-time speeds.
-
AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement
AnySlot decouples language grounding from low-level control by inserting an explicit visual goal image, yielding better zero-shot performance on precise slot placement tasks than flat VLA policies.
-
RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies
RoboLab is a photorealistic simulation benchmark with 120 tasks and perturbation analysis to evaluate true generalization and robustness of robotic foundation models.
-
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
-
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
mimic-video combines internet video pretraining with a flow-matching decoder to achieve state-of-the-art robotic manipulation performance with 10x better sample efficiency than vision-language-action models.
-
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
CogACT is a new VLA model that uses a conditioned diffusion action transformer to achieve over 35% higher average success rates than OpenVLA in simulation and 55% in real-robot experiments while generalizing to new ro...
-
EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development
EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.
-
Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance
Parameter differences from two training runs on a small task set are treated as auxiliary capability vectors that are merged into a pretrained VLA model, yielding auxiliary-task gains at the cost of ordinary supervise...
-
RoboECC: Multi-Factor-Aware Edge-Cloud Collaborative Deployment for VLA Models
RoboECC delivers up to 3.28x speedup for VLA model inference via co-aware segmentation and network-aware adjustment with 2.55-2.62% overhead.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
Reference graph
Works this paper leans on
-
[1]
Do as I can, not as I say: Grounding language in robotic affordances
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, K...
work page 2022
-
[2]
Learning dexterous in-hand manipulation
OpenAI: Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Józefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, and Wojciech Zaremba. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1)...
work page 2020
-
[3]
Using simulation and domain adaptation to improve efficiency of deep robotic grasping
Konstantinos Bousmalis, Alex Irpan, Paul Wohlhart, Yunfei Bai, Matthew Kelcey, Mrinal Kalakrishnan, Laura Downs, Julian Ibarz, Peter Pastor, Kurt Konolige, Sergey Levine, and Vincent Vanhoucke. Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In 2018 IEEE international conference on robotics and automation (ICRA) , pa...
work page 2018
-
[4]
Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Sc...
work page 2023
-
[5]
RT-2: Vision-language-action models transfer web knowledge to robotic control
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashn...
work page 2023
-
[6]
RT-1: robotics transformer for real-world control at scale
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Mall...
work page 2023
-
[7]
The ycb object and model set: Towards common benchmarks for manipulation research
Berk Calli, Arjun Singh, Aaron Walsman, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M Dollar. The ycb object and model set: Towards common benchmarks for manipulation research. In 2015 international conference on advanced robotics (ICAR) , pages 510–517. IEEE, 2015
work page 2015
-
[8]
MotionBenchMaker: A tool to generate and benchmark motion planning datasets
Constantinos Chamzas, Carlos Quintero-Pena, Zachary Kingston, Andreas Orthey, Daniel Rakita, Michael Gleicher, Marc Toussaint, and Lydia E Kavraki. MotionBenchMaker: A tool to generate and benchmark motion planning datasets. IEEE Robotics and Automation Letters, 7(2):882–889, 2021
work page 2021
-
[9]
Closing the sim-to-real loop: Adapting simulation randomization with real world experience
Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan Ratliff, and Dieter Fox. Closing the sim-to-real loop: Adapting simulation randomization with real world experience. In 2019 International Conference on Robotics and Automation (ICRA), pages 8973–8979. IEEE, 2019
work page 2019
-
[10]
Diffusion policy: Visuomotor policy learning via action diffusion
Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems (RSS), 2023
work page 2023
-
[11]
Open X-Embodiment Collaboration, Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang H...
work page Pith review arXiv 2023
-
[12]
Analysis and observations from the first amazon picking challenge
Nikolaus Correll, Kostas E Bekris, Dmitry Berenson, Oliver Brock, Albert Causo, Kris Hauser, Kei Okada, Alberto Rodriguez, Joseph M Romano, and Peter R Wurman. Analysis and observations from the first amazon picking challenge. IEEE Transactions on Automation Science and Engineering, 15(1):172–188, 2016
work page 2016
-
[13]
Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Yixin Lin, Austin S. Wang, Abitha Thankaraj, Karanbir Chahal, Berk Çalli, Saurabh Gupta, David Held, Lerrel Pinto, Deepak Pathak, Vikash Kumar, and Abhinav Gupta. RB2: robotic manipulation benchmarking with a twist. In Joaquin Vanschoren and Sai-Kit Yeung, editors, Proceedings of the Neural Inf...
work page 2021
-
[14]
Robothor: An open simulation-to-real embodied ai platform
Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, and Ali Farhadi. Robothor: An open simulation-to-real embodied ai platform. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pag...
work page 2020
-
[15]
Objaverse: A universe of annotated 3d objects
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 13142–13153, 2023
work page 2023
-
[16]
CARLA: An open urban driving simulator
Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, pages 1–16, 2017
work page 2017
-
[17]
Bayesian imitation learning for end-to-end mobile manipulation
Yuqing Du, Daniel Ho, Alex Alemi, Eric Jang, and Mohi Khansari. Bayesian imitation learning for end-to-end mobile manipulation. In International Conference on Machine Learning, pages 5531–5546. PMLR, 2022
work page 2022
-
[18]
Manipulathor: A framework for visual object manipulation
Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. Manipulathor: A framework for visual object manipulation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4495–4504, 2021
work page 2021
-
[19]
Graspnet-1billion: A large-scale benchmark for general object grasping
Hao-Shu Fang, Chenxi Wang, Minghao Gou, and Cewu Lu. Graspnet-1billion: A large-scale benchmark for general object grasping. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 11444–11453, 2020
work page 2020
-
[20]
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Zipeng Fu, Tony Z Zhao, and Chelsea Finn. Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation. arXiv preprint arXiv:2401.02117, 2024
work page Pith review arXiv 2024
-
[21]
Niklas Funk, Charles Schaff, Rishabh Madan, Takuma Yoneda, Julen Urain De Jesus, Joe Watson, Ethan K Gordon, Felix Widmaier, Stefan Bauer, Siddhartha S Srinivasa, et al. Benchmarking structured policies and policy optimization for real-world dexterous object manipulation. IEEE Robotics and Automation Letters, 7(1):478–485, 2021
work page 2021
-
[22]
Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, and Siyuan Huang. Arnold: A benchmark for language-grounded task learning with continuous states in realistic 3d scenes. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 20426–20438, 2023
work page 2023
-
[23]
Jiayuan Gu, Fanbo Xiang, Xuanlin Li, Z. Ling, Xiqiang Liu, Tongzhou Mu, Yihe Tang, Stone Tao, Xinyue Wei, Yuan Yao, Xiao Yuan, Pengwei Xie, Zhiao Huang, Rui Chen, and Hao Su. Maniskill2: A unified benchmark for generalizable manipulation skills. In The Eleventh International Conference on Learning Representations , 2023
work page 2023
-
[24]
Retinagan: An object-aware approach to sim-to-real transfer
Daniel Ho, Kanishka Rao, Zhuo Xu, Eric Jang, Mohi Khansari, and Yunfei Bai. Retinagan: An object-aware approach to sim-to-real transfer. 2021 IEEE International Conference on Robotics and Automation (ICRA) , pages 10920–10926, 2020
work page 2021
-
[25]
Occlusion-aware reconstruction and manipulation of 3d articulated objects
Xiaoxia Huang, Ian Walker, and Stan Birchfield. Occlusion-aware reconstruction and manipulation of 3d articulated objects. In 2012 IEEE international conference on robotics and automation, pages 1365–1371. IEEE, 2012
work page 2012
-
[26]
Learning agile and dynamic motor skills for legged robots
Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):eaau5872, 2019
work page 2019
-
[27]
Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, and Konstantinos Bousmalis. Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages ...
work page 2019
-
[28]
Rlbench: The robot learning benchmark & learning environment
Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. Rlbench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 5(2):3019–3026, 2020
work page 2020
-
[29]
BC-z: Zero-shot task generalization with robotic imitation learning
Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. BC-z: Zero-shot task generalization with robotic imitation learning. In 5th Annual Conference on Robot Learning, 2021
work page 2021
-
[30]
Ditto: Building digital twins of articulated objects from interaction
Zhenyu Jiang, Cheng-Chun Hsu, and Yuke Zhu. Ditto: Building digital twins of articulated objects from interaction. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022
work page 2022
-
[31]
Tensoir: Tensorial inverse rendering
Haian Jin, Isabella Liu, Peijia Xu, Xiaoshuai Zhang, Songfang Han, Sai Bi, Xiaowei Zhou, Zexiang Xu, and Hao Su. Tensoir: Tensorial inverse rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , 2023
work page 2023
-
[32]
Abhishek Kadian, Joanne Truong, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, S. Chernova, and Dhruv Batra. Sim2real predictivity: Does evaluation in simulation predict real-world performance? IEEE Robotics and Automation Letters, 5:6670–6677, 2019
work page 2019
-
[33]
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, et al. Droid: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024
work page Pith review arXiv 2024
-
[34]
Segment anything
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross B. Girshick. Segment anything. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3992–4003, 2023
work page 2023
-
[35]
The darpa robotics challenge finals: Results and perspectives
Eric Krotkov, Douglas Hackett, Larry Jackel, Michael Perschbacher, James Pippine, Jesse Strauss, Gill Pratt, and Christopher Orlowski. The darpa robotics challenge finals: Results and perspectives. The DARPA robotics challenge finals: Humanoid robots to the rescue , pages 1–26, 2018
work page 2018
-
[36] Ashish Kumar, Zipeng Fu, Deepak Pathak, and Jitendra Malik. RMA: Rapid motor adaptation for legged robots. In Robotics: Science and Systems, 2021.
[37] Chengshu Li, Ruohan Zhang, J. Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Michael Lingelbach, Jiankai Sun, Mona Anvari, Minjune Hwang, Manasi Sharma, Arman Aydin, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R. Matthews, Ivan Villa-Renteria, Jerry Tang, Claire Tang, Fei Xia, Silvio Savarese, et al. BEHAVIOR-1K: A benchmark for embodied AI with 1,000 everyday activities and realistic simulation. In Conference on Robot Learning, 2023.
[38] Minchen Li, Zachary Ferguson, Teseo Schneider, Timothy Langlois, Denis Zorin, Daniele Panozzo, Chenfanfu Jiang, and Danny M. Kaufman. Incremental potential contact: Intersection- and inversion-free, large-deformation dynamics. ACM Trans. Graph., 39(4), 2020.
[39]
[40] Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, and Hao Su. One-2-3-45++: Fast single image to 3D objects with consistent multi-view generation and 3D diffusion. arXiv preprint arXiv:2311.07885, 2023.
[41] Ziyuan Liu, Wei Liu, Yuzhe Qin, Fanbo Xiang, Minghao Gou, Songyan Xin, Máximo A. Roa, Berk Çalli, Hao Su, Yu Sun, and Ping Tan. OCRTOC: A cloud-based competition and benchmark for robotic grasping and manipulation. IEEE Robotics and Automation Letters, 7:486–493, 2021.
[42] Jeffrey Mahler, Florian T. Pokorny, Brian Hou, Melrose Roderick, Michael Laskey, Mathieu Aubry, Kai Kohlhoff, Torsten Kröger, James Kuffner, and Ken Goldberg. Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a multi-armed bandit model with correlated rewards. In 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016.
[43] Jeffrey Mahler, Jacky Liang, Sherdil Niyaz, Michael Laskey, Richard Doan, Xinyu Liu, Juan Aparicio Ojea, and Ken Goldberg. Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. In Proceedings of Robotics: Science and Systems (RSS), 2017.
[44] Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and Gavriel State. Isaac Gym: High performance GPU-based physics simulation for robot learning, 2021.
[45] Sivabalan Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, and Raquel Urtasun. LiDARsim: Realistic LiDAR simulation by leveraging the real world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11167–11176, 2020.
[46] Oier Mees, Lukas Hermann, Erick Rosete-Beas, and Wolfram Burgard. CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters (RA-L), 7(3):7327–7334, 2022.
[47] Mark Moll, Ioan A. Sucan, and Lydia E. Kavraki. Benchmarking motion planning algorithms: An extensible infrastructure for analysis and visualization. IEEE Robotics & Automation Magazine, 22(3):96–102, 2015.
[48] Tongzhou Mu, Zhan Ling, Fanbo Xiang, Derek Cathera Yang, Xuanlin Li, Stone Tao, Zhiao Huang, Zhiwei Jia, and Hao Su. ManiSkill: Generalizable manipulation skill benchmark with large-scale demonstrations. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
[49] Galen E. Mullins, Paul G. Stankiewicz, and Satyandra K. Gupta. Automated generation of diverse and challenging scenarios for test and evaluation of autonomous vehicles. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1443–1450. IEEE, 2017.
[50] NVIDIA. NVIDIA Isaac Sim. https://developer.nvidia.com/isaac-sim, 2022.
[51] Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Charles Xu, Jianlan Luo, Tobias Kreiman, You Liang Tan, Dorsa Sadigh, Chelsea Finn, and Sergey Levine. Octo: An open-source generalist robot policy. https://octo-models.github.io, 2023.
[52] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242, 1895. ISSN 03701662. URL http://www.jstor.org/stable/115794.
[53] Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3803–3810. IEEE, 2018.
[54] Martin Pincus. A Monte Carlo method for the approximate solution of certain types of constrained optimization problems. Operations Research, 18(6):1225–1228, 1970. ISSN 0030364X, 15265463. URL http://www.jstor.org/stable/169420.
[55] Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, et al. Habitat 3.0: A co-habitat for humans, avatars and robots. arXiv preprint arXiv:2310.13724, 2023.
[56] Haozhi Qi, Ashish Kumar, Roberto Calandra, Yinsong Ma, and Jitendra Malik. In-hand object rotation via rapid motor adaptation. In Conference on Robot Learning, 2022.
[57] Haozhi Qi, Ashish Kumar, Roberto Calandra, Yi Ma, and Jitendra Malik. In-hand object rotation via rapid motor adaptation. In Conference on Robot Learning, pages 1722–1732. PMLR, 2023.
[58] Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, and Mohi Khansari. RL-CycleGAN: Reinforcement learning aware simulation-to-real. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11154–11163, 2020.
[59] Scott E. Reed, Konrad Zolna, Emilio Parisotto, Sergio Gómez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, and Nando de Freitas. A generalist agent. Transactions on Machine Learning Research, 2022.
[60] Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, et al. Habitat: A platform for embodied AI research. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9339–9347, 2019.
[61] Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, and Hao Su. Zero123++: A single image to consistent multi-view diffusion base model. arXiv preprint arXiv:2310.15110, 2023.
[62]
[63] Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, S. Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, and Li Fei-Fei. BEHAVIOR: Benchmark for everyday household activities in virtual, interactive, and ecological environments. In Conference on Robot Learning, 2022.
[64] Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimír Vondruš, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel X. Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, and Dhruv Batra. Habitat 2.0: Training home assistants to rearrange their habitat. In Advances in Neural Information Processing Systems, 2021.
[65] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In IROS, pages 23–30. IEEE, 2017.
[66] Karl Van Wyk, Joe Falco, and Elena Messina. Robotic grasping and manipulation competition: Future tasks to support the development of assembly robotics. In Robotic Grasping and Manipulation: First Robotic Grasping and Manipulation Challenge, RGMC 2016, Held in Conjunction with IROS 2016, Daejeon, South Korea, October 10–12, 2016, Revised Papers, 2016.
[67] Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, and Sergey Levine. BridgeData V2: A dataset for robot learning at scale. In Conference on Robot Learning (CoRL), 2023.
[68] Xinyue Wei, Minghua Liu, Zhan Ling, and Hao Su. Approximate convex decomposition for 3D meshes with collision-aware concavity and tree search. ACM Transactions on Graphics (TOG), 41(4):1–18, 2022.
[69] Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel X. Chang, Leonidas J. Guibas, and Hao Su. SAPIEN: A simulated part-based interactive environment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11097–11107, 2020.
[70] Annie Xie, Lisa Lee, Ted Xiao, and Chelsea Finn. Decomposing the generalization gap in imitation learning for visual robotic manipulation. arXiv preprint arXiv:2307.03659, 2023.
[71] Kaizhi Yang, Xiaoshuai Zhang, Zhiao Huang, Xuejin Chen, Zexiang Xu, and Hao Su. MovingParts: Motion-based 3D part discovery in dynamic radiance field. In The Twelfth International Conference on Learning Representations, 2024.
[72] Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. UniSim: A neural closed-loop sensor simulator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1389–1399, 2023.
[73] Zhao-Heng Yin, Binghao Huang, Yuzhe Qin, Qifeng Chen, and Xiaolong Wang. Rotating without seeing: Towards in-hand dexterity through touch. In Robotics: Science and Systems, 2023.
[74] Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning, pages 1094–1100. PMLR, 2020.
[75] Jingwei Zhang, Lei Tai, Peng Yun, Yufeng Xiong, Ming Liu, Joschka Boedecker, and Wolfram Burgard. VR-Goggles for robots: Real-to-sim domain adaptation for visual control. IEEE Robotics and Automation Letters, 4(2):1148–1155, 2019.
[76]
[77] Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto, Chelsea Finn, and Abhinav Gupta. Train offline, test online: A real robot learning benchmark. In IEEE International Conference on Robotics and Automation, ICRA 2023, London, UK, May 29 - June 2, 2023. URL https://doi.org/10.1109/ICRA48891.2023.10160591.
[78] Yuke Zhu, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Abhishek Joshi, Soroush Nasiriany, and Yifeng Zhu. robosuite: A modular simulation framework and benchmark for robot learning. arXiv preprint arXiv:2009.12293, 2020.

APPENDIX A
CONTRIBUTIONS
Project Leads: Xuanlin Li, Kyle Hsu
Main Methodology: Xuanlin Li, Jiayuan Gu, Kyle Hsu
SIMPLER Envi…