arxiv: 2606.16776 · v2 · pith:P33HVNIMnew · submitted 2026-06-15 · 💻 cs.RO

JoyAI-Sim: A Simulation-Enabled Interconversion Toolchain for the Embodied Data Pyramid

Pith reviewed 2026-06-30 10:28 UTC · model grok-4.3

classification 💻 cs.RO
keywords robot simulation · embodied data generation · human-robot alignment · digital twin · simulation toolchain · generalist robot policies · physical consistency filter

The pith

JoyAI-Sim provides bidirectional pathways that convert real robot tasks into simulations for human evaluation and lift human demonstrations into robot trajectories while enforcing physical constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generalist robot policies need large-scale evaluation and training data, yet real-robot experiments remain slow and costly. The paper introduces JoyAI-Sim as an interconversion toolchain that moves information among robots, simulations, and humans in both directions. One pathway rebuilds real tabletop tasks as digital twins in simulation so humans can inspect motion naturalness at scale. The other pathway takes ego-centric human demonstrations, applies robot physical limits inside the simulator, and produces usable robot trajectories and observations. The JoySim simulator functions as both the evaluation layer and the consistency filter, with core modules offered as cloud services for repeated use.

Core claim

JoyAI-Sim establishes two complementary pathways denoted Robot ⇌ Simulation ⇌ Human. The Robot → Simulation → Human direction reconstructs real-robot tabletop organization tasks as calibrated digital twins for scalable evaluation and applies human embodied feedback to refine simulated motion naturalness. The Human → Simulation → Robot direction lifts ego-centric human demonstrations into simulation, verifies them against robot physical constraints, and converts them into robot-centered trajectories, annotations, and visual observations. The JoySim simulator thereby serves simultaneously as a scalable evaluation layer and a physical consistency filter for robot data generation, with the recon

What carries the argument

The bidirectional Robot ⇌ Simulation ⇌ Human pathways, with JoySim acting as both a scalable evaluation layer and a physical consistency filter.
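
As a rough illustration of what a physical consistency filter could look like, the sketch below rejects joint-space trajectories that break position or velocity limits. The paper does not describe how JoySim performs its checks, so everything here is an assumption: `JointLimits`, `violations`, and `filter_consistent` are hypothetical names, and a real filter would likely also cover collisions, torques, and reachability.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class JointLimits:
    """Hypothetical per-joint limits (radians and radians per second)."""
    lower: Sequence[float]
    upper: Sequence[float]
    max_velocity: Sequence[float]


def violations(
    trajectory: Sequence[Sequence[float]],
    limits: JointLimits,
    dt: float,
) -> List[Tuple[int, int, str]]:
    """Return (step, joint, kind) for every limit the trajectory breaks."""
    found = []
    for t, q in enumerate(trajectory):
        for j, angle in enumerate(q):
            # Position check: the joint angle must stay inside its range.
            if not (limits.lower[j] <= angle <= limits.upper[j]):
                found.append((t, j, "position"))
            # Velocity check: finite difference against the previous step.
            if t > 0:
                speed = abs(angle - trajectory[t - 1][j]) / dt
                if speed > limits.max_velocity[j]:
                    found.append((t, j, "velocity"))
    return found


def filter_consistent(trajectories, limits: JointLimits, dt: float):
    """Keep only trajectories with no recorded violation."""
    return [traj for traj in trajectories if not violations(traj, limits, dt)]
```

Returning the list of violations, rather than a bare pass/fail flag, makes it possible to report why a lifted human demonstration was discarded.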

If this is right

  • Real-robot trials can be evaluated at larger scale without repeated physical execution.
  • Human demonstrations become convertible into trajectories that already satisfy robot physical limits.
  • Model evaluation gains an explicit human-robot alignment step through embodied feedback on simulated motions.
  • Data generation pipelines gain an early filter that discards physically inconsistent motions before robot deployment.
  • Core modules become reusable cloud infrastructure rather than one-off local setups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same interconversion pattern could reduce the fraction of development time spent on physical hardware across a wider range of manipulation tasks.
  • If the digital-twin calibration step generalizes, similar toolchains might support sim-to-real transfer testing for non-tabletop environments.
  • Cloud packaging of the modules opens the possibility of shared community datasets generated under consistent physical filters.
  • The realism-augmentation modules might be tested independently to measure their isolated effect on downstream policy performance.

Load-bearing premise

Real-robot tabletop tasks can be accurately rebuilt as calibrated digital twins in simulation and human embodied feedback can reliably inspect and refine the naturalness of simulated motions.

What would settle it

A side-by-side test in which policies trained or evaluated exclusively through the toolchain show measurably lower success rates or more frequent failures when deployed on the original physical robots than policies trained only on real-robot data.
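
One way such a side-by-side comparison could be scored is a two-proportion z-test on deployment success counts. This is a minimal sketch under the assumption that each trial is an independent success or failure; the function name `two_proportion_z` is hypothetical, and the paper itself reports no such test.

```python
import math


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided z-test for a difference between two success rates.

    Returns (z, p_value). Group A could be toolchain-trained policies and
    group B real-data-only policies; both labels are assumptions.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of equal rates.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        # All trials succeeded or all failed: no detectable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

A negative `z` with a small `p_value` would indicate that group A succeeds less often than group B, which is the outcome that would count against the toolchain.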

Original abstract

Generalist robot policies require trustworthy evaluation and robot-usable training data, but both are difficult to scale with physical robots alone. Real-robot trials and demonstrations remain the most faithful source of deployment signals, yet they are slow, costly, and hard to reproduce. We present JoyAI-Sim, a simulation-enabled interconversion toolchain for human-robot aligned model evaluation and data generation, denoted as Robot ⇌ Simulation ⇌ Human. On the one hand, the Robot → Simulation → Human pathway supports human-robot aligned model evaluation by reconstructing real-robot tabletop organization tasks as calibrated digital twins for scalable evaluation, while using human embodied feedback to inspect and refine the naturalness of simulated motions. On the other hand, the Human → Simulation → Robot pathway supports human-robot aligned data generation: it lifts ego-centric human demonstrations into simulation, checks them under robot physical constraints, and converts them into robot-centered trajectories, annotations, and visual observations. Together, these pathways use the JoySim simulator as both a scalable evaluation layer and a physical consistency filter for robot data generation. We further package the core reconstruction, simulation, rendering, and realism-augmentation modules as cloud services on JD Cloud, turning the system into reusable infrastructure for robot data generation and model evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents JoyAI-Sim, a simulation-enabled interconversion toolchain for human-robot aligned model evaluation and data generation, denoted as Robot ⇌ Simulation ⇌ Human. It describes two pathways: Robot → Simulation → Human, which reconstructs real-robot tabletop tasks as calibrated digital twins for scalable evaluation and uses human embodied feedback to refine simulated motion naturalness; and Human → Simulation → Robot, which lifts ego-centric human demonstrations into simulation, enforces robot physical constraints, and converts them to robot-centered trajectories and annotations. The JoySim simulator is positioned as both an evaluation layer and physical consistency filter, with core modules packaged as JD Cloud services for reusable infrastructure.

Significance. If the described reconstruction, feedback, and conversion mechanisms can be validated, the toolchain could provide meaningful infrastructure for scaling trustworthy evaluation and data generation in generalist robot policies, addressing the cost and reproducibility limits of physical-robot-only approaches while enabling human-robot alignment.

major comments (3)
  1. [Abstract] Abstract: The claims that the pathways deliver 'human-robot aligned model evaluation' and that JoySim serves as a 'scalable evaluation layer and a physical consistency filter' are load-bearing for the central contribution, yet the manuscript provides no experimental results, validation metrics (e.g., reconstruction pose/trajectory error, physics parameter fidelity), error analysis, or baseline comparisons to support effectiveness or alignment.
  2. [Robot → Simulation → Human pathway] Robot → Simulation → Human pathway description: The assumption that real-robot tabletop organization tasks can be accurately reconstructed as calibrated digital twins and that human embodied feedback reliably inspects/refines motion naturalness is presented without any reported quantitative metrics on reconstruction fidelity or inter-rater reliability of naturalness judgments; if either fails, the evaluation pathway cannot deliver the claimed trustworthiness.
  3. [Human → Simulation → Robot pathway] Human → Simulation → Robot pathway description: No details or results are given on the accuracy of physical-constraint checking, the fidelity of converted robot-centered trajectories, or how annotations/visual observations are generated, which is required to substantiate the data-generation claims.
minor comments (2)
  1. The bidirectional notation (Robot ⇌ Simulation ⇌ Human) is introduced in the abstract but would benefit from an accompanying diagram or explicit definition of the interconversion steps in the main text for clarity.
  2. The manuscript refers to 'core reconstruction, simulation, rendering, and realism-augmentation modules' without specifying their implementation details, input/output formats, or open-source availability, which would aid reproducibility.
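
The metrics the referee asks for in major comments 1 and 2 have standard definitions. The sketch below shows two of them under stated assumptions: pose error as a Euclidean translation error plus a geodesic rotation angle between unit quaternions in (w, x, y, z) order, and inter-rater reliability as Cohen's kappa for two raters. The function names are hypothetical, and none of this is taken from the manuscript.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def translation_error(p_real: Sequence[float], p_sim: Sequence[float]) -> float:
    """Euclidean distance between real and reconstructed object positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_real, p_sim)))


def rotation_error_rad(q_real: Sequence[float], q_sim: Sequence[float]) -> float:
    """Geodesic angle between two unit quaternions given as (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q_real, q_sim)))
    # Clamp to guard against rounding slightly above 1.
    return 2.0 * math.acos(min(1.0, dot))


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters labeled independently at their own rates.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Kappa near 0 would mean naturalness judgments agree no better than chance, which is the failure case the referee flags for the evaluation pathway.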

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for empirical grounding of the toolchain claims. The manuscript is a system description of JoyAI-Sim and its cloud services; we address each point by clarifying scope and proposing targeted revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claims that the pathways deliver 'human-robot aligned model evaluation' and that JoySim serves as a 'scalable evaluation layer and a physical consistency filter' are load-bearing for the central contribution, yet the manuscript provides no experimental results, validation metrics (e.g., reconstruction pose/trajectory error, physics parameter fidelity), error analysis, or baseline comparisons to support effectiveness or alignment.

    Authors: We agree the claims are load-bearing and currently unsupported by metrics. The paper presents the architectural design and intended use of the pathways rather than validated performance. In revision we will rephrase the abstract to indicate design intent (e.g., 'is designed to support' instead of 'supports') and add an explicit Future Work section outlining planned quantitative validation, including the suggested metrics on reconstruction error and physics fidelity.
    Revision: yes

  2. Referee: [Robot → Simulation → Human pathway] Robot → Simulation → Human pathway description: The assumption that real-robot tabletop organization tasks can be accurately reconstructed as calibrated digital twins and that human embodied feedback reliably inspects/refines motion naturalness is presented without any reported quantitative metrics on reconstruction fidelity or inter-rater reliability of naturalness judgments; if either fails, the evaluation pathway cannot deliver the claimed trustworthiness.

    Authors: We acknowledge the absence of quantitative metrics on reconstruction fidelity and naturalness judgment reliability. The current text describes the system modules and workflow. We will add a Limitations subsection discussing these assumptions and the conditions under which the pathway may not achieve the intended trustworthiness. Because no such experiments were performed for this submission, specific numerical results cannot be inserted.
    Revision: partial

  3. Referee: [Human → Simulation → Robot pathway] Human → Simulation → Robot pathway description: No details or results are given on the accuracy of physical-constraint checking, the fidelity of converted robot-centered trajectories, or how annotations/visual observations are generated, which is required to substantiate the data-generation claims.

    Authors: We agree that implementation-level details and accuracy results for constraint checking, trajectory fidelity, and annotation generation are not provided. We will expand the relevant section with additional pseudocode and module descriptions for how physical constraints are enforced and how robot-centered outputs are produced. Accuracy metrics remain unavailable without new experiments and will be noted as future work.
    Revision: partial
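
To make "robot-centered outputs" concrete, one plausible ingredient is re-expressing points tracked in the ego-centric camera frame in the robot base frame with a rigid transform. The sketch below assumes a known 4x4 homogeneous matrix `T_base_cam` (hypothetical, e.g. from calibration) and is only an illustration; the manuscript does not say how the conversion is implemented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def apply_transform(T: Sequence[Sequence[float]], p: Point) -> Point:
    """Apply a 4x4 homogeneous transform (row-major nested lists) to a 3D point."""
    x, y, z = p
    # Rotate with the upper-left 3x3 block, then add the translation column.
    return tuple(
        T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3)
    )


def to_robot_frame(
    points_cam: Sequence[Point],
    T_base_cam: Sequence[Sequence[float]],
) -> List[Point]:
    """Map a camera-frame trajectory of 3D points into the robot base frame."""
    return [apply_transform(T_base_cam, p) for p in points_cam]
```

The resulting base-frame points would still need inverse kinematics and constraint checking before they count as an executable robot trajectory.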

Circularity Check

0 steps flagged

No circularity: purely architectural description of toolchain with no derivations, fits, or predictions

Full rationale

The paper presents JoyAI-Sim as a simulation toolchain with two pathways (Robot→Simulation→Human and Human→Simulation→Robot) that use JoySim for evaluation and data generation. No equations, parameters, predictions, or derivations appear in the abstract or described content. The description is infrastructural and does not reduce any claim to a self-citation, fit, or definitional loop. This matches the default expectation of no circularity for non-mathematical system papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review is based solely on the abstract; no explicit free parameters, invented entities, or detailed axioms are stated. The system implicitly relies on standard robotics simulation assumptions about model accuracy.

axioms (1)
  • Domain assumption: Real-robot tasks can be accurately reconstructed as calibrated digital twins, and human feedback can refine simulated motions for naturalness.
    This premise underpins the Robot → Simulation → Human pathway for aligned evaluation.

pith-pipeline@v0.9.1-grok · 5901 in / 1339 out tokens · 52050 ms · 2026-06-30T10:28:46.970023+00:00 · methodology

