pith. machine review for the scientific record.

arxiv: 2601.02078 · v2 · submitted 2026-01-05 · 💻 cs.RO

Recognition: no theorem link

Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 17:58 UTC · model grok-4.3

classification 💻 cs.RO
keywords simulation platform · sim-to-real transfer · humanoid robot · synthetic data · LLM scene generation · robotic manipulation · policy training · automated evaluation

The pith

Genie Sim 3.0 shows synthetic data from LLM-generated scenes can train humanoid robot policies that transfer zero-shot to the real world.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Genie Sim 3.0 as a unified simulation platform for humanoid robot manipulation. It provides an LLM-based generator that turns natural-language instructions into high-fidelity scenes, enabling rapid creation of varied training environments. The platform also introduces what the authors describe as the first benchmark to use an LLM to mass-produce evaluation scenarios and a VLM to score performance automatically. An open-source dataset of more than 10,000 hours across over 200 tasks is released, and experiments indicate that policies trained only on this data can be deployed on physical robots, under controlled conditions, without further real-world fine-tuning.

Core claim

The central claim is that the open-source dataset generated by Genie Sim 3.0 supports robust zero-shot sim-to-real transfer for humanoid robot policies, establishing that synthetic data can serve as an effective substitute for real-world data under controlled conditions for scalable policy training.

What carries the argument

Genie Sim Generator, an LLM-powered tool that constructs high-fidelity scenes from natural language instructions to enable rapid multi-dimensional generalization and large-scale data synthesis.
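The generator's contract, as described, is natural language in, simulator-ready scene out. A minimal sketch of that loop, assuming a stubbed `query_llm` in place of a real model call and an illustrative `robot`/`objects`/`layout` schema (these names are assumptions, not the paper's actual format):

```python
import json

# Hedged sketch of an LLM scene-generation loop: ask the model for a
# machine-readable scene spec, validate it, retry on malformed output.
# `query_llm` is a stub; a real system would call an LLM API here.

SCHEMA_KEYS = {"robot", "objects", "layout"}

def query_llm(instruction: str) -> str:
    # Stub returning a canned spec; real output would vary with the prompt.
    return json.dumps({
        "robot": "humanoid",
        "objects": [{"name": "yellow_cube", "pose": [0.4, 0.0, 0.8]}],
        "layout": "kitchen_counter",
        "instruction": instruction,
    })

def generate_scene(instruction: str, max_retries: int = 3) -> dict:
    """Query the LLM and retry until the spec parses and validates."""
    for _ in range(max_retries):
        raw = query_llm(instruction)
        try:
            spec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: re-query
        if SCHEMA_KEYS <= spec.keys():
            return spec  # valid spec, ready to compile for the simulator
    raise RuntimeError("LLM failed to produce a valid scene spec")

scene = generate_scene("place the yellow cube on the counter")
print(scene["layout"])  # kitchen_counter
```

The retry-until-valid structure mirrors the platform's emphasis on automated, unattended scene synthesis: malformed LLM output is discarded rather than hand-corrected.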

Load-bearing premise

The generated scenes must achieve sufficient physical and visual fidelity that policies trained on them perform comparably in the real world, and the automated VLM evaluation must accurately predict that real-world performance.

What would settle it

Train a policy exclusively on the released synthetic dataset for one of the 200 tasks and measure whether its success rate on the matching real-world task falls substantially below the reported sim performance.
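That decisive experiment reduces to comparing per-task success rates across the two environments. A minimal sketch, with placeholder trial outcomes rather than the paper's data:

```python
# Hedged sketch of the sim-to-real comparison: run N trials of the same
# task in sim and on hardware, then report the transfer gap. The outcome
# lists below are illustrative placeholders, not reported results.

def success_rate(outcomes):
    """Fraction of successful trials (1 = success, 0 = failure)."""
    return sum(outcomes) / len(outcomes)

sim_trials  = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
real_trials = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]

sim_sr, real_sr = success_rate(sim_trials), success_rate(real_trials)
gap = sim_sr - real_sr  # positive gap = policy degrades on hardware
print(f"sim {sim_sr:.0%}  real {real_sr:.0%}  gap {gap:+.0%}")
```

A small gap across many tasks would support the substitution claim; a large gap on even a few contact-rich tasks would undercut it.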

Figures

Figures reproduced from arXiv: 2601.02078 by Chenghao Yin, Chen Xu, Da Huang, Di Yang, Jiayu Li, Jichao Wang, Junhui Wu, Lei Bao, Linjie Hou, Maoqing Yao, Nanshu Zhao, Qian Wang, Rui Feng, Sheng Zhang, Wenjun Sun, Zhaobo Liu, Zhenquan Pang, Zhen Xiao, Zhijun Li.

Figure 1
Figure 1: Overview of Genie Sim 3.0. A full-cycle robotic simulation platform integrating environment reconstruction, scene generalization, data collection, and automated evaluation. The authors plan to open-source 5,140 simulation object assets, more than 10,000 hours of synthetic data, and 100,000 evaluation scenarios.
Figure 2
Figure 2: The Automated Workflow of Genie Sim Generator. This module captures user intent via multi-round conversation, translates it into executable Python code, and compiles the final scene graph with assets for Isaac Sim.
Figure 3
Figure 3: VLM-Driven Evaluation.
Figure 4
Figure 4: Automated Data Collection. A complete task parsing and execution pipeline that improves task success rate through waypoint filtering and a robust retry mechanism.
Figure 5
Figure 5: Task Distribution Matrix. The dataset is constructed along three dimensions: manipulation skill, cognitive comprehension, and task complexity.
Figure 6
Figure 6: Sim-to-Real Experiments. Comparison of initial task setups between the real and simulated testing environments.
Figure 8
Figure 8: Performance Comparison between Real and Sim Environments. Performance in simulation shares the same trend as performance in the real-world environment.
Figure 9
Figure 9: Correlation Analysis of Model Performance in Sim and Real Environments. All 16 models are evaluated in both sim and real environments, tagged on the right side of the figure, each with a distinct color and marker.
Figure 10
Figure 10: Success Rates in Sim and Real Environments as Heatmaps. The comparison of the left and right charts intuitively reflects the consistency between the real and simulated environments.
read the original abstract

The development of robust and generalizable robot learning models is critically contingent upon the availability of large-scale, diverse training data and reliable evaluation benchmarks. Collecting data in the physical world poses prohibitive costs and scalability challenges, and prevailing simulation benchmarks frequently suffer from fragmentation, narrow scope, or insufficient fidelity to enable effective sim-to-real transfer. To address these challenges, we introduce Genie Sim 3.0, a unified simulation platform for robotic manipulation. We present Genie Sim Generator, a large language model (LLM)-powered tool that constructs high-fidelity scenes from natural language instructions. Its principal strength resides in rapid and multi-dimensional generalization, facilitating the synthesis of diverse environments to support scalable data collection and robust policy evaluation. We introduce the first benchmark that pioneers the application of LLM for automated evaluation. It leverages LLM to mass-generate evaluation scenarios and employs Vision-Language Model (VLM) to establish an automated assessment pipeline. We also release an open-source dataset comprising more than 10,000 hours of synthetic data across over 200 tasks. Through systematic experimentation, we validate the robust zero-shot sim-to-real transfer capability of our open-source dataset, demonstrating that synthetic data can server as an effective substitute for real-world data under controlled conditions for scalable policy training. For code and dataset details, please refer to: https://github.com/AgibotTech/genie_sim.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Genie Sim 3.0, a unified high-fidelity simulation platform for humanoid robot manipulation. It describes an LLM-powered Genie Sim Generator for rapid synthesis of diverse scenes from natural language instructions, the first benchmark that uses LLMs to mass-generate evaluation scenarios and VLMs for an automated assessment pipeline, and the release of an open-source dataset exceeding 10,000 hours of synthetic data across more than 200 tasks. The central claim is that systematic experimentation validates robust zero-shot sim-to-real transfer, demonstrating that the synthetic data can serve as an effective substitute for real-world data under controlled conditions for scalable policy training.

Significance. If the fidelity of generated scenes and the accuracy of the VLM pipeline are rigorously demonstrated with quantitative evidence, the platform and dataset could substantially advance scalable humanoid robot learning by lowering barriers to large-scale data collection and automated benchmarking, enabling broader experimentation in sim-to-real transfer.

major comments (2)
  1. [Abstract] Abstract: the claim that 'systematic experimentation' validates 'robust zero-shot sim-to-real transfer' and that 'synthetic data can serve as an effective substitute for real-world data' is unsupported because no quantitative metrics (success rates, transfer gaps, baselines), experimental protocols, or error analysis are supplied, rendering the central result unevaluable.
  2. [Benchmark section] Automated evaluation pipeline (described in the benchmark section): the VLM-based assessment is presented as establishing reliable policy scoring, yet no correlation analysis (e.g., Pearson r between VLM scores and real-robot success rates) or failure-mode study is reported; this is load-bearing for the transfer claim because VLMs are known to misjudge contact dynamics and grasp stability in manipulation tasks.
minor comments (1)
  1. [Abstract] Abstract: typographical error 'server' should read 'serve' in the phrase 'synthetic data can server as an effective substitute'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We agree that the central claims require stronger quantitative support and will revise the manuscript to address both major comments. Our responses are provided point by point below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that 'systematic experimentation' validates 'robust zero-shot sim-to-real transfer' and that 'synthetic data can serve as an effective substitute for real-world data' is unsupported because no quantitative metrics (success rates, transfer gaps, baselines), experimental protocols, or error analysis are supplied, rendering the central result unevaluable.

    Authors: We acknowledge that the abstract states the claim without embedding specific numbers, protocols, or analysis, which makes the result difficult to evaluate from the abstract alone. The full manuscript contains experimental results in the evaluation section, but these details are not summarized quantitatively in the abstract. We will revise the abstract to include key metrics (e.g., real-robot success rates, sim-to-real transfer gaps relative to real-data baselines) and will add a concise description of the experimental protocol and error analysis. Corresponding expansions will appear in the experiments section. revision: yes

  2. Referee: [Benchmark section] Automated evaluation pipeline (described in the benchmark section): the VLM-based assessment is presented as establishing reliable policy scoring, yet no correlation analysis (e.g., Pearson r between VLM scores and real-robot success rates) or failure-mode study is reported; this is load-bearing for the transfer claim because VLMs are known to misjudge contact dynamics and grasp stability in manipulation tasks.

    Authors: We agree that the reliability of the VLM-based scoring pipeline must be demonstrated quantitatively, especially given known limitations of VLMs on contact-rich tasks. The current manuscript describes the pipeline but does not report correlation coefficients or a dedicated failure-mode study. We will add a new subsection in the benchmark section that presents Pearson correlation (and other agreement metrics) between VLM scores and both human annotations and real-robot success rates, along with a failure-mode analysis that explicitly examines misjudgments on grasp stability and contact dynamics. This revision will directly address the load-bearing concern for the transfer claims. revision: yes
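The agreement metric promised here is straightforward to compute once paired scores exist. A hedged sketch with placeholder values (`vlm_scores` and `real_success` are illustrative, not reported numbers):

```python
import math

# Hedged sketch of the agreement check the rebuttal promises: Pearson r
# between per-task VLM scores and real-robot success rates. High r would
# support using the automated pipeline as a proxy for hardware evaluation.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

vlm_scores   = [0.9, 0.7, 0.4, 0.8, 0.2]  # automated VLM judgments per task
real_success = [0.8, 0.6, 0.5, 0.9, 0.1]  # measured real-robot success rates

r = pearson_r(vlm_scores, real_success)
print(f"Pearson r = {r:.2f}")
```

Correlation alone would not close the referee's concern, though: a failure-mode breakdown on grasp stability and contact dynamics is still needed, since a VLM can correlate well on average while misjudging exactly the contact-rich cases.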

Circularity Check

0 steps flagged

No circularity in derivation chain; claims rest on empirical validation and released dataset

full rationale

The paper presents a simulation platform, an LLM-powered scene generator, a VLM-based evaluation benchmark, and an open-source dataset of over 10,000 hours across more than 200 tasks. No equations, fitted parameters, or first-principles derivations appear in the provided text. The central claim of zero-shot sim-to-real transfer rests on systematic experimentation and the released dataset rather than on any self-referential construction, self-citation chain, or renaming of inputs as outputs. The work is therefore grounded in evidence external to its own constructions, with no load-bearing steps that reduce to the paper's own inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software platform and dataset release paper with no mathematical derivations, fitted parameters, or new physical axioms.

pith-pipeline@v0.9.0 · 5606 in / 1085 out tokens · 25784 ms · 2026-05-16T17:58:50.965132+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

    cs.RO 2026-04 unverdicted novelty 4.0

    JoyAI-RA is a multi-source pretrained VLA model that claims to bridge human-to-robot embodiment gaps via data unification and outperforms prior methods on generalization-heavy robotic tasks.

  2. Genie Sim PanoRecon: Fast Immersive Scene Generation from Single-View Panorama

    cs.RO 2026-04 unverdicted novelty 4.0

    A feed-forward Gaussian-splatting system reconstructs photo-realistic 3D scenes from single-view panoramas in seconds via cube-map decomposition and depth-aware fusion for robotic simulation use.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · cited by 2 Pith papers · 11 internal anchors

  1. [1]

    Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

    Cheng Chi et al. “Diffusion Policy: Visuomotor Policy Learning via Action Diffusion”. In: Proceedings of Robotics: Science and Systems. Daegu, Republic of Korea, July 2023. DOI: 10.15607/RSS.2023.XIX.026

  2. [2]

    OpenVLA: An Open-Source Vision-Language-Action Model

    Moo Jin Kim et al. “OpenVLA: An Open-Source Vision-Language-Action Model”. In: arXiv preprint arXiv:2406.09246 (2024)

  3. [3]

    Is Diversity All You Need for Scalable Robotic Manipulation?

    Modi Shi et al. “Is Diversity All You Need for Scalable Robotic Manipulation?” In: arXiv preprint arXiv:2507.06219 (2025)

  4. [4]

    DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning

    Zhenyu Jiang et al. “DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning”. In: 2025 IEEE International Conference on Robotics and Automation (ICRA). 2025

  5. [5]

    Object-Centric Dexterous Manipulation from Human Motion Data

    Yuanpei Chen et al. “Object-Centric Dexterous Manipulation from Human Motion Data”. In: 8th Annual Conference on Robot Learning. 2024

  6. [6]

    ScissorBot: Learning Generalizable Scissor Skill for Paper Cutting via Simulation, Imitation, and Sim2Real

    Jiangran Lyu et al. ScissorBot: Learning Generalizable Scissor Skill for Paper Cutting via Simulation, Imitation, and Sim2Real. 2024. arXiv: 2409.13966 [cs.RO]. URL: https://arxiv.org/abs/2409.13966

  7. [7]

    Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?

    Abhishek Kadian et al. “Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?” In: IEEE Robotics and Automation Letters 5.4 (2020), pp. 6670–6677. DOI: 10.1109/LRA.2020.3013848

  8. [8]

    Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models

    Pushkal Katara, Zhou Xian, and Katerina Fragkiadaki. “Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models”. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). 2024, pp. 6672–6679. DOI: 10.1109/ICRA57147.2024.10610566

  9. [9]

    DISCOVERSE: Efficient Robot Simulation in Complex High-Fidelity Environments

    Yufei Jia et al. DISCOVERSE: Efficient Robot Simulation in Complex High-Fidelity Environments. 2025. arXiv: 2507.21981 [cs.RO]. URL: https://arxiv.org/abs/2507.21981

  10. [10]

    GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data

    Shengliang Deng et al. “GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data”. In: (2025). arXiv: 2505.03233 [cs.RO]. URL: https://arxiv.org/abs/2505.03233

  11. [11]

    PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies

    Arhan Jain et al. PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies. 2025. arXiv: 2512.16881 [cs.RO]. URL: https://arxiv.org/abs/2512.16881

  12. [12]

    Learning high-fidelity robot self-model with articulated 3D Gaussian splatting

    Kejun Hu, Peng Yu, and Ning Tan. “Learning high-fidelity robot self-model with articulated 3D Gaussian splatting”. In: The International Journal of Robotics Research 0.0 (2025). DOI: 10.1177/02783649251396980

  13. [13]

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Xuanlin Li et al. Evaluating Real-World Robot Manipulation Policies in Simulation. 2024. arXiv: 2405.05941 [cs.RO]. URL: https://arxiv.org/abs/2405.05941

  14. [14]

    Ctrl-World: A Controllable Generative World Model for Robot Manipulation

    Yanjiang Guo et al. “Ctrl-World: A Controllable Generative World Model for Robot Manipulation”. In: arXiv preprint arXiv:2510.10125 (2025)

  15. [15]

    RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning

    Haoran Geng et al. RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning. 2025. arXiv: 2504.18904 [cs.RO]. URL: https://arxiv.org/abs/2504.18904

  16. [16]

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Alexander Khazatsky et al. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset. 2025. arXiv: 2403.12945 [cs.RO]. URL: https://arxiv.org/abs/2403.12945

  17. [17]

    RoboNet: Large-Scale Multi-Robot Learning

    Sudeep Dasari et al. RoboNet: Large-Scale Multi-Robot Learning. 2020. arXiv: 1910.11215 [cs.RO]. URL: https://arxiv.org/abs/1910.11215

  18. [18]

    RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot

    Hao-Shu Fang et al. “RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot”. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2024, pp. 653–660

  19. [19]

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Embodiment Collaboration et al. Open X-Embodiment: Robotic Learning Datasets and RT-X Models. 2025. arXiv: 2310.08864 [cs.RO]. URL: https://arxiv.org/abs/2310.08864

  20. [20]

    AgiBot World Colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems

    Qingwen Bu et al. “AgiBot World Colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems”. In: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2025

  21. [21]

    RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

    Soroush Nasiriany et al. “RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots”. In: Robotics: Science and Systems. 2024

  22. [22]

    DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes

    Jialiang Zhang et al. “DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes”. In: 8th Annual Conference on Robot Learning. 2024

  23. [23]

    RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

    Tianxing Chen et al. “RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation”. In: arXiv preprint arXiv:2506.18088 (2025)

  24. [24]

    Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

    Tianhe Yu et al. “Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning”. In: Conference on Robot Learning (CoRL). 2019. arXiv: 1910.10897 [cs.LG]. URL: https://arxiv.org/abs/1910.10897

  25. [25]

    HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning

    Zhi Jing et al. “HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning”. In: arXiv preprint arXiv:2507.00833 (2025)

  26. [26]

    HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation

    Carmelo Sferrazza et al. HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation. 2024

  27. [27]

    BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark

    Nikita Chernyadev et al. “BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark”. In: arXiv preprint arXiv:2407.07788 (2024)

  28. [28]

    BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

    Chengshu Li et al. “BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation”. In: arXiv preprint arXiv:2403.09227 (2024)

  29. [29]

    ManipulaTHOR: A Framework for Visual Object Manipulation

    Kiana Ehsani et al. “ManipulaTHOR: A Framework for Visual Object Manipulation”. In: CVPR. 2021

  30. [30]

    HomeRobot: Open-Vocabulary Mobile Manipulation

    Sriram Yenamandra et al. HomeRobot: Open-Vocabulary Mobile Manipulation. 2024. arXiv: 2306.11565 [cs.RO]. URL: https://arxiv.org/abs/2306.11565

  31. [31]

    DaXBench: Benchmarking Deformable Object Manipulation with Differentiable Physics

    Siwei Chen et al. “DaXBench: Benchmarking Deformable Object Manipulation with Differentiable Physics”. In: ICLR. 2023

  32. [32]

    SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation

    Xingyu Lin et al. “SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation”. In: Conference on Robot Learning. 2020

  33. [33]

    PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics

    Zhiao Huang et al. PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics. 2021. arXiv: 2104.03311 [cs.LG]. URL: https://arxiv.org/abs/2104.03311

  34. [34]

    From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models

    Irving Fang et al. From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models. 2025. arXiv: 2506.09930 [cs.RO]. URL: https://arxiv.org/abs/2506.09930

  35. [35]

    ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning

    Quan Khanh Luu et al. ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning. 2025. arXiv: 2505.18472 [cs.RO]. URL: https://arxiv.org/abs/2505.18472

  36. [36]

    Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization

    Jiaming Zhou et al. “Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization”. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems. 2025. URL: https://openreview.net/forum?id=h6xQClTm4W

  37. [37]

    The Scene Language: Representing Scenes with Programs, Words, and Embeddings

    Yunzhi Zhang et al. The Scene Language: Representing Scenes with Programs, Words, and Embeddings. 2025. arXiv: 2410.16770 [cs.CV]. URL: https://arxiv.org/abs/2410.16770

  38. [38]

    3D Gaussian splatting for real-time radiance field rendering

    Bernhard Kerbl et al. “3D Gaussian splatting for real-time radiance field rendering”. In: ACM Trans. Graph. 42.4 (2023), Art. 139

  39. [39]

    SuperPoint: Self-supervised interest point detection and description

    Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. “SuperPoint: Self-supervised interest point detection and description”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018, pp. 224–236

  40. [40]

    LightGlue: Local feature matching at light speed

    Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. “LightGlue: Local feature matching at light speed”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, pp. 17627–17638

  41. [41]

    Domain-size pooling in local descriptors: DSP-SIFT

    Jingming Dong and Stefano Soatto. “Domain-size pooling in local descriptors: DSP-SIFT”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 5097–5106

  42. [42]

    Colmap-PCD: An open-source tool for fine image-to-point cloud registration

    Chunge Bai, Ruijie Fu, and Xiang Gao. “Colmap-PCD: An open-source tool for fine image-to-point cloud registration”. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2024, pp. 1723–1729

  43. [43]

    gsplat: An open-source library for Gaussian splatting

    Vickie Ye et al. “gsplat: An open-source library for Gaussian splatting”. In: Journal of Machine Learning Research 26.34 (2025), pp. 1–17

  44. [44]

    DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

    Jay Zhangjie Wu et al. “DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models”. In: Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 26024–26035

  45. [45]

    PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction

    Danpeng Chen et al. “PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction”. In: arXiv preprint arXiv:2406.06521 (2024)

  46. [46]

    cuRobo: Parallelized collision-free minimum-jerk robot motion generation

    Balakumar Sundaralingam et al. “cuRobo: Parallelized collision-free minimum-jerk robot motion generation”. In: arXiv preprint arXiv:2310.17274 (2023)

  47. [47]

    GraspNet: A Large-Scale Clustered and Densely Annotated Dataset for Object Grasping

    Hao-Shu Fang et al. “GraspNet: A Large-Scale Clustered and Densely Annotated Dataset for Object Grasping”. In: CoRR abs/1912.13470 (2019). arXiv: 1912.13470. URL: http://arxiv.org/abs/1912.13470

  48. [48]

    π0.5: a Vision-Language-Action Model with Open-World Generalization

    Physical Intelligence et al. π0.5: a Vision-Language-Action Model with Open-World Generalization. 2025. arXiv: 2504.16054 [cs.LG]. URL: https://arxiv.org/abs/2504.16054

  49. [49]

    UniVLA: Learning to Act Anywhere with Task-centric Latent Actions

    Qingwen Bu et al. “UniVLA: Learning to Act Anywhere with Task-centric Latent Actions”. In: arXiv preprint arXiv:2505.06111 (2025)

  50. [50]

    RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

    Songming Liu et al. “RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation”. In: arXiv preprint arXiv:2410.07864 (2024)

  51. [51]

    X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

    Jinliang Zheng et al. “X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model”. In: arXiv preprint arXiv:2510.10274 (2025)