pith. machine review for the scientific record.

arxiv: 2604.05831 · v1 · submitted 2026-04-07 · 💻 cs.RO

Recognition: no theorem link

BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3

classification 💻 cs.RO
keywords: bimanual manipulation · robotics benchmark · long-horizon tasks · spatial-temporal coordination · robotic learning · arm coordination

The pith

Bimanual robot policies fail on tasks requiring sustained tight coordination between two arms over long sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing bimanual benchmarks rely on short tasks with only loose arm cooperation, which does not match the continuous dependency and role switching seen in real two-handed work. BiCoord supplies a set of longer tasks built around ongoing inter-arm coupling and repeated sub-goal exchanges. The benchmark adds metrics that separately score timing, positioning, and their joint behavior. When standard policies are tested on these tasks, performance drops sharply, exposing limits in current learning approaches for coordinated manipulation.
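
The page does not reproduce the paper's metric definitions, so the following is only a minimal sketch of what separately scoring timing, positioning, and their joint behavior over a two-arm trajectory could look like; the function, its inputs, and all three formulas are hypothetical stand-ins, not BiCoord's.

```python
import numpy as np

def coordination_scores(left_xyz, right_xyz, stage_ends_l, stage_ends_r, target_gap=0.1):
    """Toy per-trajectory coordination scores (illustrative only; not the
    paper's metrics). left_xyz/right_xyz: (T, 3) end-effector positions
    sampled at the same rate; stage_ends_*: per-arm completion step of each
    sub-goal; target_gap: assumed inter-gripper distance during coupling (m)."""
    left_xyz = np.asarray(left_xyz, float)
    right_xyz = np.asarray(right_xyz, float)

    # Temporal: how far apart the two arms finish matching sub-goals.
    lag = np.abs(np.asarray(stage_ends_l) - np.asarray(stage_ends_r))
    temporal = float(np.exp(-lag.mean() / len(left_xyz)))

    # Spatial: deviation of the inter-gripper distance from the target gap.
    gap = np.linalg.norm(left_xyz - right_xyz, axis=1)
    spatial = float(np.exp(-np.abs(gap - target_gap).mean()))

    # Spatial-temporal: correlation of the two arms' speed profiles,
    # a crude proxy for synchronized motion.
    speed_l = np.linalg.norm(np.diff(left_xyz, axis=0), axis=1)
    speed_r = np.linalg.norm(np.diff(right_xyz, axis=0), axis=1)
    spatial_temporal = float(np.corrcoef(speed_l, speed_r)[0, 1])

    return {"temporal": temporal, "spatial": spatial,
            "spatial_temporal": spatial_temporal}
```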

Core claim

BiCoord is a benchmark for long-horizon tightly coordinated bimanual manipulation that includes diverse tasks requiring continuous inter-arm dependency and dynamic role exchange across multiple sub-goals, together with quantitative metrics that evaluate coordination from temporal, spatial, and spatial-temporal perspectives. Experiments show that representative policies such as DP, RDT, Pi0, and OpenVLA-OFT struggle with the long-duration and highly coupled tasks.

What carries the argument

The BiCoord benchmark, built from tasks that enforce continuous arm-to-arm dependency and role exchange, paired with a metric suite that separately quantifies timing, spatial alignment, and their interaction.
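
The figure captions describe stage-wise annotation, where each trajectory is divided into stages coupled with a sub-goal and per-arm behaviours. A minimal data-structure sketch of such an annotation follows; all field names are assumed for illustration, not taken from the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Stage:
    """One annotated stage of a trajectory (hypothetical field names)."""
    sub_goal: str      # e.g. "hold pot steady while pouring"
    left_role: str     # e.g. "stabilize", "transport", "pour"
    right_role: str
    start_step: int
    end_step: int

@dataclass
class Trajectory:
    task: str          # e.g. "Cook"
    stages: List[Stage] = field(default_factory=list)

    def role_exchanges(self) -> int:
        """Count stage boundaries where the arms swap roles, one crude
        marker of the 'dynamic role exchange' the benchmark is built around."""
        return sum(
            1 for a, b in zip(self.stages, self.stages[1:])
            if (a.left_role, a.right_role) == (b.right_role, b.left_role)
            and a.left_role != a.right_role
        )
```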

If this is right

  • Methods for bimanual control must add explicit handling of long-term arm interdependencies rather than treating arms independently.
  • The new metrics provide a concrete way to measure and compare progress toward better coordination.
  • Long-horizon tasks with role exchange become a standard test for whether learned policies can sustain cooperative behavior across changing goals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Improved performance on BiCoord tasks could support more reliable two-arm systems for assembly or household tasks that currently require human-level timing.
  • The benchmark format could be extended to include contact-rich actions or sensor noise to test whether coordination gains survive real-world conditions.

Load-bearing premise

The chosen tasks and metrics capture the essential spatial-temporal coupling present in real-world bimanual actions.

What would settle it

A standard policy trained without special coordination modules that nevertheless scores well on all BiCoord tasks and metrics would show that the claimed fundamental challenges do not hold.
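
As a toy illustration of that falsification condition, assuming per-task success rates and metric scores with arbitrary placeholder thresholds (none of these values come from the paper):

```python
def settles_the_claim(results, success_floor=0.8, metric_floor=0.7):
    """True if a plain policy (no coordination modules) clears every task
    and every coordination metric, which would undercut the claimed
    fundamental challenge. `results` maps task name to
    {"success": float, "metrics": {metric name: float}}."""
    return all(
        r["success"] >= success_floor
        and all(v >= metric_floor for v in r["metrics"].values())
        for r in results.values()
    )
```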

Figures

Figures reproduced from arXiv: 2604.05831 by Annan Li, Chen Gao, Liankai Jin, Si Liu, Xingyu Peng.

Figure 1. Overview of BiCoord. (a) The data generation pipeline. (b) An example trajectory of the Cook task; each trajectory is divided into several stages with sub-goals and arm behaviours, and key features of bimanual coordination are embodied in BiCoord, like phased coupling, spatial-temporal constraint and predictive coordination. (c) We design metrics to evaluate the bimanual manipulation benchmark…
Figure 4. Pipeline for building BiCoord.
Figure 3. Tasks in BiCoord. Each task requires high-level …
Figure 5. Statistics of BiCoord.
Figure 6. Visualizations of Pi0 on the Divide Block Tower task. Grasping errors occur when the color and order of the blocks change, demonstrating limited reasoning ability.
Figure 7. Visualizations on the Handover Block With Bowls task. The block is poured out before the two bowls are completely aligned, showing weak ability in precise alignment.
Figure 8. Stage-wise analysis. We present two examples here. [Panels: (a) Collect Pens, (b) Jigsaw; axes: SR (%) vs. stage; policies: DP, RDT, OpenVLA-OFT, Pi0.]
Figure 9. Visualizations on the Cook task.
read the original abstract

Bimanual manipulation, i.e., the coordinated use of two robotic arms to complete tasks, is essential for achieving human-level dexterity in robotics. Recent simulation benchmarks, e.g., RoboTwin and RLBench2, have advanced data-driven learning for bimanual manipulation. However, existing tasks are short-horizon and only loosely coordinated, failing to capture the spatial-temporal coupling inherent in real-world bimanual behaviors. To address this gap, we introduce BiCoord, a benchmark for long-horizon and tightly coordinated bimanual manipulation. Specifically, BiCoord comprises diverse tasks that require continuous inter-arm dependency and dynamic role exchange across multiple sub-goals. Also, we propose a suite of quantitative metrics that evaluate coordination from temporal, spatial, and spatial-temporal perspectives, enabling systematic measurement of bimanual cooperation. Experimental results show that representative manipulation policies, e.g., DP, RDT, Pi0, and OpenVLA-OFT, struggle with long-duration and highly coupled tasks, revealing fundamental challenges in achieving long-horizon and tight coordination tasks. We hope BiCoord can serve as a foundation for studying long-horizon cooperative manipulation and inspire future research on coordination-aware robotic learning. All datasets, codes and supplements could be found at https://buaa-colalab.github.io/BiCoord/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces BiCoord, a new benchmark for long-horizon bimanual manipulation consisting of tasks that require continuous inter-arm dependency and dynamic role exchange across multiple sub-goals. It defines a suite of metrics to quantify coordination along temporal, spatial, and spatial-temporal axes. Experiments on representative policies (DP, RDT, Pi0, OpenVLA-OFT) show poor performance on these tasks, which the authors interpret as evidence of fundamental challenges in long-horizon tight coordination.

Significance. If the BiCoord tasks and metrics demonstrably isolate spatial-temporal coupling beyond generic long-horizon difficulty, and if the reported performance gaps are robust, the benchmark would fill a genuine gap left by short-horizon suites such as RoboTwin and RLBench2. The public release of datasets, code, and supplements is a clear strength that supports reproducibility and future coordination-aware learning research.

major comments (2)
  1. [Experimental results and task design] The central claim that policy failures reveal specific challenges in 'long-horizon and tight coordination' (abstract and conclusion) rests on the assumption that BiCoord tasks impose continuous inter-arm dependency that cannot be reduced to extended horizon or increased subgoal count. No control tasks or ablations are described that preserve duration and subgoal structure while relaxing simultaneous dependency (e.g., sequential independent-arm execution). Without such isolation, the attribution of struggles to coordination demands rather than known long-horizon planning limitations remains unverified.
  2. [Metrics section] The proposed temporal/spatial/spatial-temporal metrics are introduced to measure bimanual cooperation, yet the manuscript provides no quantitative validation or baseline comparisons showing that these metrics distinguish tight coupling from loose coordination on the BiCoord tasks themselves.
minor comments (1)
  1. [Abstract and introduction] The abstract states that 'all datasets, codes and supplements could be found at https://buaa-colalab.github.io/BiCoord/'; the main text should include a concise description of benchmark usage, task parameterization, and metric computation formulas to reduce reliance on the external site.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We agree that the manuscript would be strengthened by additional experiments isolating coordination demands from long-horizon effects and by explicit validation of the proposed metrics. We address each major comment below and indicate the corresponding revisions.

read point-by-point responses
  1. Referee: [Experimental results and task design] The central claim that policy failures reveal specific challenges in 'long-horizon and tight coordination' (abstract and conclusion) rests on the assumption that BiCoord tasks impose continuous inter-arm dependency that cannot be reduced to extended horizon or increased subgoal count. No control tasks or ablations are described that preserve duration and subgoal structure while relaxing simultaneous dependency (e.g., sequential independent-arm execution). Without such isolation, the attribution of struggles to coordination demands rather than known long-horizon planning limitations remains unverified.

    Authors: We acknowledge that the current submission lacks explicit control experiments to separate the effects of tight inter-arm coupling from general long-horizon planning difficulties. Although the BiCoord tasks are explicitly designed around continuous dependency and dynamic role exchange (as described in the task definitions), we did not report sequential variants. In the revised manuscript we will add such ablations: for each task we will include a matched sequential version in which the arms execute sub-goals independently while preserving total duration, number of sub-goals, and overall workspace constraints. Direct performance comparisons between the original tightly coupled tasks and these sequential controls will be reported to better attribute the observed policy failures (a schematic of such a control is sketched after these responses). revision: yes

  2. Referee: [Metrics section] The proposed temporal/spatial/spatial-temporal metrics are introduced to measure bimanual cooperation, yet the manuscript provides no quantitative validation or baseline comparisons showing that these metrics distinguish tight coupling from loose coordination on the BiCoord tasks themselves.

    Authors: We agree that quantitative validation is necessary to confirm the metrics capture tight versus loose coordination. In the revision we will compute the temporal, spatial, and spatial-temporal metrics on both the original BiCoord tasks and on relaxed variants that reduce coupling (e.g., by relaxing synchronization constraints while keeping the same sub-goal sequence). We will additionally report metric values obtained from human teleoperation demonstrations (high coordination) and from random or single-arm policies (low coordination) to demonstrate differentiation. These results will be included in an expanded metrics section (a toy separation statistic is also sketched below). revision: yes
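
To make response 1's proposed ablation concrete, here is a schematic of a matched sequential control, reusing the hypothetical Stage/Trajectory sketch from earlier; the benchmark's real task specification will differ.

```python
import copy

def make_sequential_control(traj):
    """Split each coupled stage so the arms act one at a time while the
    sub-goal sequence and total step budget are preserved. A sketch of the
    rebuttal's control condition, not BiCoord code."""
    control = copy.deepcopy(traj)
    sequential = []
    for s in control.stages:
        mid = (s.start_step + s.end_step) // 2
        left_half = copy.deepcopy(s)
        left_half.right_role = "idle"    # right arm waits
        left_half.end_step = mid
        right_half = copy.deepcopy(s)
        right_half.left_role = "idle"    # left arm waits
        right_half.start_step = mid
        sequential.extend([left_half, right_half])
    control.stages = sequential
    return control
```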
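And for response 2, one simple way to test whether a metric separates high-coordination runs (e.g., human teleoperation) from low-coordination ones (e.g., random or single-arm policies) is an effect size over the two score samples; the choice of statistic here is ours, not the authors'.

```python
import numpy as np

def metric_separation(high_coord_scores, low_coord_scores):
    """Cohen's d between metric scores of tightly coordinated and loosely
    coordinated runs; large values suggest the metric discriminates."""
    hi = np.asarray(high_coord_scores, float)
    lo = np.asarray(low_coord_scores, float)
    pooled = np.sqrt((hi.var(ddof=1) + lo.var(ddof=1)) / 2.0)
    return float((hi.mean() - lo.mean()) / pooled)
```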

Circularity Check

0 steps flagged

Independent benchmark with empirical evaluation; no derivation chain present

full rationale

The paper introduces a new benchmark (BiCoord) consisting of tasks and metrics for bimanual coordination, then reports empirical performance of existing policies on those tasks. No equations, fitted parameters, predictions, or first-principles derivations are claimed. The central statements (existing benchmarks are short-horizon/loose; new tasks require continuous inter-arm dependency; policies struggle) are definitional descriptions of the benchmark plus experimental observations, not reductions of outputs to inputs by construction. Self-citations, if any, are not load-bearing for any result. This is a standard benchmark paper whose content is self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central contribution is a new benchmark rather than a derivation; it rests on the domain assumption that simulation can proxy real bimanual coordination and on the new tasks themselves.

axioms (1)
  • domain assumption: Simulation environments can adequately model the spatial-temporal coupling of real-world bimanual behaviors.
    The entire benchmark is built in simulation; this assumption underpins claims about real-world relevance.
invented entities (1)
  • BiCoord benchmark tasks and coordination metrics (no independent evidence)
    purpose: To evaluate long-horizon spatial-temporal coordination in bimanual manipulation.
    Newly defined tasks and quantitative metrics introduced in the paper.

pith-pipeline@v0.9.0 · 5541 in / 1190 out tokens · 27546 ms · 2026-05-10T18:52:40.925383+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

71 extracted references · 26 canonical work pages · 13 internal anchors

  1. [1] Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, et al. 2024. ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation. arXiv preprint arXiv:2405.02292 (2024).
  2. [2] Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. 2024. π0: A Vision-Language-Action Flow Model for General Robot Control. arXiv preprint arXiv:2410.24164 (2024).
  3. [3] Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, et al. 2023. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818 (2023).
  4. [4] Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al. 2022. RT-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:2212.06817 (2022).
  5. [5] Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xu Huang, Shu Jiang, et al. 2025. AgiBot World Colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems. arXiv preprint arXiv:2503.06669 (2025).
  6. [6] Konstantinos Chatzilygeroudis, Bernardo Fichera, Ilaria Lauzana, Fanjun Bu, Kunpeng Yao, Farshad Khadivar, and Aude Billard. 2020. Benchmark for bimanual robotic manipulation of semi-deformable objects. IEEE Robotics and Automation Letters 5, 2 (2020), 2443–2450.
  7. [7] Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Zixuan Li, Qiwei Liang, Xianliang Lin, Yiheng Ge, Zhenyu Gu, et al. 2025. RoboTwin 2.0: A scalable data generator and benchmark with strong domain randomization for robust bimanual robotic manipulation. arXiv preprint arXiv:2506.18088 (2025).
  8. [8] Tianxing Chen, Yao Mu, Zhixuan Liang, Zanxin Chen, Shijia Peng, Qiangyu Chen, Mingkun Xu, Ruizhen Hu, Hongyuan Zhang, Xuelong Li, et al. 2025. G3Flow: Generative 3D semantic flow for pose-aware and generalizable object manipulation. In Proceedings of the Computer Vision and Pattern Recognition Conference. 1735–1744.
  9. [9] Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. 2023. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research (2023), 02783649241273668.
  10. [10–11] Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Yixin Lin, Austin S Wang, Abitha Thankaraj, Karanbir Singh Chahal, Berk Calli, Saurabh Gupta, et al. 2021. RB2: Robotic Manipulation Benchmarking with a Twist. NeurIPS 2021 Datasets and Benchmarks Track (2021).
  12. [12] Frederik Ebert, Yanlai Yang, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. 2021. Bridge Data: Boosting generalization of robotic skills with cross-domain datasets. arXiv preprint arXiv:2109.13396 (2021).
  13. [13] Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J Black, and Otmar Hilliges. 2023. ARCTIC: A dataset for dexterous bimanual hand-object manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12943–12954.
  14. [14] Zipeng Fu, Tony Z Zhao, and Chelsea Finn. 2024. Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation. arXiv preprint arXiv:2401.02117 (2024).
  15. [15–16] Jianfeng Gao, Xiaoshu Jin, Franziska Krebs, Noémie Jaquier, and Tamim Asfour. 2024. Bi-KVIL: Keypoints-based visual imitation learning of bimanual manipulation tasks. In 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 16850–16857.
  17. [17] Haoran Geng, Feishi Wang, Songlin Wei, Yuyang Li, Bangjun Wang, Boshi An, Charlie Tianyue Cheng, Haozhe Lou, Peihao Li, Yen-Jen Wang, et al. 2025. RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning. arXiv preprint arXiv:2504.18904 (2025).
  18. [18] Jennifer Grannen, Yilin Wu, Brandon Vu, and Dorsa Sadigh. 2023. Stabilize to act: Learning to coordinate for bimanual manipulation. In Conference on Robot Learning. PMLR, 563–576.
  19. [19–20] Markus Grotz, Mohit Shridhar, Yu-Wei Chao, Tamim Asfour, and Dieter Fox. PerAct2: Benchmarking and learning for robotic bimanual manipulation tasks. In CoRL 2024 Workshop on Whole-body Control and Bimanual Manipulation: Applications in Humanoids and Beyond.
  21. [21] Songhao Han, Boxiang Qiu, Yue Liao, Siyuan Huang, Chen Gao, Shuicheng Yan, and Si Liu. 2025. RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation. arXiv preprint arXiv:2506.06677 (2025).
  22. [22] Ce Hao, Xuanran Zhai, Yaohua Liu, and Harold Soh. 2026. Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies. arXiv preprint arXiv:2601.21251 (2026).
  23. [23] Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. 2020. RLBench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters 5, 2 (2020), 3019–3026.
  24. [24] Jian-Jian Jiang, Xiao-Ming Wu, Yi-Xiang He, Ling-An Zeng, Yi-Lin Wei, Dandan Zhang, and Wei-Shi Zheng. 2025. Rethinking bimanual robotic manipulation: Learning with decoupled interaction framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 12427–12437.
  25. [25] Tsung-Wei Ke, Nikolaos Gkanatsios, and Katerina Fragkiadaki. 2025. 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations. In Conference on Robot Learning. PMLR, 1949–1974.
  26. [26] Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, et al. 2024. DROID: A large-scale in-the-wild robot manipulation dataset. In Robotics: Science and Systems.
  27. [27] Moo Jin Kim, Chelsea Finn, and Percy Liang. 2025. Fine-tuning vision-language-action models: Optimizing speed and success. arXiv preprint arXiv:2502.19645 (2025).
  28. [28] Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, et al. [n. d.]. OpenVLA: An Open-Source Vision-Language-Action Model. In 8th Annual Conference on Robot Learning.
  29. [29] Zhiqian Lan, Yuxuan Jiang, Ruiqi Wang, Xuanbing Xie, Rongkui Zhang, Yicheng Zhu, Peihang Li, Tianshuo Yang, Tianxing Chen, Haoyu Gao, et al. 2025. AutoBio: A simulation and benchmark for robotic automation in digital biology laboratory. arXiv preprint arXiv:2505.14030 (2025).
  30. [30] Qixiu Li, Yaobo Liang, Zeyu Wang, Lin Luo, Xi Chen, Mozheng Liao, Fangyun Wei, Yu Deng, Sicheng Xu, Yizhong Zhang, et al. 2024. CogACT: A foundational vision-language-action model for synergizing cognition and action in robotic manipulation. arXiv preprint arXiv:2411.19650 (2024).
  31. [31] Rui Li, Zixuan Hu, Wenxi Qu, Jinouwen Zhang, Zhenfei Yin, Sha Zhang, Xuantuo Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang, et al. 2025. LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents. arXiv preprint arXiv:2505.22634 (2025).
  32. [32] Yunfei Li, Chaoyi Pan, Huazhe Xu, Xiaolong Wang, and Yi Wu. 2023. Efficient bimanual handover and rearrangement via symmetry-aware actor-critic learning. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 3867–3874.
  33. [33] Zhixuan Liang, Yao Mu, Mingyu Ding, Fei Ni, Masayoshi Tomizuka, and Ping Luo. 2023. AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners. In International Conference on Machine Learning. PMLR, 20725–20745.
  34. [34] Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, and Ping Luo. 2024. SkillDiffuser: Interpretable hierarchical planning via skill abstractions in diffusion-based task execution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16467–16476.
  35. [35] Zhixuan Liang, Yao Mu, Yixiao Wang, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, and Mingyu Ding. 2025. DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation. In Proceedings of the Computer Vision and Pattern Recognition Conference. 1745–1755.
  36. [36] Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. 2023. LIBERO: Benchmarking knowledge transfer for lifelong robot learning. Advances in Neural Information Processing Systems 36 (2023), 44776–44791.
  37. [37] I-Chun Arthur Liu, Jason Chen, Gaurav S Sukhatme, and Daniel Seita. 2025. D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation. In Conference on Robot Learning. PMLR, 3569–3588.
  38. [38] I-Chun Arthur Liu, Sicheng He, Daniel Seita, and Gaurav S Sukhatme. 2025. VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation. In Conference on Robot Learning. PMLR, 4354–4370.
  39. [39] Junjia Liu, Yiting Chen, Zhipeng Dong, Shixiong Wang, Sylvain Calinon, Miao Li, and Fei Chen. 2022. Robot cooking with stir-fry: Bimanual non-prehensile manipulation of semi-fluid objects. IEEE Robotics and Automation Letters 7, 2 (2022), 5159–5166.
  40. [40] Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, and Jun Zhu. 2024. RDT-1B: A diffusion foundation model for bimanual manipulation. arXiv preprint arXiv:2410.07864 (2024).
  41. [41] Guanxing Lu, Tengbo Yu, Haoyuan Deng, Season Si Chen, Yansong Tang, and Ziwei Wang. 2025. AnyBimanual: Transferring unimanual policy for general bimanual manipulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 13662–13672.
  42. [42] Qi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, and Liqiang Nie. 2025. Spatial-temporal graph diffusion policy with kinematic modeling for bimanual robotic manipulation. In Proceedings of the Computer Vision and Pattern Recognition Conference. 17394–17404.
  43. [43–44] Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, et al. Isaac Gym: High Performance GPU Based Physics Simulation For Robot Learning. In NeurIPS Datasets and Benchmarks.
  45. [45] Oier Mees, Lukas Hermann, Erick Rosete-Beas, and Wolfram Burgard. 2022. CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters 7, 3 (2022), 7327–7334.
  46. [46] Mayank Mittal, Calvin Yu, Qinxi Yu, Jingzhou Liu, Nikita Rudin, David Hoeller, Jia Lin Yuan, Ritvik Singh, Yunrong Guo, Hammad Mazhar, et al. 2023. Orbit: A unified simulation framework for interactive robot learning environments. IEEE Robotics and Automation Letters 8, 6 (2023), 3740–3747.
  47. [47] Yao Mu, Tianxing Chen, Zanxin Chen, Shijia Peng, Zhiqian Lan, Zeyu Gao, Zhixuan Liang, Qiaojun Yu, Yude Zou, Mingkun Xu, et al. 2025. RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025).
  48. [48] Soroush Nasiriany, Abhiram Maddukuri, Lance Zhang, Adeet Parikh, Aaron Lo, Abhishek Joshi, Ajay Mandlekar, and Yuke Zhu. 2024. RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots. In Robotics: Science and Systems (RSS).
  49. [49] Abby O'Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, et al. 2024. Open X-Embodiment: Robotic learning datasets and RT-X models. In 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 6892–6903.
  50. [50] Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, and Pieter Abbeel. 2024. HumanoidBench: Simulated humanoid benchmark for whole-body locomotion and manipulation. arXiv preprint arXiv:2403.10506 (2024).
  51. [51] Christian Smith, Yiannis Karayiannidis, Lazaros Nalpantidis, Xavi Gratal, Peng Qi, Dimos V Dimarogonas, and Danica Kragic. 2012. Dual arm manipulation—A survey. Robotics and Autonomous Systems 60, 10 (2012), 1340–1353.
  52. [52] Stone Tao, Fanbo Xiang, Arth Shukla, Yuzhe Qin, Xander Hinrichsen, Xiaodi Yuan, Chen Bao, Xinsong Lin, Yulin Liu, Tse-kai Chan, Yuan Gao, Xuanlin Li, Tongzhou Mu, Nan Xiao, Arnav Gurha, Zhiao Huang, Roberto Calandra, Rui Chen, Shan Luo, and Hao Su. 2024. ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI. arXiv preprint…
  53. [53] Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. 2024. Octo: An open-source generalist robot policy. arXiv preprint arXiv:2405.12213 (2024).
  54. [54] Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 5026–5033.
  55. [55] Homer Rich Walke, Kevin Black, Tony Z Zhao, Quan Vuong, Chongyi Zheng, Philippe Hansen-Estruch, Andre Wang He, Vivek Myers, Moo Jin Kim, Max Du, et al. 2023. BridgeData V2: A dataset for robot learning at scale. In Conference on Robot Learning. PMLR, 1723–1736.
  56. [56] Chenxi Wang, Hongjie Fang, Hao-Shu Fang, and Cewu Lu. 2024. RISE: 3D perception makes real-world robot imitation simple and effective. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2870–2877.
  57. [57] Dian Wang, Colin Kohler, Xupeng Zhu, Mingxi Jia, and Robert Platt. 2022. BulletArm: An open-source robotic manipulation benchmark and learning framework. In The International Symposium of Robotics Research. Springer, 335–350.
  58. [58] Junjie Wen, Yichen Zhu, Jinming Li, Zhibin Tang, Chaomin Shen, and Feifei Feng. 2025. DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control. In Conference on Robot Learning. PMLR, 3094–3114.
  59. [59] Junjie Wen, Yichen Zhu, Jinming Li, Minjie Zhu, Zhibin Tang, Kun Wu, Zhiyuan Xu, Ning Liu, Ran Cheng, Chaomin Shen, Yaxin Peng, Feifei Feng, and Jian Tang. TinyVLA: Toward Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation. IEEE Robotics and Automation Letters 10, 4 (2025), 3988–3995. doi:10.1109/LRA.2025.3544909.
  60. [60] Yuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, and Zhaoxiang Zhang.
  61. [61] Kun Wu, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, Zhuqin Yang, Meng Li, Yinuo Zhao, Zhiyuan Xu, Guang Yang, et al. 2024. RoboMIND: Benchmark on multi-embodiment intelligence normative data for robot manipulation. arXiv preprint arXiv:2412.13877 (2024).
  62. [62] Fanbo Xiang, He Wang, Yuzhe Qin, Austin Wang, Hejia Zhang, Yikuan Xia, Binbin Lin, Yuzhe Wu, Chengcheng Tang, Yixin Zhu, Li Yi, Leonidas J. Guibas, and Hao Su. 2020. SAPIEN: A SimulAted Part-based Interactive ENvironment. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  63. [63] Yuyin Yang, Zetao Cai, Yang Tian, Jia Zeng, and Jiangmiao Pang. 2025. Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation. arXiv preprint arXiv:2504.17784 (2025).
  64. [64] Seonghyeon Ye, Joel Jang, Byeongguk Jeon, Se June Joo, Jianwei Yang, Baolin Peng, Ajay Mandlekar, Reuben Tan, Yu-Wei Chao, Bill Yuchen Lin, et al. [n. d.]. Latent Action Pretraining From Videos. In CoRL 2024 Workshop on Whole-body Control and Bimanual Manipulation: Applications in Humanoids and Beyond.
  65. [65] Tengbo Yu, Guanxing Lu, Zaijia Yang, Haoyuan Deng, Season Si Chen, Jiwen Lu, Wenbo Ding, Guoqiang Hu, Yansong Tang, and Ziwei Wang. 2025. ManiGaussian++: General robotic bimanual manipulation with hierarchical Gaussian world model. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 12232–12239.
  66. [66] Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. 2020. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning. PMLR, 1094–1100.
  67. [67] Kevin Zakka, Philipp Wu, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, et al. 2023. RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning. In Conference on Robot Learning. PMLR, 2975–2994.
  68. [68] Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, and Huazhe Xu. 2024. 3D Diffusion Policy. arXiv preprint arXiv:2403.03954 (2024).
  69. [69] Tony Z Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. 2023. Learning fine-grained bimanual manipulation with low-cost hardware. arXiv preprint arXiv:2304.13705 (2023).
  70. [70] Yan Zhao, Ruihai Wu, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, and Hao Dong. 2022. DualAfford: Learning collaborative visual affordance for dual-gripper manipulation. arXiv preprint arXiv:2207.01971 (2022).
  71. [71] Yuke Zhu, Josiah Wong, Ajay Mandlekar, and Roberto Martín-Martín. 2020. robosuite: A Modular Simulation Framework and Benchmark for Robot Learning. arXiv preprint arXiv:2009.12293.