pith. machine review for the scientific record.

arxiv: 2604.05831 · v1 · submitted 2026-04-07 · 💻 cs.RO

Recognition: no theorem link

BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3

classification 💻 cs.RO
keywords: bimanual manipulation · robotics benchmark · long-horizon tasks · spatial-temporal coordination · robotic learning · arm coordination

The pith

Bimanual robot policies fail on tasks requiring sustained tight coordination between two arms over long sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing bimanual benchmarks rely on short tasks with only loose arm cooperation, which does not match the continuous dependency and role switching seen in real two-handed work. BiCoord supplies a set of longer tasks built around ongoing inter-arm coupling and repeated sub-goal exchanges. The benchmark adds metrics that separately score timing, positioning, and their joint behavior. When standard policies are tested on these tasks, performance drops sharply, exposing limits in current learning approaches for coordinated manipulation.
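
The page does not reproduce the paper's metric definitions, so the following is only a minimal sketch of what separately scoring timing, positioning, and their joint behavior over a two-arm trajectory could look like; the function, its inputs, and all three formulas are hypothetical stand-ins, not BiCoord's.

```python
import numpy as np

def coordination_scores(left_xyz, right_xyz, stage_ends_l, stage_ends_r, target_gap=0.1):
    """Toy per-trajectory coordination scores (illustrative only; not the
    paper's metrics). left_xyz/right_xyz: (T, 3) end-effector positions
    sampled at the same rate; stage_ends_*: per-arm completion step of each
    sub-goal; target_gap: assumed inter-gripper distance during coupling (m)."""
    left_xyz = np.asarray(left_xyz, float)
    right_xyz = np.asarray(right_xyz, float)

    # Temporal: how far apart the two arms finish matching sub-goals.
    lag = np.abs(np.asarray(stage_ends_l) - np.asarray(stage_ends_r))
    temporal = float(np.exp(-lag.mean() / len(left_xyz)))

    # Spatial: deviation of the inter-gripper distance from the target gap.
    gap = np.linalg.norm(left_xyz - right_xyz, axis=1)
    spatial = float(np.exp(-np.abs(gap - target_gap).mean()))

    # Spatial-temporal: correlation of the two arms' speed profiles,
    # a crude proxy for synchronized motion.
    speed_l = np.linalg.norm(np.diff(left_xyz, axis=0), axis=1)
    speed_r = np.linalg.norm(np.diff(right_xyz, axis=0), axis=1)
    spatial_temporal = float(np.corrcoef(speed_l, speed_r)[0, 1])

    return {"temporal": temporal, "spatial": spatial,
            "spatial_temporal": spatial_temporal}
```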

Core claim

BiCoord is a benchmark for long-horizon tightly coordinated bimanual manipulation that includes diverse tasks requiring continuous inter-arm dependency and dynamic role exchange across multiple sub-goals, together with quantitative metrics that evaluate coordination from temporal, spatial, and spatial-temporal perspectives. Experiments show that representative policies such as DP, RDT, Pi0, and OpenVLA-OFT struggle with the long-duration and highly coupled tasks.

What carries the argument

The BiCoord benchmark, built from tasks that enforce continuous arm-to-arm dependency and role exchange, paired with a metric suite that separately quantifies timing, spatial alignment, and their interaction.
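
The figure captions describe stage-wise annotation, where each trajectory is divided into stages coupled with a sub-goal and per-arm behaviours. A minimal data-structure sketch of such an annotation follows; all field names are assumed for illustration, not taken from the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Stage:
    """One annotated stage of a trajectory (hypothetical field names)."""
    sub_goal: str      # e.g. "hold pot steady while pouring"
    left_role: str     # e.g. "stabilize", "transport", "pour"
    right_role: str
    start_step: int
    end_step: int

@dataclass
class Trajectory:
    task: str          # e.g. "Cook"
    stages: List[Stage] = field(default_factory=list)

    def role_exchanges(self) -> int:
        """Count stage boundaries where the arms swap roles, one crude
        marker of the 'dynamic role exchange' the benchmark is built around."""
        return sum(
            1 for a, b in zip(self.stages, self.stages[1:])
            if (a.left_role, a.right_role) == (b.right_role, b.left_role)
            and a.left_role != a.right_role
        )
```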

If this is right

  • Methods for bimanual control must add explicit handling of long-term arm interdependencies rather than treating arms independently.
  • The new metrics provide a concrete way to measure and compare progress toward better coordination.
  • Long-horizon tasks with role exchange become a standard test for whether learned policies can sustain cooperative behavior across changing goals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Improved performance on BiCoord tasks could support more reliable two-arm systems for assembly or household tasks that currently require human-level timing.
  • The benchmark format could be extended to include contact-rich actions or sensor noise to test whether coordination gains survive real-world conditions.

Load-bearing premise

The chosen tasks and metrics capture the essential spatial-temporal coupling present in real-world bimanual actions.

What would settle it

A standard policy trained without special coordination modules that nevertheless scores well on all BiCoord tasks and metrics would show that the claimed fundamental challenges do not hold.
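
As a toy illustration of that falsification condition, assuming per-task success rates and metric scores with arbitrary placeholder thresholds (none of these values come from the paper):

```python
def settles_the_claim(results, success_floor=0.8, metric_floor=0.7):
    """True if a plain policy (no coordination modules) clears every task
    and every coordination metric, which would undercut the claimed
    fundamental challenge. `results` maps task name to
    {"success": float, "metrics": {metric name: float}}."""
    return all(
        r["success"] >= success_floor
        and all(v >= metric_floor for v in r["metrics"].values())
        for r in results.values()
    )
```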

Figures

Figures reproduced from arXiv: 2604.05831 by Annan Li, Chen Gao, Liankai Jin, Si Liu, Xingyu Peng.

Figure 1. Overview of BiCoord. (a) The data generation pipeline. (b) An example trajectory of the Cook task; each trajectory is divided into several stages with sub-goals and arm behaviours, and key features of bimanual coordination are embodied in BiCoord, like phased coupling, spatial-temporal constraint and predictive coordination. (c) We design metrics to evaluate the bimanual manipulation benchmark…
Figure 4. Pipeline for building BiCoord.
Figure 3. Tasks in BiCoord. Each task requires high-level …
Figure 5. Statistics of BiCoord.
Figure 6. Visualizations of Pi0 on the Divide Block Tower task. Grasping errors occur when the color and order of the blocks change, demonstrating limited reasoning ability.
Figure 7. Visualizations on the Handover Block With Bowls task. The block is poured out before the two bowls are completely aligned, showing weak ability in precise alignment.
Figure 8. Stage-wise analysis. We present two examples here. [Panels: (a) Collect Pens, (b) Jigsaw; axes: SR (%) vs. stage; policies: DP, RDT, OpenVLA-OFT, Pi0.]
Figure 9. Visualizations on the Cook task.
read the original abstract

Bimanual manipulation, i.e., the coordinated use of two robotic arms to complete tasks, is essential for achieving human-level dexterity in robotics. Recent simulation benchmarks, e.g., RoboTwin and RLBench2, have advanced data-driven learning for bimanual manipulation. However, existing tasks are short-horizon and only loosely coordinated, failing to capture the spatial-temporal coupling inherent in real-world bimanual behaviors. To address this gap, we introduce BiCoord, a benchmark for long-horizon and tightly coordinated bimanual manipulation. Specifically, BiCoord comprises diverse tasks that require continuous inter-arm dependency and dynamic role exchange across multiple sub-goals. Also, we propose a suite of quantitative metrics that evaluate coordination from temporal, spatial, and spatial-temporal perspectives, enabling systematic measurement of bimanual cooperation. Experimental results show that representative manipulation policies, e.g., DP, RDT, Pi0, and OpenVLA-OFT, struggle with long-duration and highly coupled tasks, revealing fundamental challenges in achieving long-horizon and tight coordination tasks. We hope BiCoord can serve as a foundation for studying long-horizon cooperative manipulation and inspire future research on coordination-aware robotic learning. All datasets, codes and supplements could be found at https://buaa-colalab.github.io/BiCoord/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces BiCoord, a new benchmark for long-horizon bimanual manipulation consisting of tasks that require continuous inter-arm dependency and dynamic role exchange across multiple sub-goals. It defines a suite of metrics to quantify coordination along temporal, spatial, and spatial-temporal axes. Experiments on representative policies (DP, RDT, Pi0, OpenVLA-OFT) show poor performance on these tasks, which the authors interpret as evidence of fundamental challenges in long-horizon tight coordination.

Significance. If the BiCoord tasks and metrics demonstrably isolate spatial-temporal coupling beyond generic long-horizon difficulty, and if the reported performance gaps are robust, the benchmark would fill a genuine gap left by short-horizon suites such as RoboTwin and RLBench2. The public release of datasets, code, and supplements is a clear strength that supports reproducibility and future coordination-aware learning research.

major comments (2)
  1. [Experimental results and task design] The central claim that policy failures reveal specific challenges in 'long-horizon and tight coordination' (abstract and conclusion) rests on the assumption that BiCoord tasks impose continuous inter-arm dependency that cannot be reduced to extended horizon or increased subgoal count. No control tasks or ablations are described that preserve duration and subgoal structure while relaxing simultaneous dependency (e.g., sequential independent-arm execution). Without such isolation, the attribution of struggles to coordination demands rather than known long-horizon planning limitations remains unverified.
  2. [Metrics section] The proposed temporal/spatial/spatial-temporal metrics are introduced to measure bimanual cooperation, yet the manuscript provides no quantitative validation or baseline comparisons showing that these metrics distinguish tight coupling from loose coordination on the BiCoord tasks themselves.
minor comments (1)
  1. [Abstract and introduction] The abstract states that 'all datasets, codes and supplements could be found at https://buaa-colalab.github.io/BiCoord/'; the main text should include a concise description of benchmark usage, task parameterization, and metric computation formulas to reduce reliance on the external site.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We agree that the manuscript would be strengthened by additional experiments isolating coordination demands from long-horizon effects and by explicit validation of the proposed metrics. We address each major comment below and indicate the corresponding revisions.

read point-by-point responses
  1. Referee: [Experimental results and task design] The central claim that policy failures reveal specific challenges in 'long-horizon and tight coordination' (abstract and conclusion) rests on the assumption that BiCoord tasks impose continuous inter-arm dependency that cannot be reduced to extended horizon or increased subgoal count. No control tasks or ablations are described that preserve duration and subgoal structure while relaxing simultaneous dependency (e.g., sequential independent-arm execution). Without such isolation, the attribution of struggles to coordination demands rather than known long-horizon planning limitations remains unverified.

    Authors: We acknowledge that the current submission lacks explicit control experiments to separate the effects of tight inter-arm coupling from general long-horizon planning difficulties. Although the BiCoord tasks are explicitly designed around continuous dependency and dynamic role exchange (as described in the task definitions), we did not report sequential variants. In the revised manuscript we will add such ablations: for each task we will include a matched sequential version in which the arms execute sub-goals independently while preserving total duration, number of sub-goals, and overall workspace constraints. Direct performance comparisons between the original tightly coupled tasks and these sequential controls will be reported to better attribute the observed policy failures (a schematic of such a control is sketched after these responses). revision: yes

  2. Referee: [Metrics section] The proposed temporal/spatial/spatial-temporal metrics are introduced to measure bimanual cooperation, yet the manuscript provides no quantitative validation or baseline comparisons showing that these metrics distinguish tight coupling from loose coordination on the BiCoord tasks themselves.

    Authors: We agree that quantitative validation is necessary to confirm the metrics capture tight versus loose coordination. In the revision we will compute the temporal, spatial, and spatial-temporal metrics on both the original BiCoord tasks and on relaxed variants that reduce coupling (e.g., by relaxing synchronization constraints while keeping the same sub-goal sequence). We will additionally report metric values obtained from human teleoperation demonstrations (high coordination) and from random or single-arm policies (low coordination) to demonstrate differentiation. These results will be included in an expanded metrics section (a toy separation statistic is also sketched below). revision: yes
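
To make response 1's proposed ablation concrete, here is a schematic of a matched sequential control, reusing the hypothetical Stage/Trajectory sketch from earlier; the benchmark's real task specification will differ.

```python
import copy

def make_sequential_control(traj):
    """Split each coupled stage so the arms act one at a time while the
    sub-goal sequence and total step budget are preserved. A sketch of the
    rebuttal's control condition, not BiCoord code."""
    control = copy.deepcopy(traj)
    sequential = []
    for s in control.stages:
        mid = (s.start_step + s.end_step) // 2
        left_half = copy.deepcopy(s)
        left_half.right_role = "idle"    # right arm waits
        left_half.end_step = mid
        right_half = copy.deepcopy(s)
        right_half.left_role = "idle"    # left arm waits
        right_half.start_step = mid
        sequential.extend([left_half, right_half])
    control.stages = sequential
    return control
```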
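And for response 2, one simple way to test whether a metric separates high-coordination runs (e.g., human teleoperation) from low-coordination ones (e.g., random or single-arm policies) is an effect size over the two score samples; the choice of statistic here is ours, not the authors'.

```python
import numpy as np

def metric_separation(high_coord_scores, low_coord_scores):
    """Cohen's d between metric scores of tightly coordinated and loosely
    coordinated runs; large values suggest the metric discriminates."""
    hi = np.asarray(high_coord_scores, float)
    lo = np.asarray(low_coord_scores, float)
    pooled = np.sqrt((hi.var(ddof=1) + lo.var(ddof=1)) / 2.0)
    return float((hi.mean() - lo.mean()) / pooled)
```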

Circularity Check

0 steps flagged

Independent benchmark with empirical evaluation; no derivation chain present

full rationale

The paper introduces a new benchmark (BiCoord) consisting of tasks and metrics for bimanual coordination, then reports empirical performance of existing policies on those tasks. No equations, fitted parameters, predictions, or first-principles derivations are claimed. The central statements (existing benchmarks are short-horizon/loose; new tasks require continuous inter-arm dependency; policies struggle) are definitional descriptions of the benchmark plus experimental observations, not reductions of outputs to inputs by construction. Self-citations, if any, are not load-bearing for any result. This is a standard benchmark paper whose content is self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central contribution is a new benchmark rather than a derivation; it rests on the domain assumption that simulation can proxy real bimanual coordination and on the new tasks themselves.

axioms (1)
  • domain assumption: Simulation environments can adequately model the spatial-temporal coupling of real-world bimanual behaviors.
    The entire benchmark is built in simulation; this assumption underpins claims about real-world relevance.
invented entities (1)
  • BiCoord benchmark tasks and coordination metrics (no independent evidence)
    purpose: To evaluate long-horizon spatial-temporal coordination in bimanual manipulation.
    Newly defined tasks and quantitative metrics introduced in the paper.

pith-pipeline@v0.9.0 · 5541 in / 1190 out tokens · 27546 ms · 2026-05-10T18:52:40.925383+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

71 extracted references · 26 canonical work pages · 13 internal anchors

  1. [1] Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, et al. 2024. ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation. arXiv preprint arXiv:2405.02292 (2024).
  2. [2] Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. 2024. π0: A Vision-Language-Action Flow Model for General Robot Control. arXiv preprint arXiv:2410.24164 (2024).
  3. [3] Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, et al. 2023. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818 (2023).
  4. [4] Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al. 2022. RT-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:2212.06817 (2022).
  5. [5] Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xu Huang, Shu Jiang, et al. 2025. AgiBot World Colosseo: A large-scale manipulation platform for scalable and intelligent embodied systems. arXiv preprint arXiv:2503.06669 (2025).
  6. [6] Konstantinos Chatzilygeroudis, Bernardo Fichera, Ilaria Lauzana, Fanjun Bu, Kunpeng Yao, Farshad Khadivar, and Aude Billard. 2020. Benchmark for bimanual robotic manipulation of semi-deformable objects. IEEE Robotics and Automation Letters 5, 2 (2020), 2443–2450.
  7. [7] Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Zixuan Li, Qiwei Liang, Xianliang Lin, Yiheng Ge, Zhenyu Gu, et al. 2025. RoboTwin 2.0: A scalable data generator and benchmark with strong domain randomization for robust bimanual robotic manipulation. arXiv preprint arXiv:2506.18088 (2025).
  8. [8] Tianxing Chen, Yao Mu, Zhixuan Liang, Zanxin Chen, Shijia Peng, Qiangyu Chen, Mingkun Xu, Ruizhen Hu, Hongyuan Zhang, Xuelong Li, et al. 2025. G3Flow: Generative 3D semantic flow for pose-aware and generalizable object manipulation. In Proceedings of the Computer Vision and Pattern Recognition Conference. 1735–1744.
  9. [9] Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. 2023. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research (2023), 02783649241273668.
  10. [10–11] Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Yixin Lin, Austin S Wang, Abitha Thankaraj, Karanbir Singh Chahal, Berk Calli, Saurabh Gupta, et al. 2021. RB2: Robotic Manipulation Benchmarking with a Twist. NeurIPS 2021 Datasets and Benchmarks Track (2021).
  12. [12] Frederik Ebert, Yanlai Yang, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. 2021. Bridge Data: Boosting generalization of robotic skills with cross-domain datasets. arXiv preprint arXiv:2109.13396 (2021).
  13. [13] Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J Black, and Otmar Hilliges. 2023. ARCTIC: A dataset for dexterous bimanual hand-object manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12943–12954.
  14. [14] Zipeng Fu, Tony Z Zhao, and Chelsea Finn. 2024. Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation. arXiv preprint arXiv:2401.02117 (2024).
  15. [15–16] Jianfeng Gao, Xiaoshu Jin, Franziska Krebs, Noémie Jaquier, and Tamim Asfour. 2024. Bi-KVIL: Keypoints-based visual imitation learning of bimanual manipulation tasks. In 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 16850–16857.
  17. [17] Haoran Geng, Feishi Wang, Songlin Wei, Yuyang Li, Bangjun Wang, Boshi An, Charlie Tianyue Cheng, Haozhe Lou, Peihao Li, Yen-Jen Wang, et al. 2025. RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning. arXiv preprint arXiv:2504.18904 (2025).
  18. [18] Jennifer Grannen, Yilin Wu, Brandon Vu, and Dorsa Sadigh. 2023. Stabilize to act: Learning to coordinate for bimanual manipulation. In Conference on Robot Learning. PMLR, 563–576.
  19. [19–20] Markus Grotz, Mohit Shridhar, Yu-Wei Chao, Tamim Asfour, and Dieter Fox. PerAct2: Benchmarking and learning for robotic bimanual manipulation tasks. In CoRL 2024 Workshop on Whole-body Control and Bimanual Manipulation: Applications in Humanoids and Beyond.
  21. [21] Songhao Han, Boxiang Qiu, Yue Liao, Siyuan Huang, Chen Gao, Shuicheng Yan, and Si Liu. 2025. RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation. arXiv preprint arXiv:2506.06677 (2025).
  22. [22] Ce Hao, Xuanran Zhai, Yaohua Liu, and Harold Soh. 2026. Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies. arXiv preprint arXiv:2601.21251 (2026).
  23. [23] Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. 2020. RLBench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters 5, 2 (2020), 3019–3026.
  24. [24] Jian-Jian Jiang, Xiao-Ming Wu, Yi-Xiang He, Ling-An Zeng, Yi-Lin Wei, Dandan Zhang, and Wei-Shi Zheng. 2025. Rethinking bimanual robotic manipulation: Learning with decoupled interaction framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 12427–12437.
  25. [25] Tsung-Wei Ke, Nikolaos Gkanatsios, and Katerina Fragkiadaki. 2025. 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations. In Conference on Robot Learning. PMLR, 1949–1974.
  26. [26] Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, et al. 2024. DROID: A large-scale in-the-wild robot manipulation dataset. In Robotics: Science and Systems.
  27. [27] Moo Jin Kim, Chelsea Finn, and Percy Liang. 2025. Fine-tuning vision-language-action models: Optimizing speed and success. arXiv preprint arXiv:2502.19645 (2025).
  28. [28] Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, et al. [n. d.]. OpenVLA: An Open-Source Vision-Language-Action Model. In 8th Annual Conference on Robot Learning.
  29. [29] Zhiqian Lan, Yuxuan Jiang, Ruiqi Wang, Xuanbing Xie, Rongkui Zhang, Yicheng Zhu, Peihang Li, Tianshuo Yang, Tianxing Chen, Haoyu Gao, et al. 2025. AutoBio: A simulation and benchmark for robotic automation in digital biology laboratory. arXiv preprint arXiv:2505.14030 (2025).
  30. [30] Qixiu Li, Yaobo Liang, Zeyu Wang, Lin Luo, Xi Chen, Mozheng Liao, Fangyun Wei, Yu Deng, Sicheng Xu, Yizhong Zhang, et al. 2024. CogACT: A foundational vision-language-action model for synergizing cognition and action in robotic manipulation. arXiv preprint arXiv:2411.19650 (2024).
  31. [31] Rui Li, Zixuan Hu, Wenxi Qu, Jinouwen Zhang, Zhenfei Yin, Sha Zhang, Xuantuo Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang, et al. 2025. LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents. arXiv preprint arXiv:2505.22634 (2025).
  32. [32] Yunfei Li, Chaoyi Pan, Huazhe Xu, Xiaolong Wang, and Yi Wu. 2023. Efficient bimanual handover and rearrangement via symmetry-aware actor-critic learning. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 3867–3874.
  33. [33] Zhixuan Liang, Yao Mu, Mingyu Ding, Fei Ni, Masayoshi Tomizuka, and Ping Luo. 2023. AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners. In International Conference on Machine Learning. PMLR, 20725–20745.
  34. [34] Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, and Ping Luo. 2024. SkillDiffuser: Interpretable hierarchical planning via skill abstractions in diffusion-based task execution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16467–16476.
  35. [35] Zhixuan Liang, Yao Mu, Yixiao Wang, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, and Mingyu Ding. 2025. DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation. In Proceedings of the Computer Vision and Pattern Recognition Conference. 1745–1755.
  36. [36] Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. 2023. LIBERO: Benchmarking knowledge transfer for lifelong robot learning. Advances in Neural Information Processing Systems 36 (2023), 44776–44791.
  37. [37] I-Chun Arthur Liu, Jason Chen, Gaurav S Sukhatme, and Daniel Seita. 2025. D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation. In Conference on Robot Learning. PMLR, 3569–3588.
  38. [38] I-Chun Arthur Liu, Sicheng He, Daniel Seita, and Gaurav S Sukhatme. 2025. VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation. In Conference on Robot Learning. PMLR, 4354–4370.
  39. [39] Junjia Liu, Yiting Chen, Zhipeng Dong, Shixiong Wang, Sylvain Calinon, Miao Li, and Fei Chen. 2022. Robot cooking with stir-fry: Bimanual non-prehensile manipulation of semi-fluid objects. IEEE Robotics and Automation Letters 7, 2 (2022), 5159–5166.
  40. [40] Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, and Jun Zhu. 2024. RDT-1B: A diffusion foundation model for bimanual manipulation. arXiv preprint arXiv:2410.07864 (2024).
  41. [41] Guanxing Lu, Tengbo Yu, Haoyuan Deng, Season Si Chen, Yansong Tang, and Ziwei Wang. 2025. AnyBimanual: Transferring unimanual policy for general bimanual manipulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 13662–13672.
  42. [42] Qi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, and Liqiang Nie. 2025. Spatial-temporal graph diffusion policy with kinematic modeling for bimanual robotic manipulation. In Proceedings of the Computer Vision and Pattern Recognition Conference. 17394–17404.
  43. [43–44] Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, et al. Isaac Gym: High Performance GPU Based Physics Simulation For Robot Learning. In NeurIPS Datasets and Benchmarks.
  45. [45] Oier Mees, Lukas Hermann, Erick Rosete-Beas, and Wolfram Burgard. 2022. CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters 7, 3 (2022), 7327–7334.
  46. [46] Mayank Mittal, Calvin Yu, Qinxi Yu, Jingzhou Liu, Nikita Rudin, David Hoeller, Jia Lin Yuan, Ritvik Singh, Yunrong Guo, Hammad Mazhar, et al. 2023. Orbit: A unified simulation framework for interactive robot learning environments. IEEE Robotics and Automation Letters 8, 6 (2023), 3740–3747.
  47. [47] Yao Mu, Tianxing Chen, Zanxin Chen, Shijia Peng, Zhiqian Lan, Zeyu Gao, Zhixuan Liang, Qiaojun Yu, Yude Zou, Mingkun Xu, et al. 2025. RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025).
  48. [48] Soroush Nasiriany, Abhiram Maddukuri, Lance Zhang, Adeet Parikh, Aaron Lo, Abhishek Joshi, Ajay Mandlekar, and Yuke Zhu. 2024. RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots. In Robotics: Science and Systems (RSS).
  49. [49] Abby O'Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, et al. 2024. Open X-Embodiment: Robotic learning datasets and RT-X models. In 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 6892–6903.
  50. [50] Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, and Pieter Abbeel. 2024. HumanoidBench: Simulated humanoid benchmark for whole-body locomotion and manipulation. arXiv preprint arXiv:2403.10506 (2024).
  51. [51] Christian Smith, Yiannis Karayiannidis, Lazaros Nalpantidis, Xavi Gratal, Peng Qi, Dimos V Dimarogonas, and Danica Kragic. 2012. Dual arm manipulation—A survey. Robotics and Autonomous Systems 60, 10 (2012), 1340–1353.
  52. [52] Stone Tao, Fanbo Xiang, Arth Shukla, Yuzhe Qin, Xander Hinrichsen, Xiaodi Yuan, Chen Bao, Xinsong Lin, Yulin Liu, Tse-kai Chan, Yuan Gao, Xuanlin Li, Tongzhou Mu, Nan Xiao, Arnav Gurha, Zhiao Huang, Roberto Calandra, Rui Chen, Shan Luo, and Hao Su. 2024. ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI. arXiv preprint…
  53. [53] Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, et al. 2024. Octo: An open-source generalist robot policy. arXiv preprint arXiv:2405.12213 (2024).
  54. [54] Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 5026–5033.
  55. [55] Homer Rich Walke, Kevin Black, Tony Z Zhao, Quan Vuong, Chongyi Zheng, Philippe Hansen-Estruch, Andre Wang He, Vivek Myers, Moo Jin Kim, Max Du, et al. 2023. BridgeData V2: A dataset for robot learning at scale. In Conference on Robot Learning. PMLR, 1723–1736.
  56. [56] Chenxi Wang, Hongjie Fang, Hao-Shu Fang, and Cewu Lu. 2024. RISE: 3D perception makes real-world robot imitation simple and effective. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2870–2877.
  57. [57] Dian Wang, Colin Kohler, Xupeng Zhu, Mingxi Jia, and Robert Platt. 2022. BulletArm: An open-source robotic manipulation benchmark and learning framework. In The International Symposium of Robotics Research. Springer, 335–350.
  58. [58] Junjie Wen, Yichen Zhu, Jinming Li, Zhibin Tang, Chaomin Shen, and Feifei Feng. 2025. DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control. In Conference on Robot Learning. PMLR, 3094–3114.
  59. [59] Junjie Wen, Yichen Zhu, Jinming Li, Minjie Zhu, Zhibin Tang, Kun Wu, Zhiyuan Xu, Ning Liu, Ran Cheng, Chaomin Shen, Yaxin Peng, Feifei Feng, and Jian Tang. TinyVLA: Toward Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation. IEEE Robotics and Automation Letters 10, 4 (2025), 3988–3995. doi:10.1109/LRA.2025.3544909.
  60. [60] Yuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, and Zhaoxiang Zhang.
  61. [61] Kun Wu, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, Zhuqin Yang, Meng Li, Yinuo Zhao, Zhiyuan Xu, Guang Yang, et al. 2024. RoboMIND: Benchmark on multi-embodiment intelligence normative data for robot manipulation. arXiv preprint arXiv:2412.13877 (2024).
  62. [62] Fanbo Xiang, He Wang, Yuzhe Qin, Austin Wang, Hejia Zhang, Yikuan Xia, Binbin Lin, Yuzhe Wu, Chengcheng Tang, Yixin Zhu, Li Yi, Leonidas J. Guibas, and Hao Su. 2020. SAPIEN: A SimulAted Part-based Interactive ENvironment. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  63. [63] Yuyin Yang, Zetao Cai, Yang Tian, Jia Zeng, and Jiangmiao Pang. 2025. Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation. arXiv preprint arXiv:2504.17784 (2025).
  64. [64] Seonghyeon Ye, Joel Jang, Byeongguk Jeon, Se June Joo, Jianwei Yang, Baolin Peng, Ajay Mandlekar, Reuben Tan, Yu-Wei Chao, Bill Yuchen Lin, et al. [n. d.]. Latent Action Pretraining From Videos. In CoRL 2024 Workshop on Whole-body Control and Bimanual Manipulation: Applications in Humanoids and Beyond.
  65. [65] Tengbo Yu, Guanxing Lu, Zaijia Yang, Haoyuan Deng, Season Si Chen, Jiwen Lu, Wenbo Ding, Guoqiang Hu, Yansong Tang, and Ziwei Wang. 2025. ManiGaussian++: General robotic bimanual manipulation with hierarchical Gaussian world model. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 12232–12239.
  66. [66] Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. 2020. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning. PMLR, 1094–1100.
  67. [67] Kevin Zakka, Philipp Wu, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, et al. 2023. RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning. In Conference on Robot Learning. PMLR, 2975–2994.
  68. [68] Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, and Huazhe Xu. 2024. 3D Diffusion Policy. arXiv preprint arXiv:2403.03954 (2024).
  69. [69] Tony Z Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. 2023. Learning fine-grained bimanual manipulation with low-cost hardware. arXiv preprint arXiv:2304.13705 (2023).
  70. [70] Yan Zhao, Ruihai Wu, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, and Hao Dong. 2022. DualAfford: Learning collaborative visual affordance for dual-gripper manipulation. arXiv preprint arXiv:2207.01971 (2022).
  71. [71] Yuke Zhu, Josiah Wong, Ajay Mandlekar, and Roberto Martín-Martín. 2020. robosuite: A Modular Simulation Framework and Benchmark for Robot Learning. arXiv preprint arXiv:2009.12293.