RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation
Abstract
Despite the critical role of bimanual manipulation in endowing robots with human-like dexterity, large-scale and diverse datasets remain scarce due to the significant hardware heterogeneity across bimanual robotic platforms. To bridge this gap, we introduce RoboCOIN, a large-scale multi-embodiment bimanual manipulation dataset comprising over 180,000 demonstrations collected from 15 distinct robotic platforms. Spanning 16 diverse environments, including residential, commercial, and industrial settings, the dataset features 421 bimanual tasks systematically categorized by 39 bimanual collaboration actions and 432 objects. A key innovation of our work is the hierarchical capability pyramid, which provides granular annotations ranging from trajectory-level concepts to segment-level subtasks and frame-level kinematics. Furthermore, we present CoRobot, an efficient data processing pipeline powered by the Robot Trajectory Markup Language (RTML), designed to facilitate quality assessment, automated annotation, and unified multi-embodiment data management. Extensive experiments demonstrate the effectiveness of RoboCOIN in enhancing the performance of various bimanual manipulation models across a wide spectrum of robotic embodiments. The entire dataset and codebase are fully open-sourced, providing a valuable resource for advancing research in bimanual and multi-embodiment manipulation.
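The hierarchical capability pyramid described in the abstract annotates each demonstration at three levels: trajectory-level concepts, segment-level subtasks, and frame-level kinematics. The sketch below shows one possible way to represent such a hierarchy as Python dataclasses; every class and field name here is an assumption for illustration and does not reproduce the actual RTML schema or CoRobot data format released with RoboCOIN.

```python
# Hypothetical sketch of a three-level annotation hierarchy
# (trajectory -> segments -> frames); not the official RoboCOIN/RTML schema.
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrameKinematics:
    """Frame-level annotation: per-timestep kinematic state of both arms."""
    timestamp: float
    left_joint_positions: List[float]
    right_joint_positions: List[float]
    left_gripper_open: float    # normalized, e.g. 0.0 (closed) to 1.0 (open)
    right_gripper_open: float


@dataclass
class SegmentSubtask:
    """Segment-level annotation: a contiguous frame span labeled with a subtask."""
    start_frame: int
    end_frame: int
    subtask: str               # e.g. "left arm holds the bowl steady"
    collaboration_action: str  # one of the dataset's bimanual collaboration actions
    frames: List[FrameKinematics] = field(default_factory=list)


@dataclass
class TrajectoryAnnotation:
    """Trajectory-level annotation: overall task concept and its decomposition."""
    task_description: str      # e.g. "pour rice from the cup into the bowl"
    embodiment: str            # which of the 15 robotic platforms recorded it
    environment: str           # e.g. "residential kitchen"
    objects: List[str]
    segments: List[SegmentSubtask] = field(default_factory=list)
```

A pipeline like CoRobot could, in principle, validate such records (quality assessment) and populate the segment- and frame-level fields automatically (automated annotation), but the concrete interfaces are defined by the open-sourced codebase rather than by this sketch.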
Forward citations
Cited by 7 Pith papers
- RotVLA: Rotational Latent Action for Vision-Language-Action Model
RotVLA models latent actions as continuous SO(n) rotations with triplet-frame supervision and flow-matching to reach 98.2% success on LIBERO and 89.6%/88.5% on RoboTwin2.0 using a 1.7B-parameter model.
- CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
Capability vectors extracted from parameter differences between standard and auxiliary-finetuned VLA models can be merged into pretrained weights to match auxiliary-training performance while reducing computational overhead.
- HRDexDB: A Large-Scale Dataset of Dexterous Human and Robotic Hand Grasps
HRDexDB is a multi-modal dataset of 1.4K human and robotic dexterous grasps across 100 objects, providing aligned 3D kinematics, high-resolution tactile data, and video streams.
- HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation
HEX is a new framework with humanoid-aligned state representation, mixture-of-experts proprioceptive predictor, history tokens, and residual-gated fusion that achieves state-of-the-art success and generalization on re...
- A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model
A1 is a transparent VLA framework achieving state-of-the-art robot manipulation success with up to 72% lower latency via adaptive layer truncation and inter-layer flow matching.
- Causal World Modeling for Robot Control
LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.
- JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
JoyAI-RA is a multi-source pretrained VLA model that claims to bridge human-to-robot embodiment gaps via data unification and outperforms prior methods on generalization-heavy robotic tasks.