Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

2026 · cs.RO · arXiv 2604.21017

3 Pith papers cite this work. Polarity classification is still indexing.


abstract

Autonomous medical robots hold promise to improve patient outcomes, reduce provider workload, democratize access to care, and enable superhuman precision. However, autonomous medical robotics has been limited by a fundamental data problem: existing medical robotic datasets are small, single-embodiment, and rarely shared openly, restricting the development of foundation models that the field needs to advance. We introduce Open-H-Embodiment, the largest open dataset of medical robotic video with synchronized kinematics to date. It spans more than 50 institutions and multiple robotic platforms, including the CMR Versius, Intuitive Surgical's da Vinci, da Vinci Research Kit (dVRK), Rob Surgical BiTrack, Virtual Incision's MIRA, Moon Surgical Maestro, and a variety of custom systems, and covers surgical manipulation, robotic ultrasound, and endoscopy procedures. We demonstrate the research enabled by this dataset through two foundation models. GR00T-H, the first open foundation vision-language-action model for medical robotics, is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all others), and it achieves 64% average success across a 29-step ex vivo suturing sequence. We also train Cosmos-H-Surgical-Simulator, the first action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in silico policy evaluation and synthetic data generation for the medical domain. These results suggest that open, large-scale medical robot data collection can serve as critical infrastructure for the research community, enabling advances in robot learning, world modeling, and beyond.

representative citing papers

Adversarial Attacks on Learned Policies for Surgical Robotic Tasks

cs.RO · 2026-06-10 · unverdicted · novelty 8.0

Adversarial visual perturbations reduce success rates of end-to-end policies for debridement and suturing by an average of 61% in physical experiments on three policy architectures.

BiliVLA: Scene-Aware Vision-Language-Action Model with Reinforcement Learning for Autonomous Biliary Endoscopic Navigation

cs.RO · 2026-06-22 · unverdicted · novelty 5.0

BiliVLA applies scene-aware VLA with grounding-enhanced SFT and GRPO to achieve 91.96% action precision and 84.85% success rate across three ERCP subtasks in phantom experiments.

World Models for Robotic Manipulation: A Survey

cs.RO · 2026-05-27 · accept · novelty 5.0

Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.

citing papers explorer


World Models for Robotic Manipulation: A Survey cs.RO · 2026-05-27 · accept · none · ref 147 · internal anchor
