Supervised Mixture-of-Experts for Surgical Grasping and Retraction

2026 · cs.RO · arXiv 2601.21971

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Imitation learning has achieved remarkable success in robotic manipulation, yet its application to surgical robotics remains challenging due to data scarcity, constrained workspaces, and the need for an exceptional level of safety and predictability. We present a supervised Mixture-of-Experts (MoE) architecture designed for phase-structured surgical manipulation tasks, which can be added on top of any autonomous policy. Unlike prior surgical robot learning approaches that rely on multi-camera setups or thousands of demonstrations, we show that a lightweight action decoder policy such as the Action Chunking Transformer (ACT) can learn complex, long-horizon manipulation from fewer than 150 demonstrations using solely stereo endoscopic images when equipped with our architecture. We evaluate our approach on the collaborative surgical task of bowel grasping and retraction, where a robot assistant interprets visual cues from a human surgeon, executes targeted grasping on deformable tissue, and performs sustained retraction. Our results show that generalist Vision-Language-Action models fail to acquire the task entirely, even under standard in-distribution conditions. Furthermore, while standard ACT achieves moderate success in-distribution, adopting a supervised MoE architecture significantly boosts its performance, yielding higher success rates in-distribution and demonstrating superior robustness in out-of-distribution scenarios, including novel grasp locations, reduced illumination, and partial occlusions. Notably, it generalizes to unseen testing viewpoints and also transfers zero-shot to ex vivo porcine tissue without additional training, offering a promising pathway toward in vivo deployment. To support this statement, we present qualitative preliminary results of policy roll-outs during in vivo porcine surgery.

representative citing papers

Imitation Learning for Robot Assistance in Open Surgery: A Multi-Policy Evaluation on Suture Following

cs.RO · 2026-05-27 · conditional · novelty 7.0

Benchmarking ACT, Diffusion Policy, SmolVLA, and π0 on suture following yields 50-75% success under ideal conditions and 92% stitch completion with π0 in a surgeon-robot trial.

citing papers explorer

Imitation Learning for Robot Assistance in Open Surgery: A Multi-Policy Evaluation on Suture Following

cs.RO · 2026-05-27 · conditional · none · ref 6 · internal anchor

Benchmarking ACT, Diffusion Policy, SmolVLA, and π0 on suture following yields 50-75% success under ideal conditions and 92% stitch completion with π0 in a surgeon-robot trial.
