TASOT performs annotation-free surgical temporal segmentation by extending ASOT with temporally aligned textual captions from a VLM fused into an unbalanced Gromov-Wasserstein optimal transport objective using DINOv3 and CLIP features, reporting F1 gains of +18.9 to +33.7 over zero-shot baselines on
Medical Image Analysis p
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Multimodal Optimal Transport for Training-free Temporal Segmentation in Surgical Robotics