SurgOnAir introduces a streaming vision-language model trained on a hierarchical surgical dataset to generate real-time, multi-level narrations with explicit transition tokens.
In: International Conference on Medical Image Computing and Computer-Assisted Intervention
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- SurgOnAir: Hierarchy-Aware Real-Time Surgical Video Commentary
  SurgOnAir introduces a streaming vision-language model trained on a hierarchical surgical dataset to generate real-time, multi-level narrations with explicit transition tokens.
- Hi-GaTA: Hierarchical Gated Temporal Aggregation Adapter for Surgical Video Report Generation
  Hi-GaTA is a hierarchical gated temporal aggregation adapter that uses short-to-long temporal pyramids and gated fusion to enable surgical video report generation, backed by a new 214-video benchmark and a surgical ViViT pretrained on 40,000 minutes of video.
- Multimodal Optimal Transport for Training-free Temporal Segmentation in Surgical Robotics
  TASOT performs annotation-free surgical temporal segmentation by extending ASOT with temporally aligned textual captions from a VLM fused into an unbalanced Gromov-Wasserstein optimal transport objective using DINOv3 and CLIP features, reporting F1 gains of +18.9 to +33.7 over zero-shot baselines on