Medical Image Analysis 99, 103366 (2025)
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- SurgLQA: Scalable Long-Horizon Surgical Video Question Answering
  SurgLQA introduces FTC for compact long-range video representations and TMS for adaptive test-time scaling, reporting gains on restructured Colon-LQA and REAL-Colon-VQA benchmarks.
- Multimodal Optimal Transport for Training-free Temporal Segmentation in Surgical Robotics
  TASOT performs annotation-free surgical temporal segmentation by extending ASOT with temporally aligned textual captions from a VLM fused into an unbalanced Gromov-Wasserstein optimal transport objective using DINOv3 and CLIP features, reporting F1 gains of +18.9 to +33.7 over zero-shot baselines on
- Stabilizing Temporal Inference Dynamics for Online Surgical Phase Recognition
  A framework using Temporal Error-Cascade loss, Evidence-Gated Transition Predictor, and Temporal Fragmentation Index reduces temporal fragmentation in online surgical phase recognition on Cholec80 and AutoLaparo datasets.