General surgery vision transformer: A video pre-trained foundation model for general surgery
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- SurgCoT: Advancing Spatiotemporal Reasoning in Surgical Videos through a Chain-of-Thought Benchmark
  SurgCoT is a new benchmark that evaluates chain-of-thought spatiotemporal reasoning in multimodal large language models on surgical videos using five defined dimensions and an annotation protocol of Question-Option-Knowledge-Clue-Answer.
- Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge
  The FedSurg challenge benchmarks federated learning on appendectomy videos and finds only 26% F1 on unseen centers even with centralized data, plus extra penalties from decentralization, with spatiotemporal models performing best.
- SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos
  SurgMotion outperforms prior methods on 17 surgical video benchmarks by shifting pretraining to latent motion prediction with motion-guided masking, affinity distillation, and diversity regularization on a 15M-sample dataset.