General surgery vision transformer: A video pre-trained foundation model for general surgery

2024 · arXiv 2403.05949

3 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SurgCoT: Advancing Spatiotemporal Reasoning in Surgical Videos through a Chain-of-Thought Benchmark

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

SurgCoT is a new benchmark that evaluates chain-of-thought spatiotemporal reasoning in multimodal large language models on surgical videos using five defined dimensions and an annotation protocol of Question-Option-Knowledge-Clue-Answer.

Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge

cs.CV · 2025-10-06 · conditional · novelty 7.0

The FedSurg challenge benchmarks federated learning on appendectomy videos. It finds an F1 of only 26% on unseen centers even with centralized data, further losses from decentralization, and the best performance from spatiotemporal models.

SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos

cs.CV · 2026-02-05 · unverdicted · novelty 6.0

SurgMotion outperforms prior methods on 17 surgical video benchmarks by shifting pretraining to latent motion prediction with motion-guided masking, affinity distillation, and diversity regularization on a 15M-sample dataset.

citing papers explorer

Showing 3 of 3 citing papers.

SurgCoT: Advancing Spatiotemporal Reasoning in Surgical Videos through a Chain-of-Thought Benchmark

cs.CV · 2026-04-22 · unverdicted · none · ref 35

Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge

cs.CV · 2025-10-06 · conditional · none · ref 37

SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos

cs.CV · 2026-02-05 · unverdicted · none · ref 6
