SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

Fengyi Jiang, Xiaorui Zhang, Lingbo Jin, Ruixing Liang, Yuxin Chen, Adi Chola Venkatesh, Jason Culman, Tiantian Wu, Lirong Shao, Wenqing Sun, Cong Gao, Hallie McNamara, Jingpei Lu, Omid Mohareri

arxiv: 2507.00209 · v3 · submitted 2025-06-30 · eess.IV · cs.AI · cs.CV · cs.RO

Pith reviewed 2026-05-19 06:15 UTC · model grok-4.3

keywords: 4K endoscopic video · robotic surgery dataset · minimally invasive procedures · surgical computer vision · super resolution · instrument detection · depth estimation · smoke removal

The pith

SurgiSR4K introduces the first public native 4K resolution dataset for endoscopic videos in robotic-assisted surgery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes SurgiSR4K as a new resource that supplies endoscopic videos and images at native 4K resolution for research in robotic minimally invasive surgery. Prior datasets fell short because they lacked this resolution despite the widespread use of 4K systems in clinical practice. By including realistic elements such as reflections from tools, tissue bleeding, and deformations, the dataset targets the exact visual problems surgeons encounter. This matters because higher resolution data can improve the training of algorithms for key tasks including instrument detection, depth estimation, and smoke removal in surgical settings. Public release of the data allows broader development of intelligent systems to support safer image-guided procedures.

Core claim

We introduce SurgiSR4K, the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution, representing realistic conditions of robotic-assisted procedures. It comprises diverse visual scenarios including specular reflections, tool occlusions, bleeding, and soft tissue deformations, meticulously designed to reflect common challenges faced during laparoscopic and robotic surgeries.

What carries the argument

The SurgiSR4K dataset serves as the central object, providing high-resolution endoscopic data that enables a range of computer vision applications in surgery by supplying realistic examples of intraoperative visual conditions.

If this is right

Enables training of super-resolution models using actual surgical 4K data rather than simulated or lower-res sources.
Supports development of smoke removal techniques that handle high-resolution surgical scenes.
Facilitates accurate surgical instrument detection and instance segmentation with finer details available.
Allows for improved 3D tissue reconstruction, monocular depth estimation, and novel view synthesis from endoscopic views.
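
As a rough illustration of the super-resolution use case in the first item, the sketch below pairs the native 3840×2160 frame size with the two downsampled sizes reported for the release (960×540 and 480×270) and derives the implied ×4 and ×8 factors. The box-filter downsampler and every function name here are assumptions made for illustration; this page does not say how the authors produced their low-resolution versions.

```python
from typing import List, Tuple

# Frame sizes quoted on this page; the pairing logic is an illustrative assumption.
NATIVE_4K = (3840, 2160)
LOW_RES_VARIANTS = [(960, 540), (480, 270)]


def scale_factor(hr: Tuple[int, int], lr: Tuple[int, int]) -> int:
    """Return the integer upscaling factor between a low-res and a high-res size."""
    fx, rx = divmod(hr[0], lr[0])
    fy, ry = divmod(hr[1], lr[1])
    if rx or ry or fx != fy:
        raise ValueError(f"{lr} is not a uniform integer downscale of {hr}")
    return fx


def box_downsample(img: List[List[float]], factor: int) -> List[List[float]]:
    """Downsample a grayscale image (list of rows) by averaging factor x factor blocks.

    Hypothetical stand-in for whatever resampling the dataset authors used.
    """
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [img[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Scale factors implied by the published sizes.
factors = [scale_factor(NATIVE_4K, lr) for lr in LOW_RES_VARIANTS]
print(factors)  # [4, 8]

# Tiny synthetic 8x8 "frame" so the example runs instantly.
toy = [[float(x + y) for x in range(8)] for y in range(8)]
small = box_downsample(toy, 4)
print(small)  # [[3.0, 7.0], [7.0, 11.0]]
```

A real pipeline would more likely use an image library's resampling filter than nested Python lists; the pure-Python version is used only to keep the sketch dependency-free.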

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could lead to studies measuring the performance gains from using 4K data over standard HD in surgical AI models.
Researchers in related fields like general endoscopy might adapt the dataset collection methods for their own high-res needs.
The dataset opens opportunities to explore how high resolution affects the training of vision-language models for surgical guidance.
Potential extension to real-time applications where the high-res data is processed during live robotic procedures.

Load-bearing premise

The scenarios and visual challenges in the dataset were designed to accurately reflect common conditions in actual laparoscopic and robotic surgeries.

What would settle it

The claim would be undermined if comparisons with real surgical footage showed that the dataset lacks sufficient examples of key challenges such as bleeding or tissue deformation, or if the footage turned out not to be native 4K in practice.

Figures

Figures reproduced from arXiv: 2507.00209 by Adi Chola Venkatesh, Cong Gao, Fengyi Jiang, Hallie McNamara, Jason Culman, Jingpei Lu, Lingbo Jin, Lirong Shao, Omid Mohareri, Ruixing Liang, Tiantian Wu, Wenqing Sun, Xiaorui Zhang, Yuxin Chen.

**Figure 2.** Example frames from the training dataset, showcasing various tools used in different scenarios. These …

**Figure 3.** Comparison of image quality at different resolutions. From left to right: native 4K image (3840 …

**Figure 4.** Frame-by-frame example of a 5-second video clip sampled at 2 fps, showing training data with a bipolar …

**Figure 5.** Each recorded sequence spans several minutes …

**Figure 5.** Experimental setup for SurgiSR4K data col…

**Figure 7.** Examples of downstream applications: (a) instance segmentation (Ravi et al. (2024)), (b) surgical tool …
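
The Figure 4 caption describes a 5-second clip sampled at 2 fps, which works out to 10 frames. The sketch below shows one way such frame indices could be computed. The native frame rate of 60 fps and the function name are assumptions for illustration only; the page does not state the capture frame rate.

```python
from typing import List


def sample_frame_indices(duration_s: float, native_fps: float, target_fps: float) -> List[int]:
    """Return indices of source frames to keep when subsampling to target_fps."""
    if target_fps <= 0 or native_fps < target_fps:
        raise ValueError("target_fps must be positive and no greater than native_fps")
    n_out = int(duration_s * target_fps)   # number of frames to keep
    step = native_fps / target_fps         # source frames per kept frame
    return [int(round(i * step)) for i in range(n_out)]


# 5-second clip at 2 fps, as in the caption; 60 fps native rate is hypothetical.
indices = sample_frame_indices(duration_s=5, native_fps=60, target_fps=2)
print(len(indices))  # 10
print(indices)       # [0, 30, 60, 90, 120, 150, 180, 210, 240, 270]
```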

Original abstract

High-resolution imaging is crucial for enhancing visual clarity and enabling precise computer-assisted guidance in minimally invasive surgery (MIS). Despite the increasing adoption of 4K endoscopic systems, there remains a significant gap in publicly available native 4K datasets tailored specifically for robotic-assisted MIS. We introduce SurgiSR4K, the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution, representing realistic conditions of robotic-assisted procedures. SurgiSR4K comprises diverse visual scenarios including specular reflections, tool occlusions, bleeding, and soft tissue deformations, meticulously designed to reflect common challenges faced during laparoscopic and robotic surgeries. This dataset opens up possibilities for a broad range of computer vision tasks that might benefit from high resolution data, such as super resolution (SR), smoke removal, surgical instrument detection, 3D tissue reconstruction, monocular depth estimation, instance segmentation, novel view synthesis, and vision-language model (VLM) development. SurgiSR4K provides a robust foundation for advancing research in high-resolution surgical imaging and fosters the development of intelligent imaging technologies aimed at enhancing performance, safety, and usability in image-guided robotic surgeries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SurgiSR4K releases a native 4K surgical video dataset for robotic MIS but its 'first public' claim needs explicit comparison to prior collections to stand.

This paper's core offering is a new public dataset of native 4K endoscopic videos captured during robotic-assisted procedures, with scenes that include specular reflections, tool occlusions, bleeding, and tissue deformations. The authors position it for downstream tasks such as super-resolution, smoke removal, depth estimation, and instrument segmentation in surgical settings. If the data is actually released and matches the described conditions, it supplies a higher-resolution resource than most existing endoscopic collections, which could help models that currently train on lower-res footage. The list of intended uses shows the authors considered how the data might be applied rather than just dumping raw video.

The main weakness is that the novelty claim rests on an unverified premise. The abstract states there is a significant gap in public native 4K data but supplies no table or citations reviewing prior datasets like EndoVis, Cholec80, or MICCAI releases and their resolutions or access status. Without that comparison, it is impossible to confirm whether this is truly the first such public release. Details on collection protocol, camera calibration, dataset size in frames or procedures, and consent are also absent from the provided text, which limits assessment of how realistic or usable the data actually is.

This work is aimed at computer vision researchers in medical robotics and image-guided surgery who need higher-resolution training material. A reader working on super-resolution or 3D reconstruction for MIS would find it relevant if the release is solid. The paper is coherent enough on its own terms to merit a serious referee rather than a desk reject, mainly to verify the data quality and prior-work review.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces SurgiSR4K as the first publicly accessible native 4K endoscopic video dataset for robotic-assisted minimally invasive surgery. It comprises diverse scenarios with challenges including specular reflections, tool occlusions, bleeding, and soft tissue deformations, positioned to support computer vision tasks such as super-resolution, smoke removal, instrument detection, 3D reconstruction, depth estimation, segmentation, novel view synthesis, and vision-language modeling.

Significance. If the dataset is released with full documentation and its novelty is substantiated, it could address a documented gap in high-resolution surgical imaging resources and enable improved algorithm development for image-guided robotic procedures. The absence of collection details and prior-work comparisons currently prevents assessment of whether this contribution is load-bearing or incremental.

major comments (2)

[Abstract] The central claim that SurgiSR4K is 'the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution' is unsupported by any enumeration or table comparing resolutions, access status, and release dates of prior datasets (e.g., EndoVis, Cholec80 variants, or MICCAI releases). This omission directly undermines the uniqueness assertion.
[Abstract] The assertion that scenarios were 'meticulously designed to reflect common challenges faced during laparoscopic and robotic surgeries' provides no information on collection protocol, camera calibration, patient consent, total dataset size (videos, frames, or procedures), or any validation that the captured scenes are realistic. These omissions are load-bearing for the dataset's claimed utility.

minor comments (1)

[Abstract] The abstract lists supported tasks but does not indicate how native 4K resolution specifically benefits each (e.g., quantitative gains in depth estimation or segmentation). Adding one sentence with expected advantages would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review of our manuscript on the SurgiSR4K dataset. The comments highlight important areas for strengthening the presentation of novelty and data collection details. We address each point below and will incorporate revisions to improve clarity and substantiation.

Referee: [Abstract] The central claim that SurgiSR4K is 'the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution' is unsupported by any enumeration or table comparing resolutions, access status, and release dates of prior datasets (e.g., EndoVis, Cholec80 variants, or MICCAI releases). This omission directly undermines the uniqueness assertion.

Authors: We agree that an explicit comparison would better support the novelty claim. In the revised manuscript, we will add a dedicated table in the introduction or related work section that enumerates prior surgical video datasets (including EndoVis, Cholec80 and its variants, and relevant MICCAI releases), detailing their resolutions, public accessibility status, release dates, and key characteristics. This will directly substantiate that no prior publicly available dataset provides native 4K resolution for robotic-assisted minimally invasive procedures.
revision: yes

Referee: [Abstract] The assertion that scenarios were 'meticulously designed to reflect common challenges faced during laparoscopic and robotic surgeries' provides no information on collection protocol, camera calibration, patient consent, total dataset size (videos, frames, or procedures), or any validation that the captured scenes are realistic. These omissions are load-bearing for the dataset's claimed utility.

Authors: We acknowledge that the current manuscript lacks sufficient detail on these aspects, which are important for assessing the dataset's realism and utility. In the revision, we will expand the methods or data description section to include: (1) the collection protocol and setup, (2) camera calibration procedures, (3) total dataset statistics (number of videos, frames, and procedures), and (4) any validation steps or expert review confirming that the scenarios reflect common surgical challenges. Regarding patient consent and ethics, we will clarify whether the data originates from real procedures, phantoms, or ex-vivo models and include relevant approvals or statements as appropriate.
revision: yes

Circularity Check

0 steps flagged

No circularity: dataset release without derivations or self-referential claims

This is a dataset paper whose value rests on the collection and public release of native 4K endoscopic video. No equations, predictions, fitted parameters, or derivation chains appear in the provided text. The claim of being the 'first publicly accessible' dataset is an empirical statement about the literature rather than a mathematical reduction to the paper's own inputs. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results are present. The paper is self-contained as a data contribution and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Dataset release papers rest primarily on empirical collection choices rather than mathematical axioms or new entities.

axioms (1)

domain assumption: A significant gap exists in publicly available native 4K datasets for robotic-assisted MIS.
Stated directly in the abstract as motivation for the work.

pith-pipeline@v0.9.0 · 5802 in / 1164 out tokens · 69783 ms · 2026-05-19T06:15:07.333038+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: We introduce SurgiSR4K, the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution... 800 high-quality 4K PNG images and 50 video clips... downsampled versions at 960x540p and 480x270p

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: Table 1: Comparison of public endoscopic and surgical datasets... SurgiSR4K (2025)* ... 3840×2160p SR, seg, det

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 5 internal anchors

[1]

2017 Robotic Instrument Segmentation Challenge

Max Allan, Alex Shvets, Thomas Kurmann, Zichen Zhang, Rahul Duggal, Yun-Hsuan Su, Nicola Rieke, Iro Laina, Niveditha Kalavakonda, Sebastian Bodenstedt, et al. 2017 robotic instrument segmentation challenge. arXiv preprint arXiv:1902.06426,

[2]

2018 robotic scene segmentation challenge

Max Allan, Satoshi Kondo, Sebastian Bodenstedt, Stefan Leger, Rahim Kadkhodamohammadi, Imanol Luengo, Felix Fuentes, Evangello Flouty, Ahmed Mohammed, Marius Pedersen, et al. 2018 robotic scene segmentation challenge. arXiv preprint arXiv:2001.11190,

[3]

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R Richter, and Vladlen Koltun. Depth pro: Sharp monocular metric depth in less than a second. arXiv preprint arXiv:2410.02073,

[4]

Depthcrafter: Generating consistent long depth sequences for open-world videos

Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, and Ying Shan. Depthcrafter: Generating consistent long depth sequences for open-world videos. In Proceedings of the Computer Vision and Pattern Recognition Confe...

[5]

Visual-RFT: Visual Reinforcement Fine-Tuning

Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, and Jiaqi Wang. Visual-rft: Visual reinforcement fine-tuning. arXiv preprint arXiv:2503.01785,

[6]

View synthesis of endoscope images by monocular depth prediction and gaussian splatting

Takeshi Masuda, Ryusuke Sagawa, Ryo Furukawa, and Hiroshi Kawasaki. View synthesis of endoscope images by monocular depth prediction and gaussian splatting. In 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1–6,

[7]

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714,

[8]

Point tracking in surgery–the 2024 surgical tattoos in infrared (stir) challenge

Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jonáš Šerých, et al. Point tracking in surgery–the 2024 surgical tattoos in infrared (stir) challenge. arXiv preprint arXiv:2503.24306,

[9]

Endonet: a deep architecture for recognition tasks on laparoscopic videos

Andru P Twinanda, Sherif Shehata, Didier Mutter, Jacques Marescaux, Michel De Mathelin, and Nicolas Padoy. Endonet: a deep architecture for recognition tasks on laparoscopic videos. IEEE transactions on medical imaging, 36(1):86–97,

[10]

Real-esrgan: Training real-world blind super-resolution with pure synthetic data

Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1905–1914,

[11]

Transformer with hybrid attention mechanism for stereo endoscopic video super-resolution

Tian Zhang and Jingru Yang. Transformer with hybrid attention mechanism for stereo endoscopic video super-resolution. Symmetry, 15(10):1947,

[12]

Feasibility study of using augmented mirrors for alignment task during orthopaedic procedures in mixed reality

Xiaorui Zhang, Andreas Keller, Mehran Armand, and Alejandro Martin Gomez. Feasibility study of using augmented mirrors for alignment task during orthopaedic procedures in mixed reality. In 2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pages 650–651. IEEE,

[13]

Surgical Visual Understanding (SurgVU) Dataset

Aneeq Zia, Max Berniker, Rogerio Nespolo, Conor Perreault, Ziheng Wang, Benjamin Mueller, Ryan Schmidt, Kiran Bhattacharyya, Xi Liu, and Anthony Jarc. Surgical visual understanding (surgvu) dataset. arXiv preprint arXiv:2501.09209,

