SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge

An Wang; Attaullah Khan; Bo Wang; Cong Gao; Estevao Lima; Eung-Joo Lee; Hao Ding; Ho-Gun Ha; Hongchao Shu; Hongliang Ren

arxiv: 2407.11906 · v3 · submitted 2024-07-16 · 💻 cs.CV · cs.RO


Hao Ding, Yuqian Zhang, Tuxun Lu, Ruixing Liang, Hongchao Shu, Lalithkumar Seenivasan, Yonghao Long, Qi Dou, Cong Gao, Yicheng Leng, Seok Bong Yoo, Eung-Joo Lee, Negin Ghamsarian, Klaus Schoeffmann, Raphael Sznitman, Zijian Wu, Yuxin Chen, Septimiu E. Salcudean, Samra Irshad, Shadi Albarqouni, Seong Tae Kim, Yueyi Sun, An Wang, Long Bai, Hongliang Ren, Ihsan Ullah, Ho-Gun Ha, Attaullah Khan, Hyunki Lee, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Sita Tailor, Ricardo Sanchez-Matilla, Imanol Luengo, Tianhao Fu, Jun Ma, Bo Wang, Marcos Fernández-Rodríguez, Estevao Lima, João L. Vilaça, Mathias Unberath


Pith reviewed 2026-05-23 22:42 UTC · model grok-4.3


keywords: surgical tool segmentation, model robustness, non-adversarial corruptions, challenge benchmark, endoscopic images, binary segmentation, deep neural networks


The pith

A new benchmark with paired clean and corrupted surgical images shows that prior knowledge and custom training improve tool segmentation robustness to bleeding, smoke, and low brightness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the SegSTRONG-C challenge to measure how surgical tool segmentation models degrade under plausible non-adversarial corruptions and to identify methods that resist them. It supplies a dataset of paired clean and corrupted samples created through counterfactual robotic replay so that models trained only on clean data can be tested on corrupted versions. Top submissions reach high scores by drawing on prior knowledge, tailored training, and architecture choices. This setup matters because it isolates the effect of realistic corruptions that appear in surgery without adversarial intent. The results also flag that most gains still come from conventional techniques and call for fresh approaches to achieve wider robustness.

Core claim

The SegSTRONG-C challenge supplies paired clean and corrupted endoscopic images for the binary robot tool segmentation task, with corruptions generated through counterfactual robotic replay. Participants train on the clean domain and are evaluated on unreleased test sets containing bleeding, smoke, and low brightness. The leading entries attain an average DSC of 0.9394 and an average NSD of 0.9301. These outcomes show that prior knowledge, customized training strategies, and architectural decisions can be leveraged to improve robustness. The challenge also surfaces recurring failure modes and concludes that conventional techniques remain limited, advocating new paradigms for universal robustness to unforeseen corruptions.
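The two scores quoted above are the Dice similarity coefficient (DSC), which measures region overlap, and the normalized surface distance (NSD), which measures how much of each predicted and reference boundary lies within a tolerance of the other. As a rough illustration for a single binary tool mask, here is a minimal NumPy sketch. It is not the challenge's official evaluation code: the pixel tolerance `tau`, the empty-mask conventions, and the 4-connected pixel boundary are assumptions chosen for simplicity.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def boundary(mask: np.ndarray) -> np.ndarray:
    """(row, col) coordinates of foreground pixels with a 4-connected background neighbour."""
    m = mask.astype(bool)
    p = np.pad(m, 1, constant_values=False)
    # A pixel is interior only if its up, down, left and right neighbours are all foreground.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return np.argwhere(m & ~interior)


def nsd(pred: np.ndarray, gt: np.ndarray, tau: float = 2.0) -> float:
    """Normalized surface distance: share of boundary pixels within `tau` pixels of the other boundary."""
    bp, bg = boundary(pred), boundary(gt)
    if len(bp) == 0 and len(bg) == 0:
        return 1.0  # assumed convention for two empty masks
    if len(bp) == 0 or len(bg) == 0:
        return 0.0
    # Brute-force pairwise Euclidean distances; fine for small masks only.
    d = np.linalg.norm(bp[:, None, :] - bg[None, :, :], axis=-1)
    pred_ok = (d.min(axis=1) <= tau).sum()
    gt_ok = (d.min(axis=0) <= tau).sum()
    return float((pred_ok + gt_ok) / (len(bp) + len(bg)))
```

For full-resolution endoscopic frames, a distance-transform-based boundary search would be preferable to the quadratic pairwise distance matrix used here, and the tolerance would normally be set by the benchmark rather than chosen ad hoc.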

What carries the argument

The paired clean-corrupted dataset generated through counterfactual robotic replay, which enables reproducible testing of models trained on uncorrupted data against non-adversarial corruptions.
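Because each corrupted frame has a clean counterpart showing the same tool motion, a score drop on that frame can be attributed to the corruption rather than to a change of scene. The sketch below assumes per-frame DSC values have already been computed and stored in a plain dictionary keyed by domain name and then by frame identifier; this layout, the domain labels, and the function name `paired_robustness_report` are hypothetical, and averaging over frames is an assumption rather than the challenge's documented aggregation rule.

```python
from statistics import mean
from typing import Dict

Scores = Dict[str, Dict[str, float]]  # hypothetical layout: domain -> {frame_id: DSC}


def paired_robustness_report(scores: Scores, clean_key: str = "clean") -> Dict[str, Dict[str, float]]:
    """Compare every corrupted domain with the clean domain on the frames they share."""
    clean = scores[clean_key]
    report: Dict[str, Dict[str, float]] = {}
    for domain, per_frame in scores.items():
        if domain == clean_key:
            continue
        # Restrict to paired frames so scene content and tool pose are held fixed.
        shared = sorted(set(per_frame) & set(clean))
        if not shared:
            continue
        clean_mean = mean(clean[f] for f in shared)
        corrupted_mean = mean(per_frame[f] for f in shared)
        report[domain] = {
            "clean": clean_mean,
            "corrupted": corrupted_mean,
            "drop": clean_mean - corrupted_mean,
        }
    return report
```

Frames that lack a clean counterpart are skipped rather than compared against an unpaired baseline, which is the point of having paired data in the first place.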

If this is right

Models trained solely on clean data can still perform well on corrupted domains when prior knowledge and custom strategies are applied.
Architectural choices contribute measurably to accuracy under the tested corruption types.
Most successful entries rely on established techniques that carry known limits for handling unforeseen corruptions.
Further gains in surgical data science will require approaches beyond current conventional methods.
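Photometric data augmentation is the most familiar of these established techniques. The sketch below is a hypothetical illustration, not any participant's pipeline and not the challenge's corruption generator (the dataset's corruptions come from counterfactual robotic replay): it darkens an RGB image with a gamma curve and a gain, or blends it toward a uniform grey veil as a crude stand-in for smoke. All parameter ranges and the haze colour are assumptions.

```python
import numpy as np


def darken(img: np.ndarray, gamma: float, gain: float) -> np.ndarray:
    """Low-brightness stand-in: gamma > 1 suppresses mid-tones, gain < 1 scales intensities down."""
    return np.clip(gain * np.power(img, gamma), 0.0, 1.0)


def add_haze(img: np.ndarray, alpha: float, haze_rgb=(0.85, 0.85, 0.85)) -> np.ndarray:
    """Smoke stand-in: alpha-blend every pixel toward a uniform grey veil."""
    haze = np.asarray(haze_rgb, dtype=img.dtype)
    return (1.0 - alpha) * img + alpha * haze


def random_photometric_corruption(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Leave the image unchanged, darken it, or haze it, each with equal probability.

    `img` is assumed to be float RGB in [0, 1] with shape (H, W, 3).
    """
    choice = rng.integers(3)
    if choice == 0:
        return img
    if choice == 1:
        return darken(img, gamma=rng.uniform(1.5, 3.0), gain=rng.uniform(0.3, 0.7))
    return add_haze(img, alpha=rng.uniform(0.2, 0.6))
```

Because these transforms change only pixel intensities and never move pixels, the ground-truth tool mask can be reused unchanged for the augmented image.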

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The paired structure could support training methods that explicitly enforce invariance to these specific corruptions.
Results from this benchmark may inform robustness evaluation in other endoscopic or medical imaging tasks.
Additional corruption types encountered in actual procedures could be added to increase coverage.
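One way to read the first extension: if clean and corrupted views of the same frame were both available during training, a loss could penalize disagreement between the two predictions. The NumPy sketch below only evaluates such a loss on probability maps; a real training loop would express it in an autodiff framework. The soft Dice term, the squared-difference consistency term, and the weight `lam` are assumptions, and training on corrupted data would fall outside the SegSTRONG-C protocol, in which participants train on clean data only.

```python
import numpy as np


def soft_dice_loss(prob: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """1 minus soft Dice between a foreground probability map and a binary target."""
    inter = (prob * target).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps))


def paired_invariance_loss(
    prob_clean: np.ndarray,
    prob_corrupt: np.ndarray,
    target: np.ndarray,
    lam: float = 0.5,
) -> float:
    """Supervised soft Dice on both views plus a penalty on clean/corrupted disagreement."""
    supervised = soft_dice_loss(prob_clean, target) + soft_dice_loss(prob_corrupt, target)
    # Mean squared difference between the two predictions for the same (paired) frame.
    consistency = float(np.mean((prob_clean - prob_corrupt) ** 2))
    return supervised + lam * consistency
```

The consistency term needs no labels, so in principle it could also be applied to paired frames whose masks are unavailable.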

Load-bearing premise

The corruptions produced by counterfactual robotic replay match the non-adversarial corruptions that occur in real surgical procedures.

What would settle it

A direct comparison of the same models on the challenge's generated corruptions versus on naturally occurring corruptions recorded during live surgery would show whether performance transfers.
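Such a comparison could be summarized by checking whether models keep the same ranking on both kinds of corruption. A minimal sketch, assuming one mean DSC per model on each test source: Spearman's rank correlation computed as the Pearson correlation of ranks, which is valid here only because ties are assumed absent. The scores in the example are invented placeholders, not results from the paper.

```python
import numpy as np


def ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks; assumes no ties (tied values would need averaged ranks)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(1, len(x) + 1)
    return r


def spearman_rho(a, b) -> float:
    """Spearman rank correlation as the Pearson correlation of the two rank vectors."""
    ra = ranks(np.asarray(a, dtype=float))
    rb = ranks(np.asarray(b, dtype=float))
    return float(np.corrcoef(ra, rb)[0, 1])


# Invented placeholder scores for four hypothetical models.
dsc_generated = [0.94, 0.90, 0.85, 0.80]
dsc_natural = [0.88, 0.86, 0.70, 0.75]
rho = spearman_rho(dsc_generated, dsc_natural)  # about 0.8: the two weakest models swap places
mean_gap = float(np.mean(np.subtract(dsc_generated, dsc_natural)))
```

A high correlation alone would say the benchmark orders models correctly; the mean gap says whether absolute scores transfer, so both are worth reporting.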

Original abstract

Surgical data science has seen rapid advancement with the excellent performance of end-to-end deep neural networks (DNNs). Despite their successes, DNNs have been proven susceptible to minor "corruptions," introducing a major concern for the translation of cutting-edge technology, especially in high-stakes scenarios. We introduce the SegSTRONG-C challenge dedicated to better understanding model deterioration under unforeseen but plausible non-adversarial "corruption" and the capabilities of contemporary methods that seek to improve it. Built on a dataset generated through counterfactual robotic replay, SegSTRONG-C provides paired clean and "corrupted" samples, enabling reproducible evaluation of model robustness. Participants are challenged to train tool segmentation algorithms on "uncorrupted" data and evaluate them on "corrupted" test domains for the binary robot tool segmentation task. Through comprehensive baseline experiments and participating submissions from widespread community engagement, SegSTRONG-C reveals key themes for model failure and identifies promising directions for improving robustness. The challenge winners achieved an average 0.9394 DSC and 0.9301 NSD across the unreleased test sets, which cover the "corruption" types bleeding, smoke, and low brightness. This highlights how prior knowledge, customized training strategies, and architectural choices can be leveraged to improve robustness. In conclusion, the SegSTRONG-C challenge has identified practical approaches for enhancing model robustness. However, most approaches rely on conventional techniques that have known limitations. Looking ahead, we advocate for expanding intellectual diversity and creativity in non-adversarial robustness beyond data augmentation, calling for new paradigms that enhance universal robustness to unforeseen "corruptions" to facilitate richer applications in surgical data science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A new benchmark challenge using robotic-replay paired data for surgical tool segmentation under bleeding, smoke, and low-brightness corruptions, with community winners at 0.94 DSC, but no validation that the generated corruptions match real OR distributions.


The paper introduces SegSTRONG-C, a challenge that supplies paired clean and corrupted surgical video for tool segmentation, generated via counterfactual robotic replay. Winners from external submissions hit 0.9394 DSC and 0.9301 NSD on hidden test sets covering bleeding, smoke, and low brightness. That is the core deliverable: a reproducible testbed focused on non-adversarial corruptions in a high-stakes medical setting, plus some empirical evidence that prior knowledge and training tweaks can lift performance over plain baselines.

Referee Report

1 major / 0 minor

Summary. The paper introduces the SegSTRONG-C EndoVis'24 challenge for binary robot tool segmentation under non-adversarial corruptions (bleeding, smoke, low brightness) generated via counterfactual robotic replay, supplying paired clean/corrupted data. It reports baseline experiments plus community submissions, with winners reaching average DSC 0.9394 and NSD 0.9301 on unreleased test sets, and identifies themes for model failure while advocating new robustness paradigms beyond conventional augmentation.

Significance. If the generated corruptions are representative of real surgical conditions, the challenge supplies a reproducible benchmark that empirically demonstrates how prior knowledge, training strategies, and architecture choices can yield high robustness on the specified corruptions, thereby guiding practical improvements in surgical data science.

major comments (1)

[Abstract] Abstract and dataset description: the positioning of the corruptions as 'plausible non-adversarial' and relevant to real OR conditions is not supported by any quantitative validation (feature-space distances, perceptual metrics, or clinician ratings) that the counterfactual robotic replay preserves the statistics causing model failure in live procedures. This assumption is load-bearing for interpreting the reported DSC/NSD scores as evidence of robustness to clinically meaningful corruptions.
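One of the feature-space checks the referee asks for can be sketched as the Fréchet distance between Gaussian fits of two feature sets, the statistic behind FID. The sketch assumes features have already been extracted (for example by a frozen pretrained encoder, which is an unspecified assumption here) for generated-corruption frames and for real intraoperative frames showing the same corruption type; nothing like this is reported in the paper. The matrix square root uses a symmetric eigendecomposition so that only NumPy is needed.

```python
import numpy as np


def _sqrtm_psd(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    m = 0.5 * (m + m.T)  # remove tiny asymmetries from round-off
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # clamp small negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    sqrt_a = _sqrtm_psd(cov_a)
    # Tr((A B)^{1/2}) equals Tr((A^{1/2} B A^{1/2})^{1/2}); the latter argument is symmetric PSD.
    cross = _sqrtm_psd(sqrt_a @ cov_b @ sqrt_a)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))
```

Lower values mean closer feature statistics. With few samples relative to the feature dimension, the covariance estimates, and therefore the distance, become noisy, so any such comparison would need enough real corrupted frames to be meaningful.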

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment on the positioning of the generated corruptions. We address this point directly below and outline the planned revisions.


Referee: [Abstract] Abstract and dataset description: the positioning of the corruptions as 'plausible non-adversarial' and relevant to real OR conditions is not supported by any quantitative validation (feature-space distances, perceptual metrics, or clinician ratings) that the counterfactual robotic replay preserves the statistics causing model failure in live procedures. This assumption is load-bearing for interpreting the reported DSC/NSD scores as evidence of robustness to clinically meaningful corruptions.

Authors: We agree that the manuscript does not provide quantitative validation (e.g., feature-space distances, perceptual metrics, or clinician ratings) demonstrating that the counterfactual robotic replay corruptions preserve the exact statistics of model failures observed in live procedures. The generation process relies on replaying robotic trajectories with added visual effects (bleeding, smoke, low brightness) to create paired clean/corrupted samples, which we positioned as plausible non-adversarial corruptions based on the method's design. However, this remains an unvalidated assumption. In the revised manuscript we will (1) tone down the abstract and dataset description to describe the corruptions as 'synthetically generated to simulate common non-adversarial effects' rather than asserting clinical representativeness, (2) add an explicit limitations paragraph discussing the lack of such validation and its implications for interpreting the DSC/NSD scores, and (3) note this as an important direction for future work. These textual changes will make the claims more precise without requiring new experiments. revision: yes

Circularity Check

0 steps flagged

Empirical challenge report with no derivations or fitted predictions


The paper is a report on an EndoVis'24 segmentation challenge. It describes a dataset of paired clean/corrupted images generated via counterfactual robotic replay, reports baseline and community-submitted DSC/NSD scores on unreleased test sets, and discusses practical robustness strategies. No equations, first-principles derivations, parameter fittings, or predictions are present. The central claims are empirical observations from community results, not reductions of outputs to inputs by construction. Self-citations are limited to prior challenge organization and do not bear load on any claimed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmark and challenge paper without mathematical derivations. No free parameters, axioms, or invented entities are introduced; the contribution rests on the new dataset construction and evaluation protocol applied to existing segmentation networks.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards Robust Surgical Automation via Digital Twin Representations from Foundation Models
cs.RO 2024-09 unverdicted novelty 5.0

Digital twin representations from vision foundation models enable LLM-based planning for robust peg transfer and gauze retrieval on the dVRK surgical platform with claimed generalizability.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · cited by 1 Pith paper · 9 internal anchors

[1] Ding, H., Lian, X., Unberath, M.: Mosformer: Augmenting temporal context with memory of surgery for surgical phase recognition. arXiv preprint arXiv:2503.00695 (2025)

[2] Shen, Y., Li, C., Liu, B., Li, C.-Y., Porras, T., Unberath, M.: Operating room workflow analysis via reasoning segmentation over digital twins. arXiv preprint arXiv:2503.21054 (2025)

[3] Ding, H., Gao, Z., Planche, B., Luan, T., Sharma, A., Zheng, M., Lou, A., Chen, T., Unberath, M., Wu, Z.: Neural finite-state machines for surgical phase recognition. arXiv preprint arXiv:2411.18018 (2024)

[4] Ding, H., Zhang, Y., Shu, H., Lian, X., Kim, J.W., Krieger, A., Unberath, M.: Towards robust algorithms for surgical phase recognition via digital twin-based scene representation. arXiv preprint arXiv:2410.20026 (2024)

[5] Ghamsarian, N., Taschwer, M., Putzgruber-Adamitsch, D., Sarny, S., Schoeffmann, K.: Relevance detection in cataract surgery videos by spatio-temporal action localization. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10720–10727 (2021). IEEE

[6] Liu, D., Li, Q., Jiang, T., Wang, Y., Miao, R., Shan, F., Li, Z.: Towards unified surgical skill assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9522–9531 (2021)

[7] Lavanchy, J.L., Zindel, J., Kirtac, K., Twick, I., Hosgor, E., Candinas, D., Beldi, G.: Automation of surgical skill assessment using a three-stage machine learning algorithm. Scientific Reports 11(1), 5197 (2021)

[8] Shu, H., Liu, M., Seenivasan, L., Gu, S., Ku, P.-C., Knopf, J., Taylor, R., Unberath, M.: Seamless augmented reality integration in arthroscopy: a pipeline for articular reconstruction and guidance. Healthcare Technology Letters 12(1), 12119 (2025)

[9] Killeen, B.D., Zhang, H., Wang, L.J., Liu, Z., Kleinbeck, C., Rosen, M., Taylor, R.H., Osgood, G., Unberath, M.: Stand in surgeon’s shoes: virtual reality cross-training to enhance teamwork in surgery. International Journal of Computer Assisted Radiology and Surgery 19(6), 1213–1222 (2024)

[10] Zhang, H., Killeen, B.D., Ku, Y.-C., Seenivasan, L., Zhao, Y., Liu, M., Yang, Y., Gu, S., Martin-Gomez, A., Osgood, G., et al.: Straighttrack: Towards mixed reality navigation system for percutaneous k-wire insertion. Healthcare Technology Letters 11(6), 355–364 (2024)

[11] Kleinbeck, C., Zhang, H., Killeen, B.D., Roth, D., Unberath, M.: Neural digital twins: reconstructing complex medical environments for spatial planning in virtual reality. International Journal of Computer Assisted Radiology and Surgery 19(7), 1301–1312 (2024)

[12] Gu, W., Knopf, J., Cast, J., Higgins, L.D., Knopf, D., Unberath, M.: Nail it! vision-based drift correction for accurate mixed reality surgical guidance. International Journal of Computer Assisted Radiology and Surgery 18(7), 1235–1243 (2023)

[13] Ding, H., Seenivasan, L., Shu, H., Byrd, G., Zhang, H., Xiao, P., Barragan, J.A., Taylor, R.H., Kazanzides, P., Unberath, M.: Towards robust automation of surgical systems via digital twin-based scene representations from foundation models. arXiv preprint arXiv:2409.13107 (2024)

[14] Kim, J.W., Zhao, T.Z., Schmidgall, S., Deguet, A., Kobilarov, M., Finn, C., Krieger, A.: Surgical robot transformer (srt): Imitation learning for surgical tasks. arXiv preprint arXiv:2407.12998 (2024)

[15] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

[16] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pp. 234–241 (2015). Springer

[17] Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818 (2018)

[19] Seenivasan, L., Mitheran, S., Islam, M., Ren, H.: Global-reasoned multi-task learning model for surgical scene understanding. IEEE Robotics and Automation Letters 7(2), 3858–3865 (2022)

[20] Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6881–6890 (2021)

[21] Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems 34, 12077–12090 (2021)

[22] Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., Girdhar, R.: Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1290–1299 (2022)

[23] Ding, H., Qiao, S., Yuille, A., Shen, W.: Deeply shape-guided cascade for instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8278–8288 (2021)

[24] Allan, M., Kondo, S., Bodenstedt, S., Leger, S., Kadkhodamohammadi, R., Luengo, I., Fuentes, F., Flouty, E., Mohammed, A., Pedersen, M., et al.: 2018 robotic scene segmentation challenge. arXiv preprint arXiv:2001.11190 (2020)

[25] Allan, M., Shvets, A., Kurmann, T., Zhang, Z., Duggal, R., Su, Y.-H., Rieke, N., Laina, I., Kalavakonda, N., Bodenstedt, S., et al.: 2017 robotic instrument segmentation challenge. arXiv preprint arXiv:1902.06426 (2019)

[26] Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. International Journal of Computer Vision 88, 303–338 (2010)

[27] Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A., Torralba, A.: Semantic understanding of scenes through the ade20k dataset. International Journal of Computer Vision 127(3), 302–321 (2019)

[28] Ghamsarian, N., Gamazo Tejero, J., Márquez-Neila, P., Wolf, S., Zinkernagel, M., Schoeffmann, K., Sznitman, R.: Domain adaptation for medical image segmentation using transformation-invariant self-training. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 331–341 (2023). Springer

[29] Drenkow, N., Sani, N., Shpitser, I., Unberath, M.: A systematic review of robustness in deep learning for computer vision: Mind the gap? arXiv preprint arXiv:2112.00639 (2021)

[30] Nasirihaghighi, S., Ghamsarian, N., Sznitman, R., Schoeffmann, K.: Dual invariance self-training for reliable semi-supervised surgical phase recognition. arXiv preprint arXiv:2501.17628 (2025)

[31] Hendrycks, D., Dietterich, T.: Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261 (2019)

[32] Drenkow, N., Ribaudo, C., Unberath, M.: Causality-driven audits of model robustness. arXiv preprint arXiv:2410.23494 (2024)

[33] Drenkow, N., Pavlak, M., Harrigian, K., Zirikly, A., Subbaswamy, A., Unberath, M.: Detecting dataset bias in medical ai: A generalized and modality-agnostic auditing framework. arXiv preprint arXiv:2503.09969 (2025)

[34] Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: Autoaugment: Learning augmentation strategies from data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 113–123 (2019)

[35] Ravi, N., Gabeur, V., Hu, Y.-T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., et al.: Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714 (2024)

[36] Shen, Y., Ding, H., Shao, X., Unberath, M.: Performance and non-adversarial robustness of the segment anything model 2 in surgical video segmentation. arXiv preprint arXiv:2408.04098 (2024)

[37] Seenivasan, L., Islam, M., Ng, C.-F., Lim, C.M., Ren, H.: Biomimetic incremental domain generalization with a graph network for surgical scene understanding. Biomimetics 7(2), 68 (2022)

[38] Reiter, W.: Domain generalization improves end-to-end object detection for real-time surgical tool detection. International Journal of Computer Assisted Radiology and Surgery 18(5), 939–944 (2023)

[39] Philipp, M., Alperovich, A., Gutt-Will, M., Mathis, A., Saur, S., Raabe, A., Mathis-Ullrich, F.: Dynamic cnns using uncertainty to overcome domain generalization for surgical instrument localization. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3612–3621 (2022)

[40] Ding, H., Zhang, J., Kazanzides, P., Wu, J.Y., Unberath, M.: Carts: Causality-driven robot tool segmentation from vision and kinematics data. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 387–398 (2022). Springer

[41] Ding, H., Wu, J.Y., Li, Z., Unberath, M.: Rethinking causality-driven robot tool segmentation with temporal constraints. International Journal of Computer Assisted Radiology and Surgery 18(6), 1009–1016 (2023)

[42] Shen, Y., Liu, B., Li, C., Seenivasan, L., Unberath, M.: Online reasoning video segmentation with just-in-time digital twins. arXiv preprint arXiv:2503.21056 (2025)

[43] Kazanzides, P., Chen, Z., Deguet, A., Fischer, G.S., Taylor, R.H., DiMaio, S.P.: An open-source research kit for the da vinci® surgical system. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 6434–6439 (2014). IEEE

[44] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.-Y., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)

[45] KaewTraKulPong, P., Bowden, R.: An improved adaptive background mixture model for real-time tracking with shadow detection. Video-based surveillance systems: Computer vision and distributed processing, 135–144 (2002)

[46] Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)

[47] Jamal, M.A., Mohareri, O.: Rethinking rgb-d fusion for semantic segmentation in surgical datasets. arXiv preprint arXiv:2407.19714 (2024)

[48] Yin, B., Zhang, X., Li, Z., Liu, L., Cheng, M.-M., Hou, Q.: Dformer: Rethinking rgbd representation learning for semantic segmentation. arXiv preprint arXiv:2309.09668 (2023)

[49] Yang, L., Kang, B., Huang, Z., Xu, X., Feng, J., Zhao, H.: Depth anything: Unleashing the power of large-scale unlabeled data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10371–10381 (2024)

[50] Lipson, L., Teed, Z., Deng, J.: Raft-stereo: Multilevel recurrent field transforms for stereo matching. In: 2021 International Conference on 3D Vision (3DV), pp. 218–227 (2021). IEEE

[51] Kar, O.F., Yeo, T., Atanov, A., Zamir, A.: 3d common corruptions and data augmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18963–18974 (2022)

[52] Gu, A., Dao, T.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023)

[53] Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Fluids Eng. (1960)

[54] Ma, J., Li, F., Wang, B.: U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722 (2024)

[55] Xing, Z., Ye, T., Yang, Y., Liu, G., Zhu, L.: Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. arXiv preprint arXiv:2401.13560 (2024)

[56] Liu, Y., Tian, Y., Zhao, Y., Yu, H., Xie, L., Wang, Y., Ye, Q., Liu, Y.: Vmamba: Visual state space model. arXiv preprint arXiv:2401.10166 (2024)

[57] Choi, Y., Uh, Y., Yoo, J., Ha, J.-W.: Stargan v2: Diverse image synthesis for multiple domains. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8188–8197 (2020)

[58] Garcia-Peraza-Herrera, L.C., Fidon, L., D’Ettorre, C., Stoyanov, D., Vercauteren, T., Ourselin, S.: Image compositing for segmentation of surgical tools without manual annotations. IEEE Transactions on Medical Imaging 40(5), 1450–1460 (2021)

[59] Cheng, T., Song, L., Ge, Y., Liu, W., Wang, X., Shan, Y.: Yolo-world: Real-time open-vocabulary object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16901–16911 (2024)

[60] Guo, M.-H., Liu, Z.-N., Mu, T.-J., Hu, S.-M.: Beyond self-attention: External attention using two linear layers for visual tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(5), 5436–5447 (2022)

[61] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)

[62] Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: Cbam: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)

work page 2018
[63]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Howard, A.G.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[64]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Woo, S., Debnath, S., Hu, R., Chen, X., Liu, Z., Kweon, I.S., Xie, S.: Convnext v2: Co-designing and scaling convnets with masked autoencoders. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16133–16142 (2023)

[65] Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)

[66] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin transformer v2: Scaling up capacity and resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12009–12019 (2022)

[67] Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Computer Vision (ICCV), 2017 IEEE International Conference On (2017)

[68] Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021) https://doi.org/10.1038/s41592-020-01008-z

[69] Ghamsarian, N., Taschwer, M., Putzgruber-Adamitsch, D., Sarny, S., El-Shabrawi, Y., Schöffmann, K.: Recal-net: Joint region-channel-wise calibrated network for semantic segmentation in cataract surgery videos. In: Neural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8–12, 2021, Proceedings, Part II...

[70] Ghamsarian, N., Taschwer, M., Putzgruber-Adamitsch, D., Sarny, S., El-Shabrawi, Y., Schoeffmann, K.: Lensid: a cnn-rnn-based framework towards lens irregularity detection in cataract surgery videos. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021...

[71] Ghamsarian, N., Wolf, S., Zinkernagel, M., Schoeffmann, K., Sznitman, R.: Deeppyramid+: medical image segmentation using pyramid view fusion and deformable pyramid reception. International Journal of Computer Assisted Radiology and Surgery, 1–9 (2024)

[72] Ghamsarian, N., Taschwer, M., Sznitman, R., Schoeffmann, K.: Deeppyramid: Enabling pyramid view and deformable pyramid reception for semantic segmentation in cataract surgery videos. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 276–286 (2022). Springer

