SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Pith reviewed 2026-05-23 22:42 UTC · model grok-4.3
The pith
A new benchmark with paired clean and corrupted surgical images shows that prior knowledge and custom training improve tool segmentation robustness to bleeding, smoke, and low brightness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The SegSTRONG-C challenge supplies paired clean and corrupted endoscopic images for the binary robot tool segmentation task, with corruptions generated through counterfactual robotic replay. Participants train on the clean domain and are evaluated on unreleased test sets containing bleeding, smoke, and low brightness. The leading entries attain an average 0.9394 DSC and 0.9301 NSD. These outcomes demonstrate that prior knowledge, customized training strategies, and architectural decisions can be leveraged to improve robustness. The challenge also surfaces recurring failure modes and concludes that conventional techniques remain limited, advocating new paradigms for universal robustness to unforeseen corruptions.
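To make the headline numbers concrete, the sketch below computes a Dice similarity coefficient (DSC) for binary masks and averages it per corruption domain. It uses the standard DSC definition; the function names, the nested-list mask format, the empty-mask convention, and the "mean per domain, then mean across domains" aggregation are illustrative assumptions and not the challenge's official evaluation code. NSD is not sketched here.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient (DSC) between two binary masks.

    Masks are equally sized nested lists of 0/1 values.
    DSC = 2 * |intersection| / (|pred| + |target|).
    """
    intersection = 0
    pred_area = 0
    target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            pred_area += 1 if p else 0
            target_area += 1 if t else 0
    if pred_area + target_area == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


def mean_dice_by_domain(samples):
    """Average DSC within each corruption domain, then across domains.

    `samples` is a list of (domain_name, pred_mask, target_mask) tuples.
    The aggregation order is an assumption, not the challenge protocol.
    """
    per_domain = {}
    for domain, pred, target in samples:
        per_domain.setdefault(domain, []).append(dice_coefficient(pred, target))
    domain_means = {d: sum(v) / len(v) for d, v in per_domain.items()}
    overall = sum(domain_means.values()) / len(domain_means)
    return domain_means, overall
```

Reporting the per-domain means alongside the overall average keeps a weak domain (for example, smoke) from being hidden by strong results elsewhere.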
What carries the argument
The paired clean-corrupted dataset generated through counterfactual robotic replay, which enables reproducible testing of models trained on uncorrupted data against non-adversarial corruptions.
If this is right
- Models trained solely on clean data can still perform well on corrupted domains when prior knowledge and custom strategies are applied.
- Architectural choices contribute measurably to accuracy under the tested corruption types.
- Most successful entries rely on established techniques that carry known limits for handling unforeseen corruptions.
- Further gains in surgical data science will require approaches beyond current conventional methods.
Where Pith is reading between the lines
- The paired structure could support training methods that explicitly enforce invariance to these specific corruptions.
- Results from this benchmark may inform robustness evaluation in other endoscopic or medical imaging tasks.
- Additional corruption types encountered in actual procedures could be added to increase coverage.
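The first point above can be sketched as a consistency term added to an ordinary segmentation loss. This is a minimal illustration and not a method reported by the challenge: it assumes that a corrupted counterpart of each clean training frame is available, which this summary does not confirm, and the names `consistency_penalty`, `paired_loss`, and `weight` are hypothetical.

```python
def consistency_penalty(clean_probs, corrupted_probs):
    """Mean squared difference between two per-pixel probability maps.

    Both inputs are equally sized nested lists of foreground probabilities,
    predicted for the clean frame and for its corrupted counterpart.
    """
    total = 0.0
    count = 0
    for clean_row, corrupted_row in zip(clean_probs, corrupted_probs):
        for c, k in zip(clean_row, corrupted_row):
            total += (c - k) ** 2
            count += 1
    return total / count if count else 0.0


def paired_loss(segmentation_loss, clean_probs, corrupted_probs, weight=0.5):
    """Segmentation loss plus a weighted invariance term.

    `weight` is a hypothetical hyperparameter; 0.5 is only a placeholder.
    """
    return segmentation_loss + weight * consistency_penalty(clean_probs, corrupted_probs)
```

The penalty is zero when the model predicts the same mask for both versions of a frame, so it rewards invariance to the corruption without needing extra labels.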
Load-bearing premise
The corruptions produced by counterfactual robotic replay match the non-adversarial corruptions that occur in real surgical procedures.
What would settle it
A direct comparison of the same models on the challenge's generated corruptions versus on naturally occurring corruptions recorded during live surgery would show whether performance transfers.
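One way to summarize such a comparison is sketched below: for each model, take the score gap between the generated and the naturally occurring corruptions, and check whether the model ranking is preserved. The function name and input format are hypothetical, and no real-surgery scores are reported in this summary, so any inputs would have to come from a new experiment.

```python
def transfer_report(synthetic_scores, real_scores):
    """Compare model scores on synthetic versus real corruption test sets.

    Both arguments map a model name to a score such as DSC in [0, 1].
    Returns (gaps, same_ranking), where gaps[model] = synthetic - real and
    same_ranking is True when both test sets order the models identically.
    Only models present in both dicts are compared.
    """
    models = sorted(set(synthetic_scores) & set(real_scores))
    gaps = {m: synthetic_scores[m] - real_scores[m] for m in models}
    rank_synthetic = sorted(models, key=lambda m: synthetic_scores[m], reverse=True)
    rank_real = sorted(models, key=lambda m: real_scores[m], reverse=True)
    return gaps, rank_synthetic == rank_real
```

Small gaps with a preserved ranking would support the load-bearing premise; large gaps or a reshuffled ranking would suggest the generated corruptions miss something that matters in live procedures.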
Original abstract
Surgical data science has seen rapid advancement with the excellent performance of end-to-end deep neural networks (DNNs). Despite their successes, DNNs have been proven susceptible to minor "corruptions," introducing a major concern for the translation of cutting-edge technology, especially in high-stakes scenarios. We introduce the SegSTRONG-C challenge dedicated to better understanding model deterioration under unforeseen but plausible non-adversarial "corruption" and the capabilities of contemporary methods that seek to improve it. Built on a dataset generated through counterfactual robotic replay, SegSTRONG-C provides paired clean and "corrupted" samples, enabling reproducible evaluation of model robustness. Participants are challenged to train tool segmentation algorithms on "uncorrupted" data and evaluate them on "corrupted" test domains for the binary robot tool segmentation task. Through comprehensive baseline experiments and participating submissions from widespread community engagement, SegSTRONG-C reveals key themes for model failure and identifies promising directions for improving robustness. The challenge winners achieve an average 0.9394 DSC and 0.9301 NSD across the unreleased test sets with "corruption" types: bleeding, smoke, and low brightness. This highlights how prior knowledge, customized training strategies, and architectural choice can be leveraged to improve robustness. In conclusion, the SegSTRONG-C challenge has identified practical approaches for enhancing model robustness. However, most approaches rely on conventional techniques that have known limitations. Looking ahead, we advocate for expanding intellectual diversity and creativity in non-adversarial robustness beyond data augmentation, calling for new paradigms that enhance universal robustness to unforeseen "corruptions" to facilitate richer applications in surgical data science.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the SegSTRONG-C EndoVis'24 challenge for binary robot tool segmentation under non-adversarial corruptions (bleeding, smoke, low brightness) generated via counterfactual robotic replay, supplying paired clean/corrupted data. It reports baseline experiments plus community submissions, with winners reaching average DSC 0.9394 and NSD 0.9301 on unreleased test sets, and identifies themes for model failure while advocating new robustness paradigms beyond conventional augmentation.
Significance. If the generated corruptions are representative of real surgical conditions, the challenge supplies a reproducible benchmark that empirically demonstrates how prior knowledge, training strategies, and architecture choices can yield high robustness on the specified corruptions, thereby guiding practical improvements in surgical data science.
Major comments (1)
- [Abstract] Abstract and dataset description: the positioning of the corruptions as 'plausible non-adversarial' and relevant to real OR conditions is not supported by any quantitative validation (feature-space distances, perceptual metrics, or clinician ratings) that the counterfactual robotic replay preserves the statistics causing model failure in live procedures. This assumption is load-bearing for interpreting the reported DSC/NSD scores as evidence of robustness to clinically meaningful corruptions.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on the positioning of the generated corruptions. We address this point directly below and outline the planned revisions.
Point-by-point responses
- Referee: [Abstract] Abstract and dataset description: the positioning of the corruptions as 'plausible non-adversarial' and relevant to real OR conditions is not supported by any quantitative validation (feature-space distances, perceptual metrics, or clinician ratings) that the counterfactual robotic replay preserves the statistics causing model failure in live procedures. This assumption is load-bearing for interpreting the reported DSC/NSD scores as evidence of robustness to clinically meaningful corruptions.
  Authors: We agree that the manuscript does not provide quantitative validation (e.g., feature-space distances, perceptual metrics, or clinician ratings) demonstrating that the counterfactual robotic replay corruptions preserve the exact statistics of model failures observed in live procedures. The generation process relies on replaying robotic trajectories with added visual effects (bleeding, smoke, low brightness) to create paired clean/corrupted samples, which we positioned as plausible non-adversarial corruptions based on the method's design. However, this remains an unvalidated assumption. In the revised manuscript we will (1) tone down the abstract and dataset description to describe the corruptions as 'synthetically generated to simulate common non-adversarial effects' rather than asserting clinical representativeness, (2) add an explicit limitations paragraph discussing the lack of such validation and its implications for interpreting the DSC/NSD scores, and (3) note this as an important direction for future work. These textual changes will make the claims more precise without requiring new experiments.
  Revision: yes
Circularity Check
Empirical challenge report with no derivations or fitted predictions
Full rationale
The paper is a report on an EndoVis'24 segmentation challenge. It describes a dataset of paired clean/corrupted images generated via counterfactual robotic replay, reports baseline and community-submitted DSC/NSD scores on unreleased test sets, and discusses practical robustness strategies. No equations, first-principles derivations, parameter fittings, or predictions are present. The central claims are empirical observations from community results, not reductions of outputs to inputs by construction. Self-citations are limited to prior challenge organization and do not bear load on any claimed derivation.
Forward citations
Cited by 1 Pith paper
- Towards Robust Surgical Automation via Digital Twin Representations from Foundation Models: Digital twin representations from vision foundation models enable LLM-based planning for robust peg transfer and gauze retrieval on the dVRK surgical platform with claimed generalizability.