
arxiv: 2305.07152 · v4 · pith:AGVRJTZWnew · submitted 2023-05-11 · 💻 cs.CV

Intuitive Surgical SurgToolLoc and SurgVU Challenges Results: 2022-2025

Aneeq Zia, Max Berniker, Rogerio Garcia Nespolo, Xiaorui Zhang, Conor Perreault, Kiran Bhattacharyya, Xi Liu, Ziheng Wang
Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Bo Liu, David Austin, Yiheng Wang, Michal Futrega, Jean-Francois Puget, Zhenqiang Li, Yoichi Sato, Ryo Fujii, Ryo Hachiuma, Mana Masuda, Hideo Saito, An Wang, Mengya Xu, Mobarakol Islam, Long Bai, Winnie Pang, Hongliang Ren, Chinedu Nwoye, Luca Sestini, Nicolas Padoy, Maximilian Nielsen, Samuel Schüttler, Thilo Sentker, Hümeyra Husseini, Ivo Baltruschat, Rüdiger Schmitz, René Werner, Aleksandr Matsun, Mugariya Farooq, Numan Saaed, Jose Renato Restom Viera, Mohammad Yaqub, Neil Getty, Fangfang Xia, Zixuan Zhao, Xiaotian Duan, Xing Yao, Ange Lou, Hao Yang, Jintong Han, Jack Noble, Jie Ying Wu, Tamer Abdulbaki Alshirbaji, Nour Aldeen Jalal, Herag Arabian, Ning Ding, Knut Moeller, Weiliang Chen, Quan He, Muhammad Bilal, Taofeek Akinosho, Adnan Qayyum, Massimo Caputo, Hunaid Vohra, Michael Loizou, Anuoluwapo Ajayi, Ilhem Berrou, Faatihah Niyi-Odumosu, Charlie Budd, Oluwatosin Alabi, Tom Vercauteren, Ruoxi Zhao, Ayberk Acar, John Han, Jumanh Atoum, Yinhong Qin, Surong Hua, Lu Ping, Wenming Wu, Rongfeng Wei, Jinlin Wu, You Pang, Zhen Chen, Tim Jaspers, Amine Yamlahi, Piotr Kalinowski, Dominik Michael, Tim Rädsch, Marco Hübner, Danail Stoyanov, Stefanie Speidel, Lena Maier-Hein, Jie Tian, Ruxin Zhang, Khang Hoang Nguyen, Anh Quoc Nguyen, Tam Minh Nguyen, Khoi Dinh Tran, Minh Nguyen Dang Nhat, Trinh Thi Doan Pham, Linh Van Nguyen, Chunyang Jiang, Dewei Yang, Haitao Li, Yannick Prudent, Thibaut Boissin, Mahmood Alam, Shazad Ashraf, Andrew D. Beggs, Lukman Akanbi, Manuel D. Delgado, Narain Gupta, Amir M. Hajiyavand, Iqbal Qasim, Hafiz A. Alaka, Junaid Qadir, Shu Yang, Yihui Wang, Hao Chen, Shin Paul, Yosuke Yamagishi, Zhang Dong, Hongyun Li, Hongyu Gu, Xiaoliu Ding, Xiaoyao Liu, Xingyu Zhao, Mariana Ribeiro, Tiago Jesus, André Ferreira, Guilherme Barbosa, João Carvalho, Leonardo Barroso, Nuno Gomes, Rafael Peixoto, Rodrigo Ralha, Victor Alves, Stephanie, Nattapat Ittikosil, Achita Chitrapan, Quan Huu Cap, Jiayuan Huang, Shreyas C Dhake, Sergi Kavtaradze, Mobarak I Hoque, Ka Young Kim, Su Yong Yun, Young Tae Kim, Hyeon Bae Kim, Seong Tae Kim, Zuxing Deng, Ling Li, Jieyu Zheng, Xiaojian Li, Anthony Jarc

Pith reviewed 2026-05-24 08:37 UTC · model grok-4.3

classification 💻 cs.CV
keywords: surgical tool localization, surgical visual understanding, robotic surgery, machine learning, challenge results, computer vision

The pith

The SurgToolLoc and SurgVU challenges document AI performance in surgical tool localization and visual understanding from 2022 to 2025.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports the results of a series of machine learning challenges organized by Intuitive Surgical at the MICCAI conference. The challenges focus on localizing surgical tools and understanding visual scenes in robotic-assisted surgery, using publicly released datasets. The results matter because they provide concrete benchmarks showing how well current algorithms handle the visual demands of surgery and what remains to be solved before clinical use.

Core claim

The authors document the winning methods and their performance scores in the SurgToolLoc challenge for tool localization and the SurgVU challenge for surgical visual understanding over the years 2022 to 2025.

What carries the argument

Evaluation of the submitted methods on the SurgToolLoc and SurgVU datasets, using metrics for localization accuracy and for the visual understanding tasks.
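
This page does not name the exact metrics, although the figure captions reproduced further down mention IoU and mAP. As a minimal sketch only, assuming axis-aligned boxes in a hypothetical `(x1, y1, x2, y2)` pixel format (this is not the challenge's official evaluation code), intersection over union between a predicted and a ground-truth box could be computed like this:

```python
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Degenerate (zero-area) inputs yield 0.0 instead of dividing by zero.
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; the threshold used by the challenges is not stated here.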

If this is right

  • Future research can build on the top-performing methods as baselines.
  • The released dataset enables standardized comparisons in surgical data science.
  • High performance in these tasks suggests AI is approaching usability for assisting in robotic procedures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Successful methods from these challenges could be integrated into robotic systems to provide real-time feedback to surgeons.
  • Extending the challenges to include more varied surgical procedures might reveal gaps in current models.
  • The results imply that visual understanding in surgery is solvable with existing computer vision techniques when given appropriate training data.

Load-bearing premise

The challenge tasks and datasets provide a meaningful proxy for real clinical robotic surgery scenarios that will translate to improved patient outcomes.

What would settle it

Deploying the top challenge models in live robotic surgeries and comparing error rates or patient outcomes to standard procedures without AI assistance.

Figures

Figures reproduced from arXiv: 2305.07152 by Achita Chitrapan, Adnan Qayyum, Aleksandr Matsun, Amine Yamlahi, Amir M. Hajiyavand, André Ferreira, Andrew D. Beggs, Aneeq Zia, Ange Lou, Anh Quoc Nguyen, Anthony Jarc, Anuoluwapo Ajayi, An Wang, Ayberk Acar, Bo Liu, Charlie Budd, Chinedu Nwoye, Chunyang Jiang, Conor Perreault, Danail Stoyanov, David Austin, Dewei Yang, Dominik Michael, Faatihah Niyi-Odumosu, Fangfang Xia, Guilherme Barbosa, Hafiz A. Alaka, Haitao Li, Hao Chen, Hao Yang, Herag Arabian, Hideo Saito, Hongliang Ren, Hongyu Gu, Hongyun Li, Hümeyra Husseini, Hunaid Vohra, Hyeon Bae Kim, Ilhem Berrou, Iqbal Qasim, Ivo Baltruschat, Jack Noble, Jean-Francois Puget, Jiayuan Huang, Jie Tian, Jie Ying Wu, Jieyu Zheng, Jinlin Wu, Jintong Han, João Carvalho, John Han, Jose Renato Restom Viera, Jumanh Atoum, Junaid Qadir, Ka Young Kim, Khang Hoang Nguyen, Khoi Dinh Tran, Kiran Bhattacharyya, Knut Moeller, Kousuke Hirasawa, Lena Maier-Hein, Leonardo Barroso, Ling Li, Linh Van Nguyen, Long Bai, Luca Sestini, Lukman Akanbi, Lu Ping, Mahmood Alam, Mana Masuda, Manuel D. Delgado, Marco Hübner, Mariana Ribeiro, Massimo Caputo, Max Berniker, Maximilian Nielsen, Mengya Xu, Michael Loizou, Michal Futrega, Minh Nguyen Dang Nhat, Mobarak I Hoque, Mobarakol Islam, Mohammad Yaqub, Mugariya Farooq, Muhammad Bilal, Narain Gupta, Nattapat Ittikosil, Neil Getty, Nicolas Padoy, Ning Ding, Nour Aldeen Jalal, Numan Saaed, Nuno Gomes, Oluwatosin Alabi, Piotr Kalinowski, Quan He, Quan Huu Cap, Rafael Peixoto, René Werner, Rodrigo Ralha, Rogerio Garcia Nespolo, Rongfeng Wei, Rüdiger Schmitz, Ruoxi Zhao, Ruxin Zhang, Ryo Fujii, Ryo Hachiuma, Samuel Schüttler, Satoshi Kasai, Satoshi Kondo, Seong Tae Kim, Sergi Kavtaradze, Shazad Ashraf, Shin Paul, Shreyas C Dhake, Shu Yang, Stefanie Speidel, Stephanie, Surong Hua, Su Yong Yun, Tamer Abdulbaki Alshirbaji, Tam Minh Nguyen, Taofeek Akinosho, Thibaut Boissin, Thilo Sentker, Tiago Jesus, Tim Jaspers, Tim Rädsch, Tom Vercauteren, Trinh Thi Doan Pham, Victor Alves, Weiliang Chen, Wenming Wu, Winnie Pang, Xiaojian Li, Xiaoliu Ding, Xiaorui Zhang, Xiaotian Duan, Xiaoyao Liu, Xi Liu, Xing Yao, Xingyu Zhao, Yannick Prudent, Yiheng Wang, Yihui Wang, Yinhong Qin, Yoichi Sato, Yosuke Yamagishi, Young Tae Kim, You Pang, Zhang Dong, Zhen Chen, Zhenqiang Li, Ziheng Wang, Zixuan Zhao, Zuxing Deng.

Figure 1: Overview of challenge categories 1 and 2.
Figure 2: Sample frames with presence labels (left) and a snapshot of the labels CSV file (right).
Figure 3: Sample frames with testing labels. The user interface was blurred to hide its embedded information.
Figure 4: HRI MV: The information required for detection differs from that required for classification, so the ROI proposal box output by the RPN was modified.
Figure 5: HRI MV: ROI expand makes model identification device. The detection box output for the previous frame was used for target tracking, and the target detection and target tracking results were integrated via weighting. In this way, once a device appears in a frame, it can be identified by target tracking in subsequent frames, which addresses missed detections. T…
Figure 6: HRI MV: Model ensemble.
Figure 7: HKMV: Primary training dataset. The clevis part in the mask label of the dataset was converted into a bounding box label and used as the primary training dataset. Since endovis17 and endovis18 do not include all 14 surgical tools, part of the data was added to the training dataset to ensure that it contains all 14 surgical tools. The final set contained 5212 images. Surgical Tool Localization Algor…
Figure 8: HKMV: Model architecture. To increase the size of the dataset, the trained model was run on the images of the competition dataset, and the results were added to the training dataset. A score threshold of 0.7 was used to filter out poor bounding boxes. After several such operations, the dataset was expanded to 11035 images. The dataset expansion flow diagram of the algorithm is described in
Figure 9: HKMV: Data expansion scheme. 3.5 NVIDIA - Team NVIDIA. Team Members: Bo Liu, David Austin, Yiheng Wang, Michal Futrega, Jean-Francois Puget. This team consists of five NVIDIA employees. Three of them (Jean-Francois, David, and Bo) are members of the Kaggle Grandmasters team, with extensive experience in computer vision machine learning competitions. One member (Yiheng) works on MONAI, an open-source framewor…
Figure 10: NVIDIA: Team NVIDIA Category 2 workflow. 3.5.2 Model Training. For Category 1, they trained 5 EfficientNet-B4 models (on 5 fold splits) and 5 ConvNext-tiny models and ensembled them using a trick called logit shift. The idea of the logit shift trick is that, when data is extremely imbalanced between classes as in this dataset, the minor classes’ probabilities are extremely biased towards 0. The extent of th…
Figure 11: ANL-Surg: Top right frame is the result of the parts detection model. Bottom left is the
Figure 12: HVRL: Examples of preprocessing. The original image included the black region on the left and
Figure 13: SK: Overview of our proposed method. The augmentation techniques used were horizontal flip, shift, scale, rotation, color jitter, Gaussian blur, and Gaussian noise. The augmented images were resized to 640 × 480 pixels. The employed optimization method was Adam, and the initial learning rate was set to 1.0 × 10^-5, changing at every epoch with cosine annealing. The cross-entropy loss was used as the loss fu…
Figure 14: VANDY-VISE: Architecture of Query2Label [42].
Figure 15: UKE: DINO-based self-distillation. Global and local crops are extracted from the input frame,
Figure 16: ITeM: The complete pipeline of the proposed model for tool presence detection and localiza
Figure 17: ITeM: Model performance on the validation data.
Figure 18: MM: R50+ViT-B 16 hybrid model.
Figure 19: MM: The training process. 3.12.3 Preliminary Performance. An mAP of 0.86 was achieved on the validation dataset.
Figure 20: BioMedIA: Frequency of each tool across the whole dataset. The graph illustrates the frequent
Figure 21: BioMedIA: Surgical instruments present in Cholec80 Dataset. The tools are similar to the tools
Figure 22: BioMedIA: Surgical instruments present in M2CAI Dataset. Some of the tools present are similar
Figure 23: BioMedIA: Architecture of the two-tier model used for classification. Images are passed through
Figure 24: WhiteBox: The tool presence classification framework used by WhiteBox team.
Figure 25: CAMMA: Architecture of the spatial attention network (SANet) for surgical tool presence detec
Figure 26: TeamZero: Proposed methodology for robust detection of surgical tools with noisy labels.
Figure 27: TeamZero: Examples of noisy labels (a) and cropped Images using Segmentation Learner (b).
Figure 28: TeamZero: Learning rate chart showing 5e-3 where ViT model can learn the most on the given
Figure 29: TeamZero: Learning rate chart showing 5e-3 where ViT model can learn the most on the given
Figure 30: Class wise distribution for the original SurgToolLoc labels and the generated pseudo labels. The
Figure 33: Confusion matrix for all classes. 4.5 MapleLab. Team Members: John Han, Ayberk Acar, Jumanh Atoum, Yinhong Qin, Jie Ying Wu.
Figure 34: A fine-tuned ResNet-101 model takes the concatenated 7-channel image as input and outputs the
Figure 35: In the final reiterated inference stage, predictions are made on each bounding box via masking
Figure 36: Qualitative Results. 4.5.2 Model Training. To improve the quality of the training data, morphological opening and closing were applied to the segmentation mask to remove noise and close holes after running TernausNet. Images with poor segmentation masks were removed by comparing the size of the segmentation blob with certain thresholds. Segmentation blobs with areas > 0.02 × total image area were kept an…
Figure 37: Data Processing. 4.6 PUMCH
  • Team name: PUMCH
  • Members: Surong Hua, Lu Ping, Wenming Wu
  • Research field: Computer Vision
  • Institution: Peking Union Medical College Hospital
  • City: Beijing, China
  • Motivation and plan: At first glance, we were thinking of weakly supervised learning. But then we found that the video tags provided for training only contain information on whether a certain instrument exist…
Figure 38: OsTrack Pipeline. Model Loss Function. To train the one-stage object detector, we adopted a dynamic soft label assignment strategy based on SimOTA [78]. As RTM-DET [79] does, we used the IoU between the predictions and ground truth boxes as the soft label to train the classification branch, used the logarithm of the IoU as the regression cost instead of GIoU used in the loss function, used a soft center regi…
Figure 39: Long Tail.
  Model                     Size          Bbox mAP
  Cascade R-CNN-Detectors   (894,682)     63.6
  RTMDet-l                  (864,864)     70.6
  YOLOv8-l                  (864,864)     69.1
  Co-DETR-swin-s            Multi scale   64.6
Figure 40: The overall pipeline of our proposed method.
Figure 41: Pseudo-code of Initial Label Filter Algorithm.
Figure 42: Pseudo-code of Multi-round Label Filter Algorithm.
Figure 43: Proposed training workflow: (a) Keypoints tracking-based bounding box generation. (b) Classi
Figure 44: Team TUE-VCA: Schematic representation of the proposed workflow.
Figure 45: Overview of challenge results from 2022 to 2023 demonstrating a robust increase in model
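
The Figure 8 caption above describes growing the training set by running the trained model on the competition images and discarding boxes below a score threshold of 0.7. A minimal sketch of that filtering step follows. The dictionary keys (`box`, `label`, `score`), the function names, and the use of `>=` rather than `>` are all assumptions made for illustration; the team's actual data format and code are not given on this page.

```python
from typing import Dict, List

# Hypothetical detection record: {"box": (x1, y1, x2, y2), "label": str, "score": float}
Detection = Dict[str, object]


def filter_pseudo_labels(detections: List[Detection], threshold: float = 0.7) -> List[Detection]:
    """Keep detections whose confidence reaches the threshold (>= is an assumption)."""
    return [d for d in detections if float(d["score"]) >= threshold]


def expand_dataset(
    labeled: Dict[str, List[Detection]],
    predictions: Dict[str, List[Detection]],
    threshold: float = 0.7,
) -> Dict[str, List[Detection]]:
    """Add images that retain at least one confident box as pseudo-labeled samples."""
    expanded = dict(labeled)  # shallow copy so the input mapping is not modified
    for image_id, dets in predictions.items():
        kept = filter_pseudo_labels(dets, threshold)
        # Existing labels are never overwritten by pseudo-labels.
        if kept and image_id not in expanded:
            expanded[image_id] = kept
    return expanded
```

In an iterative scheme of the kind the caption hints at, one would retrain on the expanded set and repeat; how many rounds the team ran is not stated here.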

Robotic assisted (RA) surgery promises to transform surgical intervention. Intuitive Surgical is committed to fostering these changes and the machine learning models and algorithms that will enable them. With these goals in mind we have invited the surgical data science community to participate in a yearly competition hosted through the Medical Imaging Computing and Computer Assisted Interventions (MICCAI) conference. With varying changes from year to year, we have challenged the community to solve difficult machine learning problems in the context of advanced RA applications. Here we document the results of these challenges, focusing on surgical tool localization (SurgToolLoc) and surgical visual understanding (SurgVU). The publicly released dataset that accompanies these challenges is detailed in a separate paper arXiv:2501.09209 [1].

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript documents the results of the Intuitive Surgical SurgToolLoc (surgical tool localization) and SurgVU (surgical visual understanding) challenges hosted annually at MICCAI from 2022 to 2025. It reports participation numbers, submitted methods, and performance metrics across the challenge tasks, while referring readers to a companion paper (arXiv:2501.09209) for details on the released datasets.

Significance. If the reported outcomes are accurate, the paper provides a useful archival record of community progress on standardized benchmarks in robotic surgery computer vision. Multi-year documentation of this form can help the field track incremental improvements in tool detection and scene understanding, and the public datasets enable follow-on reproducible work.

minor comments (3)
  1. The abstract states the purpose but contains no numerical results or key findings; the main text should include a concise summary table of top-performing methods and metrics in the introduction or a dedicated results overview section for quick reference.
  2. Ensure that all challenge years (2022–2025) are covered with consistent reporting of task definitions, evaluation metrics, and participation statistics; any year-to-year changes in rules or data should be explicitly tabulated.
  3. Clarify the relationship to the companion dataset paper: add a short paragraph in the introduction that distinguishes what is new in this results report versus what is already described in arXiv:2501.09209.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation to accept the manuscript. The assessment correctly identifies the value of a multi-year archival record of the SurgToolLoc and SurgVU challenges.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is a factual report documenting the results of the SurgToolLoc and SurgVU challenges from 2022-2025. It contains no mathematical derivations, predictions, fitted parameters, or generalization claims. The central claim is confined to reporting participation, methods, and numerical outcomes from external submissions, with the dataset referenced to a separate paper. No load-bearing self-citations, self-definitional steps, or reductions of outputs to inputs by construction are present. The derivation chain is empty by the nature of the document.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced or required by the abstract content.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Surgical Visual Understanding (SurgVU) Dataset

    cs.CV 2025-01 unverdicted novelty 5.0

    Releases the SurgVU dataset of surgical videos and labels to enable machine learning research in surgical data science.

Reference graph

Works this paper leans on

96 extracted references · 96 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Surgical visual understanding (surgvu) dataset, 2025

    Aneeq Zia, Max Berniker, Rogerio Nespolo, Conor Perreault, Ziheng Wang, Benjamin Mueller, Ryan Schmidt, Kiran Bhattacharyya, Xi Liu, and Anthony Jarc. Surgical visual understanding (surgvu) dataset, 2025.

  2. [2]

    Trends in robot-assisted procedures for general surgery in the veterans health administration

    Michael A Mederos, R Lorie Jacob, Rachel Ward, Rivfka Shenoy, Melinda M Gibbons, Mark D Girgis, Devan Kansagara, Denise Hynes, Paul G Shekelle, and Karli Kondo. Trends in robot-assisted procedures for general surgery in the veterans health administration. Journal of Surgical Research, 279:788–795, 2022

  3. [3]

    Exploring the paradigm of robotic surgery and its contribution to the growth of surgical volume

    Emily A Grimsley, Tara M Barry, Haroon Janjua, Emanuel Eguia, Christopher DuCoin, and Paul C Kuo. Exploring the paradigm of robotic surgery and its contribution to the growth of surgical volume. Surgery Open Science, 10:36–42, 2022

  4. [4]

    Surgical data science – from concepts toward clinical translation

    Lena Maier-Hein, Matthias Eisenmann, Duygu Sarikaya, Keno März, Toby Collins, Anand Malpani, Johannes Fallert, Hubertus Feussner, Stamatia Giannarou, Pietro Mascagni, et al. Surgical data science – from concepts toward clinical translation. Medical image analysis, 76:102306, 2022

  5. [5]

    Surgical data science: the new knowledge domain

    S Swaroop Vedula and Gregory D Hager. Surgical data science: the new knowledge domain. Innovative surgical sciences, 2(3):109–121, 2017

  6. [6]

    Review of automated performance metrics to assess surgical technical skills in robot-assisted laparoscopy

    Sonia Guerin, Arnaud Huaulmé, Vincent Lavoue, Pierre Jannin, and Krystel Nyangoh Timoh. Review of automated performance metrics to assess surgical technical skills in robot-assisted laparoscopy. Surgical Endoscopy, pages 1–18, 2022

  7. [7]

    Deep learning-based computer vision to recognize and classify suturing gestures in robot-assisted surgery

    Francisco Luongo, Ryan Hakim, Jessica H Nguyen, Animashree Anandkumar, and Andrew J Hung. Deep learning-based computer vision to recognize and classify suturing gestures in robot-assisted surgery. Surgery, 169(5):1240–1244, 2021

  8. [8]

    A deep-learning model using automated performance metrics and clinical features to predict urinary continence recovery after robot-assisted radical prostatectomy

    Andrew J Hung, Jian Chen, Saum Ghodoussipour, Paul J Oh, Zequn Liu, Jessica Nguyen, Sanjay Purushotham, Inderbir S Gill, and Yan Liu. A deep-learning model using automated performance metrics and clinical features to predict urinary continence recovery after robot-assisted radical prostatectomy. BJU international, 124(3):487–495, 2019

  9. [9]

    How to bring surgery to the next level: interpretable skills assessment in robotic-assisted surgery

    Kristen C Brown, Kiran D Bhattacharyya, Sue Kulason, Aneeq Zia, and Anthony Jarc. How to bring surgery to the next level: interpretable skills assessment in robotic-assisted surgery. Visceral medicine, 36(6):463–470, 2020

  10. [10]

    Automated surgical skill assessment in rmis training

    Aneeq Zia and Irfan Essa. Automated surgical skill assessment in rmis training. International journal of computer assisted radiology and surgery, 13(5):731–739, 2018

  11. [11]

    Temporal clustering of surgical activities in robot-assisted surgery

    Aneeq Zia, Chi Zhang, Xiaobin Xiong, and Anthony M Jarc. Temporal clustering of surgical activities in robot-assisted surgery. International journal of computer assisted radiology and surgery, 12(7):1171–1178, 2017

  12. [12]

    Novel evaluation of surgical activity recognition models using task-based efficiency metrics

    Aneeq Zia, Liheng Guo, Linlin Zhou, Irfan Essa, and Anthony Jarc. Novel evaluation of surgical activity recognition models using task-based efficiency metrics. International journal of computer assisted radiology and surgery, 14(12):2155–2163, 2019

  13. [13]

    Surgical activity recognition in robot-assisted radical prostatectomy using deep learning

    Aneeq Zia, Andrew Hung, Irfan Essa, and Anthony Jarc. Surgical activity recognition in robot-assisted radical prostatectomy using deep learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 273–280. Springer, 2018

  14. [14]

    Biomedical image analysis competitions: The state of current participation practice

    Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, et al. Biomedical image analysis competitions: The state of current participation practice. arXiv preprint arXiv:2212.08568, 2022

  15. [15]

    Surgical visual domain adaptation: results from the miccai 2020 surgvisdom challenge

    Aneeq Zia, Kiran Bhattacharyya, Xi Liu, Ziheng Wang, Satoshi Kondo, Emanuele Colleoni, Beatrice van Amsterdam, Razeen Hussain, Raabid Hussain, Lena Maier-Hein, et al. Surgical visual domain adaptation: results from the miccai 2020 surgvisdom challenge. arXiv preprint arXiv:2102.13644, 2021.

  16. [16]

    Learning motion flows for semi-supervised instrument segmentation from robotic surgical video

    Zixu Zhao, Yueming Jin, Xiaojie Gao, Qi Dou, and Pheng-Ann Heng. Learning motion flows for semi-supervised instrument segmentation from robotic surgical video. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part III 23, pages 679–689. Springer, 2020

  17. [17]

    Objective surgical skills assessment and tool localization: Results from the miccai 2021 simsurgskill challenge

    Aneeq Zia, Kiran Bhattacharyya, Xi Liu, Ziheng Wang, Max Berniker, Satoshi Kondo, Emanuele Colleoni, Dimitris Psychogyios, Yueming Jin, Jinfan Zhou, et al. Objective surgical skills assessment and tool localization: Results from the miccai 2021 simsurgskill challenge. arXiv preprint arXiv:2212.04448, 2022

  18. [18]

    Stereo correspondence and reconstruction of endoscopic data challenge

    Max Allan, Jonathan Mcleod, Congcong Wang, Jean Claude Rosenthal, Zhenglei Hu, Niklas Gard, Peter Eisert, Ke Xue Fu, Trevor Zeffiro, Wenyao Xia, et al. Stereo correspondence and reconstruction of endoscopic data challenge. arXiv preprint arXiv:2101.01133, 2021

  19. [19]

    2018 robotic scene segmentation challenge, 2020

    Max Allan, Satoshi Kondo, Sebastian Bodenstedt, Stefan Leger, Rahim Kadkhodamohammadi, Imanol Luengo, Felix Fuentes, Evangello Flouty, Ahmed Mohammed, Marius Pedersen, Avinash Kori, Varghese Alex, Ganapathy Krishnamurthi, David Rauber, Robert Mendel, Christoph Palm, Sophia Bano, Guinther Saibro, Chi-Sheng Shih, Hsun-An Chiang, Juntang Zhuang, Junlin Yan...

  20. [20]

    Endonet: a deep architecture for recognition tasks on laparoscopic videos

    Andru P Twinanda, Sherif Shehata, Didier Mutter, Jacques Marescaux, Michel De Mathelin, and Nicolas Padoy. Endonet: a deep architecture for recognition tasks on laparoscopic videos. IEEE transactions on medical imaging, 36(1):86–97, 2016

  21. [21]

    Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks

    Amy Jin, Serena Yeung, Jeffrey Jopling, Jonathan Krause, Dan Azagury, Arnold Milstein, and Li Fei-Fei. Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks. In 2018 IEEE winter conference on applications of computer vision (WACV), pages 691–699. IEEE, 2018

  22. [22]

    Imagenet large scale visual recognition challenge

    Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015

  23. [23]

    Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines

    Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, and Gang Yu. Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 12549–12556, 2020

  24. [24]

    Cascade r-cnn: Delving into high quality object detection

    Zhaowei Cai and Nuno Vasconcelos. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6154–6162, 2018

  25. [25]

    2017 robotic instrument segmentation challenge, 2019

    Max Allan, Alex Shvets, Thomas Kurmann, Zichen Zhang, Rahul Duggal, Yun-Hsuan Su, Nicola Rieke, Iro Laina, Niveditha Kalavakonda, Sebastian Bodenstedt, Luis Herrera, Wenqi Li, Vladimir Iglovikov, Huoling Luo, Jian Yang, Danail Stoyanov, Lena Maier-Hein, Stefanie Speidel, and Mahdi Azizian. 2017 robotic instrument segmentation challenge, 2019

  26. [26]

    Automatic instrument segmentation in robot-assisted surgery using deep learning

    Alexey A. Shvets, Alexander Rakhlin, Alexandr A. Kalinin, and Vladimir I. Iglovikov. Automatic instrument segmentation in robot-assisted surgery using deep learning. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, December 2018

  27. [27]

    A convnet for the 2020s

    Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022

  28. [28]

    Efficientnet: Rethinking model scaling for convolutional neural networks

    Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019.

  29. [29]

    Grad-cam: Visual explanations from deep networks via gradient-based localization

    Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017

  30. [30]

    YOLOv5 by Ultralytics, 5 2020

    Glenn Jocher. YOLOv5 by Ultralytics, 5 2020

  31. [31]

    Weakly supervised pseudo-label assisted learning for als point cloud semantic segmentation

    Puzuo Wang and Wei Yao. Weakly supervised pseudo-label assisted learning for als point cloud semantic segmentation. arXiv preprint arXiv:2105.01919, 2021

  32. [32]

    Fastai: A layered API for deep learning

    Jeremy Howard and Sylvain Gugger. Fastai: A layered API for deep learning. Information, 11(2):108, feb 2020

  33. [33]

    Detectron2, 2019

    Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2, 2019

  34. [34]

    Surgical tool detection in open surgery videos

    Ryo Fujii, Ryo Hachiuma, Hiroki Kajita, and Hideo Saito. Surgical tool detection in open surgery videos. Applied Sciences, 12(20), 2022

  35. [35]

    Swin transformer v2: Scaling up capacity and resolution

    Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, and Baining Guo. Swin transformer v2: Scaling up capacity and resolution. In CVPR, pages 12009–12019, June 2022

  36. [36]

    Efficientnetv2: Smaller models and faster training

    Mingxing Tan and Quoc Le. Efficientnetv2: Smaller models and faster training. In ICML, 2021

  37. [37]

    Aggregated Residual Transformations for Deep Neural Networks

    Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. CoRR, abs/1611.05431, 2016

  38. [38]

    Pytorch image models, 2019

    Ross Wightman. Pytorch image models, 2019

  39. [39]

    Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks

    Aditya Chattopadhay, Anirban Sarkar, Prantik Howlader, and Vineeth N Balasubramanian. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In WACV, 2018

  40. [40]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009

  41. [41]

    Shallow feature matters for weakly supervised object localization

    Jun Wei, Qin Wang, Zhen Li, Sheng Wang, S Kevin Zhou, and Shuguang Cui. Shallow feature matters for weakly supervised object localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5993–6001, 2021

  42. [42]

    Query2label: A simple transformer way to multi-label classification

    Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, and Jun Zhu. Query2label: A simple transformer way to multi-label classification. arXiv preprint arXiv:2107.10834, 2021

  43. [43]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

  44. [44]

    Asymmetric loss for multi-label classification

    Emanuel Ben-Baruch, Tal Ridnik, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter, and Lihi Zelnik-Manor. Asymmetric loss for multi-label classification. arXiv preprint arXiv:2009.14119, 2020

  45. [45]

    The opencv library

    Gary Bradski. The opencv library. Dr. Dobb’s Journal: Software Tools for the Professional Programmer, 25(11):120–123, 2000

  46. [46]

    Emerging Properties in Self-Supervised Vision Transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, et al. Emerging Properties in Self-Supervised Vision Transformers. In ICCV, 2021

  47. [47]

    iBOT: Image BERT Pre-Training with Online Tokenizer

    Jinghao Zhou, Chen Wei, Huiyu Wang, et al. iBOT: Image BERT Pre-Training with Online Tokenizer. ICLR, 2022

  48. [48]

    A deep learning spatial-temporal framework for detecting surgical tools in laparoscopic videos

    Tamer Abdulbaki Alshirbaji, Nour Aldeen Jalal, Paul D Docherty, Thomas Neumuth, and Knut Möller. A deep learning spatial-temporal framework for detecting surgical tools in laparoscopic videos. Biomedical Signal Processing and Control, 68:102801, 2021

  49. [49]

    Wildcat: Weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation

    Thibaut Durand, Taylor Mordan, Nicolas Thome, and Matthieu Cord. Wildcat: Weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 642–651, 2017

  50. [50]

    Weakly-supervised learning for tool localization in laparoscopic videos

    Armine Vardazaryan, Didier Mutter, Jacques Marescaux, and Nicolas Padoy. Weakly-supervised learning for tool localization in laparoscopic videos. In Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis, pages 169–179. Springer, 2018

  51. [51]

    Weakly supervised convolutional LSTM approach for tool tracking in laparoscopic videos

    Chinedu Innocent Nwoye, Didier Mutter, Jacques Marescaux, and Nicolas Padoy. Weakly supervised convolutional LSTM approach for tool tracking in laparoscopic videos. International journal of computer assisted radiology and surgery, 14(6):1059–1067, 2019

  52. [52]

    Squeeze-and-excitation networks

    Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018

  53. [53]

    Improving the Generalisability of Deep CNNs by Combining Multi-stage Features for Surgical Tool Classification

    T. Abdulbaki Alshirbaji, Nour A. Jalal, Paul D. Docherty, P. T. Neumuth, and Knut Möller. Improving the Generalisability of Deep CNNs by Combining Multi-stage Features for Surgical Tool Classification. In 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 533–536. IEEE, 2022

  54. [54]

    Surgical tool classification in laparoscopic videos using convolutional neural network

    Tamer Abdulbaki Alshirbaji, Nour Aldeen Jalal, and Knut Möller. Surgical tool classification in laparoscopic videos using convolutional neural network. Current Directions in Biomedical Engineering, 4(1):407–410, 2018

  55. [55]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint, 2020

  56. [56]

    Terrance DeVries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint

  57. [57]

    Randaugment: Practical automated data augmentation with a reduced search space

    Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. arXiv e-prints, 2019

  58. [58]

    Asymmetric loss for multi-label classification

    Tal Ridnik, Emanuel Ben-Baruch, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter, and Lihi Zelnik-Manor. Asymmetric loss for multi-label classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 82–91, 2021

  59. [59]

    YOLOv4: Optimal Speed and Accuracy of Object Detection

    Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020

  60. [60]

    Cholecseg8k: a semantic segmentation dataset for laparoscopic cholecystectomy based on cholec80

    W-Y Hong, C-L Kao, Y-H Kuo, J-R Wang, W-L Chang, and C-S Shih. Cholecseg8k: a semantic segmentation dataset for laparoscopic cholecystectomy based on cholec80. arXiv preprint arXiv:2012.12453, 2020

  61. [61]

    Can masses of non-experts train highly accurate image classifiers?

    Lena Maier-Hein, Sven Mersmann, Daniel Kondermann, Sebastian Bodenstedt, Alexandro Sanchez, Christian Stock, Hannes Gotz Kenngott, Mathias Eisenmann, and Stefanie Speidel. Can masses of non-experts train highly accurate image classifiers? In International conference on medical image computing and computer-assisted intervention, pages 438–445. Springer, 2014

  62. [62]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017

  63. [63]

    Weakly supervised convolutional lstm approach for tool tracking in laparoscopic videos

    Chinedu Innocent Nwoye, Didier Mutter, Jacques Marescaux, and Nicolas Padoy. Weakly supervised convolutional lstm approach for tool tracking in laparoscopic videos. International journal of computer assisted radiology and surgery, 14:1059–1067, 2019

  64. [64]

    Rendezvous: Attention mechanisms for the recognition of surgical action triplets in endoscopic videos

    Chinedu Innocent Nwoye, Tong Yu, Cristians Gonzalez, Barbara Seeliger, Pietro Mascagni, Didier Mutter, Jacques Marescaux, and Nicolas Padoy. Rendezvous: Attention mechanisms for the recognition of surgical action triplets in endoscopic videos. Medical Image Analysis, 78:102433, 2022

  65. [65]

    Rethinking the inception architecture for computer vision

    Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016

  66. [66]

    Fastai: A layered api for deep learning

    Jeremy Howard and Sylvain Gugger. Fastai: A layered api for deep learning. Information, 11(2):108, 2020

  67. [67]

    Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis

    Davood Karimi, Haoran Dou, Simon K Warfield, and Ali Gholipour. Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. Medical image analysis, 65:101759, 2020

  68. [68]

    Scikit-learn: Machine learning in Python

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011

  69. [69]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014

  70. [70]

    Faster r-cnn: Towards real-time object detection with region proposal networks, 2016

    Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks, 2016

  71. [71]

    U-net: Convolutional networks for biomedical image segmentation, 2015

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation, 2015

  72. [72]

    Ultralytics YOLO, January 2023

    Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics YOLO, January 2023

  73. [73]

    Mixformer: End-to-end tracking with iterative mixed attention, 2022

    Yutao Cui, Cheng Jiang, Limin Wang, and Gangshan Wu. Mixformer: End-to-end tracking with iterative mixed attention, 2022

  74. [74]

    Dinov2: Learning robust visual features without supervision, 2024

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patri...

  75. [75]

    Deep residual learning for image recognition, 2015

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015

  76. [76]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollar, and Ross Girshick. Masked autoencoders are scalable vision learners. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2022

  77. [77]

    Joint feature learning and relation modeling for tracking: A one-stream framework

    Botao Ye, Hong Chang, Bingpeng Ma, and Shiguang Shan. Joint feature learning and relation modeling for tracking: A one-stream framework

  78. [78]

    Yolox: Exceeding yolo series in 2021

    Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun. Yolox: Exceeding yolo series in 2021

  79. [79]

    Rtmdet: An empirical study of designing real-time object detectors

    Chengqi Lyu, Wenwei Zhang, Haian Huang, Yue Zhou, Yudong Wang, Yanyi Liu, Shilong Zhang, and Kai Chen. Rtmdet: An empirical study of designing real-time object detectors

  80. [80]

    Weighted boxes fusion: Ensembling boxes from different object detection models

    Roman Solovyev, Weimin Wang, and Tatiana Gabruseva. Weighted boxes fusion: Ensembling boxes from different object detection models. Image and Vision Computing, page 104117, Mar 2021
