2017 Robotic Instrument Segmentation Challenge

Alex Shvets; Danail Stoyanov; Huoling Luo; Iro Laina; Jian Yang; Lena Maier-Hein; Luis Herrera; Mahdi Azizian; Max Allan; Nicola Rieke

arXiv:1902.06426v2 · pith:NT6FR7NU · submitted 2019-02-18 · cs.CV

Max Allan, Alex Shvets, Thomas Kurmann, Zichen Zhang, Rahul Duggal, Yun-Hsuan Su, Nicola Rieke, Iro Laina, Niveditha Kalavakonda, Sebastian Bodenstedt, Luis Herrera, Wenqi Li, Vladimir Iglovikov, Huoling Luo, Jian Yang, Danail Stoyanov, Lena Maier-Hein, Stefanie Speidel, Mahdi Azizian

classification cs.CV

keywords robotic, segmentation, challenge, datasets, however, instrument, limited, type

Abstract

In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison. However, this type of approach has had limited translation to problems in robot-assisted surgery, as this field has never established the same level of common datasets and benchmarking methods. In 2015, a sub-challenge was introduced at the EndoVis workshop in which a set of robotic images was provided with annotations generated automatically from robot forward kinematics. However, this dataset had issues due to limited background variation, a lack of complex motion, and inaccuracies in the annotation. In this work we present the results of the 2017 challenge on robotic instrument segmentation, which involved 10 teams participating in binary, parts, and type-based segmentation of articulated da Vinci robotic instruments.

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SurgAtlas: A Large-Scale Surgical Video-Language Dataset with 2,391 Hours of Open and Minimally Invasive Surgery
cs.CV 2026-06 unverdicted novelty 7.0

SurgAtlas is a new dataset of 15,291 surgical videos totaling 2,391 hours with multi-level annotations that supports finetuning models to competitive performance on surgical benchmarks.

Unlocking Positive Transfer in Incrementally Learning Surgical Instruments: A Self-reflection Hierarchical Prompt Framework
cs.CV 2026-04 conditional novelty 7.0

A hierarchical prompt tree with self-reflection graph propagation enables positive forward and backward knowledge transfer in incremental surgical instrument segmentation, improving over baselines by more than 5% and ...

S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss
cs.CV 2026-01 unverdicted novelty 7.0

S2M-Net achieves state-of-the-art Dice scores on 16 medical datasets across 8 modalities using a 4.7M-parameter spectral-spatial mixer and morphology-aware adaptive loss, outperforming transformers with 3.5-6x fewer p...

SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures
eess.IV 2025-06 unverdicted novelty 7.0

Introduces the first publicly accessible native 4K resolution endoscopic video dataset for robotic-assisted minimally invasive procedures.

StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies
cs.CV 2025-04 unverdicted novelty 7.0

StereoMamba introduces a Mamba-based architecture with FE-Mamba and MFF modules for real-time stereo disparity estimation in RAMIS, reporting EPE of 2.64 px, depth MAE of 2.55 mm, and 21.28 FPS on the SCARED benchmark...

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
eess.IV 2024-01 unverdicted novelty 7.0

U-Mamba is a hybrid CNN-SSM architecture that outperforms prior CNN and Transformer networks on biomedical image segmentation tasks by efficiently modeling long-range dependencies.

Incorporating Temporal Prior from Motion Flow for Instrument Segmentation in Minimally Invasive Surgery Video
cs.CV 2019-07 unverdicted novelty 7.0

A temporal prior from inter-frame motion flow is injected as initialization into an attention pyramid network to guide coarse-to-fine instrument segmentation in MIS videos, exceeding prior results on the EndoVis datas...

RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering
cs.CV 2026-05 unverdicted novelty 6.0

RoboSurg-VQA is a new segmentation-aware VQA benchmark created by repurposing public surgical datasets with fixed clinically motivated questions and closed answer sets.

Dense Structural Priors for Sparse Functional Landmark Localization in Surgical Videos
cs.CV 2026-06 unverdicted novelty 5.0

A multi-frame network with SAM 3-derived mask priors achieves 72.4% F1 tip and 58.0% F1 anchor localization in surgical videos without manual mask annotations for training.

Surgical Anatomy Recognition with Context Learning using Foundation Representations
cs.CV 2026-06 unverdicted novelty 5.0

Presents ATLAS-120k dataset and ATLAS model for context-aware surgical anatomy segmentation using foundation representations and temporal cues.

USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation
cs.CV 2026-05 unverdicted novelty 5.0

USEMA is a hybrid UNet architecture merging CNNs with scalable Mamba-like attention (SEMA) that achieves better efficiency than transformers and superior segmentation accuracy than pure CNN or Mamba models across medi...

Surgical Visual Understanding (SurgVU) Dataset
cs.CV 2025-01 unverdicted novelty 5.0

Releases the SurgVU dataset of surgical videos and labels to enable machine learning research in surgical data science.

SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
cs.CV 2024-07 accept novelty 5.0

SegSTRONG-C provides a new benchmark where top models reach 0.9394 DSC and 0.9301 NSD on corrupted surgical tool segmentation tests, showing conventional techniques help but calling for more innovative robustness methods.

Learning Where to Look While Tracking Instruments in Robot-assisted Surgery
cs.CV 2019-06 unverdicted novelty 4.0

An end-to-end multitask model with shared encoder, separate decoders, batch-Wasserstein loss, and soft attention module reports better performance than prior segmentation and saliency methods on the MICCAI robotic ins...

Attention Is not Everything: Efficient Alternatives for Vision
cs.CV 2026-04 unverdicted novelty 3.0

A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.

Benchmarking CNN- and Transformer-Based Models for Surgical Instrument Segmentation in Robotic-Assisted Surgery
cs.CV 2026-04 unverdicted novelty 2.0

DeepLabV3 matches SegFormer performance in multi-class surgical instrument segmentation while convolutional baselines like UNet remain competitive on the SAR-RARP50 dataset.