Attention U-Net: Learning Where to Look for the Pancreas

Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y Hammerla, Bernhard Kainz, Ben Glocker, Daniel Rueckert

arxiv: 1804.03999 · v3 · submitted 2018-04-11 · 💻 cs.CV

Pith reviewed 2026-05-12 21:13 UTC · model grok-4.3

keywords: attention gates · U-Net · medical image segmentation · CT imaging · pancreas segmentation · convolutional neural networks · deep learning · image segmentation

The pith

Attention gates added to U-Net let the model learn to focus on target structures in CT images, raising segmentation accuracy while removing the need for separate organ localization steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes attention gates that plug into standard U-Net models for medical image segmentation. These gates learn during training to highlight useful features and suppress background areas that do not help the task, handling organs of different sizes without extra preprocessing networks. Evaluation on two large abdominal CT datasets shows steady gains in accuracy and sensitivity over plain U-Net across multiple training set sizes, all while adding only small computational cost. The approach simplifies cascaded pipelines that first locate tissue before segmenting it. Readers would care because it offers a direct way to make convolutional networks more precise on variable anatomy with little extra effort.

Core claim

The authors introduce attention gates (AGs) that automatically learn to focus on target structures of varying shapes and sizes in medical images. When integrated into U-Net, the gates suppress irrelevant regions in the input while emphasizing salient features for the segmentation task. This removes the requirement for explicit external tissue or organ localisation modules in cascaded CNNs. Experiments on two large CT abdominal datasets for multi-class segmentation demonstrate that AGs improve U-Net prediction performance consistently across datasets and training sizes while preserving computational efficiency.

What carries the argument

Attention gates (AGs), modules inserted into the skip connections of U-Net that learn to filter feature maps by suppressing irrelevant spatial regions and amplifying task-relevant ones.
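This page describes the gate only in words. As a rough illustration, the sketch below assumes an additive formulation: the skip-connection features and a coarser gating signal are each passed through a linear map (standing in for a 1x1 convolution), summed, passed through a ReLU, and reduced to one sigmoid coefficient per spatial position, which then scales the skip features. All names (`attention_coefficient`, `gate_skip_features`, `w_x`, `w_g`, `psi`) are hypothetical, the weights are hand-picked rather than learned, and the gating signal is assumed to be already resampled to the skip connection's resolution.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def attention_coefficient(x: Vector, g: Vector, w_x: Matrix, w_g: Matrix,
                          b_g: Vector, psi: Vector, b_psi: float) -> float:
    """Attention coefficient in (0, 1) for one spatial position.

    x: skip-connection features, g: gating features at the same position.
    w_x and w_g have one row per intermediate channel (assumed 1x1 convs).
    """
    hidden = []
    for row_x, row_g, bias in zip(w_x, w_g, b_g):
        pre = (sum(w * v for w, v in zip(row_x, x))
               + sum(w * v for w, v in zip(row_g, g))
               + bias)
        hidden.append(max(0.0, pre))  # ReLU on the summed projections
    score = sum(p * h for p, h in zip(psi, hidden)) + b_psi
    return sigmoid(score)


def gate_skip_features(skip: List[Vector], gating: List[Vector],
                       w_x: Matrix, w_g: Matrix, b_g: Vector,
                       psi: Vector, b_psi: float) -> Tuple[List[Vector], Vector]:
    """Scale each position's skip features by its attention coefficient."""
    alphas = [attention_coefficient(x, g, w_x, w_g, b_g, psi, b_psi)
              for x, g in zip(skip, gating)]
    gated = [[a * v for v in x] for a, x in zip(alphas, skip)]
    return gated, alphas


# Two spatial positions: the first is strongly activated, the second is not.
skip = [[2.0, 1.0], [0.0, 1.0]]
gating = [[1.0], [0.0]]
gated, alphas = gate_skip_features(
    skip, gating,
    w_x=[[1.0, 0.0]], w_g=[[1.0]], b_g=[0.0], psi=[4.0], b_psi=-4.0,
)
print(alphas)  # first coefficient is close to 1, second is close to 0
```

With these toy weights the second position's features are almost entirely suppressed before they reach the decoder, which is the filtering behaviour the paragraph above attributes to the gates.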

If this is right

AGs integrate into standard CNNs such as U-Net with only minor added computation.
Prediction accuracy and sensitivity increase consistently on abdominal CT segmentation tasks.
The gains hold across both datasets and across training set sizes, not just in a single training configuration.
Cascaded localisation-plus-segmentation pipelines become unnecessary.
Computational efficiency remains comparable to the unmodified U-Net.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same gate design could be tested on MRI or ultrasound volumes where organ boundaries vary even more than in CT.
Because the gates operate on feature maps, they might reduce the amount of manual annotation needed for training by guiding the network to salient areas automatically.
Replacing explicit localisation stages with learned attention could shorten overall inference pipelines in clinical workflows.
Combining AGs with other forms of attention, such as channel-wise, remains an open extension not explored in the reported experiments.

Load-bearing premise

Attention gates will reliably learn to suppress irrelevant regions and highlight salient features for target structures of varying shapes and sizes without requiring explicit external tissue or organ localisation modules.

What would settle it

Train both standard U-Net and Attention U-Net on the same small subset of one CT dataset, then measure Dice scores and inference time on a fixed test set. If Dice does not rise, or if the added runtime is more than a small fraction of the baseline's, the central claim does not hold.
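A minimal sketch of that check, assuming binary masks flattened to sequences of 0/1 labels. The Dice coefficient is the standard 2|A∩B| / (|A| + |B|). The helper `claim_survives` and its `max_overhead` tolerance are hypothetical; neither this page nor the abstract fixes a numeric threshold for "minimal" overhead.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / total


def claim_survives(dice_unet: float, dice_attention: float,
                   time_unet: float, time_attention: float,
                   max_overhead: float) -> bool:
    """True if Dice improves and relative runtime overhead stays within tolerance.

    max_overhead is a hypothetical tolerance (e.g. 0.10 for 10%), chosen by
    whoever runs the test.
    """
    overhead = (time_attention - time_unet) / time_unet
    return dice_attention > dice_unet and overhead <= max_overhead


# Toy masks: one true positive, one false positive, no false negatives.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice the two Dice values passed to `claim_survives` would be averages over the fixed test set, computed per class.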

Original abstract

We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes Attention Gates (AGs) as an architectural component to integrate into U-Net for medical image segmentation. AGs automatically learn to suppress irrelevant regions and highlight salient features for target structures of varying shapes and sizes, with the goal of eliminating explicit external tissue/organ localization modules required in cascaded CNN pipelines. The Attention U-Net is evaluated on two large CT abdominal datasets for multi-class segmentation, claiming consistent performance gains over standard U-Net across datasets and training sizes with minimal computational overhead. The code is made publicly available.

Significance. If the central claims hold, the work provides a lightweight attention mechanism that can improve segmentation sensitivity and accuracy in standard CNNs without separate localization stages, which would simplify pipelines in medical imaging. The public code supports reproducibility, a clear strength. However, the significance is limited by the absence of direct comparisons to cascaded baselines, leaving open whether observed gains truly substitute for explicit localization or merely reflect added model capacity.

major comments (2)

Abstract: The load-bearing claim that AGs 'enable us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks' is not supported by the experiments, which compare Attention U-Net only to plain U-Net on two CT datasets. No cascaded baseline (coarse localization network followed by fine segmentation) is evaluated for either Dice accuracy or total inference cost, so it remains possible that gains arise from multi-scale attention adding capacity rather than substituting for localization.
Experimental Results section: The abstract asserts 'consistent gains' and 'improved prediction performance' but the reported evaluation lacks specific quantitative metrics (e.g., Dice scores per class or dataset), error bars, number of training runs, or implementation details such as training sizes and hyper-parameters, which undermines verification of the performance claims and cross-dataset consistency.
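The second comment asks for error bars and run counts. One way such a summary could be produced is sketched below. The function name `summarize_runs` is hypothetical, and the scores in the usage line are made-up placeholders, not values from the paper.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(scores: Sequence[float]) -> Tuple[int, float, float]:
    """Return (number of runs, mean, sample standard deviation)."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return len(scores), mean(scores), stdev(scores)


# Placeholder Dice scores from three hypothetical training runs.
n_runs, avg, spread = summarize_runs([0.80, 0.82, 0.84])
print(f"{avg:.3f} ± {spread:.3f} over {n_runs} runs")
```

Reporting the run count alongside the spread matters because a sample standard deviation from very few runs is itself a noisy estimate.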

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, acknowledging where the manuscript claims require qualification or additional clarification. We will revise the manuscript accordingly to strengthen the presentation of results and temper unsupported assertions.

Point-by-point responses

Referee: Abstract: The load-bearing claim that AGs 'enable us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks' is not supported by the experiments, which compare Attention U-Net only to plain U-Net on two CT datasets. No cascaded baseline (coarse localization network followed by fine segmentation) is evaluated for either Dice accuracy or total inference cost, so it remains possible that gains arise from multi-scale attention adding capacity rather than substituting for localization.

Authors: We agree that the abstract claim is not directly supported by the experiments, as no cascaded baseline is evaluated. The attention gates are intended to provide implicit localization by suppressing irrelevant regions, which is supported by the observed improvements over standard U-Net. However, without a head-to-head comparison on accuracy and inference cost, we cannot claim that AGs fully substitute for explicit localization modules. We will revise the abstract to qualify the statement (e.g., 'can reduce the need for explicit external localization modules') and add a limitations paragraph discussing this point. A cascaded baseline comparison is not feasible to add at this stage due to time and scope constraints.

revision: partial

Referee: Experimental Results section: The abstract asserts 'consistent gains' and 'improved prediction performance' but the reported evaluation lacks specific quantitative metrics (e.g., Dice scores per class or dataset), error bars, number of training runs, or implementation details such as training sizes and hyper-parameters, which undermines verification of the performance claims and cross-dataset consistency.

Authors: The full manuscript reports per-class Dice scores, Hausdorff distances, and other metrics for both datasets in Tables 1–3, with results broken down by training set size (25%, 50%, 100%). Hyperparameters and training protocols are detailed in Section 3.2. We acknowledge that standard deviations across multiple runs and the precise number of independent training runs were not reported. We will add these (from 3 runs per configuration) and ensure all quantitative results are more explicitly cross-referenced in the text to better substantiate the claims of consistent gains.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; novel architectural component with independent empirical tests

Full rationale

The paper proposes attention gates as an independent architectural addition to U-Net and tests them directly on two external CT datasets, reporting Dice improvements over plain U-Net. The central claim (elimination of explicit cascaded localization modules) therefore rests on empirical comparison rather than on any self-referential derivation. No equations, self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The derivation chain consists of a new mechanism plus standard training and evaluation, and remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the newly introduced attention gate entity and the domain assumption that standard CNN training on CT data will allow these gates to learn useful focus patterns without external localisation.

axioms (1)

domain assumption: Convolutional neural networks trained end-to-end on medical CT images can perform multi-class segmentation tasks.
The paper builds directly on the U-Net architecture and its established training regime.

invented entities (1)

Attention Gate (AG) · no independent evidence
Purpose: to automatically learn to focus on target structures of varying shapes and sizes while suppressing irrelevant regions in an input image.
The AG is presented as a novel module that can be inserted into CNNs such as U-Net.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.DAlembert.Inevitability · bilinear_family_forced · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

"AGs automatically learn to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs)."

IndisputableMonolith.Foundation.LedgerCanonicality · no_free_knobs · echoes

"AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy."

IndisputableMonolith.Foundation.DimensionForcing · dimension_forced · unclear

"Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 50 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AuraMask: An Extensible Pipeline for Developing Aesthetic Anti-Facial Recognition Image Filters
cs.CV 2026-05 conditional novelty 7.0

AuraMask produces 40 aesthetic anti-facial recognition filters that match or exceed prior adversarial effectiveness and achieve significantly higher user acceptance in a 630-person study.
TopoU-Net: a U-Net architecture for topological domains
cs.LG 2026-05 unverdicted novelty 7.0

TopoU-Net is a rank-path U-Net for combinatorial complexes that encodes by lifting cochains upward along incidences, decodes by transporting downward, and merges via skip connections at matched ranks.
XAttnRes: Cross-Stage Attention Residuals for Medical Image Segmentation
cs.CV 2026-03 unverdicted novelty 7.0

XAttnRes introduces cross-stage attention residuals that maintain a global feature history and selectively aggregate prior representations, improving medical image segmentation and performing on par with baselines eve...
Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation
cs.CV 2026-03 unverdicted novelty 7.0

GDLA delivers state-of-the-art accuracy on CT, MRI, ultrasound and dermoscopy segmentation benchmarks while keeping linear O(N) complexity in a PVT encoder-decoder.
Information Filtering via Variational Regularization for Robot Manipulation
cs.RO 2026-01 unverdicted novelty 7.0

Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld whil...
S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss
cs.CV 2026-01 unverdicted novelty 7.0

S2M-Net achieves state-of-the-art Dice scores on 16 medical datasets across 8 modalities using a 4.7M-parameter spectral-spatial mixer and morphology-aware adaptive loss, outperforming transformers with 3.5-6x fewer p...
StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels
cs.CV 2026-05 unverdicted novelty 6.0

StruMPL is a multi-task dense regression model that jointly addresses disjoint partial supervision, MNAR labels, and inter-task physical constraints for improved forest biomass estimation from Earth observation.
Spectral Vision Transformer for Efficient Tokenization with Limited Data
cs.CV 2026-05 unverdicted novelty 6.0

A spectral vision transformer achieves comparable or superior performance with fewer parameters than standard ViTs, CNNs, and other models by using spectral projections for tokenization in limited-data medical imaging.
FEFormer: Frequency-enhanced Vision Transformer for Generic Knowledge Extraction and Adaptive Feature Fusion in Volumetric Medical Image Segmentation
eess.IV 2026-05 unverdicted novelty 6.0

A frequency-enhanced Vision Transformer with FDSA, FGMLP, WAFF, and FCSB modules delivers superior volumetric medical image segmentation performance and efficiency over prior state-of-the-art methods.
Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention
cs.CV 2026-05 unverdicted novelty 6.0

Polygon-Mamba achieves F1 scores of 0.8283, 0.8282, and 0.8251 on DRIVE, STARE, and CHASE_DB1 by combining polygon scanning Mamba with space-frequency collaborative attention to better detect small retinal vessels.
ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation
cs.CV 2026-04 unverdicted novelty 6.0

ESICA delivers state-of-the-art accuracy on a five-modality 3D medical segmentation benchmark while offering a compact variant with far fewer parameters.
Mapping License Plate Recoverability Under Extreme Viewing Angles for Opportunistic Urban Sensing
cs.CV 2026-04 unverdicted novelty 6.0

Recoverability maps use synthetic sweeps of viewing angles and artifacts to quantify the recoverable fraction of parameter space for license plate restoration, with the best model succeeding on 93% and geometry settin...
Learning from Noisy Prompts: Saliency-Guided Prompt Distillation for Robust Segmentation with SAM
cs.CV 2026-04 unverdicted novelty 6.0

SPD improves SAM segmentation robustness to noisy prompts by learning anatomical saliency priors, distilling consensus prompts from adjacent slices, and enforcing pairwise slice consistency.
Toward Polymorphic Backdoor against Semantic Communication via Intensity-Based Poisoning
cs.CR 2026-04 unverdicted novelty 6.0

SemBugger achieves polymorphic backdoors in semantic communication via graded-intensity trigger poisoning and hierarchical loss, plus a noise-based defense with a theoretical efficacy bound.
CDSA-Net: Collaborative Decoupling of Vascular Structure and Background for High-Fidelity Coronary Digital Subtraction Angiography
cs.CV 2026-04 unverdicted novelty 6.0

CDSA-Net decouples vascular structure extraction and background restoration in coronary DSA via hierarchical geometric priors and adaptive noise modeling to eliminate artifacts while preserving tissue fidelity.
Geometrical Cross-Attention and Nonvoid Voxelization for Efficient 3D Medical Image Segmentation
cs.CV 2026-04 unverdicted novelty 6.0

GCNV-Net achieves state-of-the-art accuracy on multiple 3D medical segmentation benchmarks while cutting FLOPs by 56% and inference latency by 68% through dynamic nonvoid voxelization and geometric attention.
CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing
cs.CV 2025-12 unverdicted novelty 6.0

The paper defines the Conformal Hallucination Estimation Metric (CHEM) that localizes hallucination-prone regions in image reconstruction models via multiscale representations and distribution-free conformal regression.
Can We Go Beyond Visual Features? Neural Tissue Relation Modeling for Relational Graph Analysis in Non-Melanoma Skin Histology
cs.CV 2025-12 unverdicted novelty 6.0

NTRM combines CNNs with tissue-level graph neural networks to model inter-tissue relationships, delivering 4.9% to 31.25% higher Dice scores than prior methods on a non-melanoma skin cancer histology segmentation benchmark.
SAMRI: Segment Any MRI
eess.IV 2025-10 conditional novelty 6.0

SAMRI fine-tunes only the mask decoder of SAM on 1.1 million MRI slices from 30 datasets to reach mean DSC 0.87 on 47 targets and strong zero-shot performance.
Category-based Galaxy Image Generation via Diffusion Models
astro-ph.IM 2025-06 unverdicted novelty 6.0

GalCatDiff applies category embeddings and a novel Astro-RAB block inside diffusion models to produce galaxy images whose color and size distributions match observations more closely than prior generative approaches.
Learning Parallax for Stereo Event-based Motion Deblurring
cs.CV 2023-09 unverdicted novelty 6.0

St-EDNet recovers sharp images from misaligned blurry intensity images and event streams by performing coarse cross-modal stereo alignment followed by fine bidirectional feature reconstruction.
M²SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation
cs.CV 2023-03 conditional novelty 6.0

M²SNet uses intra- and inter-layer multi-scale subtraction units plus a training-free LossNet to generate difference features that reduce redundancy in decoder fusion for medical segmentation.
MHMamba: Multi-Head Mamba for 3D Brain Tumor Segmentation
cs.CV 2026-05 unverdicted novelty 5.0

MHMamba combines a U-Net with multi-head Mamba, channel calibration, and adaptive skip fusion to improve 3D brain tumor segmentation accuracy and small-lesion sensitivity on BraTS datasets while retaining linear complexity.
Geometric Flood Depth Estimation: Fusing Transformer-Based Segmentation with Digital Elevation Models
cs.CV 2026-05 unverdicted novelty 5.0

A pipeline uses Mask2Former flood masks and DEMs to compute a single water surface elevation then derives local depths under hydrostatic equilibrium.
Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction
cs.CV 2026-05 unverdicted novelty 5.0

A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.
MambaLiteUNet: Cross-Gated Adaptive Feature Fusion for Robust Skin Lesion Segmentation
cs.CV 2026-04 unverdicted novelty 5.0

MambaLiteUNet integrates Mamba into U-Net with adaptive fusion, local-global mixing, and cross-gated attention modules to reach 87.12% IoU and 93.09% Dice on skin lesion datasets while cutting parameters by 93.6%.
EDU-Net: Retinal Pathological Fluid Segmentation in OCT Images with Multiscale Feature Fusion and Boundary Optimization
eess.IV 2026-04 unverdicted novelty 5.0

EDU-Net fuses multiscale local and global features with boundary optimization to achieve state-of-the-art segmentation of intraretinal and subretinal fluid in OCT images.
Align then Refine: Text-Guided 3D Prostate Lesion Segmentation
cs.CV 2026-04 unverdicted novelty 5.0

A text-guided multi-encoder U-Net with alignment loss, heatmap calibration, and confidence-gated cross-attention refiner sets new state-of-the-art 3D prostate lesion segmentation performance on the PI-CAI dataset.
HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation
cs.CV 2026-04 unverdicted novelty 5.0

HQF-Net reports mIoU gains on three remote-sensing benchmarks by adding quantum circuits to skip connections and a mixture-of-experts bottleneck inside a classical U-Net fused with a DINOv3 backbone.
Attention-Guided Flow-Matching for Sparse 3D Geological Generation
cs.CV 2026-04 unverdicted novelty 5.0

3D-GeoFlow reformulates discrete categorical 3D geological generation as simulation-free continuous vector field regression with 3D attention gates, claiming to outperform heuristics and diffusion models on a 2,200-ca...
GroupKAN: Efficient Kolmogorov-Arnold Networks via Grouped Spline Modeling
cs.CV 2025-11 conditional novelty 5.0

GroupKAN reduces KAN parameter scaling via intra-group spline mappings, delivering 79.80% average IoU (+1.11% over U-KAN) at 47.6% of the parameters on BUSI, GlaS, and CVC datasets.
BGRem: A background noise remover for astronomical images based on a diffusion model
astro-ph.IM 2025-10 unverdicted novelty 5.0

BGRem applies a supervised diffusion model to denoise MeerLICHT and Fermi-LAT images, raising true-positive source detections by roughly 7% when used before SExtractor.
A novel attention mechanism for noise-adaptive and robust segmentation of microtubules in microscopy images
q-bio.QM 2025-07 conditional novelty 5.0

ASE_Res_UNet with a novel noise-adaptive attention mechanism outperforms ablated variants and alternative architectures in segmenting microtubules from noisy synthetic and real microscopy images while using fewer para...
MSLAU-Net: A Hybrid CNN-Transformer Network for Medical Image Segmentation
cs.CV 2025-05 conditional novelty 5.0

MSLAU-Net proposes a hybrid CNN-Transformer architecture using multi-scale linear attention and lightweight top-down aggregation that outperforms prior methods on medical segmentation benchmarks across three modalities.
Gamma-Ray Burst Light Curve Reconstruction: A Comparative Machine and Deep Learning Analysis
astro-ph.HE 2024-12 unverdicted novelty 5.0

MLP and Attention U-Net outperform other models in reconstructing GRB light curves on 521 events, cutting plateau parameter uncertainties by 37-41% versus the Willingale baseline while achieving low MSE.
Implantable Adaptive Cells: A Novel Enhancement for Pre-Trained U-Nets in Medical Image Segmentation
cs.CV 2024-05 unverdicted novelty 5.0

Introduces Implantable Adaptive Cells inserted into pre-trained U-Nets via Partially-Connected DARTS to achieve approximately 5 percentage point gains in segmentation accuracy on four medical MRI/CT datasets.
ConvNeXt-FD: A Fractal-Based Deep Model for Robust Biomedical Image Segmentation
cs.CV 2026-05 unverdicted novelty 4.0

ConvNeXt-FD pairs a ConvNeXt backbone with fractal-dimension boundary regularization inside a U-Net and reports competitive Dice and related scores on six biomedical segmentation benchmarks.
Med-DisSeg: Dispersion-Driven Representation Learning for Fine-Grained Medical Image Segmentation
cs.CV 2026-05 unverdicted novelty 4.0

Med-DisSeg uses a dispersive loss on batch representations plus adaptive multi-scale decoding to achieve state-of-the-art fine-grained segmentation on five medical imaging datasets.
Edge-Cloud Collaborative Pothole Detection via Onboard Event Screening and Federated Temporal Segmentation
cs.DC 2026-05 unverdicted novelty 4.0

An edge-cloud framework screens vibration events onboard with a GMM and uses a federated 1D Attention U-Net for temporal segmentation to detect potholes while reducing data transmission.
Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection
cs.CV 2026-05 unverdicted novelty 4.0

A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher mod...
MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer
cs.CV 2026-04 unverdicted novelty 4.0

MAE self-supervised pretraining of nnFormer yields higher Dice scores, faster convergence, and better generalization when labeled medical segmentation data is scarce.
PBE-UNet: A light weight Progressive Boundary-Enhanced U-Net with Scale-Aware Aggregation for Ultrasound Image Segmentation
cs.CV 2026-04 unverdicted novelty 4.0

PBE-UNet adds scale-aware aggregation and progressive boundary expansion modules to U-Net and reports better segmentation performance than prior methods on four ultrasound datasets.
SwinTextUNet: Integrating CLIP-Based Text Guidance into Swin Transformer U-Nets for Medical Image Segmentation
cs.CV 2026-04 unverdicted novelty 4.0

SwinTextUNet integrates CLIP text guidance into Swin U-Net via cross-attention and convolutional fusion, achieving 86.47% Dice and 78.2% IoU on QaTaCOV19 medical image segmentation.
SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs
cs.CV 2026-04 unverdicted novelty 4.0

SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.
Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation
cs.CV 2025-10 unverdicted novelty 4.0

FM-BFF-Net combines focal modulation attention with bidirectional encoder-decoder fusion in a CNN-transformer architecture and reports higher Dice and Jaccard scores than recent methods across eight medical image datasets.
Clinical utility of foundation models in musculoskeletal MRI for biomarker fidelity and predictive outcomes
eess.IV 2025-01 unverdicted novelty 4.0

Fine-tuned foundation models produce reliable MSK MRI biomarkers that support workload-reducing triage and calibrated 48-month prediction of knee replacement and incident OA.
Deep Learning for Pneumothorax Detection and Localization in Chest Radiographs
eess.IV 2019-07 unverdicted novelty 4.0

Comparison of CNN, multiple-instance learning, and FCN for pneumothorax detection and localization yielding AUCs of 0.96, 0.93, and 0.92 on 1003 chest radiographs.
Attention-ResUNet for Automated Fetal Head Segmentation
cs.CV 2026-04 unverdicted novelty 3.0

Attention-ResUNet reaches 99.30% mean Dice score on the HC18 fetal head ultrasound dataset, outperforming ResUNet, Attention U-Net, Swin U-Net, U-Net, and U-Net++ with statistical significance.
Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS)
cs.CV 2026-04 unverdicted novelty 3.0

ADRUwAMS reports Dice scores of 0.9229 (whole tumor), 0.8432 (tumor core), and 0.8004 (enhancing tumor) on BraTS 2020 after training on BraTS 2019/2020 datasets.
Benchmarking CNN- and Transformer-Based Models for Surgical Instrument Segmentation in Robotic-Assisted Surgery
cs.CV 2026-04 unverdicted novelty 2.0

DeepLabV3 matches SegFormer performance in multi-class surgical instrument segmentation while convolutional baselines like UNet remain competitive on the SAR-RARP50 dataset.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · cited by 50 Pith papers · 3 internal anchors

[1]

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., Zhang, L.: Bottom-up and top-down attention for image captioning and vqa. arXiv preprint arXiv:1707.07998 (2017)

work page Pith review arXiv 2017
[2]

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau, D., Cho, K., Bengio, Y .: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[3]

arXiv preprint arXiv:1710.09289 (2017)

Bai, W., Sinclair, M., Tarroni, G., Oktay, O., Rajchl, M., Vaillant, G., Lee, A.M., Aung, N., Lukaschuk, E., Sanghvi, M.M., et al.: Human-level CMR image analysis with deep fully convolutional networks. arXiv preprint arXiv:1710.09289 (2017)

work page arXiv 2017
[4]

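Improving deep pancreas segmentation in CT and MRI images via recurrent neural contextual learning and direct loss function

Cai, J., Lu, L., Xie, Y., Xing, F., Yang, L.: Improving deep pancreas segmentation in CT and MRI images via recurrent neural contextual learning and direct loss function. In: MICCAI (2017)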

work page 2017
[5]

Soft multi-organ shape models via generalized PCA: A general framework

Cerrolaza, J.J., Summers, R.M., Linguraru, M.G.: Soft multi-organ shape models via generalized PCA: A general framework. In: MICCAI. pp. 219–228. Springer (2016)

work page 2016
[6]

Towards image-guided pancreas and biliary endoscopy: Automatic multi-organ segmentation on abdominal CT with dense dilated networks

Gibson, E., Giganti, F., Hu, Y., Bonmati, E., Bandula, S., Gurusamy, K., Davidson, B.R., Pereira, S.P., Clarkson, M.J., Barratt, D.C.: Towards image-guided pancreas and biliary endoscopy: Automatic multi-organ segmentation on abdominal CT with dense dilated networks. In: MICCAI. pp. 728–736. Springer (2017)

work page 2017
[7]

Highway and Residual Networks learn Unrolled Iterative Estimation

Greff, K., Srivastava, R.K., Schmidhuber, J.: Highway and residual networks learn unrolled iterative estimation. arXiv preprint arXiv:1612.07771 (2016)

work page Pith review arXiv 2016
[8]

TernaryNet: Faster deep model inference without GPUs for medical 3D segmentation using sparse and binary convolutions

Heinrich, M.P., Blendowski, M., Oktay, O.: TernaryNet: Faster deep model inference without GPUs for medical 3D segmentation using sparse and binary convolutions. arXiv preprint arXiv:1801.09449 (2018)

work page arXiv 2018
[9]

BRIEFnet: Deep pancreas segmentation using binary sparse convolutions

Heinrich, M.P., Oktay, O.: BRIEFnet: Deep pancreas segmentation using binary sparse convolutions. In: MICCAI. pp. 329–337. Springer (2017)

work page 2017
[10]

Squeeze-and-Excitation Networks

Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. arXiv:1709.01507 (2017)

work page Pith review arXiv 2017
[11]

Learn to pay attention

Jetley, S., Lord, N.A., Lee, N., Torr, P.: Learn to pay attention. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=HyzbhfWRW

work page 2018
[12]

Ensembles of multiple models and architectures for robust brain tumour segmentation

Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S., Sinclair, M., Pawlowski, N., Rajchl, M., Lee, M., Kainz, B., Rueckert, D., Glocker, B.: Ensembles of multiple models and architectures for robust brain tumour segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 450–462. Cham (2018)

work page 2018
[13]

Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation

Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical image analysis 36, 61–78 (2017)

work page 2017
[14]

Fully convolutional multi-scale residual densenets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers

Khened, M., Kollerathu, V.A., Krishnamurthi, G.: Fully convolutional multi-scale residual densenets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. arXiv preprint arXiv:1801.05173 (2018)

work page arXiv 2018
[15]

Adam: A Method for Stochastic Optimization

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[16]

Deeply-supervised nets

Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: Artificial Intelligence and Statistics. pp. 562–570 (2015)

work page 2015
[17]

Evaluate the malignancy of pulmonary nodules using the 3D deep leaky noisy-or network

Liao, F., Liang, M., Li, Z., Hu, X., Song, S.: Evaluate the malignancy of pulmonary nodules using the 3D deep leaky noisy-or network. arXiv preprint arXiv:1711.08324 (2017)

work page arXiv 2017
[18]

Fully convolutional networks for semantic segmentation

Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE CVPR. pp. 3431–3440 (2015)

work page 2015
[19]

Effective Approaches to Attention-based Neural Machine Translation

Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)

work page Pith review arXiv 2015
[20]

V-net: Fully convolutional neural networks for volumetric medical image segmentation

Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 3D Vision. pp. 565–571. IEEE (2016)

work page 2016
[21]

Recurrent models of visual attention

Mnih, V., Heess, N., Graves, A., et al.: Recurrent models of visual attention. In: Advances in neural information processing systems. pp. 2204–2212 (2014)

work page 2014
[22]

3D FCN feature driven regression forest-based pancreas localization and segmentation

Oda, M., Shimizu, N., Roth, H.R., Karasawa, K., Kitasaka, T., Misawa, K., Fujiwara, M., Rueckert, D., Mori, K.: 3D FCN feature driven regression forest-based pancreas localization and segmentation. In: DLMI, pp. 222–230. Springer (2017)

work page 2017
[23]

Multi-label whole heart segmentation using CNNs and anatomical label configurations

Payer, C., Štern, D., Bischof, H., Urschler, M.: Multi-label whole heart segmentation using CNNs and anatomical label configurations. In: STACOM. pp. 190–198. Springer (2017)

work page 2017
[24]

U-net: Convolutional networks for biomedical image segmentation

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: MICCAI. pp. 234–241. Springer (2015)

work page 2015
[25]

Data from Pancreas-CT

Roth, H., Farag, A., Turkbey, E.B., Lu, L., Liu, J., Summers, R.M.: Data from Pancreas-CT. The Cancer Imaging Archive (2016), http://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU

work page doi:10.7937/k9/tcia.2016.tnb1kqbu 2016
[26]

Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation

Roth, H.R., Lu, L., Lay, N., Harrison, A.P., Farag, A., Sohn, A., Summers, R.M.: Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation. Medical Image Analysis 45, 94–107 (2018)

work page 2018
[27]

Hierarchical 3D fully convolutional networks for multi-organ segmentation

Roth, H.R., Oda, H., Hayashi, Y., Oda, M., Shimizu, N., Fujiwara, M., Misawa, K., Mori, K.: Hierarchical 3D fully convolutional networks for multi-organ segmentation. arXiv preprint arXiv:1704.06382 (2017)

work page Pith review arXiv 2017
[28]

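Joint optimization of segmentation and shape prior from level-set-based statistical shape model, and its application to the automated segmentation of abdominal organs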

Saito, A., Nawano, S., Shimizu, A.: Joint optimization of segmentation and shape prior from level-set-based statistical shape model, and its application to the automated segmentation of abdominal organs. Medical image analysis 28, 46–65 (2016)

work page 2016
[29]

Disan: Directional self-attention network for rnn/cnn-free language understanding

Shen, T., Zhou, T., Long, G., Jiang, J., Pan, S., Zhang, C.: Disan: Directional self-attention network for rnn/cnn-free language understanding. arXiv preprint arXiv:1709.04696 (2017)

work page arXiv 2017
[30]

Attention is all you need

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NIPS. pp. 6000–6010 (2017)

work page 2017
[31]

Graph Attention Networks

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[32]

Residual attention network for image classification

Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. In: IEEE CVPR. pp. 3156–3164 (2017)

work page 2017
[33]

Non-local Neural Networks

Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. arXiv preprint arXiv:1711.07971 (2017)

work page Pith review arXiv 2017
[34]

Automated abdominal multi-organ segmentation with subject-specific atlas generation

Wolz, R., Chu, C., Misawa, K., Fujiwara, M., Mori, K., Rueckert, D.: Automated abdominal multi-organ segmentation with subject-specific atlas generation. IEEE TMI 32(9) (2013)

work page 2013
[35]

Holistically-nested edge detection

Xie, S., Tu, Z.: Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision. pp. 1395–1403 (2015)

work page 2015
[36]

Learning what to look in chest X-rays with a recurrent visual attention model

Ypsilantis, P.P., Montana, G.: Learning what to look in chest X-rays with a recurrent visual attention model. arXiv preprint arXiv:1701.06452 (2017)

work page arXiv 2017
[37]

Recurrent saliency transformation network: Incorporating multi-stage visual cues for small organ segmentation

Yu, Q., Xie, L., Wang, Y., Zhou, Y., Fishman, E.K., Yuille, A.L.: Recurrent saliency transformation network: Incorporating multi-stage visual cues for small organ segmentation. arXiv preprint arXiv:1709.04518 (2017)

work page arXiv 2017
[38]

A fixed-point model for pancreas segmentation in abdominal CT scans

Zhou, Y., Xie, L., Shen, W., Wang, Y., Fishman, E.K., Yuille, A.L.: A fixed-point model for pancreas segmentation in abdominal CT scans. In: MICCAI. pp. 693–701. Springer (2017)

work page 2017
[39]

Hierarchical multi-organ segmentation without registration in 3D abdominal CT images

Zografos, V., Valentinitsch, A., Rempfler, M., Tombari, F., Menze, B.: Hierarchical multi-organ segmentation without registration in 3D abdominal CT images. In: International MICCAI Workshop on Medical Computer Vision. pp. 37–46. Springer (2015)

work page 2015
