pith. machine review for the scientific record.

arxiv: 1804.03999 · v3 · submitted 2018-04-11 · 💻 cs.CV

Recognition: 3 theorem links


Attention U-Net: Learning Where to Look for the Pancreas

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 21:13 UTC · model grok-4.3

classification 💻 cs.CV
keywords attention gates · U-Net · medical image segmentation · CT imaging · pancreas segmentation · convolutional neural networks · deep learning · image segmentation

The pith

Attention gates added to U-Net let the model learn to focus on target structures in CT images, raising segmentation accuracy while removing the need for separate organ localization steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes attention gates that plug into standard U-Net models for medical image segmentation. These gates learn during training to highlight useful features and suppress background areas that do not help the task, handling organs of different sizes without extra preprocessing networks. Evaluation on two large abdominal CT datasets shows steady gains in accuracy and sensitivity over plain U-Net across multiple training set sizes, while adding only a small computational cost. The approach simplifies cascaded pipelines that first locate tissue before segmenting it, offering a direct way to make convolutional networks more precise on variable anatomy with little extra effort.

Core claim

The authors introduce attention gates (AGs) that automatically learn to focus on target structures of varying shapes and sizes in medical images. When integrated into U-Net, the gates suppress irrelevant regions in the input while emphasizing salient features for the segmentation task. This removes the requirement for explicit external tissue or organ localisation modules in cascaded CNNs. Experiments on two large CT abdominal datasets for multi-class segmentation demonstrate that AGs improve U-Net prediction performance consistently across datasets and training sizes while preserving computational efficiency.

What carries the argument

Attention gates (AGs): modules inserted into the skip connections of U-Net that learn to filter feature maps, suppressing irrelevant spatial regions and amplifying task-relevant ones.
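A minimal numpy sketch of this additive gate, assuming the formulation the paper describes (1x1 convolutions on the skip features and the gating signal, a ReLU, then a sigmoid producing per-pixel coefficients); shapes and names here are illustrative, not the authors' code:

```python
import numpy as np

def attention_gate(x, g, W_x, W_g, psi):
    """Additive attention gate over one skip connection (illustrative shapes).

    x   : skip-connection features, shape (F_l, H, W)
    g   : gating signal from the coarser decoder level, shape (F_g, H, W),
          assumed already resampled to x's spatial size
    W_x : (F_int, F_l) weights of a 1x1 convolution applied to x
    W_g : (F_int, F_g) weights of a 1x1 convolution applied to g
    psi : (F_int,) weights collapsing joint features to one score per pixel
    """
    # 1x1 convolutions are per-pixel linear maps over the channel axis
    q = np.einsum('if,fhw->ihw', W_x, x) + np.einsum('ig,ghw->ihw', W_g, g)
    q = np.maximum(q, 0.0)                    # ReLU
    score = np.einsum('i,ihw->hw', psi, q)    # one attention score per pixel
    alpha = 1.0 / (1.0 + np.exp(-score))      # sigmoid coefficients in (0, 1)
    return alpha[None] * x                    # scale skip features channel-wise
```

Because alpha lies in (0, 1), the gate can only attenuate skip features; that attenuation is how irrelevant background regions get suppressed before concatenation in the decoder.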

If this is right

  • AGs integrate into standard CNNs such as U-Net with only minor added computation.
  • Prediction accuracy and sensitivity increase consistently on abdominal CT segmentation tasks.
  • The model works across different dataset sizes and multiple training conditions without retraining the base architecture.
  • Cascaded localisation-plus-segmentation pipelines become unnecessary.
  • Computational efficiency remains comparable to the unmodified U-Net.
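The "minor added computation" bullet can be made concrete with a back-of-the-envelope weight count. The channel widths below are hypothetical, chosen only to show the order of magnitude; the gate's three 1x1 convolutions stay well below the cost of the 3x3 convolutions already present at the same decoder stage:

```python
def ag_weight_count(F_l, F_g, F_int):
    """Weights added by one additive attention gate: three 1x1 convolutions
    (W_x: F_l -> F_int, W_g: F_g -> F_int, psi: F_int -> 1), biases ignored."""
    return F_l * F_int + F_g * F_int + F_int

def conv3x3_weight_count(c_in, c_out):
    """Weights in one ordinary 3x3 convolution at the same stage."""
    return 3 * 3 * c_in * c_out

# hypothetical channel widths for one U-Net decoder stage (not from the paper)
F_l, F_g, F_int = 128, 256, 128
added = ag_weight_count(F_l, F_g, F_int)        # 49,280 weights
baseline = conv3x3_weight_count(F_l, F_l)       # 147,456 weights
print(added, baseline, round(added / baseline, 2))
```

One gate costs roughly a third of a single 3x3 convolution at that width, and each stage typically contains two such convolutions, so the relative overhead across the whole network stays small.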

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same gate design could be tested on MRI or ultrasound volumes where organ boundaries vary even more than in CT.
  • Because the gates operate on feature maps, they might reduce the amount of manual annotation needed for training by guiding the network to salient areas automatically.
  • Replacing explicit localisation stages with learned attention could shorten overall inference pipelines in clinical workflows.
  • Combining AGs with other forms of attention, such as channel-wise, remains an open extension not explored in the reported experiments.

Load-bearing premise

Attention gates will reliably learn to suppress irrelevant regions and highlight salient features for target structures of varying shapes and sizes without requiring explicit external tissue or organ localisation modules.

What would settle it

Train both standard U-Net and Attention U-Net on the same small subset of one CT dataset and measure Dice scores plus inference time on a fixed test set. If Dice does not rise, or the runtime overhead is more than minimal, the central claim does not hold.
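The proposed test hinges on the Dice score. A minimal sketch of that metric on binary masks (toy arrays, not the paper's data):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (1.0 = perfect overlap)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# toy masks: prediction covers half the true region plus one false positive
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [1, 0]])
print(round(dice_score(pred, target), 3))  # 2*1 / (2 + 2) = 0.5
```

A per-class Dice comparison on a held-out test set, paired with wall-clock inference time, is exactly the evidence that would confirm or refute the accuracy and overhead claims together.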

read the original abstract

We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes Attention Gates (AGs) as an architectural component to integrate into U-Net for medical image segmentation. AGs automatically learn to suppress irrelevant regions and highlight salient features for target structures of varying shapes and sizes, with the goal of eliminating explicit external tissue/organ localization modules required in cascaded CNN pipelines. The Attention U-Net is evaluated on two large CT abdominal datasets for multi-class segmentation, claiming consistent performance gains over standard U-Net across datasets and training sizes with minimal computational overhead. The code is made publicly available.

Significance. If the central claims hold, the work provides a lightweight attention mechanism that can improve segmentation sensitivity and accuracy in standard CNNs without separate localization stages, which would simplify pipelines in medical imaging. The public code supports reproducibility, a clear strength. However, the significance is limited by the absence of direct comparisons to cascaded baselines, leaving open whether observed gains truly substitute for explicit localization or merely reflect added model capacity.

major comments (2)
  1. Abstract: The load-bearing claim that AGs 'enable us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks' is not supported by the experiments, which compare Attention U-Net only to plain U-Net on two CT datasets. No cascaded baseline (coarse localization network followed by fine segmentation) is evaluated for either Dice accuracy or total inference cost, so it remains possible that gains arise from multi-scale attention adding capacity rather than substituting for localization.
  2. Experimental Results section: The abstract asserts 'consistent gains' and 'improved prediction performance' but the reported evaluation lacks specific quantitative metrics (e.g., Dice scores per class or dataset), error bars, number of training runs, or implementation details such as training sizes and hyper-parameters, which undermines verification of the performance claims and cross-dataset consistency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, acknowledging where the manuscript claims require qualification or additional clarification. We will revise the manuscript accordingly to strengthen the presentation of results and temper unsupported assertions.

read point-by-point responses
  1. Referee: Abstract: The load-bearing claim that AGs 'enable us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks' is not supported by the experiments, which compare Attention U-Net only to plain U-Net on two CT datasets. No cascaded baseline (coarse localization network followed by fine segmentation) is evaluated for either Dice accuracy or total inference cost, so it remains possible that gains arise from multi-scale attention adding capacity rather than substituting for localization.

    Authors: We agree that the abstract claim is not directly supported by the experiments, as no cascaded baseline is evaluated. The attention gates are intended to provide implicit localization by suppressing irrelevant regions, which is supported by the observed improvements over standard U-Net. However, without a head-to-head comparison on accuracy and inference cost, we cannot claim that AGs fully substitute for explicit localization modules. We will revise the abstract to qualify the statement (e.g., 'can reduce the need for explicit external localization modules') and add a limitations paragraph discussing this point. A cascaded baseline comparison is not feasible to add at this stage due to time and scope constraints. revision: partial

  2. Referee: Experimental Results section: The abstract asserts 'consistent gains' and 'improved prediction performance' but the reported evaluation lacks specific quantitative metrics (e.g., Dice scores per class or dataset), error bars, number of training runs, or implementation details such as training sizes and hyper-parameters, which undermines verification of the performance claims and cross-dataset consistency.

    Authors: The full manuscript reports per-class Dice scores, Hausdorff distances, and other metrics for both datasets in Tables 1–3, with results broken down by training set size (25%, 50%, 100%). Hyperparameters and training protocols are detailed in Section 3.2. We acknowledge that standard deviations across multiple runs and the precise number of independent training runs were not reported. We will add these (from 3 runs per configuration) and ensure all quantitative results are more explicitly cross-referenced in the text to better substantiate the claims of consistent gains. revision: yes
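The promised mean and standard deviation over independent runs is straightforward to report; a sketch with hypothetical Dice values (not the paper's numbers):

```python
import numpy as np

# hypothetical per-run Dice scores for one configuration, n = 3 runs
runs = np.array([0.821, 0.815, 0.830])
mean = runs.mean()
std = runs.std(ddof=1)  # sample std across independent training runs
print(f"Dice {mean:.3f} +/- {std:.3f} (n={len(runs)})")
```

Reporting the sample standard deviation (ddof=1) across runs, per configuration and per class, is what would let a reader judge whether the claimed gains exceed run-to-run noise.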

Circularity Check

0 steps flagged

No significant circularity; novel architectural component with independent empirical tests

full rationale

The paper proposes attention gates as an independent architectural addition to U-Net, with the central claim (elimination of explicit cascaded localization modules) supported by direct experiments on two external CT datasets showing Dice improvements. No equations, self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The derivation chain consists of a new mechanism plus standard training and evaluation, remaining self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the newly introduced attention gate entity and the domain assumption that standard CNN training on CT data will allow these gates to learn useful focus patterns without external localisation.

axioms (1)
  • domain assumption Convolutional neural networks trained end-to-end on medical CT images can perform multi-class segmentation tasks.
    The paper builds directly on the U-Net architecture and its established training regime.
invented entities (1)
  • Attention Gate (AG) no independent evidence
    purpose: To automatically learn to focus on target structures of varying shapes and sizes while suppressing irrelevant regions in an input image.
    The AG is presented as a novel module that can be inserted into CNNs such as U-Net.

pith-pipeline@v0.9.0 · 5479 in / 1227 out tokens · 86936 ms · 2026-05-12T21:13:01.960033+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith.Foundation.DAlembert.Inevitability bilinear_family_forced · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    AGs automatically learn to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs).

  • IndisputableMonolith.Foundation.LedgerCanonicality no_free_knobs · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy.

  • IndisputableMonolith.Foundation.DimensionForcing dimension_forced · unclear

    UNCLEAR: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

    Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 29 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AuraMask: An Extensible Pipeline for Developing Aesthetic Anti-Facial Recognition Image Filters

    cs.CV 2026-05 conditional novelty 7.0

    AuraMask produces 40 aesthetic anti-facial recognition filters that match or exceed prior adversarial effectiveness and achieve significantly higher user acceptance in a 630-person study.

  2. TopoU-Net: a U-Net architecture for topological domains

    cs.LG 2026-05 unverdicted novelty 7.0

    TopoU-Net is a rank-path U-Net for combinatorial complexes that encodes by lifting cochains upward along incidences, decodes by transporting downward, and merges via skip connections at matched ranks.

  3. XAttnRes: Cross-Stage Attention Residuals for Medical Image Segmentation

    cs.CV 2026-03 unverdicted novelty 7.0

    XAttnRes introduces cross-stage attention residuals that maintain a global feature history and selectively aggregate prior representations, improving medical image segmentation and performing on par with baselines eve...

  4. Spectral Vision Transformer for Efficient Tokenization with Limited Data

    cs.CV 2026-05 unverdicted novelty 6.0

    A spectral vision transformer achieves equitable or superior performance with fewer parameters than standard ViTs, CNNs, and other models by using spectral projections for tokenization in limited-data medical imaging.

  5. FEFormer: Frequency-enhanced Vision Transformer for Generic Knowledge Extraction and Adaptive Feature Fusion in Volumetric Medical Image Segmentation

    eess.IV 2026-05 unverdicted novelty 6.0

    A frequency-enhanced Vision Transformer with FDSA, FGMLP, WAFF, and FCSB modules delivers superior volumetric medical image segmentation performance and efficiency over prior state-of-the-art methods.

  6. Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention

    cs.CV 2026-05 unverdicted novelty 6.0

    Polygon-Mamba achieves F1 scores of 0.8283, 0.8282, and 0.8251 on DRIVE, STARE, and CHASE_DB1 by combining polygon scanning Mamba with space-frequency collaborative attention to better detect small retinal vessels.

  7. ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation

    cs.CV 2026-04 unverdicted novelty 6.0

    ESICA delivers state-of-the-art accuracy on a five-modality 3D medical segmentation benchmark while offering a compact variant with far fewer parameters.

  8. Mapping License Plate Recoverability Under Extreme Viewing Angles for Opportunistic Urban Sensing

    cs.CV 2026-04 unverdicted novelty 6.0

    Recoverability maps use synthetic sweeps of viewing angles and artifacts to quantify the recoverable fraction of parameter space for license plate restoration, with the best model succeeding on 93% and geometry settin...

  9. Learning from Noisy Prompts: Saliency-Guided Prompt Distillation for Robust Segmentation with SAM

    cs.CV 2026-04 unverdicted novelty 6.0

    SPD improves SAM segmentation robustness to noisy prompts by learning anatomical saliency priors, distilling consensus prompts from adjacent slices, and enforcing pairwise slice consistency.

  10. Toward Polymorphic Backdoor against Semantic Communication via Intensity-Based Poisoning

    cs.CR 2026-04 unverdicted novelty 6.0

    SemBugger achieves polymorphic backdoors in semantic communication via graded-intensity trigger poisoning and hierarchical loss, plus a noise-based defense with a theoretical efficacy bound.

  11. CDSA-Net: Collaborative Decoupling of Vascular Structure and Background for High-Fidelity Coronary Digital Subtraction Angiography

    cs.CV 2026-04 unverdicted novelty 6.0

    CDSA-Net decouples vascular structure extraction and background restoration in coronary DSA via hierarchical geometric priors and adaptive noise modeling to eliminate artifacts while preserving tissue fidelity.

  12. Geometrical Cross-Attention and Nonvoid Voxelization for Efficient 3D Medical Image Segmentation

    cs.CV 2026-04 unverdicted novelty 6.0

    GCNV-Net achieves state-of-the-art accuracy on multiple 3D medical segmentation benchmarks while cutting FLOPs by 56% and inference latency by 68% through dynamic nonvoid voxelization and geometric attention.

  13. Geometric Flood Depth Estimation: Fusing Transformer-Based Segmentation with Digital Elevation Models

    cs.CV 2026-05 unverdicted novelty 5.0

    A pipeline uses Mask2Former flood masks and DEMs to compute a single water surface elevation then derives local depths under hydrostatic equilibrium.

  14. Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction

    cs.CV 2026-05 unverdicted novelty 5.0

    A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.

  15. MambaLiteUNet: Cross-Gated Adaptive Feature Fusion for Robust Skin Lesion Segmentation

    cs.CV 2026-04 unverdicted novelty 5.0

    MambaLiteUNet integrates Mamba into U-Net with adaptive fusion, local-global mixing, and cross-gated attention modules to reach 87.12% IoU and 93.09% Dice on skin lesion datasets while cutting parameters by 93.6%.

  16. EDU-Net: Retinal Pathological Fluid Segmentation in OCT Images with Multiscale Feature Fusion and Boundary Optimization

    eess.IV 2026-04 unverdicted novelty 5.0

    EDU-Net fuses multiscale local and global features with boundary optimization to achieve state-of-the-art segmentation of intraretinal and subretinal fluid in OCT images.

  17. Align then Refine: Text-Guided 3D Prostate Lesion Segmentation

    cs.CV 2026-04 unverdicted novelty 5.0

    A text-guided multi-encoder U-Net with alignment loss, heatmap calibration, and confidence-gated cross-attention refiner sets new state-of-the-art 3D prostate lesion segmentation performance on the PI-CAI dataset.

  18. HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation

    cs.CV 2026-04 unverdicted novelty 5.0

    HQF-Net reports mIoU gains on three remote-sensing benchmarks by adding quantum circuits to skip connections and a mixture-of-experts bottleneck inside a classical U-Net fused with a DINOv3 backbone.

  19. Attention-Guided Flow-Matching for Sparse 3D Geological Generation

    cs.CV 2026-04 unverdicted novelty 5.0

    3D-GeoFlow reformulates discrete categorical 3D geological generation as simulation-free continuous vector field regression with 3D attention gates, claiming to outperform heuristics and diffusion models on a 2,200-ca...

  20. Med-DisSeg: Dispersion-Driven Representation Learning for Fine-Grained Medical Image Segmentation

    cs.CV 2026-05 unverdicted novelty 4.0

    Med-DisSeg uses a dispersive loss on batch representations plus adaptive multi-scale decoding to achieve state-of-the-art fine-grained segmentation on five medical imaging datasets.

  21. Edge-Cloud Collaborative Pothole Detection via Onboard Event Screening and Federated Temporal Segmentation

    cs.DC 2026-05 unverdicted novelty 4.0

    An edge-cloud framework screens vibration events onboard with a GMM and uses a federated 1D Attention U-Net for temporal segmentation to detect potholes while reducing data transmission.

  22. Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection

    cs.CV 2026-05 unverdicted novelty 4.0

    A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher mod...

  23. MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer

    cs.CV 2026-04 unverdicted novelty 4.0

    MAE self-supervised pretraining of nnFormer yields higher Dice scores, faster convergence, and better generalization when labeled medical segmentation data is scarce.

  24. PBE-UNet: A light weight Progressive Boundary-Enhanced U-Net with Scale-Aware Aggregation for Ultrasound Image Segmentation

    cs.CV 2026-04 unverdicted novelty 4.0

    PBE-UNet adds scale-aware aggregation and progressive boundary expansion modules to U-Net and reports better segmentation performance than prior methods on four ultrasound datasets.

  25. SwinTextUNet: Integrating CLIP-Based Text Guidance into Swin Transformer U-Nets for Medical Image Segmentation

    cs.CV 2026-04 unverdicted novelty 4.0

    SwinTextUNet integrates CLIP text guidance into Swin U-Net via cross-attention and convolutional fusion, achieving 86.47% Dice and 78.2% IoU on QaTaCOV19 medical image segmentation.

  26. SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs

    cs.CV 2026-04 unverdicted novelty 4.0

    SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.

  27. Attention-ResUNet for Automated Fetal Head Segmentation

    cs.CV 2026-04 unverdicted novelty 3.0

    Attention-ResUNet reaches 99.30% mean Dice score on the HC18 fetal head ultrasound dataset, outperforming ResUNet, Attention U-Net, Swin U-Net, U-Net, and U-Net++ with statistical significance.

  28. Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS)

    cs.CV 2026-04 unverdicted novelty 3.0

    ADRUwAMS reports Dice scores of 0.9229 (whole tumor), 0.8432 (tumor core), and 0.8004 (enhancing tumor) on BraTS 2020 after training on BraTS 2019/2020 datasets.

  29. Benchmarking CNN- and Transformer-Based Models for Surgical Instrument Segmentation in Robotic-Assisted Surgery

    cs.CV 2026-04 unverdicted novelty 2.0

    DeepLabV3 matches SegFormer performance in multi-class surgical instrument segmentation while convolutional baselines like UNet remain competitive on the SAR-RARP50 dataset.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · cited by 29 Pith papers · 3 internal anchors

  1. [1]

    arXiv preprint arXiv:1707.07998 (2017)

    Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., Zhang, L.: Bottom-up and top-down attention for image captioning and vqa. arXiv preprint arXiv:1707.07998 (2017)

  2. [2]

    Neural Machine Translation by Jointly Learning to Align and Translate

    Bahdanau, D., Cho, K., Bengio, Y .: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  3. [3]

    arXiv preprint arXiv:1710.09289 (2017)

    Bai, W., Sinclair, M., Tarroni, G., Oktay, O., Rajchl, M., Vaillant, G., Lee, A.M., Aung, N., Lukaschuk, E., Sanghvi, M.M., et al.: Human-level CMR image analysis with deep fully convolutional networks. arXiv preprint arXiv:1710.09289 (2017)

  4. [4]

    In: MICCAI (2017)

    Cai, J., Lu, L., Xie, Y ., Xing, F., Yang, L.: Improving deep pancreas segmentation in CT and MRI images via recurrent neural contextual learning and direct loss function. In: MICCAI (2017)

  5. [5]

    In: MICCAI

    Cerrolaza, J.J., Summers, R.M., Linguraru, M.G.: Soft multi-organ shape models via generalized PCA: A general framework. In: MICCAI. pp. 219–228. Springer (2016)

  6. [6]

    In: MICCAI

    Gibson, E., Giganti, F., Hu, Y ., Bonmati, E., Bandula, S., Gurusamy, K., Davidson, B.R., Pereira, S.P., Clarkson, M.J., Barratt, D.C.: Towards image-guided pancreas and biliary endoscopy: Au- tomatic multi-organ segmentation on abdominal CT with dense dilated networks. In: MICCAI. pp. 728–736. Springer (2017)

  7. [7]

    arXiv preprint arXiv:1612.07771 (2016)

    Greff, K., Srivastava, R.K., Schmidhuber, J.: Highway and residual networks learn unrolled iterative estimation. arXiv preprint arXiv:1612.07771 (2016)

  8. [8]

    arXiv preprint arXiv:1801.09449 (2018)

    Heinrich, M.P., Blendowski, M., Oktay, O.: TernaryNet: Faster deep model inference without GPUs for medical 3D segmentation using sparse and binary convolutions. arXiv preprint arXiv:1801.09449 (2018)

  9. [9]

    In: MICCAI

    Heinrich, M.P., Oktay, O.: BRIEFnet: Deep pancreas segmentation using binary sparse convolu- tions. In: MICCAI. pp. 329–337. Springer (2017)

  10. [10]

    arXiv:1709.01507 (2017)

    Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. arXiv:1709.01507 (2017)

  11. [11]

    In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=HyzbhfWRW

    Jetley, S., Lord, N.A., Lee, N., Torr, P.: Learn to pay attention. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=HyzbhfWRW

  12. [12]

    In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries

    Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S., Sinclair, M., Pawlowski, N., Rajchl, M., Lee, M., Kainz, B., Rueckert, D., Glocker, B.: Ensembles of multiple models and architectures for robust brain tumour segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 450–462. Cham (2018)

  13. [13]

    Medical image analysis 36, 61–78 (2017)

    Kamnitsas, K., Ledig, C., Newcombe, V .F., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical image analysis 36, 61–78 (2017)

  14. [14]

    arXiv preprint arXiv:1801.05173 (2018)

    Khened, M., Kollerathu, V .A., Krishnamurthi, G.: Fully convolutional multi-scale residual densenets for cardiac segmentation and automated cardiac diagnosis using ensemble of classi- fiers. arXiv preprint arXiv:1801.05173 (2018)

  15. [15]

    Adam: A Method for Stochastic Optimization

    Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  16. [16]

    In: Artificial Intelligence and Statistics

    Lee, C.Y ., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: Artificial Intelligence and Statistics. pp. 562–570 (2015)

  17. [17]

    arXiv preprint arXiv:1711.08324 (2017)

    Liao, F., Liang, M., Li, Z., Hu, X., Song, S.: Evaluate the malignancy of pulmonary nodules using the 3D deep leaky noisy-or network. arXiv preprint arXiv:1711.08324 (2017)

  18. [18]

    In: IEEE CVPR

    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE CVPR. pp. 3431–3440 (2015) 9

  19. [19]

    Effective approaches to attention- based neural machine translation.arXiv preprint arXiv:1508.04025, 2015

    Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)

  20. [20]

    In: 3D Vision

    Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumet- ric medical image segmentation. In: 3D Vision. pp. 565–571. IEEE (2016)

  21. [21]

    In: Advances in neural information processing systems

    Mnih, V ., Heess, N., Graves, A., et al.: Recurrent models of visual attention. In: Advances in neural information processing systems. pp. 2204–2212 (2014)

  22. [22]

    In: DLMI, pp

    Oda, M., Shimizu, N., Roth, H.R., Karasawa, K., Kitasaka, T., Misawa, K., Fujiwara, M., Rueckert, D., Mori, K.: 3D FCN feature driven regression forest-based pancreas localization and segmentation. In: DLMI, pp. 222–230. Springer (2017)

  23. [23]

    In: STACOM

    Payer, C., Štern, D., Bischof, H., Urschler, M.: Multi-label whole heart segmentation using CNNs and anatomical label configurations. In: STACOM. pp. 190–198. Springer (2017)

  24. [24]

    In: MICCAI

    Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: MICCAI. pp. 234–241. Springer (2015)

  25. [25]

    and Farag, Ayman and Turkbey, Evrim B

    Roth, H., Farag, A., Turkbey, E.B., Lu, L., Liu, J., Summers, R.M.: Data from Pancreas-CT. The Cancer Imaging Archive (2016), http://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU

  26. [26]

    Medical Image Analysis 45, 94 – 107 (2018)

    Roth, H.R., Lu, L., Lay, N., Harrison, A.P., Farag, A., Sohn, A., Summers, R.M.: Spatial aggre- gation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation. Medical Image Analysis 45, 94 – 107 (2018)

  27. [27]

    arXiv preprint arXiv:1704.06382 (2017)

    Roth, H.R., Oda, H., Hayashi, Y ., Oda, M., Shimizu, N., Fujiwara, M., Misawa, K., Mori, K.: Hierarchical 3D fully convolutional networks for multi-organ segmentation. arXiv preprint arXiv:1704.06382 (2017)

  28. [28]

    Medical image analysis 28, 46–65 (2016)

    Saito, A., Nawano, S., Shimizu, A.: Joint optimization of segmentation and shape prior from level-set-based statistical shape model, and its application to the automated segmentation of abdominal organs. Medical image analysis 28, 46–65 (2016)

  29. [29]

    arXiv preprint arXiv:1709.04696 (2017)

    Shen, T., Zhou, T., Long, G., Jiang, J., Pan, S., Zhang, C.: Disan: Directional self-attention network for rnn/cnn-free language understanding. arXiv preprint arXiv:1709.04696 (2017)

  30. [30]

    In: NIPS

    Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NIPS. pp. 6000–6010 (2017)

  31. [31]

    Graph Attention Networks

    Veliˇckovi´c, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y .: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)

  32. [32]

    In: IEEE CVPR

    Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. In: IEEE CVPR. pp. 3156–3164 (2017)

  33. [33]

    Non-local neural networks

    Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. arXiv preprint arXiv:1711.07971 (2017)

  34. [34]

    IEEE TMI 32(9) (2013)

    Wolz, R., Chu, C., Misawa, K., Fujiwara, M., Mori, K., Rueckert, D.: Automated abdominal multi-organ segmentation with subject-specific atlas generation. IEEE TMI 32(9) (2013)

  35. [35]

    In: Proceedings of the IEEE international conference on computer vision

    Xie, S., Tu, Z.: Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision. pp. 1395–1403 (2015)

  36. [36]

    arXiv preprint arXiv:1701.06452 (2017)

    Ypsilantis, P.P., Montana, G.: Learning what to look in chest X-rays with a recurrent visual attention model. arXiv preprint arXiv:1701.06452 (2017)

  37. [37]

    arXiv preprint arXiv:1709.04518 (2017)

    Yu, Q., Xie, L., Wang, Y ., Zhou, Y ., Fishman, E.K., Yuille, A.L.: Recurrent saliency transfor- mation network: Incorporating multi-stage visual cues for small organ segmentation. arXiv preprint arXiv:1709.04518 (2017)

  38. [38]

    In: MICCAI

    Zhou, Y ., Xie, L., Shen, W., Wang, Y ., Fishman, E.K., Yuille, A.L.: A fixed-point model for pancreas segmentation in abdominal CT scans. In: MICCAI. pp. 693–701. Springer (2017)

  39. [39]

    In: International MICCAI Workshop on Medical Computer Vision

    Zografos, V ., Valentinitsch, A., Rempfler, M., Tombari, F., Menze, B.: Hierarchical multi-organ segmentation without registration in 3D abdominal CT images. In: International MICCAI Workshop on Medical Computer Vision. pp. 37–46. Springer (2015) 10