Attention U-Net: Learning Where to Look for the Pancreas
Pith reviewed 2026-05-12 21:13 UTC · model grok-4.3
The pith
Attention gates added to U-Net let the model learn to focus on target structures in CT images, raising segmentation accuracy while removing the need for separate organ localization steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce attention gates (AGs) that automatically learn to focus on target structures of varying shapes and sizes in medical images. When integrated into U-Net, the gates suppress irrelevant regions in the input while emphasizing salient features for the segmentation task. This removes the requirement for explicit external tissue or organ localisation modules in cascaded CNNs. Experiments on two large CT abdominal datasets for multi-class segmentation demonstrate that AGs improve U-Net prediction performance consistently across datasets and training sizes while preserving computational efficiency.
What carries the argument
Attention gates (AGs), modules inserted into the skip connections of U-Net that learn to filter feature maps by suppressing irrelevant spatial regions and amplifying task-relevant ones.
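A minimal sketch of such a gate, for illustration only. This page does not reproduce the paper's equations, so the additive form used here (two per-pixel linear maps, a ReLU, a projection to one scalar, then a sigmoid) is an assumption, as are all names (`attention_gate`, `w_x`, `w_g`, `psi`) and the toy weights. The sketch also assumes the gating signal has already been resampled to the resolution of the skip connection.

```python
import math

def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector: a 1x1 convolution at one pixel."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def attention_gate(x, g, w_x, w_g, b, psi, b_psi):
    """Gate skip features x with gating features g, pixel by pixel.

    x, g: lists of per-pixel channel vectors on the same spatial grid
          (assumption: g was already resampled to x's resolution).
    Returns (gated_x, alphas), where each alpha lies in (0, 1).
    """
    gated, alphas = [], []
    for x_i, g_i in zip(x, g):
        # ReLU(W_x x + W_g g + b): hypothetical additive attention form.
        hidden = [max(0.0, a + c + bias)
                  for a, c, bias in zip(linear(w_x, x_i), linear(w_g, g_i), b)]
        alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)) + b_psi)
        alphas.append(alpha)
        # Scale every channel of the skip feature by the same coefficient.
        gated.append([alpha * v for v in x_i])
    return gated, alphas

# Toy example (made-up values): 3 pixels, 2 skip channels, 1 gating channel.
x = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]
g = [[4.0], [0.0], [-4.0]]
w_x = [[0.1, 0.0], [0.0, 0.1]]
w_g = [[1.0], [1.0]]
b = [0.0, 0.0]
psi = [1.0, 1.0]
b_psi = -2.0

gated, alphas = attention_gate(x, g, w_x, w_g, b, psi, b_psi)
print([round(a, 3) for a in alphas])
```

In this toy setting the pixel with a strong positive gating signal keeps almost all of its skip features, while the other two are scaled down, which is the filtering behaviour described above.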
If this is right
- AGs integrate into standard CNNs such as U-Net with only minor added computation.
- Prediction accuracy and sensitivity increase consistently on abdominal CT segmentation tasks.
- The gains hold across both datasets and across different training-set sizes.
- Cascaded localisation-plus-segmentation pipelines become unnecessary.
- Computational efficiency remains comparable to the unmodified U-Net.
Where Pith is reading between the lines
- The same gate design could be tested on MRI or ultrasound volumes where organ boundaries vary even more than in CT.
- Because the gates operate on feature maps, they might reduce the amount of manual annotation needed for training by guiding the network to salient areas automatically.
- Replacing explicit localisation stages with learned attention could shorten overall inference pipelines in clinical workflows.
- Combining AGs with other forms of attention, such as channel-wise attention, remains an open extension not explored in the reported experiments.
Load-bearing premise
Attention gates will reliably learn to suppress irrelevant regions and highlight salient features for target structures of varying shapes and sizes without requiring explicit external tissue or organ localisation modules.
What would settle it
Train both standard U-Net and Attention U-Net on the same small subset of one CT dataset and measure Dice scores plus inference time on a fixed test set; if Dice does not rise, or if the runtime overhead relative to U-Net is more than marginal, the central claim does not hold.
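The check described here can be sketched in a few lines. This is a hedged illustration, not the paper's evaluation code: `dice_score` and `claim_survives` are hypothetical helper names, the 10% `max_overhead` threshold is an arbitrary stand-in for "marginal", and the toy labels and timings are placeholders rather than measured results.

```python
def dice_score(pred, target, label):
    """Dice overlap for one class label over two equal-length label sequences."""
    pred_hits = [p == label for p in pred]
    target_hits = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_hits, target_hits))
    total = sum(pred_hits) + sum(target_hits)
    # Convention (an assumption): a class absent from both masks scores 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total

def claim_survives(dice_unet, dice_attn, time_unet, time_attn, max_overhead=0.10):
    """True if Dice rises and relative runtime overhead stays within max_overhead."""
    overhead = (time_attn - time_unet) / time_unet
    return dice_attn > dice_unet and overhead <= max_overhead

# Placeholder voxel labels (1 = target organ) and timings in seconds.
target = [0, 1, 1, 1, 0, 0]
pred_unet = [0, 1, 1, 0, 0, 1]
pred_attn = [0, 1, 1, 1, 0, 1]

d_unet = dice_score(pred_unet, target, label=1)
d_attn = dice_score(pred_attn, target, label=1)
print(claim_survives(d_unet, d_attn, time_unet=1.00, time_attn=1.07))
```

In a real test the two predictions would come from models trained on the same subset, and the timings from repeated inference on the same hardware.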
Original abstract
We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Attention Gates (AGs) as an architectural component to integrate into U-Net for medical image segmentation. AGs automatically learn to suppress irrelevant regions and highlight salient features for target structures of varying shapes and sizes, with the goal of eliminating explicit external tissue/organ localization modules required in cascaded CNN pipelines. The Attention U-Net is evaluated on two large CT abdominal datasets for multi-class segmentation, claiming consistent performance gains over standard U-Net across datasets and training sizes with minimal computational overhead. The code is made publicly available.
Significance. If the central claims hold, the work provides a lightweight attention mechanism that can improve segmentation sensitivity and accuracy in standard CNNs without separate localization stages, which would simplify pipelines in medical imaging. The public code supports reproducibility, a clear strength. However, the significance is limited by the absence of direct comparisons to cascaded baselines, leaving open whether observed gains truly substitute for explicit localization or merely reflect added model capacity.
major comments (2)
- Abstract: The load-bearing claim that AGs 'enable us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks' is not supported by the experiments, which compare Attention U-Net only to plain U-Net on two CT datasets. No cascaded baseline (coarse localization network followed by fine segmentation) is evaluated for either Dice accuracy or total inference cost, so it remains possible that gains arise from multi-scale attention adding capacity rather than substituting for localization.
- Experimental Results section: The abstract asserts 'consistent gains' and 'improved prediction performance' but the reported evaluation lacks specific quantitative metrics (e.g., Dice scores per class or dataset), error bars, number of training runs, or implementation details such as training sizes and hyper-parameters, which undermines verification of the performance claims and cross-dataset consistency.
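As a sketch of the kind of reporting the second comment asks for, the snippet below summarises Dice scores from repeated training runs as mean and sample standard deviation. The helper name `summarize_runs` is hypothetical, and the scores are placeholder numbers chosen for illustration; they are not results from the paper.

```python
import statistics

def summarize_runs(dice_per_run):
    """Return (mean, sample standard deviation) of Dice scores from repeated runs."""
    mean = statistics.mean(dice_per_run)
    # The sample standard deviation needs at least two runs.
    std = statistics.stdev(dice_per_run) if len(dice_per_run) > 1 else 0.0
    return mean, std

# Placeholder Dice scores from three hypothetical runs per model (NOT paper results).
runs = {
    "U-Net": [0.80, 0.82, 0.81],
    "Attention U-Net": [0.83, 0.84, 0.82],
}

summary = {name: summarize_runs(scores) for name, scores in runs.items()}
for name, (mean, std) in summary.items():
    print(f"{name}: Dice {mean:.3f} +/- {std:.3f} over {len(runs[name])} runs")
```

Reporting the number of runs alongside the spread lets a reader judge whether a difference between models exceeds run-to-run variation.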
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, acknowledging where the manuscript claims require qualification or additional clarification. We will revise the manuscript accordingly to strengthen the presentation of results and temper unsupported assertions.
Point-by-point responses
-
Referee: Abstract: The load-bearing claim that AGs 'enable us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks' is not supported by the experiments, which compare Attention U-Net only to plain U-Net on two CT datasets. No cascaded baseline (coarse localization network followed by fine segmentation) is evaluated for either Dice accuracy or total inference cost, so it remains possible that gains arise from multi-scale attention adding capacity rather than substituting for localization.
Authors: We agree that the abstract claim is not directly supported by the experiments, as no cascaded baseline is evaluated. The attention gates are intended to provide implicit localization by suppressing irrelevant regions, which is supported by the observed improvements over standard U-Net. However, without a head-to-head comparison on accuracy and inference cost, we cannot claim that AGs fully substitute for explicit localization modules. We will revise the abstract to qualify the statement (e.g., 'can reduce the need for explicit external localization modules') and add a limitations paragraph discussing this point. A cascaded baseline comparison is not feasible to add at this stage due to time and scope constraints.
Revision: partial
-
Referee: Experimental Results section: The abstract asserts 'consistent gains' and 'improved prediction performance' but the reported evaluation lacks specific quantitative metrics (e.g., Dice scores per class or dataset), error bars, number of training runs, or implementation details such as training sizes and hyper-parameters, which undermines verification of the performance claims and cross-dataset consistency.
Authors: The full manuscript reports per-class Dice scores, Hausdorff distances, and other metrics for both datasets in Tables 1–3, with results broken down by training set size (25%, 50%, 100%). Hyperparameters and training protocols are detailed in Section 3.2. We acknowledge that standard deviations across multiple runs and the precise number of independent training runs were not reported. We will add these (from 3 runs per configuration) and ensure all quantitative results are more explicitly cross-referenced in the text to better substantiate the claims of consistent gains.
Revision: yes
Circularity Check
No significant circularity; novel architectural component with independent empirical tests
Full rationale
The paper proposes attention gates as an independent architectural addition to U-Net, with the central claim (elimination of explicit cascaded localization modules) supported by direct experiments on two external CT datasets showing Dice improvements. No equations, self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The derivation chain consists of a new mechanism plus standard training and evaluation, remaining self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Convolutional neural networks trained end-to-end on medical CT images can perform multi-class segmentation tasks.
invented entities (1)
-
Attention Gate (AG)
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith.Foundation.DAlembert.Inevitability bilinear_family_forced
echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
AGs automatically learn to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs).
-
IndisputableMonolith.Foundation.LedgerCanonicality no_free_knobs
echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy.
-
IndisputableMonolith.Foundation.DimensionForcing dimension_forced
unclear: relation between the paper passage and the cited Recognition theorem.
Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 50 Pith papers
-
AuraMask: An Extensible Pipeline for Developing Aesthetic Anti-Facial Recognition Image Filters
AuraMask produces 40 aesthetic anti-facial recognition filters that match or exceed prior adversarial effectiveness and achieve significantly higher user acceptance in a 630-person study.
-
TopoU-Net: a U-Net architecture for topological domains
TopoU-Net is a rank-path U-Net for combinatorial complexes that encodes by lifting cochains upward along incidences, decodes by transporting downward, and merges via skip connections at matched ranks.
-
XAttnRes: Cross-Stage Attention Residuals for Medical Image Segmentation
XAttnRes introduces cross-stage attention residuals that maintain a global feature history and selectively aggregate prior representations, improving medical image segmentation and performing on par with baselines eve...
-
Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation
GDLA delivers state-of-the-art accuracy on CT, MRI, ultrasound and dermoscopy segmentation benchmarks while keeping linear O(N) complexity in a PVT encoder-decoder.
-
Information Filtering via Variational Regularization for Robot Manipulation
Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld whil...
-
S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss
S2M-Net achieves state-of-the-art Dice scores on 16 medical datasets across 8 modalities using a 4.7M-parameter spectral-spatial mixer and morphology-aware adaptive loss, outperforming transformers with 3.5-6x fewer p...
-
StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels
StruMPL is a multi-task dense regression model that jointly addresses disjoint partial supervision, MNAR labels, and inter-task physical constraints for improved forest biomass estimation from Earth observation.
-
Spectral Vision Transformer for Efficient Tokenization with Limited Data
A spectral vision transformer achieves equitable or superior performance with fewer parameters than standard ViTs, CNNs, and other models by using spectral projections for tokenization in limited-data medical imaging.
-
FEFormer: Frequency-enhanced Vision Transformer for Generic Knowledge Extraction and Adaptive Feature Fusion in Volumetric Medical Image Segmentation
A frequency-enhanced Vision Transformer with FDSA, FGMLP, WAFF, and FCSB modules delivers superior volumetric medical image segmentation performance and efficiency over prior state-of-the-art methods.
-
Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention
Polygon-Mamba achieves F1 scores of 0.8283, 0.8282, and 0.8251 on DRIVE, STARE, and CHASE_DB1 by combining polygon scanning Mamba with space-frequency collaborative attention to better detect small retinal vessels.
-
ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation
ESICA delivers state-of-the-art accuracy on a five-modality 3D medical segmentation benchmark while offering a compact variant with far fewer parameters.
-
Mapping License Plate Recoverability Under Extreme Viewing Angles for Opportunistic Urban Sensing
Recoverability maps use synthetic sweeps of viewing angles and artifacts to quantify the recoverable fraction of parameter space for license plate restoration, with the best model succeeding on 93% and geometry settin...
-
Learning from Noisy Prompts: Saliency-Guided Prompt Distillation for Robust Segmentation with SAM
SPD improves SAM segmentation robustness to noisy prompts by learning anatomical saliency priors, distilling consensus prompts from adjacent slices, and enforcing pairwise slice consistency.
-
Toward Polymorphic Backdoor against Semantic Communication via Intensity-Based Poisoning
SemBugger achieves polymorphic backdoors in semantic communication via graded-intensity trigger poisoning and hierarchical loss, plus a noise-based defense with a theoretical efficacy bound.
-
CDSA-Net: Collaborative Decoupling of Vascular Structure and Background for High-Fidelity Coronary Digital Subtraction Angiography
CDSA-Net decouples vascular structure extraction and background restoration in coronary DSA via hierarchical geometric priors and adaptive noise modeling to eliminate artifacts while preserving tissue fidelity.
-
Geometrical Cross-Attention and Nonvoid Voxelization for Efficient 3D Medical Image Segmentation
GCNV-Net achieves state-of-the-art accuracy on multiple 3D medical segmentation benchmarks while cutting FLOPs by 56% and inference latency by 68% through dynamic nonvoid voxelization and geometric attention.
-
CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing
The paper defines the Conformal Hallucination Estimation Metric (CHEM) that localizes hallucination-prone regions in image reconstruction models via multiscale representations and distribution-free conformal regression.
-
Can We Go Beyond Visual Features? Neural Tissue Relation Modeling for Relational Graph Analysis in Non-Melanoma Skin Histology
NTRM combines CNNs with tissue-level graph neural networks to model inter-tissue relationships, delivering 4.9% to 31.25% higher Dice scores than prior methods on a non-melanoma skin cancer histology segmentation benchmark.
-
SAMRI: Segment Any MRI
SAMRI fine-tunes only the mask decoder of SAM on 1.1 million MRI slices from 30 datasets to reach mean DSC 0.87 on 47 targets and strong zero-shot performance.
-
Category-based Galaxy Image Generation via Diffusion Models
GalCatDiff applies category embeddings and a novel Astro-RAB block inside diffusion models to produce galaxy images whose color and size distributions match observations more closely than prior generative approaches.
-
Learning Parallax for Stereo Event-based Motion Deblurring
St-EDNet recovers sharp images from misaligned blurry intensity images and event streams by performing coarse cross-modal stereo alignment followed by fine bidirectional feature reconstruction.
-
M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation
M²SNet uses intra- and inter-layer multi-scale subtraction units plus a training-free LossNet to generate difference features that reduce redundancy in decoder fusion for medical segmentation.
-
MHMamba: Multi-Head Mamba for 3D Brain Tumor Segmentation
MHMamba combines a U-Net with multi-head Mamba, channel calibration, and adaptive skip fusion to improve 3D brain tumor segmentation accuracy and small-lesion sensitivity on BraTS datasets while retaining linear complexity.
-
Geometric Flood Depth Estimation: Fusing Transformer-Based Segmentation with Digital Elevation Models
A pipeline uses Mask2Former flood masks and DEMs to compute a single water surface elevation then derives local depths under hydrostatic equilibrium.
-
Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction
A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.
-
MambaLiteUNet: Cross-Gated Adaptive Feature Fusion for Robust Skin Lesion Segmentation
MambaLiteUNet integrates Mamba into U-Net with adaptive fusion, local-global mixing, and cross-gated attention modules to reach 87.12% IoU and 93.09% Dice on skin lesion datasets while cutting parameters by 93.6%.
-
EDU-Net: Retinal Pathological Fluid Segmentation in OCT Images with Multiscale Feature Fusion and Boundary Optimization
EDU-Net fuses multiscale local and global features with boundary optimization to achieve state-of-the-art segmentation of intraretinal and subretinal fluid in OCT images.
-
Align then Refine: Text-Guided 3D Prostate Lesion Segmentation
A text-guided multi-encoder U-Net with alignment loss, heatmap calibration, and confidence-gated cross-attention refiner sets new state-of-the-art 3D prostate lesion segmentation performance on the PI-CAI dataset.
-
HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation
HQF-Net reports mIoU gains on three remote-sensing benchmarks by adding quantum circuits to skip connections and a mixture-of-experts bottleneck inside a classical U-Net fused with a DINOv3 backbone.
-
Attention-Guided Flow-Matching for Sparse 3D Geological Generation
3D-GeoFlow reformulates discrete categorical 3D geological generation as simulation-free continuous vector field regression with 3D attention gates, claiming to outperform heuristics and diffusion models on a 2,200-ca...
-
GroupKAN: Efficient Kolmogorov-Arnold Networks via Grouped Spline Modeling
GroupKAN reduces KAN parameter scaling via intra-group spline mappings, delivering 79.80% average IoU (+1.11% over U-KAN) at 47.6% of the parameters on BUSI, GlaS, and CVC datasets.
-
BGRem: A background noise remover for astronomical images based on a diffusion model
BGRem applies a supervised diffusion model to denoise MeerLICHT and Fermi-LAT images, raising true-positive source detections by roughly 7% when used before SExtractor.
-
A novel attention mechanism for noise-adaptive and robust segmentation of microtubules in microscopy images
ASE_Res_UNet with a novel noise-adaptive attention mechanism outperforms ablated variants and alternative architectures in segmenting microtubules from noisy synthetic and real microscopy images while using fewer para...
-
MSLAU-Net: A Hybrid CNN-Transformer Network for Medical Image Segmentation
MSLAU-Net proposes a hybrid CNN-Transformer architecture using multi-scale linear attention and lightweight top-down aggregation that outperforms prior methods on medical segmentation benchmarks across three modalities.
-
Gamma-Ray Burst Light Curve Reconstruction: A Comparative Machine and Deep Learning Analysis
MLP and Attention U-Net outperform other models in reconstructing GRB light curves on 521 events, cutting plateau parameter uncertainties by 37-41% versus the Willingale baseline while achieving low MSE.
-
Implantable Adaptive Cells: A Novel Enhancement for Pre-Trained U-Nets in Medical Image Segmentation
Introduces Implantable Adaptive Cells inserted into pre-trained U-Nets via Partially-Connected DARTS to achieve approximately 5 percentage point gains in segmentation accuracy on four medical MRI/CT datasets.
-
ConvNeXt-FD: A Fractal-Based Deep Model for Robust Biomedical Image Segmentation
ConvNeXt-FD pairs a ConvNeXt backbone with fractal-dimension boundary regularization inside a U-Net and reports competitive Dice and related scores on six biomedical segmentation benchmarks.
-
Med-DisSeg: Dispersion-Driven Representation Learning for Fine-Grained Medical Image Segmentation
Med-DisSeg uses a dispersive loss on batch representations plus adaptive multi-scale decoding to achieve state-of-the-art fine-grained segmentation on five medical imaging datasets.
-
Edge-Cloud Collaborative Pothole Detection via Onboard Event Screening and Federated Temporal Segmentation
An edge-cloud framework screens vibration events onboard with a GMM and uses a federated 1D Attention U-Net for temporal segmentation to detect potholes while reducing data transmission.
-
Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection
A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher mod...
-
MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer
MAE self-supervised pretraining of nnFormer yields higher Dice scores, faster convergence, and better generalization when labeled medical segmentation data is scarce.
-
PBE-UNet: A light weight Progressive Boundary-Enhanced U-Net with Scale-Aware Aggregation for Ultrasound Image Segmentation
PBE-UNet adds scale-aware aggregation and progressive boundary expansion modules to U-Net and reports better segmentation performance than prior methods on four ultrasound datasets.
-
SwinTextUNet: Integrating CLIP-Based Text Guidance into Swin Transformer U-Nets for Medical Image Segmentation
SwinTextUNet integrates CLIP text guidance into Swin U-Net via cross-attention and convolutional fusion, achieving 86.47% Dice and 78.2% IoU on QaTaCOV19 medical image segmentation.
-
SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs
SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.
-
Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation
FM-BFF-Net combines focal modulation attention with bidirectional encoder-decoder fusion in a CNN-transformer architecture and reports higher Dice and Jaccard scores than recent methods across eight medical image datasets.
-
Clinical utility of foundation models in musculoskeletal MRI for biomarker fidelity and predictive outcomes
Fine-tuned foundation models produce reliable MSK MRI biomarkers that support workload-reducing triage and calibrated 48-month prediction of knee replacement and incident OA.
-
Deep Learning for Pneumothorax Detection and Localization in Chest Radiographs
Comparison of CNN, multiple-instance learning, and FCN for pneumothorax detection and localization yielding AUCs of 0.96, 0.93, and 0.92 on 1003 chest radiographs.
-
Attention-ResUNet for Automated Fetal Head Segmentation
Attention-ResUNet reaches 99.30% mean Dice score on the HC18 fetal head ultrasound dataset, outperforming ResUNet, Attention U-Net, Swin U-Net, U-Net, and U-Net++ with statistical significance.
-
Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS)
ADRUwAMS reports Dice scores of 0.9229 (whole tumor), 0.8432 (tumor core), and 0.8004 (enhancing tumor) on BraTS 2020 after training on BraTS 2019/2020 datasets.
-
Benchmarking CNN- and Transformer-Based Models for Surgical Instrument Segmentation in Robotic-Assisted Surgery
DeepLabV3 matches SegFormer performance in multi-class surgical instrument segmentation while convolutional baselines like UNet remain competitive on the SAR-RARP50 dataset.