TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
Recognition: 2 theorem links
Pith reviewed 2026-05-11 13:30 UTC · model grok-4.3
The pith
TransUNet shows that a transformer encoder on CNN features can supply the global context U-Net needs for accurate medical image segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that transformers can serve as strong encoders for medical image segmentation. Tokenized image patches extracted from a CNN feature map are processed by a transformer to capture global context; a U-Net-style decoder then upsamples these features and fuses them with the original high-resolution CNN feature maps to recover precise localization. This hybrid outperforms competing methods on multi-organ and cardiac segmentation benchmarks.
What carries the argument
A hybrid encoder-decoder in which a transformer processes tokenized patches from a CNN feature map to extract global self-attention context, which the decoder then merges via skip connections with high-resolution CNN features to recover localization.
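The shape bookkeeping behind this hybrid can be sketched end to end. The following numpy toy is illustrative only: the sizes, random weights, and single-head attention are stand-ins for the paper's actual ResNet-50 backbone and ViT encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes (illustrative, not the paper's configuration).
B, C, H, W = 1, 64, 14, 14   # CNN feature map at reduced resolution
D = 32                        # transformer embedding dimension

feat = rng.standard_normal((B, C, H, W))          # CNN encoder output

# "Patch embedding": each spatial position of the feature map becomes one token.
W_embed = rng.standard_normal((C, D)) / np.sqrt(C)
tokens = feat.reshape(B, C, H * W).transpose(0, 2, 1) @ W_embed  # (B, HW, D)

def self_attention(x):
    """Single-head self-attention: every token attends to every other token,
    which is the global-context step the hybrid encoder relies on."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ x

encoded = self_attention(tokens)                  # (B, HW, D)

# Decoder side: fold tokens back into a 2-D map, upsample, and fuse with a
# high-resolution CNN skip feature by channel concatenation.
fmap = encoded.transpose(0, 2, 1).reshape(B, D, H, W)
up = fmap.repeat(2, axis=2).repeat(2, axis=3)     # nearest-neighbour x2
skip = rng.standard_normal((B, 16, 2 * H, 2 * W)) # hypothetical skip feature
fused = np.concatenate([up, skip], axis=1)        # (B, D + 16, 28, 28)
```

The fused tensor is what a decoder convolution would consume: transformer-derived global context and CNN-derived local detail side by side in the channel dimension.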
If this is right
- Medical segmentation pipelines can replace pure CNN encoders with transformer encoders while keeping the decoder and skip connections intact.
- Long-range dependencies become explicitly modeled without sacrificing the fine boundary detail that CNN features provide.
- Performance gains appear across both multi-organ and cardiac tasks when the same hybrid fusion is applied.
- The architecture offers a template for other encoder-decoder segmentation models that need both global and local cues.
Where Pith is reading between the lines
- Similar tokenization-plus-fusion steps could be tested on non-medical imaging domains that already use U-Net variants.
- The approach may reduce the amount of local annotation needed if global context helps the model generalize from fewer examples.
- Future work could measure whether the transformer encoder alone, without the CNN backbone, suffices on larger training sets.
Load-bearing premise
That adding global context from the transformer to the CNN skip connections will reliably raise localization accuracy on new clinical data without introducing fresh failure modes.
What would settle it
A head-to-head evaluation on a held-out multi-center medical dataset in which the hybrid model shows no gain or a drop in Dice or Hausdorff metrics relative to a standard U-Net baseline.
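The Dice metric named above is straightforward to compute from binary masks; a minimal sketch (the toy masks are illustrative):

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient for binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

# Toy 4x4 masks: 3 predicted pixels, 3 true pixels, 2 overlapping.
pred = np.zeros((4, 4), dtype=bool); pred[0, :3] = True
gt = np.zeros((4, 4), dtype=bool); gt[0, 1:4] = True
score = dice(pred, gt)  # 2*2 / (3+3) = 0.666...
```

The Hausdorff distance, the other metric mentioned, additionally penalizes boundary outliers that Dice averages away, which is why the two are usually reported together.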
Original abstract
Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard and achieved tremendous success. However, due to the intrinsic locality of convolution operations, U-Net generally demonstrates limitations in explicitly modeling long-range dependency. Transformers, designed for sequence-to-sequence prediction, have emerged as alternative architectures with innate global self-attention mechanisms, but can result in limited localization abilities due to insufficient low-level details. In this paper, we propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation. On one hand, the Transformer encodes tokenized image patches from a convolution neural network (CNN) feature map as the input sequence for extracting global contexts. On the other hand, the decoder upsamples the encoded features which are then combined with the high-resolution CNN feature maps to enable precise localization. We argue that Transformers can serve as strong encoders for medical image segmentation tasks, with the combination of U-Net to enhance finer details by recovering localized spatial information. TransUNet achieves superior performances to various competing methods on different medical applications including multi-organ segmentation and cardiac segmentation. Code and models are available at https://github.com/Beckschen/TransUNet.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TransUNet, a hybrid CNN-Transformer architecture for medical image segmentation. A CNN backbone extracts feature maps that are tokenized and processed by a Transformer encoder to capture global context; these are then decoded via a U-Net-style upsampling path that fuses the Transformer output with high-resolution CNN skip connections to recover fine localization details. The central claim is that this design yields superior empirical performance over competing methods on multi-organ and cardiac segmentation tasks.
Significance. If the performance gains are robust, the work is significant for demonstrating that Transformers can function as strong encoders in medical imaging by explicitly modeling long-range dependencies while the U-Net decoder mitigates localization weaknesses. The public release of code and models is a clear strength that supports reproducibility and extension by the community.
Major comments (2)
- §4 (Experiments): The superiority claims rest on comparisons to U-Net and other baselines, but no ablation is reported against a pure CNN encoder (e.g., a deeper/wider ResNet or U-Net) with matched parameter count and training protocol. Without this, it is impossible to determine whether gains derive from the Transformer's global attention or simply from increased capacity, which is load-bearing for the central architectural claim.
- §4.1 and results tables: Performance improvements are presented without statistical significance tests, confidence intervals, or multiple random-seed runs. Given the small size and high variability of medical datasets, modest metric gains may not be reliable, directly affecting the strength of the "superior performance" assertion.
Minor comments (2)
- §3.2: The precise reshaping of the Transformer output sequence back into a 2-D feature map before decoder fusion is described only at a high level; adding an equation or explicit tensor-shape diagram would improve clarity and reproducibility.
- Figure 2: The architecture diagram would benefit from explicit annotation of the tokenization step and the exact skip-connection fusion points to match the textual description.
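The reshaping the first minor comment asks to have spelled out can be written in one line. A minimal sketch, assuming a standard ViT output of 196 tokens (a 14x14 patch grid in row-major order) with dimension 768; the numbers are assumptions, not values taken from the paper:

```python
import numpy as np

B, n_patch, D = 2, 196, 768          # hypothetical ViT output shape
H = W = int(n_patch ** 0.5)          # 14 x 14 patch grid

z = np.random.randn(B, n_patch, D)   # transformer output sequence
# (B, HW, D) -> (B, D, HW) -> (B, D, H, W): token i lands at grid cell
# (i // W, i % W), i.e. the row-major order of the original patch grid.
fmap = z.transpose(0, 2, 1).reshape(B, D, H, W)
```

After this fold, `fmap` has the layout a convolutional decoder expects, so upsampling and skip-connection fusion proceed as in a standard U-Net.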
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive evaluation of our work. We address the major comments point-by-point below.
Point-by-point responses
- Referee: §4 (Experiments): The superiority claims rest on comparisons to U-Net and other baselines, but no ablation is reported against a pure CNN encoder (e.g., a deeper/wider ResNet or U-Net) with matched parameter count and training protocol. Without this, it is impossible to determine whether gains derive from the Transformer's global attention or simply from increased capacity, which is load-bearing for the central architectural claim.
Authors: We appreciate the referee's point regarding the need for capacity-matched ablations to isolate the benefit of the Transformer encoder. Our manuscript compares TransUNet to the standard U-Net and other established baselines under consistent training protocols. The improvements are observed across different medical segmentation tasks, supporting that the global context modeling from the Transformer contributes to better performance. To directly address this, we will include parameter counts for all models in the revised manuscript and add an ablation study using a deeper CNN backbone with comparable parameters where possible. revision: partial
- Referee: §4.1 and results tables: Performance improvements are presented without statistical significance tests, confidence intervals, or multiple random-seed runs. Given the small size and high variability of medical datasets, modest metric gains may not be reliable, directly affecting the strength of the "superior performance" assertion.
Authors: We concur that providing statistical analysis would enhance the reliability of our results, particularly given the characteristics of medical imaging datasets. We will update the experimental section to include multiple runs with different random seeds, report standard deviations or confidence intervals, and conduct appropriate statistical tests to validate the significance of the performance gains in the revised manuscript. revision: yes
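The paired analysis the authors commit to can take several forms; one common choice for small medical test sets is a paired bootstrap confidence interval on per-case Dice differences. A minimal sketch with synthetic scores (the numbers below are illustrative, not results from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-case Dice scores for two models on the same 20 test
# cases (a paired design); purely illustrative data.
dice_baseline = rng.normal(0.82, 0.05, size=20)
dice_hybrid = dice_baseline + rng.normal(0.03, 0.02, size=20)

def paired_bootstrap_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference b - a."""
    g = np.random.default_rng(seed)
    diff = b - a
    # Resample cases with replacement and recompute the mean difference.
    idx = g.integers(0, len(diff), size=(n_boot, len(diff)))
    means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

lo, hi = paired_bootstrap_ci(dice_baseline, dice_hybrid)
significant = lo > 0.0   # CI excluding zero suggests the gain is not noise
```

Pairing by test case matters here: inter-patient variability in medical datasets is typically much larger than inter-model differences, and an unpaired test would absorb that variance into the comparison.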
Circularity Check
No circularity: empirical architecture proposal with benchmark validation
Full rationale
The paper introduces TransUNet as a hybrid CNN-Transformer encoder-decoder for medical segmentation and supports its claims solely through empirical results on public benchmarks (multi-organ and cardiac segmentation). No equations, predictions, or first-principles derivations appear in the provided text; the architecture is defined explicitly and then evaluated, without any fitted parameter being relabeled as a prediction or any load-bearing premise reducing to a self-citation. The superiority statements are presented as experimental outcomes rather than closed-loop theoretical necessities, making the derivation chain self-contained and non-circular.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/HierarchyEmergence.lean, theorem hierarchy_emergence_forces_phi (tag: unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear. Linked passage: "TransUNet achieves superior performances to various competing methods on different medical applications including multi-organ segmentation and cardiac segmentation."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 42 Pith papers
- Weakly Supervised Segmentation as Semantic-Based Regularization
  Differentiable fuzzy logic constraints fine-tune SAM to generate higher-quality pseudo-labels, enabling a second-stage model to reach state-of-the-art weakly supervised segmentation on Pascal VOC and REFUGE2, sometime...
- RAM-H1200: A Unified Evaluation and Dataset on Hand Radiographs for Rheumatoid Arthritis
  RAM-H1200 introduces a public dataset of 1,200 hand X-rays with whole-hand bone segmentation, pixel-level bone erosion masks, and joint-level SvdH scores for both erosion and narrowing to enable unified RA analysis.
- Camyla: Scaling Autonomous Research in Medical Image Segmentation
  Camyla autonomously generates research proposals, experiments, and manuscripts in medical image segmentation, outperforming baselines on 24 of 31 recent datasets while producing 40 human-reviewed papers.
- Nested Radially Monotone Polar Occupancy Estimation: Clinically-Grounded Optic Disc and Cup Segmentation for Glaucoma Screening
  NPS-Net formulates optic disc and cup segmentation as nested radially monotone polar occupancy estimation to guarantee star-convexity, nesting, and high accuracy for glaucoma screening.
- XAttnRes: Cross-Stage Attention Residuals for Medical Image Segmentation
  XAttnRes introduces cross-stage attention residuals that maintain a global feature history and selectively aggregate prior representations, improving medical image segmentation and performing on par with baselines eve...
- Fast Single Nitrogen-Vacancy Center Ramsey Characterization using a Physics-Informed Neural Network
  NVRNet uses pretrained simulation-based U-Nets with attention and parameter-efficient adapters, followed by a transformer estimator, to reconstruct clean Ramsey waveforms and infer hyperfine parameters from minimal-sw...
- Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation
  GDLA delivers state-of-the-art accuracy on CT, MRI, ultrasound and dermoscopy segmentation benchmarks while keeping linear O(N) complexity in a PVT encoder-decoder.
- Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning
  A hybrid CNN-transformer model with multi-task learning achieves 91.3% WBC classification accuracy and 0.72 Pearson correlation for CD16 expression regression from label-free DPC images, augmented by LLM-generated summaries.
- Automatic Landmark-Based Segmentation of Human Subcortical Structures in MRI
  A three-stage pipeline detects 16 landmarks, coarsely segments 12 labels, and refines them into 26 structures using landmark constraints to improve accuracy in subcortical MRI segmentation.
- DuetFair: Coupling Inter- and Intra-Subgroup Robustness for Fair Medical Image Segmentation
  DuetFair couples inter-subgroup adaptation with intra-subgroup robustness via FairDRO (dMoE plus subgroup-conditioned DRO) to boost worst-case and equity-scaled performance on medical segmentation benchmarks.
- ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation
  ESICA delivers state-of-the-art accuracy on a five-modality 3D medical segmentation benchmark while offering a compact variant with far fewer parameters.
- MedFlowSeg: Flow Matching for Medical Image Segmentation with Frequency-Aware Attention
  MedFlowSeg is a conditional flow matching model for medical image segmentation that adds dual-branch spatial attention and frequency-aware attention to achieve more efficient inference than diffusion models while impr...
- Rethinking Uncertainty in Segmentation: From Estimation to Decision
  Uncertainty optimization alone misses most safety gains; a decision-stage deferral policy removes up to 80% segmentation errors at 25% pixel deferral with cross-dataset robustness, while calibration does not improve d...
- Human Gaze-based Dual Teacher Guidance Learning for Semi-Supervised Medical Image Segmentation
  HG-DTGL integrates human gaze as an extra teacher in mean-teacher learning via GazeMix, MGP module and Gaze Loss, reporting superior segmentation across ten organs on multiple modalities.
- A Fast and Generic Energy-Shifting Transformer for Hybrid Monte Carlo Radiotherapy Calculation
  A hybrid Transformer-UNet model with energy-shifting inputs generates 6 MV LINAC dose maps from monoenergetic data, achieving over 98% gamma passing rate (3%/3mm) versus full Monte Carlo for prostate radiotherapy.
- LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering
  LiteMedCoT-VL distills chain-of-thought from a 235B model to 2B VLMs via LoRA, reaching 64.9% accuracy on PMC-VQA and beating a 4B zero-shot baseline by 11 points.
- A Unified Framework for the Detection and Classification of Fatty Pancreas in Ultrasound Images
  A TransUNet-based segmentation followed by texture comparison classifies fatty pancreas in ultrasound with 89.7% accuracy on a small clinical dataset.
- Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction
  A masked-diffusion pretrained convolutional model outperforms ViT pathology foundation models on cell-level dense prediction tasks in histology.
- Lightning Unified Video Editing via In-Context Sparse Attention
  ISA prunes low-saliency context tokens and routes queries by sharpness to either full or 0-th order Taylor sparse attention, enabling LIVEditor to cut attention latency ~60% while beating prior video editing methods o...
- Domain-Adaptive Arrhythmia Classification Using a Hybrid Transformer on Wearable Heart Signals
  A hybrid transformer combining ECG morphology and HRV features with MMD domain adaptation achieves 95% F1-macro on unseen wearable data for arrhythmia classification.
- Memory-Efficient EDA Denoising via Knowledge Distillation for Wearable IoT Under Severe Motion Artifacts and Underwater Conditions
  Knowledge distillation from a hybrid CNN-Transformer teacher to a depth-wise separable CNN student, combined with realistic motion and environmental augmentation, produces a 15x smaller EDA denoiser that cuts underwat...
- SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT
  SAIL integrates anatomical priors at the representation level with semantic features via fusion to produce more anatomically aligned attribution maps in OCT without altering existing explainability techniques.
- CNN-ViT Fusion with Adaptive Attention Gate for Brain Tumor MRI Classification: A Hybrid Deep Learning Model
  Hybrid CNN-ViT with adaptive attention gate achieves 97.6% accuracy on brain tumor MRI classification, outperforming baselines.
- MambaLiteUNet: Cross-Gated Adaptive Feature Fusion for Robust Skin Lesion Segmentation
  MambaLiteUNet integrates Mamba into U-Net with adaptive fusion, local-global mixing, and cross-gated attention modules to reach 87.12% IoU and 93.09% Dice on skin lesion datasets while cutting parameters by 93.6%.
- Weighted Knowledge Distillation for Semi-Supervised Segmentation of Maxillary Sinus in Panoramic X-ray Images
  A semi-supervised framework using weighted knowledge distillation and SinusCycle-GAN refinement achieves 96.35% Dice score for maxillary sinus segmentation in panoramic X-rays from 2,511 patients.
- EDU-Net: Retinal Pathological Fluid Segmentation in OCT Images with Multiscale Feature Fusion and Boundary Optimization
  EDU-Net fuses multiscale local and global features with boundary optimization to achieve state-of-the-art segmentation of intraretinal and subretinal fluid in OCT images.
- RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation
  RF-HiT uses rectified flow and a multi-scale hierarchical transformer to reach 91.27% Dice on ACDC and 87.40% on BraTS 2021 with only 10.14 GFLOPs, 13.6M parameters, and three inference steps.
- UniSemAlign: Text-Prototype Alignment with a Foundation Encoder for Semi-Supervised Histopathology Segmentation
  UniSemAlign aligns text and prototype representations with visual features to generate better supervision signals for semi-supervised segmentation, reporting Dice gains of up to 8.6% on CRAG with 10% labels.
- Event-Level Detection of Surgical Instrument Handovers in Videos with Interpretable Vision Models
  A ViT-LSTM spatiotemporal model detects surgical instrument handovers and classifies direction in videos, achieving F1 of 0.84 for detection and 0.72 mean F1 for direction on kidney transplant data.
- CardioSAM: Topology-Aware Decoder Design for High-Precision Cardiac MRI Segmentation
  CardioSAM introduces a topology-aware decoder and boundary refinement module to elevate SAM's performance on cardiac MRI, achieving a 93.39% Dice score.
- Med-DisSeg: Dispersion-Driven Representation Learning for Fine-Grained Medical Image Segmentation
  Med-DisSeg uses a dispersive loss on batch representations plus adaptive multi-scale decoding to achieve state-of-the-art fine-grained segmentation on five medical imaging datasets.
- RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence
  RD-ViT extends the recurrent-depth transformer architecture to dense prediction, matching or exceeding standard ViT segmentation accuracy on cardiac MRI using a shared recurrent block, fewer parameters, and less training data.
- Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection
  A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher mod...
- An Interpretable Vision Transformer Framework for Automated Brain Tumor Classification
  Vision Transformer with CLAHE preprocessing, two-stage fine-tuning, MixUp/CutMix, EMA, TTA, and attention rollout achieves 99.29% accuracy and 99.25% macro F1 on four-class brain tumor MRI classification from 7023 scans.
- MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer
  MAE self-supervised pretraining of nnFormer yields higher Dice scores, faster convergence, and better generalization when labeled medical segmentation data is scarce.
- Attention-Gated Convolutional Networks for Scanner-Agnostic Quality Assessment
  A CNN-attention model achieves 99.2% accuracy on seen MRI sites and 75.5% on unseen heterogeneous sites for motion artifact quality assessment.
- ASGNet: Adaptive Spectrum Guidance Network for Automatic Polyp Segmentation
  ASGNet combines a spectrum-guided non-local perception module, multi-source semantic extractor, and dense cross-layer decoder to outperform 21 prior methods on five polyp segmentation benchmarks.
- PBE-UNet: A Lightweight Progressive Boundary-Enhanced U-Net with Scale-Aware Aggregation for Ultrasound Image Segmentation
  PBE-UNet adds scale-aware aggregation and progressive boundary expansion modules to U-Net and reports better segmentation performance than prior methods on four ultrasound datasets.
- TAMISeg: Text-Aligned Multi-scale Medical Image Segmentation with Semantic Encoder Distillation
  TAMISeg uses text prompts and semantic distillation from a frozen DINOv3 teacher inside a consistency-aware encoder and scale-adaptive decoder to outperform prior uni- and multi-modal methods on polyp and COVID-19 seg...
- SwinTextUNet: Integrating CLIP-Based Text Guidance into Swin Transformer U-Nets for Medical Image Segmentation
  SwinTextUNet integrates CLIP text guidance into Swin U-Net via cross-attention and convolutional fusion, achieving 86.47% Dice and 78.2% IoU on QaTaCOV19 medical image segmentation.
- Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening
  STGR framework integrates LLaMA-3-V and MedSAM via text-to-vision distillation and graph reasoning, achieving 81.5% DSC on LIDC-IDRI with under 1% parameter updates and high cross-fold stability.
- Benchmarking CNN- and Transformer-Based Models for Surgical Instrument Segmentation in Robotic-Assisted Surgery
  DeepLabV3 matches SegFormer performance in multi-class surgical instrument segmentation while convolutional baselines like UNet remain competitive on the SAR-RARP50 dataset.
Reference graph
Works this paper leans on
- [1] Child, R., Gray, S., Radford, A., Sutskever, I.: Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509 (2019)
- [2] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 248–255. IEEE (2009)
- [3] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
- [4] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2021)
- [5] Fu, S., Lu, Y., Wang, Y., Zhou, Y., Shen, W., Fishman, E., Yuille, A.: Domain adaptive relational reasoning for 3d multi-organ segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 656–666. Springer (2020)
- [6] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
- [7] Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.W., Heng, P.A.: H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Transactions on Medical Imaging 37(12), 2663–2674 (2018)
- [8] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3431–3440 (2015)
- [9] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 3DV (2016)
- [10] Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al.: Attention U-Net: Learning where to look for the pancreas. MIDL (2018)
- [11] Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., Tran, D.: Image transformer. In: International Conference on Machine Learning. pp. 4055–
- [12] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 234–241. Springer (2015)
- [13] Schlemper, J., Oktay, O., Schaap, M., Heinrich, M., Kainz, B., Glocker, B., Rueckert, D.: Attention gated networks: Learning to leverage salient regions in medical images. Medical Image Analysis 53, 197–207 (2019)
- [14] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems. pp. 5998–6008 (2017)
- [15] Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7794–7803 (2018)
- [16] Yu, L., Cheng, J.Z., Dou, Q., Yang, X., Chen, H., Qin, J., Heng, P.A.: Automatic 3d cardiovascular MR segmentation with densely-connected volumetric convnets. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 287–295. Springer (2017)
- [17] Yu, Q., Xie, L., Wang, Y., Zhou, Y., Fishman, E.K., Yuille, A.L.: Recurrent saliency transformation network: Incorporating multi-stage visual cues for small organ segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8280–8289 (2018)
- [18] Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. arXiv preprint arXiv:2012.15840 (2020)
- [19] Zhou, Y., Xie, L., Shen, W., Wang, Y., Fishman, E.K., Yuille, A.L.: A fixed-point model for pancreas segmentation in abdominal CT scans. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 693–701. Springer (2017)
- [20] Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J.: UNet++: A nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11. Springer (2018)