nnFormer: Interleaved transformer for volumetric segmentation. arXiv preprint arXiv:2109.03201, 2021
4 Pith papers cite this work.
citing papers explorer
- Segmentation, Detection and Explanation: A Unified Framework for CT Appearance Reasoning
  A unified autoregressive vision-language framework integrates segmentation, detection, and appearance reasoning for CT images via task-routing tokens and progressive refinement, with gains on public benchmarks.
- MHMamba: Multi-Head Mamba for 3D Brain Tumor Segmentation
  MHMamba combines a U-Net with multi-head Mamba, channel calibration, and adaptive skip fusion to improve 3D brain tumor segmentation accuracy and small-lesion sensitivity on BraTS datasets while retaining linear complexity.
- Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation
  FM-BFF-Net combines focal modulation attention with bidirectional encoder-decoder fusion in a CNN-transformer architecture and reports higher Dice and Jaccard scores than recent methods across eight medical image datasets.
- Improving Prostate Gland Segmentation Using Transformer based Architectures
  SwinUNETR outperforms 3D UNet with Dice scores up to 0.902 on larger gland subsets using mixed-cohort five-fold training, while UNETR performs poorly on the same subsets.