Dino U-Net: Exploiting High-Fidelity Dense Features from Foundation Models for Medical Image Segmentation

Feng Yuan; Haoyue Li; Xiaosong Wang; Xin Gao; Yifan Gao

arXiv: 2508.20909 · v2 · submitted 2025-08-28 · cs.CV · eess.IV

Pith reviewed 2026-05-18 20:18 UTC · model grok-4.3

keywords medical image segmentation · foundation models · DINO · U-Net · feature adaptation · dense features · transfer learning

The pith

Dino U-Net leverages dense features from a frozen DINOv3 foundation model to set new benchmarks in medical image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Dino U-Net, which builds a U-Net around a frozen DINOv3 vision model to use its high-quality dense features for segmenting medical images. It adds an adapter to combine semantic and spatial details and a fidelity-aware projection module to keep those features intact when reducing dimensions for the decoder. Experiments on seven public datasets across different modalities show it outperforms earlier approaches, and performance gets better as the backbone grows to seven billion parameters. A sympathetic reader would care because this offers a way to improve clinical imaging tools by reusing powerful general-purpose models without retraining everything from scratch.

Core claim

Dino U-Net achieves state-of-the-art performance by exploiting the high-fidelity dense features of the DINOv3 vision foundation model in an encoder-decoder architecture. The encoder uses a frozen DINOv3 backbone with a specialized adapter to fuse rich semantic features with low-level spatial details, and the fidelity-aware projection module refines and projects these features for the decoder. This approach is highly scalable, with segmentation accuracy improving as the backbone model size increases up to the 7-billion-parameter variant, and it works across various imaging modalities on seven diverse datasets.

What carries the argument

The fidelity-aware projection module (FAPM), which refines and projects the high-fidelity dense features from the DINOv3 backbone to the decoder while preserving their quality during dimensionality reduction.

If this is right

Segmentation performance scales positively with increasing backbone size up to 7 billion parameters.
The method outperforms previous approaches on seven diverse medical image datasets across modalities.
It provides a parameter-efficient solution by keeping the foundation model frozen.
Transfer of natural image features to medical segmentation is effective with the proposed adapter and projection.
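
The parameter-efficiency point can be made concrete by counting which parameters would receive gradients. The sketch below is illustrative only: the module names and the adapter, projection, and decoder sizes are hypothetical placeholders, not figures from the paper. Only the 7-billion-parameter backbone size comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    num_params: int
    trainable: bool


def trainable_fraction(modules):
    """Return the share of parameters that would be updated during training."""
    total = sum(m.num_params for m in modules)
    if total == 0:
        raise ValueError("model has no parameters")
    trainable = sum(m.num_params for m in modules if m.trainable)
    return trainable / total


# Hypothetical sizes for illustration; only the 7B backbone figure is from the text.
model = [
    Module("frozen_backbone", 7_000_000_000, trainable=False),
    Module("adapter", 20_000_000, trainable=True),      # assumed size
    Module("projection", 5_000_000, trainable=True),    # assumed size
    Module("decoder", 25_000_000, trainable=True),      # assumed size
]

fraction = trainable_fraction(model)
print(f"trainable share: {fraction:.4%}")
```

Under these assumed sizes the trainable share stays well below one percent, which is the sense in which a frozen backbone keeps training cheap even at the largest scale.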

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the transfer works well here, similar adapters might allow foundation models to boost other medical imaging tasks like classification or detection.
Testing on private clinical datasets would check if the gains hold in real-world hospital settings.
Exploring even larger or differently trained foundation models could reveal further accuracy improvements.

Load-bearing premise

The high-fidelity features from DINOv3 pre-trained on natural images transfer effectively to medical images without major loss of clinical relevance or introduction of artifacts.

What would settle it

Running Dino U-Net on an additional medical segmentation dataset where it fails to match or exceed the accuracy of the current best method, or where larger models do not yield better results, would challenge the central claim.
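
The two failure conditions named above can be written as simple checks on a results table. This is a minimal sketch with hypothetical helper names and placeholder numbers. It does not use any result from the paper.

```python
def strictly_improves(scores_by_size):
    """True if each larger backbone scores higher than the previous one.

    `scores_by_size` is a list of (parameter_count, score) pairs in any order.
    """
    ordered = sorted(scores_by_size)
    scores = [score for _, score in ordered]
    return all(later > earlier for earlier, later in zip(scores, scores[1:]))


def beats_baseline(candidate, baseline, margin=0.0):
    """True if the candidate score exceeds the baseline by more than `margin`."""
    return candidate - baseline > margin


# Placeholder (parameter_count, score) pairs, NOT results from the paper.
placeholder_runs = [(1, 0.80), (3, 0.85), (2, 0.82)]
scaling_holds = strictly_improves(placeholder_runs)
outperforms = beats_baseline(candidate=0.85, baseline=0.83)
```

A new dataset on which either check returns False would be the kind of evidence that challenges the central claim. A non-zero `margin` can be used to ignore differences smaller than run-to-run noise.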

Figures

Figures reproduced from arXiv: 2508.20909 by Feng Yuan, Haoyue Li, Xiaosong Wang, Xin Gao, Yifan Gao.

**Figure 3.** Qualitative comparison of segmentation results on representative samples from the seven evaluated datasets. Each column displays a different method, ...

**Figure 4.** Ablation study on the proposed FAPM. (a) Parameter comparison between models with and without FAPM across different scales (S, B, L, 7B).

Original abstract

Foundation models pre-trained on large-scale natural image datasets offer a powerful paradigm for medical image segmentation. However, effectively transferring their learned representations for precise clinical applications remains a challenge. In this work, we propose Dino U-Net, a novel encoder-decoder architecture designed to exploit the high-fidelity dense features of the DINOv3 vision foundation model. Our architecture introduces an encoder built upon a frozen DINOv3 backbone, which employs a specialized adapter to fuse the model's rich semantic features with low-level spatial details. To preserve the quality of these representations during dimensionality reduction, we design a new fidelity-aware projection module (FAPM) that effectively refines and projects the features for the decoder. We conducted extensive experiments on seven diverse public medical image segmentation datasets. Our results show that Dino U-Net achieves state-of-the-art performance, consistently outperforming previous methods across various imaging modalities. Our framework proves to be highly scalable, with segmentation accuracy consistently improving as the backbone model size increases up to the 7-billion-parameter variant. The findings demonstrate that leveraging the superior, dense-pretrained features from a general-purpose foundation model provides a highly effective and parameter-efficient approach to advance the accuracy of medical image segmentation. The code is available at https://github.com/yifangao112/DinoUNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A frozen DINOv3 backbone combined with a lightweight adapter and the FAPM produces consistent segmentation gains on seven public medical datasets, and the gains grow with backbone scale up to 7B parameters.

Hey, the core result is that Dino U-Net freezes DINOv3, adds a specialized adapter to mix semantic and spatial features, and uses a new fidelity-aware projection module to feed the decoder. On seven datasets spanning different modalities it beats prior methods, and the numbers keep rising as the backbone grows to the 7-billion-parameter version. That scaling behavior is the clearest new signal in the work. The architecture choices are concrete and the experiments cover a decent range of clinical imaging tasks. Releasing the code also makes it straightforward for others to check or extend the setup.

The soft spot is the transfer claim itself. The paper positions the gains as coming from high-fidelity dense features that survive the natural-to-medical domain shift, yet the supporting evidence is mainly the final Dice and IoU scores. There is no direct measurement of feature preservation, such as reconstruction error or distance metrics before and after the adapter and projection, and no ablation that pits the frozen DINOv3 route against a comparably sized medical-pretrained encoder. Without those checks it remains possible that extra capacity is doing most of the work rather than faithful reuse of the original features.

This paper is aimed at groups working on medical segmentation who want a practical way to plug in general foundation models without training everything from scratch. Readers who care about scaling laws in transfer learning or who need strong baselines on public datasets will get something out of it. The empirical footprint and released code are solid enough that it deserves a real referee rather than an immediate desk reject. I would send it for review and ask the authors to add targeted diagnostics on whether the adapter and FAPM actually keep the claimed high-fidelity properties intact across the domain gap.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Dino U-Net, an encoder-decoder architecture that employs a frozen DINOv3 vision foundation model as backbone for medical image segmentation. It adds a specialized adapter to fuse rich semantic features with low-level spatial details and a fidelity-aware projection module (FAPM) to refine features during dimensionality reduction for the decoder. Experiments on seven public datasets across modalities report state-of-the-art Dice/IoU scores that consistently exceed prior methods, with monotonic accuracy gains as backbone size scales to the 7-billion-parameter variant. Code is released publicly.
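
The Dice and IoU scores mentioned in the summary are overlap measures between a predicted mask and a reference mask. A minimal sketch of both for binary masks follows, assuming flat 0/1 sequences. The function name and the empty-mask convention are illustrative choices, not taken from the paper or its code.

```python
def dice_and_iou(pred, target):
    """Compute Dice and IoU for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treat as perfect agreement (one common convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy masks: 2 overlapping pixels, 3 predicted, 3 in the reference.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred, target)
```

For a single pair of masks the two scores are tied by Dice = 2·IoU / (1 + IoU), so they always rank a pair of predictions the same way. They can diverge once scores are averaged over many images.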

Significance. If the empirical results hold under rigorous controls, the work is significant because it supplies concrete evidence that dense features from large-scale natural-image foundation models can be transferred to medical segmentation in a parameter-efficient way, with clear scaling behavior. This reduces reliance on domain-specific pretraining and offers a practical route to leverage future foundation-model advances in clinical imaging. Public code release further strengthens reproducibility and downstream impact.

major comments (2)

[§4.3 and Table 3] §4.3 (Ablation Studies) and Table 3: the reported gains from the adapter + FAPM are shown only against smaller or non-foundation baselines; no control experiment compares against a same-size randomly initialized or medical-pretrained encoder of comparable capacity. Without this, the central claim that performance stems from preserved high-fidelity DINOv3 features rather than raw capacity cannot be isolated.
[§3.3 and §4.4] §3.3 (FAPM description) and §4.4 (feature analysis): the paper asserts that FAPM 'preserves the quality of these representations' yet supplies no quantitative diagnostic (cosine similarity, reconstruction error, or modality-specific feature distance) between pre- and post-projection DINO features on grayscale medical inputs. This diagnostic is load-bearing for the domain-shift argument.
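
One way to read the requested cosine-similarity diagnostic: because projection changes the feature dimension, pre- and post-projection vectors cannot be compared directly, but the pairwise similarity structure among patches can. The sketch below is an assumption about how such a check might look, with hypothetical function names and toy vectors. It is not a procedure described in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_cosine(features):
    """Cosine similarity for every unordered pair of feature vectors."""
    n = len(features)
    return [cosine(features[i], features[j]) for i in range(n) for j in range(i + 1, n)]


def similarity_drift(before, after):
    """Mean absolute change in pairwise cosine similarity between two feature sets.

    `before` and `after` hold one vector per patch, in the same patch order.
    The two sets may have different dimensionality.
    """
    if len(before) != len(after):
        raise ValueError("both feature sets must cover the same patches")
    sims_before = pairwise_cosine(before)
    sims_after = pairwise_cosine(after)
    if not sims_before:
        raise ValueError("need at least two patches")
    return sum(abs(a - b) for a, b in zip(sims_before, sims_after)) / len(sims_before)


# Toy features for three patches (illustrative values only).
before = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
after_kept = [[1, 0], [0, 1], [1, 1]]        # projection that keeps the structure
after_collapsed = [[1, 1], [1, 1], [1, 1]]   # projection that maps all patches together

drift_kept = similarity_drift(before, after_kept)
drift_collapsed = similarity_drift(before, after_collapsed)
```

A drift near zero means the projection left the relations between patches intact. A large drift means patches that were distinct before projection have been pulled together or pushed apart.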

minor comments (2)

[Figure 2] Figure 2: the architecture diagram would be clearer if the adapter and FAPM blocks were annotated with exact tensor dimensions and the fusion operation (concatenation, addition, or attention) were labeled explicitly.
[§2] §2 (Related Work): several recent medical foundation-model papers (e.g., MedSAM, SAM-Med2D) are cited only briefly; a short paragraph contrasting the frozen-backbone + adapter strategy with full fine-tuning approaches would sharpen novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below and indicate where revisions will be made to strengthen the paper.

Referee: [§4.3 and Table 3] §4.3 (Ablation Studies) and Table 3: the reported gains from the adapter + FAPM are shown only against smaller or non-foundation baselines; no control experiment compares against a same-size randomly initialized or medical-pretrained encoder of comparable capacity. Without this, the central claim that performance stems from preserved high-fidelity DINOv3 features rather than raw capacity cannot be isolated.

Authors: We agree that a control experiment using a randomly initialized encoder (or medical-pretrained encoder) of comparable parameter count would more cleanly isolate the benefit of the frozen DINOv3 features from raw model capacity. Our existing ablations in §4.3 demonstrate the incremental value of the adapter and FAPM, and the scaling results show monotonic gains as DINOv3 size increases to 7B parameters. In the revised manuscript we will add a new ablation row comparing Dino U-Net against an otherwise identical architecture with a randomly initialized ViT backbone of matching size, using the same training protocol. We will also briefly discuss available medical-pretrained baselines of similar scale. revision: yes
Referee: [§3.3 and §4.4] §3.3 (FAPM description) and §4.4 (feature analysis): the paper asserts that FAPM 'preserves the quality of these representations' yet supplies no quantitative diagnostic (cosine similarity, reconstruction error, or modality-specific feature distance) between pre- and post-projection DINO features on grayscale medical inputs. This diagnostic is load-bearing for the domain-shift argument.

Authors: We acknowledge that the current §4.4 provides only qualitative visualizations. To directly support the claim that FAPM preserves high-fidelity DINOv3 representations under domain shift, we will add quantitative diagnostics in the revision: cosine similarity and feature reconstruction error computed between the original DINOv3 dense features and the FAPM-projected features on held-out samples from the grayscale medical datasets. These metrics will be reported alongside the existing visualizations. revision: yes
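
A feature reconstruction error of the kind promised here could take the following form. This is a sketch under a strong simplifying assumption: the projection is treated as a linear map with orthonormal rows, so its transpose serves as the reconstruction. FAPM itself is presumably learned and nonlinear, in which case a separately fitted decoder would be needed. All names and values below are hypothetical.

```python
import math


def project(weights, x):
    """Apply a k-by-d projection matrix (list of rows) to a d-vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def reconstruct(weights, z):
    """Map a k-vector back to d dimensions with the transpose of `weights`."""
    dim = len(weights[0])
    return [sum(weights[k][j] * z[k] for k in range(len(weights))) for j in range(dim)]


def relative_reconstruction_error(weights, x):
    """||x - W^T W x|| / ||x||, assuming the rows of `weights` are orthonormal."""
    x_hat = reconstruct(weights, project(weights, x))
    residual = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    norm = math.sqrt(sum(a * a for a in x))
    if norm == 0.0:
        raise ValueError("relative error is undefined for a zero vector")
    return residual / norm


# Toy projection that keeps the first two of three coordinates.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x = [3.0, 4.0, 12.0]
error = relative_reconstruction_error(W, x)
```

The error is 0 when the feature lies entirely in the retained subspace and approaches 1 as more of its energy falls in the discarded directions.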

Circularity Check

0 steps flagged

No circularity: empirical architecture validated on external public datasets

The paper proposes Dino U-Net as an encoder-decoder using a frozen DINOv3 backbone plus adapter and FAPM, then reports measured Dice/IoU gains on seven independent public medical segmentation datasets plus monotonic scaling with backbone size up to 7B parameters. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted inputs, self-definitions, or self-citation chains. All central claims rest on direct experimental outcomes against external benchmarks rather than any internal equivalence or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces two new modules (adapter and FAPM) whose internal design choices and any hyperparameters are not detailed in the abstract; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5774 in / 1085 out tokens · 30891 ms · 2026-05-18T20:18:38.717507+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

Paper passage: "Our architecture introduces an encoder built upon a frozen DINOv3 backbone, which employs a specialized adapter to fuse the model's rich semantic features with low-level spatial details. To preserve the quality of these representations during dimensionality reduction, we design a new fidelity-aware projection module (FAPM)"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear

Paper passage: "Dino U-Net achieves state-of-the-art performance... with accuracy improving as backbone size increases to 7 billion parameters"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dino-NestedUNet: Unlocking Foundation Vision Encoders for Pathology Tumor Bulk Segmentation via Dense Decoding
cs.CV · 2026-04 · unverdicted · novelty 6.0

Dino-NestedUNet improves pathology tumor segmentation by coupling DINOv3 encoders with dense nested decoding, showing gains over UNet++ and Dino-UNet baselines across multiple cohorts including zero-shot tests.

[1] [1]

Artificial intelligence–enabled rapid diagnosis of patients with covid-19,

X. Mei, H.-C. Lee, K.-y. Diao, M. Huang, B. Lin, C. Liu, Z. Xie, Y . Ma, P. M. Robson, M. Chung et al. , “Artificial intelligence–enabled rapid diagnosis of patients with covid-19,” Nature medicine , vol. 26, no. 8, pp. 1224–1228, 2020

work page 2020

[2] [2]

Unetr++: delving into efficient and accurate 3d medical image segmentation,

A. Shaker, M. Maaz, H. Rasheed, S. Khan, M.-H. Yang, and F. S. Khan, “Unetr++: delving into efficient and accurate 3d medical image segmentation,” IEEE Transactions on Medical Imaging , vol. 43, no. 9, pp. 3377–3390, 2024

work page 2024

[3] [3]

nn- former: V olumetric medical image segmentation via a 3d transformer,

H.-Y . Zhou, J. Guo, Y . Zhang, X. Han, L. Yu, L. Wang, and Y . Yu, “nn- former: V olumetric medical image segmentation via a 3d transformer,” IEEE transactions on image processing , vol. 32, pp. 4036–4045, 2023

work page 2023

[4] [4]

Transmed: Transformers advance multi- modal medical image classification,

Y . Dai, Y . Gao, and F. Liu, “Transmed: Transformers advance multi- modal medical image classification,”Diagnostics, vol. 11, no. 8, p. 1384, 2021

work page 2021

[5] [5]

Wega: Weakly-supervised global-local affinity learning framework for lymph node metastasis prediction in rectal cancer,

Y . Gao, Y . Dong, W. Wu, C. Ge, F. Yuan, J. Sheng, H. Li, and X. Gao, “Wega: Weakly-supervised global-local affinity learning framework for lymph node metastasis prediction in rectal cancer,” arXiv preprint arXiv:2505.10502, 2025

work page arXiv 2025

[6] [6]

An anatomy-aware frame- work for automatic segmentation of parotid tumor from multimodal mri,

Y . Gao, Y . Dai, F. Liu, W. Chen, and L. Shi, “An anatomy-aware frame- work for automatic segmentation of parotid tumor from multimodal mri,” Computers in Biology and Medicine , vol. 161, p. 107000, 2023

work page 2023

[7] [7]

A compos- ite alignment-aware framework for myocardial lesion segmentation in multi-sequence cmr images,

Y . Gao, S. Rui, H. Su, J. Xiang, L. Wu, and X. Wang, “A compos- ite alignment-aware framework for myocardial lesion segmentation in multi-sequence cmr images,” arXiv preprint arXiv:2507.11886 , 2025

work page arXiv 2025

[8] [8]

U-net: Convolutional networks for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241

work page 2015

[9] [9]

Review of semantic segmentation of medical images using modified architectures of unet,

M. Krithika Alias AnbuDevi and K. Suganthi, “Review of semantic segmentation of medical images using modified architectures of unet,” Diagnostics, vol. 12, no. 12, p. 3064, 2022

work page 2022

[10] [10]

Segment anything,

A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Loet al., “Segment anything,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4015–4026

work page 2023

[11] [11]

Ma-sam: Modality-agnostic sam adaptation for 3d medical image segmentation,

C. Chen, J. Miao, D. Wu, A. Zhong, Z. Yan, S. Kim, J. Hu, Z. Liu, L. Sun, X. Li et al. , “Ma-sam: Modality-agnostic sam adaptation for 3d medical image segmentation,” Medical Image Analysis , vol. 98, p. 103310, 2024

work page 2024

[12] [12]

Segment anything model for medical image segmentation: Current applications and future directions,

Y . Zhang, Z. Shen, and R. Jiao, “Segment anything model for medical image segmentation: Current applications and future directions,” Com- puters in Biology and Medicine , vol. 171, p. 108238, 2024

work page 2024

[13] [13]

Medical sam adapter: Adapting segment anything model for medical image segmentation,

J. Wu, Z. Wang, M. Hong, W. Ji, H. Fu, Y . Xu, M. Xu, and Y . Jin, “Medical sam adapter: Adapting segment anything model for medical image segmentation,”Medical image analysis, vol. 102, p. 103547, 2025

work page 2025

[14] [14]

Safeclick: Error-tolerant interactive segmentation of any medical volumes via hierarchical expert consensus,

Y . Gao, J. Sheng, W. Wu, H. Li, Y . Dong, C. Ge, F. Yuan, and X. Gao, “Safeclick: Error-tolerant interactive segmentation of any medical volumes via hierarchical expert consensus,” arXiv preprint arXiv:2506.18404, 2025

work page arXiv 2025

[15] [15]

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum, “Dino: Detr with improved denoising anchor boxes for end-to-end object detection,” arXiv preprint arXiv:2203.03605, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[16] [16]

DINOv2: Learning Robust Visual Features without Supervision

M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “Dinov2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[17] [17]

DINOv3

O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa, F. Massa, D. Haziza, L. Wehrstedt, J. Wang, T. Darcet, T. Moutakanni, L. Sentana, C. Roberts, A. Vedaldi, J. Tolan, J. Brandt, C. Couprie, J. Mairal, H. Jégou, P. Labatut, and P. Bojanowski, “DINOv3,” 2025. [Online]. Available: h...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[18] [18]

SAM 2: Segment Anything in Images and Videos

N. Ravi, V. Gabeur, Y.-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. Rädle, C. Rolland, L. Gustafson et al., “Sam 2: Segment anything in images and videos,” arXiv preprint arXiv:2408.00714, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[19] [19]

Unet++: Redesigning skip connections to exploit multiscale features in image segmentation,

Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE transactions on medical imaging, vol. 39, no. 6, pp. 1856–1867, 2019

work page 2019

[20] [20]

3d mri brain tumor segmentation using autoencoder regularization,

A. Myronenko, “3d mri brain tumor segmentation using autoencoder regularization,” in International MICCAI brainlesion workshop. Springer, 2018, pp. 311–320

work page 2018

[21] [21]

nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,

F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature methods, vol. 18, no. 2, pp. 203–211, 2021

work page 2021

[22] [22]

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

J. Ma, F. Li, and B. Wang, “U-mamba: Enhancing long-range dependency for biomedical image segmentation,” arXiv preprint arXiv:2401.04722, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[23] [23]

U-kan makes strong backbone for medical image segmentation and generation,

C. Li, X. Liu, W. Li, C. Wang, H. Liu, Y. Liu, Z. Chen, and Y. Yuan, “U-kan makes strong backbone for medical image segmentation and generation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 5, 2025, pp. 4652–4660

work page 2025

[24] [24]

Sam2-unet: Segment anything 2 makes strong encoder for natural and medical image segmentation,

X. Xiong, Z. Wu, S. Tan, W. Li, F. Tang, Y. Chen, S. Li, J. Ma, and G. Li, “Sam2-unet: Segment anything 2 makes strong encoder for natural and medical image segmentation,” arXiv preprint arXiv:2408.08870, 2024

work page arXiv 2024

[25] [25]

Kvasir-seg: A segmented polyp dataset,

D. Jha, P. H. Smedsrud, M. A. Riegler, P. Halvorsen, T. de Lange, D. Johansen, and H. D. Johansen, “Kvasir-seg: A segmented polyp dataset,” in MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, South Korea, January 5–8, 2020, Proceedings, Part II. Springer, 2020, pp. 451–462

work page 2020

[27] [27]

Drishti-gs: Retinal image dataset for optic nerve head(onh) segmentation,

J. Sivaswamy, S. R. Krishnadas, G. Datt Joshi, M. Jain, and A. U. Syed Tabish, “Drishti-gs: Retinal image dataset for optic nerve head(onh) segmentation,” in 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), 2014, pp. 53–56

work page 2014

[28] [28]

Dataset of breast ultrasound images,

W. Al-Dhabyani, M. Gomaa, H. Khaled, and A. Fahmy, “Dataset of breast ultrasound images,” Data in brief, vol. 28, p. 104863, 2020

work page 2020

[29] [29]

Cellbindb: a large-scale multimodal annotated dataset for cell segmentation with benchmarking of universal models,

C. Shi, J. Fan, Z. Deng, H. Liu, Q. Kang, Y. Li, J. Guo, J. Wang, J. Gong, S. Liao et al., “Cellbindb: a large-scale multimodal annotated dataset for cell segmentation with benchmarking of universal models,” GigaScience, vol. 14, p. giaf069, 2025

work page 2025

[30] [30]

Bayeseg: Bayesian modeling for medical image segmentation with interpretable generalizability,

S. Gao, H. Zhou, Y. Gao, and X. Zhuang, “Bayeseg: Bayesian modeling for medical image segmentation with interpretable generalizability,” Medical image analysis, vol. 89, p. 102889, 2023

work page 2023

[31] [31]

Multivariate mixture model for myocardial segmentation combining multi-source images,

X. Zhuang, “Multivariate mixture model for myocardial segmentation combining multi-source images,” IEEE transactions on pattern analysis and machine intelligence, vol. 41, no. 12, pp. 2933–2946, 2018

work page 2018

[32] [32]

Myops-net: Myocardial pathology segmentation with flexible combination of multi-sequence cmr images,

J. Qiu, L. Li, S. Wang, K. Zhang, Y. Chen, S. Yang, and X. Zhuang, “Myops-net: Myocardial pathology segmentation with flexible combination of multi-sequence cmr images,” Medical image analysis, vol. 84, p. 102694, 2023

work page 2023

[33] [33]

Prostatex zone segmentations [data set],

A. Meyer, D. Schindele, D. Von Reibnitz, M. Rak, M. Schostak, and C. Hansen, “Prostatex zone segmentations [data set],” The Cancer Imaging Archive, p. 131, 2020

work page 2020

[34] [34]

m2caiseg: Semantic segmentation of laparoscopic images using convolutional neural networks,

S. Maqbool, A. Riaz, H. Sajid, and O. Hasan, “m2caiseg: Semantic segmentation of laparoscopic images using convolutional neural networks,” arXiv preprint arXiv:2008.10134, 2020

work page arXiv 2020

[35] [35]

Swin-umamba: Mamba-based unet with imagenet-based pretraining,

J. Liu, H. Yang, H.-Y. Zhou, Y. Xi, L. Yu, C. Li, Y. Liang, G. Shi, Y. Yu, S. Zhang et al., “Swin-umamba: Mamba-based unet with imagenet-based pretraining,” in International conference on medical image computing and computer-assisted intervention. Springer, 2024, pp. 615–625

work page 2024