Towards Voxel Spacing Consistency for Medical Image Segmentation

Han Li; Hanxiao Zhang; Jie Yang; Minghui Zhang; Nassir Navab; Runze Yang; Xin You; Yi Yu; Yun Gu

arxiv: 2606.31839 · v1 · pith:EDIKR3YFnew · submitted 2026-06-30 · 💻 cs.CV

Xin You, Runze Yang, Minghui Zhang, Hanxiao Zhang, Han Li, Yi Yu, Jie Yang, Nassir Navab, Yun Gu

Pith reviewed 2026-07-01 05:50 UTC · model grok-4.3

classification 💻 cs.CV

keywords medical image segmentation · voxel spacing · resampling · semantic consistency · implicit neural network · ODE constraint · inter-slice dynamics

The pith

A semantic-aware resampling method enforces consistent axial voxel spacing while preserving both anatomical continuity and class-wise semantic consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that resampling volumetric medical images to uniform voxel spacing need not discard semantic information or break anatomical continuity between slices. It introduces Consispace, which models inter-slice dynamics continuously via an ODE constraint and enforces intra-slice class consistency by reweighting features drawn from a pretrained vision model. These two constraints are folded into a single implicit neural network that supports arbitrary-scale resampling. A sympathetic reader would care because resampling to correct inconsistent spacing is a routine preprocessing step whose side effects on downstream segmentation have not been systematically addressed.

Core claim

Consispace is a semantic-aware resampling framework that achieves consistent voxel spacing in the axial direction while preserving anatomical and semantic consistency. It combines an ODE-based anatomical constraint for inter-slice dynamics with dense-feature semantic correlation maps that inject class-wise consistency via feature reweighting; both are integrated inside an implicit neural network supporting arbitrary-scale resampling.

What carries the argument

Consispace framework, which couples an ODE-based continuous interpolator for inter-slice anatomical constraints with pretrained-vision-model feature reweighting for intra-slice semantic consistency.
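
To make the phrase "ODE-based continuous interpolator" concrete, the sketch below integrates a generic ODE dz/dt = v(z, t) from one slice (t = 0) toward the next (t = 1) and reads off the state at a fractional position. It is an illustration only: this page does not describe the paper's actual network or constraint, so `integrate_slice_ode` and the stand-in velocity fields are hypothetical names and choices, and classical RK4 is used simply as a familiar solver.

```python
import numpy as np

def integrate_slice_ode(z0, velocity, t_end, n_steps=20):
    """Integrate dz/dt = velocity(z, t) from t=0 to t_end with classical RK4.

    z0       : array holding the state at the lower slice (hypothetical features).
    velocity : callable (z, t) -> dz/dt; a stand-in for a learned dynamics model.
    t_end    : fractional inter-slice position in [0, 1].
    """
    z = np.asarray(z0, dtype=np.float64).copy()
    h = t_end / n_steps
    t = 0.0
    for _ in range(n_steps):
        k1 = velocity(z, t)
        k2 = velocity(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(z + h * k3, t + h)
        z = z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return z

# Stand-in data: two neighbouring "slices" of features (assumed, not from the paper).
z_lower = np.array([0.0, 1.0, 2.0])
z_upper = np.array([4.0, 1.0, 0.0])

# A constant velocity field reproduces plain linear interpolation between slices.
constant_field = lambda z, t: z_upper - z_lower
quarter = integrate_slice_ode(z_lower, constant_field, t_end=0.25)
```

With a constant velocity field the trajectory is a straight line, so the result matches linear interpolation; a state-dependent field bends the path between slices, which is the extra freedom a continuous interpolator has over discrete interpolation.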

If this is right

Superior reconstruction quality and perceptual fidelity compared with discrete interpolation methods.
Smoother inter-slice anatomical transitions in the resampled volumes.
Measurable improvement in downstream segmentation performance when Consispace is used as a preprocessing step.
Support for arbitrary-scale resampling without retraining the network.
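
For reference, the discrete baseline these claims are measured against can be sketched in a few lines. The function below linearly resamples a volume along the axial axis to an arbitrary target spacing; the name `resample_axial_linear`, the (Z, Y, X) axis order, and the convention of keeping the first slice fixed are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def resample_axial_linear(volume, src_spacing, dst_spacing):
    """Linearly resample a (Z, Y, X) volume along axis 0 to a new slice spacing."""
    volume = np.asarray(volume, dtype=np.float64)
    n_src = volume.shape[0]
    # Physical extent spanned by the slice centres (same units as the spacings).
    extent = (n_src - 1) * src_spacing
    n_dst = int(np.floor(extent / dst_spacing + 1e-9)) + 1
    # Target slice positions expressed in source-slice index units.
    pos = np.arange(n_dst) * dst_spacing / src_spacing
    lo = np.clip(np.floor(pos).astype(int), 0, n_src - 1)
    hi = np.clip(lo + 1, 0, n_src - 1)
    # Blend weight toward the upper neighbour, broadcast over Y and X.
    w = (pos - lo).reshape(-1, 1, 1)
    return (1.0 - w) * volume[lo] + w * volume[hi]

# Three slices 2.0 units apart, resampled to 1.0-unit spacing (toy data).
toy = np.arange(3, dtype=np.float64).reshape(3, 1, 1) * np.ones((3, 2, 2))
dense = resample_axial_linear(toy, src_spacing=2.0, dst_spacing=1.0)
```

Because each new slice is a fixed blend of its two neighbours, this baseline cannot represent structures that appear, vanish, or shift between slices, which is the gap a learned continuous interpolator is meant to close.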

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dual-constraint design could be tested on resampling tasks outside the axial direction or on non-medical volumetric data where semantic labels matter.
If the semantic reweighting step proves robust, it might reduce reliance on spacing-specific data augmentation during segmentation model training.
The implicit-network formulation suggests the method could be inserted as a differentiable layer inside end-to-end trainable segmentation pipelines.

Load-bearing premise

Dense features from a pretrained vision model can be used to build intra-slice semantic correlation maps that preserve class-wise semantic consistency when injected via feature reweighting during resampling.
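
One plausible reading of this premise, offered as a sketch rather than the paper's formulation, is to compute pairwise cosine similarity between per-pixel dense features and use a softmax over that similarity to reweight the features being resampled. Everything here is assumed for illustration: the function name `correlation_reweight`, the flattened (N, C) feature layout, the temperature value, and the toy features standing in for a pretrained model's output.

```python
import numpy as np

def correlation_reweight(dense_feats, resample_feats, temperature=0.1):
    """Reweight resampling features with a softmax over feature similarity.

    dense_feats    : (N, C) per-pixel features, e.g. from a pretrained model.
    resample_feats : (N, D) features to be smoothed within similar regions.
    Returns the (N, N) cosine correlation map and the (N, D) reweighted features.
    """
    norms = np.linalg.norm(dense_feats, axis=1, keepdims=True) + 1e-8
    unit = dense_feats / norms
    corr = unit @ unit.T                          # cosine similarity map
    logits = corr / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return corr, weights @ resample_feats

# Toy input: pixels 0 and 1 share a "class" direction, pixel 2 does not.
toy_dense = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_resample = np.array([[0.0], [2.0], [10.0]])
corr_map, reweighted = correlation_reweight(toy_dense, toy_resample, temperature=0.01)
```

In the toy case the two similar pixels are averaged together while the dissimilar one is left almost untouched. The premise holds only if similarity in the pretrained feature space tracks anatomical class membership, which is exactly what a domain-shift ablation would need to check.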

What would settle it

A controlled experiment on held-out volumes in which Consispace resampling yields no measurable gain in reconstruction PSNR, perceptual metrics, or downstream segmentation Dice scores relative to standard linear or spline interpolation followed by the same segmentation model.
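
The two headline quantities in such an experiment are standard and easy to compute. The sketch below gives minimal NumPy versions of PSNR and the Dice coefficient; the function names and the convention of returning 1.0 for two empty masks are choices made for this example, and a real evaluation would also fix the intensity range and per-class handling.

```python
import numpy as np

def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def dice(pred_mask, true_mask):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement by convention
    return float(2.0 * np.logical_and(pred, true).sum() / denom)

# Toy check: a uniform error of 0.1 on a unit intensity range gives an MSE of 0.01.
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)
score_psnr = psnr(reference, estimate, data_range=1.0)
score_dice = dice([1, 1, 0, 0], [1, 0, 0, 0])
```

Running the same segmentation model on volumes resampled by each method and comparing these scores on held-out cases is the comparison the paragraph above describes.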

Figures

Figures reproduced from arXiv: 2606.31839 by Han Li, Hanxiao Zhang, Jie Yang, Minghui Zhang, Nassir Navab, Runze Yang, Xin You, Yi Yu, Yun Gu.

**Figure 2.** The overall pipeline of Consispace. The ODE-based in…

**Figure 3.** The performance comparison between linear interpolation and the proposed Consispace on medical image segmentation.

**Figure 4.** The image resampling performance comparison between the proposed Consispace and other resampling approaches.

**Figure 5.** The detailed feature resampling process in the proposed Consispace.

**Figure 6.** The segmentation performance promotion when imple…

Original abstract

Volumetric medical image segmentation is essential for both preoperative diagnosis and intraoperative guidance. While recent years have witnessed rapid progress in segmentation architectures, comparatively little attention is paid to the physical voxel spacing of anatomical data. Indeed, volumetric image resampling is a ubiquitous preprocessing step before segmentation, yet its interaction with downstream segmentation has not been systematically exploited. In this work, we study the correlation between image resampling and segmentation, and propose Consispace, a semantic-aware resampling framework that achieves consistent voxel spacing in the axial direction while preserving anatomical and semantic consistency. Consispace introduces an ODE-based anatomical constraint to model inter-slice dynamics with a continuous interpolator, enabling faithful reconstruction under complex anatomical transitions beyond discrete interpolation. To further couple resampling with segmentation objectives, we leverage dense features from a pretrained vision model to build intra-slice semantic correlation maps and inject class-wise semantic consistency via feature reweighting during resampling. Both intra-slice and inter-slice constraints are integrated into an implicit neural network, supporting arbitrary-scale resampling. Extensive experiments on multiple datasets demonstrate that Consispace achieves superior reconstruction quality and perceptual fidelity, produces smoother inter-slice anatomy, and improves downstream segmentation performance when used as a preprocessing step.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Consispace tries to fix voxel spacing with ODE inter-slice modeling plus pretrained feature reweighting in an implicit network, but the abstract supplies no numbers to check if it works.

The main takeaway is that this paper builds a resampling method called Consispace that adds an ODE-based continuous interpolator for inter-slice anatomy and uses dense features from a pretrained vision model to create semantic correlation maps that get reweighted during resampling, all inside an implicit neural network for arbitrary scales. The goal is to make the output volumes better for downstream segmentation.

What is actually new is the specific combination of those three elements for the resampling step. The abstract correctly notes that resampling is a common preprocessing step whose effect on segmentation has not been exploited much, and the method tries to tie the two together through both anatomical and semantic constraints.

The soft spots stand out because the abstract asserts superior reconstruction quality, smoother inter-slice anatomy, and better segmentation performance after using Consispace as preprocessing, yet it contains no quantitative results, no dataset names, no baselines, and no error analysis. Without those, the central claims cannot be evaluated. The intra-slice semantic reweighting also rests on features from a pretrained vision model, which raises a domain-shift concern. If that model was pretrained on natural images, medical CT and MRI volumes differ sharply from its training data in intensity and semantics, and the abstract gives no indication that the resulting correlation maps align with anatomical classes rather than artifacts.

This is a targeted preprocessing paper aimed at medical image segmentation groups that already deal with inconsistent voxel spacing. A reader in that subfield might pick up the ODE-plus-implicit-network framing as one way to approach the problem, but only the full paper's experiments would show whether the gains hold. The work shows clear thinking on the preprocessing gap even if the evidence presented so far is thin.

I would send it to peer review so the experiments and comparisons can be checked properly.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes Consispace, a semantic-aware resampling framework for volumetric medical images that enforces consistent axial voxel spacing. It combines an ODE-based inter-slice anatomical constraint using a continuous interpolator with intra-slice semantic correlation maps built from dense features of a pretrained vision model, injected via feature reweighting; both are integrated into an implicit neural network supporting arbitrary-scale resampling. The central claim is that this yields superior reconstruction quality, perceptual fidelity, smoother inter-slice anatomy, and improved downstream segmentation performance, supported by extensive experiments on multiple datasets.

Significance. If the empirical claims hold after proper validation, the work would address an under-explored interaction between resampling and segmentation by coupling physical voxel spacing with semantic consistency objectives. The integration of ODE dynamics for inter-slice modeling and pretrained feature reweighting for intra-slice constraints offers a potentially generalizable preprocessing approach. No machine-checked proofs or parameter-free derivations are present, but the framework's design could be falsifiable via targeted ablations on domain shift.

major comments (2)

[Abstract] The assertion of 'extensive experiments on multiple datasets' demonstrating 'superior reconstruction quality... and improves downstream segmentation performance' supplies no quantitative metrics, dataset identifiers, baseline methods, statistical tests, or error analysis, making the central empirical claim of improved segmentation unverifiable from the manuscript.

[Abstract, paragraph on intra-slice constraints] The intra-slice component constructs semantic correlation maps from dense features of a pretrained vision model and injects them via feature reweighting to preserve class-wise consistency. No discussion or evidence addresses the domain shift between natural-image pretraining data and medical CT/MRI volumes (modality, intensity statistics, semantic granularity), which directly risks the maps aligning with spurious rather than anatomical boundaries and thereby undermines both the consistency guarantee and the downstream segmentation gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below and outline planned revisions to strengthen the manuscript.

Referee: [Abstract] The assertion of 'extensive experiments on multiple datasets' demonstrating 'superior reconstruction quality... and improves downstream segmentation performance' supplies no quantitative metrics, dataset identifiers, baseline methods, statistical tests, or error analysis, making the central empirical claim of improved segmentation unverifiable from the manuscript.

Authors: We agree the abstract's empirical claims would be more verifiable with concrete details. The current abstract prioritizes brevity, but we will revise it to name the datasets (e.g., specific CT/MRI collections used), report key metrics such as PSNR/SSIM for reconstruction and Dice/IoU gains for segmentation, reference the main baselines (trilinear, spline, and learning-based interpolators), and note statistical testing where performed. These additions will be kept concise while directly supporting the claims.

revision: yes

Referee: [Abstract, paragraph on intra-slice constraints] The intra-slice component constructs semantic correlation maps from dense features of a pretrained vision model and injects them via feature reweighting to preserve class-wise consistency. No discussion or evidence addresses the domain shift between natural-image pretraining data and medical CT/MRI volumes (modality, intensity statistics, semantic granularity), which directly risks the maps aligning with spurious rather than anatomical boundaries and thereby undermines both the consistency guarantee and the downstream segmentation gains.

Authors: This is a substantive concern. The manuscript does not currently discuss domain shift between natural-image pretraining and medical volumes. We will add a paragraph in the methods or discussion section addressing this, citing transfer-learning literature for medical imaging and explaining why dense features from the chosen pretrained model still capture useful anatomical semantics in our setting. We will also include a targeted ablation quantifying the contribution of the semantic reweighting on medical data to demonstrate that the maps align with anatomical rather than spurious boundaries.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external pretrained models and standard ODE concepts

The paper proposes Consispace by combining an ODE-based inter-slice anatomical constraint with intra-slice semantic correlation maps derived from a pretrained vision model's dense features, then integrates both into an implicit neural network for resampling. No equations or steps in the abstract or described method reduce by construction to self-defined quantities, fitted inputs renamed as predictions, or load-bearing self-citations. The central claims depend on external pretrained models (not derived within the paper) and standard continuous interpolation ideas, making the derivation self-contained against external benchmarks. No quoted text exhibits the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the domain assumption that inter-slice anatomy follows continuous ODE dynamics and that pretrained vision features capture transferable semantic correlations; no explicit free parameters or invented physical entities are named in the abstract.

axioms (2)

domain assumption: Inter-slice dynamics can be modeled by an ODE with a continuous interpolator that enables faithful reconstruction under complex anatomical transitions.
Invoked in the abstract to justify the anatomical constraint component.

domain assumption: Dense features from a pretrained vision model can be leveraged to build intra-slice semantic correlation maps.
Stated as the basis for the semantic consistency mechanism.

invented entities (1)

Consispace framework · no independent evidence
purpose: Semantic-aware resampling to enforce consistent axial voxel spacing.
Newly proposed method combining the listed components.

Reference graph

Works this paper leans on

66 extracted references · 11 canonical work pages · 7 internal anchors

[1]

nnu-net: a self-configuring method for deep learning- based biomedical image segmentation,

F. Isenseeet al., “nnu-net: a self-configuring method for deep learning- based biomedical image segmentation,”Nature methods, vol. 18, no. 2, pp. 203–211, 2021. 1, 2, 3, 4, 5, 6

2021
[2]

Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved?

O. Bernardet al., “Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved?” TMI, vol. 37, no. 11, pp. 2514–2525, 2018. 1

2018
[3]

Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy,

S. Nikolovet al., “Deep learning to achieve clinically applicable segmentation of head and neck anatomy for radiotherapy,”arXiv preprint arXiv:1809.04430, 2018. 1

work page arXiv 2018
[4]

Near real-time intraoperative brain tumor diagnosis using stimulated raman histology and deep neural networks,

T. C. Hollonet al., “Near real-time intraoperative brain tumor diagnosis using stimulated raman histology and deep neural networks,”Nature medicine, vol. 26, no. 1, pp. 52–58, 2020. 1

2020
[5]

Automated quantitative tumour response as- sessment of mri in neuro-oncology with artificial neural networks: a multicentre, retrospective study,

P. Kickingerederet al., “Automated quantitative tumour response as- sessment of mri in neuro-oncology with artificial neural networks: a multicentre, retrospective study,”The Lancet Oncology, vol. 20, no. 5, pp. 728–740, 2019. 1

2019
[6]

Self-supervised pre-training of swin transformers for 3d medical image analysis,

Y . Tanget al., “Self-supervised pre-training of swin transformers for 3d medical image analysis,” inCVPR, 2022. 1, 2, 5, 6

2022
[7]

Mednext: transformer-driven scaling of convnets for medical image segmentation,

S. Royet al., “Mednext: transformer-driven scaling of convnets for medical image segmentation,” inMICCAI, 2023. 1, 2, 5 JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2026 11

2023
[8]

Learning with explicit shape priors for medical image segmentation,

X. You, J. He, J. Yang, and Y . Gu, “Learning with explicit shape priors for medical image segmentation,”IEEE Transactions on Medical Imaging, 2024. 1

2024
[9]

Boundary loss for highly unbalanced segmentation,

H. Kervadec, J. Bouchtiba, C. Desrosiers, E. Granger, J. Dolz, and I. B. Ayed, “Boundary loss for highly unbalanced segmentation,”Medical Image Analysis, vol. 67, p. 101851, 2021. 1

2021
[10]

Boundary difference over union loss for medical image segmentation,

F. Sun, Z. Luo, and S. Li, “Boundary difference over union loss for medical image segmentation,” inMICCAI. Springer, 2023, pp. 292–

2023
[11]

Towards boundary confusion for volumetric medical image segmentation,

X. Youet al., “Towards boundary confusion for volumetric medical image segmentation,”Medical Image Analysis, p. 103961, 2026. 1

2026
[12]

Loss odyssey in medical image segmentation,

J. Ma, J. Chen, M. Ng, R. Huang, Y . Li, C. Li, X. Yang, and A. L. Martel, “Loss odyssey in medical image segmentation,”Medical image analysis, vol. 71, p. 102035, 2021. 1

2021
[13]

Medsam2: Segment anything in 3d medical images and videos.arXiv preprint arXiv:2504.03600,

J. Ma, Z. Yang, S. Kim, B. Chen, M. Baharoon, A. Fallahpour, R. Asakereh, H. Lyu, and B. Wang, “Medsam2: Segment anything in 3d medical images and videos,”arXiv preprint arXiv:2504.03600, 2025. 1, 2

work page arXiv 2025
[14]

Sam-med3d: A vision foundation model for general- purpose segmentation on volumetric medical images,

H. Wanget al., “Sam-med3d: A vision foundation model for general- purpose segmentation on volumetric medical images,”IEEE Transac- tions on Neural Networks and Learning Systems, 2025. 1, 2

2025
[15]

Hyperspace: Hypernetworks for spacing-adaptive image segmentation,

S. Joutard, M. Pietsch, and R. Prevost, “Hyperspace: Hypernetworks for spacing-adaptive image segmentation,” inInternational Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 339–349. 1, 2, 3, 4, 6, 7

2024
[16]

Totalsegmentator: robust segmentation of 104 anatomic structures in ct images,

J. Wasserthalet al., “Totalsegmentator: robust segmentation of 104 anatomic structures in ct images,”Radiology: Artificial Intelligence, vol. 5, no. 5, p. e230024, 2023. 1

2023
[17]

Slord: structural low-rank descriptors for shape con- sistency in vertebrae segmentation,

X. Youet al., “Slord: structural low-rank descriptors for shape con- sistency in vertebrae segmentation,”IEEE Journal of Biomedical and Health Informatics, 2025. 2

2025
[18]

Saint: spatially aware interpolation network for medical slice synthesis,

C. Peng, W.-A. Lin, H. Liao, R. Chellappa, and S. K. Zhou, “Saint: spatially aware interpolation network for medical slice synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7750–7759. 2, 7

2020
[19]

Learning continuous image representa- tion with local implicit image function,

Y . Chen, S. Liu, and X. Wang, “Learning continuous image representa- tion with local implicit image function,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 8628–

2021
[20]

An arbitrary scale super-resolution approach for 3d mr images via implicit neural representation,

Q. Wuet al., “An arbitrary scale super-resolution approach for 3d mr images via implicit neural representation,”IEEE Journal of Biomedical and Health Informatics, vol. 27, no. 2, pp. 1004–1015, 2022. 2, 3, 7

2022
[21]

Rplhr- ct dataset and transformer baseline for volumetric super-resolution from ct scans,

P. Yu, H. Zhang, H. Kang, W. Tang, C. W. Arnold, and R. Zhang, “Rplhr- ct dataset and transformer baseline for volumetric super-resolution from ct scans,” inInternational Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2022, pp. 344–353. 2, 7

2022
[22]

Cycleinr: cycle implicit neural representation for arbitrary-scale volumetric super-resolution of medical data,

W. Fang, Y . Tang, H. Guo, M. Yuan, T. C. Mok, K. Yan, J. Yao, X. Chen, Z. Liu, L. Luet al., “Cycleinr: cycle implicit neural representation for arbitrary-scale volumetric super-resolution of medical data,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 11 631–11 641. 2, 7

2024
[23]

Diffusion-prior based implicit neural representation for arbitrary-scale cardiac cine mri super-resolution,

J. Lyuet al., “Diffusion-prior based implicit neural representation for arbitrary-scale cardiac cine mri super-resolution,”Information Fusion, p. 103510, 2025. 2, 7

2025
[24]

Medical sam 2: Segment medical images as video via segment anything model 2,

J. Zhu, A. Hamdi, Y . Qi, Y . Jin, and J. Wu, “Medical sam 2: Segment medical images as video via segment anything model 2,”arXiv preprint arXiv:2408.00874, 2024. 2, 5, 9

work page arXiv 2024
[25]

U-net: Convolutional networks for biomedical image segmentation,

O. Ronnebergeret al., “U-net: Convolutional networks for biomedical image segmentation,” inMICCAI, 2015. 2

2015
[26]

3d u-net: learning dense volumetric segmentation from sparse annotation,

¨O. C ¸ ic ¸eket al., “3d u-net: learning dense volumetric segmentation from sparse annotation,” inMICCAI, 2016, pp. 424–432. 2

2016
[27]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778. 2

2016
[28]

Unext: Mlp-based rapid medical image segmentation network,

J. M. J. Valanarasuet al., “Unext: Mlp-based rapid medical image segmentation network,” inMICCAI. Springer, 2022. 2

2022
[29]

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

J. Chenet al., “Transunet: Transformers make strong encoders for medical image segmentation,”arXiv preprint arXiv:2102.04306, 2021. 2

work page internal anchor Pith review Pith/arXiv arXiv 2021
[30]

Transunet: Rethinking the u-net architecture de- sign for medical image segmentation through the lens of transformers,

J. Chen, J. Meiet al., “Transunet: Rethinking the u-net architecture de- sign for medical image segmentation through the lens of transformers,” Medical Image Analysis, vol. 97, p. 103280, 2024. 2

2024
[31]

nn- former: V olumetric medical image segmentation via a 3d transformer,

H.-Y . Zhou, J. Guo, Y . Zhang, X. Han, L. Yu, L. Wang, and Y . Yu, “nn- former: V olumetric medical image segmentation via a 3d transformer,” IEEE transactions on image processing, vol. 32, pp. 4036–4045, 2023. 2

2023
[32]

An image is worth 16x16 words: Transformers for image recognition at scale,

A. Dosovitskiyet al., “An image is worth 16x16 words: Transformers for image recognition at scale,” inICLR, 2021. 2

2021
[33]

Mamba-sea: A mamba-based framework with global-to-local sequence augmentation for generalizable medical image segmentation,

Z. Cheng, J. Guo, J. Zhang, L. Qi, L. Zhou, Y . Shi, and Y . Gao, “Mamba-sea: A mamba-based framework with global-to-local sequence augmentation for generalizable medical image segmentation,”IEEE Transactions on Medical Imaging, 2025. 2

2025
[34]

Switch-umamba: Dynamic scanning vision mamba unet for medical image segmentation,

Z. Zhanget al., “Switch-umamba: Dynamic scanning vision mamba unet for medical image segmentation,”Medical Image Analysis, p. 103792,
[35]

Segmamba-v2: Long-range sequential modeling mamba for general 3d medical image segmentation,

Z. Xinget al., “Segmamba-v2: Long-range sequential modeling mamba for general 3d medical image segmentation,”IEEE Transactions on Medical Imaging, 2025. 2

2025
[36]

Swin-umamba†: Adapting mamba-based vision foundation models for medical image segmentation,

J. Liu, H. Yang, H.-Y . Zhou, L. Yu, Y . Liang, Y . Yu, S. Zhang, H. Zheng, and S. Wang, “Swin-umamba†: Adapting mamba-based vision foundation models for medical image segmentation,”IEEE Transactions on Medical Imaging, vol. 44, no. 10, pp. 3898–3908, 2024. 2

2024
[37]

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

J. Ma, F. Li, and B. Wang, “U-mamba: Enhancing long-range dependency for biomedical image segmentation,”arXiv preprint arXiv:2401.04722, 2024. 2

work page internal anchor Pith review Pith/arXiv arXiv 2024
[38]

Segment anything,

A. Kirillovet al., “Segment anything,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4015–4026. 2

2023
[39]

Medical sam adapter: Adapting segment anything model for medical image segmentation,

J. Wuet al., “Medical sam adapter: Adapting segment anything model for medical image segmentation,”Medical image analysis, vol. 102, p. 103547, 2025. 2

2025
[40]

Sam-med2d.arXiv preprint arXiv:2308.16184,

J. Chenget al., “Sam-med2d,”arXiv preprint arXiv:2308.16184, 2023. 2, 9, 10

work page arXiv 2023
[41]

DINOv3

O. Sim ´eoni, H. V . V oet al., “Dinov3,”arXiv preprint arXiv:2508.10104,

work page internal anchor Pith review Pith/arXiv arXiv
[42]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840– 6851, 2020. 2

2020
[43]

Anysr: Realizing image super- resolution as any-scale, any-resource,

W. Zhan, M. Lin, C.-W. Lin, and R. Ji, “Anysr: Realizing image super- resolution as any-scale, any-resource,”IEEE transactions on image processing, vol. 33, pp. 6564–6578, 2024. 3

2024
[44]

From coarse to continuous: Progressive refinement implicit neural representation for motion-robust anisotropic mri reconstruction,

Z. Zhang, L. Zhang, Y . Cheng, Z. Wang, F. Wang, H. Zhang, Y . Yang, Y . Wu, J. Huang, A. I. Aviles-Rivero, Z. Gao, G. Yang, and P. J. Lally, “From coarse to continuous: Progressive refinement implicit neural representation for motion-robust anisotropic mri reconstruction,”IEEE transactions on image processing, vol. 35, pp. 3550–3565, 2026. 3

2026
[45]

Joanet: An integrated joint optimization architecture making medical image segmentation really helped by super-resolution pre-processing,

C.-H. Qiu, X.-S. Zhang, and Y .-J. Li, “Joanet: An integrated joint optimization architecture making medical image segmentation really helped by super-resolution pre-processing,”IEEE transactions on image processing, vol. 34, pp. 7008–7023, 2025. 3

2025
[46]

Synthseg: Segmentation of brain mri scans of any contrast and resolution without retraining,

B. Billotet al., “Synthseg: Segmentation of brain mri scans of any contrast and resolution without retraining,”Medical image analysis, vol. 86, p. 102789, 2023. 3, 6, 7

2023
[47]

Do we really need all these preprocessing steps in brain mri segmentation?

E. Kondrateva, P. Druzhinina, and A. Kurmukov, “Do we really need all these preprocessing steps in brain mri segmentation?” inMedical Imaging with Deep Learning, 2022. 3

2022
[48]

All you need is data preparation: A systematic review of image harmonization techniques in multi-center/device studies for medi- cal support systems,

S. Seoniet al., “All you need is data preparation: A systematic review of image harmonization techniques in multi-center/device studies for medi- cal support systems,”Computer Methods and Programs in Biomedicine, vol. 250, p. 108200, 2024. 3

2024
[49]

Computing large deformation metric mappings via geodesic flows of diffeomorphisms,

M. F. Beget al., “Computing large deformation metric mappings via geodesic flows of diffeomorphisms,”International journal of computer vision, vol. 61, no. 2, pp. 139–157, 2005. 3

2005
[50]

Flow Matching for Generative Modeling

Y . Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow matching for generative modeling,”arXiv preprint arXiv:2210.02747,

work page internal anchor Pith review Pith/arXiv arXiv
[51]

Benchmarking the cow with the topcow challenge: Topology-aware anatomical segmentation of the circle of willis for cta and mra,

K. Yanget al., “Benchmarking the cow with the topcow challenge: Topology-aware anatomical segmentation of the circle of willis for cta and mra,”ArXiv, pp. arXiv–2312, 2025. 5

2025
[52]

The multimodal brain tumor image segmentation benchmark (brats),

B. H. Menzeet al., “The multimodal brain tumor image segmentation benchmark (brats),”IEEE transactions on medical imaging, vol. 34, no. 10, pp. 1993–2024, 2014. 5

1993
[53]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

S. Bakaset al., “Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge,”arXiv preprint arXiv:1811.02629,

work page internal anchor Pith review Pith/arXiv arXiv
[54]

Lumbar spine segmentation in mr images: a dataset and a public benchmark,

J. W. van der Graaf, M. L. van Hooff, C. F. Buckens, M. Rutten, J. L. van Susante, R. J. Kroeze, M. de Kleuver, B. van Ginneken, and N. Lessmann, “Lumbar spine segmentation in mr images: a dataset and a public benchmark,”Scientific Data, vol. 11, no. 1, p. 264, 2024. 5

2024
[55]

Decoupled Weight Decay Regularization

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017. 5 JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2026 12

work page internal anchor Pith review Pith/arXiv arXiv 2017
[56]

Image quality metrics: Psnr vs. ssim,

A. Hore and D. Ziou, “Image quality metrics: Psnr vs. ssim,” in2010 20th international conference on pattern recognition. IEEE, 2010, pp. 2366–2369. 5

2010
[57]

Image quality assessment: from error visibility to structural similarity,

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,”IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004. 5

2004
[58] J. Kim, H. Yoon, G. Park, K. Kim, and E. Yang, “Data-efficient unsupervised interpolation without any intermediate frame for 4D medical images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 11 353–11 364.

[59] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in CVPR, 2018, pp. 586–595.

[60] S. Jain, D. Watson, E. Tabellion, B. Poole, J. Kontkanen et al., “Video interpolation with diffusion models,” in CVPR, 2024, pp. 7341–7351.

[61] X. You et al., “FB-Diff: Fourier basis-guided diffusion for temporal interpolation of 4D medical imaging,” in ICCV, 2025, pp. 28 010–28 020.

[62] F. Milletari et al., “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 565–571.

[63] D. Karimi and S. E. Salcudean, “Reducing the Hausdorff distance in medical image segmentation with convolutional neural networks,” TMI, vol. 39, no. 2, pp. 499–513, 2019.

[64] K. He et al., “Masked autoencoders are scalable vision learners,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16 000–16 009.

[65] M. Oquab, T. Darcet et al., “DINOv2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023.

[66] Y. He et al., “VISTA3D: A unified segmentation foundation model for 3D medical imaging,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 20 863–20 873.
