VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence
Recognition: 1 theorem link · Lean theorem
Pith reviewed 2026-05-14 19:30 UTC · model grok-4.3
The pith
A training-free fit-transform method creates reusable volumetric features from frozen 2D vision transformers for cross-modal voxel correspondence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VoxCor is a training-free fit-transform method. At fitting time, it combines triplanar ViT inference with a closed-form weighted partial least squares projection, fitted on voxel correspondences, to select modality-stable anatomical directions. At transform time, new volumes receive the same triplanar features followed by the fixed projection, and correspondences are then obtained by nearest-neighbor search. This yields improved performance in the hardest cross-subject, cross-modality settings and registration results competitive with handcrafted descriptors and learned 3D features.
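The triplanar step of the claim can be sketched as slicing a volume along its three anatomical axes, encoding each 2D slice, and stacking the per-axis features voxelwise. In this minimal illustration, `encode_slice` is a stand-in for the frozen 2D ViT (here just a fixed linear map, purely for shape bookkeeping), not the actual encoder:

```python
import numpy as np

def encode_slice(slice2d: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen 2D ViT: maps each pixel intensity to a
    d-dim feature via a fixed linear map (illustration only)."""
    return slice2d[..., None] * proj  # (H, W, d)

def triplanar_features(vol: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Encode axial, coronal, and sagittal slices of a (D, H, W) volume
    and concatenate the three per-voxel feature maps."""
    D, H, W = vol.shape
    d = proj.shape[0]
    feats = np.zeros((D, H, W, 3 * d))
    for z in range(D):                         # axial slices
        feats[z, :, :, :d] = encode_slice(vol[z], proj)
    for y in range(H):                         # coronal slices
        feats[:, y, :, d:2 * d] = encode_slice(vol[:, y], proj)
    for x in range(W):                         # sagittal slices
        feats[:, :, x, 2 * d:] = encode_slice(vol[:, :, x], proj)
    return feats
```

With a real encoder, each slice would pass through the frozen ViT; the point here is only that every voxel ends up with features from all three viewing directions.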
What carries the argument
The closed-form weighted partial least squares (WPLS) projection on triplanar ViT features, which uses fitting-time correspondences to identify modality-stable anatomical directions.
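The paper's exact WPLS formulation is not reproduced in this review, but a closed-form weighted-PLS-style fit can be sketched from the description: given paired per-voxel features `X` and `Y` from two modalities at fitting-time correspondences, with weights `w`, the top singular vectors of the weighted cross-covariance give directions along which the two modalities' features co-vary most strongly. All function and variable names below are illustrative assumptions:

```python
import numpy as np

def fit_wpls(X: np.ndarray, Y: np.ndarray, w: np.ndarray, k: int):
    """Closed-form weighted-PLS-style fit (a sketch, not the paper's
    exact method): SVD of the weighted cross-covariance between paired
    feature matrices yields top-k projection directions per modality."""
    Xc = X - np.average(X, axis=0, weights=w)
    Yc = Y - np.average(Y, axis=0, weights=w)
    C = (Xc * w[:, None]).T @ Yc / w.sum()   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k], Vt[:k].T                # one projection per modality

def transform(F: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Apply the fixed linear projection to new per-voxel features."""
    return F @ P
```

Because the fit is a single SVD, it is computed once offline; transform time is just the matrix product, which matches the "no per-volume optimization" property claimed for the method.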
If this is right
- Voxel correspondences on new volumes can be obtained directly by nearest-neighbor search without any registration step.
- Registration performance becomes competitive with handcrafted descriptors and learned 3D features.
- Encoder sensitivity decreases for dense correspondence transfer across modalities.
- The same features support downstream tasks such as voxelwise k-nearest-neighbor segmentation and segmentation-center landmark localization.
- The resulting representations serve as a reusable feature layer for multimodal analysis beyond single-pair registration.
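Several of the bullets above (direct nearest-neighbor correspondence, voxelwise k-nearest-neighbor segmentation) reduce to lookups in the projected feature space. A minimal brute-force sketch, with all names hypothetical and no claim to match the paper's implementation:

```python
import numpy as np

def nearest_correspondences(query: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """For each query voxel feature, return the index of the closest
    reference voxel feature (L2 distance, brute force)."""
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def knn_label_transfer(query, ref, ref_labels, k=3):
    """Voxelwise k-NN segmentation: majority vote over the labels of
    the k nearest reference voxels in feature space."""
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(-1)
    nn = d2.argsort(axis=1)[:, :k]
    votes = ref_labels[nn]
    return np.array([np.bincount(v).argmax() for v in votes])
```

A practical system would replace the dense distance matrix with an approximate nearest-neighbor index, but the logic is the same: once features are modality-stable, correspondence and segmentation are queries, not optimizations.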
Where Pith is reading between the lines
- The same fitting procedure could be repeated on other 2D foundation models to produce modality-stable 3D features without redesigning the projection step.
- Fitting correspondences from a wider range of anatomical sites might allow the method to handle previously unseen body regions with minimal extra data.
- Because no per-volume optimization occurs at test time, the approach could be inserted into real-time clinical pipelines that currently avoid learned features due to compute cost.
- Combining the projected features with classical intensity-based registration as a coarse-to-fine step might further reduce residual errors in difficult cross-subject cases.
Load-bearing premise
The modality-stable anatomical directions identified by the WPLS projection on fitting-time correspondences generalize to new volumes and unseen modality combinations without further adaptation.
What would settle it
A clear drop in nearest-neighbor correspondence accuracy or deformable registration Dice scores when the fitted projection is applied to a new cross-modality volume pair absent from the fitting correspondences.
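The falsification test above hinges on Dice scores. For reference, Dice overlap between two binary masks A and B is 2|A∩B| / (|A| + |B|); a minimal implementation:

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap between two binary segmentation masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0
```

A "clear drop" would mean this score falling well below the fitted-distribution baseline when the projection is applied to a modality pair absent from the fitting correspondences.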
read the original abstract
Cross-modal 3D medical image analysis requires voxelwise representations that remain anatomically consistent across imaging contrasts, scanners, and acquisition protocols. Recent work has shown that frozen 2D Vision Transformer (ViT) foundation models can support such representations, but typical pipelines extract features along a single anatomical axis and adapt those features inside a registration solver for one image pair at a time, leaving complementary viewing directions unused and producing representations that do not transfer to new volumes. We introduce VoxCor, a training-free fit-transform method for reusable volumetric feature representations from frozen 2D ViT foundation models. During an offline fitting phase, VoxCor combines triplanar ViT inference with a compact closed-form weighted partial least squares (WPLS) projection that uses fitting-time voxel correspondences to select modality-stable anatomical directions in the triplanar feature space. At transform time, new volumes are mapped by triplanar ViT inference and linear projection alone, without fine-tuning or registration. Voxel correspondences can then be queried directly by nearest-neighbor search. We evaluate VoxCor on intra-subject Abdomen MR-CT and inter-subject HCP T2w-T1w tasks using deformable registration, voxelwise k-nearest-neighbor segmentation, and segmentation-center landmark localization. VoxCor improves the hardest cross-subject, cross-modality transfer settings, reduces encoder sensitivity for dense correspondence transfer, and yields registration performance competitive with handcrafted descriptors and learned 3D features. This positions VoxCor as a reusable feature layer for downstream multimodal analysis beyond pairwise registration. Code, configuration files, and implementation details are publicly available on GitHub at https://github.com/guneytombak/VoxCor.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces VoxCor, a training-free fit-transform method that extracts reusable volumetric features from frozen 2D ViT foundation models via triplanar inference followed by a closed-form weighted partial least squares (WPLS) projection fitted once on voxel correspondences. These features support direct nearest-neighbor voxel correspondence across modalities and subjects without per-pair adaptation or fine-tuning. They are evaluated on intra-subject Abdomen MR-CT and inter-subject HCP T2w-T1w tasks for deformable registration, kNN segmentation, and landmark localization, with claimed gains in the hardest cross-subject, cross-modality settings, reduced encoder sensitivity, and competitive performance versus handcrafted and learned 3D descriptors.
Significance. If the WPLS-derived directions prove to generalize beyond the fitting distribution, VoxCor would supply a practical, reusable feature layer for multimodal 3D medical imaging that avoids task-specific training or per-pair solvers, simplifying pipelines for registration and dense correspondence. The training-free design and public code release are notable strengths for reproducibility.
major comments (3)
- [Abstract] The claims of performance improvements, reduced encoder sensitivity, and competitive registration results are stated without quantitative numbers, error bars, data-split details, baseline specifications, or subject counts for the fitting versus test phases, making it impossible to verify whether the data support the central claims.
- [Abstract] The abstract and evaluation description do not state whether the fitting set used to learn the WPLS projection is disjoint from the test volumes, or how many subjects are used for fitting; this is load-bearing for the claim that modality-stable directions generalize to new volumes and unseen modality combinations.
- [Method (WPLS)] Because the projection is fitted on external voxel correspondences from a fitting set, the selected directions may encode dataset-specific anatomical or acquisition biases rather than truly invariant features; without explicit held-out validation, this risks circularity in the 'training-free reusable feature' positioning.
minor comments (1)
- [Abstract] The GitHub link is given, but the main text could include a brief reproducibility checklist (exact ViT backbone, triplanar axis choices, and WPLS hyperparameters) to aid readers.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to improve clarity and support for the central claims.
read point-by-point responses
-
Referee: [Abstract] The claims of performance improvements, reduced encoder sensitivity, and competitive registration results are stated without quantitative numbers, error bars, data-split details, baseline specifications, or subject counts for the fitting versus test phases, making it impossible to verify whether the data support the central claims.
Authors: We agree that the abstract lacks the necessary quantitative support. In the revised manuscript we will insert specific performance metrics (e.g., Dice scores, landmark errors), standard deviations or error bars, baseline specifications, and explicit subject counts for the fitting versus test phases so that readers can directly assess the strength of the reported improvements. revision: yes
-
Referee: [Abstract] The abstract and evaluation description do not state whether the fitting set used to learn the WPLS projection is disjoint from the test volumes, or how many subjects are used for fitting; this is load-bearing for the claim that modality-stable directions generalize to new volumes and unseen modality combinations.
Authors: The fitting set is disjoint from all test volumes; the WPLS projection is learned once on a separate cohort (10 subjects for Abdomen, 20 subjects for HCP) and then applied without further adaptation. We will add these exact subject counts and an explicit statement of disjointness to both the abstract and the evaluation section to make the generalization claim verifiable. revision: yes
-
Referee: [Method (WPLS)] Because the projection is fitted on external voxel correspondences from a fitting set, the selected directions may encode dataset-specific anatomical or acquisition biases rather than truly invariant features; without explicit held-out validation, this risks circularity in the 'training-free reusable feature' positioning.
Authors: We acknowledge the risk of dataset-specific bias. To address it we will add a new held-out validation experiment in the revised manuscript that applies the fitted WPLS directions to completely unseen subjects and modality pairs (including cross-dataset transfer) and reports the resulting correspondence accuracy, thereby demonstrating that the selected directions capture modality-stable anatomical structure rather than fitting-set idiosyncrasies. revision: yes
Circularity Check
No circularity: closed-form WPLS fit on external correspondences yields independent transform-time features
full rationale
The derivation consists of an offline closed-form WPLS projection computed from externally supplied fitting-time voxel correspondences, followed by a linear transform applied unchanged to new volumes. No equation reduces the output to a redefinition of its own fitted parameters, no self-citation chain is load-bearing for the central claim, and no ansatz or uniqueness result is smuggled in. The reusability claim is therefore an empirical generalization statement rather than a definitional equivalence.
Axiom & Free-Parameter Ledger
free parameters (1)
- WPLS projection weights
axioms (1)
- domain assumption: Triplanar features from a frozen 2D ViT capture complementary anatomical information that can be linearly combined into modality-stable volumetric descriptors
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean; IndisputableMonolith/Foundation/AlexanderDuality.lean · reality_from_one_distinction; Jcost uniqueness · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
"VoxCor combines triplanar ViT inference with a compact closed-form weighted partial least squares (WPLS) projection that uses fitting-time voxel correspondences to select modality-stable anatomical directions"
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.