pith. machine review for the scientific record.

arxiv: 2605.12026 · v1 · submitted 2026-05-12 · 💻 cs.CV · cs.AI · eess.SP

Recognition: 2 Lean theorem links

Spectral Vision Transformer for Efficient Tokenization with Limited Data

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 06:45 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · eess.SP
keywords spectral vision transformer · efficient tokenization · medical imaging · limited data · vision transformer · spatial invariance · signal-to-noise ratio

The pith

A spectral vision transformer delivers comparable performance on medical images using fewer parameters than standard models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a vision transformer that tokenizes images through projection onto a spectral basis rather than dividing them into spatial patches. This design is argued to provide spatial invariance and an optimal signal-to-noise ratio while lowering the model's complexity. When tested on limited medical imaging datasets, both simulated and real, the resulting model performs as well as or better than conventional vision transformers, attention-based CNNs, and simpler baselines despite having fewer parameters. This efficiency is particularly relevant for medical applications where data scarcity is common and computational resources may be constrained. The authors support their approach with theoretical analysis of the basis properties and empirical comparisons across multiple model types.

Core claim

The central claim is that a vision transformer built on spectral projections instead of spatial patches can maintain high performance in data-limited regimes, particularly for medical images, while using fewer parameters than conventional vision transformers and competing architectures. The spectral basis is said to confer spatial invariance and maximize signal-to-noise ratio, reducing the complexity of the tokenization step. Empirical results across multiple datasets show accuracy comparable or superior to compact and standard vision transformers, convolutional networks with attention, shifted-window transformers, multi-layer perceptrons, and logistic regression.

What carries the argument

Spectral projection for tokenization: the step that replaces patch-based embedding by projecting the image onto a chosen spectral basis to generate tokens, claimed to provide spatial invariance and an optimal signal-to-noise ratio.
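The paper's exact basis and token layout are not specified in this summary, but the idea can be sketched: project the image onto a fixed spectral basis, keep the lowest-frequency coefficients, and group them into tokens. A minimal numpy sketch, assuming a Fourier basis and a low-frequency-first ordering; the function name and shapes are illustrative, not the authors' implementation:

```python
import numpy as np

def spectral_tokenize(img, n_tokens, token_dim):
    """Keep the lowest spatial frequencies of an image and group the
    retained coefficients into tokens (illustrative sketch only)."""
    F = np.fft.fftshift(np.fft.fft2(img))        # centred 2D spectrum
    h, w = F.shape
    cy, cx = h // 2, w // 2
    # distance of each coefficient from the zero-frequency centre
    yy, xx = np.mgrid[0:h, 0:w]
    dist = (yy - cy) ** 2 + (xx - cx) ** 2
    order = np.argsort(dist.ravel())             # low frequencies first
    k = n_tokens * token_dim // 2                # each complex value -> 2 reals
    kept = F.ravel()[order[:k]]
    feats = np.concatenate([kept.real, kept.imag])
    return feats.reshape(n_tokens, token_dim)

img = np.random.default_rng(0).standard_normal((16, 16))
tokens = spectral_tokenize(img, n_tokens=8, token_dim=16)
print(tokens.shape)  # (8, 16)
```

A DCT, wavelet, or learned basis would slot in the same way; only the projection line changes.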

Load-bearing premise

That the chosen spectral basis delivers the stated spatial invariance and optimal signal-to-noise ratio in practice on medical images without losing task-relevant information, and that the complexity reduction translates directly into the reported performance gains.

What would settle it

A direct comparison on a new unseen clinical dataset in which a standard vision transformer with matched parameter count outperforms the spectral version, or where the spectral representation is shown to discard high-frequency features needed for the task.
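The discard-high-frequency failure mode is easy to probe directly: truncate the spectrum of an image carrying fine texture and measure what survives. A hedged numpy sketch, where the synthetic image and the cutoff are illustrative stand-ins, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
# image = smooth background + fine high-frequency texture
# (a stand-in for a lesion boundary; purely illustrative)
ramp = np.sin(np.linspace(0, np.pi, 32))
low = np.outer(ramp, ramp)
texture = 0.2 * rng.standard_normal((32, 32))
img = low + texture

F = np.fft.fftshift(np.fft.fft2(img))
mask = np.zeros((32, 32), dtype=bool)
mask[12:20, 12:20] = True            # keep only an 8x8 low-frequency block
recon = np.fft.ifft2(np.fft.ifftshift(np.where(mask, F, 0))).real

# fraction of total spectral energy retained by the truncation
kept = np.sum(np.abs(F[mask]) ** 2) / np.sum(np.abs(F) ** 2)
print(f"energy kept: {kept:.2%}")    # high energy retention, texture lost
```

If classification on such data degrades under truncation while the spectral ViT does not, the low-pass worry is answered; if both degrade together, it is not.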

Figures

Figures reproduced from arXiv: 2605.12026 by Alexandra G. Roberts, Alexey V. Dimov, Brian H. Kopell, Dominick Romano, Heejong Kim, Jinwei Zhang, Ki Sueng Choi, Maneesh John, Mert R. Sabuncu, Mert Sisman, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang.

Figure 1. Our proposed spectral ViT primarily differs from conventional ("spatial") ViTs in the …
Figure 2. Spectral tokenization and PSRR reconstruction by desired basis property.
Figure 3. Simulated binary classification for network differentiation between pure noise (class 0, first …
Figure 4. Parameter-balanced spectral and spatial ViT pattern classification performance by sample …
Figure 5. Spectral and spatial ViT object detection classification between close (class 0) and far …
Figure 6. QSM and saliency maps, identifying iron in the substantia nigra previously linked to …
original abstract

We propose a novel spectral vision transformer architecture for efficient tokenization in limited data, with an emphasis on medical imaging. We outline convenient theoretical properties arising from the choice of basis including spatial invariance and optimal signal-to-noise ratio. We show reduced complexity arising from the spectral projection compared to spatial vision transformers. We show equitable or superior performance with a reduced number of parameters as compared to a variety of models including compact and standard vision transformers, convolutional neural networks with attention, shifted window transformers, multi-layer perceptrons, and logistic regression. We include simulated, public, and clinical data in our analysis and release our code at: github.com/agr78/spectralViT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a Spectral Vision Transformer (SpectralViT) for efficient tokenization under limited data regimes, with a focus on medical imaging. It claims that a chosen spectral basis yields spatial invariance and optimal signal-to-noise ratio, resulting in reduced model complexity relative to spatial ViTs. Empirically, it reports equitable or superior performance with fewer parameters than compact/standard vision transformers, CNNs with attention, shifted-window transformers (Swin), MLPs, and logistic regression, evaluated on simulated, public, and clinical datasets. Code is released at github.com/agr78/spectralViT.

Significance. If the central claims hold after verification, the work could provide a useful direction for parameter-efficient transformers in data-scarce medical imaging, where reduced complexity without loss of diagnostic performance would be valuable. The explicit code release is a positive factor supporting reproducibility.

major comments (3)
  1. [§3] §3 (Spectral Tokenization): the assertion that the spectral basis simultaneously achieves spatial invariance and optimal SNR while preserving task-critical high-frequency content (e.g., lesion boundaries and textures in medical images) lacks a concrete derivation or preservation proof; without this, the claimed complexity reduction cannot be shown to translate directly to the reported performance gains rather than acting as an unintended low-pass filter.
  2. [§4] §4 (Experiments): no ablation isolates the contribution of the spectral projection from confounding factors such as token count, attention depth, or regularization; the performance comparisons therefore do not establish that the spectral choice itself is responsible for equitable/superior results with fewer parameters.
  3. [Table 2] Table 2 (or equivalent results table): quantitative metrics are presented without error bars, statistical significance tests, or dataset-size details, undermining the ability to assess robustness of the “equitable or superior” claim across the simulated, public, and clinical splits.
minor comments (2)
  1. [Abstract] The abstract and §1 would benefit from a one-sentence statement of the exact spectral basis (e.g., Fourier, wavelet, or learned) to allow immediate assessment of the invariance and SNR claims.
  2. Ensure the released repository contains the precise preprocessing pipelines and hyper-parameter settings used for each baseline so that the parameter-count and performance comparisons can be reproduced exactly.
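The referee's SNR point rests on Parseval's relation, which is simple to verify numerically: a signal's energy equals the (1/N-normalized) energy of its discrete Fourier coefficients, so the energy lost by dropping small coefficients is exactly bounded. A self-contained pure-Python check on a toy signal:

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform, enough for a toy check."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

x = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
X = dft(x)

time_energy = sum(v * v for v in x)
freq_energy = sum(abs(c) ** 2 for c in X) / len(x)  # 1/N normalisation
print(abs(time_energy - freq_energy) < 1e-6)  # True
```

This verifies the energy identity only; the referee's sharper demand, that the retained coefficients carry the task-relevant part of that energy, is an empirical question the identity alone cannot settle.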

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below and will revise the manuscript to incorporate the suggested improvements.

point-by-point responses
  1. Referee: [§3] §3 (Spectral Tokenization): the assertion that the spectral basis simultaneously achieves spatial invariance and optimal SNR while preserving task-critical high-frequency content (e.g., lesion boundaries and textures in medical images) lacks a concrete derivation or preservation proof; without this, the claimed complexity reduction cannot be shown to translate directly to the reported performance gains rather than acting as an unintended low-pass filter.

    Authors: We appreciate the referee's observation that the theoretical justification in §3 would benefit from greater rigor. In the revised manuscript we will add an explicit derivation subsection that (i) proves spatial invariance via the translation-equivariance of the chosen spectral basis (Fourier or wavelet), (ii) shows optimal SNR through energy compaction and Parseval's relation, and (iii) demonstrates preservation of task-critical high-frequency content by analyzing the frequency response of the projection operator together with spectral plots of lesion boundaries on medical images. These additions will directly connect the basis properties to the observed complexity reduction and performance gains, ruling out an unintended low-pass effect. revision: yes

  2. Referee: [§4] §4 (Experiments): no ablation isolates the contribution of the spectral projection from confounding factors such as token count, attention depth, or regularization; the performance comparisons therefore do not establish that the spectral choice itself is responsible for equitable/superior results with fewer parameters.

    Authors: We agree that isolating the spectral projection's contribution is essential. In the revision we will insert a controlled ablation study in §4 that fixes token count (via equivalent patch sizing), attention depth (identical layer count), and regularization (same dropout and weight decay) while varying only the tokenization method. Direct head-to-head comparisons of spectral versus standard spatial tokenization on the same datasets will be reported, thereby attributing the parameter efficiency and performance to the spectral basis itself. revision: yes

  3. Referee: [Table 2] Table 2 (or equivalent results table): quantitative metrics are presented without error bars, statistical significance tests, or dataset-size details, undermining the ability to assess robustness of the “equitable or superior” claim across the simulated, public, and clinical splits.

    Authors: We acknowledge the need for statistical transparency. All result tables, including Table 2, will be updated to report mean ± standard deviation over multiple random seeds (minimum five runs), include p-values from paired statistical tests (t-test or Wilcoxon) against baselines, and explicitly list the number of samples in each simulated, public, and clinical split. These changes will allow readers to evaluate the robustness of the performance claims. revision: yes
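The promised reporting scheme, mean ± standard deviation over seeds plus a paired test, can be sketched in a few lines of standard-library Python; the per-seed accuracies below are hypothetical placeholders, not the paper's numbers:

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic over per-seed metrics of two models."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# hypothetical per-seed accuracies over five runs (illustrative only)
spectral = [0.91, 0.89, 0.92, 0.90, 0.93]
spatial  = [0.88, 0.87, 0.89, 0.88, 0.90]

print(f"spectral: {mean(spectral):.3f} ± {stdev(spectral):.4f}")
print(f"spatial:  {mean(spatial):.3f} ± {stdev(spatial):.4f}")
t = paired_t(spectral, spatial)
print(f"paired t statistic: {t:.2f}")
```

A Wilcoxon signed-rank test (e.g. scipy.stats.wilcoxon) is the non-parametric alternative when normality of the per-seed differences is doubtful, which the rebuttal also offers.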

Circularity Check

0 steps flagged

No circularity: architecture and claims rest on independent empirical validation

full rationale

The paper proposes a spectral vision transformer with claimed properties (spatial invariance, optimal SNR) arising from the basis choice, then reports reduced complexity and competitive performance versus external baselines (ViT, Swin, CNN-attention, MLP, logistic regression) on simulated/public/clinical medical datasets. No load-bearing step equates a prediction or first-principles result to its own inputs by construction, self-citation, or fitted renaming. The central claims are supported by direct experiments and code release rather than reducing to self-referential definitions or prior author work invoked as uniqueness theorems. This is the normal case of a self-contained empirical architecture paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard properties of spectral bases (Fourier-like transforms) assumed to confer spatial invariance and optimal SNR; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • Domain assumption: projection onto the chosen spectral basis yields spatial invariance and optimal signal-to-noise ratio.
    Stated in the abstract as convenient theoretical properties arising from the choice of basis.

pith-pipeline@v0.9.0 · 5463 in / 1115 out tokens · 72899 ms · 2026-05-13T06:45:56.231108+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 3 internal anchors
