pith. machine review for the scientific record.

arxiv: 2605.12026 · v1 · submitted 2026-05-12 · 💻 cs.CV · cs.AI · eess.SP

Recognition: 2 Lean theorem links

Spectral Vision Transformer for Efficient Tokenization with Limited Data

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 06:45 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · eess.SP
keywords spectral vision transformer · efficient tokenization · medical imaging · limited data · vision transformer · spatial invariance · signal-to-noise ratio

The pith

A spectral vision transformer delivers comparable performance on medical images using fewer parameters than standard models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a vision transformer that tokenizes images through projection onto a spectral basis rather than dividing them into spatial patches. This design is argued to provide spatial invariance and an optimal signal-to-noise ratio while lowering the model's complexity. When tested on limited medical imaging datasets, both simulated and real, the resulting model performs as well as or better than conventional vision transformers, attention-based CNNs, and simpler baselines despite having fewer parameters. This efficiency is particularly relevant for medical applications where data scarcity is common and computational resources may be constrained. The authors support their approach with theoretical analysis of the basis properties and empirical comparisons across multiple model types.

Core claim

The central claim is that a vision transformer built on spectral projections instead of spatial patches can maintain high performance in data-limited regimes, particularly for medical images, while using fewer parameters than conventional vision transformers and competing architectures. The spectral basis is said to confer spatial invariance and maximize signal-to-noise ratio, reducing the complexity of the tokenization step. Empirical results across multiple datasets show accuracy comparable or superior to compact and standard vision transformers, convolutional networks with attention, shifted-window transformers, multi-layer perceptrons, and logistic regression.

What carries the argument

Spectral projection for tokenization: the step that replaces patch-based embedding by projecting the image onto a chosen spectral basis to generate tokens, claimed to provide spatial invariance and an optimal signal-to-noise ratio.
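The paper's exact basis and token layout are not specified in this summary, but the idea can be sketched: project the image onto a fixed spectral basis, keep the lowest-frequency coefficients, and group them into tokens. A minimal numpy sketch, assuming a Fourier basis and a low-frequency-first ordering; the function name and shapes are illustrative, not the authors' implementation:

```python
import numpy as np

def spectral_tokenize(img, n_tokens, token_dim):
    """Keep the lowest spatial frequencies of an image and group the
    retained coefficients into tokens (illustrative sketch only)."""
    F = np.fft.fftshift(np.fft.fft2(img))        # centred 2D spectrum
    h, w = F.shape
    cy, cx = h // 2, w // 2
    # distance of each coefficient from the zero-frequency centre
    yy, xx = np.mgrid[0:h, 0:w]
    dist = (yy - cy) ** 2 + (xx - cx) ** 2
    order = np.argsort(dist.ravel())             # low frequencies first
    k = n_tokens * token_dim // 2                # each complex value -> 2 reals
    kept = F.ravel()[order[:k]]
    feats = np.concatenate([kept.real, kept.imag])
    return feats.reshape(n_tokens, token_dim)

img = np.random.default_rng(0).standard_normal((16, 16))
tokens = spectral_tokenize(img, n_tokens=8, token_dim=16)
print(tokens.shape)  # (8, 16)
```

A DCT, wavelet, or learned basis would slot in the same way; only the projection line changes.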

Load-bearing premise

That the chosen spectral basis delivers the stated spatial invariance and optimal signal-to-noise ratio in practice on medical images without losing task-relevant information, and that the complexity reduction translates directly into the reported performance gains.

What would settle it

A direct comparison on a new unseen clinical dataset in which a standard vision transformer with matched parameter count outperforms the spectral version, or where the spectral representation is shown to discard high-frequency features needed for the task.
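The discard-high-frequency failure mode is easy to probe directly: truncate the spectrum of an image carrying fine texture and measure what survives. A hedged numpy sketch, where the synthetic image and the cutoff are illustrative stand-ins, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
# image = smooth background + fine high-frequency texture
# (a stand-in for a lesion boundary; purely illustrative)
ramp = np.sin(np.linspace(0, np.pi, 32))
low = np.outer(ramp, ramp)
texture = 0.2 * rng.standard_normal((32, 32))
img = low + texture

F = np.fft.fftshift(np.fft.fft2(img))
mask = np.zeros((32, 32), dtype=bool)
mask[12:20, 12:20] = True            # keep only an 8x8 low-frequency block
recon = np.fft.ifft2(np.fft.ifftshift(np.where(mask, F, 0))).real

# fraction of total spectral energy retained by the truncation
kept = np.sum(np.abs(F[mask]) ** 2) / np.sum(np.abs(F) ** 2)
print(f"energy kept: {kept:.2%}")    # high energy retention, texture lost
```

If classification on such data degrades under truncation while the spectral ViT does not, the low-pass worry is answered; if both degrade together, it is not.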

Figures

Figures reproduced from arXiv: 2605.12026 by Alexandra G. Roberts, Alexey V. Dimov, Brian H. Kopell, Dominick Romano, Heejong Kim, Jinwei Zhang, Ki Sueng Choi, Maneesh John, Mert R. Sabuncu, Mert Sisman, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang.

Figure 1. Our proposed spectral ViT primarily differs from conventional ("spatial") ViTs in the …
Figure 2. Spectral tokenization and PSRR reconstruction by desired basis property.
Figure 3. Simulated binary classification for network differentiation between pure noise (class 0, first …
Figure 4. Parameter-balanced spectral and spatial ViT pattern classification performance by sample …
Figure 5. Spectral and spatial ViT object detection classification between close (class 0) and far …
Figure 6. QSM and saliency maps, identifying iron in the substantia nigra previously linked to …
original abstract

We propose a novel spectral vision transformer architecture for efficient tokenization in limited data, with an emphasis on medical imaging. We outline convenient theoretical properties arising from the choice of basis including spatial invariance and optimal signal-to-noise ratio. We show reduced complexity arising from the spectral projection compared to spatial vision transformers. We show equitable or superior performance with a reduced number of parameters as compared to a variety of models including compact and standard vision transformers, convolutional neural networks with attention, shifted window transformers, multi-layer perceptrons, and logistic regression. We include simulated, public, and clinical data in our analysis and release our code at: github.com/agr78/spectralViT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a Spectral Vision Transformer (SpectralViT) for efficient tokenization under limited data regimes, with a focus on medical imaging. It claims that a chosen spectral basis yields spatial invariance and optimal signal-to-noise ratio, resulting in reduced model complexity relative to spatial ViTs. Empirically, it reports equitable or superior performance with fewer parameters than compact/standard vision transformers, CNNs with attention, shifted-window transformers (Swin), MLPs, and logistic regression, evaluated on simulated, public, and clinical datasets. Code is released at github.com/agr78/spectralViT.

Significance. If the central claims hold after verification, the work could provide a useful direction for parameter-efficient transformers in data-scarce medical imaging, where reduced complexity without loss of diagnostic performance would be valuable. The explicit code release is a positive factor supporting reproducibility.

major comments (3)
  1. [§3] §3 (Spectral Tokenization): the assertion that the spectral basis simultaneously achieves spatial invariance and optimal SNR while preserving task-critical high-frequency content (e.g., lesion boundaries and textures in medical images) lacks a concrete derivation or preservation proof; without this, the claimed complexity reduction cannot be shown to translate directly to the reported performance gains rather than acting as an unintended low-pass filter.
  2. [§4] §4 (Experiments): no ablation isolates the contribution of the spectral projection from confounding factors such as token count, attention depth, or regularization; the performance comparisons therefore do not establish that the spectral choice itself is responsible for equitable/superior results with fewer parameters.
  3. [Table 2] Table 2 (or equivalent results table): quantitative metrics are presented without error bars, statistical significance tests, or dataset-size details, undermining the ability to assess robustness of the “equitable or superior” claim across the simulated, public, and clinical splits.
minor comments (2)
  1. [Abstract] The abstract and §1 would benefit from a one-sentence statement of the exact spectral basis (e.g., Fourier, wavelet, or learned) to allow immediate assessment of the invariance and SNR claims.
  2. Ensure the released repository contains the precise preprocessing pipelines and hyper-parameter settings used for each baseline so that the parameter-count and performance comparisons can be reproduced exactly.
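The referee's SNR point rests on Parseval's relation, which is simple to verify numerically: a signal's energy equals the (1/N-normalized) energy of its discrete Fourier coefficients, so the energy lost by dropping small coefficients is exactly bounded. A self-contained pure-Python check on a toy signal:

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform, enough for a toy check."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

x = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
X = dft(x)

time_energy = sum(v * v for v in x)
freq_energy = sum(abs(c) ** 2 for c in X) / len(x)  # 1/N normalisation
print(abs(time_energy - freq_energy) < 1e-6)  # True
```

This verifies the energy identity only; the referee's sharper demand, that the retained coefficients carry the task-relevant part of that energy, is an empirical question the identity alone cannot settle.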

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below and will revise the manuscript to incorporate the suggested improvements.

point-by-point responses
  1. Referee: [§3] §3 (Spectral Tokenization): the assertion that the spectral basis simultaneously achieves spatial invariance and optimal SNR while preserving task-critical high-frequency content (e.g., lesion boundaries and textures in medical images) lacks a concrete derivation or preservation proof; without this, the claimed complexity reduction cannot be shown to translate directly to the reported performance gains rather than acting as an unintended low-pass filter.

    Authors: We appreciate the referee's observation that the theoretical justification in §3 would benefit from greater rigor. In the revised manuscript we will add an explicit derivation subsection that (i) proves spatial invariance via the translation-equivariance of the chosen spectral basis (Fourier or wavelet), (ii) shows optimal SNR through energy compaction and Parseval's relation, and (iii) demonstrates preservation of task-critical high-frequency content by analyzing the frequency response of the projection operator together with spectral plots of lesion boundaries on medical images. These additions will directly connect the basis properties to the observed complexity reduction and performance gains, ruling out an unintended low-pass effect. revision: yes

  2. Referee: [§4] §4 (Experiments): no ablation isolates the contribution of the spectral projection from confounding factors such as token count, attention depth, or regularization; the performance comparisons therefore do not establish that the spectral choice itself is responsible for equitable/superior results with fewer parameters.

    Authors: We agree that isolating the spectral projection's contribution is essential. In the revision we will insert a controlled ablation study in §4 that fixes token count (via equivalent patch sizing), attention depth (identical layer count), and regularization (same dropout and weight decay) while varying only the tokenization method. Direct head-to-head comparisons of spectral versus standard spatial tokenization on the same datasets will be reported, thereby attributing the parameter efficiency and performance to the spectral basis itself. revision: yes

  3. Referee: [Table 2] Table 2 (or equivalent results table): quantitative metrics are presented without error bars, statistical significance tests, or dataset-size details, undermining the ability to assess robustness of the “equitable or superior” claim across the simulated, public, and clinical splits.

    Authors: We acknowledge the need for statistical transparency. All result tables, including Table 2, will be updated to report mean ± standard deviation over multiple random seeds (minimum five runs), include p-values from paired statistical tests (t-test or Wilcoxon) against baselines, and explicitly list the number of samples in each simulated, public, and clinical split. These changes will allow readers to evaluate the robustness of the performance claims. revision: yes
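The promised reporting scheme, mean ± standard deviation over seeds plus a paired test, can be sketched in a few lines of standard-library Python; the per-seed accuracies below are hypothetical placeholders, not the paper's numbers:

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic over per-seed metrics of two models."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# hypothetical per-seed accuracies over five runs (illustrative only)
spectral = [0.91, 0.89, 0.92, 0.90, 0.93]
spatial  = [0.88, 0.87, 0.89, 0.88, 0.90]

print(f"spectral: {mean(spectral):.3f} ± {stdev(spectral):.4f}")
print(f"spatial:  {mean(spatial):.3f} ± {stdev(spatial):.4f}")
t = paired_t(spectral, spatial)
print(f"paired t statistic: {t:.2f}")
```

A Wilcoxon signed-rank test (e.g. scipy.stats.wilcoxon) is the non-parametric alternative when normality of the per-seed differences is doubtful, which the rebuttal also offers.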

Circularity Check

0 steps flagged

No circularity: architecture and claims rest on independent empirical validation

full rationale

The paper proposes a spectral vision transformer with claimed properties (spatial invariance, optimal SNR) arising from the basis choice, then reports reduced complexity and competitive performance versus external baselines (ViT, Swin, CNN-attention, MLP, logistic regression) on simulated/public/clinical medical datasets. No load-bearing step equates a prediction or first-principles result to its own inputs by construction, self-citation, or fitted renaming. The central claims are supported by direct experiments and code release rather than reducing to self-referential definitions or prior author work invoked as uniqueness theorems. This is the normal case of a self-contained empirical architecture paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard properties of spectral bases (Fourier-like transforms) assumed to confer spatial invariance and optimal SNR; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • Domain assumption: projection onto the chosen spectral basis yields spatial invariance and optimal signal-to-noise ratio.
    Stated in the abstract as convenient theoretical properties arising from the choice of basis.

pith-pipeline@v0.9.0 · 5463 in / 1115 out tokens · 72899 ms · 2026-05-13T06:45:56.231108+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 3 internal anchors
