pith. sign in

arxiv: 2510.13768 · v2 · submitted 2025-10-15 · 💻 cs.CV · cs.AI· q-bio.NC

Scaling Vision Transformers for Functional MRI with Flat Maps

Pith reviewed 2026-05-18 06:58 UTC · model grok-4.3

classification 💻 cs.CV cs.AIq-bio.NC
keywords functional MRIVision Transformermasked autoencodercortical flat mapsfoundation modelscognitive state decodingscaling lawsBrainmarks
0
0 comments X

The pith

Converting fMRI volumes to 2D cortical flat maps lets Vision Transformers scale and outperform prior models on cognitive state decoding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that a simple projection of 3D fMRI volumes onto 2D cortical flat maps makes it practical to train large Vision Transformer models with masked autoencoder pretraining on thousands of hours of brain data. A sympathetic reader would care because this could open the way to reusable foundation models that improve prediction of brain states without needing task-specific architectures for every new experiment. The authors build the CortexMAE family, compare flat maps against volume and parcellation inputs, and run the first systematic scaling study on fMRI. They release an evaluation suite called Brainmarks that reveals flat maps usually win, scaling follows power laws up to limits, trait prediction stays hard for all models, and cognitive decoding sees large gains for the new models.

Core claim

The central claim is that cortical flat map projections turn each 3D fMRI volume into a 2D image that a standard Vision Transformer can process directly inside a masked autoencoder framework. Trained on 2.1K hours of open data, the resulting CortexMAE models exhibit strict power-law scaling with data size and deliver substantially higher accuracy on cognitive state decoding benchmarks than earlier fMRI models. In contrast, the same models show no clear advantage on subject-level trait prediction, where even a simple functional connectivity baseline remains competitive.

What carries the argument

Cortical flat map projection that converts each 3D fMRI volume into a 2D map for direct input to a Vision Transformer trained under the masked autoencoder objective.

If this is right

  • Flat maps generally outperform both parcellation and direct volume representations across the tested tasks.
  • Model performance follows a strict power law with increasing training data, subject to observable limits.
  • CortexMAE models achieve large gains over prior methods specifically on cognitive state decoding.
  • No single architecture dominates subject-level trait prediction, and all models remain close to a functional connectivity baseline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The performance gap between trait and state tasks suggests self-supervised pretraining on fMRI may be better suited to capturing transient activity patterns than stable individual traits.
  • Observed scaling limits imply that further gains will require changes in architecture or data quality rather than data volume alone.
  • If flat maps transfer well to other 3D brain imaging modalities, the same projection step could simplify transformer use across neuroimaging.

Load-bearing premise

Cortical flat-map projections preserve the spatial and temporal information needed for trait prediction and cognitive state decoding without introducing systematic distortions that would invalidate comparisons to volume-based or parcellation baselines.

What would settle it

A controlled experiment in which a volume-based or parcellation-based transformer matches or exceeds CortexMAE accuracy on the same cognitive state decoding tasks would falsify the claimed advantage of flat maps.

Figures

Figures reproduced from arXiv: 2510.13768 by Benjamin Warner, Cesar Kadir Torrico Villanueva, Connor Lane, Daniel Z. Kaplan, Debojyoti Das, Gianfranco Cort\'es, Leema Krishna Murali, Manish Ram, Mihir Tripathy, Paul S. Scotti, Ratna Sagari Grandhi, Sam Gijsen, Shamus Sim Zi Yang, Suin Cho, Tanishq Mathew Abraham, Utkarsh Kumar Singh, Will Beddow, Yuxiang Wei.

Figure 1
Figure 1. Figure 1: Our flat map MAE (fm-MAE) architecture. Surface-mapped fMRI activity patterns are [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Visualization of MAE predictions. Within each panel of [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: fMRI modeling performance scales with dataset size. The model is a ViT-B trained on [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Downstream decoding perfor￾mance as a function of dataset size (left col￾umn), model size (middle column), and tem￾poral patch size pt (right column). Smaller temporal patch size corresponds to larger effective sequence length (tokens per input = 364·16/pt). Black dashes indicate perfor￾mance on independent validation sets used for classifier parameter tuning. the 88.6M (ViT-B) model, despite 7× fewer para… view at source ↗
Figure 5
Figure 5. Figure 5: 4D fMRI time series are first preprocessed using standard methods [ [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Schaefer 400 parcellation [22] with Yeo resting-state networks [26] on the cortical surface and flat map. Relaxation cuts required for flat map transformation [30] are marked in white. B.2 Pretraining implementation details We pretrain for 625K steps using AdamW (β1 = 0.9, β2 = 0.95) [64] with a batch size of 32, learning rate of 1.25e-4 (base learning rate 1e-3 scaled by batch_size / 256), and weight deca… view at source ↗
Figure 7
Figure 7. Figure 7: Example NSD images with CLIP assigned labels. [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Additional MAE predictions for randomly sampled data. [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗
read the original abstract

We study the problem of training self-supervised foundation models for functional MRI. Our main contributions are: (1) we introduce a new model family (CortexMAE) trained using the masked autoencoder framework on 2.1K hours of open fMRI data, and (2) we release the first open evaluation suite (Brainmarks) for fMRI foundation models. Our core innovation is simple: we adapt the Vision Transformer to fMRI by first converting each 3D fMRI volume to a 2D map using a cortical flat map projection. We directly compare flat maps to both parcellation and volume-based representations. While each has its advantages, flat maps generally perform best. We perform the first systematic scaling analysis for fMRI and observe strict power law scaling, albeit with limits. Finally, we use Brainmarks to do controlled benchmark comparisons. On subject-level trait prediction, we report a challenging null result: no single model achieves clear state-of-the-art performance. Moreover, all models struggle to outperform a simple functional connectivity baseline. On cognitive state decoding, we observe more robust performance, and in this setting our CortexMAE family outperforms prior models by a large margin. Code, models, and datasets are available at https://github.com/MedARC-AI/CortexMAE and https://github.com/MedARC-AI/Brainmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CortexMAE, a family of masked autoencoder models based on Vision Transformers adapted to fMRI via cortical flat-map projections of 3D volumes. Trained self-supervised on 2.1K hours of open data, the work releases the Brainmarks evaluation suite and performs direct comparisons of flat-map, parcellation, and volume representations. It reports strict power-law scaling (with limits), a null result on subject-level trait prediction (no model clearly outperforms a functional connectivity baseline), and substantially stronger performance on cognitive state decoding where CortexMAE outperforms prior models by a large margin. Code, models, and datasets are released.

Significance. If the central empirical claims hold, this provides a useful open foundation-model baseline and scaling analysis for fMRI, together with a new benchmark suite. The explicit release of code, models, and data, plus the honest reporting of the null result on trait prediction, are clear strengths that facilitate independent verification. The flat-map approach could influence representation choices in neuroimaging if shown to preserve functional structure without systematic bias relative to volume or parcellation inputs.

major comments (2)
  1. [§4 and Table 2] §4 (Experimental Comparisons) and Table 2: the headline ranking that flat maps generally perform best, and the large-margin win on cognitive state decoding, rests on the untested assumption that the 3D-to-2D cortical projection retains the spatial structure exploited by the ViT without introducing distortions that favor one input format. No voxel-wise reconstruction error, subcortical signal retention statistics, or representational similarity analysis between flat-map, volume, and parcellation inputs is reported; this is load-bearing for the claim that the observed ordering reflects architectural or representational advantage rather than unequal information fidelity.
  2. [Methods and Results] Methods and Results sections: performance numbers for both trait prediction and cognitive decoding lack error bars, standard deviations across random seeds or cross-validation folds, and explicit statements of the exact train/validation/test splits used. Without these, the robustness of the 'large margin' outperformance and the null result cannot be assessed at the level required for a foundation-model benchmark paper.
minor comments (2)
  1. [Figure 3] Figure 3 (scaling curves): axis labels and legend entries are too small for readability; consider increasing font size and adding a panel showing the fitted power-law exponents with confidence intervals.
  2. [§5] The abstract states 'strict power law scaling, albeit with limits' but the main text does not define the quantitative criterion used to identify the 'limits'; a short clarification would help readers interpret the scaling regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We appreciate the focus on validating the representational fidelity of our flat-map approach and on improving the statistical reporting of our results. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§4 and Table 2] §4 (Experimental Comparisons) and Table 2: the headline ranking that flat maps generally perform best, and the large-margin win on cognitive state decoding, rests on the untested assumption that the 3D-to-2D cortical projection retains the spatial structure exploited by the ViT without introducing distortions that favor one input format. No voxel-wise reconstruction error, subcortical signal retention statistics, or representational similarity analysis between flat-map, volume, and parcellation inputs is reported; this is load-bearing for the claim that the observed ordering reflects architectural or representational advantage rather than unequal information fidelity.

    Authors: We agree that the absence of direct fidelity checks on the 3D-to-2D projection is a limitation that weakens the interpretation of why flat maps outperform the alternatives. Although cortical flat mapping is a standard preprocessing step in the field (as implemented in FreeSurfer and used in the HCP), we did not report quantitative comparisons of information preservation across formats. In the revised version we will add voxel-wise reconstruction error between original volumes and projected maps, subcortical signal retention statistics, and representational similarity analysis (RSA) comparing flat-map, volume, and parcellation inputs. These additions will allow readers to evaluate whether the performance ordering reflects genuine representational advantages rather than differential information loss. revision: yes

  2. Referee: [Methods and Results] Methods and Results sections: performance numbers for both trait prediction and cognitive decoding lack error bars, standard deviations across random seeds or cross-validation folds, and explicit statements of the exact train/validation/test splits used. Without these, the robustness of the 'large margin' outperformance and the null result cannot be assessed at the level required for a foundation-model benchmark paper.

    Authors: We acknowledge that the original submission reported only point estimates without measures of variability or precise split details, which is insufficient for a benchmark paper. We will revise the Methods section to document the exact train/validation/test splits (including subject-wise partitioning to avoid leakage) and will update all performance tables and figures in Results to include error bars corresponding to standard deviation across random seeds and cross-validation folds. These changes will make both the large-margin gains on cognitive decoding and the null result on trait prediction more rigorously interpretable. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical training, scaling observations, and held-out benchmarks

full rationale

The paper reports an empirical study: CortexMAE models are trained via masked autoencoding on 2.1K hours of fMRI data after converting volumes to 2D cortical flat maps, then evaluated on the released Brainmarks suite for trait prediction and cognitive state decoding. All performance claims, scaling observations (strict power laws with limits), and comparisons to parcellation/volume baselines arise directly from training runs and held-out test metrics. No mathematical derivations, first-principles results, fitted parameters renamed as predictions, or load-bearing self-citations appear in the described contributions. The work is self-contained against external benchmarks because code, models, and datasets are released for independent reproduction, and results are falsifiable via direct replication on the same splits rather than reducing to any internal definition or prior author result by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the domain assumption that masked autoencoding transfers usefully from natural images to flat-map fMRI and that the chosen open fMRI corpus is representative enough for scaling laws to emerge.

axioms (1)
  • domain assumption Masked autoencoder pretraining on flat-map fMRI yields representations that generalize to downstream decoding tasks.
    Invoked when claiming outperformance on cognitive state decoding.

pith-pipeline@v0.9.0 · 5860 in / 1333 out tokens · 37465 ms · 2026-05-18T06:58:07.501063+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

    cs.LG 2026-04 unverdicted novelty 6.0

    A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · cited by 1 Pith paper · 6 internal anchors

  1. [1]

    Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience.Neuron, 85(1):11–26, 2015

    John DE Gabrieli, Satrajit S Ghosh, and Susan Whitfield-Gabrieli. Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience.Neuron, 85(1):11–26, 2015

  2. [2]

    Building better biomarkers: brain models in translational neuroimaging.Nature neuroscience, 20(3):365–377, 2017

    Choong-Wan Woo, Luke J Chang, Martin A Lindquist, and Tor D Wager. Building better biomarkers: brain models in translational neuroimaging.Nature neuroscience, 20(3):365–377, 2017

  3. [3]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani et al. On the opportunities and risks of foundation models.arXiv preprint arXiv:2108.07258, 2021

  4. [4]

    A foundation model for generalizable disease detection from retinal images.Nature, 622(7981):156–163, 2023

    Yukun Zhou, Mark A Chia, Siegfried K Wagner, Murat S Ayhan, Dominic J Williamson, Robbert R Struyven, Timing Liu, Moucheng Xu, Mateo G Lozano, Peter Woodward-Court, et al. A foundation model for generalizable disease detection from retinal images.Nature, 622(7981):156–163, 2023

  5. [5]

    A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

    Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, et al. A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

  6. [6]

    A foundation model for the earth system.Nature, pages 1–8, 2025

    Cristian Bodnar, Wessel P Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A Weyn, Haiyu Dong, et al. A foundation model for the earth system.Nature, pages 1–8, 2025

  7. [7]

    Foundation model of neural activity predicts response to new stimulus types.Nature, 640(8058):470–477, 2025

    Eric Y Wang, Paul G Fahey, Zhuokun Ding, Stelios Papadopoulos, Kayla Ponder, Marissa A Weis, Andersen Chang, Taliah Muhammad, Saumil Patel, Zhiwei Ding, et al. Foundation model of neural activity predicts response to new stimulus types.Nature, 640(8058):470–477, 2025

  8. [8]

    Bert: Pre-training of deep bidi- rectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidi- rectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  9. [9]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

  10. [10]

    wav2vec 2.0: A framework for self-supervised learning of speech representations.Advances in neural information processing systems, 33: 12449–12460, 2020

    Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations.Advances in neural information processing systems, 33: 12449–12460, 2020

  11. [11]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022

  12. [12]

    Brain network transformer

    Xuan Kan, Wei Dai, Hejie Cui, Zilong Zhang, Ying Guo, and Carl Yang. Brain network transformer. Advances in Neural Information Processing Systems, 35:25586–25599, 2022

  13. [13]

    Self-supervised learning of brain dynamics from broad neuroimaging data.Advances in neural information processing systems, 35:21255–21269, 2022

    Armin Thomas, Christopher Ré, and Russell Poldrack. Self-supervised learning of brain dynamics from broad neuroimaging data.Advances in neural information processing systems, 35:21255–21269, 2022

  14. [14]

    Self-supervised transformers for fmri representation

    Itzik Malkiel, Gony Rosenman, Lior Wolf, and Talma Hendler. Self-supervised transformers for fmri representation. InInternational Conference on Medical Imaging with Deep Learning, pages 895–913. PMLR, 2022. 6

  15. [15]

    Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding

    Zijiao Chen, Jiaxin Qing, Tiange Xiang, Wan Lin Yue, and Juan Helen Zhou. Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22710–22720, 2023

  16. [16]

    Swift: Swin 4d fmri transformer.Advances in Neural Information Processing Systems, 36:42015–42037, 2023

    Peter Kim, Junbeom Kwon, Sunghwan Joo, Sangyoon Bae, Donggyu Lee, Yoonho Jung, Shinjae Yoo, Jiook Cha, and Taesup Moon. Swift: Swin 4d fmri transformer.Advances in Neural Information Processing Systems, 36:42015–42037, 2023

  17. [17]

    BrainLM: A foundation model for brain activity recordings

    Josue Ortega Caro, Antonio Henrique de Oliveira Fonseca, Syed A Rizvi, Matteo Rosati, Christopher Averill, James L Cross, Prateek Mittal, Emanuele Zappala, Rahul Madhav Dhodapkar, Chadi Abdallah, and David van Dijk. BrainLM: A foundation model for brain activity recordings. InThe Twelfth International Conference on Learning Representations, 2024. URL http...

  18. [18]

    Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking.Advances in Neural Information Processing Systems, 37:86048–86073, 2024

    Zijian Dong, Ruilin Li, Yilei Wu, Thuan Tinh Nguyen, Joanna Chong, Fang Ji, Nathanael Tong, Christopher Chen, and Juan Helen Zhou. Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking.Advances in Neural Information Processing Systems, 37:86048–86073, 2024

  19. [19]

    General-purpose brain foundation models for time-series neuroimaging data

    Mohammad Javad Darvishi Bayazi, Hena Ghonia, Roland Riachi, Bruno Aristimunha, Arian Khorasani, Md Rifat Arefin, Amin Darabi, Guillaume Dumas, and Irina Rish. General-purpose brain foundation models for time-series neuroimaging data. InNeurIPS Workshop on Time Series in the Age of Large Models, 2024. URLhttps://openreview.net/forum?id=HwDQH0r37I

  20. [20]

    arXiv preprint arXiv:2506.11167 , year=

    Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Changbae Bang, Lin Zhao, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, et al. Towards a general-purpose foundation model for fmri analysis.arXiv preprint arXiv:2506.11167, 2025

  21. [21]

    A unified, scalable framework for neural population decoding.Advances in Neural Information Processing Systems, 36: 44937–44956, 2023

    Mehdi Azabou, Vinam Arora, Venkataramana Ganesh, Ximeng Mao, Santosh Nachimuthu, Michael Mendelson, Blake Richards, Matthew Perich, Guillaume Lajoie, and Eva Dyer. A unified, scalable framework for neural population decoding.Advances in Neural Information Processing Systems, 36: 44937–44956, 2023

  22. [22]

    Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri.Cerebral cortex, 28(9):3095–3114, 2018

    Alexander Schaefer, Ru Kong, Evan M Gordon, Timothy O Laumann, Xi-Nian Zuo, Avram J Holmes, Simon B Eickhoff, and BT Thomas Yeo. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri.Cerebral cortex, 28(9):3095–3114, 2018

  23. [23]

    Fine-grain atlases of functional modes for fmri analysis.NeuroImage, 221:117126, 2020

    Kamalaker Dadi, Gaël Varoquaux, Antonia Machlouzarides-Shalit, Krzysztof J Gorgolewski, Demian Wassermann, Bertrand Thirion, and Arthur Mensch. Fine-grain atlases of functional modes for fmri analysis.NeuroImage, 221:117126, 2020

  24. [24]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021

  25. [25]

    The human connectome: a structural description of the human brain.PLoS computational biology, 1(4):e42, 2005

    Olaf Sporns, Giulio Tononi, and Rolf Kötter. The human connectome: a structural description of the human brain.PLoS computational biology, 1(4):e42, 2005

  26. [26]

    The organization of the human cerebral cortex estimated by intrinsic functional connectivity.Journal of neurophysiology, 2011

    BT Thomas Yeo, Fenna M Krienen, Jorge Sepulcre, Mert R Sabuncu, Danial Lashkari, Marisa Hollinshead, Joshua L Roffman, Jordan W Smoller, Lilla Zöllei, Jonathan R Polimeni, et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity.Journal of neurophysiology, 2011

  27. [27]

    Geometric constraints on human brain function.Nature, 618(7965): 566–574, 2023

    James C Pang, Kevin M Aquino, Marianne Oldehinkel, Peter A Robinson, Ben D Fulcher, Michael Breakspear, and Alex Fornito. Geometric constraints on human brain function.Nature, 618(7965): 566–574, 2023

  28. [28]

    The bitter lesson.Incomplete Ideas (blog), 13(1):38, 2019

    Richard Sutton. The bitter lesson.Incomplete Ideas (blog), 13(1):38, 2019

  29. [29]

    Stanford cs25: V4

    Hyung Won Chung. Stanford cs25: V4. https://youtu.be/3gb-ZkVRemQ?si=7FXnklTS9X3FCuv1,

  30. [30]

    YouTube video, Stanford University

  31. [31]

    Pycortex: an interactive surface visualizer for fmri.Frontiers in neuroinformatics, 9:23, 2015

    James S Gao, Alexander G Huth, Mark D Lescroart, and Jack L Gallant. Pycortex: an interactive surface visualizer for fmri.Frontiers in neuroinformatics, 9:23, 2015

  32. [32]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https...

  33. [33]

    Masked autoencoders as spatiotemporal learners

    Christoph Feichtenhofer, Yanghao Li, Kaiming He, et al. Masked autoencoders as spatiotemporal learners. Advances in neural information processing systems, 35:35946–35958, 2022

  34. [34]

    The wu-minn human connectome project: an overview.Neuroimage, 80: 62–79, 2013

    David C Van Essen, Stephen M Smith, Deanna M Barch, Timothy EJ Behrens, Essa Yacoub, Kamil Ugurbil, Wu-Minn HCP Consortium, et al. The wu-minn human connectome project: an overview.Neuroimage, 80: 62–79, 2013

  35. [35]

    Cortical surface-based analysis: I

    Anders M Dale, Bruce Fischl, and Martin I Sereno. Cortical surface-based analysis: I. segmentation and surface reconstruction.Neuroimage, 9(2):179–194, 1999

  36. [36]

    Freesurfer.Neuroimage, 62(2):774–781, 2012

    Bruce Fischl. Freesurfer.Neuroimage, 62(2):774–781, 2012

  37. [37]

    The minimal preprocessing pipelines for the human connectome project.Neuroimage, 80:105–124, 2013

    Matthew F Glasser, Stamatios N Sotiropoulos, J Anthony Wilson, Timothy S Coalson, Bruce Fischl, Jesper L Andersson, Junqian Xu, Saad Jbabdi, Matthew Webster, Jonathan R Polimeni, et al. The minimal preprocessing pipelines for the human connectome project.Neuroimage, 80:105–124, 2013

  38. [38]

    fmriprep: a robust preprocessing pipeline for functional mri.Nature methods, 16(1):111–116, 2019

    Oscar Esteban, Christopher J Markiewicz, Ross W Blair, Craig A Moodie, A Ilkay Isik, Asier Erra- muzpe, James D Kent, Mathias Goncalves, Elizabeth DuPre, Madeleine Snyder, et al. fmriprep: a robust preprocessing pipeline for functional mri.Nature methods, 16(1):111–116, 2019

  39. [39]

    A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence.Nature neuroscience, 25(1):116–126, 2022

    Emily J Allen, Ghislain St-Yves, Yihan Wu, Jesse L Breedlove, Jacob S Prince, Logan T Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, et al. A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence.Nature neuroscience, 25(1):116–126, 2022

  40. [40]

    Image processing and quality control for the first 10,000 brain imaging datasets from uk biobank.Neuroimage, 166:400–424, 2018

    Fidel Alfaro-Almagro, Mark Jenkinson, Neal K Bangerter, Jesper LR Andersson, Ludovica Griffanti, Gwenaëlle Douaud, Stamatios N Sotiropoulos, Saad Jbabdi, Moises Hernandez-Fernandez, Emmanuel Vallee, et al. Image processing and quality control for the first 10,000 brain imaging datasets from uk biobank.Neuroimage, 166:400–424, 2018

  41. [41]

    Sources and implications of whole-brain fmri signals in humans.Neuroimage, 146:609–625, 2017

    Jonathan D Power, Mark Plitt, Timothy O Laumann, and Alex Martin. Sources and implications of whole-brain fmri signals in humans.Neuroimage, 146:609–625, 2017

  42. [42]

    Videomae v2: Scaling video masked autoencoders with dual masking

    Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, and Yu Qiao. Videomae v2: Scaling video masked autoencoders with dual masking. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14549–14560, 2023

  43. [43]

    Functional annotation of human cognitive states using deep graph convolution.NeuroImage, 231:117847, 2021

    Yu Zhang, Loïc Tetrel, Bertrand Thirion, and Pierre Bellec. Functional annotation of human cognitive states using deep graph convolution.NeuroImage, 231:117847, 2021

  44. [44]

    Deep learning models of cognitive processes constrained by human brain connectomes.Medical image analysis, 80:102507, 2022

    Yu Zhang, Nicolas Farrugia, and Pierre Bellec. Deep learning models of cognitive processes constrained by human brain connectomes.Medical image analysis, 80:102507, 2022

  45. [45]

    Brain decoding of the human connectome project tasks in a dense individual fmri dataset.NeuroImage, 283:120395, 2023

    Shima Rastegarnia, Marie St-Laurent, Elizabeth DuPre, Basile Pinsard, and Pierre Bellec. Brain decoding of the human connectome project tasks in a dense individual fmri dataset.NeuroImage, 283:120395, 2023

  46. [46]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. InEuropean conference on computer vision, pages 740–755. Springer, 2014

  47. [47]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

  48. [48]

    Self-supervised learning from images with a joint-embedding predictive architecture

    Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15619–15629, 2023

  49. [49]

    Cluster and predict latents patches for improved masked image modeling.Transactions on Machine Learning Research, 2025

    Timothée Darcet, Federico Baldassarre, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. Cluster and predict latents patches for improved masked image modeling.Transactions on Machine Learning Research, 2025. ISSN 2835-8856. URLhttps://openreview.net/forum?id=Ycmz7qJxUQ

  50. [50]

    Brain connectivity related to working memory performance.Journal of Neuroscience, 26(51):13338–13343, 2006

    Michelle Hampson, Naomi R Driesen, Pawel Skudlarski, John C Gore, and R Todd Constable. Brain connectivity related to working memory performance.Journal of Neuroscience, 26(51):13338–13343, 2006

  51. [51]

    Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity.Nature neuroscience, 18(11):1664–1671, 2015

    Emily S Finn, Xilin Shen, Dustin Scheinost, Monica D Rosenberg, Jessica Huang, Marvin M Chun, Xenophon Papademetris, and R Todd Constable. Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity.Nature neuroscience, 18(11):1664–1671, 2015. 8

  52. [52]

    Meta-matching as a simple framework to translate phenotypic predictive models from big to small data.Nature neuroscience, 25(6):795–804, 2022

    Tong He, Lijun An, Pansheng Chen, Jianzhong Chen, Jiashi Feng, Danilo Bzdok, Avram J Holmes, Simon B Eickhoff, and BT Thomas Yeo. Meta-matching as a simple framework to translate phenotypic predictive models from big to small data.Nature neuroscience, 25(6):795–804, 2022

  53. [53]

    Masked autoencoders for low-dose ct denoising

    Dayang Wang, Yongshun Xu, Shuo Han, and Hengyong Yu. Masked autoencoders for low-dose ct denoising. In2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), pages 1–4. IEEE, 2023

  54. [54]

    Scaling Laws for Neural Language Models

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models.arXiv preprint arXiv:2001.08361, 2020

  55. [55]

    Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models.arXiv preprint arXiv:2203.15556, 2022

  56. [56]

    V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, et al

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, et al. DINOv2: Learning robust visual features without supervision.Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URLhttps://openreview.net/forum?id=a68SUt6zFt. F...

  57. [57]

    Flexivit: One model for all patch sizes

    Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, and Filip Pavetic. Flexivit: One model for all patch sizes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14496–14506, 2023

  58. [58]

    Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data

    Paul Steven Scotti, Mihir Tripathy, Cesar Torrico, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A Norman, et al. Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data. InForty-first International Conference on Machine Learning, 2024

  59. [59]

    Mindbridge: A cross-subject brain decoding framework

    Shizun Wang, Songhua Liu, Zhenxiong Tan, and Xinchao Wang. Mindbridge: A cross-subject brain decoding framework. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11333–11342, 2024

  60. [60]

    Mindaligner: Explicit brain functional alignment for cross-subject visual decoding from limited fMRI data

    Yuqin Dai, Zhouheng Yao, Chunfeng Song, Qihao Zheng, Weijian Mai, Kunyu Peng, Shuai Lu, Wanli Ouyang, Jian Yang, and Jiamin Wu. Mindaligner: Explicit brain functional alignment for cross-subject visual decoding from limited fMRI data. InForty-second International Conference on Machine Learning,

  61. [61]

    URLhttps://openreview.net/forum?id=1W2WlYRq0K

  62. [62]

    Human connectome project informatics: quality control, database services, and data visualization.Neuroimage, 80:202–219, 2013

    Daniel S Marcus, Michael P Harms, Abraham Z Snyder, Mark Jenkinson, J Anthony Wilson, Matthew F Glasser, Deanna M Barch, Kevin A Archie, Gregory C Burgess, Mohana Ramaratnam, et al. Human connectome project informatics: quality control, database services, and data visualization.Neuroimage, 80:202–219, 2013

  63. [63]

    Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

    Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

  64. [64]

    Advances in functional and structural mr image analysis and implementation as fsl.Neuroimage, 23:S208–S219, 2004

    Stephen M Smith, Mark Jenkinson, Mark W Woolrich, Christian F Beckmann, Timothy EJ Behrens, Heidi Johansen-Berg, Peter R Bannister, Marilena De Luca, Ivana Drobnjak, David E Flitney, et al. Advances in functional and structural mr image analysis and implementation as fsl.Neuroimage, 23:S208–S219, 2004

  65. [65]

    Cortical analysis of heterogeneous clinical brain mri scans for large-scale neuroimaging studies

    Karthik Gopinath, Douglas N Greve, Sudeshna Das, Steve Arnold, Colin Magdamo, and Juan Eugenio Iglesias. Cortical analysis of heterogeneous clinical brain mri scans for large-scale neuroimaging studies. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 35–45. Springer, 2023

  66. [66]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

  67. [67]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts.arXiv preprint arXiv:1608.03983, 2016

  68. [68]

    Augment your batch: Improving generalization through instance repetition

    Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, and Daniel Soudry. Augment your batch: Improving generalization through instance repetition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8129–8138, 2020. 9

  69. [69]

    UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

    Leland McInnes, John Healy, and James Melville. Umap: Uniform manifold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426, 2018

  70. [70]

    Spurious reconstruction from brain activity.Neural Networks, page 107515, 2025

    Ken Shirakawa, Yoshihiro Nagano, Misato Tanaka, Shuntaro C Aoki, Yusuke Muraki, Kei Majima, and Yukiyasu Kamitani. Spurious reconstruction from brain activity.Neural Networks, page 107515, 2025. 10 A Author contributions Connor Laneconceived and implemented the flat map strategy, developed the project framing, wrote the majority of the code, trained all t...

  71. [71]

    We fit a regular grid of size height × width = 224×560 to the array of (x, y) points contained in the mask

    to yield a valid flat map mask of containing 58212 vertices across both cortical hemispheres. We fit a regular grid of size height × width = 224×560 to the array of (x, y) points contained in the mask. The grid has a pixel resolution of 1.2mm in flat map coordinates, which equals the mean nearest neighbor distance. To project surface-mapped fMRI data onto...