Scaling Vision Transformers for Functional MRI with Flat Maps

Benjamin Warner; Cesar Kadir Torrico Villanueva; Connor Lane; Daniel Z. Kaplan; Debojyoti Das; Gianfranco Cort\'es; Leema Krishna Murali; Manish Ram; Mihir Tripathy; Paul S. Scotti

arxiv: 2510.13768 · v2 · submitted 2025-10-15 · 💻 cs.CV · cs.AI· q-bio.NC

Scaling Vision Transformers for Functional MRI with Flat Maps

Connor Lane , Mihir Tripathy , Leema Krishna Murali , Ratna Sagari Grandhi , Shamus Sim Zi Yang , Sam Gijsen , Debojyoti Das , Manish Ram

show 10 more authors

Utkarsh Kumar Singh Cesar Kadir Torrico Villanueva Yuxiang Wei Will Beddow Gianfranco Cort\'es Suin Cho Daniel Z. Kaplan Benjamin Warner Tanishq Mathew Abraham Paul S. Scotti

This is my paper

Pith reviewed 2026-05-18 06:58 UTC · model grok-4.3

classification 💻 cs.CV cs.AIq-bio.NC

keywords functional MRIVision Transformermasked autoencodercortical flat mapsfoundation modelscognitive state decodingscaling lawsBrainmarks

0 comments

The pith

Converting fMRI volumes to 2D cortical flat maps lets Vision Transformers scale and outperform prior models on cognitive state decoding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that a simple projection of 3D fMRI volumes onto 2D cortical flat maps makes it practical to train large Vision Transformer models with masked autoencoder pretraining on thousands of hours of brain data. A sympathetic reader would care because this could open the way to reusable foundation models that improve prediction of brain states without needing task-specific architectures for every new experiment. The authors build the CortexMAE family, compare flat maps against volume and parcellation inputs, and run the first systematic scaling study on fMRI. They release an evaluation suite called Brainmarks that reveals flat maps usually win, scaling follows power laws up to limits, trait prediction stays hard for all models, and cognitive decoding sees large gains for the new models.

Core claim

The central claim is that cortical flat map projections turn each 3D fMRI volume into a 2D image that a standard Vision Transformer can process directly inside a masked autoencoder framework. Trained on 2.1K hours of open data, the resulting CortexMAE models exhibit strict power-law scaling with data size and deliver substantially higher accuracy on cognitive state decoding benchmarks than earlier fMRI models. In contrast, the same models show no clear advantage on subject-level trait prediction, where even a simple functional connectivity baseline remains competitive.

What carries the argument

Cortical flat map projection that converts each 3D fMRI volume into a 2D map for direct input to a Vision Transformer trained under the masked autoencoder objective.

If this is right

Flat maps generally outperform both parcellation and direct volume representations across the tested tasks.
Model performance follows a strict power law with increasing training data, subject to observable limits.
CortexMAE models achieve large gains over prior methods specifically on cognitive state decoding.
No single architecture dominates subject-level trait prediction, and all models remain close to a functional connectivity baseline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The performance gap between trait and state tasks suggests self-supervised pretraining on fMRI may be better suited to capturing transient activity patterns than stable individual traits.
Observed scaling limits imply that further gains will require changes in architecture or data quality rather than data volume alone.
If flat maps transfer well to other 3D brain imaging modalities, the same projection step could simplify transformer use across neuroimaging.

Load-bearing premise

Cortical flat-map projections preserve the spatial and temporal information needed for trait prediction and cognitive state decoding without introducing systematic distortions that would invalidate comparisons to volume-based or parcellation baselines.

What would settle it

A controlled experiment in which a volume-based or parcellation-based transformer matches or exceeds CortexMAE accuracy on the same cognitive state decoding tasks would falsify the claimed advantage of flat maps.

Figures

Figures reproduced from arXiv: 2510.13768 by Benjamin Warner, Cesar Kadir Torrico Villanueva, Connor Lane, Daniel Z. Kaplan, Debojyoti Das, Gianfranco Cort\'es, Leema Krishna Murali, Manish Ram, Mihir Tripathy, Paul S. Scotti, Ratna Sagari Grandhi, Sam Gijsen, Shamus Sim Zi Yang, Suin Cho, Tanishq Mathew Abraham, Utkarsh Kumar Singh, Will Beddow, Yuxiang Wei.

**Figure 2.** Figure 2: Visualization of MAE predictions. Within each panel of [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: fMRI modeling performance scales with dataset size. The model is a ViT-B trained on [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Downstream decoding performance as a function of dataset size (left column), model size (middle column), and temporal patch size pt (right column). Smaller temporal patch size corresponds to larger effective sequence length (tokens per input = 364·16/pt). Black dashes indicate performance on independent validation sets used for classifier parameter tuning. the 88.6M (ViT-B) model, despite 7× fewer para… view at source ↗

**Figure 5.** Figure 5: 4D fMRI time series are first preprocessed using standard methods [ [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 6.** Figure 6: Schaefer 400 parcellation [22] with Yeo resting-state networks [26] on the cortical surface and flat map. Relaxation cuts required for flat map transformation [30] are marked in white. B.2 Pretraining implementation details We pretrain for 625K steps using AdamW (β1 = 0.9, β2 = 0.95) [64] with a batch size of 32, learning rate of 1.25e-4 (base learning rate 1e-3 scaled by batch_size / 256), and weight deca… view at source ↗

**Figure 7.** Figure 7: Example NSD images with CLIP assigned labels. [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 8.** Figure 8: Additional MAE predictions for randomly sampled data. [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

read the original abstract

We study the problem of training self-supervised foundation models for functional MRI. Our main contributions are: (1) we introduce a new model family (CortexMAE) trained using the masked autoencoder framework on 2.1K hours of open fMRI data, and (2) we release the first open evaluation suite (Brainmarks) for fMRI foundation models. Our core innovation is simple: we adapt the Vision Transformer to fMRI by first converting each 3D fMRI volume to a 2D map using a cortical flat map projection. We directly compare flat maps to both parcellation and volume-based representations. While each has its advantages, flat maps generally perform best. We perform the first systematic scaling analysis for fMRI and observe strict power law scaling, albeit with limits. Finally, we use Brainmarks to do controlled benchmark comparisons. On subject-level trait prediction, we report a challenging null result: no single model achieves clear state-of-the-art performance. Moreover, all models struggle to outperform a simple functional connectivity baseline. On cognitive state decoding, we observe more robust performance, and in this setting our CortexMAE family outperforms prior models by a large margin. Code, models, and datasets are available at https://github.com/MedARC-AI/CortexMAE and https://github.com/MedARC-AI/Brainmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Flat maps let a plain ViT do masked autoencoding on fMRI with clear scaling and a solid win on cognitive decoding, but the trait prediction null result and possible projection artifacts keep the practical payoff modest.

read the letter

The paper's real contribution is the CortexMAE family plus the open Brainmarks suite. They take 2.1K hours of public fMRI, project each volume to a cortical flat map, and train a standard masked autoencoder. That setup shows clean power-law scaling and beats earlier models by a noticeable margin on cognitive state decoding. They also report the null result on subject-level trait prediction without trying to spin it, which is useful to see in print. Releasing code, models, and the benchmark is the part that actually moves the field forward for anyone who wants to test their own ideas against a common set of tasks.

Referee Report

2 major / 2 minor

Summary. The paper introduces CortexMAE, a family of masked autoencoder models based on Vision Transformers adapted to fMRI via cortical flat-map projections of 3D volumes. Trained self-supervised on 2.1K hours of open data, the work releases the Brainmarks evaluation suite and performs direct comparisons of flat-map, parcellation, and volume representations. It reports strict power-law scaling (with limits), a null result on subject-level trait prediction (no model clearly outperforms a functional connectivity baseline), and substantially stronger performance on cognitive state decoding where CortexMAE outperforms prior models by a large margin. Code, models, and datasets are released.

Significance. If the central empirical claims hold, this provides a useful open foundation-model baseline and scaling analysis for fMRI, together with a new benchmark suite. The explicit release of code, models, and data, plus the honest reporting of the null result on trait prediction, are clear strengths that facilitate independent verification. The flat-map approach could influence representation choices in neuroimaging if shown to preserve functional structure without systematic bias relative to volume or parcellation inputs.

major comments (2)

[§4 and Table 2] §4 (Experimental Comparisons) and Table 2: the headline ranking that flat maps generally perform best, and the large-margin win on cognitive state decoding, rests on the untested assumption that the 3D-to-2D cortical projection retains the spatial structure exploited by the ViT without introducing distortions that favor one input format. No voxel-wise reconstruction error, subcortical signal retention statistics, or representational similarity analysis between flat-map, volume, and parcellation inputs is reported; this is load-bearing for the claim that the observed ordering reflects architectural or representational advantage rather than unequal information fidelity.
[Methods and Results] Methods and Results sections: performance numbers for both trait prediction and cognitive decoding lack error bars, standard deviations across random seeds or cross-validation folds, and explicit statements of the exact train/validation/test splits used. Without these, the robustness of the 'large margin' outperformance and the null result cannot be assessed at the level required for a foundation-model benchmark paper.

minor comments (2)

[Figure 3] Figure 3 (scaling curves): axis labels and legend entries are too small for readability; consider increasing font size and adding a panel showing the fitted power-law exponents with confidence intervals.
[§5] The abstract states 'strict power law scaling, albeit with limits' but the main text does not define the quantitative criterion used to identify the 'limits'; a short clarification would help readers interpret the scaling regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We appreciate the focus on validating the representational fidelity of our flat-map approach and on improving the statistical reporting of our results. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: [§4 and Table 2] §4 (Experimental Comparisons) and Table 2: the headline ranking that flat maps generally perform best, and the large-margin win on cognitive state decoding, rests on the untested assumption that the 3D-to-2D cortical projection retains the spatial structure exploited by the ViT without introducing distortions that favor one input format. No voxel-wise reconstruction error, subcortical signal retention statistics, or representational similarity analysis between flat-map, volume, and parcellation inputs is reported; this is load-bearing for the claim that the observed ordering reflects architectural or representational advantage rather than unequal information fidelity.

Authors: We agree that the absence of direct fidelity checks on the 3D-to-2D projection is a limitation that weakens the interpretation of why flat maps outperform the alternatives. Although cortical flat mapping is a standard preprocessing step in the field (as implemented in FreeSurfer and used in the HCP), we did not report quantitative comparisons of information preservation across formats. In the revised version we will add voxel-wise reconstruction error between original volumes and projected maps, subcortical signal retention statistics, and representational similarity analysis (RSA) comparing flat-map, volume, and parcellation inputs. These additions will allow readers to evaluate whether the performance ordering reflects genuine representational advantages rather than differential information loss. revision: yes
Referee: [Methods and Results] Methods and Results sections: performance numbers for both trait prediction and cognitive decoding lack error bars, standard deviations across random seeds or cross-validation folds, and explicit statements of the exact train/validation/test splits used. Without these, the robustness of the 'large margin' outperformance and the null result cannot be assessed at the level required for a foundation-model benchmark paper.

Authors: We acknowledge that the original submission reported only point estimates without measures of variability or precise split details, which is insufficient for a benchmark paper. We will revise the Methods section to document the exact train/validation/test splits (including subject-wise partitioning to avoid leakage) and will update all performance tables and figures in Results to include error bars corresponding to standard deviation across random seeds and cross-validation folds. These changes will make both the large-margin gains on cognitive decoding and the null result on trait prediction more rigorously interpretable. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical training, scaling observations, and held-out benchmarks

full rationale

The paper reports an empirical study: CortexMAE models are trained via masked autoencoding on 2.1K hours of fMRI data after converting volumes to 2D cortical flat maps, then evaluated on the released Brainmarks suite for trait prediction and cognitive state decoding. All performance claims, scaling observations (strict power laws with limits), and comparisons to parcellation/volume baselines arise directly from training runs and held-out test metrics. No mathematical derivations, first-principles results, fitted parameters renamed as predictions, or load-bearing self-citations appear in the described contributions. The work is self-contained against external benchmarks because code, models, and datasets are released for independent reproduction, and results are falsifiable via direct replication on the same splits rather than reducing to any internal definition or prior author result by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the domain assumption that masked autoencoding transfers usefully from natural images to flat-map fMRI and that the chosen open fMRI corpus is representative enough for scaling laws to emerge.

axioms (1)

domain assumption Masked autoencoder pretraining on flat-map fMRI yields representations that generalize to downstream decoding tasks.
Invoked when claiming outperformance on cognitive state decoding.

pith-pipeline@v0.9.0 · 5860 in / 1333 out tokens · 37465 ms · 2026-05-18T06:58:07.501063+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we transform the 4D volumetric fMRI data into videos of 2D fMRI activity flat maps... tokenized into patches... standard ViT on temporal sequences of patchified flat maps using a spatiotemporal MAE
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

masked fMRI modeling performance improves with dataset size according to a strict power scaling law

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
cs.LG 2026-04 unverdicted novelty 6.0

A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · cited by 1 Pith paper · 6 internal anchors

[1]

Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience.Neuron, 85(1):11–26, 2015

John DE Gabrieli, Satrajit S Ghosh, and Susan Whitfield-Gabrieli. Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience.Neuron, 85(1):11–26, 2015

work page 2015
[2]

Building better biomarkers: brain models in translational neuroimaging.Nature neuroscience, 20(3):365–377, 2017

Choong-Wan Woo, Luke J Chang, Martin A Lindquist, and Tor D Wager. Building better biomarkers: brain models in translational neuroimaging.Nature neuroscience, 20(3):365–377, 2017

work page 2017
[3]

On the Opportunities and Risks of Foundation Models

Rishi Bommasani et al. On the opportunities and risks of foundation models.arXiv preprint arXiv:2108.07258, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[4]

A foundation model for generalizable disease detection from retinal images.Nature, 622(7981):156–163, 2023

Yukun Zhou, Mark A Chia, Siegfried K Wagner, Murat S Ayhan, Dominic J Williamson, Robbert R Struyven, Timing Liu, Moucheng Xu, Mateo G Lozano, Peter Woodward-Court, et al. A foundation model for generalizable disease detection from retinal images.Nature, 622(7981):156–163, 2023

work page 2023
[5]

A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, et al. A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

work page 2024
[6]

A foundation model for the earth system.Nature, pages 1–8, 2025

Cristian Bodnar, Wessel P Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A Weyn, Haiyu Dong, et al. A foundation model for the earth system.Nature, pages 1–8, 2025

work page 2025
[7]

Foundation model of neural activity predicts response to new stimulus types.Nature, 640(8058):470–477, 2025

Eric Y Wang, Paul G Fahey, Zhuokun Ding, Stelios Papadopoulos, Kayla Ponder, Marissa A Weis, Andersen Chang, Taliah Muhammad, Saumil Patel, Zhiwei Ding, et al. Foundation model of neural activity predicts response to new stimulus types.Nature, 640(8058):470–477, 2025

work page 2025
[8]

Bert: Pre-training of deep bidi- rectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidi- rectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019
[9]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901
[10]

wav2vec 2.0: A framework for self-supervised learning of speech representations.Advances in neural information processing systems, 33: 12449–12460, 2020

Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations.Advances in neural information processing systems, 33: 12449–12460, 2020

work page 2020
[11]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022

work page 2022
[12]

Brain network transformer

Xuan Kan, Wei Dai, Hejie Cui, Zilong Zhang, Ying Guo, and Carl Yang. Brain network transformer. Advances in Neural Information Processing Systems, 35:25586–25599, 2022

work page 2022
[13]

Self-supervised learning of brain dynamics from broad neuroimaging data.Advances in neural information processing systems, 35:21255–21269, 2022

Armin Thomas, Christopher Ré, and Russell Poldrack. Self-supervised learning of brain dynamics from broad neuroimaging data.Advances in neural information processing systems, 35:21255–21269, 2022

work page 2022
[14]

Self-supervised transformers for fmri representation

Itzik Malkiel, Gony Rosenman, Lior Wolf, and Talma Hendler. Self-supervised transformers for fmri representation. InInternational Conference on Medical Imaging with Deep Learning, pages 895–913. PMLR, 2022. 6

work page 2022
[15]

Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding

Zijiao Chen, Jiaxin Qing, Tiange Xiang, Wan Lin Yue, and Juan Helen Zhou. Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22710–22720, 2023

work page 2023
[16]

Swift: Swin 4d fmri transformer.Advances in Neural Information Processing Systems, 36:42015–42037, 2023

Peter Kim, Junbeom Kwon, Sunghwan Joo, Sangyoon Bae, Donggyu Lee, Yoonho Jung, Shinjae Yoo, Jiook Cha, and Taesup Moon. Swift: Swin 4d fmri transformer.Advances in Neural Information Processing Systems, 36:42015–42037, 2023

work page 2023
[17]

BrainLM: A foundation model for brain activity recordings

Josue Ortega Caro, Antonio Henrique de Oliveira Fonseca, Syed A Rizvi, Matteo Rosati, Christopher Averill, James L Cross, Prateek Mittal, Emanuele Zappala, Rahul Madhav Dhodapkar, Chadi Abdallah, and David van Dijk. BrainLM: A foundation model for brain activity recordings. InThe Twelfth International Conference on Learning Representations, 2024. URL http...

work page 2024
[18]

Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking.Advances in Neural Information Processing Systems, 37:86048–86073, 2024

Zijian Dong, Ruilin Li, Yilei Wu, Thuan Tinh Nguyen, Joanna Chong, Fang Ji, Nathanael Tong, Christopher Chen, and Juan Helen Zhou. Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking.Advances in Neural Information Processing Systems, 37:86048–86073, 2024

work page 2024
[19]

General-purpose brain foundation models for time-series neuroimaging data

Mohammad Javad Darvishi Bayazi, Hena Ghonia, Roland Riachi, Bruno Aristimunha, Arian Khorasani, Md Rifat Arefin, Amin Darabi, Guillaume Dumas, and Irina Rish. General-purpose brain foundation models for time-series neuroimaging data. InNeurIPS Workshop on Time Series in the Age of Large Models, 2024. URLhttps://openreview.net/forum?id=HwDQH0r37I

work page 2024
[20]

arXiv preprint arXiv:2506.11167 , year=

Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Changbae Bang, Lin Zhao, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, et al. Towards a general-purpose foundation model for fmri analysis.arXiv preprint arXiv:2506.11167, 2025

work page arXiv 2025
[21]

A unified, scalable framework for neural population decoding.Advances in Neural Information Processing Systems, 36: 44937–44956, 2023

Mehdi Azabou, Vinam Arora, Venkataramana Ganesh, Ximeng Mao, Santosh Nachimuthu, Michael Mendelson, Blake Richards, Matthew Perich, Guillaume Lajoie, and Eva Dyer. A unified, scalable framework for neural population decoding.Advances in Neural Information Processing Systems, 36: 44937–44956, 2023

work page 2023
[22]

Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri.Cerebral cortex, 28(9):3095–3114, 2018

Alexander Schaefer, Ru Kong, Evan M Gordon, Timothy O Laumann, Xi-Nian Zuo, Avram J Holmes, Simon B Eickhoff, and BT Thomas Yeo. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri.Cerebral cortex, 28(9):3095–3114, 2018

work page 2018
[23]

Fine-grain atlases of functional modes for fmri analysis.NeuroImage, 221:117126, 2020

Kamalaker Dadi, Gaël Varoquaux, Antonia Machlouzarides-Shalit, Krzysztof J Gorgolewski, Demian Wassermann, Bertrand Thirion, and Arthur Mensch. Fine-grain atlases of functional modes for fmri analysis.NeuroImage, 221:117126, 2020

work page 2020
[24]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021

work page 2021
[25]

The human connectome: a structural description of the human brain.PLoS computational biology, 1(4):e42, 2005

Olaf Sporns, Giulio Tononi, and Rolf Kötter. The human connectome: a structural description of the human brain.PLoS computational biology, 1(4):e42, 2005

work page 2005
[26]

The organization of the human cerebral cortex estimated by intrinsic functional connectivity.Journal of neurophysiology, 2011

BT Thomas Yeo, Fenna M Krienen, Jorge Sepulcre, Mert R Sabuncu, Danial Lashkari, Marisa Hollinshead, Joshua L Roffman, Jordan W Smoller, Lilla Zöllei, Jonathan R Polimeni, et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity.Journal of neurophysiology, 2011

work page 2011
[27]

Geometric constraints on human brain function.Nature, 618(7965): 566–574, 2023

James C Pang, Kevin M Aquino, Marianne Oldehinkel, Peter A Robinson, Ben D Fulcher, Michael Breakspear, and Alex Fornito. Geometric constraints on human brain function.Nature, 618(7965): 566–574, 2023

work page 2023
[28]

The bitter lesson.Incomplete Ideas (blog), 13(1):38, 2019

Richard Sutton. The bitter lesson.Incomplete Ideas (blog), 13(1):38, 2019

work page 2019
[29]

Stanford cs25: V4

Hyung Won Chung. Stanford cs25: V4. https://youtu.be/3gb-ZkVRemQ?si=7FXnklTS9X3FCuv1,

work page
[30]

YouTube video, Stanford University

work page
[31]

Pycortex: an interactive surface visualizer for fmri.Frontiers in neuroinformatics, 9:23, 2015

James S Gao, Alexander G Huth, Mark D Lescroart, and Jack L Gallant. Pycortex: an interactive surface visualizer for fmri.Frontiers in neuroinformatics, 9:23, 2015

work page 2015
[32]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https...

work page 2021
[33]

Masked autoencoders as spatiotemporal learners

Christoph Feichtenhofer, Yanghao Li, Kaiming He, et al. Masked autoencoders as spatiotemporal learners. Advances in neural information processing systems, 35:35946–35958, 2022

work page 2022
[34]

The wu-minn human connectome project: an overview.Neuroimage, 80: 62–79, 2013

David C Van Essen, Stephen M Smith, Deanna M Barch, Timothy EJ Behrens, Essa Yacoub, Kamil Ugurbil, Wu-Minn HCP Consortium, et al. The wu-minn human connectome project: an overview.Neuroimage, 80: 62–79, 2013

work page 2013
[35]

Cortical surface-based analysis: I

Anders M Dale, Bruce Fischl, and Martin I Sereno. Cortical surface-based analysis: I. segmentation and surface reconstruction.Neuroimage, 9(2):179–194, 1999

work page 1999
[36]

Freesurfer.Neuroimage, 62(2):774–781, 2012

Bruce Fischl. Freesurfer.Neuroimage, 62(2):774–781, 2012

work page 2012
[37]

The minimal preprocessing pipelines for the human connectome project.Neuroimage, 80:105–124, 2013

Matthew F Glasser, Stamatios N Sotiropoulos, J Anthony Wilson, Timothy S Coalson, Bruce Fischl, Jesper L Andersson, Junqian Xu, Saad Jbabdi, Matthew Webster, Jonathan R Polimeni, et al. The minimal preprocessing pipelines for the human connectome project.Neuroimage, 80:105–124, 2013

work page 2013
[38]

fmriprep: a robust preprocessing pipeline for functional mri.Nature methods, 16(1):111–116, 2019

Oscar Esteban, Christopher J Markiewicz, Ross W Blair, Craig A Moodie, A Ilkay Isik, Asier Erra- muzpe, James D Kent, Mathias Goncalves, Elizabeth DuPre, Madeleine Snyder, et al. fmriprep: a robust preprocessing pipeline for functional mri.Nature methods, 16(1):111–116, 2019

work page 2019
[39]

A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence.Nature neuroscience, 25(1):116–126, 2022

Emily J Allen, Ghislain St-Yves, Yihan Wu, Jesse L Breedlove, Jacob S Prince, Logan T Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, et al. A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence.Nature neuroscience, 25(1):116–126, 2022

work page 2022
[40]

Image processing and quality control for the first 10,000 brain imaging datasets from uk biobank.Neuroimage, 166:400–424, 2018

Fidel Alfaro-Almagro, Mark Jenkinson, Neal K Bangerter, Jesper LR Andersson, Ludovica Griffanti, Gwenaëlle Douaud, Stamatios N Sotiropoulos, Saad Jbabdi, Moises Hernandez-Fernandez, Emmanuel Vallee, et al. Image processing and quality control for the first 10,000 brain imaging datasets from uk biobank.Neuroimage, 166:400–424, 2018

work page 2018
[41]

Sources and implications of whole-brain fmri signals in humans.Neuroimage, 146:609–625, 2017

Jonathan D Power, Mark Plitt, Timothy O Laumann, and Alex Martin. Sources and implications of whole-brain fmri signals in humans.Neuroimage, 146:609–625, 2017

work page 2017
[42]

Videomae v2: Scaling video masked autoencoders with dual masking

Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, and Yu Qiao. Videomae v2: Scaling video masked autoencoders with dual masking. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14549–14560, 2023

work page 2023
[43]

Functional annotation of human cognitive states using deep graph convolution.NeuroImage, 231:117847, 2021

Yu Zhang, Loïc Tetrel, Bertrand Thirion, and Pierre Bellec. Functional annotation of human cognitive states using deep graph convolution.NeuroImage, 231:117847, 2021

work page 2021
[44]

Deep learning models of cognitive processes constrained by human brain connectomes.Medical image analysis, 80:102507, 2022

Yu Zhang, Nicolas Farrugia, and Pierre Bellec. Deep learning models of cognitive processes constrained by human brain connectomes.Medical image analysis, 80:102507, 2022

work page 2022
[45]

Brain decoding of the human connectome project tasks in a dense individual fmri dataset.NeuroImage, 283:120395, 2023

Shima Rastegarnia, Marie St-Laurent, Elizabeth DuPre, Basile Pinsard, and Pierre Bellec. Brain decoding of the human connectome project tasks in a dense individual fmri dataset.NeuroImage, 283:120395, 2023

work page 2023
[46]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. InEuropean conference on computer vision, pages 740–755. Springer, 2014

work page 2014
[47]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

work page 2021
[48]

Self-supervised learning from images with a joint-embedding predictive architecture

Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15619–15629, 2023

work page 2023
[49]

Cluster and predict latents patches for improved masked image modeling.Transactions on Machine Learning Research, 2025

Timothée Darcet, Federico Baldassarre, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. Cluster and predict latents patches for improved masked image modeling.Transactions on Machine Learning Research, 2025. ISSN 2835-8856. URLhttps://openreview.net/forum?id=Ycmz7qJxUQ

work page 2025
[50]

Brain connectivity related to working memory performance.Journal of Neuroscience, 26(51):13338–13343, 2006

Michelle Hampson, Naomi R Driesen, Pawel Skudlarski, John C Gore, and R Todd Constable. Brain connectivity related to working memory performance.Journal of Neuroscience, 26(51):13338–13343, 2006

work page 2006
[51]

Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity.Nature neuroscience, 18(11):1664–1671, 2015

Emily S Finn, Xilin Shen, Dustin Scheinost, Monica D Rosenberg, Jessica Huang, Marvin M Chun, Xenophon Papademetris, and R Todd Constable. Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity.Nature neuroscience, 18(11):1664–1671, 2015. 8

work page 2015
[52]

Meta-matching as a simple framework to translate phenotypic predictive models from big to small data.Nature neuroscience, 25(6):795–804, 2022

Tong He, Lijun An, Pansheng Chen, Jianzhong Chen, Jiashi Feng, Danilo Bzdok, Avram J Holmes, Simon B Eickhoff, and BT Thomas Yeo. Meta-matching as a simple framework to translate phenotypic predictive models from big to small data.Nature neuroscience, 25(6):795–804, 2022

work page 2022
[53]

Masked autoencoders for low-dose ct denoising

Dayang Wang, Yongshun Xu, Shuo Han, and Hengyong Yu. Masked autoencoders for low-dose ct denoising. In2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), pages 1–4. IEEE, 2023

work page 2023
[54]

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models.arXiv preprint arXiv:2001.08361, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2001
[55]

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models.arXiv preprint arXiv:2203.15556, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[56]

V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, et al

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, et al. DINOv2: Learning robust visual features without supervision.Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URLhttps://openreview.net/forum?id=a68SUt6zFt. F...

work page 2024
[57]

Flexivit: One model for all patch sizes

Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, and Filip Pavetic. Flexivit: One model for all patch sizes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14496–14506, 2023

work page 2023
[58]

Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data

Paul Steven Scotti, Mihir Tripathy, Cesar Torrico, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A Norman, et al. Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data. InForty-first International Conference on Machine Learning, 2024

work page 2024
[59]

Mindbridge: A cross-subject brain decoding framework

Shizun Wang, Songhua Liu, Zhenxiong Tan, and Xinchao Wang. Mindbridge: A cross-subject brain decoding framework. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11333–11342, 2024

work page 2024
[60]

Mindaligner: Explicit brain functional alignment for cross-subject visual decoding from limited fMRI data

Yuqin Dai, Zhouheng Yao, Chunfeng Song, Qihao Zheng, Weijian Mai, Kunyu Peng, Shuai Lu, Wanli Ouyang, Jian Yang, and Jiamin Wu. Mindaligner: Explicit brain functional alignment for cross-subject visual decoding from limited fMRI data. InForty-second International Conference on Machine Learning,

work page
[61]

URLhttps://openreview.net/forum?id=1W2WlYRq0K

work page
[62]

Human connectome project informatics: quality control, database services, and data visualization.Neuroimage, 80:202–219, 2013

Daniel S Marcus, Michael P Harms, Abraham Z Snyder, Mark Jenkinson, J Anthony Wilson, Matthew F Glasser, Deanna M Barch, Kevin A Archie, Gregory C Burgess, Mohana Ramaratnam, et al. Human connectome project informatics: quality control, database services, and data visualization.Neuroimage, 80:202–219, 2013

work page 2013
[63]

Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

work page 2020
[64]

Advances in functional and structural mr image analysis and implementation as fsl.Neuroimage, 23:S208–S219, 2004

Stephen M Smith, Mark Jenkinson, Mark W Woolrich, Christian F Beckmann, Timothy EJ Behrens, Heidi Johansen-Berg, Peter R Bannister, Marilena De Luca, Ivana Drobnjak, David E Flitney, et al. Advances in functional and structural mr image analysis and implementation as fsl.Neuroimage, 23:S208–S219, 2004

work page 2004
[65]

Cortical analysis of heterogeneous clinical brain mri scans for large-scale neuroimaging studies

Karthik Gopinath, Douglas N Greve, Sudeshna Das, Steve Arnold, Colin Magdamo, and Juan Eugenio Iglesias. Cortical analysis of heterogeneous clinical brain mri scans for large-scale neuroimaging studies. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 35–45. Springer, 2023

work page 2023
[66]

Decoupled Weight Decay Regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[67]

SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts.arXiv preprint arXiv:1608.03983, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[68]

Augment your batch: Improving generalization through instance repetition

Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, and Daniel Soudry. Augment your batch: Improving generalization through instance repetition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8129–8138, 2020. 9

work page 2020
[69]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Leland McInnes, John Healy, and James Melville. Umap: Uniform manifold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[70]

Spurious reconstruction from brain activity.Neural Networks, page 107515, 2025

Ken Shirakawa, Yoshihiro Nagano, Misato Tanaka, Shuntaro C Aoki, Yusuke Muraki, Kei Majima, and Yukiyasu Kamitani. Spurious reconstruction from brain activity.Neural Networks, page 107515, 2025. 10 A Author contributions Connor Laneconceived and implemented the flat map strategy, developed the project framing, wrote the majority of the code, trained all t...

work page 2025
[71]

We fit a regular grid of size height × width = 224×560 to the array of (x, y) points contained in the mask

to yield a valid flat map mask of containing 58212 vertices across both cortical hemispheres. We fit a regular grid of size height × width = 224×560 to the array of (x, y) points contained in the mask. The grid has a pixel resolution of 1.2mm in flat map coordinates, which equals the mean nearest neighbor distance. To project surface-mapped fMRI data onto...

work page

[1] [1]

Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience.Neuron, 85(1):11–26, 2015

John DE Gabrieli, Satrajit S Ghosh, and Susan Whitfield-Gabrieli. Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience.Neuron, 85(1):11–26, 2015

work page 2015

[2] [2]

Building better biomarkers: brain models in translational neuroimaging.Nature neuroscience, 20(3):365–377, 2017

Choong-Wan Woo, Luke J Chang, Martin A Lindquist, and Tor D Wager. Building better biomarkers: brain models in translational neuroimaging.Nature neuroscience, 20(3):365–377, 2017

work page 2017

[3] [3]

On the Opportunities and Risks of Foundation Models

Rishi Bommasani et al. On the opportunities and risks of foundation models.arXiv preprint arXiv:2108.07258, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[4] [4]

A foundation model for generalizable disease detection from retinal images.Nature, 622(7981):156–163, 2023

Yukun Zhou, Mark A Chia, Siegfried K Wagner, Murat S Ayhan, Dominic J Williamson, Robbert R Struyven, Timing Liu, Moucheng Xu, Mateo G Lozano, Peter Woodward-Court, et al. A foundation model for generalizable disease detection from retinal images.Nature, 622(7981):156–163, 2023

work page 2023

[5] [5]

A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, et al. A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

work page 2024

[6] [6]

A foundation model for the earth system.Nature, pages 1–8, 2025

Cristian Bodnar, Wessel P Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A Weyn, Haiyu Dong, et al. A foundation model for the earth system.Nature, pages 1–8, 2025

work page 2025

[7] [7]

Foundation model of neural activity predicts response to new stimulus types.Nature, 640(8058):470–477, 2025

Eric Y Wang, Paul G Fahey, Zhuokun Ding, Stelios Papadopoulos, Kayla Ponder, Marissa A Weis, Andersen Chang, Taliah Muhammad, Saumil Patel, Zhiwei Ding, et al. Foundation model of neural activity predicts response to new stimulus types.Nature, 640(8058):470–477, 2025

work page 2025

[8] [8]

Bert: Pre-training of deep bidi- rectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidi- rectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019

[9] [9]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901

[10] [10]

wav2vec 2.0: A framework for self-supervised learning of speech representations.Advances in neural information processing systems, 33: 12449–12460, 2020

Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations.Advances in neural information processing systems, 33: 12449–12460, 2020

work page 2020

[11] [11]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022

work page 2022

[12] [12]

Brain network transformer

Xuan Kan, Wei Dai, Hejie Cui, Zilong Zhang, Ying Guo, and Carl Yang. Brain network transformer. Advances in Neural Information Processing Systems, 35:25586–25599, 2022

work page 2022

[13] [13]

Self-supervised learning of brain dynamics from broad neuroimaging data.Advances in neural information processing systems, 35:21255–21269, 2022

Armin Thomas, Christopher Ré, and Russell Poldrack. Self-supervised learning of brain dynamics from broad neuroimaging data.Advances in neural information processing systems, 35:21255–21269, 2022

work page 2022

[14] [14]

Self-supervised transformers for fmri representation

Itzik Malkiel, Gony Rosenman, Lior Wolf, and Talma Hendler. Self-supervised transformers for fmri representation. InInternational Conference on Medical Imaging with Deep Learning, pages 895–913. PMLR, 2022. 6

work page 2022

[15] [15]

Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding

Zijiao Chen, Jiaxin Qing, Tiange Xiang, Wan Lin Yue, and Juan Helen Zhou. Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22710–22720, 2023

work page 2023

[16] [16]

Swift: Swin 4d fmri transformer.Advances in Neural Information Processing Systems, 36:42015–42037, 2023

Peter Kim, Junbeom Kwon, Sunghwan Joo, Sangyoon Bae, Donggyu Lee, Yoonho Jung, Shinjae Yoo, Jiook Cha, and Taesup Moon. Swift: Swin 4d fmri transformer.Advances in Neural Information Processing Systems, 36:42015–42037, 2023

work page 2023

[17] [17]

BrainLM: A foundation model for brain activity recordings

Josue Ortega Caro, Antonio Henrique de Oliveira Fonseca, Syed A Rizvi, Matteo Rosati, Christopher Averill, James L Cross, Prateek Mittal, Emanuele Zappala, Rahul Madhav Dhodapkar, Chadi Abdallah, and David van Dijk. BrainLM: A foundation model for brain activity recordings. InThe Twelfth International Conference on Learning Representations, 2024. URL http...

work page 2024

[18] [18]

Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking.Advances in Neural Information Processing Systems, 37:86048–86073, 2024

Zijian Dong, Ruilin Li, Yilei Wu, Thuan Tinh Nguyen, Joanna Chong, Fang Ji, Nathanael Tong, Christopher Chen, and Juan Helen Zhou. Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking.Advances in Neural Information Processing Systems, 37:86048–86073, 2024

work page 2024

[19] [19]

General-purpose brain foundation models for time-series neuroimaging data

Mohammad Javad Darvishi Bayazi, Hena Ghonia, Roland Riachi, Bruno Aristimunha, Arian Khorasani, Md Rifat Arefin, Amin Darabi, Guillaume Dumas, and Irina Rish. General-purpose brain foundation models for time-series neuroimaging data. InNeurIPS Workshop on Time Series in the Age of Large Models, 2024. URLhttps://openreview.net/forum?id=HwDQH0r37I

work page 2024

[20] [20]

arXiv preprint arXiv:2506.11167 , year=

Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Changbae Bang, Lin Zhao, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, et al. Towards a general-purpose foundation model for fmri analysis.arXiv preprint arXiv:2506.11167, 2025

work page arXiv 2025

[21] [21]

A unified, scalable framework for neural population decoding.Advances in Neural Information Processing Systems, 36: 44937–44956, 2023

Mehdi Azabou, Vinam Arora, Venkataramana Ganesh, Ximeng Mao, Santosh Nachimuthu, Michael Mendelson, Blake Richards, Matthew Perich, Guillaume Lajoie, and Eva Dyer. A unified, scalable framework for neural population decoding.Advances in Neural Information Processing Systems, 36: 44937–44956, 2023

work page 2023

[22] [22]

Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri.Cerebral cortex, 28(9):3095–3114, 2018

Alexander Schaefer, Ru Kong, Evan M Gordon, Timothy O Laumann, Xi-Nian Zuo, Avram J Holmes, Simon B Eickhoff, and BT Thomas Yeo. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri.Cerebral cortex, 28(9):3095–3114, 2018

work page 2018

[23] [23]

Fine-grain atlases of functional modes for fmri analysis.NeuroImage, 221:117126, 2020

Kamalaker Dadi, Gaël Varoquaux, Antonia Machlouzarides-Shalit, Krzysztof J Gorgolewski, Demian Wassermann, Bertrand Thirion, and Arthur Mensch. Fine-grain atlases of functional modes for fmri analysis.NeuroImage, 221:117126, 2020

work page 2020

[24] [24]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021

work page 2021

[25] [25]

The human connectome: a structural description of the human brain.PLoS computational biology, 1(4):e42, 2005

Olaf Sporns, Giulio Tononi, and Rolf Kötter. The human connectome: a structural description of the human brain.PLoS computational biology, 1(4):e42, 2005

work page 2005

[26] [26]

The organization of the human cerebral cortex estimated by intrinsic functional connectivity.Journal of neurophysiology, 2011

BT Thomas Yeo, Fenna M Krienen, Jorge Sepulcre, Mert R Sabuncu, Danial Lashkari, Marisa Hollinshead, Joshua L Roffman, Jordan W Smoller, Lilla Zöllei, Jonathan R Polimeni, et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity.Journal of neurophysiology, 2011

work page 2011

[27] [27]

Geometric constraints on human brain function.Nature, 618(7965): 566–574, 2023

James C Pang, Kevin M Aquino, Marianne Oldehinkel, Peter A Robinson, Ben D Fulcher, Michael Breakspear, and Alex Fornito. Geometric constraints on human brain function.Nature, 618(7965): 566–574, 2023

work page 2023

[28] [28]

The bitter lesson.Incomplete Ideas (blog), 13(1):38, 2019

Richard Sutton. The bitter lesson.Incomplete Ideas (blog), 13(1):38, 2019

work page 2019

[29] [29]

Stanford cs25: V4

Hyung Won Chung. Stanford cs25: V4. https://youtu.be/3gb-ZkVRemQ?si=7FXnklTS9X3FCuv1,

work page

[30] [30]

YouTube video, Stanford University

work page

[31] [31]

Pycortex: an interactive surface visualizer for fmri.Frontiers in neuroinformatics, 9:23, 2015

James S Gao, Alexander G Huth, Mark D Lescroart, and Jack L Gallant. Pycortex: an interactive surface visualizer for fmri.Frontiers in neuroinformatics, 9:23, 2015

work page 2015

[32] [32]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https...

work page 2021

[33] [33]

Masked autoencoders as spatiotemporal learners

Christoph Feichtenhofer, Yanghao Li, Kaiming He, et al. Masked autoencoders as spatiotemporal learners. Advances in neural information processing systems, 35:35946–35958, 2022

work page 2022

[34] [34]

The wu-minn human connectome project: an overview.Neuroimage, 80: 62–79, 2013

David C Van Essen, Stephen M Smith, Deanna M Barch, Timothy EJ Behrens, Essa Yacoub, Kamil Ugurbil, Wu-Minn HCP Consortium, et al. The wu-minn human connectome project: an overview.Neuroimage, 80: 62–79, 2013

work page 2013

[35] [35]

Cortical surface-based analysis: I

Anders M Dale, Bruce Fischl, and Martin I Sereno. Cortical surface-based analysis: I. segmentation and surface reconstruction.Neuroimage, 9(2):179–194, 1999

work page 1999

[36] [36]

Freesurfer.Neuroimage, 62(2):774–781, 2012

Bruce Fischl. Freesurfer.Neuroimage, 62(2):774–781, 2012

work page 2012

[37] [37]

The minimal preprocessing pipelines for the human connectome project.Neuroimage, 80:105–124, 2013

Matthew F Glasser, Stamatios N Sotiropoulos, J Anthony Wilson, Timothy S Coalson, Bruce Fischl, Jesper L Andersson, Junqian Xu, Saad Jbabdi, Matthew Webster, Jonathan R Polimeni, et al. The minimal preprocessing pipelines for the human connectome project.Neuroimage, 80:105–124, 2013

work page 2013

[38] [38]

fmriprep: a robust preprocessing pipeline for functional mri.Nature methods, 16(1):111–116, 2019

Oscar Esteban, Christopher J Markiewicz, Ross W Blair, Craig A Moodie, A Ilkay Isik, Asier Erra- muzpe, James D Kent, Mathias Goncalves, Elizabeth DuPre, Madeleine Snyder, et al. fmriprep: a robust preprocessing pipeline for functional mri.Nature methods, 16(1):111–116, 2019

work page 2019

[39] [39]

A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence.Nature neuroscience, 25(1):116–126, 2022

Emily J Allen, Ghislain St-Yves, Yihan Wu, Jesse L Breedlove, Jacob S Prince, Logan T Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, et al. A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence.Nature neuroscience, 25(1):116–126, 2022

work page 2022

[40] [40]

Image processing and quality control for the first 10,000 brain imaging datasets from uk biobank.Neuroimage, 166:400–424, 2018

Fidel Alfaro-Almagro, Mark Jenkinson, Neal K Bangerter, Jesper LR Andersson, Ludovica Griffanti, Gwenaëlle Douaud, Stamatios N Sotiropoulos, Saad Jbabdi, Moises Hernandez-Fernandez, Emmanuel Vallee, et al. Image processing and quality control for the first 10,000 brain imaging datasets from uk biobank.Neuroimage, 166:400–424, 2018

work page 2018

[41] [41]

Sources and implications of whole-brain fmri signals in humans.Neuroimage, 146:609–625, 2017

Jonathan D Power, Mark Plitt, Timothy O Laumann, and Alex Martin. Sources and implications of whole-brain fmri signals in humans.Neuroimage, 146:609–625, 2017

work page 2017

[42] [42]

Videomae v2: Scaling video masked autoencoders with dual masking

Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, and Yu Qiao. Videomae v2: Scaling video masked autoencoders with dual masking. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14549–14560, 2023

work page 2023

[43] [43]

Functional annotation of human cognitive states using deep graph convolution.NeuroImage, 231:117847, 2021

Yu Zhang, Loïc Tetrel, Bertrand Thirion, and Pierre Bellec. Functional annotation of human cognitive states using deep graph convolution.NeuroImage, 231:117847, 2021

work page 2021

[44] [44]

Deep learning models of cognitive processes constrained by human brain connectomes.Medical image analysis, 80:102507, 2022

Yu Zhang, Nicolas Farrugia, and Pierre Bellec. Deep learning models of cognitive processes constrained by human brain connectomes.Medical image analysis, 80:102507, 2022

work page 2022

[45] [45]

Brain decoding of the human connectome project tasks in a dense individual fmri dataset.NeuroImage, 283:120395, 2023

Shima Rastegarnia, Marie St-Laurent, Elizabeth DuPre, Basile Pinsard, and Pierre Bellec. Brain decoding of the human connectome project tasks in a dense individual fmri dataset.NeuroImage, 283:120395, 2023

work page 2023

[46] [46]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. InEuropean conference on computer vision, pages 740–755. Springer, 2014

work page 2014

[47] [47]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

work page 2021

[48] [48]

Self-supervised learning from images with a joint-embedding predictive architecture

Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15619–15629, 2023

work page 2023

[49] [49]

Cluster and predict latents patches for improved masked image modeling.Transactions on Machine Learning Research, 2025

Timothée Darcet, Federico Baldassarre, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. Cluster and predict latents patches for improved masked image modeling.Transactions on Machine Learning Research, 2025. ISSN 2835-8856. URLhttps://openreview.net/forum?id=Ycmz7qJxUQ

work page 2025

[50] [50]

Brain connectivity related to working memory performance.Journal of Neuroscience, 26(51):13338–13343, 2006

Michelle Hampson, Naomi R Driesen, Pawel Skudlarski, John C Gore, and R Todd Constable. Brain connectivity related to working memory performance.Journal of Neuroscience, 26(51):13338–13343, 2006

work page 2006

[51] [51]

Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity.Nature neuroscience, 18(11):1664–1671, 2015

Emily S Finn, Xilin Shen, Dustin Scheinost, Monica D Rosenberg, Jessica Huang, Marvin M Chun, Xenophon Papademetris, and R Todd Constable. Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity.Nature neuroscience, 18(11):1664–1671, 2015. 8

work page 2015

[52] [52]

Meta-matching as a simple framework to translate phenotypic predictive models from big to small data.Nature neuroscience, 25(6):795–804, 2022

Tong He, Lijun An, Pansheng Chen, Jianzhong Chen, Jiashi Feng, Danilo Bzdok, Avram J Holmes, Simon B Eickhoff, and BT Thomas Yeo. Meta-matching as a simple framework to translate phenotypic predictive models from big to small data.Nature neuroscience, 25(6):795–804, 2022

work page 2022

[53] [53]

Masked autoencoders for low-dose ct denoising

Dayang Wang, Yongshun Xu, Shuo Han, and Hengyong Yu. Masked autoencoders for low-dose ct denoising. In2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), pages 1–4. IEEE, 2023

work page 2023

[54] [54]

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models.arXiv preprint arXiv:2001.08361, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2001

[55] [55]

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models.arXiv preprint arXiv:2203.15556, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[56] [56]

V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, et al

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, et al. DINOv2: Learning robust visual features without supervision.Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URLhttps://openreview.net/forum?id=a68SUt6zFt. F...

work page 2024

[57] [57]

Flexivit: One model for all patch sizes

Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, and Filip Pavetic. Flexivit: One model for all patch sizes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14496–14506, 2023

work page 2023

[58] [58]

Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data

Paul Steven Scotti, Mihir Tripathy, Cesar Torrico, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A Norman, et al. Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data. InForty-first International Conference on Machine Learning, 2024

work page 2024

[59] [59]

Mindbridge: A cross-subject brain decoding framework

Shizun Wang, Songhua Liu, Zhenxiong Tan, and Xinchao Wang. Mindbridge: A cross-subject brain decoding framework. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11333–11342, 2024

work page 2024

[60] [60]

Mindaligner: Explicit brain functional alignment for cross-subject visual decoding from limited fMRI data

Yuqin Dai, Zhouheng Yao, Chunfeng Song, Qihao Zheng, Weijian Mai, Kunyu Peng, Shuai Lu, Wanli Ouyang, Jian Yang, and Jiamin Wu. Mindaligner: Explicit brain functional alignment for cross-subject visual decoding from limited fMRI data. InForty-second International Conference on Machine Learning,

work page

[61] [61]

URLhttps://openreview.net/forum?id=1W2WlYRq0K

work page

[62] [62]

Human connectome project informatics: quality control, database services, and data visualization.Neuroimage, 80:202–219, 2013

Daniel S Marcus, Michael P Harms, Abraham Z Snyder, Mark Jenkinson, J Anthony Wilson, Matthew F Glasser, Deanna M Barch, Kevin A Archie, Gregory C Burgess, Mohana Ramaratnam, et al. Human connectome project informatics: quality control, database services, and data visualization.Neuroimage, 80:202–219, 2013

work page 2013

[63] [63]

Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. Scipy 1.0: fundamental algorithms for scientific computing in python.Nature methods, 17(3):261–272, 2020

work page 2020

[64] [64]

Advances in functional and structural mr image analysis and implementation as fsl.Neuroimage, 23:S208–S219, 2004

Stephen M Smith, Mark Jenkinson, Mark W Woolrich, Christian F Beckmann, Timothy EJ Behrens, Heidi Johansen-Berg, Peter R Bannister, Marilena De Luca, Ivana Drobnjak, David E Flitney, et al. Advances in functional and structural mr image analysis and implementation as fsl.Neuroimage, 23:S208–S219, 2004

work page 2004

[65] [65]

Cortical analysis of heterogeneous clinical brain mri scans for large-scale neuroimaging studies

Karthik Gopinath, Douglas N Greve, Sudeshna Das, Steve Arnold, Colin Magdamo, and Juan Eugenio Iglesias. Cortical analysis of heterogeneous clinical brain mri scans for large-scale neuroimaging studies. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 35–45. Springer, 2023

work page 2023

[66] [66]

Decoupled Weight Decay Regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[67] [67]

SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts.arXiv preprint arXiv:1608.03983, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[68] [68]

Augment your batch: Improving generalization through instance repetition

Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, and Daniel Soudry. Augment your batch: Improving generalization through instance repetition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8129–8138, 2020. 9

work page 2020

[69] [69]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Leland McInnes, John Healy, and James Melville. Umap: Uniform manifold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[70] [70]

Spurious reconstruction from brain activity.Neural Networks, page 107515, 2025

Ken Shirakawa, Yoshihiro Nagano, Misato Tanaka, Shuntaro C Aoki, Yusuke Muraki, Kei Majima, and Yukiyasu Kamitani. Spurious reconstruction from brain activity.Neural Networks, page 107515, 2025. 10 A Author contributions Connor Laneconceived and implemented the flat map strategy, developed the project framing, wrote the majority of the code, trained all t...

work page 2025

[71] [71]

We fit a regular grid of size height × width = 224×560 to the array of (x, y) points contained in the mask

to yield a valid flat map mask of containing 58212 vertices across both cortical hemispheres. We fit a regular grid of size height × width = 224×560 to the array of (x, y) points contained in the mask. The grid has a pixel resolution of 1.2mm in flat map coordinates, which equals the mean nearest neighbor distance. To project surface-mapped fMRI data onto...

work page