Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment

Ayumu Yamashita; Kaoru Amano; Ken Nakamura; Ryuto Yashiro; Tomoya Nakai

arxiv: 2605.20127 · v1 · pith:NLM25M2Lnew · submitted 2026-05-19 · 🧬 q-bio.NC · cs.AI · cs.LG

Pith reviewed 2026-05-20 03:19 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.AI · cs.LG

keywords model-brain alignment · prediction accuracy · fMRI · visual cortex · recovery profiles · Natural Scenes Dataset · reproducible dimensions · brain-to-brain alignment

The pith

Prediction accuracy can mask model-brain mismatches because it does not reveal which specific reproducible dimensions of brain responses are recovered.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard accuracy metrics for how well vision models predict brain activity leave open the question of which parts of the brain's response space are actually captured. It proposes a framework that first isolates the reproducible dimensions of target brain responses using repeated fMRI measurements, then measures how strongly each of those dimensions is recovered by predictions from either other brains or model representations. This matters because two predictors can achieve the same overall accuracy while recovering different subsets of the stable brain dimensions, producing different pictures of alignment quality. Applied to the Natural Scenes Dataset, the method finds that early-to-intermediate visual cortex responses are low-dimensional, that brain-to-brain comparisons supply a human reference for recoverability, and that some models display distinct recovery patterns despite matched accuracy. The result is a diagnostic tool that makes explicit the dimensions driving any given alignment score.

Core claim

By first identifying target-brain response dimensions that can be reproducibly predicted across independent trial splits and then quantifying the recovery strength of each dimension under predictions from models or other subjects' brains, the recovery-profile framework distinguishes alignments that scalar prediction accuracy treats as equivalent. In the examined subset of the Natural Scenes Dataset, early-to-intermediate visual-cortex responses form a low-dimensional set of reproducible dimensions; brain-to-brain predictions identify which of these are consistently recoverable from other subjects, while pretrained and randomly initialized models sometimes match in accuracy yet differ in the recovery profiles they show across these dimensions.

What carries the argument

The target-space recovery profile, which first extracts reproducible dimensions from repeated fMRI trial splits of the target brain responses and then reports the recovery strength of each dimension under external predictions.
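The two stages described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's predictive subspace fitting operator: it takes the reproducible directions to be the top eigenvectors of a symmetrized cross-covariance between two split-half views, and it measures recovery as a per-direction Pearson correlation on held-out data. The function names, the simulated data, and both of those choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_voxels, n_signal = 1000, 50, 3
scales = np.array([6.0, 4.0, 2.0])

# Simulated target: three stimulus-driven latent dimensions plus independent noise.
latents = rng.standard_normal((n_images, n_signal))
mixing = np.linalg.qr(rng.standard_normal((n_voxels, n_signal)))[0]  # orthonormal columns
signal = (latents * scales) @ mixing.T

def noisy():
    """One averaged view of the target: shared signal plus fresh noise."""
    return signal + rng.standard_normal((n_images, n_voxels))

view_a, view_b, held_out = noisy(), noisy(), noisy()

def reproducible_directions(a, b, rank):
    """Assumed estimator: top eigenvectors of the symmetrized cross-view covariance."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    cross = (a.T @ b + b.T @ a) / (2 * len(a))
    evals, evecs = np.linalg.eigh(cross)
    order = np.argsort(evals)[::-1][:rank]
    weights = np.clip(evals[order], 0, None)
    return evecs[:, order], weights / weights.sum()

def recovery_profile(target, prediction, directions):
    """Pearson correlation between target and prediction along each direction."""
    t = (target - target.mean(axis=0)) @ directions
    p = (prediction - prediction.mean(axis=0)) @ directions
    return np.array([np.corrcoef(t[:, k], p[:, k])[0, 1]
                     for k in range(directions.shape[1])])

directions, weights = reproducible_directions(view_a, view_b, rank=n_signal)

# Hypothetical source whose prediction carries only the first two latent dimensions.
partial = (latents[:, :2] * scales[:2]) @ mixing[:, :2].T
profile = recovery_profile(held_out, partial, directions)
print("reference weights:", np.round(weights, 2))
print("recovery profile: ", np.round(profile, 2))
```

In this toy setting the profile is high on the two leading directions and near zero on the third, which is the kind of per-dimension information a single accuracy score would not expose.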

If this is right

Models or brains with matched prediction accuracy can exhibit distinct recovery profiles, indicating that accuracy alone does not guarantee equivalent alignment.
Brain-to-brain recovery profiles supply a human reference that identifies which reproducible dimensions are consistently recoverable across subjects.
Early-to-intermediate visual cortex contains a low-dimensional set of reproducible response dimensions that can be used as the basis for finer-grained alignment tests.
The framework applies equally to model-brain and brain-brain comparisons, allowing direct comparison of their recovery strengths on the same dimensions.
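The first implication in the list above, matched accuracy with distinct recovery profiles, can be shown with a constructed toy case. This sketch uses pooled R² as a stand-in for prediction accuracy and per-direction R² as a stand-in for recovery strength; both metrics and all variable names are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_voxels = 200, 20

# Two mean-zero, exactly orthogonal, equal-variance latent responses.
raw = rng.standard_normal((n_images, 2))
raw -= raw.mean(axis=0)
latents = np.linalg.qr(raw)[0] * np.sqrt(n_images)

# Two orthonormal voxel patterns, one per latent dimension.
patterns = np.linalg.qr(rng.standard_normal((n_voxels, 2)))[0]
target = latents @ patterns.T

pred_a = np.outer(latents[:, 0], patterns[:, 0])  # recovers only dimension 1
pred_b = np.outer(latents[:, 1], patterns[:, 1])  # recovers only dimension 2

def r_squared(y, y_hat):
    """Fraction of total target variance explained, pooled over columns."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean(axis=0)) ** 2)

def per_dimension_r2(y, y_hat, dirs):
    """Explained variance along each target direction (columns of dirs)."""
    return np.array([r_squared(y @ d[:, None], y_hat @ d[:, None]) for d in dirs.T])

acc_a, acc_b = r_squared(target, pred_a), r_squared(target, pred_b)
profile_a = per_dimension_r2(target, pred_a, patterns)
profile_b = per_dimension_r2(target, pred_b, patterns)
print("accuracy:", round(acc_a, 3), round(acc_b, 3))
print("profiles:", np.round(profile_a, 3), np.round(profile_b, 3))
```

Both predictors explain half of the target variance, yet each recovers a different dimension completely and the other not at all.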

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If recovery profiles differ systematically across model classes, one could test whether those differences predict distinct patterns of errors on downstream visual tasks that probe the missed dimensions.
The same reproducible-dimension approach could be extended to evaluate alignment in other sensory modalities or brain regions where repeated measurements are available.
Optimizing a model explicitly for higher recovery of specific dimensions rather than overall accuracy might produce representations that better generalize to human-like behavior on targeted visual judgments.

Load-bearing premise

The dimensions that emerge as reproducible from repeated fMRI trial splits are stable, meaningful features of the target brain response space whose recovery is the right criterion for judging alignment quality.

What would settle it

A direct comparison in which models or brains that differ in recovery profiles across the reproducible dimensions nevertheless produce identical behavioral predictions or task performance on visual judgments tied to those dimensions would show that the profiles do not add diagnostic information beyond accuracy.

Figures

Figures reproduced from arXiv: 2605.20127 by Ayumu Yamashita, Kaoru Amano, Ken Nakamura, Ryuto Yashiro, Tomoya Nakai.

**Figure 1.** Overview of the predictive subspace framework. Repeated target-brain responses are split into two averaged views, and the predictive subspace fitting operator (∗) between views estimates the reproducible target reference. The same operator is applied to each source, either another subject’s brain responses or a model representation, to predict target responses. The predicted response patterns span a source…

**Figure 2.** Reproducible target references across ROIs. (A) Independently estimated target references converge toward the estimate obtained from all split-half partitions. (B, C) Reference weights are concentrated but distributed across several target-reference dimensions. (D) Readouts restricted to the selected predictive subspace preserve held-out prediction accuracy relative to full-representation readouts and out…

**Figure 3.** Models differ in recovered dimensions beyond prediction accuracy. (A) A matched VGG-16 case has nearly identical prediction accuracy but different recovery profiles. (B) Prediction accuracy and the overall height of the recovery curve are related but not interchangeable. (C) Near-equal-accuracy pairs show small differences between brain source pairs but larger differences for source pairs involving models;…

**Figure 4.** Robustness and controls. (A) Trial-partition resampling sensitivity measured as self-subspace prediction accuracy divided by full-feature readout accuracy. (B) Rank-rule sensitivity for random and pretrained models; error bars show SEM across architectures within each training state and rank rule. (C) Target-side PCA overlaps with part of the reproducible target reference, as expected for reliable high-var…

**Figure 5.** Brain-to-brain recovery profiles. (A) Brain-to-brain top-k reference-coverage curves for all visual ROIs. Top-k reference coverage is a reference-weighted prefix average, not a cumulative sum. (B) Brain-to-brain prediction accuracy and brain-source profile mean are related but not identical, which shows why brain-to-brain comparison is more informative as a recovery profile than a scalar ceiling alone. R18…

**Figure 6.** Dataset-shift analysis. NSD-synthetic evaluates structured recovery outside the main natural-image dataset. Pretrained models remain high across datasets, while random controls show stronger architecture dependence, especially for ViT on synthetic images.

**Figure 7.** Axis-wise directional coverage. Each panel shows DirCov for the first ten reference directions in one visual ROI. Lines show brain-to-brain means, pretrained model means, and random-model means; gray bars show the normalized reproducible target reference weight for each direction. Axis-wise coverage is diagnostic, especially for the leading directions, but later low-mass directions are weaker and more vari…

**Figure 8.** NSD-core-shared model recovery profiles by ROI and architecture. Each panel shows top-k reference coverage for one visual ROI and one architecture. This quantity is a prefix average rather than a cumulative sum. Curves summarize four-seed, randomly initialized models, ImageNet-pretrained models, and the brain-source recovery profile, all over the same reproducible target reference. Shaded bands show SEM ac…

**Figure 9.** NSD-synthetic model recovery profiles by ROI and architecture. The layout matches…

**Figure 10.** Layer-wise and objective diagnostics. (A) Brain-source-referenced profile score across normalized layer depth for representative random and pretrained models. Labels mark the best layer for each curve. (B) Representative top-k reference-coverage profiles show that high-scoring pretrained layers remain closer to the brain-source profile, whereas random ViT is dominated by the leading direction and drops sh…

**Figure 11.** Source-side explanation diagnostics. (A) Brain-source-referenced profile score plotted against the selected source rank. (B) The source effective rank, computed from nonnegative predictive strengths across the selected source coordinates, serves as a diagnostic of compactness. (C) Source-side fold stability is the median normalized projector overlap between source-side predictive subspaces selected on dif…

**Figure 12.** High-accuracy and brain-source profile-shape diagnostics. (A) The top quartile of prediction accuracy is shaded. (B) Within this high-accuracy subset, brain-source-referenced profile scores remain separated between random and pretrained sources; boxplots show medians, interquartile ranges, and 1.5-IQR whiskers. (C) Brain-source profile-shape distance compares the mean-normalized top-k curve with the brain…
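The captions for Figures 5 and 8 stress that top-k reference coverage is a reference-weighted prefix average rather than a cumulative sum. The sketch below shows one plausible reading of that distinction: the coverage of the first k directions is averaged with the reference weights as weights. The formula, the function name `top_k_coverage`, and the example numbers are assumptions, since the captions do not give the exact definition.

```python
import numpy as np

def top_k_coverage(weights, coverage):
    """Assumed definition: sum_{i<=k} w_i * c_i / sum_{i<=k} w_i for each prefix k."""
    w = np.asarray(weights, dtype=float)
    c = np.asarray(coverage, dtype=float)
    return np.cumsum(w * c) / np.cumsum(w)

# Hypothetical reference weights and per-direction coverage values.
weights = np.array([0.5, 0.3, 0.2])
coverage = np.array([0.9, 0.6, 0.1])

prefix_avg = top_k_coverage(weights, coverage)
cumulative = np.cumsum(weights * coverage)

print("prefix average:", prefix_avg)
print("cumulative sum:", cumulative)
```

Under this reading the curve can fall as weakly recovered directions enter the prefix, whereas a cumulative sum of nonnegative terms can only rise. The two coincide at the final k only when the weights sum to one.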

Original abstract

Artificial vision models are often evaluated against the human visual cortex by measuring how accurately their internal representations predict brain responses. However, prediction accuracy alone does not indicate which dimensions of the target brain's response space are recovered. Here, we introduce a unified framework for evaluating both model-brain and brain-brain alignment by identifying the response dimensions recovered by prediction. Using repeated fMRI measurements, we first identify target-brain response dimensions that can be reproducibly predicted across independent trial splits. We then predict target-brain responses from either another subject's brain responses or a vision model's internal representations, and quantify how strongly each of these reproducible response dimensions is recovered. Applying this framework to a subset of the Natural Scenes Dataset, in which eight subjects viewed the same natural images during fMRI, we find that the early-to-intermediate visual-cortex responses contain a low-dimensional set of reproducible dimensions. Brain-to-brain comparisons identify which of these dimensions are consistently recoverable from other subjects' brains, providing a diagnostic human reference rather than only a scalar benchmark. In some cases, pretrained and randomly initialized models achieve similar prediction accuracy while showing distinct recovery profiles across these response dimensions. These results show that prediction accuracy alone can mask model-brain mismatches. By making explicit which reproducible brain response dimensions are recovered by prediction, our framework provides a more diagnostic evaluation of alignment between artificial vision models and the human visual cortex.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Recovery profiles usefully flag cases where similar accuracy hides different recovered dimensions, but the value depends on whether split-reproducible dimensions truly isolate stable signal.

The main point is that prediction accuracy alone can mask mismatches in which brain response dimensions get recovered, and this paper supplies recovery profiles as a diagnostic layer on top of existing methods. They first use repeated fMRI trials on the same images to flag reproducible dimensions via independent splits, then measure recovery strength for each when predicting from models or other brains. On a subset of the Natural Scenes Dataset with eight subjects, they report a low-dimensional set of reproducible dimensions in early-to-intermediate visual cortex, use brain-to-brain recovery as a human reference, and show pretrained versus random models with comparable accuracy but distinct per-dimension profiles.

This demonstrates the core claim without overreaching. The two-step split-and-held-out procedure is a reasonable way to ground the evaluation, and the unified treatment of model-brain and brain-brain alignment makes the framework practical for the subfield. The results on the NSD data illustrate the point concretely.

The soft spot is the assumption that dimensions identified as reproducible across trial splits represent stable, meaningful features rather than shared scanner noise, hemodynamic effects, or artifacts from the particular split. The stress-test note is on target here; if dimension selection lacks clear statistical controls or is sensitive to partitioning, the profiles may differ from accuracy without revealing deeper neural-code issues. The abstract outlines a coherent approach with held-out quantification, so the concern is moderate rather than fatal, but full methods details on thresholds and robustness checks would be needed to settle it.

This is for researchers in visual neuroscience who run model-brain alignment studies and want evaluation tools beyond scalar accuracy. Readers already working with fMRI prediction or looking for better diagnostics would find the framework and NSD application worth their time. It deserves peer review because the idea is a straightforward, usable extension and the initial evidence is sharp enough to benefit from referee input on the reproducibility step.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a framework to evaluate model-brain alignment by first identifying reproducible dimensions in fMRI brain responses using independent trial splits, then measuring the recovery strength of these dimensions in predictions from artificial vision models or other brains. Applied to a subset of the Natural Scenes Dataset with eight subjects, the authors report that early-to-intermediate visual cortex responses are low-dimensional and reproducible, that brain-to-brain recovery provides a diagnostic reference, and that models with comparable prediction accuracy can exhibit distinct recovery profiles, indicating that accuracy alone can mask mismatches in alignment.

Significance. If the core procedure is robust, the framework offers a more granular diagnostic for alignment evaluations than scalar accuracy, with brain-to-brain comparisons serving as an internal human benchmark. The approach leverages repeated measurements and held-out recovery quantification, which are positive features for reproducibility. It could help distinguish cases where models match overall variance but differ in the specific neural dimensions recovered.

major comments (3)

§3 (Identifying reproducible dimensions): The manuscript must clarify whether the reproducibility threshold or selection criterion is fixed a priori or determined post-hoc from the data; if the latter, this couples the target subspace definition to the same split properties used for recovery and risks inflating apparent diagnostic power of the profiles.
§4 (Recovery quantification): Per-dimension recovery metrics lack reported error bars, cross-validation details, or correction for multiple comparisons across dimensions; without these, differences in recovery profiles between models (or vs. brain-to-brain) cannot be distinguished from sampling variability in the fMRI responses.
§2–3 (Trial-split procedure): The claim that split-based dimensions isolate stable signal rather than shared noise or hemodynamic artifacts requires explicit validation, such as consistency across alternative trial partitions or correlation with independent stimulus properties; absent this, the recovery profiles may not diagnose deeper code mismatches as asserted.
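The consistency check across alternative trial partitions requested in the last comment could look roughly like the following. This is a hedged sketch on simulated data: the subspace estimator (top eigenvectors of a symmetrized split-half cross-covariance), the four repeats per image, and the function names are all assumptions. The normalized projector overlap is borrowed from the description in the Figure 11 caption, where it is used on the source side.

```python
import numpy as np

rng = np.random.default_rng(2)
n_images, n_repeats, n_voxels, rank = 500, 4, 30, 2

# Simulated repeated trials: a rank-2 stimulus-driven signal plus independent trial noise.
patterns = np.linalg.qr(rng.standard_normal((n_voxels, rank)))[0]
signal = (rng.standard_normal((n_images, rank)) * np.array([4.0, 3.0])) @ patterns.T
trials = signal[:, None, :] + rng.standard_normal((n_images, n_repeats, n_voxels))

def split_subspace(trials, perm, rank):
    """Estimate a reproducible subspace from one random split of the repeats."""
    half = trials.shape[1] // 2
    a = trials[:, perm[:half]].mean(axis=1)
    b = trials[:, perm[half:]].mean(axis=1)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    cross = (a.T @ b + b.T @ a) / (2 * len(a))
    evals, evecs = np.linalg.eigh(cross)
    return evecs[:, np.argsort(evals)[::-1][:rank]]

def projector_overlap(u, v):
    """||U^T V||_F^2 / rank: 1 for identical subspaces, about rank / n_voxels for random ones."""
    return np.linalg.norm(u.T @ v) ** 2 / u.shape[1]

n_partitions = 5
bases = [split_subspace(trials, rng.permutation(n_repeats), rank)
         for _ in range(n_partitions)]
overlaps = [projector_overlap(bases[i], bases[j])
            for i in range(n_partitions) for j in range(i + 1, n_partitions)]
median_overlap = float(np.median(overlaps))
print("median projector overlap across partitions:", round(median_overlap, 3))
```

A high overlap only shows insensitivity to how the repeats are partitioned. It does not by itself rule out structure shared across all repeats, such as scanner or hemodynamic artifacts, which is the part of the referee's concern that needs independent stimulus properties or additional data.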

minor comments (2)

[Abstract / Methods] The abstract and methods should specify the exact number of images, trials per split, and subjects used from the Natural Scenes Dataset subset for reproducibility.
[Figures] Figure legends should explicitly define the recovery strength metric (e.g., correlation or R² per dimension) and the scale used for profile visualization.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We have addressed each major point below and believe the revisions will improve the clarity and statistical rigor of the work.

Referee: §3 (Identifying reproducible dimensions): The manuscript must clarify whether the reproducibility threshold or selection criterion is fixed a priori or determined post-hoc from the data; if the latter, this couples the target subspace definition to the same split properties used for recovery and risks inflating apparent diagnostic power of the profiles.

Authors: We appreciate the referee's emphasis on this methodological detail. The reproducibility threshold in our framework is a fixed statistical criterion (reproducibility p < 0.05 after Bonferroni correction across voxels) chosen a priori based on standard practices in fMRI reliability analyses, rather than optimized post-hoc to maximize recovery. The dimension selection uses one pair of independent trial splits, while recovery is quantified on a fully held-out third split. We will revise the Methods section to state this explicitly and add a brief discussion noting that the held-out recovery measurement prevents direct circularity. These changes should address the concern about inflated diagnostic power. revision: yes
Referee: §4 (Recovery quantification): Per-dimension recovery metrics lack reported error bars, cross-validation details, or correction for multiple comparisons across dimensions; without these, differences in recovery profiles between models (or vs. brain-to-brain) cannot be distinguished from sampling variability in the fMRI responses.

Authors: We agree that these elements are essential for interpreting differences in recovery profiles. In the revised manuscript we will add bootstrap-derived standard errors (resampling across subjects and image trials) for all per-dimension recovery values. We will also specify the nested cross-validation procedure used to compute recovery and apply FDR correction for multiple comparisons across the selected dimensions. These additions will allow readers to evaluate whether observed profile differences exceed sampling variability. revision: yes
Referee: §2–3 (Trial-split procedure): The claim that split-based dimensions isolate stable signal rather than shared noise or hemodynamic artifacts requires explicit validation, such as consistency across alternative trial partitions or correlation with independent stimulus properties; absent this, the recovery profiles may not diagnose deeper code mismatches as asserted.

Authors: This is a fair critique of the interpretive strength of the split-based dimensions. While the core procedure relies on independent trial splits to emphasize stable signal, we acknowledge that further validation is warranted. In the revision we will include supplementary results showing that the identified low-dimensional subspaces remain consistent when alternative random partitions of the trials are used. We will also report correlations of these dimensions with basic stimulus properties (e.g., spatial frequency content and contrast) to help distinguish signal from potential artifacts. A complete exclusion of all hemodynamic confounds would require additional datasets with varied acquisition parameters, which lies outside the current study; we will note this limitation explicitly. revision: partial
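The bootstrap standard errors and FDR correction mentioned in the responses above can be sketched as follows. This is an illustration only: it resamples images (not subjects), uses Pearson correlation as the per-dimension recovery value, and implements the Benjamini-Hochberg step-up rule by hand. The function names, the simulated projections, and the example p-values are hypothetical.

```python
import numpy as np

def bootstrap_se(target_proj, pred_proj, n_boot=500, seed=0):
    """Standard error of per-dimension Pearson r, resampling images with replacement."""
    rng = np.random.default_rng(seed)
    n, k = target_proj.shape
    stats = np.empty((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        t, p = target_proj[idx], pred_proj[idx]
        t = (t - t.mean(axis=0)) / t.std(axis=0)
        p = (p - p.mean(axis=0)) / p.std(axis=0)
        stats[b] = (t * p).mean(axis=0)  # Pearson r for each dimension
    return stats.std(axis=0, ddof=1)

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha (step-up rule)."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, len(p) + 1) / len(p)
    passed = p[order] <= thresholds
    reject = np.zeros(len(p), dtype=bool)
    if passed.any():
        cutoff = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
        reject[order[: cutoff + 1]] = True
    return reject

# Simulated projections onto two target dimensions: one well recovered, one not.
rng = np.random.default_rng(3)
target_proj = rng.standard_normal((300, 2))
pred_proj = (target_proj * np.array([1.0, 0.0])
             + rng.standard_normal((300, 2)) * np.array([0.3, 1.0]))

se = bootstrap_se(target_proj, pred_proj)
rejected = benjamini_hochberg([0.001, 0.04, 0.03, 0.8])
print("bootstrap SE per dimension:", np.round(se, 3))
print("rejected after BH:", rejected)
```

A hierarchical bootstrap over subjects and images, as the response describes, would wrap the image resampling inside a resampling of subjects; that extension is left out here to keep the sketch short.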

Circularity Check

0 steps flagged

No significant circularity; recovery profiles grounded in independent splits

The paper first identifies reproducible dimensions via prediction across independent fMRI trial splits on repeated measurements, then separately quantifies recovery strength of those dimensions under model or brain-to-brain predictions. This two-stage structure uses held-out splits for identification and applies the metric to distinct prediction sources, avoiding reduction of the evaluation to its own inputs by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework depends on standard assumptions about fMRI signal reliability and the interpretability of linear or similar prediction models, plus a data-driven definition of reproducible dimensions; no new physical entities are introduced.

free parameters (1)

reproducibility threshold or selection criterion
A cutoff or statistical rule is needed to decide which response dimensions qualify as reproducible across trial splits.

axioms (1)

Domain assumption: Repeated fMRI measurements on the same images allow reliable identification of stable response dimensions in visual cortex.
The method assumes trial-split predictability captures biologically meaningful and stable features rather than noise or task-specific artifacts.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

We introduce a unified framework for evaluating both model–brain and brain–brain alignment by identifying the response dimensions recovered by prediction... reproducible target reference... TopKCov... recovery profile
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery and embed_strictMono unclear

Repeated target-brain responses define the reproducible target reference by identifying dimensions recovered across independent trial splits.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 2 internal anchors

[1]

Allen, Ghislain St-Yves, Yihan Wu, Jesse L

Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick N. Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence.Nature Neuroscience, 25(1):116–126, 2022. doi: 10.103...

work page doi:10.1038/s41593-021-00962-x 2022
[2]

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Abdul Rasheed, Junke Wang, Marco Monteiro, Hu Xu, Shiyu Dong, Nikhila Ravi, Shang-Wen Li, Piotr Dollár, and Christoph Feichtenhofer. Perception Encoder: The best visual embeddings are not at the output of the network. In Adv...

work page 2025
[3]

Correlation calculated from faulty data.British Journal of Psychology, 3(3): 271–295, 1910

William Brown. Some experimental results in the correlation of mental abilities.British Journal of Psychology, 3(3):296–322, 1910. doi: 10.1111/j.2044-8295.1910.tb00207.x

work page doi:10.1111/j.2044-8295.1910.tb00207.x 1910
[4]

A spectral theory of neural prediction and alignment

Abdulkadir Canatar, Jenelle Feather, Albert Wakhloo, and SueYeon Chung. A spectral theory of neural prediction and alignment. InAdvances in Neu- ral Information Processing Systems, volume 36, pages 47052–47080, 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/hash/ 9308d1b7d4ae2d3e2e67ae94b1078bf7-Abstract-Conference.html

work page 2023
[5]

Emerging properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021

work page 2021
[6]

Haxby, and Peter J

Po-Hsuan Chen, Janice Chen, Yaara Yeshurun, Uri Hasson, James V . Haxby, and Peter J. Ra- madge. A reduced-dimension fMRI shared response model. InAdvances in Neural Information Processing Systems, volume 28, pages 460–468, 2015

work page 2015
[7]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021

work page 2021
[8]

Tibshirani.An Introduction to the Bootstrap

Bradley Efron and Robert J. Tibshirani.An Introduction to the Bootstrap. Chapman and Hall/CRC, 1993. doi: 10.1201/9780429246593

work page doi:10.1201/9780429246593 1993
[9]

Robust- ness Python library, 2019

Logan Engstrom, Andrew Ilyas, Hadi Salman, Shibani Santurkar, and Dimitris Tsipras. Robust- ness Python library, 2019. URLhttps://github.com/MadryLab/robustness

work page 2019
[10]

Ky Fan. Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences of the United States of America, 37 (11):760–766, 1951. doi: 10.1073/pnas.37.11.760

work page doi:10.1073/pnas.37.11.760 1951
[11]

Toshev, and Vaishaal Shankar

Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander T. Toshev, and Vaishaal Shankar. Data filtering networks. InThe Twelfth International Conference on Learning Representations, 2024. URLhttps://openreview.net/forum?id=KAk6ngZ09F

work page 2024
[12]

Apurva Ratan Murty, and Aran Nayebi

Jenelle Feather, Meenakshi Khosla, N. Apurva Ratan Murty, and Aran Nayebi. Brain-Model evaluations need the NeuroAI turing test.arXiv preprint arXiv:2502.16238, 2025. doi: 10. 48550/arXiv.2502.16238

work page arXiv 2025
[13]

Gifford, Radoslaw M

Alessandro T. Gifford, Radoslaw M. Cichy, Thomas Naselaris, and Kendrick N. Kay. A 7T fMRI dataset of synthetic images for out-of-distribution modeling of vision.Nature Communications, 17:1589, 2026. doi: 10.1038/s41467-026-69345-9. 10

work page doi:10.1038/s41467-026-69345-9 2026
[14]

Umut Güçlü and Marcel A. J. van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream.The Journal of Neuroscience, 35 (27):10005–10014, 2015. doi: 10.1523/JNEUROSCI.5023-14.2015

work page doi:10.1523/jneurosci.5023-14.2015 2015
[15]

Haxby, J

James V . Haxby, J. Swaroop Guntupalli, Andrew C. Connolly, Yaroslav O. Halchenko, Bryan R. Conroy, M. Ida Gobbini, Michael Hanke, and Peter J. Ramadge. A common, high-dimensional model of the representational space in human ventral temporal cortex.Neuron, 72(2):404–416,

work page
[16]

doi: 10.1016/j.neuron.2011.08.026

work page doi:10.1016/j.neuron.2011.08.026 2011
[17]

Deep residual learning for im- age recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for im- age recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

work page 2016
[18]

Hoerl and Robert W

Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems.Technometrics, 12(1):55–67, 1970. doi: 10.1080/00401706.1970.10488634

work page doi:10.1080/00401706.1970.10488634 1970
[19]

Only brains align with brains: Cross-region alignment patterns expose limits of normative models

Larissa Höfling, Matthias Tangemann, Lotta Piefke, Susanne Keller, Matthias Bethge, and Katrin Franke. Only brains align with brains: Cross-region alignment patterns expose limits of normative models. InThe Fourteenth International Conference on Learning Representations,

work page
[20]

URLhttps://openreview.net/forum?id=cMGJcHHI7d

work page
[21]

Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975

Alan Julian Izenman. Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975. doi: 10.1016/0047-259X(75)90042-1

work page doi:10.1016/0047-259x(75)90042-1 1975
[22]

Nathan C. L. Kong, Eshed Margalit, Justin L. Gardner, and Anthony M. Norcia. Increasing neural network robustness improves match to macaque V1 eigenspectrum, spatial frequency preference and predictivity.PLOS Computational Biology, 18(1):e1009739, 2022. doi: 10. 1371/journal.pcbi.1009739

work page 2022
[23]

Similarity of neural network representations revisited

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. InProceedings of the 36th International Conference on Machine Learning, pages 3519–3529, 2019

work page 2019
[24]

Representational Similarity Analysis – Connecting the Branches of Systems Neuroscience

Nikolaus Kriegeskorte, Marieke Mur, and Peter A. Bandettini. Representational similarity analysis—connecting the branches of systems neuroscience.Frontiers in Systems Neuroscience, 2:4, 2008. doi: 10.3389/neuro.06.004.2008

work page doi:10.3389/neuro.06.004.2008 2008
[25]

Meth- ods for computing the maximum performance of computational models of fMRI responses

Agustin Lage-Castellanos, Giancarlo Valente, Elia Formisano, and Federico De Martino. Meth- ods for computing the maximum performance of computational models of fMRI responses. PLoS Computational Biology, 15(3):e1006397, 2019. doi: 10.1371/journal.pcbi.1006397

work page doi:10.1371/journal.pcbi.1006397 2019
[26]

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021. doi: 10.1109/ICCV48922.2021.00986
[27]

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022. doi: 10.1109/CVPR52688.2022.01167
[28]

Sabine Muzellec and Kohitij Kar. Reverse predictivity for bidirectional comparison of neural networks and biological brains. Nature Machine Intelligence, 8(3):474–488, 2026. doi: 10.1038/s42256-026-01204-0
[29]

Thomas Naselaris, Kendrick N. Kay, Shinji Nishimoto, and Jack L. Gallant. Encoding and decoding in fMRI. NeuroImage, 56(2):400–410, 2011. doi: 10.1016/j.neuroimage.2010.07.073
[30]

Samuel A. Nastase, Valeria Gazzola, Uri Hasson, and Christian Keysers. Measuring shared responses across subjects using intersubject correlation. Social Cognitive and Affective Neuroscience, 14(6):667–685, 2019. doi: 10.1093/scan/nsz037
[31]

Jacob S. Prince, Ian Charest, Jan W. Kurzawski, John A. Pyles, Michael J. Tarr, and Kendrick N. Kay. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11:e77599, 2022. doi: 10.7554/eLife.77599
[32]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, pages 8748–8763, 2021
[33]

Olivier Roy and Martin Vetterli. The effective rank: A measure of effective dimensionality. In 2007 15th European Signal Processing Conference, pages 606–610, 2007
[34]

Ignacio Santamaría, Louis L. Scharf, Chris Peterson, Michael Kirby, and Joseph M. Francos. An order fitting rule for optimal subspace averaging. In 2016 IEEE Statistical Signal Processing Workshop (SSP), pages 1–4, 2016
[35]

Oliver Schoppe, Nicol S. Harper, Ben D. B. Willmore, Andrew J. King, and Jan W. H. Schnupp. Measuring the performance of neural models. Frontiers in Computational Neuroscience, 10:10, 2016. doi: 10.3389/fncom.2016.00010
[37]

João D. Semedo, Amin Zandvakili, Christian K. Machens, Byron M. Yu, and Adam Kohn. Cortical areas interact through a communication subspace. Neuron, 102(1):249–259.e4, 2019. doi: 10.1016/j.neuron.2019.01.026
[38]

Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...

arXiv, 2025. doi: 10.48550/arXiv.2508.10104
[39]

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015
[40]

Charles Spearman. Correlation calculated from faulty data. British Journal of Psychology, 3(3):271–295, 1910. doi: 10.1111/j.2044-8295.1910.tb00206.x
[41]

Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, Olivier Hénaff, Jeremiah Harmsen, Andreas Steiner, and Xiaohua Zhai. SigLIP 2: Multilingual Vision-Language encoders with improved semantic understanding, localization, and dense features....

arXiv, 2025. doi: 10.48550/arXiv.2502.14786
[42]

Daniel L. K. Yamins and James J. DiCarlo. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365, 2016. doi: 10.1038/nn.4244
[43]

Daniel L. K. Yamins, Ha Hong, Charles F. Cadieu, Ethan A. Solomon, Darren Seibert, and James J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014. doi: 10.1073/pnas.1403112111
[44]

Anthony Zador, Sean Escola, Blake Richards, Bence Ölveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, et al. Catalyzing next-generation artificial intelligence through NeuroAI. Nature Communications, 14:1597, 2023. doi: 10.1038/s41467-023-37180-x
[45]

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11975–11986, 2023. doi: 10.1109/ICCV51070.2023.01100

A Experimental setting

The main text summarizes the information needed to interpret the results. This app...
[46]

Source and target matrices are split into outer-training and held-out outer-test images, with standardization statistics estimated only on the relevant training data.

[47]

Each brain or model source is fit to the target responses on outer-training images, with rank and subspace regularization selected by inner cross-validation; this produces a source-induced predictive subspace in the target response space.

[48]

Held-out repeated target responses are split into two averaged views, and target-to-target prediction between these views defines the reproducible target reference for that fold.

[49]

Each source-induced predictive subspace is compared with this reference using directional and top-k reference coverage.

[50]

Fold-level curves are averaged to form recovery profiles.
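The steps listed above can be sketched on synthetic data for a single outer fold. This is only an illustration under stated assumptions, not the authors' implementation: the ridge-then-SVD fit stands in for their regularized reduced-rank model, the rank and `alpha` are fixed instead of chosen by inner cross-validation, and both coverage definitions (squared projection onto the source subspace, and its running mean over the first k reference directions) are guesses at what "directional" and "top-k reference coverage" mean. All function and variable names are hypothetical.

```python
import numpy as np

def zscore_stats(train):
    """Mean and standard deviation estimated on training rows only."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    return mu, np.where(sd == 0, 1.0, sd)  # guard against constant columns

def predictive_subspace(X, Y, rank, alpha=1.0):
    """Ridge fit X -> Y, then the top `rank` right singular vectors of the
    fitted values: an orthonormal basis in the target response space."""
    B = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)
    _, _, Vt = np.linalg.svd(X @ B, full_matrices=False)
    return Vt[:rank].T

def directional_coverage(source_basis, reference_basis):
    """Assumed definition: squared norm of each reference direction's
    projection onto the source-induced subspace (values in [0, 1])."""
    return np.sum((source_basis.T @ reference_basis) ** 2, axis=0)

# Synthetic fold: 3 latent dimensions drive the target; the source sees only 2.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 100, 30, 20
n = n_train + n_test
latents = rng.standard_normal((n, 3)) * np.array([3.0, 2.0, 1.0])
loading = np.linalg.qr(rng.standard_normal((n_vox, 3)))[0]
signal = latents @ loading.T
view_a = signal + 0.5 * rng.standard_normal((n, n_vox))  # repeat-averaged view 1
view_b = signal + 0.5 * rng.standard_normal((n, n_vox))  # repeat-averaged view 2
source = (latents[:, :2] @ rng.standard_normal((2, n_feat))
          + 0.1 * rng.standard_normal((n, n_feat)))

# Outer split, with standardization statistics from training rows only.
tr, te = slice(0, n_train), slice(n_train, n)
target_train = 0.5 * (view_a[tr] + view_b[tr])
x_mu, x_sd = zscore_stats(source[tr])
y_mu, y_sd = zscore_stats(target_train)
X_tr = (source[tr] - x_mu) / x_sd
Y_tr = (target_train - y_mu) / y_sd
A_te = (view_a[te] - y_mu) / y_sd
B_te = (view_b[te] - y_mu) / y_sd

# Source-induced subspace (training images) and split-half target reference
# (held-out images), then per-direction and cumulative coverage.
source_basis = predictive_subspace(X_tr, Y_tr, rank=2)
reference_basis = predictive_subspace(A_te, B_te, rank=3)
coverage = directional_coverage(source_basis, reference_basis)
topk_coverage = np.cumsum(coverage) / np.arange(1, coverage.size + 1)
print("directional coverage:", np.round(coverage, 3))
print("top-k coverage:", np.round(topk_coverage, 3))
```

In a full run, the two curves would be computed per outer fold and averaged across folds to give the recovery profile.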
[51]

Scalar summaries such as profile mean, brain-source-referenced score, and full-spectrum reference coverage are computed for compact reporting and controls. The recovery profile, not the scalar summary, is the primary object.

Notation. We use s for a generic source, d for a non-target subject (donor) used as a brain source, m for a model source, t for the t...
[52]

is $\mathrm{erank}(\mathrm{TargetRef}) = \exp\left(-\sum_i p_i \log p_i\right)$, with $p_i = \lambda_i / \sum_j \lambda_j$.

C Justification of the repeated-trial target reference

This appendix justifies the repeated-trial target reference used as an evaluation coordinate system. The goal is limited: we do not claim to recover a unique ground-truth biological subspace. Instead, we show that the construction provide...
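The effective-rank formula quoted above (the exponential of the Shannon entropy of the normalized spectrum) can be checked numerically with a short sketch. The function name `effective_rank` is hypothetical, and the excerpt does not show which spectrum of the target reference supplies the values λ_i, so the input here is simply assumed to be a vector of nonnegative spectrum values.

```python
import numpy as np

def effective_rank(spectrum):
    """exp(-sum_i p_i log p_i) with p_i = lambda_i / sum_j lambda_j.

    Zero entries are dropped, following the convention 0 * log 0 = 0.
    """
    lam = np.asarray(spectrum, dtype=float)
    if np.any(lam < 0):
        raise ValueError("spectrum values must be nonnegative")
    lam = lam[lam > 0]
    if lam.size == 0:
        raise ValueError("spectrum must contain a positive value")
    p = lam / lam.sum()
    return float(np.exp(-np.sum(p * np.log(p))))

flat = effective_rank([1.0, 1.0, 1.0, 1.0])     # uniform spectrum
peaked = effective_rank([1.0, 0.0, 0.0, 0.0])   # single nonzero value
skewed = effective_rank([10.0, 1.0, 0.1, 0.01])
print(flat, peaked, skewed)
```

A uniform spectrum over four values gives an effective rank of about 4, a single nonzero value gives 1, and a skewed spectrum falls in between, which is why the measure is useful for describing how many reference dimensions carry appreciable weight.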

[1]

Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick N. Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1):116–126, 2022. doi: 10.1038/s41593-021-00962-x

[2]

Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Abdul Rasheed, Junke Wang, Marco Monteiro, Hu Xu, Shiyu Dong, Nikhila Ravi, Shang-Wen Li, Piotr Dollár, and Christoph Feichtenhofer. Perception Encoder: The best visual embeddings are not at the output of the network. In Adv...

work page 2025

[3]

William Brown. Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3(3):296–322, 1910. doi: 10.1111/j.2044-8295.1910.tb00207.x

[4]

Abdulkadir Canatar, Jenelle Feather, Albert Wakhloo, and SueYeon Chung. A spectral theory of neural prediction and alignment. In Advances in Neural Information Processing Systems, volume 36, pages 47052–47080, 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/hash/9308d1b7d4ae2d3e2e67ae94b1078bf7-Abstract-Conference.html

[5]

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021

[6]

Po-Hsuan Chen, Janice Chen, Yaara Yeshurun, Uri Hasson, James V. Haxby, and Peter J. Ramadge. A reduced-dimension fMRI shared response model. In Advances in Neural Information Processing Systems, volume 28, pages 460–468, 2015

[7]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021

[8]

Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall/CRC, 1993. doi: 10.1201/9780429246593

[9]

Logan Engstrom, Andrew Ilyas, Hadi Salman, Shibani Santurkar, and Dimitris Tsipras. Robustness Python library, 2019. URL https://github.com/MadryLab/robustness

[10]

Ky Fan. Maximum properties and inequalities for the eigenvalues of completely continuous operators. Proceedings of the National Academy of Sciences of the United States of America, 37(11):760–766, 1951. doi: 10.1073/pnas.37.11.760

[11]

Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander T. Toshev, and Vaishaal Shankar. Data filtering networks. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=KAk6ngZ09F

[12]

Jenelle Feather, Meenakshi Khosla, N. Apurva Ratan Murty, and Aran Nayebi. Brain-Model evaluations need the NeuroAI Turing test. arXiv preprint arXiv:2502.16238, 2025. doi: 10.48550/arXiv.2502.16238

[13]

Alessandro T. Gifford, Radoslaw M. Cichy, Thomas Naselaris, and Kendrick N. Kay. A 7T fMRI dataset of synthetic images for out-of-distribution modeling of vision. Nature Communications, 17:1589, 2026. doi: 10.1038/s41467-026-69345-9

[14]

Umut Güçlü and Marcel A. J. van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. The Journal of Neuroscience, 35(27):10005–10014, 2015. doi: 10.1523/JNEUROSCI.5023-14.2015

[15]

James V. Haxby, J. Swaroop Guntupalli, Andrew C. Connolly, Yaroslav O. Halchenko, Bryan R. Conroy, M. Ida Gobbini, Michael Hanke, and Peter J. Ramadge. A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron, 72(2):404–416, 2011. doi: 10.1016/j.neuron.2011.08.026

[17]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

[18]

Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970. doi: 10.1080/00401706.1970.10488634

[19]

Larissa Höfling, Matthias Tangemann, Lotta Piefke, Susanne Keller, Matthias Bethge, and Katrin Franke. Only brains align with brains: Cross-region alignment patterns expose limits of normative models. In The Fourteenth International Conference on Learning Representations. URL https://openreview.net/forum?id=cMGJcHHI7d
