
arXiv: 2605.20127 · v1 · pith:NLM25M2Lnew · submitted 2026-05-19 · 🧬 q-bio.NC · cs.AI · cs.LG

Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment

Pith reviewed 2026-05-20 03:19 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.AI · cs.LG
keywords model-brain alignment · prediction accuracy · fMRI · visual cortex · recovery profiles · Natural Scenes Dataset · reproducible dimensions · brain-to-brain alignment

The pith

Prediction accuracy can mask model-brain mismatches because it does not reveal which specific reproducible dimensions of brain responses are recovered.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard accuracy metrics for how well vision models predict brain activity leave open which parts of the brain's response space are actually captured. It proposes a framework that first isolates the reproducible dimensions of target brain responses using repeated fMRI measurements, then measures how strongly each of those dimensions is recovered by predictions from either other brains or model representations. This matters because two predictors can achieve the same overall accuracy while recovering different subsets of the stable brain dimensions, producing different pictures of alignment quality. Applied to a subset of the Natural Scenes Dataset, the method finds that early-to-intermediate visual cortex responses contain a low-dimensional set of reproducible dimensions. Brain-to-brain comparisons supply a human reference for recoverability, and some models display distinct recovery patterns despite matched accuracy. The result is a diagnostic tool that makes explicit the dimensions driving any given alignment score.

Core claim

By first identifying target-brain response dimensions that can be reproducibly predicted across independent trial splits and then quantifying the recovery strength of each dimension under predictions from models or other subjects' brains, the recovery-profile framework distinguishes alignments that scalar prediction accuracy treats as equivalent. In the examined subset of the Natural Scenes Dataset, early-to-intermediate visual-cortex responses form a low-dimensional set of reproducible dimensions; brain-to-brain predictions identify which of these are consistently recoverable from other subjects, while pretrained and randomly initialized models sometimes match in accuracy yet differ in their recovery profiles across these dimensions.

What carries the argument

The target-space recovery profile, which first extracts reproducible dimensions from repeated fMRI trial splits of the target brain responses and then reports the recovery strength of each dimension under external predictions.
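
The two-stage idea can be sketched on simulated data. This page does not specify the paper's predictive subspace fitting operator, so the sketch below swaps in a stand-in: an eigendecomposition of the symmetrized split-half cross-covariance to find reproducible directions, and a per-dimension R² on a held-out view as the recovery strength. The function names, the toy data, and the choice of R² are all assumptions made for illustration.

```python
import numpy as np


def reproducible_dimensions(view_a, view_b, rank):
    """Top `rank` target directions shared by two trial-averaged views.

    view_a, view_b: (n_images, n_voxels) arrays from disjoint trial splits.
    Returns (directions, weights); directions has shape (n_voxels, rank).
    Stand-in for the paper's operator, which is not specified here.
    """
    a = view_a - view_a.mean(axis=0)
    b = view_b - view_b.mean(axis=0)
    # Trial noise is independent across views, so in expectation only
    # signal present in both views survives in the cross-covariance.
    cross = (a.T @ b + b.T @ a) / (2 * len(a))
    eigvals, eigvecs = np.linalg.eigh(cross)
    order = np.argsort(eigvals)[::-1][:rank]
    weights = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order], weights / weights.sum()


def recovery_profile(target, predicted, directions):
    """Per-dimension R^2 of predicted vs. target along each direction."""
    t = (target - target.mean(axis=0)) @ directions
    p = (predicted - predicted.mean(axis=0)) @ directions
    return 1.0 - ((t - p) ** 2).sum(axis=0) / (t ** 2).sum(axis=0)


# Toy data: three latent signal dimensions embedded in 50 "voxels".
rng = np.random.default_rng(0)
n_images, n_voxels = 400, 50
latents = rng.standard_normal((n_images, 3)) * np.array([3.0, 2.0, 1.5])
mixing = np.linalg.qr(rng.standard_normal((n_voxels, 3)))[0]
signal = latents @ mixing.T


def noisy_view():
    """One trial-averaged view: shared signal plus independent noise."""
    return signal + rng.standard_normal(signal.shape)


view_a, view_b, held_out = noisy_view(), noisy_view(), noisy_view()
directions, weights = reproducible_dimensions(view_a, view_b, rank=3)

# A hypothetical source that captures only the first two latent dimensions.
source_prediction = latents[:, :2] @ mixing[:, :2].T
profile = recovery_profile(held_out, source_prediction, directions)
print(np.round(weights, 2), np.round(profile, 2))
```

In this toy setting the profile is high on the first two directions and close to zero on the third, which is the kind of per-dimension information a single accuracy score would average away.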

If this is right

  • Models or brains with matched prediction accuracy can exhibit distinct recovery profiles, indicating that accuracy alone does not guarantee equivalent alignment.
  • Brain-to-brain recovery profiles supply a human reference that identifies which reproducible dimensions are consistently recoverable across subjects.
  • Early-to-intermediate visual cortex contains a low-dimensional set of reproducible response dimensions that can be used as the basis for finer-grained alignment tests.
  • The framework applies equally to model-brain and brain-brain comparisons, allowing direct comparison of their recovery strengths on the same dimensions.
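
The first point can be made concrete with a constructed example. The sketch below is not taken from the paper: it builds a toy target with two uncorrelated, equal-variance dimensions and two hypothetical predictors that each recover exactly one of them, then compares a pooled R² (standing in for scalar prediction accuracy) against per-dimension R².

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_voxels = 200, 30

# Two latent dimensions made exactly uncorrelated with equal variance (QR).
raw = rng.standard_normal((n_images, 2))
raw -= raw.mean(axis=0)
latents = np.linalg.qr(raw)[0] * np.sqrt(n_images)

# Orthonormal voxel-space directions carrying each latent dimension.
mixing = np.linalg.qr(rng.standard_normal((n_voxels, 2)))[0]
target = latents @ mixing.T

# Hypothetical predictors: A recovers only dimension 0, B only dimension 1.
pred_a = np.outer(latents[:, 0], mixing[:, 0])
pred_b = np.outer(latents[:, 1], mixing[:, 1])


def pooled_r2(y, y_hat):
    """Single scalar score pooled over all voxels."""
    centered = y - y.mean(axis=0)
    return 1.0 - ((y - y_hat) ** 2).sum() / (centered ** 2).sum()


def per_dimension_r2(y, y_hat, directions):
    """R^2 along each target direction (inputs are already centered here)."""
    t, p = y @ directions, y_hat @ directions
    return 1.0 - ((t - p) ** 2).sum(axis=0) / (t ** 2).sum(axis=0)


accuracy_a = pooled_r2(target, pred_a)
accuracy_b = pooled_r2(target, pred_b)
profile_a = per_dimension_r2(target, pred_a, mixing)
profile_b = per_dimension_r2(target, pred_b, mixing)
print(accuracy_a, accuracy_b)  # matched scalar scores
print(profile_a, profile_b)    # complementary profiles
```

Both predictors score a pooled R² of 0.5, yet their profiles are complementary, so the scalar score cannot tell them apart.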

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If recovery profiles differ systematically across model classes, one could test whether those differences predict distinct patterns of errors on downstream visual tasks that probe the missed dimensions.
  • The same reproducible-dimension approach could be extended to evaluate alignment in other sensory modalities or brain regions where repeated measurements are available.
  • Optimizing a model explicitly for higher recovery of specific dimensions rather than overall accuracy might produce representations that better generalize to human-like behavior on targeted visual judgments.

Load-bearing premise

The dimensions that emerge as reproducible from repeated fMRI trial splits are stable, meaningful features of the target brain response space whose recovery is the right criterion for judging alignment quality.

What would settle it

A direct comparison in which models or brains that differ in recovery profiles across the reproducible dimensions nevertheless produce identical behavioral predictions or task performance on visual judgments tied to those dimensions would show that the profiles do not add diagnostic information beyond accuracy.

Figures

Figures reproduced from arXiv: 2605.20127 by Ayumu Yamashita, Kaoru Amano, Ken Nakamura, Ryuto Yashiro, Tomoya Nakai.

Figure 1: Overview of the predictive subspace framework. Repeated target-brain responses are split into two averaged views, and the predictive subspace fitting operator (∗) between views estimates the reproducible target reference. The same operator is applied to each source, either another subject’s brain responses or a model representation, to predict target responses. The predicted response patterns span a source…
Figure 2: Reproducible target references across ROIs. (A) Independently estimated target references converge toward the estimate obtained from all split-half partitions. (B, C) Reference weights are concentrated but distributed across several target-reference dimensions. (D) Readouts restricted to the selected predictive subspace preserve held-out prediction accuracy relative to full-representation readouts and out…
Figure 3: Models differ in recovered dimensions beyond prediction accuracy. (A) A matched VGG-16 case has nearly identical prediction accuracy but different recovery profiles. (B) Prediction accuracy and the overall height of the recovery curve are related but not interchangeable. (C) Near-equal-accuracy pairs show small differences between brain source pairs but larger differences for source pairs involving models;…
Figure 4: Robustness and controls. (A) Trial-partition resampling sensitivity measured as self-subspace prediction accuracy divided by full-feature readout accuracy. (B) Rank-rule sensitivity for random and pretrained models; error bars show SEM across architectures within each training state and rank rule. (C) Target-side PCA overlaps with part of the reproducible target reference, as expected for reliable high-var…
Figure 5: Brain-to-brain recovery profiles. (A) Brain-to-brain top-k reference-coverage curves for all visual ROIs. Top-k reference coverage is a reference-weighted prefix average, not a cumulative sum. (B) Brain-to-brain prediction accuracy and brain-source profile mean are related but not identical, which shows why brain-to-brain comparison is more informative as a recovery profile than a scalar ceiling alone. R18…
Figure 6: Dataset-shift analysis. NSD-synthetic evaluates structured recovery outside the main natural-image dataset. Pretrained models remain high across datasets, while random controls show stronger architecture dependence, especially for ViT on synthetic images.
Figure 7: Axis-wise directional coverage. Each panel shows DirCov for the first ten reference directions in one visual ROI. Lines show brain-to-brain means, pretrained model means, and random-model means; gray bars show the normalized reproducible target reference weight for each direction. Axis-wise coverage is diagnostic, especially for the leading directions, but later low-mass directions are weaker and more vari…
Figure 8: NSD-core-shared model recovery profiles by ROI and architecture. Each panel shows top-k reference coverage for one visual ROI and one architecture. This quantity is a prefix average rather than a cumulative sum. Curves summarize four-seed, randomly initialized models, ImageNet-pretrained models, and the brain-source recovery profile, all over the same reproducible target reference. Shaded bands show SEM ac…
Figure 9: NSD-synthetic model recovery profiles by ROI and architecture. The layout matches…
Figure 10: Layer-wise and objective diagnostics. (A) Brain-source-referenced profile score across normalized layer depth for representative random and pretrained models. Labels mark the best layer for each curve. (B) Representative top-k reference-coverage profiles show that high-scoring pretrained layers remain closer to the brain-source profile, whereas random ViT is dominated by the leading direction and drops sh…
Figure 11: Source-side explanation diagnostics. (A) Brain-source-referenced profile score plotted against the selected source rank. (B) The source effective rank, computed from nonnegative predictive strengths across the selected source coordinates, serves as a diagnostic of compactness. (C) Source-side fold stability is the median normalized projector overlap between source-side predictive subspaces selected on dif…
Figure 12: High-accuracy and brain-source profile-shape diagnostics. (A) The top quartile of prediction accuracy is shaded. (B) Within this high-accuracy subset, brain-source-referenced profile scores remain separated between random and pretrained sources; boxplots show medians, interquartile ranges, and 1.5-IQR whiskers. (C) Brain-source profile-shape distance compares the mean-normalized top-k curve with the brain…
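
The captions for Figures 5 and 8 stress that top-k reference coverage is a reference-weighted prefix average rather than a cumulative sum. The exact formula is not given on this page, so the sketch below assumes one natural reading: the coverage values of the first k directions, averaged with their reference weights. The function name and the example numbers are hypothetical.

```python
import numpy as np


def top_k_coverage(dir_cov, ref_weights):
    """Reference-weighted prefix average of per-direction coverage.

    Assumed form: sum_{i<=k} w_i * c_i / sum_{i<=k} w_i for k = 1..K.
    """
    dir_cov = np.asarray(dir_cov, dtype=float)
    ref_weights = np.asarray(ref_weights, dtype=float)
    return np.cumsum(ref_weights * dir_cov) / np.cumsum(ref_weights)


ref_weights = np.array([0.5, 0.3, 0.2])  # hypothetical reference weights
dir_cov = np.array([0.9, 0.6, 0.1])      # hypothetical per-direction coverage

prefix_average = top_k_coverage(dir_cov, ref_weights)
cumulative_sum = np.cumsum(ref_weights * dir_cov)
print(prefix_average)  # [0.9, 0.7875, 0.65]: can fall as k grows
print(cumulative_sum)  # [0.45, 0.63, 0.65]: can never fall
```

Under this reading a curve that drops with k signals weak recovery of later directions, whereas a cumulative sum would keep rising and hide it.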
Original abstract

Artificial vision models are often evaluated against the human visual cortex by measuring how accurately their internal representations predict brain responses. However, prediction accuracy alone does not indicate which dimensions of the target brain's response space are recovered. Here, we introduce a unified framework for evaluating both model-brain and brain-brain alignment by identifying the response dimensions recovered by prediction. Using repeated fMRI measurements, we first identify target-brain response dimensions that can be reproducibly predicted across independent trial splits. We then predict target-brain responses from either another subject's brain responses or a vision model's internal representations, and quantify how strongly each of these reproducible response dimensions is recovered. Applying this framework to a subset of the Natural Scenes Dataset, in which eight subjects viewed the same natural images during fMRI, we find that the early-to-intermediate visual-cortex responses contain a low-dimensional set of reproducible dimensions. Brain-to-brain comparisons identify which of these dimensions are consistently recoverable from other subjects' brains, providing a diagnostic human reference rather than only a scalar benchmark. In some cases, pretrained and randomly initialized models achieve similar prediction accuracy while showing distinct recovery profiles across these response dimensions. These results show that prediction accuracy alone can mask model-brain mismatches. By making explicit which reproducible brain response dimensions are recovered by prediction, our framework provides a more diagnostic evaluation of alignment between artificial vision models and the human visual cortex.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a framework to evaluate model-brain alignment by first identifying reproducible dimensions in fMRI brain responses using independent trial splits, then measuring the recovery strength of these dimensions in predictions from artificial vision models or other brains. Applied to a subset of the Natural Scenes Dataset with eight subjects, the authors report that early-to-intermediate visual cortex responses are low-dimensional and reproducible, that brain-to-brain recovery provides a diagnostic reference, and that models with comparable prediction accuracy can exhibit distinct recovery profiles, indicating that accuracy alone can mask mismatches in alignment.

Significance. If the core procedure is robust, the framework offers a more granular diagnostic for alignment evaluations than scalar accuracy, with brain-to-brain comparisons serving as an internal human benchmark. The approach leverages repeated measurements and held-out recovery quantification, which are positive features for reproducibility. It could help distinguish cases where models match overall variance but differ in the specific neural dimensions recovered.

major comments (3)
  1. §3 (Identifying reproducible dimensions): The manuscript must clarify whether the reproducibility threshold or selection criterion is fixed a priori or determined post-hoc from the data; if the latter, this couples the target subspace definition to the same split properties used for recovery and risks inflating apparent diagnostic power of the profiles.
  2. §4 (Recovery quantification): Per-dimension recovery metrics lack reported error bars, cross-validation details, or correction for multiple comparisons across dimensions; without these, differences in recovery profiles between models (or vs. brain-to-brain) cannot be distinguished from sampling variability in the fMRI responses.
  3. §2–3 (Trial-split procedure): The claim that split-based dimensions isolate stable signal rather than shared noise or hemodynamic artifacts requires explicit validation, such as consistency across alternative trial partitions or correlation with independent stimulus properties; absent this, the recovery profiles may not diagnose deeper code mismatches as asserted.
minor comments (2)
  1. [Abstract / Methods] The abstract and methods should specify the exact number of images, trials per split, and subjects used from the Natural Scenes Dataset subset for reproducibility.
  2. [Figures] Figure legends should explicitly define the recovery strength metric (e.g., correlation or R² per dimension) and the scale used for profile visualization.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We have addressed each major point below and believe the revisions will improve the clarity and statistical rigor of the work.

Point-by-point responses
  1. Referee: §3 (Identifying reproducible dimensions): The manuscript must clarify whether the reproducibility threshold or selection criterion is fixed a priori or determined post-hoc from the data; if the latter, this couples the target subspace definition to the same split properties used for recovery and risks inflating apparent diagnostic power of the profiles.

    Authors: We appreciate the referee's emphasis on this methodological detail. The reproducibility threshold in our framework is a fixed statistical criterion (reproducibility p < 0.05 after Bonferroni correction across voxels) chosen a priori based on standard practices in fMRI reliability analyses, rather than optimized post-hoc to maximize recovery. The dimension selection uses one pair of independent trial splits, while recovery is quantified on a fully held-out third split. We will revise the Methods section to state this explicitly and add a brief discussion noting that the held-out recovery measurement prevents direct circularity. These changes should address the concern about inflated diagnostic power. revision: yes

  2. Referee: §4 (Recovery quantification): Per-dimension recovery metrics lack reported error bars, cross-validation details, or correction for multiple comparisons across dimensions; without these, differences in recovery profiles between models (or vs. brain-to-brain) cannot be distinguished from sampling variability in the fMRI responses.

    Authors: We agree that these elements are essential for interpreting differences in recovery profiles. In the revised manuscript we will add bootstrap-derived standard errors (resampling across subjects and image trials) for all per-dimension recovery values. We will also specify the nested cross-validation procedure used to compute recovery and apply FDR correction for multiple comparisons across the selected dimensions. These additions will allow readers to evaluate whether observed profile differences exceed sampling variability. revision: yes

  3. Referee: §2–3 (Trial-split procedure): The claim that split-based dimensions isolate stable signal rather than shared noise or hemodynamic artifacts requires explicit validation, such as consistency across alternative trial partitions or correlation with independent stimulus properties; absent this, the recovery profiles may not diagnose deeper code mismatches as asserted.

    Authors: This is a fair critique of the interpretive strength of the split-based dimensions. While the core procedure relies on independent trial splits to emphasize stable signal, we acknowledge that further validation is warranted. In the revision we will include supplementary results showing that the identified low-dimensional subspaces remain consistent when alternative random partitions of the trials are used. We will also report correlations of these dimensions with basic stimulus properties (e.g., spatial frequency content and contrast) to help distinguish signal from potential artifacts. A complete exclusion of all hemodynamic confounds would require additional datasets with varied acquisition parameters, which lies outside the current study; we will note this limitation explicitly. revision: partial
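
The statistical additions promised in the second response can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: it resamples images only (the response also mentions subjects), uses per-dimension R² as the recovery value, and implements the Benjamini-Hochberg step-up rule directly in NumPy. All names and numbers are hypothetical.

```python
import numpy as np


def per_dimension_r2(t, p):
    """R^2 of prediction p against target projection t, per dimension."""
    centered = t - t.mean(axis=0)
    return 1.0 - ((t - p) ** 2).sum(axis=0) / (centered ** 2).sum(axis=0)


def bootstrap_se(t, p, n_boot=500, seed=0):
    """Bootstrap standard error of per-dimension R^2, resampling images."""
    rng = np.random.default_rng(seed)
    n = len(t)
    stats = np.empty((n_boot, t.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample images with replacement
        stats[b] = per_dimension_r2(t[idx], p[idx])
    return stats.std(axis=0, ddof=1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        cutoff = np.nonzero(passed)[0].max()
        reject[order[: cutoff + 1]] = True  # reject all up to largest pass
    return reject


# Toy projections onto three reproducible dimensions.
rng = np.random.default_rng(2)
latent = rng.standard_normal((300, 3))
target_proj = latent + 0.5 * rng.standard_normal((300, 3))
pred_proj = latent * np.array([1.0, 0.5, 0.0])  # weaker on later dimensions

se = bootstrap_se(target_proj, pred_proj)
rejected = benjamini_hochberg([0.001, 0.04, 0.03, 0.20])
print(np.round(se, 3), rejected)
```

With these four example p-values only the smallest survives the correction, although three of them fall below an uncorrected 0.05 threshold.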

Circularity Check

0 steps flagged

No significant circularity; recovery profiles grounded in independent splits

Full rationale

The paper first identifies reproducible dimensions via prediction across independent fMRI trial splits on repeated measurements, then separately quantifies recovery strength of those dimensions under model or brain-to-brain predictions. This two-stage structure uses held-out splits for identification and applies the metric to distinct prediction sources, avoiding reduction of the evaluation to its own inputs by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework depends on standard assumptions about fMRI signal reliability and the interpretability of linear or similar prediction models, plus a data-driven definition of reproducible dimensions; no new physical entities are introduced.

free parameters (1)
  • reproducibility threshold or selection criterion
    A cutoff or statistical rule is needed to decide which response dimensions qualify as reproducible across trial splits.
axioms (1)
  • domain assumption: Repeated fMRI measurements on the same images allow reliable identification of stable response dimensions in visual cortex.
    The method assumes trial-split predictability captures biologically meaningful and stable features rather than noise or task-specific artifacts.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 2 internal anchors

  1. [1]

    Allen, Ghislain St-Yves, Yihan Wu, Jesse L

    Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick N. Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence.Nature Neuroscience, 25(1):116–126, 2022. doi: 10.103...

  2. [2]

    Perception Encoder: The best visual embeddings are not at the output of the network

    Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Abdul Rasheed, Junke Wang, Marco Monteiro, Hu Xu, Shiyu Dong, Nikhila Ravi, Shang-Wen Li, Piotr Dollár, and Christoph Feichtenhofer. Perception Encoder: The best visual embeddings are not at the output of the network. In Adv...

  3. [3]

    Correlation calculated from faulty data.British Journal of Psychology, 3(3): 271–295, 1910

    William Brown. Some experimental results in the correlation of mental abilities.British Journal of Psychology, 3(3):296–322, 1910. doi: 10.1111/j.2044-8295.1910.tb00207.x

  4. [4]

    A spectral theory of neural prediction and alignment

    Abdulkadir Canatar, Jenelle Feather, Albert Wakhloo, and SueYeon Chung. A spectral theory of neural prediction and alignment. InAdvances in Neu- ral Information Processing Systems, volume 36, pages 47052–47080, 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/hash/ 9308d1b7d4ae2d3e2e67ae94b1078bf7-Abstract-Conference.html

  5. [5]

    Emerging properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021

  6. [6]

    Haxby, and Peter J

    Po-Hsuan Chen, Janice Chen, Yaara Yeshurun, Uri Hasson, James V . Haxby, and Peter J. Ra- madge. A reduced-dimension fMRI shared response model. InAdvances in Neural Information Processing Systems, volume 28, pages 460–468, 2015

  7. [7]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021

  8. [8]

    Tibshirani.An Introduction to the Bootstrap

    Bradley Efron and Robert J. Tibshirani.An Introduction to the Bootstrap. Chapman and Hall/CRC, 1993. doi: 10.1201/9780429246593

  9. [9]

    Robust- ness Python library, 2019

    Logan Engstrom, Andrew Ilyas, Hadi Salman, Shibani Santurkar, and Dimitris Tsipras. Robust- ness Python library, 2019. URLhttps://github.com/MadryLab/robustness

  10. [10]

    Ky Fan. Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences of the United States of America, 37 (11):760–766, 1951. doi: 10.1073/pnas.37.11.760

  11. [11]

    Toshev, and Vaishaal Shankar

    Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander T. Toshev, and Vaishaal Shankar. Data filtering networks. InThe Twelfth International Conference on Learning Representations, 2024. URLhttps://openreview.net/forum?id=KAk6ngZ09F

  12. [12]

    Apurva Ratan Murty, and Aran Nayebi

    Jenelle Feather, Meenakshi Khosla, N. Apurva Ratan Murty, and Aran Nayebi. Brain-Model evaluations need the NeuroAI turing test.arXiv preprint arXiv:2502.16238, 2025. doi: 10. 48550/arXiv.2502.16238

  13. [13]

    Gifford, Radoslaw M

    Alessandro T. Gifford, Radoslaw M. Cichy, Thomas Naselaris, and Kendrick N. Kay. A 7T fMRI dataset of synthetic images for out-of-distribution modeling of vision.Nature Communications, 17:1589, 2026. doi: 10.1038/s41467-026-69345-9. 10

  14. [14]

    Umut Güçlü and Marcel A. J. van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream.The Journal of Neuroscience, 35 (27):10005–10014, 2015. doi: 10.1523/JNEUROSCI.5023-14.2015

  15. [15]

    Haxby, J

    James V . Haxby, J. Swaroop Guntupalli, Andrew C. Connolly, Yaroslav O. Halchenko, Bryan R. Conroy, M. Ida Gobbini, Michael Hanke, and Peter J. Ramadge. A common, high-dimensional model of the representational space in human ventral temporal cortex.Neuron, 72(2):404–416,

  16. [16]

    doi: 10.1016/j.neuron.2011.08.026

  17. [17]

    Deep residual learning for im- age recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for im- age recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

  18. [18]

    Hoerl and Robert W

    Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems.Technometrics, 12(1):55–67, 1970. doi: 10.1080/00401706.1970.10488634

  19. [19]

    Only brains align with brains: Cross-region alignment patterns expose limits of normative models

    Larissa Höfling, Matthias Tangemann, Lotta Piefke, Susanne Keller, Matthias Bethge, and Katrin Franke. Only brains align with brains: Cross-region alignment patterns expose limits of normative models. InThe Fourteenth International Conference on Learning Representations,

  20. [20]

    URLhttps://openreview.net/forum?id=cMGJcHHI7d

  21. [21]

    Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975

    Alan Julian Izenman. Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975. doi: 10.1016/0047-259X(75)90042-1

  22. [22]

    Nathan C. L. Kong, Eshed Margalit, Justin L. Gardner, and Anthony M. Norcia. Increasing neural network robustness improves match to macaque V1 eigenspectrum, spatial frequency preference and predictivity.PLOS Computational Biology, 18(1):e1009739, 2022. doi: 10. 1371/journal.pcbi.1009739

  23. [23]

    Similarity of neural network representations revisited

    Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. InProceedings of the 36th International Conference on Machine Learning, pages 3519–3529, 2019

  24. [24]

    Representational Similarity Analysis – Connecting the Branches of Systems Neuroscience

    Nikolaus Kriegeskorte, Marieke Mur, and Peter A. Bandettini. Representational similarity analysis—connecting the branches of systems neuroscience.Frontiers in Systems Neuroscience, 2:4, 2008. doi: 10.3389/neuro.06.004.2008

  25. [25]

    Meth- ods for computing the maximum performance of computational models of fMRI responses

    Agustin Lage-Castellanos, Giancarlo Valente, Elia Formisano, and Federico De Martino. Meth- ods for computing the maximum performance of computational models of fMRI responses. PLoS Computational Biology, 15(3):e1006397, 2019. doi: 10.1371/journal.pcbi.1006397

  26. [26]

    Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021. doi: 10.1109/ICCV48922.2021.00986

  27. [27]

    Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C

    Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022. doi: 10.1109/CVPR52688.2022.01167

  28. [28]

    Reverse predictivity for bidirectional comparison of neural networks and biological brains.Nature Machine Intelligence, 8(3):474–488, 2026

    Sabine Muzellec and Kohitij Kar. Reverse predictivity for bidirectional comparison of neural networks and biological brains.Nature Machine Intelligence, 8(3):474–488, 2026. doi: 10. 1038/s42256-026-01204-0

  29. [29]

    doi: 10.1016/j.neuroimage.2010.07.073

    Thomas Naselaris, Kendrick N. Kay, Shinji Nishimoto, and Jack L. Gallant. Encoding and decoding in fMRI.NeuroImage, 56(2):400–410, 2011. doi: 10.1016/j.neuroimage.2010.07.073

  30. [30]

    Nastase, Valeria Gazzola, Uri Hasson, and Christian Keysers

    Samuel A. Nastase, Valeria Gazzola, Uri Hasson, and Christian Keysers. Measuring shared responses across subjects using intersubject correlation.Social Cognitive and Affective Neuro- science, 14(6):667–685, 2019. doi: 10.1093/scan/nsz037

  31. [31]

    Prince, Ian Charest, Jan W

    Jacob S. Prince, Ian Charest, Jan W. Kurzawski, John A. Pyles, Michael J. Tarr, and Kendrick N. Kay. Improving the accuracy of single-trial fMRI response estimates using GLMsingle.eLife, 11:e77599, 2022. doi: 10.7554/eLife.77599. 11

  32. [32]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. InProceed- ings of the 38th International Conference on Machine Learning, pages 8748–8763, 2021

  33. [33]

    The effective rank: A measure of effective dimensionality

    Olivier Roy and Martin Vetterli. The effective rank: A measure of effective dimensionality. In 2007 15th European Signal Processing Conference, pages 606–610, 2007

  34. [34]

    Scharf, Chris Peterson, Michael Kirby, and Joseph M

    Ignacio Santamaría, Louis L. Scharf, Chris Peterson, Michael Kirby, and Joseph M. Francos. An order fitting rule for optimal subspace averaging. In2016 IEEE Statistical Signal Processing Workshop (SSP), pages 1–4, 2016

  35. [35]

    Harper, Ben D

    Oliver Schoppe, Nicol S. Harper, Ben D. B. Willmore, Andrew J. King, and Jan W. H. Schnupp. Measuring the performance of neural models.Frontiers in Computational Neuroscience, 10:10,

  36. [36]

    doi: 10.3389/fncom.2016.00010

  37. [37]

    Semedo, Amin Zandvakili, Christian K

    João D. Semedo, Amin Zandvakili, Christian K. Machens, Byron M. Yu, and Adam Kohn. Cortical areas interact through a communication subspace.Neuron, 102(1):249–259.e4, 2019. doi: 10.1016/j.neuron.2019.01.026

  38. [38]

    Oriane Siméoni, Huy V . V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...

  39. [39]

    Very deep convolutional networks for large-scale image recognition

    Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.

  40. [40]

    Correlation calculated from faulty data

    Charles Spearman. Correlation calculated from faulty data. British Journal of Psychology, 3(3):271–295, 1910. doi: 10.1111/j.2044-8295.1910.tb00206.x

  41. [41]

    SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

    Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, Olivier Hénaff, Jeremiah Harmsen, Andreas Steiner, and Xiaohua Zhai. SigLIP 2: Multilingual Vision-Language encoders with improved semantic understanding, localization, and dense features....

  42. [42]

    Daniel L. K. Yamins and James J. DiCarlo. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365, 2016. doi: 10.1038/nn.4244

  43. [43]

    Daniel L. K. Yamins, Ha Hong, Charles F. Cadieu, Ethan A. Solomon, Darren Seibert, and James J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014. doi: 10.1073/pnas.1403112111

  44. [44]

    Catalyzing next-generation artificial intelligence through NeuroAI

    Anthony Zador, Sean Escola, Blake Richards, Bence Ölveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, et al. Catalyzing next-generation artificial intelligence through NeuroAI. Nature Communications, 14:1597, 2023. doi: 10.1038/s41467-023-37180-x

  45. [45]

    Sigmoid loss for language image pre-training

    Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11975–11986, 2023. doi: 10.1109/ICCV51070.2023.01100.

    A Experimental setting

    The main text summarizes the information needed to interpret the results. This app...

  1. Source and target matrices are split into outer-training and held-out outer-test images, with standardization statistics estimated only on the relevant training data.

  2. Each brain or model source is fit to the target responses on outer-training images, with rank and subspace regularization selected by inner cross-validation; this produces a source-induced predictive subspace in the target response space.

  3. Held-out repeated target responses are split into two averaged views, and target-to-target prediction between these views defines the reproducible target reference for that fold.

  4. Each source-induced predictive subspace is compared with this reference using directional and top-k reference coverage.

  5. Fold-level curves are averaged to form recovery profiles.

  6. Scalar summaries such as profile mean, brain-source-referenced score, and full-spectrum reference coverage are computed for compact reporting and controls. The recovery profile, not the scalar summary, is the primary object.

    Notation. We use s for a generic source, d for a non-target subject (donor) used as a brain source, m for a model source, t for the t...
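
    The first four steps listed above (training-only standardization, fitting a source-induced predictive subspace, building the split-half target reference, and measuring coverage) can be illustrated with a minimal NumPy sketch. Several choices here are assumptions rather than the paper's stated method: the reduced-rank fit is approximated by ridge regression followed by an SVD of the fitted values, the rank and ridge penalty are fixed by hand instead of being selected by inner cross-validation, the two views are formed from even- and odd-indexed repeats, and top-k coverage is taken to be the running mean of directional coverage. All function names are hypothetical.

```python
import numpy as np

def standardize(train, test):
    """Z-score both arrays with mean/std estimated on the training rows only."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)  # guard against constant columns
    return (train - mu) / sd, (test - mu) / sd

def predictive_subspace(X, Y, rank, alpha=1.0):
    """Ridge-fit Y from X (assumed stand-in for the regularized fit), then
    return the top-`rank` right singular vectors of the fitted values as an
    orthonormal basis of shape (n_targets, rank)."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(gram, X.T @ Y)
    _, _, Vt = np.linalg.svd(X @ W, full_matrices=False)
    return Vt[:rank].T

def split_half_views(repeats):
    """repeats: (n_images, n_repeats, n_voxels). Average even- and odd-indexed
    repeats separately to obtain two views of the same images (assumed split)."""
    return repeats[:, 0::2].mean(axis=1), repeats[:, 1::2].mean(axis=1)

def directional_coverage(source_basis, reference_basis):
    """For each reference direction, the squared length of its projection
    onto the source subspace (between 0 and 1 for orthonormal bases)."""
    return np.sum((source_basis.T @ reference_basis) ** 2, axis=0)

def topk_coverage(source_basis, reference_basis):
    """Running mean of directional coverage over the first k reference
    directions (an assumed definition of top-k reference coverage)."""
    d = directional_coverage(source_basis, reference_basis)
    return np.cumsum(d) / np.arange(1, d.size + 1)
```

    In this sketch, the reference basis would come from `predictive_subspace(view_a, view_b, rank)` on the two held-out views, each source basis from `predictive_subspace(X_source, Y_target, rank)` on outer-training images, and the per-fold coverage curves would then be averaged across folds to form a recovery profile.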

    is $\mathrm{erank}(\mathrm{TargetRef}) = \exp\left(-\sum_i p_i \log p_i\right)$, $p_i = \lambda_i / \sum_j \lambda_j$.

    C Justification of the repeated-trial target reference

    This appendix justifies the repeated-trial target reference used as an evaluation coordinate system. The goal is limited: we do not claim to recover a unique ground-truth biological subspace. Instead, we show that the construction provide...
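
    As a side note on the effective-rank expression given just before Appendix C, the quantity is the exponential of the Shannon entropy of the normalized spectrum. The sketch below computes it for any non-negative spectrum; which spectrum the paper uses for the target reference (eigenvalues or singular values, and of which matrix) is not stated in this excerpt, so the input is left generic and the function name is hypothetical.

```python
import numpy as np

def effective_rank(spectrum):
    """Effective rank exp(-sum_i p_i log p_i), with p_i = lambda_i / sum_j lambda_j.

    `spectrum` is any sequence of non-negative values lambda_i; zero entries
    are dropped because they contribute nothing to the entropy.
    """
    lam = np.asarray(spectrum, dtype=float)
    lam = lam[lam > 0]
    p = lam / lam.sum()
    return float(np.exp(-np.sum(p * np.log(p))))
```

    A flat spectrum of n equal values gives an effective rank of n, and a spectrum with a single non-zero value gives 1, so the measure interpolates between these extremes as the spectrum becomes more or less concentrated.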