Shared representations in brains and models reveal a two-route cortical organization during scene perception

Lluís Fuentemilla; Pablo Marcos-Manchón

arxiv: 2507.13941 · v2 · pith:PSWTA72Bnew · submitted 2025-07-18 · 🧬 q-bio.NC · cs.AI · cs.CV · eess.IV


Pith reviewed 2026-05-19 04:53 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.AI · cs.CV · eess.IV

keywords: representational similarity analysis · scene perception · fMRI · ventromedial pathway · lateral occipitotemporal pathway · neural networks · cortical organization · visual cortex


The pith

Scene perception uses two separate cortical routes, one for layout and context and one for animate content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study applies representational similarity analysis to 7T fMRI recordings made while people viewed natural scenes. Shared response patterns across participants are compared to layered features taken from vision and language neural networks. This comparison identifies a ventromedial route that encodes scene layout and environmental context and a lateral occipitotemporal route that is tuned to animate elements. Vision models match the geometry in both routes, whereas language models align mainly with the lateral route. The result reframes scene perception as a distributed network rather than a single hierarchical stream.
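As a rough illustration of the analysis summarized above, the following Python sketch builds one representational dissimilarity matrix (RDM) per subject and averages the pairwise agreement between different subjects. It is a minimal sketch on synthetic data, not the paper's pipeline: the function names, the correlation-distance RDM, the array shapes, and the noise levels are all assumptions made for illustration. The Pearson comparison follows the "RSA, Pearson r" wording in the figure captions.

```python
import numpy as np

def rdm(responses):
    """Correlation-distance RDM for responses of shape (n_stimuli, n_features)."""
    return 1.0 - np.corrcoef(responses)

def upper(mat):
    """Vectorize the strictly upper triangle (the diagonal carries no information)."""
    i, j = np.triu_indices(mat.shape[0], k=1)
    return mat[i, j]

def rsa_score(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles of two RDMs."""
    return float(np.corrcoef(upper(rdm_a), upper(rdm_b))[0, 1])

def inter_subject_alignment(subject_responses):
    """Mean RSA score over all pairs of different subjects."""
    rdms = [rdm(r) for r in subject_responses]
    scores = [rsa_score(rdms[a], rdms[b])
              for a in range(len(rdms))
              for b in range(a + 1, len(rdms))]
    return float(np.mean(scores))

# Synthetic demo (hypothetical data): 8 subjects share a low-dimensional
# stimulus structure but have subject-specific feature mixing and noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 5))  # 40 stimuli, 5 shared latent dimensions
subjects = [latent @ rng.normal(size=(5, 60)) + 0.5 * rng.normal(size=(40, 60))
            for _ in range(8)]
shared = inter_subject_alignment(subjects)

# Control: subjects with no shared structure at all.
unrelated = inter_subject_alignment([rng.normal(size=(40, 60)) for _ in range(8)])
print(f"shared structure: {shared:.2f}, unrelated: {unrelated:.2f}")
```

Because the comparison happens at the level of RDMs, subjects do not need matching voxels or features; only the stimulus set has to be common.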

Core claim

Representational similarity analysis performed on 7T fMRI data collected during natural scene viewing identifies two distinct processing routes in the cortex. A ventromedial pathway specializes in scene layout and environmental context, while a lateral occipitotemporal pathway is selective for animate content. Hierarchical features from vision neural networks align with the shared structure found in both routes, but language-model features correspond primarily to the lateral route. These observations refine classical visual-stream models by describing scene perception as a distributed cortical network with separable representational organizations for context and animate content.

What carries the argument

Representational similarity analysis that extracts shared geometry across individuals' brain responses to scenes and matches it against hierarchical features from vision and language neural networks.
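The matching against hierarchical model features can be sketched in the same way: compute one RDM per network layer and correlate each with a brain RDM. The snippet below is an assumption-laden sketch on synthetic data; the layer names, feature sizes, and the `layerwise_alignment` helper are hypothetical and do not correspond to any specific model used in the paper.

```python
import numpy as np

def rdm_vector(features):
    """Upper triangle of the correlation-distance RDM for (n_stimuli, n_features) data."""
    d = 1.0 - np.corrcoef(features)
    i, j = np.triu_indices(d.shape[0], k=1)
    return d[i, j]

def layerwise_alignment(brain_responses, layer_features):
    """RSA score (Pearson r) between one parcel's RDM and each layer's RDM."""
    brain = rdm_vector(brain_responses)
    return {name: float(np.corrcoef(brain, rdm_vector(feats))[0, 1])
            for name, feats in layer_features.items()}

# Synthetic demo (hypothetical data): three "layers" with unrelated features,
# and a parcel whose responses are a noisy linear readout of the "late" layer.
rng = np.random.default_rng(1)
n_stimuli = 50
layers = {
    "early": rng.normal(size=(n_stimuli, 64)),
    "middle": rng.normal(size=(n_stimuli, 64)),
    "late": rng.normal(size=(n_stimuli, 64)),
}
parcel = layers["late"] @ rng.normal(size=(64, 100)) + rng.normal(size=(n_stimuli, 100))

scores = layerwise_alignment(parcel, layers)
best_layer = max(scores, key=scores.get)
print(scores, best_layer)
```

Plotting such scores against layer depth for each parcel gives the kind of layer-wise alignment curve that the Figure 3 caption describes.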

If this is right

Scene perception is carried by separable representational routes for contextual layout and animate content.
Vision models capture shared structure across both routes while language models align mainly with the animate route.
Classical two-stream models of vision must be updated to include this distributed two-route organization for complex scenes.
Shared patterns across people point to stable cortical organizations that support scene understanding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The two routes could show different sensitivity to focal brain damage, producing selective deficits in layout processing versus recognition of animate content.
Active tasks such as navigation or search might reveal how the routes compete or cooperate under behavioral demands.
Similar cross-model comparisons in other modalities could test whether the separation is specific to visual scene processing.

Load-bearing premise

The assumption that cross-individual representational similarity recorded during passive scene viewing captures stable, functionally meaningful cortical routes rather than task- or stimulus-specific correlations.

What would settle it

The two-route account would be undermined if the distinct similarity geometries in ventromedial versus lateral occipitotemporal regions became indistinguishable when the same scenes were viewed under an active task that requires integrating layout and animate information.

Figures

Figures reproduced from arXiv: 2507.13941 by Lluís Fuentemilla, Pablo Marcos-Manchón.

**Figure 1.** A unified framework for tracing representational pathways. (A) Feature Extraction. For each image stimulus, we extracted equivalent representations from brain activity and deep neural networks. Single-trial fMRI responses were aggregated within HCP-MMP cortical parcels [31] to create vector representations of the brain’s response. Concurrently, layer-wise activations were extracted from pre-trained vision an…

**Figure 2.** Cortical distribution of representational alignment. (A) Inter-subject alignment (RSA, Pearson r) computed per parcel and grouped by macro-anatomical clusters [47]. Box-plots show the ten clusters with the highest mean alignment in the NSD sample (N = 8; symmetric HCP-MMP atlas [31]). Red lines mark the parcel-wise null distribution (mean ± s.d.; 10 000 label permutations). All displayed parcels exceed chance …

**Figure 3.** Hierarchical convergence between models and cortex. (A–C) Layer-wise alignment (RSA) between vision models and brain activity for representative parcels within the three reference hubs. The alignment curves are averaged across all vision models used in the study. Panels show Early Visual Cortex (A), the Ventral Hub (B), and the LOTC Hub (C). Lines show the mean alignment across participants (N = 8), with s…

**Figure 4.** Inter-subject representational connectivity network. (A) Inter-subject connectivity matrix. Parcel-wise representational connectivity (RSA, Pearson’s r) is shown for 30 key cortical parcels (N = 8). Each cell represents the mean RSA score between the RDMs of two parcels, computed across all pairs of different individuals. The matrix reveals a clear block structure corresponding to three principal hubs: Ea…

**Figure 5.** Dominant representational dimensions in cortical hubs. (A–C) Stimuli projection onto the first two shared components for each hub, extracted via Kernel Multi-view CCA (KMCCA) using the voxel data from all eight NSD participants for the common image set. In Early Visual Cortex (A), stimuli form a category-free cloud reflecting low-level visual similarity. In the Ventral Hub (B), the dominant axis arranges s…

**Figure 6.** Shared representational geometry is modulated by stimulus content and task. Alignment computed using the symmetric HCP atlas combining both hemispheres. (A–D) Replication in BOLD5000, using complex natural scenes. With a different task (valence rating) but with comparable complex scene stimuli that include social content, the main findings were replicated. Both the inter-subject alignment (A) and its corre…
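The Figure 2 caption refers to a parcel-wise null distribution built from label permutations. One common way to construct such a null is to shuffle the stimulus labels of one RDM (rows and columns together) and recompute the RSA score each time. The sketch below assumes that scheme; it is not confirmed to be the authors' exact procedure, the data are synthetic, and the helper names are hypothetical. It uses 1000 permutations for speed, whereas the caption reports 10 000.

```python
import numpy as np

def upper(mat):
    """Vectorize the strictly upper triangle of a square matrix."""
    i, j = np.triu_indices(mat.shape[0], k=1)
    return mat[i, j]

def permutation_null(rdm_a, rdm_b, n_permutations=1000, seed=0):
    """Null RSA scores from shuffling the stimulus labels of rdm_b."""
    rng = np.random.default_rng(seed)
    a = upper(rdm_a)
    null = np.empty(n_permutations)
    for k in range(n_permutations):
        perm = rng.permutation(rdm_b.shape[0])
        shuffled = rdm_b[np.ix_(perm, perm)]  # permute rows and columns jointly
        null[k] = np.corrcoef(a, upper(shuffled))[0, 1]
    return null

def permutation_p_value(observed, null):
    """One-sided p-value with the standard +1 correction."""
    return (1 + np.sum(null >= observed)) / (1 + null.size)

# Synthetic demo (hypothetical data): two subjects sharing latent structure.
rng = np.random.default_rng(2)
latent = rng.normal(size=(30, 3))

def synthetic_rdm():
    responses = latent @ rng.normal(size=(3, 50)) + 0.5 * rng.normal(size=(30, 50))
    return 1.0 - np.corrcoef(responses)

rdm_1, rdm_2 = synthetic_rdm(), synthetic_rdm()
observed = float(np.corrcoef(upper(rdm_1), upper(rdm_2))[0, 1])
null = permutation_null(rdm_1, rdm_2)
p_value = permutation_p_value(observed, null)
print(f"observed r = {observed:.2f}, null mean = {null.mean():.3f}, p = {p_value:.4f}")
```

Permuting rows and columns together keeps each shuffled matrix a valid RDM, which is why the labels are shuffled rather than the individual matrix entries.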

Original abstract

The brain transforms visual inputs into high-dimensional cortical representations that support diverse cognitive and behavioral goals. Characterizing how this information is organized and routed across the human brain is essential for understanding how we process complex visual scenes. Here, we applied representational similarity analysis to 7T fMRI data collected during natural scene viewing. We quantified representational geometry shared across individuals and compared it to hierarchical features from vision and language neural networks. This analysis revealed two distinct processing routes: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models aligned with shared structure in both routes, whereas language models corresponded primarily with the lateral pathway. These findings refine classical visual-stream models by characterizing scene perception as a distributed cortical network with separable representational routes for context and animate content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper finds two routes in scene perception from shared RSA geometry, but the evidence that they reflect stable organization rather than stimulus co-occurrence is still thin.


The main takeaway is that RSA on 7T fMRI during passive natural scene viewing yields shared representational geometry that splits into a ventromedial pathway for layout and context and a lateral occipitotemporal pathway for animate content, with vision models aligning to both and language models aligning mostly to the lateral one. This refines the classical two-stream account by giving scene perception a more distributed, content-specific cortical split.

The work applies standard RSA and model-feature comparisons to natural stimuli rather than artificial ones, which is a modest but useful step. The differential alignment with language networks is the clearest addition beyond prior visual-stream studies. The abstract lays this out without unnecessary hype.

The soft spots sit mainly in the missing details. No subject count, no reliability metrics for the shared RDM, and no mention of low-level feature controls or split-half checks appear in the provided description. That leaves room for the stress-test concern to apply: the ventromedial-lateral separation could track regularities in the particular stimulus set rather than fixed cortical routes. If the full paper shows cross-validated reliability or explicit controls for image statistics, the claim strengthens; otherwise it stays provisional. The approach itself is not circular or fitted by construction, and the authors engage the literature on visual and language models in a straightforward way.

This is the sort of paper that belongs in a cognitive neuroscience or computational modeling venue. Readers working on scene processing or brain-model alignment would get a concrete data point from it, even if they want tighter evidence on generalizability. It deserves peer review because the question is clear, the methods are accessible, and the two-route idea is testable, though any referee would likely press on stimulus specificity and reliability numbers.

Referee Report

2 major / 2 minor

Summary. The manuscript applies representational similarity analysis (RSA) to 7T fMRI data collected during passive viewing of natural scenes. It extracts representational geometry shared across individuals and aligns this geometry to hierarchical features from pre-trained vision and language neural networks. The central finding is a two-route cortical organization: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models align with shared structure in both routes, whereas language models align primarily with the lateral route.

Significance. If the shared RDMs prove reliable and the route distinction generalizes, the work refines classical dorsal/ventral stream models by characterizing scene perception as a distributed network with separable representational routes for context versus animate content. The integration of high-field fMRI with both vision and language model features offers a computational bridge that could guide targeted experiments on how cortical organization supports high-level scene understanding.

major comments (2)

[Methods] Methods section: No participant count, stimulus-set size or composition, statistical thresholds, or cross-validation scheme is reported for the RSA or model-alignment steps. These details are load-bearing for the claim that the ventromedial/lateral separation reflects stable functional routes rather than dataset-specific correlations.
[Results] Results section on route identification: Without reported split-half reliability of the shared component or control RDMs for low-level features (e.g., spatial frequency, object co-occurrence), the ventromedial specialization for layout/context could be driven by stimulus statistics in the particular scene set rather than a general organizational principle.
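The split-half reliability requested in the second major comment could be estimated along the following lines. This is a hedged sketch under the assumption that each stimulus was shown several times, so that repeats can be divided into two halves; the `trials` array layout, the number of repeats, and the helper names are hypothetical, and the data are synthetic.

```python
import numpy as np

def rdm_vector(responses):
    """Upper triangle of the correlation-distance RDM for (n_stimuli, n_features) data."""
    d = 1.0 - np.corrcoef(responses)
    i, j = np.triu_indices(d.shape[0], k=1)
    return d[i, j]

def split_half_reliability(trials, n_splits=100, seed=0):
    """Mean correlation between RDMs built from two random halves of the repeats.

    trials: array of shape (n_stimuli, n_repeats, n_features).
    """
    rng = np.random.default_rng(seed)
    n_repeats = trials.shape[1]
    scores = []
    for _ in range(n_splits):
        order = rng.permutation(n_repeats)
        half_a = trials[:, order[: n_repeats // 2]].mean(axis=1)
        half_b = trials[:, order[n_repeats // 2:]].mean(axis=1)
        scores.append(np.corrcoef(rdm_vector(half_a), rdm_vector(half_b))[0, 1])
    return float(np.mean(scores))

# Synthetic demo (hypothetical data): 40 stimuli, 4 repeats, 80 features.
rng = np.random.default_rng(3)
signal = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 80))
structured = signal[:, None, :] + rng.normal(size=(40, 4, 80))  # signal plus trial noise
noise_only = rng.normal(size=(40, 4, 80))                       # no repeatable signal

reliable = split_half_reliability(structured)
unreliable = split_half_reliability(noise_only)
print(f"structured: {reliable:.2f}, noise only: {unreliable:.2f}")
```

A reliability near zero for a parcel would mean its RDM is dominated by noise, in which case any alignment with other subjects or models should not be interpreted.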

minor comments (2)

[Abstract] Abstract: The phrase 'hierarchical features from vision and language neural networks' should specify which layers or models were used and how feature extraction was performed.
[Discussion] Discussion: A brief limitations paragraph addressing the passive-viewing design and potential task-specificity would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important aspects of methodological transparency and robustness. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our findings on the two-route organization during scene perception.


Referee: [Methods] Methods section: No participant count, stimulus-set size or composition, statistical thresholds, or cross-validation scheme is reported for the RSA or model-alignment steps. These details are load-bearing for the claim that the ventromedial/lateral separation reflects stable functional routes rather than dataset-specific correlations.

Authors: We agree that explicit reporting of these parameters is necessary to evaluate the stability of the identified routes. The revised Methods section will include the participant count, the size and composition of the natural scene stimulus set, the statistical thresholds applied (including any multiple-comparison corrections), and the cross-validation procedures used for both the shared RDM computation and the model-alignment analyses. These additions will directly support the interpretation that the ventromedial and lateral routes reflect reliable functional organization rather than idiosyncratic dataset features.

Revision: yes

Referee: [Results] Results section on route identification: Without reported split-half reliability of the shared component or control RDMs for low-level features (e.g., spatial frequency, object co-occurrence), the ventromedial specialization for layout/context could be driven by stimulus statistics in the particular scene set rather than a general organizational principle.

Authors: We recognize the value of these controls for distinguishing stimulus-driven effects from broader organizational principles. In the revision, we will report split-half reliability estimates for the shared representational components across subjects. We will also add control analyses comparing the observed routes against RDMs constructed from low-level image statistics (spatial frequency content) and higher-order co-occurrence measures. These supplementary results will be presented in the Results section to demonstrate that the ventromedial specialization for context and the lateral selectivity for animate content are not fully explained by the specific statistics of the stimulus set.

Revision: yes
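One way to run a control of the kind discussed in this exchange is a partial correlation: regress a low-level control RDM out of both the brain RDM and the model RDM, then correlate the residuals. The sketch below assumes this approach, which the source does not confirm as the authors' choice. For brevity the inputs are synthetic vectors standing in for the upper triangles of RDMs, and all variable names are hypothetical.

```python
import numpy as np

def partial_rsa(brain_vec, model_vec, control_vec):
    """Pearson correlation of brain and model RDM vectors after removing
    the linear contribution of a control RDM vector from both."""
    design = np.column_stack([np.ones_like(control_vec), control_vec])

    def residual(y):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return y - design @ beta

    return float(np.corrcoef(residual(brain_vec), residual(model_vec))[0, 1])

# Synthetic demo (hypothetical data): vectors stand in for RDM upper triangles.
rng = np.random.default_rng(4)
n_pairs = 500
low_level = rng.normal(size=n_pairs)  # e.g. a spatial-frequency control RDM
semantic = rng.normal(size=n_pairs)   # structure unrelated to the control

# Case 1: brain and model agree only because both track the low-level control.
brain_confounded = low_level + 0.5 * rng.normal(size=n_pairs)
model_confounded = low_level + 0.5 * rng.normal(size=n_pairs)
raw_confounded = float(np.corrcoef(brain_confounded, model_confounded)[0, 1])
partial_confounded = partial_rsa(brain_confounded, model_confounded, low_level)

# Case 2: brain and model also share structure beyond the control.
brain_genuine = semantic + low_level + 0.5 * rng.normal(size=n_pairs)
model_genuine = semantic + 0.5 * rng.normal(size=n_pairs)
partial_genuine = partial_rsa(brain_genuine, model_genuine, low_level)

print(f"confounded: raw {raw_confounded:.2f} -> partial {partial_confounded:.2f}")
print(f"genuine: partial {partial_genuine:.2f}")
```

If the route-specific alignment survives this kind of control, it is harder to attribute to image statistics alone; if it collapses, the stimulus-specificity concern gains weight.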

Circularity Check

0 steps flagged

No circularity: empirical RSA and external model comparisons are data-driven


The paper applies representational similarity analysis to 7T fMRI data from natural scene viewing, extracts shared representational geometry across individuals, and aligns it to hierarchical features from pre-trained vision and language networks. The two-route distinction (ventromedial for layout/context, lateral for animate content) emerges from these cross-subject and cross-model comparisons rather than any self-definitional equation, fitted parameter renamed as prediction, or load-bearing self-citation. No derivation reduces to its inputs by construction; the analysis remains falsifiable against the stimulus set and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work relies on standard assumptions of RSA and DNN feature extraction.

pith-pipeline@v0.9.0 · 5682 in / 1103 out tokens · 31576 ms · 2026-05-19T04:53:42.486982+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "This analysis revealed two distinct processing routes: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We applied representational similarity analysis to 7T fMRI data collected during natural scene viewing."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale
cs.CV 2026-04 unverdicted novelty 6.0

Evidence for cross-modal representational convergence weakens substantially at scale and in realistic many-to-many settings, indicating models learn rich but distinct representations.

Reference graph

Works this paper leans on

93 extracted references · 93 canonical work pages · cited by 1 Pith paper · 6 internal anchors

[1] James J. Gibson. The ecological approach to visual perception. Houghton, Mifflin and Company, 1979
[2] R. L. Gregory. Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 290(1038):181–197, 1980
[3] Irvin Rock. The logic of perception. MIT Press, Cambridge, 1983
[4] Zirui Chen and Michael F. Bonner. Universal dimensions of visual representation. Science Advances, 11(27):eadw7697, 2025. doi:10.1126/sciadv.adw7697
[5] Karl J. Friston. A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 360(1456):815–836, 2005. doi:10.1098/rstb.2005.1622
[6] Rajesh P. N. Rao and Dana H. Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1):79–87, 1999. doi:10.1038/4580
[7] Alan Yuille and Daniel Kersten. Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences, 10(7):301–308, 2006. doi:10.1016/j.tics.2006.05.002. Special issue: Probabilistic models of cognition.
[8] Ryota Kanai and Geraint Rees. The structural basis of inter-individual differences in human behaviour and cognition. Nature Reviews Neuroscience, 12(4):231–242, 2011. doi:10.1038/nrn3000
[9] Rosa Lafer-Sousa, Katherine L. Hermann, and Bevil R. Conway. Striking individual differences in color perception uncovered by ‘the dress’ photograph. Current Biology, 25(13):R545–R546, 2015. doi:10.1016/j.cub.2015.04.053
[10] D. Samuel Schwarzkopf, Chen Song, and Geraint Rees. The surface area of human V1 predicts the subjective experience of object size. Nature Neuroscience, 14(1):28–30, 2011. doi:10.1038/nn.2706
[11] Christopher Baldassano, Uri Hasson, and Kenneth A. Norman. Representation of real-world event schemas during narrative perception. Journal of Neuroscience, 38(45):9689–9699, 2018. doi:10.1523/JNEUROSCI.0251-18.2018
[12] Uri Hasson, Yuval Nir, Ifat Levy, Galit Fuhrmann, and Rafael Malach. Intersubject synchronization of cortical activity during natural vision. Science, 303(5664):1634–1640, 2004. doi:10.1126/science.1089506
[13] James V. Haxby, Andrew C. Connolly, and J. Swaroop Guntupalli. Decoding neural representational spaces using multivariate pattern analysis. Annual Review of Neuroscience, 37:435–456, 2014. doi:10.1146/annurev-neuro-062012-170325
[14] Janice Chen, Yuan Chang Leong, Christopher J. Honey, Chung H. Yong, Kenneth A. Norman, and Uri Hasson. Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1):115–125,
[15] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLOS Computational Biology, 10(11):1–29, 2014. doi:10.1371/journal.pcbi.1003915
[16] Daniel L. K. Yamins, Ha Hong, Charles F. Cadieu, Ethan A. Solomon, Darren Seibert, and James J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014. doi:10.1073/pnas.1403112111
[17] Radoslaw Martin Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, and Aude Oliva. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6(1):27755, 2016. doi:10.1038/srep27755
[18] Alexander J. E. Kell, Daniel L. K. Yamins, Erica N. Shook, Sam V. Norman-Haignere, and Josh H. McDermott. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3):630–644.e16, 2018. doi:10.1016/j.neuron.2018.03.044
[19] Charlotte Caucheteux and Jean-Rémi King. Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1):134, 2022. doi:10.1038/s42003-022-03036-1
[20] Radoslaw M. Cichy and Daniel Kaiser. Deep neural networks as scientific models. Trends in Cognitive Sciences, 23(4):305–317, 2019. doi:10.1016/j.tics.2019.01.009
[21] Adrien Doerig, Rowan P. Sommers, Katja Seeliger, Blake Richards, Jenann Ismael, Grace W. Lindsay, Konrad P. Kording, Talia Konkle, Marcel A. J. van Gerven, Nikolaus Kriegeskorte, and Tim C. Kietzmann. The neuroconnectionist research programme. Nature Reviews Neuroscience, 24(7):431–450, 2023. doi:10.1038/s41583-023-00705-w
[22] Erez Simony, Shany Grossman, and Rafael Malach. Brain–machine convergent evolution: Why finding parallels between brain and artificial systems is informative. Proceedings of the National Academy of Sciences, 121(41):e2319709121, 2024. doi:10.1073/pnas.2319709121
[23] Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1):116–126, 2022. doi:10.1038/s41593-021-00962-x
[24] Nadine Chang, John A. Pyles, Austin Marcus, Abhinav Gupta, Michael J. Tarr, and Elissa M. Aminoff. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Scientific Data, 6(1):49, 2019. doi:10.1038/s41597-019-0052-3
[25] Martin N. Hebart, Oliver Contier, Lina Teichmann, Adam H. Rockter, Charles Y. Zheng, Alexis Kidder, Anna Corriveau, Maryam Vaziri-Pashkam, and Chris I. Baker. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife, 12:e82580, 2023. doi:10.7554/eLife.82580
[26] Melvyn A. Goodale and A. David Milner. Separate visual pathways for perception and action. Trends in Neurosciences, 15(1):20–25, 1992. doi:10.1016/0166-2236(92)90344-8
[27] Russell Epstein and Nancy Kanwisher. A cortical representation of the local visual environment. Nature, 392(6676):598–601, 1998. doi:10.1038/33402
[28] Edmund T. Rolls, Xiaoqian Yan, Gustavo Deco, Yi Zhang, Veikko Jousmaki, and Jianfeng Feng. A ventromedial visual cortical ‘where’ stream to the human hippocampus for spatial scenes revealed with magnetoencephalography. Communications Biology, 7(1):1047, 2024. doi:10.1038/s42003-024-06719-z
[29] T. Allison, A. Puce, and G. McCarthy. Social perception from visual cues: role of the STS region. Trends in Cognitive Sciences, 4(7):267–278, 2000
[30] David Pitcher and Leslie G. Ungerleider. Evidence for a third visual pathway specialized for social perception. Trends in Cognitive Sciences, 25(2):100–110, 2021. doi:10.1016/j.tics.2020.11.006
[31] Matthew F. Glasser, Timothy S. Coalson, Emma C. Robinson, Carl D. Hacker, John Harwell, Essa Yacoub, Kamil Ugurbil, Jesper Andersson, Christian F. Beckmann, Mark Jenkinson, Stephen M. Smith, and David C. Van Essen. A multi-modal parcellation of human cerebral cortex. Nature, 536(7615):171–178, 2016. doi:10.1038/nature18933
[32] Nikolaus Kriegeskorte, Marieke Mur, and Peter A. Bandettini. Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 2008. doi:10.3389/neuro.06.004.2008
[33] David R. Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004. doi:10.1162/0899766042321814
[34] Daniel J. Felleman and David C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1–47, 1991. doi:10.1093/cercor/1.1.1-a
[35] Dwight J. Kravitz, Kadharbatcha S. Saleem, Chris I. Baker, Leslie G. Ungerleider, and Mortimer Mishkin. The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1):26–49, 2013. doi:10.1016/j.tics.2012.10.011
[36] Hamed Nili, Cai Wingfield, Alexander Walther, Li Su, William Marslen-Wilson, and Nikolaus Kriegeskorte. A toolbox for representational similarity analysis. PLOS Computational Biology, 10(4):1–11, 2014. doi:10.1371/journal.pcbi.1003553
[37] Agustin Lage-Castellanos, Giancarlo Valente, Elia Formisano, and Federico De Martino. Methods for computing the maximum performance of computational models of fMRI responses. PLOS Computational Biology, 15(3):1–25, 2019. doi:10.1371/journal.pcbi.1006397
[38] James J. DiCarlo, Davide Zoccolan, and Nicole C. Rust. How does the brain solve visual object recognition? Neuron, 73(3):415–434, 2012. doi:10.1016/j.neuron.2012.01.010
[39] Sara F. Popham, Alexander G. Huth, Natalia Y. Bilenko, Fatma Deniz, James S. Gao, Anwar O. Nunez-Elizalde, and Jack L. Gallant. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nature Neuroscience, 24(11):1628–1636, 2021. doi:10.1038/s41593-021-00921-6
[40] David J. Freedman and Earl K. Miller. Neural mechanisms of visual categorization: Insights from neurophysiology. Neuroscience & Biobehavioral Reviews, 32(2):311–329, 2008. doi:10.1016/j.neubiorev.2007.07.011
[41] Lior Bugatus, Kevin S. Weiner, and Kalanit Grill-Spector. Task alters category representations in prefrontal but not high-level visual cortex. NeuroImage, 155:437–449, 2017. doi:10.1016/j.neuroimage.2017.03.062
[42]

High-resolution image reconstruction with latent diffusion models from human brain activity

Yu Takagi and Shinji Nishimoto. High-resolution image reconstruction with latent diffusion models from human brain activity. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14453–14463, 2023

work page 2023
[43]

URL https://www.nature.com/articles/s41467-024-53147-y

Colin Conwell, Jacob S. Prince, Kendrick N. Kay, George A. Alvarez, and Talia Konkle. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nature Communications, 15 (1):9383, 2024. doi:10.1038/s41467-024-53147-y

work page doi:10.1038/s41467-024-53147-y 2024
[44]

Wandell, Serge O

Brian A. Wandell, Serge O. Dumoulin, and Alyssa A. Brewer. Visual field maps in human cortex. Neuron, 56(2): 366–383, 2007. doi:10.1016/j.neuron.2007.10.012

[45]

Soojin Park, Timothy F. Brady, Michelle R. Greene, and Aude Oliva. Disentangling scene content from spatial boundary: Complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. Journal of Neuroscience, 31(4):1333–1340, 2011. doi:10.1523/JNEUROSCI.3885-10.2011

[46]

Michael S. Beauchamp, Kathryn E. Lee, Brenna D. Argall, and Alex Martin. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron, 41(5):809–823, 2004. doi:10.1016/S0896-6273(04)00070-4

[47]

Chu-Chung Huang, Edmund T. Rolls, Jianfeng Feng, and Ching-Po Lin. An extended Human Connectome Project multimodal parcellation atlas of the human cortex and subcortical areas. Brain Structure and Function, 227(3):763–778, 2022. doi:10.1007/s00429-021-02421-6

[48]

Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (ECCV), pages 818–833, 2014. doi:10.1007/978-3-319-10590-1_53

[49]

Nikolaus Kriegeskorte. Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1:417–446, 2015. doi:10.1146/annurev-vision-082114-035447

[50]

Daniel L. K. Yamins and James J. DiCarlo. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365, 2016. doi:10.1038/nn.4244

[51]

Martin Schrimpf, Jonas Kubilius, Michael J. Lee, N. Apurva Ratan Murty, Robert Ajemian, and James J. DiCarlo. Integrative benchmarking to advance neurally mechanistic models of human intelligence. Neuron, 108(3):413–423, 2020. doi:10.1016/j.neuron.2020.07.040
[53]

Umut Güçlü and Marcel A. J. van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27):10005–10014, 2015. doi:10.1523/JNEUROSCI.5023-14.2015

[54]

Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, and James Zou. When and why vision-language models behave like bags-of-words, and what to do about it? In International Conference on Learning Representations (ICLR), 2023

[55]

Christopher Baldassano, Andre Esteva, Li Fei-Fei, and Diane M. Beck. Two distinct scene-processing networks connecting vision and memory. eNeuro, 3(5), 2016. doi:10.1523/ENEURO.0178-16.2016

[56]

Russell A. Epstein and Chris I. Baker. Scene perception in the human brain. Annual Review of Vision Science, 5:373–397, 2019. doi:10.1146/annurev-vision-091718-014809

[57]

David Pitcher. Neuropsychological evidence of a third visual pathway specialized for social perception. Nature Communications, 16(1):5774, 2025. doi:10.1038/s41467-025-61396-8

[58]

Jing Sui, Tülay Adali, Qingbao Yu, Jiayu Chen, and Vince D. Calhoun. A review of multivariate methods for multimodal fusion of brain imaging data. Journal of Neuroscience Methods, 204(1):68–81, 2012. doi:10.1016/j.jneumeth.2011.10.031

[59]

Tolga Çukur, Shinji Nishimoto, Alexander G. Huth, and Jack L. Gallant. Attention during natural vision warps semantic representation across the human brain. Nature Neuroscience, 16(6):763–770, 2013. doi:10.1038/nn.3381

[60]

Marin Dujmovic, Jeffrey Bowers, Federico Adolfi, and Gaurav Malhotra. Inferring DNN-Brain alignment using representational similarity analyses can be problematic. In ICLR Workshop on Re-Aligning Vision and Language Models with Human Values, 2024

[61]

Ruben S. van Bergen and Nikolaus Kriegeskorte. Going in circles is the way forward: the role of recurrence in visual inference. Current Opinion in Neurobiology, 65:176–193, 2020. doi:10.1016/j.conb.2020.11.009. Whole-brain interactions between neural circuits

[62]

Kohitij Kar, Jonas Kubilius, Kailyn Schmidt, Elias B. Issa, and James J. DiCarlo. Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nature Neuroscience, 22(6):974–983, 2019. doi:10.1038/s41593-019-0392-5

[63]

Daniel Pacheco-Estefan, Marie-Christin Fellner, Lukas Kunz, Hui Zhang, Peter Reinacher, Charlotte Roy, Armin Brandt, Andreas Schulze-Bonhage, Linglin Yang, Shuang Wang, Jing Liu, Gui Xue, and Nikolai Axmacher. Maintenance and transformation of representational formats during working memory prioritization. Nature Communications, 15(1):8234, 2024. doi:10.10...

work page doi:10.1038/s41467-024-52541-w 2024
[64]

C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948. doi:10.1002/j.1538-7305.1948.tb01338.x

[65]

Minyoung Huh, Brian Cheung, Tongzhou Wang, and Phillip Isola. The platonic representation hypothesis. In International Conference on Machine Learning (ICML), 2024

[66]

Rishi Jha, Collin Zhang, Vitaly Shmatikov, and John X. Morris. Harnessing the universal geometry of embeddings,

[67]

Jacob S. Prince, Ian Charest, Jan W. Kurzawski, John A. Pyles, Michael J. Tarr, and Kendrick N. Kay. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11, 2022. doi:10.7554/eLife.77599

[68]

Furkan Ozcelik and Rufin VanRullen. Natural scene reconstruction from fMRI signals using generative latent diffusion. Scientific Reports, 13(1):15666, 2023. doi:10.1038/s41598-023-42891-8

[69]

Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, and Tanishq Mathew Abraham. Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors. In Advances in Neural Information Processing Systems, vol...

work page 2023
[70]

Andreas Peter Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your ViT? data, augmentation, and regularization in vision transformers. Transactions on Machine Learning Research, 2022

[71]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), volume 139, pages 8748–8763, 2021

[72]

Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. Reproducible scaling laws for contrastive language-image learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2829, 2023

[73]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

work page 2024
[74]

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15979–15988, 2022. doi:10.1109/CVPR52688.2022.01553

[75]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021

[76]

Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M. Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, and Colin Raffel. Crosslingual generalization through multitask finetunin...

work page doi:10.18653/v1/2023.acl-long.891 2023
[77]

Gemma Team et al. Gemma 2: Improving open language models at a practical size, 2024. arXiv:2408.00118

[78]

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models, 2023. arXiv:2302.13971

[79]

Xinyang Geng and Hao Liu. Openllama: An open reproduction of LLaMA. https://github.com/openlm-research/open_llama, 2023

[80]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, et al. The Llama 3 herd of models, 2024. arXiv:2407.21783


[1]

James J. Gibson. The ecological approach to visual perception. Houghton, Mifflin and Company, 1979

[2]

R. L. Gregory. Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 290(1038):181–197, 1980

[3]

Irvin Rock. The logic of perception. MIT Press, Cambridge, 1983

[4]

Zirui Chen and Michael F. Bonner. Universal dimensions of visual representation. Science Advances, 11(27):eadw7697, 2025. doi:10.1126/sciadv.adw7697

[5]

Karl J. Friston. A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 360(1456):815–836, 2005. doi:10.1098/rstb.2005.1622

[6]

Rajesh P. N. Rao and Dana H. Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1):79–87, 1999. doi:10.1038/4580

[7]

Alan Yuille and Daniel Kersten. Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences, 10(7):301–308, 2006. doi:10.1016/j.tics.2006.05.002. Special issue: Probabilistic models of cognition.

[8]

Ryota Kanai and Geraint Rees. The structural basis of inter-individual differences in human behaviour and cognition. Nature Reviews Neuroscience, 12(4):231–242, 2011. doi:10.1038/nrn3000

[9]

Rosa Lafer-Sousa, Katherine L. Hermann, and Bevil R. Conway. Striking individual differences in color perception uncovered by ‘the dress’ photograph. Current Biology, 25(13):R545–R546, 2015. doi:10.1016/j.cub.2015.04.053

[10]

D. Samuel Schwarzkopf, Chen Song, and Geraint Rees. The surface area of human V1 predicts the subjective experience of object size. Nature Neuroscience, 14(1):28–30, 2011. doi:10.1038/nn.2706

[11]

Christopher Baldassano, Uri Hasson, and Kenneth A. Norman. Representation of real-world event schemas during narrative perception. Journal of Neuroscience, 38(45):9689–9699, 2018. doi:10.1523/JNEUROSCI.0251-18.2018

[12]

Uri Hasson, Yuval Nir, Ifat Levy, Galit Fuhrmann, and Rafael Malach. Intersubject synchronization of cortical activity during natural vision. Science, 303(5664):1634–1640, 2004. doi:10.1126/science.1089506

[13]

James V. Haxby, Andrew C. Connolly, and J. Swaroop Guntupalli. Decoding neural representational spaces using multivariate pattern analysis. Annual Review of Neuroscience, 37:435–456, 2014. doi:10.1146/annurev-neuro-062012-170325

[14]

Janice Chen, Yuan Chang Leong, Christopher J. Honey, Chung H. Yong, Kenneth A. Norman, and Uri Hasson. Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1):115–125,

[15]

Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLOS Computational Biology, 10(11):1–29, 2014. doi:10.1371/journal.pcbi.1003915

[16]

Daniel L. K. Yamins, Ha Hong, Charles F. Cadieu, Ethan A. Solomon, Darren Seibert, and James J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014. doi:10.1073/pnas.1403112111

[17]

Radoslaw Martin Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, and Aude Oliva. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6(1):27755, 2016. doi:10.1038/srep27755

[18]

Alexander J. E. Kell, Daniel L. K. Yamins, Erica N. Shook, Sam V. Norman-Haignere, and Josh H. McDermott. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3):630–644.e16, 2018. doi:10.1016/j.neuron.2018.03.044

[19]

Charlotte Caucheteux and Jean-Rémi King. Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1):134, 2022. doi:10.1038/s42003-022-03036-1

[20]

Radoslaw M. Cichy and Daniel Kaiser. Deep neural networks as scientific models. Trends in Cognitive Sciences, 23(4):305–317, 2019. doi:10.1016/j.tics.2019.01.009

[21]

Adrien Doerig, Rowan P. Sommers, Katja Seeliger, Blake Richards, Jenann Ismael, Grace W. Lindsay, Konrad P. Kording, Talia Konkle, Marcel A. J. van Gerven, Nikolaus Kriegeskorte, and Tim C. Kietzmann. The neuroconnectionist research programme. Nature Reviews Neuroscience, 24(7):431–450, 2023. doi:10.1038/s41583-023-00705-w

[22]

Erez Simony, Shany Grossman, and Rafael Malach. Brain–machine convergent evolution: Why finding parallels between brain and artificial systems is informative. Proceedings of the National Academy of Sciences, 121(41):e2319709121, 2024. doi:10.1073/pnas.2319709121

[23]

Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1):116–126, 2022. doi:10.1038/...

work page doi:10.1038/s41593-021-00962-x 2022

[24]

Nadine Chang, John A. Pyles, Austin Marcus, Abhinav Gupta, Michael J. Tarr, and Elissa M. Aminoff. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Scientific Data, 6(1):49, 2019. doi:10.1038/s41597-019-0052-3

[25]

Martin N. Hebart, Oliver Contier, Lina Teichmann, Adam H. Rockter, Charles Y. Zheng, Alexis Kidder, Anna Corriveau, Maryam Vaziri-Pashkam, and Chris I. Baker. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife, 12:e82580, 2023. doi:10.7554/eLife.82580

[26]

Melvyn A. Goodale and A. David Milner. Separate visual pathways for perception and action. Trends in Neurosciences, 15(1):20–25, 1992. doi:10.1016/0166-2236(92)90344-8

[27]

Russell Epstein and Nancy Kanwisher. A cortical representation of the local visual environment. Nature, 392(6676):598–601, 1998. doi:10.1038/33402

[28]

Edmund T. Rolls, Xiaoqian Yan, Gustavo Deco, Yi Zhang, Veikko Jousmaki, and Jianfeng Feng. A ventromedial visual cortical ‘where’ stream to the human hippocampus for spatial scenes revealed with magnetoencephalography. Communications Biology, 7(1):1047, 2024. doi:10.1038/s42003-024-06719-z

[29]

T. Allison, A. Puce, and G. McCarthy. Social perception from visual cues: role of the STS region. Trends in Cognitive Sciences, 4(7):267–278, 2000

[30]

David Pitcher and Leslie G. Ungerleider. Evidence for a third visual pathway specialized for social perception. Trends in Cognitive Sciences, 25(2):100–110, 2021. doi:10.1016/j.tics.2020.11.006

[31]

Matthew F. Glasser, Timothy S. Coalson, Emma C. Robinson, Carl D. Hacker, John Harwell, Essa Yacoub, Kamil Ugurbil, Jesper Andersson, Christian F. Beckmann, Mark Jenkinson, Stephen M. Smith, and David C. Van Essen. A multi-modal parcellation of human cerebral cortex. Nature, 536(7615):171–178, 2016. doi:10.1038/nature18933

[32]

Nikolaus Kriegeskorte, Marieke Mur, and Peter A. Bandettini. Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 2008. doi:10.3389/neuro.06.004.2008

[33]

David R. Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004. doi:10.1162/0899766042321814

[34]

Daniel J. Felleman and David C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1–47, 1991. doi:10.1093/cercor/1.1.1-a

[35]

Dwight J. Kravitz, Kadharbatcha S. Saleem, Chris I. Baker, Leslie G. Ungerleider, and Mortimer Mishkin. The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1):26–49, 2013. doi:10.1016/j.tics.2012.10.011

[36]

Hamed Nili, Cai Wingfield, Alexander Walther, Li Su, William Marslen-Wilson, and Nikolaus Kriegeskorte. A toolbox for representational similarity analysis. PLOS Computational Biology, 10(4):1–11, 2014. doi:10.1371/journal.pcbi.1003553

[37]

Agustin Lage-Castellanos, Giancarlo Valente, Elia Formisano, and Federico De Martino. Methods for computing the maximum performance of computational models of fMRI responses. PLOS Computational Biology, 15(3):1–25, 2019. doi:10.1371/journal.pcbi.1006397


[38] [38]

DiCarlo, Davide Zoccolan, and Nicole C

James J. DiCarlo, Davide Zoccolan, and Nicole C. Rust. How does the brain solve visual object recognition? Neuron, 73(3):415–434, 2012. doi:10.1016/j.neuron.2012.01.010

work page doi:10.1016/j.neuron.2012.01.010 2012

[39] [39]

Popham, Alexander G

Sara F. Popham, Alexander G. Huth, Natalia Y . Bilenko, Fatma Deniz, James S. Gao, Anwar O. Nunez-Elizalde, and Jack L. Gallant. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nature Neuroscience, 24(11):1628–1636, 2021. doi:10.1038/s41593-021-00921-6

work page doi:10.1038/s41593-021-00921-6 2021

[40] [40]

Freedman and Earl K

David J. Freedman and Earl K. Miller. Neural mechanisms of visual categorization: Insights from neurophysiology. Neuroscience & Biobehavioral Reviews, 32(2):311–329, 2008. doi:10.1016/j.neubiorev.2007.07.011

work page doi:10.1016/j.neubiorev.2007.07.011 2008

[41] [41]

Weiner, and Kalanit Grill-Spector

Lior Bugatus, Kevin S. Weiner, and Kalanit Grill-Spector. Task alters category representations in prefrontal but not high-level visual cortex. NeuroImage, 155:437–449, 2017. doi:10.1016/j.neuroimage.2017.03.062

work page doi:10.1016/j.neuroimage.2017.03.062 2017

[42] [42]

High-resolution image reconstruction with latent diffusion models from human brain activity

Yu Takagi and Shinji Nishimoto. High-resolution image reconstruction with latent diffusion models from human brain activity. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14453–14463, 2023

work page 2023

[43] [43]

URL https://www.nature.com/articles/s41467-024-53147-y

Colin Conwell, Jacob S. Prince, Kendrick N. Kay, George A. Alvarez, and Talia Konkle. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nature Communications, 15 (1):9383, 2024. doi:10.1038/s41467-024-53147-y

work page doi:10.1038/s41467-024-53147-y 2024

[44] [44]

Wandell, Serge O

Brian A. Wandell, Serge O. Dumoulin, and Alyssa A. Brewer. Visual field maps in human cortex. Neuron, 56(2): 366–383, 2007. doi:10.1016/j.neuron.2007.10.012

work page doi:10.1016/j.neuron.2007.10.012 2007

[45] [45]

Brady, Michelle R

Soojin Park, Timothy F. Brady, Michelle R. Greene, and Aude Oliva. Disentangling scene content from spatial boundary: Complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. Journal of Neuroscience, 31(4):1333–1340, 2011. doi:10.1523/JNEUROSCI.3885-10.2011

work page doi:10.1523/jneurosci.3885-10.2011 2011

[46] [46]

Beauchamp, Kathryn E

Michael S. Beauchamp, Kathryn E. Lee, Brenna D. Argall, and Alex Martin. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron, 41(5):809–823, 2004. doi:10.1016/S0896- 6273(04)00070-4

work page doi:10.1016/s0896- 2004

[47] [47]

Rolls, Jianfeng Feng, and Ching-Po Lin

Chu-Chung Huang, Edmund T. Rolls, Jianfeng Feng, and Ching-Po Lin. An extended Human Connectome Project multimodal parcellation atlas of the human cortex and subcortical areas. Brain Structure and Function, 227(3): 763–778, 2022. doi:10.1007/s00429-021-02421-6. 17

work page doi:10.1007/s00429-021-02421-6 2022

[48] [48]

Zeiler and Rob Fergus

Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (ECCV), pages 818–833, 2014. doi:10.1007/978-3-319-10590-1_53

work page doi:10.1007/978-3-319-10590-1_53 2014

[49] [49]

Deep neural networks: A new framework for modeling biological vision and brain information processing

Nikolaus Kriegeskorte. Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1:417–446, 2015. doi:10.1146/annurev-vision-082114- 035447

work page doi:10.1146/annurev-vision-082114- 2015

[50] [50]

Daniel L. K. Yamins and James J. DiCarlo. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365, 2016. doi:10.1038/nn.4244

work page doi:10.1038/nn.4244 2016

[51] [51]

Martin Schrimpf, Jonas Kubilius, Michael J. Lee, N. Apurva Ratan Murty, Robert Ajemian, and James J. DiCarlo. Integrative benchmarking to advance neurally mechanistic models of human intelligence.Neuron, 108(3):413–423,

work page

[52] [52]

doi:10.1016/j.neuron.2020.07.040

work page doi:10.1016/j.neuron.2020.07.040 2020

[53] [53]

Umut Güçlü and Marcel A. J. van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience , 35(27):10005–10014, 2015. doi:10.1523/JNEUROSCI.5023-14.2015

work page doi:10.1523/jneurosci.5023-14.2015 2015

[54] [54]

When and why vision- language models behave like bags-of-words, and what to do about it? In International Conference on Learning Representations (ICLR), 2023

Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, and James Zou. When and why vision- language models behave like bags-of-words, and what to do about it? In International Conference on Learning Representations (ICLR), 2023

work page 2023

[55] [55]

Christopher Baldassano, Andre Esteva, Li Fei-Fei, and Diane M. Beck. Two distinct scene-processing networks connecting vision and memory. eNeuro, 3(5), 2016. doi:10.1523/ENEURO.0178-16.2016

work page doi:10.1523/eneuro.0178-16.2016 2016

[56] [56]

Epstein and Chris I

Russell A. Epstein and Chris I. Baker. Scene perception in the human brain. Annual Review of Vision Science, 5: 373–397, 2019. doi:10.1146/annurev-vision-091718-014809

work page doi:10.1146/annurev-vision-091718-014809 2019

[57] [57]

Neuropsychological evidence of a third visual pathway specialized for social perception

David Pitcher. Neuropsychological evidence of a third visual pathway specialized for social perception. Nature Communications, 16(1):5774, 2025. doi:10.1038/s41467-025-61396-8

work page doi:10.1038/s41467-025-61396-8 2025

[58] [58]

Jing Sui, Tülay Adali, Qingbao Yu, Jiayu Chen, and Vince D. Calhoun. A review of multivariate meth- ods for multimodal fusion of brain imaging data. Journal of Neuroscience Methods , 204(1):68–81, 2012. doi:10.1016/j.jneumeth.2011.10.031

work page doi:10.1016/j.jneumeth.2011.10.031 2012

[59] [59]

Huth, and Jack L

Tolga Çukur, Shinji Nishimoto, Alexander G. Huth, and Jack L. Gallant. Attention during natural vision warps semantic representation across the human brain. Nature Neuroscience, 16(6):763–770, 2013. doi:10.1038/nn.3381

work page doi:10.1038/nn.3381 2013

[60] [60]

Inferring DNN-Brain alignment using representational similarity analyses can be problematic

Marin Dujmovic, Jeffrey Bowers, Federico Adolfi, and Gaurav Malhotra. Inferring DNN-Brain alignment using representational similarity analyses can be problematic. In ICLR Workshop on Re-Aligning Vision and Language Models with Human Values, 2024

work page 2024

[61] [61]

van Bergen and Nikolaus Kriegeskorte

Ruben S. van Bergen and Nikolaus Kriegeskorte. Going in circles is the way forward: the role of recurrence in visual inference. Current Opinion in Neurobiology , 65:176–193, 2020. doi:10.1016/j.conb.2020.11.009. Whole-brain interactions between neural circuits

work page doi:10.1016/j.conb.2020.11.009 2020

[62] [62]

Issa, and James J

Kohitij Kar, Jonas Kubilius, Kailyn Schmidt, Elias B. Issa, and James J. DiCarlo. Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nature Neuroscience, 22(6): 974–983, 2019. doi:10.1038/s41593-019-0392-5

work page doi:10.1038/s41593-019-0392-5 2019

[63] [63]

Maintenance and transformation of representational formats during working memory prioritization

Daniel Pacheco-Estefan, Marie-Christin Fellner, Lukas Kunz, Hui Zhang, Peter Reinacher, Charlotte Roy, Armin Brandt, Andreas Schulze-Bonhage, Linglin Yang, Shuang Wang, Jing Liu, Gui Xue, and Nikolai Axmacher. Maintenance and transformation of representational formats during working memory prioritization. Nature Communications, 15(1):8234, 2024. doi:10.10...

work page doi:10.1038/s41467-024-52541-w 2024

[64] [64]

C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948. doi:10.1002/j.1538-7305.1948.tb01338.x

work page doi:10.1002/j.1538-7305.1948.tb01338.x 1948

[65] [65]

The platonic representation hypothesis

Minyoung Huh, Brian Cheung, Tongzhou Wang, and Phillip Isola. The platonic representation hypothesis. In International Conference on Machine Learning (ICML), 2024

[66]

Rishi Jha, Collin Zhang, Vitaly Shmatikov, and John X. Morris. Harnessing the universal geometry of embeddings,

[67]

Jacob S. Prince, Ian Charest, Jan W. Kurzawski, John A. Pyles, Michael J. Tarr, and Kendrick N. Kay. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11, 2022. doi:10.7554/eLife.77599

[68]

Furkan Ozcelik and Rufin VanRullen. Natural scene reconstruction from fMRI signals using generative latent diffusion. Scientific Reports, 13(1):15666, 2023. doi:10.1038/s41598-023-42891-8

[69]

Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, and Tanishq Mathew Abraham. Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors. In Advances in Neural Information Processing Systems, vol...

work page 2023

[70]

Andreas Peter Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your ViT? data, augmentation, and regularization in vision transformers. Transactions on Machine Learning Research, 2022

[71]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), volume 139, pages 8748–8763, 2021

[72]

Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. Reproducible scaling laws for contrastive language-image learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2829, 2023

[73]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

work page 2024

[74]

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15979–15988, 2022. doi:10.1109/CVPR52688.2022.01553

[75]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021

[76]

Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M. Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, and Colin Raffel. Crosslingual generalization through multitask finetunin...

work page doi:10.18653/v1/2023.acl-long.891 2023

[77]

Gemma Team et al. Gemma 2: Improving open language models at a practical size, 2024. arXiv:2408.00118

[78]

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models, 2023. arXiv:2302.13971

[79]

Xinyang Geng and Hao Liu. Openllama: An open reproduction of LLaMA. https://github.com/openlm-research/open_llama, 2023

[80]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, et al. The Llama 3 herd of models, 2024. arXiv:2407.21783