pith.

arxiv: 2507.13941 · v2 · pith:PSWTA72Bnew · submitted 2025-07-18 · 🧬 q-bio.NC · cs.AI · cs.CV · eess.IV

Shared representations in brains and models reveal a two-route cortical organization during scene perception

Pith reviewed 2026-05-19 04:53 UTC · model grok-4.3

classification 🧬 q-bio.NC · cs.AI · cs.CV · eess.IV
keywords representational similarity analysis · scene perception · fMRI · ventromedial pathway · lateral occipitotemporal pathway · neural networks · cortical organization · visual cortex

The pith

Scene perception uses two separate cortical routes, one for layout and context and one for animate content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study applies representational similarity analysis to 7T fMRI recordings made while people viewed natural scenes. Representational geometry shared across participants is compared with layer-wise features from vision and language neural networks. This comparison identifies two routes: a ventromedial route that encodes scene layout and environmental context, and a lateral occipitotemporal route tuned to animate elements. Vision models match the geometry in both routes, whereas language models align mainly with the lateral route. The result reframes scene perception as a distributed network rather than a single hierarchical stream.

Core claim

Representational similarity analysis performed on 7T fMRI data collected during natural scene viewing identifies two distinct processing routes in the cortex. A ventromedial pathway specializes in scene layout and environmental context, while a lateral occipitotemporal pathway is selective for animate content. Hierarchical features from vision neural networks align with the shared structure found in both routes, but language-model features correspond primarily to the lateral route. These observations refine classical visual-stream models by describing scene perception as a distributed cortical network with separable representational organizations for context and animate content.

What carries the argument

Representational similarity analysis that extracts shared geometry across individuals' brain responses to scenes and matches it against hierarchical features from vision and language neural networks.
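
A minimal sketch of that comparison, assuming correlation-distance RDMs (1 - Pearson r between the response patterns of each stimulus pair) and a Pearson correlation between RDM upper triangles as the second-order score, which is the RSA variant the figure captions name. The function names and toy arrays are hypothetical; this illustrates the technique, not the authors' code.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rdm(patterns):
    """Condensed RDM: 1 - Pearson r for every stimulus pair (upper triangle, row-major)."""
    return [1.0 - pearson(p, q) for p, q in combinations(patterns, 2)]


def rsa_score(rdm_a, rdm_b):
    """Second-order similarity: Pearson r between two condensed RDMs."""
    return pearson(rdm_a, rdm_b)


# Hypothetical data: 4 stimuli x 3 voxels, and model features that are an
# affine copy of each brain pattern (so every pairwise correlation is preserved).
brain = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.5], [0.0, 1.0, 4.0], [3.0, 3.5, 1.0]]
model = [[2.0 * v + 1.0 for v in p] for p in brain]
score = rsa_score(rdm(brain), rdm(model))
print(round(score, 6))  # 1.0
```

Because RSA compares only pairwise dissimilarities, brain patterns (voxels) and model features (units) never need a shared coordinate system; the same score can therefore be computed between two people, between two parcels, or between a parcel and a network layer.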

If this is right

  • Scene perception is carried by separable representational routes for contextual layout and animate content.
  • Vision models capture shared structure across both routes while language models align mainly with the animate route.
  • Classical two-stream models of vision must be updated to include this distributed two-route organization for complex scenes.
  • Shared patterns across people point to stable cortical organizations that support scene understanding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The two routes could show different sensitivity to focal brain damage, producing selective deficits in layout versus object recognition.
  • Active tasks such as navigation or search might reveal how the routes compete or cooperate under behavioral demands.
  • Similar cross-model comparisons in other modalities could test whether the separation is specific to visual scene processing.

Load-bearing premise

The assumption that cross-individual representational similarity recorded during passive scene viewing captures stable, functionally meaningful cortical routes rather than task- or stimulus-specific correlations.

What would settle it

The claim would be undermined if the distinct similarity geometries of the ventromedial and lateral occipitotemporal regions became indistinguishable when the same scenes are viewed under an active task that requires integrating layout and animate information.

Figures

Figures reproduced from arXiv: 2507.13941 by Lluís Fuentemilla, Pablo Marcos-Manchón.

Figure 1. A unified framework for tracing representational pathways. (A) Feature Extraction. For each image stimulus, we extracted equivalent representations from brain activity and deep neural networks. Single-trial fMRI responses were aggregated within HCP-MMP cortical parcels [31] to create vector representations of the brain’s response. Concurrently, layer-wise activations were extracted from pre-trained vision an…
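
One reading of the feature-extraction step in panel (A), assuming "aggregated within parcels" means collecting each parcel's voxel responses into one pattern vector per stimulus, so that an RDM can later be built per parcel. The function name and the two atlas labels are hypothetical placeholders for HCP-MMP parcel assignments.

```python
from collections import defaultdict


def parcel_matrices(responses, voxel_parcels):
    """Group voxel responses into per-parcel, stimulus-by-voxel pattern matrices.

    responses: one voxel-wise response vector per stimulus.
    voxel_parcels: atlas label of each voxel, in the same order as the vectors.
    Returns {label: [pattern for stimulus 0, pattern for stimulus 1, ...]}.
    """
    matrices = defaultdict(list)
    for response in responses:
        if len(response) != len(voxel_parcels):
            raise ValueError("each response needs exactly one value per labelled voxel")
        per_parcel = defaultdict(list)
        for value, label in zip(response, voxel_parcels):
            per_parcel[label].append(value)
        for label, pattern in per_parcel.items():
            matrices[label].append(pattern)
    return dict(matrices)


responses = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # 2 stimuli x 4 voxels
labels = ["V1", "V1", "PHA1", "PHA1"]  # hypothetical parcel assignments
print(parcel_matrices(responses, labels))
# {'V1': [[1.0, 2.0], [5.0, 6.0]], 'PHA1': [[3.0, 4.0], [7.0, 8.0]]}
```
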
Figure 2. Cortical distribution of representational alignment. (A) Inter-subject alignment (RSA, Pearson r) computed per parcel and grouped by macro-anatomical clusters [47]. Box-plots show the ten clusters with the highest mean alignment in the NSD sample (N = 8; symmetric HCP-MMP atlas [31]). Red lines mark the parcel-wise null distribution (mean ± s.d.; 10 000 label permutations). All displayed parcels exceed chance …
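
A sketch of the logic in panel (A) under stated assumptions: alignment is the mean Pearson r between every pair of different subjects' RDMs for one parcel, and the null shuffles each subject's stimulus labels independently before recomputing that mean. The toy RDMs come from hypothetical 1-D stimulus positions, and 1,000 permutations are used instead of the caption's 10 000 to keep the run short.

```python
import random
import statistics
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def upper(m):
    """Upper triangle (diagonal excluded) of a square RDM, row-major."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def relabel(m, order):
    """Reorder an RDM's rows and columns by the stimulus permutation `order`."""
    return [[m[i][j] for j in order] for i in order]


def inter_subject_alignment(rdms):
    """Mean Pearson r between the RDMs of every pair of different subjects."""
    scores = [pearson(upper(a), upper(b)) for a, b in combinations(rdms, 2)]
    return sum(scores) / len(scores)


def permutation_null(rdms, n_perm=1000, seed=0):
    """Alignment values after shuffling each subject's stimulus labels independently."""
    rng = random.Random(seed)
    n = len(rdms[0])
    null = []
    for _ in range(n_perm):
        shuffled = []
        for m in rdms:
            order = list(range(n))
            rng.shuffle(order)
            shuffled.append(relabel(m, order))
        null.append(inter_subject_alignment(shuffled))
    return null


def distance_rdm(positions):
    """Toy RDM from hypothetical 1-D stimulus positions."""
    return [[abs(a - b) for b in positions] for a in positions]


subjects = [
    distance_rdm([0.0, 1.0, 2.0, 4.0, 7.0, 11.0]),
    distance_rdm([0.0, 1.2, 2.1, 3.9, 7.3, 10.8]),
    distance_rdm([0.1, 0.9, 2.2, 4.2, 6.8, 11.1]),
]
observed = inter_subject_alignment(subjects)
null = permutation_null(subjects, n_perm=1000)
p_value = (sum(v >= observed for v in null) + 1) / (len(null) + 1)
print(f"alignment r = {observed:.3f}; null mean = {statistics.mean(null):.3f} "
      f"(s.d. {statistics.stdev(null):.3f}); p = {p_value:.4f}")
```

The p-value follows the common (count + 1) / (n + 1) convention, so it can never be exactly zero.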
Figure 3. Hierarchical convergence between models and cortex. (A–C) Layer-wise alignment (RSA) between vision models and brain activity for representative parcels within the three reference hubs. The alignment curves are averaged across all vision models used in the study. Panels show Early Visual Cortex (A), the Ventral Hub (B), and the LOTC Hub (C). Lines show the mean alignment across participants (N = 8), with s…
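
Layer-wise curves like those in panels (A-C) can be sketched as one RSA score per layer, averaged across models. This assumes each model's layers have already been mapped onto a common set of relative-depth bins, a hypothetical preprocessing step since real networks differ in depth; the toy RDMs are chosen so that the deepest layer matches the "brain" best.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def upper(m):
    """Upper triangle (diagonal excluded) of a square RDM, row-major."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def distance_rdm(positions):
    """Toy RDM from hypothetical 1-D stimulus positions."""
    return [[abs(a - b) for b in positions] for a in positions]


def layer_profile(brain_rdm, layer_rdms):
    """RSA score (Pearson r on upper triangles) of each layer of one model."""
    target = upper(brain_rdm)
    return [pearson(target, upper(layer)) for layer in layer_rdms]


def mean_profile(brain_rdm, models):
    """Average layer-wise profiles across models.

    Assumes every model has already been mapped onto the same number of
    relative-depth bins (a hypothetical preprocessing step).
    """
    profiles = [layer_profile(brain_rdm, layers) for layers in models]
    return [sum(vals) / len(vals) for vals in zip(*profiles)]


brain = distance_rdm([0.0, 1.0, 2.0, 4.0, 7.0, 11.0])
model_a = [distance_rdm(p) for p in ([5, 0, 9, 1, 3, 2], [0, 2, 1, 5, 6, 11], [0, 1, 2, 4, 7, 11])]
model_b = [distance_rdm(p) for p in ([3, 8, 0, 2, 6, 1], [0, 2, 1, 5, 6, 11], [0, 1.1, 2, 4.2, 7, 10.9])]
curve = mean_profile(brain, [model_a, model_b])
peak_layer = max(range(len(curve)), key=curve.__getitem__)
print([round(v, 3) for v in curve], "peak at layer", peak_layer)
```
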
Figure 4. Inter-subject representational connectivity network. (A) Inter-subject connectivity matrix. Parcel-wise representational connectivity (RSA, Pearson’s r) is shown for 30 key cortical parcels (N = 8). Each cell represents the mean RSA score between the RDMs of two parcels, computed across all pairs of different individuals. The matrix reveals a clear block structure corresponding to three principal hubs: Ea…
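
The connectivity matrix described in panel (A) can be sketched as below, assuming each cell averages Pearson r over all ordered pairs of different subjects (i, j), comparing parcel a in subject i with parcel b in subject j. Restricting to i != j means only structure shared across people can contribute. Subjects, parcels, and RDM values are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def connectivity(rdms):
    """Inter-subject representational connectivity.

    rdms[s][p] is the condensed RDM of parcel p in subject s. Cell (a, b) is the
    mean Pearson r between parcel a in subject i and parcel b in subject j,
    averaged over all ordered pairs of different subjects (i != j).
    """
    n_sub, n_par = len(rdms), len(rdms[0])
    pairs = list(permutations(range(n_sub), 2))
    return [
        [sum(pearson(rdms[i][a], rdms[j][b]) for i, j in pairs) / len(pairs)
         for b in range(n_par)]
        for a in range(n_par)
    ]


# Two hypothetical subjects, four parcels: parcels 0-1 share one geometry (x),
# parcels 2-3 share another (y); each list is a condensed RDM over 4 stimuli.
x, x_alt = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.5]
y, y_alt = [6.0, 1.0, 5.0, 2.0, 4.0, 3.0], [6.0, 1.0, 5.0, 2.0, 4.0, 3.5]
subjects = [[x, x_alt, y, y_alt], [x_alt, x, y_alt, y]]
conn = connectivity(subjects)
for row in conn:
    print([round(v, 2) for v in row])
```

With these inputs the printed matrix shows two high-correlation blocks (parcels 0-1 and 2-3) and low values between them, a toy version of the block structure the caption describes.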
Figure 5. Dominant representational dimensions in cortical hubs. (A–C) Stimuli projection onto the first two shared components for each hub, extracted via Kernel Multi-view CCA (KMCCA) using the voxel data from all eight NSD participants for the common image set. In Early Visual Cortex (A), stimuli form a category-free cloud reflecting low-level visual similarity. In the Ventral Hub (B), the dominant axis arranges s…
Figure 6. Shared representational geometry is modulated by stimulus content and task. Alignment computed using the symmetric HCP atlas combining both hemispheres. (A–D) Replication in BOLD5000, using complex natural scenes. With a different task (valence rating) but with comparable complex scene stimuli that include social content, the main findings were replicated. Both the inter-subject alignment (A) and its corre…
Original abstract

The brain transforms visual inputs into high-dimensional cortical representations that support diverse cognitive and behavioral goals. Characterizing how this information is organized and routed across the human brain is essential for understanding how we process complex visual scenes. Here, we applied representational similarity analysis to 7T fMRI data collected during natural scene viewing. We quantified representational geometry shared across individuals and compared it to hierarchical features from vision and language neural networks. This analysis revealed two distinct processing routes: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models aligned with shared structure in both routes, whereas language models corresponded primarily with the lateral pathway. These findings refine classical visual-stream models by characterizing scene perception as a distributed cortical network with separable representational routes for context and animate content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript applies representational similarity analysis (RSA) to 7T fMRI data collected during passive viewing of natural scenes. It extracts representational geometry shared across individuals and aligns this geometry to hierarchical features from pre-trained vision and language neural networks. The central finding is a two-route cortical organization: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models align with shared structure in both routes, whereas language models align primarily with the lateral route.

Significance. If the shared RDMs prove reliable and the route distinction generalizes, the work refines classical dorsal/ventral stream models by characterizing scene perception as a distributed network with separable representational routes for context versus animate content. The integration of high-field fMRI with both vision and language model features offers a computational bridge that could guide targeted experiments on how cortical organization supports high-level scene understanding.

major comments (2)
  1. [Methods] Methods section: No participant count, stimulus-set size or composition, statistical thresholds, or cross-validation scheme is reported for the RSA or model-alignment steps. These details are load-bearing for the claim that the ventromedial/lateral separation reflects stable functional routes rather than dataset-specific correlations.
  2. [Results] Results section on route identification: Without reported split-half reliability of the shared component or control RDMs for low-level features (e.g., spatial frequency, object co-occurrence), the ventromedial specialization for layout/context could be driven by stimulus statistics in the particular scene set rather than a general organizational principle.
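
For the referee's reliability request, a split-half reliability of the group RDM could be computed as sketched here: split subjects into two equal halves, average each half's RDM, correlate the halves, and step the result up with the Spearman-Brown formula 2r / (1 + r) to estimate full-sample reliability. Averaging r over all balanced splits before the correction is one common choice, assumed here; names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_rdm(rdms):
    """Element-wise mean of several condensed RDMs."""
    return [sum(vals) / len(vals) for vals in zip(*rdms)]


def split_half_reliability(rdms):
    """Spearman-Brown-corrected split-half reliability of the group-mean RDM.

    Enumerates every balanced split of an even number of subjects once (subject 0
    is always in the first half), averages the half-vs-half Pearson r, then
    applies the Spearman-Brown step-up 2r / (1 + r).
    """
    n = len(rdms)
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even number of subjects (>= 2)")
    rs = []
    for rest in combinations(range(1, n), n // 2 - 1):
        first = {0, *rest}
        half_a = mean_rdm([rdms[i] for i in sorted(first)])
        half_b = mean_rdm([rdms[i] for i in range(n) if i not in first])
        rs.append(pearson(half_a, half_b))
    r = sum(rs) / len(rs)
    return 2 * r / (1 + r)


# Four hypothetical subjects whose condensed RDMs differ only slightly.
base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
subjects = [base, [1.1, 2.0, 3.0, 4.0, 5.0, 6.0],
            [1.0, 2.1, 3.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.0, 4.0, 5.2, 6.0]]
reliability = split_half_reliability(subjects)
print(round(reliability, 3))
```

With eight participants, as in the NSD sample, this enumerates 35 distinct splits.
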
minor comments (2)
  1. [Abstract] Abstract: The phrase 'hierarchical features from vision and language neural networks' should specify which layers or models were used and how feature extraction was performed.
  2. [Discussion] Discussion: A brief limitations paragraph addressing the passive-viewing design and potential task-specificity would improve clarity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important aspects of methodological transparency and robustness. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our findings on the two-route organization during scene perception.

Point-by-point responses
  1. Referee: [Methods] Methods section: No participant count, stimulus-set size or composition, statistical thresholds, or cross-validation scheme is reported for the RSA or model-alignment steps. These details are load-bearing for the claim that the ventromedial/lateral separation reflects stable functional routes rather than dataset-specific correlations.

    Authors: We agree that explicit reporting of these parameters is necessary to evaluate the stability of the identified routes. The revised Methods section will include the participant count, the size and composition of the natural scene stimulus set, the statistical thresholds applied (including any multiple-comparison corrections), and the cross-validation procedures used for both the shared RDM computation and the model-alignment analyses. These additions will directly support the interpretation that the ventromedial and lateral routes reflect reliable functional organization rather than idiosyncratic dataset features. revision: yes

  2. Referee: [Results] Results section on route identification: Without reported split-half reliability of the shared component or control RDMs for low-level features (e.g., spatial frequency, object co-occurrence), the ventromedial specialization for layout/context could be driven by stimulus statistics in the particular scene set rather than a general organizational principle.

    Authors: We recognize the value of these controls for distinguishing stimulus-driven effects from broader organizational principles. In the revision, we will report split-half reliability estimates for the shared representational components across subjects. We will also add control analyses comparing the observed routes against RDMs constructed from low-level image statistics (spatial frequency content) and higher-order co-occurrence measures. These supplementary results will be presented in the Results section to demonstrate that the ventromedial specialization for context and the lateral selectivity for animate content are not fully explained by the specific statistics of the stimulus set. revision: yes
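
One standard way to run the promised low-level control is a partial RSA: correlate brain and model RDMs after removing the part of each that is linearly predicted by a control RDM (for example, one built from spatial-frequency statistics). The sketch uses the first-order partial correlation formula; the control RDM and all values are hypothetical, and the revised manuscript may use a different control procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_from_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_rsa(brain_rdm, model_rdm, control_rdm):
    """Brain-model RSA after partialling out a control RDM (all condensed, same order)."""
    return partial_from_r(
        pearson(brain_rdm, model_rdm),
        pearson(brain_rdm, control_rdm),
        pearson(model_rdm, control_rdm),
    )


# Hypothetical condensed RDMs over 4 stimuli (6 pairs).
brain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
low_level = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]  # stand-in for a spatial-frequency RDM
print("raw:", round(pearson(brain, model), 3),
      "partial:", round(partial_rsa(brain, model, low_level), 3))
```

If the partial score stays close to the raw score, the control RDM explains little of the alignment; a large drop would lend weight to the referee's concern about stimulus statistics.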

Circularity Check

0 steps flagged

No circularity: empirical RSA and external model comparisons are data-driven

full rationale

The paper applies representational similarity analysis to 7T fMRI data from natural scene viewing, extracts shared representational geometry across individuals, and aligns it to hierarchical features from pre-trained vision and language networks. The two-route distinction (ventromedial for layout/context, lateral for animate content) emerges from these cross-subject and cross-model comparisons rather than any self-definitional equation, fitted parameter renamed as prediction, or load-bearing self-citation. No derivation reduces to its inputs by construction; the analysis remains falsifiable against the stimulus set and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work relies on standard assumptions of RSA and DNN feature extraction.

pith-pipeline@v0.9.0 · 5682 in / 1103 out tokens · 31576 ms · 2026-05-19T04:53:42.486982+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

    cs.CV 2026-04 unverdicted novelty 6.0

    Evidence for cross-modal representational convergence weakens substantially at scale and in realistic many-to-many settings, indicating models learn rich but distinct representations.

Reference graph

Works this paper leans on

93 extracted references · 93 canonical work pages · cited by 1 Pith paper · 6 internal anchors

  1. [1]

    James J. Gibson. The ecological approach to visual perception. Houghton, Mifflin and Company, 1979

  2. [2]

    R. L. Gregory. Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 290(1038):181–197, 1980

  3. [3]

    Irvin Rock. The logic of perception. MIT Press, Cambridge, 1983

  4. [4]

    Zirui Chen and Michael F. Bonner. Universal dimensions of visual representation. Science Advances, 11(27):eadw7697, 2025. doi:10.1126/sciadv.adw7697

  5. [5]

    Karl J. Friston. A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 360(1456):815–836, 2005. doi:10.1098/rstb.2005.1622

  6. [6]

    Rajesh P. N. Rao and Dana H. Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1):79–87, 1999. doi:10.1038/4580

  7. [7]

    Alan Yuille and Daniel Kersten. Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences, 10(7):301–308, 2006. doi:10.1016/j.tics.2006.05.002. Special issue: Probabilistic models of cognition.

  8. [8]

    Ryota Kanai and Geraint Rees. The structural basis of inter-individual differences in human behaviour and cognition. Nature Reviews Neuroscience, 12(4):231–242, 2011. doi:10.1038/nrn3000

  9. [9]

    Rosa Lafer-Sousa, Katherine L. Hermann, and Bevil R. Conway. Striking individual differences in color perception uncovered by ‘the dress’ photograph. Current Biology, 25(13):R545–R546, 2015. doi:10.1016/j.cub.2015.04.053

  10. [10]

    D. Samuel Schwarzkopf, Chen Song, and Geraint Rees. The surface area of human V1 predicts the subjective experience of object size. Nature Neuroscience, 14(1):28–30, 2011. doi:10.1038/nn.2706

  11. [11]

    Christopher Baldassano, Uri Hasson, and Kenneth A. Norman. Representation of real-world event schemas during narrative perception. Journal of Neuroscience, 38(45):9689–9699, 2018. doi:10.1523/JNEUROSCI.0251-18.2018

  12. [12]

    Uri Hasson, Yuval Nir, Ifat Levy, Galit Fuhrmann, and Rafael Malach. Intersubject synchronization of cortical activity during natural vision. Science, 303(5664):1634–1640, 2004. doi:10.1126/science.1089506

  13. [13]

    James V. Haxby, Andrew C. Connolly, and J. Swaroop Guntupalli. Decoding neural representational spaces using multivariate pattern analysis. Annual Review of Neuroscience, 37:435–456, 2014. doi:10.1146/annurev-neuro-062012-170325

  14. [14]

    Janice Chen, Yuan Chang Leong, Christopher J. Honey, Chung H. Yong, Kenneth A. Norman, and Uri Hasson. Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1):115–125,

  15. [15]

    Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLOS Computational Biology, 10(11):1–29, 2014. doi:10.1371/journal.pcbi.1003915

  16. [16]

    Daniel L. K. Yamins, Ha Hong, Charles F. Cadieu, Ethan A. Solomon, Darren Seibert, and James J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014. doi:10.1073/pnas.1403112111

  17. [17]

    Radoslaw Martin Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, and Aude Oliva. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6(1):27755, 2016. doi:10.1038/srep27755

  18. [18]

    Alexander J. E. Kell, Daniel L. K. Yamins, Erica N. Shook, Sam V. Norman-Haignere, and Josh H. McDermott. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3):630–644.e16, 2018. doi:10.1016/j.neuron.2018.03.044

  19. [19]

    Charlotte Caucheteux and Jean-Rémi King. Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1):134, 2022. doi:10.1038/s42003-022-03036-1

  20. [20]

    Radoslaw M. Cichy and Daniel Kaiser. Deep neural networks as scientific models. Trends in Cognitive Sciences, 23(4):305–317, 2019. doi:10.1016/j.tics.2019.01.009

  21. [21]

    Adrien Doerig, Rowan P. Sommers, Katja Seeliger, Blake Richards, Jenann Ismael, Grace W. Lindsay, Konrad P. Kording, Talia Konkle, Marcel A. J. van Gerven, Nikolaus Kriegeskorte, and Tim C. Kietzmann. The neuroconnectionist research programme. Nature Reviews Neuroscience, 24(7):431–450, 2023. doi:10.1038/s41583-023-00705-w

  22. [22]

    Erez Simony, Shany Grossman, and Rafael Malach. Brain–machine convergent evolution: Why finding parallels between brain and artificial systems is informative. Proceedings of the National Academy of Sciences, 121(41):e2319709121, 2024. doi:10.1073/pnas.2319709121

  23. [23]

    Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1):116–126, 2022. doi:10.1038/...

  24. [24]

    Nadine Chang, John A. Pyles, Austin Marcus, Abhinav Gupta, Michael J. Tarr, and Elissa M. Aminoff. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Scientific Data, 6(1):49, 2019. doi:10.1038/s41597-019-0052-3

  25. [25]

    Martin N. Hebart, Oliver Contier, Lina Teichmann, Adam H. Rockter, Charles Y. Zheng, Alexis Kidder, Anna Corriveau, Maryam Vaziri-Pashkam, and Chris I. Baker. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife, 12:e82580, 2023. doi:10.7554/eLife.82580

  26. [26]

    Melvyn A. Goodale and A. David Milner. Separate visual pathways for perception and action. Trends in Neurosciences, 15(1):20–25, 1992. doi:10.1016/0166-2236(92)90344-8

  27. [27]

    Russell Epstein and Nancy Kanwisher. A cortical representation of the local visual environment. Nature, 392(6676):598–601, 1998. doi:10.1038/33402

  28. [28]

    Edmund T. Rolls, Xiaoqian Yan, Gustavo Deco, Yi Zhang, Veikko Jousmaki, and Jianfeng Feng. A ventromedial visual cortical ‘where’ stream to the human hippocampus for spatial scenes revealed with magnetoencephalography. Communications Biology, 7(1):1047, 2024. doi:10.1038/s42003-024-06719-z

  29. [29]

    T. Allison, A. Puce, and G. McCarthy. Social perception from visual cues: role of the STS region. Trends in Cognitive Sciences, 4(7):267–278, 2000

  30. [30]

    David Pitcher and Leslie G. Ungerleider. Evidence for a third visual pathway specialized for social perception. Trends in Cognitive Sciences, 25(2):100–110, 2021. doi:10.1016/j.tics.2020.11.006

  31. [31]

    Matthew F. Glasser, Timothy S. Coalson, Emma C. Robinson, Carl D. Hacker, John Harwell, Essa Yacoub, Kamil Ugurbil, Jesper Andersson, Christian F. Beckmann, Mark Jenkinson, Stephen M. Smith, and David C. Van Essen. A multi-modal parcellation of human cerebral cortex. Nature, 536(7615):171–178, 2016. doi:10.1038/nature18933

  32. [32]

    Nikolaus Kriegeskorte, Marieke Mur, and Peter A. Bandettini. Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 2008. doi:10.3389/neuro.06.004.2008

  33. [33]

    David R. Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004. doi:10.1162/0899766042321814

  34. [34]

    Daniel J. Felleman and David C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1–47, 1991. doi:10.1093/cercor/1.1.1-a

  35. [35]

    Dwight J. Kravitz, Kadharbatcha S. Saleem, Chris I. Baker, Leslie G. Ungerleider, and Mortimer Mishkin. The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1):26–49, 2013. doi:10.1016/j.tics.2012.10.011

  36. [36]

    Hamed Nili, Cai Wingfield, Alexander Walther, Li Su, William Marslen-Wilson, and Nikolaus Kriegeskorte. A toolbox for representational similarity analysis. PLOS Computational Biology, 10(4):1–11, 2014. doi:10.1371/journal.pcbi.1003553

  37. [37]

    Agustin Lage-Castellanos, Giancarlo Valente, Elia Formisano, and Federico De Martino. Methods for computing the maximum performance of computational models of fMRI responses. PLOS Computational Biology, 15(3):1–25, 2019. doi:10.1371/journal.pcbi.1006397

  38. [38]

    James J. DiCarlo, Davide Zoccolan, and Nicole C. Rust. How does the brain solve visual object recognition? Neuron, 73(3):415–434, 2012. doi:10.1016/j.neuron.2012.01.010

  39. [39]

    Sara F. Popham, Alexander G. Huth, Natalia Y. Bilenko, Fatma Deniz, James S. Gao, Anwar O. Nunez-Elizalde, and Jack L. Gallant. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nature Neuroscience, 24(11):1628–1636, 2021. doi:10.1038/s41593-021-00921-6

  40. [40]

    David J. Freedman and Earl K. Miller. Neural mechanisms of visual categorization: Insights from neurophysiology. Neuroscience & Biobehavioral Reviews, 32(2):311–329, 2008. doi:10.1016/j.neubiorev.2007.07.011

  41. [41]

    Lior Bugatus, Kevin S. Weiner, and Kalanit Grill-Spector. Task alters category representations in prefrontal but not high-level visual cortex. NeuroImage, 155:437–449, 2017. doi:10.1016/j.neuroimage.2017.03.062

  42. [42]

    Yu Takagi and Shinji Nishimoto. High-resolution image reconstruction with latent diffusion models from human brain activity. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14453–14463, 2023

  43. [43]

    Colin Conwell, Jacob S. Prince, Kendrick N. Kay, George A. Alvarez, and Talia Konkle. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nature Communications, 15(1):9383, 2024. doi:10.1038/s41467-024-53147-y

  44. [44]

    Brian A. Wandell, Serge O. Dumoulin, and Alyssa A. Brewer. Visual field maps in human cortex. Neuron, 56(2):366–383, 2007. doi:10.1016/j.neuron.2007.10.012

  45. [45]

    Soojin Park, Timothy F. Brady, Michelle R. Greene, and Aude Oliva. Disentangling scene content from spatial boundary: Complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. Journal of Neuroscience, 31(4):1333–1340, 2011. doi:10.1523/JNEUROSCI.3885-10.2011

  46. [46]

    Michael S. Beauchamp, Kathryn E. Lee, Brenna D. Argall, and Alex Martin. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron, 41(5):809–823, 2004. doi:10.1016/S0896-6273(04)00070-4

  47. [47]

    Chu-Chung Huang, Edmund T. Rolls, Jianfeng Feng, and Ching-Po Lin. An extended Human Connectome Project multimodal parcellation atlas of the human cortex and subcortical areas. Brain Structure and Function, 227(3):763–778, 2022. doi:10.1007/s00429-021-02421-6

  48. [48]

    Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (ECCV), pages 818–833, 2014. doi:10.1007/978-3-319-10590-1_53

  49. [49]

    Nikolaus Kriegeskorte. Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1:417–446, 2015. doi:10.1146/annurev-vision-082114-035447

  50. [50]

    Daniel L. K. Yamins and James J. DiCarlo. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365, 2016. doi:10.1038/nn.4244

  51. [51]

    Martin Schrimpf, Jonas Kubilius, Michael J. Lee, N. Apurva Ratan Murty, Robert Ajemian, and James J. DiCarlo. Integrative benchmarking to advance neurally mechanistic models of human intelligence. Neuron, 108(3):413–423, 2020. doi:10.1016/j.neuron.2020.07.040

  53. [53]

    Umut Güçlü and Marcel A. J. van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27):10005–10014, 2015. doi:10.1523/JNEUROSCI.5023-14.2015

  54. [54]

    Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, and James Zou. When and why vision-language models behave like bags-of-words, and what to do about it? In International Conference on Learning Representations (ICLR), 2023

  55. [55]

    Christopher Baldassano, Andre Esteva, Li Fei-Fei, and Diane M. Beck. Two distinct scene-processing networks connecting vision and memory. eNeuro, 3(5), 2016. doi:10.1523/ENEURO.0178-16.2016

  56. [56]

    Russell A. Epstein and Chris I. Baker. Scene perception in the human brain. Annual Review of Vision Science, 5:373–397, 2019. doi:10.1146/annurev-vision-091718-014809

  57. [57]

    David Pitcher. Neuropsychological evidence of a third visual pathway specialized for social perception. Nature Communications, 16(1):5774, 2025. doi:10.1038/s41467-025-61396-8

  58. [58]

    Jing Sui, Tülay Adali, Qingbao Yu, Jiayu Chen, and Vince D. Calhoun. A review of multivariate methods for multimodal fusion of brain imaging data. Journal of Neuroscience Methods, 204(1):68–81, 2012. doi:10.1016/j.jneumeth.2011.10.031

  59. [59]

    Tolga Çukur, Shinji Nishimoto, Alexander G. Huth, and Jack L. Gallant. Attention during natural vision warps semantic representation across the human brain. Nature Neuroscience, 16(6):763–770, 2013. doi:10.1038/nn.3381

  60. [60]

    Marin Dujmovic, Jeffrey Bowers, Federico Adolfi, and Gaurav Malhotra. Inferring DNN-Brain alignment using representational similarity analyses can be problematic. In ICLR Workshop on Re-Aligning Vision and Language Models with Human Values, 2024

  61. [61]

    Ruben S. van Bergen and Nikolaus Kriegeskorte. Going in circles is the way forward: the role of recurrence in visual inference. Current Opinion in Neurobiology, 65:176–193, 2020. doi:10.1016/j.conb.2020.11.009

  62. [62]

    Kohitij Kar, Jonas Kubilius, Kailyn Schmidt, Elias B. Issa, and James J. DiCarlo. Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nature Neuroscience, 22(6):974–983, 2019. doi:10.1038/s41593-019-0392-5

  63. [63]

    Daniel Pacheco-Estefan, Marie-Christin Fellner, Lukas Kunz, Hui Zhang, Peter Reinacher, Charlotte Roy, Armin Brandt, Andreas Schulze-Bonhage, Linglin Yang, Shuang Wang, Jing Liu, Gui Xue, and Nikolai Axmacher. Maintenance and transformation of representational formats during working memory prioritization. Nature Communications, 15(1):8234, 2024. doi:10.10...

  64. [64]

    C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948. doi:10.1002/j.1538-7305.1948.tb01338.x

  65. [65]

    Minyoung Huh, Brian Cheung, Tongzhou Wang, and Phillip Isola. The platonic representation hypothesis. In International Conference on Machine Learning (ICML), 2024

  66. [66]

    Rishi Jha, Collin Zhang, Vitaly Shmatikov, and John X. Morris. Harnessing the universal geometry of embeddings,

  67. [67]

    Jacob S. Prince, Ian Charest, Jan W. Kurzawski, John A. Pyles, Michael J. Tarr, and Kendrick N. Kay. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11, 2022. doi:10.7554/eLife.77599

  68. [68]

    Furkan Ozcelik and Rufin VanRullen. Natural scene reconstruction from fMRI signals using generative latent diffusion. Scientific Reports, 13(1):15666, 2023. doi:10.1038/s41598-023-42891-8

  69. [69]

    Paul S. Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J. Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A. Norman, and Tanishq Mathew Abraham. Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors. In Advances in Neural Information Processing Systems, vol...

  70. [70]

    Andreas Peter Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your ViT? data, augmentation, and regularization in vision transformers. Transactions on Machine Learning Research, 2022

  71. [71]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), volume 139, pages 8748–8763, 2021

  72. [72]

    Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. Reproducible scaling laws for contrastive language-image learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2829, 2023

  73. [73]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

  74. [74]

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15979–15988, 2022. doi:10.1109/CVPR52688.2022.01553

  75. [75]

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021

  76. [76]

    Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M. Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, and Colin Raffel. Crosslingual generalization through multitask finetunin...

  77. [77]

    Gemma Team et al. Gemma 2: Improving open language models at a practical size, 2024. arXiv:2408.00118

  78. [78]

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models, 2023. arXiv:2302.13971

  79. [79]

    Xinyang Geng and Hao Liu. OpenLLaMA: An open reproduction of LLaMA. https://github.com/openlm-research/open_llama, 2023

  80. [80]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, et al. The Llama 3 herd of models, 2024. arXiv:2407.21783
