Shared representations in brains and models reveal a two-route cortical organization during scene perception
Pith reviewed 2026-05-19 04:53 UTC · model grok-4.3
The pith
Scene perception uses two separate cortical routes, one for layout and context and one for animate content.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Representational similarity analysis performed on 7T fMRI data collected during natural scene viewing identifies two distinct processing routes in the cortex. A ventromedial pathway specializes in scene layout and environmental context, while a lateral occipitotemporal pathway is selective for animate content. Hierarchical features from vision neural networks align with the shared structure found in both routes, but language-model features correspond primarily to the lateral route. These observations refine classical visual-stream models by describing scene perception as a distributed cortical network with separable representational organizations for context and animate content.
What carries the argument
Representational similarity analysis that extracts shared geometry across individuals' brain responses to scenes and matches it against hierarchical features from vision and language neural networks.
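To make the mechanism concrete, here is a minimal sketch of representational similarity analysis on toy data, using only the Python standard library. It is not the paper's pipeline. The helper names (`project`, `shared_rdm`, `demo`), the simulated responses, and the choice to average subject RDMs as a stand-in for "shared geometry" are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)  # raises ZeroDivisionError on constant input


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            out[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return out


def spearman(x, y):
    """Spearman rank correlation, a common choice for comparing RDMs."""
    return pearson(ranks(x), ranks(y))


def rdm(patterns):
    """Upper triangle of the RDM: 1 - Pearson r for each stimulus pair."""
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]


def shared_rdm(subject_rdms):
    """Average RDM across subjects (assumed stand-in for shared geometry)."""
    return [sum(vals) / len(vals) for vals in zip(*subject_rdms)]


def project(latent, n_units, noise, rng):
    """Hypothetical response generator: random linear readout plus noise."""
    dim = len(latent[0])
    w = [[rng.gauss(0, 1) for _ in range(n_units)] for _ in range(dim)]
    return [[sum(row[k] * w[k][u] for k in range(dim)) + rng.gauss(0, noise)
             for u in range(n_units)] for row in latent]


def demo(seed=0):
    """Alignment between a toy shared brain RDM and a toy model RDM."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(12)]  # 12 toy stimuli
    subjects = [rdm(project(latent, 60, 0.5, rng)) for _ in range(4)]  # 4 toy subjects
    model = rdm(project(latent, 60, 0.0, rng))                         # toy model layer
    return spearman(shared_rdm(subjects), model)


if __name__ == "__main__":
    print("shared-RDM vs model-RDM Spearman:", round(demo(), 3))
```

In a real analysis the rows passed to `rdm` would be voxel patterns from a region of interest or unit activations from a network layer, and the alignment score would be computed separately for each region and each layer.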
If this is right
- Scene perception is carried by separable representational routes for contextual layout and animate content.
- Vision models capture shared structure across both routes while language models align mainly with the animate route.
- Classical two-stream models of vision must be updated to include this distributed two-route organization for complex scenes.
- Shared patterns across people point to stable cortical organizations that support scene understanding.
Where Pith is reading between the lines
- The two routes could show different sensitivity to focal brain damage, producing selective deficits in layout versus object recognition.
- Active tasks such as navigation or search might reveal how the routes compete or cooperate under behavioral demands.
- Similar cross-model comparisons in other modalities could test whether the separation is specific to visual scene processing.
Load-bearing premise
The assumption that cross-individual representational similarity recorded during passive scene viewing captures stable, functionally meaningful cortical routes rather than task- or stimulus-specific correlations.
What would settle it
The claim would be undermined if the distinct similarity geometries in ventromedial versus lateral occipitotemporal regions became indistinguishable when the same scenes are viewed under an active task that requires integrating layout and animate information.
Original abstract
The brain transforms visual inputs into high-dimensional cortical representations that support diverse cognitive and behavioral goals. Characterizing how this information is organized and routed across the human brain is essential for understanding how we process complex visual scenes. Here, we applied representational similarity analysis to 7T fMRI data collected during natural scene viewing. We quantified representational geometry shared across individuals and compared it to hierarchical features from vision and language neural networks. This analysis revealed two distinct processing routes: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models aligned with shared structure in both routes, whereas language models corresponded primarily with the lateral pathway. These findings refine classical visual-stream models by characterizing scene perception as a distributed cortical network with separable representational routes for context and animate content.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript applies representational similarity analysis (RSA) to 7T fMRI data collected during passive viewing of natural scenes. It extracts representational geometry shared across individuals and aligns this geometry to hierarchical features from pre-trained vision and language neural networks. The central finding is a two-route cortical organization: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models align with shared structure in both routes, whereas language models align primarily with the lateral route.
Significance. If the shared RDMs prove reliable and the route distinction generalizes, the work refines classical dorsal/ventral stream models by characterizing scene perception as a distributed network with separable representational routes for context versus animate content. The integration of high-field fMRI with both vision and language model features offers a computational bridge that could guide targeted experiments on how cortical organization supports high-level scene understanding.
major comments (2)
- [Methods] Methods section: No participant count, stimulus-set size or composition, statistical thresholds, or cross-validation scheme is reported for the RSA or model-alignment steps. These details are load-bearing for the claim that the ventromedial/lateral separation reflects stable functional routes rather than dataset-specific correlations.
- [Results] Results section on route identification: Without reported split-half reliability of the shared component or control RDMs for low-level features (e.g., spatial frequency, object co-occurrence), the ventromedial specialization for layout/context could be driven by stimulus statistics in the particular scene set rather than a general organizational principle.
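The split-half reliability requested above can be sketched as follows. This is a toy illustration under stated assumptions, not the manuscript's procedure: `split_half_reliability` and `toy_subjects` are hypothetical names, each subject is represented by a flat list of pairwise dissimilarities, and the Spearman-Brown step is a standard correction that is only meaningful when the mean split-half correlation is positive.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mean_rdm(rdms):
    """Element-wise mean of several flattened RDMs."""
    return [sum(vals) / len(vals) for vals in zip(*rdms)]


def split_half_reliability(subject_rdms, n_splits=200, seed=0):
    """Mean correlation between group RDMs of random subject halves.

    Returns (raw_r, spearman_brown_r). The correction 2r / (1 + r)
    estimates the reliability of the full-sample average.
    """
    rng = random.Random(seed)
    idx = list(range(len(subject_rdms)))
    half = len(idx) // 2
    rs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        a = mean_rdm([subject_rdms[i] for i in idx[:half]])
        b = mean_rdm([subject_rdms[i] for i in idx[half:]])
        rs.append(pearson(a, b))
    r = sum(rs) / len(rs)
    return r, 2 * r / (1 + r)


def toy_subjects(n_subjects=6, n_pairs=45, noise=0.1, seed=1):
    """Simulated subjects: one shared RDM plus independent Gaussian noise."""
    rng = random.Random(seed)
    shared = [rng.uniform(0, 2) for _ in range(n_pairs)]
    return [[d + rng.gauss(0, noise) for d in shared] for _ in range(n_subjects)]


if __name__ == "__main__":
    raw, corrected = split_half_reliability(toy_subjects())
    print("raw split-half r:", round(raw, 3), "corrected:", round(corrected, 3))
```

A low value from such a check would suggest that the shared component is dominated by noise or by a few participants, which is the concern the comment raises.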
minor comments (2)
- [Abstract] Abstract: The phrase 'hierarchical features from vision and language neural networks' should specify which layers or models were used and how feature extraction was performed.
- [Discussion] Discussion: A brief limitations paragraph addressing the passive-viewing design and potential task-specificity would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important aspects of methodological transparency and robustness. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our findings on the two-route organization during scene perception.
Point-by-point responses
- Referee: [Methods] Methods section: No participant count, stimulus-set size or composition, statistical thresholds, or cross-validation scheme is reported for the RSA or model-alignment steps. These details are load-bearing for the claim that the ventromedial/lateral separation reflects stable functional routes rather than dataset-specific correlations.
  Authors: We agree that explicit reporting of these parameters is necessary to evaluate the stability of the identified routes. The revised Methods section will include the participant count, the size and composition of the natural scene stimulus set, the statistical thresholds applied (including any multiple-comparison corrections), and the cross-validation procedures used for both the shared RDM computation and the model-alignment analyses. These additions will directly support the interpretation that the ventromedial and lateral routes reflect reliable functional organization rather than idiosyncratic dataset features.
  Revision: yes
- Referee: [Results] Results section on route identification: Without reported split-half reliability of the shared component or control RDMs for low-level features (e.g., spatial frequency, object co-occurrence), the ventromedial specialization for layout/context could be driven by stimulus statistics in the particular scene set rather than a general organizational principle.
  Authors: We recognize the value of these controls for distinguishing stimulus-driven effects from broader organizational principles. In the revision, we will report split-half reliability estimates for the shared representational components across subjects. We will also add control analyses comparing the observed routes against RDMs constructed from low-level image statistics (spatial frequency content) and higher-order co-occurrence measures. These supplementary results will be presented in the Results section to demonstrate that the ventromedial specialization for context and the lateral selectivity for animate content are not fully explained by the specific statistics of the stimulus set.
  Revision: yes
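One way such a control analysis could be set up is a partial correlation between a brain RDM and a model RDM after accounting for a low-level control RDM. The sketch below is an assumption about how that might look, not the authors' stated method. The vectors are hand-built toy values, and `partial_corr` uses the standard first-order partial correlation formula on raw values; rank-transforming the inputs first would give a Spearman-style variant.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy flattened RDMs (hypothetical values chosen so the arithmetic is exact).
control = [-1, -1, 1, 1]   # e.g. a low-level control such as spatial frequency
brain = [-2, 0, 0, 2]      # control plus a brain-specific component
model_a = [-2, 0, 2, 0]    # shares only the control component with `brain`
model_b = [-3, 1, 1, 1]    # shares the control AND the brain-specific component

if __name__ == "__main__":
    print(pearson(brain, model_a))                # 0.5: looks like alignment
    print(partial_corr(brain, model_a, control))  # about 0: explained by the control
    print(pearson(brain, model_b))                # about 0.82
    print(partial_corr(brain, model_b, control))  # about 0.71: survives the control
```

The first model's apparent alignment disappears once the control is partialled out, while the second model's alignment persists. That contrast is the kind of evidence the promised control analyses would need to show for the ventromedial route.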
Circularity Check
No circularity: empirical RSA and external model comparisons are data-driven
Full rationale
The paper applies representational similarity analysis to 7T fMRI data from natural scene viewing, extracts shared representational geometry across individuals, and aligns it to hierarchical features from pre-trained vision and language networks. The two-route distinction (ventromedial for layout/context, lateral for animate content) emerges from these cross-subject and cross-model comparisons rather than any self-definitional equation, fitted parameter renamed as prediction, or load-bearing self-citation. No derivation reduces to its inputs by construction; the analysis remains falsifiable against the stimulus set and external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  This analysis revealed two distinct processing routes: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content.
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  We applied representational similarity analysis to 7T fMRI data collected during natural scene viewing.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale
  Evidence for cross-modal representational convergence weakens substantially at scale and in realistic many-to-many settings, indicating models learn rich but distinct representations.