
arxiv: 2605.23035 · v1 · pith:HXAK6GFKnew · submitted 2026-05-21 · 💻 cs.CL · cs.AI · q-bio.NC

Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

Pith reviewed 2026-05-25 05:36 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · q-bio.NC
keywords sparse autoencoders · brain encoding · semantic features · cortical topography · neural encoding models · LLM interpretability · language processing

The pith

Sparse autoencoders extract semantic features from LLMs that map onto distinct brain regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper decomposes intermediate layers of GPT-2 XL and Llama-3.1-8B using sparse autoencoders into 16K-32K features per layer and shows that the semantic subset alone recovers 94 percent of the models' ability to predict human brain responses to language. It then tests whether five semantic subcategories, chosen in advance from independent neuroscience studies, align with specific cortical regions. A convergence test finds statistically reliable subcategory-to-region mapping. This supplies a mechanistic account for why certain LLM layers predict brain activity and demonstrates that the alignment occurs at a finer grain than earlier methods reached.

Core claim

Semantic features identified by sparse autoencoders recover 94 percent of peak brain-encoding performance (r=0.285) and exceed variance-matched baselines. The same features, when grouped into five a priori semantic subcategories, map onto distinct brain regions with Spearman ρ=0.72 and hypergeometric p=0.007. These features also improve prediction of reading times beyond lexical controls and the pattern holds across English, Chinese, and French.
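The phrase "94 percent of peak brain-encoding performance (r=0.285)" can be read two ways, depending on whether r=0.285 is the peak score or the semantic-subset score. The arithmetic sketch below works through both readings. The function names are illustrative, and the text above does not confirm either reading.

```python
# Two readings of "semantic features recover 94% of peak performance (r=0.285)".
# Which reading the paper intends is an assumption; both are computed.

RECOVERY = 0.94   # fraction of peak performance recovered by semantic features
R_QUOTED = 0.285  # correlation quoted alongside the 94% figure


def semantic_r_if_quoted_is_peak(recovery: float, r_peak: float) -> float:
    """Reading A: the quoted r is the peak (full-model) score."""
    return recovery * r_peak


def peak_r_if_quoted_is_semantic(recovery: float, r_semantic: float) -> float:
    """Reading B: the quoted r is the semantic-subset score."""
    return r_semantic / recovery


reading_a = semantic_r_if_quoted_is_peak(RECOVERY, R_QUOTED)
reading_b = peak_r_if_quoted_is_semantic(RECOVERY, R_QUOTED)
print(f"Reading A: semantic-subset r = {reading_a:.3f}")
print(f"Reading B: peak r = {reading_b:.3f}")
```

Reading A puts the semantic-subset score near 0.268; reading B puts the peak near 0.303. The gap is small, but it is the kind of definitional detail the referee report asks the authors to state.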

What carries the argument

The argument rests on sparse autoencoder features, especially the human-validated semantic subset. These features are fed into neural encoding models to predict fMRI responses, and the resulting fits are tested for alignment with cortical semantic topography using a formal convergence statistic.
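For readers unfamiliar with encoding models, the sketch below shows the general recipe in miniature: fit a regularized linear map from features to a response, then score it by the correlation between predicted and held-out responses. It is a sketch under stated assumptions, not the paper's pipeline. The ridge penalty, the synthetic data, and every name (`ridge_fit`, `pearson_r`, `demo`) are illustrative, and a real analysis would fit one model per voxel on actual SAE activations.

```python
import math
import random


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def demo(seed=0, n=200, n_train=150):
    """Fit on synthetic 'features -> response' data; return held-out r."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0, 0.5]  # synthetic ground truth, not from the paper
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, 0.1) for row in X]
    w = ridge_fit(X[:n_train], y[:n_train], lam=1.0)
    pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X[n_train:]]
    return pearson_r(pred, y[n_train:])


r_heldout = demo()
print(f"held-out r = {r_heldout:.3f}")
```

The synthetic response here is almost noise-free, so the held-out correlation is far higher than anything expected from fMRI data. Only the fit-then-correlate structure carries over.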

If this is right

  • Semantic features alone account for nearly all LLM-to-brain predictive power.
  • The five-category mapping to brain regions survives a formal statistical convergence test.
  • SAE features improve reading-time prediction beyond lexical variables.
  • Exploratory analysis indicates the brain may additionally encode semantic prediction errors.
  • The alignment pattern generalizes across three languages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same SAE taxonomy could be applied to other imaging modalities or to models of different sizes to test whether the topography mapping remains stable.
  • If specific SAE features drive the alignment, targeted ablation or steering of those features should measurably change brain-prediction accuracy.
  • Extending the approach to sentence-level or discourse-level stimuli might reveal whether higher-order semantic structure also shows topographic organization.

Load-bearing premise

The five semantic subcategories taken from prior neuroscience work classify the SAE features correctly and correspond to separate brain regions without any post-hoc selection or adjustment of the mapping.

What would settle it

Absence of significant subcategory-to-region alignment in the convergence test, for example a Spearman ρ near zero or hypergeometric p-value above 0.05, would falsify the topography claim.
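The two statistics named here can be sketched on a toy subcategory-by-region grid. This is one plausible construction, assuming the Spearman correlation is taken between predicted and observed cell values and the hypergeometric test asks how many of the top observed cells are also predicted cells. The text does not describe the paper's exact procedure, and the 3×3 values below are hypothetical.

```python
from math import comb, sqrt


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / sqrt(va * vb)


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n cells are drawn from N cells, K of them 'predicted'."""
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / comb(N, n)


# Hypothetical flattened 3x3 grid (subcategory x region), for illustration only.
predicted = [1, 0, 0, 0, 1, 0, 0, 0, 1]  # 1 = predicted primary association
observed = [0.30, 0.10, 0.12, 0.08, 0.28, 0.11, 0.09, 0.13, 0.25]  # encoding r

rho = spearman_rho(predicted, observed)
n_top = sum(predicted)
top = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)[:n_top]
overlap = sum(predicted[i] for i in top)
p_value = hypergeom_tail(overlap, len(observed), sum(predicted), n_top)
print(f"rho = {rho:.2f}, overlap = {overlap}, hypergeometric p = {p_value:.4f}")
```

A ρ near zero, or a small overlap with a correspondingly large hypergeometric p, would be the failure pattern described above.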

Figures

Figures reproduced from arXiv: 2605.23035 by Dongxin Guo, Jikun Wu, Siu Ming Yiu.

Figure 1: Five-stage pipeline: extract LLM activations

Figure 2: Brain prediction across layers. Both mod…

Figure 3: Predicted vs. observed subcategory × region patterns. Left: a priori predictions from Binder et al. (2009), Huth et al. (2016), and Deniz et al. (2019) (dark = predicted primary association). Right: observed SAE encoding r-values with FDR-significant cells marked. Formal convergence: ρ=0.72, p<0.001; hypergeometric p=0.007; Mantel r=0.64, p=0.002.
  Spilled text from Section 4.5 (Formal Convergence Test): We formally test whether the observe…

Figure 4: Subcategory × region encoding at L24. ** survives FDR correction (q<0.05, Benjamini-Hochberg); * nominally significant (p<0.05, permutation).

  Feature Categorization Confusion Matrix (rows: GPT-4 label; columns: human label)

                Sem   Syn   Lex   Pred   Oth
  Semantic       82     4     3      2     9
  Syntactic       3    86     2      1     8
  Lexical         5     3    84      1     7
  Prediction      4     2     1     79    14
  Other           6     5    10     17    62
  Per-cat. κ    .78   .83   .80    .74   .58

Figure 5: Activation patching at L24 with 95% CIs.
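The confusion matrix attached to the Figure 4 caption can be turned into agreement statistics. The sketch below computes a multi-class Cohen's κ and a one-vs-rest κ per category. It treats the matrix entries as raw counts, which is an assumption: every row sums to 100, so the entries may be row percentages, and the caption does not say how the per-category κ values were obtained.

```python
# Confusion matrix transcribed from the Figure 4 caption.
# Rows: GPT-4 label; columns: human label.
# Assumption: entries are treated as raw counts (they may be percentages).
LABELS = ["Sem", "Syn", "Lex", "Pred", "Oth"]
MATRIX = [
    [82, 4, 3, 2, 9],
    [3, 86, 2, 1, 8],
    [5, 3, 84, 1, 7],
    [4, 2, 1, 79, 14],
    [6, 5, 10, 17, 62],
]


def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix of counts."""
    size = len(matrix)
    total = sum(map(sum, matrix))
    p_obs = sum(matrix[i][i] for i in range(size)) / total
    row = [sum(r) / total for r in matrix]
    col = [sum(r[j] for r in matrix) / total for j in range(size)]
    p_exp = sum(a * b for a, b in zip(row, col))
    return (p_obs - p_exp) / (1 - p_exp)


def one_vs_rest_kappa(matrix, k):
    """Kappa for category k after collapsing the other categories into 'rest'."""
    total = sum(map(sum, matrix))
    a = matrix[k][k]
    row_k = sum(matrix[k])
    col_k = sum(r[k] for r in matrix)
    d = total - row_k - col_k + a
    return cohen_kappa([[a, row_k - a], [col_k - a, d]])


overall = cohen_kappa(MATRIX)
per_category = {name: one_vs_rest_kappa(MATRIX, k) for k, name in enumerate(LABELS)}
print(f"overall kappa = {overall:.3f}")
for name, value in per_category.items():
    print(f"{name}: {value:.3f}")
```

Under the raw-counts assumption the overall κ is about 0.73, and the one-vs-rest values for Sem, Syn, Lex, and Pred fall close to the reported per-category figures. The Oth value comes out near 0.53 rather than the reported .58, so the published numbers were probably computed from counts not shown in the caption.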
Original abstract

Intermediate layers of large language models (LLMs) best predict human brain responses to language, one of the most robust findings in computational neurolinguistics, yet why remains mechanistically unexplained. We address this gap by bridging sparse autoencoders (SAEs) from mechanistic interpretability with neural encoding models, decomposing GPT-2 XL and Llama-3.1-8B into 16K-32K interpretable features per layer. A human-validated taxonomy ($\kappa \geq 0.74$) reveals that semantic features alone recover 94% of peak encoding performance ($r=0.285$), substantially exceeding variance-matched baselines ($p<0.001$, $d=1.31$). Beyond this aggregate dominance, we test a novel cortical topography prediction: five semantic subcategories derived a priori from three independent neuroscience programs should map onto distinct brain regions. A formal convergence test confirms this alignment (Spearman $\rho=0.72$, $p<0.001$; hypergeometric $p=0.007$), demonstrating that SAE-discovered features recapitulate known cortical semantic organization at a granularity inaccessible to prior methods. SAE features further predict human reading times beyond lexical controls ($\Delta\mathrm{logLik}=38.4$, $p<0.001$), and an exploratory prediction-error analysis provides preliminary evidence that the brain additionally encodes unexpected semantic content. Results generalize across English, Chinese, and French.
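If the reported $\Delta\mathrm{logLik}=38.4$ comes from comparing nested regression models, with and without the SAE predictors, then the matching likelihood-ratio statistic is twice that difference. The nested-model reading is an assumption, since the abstract does not name the test, and the number of added parameters $k$ is not given:

```latex
\Lambda \;=\; 2\,\Delta\mathrm{logLik} \;=\; 2 \times 38.4 \;=\; 76.8,
\qquad
\Lambda \;\overset{\text{approx.}}{\sim}\; \chi^2_{k} \quad \text{under the null hypothesis.}
```

Whether this supports $p<0.001$ depends on $k$, which is one more reason to ask for the model specification.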

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper uses sparse autoencoders (SAEs) to decompose activations from GPT-2 XL and Llama-3.1-8B into 16K–32K features per layer and reports four results. First, under a human-validated feature taxonomy (κ≥0.74), semantic features alone recover 94% of peak brain encoding performance (r=0.285), exceeding variance-matched baselines (p<0.001, d=1.31). Second, five semantic subcategories derived a priori from independent neuroscience programs align with distinct cortical regions in a convergence test (Spearman ρ=0.72, p<0.001; hypergeometric p=0.007). Third, SAE features predict reading times beyond lexical controls (ΔlogLik=38.4, p<0.001). Fourth, the results generalize across English, Chinese, and French.

Significance. If the central claims hold, the work offers a mechanistic bridge between LLM interpretability and neurolinguistics by mapping SAE features onto known cortical semantic topography at a granularity finer than prior methods, supported by formal statistical tests and cross-lingual generalization. The a priori subcategories, the reported inter-rater reliability, and the explicit baseline comparisons are strengths.

major comments (2)
  1. [Methods (Convergence Test)] Methods (taxonomy application and convergence test): The manuscript states that the five subcategories are derived a priori and that feature classification was human-validated (κ≥0.74), but provides no description of blinding, pre-registration, total features labeled versus discarded, or whether raters had access to brain encoding or topography results. This information is required to confirm that the reported Spearman ρ=0.72 and hypergeometric p=0.007 are independent of data-dependent choices.
  2. [Results (Encoding Performance)] Results (encoding performance): The claim that semantic features recover 94% of peak performance (r=0.285) requires explicit definition of the peak baseline (which layers, how many features, exact construction of variance-matched controls) and confirmation that the 94% figure is not sensitive to post-hoc feature selection; without these details the dominance claim cannot be fully evaluated.
minor comments (2)
  1. The abstract reports generalization across three languages; the main text should include a dedicated table or supplementary figure that breaks down encoding performance and convergence statistics per language.
  2. [Methods] Notation for the hypergeometric test and the exact null model (how many features per subcategory, total feature pool) should be stated explicitly in the Methods to allow replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to improve methodological transparency. We address each major point below and will revise the manuscript accordingly.

  1. Referee: [Methods (Convergence Test)] Methods (taxonomy application and convergence test): The manuscript states that the five subcategories are derived a priori and that feature classification was human-validated (κ≥0.74), but provides no description of blinding, pre-registration, total features labeled versus discarded, or whether raters had access to brain encoding or topography results. This information is required to confirm that the reported Spearman ρ=0.72 and hypergeometric p=0.007 are independent of data-dependent choices.

    Authors: We agree that the Methods section requires expanded detail on the taxonomy application to demonstrate independence from data-dependent choices. The five subcategories were fixed a priori from three independent neuroscience programs before any SAE or brain data inspection. Human validation (κ≥0.74) was performed on features drawn from the full set of 16K–32K per layer. We will revise the manuscript to specify: the total number of features labeled and the subset discarded; that classification occurred without reference to per-feature brain encoding values or topography maps; and that the study was not pre-registered. The convergence test (Spearman ρ and hypergeometric p) was computed only after the taxonomy and labels were finalized. revision: yes

  2. Referee: [Results (Encoding Performance)] Results (encoding performance): The claim that semantic features recover 94% of peak performance (r=0.285) requires explicit definition of the peak baseline (which layers, how many features, exact construction of variance-matched controls) and confirmation that the 94% figure is not sensitive to post-hoc feature selection; without these details the dominance claim cannot be fully evaluated.

    Authors: We will revise the Results and Methods to define the peak baseline explicitly as the highest encoding performance obtained from any layer using the full set of SAE features (no selection). Variance-matched controls were constructed by sampling an equal number of non-semantic features while preserving the distribution of explained variance in the brain data. The 94% ratio is computed directly from these quantities. Semantic feature selection was determined solely by the a priori taxonomy and human labels, independent of brain encoding performance, so the comparison is not post-hoc with respect to the brain data. We will also report sensitivity checks across layer ranges. revision: yes
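One way to build a variance-matched control of the kind the authors describe is to pair each semantic feature with an unused non-semantic feature of similar variance. The greedy nearest-neighbour sketch below illustrates this. The matching rule, the choice of variance measure, and the name `variance_matched_sample` are assumptions for illustration; the rebuttal does not specify the sampling procedure.

```python
def variance_matched_sample(target_vars, pool_vars):
    """Greedy nearest-neighbour matching without replacement.

    For each target value, pick the unused pool item whose value is closest
    (ties broken by lower index). Returns indices into pool_vars, one per
    target, in target order.
    """
    if len(pool_vars) < len(target_vars):
        raise ValueError("pool must be at least as large as the target set")
    available = set(range(len(pool_vars)))
    chosen = []
    for v in target_vars:
        best = min(available, key=lambda i: (abs(pool_vars[i] - v), i))
        chosen.append(best)
        available.remove(best)
    return chosen


# Hypothetical per-feature variances, for illustration only.
semantic_vars = [1.0, 5.0, 2.4]
non_semantic_vars = [0.9, 2.0, 5.2, 10.0, 2.2]

matched = variance_matched_sample(semantic_vars, non_semantic_vars)
print(matched)  # indices of the matched non-semantic features
```

Greedy matching depends on the order of the targets. An optimal assignment or repeated random resampling would be reasonable alternatives, and reporting results across many resampled control sets would address the referee's sensitivity concern more directly.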

Circularity Check

0 steps flagged

No significant circularity; derivation uses independent priors and formal tests


The paper's load-bearing steps rest on subcategories derived a priori from three independent neuroscience programs, a human-validated taxonomy with reported κ ≥ 0.74, and formal statistical tests (Spearman ρ=0.72, hypergeometric p=0.007) applied to SAE features extracted from LLMs. No equation, self-citation, or fitted parameter is shown to reduce the alignment result to its own inputs by construction. The convergence test is presented as confirmation against external brain data rather than as a renaming or self-definition. The derivation is therefore self-contained and checked against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on the assumption that the human-validated taxonomy correctly partitions features into semantically meaningful subcategories and that these subcategories have independent grounding in neuroscience programs; no free parameters are explicitly fitted in the abstract beyond the choice of SAE dictionary size, and no new entities are postulated.

free parameters (1)
  • SAE dictionary size
    16K-32K features chosen per layer for decomposition; value is selected rather than derived.
axioms (2)
  • domain assumption: Semantic features are the dominant component explaining LLM-brain alignment
    Invoked when claiming semantic features recover 94% of peak performance.
  • domain assumption: The five semantic subcategories derived from prior neuroscience programs are valid and distinct
    Required for the a priori mapping test and convergence statistic.

pith-pipeline@v0.9.0 · 5797 in / 1429 out tokens · 22902 ms · 2026-05-25T05:36:14.105333+00:00 · methodology


Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · 2 internal anchors

  1. [1] John Hale. Language Technologies 2001: The Second Meeting of the North American Chapter of the Association for Computational Linguistics, 2001.

  2. [2] Alexander G. Huth, Wendy A. de Heer, Thomas L. Griffiths, Fr... Natural speech reveals the semantic maps that tile human cerebral cortex. 2016. doi:10.1038/NATURE17637

  3. [3] Mariya Toneva, Leila Wehbe. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). 2019.

  4. [4] Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey E. Hinton. Similarity of Neural Network Representations Revisited. 2019.

  5. [5] Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, Lee Sharkey. The Twelfth International Conference on Learning Representations, 2024.

  6. [6] Charlotte Caucheteux, Alexandre Gramfort, Jean... Disentangling syntax and semantics in the brain with deep networks. 2021.

  7. [7] Alexandre Pasquiou, Yair Lakretz, John T. Hale, Bertrand Thirion, Christophe Pallier. Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps. 2022.

  8. [8] Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt. Eliciting Latent Predictions from Transformers with the Tuned Lens. arXiv preprint, 2023. doi:10.48550/ARXIV.2303.08112

  9. [9] Yuxiao Li, Eric J. Michaud, David D. Baek, Joshua Engels, Xiaoqing Sun, Max Tegmark. Entropy, 2025. doi:10.3390/E27040344

  10. [10] Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller. The Thirteenth International Conference on Learning Representations, 2025.

  11. [11] Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024. doi:10.18653/V1/2024.ACL-LONG.470

  12. [13] Ian Covert, Scott M. Lundberg, Su... Explaining by Removing: ... J. Mach. Learn. Res., 2021.

  13. [14] Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman. Interpretability at Scale: Identifying Causal Mechanisms in Alpaca. 2023.

  14. [15] Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman. Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations. 2024.

  15. [16] Richard J. Antonello, Aditya R. Vaidya, Alexander Huth. Scaling laws for language encoding models in fMRI. 2023.

  16. [17] Ian Tenney, Dipanjan Das, Ellie Pavlick. Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019. doi:10.18653/V1/P19-1452

  17. [18] Andrew M. Saxe, James L. McClelland, Surya Ganguli. arXiv preprint, 2018.

  18. [19] Shailee Jain, Alexander Huth. Incorporating Context into Language Encoding Models for fMRI. 2018.

  19. [20] Aleksandar Makelov, Georg Lange, Neel Nanda. The Thirteenth International Conference on Learning Representations, 2025.

  20. [21] Hamed Nili, Cai Wingfield, Alexander Walther, Li Su, William D. Marslen... A Toolbox for Representational Similarity Analysis. 2014. doi:10.1371/JOURNAL.PCBI.1003553

  21. [22] Radoslaw Martin Cichy, Kshitij Dwivedi, Benjamin Lahner, Alex Lascelles, P. Iamshchinina, Monika Graumann, Alex Andonian, N. A. R. Murty, K. Kay, Gemma Roig, Aude Oliva. arXiv preprint, 2021.

  22. [23] Roger Levy. Cognition. doi:10.1016/j.cognition.2007.05.006

  23. [24] Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, Marcel Adam Just. Predicting Human Brain Activity Associated with the Meanings of Nouns. Science. doi:10.1126/science.1152876

  24. [25] Martin Schrimpf, Idan Asher Blank, Greta Tuckute, Carina Kauf, Eghbal A. Hosseini, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.2105646118

  25. [26] Ariel Goldstein, Zaid Zada, Eliav Buchnik, Mariano Schain, Amy Price, Bobbi Aubrey, Samuel A. Nastase, Amir Feder, Dotan Emanuel, Alon Cohen, Aren Jansen, Harshvardhan Gazula, Gina Choe, Aditi Rao, Catherine Kim, Colton Casto, Lora Fanda, Werner Doyle, Daniel Friedman, Dugan, Patr...

  26. [27] Francisco Pereira, Bin Lou, Brianna Pritchett, Samuel Ritter, Samuel J. Gershman, Nancy Kanwisher, Matthew Botvinick, Evelina Fedorenko. Toward a universal decoder of linguistic meaning from brain activation. Nature Communications. doi:10.1038/s41467-018-03068-4

  27. [28] Greta Tuckute, Aalok Sathe, Shashank Srikant, Maya Taliaferro, Mingye Wang, Martin Schrimpf, Kendrick Kay, Evelina Fedorenko. Nature Human Behaviour. doi:10.1038/s41562-023-01783-7

  28. [29] Evelina Fedorenko, Anna A. Ivanova, Tamar I. Regev. The language network as a natural kind within the broader landscape of the human brain. Nature Reviews Neuroscience. doi:10.1038/s41583-024-00802-4

  29. [30] Charlotte Caucheteux, Jean-Rémi King. Brains and algorithms partially converge in natural language processing. Communications Biology. doi:10.1038/s42003-022-03036-1

  30. [31] Rajesh P. N. Rao, Dana H. Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience. doi:10.1038/4580

  31. [32] Karl Friston. A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences. doi:10.1098/rstb.2005.1622

  32. [33] Karl Friston. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience. doi:10.1038/nrn2787

  33. [34] Andy Clark. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences. doi:10.1017/s0140525x12000477

  34. [35] Micha Heilbron, Kristijan Armeni, Jan-Mathijs Schoffelen, Peter Hagoort, Floris P. de Lange. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.2201968119

  35. [36] Richard Antonello, Alexander Huth. Predictive Coding or Just Feature Discovery? An Alternative Account of Why Language Models Fit Brain Data. doi:10.1162/nol_a_00087

  36. [37] Carina Kauf, Greta Tuckute, Roger Levy, Jacob Andreas, Evelina Fedorenko. Neurobiology of Language. doi:10.1162/nol_a_00116

  37. [38] Subba Reddy Oota, Manish Gupta, Mariya Toneva. Joint processing of linguistic properties in brains and language models. 2023.

  38. [39] Eghbal A. Hosseini, Martin Schrimpf, Yian Zhang, Samuel Bowman, Noga Zaslavsky, Evelina Fedorenko. Neurobiology of Language. doi:10.1162/nol_a_00137

  39. [40] Sreejan Kumar, Theodore R. Sumers, Takateru Yamakoshi, Ariel Goldstein, Uri Hasson, Kenneth A. Norman, Thomas L. Griffiths, Robert D. Hawkins, Samuel A. Nastase. Nature Communications. doi:10.1038/s41467-024-49173-5

  40. [41] Cory Shain, Clara Meister, Tiago Pimentel, Ryan Cotterell, Roger Levy. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.2307876121

  41. [42] Charlotte Caucheteux, Alexandre Gramfort, Jean-Rémi King. Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour. doi:10.1038/s41562-022-01516-2

  42. [43] Kyle Mahowald, Anna A. Ivanova, Idan Asher Blank, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko. arXiv preprint, 2023. doi:10.48550/ARXIV.2301.06627

  43. [44] R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths. Embers of autoregression show how large language models are shaped by the problem they are trained to solve. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.2322420121

  44. [45] Greta Tuckute, Nancy Kanwisher, Evelina Fedorenko. Language in Brains, Minds, and Machines. Annual Review of Neuroscience. doi:10.1146/annurev-neuro-120623-101142

  45. [46] Jeffrey R. Binder, Rutvik H. Desai, William W. Graves, Lisa L. Conant. Where Is the Semantic System? A Critical Review and Meta-Analysis of 120 Functional Neuroimaging Studies. Cerebral Cortex. doi:10.1093/cercor/bhp055

  46. [47] Fatma Deniz, Anwar O. Nunez-Elizalde, Alexander G. Huth, Jack L. Gallant. The Representation of Semantic Information Across Human Cerebral Cortex During Listening Versus Reading Is Invariant to Stimulus Modality. The Journal of Neuroscience. doi:10.1523/jneurosci.0675-19.2019

  47. [48] Amanda LeBel, Lauren Wagner, Shailee Jain, Aneesh Adhikari-Desai, Bhavin Gupta, Allyson Morgenthal, Jerry Tang, Lixiang Xu, Alexander G. Huth. Scientific Data. doi:10.1038/s41597-023-02437-z

  48. [49] Vinamra Benara, Chandan Singh, John X. Morris, Richard J. Antonello, Ion Stoica, Alexander Huth, Jianfeng Gao. Crafting Interpretable Embeddings for Language Neuroscience by Asking LLMs Questions. 2024.

  49. [50] Byung... Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times? 2023. doi:10.1162/TACL_A_00548. Listed under the title "Lost in the Middle: How Language Models Use Long Contexts".

  50. [51] Tim C. Kietzmann, Courtney J. Spoerer, Lynn K. A. Sörensen, Radoslaw M. Cichy, Olaf Hauk, Nikolaus Kriegeskorte. Recurrence is required to capture the representational dynamics of the human visual system. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1905544116

  51. [52] Uri Hasson, Samuel A. Nastase, Ariel Goldstein. Neuron. doi:10.1016/j.neuron.2019.12.002

  52. [53] James A. Michaelov, Megan D. Bardolph, Cyma K. Van Petten, Benjamin K. Bergen, Seana Coulson. Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects. Neurobiology of Language. doi:10.1162/nol_a_00105

  53. [54] David Chanin, James Wilken... A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders. 2024. doi:10.48550/ARXIV.2409.14507

  54. [55] Steven G. Luke, Kiel Christianson. The Provo Corpus: A large eye-tracking corpus with predictability norms. Behavior Research Methods. doi:10.3758/s13428-017-0908-4

  55. [56] Yoav Benjamini, Yosef Hochberg. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B: Statistical Methodology. doi:10.1111/j.2517-6161.1995.tb02031.x

  56. [57] Kanishka Misra, Kyle Mahowald. arXiv preprint, 2024. doi:10.48550/ARXIV.2403.19827

  57. [58] James A. Michaelov, Seana Coulson, Benjamin K. Bergen. So Cloze Yet So Far: N400 Amplitude Is Better Predicted by Distributional Information Than Human Predictability Judgements. IEEE Transactions on Cognitive and Developmental Systems. doi:10.1109/tcds.2022.3176783

  58. [59] Nikolaus Kriegeskorte. Representational similarity analysis – connecting the branches of systems neuroscience. doi:10.3389/neuro.06.004.2008

  59. [60] Steven T. Piantadosi, Dyana C.Y. Muller, Joshua S. Rule, Karthikeya Kaushik, Mark Gorenstein, Elena R. Leib, Emily Sanford. Why concepts are (probably) vectors. Trends in Cognitive Sciences. doi:10.1016/j.tics.2024.06.011

  60. [61]

    and Conant, Lisa L

    Binder, Jeffrey R. and Conant, Lisa L. and Humphries, Colin J. and Fernandino, Leonardo and Simons, Stephen B. and Aguilar, Mario and Desai, Rutvik H. , year =. Toward a brain-based componential semantic representation , volume =. Cognitive Neuropsychology , publisher =. doi:10.1080/02643294.2016.1147426 , number =

  61. [62]

    and Seidenberg, Mark S

    McRae, Ken and Cree, George S. and Seidenberg, Mark S. and Mcnorgan, Chris , year =. Semantic feature production norms for a large set of living and nonliving things , volume =. Behavior Research Methods , publisher =. doi:10.3758/bf03192726 , number =

  62. [63]

    Concreteness ratings for 40 thousand generally known English word lemmas , volume =

    Brysbaert, Marc and Warriner, Amy Beth and Kuperman, Victor , year =. Concreteness ratings for 40 thousand generally known English word lemmas , volume =. Behavior Research Methods , publisher =. doi:10.3758/s13428-013-0403-5 , number =

  63. [64]

    Amy Beth Warriner, Victor Kuperman, and Marc Brysbaert. Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods. doi:10.3758/s13428-012-0314-x

  64. [65]

    Martin N. Hebart, Oliver Contier, Lina Teichmann, Adam H. Rockter, Charles Y. Zheng, Alexis Kidder, Anna Corriveau, Maryam Vaziri-Pashkam, and Chris I. Baker. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. doi:10.7554/elife.8258...

  65. [66]

    Memorisation versus Generalisation in Pre-trained Language Models. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.521

  66. [67]

    Rui Mao, Qian Liu, Xiao Li, Erik Cambria, and Amir Hussain. arXiv preprint. 2025. doi:10.48550/ARXIV.2508.20674

  67. [68]

    Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey, Trenton Bricken, Brian Chen, Adam Pearce, Craig Citro, Emmanuel Ameisen, Andy Jones, Hoagy Cunningham, Nicholas L. Turner, Callum McDougall, Monte MacDiarmid, Alex Tamkin, Esin Durmus, Tristan Hume, Francesco Mosconi, Freeman,...

  68. [69]

    Adrien Doerig, Tim C. Kietzmann, Emily Allen, Yihan Wu, Thomas Naselaris, Kendrick Kay, and Ian Charest. Nature Machine Intelligence.

  69. [70]

    Mackenzie Weygandt Mathis and Adriana. Decoding the brain: From neural representations to mechanistic models. 2024. doi:10.1016/j.cell.2024.08.051

  70. [71]

    A Mathematical Framework for Transformer Circuits. Transformer Circuits Thread.

  71. [72]

    Language Models Can Explain Neurons in Language Models. 2023.

  72. [73]

    Karalyn Patterson and Matthew A. Chapter 61 - The Hub-and-Spoke Hypothesis of Semantic Memory. Neurobiology of Language. 2016. doi:10.1016/B978-0-12-407794-2.00061-4

  73. [74]

    Shirin Vafaei, Ryohei Fukuma, Takufumi Yanagisawa, Huixiang Yang, Satoru Oshino, Naoki Tani, Hui Ming Khoo, Hidenori Sugano, Yasushi Iimura, Hiroharu Suzuki, Madoka Nakajima, Kentaro Tamura, and Haruhiko Kishima. Communications Biology.

  74. [75]

    Veronica Boyce and Roger Levy. Glossa Psycholinguistics.

  75. [76]

    Veronica Diveica, Penny M. Pexman, and Richard J. Binney. Quantifying Social Semantics: An Inclusive Definition of Socialness and Ratings for 8388. 2023.

  76. [77]

    Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li, Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho, Ajmal Mian, Lihua Xie, Erik Cambria, and Lin Wang. arXiv preprint. 2025. doi:10.48550/ARXIV.2505.07634

  77. [78]

    BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain. 2025.

  78. [79]

    Georgii Aparin, Tasnima Sadekova, Alexey Rukhovich, Assel Yermekova, Laida Kushnareva, Vadim Popov, Kristian Kuznetsov, and Irina Piontkovskaya. AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders. Proceedings of the 19th Conference of the European Chapter of the Association for Computation...

  79. [80]

    Disentangling Superpositions: Interpretable Brain Encoding Model with Sparse Concept Atoms. The Thirty-ninth Annual Conference on Neural Information Processing Systems.