Heterogeneous Neural Predictivity from Language Models During Naturalistic Comprehension

Xiao Jia

arxiv: 2606.26880 · v1 · pith:USZ7P6DXnew · submitted 2026-06-25 · 💻 cs.CL · cs.LG

Pith reviewed 2026-06-26 04:49 UTC · model grok-4.3

keywords: language models · neural encoding · naturalistic comprehension · MEG · ECoG · predictive modeling · brain activity

The pith

Language model representations serve as neural predictors during natural speech and text comprehension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether features from frozen language models can predict recorded brain activity while people process naturalistic language. It applies blocked encoding models to data from Brain Treebank, MEG-MASC, and Podcast ECoG, incorporating matched controls for timing, nuisance variables, and representation capacity. Positive held-out predictions and improvements over low-level baselines appear in many source-level summaries, and 67 of 432 evaluable rows across Brain Treebank and Podcast ECoG meet a controlled predictive-only criterion. Model-side feature ablations alter scores in most evaluable source rows, while brain-derived, timing-linked, acoustic, and implanted-signal controls confirm the pipeline's component-level sensitivity. The work positions these quantities as useful annotations without equating them to brain computations or shared organization.

Core claim

Language-model-derived quantities can annotate neural activity during natural speech and text comprehension. Across Brain Treebank and Podcast ECoG, 67 of 432 evaluable rows met a controlled predictive-only criterion after matched temporal, nuisance, and representation-capacity controls, and model-side feature ablations changed prediction scores in most evaluable source rows.

What carries the argument

Blocked encoding models that use frozen language model features as predictors, paired with matched controls for temporal structure, nuisance variables, and representation capacity.
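
As a rough illustration of what a blocked encoding comparison can look like, the sketch below fits closed-form ridge regression on contiguous time blocks and scores held-out Pearson r, first with nuisance predictors alone and then with language-model features added. It is a minimal sketch on synthetic data, not the paper's pipeline: the function names, the ridge penalty, the number of blocks, and the simulated response are all assumptions, and no temporal gap between blocks or capacity matching is included.

```python
import numpy as np

def blocked_folds(n_samples, n_blocks):
    """Split sample indices into contiguous blocks (no shuffling)."""
    return np.array_split(np.arange(n_samples), n_blocks)

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights for centered X and y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def blocked_encoding_score(X, y, n_blocks=5, alpha=10.0):
    """Mean held-out Pearson r, holding out one contiguous block at a time."""
    scores = []
    for test_idx in blocked_folds(len(y), n_blocks):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        X_train, y_train = X[train_mask], y[train_mask]
        # Center using training statistics only, so the test block stays unseen.
        mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
        weights = ridge_fit(X_train - mu_x, y_train - mu_y, alpha)
        prediction = (X[test_idx] - mu_x) @ weights + mu_y
        scores.append(np.corrcoef(prediction, y[test_idx])[0, 1])
    return float(np.mean(scores))

# Synthetic stand-ins (assumptions): one response channel driven by both
# nuisance predictors and hypothetical language-model features.
rng = np.random.default_rng(0)
n, d_lm, d_nuis = 1000, 16, 4
lm_features = rng.standard_normal((n, d_lm))
nuisance = rng.standard_normal((n, d_nuis))
response = (0.5 * lm_features @ rng.standard_normal(d_lm)
            + nuisance @ rng.standard_normal(d_nuis)
            + rng.standard_normal(n))

baseline_r = blocked_encoding_score(nuisance, response)
full_r = blocked_encoding_score(np.hstack([nuisance, lm_features]), response)
gain = full_r - baseline_r
print(f"baseline r={baseline_r:.3f}, full r={full_r:.3f}, gain={gain:.3f}")
```

Keeping the held-out block contiguous is the point of the "blocked" design: shuffled splits would let temporally adjacent, autocorrelated samples sit on both sides of the train/test boundary.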

If this is right

Positive held-out prediction and gains over low-level baselines were widespread in source-level summaries.
Model-side feature ablations changed prediction scores in most evaluable source rows.
Participant-level matched-control advantages were localized rather than uniform across sources.
Response-profile and feature-specificity contrasts bounded representational or computational interpretations.
Complete co-indexed integrated interpretation requires future jointly indexed coverage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same controlled pipeline could be applied to test whether particular language model layers or architectures show stronger alignment with specific recording modalities.
Extending the analysis to jointly model multiple datasets might reveal whether the observed heterogeneity stems from stimulus differences or participant variability.
The separation of predictive usefulness from claims about shared neural organization suggests these predictors could serve as practical tools for annotating new brain data without requiring mechanistic equivalence.

Load-bearing premise

The combination of blocked encoding models and matched controls for timing, nuisance factors, and capacity is enough to isolate the unique predictive contribution of the language model features without leftover confounds.

What would settle it

Re-running the same pipeline on a new set of naturalistic language recordings and finding that no additional rows meet the predictive criterion after identical controls would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.26880 by Xiao Jia.

**Figure 1.** Positive information-bearing predictivity and participant-level scope. (A) Analysis path and evidence ladder. (B) Participant-level raw Pearson-r after within-participant run averaging; intervals are participant-cluster bootstrap summaries. (C) Participant-level gain over the nuisance baseline. (D) Predictive-only criterion rows by dataset, model, layer, and candidate quantity; dot area gives passed config…

**Figure 2.** Predictive coverage and matched controls. (A) Dataset inventory and branch availability in the matched derived data; Ready denotes branch coverage, including reliability-bounded profile rows where available, and NE denotes unavailable complete-chain coverage. (B) Participant-level model-control means summarize dataset-level scope; labels give participants/subject-run units retained after complete matching.…

**Figure 3.** Response-profile and reliability-bounded profile evidence. (A) Raw model-to-brain profile similarity is positive; best matched controls are stronger on average. (B) Dataset-level matched profile-control deltas are below the most competitive matched controls across all three primary datasets; intervals denote target-profile coverage summaries. (C) Brain-as-model positive controls show median metric-cell del…

**Figure 4.** Feature-ablation, specificity, and calibration. (A) Model-side ablation deltas for Brain Treebank and Podcast ECoG candidate quantities. (B) Stage-level summaries with distinct denominators. (C) Stochastic implanted-signal calibration with 100 seeded implants per strength; the band gives Wilson 95% intervals, and dashed lines mark 80% detection and the threshold of 1.49. (D) Robustness status matrix; P den…
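
The Figure 4 caption mentions Wilson 95% intervals over 100 seeded implants per strength. For readers unfamiliar with that interval, here is a minimal sketch of the standard Wilson score formula for a binomial proportion. The detection counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math

def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (z defaults to 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return center - half_width, center + half_width

# Hypothetical detection counts out of 100 implants at three signal strengths.
for detected in (20, 50, 80):
    low, high = wilson_interval(detected, 100)
    print(f"{detected}/100 detected: 95% interval [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly when the detection rate is near 0 or 1, which is why it suits detection curves that saturate.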

Original abstract

Language-model representations provide structured, high-dimensional annotations of naturalistic language stimuli and can serve as informative neural predictors during comprehension. We analyzed locked derived data from Brain Treebank, MEG-MASC, and Podcast ECoG with eight frozen language models, blocked encoding models, and matched temporal, nuisance, and representation-capacity controls. Positive held-out prediction and gains over low-level baselines were widespread in source-level summaries. Across Brain Treebank and Podcast ECoG, 67 of 432 evaluable rows met a controlled predictive-only criterion, and model-side feature ablations changed prediction scores in most evaluable source rows. Brain-derived, timing-linked, acoustic, and implanted-signal controls confirmed component-level sensitivity of the analysis pipeline. These findings show that language-model-derived quantities can annotate neural activity during natural speech and text comprehension. Participant-level matched-control advantages were localized rather than uniform, response-profile and feature-specificity contrasts bounded representational or computational interpretations, and complete co-indexed integrated interpretation will require future jointly indexed coverage. Together, the analyses identify language-model features as useful neural predictors and separate predictive usefulness from claims about shared neural organization or language-processing computations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This extends LM-to-brain encoding with more datasets and ablations, but the 67-of-432 controlled count rests on controls whose sufficiency cannot be judged from the abstract alone.

The main takeaway is that language-model features meet a controlled predictive-only criterion in 67 of 432 evaluable rows across Brain Treebank and Podcast ECoG, after blocked encoding models plus matched temporal, nuisance, and capacity controls; MEG-MASC is also analyzed. The paper further reports that model-side ablations changed scores in most evaluable rows and that the sensitivity controls behaved as expected.

It does a few things cleanly. The multi-dataset setup and use of eight frozen models give breadth. The explicit separation of predictive annotation from mechanism claims is stated plainly, and the note that participant-level advantages are localized rather than uniform is a useful observation. The ablations and control checks provide some evidence that the pipeline is sensitive to the intended features.

The soft spot is the 67/432 count itself. That number depends on the matched controls fully removing residual variance from acoustics, timing, or capacity. The abstract does not spell out the precise capacity-matching procedure or correction for testing 432 rows at once, so it is hard to tell whether the criterion is robust or could shift with different choices. This is an extension of existing encoding methods rather than a new framework, which is fine but means the work mainly adds data points and checks rather than changing the basic approach.

This is for computational neuroscientists who already work on naturalistic language encoding and want more controlled examples. It shows clear engagement with the problem of isolating predictive power. It deserves peer review so the methods can be examined in detail.

Referee Report

3 major / 1 minor

Summary. The paper claims that language-model representations provide useful annotations of neural activity during naturalistic comprehension. Analyzing locked data from Brain Treebank, MEG-MASC, and Podcast ECoG with eight frozen LMs, blocked encoding models, and matched temporal/nuisance/representation-capacity controls, it reports widespread positive held-out predictions and gains over low-level baselines; across two datasets, 67 of 432 evaluable rows meet a controlled predictive-only criterion, with model-side ablations altering scores in most rows and various controls confirming pipeline sensitivity.

Significance. If the blocked-encoding plus matched-control pipeline isolates unique LM contributions without residual confounds, the work supplies concrete evidence that LM-derived quantities can annotate neural responses in natural speech and text, across multiple recording modalities. The reliance on held-out prediction, feature ablations, and explicit controls (rather than circular derivations) is a methodological strength that separates predictive utility from stronger claims about shared organization or computations.

major comments (3)

[Abstract / Methods] Abstract and Methods: the precise definition of the 'controlled predictive-only criterion,' the exact exclusion rules that produce the 432 evaluable rows, and the participant- or source-level thresholding applied to reach the 67 count are not stated; because this count is the headline quantitative result, the absence of these details makes it impossible to verify that the selection is free of post-hoc effects.
[Methods] Methods (representation-capacity controls): the procedure used to match representation capacity across models and baselines is described only at a high level; without the explicit matching algorithm or the resulting capacity metrics, it is unclear whether residual capacity differences could still contribute to the 67 rows that survive the controls.
[Results] Results (67-of-432 rows): no correction for multiple comparisons across the 432 simultaneous tests is mentioned, nor are error bars or per-row statistical thresholds provided for the held-out predictions; these omissions directly affect the reliability of the central count.

minor comments (1)

[Abstract] Abstract: the sentence 'complete co-indexed integrated interpretation will require future jointly indexed coverage' is opaque and should be rephrased for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. We address each major comment below and will revise the manuscript to supply the requested methodological details and statistical clarifications.

Referee: [Abstract / Methods] Abstract and Methods: the precise definition of the 'controlled predictive-only criterion,' the exact exclusion rules that produce the 432 evaluable rows, and the participant- or source-level thresholding applied to reach the 67 count are not stated; because this count is the headline quantitative result, the absence of these details makes it impossible to verify that the selection is free of post-hoc effects.

Authors: We agree the definitions and selection rules require explicit statement. The controlled predictive-only criterion is a source row that shows (i) significant positive held-out r after all nuisance, temporal, and capacity controls and (ii) a statistically reliable drop when LM features are ablated. The 432 evaluable rows are those with sufficient stimulus coverage and non-degenerate response variance after quality filters; the 67 count aggregates across source-level summaries without participant-level thresholding. We will add a dedicated Methods subsection with the exact criterion, exclusion rules, and aggregation procedure. revision: yes
Referee: [Methods] Methods (representation-capacity controls): the procedure used to match representation capacity across models and baselines is described only at a high level; without the explicit matching algorithm or the resulting capacity metrics, it is unclear whether residual capacity differences could still contribute to the 67 rows that survive the controls.

Authors: The capacity-matching procedure equalizes effective dimensionality by retaining the minimal number of principal components that explain a target fraction of stimulus variance on a held-out set, then projects all feature sets to that common rank. We will insert the explicit algorithm, pseudocode, and per-model capacity metrics (effective rank and explained variance) into the Methods section so readers can verify that residual capacity differences do not drive the surviving rows. revision: yes
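
The capacity-matching step described in this reply can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the 90% target, the choice of the smallest per-set rank as the common rank, the helper names, and the synthetic feature sets are all hypothetical, since the reply does not specify them.

```python
import numpy as np

def fit_pca(X_train):
    """Return the training mean and principal axes (rows, ordered by variance)."""
    mean = X_train.mean(axis=0)
    _, _, axes = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, axes

def minimal_rank(X_train, X_heldout, target=0.90):
    """Smallest number of training PCs capturing `target` of held-out variance."""
    mean, axes = fit_pca(X_train)
    centered = X_heldout - mean
    per_axis = ((centered @ axes.T) ** 2).sum(axis=0)
    fraction = np.cumsum(per_axis) / (centered ** 2).sum()
    k = int(np.searchsorted(fraction, target)) + 1
    return min(k, axes.shape[0])  # clamp if the target is never reached

def project(X_train, X, rank):
    """Project X onto the first `rank` principal axes fitted on X_train."""
    mean, axes = fit_pca(X_train)
    return (X - mean) @ axes[:rank].T

def low_rank_features(rng, n, latent_dim, out_dim):
    """Synthetic features with a known low intrinsic dimensionality."""
    latent = rng.standard_normal((n, latent_dim))
    mixing = rng.standard_normal((latent_dim, out_dim))
    return latent @ mixing + 0.05 * rng.standard_normal((n, out_dim))

rng = np.random.default_rng(0)
small = low_rank_features(rng, 600, latent_dim=5, out_dim=32)
large = low_rank_features(rng, 600, latent_dim=12, out_dim=64)
train, heldout = slice(0, 400), slice(400, 600)

rank_small = minimal_rank(small[train], small[heldout])
rank_large = minimal_rank(large[train], large[heldout])
# Assumption: the common rank is the smallest per-set rank.
common_rank = min(rank_small, rank_large)

matched_small = project(small[train], small, common_rank)
matched_large = project(large[train], large, common_rank)
print(rank_small, rank_large, matched_small.shape, matched_large.shape)
```

After projection both feature sets have the same number of columns, so any remaining difference in encoding performance cannot be attributed to raw dimensionality alone. Whether it equalizes capacity in a stronger sense is the referee's open question.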
Referee: [Results] Results (67-of-432 rows): no correction for multiple comparisons across the 432 simultaneous tests is mentioned, nor are error bars or per-row statistical thresholds provided for the held-out predictions; these omissions directly affect the reliability of the central count.

Authors: We acknowledge the absence of multiple-comparison correction and per-row statistics. In the revision we will report Bonferroni- and FDR-corrected p-values across the 432 tests, include bootstrap error bars on all held-out correlations, and state the exact per-row permutation threshold used to declare significance. These additions will directly support the reliability of the 67-row count. revision: yes
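
To make the two corrections named in this reply concrete, the sketch below applies Bonferroni and Benjamini-Hochberg (FDR) decisions to a vector of p-values. The p-values are synthetic and only sized to match the 432 evaluable rows; the function names and the 0.05 level are assumptions, and nothing here reproduces the paper's results.

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = int(np.max(np.nonzero(passed)[0]))
        reject[order[: k + 1]] = True
    return reject

# Synthetic p-values: 60 rows with small p, 372 rows drawn uniformly (null-like).
rng = np.random.default_rng(0)
p_synthetic = np.concatenate([rng.uniform(0.0, 0.002, 60),
                              rng.uniform(0.0, 1.0, 372)])

bonf_reject = bonferroni(p_synthetic)
bh_reject = benjamini_hochberg(p_synthetic)
print("Bonferroni rejections:", int(bonf_reject.sum()))
print("Benjamini-Hochberg rejections:", int(bh_reject.sum()))
```

Bonferroni rejections are always a subset of the Benjamini-Hochberg rejections at the same level, so reporting both brackets how much a headline count depends on the choice of correction.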

Circularity Check

0 steps flagged

No circularity: empirical held-out prediction with external controls

The paper reports an empirical analysis pipeline that applies frozen language models to external neural datasets (Brain Treebank, MEG-MASC, Podcast ECoG) via blocked encoding models, matched temporal/nuisance/representation-capacity controls, and held-out prediction. The 67-of-432 count is an empirical threshold result after these controls and ablations, not a quantity defined by or reduced to the fitted parameters themselves. No equations, self-citations, or ansatzes are described that would make any prediction equivalent to its inputs by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; encoding models are described only at the level of blocked regression with controls, so no ledger entries can be extracted.



[25] [25]

and Kennard, Robert W

Hoerl, Arthur E. and Kennard, Robert W. , year =. Ridge regression: Biased estimation for nonorthogonal problems , journal =

[26] [26]

and de Heer, Wendy A

Huth, Alexander G. and de Heer, Wendy A. and Griffiths, Thomas L. and Theunissen, Frederic E. and Gallant, Jack L. , year =. Natural speech reveals the semantic maps that tile human cerebral cortex , journal =

[27] [27]

and Schrimpf, Martin and Zhang, Yian and Bowman, Samuel R

Hosseini, Eghbal A. and Schrimpf, Martin and Zhang, Yian and Bowman, Samuel R. and Zaslavsky, Noga and Fedorenko, Evelina , year =. Artificial neural network language models predict human brain responses to language even after a developmentally realistic amount of training , journal =

[28] [28]

, year =

Antonello, Richard and Huth, Alexander G. , year =. Predictive coding or just feature discovery? An alternative account of why language models fit brain data , journal =

[29] [29]

and Wehbe, Leila and Huth, Alexander G

Jain, Shailee and Vo, Vy A. and Wehbe, Leila and Huth, Alexander G. , year =. Computational language modeling and the promise of in silico experimentation , journal =

[30] [30]

Distributed sensitivity to syntax and semantics throughout the language network , journal =

Shain, Cory and Kean, Hope and Casto, Colton and Lipkin, Benjamin and Affourtit, Josef and Siegelman, Matthew and Mollica, Francis and Fedorenko, Evelina , year =. Distributed sensitivity to syntax and semantics throughout the language network , journal =

[31] [31]

``All the stars will be wells with a rusty pulley'': Neural processing of the social and pragmatic content in a narrative , journal =

Thye, Melissa and Hoffman, Paul and Mirman, Daniel , year =. ``All the stars will be wells with a rusty pulley'': Neural processing of the social and pragmatic content in a narrative , journal =

[32] [32]

and Reichenbach, Tobias , year =

Weissbart, Hugo and Kandylaki, Katerina D. and Reichenbach, Tobias , year =. Cortical tracking of surprisal during continuous speech comprehension , journal =

[33] [33]

and Bardolph, Megan D

Michaelov, James A. and Bardolph, Megan D. and Van Petten, Cyma K. and Bergen, Benjamin K. and Coulson, Seana , year =. Strong prediction: Language model surprisal explains multiple. Neurobiology of Language , volume =

[34] [34]

Similarity of neural network representations revisited , booktitle =

Kornblith, Simon and Norouzi, Mohammad and Lee, Honglak and Hinton, Geoffrey , year =. Similarity of neural network representations revisited , booktitle =

[35] [35]

, year =

Kriegeskorte, Nikolaus and Mur, Marieke and Bandettini, Peter A. , year =. Representational similarity analysis: Connecting the branches of systems neuroscience , journal =

[36] [36]

Kyle and Bellgowan, Patrick S

Kriegeskorte, Nikolaus and Simmons, W. Kyle and Bellgowan, Patrick S. F. and Baker, Chris I. , year =. Circular analysis in systems neuroscience: The dangers of double dipping , journal =

[37] [37]

and Gaca, Michal and Drozdziel, Dominika and Kossowski, Bartlomiej and Herman, Aleksandra M

Olszewska, Agata M. and Gaca, Michal and Drozdziel, Dominika and Kossowski, Bartlomiej and Herman, Aleksandra M. and Marchewka, Artur , year =

[38] [38]

and Silbert, Lauren J

Lerner, Yulia and Honey, Christopher J. and Silbert, Lauren J. and Hasson, Uri , year =. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story , journal =

[39] [39]

Li, Jinhong and others , year =. Le. Scientific Data , volume =

[40] [40]

The detection of disease clustering and a generalized regression approach , journal =

Mantel, Nathan , year =. The detection of disease clustering and a generalized regression approach , journal =

[41] [41]

Distributed representations of words and phrases and their compositionality , booktitle =

Mikolov, Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg and Dean, Jeffrey , year =. Distributed representations of words and phrases and their compositionality , booktitle =

[42] [42]

and others , year =

Nastase, Samuel A. and others , year =. The. Scientific Data , volume =

[43] [43]

and Holmes, Andrew P

Nichols, Thomas E. and Holmes, Andrew P. , year =. Nonparametric permutation tests for functional neuroimaging: A primer with examples , journal =

[44] [44]

Scikit-learn: Machine learning in

Pedregosa, Fabian and others , year =. Scikit-learn: Machine learning in. Journal of Machine Learning Research , volume =

[45] [45]

Toward a universal decoder of linguistic meaning from brain activation , journal =

Pereira, Francisco and others , year =. Toward a universal decoder of linguistic meaning from brain activation , journal =

[46] [46]

2412.15115 , eprinttype =

Yang, An and others , year =. 2412.15115 , eprinttype =

Pith/arXiv arXiv

[47] [47]

2505.09388 , eprinttype =

Yang, An and others , year =. 2505.09388 , eprinttype =

Pith/arXiv arXiv

[48] [48]

The neural architecture of language: Integrative modeling converges on predictive processing , journal =

Schrimpf, Martin and others , year =. The neural architecture of language: Integrative modeling converges on predictive processing , journal =

[49] [49]

Cross-validatory choice and assessment of statistical predictions , journal =

Stone, Mervyn , year =. Cross-validatory choice and assessment of statistical predictions , journal =

[50] [50]

Interpreting and improving natural-language processing in machines with natural language-processing in the brain , booktitle =

Toneva, Mariya and Wehbe, Leila , year =. Interpreting and improving natural-language processing in machines with natural language-processing in the brain , booktitle =. 1905.11833 , eprinttype =

arXiv 1905

[51] [51]

Driving and suppressing the human language network using large language models , journal =

Tuckute, Greta and others , year =. Driving and suppressing the human language network using large language models , journal =

[52] [52]

Bias in error estimation when using cross-validation for model selection , journal =

Varma, Sudhir and Simon, Richard , year =. Bias in error estimation when using cross-validation for model selection , journal =

[53] [53]

Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines , journal =

Varoquaux, Gael and others , year =. Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines , journal =

[54] [54]

Attention is all you need , booktitle =

Vaswani, Ashish and others , year =. Attention is all you need , booktitle =. 1706.03762 , eprinttype =

Pith/arXiv arXiv

[55] [55]

Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses , journal =

Wehbe, Leila and others , year =. Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses , journal =

[56] [56]

Choosing prediction over explanation in psychology: Lessons from machine learning , journal =

Yarkoni, Tal and Westfall, Jacob , year =. Choosing prediction over explanation in psychology: Lessons from machine learning , journal =

[57] [57]

Zada, Zaid and others , year =. The. Scientific Data , volume =