Mapping Whisper Representations to Human ECoG Responses with Interpretable Time-Resolved Neural Encoding

Matteo Ciferri; Matteo Ferrante; Michal Olak; Nicola Toschi; Tommaso Boccato

arxiv: 2606.02305 · v1 · pith:SOL26VFCnew · submitted 2026-06-01 · 🧬 q-bio.NC · cs.HC

Mapping Whisper Representations to Human ECoG Responses with Interpretable Time-Resolved Neural Encoding

Matteo Ciferri , Tommaso Boccato , Michal Olak , Matteo Ferrante , Nicola Toschi This is my paper

Pith reviewed 2026-06-28 11:37 UTC · model grok-4.3

classification 🧬 q-bio.NC cs.HC

keywords WhisperECoGspeech perceptionneural encodingbrain alignmentfoundation modelstemporal modelingphoneme organization

0 comments

The pith

Intermediate Whisper layers align most closely with human ECoG responses during natural speech.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how representations inside the Whisper speech foundation model relate to intracranial brain recordings from people listening to speech. It introduces a time-resolved neural encoder that adds recurrent temporal processing and soft attention to standard speech embeddings, then uses this encoder to compare alignment across Whisper's layers. Middle layers show the strongest match to the neural signals, consistent with a layered hierarchy in cortical speech processing. The same encoder also produces attention maps that are localized in time and identifies electrodes whose responses organize by phoneme categories in anatomically sensible ways. These elements together position speech models as tools for probing the timing and organization of cortical speech responses.

Core claim

Intermediate Whisper layers provide the strongest correspondence with neural activity, supporting a hierarchical match between model representations and cortical speech processing. The time-resolved neural encoder, which adds recurrent modeling and soft attention to the embeddings, outperforms linear baselines on high-resolution ECoG data and yields interpretable attention maps and phoneme-category organization among informative electrodes.

What carries the argument

The time-resolved neural encoder, which combines speech embeddings with a recurrent temporal model and soft attention to enable layer-wise alignment with brain signals.

If this is right

High-resolution ECoG responses benefit from temporally structured modeling beyond simple linear mappings from the same speech representations.
Attention maps from the encoder reveal temporally local alignment between speech embeddings and neural responses.
A phonemic interpretability analysis identifies anatomically coherent phoneme-category organization among encoding-informative electrodes.
Speech foundation models can serve as a framework for studying time-resolved cortical speech representations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same encoder architecture could be applied to other speech or language models to test whether they exhibit similar layer-wise hierarchies when aligned to brain data.
Attention weights might be used to isolate specific time windows where model and brain activity correspond most closely during ongoing speech.
The phoneme-category findings suggest the method could help map how particular brain regions contribute to different speech sound distinctions.

Load-bearing premise

The introduced time-resolved neural encoder captures genuine temporal brain dynamics rather than artifacts introduced by the modeling architecture itself.

What would settle it

A follow-up experiment in which the recurrent temporal component and soft attention are removed, yet layer-wise alignment strengths and phoneme organization remain unchanged or improve, would indicate that the encoder's temporal structure is not required for the reported correspondences.

Figures

Figures reproduced from arXiv: 2606.02305 by Matteo Ciferri, Matteo Ferrante, Michal Olak, Nicola Toschi, Tommaso Boccato.

**Figure 1.** Figure 1: Overview of the neural encoding architecture. Word-aligned speech segments are processed by the Whisper encoder and a bidirectional GRU, followed by a temporal at tention mechanism that maps speech representations to neural timepoints and a linear projection that predicts ECoG activity. Overall, our results demonstrate that (i) intermediate layers of Whisper best predict neural activity, (ii) the learned t… view at source ↗

**Figure 2.** Figure 2: Time-resolved encoding per formance across Whisper layers. The curves show the mean Pearson correlation (averaged across channels and subjects) between predicted and observed neural activity. Colored dots along the bot tom indicate, for each timepoint , the layer achieving the highest per formance. Red crosses along the top mark timepoints that reached statistical significance under a permutation test agai… view at source ↗

**Figure 3.** Figure 3: For each electrode, the Whisper layer yielding the highest encoding per formance is shown across successive temporal windows relative to word onset . Early windows are dominated by lower-level layers, reflecting acoustic tracking, while later windows show a shif t toward middle layers, consistent with phonetic and ar ticulatory processing. Spatial pat terns reveal a progression from posterior auditory cor … view at source ↗

**Figure 4.** Figure 4: Comparison between our time-aware encoding model and linear baseline models. Top: Results obtained with our proposed model using the most predictive Whisper layer (layer 4). The model shows the highest temporal encoding per formance and the broadest spatial distribution of predictive electrodes. Bottom lef t: Time-aware linear Whisper baseline using the same four th-layer Whisper representations while pres… view at source ↗

**Figure 5.** Figure 5: Lef t: At tention map showing how neural timepoints (horizontal axis, aligned to word onset) weight Whisper encoder representations at dif ferent relative temporal of fsets (ver tical axis). The map is obtained by averaging at tention weights across all test samples (i.e., words, electrodes, and subjects). A diagonal structure is observed, indicating a systematic temporal alignment between neural activity … view at source ↗

read the original abstract

Understanding how speech foundation models relate to human cortical activity is a key challenge for computational neuroscience. Here, we investigate how internal representations from Whisper predict intracranial ECoG responses during naturalistic speech perception. We introduce a time-resolved neural encoder that combines speech embeddings with a recurrent temporal model and soft attention, allowing us to examine layer-wise brain alignment. Intermediate Whisper layers provide the strongest correspondence with neural activity, supporting a hierarchical match between model representations and cortical speech processing. Comparisons with baselines show that high-resolution ECoG responses benefit from temporally structured modelling beyond linear mappings from the same speech representations. In addition, attention maps reveal temporally local alignment between speech embeddings and neural responses, while a phonemic interpretability analysis identifies anatomically coherent phoneme-category organization among encoding-informative electrodes. Together, these results suggest that speech foundation models offer a useful framework for studying time-resolved cortical speech representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper shows intermediate Whisper layers best match ECoG via a recurrent-plus-attention encoder and adds phoneme-level mapping, but the statistical controls are thin.

read the letter

The core result is that intermediate Whisper layers give the strongest alignment to ECoG during speech when passed through their time-resolved encoder, and the phoneme-category analysis lines up with anatomy. That is the main thing to take away.

What is new is the specific package: Whisper embeddings fed into a recurrent temporal model with soft attention, then used for layer-wise prediction on intracranial data plus a post-hoc phonemic interpretability step. Earlier alignment work exists, but this combination on ECoG with explicit temporal structure is a concrete extension.

The paper does a couple of things cleanly. It reports that the structured encoder beats linear baselines on the same embeddings, the attention maps show local temporal correspondence, and the electrode mapping produces anatomically coherent phoneme organization. Those pieces give the hierarchical claim some support.

The soft spots are in the validation details. The abstract gives no error bars, no clear statement on train-test splits or electrode-level multiple-comparison correction, and no direct test that the recurrent-attention block is not simply fitting its own temporal assumptions. If the full paper supplies those, the central claim holds up; if not, the layer-wise differences could be less robust than presented. The circularity risk looks moderate rather than severe because they include baseline comparisons.

This is for computational neuroscientists who already work on speech foundation models and ECoG or similar high-resolution recordings. A reader who wants a ready framework for time-resolved brain-AI alignment will find usable pieces.

It deserves a serious referee. The idea is grounded enough and the predictions are checkable with the right stats, so it should go to review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper introduces a time-resolved neural encoder that integrates Whisper speech embeddings with a recurrent temporal model and soft attention to predict human ECoG responses during naturalistic speech perception. It reports strongest alignment from intermediate Whisper layers, superior performance over linear baselines, temporally local attention patterns, and anatomically coherent phoneme-category organization in informative electrodes, supporting a hierarchical correspondence between model representations and cortical speech processing.

Significance. If the layer-wise alignments and temporal modeling advantages hold under rigorous validation, the work provides a useful framework for linking speech foundation models to time-resolved cortical activity with interpretable components, extending prior linear mapping approaches in computational neuroscience.

major comments (2)

[Abstract] The central claim that the time-resolved encoder captures genuine temporal brain dynamics (rather than architecture-induced artifacts) is load-bearing for the hierarchical match conclusion, yet the abstract provides no quantitative metrics, error bars, or cross-validation details on how the recurrent and attention parameters were fit or regularized against overfitting.
[Abstract] Baseline comparisons are mentioned but lack reported quantitative metrics (e.g., correlation coefficients or R² values with statistical tests) for the linear mappings versus the proposed encoder, making it difficult to assess the claimed benefit of temporally structured modeling.

minor comments (2)

Clarify the exact data-split procedure and electrode selection criteria to allow reproducibility of the layer-wise alignment results.
The phonemic interpretability analysis would benefit from explicit statistical controls for multiple comparisons across electrodes and phoneme categories.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the abstract accordingly to include the requested quantitative details.

read point-by-point responses

Referee: [Abstract] The central claim that the time-resolved encoder captures genuine temporal brain dynamics (rather than architecture-induced artifacts) is load-bearing for the hierarchical match conclusion, yet the abstract provides no quantitative metrics, error bars, or cross-validation details on how the recurrent and attention parameters were fit or regularized against overfitting.

Authors: We agree the abstract would be strengthened by including these details. In the revised version we will add a concise summary of the cross-validated performance metrics (including mean Pearson r with standard error across folds), regularization approach (L2 penalty on recurrent weights and attention temperature), and confirmation that temporal modeling parameters were fit via nested cross-validation to mitigate overfitting. Full procedural details remain in Methods Section 3.3; the abstract revision will make the validation explicit without altering the central claim. revision: yes
Referee: [Abstract] Baseline comparisons are mentioned but lack reported quantitative metrics (e.g., correlation coefficients or R² values with statistical tests) for the linear mappings versus the proposed encoder, making it difficult to assess the claimed benefit of temporally structured modeling.

Authors: We acknowledge the absence of specific numbers in the abstract. The revised abstract will report the key quantitative comparison: mean correlation improvement of the time-resolved encoder over linear baselines (with paired t-test p-values across electrodes and subjects). These values and the associated statistical tests are already detailed in Results Section 4.2; we will summarize them concisely in the abstract to allow direct evaluation of the temporal modeling benefit. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation chain consists of fitting a time-resolved encoder (recurrent model + soft attention) to map Whisper layer embeddings to ECoG, then reporting layer-wise alignment strengths plus baseline comparisons. No quoted equation or step reduces the reported alignment result to the fitted parameters by construction, nor does any load-bearing premise collapse to a self-citation or imported uniqueness theorem. The central claim remains an empirical comparison under explicitly stated modeling choices and is therefore self-contained.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

Abstract-only review limits visibility; ledger reflects components explicitly introduced or assumed in the provided text. The encoder is a new constructed method, Whisper-to-brain correspondence is treated as testable, and temporal modeling choices are implicit.

free parameters (1)

recurrent and attention parameters
The time-resolved encoder combines recurrent temporal model and soft attention; these are fitted to align embeddings with ECoG responses.

axioms (1)

domain assumption Whisper internal representations are relevant to human cortical speech processing
The investigation premise that layer-wise alignment can be examined via the encoder.

invented entities (1)

time-resolved neural encoder no independent evidence
purpose: Combine speech embeddings with recurrent temporal model and soft attention to examine layer-wise brain alignment
Introduced as the core methodological contribution in the abstract.

pith-pipeline@v0.9.1-grok · 5695 in / 1401 out tokens · 25856 ms · 2026-06-28T11:37:27.634184+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Retrieval-Based Brain Decoding by Alignment, not Complexity
q-bio.NC 2026-06 unverdicted novelty 5.0

Linear contrastive decoders outperform ridge regression and non-linear alternatives when mapping fMRI activity to foundation model embeddings in vision, text, and audio.

Reference graph

Works this paper leans on

295 extracted references · 93 canonical work pages · cited by 1 Pith paper · 7 internal anchors

[1]

Findings of the Association for Computational Linguistics: EMNLP 2022 , pages=

Attention weights accurately predict language representations in the brain , author=. Findings of the Association for Computational Linguistics: EMNLP 2022 , pages=

2022
[2]

Attention is not explanation , author=. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , pages=

2019
[3]

2023 , month = sep, day =

Reid, Ellena , title =. 2023 , month = sep, day =

2023
[4]

Neuron , volume=

Toward an understanding of vowel encoding in the human auditory cortex , author=. Neuron , volume=. 2023 , publisher=

2023
[5]

Journal of neurophysiology , volume=

Influence of context and behavior on stimulus reconstruction from neural activity in primary auditory cortex , author=. Journal of neurophysiology , volume=. 2009 , publisher=

2009
[6]

Nature , volume=

A multi-modal parcellation of human cerebral cortex , author=. Nature , volume=. 2016 , publisher=

2016
[7]

IEEE transactions on pattern analysis and machine intelligence , number=

A cluster separation measure , author=. IEEE transactions on pattern analysis and machine intelligence , number=. 2009 , publisher=

2009
[8]

Journal of computational and applied mathematics , volume=

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , author=. Journal of computational and applied mathematics , volume=. 1987 , publisher=

1987
[9]

Science , volume=

Phonetic feature encoding in human superior temporal gyrus , author=. Science , volume=. 2014 , publisher=

2014
[10]

Nature neuroscience , volume=

Categorical speech representation in human superior temporal gyrus , author=. Nature neuroscience , volume=. 2010 , publisher=

2010
[11]

BioRxiv , pages=

Correspondence between the layered structure of deep language models and temporal structure of natural language processing in the human brain , author=. BioRxiv , pages=. 2022 , publisher=

2022
[12]

Scientific reports , volume=

Deep language algorithms predict semantic comprehension from brain activity , author=. Scientific reports , volume=. 2022 , publisher=

2022
[13]

Proceedings of the National Academy of Sciences , volume=

The neural architecture of language: Integrative modeling converges on predictive processing , author=. Proceedings of the National Academy of Sciences , volume=. 2021 , publisher=

2021
[14]

Advances in neural information processing systems , volume=

Incorporating context into language encoding models for fMRI , author=. Advances in neural information processing systems , volume=
[15]

Scientific Reports , volume=

Reconstructing music perception from brain activity using a prior guided diffusion model , author=. Scientific Reports , volume=. 2025 , publisher=

2025
[16]

IEEE/ACM transactions on audio, speech, and language processing , volume=

Hubert: Self-supervised speech representation learning by masked prediction of hidden units , author=. IEEE/ACM transactions on audio, speech, and language processing , volume=. 2021 , publisher=

2021
[17]

Advances in neural information processing systems , volume=

wav2vec 2.0: A framework for self-supervised learning of speech representations , author=. Advances in neural information processing systems , volume=
[18]

Nature Machine Intelligence , volume=

A neural speech decoding framework leveraging deep learning and speech synthesis , author=. Nature Machine Intelligence , volume=. 2024 , publisher=

2024
[19]

Nature , volume=

A high-performance neuroprosthesis for speech decoding and avatar control , author=. Nature , volume=. 2023 , publisher=

2023
[20]

PLOS Computational Biology , volume=

Deep-learning models reveal how context and listener attention shape electrophysiological correlates of speech-to-language transformation , author=. PLOS Computational Biology , volume=. 2024 , publisher=

2024
[21]

Nature Neuroscience , volume=

Dissecting neural computations in the human auditory pathway using deep neural networks for speech , author=. Nature Neuroscience , volume=. 2023 , publisher=

2023
[22]

arXiv preprint arXiv:2512.01591 , year=

Scaling and context steer LLMs along the same computational path as the human brain , author=. arXiv preprint arXiv:2512.01591 , year=

arXiv
[23]

Nature human behaviour , pages=

A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations , author=. Nature human behaviour , pages=. 2025 , publisher=

2025
[24]

Nature , volume=

A high-performance speech neuroprosthesis , author=. Nature , volume=. 2023 , publisher=

2023
[25]

Speech communication , volume=

Joint-sequence models for grapheme-to-phoneme conversion , author=. Speech communication , volume=. 2008 , publisher=

2008
[26]

International conference on machine learning , pages=

Robust speech recognition via large-scale weak supervision , author=. International conference on machine learning , pages=. 2023 , organization=

2023
[27]

Scientific data , volume=

The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments , author=. Scientific data , volume=. 2016 , publisher=

2016
[28]

Scientific Data , volume=

The “Podcast” ECoG dataset for modeling neural activity during natural language comprehension , author=. Scientific Data , volume=. 2025 , publisher=

2025
[29]

Advances in Neural Information Processing Systems , volume=

Imagereward: Learning and evaluating human preferences for text-to-image generation , author=. Advances in Neural Information Processing Systems , volume=
[30]

arXiv preprint arXiv:2307.01952 , year=

Sdxl: Improving latent diffusion models for high-resolution image synthesis , author=. arXiv preprint arXiv:2307.01952 , year=

Pith/arXiv arXiv
[31]

Nature , volume=

Four ethical priorities for neurotechnologies and AI , author=. Nature , volume=. 2017 , publisher=

2017
[32]

IEEE transactions on image processing , volume=

Image quality assessment: from error visibility to structural similarity , author=. IEEE transactions on image processing , volume=. 2004 , publisher=

2004
[33]

arXiv preprint arXiv:2209.15594 , year=

Self-stabilization: The implicit bias of gradient descent at the edge of stability , author=. arXiv preprint arXiv:2209.15594 , year=

arXiv
[34]

arXiv preprint arXiv:2501.02497 , year=

Test-time Computing: from System-1 Thinking to System-2 Thinking , author=. arXiv preprint arXiv:2501.02497 , year=

arXiv
[35]

arXiv preprint arXiv:2308.06721 , year=

Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models , author=. arXiv preprint arXiv:2308.06721 , year=

Pith/arXiv arXiv
[36]

1982 , note =

Peter Gabriel , title =. 1982 , note =

1982
[37]

Pierre, Susan E

McQuilton, Peter and St. Pierre, Susan E. and Thurmond, Jim and the FlyBase Consortium , title =. 2012 , doi =. http://nar.oxfordjournals.org/content/40/D1/D706.full.pdf+html , journal =

2012
[38]

and Pfeffer, Suzanne , title =

Aivazian, Dikran and Serrano, Ramon L. and Pfeffer, Suzanne , title =. 2006 , doi =. http://jcb.rupress.org/content/173/6/917.full.pdf , journal =

2006
[39]

2016 , doi =

Bloss, Cinnamon S and Wineinger, Nathan E and Peters, Melissa and Boeldt, Debra L and Ariniello, Lauren and Kim, Ju Young and Sheard, Judy and Komatireddy, Ravi and Barrett, Paddy and Topol, Eric J , title =. 2016 , doi =. http://biorxiv.org/content/early/2016/01/14/029983.full.pdf , journal =

2016
[40]

Aquiflexum balticum gen

Brettar, Ingrid and Christen, Richard and Höfle, Manfred G. Aquiflexum balticum gen. nov., sp. nov., a novel marine bacterium of the Cytophaga–Flavobacterium–Bacteroides group isolated from surface water of the central Baltic Sea. International Journal of Systematic and Evolutionary Microbiology. 2004

2004
[41]

Belliella baltica gen

Brettar, Ingrid and Christen, Richard and Höfle, Manfred G. Belliella baltica gen. nov., sp. nov., a novel marine bacterium of the Cytophaga–Flavobacterium–Bacteroides group isolated from surface water of the central Baltic Sea. International Journal of Systematic and Evolutionary Microbiology. 2004

2004
[42]

Engineering , author =

Brain. Engineering , author =. 2019 , pages =. doi:10.1016/j.eng.2019.03.010 , abstract =

work page doi:10.1016/j.eng.2019.03.010 2019
[43]

and Henderson, Margaret M

Luo, Andrew F. and Henderson, Margaret M. and Wehbe, Leila and Tarr, Michael J. , month = jun, year =. Brain
[44]

Yang, Huzheng and Gee, James and Shi, Jianbo , month = aug, year =. Memory. doi:10.48550/arXiv.2308.01175 , abstract =

work page doi:10.48550/arxiv.2308.01175
[45]

and Jobard, Gael and Alexandre, Frederic and Hinaut, Xavier , month = jul, year =

Oota, Subba Reddy and Gupta, Manish and Bapi, Raju S. and Jobard, Gael and Alexandre, Frederic and Hinaut, Xavier , month = jul, year =. Deep
[46]

Multiple visual objects are represented differently in the human brain and convolutional neural networks
[47]

2020 , eprint=

Neural encoding and interpretation for high-level visual cortices based on fMRI using image caption features , author=. 2020 , eprint=

2020
[48]

Olman and Dustin E

Thomas Naselaris and Cheryl A. Olman and Dustin E. Stansbury and Kamil Ugurbil and Jack L. Gallant , keywords =. A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes , journal =. 2015 , issn =. doi:https://doi.org/10.1016/j.neuroimage.2014.10.018 , url =

work page doi:10.1016/j.neuroimage.2014.10.018 2015
[49]

Scientific Data , volume=

A natural language fMRI dataset for voxelwise encoding models , author=. Scientific Data , volume=. 2023 , publisher=

2023
[50]

Nature Neuroscience , volume=

Semantic reconstruction of continuous language from non-invasive brain recordings , author=. Nature Neuroscience , volume=. 2023 , publisher=

2023
[51]

Nature Machine Intelligence , volume=

Decoding speech perception from non-invasive brain recordings , author=. Nature Machine Intelligence , volume=. 2023 , publisher=

2023
[52]

Communications Biology , volume=

Brains and algorithms partially converge in natural language processing , author=. Communications Biology , volume=. 2022 , publisher=

2022
[53]

2023 , eprint=

Scaling laws for language encoding models in fMRI , author=. 2023 , eprint=

2023
[54]

Nature Human Behaviour , volume=

Evidence of a predictive coding hierarchy in the human brain listening to speech , author=. Nature Human Behaviour , volume=. 2023 , publisher=

2023
[55]

Prince and Kendrick N

Colin Conwell and Jacob S. Prince and Kendrick N. Kay and George A. Alvarez and Talia Konkle , title =. 2023 , doi =. https://www.biorxiv.org/content/early/2023/07/01/2022.03.28.485868.full.pdf , journal =

2023
[56]

2023 , eprint=

Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models , author=. 2023 , eprint=

2023
[57]

Multimodal neural networks better explain multivoxel patterns in the hippocampus , journal =

Bhavin Choksi and Milad Mozafari and Rufin VanRullen and Leila Reddy , keywords =. Multimodal neural networks better explain multivoxel patterns in the hippocampus , journal =. 2022 , issn =. doi:https://doi.org/10.1016/j.neunet.2022.07.033 , url =

work page doi:10.1016/j.neunet.2022.07.033 2022
[58]

Scientific Reports , year=

Ozcelik, Furkan and VanRullen, Rufin , title=. Scientific Reports , year=. doi:10.1038/s41598-023-42891-8 , url=

work page doi:10.1038/s41598-023-42891-8
[59]

Allen, Ghislain St-Yves, Yihan Wu, Jesse L

Allen, Emily J. and St-Yves, Ghislain and Wu, Yihan and Breedlove, Jesse L. and Prince, Jacob S. and Dowdle, Logan T. and Nau, Matthias and Caron, Brad and Pestilli, Franco and Charest, Ian and Hutchinson, J. Benjamin and Naselaris, Thomas and Kay, Kendrick , title=. Nature Neuroscience , year=. doi:10.1038/s41593-021-00962-x , url=

work page doi:10.1038/s41593-021-00962-x
[60]

2023 , eprint=

The Algonauts Project 2023 Challenge: How the Human Brain Makes Sense of Natural Scenes , author=. 2023 , eprint=

2023
[61]

2023 , eprint=

Memory Encoding Model , author=. 2023 , eprint=

2023
[62]

2023 , eprint=

The Algonauts Project 2023 Challenge: UARK-UAlbany Team Solution , author=. 2023 , eprint=

2023
[63]

2023 , doi =

Hossein Adeli and Sun Minni and Nikolaus Kriegeskorte , title =. 2023 , doi =. https://www.biorxiv.org/content/early/2023/08/05/2023.08.02.551743.full.pdf , journal =

2023
[64]

Methods for computing the maximum performance of computational models of fMRI responses

Lage-Castellanos, Agustin and Valente, Giancarlo and Formisano, Elia and De Martino, Federico. Methods for computing the maximum performance of computational models of fMRI responses. PLoS Comput. Biol
[65]

, title =

Fortunato, S. , title =. Phys. Rep.-Rev. Sec. Phys. Lett. , volume =. 2010 , pages =

2010
[66]

Newman, M. E. J. and Girvan, M. , title =. Phys. Rev. E. , volume =. 2004 , pages =

2004
[67]

and Reinhardt, T

Vehlow, C. and Reinhardt, T. and Weiskopf, D. , title =. IEEE Trans. Vis. Comput. Graph. , volume =. 2013 , pages =

2013
[68]

and Albert, R

Raghavan, U. and Albert, R. and Kumara, S. , title =. Phys. Rev E. , volume =. 2007 , pages =

2007
[69]

2011 , pages =

Robust network community detection using balanced propagation , journal =. 2011 , pages =

2011
[70]

and Li, S

Lou, H. and Li, S. and Zhao, Y. , title =. Physica A. , volume =. 2013 , pages =

2013
[71]

and Newman, M

Clauset, A. and Newman, M. E. J. and Moore, C. , title =. Phys. Rev. E. , volume =. 2004 , pages =

2004
[72]

Blondel, V. D. and Guillaume, J. L. and Lambiotte, R. and Lefebvre, E. , title =. J. Stat. Mech.-Theory Exp. , volume =. 2008 , pages =

2008
[73]

and Campari, R

Sobolevsky, S. and Campari, R. , title =. Phys. Rev. E. , volume =. 2014 , pages =

2014
[74]

and Barthelemy, M

Fortunato, S. and Barthelemy, M. , title =. Proc. Natl. Acad. Sci. U. S. A. , volume =. 2007 , pages =

2007
[75]

2011 , pages =

Unfolding communities in large complex networks: Combining defensive and offensive label propagation for core extraction , journal =. 2011 , pages =

2011
[76]

and Li, J

Wang, X. and Li, J. , title =. Physica A. , volume =. 2013 , pages =

2013
[77]

and Wang, X

Li, J. and Wang, X. and Eustace, J. , title =. Physica A. , volume =. 2013 , pages =

2013
[78]

Fabio, D. R. and Fabio, D. and Carlo, P. , title =. Sci. Rep. , volume =. 2013 , pages =

2013
[79]

and Wu, T

Chen, Q. and Wu, T. T. and Fang, M. , title =. Physica A. , volume =. 2013 , pages =

2013
[80]

and Wang, R

Zhang, S. and Wang, R. and Zhang, X. , title =. Physica A. , volume =. 2007 , pages =

2007

Showing first 80 references.

[1] [1]

Findings of the Association for Computational Linguistics: EMNLP 2022 , pages=

Attention weights accurately predict language representations in the brain , author=. Findings of the Association for Computational Linguistics: EMNLP 2022 , pages=

2022

[2] [2]

Attention is not explanation , author=. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , pages=

2019

[3] [3]

2023 , month = sep, day =

Reid, Ellena , title =. 2023 , month = sep, day =

2023

[4] [4]

Neuron , volume=

Toward an understanding of vowel encoding in the human auditory cortex , author=. Neuron , volume=. 2023 , publisher=

2023

[5] [5]

Journal of neurophysiology , volume=

Influence of context and behavior on stimulus reconstruction from neural activity in primary auditory cortex , author=. Journal of neurophysiology , volume=. 2009 , publisher=

2009

[6] [6]

Nature , volume=

A multi-modal parcellation of human cerebral cortex , author=. Nature , volume=. 2016 , publisher=

2016

[7] [7]

IEEE transactions on pattern analysis and machine intelligence , number=

A cluster separation measure , author=. IEEE transactions on pattern analysis and machine intelligence , number=. 2009 , publisher=

2009

[8] [8]

Journal of computational and applied mathematics , volume=

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , author=. Journal of computational and applied mathematics , volume=. 1987 , publisher=

1987

[9] [9]

Science , volume=

Phonetic feature encoding in human superior temporal gyrus , author=. Science , volume=. 2014 , publisher=

2014

[10] [10]

Nature neuroscience , volume=

Categorical speech representation in human superior temporal gyrus , author=. Nature neuroscience , volume=. 2010 , publisher=

2010

[11] [11]

BioRxiv , pages=

Correspondence between the layered structure of deep language models and temporal structure of natural language processing in the human brain , author=. BioRxiv , pages=. 2022 , publisher=

2022

[12] [12]

Scientific reports , volume=

Deep language algorithms predict semantic comprehension from brain activity , author=. Scientific reports , volume=. 2022 , publisher=

2022

[13] [13]

Proceedings of the National Academy of Sciences , volume=

The neural architecture of language: Integrative modeling converges on predictive processing , author=. Proceedings of the National Academy of Sciences , volume=. 2021 , publisher=

2021

[14] [14]

Advances in neural information processing systems , volume=

Incorporating context into language encoding models for fMRI , author=. Advances in neural information processing systems , volume=

[15] [15]

Scientific Reports , volume=

Reconstructing music perception from brain activity using a prior guided diffusion model , author=. Scientific Reports , volume=. 2025 , publisher=

2025

[16] [16]

IEEE/ACM transactions on audio, speech, and language processing , volume=

Hubert: Self-supervised speech representation learning by masked prediction of hidden units , author=. IEEE/ACM transactions on audio, speech, and language processing , volume=. 2021 , publisher=

2021

[17] [17]

Advances in neural information processing systems , volume=

wav2vec 2.0: A framework for self-supervised learning of speech representations , author=. Advances in neural information processing systems , volume=

[18] [18]

Nature Machine Intelligence , volume=

A neural speech decoding framework leveraging deep learning and speech synthesis , author=. Nature Machine Intelligence , volume=. 2024 , publisher=

2024

[19] [19]

Nature , volume=

A high-performance neuroprosthesis for speech decoding and avatar control , author=. Nature , volume=. 2023 , publisher=

2023

[20] [20]

PLOS Computational Biology , volume=

Deep-learning models reveal how context and listener attention shape electrophysiological correlates of speech-to-language transformation , author=. PLOS Computational Biology , volume=. 2024 , publisher=

2024

[21] [21]

Nature Neuroscience , volume=

Dissecting neural computations in the human auditory pathway using deep neural networks for speech , author=. Nature Neuroscience , volume=. 2023 , publisher=

2023

[22] [22]

arXiv preprint arXiv:2512.01591 , year=

Scaling and context steer LLMs along the same computational path as the human brain , author=. arXiv preprint arXiv:2512.01591 , year=

arXiv

[23] [23]

Nature human behaviour , pages=

A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations , author=. Nature human behaviour , pages=. 2025 , publisher=

2025

[24] [24]

Nature , volume=

A high-performance speech neuroprosthesis , author=. Nature , volume=. 2023 , publisher=

2023

[25] [25]

Speech communication , volume=

Joint-sequence models for grapheme-to-phoneme conversion , author=. Speech communication , volume=. 2008 , publisher=

2008

[26] [26]

International conference on machine learning , pages=

Robust speech recognition via large-scale weak supervision , author=. International conference on machine learning , pages=. 2023 , organization=

2023

[27] [27]

Scientific data , volume=

The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments , author=. Scientific data , volume=. 2016 , publisher=

2016

[28] [28]

Scientific Data , volume=

The “Podcast” ECoG dataset for modeling neural activity during natural language comprehension , author=. Scientific Data , volume=. 2025 , publisher=

2025

[29] [29]

Advances in Neural Information Processing Systems , volume=

Imagereward: Learning and evaluating human preferences for text-to-image generation , author=. Advances in Neural Information Processing Systems , volume=

[30] [30]

arXiv preprint arXiv:2307.01952 , year=

Sdxl: Improving latent diffusion models for high-resolution image synthesis , author=. arXiv preprint arXiv:2307.01952 , year=

Pith/arXiv arXiv

[31] [31]

Nature , volume=

Four ethical priorities for neurotechnologies and AI , author=. Nature , volume=. 2017 , publisher=

2017

[32] [32]

IEEE transactions on image processing , volume=

Image quality assessment: from error visibility to structural similarity , author=. IEEE transactions on image processing , volume=. 2004 , publisher=

2004

[33] [33]

arXiv preprint arXiv:2209.15594 , year=

Self-stabilization: The implicit bias of gradient descent at the edge of stability , author=. arXiv preprint arXiv:2209.15594 , year=

arXiv

[34] [34]

arXiv preprint arXiv:2501.02497 , year=

Test-time Computing: from System-1 Thinking to System-2 Thinking , author=. arXiv preprint arXiv:2501.02497 , year=

arXiv

[35] [35]

arXiv preprint arXiv:2308.06721 , year=

Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models , author=. arXiv preprint arXiv:2308.06721 , year=

Pith/arXiv arXiv

[36] [36]

1982 , note =

Peter Gabriel , title =. 1982 , note =

1982

[37] [37]

Pierre, Susan E

McQuilton, Peter and St. Pierre, Susan E. and Thurmond, Jim and the FlyBase Consortium , title =. 2012 , doi =. http://nar.oxfordjournals.org/content/40/D1/D706.full.pdf+html , journal =

2012

[38] [38]

and Pfeffer, Suzanne , title =

Aivazian, Dikran and Serrano, Ramon L. and Pfeffer, Suzanne , title =. 2006 , doi =. http://jcb.rupress.org/content/173/6/917.full.pdf , journal =

2006

[39] [39]

2016 , doi =

Bloss, Cinnamon S and Wineinger, Nathan E and Peters, Melissa and Boeldt, Debra L and Ariniello, Lauren and Kim, Ju Young and Sheard, Judy and Komatireddy, Ravi and Barrett, Paddy and Topol, Eric J , title =. 2016 , doi =. http://biorxiv.org/content/early/2016/01/14/029983.full.pdf , journal =

2016

[40] [40]

Aquiflexum balticum gen

Brettar, Ingrid and Christen, Richard and Höfle, Manfred G. Aquiflexum balticum gen. nov., sp. nov., a novel marine bacterium of the Cytophaga–Flavobacterium–Bacteroides group isolated from surface water of the central Baltic Sea. International Journal of Systematic and Evolutionary Microbiology. 2004

2004

[41] [41]

Belliella baltica gen

Brettar, Ingrid and Christen, Richard and Höfle, Manfred G. Belliella baltica gen. nov., sp. nov., a novel marine bacterium of the Cytophaga–Flavobacterium–Bacteroides group isolated from surface water of the central Baltic Sea. International Journal of Systematic and Evolutionary Microbiology. 2004

2004

[42] [42]

Engineering , author =

Brain. Engineering , author =. 2019 , pages =. doi:10.1016/j.eng.2019.03.010 , abstract =

work page doi:10.1016/j.eng.2019.03.010 2019

[43] [43]

and Henderson, Margaret M

Luo, Andrew F. and Henderson, Margaret M. and Wehbe, Leila and Tarr, Michael J. , month = jun, year =. Brain

[44] [44]

Yang, Huzheng and Gee, James and Shi, Jianbo , month = aug, year =. Memory. doi:10.48550/arXiv.2308.01175 , abstract =

work page doi:10.48550/arxiv.2308.01175

[45] [45]

and Jobard, Gael and Alexandre, Frederic and Hinaut, Xavier , month = jul, year =

Oota, Subba Reddy and Gupta, Manish and Bapi, Raju S. and Jobard, Gael and Alexandre, Frederic and Hinaut, Xavier , month = jul, year =. Deep

[46] [46]

Multiple visual objects are represented differently in the human brain and convolutional neural networks

[47] [47]

2020 , eprint=

Neural encoding and interpretation for high-level visual cortices based on fMRI using image caption features , author=. 2020 , eprint=

2020

[48] [48]

Olman and Dustin E

Thomas Naselaris and Cheryl A. Olman and Dustin E. Stansbury and Kamil Ugurbil and Jack L. Gallant , keywords =. A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes , journal =. 2015 , issn =. doi:https://doi.org/10.1016/j.neuroimage.2014.10.018 , url =

work page doi:10.1016/j.neuroimage.2014.10.018 2015

[49] [49]

Scientific Data , volume=

A natural language fMRI dataset for voxelwise encoding models , author=. Scientific Data , volume=. 2023 , publisher=

2023

[50] [50]

Nature Neuroscience , volume=

Semantic reconstruction of continuous language from non-invasive brain recordings , author=. Nature Neuroscience , volume=. 2023 , publisher=

2023

[51] [51]

Nature Machine Intelligence , volume=

Decoding speech perception from non-invasive brain recordings , author=. Nature Machine Intelligence , volume=. 2023 , publisher=

2023

[52] [52]

Communications Biology , volume=

Brains and algorithms partially converge in natural language processing , author=. Communications Biology , volume=. 2022 , publisher=

2022

[53] [53]

2023 , eprint=

Scaling laws for language encoding models in fMRI , author=. 2023 , eprint=

2023

[54] [54]

Nature Human Behaviour , volume=

Evidence of a predictive coding hierarchy in the human brain listening to speech , author=. Nature Human Behaviour , volume=. 2023 , publisher=

2023

[55] [55]

Prince and Kendrick N

Colin Conwell and Jacob S. Prince and Kendrick N. Kay and George A. Alvarez and Talia Konkle , title =. 2023 , doi =. https://www.biorxiv.org/content/early/2023/07/01/2022.03.28.485868.full.pdf , journal =

2023

[56] [56]

2023 , eprint=

Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models , author=. 2023 , eprint=

2023

[57] [57]

Multimodal neural networks better explain multivoxel patterns in the hippocampus , journal =

Bhavin Choksi and Milad Mozafari and Rufin VanRullen and Leila Reddy , keywords =. Multimodal neural networks better explain multivoxel patterns in the hippocampus , journal =. 2022 , issn =. doi:https://doi.org/10.1016/j.neunet.2022.07.033 , url =

work page doi:10.1016/j.neunet.2022.07.033 2022

[58] [58]

Scientific Reports , year=

Ozcelik, Furkan and VanRullen, Rufin , title=. Scientific Reports , year=. doi:10.1038/s41598-023-42891-8 , url=

work page doi:10.1038/s41598-023-42891-8

[59] [59]

Allen, Ghislain St-Yves, Yihan Wu, Jesse L

Allen, Emily J. and St-Yves, Ghislain and Wu, Yihan and Breedlove, Jesse L. and Prince, Jacob S. and Dowdle, Logan T. and Nau, Matthias and Caron, Brad and Pestilli, Franco and Charest, Ian and Hutchinson, J. Benjamin and Naselaris, Thomas and Kay, Kendrick , title=. Nature Neuroscience , year=. doi:10.1038/s41593-021-00962-x , url=

work page doi:10.1038/s41593-021-00962-x

[60] [60]

2023 , eprint=

The Algonauts Project 2023 Challenge: How the Human Brain Makes Sense of Natural Scenes , author=. 2023 , eprint=

2023

[61] [61]

2023 , eprint=

Memory Encoding Model , author=. 2023 , eprint=

2023

[62] [62]

2023 , eprint=

The Algonauts Project 2023 Challenge: UARK-UAlbany Team Solution , author=. 2023 , eprint=

2023

[63] [63]

2023 , doi =

Hossein Adeli and Sun Minni and Nikolaus Kriegeskorte , title =. 2023 , doi =. https://www.biorxiv.org/content/early/2023/08/05/2023.08.02.551743.full.pdf , journal =

2023

[64] [64]

Methods for computing the maximum performance of computational models of fMRI responses

Lage-Castellanos, Agustin and Valente, Giancarlo and Formisano, Elia and De Martino, Federico. Methods for computing the maximum performance of computational models of fMRI responses. PLoS Comput. Biol

[65] [65]

, title =

Fortunato, S. , title =. Phys. Rep.-Rev. Sec. Phys. Lett. , volume =. 2010 , pages =

2010

[66] [66]

Newman, M. E. J. and Girvan, M. , title =. Phys. Rev. E. , volume =. 2004 , pages =

2004

[67] [67]

and Reinhardt, T

Vehlow, C. and Reinhardt, T. and Weiskopf, D. , title =. IEEE Trans. Vis. Comput. Graph. , volume =. 2013 , pages =

2013

[68] [68]

and Albert, R

Raghavan, U. and Albert, R. and Kumara, S. , title =. Phys. Rev E. , volume =. 2007 , pages =

2007

[69] [69]

2011 , pages =

Robust network community detection using balanced propagation , journal =. 2011 , pages =

2011

[70] [70]

and Li, S

Lou, H. and Li, S. and Zhao, Y. , title =. Physica A. , volume =. 2013 , pages =

2013

[71] [71]

and Newman, M

Clauset, A. and Newman, M. E. J. and Moore, C. , title =. Phys. Rev. E. , volume =. 2004 , pages =

2004

[72] [72]

Blondel, V. D. and Guillaume, J. L. and Lambiotte, R. and Lefebvre, E. , title =. J. Stat. Mech.-Theory Exp. , volume =. 2008 , pages =

2008

[73] [73]

and Campari, R

Sobolevsky, S. and Campari, R. , title =. Phys. Rev. E. , volume =. 2014 , pages =

2014

[74] [74]

and Barthelemy, M

Fortunato, S. and Barthelemy, M. , title =. Proc. Natl. Acad. Sci. U. S. A. , volume =. 2007 , pages =

2007

[75] [75]

2011 , pages =

Unfolding communities in large complex networks: Combining defensive and offensive label propagation for core extraction , journal =. 2011 , pages =

2011

[76] [76]

and Li, J

Wang, X. and Li, J. , title =. Physica A. , volume =. 2013 , pages =

2013

[77] [77]

and Wang, X

Li, J. and Wang, X. and Eustace, J. , title =. Physica A. , volume =. 2013 , pages =

2013

[78] [78]

Fabio, D. R. and Fabio, D. and Carlo, P. , title =. Sci. Rep. , volume =. 2013 , pages =

2013

[79] [79]

and Wu, T

Chen, Q. and Wu, T. T. and Fang, M. , title =. Physica A. , volume =. 2013 , pages =

2013

[80] [80]

and Wang, R

Zhang, S. and Wang, R. and Zhang, X. , title =. Physica A. , volume =. 2007 , pages =

2007