
arxiv: 2606.02305 · v1 · pith:SOL26VFCnew · submitted 2026-06-01 · 🧬 q-bio.NC · cs.HC

Mapping Whisper Representations to Human ECoG Responses with Interpretable Time-Resolved Neural Encoding

keywords: speech, representations, neural, responses, cortical, ECoG, time-resolved, Whisper

Understanding how speech foundation models relate to human cortical activity is a key challenge for computational neuroscience. Here, we investigate how internal representations from Whisper predict intracranial ECoG responses during naturalistic speech perception. We introduce a time-resolved neural encoder that combines speech embeddings with a recurrent temporal model and soft attention, allowing us to examine layer-wise brain alignment. Intermediate Whisper layers provide the strongest correspondence with neural activity, supporting a hierarchical match between model representations and cortical speech processing. Comparisons with baselines show that high-resolution ECoG responses benefit from temporally structured modelling beyond linear mappings from the same speech representations. In addition, attention maps reveal temporally local alignment between speech embeddings and neural responses, while a phonemic interpretability analysis identifies anatomically coherent phoneme-category organization among encoding-informative electrodes. Together, these results suggest that speech foundation models offer a useful framework for studying time-resolved cortical speech representations.
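To make the encoder description concrete, the following is a minimal sketch of the general idea (embeddings passed through a recurrent model, then pooled with soft attention and read out linearly for one electrode). It is not the authors' implementation: the abstract does not specify the architecture, so the Elman-style recurrence, the dot-product attention, the dimensions, and every name here (`rnn_states`, `attend_and_predict`, `query`, `w_out`) are assumptions, and random vectors stand in for Whisper layer embeddings.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def rnn_states(embeddings, W_in, W_rec):
    # Assumed Elman recurrence: h_t = tanh(W_in x_t + W_rec h_{t-1}).
    h = [0.0] * len(W_rec)
    states = []
    for x in embeddings:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

def attend_and_predict(states, query, w_out):
    # Soft attention over time steps, then a linear readout for one electrode.
    weights = softmax([dot(query, h) for h in states])
    context = [sum(w * h[i] for w, h in zip(weights, states))
               for i in range(len(query))]
    return dot(w_out, context), weights

random.seed(0)
T, D, H = 6, 4, 3  # time steps, embedding size, hidden size (all hypothetical)
# Random placeholders for one window of speech-model embeddings.
embeddings = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
W_in = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(H)]
W_rec = [[random.gauss(0, 0.5) for _ in range(H)] for _ in range(H)]
query = [random.gauss(0, 1) for _ in range(H)]
w_out = [random.gauss(0, 1) for _ in range(H)]

states = rnn_states(embeddings, W_in, W_rec)
prediction, attn = attend_and_predict(states, query, w_out)
```

In a sketch like this, `attn` holds one weight per time step in the window, so inspecting where its mass concentrates is one way an attention map could indicate temporally local alignment between embeddings and a predicted neural response.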
