Reassessing prediction in the brain: Pre-onset neural encoding during natural listening does not reflect pre-activation

Britta U. Westner; Jakub Szewczyk; Linda Geerligs; Sahel Azizpour; Umut Güçlü

arxiv: 2412.19622 · v2 · submitted 2024-12-27 · 🧬 q-bio.NC

Sahel Azizpour, Britta U. Westner, Jakub Szewczyk, Umut Güçlü, Linda Geerligs

Pith reviewed 2026-05-23 07:30 UTC · model grok-4.3

classification 🧬 q-bio.NC

keywords: predictive processing, pre-onset encoding, language comprehension, MEG, ECoG, temporal generalization, postdiction

The pith

Pre-onset neural activity during natural listening does not reflect pre-activation of the next word.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether brain signals recorded before a word arrives during natural listening represent preparation for that specific word. It replicates earlier reports of pre-onset encoding with language-model embeddings in both MEG and ECoG data, even after removing stimulus correlations. Temporal generalization tests, however, find no stable match between the pre-onset and post-onset patterns. The signals instead align with ongoing representation of earlier words. This finding questions the interpretation of similar pre-onset effects as direct support for predictive pre-activation in language comprehension.

Core claim

Pre-onset neural encoding effects replicate across MEG and ECoG and survive stimulus-correlation controls, yet temporal generalization analyses reveal no stable overlap between pre- and post-onset representations. This indicates that pre-onset activity does not reflect pre-activation of the next word. Long-range predictive effects reported in prior fMRI work do not appear in the higher-temporal-resolution recordings, while clear postdiction signatures emerge as persistent encoding of prior words.
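A stimulus-correlation control of this kind can be illustrated by residualizing each word embedding on the embeddings that precede it. The sketch below is only an illustration under stated assumptions: the function name `residualize_on_previous`, the ordinary least-squares residualization, the zero padding at the start of the story, and the simulated data are all hypothetical and are not taken from the paper, which may implement its control differently.

```python
import numpy as np

def residualize_on_previous(emb, n_prev=8):
    """Remove from each embedding the part that is linearly predictable
    from the n_prev preceding embeddings (hypothetical helper)."""
    n_words, dim = emb.shape
    # Predictor matrix: preceding embeddings side by side, zeros where
    # the story has no earlier word (an assumption of this sketch).
    prev = np.zeros((n_words, n_prev * dim))
    for k in range(1, n_prev + 1):
        prev[k:, (k - 1) * dim:k * dim] = emb[:-k]
    prev = np.hstack([prev, np.ones((n_words, 1))])  # intercept column
    # Least-squares fit of the current embedding from its predecessors.
    coef, *_ = np.linalg.lstsq(prev, emb, rcond=None)
    return emb - prev @ coef

# Simulated embeddings with strong word-to-word dependencies.
rng = np.random.default_rng(0)
n_words, dim = 500, 4
emb = np.zeros((n_words, dim))
for i in range(1, n_words):
    emb[i] = 0.7 * emb[i - 1] + rng.standard_normal(dim)

resid = residualize_on_previous(emb, n_prev=8)
```

After this step the residual embeddings are linearly uncorrelated with the preceding eight embeddings, so any pre-onset encoding that remains cannot be attributed to that linear dependency alone.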

What carries the argument

Temporal generalization analyses that test whether pre-onset neural patterns remain stable when compared to post-onset patterns.
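The logic of temporal generalization for an encoding model can be sketched as follows: fit a linear map from embeddings to neural data at one lag, then score its predictions against the data at every other lag. This is a minimal sketch, assuming closed-form ridge regression, contiguous cross-validation folds, and correlation as the score; the function names, parameters, and simulated data are hypothetical and do not reproduce the paper's pipeline.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping X (n, p) to Y (n, q)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ Y)

def column_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(0)
    B = B - B.mean(0)
    return (A * B).sum(0) / np.sqrt((A ** 2).sum(0) * (B ** 2).sum(0))

def temporal_generalization(X, Y, alpha=1.0, n_folds=5):
    """X: (n_words, n_features) embeddings.
    Y: (n_words, n_lags, n_channels) neural data epoched around onset.
    Returns G with G[i, j] = score of a model trained at lag i and
    tested at lag j, averaged over channels and folds."""
    n_words, n_lags, _ = Y.shape
    folds = np.array_split(np.arange(n_words), n_folds)
    G = np.zeros((n_lags, n_lags))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n_words), test_idx)
        for i in range(n_lags):
            W = ridge_fit(X[train_idx], Y[train_idx, i], alpha)
            pred = X[test_idx] @ W
            for j in range(n_lags):
                G[i, j] += column_corr(pred, Y[test_idx, j]).mean()
    return G / n_folds

# Simulation: lags 0-1 share one mapping, lags 2-3 share another.
rng = np.random.default_rng(0)
n_words, n_feat, n_ch = 400, 20, 30
X = rng.standard_normal((n_words, n_feat))
W_pre = rng.standard_normal((n_feat, n_ch))
W_post = rng.standard_normal((n_feat, n_ch))
Y = np.stack([X @ W_pre, X @ W_pre, X @ W_post, X @ W_post], axis=1)
Y = Y + 0.5 * rng.standard_normal(Y.shape)

G = temporal_generalization(X, Y)
```

In this simulation both lag groups are well encoded on the diagonal of `G`, but models trained in one group do not transfer to the other. That is the pattern the review describes: encoding at a given time without a shared representation across time.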

If this is right

Pre-onset encoding effects cannot be taken as evidence of predictive pre-activation of specific words.
Long-range predictive effects observed in fMRI fail to replicate when temporal resolution is higher.
Neural responses instead carry forward information about words that have already occurred.
LLM-based encoding models require additional tests before their outputs are interpreted as neural prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Other measures of expectation, such as lexical surprisal, may still produce detectable prediction signals even if specific-word pre-activation does not.
The same temporal-generalization controls could be applied to claims of pre-activation in non-linguistic domains.
Comprehension models may need to assign a larger role to postdiction mechanisms that maintain prior context.

Load-bearing premise

That the absence of stable overlap in temporal generalization, together with the chosen stimulus controls, is enough to rule out pre-activation rather than simply missing it under these analysis choices.

What would settle it

A new dataset or analysis that demonstrates stable generalization from pre-onset to post-onset representations while keeping the same controls would support the pre-activation interpretation.

Figures

Figures reproduced from arXiv: 2412.19622 by Britta U. Westner, Jakub Szewczyk, Linda Geerligs, Sahel Azizpour, Umut Güçlü.

**Figure 2.** Word embeddings explain brain responses before word onset. Left column: …

**Figure 3.** Temporal generalization of representations captured by the encoding model differs before and after word …

**Figure 4.** Removing autocorrelation between pre- and post-onset activity does not eliminate pre-onset encoding. a. …

**Figure 5.** Encoding of the future and past words. All curves represent averages across participants. The embedding vector is constructed by concatenating d future word embeddings (d > 0) or |d| past word embeddings (d < 0) along with the embedding of the current word w_i. a. Including the next word embedding in the encoding model (d = 1) enhances encoding only after that word is heard in the story, while including the…
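The construction in the Figure 5 caption, concatenating the current word embedding with d future or |d| past embeddings, might look like the sketch below. The function name `concat_context` and the zero padding at the story edges are assumptions made for illustration; the paper may handle edge words differently, for example by dropping them.

```python
import numpy as np

def concat_context(emb, d):
    """Concatenate each word embedding with d future embeddings (d > 0)
    or |d| past embeddings (d < 0). d = 0 returns the embeddings as is.
    Positions beyond the story edges are zero padded (an assumption)."""
    offsets = range(1, d + 1) if d > 0 else range(-1, d - 1, -1)
    blocks = [emb]
    for k in offsets:
        shifted = np.zeros_like(emb)
        if k > 0:
            shifted[:-k] = emb[k:]    # row i holds the embedding of word i + k
        else:
            shifted[-k:] = emb[:k]    # row i holds the embedding of word i - |k|
        blocks.append(shifted)
    return np.hstack(blocks)

# Toy story of 6 words with 2-dimensional embeddings.
emb = np.arange(12.0).reshape(6, 2)
future = concat_context(emb, 1)    # current word + next word
past = concat_context(emb, -2)     # current word + two previous words
```

Here `future` has 4 columns and `past` has 6, so the encoding model sees a wider feature vector as |d| grows, which is worth keeping in mind when comparing scores across values of d.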

Original abstract

Predictive processing theories propose that the brain continuously anticipates upcoming input. However, direct neural evidence for predictive pre-activation during natural language comprehension remains limited and debated. Previous studies using large language model (LLM)-based encoding models with fMRI and ECoG have reported pre-onset signals that appear to encode upcoming words, but these effects may instead reflect dependencies in the stimulus or autocorrelations in neural activity. Here, we re-examined this question by aligning LLM-derived word embeddings with neural activity recorded during naturalistic listening using magnetoencephalography (MEG) and electrocorticography (ECoG). We replicated pre-onset encoding effects previously observed in ECoG across both modalities, and found that they persist even after controlling for stimulus correlations. Crucially, temporal generalization analyses revealed no stable overlap between pre- and post-onset representations, indicating that pre-onset activity does not reflect pre-activation of the next word. Consistent with this, long-range predictive effects previously reported in fMRI did not replicate in our higher-temporal-resolution data. While we found no evidence for predictive pre-activation, we observed clear signatures of postdiction, with neural activity reflecting persistent encoding of prior words. These results suggest that reported apparent predictive signals do not reflect pre-activation of upcoming input. They call for caution in interpreting LLM-based encoding models and highlight the need for a more nuanced understanding of what constitutes "prediction" in language comprehension.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Pre-onset effects hold after stimulus controls but temporal generalization finds no overlap, so the paper treats them as non-predictive.

The paper replicates pre-onset encoding during natural listening in both MEG and ECoG, shows the effects survive stimulus-correlation controls, and then uses temporal generalization to argue there is no stable representational overlap with post-onset activity. It also fails to recover the long-range predictive signatures previously reported in fMRI. Postdiction effects are observed instead. These steps are the actual contribution: an empirical check on whether earlier pre-onset results can be read as pre-activation once basic confounds are addressed and higher temporal resolution is used.

Referee Report

2 major / 2 minor

Summary. The manuscript re-examines evidence for predictive pre-activation during natural language listening by aligning LLM word embeddings with MEG and ECoG recordings. It replicates pre-onset encoding effects that survive stimulus-correlation controls, but reports that temporal generalization analyses show no stable overlap between pre- and post-onset neural representations. This is taken to indicate that pre-onset activity does not reflect pre-activation of the next word. The paper also reports clear postdiction signatures, fails to replicate long-range fMRI predictive effects in higher-temporal-resolution data, and calls for caution in interpreting LLM-based encoding models.

Significance. If the central negative result holds after addressing sensitivity concerns, the work would meaningfully constrain predictive-processing accounts of language comprehension by showing that apparent pre-onset signals in LLM encoding models do not constitute evidence for pre-activation. The multi-modal replication, explicit stimulus controls, and contrast with postdiction effects provide a useful empirical anchor for reinterpreting prior findings in the field.

major comments (2)

[Abstract] Abstract and temporal generalization section: the load-bearing claim that 'no stable overlap' rules out pre-activation assumes the chosen metric (time-window selection, cross-validation scheme, and overlap criterion) has adequate sensitivity. If pre-activation manifests as a transformed or partial mapping (e.g., semantic features only or a distinct neural subspace), the procedure could fail to detect overlap even when pre-onset encoding survives stimulus controls. This assumption is not directly tested and directly supports the central negative conclusion.
[Methods] Methods description (implied by abstract): data exclusion criteria, exact statistical thresholds, and the precise implementation of temporal generalization (including how 'stable overlap' is quantified) are not fully specified. Without these, the support for the negative claim that pre-onset activity does not reflect pre-activation cannot be fully evaluated, as noted in the low soundness assessment.

minor comments (2)

Clarify whether the temporal generalization analysis was performed in the original neural space or projected into the LLM embedding space, and report the exact cross-validation scheme used.
The abstract states that long-range predictive effects 'did not replicate'; provide a direct quantitative comparison (effect sizes, p-values) to the original fMRI study for transparency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below, providing clarifications and indicating where revisions will strengthen the manuscript.

Referee: [Abstract] Abstract and temporal generalization section: the load-bearing claim that 'no stable overlap' rules out pre-activation assumes the chosen metric (time-window selection, cross-validation scheme, and overlap criterion) has adequate sensitivity. If pre-activation manifests as a transformed or partial mapping (e.g., semantic features only or a distinct neural subspace), the procedure could fail to detect overlap even when pre-onset encoding survives stimulus controls. This assumption is not directly tested and directly supports the central negative conclusion.

Authors: We selected temporal generalization as it is the standard approach for detecting shared representational structure across time (e.g., King & Dehaene, 2014). If pre-activation involved the same or linearly related features, some generalization should appear even under partial overlap; the complete absence across multiple thresholds and folds supports our interpretation. That said, we acknowledge the referee's point on potential transformed mappings and will add a supplementary analysis using cross-temporal canonical correlation analysis to explicitly test for partial or rotated overlap. We will also expand the discussion to note this as a boundary condition on the negative result. revision: partial
Referee: [Methods] Methods description (implied by abstract): data exclusion criteria, exact statistical thresholds, and the precise implementation of temporal generalization (including how 'stable overlap' is quantified) are not fully specified. Without these, the support for the negative claim that pre-onset activity does not reflect pre-activation cannot be fully evaluated, as noted in the low soundness assessment.

Authors: We agree that fuller specification is needed for reproducibility. In the revised manuscript we will add explicit sections detailing participant/data exclusion criteria, the precise statistical thresholds (including correction methods), the temporal generalization procedure (window sizes, cross-validation folds, and the exact criterion used to define 'stable overlap', such as significant generalization across consecutive time points), and any preprocessing steps that affect these analyses. revision: yes
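The cross-temporal canonical correlation analysis mentioned in the first response could, in its simplest form, compare pre-onset and post-onset activity patterns up to an arbitrary linear transformation. The following is a rough sketch under stated assumptions, not the analysis the authors would run: the function name, the QR-plus-SVD computation, and the simulated data are hypothetical, and the in-sample canonical correlations computed here are optimistically biased, so a real analysis would need cross-validation or a permutation baseline.

```python
import numpy as np

def canonical_correlations(A, B):
    """Canonical correlations between column spaces of A and B,
    both (n_samples, n_dims) with full column rank."""
    A = A - A.mean(0)
    B = B - B.mean(0)
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for the columns of A
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for the columns of B
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
n_words, n_dims = 500, 3
pre = rng.standard_normal((n_words, n_dims))

# Case 1: post-onset pattern is a rotated copy of the pre-onset pattern.
R = np.linalg.qr(rng.standard_normal((n_dims, n_dims)))[0]
post_rotated = pre @ R + 0.1 * rng.standard_normal((n_words, n_dims))

# Case 2: post-onset pattern is unrelated to the pre-onset pattern.
post_unrelated = rng.standard_normal((n_words, n_dims))

cc_rotated = canonical_correlations(pre, post_rotated)
cc_unrelated = canonical_correlations(pre, post_unrelated)
```

The rotated case is the scenario the referee raises: the representation is preserved but in a different basis. Canonical correlations are invariant to invertible linear transformations of either set, so they can expose this kind of overlap where a fixed set of weights might not transfer.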

Circularity Check

0 steps flagged

No significant circularity; empirical reanalysis with independent controls and metric

The paper's central result—that temporal generalization shows no stable pre/post-onset overlap, implying pre-onset signals are not pre-activation—rests on an empirical analysis pipeline (LLM embedding alignment, stimulus-correlation controls, and standard temporal generalization) applied to MEG/ECoG data. This does not reduce to a self-definition, a fitted parameter renamed as prediction, or a self-citation chain; the negative finding is falsifiable against the chosen metric and controls rather than tautological. No quoted step equates the output to the input by construction. Minor self-citations (if present) are not load-bearing for the key claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review provides no explicit list of fitted parameters or invented entities; the work relies on standard assumptions of encoding models and temporal generalization.

axioms (1)

domain assumption: LLM word embeddings can be linearly aligned with neural activity to test representational overlap.
Central to the encoding and temporal generalization analyses described.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "temporal generalization analyses revealed no stable overlap between pre- and post-onset representations"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "pre-onset encoding remains largely unchanged after decorrelating eight preceding word embeddings"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

[1] Antonello R, Huth A. Predictive coding or just feature discovery? An alternative account of why language models fit brain data. Neurobiology of Language. 2024; 5(1):64–79.
Armeni K, Güçlü U, van Gerven M, Schoffelen JM. A 10-hour within-participant magnetoencephalography narrative dataset to test models of language comprehension. Scientific Data. 2022; 9(1)...

[2] Carlson T, Tovar DA, Alink A, Kriegeskorte N. Representational dynamics of object vision: the first 1000 ms. Journal of Vision. 2013; 13(10):1–1.
Caucheteux C, Gramfort A, King JR. Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour. 2023; 7(3):430–441.
Caucheteux C, King JR. Brains and algorithms partia...

[3] Hogendoorn H. Perception in real-time: predicting the present, reconstructing the past. Trends in Cognitive Sciences. 2022; 26(2):128–141.
King JR, Dehaene S. Characterizing the dynamics of mental representations: the temporal generalization method. Trends in Cognitive Sciences. 2014; 18(4):203–210.
King JR, Gramfort A, Schurger A, Naccache L, Dehaene S. ...
doi:10.5281/zenodo.10519948

[4] p. 1532–1543.
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, et al. Language models are unsupervised multitask learners. OpenAI blog. 2019; 1(8):9.
Schönmann I, Szewczyk J, de Lange FP, Heilbron M. Stimulus dependencies—rather than next-word prediction—can explain pre-onset brain encoding during natural listening. bioRxiv. 2025; p. 2025–03.
Schr...

[5] Smith NJ, Levy R. The effect of word predictability on reading time is logarithmic. Cognition. 2013; 128(3):302–319.
Szewczyk JM, Mech EN, Federmeier KD. The power of “good”: Can adjectives rapidly decrease as well as increase the availability of the upcoming noun? Journal of Experimental Psychology: Learning, Memory, and Cognition. 2022; 48(6):856.
Tonev...
doi:10.1109/10.623056

[6] Star symbols mark significant differences between predictable and unpredictable words calculated using a dependent t-test for paired samples on the brain scores of the two groups across the same MEG sources and accounted for multiple hypothesis testing using the Benjamini-Hochberg correction. We considered q values smaller than 0.05 as significant. A smoothing ...

[7] and are often attributed to reversals in neural activity patterns. In our case, because the encoding model is linear and maps embeddings to individual MEG time points, the negative generalization might reflect a sign flip in the underlying neural response from pre- to post-onset, potentially arising from the oscillatory nature of the MEG signal. Although ...