Reassessing prediction in the brain: Pre-onset neural encoding during natural listening does not reflect pre-activation
Pith reviewed 2026-05-23 07:30 UTC · model grok-4.3
The pith
Pre-onset neural activity during natural listening does not reflect pre-activation of the next word.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pre-onset neural encoding effects replicate across MEG and ECoG and survive stimulus-correlation controls, yet temporal generalization analyses reveal no stable overlap between pre- and post-onset representations. This indicates that pre-onset activity does not reflect pre-activation of the next word. Long-range predictive effects reported in prior fMRI work do not appear in the higher-temporal-resolution recordings, while clear postdiction signatures emerge as persistent encoding of prior words.
What carries the argument
Temporal generalization analyses that test whether pre-onset neural patterns remain stable when compared to post-onset patterns.
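As a rough illustration of the idea, here is a minimal sketch of a temporal generalization matrix on synthetic data. It assumes a closed-form ridge encoding model and a single train/test split; the helper names (`ridge_fit`, `mean_corr`, `temporal_generalization`), the regularization value, and the simulated data are hypothetical and are not taken from the paper. Two "pre-onset" time points share one embedding-to-sensor mapping and two "post-onset" time points share a different one, so encoding succeeds at every time point but does not transfer across the onset boundary.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression mapping embeddings X to sensor data Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def mean_corr(Y_true, Y_pred):
    """Pearson correlation per sensor, averaged over sensors."""
    a = Y_true - Y_true.mean(axis=0)
    b = Y_pred - Y_pred.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return float(np.mean(num / den))

def temporal_generalization(X, Y, n_train, alpha=1.0):
    """Y has shape (n_words, n_times, n_sensors).

    G[i, j] is the score of a model fit at time i (training words)
    and evaluated at time j (held-out words).
    """
    n_times = Y.shape[1]
    G = np.zeros((n_times, n_times))
    for i in range(n_times):
        W = ridge_fit(X[:n_train], Y[:n_train, i], alpha)
        pred = X[n_train:] @ W
        for j in range(n_times):
            G[i, j] = mean_corr(Y[n_train:, j], pred)
    return G

# Synthetic data (assumption): same embeddings, two unrelated sensor mappings.
rng = np.random.default_rng(0)
n_words, n_dim, n_sensors = 600, 10, 20
X = rng.standard_normal((n_words, n_dim))
A = rng.standard_normal((n_dim, n_sensors))  # mapping at times 0 and 1 ("pre-onset")
B = rng.standard_normal((n_dim, n_sensors))  # mapping at times 2 and 3 ("post-onset")
Y = np.stack(
    [X @ M + rng.standard_normal((n_words, n_sensors)) for M in (A, A, B, B)],
    axis=1,
)
G = temporal_generalization(X, Y, n_train=400)
print(np.round(G, 2))
```

In this toy setup the diagonal and within-block entries of `G` are high, while entries that cross the onset boundary stay near zero, which is the pattern the paper describes as the absence of stable overlap.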
If this is right
- Pre-onset encoding effects cannot be taken as evidence of predictive pre-activation of specific words.
- Long-range predictive effects observed in fMRI fail to replicate when temporal resolution is higher.
- Neural responses instead carry forward information about words that have already occurred.
- LLM-based encoding models require additional tests before their outputs are interpreted as neural prediction.
Where Pith is reading between the lines
- Other measures of expectation, such as lexical surprisal, may still produce detectable prediction signals even if specific-word pre-activation does not.
- The same temporal-generalization controls could be applied to claims of pre-activation in non-linguistic domains.
- Comprehension models may need to assign a larger role to postdiction mechanisms that maintain prior context.
Load-bearing premise
That the absence of stable overlap in temporal generalization, together with the chosen stimulus controls, is enough to rule out pre-activation rather than simply missing it under these analysis choices.
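One way to probe this premise is a ground-truth injection: simulate pre-onset data that shares a known fraction of the post-onset code and check how cross-time generalization scales with that fraction. The sketch below is only an illustration under stated assumptions (ridge encoding, Gaussian synthetic data, a hypothetical mixing weight `w`, and a hypothetical helper `cross_time_score`); it does not reproduce the paper's pipeline.

```python
import numpy as np

def cross_time_score(X, Y_fit, Y_eval, n_train, alpha=1.0):
    """Fit a ridge encoding model on Y_fit, score its predictions against Y_eval."""
    d = X.shape[1]
    X_tr, X_te = X[:n_train], X[n_train:]
    W = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(d), X_tr.T @ Y_fit[:n_train])
    pred = X_te @ W
    true = Y_eval[n_train:]
    a = true - true.mean(axis=0)
    b = pred - pred.mean(axis=0)
    r = (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return float(r.mean())

rng = np.random.default_rng(1)
n_words, n_dim, n_sensors = 800, 10, 30
X = rng.standard_normal((n_words, n_dim))
A_post = rng.standard_normal((n_dim, n_sensors))   # post-onset mapping
B_other = rng.standard_normal((n_dim, n_sensors))  # unrelated mapping
Y_post = X @ A_post + rng.standard_normal((n_words, n_sensors))

scores = {}
for w in (0.0, 0.25, 0.5, 1.0):
    # w = 0: pre-onset code unrelated to post-onset; w = 1: identical code.
    A_pre = w * A_post + np.sqrt(1.0 - w ** 2) * B_other
    Y_pre = X @ A_pre + rng.standard_normal((n_words, n_sensors))
    scores[w] = cross_time_score(X, Y_post, Y_pre, n_train=500)

for w, s in scores.items():
    print(f"shared weight {w:.2f}: generalization score {s:.2f}")
```

If a simulation like this shows that realistic amounts of shared structure are detectable at the noise level of the real recordings, a null generalization result carries more weight; if not, the null may simply reflect limited sensitivity.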
What would settle it
A new dataset or analysis that demonstrates stable generalization from pre-onset to post-onset representations while keeping the same controls would support the pre-activation interpretation.
Original abstract
Predictive processing theories propose that the brain continuously anticipates upcoming input. However, direct neural evidence for predictive pre-activation during natural language comprehension remains limited and debated. Previous studies using large language model (LLM)-based encoding models with fMRI and ECoG have reported pre-onset signals that appear to encode upcoming words, but these effects may instead reflect dependencies in the stimulus or autocorrelations in neural activity. Here, we re-examined this question by aligning LLM-derived word embeddings with neural activity recorded during naturalistic listening using magnetoencephalography (MEG) and electrocorticography (ECoG). We replicated pre-onset encoding effects previously observed in ECoG across both modalities, and found that they persist even after controlling for stimulus correlations. Crucially, temporal generalization analyses revealed no stable overlap between pre- and post-onset representations, indicating that pre-onset activity does not reflect pre-activation of the next word. Consistent with this, long-range predictive effects previously reported in fMRI did not replicate in our higher-temporal-resolution data. While we found no evidence for predictive pre-activation, we observed clear signatures of postdiction, with neural activity reflecting persistent encoding of prior words. These results suggest that reported apparent predictive signals do not reflect pre-activation of upcoming input. They call for caution in interpreting LLM-based encoding models and highlight the need for a more nuanced understanding of what constitutes "prediction" in language comprehension.
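The stimulus-correlation control mentioned in the abstract can be sketched as a residualization step: remove from each word's embedding whatever is linearly predictable from the preceding words, then use the residual in the encoding model. The version below is a minimal sketch that assumes ordinary least squares and a context of eight preceding words (the number quoted elsewhere on this page); the function name `residualize_on_context` and the autoregressive toy embeddings are hypothetical, and the paper's exact decorrelation procedure may differ.

```python
import numpy as np

def residualize_on_context(E, k=8):
    """Remove the part of each embedding that is linearly predictable
    from the k preceding embeddings.

    E: (n_words, n_dim) embeddings in stimulus order.
    Returns the residual embeddings and the indices of the words kept
    (the first k words have no full context and are dropped).
    """
    n, _ = E.shape
    idx = np.arange(k, n)
    lagged = [E[idx - lag] for lag in range(1, k + 1)]
    C = np.hstack(lagged + [np.ones((idx.size, 1))])  # context plus intercept
    beta, *_ = np.linalg.lstsq(C, E[idx], rcond=None)
    return E[idx] - C @ beta, idx

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Toy embeddings with strong word-to-word dependence (assumption: AR(1) process).
rng = np.random.default_rng(2)
n_words, n_dim = 2000, 5
E = np.zeros((n_words, n_dim))
E[0] = rng.standard_normal(n_dim)
for t in range(1, n_words):
    E[t] = 0.8 * E[t - 1] + rng.standard_normal(n_dim)

E_res, idx = residualize_on_context(E, k=8)
before = corr(E[idx, 0], E[idx - 1, 0])
after = corr(E_res[:, 0], E[idx - 1, 0])
print(f"correlation with previous word: before={before:.2f}, after={after:.2e}")
```

If pre-onset encoding survives when the residual embeddings replace the originals, it cannot be explained by linear dependence on those preceding words alone, which is the logic of the control described above.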
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript re-examines evidence for predictive pre-activation during natural language listening by aligning LLM word embeddings with MEG and ECoG recordings. It replicates pre-onset encoding effects that survive stimulus-correlation controls, but reports that temporal generalization analyses show no stable overlap between pre- and post-onset neural representations. This is taken to indicate that pre-onset activity does not reflect pre-activation of the next word. The paper also reports clear postdiction signatures and a failure to replicate long-range fMRI predictive effects in higher-temporal-resolution data, and it calls for caution in interpreting LLM-based encoding models.
Significance. If the central negative result holds after addressing sensitivity concerns, the work would meaningfully constrain predictive-processing accounts of language comprehension by showing that apparent pre-onset signals in LLM encoding models do not constitute evidence for pre-activation. The multi-modal replication, explicit stimulus controls, and contrast with postdiction effects provide a useful empirical anchor for reinterpreting prior findings in the field.
major comments (2)
- [Abstract] Abstract and temporal generalization section: the load-bearing claim that 'no stable overlap' rules out pre-activation assumes the chosen metric (time-window selection, cross-validation scheme, and overlap criterion) has adequate sensitivity. If pre-activation manifests as a transformed or partial mapping (e.g., semantic features only or a distinct neural subspace), the procedure could fail to detect overlap even when pre-onset encoding survives stimulus controls. This assumption is not directly tested and directly supports the central negative conclusion.
- [Methods] Methods description (implied by abstract): data exclusion criteria, exact statistical thresholds, and the precise implementation of temporal generalization (including how 'stable overlap' is quantified) are not fully specified. Without these, the support for the negative claim that pre-onset activity does not reflect pre-activation cannot be fully evaluated, as noted in the low soundness assessment.
minor comments (2)
- Clarify whether the temporal generalization analysis was performed in the original neural space or projected into the LLM embedding space, and report the exact cross-validation scheme used.
- The abstract states that long-range predictive effects 'did not replicate'; provide a direct quantitative comparison (effect sizes, p-values) to the original fMRI study for transparency.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below, providing clarifications and indicating where revisions will strengthen the manuscript.
- Referee: [Abstract] Abstract and temporal generalization section: the load-bearing claim that 'no stable overlap' rules out pre-activation assumes the chosen metric (time-window selection, cross-validation scheme, and overlap criterion) has adequate sensitivity. If pre-activation manifests as a transformed or partial mapping (e.g., semantic features only or a distinct neural subspace), the procedure could fail to detect overlap even when pre-onset encoding survives stimulus controls. This assumption is not directly tested and directly supports the central negative conclusion.
Authors: We selected temporal generalization as it is the standard approach for detecting shared representational structure across time (e.g., King & Dehaene, 2014). If pre-activation involved the same or linearly related features, some generalization should appear even under partial overlap; the complete absence across multiple thresholds and folds supports our interpretation. That said, we acknowledge the referee's point on potential transformed mappings and will add a supplementary analysis using cross-temporal canonical correlation analysis to explicitly test for partial or rotated overlap. We will also expand the discussion to note this as a boundary condition on the negative result. revision: partial
- Referee: [Methods] Methods description (implied by abstract): data exclusion criteria, exact statistical thresholds, and the precise implementation of temporal generalization (including how 'stable overlap' is quantified) are not fully specified. Without these, the support for the negative claim that pre-onset activity does not reflect pre-activation cannot be fully evaluated, as noted in the low soundness assessment.
Authors: We agree that fuller specification is needed for reproducibility. In the revised manuscript we will add explicit sections detailing participant/data exclusion criteria, the precise statistical thresholds (including correction methods), the temporal generalization procedure (window sizes, cross-validation folds, and the exact criterion used to define 'stable overlap', such as significant generalization across consecutive time points), and any preprocessing steps that affect these analyses. revision: yes
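To make the second response concrete, one possible operationalization of "stable overlap" is sketched below: apply a Benjamini-Hochberg correction to the per-time-point p-values of the generalization scores, then require a minimum run of consecutive significant time points. This is a sketch under stated assumptions, not the authors' criterion; the run length `min_consecutive=3` and the function names are hypothetical, and the q < 0.05 level is borrowed from the significance threshold quoted in the paper excerpts on this page.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of hypotheses rejected at FDR level q (step-up procedure)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject

def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    best = cur = 0
    for v in mask:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best

def has_stable_overlap(p_values, q=0.05, min_consecutive=3):
    """True if at least `min_consecutive` adjacent time points are significant."""
    return longest_run(benjamini_hochberg(p_values, q)) >= min_consecutive

# p-values ordered by time point (made-up numbers for illustration only).
sustained = [0.001, 0.002, 0.003, 0.5, 0.6, 0.04, 0.7, 0.9]
scattered = [0.001, 0.5, 0.002, 0.6, 0.003, 0.7]
print(has_stable_overlap(sustained), has_stable_overlap(scattered))
```

Reporting the chosen run length and correction method alongside the results would let readers judge how strict the overlap criterion is.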
Circularity Check
No significant circularity; empirical reanalysis with independent controls and metric
The paper's central result—that temporal generalization shows no stable pre/post-onset overlap, implying pre-onset signals are not pre-activation—rests on an empirical analysis pipeline (LLM embedding alignment, stimulus-correlation controls, and standard temporal generalization) applied to MEG/ECoG data. This does not reduce to a self-definition, a fitted parameter renamed as prediction, or a self-citation chain; the negative finding is falsifiable against the chosen metric and controls rather than tautological. No quoted step equates the output to the input by construction. Minor self-citations (if present) are not load-bearing for the key claim.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLM word embeddings can be linearly aligned with neural activity to test representational overlap
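This assumption can be written out as follows. The notation is illustrative and not taken from the paper: $X$ holds the word embeddings, $Y_t$ the neural data at lag $t$ relative to word onset, and the ridge penalty $\lambda$ is an assumed choice of regularizer (the paper excerpts on this page only state that the encoding model is linear and maps embeddings to individual time points).

```latex
% Per-time-point linear encoding model (ridge penalty is an assumption)
\hat{W}_t = \arg\min_{W} \; \lVert Y_t - X W \rVert_F^2 + \lambda \lVert W \rVert_F^2

% Temporal generalization: fit at lag t, evaluate on held-out words at lag t'
G(t, t') = \operatorname{corr}\!\left( Y_{t'}^{\text{test}},\; X^{\text{test}} \hat{W}_t \right)
```

Under this formulation, pre-activation of the next word would predict $G(t, t') > 0$ for $t < 0 < t'$, whereas the paper reports no stable off-diagonal overlap of this kind.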
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "temporal generalization analyses revealed no stable overlap between pre- and post-onset representations"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "pre-onset encoding remains largely unchanged after decorrelating eight preceding word embeddings"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Antonello R, Huth A. Predictive coding or just feature discovery? An alternative account of why language models fit brain data. Neurobiology of Language. 2024; 5(1):64–79.
  Armeni K, Güçlü U, van Gerven M, Schoffelen JM. A 10-hour within-participant magnetoencephalography narrative dataset to test models of language comprehension. Scientific Data. 2022; 9(1)...
- [2] Carlson T, Tovar DA, Alink A, Kriegeskorte N. Representational dynamics of object vision: the first 1000 ms. Journal of Vision. 2013; 13(10):1–1.
  Caucheteux C, Gramfort A, King JR. Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour. 2023; 7(3):430–441.
  Caucheteux C, King JR. Brains and algorithms partia...
- [3] Hogendoorn H. Perception in real-time: predicting the present, reconstructing the past. Trends in Cognitive Sciences. 2022; 26(2):128–141.
  King JR, Dehaene S. Characterizing the dynamics of mental representations: the temporal generalization method. Trends in Cognitive Sciences. 2014; 18(4):203–210.
  King JR, Gramfort A, Schurger A, Naccache L, Dehaene S. ...
- [4] p. 1532–1543.
  Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, et al. Language models are unsupervised multitask learners. OpenAI blog. 2019; 1(8):9.
  Schönmann I, Szewczyk J, de Lange FP, Heilbron M. Stimulus dependencies—rather than next-word prediction—can explain pre-onset brain encoding during natural listening. bioRxiv. 2025; p. 2025–03.
  Schr...
- [5] Smith NJ, Levy R. The effect of word predictability on reading time is logarithmic. Cognition. 2013; 128(3):302–319.
  Szewczyk JM, Mech EN, Federmeier KD. The power of “good”: Can adjectives rapidly decrease as well as increase the availability of the upcoming noun? Journal of Experimental Psychology: Learning, Memory, and Cognition. 2022; 48(6):856.
  Tonev...
- [6] Star symbols mark significant differences between predictable and unpredictable words calculated using a dependent t-test for paired samples on the brain scores of the two groups across the same MEG sources and accounted for multiple hypothesis testing using the Benjamini-Hochberg correction. We considered 𝑞 values smaller than 0.05 as significant. A smoothing ...
- [7] and are often attributed to reversals in neural activity patterns. In our case, because the encoding model is linear and maps embeddings to individual MEG time points, the negative generalization might reflect a sign flip in the underlying neural response from pre- to post-onset, potentially arising from the oscillatory nature of the MEG signal. Although ...