Information-Theoretic Storage Cost in Sentence Comprehension
Pith reviewed 2026-05-21 12:37 UTC · model grok-4.3
The pith
An information-theoretic storage cost estimated from neural language models predicts additional variance in human reading times.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Storage cost is formalized as the amount of information that prior context provides about upcoming input under uncertainty. When estimated from neural language models, this quantity aligns with established processing asymmetries and accounts for reading-time data beyond conventional predictors.
What carries the argument
Information-theoretic storage cost: the expected information previous words carry about future context, under uncertainty, as estimated by neural language models.
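As a rough illustration only, one could read "information previous words carry about future context" as the KL divergence between a model's next-word distribution given a specific context and a context-free (marginal) distribution. Averaged over contexts, this quantity equals the mutual information between context and next word. This reading is an assumption made for the sketch. It looks only one word ahead and ignores the uncertainty component the paper adds. The function `context_information` and the toy probabilities are hypothetical and are not taken from the paper or its repository.

```python
import math


def context_information(p_given_context, p_marginal):
    """KL divergence D(p(w | context) || p(w)) in bits.

    Hypothetical illustration, not the paper's estimator: how much a specific
    context sharpens the next-word distribution relative to a context-free
    baseline. Both arguments map words to probabilities.
    """
    bits = 0.0
    for word, p in p_given_context.items():
        if p > 0.0:
            q = p_marginal.get(word, 0.0)
            if q == 0.0:
                return math.inf  # context predicts a word the baseline rules out
            bits += p * math.log2(p / q)
    return bits


# Toy next-word distributions (made-up numbers, not from the paper).
marginal = {"the": 0.4, "dog": 0.2, "barked": 0.2, "ran": 0.2}
after_the_dog = {"the": 0.05, "dog": 0.05, "barked": 0.6, "ran": 0.3}

print(round(context_information(after_the_dog, marginal), 3))  # → 0.876
print(context_information(marginal, marginal))  # → 0.0 (context adds nothing)
```

With these invented numbers, the context "the dog" carries about 0.88 bits about the next word. A context that leaves the distribution unchanged carries none.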
If this is right
- Recovers processing asymmetries in center embeddings and relative clauses.
- Correlates with grammar-based storage cost measures in annotated corpora.
- Predicts reading-time variance in large naturalistic datasets over and above baseline models with traditional information-based predictors.
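The "over and above baselines" claim is usually tested by comparing nested regression models. The sketch below is a simplified stand-in. It fits ordinary least squares on synthetic data and reports the gain in log-likelihood (ΔLL) from adding a storage-cost predictor to baseline predictors. Several details are assumptions: the predictor names, the effect sizes, and the use of plain OLS instead of mixed-effects models with spillover terms. The review does not describe the authors' exact regression setup.

```python
import numpy as np


def gaussian_ols_loglik(X, y):
    """Maximized Gaussian log-likelihood of an OLS fit of y on X (with intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = np.mean(resid ** 2)  # maximum-likelihood estimate of residual variance
    n = len(y)
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)


# Synthetic data; all names and coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 2000
surprisal = rng.gamma(2.0, 2.0, n)       # baseline information-based predictor
word_length = rng.integers(1, 12, n)     # baseline low-level predictor
storage = rng.normal(0.0, 1.0, n)        # stand-in for storage-cost values
rt = 250 + 8 * surprisal + 3 * word_length + 5 * storage + rng.normal(0, 30, n)

baseline = np.column_stack([surprisal, word_length])
full = np.column_stack([surprisal, word_length, storage])
delta_ll = gaussian_ols_loglik(full, rt) - gaussian_ols_loglik(baseline, rt)
print(f"delta LL = {delta_ll:.1f}")
```

In-sample, a nested OLS model can never lose log-likelihood. For that reason, published analyses typically judge ΔLL with a likelihood-ratio test or on held-out data rather than by its sign alone.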
Where Pith is reading between the lines
- Similar measures could be tested in languages other than English using available language models.
- Processing theories may shift toward probabilistic, model-based accounts of memory load.
- This cost could be combined with other cognitive measures for richer predictions of comprehension difficulty.
Load-bearing premise
Pre-trained neural language models' learned uncertainty distributions accurately proxy the uncertainty humans face in incremental sentence comprehension.
What would settle it
Finding a large reading-time dataset in which the new measure adds no predictive power beyond baselines, or finding that the measure fails to recover known syntactic processing effects.
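One concrete way to operationalize "adds no predictive power" is a permutation test. Shuffle the storage-cost column, refit, and count how often the shuffled ΔLL matches or beats the real one. This procedure is hypothetical; the review does not attribute it to the authors. The function `permutation_pvalue`, the OLS likelihood, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def ols_loglik(X, y):
    """Maximized Gaussian log-likelihood of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    return -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1.0)


def permutation_pvalue(baseline, extra, y, n_perm=200, seed=0):
    """Return (observed delta LL, permutation p-value) for adding `extra`.

    Hypothetical helper: the p-value is the share of shuffled copies of
    `extra` whose delta LL is at least the observed one (with +1 smoothing).
    """
    rng = np.random.default_rng(seed)
    base_ll = ols_loglik(baseline, y)
    observed = ols_loglik(np.column_stack([baseline, extra]), y) - base_ll
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(extra)  # breaks any link between extra and y
        null_delta = ols_loglik(np.column_stack([baseline, shuffled]), y) - base_ll
        hits += int(null_delta >= observed)
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic example (invented values): one real predictor, one pure-noise predictor.
rng = np.random.default_rng(1)
n = 500
surprisal = rng.gamma(2.0, 2.0, n)
storage = rng.normal(size=n)
noise_only = rng.normal(size=n)
rt = 250 + 8 * surprisal + 6 * storage + rng.normal(0, 20, n)

print(permutation_pvalue(surprisal, storage, rt))
print(permutation_pvalue(surprisal, noise_only, rt))
```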
Original abstract
Real-time sentence comprehension imposes a significant load on working memory, as comprehenders must maintain contextual information to anticipate future input. While measures of such load have played an important role in psycholinguistic theories, they have largely been formalized using symbolic grammars, which assign discrete, uniform costs to syntactic predictions. This study proposes a measure of processing storage cost based on an information-theoretic formalization, as the amount of information previous words carry about future context, under uncertainty. Unlike previous discrete, grammar-based metrics, this measure is continuous, probabilistic, theory-neutral, and can be estimated from pre-trained neural language models. The validity of this approach is demonstrated through three analyses in English: our measure (i) recovers well-known processing asymmetries in center embeddings and relative clauses, (ii) correlates with a grammar-based storage cost in a syntactically-annotated corpus, and (iii) predicts reading-time variance in two large-scale naturalistic datasets over and above baseline models with traditional information-based predictors. Our code is available at https://github.com/kohei-kaji/info-storage.
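Analysis (ii) compares a continuous measure with a grammar-based storage cost. If the grammar-based cost is an integer count, it will have many ties. The review does not say which correlation statistic was used. The sketch below assumes Spearman's rank correlation, with tied values assigned their average rank. The per-word values are invented.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Invented per-word values: a tied integer grammar-based count and a
# continuous LM-based cost for the same eight words.
grammar_cost = [1, 2, 2, 3, 1, 4, 3, 2]
info_cost = [0.4, 1.1, 0.9, 1.8, 0.3, 2.5, 1.6, 1.0]
print(round(spearman(grammar_cost, info_cost), 3))  # → 0.964
```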
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an information-theoretic storage cost for sentence comprehension, quantified as the amount of information prior words carry about future context under uncertainty. Unlike discrete, uniform costs from symbolic grammars, this measure is continuous and probabilistic; it is estimated from pre-trained neural language models and claimed to be theory-neutral. Validity is shown via three English analyses: recovery of known asymmetries in center embeddings and relative clauses, correlation with grammar-based storage costs in a syntactically annotated corpus, and improved prediction of reading times in two large naturalistic datasets beyond traditional information-based baselines. Code is provided for reproducibility.
Significance. If the results hold, the work supplies a continuous, probabilistic alternative to grammar-based load metrics, offering a potential bridge between information theory and models of working memory in incremental processing. The reported predictive gain over baselines in naturalistic data indicates practical utility, while the open code supports reproducibility and future extensions.
major comments (1)
- [Abstract (paragraph on estimation from models)] The claim that pre-trained LM uncertainty distributions provide a valid proxy for human incremental uncertainty is load-bearing for interpreting the measure as a genuine storage cost rather than a flexible contextual predictor. No direct calibration against human data (e.g., cloze norms or prediction signatures in eye-tracking) is described, leaving open whether corpus or architectural biases in the LMs drive the reported RT correlations instead of the proposed theory-neutral storage cost.
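The direct calibration the referee asks for could take several forms. In one hypothetical version, the sketch below normalizes human cloze completion counts into a distribution. It then measures the Jensen-Shannon divergence between that distribution and a language model's next-word distribution for the same context. The divergence is in bits and bounded in [0, 1]. The choice of divergence, the function names, and the counts and probabilities are assumptions; they come from neither the paper nor the report.

```python
import math


def normalize(counts):
    """Turn cloze completion counts into a probability distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two dicts word -> probability."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl_to_m(a):
        # a[w] > 0 guarantees m[w] > 0, so the ratio is always defined
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0.0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Invented data for one sentence frame: human cloze responses vs. LM probabilities.
cloze_counts = {"barked": 30, "ran": 15, "slept": 5}
lm_probs = {"barked": 0.55, "ran": 0.25, "growled": 0.15, "slept": 0.05}
print(round(js_divergence(normalize(cloze_counts), lm_probs), 3))
```

Averaging such divergences over many cloze items would give a rough calibration score for a given model. This aggregation step is likewise an assumption.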
minor comments (1)
- [Abstract] The summary of the three analyses would be strengthened by briefly noting the specific datasets, exclusion criteria, and statistical controls used to demonstrate added predictive power over baselines.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and the opportunity to clarify our approach. Below we respond point-by-point to the major comment concerning validation of the language-model estimates.
- Referee: Abstract (paragraph on estimation from models): The claim that pre-trained LM uncertainty distributions provide a valid proxy for human incremental uncertainty is load-bearing for interpreting the measure as a genuine storage cost rather than a flexible contextual predictor. No direct calibration against human data (e.g., cloze norms or prediction signatures in eye-tracking) is described, leaving open whether corpus or architectural biases in the LMs drive the reported RT correlations instead of the proposed theory-neutral storage cost.
Authors: We acknowledge that the manuscript does not include direct calibration of LM uncertainty against cloze norms or eye-tracking signatures of prediction. Our validations remain indirect yet tied to human data: the measure recovers established processing asymmetries from controlled experiments, correlates with grammar-derived storage costs, and accounts for additional variance in naturalistic reading times. These outcomes indicate that the quantity captures processing-relevant information rather than generic LM prediction alone. We agree that potential corpus or architectural biases merit explicit discussion. In revision we will expand the abstract and discussion to state the modeling assumptions more clearly and to outline future direct comparisons with human uncertainty measures. revision: partial
Circularity Check
No significant circularity; derivation relies on external pre-trained models and independent validation data
full rationale
The information-theoretic storage cost is computed from uncertainty distributions of external pre-trained neural language models rather than any parameter fitted inside the paper. Validation proceeds by comparing the measure to independent reading-time corpora and grammar-based costs, with no equations or steps that reduce the claimed predictions to the inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and the central result is not a renaming or ansatz smuggled from prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] Pre-trained neural language models capture relevant probabilistic information about language similar to humans.
Forward citations
Cited by 1 Pith paper
- Syntactically-guided Information Maintenance in Sentence Comprehension
Syntactic structure guides selective maintenance via distinct costs from predicted heads and incomplete dependencies, supported by Japanese reading time data showing they are irreducible and interact with predictability.