Information-Theoretic Storage Cost in Sentence Comprehension

Ethan Gotlieb Wilcox; Kohei Kajikawa; Shinnosuke Isono

arxiv: 2602.18217 · v2 · pith:AAJW4MIWnew · submitted 2026-02-20 · 💻 cs.CL

Kohei Kajikawa, Shinnosuke Isono, Ethan Gotlieb Wilcox

Pith reviewed 2026-05-21 12:37 UTC · model grok-4.3

classification 💻 cs.CL

keywords storage cost · sentence comprehension · information theory · reading times · neural language models · psycholinguistics · working memory

The pith

An information-theoretic storage cost estimated from neural language models predicts additional variance in human reading times.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a measure of storage cost in sentence processing, defined as the information previous words carry about the future context. The measure is computed from the uncertainty of pre-trained neural language models, which makes it continuous and independent of any specific grammar. The approach captures known difficulties in structures such as center embeddings. It also correlates with traditional grammar-based costs and improves predictions of reading times in naturalistic data over standard information-based predictors.

Core claim

Storage cost is formalized as the amount of information that prior context provides about upcoming input under uncertainty. When estimated from neural language models, this quantity aligns with established processing asymmetries and accounts for reading-time data beyond conventional predictors.

What carries the argument

Information-theoretic storage cost: the expected information previous words carry about future context, under uncertainty, as estimated by neural language models.
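To make "the information previous words carry about future context" concrete, the following is a minimal sketch that treats that quantity as the mutual information between an earlier word W and a future continuation F. This reading, the toy joint distributions, and the names `mutual_information`, `informative`, and `uninformative` are assumptions made for illustration; the sketch is not the paper's estimator, which works from neural language model probabilities.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """I(W; F) in bits from a joint distribution {(w, f): probability}."""
    p_w, p_f = {}, {}
    for (w, f), p in joint.items():
        p_w[w] = p_w.get(w, 0.0) + p   # marginal over earlier words
        p_f[f] = p_f.get(f, 0.0) + p   # marginal over future continuations
    return sum(
        p * math.log2(p / (p_w[w] * p_f[f]))
        for (w, f), p in joint.items() if p > 0
    )

# Hypothetical toy distributions (not taken from the paper).
# In the first, knowing the earlier word sharply narrows the future;
# in the second, the earlier word and the future are independent.
informative = {("w1", "f1"): 0.45, ("w1", "f2"): 0.05,
               ("w2", "f1"): 0.05, ("w2", "f2"): 0.45}
uninformative = {("w1", "f1"): 0.25, ("w1", "f2"): 0.25,
                 ("w2", "f1"): 0.25, ("w2", "f2"): 0.25}

mi_informative = mutual_information(informative)
mi_uninformative = mutual_information(uninformative)
print(f"informative context:   {mi_informative:.3f} bits")
print(f"uninformative context: {mi_uninformative:.3f} bits")
```

Under this reading, an earlier word is worth keeping in memory only to the extent that it reduces uncertainty about what comes next, so the independent case yields zero bits.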

If this is right

Recovers processing asymmetries in center embeddings and relative clauses.
Correlates with grammar-based storage cost measures in annotated corpora.
Predicts reading-time variance in large naturalistic datasets over and above traditional information-based models.
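The phrase "over and above" in the last point describes a nested model comparison: a baseline regression is compared with the same regression plus the new predictor. The sketch below illustrates that logic with ordinary least squares on synthetic data. The data, the effect sizes, and the helper names `ols_rss` and `gaussian_loglik` are all hypothetical, and the plain in-sample OLS fit is a stand-in, not the statistical procedure used in the paper.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit of y on the columns of X."""
    n, k = len(X), len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
               for r in range(n))

def gaussian_loglik(rss, n):
    """Maximized Gaussian log-likelihood of a linear model with the given RSS."""
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

# Synthetic data (assumption): reading time depends on a baseline predictor
# and, more weakly, on a storage-cost predictor, plus noise.
random.seed(0)
n = 500
baseline_pred = [random.gauss(0, 1) for _ in range(n)]
storage = [random.gauss(0, 1) for _ in range(n)]
rt = [300 + 20 * b + 10 * s + random.gauss(0, 30)
      for b, s in zip(baseline_pred, storage)]

X_base = [[1.0, b] for b in baseline_pred]                      # intercept + baseline
X_full = [[1.0, b, s] for b, s in zip(baseline_pred, storage)]  # + storage cost

rss_base, rss_full = ols_rss(X_base, rt), ols_rss(X_full, rt)
delta_loglik = gaussian_loglik(rss_full, n) - gaussian_loglik(rss_base, n)
print(f"delta log-likelihood from adding storage cost: {delta_loglik:.2f}")
```

A positive difference in log-likelihood means the added predictor explains variance that the baseline does not. An in-sample gain is guaranteed to be non-negative for nested least-squares models, so in practice a comparison of this kind is only informative when paired with a significance test or held-out evaluation.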

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar measures could be tested in languages other than English using available language models.
Processing theories may shift toward probabilistic, model-based accounts of memory load.
This cost could be combined with other cognitive measures for richer predictions of comprehension difficulty.

Load-bearing premise

Pre-trained neural language models' learned uncertainty distributions accurately proxy the uncertainty humans face in incremental sentence comprehension.

What would settle it

Finding a large reading-time dataset where the new measure adds no predictive power beyond baselines, or where it fails to recover known syntactic processing effects.

Original abstract

Real-time sentence comprehension imposes a significant load on working memory, as comprehenders must maintain contextual information to anticipate future input. While measures of such load have played an important role in psycholinguistic theories, they have largely been formalized using symbolic grammars, which assign discrete, uniform costs to syntactic predictions. This study proposes a measure of processing storage cost based on an information-theoretic formalization, as the amount of information previous words carry about future context, under uncertainty. Unlike previous discrete, grammar-based metrics, this measure is continuous, probabilistic, theory-neutral, and can be estimated from pre-trained neural language models. The validity of this approach is demonstrated through three analyses in English: our measure (i) recovers well-known processing asymmetries in center embeddings and relative clauses, (ii) correlates with a grammar-based storage cost in a syntactically-annotated corpus, and (iii) predicts reading-time variance in two large-scale naturalistic datasets over and above baseline models with traditional information-based predictors. Our code is available at https://github.com/kohei-kaji/info-storage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes an information-theoretic storage cost for sentence comprehension, quantified as the amount of information prior words carry about future context under uncertainty. Unlike discrete, uniform costs from symbolic grammars, this measure is continuous and probabilistic; it is estimated from pre-trained neural language models and claimed to be theory-neutral. Validity is shown via three English analyses: recovery of known asymmetries in center embeddings and relative clauses, correlation with grammar-based storage costs in a syntactically annotated corpus, and improved prediction of reading times in two large naturalistic datasets beyond traditional information-based baselines. Code is provided for reproducibility.

Significance. If the results hold, the work supplies a continuous, probabilistic alternative to grammar-based load metrics, offering a potential bridge between information theory and models of working memory in incremental processing. The reported predictive gain over baselines in naturalistic data indicates practical utility, while the open code supports reproducibility and future extensions.

major comments (1)

[Abstract, paragraph on estimation from models] The claim that pre-trained LM uncertainty distributions provide a valid proxy for human incremental uncertainty is load-bearing for interpreting the measure as a genuine storage cost rather than a flexible contextual predictor. No direct calibration against human data (e.g., cloze norms or prediction signatures in eye-tracking) is described, leaving open whether corpus or architectural biases in the LMs drive the reported RT correlations instead of the proposed theory-neutral storage cost.

minor comments (1)

[Abstract] The summary of the three analyses would be strengthened by briefly noting the specific datasets, exclusion criteria, and statistical controls used to demonstrate added predictive power over baselines.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and the opportunity to clarify our approach. Below we respond point-by-point to the major comment concerning validation of the language-model estimates.

Referee: Abstract (paragraph on estimation from models): The claim that pre-trained LM uncertainty distributions provide a valid proxy for human incremental uncertainty is load-bearing for interpreting the measure as a genuine storage cost rather than a flexible contextual predictor. No direct calibration against human data (e.g., cloze norms or prediction signatures in eye-tracking) is described, leaving open whether corpus or architectural biases in the LMs drive the reported RT correlations instead of the proposed theory-neutral storage cost.

Authors: We acknowledge that the manuscript does not include direct calibration of LM uncertainty against cloze norms or eye-tracking signatures of prediction. Our validations remain indirect yet tied to human data: the measure recovers established processing asymmetries from controlled experiments, correlates with grammar-derived storage costs, and accounts for additional variance in naturalistic reading times. These outcomes indicate that the quantity captures processing-relevant information rather than generic LM prediction alone. We agree that potential corpus or architectural biases merit explicit discussion. In revision we will expand the abstract and discussion to state the modeling assumptions more clearly and to outline future direct comparisons with human uncertainty measures.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external pre-trained models and independent validation data

The information-theoretic storage cost is computed from uncertainty distributions of external pre-trained neural language models rather than any parameter fitted inside the paper. Validation proceeds by comparing the measure to independent reading-time corpora and grammar-based costs, with no equations or steps that reduce the claimed predictions to the inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and the central result is not a renaming or ansatz smuggled from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that neural language models encode human-like predictive uncertainty; no free parameters or invented entities are introduced in the abstract description.

axioms (1)

Domain assumption: Pre-trained neural language models capture relevant probabilistic information about language similar to humans.
Invoked when estimating the storage cost from model probabilities.

pith-pipeline@v0.9.0 · 5714 in / 1232 out tokens · 52252 ms · 2026-05-21T12:37:32.009263+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Syntactically-guided Information Maintenance in Sentence Comprehension
cs.CL · 2026-04 · unverdicted · novelty 6.0

Syntactic structure guides selective maintenance via distinct costs from predicted heads and incomplete dependencies, supported by Japanese reading time data showing they are irreducible and interact with predictability.