Recognition: 2 theorem links
Meaning Abstraction Induces the Brain Alignment of Language and Speech Models
Pith reviewed 2026-05-16 07:35 UTC · model grok-4.3
The pith
Brain alignment arises from meaning abstraction in middle layers of language and speech models, not from next-word prediction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Intermediate hidden states from language and speech models predict measured brain responses because models construct higher-order linguistic features in their middle layers. This construction is cued by a peak in layerwise intrinsic dimension. A layer's intrinsic dimension strongly predicts how well it explains fMRI and ECoG signals; the relation between intrinsic dimension and brain predictivity emerges over pre-training; and fine-tuning models to better predict the brain causally increases both representations' intrinsic dimension and their semantic content. Semantic richness, high intrinsic dimension, and brain predictivity therefore mirror one another.
What carries the argument
Layerwise intrinsic dimension, which measures the complexity of features extracted at each layer and peaks in the middle layers where higher-order linguistic abstractions are formed.
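A minimal sketch of such a layerwise sweep, assuming a GPT-2-style model from Hugging Face transformers as a stand-in and the TwoNN estimator named later in the rebuttal (Facco et al., 2017); the paper's actual models, corpus, and estimator settings are not reproduced here.

```python
# Sketch: layerwise intrinsic dimension (ID) of hidden states.
# gpt2 and the toy sentence are stand-ins; a real analysis needs a
# large corpus so the nearest-neighbor statistics are meaningful.
import numpy as np
import torch
from sklearn.neighbors import NearestNeighbors
from transformers import AutoModel, AutoTokenizer

def two_nn_id(X: np.ndarray) -> float:
    """TwoNN MLE (Facco et al., 2017): the ratio mu = r2/r1 of each
    point's two nearest-neighbor distances is Pareto-distributed
    with exponent equal to the intrinsic dimension."""
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dist[:, 2] / np.maximum(dist[:, 1], 1e-12)  # column 0 is the point itself
    mu = mu[mu > 1.0]                                # drop degenerate ties
    return len(mu) / float(np.sum(np.log(mu)))

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()
enc = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).hidden_states  # (n_layers + 1) tensors of [B, T, D]

ids = [two_nn_id(h.reshape(-1, h.shape[-1]).numpy()) for h in hidden]
print("ID peaks at layer", int(np.argmax(ids)))
```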
If this is right
- Layers with higher intrinsic dimension explain more variance in brain signals from fMRI and ECoG recordings (see the sketch after this list).
- The relationship between intrinsic dimension and brain predictivity strengthens as models undergo pre-training.
- Fine-tuning a model to improve brain prediction increases both its intrinsic dimension and the semantic richness of its representations.
- Next-word prediction alone is insufficient to produce strong brain alignment without the accompanying development of high-dimensional semantic features.
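Together these predictions imply a concrete analysis loop: ridge-regress each layer's features onto voxel responses, score by held-out correlation, then correlate the layerwise scores with intrinsic dimension. A minimal sketch with random arrays standing in for features, fMRI data, and ID values (none of it the paper's data or code):

```python
# Sketch: does layerwise ID track layerwise brain predictivity?
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def brain_predictivity(X, Y, n_splits=5):
    """Mean held-out voxelwise correlation for one layer's features."""
    scores = []
    for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(X):
        model = RidgeCV(alphas=np.logspace(-2, 6, 9)).fit(X[tr], Y[tr])
        pred = model.predict(X[te])
        r = [pearsonr(pred[:, v], Y[te][:, v])[0] for v in range(Y.shape[1])]
        scores.append(np.nanmean(r))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 50))                       # stand-in voxel responses
layer_feats = [rng.standard_normal((500, 768)) for _ in range(12)]
layer_ids = rng.uniform(10, 40, size=12)                 # stand-in ID values

pred = [brain_predictivity(X, Y) for X in layer_feats]
r, p = pearsonr(layer_ids, pred)
print(f"ID vs. brain predictivity across layers: r={r:.2f}, p={p:.3f}")
```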
Where Pith is reading between the lines
- Models trained on other sufficiently complex tasks that force abstraction could produce similar brain alignment without language modeling objectives.
- Architectures that explicitly encourage high intrinsic dimension in middle layers might achieve better brain alignment with less data.
- The same intrinsic-dimension signature could be used to identify which layers to extract for downstream applications that aim to mimic human semantic processing.
Load-bearing premise
That peaks in intrinsic dimension directly index the semantic abstraction relevant to brain processing, and that the observed correlations reflect a causal driver rather than a side effect of model scale or training data.
What would settle it
An experiment that holds semantics fixed while altering intrinsic dimension across layers and checks whether brain predictivity still tracks the dimension measure.
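One hypothetical way to build half of that control is sketched below: perturb a layer's activations so that estimated intrinsic dimension rises while the original coordinates, and hence any linear semantic readout on them, survive untouched. The function name and parameters are illustrative, not from the paper.

```python
# Sketch: an ID-raising, semantics-preserving perturbation.
# Independent noise coordinates inflate nearest-neighbor ID
# estimates, while a probe restricted to the original features
# is unaffected. Illustrative only.
import numpy as np

def lift_id_keep_semantics(X: np.ndarray, n_noise: int = 64,
                           scale: float = 0.1, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    noise = scale * rng.standard_normal((X.shape[0], n_noise))
    return np.concatenate([X, noise], axis=1)  # X survives in the first block
```

If brain predictivity tracked the dimension measure per se, the lifted features should explain brain signals better; if it tracks semantics, they should not.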
Original abstract
Research has repeatedly demonstrated that intermediate hidden states extracted from large language models and speech audio models predict measured brain response to natural language stimuli. Yet, very little is known about the representation properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most effective for this unique and highly general transfer task? We give evidence that the correspondence between speech and language models and the brain derives from shared meaning abstraction and not their next-word prediction properties. In particular, models construct higher-order linguistic features in their middle layers, cued by a peak in the layerwise intrinsic dimension, a measure of feature complexity. We show that a layer's intrinsic dimension strongly predicts how well it explains fMRI and ECoG signals; that the relation between intrinsic dimension and brain predictivity arises over model pre-training; and finetuning models to better predict the brain causally increases both representations' intrinsic dimension and their semantic content. Results suggest that semantic richness, high intrinsic dimension, and brain predictivity mirror each other, and that the key driver of model-brain similarity is rich meaning abstraction of the inputs, where language modeling is a task sufficiently complex (but perhaps not the only) to require it.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that alignment between intermediate layers of language and speech models with brain responses (fMRI and ECoG) arises from shared meaning abstraction rather than next-word prediction. This is cued by a peak in layerwise intrinsic dimension (ID) as a measure of feature complexity; ID strongly predicts brain predictivity, the ID-brain relation emerges during pre-training, and brain-supervised fine-tuning jointly increases ID and semantic content.
Significance. If the central results hold, the work supplies a mechanistic account linking representational complexity (via ID) to semantic abstraction and brain predictivity. It offers a potential explanation for why middle layers outperform output layers and could guide construction of more brain-aligned models. The joint observation that ID, semantic richness, and brain alignment co-vary is a useful empirical contribution.
major comments (3)
- [Results on pre-training and fine-tuning] The inference that ID indexes semantic abstraction as the causal driver of brain alignment, independent of next-token prediction, rests on correlational evidence alone. All examined models remain next-token predictors; no control representations that achieve high ID without semantic structure (or low ID with semantic structure) are tested. This separation is load-bearing for the central claim.
- [Methods (intrinsic dimension estimation)] The definition and computation of layerwise intrinsic dimension must be shown to be independent of the same representational properties used to predict brain signals. Without an explicit statement that ID estimation uses only geometric properties of the activation manifold and not the semantic labels or brain targets, the risk of circularity remains unaddressed.
- [Fine-tuning experiments] Brain-supervised fine-tuning is reported to increase both ID and semantic content, yet the design does not include a matched control condition (e.g., fine-tuning on a non-semantic task that also raises ID) to test whether the ID increase is sufficient for improved brain predictivity.
minor comments (2)
- [Abstract and Methods] The abstract refers to 'a peak in the layerwise intrinsic dimension' without stating the estimator, neighborhood size, or dimensionality-reduction step used; these parameters should appear in the main text before the first results figure.
- [Figures and Tables] Figure legends and table captions should explicitly report the number of subjects, stimuli, and cross-validation folds for all brain-prediction correlations.
Simulated Author's Rebuttal
We thank the referee for the constructive comments that help clarify the scope of our claims. We address each major point below with clarifications from the manuscript and propose targeted revisions to improve transparency without overstating the evidence.
Point-by-point responses
- Referee: [Results on pre-training and fine-tuning] The inference that ID indexes semantic abstraction as the causal driver of brain alignment, independent of next-token prediction, rests on correlational evidence alone. All examined models remain next-token predictors; no control representations that achieve high ID without semantic structure (or low ID with semantic structure) are tested. This separation is load-bearing for the central claim.
Authors: We agree that the evidence remains correlational and that all models use next-token prediction objectives, so we cannot fully isolate ID from semantics with the current experiments. The pre-training results show the ID-brain relation emerging as semantic features develop, and the fine-tuning results demonstrate joint increases in ID, semantic content, and alignment. In the revised manuscript we will add an explicit limitations paragraph acknowledging the absence of control representations (e.g., high-ID non-semantic embeddings) and will suggest such experiments as future work. We cannot introduce new control models in this revision. (revision: partial)
- Referee: [Methods (intrinsic dimension estimation)] The definition and computation of layerwise intrinsic dimension must be shown to be independent of the same representational properties used to predict brain signals. Without an explicit statement that ID estimation uses only geometric properties of the activation manifold and not the semantic labels or brain targets, the risk of circularity remains unaddressed.
Authors: Intrinsic dimension is estimated via the TwoNN method applied solely to the raw activation vectors of each layer; the estimator uses only Euclidean nearest-neighbor distances within the activation manifold and incorporates no semantic labels, brain targets, or supervised signals. We will insert a new subsection in Methods that states this independence explicitly, reproduces the estimator formula, and confirms that brain predictivity analyses are performed after ID computation on held-out data. (revision: yes)
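For reference, the standard TwoNN estimator that this description matches (Facco et al., 2017), with notation assumed here rather than taken from the manuscript:

```latex
% TwoNN (Facco et al., 2017). For each activation x_i, let
% r_{i,1} <= r_{i,2} be its two nearest-neighbor distances and
% mu_i = r_{i,2} / r_{i,1}. On a manifold of dimension d,
%   P(mu_i <= mu) = 1 - mu^{-d},
% which gives the maximum-likelihood estimate over N points:
\[
  \hat{d} \;=\; \frac{N}{\sum_{i=1}^{N} \ln \mu_i}.
\]
```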
- Referee: [Fine-tuning experiments] Brain-supervised fine-tuning is reported to increase both ID and semantic content, yet the design does not include a matched control condition (e.g., fine-tuning on a non-semantic task that also raises ID) to test whether the ID increase is sufficient for improved brain predictivity.
Authors: We concur that a matched non-semantic fine-tuning control would strengthen causal claims about ID sufficiency. Our current results show that brain-supervised fine-tuning simultaneously elevates ID, semantic richness, and alignment, consistent with the abstraction hypothesis. In revision we will expand the fine-tuning analysis with trajectory plots and add a discussion paragraph noting the missing control condition as a limitation while proposing it for future study. New control experiments cannot be added within the scope of this revision. (revision: partial)
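A sketch of what brain-supervised fine-tuning plausibly looks like, assuming a pooled linear readout from one hidden layer trained onto voxel responses by gradient descent; the layer choice, pooling, and loss here are guesses, not the paper's recipe.

```python
# Sketch: fine-tune a stand-in LM so a linear readout from one
# hidden layer predicts voxel responses; track the layer's ID and
# held-out brain correlation per epoch to get trajectory plots.
import torch
import torch.nn as nn
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
readout = nn.Linear(model.config.hidden_size, 1000)  # 1000 stand-in voxels
opt = torch.optim.AdamW(
    list(model.parameters()) + list(readout.parameters()), lr=1e-5
)

def brain_step(input_ids: torch.Tensor, voxels: torch.Tensor, layer: int = 6):
    """One update: mean-pool tokens at `layer`, regress onto voxels."""
    h = model(input_ids=input_ids).hidden_states[layer]   # [B, T, D]
    loss = nn.functional.mse_loss(readout(h.mean(dim=1)), voxels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```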
Circularity Check
No significant circularity detected in the derivation chain
full rationale
The paper computes layerwise intrinsic dimension directly from the geometry of hidden-state representations and separately measures brain predictivity via linear regression on the same states; the reported correlation, its emergence during pre-training, and the joint increase under brain-supervised fine-tuning are empirical observations rather than definitional reductions. No equation equates ID to brain predictivity by construction, no parameter is fitted on brain data and then relabeled as an independent prediction, and no load-bearing uniqueness claim rests solely on self-citation. The argument that abstraction (indexed by ID) rather than next-token prediction drives alignment is supported by the timing and intervention results, even if those results leave room for alternative interpretations; such evidential gaps do not constitute circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] Intrinsic dimension of hidden states measures the complexity of higher-order linguistic features relevant to brain responses.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
"models construct higher-order linguistic features in their middle layers, cued by a peak in the layerwise intrinsic dimension... semantic richness, high intrinsic dimension, and brain predictivity mirror each other"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · echoes
"the relation between intrinsic dimension and brain predictivity arises over model pre-training... finetuning models to better predict the brain causally increases both representations' intrinsic dimension and their semantic content"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- echoes: The paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.