The Truth Lies Somewhere in the Middle (of the Generated Tokens)

Brian Cheung; Phillip Isola; Sophie L. Wang

arxiv: 2605.09969 · v1 · submitted 2026-05-11 · 💻 cs.LG · cs.CL

Pith reviewed 2026-05-12 02:40 UTC · model grok-4.3

keywords: mean pooling, hidden states, autoregressive models, semantic representations, kernel alignment, language models, token representations, model internals

The pith

Mean pooling hidden states from generated tokens produces better semantic representations than any single token.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to collapse autoregressively generated hidden states into a single representation of a language model's internal state. It discovers that mean pooling across these states creates representations with stronger semantic content than relying on any individual token. This finding is measured by how well the pooled states align with established reference spaces in language, vision, and protein domains. The result suggests that semantic information is distributed throughout the generated sequence rather than confined to one position, and generated tokens yield superior representations compared to prompt tokens.

Core claim

Despite tokens being generated under causal masking, mean pooling across their hidden states yields more semantic representations than any individual token alone, as quantified through kernel alignment to reference spaces in language, vision, and protein domains. The improvement is consistent with information being distributed across generated tokens rather than localized to a single position. Representations derived from generated tokens outperform those from prompt tokens, and alignment across generation reveals interpretable dynamics in model behavior.

What carries the argument

Mean pooling of hidden states across autoregressively generated tokens, evaluated using kernel alignment to reference semantic spaces.
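
To make the pooling options concrete, here is a minimal NumPy sketch of the three ways of collapsing one sequence's hidden states that the paper compares. The function name `pool_hidden_states`, the `(seq_len, d)` array layout, and the `prompt_len` argument are assumptions made for illustration; the abstract does not say which layer or model the authors extract states from.

```python
import numpy as np

def pool_hidden_states(hidden, prompt_len):
    """Collapse per-token hidden states into single vectors.

    hidden: (seq_len, d) array, prompt tokens first, then generated tokens.
    prompt_len: number of prompt tokens (hypothetical argument for this sketch).
    """
    hidden = np.asarray(hidden, dtype=float)
    if not 0 < prompt_len < hidden.shape[0]:
        raise ValueError("need at least one prompt token and one generated token")
    prompt = hidden[:prompt_len]
    generated = hidden[prompt_len:]
    return {
        # Average over every generated position (the strategy the paper favors).
        "mean_generated": generated.mean(axis=0),
        # Single-token baseline: the final generated position only.
        "last_generated": generated[-1],
        # Prompt-only baseline: average over the prompt positions.
        "mean_prompt": prompt.mean(axis=0),
    }

# Toy example: 2 prompt tokens, 4 generated tokens, hidden size 2.
toy = np.arange(12, dtype=float).reshape(6, 2)
pooled = pool_hidden_states(toy, prompt_len=2)
print(pooled["mean_generated"])
```

Each returned vector has the same dimensionality, so any of them can be fed to the same alignment score for a like-for-like comparison.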

If this is right

Mean pooling improves semantic quality consistently across multiple domains.
Generated token representations are better than prompt-only ones.
Model behavior shows interpretable dynamics when alignment is tracked over the course of generation.
Semantic information spreads across the token sequence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could mean that common practices like using the last token's hidden state are not optimal for capturing full semantics.
Applications relying on model embeddings, such as similarity search, might benefit from mean pooling generated sequences.
The distribution of information might apply to other autoregressive models like those in vision or audio.

Load-bearing premise

That kernel alignment to reference spaces accurately reflects the semantic quality of the collapsed representations.
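
For readers unfamiliar with kernel alignment, the sketch below implements linear centered kernel alignment (CKA), one common score of this kind. It is an assumption that this variant resembles what the paper uses; the abstract says only "kernel alignment" and does not name the kernel. The function name `linear_cka` is illustrative.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n inputs.

    X: (n, d1) array, Y: (n, d2) array; row i of each describes input i.
    Returns a score in [0, 1]; 1 means the two spaces share the same
    geometry up to rotation and isotropic scaling.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must describe the same number of inputs")
    # Center each feature so the implied Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    if norm_x == 0 or norm_y == 0:
        raise ValueError("a representation is constant across inputs")
    return cross / (norm_x * norm_y)
```

The score depends only on how the n inputs are arranged relative to one another in each space. That is why a high value shows geometric agreement with the reference space but does not by itself establish semantic quality, which is the premise flagged above.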

What would settle it

Finding a task or benchmark where single-token hidden states outperform mean-pooled ones in semantic similarity or downstream performance would challenge the claim.
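
A check along these lines could be set up as follows. This is a sketch under stated assumptions, not the authors' protocol: it uses mutual k-nearest-neighbor overlap as a stand-in alignment score, assumes every input has the same number T of generated tokens, and all function names are hypothetical.

```python
import numpy as np

def knn_indices(Z, k):
    """Indices of the k nearest neighbors (Euclidean) of each row of Z."""
    Z = np.ascontiguousarray(Z, dtype=float)
    if not 0 < k < Z.shape[0]:
        raise ValueError("k must be between 1 and n - 1")
    sq = (Z ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def mutual_knn_alignment(A, B, k=5):
    """Mean fraction of shared k-nearest neighbors between spaces A and B."""
    nn_a, nn_b = knn_indices(A, k), knn_indices(B, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))

def compare_pooling(hidden, reference, k=5):
    """Score mean pooling against every single generated position.

    hidden: (n, T, d) generated-token states for n inputs.
    reference: (n, d_ref) reference embeddings of the same n inputs.
    Returns (pooled_score, list_of_per_position_scores).
    """
    hidden = np.asarray(hidden, dtype=float)
    per_position = [
        mutual_knn_alignment(hidden[:, t], reference, k)
        for t in range(hidden.shape[1])
    ]
    pooled = mutual_knn_alignment(hidden.mean(axis=1), reference, k)
    return pooled, per_position
```

If any entry of the per-position list reliably exceeded the pooled score on a benchmark, that would be the kind of counterexample described above. The same comparison could be repeated with downstream task accuracy in place of the alignment score.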

Original abstract

How should hidden states generated autoregressively be collapsed into a representation that reflects a language model's internal state? Despite tokens being generated under causal masking, we find that mean pooling across their hidden states yields more semantic representations than any individual token alone. We quantify this through kernel alignment to reference spaces in language, vision, and protein domains. The improvement through mean pooling is consistent with information being distributed across generated tokens rather than localized to a single position. Furthermore, representations derived from generated tokens outperform those from prompt tokens, and alignment across generation reveals interpretable dynamics in model behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The abstract finds mean pooling of generated-token hidden states beats single tokens and prompt tokens on kernel alignment across three domains, but provides no validation that the metric tracks actual semantic quality.

The main observation here is that averaging hidden states from autoregressively generated tokens produces representations that align better with reference spaces than any one token or the prompt tokens, and this holds in language, vision, and protein domains. The authors interpret it as evidence that information is spread across the generated sequence rather than concentrated at one position. Generated tokens also outperform prompt tokens, with some notes on alignment dynamics during generation. That's the punchline from the abstract alone.

What stands out as new is the targeted side-by-side of mean pooling on generated versus prompt tokens plus the cross-domain consistency check. Pooling techniques are not novel, but this specific empirical angle on where to extract representations from LLMs could matter for downstream work in NLP or scientific modeling. The paper does a reasonable job flagging a practical pattern that matches how these models actually generate.

The soft spots are clear and not minor. Only the abstract is available, so there are no datasets, implementation details, statistical tests, or controls to evaluate. More critically, the quantification rests entirely on kernel alignment to chosen reference spaces without any shown link to downstream task performance or other semantic checks. The stress-test concern lands: if higher alignment does not reliably mean better semantic content, the claim that mean pooling yields superior representations does not follow. No evidence addresses that gap.

This is aimed at researchers who extract fixed representations from LLMs for transfer or analysis, especially in multimodal or domain-specific settings. A reader looking for quick empirical guidance on token pooling might find the pattern worth testing, but the current write-up is too thin to stand on its own.

It deserves a serious referee if the full paper supplies the missing experiments, controls, and metric validation, because the underlying question about collapsing autoregressive states is relevant. Based on the abstract, though, it reads as preliminary.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that mean pooling across hidden states generated autoregressively by language models yields more semantic representations than any individual token. This is quantified using kernel alignment to reference spaces in language, vision, and protein domains, suggesting that information is distributed across generated tokens. Additionally, representations from generated tokens outperform those from prompt tokens, and alignment across generation shows interpretable model dynamics.

Significance. If the empirical findings hold, the result could influence practices in extracting semantic representations from LLMs by favoring mean pooling over single-token or prompt-based approaches. It provides evidence for distributed information in autoregressive generation, which may have broader implications for model interpretability and representation learning across modalities.

major comments (2)

[Abstract] The central quantification relies on kernel alignment to reference spaces serving as a measure of semantic quality, but the abstract provides no justification, validation against alternative metrics (such as downstream task accuracy), or evidence that the chosen reference spaces capture the relevant semantics. This assumption is load-bearing for concluding that mean pooling produces 'more semantic' representations.
[Abstract] No datasets, implementation details, statistical tests, controls, or experimental setup are described, preventing any assessment of whether the reported improvements in kernel alignment support the claims of consistency across domains or superiority of generated token representations over prompt tokens.

minor comments (1)

The title is intriguing but the abstract does not elaborate on what 'the middle' specifically refers to in terms of token positions or pooling strategy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below, focusing on the abstract's role as a concise summary while committing to revisions that strengthen the presentation without altering the core claims.

Referee: [Abstract] The central quantification relies on kernel alignment to reference spaces serving as a measure of semantic quality, but the abstract provides no justification, validation against alternative metrics (such as downstream task accuracy), or evidence that the chosen reference spaces capture the relevant semantics. This assumption is load-bearing for concluding that mean pooling produces 'more semantic' representations.

Authors: We agree that the abstract's brevity precludes detailed justification. The full manuscript cites prior work validating centered kernel alignment (CKA) as a standard metric for representation similarity and motivates the reference spaces as established semantic embeddings in language, vision, and protein domains. We acknowledge that direct validation against downstream task accuracy would provide stronger support and will incorporate such comparisons in the revised manuscript.

revision: yes

Referee: [Abstract] No datasets, implementation details, statistical tests, controls, or experimental setup are described, preventing any assessment of whether the reported improvements in kernel alignment support the claims of consistency across domains or superiority of generated token representations over prompt tokens.

Authors: The abstract is designed as a high-level overview and omits these elements to remain concise. The full manuscript provides the datasets across domains, model implementation details, statistical tests for alignment improvements, control experiments, and experimental protocols that underpin the claims of cross-domain consistency and the advantage of generated-token representations. We will not revise the abstract substantially but will verify that the body of the paper makes these elements fully accessible.

revision: no

Circularity Check

0 steps flagged

No circularity: empirical proxy via kernel alignment is independent of inputs

The abstract reports an empirical observation that mean pooling of autoregressively generated hidden states produces higher kernel alignment to external reference spaces (language, vision, protein) than single tokens or prompt tokens. No equations, fitted parameters, derivations, or self-citations are present. The central quantification uses an external metric (kernel alignment) applied to held-out reference spaces, so the reported improvement is not equivalent to the inputs by construction and remains falsifiable. This matches the default case of a self-contained empirical comparison with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated premise that kernel alignment is a valid semantic proxy.

pith-pipeline@v0.9.0 · 5357 in / 1032 out tokens · 62311 ms · 2026-05-12T02:40:29.165933+00:00 · methodology
