The Truth Lies Somewhere in the Middle (of the Generated Tokens)
Pith reviewed 2026-05-12 02:40 UTC · model grok-4.3
The pith
Mean pooling hidden states from generated tokens produces better semantic representations than any single token.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Despite tokens being generated under causal masking, mean pooling across their hidden states yields more semantic representations than any individual token alone, as quantified through kernel alignment to reference spaces in language, vision, and protein domains. The improvement is consistent with information being distributed across generated tokens rather than localized to a single position. Representations derived from generated tokens outperform those from prompt tokens, and alignment across generation reveals interpretable dynamics in model behavior.
What carries the argument
Mean pooling of hidden states across autoregressively generated tokens, evaluated using kernel alignment to reference semantic spaces.
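To make that mechanism concrete, here is a minimal NumPy sketch of the two pieces named above: collapsing per-token hidden states into one vector per example, and scoring the result against a reference space. It rests on several assumptions. Linear CKA stands in for "kernel alignment" as one common choice, and the metric used in the paper may differ. The function names, array shapes, and the `gen_mask` convention (1 for generated tokens, 0 otherwise) are hypothetical and not taken from the paper.

```python
import numpy as np

def mean_pool(hidden, gen_mask):
    """Average hidden states over generated tokens only.

    hidden:   (n_examples, n_tokens, dim) array of hidden states
    gen_mask: (n_examples, n_tokens) array, 1 for generated tokens (assumed convention)
    """
    mask = gen_mask[..., None].astype(hidden.dtype)
    # Clip the token count so an all-zero mask does not divide by zero.
    return (hidden * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1, None)

def last_token(hidden, gen_mask):
    """Single-token baseline: hidden state of the last generated token."""
    n_tokens = gen_mask.shape[1]
    # argmax on the reversed mask finds the last position marked 1.
    idx = n_tokens - 1 - np.argmax(gen_mask[:, ::-1], axis=1)
    return hidden[np.arange(hidden.shape[0]), idx]

def linear_cka(X, Y):
    """Linear centered kernel alignment between two (n_examples, dim) matrices."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```

Under these assumptions, the comparison the paper describes would amount to checking whether `linear_cka(mean_pool(hidden, gen_mask), reference)` exceeds `linear_cka(last_token(hidden, gen_mask), reference)`, where `reference` is a matrix of embeddings for the same examples from the reference space.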
If this is right
- Mean pooling improves semantic quality consistently across multiple domains.
- Representations derived from generated tokens outperform those derived from prompt tokens.
- Tracking alignment over the course of generation reveals interpretable dynamics in model behavior.
- Semantic information is distributed across the generated tokens rather than localized to a single position.
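The summary does not say how alignment is tracked as generation proceeds. One plausible reading, offered only as an assumption, is to pool over the first t generated tokens for each step t and score each prefix separately. The hypothetical helper below computes those prefix means in one pass; its name and input shape are illustrative, not taken from the paper.

```python
import numpy as np

def running_mean_pool(hidden):
    """Prefix means over generated tokens (hypothetical helper).

    hidden: (n_examples, n_generated_tokens, dim) hidden states of generated tokens
    returns an array of the same shape where out[:, t] is the mean of hidden[:, :t + 1]
    """
    steps = np.arange(1, hidden.shape[1] + 1)
    # The cumulative sum divided by the number of tokens seen so far gives the prefix mean.
    return np.cumsum(hidden, axis=1) / steps[None, :, None]
```

Each slice `out[:, t]` can then be passed to whatever alignment measure is in use, giving one score per generation step. The final slice equals ordinary mean pooling over all generated tokens.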
Where Pith is reading between the lines
- Common practices such as using only the last token's hidden state may be suboptimal for capturing the full semantics of a sequence.
- Applications that rely on model embeddings, such as similarity search, might benefit from mean pooling over generated sequences.
- The same distribution of information across positions might hold in other autoregressive models, such as those for vision or audio.
Load-bearing premise
That kernel alignment to reference spaces accurately reflects the semantic quality of the collapsed representations.
What would settle it
Finding a task or benchmark where single-token hidden states outperform mean-pooled ones in semantic similarity or downstream performance would challenge the claim.
Original abstract
How should hidden states generated autoregressively be collapsed into a representation that reflects a language model's internal state? Despite tokens being generated under causal masking, we find that mean pooling across their hidden states yields more semantic representations than any individual token alone. We quantify this through kernel alignment to reference spaces in language, vision, and protein domains. The improvement through mean pooling is consistent with information being distributed across generated tokens rather than localized to a single position. Furthermore, representations derived from generated tokens outperform those from prompt tokens, and alignment across generation reveals interpretable dynamics in model behavior.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that mean pooling across hidden states generated autoregressively by language models yields more semantic representations than any individual token. This is quantified using kernel alignment to reference spaces in language, vision, and protein domains, suggesting that information is distributed across generated tokens. Additionally, representations from generated tokens outperform those from prompt tokens, and alignment across generation shows interpretable model dynamics.
Significance. If the empirical findings hold, the result could influence practices in extracting semantic representations from LLMs by favoring mean pooling over single-token or prompt-based approaches. It provides evidence for distributed information in autoregressive generation, which may have broader implications for model interpretability and representation learning across modalities.
major comments (2)
- [Abstract] The central quantification relies on kernel alignment to reference spaces serving as a measure of semantic quality, but the abstract provides no justification, validation against alternative metrics (such as downstream task accuracy), or evidence that the chosen reference spaces capture the relevant semantics. This assumption is load-bearing for concluding that mean pooling produces 'more semantic' representations.
- [Abstract] No datasets, implementation details, statistical tests, controls, or experimental setup are described, preventing any assessment of whether the reported improvements in kernel alignment support the claims of consistency across domains or superiority of generated token representations over prompt tokens.
minor comments (1)
- The title is intriguing but the abstract does not elaborate on what 'the middle' specifically refers to in terms of token positions or pooling strategy.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below, focusing on the abstract's role as a concise summary while committing to revisions that strengthen the presentation without altering the core claims.
Point-by-point responses
- Referee: [Abstract] The central quantification relies on kernel alignment to reference spaces serving as a measure of semantic quality, but the abstract provides no justification, validation against alternative metrics (such as downstream task accuracy), or evidence that the chosen reference spaces capture the relevant semantics. This assumption is load-bearing for concluding that mean pooling produces 'more semantic' representations.
  Authors: We agree that the abstract's brevity precludes detailed justification. The full manuscript cites prior work validating centered kernel alignment (CKA) as a standard metric for representation similarity and motivates the reference spaces as established semantic embeddings in language, vision, and protein domains. We acknowledge that direct validation against downstream task accuracy would provide stronger support and will incorporate such comparisons in the revised manuscript.
  Revision: yes
- Referee: [Abstract] No datasets, implementation details, statistical tests, controls, or experimental setup are described, preventing any assessment of whether the reported improvements in kernel alignment support the claims of consistency across domains or superiority of generated token representations over prompt tokens.
  Authors: The abstract is designed as a high-level overview and omits these elements to remain concise. The full manuscript provides the datasets across domains, model implementation details, statistical tests for alignment improvements, control experiments, and experimental protocols that underpin the claims of cross-domain consistency and the advantage of generated-token representations. We will not revise the abstract substantially but will verify that the body of the paper makes these elements fully accessible.
  Revision: no
Circularity Check
No circularity: empirical proxy via kernel alignment is independent of inputs
Full rationale
The abstract reports an empirical observation that mean pooling of autoregressively generated hidden states produces higher kernel alignment to external reference spaces (language, vision, protein) than single tokens or prompt tokens. No equations, fitted parameters, derivations, or self-citations are present. The central quantification uses an external metric (kernel alignment) applied to held-out reference spaces, so the reported improvement is not equivalent to the inputs by construction and remains falsifiable. This matches the default case of a self-contained empirical comparison with no load-bearing circular steps.