2 Pith papers cite this work. Polarity classification is still indexing.

Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citation roles: contradiction (1)
Citation polarities: contest (1)

Representative citing papers:
A geometric 1-form on token embeddings has curvature that couples to semantic world models in language models, as evidenced by clustering on chess board regions and piece importance.

Citing papers explorer

- Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions
  Future-rhyme information is linearly decodable at line boundaries across model families and strengthens with scale, yet only Gemma-3-27B causally depends on it, with the driver migrating to the boundary around layer 30 and localizing to five attention heads.
- A geometric relation of the error introduced by sampling a language model's output distribution to its internal state
  A geometric 1-form on token embeddings has curvature that couples to semantic world models in language models, as evidenced by clustering on chess board regions and piece importance.