pith. machine review for the scientific record.

arxiv: 2605.09011 · v1 · submitted 2026-05-09 · 💻 cs.LG · cs.AI

Recognition: no theorem link

A Geometric Perspective on Next-Token Prediction in Large Language Models: Three Emerging Phases

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:14 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI
keywords large language models · residual stream · next-token prediction · Grassmann manifold · geometric phases · singular subspaces · model depth · affine maps

The pith

Large language models organize next-token prediction into three geometric phases across layers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks where predictive information for the next token lives inside LLM residual streams by training simple affine maps at each layer and extracting their dominant singular subspaces. It follows how these subspaces move relative to one another on the Grassmann manifold, producing a consistent unimodal similarity curve with a rise, plateau, and descent. The curve breaks into three phases whose main difference is how they change the effective rank of the predictive subspace: expansion while seeding a multiplexed candidate set, stabilization while overriding to concentrate the set, and concentration while writing the final choice. The same pattern holds across eight models from two families and across a wide range of subspace dimensions, implying that extra depth is spent mainly on sharpening among already-present candidates rather than on creating new ones.
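To make the probing setup concrete, here is a minimal sketch of the procedure described above: fit a per-layer affine lens from residual-stream states to next-token logits, then keep the top-k right singular directions of its weight matrix as that layer's predictive readout subspace. The training loop, tensor shapes, and variable names are illustrative assumptions, not the authors' implementation; in the paper the hidden states come from frozen Qwen2.5 and OLMo2 models rather than the random stand-in data used here.

    import torch

    def fit_lens(H, y, vocab_size, steps=200, lr=1e-3):
        """Fit an affine 'representation lens': next-token logits = W h + b."""
        lens = torch.nn.Linear(H.shape[1], vocab_size)
        opt = torch.optim.AdamW(lens.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(lens(H), y).backward()
            opt.step()
        return lens

    def readout_subspace(lens, k):
        """Top-k right singular directions of W: the k-dimensional subspace of the
        residual stream that the lens actually reads for next-token prediction."""
        W = lens.weight.detach()                       # (vocab, d)
        _, _, Vh = torch.linalg.svd(W, full_matrices=False)
        return Vh[:k].T                                # (d, k), orthonormal columns

    # Toy stand-in data; in the paper H would be residual-stream states at layer ell
    # collected from a frozen model, and y the observed next-token ids.
    d, vocab, n = 64, 1000, 4096
    H, y = torch.randn(n, d), torch.randint(0, vocab, (n,))
    S_k = readout_subspace(fit_lens(H, y, vocab), k=int(0.15 * d))

Repeating this at every layer yields one readout subspace per layer, whose trajectory across depth is what the paper tracks on the Grassmann manifold.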

Core claim

Across models, the trajectory of the dominant singular subspace of each layer's learned affine map shows three phases distinguished by rank dynamics: Seeding Multiplexing, in which attention and feed-forward layers expand the candidate set in superposition, with the share of positions where the final token leads rising from 20 to 35 percent across the phase; Hoisting Overriding, which keeps rank stable while overriding existing subspaces to concentrate the distribution; and Focal Convergence, which uses high-energy low-rank updates to align the winner with the unembedding direction. All updates remain approximately orthogonal to the residual stream, and phases one and three grow slowly with depth while phase two grows linearly.
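Since the phases are distinguished by what they do to effective rank, a concrete estimator helps fix the idea. The sketch below uses the standard entropy-based effective rank of the singular-value spectrum; the excerpt does not pin down which estimator the paper uses, so treat this as one plausible choice rather than the authors' definition.

    import numpy as np

    def effective_rank(A, eps=1e-12):
        """Entropy-based effective rank: exp of the Shannon entropy of the
        normalized singular-value spectrum (equals the exact rank when the
        spectrum is flat, and ~1 when a single direction dominates)."""
        s = np.linalg.svd(A, compute_uv=False)
        p = s / (s.sum() + eps)
        return float(np.exp(-(p * np.log(p + eps)).sum()))

    # Stand-ins for a matrix of stacked hidden states and a low-rank layer update.
    H_layer = np.random.randn(4096, 1024)                          # broad spectrum, high effective rank
    delta = np.random.randn(4096, 8) @ np.random.randn(8, 1024)    # low-rank update, effective rank near 8
    print(effective_rank(H_layer), effective_rank(delta))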

What carries the argument

The predictive readout subspace: the leading k-dimensional singular subspace of a learned affine map from the residual stream to next-token logits, whose successive positions on the Grassmann manifold trace the similarity profile that defines the phases.
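The similarity profile itself follows the paper's definition RSS_ℓ^(k) = (1/k) Σ_{r=1..k} cos θ_r^(ℓ), where θ_r^(ℓ) are the principal angles between the readout subspaces S_ℓ^(k) and S_{ℓ+1}^(k) at consecutive layers. A minimal sketch, using the fact that the cosines of the principal angles between two orthonormal bases are the singular values of their cross-Gram matrix; the function names are illustrative.

    import numpy as np

    def rss_similarity(U_a, U_b):
        """Mean cosine of principal angles between span(U_a) and span(U_b).
        U_a, U_b: (d, k) matrices with orthonormal columns."""
        cos_thetas = np.linalg.svd(U_a.T @ U_b, compute_uv=False)
        return float(cos_thetas.mean())

    def rss_profile(subspaces):
        """Layer-wise similarity profile from a list of per-layer (d, k) bases."""
        return [rss_similarity(a, b) for a, b in zip(subspaces, subspaces[1:])]

    # Sanity check: identical subspaces give 1, orthogonal ones give 0.
    d, k = 32, 4
    Q, _ = np.linalg.qr(np.random.randn(d, d))
    print(rss_similarity(Q[:, :k], Q[:, :k]))     # -> 1.0
    print(rss_similarity(Q[:, :k], Q[:, k:2*k]))  # -> ~0.0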

If this is right

  • The first phase seeds a candidate set in superposition, with the share of positions where its leading member is the final token rising from 20 to 35 percent across the phase.
  • The second phase concentrates the candidate distribution by overriding subspaces without increasing effective rank.
  • The third phase writes the selected token into a direction aligned with the unembedding matrix.
  • Phases one and three lengthen slowly with added depth while phase two lengthens linearly, so most new capacity goes to disambiguation.
  • The same three-phase structure appears for any k between 1 percent and 50 percent of the residual dimension.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the phases are mechanistic, targeted edits at the boundaries between them could alter candidate generation or final selection independently.
  • The linear scaling of the middle phase offers a geometric account for why depth improves performance on tasks that require resolving ambiguity among many options.
  • The same subspace-tracking method could be applied to other sequence models to test whether analogous geometric phases appear outside language.

Load-bearing premise

The dominant singular subspaces of these affine maps locate the actual predictive information and their Grassmann trajectories mark real mechanistic phase boundaries rather than depending on the arbitrary choice of k.

What would settle it

Whether additional models outside the Qwen2.5 and OLMo2 families produce the same unimodal rise-plateau-descent similarity profiles whose segments align with the same three patterns of rank expansion, stabilization, and concentration.
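One way to make that test operational is the breakpoint step itself: given a rise-plateau-descent RSS profile, locate the two consensus breakpoints that separate the phases. The sketch below is a simple chord-based, kneedle-style detector written under the assumption of a unimodal profile; the paper's unimodal fits and double-chord procedure may differ in detail.

    import numpy as np

    def knee(x, y):
        """Index of the point farthest from the chord joining the segment's endpoints
        (both axes normalised to [0, 1] first, as in kneedle-style detection)."""
        x = (x - x[0]) / max(x[-1] - x[0], 1e-12)
        y = (y - y.min()) / max(y.max() - y.min(), 1e-12)
        chord = np.array([x[-1] - x[0], y[-1] - y[0]])
        chord = chord / np.linalg.norm(chord)
        pts = np.stack([x - x[0], y - y[0]], axis=1)
        dist = np.abs(pts[:, 0] * chord[1] - pts[:, 1] * chord[0])   # perpendicular distance
        return int(np.argmax(dist))

    def phase_breakpoints(rss, tol=0.02):
        """Estimate b1 (rise -> plateau) and b2 (plateau -> descent) from an RSS profile."""
        rss = np.asarray(rss, dtype=float)
        near_max = np.where(rss >= rss.max() - tol)[0]
        centre = int((near_max[0] + near_max[-1]) // 2)       # middle of the plateau
        layers = np.arange(len(rss), dtype=float)
        b1 = knee(layers[: centre + 1], rss[: centre + 1])
        b2 = centre + knee(layers[centre:], rss[centre:])
        return b1, b2

    # Toy rise-plateau-descent profile: breakpoints land at the two ends of the plateau.
    profile = np.concatenate([np.linspace(0.2, 0.9, 6), np.full(20, 0.9), np.linspace(0.9, 0.3, 6)])
    print(phase_breakpoints(profile))

Running the same detector on profiles from additional model families, and checking whether the recovered breakpoints again separate rank expansion, stabilization, and concentration, is the shape of the test described above.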

Figures

Figures reproduced from arXiv: 2605.09011 by Gianfranco Lombardo, Giuseppe Trimigno, Stefano Cagnoni.

Figure 1
Figure 1. Examples of Hit@k achieved by a representation lens and LogitLens across layers. The representation lens recovers the final token early, while the LogitLens remains nearly blind until the final phase, indicating predictive information is present but hidden in superposition. view at source ↗
Figure 2
Figure 2. Pairwise RSS similarity matrices M(k) for OLMo2-32B at truncation levels k. view at source ↗
Figure 3
Figure 3. Examples of the Pareto frontier between Visibility@k and E(k). view at source ↗
Figure 4
Figure 4. Examples of unimodal fits (left) and breakpoint detection (right) on the RSS profile. view at source ↗
Figure 5
Figure 5. Distributional fits to RSS profiles s_ℓ^(k) at k = 50%. High R² values for unimodal distributions support the unimodality assumption and the tripartite decomposition. view at source ↗
Figure 6
Figure 6. Scaling structure of the three phases across model depths. view at source ↗
Figure 7
Figure 7. ∥Δ_ℓ^FFN∥/∥Δ_ℓ^MHA∥ ratio across phases per model. view at source ↗
Figure 8
Figure 8. Left: breakpoint robustness for all eight models across all subspace resolutions. view at source ↗
Figure 9
Figure 9. Rank of the total update Δ_ℓ (solid) and its norm (dashed). All models in Appendix E. view at source ↗
Figure 10
Figure 10. RSS profiles s_ℓ^(k=15%) for all eight models across the two architectural families, with consensus breakpoints b̂1 and b̂2 indicated. All models in Appendix E. view at source ↗
Figure 11
Figure 11. Left: Hit@1 (solid) and Hit@5 (dashed) across normalised depth. Right: Hit@5/Hit@1 ratio. view at source ↗
Figure 12
Figure 12. RSS profile s_ℓ^(k=15%) (solid, left axis) and |cos(Δ_ℓ, h_{ℓ−1})| (dashed red, right axis). High RSS ↔ low cosine (Phase 2); low RSS ↔ high cosine (Phase 3). See Appendix E for all models. view at source ↗
Figure 13
Figure 13. Average effective rank of hidden states across 16M tokens. All models in Appendix E. view at source ↗
Figure 14
Figure 14. Final per-layer training and validation losses across model scales and families. view at source ↗
Figure 15
Figure 15. Pairwise RSS similarity matrices M(k) for all models in the Qwen2.5 family across truncation levels k ∈ {1, 3, 5, 10, 15, 25, 35, 50}% of d. Each row corresponds to a model; each column to a truncation level. Geometric contrast decreases monotonically with k across all sizes. view at source ↗
Figure 16
Figure 16. Pairwise RSS similarity matrices M(k) for all models in the OLMo2 family across truncation levels k ∈ {1, 3, 5, 10, 15, 25, 35, 50}% of d. The same monotonic decay in Visibility@k is observed consistently across all model sizes, in agreement with the Qwen2.5 family. view at source ↗
Figure 17
Figure 17. Pareto frontier between Visibility@k and E(k) for all models in the Qwen2.5 and OLMo2 families. Each point corresponds to a truncation level k ∈ {1, 3, 5, 10, 15, 25, 35, 50}% of d. The intermediate regime k ∈ {5%, 10%, 15%} consistently lies at the region of highest curvature across all model sizes and both families, offering the most favorable trade-off between spectral fidelity and geometric discriminability. view at source ↗
Figure 18
Figure 18. Distribution fits (left panels) and double-chord breakpoint detection (right panels) on the RSS profiles. view at source ↗
Figure 19
Figure 19. RSS profiles s_ℓ^(k=15%) for all eight models across both families, with consensus breakpoints b̂1 and b̂2 indicated. The rise–plateau–descent structure and the placement of phase boundaries are consistent across all model sizes and both architectural families. view at source ↗
Figure 20
Figure 20. RSS profile s_ℓ^(k=15%) (solid, left axis) and |cos(Δ_ℓ, h_{ℓ−1})| (dashed red, right axis) overlaid for all eight models across both families. High RSS ↔ low cosine (Phase 2); low RSS ↔ high cosine (Phase 3). The anti-correlation is consistent across all models and scales. view at source ↗
Figure 21
Figure 21. Effective rank of hidden states H_ℓ averaged across 16M tokens for all eight models. Rank increases through Phase 1, peaks near b̂1, stabilises in Phase 2, and remains approximately constant through Phase 3. Qwen2.5-1.5B exhibits reduced layer-wise variance consistent with its constrained capacity. view at source ↗
Figure 22
Figure 22. Effective rank of the total update Δ_ℓ (solid) and its norm ∥Δ_ℓ∥₂ (dashed red) for all eight models. Update rank peaks in Phase 2 and declines in Phase 3 (7/8 models); update norm increases monotonically with sharp acceleration in Phase 3 (up to 10×), indicating concentration on few high-energy directions. view at source ↗
read the original abstract

We investigate the geometry of predictive information across the layers of large language models (LLMs). We repurpose representation lenses (learned affine maps trained to predict the next token from intermediate residual streams) as geometric diagnostic tools. Rather than asking what the model predicts at each layer, we ask where predictive information resides and how it evolves across depth. We define at each layer a predictive readout subspace as the dominant k-dimensional singular subspace of such a map on the d-dimensional residual stream (where k is a resolution parameter), and track its trajectory on the Grassmann manifold as a similarity profile across layers. The profile is well described by unimodal distributions exhibiting a rise, near-plateau, and descent; varying k from 1% to 50% of d traces a Pareto frontier between visibility and energy retention, yet the same structure emerges at all scales. Across eight models from two families (Qwen2.5 and OLMo2, 1B-32B), we identify three geometric phases. Updates are approximately orthogonal to the residual stream throughout; what distinguishes the phases is their effect on the effective rank, which expands, stabilizes, and concentrates. In the first, Seeding Multiplexing, feed-forward memories and attention layers seed a candidate set in superposition in family-specific proportions, with the final token rising as leading candidate from 20% to 35% of positions across this phase. In the second, Hoisting Overriding, updates override existing subspaces to concentrate the candidate distribution without expanding the rank. In the third, Focal Convergence, high-energy low-rank updates write the winner into a form aligned with the unembedding direction. Phases 1 and 3 grow slowly with model depth, while Phase 2 expands linearly. The additional capacity of deeper LLMs is largely absorbed by candidate disambiguation.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that by fitting per-layer affine probes to predict next tokens from residual streams and tracking the trajectory of their dominant k-dimensional singular subspaces on the Grassmann manifold, three geometric phases emerge across depth in LLMs: Seeding Multiplexing (expanding effective rank via candidate superposition), Hoisting Overriding (stabilizing rank while overriding subspaces), and Focal Convergence (concentrating rank to align with the unembedding). This unimodal rise-plateau-descent structure in subspace similarity holds across eight models (Qwen2.5 and OLMo2, 1B-32B) and k ranging from 1% to 50% of d, with deeper models using extra capacity primarily for candidate disambiguation rather than expanding the candidate set.

Significance. If the phase distinctions prove robust, the work offers a geometric lens on how predictive information is organized and refined through LLM layers, with the empirical scope across two model families and multiple scales providing a useful baseline for interpretability research. The observation that Phase 2 scales linearly with depth while Phases 1 and 3 grow slowly is a concrete, falsifiable claim that could inform scaling laws and mechanistic understanding.

major comments (2)
  1. [Abstract / §3] Abstract and §3 (or equivalent methods/results): the assertion that 'the same structure emerges at all scales' for k from 1% to 50% of d lacks quantitative support such as measured stability of phase boundaries (e.g., layer indices where effective-rank transitions occur), error bars across models, or statistical tests for invariance under k. Since the three phases are defined precisely by whether updates expand, stabilize, or concentrate the effective rank of the k-subspace, sensitivity of these transitions to the resolution parameter k is load-bearing for the central claim.
  2. [Abstract / §4] Abstract and §4 (results on subspace alignment): no controls are reported comparing the alignment of the learned k-subspaces with next-token prediction against random k-dimensional subspaces of the residual stream or against the full d-dimensional space. Without such baselines, it remains unclear whether the reported Grassmann trajectories and phase distinctions capture predictive signal beyond what would arise from any k-dimensional projection, undermining the mechanistic interpretation of the phases.
minor comments (2)
  1. [§2] Notation for the Grassmann manifold distance or similarity profile should be defined explicitly (e.g., via principal angles) rather than left implicit when describing the 'similarity profile across layers'.
  2. [§5] The paper would benefit from a table summarizing the layer ranges or token positions assigned to each phase across the eight models to make the scaling claims (Phase 2 linear, others slow) directly verifiable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the presentation of our geometric analysis. We address each major comment below and will incorporate the requested quantitative support and controls in the revised manuscript.

read point-by-point responses
  1. Referee: [Abstract / §3] Abstract and §3 (or equivalent methods/results): the assertion that 'the same structure emerges at all scales' for k from 1% to 50% of d lacks quantitative support such as measured stability of phase boundaries (e.g., layer indices where effective-rank transitions occur), error bars across models, or statistical tests for invariance under k. Since the three phases are defined precisely by whether updates expand, stabilize, or concentrate the effective rank of the k-subspace, sensitivity of these transitions to the resolution parameter k is load-bearing for the central claim.

    Authors: We agree that quantitative metrics are needed to substantiate invariance across k. In the revision we will report: (i) mean and standard deviation of phase-boundary layer indices (onset of rank expansion, onset of stabilization, onset of convergence) across all eight models for each k in {1%,5%,10%,25%,50%} of d; (ii) error bars on the Grassmann similarity curves; and (iii) a non-parametric test (Kruskal-Wallis with post-hoc Dunn) for differences in phase lengths across k values. These statistics will be added to §3 and a new appendix table. The existing qualitative observation that the unimodal rise-plateau-descent shape persists will be supplemented by these measures, confirming that phase distinctions remain stable within the tested range. revision: yes

  2. Referee: [Abstract / §4] Abstract and §4 (results on subspace alignment): no controls are reported comparing the alignment of the learned k-subspaces with next-token prediction against random k-dimensional subspaces of the residual stream or against the full d-dimensional space. Without such baselines, it remains unclear whether the reported Grassmann trajectories and phase distinctions capture predictive signal beyond what would arise from any k-dimensional projection, undermining the mechanistic interpretation of the phases.

    Authors: We accept that explicit baselines are required. We will add two controls to §4: (1) random k-dimensional subspaces sampled uniformly from the residual stream at each layer, and (2) the full d-dimensional residual stream itself. For both, we will compute the same Grassmann similarity profiles to the final unembedding direction and overlay them on the learned-subspace curves. Statistical significance will be assessed via permutation tests (10,000 shuffles) comparing the plateau-phase alignment of the learned subspaces against the random controls. These results will demonstrate that the learned subspaces exhibit reliably higher alignment than random projections of equal dimension, thereby supporting the claim that the observed phases reflect next-token predictive geometry rather than generic low-rank structure. revision: yes
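A minimal sketch of the random-subspace control proposed in the second response: build a Monte Carlo null from uniformly random k-dimensional subspaces of the residual stream and compare their alignment with a reference subspace against the alignment achieved by the learned readout subspace. The reference below is a stand-in for the unembedding's dominant subspace, the noisy rotation stands in for a learned lens subspace, and the Monte Carlo p-value stands in for the permutation test promised in the revision.

    import numpy as np

    rng = np.random.default_rng(0)

    def random_subspace(d, k):
        """Orthonormal basis of a uniformly random k-dimensional subspace of R^d."""
        Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
        return Q

    def alignment(U, V):
        """Mean cosine of principal angles between span(U) and span(V)."""
        return float(np.linalg.svd(U.T @ V, compute_uv=False).mean())

    d, k = 256, 16
    U_ref = random_subspace(d, k)                        # stand-in: unembedding's top-k subspace
    noisy = U_ref + 0.02 * rng.standard_normal((d, k))   # stand-in: a learned readout subspace
    U_learned, _ = np.linalg.qr(noisy)

    observed = alignment(U_learned, U_ref)
    null = np.array([alignment(random_subspace(d, k), U_ref) for _ in range(2000)])
    p_value = (1 + np.sum(null >= observed)) / (1 + len(null))
    print(f"learned alignment {observed:.3f}, random mean {null.mean():.3f}, p ~ {p_value:.4f}")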

Circularity Check

0 steps flagged

No circularity: phases are descriptive labels on empirical subspace trajectories

full rationale

The paper defines the predictive readout subspace via the dominant k-dimensional singular subspace of a learned affine map on the residual stream and tracks its Grassmannian trajectory as an empirical similarity profile across layers. The three phases (Seeding Multiplexing, Hoisting Overriding, Focal Convergence) are then assigned as descriptive labels based on observed changes in effective rank and candidate concentration; these labels do not enter the definition of the subspaces, the singular-value decomposition, or the manifold distance metric. No equations reduce the reported phase boundaries or structures to quantities fitted from the same data by construction, and the analysis is replicated across independent model families without self-citation load-bearing steps.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

Abstract-only review; ledger entries are inferred directly from stated concepts. The full paper may introduce additional fitted quantities or background assumptions.

free parameters (1)
  • k
    Resolution parameter controlling subspace dimension (varied from 1% to 50% of d); used to trace Pareto frontier between visibility and energy retention.
axioms (1)
  • domain assumption: The dominant k-dimensional singular subspace of the learned affine map captures the location of predictive information in the residual stream.
    Invoked when defining the predictive readout subspace at each layer.
invented entities (1)
  • Three geometric phases (Seeding Multiplexing, Hoisting Overriding, Focal Convergence) · no independent evidence
    purpose: To classify the observed rise-plateau-descent behavior of predictive subspaces.
    Defined from the shape of similarity profiles and changes in effective rank.

pith-pipeline@v0.9.0 · 5638 in / 1424 out tokens · 82631 ms · 2026-05-12T02:14:02.009172+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 6 internal anchors

  1. [1]

    Intrinsic dimensionality explains the effectiveness of language model fine-tuning

    Armen Aghajanyan, Sonal Gupta, and Luke Zettlemoyer. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli, editors,Proceedings of the 59th Annual Meeting of the Associ- ation for Computational Linguistics and the 11th International Joint Conference on Nat- ural Langua...

  2. [2]

    Intrinsic dimensionality explains the effectiveness of language model fine-tuning

    Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-long.568. URL https://aclanthology.org/2021.acl-long.568/

  3. [3]

    Understanding intermediate layers using linear classifier probes, 2017

    Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes, 2017. URLhttps://openreview.net/forum?id=ryF7rTqgl

  4. [4]

    Emmanuel Ameisen, Jack Lindsey, Adam Pearce, Wes Gurnee, Nicholas L. Turner, Brian Chen, Craig Citro, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivo...

  5. [5]

    Eliciting Latent Predictions from Transformers with the Tuned Lens

    Nora Belrose et al. Eliciting latent predictions from transformers with the Tuned Lens. arXiv preprint arXiv:2303.08112, 2023. URL https://arxiv.org/abs/2303.08112

  6. [6]

    A multiscale analysis of mean-field transformers in the moderate interaction regime

    Giuseppe Bruno, Federico Pasqualotto, and Andrea Agazzi. A multiscale analysis of mean-field transformers in the moderate interaction regime. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id= WCRPgBpbcA

  7. [7]

    A mathematical framework for transformer circuits

    Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. A...

  8. [8]

    https://transformer-circuits.pub/2021/framework/index.html

  9. [9]

    Privileged bases in the transformer residual stream

    Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, et al. Privileged bases in the transformer residual stream.Transformer Circuits Thread, 2023. URL https://transformer-circuits. pub/2023/privileged-basis/index.html

  10. [10]

    Transformer Feed-Forward Layers Are Key-Value Memories

    Mor Geva, Roei Schuster, Jonathan Berant, and Omer Levy. Transformer feed-forward layers are key-value memories. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen- tau Yih, editors,Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5484–5495, Online and Punta Cana, Dominican Republic, November ...

  11. [11]

    Patchscopes: a unifying framework for inspecting hidden representations of language models

    Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, and Mor Geva. Patchscopes: a unifying framework for inspecting hidden representations of language models. ICML’24. JMLR.org, 2024

  12. [12]

    When models manipulate manifolds: The geometry of a counting task, 2026

    Wes Gurnee, Emmanuel Ameisen, Isaac Kauvar, Julius Tarng, Adam Pearce, Chris Olah, and Joshua Batson. When models manipulate manifolds: The geometry of a counting task, 2026. URLhttps://arxiv.org/abs/2601.04480. 10

  13. [13]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Fixing weight decay regularization in adam.CoRR, abs/1711.05101, 2017. URLhttp://arxiv.org/abs/1711.05101

  14. [14]

    Locating and editing factual associations in gpt

    Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in gpt. InProceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, Red Hook, NY , USA, 2022. Curran Associates Inc. ISBN 9781713871088

  15. [15]

    Emergent linear representations in world models of self-supervised sequence models

    Neel Nanda, Andrew Lee, and Martin Wattenberg. Emergent linear representations in world models of self-supervised sequence models. In Yonatan Belinkov, Sophie Hao, Jaap Jumelet, Na- joung Kim, Arya McCarthy, and Hosein Mohebbi, editors,Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 16–30, Singapore, ...

  16. [16]

    URLhttps://aclanthology.org/2023.blackboxnlp-1.2/

  17. [17]

    Interpreting GPT: the logit lens

    nostalgebraist. Interpreting GPT: the logit lens. https://www.lesswrong.com/posts/ AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens, 2020

  18. [18]

    2 OLMo 2 Furious

    Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, et al. 2 olmo 2 furious.arXiv preprint arXiv:2501.00656, 2024

  19. [19]

    The Linear Representation Hypothesis and the Geometry of Large Language Models

    Kiho Park, Yo Joong Choe, and Victor Veitch. The linear representation hypothesis and the geometry of large language models, 2024. URLhttps://arxiv.org/abs/2311.03658

  20. [20]

    Finding a "kneedle" in a haystack: Detecting knee points in system behavior

    Ville Satopaa, Jeannie Albrecht, David Irwin, and Barath Raghavan. Finding a "kneedle" in a haystack: Detecting knee points in system behavior. In 2011 31st International Conference on Distributed Computing Systems Workshops, pages 166–171. IEEE, 2011

  21. [21]

    Transformers represent belief state geometry in their residual stream

    Adam Shai, Paul M. Riechers, Lucas Teixeira, Alexander Gietelink Oldenziel, and Sarah Marzen. Transformers represent belief state geometry in their residual stream. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=YIB7REL8UC

  22. [22]

    SlimPajama-DC: Understanding data combinations for LLM training

    Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, and Eric Xing. SlimPajama-DC: Understanding data combinations for LLM training, 2024. URL https://arxiv.org/abs/2309.10818

  23. [23]

    Low-rank lens for scalable llms interpretability

    Giuseppe Trimigno, Gianfranco Lombardo, and Stefano Cagnoni. Low-rank lens for scalable llms interpretability. pages 35–40, 01 2026. doi: 10.14428/esann/2026.ES2026-221

  24. [24]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

  25. [25]

    Jump to conclusions: Short-cutting transformers with linear transformations

    Alexander Yom Din, Taelin Karidi, Leshem Choshen, and Mor Geva. Jump to conclusions: Short-cutting transformers with linear transformations. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors,Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources...