The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language Models
Pith reviewed 2026-05-10 19:56 UTC · model grok-4.3
The pith
Language models show signs of superposition in latent chain-of-thought reasoning only when they are trained from scratch on the task.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Only models trained from scratch exhibit signs of using superposition. In the training-free and fine-tuned regimes, the superposition either collapses or is not used at all, and models discover shortcut solutions instead. The paper attributes this to two complementary factors: pretraining on natural language biases models to commit to a single token in the last layers, and model capacity strongly affects which solutions a model favors.
What carries the argument
Comparison across training-free, fine-tuned, and from-scratch regimes, tracked by Logit Lens and entity-level probing of internal representations to detect whether multiple candidate solutions remain active simultaneously.
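As a rough illustration of the Logit Lens idea mentioned above, the sketch below projects each layer's hidden state through an unembedding matrix and reads off the top tokens per layer. It is a toy under stated assumptions, not the paper's code: the names `logit_lens` and `top_k_tokens`, the 6-token vocabulary, and the scaled-identity unembedding are all hypothetical, and the final layer normalization that real implementations usually apply before unembedding is omitted.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens(hidden_states, unembed):
    """Decode every layer's hidden state as if it were the final one.

    hidden_states: (n_layers, d_model); unembed: (d_model, vocab).
    Returns per-layer token probabilities of shape (n_layers, vocab).
    """
    return softmax(hidden_states @ unembed)

def top_k_tokens(probs, k=3):
    # Indices of the k most probable tokens per layer (stable on ties).
    return np.argsort(-probs, axis=-1, kind="stable")[:, :k]

# Toy setup (assumption): token j is read out by basis direction j.
d_model = vocab = 6
unembed = np.eye(d_model) * 5.0

mid_layer = np.zeros(d_model)
mid_layer[[1, 4]] = 1.0           # two candidates held at once
last_layer = np.zeros(d_model)
last_layer[1] = 1.0               # committed to a single token

probs = logit_lens(np.stack([mid_layer, last_layer]), unembed)
top = top_k_tokens(probs, k=2)
```

In this toy, the intermediate layer splits its probability mass almost evenly between tokens 1 and 4, while the last layer puts nearly all of it on token 1, which is the kind of pattern the layer-by-layer comparison looks for.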
If this is right
- Superposition in latent reasoning requires full task-specific training from random initialization rather than adaptation of existing weights.
- Pretraining on natural language data creates a bias toward early token commitment that prevents maintenance of multiple hypotheses.
- Model capacity determines whether a network discovers and maintains complex representations like superposition or defaults to shortcuts.
- Practical latent CoT systems built by fine-tuning will rarely realize the hypothesized expressivity gains.
Where Pith is reading between the lines
- Because full retraining is expensive, the conditions for superposition are unlikely to appear in most deployed systems.
- Hybrid approaches that combine limited task-specific training with techniques to preserve multi-hypothesis representations might induce superposition without starting from scratch.
- The same regime-dependent collapse could appear in other continuous reasoning formats that rely on adapting pretrained models.
Load-bearing premise
That Logit Lens and entity-level probing can reliably detect the presence or absence of superposition in the model's internal representations across the three regimes.
What would settle it
A fine-tuned or training-free model that keeps multiple distinct token predictions active across intermediate layers without collapsing to a single commitment in the final layers would contradict the reported absence of superposition in those regimes.
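One way to make this test concrete is to summarize each layer's decoded distribution by its effective number of candidates (the exponential of the Shannon entropy) and check whether that number falls toward 1 in the final layer. The sketch below is illustrative only: the function names and the `tol=1.5` cutoff are assumptions, and the paper's own criterion may differ.

```python
import numpy as np

def effective_candidates(probs):
    """exp(entropy) per row: about 1 means one token, about 2 means two equal candidates."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum(axis=-1, keepdims=True)
    # log(1) = 0 for zero-probability entries, so they contribute nothing.
    logp = np.log(np.where(p > 0, p, 1.0))
    return np.exp(-(p * logp).sum(axis=-1))

def collapses(layer_probs, tol=1.5):
    """True if the last layer holds fewer than `tol` effective candidates."""
    return bool(effective_candidates(layer_probs)[-1] < tol)

# Rows are layers; columns are a hypothetical 4-token vocabulary.
kept = [[0.5, 0.5, 0.0, 0.0],
        [0.5, 0.5, 0.0, 0.0]]
collapsed = [[0.5, 0.5, 0.0, 0.0],
             [0.97, 0.01, 0.01, 0.01]]

kept_counts = effective_candidates(kept)
kept_flag = collapses(kept)
collapsed_flag = collapses(collapsed)
```

Under this measure, a fine-tuned or training-free model that contradicted the paper would look like `kept`: two effective candidates all the way to the final layer.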
Original abstract
Latent reasoning via continuous chain-of-thoughts (Latent CoT) has emerged as a promising alternative to discrete CoT reasoning. Operating in continuous space increases expressivity and has been hypothesized to enable superposition: the ability to maintain multiple candidate solutions simultaneously within a single representation. Despite theoretical arguments, it remains unclear whether language models actually leverage superposition when reasoning using latent CoTs. We investigate this question across three regimes: a training-free regime that constructs latent thoughts as convex combinations of token embeddings, a fine-tuned regime where a base model is adapted to produce latent thoughts, and a from-scratch regime where a model is trained entirely with latent thoughts to solve a given task. Using Logit Lens and entity-level probing to analyze internal representations, we find that only models trained from scratch exhibit signs of using superposition. In the training-free and fine-tuned regimes, we find that the superposition either collapses or is not used at all, with models discovering shortcut solutions instead. We argue that this is due to two complementary phenomena: i) pretraining on natural language data biases models to commit to a token in the last layers ii) capacity has a huge effect on which solutions a model favors. Together, our results offer a unified explanation for when and why superposition arises in continuous chain-of-thought reasoning, and identify the conditions under which it collapses.
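The training-free construction described in the abstract, a latent thought formed as a convex combination of token embeddings, can be sketched as follows. The embedding table, the probabilities, and the function names `soft_thought` and `argmax_thought` are hypothetical; the sketch only shows the arithmetic and the discrete baseline it is naturally compared against.

```python
import numpy as np

def soft_thought(probs, embeddings):
    """Convex combination of token embeddings weighted by next-token probabilities.

    probs: (vocab,) non-negative weights summing to 1.
    embeddings: (vocab, d_model).
    """
    probs = np.asarray(probs, dtype=float)
    if np.any(probs < 0) or not np.isclose(probs.sum(), 1.0):
        raise ValueError("probs must be a probability distribution")
    return probs @ embeddings

def argmax_thought(probs, embeddings):
    """Discrete baseline: the embedding of the single most likely token."""
    return embeddings[int(np.argmax(probs))]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 8))          # toy table: 5 tokens, d_model = 8
probs = np.array([0.5, 0.3, 0.2, 0.0, 0.0])   # three candidate tokens

soft = soft_thought(probs, embeddings)
hard = argmax_thought(probs, embeddings)
one_hot_soft = soft_thought(np.eye(5)[0], embeddings)
```

When the distribution is one-hot, the soft thought equals the argmax embedding exactly, so any extra expressivity can only come from steps where probability mass is spread over several tokens.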
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes whether language models employ superposition in latent continuous chain-of-thought reasoning. It compares three training regimes—training-free construction of latent thoughts, fine-tuning a base model, and training from scratch—and uses Logit Lens and entity-level probing to argue that only the from-scratch regime shows evidence of superposition, with the others collapsing to single solutions or shortcuts due to pretraining biases and capacity constraints.
Significance. Should the empirical distinctions hold under more rigorous validation, the results would provide a principled account of when and why superposition appears in latent reasoning, with implications for model training strategies and the design of continuous reasoning architectures. The identification of complementary phenomena (pretraining bias and capacity) is a notable contribution if substantiated.
major comments (2)
- [Abstract] The abstract asserts specific findings regarding the use of superposition in different regimes but supplies no experimental details, controls, quantitative metrics, or error analysis. This makes it difficult to evaluate the robustness of the claim that only from-scratch models exhibit superposition.
- [Analysis of internal representations] The central claim depends on Logit Lens and entity-level probing distinguishing superposition from collapsed or shortcut representations. However, the manuscript does not address documented limitations of Logit Lens in capturing internal computations or validate that entity-level probing can differentiate true simultaneous encoding from averaged or sequential processing. This is load-bearing for the regime comparisons.
minor comments (1)
- [Abstract] The phrase 'signs of using superposition' in the abstract and results could be clarified by specifying the exact quantitative criteria or thresholds applied to the probing outputs.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which has helped us clarify and strengthen the presentation of our results. We address each major comment point by point below, with revisions made to the manuscript where appropriate to improve robustness and transparency.
Point-by-point responses
- Referee: [Abstract] The abstract asserts specific findings regarding the use of superposition in different regimes but supplies no experimental details, controls, quantitative metrics, or error analysis. This makes it difficult to evaluate the robustness of the claim that only from-scratch models exhibit superposition.
  Authors: We agree that the abstract is a high-level summary and, as such, omits full experimental details by design. To address this concern, we have revised the abstract to include key quantitative metrics (e.g., average probing accuracies and superposition indices with standard deviations across multiple runs) and a brief mention of controls. Full details on experimental setups, controls, quantitative results, and error analysis remain in Sections 3 and 4, with additional tables in the appendix. This revision maintains brevity while providing readers with sufficient information to assess the claims.
  Revision: yes
- Referee: [Analysis of internal representations] The central claim depends on Logit Lens and entity-level probing distinguishing superposition from collapsed or shortcut representations. However, the manuscript does not address documented limitations of Logit Lens in capturing internal computations or validate that entity-level probing can differentiate true simultaneous encoding from averaged or sequential processing. This is load-bearing for the regime comparisons.
  Authors: We acknowledge the documented limitations of Logit Lens (e.g., its tendency to reflect later-layer biases rather than full internal computations) and have added a dedicated discussion subsection (now Section 4.3) citing relevant prior work on these issues, along with how our multi-method approach (combining Logit Lens with probing) mitigates them. For entity-level probing, we have included new validation experiments in Appendix C: we construct synthetic baselines for averaged and sequential representations and show that our probing method yields distinct signatures for true superposition (simultaneous multi-entity activation) versus these alternatives. These additions directly support the validity of our regime comparisons.
  Revision: yes
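To make the distinction between simultaneous and sequential encoding concrete, the toy below builds synthetic representations of both kinds and applies a per-entity linear readout at each step. It is not a reconstruction of the validation experiments described in the rebuttal: the entity directions, the `encode` and `probe` helpers, and the 0.5 threshold are hypothetical, and the probe uses the known directions as an oracle rather than being trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, d_model = 4, 16

# Hypothetical entity directions: orthonormal rows obtained from a QR decomposition.
entity_dirs = np.linalg.qr(rng.normal(size=(d_model, n_entities)))[0].T

def encode(active_sets):
    """One vector per step: the sum of the directions of the entities active at that step."""
    return np.stack([entity_dirs[sorted(s)].sum(axis=0) for s in active_sets])

def probe(reps, threshold=0.5):
    """Per-step set of entities whose linear readout exceeds the threshold."""
    scores = reps @ entity_dirs.T
    return [set(np.flatnonzero(row > threshold).tolist()) for row in scores]

superposed = encode([{0, 2}, {0, 2}])   # both entities present at every step
sequential = encode([{0}, {2}])         # one entity per step, in turn

superposed_readout = probe(superposed)
sequential_readout = probe(sequential)
```

Here the superposed signature is two entities decodable at the same step, whereas the sequential signature is one entity per step. Separating superposition from an averaged representation would need more than this linear readout, since an average of two directions is also linearly decodable for both entities.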
Circularity Check
No significant circularity: purely empirical observational study
The paper conducts an empirical analysis across training-free, fine-tuned, and from-scratch regimes, using Logit Lens and entity-level probing to measure internal representations and observe differences in superposition behavior. No derivations, equations, fitted parameters renamed as predictions, or self-citations that bear the load of the central claims are present. All conclusions rest on external experimental measurements and observations rather than reducing to self-definitions or ansatzes by construction.
Figure 7: Token predictions at the final layer (L=63), extreme KL divergence. Top 3 tokens at the output layer for time steps with the largest (top, "Maximum KL") and smallest (bottom, "Minimum KL") KL divergence between soft and argmax representations in QwQ-32B. Each column represents a problem instance (GSM8K 42, MATH500 10, MATH500 55, AIME 3, AIME 5).
Figure 8: Token predictions at the final layer (L=27), extreme KL divergence. Top 3 predicted tokens at the output layer for time steps with the largest (top, "Maximum KL") and smallest (bottom, "Minimum KL") KL divergence between soft and argmax representations in Qwen2-1.5B. Each column represents a problem instance (GSM8K 42, MATH500 10, MATH500 55, AIME 3, AIME 5).