When Self-Reference Fails to Close: Matrix-Level Dynamics in Large Language Models
Pith reviewed 2026-05-10 14:54 UTC · model grok-4.3
The pith
Non-closing truth recursion disrupts attention matrices in large language models more than stable self-reference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Prompts that induce non-closing truth recursion (NCTR) produce anomalously elevated attention effective rank and variance kurtosis, with Cohen's d of 3.14 and 3.52 respectively versus stable self-reference in the 70B model. After FDR correction, 281 of 397 metric-model combinations separate NCTR from stable self-reference, and per-layer SVD shows the disruption at every sampled layer.
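For readers unfamiliar with the effect-size scale used here, the following is a minimal sketch of Cohen's d under the pooled-standard-deviation convention. Whether the paper uses this exact convention is an assumption, and the numbers are hypothetical toy values, not data from the paper.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (one common convention)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical per-prompt metric values for two conditions (illustration only).
nctr = [5.0, 6.0, 7.0]
stable = [2.0, 3.0, 4.0]
d = cohens_d(nctr, stable)
print(d)
```

A d above 3 means the two group means sit more than three pooled standard deviations apart, so the two distributions barely overlap.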
What carries the argument
The 14-level prompt hierarchy that isolates non-closing truth recursion (NCTR) as truth computations without finite-depth closure, measured through 106 scalar metrics on attention and other matrices together with per-layer singular value decomposition.
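As a rough illustration of a per-layer SVD metric, the sketch below computes an entropy-based effective rank for each layer's attention matrix. The entropy-based definition, the function names, and the one-matrix-per-layer input layout are all assumptions made for illustration; the page does not give the paper's formula.

```python
import numpy as np

def effective_rank(matrix, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of normalized singular values."""
    s = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero singular values before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))

def per_layer_effective_rank(attn_by_layer):
    """attn_by_layer: iterable of 2-D attention matrices, one per layer (hypothetical layout)."""
    return [effective_rank(a) for a in attn_by_layer]

# Toy inputs: a uniform attention map is rank 1; an identity map has full rank 4.
uniform = np.full((4, 4), 0.25)
identity = np.eye(4)
ranks = per_layer_effective_rank([uniform, identity])
print(ranks)
```

Reporting the value layer by layer, rather than averaging over layers first, is what lets a per-layer analysis rule out aggregation artifacts.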
If this is right
- NCTR prompts increase contradictory output by 34 to 56 percentage points relative to controls.
- Disruption registers at every sampled layer, with per-layer effect sizes above 1.0 in all three models given the per-layer SVD analysis.
- 43 of the 106 metrics replicate the NCTR distinction across all four models from three architecture families.
- A classifier trained on the internal metrics reaches AUC values between 0.81 and 0.90.
- Thirty minimal prompt pairs differing only in recursion closure still separate on 42 of 387 metric-model combinations.
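The AUC figures in the list above can be read as the probability that a randomly chosen NCTR prompt receives a higher classifier score than a randomly chosen non-NCTR prompt. A minimal sketch of that pairwise definition follows; the scores and labels are hypothetical, and the paper's classifier itself is not reproduced here.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Hypothetical classifier scores; label 1 marks an NCTR prompt.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(auc)
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise form is quadratic in the number of prompts, which is acceptable at the scale of a few hundred prompts.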
Where Pith is reading between the lines
- The reported link to matrix-semigroup problems implies that algebraic results on semigroup convergence might predict which recursion patterns will trigger collapse.
- Architectures that explicitly track recursion depth during inference could avoid the elevated rank regimes observed here.
- Training corpora containing many unresolved self-referential statements may embed similar dynamical sensitivities that surface at inference time.
Load-bearing premise
The 14-level hierarchy and NCTR definition correctly isolate the driver of the observed matrix changes without confounds from prompt wording or metric aggregation.
What would settle it
Finding no elevation in attention effective rank, or no separation between NCTR and stable self-reference, when the same metrics are applied to new models or to alternative layer-wise decompositions.
Original abstract
We investigate how self-referential inputs alter the internal matrix dynamics of large language models. Measuring 106 scalar metrics across up to 7 analysis passes on four models from three architecture families -- Qwen3-VL-8B, Llama-3.2-11B, Llama-3.3-70B, and Gemma-2-9B -- over 300 prompts in a 14-level hierarchy at three temperatures ($T \in \{0.0, 0.3, 0.7\}$), we find that self-reference alone is not destabilizing: grounded self-referential statements and meta-cognitive prompts are markedly more stable than paradoxical self-reference on key collapse-related metrics, and on several such metrics can be as stable as factual controls. Instability concentrates in prompts inducing non-closing truth recursion (NCTR) -- truth-value computations with no finite-depth resolution. NCTR prompts produce anomalously elevated attention effective rank -- indicating attention reorganization with global dispersion rather than simple concentration collapse -- and key metrics reach Cohen's $d = 3.14$ (attention effective rank) to $3.52$ (variance kurtosis) vs. stable self-reference in the 70B model; 281/397 metric-model combinations differentiate NCTR from stable self-reference after FDR correction ($q < 0.05$), 198 with $|d| > 0.8$. Per-layer SVD confirms disruption at every sampled layer ($d > +1.0$ in all three models analyzed), ruling out aggregation artifacts. A classifier achieves AUC $0.81$-$0.90$; 30 minimal pairs yield 42/387 significant combinations; 43/106 metrics replicate across all four models. We connect these observations to three classical matrix-semigroup problems and propose, as a conjecture, that NCTR forces finite-depth transformers toward dynamical regimes where these problems concentrate. NCTR prompts also produce elevated contradictory output ($+34$-$56$ percentage points vs. controls), suggesting practical relevance for understanding self-referential failure modes.
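The abstract reports results "after FDR correction" at q < 0.05 and the paper cites Benjamini and Hochberg. A minimal sketch of the Benjamini-Hochberg step-up procedure follows, assuming that is the correction used; the p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at false discovery rate q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value is compared against q * i / m.
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank that meets its threshold
        reject[order[: k + 1]] = True    # reject that hypothesis and all smaller p-values
    return reject

# Hypothetical p-values, one per metric-model combination.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.20]
rejected = benjamini_hochberg(p_vals, q=0.05)
print(rejected.tolist())
```

In this toy case the third and fourth p-values fall below 0.05 but do not survive, because their rank-scaled thresholds are 0.03 and 0.04. Counts such as 281/397 come from applying a filter of this kind.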
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates how self-referential inputs affect internal matrix dynamics in LLMs by computing 106 scalar metrics (including attention effective rank and variance statistics) across four models (Qwen3-VL-8B, Llama-3.2-11B, Llama-3.3-70B, Gemma-2-9B), a 14-level prompt hierarchy, and three temperatures. It reports that instability is concentrated in non-closing truth recursion (NCTR) prompts, which produce elevated attention effective rank and other collapse-related metrics with large effect sizes (Cohen's d = 3.14 to 3.52 vs. stable self-reference), 281/397 significant metric-model differences after FDR correction, per-layer SVD disruption at every sampled layer, a classifier with AUC 0.81-0.90, and elevated contradictory outputs (+34-56 pp). Stable self-reference and meta-cognitive prompts are markedly more stable and, on several metrics, comparable to factual controls. The work connects the observations to matrix-semigroup problems via an explicit conjecture.
Significance. If the NCTR isolation is robust, the scale of the analysis (300 prompts, 106 metrics, multi-model replication with 43/106 metrics consistent across all four models, 30 minimal pairs, and per-layer verification) provides a strong observational foundation for linking specific prompt structures to matrix-level reorganization in transformers. The large effect sizes and FDR-controlled results across architectures strengthen the empirical contribution, while the conjecture offers a potential bridge to classical linear-algebra problems even if left speculative.
major comments (2)
- [Prompt hierarchy and NCTR definition (abstract and §3)] The central attribution of matrix instability to NCTR (non-closing truth recursion) rather than prompt surface features requires that the 14-level hierarchy and minimal-pair design isolate the recursive property. The manuscript does not report explicit statistical matching or regression controls for token length, syntactic nesting depth, or lexical properties between NCTR and stable self-reference conditions. Although 30 minimal pairs and cross-model replication (43/106 metrics) are presented, these do not substitute for surface-feature controls; the reported d > 3 effects could still contain contributions from such confounds. This is load-bearing for the causal interpretation of the NCTR contrast.
- [Methods and metric definitions (§4)] The per-layer SVD analysis rules out simple aggregation artifacts, but the manuscript provides no explicit description of how the 106 scalar metrics are derived from the attention and key/value matrices (e.g., exact formulas for effective rank and kurtosis) or whether metric selection was pre-specified versus post-hoc. Without this or access to the metric definitions and raw per-layer matrices, it is difficult to evaluate whether the 281/397 significant combinations reflect genuine dynamical differences or metric-construction choices.
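For orientation on the second major comment, these are two textbook definitions that "effective rank" and "kurtosis" commonly denote. They are given here as assumptions about what such metrics could look like, not as the manuscript's formulas, which this report says are not stated.

```latex
% Entropy-based effective rank of a matrix A with singular values \sigma_1, \dots, \sigma_r
p_i = \frac{\sigma_i}{\sum_{j=1}^{r} \sigma_j}, \qquad
\operatorname{erank}(A) = \exp\!\Big(-\sum_{i=1}^{r} p_i \log p_i\Big)

% Excess kurtosis of values x_1, \dots, x_n with mean \mu and standard deviation s
\kappa = \frac{\frac{1}{n}\sum_{k=1}^{n} (x_k - \mu)^4}{s^4} - 3
```

Under the first definition, effective rank ranges from 1 (a single dominant singular value) to r (all singular values equal), so "elevated effective rank" corresponds to a flatter singular-value spectrum.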
minor comments (3)
- [Abstract] The abstract is information-dense; consider moving some quantitative details (e.g., exact counts of significant combinations) to a results table or bullet list for readability.
- [Introduction] Ensure the first use of 'NCTR' and 'attention effective rank' includes a concise parenthetical definition even if expanded later.
- [Discussion] The conjecture linking NCTR to matrix-semigroup problems is clearly labeled as such; a short appendix sketching the three classical problems and the observed metric mappings would help readers assess its plausibility without altering the empirical claims.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the presentation of our empirical findings on NCTR-induced matrix instability. We respond to each major comment below.
- Referee: [Prompt hierarchy and NCTR definition (abstract and §3)] The central attribution of matrix instability to NCTR (non-closing truth recursion) rather than prompt surface features requires that the 14-level hierarchy and minimal-pair design isolate the recursive property. The manuscript does not report explicit statistical matching or regression controls for token length, syntactic nesting depth, or lexical properties between NCTR and stable self-reference conditions. Although 30 minimal pairs and cross-model replication (43/106 metrics) are presented, these do not substitute for surface-feature controls; the reported d > 3 effects could still contain contributions from such confounds. This is load-bearing for the causal interpretation of the NCTR contrast.
  Authors: The 30 minimal pairs were constructed to isolate the non-closing recursion property while matching token length, syntactic structure, and lexical content as closely as possible within the constraints of the prompt hierarchy. We also report cross-model replication for 43 metrics and per-layer consistency. However, we acknowledge that we did not perform explicit regression-based controls for residual surface features in the submitted version. In the revision, we will add statistical comparisons of token lengths and nesting depths across conditions, along with a linear regression analysis controlling for these variables when predicting the key metrics. This will directly test whether the large effect sizes persist after accounting for potential confounds, strengthening the causal attribution to NCTR.
  Revision: yes
- Referee: [Methods and metric definitions (§4)] The per-layer SVD analysis rules out simple aggregation artifacts, but the manuscript provides no explicit description of how the 106 scalar metrics are derived from the attention and key/value matrices (e.g., exact formulas for effective rank and kurtosis) or whether metric selection was pre-specified versus post-hoc. Without this or access to the metric definitions and raw per-layer matrices, it is difficult to evaluate whether the 281/397 significant combinations reflect genuine dynamical differences or metric-construction choices.
  Authors: We agree that greater transparency in metric derivation is necessary. The 106 metrics were pre-specified based on established measures of matrix stability in the linear algebra and neural network literature prior to data collection. In the revised manuscript, we will expand §4 with a table and formulas detailing each metric, including the precise definition of effective rank (normalized sum of singular values) and kurtosis of the attention weight distributions. We will also make the full extraction code and a subset of the raw per-layer attention matrices available as supplementary material to enable independent verification. This addresses the concern about potential metric-construction artifacts.
  Revision: yes
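The regression control promised in the first response could look roughly like the sketch below: an ordinary least-squares fit of a metric on an NCTR indicator plus surface-feature covariates. The function name, the covariates chosen, and the synthetic data are all hypothetical; this is not the authors' analysis.

```python
import numpy as np

def condition_effect_controlling_for(metric, is_nctr, covariates):
    """OLS coefficient on the NCTR indicator after adjusting for the given covariates."""
    y = np.asarray(metric, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                       # intercept
        np.asarray(is_nctr, dtype=float),      # condition indicator
        np.asarray(covariates, dtype=float),   # surface features, one column each
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Synthetic, noiseless data in which token length is confounded with condition.
rng = np.random.default_rng(0)
n = 200
is_nctr = rng.integers(0, 2, size=n)
token_len = rng.normal(40, 5, size=n) + 10 * is_nctr
depth = rng.integers(1, 5, size=n)
metric = 1.0 + 2.0 * is_nctr + 0.1 * token_len + 0.5 * depth

effect = condition_effect_controlling_for(
    metric, is_nctr, np.column_stack([token_len, depth])
)
raw_gap = metric[is_nctr == 1].mean() - metric[is_nctr == 0].mean()
print(effect, raw_gap)
```

In this synthetic setup the adjusted coefficient recovers the true condition effect of 2.0, while the raw group difference is inflated by the token-length confound. A comparison of this kind is what would show whether d > 3 survives surface-feature controls.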
Circularity Check
No circularity: purely observational empirical measurements with independent metrics
The paper performs statistical comparisons of 106 scalar metrics (including attention effective rank, variance kurtosis, per-layer SVD) across categorized prompt hierarchies (NCTR vs. stable self-reference vs. controls) on multiple models. No derivation chain exists that reduces any reported difference or conjecture to quantities defined by the inputs themselves; the NCTR label is a prompt categorization criterion, not a fitted parameter, and the matrix-semigroup link is explicitly presented as an unproven conjecture rather than a derived result. All claims rest on direct computation and FDR-corrected tests, with no self-definitional loops, fitted-input predictions, or load-bearing self-citations.
Axiom & Free-Parameter Ledger
free parameters (2)
- Temperature set
- Prompt hierarchy depth
axioms (2)
- [standard math] Cohen's d and FDR-corrected tests are appropriate for comparing the 106 metrics across conditions.
- [domain assumption] The 106 scalar metrics plus SVD capture the relevant matrix-level dynamics induced by prompt type.
invented entities (1)
- NCTR (non-closing truth recursion): no independent evidence