arxiv: 2606.19404 · v1 · pith:7YIW7A4Pnew · submitted 2026-06-17 · 💻 cs.LG · cs.CL

Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

Pith reviewed 2026-06-26 21:36 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords hallucination detection · large language models · graph Laplacian · free energy · spectral form factor · random matrix theory · attention mechanisms · thermodynamic signatures

The pith

Free-energy signatures from attention Laplacians detect LLM hallucinations more accurately than prior spectral summaries without retraining the model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that treating each layer's attention-derived graph Laplacian as a Hamiltonian and extracting its thermodynamic potentials plus spectral form factor produces descriptors that carry stronger signal about reasoning quality than summaries limited to a few eigenvalues. These Free-Energy Signatures are proved Lipschitz-stable under attention perturbations, expressive enough to approximate moment-derived spectral functionals under stated regularity conditions, and supported by a finite-sample PAC bound on the AUROC of a training-free detector. Across six open-weight models and six benchmarks the approach yields the highest aggregate AUROC among attention-spectral baselines while leaving the underlying LLM unchanged. In the unsupervised regime an RMT-deviation score alone reaches mean AUROC 0.71, and the spectral statistics of correct generations are more Wigner-Dyson-like whereas those of hallucinations are more Poisson-like.

Core claim

Treating the spectrum of an attention-derived graph Laplacian as the energy levels of a Hamiltonian yields Free-Energy Signatures (partition function, free energy, spectral entropy, heat capacity, and random-matrix spectral form factor) that enrich finite spectral summaries, remain stable under small attention changes, and support both supervised and unsupervised hallucination detectors whose performance exceeds that of earlier eigenvalue-based baselines.
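
The page names these quantities without stating them. As a reading aid, the standard canonical-ensemble definitions are written out here for Laplacian eigenvalues λ_1, …, λ_N at one layer. The inverse temperature β, the time variable t, the Boltzmann weights p_i, and the internal energy U are notation assumed for this sketch; the paper's own normalization, β grid, and form-factor convention may differ.

```latex
\begin{align}
Z(\beta) &= \sum_{i=1}^{N} e^{-\beta \lambda_i}, &
p_i(\beta) &= \frac{e^{-\beta \lambda_i}}{Z(\beta)}, \\
F(\beta) &= -\frac{1}{\beta} \ln Z(\beta), &
U(\beta) &= \sum_{i=1}^{N} p_i(\beta)\, \lambda_i, \\
S(\beta) &= -\sum_{i=1}^{N} p_i \ln p_i = \beta \bigl( U - F \bigr), &
C(\beta) &= \beta^{2} \Bigl( \sum_{i=1}^{N} p_i \lambda_i^{2} - U^{2} \Bigr), \\
g(t) &= \frac{1}{N^{2}} \Bigl| \sum_{j=1}^{N} e^{-\mathrm{i} \lambda_j t} \Bigr|^{2}.
\end{align}
```

The identity S = β(U − F) follows from substituting ln p_i = −βλ_i − ln Z into the entropy sum, and C is β² times the variance of the eigenvalues under the weights p_i.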

What carries the argument

Free-Energy Signatures (Fes), obtained by interpreting each layer's attention Laplacian as a Hamiltonian and computing its thermodynamic potentials together with the random-matrix-theory spectral form factor.
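
A minimal sketch of how such a descriptor could be computed for a single attention map, assuming NumPy, a combinatorial Laplacian L = D − W built from the symmetrized attention weights, and textbook definitions of the potentials. The function names, the default β, and the time grid are hypothetical; the paper's exact construction is not given on this page.

```python
import numpy as np

def attention_laplacian(attn):
    """Symmetrize a row-stochastic attention map and return L = D - W."""
    a = np.asarray(attn, dtype=float)
    w = 0.5 * (a + a.T)           # symmetrized edge weights
    np.fill_diagonal(w, 0.0)      # self-loops cancel in D - W, so drop them
    return np.diag(w.sum(axis=1)) - w

def free_energy_signature(attn, beta=1.0, times=(0.5, 1.0, 2.0)):
    """Return Z, F, S, C and the form factor g(t) for one attention map.

    beta and times are illustrative choices, not values from the paper.
    """
    lam = np.linalg.eigvalsh(attention_laplacian(attn))  # ascending eigenvalues
    shifted = -beta * lam
    m = shifted.max()
    log_z = m + np.log(np.exp(shifted - m).sum())        # stable log-sum-exp
    p = np.exp(shifted - log_z)                          # Boltzmann weights
    u = float(p @ lam)                                   # internal energy
    f = float(-log_z / beta)                             # free energy
    s = beta * (u - f)                                   # spectral entropy
    c = beta**2 * float(p @ lam**2 - u**2)               # heat capacity
    g = [float(abs(np.exp(-1j * lam * t).sum()) ** 2 / lam.size**2)
         for t in times]
    return {"Z": float(np.exp(log_z)), "F": f, "S": s, "C": c, "g": g}

# Toy input: uniform attention over two tokens.
sig = free_energy_signature(np.array([[0.5, 0.5], [0.5, 0.5]]))
```

Calling this once per layer and concatenating the results would give a feature vector in the spirit of the Φ(x) described in the Figure 1 caption, which a logistic probe could then consume.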

If this is right

  • A lightweight probe using Fes descriptors achieves the highest aggregate AUROC among attention-spectral baselines on six models and six benchmarks.
  • An unsupervised RMT-deviation score alone reaches mean AUROC 0.71 without any labeled data.
  • The spectral statistics of correct generations are more Wigner-Dyson-like, while those of hallucinations are more Poisson-like.
  • Fes descriptors remain Lipschitz-stable under attention perturbations and approximate moment-derived spectral functionals under the stated regularity and grid-resolution conditions.
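
One standard way to place a spectrum between the Poisson and Wigner-Dyson regimes is the mean ratio of consecutive level spacings, which needs no spectral unfolding. The sketch below uses that statistic for illustration; whether the paper's RMT-deviation score is built on it is not stated on this page, and `poisson_likeness` is a hypothetical helper.

```python
import math
import random

POISSON_MEAN_R = 2 * math.log(2) - 1   # about 0.386 for uncorrelated levels
GOE_MEAN_R = 0.5307                    # numerical Wigner-Dyson (GOE) value

def mean_spacing_ratio(eigenvalues, tol=1e-12):
    """Average of min(s_n, s_{n+1}) / max(s_n, s_{n+1}) over adjacent gaps."""
    lam = sorted(eigenvalues)
    gaps = [b - a for a, b in zip(lam, lam[1:])]
    ratios = [min(s, t) / max(s, t)
              for s, t in zip(gaps, gaps[1:]) if max(s, t) > tol]
    if not ratios:
        raise ValueError("not enough non-degenerate levels")
    return sum(ratios) / len(ratios)

def poisson_likeness(eigenvalues):
    """Hypothetical score: 0 is GOE-like, 1 is Poisson-like (clipped)."""
    r = mean_spacing_ratio(eigenvalues)
    x = (GOE_MEAN_R - r) / (GOE_MEAN_R - POISSON_MEAN_R)
    return min(1.0, max(0.0, x))

# Synthetic check: independent uniform levels should look Poisson-like.
random.seed(0)
levels = [random.random() for _ in range(20000)]
r_uniform = mean_spacing_ratio(levels)
```

Rigid, evenly spaced levels give a ratio of 1 and therefore a score of 0, while independent levels land near the Poisson value.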

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Hamiltonian treatment could be applied to other graph constructions inside neural networks to extract thermodynamic diagnostics for tasks beyond hallucination detection.
  • The observed shift from Wigner-Dyson to Poisson statistics suggests that loss of spectral chaos may serve as a general marker of failure modes in sequential generation.
  • Because the method requires only attention weights, it could be adapted to monitor closed models through API-exposed attention if such access becomes available.

Load-bearing premise

The attention-derived graph Laplacian can be treated as a Hamiltonian whose thermodynamic potentials and random-matrix statistics meaningfully capture differences in reasoning quality.

What would settle it

Run the Fes probe and the RMT-deviation score on a fresh collection of LLM generations whose correctness has been verified by an independent oracle and check whether the reported AUROC margins and the Wigner-Dyson versus Poisson distinction persist.
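
The replication check comes down to recomputing AUROC on independently labeled generations. A minimal, dependency-free sketch of the pairwise definition follows; the scores and labels are made-up placeholders, not data from the paper.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. Quadratic in the sample size, which is
    acceptable for a sketch but slow for large evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Placeholder detector scores; label 1 marks a hallucination.
example = auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])  # 3 of 4 pairs ordered correctly
```

Running the same function on the probe scores and on the RMT-deviation scores for the fresh collection would show whether the reported margins persist.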

Figures

Figures reproduced from arXiv: 2606.19404 by Salim Khazem.

Figure 1: FES pipeline. The post-softmax attention map at each layer is symmetrized into a graph Laplacian; its spectrum is summarized by the thermodynamic functionals Z, F, S, C and the spectral form factor g. Concatenation across layers yields Φ(x), which a logistic probe maps to a hallucination score.
Figure 2: Thermodynamic functionals separate valid …
Figure 3: Toy validation of the RMT framing. Left: …
Figure 4: Spectral form factor on real Llama-3-8B / …
Figure 5: Mean AUROC across the full 6 × 6 (model, dataset) grid. FES attains the highest mean AUROC (0.763), outperforming the strongest spectral baseline LapEig by +6.5 points and the four-feature GoR-4 by +2.4 points. The reported numbers for hidden-state methods (INSIDE, ICR, HSAD) place them in a comparable AUROC range to FES on tasks evaluated in their original papers, though direct comparison requires identi…
Figure 6: Sensitivity of mean AUROC to the inverse …
Figure 7: Per-layer probe-weight norm ‖w_ℓ‖₂ for the FES-probe. Top: HaluEval probe on Llama-3-8B; weight concentrates in layers 18–26, with negligible mass in the first 12 layers. Bottom: cross-dataset heatmap on Llama-3-8B; TruthfulQA and MATH-500 lean later (layers 22–30), while HaluEval is centred on layers 16–22. No single layer is sufficient; FES aggregates evidence across the discriminative band.
Original abstract

Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral diagnostics, however, summarize the Laplacian spectrum by a handful of eigenvalues or hand-picked scalars, leaving most of its structure unused. We propose Free-Energy Signatures (Fes), a spectral descriptor that treats each layer's attention Laplacian as a Hamiltonian and extracts its thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) together with the random-matrix-theory (RMT) spectral form factor. We prove three results: (i) Lipschitz stability of Fes under attention perturbation; (ii) an expressiveness result showing that Fes enriches finite spectral summaries and approximates moment-derived spectral functionals under explicit regularity and grid-resolution assumptions; and (iii) a finite-sample PAC bound on the AUROC of a training-free detector built from Fes. Empirically, across six open-weight LLMs and six benchmarks, a lightweight probe on Fes descriptors achieves the strongest aggregate AUROC among attention-spectral baselines, improving over LapEig by +6.5 AUROC points and over GoR-4 by +2.4 points on average, while requiring no update to the underlying LLM. In the fully unsupervised setting, an RMT-deviation score achieves mean AUROC 0.71, providing a label-free but weaker detector. A complementary RMT analysis shows that correct generations exhibit more Wigner-Dyson-like spectral statistics, whereas hallucinations exhibit more Poisson-like statistics. The anonymized code and config are provided in the supplementary material.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Free-Energy Signatures (Fes), which treat attention-derived graph Laplacians as Hamiltonians and extract thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) together with the RMT spectral form factor. It claims three theoretical results—Lipschitz stability of Fes, an expressiveness result showing enrichment of finite spectral summaries under regularity and grid-resolution assumptions, and a finite-sample PAC bound on AUROC for a training-free detector—and reports that a lightweight Fes probe achieves the highest aggregate AUROC across six LLMs and six benchmarks (+6.5 over LapEig, +2.4 over GoR-4), with an unsupervised RMT-deviation score reaching mean AUROC 0.71. Code is provided.

Significance. If the regularity and grid-resolution assumptions hold for real attention Laplacians and the empirical gains prove robust, Fes would supply a new, training-free spectral-thermodynamic lens on reasoning quality that goes beyond hand-picked eigenvalue summaries. The provision of anonymized code and config is a clear strength for reproducibility.

major comments (2)
  1. [Abstract] Abstract (statement of expressiveness result and PAC bound): both results are explicitly conditioned on regularity and grid-resolution assumptions on the Laplacian treated as Hamiltonian, yet the manuscript provides no verification that the attention matrices of the six evaluated LLMs satisfy these assumptions at the layer resolutions used; if the assumptions fail, the claimed enrichment and PAC guarantee do not apply.
  2. [Empirical results] Empirical section (AUROC results): aggregate improvements are reported without error bars, without details on data exclusion criteria, hyperparameter choices, or statistical significance testing of the +6.5 / +2.4 point gains, leaving the robustness of the headline claim unexamined.
minor comments (1)
  1. [Methods] Notation for the thermodynamic potentials and the precise definition of the RMT spectral form factor should be stated explicitly in the main text rather than deferred to supplementary material.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

  1. Referee: [Abstract] Abstract (statement of expressiveness result and PAC bound): both results are explicitly conditioned on regularity and grid-resolution assumptions on the Laplacian treated as Hamiltonian, yet the manuscript provides no verification that the attention matrices of the six evaluated LLMs satisfy these assumptions at the layer resolutions used; if the assumptions fail, the claimed enrichment and PAC guarantee do not apply.

    Authors: We acknowledge that the expressiveness result and PAC bound are conditioned on the regularity and grid-resolution assumptions, which are stated explicitly in the theoretical sections. The current manuscript does not include verification that these hold for the attention Laplacians of the six evaluated LLMs. In revision we will add a dedicated subsection reporting empirical checks (e.g., bounded operator norms, spectral smoothness, and effective grid resolution) on the attention matrices from the models and layers used; the checks will either confirm applicability of the guarantees or qualify their scope. revision: yes

  2. Referee: [Empirical results] Empirical section (AUROC results): aggregate improvements are reported without error bars, without details on data exclusion criteria, hyperparameter choices, or statistical significance testing of the +6.5 / +2.4 point gains, leaving the robustness of the headline claim unexamined.

    Authors: We agree that the empirical results require additional detail to substantiate robustness. The revised manuscript will report error bars (standard deviation across seeds or bootstrap estimates), explicit data-exclusion criteria and preprocessing steps, full hyperparameter specifications for the Fes probe and baselines, and results of statistical significance tests (e.g., paired Wilcoxon or t-tests) on the AUROC differences. revision: yes
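
The bootstrap error bars mentioned in the response could be produced along the following lines: a paired percentile bootstrap that resamples examples jointly for both detectors and reports a 95% interval on the AUROC difference. This is a sketch under assumptions. The function name, the number of resamples, and the synthetic scores are illustrative and are not the authors' protocol.

```python
import random

def auroc(scores, labels):
    """Pairwise AUROC with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_gain(scores_a, scores_b, labels, n_boot=1000, seed=0):
    """Percentile 95% interval for AUROC(a) - AUROC(b)."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:                         # one class missing; redraw
            continue
        a = auroc([scores_a[i] for i in idx], ys)
        b = auroc([scores_b[i] for i in idx], ys)
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

# Synthetic sanity check: a perfect detector against a constant one.
labels = [0, 1] * 20
perfect = [float(y) for y in labels]
constant = [0.5] * len(labels)
low, high = paired_bootstrap_gain(perfect, constant, labels)
```

An interval that excludes zero on each (model, dataset) cell would support the headline gains; resampling the same indices for both detectors keeps the comparison paired.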

Circularity Check

0 steps flagged

No circularity; derivation chain is self-contained

The paper defines Fes by treating attention Laplacians as Hamiltonians and extracting thermodynamic quantities plus RMT form factor. It proves Lipschitz stability, an expressiveness result under explicit regularity/grid-resolution assumptions on the input Laplacian, and a PAC bound on the detector. These are standard conditional proofs, not reductions of the claimed AUROC gains to the definitions themselves. The reported empirical improvements (+6.5 AUROC over LapEig) are measured against external baselines on six LLMs and benchmarks; the unsupervised RMT-deviation score uses raw spectral statistics without parameter fitting. No self-citation is load-bearing, no fitted input is relabeled as prediction, and no ansatz is smuggled. The construction is therefore independent of its outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

Abstract-only review limits visibility into explicit free parameters or invented entities; the method relies on standard mathematical assumptions for stability and PAC bounds plus the modeling choice that the Laplacian spectrum behaves thermodynamically.

axioms (3)
  • [standard math] Lipschitz stability of Fes under attention perturbation
    Stated as proved result (i); invoked to support reliability of the descriptor.
  • [domain assumption] Expressiveness of Fes under explicit regularity and grid-resolution assumptions
    Stated as proved result (ii); required for the claim that Fes enriches finite spectral summaries.
  • [standard math] Finite-sample PAC bound on AUROC of the Fes-based detector
    Stated as proved result (iii); underpins the statistical guarantee for the training-free detector.
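
The stability entry can be probed numerically at the level of eigenvalues. By Weyl's inequality, each eigenvalue of a symmetric matrix moves by at most the spectral norm of the perturbation, and a weighted graph Laplacian has operator norm at most twice its largest degree. The sketch below checks both on random softmax attention maps. It assumes NumPy and a combinatorial Laplacian of the symmetrized attention, and it does not reproduce the paper's Lipschitz constant for Fes, which is not given on this page.

```python
import numpy as np

def laplacian(attn):
    """Combinatorial Laplacian of the symmetrized attention graph."""
    w = 0.5 * (attn + attn.T)
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w

def random_attention(n, rng):
    """Row-wise softmax of Gaussian logits (synthetic stand-in for attention)."""
    logits = rng.normal(size=(n, n))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def stability_report(attn, perturbed):
    l0, l1 = laplacian(attn), laplacian(perturbed)
    lam0, lam1 = np.linalg.eigvalsh(l0), np.linalg.eigvalsh(l1)
    return {
        "op_norm": float(np.linalg.norm(l0, 2)),              # largest singular value
        "max_eig_shift": float(np.abs(lam0 - lam1).max()),    # sorted eigenvalue shift
        "perturbation_norm": float(np.linalg.norm(l1 - l0, 2)),
    }

rng = np.random.default_rng(0)
a = random_attention(16, rng)
b = random_attention(16, rng)
mixed = 0.95 * a + 0.05 * b        # convex mix, so rows still sum to one
report = stability_report(a, mixed)
```

On real attention maps the same report, computed per layer, is one way to carry out the operator-norm checks that the rebuttal proposes.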

pith-pipeline@v0.9.1-grok · 5843 in / 1564 out tokens · 26801 ms · 2026-06-26T21:36:28.832738+00:00 · methodology

