Generative Criticality in Large Language Model Temperature Scaling

Huajian Ruan; Jinyang Li; Lingxiao Wang; Xingyu Guo

arxiv: 2606.06238 · v1 · pith:WENSMHBWnew · submitted 2026-06-04 · 💻 cs.LG · cond-mat.stat-mech· hep-lat

Generative Criticality in Large Language Model Temperature Scaling

Huajian Ruan , Jinyang Li , Xingyu Guo , Lingxiao Wang This is my paper

Pith reviewed 2026-06-28 02:36 UTC · model grok-4.3

classification 💻 cs.LG cond-mat.stat-mechhep-lat

keywords large language modelstemperature scalingcritical phenomenasusceptibilityorder parameterintrinsic dimensionphase transitiontoken embeddings

0 comments

The pith

Treating LLM token embeddings as spins on a chain reveals a critical temperature Tc marked by a susceptibility peak and output collapse to one direction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models token embeddings produced by large language models as continuous spin variables arranged on a one-dimensional chain. It defines a susceptibility from the connected two-point correlator of these embeddings and an order parameter from their ensemble average, then scans the softmax temperature T. A sharp peak in susceptibility appears near a characteristic Tc, accompanied by power-law-like scaling, a rapid shift in the order parameter, and a collapse of outputs onto a single semantic direction below Tc. The intrinsic dimension computed by the TwoNN estimator reaches a minimum at the same point. These signatures appear consistently across model sizes from 0.6B to 32B parameters and across prompt categories. A reader might care because the framework supplies concrete statistical-mechanics observables for diagnosing when generated text becomes more ordered or diverse.

Core claim

Defining a susceptibility from the connected two-point correlator and an order parameter from the ensemble-averaged embedding field, we vary the softmax temperature T and observe a sharp susceptibility peak near a characteristic Tc with power-law-like scaling, a concurrent rapid change in the order parameter, and a collapse onto a single semantic direction below Tc. The intrinsic dimension estimated by the two nearest neighbor (TwoNN) method independently corroborates these findings, reaching a minimum near Tc. Results are robust across model scales (Qwen3: 0.6B--32B) and prompt categories. While the phenomenology closely resembles a continuous phase transition, the non-equilibrium nature of

What carries the argument

Susceptibility constructed from the connected two-point correlator of token embeddings treated as continuous spin variables on a one-dimensional chain.

If this is right

Below Tc the generated embeddings collapse onto a single semantic direction.
The intrinsic dimension of the output distribution reaches a minimum near Tc.
The observed scaling of susceptibility near Tc is consistent with power-law behavior.
The same signatures appear across model scales from 0.6B to 32B parameters.
The framework supplies quantitative observables for the collective structure of LLM outputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

An operating temperature near Tc could balance coherence and diversity in generation without external tuning.
The same correlator-based observables might be applied to other sampling methods such as nucleus sampling.
The noted non-equilibrium character implies the transition may be a dynamical crossover rather than a true thermodynamic phase transition.
Prompt engineering could shift the location of Tc by altering the effective interaction range along the token chain.

Load-bearing premise

Token embeddings generated by autoregressive sampling can be treated as continuous spin variables on a one-dimensional chain so that equilibrium statistical-mechanics quantities remain meaningful descriptors of the output distribution.

What would settle it

Repeating the full measurement protocol on a new family of models or with non-autoregressive sampling and finding no susceptibility peak at any temperature.

Figures

Figures reproduced from arXiv: 2606.06238 by Huajian Ruan, Jinyang Li, Lingxiao Wang, Xingyu Guo.

**Figure 1.** Figure 1: Susceptibility χ vs. temperature T for (a) different model scales using Wikipedia prompts (1000 samples, 280 tokens). All models exhibit a peak near Tc ≈ 1.4, with larger models showing higher peak susceptibility. (b) Different prompt categories generated by Qwen3-32B (1000 samples, 300 tokens). 0 5 10 Temperature 2 4 6 8 Susceptibility Â 2 º = 0:203 (dof = 51) Data Fit region Fit, =0.100 Tc = 1.407 (a) 10… view at source ↗

**Figure 2.** Figure 2: Power-law fits of susceptibility near Tc for Qwen3-32B. (a) Full χ(T) curve with the fit region highlighted; (b) log-log plot for T > Tc; (c) log-log plot for T < Tc. Both sides yield a critical exponent γ ≈ 0.1, consistent with power-law divergence. 3 [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Stability analysis of the susceptibility signal. (a) Convergence with increasing ensemble [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Order parameter structure via PCA. (a) Projection onto the first two principal components [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Intrinsic dimension Id vs. temperature for five Qwen3 model scales. All models exhibit a minimum near Tc ≈ 1.3–1.5, after which Id rises sharply and saturates at high T, consistent with the onset of a disordered phase where the configuration space expands. range correlations, and leveraging the critical behavior identified here as a diagnostic for trustworthy AI—for instance, the sharp onset of the disorde… view at source ↗

**Figure 6.** Figure 6: By evaluating the ratio of the second-nearest neighbor distance to the nearest neighbor [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: Scaling laws tests at different sizes. Panels (a), (b), and (c) show exponent fitting for a [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

read the original abstract

We propose a statistical-field framework for text generated by large language models (LLMs), treating token embeddings as continuous spin variables on a one-dimensional chain. Defining a susceptibility from the connected two-point correlator and an order parameter from the ensemble-averaged embedding field, we vary the \texttt{softmax} temperature $T$ and observe a sharp susceptibility peak near a characteristic $T_c$ with power-law-like scaling, a concurrent rapid change in the order parameter, and a collapse onto a single semantic direction below $T_c$. The intrinsic dimension estimated by the two nearest neighbor (TwoNN) method independently corroborates these findings, reaching a minimum near $T_c$. Results are robust across model scales (Qwen3: 0.6B--32B) and prompt categories. While the phenomenology closely resembles a continuous phase transition, the non-equilibrium nature of autoregressive generation warrants further investigation. Our framework provides quantitative tools for probing the collective statistical structure of LLM outputs and suggests connections between decoding strategies and critical phenomena.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper finds a susceptibility peak from embedding correlations at a characteristic temperature in LLMs, but the autoregressive process undercuts the equilibrium statistical-mechanics reading.

read the letter

The main observation is that a susceptibility built from the connected two-point correlator of token embeddings peaks near some Tc, with a concurrent shift in an order parameter defined from the average embedding field, semantic collapse below Tc, and a minimum in TwoNN intrinsic dimension. They report this across Qwen3 models from 0.6B to 32B and multiple prompt types.

What stands out as new is the concrete construction of those quantities from the embeddings treated as spins on a chain, plus the joint appearance of the three signatures. The robustness check across scales and the use of an independent dimension estimator are straightforward and add some weight to the empirical pattern.

The soft spot is the one the stress-test note flags. Autoregressive sampling produces a joint distribution that is a product of successive conditionals, not an equilibrium measure with a single global temperature. That breaks the usual justification for calling the connected correlator a susceptibility or expecting fluctuation-dissipation behavior, so the peak could be an artifact of how the conditionals sharpen rather than collective criticality. The abstract itself flags the non-equilibrium issue but still presents the phenomenology as resembling a continuous transition. No details appear on sample sizes, how the correlator is estimated from finite sequences, error bars, or statistical tests for the power-law-like scaling or the precise location of Tc.

This is aimed at readers working on LLM interpretability and decoding. The observations are concrete enough to be worth checking in a full methods section, even if the physical framing needs tightening. I would send it to peer review rather than desk-reject so referees can assess the actual computations and whether the non-equilibrium gap can be closed.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a statistical-field framework treating LLM token embeddings as continuous spins on a 1D chain. Susceptibility is defined from the connected two-point correlator and an order parameter from the ensemble-averaged embedding field; varying softmax temperature T yields a sharp susceptibility peak near a characteristic Tc with power-law-like scaling, a concurrent rapid change in the order parameter, semantic collapse below Tc, and a minimum in TwoNN intrinsic dimension near Tc. These features are reported as robust across Qwen3 model scales (0.6B–32B) and prompt categories, with the non-equilibrium character of autoregressive generation noted as requiring further study.

Significance. If the equilibrium interpretation can be substantiated, the work supplies quantitative, reproducible diagnostics (connected correlator, order parameter, TwoNN dimension) that link decoding temperature to collective phenomena and could inform temperature selection. The cross-scale robustness and independent corroboration by intrinsic dimension are concrete strengths. The central claim, however, rests on the applicability of equilibrium statistical mechanics to autoregressive output distributions.

major comments (2)

[Abstract; §3 (framework definition)] The mapping of autoregressive token sequences to an equilibrium 1D Ising-like chain is assumed without demonstration that the joint distribution over embeddings satisfies fluctuation-dissipation or cluster properties required for the connected correlator to be interpreted as a susceptibility. This assumption is load-bearing for the claim that the observed peak constitutes criticality rather than a sequential-conditioning artifact (see skeptic concern and abstract caveat on non-equilibrium nature).
[§4 (results and figures)] No information is supplied on how the connected two-point correlator is estimated from finite samples, whether bootstrap or jackknife error bars are computed, or what statistical test establishes the power-law scaling and the location of Tc. This directly affects the soundness of the peak and scaling claims.

minor comments (2)

[§3] Clarify the precise centering and normalization used for the embedding field when defining the order parameter.
[§4] Add a brief methods paragraph on the TwoNN implementation and its sensitivity to sample size.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below, clarifying the phenomenological scope of the framework while committing to added methodological detail.

read point-by-point responses

Referee: [Abstract; §3 (framework definition)] The mapping of autoregressive token sequences to an equilibrium 1D Ising-like chain is assumed without demonstration that the joint distribution over embeddings satisfies fluctuation-dissipation or cluster properties required for the connected correlator to be interpreted as a susceptibility. This assumption is load-bearing for the claim that the observed peak constitutes criticality rather than a sequential-conditioning artifact (see skeptic concern and abstract caveat on non-equilibrium nature).

Authors: We agree the equilibrium mapping is an analogy rather than a derived result. The abstract already states that 'the non-equilibrium nature of autoregressive generation warrants further investigation' and presents the features as phenomenological resemblance to criticality. The connected correlator is employed as a diagnostic of collective structure in the generated sequences. We will revise §3 to state more explicitly that fluctuation-dissipation relations are not demonstrated and that the peak may partly reflect sequential conditioning; the framework is offered as a practical tool rather than a claim of thermodynamic equilibrium. revision: partial
Referee: [§4 (results and figures)] No information is supplied on how the connected two-point correlator is estimated from finite samples, whether bootstrap or jackknife error bars are computed, or what statistical test establishes the power-law scaling and the location of Tc. This directly affects the soundness of the peak and scaling claims.

Authors: We accept that these procedural details are missing. In the revised manuscript we will add a dedicated paragraph in §4 describing: (i) the ensemble construction (multiple independent generations per prompt), (ii) the explicit estimator for the connected two-point correlator, (iii) bootstrap resampling for error bars on susceptibility and order-parameter curves, and (iv) the fitting protocol used to locate Tc and assess power-law scaling. revision: yes

Circularity Check

0 steps flagged

No circularity: direct measurement of defined observables under temperature variation

full rationale

The paper defines susceptibility via the connected two-point correlator of embedding vectors and an order parameter via the ensemble-averaged field, then directly computes these quantities while varying the softmax temperature T on sampled outputs. The reported peak near Tc, power-law-like scaling, order-parameter jump, and TwoNN dimension minimum are all empirical observations from this procedure, not quantities fitted to data and re-labeled as predictions. No self-citations, ansatzes, or uniqueness theorems are invoked to justify the central claims; the framework is applied to generated sequences without reducing the target phenomenology to its own inputs by construction. The non-equilibrium caveat is explicitly noted, confirming the analysis remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on one central domain assumption; no free parameters or new entities are named in the abstract.

axioms (1)

domain assumption Token embeddings can be treated as continuous spin variables on a one-dimensional chain
This premise is required to define the connected two-point correlator and susceptibility from the generated sequence.

pith-pipeline@v0.9.1-grok · 5716 in / 1340 out tokens · 49342 ms · 2026-06-28T02:36:12.369087+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 9 canonical work pages · 4 internal anchors

[1]

Penguin uK, 2003

Steven Pinker.The language instinct: How the mind creates language. Penguin uK, 2003

2003
[2]

Walter de Gruyter, 2002

Noam Chomsky.Syntactic structures. Walter de Gruyter, 2002

2002
[3]

The debate over understanding in ai’s large language models.Proceedings of the National Academy of Sciences, 120(13):e2215907120, 2023

Melanie Mitchell and David C Krakauer. The debate over understanding in ai’s large language models.Proceedings of the National Academy of Sciences, 120(13):e2215907120, 2023

2023
[4]

Smith, and William Bialek

Colin Scheibner, Lindsay M. Smith, and William Bialek. Large language models and the entropy of english.ArXiv, abs/2512.24969, 2025

work page arXiv 2025
[5]

Entropy rate estimates for natural language—a new extrapolation of compressed large-scale corpora.Entropy, 18(10):364, 2016

Ryosuke Takahira, Kumiko Tanaka-Ishii, and Łukasz D˛ ebowski. Entropy rate estimates for natural language—a new extrapolation of compressed large-scale corpora.Entropy, 18(10):364, 2016

2016
[6]

Mutual information functions of natural language texts

Wentian Li. Mutual information functions of natural language texts. Santa Fe Institute Santa Fe, NM, USA, 1989

1989
[7]

Phase transition in large language models and the criticality of natural languages

Kai Nakaishi, Yoshihiko Nishikawa, and Koji Hukushima. Critical phase transition in large language models.arXiv preprint arXiv:2406.05335, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[8]

Phase transitions in large language models and the o(n) model.arXiv preprint arXiv:2501.16241, 2025

Youran Sun and Babak Haghighat. Phase transitions in large language models and the o(n) model.arXiv preprint arXiv:2501.16241, 2025

work page arXiv 2025
[9]

Machine learning and the physical sciences

Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie V ogt-Maranto, and Lenka Zdeborová. Machine learning and the physical sciences. Reviews of Modern Physics, 91(4):045002, 2019

2019
[10]

World Scientific, 2005

Daniel J Amit and Victor Martin-Mayor.Field theory, the renormalization group, and critical phenomena: graphs to computers. World Scientific, 2005. 5

2005
[11]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017
[12]

Hot or cold? adaptive temperature sampling for code generation with large language models

Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Zhi Jin, and Hong Mei. Hot or cold? adaptive temperature sampling for code generation with large language models. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 437–445, 2024

2024
[13]

Critical behavior in physics and probabilistic formal languages

Henry W Lin and Max Tegmark. Critical behavior in physics and probabilistic formal languages. Entropy, 19(7):299, 2017

2017
[14]

A simple explanation for the phase transition in large language models with list decoding.arXiv preprint arXiv:2303.13112, 2023

Cheng-Shang Chang. A simple explanation for the phase transition in large language models with list decoding.arXiv preprint arXiv:2303.13112, 2023

work page arXiv 2023
[15]

Mendes-Santos, X

T. Mendes-Santos, X. Turkeshi, M. Dalmonte, and Alex Rodriguez. Unsupervised learning universal critical behavior via the intrinsic dimension.Physical Review X, 11(1), February 2021

2021
[16]

Measuring the Intrinsic Dimension of Objective Landscapes

Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes.arXiv preprint arXiv:1804.08838, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[17]

Springer Nature, 2023

Christopher M Bishop and Hugh Bishop.Deep learning: Foundations and concepts. Springer Nature, 2023

2023
[18]

Teal: Tokenize and embed all for multi-modal large language models.arXiv preprint arXiv:2311.04589, 2023

Zhen Yang, Yingxue Zhang, Fandong Meng, and Jie Zhou. Teal: Tokenize and embed all for multi-modal large language models.arXiv preprint arXiv:2311.04589, 2023

work page arXiv 2023
[19]

Semantic tokenizer for enhanced natural language processing.arXiv preprint arXiv:2304.12404, 2023

Sandeep Mehta, Darpan Shah, Ravindra Kulkarni, and Cornelia Caragea. Semantic tokenizer for enhanced natural language processing.arXiv preprint arXiv:2304.12404, 2023

work page arXiv 2023
[20]

Text representations and word embeddings: Vectorizing textual data

Roman Egger. Text representations and word embeddings: Vectorizing textual data. InApplied data science in tourism: Interdisciplinary approaches, methodologies, and applications, pages 335–361. Springer, 2022

2022
[21]

Spontaneous symmetry breaking in the o (n) model for large n.Physical Review D, 10(8):2491, 1974

Sidney Coleman, Roman Jackiw, and HDavid Politzer. Spontaneous symmetry breaking in the o (n) model for large n.Physical Review D, 10(8):2491, 1974

1974
[22]

Exact solution of the o (n) model on a random lattice.Nuclear Physics B, 455(3):577–618, 1995

Bertrand Eynard and Charlotte Kristjansen. Exact solution of the o (n) model on a random lattice.Nuclear Physics B, 455(3):577–618, 1995

1995
[23]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[24]

Spin-spin correlations in the two-dimensional ising model.Il Nuovo Cimento B (1965-1970), 44(2):276–305, 1966

Leo P Kadanoff. Spin-spin correlations in the two-dimensional ising model.Il Nuovo Cimento B (1965-1970), 44(2):276–305, 1966

1965
[25]

Oxford university press, 2021

Jean Zinn-Justin.Quantum field theory and critical phenomena, volume 171. Oxford university press, 2021

2021
[26]

Estimating the intrinsic dimension of datasets by a minimal neighborhood information.Scientific reports, 7(1):12140, 2017

Elena Facco, Maria d’Errico, Alex Rodriguez, and Alessandro Laio. Estimating the intrinsic dimension of datasets by a minimal neighborhood information.Scientific reports, 7(1):12140, 2017

2017
[27]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

The renormalization group and the ϵ expansion.Physics Reports, 12(2):75–199, 1974

Kenneth G Wilson and John Kogut. The renormalization group and the ϵ expansion.Physics Reports, 12(2):75–199, 1974. 6 A TwoNN Fitting We present the data and fitting plot for the TwoNN method applied to Qwen3-32B in the appendix to show more details. 0.0 0.1 ln( ) 0 5-ln(1-F) T=0.000, d=27.80, R 2=-1.195 Data Fit: d=27.80 0.0 0.2 ln( ) 0 5-ln(1-F) T=0.200...

1974

[1] [1]

Penguin uK, 2003

Steven Pinker.The language instinct: How the mind creates language. Penguin uK, 2003

2003

[2] [2]

Walter de Gruyter, 2002

Noam Chomsky.Syntactic structures. Walter de Gruyter, 2002

2002

[3] [3]

The debate over understanding in ai’s large language models.Proceedings of the National Academy of Sciences, 120(13):e2215907120, 2023

Melanie Mitchell and David C Krakauer. The debate over understanding in ai’s large language models.Proceedings of the National Academy of Sciences, 120(13):e2215907120, 2023

2023

[4] [4]

Smith, and William Bialek

Colin Scheibner, Lindsay M. Smith, and William Bialek. Large language models and the entropy of english.ArXiv, abs/2512.24969, 2025

work page arXiv 2025

[5] [5]

Entropy rate estimates for natural language—a new extrapolation of compressed large-scale corpora.Entropy, 18(10):364, 2016

Ryosuke Takahira, Kumiko Tanaka-Ishii, and Łukasz D˛ ebowski. Entropy rate estimates for natural language—a new extrapolation of compressed large-scale corpora.Entropy, 18(10):364, 2016

2016

[6] [6]

Mutual information functions of natural language texts

Wentian Li. Mutual information functions of natural language texts. Santa Fe Institute Santa Fe, NM, USA, 1989

1989

[7] [7]

Phase transition in large language models and the criticality of natural languages

Kai Nakaishi, Yoshihiko Nishikawa, and Koji Hukushima. Critical phase transition in large language models.arXiv preprint arXiv:2406.05335, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[8] [8]

Phase transitions in large language models and the o(n) model.arXiv preprint arXiv:2501.16241, 2025

Youran Sun and Babak Haghighat. Phase transitions in large language models and the o(n) model.arXiv preprint arXiv:2501.16241, 2025

work page arXiv 2025

[9] [9]

Machine learning and the physical sciences

Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie V ogt-Maranto, and Lenka Zdeborová. Machine learning and the physical sciences. Reviews of Modern Physics, 91(4):045002, 2019

2019

[10] [10]

World Scientific, 2005

Daniel J Amit and Victor Martin-Mayor.Field theory, the renormalization group, and critical phenomena: graphs to computers. World Scientific, 2005. 5

2005

[11] [11]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017

[12] [12]

Hot or cold? adaptive temperature sampling for code generation with large language models

Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Zhi Jin, and Hong Mei. Hot or cold? adaptive temperature sampling for code generation with large language models. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 437–445, 2024

2024

[13] [13]

Critical behavior in physics and probabilistic formal languages

Henry W Lin and Max Tegmark. Critical behavior in physics and probabilistic formal languages. Entropy, 19(7):299, 2017

2017

[14] [14]

A simple explanation for the phase transition in large language models with list decoding.arXiv preprint arXiv:2303.13112, 2023

Cheng-Shang Chang. A simple explanation for the phase transition in large language models with list decoding.arXiv preprint arXiv:2303.13112, 2023

work page arXiv 2023

[15] [15]

Mendes-Santos, X

T. Mendes-Santos, X. Turkeshi, M. Dalmonte, and Alex Rodriguez. Unsupervised learning universal critical behavior via the intrinsic dimension.Physical Review X, 11(1), February 2021

2021

[16] [16]

Measuring the Intrinsic Dimension of Objective Landscapes

Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes.arXiv preprint arXiv:1804.08838, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[17] [17]

Springer Nature, 2023

Christopher M Bishop and Hugh Bishop.Deep learning: Foundations and concepts. Springer Nature, 2023

2023

[18] [18]

Teal: Tokenize and embed all for multi-modal large language models.arXiv preprint arXiv:2311.04589, 2023

Zhen Yang, Yingxue Zhang, Fandong Meng, and Jie Zhou. Teal: Tokenize and embed all for multi-modal large language models.arXiv preprint arXiv:2311.04589, 2023

work page arXiv 2023

[19] [19]

Semantic tokenizer for enhanced natural language processing.arXiv preprint arXiv:2304.12404, 2023

Sandeep Mehta, Darpan Shah, Ravindra Kulkarni, and Cornelia Caragea. Semantic tokenizer for enhanced natural language processing.arXiv preprint arXiv:2304.12404, 2023

work page arXiv 2023

[20] [20]

Text representations and word embeddings: Vectorizing textual data

Roman Egger. Text representations and word embeddings: Vectorizing textual data. InApplied data science in tourism: Interdisciplinary approaches, methodologies, and applications, pages 335–361. Springer, 2022

2022

[21] [21]

Spontaneous symmetry breaking in the o (n) model for large n.Physical Review D, 10(8):2491, 1974

Sidney Coleman, Roman Jackiw, and HDavid Politzer. Spontaneous symmetry breaking in the o (n) model for large n.Physical Review D, 10(8):2491, 1974

1974

[22] [22]

Exact solution of the o (n) model on a random lattice.Nuclear Physics B, 455(3):577–618, 1995

Bertrand Eynard and Charlotte Kristjansen. Exact solution of the o (n) model on a random lattice.Nuclear Physics B, 455(3):577–618, 1995

1995

[23] [23]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[24] [24]

Spin-spin correlations in the two-dimensional ising model.Il Nuovo Cimento B (1965-1970), 44(2):276–305, 1966

Leo P Kadanoff. Spin-spin correlations in the two-dimensional ising model.Il Nuovo Cimento B (1965-1970), 44(2):276–305, 1966

1965

[25] [25]

Oxford university press, 2021

Jean Zinn-Justin.Quantum field theory and critical phenomena, volume 171. Oxford university press, 2021

2021

[26] [26]

Estimating the intrinsic dimension of datasets by a minimal neighborhood information.Scientific reports, 7(1):12140, 2017

Elena Facco, Maria d’Errico, Alex Rodriguez, and Alessandro Laio. Estimating the intrinsic dimension of datasets by a minimal neighborhood information.Scientific reports, 7(1):12140, 2017

2017

[27] [27]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[28] [28]

The renormalization group and the ϵ expansion.Physics Reports, 12(2):75–199, 1974

Kenneth G Wilson and John Kogut. The renormalization group and the ϵ expansion.Physics Reports, 12(2):75–199, 1974. 6 A TwoNN Fitting We present the data and fitting plot for the TwoNN method applied to Qwen3-32B in the appendix to show more details. 0.0 0.1 ln( ) 0 5-ln(1-F) T=0.000, d=27.80, R 2=-1.195 Data Fit: d=27.80 0.0 0.2 ln( ) 0 5-ln(1-F) T=0.200...

1974