pith.

arXiv: 2606.28103 · v1 · pith:T6E4DR2I · submitted 2026-06-26 · ❄️ cond-mat.dis-nn · cond-mat.stat-mech

Phase structure of the Random Language Model

Pith reviewed 2026-06-29 01:57 UTC · model grok-4.3

classification ❄️ cond-mat.dis-nn cond-mat.stat-mech
keywords: random language model · phase transitions · random energy model · context-free grammars · glassy phase · double-scaling limit · Heaps' law · symbol correlations

The pith

The random language model maps to the random energy model and exhibits a hierarchy of phase transitions in a double-scaling limit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an ensemble of random context-free grammars produces structured output only after passing through several phase transitions. These occur when the grammar temperature approaches zero while the number of hidden symbols grows to infinity, with the product of the temperature and the logarithm of the hidden-symbol count held fixed. A mapping to the random energy model locates the points where symbol correlations first appear, single-symbol probabilities become non-uniform, and rule selection locks into a glassy state. The same limit supplies scaling relations for entropy and rule usage that line up with patterns already observed in large language models.

Core claim

In the double-scaling limit where the rescaled temperature x equals the grammar temperature times the logarithm of the number of hidden symbols, the random language model is equivalent to the random energy model. This equivalence produces a sequence of transitions: symbol correlations set in at one value of x, marginal distributions over symbols cease to be uniform at a second value, and rule usage freezes into a glassy phase at a third value. A semi-annealed calculation then yields explicit scaling forms for rule frequencies, entropy, and energy.
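The difference between the first two transitions (children become correlated while each single-symbol marginal is still uniform, and only later do the marginals themselves become non-uniform) can be made concrete with a toy joint distribution over the two children b, c of a rule a → bc. This is a minimal sketch, not the RLM: the cyclic pairing c = (b + 1) mod n, the mixing weight `lam`, and the helper names are hypothetical choices made only to show that the mutual information can be positive while both marginals stay exactly uniform.

```python
import math


def correlated_uniform_joint(n, lam):
    """Toy joint p(b, c) over n x n child pairs (hypothetical, not the RLM).

    Mixes the uniform distribution with a 'paired' one in which c = (b + 1) mod n.
    Both components have uniform marginals, so every mixture does too.
    """
    joint = [[(1.0 - lam) / n ** 2 for _ in range(n)] for _ in range(n)]
    for b in range(n):
        joint[b][(b + 1) % n] += lam / n
    return joint


def marginals(joint):
    """Left-child and right-child marginals of a square joint table."""
    n = len(joint)
    left = [sum(row) for row in joint]
    right = [sum(joint[b][c] for b in range(n)) for c in range(n)]
    return left, right


def mutual_information(joint):
    """I(b; c) in nats; it vanishes exactly when the children are independent."""
    left, right = marginals(joint)
    mi = 0.0
    for b, row in enumerate(joint):
        for c, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log(p / (left[b] * right[c]))
    return mi


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        joint = correlated_uniform_joint(4, lam)
        left, right = marginals(joint)
        print(lam, left, right, round(mutual_information(joint), 4))
```

With `lam = 0` the children are independent and the mutual information is zero; with `lam = 1` it reaches log n, yet in every case both marginals remain uniform.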

What carries the argument

The identification of the random language model with the random energy model inside the double-scaling limit x = ε̃_d log N.
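For readers less familiar with the random energy model, a minimal numerical sketch of its freezing transition [17, 18] may help. It uses the standard REM with 2^n states and independent Gaussian energies of variance n/2, for which the freezing temperature is T_c = 1/(2√ln 2) and the quenched entropy per spin is ln 2 − 1/(4T²) above T_c and zero below. The participation ratio Y = Σ p_i² of the Boltzmann weights tends to zero above T_c and stays of order one in the frozen phase. This illustrates the textbook REM only; how the RLM's rule weights map onto these energies, and the paper's critical values of x, are not reproduced here, and the function names are hypothetical.

```python
import math
import random


def rem_critical_temperature():
    """Freezing temperature of Derrida's REM with energy variance n/2 (J = 1)."""
    return 1.0 / (2.0 * math.sqrt(math.log(2.0)))


def rem_entropy_density(T):
    """Quenched entropy per spin: ln 2 - 1/(4 T^2) above T_c, zero in the frozen phase."""
    if T <= rem_critical_temperature():
        return 0.0
    return math.log(2.0) - 1.0 / (4.0 * T * T)


def participation_ratio(n, T, rng):
    """Y = sum_i p_i^2 for Boltzmann weights over 2**n i.i.d. Gaussian energies."""
    sigma = math.sqrt(n / 2.0)
    energies = [rng.gauss(0.0, sigma) for _ in range(2 ** n)]
    e_min = min(energies)
    # Shift by the lowest energy so the largest weight is exactly 1 (avoids overflow).
    weights = [math.exp(-(e - e_min) / T) for e in energies]
    z = sum(weights)
    return sum(w * w for w in weights) / (z * z)


def mean_participation(n, T, samples=20, seed=0):
    """Average Y over independent disorder samples drawn from a seeded generator."""
    rng = random.Random(seed)
    return sum(participation_ratio(n, T, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    Tc = rem_critical_temperature()
    for ratio in (0.25, 0.5, 1.0, 2.0):
        T = ratio * Tc
        print(f"T/Tc={ratio:4.2f}  s(T)={rem_entropy_density(T):.3f}  "
              f"Y={mean_participation(14, T):.3f}")
```

At finite n the crossover in Y is smeared around T_c, which is the kind of finite-size effect a numerical check of the RLM-REM correspondence would have to contend with.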

If this is right

  • Symbol correlations appear once x crosses the first critical value.
  • Single-symbol marginals become non-uniform past the second critical value.
  • Rule selection enters a glassy phase beyond the third critical value.
  • Rule usage, entropy, and energy obey explicit scaling laws that reproduce Heaps' law and context-length scaling.
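Heaps' law states that the number of distinct items V(n) in a corpus of length n grows sublinearly, V(n) ≈ K n^β with 0 < β < 1; Figure 3(a) plots the analogous count of unique rules against corpus length. The sketch below shows one common way to estimate β: count distinct items at logarithmically spaced checkpoints and fit a straight line in log-log coordinates. The token stream is drawn from a Zipf distribution with exponent 1.5, a stand-in chosen purely for illustration and not output of the RLM; the vocabulary size, exponent, checkpoints, and helper names are all assumptions.

```python
import math
import random
from itertools import accumulate

# Roughly half-decade checkpoints from 10^2 to 10^5 tokens (hypothetical choice).
CHECKPOINTS = [100, 316, 1_000, 3_162, 10_000, 31_623, 100_000]


def zipf_cum_weights(vocab_size, exponent):
    """Cumulative unnormalized weights for p_k proportional to k**(-exponent)."""
    return list(accumulate(k ** -exponent for k in range(1, vocab_size + 1)))


def distinct_counts(tokens, checkpoints):
    """Distinct tokens among the first n tokens, for each n in ascending checkpoints."""
    seen, counts, start = set(), [], 0
    for n in checkpoints:
        seen.update(tokens[start:n])
        start = n
        counts.append(len(seen))
    return counts


def fit_power_law(xs, ys):
    """Least-squares fit of log y = log K + beta * log x; returns (beta, K)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    beta = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    return beta, math.exp(my - beta * mx)


def heaps_fit(vocab_size=100_000, exponent=1.5, seed=0):
    """Estimate the Heaps exponent of a synthetic Zipf token stream."""
    rng = random.Random(seed)
    cum = zipf_cum_weights(vocab_size, exponent)
    tokens = rng.choices(range(vocab_size), cum_weights=cum, k=CHECKPOINTS[-1])
    counts = distinct_counts(tokens, CHECKPOINTS)
    beta, K = fit_power_law(CHECKPOINTS, counts)
    return beta, K, counts


if __name__ == "__main__":
    beta, K, counts = heaps_fit()
    print(f"beta={beta:.3f}  K={K:.2f}  counts={counts}")
```

For Zipf-distributed tokens the fitted slope is expected to sit near 1/exponent; the same fitting routine could be applied to unique-rule counts taken from sampled derivations.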

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same scaling construction could be applied to other ensembles of generative grammars to locate their own transition points.
  • Spin-glass techniques already used for the random energy model become directly available for analyzing grammar-based language models.
  • The glassy freezing of rules supplies a candidate mechanism for the abrupt changes in output structure sometimes observed when model size or sampling temperature is varied.

Load-bearing premise

The random language model becomes exactly equivalent to the random energy model once the grammar temperature and the logarithm of the hidden-symbol count are scaled together, with their product x = ε̃_d log N held fixed.

What would settle it

Numerical sampling of the random language model at large but finite N and correspondingly small ε̃_d, with x held constant, that fails to show the predicted sequence of changes in symbol correlations or rule freezing.
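A scan of this kind keeps x fixed while increasing N, so the grammar temperature must shrink as ε̃_d = x / log N. Because the decrease is only logarithmic, squaring N merely halves ε̃_d, which suggests that finite-size effects could be slow to disappear. The helper below is a hypothetical planning aid: the natural logarithm is assumed (the log base is not stated in this text), and the value x = 0.1 is arbitrary rather than one of the paper's critical values.

```python
import math


def eps_at_fixed_x(x, n_hidden):
    """Grammar temperature eps_d such that x = eps_d * log(N) (natural log assumed)."""
    if n_hidden <= 1:
        raise ValueError("need N > 1 so that log N > 0")
    return x / math.log(n_hidden)


def scan_grid(x, n_values):
    """(N, eps_d) pairs along a line of constant x, for a finite-size scan."""
    return [(n, eps_at_fixed_x(x, n)) for n in n_values]


if __name__ == "__main__":
    for n, eps in scan_grid(0.1, [10, 100, 10_000, 100_000_000]):
        print(f"N={n:>11,d}  eps_d={eps:.5f}")
```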

Figures

Figures reproduced from arXiv: 2606.28103 by Alessio Giorlandino, Eric De Giuli, Sebastian Goldt.

Figure 1. Example derivation for an English sentence in a …
Figure 2. (a) Phase diagram of the RLM as a function of rescaled temperature …
Figure 3. Rule use as a function of (a) corpus length and (b,c) rescaled temperature. (a) Number of unique rules …
Figure 4. Entropy, energy, and code length. (a) Entropy rate in the RLM at indicated …
Abstract

Context-free grammars are minimal models of hierarchical structure in human language, generating structured text from recursive production rules. The Random Language Model (RLM) [De Giuli, PRL 2019], an ensemble of such grammars with random rule weights, exhibits a cross-over from gibberish-like output to structured text as a function of a "temperature", but the location and nature of this transition remained unclear. Here, we show that the RLM exhibits a hierarchy of phase transitions in a double-scaling limit where the grammar temperature $\tilde{\epsilon}_d \to 0$ and the number of hidden symbols $N \to \infty$ at fixed $x = \tilde{\epsilon}_d \log N$. By identifying the relation between RLM and the Random Energy Model, we identify a series of transitions where correlations between symbols emerge, single-symbol marginals become non-uniform, and rule use freezes in a glassy phase. A semi-annealed approximation yields nontrivial scaling laws for rule usage, entropy, and energy, consistent with Heaps' law and context-length scaling observed in large language models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that the Random Language Model (RLM) of random context-free grammars exhibits a hierarchy of phase transitions in the double-scaling limit ε̃_d → 0, N → ∞ at fixed x = ε̃_d log N. By mapping the RLM onto the Random Energy Model (REM), it identifies successive transitions at which symbol correlations emerge, single-symbol marginals become non-uniform, and rule usage freezes into a glassy phase. A semi-annealed approximation is used to derive scaling laws for rule usage, entropy, and energy that are stated to be consistent with Heaps' law and context-length scaling in large language models.

Significance. If the RLM-REM mapping is rigorously controlled, the work supplies an analytic statistical-mechanics account of the crossover from unstructured to structured output in random grammars and links it to observable scaling laws in language models. The explicit use of the REM reference frame and the semi-annealed approximation constitute a clear technical strength when the limit is shown to eliminate grammar-specific correlations.

major comments (1)
  1. [Abstract and the section deriving the relation to the REM] The RLM-REM identification in the double-scaling limit x = ε̃_d log N is the load-bearing step for the entire hierarchy of transitions. The manuscript must demonstrate explicitly (with controlled error bounds) that the effective energy distribution over rule configurations becomes Gaussian and uncorrelated at leading order, with all sub-leading grammar-specific correlations vanishing as ε̃_d → 0 at fixed x; otherwise the sequence of transitions read off from the REM does not apply to the RLM.
minor comments (1)
  1. Notation for the scaled temperature variable (ε̃_d versus ε_d) should be made uniform throughout the text and equations.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for highlighting the central role of the RLM-REM mapping. We respond to the single major comment below and indicate the revisions we will undertake.

Point-by-point responses
  1. Referee: [Abstract and the section deriving the relation to the REM] The RLM-REM identification in the double-scaling limit x = ε̃_d log N is the load-bearing step for the entire hierarchy of transitions. The manuscript must demonstrate explicitly (with controlled error bounds) that the effective energy distribution over rule configurations becomes Gaussian and uncorrelated at leading order, with all sub-leading grammar-specific correlations vanishing as ε̃_d → 0 at fixed x; otherwise the sequence of transitions read off from the REM does not apply to the RLM.

    Authors: We agree that explicit control of the mapping is required for the hierarchy of transitions to be applicable. The manuscript derives the effective energy by averaging over the random rule weights in the double-scaling limit and invokes the central-limit theorem for the sum of many independent contributions, yielding a Gaussian distribution at leading order with variance proportional to log N. Grammar-specific correlations appear only at O(ε̃_d) and are therefore suppressed at fixed x. To meet the request for controlled error bounds we will revise the derivation section to include an explicit cumulant expansion, showing that all non-Gaussian cumulants and residual correlations are o(1) as ε̃_d → 0 with x held fixed. This addition will be placed immediately after the statement of the mapping and will not alter the subsequent phase-structure analysis. revision: yes
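The cumulant argument the authors describe rests on a standard fact, sketched here under the assumption (not established in this text) that the effective energy is a sum of m independent, identically distributed contributions X_i. Cumulants add under independence, so every standardized cumulant of order r ≥ 3 decays as a power of m:

```latex
\begin{aligned}
S_m &= \sum_{i=1}^{m} X_i, \qquad \kappa_r(S_m) = m\,\kappa_r(X),\\
\frac{\kappa_r(S_m)}{\kappa_2(S_m)^{r/2}} &= m^{1-r/2}\,\frac{\kappa_r(X)}{\kappa_2(X)^{r/2}} \longrightarrow 0 \qquad (m \to \infty,\ r \ge 3).
\end{aligned}
```

For example, the skewness (r = 3) decays as m^{-1/2} and the excess kurtosis (r = 4) as m^{-1}. Correlations among the X_i break the additivity of cumulants, which is why the referee asks for explicit bounds on the residual grammar-specific correlations.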

Circularity Check

0 steps flagged

No significant circularity; RLM-REM mapping presented as identification, not tautological reduction.

Full rationale

The abstract and context describe an identification of RLM with REM in the double-scaling limit x = ε̃_d log N as the basis for reading off transitions. This is framed as a derived relation rather than a definitional equivalence or self-citation chain. The semi-annealed approximation is explicitly labeled as such and yields scaling laws presented as consistent with external observations (Heaps' law), not fitted to the target results. No equations or steps in the provided text reduce predictions to inputs by construction, and the central claim retains independent content from the REM reference frame. No load-bearing self-citations or ansatz smuggling are quoted.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the identification of the RLM with the REM in the double-scaling limit and on the validity of the semi-annealed approximation; these are not derived from more basic principles in the abstract.

free parameters (1)
  • scaling variable x = ε̃_d log N
    The double-scaling limit is taken at fixed x; this choice organizes the phase structure but is selected to produce the reported transitions.
axioms (1)
  • domain assumption: The Random Language Model can be mapped onto the Random Energy Model in the double-scaling limit.
    This identification is invoked to read off the series of transitions from known REM results.

pith-pipeline@v0.9.1-grok · 5728 in / 1415 out tokens · 60698 ms · 2026-06-29T01:57:34.059836+00:00 · methodology


Reference graph

Works this paper leans on

43 extracted references · 6 canonical work pages · 3 internal anchors

  1. [1]

    … logN 2 + ℓ log T, where ℓ is the number of leaves of T and T is the number of observable symbols in the grammar. … applying a rule a → bc with parent a and children b (left) and c (right). The weakest structure such a factor can generate is a correlation between its two children while each child's marginal remains uniform. By holding the message reaching f from above uniform...

  2. [2]

    Extension of the number of research doctorates and innovative doctorates for the Public Administration and cultural heritage (Italian: “Estensione del numero di dottorati di ricerca e dottorati innovativi per la Pubblica Amministrazione e il patrimonio culturale”)

    We predict a logarithmic behavior at small x with prefactor 1/4, verified in fig. 4a (dash-dotted and dotted lines), and a steepening as x → 1/8⁻ (dashed line), also captured by theory; for … FIG. 4. Entropy, energy, and code length. (a) Entropy rate in th...

  3. [3]

    E. De Giuli, Phys. Rev. Lett. 122, 128301 (2019)

  4. [4]

    J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling Laws for Neural Language Models,” arXiv preprint arXiv:2001.08361 (2020)

  5. [5]

    E. De Giuli, Journal of Physics A: Mathematical and Theoretical 52, 504001 (2019)

  6. [6]

    E. De Giuli, Journal of Physics A: Mathematical and Theoretical 55, 489501 (2022)

  7. [7]

    K. Nakaishi and K. Hukushima, Physical Review Research 4, 023156 (2022)

  8. [8]

    F. Lalegani and E. De Giuli, Physical Review E 109, 054313 (2024)

  9. [9]

    H. W. Lin and M. Tegmark, Entropy 19, 299 (2017)

  10. [10]

    F. Cagnetta, L. Petrini, U. M. Tomasini, A. Favero, and M. Wyart, Physical Review X 14, 031001 (2024)

  11. [11]

    F. Cagnetta and M. Wyart, Advances in Neural Information Processing Systems 37, 83119 (2024)

  12. [12]

    J. Garnier-Brun, M. Mézard, E. Moscato, and L. Saglietti, arXiv preprint arXiv:2408.15138 (2024)

  13. [13]

    F. Cagnetta, A. Raventós, S. Ganguli, and M. Wyart, arXiv preprint arXiv:2602.07488 (2026)

  14. [14]

    N. Chomsky, Syntactic structures (Walter de Gruyter, Berlin, 2002)

  15. [15]

    N. Chomsky, Aspects of the Theory of Syntax, Vol. 11 (MIT Press, Cambridge, 2014)

  16. [16]

    J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to automata theory, languages, and computation, 3rd ed. (Pearson, Boston, MA, 2007)

  17. [17]

    B. Derrida, Physical Review Letters 45, 79 (1980)

  18. [18]

    B. Derrida, Phys. Rev. B 24, 2613 (1981)

  19. [19]

    H. S. Heaps, Information retrieval: Computational and theoretical aspects (Academic Press, Inc., 1978)

  20. [20]

    C. Scheibner, L. M. Smith, and W. Bialek, arXiv preprint arXiv:2512.24969 (2025)

  21. [21]

    E. De Giuli, “Scaling limit of the Random Language Model,” (2026)

  22. [22]

    M. Mézard and A. Montanari, Information, physics, and computation (Oxford University Press, 2009)

  23. [23]

    M. Mézard, G. Parisi, and R. Zecchina, Science 297, 812 (2002)

  24. [24]

    A. Braunstein, M. Mézard, and R. Zecchina, Random Structures & Algorithms 27, 201 (2005)

  25. [25]

    A. Braunstein and R. Zecchina, Physical Review Letters 96, 030201 (2006)

  26. [26]

    L. Zdeborová and F. Krzakala, Physical Review E 76, 031131 (2007)

  27. [27]

    A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Physical Review Letters 107, 065701 (2011)

  28. [28]

    F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, Physical Review X 2, 021005 (2012)

  29. [29]

    L. Zdeborová and F. Krzakala, Advances in Physics 65, 453 (2016)

  30. [30]

    G. Ben Arous, L. V. Bogachev, and S. A. Molchanov, Probability Theory and Related Fields 132, 579 (2005)

  31. [31]

    J. M. Kosterlitz and D. J. Thouless, Journal of Physics C: Solid State Physics 6, 1181 (1973)

  32. [32]

    J. Kosterlitz, Journal of Physics C: Solid State Physics 7, 1046 (1974)

  33. [33]

    A. M. Petersen, J. N. Tenenbaum, S. Havlin, H. E. Stanley, and M. Perc, Scientific Reports 2, 943 (2012)

  34. [34]

    G. K. Zipf, The psycho-biology of language: An introduction to dynamic philology (Routledge, Milton Park, 2013)

  35. [35]

    R. Ferrer i Cancho and R. V. Solé, Journal of Quantitative Linguistics 8, 165 (2001)

  36. [36]

    B. Corominas-Murtra and R. V. Solé, Physical Review E 82, 011102 (2010)

  37. [37]

    A. Corral, G. Boleda, and R. Ferrer-i Cancho, PLoS ONE 10, e0129031 (2015)

  38. [38]

    Team OLMo, P. Walsh, L. Soldaini, D. Groeneveld, K. Lo, S. Arora, A. Bhagia, Y. Gu, S. Huang, and M. Jordan, arXiv preprint arXiv:2501.00656 (2024)

  39. [39]

    K. Nakaishi and K. Hukushima, Physical Review Research 6, 033216 (2024)

  40. [40]

    Y. Toji, J. Takahashi, V. Roychowdhury, and H. Miyahara, Physical Review E 113, 015305 (2026)

  41. [41]

    T. M. Cover and J. A. Thomas, Elements of information theory (John Wiley & Sons, 1999)

  42. [42]

    P. Grassberger, “Entropy Estimates from Insufficient Samplings,” arXiv preprint physics/0307138 (2003)

  43. [43]

    T. Schürmann and P. Grassberger, Chaos: An Interdisciplinary Journal of Nonlinear Science 6, 414 (1996)

END MATTER

Combinatorics: Here we outline the combinatorics of patterns; for a complete treatment, including surface terms, see [19]. First choose the symbols used by the p branching rules; this gives $\binom{N^3}{p} \sim N^{3p}/p!$ if $N^3 \gg p$. Then distribute these ac...
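The approximation in the combinatorics fragment can be checked directly. Assuming, as the fragment implies, that there are N³ possible branching rules a → bc, the exact ratio C(N³, p) · p! / N^(3p) equals the product of (1 - i/N³) for i = 0, …, p - 1, which tends to 1 when N³ ≫ p and falls well below 1 once p is comparable to N³. The helper names below are hypothetical.

```python
from math import comb, factorial, prod


def binomial_over_asymptotic(n_hidden, p):
    """Exact ratio C(N^3, p) * p! / N^(3p), i.e. prod_{i<p} (1 - i / N^3)."""
    m = n_hidden ** 3  # number of possible branching rules a -> bc (assumed)
    return comb(m, p) * factorial(p) / m ** p


def product_form(n_hidden, p):
    """The same quantity written as a finite product, for comparison."""
    m = n_hidden ** 3
    return prod(1 - i / m for i in range(p))


if __name__ == "__main__":
    for n, p in [(2, 4), (10, 5), (100, 5)]:
        print(n, p, binomial_over_asymptotic(n, p), product_form(n, p))
```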