Phase Transitions in Driven Informational Systems: A Two-Field Perspective on Learning Theory and Non-Equilibrium Chemistry

Truong Xuan Khanh

arXiv: 2605.16325 · v1 · pith:ZLD5ZPU6new · submitted 2026-05-05 · cs.LG · cs.AI

Pith reviewed 2026-05-21 00:24 UTC · model grok-4.3

keywords: phase transitions · driven informational systems · deep learning · non-equilibrium chemistry · universality class · entropy production · information quasi-potential · order parameters

The pith

Driven informational systems unify phase transitions in deep learning and non-equilibrium chemistry via two gradient fields.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that phase transitions observed in deep learning, such as grokking and emergent capabilities, and those in non-equilibrium chemical reaction networks can be understood through a shared framework of driven informational systems. These systems are stochastic processes governed by two gradient fields: the entropy production rate Sigma and the information quasi-potential Phi_I, defined as the negative log of the stationary density. By introducing two order parameters, the adversarial breakdown threshold and the self-referential coupling threshold, the approach identifies a candidate universality class based on their joint scaling. That scaling is meant to generate predictions that can be tested against data from large language models and chemical experiments. This matters because, if it holds, it bridges two previously separate areas of study, lets insights from one inform the other, and gives a more complete picture than single-field gradient models.

Core claim

Both classes of phenomena admit a common description as driven informational systems: stochastic processes governed by two gradient fields, an entropy production rate Sigma and an information quasi-potential Phi_I := -ln p*, where p* is the stationary density. Within this framework two candidate order parameters are introduced: an adversarial breakdown threshold alpha_dagger and a self-referential coupling threshold kappa_c. The joint scaling of (alpha_dagger, kappa_c) defines a candidate universality class with exponents (gamma_1, gamma_2). This framework is consistent with recent empirical findings on alignment transitions, adversarial breakdown scaling, and partial introspection in large language models.

What carries the argument

The two gradient fields Sigma (entropy production rate) and Phi_I := -ln p* (information quasi-potential) that govern stochastic processes in driven informational systems, enabling definition of order parameters alpha_dagger and kappa_c and a candidate universality class with exponents gamma_1 and gamma_2.
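
To make the two quantities concrete, here is a minimal Python sketch on a toy system. It is not the paper's model: the referee report below notes that the paper supplies no explicit dynamics. The sketch assumes a small continuous-time Markov chain on a ring, computes the stationary density p*, forms Phi_I = -ln p*, and evaluates the standard Schnakenberg entropy production rate as a stand-in for Sigma. The function names (`driven_ring`, `stationary_density`, `quasi_potential`, `entropy_production_rate`) and the parameters `energies` and `drive` are hypothetical, introduced only for illustration.

```python
import numpy as np


def driven_ring(energies, drive):
    """Rate matrix W[i, j] (rate j -> i) for a ring of states (hypothetical toy model).

    With drive = 0 the rates satisfy detailed balance for p* ~ exp(-energies);
    a non-zero drive biases clockwise hops and breaks detailed balance.
    """
    energies = np.asarray(energies, dtype=float)
    n = len(energies)
    if n < 3:
        raise ValueError("a ring needs at least 3 states")
    W = np.zeros((n, n))
    for j in range(n):
        for step, sign in ((1, 1.0), (-1, -1.0)):
            i = (j + step) % n
            W[i, j] = np.exp(-(energies[i] - energies[j]) / 2 + sign * drive / 2)
    return W


def stationary_density(W):
    """Stationary distribution p* of the master equation dp/dt = L p."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Generator: off-diagonal rates, diagonal = minus the total exit rate of each state.
    L = W - np.diag(W.sum(axis=0))
    # Solve L p = 0 together with the normalisation sum(p) = 1.
    A = np.vstack([L, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p


def quasi_potential(p):
    """Information quasi-potential Phi_I = -ln p*."""
    return -np.log(p)


def entropy_production_rate(W, p):
    """Schnakenberg entropy production rate (units of k_B per unit time)."""
    flux = W * p[None, :]            # flux[i, j]: probability flow from j to i
    back = flux.T
    mask = (flux > 0) & (back > 0)   # only bidirectional edges contribute
    return 0.5 * np.sum((flux[mask] - back[mask]) * np.log(flux[mask] / back[mask]))


if __name__ == "__main__":
    E = [0.0, 1.0, 2.0]
    for f in (0.0, 1.5):
        W = driven_ring(E, f)
        p = stationary_density(W)
        print("drive:", f)
        print("  Phi_I:", quasi_potential(p))
        print("  Sigma:", entropy_production_rate(W, p))
```

In this toy setting, Phi_I coincides with the energies (up to a constant) and the entropy production vanishes when the drive is zero. With a non-zero drive the entropy production is positive and Phi_I no longer matches the energies, which is the sense in which a second quantity beyond a single potential becomes necessary off equilibrium.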

If this is right

The joint scaling of alpha_dagger and kappa_c admits falsifiable predictions that distinguish the two-field description from single-field gradient accounts.
The framework shows consistency with empirical findings from 2024-2026 on alignment transitions, adversarial breakdown scaling, and partial introspection in large language models.
The geometric structure of the two-field framework applies to both learning theory and chemical reaction networks.
The candidate universality class with exponents gamma_1 and gamma_2 would provide quantitative predictions for phase transitions in driven systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the unification holds, experiments in prebiotic chemistry could be designed using scaling laws observed in deep learning phase transitions.
The two-field perspective might extend to phase transitions in other complex systems such as biological regulatory networks.
Direct comparison of measured exponents gamma_1 and gamma_2 across artificial and natural driven systems would test the claimed universality class.

Load-bearing premise

Both phase-transition phenomena in deep learning and non-equilibrium chemical reaction networks admit a common description as driven informational systems governed by the two gradient fields Sigma and Phi_I.

What would settle it

Measuring whether the scaling of adversarial breakdown or alignment transitions in large language models follows the specific exponents gamma_1 and gamma_2 predicted by the joint scaling of alpha_dagger and kappa_c, or finding matching scaling in chemical reaction networks that single-field models do not reproduce.
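
A test of this kind usually comes down to fitting an exponent on a log-log scale and comparing fits across systems. The Python sketch below shows one simple way to do that, under loud assumptions: the summary does not say which control variable alpha_dagger or kappa_c should scale against, so `x` is a hypothetical placeholder, the data are synthetic, and the two-standard-error agreement threshold is an arbitrary illustrative choice rather than anything the paper specifies. The names `fit_exponent` and `exponents_agree` are hypothetical.

```python
import numpy as np


def fit_exponent(x, y):
    """Fit y ~ C * x**gamma by least squares in log-log space.

    Returns (gamma, standard_error). Needs at least 3 positive points.
    """
    lx = np.log(np.asarray(x, dtype=float))
    ly = np.log(np.asarray(y, dtype=float))
    if lx.size < 3:
        raise ValueError("need at least 3 points")
    gamma, intercept = np.polyfit(lx, ly, 1)
    resid = ly - (gamma * lx + intercept)
    # Standard error of the slope from the residual variance.
    se = np.sqrt(resid @ resid / (lx.size - 2) / np.sum((lx - lx.mean()) ** 2))
    return gamma, se


def exponents_agree(g_a, se_a, g_b, se_b, z=2.0):
    """True if two fitted exponents differ by at most z combined standard errors."""
    return abs(g_a - g_b) <= z * np.hypot(se_a, se_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.logspace(1, 4, 20)  # hypothetical control variable
    # Synthetic "measurements" from two systems sharing the same true exponent.
    y_llm = 3.0 * x ** -0.5 * np.exp(rng.normal(0.0, 0.05, x.size))
    y_chem = 0.7 * x ** -0.5 * np.exp(rng.normal(0.0, 0.05, x.size))
    g1, s1 = fit_exponent(x, y_llm)
    g2, s2 = fit_exponent(x, y_chem)
    print(g1, s1, g2, s2, exponents_agree(g1, s1, g2, s2))
```

A shared exponent across a learning system and a chemical system would be consistent with a common universality class; a clear disagreement would count against it. Log-log least squares is only a first pass, since a narrow range of `x` or a crossover between regimes can produce a misleading straight line.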

Figures

Figures reproduced from arXiv: 2605.16325 by Truong Xuan Khanh.

**Figure 1.** The two candidate order parameters (schematic). (a) The adversarial breakdown threshold α† = Θ(1/log |O_N|) as a function of representational complexity |O_N| (solid blue), shown against classical breakdown points of robust statistics (dotted horizontal lines): the median of Hampel (1971) at 1/2, and high-breakdown S/M-estimators at ≈ 0.29. The classical breakdown points are universal constants; equatio…

**Figure 2.** The two-field geometry on two configuration manifolds (schematic). Both panels depict the same dynamical class: a stochastic trajectory X_t (green) on a configuration manifold M driven by an external entropy flux (Σ, blue arrows) and stabilized at attractors of the information quasi-potential Φ_I (red contours). The two gradient fields ∇Σ and ∇Φ_I are non-collinear off equilibrium (§4). (a) Prebiotic-chemistr…

Original abstract

Phase-transition phenomena in deep learning (grokking, emergent capabilities, and ontological reorganization under context shift) have been studied through several lenses, including representational compression, singular learning theory, and information-theoretic progress measures. Independently, non-equilibrium statistical physics has identified phase transitions in driven chemical reaction networks underlying prebiotic selection, with empirical signatures that are difficult to reproduce within single-field gradient accounts. We propose a perspective in which both classes of phenomena admit a common description as driven informational systems: stochastic processes governed by two gradient fields, an entropy production rate Sigma and an information quasi-potential Phi_I := -ln p*, where p* is the stationary density. Within this framework we introduce two candidate order parameters: an adversarial breakdown threshold alpha_dagger and a self-referential coupling threshold kappa_c. The joint scaling of (alpha_dagger, kappa_c) defines a candidate universality class with exponents (gamma_1, gamma_2). We outline the geometric structure of this framework, identify falsifiable predictions distinguishing it from single-field alternatives, and show consistency with recent empirical findings (2024–2026) on alignment transitions, adversarial breakdown scaling, and partial introspection in large language models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a conceptual sketch linking phase transitions in learning and chemistry via two gradient fields, but the derivations for the universality class and its exponents are missing.

The main point is that the paper offers a two-field perspective on phase transitions, using an entropy production rate Sigma and an information quasi-potential Phi_I to describe both deep learning phenomena like grokking and non-equilibrium chemical networks. It names two order parameters, alpha_dagger and kappa_c, and claims their joint scaling defines a universality class with exponents gamma_1 and gamma_2 that could yield distinct falsifiable predictions.

Referee Report

3 major / 2 minor

Summary. The paper proposes that phase transitions in deep learning (e.g., grokking, emergent capabilities) and non-equilibrium chemical reaction networks share a common description as driven informational systems governed by two gradient fields: the entropy production rate Σ and the information quasi-potential Φ_I := -ln p*, where p* is the stationary density. It introduces two order parameters—an adversarial breakdown threshold α_dagger and a self-referential coupling threshold κ_c—whose joint scaling is claimed to define a candidate universality class with exponents (γ1, γ2) that yields falsifiable predictions distinguishing the framework from single-field gradient accounts, with asserted consistency to 2024–2026 empirical findings on alignment transitions and adversarial scaling.

Significance. If the two-field dynamics and scaling analysis were explicitly derived, the perspective could unify phenomena across learning theory and non-equilibrium chemistry by identifying a distinct universality class. The introduction of Σ and Φ_I as dual gradient fields and the candidate order parameters α_dagger, κ_c represents a potentially interesting geometric framing. However, the manuscript supplies no derivations, dynamical equations, renormalization steps, or independent tests, so the claimed universality class and falsifiable predictions remain labels rather than derived results; no machine-checked proofs, reproducible code, or parameter-free derivations are present.

major comments (3)

[Abstract] Abstract and main text: the central claim that the joint scaling of (α_dagger, κ_c) defines a universality class with exponents (γ1, γ2) admitting falsifiable predictions is not supported by any dynamical equations for the evolution of Σ and Φ_I, nor by renormalization-group or scaling analysis that would extract γ1 and γ2. Without these steps the universality class is introduced by definition rather than derived, rendering the distinction from single-field accounts untestable within the manuscript.
[Abstract] Abstract: the asserted consistency with 2024–2026 empirical findings on alignment transitions, adversarial breakdown scaling, and partial introspection is presented without explicit mapping from the two-field dynamics to the order parameters, without error analysis, and without data-exclusion rules. This raises the risk that α_dagger and κ_c are constructed to match observations rather than independently predicted.
[Main text] Main text (framework outline): the geometric structure of the two-field perspective is outlined but no explicit stochastic process or Fokker–Planck equation linking Σ and Φ_I to the stationary density p* is supplied, leaving the information quasi-potential and entropy-production rate as formal labels without operational dynamics.
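
For orientation, the kind of relation the third comment asks for is standard in stochastic thermodynamics. The LaTeX sketch below assumes overdamped Langevin dynamics with a drift b(x) and a constant scalar diffusion coefficient D; neither the drift nor D comes from the paper, and the paper's own dynamics may differ. It shows how the stationary density p*, the quasi-potential Φ_I, and a total entropy production rate are linked in that textbook setting. The figure caption refers to ∇Σ, which suggests the paper treats Σ as a field, so the scalar rate here is only the closest standard analogue.

```latex
% Assumed setting (not taken from the paper): overdamped Langevin dynamics
% with drift b(x) and constant scalar diffusion D > 0, in units with k_B = 1.
\begin{align}
  dX_t &= b(X_t)\,dt + \sqrt{2D}\,dW_t, \\
  \partial_t p &= -\nabla\!\cdot J, \qquad J = b\,p - D\,\nabla p, \\
  0 &= \nabla\!\cdot J^{*}, \qquad J^{*} = b\,p^{*} - D\,\nabla p^{*}, \\
  \Phi_I &:= -\ln p^{*} \;\Longrightarrow\; b = -D\,\nabla\Phi_I + \frac{J^{*}}{p^{*}}, \\
  \Sigma &= \int \frac{|J^{*}|^{2}}{D\,p^{*}}\,dx \;\ge\; 0 .
\end{align}
```

The fourth line follows from substituting ∇p* = -p* ∇Φ_I into the stationary current. At equilibrium J* = 0, so the drift is a pure gradient of Φ_I and Σ vanishes; a non-zero stationary current is what makes a second quantity necessary alongside Φ_I.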

minor comments (2)

Notation for Φ_I := -ln p* should be accompanied by a precise definition of the stationary density p* and its relation to the driving protocol.
The manuscript would benefit from a dedicated section contrasting the two-field predictions with existing single-field results (e.g., singular learning theory) using at least one concrete observable.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major point below, clarifying the scope of the manuscript as a unifying perspective while incorporating revisions where they strengthen the presentation.

Referee: [Abstract] Abstract and main text: the central claim that the joint scaling of (α_dagger, κ_c) defines a universality class with exponents (γ1, γ2) admitting falsifiable predictions is not supported by any dynamical equations for the evolution of Σ and Φ_I, nor by renormalization-group or scaling analysis that would extract γ1 and γ2. Without these steps the universality class is introduced by definition rather than derived, rendering the distinction from single-field accounts untestable within the manuscript.

Authors: We agree that explicit dynamical equations and a renormalization-group analysis would make the proposed universality class more rigorously testable. The manuscript introduces the two-field structure and candidate order parameters as a geometric perspective motivated by the dual gradients Σ and Φ_I, with the joint scaling and exponents (γ1, γ2) offered as candidate relations to be derived and tested in follow-up work. We will add a concise section outlining the underlying stochastic dynamics and indicating how the scaling exponents follow from the dual-field coupling, thereby sharpening the distinction from single-field accounts.
Revision: partial
Referee: [Abstract] Abstract: the asserted consistency with 2024–2026 empirical findings on alignment transitions, adversarial breakdown scaling, and partial introspection is presented without explicit mapping from the two-field dynamics to the order parameters, without error analysis, and without data-exclusion rules. This raises the risk that α_dagger and κ_c are constructed to match observations rather than independently predicted.

Authors: The order parameters α_dagger and κ_c are defined from the geometric properties of the two gradient fields prior to any empirical comparison. The reported consistency is presented as an initial illustration of relevance rather than a calibrated fit. In revision we will include an explicit mapping from the two-field parameters to the cited empirical signatures, together with a discussion of falsification criteria and potential selection effects.
Revision: yes
Referee: [Main text] Main text (framework outline): the geometric structure of the two-field perspective is outlined but no explicit stochastic process or Fokker–Planck equation linking Σ and Φ_I to the stationary density p* is supplied, leaving the information quasi-potential and entropy-production rate as formal labels without operational dynamics.

Authors: We accept that supplying the explicit stochastic process and Fokker–Planck equation would render the framework more operational. The current text emphasizes the geometric unification; we will add a brief derivation of the stationary density p* from the dual-gradient dynamics in the revised manuscript or as a supplementary appendix.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; framework proposal remains self-contained.

The manuscript proposes a two-field perspective on driven informational systems governed by Sigma and Phi_I, introduces candidate order parameters alpha_dagger and kappa_c as new quantities, and states that their joint scaling defines a candidate universality class with exponents gamma_1 and gamma_2. It outlines geometric structure, identifies distinguishing falsifiable predictions, and notes consistency with external empirical findings from 2024-2026. No derivation chain is exhibited in which the exponents, predictions, or order parameters reduce by construction to fitted inputs, self-definitions, or prior self-citations. The central claim functions as a conceptual unification rather than a closed mathematical reduction to the paper's own inputs, leaving the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The proposal rests on the domain assumption that the two classes of phenomena share a common two-field description; no free parameters or invented entities are numerically specified in the abstract, but the order parameters function as new constructs whose independence from data fitting is unverified.

axioms (1)

domain assumption: Phase-transition phenomena in deep learning and non-equilibrium chemical reaction networks admit a common description as driven informational systems governed by two gradient fields Sigma and Phi_I.
Stated directly in the abstract as the unifying premise.

invented entities (2)

adversarial breakdown threshold alpha_dagger (no independent evidence)
purpose: Candidate order parameter marking phase transition under adversarial conditions
Introduced without independent falsifiable handle or derivation shown in abstract.

self-referential coupling threshold kappa_c (no independent evidence)
purpose: Candidate order parameter marking self-referential phase transition
Introduced without independent falsifiable handle or derivation shown in abstract.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 6 internal anchors

[1] Truong, Xuan Khanh. 2026.
[2] Truong, Quynh Hoa; Truong, Xuan Khanh. 2026. doi:10.64898/2026.04.21.719958
[3] Power, Alethea; Burda, Yuri; Edwards, Harri; Babuschkin, Igor; Misra, Vedant. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. 2022. doi:10.48550/arXiv.2201.02177
[4] Nanda, Neel; Chan, Lawrence; Lieberum, Tom; Smith, Jess; Steinhardt, Jacob. Progress measures for grokking via mechanistic interpretability. The Eleventh International Conference on Learning Representations (ICLR). doi:10.48550/arXiv.2301.05217
[5] Liu, Ziming; Kitouni, Ouail; Nolte, Niklas; Michaud, Eric J.; Tegmark, Max; Williams, Mike. Advances in Neural Information Processing Systems (NeurIPS). doi:10.48550/arXiv.2205.10343
[6] Liu, Ziming; Zhong, Ziqian; Tegmark, Max. 2023. doi:10.48550/arXiv.2310.05918
[7] DeMoss, Branton; Sapora, Silvia; Foerster, Jakob; Hawes, Nick; Posner, Ingmar. 2025. doi:10.1016/j.physd.2025.134859. URL https://www.sciencedirect.com/science/article/pii/S0167278925003367
[8] Clauw, Kenzo; Stramaglia, Sebastiano; Marinazzo, Daniele. 2024. doi:10.48550/arXiv.2408.08944
[9] Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William. 2022.
[10] Olsson, Catherine; Elhage, Nelson; Nanda, Neel; Joseph, Nicholas; DasSarma, Nova; Henighan, Tom; Mann, Ben; Askell, Amanda; Bai, Yuntao; Chen, Anna; Conerly, Tom; Drain, Dawn; Ganguli, Deep; Hatfield-Dodds, Zac; Hernandez, Danny; Johnston, Scott; Jones, Andy; Kernion, Jackson; Lovitt, Liane; Ndousse... 2022.
[11] Hoogland, Jesse; Wang, George; Farrugia-Roberts, Matthew; Carroll, Liam; Wei, Susan; Murfet, Daniel. 2024. doi:10.48550/arXiv.2402.02364
[12] Pepin Lehalleur, Simon; Hoogland, Jesse; Farrugia-Roberts, Matthew; Wei, Susan; Gietelink Oldenziel, Alexander; Wang, George; Carroll, Liam; Murfet, Daniel. 2025. doi:10.48550/arXiv.2502.05475
[13] Watanabe, Sumio. 2009.
[14] Casper, Stephen; Davies, Xander; Shi, Claudia; Gilbert, Thomas Krendl; Scheurer, J. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Transactions on Machine Learning Research (TMLR).
[15] Lindsey, Jack. 2025.
[16] Binder, Felix J.; Chua, James; Korbak, Tomek; Sleight, Henry; Hughes, John; Long, Robert; Perez, Ethan; Turpin, Miles; Evans, Owain. 2024. arXiv:2410.13787 [cs]. doi:10.48550/arXiv.2410.13787
[17] Prakki, Rithvik. 2024. doi:10.48550/arXiv.2412.10425
[18] Hampel, Frank R. 1971.
[19] Donoho, David L.; Huber, Peter J. A Festschrift for. 1983.
[20] Diakonikolas, Ilias; Kane, Daniel M. 2023.
[21] Hanneke, Steve; Karbasi, Amin; Mahmoody, Mohammad; Mehalel, Idan; Moran, Shay. Advances in Neural Information Processing Systems (NeurIPS). doi:10.48550/arXiv.2210.02713
[22] Chornomaz, Bogdan; Koren, Yonatan; Moran, Shay; Waknine, Tom. 2025. doi:10.48550/arXiv.2506.03075
[23] Le Cam, Lucien. 1986.
[24] Cunningham, Hoagy; Ewart, Aidan; Riggs, Logan; Huben, Robert; Sharkey, Lee. Sparse Autoencoders Find Highly Interpretable Features in Language Models. 2023. doi:10.48550/arXiv.2309.08600
[25] Schnakenberg, J. 1976.
[26] Seifert, Udo. 2012.
[27] Freidlin, Mark I.; Wentzell, Alexander D. 1984.
[28] Meyn, Sean P.; Tweedie, Richard L. 1993.
[29] Kelly, Frank P. 1979.
[30] England, Jeremy L. 2015.
[31] Busiello, Daniel M.; Liang, Shiling; Piazza, Francesco; De Los Rios, Paolo. 2021.
[32] Liang, Shiling; De Los Rios, Paolo; Busiello, Daniel M. 2024.
[33] Liang, Shiling; De Los Rios, Paolo; Busiello, Daniel M. 2024. doi:10.48550/arXiv.2407.11498
[34] Friston, Karl. 2010.
[35] Ramstead, Maxwell J. D.; Sakthivadivel, Dalton A. R.; Heins, Conor; Koudahl, Magnus; Millidge, Beren; Da Costa, Lancelot; Klein, Brennan; Friston, Karl J. 2023.
[36] Walker, Sara Imari; Davies, Paul C. W. 2013.
[37] Prokopenko, Mikhail; Davies, Paul C. W.; Harr. Biological Arrow of Time: Emergence of Tangled Information Hierarchies and Self-Modelling Dynamics. Journal of Physics: Complexity.
[38] Levin, Michael. 2023.
[39] Maynard Smith, John; Szathm. The Major Transitions in Evolution.
[40] Ferris, James P.; Hill, Aubrey R.; Liu, Rihe; Orgel, Leslie E. 1996.
[41] Blank, Jennifer G.; Miller, Glenn H.; Ahrens, Mark J.; Winans, Randall E. 2001.
[42] Matreux, Thomas; Aikkila, Paula; Scheu, Bettina; Braun, Dieter; Mast, Christof B. 2024.
[43] Floroni, Alexander; Yeh Mart. Membraneless Protocell Confined by a Heat Flow. Nature Physics.
[44] Rout, Saroj K.; Wunnava, Sreekar; Krepl, Milo. Amino Acids Catalyse. 2025.
[45] Singh, Jyoti; Thoma, Benjamin; Whitaker, Daniel; Satterly Webley, Max; Yao, Yuan; Powner, Matthew W. 2025.
[46] Damer, Bruce; Deamer, David. 2020.
[47] Kvenvolden, Keith A.; Lawless, James; Pering, Katherine; Peterson, Etta; Flores, Jose; Ponnamperuma, Cyril; Kaplan, Isaac R.; Moore, Carleton. 1970.
[48] Oba, Yasuhiro; Koga, Toshiki; Takano, Yoshinori; Ogawa, Nanako O.; Ohkouchi, Naohiko; Sasaki, Kazunori; Sato, Hajime; Glavin, Daniel P.; Dworkin, Jason P.; Naraoka, Hiroshi; et al. 2023.
[49] McGuire, Brett A. 2022.
[50] Linfoot, E. H. 1957.
[51] Kraskov, Alexander; St. Estimating Mutual Information. Physical Review E.
[52] Belghazi, Mohamed Ishmael; Baratin, Aristide; Rajeswar, Sai; Ozair, Sherjil; Bengio, Yoshua; Courville, Aaron; Hjelm, R. Devon. Proceedings of the 35th International Conference on Machine Learning (ICML).
[53] Turner, Edward; Soligo, Anna; Taylor, Mia; Rajamanoharan, Senthooran; Nanda, Neel. arXiv preprint arXiv:2506.11613. doi:10.48550/arXiv.2506.11613
[54] Soligo, Anna; Turner, Edward; Rajamanoharan, Senthooran; Nanda, Neel. arXiv preprint arXiv:2506.11618. doi:10.48550/arXiv.2506.11618
[55] Arnold, Julian; L. Decomposing Behavioral Phase Transitions in. arXiv preprint arXiv:2508.20015. doi:10.48550/arXiv.2508.20015
[56] Hennick, Max; Corlouer, Guillaume. arXiv preprint arXiv:2603.29805. doi:10.48550/arXiv.2603.29805
[57] Souly, Alexandra; et al. Poisoning attacks on LLMs require a near-constant number of poison samples. arXiv preprint arXiv:2510.07192, 2025. doi:10.48550/arXiv.2510.07192
[58] Lederman, Harvey; Mahowald, Kyle. Emergent Introspection in AI is Content-Agnostic. arXiv preprint arXiv:2603.05414. doi:10.48550/arXiv.2603.05414
[59] Macar, Uzay; Yang, Li; Wang, Atticus; Wallich, Peter; Ameisen, Emmanuel; Lindsey, Jack. Mechanisms of Introspective Awareness. arXiv preprint arXiv:2603.21396. doi:10.48550/arXiv.2603.21396
[60] Song, Siyuan; Lederman, Harvey; Hu, Jennifer; Mahowald, Kyle. arXiv preprint arXiv:2508.14802. doi:10.48550/arXiv.2508.14802
[61] Premakumar, Vickram N.; Vaiana, Michael; Pop, Florin; Rosenblatt, Judd; Schwerz de Lucena, Diogo; Ziman, Kirsten; Graziano, Michael S. A. arXiv preprint arXiv:2407.10188. doi:10.48550/arXiv.2407.10188
[62] Eichhorn, Astrid; Mesterh. Universal Behavior of Coupled Order Parameters Below Three Dimensions. 2013.
[63] Hasselmann, Nils; Sinner, Andreas; Kopietz, Peter. Physical Review E. 2007.
[64] Lecu. Robust Machine Learning by Median-of-Means: Theory and Practice. 2020.
[65] Truong, Xuan Khanh; Truong, Quynh Hoa; Luu, Duc Trung; Phan, Thanh Duc. The Norm-Separation Delay Law of Grokking: A First-Principles Theory of Delayed Generalization. arXiv preprint arXiv:2603.13331. doi:10.48550/arXiv.2603.13331
[66] Truong, Xuan Khanh; Truong, Quynh Hoa. arXiv preprint arXiv:2603.07323. doi:10.48550/arXiv.2603.07323
[67] Bereska, Leonard F.; Tzifa-Kratira, Zoe; Samavi, Reza; Gavves, Efstratios. Transactions on Machine Learning Research (TMLR).
[68] Bi, Yuda; Zhang, Chenyu; Wang, Qiheng; Calhoun, Vince D. arXiv preprint arXiv:2603.24746. doi:10.48550/arXiv.2603.24746
[69] Soligo, Anna; Turner, Edward; Taylor, Mia; Rajamanoharan, Senthooran; Nanda, Neel. arXiv preprint arXiv:2602.07852. doi:10.48550/arXiv.2602.07852


work page doi:10.48550/arxiv.2402.02364 2024

[12] [12]

2025 , journal =

Pepin Lehalleur, Simon and Hoogland, Jesse and Farrugia-Roberts, Matthew and Wei, Susan and Gietelink Oldenziel, Alexander and Wang, George and Carroll, Liam and Murfet, Daniel , title =. 2025 , journal =. doi:10.48550/arXiv.2502.05475 , url =

work page doi:10.48550/arxiv.2502.05475 2025

[13] [13]

2009 , publisher =

Watanabe, Sumio , title =. 2009 , publisher =

work page 2009

[14] [14]

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback , year =

Casper, Stephen and Davies, Xander and Shi, Claudia and Gilbert, Thomas Krendl and Scheurer, J. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback , year =. Transactions on Machine Learning Research (TMLR) , url =

work page

[15] [15]

2025 , month =

Lindsey, Jack , title =. 2025 , month =

work page 2025

[16] [16]

arXiv:2410.13787 [cs]

Binder, Felix J. and Chua, James and Korbak, Tomek and Sleight, Henry and Hughes, John and Long, Robert and Perez, Ethan and Turpin, Miles and Evans, Owain , title =. 2024 , journal =. doi:10.48550/arXiv.2410.13787 , url =

work page doi:10.48550/arxiv.2410.13787 2024

[17] [17]

2024 , journal =

Prakki, Rithvik , title =. 2024 , journal =. doi:10.48550/arXiv.2412.10425 , url =

work page doi:10.48550/arxiv.2412.10425 2024

[18] [18]

, title =

Hampel, Frank R. , title =. 1971 , journal =

work page 1971

[19] [19]

and Huber, Peter J

Donoho, David L. and Huber, Peter J. , title =. A Festschrift for. 1983 , publisher =

work page 1983

[20] [20]

, title =

Diakonikolas, Ilias and Kane, Daniel M. , title =. 2023 , publisher =

work page 2023

[21] [21]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Hanneke, Steve and Karbasi, Amin and Mahmoody, Mohammad and Mehalel, Idan and Moran, Shay , title =. Advances in Neural Information Processing Systems (NeurIPS) , year =. doi:10.48550/arXiv.2210.02713 , url =

work page doi:10.48550/arxiv.2210.02713

[22] [22]

2025 , journal =

Chornomaz, Bogdan and Koren, Yonatan and Moran, Shay and Waknine, Tom , title =. 2025 , journal =. doi:10.48550/arXiv.2506.03075 , url =

work page doi:10.48550/arxiv.2506.03075 2025

[23] [23]

1986 , publisher =

Le Cam, Lucien , title =. 1986 , publisher =

work page 1986

[24] [24]

Sparse Autoencoders Find Highly Interpretable Features in Language Models

Cunningham, Hoagy and Ewart, Aidan and Riggs, Logan and Huben, Robert and Sharkey, Lee , title =. 2023 , journal =. doi:10.48550/arXiv.2309.08600 , url =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2309.08600 2023

[25] [25]

, title =

Schnakenberg, J. , title =. 1976 , journal =

work page 1976

[26] [26]

2012 , journal =

Seifert, Udo , title =. 2012 , journal =

work page 2012

[27] [27]

and Wentzell, Alexander D

Freidlin, Mark I. and Wentzell, Alexander D. , title =. 1984 , publisher =

work page 1984

[28] [28]

and Tweedie, Richard L

Meyn, Sean P. and Tweedie, Richard L. , title =. 1993 , journal =

work page 1993

[29] [29]

, title =

Kelly, Frank P. , title =. 1979 , publisher =

work page 1979

[30] [30]

, title =

England, Jeremy L. , title =. 2015 , journal =

work page 2015

[31] [31]

and Liang, Shiling and Piazza, Francesco and De Los Rios, Paolo , title =

Busiello, Daniel M. and Liang, Shiling and Piazza, Francesco and De Los Rios, Paolo , title =. 2021 , journal =

work page 2021

[32] [32]

, title =

Liang, Shiling and De Los Rios, Paolo and Busiello, Daniel M. , title =. 2024 , journal =

work page 2024

[33] [33]

, title =

Liang, Shiling and De Los Rios, Paolo and Busiello, Daniel M. , title =. 2024 , journal =. doi:10.48550/arXiv.2407.11498 , url =

work page doi:10.48550/arxiv.2407.11498 2024

[34] [34]

2010 , journal =

Friston, Karl , title =. 2010 , journal =

work page 2010

[35] [35]

Ramstead, Maxwell J. D. and Sakthivadivel, Dalton A. R. and Heins, Conor and Koudahl, Magnus and Millidge, Beren and Da Costa, Lancelot and Klein, Brennan and Friston, Karl J. , title =. 2023 , journal =

work page 2023

[36] [36]

Walker, Sara Imari and Davies, Paul C. W. , title =. 2013 , journal =

work page 2013

[37] [37]

Prokopenko, Mikhail and Davies, Paul C. W. and Harr. Biological Arrow of Time: Emergence of Tangled Information Hierarchies and Self-Modelling Dynamics , year =. Journal of Physics: Complexity , volume =

work page

[38] [38]

2023 , journal =

Levin, Michael , title =. 2023 , journal =

work page 2023

[39] [39]

The Major Transitions in Evolution , year =

Maynard Smith, John and Szathm. The Major Transitions in Evolution , year =

work page

[40] [40]

and Hill, Aubrey R

Ferris, James P. and Hill, Aubrey R. and Liu, Rihe and Orgel, Leslie E. , title =. 1996 , journal =

work page 1996

[41] [41]

and Miller, Glenn H

Blank, Jennifer G. and Miller, Glenn H. and Ahrens, Mark J. and Winans, Randall E. , title =. 2001 , journal =

work page 2001

[42] [42]

, title =

Matreux, Thomas and Aikkila, Paula and Scheu, Bettina and Braun, Dieter and Mast, Christof B. , title =. 2024 , journal =

work page 2024

[43] [43]

Membraneless Protocell Confined by a Heat Flow , year =

Floroni, Alexander and Yeh Mart. Membraneless Protocell Confined by a Heat Flow , year =. Nature Physics , volume =

work page

[44] [44]

and Wunnava, Sreekar and Krepl, Milo

Rout, Saroj K. and Wunnava, Sreekar and Krepl, Milo. Amino Acids Catalyse. 2025 , journal =

work page 2025

[45] [45]

, title =

Singh, Jyoti and Thoma, Benjamin and Whitaker, Daniel and Satterly Webley, Max and Yao, Yuan and Powner, Matthew W. , title =. 2025 , journal =

work page 2025

[46] [46]

2020 , journal =

Damer, Bruce and Deamer, David , title =. 2020 , journal =

work page 2020

[47] [47]

and Lawless, James and Pering, Katherine and Peterson, Etta and Flores, Jose and Ponnamperuma, Cyril and Kaplan, Isaac R

Kvenvolden, Keith A. and Lawless, James and Pering, Katherine and Peterson, Etta and Flores, Jose and Ponnamperuma, Cyril and Kaplan, Isaac R. and Moore, Carleton , title =. 1970 , journal =

work page 1970

[48] [48]

and Ohkouchi, Naohiko and Sasaki, Kazunori and Sato, Hajime and Glavin, Daniel P

Oba, Yasuhiro and Koga, Toshiki and Takano, Yoshinori and Ogawa, Nanako O. and Ohkouchi, Naohiko and Sasaki, Kazunori and Sato, Hajime and Glavin, Daniel P. and Dworkin, Jason P. and Naraoka, Hiroshi and others , title =. 2023 , journal =

work page 2023

[49] [49]

, title =

McGuire, Brett A. , title =. 2022 , journal =

work page 2022

[50] [50]

Linfoot, E. H. , title =. 1957 , journal =

work page 1957

[51] [51]

Estimating Mutual Information , year =

Kraskov, Alexander and St. Estimating Mutual Information , year =. Physical Review E , volume =

work page

[52] [52]

Devon , title =

Belghazi, Mohamed Ishmael and Baratin, Aristide and Rajeswar, Sai and Ozair, Sherjil and Bengio, Yoshua and Courville, Aaron and Hjelm, R. Devon , title =. Proceedings of the 35th International Conference on Machine Learning (ICML) , year =

work page

[53] [53]

arXiv preprint arXiv:2506.11613 , year =

Turner, Edward and Soligo, Anna and Taylor, Mia and Rajamanoharan, Senthooran and Nanda, Neel , title =. arXiv preprint arXiv:2506.11613 , year =. doi:10.48550/arXiv.2506.11613 , url =

work page doi:10.48550/arxiv.2506.11613

[54] [54]

arXiv preprint arXiv:2506.11618 , year =

Soligo, Anna and Turner, Edward and Rajamanoharan, Senthooran and Nanda, Neel , title =. arXiv preprint arXiv:2506.11618 , year =. doi:10.48550/arXiv.2506.11618 , url =

work page doi:10.48550/arxiv.2506.11618

[55] [55]

Decomposing Behavioral Phase Transitions in

Arnold, Julian and L. Decomposing Behavioral Phase Transitions in. arXiv preprint arXiv:2508.20015 , year =. doi:10.48550/arXiv.2508.20015 , url =

work page doi:10.48550/arxiv.2508.20015

[56] [56]

Bill Z Jia, Yitong Qi, J David Wong-Campos, Sean G Megason, and Adam E Cohen

Hennick, Max and Corlouer, Guillaume , title =. arXiv preprint arXiv:2603.29805 , year =. doi:10.48550/arXiv.2603.29805 , url =

work page doi:10.48550/arxiv.2603.29805

[57] [57]

Poisoning attacks on llms require a near-constant number of poison samples.arXiv preprint arXiv:2510.07192, 2025

Souly, Alexandra and others , title =. arXiv preprint arXiv:2510.07192 , year =. doi:10.48550/arXiv.2510.07192 , url =

work page doi:10.48550/arxiv.2510.07192

[58] [58]

Emergent Introspection in AI is Content-Agnostic

Lederman, Harvey and Mahowald, Kyle , title =. arXiv preprint arXiv:2603.05414 , year =. doi:10.48550/arXiv.2603.05414 , url =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2603.05414

[59] [59]

Mechanisms of Introspective Awareness

Macar, Uzay and Yang, Li and Wang, Atticus and Wallich, Peter and Ameisen, Emmanuel and Lindsey, Jack , title =. arXiv preprint arXiv:2603.21396 , year =. doi:10.48550/arXiv.2603.21396 , url =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2603.21396

[60] [60]

arXiv preprint arXiv:2508.14802 , year =

Song, Siyuan and Lederman, Harvey and Hu, Jennifer and Mahowald, Kyle , title =. arXiv preprint arXiv:2508.14802 , year =. doi:10.48550/arXiv.2508.14802 , url =

work page doi:10.48550/arxiv.2508.14802

[61] [61]

and Vaiana, Michael and Pop, Florin and Rosenblatt, Judd and Schwerz de Lucena, Diogo and Ziman, Kirsten and Graziano, Michael S

Premakumar, Vickram N. and Vaiana, Michael and Pop, Florin and Rosenblatt, Judd and Schwerz de Lucena, Diogo and Ziman, Kirsten and Graziano, Michael S. A. , title =. arXiv preprint arXiv:2407.10188 , year =. doi:10.48550/arXiv.2407.10188 , url =

work page doi:10.48550/arxiv.2407.10188

[62] [62]

Universal Behavior of Coupled Order Parameters Below Three Dimensions , journal =

Eichhorn, Astrid and Mesterh. Universal Behavior of Coupled Order Parameters Below Three Dimensions , journal =. 2013 , note =

work page 2013

[63] [63]

Physical Review E , volume =

Hasselmann, Nils and Sinner, Andreas and Kopietz, Peter , title =. Physical Review E , volume =. 2007 , doi =

work page 2007

[64] [64]

Robust Machine Learning by Median-of-Means: Theory and Practice , journal =

Lecu. Robust Machine Learning by Median-of-Means: Theory and Practice , journal =. 2020 , note =

work page 2020

[65] [65]

The Norm-Separation Delay Law of Grokking: A First-Principles Theory of Delayed Generalization

Truong, Xuan Khanh and Truong, Quynh Hoa and Luu, Duc Trung and Phan, Thanh Duc , title =. arXiv preprint arXiv:2603.13331 , year =. doi:10.48550/arXiv.2603.13331 , url =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2603.13331

[66] [66]

arXiv preprint arXiv:2603.07323 , year =

Truong, Xuan Khanh and Truong, Quynh Hoa , title =. arXiv preprint arXiv:2603.07323 , year =. doi:10.48550/arXiv.2603.07323 , url =

work page doi:10.48550/arxiv.2603.07323

[67] [67]

and Tzifa-Kratira, Zoe and Samavi, Reza and Gavves, Efstratios , title =

Bereska, Leonard F. and Tzifa-Kratira, Zoe and Samavi, Reza and Gavves, Efstratios , title =. Transactions on Machine Learning Research (TMLR) , year =

work page

[68] [68]

, title =

Bi, Yuda and Zhang, Chenyu and Wang, Qiheng and Calhoun, Vince D. , title =. arXiv preprint arXiv:2603.24746 , year =. doi:10.48550/arXiv.2603.24746 , url =

work page doi:10.48550/arxiv.2603.24746

[69] [69]

arXiv preprint arXiv:2602.07852 , year =

Soligo, Anna and Turner, Edward and Taylor, Mia and Rajamanoharan, Senthooran and Nanda, Neel , title =. arXiv preprint arXiv:2602.07852 , year =. doi:10.48550/arXiv.2602.07852 , url =

work page doi:10.48550/arxiv.2602.07852