arxiv: 2301.05217 · v3 · submitted 2023-01-12 · 💻 cs.LG · cs.AI

Progress measures for grokking via mechanistic interpretability

Neel Nanda , Lawrence Chan , Tom Lieberum , Jess Smith , Jacob Steinhardt

Pith reviewed 2026-05-14 21:47 UTC · model grok-4.3

classification: cs.LG, cs.AI

keywords: grokking, mechanistic interpretability, modular addition, Fourier transform, progress measures, transformer circuits, emergent behavior, training dynamics

The pith

Transformers on modular addition learn a Fourier rotation algorithm that gradually replaces memorization during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reverse-engineers small transformers trained on modular addition and shows they implement addition by converting inputs to a circle via discrete Fourier transforms and using trigonometric identities to rotate one number by the other. This structured circuit forms gradually rather than appearing suddenly. The authors track the process with progress measures that split training into memorization, circuit formation, and cleanup phases. Grokking therefore reflects the slow amplification of the Fourier mechanism in the weights followed by removal of the memorizing components that were learned first.

Core claim

Small transformers trained on modular addition fully implement the task by representing numbers as points on a circle in Fourier space, rotating one input by the angle given by the other, and reading out the result; this circuit is visible in the weights and activations, confirmed by Fourier-space ablations, and its gradual growth plus later cleanup of memorization circuits produces the grokking phenomenon.

What carries the argument

The discrete Fourier transform circuit that encodes inputs on a circle and performs addition via rotation using trigonometric identities.
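
The rotation mechanism described above can be sketched in a few lines of NumPy. This is a hand-written illustration of the idea, not the network's learned weights: the function name `fourier_mod_add`, the modulus 113, and the frequencies `[3, 7, 11]` are all assumptions chosen for the example.

```python
import numpy as np

def fourier_mod_add(a: int, b: int, p: int, freqs: list[int]) -> int:
    """Compute (a + b) mod p by rotating points on the unit circle.

    Illustrative sketch only; freqs are hypothetical nonzero frequencies.
    """
    c = np.arange(p)          # every candidate output 0..p-1
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        # Embed each input as a point on the circle at frequency k.
        cos_a, sin_a = np.cos(w * a), np.sin(w * a)
        cos_b, sin_b = np.cos(w * b), np.sin(w * b)
        # Angle-addition identities: rotate a's point by b's angle.
        cos_ab = cos_a * cos_b - sin_a * sin_b
        sin_ab = sin_a * cos_b + cos_a * sin_b
        # Read-out: this equals cos(w * (a + b - c)), which peaks at c = (a + b) mod p.
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return int(np.argmax(logits))

print(fourier_mod_add(5, 110, 113, [3, 7, 11]))  # 2, since 115 mod 113 = 2
```

Each frequency contributes a cosine that reaches its maximum exactly at the correct residue, so summing several frequencies sharpens the peak without moving it.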

If this is right

Training dynamics divide into three phases: early memorization, gradual circuit formation, and late cleanup of memorizing components.
Progress measures based on Fourier components and weight norms track the continuous growth of the structured algorithm.
Grokking is not a discontinuous jump but the point at which the Fourier circuit overtakes memorization in accuracy.
The same reverse-engineering approach can identify similar circuits in other algorithmic tasks.
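
One way to picture a Fourier-based progress measure is to ask how much of a weight matrix's squared norm sits in a handful of frequencies. The sketch below is an assumption-laden illustration and not the paper's own definition of its measures: `fourier_basis`, `key_frequency_fraction`, and the toy embedding are hypothetical names and values, and the modulus is assumed to be odd.

```python
import numpy as np

def fourier_basis(p: int) -> np.ndarray:
    """Orthonormal real Fourier basis over the integers mod p (p odd), shape (p, p)."""
    x = np.arange(p)
    rows = [np.ones(p) / np.sqrt(p)]                      # constant term
    for k in range(1, (p - 1) // 2 + 1):
        rows.append(np.sqrt(2 / p) * np.cos(2 * np.pi * k * x / p))
        rows.append(np.sqrt(2 / p) * np.sin(2 * np.pi * k * x / p))
    return np.stack(rows)

def key_frequency_fraction(w_embed: np.ndarray, key_freqs: list[int]) -> float:
    """Fraction of squared norm of a (p, d) matrix carried by the given frequencies."""
    p = w_embed.shape[0]
    coeffs = fourier_basis(p) @ w_embed                   # Fourier coefficients per column
    energy = (coeffs ** 2).sum(axis=1)                    # energy per basis row
    # Frequency k occupies rows 2k-1 (cos) and 2k (sin).
    key = sum(energy[2 * k - 1] + energy[2 * k] for k in key_freqs)
    return float(key / energy.sum())

# Toy embedding built only from frequencies 2 and 3 (hypothetical values).
p = 11
x = np.arange(p)
toy = np.stack(
    [np.cos(2 * np.pi * 2 * x / p), np.sin(2 * np.pi * 2 * x / p), np.cos(2 * np.pi * 3 * x / p)],
    axis=1,
)
print(key_frequency_fraction(toy, [2, 3]))  # close to 1.0
```

Evaluated at successive checkpoints, a quantity of this kind would rise smoothly as structure concentrates in a few frequencies, which is the sense in which such a measure can be continuous while test accuracy is not.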

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar progress measures could be defined for other emergent behaviors by first reverse-engineering the underlying circuit.
The cleanup phase suggests that regularization or longer training might systematically prune memorization in favor of structured solutions.
If the Fourier mechanism generalizes, modular arithmetic tasks could serve as a testbed for studying how networks discover group representations.

Load-bearing premise

The Fourier rotation circuit is the main mechanism responsible for the network's behavior once grokking occurs.

What would settle it

The claim would be refuted if ablating the identified Fourier components in the weights and activations left the network still able to compute modular addition correctly.
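
A Fourier-space ablation of this kind can be sketched on a toy signal. The example below is a simplified stand-in, not the paper's experimental code: `ablate_frequencies`, the modulus 113, and the key frequencies `[3, 7]` are assumptions, and the "logits" are idealized and noise-free rather than taken from a trained model.

```python
import numpy as np

def ablate_frequencies(signal: np.ndarray, freqs: list[int]) -> np.ndarray:
    """Zero the given frequency bins of a real signal along axis 0."""
    spec = np.fft.rfft(signal, axis=0)       # bins 0..p//2
    spec[list(freqs)] = 0.0                  # remove the chosen frequencies
    return np.fft.irfft(spec, n=signal.shape[0], axis=0)

# Idealized logits over outputs c for inputs a, b (hypothetical values).
p, a, b = 113, 5, 110
key_freqs = [3, 7]
c = np.arange(p)
logits = sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in key_freqs)

# Removing the key frequencies flattens the logits, so no answer is preferred.
without_key = ablate_frequencies(logits, key_freqs)

# Removing every other nonzero frequency leaves the prediction intact.
other = [k for k in range(1, p // 2 + 1) if k not in key_freqs]
without_other = ablate_frequencies(logits, other)

print(np.abs(without_key).max() < 1e-9, int(np.argmax(without_other)) == (a + b) % p)
```

In a trained network the non-key frequencies carry noise rather than exactly zero signal, so removing them can change performance slightly; the toy case only shows the asymmetry between the two ablations.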

Original abstract

Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous progress measures that underlie the seemingly discontinuous qualitative changes. We argue that progress measures can be found via mechanistic interpretability: reverse-engineering learned behaviors into their individual components. As a case study, we investigate the recently-discovered phenomenon of "grokking" exhibited by small transformers trained on modular addition tasks. We fully reverse engineer the algorithm learned by these networks, which uses discrete Fourier transforms and trigonometric identities to convert addition to rotation about a circle. We confirm the algorithm by analyzing the activations and weights and by performing ablations in Fourier space. Based on this understanding, we define progress measures that allow us to study the dynamics of training and split training into three continuous phases: memorization, circuit formation, and cleanup. Our results show that grokking, rather than being a sudden shift, arises from the gradual amplification of structured mechanisms encoded in the weights, followed by the later removal of memorizing components.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper reverse-engineers grokking on modular addition to an explicit Fourier circuit and splits training into three measurable phases.

The main point is that the authors reverse-engineer the exact algorithm a small transformer learns for modular addition. It uses discrete Fourier transforms plus trig identities to turn addition into rotation around a circle. From that circuit they build progress measures that split training into memorization, circuit formation, and cleanup phases, showing grokking as gradual amplification rather than a sudden jump. This is new: prior grokking work did not map the behavior to this specific Fourier mechanism or decompose the dynamics this way.

The analysis is concrete. They inspect weights and activations, match them to the expected Fourier components, and run ablations in Fourier space that drop performance when the relevant frequencies are removed. The math lines up with known addition identities, so the circuit identification is not just curve-fitting. The progress measures track the phases without being tuned directly to the accuracy curve.

The soft spots are limited. The Fourier ablations are targeted and support the claim that this circuit dominates on the tested models, but it remains possible that parallel non-Fourier computations contribute without being fully isolated. The work stays within small transformers on modular arithmetic, so whether the same phase structure or progress measures apply to other emergence cases is untested.

This is useful for people doing mechanistic interpretability or studying training dynamics on algorithmic tasks. It deserves peer review because the circuit account is detailed, the ablations are reproducible, and the phase decomposition gives a clear handle on what was previously treated as discontinuous.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that grokking in small transformers trained on modular addition arises from a fully reverse-engineered circuit that implements addition via discrete Fourier transforms and trigonometric identities (converting the operation to rotation on a circle). This is validated by direct analysis of weights and activations plus targeted ablations in Fourier space. The circuit understanding is then used to construct progress measures that decompose training into three continuous phases—memorization, circuit formation, and cleanup—showing that generalization emerges gradually from amplification of structured weight components rather than a discontinuous jump.

Significance. If the reverse-engineering and ablation results hold, the work supplies a concrete, mechanistically grounded example of progress measures for an emergent phenomenon. The explicit circuit identification (resting on external Fourier identities rather than post-hoc fitting) and the resulting phase decomposition offer a template for using interpretability to track training dynamics, with the low circularity of the measures being a notable strength.

major comments (2)

[§4 (Fourier ablations)] The ablations in Fourier space produce a large drop in test accuracy, but the manuscript does not report the accuracy of the residual non-Fourier components or test whether any parallel non-ablated computations remain active and contribute to generalization. This is load-bearing for the central claim of complete reverse-engineering.
[§5 (Progress measures)] The three-phase decomposition (memorization, circuit formation, cleanup) is defined from the identified Fourier components. If the ablation does not fully isolate the circuit, the measures may track only a subset of the mechanisms driving the observed test-accuracy curve, weakening the claim that grokking is fully explained by gradual circuit amplification.

minor comments (2)

[Figures 3 and 4] Axis labels and phase boundaries could be annotated more explicitly to make the correspondence between progress-measure curves and the three phases immediately visible.
[Abstract] The abstract's phrasing 'we fully reverse engineer' is strong given the residual-performance question above; a minor softening would better match the evidence presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review and positive recommendation for minor revision. We appreciate the focus on strengthening the evidence for complete reverse-engineering and its implications for the progress measures. We address each major comment below.

Referee: [§4 (Fourier ablations)] The ablations in Fourier space produce a large drop in test accuracy, but the manuscript does not report the accuracy of the residual non-Fourier components or test whether any parallel non-ablated computations remain active and contribute to generalization. This is load-bearing for the central claim of complete reverse-engineering.

Authors: We agree that reporting the test accuracy of the residual non-Fourier components would provide stronger validation for the completeness of our reverse-engineering. In the revised manuscript, we will add this analysis, showing that the non-Fourier residual achieves high training accuracy (consistent with memorization) but near-chance test accuracy on unseen inputs. This demonstrates that no significant parallel non-Fourier computations contribute to generalization, supporting that the Fourier circuit accounts for the learned algorithm.
revision: yes

Referee: [§5 (Progress measures)] The three-phase decomposition (memorization, circuit formation, cleanup) is defined from the identified Fourier components. If the ablation does not fully isolate the circuit, the measures may track only a subset of the mechanisms driving the observed test-accuracy curve, weakening the claim that grokking is fully explained by gradual circuit amplification.

Authors: We maintain that the ablation results (near-total drop in test accuracy upon Fourier component removal) provide strong evidence that the identified circuit is the primary driver, allowing the progress measures to track the full generalization dynamics. However, to further address potential concerns about isolation, we will partially revise §5 by adding explicit correlations between the progress measures and test accuracy, along with controls showing that the phase transitions align specifically with changes in the Fourier components rather than other weight statistics.
revision: partial

Circularity Check

0 steps flagged

No significant circularity; circuit identified via independent analysis before defining progress measures

The paper first reverse-engineers the network's algorithm through direct inspection of weights, activations, and Fourier-space ablations, grounding the DFT-plus-trigonometric circuit in external mathematical identities rather than in the grokking dynamics or any fitted progress measures. Only after this identification do the authors define the three phases (memorization, circuit formation, cleanup) as derived quantities to track training. No step reduces by construction to its own inputs, no parameters are fitted to the target curve and relabeled as predictions, and no load-bearing claim rests on self-citation chains. The derivation remains self-contained and externally falsifiable via ablation experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard properties of the discrete Fourier transform over the integers modulo the task's modulus (a cyclic group) and the assumption that the network's learned weights implement exactly the identified rotation mechanism once the circuit forms.

axioms (2)

[standard math] The discrete Fourier transform converts modular addition into component-wise multiplication (rotation) in frequency space. Invoked in the reverse-engineering section to map the addition task onto trigonometric identities.
[domain assumption] Ablations performed by zeroing specific frequency components isolate the causal mechanism without introducing new artifacts. Used to confirm the circuit; this is a standard mechanistic-interpretability assumption rather than a paper-specific invention.
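
The first axiom can be written out explicitly. The notation below is an assumption introduced for illustration (modulus p, integer frequency k, and w_k = 2πk/p); it states the standard identities behind the claim that addition becomes multiplication, that is, rotation, in frequency space.

```latex
% Assumed notation: modulus p, integer frequency k, w_k = 2\pi k / p.
\begin{align*}
e^{i w_k a}\, e^{i w_k b} &= e^{i w_k (a+b)} = e^{i w_k ((a+b) \bmod p)}, \\
\cos\bigl(w_k (a+b)\bigr) &= \cos(w_k a)\cos(w_k b) - \sin(w_k a)\sin(w_k b), \\
\sin\bigl(w_k (a+b)\bigr) &= \sin(w_k a)\cos(w_k b) + \cos(w_k a)\sin(w_k b).
\end{align*}
```

The second equality in the first line holds because w_k p is an integer multiple of 2π, so reducing a + b modulo p does not change the angle.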

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat embedding and orbit periodicity · echoes
Paper passage: "We fully reverse engineer the algorithm learned by these networks, which uses discrete Fourier transforms and trigonometric identities to convert addition to rotation about a circle."

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · echoes
Paper passage: "ablating key frequencies used by the model reduces performance to chance, while ablating the other 95% of frequencies slightly improves performance"

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · echoes
Paper passage: "training can be split into three continuous phases: memorization, circuit formation, and cleanup"

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models
cs.LG 2026-05 unverdicted novelty 8.0

In the high-dimensional limit the spherical Boltzmann machine admits exact equations for training dynamics, Bayesian evidence, and cascades of phase transitions tied to mode alignment with data, which connect to gener...
When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability
cs.LG 2026-05 unverdicted novelty 7.0

Tensor similarity is a symmetry-invariant metric that measures functional equivalence between tensor-based networks using a recursive algorithm for cross-layer mechanisms.
Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers
cs.AI 2026-05 conditional novelty 7.0

The Divergent Remote Association Test (DRAT) is the first creativity test that significantly predicts LLMs' scientific ideation ability, unlike prior tests such as DAT or RAT.
Interpreting Reinforcement Learning Agents with Susceptibilities
cs.LG 2026-05 unverdicted novelty 7.0

Susceptibilities applied to regret in deep RL agents reveal stagewise internal development in parameter space of a gridworld model that policy inspection alone cannot detect, validated via activation steering.
The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It
cs.LG 2026-05 unverdicted novelty 7.0

Transformers encode counts correctly internally but fail to read them out due to misalignment with digit output directions, fixable by updating 37k output parameters or small LoRA on attention.
ILDR: Geometric Early Detection of Grokking
cs.LG 2026-04 unverdicted novelty 7.0

ILDR detects the geometric reorganization preceding grokking by measuring when inter-class centroid separation exceeds intra-class scatter by 2.5 times its baseline in penultimate-layer representations.
Grokking of Diffusion Models: Case Study on Modular Addition
cs.LG 2026-04 unverdicted novelty 7.0

Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.
Dimensional Criticality at Grokking Across MLPs and Transformers
cs.LG 2026-04 unverdicted novelty 7.0

Effective cascade dimension D(t) crosses D=1 at the grokking transition in MLPs and Transformers, with opposite directions for modular addition versus XOR, consistent with attraction to a shared critical manifold.
The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior
cs.LG 2026-03 unverdicted novelty 7.0

The grokking delay in encoder-decoder models on one-step Collatz prediction stems from decoder inability to use early-learned encoder representations of parity and residue structure, with numeral base acting as a stro...
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
cs.LG 2026-05 unverdicted novelty 6.0

A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory
cs.LG 2026-05 unverdicted novelty 6.0

Random Matrix Theory detects overfitting via growing Correlation Traps in weight spectra during the anti-grokking phase of neural network training.
Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory
cs.LG 2026-05 unverdicted novelty 6.0

A Random Matrix Theory method identifies growing Correlation Traps in neural network weight spectra during an 'anti-grokking' overfitting phase, and applies the same diagnostic to some foundation LLMs.
Not How Many, But Which: Parameter Placement in Low-Rank Adaptation
cs.LG 2026-05 unverdicted novelty 6.0

Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.
Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization
stat.ML 2026-05 unverdicted novelty 6.0

Spectral analysis of activations and gradients provides new diagnostics that link batch size to representation geometry, early covariance tails to token efficiency, and spectral shifts to learning dynamics in decoder-...
Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams
cs.LG 2026-04 unverdicted novelty 6.0

Harmful intent is linearly separable in LLM residual streams across 12 models and multiple architectures, reaching mean AUROC 0.982 while showing protocol-dependent directions and strong generalization to held-out har...
Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams
cs.LG 2026-04 unverdicted novelty 6.0

Harmful intent is geometrically recoverable as a linear direction or angular deviation in LLM residual streams, with high AUROC across 12 models, stable under alignment variants including abliterated ones, and transfe...
LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces
cs.CL 2026-04 unverdicted novelty 6.0

LAG-XAI treats paraphrasing as affine flows in semantic manifolds using Lie-inspired approximations, achieving AUC 0.7713 on paraphrase detection and 95.3% hallucination detection on HaluEval.
Grokking as Dimensional Phase Transition in Neural Networks
cs.LG 2026-04 unverdicted novelty 6.0

Grokking occurs as the effective dimensionality of the gradient field transitions from sub-diffusive to super-diffusive at the onset of generalization, exhibiting self-organized criticality.
PhiNet: Speaker Verification with Phonetic Interpretability
eess.AS 2026-04 unverdicted novelty 6.0

PhiNet adds phonetic interpretability to speaker verification while matching the accuracy of standard black-box models on VoxCeleb, SITW, and LibriSpeech.
Model Capacity Determines Grokking through Competing Memorisation and Generalisation Speeds
cs.LG 2026-05 unverdicted novelty 5.0

Grokking emerges near the model size where memorization timescale T_mem(P) intersects generalization timescale T_gen(P) on modular arithmetic.
Emergent Semantic Role Understanding in Language Models
cs.AI 2026-05 unverdicted novelty 5.0

Semantic role understanding partially emerges during language model pre-training, with linear probes on frozen representations achieving substantial performance that improves with scale but does not match fine-tuned m...
Artificial Jagged Intelligence as Uneven Optimization Energy Allocation Capability Concentration, Redistribution, and Optimization Governance
cs.AI 2026-05 unverdicted novelty 4.0

AJI frames jagged AI capabilities as lower bounds on performance dispersion arising from concentrated optimization energy allocation under anisotropic objectives, with theorems on tradeoffs and redistribution interventions.
Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking
cs.LG 2026-04 unverdicted novelty 4.0

Empirical tests confirm robust feature repulsion signs but reveal activation-dependent spectral lock-in in grokking, with x^2 yielding rank-2 updates at epoch ~174 and ReLU remaining rank-1.
There Will Be a Scientific Theory of Deep Learning
stat.ML 2026-04 unverdicted novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universa...
