arxiv: 2306.17844 · v2 · pith:JF5DCC3Vnew · submitted 2023-06-30 · 💻 cs.LG

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

classification 💻 cs.LG
keywords: networks, neural, algorithm, algorithms, even, tasks, addition, algorithmic

Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.
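For readers unfamiliar with the Clock procedure named in the abstract, the idea can be sketched as follows: each input is mapped to a point on the unit circle, the two angles are added using the trigonometric addition identities, and the answer is the residue whose angle best matches the sum. This is a hand-written illustration, not code from the paper or a description of any trained network's weights. The function name `clock_mod_add`, the modulus `p = 59`, and the single frequency `k = 1` are assumptions chosen for the example.

```python
import math


def clock_mod_add(a: int, b: int, p: int = 59, k: int = 1) -> int:
    """Compute (a + b) mod p by adding angles on a circle.

    Illustrative sketch only: p and k are assumed values, and k must be
    coprime to p so that each residue maps to a distinct angle.
    """
    w = 2 * math.pi * k / p

    # Embed each input as a point on the unit circle.
    cos_a, sin_a = math.cos(w * a), math.sin(w * a)
    cos_b, sin_b = math.cos(w * b), math.sin(w * b)

    # Angle addition: products of the embeddings give cos/sin of w * (a + b).
    cos_sum = cos_a * cos_b - sin_a * sin_b
    sin_sum = sin_a * cos_b + cos_a * sin_b

    # Score each candidate c by cos(w * (a + b - c)), which peaks at c = (a + b) mod p.
    logits = [
        cos_sum * math.cos(w * c) + sin_sum * math.sin(w * c)
        for c in range(p)
    ]
    return max(range(p), key=logits.__getitem__)


if __name__ == "__main__":
    print(clock_mod_add(40, 30))
```

The sketch uses one frequency for clarity; it says nothing about how the Pizza algorithm or the more complex procedures mentioned in the abstract differ, which the paper itself describes.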

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

    cs.LG 2026-05 unverdicted novelty 6.0

    A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

  2. Learning the symmetric group: large from small

    cs.LG 2025-02 unverdicted novelty 5.0

    A transformer trained on S10 permutation prediction from transpositions generalizes to S25 with near-100% accuracy, using identity augmentation and partitioned windows.