Toy models of superposition. Transformer Circuits Thread

Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, Chri · 2022

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws

cs.LG · 2026-05-11 · unverdicted · novelty 8.0

Manifold curvature and intrinsic dimension predict layerwise SAE width exponents and asymptotic floors across Gemma models, with cross-model transfer of the geometric regression, establishing a transferable geometric law instead of a universal scaling law.

Dynamic Latent Routing

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Dynamic Latent Routing jointly learns discrete latent codes, routing policies, and model parameters via dynamic search to match or exceed supervised fine-tuning by 6.6 points on average in low-data settings across four datasets and six models.

Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)

cs.LG · 2026-05-18

citing papers explorer

The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws · cs.LG · 2026-05-11 · unverdicted · none · ref 3
Dynamic Latent Routing · cs.LG · 2026-05-14 · unverdicted · none · ref 13
Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE) · cs.LG · 2026-05-18 · unreviewed · ref 10
