Flat minima are illusory; generalization is driven by weakness, a reparameterization-invariant measure of compatible completions that predicts performance better than sharpness on MNIST and Fashion-MNIST.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.
citing papers explorer
-
Are Flat Minima an Illusion?
Flat minima are illusory; generalization is driven by weakness, a reparameterization-invariant measure of compatible completions that predicts performance better than sharpness on MNIST and Fashion-MNIST.
-
Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery
Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.