Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.
Hidden progress in deep learning
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
Varying evaluation metrics and corruption methods in activation patching produces different localization and circuit discovery outcomes in language models, leading to recommendations for preferred practices.
citing papers explorer
-
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Varying evaluation metrics and corruption methods in activation patching produces different localization and circuit discovery outcomes in language models, leading to recommendations for preferred practices.