arXiv preprint arXiv:2407.20199
5 Pith papers cite this work.
Citing papers:
- Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent
  Two steps of gradient descent on the first-layer weights of linear-width two-layer networks produce a spiked random matrix with ⌊α₂ / (1/2 − α₁)⌋ outliers, each corresponding to a learned direction; reusing the batch lets the network capture directions with information exponent greater than one.
- Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data
  ALU uses public data to suppress unlearning cost quadratically while characterizing distribution mismatch effects, enabling mass unlearning with maintained utility.
- The Geometric Structure of Models Learning Sparse Data
  Normal alignment is the rank-one Jacobian structure that lets classifiers minimize loss and maximize local robustness in sparse regimes; the paper proves its optimality and uses it to create GrokAlign and RFAMs.
- xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
  xRFM merges kernel-based feature learning with tree structures for scalable, interpretable tabular modeling, and reports top performance on 100 regression datasets and competitive results on 200 classification datasets against 31 baselines, including GBDTs and TabPFNv2.
- Law of Neural Interaction: Depth-Width Shape, Interaction Efficiency, and Generalization
  Tuning the depth-width ratio positions models in an efficient neural interaction interval that correlates with better generalization under fixed budgets and remains stable with scale.
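The outlier count quoted for the linear-width two-layer network paper, ⌊α₂ / (1/2 − α₁)⌋, can be evaluated directly. The sketch below is a purely arithmetic illustration under stated assumptions: it treats α₁ and α₂ as given numbers and assumes α₁ < 1/2. The precise definitions and admissible ranges of these exponents are set in that paper, not on this page, and the function name `num_outliers` is hypothetical.

```python
from fractions import Fraction
import math


def num_outliers(alpha1, alpha2):
    """Evaluate floor(alpha2 / (1/2 - alpha1)) exactly (hypothetical helper).

    Arguments may be ints, Fractions, or decimal strings such as "0.1";
    they are converted to exact rationals so that a quotient that should
    equal an integer is not pushed just below it by float rounding.
    Note: passing a float like 0.1 converts its binary value, not 1/10.
    """
    a1 = Fraction(alpha1)
    a2 = Fraction(alpha2)
    gap = Fraction(1, 2) - a1
    # Assumption: the formula is only meaningful when alpha1 < 1/2.
    if gap <= 0:
        raise ValueError("formula assumes alpha1 < 1/2")
    # math.floor on a Fraction returns an int.
    return math.floor(a2 / gap)


print(num_outliers("0.1", "0.8"))
print(num_outliers("0.25", "0.6"))
```

For example, α₁ = 0.1 and α₂ = 0.8 give 0.8 / 0.4 = 2, and α₁ = 0.25, α₂ = 0.6 give ⌊2.4⌋ = 2. Exact fractions are used because a float quotient can fall slightly below an integer (in Python, `0.3 / 0.1` evaluates to `2.9999999999999996`), and the floor would then undercount by one.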