Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent stat.ML · 2026-05-18