Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.
In: Advances in Neural Information Processing Systems (2020)
4 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: background (1)
representative citing papers
A Jacobian sensitivity curve computed at initialization identifies the narrowest U-Net configuration that avoids performance collapse, matching nnU-Net accuracy with 400-1600x fewer parameters on six medical datasets.
A high-level outline is given for a unified theory that reduces learning to a small set of ideas from dynamical systems, geometry, and physics via definitions of solvable problems and parametrized methods.
Arbitrary heterogeneous fan-in profiles in sparse networks match uniform random accuracy at high sparsity, but initializing RigL dynamic sparse training with equilibrium-matched lognormal profiles improves performance by up to 0.49% on classification tasks.
citing papers explorer
- Man, Machine, and Mathematics
A high-level outline is given for a unified theory that reduces learning to a small set of ideas from dynamical systems, geometry, and physics via definitions of solvable problems and parametrized methods.