Prakash, Charles H
3 Pith papers cite this work, all from 2026. Polarity classification is still indexing, so all 3 citations are currently unverdicted.
Citing papers
- Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes
  Slingshot loss spikes are produced by low-precision arithmetic that breaks the zero-sum gradient constraint and drives exponential growth via Numerical Feature Inflation.
- Patnaik-Pearson intrinsic dimension for internal representations of neural networks
  Introduces the Patnaik-Pearson intrinsic dimension estimator, proves some of its properties, relates it to HTSR/SETOL for Pareto spectra, and applies it to track embedding dimension evolution in BERT-base and DeepSeek-R1-Distill-Qwen-1.
- Fast Generalization after Interpolation via Critically Damped Momentum Optimization
  GROKtimizer combines rapid interpolation with critically damped momentum for post-interpolation norm minimization, yielding quadratic speedup over gradient descent under a local quadratic model and better generalization on synthetic and real datasets.