Super-convergence: Very fast training of neural networks using large learning rates

Leslie N Smith, Nicholay Topin · 2019

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View

cs.LG · 2026-03-10 · unverdicted · novelty 7.0

Large loss spikes in SGD are polynomially likely and serve as the dominant mechanism for escaping sharp minima toward flatter solutions in the NTK regime.

Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View cs.LG · 2026-03-10 · unverdicted · none · ref 55
Large loss spikes in SGD are polynomially likely and serve as the dominant mechanism for escaping sharp minima toward flatter solutions in the NTK regime.
Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates cs.LG · 2026-05-19 · unverdicted · none · ref 61
FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.

Super-convergence: Very fast training of neural networks using large learning rates

fields

years

verdicts

representative citing papers

citing papers explorer