SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov, Frank Hutter · 2017

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Bounded-Rationality, Hedging, and Generalization

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Generalization is a testable hedging property of the learner's response law, recovered via f-divergence regularizers that induce information-geometric curves between training loss and sample dependence.

Neural posterior estimation of the neutrino direction in IceCube using transformer-encoded normalizing flows on the sphere

hep-ex · 2026-04-21 · unverdicted · novelty 7.0

A transformer-encoded spherical normalizing flow achieves state-of-the-art angular resolution for IceCube neutrino tracks and showers, improving median resolution by factors of 1.3-2.5 over B-spline likelihoods at 100 TeV and outperforming prior ML methods for muons.

Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

DynamiCS dynamically scales semantic clusters per training epoch to reduce VLM pre-training compute while improving accuracy on long-tail concepts compared to static or flattening baselines.

citing papers explorer

Showing 3 of 3 citing papers.

Bounded-Rationality, Hedging, and Generalization

cs.LG · 2026-05-14 · unverdicted · none · ref 68

Neural posterior estimation of the neutrino direction in IceCube using transformer-encoded normalizing flows on the sphere

hep-ex · 2026-04-21 · unverdicted · none · ref 44

Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training

cs.CV · 2026-04-30 · unverdicted · none · ref 51
