Decoupled Weight Decay Regularization

Ilya Loshchilov, Frank Hutter · 2019

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Unsupervised Process Reward Models

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Unsupervised PRMs derived from LLM probabilities achieve up to 15% better error detection than LLM judges and match supervised PRMs in verification and RL tasks.

The Geometric Structure of Models Learning Sparse Data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

Normal alignment is the rank-one Jacobian structure that lets classifiers minimize loss and maximize local robustness in sparse regimes; the paper proves its optimality and uses it to create GrokAlign and RFAMs.

Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

DynamiCS dynamically scales semantic clusters per training epoch to reduce VLM pre-training compute while improving accuracy on long-tail concepts compared to static or flattening baselines.

citing papers explorer

Showing 3 of 3 citing papers.

Unsupervised Process Reward Models · cs.LG · 2026-05-11 · unverdicted · none · ref 57
The Geometric Structure of Models Learning Sparse Data · cs.LG · 2026-05-08 · unverdicted · none · ref 44 · 2 links
Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training · cs.CV · 2026-04-30 · unverdicted · none · ref 49
