PACE is a clipped per-coordinate controller added to AdamW that improves the limiting error of the returned iterate average in both quadratic analysis and LM experiments.
The AdEMAMix optimizer: Better, faster, older. arXiv preprint arXiv:2409.03137, 2024
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Training for the Model You Return: Improving Optimization for Iterate-Averaged Language Models