SAGE replaces AdamW in memory-efficient LLM hybrids with a Lion-style sign update plus a provably bounded O(d) adaptive scale, delivering SOTA perplexity on Llama-1.3B while cutting optimizer-state memory.
Opt." refers to optimizer state memory, while
1 Pith paper cites this work. Polarity classification has not been indexed yet.
Fields: cs.LG (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization