ESSAM matches PPO and GRPO accuracy (~78%) on GSM8K math tasks but uses 10-18x less GPU memory and shows stronger generalization across datasets.
Sharpness-aware black-box optimization. arXiv preprint arXiv:2410.12457
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning