ConMeZO accelerates zeroth-order optimization for LLM finetuning by restricting random direction sampling to a momentum-centered cone, matching MeZO's worst-case rate but showing a 2x empirical speedup.
arXiv preprint arXiv:2501.19099
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Zeroth-order optimization is underexplored rather than underpowered in deep learning, with limitations stemming from full-space designs that can be addressed via subspace, spectral, and systems-aware approaches.
citing papers explorer
- ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models
ConMeZO accelerates zeroth-order optimization for LLM finetuning by restricting random direction sampling to a momentum-centered cone, matching MeZO's worst-case rate but showing a 2x empirical speedup.
- Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered
Zeroth-order optimization is underexplored rather than underpowered in deep learning, with limitations stemming from full-space designs that can be addressed via subspace, spectral, and systems-aware approaches.