The Fourteenth International Conference on Learning Representations
2 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Citation roles: background (1)
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Polarities: background (1)
Citing papers explorer
- Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective
  Zeroth-order SGD learning dynamics are governed by a random low-dimensional projection of the empirical NTK whose approximation error scales with model output dimension, not parameter count.
- Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered
  Zeroth-order optimization is underexplored rather than underpowered in deep learning, with limitations stemming from full-space designs that can be addressed via subspace, spectral, and systems-aware approaches.
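The kernel view in the "Learning Dynamics of Zeroth-Order Optimization" summary can be illustrated with a small sketch. It assumes a linear model f(theta) = X theta with squared loss (so the Jacobian J = X is exact and a two-point finite difference gives the directional derivative up to rounding) and Gaussian search directions. The sizes m and p and the helpers `mse_loss`, `zo_sgd_step`, and `zo_kernel` are hypothetical illustrations, not code from the paper. The sketch checks that one zeroth-order SGD step moves the outputs by -lr (J u)(J u)^T (f - y), i.e. along a rank-one random projection of the empirical NTK J J^T, and that averaging over more directions approaches J J^T.

```python
import numpy as np


def mse_loss(theta, X, y):
    """Squared loss 0.5 * ||X theta - y||^2 for a linear model f(theta) = X theta."""
    r = X @ theta - y
    return 0.5 * float(r @ r)


def zo_sgd_step(theta, X, y, lr, eps, rng):
    """One two-point zeroth-order SGD step along a Gaussian direction u (hypothetical helper)."""
    u = rng.standard_normal(theta.shape[0])
    # Central difference estimates the directional derivative u^T grad L;
    # for this quadratic loss it is exact up to floating-point rounding.
    d = (mse_loss(theta + eps * u, X, y) - mse_loss(theta - eps * u, X, y)) / (2 * eps)
    return theta - lr * d * u, u


def zo_kernel(J, num_dirs, rng):
    """Average of (J u)(J u)^T over num_dirs Gaussian directions: a random sketch of J J^T."""
    U = rng.standard_normal((J.shape[1], num_dirs))
    JU = J @ U  # shape (m, num_dirs); each column is distributed as N(0, J J^T)
    return JU @ JU.T / num_dirs


rng = np.random.default_rng(0)
m, p = 8, 2000  # output dimension m, parameter count p (illustrative sizes)
X = rng.standard_normal((m, p)) / np.sqrt(p)  # Jacobian J = X for the linear model
y = rng.standard_normal(m)
theta = np.zeros(p)
lr = 0.1

# Function-space view of one step: delta f = -lr * (J u)(J u)^T (f - y), with f = X theta
new_theta, u = zo_sgd_step(theta, X, y, lr=lr, eps=1e-3, rng=rng)
delta_f = X @ (new_theta - theta)
Ju = X @ u
predicted = -lr * Ju * (Ju @ (X @ theta - y))
print("step matches rank-one kernel update:", np.allclose(delta_f, predicted))

# Averaging over more directions approaches the empirical NTK J J^T
ntk = X @ X.T
errors = {}
for q in (1, 10, 100, 1000):
    errors[q] = np.linalg.norm(zo_kernel(X, q, rng) - ntk) / np.linalg.norm(ntk)
    print(f"q={q:5d}  relative Frobenius error={errors[q]:.3f}")
```

Because J u is distributed as N(0, J J^T), an m-dimensional Gaussian, the averaged sketch depends on J only through the m x m kernel: changing p while keeping J J^T fixed leaves the distribution of its error unchanged. This is one way to read the "output dimension, not parameter count" claim under the stated assumptions; it is an illustration, not a reproduction of the paper's analysis.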