International Conference on Learning Representations (ICLR)
Two Pith papers cite this work (field: cs.LG; year: 2026).

Representative citing papers:
- Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective
  Zeroth-order SGD learning dynamics are governed by a random low-dimensional projection of the empirical NTK whose approximation error scales with model output dimension, not parameter count.
- Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization
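The kernel view summarized for the first citing paper can be illustrated with a small sketch. It is not that paper's code. It assumes a hypothetical linearized model f(theta) = J theta with squared loss, Gaussian probe directions collected as the columns of U, and a central-difference zeroth-order (ZO) gradient estimate. Under these assumptions one ZO-SGD step changes the outputs by exactly -eta * (J U)(J U)^T / q times the residual. That kernel is a rank-at-most-q random projection of the empirical NTK J J^T, and it matches J J^T in expectation because E[u u^T] = I. The helper names `zo_gradient` and `projected_ntk`, and all sizes, are illustrative. The claim about how the approximation error scales is the citing paper's and is not demonstrated here.

```python
import numpy as np


def zo_gradient(loss, theta, directions, eps=1e-3):
    """Central-difference zeroth-order gradient estimate, averaged over the
    columns of `directions` (one random probe direction per column)."""
    g = np.zeros_like(theta)
    for u in directions.T:
        # Directional-derivative estimate along u (exact for a quadratic loss, up to roundoff)
        slope = (loss(theta + eps * u) - loss(theta - eps * u)) / (2 * eps)
        g += slope * u
    return g / directions.shape[1]


def projected_ntk(J, directions):
    """Kernel seen by ZO-SGD under this sketch's assumptions:
    (J U)(J U)^T / q, a rank-<=q random projection of the empirical NTK J J^T."""
    JU = J @ directions
    return JU @ JU.T / directions.shape[1]


rng = np.random.default_rng(0)
k, p, q = 5, 200, 8                            # outputs, parameters, probe directions (illustrative)
J = rng.standard_normal((k, p)) / np.sqrt(p)   # Jacobian of a hypothetical linearized model
y = rng.standard_normal(k)
theta = np.zeros(p)


def loss(t):
    # Squared loss of the hypothetical linear model f(t) = J t
    return 0.5 * np.sum((J @ t - y) ** 2)


U = rng.standard_normal((p, q))                # Gaussian probes, E[u u^T] = I
eta = 0.1
step = -eta * zo_gradient(loss, theta, U)      # one ZO-SGD parameter update
residual = J @ theta - y

# Function-space change of the ZO-SGD step vs. the projected-kernel prediction
df = J @ step
K_hat = projected_ntk(J, U)
df_kernel = -eta * K_hat @ residual
print(np.allclose(df, df_kernel, atol=1e-8))

# Distance of the single-draw projected kernel from the full empirical NTK
K = J @ J.T
rel_err = np.linalg.norm(K_hat - K) / np.linalg.norm(K)
print(f"relative kernel error with q={q}: {rel_err:.3f}")
```

Central differences are used because they are exact for a quadratic loss, which makes the function-space identity hold up to floating-point roundoff. Increasing `q` averages more probes, so `K_hat` moves closer to `K` in expectation.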