Derives deterministic ODE equivalents and an adversarial homogenized SDE for SGD iterates in high-dimensional ℓ2-adversarial training, showing that no constant learning rate ensures monotone descent for single-class adversarial least squares, and establishing an equivalence to standard SGD with adaptive regularization.
Hitting the high-dimensional notes: An ODE for SGD learning dynamics on GLMs and multi-index models. Information and Inference: A Journal of the IMA, 13(4):iaae028, 2024a.
3 Pith papers cite this work.
representative citing papers
DP-GD achieves minimax optimal non-asymptotic risk O(γ + γ²/ρ²) for well-conditioned high-dimensional data and power-law scaling for ill-conditioned power-law spectra, with the exponent depending on the privacy parameter ρ.
In the high-dimensional regime, SGD on diagonal linear networks is approximated by an SDE and a deterministic PDE that together give an explicit non-asymptotic description of convergence to zero risk.