Convergence guarantees for RMSProp and Adam in generalized-smooth non-convex optimization with affine noise variance
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: math.OC (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Stochastic versus Deterministic in Stochastic Gradient Descent
Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.