pith. machine review for the scientific record. sign in

arxiv: 2509.02912 · v3 · submitted 2025-09-03 · 🧮 math.OC

Stochastic versus Deterministic in Stochastic Gradient Descent

classification 🧮 math.OC
keywords convergencegradientstochasticratedescentdeterministicmini-batchversus
0
0 comments X
read the original abstract

This paper theoretically reanalyzes the convergence of the mini-batch stochastic gradient descent (SGD) for a structured minimization problem involving a finite-sum function with its gradient being stochastically approximated, and an independent term with its gradient being deterministically computed. Rather than collapsing this problem into a standard finite-sum formulation and treating all components uniformly, we study it from a stochastic versus deterministic viewpoint and focus on how these two gradient computations affect mini-batch SGD differently. The step size, the convergence rate, and the radius of the convergence region depend asymmetrically on the characteristics of the two components, which shows the distinct impacts of stochastic approximation versus deterministic computation in the mini-batch SGD. Based on this, we show that our analysis yields a faster convergence rate and a smaller radius of the convergence region. Moreover, an even better convergence rate can be obtained when the independent term endows the objective function with sufficient strong convexity. Also, the convergence rate of our algorithm in expectation approaches that of the classic gradient descent when the batch size increases. Numerical experiments are conducted to support the theoretical analysis as well.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.