A stochastic gradient algorithm for non-separable optimization with convergence guarantee

Ruofan Wu; Yingzhou Li

arxiv: 2606.10383 · v1 · pith:JDWNFNBL · submitted 2026-06-09 · math.OC · cs.NA · math.NA

keywords: convergence, local, under, convexity, driven, gradient, non-separable, ranges

We study non-separable objectives in which the loss depends on dataset-level quantities. We introduce an SGD-style framework that employs two batch-gradient constructs: the ideal per-batch gradient $G$ and a cached surrogate $H$ for cases where full-data terms are expensive to compute. In the sample-wise separable case, our method reduces to standard mini-batch SGD. Our main contribution is a unified local convergence theory: under mild smoothness and Jacobian-boundedness assumptions, we prove local linear convergence under local strong convexity and local $O(1/k)$ sublinear convergence under local convexity, for both $G$-driven and $H$-driven updates. These guarantees hold for fixed step sizes within explicitly characterized ranges, and we provide explicit bounds showing how cache staleness, surrogate approximation error, batch size, and step size affect the convergence constants and the allowable step-size ranges.
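
The abstract does not define $G$ and $H$ precisely, so the following is only a toy sketch under our own assumptions. We take a compositional objective $F(w) = \frac{1}{n}\sum_i f_i(w, u(w))$ with a dataset-level quantity $u(w) = \frac{1}{n}\sum_j g_j(w)$. A "G-style" step applies the chain rule on a mini-batch using the exact $u(w)$ at the current iterate. An "H-style" step reuses a cached value of $u$ that is refreshed every `refresh` iterations. The model, the function names (`make_problem`, `u_full`, `batch_grad`, `run_sgd`) and all parameter values are hypothetical. They are not the authors' algorithm, and they are not the paper's step-size ranges.

```python
import numpy as np

# Hypothetical toy model (our assumption, not the paper's):
#   F(w) = (1/n) * sum_i 0.5 * r_i(w)**2,  r_i(w) = a_i @ w - b_i - u(w)
#   u(w) = (1/n) * sum_j (a_j @ w - b_j)    <- dataset-level quantity
# Every per-sample loss depends on u(w), so F is not sample-wise separable.


def make_problem(n=200, d=5, seed=0):
    """Random instance whose minimizer w_star makes every residual r_i zero."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d)
    b = A @ w_star + 3.0  # constant offset is absorbed by u(w_star) = -3
    return A, b, w_star


def u_full(A, b, w):
    """Dataset-level quantity u(w); touches all n samples (the 'expensive' part)."""
    return np.mean(A @ w - b)


def batch_grad(A, b, w, idx, u, J_u):
    """Chain-rule mini-batch gradient given a value of u and its Jacobian J_u.

    d/dw 0.5 * r_i**2 = r_i * (a_i - J_u), since du/dw = J_u.
    Exact u(w) gives a G-style gradient; a cached u gives an H-style surrogate.
    """
    r = A[idx] @ w - b[idx] - u
    return (r[:, None] * (A[idx] - J_u)).mean(axis=0)


def run_sgd(A, b, mode="G", step=0.05, batch=20, iters=3000, refresh=10, seed=1):
    """Fixed-step SGD driven by G (fresh u each step) or H (u cached, refreshed every `refresh` steps)."""
    if mode not in ("G", "H"):
        raise ValueError("mode must be 'G' or 'H'")
    rng = np.random.default_rng(seed)
    n, d = A.shape
    J_u = A.mean(axis=0)  # du/dw; constant for this linear u, so computed once
    w = np.zeros(d)
    u_cache = u_full(A, b, w)
    for k in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        if mode == "G":
            u = u_full(A, b, w)  # ideal: dataset-level quantity at the current iterate
        else:
            if k % refresh == 0:
                u_cache = u_full(A, b, w)  # refresh the stale cache
            u = u_cache
        w = w - step * batch_grad(A, b, w, idx, u, J_u)
    return w


if __name__ == "__main__":
    A, b, w_star = make_problem()
    for mode in ("G", "H"):
        w = run_sgd(A, b, mode=mode)
        print(mode, np.linalg.norm(w - w_star))
```

In this toy the offset in `b` makes every exact mini-batch gradient vanish at `w_star`, so fixed-step iterates can approach `w_star` itself rather than a noise ball. With a stale cache, the per-batch error is proportional to `u(w) - u_cache`, that is, to how far `w` has moved since the last refresh. If no $f_i$ depended on $u$, the chain-rule term would drop out and the step would be ordinary mini-batch SGD, consistent with the separable case described in the abstract.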