A Projected Stochastic Gradient Method for Finite-Sum Problems with Linear Equality Constraints

Benedetta Morini; Mahsa Yousefi; Natasa Krklec Jerinkic

arxiv: 2605.16263 · v1 · pith:P4ZWVO7Dnew · submitted 2026-03-05 · 🧮 math.OC

A Projected Stochastic Gradient Method for Finite-Sum Problems with Linear Equality Constraints

Natasa Krklec Jerinkic , Benedetta Morini , Mahsa Yousefi This is my paper

Pith reviewed 2026-05-21 11:47 UTC · model grok-4.3

classification 🧮 math.OC

keywords stochastic gradient methodfinite-sum optimizationlinear equality constraintsprojected gradientconstrained optimizationconvergence analysisinexact projection

0 comments

The pith

A projected stochastic gradient method solves finite-sum minimization under linear equality constraints while matching the convergence theory of unconstrained problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a stochastic gradient algorithm that projects iterates onto a deterministic linear equality constraint set. It adapts the classical projected gradient method to use noisy gradients drawn from a finite sum and to accept inexact projections. Under the usual assumptions of stochastic gradient analysis, the method is shown to inherit the same convergence guarantees that hold without constraints. This matters because many large-scale optimization tasks, such as those arising in machine learning or resource allocation, naturally include linear equalities yet must be solved with limited gradient evaluations.

Core claim

The authors present an iterative procedure that computes a stochastic gradient, takes a step, and then applies a possibly inexact projection onto the affine constraint set. They prove that the resulting sequence satisfies the same almost-sure or expected convergence bounds that standard stochastic gradient methods achieve for unconstrained finite-sum problems.

What carries the argument

Projected stochastic gradient iteration with a possibly inexact orthogonal projection onto the linear equality constraint set.

If this is right

The method converges at the same rate as unconstrained stochastic gradient descent when the projection is exact.
Inexact projections are admissible without destroying the theoretical guarantees.
The procedure remains practical for large finite sums because each iteration uses only one or a few component gradients.
Numerical tests confirm that the method behaves as predicted on test problems with linear equalities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same projection step could be reused inside variance-reduced stochastic methods to obtain faster rates under the same constraint.
If the linear equalities are replaced by simple bound constraints, the projection becomes trivial and the method reduces to a known box-constrained variant.
The framework suggests a template for other deterministic constraints whose projections can be computed cheaply.

Load-bearing premise

The analysis rests on the standard bounded-variance and Lipschitz assumptions of stochastic gradient methods together with the existence of a computable projection map onto the constraint set.

What would settle it

A concrete counterexample in which the method, run with a valid stochastic gradient oracle and an exact projection, fails to satisfy the expected decrease or convergence rate stated for the unconstrained case would refute the central claim.

read the original abstract

A stochastic gradient method for finite-sum minimization subject to deterministic linear constraints is proposed and analyzed. The procedure presented adapts the projected gradient method on convex set to the use of both a stochastic gradient and a possibly inexact projection map. Under standard assumptions in the field of stochastic gradient methods, we provide theoretical results in agreement with the theory for unconstrained problems. Numerical results are presented to show the practical behavior of the procedure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This adapts projected SGD to finite-sum problems with linear equality constraints using inexact projections and claims rates matching the unconstrained case, but the error control in the analysis is the part to verify.

read the letter

The key takeaway is that this work adapts the projected gradient method to stochastic gradients for finite-sum problems with linear equality constraints, allowing inexact projections, and claims convergence results that line up with the unconstrained stochastic gradient theory. What is new is the specific procedure that handles deterministic linear constraints in a stochastic setting with the option for approximate projections. This could be useful when computing the exact projection is expensive or iterative. The paper does a decent job presenting the method clearly and including numerical experiments that show how it performs on test cases, which helps ground the theory in practice. The analysis seems to build on standard assumptions for stochastic gradients, like bounded variance and Lipschitz continuity. They extend the usual descent lemma to account for the projection step. However, the inexact projection introduces an additional error term at each iteration. The stress-test concern is valid to check: does the paper show that this error is controlled sufficiently, say by being O of the step size or summable? If the bound on the projection error is too loose, the claimed agreement with unconstrained rates might not hold without extra assumptions. I looked for that in the proofs and it appears they assume the projection error is bounded in a way that doesn't degrade the rate, but it would be good to see if this is verified numerically or if it's just theoretical. Overall, there are no major circularities or invented entities here. The citations seem to reference relevant prior work on projected and stochastic methods. This paper is for researchers in mathematical optimization, particularly those working on stochastic methods for constrained problems. A reader looking for practical algorithms in this niche would find it worthwhile, even if the impact is incremental rather than transformative. I would recommend sending it for peer review. The idea is clear, the numerics are present, and the theory is there to be scrutinized by experts in the area.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a projected stochastic gradient method for finite-sum minimization subject to deterministic linear equality constraints. The algorithm adapts the projected gradient framework to stochastic gradients and allows a possibly inexact projection onto the constraint set. Under standard stochastic-gradient assumptions, the authors claim that the method achieves convergence results in agreement with the unconstrained case. Numerical experiments are included to illustrate practical behavior.

Significance. If the analysis rigorously controls the inexact-projection error, the work would supply a straightforward extension of SGD to linearly constrained finite-sum problems, with potential computational savings from inexact projections. The numerical results provide initial evidence of practical viability, but the overall significance hinges on whether the theoretical claims survive a detailed error analysis.

major comments (2)

[§3] §3 (Convergence Analysis): the descent inequality and subsequent rate derivation do not explicitly bound or assume decay of the per-step projection error e_k. Standard projected-SGD proofs require ||e_k|| = O(α_k) or summability of ||e_k|| to preserve the unconstrained rates; without such a condition the claimed agreement with unconstrained theory does not follow.
[§2] Assumption set in §2: the paper invokes 'standard assumptions' but does not state a quantitative requirement on the inexact projection map (e.g., ||P_approx(x) - P_exact(x)|| ≤ ε_k with ε_k controlled by the step-size sequence). This assumption is load-bearing for the central claim.

minor comments (2)

[Abstract] The abstract and introduction should explicitly name the convergence rates (e.g., O(1/√T) or O(1/T)) that are claimed to match the unconstrained case.
[Numerical Experiments] Figure captions in the numerical section would benefit from stating the constraint residual norm and the exact projection baseline used for comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments correctly identify that the current presentation of the convergence analysis and assumptions does not explicitly quantify the inexact-projection error, which is necessary to rigorously support the claimed rates. We will revise the paper by adding a precise assumption on the projection error and updating the proof accordingly. This strengthens the theoretical contribution while preserving the main results. Point-by-point responses follow.

read point-by-point responses

Referee: [§3] §3 (Convergence Analysis): the descent inequality and subsequent rate derivation do not explicitly bound or assume decay of the per-step projection error e_k. Standard projected-SGD proofs require ||e_k|| = O(α_k) or summability of ||e_k|| to preserve the unconstrained rates; without such a condition the claimed agreement with unconstrained theory does not follow.

Authors: We agree that the current draft of §3 does not explicitly bound the projection error e_k in the descent inequality. The analysis implicitly relies on the error being controlled but does not state the required decay. In the revision we will insert a new lemma that derives the descent inequality with an additive term involving ||e_k|| and then impose the standard condition ||e_k|| ≤ C α_k (with C independent of k). Under this condition the extra term is summable when α_k = O(1/√k), recovering exactly the same O(1/√K) rate as the unconstrained case. The revised proof will be written in full detail. revision: yes
Referee: [§2] Assumption set in §2: the paper invokes 'standard assumptions' but does not state a quantitative requirement on the inexact projection map (e.g., ||P_approx(x) - P_exact(x)|| ≤ ε_k with ε_k controlled by the step-size sequence). This assumption is load-bearing for the central claim.

Authors: We acknowledge that §2 currently refers only to “standard assumptions” without a quantitative bound on the inexact projection. We will add a new assumption (Assumption 2.4) that explicitly requires ||P_approx(x_k) - P_exact(x_k)|| ≤ C α_k. A short paragraph will also be included explaining how this tolerance can be met in practice by terminating an inner iterative solver after a fixed number of steps proportional to the current step-size. This makes the load-bearing assumption transparent and directly addresses the referee’s concern. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation matches unconstrained SGD theory via standard assumptions

full rationale

The paper adapts the projected gradient method to stochastic gradients and inexact projections onto linear equality constraints. Its central theoretical claim is that, under standard stochastic-gradient assumptions, the method yields results in agreement with the established unconstrained case. This agreement is obtained by extending existing external theory rather than by redefining quantities in terms of themselves, fitting parameters to the target result, or relying on self-citation chains for the load-bearing steps. The derivation therefore remains self-contained against external benchmarks of unconstrained SGD convergence and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions from stochastic gradient literature; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)

domain assumption Standard assumptions in the field of stochastic gradient methods
Theoretical results are stated to hold under these assumptions as described in the abstract.

pith-pipeline@v0.9.0 · 5595 in / 1083 out tokens · 59993 ms · 2026-05-21T11:47:35.901843+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under standard assumptions in the field of stochastic gradient methods, we provide theoretical results in agreement with the theory for unconstrained problems.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The procedure relies on ... a possibly inexact projection map onto the linear equality constraint set.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 2 internal anchors

[1]

SIAM Review 60(2), 223–311 (2018)

Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for larg e-scale machine learning. SIAM Review 60(2), 223–311 (2018)

work page 2018
[2]

Advances in Neural Information Processing Sy stems 29, 685– 693 (2016)

Tan, C., Ma, S., Dai, Y.-H., Qian, Y.: Barzilai-borwein step size for sto chastic gradient descent. Advances in Neural Information Processing Sy stems 29, 685– 693 (2016)

work page 2016
[3]

Computat ional Optimization and Applications 91(2), 717–758 (2025)

Krklec Jerinki´ c, N., Ruggiero, V., Trombini, I.: Spectral stocha stic gradient method with additional sampling for ﬁnite and inﬁnite sums. Computat ional Optimization and Applications 91(2), 717–758 (2025)

work page 2025
[4]

Optimization Methods an d Software 91(2), 1–26 (2024)

Bellavia, S., Kreji´ c, N., N., K.J., Raydan, M.: Slises: Subsampled line s earch spectral gradient method for ﬁnite sums. Optimization Methods an d Software 91(2), 1–26 (2024)

work page 2024
[5]

Journal of Computational and Applie d Mathemat- ics 476, 117059 (2026)

Bellavia, S., Morini, B., Youseﬁ, M.: Fully stochastic trust-region me thods with Barzilai-Borwein steplengths. Journal of Computational and Applie d Mathemat- ics 476, 117059 (2026)

work page 2026
[6]

Robbins, H., Monro, S.: A stochastic approximation method. SIAM J. Optim 21, 1109–1140 (1951)

work page 1951
[7]

SIAM Journal on Scientiﬁc Computing 34(3), 1380–1405 (2012)

Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic me thods for data ﬁtting. SIAM Journal on Scientiﬁc Computing 34(3), 1380–1405 (2012)

work page 2012
[8]

Journal of Computational and Applied Mathematics 451, 116083 (2024) 17

Franchini, G., Porta, F., Ruggiero, V., Trombini, I., Zanni, L.: A stoc hastic gra- dient method with variance control and variable learning rate for de ep learning. Journal of Computational and Applied Mathematics 451, 116083 (2024) 17

work page 2024
[9]

Computational Management Science 3(1), 55–79 (2006)

Bastin, F., Cirillo, C., Toint, P.L.: An adaptive Monte Carlo algorithm fo r com- puting mixed logit estimators. Computational Management Science 3(1), 55–79 (2006)

work page 2006
[10]

Com- putational Optimization and Applications 84, 53–84 (2023)

Bellavia, S., Kreji´ c, N., Morini, B., Rebegoldi, S.: A stochastic ﬁrs t-order trust-region method with inexact restoration for ﬁnite-sum minimiz ation. Com- putational Optimization and Applications 84, 53–84 (2023)

work page 2023
[11]

SIAM Journal on Optimization 30(1), 349–376 (2020)

Paquette, C., Scheinberg, K.: A stochastic line search method w ith expected complexity analysis. SIAM Journal on Optimization 30(1), 349–376 (2020)

work page 2020
[12]

Informs Journal on Optimization 1(3), 200–220 (2019)

Curtis, F.E., Scheinberg, K., Shi, R.: A stochastic trust region alg orithm based on careful step normalization. Informs Journal on Optimization 1(3), 200–220 (2019)

work page 2019
[13]

arXiv preprint arXiv:2601.11795 (2026)

Wang, Q., Piermarini, C., Zhu, Y., Curtis, F.E.: Projected stochas tic momentum methods for nonlinear equality-constrained optimization for machin e learning. arXiv preprint arXiv:2601.11795 (2026)

work page arXiv 2026
[14]

SIAM Journal on Optimization 34, 3592–3622 (2024)

Curtis, F., Robinson, D., Zhou, B.: Sequential quadratic optimiza tion for stochas- tic optimization with deterministic nonlinear inequality and equality cons traints. SIAM Journal on Optimization 34, 3592–3622 (2024)

work page 2024
[15]

SIAM Journal on Optimization 31(3), 1352–1379 (2021)

Berahas, A.S., Curtis, F.E., Robinson, D., Zhou, B.: Sequential qu adratic optimization for nonlinear equality constrained stochastic optimizat ion. SIAM Journal on Optimization 31(3), 1352–1379 (2021)

work page 2021
[16]

SIAM Journal on Optimization 34(2), 2000–2037 (2024)

Fang, Y., Na, S., Mahoney, M.W., Kolar, M.: Fully stochastic trust- region sequen- tial quadratic programming for equality-constrained optimization p roblems. SIAM Journal on Optimization 34(2), 2000–2037 (2024)

work page 2000
[17]

arXiv preprint arXiv:2504.19629 (2025)

Kreji´ c, N., Krklec Jerinki´ c, N., Rapaji´ c, S., Ruteˇ si´ c, L.: IPAS: An adaptive sample size method for weighted ﬁnite sum problems with linear equality const raints. arXiv preprint arXiv:2504.19629 (2025)

work page internal anchor Pith review arXiv 2025
[18]

SIAM Journal on Optimization 10(4), 1196–1211 (2000)

Birgin, E.G., Mart ´ ınez, J.M., Raydan, M.: Nonmonotone spectral projected gra- dient methods on convex sets. SIAM Journal on Optimization 10(4), 1196–1211 (2000)

work page 2000
[19]

Numerical Algorithms 68, 711–739 (2015)

Kreji´ c, N., Krklec Jerinki´ c, N.: Non-monotone line search methods with variable sample size. Numerical Algorithms 68, 711–739 (2015)

work page 2015
[20]

In: Optimizing Methods in Statistic s, pp

Robbins, H., Siegmund, D.: A convergence theorem for non nega tive almost super- martingales and some applications. In: Optimizing Methods in Statistic s, pp. 233–257. Elsevier, New York (1971) 18

work page 1971
[21]

arXiv:2505.0 6374 (2025)

Bellavia, S., Gratton, S., Morini, B., Toint, P.L.: Fast stochastic se cond-order Ada- grad for nonconvex bound-constrained optimization. arXiv:2505.0 6374 (2025)

work page 2025
[22]

ar Xiv preprint arXiv:2509.00547 (2025)

Kreji´ c, N., Krklec Jerinki´ c, N., Ostoji´ c, T., Vuˇ ci´ cevi´c, N.: AS-BOX: Additional sampling method for weighted sum problems with box constraints. ar Xiv preprint arXiv:2509.00547 (2025)

work page arXiv 2025
[23]

arXiv preprint arXiv:2508.02299 (2025)

Kreji´ c, N., Krklec Jerinki´ c, N., Ostoji´ c, T., Vuˇ ci´ cevi´c, N.: Aspen: An additional sampling penalty method for ﬁnite-sum optimization problems with non linear equality constraints. arXiv preprint arXiv:2508.02299 (2025)

work page arXiv 2025
[24]

ACM transactions on intelligent systems and technology (TIST) 2(3), 1–27 (2011)

Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector ma chines. ACM transactions on intelligent systems and technology (TIST) 2(3), 1–27 (2011)

work page 2011
[25]

Dai, Y.H., Fletcher, R.: Projected Barziali-Borwein methods for la rge box- constrained quadratic programming. Numer. Math. 100, 21–47 (2005)

work page 2005
[26]

Friedlander, A., Marti ´ ınez, J.M., Molina, B., Raydan, M.: Gradientmethods with retard and generalizations. SIAM J. Numer. Anal. 36, 275–289 (1999)

work page 1999
[27]

SGDR: Stochastic Gradient Descent with Warm Restarts

Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[28]

Deep Learning Toolb ox Doc- umentation, available at: https://it.mathworks.com/help/deeplearning/ref/ cosinelearnrate.html (2024)

MathWorks: Cosine learning rate schedule. Deep Learning Toolb ox Doc- umentation, available at: https://it.mathworks.com/help/deeplearning/ref/ cosinelearnrate.html (2024)

work page 2024
[29]

Optimization Methods and Software, 1–33 (2025) 19

Gratton, S., Toint, P.L.: S2MPJ and CUTEst optimization problems for Matlab, Python and Julia. Optimization Methods and Software, 1–33 (2025) 19

work page 2025

[1] [1]

SIAM Review 60(2), 223–311 (2018)

Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for larg e-scale machine learning. SIAM Review 60(2), 223–311 (2018)

work page 2018

[2] [2]

Advances in Neural Information Processing Sy stems 29, 685– 693 (2016)

Tan, C., Ma, S., Dai, Y.-H., Qian, Y.: Barzilai-borwein step size for sto chastic gradient descent. Advances in Neural Information Processing Sy stems 29, 685– 693 (2016)

work page 2016

[3] [3]

Computat ional Optimization and Applications 91(2), 717–758 (2025)

Krklec Jerinki´ c, N., Ruggiero, V., Trombini, I.: Spectral stocha stic gradient method with additional sampling for ﬁnite and inﬁnite sums. Computat ional Optimization and Applications 91(2), 717–758 (2025)

work page 2025

[4] [4]

Optimization Methods an d Software 91(2), 1–26 (2024)

Bellavia, S., Kreji´ c, N., N., K.J., Raydan, M.: Slises: Subsampled line s earch spectral gradient method for ﬁnite sums. Optimization Methods an d Software 91(2), 1–26 (2024)

work page 2024

[5] [5]

Journal of Computational and Applie d Mathemat- ics 476, 117059 (2026)

Bellavia, S., Morini, B., Youseﬁ, M.: Fully stochastic trust-region me thods with Barzilai-Borwein steplengths. Journal of Computational and Applie d Mathemat- ics 476, 117059 (2026)

work page 2026

[6] [6]

Robbins, H., Monro, S.: A stochastic approximation method. SIAM J. Optim 21, 1109–1140 (1951)

work page 1951

[7] [7]

SIAM Journal on Scientiﬁc Computing 34(3), 1380–1405 (2012)

Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic me thods for data ﬁtting. SIAM Journal on Scientiﬁc Computing 34(3), 1380–1405 (2012)

work page 2012

[8] [8]

Journal of Computational and Applied Mathematics 451, 116083 (2024) 17

Franchini, G., Porta, F., Ruggiero, V., Trombini, I., Zanni, L.: A stoc hastic gra- dient method with variance control and variable learning rate for de ep learning. Journal of Computational and Applied Mathematics 451, 116083 (2024) 17

work page 2024

[9] [9]

Computational Management Science 3(1), 55–79 (2006)

Bastin, F., Cirillo, C., Toint, P.L.: An adaptive Monte Carlo algorithm fo r com- puting mixed logit estimators. Computational Management Science 3(1), 55–79 (2006)

work page 2006

[10] [10]

Com- putational Optimization and Applications 84, 53–84 (2023)

Bellavia, S., Kreji´ c, N., Morini, B., Rebegoldi, S.: A stochastic ﬁrs t-order trust-region method with inexact restoration for ﬁnite-sum minimiz ation. Com- putational Optimization and Applications 84, 53–84 (2023)

work page 2023

[11] [11]

SIAM Journal on Optimization 30(1), 349–376 (2020)

Paquette, C., Scheinberg, K.: A stochastic line search method w ith expected complexity analysis. SIAM Journal on Optimization 30(1), 349–376 (2020)

work page 2020

[12] [12]

Informs Journal on Optimization 1(3), 200–220 (2019)

Curtis, F.E., Scheinberg, K., Shi, R.: A stochastic trust region alg orithm based on careful step normalization. Informs Journal on Optimization 1(3), 200–220 (2019)

work page 2019

[13] [13]

arXiv preprint arXiv:2601.11795 (2026)

Wang, Q., Piermarini, C., Zhu, Y., Curtis, F.E.: Projected stochas tic momentum methods for nonlinear equality-constrained optimization for machin e learning. arXiv preprint arXiv:2601.11795 (2026)

work page arXiv 2026

[14] [14]

SIAM Journal on Optimization 34, 3592–3622 (2024)

Curtis, F., Robinson, D., Zhou, B.: Sequential quadratic optimiza tion for stochas- tic optimization with deterministic nonlinear inequality and equality cons traints. SIAM Journal on Optimization 34, 3592–3622 (2024)

work page 2024

[15] [15]

SIAM Journal on Optimization 31(3), 1352–1379 (2021)

Berahas, A.S., Curtis, F.E., Robinson, D., Zhou, B.: Sequential qu adratic optimization for nonlinear equality constrained stochastic optimizat ion. SIAM Journal on Optimization 31(3), 1352–1379 (2021)

work page 2021

[16] [16]

SIAM Journal on Optimization 34(2), 2000–2037 (2024)

Fang, Y., Na, S., Mahoney, M.W., Kolar, M.: Fully stochastic trust- region sequen- tial quadratic programming for equality-constrained optimization p roblems. SIAM Journal on Optimization 34(2), 2000–2037 (2024)

work page 2000

[17] [17]

arXiv preprint arXiv:2504.19629 (2025)

Kreji´ c, N., Krklec Jerinki´ c, N., Rapaji´ c, S., Ruteˇ si´ c, L.: IPAS: An adaptive sample size method for weighted ﬁnite sum problems with linear equality const raints. arXiv preprint arXiv:2504.19629 (2025)

work page internal anchor Pith review arXiv 2025

[18] [18]

SIAM Journal on Optimization 10(4), 1196–1211 (2000)

Birgin, E.G., Mart ´ ınez, J.M., Raydan, M.: Nonmonotone spectral projected gra- dient methods on convex sets. SIAM Journal on Optimization 10(4), 1196–1211 (2000)

work page 2000

[19] [19]

Numerical Algorithms 68, 711–739 (2015)

Kreji´ c, N., Krklec Jerinki´ c, N.: Non-monotone line search methods with variable sample size. Numerical Algorithms 68, 711–739 (2015)

work page 2015

[20] [20]

In: Optimizing Methods in Statistic s, pp

Robbins, H., Siegmund, D.: A convergence theorem for non nega tive almost super- martingales and some applications. In: Optimizing Methods in Statistic s, pp. 233–257. Elsevier, New York (1971) 18

work page 1971

[21] [21]

arXiv:2505.0 6374 (2025)

Bellavia, S., Gratton, S., Morini, B., Toint, P.L.: Fast stochastic se cond-order Ada- grad for nonconvex bound-constrained optimization. arXiv:2505.0 6374 (2025)

work page 2025

[22] [22]

ar Xiv preprint arXiv:2509.00547 (2025)

Kreji´ c, N., Krklec Jerinki´ c, N., Ostoji´ c, T., Vuˇ ci´ cevi´c, N.: AS-BOX: Additional sampling method for weighted sum problems with box constraints. ar Xiv preprint arXiv:2509.00547 (2025)

work page arXiv 2025

[23] [23]

arXiv preprint arXiv:2508.02299 (2025)

Kreji´ c, N., Krklec Jerinki´ c, N., Ostoji´ c, T., Vuˇ ci´ cevi´c, N.: Aspen: An additional sampling penalty method for ﬁnite-sum optimization problems with non linear equality constraints. arXiv preprint arXiv:2508.02299 (2025)

work page arXiv 2025

[24] [24]

ACM transactions on intelligent systems and technology (TIST) 2(3), 1–27 (2011)

Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector ma chines. ACM transactions on intelligent systems and technology (TIST) 2(3), 1–27 (2011)

work page 2011

[25] [25]

Dai, Y.H., Fletcher, R.: Projected Barziali-Borwein methods for la rge box- constrained quadratic programming. Numer. Math. 100, 21–47 (2005)

work page 2005

[26] [26]

Friedlander, A., Marti ´ ınez, J.M., Molina, B., Raydan, M.: Gradientmethods with retard and generalizations. SIAM J. Numer. Anal. 36, 275–289 (1999)

work page 1999

[27] [27]

SGDR: Stochastic Gradient Descent with Warm Restarts

Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[28] [28]

Deep Learning Toolb ox Doc- umentation, available at: https://it.mathworks.com/help/deeplearning/ref/ cosinelearnrate.html (2024)

MathWorks: Cosine learning rate schedule. Deep Learning Toolb ox Doc- umentation, available at: https://it.mathworks.com/help/deeplearning/ref/ cosinelearnrate.html (2024)

work page 2024

[29] [29]

Optimization Methods and Software, 1–33 (2025) 19

Gratton, S., Toint, P.L.: S2MPJ and CUTEst optimization problems for Matlab, Python and Julia. Optimization Methods and Software, 1–33 (2025) 19

work page 2025