pith. sign in

arxiv: 2605.16263 · v1 · pith:P4ZWVO7Dnew · submitted 2026-03-05 · 🧮 math.OC

A Projected Stochastic Gradient Method for Finite-Sum Problems with Linear Equality Constraints

Pith reviewed 2026-05-21 11:47 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic gradient methodfinite-sum optimizationlinear equality constraintsprojected gradientconstrained optimizationconvergence analysisinexact projection
0
0 comments X

The pith

A projected stochastic gradient method solves finite-sum minimization under linear equality constraints while matching the convergence theory of unconstrained problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a stochastic gradient algorithm that projects iterates onto a deterministic linear equality constraint set. It adapts the classical projected gradient method to use noisy gradients drawn from a finite sum and to accept inexact projections. Under the usual assumptions of stochastic gradient analysis, the method is shown to inherit the same convergence guarantees that hold without constraints. This matters because many large-scale optimization tasks, such as those arising in machine learning or resource allocation, naturally include linear equalities yet must be solved with limited gradient evaluations.

Core claim

The authors present an iterative procedure that computes a stochastic gradient, takes a step, and then applies a possibly inexact projection onto the affine constraint set. They prove that the resulting sequence satisfies the same almost-sure or expected convergence bounds that standard stochastic gradient methods achieve for unconstrained finite-sum problems.

What carries the argument

Projected stochastic gradient iteration with a possibly inexact orthogonal projection onto the linear equality constraint set.

If this is right

  • The method converges at the same rate as unconstrained stochastic gradient descent when the projection is exact.
  • Inexact projections are admissible without destroying the theoretical guarantees.
  • The procedure remains practical for large finite sums because each iteration uses only one or a few component gradients.
  • Numerical tests confirm that the method behaves as predicted on test problems with linear equalities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same projection step could be reused inside variance-reduced stochastic methods to obtain faster rates under the same constraint.
  • If the linear equalities are replaced by simple bound constraints, the projection becomes trivial and the method reduces to a known box-constrained variant.
  • The framework suggests a template for other deterministic constraints whose projections can be computed cheaply.

Load-bearing premise

The analysis rests on the standard bounded-variance and Lipschitz assumptions of stochastic gradient methods together with the existence of a computable projection map onto the constraint set.

What would settle it

A concrete counterexample in which the method, run with a valid stochastic gradient oracle and an exact projection, fails to satisfy the expected decrease or convergence rate stated for the unconstrained case would refute the central claim.

read the original abstract

A stochastic gradient method for finite-sum minimization subject to deterministic linear constraints is proposed and analyzed. The procedure presented adapts the projected gradient method on convex set to the use of both a stochastic gradient and a possibly inexact projection map. Under standard assumptions in the field of stochastic gradient methods, we provide theoretical results in agreement with the theory for unconstrained problems. Numerical results are presented to show the practical behavior of the procedure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a projected stochastic gradient method for finite-sum minimization subject to deterministic linear equality constraints. The algorithm adapts the projected gradient framework to stochastic gradients and allows a possibly inexact projection onto the constraint set. Under standard stochastic-gradient assumptions, the authors claim that the method achieves convergence results in agreement with the unconstrained case. Numerical experiments are included to illustrate practical behavior.

Significance. If the analysis rigorously controls the inexact-projection error, the work would supply a straightforward extension of SGD to linearly constrained finite-sum problems, with potential computational savings from inexact projections. The numerical results provide initial evidence of practical viability, but the overall significance hinges on whether the theoretical claims survive a detailed error analysis.

major comments (2)
  1. [§3] §3 (Convergence Analysis): the descent inequality and subsequent rate derivation do not explicitly bound or assume decay of the per-step projection error e_k. Standard projected-SGD proofs require ||e_k|| = O(α_k) or summability of ||e_k|| to preserve the unconstrained rates; without such a condition the claimed agreement with unconstrained theory does not follow.
  2. [§2] Assumption set in §2: the paper invokes 'standard assumptions' but does not state a quantitative requirement on the inexact projection map (e.g., ||P_approx(x) - P_exact(x)|| ≤ ε_k with ε_k controlled by the step-size sequence). This assumption is load-bearing for the central claim.
minor comments (2)
  1. [Abstract] The abstract and introduction should explicitly name the convergence rates (e.g., O(1/√T) or O(1/T)) that are claimed to match the unconstrained case.
  2. [Numerical Experiments] Figure captions in the numerical section would benefit from stating the constraint residual norm and the exact projection baseline used for comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments correctly identify that the current presentation of the convergence analysis and assumptions does not explicitly quantify the inexact-projection error, which is necessary to rigorously support the claimed rates. We will revise the paper by adding a precise assumption on the projection error and updating the proof accordingly. This strengthens the theoretical contribution while preserving the main results. Point-by-point responses follow.

read point-by-point responses
  1. Referee: [§3] §3 (Convergence Analysis): the descent inequality and subsequent rate derivation do not explicitly bound or assume decay of the per-step projection error e_k. Standard projected-SGD proofs require ||e_k|| = O(α_k) or summability of ||e_k|| to preserve the unconstrained rates; without such a condition the claimed agreement with unconstrained theory does not follow.

    Authors: We agree that the current draft of §3 does not explicitly bound the projection error e_k in the descent inequality. The analysis implicitly relies on the error being controlled but does not state the required decay. In the revision we will insert a new lemma that derives the descent inequality with an additive term involving ||e_k|| and then impose the standard condition ||e_k|| ≤ C α_k (with C independent of k). Under this condition the extra term is summable when α_k = O(1/√k), recovering exactly the same O(1/√K) rate as the unconstrained case. The revised proof will be written in full detail. revision: yes

  2. Referee: [§2] Assumption set in §2: the paper invokes 'standard assumptions' but does not state a quantitative requirement on the inexact projection map (e.g., ||P_approx(x) - P_exact(x)|| ≤ ε_k with ε_k controlled by the step-size sequence). This assumption is load-bearing for the central claim.

    Authors: We acknowledge that §2 currently refers only to “standard assumptions” without a quantitative bound on the inexact projection. We will add a new assumption (Assumption 2.4) that explicitly requires ||P_approx(x_k) - P_exact(x_k)|| ≤ C α_k. A short paragraph will also be included explaining how this tolerance can be met in practice by terminating an inner iterative solver after a fixed number of steps proportional to the current step-size. This makes the load-bearing assumption transparent and directly addresses the referee’s concern. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation matches unconstrained SGD theory via standard assumptions

full rationale

The paper adapts the projected gradient method to stochastic gradients and inexact projections onto linear equality constraints. Its central theoretical claim is that, under standard stochastic-gradient assumptions, the method yields results in agreement with the established unconstrained case. This agreement is obtained by extending existing external theory rather than by redefining quantities in terms of themselves, fitting parameters to the target result, or relying on self-citation chains for the load-bearing steps. The derivation therefore remains self-contained against external benchmarks of unconstrained SGD convergence and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions from stochastic gradient literature; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Standard assumptions in the field of stochastic gradient methods
    Theoretical results are stated to hold under these assumptions as described in the abstract.

pith-pipeline@v0.9.0 · 5595 in / 1083 out tokens · 59993 ms · 2026-05-21T11:47:35.901843+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 2 internal anchors

  1. [1]

    SIAM Review 60(2), 223–311 (2018)

    Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for larg e-scale machine learning. SIAM Review 60(2), 223–311 (2018)

  2. [2]

    Advances in Neural Information Processing Sy stems 29, 685– 693 (2016)

    Tan, C., Ma, S., Dai, Y.-H., Qian, Y.: Barzilai-borwein step size for sto chastic gradient descent. Advances in Neural Information Processing Sy stems 29, 685– 693 (2016)

  3. [3]

    Computat ional Optimization and Applications 91(2), 717–758 (2025)

    Krklec Jerinki´ c, N., Ruggiero, V., Trombini, I.: Spectral stocha stic gradient method with additional sampling for finite and infinite sums. Computat ional Optimization and Applications 91(2), 717–758 (2025)

  4. [4]

    Optimization Methods an d Software 91(2), 1–26 (2024)

    Bellavia, S., Kreji´ c, N., N., K.J., Raydan, M.: Slises: Subsampled line s earch spectral gradient method for finite sums. Optimization Methods an d Software 91(2), 1–26 (2024)

  5. [5]

    Journal of Computational and Applie d Mathemat- ics 476, 117059 (2026)

    Bellavia, S., Morini, B., Yousefi, M.: Fully stochastic trust-region me thods with Barzilai-Borwein steplengths. Journal of Computational and Applie d Mathemat- ics 476, 117059 (2026)

  6. [6]

    Robbins, H., Monro, S.: A stochastic approximation method. SIAM J. Optim 21, 1109–1140 (1951)

  7. [7]

    SIAM Journal on Scientific Computing 34(3), 1380–1405 (2012)

    Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic me thods for data fitting. SIAM Journal on Scientific Computing 34(3), 1380–1405 (2012)

  8. [8]

    Journal of Computational and Applied Mathematics 451, 116083 (2024) 17

    Franchini, G., Porta, F., Ruggiero, V., Trombini, I., Zanni, L.: A stoc hastic gra- dient method with variance control and variable learning rate for de ep learning. Journal of Computational and Applied Mathematics 451, 116083 (2024) 17

  9. [9]

    Computational Management Science 3(1), 55–79 (2006)

    Bastin, F., Cirillo, C., Toint, P.L.: An adaptive Monte Carlo algorithm fo r com- puting mixed logit estimators. Computational Management Science 3(1), 55–79 (2006)

  10. [10]

    Com- putational Optimization and Applications 84, 53–84 (2023)

    Bellavia, S., Kreji´ c, N., Morini, B., Rebegoldi, S.: A stochastic firs t-order trust-region method with inexact restoration for finite-sum minimiz ation. Com- putational Optimization and Applications 84, 53–84 (2023)

  11. [11]

    SIAM Journal on Optimization 30(1), 349–376 (2020)

    Paquette, C., Scheinberg, K.: A stochastic line search method w ith expected complexity analysis. SIAM Journal on Optimization 30(1), 349–376 (2020)

  12. [12]

    Informs Journal on Optimization 1(3), 200–220 (2019)

    Curtis, F.E., Scheinberg, K., Shi, R.: A stochastic trust region alg orithm based on careful step normalization. Informs Journal on Optimization 1(3), 200–220 (2019)

  13. [13]

    arXiv preprint arXiv:2601.11795 (2026)

    Wang, Q., Piermarini, C., Zhu, Y., Curtis, F.E.: Projected stochas tic momentum methods for nonlinear equality-constrained optimization for machin e learning. arXiv preprint arXiv:2601.11795 (2026)

  14. [14]

    SIAM Journal on Optimization 34, 3592–3622 (2024)

    Curtis, F., Robinson, D., Zhou, B.: Sequential quadratic optimiza tion for stochas- tic optimization with deterministic nonlinear inequality and equality cons traints. SIAM Journal on Optimization 34, 3592–3622 (2024)

  15. [15]

    SIAM Journal on Optimization 31(3), 1352–1379 (2021)

    Berahas, A.S., Curtis, F.E., Robinson, D., Zhou, B.: Sequential qu adratic optimization for nonlinear equality constrained stochastic optimizat ion. SIAM Journal on Optimization 31(3), 1352–1379 (2021)

  16. [16]

    SIAM Journal on Optimization 34(2), 2000–2037 (2024)

    Fang, Y., Na, S., Mahoney, M.W., Kolar, M.: Fully stochastic trust- region sequen- tial quadratic programming for equality-constrained optimization p roblems. SIAM Journal on Optimization 34(2), 2000–2037 (2024)

  17. [17]

    arXiv preprint arXiv:2504.19629 (2025)

    Kreji´ c, N., Krklec Jerinki´ c, N., Rapaji´ c, S., Ruteˇ si´ c, L.: IPAS: An adaptive sample size method for weighted finite sum problems with linear equality const raints. arXiv preprint arXiv:2504.19629 (2025)

  18. [18]

    SIAM Journal on Optimization 10(4), 1196–1211 (2000)

    Birgin, E.G., Mart ´ ınez, J.M., Raydan, M.: Nonmonotone spectral projected gra- dient methods on convex sets. SIAM Journal on Optimization 10(4), 1196–1211 (2000)

  19. [19]

    Numerical Algorithms 68, 711–739 (2015)

    Kreji´ c, N., Krklec Jerinki´ c, N.: Non-monotone line search methods with variable sample size. Numerical Algorithms 68, 711–739 (2015)

  20. [20]

    In: Optimizing Methods in Statistic s, pp

    Robbins, H., Siegmund, D.: A convergence theorem for non nega tive almost super- martingales and some applications. In: Optimizing Methods in Statistic s, pp. 233–257. Elsevier, New York (1971) 18

  21. [21]

    arXiv:2505.0 6374 (2025)

    Bellavia, S., Gratton, S., Morini, B., Toint, P.L.: Fast stochastic se cond-order Ada- grad for nonconvex bound-constrained optimization. arXiv:2505.0 6374 (2025)

  22. [22]

    ar Xiv preprint arXiv:2509.00547 (2025)

    Kreji´ c, N., Krklec Jerinki´ c, N., Ostoji´ c, T., Vuˇ ci´ cevi´c, N.: AS-BOX: Additional sampling method for weighted sum problems with box constraints. ar Xiv preprint arXiv:2509.00547 (2025)

  23. [23]

    arXiv preprint arXiv:2508.02299 (2025)

    Kreji´ c, N., Krklec Jerinki´ c, N., Ostoji´ c, T., Vuˇ ci´ cevi´c, N.: Aspen: An additional sampling penalty method for finite-sum optimization problems with non linear equality constraints. arXiv preprint arXiv:2508.02299 (2025)

  24. [24]

    ACM transactions on intelligent systems and technology (TIST) 2(3), 1–27 (2011)

    Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector ma chines. ACM transactions on intelligent systems and technology (TIST) 2(3), 1–27 (2011)

  25. [25]

    Dai, Y.H., Fletcher, R.: Projected Barziali-Borwein methods for la rge box- constrained quadratic programming. Numer. Math. 100, 21–47 (2005)

  26. [26]

    Friedlander, A., Marti ´ ınez, J.M., Molina, B., Raydan, M.: Gradientmethods with retard and generalizations. SIAM J. Numer. Anal. 36, 275–289 (1999)

  27. [27]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

  28. [28]

    Deep Learning Toolb ox Doc- umentation, available at: https://it.mathworks.com/help/deeplearning/ref/ cosinelearnrate.html (2024)

    MathWorks: Cosine learning rate schedule. Deep Learning Toolb ox Doc- umentation, available at: https://it.mathworks.com/help/deeplearning/ref/ cosinelearnrate.html (2024)

  29. [29]

    Optimization Methods and Software, 1–33 (2025) 19

    Gratton, S., Toint, P.L.: S2MPJ and CUTEst optimization problems for Matlab, Python and Julia. Optimization Methods and Software, 1–33 (2025) 19