A Projected Stochastic Gradient Method for Finite-Sum Problems with Linear Equality Constraints
Pith reviewed 2026-05-21 11:47 UTC · model grok-4.3
The pith
A projected stochastic gradient method solves finite-sum minimization under linear equality constraints while matching the convergence theory of unconstrained problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present an iterative procedure that computes a stochastic gradient, takes a step, and then applies a possibly inexact projection onto the affine constraint set. They prove that the resulting sequence satisfies the same almost-sure or expected convergence bounds that standard stochastic gradient methods achieve for unconstrained finite-sum problems.
What carries the argument
Projected stochastic gradient iteration with a possibly inexact orthogonal projection onto the linear equality constraint set.
If this is right
- The method converges at the same rate as unconstrained stochastic gradient descent when the projection is exact.
- Inexact projections are admissible without destroying the theoretical guarantees.
- The procedure remains practical for large finite sums because each iteration uses only one or a few component gradients.
- Numerical tests confirm that the method behaves as predicted on test problems with linear equalities.
Where Pith is reading between the lines
- The same projection step could be reused inside variance-reduced stochastic methods to obtain faster rates under the same constraint.
- If the linear equalities are replaced by simple bound constraints, the projection becomes trivial and the method reduces to a known box-constrained variant.
- The framework suggests a template for other deterministic constraints whose projections can be computed cheaply.
Load-bearing premise
The analysis rests on the standard bounded-variance and Lipschitz assumptions of stochastic gradient methods together with the existence of a computable projection map onto the constraint set.
What would settle it
A concrete counterexample in which the method, run with a valid stochastic gradient oracle and an exact projection, fails to satisfy the expected decrease or convergence rate stated for the unconstrained case would refute the central claim.
read the original abstract
A stochastic gradient method for finite-sum minimization subject to deterministic linear constraints is proposed and analyzed. The procedure presented adapts the projected gradient method on convex set to the use of both a stochastic gradient and a possibly inexact projection map. Under standard assumptions in the field of stochastic gradient methods, we provide theoretical results in agreement with the theory for unconstrained problems. Numerical results are presented to show the practical behavior of the procedure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a projected stochastic gradient method for finite-sum minimization subject to deterministic linear equality constraints. The algorithm adapts the projected gradient framework to stochastic gradients and allows a possibly inexact projection onto the constraint set. Under standard stochastic-gradient assumptions, the authors claim that the method achieves convergence results in agreement with the unconstrained case. Numerical experiments are included to illustrate practical behavior.
Significance. If the analysis rigorously controls the inexact-projection error, the work would supply a straightforward extension of SGD to linearly constrained finite-sum problems, with potential computational savings from inexact projections. The numerical results provide initial evidence of practical viability, but the overall significance hinges on whether the theoretical claims survive a detailed error analysis.
major comments (2)
- [§3] §3 (Convergence Analysis): the descent inequality and subsequent rate derivation do not explicitly bound or assume decay of the per-step projection error e_k. Standard projected-SGD proofs require ||e_k|| = O(α_k) or summability of ||e_k|| to preserve the unconstrained rates; without such a condition the claimed agreement with unconstrained theory does not follow.
- [§2] Assumption set in §2: the paper invokes 'standard assumptions' but does not state a quantitative requirement on the inexact projection map (e.g., ||P_approx(x) - P_exact(x)|| ≤ ε_k with ε_k controlled by the step-size sequence). This assumption is load-bearing for the central claim.
minor comments (2)
- [Abstract] The abstract and introduction should explicitly name the convergence rates (e.g., O(1/√T) or O(1/T)) that are claimed to match the unconstrained case.
- [Numerical Experiments] Figure captions in the numerical section would benefit from stating the constraint residual norm and the exact projection baseline used for comparison.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review of our manuscript. The comments correctly identify that the current presentation of the convergence analysis and assumptions does not explicitly quantify the inexact-projection error, which is necessary to rigorously support the claimed rates. We will revise the paper by adding a precise assumption on the projection error and updating the proof accordingly. This strengthens the theoretical contribution while preserving the main results. Point-by-point responses follow.
read point-by-point responses
-
Referee: [§3] §3 (Convergence Analysis): the descent inequality and subsequent rate derivation do not explicitly bound or assume decay of the per-step projection error e_k. Standard projected-SGD proofs require ||e_k|| = O(α_k) or summability of ||e_k|| to preserve the unconstrained rates; without such a condition the claimed agreement with unconstrained theory does not follow.
Authors: We agree that the current draft of §3 does not explicitly bound the projection error e_k in the descent inequality. The analysis implicitly relies on the error being controlled but does not state the required decay. In the revision we will insert a new lemma that derives the descent inequality with an additive term involving ||e_k|| and then impose the standard condition ||e_k|| ≤ C α_k (with C independent of k). Under this condition the extra term is summable when α_k = O(1/√k), recovering exactly the same O(1/√K) rate as the unconstrained case. The revised proof will be written in full detail. revision: yes
-
Referee: [§2] Assumption set in §2: the paper invokes 'standard assumptions' but does not state a quantitative requirement on the inexact projection map (e.g., ||P_approx(x) - P_exact(x)|| ≤ ε_k with ε_k controlled by the step-size sequence). This assumption is load-bearing for the central claim.
Authors: We acknowledge that §2 currently refers only to “standard assumptions” without a quantitative bound on the inexact projection. We will add a new assumption (Assumption 2.4) that explicitly requires ||P_approx(x_k) - P_exact(x_k)|| ≤ C α_k. A short paragraph will also be included explaining how this tolerance can be met in practice by terminating an inner iterative solver after a fixed number of steps proportional to the current step-size. This makes the load-bearing assumption transparent and directly addresses the referee’s concern. revision: yes
Circularity Check
No circularity; derivation matches unconstrained SGD theory via standard assumptions
full rationale
The paper adapts the projected gradient method to stochastic gradients and inexact projections onto linear equality constraints. Its central theoretical claim is that, under standard stochastic-gradient assumptions, the method yields results in agreement with the established unconstrained case. This agreement is obtained by extending existing external theory rather than by redefining quantities in terms of themselves, fitting parameters to the target result, or relying on self-citation chains for the load-bearing steps. The derivation therefore remains self-contained against external benchmarks of unconstrained SGD convergence and does not reduce to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard assumptions in the field of stochastic gradient methods
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Under standard assumptions in the field of stochastic gradient methods, we provide theoretical results in agreement with the theory for unconstrained problems.
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The procedure relies on ... a possibly inexact projection map onto the linear equality constraint set.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
SIAM Review 60(2), 223–311 (2018)
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for larg e-scale machine learning. SIAM Review 60(2), 223–311 (2018)
work page 2018
-
[2]
Advances in Neural Information Processing Sy stems 29, 685– 693 (2016)
Tan, C., Ma, S., Dai, Y.-H., Qian, Y.: Barzilai-borwein step size for sto chastic gradient descent. Advances in Neural Information Processing Sy stems 29, 685– 693 (2016)
work page 2016
-
[3]
Computat ional Optimization and Applications 91(2), 717–758 (2025)
Krklec Jerinki´ c, N., Ruggiero, V., Trombini, I.: Spectral stocha stic gradient method with additional sampling for finite and infinite sums. Computat ional Optimization and Applications 91(2), 717–758 (2025)
work page 2025
-
[4]
Optimization Methods an d Software 91(2), 1–26 (2024)
Bellavia, S., Kreji´ c, N., N., K.J., Raydan, M.: Slises: Subsampled line s earch spectral gradient method for finite sums. Optimization Methods an d Software 91(2), 1–26 (2024)
work page 2024
-
[5]
Journal of Computational and Applie d Mathemat- ics 476, 117059 (2026)
Bellavia, S., Morini, B., Yousefi, M.: Fully stochastic trust-region me thods with Barzilai-Borwein steplengths. Journal of Computational and Applie d Mathemat- ics 476, 117059 (2026)
work page 2026
-
[6]
Robbins, H., Monro, S.: A stochastic approximation method. SIAM J. Optim 21, 1109–1140 (1951)
work page 1951
-
[7]
SIAM Journal on Scientific Computing 34(3), 1380–1405 (2012)
Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic me thods for data fitting. SIAM Journal on Scientific Computing 34(3), 1380–1405 (2012)
work page 2012
-
[8]
Journal of Computational and Applied Mathematics 451, 116083 (2024) 17
Franchini, G., Porta, F., Ruggiero, V., Trombini, I., Zanni, L.: A stoc hastic gra- dient method with variance control and variable learning rate for de ep learning. Journal of Computational and Applied Mathematics 451, 116083 (2024) 17
work page 2024
-
[9]
Computational Management Science 3(1), 55–79 (2006)
Bastin, F., Cirillo, C., Toint, P.L.: An adaptive Monte Carlo algorithm fo r com- puting mixed logit estimators. Computational Management Science 3(1), 55–79 (2006)
work page 2006
-
[10]
Com- putational Optimization and Applications 84, 53–84 (2023)
Bellavia, S., Kreji´ c, N., Morini, B., Rebegoldi, S.: A stochastic firs t-order trust-region method with inexact restoration for finite-sum minimiz ation. Com- putational Optimization and Applications 84, 53–84 (2023)
work page 2023
-
[11]
SIAM Journal on Optimization 30(1), 349–376 (2020)
Paquette, C., Scheinberg, K.: A stochastic line search method w ith expected complexity analysis. SIAM Journal on Optimization 30(1), 349–376 (2020)
work page 2020
-
[12]
Informs Journal on Optimization 1(3), 200–220 (2019)
Curtis, F.E., Scheinberg, K., Shi, R.: A stochastic trust region alg orithm based on careful step normalization. Informs Journal on Optimization 1(3), 200–220 (2019)
work page 2019
-
[13]
arXiv preprint arXiv:2601.11795 (2026)
Wang, Q., Piermarini, C., Zhu, Y., Curtis, F.E.: Projected stochas tic momentum methods for nonlinear equality-constrained optimization for machin e learning. arXiv preprint arXiv:2601.11795 (2026)
-
[14]
SIAM Journal on Optimization 34, 3592–3622 (2024)
Curtis, F., Robinson, D., Zhou, B.: Sequential quadratic optimiza tion for stochas- tic optimization with deterministic nonlinear inequality and equality cons traints. SIAM Journal on Optimization 34, 3592–3622 (2024)
work page 2024
-
[15]
SIAM Journal on Optimization 31(3), 1352–1379 (2021)
Berahas, A.S., Curtis, F.E., Robinson, D., Zhou, B.: Sequential qu adratic optimization for nonlinear equality constrained stochastic optimizat ion. SIAM Journal on Optimization 31(3), 1352–1379 (2021)
work page 2021
-
[16]
SIAM Journal on Optimization 34(2), 2000–2037 (2024)
Fang, Y., Na, S., Mahoney, M.W., Kolar, M.: Fully stochastic trust- region sequen- tial quadratic programming for equality-constrained optimization p roblems. SIAM Journal on Optimization 34(2), 2000–2037 (2024)
work page 2000
-
[17]
arXiv preprint arXiv:2504.19629 (2025)
Kreji´ c, N., Krklec Jerinki´ c, N., Rapaji´ c, S., Ruteˇ si´ c, L.: IPAS: An adaptive sample size method for weighted finite sum problems with linear equality const raints. arXiv preprint arXiv:2504.19629 (2025)
work page internal anchor Pith review arXiv 2025
-
[18]
SIAM Journal on Optimization 10(4), 1196–1211 (2000)
Birgin, E.G., Mart ´ ınez, J.M., Raydan, M.: Nonmonotone spectral projected gra- dient methods on convex sets. SIAM Journal on Optimization 10(4), 1196–1211 (2000)
work page 2000
-
[19]
Numerical Algorithms 68, 711–739 (2015)
Kreji´ c, N., Krklec Jerinki´ c, N.: Non-monotone line search methods with variable sample size. Numerical Algorithms 68, 711–739 (2015)
work page 2015
-
[20]
In: Optimizing Methods in Statistic s, pp
Robbins, H., Siegmund, D.: A convergence theorem for non nega tive almost super- martingales and some applications. In: Optimizing Methods in Statistic s, pp. 233–257. Elsevier, New York (1971) 18
work page 1971
-
[21]
Bellavia, S., Gratton, S., Morini, B., Toint, P.L.: Fast stochastic se cond-order Ada- grad for nonconvex bound-constrained optimization. arXiv:2505.0 6374 (2025)
work page 2025
-
[22]
ar Xiv preprint arXiv:2509.00547 (2025)
Kreji´ c, N., Krklec Jerinki´ c, N., Ostoji´ c, T., Vuˇ ci´ cevi´c, N.: AS-BOX: Additional sampling method for weighted sum problems with box constraints. ar Xiv preprint arXiv:2509.00547 (2025)
-
[23]
arXiv preprint arXiv:2508.02299 (2025)
Kreji´ c, N., Krklec Jerinki´ c, N., Ostoji´ c, T., Vuˇ ci´ cevi´c, N.: Aspen: An additional sampling penalty method for finite-sum optimization problems with non linear equality constraints. arXiv preprint arXiv:2508.02299 (2025)
-
[24]
ACM transactions on intelligent systems and technology (TIST) 2(3), 1–27 (2011)
Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector ma chines. ACM transactions on intelligent systems and technology (TIST) 2(3), 1–27 (2011)
work page 2011
-
[25]
Dai, Y.H., Fletcher, R.: Projected Barziali-Borwein methods for la rge box- constrained quadratic programming. Numer. Math. 100, 21–47 (2005)
work page 2005
-
[26]
Friedlander, A., Marti ´ ınez, J.M., Molina, B., Raydan, M.: Gradientmethods with retard and generalizations. SIAM J. Numer. Anal. 36, 275–289 (1999)
work page 1999
-
[27]
SGDR: Stochastic Gradient Descent with Warm Restarts
Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[28]
MathWorks: Cosine learning rate schedule. Deep Learning Toolb ox Doc- umentation, available at: https://it.mathworks.com/help/deeplearning/ref/ cosinelearnrate.html (2024)
work page 2024
-
[29]
Optimization Methods and Software, 1–33 (2025) 19
Gratton, S., Toint, P.L.: S2MPJ and CUTEst optimization problems for Matlab, Python and Julia. Optimization Methods and Software, 1–33 (2025) 19
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.