Projected Gradient Methods with Momentum

Diego Scuppa; Giampaolo Liuzzi; Marco Sciandrone; Matteo Lapucci; Stefano Lucidi

arxiv: 2601.16683 · v2 · submitted 2026-01-23 · 🧮 math.OC

Projected Gradient Methods with Momentum

Matteo Lapucci , Giampaolo Liuzzi , Stefano Lucidi , Marco Sciandrone , Diego Scuppa This is my paper

Pith reviewed 2026-05-16 11:45 UTC · model grok-4.3

classification 🧮 math.OC

keywords projected gradient methodsmomentum accelerationconstrained optimizationnonconvex optimizationconvergence analysisspectral gradient

0 comments

The pith

Momentum can be added to projected gradient methods for smooth nonconvex objectives over convex sets while retaining convergence guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes general algorithmic frameworks that combine projected gradient steps with momentum for smooth possibly nonconvex problems whose convex constraint set admits an efficient Euclidean projection. It identifies theoretically sound ways to insert momentum information and then works out one concrete algorithm that keeps per-iteration cost low. The method is shown to converge and to beat the standard spectral projected gradient method on two numerical benchmarks.

Core claim

A projected gradient algorithm augmented with momentum terms possesses the same convergence and complexity guarantees as its non-momentum counterpart yet produces faster practical progress on the tested constrained problems.

What carries the argument

A tailored projected gradient iteration that incorporates a momentum correction before the projection step, derived from a general convergence framework for momentum-augmented projected methods.

If this is right

The added momentum term does not worsen the worst-case complexity bound of projected gradient methods.
The same convergence theory covers both convex and nonconvex objectives under the stated smoothness assumption.
The per-iteration overhead remains comparable to a single gradient evaluation plus projection.
Empirical gains appear in at least two distinct application domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same momentum construction could be tested on other first-order constrained schemes such as conditional gradient methods.
If the projection cost dominates, the practical advantage of momentum may shrink relative to the unconstrained case.
Extensions to stochastic or inexact gradient oracles would require only minor changes to the existing analysis.

Load-bearing premise

The objective is smooth and the Euclidean projection onto the convex constraint set is available at reasonable cost.

What would settle it

A problem instance on which the momentum version fails to reach the same accuracy as the plain spectral projected gradient method within the stated iteration budget.

Figures

Figures reproduced from arXiv: 2601.16683 by Diego Scuppa, Giampaolo Liuzzi, Marco Sciandrone, Matteo Lapucci, Stefano Lucidi.

**Figure 1.** Figure 1: Performance profiles for the comparison between SPG and PGMM on the 100 LR instances with ℓ1-ball constraints. (a) CPU Time = 2 4 6 8 10 ; a ( = ) 0 0.2 0.4 0.6 0.8 1 SPG PGMM (b) Iterations = 2 4 6 8 10 ; a ( = ) 0 0.2 0.4 0.6 0.8 1 SPG PGMM (c) Function evaluations = 2 4 6 8 10 ; a ( = ) 0 0.2 0.4 0.6 0.8 1 SPG PGMM [PITH_FULL_IMAGE:figures/full_fig_p021_1.png] view at source ↗

**Figure 3.** Figure 3: Feasible set of the bidimensional problem in [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗

read the original abstract

We focus on the optimization problem with smooth, possibly nonconvex objectives and a convex constraint set for which the Euclidean projection operation is practically available. Focusing on this setting, we carry out a general convergence and complexity analysis for algorithmic frameworks. Consequently, we discuss theoretically sound strategies to integrate momentum information within classical projected gradient type algorithms. One of these approaches is then developed in detail, up to the definition of a tailored algorithm with both theoretical guarantees and reasonable per-iteration cost. The proposed method is finally shown to outperform the standard (spectral) projected gradient method in two different experimental benchmarks, indicating that the addition of momentum terms is as beneficial in the constrained setting as it is in the unconstrained scenario.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Momentum integrates into projected gradient methods with standard analysis and limited experiments showing modest gains.

read the letter

The paper adapts momentum techniques to projected gradient methods for smooth possibly nonconvex objectives over convex constraint sets where the projection is available. They provide a general convergence analysis for such frameworks, then detail one momentum-integrated algorithm with theoretical guarantees and reasonable cost, and show it outperforms the standard spectral projected gradient in two benchmarks. This is a direct and useful extension of unconstrained momentum ideas to the constrained case, with the analysis building cleanly on standard assumptions like Lipschitz smoothness. The work does well in keeping the per-iteration cost low and in providing both theory and numerical evidence that momentum helps similarly here as in the unconstrained scenario. The soft spots are that the analysis is fairly standard without particularly tight new bounds or surprising convergence rates, and the experiments are limited to two cases without much discussion of problem characteristics or variability, which leaves room for more validation to support broad claims. The math and assumptions look solid, with no circularity or hidden parameters beyond the usual momentum coefficient. This paper is for researchers in continuous optimization who deal with constrained nonconvex problems and want practical first-order methods with some acceleration. It deserves a serious referee because the claims are grounded and the contribution is clear enough to benefit from detailed review, even if it is incremental.

Referee Report

2 major / 2 minor

Summary. The paper addresses optimization of smooth (possibly nonconvex) objectives over convex constraint sets where the Euclidean projection is available. It develops a general convergence and complexity analysis for projected-gradient frameworks, identifies theoretically sound ways to incorporate momentum, fully specifies one such algorithm with per-iteration cost comparable to the baseline, proves convergence guarantees for it, and reports that the method outperforms the spectral projected-gradient baseline on two numerical benchmarks.

Significance. If the stated rates and empirical gains hold, the work supplies a practical, theoretically justified way to accelerate projected-gradient methods in the constrained setting, extending the well-known benefits of momentum from the unconstrained case. The explicit complexity bounds and the fact that the algorithm retains the same per-iteration cost as the baseline are concrete strengths.

major comments (2)

[§3.2, Theorem 3.4] §3.2, Theorem 3.4: the descent inequality used to obtain the O(1/T) rate for the nonconvex case absorbs the momentum term only under the assumption that the momentum coefficient β satisfies β ≤ 1/(2L); this restriction is not stated in the theorem statement itself and appears to be needed for the final bound, so the claimed generality of the rate should be clarified.
[§4.1, Algorithm 1] §4.1, Algorithm 1: the per-iteration cost claim relies on the projection being computed exactly once per step, yet the momentum update introduces an extra vector addition whose numerical stability is not analyzed; when the constraint set is a simplex or a box, this addition can accumulate floating-point error that the convergence proof does not account for.

minor comments (2)

[§5] The abstract states that the method 'outperforms' the baseline in two benchmarks, but §5 does not report the number of random seeds or statistical significance tests; adding these would strengthen the empirical claim.
[§2 and §4.3] Notation for the momentum coefficient is introduced as β in §2 and then reused as γ in the proof of Lemma 4.3; a single consistent symbol would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the precise comments, which help strengthen the presentation of our results. We respond to each major comment below.

read point-by-point responses

Referee: [§3.2, Theorem 3.4] §3.2, Theorem 3.4: the descent inequality used to obtain the O(1/T) rate for the nonconvex case absorbs the momentum term only under the assumption that the momentum coefficient β satisfies β ≤ 1/(2L); this restriction is not stated in the theorem statement itself and appears to be needed for the final bound, so the claimed generality of the rate should be clarified.

Authors: We agree that the condition β ≤ 1/(2L) is required for the descent inequality to absorb the momentum contribution and obtain the stated O(1/T) rate in the nonconvex setting. This bound on β was used in the proof but omitted from the theorem statement. We will revise Theorem 3.4 to state the restriction explicitly. revision: yes
Referee: [§4.1, Algorithm 1] §4.1, Algorithm 1: the per-iteration cost claim relies on the projection being computed exactly once per step, yet the momentum update introduces an extra vector addition whose numerical stability is not analyzed; when the constraint set is a simplex or a box, this addition can accumulate floating-point error that the convergence proof does not account for.

Authors: The momentum step consists of one vector addition and scaling whose arithmetic cost is negligible compared with the projection. The per-iteration complexity claim therefore remains unchanged. Our convergence analysis is performed in exact arithmetic, the standard setting for such first-order guarantees. A detailed floating-point analysis lies outside the scope of the paper; we will add a short remark in Section 4.1 recording the exact-arithmetic assumption and noting that the reported experiments on simplex and box constraints exhibit no observable instability. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper performs a standard convergence and complexity analysis for momentum-augmented projected gradient methods under the assumptions of smooth (possibly nonconvex) objectives and convex constraint sets with available Euclidean projection. Theoretical guarantees follow from classical smoothness and descent arguments without reducing any prediction or result to a fitted parameter or self-referential definition. The tailored algorithm and its per-iteration cost are derived directly from the framework, and experimental outperformance on benchmarks is presented as independent empirical evidence rather than a constructed outcome. No load-bearing self-citations, imported uniqueness theorems, or ansatzes that circularly define the central claims appear in the abstract or described structure; the derivation remains self-contained against external optimization benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework rests on standard domain assumptions for constrained optimization; momentum parameters are design choices rather than fitted values derived from data.

free parameters (1)

momentum coefficient
A tunable parameter in the momentum update rule, selected as part of algorithm design or empirical tuning.

axioms (1)

domain assumption The objective function is smooth and the constraint set is convex with a practically available Euclidean projection.
Explicitly stated as the focus of the problem setting in the abstract.

pith-pipeline@v0.9.0 · 5414 in / 1134 out tokens · 46049 ms · 2026-05-16T11:45:15.924131+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

[1]

IMA Journal of Nu- merical Analysis25(2), 221–252 (2005)

Andreani, R., Birgin, E.G., Mart´ ınez, J.M., Yuan, J.: Spectral projected gradient and variable metric methods for optimization with linear inequalities. IMA Journal of Nu- merical Analysis25(2), 221–252 (2005)

work page 2005
[2]

Numerical Algorithms 53(1), 23–52 (2010)

Andretta, M., Birgin, E.G., Mart´ ınez, J.M.: Partial spectral projected gradient method with active-set strategy for linearly constrained optimization. Numerical Algorithms 53(1), 23–52 (2010)

work page 2010
[3]

Available: https://doi.org/10.1093/imanum/8.1.141

Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA Journal of Numerical Analysis8, 141–148 (1988). DOI 10.1093/imanum/8.1.141

work page doi:10.1093/imanum/8.1.141 1988
[4]

Athena Scientific (1999)

Bertsekas, D.P.: Nonlinear Programming. Athena Scientific (1999)

work page 1999
[5]

Computational Optimization and Applica- tions23(1), 101–125 (2002) Projected gradient methods with momentum 23

Birgin, E.G., Mario Mart´ ınez, J.: Large-scale active-set box-constrained optimization method with spectral projected gradients. Computational Optimization and Applica- tions23(1), 101–125 (2002) Projected gradient methods with momentum 23

work page 2002
[6]

Nonmonotone spectral projected gradient methods on convex sets

Birgin, E.G., Mart´ ınez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization10(4), 1196–1211 (2000). DOI 10.1137/S1052623497330963. URLhttps://doi.org/10.1137/S1052623497330963

work page doi:10.1137/s1052623497330963 2000
[7]

ACM Trans

Birgin, E.G., Mart´ ınez, J.M., Raydan, M.: Algorithm 813: Spg—software for convex- constrained optimization. ACM Trans. Math. Softw.27(3), 340–349 (2001). DOI 10.1145/502800.502803. URLhttps://doi.org/10.1145/502800.502803

work page doi:10.1145/502800.502803 2001
[8]

IMA Journal of Numerical Analysis23(4), 539–559 (2003)

Birgin, E.G., Mart´ ınez, J.M., Raydan, M.: Inexact spectral projected gradient methods on convex sets. IMA Journal of Numerical Analysis23(4), 539–559 (2003)

work page 2003
[9]

Inverse Problems31(9), 095,008 (2015)

Bonettini, S., Prato, M.: New convergence results for the scaled gradient projection method. Inverse Problems31(9), 095,008 (2015)

work page 2015
[10]

Inverse Problems25(1), 015,002 (2008)

Bonettini, S., Zanella, R., Zanni, L.: A scaled gradient projection method for constrained image deblurring. Inverse Problems25(1), 015,002 (2008)

work page 2008
[11]

SIAM Review60(2), 223–311 (2018)

Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Review60(2), 223–311 (2018)

work page 2018
[12]

SIAM (2022)

Cartis, C., Gould, N.I., Toint, P.L.: Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives. SIAM (2022)

work page 2022
[13]

Mathematical Program- ming158(1), 575–585 (2016)

Condat, L.: Fast projection onto the simplex and the l1 ball. Mathematical Program- ming158(1), 575–585 (2016)

work page 2016
[14]

Computational Optimization and Applications83(2), 693–721 (2022)

Cristofari, A., De Santis, M., Lucidi, S., Rinaldi, F.: Minimization over theℓ 1-ball using an active-set non-monotone projected gradient. Computational Optimization and Applications83(2), 693–721 (2022)

work page 2022
[15]

Mathematical Programming91(2), 201–213 (2002)

Dolan, E.D., Mor´ e, J.J.: Benchmarking optimization software with performance profiles. Mathematical Programming91(2), 201–213 (2002)

work page 2002
[16]

Computational Optimization and Applications81(1), 91–125 (2022)

Ferreira, O.P., Lemes, M., Prudente, L.F.: On the inexact scaled gradient projection method. Computational Optimization and Applications81(1), 91–125 (2022)

work page 2022
[17]

IEEE Transactions on Information Theory64(10), 6707–6721 (2018)

Golbabaee, M., Davies, M.E.: Inexact gradient projection and fast data driven com- pressed sensing. IEEE Transactions on Information Theory64(10), 6707–6721 (2018)

work page 2018
[18]

Optimization71(1), 145–163 (2022)

Gon¸ calves, D.S., Gon¸ calves, M.L., Menezes, T.C.: Inexact variable metric method for convex-constrained optimization problems. Optimization71(1), 145–163 (2022)

work page 2022
[19]

Computational Opti- mization and Applications60(2014)

Gould, N., Orban, D., Toint, P.: CUTEst: a Constrained and Unconstrained Testing Environment with safe threads for mathematical optimization. Computational Opti- mization and Applications60(2014). DOI 10.1007/s10589-014-9687-3

work page doi:10.1007/s10589-014-9687-3 2014
[20]

Grippo, F

Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for new- ton’s method. SIAM Journal on Numerical Analysis23(4), 707–716 (1986). DOI 10.1137/0723046. URLhttps://doi.org/10.1137/0723046

work page doi:10.1137/0723046 1986
[21]

Uni- text

Grippo, L., Sciandrone, M.: Introduction to methods for nonlinear optimization. Uni- text. Springer, Cham (2023)

work page 2023
[22]

Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd edn. (2009)

work page 2009
[23]

Mathematical Pro- gramming199(1), 1365–1415 (2023)

Jia, X., Kanzow, C., Mehlitz, P., Wachsmuth, G.: An augmented Lagrangian method for optimization problems with structured geometric constraints. Mathematical Pro- gramming199(1), 1365–1415 (2023)

work page 2023
[24]

Computational Optimization and Applications pp

Lapucci, M., Liuzzi, G., Lucidi, S., Pucci, D., Sciandrone, M.: A globally convergent gradient method with momentum. Computational Optimization and Applications pp. 1–26 (2025)

work page 2025
[25]

Mathe- matical Programming Computation14(3), 543–591 (2022)

Lee, C.p., Wang, P.W., Lin, C.J.: Limited-memory common-directions method for large- scale optimization: convergence, parallelization, and distributed optimization. Mathe- matical Programming Computation14(3), 543–591 (2022)

work page 2022
[26]

In: AAAI, vol

Lee, S.I., Lee, H., Abbeel, P., Ng, A.Y.: Efficient L1 regularized logistic regression. In: AAAI, vol. 6, pp. 401–408 (2006)

work page 2006
[27]

USSR Computational Mathematics and Mathematical Physics6(5), 1–50 (1966)

Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Computational Mathematics and Mathematical Physics6(5), 1–50 (1966)

work page 1966
[28]

In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp

Liu, J., Chen, J., Ye, J.: Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 547–556 (2009)

work page 2009
[29]

doi: 10.1016/0041-5553(64)90137-5

Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics4(5), 1–17 (1964). DOI https://doi.org/10.1016/0041-5553(64)90137-5. URLhttps://www.sciencedirect. com/science/article/pii/0041555364901375 24 Lapucci et al

work page doi:10.1016/0041-5553(64)90137-5 1964
[30]

SIAM Journal on Optimization7(1), 26–33 (1997)

Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization7(1), 26–33 (1997). DOI 10. 1137/S1052623494266365. URLhttps://doi.org/10.1137/S1052623494266365

work page doi:10.1137/s1052623494266365 1997
[31]

SIAM Journal on Scientific Computing46(3), A2025–A2046 (2024)

Tang, T., Toh, K.C., Xiao, N., Ye, Y.: A Riemannian dimension-reduced second-order method with application in sensor network localization. SIAM Journal on Scientific Computing46(3), A2025–A2046 (2024). DOI 10.1137/23M1567229. URLhttps:// doi.org/10.1137/23M1567229

work page doi:10.1137/23m1567229 2024
[32]

IEEE Transactions on Neural Networks and Learning Systems33(3), 1107–1118 (2022)

Tao, W., Wu, G.W., Tao, Q.: Momentum acceleration in the individual convergence of nonsmooth convex optimization with constraints. IEEE Transactions on Neural Networks and Learning Systems33(3), 1107–1118 (2022)

work page 2022
[33]

Wright, S.J., Recht, B.: Optimization for Data Analysis. Cambridge University Press (2022) Appendix: Closed form solution for the problem of minimizing a bidimensional quadratic function over a polytope We consider the two-dimensional optimization problem min α,β φ(α, β) = 1 2 α β T t u u w α β + y h T α β s.t.α≥0, β≥0, α+β≤1. (29) The feasible set of the...

work page 2022

[1] [1]

IMA Journal of Nu- merical Analysis25(2), 221–252 (2005)

Andreani, R., Birgin, E.G., Mart´ ınez, J.M., Yuan, J.: Spectral projected gradient and variable metric methods for optimization with linear inequalities. IMA Journal of Nu- merical Analysis25(2), 221–252 (2005)

work page 2005

[2] [2]

Numerical Algorithms 53(1), 23–52 (2010)

Andretta, M., Birgin, E.G., Mart´ ınez, J.M.: Partial spectral projected gradient method with active-set strategy for linearly constrained optimization. Numerical Algorithms 53(1), 23–52 (2010)

work page 2010

[3] [3]

Available: https://doi.org/10.1093/imanum/8.1.141

Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA Journal of Numerical Analysis8, 141–148 (1988). DOI 10.1093/imanum/8.1.141

work page doi:10.1093/imanum/8.1.141 1988

[4] [4]

Athena Scientific (1999)

Bertsekas, D.P.: Nonlinear Programming. Athena Scientific (1999)

work page 1999

[5] [5]

Computational Optimization and Applica- tions23(1), 101–125 (2002) Projected gradient methods with momentum 23

Birgin, E.G., Mario Mart´ ınez, J.: Large-scale active-set box-constrained optimization method with spectral projected gradients. Computational Optimization and Applica- tions23(1), 101–125 (2002) Projected gradient methods with momentum 23

work page 2002

[6] [6]

Nonmonotone spectral projected gradient methods on convex sets

Birgin, E.G., Mart´ ınez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization10(4), 1196–1211 (2000). DOI 10.1137/S1052623497330963. URLhttps://doi.org/10.1137/S1052623497330963

work page doi:10.1137/s1052623497330963 2000

[7] [7]

ACM Trans

Birgin, E.G., Mart´ ınez, J.M., Raydan, M.: Algorithm 813: Spg—software for convex- constrained optimization. ACM Trans. Math. Softw.27(3), 340–349 (2001). DOI 10.1145/502800.502803. URLhttps://doi.org/10.1145/502800.502803

work page doi:10.1145/502800.502803 2001

[8] [8]

IMA Journal of Numerical Analysis23(4), 539–559 (2003)

Birgin, E.G., Mart´ ınez, J.M., Raydan, M.: Inexact spectral projected gradient methods on convex sets. IMA Journal of Numerical Analysis23(4), 539–559 (2003)

work page 2003

[9] [9]

Inverse Problems31(9), 095,008 (2015)

Bonettini, S., Prato, M.: New convergence results for the scaled gradient projection method. Inverse Problems31(9), 095,008 (2015)

work page 2015

[10] [10]

Inverse Problems25(1), 015,002 (2008)

Bonettini, S., Zanella, R., Zanni, L.: A scaled gradient projection method for constrained image deblurring. Inverse Problems25(1), 015,002 (2008)

work page 2008

[11] [11]

SIAM Review60(2), 223–311 (2018)

Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Review60(2), 223–311 (2018)

work page 2018

[12] [12]

SIAM (2022)

Cartis, C., Gould, N.I., Toint, P.L.: Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives. SIAM (2022)

work page 2022

[13] [13]

Mathematical Program- ming158(1), 575–585 (2016)

Condat, L.: Fast projection onto the simplex and the l1 ball. Mathematical Program- ming158(1), 575–585 (2016)

work page 2016

[14] [14]

Computational Optimization and Applications83(2), 693–721 (2022)

Cristofari, A., De Santis, M., Lucidi, S., Rinaldi, F.: Minimization over theℓ 1-ball using an active-set non-monotone projected gradient. Computational Optimization and Applications83(2), 693–721 (2022)

work page 2022

[15] [15]

Mathematical Programming91(2), 201–213 (2002)

Dolan, E.D., Mor´ e, J.J.: Benchmarking optimization software with performance profiles. Mathematical Programming91(2), 201–213 (2002)

work page 2002

[16] [16]

Computational Optimization and Applications81(1), 91–125 (2022)

Ferreira, O.P., Lemes, M., Prudente, L.F.: On the inexact scaled gradient projection method. Computational Optimization and Applications81(1), 91–125 (2022)

work page 2022

[17] [17]

IEEE Transactions on Information Theory64(10), 6707–6721 (2018)

Golbabaee, M., Davies, M.E.: Inexact gradient projection and fast data driven com- pressed sensing. IEEE Transactions on Information Theory64(10), 6707–6721 (2018)

work page 2018

[18] [18]

Optimization71(1), 145–163 (2022)

Gon¸ calves, D.S., Gon¸ calves, M.L., Menezes, T.C.: Inexact variable metric method for convex-constrained optimization problems. Optimization71(1), 145–163 (2022)

work page 2022

[19] [19]

Computational Opti- mization and Applications60(2014)

Gould, N., Orban, D., Toint, P.: CUTEst: a Constrained and Unconstrained Testing Environment with safe threads for mathematical optimization. Computational Opti- mization and Applications60(2014). DOI 10.1007/s10589-014-9687-3

work page doi:10.1007/s10589-014-9687-3 2014

[20] [20]

Grippo, F

Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for new- ton’s method. SIAM Journal on Numerical Analysis23(4), 707–716 (1986). DOI 10.1137/0723046. URLhttps://doi.org/10.1137/0723046

work page doi:10.1137/0723046 1986

[21] [21]

Uni- text

Grippo, L., Sciandrone, M.: Introduction to methods for nonlinear optimization. Uni- text. Springer, Cham (2023)

work page 2023

[22] [22]

Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd edn. (2009)

work page 2009

[23] [23]

Mathematical Pro- gramming199(1), 1365–1415 (2023)

Jia, X., Kanzow, C., Mehlitz, P., Wachsmuth, G.: An augmented Lagrangian method for optimization problems with structured geometric constraints. Mathematical Pro- gramming199(1), 1365–1415 (2023)

work page 2023

[24] [24]

Computational Optimization and Applications pp

Lapucci, M., Liuzzi, G., Lucidi, S., Pucci, D., Sciandrone, M.: A globally convergent gradient method with momentum. Computational Optimization and Applications pp. 1–26 (2025)

work page 2025

[25] [25]

Mathe- matical Programming Computation14(3), 543–591 (2022)

Lee, C.p., Wang, P.W., Lin, C.J.: Limited-memory common-directions method for large- scale optimization: convergence, parallelization, and distributed optimization. Mathe- matical Programming Computation14(3), 543–591 (2022)

work page 2022

[26] [26]

In: AAAI, vol

Lee, S.I., Lee, H., Abbeel, P., Ng, A.Y.: Efficient L1 regularized logistic regression. In: AAAI, vol. 6, pp. 401–408 (2006)

work page 2006

[27] [27]

USSR Computational Mathematics and Mathematical Physics6(5), 1–50 (1966)

Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Computational Mathematics and Mathematical Physics6(5), 1–50 (1966)

work page 1966

[28] [28]

In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp

Liu, J., Chen, J., Ye, J.: Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 547–556 (2009)

work page 2009

[29] [29]

doi: 10.1016/0041-5553(64)90137-5

Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics4(5), 1–17 (1964). DOI https://doi.org/10.1016/0041-5553(64)90137-5. URLhttps://www.sciencedirect. com/science/article/pii/0041555364901375 24 Lapucci et al

work page doi:10.1016/0041-5553(64)90137-5 1964

[30] [30]

SIAM Journal on Optimization7(1), 26–33 (1997)

Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization7(1), 26–33 (1997). DOI 10. 1137/S1052623494266365. URLhttps://doi.org/10.1137/S1052623494266365

work page doi:10.1137/s1052623494266365 1997

[31] [31]

SIAM Journal on Scientific Computing46(3), A2025–A2046 (2024)

Tang, T., Toh, K.C., Xiao, N., Ye, Y.: A Riemannian dimension-reduced second-order method with application in sensor network localization. SIAM Journal on Scientific Computing46(3), A2025–A2046 (2024). DOI 10.1137/23M1567229. URLhttps:// doi.org/10.1137/23M1567229

work page doi:10.1137/23m1567229 2024

[32] [32]

IEEE Transactions on Neural Networks and Learning Systems33(3), 1107–1118 (2022)

Tao, W., Wu, G.W., Tao, Q.: Momentum acceleration in the individual convergence of nonsmooth convex optimization with constraints. IEEE Transactions on Neural Networks and Learning Systems33(3), 1107–1118 (2022)

work page 2022

[33] [33]

Wright, S.J., Recht, B.: Optimization for Data Analysis. Cambridge University Press (2022) Appendix: Closed form solution for the problem of minimizing a bidimensional quadratic function over a polytope We consider the two-dimensional optimization problem min α,β φ(α, β) = 1 2 α β T t u u w α β + y h T α β s.t.α≥0, β≥0, α+β≤1. (29) The feasible set of the...

work page 2022