Negative Momentum for Convex-Concave Optimization

Henry Shugart; Jason M. Altschuler; Shuyi Wang

arxiv: 2604.17145 · v1 · submitted 2026-04-18 · 🧮 math.OC · cs.DS· cs.LG

Negative Momentum for Convex-Concave Optimization

Henry Shugart , Shuyi Wang , Jason M. Altschuler This is my paper

Pith reviewed 2026-05-10 05:59 UTC · model grok-4.3

classification 🧮 math.OC cs.DScs.LG

keywords negative momentummin-max optimizationconvex-concavestrongly-convex-strongly-concaveglobal convergenceaccelerated convergencegradient descent-ascent

0 comments

The pith

Negative momentum enables global convergence for convex-concave optimization and accelerated rates for strongly-convex-strongly-concave cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that applying negative momentum to gradient descent-ascent dynamics resolves divergence issues in min-max problems. It proves global convergence is achievable for the foundational convex-concave setting, which is the direct analog of convex minimization. It further establishes accelerated convergence for the strongly-convex-strongly-concave setting. These results place negative momentum on par with other min-max algorithms in both generality and speed. A sympathetic reader would care because min-max optimization appears throughout game theory, economics, and adversarial machine learning, where standard momentum fails.

Core claim

The paper proves that negative momentum makes global convergence possible for convex-concave optimization and accelerated convergence possible for strongly-convex-strongly-concave optimization. With the momentum parameter chosen in an appropriate negative range, the dynamics converge globally in the first case and at accelerated linear rates in the second, extending momentum's known benefits from minimization to these min-max settings.

What carries the argument

Negative momentum coefficient inserted into the gradient descent-ascent update rule, selected from a range determined by the problem's convexity and smoothness parameters.

If this is right

Global convergence guarantees now exist for convex-concave min-max problems via negative momentum.
Accelerated linear rates are achievable for strongly-convex-strongly-concave problems.
Negative momentum works at least as generally and as fast as competing min-max algorithms on these standard test cases.
Momentum can be made to succeed in min-max settings by simply reversing its sign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sign-reversal idea might stabilize other first-order methods that currently diverge on min-max problems.
Practical tuning of momentum in adversarial training or game-solving algorithms could be revisited with negative values in mind.
Extensions to nonsmooth or nonconvex-nonconcave regimes remain open but could follow similar proof templates.

Load-bearing premise

The objective must be smooth and the momentum parameter must lie in a nonempty range fixed by the strong-convexity and smoothness constants.

What would settle it

A smooth convex-concave function on which the negative-momentum dynamics diverge for every parameter in the proposed range, or a smooth strongly-convex-strongly-concave function on which the observed convergence rate is not accelerated, would falsify the claims.

read the original abstract

This paper revisits momentum in the context of min-max optimization. Momentum is a celebrated mechanism for accelerating gradient dynamics in settings like convex minimization, but its direct use in min-max optimization makes gradient dynamics diverge. Surprisingly, Gidel et al. 2019 showed that negative momentum can help fix convergence. However, despite these promising initial results and progress since, the power of momentum remains unclear for min-max optimization in two key ways. (1) Generality: is global convergence possible for the foundational setting of convex-concave optimization? This is the direct analog of convex minimization and is a standard testing ground for min-max algorithms. (2) Fast convergence: is accelerated convergence possible for strongly-convex-strong-concave optimization (the only non-linear setting where global convergence is known)? Recent work has even argued that this is impossible. We answer both these questions in the affirmative. Together, these results put negative momentum on more equal footing with competitor algorithms, and show that negative momentum enables convergence significantly faster and more generally than was known possible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Negative momentum settles the open questions on global convergence for convex-concave min-max and acceleration for the strongly convex-concave case.

read the letter

The paper shows that negative momentum can achieve global convergence in the convex-concave setting and accelerated rates in the strongly convex-strongly concave setting. These were the two main open questions left after Gidel et al. 2019, and the authors give affirmative answers using the same basic mechanism but with a careful choice of the momentum parameter in a range that depends on the smoothness and strong-convexity constants. The work puts negative momentum on comparable footing with other min-max methods by removing the earlier restrictions on generality and speed. The derivations rest on standard convex analysis rather than new machinery, and the abstract indicates the results are stated with explicit rates. The main limitation is the usual one for these analyses: the functions must be smooth, and the momentum parameter must land in the derived interval; outside that interval or without smoothness the guarantees do not hold. That is not a flaw in the argument itself, just a standard scope condition. The paper is aimed at people working on optimization for games and adversarial training. It is worth sending to referees because it closes two concrete open questions with clean affirmative results and could affect how people design first-order methods in these domains.

Referee Report

1 major / 3 minor

Summary. The paper shows that negative momentum enables global convergence for convex-concave min-max optimization and accelerated convergence for strongly-convex-strongly-concave optimization, answering two open questions in the affirmative and placing negative momentum on more equal footing with other min-max algorithms.

Significance. If the derivations hold, the results are significant because they establish global convergence in the foundational convex-concave setting (the direct analog of convex minimization) and accelerated rates in the only non-linear setting where global convergence was previously known, countering recent arguments that acceleration is impossible. This extends the power of momentum methods beyond what was shown in Gidel et al. 2019.

major comments (1)

[Main results] Main convergence theorem for the SC-SC case (likely Theorem 3 or equivalent in the results section): the admissible range for the negative momentum parameter must be shown to be non-empty for arbitrary positive strong-convexity constant μ and smoothness constant L; without an explicit verification that the interval is always feasible, the acceleration claim is not guaranteed to hold under the stated assumptions.

minor comments (3)

[Abstract] Abstract: briefly state the specific convergence rates (e.g., the accelerated factor relative to gradient descent) to make the contribution more concrete for readers.
[Introduction] Introduction: the reference to 'recent work' arguing that acceleration is impossible should include an explicit citation (authors, title, year) rather than a generic description.
Notation: ensure the min-max objective and the momentum update are defined with consistent symbols (e.g., for the step-size and momentum parameter) across all theorems and proofs.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading, positive summary, and recommendation of minor revision. We address the major comment below.

read point-by-point responses

Referee: Main convergence theorem for the SC-SC case (likely Theorem 3 or equivalent in the results section): the admissible range for the negative momentum parameter must be shown to be non-empty for arbitrary positive strong-convexity constant μ and smoothness constant L; without an explicit verification that the interval is always feasible, the acceleration claim is not guaranteed to hold under the stated assumptions.

Authors: We thank the referee for this observation. The admissible interval for the negative momentum parameter β in the SC-SC theorem is derived from the requirement that the spectral radius of the iteration matrix (or the relevant quadratic form in the Lyapunov analysis) is strictly less than one. This yields an explicit open interval β ∈ (β_lower(μ, L, η), β_upper(μ, L, η)) for step-size η in a suitable range. We will add a short lemma (or remark immediately following the theorem) that explicitly verifies β_upper > β_lower for every μ > 0 and L ≥ μ. The verification follows directly from the positive discriminant of the resulting quadratic inequality and the fact that our step-size upper bound 2/(L + μ) keeps the expressions well-defined and ordered; the algebra is elementary but was previously left implicit. This addition makes the feasibility of the acceleration claim fully rigorous under the stated assumptions and does not change any of the main results or proofs. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claims rest on standard convex-concave analysis and Lyapunov-style arguments for negative momentum, with no load-bearing steps that reduce by construction to fitted parameters, self-definitions, or self-citation chains. The abstract and high-level structure invoke prior work (Gidel et al. 2019) only for motivation, not as the sole justification for the new global-convergence or acceleration results. Assumptions on smoothness and parameter ranges are explicit and falsifiable outside the fitted values, and no renaming of known results or ansatz smuggling is present. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard convexity, strong convexity, and smoothness assumptions that are not derived in the paper. No free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The objective function is convex-concave (or strongly convex-strongly concave) and L-smooth.
Invoked to guarantee existence of saddle points and to control the gradient Lipschitz constant in the convergence analysis.

pith-pipeline@v0.9.0 · 5484 in / 1166 out tokens · 39085 ms · 2026-05-10T05:59:18.099546+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

[1]

https://jasonaltschuler.github.io/NegativeMomentum

Negative momentum for convex-concave optimization – verification of identities computer algebra script. https://jasonaltschuler.github.io/NegativeMomentum

work page
[2]

A tight and unified analysis of gradient-based methods for a whole spectrum of differentiable games

Wa¨ ıss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, and Gauthier Gidel. A tight and unified analysis of gradient-based methods for a whole spectrum of differentiable games. InInternational Conference on Artificial Intelligence and Statistics, 2020

work page 2020
[3]

Springer, 2006

Saugata Basu, Richard Pollack, and Marie-Fran¸ coise Roy.Algorithms in real algebraic geometry. Springer, 2006

work page 2006
[4]

Princeton University Press, 2009

Aharon Ben-Tal, Arkadi Nemirovski, and Laurent El Ghaoui.Robust optimization. Princeton University Press, 2009. 3We recall here the relevant version of Descartes’ rule of signs (see e.g. [3, Theorem 2.33]). Letq(ζ)= ∑n k=0 ckζ k have real coefficients and real roots. Suppose that for somem≥0, the low-order coefficientsc k =0 for allk<m, and the high-order...

work page 2009
[5]

Numerical solution of saddle point problems.Acta Numerica, 14:1–137, 2005

Michele Benzi, Gene H Golub, and J¨ org Liesen. Numerical solution of saddle point problems.Acta Numerica, 14:1–137, 2005

work page 2005
[6]

Bertsekas.Constrained Optimization and Lagrange Multiplier Methods

Dimitri P. Bertsekas.Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 2014

work page 2014
[7]

arXiv preprint arXiv:2206.08303 , year=

Aleksandr Beznosikov, Aibek Alanov, Dmitry Kovalev, Martin Tak´ aˇ c, and Alexander Gasnikov. On scaled methods for saddle point problems.Preprint at arXiv:2206.08303, 2022

work page arXiv 2022
[8]

Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends® in Machine Learning, 3(1):1–122, 2011

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends® in Machine Learning, 3(1):1–122, 2011

work page 2011
[9]

Tight last-iterate convergence of the extragradient and the optimistic gradient descent-ascent algorithm for constrained monotone variational inequalities

Yang Cai, Argyris Oikonomou, and Weiqiang Zheng. Tight last-iterate convergence of the extragradient and the optimistic gradient descent-ascent algorithm for constrained monotone variational inequalities. Advances in Neural Information Processing Systems, 2022

work page 2022
[10]

The limit points of (optimistic) gradient descent in min-max optimization.Advances in Neural Information Processing Systems, 2018

Constantinos Daskalakis and Ioannis Panageas. The limit points of (optimistic) gradient descent in min-max optimization.Advances in Neural Information Processing Systems, 2018

work page 2018
[11]

Training GANs with optimism

Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. Training GANs with optimism. InInternational Conference on Learning Representations, 2018

work page 2018
[12]

Performance of first-order methods for smooth convex minimization: a novel approach.Mathematical Programming, 145(1):451–482, 2014

Yoel Drori and Marc Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach.Mathematical Programming, 145(1):451–482, 2014

work page 2014
[13]

Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021

Alexandre d’Aspremont, Damien Scieur, and Adrien Taylor. Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021

work page 2021
[14]

Rapid learning in constrained minimax games with negative momentum

Zijian Fang, Zongkai Liu, Chao Yu, and Chaohao Hu. Rapid learning in constrained minimax games with negative momentum. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 16541–16549, 2025

work page 2025
[15]

Continuous-time analysis of heavy ball momentum in min-max games

Yi Feng, Kaito Fujii, Stratis Skoulakis, Xiao Wang, and Volkan Cevher. Continuous-time analysis of heavy ball momentum in min-max games. InInternational Conference on Machine Learning, pages 16670–16710. PMLR, 2025

work page 2025
[16]

Negative momentum for improved game dynamics

Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, R´ emi Le Priol, Gabriel Huang, Si- mon Lacoste-Julien, and Ioannis Mitliagkas. Negative momentum for improved game dynamics. In International Conference on Artificial Intelligence and Statistics, 2019

work page 2019
[17]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in Neural Information Pro- cessing Systems, 2014

work page 2014
[18]

Extragradient method:O(1/k)last-iterate convergence for monotone variational inequalities and connections with cocoercivity

Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel. Extragradient method:O(1/k)last-iterate convergence for monotone variational inequalities and connections with cocoercivity. InInternational Conference on Artificial Intelligence and Statistics, 2022

work page 2022
[19]

Last-iterate convergence of optimistic gradient method for monotone variational inequalities.Advances in Neural Information Processing Systems, 35, 2022

Eduard Gorbunov, Adrien Taylor, and Gauthier Gidel. Last-iterate convergence of optimistic gradient method for monotone variational inequalities.Advances in Neural Information Processing Systems, 35, 2022

work page 2022
[20]

Provable non-accelerations of the heavy-ball method.Mathematical Programming, pages 1–59, 2025

Baptiste Goujaud, Adrien Taylor, and Aymeric Dieuleveut. Provable non-accelerations of the heavy-ball method.Mathematical Programming, pages 1–59, 2025

work page 2025
[21]

PID design by convex-concave optimization

Martin Hast, Karl Johan ˚Astr¨ om, Bo Bernhardsson, and Stephen Boyd. PID design by convex-concave optimization. InEuropean Control Conference, pages 4460–4465. IEEE, 2013. 16

work page 2013
[22]

The limits of min-max optimization algorithms: Convergence to spurious non-critical sets

Ya-Ping Hsieh, Panayotis Mertikopoulos, and Volkan Cevher. The limits of min-max optimization algorithms: Convergence to spurious non-critical sets. InInternational Conference on Machine Learning, pages 4337–4348. PMLR, 2021

work page 2021
[23]

The extragradient method for finding saddle points and other problems.Matecon, 12:747–756, 1976

Galina M Korpelevich. The extragradient method for finding saddle points and other problems.Matecon, 12:747–756, 1976

work page 1976
[24]

Strengthening the finite characterizations of smooth min-max games.arXiv preprint arXiv:2603.17053, 2026

Valery Krivchenko, Alexander Gasnikov, and Dmitry Kovalev. Strengthening the finite characterizations of smooth min-max games.arXiv preprint arXiv:2603.17053, 2026

work page internal anchor Pith review arXiv 2026
[25]

Convex-concave interpolation and application of pep to the bilinear-coupled saddle point problem.Russian Journal of Nonlinear Dynamics, 20(5):875–893, 2024

Valery O Krivchenko, Alexander V Gasnikov, and Dmitry A Kovalev. Convex-concave interpolation and application of pep to the bilinear-coupled saddle point problem.Russian Journal of Nonlinear Dynamics, 20(5):875–893, 2024

work page 2024
[26]

Fundamental benefit of alternating updates in minimax optimization.Preprint at arXiv:2402.10475, 2024

Jaewook Lee, Hanseul Cho, and Chulhee Yun. Fundamental benefit of alternating updates in minimax optimization.Preprint at arXiv:2402.10475, 2024

work page arXiv 2024
[27]

Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems.Advances in Neural Information Processing Systems, 34:22588–22600, 2021

Sucheol Lee and Donghwan Kim. Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems.Advances in Neural Information Processing Systems, 34:22588–22600, 2021

work page 2021
[28]

Analysis and design of optimization algorithms via integral quadratic constraints.SIAM Journal on Optimization, 26(1):57–95, 2016

Laurent Lessard, Benjamin Recht, and Andrew Packard. Analysis and design of optimization algorithms via integral quadratic constraints.SIAM Journal on Optimization, 26(1):57–95, 2016

work page 2016
[29]

Complex momentum for opti- mization in games

Jonathan P Lorraine, David Acuna, Paul Vicol, and David Duvenaud. Complex momentum for opti- mization in games. InInternational Conference on Artificial Intelligence and Statistics, pages 7742–7765. PMLR, 2022

work page 2022
[30]

To- wards deep learning models resistant to adversarial attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. To- wards deep learning models resistant to adversarial attacks. InInternational Conference on Learning Representations, 2018

work page 2018
[31]

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, and Georgios Piliouras. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile. InInternational Conference on Learning Representations, pages 1–23, 2019

work page 2019
[32]

A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach

Aryan Mokhtari, Asuman Ozdaglar, and Sarath Pattathil. A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach. InInternational Conference on Artificial Intelligence and Statistics, 2020

work page 2020
[33]

Convergence rate ofO(1/k)for optimistic gradient and extragradient methods in smooth convex-concave saddle point problems.SIAM Journal on Optimization, 30(4):3230–3251, 2020

Aryan Mokhtari, Asuman E Ozdaglar, and Sarath Pattathil. Convergence rate ofO(1/k)for optimistic gradient and extragradient methods in smooth convex-concave saddle point problems.SIAM Journal on Optimization, 30(4):3230–3251, 2020

work page 2020
[34]

A method of solving a convex programming problem with convergence rateO(1/k 2)

Yurii Nesterov. A method of solving a convex programming problem with convergence rateO(1/k 2). Doklady Akademii Nauk SSSR, 269(3):543–547, 1983

work page 1983
[35]

Some methods of speeding up the convergence of iteration methods.USSR Computa- tional Mathematics and Mathematical Physics, 4(5):1–17, 1964

Boris T Polyak. Some methods of speeding up the convergence of iteration methods.USSR Computa- tional Mathematics and Mathematical Physics, 4(5):1–17, 1964

work page 1964
[36]

L. D. Popov. A modification of the Arrow-Hurwicz method for search of saddle points.Mathematical notes of the Academy of Sciences of the USSR, 28(5):845–848, 1980. ISSN 1573-8876

work page 1980
[37]

Operator splitting per- formance estimation: Tight contraction factors and optimal parameter selection.SIAM Journal on Optimization, 30(3):2251–2271, 2020

Ernest K Ryu, Adrien B Taylor, Carolina Bergeling, and Pontus Giselsson. Operator splitting per- formance estimation: Tight contraction factors and optimal parameter selection.SIAM Journal on Optimization, 30(3):2251–2271, 2020

work page 2020
[38]

Min-max optimization is strictly easier than variational in- equalities.Preprint at arXiv:2511.03052, 2025

Henry Shugart and Jason M Altschuler. Min-max optimization is strictly easier than variational in- equalities.Preprint at arXiv:2511.03052, 2025. 17

work page arXiv 2025
[39]

Altschuler

Henry Shugart and Jason M. Altschuler. Negative Stepsizes Make Gradient-Descent-Ascent Converge. Preprint at arXiv:2505.01423, 2025

work page arXiv 2025
[40]

Lyapunov functions for first-order methods: Tight automated convergence guarantees

Adrien Taylor, Bryan Van Scoy, and Laurent Lessard. Lyapunov functions for first-order methods: Tight automated convergence guarantees. InInternational Conference on Machine Learning, pages 4897–4906. PMLR, 2018

work page 2018
[41]

Adrien B. Taylor. Towards principled and systematic approaches to the analysis and design of opti- mization algorithms.PSL Research University, 2024. Habilitation ` a diriger des recherches

work page 2024
[42]

Tran-Dinh and Y

Quoc Tran-Dinh and Yang Luo. Halpern-type accelerated and splitting algorithms for monotone inclu- sions.Preprint at arXiv:2110.08150, 2021

work page arXiv 2021
[43]

On linear convergence of iterative methods for the variational inequality problem.Journal of Computational and Applied Mathematics, 60(1):237–252, 1995

Paul Tseng. On linear convergence of iterative methods for the variational inequality problem.Journal of Computational and Applied Mathematics, 60(1):237–252, 1995

work page 1995
[44]

Automated tight Lyapunov analysis for first-order methods.Mathematical Programming, 209(1):133–170, 2025

Manu Upadhyaya, Sebastian Banert, Adrien B Taylor, and Pontus Giselsson. Automated tight Lyapunov analysis for first-order methods.Mathematical Programming, 209(1):133–170, 2025

work page 2025
[45]

Princeton University Press, Princeton, NJ, 1947

John von Neumann and Oskar Morgenstern.Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1947

work page 1947
[46]

Accelerated algorithms for smooth convex-concave minimax problems withO(1/k 2)rate on squared gradient norm

TaeHo Yoon and Ernest K Ryu. Accelerated algorithms for smooth convex-concave minimax problems withO(1/k 2)rate on squared gradient norm. InInternational Conference on Machine Learning, 2021

work page 2021
[47]

Convergence rate analysis of the gradient descent–ascent method for convex–concave saddle-point problems.Optimization Methods and Software, 39(5):967–989, 2024

Moslem Zamani, Hadi Abbaszadehpeivasti, and Etienne de Klerk. Convergence rate analysis of the gradient descent–ascent method for convex–concave saddle-point problems.Optimization Methods and Software, 39(5):967–989, 2024

work page 2024
[48]

On the suboptimality of negative momentum for minimax opti- mization

Guodong Zhang and Yuanhao Wang. On the suboptimality of negative momentum for minimax opti- mization. InInternational Conference on Artificial Intelligence and Statistics, pages 2098–2106, 2021

work page 2098
[49]

A unified analysis of first-order methods for smooth games via integral quadratic constraints.Journal of Machine Learning Research, 22(103):1–39, 2021

Guodong Zhang, Xuchan Bao, Laurent Lessard, and Roger Grosse. A unified analysis of first-order methods for smooth games via integral quadratic constraints.Journal of Machine Learning Research, 22(103):1–39, 2021

work page 2021
[50]

Guodong Zhang, Yuanhao Wang, Laurent Lessard, and Roger B. Grosse. Near-optimal local conver- gence of alternating gradient descent-ascent for minimax optimization. InInternational Conference on Artificial Intelligence and Statistics, 2022

work page 2022
[51]

On lower iteration complexity bounds for the convex concave saddle point problems.Mathematical Programming, 194(1–2):901–935, 2022

Junyu Zhang, Mingyi Hong, and Shuzhong Zhang. On lower iteration complexity bounds for the convex concave saddle point problems.Mathematical Programming, 194(1–2):901–935, 2022. 18

work page 2022

[1] [1]

https://jasonaltschuler.github.io/NegativeMomentum

Negative momentum for convex-concave optimization – verification of identities computer algebra script. https://jasonaltschuler.github.io/NegativeMomentum

work page

[2] [2]

A tight and unified analysis of gradient-based methods for a whole spectrum of differentiable games

Wa¨ ıss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, and Gauthier Gidel. A tight and unified analysis of gradient-based methods for a whole spectrum of differentiable games. InInternational Conference on Artificial Intelligence and Statistics, 2020

work page 2020

[3] [3]

Springer, 2006

Saugata Basu, Richard Pollack, and Marie-Fran¸ coise Roy.Algorithms in real algebraic geometry. Springer, 2006

work page 2006

[4] [4]

Princeton University Press, 2009

Aharon Ben-Tal, Arkadi Nemirovski, and Laurent El Ghaoui.Robust optimization. Princeton University Press, 2009. 3We recall here the relevant version of Descartes’ rule of signs (see e.g. [3, Theorem 2.33]). Letq(ζ)= ∑n k=0 ckζ k have real coefficients and real roots. Suppose that for somem≥0, the low-order coefficientsc k =0 for allk<m, and the high-order...

work page 2009

[5] [5]

Numerical solution of saddle point problems.Acta Numerica, 14:1–137, 2005

Michele Benzi, Gene H Golub, and J¨ org Liesen. Numerical solution of saddle point problems.Acta Numerica, 14:1–137, 2005

work page 2005

[6] [6]

Bertsekas.Constrained Optimization and Lagrange Multiplier Methods

Dimitri P. Bertsekas.Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 2014

work page 2014

[7] [7]

arXiv preprint arXiv:2206.08303 , year=

Aleksandr Beznosikov, Aibek Alanov, Dmitry Kovalev, Martin Tak´ aˇ c, and Alexander Gasnikov. On scaled methods for saddle point problems.Preprint at arXiv:2206.08303, 2022

work page arXiv 2022

[8] [8]

Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends® in Machine Learning, 3(1):1–122, 2011

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends® in Machine Learning, 3(1):1–122, 2011

work page 2011

[9] [9]

Tight last-iterate convergence of the extragradient and the optimistic gradient descent-ascent algorithm for constrained monotone variational inequalities

Yang Cai, Argyris Oikonomou, and Weiqiang Zheng. Tight last-iterate convergence of the extragradient and the optimistic gradient descent-ascent algorithm for constrained monotone variational inequalities. Advances in Neural Information Processing Systems, 2022

work page 2022

[10] [10]

The limit points of (optimistic) gradient descent in min-max optimization.Advances in Neural Information Processing Systems, 2018

Constantinos Daskalakis and Ioannis Panageas. The limit points of (optimistic) gradient descent in min-max optimization.Advances in Neural Information Processing Systems, 2018

work page 2018

[11] [11]

Training GANs with optimism

Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. Training GANs with optimism. InInternational Conference on Learning Representations, 2018

work page 2018

[12] [12]

Performance of first-order methods for smooth convex minimization: a novel approach.Mathematical Programming, 145(1):451–482, 2014

Yoel Drori and Marc Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach.Mathematical Programming, 145(1):451–482, 2014

work page 2014

[13] [13]

Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021

Alexandre d’Aspremont, Damien Scieur, and Adrien Taylor. Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021

work page 2021

[14] [14]

Rapid learning in constrained minimax games with negative momentum

Zijian Fang, Zongkai Liu, Chao Yu, and Chaohao Hu. Rapid learning in constrained minimax games with negative momentum. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 16541–16549, 2025

work page 2025

[15] [15]

Continuous-time analysis of heavy ball momentum in min-max games

Yi Feng, Kaito Fujii, Stratis Skoulakis, Xiao Wang, and Volkan Cevher. Continuous-time analysis of heavy ball momentum in min-max games. InInternational Conference on Machine Learning, pages 16670–16710. PMLR, 2025

work page 2025

[16] [16]

Negative momentum for improved game dynamics

Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, R´ emi Le Priol, Gabriel Huang, Si- mon Lacoste-Julien, and Ioannis Mitliagkas. Negative momentum for improved game dynamics. In International Conference on Artificial Intelligence and Statistics, 2019

work page 2019

[17] [17]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in Neural Information Pro- cessing Systems, 2014

work page 2014

[18] [18]

Extragradient method:O(1/k)last-iterate convergence for monotone variational inequalities and connections with cocoercivity

Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel. Extragradient method:O(1/k)last-iterate convergence for monotone variational inequalities and connections with cocoercivity. InInternational Conference on Artificial Intelligence and Statistics, 2022

work page 2022

[19] [19]

Last-iterate convergence of optimistic gradient method for monotone variational inequalities.Advances in Neural Information Processing Systems, 35, 2022

Eduard Gorbunov, Adrien Taylor, and Gauthier Gidel. Last-iterate convergence of optimistic gradient method for monotone variational inequalities.Advances in Neural Information Processing Systems, 35, 2022

work page 2022

[20] [20]

Provable non-accelerations of the heavy-ball method.Mathematical Programming, pages 1–59, 2025

Baptiste Goujaud, Adrien Taylor, and Aymeric Dieuleveut. Provable non-accelerations of the heavy-ball method.Mathematical Programming, pages 1–59, 2025

work page 2025

[21] [21]

PID design by convex-concave optimization

Martin Hast, Karl Johan ˚Astr¨ om, Bo Bernhardsson, and Stephen Boyd. PID design by convex-concave optimization. InEuropean Control Conference, pages 4460–4465. IEEE, 2013. 16

work page 2013

[22] [22]

The limits of min-max optimization algorithms: Convergence to spurious non-critical sets

Ya-Ping Hsieh, Panayotis Mertikopoulos, and Volkan Cevher. The limits of min-max optimization algorithms: Convergence to spurious non-critical sets. InInternational Conference on Machine Learning, pages 4337–4348. PMLR, 2021

work page 2021

[23] [23]

The extragradient method for finding saddle points and other problems.Matecon, 12:747–756, 1976

Galina M Korpelevich. The extragradient method for finding saddle points and other problems.Matecon, 12:747–756, 1976

work page 1976

[24] [24]

Strengthening the finite characterizations of smooth min-max games.arXiv preprint arXiv:2603.17053, 2026

Valery Krivchenko, Alexander Gasnikov, and Dmitry Kovalev. Strengthening the finite characterizations of smooth min-max games.arXiv preprint arXiv:2603.17053, 2026

work page internal anchor Pith review arXiv 2026

[25] [25]

Convex-concave interpolation and application of pep to the bilinear-coupled saddle point problem.Russian Journal of Nonlinear Dynamics, 20(5):875–893, 2024

Valery O Krivchenko, Alexander V Gasnikov, and Dmitry A Kovalev. Convex-concave interpolation and application of pep to the bilinear-coupled saddle point problem.Russian Journal of Nonlinear Dynamics, 20(5):875–893, 2024

work page 2024

[26] [26]

Fundamental benefit of alternating updates in minimax optimization.Preprint at arXiv:2402.10475, 2024

Jaewook Lee, Hanseul Cho, and Chulhee Yun. Fundamental benefit of alternating updates in minimax optimization.Preprint at arXiv:2402.10475, 2024

work page arXiv 2024

[27] [27]

Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems.Advances in Neural Information Processing Systems, 34:22588–22600, 2021

Sucheol Lee and Donghwan Kim. Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems.Advances in Neural Information Processing Systems, 34:22588–22600, 2021

work page 2021

[28] [28]

Analysis and design of optimization algorithms via integral quadratic constraints.SIAM Journal on Optimization, 26(1):57–95, 2016

Laurent Lessard, Benjamin Recht, and Andrew Packard. Analysis and design of optimization algorithms via integral quadratic constraints.SIAM Journal on Optimization, 26(1):57–95, 2016

work page 2016

[29] [29]

Complex momentum for opti- mization in games

Jonathan P Lorraine, David Acuna, Paul Vicol, and David Duvenaud. Complex momentum for opti- mization in games. InInternational Conference on Artificial Intelligence and Statistics, pages 7742–7765. PMLR, 2022

work page 2022

[30] [30]

To- wards deep learning models resistant to adversarial attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. To- wards deep learning models resistant to adversarial attacks. InInternational Conference on Learning Representations, 2018

work page 2018

[31] [31]

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, and Georgios Piliouras. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile. InInternational Conference on Learning Representations, pages 1–23, 2019

work page 2019

[32] [32]

A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach

Aryan Mokhtari, Asuman Ozdaglar, and Sarath Pattathil. A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach. InInternational Conference on Artificial Intelligence and Statistics, 2020

work page 2020

[33] [33]

Convergence rate ofO(1/k)for optimistic gradient and extragradient methods in smooth convex-concave saddle point problems.SIAM Journal on Optimization, 30(4):3230–3251, 2020

Aryan Mokhtari, Asuman E Ozdaglar, and Sarath Pattathil. Convergence rate ofO(1/k)for optimistic gradient and extragradient methods in smooth convex-concave saddle point problems.SIAM Journal on Optimization, 30(4):3230–3251, 2020

work page 2020

[34] [34]

A method of solving a convex programming problem with convergence rateO(1/k 2)

Yurii Nesterov. A method of solving a convex programming problem with convergence rateO(1/k 2). Doklady Akademii Nauk SSSR, 269(3):543–547, 1983

work page 1983

[35] [35]

Some methods of speeding up the convergence of iteration methods.USSR Computa- tional Mathematics and Mathematical Physics, 4(5):1–17, 1964

Boris T Polyak. Some methods of speeding up the convergence of iteration methods.USSR Computa- tional Mathematics and Mathematical Physics, 4(5):1–17, 1964

work page 1964

[36] [36]

L. D. Popov. A modification of the Arrow-Hurwicz method for search of saddle points.Mathematical notes of the Academy of Sciences of the USSR, 28(5):845–848, 1980. ISSN 1573-8876

work page 1980

[37] [37]

Operator splitting per- formance estimation: Tight contraction factors and optimal parameter selection.SIAM Journal on Optimization, 30(3):2251–2271, 2020

Ernest K Ryu, Adrien B Taylor, Carolina Bergeling, and Pontus Giselsson. Operator splitting per- formance estimation: Tight contraction factors and optimal parameter selection.SIAM Journal on Optimization, 30(3):2251–2271, 2020

work page 2020

[38] [38]

Min-max optimization is strictly easier than variational in- equalities.Preprint at arXiv:2511.03052, 2025

Henry Shugart and Jason M Altschuler. Min-max optimization is strictly easier than variational in- equalities.Preprint at arXiv:2511.03052, 2025. 17

work page arXiv 2025

[39] [39]

Altschuler

Henry Shugart and Jason M. Altschuler. Negative Stepsizes Make Gradient-Descent-Ascent Converge. Preprint at arXiv:2505.01423, 2025

work page arXiv 2025

[40] [40]

Lyapunov functions for first-order methods: Tight automated convergence guarantees

Adrien Taylor, Bryan Van Scoy, and Laurent Lessard. Lyapunov functions for first-order methods: Tight automated convergence guarantees. InInternational Conference on Machine Learning, pages 4897–4906. PMLR, 2018

work page 2018

[41] [41]

Adrien B. Taylor. Towards principled and systematic approaches to the analysis and design of opti- mization algorithms.PSL Research University, 2024. Habilitation ` a diriger des recherches

work page 2024

[42] [42]

Tran-Dinh and Y

Quoc Tran-Dinh and Yang Luo. Halpern-type accelerated and splitting algorithms for monotone inclu- sions.Preprint at arXiv:2110.08150, 2021

work page arXiv 2021

[43] [43]

On linear convergence of iterative methods for the variational inequality problem.Journal of Computational and Applied Mathematics, 60(1):237–252, 1995

Paul Tseng. On linear convergence of iterative methods for the variational inequality problem.Journal of Computational and Applied Mathematics, 60(1):237–252, 1995

work page 1995

[44] [44]

Automated tight Lyapunov analysis for first-order methods.Mathematical Programming, 209(1):133–170, 2025

Manu Upadhyaya, Sebastian Banert, Adrien B Taylor, and Pontus Giselsson. Automated tight Lyapunov analysis for first-order methods.Mathematical Programming, 209(1):133–170, 2025

work page 2025

[45] [45]

Princeton University Press, Princeton, NJ, 1947

John von Neumann and Oskar Morgenstern.Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1947

work page 1947

[46] [46]

Accelerated algorithms for smooth convex-concave minimax problems withO(1/k 2)rate on squared gradient norm

TaeHo Yoon and Ernest K Ryu. Accelerated algorithms for smooth convex-concave minimax problems withO(1/k 2)rate on squared gradient norm. InInternational Conference on Machine Learning, 2021

work page 2021

[47] [47]

Convergence rate analysis of the gradient descent–ascent method for convex–concave saddle-point problems.Optimization Methods and Software, 39(5):967–989, 2024

Moslem Zamani, Hadi Abbaszadehpeivasti, and Etienne de Klerk. Convergence rate analysis of the gradient descent–ascent method for convex–concave saddle-point problems.Optimization Methods and Software, 39(5):967–989, 2024

work page 2024

[48] [48]

On the suboptimality of negative momentum for minimax opti- mization

Guodong Zhang and Yuanhao Wang. On the suboptimality of negative momentum for minimax opti- mization. InInternational Conference on Artificial Intelligence and Statistics, pages 2098–2106, 2021

work page 2098

[49] [49]

A unified analysis of first-order methods for smooth games via integral quadratic constraints.Journal of Machine Learning Research, 22(103):1–39, 2021

Guodong Zhang, Xuchan Bao, Laurent Lessard, and Roger Grosse. A unified analysis of first-order methods for smooth games via integral quadratic constraints.Journal of Machine Learning Research, 22(103):1–39, 2021

work page 2021

[50] [50]

Guodong Zhang, Yuanhao Wang, Laurent Lessard, and Roger B. Grosse. Near-optimal local conver- gence of alternating gradient descent-ascent for minimax optimization. InInternational Conference on Artificial Intelligence and Statistics, 2022

work page 2022

[51] [51]

On lower iteration complexity bounds for the convex concave saddle point problems.Mathematical Programming, 194(1–2):901–935, 2022

Junyu Zhang, Mingyi Hong, and Shuzhong Zhang. On lower iteration complexity bounds for the convex concave saddle point problems.Mathematical Programming, 194(1–2):901–935, 2022. 18

work page 2022