Recognition: 2 theorem links
· Lean TheoremPL conditions do not guarantee convergence of gradient descent-ascent dynamics
Pith reviewed 2026-05-15 21:16 UTC · model grok-4.3
The pith
A function satisfying the two-sided Polyak-Lojasiewicz condition can have gradient descent-ascent trajectories that circle the saddle point without converging.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give an example of a function satisfying a two-sided Polyak-Lojasiewicz condition but for which a gradient descent-ascent flow line fails to converge to the saddle point, circling around it instead. We can even impose the function to be strongly convex in one variable and to satisfy a PL condition in the other variable.
What carries the argument
The explicit counterexample function together with the ordinary differential equation that encodes its gradient descent-ascent dynamics.
If this is right
- Convergence theorems for gradient descent-ascent that rely only on the Polyak-Lojasiewicz condition must be supplemented with further assumptions.
- Non-convergence can arise through periodic orbits even when the Polyak-Lojasiewicz inequality holds in both variables.
- Min-max optimization problems that assume Polyak-Lojasiewicz for convergence guarantees require additional structural hypotheses.
Where Pith is reading between the lines
- In applications such as adversarial training one may need to inspect the dynamics separately for the presence of cycles.
- Minimal extra conditions that restore convergence while preserving the Polyak-Lojasiewicz property could be identified.
- Numerical discretizations of the flow on this example could be checked to see whether circling persists in discrete time.
Load-bearing premise
The constructed function satisfies the two-sided Polyak-Lojasiewicz inequality while its associated gradient flow produces a non-converging circular trajectory.
What would settle it
A direct verification that the given example function violates the two-sided Polyak-Lojasiewicz inequality or that its gradient descent-ascent solution actually converges to the saddle point.
Figures
read the original abstract
We give an example of a function satisfying a two-sided Polyak-Lojasiewicz condition but for which a gradient descent-ascent flow line fails to converge to the saddle point, circling around it instead. We can even impose the function to be strongly convex in one variable and to satisfy a PL condition in the other variable.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper constructs an explicit function f(x,y) that satisfies a two-sided Polyak-Łojasiewicz inequality with positive constants μ_x, μ_y > 0 (and is strongly convex in x while satisfying PL in y), yet the associated gradient descent-ascent ODE ẋ = −∇_x f, ẏ = +∇_y f possesses a non-convergent periodic orbit circling the saddle point.
Significance. If the construction and global PL verification hold, the result supplies a concrete counterexample showing that two-sided PL conditions are insufficient to guarantee convergence of GDA dynamics, even under additional strong-convexity/PL structure in each variable separately. This has direct implications for the design and analysis of first-order methods in min-max optimization, indicating that stronger assumptions (e.g., global monotonicity or curvature bounds) may be necessary.
major comments (2)
- [§3] §3, construction of f and verification of two-sided PL: the manuscript must explicitly compute or bound inf (||∇_x f||² / (f−f*)) and inf (||∇_y f||² / (f−f*)) over the entire domain (or the invariant set containing the claimed orbit) and confirm both infima are strictly positive; the provided local analysis near the saddle is insufficient to rule out degeneration at large distances or along the orbit.
- [§4, Eq. (12)] §4, Eq. (12) (periodic orbit): the invariance claim requires showing that the vector field is everywhere tangent to the proposed closed curve and that the speed remains bounded away from zero; without an explicit parametrization or Lyapunov-like argument, it is unclear whether the orbit is truly periodic rather than spiraling or escaping.
minor comments (2)
- [Abstract] Abstract: the phrasing 'We can even impose the function to be strongly convex' is grammatically awkward; rephrase to 'We can even require the function to be strongly convex in one variable'.
- [§2] Notation: the two-sided PL constants μ_x and μ_y are introduced without an explicit statement of the precise inequality form (e.g., whether it is ||∇_x f||² ≥ 2μ_x(f−f*) or the gradient-norm version); add a displayed equation in §2.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments. We address each major comment below and have revised the manuscript to provide the requested global verifications and invariance arguments.
read point-by-point responses
-
Referee: [§3] §3, construction of f and verification of two-sided PL: the manuscript must explicitly compute or bound inf (||∇_x f||² / (f−f*)) and inf (||∇_y f||² / (f−f*)) over the entire domain (or the invariant set containing the claimed orbit) and confirm both infima are strictly positive; the provided local analysis near the saddle is insufficient to rule out degeneration at large distances or along the orbit.
Authors: We agree that local analysis near the saddle is insufficient and that global bounds are required. In the revised manuscript we explicitly compute the ratios ||∇_x f||²/(f−f*) and ||∇_y f||²/(f−f*) over the whole domain and on the invariant set containing the orbit. We derive closed-form lower bounds showing that both infima are strictly positive (equal to the claimed μ_x and μ_y). The calculations appear in the updated Section 3 together with a new appendix that handles the behavior at infinity. revision: yes
-
Referee: [§4, Eq. (12)] §4, Eq. (12) (periodic orbit): the invariance claim requires showing that the vector field is everywhere tangent to the proposed closed curve and that the speed remains bounded away from zero; without an explicit parametrization or Lyapunov-like argument, it is unclear whether the orbit is truly periodic rather than spiraling or escaping.
Authors: We accept that the original invariance argument was incomplete. The revised Section 4 now supplies an explicit parametrization γ(t) of the closed curve. We verify that the vector field (−∇_x f, ∇_y f) is everywhere tangent to γ by direct differentiation and show that the speed ||(−∇_x f, ∇_y f)(γ(t))|| is bounded below by a positive constant on the curve. A simple Lyapunov argument is added to confirm that the curve is invariant and that trajectories starting on it remain periodic with finite period. revision: yes
Circularity Check
Direct counterexample construction with no circularity
full rationale
The paper constructs an explicit function satisfying the two-sided PL inequality (with strong convexity in one variable and PL in the other) while exhibiting a non-convergent periodic orbit under GDA dynamics. No steps reduce predictions to inputs by construction, no fitted parameters are renamed as predictions, and no self-citation chain bears the central claim. The derivation is self-contained via direct verification of the stated properties on the constructed example.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Existence and uniqueness of solutions to the gradient flow ODE.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 1.1 ... two-sided PL condition ... GDA flow ... is periodic
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Proposition 2.1 ... f''(x)>0 whenever f'(x)=0
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Oscillating solutions to the mean-field Langevin descent-ascent flow
For certain double-well payoff functions with strong coupling and small noise, the mean-field Langevin descent-ascent flow stays near a limit cycle and fails to converge.
Reference graph
Works this paper leans on
-
[1]
Princeton University Press, Princeton, NJ, 2009
Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski.Robust Optimization. Princeton University Press, Princeton, NJ, 2009
work page 2009
-
[2]
Numerical methods for finding saddle points.USSR Comp
Vladimir Fedorovich Dem’yanov and Aleksandr Borisovich Pevnyi. Numerical methods for finding saddle points.USSR Comp. Math. and Math. Phys., 12(5):11–52, 1972
work page 1972
-
[3]
Thinh Doan. Convergence rates of two-time-scale gradient descent-ascent dynamics for solving nonconvex min-max problems. InLearning for Dynamics and Control Conference, pages 192–206. PMLR, 2022. 12 JEAN-CHRISTOPHE MOURRAT
work page 2022
-
[4]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, pages 2672–2680, 2014
work page 2014
-
[5]
Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition
Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 795–811. Springer, 2016
work page 2016
-
[6]
Galina M. Korpelevich. An extragradient method for finding saddle points and for other problems.Ekonom. i Mat. Metody, 12(4):747–756, 1976
work page 1976
-
[7]
On gradient descent ascent for nonconvex- concave minimax problems
Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex- concave minimax problems. InInternational Conference on Machine Learning, pages 6083–6093. PMLR, 2020
work page 2020
-
[8]
A topological property of real analytic subsets.Coll
Stanislaw Łojasiewicz. A topological property of real analytic subsets.Coll. du CNRS, Les équations aux dérivées partielles, 117(87-89):2, 1963
work page 1963
-
[9]
Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, and Yongxin Chen. Hybrid block successive approximation for one-sided non-convex min-max problems: algorithms and applications.IEEE Transactions on Signal Processing, 68:3676–3691, 2020
work page 2020
-
[10]
Yulong Lu. Two-scale gradient descent ascent dynamics finds mixed nash equilibria of continuous games: A mean-field perspective. InInternational Conference on Machine Learning, pages 22790–22811. PMLR, 2023
work page 2023
-
[11]
Provably convergent quasistatic dynamics for mean-field two-player zero-sum games
Chao Ma and Lexing Ying. Provably convergent quasistatic dynamics for mean-field two-player zero-sum games. InInternational Conference on Learning Representations, 2022
work page 2022
-
[12]
Solving a class of non-convex min-max games using iterative first order methods
Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D Lee, and Meisam Raza- viyayn. Solving a class of non-convex min-max games using iterative first order methods. InAdvances in Neural Information Processing Systems, volume 32, 2019
work page 2019
-
[13]
Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.J
Felix Otto and Cédric Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.J. Funct. Anal., 173(2):361–400, 2000
work page 2000
-
[14]
Gradient methods for minimizing functionals.Zh
Boris Teodorovich Polyak. Gradient methods for minimizing functionals.Zh. Vychisl. Mat. Mat. Fiz., 3(4):643–653, 1963
work page 1963
-
[15]
Local exponen- tial stability of mean-field Langevin descent-ascent in Wasserstein space
Geuntaek Seo, Minseop Shin, Pierre Monmarché, and Beomjun Choi. Local exponen- tial stability of mean-field Langevin descent-ascent in Wasserstein space. Preprint, arXiv:2602.01564
-
[16]
Guillaume Wang and Lénaïc Chizat. Open problem: Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games. InThe Thirty Seventh Annual Conference on Learning Theory, pages 5345–5350. PMLR, 2024
work page 2024
-
[17]
Local convergence of mean-field Langevin dy- namics: from gradient flows to linearly monotone games
Guillaume Wang and Lénaïc Chizat. Local convergence of mean-field Langevin dy- namics: from gradient flows to linearly monotone games. Preprint, arXiv:2602.11999
-
[18]
Global convergence and variance reduc- tion for a class of nonconvex-nonconcave minimax problems
Junchi Yang, Negar Kiyavash, and Niao He. Global convergence and variance reduc- tion for a class of nonconvex-nonconcave minimax problems. InAdvances in Neural Information Processing Systems, volume 33, pages 1153–1165, 2020. (Jean-Christophe Mourrat)Department of Mathematics, ENS Lyon and CNRS, Lyon, France
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.