arxiv: 2602.16517 · v2 · submitted 2026-02-18 · 🧮 math.OC

Recognition: 2 theorem links

· Lean Theorem

PL conditions do not guarantee convergence of gradient descent-ascent dynamics

Jean-Christophe Mourrat

Authors on Pith no claims yet

Pith reviewed 2026-05-15 21:16 UTC · model grok-4.3

classification 🧮 math.OC

keywords Polyak-Lojasiewicz conditiongradient descent-ascentsaddle pointconvergencecounterexamplemin-max optimization

0 comments

The pith

A function satisfying the two-sided Polyak-Lojasiewicz condition can have gradient descent-ascent trajectories that circle the saddle point without converging.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs an explicit function that meets the two-sided Polyak-Lojasiewicz condition. For this function the gradient descent-ascent dynamics generate a trajectory that orbits the saddle point indefinitely rather than approaching it. The same non-convergence persists even after making the function strongly convex in one variable and Polyak-Lojasiewicz in the other. A reader would care because the Polyak-Lojasiewicz inequality is frequently invoked to prove convergence in optimization settings, so the counterexample indicates that this inequality alone is not enough to control min-max flows.

Core claim

We give an example of a function satisfying a two-sided Polyak-Lojasiewicz condition but for which a gradient descent-ascent flow line fails to converge to the saddle point, circling around it instead. We can even impose the function to be strongly convex in one variable and to satisfy a PL condition in the other variable.

What carries the argument

The explicit counterexample function together with the ordinary differential equation that encodes its gradient descent-ascent dynamics.

If this is right

Convergence theorems for gradient descent-ascent that rely only on the Polyak-Lojasiewicz condition must be supplemented with further assumptions.
Non-convergence can arise through periodic orbits even when the Polyak-Lojasiewicz inequality holds in both variables.
Min-max optimization problems that assume Polyak-Lojasiewicz for convergence guarantees require additional structural hypotheses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In applications such as adversarial training one may need to inspect the dynamics separately for the presence of cycles.
Minimal extra conditions that restore convergence while preserving the Polyak-Lojasiewicz property could be identified.
Numerical discretizations of the flow on this example could be checked to see whether circling persists in discrete time.

Load-bearing premise

The constructed function satisfies the two-sided Polyak-Lojasiewicz inequality while its associated gradient flow produces a non-converging circular trajectory.

What would settle it

A direct verification that the given example function violates the two-sided Polyak-Lojasiewicz inequality or that its gradient descent-ascent solution actually converges to the saddle point.

Figures

Figures reproduced from arXiv: 2602.16517 by Jean-Christophe Mourrat.

**Figure 1.** Figure 1: The flow lines with a color scale from dark blue to yellow are the level lines of the function f we build for Theorem 1.1 (the color scale indicates the magnitude of v). The value of f is not shown and is prescribed along the two orange lines according to (3.2). These two lines are also the set of points at which the level line of f is horizontal or vertical (i.e., where ∂xf = 0 or ∂yf = 0). The red trajec… view at source ↗

read the original abstract

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Mourrat constructs an explicit function where two-sided PL holds but GDA dynamics circle the saddle instead of converging.

read the letter

The main point is that Mourrat constructs a function satisfying two-sided PL conditions for which the gradient descent-ascent flow does not converge to the saddle but instead circles around it. The example is explicit and even satisfies strong convexity in one variable and PL in the other. This directly shows that the two-sided PL inequality does not prevent periodic orbits in the dynamics. What the paper does well is deliver a clear negative result with a concrete function. It fills a gap by exhibiting behavior that standard convergence proofs would have ruled out under these assumptions. A point to verify is the global validity of the PL constants. The construction must ensure that the infimum of the PL ratio stays positive everywhere, including along the circling trajectory. If that holds, the counterexample is solid. The abstract suggests it does. This work is relevant for people working on theoretical guarantees for min-max problems. It indicates that additional assumptions beyond PL are needed for convergence. I would bring this to a reading group on optimization dynamics to discuss the implications for algorithm design. It deserves peer review as a precise counterexample that sharpens the literature. Recommendation: send it out for referees.

Referee Report

2 major / 2 minor

Summary. The paper constructs an explicit function f(x,y) that satisfies a two-sided Polyak-Łojasiewicz inequality with positive constants μ_x, μ_y > 0 (and is strongly convex in x while satisfying PL in y), yet the associated gradient descent-ascent ODE ẋ = −∇_x f, ẏ = +∇_y f possesses a non-convergent periodic orbit circling the saddle point.

Significance. If the construction and global PL verification hold, the result supplies a concrete counterexample showing that two-sided PL conditions are insufficient to guarantee convergence of GDA dynamics, even under additional strong-convexity/PL structure in each variable separately. This has direct implications for the design and analysis of first-order methods in min-max optimization, indicating that stronger assumptions (e.g., global monotonicity or curvature bounds) may be necessary.

major comments (2)

[§3] §3, construction of f and verification of two-sided PL: the manuscript must explicitly compute or bound inf (||∇_x f||² / (f−f*)) and inf (||∇_y f||² / (f−f*)) over the entire domain (or the invariant set containing the claimed orbit) and confirm both infima are strictly positive; the provided local analysis near the saddle is insufficient to rule out degeneration at large distances or along the orbit.
[§4, Eq. (12)] §4, Eq. (12) (periodic orbit): the invariance claim requires showing that the vector field is everywhere tangent to the proposed closed curve and that the speed remains bounded away from zero; without an explicit parametrization or Lyapunov-like argument, it is unclear whether the orbit is truly periodic rather than spiraling or escaping.

minor comments (2)

[Abstract] Abstract: the phrasing 'We can even impose the function to be strongly convex' is grammatically awkward; rephrase to 'We can even require the function to be strongly convex in one variable'.
[§2] Notation: the two-sided PL constants μ_x and μ_y are introduced without an explicit statement of the precise inequality form (e.g., whether it is ||∇_x f||² ≥ 2μ_x(f−f*) or the gradient-norm version); add a displayed equation in §2.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major comment below and have revised the manuscript to provide the requested global verifications and invariance arguments.

read point-by-point responses

Referee: [§3] §3, construction of f and verification of two-sided PL: the manuscript must explicitly compute or bound inf (||∇_x f||² / (f−f*)) and inf (||∇_y f||² / (f−f*)) over the entire domain (or the invariant set containing the claimed orbit) and confirm both infima are strictly positive; the provided local analysis near the saddle is insufficient to rule out degeneration at large distances or along the orbit.

Authors: We agree that local analysis near the saddle is insufficient and that global bounds are required. In the revised manuscript we explicitly compute the ratios ||∇_x f||²/(f−f*) and ||∇_y f||²/(f−f*) over the whole domain and on the invariant set containing the orbit. We derive closed-form lower bounds showing that both infima are strictly positive (equal to the claimed μ_x and μ_y). The calculations appear in the updated Section 3 together with a new appendix that handles the behavior at infinity. revision: yes
Referee: [§4, Eq. (12)] §4, Eq. (12) (periodic orbit): the invariance claim requires showing that the vector field is everywhere tangent to the proposed closed curve and that the speed remains bounded away from zero; without an explicit parametrization or Lyapunov-like argument, it is unclear whether the orbit is truly periodic rather than spiraling or escaping.

Authors: We accept that the original invariance argument was incomplete. The revised Section 4 now supplies an explicit parametrization γ(t) of the closed curve. We verify that the vector field (−∇_x f, ∇_y f) is everywhere tangent to γ by direct differentiation and show that the speed ||(−∇_x f, ∇_y f)(γ(t))|| is bounded below by a positive constant on the curve. A simple Lyapunov argument is added to confirm that the curve is invariant and that trajectories starting on it remain periodic with finite period. revision: yes

Circularity Check

0 steps flagged

Direct counterexample construction with no circularity

full rationale

The paper constructs an explicit function satisfying the two-sided PL inequality (with strong convexity in one variable and PL in the other) while exhibiting a non-convergent periodic orbit under GDA dynamics. No steps reduce predictions to inputs by construction, no fitted parameters are renamed as predictions, and no self-citation chain bears the central claim. The derivation is self-contained via direct verification of the stated properties on the constructed example.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim relies on the mathematical construction of a function satisfying PL conditions from both sides, with no additional free parameters or invented entities beyond standard analysis.

axioms (1)

standard math Existence and uniqueness of solutions to the gradient flow ODE.
Required to define the flow line that circles the saddle.

pith-pipeline@v0.9.0 · 5330 in / 1030 out tokens · 56840 ms · 2026-05-15T21:16:41.476448+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 1.1 ... two-sided PL condition ... GDA flow ... is periodic
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Proposition 2.1 ... f''(x)>0 whenever f'(x)=0

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Oscillating solutions to the mean-field Langevin descent-ascent flow
math.OC 2026-04 unverdicted novelty 7.0

For certain double-well payoff functions with strong coupling and small noise, the mean-field Langevin descent-ascent flow stays near a limit cycle and fails to converge.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · cited by 1 Pith paper

[1]

Princeton University Press, Princeton, NJ, 2009

Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski.Robust Optimization. Princeton University Press, Princeton, NJ, 2009

work page 2009
[2]

Numerical methods for finding saddle points.USSR Comp

Vladimir Fedorovich Dem’yanov and Aleksandr Borisovich Pevnyi. Numerical methods for finding saddle points.USSR Comp. Math. and Math. Phys., 12(5):11–52, 1972

work page 1972
[3]

Convergence rates of two-time-scale gradient descent-ascent dynamics for solving nonconvex min-max problems

Thinh Doan. Convergence rates of two-time-scale gradient descent-ascent dynamics for solving nonconvex min-max problems. InLearning for Dynamics and Control Conference, pages 192–206. PMLR, 2022. 12 JEAN-CHRISTOPHE MOURRAT

work page 2022
[4]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, pages 2672–2680, 2014

work page 2014
[5]

Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition

Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 795–811. Springer, 2016

work page 2016
[6]

Korpelevich

Galina M. Korpelevich. An extragradient method for finding saddle points and for other problems.Ekonom. i Mat. Metody, 12(4):747–756, 1976

work page 1976
[7]

On gradient descent ascent for nonconvex- concave minimax problems

Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex- concave minimax problems. InInternational Conference on Machine Learning, pages 6083–6093. PMLR, 2020

work page 2020
[8]

A topological property of real analytic subsets.Coll

Stanislaw Łojasiewicz. A topological property of real analytic subsets.Coll. du CNRS, Les équations aux dérivées partielles, 117(87-89):2, 1963

work page 1963
[9]

Hybrid block successive approximation for one-sided non-convex min-max problems: algorithms and applications.IEEE Transactions on Signal Processing, 68:3676–3691, 2020

Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, and Yongxin Chen. Hybrid block successive approximation for one-sided non-convex min-max problems: algorithms and applications.IEEE Transactions on Signal Processing, 68:3676–3691, 2020

work page 2020
[10]

Two-scale gradient descent ascent dynamics finds mixed nash equilibria of continuous games: A mean-field perspective

Yulong Lu. Two-scale gradient descent ascent dynamics finds mixed nash equilibria of continuous games: A mean-field perspective. InInternational Conference on Machine Learning, pages 22790–22811. PMLR, 2023

work page 2023
[11]

Provably convergent quasistatic dynamics for mean-field two-player zero-sum games

Chao Ma and Lexing Ying. Provably convergent quasistatic dynamics for mean-field two-player zero-sum games. InInternational Conference on Learning Representations, 2022

work page 2022
[12]

Solving a class of non-convex min-max games using iterative first order methods

Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D Lee, and Meisam Raza- viyayn. Solving a class of non-convex min-max games using iterative first order methods. InAdvances in Neural Information Processing Systems, volume 32, 2019

work page 2019
[13]

Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.J

Felix Otto and Cédric Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.J. Funct. Anal., 173(2):361–400, 2000

work page 2000
[14]

Gradient methods for minimizing functionals.Zh

Boris Teodorovich Polyak. Gradient methods for minimizing functionals.Zh. Vychisl. Mat. Mat. Fiz., 3(4):643–653, 1963

work page 1963
[15]

Local exponen- tial stability of mean-field Langevin descent-ascent in Wasserstein space

Geuntaek Seo, Minseop Shin, Pierre Monmarché, and Beomjun Choi. Local exponen- tial stability of mean-field Langevin descent-ascent in Wasserstein space. Preprint, arXiv:2602.01564

work page arXiv
[16]

Open problem: Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games

Guillaume Wang and Lénaïc Chizat. Open problem: Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games. InThe Thirty Seventh Annual Conference on Learning Theory, pages 5345–5350. PMLR, 2024

work page 2024
[17]

Local convergence of mean-field Langevin dy- namics: from gradient flows to linearly monotone games

Guillaume Wang and Lénaïc Chizat. Local convergence of mean-field Langevin dy- namics: from gradient flows to linearly monotone games. Preprint, arXiv:2602.11999

work page arXiv
[18]

Global convergence and variance reduc- tion for a class of nonconvex-nonconcave minimax problems

Junchi Yang, Negar Kiyavash, and Niao He. Global convergence and variance reduc- tion for a class of nonconvex-nonconcave minimax problems. InAdvances in Neural Information Processing Systems, volume 33, pages 1153–1165, 2020. (Jean-Christophe Mourrat)Department of Mathematics, ENS Lyon and CNRS, Lyon, France

work page 2020