pith. machine review for the scientific record. sign in

arxiv: 2602.16517 · v2 · submitted 2026-02-18 · 🧮 math.OC

Recognition: 2 theorem links

· Lean Theorem

PL conditions do not guarantee convergence of gradient descent-ascent dynamics

Authors on Pith no claims yet

Pith reviewed 2026-05-15 21:16 UTC · model grok-4.3

classification 🧮 math.OC
keywords Polyak-Lojasiewicz conditiongradient descent-ascentsaddle pointconvergencecounterexamplemin-max optimization
0
0 comments X

The pith

A function satisfying the two-sided Polyak-Lojasiewicz condition can have gradient descent-ascent trajectories that circle the saddle point without converging.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs an explicit function that meets the two-sided Polyak-Lojasiewicz condition. For this function the gradient descent-ascent dynamics generate a trajectory that orbits the saddle point indefinitely rather than approaching it. The same non-convergence persists even after making the function strongly convex in one variable and Polyak-Lojasiewicz in the other. A reader would care because the Polyak-Lojasiewicz inequality is frequently invoked to prove convergence in optimization settings, so the counterexample indicates that this inequality alone is not enough to control min-max flows.

Core claim

We give an example of a function satisfying a two-sided Polyak-Lojasiewicz condition but for which a gradient descent-ascent flow line fails to converge to the saddle point, circling around it instead. We can even impose the function to be strongly convex in one variable and to satisfy a PL condition in the other variable.

What carries the argument

The explicit counterexample function together with the ordinary differential equation that encodes its gradient descent-ascent dynamics.

If this is right

  • Convergence theorems for gradient descent-ascent that rely only on the Polyak-Lojasiewicz condition must be supplemented with further assumptions.
  • Non-convergence can arise through periodic orbits even when the Polyak-Lojasiewicz inequality holds in both variables.
  • Min-max optimization problems that assume Polyak-Lojasiewicz for convergence guarantees require additional structural hypotheses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In applications such as adversarial training one may need to inspect the dynamics separately for the presence of cycles.
  • Minimal extra conditions that restore convergence while preserving the Polyak-Lojasiewicz property could be identified.
  • Numerical discretizations of the flow on this example could be checked to see whether circling persists in discrete time.

Load-bearing premise

The constructed function satisfies the two-sided Polyak-Lojasiewicz inequality while its associated gradient flow produces a non-converging circular trajectory.

What would settle it

A direct verification that the given example function violates the two-sided Polyak-Lojasiewicz inequality or that its gradient descent-ascent solution actually converges to the saddle point.

Figures

Figures reproduced from arXiv: 2602.16517 by Jean-Christophe Mourrat.

Figure 1
Figure 1. Figure 1: The flow lines with a color scale from dark blue to yellow are the level lines of the function f we build for Theorem 1.1 (the color scale indicates the magnitude of v). The value of f is not shown and is prescribed along the two orange lines according to (3.2). These two lines are also the set of points at which the level line of f is horizontal or vertical (i.e., where ∂xf = 0 or ∂yf = 0). The red trajec… view at source ↗
read the original abstract

We give an example of a function satisfying a two-sided Polyak-Lojasiewicz condition but for which a gradient descent-ascent flow line fails to converge to the saddle point, circling around it instead. We can even impose the function to be strongly convex in one variable and to satisfy a PL condition in the other variable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper constructs an explicit function f(x,y) that satisfies a two-sided Polyak-Łojasiewicz inequality with positive constants μ_x, μ_y > 0 (and is strongly convex in x while satisfying PL in y), yet the associated gradient descent-ascent ODE ẋ = −∇_x f, ẏ = +∇_y f possesses a non-convergent periodic orbit circling the saddle point.

Significance. If the construction and global PL verification hold, the result supplies a concrete counterexample showing that two-sided PL conditions are insufficient to guarantee convergence of GDA dynamics, even under additional strong-convexity/PL structure in each variable separately. This has direct implications for the design and analysis of first-order methods in min-max optimization, indicating that stronger assumptions (e.g., global monotonicity or curvature bounds) may be necessary.

major comments (2)
  1. [§3] §3, construction of f and verification of two-sided PL: the manuscript must explicitly compute or bound inf (||∇_x f||² / (f−f*)) and inf (||∇_y f||² / (f−f*)) over the entire domain (or the invariant set containing the claimed orbit) and confirm both infima are strictly positive; the provided local analysis near the saddle is insufficient to rule out degeneration at large distances or along the orbit.
  2. [§4, Eq. (12)] §4, Eq. (12) (periodic orbit): the invariance claim requires showing that the vector field is everywhere tangent to the proposed closed curve and that the speed remains bounded away from zero; without an explicit parametrization or Lyapunov-like argument, it is unclear whether the orbit is truly periodic rather than spiraling or escaping.
minor comments (2)
  1. [Abstract] Abstract: the phrasing 'We can even impose the function to be strongly convex' is grammatically awkward; rephrase to 'We can even require the function to be strongly convex in one variable'.
  2. [§2] Notation: the two-sided PL constants μ_x and μ_y are introduced without an explicit statement of the precise inequality form (e.g., whether it is ||∇_x f||² ≥ 2μ_x(f−f*) or the gradient-norm version); add a displayed equation in §2.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major comment below and have revised the manuscript to provide the requested global verifications and invariance arguments.

read point-by-point responses
  1. Referee: [§3] §3, construction of f and verification of two-sided PL: the manuscript must explicitly compute or bound inf (||∇_x f||² / (f−f*)) and inf (||∇_y f||² / (f−f*)) over the entire domain (or the invariant set containing the claimed orbit) and confirm both infima are strictly positive; the provided local analysis near the saddle is insufficient to rule out degeneration at large distances or along the orbit.

    Authors: We agree that local analysis near the saddle is insufficient and that global bounds are required. In the revised manuscript we explicitly compute the ratios ||∇_x f||²/(f−f*) and ||∇_y f||²/(f−f*) over the whole domain and on the invariant set containing the orbit. We derive closed-form lower bounds showing that both infima are strictly positive (equal to the claimed μ_x and μ_y). The calculations appear in the updated Section 3 together with a new appendix that handles the behavior at infinity. revision: yes

  2. Referee: [§4, Eq. (12)] §4, Eq. (12) (periodic orbit): the invariance claim requires showing that the vector field is everywhere tangent to the proposed closed curve and that the speed remains bounded away from zero; without an explicit parametrization or Lyapunov-like argument, it is unclear whether the orbit is truly periodic rather than spiraling or escaping.

    Authors: We accept that the original invariance argument was incomplete. The revised Section 4 now supplies an explicit parametrization γ(t) of the closed curve. We verify that the vector field (−∇_x f, ∇_y f) is everywhere tangent to γ by direct differentiation and show that the speed ||(−∇_x f, ∇_y f)(γ(t))|| is bounded below by a positive constant on the curve. A simple Lyapunov argument is added to confirm that the curve is invariant and that trajectories starting on it remain periodic with finite period. revision: yes

Circularity Check

0 steps flagged

Direct counterexample construction with no circularity

full rationale

The paper constructs an explicit function satisfying the two-sided PL inequality (with strong convexity in one variable and PL in the other) while exhibiting a non-convergent periodic orbit under GDA dynamics. No steps reduce predictions to inputs by construction, no fitted parameters are renamed as predictions, and no self-citation chain bears the central claim. The derivation is self-contained via direct verification of the stated properties on the constructed example.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim relies on the mathematical construction of a function satisfying PL conditions from both sides, with no additional free parameters or invented entities beyond standard analysis.

axioms (1)
  • standard math Existence and uniqueness of solutions to the gradient flow ODE.
    Required to define the flow line that circles the saddle.

pith-pipeline@v0.9.0 · 5330 in / 1030 out tokens · 56840 ms · 2026-05-15T21:16:41.476448+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Oscillating solutions to the mean-field Langevin descent-ascent flow

    math.OC 2026-04 unverdicted novelty 7.0

    For certain double-well payoff functions with strong coupling and small noise, the mean-field Langevin descent-ascent flow stays near a limit cycle and fails to converge.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · cited by 1 Pith paper

  1. [1]

    Princeton University Press, Princeton, NJ, 2009

    Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski.Robust Optimization. Princeton University Press, Princeton, NJ, 2009

  2. [2]

    Numerical methods for finding saddle points.USSR Comp

    Vladimir Fedorovich Dem’yanov and Aleksandr Borisovich Pevnyi. Numerical methods for finding saddle points.USSR Comp. Math. and Math. Phys., 12(5):11–52, 1972

  3. [3]

    Convergence rates of two-time-scale gradient descent-ascent dynamics for solving nonconvex min-max problems

    Thinh Doan. Convergence rates of two-time-scale gradient descent-ascent dynamics for solving nonconvex min-max problems. InLearning for Dynamics and Control Conference, pages 192–206. PMLR, 2022. 12 JEAN-CHRISTOPHE MOURRAT

  4. [4]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, pages 2672–2680, 2014

  5. [5]

    Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition

    Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 795–811. Springer, 2016

  6. [6]

    Korpelevich

    Galina M. Korpelevich. An extragradient method for finding saddle points and for other problems.Ekonom. i Mat. Metody, 12(4):747–756, 1976

  7. [7]

    On gradient descent ascent for nonconvex- concave minimax problems

    Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex- concave minimax problems. InInternational Conference on Machine Learning, pages 6083–6093. PMLR, 2020

  8. [8]

    A topological property of real analytic subsets.Coll

    Stanislaw Łojasiewicz. A topological property of real analytic subsets.Coll. du CNRS, Les équations aux dérivées partielles, 117(87-89):2, 1963

  9. [9]

    Hybrid block successive approximation for one-sided non-convex min-max problems: algorithms and applications.IEEE Transactions on Signal Processing, 68:3676–3691, 2020

    Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, and Yongxin Chen. Hybrid block successive approximation for one-sided non-convex min-max problems: algorithms and applications.IEEE Transactions on Signal Processing, 68:3676–3691, 2020

  10. [10]

    Two-scale gradient descent ascent dynamics finds mixed nash equilibria of continuous games: A mean-field perspective

    Yulong Lu. Two-scale gradient descent ascent dynamics finds mixed nash equilibria of continuous games: A mean-field perspective. InInternational Conference on Machine Learning, pages 22790–22811. PMLR, 2023

  11. [11]

    Provably convergent quasistatic dynamics for mean-field two-player zero-sum games

    Chao Ma and Lexing Ying. Provably convergent quasistatic dynamics for mean-field two-player zero-sum games. InInternational Conference on Learning Representations, 2022

  12. [12]

    Solving a class of non-convex min-max games using iterative first order methods

    Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D Lee, and Meisam Raza- viyayn. Solving a class of non-convex min-max games using iterative first order methods. InAdvances in Neural Information Processing Systems, volume 32, 2019

  13. [13]

    Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.J

    Felix Otto and Cédric Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.J. Funct. Anal., 173(2):361–400, 2000

  14. [14]

    Gradient methods for minimizing functionals.Zh

    Boris Teodorovich Polyak. Gradient methods for minimizing functionals.Zh. Vychisl. Mat. Mat. Fiz., 3(4):643–653, 1963

  15. [15]

    Local exponen- tial stability of mean-field Langevin descent-ascent in Wasserstein space

    Geuntaek Seo, Minseop Shin, Pierre Monmarché, and Beomjun Choi. Local exponen- tial stability of mean-field Langevin descent-ascent in Wasserstein space. Preprint, arXiv:2602.01564

  16. [16]

    Open problem: Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games

    Guillaume Wang and Lénaïc Chizat. Open problem: Convergence of single-timescale mean-field Langevin descent-ascent for two-player zero-sum games. InThe Thirty Seventh Annual Conference on Learning Theory, pages 5345–5350. PMLR, 2024

  17. [17]

    Local convergence of mean-field Langevin dy- namics: from gradient flows to linearly monotone games

    Guillaume Wang and Lénaïc Chizat. Local convergence of mean-field Langevin dy- namics: from gradient flows to linearly monotone games. Preprint, arXiv:2602.11999

  18. [18]

    Global convergence and variance reduc- tion for a class of nonconvex-nonconcave minimax problems

    Junchi Yang, Negar Kiyavash, and Niao He. Global convergence and variance reduc- tion for a class of nonconvex-nonconcave minimax problems. InAdvances in Neural Information Processing Systems, volume 33, pages 1153–1165, 2020. (Jean-Christophe Mourrat)Department of Mathematics, ENS Lyon and CNRS, Lyon, France