pith. sign in

arxiv: 2507.01932 · v2 · pith:56C3FMTKnew · submitted 2025-07-02 · 🧮 math.OC · cs.LG· cs.NA· math.NA· stat.ML

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

Pith reviewed 2026-05-21 23:30 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.NAmath.NAstat.ML
keywords nonconvex-nonconcave minimaxlocal Kurdyka-Lojasiewicz conditionmaximal functiongeneralized Hölder smoothinexact proximal gradientcomplexity analysisfirst-order method
0
0 comments X

The pith

Local KL condition on the inner maximization makes the maximal function locally generalized Hölder smooth and supports an inexact proximal gradient method with complexity guarantees for nonconvex-nonconcave minimax problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines nonconvex-nonconcave minimax problems in which the inner maximization satisfies a local Kurdyka-Lojasiewicz condition that can change depending on the outer minimization variable. This assumption is weaker than the global KL or Polyak-Lojasiewicz conditions often used elsewhere, allowing the method to cover more practical cases while still permitting analysis. The authors prove that the maximal function is locally generalized Hölder smooth under this condition. They then introduce an inexact proximal gradient algorithm that obtains the required inexact gradient by running a proximal gradient method on a KL-structured subproblem. Under mild assumptions the algorithm is shown to reach an approximate stationary point of the minimax problem with explicit iteration complexity bounds.

Core claim

Under a local Kurdyka-Lojasiewicz condition on the inner maximization that may vary with the outer variable, the associated maximal function is locally generalized Hölder smooth; this property allows an inexact proximal gradient method, whose inexact gradient is obtained by applying a proximal gradient method to a KL-structured subproblem, to compute an approximate stationary point with provable complexity guarantees.

What carries the argument

The local Kurdyka-Lojasiewicz condition on the inner problem, which yields local generalized Hölder smoothness of the maximal function and enables the inexact proximal gradient scheme.

If this is right

  • Complexity bounds hold for computing approximate stationary points without requiring global smoothness or global KL/PL assumptions.
  • The method remains applicable when the region of the KL condition shrinks as the algorithm approaches a stationary point.
  • Inexact gradients are obtained by solving KL-structured subproblems with an inner proximal gradient procedure.
  • Mild assumptions suffice for the overall convergence guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same local smoothness property could be used to design accelerated variants or variance-reduced extensions of the method.
  • Similar local KL assumptions might be checked in applications such as robust optimization or adversarial training to justify first-order methods there.
  • The shrinking KL region suggests that adaptive step-size rules could be derived directly from the local Hölder exponent.
  • Testing the method on problems where global PL fails but local KL holds would provide a practical check of the claimed advantage.

Load-bearing premise

The inner maximization problem satisfies a local Kurdyka-Lojasiewicz condition that may vary with the outer minimization variable.

What would settle it

A concrete counter-example in which the inner problem obeys a local KL condition yet the maximal function fails to be locally generalized Hölder smooth, or an instance where the proposed inexact proximal gradient method does not achieve the stated complexity for reaching an approximate stationary point.

read the original abstract

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally generalized H\"older smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies nonconvex-nonconcave minimax problems min_x max_y f(x,y) in which the inner maximization satisfies a local Kurdyka-Łojasiewicz inequality whose neighborhood may depend on and shrink with the outer variable x. From this assumption the authors derive that the maximal function φ(x) = max_y f(x,y) is locally generalized Hölder smooth, then construct an inexact proximal-gradient algorithm on φ whose inexactness is realized by running a proximal-gradient method on the inner KL-structured subproblem, and prove complexity bounds for reaching an approximate stationary point of the original minimax problem.

Significance. If the technical claims hold, the work supplies the first complexity guarantees for first-order methods on a meaningfully broader class of minimax problems than those covered by global KL or PL assumptions. The reduction of inexactness control to an inner proximal-gradient routine on a KL subproblem is a clean algorithmic idea that could be reused in other settings with structured inner problems.

major comments (2)
  1. [§3.2, Theorem 3.3] §3.2, Theorem 3.3 (local generalized Hölder smoothness of φ): the proof invokes a KL neighborhood N(x) whose radius r(x) is permitted to approach zero as x approaches the stationary set of φ. The subsequent complexity argument in §4 requires the generalized Hölder modulus and the radius of validity to remain uniform along the entire proximal-gradient trajectory on φ; no explicit uniformity assumption on {N(x)} or lower bound on r(x) is stated, so the standard descent lemma and step-size selection may fail to close.
  2. [§4.1, Theorem 4.2] §4.1, Algorithm 1 and Theorem 4.2 (complexity bound): the inexact gradient oracle for φ is controlled by running a fixed number of inner proximal-gradient steps whose accuracy depends on the local KL parameters at the current x. If the KL neighborhood shrinks faster than the outer step-size schedule, the accumulated inexactness may exceed the tolerance required by the outer convergence analysis; the proof does not quantify this dependence.
minor comments (2)
  1. [Assumption 2.3] The statement of Assumption 2.3 (local KL) should explicitly record whether the desingularizing function θ and the exponent α are allowed to depend on x or are required to be uniform in a neighborhood of the limit point.
  2. [Definition 3.1 and Theorem 4.2] Notation for the generalized Hölder constant L_α(x) is introduced in Definition 3.1 but never appears in the complexity statement of Theorem 4.2; clarifying whether the final bound depends on sup L_α(x) along the trajectory would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting two important technical points regarding uniformity of the local KL neighborhoods and control of the accumulated inexactness. Both comments identify places where the current argument relies on implicit uniformity that is not explicitly stated. We will revise the manuscript to close these gaps while preserving the local nature of the KL assumption.

read point-by-point responses
  1. Referee: [§3.2, Theorem 3.3] §3.2, Theorem 3.3 (local generalized Hölder smoothness of φ): the proof invokes a KL neighborhood N(x) whose radius r(x) is permitted to approach zero as x approaches the stationary set of φ. The subsequent complexity argument in §4 requires the generalized Hölder modulus and the radius of validity to remain uniform along the entire proximal-gradient trajectory on φ; no explicit uniformity assumption on {N(x)} or lower bound on r(x) is stated, so the standard descent lemma and step-size selection may fail to close.

    Authors: We agree that the current statement of Theorem 3.3 permits r(x) to shrink and does not explicitly guarantee uniformity along the outer trajectory. In the revised version we will add Assumption 3.4, which requires that there exists a neighborhood U of the stationary set S and a uniform radius r_0 > 0 such that for every x in U the KL inequality holds on the ball B(x, r_0). Under this mild strengthening (still strictly weaker than a global KL condition) the generalized Hölder smoothness modulus and the radius of validity become uniform on the compact level sets visited by the proximal-gradient sequence, allowing the standard descent lemma and step-size choice in §4 to go through without modification. revision: yes

  2. Referee: [§4.1, Theorem 4.2] §4.1, Algorithm 1 and Theorem 4.2 (complexity bound): the inexact gradient oracle for φ is controlled by running a fixed number of inner proximal-gradient steps whose accuracy depends on the local KL parameters at the current x. If the KL neighborhood shrinks faster than the outer step-size schedule, the accumulated inexactness may exceed the tolerance required by the outer convergence analysis; the proof does not quantify this dependence.

    Authors: The referee correctly notes that the current proof of Theorem 4.2 treats the inner iteration count as fixed and does not explicitly bound the dependence on a possibly shrinking r(x). With the new uniform-radius Assumption 3.4 introduced above, the local KL parameters (including the exponent and the constant) become uniform along the trajectory. Consequently the inner proximal-gradient routine requires only a uniform number of steps to achieve the required inexactness tolerance at every outer iteration. We will insert a short lemma (Lemma 4.3) that quantifies this uniform inner complexity and update the overall iteration bound in Theorem 4.2 accordingly. The dependence on the local KL parameters is now explicit and controlled. revision: yes

Circularity Check

0 steps flagged

Derivation from local KL assumption to generalized Hölder smoothness of maximal function is independent and non-circular

full rationale

The paper starts from the stated local KL condition on the inner maximization (which may vary with the outer variable x) and derives that the maximal function phi(x) is locally generalized Hölder smooth. This property is then used to construct an inexact proximal gradient method whose inexactness is controlled by running proximal gradient on the inner KL-structured subproblem. Complexity bounds for approximate stationary points follow from standard arguments under the derived smoothness. No step reduces a claimed prediction or rate to a fitted parameter, self-referential normalization, or load-bearing self-citation whose validity depends on the present paper. The derivation chain is self-contained against the external benchmark of the local KL assumption and does not invoke uniqueness theorems or ansatzes from the authors' prior work in a circular manner.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of a local, outer-dependent KL condition and on the derived local generalized Hölder smoothness of the maximal function; no free parameters or new invented entities are introduced.

axioms (1)
  • domain assumption The inner maximization problem satisfies a local Kurdyka-Lojasiewicz condition that may vary with the outer minimization variable.
    This is the key structural assumption stated in the abstract that replaces stronger global conditions and enables the subsequent analysis.

pith-pipeline@v0.9.0 · 5741 in / 1317 out tokens · 42024 ms · 2026-05-21T23:30:34.003267+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Nonsmooth Nonconvex-Concave Minimax Optimization: Convergence Criteria and Algorithms

    math.OC 2026-04 unverdicted novelty 6.0

    The authors introduce (ηx,ηy,δ,ε)-GSSP as a convergence criterion and develop projected gradient-free descent-ascent methods achieving non-asymptotic rates for nonsmooth nonconvex-concave minimax optimization without ...

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Arjovsky, S

    M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In Interna- tional Conference on Machine Learning , pages 214–223, 2017

  2. [2]

    Attouch, J

    H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical programming, 137(1):91–129, 2013

  3. [3]

    Bertsimas, D

    D. Bertsimas, D. B. Brown, and C. Caramanis. Theory and applications of robust optimization. SIAM review, 53(3):464–501, 2011

  4. [4]

    Blanchet, J

    J. Blanchet, J. Li, S. Lin, and X. Zhang. Distributionally robust optimization and robust statistics. arXiv preprint arXiv:2401.14655 , 2024

  5. [5]

    A. B¨ ohm. Solving nonconvex-nonconcave min-max problems exhibiting weak Minty solutions.arXiv preprint arXiv:2201.12247, 2022

  6. [6]

    Bolte, A

    J. Bolte, A. Daniilidis, O. Ley, and L. Mazet. Characterizations of Lojasiewicz inequalities: subgradi- ent flows, talweg, convexity. Transactions of the American Mathematical Society , 362(6):3319–3363, 2010. 24

  7. [7]

    Cai and W

    Y. Cai and W. Zheng. Accelerated single-call methods for constrained min-max optimization. arXiv preprint arXiv:2210.03096, 2022

  8. [8]

    F. H. Clarke. Generalized gradients and applications. Transactions of the American Mathematical Society, 205:247–262, 1975

  9. [9]

    F. H. Clarke. Optimization and nonsmooth analysis . SIAM, 1990

  10. [10]

    B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song. SBEED: Convergent reinforcement learning with nonlinear function approximation. In International Conference on Machine Learning, pages 1125–1134, 2018

  11. [11]

    Drusvyatskiy, A

    D. Drusvyatskiy, A. D. Ioffe, and A. S. Lewis. Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Mathematical Programming, 185:357–383, 2021

  12. [12]

    Frankel, G

    P. Frankel, G. Garrigos, and J. Peypouquet. Splitting methods with variable metric for Kurdyka– Lojasiewicz functions and general convergence rates. Journal of Optimization Theory and Applica- tions, 165:874–900, 2015

  13. [13]

    Goodfellow, J

    I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. Communications of the ACM , 63(11):139–144, 2020

  14. [14]

    F. Huang. Enhanced adaptive gradient algorithms for nonconvex-PL minimax optimization. arXiv preprint arXiv:2303.03984, 2023

  15. [15]

    C. Jin, P. Netrapalli, and M. I. Jordan. Minmax optimization: Stable limit points of gradient descent ascent are locally optimal. arXiv preprint arXiv:1902.00618 , 2019

  16. [16]

    Karimi, J

    H. Karimi, J. Nutini, and M. Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak- Lojasiewicz condition. In Joint European conference on machine learning and knowledge discovery in databases , pages 795–811, 2016

  17. [17]

    Li and T

    G. Li and T. K. Pong. Calculus of the exponent of Kurdyka– Lojasiewicz inequality and its applications to linear convergence of first-order methods. Foundations of computational mathematics, 18(5):1199–1232, 2018

  18. [18]

    J. Li, L. Zhu, and A. M.-C. So. Nonsmooth nonconvex–nonconcave minimax optimization: Primal– dual balancing and iteration complexity analysis. Mathematical Programming, pages 1–51, 2025

  19. [19]

    M. Liu, H. Rafique, Q. Lin, and T. Yang. First-order convergence theory for weakly-convex-weakly- concave min-max problems. Journal of Machine Learning Research , 22(169):1–34, 2021

  20. [20]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 , 2017

  21. [21]

    Nesterov

    Y. Nesterov. Universal gradient methods for convex optimization problems. Mathematical Pro- gramming, 152(1):381–404, 2015

  22. [22]

    Nouiehed, M

    M. Nouiehed, M. Sanjabi, T. Huang, J. D. Lee, and M. Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. Advances in Neural Information Processing Systems, 32, 2019

  23. [23]

    Omidshafiei, J

    S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian. Deep decentralized multi-task multi- agent reinforcement learning under partial observability. In International Conference on Machine Learning, pages 2681–2690, 2017. 25

  24. [24]

    Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems

    T. Pethick, P. Latafat, P. Patrinos, O. Fercoq, and V. Cevher. Escaping limit cycles: Global conver- gence for constrained nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2302.09831 , 2023

  25. [25]

    Rahimian and S

    H. Rahimian and S. Mehrotra. Frameworks and results in distributionally robust optimization. Open Journal of Mathematical Optimization , 3:1–85, 2022

  26. [26]

    Certifying some distributional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

    A. Sinha, H. Namkoong, R. Volpi, and J. Duchi. Certifying some distributional robustness with principled adversarial training. arXiv preprint arXiv:1710.10571 , 2017

  27. [27]

    E. M. Stein and R. Shakarchi. Real analysis: measure theory, integration, and Hilbert spaces . Princeton University Press, 2009

  28. [28]

    Xu, Z.-Q

    Z. Xu, Z.-Q. Wang, J.-L. Wang, and Y.-H. Dai. Zeroth-order alternating gradient descent ascent algorithms for a class of nonconvex-nonconcave minimax problems. Journal of Machine Learning Research, 24(313):1–25, 2023

  29. [29]

    J. Yang, A. Orvieto, A. Lucchi, and N. He. Faster single-loop algorithms for minimax optimization without strong concavity. In International Conference on Artificial Intelligence and Statistics , pages 5485–5517, 2022

  30. [30]

    Zheng, A

    T. Zheng, A. M.-C. So, and J. Li. Doubly smoothed optimistic gradients: A universal approach for smooth minimax problems. arXiv preprint arXiv:2506.07397 , 2025

  31. [31]

    Zheng, L

    T. Zheng, L. Zhu, A. M.-C. So, J. Blanchet, and J. Li. Universal gradient descent ascent method for nonconvex-nonconcave minimax optimization. Advances in Neural Information Processing Systems , 36:54075–54110, 2023. 26