A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition
Pith reviewed 2026-05-21 23:30 UTC · model grok-4.3
The pith
Local KL condition on the inner maximization makes the maximal function locally generalized Hölder smooth and supports an inexact proximal gradient method with complexity guarantees for nonconvex-nonconcave minimax problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under a local Kurdyka-Lojasiewicz condition on the inner maximization that may vary with the outer variable, the associated maximal function is locally generalized Hölder smooth; this property allows an inexact proximal gradient method, whose inexact gradient is obtained by applying a proximal gradient method to a KL-structured subproblem, to compute an approximate stationary point with provable complexity guarantees.
What carries the argument
The local Kurdyka-Lojasiewicz condition on the inner problem, which yields local generalized Hölder smoothness of the maximal function and enables the inexact proximal gradient scheme.
If this is right
- Complexity bounds hold for computing approximate stationary points without requiring global smoothness or global KL/PL assumptions.
- The method remains applicable when the region of the KL condition shrinks as the algorithm approaches a stationary point.
- Inexact gradients are obtained by solving KL-structured subproblems with an inner proximal gradient procedure.
- Mild assumptions suffice for the overall convergence guarantees.
Where Pith is reading between the lines
- The same local smoothness property could be used to design accelerated variants or variance-reduced extensions of the method.
- Similar local KL assumptions might be checked in applications such as robust optimization or adversarial training to justify first-order methods there.
- The shrinking KL region suggests that adaptive step-size rules could be derived directly from the local Hölder exponent.
- Testing the method on problems where global PL fails but local KL holds would provide a practical check of the claimed advantage.
Load-bearing premise
The inner maximization problem satisfies a local Kurdyka-Lojasiewicz condition that may vary with the outer minimization variable.
What would settle it
A concrete counter-example in which the inner problem obeys a local KL condition yet the maximal function fails to be locally generalized Hölder smooth, or an instance where the proposed inexact proximal gradient method does not achieve the stated complexity for reaching an approximate stationary point.
read the original abstract
We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally generalized H\"older smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies nonconvex-nonconcave minimax problems min_x max_y f(x,y) in which the inner maximization satisfies a local Kurdyka-Łojasiewicz inequality whose neighborhood may depend on and shrink with the outer variable x. From this assumption the authors derive that the maximal function φ(x) = max_y f(x,y) is locally generalized Hölder smooth, then construct an inexact proximal-gradient algorithm on φ whose inexactness is realized by running a proximal-gradient method on the inner KL-structured subproblem, and prove complexity bounds for reaching an approximate stationary point of the original minimax problem.
Significance. If the technical claims hold, the work supplies the first complexity guarantees for first-order methods on a meaningfully broader class of minimax problems than those covered by global KL or PL assumptions. The reduction of inexactness control to an inner proximal-gradient routine on a KL subproblem is a clean algorithmic idea that could be reused in other settings with structured inner problems.
major comments (2)
- [§3.2, Theorem 3.3] §3.2, Theorem 3.3 (local generalized Hölder smoothness of φ): the proof invokes a KL neighborhood N(x) whose radius r(x) is permitted to approach zero as x approaches the stationary set of φ. The subsequent complexity argument in §4 requires the generalized Hölder modulus and the radius of validity to remain uniform along the entire proximal-gradient trajectory on φ; no explicit uniformity assumption on {N(x)} or lower bound on r(x) is stated, so the standard descent lemma and step-size selection may fail to close.
- [§4.1, Theorem 4.2] §4.1, Algorithm 1 and Theorem 4.2 (complexity bound): the inexact gradient oracle for φ is controlled by running a fixed number of inner proximal-gradient steps whose accuracy depends on the local KL parameters at the current x. If the KL neighborhood shrinks faster than the outer step-size schedule, the accumulated inexactness may exceed the tolerance required by the outer convergence analysis; the proof does not quantify this dependence.
minor comments (2)
- [Assumption 2.3] The statement of Assumption 2.3 (local KL) should explicitly record whether the desingularizing function θ and the exponent α are allowed to depend on x or are required to be uniform in a neighborhood of the limit point.
- [Definition 3.1 and Theorem 4.2] Notation for the generalized Hölder constant L_α(x) is introduced in Definition 3.1 but never appears in the complexity statement of Theorem 4.2; clarifying whether the final bound depends on sup L_α(x) along the trajectory would improve readability.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting two important technical points regarding uniformity of the local KL neighborhoods and control of the accumulated inexactness. Both comments identify places where the current argument relies on implicit uniformity that is not explicitly stated. We will revise the manuscript to close these gaps while preserving the local nature of the KL assumption.
read point-by-point responses
-
Referee: [§3.2, Theorem 3.3] §3.2, Theorem 3.3 (local generalized Hölder smoothness of φ): the proof invokes a KL neighborhood N(x) whose radius r(x) is permitted to approach zero as x approaches the stationary set of φ. The subsequent complexity argument in §4 requires the generalized Hölder modulus and the radius of validity to remain uniform along the entire proximal-gradient trajectory on φ; no explicit uniformity assumption on {N(x)} or lower bound on r(x) is stated, so the standard descent lemma and step-size selection may fail to close.
Authors: We agree that the current statement of Theorem 3.3 permits r(x) to shrink and does not explicitly guarantee uniformity along the outer trajectory. In the revised version we will add Assumption 3.4, which requires that there exists a neighborhood U of the stationary set S and a uniform radius r_0 > 0 such that for every x in U the KL inequality holds on the ball B(x, r_0). Under this mild strengthening (still strictly weaker than a global KL condition) the generalized Hölder smoothness modulus and the radius of validity become uniform on the compact level sets visited by the proximal-gradient sequence, allowing the standard descent lemma and step-size choice in §4 to go through without modification. revision: yes
-
Referee: [§4.1, Theorem 4.2] §4.1, Algorithm 1 and Theorem 4.2 (complexity bound): the inexact gradient oracle for φ is controlled by running a fixed number of inner proximal-gradient steps whose accuracy depends on the local KL parameters at the current x. If the KL neighborhood shrinks faster than the outer step-size schedule, the accumulated inexactness may exceed the tolerance required by the outer convergence analysis; the proof does not quantify this dependence.
Authors: The referee correctly notes that the current proof of Theorem 4.2 treats the inner iteration count as fixed and does not explicitly bound the dependence on a possibly shrinking r(x). With the new uniform-radius Assumption 3.4 introduced above, the local KL parameters (including the exponent and the constant) become uniform along the trajectory. Consequently the inner proximal-gradient routine requires only a uniform number of steps to achieve the required inexactness tolerance at every outer iteration. We will insert a short lemma (Lemma 4.3) that quantifies this uniform inner complexity and update the overall iteration bound in Theorem 4.2 accordingly. The dependence on the local KL parameters is now explicit and controlled. revision: yes
Circularity Check
Derivation from local KL assumption to generalized Hölder smoothness of maximal function is independent and non-circular
full rationale
The paper starts from the stated local KL condition on the inner maximization (which may vary with the outer variable x) and derives that the maximal function phi(x) is locally generalized Hölder smooth. This property is then used to construct an inexact proximal gradient method whose inexactness is controlled by running proximal gradient on the inner KL-structured subproblem. Complexity bounds for approximate stationary points follow from standard arguments under the derived smoothness. No step reduces a claimed prediction or rate to a fitted parameter, self-referential normalization, or load-bearing self-citation whose validity depends on the present paper. The derivation chain is self-contained against the external benchmark of the local KL assumption and does not invoke uniqueness theorems or ansatzes from the authors' prior work in a circular manner.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The inner maximization problem satisfies a local Kurdyka-Lojasiewicz condition that may vary with the outer minimization variable.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
we show that the associated maximal function is locally generalized Hölder smooth... develop an inexact proximal gradient method... complexity guarantees... θ and σ are the parameters for the local KL condition
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leanalpha_pin_under_high_calibration unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
C(F^*(x)−F(x,y))^θ ≤ dist(0,∂_y F(x,y)) ∀ y∈L(x) where L(x) shrinks with dist(0,∂Ψ(x))
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Nonsmooth Nonconvex-Concave Minimax Optimization: Convergence Criteria and Algorithms
The authors introduce (ηx,ηy,δ,ε)-GSSP as a convergence criterion and develop projected gradient-free descent-ascent methods achieving non-asymptotic rates for nonsmooth nonconvex-concave minimax optimization without ...
Reference graph
Works this paper leans on
-
[1]
M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In Interna- tional Conference on Machine Learning , pages 214–223, 2017
work page 2017
-
[2]
H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical programming, 137(1):91–129, 2013
work page 2013
-
[3]
D. Bertsimas, D. B. Brown, and C. Caramanis. Theory and applications of robust optimization. SIAM review, 53(3):464–501, 2011
work page 2011
-
[4]
J. Blanchet, J. Li, S. Lin, and X. Zhang. Distributionally robust optimization and robust statistics. arXiv preprint arXiv:2401.14655 , 2024
- [5]
- [6]
- [7]
-
[8]
F. H. Clarke. Generalized gradients and applications. Transactions of the American Mathematical Society, 205:247–262, 1975
work page 1975
-
[9]
F. H. Clarke. Optimization and nonsmooth analysis . SIAM, 1990
work page 1990
-
[10]
B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song. SBEED: Convergent reinforcement learning with nonlinear function approximation. In International Conference on Machine Learning, pages 1125–1134, 2018
work page 2018
-
[11]
D. Drusvyatskiy, A. D. Ioffe, and A. S. Lewis. Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Mathematical Programming, 185:357–383, 2021
work page 2021
-
[12]
P. Frankel, G. Garrigos, and J. Peypouquet. Splitting methods with variable metric for Kurdyka– Lojasiewicz functions and general convergence rates. Journal of Optimization Theory and Applica- tions, 165:874–900, 2015
work page 2015
-
[13]
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. Communications of the ACM , 63(11):139–144, 2020
work page 2020
- [14]
- [15]
- [16]
- [17]
-
[18]
J. Li, L. Zhu, and A. M.-C. So. Nonsmooth nonconvex–nonconcave minimax optimization: Primal– dual balancing and iteration complexity analysis. Mathematical Programming, pages 1–51, 2025
work page 2025
-
[19]
M. Liu, H. Rafique, Q. Lin, and T. Yang. First-order convergence theory for weakly-convex-weakly- concave min-max problems. Journal of Machine Learning Research , 22(169):1–34, 2021
work page 2021
-
[20]
Towards Deep Learning Models Resistant to Adversarial Attacks
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 , 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
- [21]
-
[22]
M. Nouiehed, M. Sanjabi, T. Huang, J. D. Lee, and M. Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. Advances in Neural Information Processing Systems, 32, 2019
work page 2019
-
[23]
S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian. Deep decentralized multi-task multi- agent reinforcement learning under partial observability. In International Conference on Machine Learning, pages 2681–2690, 2017. 25
work page 2017
-
[24]
Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems
T. Pethick, P. Latafat, P. Patrinos, O. Fercoq, and V. Cevher. Escaping limit cycles: Global conver- gence for constrained nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2302.09831 , 2023
-
[25]
H. Rahimian and S. Mehrotra. Frameworks and results in distributionally robust optimization. Open Journal of Mathematical Optimization , 3:1–85, 2022
work page 2022
-
[26]
A. Sinha, H. Namkoong, R. Volpi, and J. Duchi. Certifying some distributional robustness with principled adversarial training. arXiv preprint arXiv:1710.10571 , 2017
-
[27]
E. M. Stein and R. Shakarchi. Real analysis: measure theory, integration, and Hilbert spaces . Princeton University Press, 2009
work page 2009
- [28]
-
[29]
J. Yang, A. Orvieto, A. Lucchi, and N. He. Faster single-loop algorithms for minimax optimization without strong concavity. In International Conference on Artificial Intelligence and Statistics , pages 5485–5517, 2022
work page 2022
- [30]
- [31]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.