A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

Xiangyuan Wang; Zhaosong Lu

arxiv: 2507.01932 · v2 · pith:56C3FMTKnew · submitted 2025-07-02 · 🧮 math.OC · cs.LG· cs.NA· math.NA· stat.ML

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

Zhaosong Lu , Xiangyuan Wang This is my paper

Pith reviewed 2026-05-21 23:30 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.NAmath.NAstat.ML

keywords nonconvex-nonconcave minimaxlocal Kurdyka-Lojasiewicz conditionmaximal functiongeneralized Hölder smoothinexact proximal gradientcomplexity analysisfirst-order method

0 comments

The pith

Local KL condition on the inner maximization makes the maximal function locally generalized Hölder smooth and supports an inexact proximal gradient method with complexity guarantees for nonconvex-nonconcave minimax problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines nonconvex-nonconcave minimax problems in which the inner maximization satisfies a local Kurdyka-Lojasiewicz condition that can change depending on the outer minimization variable. This assumption is weaker than the global KL or Polyak-Lojasiewicz conditions often used elsewhere, allowing the method to cover more practical cases while still permitting analysis. The authors prove that the maximal function is locally generalized Hölder smooth under this condition. They then introduce an inexact proximal gradient algorithm that obtains the required inexact gradient by running a proximal gradient method on a KL-structured subproblem. Under mild assumptions the algorithm is shown to reach an approximate stationary point of the minimax problem with explicit iteration complexity bounds.

Core claim

Under a local Kurdyka-Lojasiewicz condition on the inner maximization that may vary with the outer variable, the associated maximal function is locally generalized Hölder smooth; this property allows an inexact proximal gradient method, whose inexact gradient is obtained by applying a proximal gradient method to a KL-structured subproblem, to compute an approximate stationary point with provable complexity guarantees.

What carries the argument

The local Kurdyka-Lojasiewicz condition on the inner problem, which yields local generalized Hölder smoothness of the maximal function and enables the inexact proximal gradient scheme.

If this is right

Complexity bounds hold for computing approximate stationary points without requiring global smoothness or global KL/PL assumptions.
The method remains applicable when the region of the KL condition shrinks as the algorithm approaches a stationary point.
Inexact gradients are obtained by solving KL-structured subproblems with an inner proximal gradient procedure.
Mild assumptions suffice for the overall convergence guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local smoothness property could be used to design accelerated variants or variance-reduced extensions of the method.
Similar local KL assumptions might be checked in applications such as robust optimization or adversarial training to justify first-order methods there.
The shrinking KL region suggests that adaptive step-size rules could be derived directly from the local Hölder exponent.
Testing the method on problems where global PL fails but local KL holds would provide a practical check of the claimed advantage.

Load-bearing premise

The inner maximization problem satisfies a local Kurdyka-Lojasiewicz condition that may vary with the outer minimization variable.

What would settle it

A concrete counter-example in which the inner problem obeys a local KL condition yet the maximal function fails to be locally generalized Hölder smooth, or an instance where the proposed inexact proximal gradient method does not achieve the stated complexity for reaching an approximate stationary point.

read the original abstract

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally generalized H\"older smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper relaxes global KL to a local x-dependent version for minimax problems and derives complexity for an inexact prox-gradient method, but the uniformity of the resulting Holder smoothness along trajectories is the part that needs close checking in the proofs.

read the letter

The main point is that this work replaces the usual global KL or PL assumptions with a local Kurdyka-Lojasiewicz condition on the inner maximization that can depend on the outer variable x. From that they prove the maximal function is locally generalized Holder smooth and then run an inexact proximal gradient method on it, where inner gradients come from a second proximal scheme that exploits the KL structure. They get iteration complexity to an approximate stationary point under mild assumptions. This is a real extension because global conditions are often too strong for nonconcave inner problems that arise in practice. The paper does a clean job laying out why the shrinking neighborhood creates trouble and how the local Holder property sidesteps it without forcing uniformity from the start. The logic chain from assumption to smoothness to algorithm looks coherent on the abstract and the cited prior results are standard for the area. The soft spot is exactly the one in the stress-test note. If the radius or modulus of the local KL neighborhood deteriorates with distance to the stationary set, it is not obvious that the standard complexity argument for proximal gradient still closes without extra uniformity. The abstract says they establish the Holder property anyway, so their proof presumably controls the trajectory and inexactness tightly enough, but without the explicit constants or the full argument it is hard to be confident the rates are as stated. No experiments are described, which is fine for a theory paper but leaves reproducibility to the proofs alone. This is for optimization theorists who already work on first-order methods for nonconvex-nonconcave minimax problems and want to weaken the global assumptions. A reader who cares about the technical details of local smoothness and inexact oracles will find the extension useful. It is solid enough on its own terms to deserve a serious referee even if the proofs will need careful scrutiny on the uniformity question. I would send it out for peer review.

Referee Report

2 major / 2 minor

Summary. The paper studies nonconvex-nonconcave minimax problems min_x max_y f(x,y) in which the inner maximization satisfies a local Kurdyka-Łojasiewicz inequality whose neighborhood may depend on and shrink with the outer variable x. From this assumption the authors derive that the maximal function φ(x) = max_y f(x,y) is locally generalized Hölder smooth, then construct an inexact proximal-gradient algorithm on φ whose inexactness is realized by running a proximal-gradient method on the inner KL-structured subproblem, and prove complexity bounds for reaching an approximate stationary point of the original minimax problem.

Significance. If the technical claims hold, the work supplies the first complexity guarantees for first-order methods on a meaningfully broader class of minimax problems than those covered by global KL or PL assumptions. The reduction of inexactness control to an inner proximal-gradient routine on a KL subproblem is a clean algorithmic idea that could be reused in other settings with structured inner problems.

major comments (2)

[§3.2, Theorem 3.3] §3.2, Theorem 3.3 (local generalized Hölder smoothness of φ): the proof invokes a KL neighborhood N(x) whose radius r(x) is permitted to approach zero as x approaches the stationary set of φ. The subsequent complexity argument in §4 requires the generalized Hölder modulus and the radius of validity to remain uniform along the entire proximal-gradient trajectory on φ; no explicit uniformity assumption on {N(x)} or lower bound on r(x) is stated, so the standard descent lemma and step-size selection may fail to close.
[§4.1, Theorem 4.2] §4.1, Algorithm 1 and Theorem 4.2 (complexity bound): the inexact gradient oracle for φ is controlled by running a fixed number of inner proximal-gradient steps whose accuracy depends on the local KL parameters at the current x. If the KL neighborhood shrinks faster than the outer step-size schedule, the accumulated inexactness may exceed the tolerance required by the outer convergence analysis; the proof does not quantify this dependence.

minor comments (2)

[Assumption 2.3] The statement of Assumption 2.3 (local KL) should explicitly record whether the desingularizing function θ and the exponent α are allowed to depend on x or are required to be uniform in a neighborhood of the limit point.
[Definition 3.1 and Theorem 4.2] Notation for the generalized Hölder constant L_α(x) is introduced in Definition 3.1 but never appears in the complexity statement of Theorem 4.2; clarifying whether the final bound depends on sup L_α(x) along the trajectory would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting two important technical points regarding uniformity of the local KL neighborhoods and control of the accumulated inexactness. Both comments identify places where the current argument relies on implicit uniformity that is not explicitly stated. We will revise the manuscript to close these gaps while preserving the local nature of the KL assumption.

read point-by-point responses

Referee: [§3.2, Theorem 3.3] §3.2, Theorem 3.3 (local generalized Hölder smoothness of φ): the proof invokes a KL neighborhood N(x) whose radius r(x) is permitted to approach zero as x approaches the stationary set of φ. The subsequent complexity argument in §4 requires the generalized Hölder modulus and the radius of validity to remain uniform along the entire proximal-gradient trajectory on φ; no explicit uniformity assumption on {N(x)} or lower bound on r(x) is stated, so the standard descent lemma and step-size selection may fail to close.

Authors: We agree that the current statement of Theorem 3.3 permits r(x) to shrink and does not explicitly guarantee uniformity along the outer trajectory. In the revised version we will add Assumption 3.4, which requires that there exists a neighborhood U of the stationary set S and a uniform radius r_0 > 0 such that for every x in U the KL inequality holds on the ball B(x, r_0). Under this mild strengthening (still strictly weaker than a global KL condition) the generalized Hölder smoothness modulus and the radius of validity become uniform on the compact level sets visited by the proximal-gradient sequence, allowing the standard descent lemma and step-size choice in §4 to go through without modification. revision: yes
Referee: [§4.1, Theorem 4.2] §4.1, Algorithm 1 and Theorem 4.2 (complexity bound): the inexact gradient oracle for φ is controlled by running a fixed number of inner proximal-gradient steps whose accuracy depends on the local KL parameters at the current x. If the KL neighborhood shrinks faster than the outer step-size schedule, the accumulated inexactness may exceed the tolerance required by the outer convergence analysis; the proof does not quantify this dependence.

Authors: The referee correctly notes that the current proof of Theorem 4.2 treats the inner iteration count as fixed and does not explicitly bound the dependence on a possibly shrinking r(x). With the new uniform-radius Assumption 3.4 introduced above, the local KL parameters (including the exponent and the constant) become uniform along the trajectory. Consequently the inner proximal-gradient routine requires only a uniform number of steps to achieve the required inexactness tolerance at every outer iteration. We will insert a short lemma (Lemma 4.3) that quantifies this uniform inner complexity and update the overall iteration bound in Theorem 4.2 accordingly. The dependence on the local KL parameters is now explicit and controlled. revision: yes

Circularity Check

0 steps flagged

Derivation from local KL assumption to generalized Hölder smoothness of maximal function is independent and non-circular

full rationale

The paper starts from the stated local KL condition on the inner maximization (which may vary with the outer variable x) and derives that the maximal function phi(x) is locally generalized Hölder smooth. This property is then used to construct an inexact proximal gradient method whose inexactness is controlled by running proximal gradient on the inner KL-structured subproblem. Complexity bounds for approximate stationary points follow from standard arguments under the derived smoothness. No step reduces a claimed prediction or rate to a fitted parameter, self-referential normalization, or load-bearing self-citation whose validity depends on the present paper. The derivation chain is self-contained against the external benchmark of the local KL assumption and does not invoke uniqueness theorems or ansatzes from the authors' prior work in a circular manner.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of a local, outer-dependent KL condition and on the derived local generalized Hölder smoothness of the maximal function; no free parameters or new invented entities are introduced.

axioms (1)

domain assumption The inner maximization problem satisfies a local Kurdyka-Lojasiewicz condition that may vary with the outer minimization variable.
This is the key structural assumption stated in the abstract that replaces stronger global conditions and enables the subsequent analysis.

pith-pipeline@v0.9.0 · 5741 in / 1317 out tokens · 42024 ms · 2026-05-21T23:30:34.003267+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we show that the associated maximal function is locally generalized Hölder smooth... develop an inexact proximal gradient method... complexity guarantees... θ and σ are the parameters for the local KL condition
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

C(F^*(x)−F(x,y))^θ ≤ dist(0,∂_y F(x,y)) ∀ y∈L(x) where L(x) shrinks with dist(0,∂Ψ(x))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Nonsmooth Nonconvex-Concave Minimax Optimization: Convergence Criteria and Algorithms
math.OC 2026-04 unverdicted novelty 6.0

The authors introduce (ηx,ηy,δ,ε)-GSSP as a convergence criterion and develop projected gradient-free descent-ascent methods achieving non-asymptotic rates for nonsmooth nonconvex-concave minimax optimization without ...

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Arjovsky, S

M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In Interna- tional Conference on Machine Learning , pages 214–223, 2017

work page 2017
[2]

Attouch, J

H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical programming, 137(1):91–129, 2013

work page 2013
[3]

Bertsimas, D

D. Bertsimas, D. B. Brown, and C. Caramanis. Theory and applications of robust optimization. SIAM review, 53(3):464–501, 2011

work page 2011
[4]

Blanchet, J

J. Blanchet, J. Li, S. Lin, and X. Zhang. Distributionally robust optimization and robust statistics. arXiv preprint arXiv:2401.14655 , 2024

work page arXiv 2024
[5]

A. B¨ ohm. Solving nonconvex-nonconcave min-max problems exhibiting weak Minty solutions.arXiv preprint arXiv:2201.12247, 2022

work page arXiv 2022
[6]

Bolte, A

J. Bolte, A. Daniilidis, O. Ley, and L. Mazet. Characterizations of Lojasiewicz inequalities: subgradi- ent flows, talweg, convexity. Transactions of the American Mathematical Society , 362(6):3319–3363, 2010. 24

work page 2010
[7]

Cai and W

Y. Cai and W. Zheng. Accelerated single-call methods for constrained min-max optimization. arXiv preprint arXiv:2210.03096, 2022

work page arXiv 2022
[8]

F. H. Clarke. Generalized gradients and applications. Transactions of the American Mathematical Society, 205:247–262, 1975

work page 1975
[9]

F. H. Clarke. Optimization and nonsmooth analysis . SIAM, 1990

work page 1990
[10]

B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song. SBEED: Convergent reinforcement learning with nonlinear function approximation. In International Conference on Machine Learning, pages 1125–1134, 2018

work page 2018
[11]

Drusvyatskiy, A

D. Drusvyatskiy, A. D. Ioffe, and A. S. Lewis. Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Mathematical Programming, 185:357–383, 2021

work page 2021
[12]

Frankel, G

P. Frankel, G. Garrigos, and J. Peypouquet. Splitting methods with variable metric for Kurdyka– Lojasiewicz functions and general convergence rates. Journal of Optimization Theory and Applica- tions, 165:874–900, 2015

work page 2015
[13]

Goodfellow, J

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. Communications of the ACM , 63(11):139–144, 2020

work page 2020
[14]

F. Huang. Enhanced adaptive gradient algorithms for nonconvex-PL minimax optimization. arXiv preprint arXiv:2303.03984, 2023

work page arXiv 2023
[15]

C. Jin, P. Netrapalli, and M. I. Jordan. Minmax optimization: Stable limit points of gradient descent ascent are locally optimal. arXiv preprint arXiv:1902.00618 , 2019

work page arXiv 1902
[16]

Karimi, J

H. Karimi, J. Nutini, and M. Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak- Lojasiewicz condition. In Joint European conference on machine learning and knowledge discovery in databases , pages 795–811, 2016

work page 2016
[17]

Li and T

G. Li and T. K. Pong. Calculus of the exponent of Kurdyka– Lojasiewicz inequality and its applications to linear convergence of first-order methods. Foundations of computational mathematics, 18(5):1199–1232, 2018

work page 2018
[18]

J. Li, L. Zhu, and A. M.-C. So. Nonsmooth nonconvex–nonconcave minimax optimization: Primal– dual balancing and iteration complexity analysis. Mathematical Programming, pages 1–51, 2025

work page 2025
[19]

M. Liu, H. Rafique, Q. Lin, and T. Yang. First-order convergence theory for weakly-convex-weakly- concave min-max problems. Journal of Machine Learning Research , 22(169):1–34, 2021

work page 2021
[20]

Towards Deep Learning Models Resistant to Adversarial Attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[21]

Nesterov

Y. Nesterov. Universal gradient methods for convex optimization problems. Mathematical Pro- gramming, 152(1):381–404, 2015

work page 2015
[22]

Nouiehed, M

M. Nouiehed, M. Sanjabi, T. Huang, J. D. Lee, and M. Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. Advances in Neural Information Processing Systems, 32, 2019

work page 2019
[23]

Omidshafiei, J

S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian. Deep decentralized multi-task multi- agent reinforcement learning under partial observability. In International Conference on Machine Learning, pages 2681–2690, 2017. 25

work page 2017
[24]

Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems

T. Pethick, P. Latafat, P. Patrinos, O. Fercoq, and V. Cevher. Escaping limit cycles: Global conver- gence for constrained nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2302.09831 , 2023

work page arXiv 2023
[25]

Rahimian and S

H. Rahimian and S. Mehrotra. Frameworks and results in distributionally robust optimization. Open Journal of Mathematical Optimization , 3:1–85, 2022

work page 2022
[26]

Certifying some distributional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

A. Sinha, H. Namkoong, R. Volpi, and J. Duchi. Certifying some distributional robustness with principled adversarial training. arXiv preprint arXiv:1710.10571 , 2017

work page arXiv 2017
[27]

E. M. Stein and R. Shakarchi. Real analysis: measure theory, integration, and Hilbert spaces . Princeton University Press, 2009

work page 2009
[28]

Xu, Z.-Q

Z. Xu, Z.-Q. Wang, J.-L. Wang, and Y.-H. Dai. Zeroth-order alternating gradient descent ascent algorithms for a class of nonconvex-nonconcave minimax problems. Journal of Machine Learning Research, 24(313):1–25, 2023

work page 2023
[29]

J. Yang, A. Orvieto, A. Lucchi, and N. He. Faster single-loop algorithms for minimax optimization without strong concavity. In International Conference on Artificial Intelligence and Statistics , pages 5485–5517, 2022

work page 2022
[30]

Zheng, A

T. Zheng, A. M.-C. So, and J. Li. Doubly smoothed optimistic gradients: A universal approach for smooth minimax problems. arXiv preprint arXiv:2506.07397 , 2025

work page arXiv 2025
[31]

Zheng, L

T. Zheng, L. Zhu, A. M.-C. So, J. Blanchet, and J. Li. Universal gradient descent ascent method for nonconvex-nonconcave minimax optimization. Advances in Neural Information Processing Systems , 36:54075–54110, 2023. 26

work page 2023

[1] [1]

Arjovsky, S

M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In Interna- tional Conference on Machine Learning , pages 214–223, 2017

work page 2017

[2] [2]

Attouch, J

H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical programming, 137(1):91–129, 2013

work page 2013

[3] [3]

Bertsimas, D

D. Bertsimas, D. B. Brown, and C. Caramanis. Theory and applications of robust optimization. SIAM review, 53(3):464–501, 2011

work page 2011

[4] [4]

Blanchet, J

J. Blanchet, J. Li, S. Lin, and X. Zhang. Distributionally robust optimization and robust statistics. arXiv preprint arXiv:2401.14655 , 2024

work page arXiv 2024

[5] [5]

A. B¨ ohm. Solving nonconvex-nonconcave min-max problems exhibiting weak Minty solutions.arXiv preprint arXiv:2201.12247, 2022

work page arXiv 2022

[6] [6]

Bolte, A

J. Bolte, A. Daniilidis, O. Ley, and L. Mazet. Characterizations of Lojasiewicz inequalities: subgradi- ent flows, talweg, convexity. Transactions of the American Mathematical Society , 362(6):3319–3363, 2010. 24

work page 2010

[7] [7]

Cai and W

Y. Cai and W. Zheng. Accelerated single-call methods for constrained min-max optimization. arXiv preprint arXiv:2210.03096, 2022

work page arXiv 2022

[8] [8]

F. H. Clarke. Generalized gradients and applications. Transactions of the American Mathematical Society, 205:247–262, 1975

work page 1975

[9] [9]

F. H. Clarke. Optimization and nonsmooth analysis . SIAM, 1990

work page 1990

[10] [10]

B. Dai, A. Shaw, L. Li, L. Xiao, N. He, Z. Liu, J. Chen, and L. Song. SBEED: Convergent reinforcement learning with nonlinear function approximation. In International Conference on Machine Learning, pages 1125–1134, 2018

work page 2018

[11] [11]

Drusvyatskiy, A

D. Drusvyatskiy, A. D. Ioffe, and A. S. Lewis. Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Mathematical Programming, 185:357–383, 2021

work page 2021

[12] [12]

Frankel, G

P. Frankel, G. Garrigos, and J. Peypouquet. Splitting methods with variable metric for Kurdyka– Lojasiewicz functions and general convergence rates. Journal of Optimization Theory and Applica- tions, 165:874–900, 2015

work page 2015

[13] [13]

Goodfellow, J

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. Communications of the ACM , 63(11):139–144, 2020

work page 2020

[14] [14]

F. Huang. Enhanced adaptive gradient algorithms for nonconvex-PL minimax optimization. arXiv preprint arXiv:2303.03984, 2023

work page arXiv 2023

[15] [15]

C. Jin, P. Netrapalli, and M. I. Jordan. Minmax optimization: Stable limit points of gradient descent ascent are locally optimal. arXiv preprint arXiv:1902.00618 , 2019

work page arXiv 1902

[16] [16]

Karimi, J

H. Karimi, J. Nutini, and M. Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak- Lojasiewicz condition. In Joint European conference on machine learning and knowledge discovery in databases , pages 795–811, 2016

work page 2016

[17] [17]

Li and T

G. Li and T. K. Pong. Calculus of the exponent of Kurdyka– Lojasiewicz inequality and its applications to linear convergence of first-order methods. Foundations of computational mathematics, 18(5):1199–1232, 2018

work page 2018

[18] [18]

J. Li, L. Zhu, and A. M.-C. So. Nonsmooth nonconvex–nonconcave minimax optimization: Primal– dual balancing and iteration complexity analysis. Mathematical Programming, pages 1–51, 2025

work page 2025

[19] [19]

M. Liu, H. Rafique, Q. Lin, and T. Yang. First-order convergence theory for weakly-convex-weakly- concave min-max problems. Journal of Machine Learning Research , 22(169):1–34, 2021

work page 2021

[20] [20]

Towards Deep Learning Models Resistant to Adversarial Attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[21] [21]

Nesterov

Y. Nesterov. Universal gradient methods for convex optimization problems. Mathematical Pro- gramming, 152(1):381–404, 2015

work page 2015

[22] [22]

Nouiehed, M

M. Nouiehed, M. Sanjabi, T. Huang, J. D. Lee, and M. Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. Advances in Neural Information Processing Systems, 32, 2019

work page 2019

[23] [23]

Omidshafiei, J

S. Omidshafiei, J. Pazis, C. Amato, J. P. How, and J. Vian. Deep decentralized multi-task multi- agent reinforcement learning under partial observability. In International Conference on Machine Learning, pages 2681–2690, 2017. 25

work page 2017

[24] [24]

Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems

T. Pethick, P. Latafat, P. Patrinos, O. Fercoq, and V. Cevher. Escaping limit cycles: Global conver- gence for constrained nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2302.09831 , 2023

work page arXiv 2023

[25] [25]

Rahimian and S

H. Rahimian and S. Mehrotra. Frameworks and results in distributionally robust optimization. Open Journal of Mathematical Optimization , 3:1–85, 2022

work page 2022

[26] [26]

Certifying some distributional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

A. Sinha, H. Namkoong, R. Volpi, and J. Duchi. Certifying some distributional robustness with principled adversarial training. arXiv preprint arXiv:1710.10571 , 2017

work page arXiv 2017

[27] [27]

E. M. Stein and R. Shakarchi. Real analysis: measure theory, integration, and Hilbert spaces . Princeton University Press, 2009

work page 2009

[28] [28]

Xu, Z.-Q

Z. Xu, Z.-Q. Wang, J.-L. Wang, and Y.-H. Dai. Zeroth-order alternating gradient descent ascent algorithms for a class of nonconvex-nonconcave minimax problems. Journal of Machine Learning Research, 24(313):1–25, 2023

work page 2023

[29] [29]

J. Yang, A. Orvieto, A. Lucchi, and N. He. Faster single-loop algorithms for minimax optimization without strong concavity. In International Conference on Artificial Intelligence and Statistics , pages 5485–5517, 2022

work page 2022

[30] [30]

Zheng, A

T. Zheng, A. M.-C. So, and J. Li. Doubly smoothed optimistic gradients: A universal approach for smooth minimax problems. arXiv preprint arXiv:2506.07397 , 2025

work page arXiv 2025

[31] [31]

Zheng, L

T. Zheng, L. Zhu, A. M.-C. So, J. Blanchet, and J. Li. Universal gradient descent ascent method for nonconvex-nonconcave minimax optimization. Advances in Neural Information Processing Systems , 36:54075–54110, 2023. 26

work page 2023