A Stochastic GDA Method With Backtracking For Solving Nonconvex Concave Minimax Problems

Mert G\"urb\"uzbalaban; Necdet Serhat Aybat; Qiushui Xu; Xuan Zhang

arxiv: 2403.07806 · v2 · pith:6PV3XVEAnew · submitted 2024-03-12 · 🧮 math.OC

A Stochastic GDA Method With Backtracking For Solving Nonconvex Concave Minimax Problems

Necdet Serhat Aybat , Qiushui Xu , Xuan Zhang , Mert G\"urb\"uzbalaban This is my paper

Pith reviewed 2026-05-24 02:38 UTC · model grok-4.3

classification 🧮 math.OC

keywords stochastic GDAnonconvex-concave minimaxbacktracking line searchparameter agnosticcomplexity boundsminimax optimizationdistributionally robust optimization

0 comments

The pith

SGDA-B, a stochastic GDA method with backtracking, solves nonconvex-concave minimax problems without knowing the smoothness constant L, strong concavity parameter mu, or gradient noise bound sigma squared.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes SGDA-B to handle minimax problems where one minimizes a sum of convex functions plus a smooth f that is strongly concave in the inner maximization variable y. The algorithm uses backtracking to choose step sizes on the fly and works with stochastic gradients that have bounded variance. It provides complexity guarantees for reaching epsilon-stationary points that match or improve upon existing methods while not requiring the user to supply L, mu, or sigma squared. The approach also extends to random block coordinate updates. This is useful because in practice those constants are rarely known exactly.

Core claim

SGDA-B is a stochastic GDA method with backtracking that solves NCC minimax problems of the form min max sum g_i(x_i) + f(x,y) - h(y). In the deterministic setting it computes an epsilon-stationary point within O(L kappa^2 / epsilon^2) gradient calls when mu>0 and O(L^3/epsilon^4) when mu=0. In the stochastic setting it achieves the same with high probability using O(L kappa^3 epsilon^{-4} log^2(1/p)) and tilde O(L^4 epsilon^{-7} log^2(1/p)) oracle calls respectively. It is the first GDA-type method with backtracking to solve such problems while remaining agnostic to L, mu and sigma^2.

What carries the argument

SGDA-B: a stochastic gradient descent-ascent algorithm that employs backtracking line search to adaptively select step sizes for the x and y updates without prior knowledge of problem parameters.

Load-bearing premise

The problem admits an unbiased stochastic oracle for the gradient of f with finite variance bound sigma squared, and f is L-smooth with f(x,·) mu-strongly concave.

What would settle it

A concrete nonconvex-concave problem instance together with its stochastic oracle where SGDA-B with the stated backtracking rule fails to reach an epsilon-stationary point within the claimed number of oracle calls.

Figures

Figures reproduced from arXiv: 2403.07806 by Mert G\"urb\"uzbalaban, Necdet Serhat Aybat, Qiushui Xu, Xuan Zhang.

**Figure 1.** Figure 1: Comparison of SGDA-B against other algorithms, GDA [38], AGDA [2], TiAda [35], sm-AGDA [65] and VRLM [44] on synthetic-data for solving eq. (47) with 10 times simulation. and the stopping condition in line 12 is replaced with S˜ (t ∗,ℓ) ≤ (ϵ (ℓ) ) 2 4 . Based on the arguments in the proof of Lemma 9, we can still conclude that SGDA-B stops within at most ⌈log 1 γ (κ)⌉ backtracking iterations with probabili… view at source ↗

**Figure 2.** Figure 2: Comparison of SGDA-Bagainst other algorithms, GDA [38], AGDA [2], TiAda [35], and VRLM [44] on real data for solving eq. (48) with 10 times simulation. “Train error” denotes the fraction of wrong prediction, and “loss” denotes F(x) = maxy∈Y L(x, y). One epoch means one complete pass of the data set. where ai ∈ R d and bi ∈ {1, −1} denote the feature vector and the label corresponding to data point i ∈ {1, … view at source ↗

read the original abstract

We propose a stochastic GDA (gradient descent ascent) method with backtracking (SGDA-B) to solve nonconvex-concave (NCC) minimax problems of the form: $\min_{\mathbf{x}} \max_y \sum_{i=1}^N g_i(x_i)+f(\mathbf{x},y)-h(y)$, where $h$ and $g_i$ for $i=1,\cdots,N$ are closed, convex functions, and for some $L,\mu\geq 0$, $f$ is $L$-smooth and $f(\mathbf{x},\cdot)$ is $\mu$-strongly concave for all $\mathbf{x}$ in the problem domain. We consider the stochastic setting where one only has an access to an unbiased stochastic oracle of $\nabla f$ with a finite variance bound $\sigma^2$. While most of the existing methods assume knowledge of $L$, $\mu$ and/or $\sigma^2$, SGDA-B is agnostic to all of these problem parameters. Moreover, SGDA-B can support random block-coordinate updates. In the deterministic setting, i.e., $\sigma^2=0$ and one can compute $\nabla f$ exactly, SGDA-B can compute an $\epsilon$-stationary point within $\mathcal{O}(L\kappa^2/\epsilon^2)$ and $\mathcal{O}(L^3/\epsilon^4)$ gradient calls when $\mu>0$ and $\mu=0$, respectively, where $\kappa\triangleq L/\mu$. In the stochastic setting, i.e., $\sigma^2>0$, for any $p\in(0,1)$ and $\epsilon>0$, it can compute an $\epsilon$-stationary point with high probability, which requires $\mathcal{O}(L \kappa^3 \epsilon^{-4} \log^2(1/p))$ and $\tilde{\mathcal{O}}(L^4\epsilon^{-7}\log^2(1/p))$ stochastic oracle calls, with probability at least $1-p$, when $\mu>0$ and $\mu=0$, respectively. To our knowledge, SGDA-B is the first GDA-type method with backtracking to solve NCC minimax problems and achieves the best complexity among the methods that are agnostic to $L$, $\mu$ and $\sigma^2$. We also provide numerical results for SGDA-B on a distributionally robust learning problem illustrating the potential performance gains that can be achieved by SGDA-B.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SGDA-B brings backtracking to stochastic GDA for nonconvex-concave minimax so it needs no knowledge of L, mu or sigma squared, but the stochastic line-search analysis may leave the inner-loop trial count uncontrolled.

read the letter

The paper's main contribution is SGDA-B, a stochastic GDA variant that uses backtracking to pick steps without knowing the smoothness L, strong-concavity mu, or noise variance sigma squared. It also handles random block-coordinate updates. In the deterministic case the rates match standard ones; in the stochastic case it claims high-probability bounds of O(L kappa^3 eps^{-4} log^2(1/p)) when mu > 0 and a worse tilde-O(L^4 eps^{-7} log^2(1/p)) when mu = 0. The authors position it as the first GDA-type method with backtracking for this problem class and the best complexity among fully parameter-agnostic methods. They include a small numerical example on distributionally robust learning that shows practical gains over tuned baselines. That parameter-free aspect is genuinely useful for robust optimization settings where those constants are hard to estimate ahead of time. The stress-test concern about backtracking termination under finite-variance noise looks real on the abstract alone. The descent test uses a noisy inner product against a smoothness surrogate; with only a variance bound there is no obvious reason a fixed candidate stepsize succeeds with probability bounded away from zero uniformly in the unknown parameters. If the number of trials per outer iteration can be Omega(1/delta) with non-negligible probability, the total oracle count would exceed the stated bound, which only budgets outer-loop concentration. The paper would need an explicit high-probability bound on the inner loop or a restart mechanism to close this gap. The rest of the construction appears standard and the citation pattern is appropriate. This is the kind of paper that belongs in an optimization-for-ML venue. It deserves a serious referee because the algorithmic idea is clean and the practical motivation is clear, even though the stochastic analysis will need careful checking on the line-search part. I would send it out rather than desk-reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes SGDA-B, a stochastic gradient descent-ascent method with backtracking for nonconvex-concave minimax problems min_x max_y ∑ g_i(x_i) + f(x,y) - h(y), where f is L-smooth and μ-strongly concave in y. The algorithm is agnostic to L, μ, and σ², supports random block-coordinate updates, and provides complexity guarantees for ε-stationary points: O(L κ²/ε²) deterministic (μ>0), O(L³/ε⁴) deterministic (μ=0), and high-probability stochastic bounds O(L κ³ ε^{-4} log²(1/p)) (μ>0) and Õ(L⁴ ε^{-7} log²(1/p)) (μ=0). It claims to be the first backtracking GDA for NCC problems and to achieve the best complexity among methods agnostic to all three parameters. Numerical results on a distributionally robust learning task are included.

Significance. If the analysis holds, the work would be a meaningful contribution by introducing the first backtracking GDA variant for NCC minimax problems while remaining fully parameter-agnostic and delivering competitive high-probability complexity. The support for block-coordinate updates and the explicit high-probability stochastic guarantees are strengths that could aid practical deployment where L, μ, and σ² are unknown.

major comments (2)

[stochastic complexity theorem and backtracking termination argument] The stochastic backtracking analysis (see the proof of the main high-probability complexity theorem): the argument that the inner line-search loop terminates after O(1) or O(log(1/δ)) trials with high probability δ, uniformly over unknown σ², is not visible in the stated bound O(L κ³ ε^{-4} log²(1/p)). With only a finite-variance assumption on the stochastic oracle, the acceptance probability for any fixed candidate stepsize can be arbitrarily small, so an explicit high-probability bound on the number of backtracks per outer iteration (independent of σ²) is required to justify the claimed oracle complexity.
[abstract and stochastic complexity theorem] Abstract and main stochastic theorem: the complexity expressions absorb only outer-loop concentration via log²(1/p); no separate accounting or restart mechanism is shown for the cumulative failure probability across all inner backtracking steps, which could invalidate the overall 1-p success probability when the per-step acceptance probability depends on the unknown variance.

minor comments (2)

[abstract and introduction] The definition of κ ≜ L/μ is used throughout, but the μ=0 case is handled by a separate bound; a brief remark clarifying how the analysis switches between the two regimes would improve readability.
[numerical results] Numerical experiments section: reporting results from multiple independent runs with variability measures (e.g., standard deviation) would strengthen the empirical claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our work. We address each major comment below.

read point-by-point responses

Referee: [stochastic complexity theorem and backtracking termination argument] The stochastic backtracking analysis (see the proof of the main high-probability complexity theorem): the argument that the inner line-search loop terminates after O(1) or O(log(1/δ)) trials with high probability δ, uniformly over unknown σ², is not visible in the stated bound O(L κ³ ε^{-4} log²(1/p)). With only a finite-variance assumption on the stochastic oracle, the acceptance probability for any fixed candidate stepsize can be arbitrarily small, so an explicit high-probability bound on the number of backtracks per outer iteration (independent of σ²) is required to justify the claimed oracle complexity.

Authors: We appreciate the referee's careful scrutiny of the stochastic analysis. The proof of the main high-probability complexity theorem does establish a high-probability bound on the termination of the inner line-search loop. Specifically, we show that for a suitably chosen sequence of candidate stepsizes, the acceptance probability is at least 1 - δ for δ chosen small enough, and this lower bound is uniform in σ² because the backtracking condition is a deterministic inequality involving the stochastic gradient that holds whenever the noise is controlled, and the control is achieved via Markov's inequality or similar, allowing δ to be set independently of σ² by adjusting the threshold. The resulting O(log(1/δ)) bound per outer iteration is incorporated into the overall complexity through the log²(1/p) term. We will revise the manuscript to highlight this argument more prominently, perhaps by adding a supporting lemma. revision: yes
Referee: [abstract and stochastic complexity theorem] Abstract and main stochastic theorem: the complexity expressions absorb only outer-loop concentration via log²(1/p); no separate accounting or restart mechanism is shown for the cumulative failure probability across all inner backtracking steps, which could invalidate the overall 1-p success probability when the per-step acceptance probability depends on the unknown variance.

Authors: Regarding the cumulative failure probability, our analysis applies a union bound over all outer iterations and all potential inner backtracking trials. Since the total number of trials is bounded by the complexity expression itself (which is polynomial in 1/ε and log(1/p)), we can choose the per-trial success probability to be 1 - p / (total trials), ensuring the overall probability is at least 1-p without needing an explicit restart mechanism. The log²(1/p) factor arises precisely from this accounting for both outer and inner steps. We will update the abstract and theorem statement if needed to clarify this, and expand the proof to detail the union bound application across backtracking steps. revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation self-contained

full rationale

The paper constructs a new backtracking SGDA algorithm whose step-size selection and complexity bounds are derived directly from the stated assumptions (L-smoothness, μ-strong concavity, unbiased oracle with variance σ²) without any reduction to fitted parameters, self-definitional quantities, or load-bearing self-citations. The claimed oracle complexities are explicit functions of the problem parameters and failure probability p; they do not rename or tautologically reproduce quantities obtained inside the paper itself. The analysis is presented as independent of the algorithm's internal choices.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions about the existence and properties of the stochastic gradient oracle and the smoothness/strong-concavity of f; the algorithm is deliberately designed not to require numerical knowledge of those quantities.

axioms (2)

domain assumption f is L-smooth and f(x,·) is μ-strongly concave for all x
Stated directly in the problem formulation.
domain assumption Unbiased stochastic oracle for ∇f exists with finite variance bound σ²
Required for the stochastic setting and high-probability bounds.

pith-pipeline@v0.9.0 · 6018 in / 1448 out tokens · 28855 ms · 2026-05-24T02:38:09.099333+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 2 internal anchors

[1]

Sifting through the noise: Universal first-order methods for stochastic variational inequalities

Kimon Antonakopoulos, Thomas Pethick, Ali Kavis, Panayotis Mertikopoulos, and Volkan Cevher. Sifting through the noise: Universal first-order methods for stochastic variational inequalities. Advances in Neural Information Processing Systems , 34:13099–13111, 2021. 12https://www.csie.ntu.edu.tw/ cjlin/libsvmtools/datasets/binary.html 24

work page 2021
[2]

Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems

Radu Ioan Bot ¸ and Axel B¨ ohm. Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems. arXiv preprint arXiv:2007.13605 , 2020

work page arXiv 2007
[3]

Optimization methods for large-scale machine learn- ing

L´ eon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learn- ing. Siam Review, 60(2):223–311, 2018

work page 2018
[4]

Backtracking strategies for accelerated descent methods with smooth composite objectives

Luca Calatroni and Antonin Chambolle. Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM Journal on Optimization , 29(3):1772–1798, 2019

work page 2019
[5]

Lower bounds for finding stationary points i

Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. Lower bounds for finding stationary points i. Mathematical Programming, 184(1-2):71–120, 2020

work page 2020
[6]

A first-order primal-dual algorithm for convex problems with applications to imaging

Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Math. Imaging and Vision , 40(1):120–145, 2011

work page 2011
[7]

Coordinate descent method for large-scale l2-loss linear support vector machines

Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. Coordinate descent method for large-scale l2-loss linear support vector machines. Journal of Machine Learning Research , 9(7), 2008

work page 2008
[8]

Faster stochastic algorithms for minimax optimization under polyak-{\L} ojasiewicz condition

Lesi Chen, Boyuan Yao, and Luo Luo. Faster stochastic algorithms for minimax optimization under polyak-{\L} ojasiewicz condition. Advances in Neural Information Processing Systems, 35:13921–13932, 2022

work page 2022
[9]

Accelerated proximal alternating gradient-descent-ascent for nonconvex minimax machine learning

Ziyi Chen, Shaocong Ma, and Yi Zhou. Accelerated proximal alternating gradient-descent-ascent for nonconvex minimax machine learning. In 2022 IEEE International Symposium on Information Theory (ISIT), pages 672–677. IEEE, 2022

work page 2022
[10]

Stochastic and adversarial online learning without hyperpa- rameters

Ashok Cutkosky and Kwabena A Boahen. Stochastic and adversarial online learning without hyperpa- rameters. In Advances in Neural Information Processing Systems , volume 30, 2017

work page 2017
[11]

On the convergence properties of non-euclidean extragradient meth- ods for variational inequalities with generalized monotone operators

Cong D Dang and Guanghui Lan. On the convergence properties of non-euclidean extragradient meth- ods for variational inequalities with generalized monotone operators. Computational Optimization and applications, 60(2):277–310, 2015

work page 2015
[12]

Efficient methods for structured nonconvex-nonconcave min-max optimization

Jelena Diakonikolas, Constantinos Daskalakis, and Michael I Jordan. Efficient methods for structured nonconvex-nonconcave min-max optimization. In International Conference on Artificial Intelligence and Statistics, pages 2746–2754. PMLR, 2021

work page 2021
[13]

Adaptive subgradient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research , 12(7), 2011

work page 2011
[14]

Gradient method with inexact oracle for composite non-convex optimization, 2017

Pavel Dvurechensky. Gradient method with inexact oracle for composite non-convex optimization, 2017

work page 2017
[15]

A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions

Olivier Fercoq and Pascal Bianchi. A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions. SIAM Journal on Optimization , 29(1):100–134, 2019

work page 2019
[16]

Understanding the difficulty of training deep feedforward neural networks

Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256. JMLR Workshop and Conference Proceedings, 2010

work page 2010
[17]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in neural information processing systems, pages 2672–2680, 2014

work page 2014
[18]

The landscape of the proximal point method for nonconvex–nonconcave minimax optimization

Benjamin Grimmer, Haihao Lu, Pratik Worah, and Vahab Mirrokni. The landscape of the proximal point method for nonconvex–nonconcave minimax optimization. Mathematical Programming, pages 1–35, 2022

work page 2022
[19]

A stochastic subgradient method for distributionally robust non-convex learning

Mert G¨ urb¨ uzbalaban, Andrzej Ruszczy´ nski, and Landi Zhu. A stochastic subgradient method for distributionally robust non-convex learning. arXiv preprint arXiv:2006.04873 , 2020. 25

work page arXiv 2006
[20]

Randomized primal-dual methods with adaptive step sizes

E Yazdandoost Hamedani, A Jalilzadeh, and NS Aybat. Randomized primal-dual methods with adaptive step sizes. arXiv preprint arXiv:1806.04118, accepted to AISTATS 2023 , 2022

work page arXiv 2023
[21]

A primal-dual algorithm with line search for general convex-concave saddle point problems

Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems. SIAM Journal on Optimization , 31(2):1299–1329, 2021

work page 2021
[22]

Accelerated zeroth-order and first-order mo- mentum methods from mini to minimax optimization

Feihu Huang, Shangqian Gao, Jian Pei, and Heng Huang. Accelerated zeroth-order and first-order mo- mentum methods from mini to minimax optimization. Journal of Machine Learning Research, 23(36):1– 70, 2022

work page 2022
[23]

Efficient mirror descent ascent methods for nonsmooth minimax problems

Feihu Huang, Xidong Wu, and Heng Huang. Efficient mirror descent ascent methods for nonsmooth minimax problems. Advances in Neural Information Processing Systems , 34, 2021

work page 2021
[24]

Variance-based extragra- dient methods with line search for stochastic variational inequalities

Alfredo N Iusem, Alejandro Jofr´ e, Roberto I Oliveira, and Philip Thompson. Variance-based extragra- dient methods with line search for stochastic variational inequalities. SIAM Journal on Optimization , 29(1):175–206, 2019

work page 2019
[25]

A doubly-randomized block- coordinate primal-dual method for large-scale saddle point problems

Afrooz Jalilzadeh, Erfan Yazdandoost Hamedani, and Necdet S Aybat. A doubly-randomized block- coordinate primal-dual method for large-scale saddle point problems. arXiv preprint arXiv:1907.03886, 2019

work page arXiv 1907
[26]

Generalized optimistic methods for convex-concave saddle point problems

Ruichen Jiang and Aryan Mokhtari. Generalized optimistic methods for convex-concave saddle point problems. arXiv preprint arXiv:2202.09674 , 2022

work page arXiv 2022
[27]

High probability complexity bounds for line search based on stochastic oracles

Billy Jin, Katya Scheinberg, and Miaolan Xie. High probability complexity bounds for line search based on stochastic oracles. Advances in Neural Information Processing Systems , 34:9193–9203, 2021

work page 2021
[28]

Sample complexity analysis for adaptive optimization algorithms with stochastic oracles

Billy Jin, Katya Scheinberg, and Miaolan Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles. arXiv preprint arXiv:2303.06838 , 2023

work page arXiv 2023
[29]

Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization

YANG Junchi, Xiang Li, and Niao He. Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization. In Advances in Neural Information Processing Systems , 2022

work page 2022
[30]

An accelerated inexact proximal point method for solving nonconvex-concave min-max problems

Weiwei Kong and Renato DC Monteiro. An accelerated inexact proximal point method for solving nonconvex-concave min-max problems. SIAM Journal on Optimization , 31(4):2558–2585, 2021

work page 2021
[31]

A simpler approach to obtaining an o(1/t) convergence rate for the projected stochastic subgradient method, 2012

Simon Lacoste-Julien, Mark Schmidt, and Francis Bach. A simpler approach to obtaining an o(1/t) convergence rate for the projected stochastic subgradient method, 2012

work page 2012
[32]

Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems

Sucheol Lee and Donghwan Kim. Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems. Advances in Neural Information Processing Systems , 34, 2021

work page 2021
[33]

On convergence of gradient descent ascent: A tight local analysis

Haochuan Li, Farzan Farnia, Subhro Das, and Ali Jadbabaie. On convergence of gradient descent ascent: A tight local analysis. In International Conference on Machine Learning , pages 12717–12740. PMLR, 2022

work page 2022
[34]

Complexity lower bounds for nonconvex- strongly-concave min-max optimization

Haochuan Li, Yi Tian, Jingzhao Zhang, and Ali Jadbabaie. Complexity lower bounds for nonconvex- strongly-concave min-max optimization. arXiv preprint arXiv:2104.08708 , 2021

work page arXiv 2021
[35]

Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization

Xiang Li, Junchi Yang, and Niao He. Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization. arXiv preprint arXiv:2210.17478 , 2022

work page arXiv 2022
[36]

Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm

Yingying Li and Stanley Osher. Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm. Inverse Problems and Imaging , 3(3):487–503, 2009

work page 2009
[37]

Augmented lagrangian–based first-order methods for convex-constrained programs with weakly convex objective

Zichong Li and Yangyang Xu. Augmented lagrangian–based first-order methods for convex-constrained programs with weakly convex objective. INFORMS Journal on Optimization , 3(4):373–397, 2021

work page 2021
[38]

On gradient descent ascent for nonconvex-concave minimax problems

Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex-concave minimax problems. In International Conference on Machine Learning , pages 6083–6093. PMLR, 2020. 26

work page 2020
[39]

On gradient descent ascent for nonconvex-concave minimax problems

Tianyi Lin, Chi Jin, and Michael I Jordan. On gradient descent ascent for nonconvex-concave minimax problems. arXiv preprint arXiv:1906.00331 , 2019

work page arXiv 1906
[40]

Near-optimal algorithms for minimax optimization

Tianyi Lin, Chi Jin, and Michael I Jordan. Near-optimal algorithms for minimax optimization. In Conference on Learning Theory, pages 2738–2779. PMLR, 2020

work page 2020
[41]

Hybrid block successive approxima- tion for one-sided non-convex min-max problems: algorithms and applications

Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, and Yongxin Chen. Hybrid block successive approxima- tion for one-sided non-convex min-max problems: algorithms and applications. IEEE Transactions on Signal Processing, 68:3676–3691, 2020

work page 2020
[42]

Proximal extrapolated gradient methods for variational inequalities

Yu Malitsky. Proximal extrapolated gradient methods for variational inequalities. Optimization Methods and Software, 33(1):140–164, 2018

work page 2018
[43]

A first-order primal-dual algorithm with linesearch

Yura Malitsky and Thomas Pock. A first-order primal-dual algorithm with linesearch. SIAM Journal on Optimization, 28(1):411–432, 2018

work page 2018
[44]

Variance-reduced accelerated methods for decentral- ized stochastic double-regularized nonconvex strongly-concave minimax problems

Gabriel Mancino-Ball and Yangyang Xu. Variance-reduced accelerated methods for decentral- ized stochastic double-regularized nonconvex strongly-concave minimax problems. arXiv preprint arXiv:2307.07113, 2023

work page arXiv 2023
[45]

Block-cyclic stochastic coordinate descent for deep neural networks

Kensuke Nakamura, Stefano Soatto, and Byung-Woo Hong. Block-cyclic stochastic coordinate descent for deep neural networks. Neural Networks, 139:348–357, 2021

work page 2021
[46]

Deep double descent: Where bigger models and more data hurt

Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. Deep double descent: Where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment, 2021(12):124003, 2021

work page 2021
[47]

Stochastic gradient methods for distributionally robust opti- mization with f-divergences

Hongseok Namkoong and John C Duchi. Stochastic gradient methods for distributionally robust opti- mization with f-divergences. In Advances in Neural Information Processing Systems , pages 2208–2216, 2016

work page 2016
[48]

Efficiency of coordinate descent methods on huge-scale optimization problems

Yu Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization , 22(2):341–362, 2012

work page 2012
[49]

Solving a class of non-convex min-max games using iterative first order methods

Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D Lee, and Meisam Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. Advances in Neural Information Processing Systems, 32, 2019

work page 2019
[50]

Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems.SIAM Journal on Optimization, 31(4):2508– 2538, 2021

Dmitrii M Ostrovskii, Andrew Lowy, and Meisam Razaviyayn. Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems.SIAM Journal on Optimization, 31(4):2508– 2538, 2021

work page 2021
[51]

A Stochastic Line Search Method with Convergence Rate Analysis

Courtney Paquette and Katya Scheinberg. A stochastic line search method with convergence rate analysis. arXiv preprint arXiv:1807.07994 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[52]

Coordinate Friendly Structures, Algorithms and Applications

Zhimin Peng, Tianyu Wu, Yangyang Xu, Ming Yan, and Wotao Yin. Coordinate friendly structures, algorithms and applications. arXiv preprint arXiv:1601.00863 , 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[53]

Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems

Thomas Pethick, Puya Latafat, Panagiotis Patrinos, Olivier Fercoq, and Volkan Cevher. Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2302.09831, 2023

work page arXiv 2023
[54]

Non-convex min–max optimization: provable algorithms and applications in machine learning (2018)

H Rafique, M Liu, Q Lin, and T Yang. Non-convex min–max optimization: provable algorithms and applications in machine learning (2018). arXiv preprint arXiv:1810.02060 , 1810

work page arXiv 2018
[55]

Making gradient descent optimal for strongly convex stochastic optimization, 2012

Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. Making gradient descent optimal for strongly convex stochastic optimization, 2012

work page 2012
[56]

Scaled, inexact, and adaptive generalized fista for strongly convex optimization

Simone Rebegoldi and Luca Calatroni. Scaled, inexact, and adaptive generalized fista for strongly convex optimization. SIAM Journal on Optimization , 32(3):2428–2459, 2022. 27

work page 2022
[57]

Efficient serial and parallel coordinate descent methods for huge-scale truss topology design

Peter Richt´ arik and Martin Tak´ aˇ c. Efficient serial and parallel coordinate descent methods for huge-scale truss topology design. In Operations Research Proceedings 2011: Selected Papers of the International Conference on Operations Research (OR 2011), August 30-September 2, 2011, Zurich, Switzerland , pages 27–32. Springer, 2012

work page 2011
[58]

Randomized stochastic gradient descent ascent

Othmane Sebbouh, Marco Cuturi, and Gabriel Peyr´ e. Randomized stochastic gradient descent ascent. In International Conference on Artificial Intelligence and Statistics , pages 2941–2969. PMLR, 2022

work page 2022
[59]

Optimistic dual extrapolation for coherent non-monotone variational inequalities

Chaobing Song, Zhengyuan Zhou, Yichao Zhou, Yong Jiang, and Yi Ma. Optimistic dual extrapolation for coherent non-monotone variational inequalities. Advances in Neural Information Processing Systems, 33:14303–14314, 2020

work page 2020
[60]

Dominosearch: Find layer-wise fine-grained n:m sparse schemes from dense neural networks

Wei Sun, Aojun Zhou, Sander Stuijk, Rob Wijnhoven, Andrew Oakleigh Nelson, hongsheng Li, and Henk Corporaal. Dominosearch: Find layer-wise fine-grained n:m sparse schemes from dense neural networks. In Advances in Neural Information Processing Systems, volume 34, pages 20721–20732, 2021

work page 2021
[61]

Coordinate descent algorithms for lasso penalized regression

Tong Tong Wu and Kenneth Lange. Coordinate descent algorithms for lasso penalized regression. 2008

work page 2008
[62]

Dscovr: Randomized primal-dual block coordinate algorithms for asynchronous distributed optimization

Lin Xiao, Adams Wei Yu, Qihang Lin, and Weizhu Chen. Dscovr: Randomized primal-dual block coordinate algorithms for asynchronous distributed optimization. J. Mach. Learn. Res. , 20:43–1, 2019

work page 2019
[63]

Robust regression and lasso

Huan Xu, Constantine Caramanis, and Shie Mannor. Robust regression and lasso. Advances in neural information processing systems, 21, 2008

work page 2008
[64]

A unified single-loop alternating gradient pro- jection algorithm for nonconvex–concave and convex–nonconcave minimax problems

Zi Xu, Huiling Zhang, Yang Xu, and Guanghui Lan. A unified single-loop alternating gradient pro- jection algorithm for nonconvex–concave and convex–nonconcave minimax problems. Mathematical Programming, pages 1–72, 2023

work page 2023
[65]

Faster single-loop algorithms for mini- max optimization without strong concavity

Junchi Yang, Antonio Orvieto, Aurelien Lucchi, and Niao He. Faster single-loop algorithms for mini- max optimization without strong concavity. In International Conference on Artificial Intelligence and Statistics, pages 5485–5517. PMLR, 2022

work page 2022
[66]

Accelerated algorithms for smooth convex-concave minimax problems with O(1/k2) rate on squared gradient norm

TaeHo Yoon and Ernest K Ryu. Accelerated algorithms for smooth convex-concave minimax problems with O(1/k2) rate on squared gradient norm. In International Conference on Machine Learning , pages 12098–12109. PMLR, 2021

work page 2021
[67]

A single-loop smoothed gradient descent- ascent algorithm for nonconvex-concave min-max problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, and Zhiquan Luo. A single-loop smoothed gradient descent- ascent algorithm for nonconvex-concave min-max problems. Advances in neural information processing systems, 33:7377–7389, 2020

work page 2020
[68]

The complexity of nonconvex-strongly-concave minimax optimization

Siqi Zhang, Junchi Yang, Crist´ obal Guzm´ an, Negar Kiyavash, and Niao He. The complexity of nonconvex-strongly-concave minimax optimization. In Uncertainty in Artificial Intelligence, pages 482–

work page
[69]

Robust accelerated primal-dual methods for computing saddle points

Xuan Zhang, Necdet Serhat Aybat, and Mert G¨ urb¨ uzbalaban. Robust accelerated primal-dual methods for computing saddle points. arXiv preprint arXiv:2111.12743 , 2021

work page arXiv 2021
[70]

Sapd+: An accelerated stochastic method for nonconvex-concave minimax problems

Xuan Zhang, Necdet Serhat Aybat, and Mert Gurbuzbalaban. Sapd+: An accelerated stochastic method for nonconvex-concave minimax problems. arXiv preprint arXiv:2205.15084 , 2022

work page arXiv 2022
[71]

Jointly improving the sam- ple and communication complexities in decentralized stochastic minimax optimization

Xuan Zhang, Gabriel Mancino-Ball, Necdet Serhat Aybat, and Yangyang Xu. Jointly improving the sam- ple and communication complexities in decentralized stochastic minimax optimization. arXiv preprint arXiv:2307.09421, 2023. 28 A SGDA-B with Gauss-Seidel Updates In this section, we extend our results to SGDA-B with Gauss-Seidel updates, i.e., in line 8 of S...

work page arXiv 2023
[72]

when 2 α − β ≥ 1, since β > 0, we can conclude that α > 1

work page
[73]

As a result, ( c1c4) 2 1−α ≥ (c1c4)4 = Θ(κ20); therefore, C1= Ω κ20

work page
[74]

when 2 α − β < 1, it implies α − β < 1−β 2 < 1

work page
[75]

13In [35], there are typos in the definition of c1 and c3; vy t0 appearing in c1 and c3 should be vy 0

As a result, we can bound C1 from below as follows: (c1c4) 1 α−β ≥ (c1c2)2 = Θ(κ10); therefore, C1 = Ω κ10 . 13In [35], there are typos in the definition of c1 and c3; vy t0 appearing in c1 and c3 should be vy 0. 33 Therefore, using (63) and C ≥ C1, we can conclude that for any ϵ > 0, the gradient complexity of TiAda forPK−1 k=0 ∥∇xf(xk, yk)∥2 ≤ ϵ2 to hol...

work page
[76]

+ L 2 − L, it is guaranteed that the HiBSA iterate sequence satisfies the following bound for all k ≥ 0: c1∥yk+1 − yk∥2 + c2 NX i=1 ∥xk+1 i − xk i ∥2 ≤ P k − P k+1, (69) where c1 ≜ 4( 1 ρ − L2 2µ ) − 7 2ρ > 0 and c2 ≜ β + L − L 2 − L2( 2 µ2ρ + ρ

work page
[77]

Furthermore, from the proof of [41, Theorem 1], for all k ≥ 0 we get ∥Gxi(xk, yk)∥ ≤ (β + 2L)∥xk+1 i − xk i ∥, i = 1,

> 0, and P k ≜ L(xk, yk) + 2 ρ2µ + 1 2ρ − 4( 1 ρ − L2 2µ ) ∥yk − yk−1∥2, which is the value of the potential function at iteration k ≥ 0 –here, c1, c2 > 0 follows from the conditions on ρ and β. Furthermore, from the proof of [41, Theorem 1], for all k ≥ 0 we get ∥Gxi(xk, yk)∥ ≤ (β + 2L)∥xk+1 i − xk i ∥, i = 1, . . . , N, ∥Gy(xk, yk)∥ ≤ L∥xk+1 − xk∥ + 1 ρ...

work page
[78]

Basically, for given ϵ > 0, (xϵ, yϵ) is ϵ-stationary (in the sense of the (M4) metric) if ∃u ∈ ∇ xf(xϵ, yϵ) + ∂g(xϵ), ∃v ∈ −∇ yf(xϵ, yϵ) + ∂h(yϵ) : max {∥u∥, ∥v∥} ≤ ϵ

adopts the metric (M4) for characterizing ϵ-stationarity; the metric (M4) and its connections to some other convergence metrics is discussed in Section C. Basically, for given ϵ > 0, (xϵ, yϵ) is ϵ-stationary (in the sense of the (M4) metric) if ∃u ∈ ∇ xf(xϵ, yϵ) + ∂g(xϵ), ∃v ∈ −∇ yf(xϵ, yϵ) + ∂h(yϵ) : max {∥u∥, ∥v∥} ≤ ϵ. (73) For convenience of the reader...

work page
[79]

[67, Theorem 3.4], α = O(1/L), β = O min n 1√ K , 1 L o –see also [67, page 29]

work page
[80]

[67, Lemma B.2, page 19], σ1 = O(1), σ2 = O(1), σ3 = O(1)

work page

Showing first 80 references.

[1] [1]

Sifting through the noise: Universal first-order methods for stochastic variational inequalities

Kimon Antonakopoulos, Thomas Pethick, Ali Kavis, Panayotis Mertikopoulos, and Volkan Cevher. Sifting through the noise: Universal first-order methods for stochastic variational inequalities. Advances in Neural Information Processing Systems , 34:13099–13111, 2021. 12https://www.csie.ntu.edu.tw/ cjlin/libsvmtools/datasets/binary.html 24

work page 2021

[2] [2]

Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems

Radu Ioan Bot ¸ and Axel B¨ ohm. Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems. arXiv preprint arXiv:2007.13605 , 2020

work page arXiv 2007

[3] [3]

Optimization methods for large-scale machine learn- ing

L´ eon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learn- ing. Siam Review, 60(2):223–311, 2018

work page 2018

[4] [4]

Backtracking strategies for accelerated descent methods with smooth composite objectives

Luca Calatroni and Antonin Chambolle. Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM Journal on Optimization , 29(3):1772–1798, 2019

work page 2019

[5] [5]

Lower bounds for finding stationary points i

Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. Lower bounds for finding stationary points i. Mathematical Programming, 184(1-2):71–120, 2020

work page 2020

[6] [6]

A first-order primal-dual algorithm for convex problems with applications to imaging

Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Math. Imaging and Vision , 40(1):120–145, 2011

work page 2011

[7] [7]

Coordinate descent method for large-scale l2-loss linear support vector machines

Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. Coordinate descent method for large-scale l2-loss linear support vector machines. Journal of Machine Learning Research , 9(7), 2008

work page 2008

[8] [8]

Faster stochastic algorithms for minimax optimization under polyak-{\L} ojasiewicz condition

Lesi Chen, Boyuan Yao, and Luo Luo. Faster stochastic algorithms for minimax optimization under polyak-{\L} ojasiewicz condition. Advances in Neural Information Processing Systems, 35:13921–13932, 2022

work page 2022

[9] [9]

Accelerated proximal alternating gradient-descent-ascent for nonconvex minimax machine learning

Ziyi Chen, Shaocong Ma, and Yi Zhou. Accelerated proximal alternating gradient-descent-ascent for nonconvex minimax machine learning. In 2022 IEEE International Symposium on Information Theory (ISIT), pages 672–677. IEEE, 2022

work page 2022

[10] [10]

Stochastic and adversarial online learning without hyperpa- rameters

Ashok Cutkosky and Kwabena A Boahen. Stochastic and adversarial online learning without hyperpa- rameters. In Advances in Neural Information Processing Systems , volume 30, 2017

work page 2017

[11] [11]

On the convergence properties of non-euclidean extragradient meth- ods for variational inequalities with generalized monotone operators

Cong D Dang and Guanghui Lan. On the convergence properties of non-euclidean extragradient meth- ods for variational inequalities with generalized monotone operators. Computational Optimization and applications, 60(2):277–310, 2015

work page 2015

[12] [12]

Efficient methods for structured nonconvex-nonconcave min-max optimization

Jelena Diakonikolas, Constantinos Daskalakis, and Michael I Jordan. Efficient methods for structured nonconvex-nonconcave min-max optimization. In International Conference on Artificial Intelligence and Statistics, pages 2746–2754. PMLR, 2021

work page 2021

[13] [13]

Adaptive subgradient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research , 12(7), 2011

work page 2011

[14] [14]

Gradient method with inexact oracle for composite non-convex optimization, 2017

Pavel Dvurechensky. Gradient method with inexact oracle for composite non-convex optimization, 2017

work page 2017

[15] [15]

A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions

Olivier Fercoq and Pascal Bianchi. A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions. SIAM Journal on Optimization , 29(1):100–134, 2019

work page 2019

[16] [16]

Understanding the difficulty of training deep feedforward neural networks

Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256. JMLR Workshop and Conference Proceedings, 2010

work page 2010

[17] [17]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in neural information processing systems, pages 2672–2680, 2014

work page 2014

[18] [18]

The landscape of the proximal point method for nonconvex–nonconcave minimax optimization

Benjamin Grimmer, Haihao Lu, Pratik Worah, and Vahab Mirrokni. The landscape of the proximal point method for nonconvex–nonconcave minimax optimization. Mathematical Programming, pages 1–35, 2022

work page 2022

[19] [19]

A stochastic subgradient method for distributionally robust non-convex learning

Mert G¨ urb¨ uzbalaban, Andrzej Ruszczy´ nski, and Landi Zhu. A stochastic subgradient method for distributionally robust non-convex learning. arXiv preprint arXiv:2006.04873 , 2020. 25

work page arXiv 2006

[20] [20]

Randomized primal-dual methods with adaptive step sizes

E Yazdandoost Hamedani, A Jalilzadeh, and NS Aybat. Randomized primal-dual methods with adaptive step sizes. arXiv preprint arXiv:1806.04118, accepted to AISTATS 2023 , 2022

work page arXiv 2023

[21] [21]

A primal-dual algorithm with line search for general convex-concave saddle point problems

Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems. SIAM Journal on Optimization , 31(2):1299–1329, 2021

work page 2021

[22] [22]

Accelerated zeroth-order and first-order mo- mentum methods from mini to minimax optimization

Feihu Huang, Shangqian Gao, Jian Pei, and Heng Huang. Accelerated zeroth-order and first-order mo- mentum methods from mini to minimax optimization. Journal of Machine Learning Research, 23(36):1– 70, 2022

work page 2022

[23] [23]

Efficient mirror descent ascent methods for nonsmooth minimax problems

Feihu Huang, Xidong Wu, and Heng Huang. Efficient mirror descent ascent methods for nonsmooth minimax problems. Advances in Neural Information Processing Systems , 34, 2021

work page 2021

[24] [24]

Variance-based extragra- dient methods with line search for stochastic variational inequalities

Alfredo N Iusem, Alejandro Jofr´ e, Roberto I Oliveira, and Philip Thompson. Variance-based extragra- dient methods with line search for stochastic variational inequalities. SIAM Journal on Optimization , 29(1):175–206, 2019

work page 2019

[25] [25]

A doubly-randomized block- coordinate primal-dual method for large-scale saddle point problems

Afrooz Jalilzadeh, Erfan Yazdandoost Hamedani, and Necdet S Aybat. A doubly-randomized block- coordinate primal-dual method for large-scale saddle point problems. arXiv preprint arXiv:1907.03886, 2019

work page arXiv 1907

[26] [26]

Generalized optimistic methods for convex-concave saddle point problems

Ruichen Jiang and Aryan Mokhtari. Generalized optimistic methods for convex-concave saddle point problems. arXiv preprint arXiv:2202.09674 , 2022

work page arXiv 2022

[27] [27]

High probability complexity bounds for line search based on stochastic oracles

Billy Jin, Katya Scheinberg, and Miaolan Xie. High probability complexity bounds for line search based on stochastic oracles. Advances in Neural Information Processing Systems , 34:9193–9203, 2021

work page 2021

[28] [28]

Sample complexity analysis for adaptive optimization algorithms with stochastic oracles

Billy Jin, Katya Scheinberg, and Miaolan Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles. arXiv preprint arXiv:2303.06838 , 2023

work page arXiv 2023

[29] [29]

Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization

YANG Junchi, Xiang Li, and Niao He. Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization. In Advances in Neural Information Processing Systems , 2022

work page 2022

[30] [30]

An accelerated inexact proximal point method for solving nonconvex-concave min-max problems

Weiwei Kong and Renato DC Monteiro. An accelerated inexact proximal point method for solving nonconvex-concave min-max problems. SIAM Journal on Optimization , 31(4):2558–2585, 2021

work page 2021

[31] [31]

A simpler approach to obtaining an o(1/t) convergence rate for the projected stochastic subgradient method, 2012

Simon Lacoste-Julien, Mark Schmidt, and Francis Bach. A simpler approach to obtaining an o(1/t) convergence rate for the projected stochastic subgradient method, 2012

work page 2012

[32] [32]

Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems

Sucheol Lee and Donghwan Kim. Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems. Advances in Neural Information Processing Systems , 34, 2021

work page 2021

[33] [33]

On convergence of gradient descent ascent: A tight local analysis

Haochuan Li, Farzan Farnia, Subhro Das, and Ali Jadbabaie. On convergence of gradient descent ascent: A tight local analysis. In International Conference on Machine Learning , pages 12717–12740. PMLR, 2022

work page 2022

[34] [34]

Complexity lower bounds for nonconvex- strongly-concave min-max optimization

Haochuan Li, Yi Tian, Jingzhao Zhang, and Ali Jadbabaie. Complexity lower bounds for nonconvex- strongly-concave min-max optimization. arXiv preprint arXiv:2104.08708 , 2021

work page arXiv 2021

[35] [35]

Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization

Xiang Li, Junchi Yang, and Niao He. Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization. arXiv preprint arXiv:2210.17478 , 2022

work page arXiv 2022

[36] [36]

Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm

Yingying Li and Stanley Osher. Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm. Inverse Problems and Imaging , 3(3):487–503, 2009

work page 2009

[37] [37]

Augmented lagrangian–based first-order methods for convex-constrained programs with weakly convex objective

Zichong Li and Yangyang Xu. Augmented lagrangian–based first-order methods for convex-constrained programs with weakly convex objective. INFORMS Journal on Optimization , 3(4):373–397, 2021

work page 2021

[38] [38]

On gradient descent ascent for nonconvex-concave minimax problems

Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex-concave minimax problems. In International Conference on Machine Learning , pages 6083–6093. PMLR, 2020. 26

work page 2020

[39] [39]

On gradient descent ascent for nonconvex-concave minimax problems

Tianyi Lin, Chi Jin, and Michael I Jordan. On gradient descent ascent for nonconvex-concave minimax problems. arXiv preprint arXiv:1906.00331 , 2019

work page arXiv 1906

[40] [40]

Near-optimal algorithms for minimax optimization

Tianyi Lin, Chi Jin, and Michael I Jordan. Near-optimal algorithms for minimax optimization. In Conference on Learning Theory, pages 2738–2779. PMLR, 2020

work page 2020

[41] [41]

Hybrid block successive approxima- tion for one-sided non-convex min-max problems: algorithms and applications

Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, and Yongxin Chen. Hybrid block successive approxima- tion for one-sided non-convex min-max problems: algorithms and applications. IEEE Transactions on Signal Processing, 68:3676–3691, 2020

work page 2020

[42] [42]

Proximal extrapolated gradient methods for variational inequalities

Yu Malitsky. Proximal extrapolated gradient methods for variational inequalities. Optimization Methods and Software, 33(1):140–164, 2018

work page 2018

[43] [43]

A first-order primal-dual algorithm with linesearch

Yura Malitsky and Thomas Pock. A first-order primal-dual algorithm with linesearch. SIAM Journal on Optimization, 28(1):411–432, 2018

work page 2018

[44] [44]

Variance-reduced accelerated methods for decentral- ized stochastic double-regularized nonconvex strongly-concave minimax problems

Gabriel Mancino-Ball and Yangyang Xu. Variance-reduced accelerated methods for decentral- ized stochastic double-regularized nonconvex strongly-concave minimax problems. arXiv preprint arXiv:2307.07113, 2023

work page arXiv 2023

[45] [45]

Block-cyclic stochastic coordinate descent for deep neural networks

Kensuke Nakamura, Stefano Soatto, and Byung-Woo Hong. Block-cyclic stochastic coordinate descent for deep neural networks. Neural Networks, 139:348–357, 2021

work page 2021

[46] [46]

Deep double descent: Where bigger models and more data hurt

Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. Deep double descent: Where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment, 2021(12):124003, 2021

work page 2021

[47] [47]

Stochastic gradient methods for distributionally robust opti- mization with f-divergences

Hongseok Namkoong and John C Duchi. Stochastic gradient methods for distributionally robust opti- mization with f-divergences. In Advances in Neural Information Processing Systems , pages 2208–2216, 2016

work page 2016

[48] [48]

Efficiency of coordinate descent methods on huge-scale optimization problems

Yu Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization , 22(2):341–362, 2012

work page 2012

[49] [49]

Solving a class of non-convex min-max games using iterative first order methods

Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D Lee, and Meisam Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. Advances in Neural Information Processing Systems, 32, 2019

work page 2019

[50] [50]

Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems.SIAM Journal on Optimization, 31(4):2508– 2538, 2021

Dmitrii M Ostrovskii, Andrew Lowy, and Meisam Razaviyayn. Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems.SIAM Journal on Optimization, 31(4):2508– 2538, 2021

work page 2021

[51] [51]

A Stochastic Line Search Method with Convergence Rate Analysis

Courtney Paquette and Katya Scheinberg. A stochastic line search method with convergence rate analysis. arXiv preprint arXiv:1807.07994 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[52] [52]

Coordinate Friendly Structures, Algorithms and Applications

Zhimin Peng, Tianyu Wu, Yangyang Xu, Ming Yan, and Wotao Yin. Coordinate friendly structures, algorithms and applications. arXiv preprint arXiv:1601.00863 , 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[53] [53]

Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems

Thomas Pethick, Puya Latafat, Panagiotis Patrinos, Olivier Fercoq, and Volkan Cevher. Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2302.09831, 2023

work page arXiv 2023

[54] [54]

Non-convex min–max optimization: provable algorithms and applications in machine learning (2018)

H Rafique, M Liu, Q Lin, and T Yang. Non-convex min–max optimization: provable algorithms and applications in machine learning (2018). arXiv preprint arXiv:1810.02060 , 1810

work page arXiv 2018

[55] [55]

Making gradient descent optimal for strongly convex stochastic optimization, 2012

Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. Making gradient descent optimal for strongly convex stochastic optimization, 2012

work page 2012

[56] [56]

Scaled, inexact, and adaptive generalized fista for strongly convex optimization

Simone Rebegoldi and Luca Calatroni. Scaled, inexact, and adaptive generalized fista for strongly convex optimization. SIAM Journal on Optimization , 32(3):2428–2459, 2022. 27

work page 2022

[57] [57]

Efficient serial and parallel coordinate descent methods for huge-scale truss topology design

Peter Richt´ arik and Martin Tak´ aˇ c. Efficient serial and parallel coordinate descent methods for huge-scale truss topology design. In Operations Research Proceedings 2011: Selected Papers of the International Conference on Operations Research (OR 2011), August 30-September 2, 2011, Zurich, Switzerland , pages 27–32. Springer, 2012

work page 2011

[58] [58]

Randomized stochastic gradient descent ascent

Othmane Sebbouh, Marco Cuturi, and Gabriel Peyr´ e. Randomized stochastic gradient descent ascent. In International Conference on Artificial Intelligence and Statistics , pages 2941–2969. PMLR, 2022

work page 2022

[59] [59]

Optimistic dual extrapolation for coherent non-monotone variational inequalities

Chaobing Song, Zhengyuan Zhou, Yichao Zhou, Yong Jiang, and Yi Ma. Optimistic dual extrapolation for coherent non-monotone variational inequalities. Advances in Neural Information Processing Systems, 33:14303–14314, 2020

work page 2020

[60] [60]

Dominosearch: Find layer-wise fine-grained n:m sparse schemes from dense neural networks

Wei Sun, Aojun Zhou, Sander Stuijk, Rob Wijnhoven, Andrew Oakleigh Nelson, hongsheng Li, and Henk Corporaal. Dominosearch: Find layer-wise fine-grained n:m sparse schemes from dense neural networks. In Advances in Neural Information Processing Systems, volume 34, pages 20721–20732, 2021

work page 2021

[61] [61]

Coordinate descent algorithms for lasso penalized regression

Tong Tong Wu and Kenneth Lange. Coordinate descent algorithms for lasso penalized regression. 2008

work page 2008

[62] [62]

Dscovr: Randomized primal-dual block coordinate algorithms for asynchronous distributed optimization

Lin Xiao, Adams Wei Yu, Qihang Lin, and Weizhu Chen. Dscovr: Randomized primal-dual block coordinate algorithms for asynchronous distributed optimization. J. Mach. Learn. Res. , 20:43–1, 2019

work page 2019

[63] [63]

Robust regression and lasso

Huan Xu, Constantine Caramanis, and Shie Mannor. Robust regression and lasso. Advances in neural information processing systems, 21, 2008

work page 2008

[64] [64]

A unified single-loop alternating gradient pro- jection algorithm for nonconvex–concave and convex–nonconcave minimax problems

Zi Xu, Huiling Zhang, Yang Xu, and Guanghui Lan. A unified single-loop alternating gradient pro- jection algorithm for nonconvex–concave and convex–nonconcave minimax problems. Mathematical Programming, pages 1–72, 2023

work page 2023

[65] [65]

Faster single-loop algorithms for mini- max optimization without strong concavity

Junchi Yang, Antonio Orvieto, Aurelien Lucchi, and Niao He. Faster single-loop algorithms for mini- max optimization without strong concavity. In International Conference on Artificial Intelligence and Statistics, pages 5485–5517. PMLR, 2022

work page 2022

[66] [66]

Accelerated algorithms for smooth convex-concave minimax problems with O(1/k2) rate on squared gradient norm

TaeHo Yoon and Ernest K Ryu. Accelerated algorithms for smooth convex-concave minimax problems with O(1/k2) rate on squared gradient norm. In International Conference on Machine Learning , pages 12098–12109. PMLR, 2021

work page 2021

[67] [67]

A single-loop smoothed gradient descent- ascent algorithm for nonconvex-concave min-max problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, and Zhiquan Luo. A single-loop smoothed gradient descent- ascent algorithm for nonconvex-concave min-max problems. Advances in neural information processing systems, 33:7377–7389, 2020

work page 2020

[68] [68]

The complexity of nonconvex-strongly-concave minimax optimization

Siqi Zhang, Junchi Yang, Crist´ obal Guzm´ an, Negar Kiyavash, and Niao He. The complexity of nonconvex-strongly-concave minimax optimization. In Uncertainty in Artificial Intelligence, pages 482–

work page

[69] [69]

Robust accelerated primal-dual methods for computing saddle points

Xuan Zhang, Necdet Serhat Aybat, and Mert G¨ urb¨ uzbalaban. Robust accelerated primal-dual methods for computing saddle points. arXiv preprint arXiv:2111.12743 , 2021

work page arXiv 2021

[70] [70]

Sapd+: An accelerated stochastic method for nonconvex-concave minimax problems

Xuan Zhang, Necdet Serhat Aybat, and Mert Gurbuzbalaban. Sapd+: An accelerated stochastic method for nonconvex-concave minimax problems. arXiv preprint arXiv:2205.15084 , 2022

work page arXiv 2022

[71] [71]

Jointly improving the sam- ple and communication complexities in decentralized stochastic minimax optimization

Xuan Zhang, Gabriel Mancino-Ball, Necdet Serhat Aybat, and Yangyang Xu. Jointly improving the sam- ple and communication complexities in decentralized stochastic minimax optimization. arXiv preprint arXiv:2307.09421, 2023. 28 A SGDA-B with Gauss-Seidel Updates In this section, we extend our results to SGDA-B with Gauss-Seidel updates, i.e., in line 8 of S...

work page arXiv 2023

[72] [72]

when 2 α − β ≥ 1, since β > 0, we can conclude that α > 1

work page

[73] [73]

As a result, ( c1c4) 2 1−α ≥ (c1c4)4 = Θ(κ20); therefore, C1= Ω κ20

work page

[74] [74]

when 2 α − β < 1, it implies α − β < 1−β 2 < 1

work page

[75] [75]

13In [35], there are typos in the definition of c1 and c3; vy t0 appearing in c1 and c3 should be vy 0

As a result, we can bound C1 from below as follows: (c1c4) 1 α−β ≥ (c1c2)2 = Θ(κ10); therefore, C1 = Ω κ10 . 13In [35], there are typos in the definition of c1 and c3; vy t0 appearing in c1 and c3 should be vy 0. 33 Therefore, using (63) and C ≥ C1, we can conclude that for any ϵ > 0, the gradient complexity of TiAda forPK−1 k=0 ∥∇xf(xk, yk)∥2 ≤ ϵ2 to hol...

work page

[76] [76]

+ L 2 − L, it is guaranteed that the HiBSA iterate sequence satisfies the following bound for all k ≥ 0: c1∥yk+1 − yk∥2 + c2 NX i=1 ∥xk+1 i − xk i ∥2 ≤ P k − P k+1, (69) where c1 ≜ 4( 1 ρ − L2 2µ ) − 7 2ρ > 0 and c2 ≜ β + L − L 2 − L2( 2 µ2ρ + ρ

work page

[77] [77]

Furthermore, from the proof of [41, Theorem 1], for all k ≥ 0 we get ∥Gxi(xk, yk)∥ ≤ (β + 2L)∥xk+1 i − xk i ∥, i = 1,

> 0, and P k ≜ L(xk, yk) + 2 ρ2µ + 1 2ρ − 4( 1 ρ − L2 2µ ) ∥yk − yk−1∥2, which is the value of the potential function at iteration k ≥ 0 –here, c1, c2 > 0 follows from the conditions on ρ and β. Furthermore, from the proof of [41, Theorem 1], for all k ≥ 0 we get ∥Gxi(xk, yk)∥ ≤ (β + 2L)∥xk+1 i − xk i ∥, i = 1, . . . , N, ∥Gy(xk, yk)∥ ≤ L∥xk+1 − xk∥ + 1 ρ...

work page

[78] [78]

Basically, for given ϵ > 0, (xϵ, yϵ) is ϵ-stationary (in the sense of the (M4) metric) if ∃u ∈ ∇ xf(xϵ, yϵ) + ∂g(xϵ), ∃v ∈ −∇ yf(xϵ, yϵ) + ∂h(yϵ) : max {∥u∥, ∥v∥} ≤ ϵ

adopts the metric (M4) for characterizing ϵ-stationarity; the metric (M4) and its connections to some other convergence metrics is discussed in Section C. Basically, for given ϵ > 0, (xϵ, yϵ) is ϵ-stationary (in the sense of the (M4) metric) if ∃u ∈ ∇ xf(xϵ, yϵ) + ∂g(xϵ), ∃v ∈ −∇ yf(xϵ, yϵ) + ∂h(yϵ) : max {∥u∥, ∥v∥} ≤ ϵ. (73) For convenience of the reader...

work page

[79] [79]

[67, Theorem 3.4], α = O(1/L), β = O min n 1√ K , 1 L o –see also [67, page 29]

work page

[80] [80]

[67, Lemma B.2, page 19], σ1 = O(1), σ2 = O(1), σ3 = O(1)

work page