A Stochastic GDA Method With Backtracking For Solving Nonconvex Concave Minimax Problems
Pith reviewed 2026-05-24 02:38 UTC · model grok-4.3
The pith
SGDA-B, a stochastic GDA method with backtracking, solves nonconvex-concave minimax problems without knowing the smoothness constant L, strong concavity parameter mu, or gradient noise bound sigma squared.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SGDA-B is a stochastic GDA method with backtracking that solves NCC minimax problems of the form min max sum g_i(x_i) + f(x,y) - h(y). In the deterministic setting it computes an epsilon-stationary point within O(L kappa^2 / epsilon^2) gradient calls when mu>0 and O(L^3/epsilon^4) when mu=0. In the stochastic setting it achieves the same with high probability using O(L kappa^3 epsilon^{-4} log^2(1/p)) and tilde O(L^4 epsilon^{-7} log^2(1/p)) oracle calls respectively. It is the first GDA-type method with backtracking to solve such problems while remaining agnostic to L, mu and sigma^2.
What carries the argument
SGDA-B: a stochastic gradient descent-ascent algorithm that employs backtracking line search to adaptively select step sizes for the x and y updates without prior knowledge of problem parameters.
Load-bearing premise
The problem admits an unbiased stochastic oracle for the gradient of f with finite variance bound sigma squared, and f is L-smooth with f(x,·) mu-strongly concave.
What would settle it
A concrete nonconvex-concave problem instance together with its stochastic oracle where SGDA-B with the stated backtracking rule fails to reach an epsilon-stationary point within the claimed number of oracle calls.
Figures
read the original abstract
We propose a stochastic GDA (gradient descent ascent) method with backtracking (SGDA-B) to solve nonconvex-concave (NCC) minimax problems of the form: $\min_{\mathbf{x}} \max_y \sum_{i=1}^N g_i(x_i)+f(\mathbf{x},y)-h(y)$, where $h$ and $g_i$ for $i=1,\cdots,N$ are closed, convex functions, and for some $L,\mu\geq 0$, $f$ is $L$-smooth and $f(\mathbf{x},\cdot)$ is $\mu$-strongly concave for all $\mathbf{x}$ in the problem domain. We consider the stochastic setting where one only has an access to an unbiased stochastic oracle of $\nabla f$ with a finite variance bound $\sigma^2$. While most of the existing methods assume knowledge of $L$, $\mu$ and/or $\sigma^2$, SGDA-B is agnostic to all of these problem parameters. Moreover, SGDA-B can support random block-coordinate updates. In the deterministic setting, i.e., $\sigma^2=0$ and one can compute $\nabla f$ exactly, SGDA-B can compute an $\epsilon$-stationary point within $\mathcal{O}(L\kappa^2/\epsilon^2)$ and $\mathcal{O}(L^3/\epsilon^4)$ gradient calls when $\mu>0$ and $\mu=0$, respectively, where $\kappa\triangleq L/\mu$. In the stochastic setting, i.e., $\sigma^2>0$, for any $p\in(0,1)$ and $\epsilon>0$, it can compute an $\epsilon$-stationary point with high probability, which requires $\mathcal{O}(L \kappa^3 \epsilon^{-4} \log^2(1/p))$ and $\tilde{\mathcal{O}}(L^4\epsilon^{-7}\log^2(1/p))$ stochastic oracle calls, with probability at least $1-p$, when $\mu>0$ and $\mu=0$, respectively. To our knowledge, SGDA-B is the first GDA-type method with backtracking to solve NCC minimax problems and achieves the best complexity among the methods that are agnostic to $L$, $\mu$ and $\sigma^2$. We also provide numerical results for SGDA-B on a distributionally robust learning problem illustrating the potential performance gains that can be achieved by SGDA-B.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SGDA-B, a stochastic gradient descent-ascent method with backtracking for nonconvex-concave minimax problems min_x max_y ∑ g_i(x_i) + f(x,y) - h(y), where f is L-smooth and μ-strongly concave in y. The algorithm is agnostic to L, μ, and σ², supports random block-coordinate updates, and provides complexity guarantees for ε-stationary points: O(L κ²/ε²) deterministic (μ>0), O(L³/ε⁴) deterministic (μ=0), and high-probability stochastic bounds O(L κ³ ε^{-4} log²(1/p)) (μ>0) and Õ(L⁴ ε^{-7} log²(1/p)) (μ=0). It claims to be the first backtracking GDA for NCC problems and to achieve the best complexity among methods agnostic to all three parameters. Numerical results on a distributionally robust learning task are included.
Significance. If the analysis holds, the work would be a meaningful contribution by introducing the first backtracking GDA variant for NCC minimax problems while remaining fully parameter-agnostic and delivering competitive high-probability complexity. The support for block-coordinate updates and the explicit high-probability stochastic guarantees are strengths that could aid practical deployment where L, μ, and σ² are unknown.
major comments (2)
- [stochastic complexity theorem and backtracking termination argument] The stochastic backtracking analysis (see the proof of the main high-probability complexity theorem): the argument that the inner line-search loop terminates after O(1) or O(log(1/δ)) trials with high probability δ, uniformly over unknown σ², is not visible in the stated bound O(L κ³ ε^{-4} log²(1/p)). With only a finite-variance assumption on the stochastic oracle, the acceptance probability for any fixed candidate stepsize can be arbitrarily small, so an explicit high-probability bound on the number of backtracks per outer iteration (independent of σ²) is required to justify the claimed oracle complexity.
- [abstract and stochastic complexity theorem] Abstract and main stochastic theorem: the complexity expressions absorb only outer-loop concentration via log²(1/p); no separate accounting or restart mechanism is shown for the cumulative failure probability across all inner backtracking steps, which could invalidate the overall 1-p success probability when the per-step acceptance probability depends on the unknown variance.
minor comments (2)
- [abstract and introduction] The definition of κ ≜ L/μ is used throughout, but the μ=0 case is handled by a separate bound; a brief remark clarifying how the analysis switches between the two regimes would improve readability.
- [numerical results] Numerical experiments section: reporting results from multiple independent runs with variability measures (e.g., standard deviation) would strengthen the empirical claims.
Simulated Author's Rebuttal
We thank the referee for the thorough review and insightful comments on our work. We address each major comment below.
read point-by-point responses
-
Referee: [stochastic complexity theorem and backtracking termination argument] The stochastic backtracking analysis (see the proof of the main high-probability complexity theorem): the argument that the inner line-search loop terminates after O(1) or O(log(1/δ)) trials with high probability δ, uniformly over unknown σ², is not visible in the stated bound O(L κ³ ε^{-4} log²(1/p)). With only a finite-variance assumption on the stochastic oracle, the acceptance probability for any fixed candidate stepsize can be arbitrarily small, so an explicit high-probability bound on the number of backtracks per outer iteration (independent of σ²) is required to justify the claimed oracle complexity.
Authors: We appreciate the referee's careful scrutiny of the stochastic analysis. The proof of the main high-probability complexity theorem does establish a high-probability bound on the termination of the inner line-search loop. Specifically, we show that for a suitably chosen sequence of candidate stepsizes, the acceptance probability is at least 1 - δ for δ chosen small enough, and this lower bound is uniform in σ² because the backtracking condition is a deterministic inequality involving the stochastic gradient that holds whenever the noise is controlled, and the control is achieved via Markov's inequality or similar, allowing δ to be set independently of σ² by adjusting the threshold. The resulting O(log(1/δ)) bound per outer iteration is incorporated into the overall complexity through the log²(1/p) term. We will revise the manuscript to highlight this argument more prominently, perhaps by adding a supporting lemma. revision: yes
-
Referee: [abstract and stochastic complexity theorem] Abstract and main stochastic theorem: the complexity expressions absorb only outer-loop concentration via log²(1/p); no separate accounting or restart mechanism is shown for the cumulative failure probability across all inner backtracking steps, which could invalidate the overall 1-p success probability when the per-step acceptance probability depends on the unknown variance.
Authors: Regarding the cumulative failure probability, our analysis applies a union bound over all outer iterations and all potential inner backtracking trials. Since the total number of trials is bounded by the complexity expression itself (which is polynomial in 1/ε and log(1/p)), we can choose the per-trial success probability to be 1 - p / (total trials), ensuring the overall probability is at least 1-p without needing an explicit restart mechanism. The log²(1/p) factor arises precisely from this accounting for both outer and inner steps. We will update the abstract and theorem statement if needed to clarify this, and expand the proof to detail the union bound application across backtracking steps. revision: yes
Circularity Check
No circularity detected; derivation self-contained
full rationale
The paper constructs a new backtracking SGDA algorithm whose step-size selection and complexity bounds are derived directly from the stated assumptions (L-smoothness, μ-strong concavity, unbiased oracle with variance σ²) without any reduction to fitted parameters, self-definitional quantities, or load-bearing self-citations. The claimed oracle complexities are explicit functions of the problem parameters and failure probability p; they do not rename or tautologically reproduce quantities obtained inside the paper itself. The analysis is presented as independent of the algorithm's internal choices.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption f is L-smooth and f(x,·) is μ-strongly concave for all x
- domain assumption Unbiased stochastic oracle for ∇f exists with finite variance bound σ²
Reference graph
Works this paper leans on
-
[1]
Sifting through the noise: Universal first-order methods for stochastic variational inequalities
Kimon Antonakopoulos, Thomas Pethick, Ali Kavis, Panayotis Mertikopoulos, and Volkan Cevher. Sifting through the noise: Universal first-order methods for stochastic variational inequalities. Advances in Neural Information Processing Systems , 34:13099–13111, 2021. 12https://www.csie.ntu.edu.tw/ cjlin/libsvmtools/datasets/binary.html 24
work page 2021
-
[2]
Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems
Radu Ioan Bot ¸ and Axel B¨ ohm. Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems. arXiv preprint arXiv:2007.13605 , 2020
-
[3]
Optimization methods for large-scale machine learn- ing
L´ eon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learn- ing. Siam Review, 60(2):223–311, 2018
work page 2018
-
[4]
Backtracking strategies for accelerated descent methods with smooth composite objectives
Luca Calatroni and Antonin Chambolle. Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM Journal on Optimization , 29(3):1772–1798, 2019
work page 2019
-
[5]
Lower bounds for finding stationary points i
Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. Lower bounds for finding stationary points i. Mathematical Programming, 184(1-2):71–120, 2020
work page 2020
-
[6]
A first-order primal-dual algorithm for convex problems with applications to imaging
Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Math. Imaging and Vision , 40(1):120–145, 2011
work page 2011
-
[7]
Coordinate descent method for large-scale l2-loss linear support vector machines
Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. Coordinate descent method for large-scale l2-loss linear support vector machines. Journal of Machine Learning Research , 9(7), 2008
work page 2008
-
[8]
Faster stochastic algorithms for minimax optimization under polyak-{\L} ojasiewicz condition
Lesi Chen, Boyuan Yao, and Luo Luo. Faster stochastic algorithms for minimax optimization under polyak-{\L} ojasiewicz condition. Advances in Neural Information Processing Systems, 35:13921–13932, 2022
work page 2022
-
[9]
Accelerated proximal alternating gradient-descent-ascent for nonconvex minimax machine learning
Ziyi Chen, Shaocong Ma, and Yi Zhou. Accelerated proximal alternating gradient-descent-ascent for nonconvex minimax machine learning. In 2022 IEEE International Symposium on Information Theory (ISIT), pages 672–677. IEEE, 2022
work page 2022
-
[10]
Stochastic and adversarial online learning without hyperpa- rameters
Ashok Cutkosky and Kwabena A Boahen. Stochastic and adversarial online learning without hyperpa- rameters. In Advances in Neural Information Processing Systems , volume 30, 2017
work page 2017
-
[11]
Cong D Dang and Guanghui Lan. On the convergence properties of non-euclidean extragradient meth- ods for variational inequalities with generalized monotone operators. Computational Optimization and applications, 60(2):277–310, 2015
work page 2015
-
[12]
Efficient methods for structured nonconvex-nonconcave min-max optimization
Jelena Diakonikolas, Constantinos Daskalakis, and Michael I Jordan. Efficient methods for structured nonconvex-nonconcave min-max optimization. In International Conference on Artificial Intelligence and Statistics, pages 2746–2754. PMLR, 2021
work page 2021
-
[13]
Adaptive subgradient methods for online learning and stochastic optimization
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research , 12(7), 2011
work page 2011
-
[14]
Gradient method with inexact oracle for composite non-convex optimization, 2017
Pavel Dvurechensky. Gradient method with inexact oracle for composite non-convex optimization, 2017
work page 2017
-
[15]
A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions
Olivier Fercoq and Pascal Bianchi. A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions. SIAM Journal on Optimization , 29(1):100–134, 2019
work page 2019
-
[16]
Understanding the difficulty of training deep feedforward neural networks
Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256. JMLR Workshop and Conference Proceedings, 2010
work page 2010
-
[17]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in neural information processing systems, pages 2672–2680, 2014
work page 2014
-
[18]
The landscape of the proximal point method for nonconvex–nonconcave minimax optimization
Benjamin Grimmer, Haihao Lu, Pratik Worah, and Vahab Mirrokni. The landscape of the proximal point method for nonconvex–nonconcave minimax optimization. Mathematical Programming, pages 1–35, 2022
work page 2022
-
[19]
A stochastic subgradient method for distributionally robust non-convex learning
Mert G¨ urb¨ uzbalaban, Andrzej Ruszczy´ nski, and Landi Zhu. A stochastic subgradient method for distributionally robust non-convex learning. arXiv preprint arXiv:2006.04873 , 2020. 25
-
[20]
Randomized primal-dual methods with adaptive step sizes
E Yazdandoost Hamedani, A Jalilzadeh, and NS Aybat. Randomized primal-dual methods with adaptive step sizes. arXiv preprint arXiv:1806.04118, accepted to AISTATS 2023 , 2022
-
[21]
A primal-dual algorithm with line search for general convex-concave saddle point problems
Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems. SIAM Journal on Optimization , 31(2):1299–1329, 2021
work page 2021
-
[22]
Accelerated zeroth-order and first-order mo- mentum methods from mini to minimax optimization
Feihu Huang, Shangqian Gao, Jian Pei, and Heng Huang. Accelerated zeroth-order and first-order mo- mentum methods from mini to minimax optimization. Journal of Machine Learning Research, 23(36):1– 70, 2022
work page 2022
-
[23]
Efficient mirror descent ascent methods for nonsmooth minimax problems
Feihu Huang, Xidong Wu, and Heng Huang. Efficient mirror descent ascent methods for nonsmooth minimax problems. Advances in Neural Information Processing Systems , 34, 2021
work page 2021
-
[24]
Variance-based extragra- dient methods with line search for stochastic variational inequalities
Alfredo N Iusem, Alejandro Jofr´ e, Roberto I Oliveira, and Philip Thompson. Variance-based extragra- dient methods with line search for stochastic variational inequalities. SIAM Journal on Optimization , 29(1):175–206, 2019
work page 2019
-
[25]
A doubly-randomized block- coordinate primal-dual method for large-scale saddle point problems
Afrooz Jalilzadeh, Erfan Yazdandoost Hamedani, and Necdet S Aybat. A doubly-randomized block- coordinate primal-dual method for large-scale saddle point problems. arXiv preprint arXiv:1907.03886, 2019
-
[26]
Generalized optimistic methods for convex-concave saddle point problems
Ruichen Jiang and Aryan Mokhtari. Generalized optimistic methods for convex-concave saddle point problems. arXiv preprint arXiv:2202.09674 , 2022
-
[27]
High probability complexity bounds for line search based on stochastic oracles
Billy Jin, Katya Scheinberg, and Miaolan Xie. High probability complexity bounds for line search based on stochastic oracles. Advances in Neural Information Processing Systems , 34:9193–9203, 2021
work page 2021
-
[28]
Sample complexity analysis for adaptive optimization algorithms with stochastic oracles
Billy Jin, Katya Scheinberg, and Miaolan Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles. arXiv preprint arXiv:2303.06838 , 2023
-
[29]
Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization
YANG Junchi, Xiang Li, and Niao He. Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization. In Advances in Neural Information Processing Systems , 2022
work page 2022
-
[30]
An accelerated inexact proximal point method for solving nonconvex-concave min-max problems
Weiwei Kong and Renato DC Monteiro. An accelerated inexact proximal point method for solving nonconvex-concave min-max problems. SIAM Journal on Optimization , 31(4):2558–2585, 2021
work page 2021
-
[31]
Simon Lacoste-Julien, Mark Schmidt, and Francis Bach. A simpler approach to obtaining an o(1/t) convergence rate for the projected stochastic subgradient method, 2012
work page 2012
-
[32]
Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems
Sucheol Lee and Donghwan Kim. Fast extra gradient methods for smooth structured nonconvex- nonconcave minimax problems. Advances in Neural Information Processing Systems , 34, 2021
work page 2021
-
[33]
On convergence of gradient descent ascent: A tight local analysis
Haochuan Li, Farzan Farnia, Subhro Das, and Ali Jadbabaie. On convergence of gradient descent ascent: A tight local analysis. In International Conference on Machine Learning , pages 12717–12740. PMLR, 2022
work page 2022
-
[34]
Complexity lower bounds for nonconvex- strongly-concave min-max optimization
Haochuan Li, Yi Tian, Jingzhao Zhang, and Ali Jadbabaie. Complexity lower bounds for nonconvex- strongly-concave min-max optimization. arXiv preprint arXiv:2104.08708 , 2021
-
[35]
Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization
Xiang Li, Junchi Yang, and Niao He. Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization. arXiv preprint arXiv:2210.17478 , 2022
-
[36]
Yingying Li and Stanley Osher. Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm. Inverse Problems and Imaging , 3(3):487–503, 2009
work page 2009
-
[37]
Zichong Li and Yangyang Xu. Augmented lagrangian–based first-order methods for convex-constrained programs with weakly convex objective. INFORMS Journal on Optimization , 3(4):373–397, 2021
work page 2021
-
[38]
On gradient descent ascent for nonconvex-concave minimax problems
Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex-concave minimax problems. In International Conference on Machine Learning , pages 6083–6093. PMLR, 2020. 26
work page 2020
-
[39]
On gradient descent ascent for nonconvex-concave minimax problems
Tianyi Lin, Chi Jin, and Michael I Jordan. On gradient descent ascent for nonconvex-concave minimax problems. arXiv preprint arXiv:1906.00331 , 2019
-
[40]
Near-optimal algorithms for minimax optimization
Tianyi Lin, Chi Jin, and Michael I Jordan. Near-optimal algorithms for minimax optimization. In Conference on Learning Theory, pages 2738–2779. PMLR, 2020
work page 2020
-
[41]
Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, and Yongxin Chen. Hybrid block successive approxima- tion for one-sided non-convex min-max problems: algorithms and applications. IEEE Transactions on Signal Processing, 68:3676–3691, 2020
work page 2020
-
[42]
Proximal extrapolated gradient methods for variational inequalities
Yu Malitsky. Proximal extrapolated gradient methods for variational inequalities. Optimization Methods and Software, 33(1):140–164, 2018
work page 2018
-
[43]
A first-order primal-dual algorithm with linesearch
Yura Malitsky and Thomas Pock. A first-order primal-dual algorithm with linesearch. SIAM Journal on Optimization, 28(1):411–432, 2018
work page 2018
-
[44]
Gabriel Mancino-Ball and Yangyang Xu. Variance-reduced accelerated methods for decentral- ized stochastic double-regularized nonconvex strongly-concave minimax problems. arXiv preprint arXiv:2307.07113, 2023
-
[45]
Block-cyclic stochastic coordinate descent for deep neural networks
Kensuke Nakamura, Stefano Soatto, and Byung-Woo Hong. Block-cyclic stochastic coordinate descent for deep neural networks. Neural Networks, 139:348–357, 2021
work page 2021
-
[46]
Deep double descent: Where bigger models and more data hurt
Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. Deep double descent: Where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment, 2021(12):124003, 2021
work page 2021
-
[47]
Stochastic gradient methods for distributionally robust opti- mization with f-divergences
Hongseok Namkoong and John C Duchi. Stochastic gradient methods for distributionally robust opti- mization with f-divergences. In Advances in Neural Information Processing Systems , pages 2208–2216, 2016
work page 2016
-
[48]
Efficiency of coordinate descent methods on huge-scale optimization problems
Yu Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization , 22(2):341–362, 2012
work page 2012
-
[49]
Solving a class of non-convex min-max games using iterative first order methods
Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D Lee, and Meisam Razaviyayn. Solving a class of non-convex min-max games using iterative first order methods. Advances in Neural Information Processing Systems, 32, 2019
work page 2019
-
[50]
Dmitrii M Ostrovskii, Andrew Lowy, and Meisam Razaviyayn. Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems.SIAM Journal on Optimization, 31(4):2508– 2538, 2021
work page 2021
-
[51]
A Stochastic Line Search Method with Convergence Rate Analysis
Courtney Paquette and Katya Scheinberg. A stochastic line search method with convergence rate analysis. arXiv preprint arXiv:1807.07994 , 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[52]
Coordinate Friendly Structures, Algorithms and Applications
Zhimin Peng, Tianyu Wu, Yangyang Xu, Ming Yan, and Wotao Yin. Coordinate friendly structures, algorithms and applications. arXiv preprint arXiv:1601.00863 , 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[53]
Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems
Thomas Pethick, Puya Latafat, Panagiotis Patrinos, Olivier Fercoq, and Volkan Cevher. Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems. arXiv preprint arXiv:2302.09831, 2023
-
[54]
Non-convex min–max optimization: provable algorithms and applications in machine learning (2018)
H Rafique, M Liu, Q Lin, and T Yang. Non-convex min–max optimization: provable algorithms and applications in machine learning (2018). arXiv preprint arXiv:1810.02060 , 1810
-
[55]
Making gradient descent optimal for strongly convex stochastic optimization, 2012
Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. Making gradient descent optimal for strongly convex stochastic optimization, 2012
work page 2012
-
[56]
Scaled, inexact, and adaptive generalized fista for strongly convex optimization
Simone Rebegoldi and Luca Calatroni. Scaled, inexact, and adaptive generalized fista for strongly convex optimization. SIAM Journal on Optimization , 32(3):2428–2459, 2022. 27
work page 2022
-
[57]
Efficient serial and parallel coordinate descent methods for huge-scale truss topology design
Peter Richt´ arik and Martin Tak´ aˇ c. Efficient serial and parallel coordinate descent methods for huge-scale truss topology design. In Operations Research Proceedings 2011: Selected Papers of the International Conference on Operations Research (OR 2011), August 30-September 2, 2011, Zurich, Switzerland , pages 27–32. Springer, 2012
work page 2011
-
[58]
Randomized stochastic gradient descent ascent
Othmane Sebbouh, Marco Cuturi, and Gabriel Peyr´ e. Randomized stochastic gradient descent ascent. In International Conference on Artificial Intelligence and Statistics , pages 2941–2969. PMLR, 2022
work page 2022
-
[59]
Optimistic dual extrapolation for coherent non-monotone variational inequalities
Chaobing Song, Zhengyuan Zhou, Yichao Zhou, Yong Jiang, and Yi Ma. Optimistic dual extrapolation for coherent non-monotone variational inequalities. Advances in Neural Information Processing Systems, 33:14303–14314, 2020
work page 2020
-
[60]
Dominosearch: Find layer-wise fine-grained n:m sparse schemes from dense neural networks
Wei Sun, Aojun Zhou, Sander Stuijk, Rob Wijnhoven, Andrew Oakleigh Nelson, hongsheng Li, and Henk Corporaal. Dominosearch: Find layer-wise fine-grained n:m sparse schemes from dense neural networks. In Advances in Neural Information Processing Systems, volume 34, pages 20721–20732, 2021
work page 2021
-
[61]
Coordinate descent algorithms for lasso penalized regression
Tong Tong Wu and Kenneth Lange. Coordinate descent algorithms for lasso penalized regression. 2008
work page 2008
-
[62]
Dscovr: Randomized primal-dual block coordinate algorithms for asynchronous distributed optimization
Lin Xiao, Adams Wei Yu, Qihang Lin, and Weizhu Chen. Dscovr: Randomized primal-dual block coordinate algorithms for asynchronous distributed optimization. J. Mach. Learn. Res. , 20:43–1, 2019
work page 2019
-
[63]
Huan Xu, Constantine Caramanis, and Shie Mannor. Robust regression and lasso. Advances in neural information processing systems, 21, 2008
work page 2008
-
[64]
Zi Xu, Huiling Zhang, Yang Xu, and Guanghui Lan. A unified single-loop alternating gradient pro- jection algorithm for nonconvex–concave and convex–nonconcave minimax problems. Mathematical Programming, pages 1–72, 2023
work page 2023
-
[65]
Faster single-loop algorithms for mini- max optimization without strong concavity
Junchi Yang, Antonio Orvieto, Aurelien Lucchi, and Niao He. Faster single-loop algorithms for mini- max optimization without strong concavity. In International Conference on Artificial Intelligence and Statistics, pages 5485–5517. PMLR, 2022
work page 2022
-
[66]
TaeHo Yoon and Ernest K Ryu. Accelerated algorithms for smooth convex-concave minimax problems with O(1/k2) rate on squared gradient norm. In International Conference on Machine Learning , pages 12098–12109. PMLR, 2021
work page 2021
-
[67]
A single-loop smoothed gradient descent- ascent algorithm for nonconvex-concave min-max problems
Jiawei Zhang, Peijun Xiao, Ruoyu Sun, and Zhiquan Luo. A single-loop smoothed gradient descent- ascent algorithm for nonconvex-concave min-max problems. Advances in neural information processing systems, 33:7377–7389, 2020
work page 2020
-
[68]
The complexity of nonconvex-strongly-concave minimax optimization
Siqi Zhang, Junchi Yang, Crist´ obal Guzm´ an, Negar Kiyavash, and Niao He. The complexity of nonconvex-strongly-concave minimax optimization. In Uncertainty in Artificial Intelligence, pages 482–
-
[69]
Robust accelerated primal-dual methods for computing saddle points
Xuan Zhang, Necdet Serhat Aybat, and Mert G¨ urb¨ uzbalaban. Robust accelerated primal-dual methods for computing saddle points. arXiv preprint arXiv:2111.12743 , 2021
-
[70]
Sapd+: An accelerated stochastic method for nonconvex-concave minimax problems
Xuan Zhang, Necdet Serhat Aybat, and Mert Gurbuzbalaban. Sapd+: An accelerated stochastic method for nonconvex-concave minimax problems. arXiv preprint arXiv:2205.15084 , 2022
-
[71]
Xuan Zhang, Gabriel Mancino-Ball, Necdet Serhat Aybat, and Yangyang Xu. Jointly improving the sam- ple and communication complexities in decentralized stochastic minimax optimization. arXiv preprint arXiv:2307.09421, 2023. 28 A SGDA-B with Gauss-Seidel Updates In this section, we extend our results to SGDA-B with Gauss-Seidel updates, i.e., in line 8 of S...
-
[72]
when 2 α − β ≥ 1, since β > 0, we can conclude that α > 1
-
[73]
As a result, ( c1c4) 2 1−α ≥ (c1c4)4 = Θ(κ20); therefore, C1= Ω κ20
-
[74]
when 2 α − β < 1, it implies α − β < 1−β 2 < 1
-
[75]
As a result, we can bound C1 from below as follows: (c1c4) 1 α−β ≥ (c1c2)2 = Θ(κ10); therefore, C1 = Ω κ10 . 13In [35], there are typos in the definition of c1 and c3; vy t0 appearing in c1 and c3 should be vy 0. 33 Therefore, using (63) and C ≥ C1, we can conclude that for any ϵ > 0, the gradient complexity of TiAda forPK−1 k=0 ∥∇xf(xk, yk)∥2 ≤ ϵ2 to hol...
-
[76]
+ L 2 − L, it is guaranteed that the HiBSA iterate sequence satisfies the following bound for all k ≥ 0: c1∥yk+1 − yk∥2 + c2 NX i=1 ∥xk+1 i − xk i ∥2 ≤ P k − P k+1, (69) where c1 ≜ 4( 1 ρ − L2 2µ ) − 7 2ρ > 0 and c2 ≜ β + L − L 2 − L2( 2 µ2ρ + ρ
-
[77]
> 0, and P k ≜ L(xk, yk) + 2 ρ2µ + 1 2ρ − 4( 1 ρ − L2 2µ ) ∥yk − yk−1∥2, which is the value of the potential function at iteration k ≥ 0 –here, c1, c2 > 0 follows from the conditions on ρ and β. Furthermore, from the proof of [41, Theorem 1], for all k ≥ 0 we get ∥Gxi(xk, yk)∥ ≤ (β + 2L)∥xk+1 i − xk i ∥, i = 1, . . . , N, ∥Gy(xk, yk)∥ ≤ L∥xk+1 − xk∥ + 1 ρ...
-
[78]
adopts the metric (M4) for characterizing ϵ-stationarity; the metric (M4) and its connections to some other convergence metrics is discussed in Section C. Basically, for given ϵ > 0, (xϵ, yϵ) is ϵ-stationary (in the sense of the (M4) metric) if ∃u ∈ ∇ xf(xϵ, yϵ) + ∂g(xϵ), ∃v ∈ −∇ yf(xϵ, yϵ) + ∂h(yϵ) : max {∥u∥, ∥v∥} ≤ ϵ. (73) For convenience of the reader...
-
[79]
[67, Theorem 3.4], α = O(1/L), β = O min n 1√ K , 1 L o –see also [67, page 29]
-
[80]
[67, Lemma B.2, page 19], σ1 = O(1), σ2 = O(1), σ3 = O(1)
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.