arxiv: 2605.01246 · v1 · submitted 2026-05-02 · 🧮 math.OC

A Single-Loop Stochastic Gradient Algorithm for Minimax Optimization with Nonlinear Coupled Constraints

Qichao Cao , Shangzhi Zeng , Jin Zhang , Yuxuan Zhou This is my paper

Pith reviewed 2026-05-09 15:11 UTC · model grok-4.3

classification 🧮 math.OC

keywords minimax optimizationstochastic gradientnonconvex-concavecoupled constraintspenalty methodsingle-loop algorithmconvergence analysismachine learning

0 comments

The pith

SPACO is a single-loop stochastic gradient algorithm that solves nonconvex-concave minimax problems with nonlinear convex coupled constraints via penalty-based smoothing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops SPACO, which approximates the original constrained minimax problem by adding a quadratic penalty and regularization term to create a smooth, differentiable surrogate. Stochastic gradient steps are then applied directly to this surrogate in a single loop without inner iterations. The authors prove non-asymptotic complexity bounds for reaching approximate stationarity and show that accumulation points of the generated sequence satisfy an asymptotic stationarity condition. This approach is relevant because many machine learning tasks involve minimax objectives under nonlinear constraints, and single-loop methods avoid the extra computational cost of nested loops.

Core claim

By integrating a quadratic penalty scheme with regularization, the framework produces a continuously differentiable approximation to the MCC problem that retains sufficient structure for nonconvex-concave analysis; the resulting SPACO algorithm then delivers non-asymptotic complexity guarantees and ensures that its iterates accumulate at asymptotically stationary points.

What carries the argument

The penalty-based smooth approximation framework, which combines a quadratic penalty on the nonlinear convex coupled constraints with a regularization term to obtain a continuously differentiable surrogate problem suitable for single-loop stochastic gradient updates.

If this is right

The single-loop structure reduces per-iteration cost compared with double-loop methods that solve inner subproblems.
Non-asymptotic bounds quantify the number of stochastic gradient evaluations needed to reach a prescribed accuracy level.
Accumulation points of the sequence satisfy an asymptotic first-order stationarity condition for the original constrained problem.
The same smoothing framework can be paired with different stochastic gradient estimators while retaining the convergence theory.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The penalty approach may generalize to other constraint types if the penalty function remains differentiable and the resulting surrogate stays amenable to stochastic analysis.
Combining SPACO with variance-reduction techniques could improve the practical sample complexity without changing the single-loop architecture.
The method provides a template for converting other constrained stochastic minimax problems into unconstrained smooth problems that admit simple gradient-based solvers.

Load-bearing premise

The quadratic penalty scheme with regularization produces a continuously differentiable approximation of the MCC problem while preserving enough structure for the nonconvex-concave convergence analysis to apply.

What would settle it

On a simple synthetic MCC instance whose stationary points are known analytically, run SPACO for the number of iterations predicted by the complexity bound and check whether the gradient norm of the smoothed problem remains above the target tolerance with high probability.

Figures

Figures reproduced from arXiv: 2605.01246 by Jin Zhang, Qichao Cao, Shangzhi Zeng, Yuxuan Zhou.

**Figure 1.** Figure 1: Convergence behavior of the penalty-based approximation approach (SPACO) versus view at source ↗

**Figure 2.** Figure 2: Convergence curve for solving (12). whole expected optimal solution (x ∗ , y∗ ) is given by (x ∗ , y∗ ) = (− 3 4 e, 9 16 e). The performance is assessed by the solution error ϵx = ∥x−x ∗∥ 2 ∥x0−x∗∥2+1 and the sub-problem error ϵy = ∥y−y ∗ (x)∥ 2 ∥y0−y∗(x)∥2+1 . We fix n = 100, δ = 1 and compare SPACO with MGD [Tsaknakis et al., 2023], MMPen [Hu et al., 2024] and gradient descent-ascent with fixed penalty p… view at source ↗

**Figure 3.** Figure 3: Convergence curve for solving (13) B.1.1 Experimental Setup and Hyperparameter Tuning We evaluate the proposed algorithms on two synthetic stochastic minimax optimization: a nonlinear constrained problem (12) and a linear constrained problem (13). For both experimental settings, the problem dimension is fixed at n = 100, and the noise variance is set to δ 2 = 1. Regarding initialization, the starting point… view at source ↗

**Figure 4.** Figure 4: Training dynamics on the Convex-Concave Regime: Logistic regression on Adult dataset. view at source ↗

**Figure 5.** Figure 5: Training dynamics on the Nonconvex Regime: Deep fairness learning on the CelebA view at source ↗

**Figure 6.** Figure 6: Visual comparison on the AFHQ-v2 dataset. Left: Representative real samples from the original training set. Right: Samples generated by the SPACO-trained model. The side-by-side comparison demonstrates that SPACO is capable of synthesizing high-fidelity images with visual quality and diversity comparable to the real data. We report the mean results over three independent runs. and standard deviations avera… view at source ↗

read the original abstract

In this paper, we propose a single-loop stochastic gradient algorithm for solving stochastic nonconvex-concave minimax optimization with nonlinear convex coupled constraints (MCC). The proposed method, SPACO (Stochastic Penalty-based Algorithm for minimax optimization with COupled constraints), is built upon a penalty-based smooth approximation framework for MCC. This framework integrates a quadratic penalty scheme with regularization to yield a continuously differentiable approximation of the MCC problem. We provide theoretical convergence guarantees for this smoothing framework. Furthermore, we establish non-asymptotic complexity bounds and provide an asymptotic analysis characterizing the stationarity of accumulation points for the iterates generated by SPACO. Experimental results on synthetic examples and practical machine learning tasks demonstrate the effectiveness and efficiency of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SPACO gives a single-loop stochastic gradient method for nonconvex-concave minimax problems with nonlinear convex coupled constraints by smoothing via quadratic penalty plus regularization, plus claimed complexity bounds.

read the letter

The main thing to know is that this paper delivers SPACO, a single-loop algorithm that turns the constrained minimax problem into a smooth unconstrained one and then applies stochastic gradients directly, with non-asymptotic complexity statements and an asymptotic stationarity result for the iterates. That combination for nonlinear coupled constraints in the stochastic nonconvex-concave regime is the concrete step beyond prior penalty or single-loop work on simpler settings. The smoothing step uses a quadratic penalty with added regularization to keep the approximation continuously differentiable while trying to retain enough structure for the analysis to go through. Experiments on synthetic cases and a couple of machine learning tasks show it runs faster than double-loop baselines and reaches comparable solution quality. The approach is straightforward to implement once the penalty parameters are set. The soft spots sit in the approximation error control. The quadratic penalty introduces bias that must be driven to zero without blowing up the stochastic variance or destroying the concavity properties needed for the minimax convergence; the paper claims this is handled but the rates will depend on how the penalty and regularization schedules are chosen relative to the step sizes. If those schedules are too aggressive the constraint violation may not vanish fast enough in practice. The non-asymptotic bounds are stated but their dependence on the constraint Lipschitz constants and the penalty level is not spelled out in the abstract, so the practical tightness is unclear. This paper is for people who already work on constrained minimax problems in optimization or robust ML and need a single-loop option that avoids nested optimization. A reader looking for a ready-to-code solver with some theory will find it useful. It deserves peer review because the problem class is relevant, the algorithm is explicit, and the claims are specific enough for referees to check the proof details and the experimental setup.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes SPACO, a single-loop stochastic gradient algorithm for stochastic nonconvex-concave minimax optimization with nonlinear convex coupled constraints (MCC). It relies on a quadratic penalty scheme combined with regularization to produce a continuously differentiable approximation of the constrained problem, provides convergence guarantees for this smoothing framework, establishes non-asymptotic complexity bounds, and characterizes the asymptotic stationarity of accumulation points. Experiments on synthetic examples and machine learning tasks are included to demonstrate practical performance.

Significance. If the stated complexity bounds and stationarity results hold under the given assumptions, the work would contribute a practical single-loop method for an important class of constrained minimax problems arising in machine learning. The penalty-based smoothing approach is standard, but its integration with stochastic gradients while preserving sufficient structure for nonconvex-concave analysis could be useful. The provision of both non-asymptotic bounds and asymptotic analysis, together with reproducible experiments, strengthens the contribution relative to purely asymptotic or heuristic alternatives.

minor comments (2)

[Abstract] Abstract: the claim of 'non-asymptotic complexity bounds' would benefit from an explicit statement of the dependence on target accuracy, batch size, and penalty parameter in the abstract itself.
[Section 3 (smoothing framework)] The weakest assumption identified in the smoothing framework (continuous differentiability while preserving nonconvex-concave structure) should be stated as a numbered assumption with a clear reference to where it is used in the complexity proof.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive summary of our work on SPACO. The assessment that the integration of penalty-based smoothing with stochastic gradients for nonconvex-concave minimax problems with coupled constraints could be useful is encouraging. We are happy to clarify or strengthen any aspects of the manuscript. No specific major comments were enumerated in the report, so we respond at a high level and remain available for further questions.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes the SPACO algorithm via a penalty-based smooth approximation of the MCC problem, then derives non-asymptotic complexity bounds and asymptotic stationarity results for the resulting single-loop stochastic gradient iterates. These steps rest on standard stochastic approximation theory and an external quadratic-penalty smoothing construction rather than any internally fitted parameter, self-defined quantity, or self-citation chain that reduces the claimed guarantees to the inputs by construction. The derivation chain therefore remains self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claims rest on standard optimization assumptions plus the penalty smoothing construction; no free parameters or new entities beyond the algorithm.

axioms (2)

domain assumption Objective functions are continuously differentiable and the minimax problem is nonconvex-concave.
Invoked to justify smoothing and convergence analysis.
domain assumption Coupled constraints are nonlinear and convex.
Core to the MCC problem definition.

pith-pipeline@v0.9.0 · 9917 in / 1115 out tokens · 71218 ms · 2026-05-09T15:11:42.760267+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

82 extracted references · 3 canonical work pages

[1]

Mathematical Programming , volume=

A first-order augmented Lagrangian method for constrained minimax optimization , author=. Mathematical Programming , volume=
[2]

2009 , publisher =

Variational Analysis , author =. 2009 , publisher =

2009
[3]

SIAM Journal on Applied Mathematics , volume=

The theory of max-min, with applications , author=. SIAM Journal on Applied Mathematics , volume=. 1966 , publisher=

1966
[4]

International Conference on Learning Representations , year=

Perceptual Adversarial Robustness: Defense Against Unseen Threat Models , author=. International Conference on Learning Representations , year=
[5]

2017 , publisher=

First-order methods in optimization , author=. 2017 , publisher=

2017
[6]

2013 , publisher=

Perturbation analysis of optimization problems , author=. 2013 , publisher=

2013
[7]

Momentum-based variance reduction in non-convex

Cutkosky, Ashok and Orabona, Francesco , booktitle=. Momentum-based variance reduction in non-convex
[8]

Nature methods , volume=

SciPy 1.0: fundamental algorithms for scientific computing in Python , author=. Nature methods , volume=. 2020 , publisher=

2020
[9]

Optimization Online , year=

Primal-dual global convergence of an augmented Lagrangian method under the error bound condition , author=. Optimization Online , year=
[10]

Advances in Neural Information Processing Systems , volume=

Generative adversarial nets , author=. Advances in Neural Information Processing Systems , volume=
[11]

SIAM Journal on Mathematics of Data Science , volume=

Optimality conditions for nonsmooth nonconvex-nonconcave min-max problems and generative adversarial networks , author=. SIAM Journal on Mathematics of Data Science , volume=. 2023 , publisher=

2023
[12]

Artificial Intelligence and Statistics , year=

Conformal contextual robust optimization , author=. Artificial Intelligence and Statistics , year=
[13]

Mathematical Programming , volume=

Data-driven robust optimization , author=. Mathematical Programming , volume=. 2018 , publisher=

2018
[14]

Advances in Neural Information Processing Systems , volume=

Outlier-robust distributionally robust optimization via unbalanced optimal transport , author=. Advances in Neural Information Processing Systems , volume=
[15]

International Conference on Learning Representations , year=

Towards Deep Learning Models Resistant to Adversarial Attacks , author=. International Conference on Learning Representations , year=
[16]

SIAM Journal on Optimization , volume=

Minimax problems with coupled linear constraints: Computational complexity and duality , author=. SIAM Journal on Optimization , volume=. 2023 , publisher=

2023
[17]

IEEE Access , volume=

Constrained generative adversarial networks , author=. IEEE Access , volume=. 2021 , publisher=

2021
[18]

International Conference on Machine Learning , year=

Theoretically principled trade-off between robustness and accuracy , author=. International Conference on Machine Learning , year=
[19]

International Conference on Machine Learning , year=

On gradient descent ascent for nonconvex-concave minimax problems , author=. International Conference on Machine Learning , year=
[20]

International Conference on Learning Representations , year=

Revisiting large-scale non-convex distributionally robust optimization , author=. International Conference on Learning Representations , year=
[21]

SIAM Journal on Optimization , volume=

Accelerated minimax algorithms flock together , author=. SIAM Journal on Optimization , volume=. 2025 , publisher=

2025
[22]

International Conference on Machine Learning , year=

Delving into the convergence of generalized smooth minimax optimization , author=. International Conference on Machine Learning , year=
[23]

International Conference on Machine Learning , year=

Fundamental Benefit of Alternating Updates in Minimax Optimization , author=. International Conference on Machine Learning , year=
[24]

Mathematical Programming , volume=

Nonsmooth nonconvex--nonconcave minimax optimization: Primal--dual balancing and iteration complexity analysis , author=. Mathematical Programming , volume=. 2025 , publisher=

2025
[25]

Bertsekas, Dimitri P , year=
[26]

SIAM Journal on Optimization , volume=

Optimality conditions and numerical algorithms for a class of linearly constrained minimax optimization problems , author=. SIAM Journal on Optimization , volume=. 2024 , publisher=

2024
[27]

2006 , publisher=

Numerical optimization , author=. 2006 , publisher=

2006
[28]

Journal of the Operations Research Society of China , pages=

An Alternating Proximal Gradient Algorithm for Nonsmooth Nonconvex-Linear Minimax Problems with Coupled Linear Constraints , author=. Journal of the Operations Research Society of China , pages=. 2024 , publisher=

2024
[29]

INFORMS Journal on Computing , volume=

A practical scheme to compute the pessimistic bilevel optimization problem , author=. INFORMS Journal on Computing , volume=. 2020 , publisher=

2020
[30]

Advances in Neural Information Processing Systems , volume=

Robust deep reinforcement learning against adversarial perturbations on state observations , author=. Advances in Neural Information Processing Systems , volume=
[31]

International Conference on Machine Learning , year=

Robust adversarial reinforcement learning , author=. International Conference on Machine Learning , year=
[32]

arXiv preprint arXiv:2408.17213 , year=

A Minimization Approach for Minimax Optimization with Coupled Constraints , author=. arXiv preprint arXiv:2408.17213 , year=

work page arXiv
[33]

arXiv preprint arXiv:2402.03352 , year=

Zeroth-Order primal-dual Alternating Projection Gradient Algorithms for Nonconvex Minimax Problems with Coupled linear Constraints , author=. arXiv preprint arXiv:2402.03352 , year=

work page arXiv
[34]

2020 , issn =

Optimality Conditions for Constrained Minimax Optimization , journal =. 2020 , issn =

2020
[35]

Mathematics of Operations Research , volume=

Sensitivity analysis of the maximal value function with applications in nonconvex minimax programs , author=. Mathematics of Operations Research , volume=. 2024 , publisher=

2024
[36]

Journal of the Operations Research Society of China , volume=

The rate of convergence of augmented Lagrangian method for minimax optimization problems with equality constraints , author=. Journal of the Operations Research Society of China , volume=. 2024 , publisher=

2024
[37]

SIAM Journal on Optimization , volume=

Prox-method with rate of convergence O (1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems , author=. SIAM Journal on Optimization , volume=. 2004 , publisher=

2004
[38]

Mathematical Programming , volume=

Dual extrapolation and its applications to solving variational inequalities and related problems , author=. Mathematical Programming , volume=. 2007 , publisher=

2007
[39]

Complexity of variants of

Monteiro, Renato DC and Svaiter, Benar Fux , journal=. Complexity of variants of. 2011 , publisher=

2011
[40]

A primal--dual hybrid gradient method for nonlinear operators with applications to

Valkonen, Tuomo , journal=. A primal--dual hybrid gradient method for nonlinear operators with applications to. 2014 , publisher=

2014
[41]

Journal of Mathematical Imaging and Vision , volume=

A first-order primal-dual algorithm for convex problems with applications to imaging , author=. Journal of Mathematical Imaging and Vision , volume=. 2011 , publisher=

2011
[42]

2000 , publisher=

An introduction to variational inequalities and their applications , author=. 2000 , publisher=

2000
[43]

International Conference on Learning Representations , year=

A Variational Inequality Perspective on Generative Adversarial Networks , author=. International Conference on Learning Representations , year=
[44]

Journal of optimization theory and applications , volume=

Subgradient methods for saddle-point problems , author=. Journal of optimization theory and applications , volume=. 2009 , publisher=

2009
[45]

International Conference on Artificial Intelligence and Statistics , pages=

A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach , author=. International Conference on Artificial Intelligence and Statistics , pages=
[46]

SIAM Journal on Optimization , volume=

Robust stochastic approximation approach to stochastic programming , author=. SIAM Journal on Optimization , volume=. 2009 , publisher=

2009
[47]

Stochastic Systems , volume=

Solving variational inequalities with stochastic mirror-prox algorithm , author=. Stochastic Systems , volume=. 2011 , publisher=

2011
[48]

SIAM Journal on Optimization , volume=

Optimal primal-dual methods for a class of saddle point problems , author=. SIAM Journal on Optimization , volume=. 2014 , publisher=

2014
[49]

On the convergence and robustness of training

Sanjabi, Maziar and Ba, Jimmy and Razaviyayn, Meisam and Lee, Jason D , booktitle=. On the convergence and robustness of training
[50]

Optimizing Methods in Statistics , pages=

A convergence theorem for non negative almost supermartingales and some applications , author=. Optimizing Methods in Statistics , pages=. 1971 , publisher=

1971
[51]

Conference on Learning Theory , pages=

Near-optimal algorithms for minimax optimization , author=. Conference on Learning Theory , pages=
[52]

Advances in Neural Information Processing Systems , volume=

Solving a class of non-convex min-max games using iterative first order methods , author=. Advances in Neural Information Processing Systems , volume=
[53]

Advances in Neural Information Processing Systems , volume=

Efficient algorithms for smooth minimax optimization , author=. Advances in Neural Information Processing Systems , volume=
[54]

SIAM Journal on Optimization , volume=

An accelerated inexact proximal point method for solving nonconvex-concave min-max problems , author=. SIAM Journal on Optimization , volume=. 2021 , publisher=

2021
[55]

Advances in Neural Information Processing Systems , volume=

The limit points of (optimistic) gradient descent in min-max optimization , author=. Advances in Neural Information Processing Systems , volume=
[56]

IEEE Transactions on Signal Processing , volume=

Hybrid block successive approximation for one-sided non-convex min-max problems: algorithms and applications , author=. IEEE Transactions on Signal Processing , volume=. 2020 , publisher=

2020
[57]

Mathematical Programming , volume=

A unified single-loop alternating gradient projection algorithm for nonconvex--concave and convex--nonconcave minimax problems , author=. Mathematical Programming , volume=. 2023 , publisher=

2023
[58]

Advances in Neural Information Processing Systems , volume=

A single-loop smoothed gradient descent-ascent algorithm for nonconvex-concave min-max problems , author=. Advances in Neural Information Processing Systems , volume=
[59]

Advances in Neural Information Processing Systems , volume=

Stochastic mirror descent in variationally coherent optimization problems , author=. Advances in Neural Information Processing Systems , volume=
[60]

International Conference on Artificial Intelligence and Statistics , pages=

Efficient methods for structured nonconvex-nonconcave min-max optimization , author=. International Conference on Artificial Intelligence and Statistics , pages=
[61]

SIAM Journal on Optimization , volume=

A relaxed quasinormality condition and the boundedness of dual augmented Lagrangian sequences , author=. SIAM Journal on Optimization , volume=. 2025 , publisher=

2025
[62]

Mathematical Programming , volume=

The landscape of the proximal point method for nonconvex--nonconcave minimax optimization , author=. Mathematical Programming , volume=. 2023 , publisher=

2023
[63]

Convex-concave min-max

Goktas, Denizalp and Greenwald, Amy , booktitle=. Convex-concave min-max
[64]

International Conference on Learning Representations , year=

Distributionally Robust Neural Networks For Group Shifts: On The Importance Of Regularization For Worst-Case Generalization , author=. International Conference on Learning Representations , year=
[65]

Selective

Geifman, Yonatan and El-Yaniv, Ran , booktitle=. Selective
[66]

SIAM Journal on Optimization , volume=

Stochastic First- and Zeroth-Order Methods For Nonconvex Stochastic Programming , author=. SIAM Journal on Optimization , volume=. 2013 , publisher=

2013
[67]

2013 , publisher=

Convex analysis and minimization algorithms I: Fundamentals , author=. 2013 , publisher=

2013
[68]

arXiv preprint arXiv:2510.03861 , year=

Calm local optimality for couple-constrained minimax problems , author=. arXiv preprint arXiv:2510.03861 , year=

work page arXiv
[69]

International Conference on Machine Learning , year=

Averaged method of multipliers for bi-level optimization without lower-level strong convexity , author=. International Conference on Machine Learning , year=
[70]

Advances in Neural Information Processing Systems , volume =

Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization , author =. Advances in Neural Information Processing Systems , volume =
[71]

2009 , institution=

Learning Multiple Layers of Features from Tiny Images , author=. 2009 , institution=

2009
[72]

Yunjey Choi and Youngjung Uh and Jaejun Yoo and Jung-Woo Ha , booktitle=. Star
[73]

International Conference on Learning Representations , year=

Spectral Normalization for Generative Adversarial Networks , author=. International Conference on Learning Representations , year=
[74]

Heusel, Martin and Ramsauer, Hubert and Unterthiner, Thomas and Nessler, Bernhard and Hochreiter, Sepp , booktitle =
[75]

Improved Techniques for Training

Salimans, Tim and Goodfellow, Ian and Zaremba, Wojciech and Cheung, Vicki and Radford, Alec and Chen, Xi and Chen, Xi , booktitle =. Improved Techniques for Training
[76]

AAAI/ACM Conference on AI, Ethics, and Society , pages=

Mitigating unwanted biases with adversarial learning , author=. AAAI/ACM Conference on AI, Ethics, and Society , pages=
[77]

Second-Order Min-Max Optimization with Lazy

Lesi Chen and Chengchang Liu and Jingzhao Zhang , booktitle=. Second-Order Min-Max Optimization with Lazy
[78]

Matecon , volume=

The extragradient method for finding saddle points and other problems , author=. Matecon , volume=
[79]

and Moeller, John and Scheidegger, Carlos and Venkatasubramanian, Suresh , title =

Feldman, Michael and Friedler, Sorelle A. and Moeller, John and Scheidegger, Carlos and Venkatasubramanian, Suresh , title =. 2015 , booktitle =

2015
[80]

2016 , booktitle =

Hardt, Moritz and Price, Eric and Srebro, Nathan , title =. 2016 , booktitle =

2016

Showing first 80 references.