arxiv: 2604.19065 · v1 · submitted 2026-04-21 · 💻 cs.GT · cs.SY· eess.SY· math.OC· stat.ML

Last-Iterate Guarantees for Learning in Co-coercive Games

Siddharth Chandak , Ramanan Tamizholi , Nicholas Bambos This is my paper

Pith reviewed 2026-05-10 02:05 UTC · model grok-4.3

classification 💻 cs.GT cs.SYeess.SYmath.OCstat.ML

keywords co-coercive gameslast-iterate convergencestochastic gradient descentNash equilibrianon-vanishing noisefinite-time boundsgame theory

0 comments p. Extension

The pith

Stochastic gradient descent achieves O(log t / t^{1/3}) last-iterate convergence in co-coercive games under non-vanishing noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that vanilla stochastic gradient descent converges in the last iterate to the set of Nash equilibria for co-coercive games, which generalize strongly monotone games and include cases with multiple equilibria such as certain quadratic games and smooth concave potential games. It works under a noise model where the second moment of the feedback noise scales affinely with the squared norm of the current iterate, without requiring the noise to vanish near equilibrium. This produces a finite-time last-iterate rate of O(log(t)/t^{1/3}) together with almost-sure convergence of the iterates and standard time-average convergence. A reader would care because the noise model matches settings with large or unbounded action spaces where earlier vanishing-noise assumptions do not apply.

Core claim

In co-coercive games, stochastic gradient descent with a decreasing step-size schedule yields last-iterate convergence to the Nash set at rate O(log(t)/t^{1/3}) when the second moment of the noisy gradient feedback is bounded by a constant plus a term linear in the squared norm of the iterate. The same analysis also establishes almost-sure convergence of the sequence to the set of Nash equilibria and time-average convergence to that set.

What carries the argument

The co-coercivity of the game mapping together with the affine noise-second-moment bound, analyzed through a step-size schedule of order t to the minus two-thirds.

Load-bearing premise

The second moment of the noise is at most an affine function of the squared norm of the iterates.

What would settle it

Numerical runs in which the noise variance is forced to grow quadratically with the squared iterate norm and the observed last-iterate distance to equilibrium is checked for violation of the O(log t / t^{1/3}) rate.

read the original abstract

We establish finite-time last-iterate guarantees for vanilla stochastic gradient descent in co-coercive games under noisy feedback. This is a broad class of games that is more general than strongly monotone games, allows for multiple Nash equilibria, and includes examples such as quadratic games with negative semidefinite interaction matrices and potential games with smooth concave potentials. Prior work in this setting has relied on relative noise models, where the noise vanishes as iterates approach equilibrium, an assumption that is often unrealistic in practice. We work instead under a substantially more general noise model in which the second moment of the noise is allowed to scale affinely with the squared norm of the iterates, an assumption natural in learning with unbounded action spaces. Under this model, we prove a last-iterate bound of order $O(\log(t)/t^{1/3})$, the first such bound for co-coercive games under non-vanishing noise. We additionally establish almost sure convergence of the iterates to the set of Nash equilibria and derive time-average convergence guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first last-iterate finite-time bound for co-coercive games under affine non-vanishing noise, but the moment-control steps in the proof need a closer look.

read the letter

The main point is that they prove an O(log t / t^{1/3}) last-iterate rate for vanilla stochastic gradient descent in co-coercive games when noise second moments scale affinely with the squared norm of the iterates. This noise model is more realistic for unbounded action spaces than the relative vanishing noise in prior work, and the bound is new for this class of games that includes multiple equilibria and covers quadratic games with negative semidefinite interactions plus potential games with smooth concave potentials. They also add almost-sure convergence to the Nash set and time-average guarantees, which round out the contribution nicely.

Referee Report

2 major / 2 minor

Summary. The manuscript proves finite-time last-iterate convergence for vanilla SGD on co-coercive games under a non-vanishing noise model in which the noise second moment scales affinely with ||x_t||^2. It obtains an O(log t / t^{1/3}) last-iterate rate to the Nash set, almost-sure convergence of the iterates, and time-average convergence. The setting generalizes strongly monotone games and includes quadratic games with negative-semidefinite interaction matrices and smooth potential games.

Significance. If the stated bounds are correct, the result supplies the first last-iterate rate for co-coercive games that does not require relative (vanishing) noise. The noise model is natural for unbounded action spaces and the proof technique appears to combine a supermartingale moment bound with a Lyapunov argument that yields the 1/3 exponent. The paper also supplies almost-sure convergence, which is a useful complement to the finite-time guarantee.

major comments (2)

[§4.2 and Theorem 4.3] §4.2, the moment-control argument: the supermartingale bound on E[||x_t||^2] is derived under the affine noise assumption, yet the subsequent last-iterate telescoping in Theorem 4.3 appears to invoke a uniform bound on the second moments to control the noise terms inside the Lyapunov decrease. These two steps must be sequenced explicitly so that the rate derivation does not presuppose the very boundedness it is used to establish.
[Theorem 3.1 and Lemma 4.1] Theorem 3.1 (last-iterate bound): the O(log t / t^{1/3}) rate is stated to hold for any step-size sequence satisfying a specific summability condition, but the proof sketch does not verify that the chosen step sizes simultaneously satisfy the moment-control hypothesis of Lemma 4.1. A single explicit step-size schedule that works for both parts should be exhibited.

minor comments (2)

[§2.2] Notation: the symbol β is used both for the co-coercivity constant and for a step-size parameter in §2.2; a distinct symbol would improve readability.
[Figure 1] Figure 1: the y-axis label on the right-hand plot is missing the log scale indicator that is described in the caption.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below. The concerns raised are valid points about proof presentation and can be resolved by reordering arguments for clarity and exhibiting an explicit step-size schedule; these are expository improvements that do not alter the results.

read point-by-point responses

Referee: [§4.2 and Theorem 4.3] §4.2, the moment-control argument: the supermartingale bound on E[||x_t||^2] is derived under the affine noise assumption, yet the subsequent last-iterate telescoping in Theorem 4.3 appears to invoke a uniform bound on the second moments to control the noise terms inside the Lyapunov decrease. These two steps must be sequenced explicitly so that the rate derivation does not presuppose the very boundedness it is used to establish.

Authors: We agree that the current presentation could be misread as circular. The supermartingale argument in §4.2 in fact yields a t-independent bound E[||x_t||^2] ≤ M almost surely under the affine noise model (with M depending only on the initial condition, the co-coercivity constant, and the noise coefficients). This bound is then invoked in the Lyapunov telescoping of Theorem 4.3 to absorb the noise terms. To eliminate any ambiguity, we will (i) promote the moment bound to a standalone lemma placed before Theorem 4.3, (ii) explicitly cite it at the start of the proof of Theorem 4.3 (“By the moment-control lemma, E[||x_s||^2] ≤ M for all s ≤ t, which permits the uniform bound …”), and (iii) add a short remark after the lemma stating that the bound is established independently of the subsequent rate analysis. These changes are purely organizational. revision: yes
Referee: [Theorem 3.1 and Lemma 4.1] Theorem 3.1 (last-iterate bound): the O(log t / t^{1/3}) rate is stated to hold for any step-size sequence satisfying a specific summability condition, but the proof sketch does not verify that the chosen step sizes simultaneously satisfy the moment-control hypothesis of Lemma 4.1. A single explicit step-size schedule that works for both parts should be exhibited.

Authors: We concur that an explicit schedule is useful. Theorem 3.1 requires a step-size sequence satisfying ∑ η_t = ∞ together with ∑ η_t^{3/2} < ∞ (to obtain the log t / t^{1/3} rate). Lemma 4.1 requires in addition that ∑ η_t^2 < ∞ to close the supermartingale argument. The schedule η_t = t^{-2/3} (log(t+2))^{-2} for t ≥ 1 satisfies all three conditions simultaneously: the first sum diverges (integral test yields t^{1/3} growth), while the second and third sums converge (exponents -4/3 and -4/3). We will insert this concrete choice as a corollary to Theorem 3.1, verify that it meets the hypothesis of Lemma 4.1, and include the short calculation in the appendix. The general statement of Theorem 3.1 remains unchanged. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract presents the O(log(t)/t^{1/3}) last-iterate bound as derived from co-coercivity plus the explicit affine noise model (second moment scaling as O(1 + ||x_t||^2)). No equations, definitions, or self-citations in the provided text reduce this bound to a fitted quantity, a self-referential assumption, or a prior result by the same authors. The noise model is stated as an input assumption rather than derived from the rate, and almost-sure convergence is listed as an additional result. The derivation chain therefore remains self-contained against external benchmarks such as prior vanishing-noise analyses; the skeptic concern about moment control is a standard proof obligation but does not manifest as a circular reduction in the given material.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the definition of co-coercivity and the affine noise model; both are domain assumptions rather than derived quantities. No free parameters are fitted and no new entities are introduced.

axioms (2)

domain assumption The game is co-coercive
Core structural assumption that enables the analysis and is broader than strong monotonicity.
domain assumption Second moment of noise scales affinely with squared norm of iterates
The key modeling choice that replaces vanishing-noise assumptions and is natural for unbounded action spaces.

pith-pipeline@v0.9.0 · 5494 in / 1374 out tokens · 57058 ms · 2026-05-10T02:05:17.207053+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

25 extracted references · 3 canonical work pages

[1]

A control theoretic approach to noncooperative game design,

T. Alpcan, L. Pavel, and N. Stefanovic, “A control theoretic approach to noncooperative game design,” inDecision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference. CDC/CCC

2009
[2]

IEEE, 2009, pp

Proceedings of the 48th IEEE Conference on. IEEE, 2009, pp. 8575–8580

2009
[3]

Distributed welfare games,

J. R. Marden and A. Wierman, “Distributed welfare games,”Opera- tions Research, vol. 61, no. 1, pp. 155–168, 2013

2013
[4]

Geometric convergence of gradient play algorithms for distributed Nash equilibrium seeking,

T. Tatarenko, W. Shi, and A. Nedi ´c, “Geometric convergence of gradient play algorithms for distributed Nash equilibrium seeking,” IEEE Transactions on Automatic Control, vol. 66, no. 11, pp. 5342– 5353, 2020

2020
[5]

On gradient-based learning in continuous games,

E. Mazumdar, L. J. Ratliff, and S. S. Sastry, “On gradient-based learning in continuous games,”SIAM Journal on Mathematics of Data Science, vol. 2, no. 1, pp. 103–131, 2020

2020
[6]

Learning in games with continuous action sets and unknown payoff functions,

P. Mertikopoulos and Z. Zhou, “Learning in games with continuous action sets and unknown payoff functions,”Mathematical Program- ming, vol. 173, no. 1, pp. 465–507, 2019

2019
[7]

Learning to control unknown strongly monotone games,

S. Chandak, I. Bistritz, and N. Bambos, “Learning to control unknown strongly monotone games,”IEEE Transactions on Control of Network Systems, 2026

2026
[8]

Existence and uniqueness of equilibrium points for concave n-person games,

J. B. Rosen, “Existence and uniqueness of equilibrium points for concave n-person games,”Econometrica: Journal of the Econometric Society, pp. 520–534, 1965

1965
[9]

Robust power management via learning and game design,

Z. Zhou, P. Mertikopoulos, A. L. Moustakas, N. Bambos, and P. Glynn, “Robust power management via learning and game design,”Opera- tions Research, vol. 69, no. 1, pp. 331–345, 2021

2021
[10]

Finite-time last- iterate convergence for multi-agent learning in games,

T. Lin, Z. Zhou, P. Mertikopoulos, and M. Jordan, “Finite-time last- iterate convergence for multi-agent learning in games,” inProceedings of the 37th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, H. D. III and A. Singh, Eds., vol. 119. PMLR, 13–18 Jul 2020, pp. 6161–6171

2020
[11]

Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity,

N. Loizou, H. Berard, G. Gidel, I. Mitliagkas, and S. Lacoste-Julien, “Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity,” Advances in Neural Information Processing Systems, vol. 34, pp. 19 095–19 108, 2021

2021
[12]

Convergence rate analysis of several splitting schemes,

D. Davis and W. Yin, “Convergence rate analysis of several splitting schemes,”Splitting methods in communication, imaging, science, and engineering, pp. 115–163, 2016

2016
[13]

Convergence rates with inexact non-expansive operators,

J. Liang, J. Fadili, and G. Peyr ´e, “Convergence rates with inexact non-expansive operators,”Mathematical Programming, vol. 159, pp. 403–434, 2016

2016
[14]

Bandit learning in concave n-person games,

M. Bravo, D. Leslie, and P. Mertikopoulos, “Bandit learning in concave n-person games,”Advances in Neural Information Processing Systems, vol. 31, 2018

2018
[15]

On the rate of convergence of payoff-based algorithms to nash equilibrium in strongly monotone games,

T. Tatarenko and M. Kamgarpour, “On the rate of convergence of payoff-based algorithms to nash equilibrium in strongly monotone games,”arXiv preprint arXiv:2202.11147, 2022

work page arXiv 2022
[16]

Convergence rate of generalized nash equilibrium learning in strongly monotone games with linear constraints,

——, “Convergence rate of generalized nash equilibrium learning in strongly monotone games with linear constraints,”arXiv preprint arXiv:2507.12112, 2025

work page arXiv 2025
[17]

Cycles in adversarial regularized learning,

P. Mertikopoulos, C. Papadimitriou, and G. Piliouras, “Cycles in adversarial regularized learning,” inProceedings of the twenty-ninth annual ACM-SIAM symposium on discrete algorithms. SIAM, 2018, pp. 2703–2717

2018
[18]

Last-iterate convergence of optimistic gradient method for monotone variational inequalities,

E. Gorbunov, A. Taylor, and G. Gidel, “Last-iterate convergence of optimistic gradient method for monotone variational inequalities,” Advances in neural information processing systems, vol. 35, pp. 21 858–21 870, 2022

2022
[19]

Learning nash equilibria in mono- tone games,

T. Tatarenko and M. Kamgarpour, “Learning nash equilibria in mono- tone games,” in2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019, pp. 3104–3109

2019
[20]

Tight last-iterate conver- gence of the extragradient and the optimistic gradient descent-ascent algorithm for constrained monotone variational inequalities,

Y . Cai, A. Oikonomou, and W. Zheng, “Tight last-iterate conver- gence of the extragradient and the optimistic gradient descent-ascent algorithm for constrained monotone variational inequalities,”arXiv preprint arXiv:2204.09228, 2022

work page arXiv 2022
[21]

Facchinei and J.-S

F. Facchinei and J.-S. Pang,Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007

2007
[22]

Nesterovet al.,Lectures on convex optimization

Y . Nesterovet al.,Lectures on convex optimization. Springer, 2018, vol. 137

2018
[23]

A convergence theorem for non neg- ative almost supermartingales and some applications,

H. Robbins and D. Siegmund, “A convergence theorem for non neg- ative almost supermartingales and some applications,” inOptimizing methods in statistics. Elsevier, 1971, pp. 233–257

1971
[24]

Borkar,Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition, ser

V . Borkar,Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition, ser. Texts and Readings in Mathematics. Hindustan Book Agency, 2022

2022
[25]

J.-J. E. Slotine and W. Li,Applied nonlinear control. Prentice hall Englewood Cliffs, NJ, 1991, vol. 199, no. 1. APPENDIXI PROOFS FORTHEOREMS1, 2, 3 A.Proof for Theorem 1 Proof.Let us first analyze solutions of the following ODE. ˙x(t) =v(x(t)).(5) DefineV(x) =∥x−x ∗∥2 for somex ∗ ∈ X ∗. Then we have thatV(x)≥0for allx∈R N d andV(x)→ ∞asx→ ∞. Also, note t...

1991