Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

Jeff S. Shamma; Sarah A. Toonsi

arXiv: 2506.10874 · v2 · submitted 2025-06-12 · cs.MA · cs.GT · cs.SY · eess.SY

Pith reviewed 2026-05-19 09:49 UTC · model grok-4.3

classification: cs.MA · cs.GT · cs.SY · eess.SY

keywords: uncoupled learning, higher-order dynamics, Nash equilibrium, mixed strategies, finite games, decentralized control, replicator dynamics

The pith

For any finite game with an isolated completely mixed Nash equilibrium, there exist higher-order uncoupled dynamics that converge locally to that equilibrium.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors aim to show that players can learn isolated mixed-strategy Nash equilibria in finite games even when they cannot observe opponents' utilities, provided they use learning rules that incorporate additional internal states. This would matter if true because it overcomes limitations of standard dynamics that often cycle or fail to settle at mixed points. They establish this by connecting the learning process to the design of stabilizing controllers in a decentralized setting from control theory. The work also highlights that learning dynamics cannot be universal across all games.

Core claim

For any finite game with an isolated completely mixed-strategy Nash equilibrium, there exist higher-order uncoupled learning dynamics that lead locally to that equilibrium. The proof relies on associating uncoupled learning with feedback stabilization under decentralized control, which permits constructing the required dynamics using control-theoretic tools. The paper additionally shows a lack of universality by constructing pairs of games where dynamics that learn one equilibrium cannot learn the other, drawing from simultaneous stabilization concepts.
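A minimal numerical sketch of the contrast behind this claim, under assumptions that are not taken from the paper: the game is matching pennies, and the higher-order rule is a generic anticipatory replicator variant in which each player passes its own payoff gap through a first-order washout filter (auxiliary state `r`) and adds `gamma` times the filtered derivative to the payoff it feeds into replicator dynamics. The names `simulate`, `gamma`, and `lam` are illustrative. With `gamma = 0` the rule reduces to standard replicator dynamics, whose orbits cycle around the mixed equilibrium (1/2, 1/2); with `gamma > 0` the sketch settles there from a nearby start. Each player's update uses only its own payoff gap and its own states, so the rule is uncoupled.

```python
import math


def vector_field(state, gamma, lam):
    """Anticipatory replicator field for matching pennies (illustrative only)."""
    x, y, r1, r2 = state          # x, y: probability of each player's first action
    d1 = 4.0 * y - 2.0            # player 1's own payoff gap (wants to match)
    d2 = 2.0 - 4.0 * x            # player 2's own payoff gap (wants to mismatch)
    dr1 = lam * (d1 - r1)         # washout filter: dr1 approximates d(d1)/dt
    dr2 = lam * (d2 - r2)
    dx = x * (1.0 - x) * (d1 + gamma * dr1)
    dy = y * (1.0 - y) * (d2 + gamma * dr2)
    return (dx, dy, dr1, dr2)


def rk4_step(state, dt, gamma, lam):
    """One classical Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = vector_field(state, gamma, lam)
    k2 = vector_field(shifted(k1, 0.5 * dt), gamma, lam)
    k3 = vector_field(shifted(k2, 0.5 * dt), gamma, lam)
    k4 = vector_field(shifted(k3, dt), gamma, lam)
    return tuple(
        s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(x0, y0, gamma, lam=1.0, dt=0.01, t_end=40.0):
    """Return the final distance of (x, y) from the mixed equilibrium (1/2, 1/2)."""
    state = (x0, y0, 0.0, 0.0)    # auxiliary states start at zero
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, gamma, lam)
    return math.hypot(state[0] - 0.5, state[1] - 0.5)


if __name__ == "__main__":
    print("standard replicator (gamma=0):", simulate(0.55, 0.45, gamma=0.0))
    print("anticipatory variant (gamma=1):", simulate(0.55, 0.45, gamma=1.0))
```

The sketch illustrates local behavior in one game only; it says nothing about other games or about starts far from the equilibrium.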

What carries the argument

The correspondence between higher-order uncoupled learning dynamics and decentralized feedback stabilization systems, which enables the application of stability analysis from control theory to prove local convergence of the learning process.
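To make the correspondence concrete, here is a worked linearization for an illustrative case that is an assumption of this note, not the paper's construction: matching pennies with an anticipatory replicator rule. Here $x$ and $y$ are the probabilities of each player's first action, $d_1 = 4y - 2$ and $d_2 = 2 - 4x$ are the players' own payoff gaps, and $r_i$ is a washout-filter state with rate $\lambda$ and anticipation gain $\gamma$ (all hypothetical names). Local convergence then reduces to an eigenvalue condition of the kind that feedback design addresses.

```latex
\begin{align*}
\dot{x} &= x(1-x)\bigl(d_1 + \gamma\lambda(d_1 - r_1)\bigr), & \dot{r}_1 &= \lambda(d_1 - r_1),\\
\dot{y} &= y(1-y)\bigl(d_2 + \gamma\lambda(d_2 - r_2)\bigr), & \dot{r}_2 &= \lambda(d_2 - r_2).
\end{align*}
% Linearize at x = y = 1/2, r_1 = r_2 = 0 with u = x - 1/2, v = y - 1/2,
% and collect the states as w = u + i v, s = r_1 + i r_2:
\begin{align*}
\dot{w} &= -i(1+\gamma\lambda)\,w - \tfrac{\gamma\lambda}{4}\,s, &
\dot{s} &= -4i\lambda\,w - \lambda\,s,\\
0 &= \mu^2 + \bigl(\lambda + i(1+\gamma\lambda)\bigr)\mu + i\lambda
&& \text{(characteristic polynomial)}.
\end{align*}
% gamma = 0:          (\mu + \lambda)(\mu + i) = 0, so \mu = -i lies on the imaginary axis.
% lambda = gamma = 1: \mu^2 + (1+2i)\mu + i = 0, so \mu = -1/2 + i(-2 \pm \sqrt{3})/2.
```

In this toy case the first-order dynamics have a purely imaginary mode, so the linearization does not give asymptotic stability, while the auxiliary state moves both modes to real part $-1/2$.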

If this is right

Players using only their own payoff observations can still reach mixed equilibria through carefully designed auxiliary dynamics.
Any isolated completely mixed Nash equilibrium of a finite game is locally attracting for some choice of higher-order learning rule.
No single learning dynamic works universally for all games, as shown by pairs where stabilization of one precludes the other.
The Asymptotic Best Response (ABR) property restricts attention to dynamics that asymptotically learn a best response whenever the environment is asymptotically stationary.
Bandit feedback versions of the dynamics allow learning under partial information.
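The ABR idea in the list above can be sketched for a single player facing a constant payoff vector. This is a hedged illustration, not the paper's definition or test: the two-action setup, the washout-filter form of the higher-order term, and the names `respond_to_constant_payoff`, `gap`, `gamma`, and `lam` are all assumptions. With a constant payoff gap in favor of action 1, the strategy should approach the best response and the auxiliary state should settle.

```python
def respond_to_constant_payoff(x0, gap=1.0, gamma=1.0, lam=1.0, dt=0.01, t_end=20.0):
    """Integrate a two-action anticipatory replicator rule against a fixed payoff gap.

    x is the probability of action 1; gap = payoff(action 1) - payoff(action 2).
    Returns the final (x, r), where r is the washout-filter state.
    """
    def field(x, r):
        dr = lam * (gap - r)                       # filter tracks the (constant) gap
        dx = x * (1.0 - x) * (gap + gamma * dr)    # replicator driven by modified payoff
        return dx, dr

    x, r = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        # classical Runge-Kutta step
        k1 = field(x, r)
        k2 = field(x + 0.5 * dt * k1[0], r + 0.5 * dt * k1[1])
        k3 = field(x + 0.5 * dt * k2[0], r + 0.5 * dt * k2[1])
        k4 = field(x + dt * k3[0], r + dt * k3[1])
        x += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        r += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return x, r


if __name__ == "__main__":
    for x0 in (0.1, 0.5, 0.9):
        print(x0, respond_to_constant_payoff(x0))
```

Because the filter state converges to the constant gap, the higher-order correction fades and the rule behaves like plain replicator dynamics in the limit.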

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If explicit constructions of these dynamics become available, they could inform the design of memory-augmented algorithms in multi-agent reinforcement learning.
The control-theoretic perspective might help analyze convergence in games with time-varying payoffs or noise.
Non-universality implies that practitioners may need game-dependent tuning of learning parameters for reliable equilibrium finding.
Future work could test whether similar higher-order structures apply to infinite games or stochastic approximations.

Load-bearing premise

The equivalence between the convergence of uncoupled higher-order learning and the stabilization of an associated decentralized control system holds without significant discrepancies.

What would settle it

Numerical or analytical demonstration that in a particular finite game with an isolated mixed Nash equilibrium, no higher-order uncoupled dynamics of reasonable complexity achieve local convergence.

Figures

Figures reproduced from arXiv: 2506.10874 by Jeff S. Shamma, Sarah A. Toonsi.

**Figure 2.** Higher-order replicator dynamics with linear higher-order terms as an open system.

**Figure 3.** Network structures of $\Gamma_{CY}$ and $\Gamma_{PW}$. Arrows indicate strategic dependency (e.g., in $\Gamma_{CY}$, player 1's payoff depends on player 2's strategy).

6.1 Two games with different network structures. Let us begin by discussing the class of games of interest. To this end, define $\Gamma_{CY}(c)$ to be the following (cyclic) polymatrix game: $R_1(x_1, x_2) = x_1^T M_1(c_1) x_2$, $R_2(x_2, x_3) = x_2^T M_2(c_2) x_3$, $R_3(x_3, x_4) = x_3^T M_3(c_3) x_4$, $R_4(x_4, x$…

**Figure 4.** Stable outcome of mixed-strategy equilibrium under higher-order replicator dynamics.

**Figure 5.** Dynamics of player 1 responding to $p_1 = (1, 0)$ from various initial strategies.

Original abstract

We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper uses a control-theoretic link to show that higher-order uncoupled dynamics can locally attract any isolated completely mixed NE, and uses simultaneous stabilization to show that no single dynamics works for all such NE.

The core contribution is the reduction of learnability questions for isolated completely mixed NE to decentralized feedback stabilization. The authors construct higher-order dynamics with auxiliary states that remain uncoupled (each player uses only their own payoff observations and internal states) and show local convergence for any finite game with such an NE. The non-universality result is cleaner: they build two games where any higher-order dynamics that stabilizes one NE cannot stabilize the other, because that would require a controller that simultaneously stabilizes two incompatible plants. This is a direct import of the simultaneous stabilization problem and feels like a genuinely new angle on why universal uncoupled learning fails even with extra memory.

The ABR property is a reasonable way to add a natural restriction and tie it to internal stability, though it is mostly definitional at this stage. The bandit extension is mentioned but gets less development.

The main soft spot is that the explicit construction of the dynamics is not fully visible in the abstract, so it is still unclear how much game structure leaks into the auxiliary update rules during the stabilization design. If the paper keeps the controllers strictly local to each player's payoff function, the claim holds; otherwise the uncoupled label weakens. Overall this is worth a serious referee. It is aimed at people working on dynamics in games who are open to control ideas, and the existence plus non-universality pair gives a clean organizing frame even if the proofs need tightening.
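The obstruction behind "two incompatible plants" can be seen in a textbook-style toy case. The plants below are an assumption chosen for illustration and are not the paper's two games: $P_1(s) = 1/(s-1)$ and $P_2(s) = -1/(s-1)$, with one proper controller $C(s) = N_c(s)/D_c(s)$ (real coefficients, $D_c$ monic, $N_c$ and $D_c$ coprime) in negative feedback around either plant.

```latex
\begin{align*}
p_1(s) &= (s-1)\,D_c(s) + N_c(s) && \text{(closed-loop polynomial with } P_1\text{)},\\
p_2(s) &= (s-1)\,D_c(s) - N_c(s) && \text{(closed-loop polynomial with } P_2\text{)},\\
p_1(1) + p_2(1) &= 2\,(1-1)\,D_c(1) = 0 .
\end{align*}
% Since \deg N_c \le \deg D_c, both p_1 and p_2 are monic.
% A monic real Hurwitz polynomial has strictly positive coefficients,
% so stability of both loops would force p_1(1) > 0 and p_2(1) > 0,
% contradicting p_1(1) + p_2(1) = 0.
```

So no single proper linear controller stabilizes both toy plants, even though each is easily stabilized on its own. According to the abstract, the paper's non-universality result applies the same control-theoretic concept to the linearizations of two specific games.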

Referee Report

2 major / 2 minor

Summary. The manuscript claims that for any finite game possessing an isolated completely mixed-strategy Nash equilibrium, there exist higher-order uncoupled learning dynamics (allowing auxiliary states but no access to opponents' utilities) that locally converge to that equilibrium. It establishes this via an association between uncoupled learning and decentralized feedback stabilization, proves a non-universality result by constructing two games whose completely mixed NE cannot be simultaneously learned by any single higher-order dynamics (via simultaneous stabilization), introduces the Asymptotic Best Response (ABR) property as a natural restriction on allowable dynamics, relates ABR to internal stability, and extends the analysis to a bandit setting with a higher-order replicator variant.

Significance. If the central existence and non-universality claims hold with the uncoupled property preserved, the work would offer a control-theoretic route to designing higher-order uncoupled dynamics for mixed NE learnability, which is a notable contribution to the literature on learning in games. The non-universality construction and the ABR property provide concrete limitations and restrictions that sharpen understanding of what is learnable. The paper does not appear to supply machine-checked proofs or fully reproducible code, but the explicit game constructions for non-universality are a positive feature.

major comments (2)

[Abstract, paragraph on link between uncoupled learning and feedback stabilization] The existence result for higher-order uncoupled dynamics rests on mapping uncoupled learning to decentralized stabilization and then constructing auxiliary-state dynamics that locally attract the target NE. It is not shown that this mapping produces update rules (including auxiliary states) that depend only on each player's own payoff function and own observations; if the stabilization design implicitly requires the full payoff matrix or global game structure, the resulting objects would fail to be uncoupled in the standard sense and the central claim would not follow.
[Section establishing the non-universality result] The two-game construction shows that no single higher-order dynamics can learn both completely mixed NEs, but the argument must explicitly verify that the class of dynamics considered matches the higher-order uncoupled definition used in the existence claim; otherwise the non-universality statement applies to a different (possibly larger) class.

minor comments (2)

The definition of 'higher-order' dynamics (auxiliary states and their update rules) should be stated formally before the control-theoretic link is invoked, to make the subsequent constructions easier to follow.
In the bandit-setting section, clarify how the bandit version of higher-order replicator dynamics maintains the uncoupled property when only payoff samples are observed.
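For context on the bandit question, the classical first-order link between bandit feedback and replicator dynamics can be checked exactly. The sketch below uses Cross's reinforcement rule (the rule analyzed by Börgers and Sarin), in which a player observes only the reward of the action it played; its expected one-step change equals the step size times the replicator vector field. This is an assumption-laden illustration: the function names are hypothetical, rewards are taken as fixed numbers in [0, 1], and the paper's higher-order bandit variant is not specified in the text above.

```python
def replicator_field(x, p):
    """Replicator vector field x_i * (p_i - x . p) for strategy x and payoffs p."""
    avg = sum(xi * pi for xi, pi in zip(x, p))
    return [xi * (pi - avg) for xi, pi in zip(x, p)]


def cross_update(x, action, reward, step):
    """One bandit update: only the reward of the played action is used."""
    return [
        xi + step * reward * ((1.0 if i == action else 0.0) - xi)
        for i, xi in enumerate(x)
    ]


def expected_increment(x, p, step):
    """Exact expectation of the update over the player's own random action."""
    n = len(x)
    inc = [0.0] * n
    for a in range(n):                      # action a is played with probability x[a]
        new_x = cross_update(x, a, p[a], step)
        for i in range(n):
            inc[i] += x[a] * (new_x[i] - x[i])
    return inc


if __name__ == "__main__":
    x = [0.2, 0.5, 0.3]
    p = [0.9, 0.1, 0.4]                     # rewards assumed to lie in [0, 1]
    print(expected_increment(x, p, 0.05))
    print([0.05 * f for f in replicator_field(x, p)])
```

The update needs no knowledge of opponents' utilities or of unplayed actions' payoffs, which is the sense in which a bandit rule can stay uncoupled.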

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive comments on our manuscript. We address each of the major comments below.

Referee: Abstract, paragraph on link between uncoupled learning and feedback stabilization: the existence result for higher-order uncoupled dynamics rests on mapping uncoupled learning to decentralized stabilization and then constructing auxiliary-state dynamics that locally attract the target NE. It is not shown that this mapping produces update rules (including auxiliary states) that depend only on each player's own payoff function and own observations; if the stabilization design implicitly requires the full payoff matrix or global game structure, the resulting objects would fail to be uncoupled in the standard sense and the central claim would not follow.

Authors: The referee correctly identifies a point that requires clarification. In our construction, the decentralized feedback stabilization is performed locally for each player, relying exclusively on that player's own payoff function and their private observations of their own strategy. The auxiliary states are introduced as part of each player's individual dynamics and do not encode or require any information about opponents' payoffs or the overall game structure. Thus, the resulting higher-order dynamics remain uncoupled. We will revise the manuscript to include a more explicit discussion of this property in the abstract and the relevant section to prevent any ambiguity. (Revision: yes.)
Referee: Section establishing the non-universality result: the two-game construction shows that no single higher-order dynamics can learn both completely mixed NEs, but the argument must explicitly verify that the class of dynamics considered matches the higher-order uncoupled definition used in the existence claim; otherwise the non-universality statement applies to a different (possibly larger) class.

Authors: We appreciate this suggestion for ensuring rigor. The non-universality result is derived for the identical class of higher-order uncoupled dynamics defined and used in the existence result. The proof via simultaneous stabilization impossibility is applied to dynamics that are both higher-order (with auxiliary states) and uncoupled (no access to opponents' utilities). We will update the manuscript to include an explicit cross-reference to the definition of higher-order uncoupled dynamics at the start of the non-universality section, confirming that the class is the same. (Revision: yes.)

Circularity Check

0 steps flagged

No circularity: central existence result constructed via external control-theoretic association

The paper presents the link between uncoupled learning and decentralized feedback stabilization as an external association used to construct higher-order dynamics. The existence claim for dynamics attracting an isolated completely mixed NE, the non-universality via simultaneous stabilization, the ABR property, and the bandit extension are all developed from this association and game-theoretic constructions without reducing to self-defined parameters, fitted inputs renamed as predictions, or load-bearing self-citations that collapse the derivation. No equations or steps in the provided text exhibit the enumerated circular patterns; the result remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard finite-game assumptions and the validity of the learning-to-stabilization mapping; no free parameters or invented entities are introduced in the abstract.

axioms (2)

Domain assumption: the finite game under consideration has an isolated completely mixed-strategy Nash equilibrium.
Invoked in the existence statement for any finite game with such an NE.
Domain assumption: uncoupled learning can be associated with decentralized feedback stabilization.
Central link used to construct the higher-order dynamics.

pith-pipeline@v0.9.0 · 5791 in / 1232 out tokens · 28271 ms · 2026-05-19T09:49:50.225079+00:00 · methodology

[1] [1]

A simple adaptive procedure leading to correlated equilibrium,

S. Hart and A. Mas-Colell, “A simple adaptive procedure leading to correlated equilibrium,” Econo- metrica, vol. 68, no. 5, pp. 1127–1150, 2000

work page 2000

[2] [2]

Fictitious play in 2 ×n games,

U. Berger, “Fictitious play in 2 ×n games,”Journal of Economic Theory, vol. 120, no. 2, pp. 139–154, 2005

work page 2005

[3] [3]

Fictitious play property for games with identical interests,

D. Monderer and L. S. Shapley, “Fictitious play property for games with identical interests,” Journal of Economic Theory, vol. 68, no. 1, pp. 258–265, 1996

work page 1996

[4] [4]

Unified convergence proofs of continuous-time fictitious play,

J. S. Shamma and G. Arslan, “Unified convergence proofs of continuous-time fictitious play,” IEEE Transactions on Automatic Control, vol. 49, no. 7, pp. 1137–1141, 2004. 42

work page 2004

[5] [5]

Game dynamics as the meaning of a game,

C. Papadimitriou and G. Piliouras, “Game dynamics as the meaning of a game,” SIGecom Exch. , vol. 16, pp. 53–63, may 2019

work page 2019

[6] [6]

Some topics in two-person games,

L. S. Shapley, “Some topics in two-person games,” in Advances in Game Theory (L. Shapley, M. Dresher, and A. Tucker, eds.), pp. 1–29, Princeton, NJ: Princeton University Press, 1964

work page 1964

[7] D. P. Foster and H. Young, “On the nonconvergence of fictitious play in coordination games,” Games and Economic Behavior, vol. 25, no. 1, pp. 79–96, 1998.

[8] H. P. Young, Individual Strategy and Social Structure: An Evolutionary Theory of Institutions. Princeton University Press, 1998.

[9] R. D. Kleinberg, K. Ligett, G. Piliouras, and É. Tardos, “Beyond the Nash equilibrium barrier,” in ICS, vol. 20, pp. 125–140, 2011.

[10] G. Piliouras and J. S. Shamma, “Optimization despite chaos: Convex relaxations to complex limit sets via Poincaré recurrence,” in Proceedings of the 2014 Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 861–873, 2014.

[11] S. Hart and A. Mas-Colell, Simple Adaptive Strategies. World Scientific, 2013.

[12] S. Hart and A. Mas-Colell, “Uncoupled dynamics do not lead to Nash equilibrium,” American Economic Review, vol. 93, pp. 1830–1836, December 2003.

[13] J. Milionis, C. Papadimitriou, G. Piliouras, and K. Spendlove, “An impossibility theorem in game dynamics,” Proceedings of the National Academy of Sciences, vol. 120, no. 41, 2023.

[14] Y. Sato, E. Akiyama, and J. D. Farmer, “Chaos in learning a simple two-person game,” Proceedings of the National Academy of Sciences, vol. 99, no. 7, pp. 4748–4751, 2002.

[15] Y. K. Cheung and G. Piliouras, “Vortices instead of equilibria in minmax optimization: Chaos and butterfly effects of online learning in zero-sum games,” CoRR, vol. abs/1905.08396, 2019.

[16] E.-V. Vlatakis-Gkaragkounis, L. Flokas, T. Lianeas, P. Mertikopoulos, and G. Piliouras, “No-regret learning and mixed Nash equilibria: They do not mix,” in Advances in Neural Information Processing Systems (H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, eds.), vol. 33, pp. 1380–1391, Curran Associates, Inc., 2020.

[17] V. Muthukumar, S. Phade, and A. Sahai, “On the impossibility of convergence of mixed strategies with optimal no-regret learning,” Mathematics of Operations Research, 2024.

[18] C. Daskalakis and I. Panageas, “The limit points of (optimistic) gradient descent in min-max optimization,” in Advances in Neural Information Processing Systems (NeurIPS), 2018.

[19] B. Polyak, “Some methods of speeding up the convergence of iteration methods,” USSR Computational Mathematics and Mathematical Physics, vol. 4, pp. 1–17, Dec. 1964.

[20] C. Daskalakis, A. Ilyas, V. Syrgkanis, and H. Zeng, “Training GANs with optimism.” arXiv preprint arXiv:1711.00141, 2017.

[21] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning (S. Dasgupta and D. McAllester, eds.), vol. 28 of Proceedings of Machine Learning Research, (Atlanta, Georgia, USA), pp. 1139–1147, PMLR, 17–19 Jun 2013.

[22] J. S. Shamma and G. Arslan, “Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria,” IEEE Transactions on Automatic Control, vol. 50, pp. 312–327, March 2005.

[23] T. Başar, “Relaxation techniques and asynchronous algorithms for on-line computation of non-cooperative equilibria,” Journal of Economic Dynamics and Control, vol. 11, no. 4, pp. 531–549, 1987.

[24] J. Conlisk, “Adaptation in games: Two solutions to the Crawford puzzle,” Journal of Economic Behavior & Organization, vol. 22, no. 1, pp. 25–50, 1993.

[25] S. Flam and J. Morgan, “Newtonian mechanics and Nash play,” International Game Theory Review, vol. 6, Jul. 2003.

[26] R. Laraki and P. Mertikopoulos, “Higher order game dynamics,” Journal of Economic Theory, vol. 148, pp. 2666–2695, Jun. 2013.

[27] B. Gao and L. Pavel, “On passivity, reinforcement learning and higher order learning in multiagent finite games,” IEEE Transactions on Automatic Control, vol. 66, no. 1, pp. 121–136, 2021.

[28] G. Arslan and J. S. Shamma, “Anticipatory learning in general evolutionary games,” in Proceedings of the 45th IEEE Conference on Decision and Control, pp. 6289–6294, 2006.

[29] S. A. Toonsi and J. S. Shamma, “Higher-order uncoupled dynamics do not lead to Nash equilibrium — except when they do,” in Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, (Red Hook, NY, USA), Curran Associates Inc., 2024.

[30] P. Schuster and K. Sigmund, “Replicator dynamics,” Journal of Theoretical Biology, vol. 100, no. 3, pp. 533–538, 1983.

[31] S. Sorin, “Replicator dynamics: Old and new,” Journal of Dynamics and Games, vol. 7, no. 4, pp. 365–386, 2020.

[32] J. Harsanyi, “Oddness of the number of equilibrium points: A new proof,” International Journal of Game Theory, vol. 2, pp. 235–250, 1973.

[33] J. Weibull, Evolutionary Game Theory. MIT Press, 1997.

[34] Y. K. Cheung and G. Piliouras, “Online optimization in games via control theory: Connecting regret, passivity and Poincaré recurrence,” in Proceedings of the 38th International Conference on Machine Learning (M. Meila and T. Zhang, eds.), vol. 139 of Proceedings of Machine Learning Research, pp. 1855–1865, PMLR, 2021.

[35] H. Abdelraouf, G. Piliouras, and J. S. Shamma, “Passivity, no-regret, and convergent learning in contractive games,” 2025. arXiv preprint.

[36] D. Youla, J. Bongiorno, and C. Lu, “Single-loop feedback-stabilization of linear multivariable dynamical plants,” Automatica, vol. 10, no. 2, pp. 159–173, 1974.

[37] M. Benaïm, “Dynamics of stochastic approximation algorithms,” in Séminaire de Probabilités XXXIII (J. Azéma, M. Émery, M. Ledoux, and M. Yor, eds.), (Berlin, Heidelberg), pp. 1–68, Springer Berlin Heidelberg, 1999.

[38] V. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008.

[39] M. Spivak, Calculus On Manifolds: A Modern Approach To Classical Theorems Of Advanced Calculus. Avalon Publishing, 1965.

[40] J. P. Bailey, “On the uniqueness of Nash equilibria in multiagent matrix games.” arXiv preprint arXiv:2410.16548, 2024.

[41] J. Hofbauer, S. Sorin, and Y. Viossat, “Time average replicator and best-reply dynamics,” Mathematics of Operations Research, vol. 34, no. 2, pp. 263–269, 2009.

[42] A. S. Poznyak, “Chapter 9 - stable matrices and polynomials,” in Advanced Mathematical Tools for Automatic Control Engineers: Deterministic Techniques (A. S. Poznyak, ed.), pp. 139–174, Oxford: Elsevier, 2008.

[43] A. B. Ozguler and K. A. Unyelioulu, “Decentralized strong stabilization problem,” in 1992 American Control Conference, pp. 3294–3298, 1992.

[44] C. Daskalakis and C. H. Papadimitriou, “On a network generalization of the minmax theorem,” in Automata, Languages and Programming (S. Albers, A. Marchetti-Spaccamela, Y. Matias, S. Nikoletseas, and W. Thomas, eds.), (Berlin, Heidelberg), pp. 423–434, Springer Berlin Heidelberg, 2009.

[45] H. Moulin and J.-P. Vial, “Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon,” International Journal of Game Theory, vol. 7, pp. 201–221, 1978.

[46] J.-P. Aubin and A. Cellina, Differential Inclusions. Grundlehren der mathematischen Wissenschaften, Springer Berlin, Heidelberg, 1st ed., 2012.

[47] K. Narendra and M. Thathachar, Learning Automata: An Introduction. Dover Books on Electrical Engineering Series, Dover Publications, Incorporated, 2012.

[48] K. Verbeeck, A. Nowé, T. Lenaerts, and J. Parent, “Learning to reach the Pareto optimal Nash equilibrium as a team,” in AI 2002: Advances in Artificial Intelligence (B. McKay and J. Slaney, eds.), (Berlin, Heidelberg), pp. 407–418, Springer Berlin Heidelberg, 2002.

[49] T. Börgers and R. Sarin, “Learning through reinforcement and replicator dynamics,” Journal of Economic Theory, vol. 77, no. 1, pp. 1–14, 1997.

[50] E. Hopkins and M. Posch, “Attainability of boundary points under reinforcement learning,” Games and Economic Behavior, vol. 53, no. 1, pp. 110–125, 2005.

[51] G. C. Chasparis and J. S. Shamma, “Distributed dynamic reinforcement of efficient outcomes in multi-agent coordination and network formation,” Dynamic Games and Applications, vol. 2, no. 1, pp. 18–50, 2012.

[52] L. Hammond, A. Chan, J. Clifton, J. Hoelscher-Obermaier, A. Khan, E. McLean, C. Smith, W. Barfuss, J. Foerster, T. Gavenčiak, T. A. Han, E. Hughes, V. Kovařík, J. Kulveit, J. Z. Leibo, C. Oesterheld, C. S. de Witt, N. Shah, M. Wellman, P. Bova, T. Cimpeanu, C. Ezell, Q. Feuillade-Montixi, M. Franklin, E. Kran, I. Krawczuk, M. Lamparth, N. Lauffer..., “Multi-agent risks from advanced AI,” 2025.

[53] M. J. Fox and J. S. Shamma, “Population games, stable games, and passivity,” Games, vol. 4, no. 4, pp. 561–583, 2013.

[54] H. Khalil, Nonlinear Systems. Prentice Hall, third ed., 2002.

[55] J. P. Hespanha, Linear Systems Theory: Second Edition. Princeton University Press, 2018.

[56] W. J. Rugh, Linear System Theory. Prentice Hall, 1996.

[57] M. Hassouneh, H.-C. Lee, and E. Abed, “Washout filters in feedback control: benefits, limitations and extensions,” in Proceedings of the 2004 American Control Conference, pp. 3950–3955, 2004.

[58] E. Davison and T. Chang, “Decentralized stabilization and pole assignment for general proper systems,” IEEE Transactions on Automatic Control, vol. 35, no. 6, pp. 652–664, 1990.