arxiv: 2506.10874 · v2 · submitted 2025-06-12 · 💻 cs.MA · cs.GT · cs.SY · eess.SY

Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium

Pith reviewed 2026-05-19 09:49 UTC · model grok-4.3

classification 💻 cs.MA · cs.GT · cs.SY · eess.SY
keywords uncoupled learning · higher-order dynamics · Nash equilibrium · mixed strategies · finite games · decentralized control · replicator dynamics

The pith

Higher-order uncoupled dynamics exist that locally converge to any isolated mixed Nash equilibrium.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors aim to show that players can learn isolated mixed-strategy Nash equilibria in finite games even when they cannot observe opponents' utilities, provided they use learning rules that incorporate additional internal states. This would matter if true because it overcomes limitations of standard dynamics that often cycle or fail to settle at mixed points. They establish this by connecting the learning process to the design of stabilizing controllers in a decentralized setting from control theory. The work also highlights that learning dynamics cannot be universal across all games.

Core claim

For any finite game with an isolated completely mixed-strategy Nash equilibrium, there exist higher-order uncoupled learning dynamics that lead locally to that equilibrium. The proof relies on associating uncoupled learning with feedback stabilization under decentralized control, which permits constructing the required dynamics using control-theoretic tools. The paper additionally shows a lack of universality by constructing pairs of games where dynamics that learn one equilibrium cannot learn the other, drawing from simultaneous stabilization concepts.
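
To make the claim concrete, here is a minimal Python sketch. It is not the paper's construction. It assumes matching pennies as the game, a two-action replicator rule, and a hypothetical auxiliary state z per player (a first-order low-pass filter of that player's own payoff gap) with illustrative gains gamma and lam. Each update reads only the player's own payoff gap, so the rule stays uncoupled. Setting gamma = 0 recovers plain replicator dynamics, which orbit the mixed equilibrium (1/2, 1/2); with gamma > 0 the linearization at the equilibrium is stable, so nearby trajectories should settle.

```python
import math

def payoff_gaps(x, y):
    """Own-payoff gaps (first action minus second action) in matching pennies.

    x and y are the probabilities that players 1 and 2 play their first action.
    Player 1 only uses u1 and player 2 only uses u2 (uncoupled information).
    """
    u1 = 4.0 * y - 2.0   # player 1 wants to match
    u2 = 2.0 - 4.0 * x   # player 2 wants to mismatch
    return u1, u2

def simulate(gamma, lam=1.0, x0=0.6, y0=0.45, dt=1e-3, horizon=40.0):
    """Forward-Euler run; returns the final distance to the NE (1/2, 1/2).

    gamma and lam are illustrative gains (assumptions, not from the paper).
    gamma = 0 gives the plain first-order replicator dynamics.
    """
    x, y = x0, y0
    z1, z2 = payoff_gaps(x, y)  # auxiliary states start at the current gaps
    for _ in range(int(horizon / dt)):
        u1, u2 = payoff_gaps(x, y)
        # Anticipatory payoff: own gap plus a filtered estimate of its rate.
        v1 = u1 + gamma * lam * (u1 - z1)
        v2 = u2 + gamma * lam * (u2 - z2)
        dx = x * (1.0 - x) * v1
        dy = y * (1.0 - y) * v2
        dz1 = lam * (u1 - z1)
        dz2 = lam * (u2 - z2)
        x, y = x + dt * dx, y + dt * dy
        z1, z2 = z1 + dt * dz1, z2 + dt * dz2
    return math.hypot(x - 0.5, y - 0.5)

first_order_gap = simulate(gamma=0.0)
higher_order_gap = simulate(gamma=1.0)
print(f"first-order distance to NE:  {first_order_gap:.4f}")
print(f"higher-order distance to NE: {higher_order_gap:.2e}")
```

In this toy setup the first-order run should stay at roughly its initial distance from the equilibrium, while the higher-order run shrinks that distance by many orders of magnitude. The exact figures depend on the step size and gains chosen here.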

What carries the argument

The correspondence between higher-order uncoupled learning dynamics and decentralized feedback stabilization systems, which enables the application of stability analysis from control theory to prove local convergence of the learning process.
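
One way to write this correspondence down is sketched below in LaTeX. The symbols are assumptions chosen for illustration and may differ from the paper's notation: x_i is player i's mixed strategy, z_i the auxiliary state, p_i the player's own payoff vector, P_i the map from opponents' strategies to that vector, and K_i(s) the transfer function of player i's linearized learning rule.

```latex
% Player i: the update depends only on the player's own quantities (uncoupled).
\dot{x}_i = f_i(x_i, z_i, p_i), \qquad
\dot{z}_i = g_i(x_i, z_i, p_i), \qquad i = 1, \dots, n

% Game: each payoff vector is determined by the opponents' strategies.
p_i = P_i(x_{-i})

% Linearization about an isolated completely mixed NE x^*:
% the game acts as a static "plant" G, and the players act as a
% block-diagonal (decentralized) controller.
\delta p = G \, \delta x, \qquad
G = \left. \frac{\partial P}{\partial x} \right|_{x^*}, \qquad
\delta x = \operatorname{diag}\bigl(K_1(s), \dots, K_n(s)\bigr) \, \delta p
```

Read this way, local convergence of the learning process corresponds, roughly and restricted to directions tangent to the strategy simplices, to stability of the closed loop formed by G and the block-diagonal controller. The block-diagonal structure is what "decentralized" means here: player i's controller K_i sees only player i's own payoff signal.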

If this is right

  • Players using only their own payoff observations can still reach mixed equilibria through carefully designed auxiliary dynamics.
  • Any isolated mixed Nash equilibrium becomes a locally attractive point for some choice of higher-order learning rule.
  • No single learning dynamic works universally for all games, as shown by pairs where stabilization of one precludes the other.
  • The asymptotic best response property provides a way to ensure dynamics remain consistent with best responses in stationary settings.
  • Bandit feedback versions of the dynamics allow learning under partial information.
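
The non-universality item in the list above borrows the control-theoretic notion of simultaneous stabilization: asking one controller to stabilize two different plants. The Python sketch below is a toy scalar illustration of how that can fail. It is far simpler than the paper's two-game construction and covers static gains only, whereas the paper concerns dynamic higher-order rules. Both plants and the gain grid are hypothetical.

```python
def closed_loop_pole(a, b, k):
    """Pole of x' = a*x + b*u under the static feedback u = -k*x."""
    return a - b * k

# Two hypothetical scalar plants that differ only in the sign of the input.
plant_a = (1.0, 1.0)    # x' = x + u
plant_b = (1.0, -1.0)   # x' = x - u

# Candidate static gains on a grid from -10 to 10.
gains = [i / 10.0 for i in range(-100, 101)]

# A gain stabilizes a plant when the closed-loop pole is negative.
stabilizes_a = [k for k in gains if closed_loop_pole(*plant_a, k) < 0]
stabilizes_b = [k for k in gains if closed_loop_pole(*plant_b, k) < 0]
common = sorted(set(stabilizes_a) & set(stabilizes_b))

print("gains stabilizing plant A:", len(stabilizes_a))
print("gains stabilizing plant B:", len(stabilizes_b))
print("gains stabilizing both:   ", len(common))
```

Plant A needs k > 1 and plant B needs k < -1, so each plant is stabilizable on its own but no single gain works for both. The paper's result has the same flavor at the level of games: a rule tuned to one game's mixed equilibrium is ruled out for the other.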

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If explicit constructions of these dynamics become available, they could inform the design of memory-augmented algorithms in multi-agent reinforcement learning.
  • The control-theoretic perspective might help analyze convergence in games with time-varying payoffs or noise.
  • Non-universality implies that practitioners may need game-dependent tuning of learning parameters for reliable equilibrium finding.
  • Future work could test whether similar higher-order structures apply to infinite games or stochastic approximations.

Load-bearing premise

The equivalence between the convergence of uncoupled higher-order learning and the stabilization of an associated decentralized control system holds without significant discrepancies.

What would settle it

Numerical or analytical demonstration that in a particular finite game with an isolated mixed Nash equilibrium, no higher-order uncoupled dynamics of reasonable complexity achieve local convergence.

Figures

Figures reproduced from arXiv: 2506.10874 by Jeff S. Shamma, Sarah A. Toonsi.

Figure 1: Learning dynamics in feedback with a game.
Figure 2: Higher-order replicator dynamics with linear higher-order terms as an open system.
Figure 3: Network structures of ΓCY and ΓPW. Arrows indicate strategic dependency (e.g., in ΓCY, player 1's payoff depends on player 2's strategy).
  Excerpt from the surrounding text (Section 6.1, Two games with different network structures): Let us begin by discussing the class of games of interest. To this end, define ΓCY(c) to be the following (cyclic) polymatrix game: R1(x1, x2) = x1^T M1(c1) x2, R2(x2, x3) = x2^T M2(c2) x3, R3(x3, x4) = x3^T M3(c3) x4, R4(x4, x…
Figure 4: Stable outcome of mixed-strategy equilibrium under higher-order replicator dynamics.
Figure 5: Dynamics of player 1 responding to p1 = (1, 0) from various initial strategies.
Original abstract

We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.
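
The ABR idea in the abstract can be illustrated with a minimal Python sketch under simplifying assumptions: a single player with two actions, plain first-order replicator dynamics rather than the paper's higher-order class, and a payoff vector held fixed at (1, 0), which is the stationary environment suggested by the Figure 5 caption. The function name and parameters are hypothetical.

```python
def replicator_vs_fixed_payoff(x0, payoff=(1.0, 0.0), dt=1e-3, horizon=30.0):
    """Two-action replicator dynamics against a constant payoff vector.

    x is the probability of the first action; forward-Euler integration.
    """
    x = x0
    gap = payoff[0] - payoff[1]  # advantage of the first action
    for _ in range(int(horizon / dt)):
        x += dt * x * (1.0 - x) * gap
    return x

# Several interior starting strategies, all facing the same fixed payoffs.
initial_strategies = (0.05, 0.3, 0.5, 0.9)
finals = [replicator_vs_fixed_payoff(x0) for x0 in initial_strategies]
for x0, xf in zip(initial_strategies, finals):
    print(f"x0 = {x0:.2f} -> final probability of best action = {xf:.6f}")
```

From every interior starting point the probability of the better action approaches 1, which is the best response to the fixed payoff vector. A start exactly on the boundary (x0 = 0) would stay there, a reminder that this sketch covers interior initial conditions only.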

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that for any finite game possessing an isolated completely mixed-strategy Nash equilibrium, there exist higher-order uncoupled learning dynamics (allowing auxiliary states but no access to opponents' utilities) that locally converge to that equilibrium. It establishes this via an association between uncoupled learning and decentralized feedback stabilization, proves a non-universality result by constructing two games whose completely mixed NE cannot be simultaneously learned by any single higher-order dynamics (via simultaneous stabilization), introduces the Asymptotic Best Response (ABR) property as a natural restriction on allowable dynamics, relates ABR to internal stability, and extends the analysis to a bandit setting with a higher-order replicator variant.

Significance. If the central existence and non-universality claims hold with the uncoupled property preserved, the work would offer a control-theoretic route to designing higher-order uncoupled dynamics for mixed NE learnability, which is a notable contribution to the literature on learning in games. The non-universality construction and the ABR property provide concrete limitations and restrictions that sharpen understanding of what is learnable. The paper does not appear to supply machine-checked proofs or fully reproducible code, but the explicit game constructions for non-universality are a positive feature.

major comments (2)
  1. [Abstract, paragraph on link between uncoupled learning and feedback stabilization] The existence result for higher-order uncoupled dynamics rests on mapping uncoupled learning to decentralized stabilization and then constructing auxiliary-state dynamics that locally attract the target NE. It is not shown that this mapping produces update rules (including auxiliary states) that depend only on each player's own payoff function and own observations; if the stabilization design implicitly requires the full payoff matrix or global game structure, the resulting objects would fail to be uncoupled in the standard sense and the central claim would not follow.
  2. [Section establishing the non-universality result] The two-game construction shows that no single higher-order dynamics can learn both completely mixed NEs, but the argument must explicitly verify that the class of dynamics considered matches the higher-order uncoupled definition used in the existence claim; otherwise the non-universality statement applies to a different (possibly larger) class.
minor comments (2)
  1. The definition of 'higher-order' dynamics (auxiliary states and their update rules) should be stated formally before the control-theoretic link is invoked, to make the subsequent constructions easier to follow.
  2. In the bandit-setting section, clarify how the bandit version of higher-order replicator dynamics maintains the uncoupled property when only payoff samples are observed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive comments on our manuscript. We address each of the major comments below.

  1. Referee: Abstract, paragraph on link between uncoupled learning and feedback stabilization: the existence result for higher-order uncoupled dynamics rests on mapping uncoupled learning to decentralized stabilization and then constructing auxiliary-state dynamics that locally attract the target NE. It is not shown that this mapping produces update rules (including auxiliary states) that depend only on each player's own payoff function and own observations; if the stabilization design implicitly requires the full payoff matrix or global game structure, the resulting objects would fail to be uncoupled in the standard sense and the central claim would not follow.

    Authors: The referee correctly identifies a point that requires clarification. In our construction, the decentralized feedback stabilization is performed locally for each player, relying exclusively on that player's own payoff function and their private observations of their own strategy. The auxiliary states are introduced as part of each player's individual dynamics and do not encode or require any information about opponents' payoffs or the overall game structure. Thus, the resulting higher-order dynamics remain uncoupled. We will revise the manuscript to include a more explicit discussion of this property in the abstract and the relevant section to prevent any ambiguity.
    Revision: yes

  2. Referee: Section establishing the non-universality result: the two-game construction shows that no single higher-order dynamics can learn both completely mixed NEs, but the argument must explicitly verify that the class of dynamics considered matches the higher-order uncoupled definition used in the existence claim; otherwise the non-universality statement applies to a different (possibly larger) class.

    Authors: We appreciate this suggestion for ensuring rigor. The non-universality result is derived for the identical class of higher-order uncoupled dynamics defined and used in the existence result. The proof via simultaneous stabilization impossibility is applied to dynamics that are both higher-order (with auxiliary states) and uncoupled (no access to opponents' utilities). We will update the manuscript to include an explicit cross-reference to the definition of higher-order uncoupled dynamics at the start of the non-universality section, confirming that the class is the same.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: central existence result constructed via external control-theoretic association

The paper presents the link between uncoupled learning and decentralized feedback stabilization as an external association used to construct higher-order dynamics. The existence claim for dynamics attracting an isolated completely mixed NE, the non-universality via simultaneous stabilization, the ABR property, and the bandit extension are all developed from this association and game-theoretic constructions without reducing to self-defined parameters, fitted inputs renamed as predictions, or load-bearing self-citations that collapse the derivation. No equations or steps in the provided text exhibit the enumerated circular patterns; the result remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard finite-game assumptions and the validity of the learning-to-stabilization mapping; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • Domain assumption: the finite game under consideration has an isolated completely mixed-strategy Nash equilibrium.
    Invoked in the existence statement for any finite game with such an NE.
  • Domain assumption: uncoupled learning can be associated with decentralized feedback stabilization.
    Central link used to construct the higher-order dynamics.

pith-pipeline@v0.9.0 · 5791 in / 1232 out tokens · 28271 ms · 2026-05-19T09:49:50.225079+00:00 · methodology
