Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach

Weiming Ou; Xiao Wang; Yi Feng

arxiv: 2605.19392 · v1 · pith:OZ2OVOZZnew · submitted 2026-05-19 · 💻 cs.LG

Pith reviewed 2026-05-20 07:47 UTC · model grok-4.3

keywords: Adam · zero-sum games · ODE analysis · GAN training · momentum parameters · local convergence · implicit regularization

The pith

In zero-sum games the first- and second-order momentum terms of Adam-DA reverse the convergence roles they play in minimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives ordinary differential equations that serve as continuous-time limits of the discrete Adam-DA updates used for zero-sum games. These ODEs make it possible to analyze local convergence and implicit gradient regularization in a tractable way. The central result is that the first-order momentum parameter slows convergence while the second-order momentum parameter accelerates it, exactly opposite to the well-known effects in ordinary minimization. The predictions are checked by running GAN training on several architectures and datasets. If the ODE approximation holds, the same reversed momentum behavior should appear in any zero-sum setting where Adam-DA is applied.

Core claim

By taking the continuous-time limit of the Adam-DA iterates, the authors obtain a system of ODEs whose equilibria and stability properties can be studied directly. Analysis of these ODEs shows that raising the first-order momentum coefficient destabilizes the saddle while raising the second-order coefficient stabilizes it; the signs of these effects are reversed relative to the standard minimization case. The same ODEs also reveal an implicit regularization term whose form depends on the momentum parameters in the opposite manner from gradient descent.

What carries the argument

The system of ordinary differential equations obtained as the continuous-time limit of the Adam-DA discrete updates.

If this is right

Local convergence of Adam-DA to a saddle can be read off from the eigenvalues of the linearized ODE.
The implicit regularization induced by Adam-DA in games takes the opposite functional form from the regularization induced in minimization.
Tuning guidelines for Adam-DA in GANs should invert the usual momentum recommendations.
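
The eigenvalue reading in the first item can be sketched numerically. The snippet below is a minimal illustration, not the paper's code: it takes the matrix quoted in the Lean theorems section of this page (`JAdam = 1/√ϵ (I - h(1+β)/(2√ϵ(1-β)) J) J`) and assumes that J is the Jacobian of the descent-ascent vector field at the saddle, so that eigenvalues with negative real part mean local stability. The helper names (`j_adam`, `max_real_part`) and the values of h, β, and ϵ are hypothetical choices for illustration.

```python
import cmath

def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eig2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + disc) / 2, (tr - disc) / 2)

def j_adam(J, h, beta, eps):
    """Quoted matrix (1/sqrt(eps)) * (I - c*J) * J, c = h(1+beta)/(2 sqrt(eps)(1-beta)).

    Assumption: J is the Jacobian of the descent-ascent vector field.
    """
    root = eps ** 0.5
    c = h * (1 + beta) / (2 * root * (1 - beta))
    J2 = matmul2(J, J)
    return [[(J[i][j] - c * J2[i][j]) / root for j in range(2)] for i in range(2)]

def max_real_part(J, h, beta, eps):
    """Largest real part among the eigenvalues; negative means locally stable."""
    return max(ev.real for ev in eig2(j_adam(J, h, beta, eps)))

# Bilinear game f(x, y) = x * y with vector field (-y, x).
J_bilinear = [[0.0, -1.0], [1.0, 0.0]]
growth = [max_real_part(J_bilinear, h=1e-3, beta=b, eps=1e-2)
          for b in (0.0, 0.5, 0.9)]
print(growth)
```

Under these assumptions the largest real part grows with β on the bilinear game, which is the direction the page attributes to the first-order momentum. The quoted matrix contains no second-order momentum term, so this sketch says nothing about ρ.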

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ODE construction could be applied to other adaptive optimizers such as RMSProp-DA to check whether the momentum reversal is specific to Adam or generic.
If the reversal persists in non-convex zero-sum problems, it may explain why small first-order momentum values are often preferred in practice for GAN training.
The ODE view suggests a possible continuous-time schedule for the momentum coefficients that could improve stability without changing the discrete algorithm.

Load-bearing premise

The discrete Adam-DA steps with typical learning rates and momentum values stay close enough to their continuous ODE trajectories that local stability and regularization results carry over.

What would settle it

Run Adam-DA on a simple bilinear zero-sum game with known saddle and measure whether increasing the first-order momentum visibly slows convergence or increasing the second-order momentum visibly speeds it up; a reversal of either trend would contradict the claim.
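
A minimal sketch of such a test follows. It is not the paper's experiment: the update rule is assumed to be standard bias-corrected Adam applied simultaneously as descent on x and ascent on y, the game is f(x, y) = x * y with its saddle at the origin, and the function name `adam_da_bilinear` and all hyperparameter values are hypothetical.

```python
import math

def adam_da_bilinear(beta1, beta2, lr=1e-2, eps=1e-8, steps=500, x0=1.0, y0=1.0):
    """Distance to the saddle (0, 0) of f(x, y) = x * y along one run.

    Assumed update: standard bias-corrected Adam, descent on x and
    ascent on y, both players updated simultaneously.
    """
    x, y = x0, y0
    mx = my = vx = vy = 0.0
    dist = [math.hypot(x, y)]
    for t in range(1, steps + 1):
        gx, gy = y, x  # df/dx = y, df/dy = x
        # First-moment estimates, controlled by beta1.
        mx = beta1 * mx + (1 - beta1) * gx
        my = beta1 * my + (1 - beta1) * gy
        # Second-moment estimates, controlled by beta2.
        vx = beta2 * vx + (1 - beta2) * gx * gx
        vy = beta2 * vy + (1 - beta2) * gy * gy
        # Bias correction as in standard Adam.
        c1 = 1 - beta1 ** t
        c2 = 1 - beta2 ** t
        x -= lr * (mx / c1) / (math.sqrt(vx / c2) + eps)  # descent player
        y += lr * (my / c1) / (math.sqrt(vy / c2) + eps)  # ascent player
        dist.append(math.hypot(x, y))
    return dist

# Sweep each momentum parameter while holding the other fixed.
for beta1 in (0.0, 0.5, 0.9):
    print("beta1", beta1, "final distance", adam_da_bilinear(beta1, 0.999)[-1])
for beta2 in (0.9, 0.99, 0.999):
    print("beta2", beta2, "final distance", adam_da_bilinear(0.5, beta2)[-1])
```

Comparing the printed distances across each sweep gives a first indication of the trend. A purely bilinear game is a hard case for simultaneous methods, so a game with added curvature in x and y may be a fairer probe of convergence speed.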

Figures

Figures reproduced from arXiv: 2605.19392 by Weiming Ou, Xiao Wang, Yi Feng.

**Figure 1.** Trajectories of Adam-DA, Continuous Adam-DA, and SignGDA-flow on three test functions from (Compagnoni et al., 2024b). Continuous Adam-DA closely approximates Adam-DA. This is clearest in 1(b) and 1(c), where SignGDA-flow either diverges or approaches a different equilibrium while the trajectories of the other two methods remain similar. More details are provided in Appendix B.1.

**Figure 2.** Numerical experiments on quadratic test functions for the local convergence of …

**Figure 3.** The ℓ1 norm of gradients of Adam-DA with varying β and ρ during GAN training. Datasets: CIFAR-10 and STL-10. Architectures: ResNet and CNN. As shown in 3(a), 3(c), 3(e), and 3(g), smaller β values result in smaller gradient norms. According to 3(b), 3(d), 3(f), and 3(h), larger ρ values also lead to smaller gradient norms. Both findings support the thesis.

**Figure 4.** Inception Score for the corresponding experimental settings in Figure …

**Figure 5.** Figures 5(a), 5(b), and 5(c) show the distances between the two continuous-time models and Adam, with results averaged over 30 random initial conditions. In the following …

**Figure 6.** Additional experiments with different parameters.

**Figure 7.** Effect of ϵ.

**Figure 8.** Self-Attention GAN experiments on CelebA, evaluated by FID. In …

**Figure 9.** In this figure, we reproduce one set of the experimental results from Section 5 of the submission on CNN GANs trained on the CIFAR-10 dataset. We evaluate performance using FID and include a comparison with the optimistic adaptive method. The conclusion is the same as that in …

**Figure 10.** 2D sweep over (β, ρ) jointly. Each figure shows the final cumulative average gradient norms over 25 GAN training runs. We observe that the upper-left corner of each figure exhibits smaller gradient norms than the lower-right corner, indicating that smaller β and larger ρ guide the optimization trajectories toward flatt…

**Figure 11.** Sample images generated by the models trained in …

**Figure 12.** Sample images for different β. Architecture: ResNet. Dataset: CIFAR-10.

**Figure 13.** Sample images for different ρ. Architecture: ResNet. Dataset: CIFAR-10. Panels: (a) β = −0.3, ρ = 0.9; (b) β = −0.2, ρ = 0.9; (c) β = 0.0, ρ = 0.9; (d) β = 0.2, ρ = 0.9; (e) β = 0.3, ρ = 0.9; (f) β = 0.5, ρ = 0.9.

**Figure 14.** Sample images for different β. Architecture: CNN. Dataset: STL-10.

**Figure 15.** Sample images for different ρ. Architecture: CNN. Dataset: STL-10.
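
The captions for Figures 3 and 10 refer to the ℓ1 norm of gradients and to a cumulative average of gradient norms. A minimal sketch of both quantities follows, assuming that "cumulative average" means the running mean of the per-step norms. The helper names and the toy gradient values are hypothetical and are not taken from the paper.

```python
def l1_norm(grad):
    """Sum of absolute values of the gradient entries."""
    return sum(abs(g) for g in grad)

def cumulative_average(values):
    """Running mean: entry i is the average of values[0..i]."""
    out, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out

# Toy per-step gradients standing in for logged training gradients.
grads_per_step = [[0.5, -1.5], [0.25, -0.75], [0.0, 0.5]]
norms = [l1_norm(g) for g in grads_per_step]
running = cumulative_average(norms)
print(norms, running[-1])
```

In a real training run the per-step gradients would be flattened parameter gradients of the generator and discriminator. The last entry of `running` would then be the kind of value a (β, ρ) sweep compares across settings.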

Original abstract

The remarkable success of Adam in training neural networks has naturally led to the widespread use of its descent-ascent counterpart, Adam-DA, for solving zero-sum games. Despite its popularity in practice, a rigorous theoretical understanding of Adam-DA still lags behind. In this paper, we derive ordinary differential equations (ODEs) that serve as continuous-time limits of Adam-DA. These ODEs closely approximate the discrete-time dynamics of Adam-DA, providing a tractable analytical framework for understanding its behavior in zero-sum games. Using this ODE approach, we investigate two fundamental aspects of Adam-DA: local convergence and implicit gradient regularization. Our analysis reveals that the roles of the first- and second-order momentum parameters in zero-sum games are exactly the opposite of their well-documented effects in minimization problems. We validate these predictions through GAN experiments across multiple architectures and datasets, demonstrating the practical implications of this reversed momentum effect.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Adam-DA in zero-sum games shows reversed momentum roles via ODE limits, but the discrete approximation at practical parameters is the open question.

The main thing here is the derivation of ODEs for Adam-DA in zero-sum games, leading to the claim that the first- and second-order momentum parameters reverse their roles compared to minimization. They build the continuous limit from the discrete updates and apply it to study local convergence and implicit regularization. This produces a clear prediction about how β1 and β2 should be set differently in game settings, and they check it with GAN experiments on several architectures and datasets. The approach is a direct extension of existing ODE techniques for Adam, adapted to the min-max case, which gives some analytical handle on why these optimizers behave as they do in practice.

The soft spot is the assumption that the ODE closely matches the discrete trajectory for the step sizes and momentum values used in real training. In zero-sum games, the non-monotone nature of the dynamics can cause oscillations, and the damping from the second moment might disrupt the timescale separation needed for the approximation. Without detailed error analysis or tests at practical hyperparameters, it's not clear how far the conclusions extend to the actual algorithm.

This work is for people studying optimization in adversarial or game-theoretic machine learning contexts. It has a novel prediction backed by some experiments, so it should be sent for peer review. The referees can verify the derivation and push for stronger evidence on the approximation.

Referee Report

2 major / 2 minor

Summary. The paper derives ordinary differential equation (ODE) limits for the discrete Adam-DA algorithm applied to zero-sum games. These ODEs are used to analyze local convergence and implicit gradient regularization, leading to the claim that the first-order momentum parameter (β1) and second-order momentum parameter (β2) play exactly opposite roles compared to their effects in standard minimization problems. The analysis is validated through qualitative GAN experiments on multiple architectures and datasets.

Significance. If the ODE approximation is faithful at practical step sizes and momentum values, the work supplies a useful continuous-time framework for understanding momentum in non-monotone settings and could inform hyperparameter selection for GAN training. The explicit reversal result, if rigorously supported, distinguishes this contribution from prior ODE analyses of Adam in convex or minimization settings.

major comments (2)

[§3] (ODE derivation): The continuous-time limit is obtained via standard Euler discretization and momentum rescaling, but the manuscript provides no explicit error bounds, timescale-separation conditions, or verification that the approximation remains valid for β2 ≈ 0.999 and learning rates ≈ 10^{-3} when the underlying vector field is non-monotone and oscillatory. This assumption is load-bearing for transferring local convergence and regularization conclusions from the ODE to the discrete Adam-DA updates.
[§5] (Experiments): The GAN results are presented without quantitative metrics (e.g., FID scores, convergence rates), ablation controls on β1/β2, or direct comparisons to Adam in minimization tasks that would demonstrate the claimed reversal. This weakens the empirical support for the central theoretical prediction.

minor comments (2)

[§3] The notation for the rescaled momentum terms in the ODE should be aligned more explicitly with the discrete Adam-DA update equations to improve readability.
[Introduction] Add a brief discussion of how the derived ODEs relate to existing continuous-time analyses of Adam in minimization (e.g., prior works on momentum in convex optimization) for clearer positioning.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and insightful comments on our manuscript. We have carefully considered each point and outline our responses and planned revisions below.

Referee: [§3] (ODE derivation): The continuous-time limit is obtained via standard Euler discretization and momentum rescaling, but the manuscript provides no explicit error bounds, timescale-separation conditions, or verification that the approximation remains valid for β2 ≈ 0.999 and learning rates ≈ 10^{-3} when the underlying vector field is non-monotone and oscillatory. This assumption is load-bearing for transferring local convergence and regularization conclusions from the ODE to the discrete Adam-DA updates.

Authors: We thank the referee for this observation. The ODE limit is derived using the standard Euler method with appropriate rescaling of the momentum terms, as is common in the literature on continuous-time analyses of adaptive optimizers. We acknowledge that the manuscript does not provide explicit error bounds or detailed timescale-separation conditions, particularly for the non-monotone case. Deriving such bounds rigorously for oscillatory dynamics would require substantial additional analysis. In the revised manuscript, we will include a new subsection in §3 discussing the assumptions underlying the approximation and provide numerical verification by comparing discrete trajectories with the ODE solutions for β2 close to 1 and small learning rates in the context of our GAN experiments. This will offer practical evidence for the validity of the limit in the relevant parameter regime.

Revision: partial

Referee: [§5] (Experiments): The GAN results are presented without quantitative metrics (e.g., FID scores, convergence rates), ablation controls on β1/β2, or direct comparisons to Adam in minimization tasks that would demonstrate the claimed reversal. This weakens the empirical support for the central theoretical prediction.

Authors: We agree that incorporating quantitative metrics and ablations would strengthen the empirical section. In the revision, we will augment §5 with FID scores and other relevant quantitative measures for the GAN experiments. We will also add ablation studies on the effects of varying β1 and β2, as well as direct comparisons to the behavior of Adam in standard minimization settings. These changes will better substantiate the claimed reversal of roles for the momentum parameters.

Revision: yes

Standing simulated objections (not resolved)

Deriving explicit error bounds and timescale-separation conditions for the ODE approximation in non-monotone and oscillatory settings.

Circularity Check

0 steps flagged

No circularity: standard ODE limit derivation with independent analysis

The paper derives ODEs as continuous-time limits of discrete Adam-DA updates via standard Euler discretization and momentum rescaling techniques. Local convergence and implicit regularization properties are then analyzed directly on the resulting ODE system in the zero-sum setting, yielding the reversed-momentum observation as a consequence of the vector field structure. This chain is self-contained, does not reduce any prediction to a fitted input or prior self-citation by construction, and is externally validated via GAN experiments. No load-bearing step equates to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of taking continuous-time limits of the discrete Adam-DA updates and on the transfer of local stability properties from the resulting ODEs to the original algorithm. No new entities are postulated.

axioms (1)

Domain assumption: Discrete Adam-DA updates admit a continuous-time ODE limit that approximates their trajectory for small learning rates.
This is the foundational modeling step that converts the discrete optimizer into an analyzable dynamical system.

pith-pipeline@v0.9.0 · 5680 in / 1354 out tokens · 33080 ms · 2026-05-20T07:47:16.248366+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Our analysis reveals that the roles of the first- and second-order momentum parameters in zero-sum games are exactly the opposite of their well-documented effects in minimization problems."

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "Continuous Adam-DA ... J_Adam = (1/√ϵ) (I − h(1+β)/(2√ϵ(1−β)) J) J"
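
As a consistency check on the matrix in the passage above, and not a derivation taken from the paper, the special case β = 0, ϵ = 1 can be written out. The reading assumes that J is the Jacobian of the descent-ascent vector field at the equilibrium and that h is the step size.

```latex
\begin{aligned}
J_{\mathrm{Adam}} &= \frac{1}{\sqrt{\epsilon}}
  \left(I - \frac{h(1+\beta)}{2\sqrt{\epsilon}\,(1-\beta)}\,J\right) J, \\
J_{\mathrm{Adam}}\Big|_{\beta=0,\ \epsilon=1}
  &= \left(I - \frac{h}{2}\,J\right) J \;=\; J - \frac{h}{2}\,J^{2}.
\end{aligned}
```

The second line has the same form as the first-order modified equation obtained by backward error analysis of the explicit Euler step z_{k+1} = z_k + h J z_k. For the bilinear game f(x, y) = xy, the vector field (−y, x) gives J² = −I, so the matrix becomes J + (h/2) I with eigenvalues h/2 ± i. The positive real part is in line with the known divergence of simultaneous gradient descent-ascent on bilinear games.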

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

289 extracted references · 289 canonical work pages · 6 internal anchors

[1]

Yurii Nesterov , title =. Math. Program. , year =

work page
[2]

Robinson , title =

J. Robinson , title =. Annals of Mathematics , year =

work page
[3]

Brown , title =

G. Brown , title =. Activity Analysis of Production and Allocation , year =

work page
[4]

Zur Elektrodynamik bewegter Körper

Albert Einstein. Zur Elektrodynamik bewegter Körper. Annalen der Physik. 1905

work page 1905
[5]

The \ Companion

Michel Goossens and Frank Mittelbach and Alexander Samarin. The \ Companion. 1993

work page 1993
[6]

Advances in neural information processing systems , volume=

A unified game-theoretic approach to multiagent reinforcement learning , author=. Advances in neural information processing systems , volume=

work page
[7]

arXiv preprint arXiv:2011.00583 , year=

An overview of multi-agent reinforcement learning from game theoretical perspective , author=. arXiv preprint arXiv:2011.00583 , year=

work page arXiv 2011
[8]

Competing in the dark: An efficient algorithm for bandit linear optimization , author=

work page
[9]

Mathematical programming , volume=

Primal-dual subgradient methods for convex problems , author=. Mathematical programming , volume=. 2009 , publisher=

work page 2009
[10]

Foundations and Trends

Online learning and online convex optimization , author=. Foundations and Trends. 2012 , publisher=

work page 2012
[11]

Advances in Neural Information Processing Systems , volume=

Online Learning in Periodic Zero-Sum Games , author=. Advances in Neural Information Processing Systems , volume=

work page
[12]

Characterization and computation of local

Ratliff, Lillian J and Burden, Samuel A and Sastry, S Shankar , booktitle=. Characterization and computation of local. 2013 , organization=

work page 2013
[13]

Lee and Tengyu Ma , Booktitle =

Rong Ge and Jason D. Lee and Tengyu Ma , Booktitle =. Matrix Completion has No Spurious Local Minimum , Year =

work page
[14]

CoRR , Title =

Ngoc. CoRR , Title =

work page
[15]

Dauphin and Razvan Pascanu and Caglar Gulcehre and Kyunghyun Cho and Surya Ganguli and Yoshua Bengio , Date-Added =

Yann N. Dauphin and Razvan Pascanu and Caglar Gulcehre and Kyunghyun Cho and Surya Ganguli and Yoshua Bengio , Date-Added =. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , Urldate =

work page
[16]

Lee and Ioannis Panageas and Georgios Piliouras and Max Simchowitz and Michael I

Jason D. Lee and Ioannis Panageas and Georgios Piliouras and Max Simchowitz and Michael I. Jordan and Benjamin Recht , Journal =. First-order methods almost always avoid strict saddle points , Volume =

work page
[17]

Kakade and Michael I

Chi Jin and Rong Ge and Praneeth Netrapalli and Sham M. Kakade and Michael I. Jordan , Booktitle =. How to Escape Saddle Points Efficiently , Year =

work page
[18]

CoRR , volume =

Songtao Lu and Meisam Razaviyayn and Bo Yang and Kejun Huang and Mingyi Hong , title =. CoRR , volume =

work page
[19]

Proceedings of the 36th International Conference on Machine Learning,

Ioannis Panageas and Georgios Piliouras and Xiao Wang , title =. Proceedings of the 36th International Conference on Machine Learning,. 2019 , crossref =

work page 2019
[20]

First-order methods almost always avoid saddle points: The case of vanishing step-sizes , Year =

Ioannis Panageas and Georgios Piliouras and Xiao Wang , Booktitle =. First-order methods almost always avoid saddle points: The case of vanishing step-sizes , Year =

work page
[21]

Gillis , Booktitle =

N. Gillis , Booktitle =. The Why and How of Nonnegative Matrix Factorization" , Year =

work page
[22]

D. P. Bertsekas , Date-Added =. Nonlinear Programming , Year =

work page
[23]

Ho , Date-Added =

N.D. Ho , Date-Added =. Nonnegative matrix factorization algorithms and applications , Year =

work page
[24]

Cichocki, R

A. Cichocki, R. Zdunek, S.I. Amari , Booktitle =. Hierarchical ALS algorithms for nonnegative matrix and 3d tensor factorization , Year =

work page
[25]

Gonzalez and Yin Zhang , Title =

Edward F. Gonzalez and Yin Zhang , Title =

work page
[26]

Journal of Functional Analysis , Pages =

Felix Otto and Cedric Villani , Title =. Journal of Functional Analysis , Pages =

work page
[27]

AAMAS , year=

James Bailey and Georgios Piliouras , title=. AAMAS , year=

work page
[28]

2006 American Control Conference , pages=

Fundamental constraints on uncertainty evolution in Hamiltonian systems , author=. 2006 American Control Conference , pages=. 2006 , organization=

work page 2006
[29]

2017 , publisher=

Introduction to symplectic topology , author=. 2017 , publisher=

work page 2017
[30]

Training

Daskalakis, Constantinos and Ilyas, Andrew and Syrgkanis, Vasilis and Zeng, Haoyang , journal=. Training

work page
[31]

Advances in neural information processing systems , volume=

Tight last-iterate convergence rates for no-regret learning in multi-player games , author=. Advances in neural information processing systems , volume=

work page
[32]

International Conference on Machine Learning , pages=

Finite-time last-iterate convergence for multi-agent learning in games , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020
[33]

SODA , Year =

Cycles in Adversarial Regularized Learning , Author =. SODA , Year =

work page
[34]

Optimization despite chaos: Convex relaxations to complex limit sets via Poincar

Piliouras, Georgios and Shamma, Jeff S , booktitle=. Optimization despite chaos: Convex relaxations to complex limit sets via Poincar. 2014 , organization=

work page 2014
[35]

Science , volume=

Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , author=. Science , volume=. 2018 , publisher=

work page 2018
[36]

and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua , title =

Goodfellow, Ian J. and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua , title =. Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 , pages =. 2014 , publisher =

work page 2014
[37]

2019 , eprint=

Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research , author=. 2019 , eprint=

work page 2019
[38]

International Conference on Learning Representations , year=

Smooth markets: A basic mechanism for organizing gradient-based learners , author=. International Conference on Learning Representations , year=

work page
[39]

ICLR , Year=

The Evolution of Uncertainty of Learning in Games , author=. ICLR , Year=

work page
[40]

ICML , Year =

The Mechanics of n-player Differentiable Games , Author =. ICML , Year =

work page
[41]

2020 , booktitle =

Yun Kuen Cheung and Georgios Piliouras , title =. 2020 , booktitle =

work page 2020
[42]

Conference on Learning Theory , pages=

Vortices instead of equilibria in minmax optimization: Chaos and butterfly effects of online learning in zero-sum games , author=. Conference on Learning Theory , pages=. 2019 , organization=

work page 2019
[43]

International Conference on Learning Representations , year=

Chaos of Learning Beyond Zero-sum and Coordination via Game Decompositions , author=. International Conference on Learning Representations , year=

work page
[44]

Linear Last-iterate Convergence in Constrained Saddle-point Optimization , booktitle =

Chen. Linear Last-iterate Convergence in Constrained Saddle-point Optimization , booktitle =

work page
[45]

NeurIPS , year =

Yang Cai and Argyris Oikonomou and Weiqiang Zheng , title =. NeurIPS , year =

work page
[46]

NeurIPS , year =

Eduard Gorbunov and Adrien Taylor and Gauthier Gidel , title =. NeurIPS , year =

work page
[47]

ICLR , year=

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile , author=. ICLR , year=

work page
[48]

Extragradient Method:

Eduard Gorbunov and Nicolas Loizou and Gauthier Gidel , editor =. Extragradient Method:. International Conference on Artificial Intelligence and Statistics,

work page
[49]

Pseudo holomorphic curves in symplectic manifolds , Volume =

Misha Gromov , Journal =. Pseudo holomorphic curves in symplectic manifolds , Volume =

work page
[50]

Differential Equations and Dynamical Systems

Lawrence Perko. Differential Equations and Dynamical Systems. 2001

work page 2001
[51]

Introduction to Symplectic Topology

Dust McDuff and Dietmar Salamon. Introduction to Symplectic Topology. 2017

work page 2017
[52]

Algorithmic Game Theory

Noam Nisan and Tim Roughgarden and Eva Tardos and Vijay Varian. Algorithmic Game Theory. 2007

work page 2007
[53]

Foundations of Physics , volume=

The symplectic camel and the uncertainty principle: The tip of an iceberg? , author=. Foundations of Physics , volume=. 2009 , publisher=

work page 2009
[54]

Nature , volume=

The symplectic camel , author=. Nature , volume=

work page
[55]

What is symplectic gemoetry , journal=

Dusa Mcduff , year=. What is symplectic gemoetry , journal=

work page
[56]

Russian Mathematical Surveys , volume=

First steps in symplectic topology , author=. Russian Mathematical Surveys , volume=. 1986 , publisher=

work page 1986
[57]

Proceedings of the 2018 ACM Conference on Economics and Computation , pages=

Multiplicative weights update in zero-sum games , author=. Proceedings of the 2018 ACM Conference on Economics and Computation , pages=

work page 2018
[58]

Adaptive learning in continuous games: Optimal regret bounds and convergence to

Hsieh, Yu-Guan and Antonakopoulos, Kimon and Mertikopoulos, Panayotis , booktitle=. Adaptive learning in continuous games: Optimal regret bounds and convergence to. 2021 , organization=

work page 2021
[59]

International Conference on Machine Learning , pages=

The limits of min-max optimization algorithms: Convergence to spurious non-critical sets , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[60]

nature , volume=

Mastering the game of Go with deep neural networks and tree search , author=. nature , volume=. 2016 , publisher=

work page 2016
[61]

2010 , publisher=

Symplectic geometric algorithms for Hamiltonian systems , author=. 2010 , publisher=

work page 2010
[62]

ACM SIGecom Exchanges , volume=

Game dynamics as the meaning of a game , author=. ACM SIGecom Exchanges , volume=. 2019 , publisher=

work page 2019
[63]

Scientific reports , volume=

-rank: Multi-agent evaluation by evolution , author=. Scientific reports , volume=. 2019 , publisher=

work page 2019
[64]

Conference on Learning Theory , pages=

Learning in matrix games can be arbitrarily complex , author=. Conference on Learning Theory , pages=. 2021 , organization=

work page 2021
[65]

arXiv preprint arXiv:2005.12649 , year=

On the impossibility of global convergence in multi-loss optimization , author=. arXiv preprint arXiv:2005.12649 , year=

work page arXiv 2005
[66]

Conference on Learning Theory , pages=

Finite regret and cycles with fixed step-size via alternating gradient descent-ascent , author=. Conference on Learning Theory , pages=. 2020 , organization=

work page 2020
[67]

Physica D: Nonlinear Phenomena , volume=

Some aspects of Hamiltonian systems and symplectic algorithms , author=. Physica D: Nonlinear Phenomena , volume=. 1994 , publisher=

work page 1994
[68]

2006 , publisher=

Elements of information theory , author=. 2006 , publisher=

work page 2006
[69]

Advances in Neural Information Processing Systems , volume=

Alternating mirror descent for constrained min-max games , author=. Advances in Neural Information Processing Systems , volume=

work page
[70]

Uncertain bimatrix game with applications. Fuzzy Optimization and Decision Making, 2013.

[71]

Stochastic variance reduction methods for saddle-point problems. Advances in Neural Information Processing Systems.

[72]

A stochastic proximal point algorithm for saddle-point problems. arXiv preprint arXiv:1909.06946.

[73]

Mengxiao Zhang, Peng Zhao, Haipeng Luo, and Zhi-Hua Zhou. ICML.

[74]

Reducing noise in GAN training with variance reduced extragradient. Advances in Neural Information Processing Systems.

[75]

Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems. Advances in Neural Information Processing Systems.

[76]

Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems. Advances in Neural Information Processing Systems.

[77]

Negative momentum for improved game dynamics. The 22nd International Conference on Artificial Intelligence and Statistics.

[78]

On the expected number of internal equilibria in random evolutionary games with correlated payoff matrix. Dynamic Games and Applications, 2019.

[79]

A variational inequality perspective on generative adversarial networks. ICLR.

[80]

Convergence of gradient methods on bilinear zero-sum games. ICLR.

