Mirror Descent-Ascent for mean-field min-max problems

{\L}ukasz Szpruch; Mateusz B. Majka; Razvan-Andrei Lascu

arxiv: 2402.08106 · v3 · submitted 2024-02-12 · 🧮 math.OC · cs.LG· math.PR

Mirror Descent-Ascent for mean-field min-max problems

Razvan-Andrei Lascu , Mateusz B. Majka , {\L}ukasz Szpruch This is my paper

Pith reviewed 2026-05-24 03:45 UTC · model grok-4.3

classification 🧮 math.OC cs.LGmath.PR

keywords mirror descent-ascentmean-field min-maxNikaidô-Isoda errormixed Nash equilibriaBregman divergenceconvergence ratesmean-field optimization

0 comments

The pith

Mirror descent-ascent reaches mixed Nash equilibria in mean-field min-max problems at O(N^{-1/2}) simultaneous and O(N^{-2/3}) alternating rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes two mirror descent-ascent variants for min-max problems where the decision variables are probability measures. It proves non-asymptotic rates of convergence to mixed Nash equilibria in the Nikaidô-Isoda error, with the alternating version improving on the simultaneous one. The proofs rely on convexity-concavity and relative smoothness of the payoff with respect to a Bregman divergence constructed via flat derivatives on the space of measures. A dual-space argument controls the commutator terms that arise from alternating updates. The results extend beyond the bilinear case to general nonlinear convex-concave payoffs on measures.

Core claim

We establish non-asymptotic convergence rates to mixed Nash equilibria, measured in the Nikaidô-Isoda error, proving an O(N^{-1/2}) rate for simultaneous MDA and an improved O(N^{-2/3}) rate for alternating MDA. The main technical contribution is an infinite-dimensional dual space analysis that relates Bregman divergences on measures to dual Bregman divergences on spaces of bounded continuous functions, allowing us to control asymmetric commutator terms created by alternating updates. The results substantially generalize prior analyses restricted to bilinear objectives and also apply to nonlinear convex-concave problems on measure spaces.

What carries the argument

Infinite-dimensional dual space analysis that relates Bregman divergences on measures to dual Bregman divergences on spaces of bounded continuous functions to control asymmetric commutator terms from alternating updates.

If this is right

The same dual-space technique yields convergence guarantees for general nonlinear convex-concave mean-field min-max problems.
Alternating MDA is provably faster than simultaneous MDA under the stated assumptions.
The framework supplies non-asymptotic guarantees that apply directly to mixed-strategy equilibria in mean-field games.
Prior bilinear analyses are recovered as special cases of the new rates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same commutator-control argument may adapt to other first-order methods that alternate between two infinite-dimensional spaces.
In practice the O(N^{-2/3}) rate favors alternating updates whenever the payoff satisfies the relative-smoothness condition.
The flat-derivative Bregman construction could be tested on specific mean-field game payoffs to check whether observed error decay matches the predicted exponents.

Load-bearing premise

The payoff function is convex-concave and relatively smooth with respect to a Bregman divergence defined on measures via flat derivatives.

What would settle it

A concrete counter-example payoff on measures that is convex-concave and relatively smooth but where the Nikaidô-Isoda error after N steps of alternating MDA fails to decay at rate O(N^{-2/3}) would falsify the claimed rate.

read the original abstract

We study two variants of the mirror descent-ascent (MDA) algorithm for solving min-max problems on the space of measures: simultaneous and alternating. We work under assumptions of convexity-concavity and relative smoothness of the payoff function with respect to a suitable Bregman divergence, defined on the space of measures via flat derivatives. We establish non-asymptotic convergence rates to mixed Nash equilibria, measured in the Nikaid\^{o}-Isoda error, proving an $\mathcal{O}(N^{-1/2})$ rate for simultaneous MDA and an improved $\mathcal{O}(N^{-2/3})$ rate for alternating MDA. The main technical contribution is an infinite-dimensional dual space analysis that relates Bregman divergences on measures to dual Bregman divergences on spaces of bounded continuous functions, allowing us to control asymmetric commutator terms created by alternating updates. The results substantially generalize prior analyses restricted to bilinear objectives and also apply to nonlinear convex-concave problems on measure spaces, thereby providing a unified theoretical foundation for MDA in mean-field min-max optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper generalizes MDA rates to nonlinear convex-concave problems on measures via a dual-space argument that improves the alternating case.

read the letter

The punchline is that this work moves mirror descent-ascent beyond bilinear payoffs to general convex-concave objectives on probability measures, delivering an O(N^{-1/2}) rate for simultaneous updates and an improved O(N^{-2/3}) rate for alternating updates in the Nikaidô-Isoda error, under relative smoothness with respect to a Bregman divergence defined via flat derivatives. The key new piece is the infinite-dimensional dual-space analysis that relates divergences on measures to dual divergences on bounded continuous functions, which lets them bound the commutator terms that arise only in the alternating variant. That technique directly addresses a limitation in earlier bilinear-only analyses and supplies a unified foundation for these algorithms in mean-field settings. The assumptions are stated clearly and the rates follow from them without obvious circularity or post-hoc fitting. The argument looks internally consistent once the dual mapping is accepted. The main soft spot is that relative smoothness on measures is a strong condition whose verification for typical nonlinear payoffs is left mostly to the reader; without concrete examples or checks, it is not obvious how often the rates apply in practice. The paper also stays strictly inside optimization theory, so it does not claim broader implications for mean-field games or distribution learning. This is for people already working on algorithmic analysis in infinite-dimensional or mean-field optimization who need non-asymptotic guarantees. A reader who cares about extending finite-dimensional mirror methods to measures will find the dual-space step useful. The work is coherent on its own terms and shows clear engagement with the literature, so it deserves a serious referee even if the proofs need tightening on the regularity side.

Referee Report

0 major / 4 minor

Summary. The paper analyzes simultaneous and alternating mirror descent-ascent (MDA) algorithms for min-max problems over probability measures. Under convexity-concavity of the payoff and relative smoothness with respect to a Bregman divergence induced by flat derivatives on the space of measures, it derives non-asymptotic convergence rates in the Nikaidô-Isoda error: O(N^{-1/2}) for the simultaneous variant and the improved O(N^{-2/3}) for the alternating variant. The central technical device is an infinite-dimensional dual-space analysis that relates primal Bregman divergences on measures to dual Bregman divergences on spaces of bounded continuous functions, thereby controlling the asymmetric commutator terms that arise from alternating updates. The results extend prior bilinear analyses to general nonlinear convex-concave mean-field problems.

Significance. If the stated rates and the dual-space commutator control hold, the work supplies the first non-asymptotic guarantees for MDA on measure spaces beyond the bilinear case and furnishes a unified foundation for mean-field min-max optimization. The explicit improvement from N^{-1/2} to N^{-2/3} for alternating updates, obtained without additional assumptions, is a concrete advance over existing bilinear analyses.

minor comments (4)

§3, Definition 3.2: the relative-smoothness constant L appears in the rate statements but its dependence on the choice of reference measure is not made explicit; a short remark clarifying whether L is uniform over the relevant class of measures would improve readability.
Theorem 4.3 (alternating case): the O(N^{-2/3}) bound is stated for the Nikaidô-Isoda error after N iterations; it would be helpful to indicate whether the hidden constants depend on the diameter of the dual space or on the modulus of relative smoothness.
§2.2, flat-derivative notation: the symbol D_μ appears both for the flat derivative and for the Bregman divergence; a brief notational distinction or a dedicated paragraph would prevent confusion.
Figure 1 and the numerical example in §5: the caption does not state the precise discretization size or the choice of reference measure used to generate the plots; adding this information would make the experiment reproducible from the text alone.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives non-asymptotic convergence rates for simultaneous and alternating MDA under explicit assumptions of convexity-concavity and relative smoothness w.r.t. a Bregman divergence on measures (via flat derivatives). The O(N^{-1/2}) and O(N^{-2/3}) rates follow from an infinite-dimensional dual-space analysis relating primal and dual Bregman divergences to control commutator terms; this is a standard first-principles argument from the stated functional-analytic assumptions and does not reduce to any fitted input, self-definition, or self-citation chain. No load-bearing step equates a claimed result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two domain assumptions stated in the abstract; no free parameters or invented entities are mentioned.

axioms (2)

domain assumption The payoff function is convex-concave
Required to guarantee existence of mixed Nash equilibria and to apply the convergence analysis.
domain assumption Relative smoothness of the payoff with respect to a Bregman divergence defined via flat derivatives on the space of measures
Stated as the key regularity condition that enables the non-asymptotic rates.

pith-pipeline@v0.9.0 · 5722 in / 1324 out tokens · 31146 ms · 2026-05-24T03:45:22.663346+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

[1]

Abraham, J

R. Abraham, J. Marsden, and T. Ratiu. Manifolds, Tensor Analysis, and Applications . Applied Mathematical Sciences. Springer New York, 2012

work page 2012
[2]

Aliprantis and K

C. Aliprantis and K. Border. Inﬁnite Dimensional Analysis: A Hitchhiker’s Guide . Springer, 2007

work page 2007
[3]

Ambrosetti and G

A. Ambrosetti and G. Prodi. A Primer of Nonlinear Analysis . Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1995

work page 1995
[4]

Anitescu

M. Anitescu. Degenerate nonlinear programming with a quadratic growth condition. SIAM Journal on Optimization , 10(4):1116–1135, 2000

work page 2000
[5]

Antonakopoulos, V

K. Antonakopoulos, V. Belmega, and P. Mertikopoulos. An adapt ive mirror-prox method for varia- tional inequalities with singular operators. In H. Wallach, H. Laroche lle, A. Beygelzimer, F. d 'Alch´ e- Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems , vol- ume 32. Curran Associates, Inc., 2019

work page 2019
[6]

Arjovsky, S

M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generat ive adversarial networks. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machin e Learning , volume 70 of Proceedings of Machine Learning Research , pages 214–223. PMLR, 06–11 Aug 2017

work page 2017
[7]

Aubin-Frankowski, A

P.-C. Aubin-Frankowski, A. Korba, and F. L´ eger. Mirror desce nt with relative smoothness in measure spaces, with application to sinkhorn and em. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 17263–17275. Curran Associates, Inc., 2022. 17

work page 2022
[8]

H. H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyon d lipschitz gradient continuity: First-order methods revisited and applications. Math. Oper. Res. , 42:330–348, 2017

work page 2017
[9]

Beck and M

A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003

work page 2003
[10]

J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems . Springer Series in Operations Research, 2000

work page 2000
[11]

S. Bubeck. Convex optimization: Algorithms and complexity. Found. Trends Mach. Learn. , 8(3–4):231–357, 2015

work page 2015
[12]

R. A. Carmona and F. Delarue. Probabilistic Theory of Mean Field Games with Applications I: Mean Field FBSDEs, Control, and Games . Springer International Publishing, 2018

work page 2018
[13]

L. Chizat. Convergence Rates of Gradient Methods for Conve x Optimization in the Space of Mea- sures. Open Journal of Mathematical Optimization , 3:1–19, 2022

work page 2022
[14]

L. Chizat. Mean-ﬁeld langevin dynamics : Exponential converge nce and annealing. Transactions on Machine Learning Research , 2022

work page 2022
[15]

Chizat and F

L. Chizat and F. R. Bach. On the global convergence of gradien t descent for over-parameterized models using optimal transport. In NeurIPS, 2018

work page 2018
[16]

Domingo-Enrich, S

C. Domingo-Enrich, S. Jelassi, A. Mensch, G. Rotskoﬀ, and J. B runa. A mean-ﬁeld analysis of two-player zero-sum games. In H. Larochelle, M. Ranzato, R. Had sell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 20215–20226. Curran Associates, Inc., 2020

work page 2020
[17]

Dupuis and R

P. Dupuis and R. S. Ellis. A Weak Convergence Approach to the Theory of Large Deviation s. Wiley, New York, NY, 1997

work page 1997
[18]

Dvurechensky and J.-J

P. Dvurechensky and J.-J. Zhu. Analysis of kernel mirror prox for measure optimization. In 27th International Conference on Artiﬁcial Intelligence and St atistics (AISTATS), 2024

work page 2024
[19]

I. L. Glicksberg. A further generalization of the kakutani ﬁxe d point theorem, with application to nash equilibrium points. Proceedings of the American Mathematical Society , 3(1):170–174, 1952

work page 1952
[20]

Hsieh, C

Y.-P. Hsieh, C. Liu, and V. Cevher. Finding mixed Nash equilibria of g enerative adversarial networks. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research , pages 2810–2819. PMLR, 09–15 Jun 2019

work page 2019
[21]

K. Hu, Z. Ren, D. ˇSiˇ ska, and /suppress Lukasz Szpruch. Mean-ﬁeld Langevin dynamics and energy landscape of neural networks. Annales de l’Institut Henri Poincar´ e, Probabilit´ es et Statistiques, 57(4):2043 – 2065, 2021

work page 2043
[22]

Kerimkulov, J.-M

B. Kerimkulov, J.-M. Leahy, D. ˇSiˇ ska, /suppress L. Szpruch, and Y. Zhang. A Fisher-Rao gradient ﬂow for entropy-regularised Markov decision processes in Polish spaces, 2 023. arXiv:2310.02951

work page arXiv
[23]

Kerimkulov, D

B. Kerimkulov, D. ˇSiˇ ska, /suppress L. Szpruch, and Y. Zhang. Mirror Descent for Stochastic Control Problems with Measure-valued Controls, 2024. arXiv:2401.01198

work page arXiv 2024
[24]

J. Kim, K. Yamamoto, K. Oko, Z. Yang, and T. Suzuki. Symmetric mean-ﬁeld langevin dynamics for distributional minimax problems. In The Twelfth International Conference on Learning Repre- sentations, 2024

work page 2024
[25]

Lascu, M

R.-A. Lascu, M. B. Majka, and /suppress L. Szpruch. Entropic mean-ﬁeld min-max problems via Best Response ﬂow, 2023. arXiv:2306.03033

work page arXiv 2023
[26]

Lascu, M

R.-A. Lascu, M. B. Majka, and /suppress Lukasz Szpruch. A ﬁsher-raogradient ﬂow for entropic mean-ﬁeld min-max games, 2024. arXiv:2405.15834

work page arXiv 2024
[27]

Li, W.-C

C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, and B. Poczos. Mmd ga n: Towards deeper understanding of moment matching network. In I. Guyon, U. V. Luxburg, S. Beng io, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems , volume 30. Curran Associates, Inc., 2017

work page 2017
[28]

L. Liu, M. Majka, and /suppress L. Szpruch. Polyak–/suppress lojasiewicz inequality on the space of measures and convergence of mean-ﬁeld birth-death processes. Applied Mathematics and Optimization , 87(3), 2023

work page 2023
[29]

H. Lu, R. M. Freund, and Y. Nesterov. Relatively smooth conve x optimization by ﬁrst-order meth- ods, and applications. SIAM Journal on Optimization , 28(1):333–354, 2018

work page 2018
[30]

Y. Lu. Two-scale gradient descent ascent dynamics ﬁnds mixed nash equilibria of continuous games: a mean-ﬁeld perspective. In Proceedings of the 40th International Conference on Machin e Learning, 2023. 18

work page 2023
[31]

S. Mei, A. Montanari, and P.-M. Nguyen. A mean ﬁeld view of the lan dscape of two-layer neu- ral networks. Proceedings of the National Academy of Sciences of the Unite d States of America , 115:E7665 – E7671, 2018

work page 2018
[32]

Nemirovski and D

A. Nemirovski and D. Yudin. Problem Complexity and Method Eﬃciency in Optimization . A Wiley-Interscience publication. Wiley, 1983

work page 1983
[33]

Nikaidˆ o and K

H. Nikaidˆ o and K. Isoda. Note on non-cooperative convex gam e. Paciﬁc Journal of Mathematics , 5:807–815, 1955

work page 1955
[34]

Nitanda, D

A. Nitanda, D. Wu, and T. Suzuki. Convex analysis of the mean ﬁe ld langevin dynamics. In Proceedings of The 25th International Conference on Artiﬁc ial Intelligence and Statistics , 2022

work page 2022
[35]

Ortega and W

J. Ortega and W. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Varia bles. Clas- sics in Applied Mathematics. Society for Industrial and Applied Mathe matics, 1970

work page 1970
[36]

G. M. Rotskoﬀ and E. Vanden-Eijnden. Neural networks as int eracting particle systems: As- ymptotic convexity of the loss landscape and universal scaling of th e approximation error, 2018. arXiv:1805.00915

work page arXiv 2018
[37]

M. Sion. On general minimax theorems. Paciﬁc Journal of Mathematics , 8(1):171 – 176, 1958

work page 1958
[38]

Suzuki, D

T. Suzuki, D. Wu, and A. Nitanda. Mean-ﬁeld langevin dynamics: T ime-space discretization, stochastic gradient, and variance reduction. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

work page 2023
[39]

Tomar, L

M. Tomar, L. Shani, Y. Efroni, and M. Ghavamzadeh. Mirror Des cent Policy Optimization, 2021. arXiv:2005.09814

work page arXiv 2021
[40]

C. G. Trillos and N. G. Trillos. On adversarial robustness and the use of wasserstein ascent-descent dynamics to enforce it, 2023. arXiv:2301.03662

work page arXiv 2023
[41]

Wang and L

G. Wang and L. Chizat. An exponentially converging particle meth od for the mixed nash equilibrium of continuous games, 2023. arXiv:2211.01280

work page arXiv 2023
[42]

Wibisono, M

A. Wibisono, M. Tao, and G. Piliouras. Alternating mirror descent for constrained min-max games. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 35201–35212. Curran Associates, Inc., 2022

work page 2022
[43]

quadratic growth

C. Zalinescu. Convex Analysis In General Vector Spaces . World Scientiﬁc Publishing Company, 2002. Appendix A. Proofs of additional results In this section, we present the proofs of the additional results of the paper. We start with the proofs of Lemma 2.4 and Lemma 2.5, which play a key role in prov ing the main results. Then we continue with the proofs ...

work page 2002
[44]

Forf =f ∗ and using the assumption that δH δf (f ∗) = 0, we get H(g) ≥ H(f ∗), for all g ∈ U , i.e

1], it can be shown that for any f,g ∈ U H(g) ≥ H(f ) + ⟨ g − f, δH δf (f ) ⟩ . Forf =f ∗ and using the assumption that δH δf (f ∗) = 0, we get H(g) ≥ H(f ∗), for all g ∈ U , i.e. f ∗ is a global minimum. □ Deﬁnition C.11 (Second variation of H). Let H :Bb(X ) → X be twice Fr´ echet diﬀer- entiable at f ∈ Bb(X ). If it exists, the second variation of H at...

work page
[45]

However, a major weakness of this implicit game is that it is not imple mentable in practice as opposed to (1.6) and (1.7)

3]). However, a major weakness of this implicit game is that it is not imple mentable in practice as opposed to (1.6) and (1.7). For a given stepsize τ > 0, and ﬁxed initial pair of strategies ( ν0,µ 0) ∈ C × D , for n ≥ 0, the implicit MDA iterative scheme is deﬁned by (E.1) { νn+1 ∈ argminν∈C { ∫ X δF δν (νn,µ n+1,x )(ν − νn)(dx) + 1 τDh(ν,ν n)}, µ n+1 ...

work page

[1] [1]

Abraham, J

R. Abraham, J. Marsden, and T. Ratiu. Manifolds, Tensor Analysis, and Applications . Applied Mathematical Sciences. Springer New York, 2012

work page 2012

[2] [2]

Aliprantis and K

C. Aliprantis and K. Border. Inﬁnite Dimensional Analysis: A Hitchhiker’s Guide . Springer, 2007

work page 2007

[3] [3]

Ambrosetti and G

A. Ambrosetti and G. Prodi. A Primer of Nonlinear Analysis . Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1995

work page 1995

[4] [4]

Anitescu

M. Anitescu. Degenerate nonlinear programming with a quadratic growth condition. SIAM Journal on Optimization , 10(4):1116–1135, 2000

work page 2000

[5] [5]

Antonakopoulos, V

K. Antonakopoulos, V. Belmega, and P. Mertikopoulos. An adapt ive mirror-prox method for varia- tional inequalities with singular operators. In H. Wallach, H. Laroche lle, A. Beygelzimer, F. d 'Alch´ e- Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems , vol- ume 32. Curran Associates, Inc., 2019

work page 2019

[6] [6]

Arjovsky, S

M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generat ive adversarial networks. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machin e Learning , volume 70 of Proceedings of Machine Learning Research , pages 214–223. PMLR, 06–11 Aug 2017

work page 2017

[7] [7]

Aubin-Frankowski, A

P.-C. Aubin-Frankowski, A. Korba, and F. L´ eger. Mirror desce nt with relative smoothness in measure spaces, with application to sinkhorn and em. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 17263–17275. Curran Associates, Inc., 2022. 17

work page 2022

[8] [8]

H. H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyon d lipschitz gradient continuity: First-order methods revisited and applications. Math. Oper. Res. , 42:330–348, 2017

work page 2017

[9] [9]

Beck and M

A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003

work page 2003

[10] [10]

J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems . Springer Series in Operations Research, 2000

work page 2000

[11] [11]

S. Bubeck. Convex optimization: Algorithms and complexity. Found. Trends Mach. Learn. , 8(3–4):231–357, 2015

work page 2015

[12] [12]

R. A. Carmona and F. Delarue. Probabilistic Theory of Mean Field Games with Applications I: Mean Field FBSDEs, Control, and Games . Springer International Publishing, 2018

work page 2018

[13] [13]

L. Chizat. Convergence Rates of Gradient Methods for Conve x Optimization in the Space of Mea- sures. Open Journal of Mathematical Optimization , 3:1–19, 2022

work page 2022

[14] [14]

L. Chizat. Mean-ﬁeld langevin dynamics : Exponential converge nce and annealing. Transactions on Machine Learning Research , 2022

work page 2022

[15] [15]

Chizat and F

L. Chizat and F. R. Bach. On the global convergence of gradien t descent for over-parameterized models using optimal transport. In NeurIPS, 2018

work page 2018

[16] [16]

Domingo-Enrich, S

C. Domingo-Enrich, S. Jelassi, A. Mensch, G. Rotskoﬀ, and J. B runa. A mean-ﬁeld analysis of two-player zero-sum games. In H. Larochelle, M. Ranzato, R. Had sell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 20215–20226. Curran Associates, Inc., 2020

work page 2020

[17] [17]

Dupuis and R

P. Dupuis and R. S. Ellis. A Weak Convergence Approach to the Theory of Large Deviation s. Wiley, New York, NY, 1997

work page 1997

[18] [18]

Dvurechensky and J.-J

P. Dvurechensky and J.-J. Zhu. Analysis of kernel mirror prox for measure optimization. In 27th International Conference on Artiﬁcial Intelligence and St atistics (AISTATS), 2024

work page 2024

[19] [19]

I. L. Glicksberg. A further generalization of the kakutani ﬁxe d point theorem, with application to nash equilibrium points. Proceedings of the American Mathematical Society , 3(1):170–174, 1952

work page 1952

[20] [20]

Hsieh, C

Y.-P. Hsieh, C. Liu, and V. Cevher. Finding mixed Nash equilibria of g enerative adversarial networks. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research , pages 2810–2819. PMLR, 09–15 Jun 2019

work page 2019

[21] [21]

K. Hu, Z. Ren, D. ˇSiˇ ska, and /suppress Lukasz Szpruch. Mean-ﬁeld Langevin dynamics and energy landscape of neural networks. Annales de l’Institut Henri Poincar´ e, Probabilit´ es et Statistiques, 57(4):2043 – 2065, 2021

work page 2043

[22] [22]

Kerimkulov, J.-M

B. Kerimkulov, J.-M. Leahy, D. ˇSiˇ ska, /suppress L. Szpruch, and Y. Zhang. A Fisher-Rao gradient ﬂow for entropy-regularised Markov decision processes in Polish spaces, 2 023. arXiv:2310.02951

work page arXiv

[23] [23]

Kerimkulov, D

B. Kerimkulov, D. ˇSiˇ ska, /suppress L. Szpruch, and Y. Zhang. Mirror Descent for Stochastic Control Problems with Measure-valued Controls, 2024. arXiv:2401.01198

work page arXiv 2024

[24] [24]

J. Kim, K. Yamamoto, K. Oko, Z. Yang, and T. Suzuki. Symmetric mean-ﬁeld langevin dynamics for distributional minimax problems. In The Twelfth International Conference on Learning Repre- sentations, 2024

work page 2024

[25] [25]

Lascu, M

R.-A. Lascu, M. B. Majka, and /suppress L. Szpruch. Entropic mean-ﬁeld min-max problems via Best Response ﬂow, 2023. arXiv:2306.03033

work page arXiv 2023

[26] [26]

Lascu, M

R.-A. Lascu, M. B. Majka, and /suppress Lukasz Szpruch. A ﬁsher-raogradient ﬂow for entropic mean-ﬁeld min-max games, 2024. arXiv:2405.15834

work page arXiv 2024

[27] [27]

Li, W.-C

C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, and B. Poczos. Mmd ga n: Towards deeper understanding of moment matching network. In I. Guyon, U. V. Luxburg, S. Beng io, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems , volume 30. Curran Associates, Inc., 2017

work page 2017

[28] [28]

L. Liu, M. Majka, and /suppress L. Szpruch. Polyak–/suppress lojasiewicz inequality on the space of measures and convergence of mean-ﬁeld birth-death processes. Applied Mathematics and Optimization , 87(3), 2023

work page 2023

[29] [29]

H. Lu, R. M. Freund, and Y. Nesterov. Relatively smooth conve x optimization by ﬁrst-order meth- ods, and applications. SIAM Journal on Optimization , 28(1):333–354, 2018

work page 2018

[30] [30]

Y. Lu. Two-scale gradient descent ascent dynamics ﬁnds mixed nash equilibria of continuous games: a mean-ﬁeld perspective. In Proceedings of the 40th International Conference on Machin e Learning, 2023. 18

work page 2023

[31] [31]

S. Mei, A. Montanari, and P.-M. Nguyen. A mean ﬁeld view of the lan dscape of two-layer neu- ral networks. Proceedings of the National Academy of Sciences of the Unite d States of America , 115:E7665 – E7671, 2018

work page 2018

[32] [32]

Nemirovski and D

A. Nemirovski and D. Yudin. Problem Complexity and Method Eﬃciency in Optimization . A Wiley-Interscience publication. Wiley, 1983

work page 1983

[33] [33]

Nikaidˆ o and K

H. Nikaidˆ o and K. Isoda. Note on non-cooperative convex gam e. Paciﬁc Journal of Mathematics , 5:807–815, 1955

work page 1955

[34] [34]

Nitanda, D

A. Nitanda, D. Wu, and T. Suzuki. Convex analysis of the mean ﬁe ld langevin dynamics. In Proceedings of The 25th International Conference on Artiﬁc ial Intelligence and Statistics , 2022

work page 2022

[35] [35]

Ortega and W

J. Ortega and W. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Varia bles. Clas- sics in Applied Mathematics. Society for Industrial and Applied Mathe matics, 1970

work page 1970

[36] [36]

G. M. Rotskoﬀ and E. Vanden-Eijnden. Neural networks as int eracting particle systems: As- ymptotic convexity of the loss landscape and universal scaling of th e approximation error, 2018. arXiv:1805.00915

work page arXiv 2018

[37] [37]

M. Sion. On general minimax theorems. Paciﬁc Journal of Mathematics , 8(1):171 – 176, 1958

work page 1958

[38] [38]

Suzuki, D

T. Suzuki, D. Wu, and A. Nitanda. Mean-ﬁeld langevin dynamics: T ime-space discretization, stochastic gradient, and variance reduction. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

work page 2023

[39] [39]

Tomar, L

M. Tomar, L. Shani, Y. Efroni, and M. Ghavamzadeh. Mirror Des cent Policy Optimization, 2021. arXiv:2005.09814

work page arXiv 2021

[40] [40]

C. G. Trillos and N. G. Trillos. On adversarial robustness and the use of wasserstein ascent-descent dynamics to enforce it, 2023. arXiv:2301.03662

work page arXiv 2023

[41] [41]

Wang and L

G. Wang and L. Chizat. An exponentially converging particle meth od for the mixed nash equilibrium of continuous games, 2023. arXiv:2211.01280

work page arXiv 2023

[42] [42]

Wibisono, M

A. Wibisono, M. Tao, and G. Piliouras. Alternating mirror descent for constrained min-max games. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 35201–35212. Curran Associates, Inc., 2022

work page 2022

[43] [43]

quadratic growth

C. Zalinescu. Convex Analysis In General Vector Spaces . World Scientiﬁc Publishing Company, 2002. Appendix A. Proofs of additional results In this section, we present the proofs of the additional results of the paper. We start with the proofs of Lemma 2.4 and Lemma 2.5, which play a key role in prov ing the main results. Then we continue with the proofs ...

work page 2002

[44] [44]

Forf =f ∗ and using the assumption that δH δf (f ∗) = 0, we get H(g) ≥ H(f ∗), for all g ∈ U , i.e

1], it can be shown that for any f,g ∈ U H(g) ≥ H(f ) + ⟨ g − f, δH δf (f ) ⟩ . Forf =f ∗ and using the assumption that δH δf (f ∗) = 0, we get H(g) ≥ H(f ∗), for all g ∈ U , i.e. f ∗ is a global minimum. □ Deﬁnition C.11 (Second variation of H). Let H :Bb(X ) → X be twice Fr´ echet diﬀer- entiable at f ∈ Bb(X ). If it exists, the second variation of H at...

work page

[45] [45]

However, a major weakness of this implicit game is that it is not imple mentable in practice as opposed to (1.6) and (1.7)

3]). However, a major weakness of this implicit game is that it is not imple mentable in practice as opposed to (1.6) and (1.7). For a given stepsize τ > 0, and ﬁxed initial pair of strategies ( ν0,µ 0) ∈ C × D , for n ≥ 0, the implicit MDA iterative scheme is deﬁned by (E.1) { νn+1 ∈ argminν∈C { ∫ X δF δν (νn,µ n+1,x )(ν − νn)(dx) + 1 τDh(ν,ν n)}, µ n+1 ...

work page