Mean-Field Control with a Common Hidden State under Decentralized Observations

Ali D. Kara; Erhan Bayraktar

arxiv: 2606.19639 · v1 · pith:2YCKKTFUnew · submitted 2026-06-17 · 🧮 math.OC

Pith reviewed 2026-06-26 19:36 UTC · model grok-4.3

classification 🧮 math.OC

keywords: mean-field control · decentralized observations · hidden state · symmetric policies · infinite population limit · measure-valued control · dynamic programming

The pith

Optimal symmetric policies from the infinite-agent limit achieve near-optimality in finite populations of agents that share a hidden state, with error scaling as 1/sqrt(N).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines control of many agents that share a common hidden state but receive decentralized observations through identical channels. The system dynamics and costs depend on individual actions only via the overall empirical distribution of actions. In the limit of infinitely many agents, this reduces to a single-agent problem formulated as deterministic control over probability measures on policies, solved via dynamic programming. The authors prove that randomization over actions is required for optimality in this limit, but randomization over entire policies is not. They further show that the resulting symmetric policies achieve near-optimality in the original finite-agent system, with explicit convergence rates.
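The need for action-level randomization can be seen in a toy static example. This is a hypothetical illustration, not a construction from the paper: the binary observation, the channel probability `Q_OBS_ONE`, and the quadratic cost are all assumptions chosen for simplicity. Every agent uses the same policy, so in the infinite-population limit the fraction of agents playing action 1 is fixed by the policy and the channel.

```python
from itertools import product

# Hypothetical channel: probability that an agent observes y = 1 given the hidden state.
Q_OBS_ONE = 0.8

def action_one_mass(policy):
    """Limit fraction of agents playing action 1.

    policy[y] is the probability of playing action 1 after observing y in {0, 1}.
    """
    return (1 - Q_OBS_ONE) * policy[0] + Q_OBS_ONE * policy[1]

def limit_cost(policy, target=0.5):
    """Toy cost that depends on actions only through their distribution."""
    return (action_one_mass(policy) - target) ** 2

# All four deterministic symmetric policies: each observation maps to a fixed action.
deterministic_policies = list(product((0.0, 1.0), repeat=2))
best_deterministic = min(limit_cost(p) for p in deterministic_policies)

# A symmetric policy that randomizes over actions regardless of the observation.
randomized_cost = limit_cost((0.5, 0.5))
```

In this toy, deterministic symmetric policies can only produce action-1 fractions of 0, 0.2, 0.8, or 1, so the best of them pays a cost of about 0.09, while the randomized policy hits the target fraction and pays essentially zero.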

Core claim

The optimal symmetric policies designed for the infinite-population problem are near optimal for the finite-population problem, with an optimality gap that decays in the number of agents N as 1/sqrt(N) and grows exponentially with the memory length used in the policy. The infinite-agent problem reduces to a deterministic measure-valued control problem over policies, for which a dynamic programming recursion is provided. Randomization over control actions is necessary for optimality in the limit, while randomization over the selection of policies is not.
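A rough way to see where a 1/sqrt(N) rate can come from is to look at sampling error alone. The sketch below is an illustration under simplifying assumptions, not the paper's proof: agents draw actions i.i.d. from a fixed randomized policy over three actions (the vector `policy` is hypothetical), and the gap is measured in L1 distance between the empirical action distribution and the policy.

```python
import random

def empirical_action_law(policy, n_agents, rng):
    """Draw one action per agent i.i.d. from `policy`; return the empirical distribution."""
    actions = rng.choices(range(len(policy)), weights=policy, k=n_agents)
    counts = [0] * len(policy)
    for a in actions:
        counts[a] += 1
    return [c / n_agents for c in counts]

def mean_l1_gap(policy, n_agents, n_trials, rng):
    """Average L1 distance between the empirical action law and the policy."""
    total = 0.0
    for _ in range(n_trials):
        emp = empirical_action_law(policy, n_agents, rng)
        total += sum(abs(e - p) for e, p in zip(emp, policy))
    return total / n_trials

rng = random.Random(0)
policy = [0.5, 0.3, 0.2]  # hypothetical randomized policy over three actions
gaps = {n: mean_l1_gap(policy, n, 200, rng) for n in (100, 400, 1600)}
```

Quadrupling N should roughly halve the average gap, which is the 1/sqrt(N) signature. The paper's bound additionally accounts for the dynamics and the hidden state, which this sketch ignores.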

What carries the argument

The deterministic measure-valued control problem over the space of policies, in which the agent affects the hidden state dynamics via the conditional law of actions given the past hidden state process.
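For orientation, a deterministic control problem on a space of measures admits a recursion of the following generic shape. This is a schematic under assumptions, not the paper's statement: the finite horizon T, the state \mu, the control \gamma ranging over a set \Gamma, the stage cost c, and the update map \Phi are all placeholder symbols.

```latex
% Schematic only: \mu, \gamma, \Gamma, c, \Phi, T are placeholders, not the paper's notation.
V_T(\mu) = 0, \qquad
V_t(\mu) = \inf_{\gamma \in \Gamma}
  \Big\{ c(\mu, \gamma) + V_{t+1}\big(\Phi(\mu, \gamma)\big) \Big\},
\quad t = T-1, \dots, 0.
```

Because the measure-valued state evolves deterministically once the control is fixed, no expectation over the next state appears inside the braces.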

If this is right

Randomization over individual control actions is necessary for optimality in the infinite-population problem.
Mixture policies that randomize over the choice of entire policies are not required for optimality.
The approximation error for any finite number of agents decays as 1/sqrt(N) and increases exponentially with the length of memory in the policy.
Symmetric policies derived from the limit problem suffice to obtain the stated near-optimality guarantee.
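The last two points can be summarized in a schematic bound. The form below is an assumption consistent with the stated rates, not a formula quoted from the paper: J_N denotes the finite-N cost, \pi^{\infty} the symmetric policy obtained from the limit problem, m the memory length, and C and K are unspecified constants.

```latex
% Schematic form; C, K, m are placeholders, not constants from the paper.
0 \;\le\; J_N(\pi^{\infty}) - \inf_{\pi} J_N(\pi)
  \;\le\; \frac{C\, K^{m}}{\sqrt{N}}, \qquad K > 1 .
```

Under this form, holding the gap below a tolerance \varepsilon requires N \ge (C K^{m} / \varepsilon)^2, so each extra unit of memory multiplies the required population by K^2.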

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

For fixed memory length, the performance gap vanishes as the number of agents tends to infinity.
The framework may apply to large-scale systems whose agents have only minor differences in their observation channels.
Numerical checks of the predicted exponential growth with memory length could inform how much history to retain when implementing the policies.
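To make the memory trade-off concrete, suppose the gap is bounded by a schematic expression c * k**m / sqrt(N), where m is the memory length. This form and the default values of `c` and `k` below are assumptions for illustration only; the paper's actual constants are not given in the abstract. The sketch computes how many agents are needed to keep the bound below a tolerance.

```python
import math

def agents_needed(memory_length, tolerance, c=1.0, k=2.0):
    """Smallest N with c * k**memory_length / sqrt(N) <= tolerance.

    c and k are hypothetical constants standing in for the paper's unspecified ones.
    """
    return math.ceil((c * k ** memory_length / tolerance) ** 2)

# Required population for memory lengths 1 through 5 at a fixed tolerance.
requirements = {m: agents_needed(m, tolerance=0.1) for m in range(1, 6)}
```

With k = 2, each additional step of memory roughly quadruples the required population, which is why the amount of history retained matters in practice.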

Load-bearing premise

The dynamics of the hidden state and the costs depend on the agents' actions only through their empirical distribution, with all agents receiving observations through identical channels.

What would settle it

A numerical simulation of a finite-N system in which the performance gap between the infinite-population policy and the true optimum fails to decay proportionally to 1/sqrt(N) as N increases.
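One simple way to run such a check is to fit the slope of log(gap) against log(N); a slope near -0.5 is consistent with 1/sqrt(N) decay. The sketch below is a minimal helper under that assumption. The function name `loglog_slope` is hypothetical, and the gaps are synthetic placeholders standing in for measured performance gaps.

```python
import math

def loglog_slope(ns, gaps):
    """Least-squares slope of log(gap) versus log(N)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(g) for g in gaps]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Synthetic gaps that decay exactly as 1/sqrt(N); replace with simulated gaps.
ns = [10, 100, 1000, 10000]
synthetic_gaps = [3.0 / math.sqrt(n) for n in ns]
slope = loglog_slope(ns, synthetic_gaps)
```

A fitted slope that stays well above -0.5 as N grows would be evidence against the claimed rate, provided the true finite-N optimum can be computed or tightly bounded for comparison.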

Original abstract

We study optimal control of a system with multiple decision makers who share a common hidden state and receive fully decentralized observations through identical channels. The dynamics of the hidden state and the cost incurred by the agents depend on the agents' actions only through their empirical distribution. In the limit problem with infinitely many agents, the problem reduces to a single agent control problem where the agent affects the hidden state dynamics via the conditional law of the actions given the past values of the hidden state process. We formulate this problem as a deterministic measure valued control problem over the space of policies and provide a dynamic programming recursion. We first show that for the limiting problem randomization over the control actions is necessary for optimality. However, randomization over the selection of policies (i.e., mixture policies) is not required. We then show that the optimal symmetric policies designed for the infinite population problem are near optimal for the finite population problem. In particular, we establish convergence rates that decay with number of agents as $\frac{1}{\sqrt{N}}$, and grow exponentially with the memory length used in the policy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reduces decentralized mean-field control with a shared hidden state to a deterministic DP over measures and shows explicit 1/sqrt(N) near-optimality for finite N, with action randomization required but policy mixing not.

The main takeaway is that this work reduces the infinite-agent problem to a single deterministic measure-valued control problem solved by dynamic programming, then proves the resulting symmetric policies are near-optimal for finite populations at rate 1/sqrt(N), with the rate worsening exponentially in memory length.

What stands out as new is the specific setup combining a common hidden state, identical decentralized observation channels, and the clean separation between needing to randomize actions versus not needing to randomize policies. The explicit quantitative rate is also more concrete than the usual qualitative mean-field limits.

The paper handles the model cleanly by assuming dynamics and costs depend on actions only through the empirical distribution. This symmetry supports the reduction without extra propagation-of-chaos terms.

The soft spot is the exponential growth of the error bound with memory length. For any policy that needs more than short history, the guarantee becomes impractical fast; that is a genuine limitation on applicability rather than a minor detail. The abstract states the claims without showing the derivations, so the DP recursion on the measure-valued state would need checking for hidden regularity conditions.

This is for researchers in stochastic control and mean-field methods who work with partial observations or hidden states. A reader looking for tractable approximations to large decentralized systems would get direct value from the reduction and rates. The claims are specific and the model is standard within the area, so the paper deserves a serious referee.

I would send it out for review.

Referee Report

2 major / 2 minor

Summary. The manuscript studies optimal control for a large population of agents sharing a common hidden state, with fully decentralized observations through identical channels. Dynamics and costs depend on actions only via the empirical distribution. The infinite-population limit reduces to a deterministic measure-valued control problem over the space of policies, solved by dynamic programming. The paper establishes that randomization over actions is required for optimality while randomization over policy selection is not, and proves that symmetric policies optimal for the infinite-population problem remain near-optimal for finite N, with explicit rates of order 1/sqrt(N) that grow exponentially in the policy memory length.

Significance. If the derivations are complete, the work extends mean-field control to hidden-state settings with decentralized information and supplies explicit convergence rates together with a dynamic-programming characterization on the space of conditional action laws. These elements could support scalable controller design in applications such as sensor networks or large robotic swarms. The separation between action-level and policy-level randomization is a clarifying technical point.

major comments (2)

[limit-problem formulation] The reduction of the infinite-population problem to a deterministic measure-valued control problem and the associated dynamic-programming recursion are stated in the abstract, but the value-function definition, state space, and Bellman operator are not exhibited; without these the claim that the limit problem is solvable by DP cannot be verified and is load-bearing for all subsequent results.
[finite-to-infinite convergence] The 1/sqrt(N) near-optimality result with exponential dependence on memory length is asserted, yet the propagation-of-chaos estimates, the precise error bounds, and the manner in which the hidden-state filtering error enters the constants are not supplied; this gap directly affects the central convergence claim.

minor comments (2)

Notation for the conditional law of actions given the hidden-state history should be introduced once and used consistently; the current description mixes several equivalent but visually distinct symbols.
A short comparison paragraph placing the identical-channel assumption against the broader literature on mean-field control with partial observations would clarify the novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive identification of points that require greater explicitness. We address each major comment below and indicate the revisions that will be made.

Referee: [limit-problem formulation] The reduction of the infinite-population problem to a deterministic measure-valued control problem and the associated dynamic-programming recursion are stated in the abstract, but the value-function definition, state space, and Bellman operator are not exhibited; without these the claim that the limit problem is solvable by DP cannot be verified and is load-bearing for all subsequent results.

Authors: We agree that a compact, self-contained statement of the DP elements strengthens the manuscript. The state space is the set of probability measures on the product of the hidden-state space and the finite-length observation histories (Section 2.2); the value function is defined as the infimum expected cost over admissible policies starting from a given measure (Definition 3.1); and the Bellman operator appears explicitly as the integral recursion in Theorem 3.2. To make these elements immediately verifiable, we will insert a new subsection 3.1 that collects the state-space definition, value-function definition, and the Bellman operator in one place, together with a short verification that the operator is a contraction under our Lipschitz assumptions. revision: yes
Referee: [finite-to-infinite convergence] The 1/sqrt(N) near-optimality result with exponential dependence on memory length is asserted, yet the propagation-of-chaos estimates, the precise error bounds, and the manner in which the hidden-state filtering error enters the constants are not supplied; this gap directly affects the central convergence claim.

Authors: The propagation-of-chaos argument proceeds from the standard Wasserstein-1 convergence of the empirical measure to its mean-field limit, combined with the Lipschitz continuity of the controlled dynamics and cost with respect to the measure (Assumption 2.1). The hidden-state filtering error is controlled in total variation; because the observation channel is memoryless, the error contracts at a uniform rate, but the Lipschitz constants of the value function grow exponentially with memory length, producing the stated dependence. These bounds are derived in the proof of Theorem 4.3 (Appendix B). We will add a short paragraph in Section 4 that states the key propagation-of-chaos estimate and indicates how the filtering error enters the constant, while retaining the full derivation in the appendix. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper reduces the finite-N problem to an infinite-population deterministic measure-valued control problem via standard mean-field arguments under the stated assumptions (empirical-distribution dependence and identical channels). It then applies dynamic programming to obtain optimal symmetric policies and invokes propagation-of-chaos estimates to obtain the 1/sqrt(N) convergence rates. These steps are self-contained mathematical arguments; no load-bearing claim reduces by definition or self-citation to a fitted input, and the abstract and described structure contain no self-definitional loops or renamed empirical patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the work relies on standard domain assumptions of mean-field control without introducing new free parameters or invented entities; full text would be needed to audit any additional axioms.

axioms (2)

domain assumption: Dynamics and costs depend on actions only through the empirical measure. Invoked to obtain the mean-field limit reduction.
domain assumption: All agents observe through identical channels. Used to maintain symmetry in the infinite-population problem.

pith-pipeline@v0.9.1-grok · 5717 in / 1338 out tokens · 31310 ms · 2026-06-26T19:36:29.323393+00:00 · methodology


[1] [1]

Unified reinforcement q-learning for mean field game and control problems

Andrea Angiuli, Jean-Pierre Fouque, and Mathieu Lauri `ere. Unified reinforcement q-learning for mean field game and control problems. Mathematics of Control, Signals, and Systems, 34(2):217–271, 2022

2022

[2] [2]

Arrow and Roy Radner

Kenneth J. Arrow and Roy Radner. Allocation of resources in large teams.Econometrica, 47(2):361–385, 1979

1979

[3] [3]

Mean-field games and dynamic de- mand management in power grids.Dynamic Games and Applications, 4(2):155–176, 2014

Fabio Bagagiolo and Dario Bauso. Mean-field games and dynamic de- mand management in power grids.Dynamic Games and Applications, 4(2):155–176, 2014

2014

[4] [4]

Mean field Markov decision processes.Applied Mathematics & Optimization, 88(1):12, 2023

Nicole B ¨auerle. Mean field Markov decision processes.Applied Mathematics & Optimization, 88(1):12, 2023

2023

[5] [5]

Finite approximations for mean-field type multi-agent control and their near optimality.Applied Mathematics & Optimization, 92(1):7, 2025

Erhan Bayraktar, Nicole B ¨auerle, and Ali Devran Kara. Finite approximations for mean-field type multi-agent control and their near optimality.Applied Mathematics & Optimization, 92(1):7, 2025

2025

[6] [6]

Mean field control and finite agent approximation for regime-switching jump diffusions.Applied Mathematics & Optimization, 88(2):36, 2023

Erhan Bayraktar, Alekos Cecchin, and Prakash Chakraborty. Mean field control and finite agent approximation for regime-switching jump diffusions.Applied Mathematics & Optimization, 88(2):36, 2023

2023

[7] [7]

Erhan Bayraktar, Andrea Cosso, and Huy ˆen Pham. Randomized dynamic programming principle and Feynman-Kac representation for optimal control of Mckean-Vlasov dynamics.Transactions of the American Mathematical Society, 370(3):2115–2160, 2018

2018

[8] [8]

Infinite horizon average cost optimality criteria for mean-field control.SIAM Journal on Control and Optimization, 62(5):2776–2806, 2024

Erhan Bayraktar and Ali Devran Kara. Infinite horizon average cost optimality criteria for mean-field control.SIAM Journal on Control and Optimization, 62(5):2776–2806, 2024

2024

[9] [9]

Learning with linear function approximations in mean-field control.Journal of Machine Learning Research, 26(192):1–53, 2025

Erhan Bayraktar and Ali Devran Kara. Learning with linear function approximations in mean-field control.Journal of Machine Learning Research, 26(192):1–53, 2025

2025

[10] [10]

Solvability of infinite horizon Mckean–Vlasov FBSDEs in mean field control problems and games

Erhan Bayraktar and Xin Zhang. Solvability of infinite horizon Mckean–Vlasov FBSDEs in mean field control problems and games. Applied Mathematics & Optimization, 87(1):13, 2023

2023

[11] [11]

Ren ´e Carmona and Mathieu Lauri `ere. Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games i: the ergodic case.SIAM Journal on Numerical Analysis, 59(3):1455–1485, 2021

2021

[12] [12]

Perlaza, Hamidou Tembine, and M ´erouane Debbah

Romain Couillet, Samir M. Perlaza, Hamidou Tembine, and M ´erouane Debbah. Electrical vehicles in the smart grid: A mean field game anal- ysis.IEEE Journal on Selected Areas in Communications, 30(6):1086– 1096, 2012

2012

[13] [13]

Davison, Narain Rau, and Frank Palmay

Edward J. Davison, Narain Rau, and Frank Palmay. The optimal decentralized control of a power system consisting of a number of interconnected synchronous machines.International Journal of Control, 18(6):1313–1328, 1973

1973

[14] [14]

Mckean– vlasov optimal control: limit theory and equivalence between different formulations.Mathematics of Operations Research, 47(4):2891–2930, 2022

Mao Fabrice Djete, Dylan Possama ¨ı, and Xiaolu Tan. Mckean– vlasov optimal control: limit theory and equivalence between different formulations.Mathematics of Operations Research, 47(4):2891–2930, 2022

2022

[15] [15]

McKean–Vlasov optimal control: the dynamic programming principle.The Annals of Probability, 50(2):791–833, 2022

Mao Fabrice Djete, Dylan Possama ¨ı, and Xiaolu Tan. McKean–Vlasov optimal control: the dynamic programming principle.The Annals of Probability, 50(2):791–833, 2022

2022

[16] [16]

Mean-field optimal control as gamma-limit of finite agent controls.European Journal of Applied Mathematics, 30(6):1153–1186, 2019

Massimo Fornasier, Stefano Lisini, Carlo Orrieri, and Giuseppe Savar´e. Mean-field optimal control as gamma-limit of finite agent controls.European Journal of Applied Mathematics, 30(6):1153–1186, 2019

2019

[17] [17]

Numerical resolution of Mckean-Vlasov FBSDEs using neural networks.Method- ology and Computing in Applied Probability, pages 1–30, 2022

Maximilien Germain, Joseph Mikael, and Xavier Warin. Numerical resolution of Mckean-Vlasov FBSDEs using neural networks.Method- ology and Computing in Applied Probability, pages 1–30, 2022

2022

[18] [18]

Mean-field con- trols with q-learning for cooperative marl: convergence and complexity analysis.SIAM Journal on Mathematics of Data Science, 3(4):1168– 1196, 2021

Haotian Gu, Xin Guo, Xiaoli Wei, and Renyuan Xu. Mean-field con- trols with q-learning for cooperative marl: convergence and complexity analysis.SIAM Journal on Mathematics of Data Science, 3(4):1168– 1196, 2021

2021

[19] [19]

Dynamic pro- gramming principles for mean-field controls with learning.Operations Research, 2023

Haotian Gu, Xin Guo, Xiaoli Wei, and Renyuan Xu. Dynamic pro- gramming principles for mean-field controls with learning.Operations Research, 2023

2023

[20] [20]

On the existence of optimal policies for a class of static and se- quential dynamic teams.SIAM Journal on Control and Optimization, 53(3):1681–1712, 2015

Abhishek Gupta, Serdar Y ¨uksel, Tamer Bas ¸ar, and C´edric Langbort. On the existence of optimal policies for a class of static and se- quential dynamic teams.SIAM Journal on Control and Optimization, 53(3):1681–1712, 2015

2015

[21] [21]

Hespanha, Payam Naghshtabrizi, and Yonggang Xu

Jo ˜ao P. Hespanha, Payam Naghshtabrizi, and Yonggang Xu. A survey of recent results in networked control systems.Proceedings of the IEEE, 95(1):138–162, 2007

2007

[22] [22]

Team decision theory and information structures.Pro- ceedings of the IEEE, 68(6):644–654, 1980

Yu-Chi Ho. Team decision theory and information structures.Pro- ceedings of the IEEE, 68(6):644–654, 1980

1980

[23] [23]

D. Lacker. Limit theory for controlled Mckean–Vlasov dynamics. SIAM Journal on Control and Optimization, 55(3):1641–1672, 2017

2017

[24] Mathieu Laurière and Olivier Pironneau. Dynamic programming for mean-field type control. Comptes Rendus Mathematique, 352(9):707–713, 2014

[25] Aditya Mahajan, Nuno C. Martins, Michael Rotkowitz, and Serdar Yüksel. Information structures in optimal decentralized control. In IEEE 51st Conference on Decision and Control (CDC), pages 1291–1306, Maui, Hawaii, USA, 2012

[26] Médéric Motte and Huyên Pham. Mean-field Markov decision processes with common noise and open-loop controls. The Annals of Applied Probability, 32(2):1421–1458, 2022

[27] Médéric Motte and Huyên Pham. Quantitative propagation of chaos for mean field Markov decision process with common noise. Electronic Journal of Probability, 28:1–24, 2023

[28] Reza Olfati-Saber and Richard M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, 2004

[29] Huyên Pham and Xiaoli Wei. Dynamic programming for optimal control of stochastic McKean–Vlasov dynamics. SIAM Journal on Control and Optimization, 55(2):1069–1101, 2017

[30] Naci Saldi. A topology for team policies and existence of optimal team policies in stochastic team theory. IEEE Transactions on Automatic Control, 65(1):310–317, 2020

[31] Nils R. Sandell, Pravin Varaiya, Michael Athans, and Michael G. Safonov. Survey of decentralized control methods for large scale systems. IEEE Transactions on Automatic Control, 23(2):108–128, April 1978

[32] Sina Sanjari, Naci Saldi, and Serdar Yüksel. Optimality of independently randomized symmetric policies for exchangeable stochastic teams with infinitely many decision makers. Mathematics of Operations Research, 2022

[33] Sina Sanjari and Serdar Yüksel. Optimal policies for convex symmetric stochastic dynamic teams and their mean-field limit. SIAM Journal on Control and Optimization, 59(2):777–804, 2021

[34] John N. Tsitsiklis. Decentralized detection by a large number of sensors. Mathematics of Control, Signals and Systems, 1(2):167–182, 1988

[35] Aad W. van der Vaart and Jon A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, New York, NY, 1st edition, 1996

[36] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2nd edition, 2026

[37] S. Yüksel and T. Başar. Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints. Springer, New York, 2013

[38] Serdar Yüksel. A universal dynamic program and refined existence results for decentralized stochastic control. SIAM Journal on Control and Optimization, 58(5):2711–2739, 2020

[39] Serdar Yüksel and Tamer Başar. Stochastic teams, games, and control under information constraints. Springer, 2024

[40] Serdar Yüksel and Naci Saldi. Convex analysis in decentralized stochastic control, strategic measures, and optimal solutions. SIAM Journal on Control and Optimization, 55(1):1–28, 2017