AdaFair-MARL: Enforcing Adaptive Fairness Constraints in Multi-Agent Reinforcement Learning

Angelique Taylor; Felix Grimm; Lekan Molu; Promise Ekpo; Saesha Agarwal

arXiv: 2511.14135 · v2 · submitted 2025-11-18 · cs.LG · cs.AI · cs.GT · cs.MA

Promise Ekpo, Saesha Agarwal, Felix Grimm, Lekan Molu, Angelique Taylor

Pith reviewed 2026-05-17 20:13 UTC · model grok-4.3

keywords: multi-agent reinforcement learning, fairness constraints, primal-dual optimization, Jain's Fairness Index, second-order cone, workload fairness, constrained optimization

The pith

AdaFair-MARL enforces workload fairness as an explicit constraint in cooperative multi-agent reinforcement learning using adaptive primal-dual updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AdaFair-MARL to address challenges in fair workload enforcement in heterogeneous multi-agent systems pursuing shared objectives. It formulates fairness using Jain's Fairness Index and represents the constraint as a second-order cone to enable stable Lagrangian dual-ascent updates. This approach avoids the inefficiencies and instabilities of fixed fairness penalties or reward-shaping methods by guaranteeing a desired fairness level while optimizing team performance. Experiments in a simulated hospital coordination environment show nearly perfect constraint satisfaction and improved workload balance.

Core claim

We present AdaFair-MARL, a constrained cooperative MARL framework whose core algorithmic component is a primal-dual update that enforces workload fairness via adaptive Lagrange multiplier updates. Grounding the framework in a cooperative Markov game, we derive the fairness constraint from Jain's Fairness Index (JFI) geometry and show that the resulting feasible set admits a second-order cone representation, enabling principled Lagrangian dual-ascent updates without manual penalty tuning.

What carries the argument

Primal-dual update with adaptive Lagrange multipliers on a Jain's Fairness Index constraint cast as a second-order cone.
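To make the constraint concrete, here is a minimal sketch of Jain's Fairness Index and the equivalent cone-form residual for a single workload vector. The function names, the example workloads, and the threshold `alpha = 0.9` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def jain_index(w):
    """Jain's Fairness Index: (sum w)^2 / (n * sum w^2)."""
    w = np.asarray(w, dtype=float)
    denom = w.size * np.dot(w, w)
    if denom == 0.0:
        raise ValueError("JFI is undefined for an all-zero workload vector")
    return float(w.sum() ** 2 / denom)

def soc_residual(w, alpha):
    """Cone-form slack 1^T w - sqrt(alpha * n) * ||w||_2.

    For nonnegative w, the slack is >= 0 exactly when JFI(w) >= alpha.
    """
    w = np.asarray(w, dtype=float)
    return float(w.sum() - np.sqrt(alpha * w.size) * np.linalg.norm(w))

alpha = 0.9  # hypothetical fairness threshold
for name, w in [("balanced", [4.0, 4.0, 4.0, 4.0]), ("skewed", [10.0, 1.0, 1.0, 0.0])]:
    print(name, round(jain_index(w), 4), round(soc_residual(w, alpha), 4))
```

The cone form is the one a dual variable would typically be attached to, since it is convex in the workload vector, whereas the ratio form of the index is not.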

If this is right

AdaFair-MARL achieves near-perfect constraint satisfaction, between 0.99 and 1.00, in the MARLHospital experiments.
It significantly improves workload fairness compared to fixed-penalty baselines.
It maintains team performance while enforcing fairness in heterogeneous agent settings.
The method eliminates the need for manual penalty tuning and post-hoc evaluation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The second-order cone representation may allow extension to other fairness indices if they can be similarly reformulated.
In practical deployments like hospital coordination, this could lead to more sustainable agent workloads over long-term operations.
Testing the framework in non-cooperative or partially observable environments could reveal additional stability properties.

Load-bearing premise

The fairness constraint derived from Jain's Fairness Index geometry admits a second-order cone representation that enables stable Lagrangian dual-ascent updates without degrading team performance or introducing instabilities.

What would settle it

Observing that the dual-ascent procedure diverges or that team reward drops substantially below baseline levels when applying the second-order cone fairness constraint in a heterogeneous multi-agent environment.

Figures

Figures reproduced from arXiv: 2511.14135 by Angelique Taylor, Felix Grimm, Lekan Molu, Promise Ekpo, Saesha Agarwal.

**Figure 1.** The MARLHospital Environment. The environment integrates a PDDL planner with a MARL state layer to model skill-aligned fairness and shared-task coordination among healthcare workers. The goal is to pick the backboard from the crash cart, move to the patient, place it under the patient, compress patient chest multiple times, retrieve the Bag-Valve-Mask (BVM) from the crash cart, give patient rescue breaths …

**Figure 2.** Task flow diagrams for the check compression (CPR) and rescue breath tasks in MARLHospital. The …

Original abstract

Fair workload enforcement in heterogeneous multi-agent systems that pursue shared objectives remains challenging. Fixed fairness penalties often introduce inefficiencies, training instability, and conflicting agent incentives. Reward-shaping approaches in fair Multi-Agent Reinforcement Learning (MARL) typically incorporate fairness through heuristic penalties or scalar reward modifications and often rely on post-hoc evaluation. However, these methods do not guarantee that a desired fairness level will be satisfied. To address this limitation, we propose the Adaptive Fairness Multi-Agent Reinforcement Learning (AdaFair-MARL) framework, which formulates workload fairness as an explicit constraint so that agents maintain balanced contributions while optimizing team performance. We present AdaFair-MARL, a constrained cooperative MARL framework whose core algorithmic component is a primal-dual update that enforces workload fairness via adaptive Lagrange multiplier updates. Grounding the framework in a cooperative Markov game, we derive the fairness constraint from Jain's Fairness Index (JFI) geometry and show that the resulting feasible set admits a second-order cone representation, enabling principled Lagrangian dual-ascent updates without manual penalty tuning. Experiments in a simulated hospital coordination environment (MARLHospital) demonstrate the effectiveness of AdaFair-MARL compared to reward-shaping and fixed-penalty fairness methods, improving workload balance while maintaining team performance. We found that AdaFair-MARL achieves nearly perfect constraint satisfaction (0.99-1.00) while significantly improving workload fairness compared to fixed-penalty baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AdaFair-MARL turns JFI fairness into an SOC constraint with adaptive primal-dual updates for cooperative MARL, delivering strong empirical satisfaction in one domain but thin support for stability under approximation.

The main point is that this paper replaces heuristic reward penalties with an explicit second-order cone constraint derived from Jain's Fairness Index, then enforces it through adaptive Lagrange multiplier updates in a cooperative Markov game setting. That formulation is the actual novelty over the reward-shaping baselines they cite. In the MARLHospital experiments the method reaches 0.99-1.00 constraint satisfaction while improving workload balance and preserving team performance, which is a practical step forward for applications that need guaranteed balance rather than tuned penalties. The adaptive multiplier avoids some of the incentive conflicts that fixed penalties create in heterogeneous teams. This part of the work is straightforward and useful on its own terms.

The soft spot is the missing analysis around the dual ascent. Workloads are expectations over trajectories, and the JFI constraint is a nonlinear function of those workloads, so the constraint is nonlinear in the occupancy measure. With function-approximated critics and joint policy gradients, it is not obvious that the multiplier updates remain stable or that the feasible set stays well-behaved under non-stationary dynamics. The abstract reports the empirical numbers but gives no derivation steps, convergence bounds, or oscillation analysis for the multiplier. Experiments are confined to a single simulated environment, with no statistical tests or exhaustive baseline tables shown.

Readers working on constrained MARL or fairness in cooperative resource allocation will find the formulation worth examining. The idea is distinct enough from prior penalty-based methods that it deserves a serious referee, mainly to check the optimization details and ask for broader validation. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces AdaFair-MARL, a constrained cooperative multi-agent reinforcement learning framework that formulates workload fairness as an explicit constraint derived from Jain's Fairness Index (JFI) geometry. It shows that the resulting feasible set for the workload vector admits a second-order cone representation, enabling a primal-dual update with adaptive Lagrange multipliers to enforce fairness without manual penalty tuning. Experiments in the MARLHospital environment report near-perfect constraint satisfaction (0.99-1.00) and improved workload balance relative to reward-shaping and fixed-penalty baselines while maintaining team performance.

Significance. If the SOC representation of the JFI constraint remains valid for expected workloads under joint policies and the adaptive dual updates prove stable, the framework would provide a principled, tuning-free method for fairness enforcement in heterogeneous cooperative MARL. This could strengthen reliability in domains such as hospital coordination. The empirical results are promising but rest on limited detail regarding metrics and baselines, so the overall significance is moderate pending further analysis.

major comments (2)

[Derivation of fairness constraint] Abstract and derivation section: the feasible set {w | (1^Tw)^2 >= alpha n ||w||_2^2} is SOC-representable for a fixed workload vector w, but workloads in the cooperative Markov game are expectations w(pi) = E_{tau ~ pi}[workload vector] that are linear in the occupancy measure. The manuscript does not provide the explicit steps showing that the constraint remains SOC-representable after this expectation or how the Lagrangian dual is formulated for the resulting non-linear function of the occupancy.
[Algorithm and analysis] No section establishes convergence or bounded oscillation of the dual-ascent updates on the SOC multiplier when function approximation, joint-policy gradients, and heterogeneous non-stationary dynamics are present. The abstract's empirical claim of 0.99-1.00 satisfaction therefore lacks supporting analysis that the method avoids instability in approximate MARL.

minor comments (2)

[Experiments] Experiments section: provide the precise definition of the workload vector, the exact JFI threshold alpha used, and full baseline hyperparameter details to allow reproduction and fair comparison.
[Experiments] Clarify whether the reported constraint satisfaction is measured on the JFI itself or on the SOC constraint violation, and report variance or statistical significance across independent runs.
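The reporting requested in the minor comments could look roughly like the sketch below, which computes a per-seed satisfaction rate both on the JFI itself and on the cone-form slack, then summarises across seeds. The array layout `(seeds, episodes, agents)`, the threshold, and the randomly generated workloads are all assumptions for illustration; none of the numbers correspond to results in the paper.

```python
import numpy as np

def satisfaction_by_seed(workloads, alpha):
    """Per-seed fraction of episodes that meet the fairness constraint.

    workloads: array of shape (seeds, episodes, agents) with positive totals.
    Returns (rate_on_jfi, rate_on_soc), each of shape (seeds,).
    """
    w = np.asarray(workloads, dtype=float)
    n = w.shape[-1]
    total = w.sum(axis=-1)
    norm = np.linalg.norm(w, axis=-1)
    jfi = total ** 2 / (n * norm ** 2)              # measured on the index
    soc_slack = total - np.sqrt(alpha * n) * norm   # measured on the cone form
    return (jfi >= alpha).mean(axis=1), (soc_slack >= 0.0).mean(axis=1)

# Synthetic placeholder data: 5 seeds, 200 episodes, 4 agents.
rng = np.random.default_rng(0)
fake_workloads = rng.uniform(1.0, 5.0, size=(5, 200, 4))

rate_jfi, rate_soc = satisfaction_by_seed(fake_workloads, alpha=0.9)
print(f"JFI-based: {rate_jfi.mean():.3f} +/- {rate_jfi.std(ddof=1):.3f}")
print(f"SOC-based: {rate_soc.mean():.3f} +/- {rate_soc.std(ddof=1):.3f}")
```

For nonnegative workloads the two rates should agree up to floating-point effects at the boundary, so stating which one is reported mainly matters when a tolerance is applied to one form but not the other.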

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, providing clarifications grounded in the paper's framework and indicating planned revisions to improve rigor and clarity.

Referee: [Derivation of fairness constraint] Abstract and derivation section: the feasible set {w | (1^Tw)^2 >= alpha n ||w||_2^2} is SOC-representable for a fixed workload vector w, but workloads in the cooperative Markov game are expectations w(pi) = E_{tau ~ pi}[workload vector] that are linear in the occupancy measure. The manuscript does not provide the explicit steps showing that the constraint remains SOC-representable after this expectation or how the Lagrangian dual is formulated for the resulting non-linear function of the occupancy.

Authors: We thank the referee for this precise observation. In the manuscript, the workload vector is explicitly the expected value w(π) = E_{τ ~ π}[workload vector] under the joint policy, which is linear in the occupancy measure μ via w = Aμ for a suitable matrix A derived from the state-action visitation. The fairness constraint (1^Tw)^2 ≥ α n ||w||_2^2 is a convex second-order cone constraint on w for nonnegative workloads, where 1^Tw ≥ 0. Because the mapping μ ↦ w is affine, substituting yields an equivalent convex constraint on μ that remains SOC-representable: the inequality can be rewritten using auxiliary variables to express ||w||_2 ≤ (1^Tw)/√(α n) as a standard SOC constraint ||z||_2 ≤ t with t and z affine in μ. We will add an explicit subsection 'SOC Representation of Expected Workload Constraints' that walks through these substitution steps and shows preservation under affine maps. For the Lagrangian, we associate a single dual variable λ ≥ 0 with the (convex) constraint violation; the Lagrangian is then L(π, λ) = J(π) − λ · g(w(π)), where J is the team reward objective to be maximized and g is the SOC violation function (positive when the constraint is violated), and the dual update is the standard projected ascent λ ← [λ + η g(w(π))]^+. This formulation is already implicit in our primal-dual algorithm but will be written out explicitly in the revision.
revision: yes
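The projected dual-ascent update described in this response can be illustrated on a deliberately tiny problem. The sketch below is not the paper's algorithm: it replaces the joint policy with a single scalar `p` (the share of tasks given to the more skilled of two agents), uses a hypothetical linear reward, and takes exact gradients instead of policy-gradient estimates. It also fixes one sign convention as an assumption: the Lagrangian is reward minus `lam` times the violation, so a positive violation lowers the objective.

```python
import numpy as np

ALPHA = 0.9                    # target JFI level (hypothetical)
SKILL = np.array([1.0, 0.6])   # reward per unit of work for each agent (hypothetical)

def workloads(p):
    """Workload vector when agent 0 takes share p and agent 1 takes 1 - p."""
    return np.array([p, 1.0 - p])

def violation(p):
    """g(w) = sqrt(alpha * n) * ||w||_2 - 1^T w; positive means unfair."""
    w = workloads(p)
    return float(np.sqrt(ALPHA * w.size) * np.linalg.norm(w) - w.sum())

def violation_grad(p):
    """dg/dp for the two-agent workload parameterisation."""
    w = workloads(p)
    return float(np.sqrt(ALPHA * w.size) * (w[0] - w[1]) / np.linalg.norm(w))

def primal_dual(steps=5000, lr=0.05):
    p, lam = 0.5, 0.0
    lam_trace = []
    for _ in range(steps):
        # Primal ascent on L(p, lam) = reward(p) - lam * violation(p).
        grad_p = (SKILL[0] - SKILL[1]) - lam * violation_grad(p)
        p = float(np.clip(p + lr * grad_p, 0.0, 1.0))
        # Projected dual ascent: lam <- [lam + lr * g]^+.
        lam = max(0.0, lam + lr * violation(p))
        lam_trace.append(lam)
    return p, lam, lam_trace

p_final, lam_final, lam_trace = primal_dual()
print(f"share={p_final:.3f}  lambda={lam_final:.3f}  violation={violation(p_final):.2e}")
```

In this toy problem the unconstrained optimum is `p = 1`, which is maximally unfair, and the constraint binds at `p = 2/3`, where the JFI equals 0.9. The multiplier starts at zero and grows only while the constraint is violated, which is the behaviour a fixed penalty weight cannot reproduce without tuning.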
Referee: [Algorithm and analysis] No section establishes convergence or bounded oscillation of the dual-ascent updates on the SOC multiplier when function approximation, joint-policy gradients, and heterogeneous non-stationary dynamics are present. The abstract's empirical claim of 0.99-1.00 satisfaction therefore lacks supporting analysis that the method avoids instability in approximate MARL.

Authors: We agree that a formal convergence or oscillation bound for the adaptive dual updates under function approximation, joint-policy gradients, and non-stationary heterogeneous dynamics is not provided and would be difficult to establish without strong additional assumptions. Such guarantees remain an open theoretical question for constrained MARL in general. Our current support for the 0.99-1.00 satisfaction claim rests on empirical evidence from the MARLHospital environment across multiple random seeds, where the adaptive multiplier updates produced stable training trajectories and consistent constraint satisfaction without visible divergence. In the revision we will add a dedicated subsection 'Empirical Stability of Adaptive Dual Updates' containing (i) time-series plots of the Lagrange multiplier λ during training and (ii) a brief discussion of how the adaptive step-size rule and projection onto λ ≥ 0 help limit oscillations in practice. We believe this strengthens the practical justification while correctly positioning full theoretical analysis as future work.
revision: partial
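Alongside time-series plots, a few scalar summaries of the multiplier trace can make "no visible divergence" easier to check. The sketch below is one hypothetical diagnostic, not something described in the paper: it looks at the last fraction of a recorded λ trace and reports its mean, its range, and how often the update direction flips. The two traces used to exercise it are synthetic.

```python
import numpy as np

def multiplier_diagnostics(trace, tail_frac=0.2):
    """Summarise the tail of a Lagrange-multiplier trace.

    tail_range near zero suggests the multiplier has settled;
    many reversals with a large range suggests sustained oscillation.
    """
    lam = np.asarray(trace, dtype=float)
    tail = lam[int(len(lam) * (1.0 - tail_frac)):]
    steps = np.diff(tail)
    signs = np.sign(steps[steps != 0])
    reversals = int(np.sum(signs[1:] != signs[:-1]))
    return {
        "tail_mean": float(tail.mean()),
        "tail_range": float(tail.max() - tail.min()),
        "reversals": reversals,
    }

t = np.arange(1000)
settled = 0.7 * (1.0 - np.exp(-t / 50.0))    # synthetic: converges smoothly
oscillating = 0.7 + 0.3 * np.sin(t / 10.0)   # synthetic: keeps swinging

print(multiplier_diagnostics(settled))
print(multiplier_diagnostics(oscillating))
```

Reporting such summaries per seed would complement the plots, since a plot of a single run can hide seed-to-seed differences in how the multiplier behaves.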

Circularity Check

0 steps flagged

No circularity in JFI-to-SOC derivation or primal-dual updates

The paper derives the workload fairness constraint directly from the standard Jain's Fairness Index formula JFI(w) = (1^Tw)^2 / (n ||w||_2^2) by imposing JFI(w) >= alpha, which algebraically rearranges to the quadratic inequality (1^Tw)^2 >= alpha n ||w||_2^2. This inequality defines a second-order cone representable set for w >= 0 and does not rely on any self-citation, fitted parameters, or prior author results. The subsequent claim that this admits an SOC representation enabling Lagrangian dual-ascent is a direct consequence of convex optimization facts, not a reduction to the paper's own inputs. No steps in the provided derivation chain (Markov game grounding, constraint formulation, or primal-dual algorithm) collapse by construction; the empirical constraint satisfaction numbers are reported outcomes rather than tautological predictions. The analysis is therefore self-contained.
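The rearrangement described above can be written out step by step. This is a sketch of the standard algebra, assuming a nonnegative, not all-zero workload vector w in R^n; the remark on the admissible range of alpha follows from elementary inequalities and is not taken from the paper.

```latex
\begin{align*}
\mathrm{JFI}(w) = \frac{(\mathbf{1}^\top w)^2}{n\,\|w\|_2^2} \ge \alpha
  &\iff (\mathbf{1}^\top w)^2 \ge \alpha\, n\, \|w\|_2^2 \\
  &\iff \sqrt{\alpha n}\,\|w\|_2 \le \mathbf{1}^\top w
  \qquad (\text{since } \mathbf{1}^\top w \ge 0 \text{ when } w \ge 0).
\end{align*}
% The last line has the standard second-order cone form ||z||_2 <= t
% with z = sqrt(alpha n) w and t = 1^T w, both linear in w.
%
% Range of alpha: by Cauchy-Schwarz, (1^T w)^2 <= n ||w||_2^2, so JFI(w) <= 1;
% for w >= 0, (1^T w)^2 >= ||w||_2^2, so JFI(w) >= 1/n.
% Hence alpha <= 1/n is vacuous on the nonnegative orthant,
% and alpha = 1 forces all workloads to be equal.
```

The nonnegativity condition matters: without it, the quadratic inequality describes two opposite cones, and only the one with a nonnegative total workload is convex.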

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework relies on the mathematical property that the JFI fairness set can be represented as a second-order cone; no explicit free parameters are named in the abstract, though the adaptive multipliers are learned during training.

axioms (1)

domain assumption: The feasible set defined by the Jain's Fairness Index constraint admits a second-order cone representation.
Invoked when deriving the Lagrangian dual-ascent updates in the cooperative Markov game formulation.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "derive the fairness constraint from Jain's Fairness Index (JFI) geometry and show that the resulting feasible set admits a second-order cone representation, enabling principled Lagrangian dual-ascent updates"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 8 internal anchors

[1]

Jung, and Hee Rin Lee

Angelique Taylor, Tauhid Tanjim, Michael Joseph Sack, Maia Hirsch, Kexin Cheng, Kevin Ching, Jonathan St George, Thijs Roumen, Malte F. Jung, and Hee Rin Lee. Rapidly Built Medical Crash Cart! Lessons Learned and Impacts on High-Stakes Team Collaboration in the Emergency Room, February 2025. URL http://arxiv. org/abs/2502.18688. arXiv:2502.18688 [cs] version: 1

work page arXiv 2025
[2]

Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams, June 2025

Tauhid Tanjim, Jonathan St George, Kevin Ching, and Angelique Taylor. Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams, June 2025. URL http://arxiv.org/abs/2506.08892. arXiv:2506.08892 [cs]

work page arXiv 2025
[3]

Human-Robot Teaming Field Deployments: A Comparison Between Verbal and Non-verbal Communication, June 2025

Tauhid Tanjim, Promise Ekpo, Huajie Cao, Jonathan St George, Kevin Ching, Hee Rin Lee, and Angelique Taylor. Human-Robot Teaming Field Deployments: A Comparison Between Verbal and Non-verbal Communication, June 2025. URLhttp://arxiv.org/abs/2506.08890. arXiv:2506.08890 [cs]

work page arXiv 2025
[4]

Multi-Agent Reinforcement Learning-Based Fairness-Aware Scheduling for Bursty Traffic

Mingqi Yuan, Qi Cao, Man-On Pun, and Yi Chen. Multi-Agent Reinforcement Learning-Based Fairness-Aware Scheduling for Bursty Traffic. In2021 IEEE Global Communications Conference (GLOBECOM), pages 1–6, December 2021. doi: 10.1109/GLOBECOM46510.2021.9685661. URL https://ieeexplore.ieee.org/ document/9685661/

work page doi:10.1109/globecom46510.2021.9685661 2021
[5]

Parametrized variational inequality approaches to generalized nash equilibrium problems with shared constraints.Computational optimization and applications, 48(3):423–452, 2011

Koichi Nabetani, Paul Tseng, and Masao Fukushima. Parametrized variational inequality approaches to generalized nash equilibrium problems with shared constraints.Computational optimization and applications, 48(3):423–452, 2011

work page 2011
[6]

Learning Fairness in Multi-Agent Systems, October 2019

Jiechuan Jiang and Zongqing Lu. Learning Fairness in Multi-Agent Systems, October 2019. URL http: //arxiv.org/abs/1910.14472. arXiv:1910.14472 [cs]

work page arXiv 2019
[7]

Cooperation and Fairness in Multi-Agent Reinforcement Learning, October 2024

Jasmine Jerry Aloor, Siddharth Nayak, Sydney Dolan, and Hamsa Balakrishnan. Cooperation and Fairness in Multi-Agent Reinforcement Learning, October 2024. URLhttp://arxiv.org/abs/2410.14916

work page arXiv 2024
[8]

Inequity aversion improves cooperation in intertemporal social dilemmas

Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, and Thore Graepel. Inequity aversion improves cooperation in intertemporal social dilemmas, September 2018. URL http://arxiv.org/ abs/1803.08884. arXiv:1803.08884 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2018
[9]

Grupen, Bart Selman, and Daniel D

Niko A. Grupen, Bart Selman, and Daniel D. Lee. Cooperative Multi-Agent Fairness and Equivariant Policies. Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):9350–9359, June 2022. ISSN 2374-3468, 2159-5399. doi: 10.1609/aaai.v36i9.21166. URL https://ojs.aaai.org/index.php/AAAI/article/ view/21166

work page doi:10.1609/aaai.v36i9.21166 2022
[10]

Fairness in Traffic Control: Decentralized Multi-agent Reinforce- ment Learning with Generalized Gini Welfare Functions

Umer Siddique, Peilang Li, and Yongcan Cao. Fairness in Traffic Control: Decentralized Multi-agent Reinforce- ment Learning with Generalized Gini Welfare Functions

work page
[11]

Zhang, Michael Luck, and Elizabeth Black

Gabriele La Malfa, Jie M. Zhang, Michael Luck, and Elizabeth Black. Fairness Aware Reinforcement Learning via Proximal Policy Optimization, September 2025. URL http://arxiv.org/abs/2502.03953. arXiv:2502.03953 [cs]

work page arXiv 2025
[12]

Fairness and Welfare Quantification for Regret in Multi-Armed Bandits, May 2022

Siddharth Barman, Arindam Khan, Arnab Maiti, and Ayush Sawarni. Fairness and Welfare Quantification for Regret in Multi-Armed Bandits, May 2022. URL http://arxiv.org/abs/2205.13930. arXiv:2205.13930 [cs]. 8

work page arXiv 2022
[13]

Fairness-aware multi-agent reinforcement learning and visual perception for adaptive traffic signal control.Optoelectronics Letters, 20(12):764–768, December 2024

Wanqing Fang, Xintian Zhao, and Chengwei Zhang. Fairness-aware multi-agent reinforcement learning and visual perception for adaptive traffic signal control.Optoelectronics Letters, 20(12):764–768, December 2024. ISSN 1993-5013. doi: 10.1007/s11801-024-3267-2. URLhttps://doi.org/10.1007/s11801-024-3267-2

work page doi:10.1007/s11801-024-3267-2 2024
[14]

Towards Fair and Efficient Policy Learning in Cooperative Multi-Agent Reinforcement Learning

Umer Siddique. Towards Fair and Efficient Policy Learning in Cooperative Multi-Agent Reinforcement Learning. 2025

work page 2025
[15]

Constrained Policy Optimization

Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. Constrained Policy Optimization, May 2017. URL http://arxiv.org/abs/1705.10528. arXiv:1705.10528 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2017
[16]

Yiming Zhang, Quan Vuong, and Keith W. Ross. First Order Constrained Optimization in Policy Space, October

work page
[17]

arXiv:2002.06506 [cs]

URLhttp://arxiv.org/abs/2002.06506. arXiv:2002.06506 [cs]

work page arXiv 2002
[18]

Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium, November 2024

Zeyang Li and Navid Azizan. Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium, November 2024. URLhttp://arxiv.org/abs/2411.15036. arXiv:2411.15036 [cs]

work page arXiv 2024
[19]

Qianyue Hao, Fengli Xu, Lin Chen, Pan Hui, and Yong Li. Hierarchical Multi-agent Model for Reinforced Medical Resource Allocation with Imperfect Information.ACM Transactions on Intelligent Systems and Technology, 14 (1):1–27, February 2023. ISSN 2157-6904, 2157-6912. doi: 10.1145/3552436. URL https://dl.acm.org/ doi/10.1145/3552436

work page doi:10.1145/3552436 2023
[20]

Omar Al-Kadri

Muhammad Shadi Hajar, Harsha Kalutarage, and M. Omar Al-Kadri. RRP: A Reliable Reinforcement Learning Based Routing Protocol for Wireless Medical Sensor Networks. In2023 IEEE 20th Consumer Communications & Networking Conference (CCNC), pages 781–789, January 2023. doi: 10.1109/CCNC51644.2023.10060225. URLhttps://ieeexplore.ieee.org/document/10060225. ISSN...

work page doi:10.1109/ccnc51644.2023.10060225 2023
[21]

In: 2021 IEEE/RSJ In- ternational Conference on Intelligent Robots and Systems (IROS)

Paul Maria Scheikl, Balázs Gyenes, Tornike Davitashvili, Rayan Younis, André Schulze, Beat P. Müller-Stich, Gerhard Neumann, Martin Wagner, and Franziska Mathis-Ullrich. Cooperative Assistance in Robotic Surgery through Multi-Agent Reinforcement Learning. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1859–1864, S...

work page doi:10.1109/iros51168.2021.9636193 2021
[22]

Esha Saha and Pradeep Rathore. A smart inventory management system with medication demand dependencies in a hospital supply chain: A multi-agent reinforcement learning approach.Computers & Industrial Engineering, 191: 110165, May 2024. ISSN 0360-8352. doi: 10.1016/j.cie.2024.110165. URL https://www.sciencedirect. com/science/article/pii/S0360835224002869

work page doi:10.1016/j.cie.2024.110165 2024
[23]

A Multi-Agent Deep Reinforcement Learning Approach for Enhancement of COVID-19 CT Image Segmentation.Journal of Personalized Medicine, 12(2):309, February 2022

Hanane Allioui, Mazin Abed Mohammed, Narjes Benameur, Belal Al-Khateeb, Karrar Hameed Abdulkareem, Begonya Garcia-Zapirain, Robertas Damaševiˇcius, and Rytis Maskeli¯unas. A Multi-Agent Deep Reinforcement Learning Approach for Enhancement of COVID-19 CT Image Segmentation.Journal of Personalized Medicine, 12(2):309, February 2022. ISSN 2075-4426. doi: 10....

work page doi:10.3390/jpm12020309 2022
[24]

Online distributed algorithms for seeking generalized Nash equilibria in dynamic environments, April 2020

Kaihong Lu, Guangqi Li, and Long Wang. Online distributed algorithms for seeking generalized Nash equilibria in dynamic environments, April 2020. URL http://arxiv.org/abs/2004.00525. arXiv:2004.00525 [math]

work page arXiv 2020
[25]

Distributed Generalized Nash Equilibria Seeking for Aggregative Games with Community Structures

Rui Huang, Yixin Gu, Yuan Fan, and Songsong Cheng. Distributed Generalized Nash Equilibria Seeking for Aggregative Games with Community Structures. In2023 42nd Chinese Control Conference (CCC), pages 8137–8142, July 2023. doi: 10.23919/CCC58697.2023.10240724. URL https://ieeexplore.ieee.org/ document/10240724/. ISSN: 1934-1768

work page doi:10.23919/ccc58697.2023.10240724 2023
[26]

The Confluence of Networks, Games and Learning, August

Tao Li, Guanze Peng, Quanyan Zhu, and Tamer Basar. The Confluence of Networks, Games and Learning, August

work page
[27]

arXiv:2105.08158 [cs]

URLhttp://arxiv.org/abs/2105.08158. arXiv:2105.08158 [cs]

work page arXiv
[28]

Learning Generalized Nash Equilibria in a Class of Convex Games

Tatiana Tatarenko and Maryam Kamgarpour. Learning Generalized Nash Equilibria in a Class of Convex Games, October 2018. URLhttp://arxiv.org/abs/1703.04113. arXiv:1703.04113 [math]

work page internal anchor Pith review Pith/arXiv arXiv 2018
[29]

Zhaolong Ning, Peiran Dong, Xiaojie Wang, Xiping Hu, Lei Guo, Bin Hu, Yi Guo, Tie Qiu, and Ricky Y . K. Kwok. Mobile Edge Computing Enabled 5G Health Monitoring for Internet of Medical Things: A Decentralized Game Theoretic Approach.IEEE Journal on Selected Areas in Communications, 39(2):463–478, February 2021. ISSN 1558-0008. doi: 10.1109/JSAC.2020.30206...

work page doi:10.1109/jsac.2020.3020645 2021
[30]

Yuxuan Yang, Xiaojie Wang, Zhaolong Ning, Joel J. P. C. Rodrigues, Xin Jiang, and Yi Guo. Edge Learning for Internet of Medical Things and Its COVID-19 Applications: A Distributed 3C Framework.IEEE Internet of Things Magazine, 4(3):18–23, September 2021. ISSN 2576-3199. doi: 10.1109/IOTM.0100.2000154. URL https://ieeexplore.ieee.org/document/9548976

work page doi:10.1109/iotm.0100.2000154 2021
[31]

Learning and Game Based Spectrum Allocation Model for Internet of Medical Things (IoMT) Platform.IEEE Access, PP:1–1, January 2023

Sungwook Kim. Learning and Game Based Spectrum Allocation Model for Internet of Medical Things (IoMT) Platform.IEEE Access, PP:1–1, January 2023. doi: 10.1109/ACCESS.2023.3266331. 9

work page doi:10.1109/access.2023.3266331 2023
[32]

R. Jain, D. Chiu, and W. Hawe. A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems, September 1998. URL http://arxiv.org/abs/cs/9809099. arXiv:cs/9809099

work page internal anchor Pith review Pith/arXiv arXiv 1998
[33]

Equinox: Holistic Fair Scheduling in Serving Large Language Models, August 2025

Zhixiang Wei, James Yen, Jingyi Chen, Ziyang Zhang, Zhibai Huang, Chen Chen, Xingzi Yu, Yicheng Gu, Chenggang Wu, Yun Wang, Mingyuan Xia, Jie Wu, Hao Wang, and Zhengwei Qi. Equinox: Holistic Fair Scheduling in Serving Large Language Models, August 2025. URL http://arxiv.org/abs/2508.16646. arXiv:2508.16646 [cs]

work page arXiv 2025
[34]

Equity-Aware Spatial-Temporal Workload Shifting for Sustainable AI Data Centers

Mohammad Jaminur Islam and Shaolei Ren. Equity-Aware Spatial-Temporal Workload Shifting for Sustainable AI Data Centers

work page
[35]

Fairness-Aware Multi-Agent Learning-based Task Offloading in Dynamic Vehicular Scenarios

Ziqi Zhou, Agon Memedi, Chunghan Lee, Seyhan Ucar, Onur Altintas, and Falko Dressler. Fairness-Aware Multi-Agent Learning-based Task Offloading in Dynamic Vehicular Scenarios

work page
[36]

Routledge, Boca Raton, 1 edition, December 2021

Eitan Altman.Constrained Markov Decision Processes: Stochastic Modeling. Routledge, Boca Raton, 1 edition, December 2021. ISBN 978-1-315-14022-3. doi: 10.1201/9781315140223. URL https://www.taylorfrancis. com/books/9781315140223

work page doi:10.1201/9781315140223 2021
[37]

Molu, and Angelique Taylor

Promise Osaine Ekpo, Brian La, Thomas Wiener, Saesha Agarwal, Arshia Agrawal, Gonzalo Gonzalez-Pumariega, Lekan P. Molu, and Angelique Taylor. Skill-Aligned Fairness in Multi-Agent Learning for Collaboration in Healthcare, August 2025. URLhttp://arxiv.org/abs/2508.18708. arXiv:2508.18708 [cs]

work page arXiv 2025
[38]

American Red Cross Code Cards, January 2025

American Red Cross. American Red Cross Code Cards, January 2025. URL https://www. redcrosslearningcenter.org/s/american-red-cross-code-cards

work page 2025
[39]

Benchmarking multi-agent deep reinforcement learning algorithms in cooperative tasks,

Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, and Stefano V . Albrecht. Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks, November 2021. URL http://arxiv.org/ abs/2006.07869

work page arXiv 2021
[40]

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, June

Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, June

work page
[41]

URLhttp://arxiv.org/abs/1803.11485

work page internal anchor Pith review Pith/arXiv arXiv
[42]

Safe multi-agent reinforcement learning for multi-robot control.Artificial Intelligence, 319:103905, June 2023

Shangding Gu, Jakub Grudzien Kuba, Yuanpei Chen, Yali Du, Long Yang, Alois Knoll, and Yaodong Yang. Safe multi-agent reinforcement learning for multi-robot control.Artificial Intelligence, 319:103905, June 2023. ISSN 00043702. doi: 10.1016/j.artint.2023.103905. URL https://linkinghub.elsevier.com/retrieve/pii/ S0004370223000516

work page doi:10.1016/j.artint.2023.103905 2023
[43]

Competitive Statistical Estimation with Strategic Data Sources

Tyler Westenbroek, Roy Dong, Lillian J. Ratliff, and S. Shankar Sastry. Competitive Statistical Estimation with Strategic Data Sources, April 2019. URLhttp://arxiv.org/abs/1904.12768. arXiv:1904.12768 [cs]

[44]

Chen Tessler, Daniel J. Mankowitz, and Shie Mannor. Reward Constrained Policy Optimization, December 2018. URL http://arxiv.org/abs/1805.11074. arXiv:1805.11074 [cs]

[45]

Ankita Kushwaha, Kiran Ravish, Preeti Lamba, and Pawan Kumar. A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety, May 2025. URL http://arxiv.org/abs/2505.17342. arXiv:2505.17342 [cs]

[46]

V. S. Borkar and S. P. Meyn. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning. SIAM Journal on Control and Optimization, 38(2):447–469, January 2000. ISSN 0363-0129. doi: 10.1137/S0363012997331639. URL https://epubs.siam.org/doi/10.1137/S0363012997331639. Publisher: Society for Industrial and Applied Mathematics

[47]

Alex Ray, Joshua Achiam, and Dario Amodei. Benchmarking Safe Exploration in Deep Reinforcement Learning.

A Additional clarification on GNE and constrained optimization

Motivation for the GNE perspective. Although this is an identical-payoff game where strategic conflict is absent, the GNE framework captures a critical structural property: coupled feasibi...
