
arXiv: 2511.14135 · v2 · submitted 2025-11-18 · cs.LG · cs.AI · cs.GT · cs.MA

AdaFair-MARL: Enforcing Adaptive Fairness Constraints in Multi-Agent Reinforcement Learning

Pith reviewed 2026-05-17 20:13 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.GT · cs.MA
keywords: multi-agent reinforcement learning, fairness constraints, primal-dual optimization, Jain's Fairness Index, second-order cone, workload fairness, constrained optimization

The pith

AdaFair-MARL enforces workload fairness as an explicit constraint in cooperative multi-agent reinforcement learning using adaptive primal-dual updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AdaFair-MARL to address challenges in fair workload enforcement in heterogeneous multi-agent systems pursuing shared objectives. It formulates fairness using Jain's Fairness Index and represents the constraint as a second-order cone to enable stable Lagrangian dual-ascent updates. This approach avoids the inefficiencies and instabilities of fixed fairness penalties or reward-shaping methods by guaranteeing a desired fairness level while optimizing team performance. Experiments in a simulated hospital coordination environment show nearly perfect constraint satisfaction and improved workload balance.

Core claim

We present AdaFair-MARL, a constrained cooperative MARL framework whose core algorithmic component is a primal-dual update that enforces workload fairness via adaptive Lagrange multiplier updates. Grounding the framework in a cooperative Markov game, we derive the fairness constraint from Jain's Fairness Index (JFI) geometry and show that the resulting feasible set admits a second-order cone representation, enabling principled Lagrangian dual-ascent updates without manual penalty tuning.
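As a rough illustration of the two quantities this claim relies on, the sketch below computes Jain's Fairness Index for a workload vector and the residual of the corresponding cone constraint. The function names, the threshold `alpha = 0.9`, and the example workloads are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def jain_fairness_index(w):
    """JFI(w) = (sum w)^2 / (n * sum w^2).

    Equals 1 for equal workloads and 1/n when a single agent does all the work.
    Assumes w is nonnegative and not all zeros.
    """
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / (w.size * np.dot(w, w)))

def soc_residual(w, alpha):
    """sqrt(alpha * n) * ||w||_2 - 1^T w; a value <= 0 means w lies inside the cone."""
    w = np.asarray(w, dtype=float)
    return float(np.sqrt(alpha * w.size) * np.linalg.norm(w) - w.sum())

alpha = 0.9                      # hypothetical fairness threshold
balanced = [5.0, 5.0, 5.0, 5.0]  # hypothetical per-agent workloads
skewed = [9.0, 1.0, 1.0, 1.0]

for w in (balanced, skewed):
    print(jain_fairness_index(w), soc_residual(w, alpha) <= 0)
```

For nonnegative, nonzero workloads the two tests agree: `jain_fairness_index(w) >= alpha` holds exactly when `soc_residual(w, alpha) <= 0`.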

What carries the argument

Primal-dual update with adaptive Lagrange multipliers on a Jain's Fairness Index constraint cast as a second-order cone.

If this is right

  • AdaFair-MARL achieves nearly perfect constraint satisfaction between 0.99 and 1.00.
  • It significantly improves workload fairness compared to fixed-penalty baselines.
  • It maintains team performance while enforcing fairness in heterogeneous agent settings.
  • The method eliminates the need for manual penalty tuning and post-hoc evaluation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The second-order cone representation may allow extension to other fairness indices if they can be similarly reformulated.
  • In practical deployments like hospital coordination, this could lead to more sustainable agent workloads over long-term operations.
  • Testing the framework in non-cooperative or partially observable environments could reveal additional stability properties.

Load-bearing premise

The fairness constraint derived from Jain's Fairness Index geometry admits a second-order cone representation that enables stable Lagrangian dual-ascent updates without degrading team performance or introducing instabilities.

What would settle it

Observing that the dual-ascent procedure diverges or that team reward drops substantially below baseline levels when applying the second-order cone fairness constraint in a heterogeneous multi-agent environment.

Figures

Figures reproduced from arXiv: 2511.14135 by Angelique Taylor, Felix Grimm, Lekan Molu, Promise Ekpo, Saesha Agarwal.

Figure 1: The MARLHospital Environment. The environment integrates a PDDL planner with a MARL state layer to model skill-aligned fairness and shared-task coordination among healthcare workers. The goal is to pick the backboard from the crash cart, move to the patient, place it under the patient, compress patient chest multiple times, retrieve the Bag-Valve-Mask (BVM) from the crash cart, give patient rescue breaths …
Figure 2: Task flow diagrams for the check compression (CPR) and rescue breath tasks in MARLHospital. The …
Original abstract

Fair workload enforcement in heterogeneous multi-agent systems that pursue shared objectives remains challenging. Fixed fairness penalties often introduce inefficiencies, training instability, and conflicting agent incentives. Reward-shaping approaches in fair Multi-Agent Reinforcement Learning (MARL) typically incorporate fairness through heuristic penalties or scalar reward modifications and often rely on post-hoc evaluation. However, these methods do not guarantee that a desired fairness level will be satisfied. To address this limitation, we propose the Adaptive Fairness Multi-Agent Reinforcement Learning (AdaFair-MARL) framework, which formulates workload fairness as an explicit constraint so that agents maintain balanced contributions while optimizing team performance. We present AdaFair-MARL, a constrained cooperative MARL framework whose core algorithmic component is a primal-dual update that enforces workload fairness via adaptive Lagrange multiplier updates. Grounding the framework in a cooperative Markov game, we derive the fairness constraint from Jain's Fairness Index (JFI) geometry and show that the resulting feasible set admits a second-order cone representation, enabling principled Lagrangian dual-ascent updates without manual penalty tuning. Experiments in a simulated hospital coordination environment (MARLHospital) demonstrate the effectiveness of AdaFair-MARL compared to reward-shaping and fixed-penalty fairness methods, improving workload balance while maintaining team performance. We found that AdaFair-MARL achieves nearly perfect constraint satisfaction (0.99-1.00) while significantly improving workload fairness compared to fixed-penalty baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces AdaFair-MARL, a constrained cooperative multi-agent reinforcement learning framework that formulates workload fairness as an explicit constraint derived from Jain's Fairness Index (JFI) geometry. It shows that the resulting feasible set for the workload vector admits a second-order cone representation, enabling a primal-dual update with adaptive Lagrange multipliers to enforce fairness without manual penalty tuning. Experiments in the MARLHospital environment report near-perfect constraint satisfaction (0.99-1.00) and improved workload balance relative to reward-shaping and fixed-penalty baselines while maintaining team performance.

Significance. If the SOC representation of the JFI constraint remains valid for expected workloads under joint policies and the adaptive dual updates prove stable, the framework would provide a principled, tuning-free method for fairness enforcement in heterogeneous cooperative MARL. This could strengthen reliability in domains such as hospital coordination. The empirical results are promising but rest on limited detail regarding metrics and baselines, so the overall significance is moderate pending further analysis.

major comments (2)
  1. [Derivation of fairness constraint] Abstract and derivation section: the feasible set {w | (1^Tw)^2 >= alpha n ||w||_2^2} is SOC-representable for a fixed workload vector w, but workloads in the cooperative Markov game are expectations w(pi) = E_{tau ~ pi}[workload vector] that are linear in the occupancy measure. The manuscript does not provide the explicit steps showing that the constraint remains SOC-representable after this expectation or how the Lagrangian dual is formulated for the resulting non-linear function of the occupancy.
  2. [Algorithm and analysis] No section establishes convergence or bounded oscillation of the dual-ascent updates on the SOC multiplier when function approximation, joint-policy gradients, and heterogeneous non-stationary dynamics are present. The abstract's empirical claim of 0.99-1.00 satisfaction therefore lacks supporting analysis that the method avoids instability in approximate MARL.
minor comments (2)
  1. [Experiments] Experiments section: provide the precise definition of the workload vector, the exact JFI threshold alpha used, and full baseline hyperparameter details to allow reproduction and fair comparison.
  2. [Experiments] Clarify whether the reported constraint satisfaction is measured on the JFI itself or on the SOC constraint violation, and report variance or statistical significance across independent runs.
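The distinction raised in the second minor comment can be made concrete with a small reporting sketch. It computes a satisfaction rate per seed in two ways (on the JFI itself and on the cone-constraint violation) and then reports the mean and sample standard deviation across seeds. The function names, the threshold, and the workload numbers are made up for illustration and are not results from the paper.

```python
import numpy as np

def jfi(w):
    """Jain's Fairness Index of a nonnegative, nonzero workload vector."""
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / (w.size * np.dot(w, w)))

def soc_violation(w, alpha):
    """Positive part of sqrt(alpha * n) * ||w||_2 - 1^T w (0 when the constraint holds)."""
    w = np.asarray(w, dtype=float)
    return max(0.0, float(np.sqrt(alpha * w.size) * np.linalg.norm(w) - w.sum()))

def satisfaction_report(workloads_by_seed, alpha):
    """workloads_by_seed: one array per seed, shaped (episodes, n_agents)."""
    jfi_rates, soc_rates = [], []
    for episodes in workloads_by_seed:
        jfi_rates.append(np.mean([jfi(w) >= alpha for w in episodes]))
        soc_rates.append(np.mean([soc_violation(w, alpha) == 0.0 for w in episodes]))
    return {
        "jfi_rate_mean": float(np.mean(jfi_rates)),
        "jfi_rate_std": float(np.std(jfi_rates, ddof=1)),  # sample std across seeds
        "soc_rate_mean": float(np.mean(soc_rates)),
        "soc_rate_std": float(np.std(soc_rates, ddof=1)),
    }

# Made-up workloads for two seeds, two episodes each (illustration only).
workloads_by_seed = [
    np.array([[5.0, 5.0, 5.0, 5.0], [9.0, 1.0, 1.0, 1.0]]),
    np.array([[5.0, 5.0, 5.0, 5.0], [4.0, 5.0, 6.0, 5.0]]),
]
report = satisfaction_report(workloads_by_seed, alpha=0.9)
print(report)
```

For nonnegative, nonzero workloads both criteria flag the same episodes, so the two rates coincide; what differs is the size of the violation each would report, which is why stating the measured quantity matters.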

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, providing clarifications grounded in the paper's framework and indicating planned revisions to improve rigor and clarity.

Point-by-point responses
  1. Referee: [Derivation of fairness constraint] Abstract and derivation section: the feasible set {w | (1^Tw)^2 >= alpha n ||w||_2^2} is SOC-representable for a fixed workload vector w, but workloads in the cooperative Markov game are expectations w(pi) = E_{tau ~ pi}[workload vector] that are linear in the occupancy measure. The manuscript does not provide the explicit steps showing that the constraint remains SOC-representable after this expectation or how the Lagrangian dual is formulated for the resulting non-linear function of the occupancy.

    Authors: We thank the referee for this precise observation. In the manuscript, the workload vector is explicitly the expected value w(π) = E_{τ ~ π}[workload vector] under the joint policy, which is linear in the occupancy measure μ via w = Aμ for a suitable matrix A derived from the state-action visitation. Restricted to nonnegative workloads (w ≥ 0), the fairness constraint (1^Tw)^2 ≥ α n ||w||_2^2 is a convex second-order cone constraint on w. Because the mapping μ ↦ w is affine, substituting yields an equivalent convex constraint on μ that remains SOC-representable: the inequality can be rewritten using auxiliary variables to express ||w||_2 ≤ (1^Tw)/√(α n) as a standard SOC constraint ||z||_2 ≤ t with t and z affine in μ. We will add an explicit subsection 'SOC Representation of Expected Workload Constraints' that walks through these substitution steps and shows preservation under affine maps. For the Lagrangian, we associate a single dual variable λ ≥ 0 with the (convex) constraint violation; the Lagrangian is then L(π, λ) = team reward objective + λ · g(w(π)), where g is the SOC violation function, and the dual update is the standard projected ascent λ ← [λ + η g(w(π))]^+. This formulation is already implicit in our primal-dual algorithm but will be written out explicitly in the revision.
    revision: yes

  2. Referee: [Algorithm and analysis] No section establishes convergence or bounded oscillation of the dual-ascent updates on the SOC multiplier when function approximation, joint-policy gradients, and heterogeneous non-stationary dynamics are present. The abstract's empirical claim of 0.99-1.00 satisfaction therefore lacks supporting analysis that the method avoids instability in approximate MARL.

    Authors: We agree that a formal convergence or oscillation bound for the adaptive dual updates under function approximation, joint-policy gradients, and non-stationary heterogeneous dynamics is not provided and would be difficult to establish without strong additional assumptions. Such guarantees remain an open theoretical question for constrained MARL in general. Our current support for the 0.99-1.00 satisfaction claim rests on empirical evidence from the MARLHospital environment across multiple random seeds, where the adaptive multiplier updates produced stable training trajectories and consistent constraint satisfaction without visible divergence. In the revision we will add a dedicated subsection 'Empirical Stability of Adaptive Dual Updates' containing (i) time-series plots of the Lagrange multiplier λ during training and (ii) a brief discussion of how the adaptive step-size rule and projection onto λ ≥ 0 help limit oscillations in practice. We believe this strengthens the practical justification while correctly positioning full theoretical analysis as future work.
    revision: partial
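The projected dual-ascent rule quoted in the first response, λ ← [λ + η g(w(π))]^+, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's algorithm: the "policy" is just a softmax allocation of a unit workload across agents, the team reward is a linear function with hypothetical per-agent rates `r`, and the sign convention (reward maximized, violation term subtracted) is chosen here for the sketch. The step sizes and iteration count are also arbitrary choices.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def jfi(w):
    return float(w.sum() ** 2 / (w.size * np.dot(w, w)))

def violation(w, alpha):
    """g(w) = sqrt(alpha * n) * ||w||_2 - 1^T w; positive when JFI(w) < alpha."""
    return float(np.sqrt(alpha * w.size) * np.linalg.norm(w) - w.sum())

def primal_dual(r, alpha, steps=5000, lr=0.1, eta=0.2):
    """Gradient ascent on theta, projected gradient ascent on the multiplier lam."""
    theta = np.zeros_like(r)          # uniform allocation at the start
    lam = 0.0
    c = np.sqrt(alpha * r.size)
    for _ in range(steps):
        w = softmax(theta)
        g = violation(w, alpha)
        # Gradient of L(w, lam) = r^T w - lam * g(w) with respect to w.
        v = r - lam * (c * w / np.linalg.norm(w) - 1.0)
        # Chain rule through the softmax: dL/dtheta_j = w_j * (v_j - w^T v).
        theta = theta + lr * w * (v - np.dot(w, v))
        # Dual ascent with projection onto lam >= 0.
        lam = max(0.0, lam + eta * g)
    return softmax(theta), lam

r = np.array([1.0, 0.5, 0.2])         # hypothetical per-agent reward rates
w_final, lam_final = primal_dual(r, alpha=0.9)
print(w_final, lam_final, jfi(w_final))
```

Without the constraint, the allocation would drift toward the single most efficient agent. The multiplier grows only while the violation is positive, which is what pushes the allocation back toward the fairness threshold without a hand-tuned penalty weight.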

Circularity Check

0 steps flagged

No circularity in JFI-to-SOC derivation or primal-dual updates

full rationale

The paper derives the workload fairness constraint directly from the standard Jain's Fairness Index formula JFI(w) = (1^Tw)^2 / (n ||w||_2^2) by imposing JFI(w) >= alpha, which algebraically rearranges to the quadratic inequality (1^Tw)^2 >= alpha n ||w||_2^2. This inequality is a known second-order cone representable set for w >= 0 and does not rely on any self-citation, fitted parameters, or prior author results. The subsequent claim that this admits an SOC representation enabling Lagrangian dual-ascent is a direct consequence of convex optimization facts, not a reduction to the paper's own inputs. No steps in the provided derivation chain (Markov game grounding, constraint formulation, or primal-dual algorithm) collapse by construction; the empirical constraint satisfaction numbers are reported outcomes rather than tautological predictions. The analysis is therefore self-contained.
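The rearrangement described in this rationale can be written out step by step. The derivation below uses the page's own symbols (w, n, α) and standard convex-analysis facts; it assumes w ≥ 0 and w ≠ 0 so that the index is defined and the square root can be taken without changing the inequality.

```latex
\begin{align*}
\mathrm{JFI}(w) = \frac{(\mathbf{1}^\top w)^2}{n\,\|w\|_2^2} \ge \alpha
  &\iff (\mathbf{1}^\top w)^2 \ge \alpha\, n\, \|w\|_2^2 \\
  &\iff \sqrt{\alpha n}\,\|w\|_2 \le \mathbf{1}^\top w
  \qquad \text{(since } \mathbf{1}^\top w \ge 0 \text{ for } w \ge 0\text{)} .
\end{align*}
% Standard second-order cone form: \|A w\|_2 \le c^\top w
% with A = \sqrt{\alpha n}\, I and c = \mathbf{1}.
```

The sign condition matters: the quadratic inequality alone describes a cone together with its mirror image, and it is the restriction to 1^T w ≥ 0 that selects the single convex cone.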

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework relies on the mathematical property that the JFI fairness set can be represented as a second-order cone; no explicit free parameters are named in the abstract, though the adaptive multipliers are learned during training.

axioms (1)
  • domain assumption: The feasible set defined by the Jain's Fairness Index constraint admits a second-order cone representation.
    Invoked when deriving the Lagrangian dual-ascent updates in the cooperative Markov game formulation.



