AdaFair-MARL: Enforcing Adaptive Fairness Constraints in Multi-Agent Reinforcement Learning
Pith reviewed 2026-05-17 20:13 UTC · model grok-4.3
The pith
AdaFair-MARL enforces workload fairness as an explicit constraint in cooperative multi-agent reinforcement learning using adaptive primal-dual updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present AdaFair-MARL, a constrained cooperative MARL framework whose core algorithmic component is a primal-dual update that enforces workload fairness via adaptive Lagrange multiplier updates. Grounding the framework in a cooperative Markov game, we derive the fairness constraint from Jain's Fairness Index (JFI) geometry and show that the resulting feasible set admits a second-order cone representation, enabling principled Lagrangian dual-ascent updates without manual penalty tuning.
What carries the argument
Primal-dual update with adaptive Lagrange multipliers on a Jain's Fairness Index constraint cast as a second-order cone.
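To make the mechanism concrete, here is a minimal sketch of the index and its cone form, assuming nonnegative, nonzero workload vectors. The function names `jain_index` and `soc_violation`, the threshold 0.9, and the two example workload vectors are illustrative choices, not taken from the paper.

```python
import numpy as np

def jain_index(w):
    """Jain's Fairness Index: (1^T w)^2 / (n * ||w||_2^2), for nonzero w >= 0."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w.size * np.dot(w, w))

def soc_violation(w, alpha):
    """g(w) = sqrt(alpha * n) * ||w||_2 - 1^T w.

    For w >= 0, g(w) <= 0 holds exactly when jain_index(w) >= alpha.
    """
    w = np.asarray(w, dtype=float)
    return np.sqrt(alpha * w.size) * np.linalg.norm(w) - w.sum()

# Illustrative workloads (hypothetical, not from the paper's experiments).
balanced = [4.0, 4.0, 4.0, 4.0]
skewed = [10.0, 1.0, 1.0, 1.0]
alpha = 0.9

for name, w in [("balanced", balanced), ("skewed", skewed)]:
    print(name, jain_index(w), soc_violation(w, alpha))
```

The sign of `soc_violation` is what a multiplier update would react to: negative or zero means the fairness level is met, positive means it is violated.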
If this is right
- AdaFair-MARL achieves near-perfect constraint satisfaction (0.99-1.00) in the MARLHospital experiments.
- It significantly improves workload fairness compared to fixed-penalty baselines.
- It maintains team performance while enforcing fairness in heterogeneous agent settings.
- The method eliminates the need for manual penalty tuning and post-hoc evaluation.
Where Pith is reading between the lines
- The second-order cone representation may allow extension to other fairness indices if they can be similarly reformulated.
- In practical deployments like hospital coordination, this could lead to more sustainable agent workloads over long-term operations.
- Testing the framework in non-cooperative or partially observable environments could reveal additional stability properties.
Load-bearing premise
The fairness constraint derived from Jain's Fairness Index geometry admits a second-order cone representation that enables stable Lagrangian dual-ascent updates without degrading team performance or introducing instabilities.
What would settle it
Observing that the dual-ascent procedure diverges or that team reward drops substantially below baseline levels when applying the second-order cone fairness constraint in a heterogeneous multi-agent environment.
Original abstract
Fair workload enforcement in heterogeneous multi-agent systems that pursue shared objectives remains challenging. Fixed fairness penalties often introduce inefficiencies, training instability, and conflicting agent incentives. Reward-shaping approaches in fair Multi-Agent Reinforcement Learning (MARL) typically incorporate fairness through heuristic penalties or scalar reward modifications and often rely on post-hoc evaluation. However, these methods do not guarantee that a desired fairness level will be satisfied. To address this limitation, we propose the Adaptive Fairness Multi-Agent Reinforcement Learning (AdaFair-MARL) framework, which formulates workload fairness as an explicit constraint so that agents maintain balanced contributions while optimizing team performance. We present AdaFair-MARL, a constrained cooperative MARL framework whose core algorithmic component is a primal-dual update that enforces workload fairness via adaptive Lagrange multiplier updates. Grounding the framework in a cooperative Markov game, we derive the fairness constraint from Jain's Fairness Index (JFI) geometry and show that the resulting feasible set admits a second-order cone representation, enabling principled Lagrangian dual-ascent updates without manual penalty tuning. Experiments in a simulated hospital coordination environment (MARLHospital) demonstrate the effectiveness of AdaFair-MARL compared to reward-shaping and fixed-penalty fairness methods, improving workload balance while maintaining team performance. We found that AdaFair-MARL achieves nearly perfect constraint satisfaction (0.99-1.00) while significantly improving workload fairness compared to fixed-penalty baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AdaFair-MARL, a constrained cooperative multi-agent reinforcement learning framework that formulates workload fairness as an explicit constraint derived from Jain's Fairness Index (JFI) geometry. It shows that the resulting feasible set for the workload vector admits a second-order cone representation, enabling a primal-dual update with adaptive Lagrange multipliers to enforce fairness without manual penalty tuning. Experiments in the MARLHospital environment report near-perfect constraint satisfaction (0.99-1.00) and improved workload balance relative to reward-shaping and fixed-penalty baselines while maintaining team performance.
Significance. If the SOC representation of the JFI constraint remains valid for expected workloads under joint policies and the adaptive dual updates prove stable, the framework would provide a principled, tuning-free method for fairness enforcement in heterogeneous cooperative MARL. This could strengthen reliability in domains such as hospital coordination. The empirical results are promising but rest on limited detail regarding metrics and baselines, so the overall significance is moderate pending further analysis.
major comments (2)
- [Derivation of fairness constraint] Abstract and derivation section: the feasible set {w | (1^T w)^2 ≥ α n ||w||_2^2} is SOC-representable for a fixed workload vector w, but workloads in the cooperative Markov game are expectations w(π) = E_{τ ~ π}[workload vector] that are linear in the occupancy measure. The manuscript does not provide the explicit steps showing that the constraint remains SOC-representable after this expectation or how the Lagrangian dual is formulated for the resulting non-linear function of the occupancy.
- [Algorithm and analysis] No section establishes convergence or bounded oscillation of the dual-ascent updates on the SOC multiplier when function approximation, joint-policy gradients, and heterogeneous non-stationary dynamics are present. The abstract's empirical claim of 0.99-1.00 satisfaction therefore lacks supporting analysis that the method avoids instability in approximate MARL.
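The first major comment can be probed numerically. The sketch below assumes workloads of the form w = Aμ, where the matrix `A`, the problem sizes, and the sampled occupancy measures are all hypothetical stand-ins. It checks that the violation function is convex in μ by testing random mixtures. It says nothing about convexity in policy parameters, which is a separate question.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_state_actions = 3, 6   # hypothetical problem sizes
# Hypothetical per-(state, action) workload contributions, one row per agent.
A = rng.uniform(0.0, 1.0, size=(n_agents, n_state_actions))

def violation(mu, alpha):
    """g(mu) = sqrt(alpha * n) * ||A mu||_2 - 1^T (A mu)."""
    w = A @ mu
    return np.sqrt(alpha * n_agents) * np.linalg.norm(w) - w.sum()

def random_occupancy():
    """A random point on the probability simplex, standing in for an occupancy measure."""
    mu = rng.uniform(size=n_state_actions)
    return mu / mu.sum()

alpha = 0.9
worst_gap = -np.inf
for _ in range(1000):
    mu1, mu2 = random_occupancy(), random_occupancy()
    theta = rng.uniform()
    mix = theta * mu1 + (1.0 - theta) * mu2
    # Convexity requires g(mix) <= theta * g(mu1) + (1 - theta) * g(mu2).
    gap = violation(mix, alpha) - (theta * violation(mu1, alpha)
                                   + (1.0 - theta) * violation(mu2, alpha))
    worst_gap = max(worst_gap, gap)

print("largest convexity gap:", worst_gap)
```

A gap that never exceeds zero, up to rounding, is consistent with the constraint being a norm of a linear function minus a linear function. It is a sanity check, not a substitute for the derivation the referee asks for.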
minor comments (2)
- [Experiments] Experiments section: provide the precise definition of the workload vector, the exact JFI threshold alpha used, and full baseline hyperparameter details to allow reproduction and fair comparison.
- [Experiments] Clarify whether the reported constraint satisfaction is measured on the JFI itself or on the SOC constraint violation, and report variance or statistical significance across independent runs.
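The measurement question in the second minor comment matters because fairness of the expected workload and fairness of each realized episode can diverge. The sketch below uses two made-up episodes, which are not data from the paper, to show an extreme case where the averaged workload is perfectly fair while every single episode is not.

```python
import numpy as np

def jain_index(w):
    """Jain's Fairness Index: (1^T w)^2 / (n * ||w||_2^2), for nonzero w >= 0."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w.size * np.dot(w, w))

# Made-up per-episode workloads for two agents (placeholder values only).
episodes = np.array([[4.0, 0.0],
                     [0.0, 4.0]])
alpha = 0.9

# Fraction of episodes whose realized workload meets the threshold.
per_episode_rate = np.mean([jain_index(w) >= alpha for w in episodes])
# Fairness of the workload averaged over episodes.
expected_jfi = jain_index(episodes.mean(axis=0))

print(per_episode_rate, expected_jfi)
```

Stating which of the two quantities the 0.99-1.00 figure refers to, together with variance across seeds, would resolve the ambiguity.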
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, providing clarifications grounded in the paper's framework and indicating planned revisions to improve rigor and clarity.
- Referee: [Derivation of fairness constraint] Abstract and derivation section: the feasible set {w | (1^T w)^2 ≥ α n ||w||_2^2} is SOC-representable for a fixed workload vector w, but workloads in the cooperative Markov game are expectations w(π) = E_{τ ~ π}[workload vector] that are linear in the occupancy measure. The manuscript does not provide the explicit steps showing that the constraint remains SOC-representable after this expectation or how the Lagrangian dual is formulated for the resulting non-linear function of the occupancy.
Authors: We thank the referee for this precise observation. In the manuscript, the workload vector is explicitly the expected value w(π) = E_{τ ~ π}[workload vector] under the joint policy, which is linear in the occupancy measure μ via w = Aμ for a suitable matrix A derived from the state-action visitation. For nonnegative workloads, the fairness constraint (1^T w)^2 ≥ α n ||w||_2^2 is a convex second-order cone constraint on w. Because the mapping μ ↦ w is affine, substituting yields an equivalent convex constraint on μ that remains SOC-representable: the inequality can be rewritten using auxiliary variables to express ||w||_2 ≤ (1^T w)/√(α n) as a standard SOC constraint ||z||_2 ≤ t with t and z affine in μ. We will add an explicit subsection 'SOC Representation of Expected Workload Constraints' that walks through these substitution steps and shows preservation under affine maps. For the Lagrangian, we associate a single dual variable λ ≥ 0 with the convex constraint violation; the Lagrangian is then L(π, λ) = team reward objective − λ · g(w(π)), where g is the SOC violation function (g ≤ 0 when the constraint holds), and the dual update is the standard projected ascent λ ← [λ + η g(w(π))]^+. This formulation is already implicit in our primal-dual algorithm but will be written out explicitly in the revision.
revision: yes
- Referee: [Algorithm and analysis] No section establishes convergence or bounded oscillation of the dual-ascent updates on the SOC multiplier when function approximation, joint-policy gradients, and heterogeneous non-stationary dynamics are present. The abstract's empirical claim of 0.99-1.00 satisfaction therefore lacks supporting analysis that the method avoids instability in approximate MARL.
Authors: We agree that a formal convergence or oscillation bound for the adaptive dual updates under function approximation, joint-policy gradients, and non-stationary heterogeneous dynamics is not provided and would be difficult to establish without strong additional assumptions. Such guarantees remain an open theoretical question for constrained MARL in general. Our current support for the 0.99-1.00 satisfaction claim rests on empirical evidence from the MARLHospital environment across multiple random seeds, where the adaptive multiplier updates produced stable training trajectories and consistent constraint satisfaction without visible divergence. In the revision we will add a dedicated subsection 'Empirical Stability of Adaptive Dual Updates' containing (i) time-series plots of the Lagrange multiplier λ during training and (ii) a brief discussion of how the adaptive step-size rule and projection onto λ ≥ 0 help limit oscillations in practice. We believe this strengthens the practical justification while correctly positioning full theoretical analysis as future work.
revision: partial
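The projected dual-ascent rule discussed in the responses above can be illustrated on a toy problem. This sketch is not the paper's algorithm: it replaces the joint policy with a directly optimized workload vector `w`, uses a hypothetical concave reward R(w) = c^T w − 0.5 ||w||_2^2, and picks the efficiencies `c`, the threshold `alpha`, and the fixed step sizes `lr` and `eta` for illustration only. It adopts the convention that g(w) > 0 means violation, so the penalty is subtracted from the reward.

```python
import numpy as np

def jain_index(w):
    return w.sum() ** 2 / (w.size * np.dot(w, w))

def violation(w, alpha):
    # g(w) = sqrt(alpha * n) * ||w||_2 - 1^T w; g <= 0 iff JFI(w) >= alpha for w >= 0
    return np.sqrt(alpha * w.size) * np.linalg.norm(w) - w.sum()

def violation_grad(w, alpha):
    # Gradient of g with respect to w (defined for w != 0).
    return np.sqrt(alpha * w.size) * w / np.linalg.norm(w) - 1.0

c = np.array([3.0, 1.0, 1.0])   # hypothetical heterogeneous agent efficiencies
alpha, lr, eta = 0.9, 0.05, 0.05
w, lam = c.copy(), 0.0          # start at the unconstrained optimum w = c
lam_history = []

for _ in range(5000):
    # Primal ascent on L(w, lam) = R(w) - lam * g(w); grad R(w) = c - w.
    w = w + lr * ((c - w) - lam * violation_grad(w, alpha))
    # Projected dual ascent: lam <- [lam + eta * g(w)]^+
    lam = max(0.0, lam + eta * violation(w, alpha))
    lam_history.append(lam)

print("JFI before:", jain_index(c), "JFI after:", jain_index(w), "lambda:", lam)
```

Plotting `lam_history` gives the kind of multiplier time series the authors propose to report. In this convex toy setting the multiplier settles at a positive value. Whether the same holds under function approximation and non-stationary multi-agent dynamics is exactly the open question the referee raises.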
Circularity Check
No circularity in JFI-to-SOC derivation or primal-dual updates
The paper derives the workload fairness constraint directly from the standard Jain's Fairness Index formula JFI(w) = (1^T w)^2 / (n ||w||_2^2) by imposing JFI(w) ≥ α, which algebraically rearranges to the quadratic inequality (1^T w)^2 ≥ α n ||w||_2^2. For w ≥ 0, this inequality defines a set that is known to be second-order cone representable, and the derivation does not rely on any self-citation, fitted parameters, or prior author results. The subsequent claim that this admits an SOC representation enabling Lagrangian dual-ascent is a direct consequence of convex optimization facts, not a reduction to the paper's own inputs. No steps in the provided derivation chain (Markov game grounding, constraint formulation, or primal-dual algorithm) collapse by construction; the empirical constraint satisfaction numbers are reported outcomes rather than tautological predictions. The analysis is therefore self-contained.
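For reference, the rearrangement described in this rationale can be written out as follows. This is a sketch under the assumptions that w is nonzero and componentwise nonnegative and that 0 < α ≤ 1; the symbols z and t are introduced here only to name the cone variables.

```latex
\begin{align*}
\mathrm{JFI}(w) = \frac{(\mathbf{1}^\top w)^2}{n\,\|w\|_2^2} \ge \alpha
  &\iff (\mathbf{1}^\top w)^2 \ge \alpha\, n\, \|w\|_2^2 \\
  &\iff \sqrt{\alpha n}\,\|w\|_2 \le \mathbf{1}^\top w
     \qquad \text{(using } \mathbf{1}^\top w \ge 0 \text{ since } w \ge 0\text{)} \\
  &\iff \|z\|_2 \le t, \qquad z = \sqrt{\alpha n}\, w,\quad t = \mathbf{1}^\top w .
\end{align*}
```

Both z and t are linear in w, so the last line is a standard second-order cone constraint. Since (1^T w)^2 ≥ ||w||_2^2 for w ≥ 0, the index never falls below 1/n, and the constraint only binds for thresholds α > 1/n.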
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The feasible set defined by the Jain's Fairness Index constraint admits a second-order cone representation.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "derive the fairness constraint from Jain's Fairness Index (JFI) geometry and show that the resulting feasible set admits a second-order cone representation, enabling principled Lagrangian dual-ascent updates"