
arxiv: 2601.22292 · v2 · pith:TSDIVPTHnew · submitted 2026-01-29 · 💻 cs.MA · cs.LG

Learning Incentive Structures for Cooperative Resilience in Multi-Agent Systems under Social Dilemmas

Pith reviewed 2026-05-21 14:30 UTC · model grok-4.3

classification 💻 cs.MA cs.LG
keywords multi-agent reinforcement learning · social dilemmas · cooperative resilience · incentive structures · resource sharing · disruptions · collective behavior

The pith

A hybrid of individual and group rewards sustains cooperation in multi-agent systems facing resource disruptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method to learn reward functions that encourage agents to behave resiliently in settings where self-interest clashes with group benefit, such as sharing limited resources. It defines a resilience score based on how well agent groups maintain performance when hit by sudden changes, then uses that score to pick out good reward designs. These rewards are plugged into standard multi-agent learning algorithms. The key finding is that mixing personal and group-based rewards works better than either alone, keeping the group from running out of resources and falling apart even after disruptions occur.

Core claim

The authors show that inferring reward functions from trajectories scored by a resilience metric, and then training agents with a hybrid of individual and resilience-aligned incentives, results in sustained collective behavior, fewer collapse events from resource depletion, and maintained system performance when facing disruptions in resource-sharing environments.

What carries the argument

A resilience metric that scores and ranks complete agent trajectories to infer reward functions promoting collective well-being, which are then combined with individual incentives in the multi-agent reinforcement learning loop.
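
As a rough illustration of how a reward function might be inferred from trajectories ranked by a score, the sketch below fits a linear reward with a pairwise (Bradley-Terry) ranking loss. This is not the authors' procedure, which the available text does not specify. The per-trajectory feature vectors, the linear reward form, the learning rate, and the function name `fit_reward_from_ranking` are all assumptions made for illustration.

```python
import numpy as np

def fit_reward_from_ranking(features, scores, lr=0.1, epochs=200):
    """Fit weights w so that higher-scored trajectories get a higher linear return.

    features: array (n_traj, d), summed per-step features of each trajectory (hypothetical).
    scores:   array (n_traj,), resilience score of each trajectory (hypothetical).
    """
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n, d = features.shape
    w = np.zeros(d)
    # All ordered pairs (i, j) where trajectory i is ranked above trajectory j.
    pairs = [(i, j) for i in range(n) for j in range(n) if scores[i] > scores[j]]
    for _ in range(epochs):
        grad = np.zeros(d)
        for i, j in pairs:
            diff = features[i] - features[j]
            # Bradley-Terry probability that i is preferred to j under the current w.
            p = 1.0 / (1.0 + np.exp(-w @ diff))
            # Gradient of the negative log-likelihood -log(p) with respect to w.
            grad -= (1.0 - p) * diff
        # Plain gradient descent on the mean pairwise loss.
        w -= lr * grad / max(len(pairs), 1)
    return w
```

Only the ordering of the scores enters the loss, so any monotone rescaling of the resilience metric would yield the same fitted weights. The inferred per-trajectory return is then `features @ w`.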

If this is right

  • The hybrid approach sustains collective behavior over time.
  • It reduces the number of collapse events tied to resource depletion.
  • It preserves overall system performance even when disruptions occur.
  • Individual or purely collective incentives are less effective in these settings.
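
One simple way to picture the three incentive structures is as a convex combination of each agent's own reward and a shared resilience-aligned term. The source does not state the combination rule, so the mixing weight `alpha`, its default value, and the function name below are assumptions.

```python
def hybrid_rewards(individual, collective, alpha=0.5):
    """Blend each agent's own reward with a shared collective reward.

    individual: list of per-agent rewards at one step.
    collective: scalar resilience-aligned reward shared by all agents (hypothetical).
    alpha:      weight on the individual term; 1.0 is purely individual and
                0.0 is purely collective. The default 0.5 is arbitrary.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * r + (1.0 - alpha) * collective for r in individual]
```

Under this parameterization, the purely individual and purely resilience-aligned structures are the two endpoints of `alpha`, and the hybrid structure is any value in between.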

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could be tested in other social dilemma scenarios like public goods games or prisoner's dilemma variants with perturbations.
  • Scaling the method to larger numbers of agents might reveal limits in how well the inferred rewards generalize.
  • Integrating this with other resilience measures, such as network-based ones, could strengthen the results.

Load-bearing premise

That scoring how agents act over entire runs based on a resilience measure can reliably point to reward settings that will make groups stay cooperative when resources get disrupted.
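
For intuition only, the sketch below scores a run by the ratio of mean post-disruption performance to mean pre-disruption performance, capped at 1, and then ranks runs by that score. The paper's actual metric is not given in the available text; the formula, the function names, and the single fixed disruption step are assumptions.

```python
def resilience_score(performance, disruption_step):
    """Return mean post-disruption performance relative to the pre-disruption mean.

    performance:     list of per-step collective performance values for one run (hypothetical).
    disruption_step: index at which the disruption hits; must split the run in two.
    """
    before = performance[:disruption_step]
    after = performance[disruption_step:]
    if not before or not after:
        raise ValueError("disruption_step must leave steps on both sides")
    baseline = sum(before) / len(before)
    if baseline <= 0:
        return 0.0  # no meaningful baseline to recover toward
    return min(1.0, (sum(after) / len(after)) / baseline)

def rank_runs(runs, disruption_step):
    """Return run indices sorted from most to least resilient under the score above."""
    scores = [resilience_score(r, disruption_step) for r in runs]
    return sorted(range(len(runs)), key=lambda i: scores[i], reverse=True)
```

The premise amounts to the claim that a ranking like the one returned by `rank_runs` carries enough signal to recover rewards that keep groups cooperative.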

What would settle it

Running the same resource-sharing experiments with the hybrid incentives but observing the same high rate of collapses and performance drops as seen with pure individual incentives would falsify the central claim.
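
A comparison of this kind could be made concrete with a two-proportion z-test on collapse counts under two incentive structures. The sketch is generic statistics, not an analysis from the paper. The counts in the usage line are made-up placeholders; only the 500-episode total echoes the figure captions.

```python
import math

def two_proportion_z(collapses_a, n_a, collapses_b, n_b):
    """Return (z, two-sided p-value) for H0: both conditions collapse at the same rate."""
    p_a = collapses_a / n_a
    p_b = collapses_b / n_b
    pooled = (collapses_a + collapses_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        return 0.0, 1.0  # both rates are 0 or both are 1: no evidence of a difference
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Placeholder counts, not results from the paper.
z, p = two_proportion_z(collapses_a=300, n_a=500, collapses_b=100, n_b=500)
```

A small, non-significant difference between the hybrid and individual conditions would be consistent with the falsifying outcome described above, although failing to reject the null does not by itself prove the rates are equal.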

Figures

Figures reproduced from arXiv: 2601.22292 by Luis Felipe Giraldo, Manuela Chacon-Chamorro, Nicanor Quijano.

Figure 1
Figure 1: (a) Mixed-motive environment used throughout this study. Two agents interact in an 8 × 8 grid with a central apple tree containing 16 apples. (b) Overview of our proposed reward learning pipeline. This figure illustrates the full loop from data collection to policy learning.
3.1 Ranking Trajectories by Cooperative Resilience. A trajectory τ = (s_0, a_0, s_1, a_1, …, s_T) is defined as a sequence of states …
Figure 2
Figure 2: Percentage of episodes (out of 500) in which agents consumed the last remaining apple for the best …
Figure 3
Figure 3: Performance metrics over 500 episodes. (a) Cooperative resilience. (b) Average total apple consumption per episode across both agents. (c) Episode length. (d) Last-apple consumption frequency, indicating the occurrence of social dilemma failures.
… under hybrid strategy, resources typically remain available until the simulation horizon (5000 steps), indicating more efficient and balanced exploitation. Final…
Figure 4
Figure 4: Position frequency maps for Agent 1 (green) and Agent 2 (purple) under four training config…
Original abstract

Multi-agent social dilemmas, such as the tragedy of the commons, capture settings where individual incentives conflict with collective well-being, making these systems highly vulnerable to collapse under disruptions. In this context, this work studies cooperative resilience, understood as the system-level ability to maintain collective well-being under perturbations through adaptive agent behavior. We propose a framework for learning incentive structures aligned with collective well-being in multi-agent reinforcement learning systems, where reward functions shape individual decision-making and collective behavior. A resilience metric is used to score and rank agent trajectories, allowing the inference of reward functions that promote resilient collective behavior. These inferred reward functions are integrated into the multi-agent reinforcement learning process to shape agent interactions in social dilemma settings. The approach is evaluated in resource-sharing environments subject to disruptions, using three incentive structures: individual incentives, resilience-aligned incentives, and a hybrid incentive structure that combines both individual and collective components. The results show that the hybrid incentive structure promotes sustained collective behavior, reduces collapse events associated with resource depletion, and preserves system performance under disruption. These findings highlight the role of incentive design as a mechanism for promoting resilient collective behavior and provide a computational framework for multi-agent social dilemmas under disruptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a framework for learning incentive structures in multi-agent reinforcement learning (MARL) to promote cooperative resilience in social dilemma settings such as resource-sharing environments. A resilience metric scores and ranks agent trajectories to infer reward functions that are then integrated into the MARL training loop. The work evaluates three incentive structures—individual, resilience-aligned, and hybrid—in environments subject to disruptions, claiming that the hybrid structure sustains collective behavior, reduces collapse events from resource depletion, and preserves system performance.

Significance. If the inference procedure and empirical results hold, the work offers a computational approach to aligning individual rewards with system-level resilience in MARL, which could inform incentive design for mitigating tragedies of the commons under perturbations. The evaluation across multiple incentive structures provides a useful comparison, though the overall significance is limited by the absence of detailed validation that the metric-driven rewards reliably induce the claimed resilient fixed points rather than artifacts of weighting.

major comments (2)
  1. [Framework description (inferred from abstract and methods outline)] The central claim that the hybrid incentive structure promotes sustained collective behavior and reduces collapses rests on the step of inferring reward functions from resilience metric scores on trajectories and inserting them into the MARL loop. The manuscript provides no description of this inference procedure (e.g., inverse RL, regression, or constrained optimization) nor any proof or ablation showing that high metric scores imply stable collective outcomes under the learned rewards; this link is load-bearing and currently unverified.
  2. [Evaluation and results sections] The resilience metric is defined on agent trajectories to promote resilient collective behavior, yet the abstract and evaluation sections do not report how the metric is constructed, validated, or shown to be independent of the very collective outcomes it is meant to incentivize. This raises a circularity risk where improvements could stem from the hybrid weighting rather than the metric-driven inference, undermining the cross-structure comparison.
minor comments (2)
  1. [Results] The abstract and results would benefit from explicit reporting of error bars, number of runs, and statistical significance for the reported reductions in collapse events and performance preservation.
  2. [Methods] Notation for the resilience metric parameters and the hybrid weighting coefficients should be introduced with clear definitions to improve reproducibility.
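
The error bars requested in the first minor comment could be produced with a percentile bootstrap over independent runs. The sketch below is a generic recipe, not something described in the paper; the function name, the number of resamples, and the 95% level are assumptions.

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap over per-run values.

    values: one number per independent run, e.g. the collapse rate per seed (hypothetical).
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample runs with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_resamples)
    hi_idx = int((1.0 + level) / 2.0 * n_resamples) - 1
    return statistics.fmean(values), means[lo_idx], means[hi_idx]
```

Reporting the interval alongside the number of runs makes it possible to judge whether a difference between incentive structures exceeds run-to-run variation.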

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify areas where the manuscript can be strengthened. We address each major comment below and commit to revisions that provide the requested details and validations without altering the core claims.

  1. Referee: [Framework description (inferred from abstract and methods outline)] The central claim that the hybrid incentive structure promotes sustained collective behavior and reduces collapses rests on the step of inferring reward functions from resilience metric scores on trajectories and inserting them into the MARL loop. The manuscript provides no description of this inference procedure (e.g., inverse RL, regression, or constrained optimization) nor any proof or ablation showing that high metric scores imply stable collective outcomes under the learned rewards; this link is load-bearing and currently unverified.

    Authors: We agree that the current manuscript does not provide a sufficiently detailed description of the inference procedure or supporting analysis for the link between metric scores and stable outcomes. In the revised version we will add an explicit subsection in the methods describing the inference process (ranking trajectories by the resilience metric and deriving reward functions via regression on the scored trajectories) and include new ablations that test whether high-scoring trajectories produce stable collective fixed points under the inferred rewards.

    revision: yes

  2. Referee: [Evaluation and results sections] The resilience metric is defined on agent trajectories to promote resilient collective behavior, yet the abstract and evaluation sections do not report how the metric is constructed, validated, or shown to be independent of the very collective outcomes it is meant to incentivize. This raises a circularity risk where improvements could stem from the hybrid weighting rather than the metric-driven inference, undermining the cross-structure comparison.

    Authors: We acknowledge the circularity concern and the lack of explicit reporting on metric construction and independence. The revised manuscript will include the precise mathematical definition of the resilience metric, its component terms, and additional validation experiments (e.g., applying the metric to trajectories generated under purely individual incentives and confirming consistent scoring behavior). These additions will demonstrate that the metric operates independently of the hybrid weighting and that performance differences arise from the inferred rewards rather than weighting artifacts alone.

    revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation chain


The paper describes a resilience metric applied to trajectories to infer rewards, followed by integration into MARL training and evaluation of hybrid incentives in resource-sharing environments. No equations or explicit reduction are provided in the available text showing that the inferred rewards or final performance claims are equivalent to the metric inputs by construction. The framework treats the metric as an external scoring device for selecting or shaping rewards, with results presented as empirical outcomes rather than tautological restatements. The derivation remains self-contained against the described benchmarks and does not reduce the central claims to self-definition or fitted renaming.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The framework rests on the existence of a quantifiable resilience metric and the assumption that reward functions inferred from it will produce the desired system-level property when used in MARL.

free parameters (1)
  • resilience metric parameters
    Parameters that define how trajectories are scored and ranked; these must be chosen or fitted to produce the ranking used for reward inference.
axioms (1)
  • domain assumption: A scalar resilience metric on agent trajectories can be defined that captures collective well-being under perturbations
    Invoked when the metric is used to score and rank trajectories for reward inference.
invented entities (1)
  • resilience-aligned incentive structure · no independent evidence
    purpose: Reward function derived from the resilience metric to shape agent behavior toward collective stability
    New category of incentive introduced by the paper and contrasted with individual and hybrid versions.

pith-pipeline@v0.9.0 · 5747 in / 1250 out tokens · 29425 ms · 2026-05-21T14:30:01.822551+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    We propose a framework for learning incentive structures aligned with collective well-being in multi-agent reinforcement learning systems, where reward functions shape individual decision-making and collective behavior. A resilience metric is used to score and rank agent trajectories, allowing the inference of reward functions that promote resilient collective behavior.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
