Prosocial learning agents solve generalized stag hunts better than selfish ones

2017 · cs.AI · arXiv 1709.02865

3 Pith papers cite this work.

abstract

Deep reinforcement learning has become an important paradigm for constructing agents that can enter complex multi-agent situations and improve their policies through experience. One commonly used technique is reactive training: applying standard RL methods while treating other agents as part of the learner's environment. It is known that in general-sum games reactive training can lead groups of agents to converge to inefficient outcomes. We focus on one such class of environments: Stag Hunt games. Here agents choose either a risky cooperative policy (which leads to high payoffs if both choose it but a low payoff to an agent who attempts it alone) or a safe one (which leads to a safe payoff no matter what). We ask how we can change the learning rule of a single agent to improve its outcomes in Stag Hunts that include other reactive learners. We extend existing work on reward shaping in multi-agent reinforcement learning and show that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes. Thus, even if we control only a single agent in a group, making that agent prosocial can increase our agent's long-run payoff. We show experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw input pixels.
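The abstract's core argument (one prosocial agent raises the chance that a pair of reactive learners settles on the cooperative outcome) can be illustrated in a one-shot 2x2 Stag Hunt. The sketch below is not the paper's method: the payoff numbers, the shaped utility `(1 - alpha) * own + alpha * partner`, and the replicator-style gradient dynamics standing in for deep RL are all assumptions chosen for illustration, and every function name is hypothetical. `stag_threshold` gives the partner-cooperation probability above which hunting stag is a best response; `basin_of_stag` estimates, over a grid of starting policies, how often both learners end up hunting stag when only the focal agent is prosocial.

```python
"""Toy sketch: prosocial reward shaping in a one-shot 2x2 Stag Hunt.

All numbers and modelling choices here are illustrative assumptions,
not taken from the paper: the payoff table, the shaped utility
(1 - alpha) * own + alpha * partner, and replicator-style gradient
dynamics standing in for reactive RL training.
"""

STAG, HARE = 0, 1

# Hypothetical payoffs: PAYOFF[(my_action, partner_action)] = (mine, partner's)
PAYOFF = {
    (STAG, STAG): (4.0, 4.0),  # coordinated stag hunt: best joint outcome
    (STAG, HARE): (0.0, 3.0),  # hunting stag alone earns nothing
    (HARE, STAG): (3.0, 0.0),  # hare pays 3 whatever the partner does
    (HARE, HARE): (3.0, 3.0),
}


def shaped(me, partner, alpha):
    """Utility of an agent that puts weight alpha on its partner's reward."""
    mine, theirs = PAYOFF[(me, partner)]
    return (1.0 - alpha) * mine + alpha * theirs


def advantage_coeffs(alpha):
    """Return (a, b) with E[u(STAG)] - E[u(HARE)] = a + b * q,
    where q is the probability that the partner plays STAG."""
    a = shaped(STAG, HARE, alpha) - shaped(HARE, HARE, alpha)  # value at q = 0
    at_one = shaped(STAG, STAG, alpha) - shaped(HARE, STAG, alpha)  # value at q = 1
    return a, at_one - a


def stag_threshold(alpha):
    """Partner stag probability above which STAG is a strict best response."""
    a, b = advantage_coeffs(alpha)
    return -a / b


def basin_of_stag(alpha_focal, alpha_partner=0.0, grid=20, steps=500, lr=0.1):
    """Fraction of grid starting points (x, y) whose dynamics end at (STAG, STAG).

    x and y are the focal agent's and the partner's probabilities of playing
    STAG; each moves by lr * p * (1 - p) * advantage (a replicator-style step).
    """
    af, bf = advantage_coeffs(alpha_focal)
    ap, bp = advantage_coeffs(alpha_partner)
    hits = 0
    for i in range(grid):
        for j in range(grid):
            x, y = (i + 0.5) / grid, (j + 0.5) / grid  # interior starting point
            for _ in range(steps):
                dx = x * (1 - x) * (af + bf * y)
                dy = y * (1 - y) * (ap + bp * x)
                x, y = x + lr * dx, y + lr * dy
            if x > 0.5 and y > 0.5:
                hits += 1
    return hits / grid**2


if __name__ == "__main__":
    print(stag_threshold(0.0), stag_threshold(0.5))  # → 0.75 0.375
    print(basin_of_stag(0.0), basin_of_stag(0.5))
```

Under these assumed payoffs the threshold works out to 0.75 * (1 - alpha), so a weight of alpha = 0.5 halves the cooperation the focal agent needs to see before hunting stag. In this toy model that keeps the prosocial agent on stag under more pessimistic beliefs, which in turn gives the selfish partner a reason to switch, and the second printed pair shows the basin of the (stag, stag) outcome growing accordingly.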

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

The Price of Paranoia: Robust Risk-Sensitive Cooperation in Non-Stationary Multi-Agent Reinforcement Learning

cs.GT · 2026-04-17 · unverdicted · novelty 7.0

Robustness applied to policy-gradient variance rather than return distributions expands the basin of cooperative equilibria under partner noise in coordination games, quantified via the new Price of Paranoia metric.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

cs.AI · 2025-06-03 · unverdicted · novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of

Investigating the Impact of Subgraph Social Structure Preference on the Strategic Behavior of Networked Mixed-Motive Learning Agents

cs.MA · 2026-04-04 · unverdicted · novelty 6.0

Preferences over local subgraph structures cause distinct changes in reward collection and strategic actions for agents playing sequential social dilemmas in Harvest and Cleanup environments.
