Prosocial learning agents solve generalized Stag Hunts better than selfish ones

arxiv: 1709.02865 · v2 · pith:QHUZUH7Lnew · submitted 2017-09-08 · 💻 cs.AI · cs.GT

Alexander Peysakhovich, Adam Lerer

keywords: agent, agents, learning, stag, making, outcomes, prosocial, reactive

Deep reinforcement learning has become an important paradigm for constructing agents that can enter complex multi-agent situations and improve their policies through experience. One commonly used technique is reactive training - applying standard RL methods while treating other agents as a part of the learner's environment. It is known that in general-sum games reactive training can lead groups of agents to converge to inefficient outcomes. We focus on one such class of environments: Stag Hunt games. Here agents either choose a risky cooperative policy (which leads to high payoffs if both choose it but low payoffs to an agent who attempts it alone) or a safe one (which leads to a safe payoff no matter what). We ask how we can change the learning rule of a single agent to improve its outcomes in Stag Hunts that include other reactive learners. We extend existing work on reward-shaping in multi-agent reinforcement learning and show that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes. Thus, even if we control only a single agent in a group, making that agent prosocial can increase our agent's long-run payoff. We show experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw input pixels.
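
The payoff structure described in the abstract can be illustrated with a small two-player matrix game. The sketch below is not the authors' code: the payoff values, the function names, and the additive form of the prosocial reward (own payoff plus `alpha` times the partner's payoff) are all assumptions chosen for illustration. It computes how confident an agent must be that its partner hunts stag before hunting stag becomes its own best response.

```python
from fractions import Fraction

# Illustrative payoffs (assumptions, not values from the paper).
STAG_BOTH = Fraction(4)   # both agents hunt stag
STAG_ALONE = Fraction(0)  # an agent hunts stag while its partner does not
SAFE = Fraction(3)        # hunting hare pays this regardless of the partner


def payoff(own, partner):
    """Selfish payoff to the agent playing `own` against `partner`."""
    if own == "hare":
        return SAFE
    return STAG_BOTH if partner == "stag" else STAG_ALONE


def shaped_reward(own, partner, alpha):
    """Assumed prosocial reward: own payoff plus alpha times the partner's."""
    return payoff(own, partner) + alpha * payoff(partner, own)


def stag_threshold(alpha):
    """Belief p (partner plays stag) at which stag and hare are equally good.

    Above this belief, stag is the best response under the shaped reward.
    """
    # Advantage of stag over hare against each pure partner action.
    gain_if_stag = shaped_reward("stag", "stag", alpha) - shaped_reward("hare", "stag", alpha)
    gain_if_hare = shaped_reward("stag", "hare", alpha) - shaped_reward("hare", "hare", alpha)
    # Solve p * gain_if_stag + (1 - p) * gain_if_hare = 0 for p.
    return -gain_if_hare / (gain_if_stag - gain_if_hare)


print(stag_threshold(0))  # selfish agent: 3/4
print(stag_threshold(1))  # fully prosocial agent: 3/8
```

Under these assumed numbers, a selfish agent needs to believe its partner hunts stag with probability at least 3/4, while an agent with `alpha = 1` needs only 3/8. This is one simple way to see why a prosocial agent might be easier to draw into the cooperative outcome; it says nothing about learning dynamics, which the paper studies experimentally.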

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Price of Paranoia: Robust Risk-Sensitive Cooperation in Non-Stationary Multi-Agent Reinforcement Learning
cs.GT 2026-04 unverdicted novelty 7.0

Robustness applied to policy-gradient variance rather than return distributions expands the basin of cooperative equilibria under partner noise in coordination games, quantified via the new Price of Paranoia metric.

Investigating the Impact of Subgraph Social Structure Preference on the Strategic Behavior of Networked Mixed-Motive Learning Agents
cs.MA 2026-04 unverdicted novelty 6.0

Preferences over local subgraph structures cause distinct changes in reward collection and strategic actions for agents playing sequential social dilemmas in Harvest and Cleanup environments.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
cs.AI 2025-06 unverdicted novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasonin...