The fitness landscape of social norms in social dilemmas

Maximilian Puelma Touzel

arxiv: 2605.18834 · v1 · pith:ULWKNBF2new · submitted 2026-05-13 · 💻 cs.GT · cs.MA · cs.SI · q-bio.PE

Pith reviewed 2026-05-20 21:43 UTC · model grok-4.3

keywords: social norms, social dilemmas, correlated equilibrium, replicator dynamics, Markov games, evolutionary game theory, coordination mechanisms

The pith

Norms that align with optimal single-player strategies produce correlated equilibria in Markov games.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that social norms prescribing actions based on environmental signals resolve dilemmas when those prescriptions match what each agent would choose optimally for its own expected reward. This match makes the joint strategy satisfy a correlated equilibrium condition rather than a Nash equilibrium. The analysis is carried out in the Markov game setting with stochastic transitions and is illustrated by mapping norms across signal and reward spaces. A general solution for the replicator dynamics then explains how such norms can spread and become dominant through evolutionary competition in a population. Readers care because the work supplies a decentralized mechanism for coordination that relies only on individual reward comparisons and shared stochastic observations.

Core claim

Joint player strategies that adopt norms consistent with optimal single-player strategies with respect to expected reward naturally satisfy a correlated-equilibrium condition rather than a Nash-equilibrium condition. In the Markov game setting, the paper supplies a general solution and analysis of the replicator dynamics by which these norms emerge.
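
To make the equilibrium condition concrete, the sketch below checks it for a two-player game of chicken. The payoff values, the action names `stop` and `go`, and the joint distribution `SIGNAL_NORM` (a shared signal that never tells both players to go) are illustrative assumptions, not values from the paper. The check is the standard one: for every player and every recommended action, no alternative action may yield a higher expected payoff given the recommendation.

```python
from itertools import product

ACTIONS = ("stop", "go")

# Hypothetical chicken payoffs as (row player, column player); not from the paper.
PAYOFF = {
    ("stop", "stop"): (6, 6),
    ("stop", "go"): (2, 7),
    ("go", "stop"): (7, 2),
    ("go", "go"): (0, 0),
}


def is_correlated_equilibrium(joint, tol=1e-12):
    """True if no player gains by deviating from any recommended action."""
    for player in (0, 1):
        for recommended, deviation in product(ACTIONS, repeat=2):
            if recommended == deviation:
                continue
            gain = 0.0  # expected gain from playing `deviation` instead
            for other in ACTIONS:
                if player == 0:
                    followed, deviated = (recommended, other), (deviation, other)
                else:
                    followed, deviated = (other, recommended), (other, deviation)
                prob = joint.get(followed, 0.0)
                gain += prob * (PAYOFF[deviated][player] - PAYOFF[followed][player])
            if gain > tol:
                return False
    return True


def is_product_distribution(joint, tol=1e-12):
    """True if the joint distribution factorises into independent marginals."""
    row = {a: sum(joint.get((a, b), 0.0) for b in ACTIONS) for a in ACTIONS}
    col = {b: sum(joint.get((a, b), 0.0) for a in ACTIONS) for b in ACTIONS}
    return all(
        abs(joint.get((a, b), 0.0) - row[a] * col[b]) <= tol
        for a, b in product(ACTIONS, repeat=2)
    )


# Assumed signal-following norm: the signal never recommends (go, go).
SIGNAL_NORM = {("stop", "stop"): 1 / 3, ("stop", "go"): 1 / 3, ("go", "stop"): 1 / 3}
# Unconditional mutual restraint, for contrast.
ALWAYS_STOP = {("stop", "stop"): 1.0}

print(is_correlated_equilibrium(SIGNAL_NORM))
print(is_product_distribution(SIGNAL_NORM))
print(is_correlated_equilibrium(ALWAYS_STOP))
```

Under these assumed payoffs, `SIGNAL_NORM` passes the no-deviation check but is not a product distribution, so it cannot arise from players mixing independently as in a Nash equilibrium. `ALWAYS_STOP` fails the check because a player told to stop, knowing the other stops too, would rather go.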

What carries the argument

Replicator dynamics applied to norms defined over signal and reward spaces that enforce consistency with single-agent optimality.
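
For orientation, the textbook form of the replicator equation is given below. This is the standard equation from evolutionary game theory, offered as an assumption about the general shape of the dynamics; the paper's own eq. (12) is not reproduced on this page and may differ in detail. Here $x_i$ denotes the population frequency of norm $i$, and $A_{ij}$ (a hypothetical symbol) is the expected payoff to an adopter of norm $i$ paired with an adopter of norm $j$.

```latex
\dot{x}_i = x_i \left[ (A x)_i - x^{\top} A x \right],
\qquad i = 1, \dots, n,
\qquad \sum_{i=1}^{n} x_i = 1 .
```

A norm's frequency grows when its expected payoff $(A x)_i$ exceeds the population mean payoff $x^{\top} A x$, and shrinks otherwise.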

If this is right

Norms consistent with single-player optimality spread to dominance under replicator dynamics in populations of rational agents.
The classification of norms by rationality criteria extends from matrix games to stochastic Markov games.
Signal correlation and uncertainty together determine the stability of norm adoption through expected-reward comparisons.
Decentralized norm emergence occurs without central authority or explicit agreements among agents.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same signal-based norm construction could be used to design coordination protocols for multi-agent reinforcement learning systems facing repeated dilemmas.
Empirical tests in human groups could check whether observed coordination rules in uncertain environments match the predicted correlated-equilibrium structure.
Relaxing the assumption of perfect rationality in the replicator dynamics might reveal additional stable norm mixtures in noisy or heterogeneous populations.

Load-bearing premise

Signals from the environment must possess sufficient correlation to enable mutually beneficial coordination while retaining enough uncertainty to disincentivize exploitation of that coordination.
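
One way to quantify the correlation this premise refers to is the mutual information between the two players' signals, the quantity shown in the left panel of Figure 2. The following is a minimal sketch, assuming binary signals and three hypothetical joint distributions; the paper's own parametrization of the signal distribution by (b, g) is not reproduced here.

```python
from math import log2


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as nested lists."""
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                total += p * log2(p / (row_marginal[i] * col_marginal[j]))
    return total


# Hypothetical joint distributions over (own signal, other player's signal).
INDEPENDENT = [[0.25, 0.25], [0.25, 0.25]]  # signals carry no shared information
ANTICORRELATED = [[0.0, 0.5], [0.5, 0.0]]   # one signal fully determines the other
PARTIAL = [[0.1, 0.4], [0.4, 0.1]]          # correlated, with residual uncertainty

for name, joint in [
    ("independent", INDEPENDENT),
    ("anticorrelated", ANTICORRELATED),
    ("partial", PARTIAL),
]:
    print(name, mutual_information(joint))
```

The intermediate case is the one the premise describes: the signals are informative enough about each other to coordinate on, yet a player cannot be sure what the other observed.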

What would settle it

A simulation or analytic calculation in which replicator dynamics in a Markov game with correlated signals fail to increase the frequency of norms that match single-player optima, or in which the resulting joint behavior satisfies Nash equilibrium instead of correlated equilibrium.

Figures

Figures reproduced from arXiv: 2605.18834 by Maximilian Puelma Touzel.

**Figure 1.** Classes of norms. Diagram taken from Morsky and Akçay [2019]. […] and plays the best Nash strategy). This default strategy competes with the prescriptions of norms in any selection dynamics. Norms that are externally valid are called empirically validatable: the prescription is rational against the prescription of another rational norm. If an empirically validatable norm is validated by itself, it is called co…

**Figure 2.** Phase diagram for the rationality of the signal-following norm over the game family of the game of chicken. Left: mutual information between environmental signals o and o′ as a function of the parametrization of P_oo′, (b, g). Center: phase diagram for the general case. The norm is not always rational: when is observed, it is better to go when b > (1 + 2gL)/(1 + 2L) (red). Similarly, when is observed, it bet…

**Figure 3.** Replicator dynamics (eq. (12)) for norms in the game of chicken. Top: State space of norm frequencies in which 30 trajectories (red) are shown starting at frequencies sampled uniformly in the volume (cross markers) (left: b = 1/2, center: b = 1/3, right: b = 1/5). The trajectories end on the signal-following norm (black circle). Rational norms are denoted by the defining pair of o : a. Bottom: Eigenvalue…

Original abstract

By specifying behaviour across multiple agents, social norms are a coordination approach to resolving social dilemmas. Decentralized and wide adoption can be achieved by norms whose prescription involves interpreting stochastic signals in the environment. Such signals must have enough correlation to orchestrate mutually beneficial coordination and enough disincentivizing uncertainty about the benefits of exploiting that coordination. Evolutionary game theory of matrix games has been used to describe how, by rational agents comparing and adopting norms, a norm can evolve to become dominant in a population. Morsky & Akçay (2019) classify norms according to a set of rationality criteria. Joint player strategies that adopt norms that are consistent with optimal single-player strategies with respect to expected reward naturally satisfy a correlated, rather than Nash game theoretic equilibrium condition. Here, we present a version of this theory that clarifies the basic ingredients. We formulate it in the more general Markov game setting more commonly used in reinforcement learning theory. We illustrate the theory by mapping norms over the signal and reward space, while also giving a detailed exposition of the underlying mechanics of the approach. Finally, we give a general solution and analysis of replicator dynamics, which Morsky & Akçay (2019) propose as a means by which these norms could emerge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper formulates social norms as coordination devices in social dilemmas by mapping them over stochastic signals and rewards in a Markov game setting. It argues that norms consistent with single-player optimal expected-reward strategies yield joint strategies satisfying correlated (rather than Nash) equilibrium, illustrates this mapping, and supplies a general solution and analysis of the replicator dynamics by which such norms can emerge via evolutionary adoption.

Significance. If the derivations are sound and the equilibrium claim holds under the stated conditions, the work could connect evolutionary game theory with multi-agent reinforcement learning by showing how decentralized norm adoption produces correlated equilibria in stochastic environments. The replicator-dynamics analysis is presented as a general solution, which would be a useful technical contribution if it is parameter-free or explicitly derived.

major comments (2)

Abstract and central claim: the assertion that norms consistent with optimal single-player strategies 'naturally satisfy a correlated, rather than Nash' equilibrium is load-bearing for the paper's main theoretical contribution. The manuscript does not explicitly state whether the stochastic signals are publicly observed by all players or privately observed. If signals are private (as is common in POMDP-style Markov games), the joint strategy fails the standard definition of correlated equilibrium because a unilateral deviation can exploit the lack of common knowledge of the signal draw. This assumption must be discharged with a precise statement of the information structure and a verification that the no-deviation condition holds.
Replicator-dynamics section: the abstract promises 'a general solution and analysis' of the replicator dynamics, yet the provided description gives no equations, fixed-point characterization, or stability analysis. Without these, it is impossible to assess whether the dynamics indeed drive adoption of the claimed norms or whether the analysis is independent of the equilibrium construction.

minor comments (2)

The abstract references Morsky & Akçay (2019) for the rationality criteria and replicator-dynamics proposal; the manuscript should clarify which elements are direct extensions versus new contributions in the Markov-game setting.
Notation for signal space, reward space, and norm mapping should be introduced with explicit definitions before the equilibrium claim is stated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below and will revise the manuscript to incorporate the necessary clarifications and expansions.

Referee: Abstract and central claim: the assertion that norms consistent with optimal single-player strategies 'naturally satisfy a correlated, rather than Nash' equilibrium is load-bearing for the paper's main theoretical contribution. The manuscript does not explicitly state whether the stochastic signals are publicly observed by all players or privately observed. If signals are private (as is common in POMDP-style Markov games), the joint strategy fails the standard definition of correlated equilibrium because a unilateral deviation can exploit the lack of common knowledge of the signal draw. This assumption must be discharged with a precise statement of the information structure and a verification that the no-deviation condition holds.

Authors: We agree that the information structure requires explicit clarification. In the Markov game formulation used throughout the manuscript, the stochastic signals are part of the publicly observed state, consistent with the standard definition of Markov games (as opposed to partially observable variants). This ensures common knowledge of each signal realization across agents, so that the joint strategy induced by a norm satisfying single-player optimality meets the no-unilateral-deviation condition of correlated equilibrium. We will add a dedicated paragraph in the model section stating the public-observation assumption and verifying the equilibrium property under it.
Revision: yes

Referee: Replicator-dynamics section: the abstract promises 'a general solution and analysis' of the replicator dynamics, yet the provided description gives no equations, fixed-point characterization, or stability analysis. Without these, it is impossible to assess whether the dynamics indeed drive adoption of the claimed norms or whether the analysis is independent of the equilibrium construction.

Authors: We acknowledge that the replicator-dynamics presentation in the current draft is concise and lacks the explicit equations requested. The general solution is obtained by substituting the norm-induced payoff matrix (derived from the signal-reward mapping) into the standard replicator equation for strategy frequencies; the fixed points correspond to the pure norms that are consistent with single-player optimality, and local stability follows from the sign of the eigenvalue associated with the fitness difference. To make this self-contained and independent of the equilibrium construction, we will insert the explicit replicator ODE, the fixed-point characterization, and a brief stability analysis (including the condition under which the optimal-norm fixed point is asymptotically stable) in the revised manuscript.
Revision: yes
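
The procedure described in this response can be sketched numerically. The code below integrates the textbook replicator equation with a forward-Euler scheme and evaluates the fitness-difference eigenvalues at a pure-norm fixed point. The 3×3 payoff matrix `A`, the step size, and the starting frequencies are hypothetical choices made for illustration; they are not the paper's norm-induced payoffs, and the paper's eq. (12) may differ from the textbook form used here.

```python
def replicator_step(x, A, dt):
    """One forward-Euler step of dx_i/dt = x_i * (fitness_i - mean fitness)."""
    n = len(x)
    fitness = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(xi * fi for xi, fi in zip(x, fitness))
    stepped = [xi + dt * xi * (fi - mean_fitness) for xi, fi in zip(x, fitness)]
    total = sum(stepped)
    return [v / total for v in stepped]  # renormalise to guard against drift


def simulate(x0, A, dt=0.01, steps=20000):
    """Iterate the Euler step from initial frequencies x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, A, dt)
    return x


def transverse_eigenvalues(A, k):
    """Eigenvalues of the linearised dynamics at pure norm k, within the simplex.

    Each one is the fitness difference A[i][k] - A[k][k] of a rare norm i
    invading a population of norm k; all negative means asymptotically stable.
    """
    return [A[i][k] - A[k][k] for i in range(len(A)) if i != k]


# Hypothetical payoff matrix: A[i][j] is the payoff to norm i against norm j.
# Norm 0 strictly dominates the others by construction.
A = [
    [4.0, 3.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 1.0, 2.0],
]

final = simulate([0.1, 0.5, 0.4], A)
print(final)
print(transverse_eigenvalues(A, 0))
```

With this matrix both fitness differences at norm 0 are negative, and the trajectory started from a population in which norm 0 is rare ends close to that vertex. A payoff matrix with a positive fitness difference at the same vertex would instead make it unstable.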

Circularity Check

0 steps flagged

Derivation chain is self-contained with independent analysis in Markov game setting

The paper extends the framework from Morsky & Akçay (2019) by reformulating the rationality criteria and norm classification in the more general Markov game setting used in reinforcement learning, mapping norms explicitly over signal and reward space, and supplying a general solution plus analysis of the replicator dynamics that the 2019 work only proposed. The assertion that norms consistent with optimal single-player expected-reward strategies satisfy correlated rather than Nash equilibrium is presented as a direct consequence of the joint-strategy construction under shared stochastic signals, but the paper supplies independent mechanistic exposition and dynamics analysis that do not reduce to the input definitions or the cited criteria by construction. No load-bearing self-citation, fitted-input-as-prediction, or ansatz-smuggling steps are present; the central result retains independent content beyond the 2019 reference.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The theory rests on domain assumptions about rational norm comparison and on the requirement that environmental signals carry both enough correlation to coordinate on and enough uncertainty to deter exploitation; no explicit free parameters or invented entities are named in the abstract.

axioms (2)

domain assumption: Agents rationally compare and adopt norms according to expected reward.
Invoked to justify the use of replicator dynamics as the mechanism of norm spread.
domain assumption: Environmental signals possess sufficient correlation for coordination yet enough uncertainty to deter exploitation.
Stated as a necessary condition for stable norm adoption in social dilemmas.

pith-pipeline@v0.9.0 · 5757 in / 1432 out tokens · 50877 ms · 2026-05-20T21:43:37.077216+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: Joint player strategies that adopt norms that are consistent with optimal single-player strategies with respect to expected reward naturally satisfy a correlated, rather than Nash game theoretic equilibrium condition... replicator dynamics

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: We give a general solution and analysis of replicator dynamics

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1] Bryce Morsky and Erol Akçay. Proceedings of the National Academy of Sciences, 2019. doi:10.1073/pnas.1817095116

[2] Christian Hilbe. Bulletin of Mathematical Biology. doi:10.1007/s11538-010-9608-2

[3] Dean P. Foster and Rakesh V. Vohra. Calibrated Learning and Correlated Equilibrium. 1997. doi:10.1006/game.1997.0595

[4] Sergiu Hart and Andreu Mas-Colell. Econometrica. doi:10.1111/1468-0262.00153

[5] Robert J. Aumann. Correlated Equilibrium as an Expression of Bayesian Rationality.

[6] Michael W. Macy and Andreas Flache. Proceedings of the National Academy of Sciences, 2002. doi:10.1073/pnas.092080099

[7] Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 2017.

[8] Robert Axelrod. An Evolutionary Approach to Norms.

[9] Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Andrea Celli, and Tuomas Sandholm. Proceedings of the 23rd ACM Conference on Economics and Computation, 2022. doi:10.1145/3490486.3538288

[10] Amy Greenwald and Keith Hall. Proceedings of the Twentieth International Conference on Machine Learning, 2003.

[11] Herbert Gintis. Politics, Philosophy & Economics, 2010. doi:10.1177/1470594x09345474

[12] Petter Törnberg. Proceedings of the National Academy of Sciences, 2022. doi:10.1073/pnas.2207159119

[13] Evolutionary Considerations in the Framing of Social Norms. Politics, Philosophy and Economics, 2010. doi:10.1177/1470594x09339744

[14] Eugene Vinitsky and K… A Learning Agent That Acquires Social Norms from Public Sanctions in Decentralized Multi-Agent Settings. Collective Intelligence. doi:10.1177/26339137231162025

[15] Raphael Köster, Dylan Hadfield-Menell, Richard Everett, Laura Weidinger, Gillian K. Hadfield, and Joel Z. Leibo. Proceedings of the National Academy of Sciences, 2022. doi:10.1073/pnas.2106028118
