Convergence of Replicator Dynamics in the Repeated Prisoner's Dilemma with Restarts

Benedict Russell; Chin-wing Leung; Paolo Turrini

arxiv: 2606.18965 · v1 · pith:754DBVKPnew · submitted 2026-06-17 · 💻 cs.GT


Pith reviewed 2026-06-26 18:53 UTC · model grok-4.3

classification 💻 cs.GT

keywords replicator dynamics · repeated prisoner's dilemma · trigger-restart mechanism · cooperation · strategy length · hazing period · stable sequences · basins of attraction


The pith

Increasing strategy length enables cooperation to emerge and stabilise under replicator dynamics in the repeated Prisoner's Dilemma with restarts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models a well-mixed population of agents playing repeated Prisoner's Dilemma where the interaction restarts whenever the two partners choose different actions. Agents are restricted to pure strategies that are fixed sequences of length m. The central result is that raising m makes cooperative outcomes reachable and stable under replicator dynamics. An exact count of the stable sequences shows that every such sequence begins with a run of defection before switching to permanent cooperation; sequences with longer initial defection runs attract larger fractions of the population even when they deliver lower long-run payoffs.
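The restart rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's own formulation: it assumes that a disagreement at position k sends both partners back to the start of their sequences (so the first k+1 rounds repeat forever), that a pair who never disagree repeat their final joint action indefinitely, and that payoffs are discounted by a factor `gamma`. The function names, the default payoffs (T=5, R=3, P=1, S=0) and the discounting are all hypothetical choices made for the example.

```python
def stage_payoff(me: str, other: str, T: float, R: float, P: float, S: float) -> float:
    """One-shot Prisoner's Dilemma payoff to `me`; actions are 'C' or 'D'."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[(me, other)]


def restart_payoff(a: str, b: str, gamma: float = 0.9,
                   T: float = 5.0, R: float = 3.0, P: float = 1.0, S: float = 0.0) -> float:
    """Discounted payoff to sequence `a` against sequence `b` (assumed restart model)."""
    if not a or len(a) != len(b) or not 0.0 < gamma < 1.0:
        raise ValueError("need equal-length, non-empty sequences and gamma in (0, 1)")
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        total += gamma ** k * stage_payoff(x, y, T, R, P, S)
        if x != y:
            # Disagreement at position k: the same k+1 rounds repeat forever,
            # so the partial sum is scaled by a geometric series in gamma^(k+1).
            return total / (1.0 - gamma ** (k + 1))
    # No disagreement: the final joint action is repeated indefinitely,
    # which adds a geometric tail starting at round m.
    m = len(a)
    return total + gamma ** m * stage_payoff(a[-1], b[-1], T, R, P, S) / (1.0 - gamma)
```

For instance, `restart_payoff("DDC", "DDC")` gives the payoff of a sequence with a two-round hazing period against itself, while `restart_payoff("DDD", "DDC")` gives the payoff of an always-defect deviant against it.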

Core claim

Formulating the corresponding parametrised normal-form game, with agents each adopting a length-m strategy sequence, we show that increasing the strategy length enables cooperation to emerge and stabilise. We provide exact convergence guarantees for restricted strategy lengths and, in the general payoff configuration, provide the necessary parametric conditions for the stability of cooperative strategies. By deriving an exact formula for the number of stable sequences, we find structural properties necessary for stability, as agents must learn to initially defect - the so-called "hazing period" - before cooperating indefinitely. Our analysis shows that, while optimal cooperative sequences exist, agents favour less-optimal sequences with a longer hazing period, which possess larger basins of attraction.

What carries the argument

Length-m strategy sequences in the parametrised normal-form game obtained from the trigger-restart mechanism

If this is right

Cooperation emerges and stabilises once strategy length is increased.
Every stable cooperative sequence must contain an initial run of defection before indefinite cooperation.
Sequences with longer initial defection runs possess larger basins of attraction.
Exact formulas exist for the number of stable sequences under the trigger-restart rule.
Parametric conditions on payoffs determine which cooperative sequences are stable.
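The basin-of-attraction claim in the list above can be probed numerically. The sketch below is a rough heuristic under assumptions, not the paper's method: it reuses an assumed restart payoff model (disagreement restarts both sequences, full agreement repeats the last joint action, discount factor `GAMMA`), illustrative payoff values, forward-Euler integration of the replicator equation, and a "largest final share" rule for assigning each random initial state to a sequence. All names and constants are hypothetical.

```python
import random
from itertools import product

# Illustrative Prisoner's Dilemma payoffs and discount factor (assumed values).
T, R, P, S, GAMMA = 5.0, 3.0, 1.0, 0.0, 0.9
STAGE = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def restart_payoff(a, b, gamma=GAMMA):
    """Discounted payoff to sequence a against b under the assumed restart rule."""
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        total += gamma ** k * STAGE[(x, y)]
        if x != y:  # disagreement: the first k+1 rounds repeat forever
            return total / (1.0 - gamma ** (k + 1))
    # agreement throughout: the last joint action repeats forever
    return total + gamma ** len(a) * STAGE[(a[-1], b[-1])] / (1.0 - gamma)


def payoff_matrix(m):
    """All 2^m sequences and the matrix A[i][j] = payoff of sequence i against j."""
    seqs = ["".join(p) for p in product("CD", repeat=m)]
    return seqs, [[restart_payoff(a, b) for b in seqs] for a in seqs]


def integrate(x, A, steps=2000, dt=0.01):
    """Forward-Euler integration of the replicator equation from state x."""
    for _ in range(steps):
        fit = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        mean = sum(f * xi for f, xi in zip(fit, x))
        x = [max(xi + dt * xi * (f - mean), 0.0) for xi, f in zip(x, fit)]
        total = sum(x)
        x = [xi / total for xi in x]  # guard against round-off drift
    return x


def estimate_basins(m, samples=50, steps=2000, dt=0.01, seed=0):
    """Fraction of random initial states whose largest final share is each sequence."""
    rng = random.Random(seed)
    seqs, A = payoff_matrix(m)
    counts = {}
    for _ in range(samples):
        raw = [rng.expovariate(1.0) for _ in seqs]  # normalised: uniform on the simplex
        x = integrate([r / sum(raw) for r in raw], A, steps, dt)
        winner = seqs[max(range(len(x)), key=x.__getitem__)]
        counts[winner] = counts.get(winner, 0) + 1
    return {s: c / samples for s, c in counts.items()}
```

Calling `estimate_basins(4)` returns a dictionary mapping sequences to estimated basin fractions. The estimates depend on the sample size, step count and payoff values chosen, so they are only a coarse check on the ordering of basin sizes.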

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same restart rule and length-m representation could be applied to other repeated social dilemmas to check whether longer sequences likewise enlarge the set of stable cooperative outcomes.
If populations can evolve the length of their strategies, selection may first increase memory before selecting among the cooperative sequences.
Finite-population or stochastic simulations could test whether the larger basins identified for longer-hazing sequences survive when mutation and drift are added.

Load-bearing premise

Every agent is restricted to a pure length-m strategy sequence in the parametrised normal-form game obtained from the trigger-restart mechanism.

What would settle it

A calculation or simulation in which the number of stable cooperative sequences fails to increase with m or in which no cooperative sequences remain stable once m is large.
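One way to run such a calculation is sketched below. It is a proxy under stated assumptions rather than the paper's count: it assumes a restart payoff model in which a disagreement restarts both sequences and full agreement repeats the last joint action, uses illustrative payoffs and discount factor, and treats a sequence as stable when it is a strict symmetric Nash equilibrium of the resulting normal-form game (strict Nash equilibria are asymptotically stable under replicator dynamics, but the paper's stability notion may be broader). The function names are hypothetical.

```python
from itertools import product


def restart_payoff(a, b, gamma, T, R, P, S):
    """Discounted payoff to sequence a against b under the assumed restart rule."""
    stage = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        total += gamma ** k * stage[(x, y)]
        if x != y:  # disagreement: the first k+1 rounds repeat forever
            return total / (1.0 - gamma ** (k + 1))
    # agreement throughout: the last joint action repeats forever
    return total + gamma ** len(a) * stage[(a[-1], b[-1])] / (1.0 - gamma)


def strict_nash_sequences(m, gamma=0.9, T=5.0, R=3.0, P=1.0, S=0.0):
    """Length-m sequences that strictly beat every deviant when played against themselves."""
    seqs = ["".join(p) for p in product("CD", repeat=m)]
    stable = []
    for s in seqs:
        own = restart_payoff(s, s, gamma, T, R, P, S)
        if all(restart_payoff(t, s, gamma, T, R, P, S) < own for t in seqs if t != s):
            stable.append(s)
    return stable


def hazing_length(seq):
    """Number of leading defections in a sequence."""
    return len(seq) - len(seq.lstrip("D"))
```

Looping `strict_nash_sequences(m)` over increasing m and recording `len(...)` together with `hazing_length` of each result gives a direct numerical check. With the illustrative values used here, m = 2 leaves only the all-defect sequence, while m = 3 adds `DDC`, which defects twice before cooperating.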

Original abstract

We investigate a population of self-interested agents playing a repeated Prisoner's Dilemma under the trigger-restart mechanism. Under such a mechanism, agents play a sequence of symmetric games with their partner, and restart the interaction if their actions disagree. Our work focuses on the convergence of replicator dynamics in a well-mixed population of agents, where the emergence of cooperation is challenged by the individual incentive for exploitation. Formulating the corresponding parametrised normal-form game, with agents each adopting a length-m strategy sequence, we show that increasing the strategy length enables cooperation to emerge and stabilise. We provide exact convergence guarantees for restricted strategy lengths and, in the general payoff configuration, provide the necessary parametric conditions for the stability of cooperative strategies. By deriving an exact formula for the number of stable sequences, we find structural properties necessary for stability, as agents must learn to initially defect - the so-called "hazing period" - before cooperating indefinitely. Our analysis shows that, while optimal cooperative sequences exist, agents favour less-optimal sequences with a longer hazing period, which possess larger basins of attraction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives an exact count of stable sequences under replicator dynamics in this restart PD and links cooperation to a required initial defection phase, but all results stay inside the class of fixed length-m pure strategies.


The paper's main result is that in this trigger-restart repeated Prisoner's Dilemma, stable cooperative outcomes under replicator dynamics require an initial period of defection before switching to mutual cooperation, and that increasing the strategy length m produces more such stable sequences with larger basins of attraction. They give an exact formula for the number of stable sequences.

What the paper does well is lay out the parametrised normal-form game clearly and then derive the stability conditions and the count. The observation that agents prefer less-optimal sequences with longer hazing periods because of bigger basins is a nice structural point that could help organize thinking in this area.

The soft spot is the modeling restriction to pure length-m strategy sequences. The analysis stays inside that class, so the convergence guarantees and basin sizes apply only when every agent is locked into one fixed sequence of that length. Strategies that adapt based on whether a restart happened or that use different lengths are left out. That makes the emergence claim conditional on the truncation being representative, which the paper does not really argue.

The derivations are presented as exact. If the full paper supplies the proofs and the payoff matrices, that would make the claims checkable. The stress-test concern lands because the abstract states the restriction upfront.

This is for people working on evolutionary game theory and multi-agent systems who look at repeated games with specific restart or termination rules. The quantitative results on basins and the hazing property could be of interest to that group.

I would send this to peer review so the derivations can be checked and the scope of the results clarified.

Referee Report

2 major / 1 minor

Summary. The paper studies replicator dynamics for a population playing the repeated Prisoner's Dilemma under a trigger-restart mechanism. It formulates the interaction as a parametrised normal-form game in which each agent is restricted to a pure strategy that is a fixed-length-m sequence of actions, derives exact convergence guarantees for small m, supplies parametric stability conditions in the general case, and gives an exact formula for the number of stable sequences. The central claims are that increasing m permits stable cooperation to emerge, that every stable sequence must begin with a 'hazing period' of defection before indefinite cooperation, and that sequences with longer hazing periods, although payoff-suboptimal, possess larger basins of attraction.

Significance. If the derivations are correct, the work supplies a mathematically precise characterisation of stability and basin sizes inside a deliberately truncated strategy space. The explicit count of stable sequences and the identification of the hazing-period structural property constitute concrete, falsifiable predictions that could be tested numerically or experimentally within the same model class.

major comments (2)

[Abstract / modeling section] Abstract and modeling formulation: the central claim that 'increasing the strategy length enables cooperation to emerge and stabilise' is derived entirely under the restriction that every agent adopts a pure length-m sequence in the trigger-restart normal-form game. Because the replicator dynamics, stability conditions, and basin-size comparisons are obtained only inside this class, the emergence result does not automatically extend to agents that can condition on the restart trigger itself or employ variable-length or history-dependent rules outside the m-sequence truncation. A justification or sensitivity analysis for this modeling choice is required for the claim to be load-bearing.
[Abstract] The abstract asserts 'exact convergence guarantees for restricted strategy lengths' and 'an exact formula for the number of stable sequences,' yet the provided text supplies neither the payoff matrix entries nor the derivation steps that would allow verification of these formulas. Without the explicit mapping from the trigger-restart rule to the normal-form payoffs or the replicator equations, it is impossible to confirm that the reported stability conditions and basin comparisons are free of algebraic error.

minor comments (1)

Notation for the length-m sequences and the restart trigger should be introduced with a small example (e.g., m=2) before the general case to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive suggestions. We address each major comment below and have revised the manuscript to strengthen the presentation of our modeling assumptions and to improve the verifiability of our derivations.


Referee: [Abstract / modeling section] Abstract and modeling formulation: the central claim that 'increasing the strategy length enables cooperation to emerge and stabilise' is derived entirely under the restriction that every agent adopts a pure length-m sequence in the trigger-restart normal-form game. Because the replicator dynamics, stability conditions, and basin-size comparisons are obtained only inside this class, the emergence result does not automatically extend to agents that can condition on the restart trigger itself or employ variable-length or history-dependent rules outside the m-sequence truncation. A justification or sensitivity analysis for this modeling choice is required for the claim to be load-bearing.

Authors: We agree that all results are obtained strictly inside the fixed-length-m pure-strategy truncation of the trigger-restart game. This restriction is deliberate: it permits an exact normal-form representation and closed-form stability analysis that would be intractable for arbitrary history-dependent or variable-length strategies. The abstract already states the restriction explicitly (“with agents each adopting a length-m strategy sequence”). We have added a new paragraph in Section 2 explaining the modeling rationale, namely that the truncation isolates the effect of increasing memory length while keeping the strategy space finite and the replicator dynamics analytically tractable, and we briefly discuss how conditioning on the restart trigger would require a qualitatively different state space. No sensitivity analysis across broader classes is provided, as that lies outside the paper’s scope.
revision: yes
Referee: [Abstract] The abstract asserts 'exact convergence guarantees for restricted strategy lengths' and 'an exact formula for the number of stable sequences,' yet the provided text supplies neither the payoff matrix entries nor the derivation steps that would allow verification of these formulas. Without the explicit mapping from the trigger-restart rule to the normal-form payoffs or the replicator equations, it is impossible to confirm that the reported stability conditions and basin comparisons are free of algebraic error.

Authors: The full manuscript derives the payoff matrix in Section 3 and supplies the replicator equations together with the stability conditions in Section 4; the exact count of stable sequences appears as Theorem 5. To make these derivations immediately verifiable, we have inserted a concrete payoff-matrix example for m=2 in the main text and expanded the appendix with the step-by-step mapping from the trigger-restart rule to the normal-form entries, followed by the algebraic verification of the stability thresholds. These additions allow direct checking of the formulas without altering any results.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper sets up an explicit parametrised normal-form game from the trigger-restart repeated PD with the modeling restriction to pure length-m strategy sequences, then derives replicator dynamics convergence, stability conditions, and an exact count of stable sequences directly from the resulting payoff structure and dynamics equations. No fitted parameters are renamed as predictions, no self-citations bear the load of uniqueness or ansatzes, and no step reduces by construction to its own inputs; the hazing-period property and basin-size comparisons follow from analysis of the constructed game rather than being presupposed.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard replicator-dynamics model applied to a normal-form game whose strategies are restricted to length-m sequences and whose payoffs are determined by the trigger-restart rule; m and the underlying PD payoffs function as free parameters.
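For reference, the standard replicator equation that this ledger refers to can be written as follows. The notation is an assumption made here for illustration: x_i denotes the population share of the i-th of the 2^m pure sequences, and A is the payoff matrix of the normal-form game induced by the trigger-restart rule; the paper may use different symbols.

```latex
\dot{x}_i \;=\; x_i \left[ (A x)_i - x^{\top} A x \right],
\qquad i = 1, \dots, 2^{m},
\qquad \sum_{i=1}^{2^{m}} x_i = 1 .
```

A sequence's share grows exactly when its expected payoff (A x)_i against the current population exceeds the population average x^T A x, which is why both m (through the size of A) and the PD payoffs (through its entries) act as the free parameters listed below.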

free parameters (2)

m (strategy length)
Controls the set of admissible strategies; the paper shows results depend on increasing m.
PD payoff parameters
The game is explicitly parametrised; stability conditions are stated in terms of these values.

axioms (2)

domain assumption Population strategy frequencies evolve according to the replicator dynamics equation in a well-mixed population.
Invoked throughout the convergence analysis (abstract: 'convergence of replicator dynamics in a well-mixed population').
domain assumption Every agent is restricted to a pure strategy that is a fixed sequence of length m.
The formulation step that turns the repeated game into a normal-form game (abstract: 'with agents each adopting a length-m strategy sequence').




[1] [1]

Science162(3859), 1243–1248 (1968)

Hardin, G.: The tragedy of the commons. Science162(3859), 1243–1248 (1968)

1968

[2] [2]

The Review of Economic Studies38(1), 1–12 (1971)

Friedman, J.W.: A non-cooperative equilibrium for supergames. The Review of Economic Studies38(1), 1–12 (1971)

1971

[3] [3]

Journal of Conflict Resolution24(1), 3–25 (1980) https://doi.org/10.1177/002200278002400101

Axelrod, R.: Effective choice in the prisoner’s dilemma. Journal of Conflict Resolution24(1), 3–25 (1980) https://doi.org/10.1177/002200278002400101

work page doi:10.1177/002200278002400101 1980

[4] [4]

In: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence

Berker, R.E., Conitzer, V.: Computing optimal equilibria in repeated games with restarts. In: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. IJCAI ’24, pp. 2669–2677, Jeju, Korea (2024). https://doi. org/10.24963/ijcai.2024/295 .https://doi.org/10.24963/ijcai.2024/295

work page doi:10.24963/ijcai.2024/295 2024

[5] [5]

In: Proceedings of the Thirty-Fourth International Joint Conference 22 on Artificial Intelligence

Fleischmann, H., Fragkia, K., Berker, R.E.: Beyond symmetry in repeated games with restarts. In: Proceedings of the Thirty-Fourth International Joint Conference 22 on Artificial Intelligence. IJCAI ’25 (2025). https://doi.org/10.24963/ijcai.2025/ 430 .https://doi.org/10.24963/ijcai.2025/430

work page doi:10.24963/ijcai.2025/ 2025

[6] [6]

Applied Mathematics and Computation444, 127819 (2023) https://doi.org/10.1016/j.amc.2022.127819

Ueda, M.: Memory-two strategies forming symmetric mutual reinforcement learn- ing equilibrium in repeated prisoners’ dilemma game. Applied Mathematics and Computation444, 127819 (2023) https://doi.org/10.1016/j.amc.2022.127819

work page doi:10.1016/j.amc.2022.127819 2023

[7] [7]

Proceedings of the National Academy of Sciences 114(18), 4715–4720 (2017) https://doi.org/10.1073/pnas.1621239114

Hilbe, C., Martinez-Vaquero, L.A., Chatterjee, K., Nowak, M.A.: Memory-n strategies of direct reciprocity. Proceedings of the National Academy of Sciences 114(18), 4715–4720 (2017) https://doi.org/10.1073/pnas.1621239114

work page doi:10.1073/pnas.1621239114 2017

[8] [8]

arXiv preprint arXiv:2403.03497 (2024)

Zhang, F., Wu, T., Wang, L.: Adaptive coordination promotes collective cooper- ation in repeated social dilemmas. arXiv preprint arXiv:2403.03497 (2024)

arXiv 2024

[9] [9]

Anastassacos, N., Hailes, S., Musolesi, M.: Partner Selection for the Emer- gence of Cooperation in Multi-Agent Systems Using Reinforcement Learning. (2020). Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). https://aaai.org/Library/conferences-library.php

2020

[10] [10]

In: 25th International Conference on Autonomous Agents and Multiagent Systems

Russell, B., Leung, C.-w., Turrini, P.: Defection at first sight : learning part- ner selection in optional social dilemmas without prior information. In: 25th International Conference on Autonomous Agents and Multiagent Systems. IFAA- MAS; ACM Digital library (2026). https://doi.org/10.65109/IBSZ1473 . In Press. https://doi.org/10.65109/IBSZ1473

work page doi:10.65109/ibsz1473 2026

[11] [11]

https://arxiv.org/abs/2605.18185

Russell, B., Leung, C.-w., Turrini, P.: The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection (2026). https://arxiv.org/abs/2605.18185

Pith/arXiv arXiv 2026

[12] [12]

The Review of Economic Studies76(3), 993–1021 (2009)

Fujiwara-Greve, T., Okuno-Fujiwara, M.: Voluntarily separable repeated pris- oner’s dilemma. The Review of Economic Studies76(3), 993–1021 (2009)

2009

[13]

Izquierdo, L.R., Izquierdo, S.S., Vega-Redondo, F.: Leave and let leave: A sufficient condition to explain the evolutionary emergence of cooperation. Journal of Economic Dynamics and Control 46, 91–113 (2014) https://doi.org/10.1016/j.jedc.2014.06.007

[14]

Barclay, P., Willer, R.: Partner choice creates competitive altruism in humans. Proceedings of the Royal Society B: Biological Sciences 274(1610), 749–753 (2007)

[15]

Rand, D.G., Arbesman, S., Christakis, N.A.: Dynamic social networks promote cooperation in experiments with humans. Proceedings of the National Academy of Sciences 108(48), 19193–19198 (2011) https://doi.org/10.1073/pnas.1108243108

[16]

Wang, J., Suri, S., Watts, D.J.: Cooperation and assortativity with dynamic partner updating. Proceedings of the National Academy of Sciences 109(36), 14363–14368 (2012) https://doi.org/10.1073/pnas.1120867109

[17]

Zhang, B.-Y., Fan, S.-J., Li, C., Zheng, X.-D., Bao, J.-Z., Cressman, R., Tao, Y.: Opting out against defection leads to stable coexistence with cooperation. Scientific Reports 6, 35902 (2016) https://doi.org/10.1038/srep35902

[18]

Segbroeck, S.V., Santos, F.C., Nowé, A., Pacheco, J.M., Lenaerts, T.: The coevolution of loyalty and cooperation. In: Proceedings of the 11th IEEE Congress on Evolutionary Computation (CEC’09) (2009)

[19]

Sylwester, K., Roberts, G.: Cooperators benefit through reputation-based partner choice in economic games. Biology Letters 6(5), 659–662 (2010)

[20]

Zheng, X.-D., Li, C., Yu, J.-R., Wang, S.-C., Fan, S.-J., Zhang, B.-Y., Tao, Y.: A simple rule of direct reciprocity leads to the stable coexistence of cooperation and defection in the Prisoner’s Dilemma game. Journal of Theoretical Biology 420, 12–17 (2017) https://doi.org/10.1016/j.jtbi.2017.02.036

[21]

Bara, J., Turrini, P., Andrighetto, G.: Enabling imitation-based cooperation in dynamic social networks. Auton. Agents Multi Agent Syst. 36(2), 34 (2022) https://doi.org/10.1007/s10458-022-09562-w

[22]

Russell, B., Nugent, A., Bara, J.: Mean-field imitation dynamics on fast assortative networks (2026). https://arxiv.org/abs/2606.11892

[23]

Graser, C., Fujiwara-Greve, T., García, J., Van Veelen, M.: Repeated games with partner choice. PLOS Computational Biology 21(2), 1012810 (2025)

[24]

Leung, C., Turrini, P.: Learning partner selection rules that sustain cooperation in social dilemmas with the option of opting out. AAMAS ’24, pp. 1110–1118. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2024)

[25]

Leung, C., Lenaerts, T., Turrini, P.: To promote full cooperation in social dilemmas, agents need to unlearn loyalty. In: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pp. 111–119. ijcai.org (2024). https://www.ijcai.org/proceedings/2024/13

[26]

Sabater, J., Sierra, C.: Reputation and social network analysis in multi-agent systems. In: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems Part 1 - AAMAS ’02. ACM Press, New York, New York, USA (2002)

[27]

Pujol, J.M., Sangüesa, R., Delgado, J.: Extracting reputation in multi agent systems by means of social network topology. In: Proceedings of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS’02) (2002)

[28]

Pinninck, A., Sierra, C., Schorlemmer, M.: A multiagent network for peer norm enforcement. Autonomous Agents and Multi-Agent Systems 21(3), 397–424 (2010) https://doi.org/10.1007/s10458-009-9107-8

[29]

Smit, M., Santos, F.P.: Learning fair cooperation in mixed-motive games with indirect reciprocity. In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI’24) (2024)

[30]

Ren, T., Yao, X., Li, Y., Zeng, X.-J.: Bottom-up reputation promotes cooperation with multi-agent reinforcement learning. In: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS’25) (2025)

[31]

Santos, F.P., Pacheco, J.M., Santos, F.C.: Social norms of cooperation with costly reputation building. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’18) (2018)

[32]

Taylor, P.D., Jonker, L.B.: Evolutionary stable strategies and game dynamics. Mathematical Biosciences 40(1-2), 145–156 (1978)

[33]

Miekisz, J., Ramsza, M.: Replicator dynamics of symmetric ultimatum game. Dynamic Games and Applications 2 (2012) https://doi.org/10.1007/s13235-012-0044-9

[34]

Ohtsuki, H., Nowak, M.A.: The replicator equation on graphs. Journal of Theoretical Biology 243(1), 86–97 (2006) https://doi.org/10.1016/j.jtbi.2006.06.004

[35]

Ramazi, P., Cao, M.: Global convergence for replicator dynamics of repeated snowdrift games. IEEE Transactions on Automatic Control 66(1), 291–298 (2021) https://doi.org/10.1109/TAC.2020.2975811

[36]

Moran, P.A.P.: The Statistical Processes of Evolutionary Theory. Clarendon Press, Oxford (1962)

[37]

Nowak, M.A., Sasaki, A., Taylor, C., Fudenberg, D.: Emergence of cooperation and evolutionary stability in finite populations. Nature 428(6983), 646–650 (2004)

[38]

Khalil, H.K.: Nonlinear Systems. Prentice Hall, Upper Saddle River, N.J. (2002)

[39]

Sigmund, K.: In: Casti, J.L., Karlqvist, A. (eds.) A Survey of Replicator Equations, pp. 88–104. Springer, Berlin, Heidelberg (1986). https://doi.org/10.1007/978-3-642-70953-1

[40]

Collevecchio, A., Mimun, H.A., Quattropani, M., Scarsini, M.: Basins of attraction in two-player random ordinal potential games. arXiv preprint arXiv:2407.05460 (2024)

[41]

Monderer, D., Shapley, L.: Potential games. Games and Economic Behavior 14, 124–143 (1996) https://doi.org/10.1006/game.1996.0044

[42]

Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics. Cambridge University Press (1998)

Appendix A Additional Derivations

We present the full algebraic derivation of the manifold between the two strategies $s_D$ and $s_{LC}$.

$$\phi(\gamma, B, m) = \frac{A_{LC,LC} - A_{D,LC}}{A_{D,D} - A_{LC,D}} \tag{A1}$$

$$= \frac{\displaystyle \sum_{j=0}^{m-2} P\gamma^{j} + \frac{R\gamma^{m-1}}{1-\gamma} - \frac{\sum_{j=0}^{m-2} P\gamma^{j} + T\gamma^{m-1}}{1-\gamma^{m}}}{\displaystyle \frac{P}{1-\gamma} - \sum_{j=0}^{m-2} \dots}$$