pith.

arXiv: 2606.18965 · v1 · pith:754DBVKP · submitted 2026-06-17 · 💻 cs.GT

Convergence of Replicator Dynamics in the Repeated Prisoner's Dilemma with Restarts

Pith reviewed 2026-06-26 18:53 UTC · model grok-4.3

classification 💻 cs.GT
keywords replicator dynamics · repeated prisoner's dilemma · trigger-restart mechanism · cooperation · strategy length · hazing period · stable sequences · basins of attraction

The pith

Increasing strategy length enables cooperation to emerge and stabilise under replicator dynamics in the repeated Prisoner's Dilemma with restarts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models a well-mixed population of agents playing the repeated Prisoner's Dilemma, in which the interaction restarts whenever the two partners choose different actions. Agents are restricted to pure strategies that are fixed action sequences of length m. The central result is that raising m makes cooperative outcomes reachable and stable under replicator dynamics. An exact count of the stable sequences shows that every stable cooperative sequence begins with a run of defection (the "hazing period") before switching to permanent cooperation. Sequences with longer initial defection runs have larger basins of attraction, so more initial population states converge to them, even though they deliver lower long-run payoffs.
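Here is a minimal Python sketch of the restart rule that traces two fixed sequences round by round. It rests on assumptions the summary does not spell out. Actions are 'C' or 'D'. Once a sequence reaches its end, it repeats its final action indefinitely. After any round in which the partners' actions differ, both return to the first entry of their sequences. The function `play_trace` is a hypothetical name chosen for illustration, not the paper's code.

```python
from typing import List, Tuple


def play_trace(seq_a: str, seq_b: str, rounds: int) -> List[Tuple[str, str, bool]]:
    """Play two length-m sequences against each other and record each round.

    Assumed (hypothetical) semantics:
    - position k plays seq[min(k, m - 1)], so the final action repeats forever;
    - if the two actions differ, both partners restart at position 0 next round.
    Each entry is (action_a, action_b, restart_triggered).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("both sequences must have the same length m")
    m = len(seq_a)
    trace = []
    pos = 0  # rounds played since the last restart
    for _ in range(rounds):
        a = seq_a[min(pos, m - 1)]
        b = seq_b[min(pos, m - 1)]
        restart = a != b
        trace.append((a, b, restart))
        pos = 0 if restart else pos + 1
    return trace


for a, b, restart in play_trace("DD", "DC", 5):
    print(a, b, "restart" if restart else "")
```

Under these assumptions, DD against DC triggers a restart every second round. DC against itself gets through its one-round hazing period and then cooperates for good.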

Core claim

Formulating the corresponding parametrised normal-form game, with agents each adopting a length-m strategy sequence, we show that increasing the strategy length enables cooperation to emerge and stabilise. We provide exact convergence guarantees for restricted strategy lengths and, in the general payoff configuration, provide the necessary parametric conditions for the stability of cooperative strategies. By deriving an exact formula for the number of stable sequences, we find structural properties necessary for stability, as agents must learn to initially defect - the so-called "hazing period" - before cooperating indefinitely. Our analysis shows that, while optimal cooperative sequences exist, agents favour less-optimal sequences with a longer hazing period, which possess larger basins of attraction.

What carries the argument

Length-m strategy sequences in the parametrised normal-form game obtained from the trigger-restart mechanism
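The parametrised normal-form game can be sketched by enumerating all 2^m pure sequences and filling the payoff matrix in closed form. This sketch makes several assumptions of its own. A sequence repeats its final action once it reaches the end, and any disagreement sends both partners back to the first entry. Payoffs are discounted exponentially by `gamma`, and the stage payoffs are labelled T, R, P, S. None of these names, nor the helpers `restart_payoff` and `payoff_matrix`, come from the paper. If two sequences first disagree at round k, rounds 0 to k recur forever, so the value is their discounted sum divided by 1 - gamma**(k + 1). Identical sequences instead end in a geometric tail.

```python
from itertools import product
from typing import Dict, List, Tuple


def restart_payoff(x: str, y: str, pd: Dict[str, float], gamma: float) -> float:
    """Discounted payoff to sequence x against y under the assumed restart rule."""
    u = {("C", "C"): pd["R"], ("C", "D"): pd["S"],
         ("D", "C"): pd["T"], ("D", "D"): pd["P"]}
    m = len(x)
    block = 0.0  # discounted payoff of the rounds played before round k
    for k in range(m):
        step = gamma ** k * u[(x[k], y[k])]
        if x[k] != y[k]:
            # Rounds 0..k recur after every restart: V = block + step + gamma^(k+1) V.
            return (block + step) / (1.0 - gamma ** (k + 1))
        if k < m - 1:
            block += step
    # x == y: the final joint action recurs forever from round m - 1 onwards.
    return block + gamma ** (m - 1) * u[(x[-1], y[-1])] / (1.0 - gamma)


def payoff_matrix(m: int, pd: Dict[str, float], gamma: float) -> Tuple[List[str], List[List[float]]]:
    """Enumerate all 2^m sequences and build A[i][j] = payoff of seqs[i] against seqs[j]."""
    seqs = ["".join(s) for s in product("CD", repeat=m)]
    return seqs, [[restart_payoff(x, y, pd, gamma) for y in seqs] for x in seqs]


seqs, A = payoff_matrix(2, {"T": 4, "R": 3, "P": 1, "S": 0}, gamma=0.9)
for name, row in zip(seqs, A):
    print(name, [round(v, 2) for v in row])
```

The closed form avoids simulating infinitely many rounds. Because the restart returns the pair to an identical state, a single geometric recursion captures the whole interaction.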

If this is right

  • Cooperation emerges and stabilises once strategy length is increased.
  • Every stable cooperative sequence must contain an initial run of defection before indefinite cooperation.
  • Sequences with longer initial defection runs possess larger basins of attraction.
  • An exact formula gives the number of stable sequences under the trigger-restart rule.
  • Parametric conditions on payoffs determine which cooperative sequences are stable.
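The following worked derivation for m = 2 illustrates the kind of parametric condition the bullets describe. It uses assumptions made here, not taken from the paper. Each sequence repeats its final action indefinitely, and a disagreement sends both partners back to the first entry. Payoffs are discounted by γ, and the stage game satisfies T > R > P > S. The derivation asks when the one-round hazing sequence DC is a strict Nash equilibrium, which is sufficient for asymptotic stability under replicator dynamics. It is a hedged example, not the paper's stated condition.

```latex
\begin{aligned}
A_{DC,DC} &= P + \frac{\gamma R}{1-\gamma}, \qquad
A_{DD,DC} = \frac{P + \gamma T}{1-\gamma^{2}}, \qquad
A_{CC,DC} = A_{CD,DC} = \frac{S}{1-\gamma},\\[4pt]
A_{DC,DC} > A_{DD,DC}
&\iff (1-\gamma^{2})P + \gamma(1+\gamma)R > P + \gamma T
\iff R + \gamma(R-P) > T
\iff \gamma > \frac{T-R}{R-P},\\[4pt]
A_{DC,DC} > \frac{S}{1-\gamma}
&\iff (1-\gamma)P + \gamma R > S \quad \text{(always true, since } P > S \text{ and } R > S\text{)}.
\end{aligned}
```

With T = 4, R = 3, P = 1 the threshold is γ > 1/2, so DC is strictly stable at γ = 0.9. With T = 5 the threshold becomes γ > 1, so no discount factor below 1 would stabilise DC.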

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same restart rule and length-m representation could be applied to other repeated social dilemmas to check whether longer sequences likewise enlarge the set of stable cooperative outcomes.
  • If populations can evolve the length of their strategies, selection may first increase memory before selecting among the cooperative sequences.
  • Finite-population or stochastic simulations could test whether the larger basins identified for longer-hazing sequences survive when mutation and drift are added.
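A deterministic baseline for the basin-size comparison can be sketched before adding mutation or drift. This sketch samples initial population states uniformly from the simplex and integrates the replicator equation with forward Euler steps. It then records which sequence holds the largest share at the end. The payoff construction assumes that the final action repeats and that a disagreement restarts both partners, with discount factor `gamma`. The step size, horizon, sample count, seed, and all function names are arbitrary illustrative choices, not taken from the paper.

```python
import random
from collections import Counter
from itertools import product


def restart_payoff(x, y, pd, gamma):
    """Discounted payoff to x against y (assumed: final action repeats, disagreement restarts)."""
    u = {("C", "C"): pd["R"], ("C", "D"): pd["S"],
         ("D", "C"): pd["T"], ("D", "D"): pd["P"]}
    m = len(x)
    block = 0.0
    for k in range(m):
        step = gamma ** k * u[(x[k], y[k])]
        if x[k] != y[k]:
            return (block + step) / (1.0 - gamma ** (k + 1))
        if k < m - 1:
            block += step
    return block + gamma ** (m - 1) * u[(x[-1], y[-1])] / (1.0 - gamma)


def replicator_step(xs, A, dt):
    """One forward-Euler step of dx_i/dt = x_i * ((A x)_i - x^T A x)."""
    fitness = [sum(a * x for a, x in zip(row, xs)) for row in A]
    mean = sum(f * x for f, x in zip(fitness, xs))
    new = [x + dt * x * (f - mean) for x, f in zip(xs, fitness)]
    total = sum(new)
    return [v / total for v in new]  # renormalise against floating-point drift


def estimate_basins(m, pd, gamma, samples=100, steps=2000, dt=0.01, seed=0):
    """Fraction of random initial states whose largest final share is each sequence."""
    rng = random.Random(seed)
    seqs = ["".join(s) for s in product("CD", repeat=m)]
    A = [[restart_payoff(x, y, pd, gamma) for y in seqs] for x in seqs]
    winners = Counter()
    for _ in range(samples):
        # Normalised Exp(1) draws give a uniform point on the simplex (Dirichlet(1, ..., 1)).
        g = [rng.gammavariate(1.0, 1.0) for _ in seqs]
        total = sum(g)
        xs = [v / total for v in g]
        for _ in range(steps):
            xs = replicator_step(xs, A, dt)
        winner = max(range(len(seqs)), key=lambda i: xs[i])
        winners[seqs[winner]] += 1
    return {s: c / samples for s, c in winners.items()}


print(estimate_basins(2, {"T": 4, "R": 3, "P": 1, "S": 0}, gamma=0.9))
```

With these payoffs, CC is strictly dominated by DC and CD by DD, so every sampled trajectory ends near DC or DD. The reported fractions are rough Monte Carlo estimates whose accuracy depends on the sample count.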

Load-bearing premise

Every agent is restricted to a pure length-m strategy sequence in the parametrised normal-form game obtained from the trigger-restart mechanism.

What would settle it

A calculation or simulation in which the number of stable cooperative sequences fails to increase with m or in which no cooperative sequences remain stable once m is large.
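A first version of that calculation can be sketched by brute force. Enumerate all 2^m sequences, build the payoff matrix under an assumed restart semantics, and keep the sequences that are strict symmetric Nash equilibria. In such an equilibrium, a resident earns strictly more against itself than any mutant earns against it. The semantics assumed here are that the final action repeats indefinitely, a disagreement sends both partners back to the first entry, and payoffs are discounted by `gamma`. Strict equilibria are asymptotically stable under replicator dynamics, but this is only a sufficient test. The counts are therefore an illustration, not the paper's exact formula, and the helper names are hypothetical.

```python
from itertools import product


def restart_payoff(x, y, pd, gamma):
    """Discounted payoff to x against y (assumed: final action repeats, disagreement restarts)."""
    u = {("C", "C"): pd["R"], ("C", "D"): pd["S"],
         ("D", "C"): pd["T"], ("D", "D"): pd["P"]}
    m = len(x)
    block = 0.0
    for k in range(m):
        step = gamma ** k * u[(x[k], y[k])]
        if x[k] != y[k]:
            return (block + step) / (1.0 - gamma ** (k + 1))
        if k < m - 1:
            block += step
    return block + gamma ** (m - 1) * u[(x[-1], y[-1])] / (1.0 - gamma)


def stable_sequences(m, pd, gamma):
    """Sequences r with A[r, r] > A[mutant, r] for every other sequence (strict symmetric NE)."""
    seqs = ["".join(s) for s in product("CD", repeat=m)]
    A = {(x, y): restart_payoff(x, y, pd, gamma) for x in seqs for y in seqs}
    return [r for r in seqs
            if all(A[(r, r)] > A[(mut, r)] for mut in seqs if mut != r)]


pd = {"T": 4, "R": 3, "P": 1, "S": 0}
for m in range(1, 6):
    stable = stable_sequences(m, pd, gamma=0.9)
    cooperative = [s for s in stable if s.endswith("C")]
    print(f"m={m}: {len(stable)} strict equilibria, cooperative: {cooperative}")
```

Under these assumptions, a sequence that opens with C is always invaded by all-defect. All-defect earns T/(1 - γ) against it, while the resident earns at most R/(1 - γ) against itself. Every surviving sequence therefore opens with D, consistent with the hazing-period property. Whether the number of cooperative survivors grows with m depends on T, R, P, S and γ.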

Original abstract

We investigate a population of self-interested agents playing a repeated Prisoner's Dilemma under the trigger-restart mechanism. Under such a mechanism, agents play a sequence of symmetric games with their partner, and restart the interaction if their actions disagree. Our work focuses on the convergence of replicator dynamics in a well-mixed population of agents, where the emergence of cooperation is challenged by the individual incentive for exploitation. Formulating the corresponding parametrised normal-form game, with agents each adopting a length-m strategy sequence, we show that increasing the strategy length enables cooperation to emerge and stabilise. We provide exact convergence guarantees for restricted strategy lengths and, in the general payoff configuration, provide the necessary parametric conditions for the stability of cooperative strategies. By deriving an exact formula for the number of stable sequences, we find structural properties necessary for stability, as agents must learn to initially defect - the so-called "hazing period" - before cooperating indefinitely. Our analysis shows that, while optimal cooperative sequences exist, agents favour less-optimal sequences with a longer hazing period, which possess larger basins of attraction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies replicator dynamics for a population playing the repeated Prisoner's Dilemma under a trigger-restart mechanism. It formulates the interaction as a parametrised normal-form game in which each agent is restricted to a pure strategy that is a fixed-length-m sequence of actions, derives exact convergence guarantees for small m, supplies parametric stability conditions in the general case, and gives an exact formula for the number of stable sequences. The central claims are that increasing m permits stable cooperation to emerge, that every stable sequence must begin with a 'hazing period' of defection before indefinite cooperation, and that sequences with longer hazing periods, although payoff-suboptimal, possess larger basins of attraction.

Significance. If the derivations are correct, the work supplies a mathematically precise characterisation of stability and basin sizes inside a deliberately truncated strategy space. The explicit count of stable sequences and the identification of the hazing-period structural property constitute concrete, falsifiable predictions that could be tested numerically or experimentally within the same model class.

major comments (2)
  1. [Abstract / modeling section] Abstract and modeling formulation: the central claim that 'increasing the strategy length enables cooperation to emerge and stabilise' is derived entirely under the restriction that every agent adopts a pure length-m sequence in the trigger-restart normal-form game. Because the replicator dynamics, stability conditions, and basin-size comparisons are obtained only inside this class, the emergence result does not automatically extend to agents that can condition on the restart trigger itself or employ variable-length or history-dependent rules outside the m-sequence truncation. A justification or sensitivity analysis for this modeling choice is required for the claim to be load-bearing.
  2. [Abstract] The abstract asserts 'exact convergence guarantees for restricted strategy lengths' and 'an exact formula for the number of stable sequences,' yet the provided text supplies neither the payoff matrix entries nor the derivation steps that would allow verification of these formulas. Without the explicit mapping from the trigger-restart rule to the normal-form payoffs or the replicator equations, it is impossible to confirm that the reported stability conditions and basin comparisons are free of algebraic error.
minor comments (1)
  1. Notation for the length-m sequences and the restart trigger should be introduced with a small example (e.g., m=2) before the general case to improve readability.
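An m = 2 example of this kind might list the four sequences CC, CD, DC, DD and the payoff of the row sequence against the column sequence. The matrix below is a sketch written under assumptions not taken from the paper. Each sequence repeats its final action indefinitely, and any disagreement sends both partners back to the first entry. Payoffs are discounted by γ, and the stage game uses T, R, P, S.

```latex
A =
\begin{array}{c|cccc}
   & CC & CD & DC & DD \\ \hline
CC & \frac{R}{1-\gamma} & \frac{R+\gamma S}{1-\gamma^{2}} & \frac{S}{1-\gamma} & \frac{S}{1-\gamma} \\
CD & \frac{R+\gamma T}{1-\gamma^{2}} & R+\frac{\gamma P}{1-\gamma} & \frac{S}{1-\gamma} & \frac{S}{1-\gamma} \\
DC & \frac{T}{1-\gamma} & \frac{T}{1-\gamma} & P+\frac{\gamma R}{1-\gamma} & \frac{P+\gamma S}{1-\gamma^{2}} \\
DD & \frac{T}{1-\gamma} & \frac{T}{1-\gamma} & \frac{P+\gamma T}{1-\gamma^{2}} & \frac{P}{1-\gamma}
\end{array}
```

Against any partner that opens with C, both DC and DD collect T/(1 - γ). That is more than CC or CD earn against themselves whenever T > R, which already rules out stability for sequences that open with cooperation.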

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive suggestions. We address each major comment below and have revised the manuscript to strengthen the presentation of our modeling assumptions and to improve the verifiability of our derivations.

Point-by-point responses
  1. Referee: [Abstract / modeling section] Abstract and modeling formulation: the central claim that 'increasing the strategy length enables cooperation to emerge and stabilise' is derived entirely under the restriction that every agent adopts a pure length-m sequence in the trigger-restart normal-form game. Because the replicator dynamics, stability conditions, and basin-size comparisons are obtained only inside this class, the emergence result does not automatically extend to agents that can condition on the restart trigger itself or employ variable-length or history-dependent rules outside the m-sequence truncation. A justification or sensitivity analysis for this modeling choice is required for the claim to be load-bearing.

    Authors: We agree that all results are obtained strictly inside the fixed-length-m pure-strategy truncation of the trigger-restart game. This restriction is deliberate: it permits an exact normal-form representation and closed-form stability analysis that would be intractable for arbitrary history-dependent or variable-length strategies. The abstract already states the restriction explicitly (“with agents each adopting a length-m strategy sequence”). We have added a new paragraph in Section 2 explaining the modeling rationale—namely, that the truncation isolates the effect of increasing memory length while keeping the strategy space finite and the replicator dynamics analytically tractable—and we briefly discuss how conditioning on the restart trigger would require a qualitatively different state space. No sensitivity analysis across broader classes is provided, as that lies outside the paper’s scope. revision: yes

  2. Referee: [Abstract] The abstract asserts 'exact convergence guarantees for restricted strategy lengths' and 'an exact formula for the number of stable sequences,' yet the provided text supplies neither the payoff matrix entries nor the derivation steps that would allow verification of these formulas. Without the explicit mapping from the trigger-restart rule to the normal-form payoffs or the replicator equations, it is impossible to confirm that the reported stability conditions and basin comparisons are free of algebraic error.

    Authors: The full manuscript derives the payoff matrix in Section 3 and supplies the replicator equations together with the stability conditions in Section 4; the exact count of stable sequences appears as Theorem 5. To make these derivations immediately verifiable, we have inserted a concrete payoff-matrix example for m=2 in the main text and expanded the appendix with the step-by-step mapping from the trigger-restart rule to the normal-form entries, followed by the algebraic verification of the stability thresholds. These additions allow direct checking of the formulas without altering any results. revision: yes
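A generic form of such a mapping follows from a one-line restart recursion, where k is the first round at which x and y disagree. It is sketched under assumptions not taken from the paper. Each sequence x = (x_0, ..., x_{m-1}) repeats its final action indefinitely, and a disagreement sends both partners back to x_0. u(a, b) is the stage payoff and γ the discount factor.

```latex
\begin{aligned}
V_{x,y} &= \sum_{t=0}^{k} \gamma^{t} u(x_t, y_t) + \gamma^{k+1} V_{x,y}
\;\Longrightarrow\;
A_{x,y} = \frac{\sum_{t=0}^{k} \gamma^{t} u(x_t, y_t)}{1-\gamma^{k+1}} \qquad (x \neq y),\\
A_{x,x} &= \sum_{t=0}^{m-2} \gamma^{t} u(x_t, x_t) + \frac{\gamma^{m-1}\, u(x_{m-1}, x_{m-1})}{1-\gamma}.
\end{aligned}
```

For m = 2, x = DD and y = DC, this gives k = 1 and A_{DD,DC} = (P + γT)/(1 - γ²).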

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

Full rationale

The paper sets up an explicit parametrised normal-form game from the trigger-restart repeated PD with the modeling restriction to pure length-m strategy sequences, then derives replicator dynamics convergence, stability conditions, and an exact count of stable sequences directly from the resulting payoff structure and dynamics equations. No fitted parameters are renamed as predictions, no self-citations bear the load of uniqueness or ansatzes, and no step reduces by construction to its own inputs; the hazing-period property and basin-size comparisons follow from analysis of the constructed game rather than being presupposed.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard replicator-dynamics model applied to a normal-form game whose strategies are restricted to length-m sequences and whose payoffs are determined by the trigger-restart rule; m and the underlying PD payoffs function as free parameters.

free parameters (2)
  • m (strategy length)
    Controls the set of admissible strategies; the paper shows results depend on increasing m.
  • PD payoff parameters
    The game is explicitly parametrised; stability conditions are stated in terms of these values.
axioms (2)
  • domain assumption: Population strategy frequencies evolve according to the replicator dynamics equation in a well-mixed population.
    Invoked throughout the convergence analysis (abstract: 'convergence of replicator dynamics in a well-mixed population').
  • domain assumption: Every agent is restricted to a pure strategy that is a fixed sequence of length m.
    The formulation step that turns the repeated game into a normal-form game (abstract: 'with agents each adopting a length-m strategy sequence').
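For reference, the standard replicator equation for a symmetric normal-form game with payoff matrix A is given below. Here A is indexed by the 2^m sequences, and x_i is the population share of sequence i. A textbook two-strategy consequence is relevant to basin sizes. On the edge joining two strict equilibria i and j there is a single interior rest point x_i^*, and trajectories on that edge go to i exactly when x_i > x_i^*. The summary does not say how the paper measures basins in the full simplex, so this is only the one-dimensional picture.

```latex
\begin{aligned}
\dot{x}_i &= x_i\bigl[(A x)_i - x^{\top} A x\bigr], \qquad i = 1, \dots, 2^{m},\\
x_i^{*} &= \frac{A_{jj} - A_{ij}}{(A_{jj} - A_{ij}) + (A_{ii} - A_{ji})}
\qquad \text{(interior rest point on the edge between strict equilibria } i \text{ and } j\text{)}.
\end{aligned}
```

A smaller x_i^* means a larger share of that edge flows to i, which is the sense in which one strict equilibrium can have a larger basin than another.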

pith-pipeline@v0.9.1-grok · 5720 in / 1620 out tokens · 39863 ms · 2026-06-26T18:53:06.215635+00:00 · methodology

