Convergence of Replicator Dynamics in the Repeated Prisoner's Dilemma with Restarts
Pith reviewed 2026-06-26 18:53 UTC · model grok-4.3
The pith
Increasing strategy length enables cooperation to emerge and stabilise under replicator dynamics in the repeated Prisoner's Dilemma with restarts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Formulating the corresponding parametrised normal-form game, with agents each adopting a length-m strategy sequence, we show that increasing the strategy length enables cooperation to emerge and stabilise. We provide exact convergence guarantees for restricted strategy lengths and, in the general payoff configuration, provide the necessary parametric conditions for the stability of cooperative strategies. By deriving an exact formula for the number of stable sequences, we find structural properties necessary for stability, as agents must learn to initially defect - the so-called "hazing period" - before cooperating indefinitely. Our analysis shows that, while optimal cooperative sequences exist, agents favour less-optimal sequences with a longer hazing period, which possess larger basins of attraction.
What carries the argument
Length-m strategy sequences in the parametrised normal-form game obtained from the trigger-restart mechanism
If this is right
- Cooperation emerges and stabilises once strategy length is increased.
- Every stable cooperative sequence must contain an initial run of defection before indefinite cooperation.
- Sequences with longer initial defection runs possess larger basins of attraction.
- Exact formulas exist for the number of stable sequences under the trigger-restart rule.
- Parametric conditions on payoffs determine which cooperative sequences are stable.
Where Pith is reading between the lines
- The same restart rule and length-m representation could be applied to other repeated social dilemmas to check whether longer sequences likewise enlarge the set of stable cooperative outcomes.
- If populations can evolve the length of their strategies, selection may first increase memory before selecting among the cooperative sequences.
- Finite-population or stochastic simulations could test whether the larger basins identified for longer-hazing sequences survive when mutation and drift are added.
Load-bearing premise
Every agent is restricted to a pure length-m strategy sequence in the parametrised normal-form game obtained from the trigger-restart mechanism.
What would settle it
A calculation or simulation in which the number of stable cooperative sequences fails to increase with m or in which no cooperative sequences remain stable once m is large.
Original abstract
We investigate a population of self-interested agents playing a repeated Prisoner's Dilemma under the trigger-restart mechanism. Under such a mechanism, agents play a sequence of symmetric games with their partner, and restart the interaction if their actions disagree. Our work focuses on the convergence of replicator dynamics in a well-mixed population of agents, where the emergence of cooperation is challenged by the individual incentive for exploitation. Formulating the corresponding parametrised normal-form game, with agents each adopting a length-m strategy sequence, we show that increasing the strategy length enables cooperation to emerge and stabilise. We provide exact convergence guarantees for restricted strategy lengths and, in the general payoff configuration, provide the necessary parametric conditions for the stability of cooperative strategies. By deriving an exact formula for the number of stable sequences, we find structural properties necessary for stability, as agents must learn to initially defect - the so-called "hazing period" - before cooperating indefinitely. Our analysis shows that, while optimal cooperative sequences exist, agents favour less-optimal sequences with a longer hazing period, which possess larger basins of attraction.
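The trigger-restart mechanism described in the abstract can be sketched in code. This is a minimal illustration, not the paper's construction, and it rests on several assumptions: payoffs are discounted by a factor `gamma` (the symbol appears in the paper's appendix fragment), a pair whose actions disagree replays both sequences from the first position, a pair that never disagrees repeats its final joint action forever, and "stable" is read as a strict symmetric Nash equilibrium (a standard sufficient condition for asymptotic stability under replicator dynamics). The payoff values and the names `restart_payoff` and `strict_symmetric_nash` are hypothetical.

```python
from itertools import product

# Illustrative stage payoffs (assumed, not taken from the paper): T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
STAGE = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def restart_payoff(mine, theirs, gamma):
    """Discounted payoff of sequence `mine` against `theirs` under the assumed rule."""
    total = 0.0
    for k, (a, b) in enumerate(zip(mine, theirs)):
        total += gamma**k * STAGE[(a, b)]
        if a != b:
            # Actions disagree: the pair restarts at position 0, so the value V
            # satisfies V = total + gamma**(k + 1) * V.
            return total / (1.0 - gamma ** (k + 1))
    # No disagreement: the final joint action is repeated forever.
    tail = STAGE[(mine[-1], theirs[-1])] / (1.0 - gamma)
    return total + gamma ** len(mine) * tail


def strict_symmetric_nash(m, gamma):
    """Sequences s that earn strictly more against s than any other sequence does."""
    seqs = ["".join(p) for p in product("CD", repeat=m)]
    return [
        s
        for s in seqs
        if all(
            restart_payoff(s, s, gamma) > restart_payoff(t, s, gamma)
            for t in seqs
            if t != s
        )
    ]


for m in (2, 3):
    print(m, strict_symmetric_nash(m, gamma=0.9))
```

With these illustrative numbers the sketch reports only `DD` for m=2, and `DDC` together with `DDD` for m=3. The cooperative sequence that survives begins with two rounds of defection, in line with the hazing-period property described above.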
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies replicator dynamics for a population playing the repeated Prisoner's Dilemma under a trigger-restart mechanism. It formulates the interaction as a parametrised normal-form game in which each agent is restricted to a pure strategy that is a fixed-length-m sequence of actions, derives exact convergence guarantees for small m, supplies parametric stability conditions in the general case, and gives an exact formula for the number of stable sequences. The central claims are that increasing m permits stable cooperation to emerge, that every stable sequence must begin with a 'hazing period' of defection before indefinite cooperation, and that sequences with longer hazing periods, although payoff-suboptimal, possess larger basins of attraction.
Significance. If the derivations are correct, the work supplies a mathematically precise characterisation of stability and basin sizes inside a deliberately truncated strategy space. The explicit count of stable sequences and the identification of the hazing-period structural property constitute concrete, falsifiable predictions that could be tested numerically or experimentally within the same model class.
major comments (2)
- [Abstract / modeling section] Abstract and modeling formulation: the central claim that 'increasing the strategy length enables cooperation to emerge and stabilise' is derived entirely under the restriction that every agent adopts a pure length-m sequence in the trigger-restart normal-form game. Because the replicator dynamics, stability conditions, and basin-size comparisons are obtained only inside this class, the emergence result does not automatically extend to agents that can condition on the restart trigger itself or employ variable-length or history-dependent rules outside the m-sequence truncation. A justification or sensitivity analysis for this modeling choice is required for the claim to be load-bearing.
- [Abstract] The abstract asserts 'exact convergence guarantees for restricted strategy lengths' and 'an exact formula for the number of stable sequences,' yet the provided text supplies neither the payoff matrix entries nor the derivation steps that would allow verification of these formulas. Without the explicit mapping from the trigger-restart rule to the normal-form payoffs or the replicator equations, it is impossible to confirm that the reported stability conditions and basin comparisons are free of algebraic error.
minor comments (1)
- Notation for the length-m sequences and the restart trigger should be introduced with a small example (e.g., m=2) before the general case to improve readability.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive suggestions. We address each major comment below and have revised the manuscript to strengthen the presentation of our modeling assumptions and to improve the verifiability of our derivations.
Point-by-point responses
- Referee: [Abstract / modeling section] Abstract and modeling formulation: the central claim that 'increasing the strategy length enables cooperation to emerge and stabilise' is derived entirely under the restriction that every agent adopts a pure length-m sequence in the trigger-restart normal-form game. Because the replicator dynamics, stability conditions, and basin-size comparisons are obtained only inside this class, the emergence result does not automatically extend to agents that can condition on the restart trigger itself or employ variable-length or history-dependent rules outside the m-sequence truncation. A justification or sensitivity analysis for this modeling choice is required for the claim to be load-bearing.
Authors: We agree that all results are obtained strictly inside the fixed-length-m pure-strategy truncation of the trigger-restart game. This restriction is deliberate: it permits an exact normal-form representation and closed-form stability analysis that would be intractable for arbitrary history-dependent or variable-length strategies. The abstract already states the restriction explicitly (“with agents each adopting a length-m strategy sequence”). We have added a new paragraph in Section 2 explaining the modeling rationale, namely that the truncation isolates the effect of increasing memory length while keeping the strategy space finite and the replicator dynamics analytically tractable, and we briefly discuss how conditioning on the restart trigger would require a qualitatively different state space. No sensitivity analysis across broader classes is provided, as that lies outside the paper’s scope.
Revision: yes
- Referee: [Abstract] The abstract asserts 'exact convergence guarantees for restricted strategy lengths' and 'an exact formula for the number of stable sequences,' yet the provided text supplies neither the payoff matrix entries nor the derivation steps that would allow verification of these formulas. Without the explicit mapping from the trigger-restart rule to the normal-form payoffs or the replicator equations, it is impossible to confirm that the reported stability conditions and basin comparisons are free of algebraic error.
Authors: The full manuscript derives the payoff matrix in Section 3 and supplies the replicator equations together with the stability conditions in Section 4; the exact count of stable sequences appears as Theorem 5. To make these derivations immediately verifiable, we have inserted a concrete payoff-matrix example for m=2 in the main text and expanded the appendix with the step-by-step mapping from the trigger-restart rule to the normal-form entries, followed by the algebraic verification of the stability thresholds. These additions allow direct checking of the formulas without altering any results.
Revision: yes
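As a hedged illustration of the kind of m=2 check discussed in this exchange, the worked inequality below compares the sequence "defect, then cooperate" with "always defect". It is not the paper's own example. It assumes a discount factor γ (the symbol used in the paper's appendix fragment), payoffs T > R > P > S, and a restart convention under which a disagreeing pair replays its sequences from the first position. The labels DC and DD are introduced here for illustration only.

```latex
\begin{align*}
A_{DC,DC} &= P + \frac{R\gamma}{1-\gamma}, \qquad
A_{DD,DC} = \frac{P + T\gamma}{1-\gamma^{2}}, \\
A_{DC,DC} > A_{DD,DC}
&\iff P(1-\gamma^{2}) + R\gamma(1+\gamma) > P + T\gamma \\
&\iff \gamma\,(R - P) > T - R .
\end{align*}
```

Under these assumptions a single round of hazing resists the always-defect deviation only when γ > (T − R)/(R − P). For the common illustrative values T=5, R=3, P=1 the threshold equals 1, so no γ < 1 satisfies it, which is in line with the claim that longer sequences are needed for cooperation to stabilise.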
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper sets up an explicit parametrised normal-form game from the trigger-restart repeated PD with the modeling restriction to pure length-m strategy sequences, then derives replicator dynamics convergence, stability conditions, and an exact count of stable sequences directly from the resulting payoff structure and dynamics equations. No fitted parameters are renamed as predictions, no self-citations bear the load of uniqueness or ansatzes, and no step reduces by construction to its own inputs; the hazing-period property and basin-size comparisons follow from analysis of the constructed game rather than being presupposed.
Axiom & Free-Parameter Ledger
free parameters (2)
- m (strategy length)
- PD payoff parameters
axioms (2)
- domain assumption: Population strategy frequencies evolve according to the replicator dynamics equation in a well-mixed population.
- domain assumption: Every agent is restricted to a pure strategy that is a fixed sequence of length m.
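The replicator-dynamics assumption in the ledger above can be sketched numerically. The following is a minimal forward-Euler illustration under stated assumptions, not the paper's analysis. The 2x2 payoff matrix is made up for illustration (row 0 is a cooperative sequence with a hazing period, row 1 is always-defect), and the names `replicator_step`, `simulate` and `interior_threshold` are hypothetical.

```python
def replicator_step(x, A, dt):
    """One Euler step of x_i' = x_i * ((A x)_i - x . A x)."""
    n = len(x)
    fitness = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    average = sum(x[i] * fitness[i] for i in range(n))
    y = [max(x[i] + dt * x[i] * (fitness[i] - average), 0.0) for i in range(n)]
    total = sum(y)
    return [v / total for v in y]  # renormalise to stay on the simplex


def simulate(x0, A, dt=0.01, steps=20000):
    """Iterate the Euler step from the initial frequencies x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, A, dt)
    return x


def interior_threshold(A):
    """Share of strategy 0 at the unstable interior rest point of a 2x2 game."""
    gain_0 = A[0][0] - A[1][0]  # advantage of strategy 0 against itself
    gain_1 = A[1][1] - A[0][1]  # advantage of strategy 1 against itself
    return gain_1 / (gain_0 + gain_1)


# Illustrative payoffs only (assumed): both pure strategies are strict Nash.
A = [[26.2, 7.0], [22.0, 10.0]]

print(interior_threshold(A))   # boundary between the two basins
print(simulate([0.5, 0.5], A)) # starts above the boundary
print(simulate([0.3, 0.7], A)) # starts below the boundary
```

In a two-strategy bistable game like this one, the interior rest point separates the two basins of attraction, so the basin of the cooperative sequence is the set of initial shares above the threshold. Comparing such thresholds across sequences is one simple way to explore the basin-size claims, though the paper's own method may differ.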
Appendix A Additional Derivations
We present the full algebraic derivation of the manifold between the two strategies $s_D$ and $s_{LC}$.
\[
\varphi(\gamma, B, m) = \frac{A_{LC,LC} - A_{D,LC}}{A_{D,D} - A_{LC,D}} \tag{A1}
\]
\[
= \frac{\left(\sum_{j=0}^{m-2} P\gamma^{j} + \frac{R\gamma^{m-1}}{1-\gamma}\right) - \frac{\sum_{j=0}^{m-2} P\gamma^{j} + T\gamma^{m-1}}{1-\gamma^{m}}}{\frac{P}{1-\gamma} - \sum_{j}^{m-2} \cdots}
\]
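The two closed-form payoffs recoverable from the numerator of (A1) can be checked against a round-by-round rollout. The sketch below is an illustration under assumptions, not the paper's derivation. It reads the strategy s_LC as m-1 defections followed by cooperation and s_D as all-defect, and it assumes that a disagreeing pair replays from the first position and that an agreeing pair repeats its final action. The payoff values and the function names are hypothetical.

```python
# Illustrative stage payoffs (assumed, not from the paper): T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
STAGE = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def rollout(mine, theirs, gamma, rounds=2000):
    """Sum discounted stage payoffs round by round under the assumed restart rule."""
    total, pos, m = 0.0, 0, len(mine)
    for t in range(rounds):
        a = mine[min(pos, m - 1)]    # past the end, repeat the final action
        b = theirs[min(pos, m - 1)]
        total += gamma**t * STAGE[(a, b)]
        pos = 0 if a != b else pos + 1  # disagreement triggers a restart
    return total


def closed_form_lc_vs_lc(m, gamma):
    """A_{LC,LC}: m-1 rounds of P, then R forever."""
    hazing = sum(P * gamma**j for j in range(m - 1))
    return hazing + R * gamma ** (m - 1) / (1.0 - gamma)


def closed_form_d_vs_lc(m, gamma):
    """A_{D,LC}: m-1 rounds of P, one round of T, then restart every m rounds."""
    hazing = sum(P * gamma**j for j in range(m - 1))
    return (hazing + T * gamma ** (m - 1)) / (1.0 - gamma**m)


m, gamma = 3, 0.9
lc, d = "D" * (m - 1) + "C", "D" * m
print(rollout(lc, lc, gamma), closed_form_lc_vs_lc(m, gamma))
print(rollout(d, lc, gamma), closed_form_d_vs_lc(m, gamma))
```

The rollout truncates the infinite sum after 2000 rounds, which is negligible for gamma = 0.9. Agreement between the two columns supports the term grouping used for the numerator of (A1), while the truncated denominator is left unverified.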