arXiv: 2602.04150 · v2 · pith:TMRVILSBnew · submitted 2026-02-04 · 🧬 q-bio.PE · cond-mat.dis-nn · nlin.AO

A brief review of evolutionary game dynamics in the reinforcement learning paradigm

Pith reviewed 2026-05-21 14:38 UTC · model grok-4.3

classification 🧬 q-bio.PE · cond-mat.dis-nn · nlin.AO
keywords evolutionary game theory · reinforcement learning · cooperation · fairness · trust · resource coordination · ecological dynamics

The pith

Reinforcement learning replaces imitation copying in evolutionary games to better explain how cooperation, fairness, and trust arise in real populations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews recent work that applies reinforcement learning to evolutionary game dynamics as a way to address mismatches between standard theoretical predictions and observed human and animal behavior. In the RL approach, agents adjust their strategies through repeated trial and error using feedback from the environment rather than simply copying successful neighbors according to fixed rules. This shift allows models to capture phenomena such as the evolution of cooperation, trust, fairness, and efficient resource use more closely than imitation-based models have achieved. A reader would care because these behaviors underpin social coordination yet have resisted consistent explanation by earlier frameworks.

Core claim

By synthesizing studies that replace imitation learning with reinforcement learning in evolutionary games, the review shows that agents who refine strategies through trial-and-error feedback can generate cooperation, trust, fairness, optimal resource coordination, and stable ecological dynamics at levels that align more closely with experimental observations than prior models permitted.

What carries the argument

Reinforcement learning paradigm applied to evolutionary game dynamics, in which individuals update strategies introspectively from environmental feedback instead of copying neighbors under fixed rules.
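
The contrast between the two paradigms can be sketched in a few lines of Python. This is a minimal illustration, not the formulation used in any of the reviewed studies. It assumes the Fermi rule as a representative imitation rule and tabular Q-learning as a representative RL rule (both names appear in the paper's figure captions). The function names, the parameters `K`, `alpha`, and `gamma`, and their default values are illustrative assumptions.

```python
import math


def fermi_imitation_prob(payoff_self, payoff_neighbor, K=0.1):
    """Probability that a player copies a neighbor's strategy (Fermi rule).

    K is an assumed noise parameter: smaller K makes copying of a
    better-performing neighbor closer to deterministic. Very large payoff
    gaps relative to K can overflow math.exp; this sketch does not guard
    against that.
    """
    return 1.0 / (1.0 + math.exp((payoff_self - payoff_neighbor) / K))


def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the table.

    Q maps state -> {action: value}. alpha (learning rate) and gamma
    (discount factor) are assumed values chosen for illustration only.
    """
    best_next = max(Q[next_state].values())   # greedy estimate of future value
    td_target = reward + gamma * best_next    # reward plus discounted future value
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q
```

In the imitation rule the only inputs are the two payoffs, so a player's next strategy is always one that already exists in its neighborhood. In the Q-learning step the player consults only its own table and its own reward, which is the sense in which the update is introspective. How the reviewed models define states, actions, and rewards varies by study and is not specified here.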

If this is right

  • Evolutionary models can now address a wider range of social dilemmas without ad-hoc adjustments to imitation rules.
  • Resource allocation problems in shared environments gain more realistic dynamics when agents learn from direct experience.
  • Ecological interactions can be simulated with the same learning mechanism used for human social behavior.
  • Discrepancies that remain after adopting RL point to specific additional factors worth isolating in future experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same RL mechanism could be tested on coordination games beyond those reviewed to see whether it generates similar improvements in fit to data.
  • Longer simulation runs with RL agents might reveal whether stable fairness norms persist under changing environmental conditions.
  • Hybrid models that combine limited imitation with RL feedback could be compared directly to pure RL versions to quantify the added value of each component.

Load-bearing premise

Persistent gaps between theoretical predictions and behavioral experiments arise in part from the imitation learning paradigm used in earlier models rather than from other modeling choices or unaccounted factors.

What would settle it

A controlled comparison in which reinforcement-learning versions of standard games such as the Prisoner's Dilemma produce cooperation rates that match laboratory experiment data more closely than imitation-learning versions across multiple population sizes and payoff structures.
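
One way such a comparison could be scored is sketched below. This is a hypothetical harness, not a procedure described in the paper: it assumes that each model and the laboratory data are summarized as a cooperation rate per condition, keyed by an assumed tuple such as `(population_size, temptation_payoff)`, and it uses mean absolute error as one possible closeness measure. All function names are illustrative.

```python
from statistics import mean


def mean_abs_error(predicted, observed):
    """Mean absolute gap between predicted and observed cooperation rates.

    Both arguments map a condition key, for example
    (population_size, temptation_payoff), to a rate in [0, 1].
    Only conditions present in both dictionaries are compared.
    """
    shared = sorted(set(predicted) & set(observed))
    if not shared:
        raise ValueError("no overlapping conditions to compare")
    return mean(abs(predicted[c] - observed[c]) for c in shared)


def closer_model(rl_pred, imitation_pred, lab_data):
    """Return (winner, rl_error, imitation_error) for one set of conditions."""
    err_rl = mean_abs_error(rl_pred, lab_data)
    err_im = mean_abs_error(imitation_pred, lab_data)
    if err_rl < err_im:
        winner = "rl"
    elif err_im < err_rl:
        winner = "imitation"
    else:
        winner = "tie"
    return winner, err_rl, err_im
```

Calling `closer_model` with three dictionaries covering the same grid of population sizes and payoff structures returns which paradigm sits closer to the data on average. A real test would also need to hold the number of tuned parameters comparable across the two models, which this sketch does not address.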

Figures

Figures reproduced from arXiv: 2602.04150 by Guozhong Zheng, Jiqiang Zhang, Li Chen, Shengfeng Deng, Xin Ou.

Figure 1. Two paradigms for game evolution. In imitation learning, players compare the rewards …
Figure 2. Emergence of cooperation in the prisoner’s dilemma game. (a) The phase diagram …
Figure 3. Schematics of three model setups. (Left) Public goods game (PGG) with a Fermi …
Figure 4. Emergence of trust. Fractions of four strategies in the trust game within the parameter …
Figure 5. Emergence of fairness. As in many practices of behavioral experiments, proposers …
Figure 6. Emergence of coordination in the minority game. The volatility …
Figure 7. Comparison study for species coexistence with the traditional RMF and Q-learning. …
Original abstract

Cooperation, fairness, trust, and resource coordination are cornerstones of modern civilization, yet their emergence remains inadequately explained by the persistent discrepancies between theoretical predictions and behavioral experiments. Part of this gap may arise from the imitation learning paradigm commonly used in prior theoretical models, which assumes individuals merely copy successful neighbors according to predetermined, fixed rules. This review examines recent advances in evolutionary game dynamics that employ reinforcement learning (RL) as an alternative paradigm. In RL, individuals learn through trial and error and introspectively refine their strategies based on environmental feedback. We begin by introducing key concepts in evolutionary game theory and the two learning paradigms, then synthesize progress in applying RL to elucidate cooperation, trust, fairness, optimal resource coordination, and ecological dynamics. Collectively, these studies indicate that RL offers a promising unified framework for understanding the diverse social and ecological phenomena observed in human and natural systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. This manuscript is a brief review arguing that persistent discrepancies between evolutionary game theory predictions and behavioral experiments may stem in part from the imitation learning paradigm used in prior models. It introduces core concepts from evolutionary game theory and contrasts imitation learning with reinforcement learning (RL), then synthesizes recent RL applications to cooperation, trust, fairness, resource coordination, and ecological dynamics, concluding that RL provides a promising unified framework for these phenomena.

Significance. A well-executed synthesis could usefully highlight how RL's trial-and-error and feedback mechanisms differ from fixed imitation rules and may better align with experimental observations on cooperation and fairness. The review correctly notes the potential for RL to serve as an alternative modeling approach in evolutionary games, which is a timely topic given growing interest in learning-based explanations of social behavior.

major comments (2)
  1. [Abstract] Abstract: The statement that discrepancies 'may arise in part from the imitation learning paradigm' is presented as motivation but is not supported by any extracted quantitative comparisons (e.g., prediction error, KL divergence to experimental distributions, or held-out fit) between RL and imitation versions of the same games across the cited studies.
  2. [Synthesis section] Synthesis of RL applications: The review summarizes individual RL studies on cooperation, fairness, and ecological dynamics but does not perform or report head-to-head metrics (parameter counts, out-of-sample performance, or direct contrast with imitation baselines) that would substantiate the claim of superior explanatory power over imitation learning for the same phenomena.
minor comments (1)
  1. The manuscript would benefit from a brief table or structured summary listing the key RL models reviewed, the games they address, and any reported performance metrics relative to imitation baselines.
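
For readers unfamiliar with the metrics named in the major comments, the KL divergence to an experimental distribution could be computed along the following lines. This is a generic sketch rather than anything taken from the manuscript or the referee report: the function name, the dictionary representation of strategy frequencies, and the `eps` floor are assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions given as dicts.

    p would hold observed strategy frequencies and q the frequencies
    predicted by a model. eps is an assumed floor that keeps the result
    finite when the model assigns zero probability to an observed strategy.
    """
    total = 0.0
    for key in set(p) | set(q):
        pk = p.get(key, 0.0)
        qk = max(q.get(key, 0.0), eps)
        if pk > 0.0:  # terms with p = 0 contribute nothing
            total += pk * math.log(pk / qk)
    return total
```

A lower value means the model's predicted strategy mix is closer to the observed one, so computing it for an RL model and an imitation model against the same experimental frequencies would give the kind of head-to-head number the referee asks for.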

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. As this is a brief review synthesizing existing literature rather than a primary research study, our responses below address the scope limitations while maintaining the manuscript's focus on conceptual unification via RL.

  1. Referee: [Abstract] Abstract: The statement that discrepancies 'may arise in part from the imitation learning paradigm' is presented as motivation but is not supported by any extracted quantitative comparisons (e.g., prediction error, KL divergence to experimental distributions, or held-out fit) between RL and imitation versions of the same games across the cited studies.

    Authors: We agree that the manuscript does not extract or report new quantitative metrics such as prediction errors or KL divergences comparing RL and imitation models. The abstract presents the possibility as a motivating hypothesis based on persistent discrepancies noted across the broader literature, rather than as a claim demonstrated via new analysis in this review. We will revise the abstract to clarify this framing and avoid implying direct quantitative support from the current synthesis.

    revision: partial

  2. Referee: [Synthesis section] Synthesis of RL applications: The review summarizes individual RL studies on cooperation, fairness, and ecological dynamics but does not perform or report head-to-head metrics (parameter counts, out-of-sample performance, or direct contrast with imitation baselines) that would substantiate the claim of superior explanatory power over imitation learning for the same phenomena.

    Authors: The synthesis section overviews applications and findings from the cited RL studies without performing new cross-study comparisons or reporting aggregated metrics such as parameter counts or out-of-sample performance. Individual source papers often contain their own baseline contrasts, but compiling head-to-head evaluations would require a distinct meta-analytic effort outside the scope of a brief review. We therefore do not intend to add such metrics; the manuscript's contribution lies in highlighting RL's potential as a unified framework based on the collective literature.

    revision: no

Circularity Check

0 steps flagged

Review synthesizes external studies with no internal derivation chain or self-referential reductions

This is a review paper that introduces concepts in evolutionary game theory and RL, then summarizes progress from cited external works on cooperation, trust, fairness, and ecological dynamics. No equations, fitted parameters, or derivations are presented that could reduce to inputs by construction. The central claim that RL offers a unified framework rests entirely on the body of reviewed literature rather than any self-citation load-bearing step or ansatz smuggled in. The interpretive suggestion that discrepancies arise from the imitation paradigm is an attribution drawn from the cited studies, not a circular self-definition or renaming of known results within this manuscript.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper and introduces no new free parameters, axioms, or invented entities. It discusses concepts drawn from existing evolutionary game theory and reinforcement learning literature.



