A brief review of evolutionary game dynamics in the reinforcement learning paradigm

Guozhong Zheng; Jiqiang Zhang; Li Chen; Shengfeng Deng; Xin Ou

arxiv: 2602.04150 · v2 · pith:TMRVILSBnew · submitted 2026-02-04 · q-bio.PE · cond-mat.dis-nn · nlin.AO

Pith reviewed 2026-05-21 14:38 UTC · model grok-4.3

classification: q-bio.PE · cond-mat.dis-nn · nlin.AO

keywords: evolutionary game theory, reinforcement learning, cooperation, fairness, trust, resource coordination, ecological dynamics

The pith

Replacing imitation with reinforcement learning in evolutionary games may better explain how cooperation, fairness, and trust arise in real populations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews recent work that applies reinforcement learning to evolutionary game dynamics as a way to address mismatches between standard theoretical predictions and observed human and animal behavior. In the RL approach, agents adjust their strategies through repeated trial and error using feedback from the environment rather than simply copying successful neighbors according to fixed rules. This shift allows models to capture phenomena such as the evolution of cooperation, trust, fairness, and efficient resource use more closely than imitation-based models have achieved. A reader would care because these behaviors underpin social coordination yet have resisted consistent explanation by earlier frameworks.

Core claim

By synthesizing studies that replace imitation learning with reinforcement learning in evolutionary games, the review shows that agents who refine strategies through trial-and-error feedback can generate cooperation, trust, fairness, optimal resource coordination, and stable ecological dynamics at levels that align more closely with experimental observations than prior models permitted.

What carries the argument

Reinforcement learning paradigm applied to evolutionary game dynamics, in which individuals update strategies introspectively from environmental feedback instead of copying neighbors under fixed rules.
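
To make the contrast concrete, here is a minimal sketch of the two kinds of update rule. The Q-learning step uses the tabular form Q(s,a) ← (1−α)Q(s,a) + α[Π + γ max Q(s′,a′)], with learning rate α and discount factor γ. The Fermi imitation rule is the standard pairwise-comparison form, included here as an assumed baseline. The function names, the noise parameter `K`, the state/action encoding, and all numeric values are illustrative assumptions, not taken from the paper.

```python
import math

def q_update(q, s, a, payoff, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Blends the old estimate Q[s][a] with the received payoff plus the
    discounted best value available in the next state.
    """
    best_next = max(q[s_next])
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (payoff + gamma * best_next)
    return q[s][a]

def fermi_adopt_prob(payoff_self, payoff_neighbor, K=0.1):
    """Probability of copying a neighbor's strategy under the Fermi rule.

    K is an assumed noise level: small K means near-deterministic copying
    of the better-performing neighbor.
    """
    return 1.0 / (1.0 + math.exp((payoff_self - payoff_neighbor) / K))

# Assumed encoding: states and actions are both 0 = cooperate, 1 = defect.
q_table = [[0.0, 0.0], [0.0, 0.0]]

# Introspective update: the agent revises its own value estimate.
new_value = q_update(q_table, s=0, a=1, payoff=1.5, s_next=1)

# Imitation: the agent only compares its payoff with a neighbor's.
p_copy_equal = fermi_adopt_prob(payoff_self=1.0, payoff_neighbor=1.0)
p_copy_better = fermi_adopt_prob(payoff_self=0.0, payoff_neighbor=1.0)
```

The design difference is visible in the signatures: `q_update` needs only the agent's own state, action, and payoff, while `fermi_adopt_prob` needs a neighbor's payoff and carries no memory between rounds.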

If this is right

Evolutionary models can now address a wider range of social dilemmas without ad-hoc adjustments to imitation rules.
Resource allocation problems in shared environments gain more realistic dynamics when agents learn from direct experience.
Ecological interactions can be simulated with the same learning mechanism used for human social behavior.
Discrepancies that remain after adopting RL point to specific additional factors worth isolating in future experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same RL mechanism could be tested on coordination games beyond those reviewed to see whether it generates similar improvements in fit to data.
Longer simulation runs with RL agents might reveal whether stable fairness norms persist under changing environmental conditions.
Hybrid models that combine limited imitation with RL feedback could be compared directly to pure RL versions to quantify the added value of each component.

Load-bearing premise

Persistent gaps between theoretical predictions and behavioral experiments arise in part from the imitation learning paradigm used in earlier models rather than from other modeling choices or unaccounted factors.

What would settle it

A controlled comparison in which reinforcement-learning versions of standard games such as the Prisoner's Dilemma produce cooperation rates that match laboratory experiment data more closely than imitation-learning versions across multiple population sizes and payoff structures.
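
One way such a comparison could be scored is sketched below, assuming cooperation rates are treated as Bernoulli parameters and the fit is measured by the average KL divergence from the observed rate to the model rate across matched conditions. The function names and the choice of metric are assumptions for illustration; the numbers are placeholders invented to exercise the code and are not data from any study. The two models are deliberately labeled A and B rather than tied to either learning paradigm.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two Bernoulli distributions.

    Inputs are clipped away from 0 and 1 so the logarithms stay finite.
    """
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def mean_divergence(observed, predicted):
    """Average KL divergence over matched experimental conditions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    total = sum(bernoulli_kl(o, m) for o, m in zip(observed, predicted))
    return total / len(observed)

# PLACEHOLDER values, one per hypothetical condition (e.g. payoff setting).
observed_rates = [0.30, 0.45, 0.60]
model_a_rates = [0.32, 0.40, 0.58]
model_b_rates = [0.05, 0.10, 0.95]

score_a = mean_divergence(observed_rates, model_a_rates)
score_b = mean_divergence(observed_rates, model_b_rates)

# Lower divergence means the model's rates sit closer to the observations.
closer_model = "A" if score_a < score_b else "B"
```

A real test would also need the conditions themselves (population sizes, payoff structures) to be matched between the models and the experiment, which this sketch takes for granted.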

Figures

Figures reproduced from arXiv: 2602.04150 by Guozhong Zheng, Jiqiang Zhang, Li Chen, Shengfeng Deng, Xin Ou.

**Figure 2.** Emergence of cooperation in the prisoner’s dilemma game. (a) The phase diagram

**Figure 3.** Schematics of three model setups. (Left) Public goods game (PGG) with a Fermi

**Figure 4.** Emergence of trust. Fractions of four strategies in the trust game within the parameter

**Figure 5.** Emergence of fairness. As in many practices of behavioral experiments, proposers

**Figure 6.** Emergence of coordination in the minority game. The volatility

**Figure 7.** Comparison study for species coexistence with the traditional RMF and Q-learning.

Original abstract

Cooperation, fairness, trust, and resource coordination are cornerstones of modern civilization, yet their emergence remains inadequately explained by the persistent discrepancies between theoretical predictions and behavioral experiments. Part of this gap may arise from the imitation learning paradigm commonly used in prior theoretical models, which assumes individuals merely copy successful neighbors according to predetermined, fixed rules. This review examines recent advances in evolutionary game dynamics that employ reinforcement learning (RL) as an alternative paradigm. In RL, individuals learn through trial and error and introspectively refine their strategies based on environmental feedback. We begin by introducing key concepts in evolutionary game theory and the two learning paradigms, then synthesize progress in applying RL to elucidate cooperation, trust, fairness, optimal resource coordination, and ecological dynamics. Collectively, these studies indicate that RL offers a promising unified framework for understanding the diverse social and ecological phenomena observed in human and natural systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. This manuscript is a brief review arguing that persistent discrepancies between evolutionary game theory predictions and behavioral experiments may stem in part from the imitation learning paradigm used in prior models. It introduces core concepts from evolutionary game theory and contrasts imitation learning with reinforcement learning (RL), then synthesizes recent RL applications to cooperation, trust, fairness, resource coordination, and ecological dynamics, concluding that RL provides a promising unified framework for these phenomena.

Significance. A well-executed synthesis could usefully highlight how RL's trial-and-error and feedback mechanisms differ from fixed imitation rules and may better align with experimental observations on cooperation and fairness. The review correctly notes the potential for RL to serve as an alternative modeling approach in evolutionary games, which is a timely topic given growing interest in learning-based explanations of social behavior.

major comments (2)

[Abstract] Abstract: The statement that discrepancies 'may arise in part from the imitation learning paradigm' is presented as motivation but is not supported by any extracted quantitative comparisons (e.g., prediction error, KL divergence to experimental distributions, or held-out fit) between RL and imitation versions of the same games across the cited studies.
[Synthesis section] Synthesis of RL applications: The review summarizes individual RL studies on cooperation, fairness, and ecological dynamics but does not perform or report head-to-head metrics (parameter counts, out-of-sample performance, or direct contrast with imitation baselines) that would substantiate the claim of superior explanatory power over imitation learning for the same phenomena.

minor comments (1)

The manuscript would benefit from a brief table or structured summary listing the key RL models reviewed, the games they address, and any reported performance metrics relative to imitation baselines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. As this is a brief review synthesizing existing literature rather than a primary research study, our responses below address the scope limitations while maintaining the manuscript's focus on conceptual unification via RL.

Point-by-point responses

Referee: [Abstract] Abstract: The statement that discrepancies 'may arise in part from the imitation learning paradigm' is presented as motivation but is not supported by any extracted quantitative comparisons (e.g., prediction error, KL divergence to experimental distributions, or held-out fit) between RL and imitation versions of the same games across the cited studies.

Authors: We agree that the manuscript does not extract or report new quantitative metrics such as prediction errors or KL divergences comparing RL and imitation models. The abstract presents the possibility as a motivating hypothesis based on persistent discrepancies noted across the broader literature, rather than as a claim demonstrated via new analysis in this review. We will revise the abstract to clarify this framing and avoid implying direct quantitative support from the current synthesis.

revision: partial

Referee: [Synthesis section] Synthesis of RL applications: The review summarizes individual RL studies on cooperation, fairness, and ecological dynamics but does not perform or report head-to-head metrics (parameter counts, out-of-sample performance, or direct contrast with imitation baselines) that would substantiate the claim of superior explanatory power over imitation learning for the same phenomena.

Authors: The synthesis section overviews applications and findings from the cited RL studies without performing new cross-study comparisons or reporting aggregated metrics such as parameter counts or out-of-sample performance. Individual source papers often contain their own baseline contrasts, but compiling head-to-head evaluations would require a distinct meta-analytic effort outside the scope of a brief review. We therefore do not intend to add such metrics; the manuscript's contribution lies in highlighting RL's potential as a unified framework based on the collective literature.

revision: no

Circularity Check

0 steps flagged

Review synthesizes external studies with no internal derivation chain or self-referential reductions

full rationale

This is a review paper that introduces concepts in evolutionary game theory and RL, then summarizes progress from cited external works on cooperation, trust, fairness, and ecological dynamics. No equations, fitted parameters, or derivations are presented that could reduce to inputs by construction. The central claim that RL offers a unified framework rests entirely on the body of reviewed literature rather than any self-citation load-bearing step or ansatz smuggled in. The interpretive suggestion that discrepancies arise from the imitation paradigm is an attribution drawn from the cited studies, not a circular self-definition or renaming of known results within this manuscript.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper and introduces no new free parameters, axioms, or invented entities. It discusses concepts drawn from existing evolutionary game theory and reinforcement learning literature.

pith-pipeline@v0.9.0 · 5694 in / 976 out tokens · 43920 ms · 2026-05-21T14:38:57.179554+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: Q-learning... Bellman equation $Q(s_t,a_t) \leftarrow (1-\alpha)\,Q(s_t,a_t) + \alpha\left[\Pi_{t+1} + \gamma \max_{a'} Q(s_{t+1},a')\right]$

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear

Paper passage: phase diagram of cooperation level within the space of learning parameters (α, γ)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

115 extracted references · 115 canonical work pages · 1 internal anchor

[1] M. A. Nowak, Science 314 (2006) 1560–1563
[2] R. Hardin, Trust and trustworthiness, Russell Sage Foundation (2002)
[3] T. Piketty, Capital in the Twenty-First Century, Belknap Press: An Imprint of Harvard University Press (2014)
[4] W. B. Arthur, Science 284 (1999) 107–109
[6] J. Maynard Smith and G. R. Price, Nature 246 (1973) 15–18
[7] J. M. Smith, Evolution and the Theory of Games, Cambridge University Press (1982)
[8] M. Perc, J. J. Jordan, D. G. Rand, Z. Wang, S. Boccaletti, and A. Szolnoki, Phys. Rep. 687 (2017) 1–51
[9] C. F. Camerer, Behavioral game theory: Experiments in strategic interaction, Princeton University Press (2011)
[10] A. Traulsen, D. Semmann, R. D. Sommerfeld, H.-J. Krambeck, and M. Milinski, Proc. Natl. Acad. Sci. USA 107 (2010) 2962–2966
[11] M. A. Nowak and R. M. May, Nature 359 (1992) 826–829
[12] G. Szabó and C. Tőke, Phys. Rev. E 58 (1998) 69–73
[13] A. Sánchez, J. Stat. Mech.: Theory Exp. 2018 (2018) 024001
[14] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press (2018)
[15] P. D. Taylor and L. B. Jonker, Math. Biosci. 40 (1978) 145–156
[16] G. Szabó and G. Fáth, Phys. Rep. 446 (2007) 97–216
[17] J. Grujić, C. Gracia-Lázaro, M. Milinski, D. Semmann, A. Traulsen, J. A. Cuesta, Y. Moreno, and A. Sánchez, Sci. Rep. 4 (2014) 4615
[18] A. Bandura, Social Learning Theory, Englewood Cliffs (1977)
[19] D. Lee, H. Seo, and M. W. Jung, Annu. Rev. Neurosci. 35 (2012) 287
[20] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, John Wiley & Sons (2014)
[21] R. R. Bush and F. Mosteller, Stochastic models for learning, John Wiley & Sons, Inc. (1955)
[22] C. J. C. H. Watkins, Learning from delayed rewards (Ph.D. thesis), University of Cambridge (1989)
[23] R. J. Williams, Mach. Learn. 8 (1992) 229
[24] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, in Advances in Neural Information Processing Systems, MIT Press (1999) Vol. 12
[25] J. M. Smith, in Did Darwin get it right? Essays on games, sex and evolution, Springer (1982) 202–215
[26] K. Xie and A. Szolnoki, Appl. Math. Comput. 510 (2026) 129685
[27] Z. Ding, G. Zheng, C. Cai, W. Cai, L. Chen, J. Zhang, and X. Wang, Chaos, Solitons & Fractals 175 (2023) 114032
[28] G. Zheng, Z. Ding, J. Zhang, S. Deng, W. Cai, and L. Chen, Chaos 35 (2025) 053129
[29] H. Ding, G. Zhang, S. Wang, J. Li, and Z. Wang, Physica A 536 (2019) 122551
[30] H. Lee, S. Chen, and F. Shi, New J. Phys. 27 (2025) 013025
[31] D. Jia, H. Guo, Z. Song, L. Shi, X. Deng, M. Perc, and Z. Wang, New J. Phys. 23 (2021) 083020
[32] C. Zhao, G. Zheng, C. Zhang, J. Zhang, and L. Chen, Chaos 34 (2024) 073123
[33] L. Wang, X. Shi, and Y. Zhou, Chaos 35 (2025) 023103
[34] Z. Fang, H. Xu, C. Xie, X. Yue, T. P. Benko, and C. Huang, Chaos, Solitons & Fractals 200 (2025) 117115
[35] Q. Zhang and Y. Yan, Phys. Lett. A 2025 (2025) 130754
[36] P. Bai, B. Qiang, K. Zou, and C. Huang, Chaos, Solitons & Fractals 180 (2024) 114592
[37] T. You, H. Yang, J. Wang, P. Zhang, J. Chen, and Y. Zhang, Appl. Math. Comput. 458 (2023) 128234
[38] Y. Huang and Y. Chen, Chaos 35 (2025) 043130
[39] H.-F. Zhang, Z.-X. Wu, and B.-H. Wang, J. Stat. Mech.: Theory Exp. 2012 (2012) P06005
[40] L. Wang, D. Jia, L. Zhang, P. Zhu, M. Perc, L. Shi, and Z. Wang, Nonlinear Dyn. 108 (2022) 1837
[41] X. Wang, Z. Yang, Y. Liu, and G. Chen, Physica A 618 (2023) 128699
[42] Q. Su, H. Wang, Y. Xia, and L. Wang, Nat. Commun. (2025) (in press)
[43] A. Sheng, J. Zhang, G. Zheng, J. Zhang, W. Cai, and L. Chen, Chaos 34 (2024) 103117
[44] G. Zheng, J. Zhang, S. Deng, W. Cai, and L. Chen, Chaos, Solitons & Fractals 188 (2024) 115568
[45] G. Hardin, Science 162 (1968) 1243–1248
[46] L. Wang, L. Fan, L. Zhang, R. Zou, and Z. Wang, New J. Phys. 25 (2023) 073008
[47] H. Zhang, T. An, P. Yan, K. Hu, J. An, L. Shi, J. Zhao, and J. Wang, Chaos, Solitons & Fractals 178 (2024) 114358
[48] K. Zou and C. Huang, Chaos, Solitons & Fractals 186 (2024) 115203
[49] B. Li, Z. Zhang, G. Zheng, C. Cai, J. Zhang, and L. Chen, Phys. Rev. E 111 (2025) 014304
[50] H. Kang, C. Jiang, Y. Shen, X. Sun, and Q. Chen, Chaos, Solitons & Fractals 199 (2025) 116862
[51] Y. Xu, J. Wang, J. Chen, D. Zhao, M. Özer, C. Xia, and M. Perc, Knowledge-Based Systems 301 (2024) 112326
[52] L. Zhang, Y. Li, Y. Xie, Y. Feng, and C. Huang, Chaos, Solitons & Fractals 193 (2025) 116071
[53] A. Traulsen, D. Semmann, R. D. Sommerfeld, H.-J. Krambeck, and M. Milinski, Proc. Natl. Acad. Sci. USA 107 (2010) 2962–2966
[54] X. Han, X. Zhao, and H. Xia, Chaos, Solitons & Fractals 164 (2022) 112684
[55] Y. Zhang, Z. Zheng, X. Zhang, and J. Ma, Chaos, Solitons & Fractals 201 (2025) 117264
[56] Y. Yang, D. Zhao, and J. Wang, Chaos, Solitons & Fractals 199 (2025) 116592
[57] L. Ma, J. Zhang, G. Zheng, R. Liang, and L. Chen, Chaos, Solitons & Fractals 171 (2023) 113452
[58] C. Zhao, X. Feng, G. Zheng, W. Cai, J. Zhang, and L. Chen, Phys. Rev. E 112 (2025) 054309
[59] G. Zheng, J. Zhang, J. Zhang, W. Cai, and L. Chen, New J. Phys. 26 (2024) 053041
[60] K. J. Arrow, The limits of organization, Norton & Company (1974)
[61] P. J. Zak and S. Knack, Econ. J. 111 (2001) 295–321
[62] Y. Algan and P. Cahuc, Annu. Rev. Econ. 5 (2013) 521–549
[63] J. Berg, J. Dickhaut, and K. McCabe, Games Econ. Behav. 10 (1995) 122–142
[64] N. D. Johnson and A. A. Mislin, J. Econ. Psychol. 32 (2011) 865–889
[65] G. Bravo and L. Tamburino, Rationality Soc. 20 (2008) 85–113
[66] C. Wang, Appl. Math. Comput. 471 (2024) 128595
[67] R. Guo, L. Liu, Y. Liu, and L. Zhang, Chaos, Solitons & Fractals 176 (2023) 114078
[68] Y. Zhu, W. Li, C. Xia, and M. Chica, Knowl.-Based Syst. 305 (2024) 112645
[69] R. Guo, L. Liu, Y. Liu, and L. Zhang, Appl. Math. Comput. 473 (2024) 128649
[70] Y. Liu, L. Wang, R. Guo, S. Hua, L. Liu, L. Zhang, and T. A. Han, J. R. Soc. Interface 22 (2025) 20240726
[71] A. Kumar, V. Capraro, and M. Perc, J. R. Soc. Interface 17 (2020) 20200491
[72] Y. Zhu, B. Xing, and C. Xia, Chaos, Solitons & Fractals 199 (2025) 116653
[73] Z. Hu, Y. Zhu, D. Zhao, and C. Xia, Chaos, Solitons & Fractals 202 (2026) 117623
[74] J. Engle-Warnick and R. L. Slonim, J. Econ. Behav. Organ. 55 (2004) 553–573
[75] G. Zheng, J. Zhang, X. Ou, S. Deng, and L. Chen, Phys. Rev. E 111 (2025) 064307
[76] W. Güth, R. Schmittberger, and B. Schwarze, J. Econ. Behav. Organ. 3 (1982) 367–388
[77] R. H. Thaler, J. Econ. Perspect. 2 (1988) 195–206
[78] W. Güth and M. G. Kocher, J. Econ. Behav. Organ. 108 (2014) 396–409
[79] G. Szabó and C. Tőke, Phys. Rev. E 58 (1998) 69–73
[80] K. M. Page and K. Sigmund, Proc. Biol. Sci. 267 (2000) 2177–2182
[81] M. N. Kuperman and S. Risau-Gusman, Eur. Phys. J. B 62 (2008) 233–238

Showing first 80 references.
