Decoding species coexistence: A reinforcement learning perspective

Chenyang Zhao; Jiqiang Zhang; Kaiwen Jiang; Li Chen; Shengfeng Deng; Weiran Cai

arxiv: 2508.17599 · v2 · pith:47MPOQLVnew · submitted 2025-08-25 · 🧬 q-bio.PE · cond-mat.dis-nn· nlin.AO

Decoding species coexistence: A reinforcement learning perspective

Kaiwen Jiang , Chenyang Zhao , Shengfeng Deng , Weiran Cai , Jiqiang Zhang , Li Chen This is my paper

Pith reviewed 2026-05-21 23:30 UTC · model grok-4.3

classification 🧬 q-bio.PE cond-mat.dis-nnnlin.AO

keywords species coexistencerock-paper-scissorsreinforcement learningQ-learningspatial ecologybiodiversity maintenanceadaptive mobility

0 comments

The pith

In a spatial rock-paper-scissors model, mobility adaptively regulated by reinforcement learning allows all three species to coexist stably across broad migration rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a reinforcement learning framework to a spatial rock-paper-scissors model to resolve why high mobility does not always eliminate biodiversity as earlier fixed-mobility theories predicted. Individuals use Q-learning to adjust movement in response to local encounters rather than following preset rates. This produces stable coexistence with low extinction risk over wide baseline migration values. The mechanism hinges on learned behaviors that balance predator avoidance with prey pursuit.

Core claim

When mobility is adaptively regulated via a Q-learning algorithm in a spatial RPS model, all three species coexist stably with low extinction probabilities across a broad range of baseline migration rates. Individuals develop survival priority by escaping predators and predation priority by remaining near prey. Coexistence arises from the balance of these tendencies; imbalance jeopardizes biodiversity. A symmetry-breaking of action preference in a particular state accounts for divergent species densities. Q-learning species show a significant evolutionary advantage when interacting with fixed-mobility counterparts.

What carries the argument

Q-learning algorithm that adaptively regulates individual mobility based on local predator-prey encounters.

If this is right

Coexistence remains stable over wide migration rates because learned behaviors balance escape and pursuit.
Imbalance between survival priority and predation priority drives biodiversity loss.
Symmetry-breaking in action preference in one state produces unequal species densities.
Adaptive-mobility species outcompete fixed-mobility species in direct interactions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Natural populations may evolve learning rules for movement that promote long-term diversity in competitive settings.
The approach could be tested in other cyclic competition models to check whether adaptive regulation generally supports coexistence.
Conservation planning might prioritize habitat features that enable behavioral adjustment over simply restricting overall movement.

Load-bearing premise

The Q-learning process accurately represents how real organisms adjust mobility in response to local predator-prey encounters.

What would settle it

Direct observation or measurement of movement rates in natural three-species cyclic systems showing whether individuals increase movement away from predators and toward prey in proportions matching the model's learned action preferences at high mobility.

Figures

Figures reproduced from arXiv: 2508.17599 by Chenyang Zhao, Jiqiang Zhang, Kaiwen Jiang, Li Chen, Shengfeng Deng, Weiran Cai.

**Figure 1.** Figure 1: , which occur at a rate σ. Reaction 2 shows the reproduction process with a rate µ, which can only take place when an adjacent site is empty. Reaction 3 represents the migration process with an exchange rate ε0. Following these reactions, one can monitor the density evolution of the three species and study the impact of these parameters, e.g., with the Gillespie algorithm [26]. Typically, the predation an… view at source ↗

**Figure 2.** Figure 2: (b, c) provide the typical time series for the two scenarios at M0 = 3 × 10−4 of all densities ρi , where i ∈ {A, B, C, ∅} [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: FIG. 3 [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: FIG. 4 [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: FIG. 5 [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: FIG. 6 [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 8.** Figure 8: FIG. 8 [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

**Figure 7.** Figure 7: FIG. 7 [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 9.** Figure 9: FIG. 9 [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗

**Figure 10.** Figure 10: FIG. 10 [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗

read the original abstract

A central goal in ecology is to understand how biodiversity is maintained. Previous theoretical works have employed the rock-paper-scissors (RPS) game as a toy model, demonstrating that population mobility is crucial in determining the species' coexistence. One key prediction is that biodiversity is jeopardized and eventually lost when mobility exceeds a certain value--a conclusion at odds with empirical observations of highly mobile species coexisting in nature. To address this discrepancy, we introduce a reinforcement learning framework and study a spatial RPS model, where individual mobility is adaptively regulated via a Q-learning algorithm rather than held fixed. Our results show that all three species can coexist stably, with extinction probabilities remaining low across a broad range of baseline migration rates. Mechanistic analysis reveals that individuals develop two behavioral tendencies: survival priority (escaping from predators) and predation priority (remaining near prey). While species coexistence emerges from the balance of the two tendencies, their imbalance jeopardizes biodiversity. Notably, there is a symmetry-breaking of action preference in a particular state that is responsible for the divergent species densities. Furthermore, when Q-learning species interact with fixed-mobility counterparts, those with adaptive mobility exhibit a significant evolutionary advantage. Our study suggests that reinforcement learning may offer a promising new perspective for uncovering the mechanisms of biodiversity and informing conservation strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Q-learning lets agents adapt mobility in spatial RPS and produces stable coexistence over wider migration ranges than fixed-mobility models, but the extinction-probability claims rest on simulation details that are not yet visible.

read the letter

The new element is the use of Q-learning to let each individual adjust its own mobility on the fly instead of fixing it in advance. In the classic spatial RPS setup this change removes the sharp mobility threshold that used to drive one species extinct, and the paper shows coexistence persisting across a broad interval of baseline migration rates. The mechanistic part is useful: agents develop two clear tendencies—escaping predators and staying near prey—and coexistence holds when those tendencies stay balanced. The symmetry-breaking in one state also accounts for the unequal densities that appear. When the learning agents compete against fixed-mobility ones, the adaptive group has a clear edge, which is a straightforward test of the advantage.

Referee Report

2 major / 2 minor

Summary. The paper introduces a reinforcement learning (Q-learning) framework to model adaptive individual mobility in a spatial rock-paper-scissors (RPS) ecological model. Unlike fixed-mobility versions where high mobility leads to biodiversity loss, the adaptive model allows stable coexistence of all three species, with low extinction probabilities maintained across a broad range of baseline migration rates. Mechanistic analysis identifies survival-priority and predation-priority behaviors, symmetry-breaking in action preferences that drives density differences, and an evolutionary advantage for Q-learning agents when competing against fixed-mobility counterparts.

Significance. If the reported simulation outcomes are robust, the work provides a novel mechanistic perspective on biodiversity maintenance that reconciles theoretical mobility thresholds with empirical observations of highly mobile coexisting species. The integration of reinforcement learning to derive adaptive behavioral rules from local interactions is a clear strength and could inform future agent-based ecological models and conservation applications.

major comments (2)

[Results / Simulation protocol] The central claim that extinction probabilities remain low for all three species across a broad range of baseline migration rates rests on the simulation protocol correctly sampling the tail of the extinction-time distribution. The abstract and available description supply no information on the number of independent Monte Carlo replicates, total run length, burn-in period, or convergence diagnostics for the probability estimates; if these are modest or short relative to typical extinction timescales at high mobility, rare extinctions could be missed, undermining the stability conclusion (see skeptic note on underpowered estimates).
[Methods / Q-learning details] The Q-learning implementation introduces free parameters (learning rate and exploration parameter) whose specific values and sensitivity are not reported. Because the coexistence result is obtained from agent-based simulations rather than an algebraic reduction, it is necessary to demonstrate that the low-extinction outcome is not an artifact of particular hyperparameter choices or post-hoc tuning.

minor comments (2)

Clarify the precise mapping between the 'baseline migration rate' parameter and the adaptive mobility output of the Q-learning algorithm; this notation is used in the abstract but its operational definition is unclear from the summary.
Add error bars, confidence intervals, or replicate variability to any figures reporting extinction probabilities or species densities to allow visual assessment of robustness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the two major comments below and have revised the manuscript to incorporate additional details on simulation protocols and Q-learning hyperparameters, along with supporting analyses to strengthen the robustness claims.

read point-by-point responses

Referee: [Results / Simulation protocol] The central claim that extinction probabilities remain low for all three species across a broad range of baseline migration rates rests on the simulation protocol correctly sampling the tail of the extinction-time distribution. The abstract and available description supply no information on the number of independent Monte Carlo replicates, total run length, burn-in period, or convergence diagnostics for the probability estimates; if these are modest or short relative to typical extinction timescales at high mobility, rare extinctions could be missed, undermining the stability conclusion (see skeptic note on underpowered estimates).

Authors: We agree that explicit reporting of the simulation protocol is necessary for assessing the reliability of the extinction probability estimates. In the revised manuscript, we will add a new subsection in the Methods section specifying the number of independent Monte Carlo replicates (1000 per parameter combination), total simulation length (10^6 time steps after a 10^5-step burn-in), and convergence checks (monitoring stabilization of species densities and extinction event counts). These parameters were chosen to exceed typical extinction timescales observed in fixed-mobility controls, ensuring adequate sampling of rare events. We will also include supplementary figures showing cumulative extinction probability convergence over replicate count. revision: yes
Referee: [Methods / Q-learning details] The Q-learning implementation introduces free parameters (learning rate and exploration parameter) whose specific values and sensitivity are not reported. Because the coexistence result is obtained from agent-based simulations rather than an algebraic reduction, it is necessary to demonstrate that the low-extinction outcome is not an artifact of particular hyperparameter choices or post-hoc tuning.

Authors: We acknowledge the importance of demonstrating robustness to hyperparameter choices. The revised manuscript will explicitly report the values used (learning rate α = 0.1, exploration rate ε = 0.05 with linear decay) in the Methods. We will add a new supplementary section with sensitivity analyses varying α from 0.01 to 0.5 and ε from 0.01 to 0.2, showing that stable coexistence with low extinction probabilities persists across this range. These results confirm that the reported outcomes are not sensitive to specific tuning. revision: yes

Circularity Check

0 steps flagged

No circularity: coexistence claims arise from independent agent-based RL simulations

full rationale

The paper's central results on stable coexistence and low extinction probabilities are generated by running Q-learning agents in a spatial RPS lattice model. No algebraic derivation chain is presented that reduces the reported outcomes to fitted parameters, self-definitions, or prior self-citations. The adaptive mobility behaviors and symmetry-breaking are observed outputs of the simulation protocol rather than inputs renamed as predictions. The work is self-contained against external benchmarks because the RL update rules and mobility adaptation are defined independently of the final extinction statistics.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The model rests on standard RPS interaction rules and typical Q-learning update mechanics; no new physical entities are postulated, but several algorithmic parameters remain unspecified in the abstract.

free parameters (2)

Q-learning rate
Standard hyperparameter controlling how quickly action values are updated; value and selection method not stated in abstract.
exploration parameter
Controls balance between trying new actions and using learned ones; tuning details absent from abstract.

axioms (1)

domain assumption Species interactions follow cyclic dominance on a spatial lattice with local movement decisions.
Invoked as the base model structure throughout the abstract.

pith-pipeline@v0.9.0 · 5774 in / 1291 out tokens · 52756 ms · 2026-05-21T23:30:08.198702+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

individuals develop two behavioral tendencies: survival priority (escaping from predators) and predation priority (remaining near prey)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Q-learning algorithm on a spatial RPS model; individuals belonging to the same species are guided by a common Q-table

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A brief review of evolutionary game dynamics in the reinforcement learning paradigm
q-bio.PE 2026-02 unverdicted novelty 2.0

A review synthesizing how reinforcement learning in evolutionary games provides a unified framework for social and ecological phenomena beyond traditional imitation models.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper

[1]

Initialize all the items of the three Q-tables with random num- bers Qs,a ∈ (0, 1) independently to mimic the un- awareness of individuals to the surroundings

Each site of the lattice is randomly occupied by an individual of A, B, C, or left empty. Initialize all the items of the three Q-tables with random num- bers Qs,a ∈ (0, 1) independently to mimic the un- awareness of individuals to the surroundings. Each player i takes a random migration rate withai ∈ A

work page
[2]

Their states also need to be updated

In the learning process, each agent’s action is made by pure exploration ai ∈ A ; afterwards, their re- wards are obtained by collecting payoffs, and then they update their Q-tables to accumulate their ex- perience. Their states also need to be updated

work page
[3]

Their migration is then strictly guided by the corresponding Q-table belonging to their species, and the three Q-tables are no longer revised

After the three Q-tables are converged, the game process starts. Their migration is then strictly guided by the corresponding Q-table belonging to their species, and the three Q-tables are no longer revised. Repeat step 2 till the convergence of three Q-tables, which completes the learning process. Repeat step 3 un- til the system reaches a statistically ...

work page
[4]

M. E. Assessment, Ecosystems and human well-being: current state and trends: findings of the Condition and Trends Working Group (Island press, 2005)

work page 2005
[5]

Darwin, On the origin of species, 1859 (Routledge London, UK:, 2004)

C. Darwin, On the origin of species, 1859 (Routledge London, UK:, 2004)

work page 2004
[6]

Pennisi, Science 309, 90 (2005)

E. Pennisi, Science 309, 90 (2005)

work page 2005
[7]

Government, The Economics of Biodiversity: The Dasgupta Review (UK Government, 2021)

U. Government, The Economics of Biodiversity: The Dasgupta Review (UK Government, 2021)

work page 2021
[8]

May and A

R. May and A. R. McLean, Theoretical ecology: princi- ples and applications (OUP Oxford, 2007)

work page 2007
[9]

J. D. Murray, Mathematical biology: I. An introduction 3rd ed. , Vol. 17 (Springer Science & Business Media, 2013)

work page 2013
[10]

C. L. Lehman and D. Tilman, Spatial ecology: the role of space in population dynamics and interspecific inter- actions 185, 191 (1997)

work page 1997
[11]

A. J. McLane, C. Semeniuk, G. J. McDermid, and D. J. Marceau, Ecological modelling 222, 1544 (2011)

work page 2011
[12]

J. M. Smith, Evolution and the Theory of Games (Cam- bridge Univ. Press, 1982)

work page 1982
[13]

Hofbauer and K

J. Hofbauer and K. Sigmund, Evolutionary Games and Population Dynamics (Cambridge University Press, Cambridge, 1998)

work page 1998
[14]

M. A. Nowak, Evolutionary Dynamics (Belknap Press, Cambridge, MA, 2006)

work page 2006
[15]

Durrett and S

R. Durrett and S. Levin, Journal of Theoretical Biology 185, 165 (1997)

work page 1997
[16]

Durrett and S

R. Durrett and S. Levin, Theoretical Population Biology 53, 30 (1998)

work page 1998
[17]

B. Kerr, C. Neuhauser, B. J. M. Bohannan, and A. M. Dean, Nature 418, 171 (2002)

work page 2002
[19]

R. M. May and W. J. Leonard, SIAM Journal on Applied Mathematics 29, 243 (1975)

work page 1975
[20]

C. R. Johnson and I. Seinen, Proceedings of the Royal Society of London. Series B: Biological Sciences 269, 655 (2002)

work page 2002
[21]

Reichenbach, M

T. Reichenbach, M. Mobilia, and E. Frey, Physical Re- view E 74, 051907 (2006)

work page 2006
[22]

Szab´ o and G

G. Szab´ o and G. F´ ath, Physics Reports446, 97 (2007)

work page 2007
[23]

Szolnoki, M

A. Szolnoki, M. Mobilia, L.-L. Jiang, B. Szczesny, A. M. Rucklidge, and M. Perc, Journal of the Royal Society Interface 11, 20140735 (2014)

work page 2014
[24]

Zhou, Contemporary Physics 57, 151 (2016)

H.-J. Zhou, Contemporary Physics 57, 151 (2016)

work page 2016
[25]

Sinervo and C

B. Sinervo and C. M. Lively, Nature 380, 240 (1996)

work page 1996
[26]

C. E. Paquin and J. Adams, Nature 306, 368 (1983)

work page 1983
[27]

Jackson and L

J. Jackson and L. Buss, Proceedings of the National Academy of Sciences 72, 5160 (1975)

work page 1975
[28]

T. L. Cz´ ar´ an, R. F. Hoekstra, and L. Pagie, Proceedings of the National Academy of Sciences 99, 786 (2002)

work page 2002
[29]

Reichenbach, M

T. Reichenbach, M. Mobilia, and E. Frey, Nature 448, 1046 (2007)

work page 2007
[30]

Yang, W.-X

R. Yang, W.-X. Wang, Y.-C. Lai, and C. Grebogi, Chaos 20, 023113 (2010)

work page 2010
[31]

Wang, Y.-C

W.-X. Wang, Y.-C. Lai, and C. Grebogi, Physical Re- view E 81, 046113 (2010)

work page 2010
[32]

W.-X. Wang, X. Ni, Y.-C. Lai, and C. Grebogi, Physical Review E 83, 011917 (2011)

work page 2011
[33]

J. Park, Y. Do, Z. Huang, and Y. Lai, Chaos 23, 023128 (2013)

work page 2013
[34]

Huang, X

W. Huang, X. Duan, L. Qin, and J. Park, Applied Math- ematics and Computation 456, 128135 (2023), early ac- cess: Jun 2023

work page 2023
[35]

H.-W. Lee, C. Cleveland, and A. Szolnoki, Chaos 32, 093103 (2022)

work page 2022
[36]

Menezes, M

J. Menezes, M. Tenorio, and E. Rangel, EPL 139, 57002 (2022)

work page 2022
[37]

Park and B

J. Park and B. Jang, Journal of the Korean Society for Industrial and Applied Mathematics 24, 351 (2020)

work page 2020
[38]

Park, EPL 126, 38004 (2019)

J. Park, EPL 126, 38004 (2019). 11 Algorithm 1: RPS model with Q-learning Input: α, γ Initialization; Q1, Q2, Q3 ← random(15 × 7); Lattice point ← random[0, 3]L×L; σ, µ ← 1; Nstep ← 10; Learning Process; repeat for each round t do for Each agent do Agent picks a random action a ∈ A; for interaction count = 1 to Nstep × L2 do Randomly select an agent and i...

work page 2019
[39]

Z. Ding, G. Zheng, C. Cai, W. Cai, L. Chen, J. Zhang, and X. Wang, Chaos, Solitons & Fractals 175, 114032 (2023)

work page 2023
[40]

Zheng, J

G. Zheng, J. Zhang, S. Deng, W. Cai, and L. Chen, Chaos, Solitons & Fractals 188, 115568 (2024)

work page 2024
[41]

Zheng, J

G. Zheng, J. Zhang, J. Zhang, W. Cai, and L. Chen, New Journal of Physics 26, 053041 (2024)

work page 2024
[42]

Zheng, J

G. Zheng, J. Zhang, X. Ou, S. Deng, and L. Chen, Phys- ical Review E 111, 064307 (2025)

work page 2025
[43]

Zheng, W

G. Zheng, W. Cai, G. Qi, J. Zhang, and L. Chen, arXiv:2312.14970 (2023)

work page arXiv 2023
[44]

Zhang, J

S. Zhang, J. Zhang, L. Chen, and X. Liu, Nonlinear Dynamics 99, 3301 (2020)

work page 2020
[45]

Zhang, S

J. Zhang, S. Zhang, L. Chen, and X. Liu, Physical Re- view E 101, 042402 (2020)

work page 2020
[46]

M. M. Olsen and R. Fraczkowski, Journal of Computa- tional Science 9, 118 (2015). (a) 0 500 1000 1500 2000 2500 3000 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 A B C empty sites (b) FIG. 10: Evolution in predation dominance scenarios. The typical pattern (a) and time series (b) in the predation dominance scenario with Rp = 40 and Rs = 0.5. Parameters: N...

work page 2015
[47]

X. Wang, J. Cheng, and L. Wang, Entropy 21, 773 (2019)

work page 2019
[48]

X. Wang, J. Cheng, and L. Wang, Ecological Complexity 42, 100815 (2020)

work page 2020
[49]

J. Park, J. Lee, T. Kim, I. Ahn, and J. Park, Entropy 23, 461 (2021)

work page 2021
[50]

J. Li, L. Li, and S. Zhao, New Journal of Physics 25, 092001 (2023)

work page 2023
[51]

Tsutsui, R

K. Tsutsui, R. Tanaka, K. Takeda, and K. Fujii, Elife 13, e85694 (2024)

work page 2024
[52]

Si and T

Z. Si and T. Ito, Chaos, Solitons & Fractals 199, 116628 (2025)

work page 2025
[53]

Reichenbach, M

T. Reichenbach, M. Mobilia, and E. Frey, Journal of Theoretical Biology 254, 368 (2008)

work page 2008
[54]

C. J. C. H. Watkins and P. Dayan, Machine Learning 8, 279 (1992)

work page 1992
[55]

Sutton and A

R. Sutton and A. Barto, Reinforcement Learning:An In- troduction (MIT press, 2018)

work page 2018
[56]

J. E. R. Staddon, Adaptive behavior and learning (Cam- bridge University Press, 1983)

work page 1983
[57]

A. J. Underwood, Experiments in ecology: their logi- cal design and interpretation using analysis of variance (Cambridge university press, 1997)

work page 1997

[1] [1]

Initialize all the items of the three Q-tables with random num- bers Qs,a ∈ (0, 1) independently to mimic the un- awareness of individuals to the surroundings

Each site of the lattice is randomly occupied by an individual of A, B, C, or left empty. Initialize all the items of the three Q-tables with random num- bers Qs,a ∈ (0, 1) independently to mimic the un- awareness of individuals to the surroundings. Each player i takes a random migration rate withai ∈ A

work page

[2] [2]

Their states also need to be updated

In the learning process, each agent’s action is made by pure exploration ai ∈ A ; afterwards, their re- wards are obtained by collecting payoffs, and then they update their Q-tables to accumulate their ex- perience. Their states also need to be updated

work page

[3] [3]

Their migration is then strictly guided by the corresponding Q-table belonging to their species, and the three Q-tables are no longer revised

After the three Q-tables are converged, the game process starts. Their migration is then strictly guided by the corresponding Q-table belonging to their species, and the three Q-tables are no longer revised. Repeat step 2 till the convergence of three Q-tables, which completes the learning process. Repeat step 3 un- til the system reaches a statistically ...

work page

[4] [4]

M. E. Assessment, Ecosystems and human well-being: current state and trends: findings of the Condition and Trends Working Group (Island press, 2005)

work page 2005

[5] [5]

Darwin, On the origin of species, 1859 (Routledge London, UK:, 2004)

C. Darwin, On the origin of species, 1859 (Routledge London, UK:, 2004)

work page 2004

[6] [6]

Pennisi, Science 309, 90 (2005)

E. Pennisi, Science 309, 90 (2005)

work page 2005

[7] [7]

Government, The Economics of Biodiversity: The Dasgupta Review (UK Government, 2021)

U. Government, The Economics of Biodiversity: The Dasgupta Review (UK Government, 2021)

work page 2021

[8] [8]

May and A

R. May and A. R. McLean, Theoretical ecology: princi- ples and applications (OUP Oxford, 2007)

work page 2007

[9] [9]

J. D. Murray, Mathematical biology: I. An introduction 3rd ed. , Vol. 17 (Springer Science & Business Media, 2013)

work page 2013

[10] [10]

C. L. Lehman and D. Tilman, Spatial ecology: the role of space in population dynamics and interspecific inter- actions 185, 191 (1997)

work page 1997

[11] [11]

A. J. McLane, C. Semeniuk, G. J. McDermid, and D. J. Marceau, Ecological modelling 222, 1544 (2011)

work page 2011

[12] [12]

J. M. Smith, Evolution and the Theory of Games (Cam- bridge Univ. Press, 1982)

work page 1982

[13] [13]

Hofbauer and K

J. Hofbauer and K. Sigmund, Evolutionary Games and Population Dynamics (Cambridge University Press, Cambridge, 1998)

work page 1998

[14] [14]

M. A. Nowak, Evolutionary Dynamics (Belknap Press, Cambridge, MA, 2006)

work page 2006

[15] [15]

Durrett and S

R. Durrett and S. Levin, Journal of Theoretical Biology 185, 165 (1997)

work page 1997

[16] [16]

Durrett and S

R. Durrett and S. Levin, Theoretical Population Biology 53, 30 (1998)

work page 1998

[17] [17]

B. Kerr, C. Neuhauser, B. J. M. Bohannan, and A. M. Dean, Nature 418, 171 (2002)

work page 2002

[18] [19]

R. M. May and W. J. Leonard, SIAM Journal on Applied Mathematics 29, 243 (1975)

work page 1975

[19] [20]

C. R. Johnson and I. Seinen, Proceedings of the Royal Society of London. Series B: Biological Sciences 269, 655 (2002)

work page 2002

[20] [21]

Reichenbach, M

T. Reichenbach, M. Mobilia, and E. Frey, Physical Re- view E 74, 051907 (2006)

work page 2006

[21] [22]

Szab´ o and G

G. Szab´ o and G. F´ ath, Physics Reports446, 97 (2007)

work page 2007

[22] [23]

Szolnoki, M

A. Szolnoki, M. Mobilia, L.-L. Jiang, B. Szczesny, A. M. Rucklidge, and M. Perc, Journal of the Royal Society Interface 11, 20140735 (2014)

work page 2014

[23] [24]

Zhou, Contemporary Physics 57, 151 (2016)

H.-J. Zhou, Contemporary Physics 57, 151 (2016)

work page 2016

[24] [25]

Sinervo and C

B. Sinervo and C. M. Lively, Nature 380, 240 (1996)

work page 1996

[25] [26]

C. E. Paquin and J. Adams, Nature 306, 368 (1983)

work page 1983

[26] [27]

Jackson and L

J. Jackson and L. Buss, Proceedings of the National Academy of Sciences 72, 5160 (1975)

work page 1975

[27] [28]

T. L. Cz´ ar´ an, R. F. Hoekstra, and L. Pagie, Proceedings of the National Academy of Sciences 99, 786 (2002)

work page 2002

[28] [29]

Reichenbach, M

T. Reichenbach, M. Mobilia, and E. Frey, Nature 448, 1046 (2007)

work page 2007

[29] [30]

Yang, W.-X

R. Yang, W.-X. Wang, Y.-C. Lai, and C. Grebogi, Chaos 20, 023113 (2010)

work page 2010

[30] [31]

Wang, Y.-C

W.-X. Wang, Y.-C. Lai, and C. Grebogi, Physical Re- view E 81, 046113 (2010)

work page 2010

[31] [32]

W.-X. Wang, X. Ni, Y.-C. Lai, and C. Grebogi, Physical Review E 83, 011917 (2011)

work page 2011

[32] [33]

J. Park, Y. Do, Z. Huang, and Y. Lai, Chaos 23, 023128 (2013)

work page 2013

[33] [34]

Huang, X

W. Huang, X. Duan, L. Qin, and J. Park, Applied Math- ematics and Computation 456, 128135 (2023), early ac- cess: Jun 2023

work page 2023

[34] [35]

H.-W. Lee, C. Cleveland, and A. Szolnoki, Chaos 32, 093103 (2022)

work page 2022

[35] [36]

Menezes, M

J. Menezes, M. Tenorio, and E. Rangel, EPL 139, 57002 (2022)

work page 2022

[36] [37]

Park and B

J. Park and B. Jang, Journal of the Korean Society for Industrial and Applied Mathematics 24, 351 (2020)

work page 2020

[37] [38]

Park, EPL 126, 38004 (2019)

J. Park, EPL 126, 38004 (2019). 11 Algorithm 1: RPS model with Q-learning Input: α, γ Initialization; Q1, Q2, Q3 ← random(15 × 7); Lattice point ← random[0, 3]L×L; σ, µ ← 1; Nstep ← 10; Learning Process; repeat for each round t do for Each agent do Agent picks a random action a ∈ A; for interaction count = 1 to Nstep × L2 do Randomly select an agent and i...

work page 2019

[38] [39]

Z. Ding, G. Zheng, C. Cai, W. Cai, L. Chen, J. Zhang, and X. Wang, Chaos, Solitons & Fractals 175, 114032 (2023)

work page 2023

[39] [40]

Zheng, J

G. Zheng, J. Zhang, S. Deng, W. Cai, and L. Chen, Chaos, Solitons & Fractals 188, 115568 (2024)

work page 2024

[40] [41]

Zheng, J

G. Zheng, J. Zhang, J. Zhang, W. Cai, and L. Chen, New Journal of Physics 26, 053041 (2024)

work page 2024

[41] [42]

Zheng, J

G. Zheng, J. Zhang, X. Ou, S. Deng, and L. Chen, Phys- ical Review E 111, 064307 (2025)

work page 2025

[42] [43]

Zheng, W

G. Zheng, W. Cai, G. Qi, J. Zhang, and L. Chen, arXiv:2312.14970 (2023)

work page arXiv 2023

[43] [44]

Zhang, J

S. Zhang, J. Zhang, L. Chen, and X. Liu, Nonlinear Dynamics 99, 3301 (2020)

work page 2020

[44] [45]

Zhang, S

J. Zhang, S. Zhang, L. Chen, and X. Liu, Physical Re- view E 101, 042402 (2020)

work page 2020

[45] [46]

M. M. Olsen and R. Fraczkowski, Journal of Computa- tional Science 9, 118 (2015). (a) 0 500 1000 1500 2000 2500 3000 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 A B C empty sites (b) FIG. 10: Evolution in predation dominance scenarios. The typical pattern (a) and time series (b) in the predation dominance scenario with Rp = 40 and Rs = 0.5. Parameters: N...

work page 2015

[46] [47]

X. Wang, J. Cheng, and L. Wang, Entropy 21, 773 (2019)

work page 2019

[47] [48]

X. Wang, J. Cheng, and L. Wang, Ecological Complexity 42, 100815 (2020)

work page 2020

[48] [49]

J. Park, J. Lee, T. Kim, I. Ahn, and J. Park, Entropy 23, 461 (2021)

work page 2021

[49] [50]

J. Li, L. Li, and S. Zhao, New Journal of Physics 25, 092001 (2023)

work page 2023

[50] [51]

Tsutsui, R

K. Tsutsui, R. Tanaka, K. Takeda, and K. Fujii, Elife 13, e85694 (2024)

work page 2024

[51] [52]

Si and T

Z. Si and T. Ito, Chaos, Solitons & Fractals 199, 116628 (2025)

work page 2025

[52] [53]

Reichenbach, M

T. Reichenbach, M. Mobilia, and E. Frey, Journal of Theoretical Biology 254, 368 (2008)

work page 2008

[53] [54]

C. J. C. H. Watkins and P. Dayan, Machine Learning 8, 279 (1992)

work page 1992

[54] [55]

Sutton and A

R. Sutton and A. Barto, Reinforcement Learning:An In- troduction (MIT press, 2018)

work page 2018

[55] [56]

J. E. R. Staddon, Adaptive behavior and learning (Cam- bridge University Press, 1983)

work page 1983

[56] [57]

A. J. Underwood, Experiments in ecology: their logi- cal design and interpretation using analysis of variance (Cambridge university press, 1997)

work page 1997