The Hive Mind is a Single Reinforcement Learning Agent

Giovanni Beltrame; Heiko Hamann; Karthik Soma; Yann Bouteiller

arXiv: 2410.17517 · v5 · submitted 2024-10-23 · 💻 cs.MA · cs.AI · cs.GT

Pith reviewed 2026-05-23 19:16 UTC · model grok-4.3

classification 💻 cs.MA · cs.AI · cs.GT

keywords: hive mind · reinforcement learning · collective decision-making · honey bees · waggle dance · multi-armed bandit · imitation · Maynard-Cross Learning

The pith

Honey bee swarms using only local imitation learn exactly as one reinforcement learning agent does.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes an equivalence between collective decision-making via imitation and single-agent trial-and-error learning. In the weighted voter model of honey bee nest-hunting, the group's choices update according to a multi-armed bandit rule the authors call Maynard-Cross Learning, making the hive mind behave as one online RL agent facing many parallel environments. A reader would care because the result unifies two classic explanations for intelligent behavior: groups that copy successes and individuals that explore. It shows that simple, local imitation rules can produce the same learning dynamics as explicit reinforcement without any individual needing to reason about rewards. The authors note this view applies to any imitative collective and supplies a formal tool for analyzing such systems.

Core claim

The emergent distributed cognition arising from individuals following simple, local imitation-based rules in the weighted voter model of bees' waggle dance is that of a single online reinforcement learning agent interacting with many parallel environments; the group's update rule is the multi-armed bandit algorithm the authors term Maynard-Cross Learning.
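
To make "a single online reinforcement learning agent interacting with many parallel environments" concrete, here is a minimal Python sketch. It uses the classical Cross learning bandit update (Cross, 1973) as a stand-in: this page does not spell out the Maynard-Cross Learning equations, so the sketch should not be read as the authors' algorithm. The function names, the batch-averaging scheme, the learning rate, and the Bernoulli rewards are all assumptions made for illustration.

```python
import random


def cross_learning_step(policy, arm, reward, lr=1.0):
    """Cross learning update for one (arm, reward) sample, reward in [0, 1].

    The pulled arm moves toward probability 1 by lr * reward * (1 - p);
    every other arm shrinks by lr * reward * p, so the result still sums to 1.
    """
    return [
        p + lr * reward * ((1.0 if a == arm else 0.0) - p)
        for a, p in enumerate(policy)
    ]


def parallel_cross_learning(qualities, n_envs=100, n_rounds=200, lr=0.1, seed=0):
    """One learner facing n_envs parallel Bernoulli bandits (assumed setup).

    Each round the learner samples one arm per environment from its current
    policy and applies the average of the per-sample Cross updates.
    """
    rng = random.Random(seed)
    n_arms = len(qualities)
    policy = [1.0 / n_arms] * n_arms
    for _ in range(n_rounds):
        arms = rng.choices(range(n_arms), weights=policy, k=n_envs)
        delta = [0.0] * n_arms
        for arm in arms:
            # Bernoulli reward with success probability equal to the arm quality.
            reward = 1.0 if rng.random() < qualities[arm] else 0.0
            updated = cross_learning_step(policy, arm, reward, lr)
            for a in range(n_arms):
                delta[a] += (updated[a] - policy[a]) / n_envs
        policy = [p + d for p, d in zip(policy, delta)]
    return policy


final_policy = parallel_cross_learning([0.2, 0.5, 0.9])
print(final_policy)
```

Averaging the per-environment updates keeps the policy a valid probability vector, because each individual update is a convex combination of the current policy and a one-hot vector. In this toy setting the policy should concentrate on the highest-quality arm.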

What carries the argument

The weighted voter model of waggle-dance communication, whose choice-update rule matches Maynard-Cross Learning and thereby equates the collective to a single multi-armed-bandit agent.
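
A rough sketch of the mechanism may help here. It assumes a well-mixed, synchronous simplification of the weighted voter model in which every agent re-picks its option by copying a dancing peer, and peers advertising a better option are proportionally more likely to be copied. The function names and this exact update schedule are illustrative assumptions, not the paper's formulation. Under that simplification the population shares follow the map `pi_a -> pi_a * q_a / v`, with `v` the mean quality, so the change in each share is `pi_a * (q_a - v) / v`.

```python
import random


def weighted_voter_round(opinions, quality, rng):
    """One synchronous round: each agent copies a peer chosen with probability
    proportional to the quality of that peer's option (assumed simplification)."""
    weights = [quality[o] for o in opinions]
    return rng.choices(opinions, weights=weights, k=len(opinions))


def shares(opinions, n_options):
    """Fraction of the population currently committed to each option."""
    counts = [0] * n_options
    for o in opinions:
        counts[o] += 1
    return [c / len(opinions) for c in counts]


def mean_field_round(pi, quality):
    """Large-population limit of weighted_voter_round."""
    v = sum(p * q for p, q in zip(pi, quality))  # mean advertised quality
    return [p * q / v for p, q in zip(pi, quality)]


rng = random.Random(0)
quality = [0.9, 0.3]                  # hypothetical site qualities in (0, 1]
swarm = [0] * 10000 + [1] * 10000     # start evenly split between two sites
swarm = weighted_voter_round(swarm, quality, rng)
print(shares(swarm, 2), mean_field_round([0.5, 0.5], quality))
```

The simulated shares and the mean-field prediction should be close for a population of this size; no individual agent ever reads a reward, it only copies.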

If this is right

A group of purely imitative organisms functions as a more complex reinforcement-enabled entity.
Group-level intelligence can explain the evolutionary selection of simple individual behaviors.
Imitative economic or social systems can be analyzed as collective learning processes.
The framework supplies design principles for scalable artificial collective systems inspired by RL.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same equivalence might appear in other animal groups that rely on local copying rules.
Markets or voting systems could be modeled as single RL agents when participants imitate successful choices.
Adding memory or private exploration to the bee model would be a direct test of whether the equivalence survives.
Multi-agent RL systems could deliberately use imitation to reproduce single-agent learning at scale.
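
The third extension in the list above (adding private exploration) can be prototyped in a few lines. The sketch below is hypothetical: it assumes a mean-field imitation map in which shares are reweighted by site quality, and it adds an `explore` parameter, not taken from the paper, for the fraction of agents that ignore dances and pick a site uniformly at random.

```python
def imitation_round(pi, quality):
    """Pure imitation: shares are reweighted by quality and renormalized."""
    v = sum(p * q for p, q in zip(pi, quality))
    return [p * q / v for p, q in zip(pi, quality)]


def imitation_with_exploration(pi, quality, explore=0.05):
    """Hypothetical variant: a fraction `explore` of agents picks a site
    uniformly at random instead of copying a dancer."""
    copied = imitation_round(pi, quality)
    n = len(pi)
    return [(1.0 - explore) * c + explore / n for c in copied]


def iterate(step, pi, quality, rounds, **kwargs):
    """Apply `step` repeatedly and return the final shares."""
    for _ in range(rounds):
        pi = step(pi, quality, **kwargs)
    return pi


quality = [0.9, 0.3]   # site 0 is better
start = [0.0, 1.0]     # but nobody in the swarm currently advertises it
print(iterate(imitation_round, start, quality, 50))
print(iterate(imitation_with_exploration, start, quality, 50, explore=0.05))
```

Under pure imitation an option that nobody advertises can never be adopted, whereas even a small exploration rate lets the better site be rediscovered. Whether such a modified model still maps onto a single-agent update rule is exactly the open question the bullet raises; the sketch only shows that the dynamics differ.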

Load-bearing premise

The weighted voter model fully captures the bees' collective decision process without additional mechanisms such as memory or spatial exploration.

What would settle it

Demonstration that real bees employ mechanisms outside the weighted voter model, such as individual memory of sites or non-imitative exploration, during nest selection.

Original abstract

Decision-making is an essential attribute of any intelligent agent or group. Natural systems are known to converge to effective strategies through at least two distinct mechanisms: collective decision-making via imitation of others, and trial-and-error by a single agent. This paper establishes an equivalence between these two paradigms by drawing from the well-studied collective decision-making problem of nest-hunting in swarms of honey bees. We show that the emergent distributed cognition (sometimes referred to as the $\textit{hive mind}$) arising from individuals following simple, local imitation-based rules is that of a single online reinforcement learning (RL) agent interacting with many parallel environments. More specifically, in the purely imitative $\textit{weighted voter}$ model of bees' waggle dance, the update rule through which this macro-agent learns is a multi-armed bandit algorithm that we coin $\textit{Maynard-Cross Learning}$. Our analysis implies that a group of purely imitative organisms can be equivalent to a more complex, reinforcement-enabled entity, substantiating the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature. Beyond biology, the framework offers new tools for analyzing economic and social systems where individuals imitate successful strategies, effectively participating in a collective learning process. Our findings may further inform the design of scalable RL-inspired collective systems in artificial domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a mathematical mapping from the weighted voter model to a multi-armed bandit update rule inside that model, but the claim that this explains real bee hives or natural selection rests on the model being complete.

The punchline is that the weighted voter model of bee waggle-dance imitation produces collective updates identical to a multi-armed bandit algorithm they name Maynard-Cross Learning. This lets them treat the hive as one RL agent acting across many parallel environments. That equivalence is the actual new content relative to standard swarm and RL citations.

The framing is clean and gives a precise link between local imitation and single-agent learning, which could be handy for modeling imitation-driven economic or social systems. The paper earns credit for making the mapping explicit rather than leaving it at the level of loose analogy. The derivation appears internal to the model, so the result holds within those assumptions.

The soft spots are straightforward. The abstract supplies no equations or steps, which makes it impossible to judge whether the mapping is a genuine derivation or just a restatement of the update rule. Coining a new algorithm name for a rule pulled directly from the model adds little beyond labeling. The broader claim that this equivalence explains why simple individual behaviors are selected in nature requires the weighted voter model to be a full description of bee decision-making. Any omitted mechanisms such as spatial memory or direct site assessment would break the equivalence for actual hives, and the abstract gives no empirical check on that completeness. The stress-test note is on target here.

This paper is for people working at the intersection of multi-agent RL and collective behavior who already know the weighted voter model. A reader looking for a formal bridge between imitation and RL will get something usable if the math checks out. It deserves peer review because the core mapping, once written out, is the kind of result that referees can evaluate directly even if the biological extension needs more support.

Referee Report

2 major / 2 minor

Summary. The paper claims that within the weighted voter model of honeybee waggle-dance communication, the emergent collective decision process is mathematically equivalent to a single online reinforcement learning agent interacting with many parallel environments; specifically, the collective update rule is a multi-armed bandit algorithm that the authors name Maynard-Cross Learning. This equivalence is used to argue that group-level intelligence can arise from simple local imitation rules, with implications for natural selection, economic systems, and the design of collective AI.

Significance. If the internal mapping is rigorously derived, the result supplies a clean theoretical bridge between imitation-based collective behavior and reinforcement learning, offering a parameter-free account of how simple individual rules can produce effective group strategies. This could inform multi-agent system design and bio-inspired algorithms, though its explanatory power for natural systems hinges on model completeness.

major comments (2)

[Abstract / model section] Abstract and model section: the equivalence is asserted for the weighted voter model, yet the broader claim that this 'substantiates the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature' requires the model to be a complete description; no empirical comparison or discussion of omitted mechanisms (spatial exploration, memory, non-imitative rules) is supplied, creating a correctness risk for the natural-selection implication.
[Maynard-Cross Learning definition] Section introducing Maynard-Cross Learning: coining a new algorithm name for the update rule obtained directly by aggregating the weighted voter model makes the claimed equivalence appear definitional rather than independently derived; a concrete test is whether the algorithm exhibits any property not already entailed by the model's aggregation step.

minor comments (2)

[Abstract] Abstract: the equivalence statement would be clearer if a single key equation or proof outline were included.
[Notation / model definition] Notation: the distinction between individual and collective update rules should be made explicit with consistent symbols to avoid reader confusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and the recommendation for major revision. We address each major comment below with specific plans for revision where warranted. The core mathematical equivalence between the weighted voter model and the derived update rule remains intact, but we agree that certain claims require additional qualification and clarification.

Referee: [Abstract / model section] Abstract and model section: the equivalence is asserted for the weighted voter model, yet the broader claim that this 'substantiates the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature' requires the model to be a complete description; no empirical comparison or discussion of omitted mechanisms (spatial exploration, memory, non-imitative rules) is supplied, creating a correctness risk for the natural-selection implication.

Authors: We agree that the natural-selection implication in the abstract and introduction is stated too strongly given the model's scope. The equivalence is rigorously derived only for the purely imitative weighted voter model; extending it to explain evolutionary selection requires acknowledging that real bee colonies include additional mechanisms. We will revise the abstract and add a new subsection (likely in the Discussion) that explicitly lists the omitted mechanisms (spatial exploration, individual memory, and non-imitative rules), discusses how they might alter or preserve the equivalence, and qualifies the evolutionary claim as a hypothesis supported by the model rather than a direct substantiation. No empirical comparison is feasible within the current theoretical scope, but the added discussion will reduce the risk of overgeneralization.

revision: yes

Referee: [Maynard-Cross Learning definition] Section introducing Maynard-Cross Learning: coining a new algorithm name for the update rule obtained directly by aggregating the weighted voter model makes the claimed equivalence appear definitional rather than independently derived; a concrete test is whether the algorithm exhibits any property not already entailed by the model's aggregation step.

Authors: The name 'Maynard-Cross Learning' is intended to identify the specific multi-armed bandit update rule that emerges from the aggregation, allowing comparison with existing algorithms. However, we accept that the presentation risks making the equivalence appear tautological. In revision we will (1) separate the derivation of the update rule from the naming, (2) add a short analysis showing that the resulting algorithm possesses a non-trivial property (convergence rate under parallel environments that differs from standard UCB or epsilon-greedy when the number of parallel instances grows) not directly implied by the aggregation step alone, and (3) rephrase the surrounding text to emphasize that the equivalence is a derived result rather than a definitional restatement. We will also consider whether a different presentation (e.g., without a new name) better conveys the contribution.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; equivalence derived directly from model rules

The paper derives the equivalence between the weighted voter model's collective update rule and a multi-armed bandit algorithm by direct mathematical mapping within the model's stated assumptions. This is presented as an analytical result rather than a fitted prediction or self-referential definition. No equations or steps in the abstract reduce the claimed RL equivalence to its inputs by construction, and no self-citations, ansatzes, or renamings of external results are invoked as load-bearing. The coining of 'Maynard-Cross Learning' is merely nomenclature for the derived rule. The derivation chain remains self-contained against the model's premises, with the broader applicability to real bees noted as an assumption rather than a circular claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Review performed on abstract only; full derivation, assumptions, and any fitted parameters are not visible. The central claim rests on the accuracy of the weighted voter model and the existence of a direct mapping to RL updates.

axioms (1)

domain assumption: The weighted voter model accurately captures the collective nest-hunting decision process in honey bee swarms.
Invoked as the basis for the equivalence; stated in the abstract as the model from which the RL equivalence follows.

invented entities (1)

Maynard-Cross Learning (no independent evidence)
purpose: The specific multi-armed bandit update rule that the bee imitation process is claimed to implement.
Newly coined in the paper; no independent evidence is provided in the abstract.

pith-pipeline@v0.9.0 · 5776 in / 1309 out tokens · 19397 ms · 2026-05-23T19:16:50.475476+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "the update rule through which this macro-agent learns is a multi-armed bandit algorithm that we coin Maynard-Cross Learning"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "a swarm of honey bees collectively acts as a single RL entity"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

A Proofs for Section 3 (Methodology)

Lemma 1. An infinite population of individuals adopting $R_{\text{success}}$ follows the TRD:

$\dot{\pi}_a = \pi_a (q^{\pi}_a - v^{\pi})$, (5)

where $\pi_a$ is the propo...

[1] Neumann, J.V., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, USA (1944)

[2] Silver, D., Singh, S., Precup, D., Sutton, R.S.: Reward is enough. Artificial Intelligence 299, 103535 (2021) https://doi.org/10.1016/j.artint.2021.103535

[3] Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA (2018)

[4] Neftci, E.O., Averbeck, B.B.: Reinforcement learning in artificial and biological systems. Nature Machine Intelligence 1(3), 133–143 (2019)

[5] Muller, T.H., Butler, J.L., Veselic, S., Miranda, B., Wallis, J.D., Dayan, P., Behrens, T.E.J., Kurth-Nelson, Z., Kennerley, S.W.: Distributional reinforcement learning in prefrontal cortex. Nature Neuroscience 27, 403–408 (2024) https://doi.org/10.1038/s41593-023-01535-w

[6] Elsayed, M., Vasan, G., Mahmood, A.R.: Streaming Deep Reinforcement Learning Finally Works (2024). https://arxiv.org/abs/2410.14606

[7] Vasan, G., Elsayed, M., Azimi, A., He, J., Shariar, F., Bellinger, C., White, M., Mahmood, A.R.: Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers (2024). https://arxiv.org/abs/2411.15370

[8] Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., Kavukcuoglu, K.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1928–1937 (2016). PMLR

[9] Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 235–256 (2002) https://doi.org/10.1023/A:1013689704352

[10] Williams, R.J.: Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (1992)

[11] Cross, J.G.: A stochastic learning model of economic behavior. The Quarterly Journal of Economics 87(2), 239–266 (1973)

[12] Bloembergen, D., Tuyls, K., Hennes, D., Kaisers, M.: Evolutionary dynamics of multi-agent learning: A survey. Journal of Artificial Intelligence Research 53, 659–697 (2015) https://doi.org/10.1613/jair.4818

[13] Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992) https://doi.org/10.1007/BF00992696

[14] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms (2017). https://arxiv.org/abs/1707.06347

[15] Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (2018). https://arxiv.org/abs/1801.01290

[16] Reina, A., Valentini, G., Fernández-Oto, C., Dorigo, M., Trianni, V.: A design pattern for decentralised decision making. PLOS ONE 10(10), 1–18 (2015) https://doi.org/10.1371/journal.pone.0140950

[17] Bose, T., Reina, A., Marshall, J.A.: Collective decision-making. Current Opinion in Behavioral Sciences 16, 30–34 (2017) https://doi.org/10.1016/j.cobeha.2017.03.004. Comparative cognition

[18] Jackson, M., Golub, B.: Naïve learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics 2, 112–49 (2010) https://doi.org/10.1257/mic.2.1.112

[19] Valentini, G., Ferrante, E., Hamann, H., Dorigo, M.: Collective decision with 100 Kilobots: speed versus accuracy in binary discrimination problems. Autonomous Agents and Multi-Agent Systems 30(3), 553–580 (2016) https://doi.org/10.1007/s10458-015-9323-3

[20] Valentini, G., Hamann, H., Dorigo, M.: Self-organized collective decision making: the weighted voter model. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems. AAMAS ’14, pp. 45–52. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2014)

[21] Sandholm, W.H.: Population Games and Evolutionary Dynamics. The MIT Press, Cambridge, MA (2010). http://www.jstor.org/stable/j.ctt5hhbq5 Accessed 2025-05-23

[22] Reina, A., Njougouo, T., Tuci, E., Carletti, T.: Speed-accuracy trade-offs in best-of-n collective decision making through heterogeneous mean-field modeling. Phys. Rev. E 109, 054307 (2024) https://doi.org/10.1103/PhysRevE.109.054307

[23] Visscher, P.K., Camazine, S.: Collective decisions and cognition in bees. Nature 397(6718), 400 (1999) https://doi.org/10.1038/17047

[24] Seeley, T.D., Visscher, P.K.: Quorum sensing during nest-site selection by honeybee swarms. Behavioral Ecology and Sociobiology 56, 594–601 (2004)

[25] Seeley, T.D., Visscher, P.K.: Group decision making in nest-site selection by honey bees. Apidologie 35(2), 101–116 (2004)

[26] Passino, K.M., Seeley, T.D.: Modeling and analysis of nest-site selection by honeybee swarms: the speed and accuracy trade-off. Behavioral Ecology and Sociobiology 59, 427–442 (2006)

[27] Passino, K.M., Seeley, T.D., Visscher, P.K.: Swarm cognition in honey bees. Behavioral Ecology and Sociobiology 62(3), 401–414 (2008). Accessed 2025-05-04

[28] Sandholm, W.H., Dokumacı, E., Lahkar, R.: The projection dynamic and the replicator dynamic. Games and Economic Behavior 64(2), 666–683 (2008) https://doi.org/10.1016/j.geb.2008.02.003. Special Issue in Honor of Michael B. Maschler

[29] Apesteguia, J., Huck, S., Oechssler, J.: Imitation—theory and experimental evidence. Journal of Economic Theory 136(1), 217–235 (2007)

[30] Taylor, P.D., Jonker, L.B.: Evolutionary stable strategies and game dynamics. Mathematical Biosciences 40(1), 145–156 (1978) https://doi.org/10.1016/0025-5564(78)90077-9

[31] Nowak, M.A.: Five rules for the evolution of cooperation. Science 314(5805), 1560–1563 (2006) https://doi.org/10.1126/science.1133755 https://www.science.org/doi/pdf/10.1126/science.1133755

[32] Smith, J.M.: Evolution and the Theory of Games. Cambridge University Press, Cambridge, UK (1982)

[33] Börgers, T., Sarin, R.: Learning through reinforcement and replicator dynamics. Journal of Economic Theory 77(1), 1–14 (1997)

[34] Makoviychuk, V., Wawrzyniak, L., Guo, Y., Lu, M., Storey, K., Macklin, M., Hoeller, D., Rudin, N., Allshire, A., Handa, A., State, G.: Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning (2021). https://arxiv.org/abs/2108.10470

[35] Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, New York (1999). https://doi.org/10.1093/oso/9780195131581.001.0001

[36] Franks, N.R., Pratt, S.C., Mallon, E.B., Britton, N.F., Sumpter, D.J.: Information flow, opinion polling and collective intelligence in house-hunting social insects. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 357(1427), 1567–1583 (2002)

[37] De Wolf, T., Holvoet, T.: Emergence and self-organisation: a statement of similarities and differences. In: Proceedings of the International Workshop on Engineering Self-organising Applications 2004, pp. 96–110 (2004)

[38] Seeley, T.D., Morse, R.A.: Nest site selection by the honey bee, Apis mellifera. Insectes Sociaux 25(4), 323–337 (1978)

[39] Beekman, M., Fathke, R.L., Seeley, T.D.: How does an informed minority of scouts guide a honeybee swarm as it flies to its new home? Animal Behaviour 71(1), 161–171 (2006)

[40] Rittschof, C.C., Schirmeier, S.: Insect models of central nervous system energy metabolism and its links to behavior. Glia 66(6), 1160–1175 (2018)

[41] Schippers, M.-P., Dukas, R., Smith, R., Wang, J., Smolen, K., McClelland, G.: Lifetime performance in foraging honeybees: behaviour and physiology. Journal of Experimental Biology 209(19), 3828–3836 (2006)

[42] Zentall, T.R.: Imitation: definitions, evidence, and mechanisms. Anim Cogn 9, 335–353 (2006) https://doi.org/10.1007/s10071-006-0039-2

[43] Page Jr, R.E., Robinson, G., Fondrk, M.: Genetic specialists, kin recognition and nepotism in honey-bee colonies. Nature 338, 576–579 (1989) https://doi.org/10.1038/338576a0

[44] Vellinger, A., Antonic, N., Tuci, E.: From Pheromones to Policies: Reinforcement Learning for Engineered Biological Swarms (2025). https://arxiv.org/abs/2509.20095

[45] Dorigo, M., Theraulaz, G., Trianni, V.: Swarm robotics: Past, present, and future [point of view]. Proceedings of the IEEE 109(7), 1152–1165 (2021) https://doi.org/10.1109/JPROC.2021.3072740

[46] Hamann, H.: Swarm Robotics: A Formal Approach, 1st edn. Springer, Cham (2018)

[47] Dorigo, M., Birattari, M., Stützle, T.: Ant colony optimization. IEEE Computational Intelligence Magazine 1(4), 28–39 (2006) https://doi.org/10.1109/MCI.2006.329691

[48] Reina, A., Marshall, J.A.R., Trianni, V., Bose, T.: Model of the best-of-n nest-site selection process in honeybees. Physical Review E 95(5) (2017) https://doi.org/10.1103/physreve.95.052411

[49] Soma, K., Vardharajan, V.S., Hamann, H., Beltrame, G.: Congestion and scalability in robot swarms: A study on collective decision making. In: 2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), pp. 199–206 (2023). https://doi.org/10.1109/MRS60187.2023.10416793

[50] Prasetyo, J., Masi, G.D., Ferrante, E.: Collective decision making in dynamic environments. Swarm Intelligence 13, 217–243 (2019) https://doi.org/10.1007/s11721-019-00169-8

[51] Ebert, J.T., Gauci, M., Nagpal, R.: Multi-feature collective decision making in robot swarms. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. AAMAS ’18, pp. 1711–1719. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2018)

[52] Raoufi, M., Hamann, H., Romanczuk, P.: Speed-vs-accuracy tradeoff in collective estimation: An adaptive exploration-exploitation case. In: 2021 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), pp. 47–55 (2021). https://doi.org/10.1109/MRS50823.2021.9620695

[53] Ebert, J., Gauci, M., Mallmann-Trenn, F., Nagpal, R.: Bayes bots: Collective Bayesian decision-making in decentralized robot swarms. In: 2020 IEEE International Conference on Robotics and Automation, ICRA 2020. Proceedings - IEEE International Conference on Robotics and Automation, pp. 7186–7192. Institute of Electrical and Electronics Engineers Inc.,... doi:10.1109/icra40945.2020.9196584

[54] Pirrone, A., Reina, A., Stafford, T., Marshall, J.A.R., Gobet, F.: Magnitude-sensitivity: rethinking decision-making. Trends in Cognitive Sciences 26, 66–80 (2022) https://doi.org/10.1016/j.tics.2021.10.006

[55] Coucke, N., Heinrich, M.K., Cleeremans, A., Dorigo, M., Dumas, G.: Collective decision making by embodied neural agents. PNAS Nexus 4(4), 101 (2025) https://doi.org/10.1093/pnasnexus/pgaf101

A Proofs for Section 3 (Methodology)

Lemma 1. An infinite population of individuals adopting $R_{\text{success}}$ follows the TRD:

$$\dot{\pi}_a = \pi_a \left(q^{\pi}_a - v^{\pi}\right), \tag{5}$$

where $\pi_a$ is the propo...
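The replicator dynamic (TRD) named in Lemma 1 can be illustrated numerically. The sketch below is not taken from the paper: it assumes a stateless setting in which each option a has a fixed expected payoff q[a], takes v as the population-average payoff, and integrates the dynamic with a plain explicit-Euler step. The function names, payoff values, step size, and horizon are all hypothetical choices made for illustration.

```python
def replicator_step(pi, q, dt=0.01):
    """One explicit-Euler step of d(pi_a)/dt = pi_a * (q_a - v).

    pi: current proportions over options (sums to 1).
    q:  assumed fixed expected payoff per option (illustrative values).
    v:  population-average payoff, v = sum_a pi_a * q_a.
    """
    v = sum(p * qa for p, qa in zip(pi, q))
    new_pi = [p + dt * p * (qa - v) for p, qa in zip(pi, q)]
    # The Euler update preserves the sum analytically; renormalize only
    # to remove accumulated floating-point drift.
    total = sum(new_pi)
    return [p / total for p in new_pi]


def simulate(pi0, q, steps=5000, dt=0.01):
    """Iterate the Euler step from the initial proportions pi0."""
    pi = list(pi0)
    for _ in range(steps):
        pi = replicator_step(pi, q, dt)
    return pi


# Three options with hypothetical payoffs; start from a uniform population.
pi_final = simulate([1 / 3, 1 / 3, 1 / 3], q=[0.2, 0.5, 0.9])
print([round(p, 4) for p in pi_final])
```

Under these assumptions the proportion committed to the highest-payoff option grows at the expense of the others, which is the qualitative behaviour the lemma attributes to the imitating population.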