Emergence of Internal State-Modulated Swarming in Multi-Agent Patch Foraging System
Pith reviewed 2026-05-18 07:58 UTC · model grok-4.3
The pith
Foragers evolve neural controllers whose hidden states track stored resources and drive stronger aggregation when resources are low.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In this multi-agent patch foraging setup, agents controlled by an evolved continuous-time recurrent neural network exhibit aggregation when resource patches disappear. The strength of aggregation is inversely related to the quantity of resource stored by each forager. The network's hidden states are empirically found to be sensitive to this stored quantity, and clamping those states to values that represent lower resources produces faster aggregation behavior.
What carries the argument
The hidden states of the continuous-time recurrent neural network velocity controller that encode stored resource level and modulate aggregation tendency.
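As an illustration of what such a controller computes, here is a minimal sketch of one forward-Euler update of a textbook continuous-time recurrent neural network, followed by a velocity readout. The paper's exact parameterization is not given in this review, so every name here (ctrnn_step, velocity_readout, W_rec, W_in, W_out, bias, tau, dt) and the choice of tanh and Euler integration are assumptions made for illustration only.

```python
import math

def ctrnn_step(h, x, W_rec, W_in, bias, tau, dt=0.1):
    """One forward-Euler step of the (assumed) dynamics
    tau_i * dh_i/dt = -h_i + sum_j W_rec[i][j] * tanh(h_j + bias_j)
                           + sum_k W_in[i][k] * x_k
    """
    # Firing rates of the hidden units at the current state.
    act = [math.tanh(hj + bj) for hj, bj in zip(h, bias)]
    new_h = []
    for i in range(len(h)):
        rec = sum(W_rec[i][j] * act[j] for j in range(len(h)))
        inp = sum(W_in[i][k] * x[k] for k in range(len(x)))
        dh = (-h[i] + rec + inp) / tau[i]
        new_h.append(h[i] + dt * dh)
    return new_h

def velocity_readout(h, W_out):
    """Hypothetical linear readout mapping hidden rates to a velocity command."""
    return [sum(W_out[r][j] * math.tanh(h[j]) for j in range(len(h)))
            for r in range(len(W_out))]

# Usage: one hidden unit, no recurrent or input drive, so the state simply leaks.
h_next = ctrnn_step([1.0], [0.0], [[0.0]], [[0.0]], [0.0], [1.0], dt=0.1)
```

Because the hidden state persists between steps, quantities such as stored resource can in principle be carried in h even when they are not fully visible in the current observation x.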
If this is right
- When external resource patches are absent, foragers use the presence of other agents as a proxy signal and aggregate accordingly.
- Aggregation becomes weaker as each forager's internal resource store grows, consistent with risk-sensitive foraging.
- Direct manipulation of the controller's hidden states can reproduce or accelerate the aggregation effect without changing external observations.
- The learned policy adapts to both patch presence and internal energy state within the same neural controller.
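The clamping intervention mentioned in the list above can be sketched on a toy system. This is not the paper's controller: the two-unit leaky dynamics, the attraction_gain readout, and the choice of which unit stands for stored resource are all hypothetical, chosen only to show how overwriting a hidden unit changes behavior while the observations stay the same.

```python
import math

def step(h, obs, dt=0.1, tau=1.0):
    # Toy leaky dynamics (assumption): unit 0 tracks obs[0], read here as
    # stored resource; unit 1 tracks obs[1], read here as a neighbor signal.
    return [hi + dt * (-hi + oi) / tau for hi, oi in zip(h, obs)]

def clamp(h, clamp_values):
    # Overwrite selected hidden units; clamp_values maps unit index -> value.
    return [clamp_values.get(i, hi) for i, hi in enumerate(h)]

def attraction_gain(h):
    # Hypothetical readout: attraction toward neighbors is stronger
    # when the resource-tracking unit is low.
    return math.tanh(h[1]) * (1.0 - math.tanh(h[0]))

def rollout(obs_seq, clamp_values=None):
    h = [0.0, 0.0]
    gains = []
    for obs in obs_seq:
        h = step(h, obs)
        if clamp_values:
            h = clamp(h, clamp_values)  # intervene after every update
        gains.append(attraction_gain(h))
    return gains

# Same observations in both runs; only the internal state is manipulated.
obs_seq = [[1.0, 1.0]] * 50
free = rollout(obs_seq)
clamped = rollout(obs_seq, clamp_values={0: 0.0})  # pretend the store is empty
```

In this toy, the clamped run yields a larger attraction gain at every step even though the inputs are identical, which is the shape of evidence a clamping experiment aims for. Whether the evolved controller behaves this way is an empirical question the sketch cannot answer.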
Where Pith is reading between the lines
- Similar internal-state modulation might appear in other active-particle systems where agents must balance exploration against depletion risk.
- The same clamping technique could be applied to test whether hidden-state representations generalize across different numbers of agents or patch densities.
- If the inverse resource-aggregation relation holds, it supplies a simple rule that could be inserted into analytic models of collective motion without needing full neural simulation.
Load-bearing premise
The sensitivity of the hidden states to resource levels and the resulting change in aggregation reflect a general internal-state principle rather than an artifact of the evolutionary algorithm, rollout statistics, or chosen environment parameters.
What would settle it
A new set of rollouts in which stored resource is varied independently while the hidden states are left free, followed by a check that aggregation strength reliably decreases with higher resource and that clamping no longer alters behavior when the states are disconnected from resource input.
Original abstract
Active particles are entities that sustain persistent out-of-equilibrium motion by consuming energy. Under certain conditions, they exhibit the tendency to self-organize through coordinated movements, such as swarming via aggregation. While performing non-cooperative foraging tasks, the emergence of such swarming behavior in foragers, exemplifying active particles, has been attributed to the partial observability of the environment, in which the presence of another forager can serve as a proxy signal to indicate the potential presence of a food source or a resource patch. In this paper, we validate this phenomenon by simulating multiple self-propelled foragers as they forage from multiple resource patches in a non-cooperative manner. These foragers operate in a continuous two-dimensional space with stochastic position updates and partial observability. We evolve a shared policy in the form of a continuous-time recurrent neural network that serves as a velocity controller for the foragers. To this end, we use an evolutionary strategy algorithm wherein the different samples of the policy-distribution are evaluated in the same rollout. Then we show that agents are able to learn to adaptively forage in the environment. Next, we show the emergence of swarming in the form of aggregation among the foragers when resource patches are absent. We observe that the strength of this swarming behavior appears to be inversely proportional to the amount of resource stored in the foragers, which supports the risk-sensitive foraging claims. Empirical analysis of the learned controller's hidden states in minimal test runs uncovers their sensitivity to the amount of resource stored in a forager. Clamping these hidden states to represent a lesser amount of resource hastens its learned aggregation behavior.
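To make the training setup described in the abstract more concrete, the following is a minimal sketch of an evolution strategy in which every sample of a generation is scored against the same environment draw. It is one possible reading of "evaluated in the same rollout", not the authors' algorithm: the scalar parameter, the quadratic fitness, the patch jitter, and all names (shared_rollout_fitness, evolve, sigma, lr) are assumptions.

```python
import random
import statistics

def shared_rollout_fitness(thetas, env_rng, target=2.0):
    # One environment draw per generation (assumption): every sample is
    # scored against the same randomly jittered patch position.
    patch = target + 0.1 * env_rng.gauss(0.0, 1.0)
    return [-(th - patch) ** 2 for th in thetas]

def evolve(generations=200, pop=16, sigma=0.3, lr=0.2, seed=0):
    rng = random.Random(seed)          # perturbation noise
    env_rng = random.Random(seed + 1)  # environment noise
    mean = 0.0
    for _ in range(generations):
        eps = [rng.gauss(0.0, 1.0) for _ in range(pop)]
        thetas = [mean + sigma * e for e in eps]
        fit = shared_rollout_fitness(thetas, env_rng)
        mu = statistics.fmean(fit)
        sd = statistics.pstdev(fit) or 1.0  # guard against zero spread
        # Move the mean along perturbations weighted by standardized fitness.
        grad = sum(e * (f - mu) / sd for e, f in zip(eps, fit)) / pop
        mean += lr * sigma * grad
    return mean

final_mean = evolve()
```

Sharing the environment draw removes one source of variance when samples are compared, but it also means the samples' outcomes are not statistically independent, which is the coupling the referee report below raises.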
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript simulates multiple self-propelled foragers in a continuous 2D environment with partial observability and stochastic motion, controlled by a shared continuous-time RNN velocity policy evolved via an evolutionary strategy in which policy-distribution samples are evaluated within the same rollout. It claims that the agents learn adaptive non-cooperative foraging from resource patches, that swarming via aggregation emerges when patches are absent, that aggregation strength is inversely related to stored resource levels (consistent with risk-sensitive foraging), and that the RNN hidden states encode stored resource amount, as shown by their sensitivity in minimal test runs and by the fact that clamping them to represent lower resource hastens aggregation.
Significance. If the central claims hold after addressing validation gaps, the work would provide concrete empirical evidence that internal states in evolved controllers can causally modulate collective swarming in active-particle foraging systems. The combination of shared-policy RNN evolution, direct simulation of aggregation, and targeted hidden-state clamping offers a useful bridge between multi-agent learning and self-organization phenomena, with potential implications for understanding risk-sensitive behavior in partially observable environments.
major comments (3)
- [Evolutionary strategy and policy evaluation] Evolutionary strategy description: evaluating distinct policy samples inside identical shared rollouts couples their stochastic trajectories and partial observations. This design can induce spurious correlations between hidden-state trajectories and instantaneous resource levels that would not arise under independent rollouts or alternative optimizers, directly threatening the claim that the hidden states causally encode stored resource and modulate aggregation.
- [Empirical analysis of hidden states] Empirical analysis of hidden states and clamping experiments: the reported sensitivity and the hastening of aggregation under clamping are demonstrated only in minimal test runs. No statistics, multiple independent runs, ablation controls, or quantitative metrics of aggregation speed are provided, leaving open whether the effect is robust or specific to the training procedure and rollout conditions.
- [Results on emergence of swarming] Results on swarming strength: the statement that aggregation strength appears inversely proportional to stored resource is presented without quantitative measures, error bars, or statistical tests across resource levels, weakening support for the risk-sensitive foraging interpretation.
minor comments (2)
- [Abstract and Methods] The abstract and methods should explicitly define the phrase 'minimal test runs' and state the number of independent evaluations performed.
- [Figures] Figure captions and axis labels for aggregation trajectories should include the precise resource levels and time windows used in the clamping tests.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of validation and robustness that we will address in the revision. Below we respond point-by-point to the major comments.
- Referee: [Evolutionary strategy and policy evaluation] Evolutionary strategy description: evaluating distinct policy samples inside identical shared rollouts couples their stochastic trajectories and partial observations. This design can induce spurious correlations between hidden-state trajectories and instantaneous resource levels that would not arise under independent rollouts or alternative optimizers, directly threatening the claim that the hidden states causally encode stored resource and modulate aggregation.
  Authors: We acknowledge that evaluating multiple policy samples within the same rollout can introduce correlations in the observed trajectories. This choice was made to ensure that all policy variants experience identical environmental stochasticity and patch configurations, thereby reducing variance in fitness estimates during evolution, a common practice in evolutionary strategies for multi-agent settings. Nevertheless, the causal role of hidden states is supported by the clamping experiments, which intervene directly on the internal state while holding other inputs fixed. We will add a dedicated paragraph in the Methods and Discussion sections clarifying this design choice, its potential limitations, and why the intervention-based evidence still supports the encoding claim.
  Revision: partial
- Referee: [Empirical analysis of hidden states] Empirical analysis of hidden states and clamping experiments: the reported sensitivity and the hastening of aggregation under clamping are demonstrated only in minimal test runs. No statistics, multiple independent runs, ablation controls, or quantitative metrics of aggregation speed are provided, leaving open whether the effect is robust or specific to the training procedure and rollout conditions.
  Authors: We agree that the current presentation relies on illustrative minimal test runs without accompanying statistics. In the revised manuscript we will report results aggregated over multiple independent training seeds, include quantitative metrics such as mean aggregation time and spatial density variance, and add ablation controls (e.g., clamping to neutral values). Error bars and statistical comparisons will be provided to demonstrate robustness.
  Revision: yes
- Referee: [Results on emergence of swarming] Results on swarming strength: the statement that aggregation strength appears inversely proportional to stored resource is presented without quantitative measures, error bars, or statistical tests across resource levels, weakening support for the risk-sensitive foraging interpretation.
  Authors: We accept that the inverse relationship was described qualitatively. We will augment the relevant figure and text with quantitative aggregation metrics (e.g., pair-correlation functions or mean inter-agent distance) computed across a range of resource levels, include error bars from multiple rollouts, and report appropriate statistical tests (e.g., regression slopes and p-values) to substantiate the risk-sensitive interpretation.
  Revision: yes
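Two of the quantities named in the responses above, mean inter-agent distance and a regression slope across resource levels, are simple to compute. The sketch below shows one way to do so with the standard library only. The function names and the example data are hypothetical, and no p-value is computed here.

```python
import math

def mean_pairwise_distance(positions):
    """Average Euclidean distance over all unordered agent pairs.
    Smaller values indicate tighter aggregation."""
    n = len(positions)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(positions[i], positions[j])
    return total / (n * (n - 1) / 2)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up example: final agent positions at three stored-resource levels.
resource_levels = [0.0, 1.0, 2.0]
final_positions = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],  # low store: tight cluster
    [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)],
    [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)],  # high store: spread out
]
spread = [mean_pairwise_distance(p) for p in final_positions]
slope = ols_slope(resource_levels, spread)
```

A positive slope of spread against stored resource would be consistent with weaker aggregation at higher stores. Error bars would come from repeating the measurement over independent rollouts and seeds.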
Circularity Check
No significant circularity in empirical simulation results
The paper presents results from direct multi-agent simulations using an evolved continuous-time RNN policy trained via evolutionary strategy, followed by empirical analysis of hidden-state sensitivity and clamping interventions. No mathematical derivation chain, first-principles prediction, or fitted parameter is claimed to generate the reported emergence of swarming or internal-state modulation; the findings are observational outputs from the simulation runs themselves. The shared-rollout evaluation during training is a stated methodological detail but does not reduce any central claim to a self-referential fit or self-citation. The work is self-contained against its own simulation benchmarks with no load-bearing self-citation or ansatz smuggling identified.
Axiom & Free-Parameter Ledger
free parameters (2)
- Evolutionary strategy hyperparameters
- Resource patch parameters
axioms (2)
- domain assumption: Under partial observability, the presence of another forager serves as a proxy for resource presence
- domain assumption: A continuous-time recurrent neural network can serve as an effective velocity controller