
arXiv: 2510.18886 · v2 · submitted 2025-10-14 · 🌊 nlin.AO · cs.MA · cs.NE

Emergence of Internal State-Modulated Swarming in Multi-Agent Patch Foraging System

Pith reviewed 2026-05-18 07:58 UTC · model grok-4.3

classification: 🌊 nlin.AO · cs.MA · cs.NE
keywords: swarming · multi-agent foraging · recurrent neural networks · aggregation behavior · internal state modulation · active particles · patch foraging · evolutionary strategies

The pith

Foragers evolve neural controllers whose hidden states track stored resources and drive stronger aggregation when resources are low.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper places multiple self-propelled agents in a continuous 2D space with stochastic motion and partial visibility of resource patches. A shared continuous-time recurrent neural network is evolved to output velocity commands so that the agents learn to locate and exploit the patches. Once the patches are removed, the agents still aggregate, and the tightness of this grouping increases as the amount of resource each agent carries decreases. Inspection of the network's internal hidden states shows they respond directly to the carried resource quantity. Forcing the hidden states to a low-resource representation accelerates the aggregation even when actual resources are unchanged.
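The controller described above can be illustrated with a minimal sketch of one continuous-time recurrent neural network update followed by a noisy position update. This is not the authors' implementation: it assumes the standard CTRNN form from Beer (1995), forward-Euler integration, a tanh-bounded linear readout, and Gaussian position noise. All names (`ctrnn_step`, `readout_velocity`, `move`, `W_in`, `W_out`, `noise_std`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ctrnn_step(y, inputs, W, W_in, theta, tau, dt):
    """One forward-Euler step of the assumed CTRNN dynamics:
    tau_i * dy_i/dt = -y_i + sum_j W[i][j] * sigmoid(y_j + theta_j)
                           + sum_k W_in[i][k] * inputs[k]
    """
    n = len(y)
    act = [sigmoid(y[j] + theta[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        rec = sum(W[i][j] * act[j] for j in range(n))
        ext = sum(W_in[i][k] * inputs[k] for k in range(len(inputs)))
        new_y.append(y[i] + dt * (-y[i] + rec + ext) / tau[i])
    return new_y


def readout_velocity(y, W_out):
    """Assumed readout: linear map of the hidden state, squashed to [-1, 1]."""
    return [math.tanh(sum(W_out[d][i] * y[i] for i in range(len(y))))
            for d in range(2)]


def move(pos, vel, dt, noise_std, rng):
    """Stochastic position update: drift plus Gaussian noise scaled by sqrt(dt)."""
    return [pos[d] + vel[d] * dt + noise_std * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for d in range(2)]


# Usage with two hidden units and one observation channel (arbitrary values).
rng = random.Random(0)
y = ctrnn_step([0.0, 0.0], [1.0], [[0.0, 0.5], [-0.5, 0.0]],
               [[1.0], [0.0]], [0.0, 0.0], [1.0, 2.0], 0.1)
pos = move([0.0, 0.0], readout_velocity(y, [[1.0, 0.0], [0.0, 1.0]]), 0.1, 0.05, rng)
```

In a shared-policy setup like the one described, every forager would call the same functions with the same weights and differ only in its own hidden state and observations.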

Core claim

In this multi-agent patch foraging setup, agents controlled by an evolved continuous-time recurrent neural network exhibit aggregation when resource patches disappear. The strength of aggregation is inversely related to the quantity of resource stored by each forager. The network's hidden states are empirically found to be sensitive to this stored quantity, and clamping those states to values that represent lower resources produces faster aggregation behavior.

What carries the argument

The hidden states of the continuous-time recurrent neural network velocity controller that encode stored resource level and modulate aggregation tendency.
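A clamping intervention of the kind referred to here can be sketched as follows. The sketch assumes a generic leaky recurrent update with tanh activations and overwrites chosen hidden units after every step. The network, the choice of units, and the clamp values are hypothetical and do not come from the paper.

```python
import math


def step(y, inp, W, W_in, tau, dt):
    """One Euler step of a leaky recurrent network with a scalar input."""
    n = len(y)
    act = [math.tanh(v) for v in y]
    return [y[i] + dt * (-y[i] + sum(W[i][j] * act[j] for j in range(n))
                         + W_in[i] * inp) / tau[i]
            for i in range(n)]


def clamp(y, clamp_values):
    """Overwrite the hidden units listed in clamp_values; leave the rest free."""
    return [clamp_values.get(i, v) for i, v in enumerate(y)]


def rollout(y0, inputs, W, W_in, tau, dt, clamp_values=None):
    """Run the network over an input sequence, optionally clamping after each step."""
    y = list(y0)
    trace = []
    for inp in inputs:
        y = step(y, inp, W, W_in, tau, dt)
        if clamp_values:
            y = clamp(y, clamp_values)
        trace.append(y)
    return trace


# Unit 0 receives the input; unit 1 only listens to unit 0.
W = [[0.0, 0.0], [1.0, 0.0]]
W_in = [1.0, 0.0]
tau = [1.0, 1.0]
inputs = [1.0] * 20

free = rollout([0.0, 0.0], inputs, W, W_in, tau, 0.1)
clamped = rollout([0.0, 0.0], inputs, W, W_in, tau, 0.1, clamp_values={0: -2.0})
```

Because the external input sequence is identical in both runs, any difference in the downstream unit is attributable to the clamp alone, which is the logic behind using clamping as an intervention rather than a correlation.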

If this is right

  • When external resource patches are absent, foragers use the presence of other agents as a proxy signal and aggregate accordingly.
  • Aggregation becomes weaker as each forager's internal resource store grows, consistent with risk-sensitive foraging.
  • Direct manipulation of the controller's hidden states can reproduce or accelerate the aggregation effect without changing external observations.
  • The learned policy adapts to both patch presence and internal energy state within the same neural controller.
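Aggregation strength in the points above needs a concrete measure. One candidate is the mean nearest-neighbor distance, the quantity named in the Figure 6 caption. The sketch below assumes plain Euclidean distance in an unbounded plane; whether the paper's environment uses periodic boundaries is not stated here, and the function name is hypothetical.

```python
import math


def mean_nearest_neighbor_distance(positions):
    """Average, over agents, of the distance to the closest other agent."""
    if len(positions) < 2:
        raise ValueError("need at least two agents")
    total = 0.0
    for i, p in enumerate(positions):
        total += min(math.dist(p, q) for j, q in enumerate(positions) if j != i)
    return total / len(positions)


# Smaller values indicate tighter aggregation.
spread = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
tight = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(mean_nearest_neighbor_distance(spread), mean_nearest_neighbor_distance(tight))
```

Under this measure, the claimed inverse relation between aggregation and stored resource would show up as a larger mean nearest-neighbor distance at higher resource levels.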

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar internal-state modulation might appear in other active-particle systems where agents must balance exploration against depletion risk.
  • The same clamping technique could be applied to test whether hidden-state representations generalize across different numbers of agents or patch densities.
  • If the inverse resource-aggregation relation holds, it supplies a simple rule that could be inserted into analytic models of collective motion without needing full neural simulation.

Load-bearing premise

The sensitivity of the hidden states to resource levels and the resulting change in aggregation reflect a general internal-state principle rather than an artifact of the evolutionary algorithm, rollout statistics, or chosen environment parameters.

What would settle it

A new set of rollouts in which stored resource is varied independently while the hidden states are left free, followed by a check that aggregation strength reliably decreases with higher resource and that clamping no longer alters behavior when the states are disconnected from resource input.

Figures

Figures reproduced from arXiv: 2510.18886 by Ahmed EL-Gazzar, Marcel van Gerven, Siddharth Chaturvedi.

Figure 1: Environment snapshot, and local sensing limitations in the patch-foraging task. (b) Snapshot from one rollout with …
Figure 2: (a) The progression of mean, maximum and minimum fitness across generations during evolution. (b) Trajectories …
Figure 3: The number of unique patches visited by different …
Figure 4: Swarming emerges without patches and disappears when inter-agent sensing is ablated. (a) Ten foragers in a small test …
Figure 6: Final mean nearest neighbor distance versus for …
Figure 7: Single-agent CTRNN probe with clamped internal resource. (a) Example trajectory for a solitary forager with internal …
Figure 8: (a) Sample trajectory when the free forager's urgency-sensitive hidden states ( …
Figure 9: Inter-forager distance versus time (mean across 5 seeds; shaded bands denote …
Original abstract

Active particles are entities that sustain persistent out-of-equilibrium motion by consuming energy. Under certain conditions, they exhibit the tendency to self-organize through coordinated movements, such as swarming via aggregation. While performing non-cooperative foraging tasks, the emergence of such swarming behavior in foragers, exemplifying active particles, has been attributed to the partial observability of the environment, in which the presence of another forager can serve as a proxy signal to indicate the potential presence of a food source or a resource patch. In this paper, we validate this phenomenon by simulating multiple self-propelled foragers as they forage from multiple resource patches in a non-cooperative manner. These foragers operate in a continuous two-dimensional space with stochastic position updates and partial observability. We evolve a shared policy in the form of a continuous-time recurrent neural network that serves as a velocity controller for the foragers. To this end, we use an evolutionary strategy algorithm wherein the different samples of the policy-distribution are evaluated in the same rollout. Then we show that agents are able to learn to adaptively forage in the environment. Next, we show the emergence of swarming in the form of aggregation among the foragers when resource patches are absent. We observe that the strength of this swarming behavior appears to be inversely proportional to the amount of resource stored in the foragers, which supports the risk-sensitive foraging claims. Empirical analysis of the learned controller's hidden states in minimal test runs uncovers their sensitivity to the amount of resource stored in a forager. Clamping these hidden states to represent a lesser amount of resource hastens its learned aggregation behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript simulates multiple self-propelled foragers in a continuous 2D environment with partial observability and stochastic motion, controlled by a shared continuous-time RNN velocity policy evolved via an evolutionary strategy in which policy-distribution samples are evaluated within the same rollout. It claims that the agents learn adaptive non-cooperative foraging from resource patches, that swarming via aggregation emerges when patches are absent, that aggregation strength is inversely related to stored resource levels (consistent with risk-sensitive foraging), and that the RNN hidden states encode stored resource amount, as shown by their sensitivity in minimal test runs and by the fact that clamping them to represent lower resource hastens aggregation.
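The shared-rollout evaluation mentioned in the summary can be illustrated with a toy sketch. It is not the authors' optimizer or environment: it assumes a plain Gaussian evolution strategy with standardized fitness, and a one-dimensional toy task in which every sampled controller chases the same randomly placed patch. All names and hyperparameters are hypothetical.

```python
import math
import random


def shared_rollout(params, rng, steps=20):
    """Score every sampled policy in ONE rollout that shares a patch location.

    Each policy is a single gain theta[0]; its agent repeatedly moves a
    fraction theta[0] of the way toward the patch. Fitness is the negative
    final distance to the patch.
    """
    patch = rng.uniform(-1.0, 1.0)  # environment randomness shared by all samples
    fitness = []
    for theta in params:
        x = 0.0
        for _ in range(steps):
            x += theta[0] * (patch - x)
        fitness.append(-abs(patch - x))
    return fitness


def es_generation(mean, sigma, pop_size, lr, rng):
    """One generation of a simple Gaussian ES (assumed, not the paper's ES)."""
    noise = [[rng.gauss(0.0, 1.0) for _ in mean] for _ in range(pop_size)]
    params = [[m + sigma * e for m, e in zip(mean, eps)] for eps in noise]
    fitness = shared_rollout(params, rng)
    mu = sum(fitness) / pop_size
    sd = math.sqrt(sum((f - mu) ** 2 for f in fitness) / pop_size) or 1.0
    adv = [(f - mu) / sd for f in fitness]  # standardized fitness
    grad = [sum(a * eps[k] for a, eps in zip(adv, noise)) / (pop_size * sigma)
            for k in range(len(mean))]
    return [m + lr * g for m, g in zip(mean, grad)], fitness


rng = random.Random(0)
mean = [0.2]
for _ in range(50):
    mean, _ = es_generation(mean, sigma=0.1, pop_size=16, lr=0.01, rng=rng)
```

Sharing the patch location removes one source of variance when samples are compared within a generation. In this toy the agents do not sense one another, so it does not reproduce the trajectory coupling that the first major comment is concerned with; it shows only the evaluation pattern.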

Significance. If the central claims hold after addressing validation gaps, the work would provide concrete empirical evidence that internal states in evolved controllers can causally modulate collective swarming in active-particle foraging systems. The combination of shared-policy RNN evolution, direct simulation of aggregation, and targeted hidden-state clamping offers a useful bridge between multi-agent learning and self-organization phenomena, with potential implications for understanding risk-sensitive behavior in partially observable environments.

major comments (3)
  1. [Evolutionary strategy and policy evaluation] Evolutionary strategy description: evaluating distinct policy samples inside identical shared rollouts couples their stochastic trajectories and partial observations. This design can induce spurious correlations between hidden-state trajectories and instantaneous resource levels that would not arise under independent rollouts or alternative optimizers, directly threatening the claim that the hidden states causally encode stored resource and modulate aggregation.
  2. [Empirical analysis of hidden states] Empirical analysis of hidden states and clamping experiments: the reported sensitivity and the hastening of aggregation under clamping are demonstrated only in minimal test runs. No statistics, multiple independent runs, ablation controls, or quantitative metrics of aggregation speed are provided, leaving open whether the effect is robust or specific to the training procedure and rollout conditions.
  3. [Results on emergence of swarming] Results on swarming strength: the statement that aggregation strength appears inversely proportional to stored resource is presented without quantitative measures, error bars, or statistical tests across resource levels, weakening support for the risk-sensitive foraging interpretation.
minor comments (2)
  1. [Abstract and Methods] The abstract and methods should explicitly define the phrase 'minimal test runs' and state the number of independent evaluations performed.
  2. [Figures] Figure captions and axis labels for aggregation trajectories should include the precise resource levels and time windows used in the clamping tests.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of validation and robustness that we will address in the revision. Below we respond point-by-point to the major comments.

Point-by-point responses
  1. Referee: [Evolutionary strategy and policy evaluation] Evolutionary strategy description: evaluating distinct policy samples inside identical shared rollouts couples their stochastic trajectories and partial observations. This design can induce spurious correlations between hidden-state trajectories and instantaneous resource levels that would not arise under independent rollouts or alternative optimizers, directly threatening the claim that the hidden states causally encode stored resource and modulate aggregation.

    Authors: We acknowledge that evaluating multiple policy samples within the same rollout can introduce correlations in the observed trajectories. This choice was made to ensure that all policy variants experience identical environmental stochasticity and patch configurations, thereby reducing variance in fitness estimates during evolution—a common practice in evolutionary strategies for multi-agent settings. Nevertheless, the causal role of hidden states is supported by the clamping experiments, which intervene directly on the internal state while holding other inputs fixed. We will add a dedicated paragraph in the Methods and Discussion sections clarifying this design choice, its potential limitations, and why the intervention-based evidence still supports the encoding claim. revision: partial

  2. Referee: [Empirical analysis of hidden states] Empirical analysis of hidden states and clamping experiments: the reported sensitivity and the hastening of aggregation under clamping are demonstrated only in minimal test runs. No statistics, multiple independent runs, ablation controls, or quantitative metrics of aggregation speed are provided, leaving open whether the effect is robust or specific to the training procedure and rollout conditions.

    Authors: We agree that the current presentation relies on illustrative minimal test runs without accompanying statistics. In the revised manuscript we will report results aggregated over multiple independent training seeds, include quantitative metrics such as mean aggregation time and spatial density variance, and add ablation controls (e.g., clamping to neutral values). Error bars and statistical comparisons will be provided to demonstrate robustness. revision: yes

  3. Referee: [Results on emergence of swarming] Results on swarming strength: the statement that aggregation strength appears inversely proportional to stored resource is presented without quantitative measures, error bars, or statistical tests across resource levels, weakening support for the risk-sensitive foraging interpretation.

    Authors: We accept that the inverse relationship was described qualitatively. We will augment the relevant figure and text with quantitative aggregation metrics (e.g., pair-correlation functions or mean inter-agent distance) computed across a range of resource levels, include error bars from multiple rollouts, and report appropriate statistical tests (e.g., regression slopes and p-values) to substantiate the risk-sensitive interpretation. revision: yes
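The regression analysis promised in this response could take roughly the following form. The sketch assumes an ordinary least-squares fit of an aggregation metric against stored resource level, and reports the slope, intercept, and Pearson correlation. The function names are hypothetical and the numbers in the usage lines are toy values chosen to lie on a straight line, not results from the paper.

```python
import math


def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0.0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy inputs only: resource levels and a made-up distance metric on a line.
resource = [0.0, 1.0, 2.0, 3.0]
distance = [1.0, 3.0, 5.0, 7.0]
slope, intercept = ols_fit(resource, distance)
r = pearson_r(resource, distance)
```

If mean inter-agent distance is the metric, a positive slope against stored resource would correspond to the weaker aggregation at higher resource that the paper describes. A p-value for the slope would additionally need a t-distribution routine, which is left out of this sketch.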

Circularity Check

0 steps flagged

No significant circularity in empirical simulation results


The paper presents results from direct multi-agent simulations using an evolved continuous-time RNN policy trained via evolutionary strategy, followed by empirical analysis of hidden-state sensitivity and clamping interventions. No mathematical derivation chain, first-principles prediction, or fitted parameter is claimed to generate the reported emergence of swarming or internal-state modulation; the findings are observational outputs from the simulation runs themselves. The shared-rollout evaluation during training is a stated methodological detail but does not reduce any central claim to a self-referential fit or self-citation. The work is self-contained against its own simulation benchmarks with no load-bearing self-citation or ansatz smuggling identified.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The paper's claims depend on simulation-specific parameters and modeling assumptions about agent perception and learning rather than deriving from first principles.

free parameters (2)
  • Evolutionary strategy hyperparameters
    Parameters for the evolutionary algorithm used to train the policy are chosen but not specified in detail.
  • Resource patch parameters
    Number, size, and placement of resource patches in the environment.
axioms (2)
  • domain assumption: Under partial observability, the presence of another forager serves as a proxy for resource presence
    Invoked to explain why swarming emerges in foraging tasks.
  • domain assumption: A continuous-time recurrent neural network can serve as an effective velocity controller
    Assumed in the choice of policy representation.

pith-pipeline@v0.9.0 · 5842 in / 1433 out tokens · 86116 ms · 2026-05-18T07:58:58.232636+00:00 · methodology

