arxiv: 2604.07303 · v1 · submitted 2026-04-08 · 💻 cs.RO


Robots that learn to evaluate models of collective behavior

Mathis Hocke, Andreas Gerken, David Bierbach, Jens Krause, Tim Landgraf


Pith reviewed 2026-05-10 17:47 UTC · model grok-4.3

classification 💻 cs.RO

keywords collective behavior · behavioral modeling · reinforcement learning · robotic fish · sim-to-real transfer · model evaluation · closed-loop interaction · bio-inspired robotics


The pith

Through closed-loop interactions with live fish, a robotic fish identifies a neural network model of fish behavior as more accurate than rule-based models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a method that trains reinforcement learning policies in simulation to steer a virtual fish toward goals using different behavioral models, then transfers those policies to a physical RoboFish robot that interacts with live fish. The evaluation compares simulated and real outcomes by measuring distribution gaps in metrics such as goal-reaching success, inter-individual distances, wall interactions, and alignment. A convolutional neural network model produces the smallest gaps on goal-reaching performance and most other metrics, indicating that it predicts live fish responses more closely than the constant-follow baseline or the rule-based alternatives. Traditional model checks rely on static trajectory statistics, whereas this embodied approach tests predictions under matched dynamic conditions in which the robot actively influences the group. If the separation holds, researchers gain a practical way to rank and refine models of collective animal behavior by how well they guide real-time interactions.
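A minimal sketch of how the four behavioral metrics named above might be computed from tracked robot and fish poses. The definitions are illustrative assumptions, not the paper's: the `Pose` type, the square tank of side `tank_size`, the goal `radius`, and the use of the cosine of the heading difference as alignment are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float        # position (assumed units: cm)
    y: float
    heading: float  # orientation in radians


def inter_individual_distance(a: Pose, b: Pose) -> float:
    """Euclidean distance between two agents."""
    return math.hypot(a.x - b.x, a.y - b.y)


def alignment(a: Pose, b: Pose) -> float:
    """Cosine of the heading difference: 1 = parallel, -1 = opposite."""
    return math.cos(a.heading - b.heading)


def wall_distance(p: Pose, tank_size: float) -> float:
    """Distance to the nearest wall of an assumed square tank [0, tank_size]^2."""
    return min(p.x, p.y, tank_size - p.x, tank_size - p.y)


def goal_reached(fish_track, goal, radius) -> bool:
    """True if the fish comes within `radius` of `goal` at any time step."""
    gx, gy = goal
    return any(math.hypot(p.x - gx, p.y - gy) <= radius for p in fish_track)


def trial_metrics(fish_track, robot_track, goal, tank_size=100.0, radius=10.0):
    """Per-time-step metric series plus a trial-level goal-reaching flag."""
    pairs = list(zip(fish_track, robot_track))
    return {
        "distance": [inter_individual_distance(f, r) for f, r in pairs],
        "alignment": [alignment(f, r) for f, r in pairs],
        "wall_distance": [wall_distance(f, tank_size) for f in fish_track],
        "goal_reached": goal_reached(fish_track, goal, radius),
    }
```

The per-time-step series (`distance`, `alignment`, `wall_distance`) can then be compared between simulation and reality, while the binary `goal_reached` flag becomes a per-trial success indicator.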

Core claim

We introduce a reinforcement-learning framework that evaluates computational models of fish behavior by transferring simulation-trained policies to a biomimetic RoboFish, which then interacts with live fish in goal-directed tasks. Models are ranked by the Wasserstein distance between simulated and real distributions of behavioral metrics including goal-reaching performance, inter-individual distances, wall interactions, and alignment. The neural network-based model shows the smallest gaps on goal-reaching and most other metrics, demonstrating higher behavioral fidelity than rule-based models. This separation establishes that the closed-loop robotic evaluation can quantitatively distinguish candidate models.
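The ranking step can be sketched as follows, assuming each metric has been reduced to a list of scalar samples per model and per condition (simulation or reality). `wasserstein_1d` computes the order-1 Wasserstein distance between two empirical one-dimensional distributions by integrating the absolute difference of their CDFs; that the paper uses exactly this order and estimator is an assumption. The model names and numbers in the toy data are made up and are not results from the paper.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Order-1 Wasserstein distance between two empirical 1D samples.

    Integrates |F_u(x) - F_v(x)| over the merged, sorted sample points,
    where F_u and F_v are the empirical CDFs (step functions).
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        # Both CDFs are constant on [left, right), so evaluate them at `left`.
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def rank_models(sim, real, metric):
    """Return (model, gap) pairs sorted from smallest to largest sim-to-real gap."""
    gaps = {model: wasserstein_1d(sim[model][metric], real[model][metric]) for model in sim}
    return sorted(gaps.items(), key=lambda item: item[1])


# Toy data with hypothetical model names; not values from the paper.
sim = {"model_a": {"distance": [5, 6, 7]}, "model_b": {"distance": [1, 2, 3]}}
real = {"model_a": {"distance": [6, 7, 8]}, "model_b": {"distance": [6, 7, 8]}}
print(rank_models(sim, real, "distance"))
```

Ranking per metric, rather than collapsing all metrics into one score, keeps visible which behavioral aspect drives each model's gap.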

What carries the argument

Closed-loop sim-to-real gap measurement, where RL policies trained to guide a simulated fish to goals are transferred to the RoboFish hardware and the resulting Wasserstein distances on behavioral metric distributions quantify each model's fidelity to live fish responses.
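For a single scalar metric, the one-dimensional Wasserstein distance has a closed form in terms of cumulative distribution functions, which makes each gap interpretable in the metric's own units. Assuming the order-1 distance (the page does not state the order), with P the simulated and Q the real distribution of a metric, F the CDF and F^{-1} the quantile function:

```latex
W_1(P, Q) = \int_{-\infty}^{\infty} \lvert F_P(x) - F_Q(x) \rvert \, dx
          = \int_{0}^{1} \lvert F_P^{-1}(t) - F_Q^{-1}(t) \rvert \, dt
```

As a toy example, if simulated inter-individual distances are {5, 6, 7} cm and real ones are {6, 7, 8} cm, the quantile functions differ by exactly 1 cm everywhere, so W_1 = 1 cm. For a binary per-trial outcome such as "goal reached", W_1 reduces to |p_sim - p_real|, the difference in success rates.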

If this is right

Models of collective behavior can be ranked by fidelity using interactive robotic tests rather than offline statistics alone.
Specific metric gaps, such as in alignment or wall avoidance, can highlight where a given model fails to capture real fish responses.
Simulation-trained policies become reusable tools for embodied validation of new candidate models on the same hardware.
The framework supports iterative refinement of models by identifying which behavioral aspects drive the largest sim-to-real discrepancies.
Bio-inspired robotic systems can adopt the highest-fidelity models identified through these live interactions for improved group coordination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same transfer-and-compare approach could be adapted to test behavioral models in other species by swapping the robot platform and metric set.
Large gaps on particular metrics could directly suggest missing features to add to the neural network, such as new sensory inputs or social rules.
Extending the method to multi-robot swarms would allow direct testing of whether a model correctly predicts group-level decisions under physical constraints.
Over time, repeated use of this evaluation could shift validation of animal behavior models from static data matching toward predictive accuracy in live embodied settings.

Load-bearing premise

That policies trained in simulation transfer to the physical RoboFish with enough fidelity that differences in behavioral metric distributions can be attributed to the fish models rather than hardware or control mismatches.

What would settle it

Repeated trials in which the neural network model produces Wasserstein distances on goal-reaching and alignment that are statistically indistinguishable from or larger than those of the rule-based models when the same policy and fish groups are used.
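One way to run the test described above is a percentile bootstrap on the difference between two models' gaps. This is a minimal sketch under stated assumptions: each trial is reduced to one scalar value, trials are resampled with replacement within each of the four sets (model A sim/real, model B sim/real), and `bootstrap_gap_difference` with its parameters is hypothetical. If the interval for gap(NN) minus gap(rule-based) contains zero or lies above it, the NN model's advantage on that metric would not be supported.

```python
import random
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Order-1 Wasserstein distance between two empirical 1D samples."""
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def bootstrap_gap_difference(sim_a, real_a, sim_b, real_b,
                             n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for gap(A) - gap(B).

    Each argument is a list of per-trial scalar values for one model and
    condition; every list is resampled with replacement on each iteration.
    """
    rng = random.Random(seed)

    def resample(values):
        return [rng.choice(values) for _ in values]

    diffs = []
    for _ in range(n_boot):
        gap_a = wasserstein_1d(resample(sim_a), resample(real_a))
        gap_b = wasserstein_1d(resample(sim_b), resample(real_b))
        diffs.append(gap_a - gap_b)
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

An interval lying entirely below zero indicates that model A's gap is reliably smaller than model B's on that metric.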

Original abstract

Understanding and modeling animal behavior is essential for studying collective motion, decision-making, and bio-inspired robotics. Yet, evaluating the accuracy of behavioral models still often relies on offline comparisons to static trajectory statistics. Here we introduce a reinforcement-learning-based framework that uses a biomimetic robotic fish (RoboFish) to evaluate computational models of live fish behavior through closed-loop interaction. We trained policies in simulation using four distinct fish models (a simple constant-follow baseline, two rule-based models, and a biologically grounded convolutional neural network model) and transferred these policies to the real RoboFish setup, where they interacted with live fish. Policies were trained to guide a simulated fish to goal locations, enabling us to quantify how the response of real fish differs from the simulated fish's response. We evaluate the fish models by quantifying the sim-to-real gaps, defined as the Wasserstein distance between simulated and real distributions of behavioral metrics such as goal-reaching performance, inter-individual distances, wall interactions, and alignment. The neural network-based fish model exhibited the smallest gap across goal-reaching performance and most other metrics, indicating higher behavioral fidelity than conventional rule-based models under this benchmark. More importantly, this separation shows that the proposed evaluation can quantitatively distinguish candidate models under matched closed-loop conditions. Our work demonstrates how learning-based robotic experiments can uncover deficiencies in behavioral models and provides a general framework for evaluating animal behavior models through embodied interaction.
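To make the goal-directed task concrete, here is a toy rollout in the spirit of the constant-follow baseline. Everything in it is a hypothetical simplification: the simulated fish moves a fixed step toward the robot each tick, the "policy" simply drives the robot straight to the goal, and the speeds and goal radius are arbitrary. Neither the paper's constant-follow model nor its trained RL policies are reproduced here.

```python
import math


def step_toward(pos, target, speed):
    """Move `pos` up to `speed` units toward `target` without overshooting."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= speed:
        return (float(target[0]), float(target[1]))
    return (pos[0] + speed * dx / dist, pos[1] + speed * dy / dist)


def rollout(fish, robot, goal, steps=100, fish_speed=1.0, robot_speed=1.0, radius=2.0):
    """Return (reached, ticks): whether the fish got within `radius` of `goal`."""
    for tick in range(1, steps + 1):
        robot = step_toward(robot, goal, robot_speed)  # naive goal-seeking robot
        fish = step_toward(fish, robot, fish_speed)    # constant-follow fish
        if math.hypot(fish[0] - goal[0], fish[1] - goal[1]) <= radius:
            return True, tick
    return False, steps
```

Against such a compliant simulated fish even a trivial robot strategy succeeds; live fish that do not follow reliably would widen the sim-to-real gap in goal-reaching performance.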

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a workable embodied way to rank fish behavior models via RL policies on a real robot, but the sim-to-real transfer step leaves the model rankings open to hardware confounds.


The main point is that they train model-specific RL policies in simulation to steer a virtual fish toward goals, transfer the policies to a physical RoboFish, and then use Wasserstein distances on live-fish interactions to measure how much each model deviates from reality. The neural network model shows smaller gaps than the rule-based alternatives on goal-reaching and several other metrics, which demonstrates that the method can separate the candidates under matched closed-loop conditions.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces a reinforcement-learning framework that uses a biomimetic robotic fish (RoboFish) to evaluate computational models of collective fish behavior via closed-loop interactions with live fish. Policies are trained in simulation under four fish models (constant-follow baseline, two rule-based models, and a convolutional neural network model) to guide a virtual fish to goal locations; these policies are then transferred to the physical robot. Model accuracy is quantified by Wasserstein distances between the distributions of behavioral metrics (goal-reaching performance, inter-individual distances, wall interactions, and alignment) observed in simulation versus reality. The neural-network model produces the smallest gaps on most metrics, which the authors interpret as evidence of higher behavioral fidelity and as proof that the closed-loop evaluation can distinguish candidate models.

Significance. If the central attribution holds, the work supplies a concrete embodied benchmark that moves model evaluation beyond offline trajectory statistics into interactive, goal-directed settings. The combination of RL-trained policies, sim-to-real transfer, and Wasserstein metrics on multiple behavioral axes offers a reproducible, quantitative protocol that could be adopted for testing other collective-behavior models. The explicit demonstration that the method separates the NN model from rule-based alternatives is a tangible contribution.

major comments (1)

The claim that smaller Wasserstein gaps demonstrate superior fidelity of the neural-network fish model presupposes that policy transfer from simulation to the physical RoboFish is equivalent across all four models and free of confounding robot-specific dynamics. The abstract and experimental description supply no robot-only validation runs, no ablation of hardware effects (actuator lag, sensor noise, fluid drag), and no confirmation that the closed-loop stimuli delivered to live fish match between sim and real. This assumption is load-bearing for the central attribution of performance differences to fish-model accuracy rather than to unmodeled hardware mismatches.
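A minimal sketch of one robot-only check along the lines the referee requests: replay each policy's commanded robot trajectory without live fish and compare it with the executed trajectory. The function names, the RMS tracking-error measure, and comparing its spread across policies are illustrative assumptions, not the authors' protocol. Similar per-policy errors would make hardware effects a less likely explanation for differences in the fish-model gaps.

```python
import math


def rms_tracking_error(commanded, executed):
    """Root-mean-square distance between commanded and executed (x, y) positions."""
    if not commanded or len(commanded) != len(executed):
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [(cx - ex) ** 2 + (cy - ey) ** 2
               for (cx, cy), (ex, ey) in zip(commanded, executed)]
    return math.sqrt(sum(squared) / len(squared))


def tracking_error_spread(errors_by_policy):
    """Largest minus smallest per-policy RMS error; near 0 means similar hardware behavior."""
    values = list(errors_by_policy.values())
    return max(values) - min(values)
```

In use, `errors_by_policy` would map hypothetical policy labels (one per fish model) to their RMS errors from fish-free runs on the same hardware.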

minor comments (2)

The abstract would benefit from a brief statement of the number of experimental trials per condition and whether metric differences were assessed with statistical tests.
Notation for the four fish models and the precise definition of each behavioral metric should be introduced consistently in the main text before the results are presented.
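On the referee's point about statistical tests: the extracted supplementary fragments at the end of this page mention a bootstrap CI, a permutation test with 100,000 permutations, and Cliff's delta. A minimal sketch of the latter two for one metric follows. The test statistic (absolute difference in means) and the add-one p-value correction are assumptions, since the fragments do not state them; the default `n_perm` simply mirrors the count mentioned there.

```python
import random


def permutation_test(sim, real, n_perm=100_000, seed=0):
    """Two-sided permutation p-value for the absolute difference in means.

    The sim/real labels are shuffled n_perm times; the p-value is the share of
    shuffles whose statistic is at least the observed one (add-one corrected).
    """
    rng = random.Random(seed)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(sim, real)
    pooled = list(sim) + list(real)
    n_sim = len(sim)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n_sim], pooled[n_sim:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(X < Y) over all pairs, in [-1, 1]."""
    greater = sum(1 for xi in x for yj in y if xi > yj)
    less = sum(1 for xi in x for yj in y if xi < yj)
    return (greater - less) / (len(x) * len(y))
```

The p-value says whether sim and real differ at all, while Cliff's delta conveys how large and in which direction the difference is, independent of the metric's units.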

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive critique of our work. The concern regarding potential confounding from sim-to-real policy transfer is well-taken, and we address it directly below.


Referee: The claim that smaller Wasserstein gaps demonstrate superior fidelity of the neural-network fish model presupposes that policy transfer from simulation to the physical RoboFish is equivalent across all four models and free of confounding robot-specific dynamics. The abstract and experimental description supply no robot-only validation runs, no ablation of hardware effects (actuator lag, sensor noise, fluid drag), and no confirmation that the closed-loop stimuli delivered to live fish match between sim and real. This assumption is load-bearing for the central attribution of performance differences to fish-model accuracy rather than to unmodeled hardware mismatches.

Authors: We acknowledge that the current manuscript does not include dedicated robot-only validation runs (i.e., policy execution without live fish) or systematic ablations isolating hardware effects such as actuator lag, sensor noise, or fluid drag. All four policies were transferred to the identical physical RoboFish platform and executed under the same real-world conditions when interacting with live fish. While the robot hardware remains fixed, we recognize that policies trained under different fish models may generate distinct action sequences that interact differently with unmodeled dynamics, potentially contributing to the observed Wasserstein gaps. We did not perform additional experiments to decouple these factors. In the revised version we will (1) add an explicit discussion of this assumption and its limitations in the methods and discussion sections, (2) provide further detail on how sensory inputs and robot trajectories are matched between simulation and reality, and (3) clarify that the primary source of variation across conditions is the fish model used during policy training. These changes will qualify the central attribution without altering the reported results. revision: partial

Circularity Check

0 steps flagged

No significant circularity; evaluation uses independent live-fish data and standard metrics


The paper trains RL policies in simulation under each candidate fish model (constant-follow, rule-based, NN), transfers them to physical RoboFish hardware, and measures Wasserstein distances between simulated and real distributions of goal-reaching performance, inter-individual distances, wall interactions, and alignment. These distances are computed from external live-fish trajectories and a standard statistical distance; no equation or claim reduces a fitted parameter to a prediction by construction, nor does any load-bearing premise rest on a self-citation chain that itself lacks independent verification. The NN model is presented as one of four pre-existing candidates rather than being derived from the evaluation itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The evaluation framework rests on standard assumptions from reinforcement learning and robotics (policy transferability, metric sufficiency) plus the domain assumption that Wasserstein distance on the listed behavioral statistics captures model fidelity. No new free parameters or invented entities are introduced by the evaluation method itself; the four fish models are treated as given inputs.

axioms (2)

domain assumption: Policies trained to guide a simulated fish to goals will produce informative interactions when transferred to the physical RoboFish.
Invoked in the description of training in simulation and transfer to the real setup.
domain assumption: The chosen behavioral metrics (goal-reaching performance, inter-individual distances, wall interactions, alignment) are adequate to distinguish model quality.
Central to the claim that the smallest gap indicates higher fidelity.



Reference graph

Works this paper leans on

26 extracted references · 21 canonical work pages · 1 internal anchor

[1]

J. Krause, A. F. Winfield, J.-L. Deneubourg, Interactive robots in experimental biology. Trends in Ecology & Evolution 26(7), 369–375 (2011), doi:10.1016/j.tree.2011.03.015, http://linkinghub.elsevier.com/retrieve/pii/S0169534711000851
[2]

T. Landgraf, et al., Interactive Robotic Fish for the Analysis of Swarm Behavior, in Advances in Swarm Intelligence, Y. Tan, Y. Shi, H. Mo, Eds., Lecture Notes in Computer Science (Springer, Berlin, Heidelberg) (2013), pp. 1–10, doi:10.1007/978-3-642-38703-6_1
[3]

T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, O. Shochet, Novel Type of Phase Transition in a System of Self-Driven Particles. Physical Review Letters 75(6), 1226–1229 (1995), doi:10.1103/PhysRevLett.75.1226, https://link.aps.org/doi/10.1103/PhysRevLett.75.1226
[4]

I. D. Couzin, J. Krause, R. James, G. D. Ruxton, N. R. Franks, Collective Memory and Spatial Sorting in Animal Groups. Journal of Theoretical Biology 218(1), 1–11 (2002), doi:10.1006/jtbi.2002.3065, http://www.sciencedirect.com/science/article/pii/S0022519302930651
[5]

I. Aoki, A simulation study on the schooling mechanism in fish. NIPPON SUISAN GAKKAISHI 48(8), 1081–1088 (1982), doi:10.2331/suisan.48.1081, http://joi.jlc.jst.go.jp/JST.Journalarchive/suisan1932/48.1081?from=CrossRef
[6]

C. W. Reynolds, Flocks, herds and schools: A distributed behavioral model. ACM SIGGRAPH Computer Graphics 21(4), 25–34 (1987), doi:10.1145/37402.37406, http://portal.acm.org/citation.cfm?doid=37402.37406
[7]

A. Huth, C. Wissel, The simulation of the movement of fish schools. Journal of Theoretical Biology 156(3), 365–385 (1992), doi:10.1016/S0022-5193(05)80681-2, https://www.sciencedirect.com/science/article/pii/S0022519305806812
[8]

E. Eyjolfsdottir, K. Branson, Y. Yue, P. Perona, Learning recurrent representations for hierarchical behavior modeling. arXiv:1611.00094 [cs] (2016), http://arxiv.org/abs/1611.00094
[9]

F. J. H. Heras, F. Romero-Ferrero, R. C. Hinz, G. G. de Polavieja, Deep attention networks reveal the rules of collective motion in zebrafish. PLOS Computational Biology 15(9), e1007354 (2019), doi:10.1371/journal.pcbi.1007354, https://dx.plos.org/10.1371/journal.pcbi.1007354
[10]

T. Costa, A. Laan, F. J. H. Heras, G. G. de Polavieja, Automated Discovery of Local Rules for Desired Collective-Level Behavior Through Reinforcement Learning. Frontiers in Physics 8 (2020), doi:10.3389/fphy.2020.00200, https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2020.00200/full
[11]

Faria, A novel method for investigating the collective behaviour of fish: introducing ‘Robofish’. Behavioral Ecology and Sociobiology (2010), doi:10.1007/s00265-010-0988-y
[12]

T. Landgraf, et al., RoboFish: increased acceptance of interactive robotic fish with realistic eyes and natural motion patterns by live Trinidadian guppies. Bioinspiration & Biomimetics 11(1), 015001 (2016), doi:10.1088/1748-3190/11/1/015001, https://doi.org/10.1088%2F1748-3190%2F11%2F1%2F015001
[13]

Polverino, Fish and Robots Swimming Together in a Water Tunnel: Robot Color and Tail-Beat Frequency Influence Fish Behavior. PLoS ONE (2013), doi:10.1371/journal.pone.0077589
[14]

D. Bierbach, et al., Using a robotic fish to investigate individual differences in social responsiveness in the guppy. Royal Society Open Science 5(8), 181026 (2018), doi:10.1098/rsos.181026, https://royalsocietypublishing.org/doi/full/10.1098/rsos.181026
[15]

V. Papaspyros, G. Theraulaz, C. Sire, F. Mondada, Quantifying the biomimicry gap in biohybrid robot-fish pairs. Bioinspiration & Biomimetics 19(4), 046020 (2024), doi:10.1088/1748-3190/ad577a, https://doi.org/10.1088/1748-3190/ad577a
[16]

M. Maxeiner, et al., Social competence improves the performance of biomimetic robots leading live fish. Bioinspiration & Biomimetics 18(4), 045001 (2023), doi:10.1088/1748-3190/acca59, https://dx.doi.org/10.1088/1748-3190/acca59
[17]

J. Tobin, et al., Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. arXiv:1703.06907 [cs] (2017), http://arxiv.org/abs/1703.06907
[18]

X. B. Peng, M. Andrychowicz, W. Zaremba, P. Abbeel, Sim-to-Real Transfer of Robotic Control with Dynamics Randomization, in 2018 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, Brisbane, QLD) (2018), pp. 3803–3810, doi:10.1109/ICRA.2018.8460528, https://ieeexplore.ieee.org/document/8460528/
[19]

P. P. Klamser, et al., Impact of Variable Speed on Collective Movement of Animal Groups (2021), doi:10.48550/arXiv.2106.00959, http://arxiv.org/abs/2106.00959, arXiv:2106.00959 [physics, q-bio]
[20]

Materials and methods are available as supplementary material
[21]

L. Bennett, B. Melchers, B. Proppe, Curta: A General-purpose High-Performance Computer at ZEDAT, Freie Universität Berlin (2020), doi:10.17169/REFUBIUM-26754, https://refubium.fu-berlin.de/handle/fub188/26993
[22]

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal Policy Optimization Algorithms (2017), doi:10.48550/arXiv.1707.06347, http://arxiv.org/abs/1707.06347, arXiv:1707.06347 [cs].

Acknowledgments: We thank Gregor Gebhardt and Julian Stastny for contributions during the early stages of the project, and Janosch Brandhorst for helpful discussi...
[23]

and Germany’s Excellence Strategy (EXC 2002/1 ‘Science of Intelligence’, Project Number 390523135). Furthermore, Mathis Hocke and Andreas Gerken were supported by the Elsa-Neumann Scholarship of the State of Berlin. The authors would like to thank the HPC Service of FUB-IT, Freie Universität Berlin, for computing time (21). There are no competing interes...
[24]

the bootstrap CI for the Wasserstein gap,
[25]

a permutation test (100,000 permutations), and
[26]

Cliff’s delta. All policies showed highly significant sim-to-real differences (all p < 0.01). Sim-to-real gap (per-time-step metrics). Per-time-step values (25 Hz) are autocorrelated; therefore, Wasserstein distances were computed at the trial level rather than by pooling all time steps. For each policy and metric, we compared each simulated trial with each ...
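The fragment above motivates trial-level comparisons by noting that per-time-step values sampled at 25 Hz are autocorrelated. A minimal sketch of why this matters: the lag-1 autocorrelation of a series and the standard AR(1) approximation to its effective sample size, n_eff ≈ n(1 - rho)/(1 + rho). This illustrates the motivation only; it is not the paper's trial-level procedure, which the fragment describes only up to a truncation.

```python
def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a time series."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("series is constant")
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


def effective_sample_size(xs):
    """AR(1) approximation: n_eff = n * (1 - rho) / (1 + rho)."""
    rho = lag1_autocorrelation(xs)
    return len(xs) * (1 - rho) / (1 + rho)
```

A slowly drifting 25 Hz signal can contain thousands of samples yet have a far smaller effective sample size, so pooling time steps would overstate how much independent evidence each trial provides.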