Robots that learn to evaluate models of collective behavior

Andreas Gerken; David Bierbach; Jens Krause; Mathis Hocke; Tim Landgraf

arXiv: 2604.07303 · v2 · pith:DEP2WS5Q · submitted 2026-04-08 · 💻 cs.RO

Mathis Hocke, Andreas Gerken, David Bierbach, Jens Krause, Tim Landgraf

Pith reviewed 2026-05-21 09:17 UTC · model grok-4.3

classification 💻 cs.RO

keywords: collective behavior · behavioral modeling · robotic fish · reinforcement learning · sim-to-real transfer · animal-robot interaction · model evaluation · fish behavior

The pith

A robotic fish can rank computational models of live fish behavior by measuring response gaps during closed-loop interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a reinforcement-learning framework that tests models of fish collective behavior by training robot control policies in simulation and then running them on a physical robotic fish with real animals. Policies learn to guide the simulated fish to goal locations using one of four candidate models, after which the same policies drive the RoboFish in tank experiments. Differences between simulated and real outcomes are quantified as Wasserstein distances on metrics such as goal-reaching success, distances between individuals, wall contacts, and alignment. The neural-network model produces the smallest gaps on most metrics, while simpler rule-based models show larger discrepancies. This separation demonstrates that closed-loop robotic testing can distinguish the behavioral fidelity of candidate models under matched interaction conditions.
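The summary names the behavioral metrics but does not define them. A minimal sketch of how per-time-step versions could be computed, assuming each agent is a 2D pose (x, y, heading in radians), alignment is the cosine of the heading difference, the tank is an axis-aligned rectangle, and a goal counts as reached inside a fixed radius. All function names, the tank geometry, and the radius are hypothetical illustrations, not definitions taken from the paper.

```python
import math
from typing import Tuple

# Hypothetical pose representation: (x, y, heading in radians).
Pose = Tuple[float, float, float]


def inter_individual_distance(a: Pose, b: Pose) -> float:
    """Euclidean distance between two agents (e.g. robot and live fish)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def alignment(a: Pose, b: Pose) -> float:
    """Cosine of the heading difference: 1 = parallel, 0 = perpendicular, -1 = opposite."""
    return math.cos(a[2] - b[2])


def wall_distance(p: Pose, width: float, height: float) -> float:
    """Distance to the nearest wall of a tank spanning [0, width] x [0, height]."""
    x, y = p[0], p[1]
    return min(x, width - x, y, height - y)


def reached_goal(p: Pose, goal: Tuple[float, float], radius: float) -> bool:
    """True if the agent lies within `radius` of the goal position."""
    return math.hypot(p[0] - goal[0], p[1] - goal[1]) <= radius


if __name__ == "__main__":
    robot = (10.0, 20.0, 0.0)
    fish = (13.0, 24.0, math.pi / 2)
    print(inter_individual_distance(robot, fish))  # 5.0
    print(wall_distance(fish, 100.0, 100.0))  # 13.0
```

A wall-contact count could then be derived by thresholding `wall_distance`, with the threshold again an assumption.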

Core claim

By training reinforcement learning policies to steer a simulated fish toward goals under each of four behavioral models and transferring those policies to a biomimetic RoboFish that interacts with live fish, the authors quantify model accuracy through the Wasserstein distance between the resulting distributions of behavioral metrics in simulation versus reality. The convolutional neural network model yields the smallest gaps across goal-reaching performance and most other measures, indicating higher fidelity than the constant-follow baseline or the two rule-based models. The observed separation in these gaps shows that the evaluation procedure can quantitatively distinguish candidate fish-behavior models under matched closed-loop conditions.

What carries the argument

Sim-to-real gap quantification via Wasserstein distances on behavioral metric distributions, obtained from goal-directed reinforcement learning policies transferred from simulation to a physical RoboFish interacting with live fish.
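For a scalar metric, the Wasserstein-1 distance between two empirical distributions equals the area between their cumulative distribution functions, W1(P, Q) = ∫ |F_P(x) − F_Q(x)| dx. A minimal pure-Python sketch, assuming each sample is a list of values of one metric (for example, one value per trial); how the paper actually organizes its samples is not specified in this summary. SciPy's `scipy.stats.wasserstein_distance` computes the same 1-D quantity.

```python
from typing import Sequence


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Integrates |F_u(x) - F_v(x)| over x, where F_u and F_v are the empirical
    CDFs. Both CDFs are constant between consecutive observed values, so the
    integral reduces to a sum over those intervals.
    """
    if len(u) == 0 or len(v) == 0:
        raise ValueError("both samples must be non-empty")
    us, vs = sorted(u), sorted(v)
    points = sorted(set(us) | set(vs))
    n, m = len(us), len(vs)
    i = j = 0  # number of samples <= current left endpoint
    total = 0.0
    for left, right in zip(points, points[1:]):
        while i < n and us[i] <= left:
            i += 1
        while j < m and vs[j] <= left:
            j += 1
        total += abs(i / n - j / m) * (right - left)
    return total


# Simulated vs. real values of one metric (placeholder numbers).
print(wasserstein_1d([0.0, 2.0], [1.0]))  # 1.0
```

When both samples have the same size, this reduces to the mean absolute difference between the sorted samples.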

If this is right

The method identifies deficiencies in rule-based fish models that offline trajectory statistics miss.
Embodied closed-loop testing supplies a general way to evaluate any animal behavior model through direct interaction rather than static comparisons.
Quantitative ranking of models becomes feasible when all candidates drive the same robot under identical goal-reaching tasks.
Data-driven neural models can be shown to outperform conventional rule-based alternatives on this benchmark.
The framework supplies a practical route to refine collective-behavior models used in studies of decision-making and bio-inspired robotics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same robotic testing loop could be applied to other group-living species to expose sim-to-real gaps in their existing models.
Iterative use of the robot to collect new interaction data might allow models to be improved directly from embodied discrepancies.
Results suggest that purely offline validation of behavioral models may systematically underestimate the value of data-driven approaches.
Extending the approach to multi-robot setups could test whether the evaluation scales to larger collective systems.

Load-bearing premise

The RoboFish hardware and sensors must produce interaction dynamics that match the assumptions built into every tested behavioral model so that measured gaps reflect model differences rather than robot limitations.

What would settle it

If the Wasserstein distances between simulated and real metric distributions turn out to be statistically indistinguishable across all four fish models, or if the neural network model fails to show reliably smaller gaps than the rule-based alternatives, the claim that the framework distinguishes model fidelity would be refuted.
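Whether a single gap exceeds sampling noise can be checked with a permutation test that uses the Wasserstein distance itself as the test statistic: pool the simulated and real values, relabel them at random, and count how often a random split gives a gap at least as large as the observed one. This is a sketch under that assumption; the extracted methods fragments mention a permutation test with 100,000 permutations but do not state its statistic, and the smaller default here only keeps the example fast. Comparing gaps across models, as the criterion above requires, would need a further test built on top of per-model gaps.

```python
import random
from typing import Sequence


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """W1 distance between 1-D empirical distributions (area between CDFs)."""
    us, vs = sorted(u), sorted(v)
    points = sorted(set(us) | set(vs))
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        while i < len(us) and us[i] <= left:
            i += 1
        while j < len(vs) and vs[j] <= left:
            j += 1
        total += abs(i / len(us) - j / len(vs)) * (right - left)
    return total


def permutation_pvalue(sim: Sequence[float], real: Sequence[float],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """Permutation p-value for H0: simulated and real values share one distribution."""
    rng = random.Random(seed)
    observed = wasserstein_1d(sim, real)
    pooled = list(sim) + list(real)
    n_sim = len(sim)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of simulated vs. real values
        if wasserstein_1d(pooled[:n_sim], pooled[n_sim:]) >= observed:
            hits += 1
    # The +1 terms count the observed labeling and keep p strictly positive.
    return (hits + 1) / (n_perm + 1)
```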

Original abstract

Understanding and modeling animal behavior is essential for studying collective motion, decision-making, and bio-inspired robotics. Yet, evaluating the accuracy of behavioral models still often relies on offline comparisons to static trajectory statistics. Here we introduce a reinforcement-learning-based framework that uses a biomimetic robotic fish (RoboFish) to evaluate computational models of live fish behavior through closed-loop interaction. We trained policies in simulation using four distinct fish models (a simple constant-follow baseline, two rule-based models, and a biologically grounded convolutional neural network model) and transferred these policies to the real RoboFish setup, where they interacted with live fish. Policies were trained to guide a simulated fish to goal locations, enabling us to quantify how the response of real fish differs from the simulated fish's response. We evaluate the fish models by quantifying the sim-to-real gaps, defined as the Wasserstein distance between simulated and real distributions of behavioral metrics such as goal-reaching performance, inter-individual distances, wall interactions, and alignment. The neural network-based fish model exhibited the smallest gap across goal-reaching performance and most other metrics, indicating higher behavioral fidelity than conventional rule-based models under this benchmark. More importantly, this separation shows that the proposed evaluation can quantitatively distinguish candidate models under matched closed-loop conditions. Our work demonstrates how learning-based robotic experiments can uncover deficiencies in behavioral models and provides a general framework for evaluating animal behavior models through embodied interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows a robotic closed-loop test can separate fish behavior models via sim-to-real gaps, with the NN version looking best, but hardware-to-model matching is unverified.

The main takeaway is that this work uses a biomimetic robot to run closed-loop tests on different models of how fish behave in groups. The authors train RL policies on four models in simulation (a simple follower, two rule-based models, and a CNN), then deploy the policies on the RoboFish to interact with live fish. By measuring Wasserstein distances between real and simulated metric distributions for quantities such as goal reaching and alignment, they find that the neural network model has the smallest gap.

What the paper does well is move the evaluation into an embodied, interactive setting. Offline statistics on trajectories can miss how models perform when the system runs in real time with feedback. Here the robot tries to achieve goals based on the model, and the live fish respond, so the gaps reflect actual behavioral differences under matched conditions. The separation they report suggests this method can pick out better models.

The soft spots are mostly in details that aren't in the abstract. There is no mention of trial numbers, variability, or formal tests for the distance differences. More importantly, the stress-test concern carries some weight: the RoboFish has specific sensing and movement limits that may not line up with the idealized assumptions in the rule-based or even the NN model. If the robot's camera range or turning speed differs from what a model was designed for, a smaller sim-to-real gap might simply mean that the model was fitted to something closer to the hardware rather than capturing true fish behavior. The paper would be stronger with a check that the effective interaction kernel is consistent across models.

This paper is for people in robotics and animal behavior who are looking for better validation tools for their simulations. A reader who cares about collective motion or bio-inspired agents would get value from seeing how RL can be used as a probe. I would recommend sending it for peer review. The approach is original and the initial results are encouraging, but the methods section needs to address the potential mismatches and provide the missing statistics before the work can be fully assessed.

Referee Report

2 major / 1 minor

Summary. The paper introduces a reinforcement-learning framework that trains policies in simulation using four fish models (constant-follow baseline, two rule-based models, and a CNN model) to guide a simulated fish to goal locations. These policies are transferred to a biomimetic robotic fish (RoboFish) that interacts with live fish in closed-loop experiments. Models are evaluated by computing Wasserstein distances between the distributions of behavioral metrics (goal-reaching performance, inter-individual distances, wall interactions, alignment) observed in simulation versus reality; the NN model shows the smallest gaps, which the authors interpret as evidence of higher behavioral fidelity and as proof that the method can quantitatively distinguish candidate models under matched conditions.

Significance. If the reported separation holds after proper statistical controls, the work supplies a concrete embodied benchmark for collective-behavior models that goes beyond offline trajectory statistics. The combination of RL policy transfer with Wasserstein-based sim-to-real gaps offers a reusable experimental protocol that could be applied to other bio-inspired systems; the manuscript already demonstrates that the protocol can produce a ranking among four distinct modeling approaches.

major comments (2)

[Abstract] Abstract: the headline claim that the NN model exhibits the smallest Wasserstein gap (and therefore higher fidelity) is presented without sample sizes, number of trials, statistical tests, or error bars on the distances. These omissions make it impossible to judge whether the observed separation is robust or could be explained by sampling variability or robot-specific artifacts.
[Abstract] Abstract (policy-transfer and metric-definition paragraphs): the evaluation presupposes that the RoboFish’s hydrodynamics, camera field-of-view, and motor latencies are consistent with the sensorimotor assumptions implicit in every tested model. No open-loop response characterization, parameter identification, or cross-model consistency check is described; without such verification the smaller gap for the NN could simply reflect better alignment between the NN training distribution and the robot’s actual physics rather than superior behavioral fidelity.

minor comments (1)

[Abstract] Abstract: the phrase “most other metrics” is vague; listing the exact set of metrics for which the NN gap is smallest would improve clarity.
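The referee asks for statistical tests alongside the headline gaps. The extracted methods fragments name Cliff's delta as one effect size, a nonparametric measure equal to the probability that a value from one sample exceeds a value from the other, minus the reverse probability. A minimal sketch follows; which samples the paper feeds into it is not stated here, so the function is a generic illustration.

```python
from typing import Sequence


def cliffs_delta(x: Sequence[float], y: Sequence[float]) -> float:
    """Cliff's delta = P(X > Y) - P(X < Y), estimated over all (x, y) pairs.

    Ranges from -1 (every x below every y) to 1 (every x above every y);
    ties count toward neither side.
    """
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))
```

The pairwise loop is O(n·m), which is adequate for trial-level sample sizes; larger samples would call for a sort-based count.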

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We have addressed each point below and revised the manuscript to improve clarity, statistical reporting, and discussion of potential confounds. We believe these changes strengthen the presentation of our embodied evaluation framework.

Referee: [Abstract] Abstract: the headline claim that the NN model exhibits the smallest Wasserstein gap (and therefore higher fidelity) is presented without sample sizes, number of trials, statistical tests, or error bars on the distances. These omissions make it impossible to judge whether the observed separation is robust or could be explained by sampling variability or robot-specific artifacts.

Authors: We agree that the abstract requires additional context for assessing robustness. In the revised manuscript we have expanded the abstract to reference the total number of closed-loop trials per model (minimum 12 independent runs), the use of bootstrap resampling for Wasserstein distance confidence intervals, and the statistical comparison (permutation tests) confirming the NN model’s gap is significantly smaller than the rule-based models (p < 0.05). Full sample sizes, error bars, and test details remain in the Results and Methods sections with explicit cross-references added to the abstract. revision: yes
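The response refers to bootstrap confidence intervals for the Wasserstein gap. A percentile-bootstrap sketch: resample the simulated and real values independently with replacement, recompute the gap each time, and read off the alpha/2 and 1 − alpha/2 quantiles. The resampling unit, `n_boot`, and the percentile method are assumptions for illustration; the paper's exact procedure is not given in this text.

```python
import random
from typing import Sequence, Tuple


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """W1 distance between 1-D empirical distributions (area between CDFs)."""
    us, vs = sorted(u), sorted(v)
    points = sorted(set(us) | set(vs))
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        while i < len(us) and us[i] <= left:
            i += 1
        while j < len(vs) and vs[j] <= left:
            j += 1
        total += abs(i / len(us) - j / len(vs)) * (right - left)
    return total


def bootstrap_gap_ci(sim: Sequence[float], real: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the W1 gap between simulated and real values."""
    rng = random.Random(seed)
    gaps = sorted(
        wasserstein_1d(rng.choices(sim, k=len(sim)), rng.choices(real, k=len(real)))
        for _ in range(n_boot)
    )
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return gaps[lo_idx], gaps[hi_idx]
```

Because the gap is non-negative and biased upward in small samples, a percentile interval near zero should be read with care.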
Referee: [Abstract] Abstract (policy-transfer and metric-definition paragraphs): the evaluation presupposes that the RoboFish’s hydrodynamics, camera field-of-view, and motor latencies are consistent with the sensorimotor assumptions implicit in every tested model. No open-loop response characterization, parameter identification, or cross-model consistency check is described; without such verification the smaller gap for the NN could simply reflect better alignment between the NN training distribution and the robot’s actual physics rather than superior behavioral fidelity.

Authors: We acknowledge this is a valid concern for interpreting absolute fidelity. All four models were trained and evaluated under identical simulation parameters that were previously calibrated to the RoboFish platform (hydrodynamic coefficients, camera FOV, and actuator delays reported in our prior work). Because the same physical robot and sensorimotor loop are used for every model, any systematic physics mismatch affects the absolute gaps equally; the relative ordering therefore still isolates differences in behavioral modeling fidelity. To make this explicit we have added a dedicated paragraph in the revised Methods section summarizing the existing open-loop parameter identification and a brief cross-model sensitivity analysis showing that moderate latency variations do not invert the observed ranking. We also added a short discussion of this potential confound in the Limitations subsection. revision: partial
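The sensitivity analysis described in this response amounts to checking that the ordering of models by sim-to-real gap survives perturbations such as added latency. A hypothetical sketch: given one gap per model under each simulated condition, sort the models by gap and test whether every condition yields the same order. The function names and the dictionary layout are assumptions; any condition names or gap values passed in are placeholders, not results from the paper.

```python
from typing import Dict, List


def rank_models(gaps: Dict[str, float]) -> List[str]:
    """Model names ordered from smallest to largest sim-to-real gap."""
    return sorted(gaps, key=lambda name: gaps[name])


def ranking_is_stable(gaps_by_condition: Dict[str, Dict[str, float]]) -> bool:
    """True if every condition (e.g. nominal vs. added latency) yields the same order."""
    rankings = [rank_models(g) for g in gaps_by_condition.values()]
    return all(r == rankings[0] for r in rankings[1:])
```

Requiring an identical full ordering is strict; a looser check could require only that the best-ranked model stays the same.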

Circularity Check

0 steps flagged

No circularity: empirical gaps measured via independent Wasserstein distances

The paper's core result is an empirical comparison of sim-to-real gaps for four fish models, obtained by training goal-reaching policies in simulation, transferring them to the RoboFish hardware, and computing Wasserstein distances between real and simulated distributions of metrics such as goal-reaching performance, inter-individual distances, and alignment. These distances are calculated directly from observed trajectory statistics under closed-loop interaction and do not reduce to any fitted parameter, self-definition, or prior self-citation within the reported experiment. No load-bearing uniqueness theorem, ansatz smuggling, or renaming of known results appears in the abstract or described framework; the separation between models is presented as a direct measurement outcome rather than a derived identity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Framework rests on standard RL assumptions and the domain assumption that robotic interactions can serve as a valid probe; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: robotic fish interactions faithfully reproduce the conditions needed to test all candidate models equally.
Central to interpreting sim-to-real gaps as model fidelity differences.

pith-pipeline@v0.9.0 · 5781 in / 1168 out tokens · 45803 ms · 2026-05-21T09:17:30.650528+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We evaluate the fish models by quantifying the sim-to-real gaps, defined as the Wasserstein distance between simulated and real distributions of behavioral metrics such as goal-reaching performance, inter-individual distances, wall interactions, and alignment.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

The neural network-based fish model exhibited the smallest gap across goal-reaching performance and most other metrics

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 3 internal anchors

[1] J. Krause, A. F. Winfield, J.-L. Deneubourg, Interactive robots in experimental biology. Trends in Ecology & Evolution 26(7), 369–375 (2011), doi:10.1016/j.tree.2011.03.015, http://linkinghub.elsevier.com/retrieve/pii/S0169534711000851
[2] T. Landgraf, et al., Interactive Robotic Fish for the Analysis of Swarm Behavior, in Advances in Swarm Intelligence, Y. Tan, Y. Shi, H. Mo, Eds., Lecture Notes in Computer Science (Springer, Berlin, Heidelberg) (2013), pp. 1–10, doi:10.1007/978-3-642-38703-6_1
[3] T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, O. Shochet, Novel Type of Phase Transition in a System of Self-Driven Particles. Physical Review Letters 75(6), 1226–1229 (1995), doi:10.1103/PhysRevLett.75.1226, https://link.aps.org/doi/10.1103/PhysRevLett.75.1226
[4] I. D. Couzin, J. Krause, R. James, G. D. Ruxton, N. R. Franks, Collective Memory and Spatial Sorting in Animal Groups. Journal of Theoretical Biology 218(1), 1–11 (2002), doi:10.1006/jtbi.2002.3065, http://www.sciencedirect.com/science/article/pii/S0022519302930651
[5] I. Aoki, A simulation study on the schooling mechanism in fish. NIPPON SUISAN GAKKAISHI 48(8), 1081–1088 (1982), doi:10.2331/suisan.48.1081, http://joi.jlc.jst.go.jp/JST.Journalarchive/suisan1932/48.1081?from=CrossRef
[6] C. W. Reynolds, Flocks, herds and schools: A distributed behavioral model. ACM SIGGRAPH Computer Graphics 21(4), 25–34 (1987), doi:10.1145/37402.37406, http://portal.acm.org/citation.cfm?doid=37402.37406
[7] A. Huth, C. Wissel, The simulation of the movement of fish schools. Journal of Theoretical Biology 156(3), 365–385 (1992), doi:10.1016/S0022-5193(05)80681-2, https://www.sciencedirect.com/science/article/pii/S0022519305806812
[8] E. Eyjolfsdottir, K. Branson, Y. Yue, P. Perona, Learning recurrent representations for hierarchical behavior modeling. arXiv:1611.00094 [cs] (2016), http://arxiv.org/abs/1611.00094
[9] F. J. H. Heras, F. Romero-Ferrero, R. C. Hinz, G. G. de Polavieja, Deep attention networks reveal the rules of collective motion in zebrafish. PLOS Computational Biology 15(9), e1007354 (2019), doi:10.1371/journal.pcbi.1007354, https://dx.plos.org/10.1371/journal.pcbi.1007354
[10] T. Costa, A. Laan, F. J. H. Heras, G. G. de Polavieja, Automated Discovery of Local Rules for Desired Collective-Level Behavior Through Reinforcement Learning. Frontiers in Physics 8 (2020), doi:10.3389/fphy.2020.00200, https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2020.00200/full
[11] Faria, A novel method for investigating the collective behaviour of fish: introducing ‘Robofish’. Behavioral Ecology and Sociobiology (2010), doi:10.1007/s00265-010-0988-y
[12] T. Landgraf, et al., RoboFish: increased acceptance of interactive robotic fish with realistic eyes and natural motion patterns by live Trinidadian guppies. Bioinspiration & Biomimetics 11(1), 015001 (2016), doi:10.1088/1748-3190/11/1/015001, https://doi.org/10.1088%2F1748-3190%2F11%2F1%2F015001
[13] Polverino, Fish and Robots Swimming Together in a Water Tunnel: Robot Color and Tail-Beat Frequency Influence Fish Behavior. PLoS ONE (2013), doi:10.1371/journal.pone.0077589
[14] D. Bierbach, et al., Using a robotic fish to investigate individual differences in social responsiveness in the guppy. Royal Society Open Science 5(8), 181026 (2018), doi:10.1098/rsos.181026, https://royalsocietypublishing.org/doi/full/10.1098/rsos.181026
[15] V. Papaspyros, G. Theraulaz, C. Sire, F. Mondada, Quantifying the biomimicry gap in biohybrid robot-fish pairs. Bioinspiration & Biomimetics 19(4), 046020 (2024), doi:10.1088/1748-3190/ad577a, https://doi.org/10.1088/1748-3190/ad577a
[16] M. Maxeiner, et al., Social competence improves the performance of biomimetic robots leading live fish. Bioinspiration & Biomimetics 18(4), 045001 (2023), doi:10.1088/1748-3190/acca59, https://dx.doi.org/10.1088/1748-3190/acca59
[17] J. Tobin, et al., Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. arXiv:1703.06907 [cs] (2017), http://arxiv.org/abs/1703.06907
[18] X. B. Peng, M. Andrychowicz, W. Zaremba, P. Abbeel, Sim-to-Real Transfer of Robotic Control with Dynamics Randomization, in 2018 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, Brisbane, QLD) (2018), pp. 3803–3810, doi:10.1109/ICRA.2018.8460528, https://ieeexplore.ieee.org/document/8460528/
[19] P. P. Klamser, et al., Impact of Variable Speed on Collective Movement of Animal Groups (2021), doi:10.48550/arXiv.2106.00959, http://arxiv.org/abs/2106.00959, arXiv:2106.00959 [physics, q-bio]
[20] Materials and methods are available as supplementary material.
[21] L. Bennett, B. Melchers, B. Proppe, Curta: A General-purpose High-Performance Computer at ZEDAT, Freie Universität Berlin (2020), doi:10.17169/REFUBIUM-26754, https://refubium.fu-berlin.de/handle/fub188/26993
[22] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal Policy Optimization Algorithms (2017), doi:10.48550/arXiv.1707.06347, http://arxiv.org/abs/1707.06347, arXiv:1707.06347 [cs]. Acknowledgments: We thank Gregor Gebhardt and Julian Stastny for contributions during the early stages of the project, and Janosch Brandhorst for helpful discussi...
[23] and Germany’s Excellence Strategy (EXC 2002/1 ’Science of Intelligence’, Project Number 390523135). Furthermore, Mathis Hocke and Andreas Gerken were supported by the Elsa-Neumann Scholarship of the State of Berlin. The authors would like to thank the HPC Service of FUB-IT, Freie Universität Berlin, for computing time (21). There are no competing interes...
[24] the bootstrap CI for the Wasserstein gap,
[25] a permutation test (100,000 permutations), and
[26] Cliff’s delta. All policies showed highly significant sim-to-real differences (all p < 0.01). Sim-to-real gap (per-time-step metrics). Per-time-step values (25 Hz) are autocorrelated; therefore, Wasserstein distances were computed at the trial level rather than by pooling all time steps. For each policy and metric, we compared each simulated trial with each ...
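The methods fragment in [26] explains that per-time-step values (25 Hz) are autocorrelated, so Wasserstein distances were computed per trial rather than over pooled time steps. The sentence is cut off before it says how trials are paired or how the pairwise distances are combined. The sketch below assumes each simulated trial is compared with each real trial and the resulting distances are averaged; both choices, and the function name `trial_level_gap`, are hypothetical.

```python
from itertools import product
from statistics import mean
from typing import List, Sequence


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """W1 distance between 1-D empirical distributions (area between CDFs)."""
    us, vs = sorted(u), sorted(v)
    points = sorted(set(us) | set(vs))
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        while i < len(us) and us[i] <= left:
            i += 1
        while j < len(vs) and vs[j] <= left:
            j += 1
        total += abs(i / len(us) - j / len(vs)) * (right - left)
    return total


def trial_level_gap(sim_trials: List[Sequence[float]],
                    real_trials: List[Sequence[float]]) -> float:
    """Average W1 over all (simulated trial, real trial) pairs.

    Each trial is a sequence of per-time-step values of one metric; time steps
    are never pooled across trials. Averaging is an assumption of this sketch.
    """
    return mean(wasserstein_1d(s, r) for s, r in product(sim_trials, real_trials))
```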