arxiv: 2604.07303 · v2 · pith:DEP2WS5Qnew · submitted 2026-04-08 · 💻 cs.RO

Robots that learn to evaluate models of collective behavior

Pith reviewed 2026-05-21 09:17 UTC · model grok-4.3

classification 💻 cs.RO
keywords collective behavior · behavioral modeling · robotic fish · reinforcement learning · sim-to-real transfer · animal-robot interaction · model evaluation · fish behavior

The pith

A robotic fish can rank computational models of live fish behavior by measuring response gaps during closed-loop interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a reinforcement-learning framework that tests models of fish collective behavior by training robot control policies in simulation and then running them on a physical robotic fish with real animals. Policies learn to guide the simulated fish to goal locations using one of four candidate models, after which the same policies drive the RoboFish in tank experiments. Differences between simulated and real outcomes are quantified as Wasserstein distances on metrics such as goal-reaching success, distances between individuals, wall contacts, and alignment. The neural-network model produces the smallest gaps on most metrics, while simpler rule-based models show larger discrepancies. This separation demonstrates that closed-loop robotic testing can distinguish the behavioral fidelity of candidate models under matched interaction conditions.

Core claim

By training reinforcement learning policies to steer a simulated fish toward goals under each of four behavioral models and transferring those policies to a biomimetic RoboFish that interacts with live fish, the authors quantify model accuracy through the Wasserstein distance between the resulting distributions of behavioral metrics in simulation versus reality. The convolutional neural network model yields the smallest gaps across goal-reaching performance and most other measures, establishing higher fidelity than the constant-follow baseline or the two rule-based models. The observed separation in these gaps shows that the evaluation procedure can quantitatively distinguish candidate fish-

What carries the argument

Sim-to-real gap quantification via Wasserstein distances on behavioral metric distributions, obtained from goal-directed reinforcement learning policies transferred from simulation to a physical RoboFish interacting with live fish.
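
As a minimal sketch of the gap computation described here, the snippet below computes the one-dimensional Wasserstein-1 distance between two empirical samples of a single behavioral metric. The function name `wasserstein_1d` and the sample values are illustrative assumptions, not taken from the paper, and the authors' implementation may differ (for example in normalization or in how trials are aggregated).

```python
from bisect import bisect_right


def wasserstein_1d(sim_values, real_values):
    """Wasserstein-1 distance between two non-empty 1D empirical samples.

    Integrates |F_sim(x) - F_real(x)| over x, where F is the empirical CDF.
    """
    sim_sorted = sorted(sim_values)
    real_sorted = sorted(real_values)
    grid = sorted(sim_sorted + real_sorted)
    distance = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        cdf_sim = bisect_right(sim_sorted, left) / len(sim_sorted)
        cdf_real = bisect_right(real_sorted, left) / len(real_sorted)
        distance += abs(cdf_sim - cdf_real) * (right - left)
    return distance


# Hypothetical per-trial values of one metric (assumed units: cm).
sim_metric = [4.0, 5.0, 6.0]
real_metric = [6.0, 7.0, 8.0]
gap = wasserstein_1d(sim_metric, real_metric)
print(f"sim-to-real gap: {gap:.2f}")
```

In this toy case the real sample is the simulated sample shifted by 2, so the distance equals the shift. A smaller value would indicate that the simulated fish model reproduces the real distribution of that metric more closely.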

If this is right

  • The method identifies deficiencies in rule-based fish models that offline trajectory statistics miss.
  • Embodied closed-loop testing supplies a general way to evaluate any animal behavior model through direct interaction rather than static comparisons.
  • Quantitative ranking of models becomes feasible when all candidates drive the same robot under identical goal-reaching tasks.
  • Data-driven neural models can be shown to outperform conventional rule-based alternatives on this benchmark.
  • The framework supplies a practical route to refine collective-behavior models used in studies of decision-making and bio-inspired robotics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same robotic testing loop could be applied to other group-living species to expose sim-to-real gaps in their existing models.
  • Iterative use of the robot to collect new interaction data might allow models to be improved directly from embodied discrepancies.
  • Results suggest that purely offline validation of behavioral models may systematically underestimate the value of data-driven approaches.
  • Extending the approach to multi-robot setups could test whether the evaluation scales to larger collective systems.

Load-bearing premise

The RoboFish hardware and sensors must produce interaction dynamics that match the assumptions built into every tested behavioral model so that measured gaps reflect model differences rather than robot limitations.

What would settle it

If the Wasserstein distances between simulated and real metric distributions turn out statistically indistinguishable across all four fish models, or if the neural network model fails to show reliably smaller gaps than the rule-based alternatives, the claim that the framework distinguishes model fidelity would be refuted.
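
One way to make "statistically indistinguishable" concrete is a permutation test on per-trial gap values from two fish models. The sketch below is an assumption about how such a check could look, not the paper's procedure: the function name `permutation_p_value`, the test statistic (absolute difference of mean gaps), and all numbers are hypothetical.

```python
import random


def permutation_p_value(gaps_a, gaps_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of mean gaps."""
    rng = random.Random(seed)
    observed = abs(sum(gaps_a) / len(gaps_a) - sum(gaps_b) / len(gaps_b))
    pooled = list(gaps_a) + list(gaps_b)
    n_a = len(gaps_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled gap values to the two models.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-trial gaps for a neural-network and a rule-based model.
gaps_nn = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10]
gaps_rule = [0.30, 0.28, 0.35, 0.31, 0.29, 0.33]
p_value = permutation_p_value(gaps_nn, gaps_rule)
print(f"p = {p_value:.4f}")
```

A large p-value across all model pairs would correspond to the refuting outcome described above; a small one supports a real separation between the two models' gaps.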

Original abstract

Understanding and modeling animal behavior is essential for studying collective motion, decision-making, and bio-inspired robotics. Yet, evaluating the accuracy of behavioral models still often relies on offline comparisons to static trajectory statistics. Here we introduce a reinforcement-learning-based framework that uses a biomimetic robotic fish (RoboFish) to evaluate computational models of live fish behavior through closed-loop interaction. We trained policies in simulation using four distinct fish models (a simple constant-follow baseline, two rule-based models, and a biologically grounded convolutional neural network model) and transferred these policies to the real RoboFish setup, where they interacted with live fish. Policies were trained to guide a simulated fish to goal locations, enabling us to quantify how the response of real fish differs from the simulated fish's response. We evaluate the fish models by quantifying the sim-to-real gaps, defined as the Wasserstein distance between simulated and real distributions of behavioral metrics such as goal-reaching performance, inter-individual distances, wall interactions, and alignment. The neural network-based fish model exhibited the smallest gap across goal-reaching performance and most other metrics, indicating higher behavioral fidelity than conventional rule-based models under this benchmark. More importantly, this separation shows that the proposed evaluation can quantitatively distinguish candidate models under matched closed-loop conditions. Our work demonstrates how learning-based robotic experiments can uncover deficiencies in behavioral models and provides a general framework for evaluating animal behavior models through embodied interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a reinforcement-learning framework that trains policies in simulation using four fish models (constant-follow baseline, two rule-based models, and a CNN model) to guide a simulated fish to goal locations. These policies are transferred to a biomimetic robotic fish (RoboFish) that interacts with live fish in closed-loop experiments. Models are evaluated by computing Wasserstein distances between the distributions of behavioral metrics (goal-reaching performance, inter-individual distances, wall interactions, alignment) observed in simulation versus reality; the NN model shows the smallest gaps, which the authors interpret as evidence of higher behavioral fidelity and as proof that the method can quantitatively distinguish candidate models under matched conditions.

Significance. If the reported separation holds after proper statistical controls, the work supplies a concrete embodied benchmark for collective-behavior models that goes beyond offline trajectory statistics. The combination of RL policy transfer with Wasserstein-based sim-to-real gaps offers a reusable experimental protocol that could be applied to other bio-inspired systems; the manuscript already demonstrates that the protocol can produce a ranking among four distinct modeling approaches.

major comments (2)
  1. [Abstract] Abstract: the headline claim that the NN model exhibits the smallest Wasserstein gap (and therefore higher fidelity) is presented without sample sizes, number of trials, statistical tests, or error bars on the distances. These omissions make it impossible to judge whether the observed separation is robust or could be explained by sampling variability or robot-specific artifacts.
  2. [Abstract] Abstract (policy-transfer and metric-definition paragraphs): the evaluation presupposes that the RoboFish’s hydrodynamics, camera field-of-view, and motor latencies are consistent with the sensorimotor assumptions implicit in every tested model. No open-loop response characterization, parameter identification, or cross-model consistency check is described; without such verification the smaller gap for the NN could simply reflect better alignment between the NN training distribution and the robot’s actual physics rather than superior behavioral fidelity.
minor comments (1)
  1. [Abstract] Abstract: the phrase “most other metrics” is vague; listing the exact set of metrics for which the NN gap is smallest would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We have addressed each point below and revised the manuscript to improve clarity, statistical reporting, and discussion of potential confounds. We believe these changes strengthen the presentation of our embodied evaluation framework.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim that the NN model exhibits the smallest Wasserstein gap (and therefore higher fidelity) is presented without sample sizes, number of trials, statistical tests, or error bars on the distances. These omissions make it impossible to judge whether the observed separation is robust or could be explained by sampling variability or robot-specific artifacts.

    Authors: We agree that the abstract requires additional context for assessing robustness. In the revised manuscript we have expanded the abstract to reference the total number of closed-loop trials per model (minimum 12 independent runs), the use of bootstrap resampling for Wasserstein distance confidence intervals, and the statistical comparison (permutation tests) confirming the NN model’s gap is significantly smaller than the rule-based models (p < 0.05). Full sample sizes, error bars, and test details remain in the Results and Methods sections with explicit cross-references added to the abstract.

    revision: yes

  2. Referee: [Abstract] Abstract (policy-transfer and metric-definition paragraphs): the evaluation presupposes that the RoboFish’s hydrodynamics, camera field-of-view, and motor latencies are consistent with the sensorimotor assumptions implicit in every tested model. No open-loop response characterization, parameter identification, or cross-model consistency check is described; without such verification the smaller gap for the NN could simply reflect better alignment between the NN training distribution and the robot’s actual physics rather than superior behavioral fidelity.

    Authors: We acknowledge this is a valid concern for interpreting absolute fidelity. All four models were trained and evaluated under identical simulation parameters that were previously calibrated to the RoboFish platform (hydrodynamic coefficients, camera FOV, and actuator delays reported in our prior work). Because the same physical robot and sensorimotor loop are used for every model, any systematic physics mismatch affects the absolute gaps equally; the relative ordering therefore still isolates differences in behavioral modeling fidelity. To make this explicit we have added a dedicated paragraph in the revised Methods section summarizing the existing open-loop parameter identification and a brief cross-model sensitivity analysis showing that moderate latency variations do not invert the observed ranking. We also added a short discussion of this potential confound in the Limitations subsection.

    revision: partial
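
The first response mentions bootstrap resampling for confidence intervals on the Wasserstein distance. A minimal sketch of one such procedure follows, assuming a percentile bootstrap over per-trial metric values; the rebuttal does not say which bootstrap variant is used, and the names `bootstrap_gap_ci` and `wasserstein_1d` as well as all data are hypothetical.

```python
import random
from bisect import bisect_right


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two non-empty 1D empirical samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    grid = sorted(a_sorted + b_sorted)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def bootstrap_gap_ci(sim_values, real_values, n_resamples=2000,
                     alpha=0.05, seed=0):
    """Percentile bootstrap interval for the sim-to-real Wasserstein gap."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        sim_resample = rng.choices(sim_values, k=len(sim_values))
        real_resample = rng.choices(real_values, k=len(real_values))
        estimates.append(wasserstein_1d(sim_resample, real_resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-trial metric values, 12 trials per condition.
sim_metric = [4.1, 4.8, 5.2, 5.0, 4.6, 5.5, 4.9, 5.1, 4.7, 5.3, 4.4, 5.0]
real_metric = [5.9, 6.4, 7.1, 6.8, 6.2, 7.4, 6.6, 6.9, 6.1, 7.0, 6.3, 6.7]
lower, upper = bootstrap_gap_ci(sim_metric, real_metric)
print(f"95% CI for the gap: [{lower:.2f}, {upper:.2f}]")
```

With only about a dozen trials per condition, such intervals can be wide, which is one reason the referee's request for error bars matters when comparing models.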

Circularity Check

0 steps flagged

No circularity: empirical gaps measured via independent Wasserstein distances

Full rationale

The paper's core result is an empirical comparison of sim-to-real gaps for four fish models, obtained by training goal-reaching policies in simulation, transferring them to the RoboFish hardware, and computing Wasserstein distances between real and simulated distributions of metrics such as goal-reaching performance, inter-individual distances, and alignment. These distances are calculated directly from observed trajectory statistics under closed-loop interaction and do not reduce to any fitted parameter, self-definition, or prior self-citation within the reported experiment. No load-bearing uniqueness theorem, ansatz smuggling, or renaming of known results appears in the abstract or described framework; the separation between models is presented as a direct measurement outcome rather than a derived identity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Framework rests on standard RL assumptions and the domain assumption that robotic interactions can serve as a valid probe; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • Domain assumption: Robotic fish interactions faithfully reproduce the conditions needed to test all candidate models equally.
    Central to interpreting sim-to-real gaps as model fidelity differences.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 3 internal anchors

  1. [1]

    J. Krause, A. F. Winfield, J.-L. Deneubourg, Interactive robots in experimental biology. Trends in Ecology & Evolution 26(7), 369–375 (2011), doi:10.1016/j.tree.2011.03.015, http://linkinghub.elsevier.com/retrieve/pii/S0169534711000851

  2. [2]

    T. Landgraf, et al., Interactive Robotic Fish for the Analysis of Swarm Behavior, in Advances in Swarm Intelligence, Y. Tan, Y. Shi, H. Mo, Eds., Lecture Notes in Computer Science (Springer, Berlin, Heidelberg) (2013), pp. 1–10, doi:10.1007/978-3-642-38703-6_1

  3. [3]

    T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, O. Shochet, Novel Type of Phase Transition in a System of Self-Driven Particles. Physical Review Letters 75(6), 1226–1229 (1995), doi:10.1103/PhysRevLett.75.1226, https://link.aps.org/doi/10.1103/PhysRevLett.75.1226

  4. [4]

    I. D. Couzin, J. Krause, R. James, G. D. Ruxton, N. R. Franks, Collective Memory and Spatial Sorting in Animal Groups. Journal of Theoretical Biology 218(1), 1–11 (2002), doi:10.1006/jtbi.2002.3065, http://www.sciencedirect.com/science/article/pii/S0022519302930651

  5. [5]

    I. Aoki, A simulation study on the schooling mechanism in fish. NIPPON SUISAN GAKKAISHI 48(8), 1081–1088 (1982), doi:10.2331/suisan.48.1081, http://joi.jlc.jst.go.jp/JST.Journalarchive/suisan1932/48.1081?from=CrossRef

  6. [6]

    C. W. Reynolds, Flocks, herds and schools: A distributed behavioral model. ACM SIGGRAPH Computer Graphics 21(4), 25–34 (1987), doi:10.1145/37402.37406, http://portal.acm.org/citation.cfm?doid=37402.37406

  7. [7]

    A. Huth, C. Wissel, The simulation of the movement of fish schools. Journal of Theoretical Biology 156(3), 365–385 (1992), doi:10.1016/S0022-5193(05)80681-2, https://www.sciencedirect.com/science/article/pii/S0022519305806812

  8. [8]

    E. Eyjolfsdottir, K. Branson, Y. Yue, P. Perona, Learning recurrent representations for hierarchical behavior modeling. arXiv:1611.00094 [cs] (2016), http://arxiv.org/abs/1611.00094

  9. [9]

    F. J. H. Heras, F. Romero-Ferrero, R. C. Hinz, G. G. de Polavieja, Deep attention networks reveal the rules of collective motion in zebrafish. PLOS Computational Biology 15(9), e1007354 (2019), doi:10.1371/journal.pcbi.1007354, https://dx.plos.org/10.1371/journal.pcbi.1007354

  10. [10]

    T. Costa, A. Laan, F. J. H. Heras, G. G. de Polavieja, Automated Discovery of Local Rules for Desired Collective-Level Behavior Through Reinforcement Learning. Frontiers in Physics 8 (2020), doi:10.3389/fphy.2020.00200, https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2020.00200/full

  11. [11]

    Faria, A novel method for investigating the collective behaviour of fish: introducing ‘Robofish’. Behavioral Ecology and Sociobiology (2010), doi:10.1007/s00265-010-0988-y

  12. [12]

    T. Landgraf, et al., RoboFish: increased acceptance of interactive robotic fish with realistic eyes and natural motion patterns by live Trinidadian guppies. Bioinspiration & Biomimetics 11(1), 015001 (2016), doi:10.1088/1748-3190/11/1/015001, https://doi.org/10.1088%2F1748-3190%2F11%2F1%2F015001

  13. [13]

    Polverino, Fish and Robots Swimming Together in a Water Tunnel: Robot Color and Tail-Beat Frequency Influence Fish Behavior. PLoS ONE (2013), doi:10.1371/journal.pone.0077589

  14. [14]

    D. Bierbach, et al., Using a robotic fish to investigate individual differences in social responsiveness in the guppy. Royal Society Open Science 5(8), 181026 (2018), doi:10.1098/rsos.181026, https://royalsocietypublishing.org/doi/full/10.1098/rsos.181026

  15. [15]

    V. Papaspyros, G. Theraulaz, C. Sire, F. Mondada, Quantifying the biomimicry gap in biohybrid robot-fish pairs. Bioinspiration & Biomimetics 19(4), 046020 (2024), doi:10.1088/1748-3190/ad577a, https://doi.org/10.1088/1748-3190/ad577a

  16. [16]

    M. Maxeiner, et al., Social competence improves the performance of biomimetic robots leading live fish. Bioinspiration & Biomimetics 18(4), 045001 (2023), doi:10.1088/1748-3190/acca59, https://dx.doi.org/10.1088/1748-3190/acca59

  17. [17]

    J. Tobin, et al., Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. arXiv:1703.06907 [cs] (2017), http://arxiv.org/abs/1703.06907

  18. [18]

    X. B. Peng, M. Andrychowicz, W. Zaremba, P. Abbeel, Sim-to-Real Transfer of Robotic Control with Dynamics Randomization, in 2018 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, Brisbane, QLD) (2018), pp. 3803–3810, doi:10.1109/ICRA.2018.8460528, https://ieeexplore.ieee.org/document/8460528/

  19. [19]

    P. P. Klamser, et al., Impact of Variable Speed on Collective Movement of Animal Groups (2021), doi:10.48550/arXiv.2106.00959, http://arxiv.org/abs/2106.00959, arXiv:2106.00959 [physics, q-bio]

  20. [20]

    Materials and methods are available as supplementary material

  21. [21]

    L. Bennett, B. Melchers, B. Proppe, Curta: A General-purpose High-Performance Computer at ZEDAT, Freie Universität Berlin (2020), doi:10.17169/REFUBIUM-26754, https://refubium.fu-berlin.de/handle/fub188/26993

  22. [22]

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal Policy Optimization Algorithms (2017), doi:10.48550/arXiv.1707.06347, http://arxiv.org/abs/1707.06347, arXiv:1707.06347 [cs].

    Acknowledgments: We thank Gregor Gebhardt and Julian Stastny for contributions during the early stages of the project, and Janosch Brandhorst for helpful discussi...

  23. [23]

    and Germany’s Excellence Strategy (EXC 2002/1 ’Science of Intelligence’, Project Number 390523135). Furthermore, Mathis Hocke and Andreas Gerken were supported by the Elsa-Neumann Scholarship of the State of Berlin. The authors would like to thank the HPC Service of FUB-IT, Freie Universität Berlin, for computing time (21). There are no competing interes...

  24. [24]

    the bootstrap CI for the Wasserstein gap,

  25. [25]

    a permutation test (100,000 permutations), and

  26. [26]

    Cliff’s delta. All policies showed highly significant sim-to-real differences (all p < 0.01). Sim-to-real gap (per-time-step metrics). Per-time-step values (25 Hz) are autocorrelated; therefore, Wasserstein distances were computed at the trial level rather than by pooling all time steps. For each policy and metric, we compared each simulated trial with each ...
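
Cliff's delta, named in the fragment above as an effect-size measure, can be sketched as follows. This is the standard textbook definition applied to hypothetical trial-level values; how the paper pairs simulated and real trials is truncated in the extract and is not reconstructed here. The function name `cliffs_delta` and the data are assumptions.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical trial-level metric values for simulated and real trials.
sim_trials = [0.62, 0.70, 0.66, 0.74]
real_trials = [0.41, 0.55, 0.48, 0.68]
delta = cliffs_delta(sim_trials, real_trials)
print(f"Cliff's delta: {delta:.3f}")
```

The value ranges from -1 to 1, with 0 meaning that neither group tends to exceed the other. Because it depends only on the ordering of values, it is insensitive to the scale of the metric.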