A Framework For Identifying Group Behavior Of Wild Animals

Guido Muscioni; Marco D. Santambrogio; Margaret C. Crofoot; Matteo Foglio; Riccardo Pressiani; Tanya Berger-Wolf

arxiv: 1907.00932 · v1 · pith:77MXTPZ6new · submitted 2019-07-01 · 💻 cs.LG · cs.CV

Pith reviewed 2026-05-25 11:44 UTC · model grok-4.3

keywords: group behavior · wild animals · sequence analysis · collective behavior · animal tracking · behavioral annotation · interaction encoding · activity recognition

The pith

Direct encoding of group interactions in sequence analysis improves classification of collective wild animal behaviors from tracking data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to classify collective behaviors of wild animal groups by analyzing tracking sequences that directly encode member interactions. It tests the approach on a real-world dataset of social animals and reports higher accuracy than standard baseline classifiers. A sympathetic reader would care because this offers a route to infer social dynamics at scale from sensor data without continuous human observation. The work treats behavior inference as a supervised sequence task where interaction structure is made explicit rather than left implicit in the features.

Core claim

The central claim is that a sequence analysis approach with direct encoding of the interactions of a group of wild animals allows more accurate inference of collective behavioral annotations than baseline methods, as demonstrated on a real-world tracking dataset.

What carries the argument

Sequence analysis with a direct encoding of the interactions of a group of wild animals, which turns pairwise or group-level interaction events into explicit features for classification.
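
The text visible here does not specify the paper's actual encoding, so the following is only a minimal sketch of one way pairwise interactions could be made explicit as features. It assumes planar (x, y) positions in a fixed animal order and uses pairwise distance as the interaction signal; the function names `encode_interactions` and `window_means` and the toy data are hypothetical.

```python
from itertools import combinations
from math import hypot


def encode_interactions(positions):
    """Turn per-step positions into explicit pairwise-distance features.

    positions: list over time steps; each step is a list of (x, y) tuples,
    one per tracked animal, in a fixed animal order.
    Returns one feature vector per time step, ordered by animal pair
    (0, 1), (0, 2), ..., (n-2, n-1).
    """
    features = []
    for step in positions:
        row = [
            hypot(step[i][0] - step[j][0], step[i][1] - step[j][1])
            for i, j in combinations(range(len(step)), 2)
        ]
        features.append(row)
    return features


def window_means(features, size):
    """Average each pair's distance over consecutive non-overlapping windows."""
    windows = []
    for start in range(0, len(features) - size + 1, size):
        chunk = features[start:start + size]
        windows.append([sum(column) / size for column in zip(*chunk)])
    return windows


# Hypothetical toy data: three animals observed at two time steps.
toy_track = [
    [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)],
    [(0.0, 0.0), (0.0, 0.0), (6.0, 8.0)],
]
step_features = encode_interactions(toy_track)
window_features = window_means(step_features, size=2)
```

Each window vector could then be passed to any sequence classifier. Distance is only one candidate interaction signal; relative heading or proximity thresholds would fit the same shape.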

If this is right

Collective behaviors can be classified more accurately from indirect tracking data than with methods that do not encode interactions directly.
The same sequence model yields measurable accuracy lifts on real animal movement records compared with baseline sequence or feature-based classifiers.
Behavioral annotation of social groups becomes feasible at the scale of existing tracking deployments without additional manual labeling effort beyond the training set.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The interaction-encoding step could be reused in other sensor-based group activity settings where only position or proximity streams are available.
If the accuracy edge holds across species, the method would support comparative studies of sociality using already-collected tracking archives.
Extensions that relax the need for clean human annotations, such as semi-supervised variants, would directly address the load-bearing assumption.

Load-bearing premise

The tracking data and human-provided behavioral annotations contain sufficient signal about true group interactions.

What would settle it

Running the same classification pipeline on a version of the dataset where interaction cues are deliberately removed or annotations are randomly perturbed and finding that accuracy gains over baselines disappear.
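
The two perturbations described above can be sketched as follows. This is an illustrative control under stated assumptions, not the paper's protocol: labels are flipped uniformly at random to a different class, and interaction cues are removed by circularly shifting each animal's track by an independent random offset, which keeps individual movement intact but breaks synchrony between animals. The names `perturb_labels` and `desynchronise` are hypothetical.

```python
import random


def perturb_labels(labels, fraction, classes, seed=0):
    """Return a copy of labels with a given fraction flipped to another class."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(fraction * len(labels))
    for idx in rng.sample(range(len(labels)), n_flip):
        alternatives = [c for c in classes if c != noisy[idx]]
        noisy[idx] = rng.choice(alternatives)
    return noisy


def desynchronise(tracks, seed=0):
    """Circularly shift each animal's track by its own random offset.

    tracks: dict mapping an animal id to a non-empty list of positions.
    """
    rng = random.Random(seed)
    shifted = {}
    for animal, path in tracks.items():
        offset = rng.randrange(len(path))
        shifted[animal] = path[offset:] + path[:offset]
    return shifted
```

If the reported gain over baselines survives either perturbation, the gain is unlikely to come from genuine interaction structure.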

Figures

Figures reproduced from arXiv: 1907.00932 by Guido Muscioni, Marco D. Santambrogio, Margaret C. Crofoot, Matteo Foglio, Riccardo Pressiani, Tanya Berger-Wolf.

Original abstract

Activity recognition and, more generally, behavior inference tasks are gaining a lot of interest. Much of it is work in the context of human behavior. New available tracking technologies for wild animals are generating datasets that indirectly may provide information about animal behavior. In this work, we propose a method for classifying these data into behavioral annotation, particularly collective behavior of a social group. Our method is based on sequence analysis with a direct encoding of the interactions of a group of wild animals. We evaluate our approach on a real world dataset, showing significant accuracy improvements over baseline methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adds direct group interaction encoding to sequence models for classifying collective wild animal behavior from tracking data and claims accuracy gains on real data, but supplies no metrics or annotation details.

The main takeaway is that this work adapts sequence analysis by directly encoding interactions among animals in a group, then applies it to label collective behaviors from wild animal tracking data. It reports better accuracy than baselines on a real dataset. That combination looks like a straightforward extension rather than a big conceptual leap, but it targets a practical gap in ecology applications of ML. Using actual tracking data instead of toy examples is a clear positive. The framing around new sensor tech for animals also makes sense and connects to existing activity recognition work.

The soft spots are mostly in the lack of supporting detail. The abstract mentions significant improvements but shows no numbers, dataset sizes, error bars, or baseline descriptions. More importantly, the evaluation depends on human-provided behavioral annotations, and nothing is said about how those were collected, their reliability, or whether the sensors capture the interactions that matter. If the labels are noisy or incomplete, the accuracy numbers would not prove the method finds genuine group behavior. That assumption is load-bearing and untested in what is visible.

This is for ecologists or computer vision researchers who already have animal tracking datasets and want to automate collective behavior labels. Someone in that niche could get a usable idea from the interaction encoding step. It is coherent on its own terms and shows honest engagement with the domain, so it deserves a serious referee to check the methods and results sections. I would send it to peer review rather than desk reject.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a method for classifying tracking data of wild animals into behavioral annotations, with a focus on collective group behavior. The approach relies on sequence analysis that directly encodes interactions among group members. It is evaluated on a real-world dataset and claims significant accuracy improvements relative to baseline methods.

Significance. If the quantitative gains are reproducible and the evaluation protocol is sound, the direct-interaction encoding could provide a practical addition to automated inference of collective animal behavior from tracking technologies. The absence of any reported metrics, dataset statistics, or validation details in the abstract, however, prevents a clear assessment of whether this contribution would be substantial.

major comments (3)

[Abstract] Abstract: the central claim of 'significant accuracy improvements over baseline methods' is stated without any numerical results, error bars, dataset size, number of sequences, or statistical tests. This omission is load-bearing because the magnitude and reliability of the reported gains cannot be evaluated from the given information.
[Evaluation] Evaluation section (standard placement after methods): the reported accuracy is measured against human-provided behavioral annotations, yet no protocol, inter-annotator agreement statistics, or discussion of label noise is supplied. Because the method's value rests on correctly identifying genuine group interactions, the lack of evidence that the annotations faithfully capture those interactions undermines the claim.
[Abstract] Abstract / Methods: no description is given of the baseline methods, the sensor modalities used, the number of animals or observation periods, or any exclusion criteria applied to the real-world dataset. These details are required to judge whether the accuracy gains generalize beyond the specific data collection setup.
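
As an illustration of the kind of error bar the first major comment asks for, here is a minimal sketch of a paired bootstrap confidence interval on the accuracy difference between the proposed method and a baseline. It assumes both classifiers were scored on the same test sequences; the function name `bootstrap_accuracy_gain` and its defaults are hypothetical, and nothing here is taken from the paper.

```python
import random


def bootstrap_accuracy_gain(y_true, pred_method, pred_baseline,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Paired bootstrap for accuracy(method) - accuracy(baseline).

    Returns the observed difference and a percentile confidence interval.
    """
    rng = random.Random(seed)
    n = len(y_true)
    # Per-sequence difference in correctness: +1, 0, or -1.
    diffs = [
        (m == t) - (b == t)
        for t, m, b in zip(y_true, pred_method, pred_baseline)
    ]
    observed = sum(diffs) / n
    samples = []
    for _ in range(n_resamples):
        # Resample test sequences with replacement, keeping pairs together.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        samples.append(sum(resample) / n)
    samples.sort()
    lower = samples[int((alpha / 2) * n_resamples)]
    upper = samples[int((1 - alpha / 2) * n_resamples) - 1]
    return observed, (lower, upper)
```

An interval that excludes zero would support the word "significant". Resampling whole sequences rather than individual time steps is the design choice that matters, because steps within one sequence are not independent.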

minor comments (1)

[Abstract] The abstract could more explicitly name the sequence-analysis technique and the form of the direct interaction encoding.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments identify key areas where additional details and clarity can strengthen the presentation of our framework for identifying group behavior in wild animals. We respond to each major comment below and indicate where revisions will be made.

Referee: [Abstract] Abstract: the central claim of 'significant accuracy improvements over baseline methods' is stated without any numerical results, error bars, dataset size, number of sequences, or statistical tests. This omission is load-bearing because the magnitude and reliability of the reported gains cannot be evaluated from the given information.

Authors: We agree with this observation. The abstract in the current version does not include quantitative results. In the revised manuscript, we will update the abstract to include specific accuracy metrics (e.g., the percentage improvements), dataset size, number of sequences, and any statistical tests performed to support the significance of the gains.

Revision: yes

Referee: [Evaluation] Evaluation section (standard placement after methods): the reported accuracy is measured against human-provided behavioral annotations, yet no protocol, inter-annotator agreement statistics, or discussion of label noise is supplied. Because the method's value rests on correctly identifying genuine group interactions, the lack of evidence that the annotations faithfully capture those interactions undermines the claim.

Authors: The evaluation is indeed based on human annotations from the dataset. We will add a discussion of the annotation process and potential sources of label noise in the revised evaluation section. However, inter-annotator agreement statistics were not collected during the original annotation process.

Revision: partial

Referee: [Abstract] Abstract / Methods: no description is given of the baseline methods, the sensor modalities used, the number of animals or observation periods, or any exclusion criteria applied to the real-world dataset. These details are required to judge whether the accuracy gains generalize beyond the specific data collection setup.

Authors: We will revise both the abstract and the methods section to explicitly describe the baseline methods used for comparison, the sensor modalities (tracking technologies), the number of animals and observation periods in the dataset, and any exclusion criteria applied. These details are present in the full manuscript but will be highlighted more clearly.

Revision: yes

standing simulated objections not resolved

Inter-annotator agreement statistics, as these were not computed in the original data annotation process and cannot be retroactively provided without new annotations.
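
For context on what the missing statistic would involve, here is a minimal sketch of Cohen's kappa for two annotators labelling the same sequences. It assumes a second, independent set of annotations exists, which the simulated rebuttal says is not the case for this dataset; the function name `cohens_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each annotator labelled independently
    # according to their own label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for four sequences.
annotator_1 = ["move", "move", "rest", "rest"]
annotator_2 = ["move", "rest", "rest", "rest"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.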

Circularity Check

0 steps flagged

No circularity detected; evaluation on real data is independent of inputs

full rationale

The paper proposes a sequence analysis method with direct encoding of animal group interactions for classifying behavioral annotations and reports accuracy gains on a real-world dataset against baselines. No equations, fitted parameters presented as predictions, self-citations, or ansatzes are described in the provided text. The central claim is an empirical performance comparison rather than a derivation that reduces to its own inputs by construction, satisfying the criteria for a self-contained result with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the method implicitly rests on the domain assumption that tracking data plus annotations suffice to define interactions.

axioms (1)

Domain assumption: Tracking data plus human behavioral annotations contain sufficient signal to define and classify group interactions.
Required for the encoding step and the accuracy claim to be meaningful.

pith-pipeline@v0.9.0 · 5632 in / 1071 out tokens · 40134 ms · 2026-05-25T11:44:18.534023+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 2 internal anchors

[1] O. Banos, J.-M. Galvez, M. Damas, H. Pomares, and I. Rojas. Window size impact in human activity recognition. Sensors, 14(4):6474–6499, 2014.

[2] I. Brugere, C. Kanich, and T. Y. Berger-Wolf. A general framework for task-oriented network inference. CoRR, abs/1705.00645, 2017.

[3] S. Changpinyo, H. Hu, and F. Sha. Multi-task learning for sequence tagging: An empirical study. arXiv preprint arXiv:1808.04151, 2018.

[4] T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.

[5] M. C. Crofoot, R. Kays, and M. Wikelski. Data from: Shared decision-making drives collective movement in wild baboons, 2015.

[6] J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.

[7] O. D. Lara and M. A. Labrador. A survey on human activity recognition using wearable sensors. IEEE Communications Surveys & Tutorials, 15(3):1192–1209, 2012.

[8] J. Li, K. Asif, H. Wang, B. D. Ziebart, and T. Y. Berger-Wolf. Adversarial sequence tagging. In IJCAI, pages 1690–1696, 2016.

[9] F. Ordóñez and D. Roggen. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors, 16(1):115, 2016.

[10] M. Sokolova, N. Japkowicz, and S. Szpakowicz. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In Australasian Joint Conference on Artificial Intelligence, pages 1015–1021. Springer, 2006.

[11] A. Strandburg-Peshkin, D. R. Farine, I. D. Couzin, and M. C. Crofoot. Shared decision-making drives collective movement in wild baboons. Science, 348(6241):1358–1361, 2015.

[12] Z. Xing, J. Pei, and E. Keogh. A brief survey on sequence classification. ACM SIGKDD Explorations Newsletter, 12(1):40–48, 2010.
