Cross-Modal Corroboration for Annotation-Free Wildlife Monitoring

Bharath Pillai; Christopher Stewart; Jenna Kline; Tanya Berger-Wolf; Varun Viswapriyan

arxiv: 2606.21613 · v1 · pith:JGFZARP6new · submitted 2026-06-19 · 💻 cs.CV · cs.AI


Pith reviewed 2026-06-26 14:36 UTC · model grok-4.3


keywords wildlife monitoring · cross-modal validation · annotation-free · camera traps · acoustic detection · activity patterns · deer behavior · zero-shot detection


The pith

Visual and acoustic sensors each recover matching hourly activity curves for Milu deer that align with published behavioral priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a vision pipeline using zero-shot detection and an acoustic pipeline using a fine-tuned classifier can each generate daily activity patterns from the same site. These two independent curves converge with each other and with existing literature on the species' behavior. The three-way match is presented as evidence that the patterns are not artifacts of shared training data or internal dataset correlations. A reader would care because the method offers a way to validate automated monitoring systems when labeled examples are scarce.

Core claim

Both the vision pipeline (zero-shot species detection via BioCLIP 2 with sliced inference and geometry-based localization) and the acoustic pipeline (fine-tuned vocalization classifier) independently recover activity patterns for a breeding herd of Milu deer that are consistent with known behavioral ecology, using minimal manual annotation; the three-way convergence of the two derived hourly curves with published priors rules out shared-data confounds.

What carries the argument

Three-way convergence of hourly activity curves derived independently from vision, acoustics, and published behavioral priors.
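As a rough illustration of what an "hourly activity curve" could look like in practice, the sketch below bins detection timestamps by hour of day and normalizes the counts. The function names and the sample timestamps are hypothetical; the paper's actual binning and normalization choices are not stated in the material summarized here.

```python
from collections import Counter
from datetime import datetime


def hourly_activity_curve(timestamps):
    """Return a 24-element list: fraction of detections falling in each hour of day."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts.get(h, 0) / total for h in range(24)]


def peak_hours(curve, k=3):
    """Hours with the k largest activity fractions, in chronological order."""
    ranked = sorted(range(24), key=lambda h: curve[h], reverse=True)
    return sorted(ranked[:k])


# Hypothetical detections, invented for illustration only (not data from the paper).
vision = [datetime(2025, 1, 1, h, 5) for h in (6, 6, 7, 18, 18, 19, 12)]
audio = [datetime(2025, 1, 1, h, 40) for h in (6, 7, 7, 18, 19, 19, 2)]

vision_curve = hourly_activity_curve(vision)
audio_curve = hourly_activity_curve(audio)
```

Each modality yields its own 24-bin curve from its own detections, so the two curves can be compared with each other and with a curve transcribed from the literature without either pipeline seeing the other's output.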

If this is right

The approach applies to any species detectable in both modalities when behavioral priors are documented in the literature.
Zero-shot visual detection plus geometry-based localization supports deployment under constrained camera positioning.
Fine-tuned acoustic classifiers can serve as an independent check on visual activity estimates.
The framework reduces the need for large-scale manual annotation to validate monitoring pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the same convergence appears across multiple sites and species, monitoring networks could self-calibrate at conservation scale.
Persistent mismatch between one modality and the priors could be used to diagnose failures in the zero-shot detector or the acoustic classifier.
The method might generalize to additional sensor types such as thermal or satellite imagery provided behavioral priors exist.

Load-bearing premise

Published behavioral priors are independent of the dataset and agreement across the three sources is enough to rule out systematic detector errors.

What would settle it

A new dataset where the visual and acoustic curves match each other but both deviate from the published behavioral priors for the same species.

Figures

Figures reproduced from arXiv: 2606.21613 by Bharath Pillai, Christopher Stewart, Jenna Kline, Tanya Berger-Wolf, Varun Viswapriyan.

**Figure 2.** GPS projection model applied to real detections from ...

**Figure 3.** Mean diurnal activity pattern of Milu deer over the deployment, captured by camera traps and acoustic recordings. Camera ...

**Figure 4.** Per-day camera-trap diurnal activity over the deployment period. Each panel shows hourly detection counts for a single sampling ...

Abstract

Scaling wildlife monitoring for real-world conservation deployments requires automated analysis of smart sensors that operate under severe annotation scarcity. We propose leveraging expert knowledge of species activity patterns as an annotation-free validation signal for multimodal monitoring pipelines. We operationalize agreement as the alignment of independently derived hourly activity curves both with each other and with published behavioral priors: a three-way convergence that rules out shared-data confounds and dataset-internal correlation as alternative explanations. Our vision pipeline combines zero-shot species detection via BioCLIP 2, sliced inference to handle deployment-constrained camera positioning, and geometry-based geographic localization from camera trap imagery. Our acoustic pipeline detects species vocalizations via a fine-tuned classifier. We validate the pipeline on a breeding herd of Milu deer and demonstrate that both modalities independently recover activity patterns consistent with known deer behavioral ecology with minimal manual annotation. The framework applies to species detectable in both visual and acoustic modalities for which behavioral priors are documented in the literature, suggesting a practical path toward self-validating wildlife-monitoring pipelines at conservation scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Three-way convergence for annotation-free validation is a novel framing, but the abstract lacks any quantitative agreement metrics or bias checks.


The key takeaway is that this work introduces a three-way corroboration method as an annotation-free validation signal for wildlife monitoring: it matches visual activity curves from BioCLIP 2, acoustic curves from a fine-tuned classifier, and published behavioral priors. That operationalization appears new compared to prior multimodal work.

The paper applies this to Milu deer data from camera traps and audio sensors. The vision side uses zero-shot detection with sliced inference for positioning constraints and geometry for localization. The acoustic side detects vocalizations after fine-tuning. They report that both recover patterns consistent with known deer ecology using minimal manual annotation. This is a practical attempt to address annotation scarcity in real conservation deployments.

What stands out is the effort to make the validation self-contained by leveraging external priors and cross-modal independence to reduce the chance of dataset-specific artifacts. It builds directly on tools like BioCLIP without claiming to invent new models, which keeps the focus on the validation strategy.

The soft spots are in the evidence presented. The abstract states that the patterns are consistent but gives no quantitative measure of agreement, such as correlation values or overlap statistics, and no details on how alignment was assessed or whether any post-processing choices were involved. This makes the central claim hard to evaluate. Additionally, the possibility that both pipelines share inference biases (for instance, better performance during daylight hours or peak activity times that coincidentally align with the priors) is not addressed with robustness tests. The cross-modal design guards against shared training data but not necessarily against correlated errors at inference time.
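To make "overlap statistics" concrete, here is a minimal sketch of one candidate measure: a discretized overlap coefficient, computed as the sum of bin-wise minima of two normalized curves. This is an assumption about what such a statistic might look like, not something the paper reports; the function names are hypothetical.

```python
def normalize(curve):
    """Scale a non-negative curve so its bins sum to 1."""
    total = sum(curve)
    if total <= 0:
        raise ValueError("curve has no activity to normalize")
    return [v / total for v in curve]


def overlap_coefficient(curve_a, curve_b):
    """Sum of bin-wise minima of two normalized curves.

    1.0 means identical shapes; 0.0 means activity in entirely different bins.
    """
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must have the same number of bins")
    p, q = normalize(curve_a), normalize(curve_b)
    return sum(min(a, b) for a, b in zip(p, q))
```

The statistic ignores overall detection volume and compares only the shape of the two curves, which suits a setting where the camera and the microphone produce very different raw counts.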

This paper would interest researchers in ecological AI and conservation technology who deal with sensor networks. A reader looking for ideas on reducing annotation needs could extract the pipeline design and the conceptual framing, even if the results section needs strengthening.

It deserves peer review because the problem is relevant and the approach is grounded enough to warrant detailed feedback on the methods and any unreported metrics.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an annotation-free validation framework for multimodal wildlife monitoring by deriving independent hourly activity curves from a zero-shot vision pipeline (BioCLIP 2 with sliced inference and geometry-based localization) and a fine-tuned acoustic classifier, then demonstrating their mutual alignment and consistency with published behavioral priors for Milu deer as a three-way convergence that rules out shared-data confounds.

Significance. If the alignment can be shown to be quantitatively robust, the approach offers a scalable path for self-validating sensor pipelines in conservation settings where annotation is scarce, by treating literature priors as an external validation signal. The cross-modal design and use of zero-shot models are practical strengths for deployment.

major comments (2)

[Abstract and Results section] The central claim that 'both modalities independently recover activity patterns consistent with known deer behavioral ecology' and that three-way convergence validates the pipelines rests on an unquantified notion of 'alignment'; no correlation coefficients, RMSE values, statistical tests, or error bars on the hourly curves are reported, so the strength of evidence for the claim cannot be assessed.
[Methods/Validation discussion] The assertion that cross-modal independence plus agreement with priors 'rules out shared-data confounds and dataset-internal correlation' does not address possible correlated model biases (e.g., both pipelines exhibiting higher detection rates during daylight or peak vocalization windows that happen to match Milu deer priors). A concrete robustness test or sensitivity analysis against such inference-time confounds is required for the validation argument to hold.
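The agreement metrics requested in the first major comment could be computed along the following lines. This is a plain-Python sketch under the assumption that each modality has already been reduced to a 24-bin hourly curve; the helper names are hypothetical and nothing here reproduces numbers from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant curve")
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

One caveat when interpreting the result: adjacent hourly bins are not independent observations, so a significance test that treats the 24 bins as independent samples would likely overstate the evidence.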

minor comments (2)

[Methods] Provide explicit details on the acoustic classifier fine-tuning dataset, hyperparameters, and any overlap checks with the camera-trap imagery to strengthen reproducibility.
[Vision pipeline description] Clarify the exact procedure for 'sliced inference' and geometry-based localization, including any assumptions about camera positioning that could affect activity curve derivation.
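For readers unfamiliar with the term, "sliced inference" is commonly understood as running a detector on overlapping tiles of a large frame and mapping the tile-local boxes back to full-frame coordinates. The sketch below shows only that tiling geometry; the tile size, overlap fraction, and function names are assumptions, and the paper's exact procedure is what the referee is asking the authors to spell out.

```python
def slice_boxes(width, height, tile, overlap=0.2):
    """Return (x0, y0, x1, y1) tiles that cover a width x height image.

    `overlap` is the fraction of a tile shared with its neighbor.
    """
    if tile <= 0:
        raise ValueError("tile size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(tile * (1 - overlap)))

    def starts(extent):
        # Tile start offsets along one axis; the last tile sits flush with the edge.
        if extent <= tile:
            return [0]
        positions = list(range(0, extent - tile, stride))
        positions.append(extent - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def to_full_frame(box, tile_origin):
    """Shift a tile-local (x0, y0, x1, y1) box into full-image coordinates."""
    x0, y0, x1, y1 = box
    ox, oy = tile_origin
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)
```

Detections from neighboring tiles will overlap in the shared margins, so a real pipeline would also need a merging step (for example non-maximum suppression) before counting animals per hour; that step is omitted here.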

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments identify key opportunities to strengthen the quantitative support for our claims and to explicitly address potential model biases. We respond to each major comment below and commit to revisions that directly incorporate the suggested improvements.


Referee: [Abstract and Results section] The central claim that 'both modalities independently recover activity patterns consistent with known deer behavioral ecology' and that three-way convergence validates the pipelines rests on an unquantified notion of 'alignment'; no correlation coefficients, RMSE values, statistical tests, or error bars on the hourly curves are reported, so the strength of evidence for the claim cannot be assessed.

Authors: We agree that the absence of quantitative alignment metrics limits the strength of evidence that can be assessed from the current text. In the revised manuscript we will add Pearson correlation coefficients and RMSE between the vision-derived and acoustic-derived hourly activity curves. We will also report bootstrap-derived 95% confidence intervals as error bars on the activity curves and include a statistical comparison (e.g., Kolmogorov-Smirnov test) against the published behavioral priors. These metrics will be presented in both the abstract and results sections. Revision: yes.

Referee: [Methods/Validation discussion] The assertion that cross-modal independence plus agreement with priors 'rules out shared-data confounds and dataset-internal correlation' does not address possible correlated model biases (e.g., both pipelines exhibiting higher detection rates during daylight or peak vocalization windows that happen to match Milu deer priors). A concrete robustness test or sensitivity analysis against such inference-time confounds is required for the validation argument to hold.

Authors: We acknowledge that correlated inference-time biases remain a plausible alternative explanation even with cross-modal independence. In the revision we will add an explicit sensitivity-analysis subsection that (1) varies detection thresholds in both pipelines and recomputes the alignment, (2) introduces controlled temporal shifts to the activity curves to test robustness of the observed convergence, and (3) discusses the distinct training regimes (zero-shot vision versus fine-tuned acoustics) to reduce the plausibility of shared biases. These additions will directly address the concern. Revision: yes.
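The "controlled temporal shifts" check in item (2) could be sketched as follows: rotate one curve around the 24-hour clock and record the agreement score at each offset. If the convergence is genuine, agreement should peak at or near a shift of zero and fall away as the offset grows. The function names and the bin-wise-minimum score used here are assumptions for illustration, not the authors' stated procedure.

```python
def circular_shift(curve, hours):
    """Rotate a curve so that the value at hour h moves to hour (h + hours) mod len."""
    values = list(curve)
    k = hours % len(values)
    return values[-k:] + values[:-k] if k else values


def shape_agreement(curve_a, curve_b):
    """Sum of bin-wise minima after normalizing each curve to sum to 1."""
    total_a, total_b = sum(curve_a), sum(curve_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("curves must contain some activity")
    return sum(min(a / total_a, b / total_b) for a, b in zip(curve_a, curve_b))


def shift_profile(curve_a, curve_b, score=shape_agreement):
    """Agreement between curve_a and curve_b at every circular offset of curve_b."""
    return {
        shift: score(curve_a, circular_shift(curve_b, shift))
        for shift in range(len(curve_b))
    }


def best_shift(profile):
    """Offset with the highest agreement score."""
    return max(profile, key=profile.get)
```

A flat profile would suggest the curves carry little temporal structure to begin with, in which case agreement at zero shift says little about either pipeline.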

Circularity Check

0 steps flagged

No significant circularity; external priors provide independent validation signal


The paper derives hourly activity curves independently from vision (BioCLIP 2 zero-shot) and acoustic (fine-tuned classifier) pipelines on the Milu deer dataset, then checks alignment with each other and with published behavioral priors from the literature. No equations, self-citations, or ansatzes are shown that reduce the reported curves or the three-way convergence claim to a fit or definition taken from the same data. The validation signal is explicitly external expert knowledge, satisfying the self-contained criterion with no load-bearing internal reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the existence of independent published behavioral priors for the target species and on the assumption that zero-shot and fine-tuned detectors produce activity curves whose agreement can be interpreted causally. No free parameters or invented entities are introduced in the abstract.

axioms (2)

Domain assumption: Published behavioral priors for Milu deer are independent of the current camera-trap and acoustic dataset. Invoked when the three-way convergence is said to rule out dataset-internal correlation.
Domain assumption: Zero-shot species detection via BioCLIP 2 and the fine-tuned acoustic classifier produce activity curves whose errors are uncorrelated across modalities. Required for the claim that cross-modal agreement validates the detections.



