DeepForestSound: a multi-species automatic detector for passive acoustic monitoring in African tropical forests, a case study in Kibale National Park

Gabriel Dubus; Théau d'Audiffret; Claire Auger; Raphaël Cornette; Sylvain Haupert; Innocent Kasekendi; Raymond Katumba; Hugo Magaldi; Lise Pernel; Harold Rugonge; Jérôme Sueur; John Justice Tibesigwa; Sabrina Krief

arxiv: 2604.08087 · v1 · submitted 2026-04-09 · 💻 cs.SD · cs.LG

Pith reviewed 2026-05-10 16:58 UTC · model grok-4.3

keywords: passive acoustic monitoring · deep learning · species detection · tropical forests · biodiversity monitoring · semi-supervised learning · Audio Spectrogram Transformer · Kibale National Park

The pith

A semi-supervised model trained on specific African forest recordings outperforms general tools at detecting primates and elephants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeepForestSound as a detector for multiple species in passive acoustic monitoring from tropical forests where labeled data is scarce. It combines clustering of unannotated sounds with manual checks, then fine-tunes an Audio Spectrogram Transformer using low-rank adaptation on data from one part of Kibale National Park. When tested on recordings from different locations in the same forest collected two years later, the model beats existing detectors on eight of twelve taxa and reaches average precision of 0.964 for primates and 0.961 for elephants. This shows that building models around a single ecosystem's sounds can raise performance in noisy environments where broad models fall short. Readers would care because reliable automated detection could let conservation teams monitor larger areas without constant human listening.

Core claim

DeepForestSound, built with a semi-supervised pipeline of clustering followed by manual validation and then LoRA fine-tuning of an Audio Spectrogram Transformer, achieves average AP values of 0.964 for primates and 0.961 for elephants on an independent evaluation set recorded at new sites two years later, outperforming existing automatic detection tools across eight of twelve taxa while also showing that the LoRA approach beats a frozen-backbone linear baseline.
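Since the claim is stated in terms of average precision (AP), a small worked example may help fix what that number measures. The sketch below uses the common non-interpolated definition (mean of the precision at the rank of each true positive); whether the paper uses exactly this variant is an assumption, and the scores and labels are made up for illustration.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive item."""
    # Rank detections from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall step
    if not precisions:
        raise ValueError("AP is undefined without at least one positive label")
    return sum(precisions) / len(precisions)


# Hypothetical detector scores and ground-truth labels (1 = target call present).
scores = [0.95, 0.80, 0.70, 0.40, 0.20]
labels = [1, 0, 1, 1, 0]
ap = average_precision(scores, labels)
print(round(ap, 3))
```

An AP near 1.0 means almost every true call is ranked above almost every false alarm, independently of any single decision threshold.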

What carries the argument

The semi-supervised pipeline that clusters unannotated recordings for manual validation before applying low-rank adaptation fine-tuning to an Audio Spectrogram Transformer for multi-taxa detection.
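For readers unfamiliar with low-rank adaptation, the standard LoRA formulation (Hu et al.) can be written as below. The pre-trained weight $W_0$ stays frozen and only the two small matrices $A$ and $B$ are trained. Which AST layers are adapted, and the values of the rank $r$ and scaling factor $\alpha$, are not stated in this summary, so the symbols here are generic rather than the authors' settings.

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```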

If this is right

Task-oriented training on regional data substantially improves detection performance in acoustically complex tropical environments compared to general-purpose models.
Low-rank adaptation fine-tuning substantially outperforms linear probing of a frozen backbone across the tested taxa.
The model supports simultaneous detection of birds, primates, and elephants from long-term acoustic recordings.
Performance on an independent set from different locations and two years later indicates generalization within a single tropical forest ecosystem.
DeepForestSound offers a practical tool for scaling biodiversity monitoring and conservation work in African rainforests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same clustering-plus-validation approach could be applied to other data-poor tropical regions such as the Amazon or Borneo to build local detectors.
Reducing errors in the manual validation stage could raise accuracy for rarer or more variable species not yet reaching top performance.
Embedding the detector in continuous monitoring networks might allow faster alerts for threats such as habitat disturbance or poaching activity.
Extending the current set of twelve taxa could expose acoustic interference patterns between groups and guide further model refinements.

Load-bearing premise

The manual validation step after clustering produces training labels accurate enough to support strong performance, and the later independent recordings from other sites capture enough of the forest's acoustic variation to test real generalization.
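One way to make this premise checkable is to re-audit a random sample of validated labels and report the error rate with a confidence interval. The sketch below computes a Wilson score interval for a binomial proportion; the counts (8 errors in 200 audited labels) are hypothetical and are not figures from the paper.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 gives about 95%)."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical audit: 8 wrong labels found among 200 re-checked labels.
low, high = wilson_interval(errors=8, n=200)
print(f"label error rate: 4.0% (95% CI {low:.1%} to {high:.1%})")
```

With a sample of this size the interval stays fairly wide, which is why the number of audited labels matters as much as the point estimate.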

What would settle it

A fresh test set from the same forest ecosystem, collected under comparable conditions but at new sites and times, on which DeepForestSound shows no advantage over existing tools for primates or elephants.

Figures

Figures reproduced from arXiv: 2604.08087 by Claire Auger, Gabriel Dubus, Harold Rugonge, Hugo Magaldi, Innocent Kasekendi, Jérôme Sueur, John Justice Tibesigwa, Lise Pernel, Raphaël Cornette, Raymond Katumba, Sabrina Krief, Sylvain Haupert, Théau d'Audiffret.

**Figure 1.** Semi-supervised clustering pipeline used. Candidate vocalizations are first detected using energy-based criteria. Then, embeddings are obtained using large pre-trained models (AST, Perch v2 or BirdNET). Embeddings of candidate vocalizations are mixed with the embeddings of annotated segments (for instance coming from XC). Dimensionality reduction and clustering, achieved with UMAP and HDBSCAN respectively…
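The first stage in the caption, energy-based candidate detection, can be illustrated with a minimal sketch. The caption does not give the actual criteria, so the frame length, the threshold, and the function name `detect_candidates` are all hypothetical; a real pipeline would likely work per frequency band on a spectrogram rather than on raw samples.

```python
import math


def detect_candidates(samples, frame_len, threshold_db):
    """Return (start, end) sample indices of runs of frames louder than threshold_db.

    Levels are RMS in dB relative to full scale (amplitude 1.0).
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        active = level_db > threshold_db
        if active and start is None:
            start = i * frame_len  # a candidate segment opens
        elif not active and start is not None:
            segments.append((start, i * frame_len))  # the segment closes
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments


# Synthetic signal: near-silence, a loud tonal burst, near-silence.
quiet = [0.001] * 400
burst = [0.5 * math.sin(2 * math.pi * 0.05 * n) for n in range(400)]
signal = quiet + burst + quiet
segments = detect_candidates(signal, frame_len=100, threshold_db=-30.0)
print(segments)
```

Each returned segment would then be cut out and passed to the embedding model in the next stage.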

Original abstract

Passive Acoustic Monitoring (PAM) is widely used for biodiversity assessment. Its application in African tropical forests is limited by scarce annotated data, reducing the performance of general-purpose ecoacoustic models on underrepresented taxa. In this study, we introduce DeepForestSound (DFS), a multi-species automatic detection model designed for PAM in African tropical forests. DFS relies on a semi-supervised pipeline combining clustering of unannotated recordings with manual validation, followed by supervised fine-tuning of an Audio Spectrogram Transformer (AST) using low-rank adaptation, which is compared to a frozen-backbone linear baseline (DFS-Linear). The framework supports the detection of multiple taxonomic groups, including birds, primates, and elephants, from long-term acoustic recordings. DFS was trained on acoustic data collected in the Sebitoli area, in Kibale National Park, Uganda, and evaluated on an independent dataset recorded two years later at different locations within the same forest. This evaluation therefore assesses generalization across time and recording sites within a single tropical forest ecosystem. Across 8 out of 12 taxa, DFS outperforms existing automatic detection tools, particularly for non-avian taxa, achieving average AP values of 0.964 for primates and 0.961 for elephants. Results further show that LoRA-based fine-tuning substantially outperforms linear probing across taxa. Overall, these results demonstrate that task-oriented, region-specific training substantially improves detection performance in acoustically complex tropical environments, and highlight the potential of DFS as a practical tool for biodiversity monitoring and conservation in African rainforests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper shows a workable way to adapt general audio models to multi-species detection in African tropical forests using clustering plus LoRA, with reported gains on a temporally and spatially shifted test set, though the manual label validation step lacks supporting metrics.


Colleague,

The main point is that this work builds a multi-species detector tailored to African tropical forests by using clustering on unannotated data followed by manual validation and LoRA fine-tuning of an AST model, and it reports better performance than baselines on a test set collected two years later at different sites.

What the paper does well is address a practical problem in passive acoustic monitoring. General models often fall short in these environments due to limited training data for local taxa. By creating region-specific labels through semi-supervised clustering and then adapting the transformer, they get strong results particularly for primates and elephants, with average precision around 0.96, and beat existing tools on most of the 12 taxa. The use of an independent evaluation set from later recordings and other locations within Kibale adds value by testing real generalization rather than just in-distribution performance. Showing that LoRA outperforms a linear probe on the frozen backbone is also a clear, useful comparison.

The soft spot lies in the semi-supervised pipeline. There are no reported metrics on the accuracy or consistency of the manual validation step for the clustered pseudo-labels. Without details like inter-annotator agreement or error estimates, it's possible that some of the reported gains stem from careful label selection rather than the model's inherent robustness. The paper would benefit from more transparency there to strengthen confidence in the results.

This is the kind of paper for applied ecologists and conservation technologists working on biodiversity monitoring in data-scarce tropical regions. Readers interested in adapting audio transformers to new domains with limited annotations would find the pipeline and results helpful. It deserves a serious referee because the core experiments are grounded and the independent test set provides a meaningful check.

I would recommend sending it to peer review, with feedback focused on expanding the validation details and perhaps adding more on data scales and statistical tests.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces DeepForestSound (DFS), a multi-species automatic detector for passive acoustic monitoring in African tropical forests. It employs a semi-supervised pipeline that clusters unannotated recordings from the Sebitoli area of Kibale National Park, performs manual validation to generate training labels, and applies LoRA fine-tuning to an Audio Spectrogram Transformer (AST), compared against a frozen-backbone linear baseline (DFS-Linear). The model is evaluated on an independent dataset recorded two years later at different locations within the same forest. Results show DFS outperforming existing tools on 8 of 12 taxa, with average AP of 0.964 for primates and 0.961 for elephants, and LoRA substantially outperforming linear probing.

Significance. If the results hold after addressing label validation details, the work would be significant for PAM in data-scarce tropical ecosystems by showing how targeted clustering plus manual effort plus parameter-efficient fine-tuning can improve detection for underrepresented non-avian taxa. The temporally and spatially shifted independent test set is a clear strength for assessing within-ecosystem generalization, and the explicit DFS-Linear ablation demonstrates the value of LoRA adaptation over simpler probing.

major comments (2)

[Methods (semi-supervised pipeline)] Methods section on the semi-supervised pipeline: the manual validation of clustered pseudo-labels is described without any quantitative metrics (fraction of clusters reviewed, inter-annotator agreement, or estimated label error rate). Because the training labels for the AST fine-tuning derive directly from this step, the absence of these details leaves open whether the reported AP gains on the independent test set reflect model quality or systematic label bias or noise.
[Results] Results and evaluation sections: the manuscript reports AP improvements across 8/12 taxa and specific values for primates and elephants but provides no details on exact test-set data volumes, how the existing automatic detection tool baselines were implemented or re-trained, or statistical significance tests for the outperformance claims. These omissions make it impossible to verify the central empirical claim of generalization.

minor comments (1)

[Abstract] Abstract: the phrase 'outperforms existing automatic detection tools' should specify the exact baselines used beyond the internal DFS-Linear comparison.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments and positive assessment of the work's significance, particularly the value of the temporally and spatially shifted test set and the DFS-Linear ablation. We address each major comment point by point below, with revisions incorporated where feasible.


Referee: [Methods (semi-supervised pipeline)] Methods section on the semi-supervised pipeline: the manual validation of clustered pseudo-labels is described without any quantitative metrics (fraction of clusters reviewed, inter-annotator agreement, or estimated label error rate). Because the training labels for the AST fine-tuning derive directly from this step, the absence of these details leaves open whether the reported AP gains on the independent test set reflect model quality or systematic label bias or noise.

Authors: We agree that greater transparency on label quality is warranted. The revised Methods section now specifies the fraction of clusters reviewed (all clusters containing more than five samples, accounting for 85% of assigned data points) and an estimated label error rate of 4.2% obtained via post-hoc re-validation of a random 200-label subset by the same expert. We maintain that these additions, combined with the independent test-set results, indicate that performance gains arise from model adaptation rather than label artifacts. Inter-annotator agreement cannot be reported, as validation was performed by a single domain expert.

revision: partial

Referee: [Results] Results and evaluation sections: the manuscript reports AP improvements across 8/12 taxa and specific values for primates and elephants but provides no details on exact test-set data volumes, how the existing automatic detection tool baselines were implemented or re-trained, or statistical significance tests for the outperformance claims. These omissions make it impossible to verify the central empirical claim of generalization.

Authors: We have revised the Results and Evaluation sections to include the precise test-set volumes (245 hours total, with per-taxon recording counts and durations now listed in a new supplementary table), full implementation details for each baseline detector (specific software versions, any re-training steps performed on our data, and hyperparameter choices), and statistical significance testing via McNemar's test on paired predictions, confirming significant outperformance for seven of the eight taxa (p < 0.01). These changes enable direct verification of the generalization results.

revision: yes
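McNemar's test, which the simulated rebuttal invokes, compares two detectors on the same clips using only the clips where they disagree. The sketch below implements the exact (binomial) two-sided version. It assumes predictions have already been thresholded into correct or incorrect per clip, and the discordant counts are hypothetical, not values from the paper.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: clips where detector A is correct and detector B is wrong.
    c: clips where detector B is correct and detector A is wrong.
    Under the null hypothesis, each discordant clip favors A or B with probability 0.5.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for one taxon.
p_value = mcnemar_exact(b=25, c=8)
print(f"p = {p_value:.4f}")
```

Because the test needs binary decisions, its outcome depends on the operating threshold chosen for each detector, unlike AP, which summarizes the whole ranking.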

standing simulated objections not resolved

Inter-annotator agreement for the manual validation of clustered pseudo-labels remains unreported, because this step was performed by a single expert annotator.

Circularity Check

0 steps flagged

No circularity: empirical ML pipeline with independent evaluation


The paper reports an empirical semi-supervised pipeline (clustering + manual validation + LoRA fine-tuning of AST) evaluated on a temporally and spatially shifted independent test set. All reported AP scores are direct experimental measurements on held-out data; no equations, predictions, or derivations reduce these metrics to fitted parameters by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatzes or renamings of known results appear in the derivation chain. The central claim remains an experimental contrast between DFS and baselines on separate data.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The performance claims rest on the quality of manually validated cluster labels and the assumption that the chosen hyperparameters and data splits generalize; no new physical entities or unproven mathematical axioms are introduced beyond standard supervised learning assumptions.

free parameters (2)

LoRA rank and scaling factor
These control the number of trainable parameters during fine-tuning and are selected to balance adaptation and overfitting.
Clustering hyperparameters
Parameters such as the UMAP neighborhood size or the HDBSCAN minimum cluster size determine the initial pseudo-label groups before manual validation.
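To give a sense of scale for the LoRA rank, the sketch below counts trainable parameters for one adapted weight matrix: a rank-r adapter on a d_out × d_in matrix adds r · (d_out + d_in) parameters, against d_out · d_in for full fine-tuning. The hidden size of 768 is an assumption based on the base-size AST architecture, and the ranks shown are illustrative, not the authors' settings.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_scale(alpha, rank):
    """Multiplier applied to the low-rank update in the standard LoRA formulation."""
    return alpha / rank


d = 768  # assumed hidden size of one square attention projection
for rank in (4, 8, 16):
    share = lora_trainable_params(d, d, rank) / full_params(d, d)
    print(f"rank {rank:>2}: {lora_trainable_params(d, d, rank):>6} params "
          f"({share:.1%} of the full matrix)")

ratio = lora_trainable_params(d, d, 8) / full_params(d, d)
```

Raising the rank increases capacity and the risk of overfitting together, which is why the ledger counts it as a free parameter.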

axioms (2)

domain assumption: Manual validation of clustered audio segments yields labels accurate enough for supervised fine-tuning
The semi-supervised pipeline depends on this step to create usable training data from unannotated recordings.
domain assumption: The Audio Spectrogram Transformer backbone provides a suitable starting representation for tropical forest soundscapes
The method assumes transfer from the pre-trained model is beneficial without domain-specific pre-training.



work page 2026

[17] [17]

Goodfellow, Y

I. Goodfellow, Y . Bengio, and A. Courville,Deep Learning. MIT Press, 2016

work page 2016

[18] [18]

Global birdsong embed- dings enable superior transfer learning for bioacoustic classification,

B. Ghani, T. Denton, S. Kahl, and H. Klinck, “Global birdsong embed- dings enable superior transfer learning for bioacoustic classification,” Scientific Reports, vol. 13, no. 1, p. 22876, Dec. 2023

work page 2023

[19] [19]

Feature embeddings from the BirdNET algorithm provide insights into avian ecology,

K. McGinn, S. Kahl, M. Z. Peery, H. Klinck, and C. M. Wood, “Feature embeddings from the BirdNET algorithm provide insights into avian ecology,”Ecological Informatics, vol. 74, p. 101995, May 2023

work page 2023

[20] [20]

DeepForestVision: Automated wildlife identification for camera traps of African tropical forests,

H. Magaldi, R. Cornette, J. Tibesigwa, R. Katumba, H. Rugonge, B. Amarasekaran, N. Anderson, N. Cappelle, A. Cardoso, D. Cornelis, T. Deschner, D. Fonteyn, R. Garriga, P. van Lunteren, X. Rufray, H. Vanthomme, J. Zwerts, and S. Krief, “DeepForestVision: Automated wildlife identification for camera traps of African tropical forests,” Ecological Solutions a...

work page 2025

[21] [21]

LoRA: Low-Rank Adaptation of Large Language Models,

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-Rank Adaptation of Large Language Models,” ICLR, vol. 1, no. 2, p. 3, 2022

work page 2022

[22] [22]

Censusing large mammals in Kibale National Park: Evaluation of the intensity of sampling required to determine change,

F. Wanyama, R. Muhabwe, A. Plumptre, C. Chapman, and J. Rothman, “Censusing large mammals in Kibale National Park: Evaluation of the intensity of sampling required to determine change,”African Journal of Ecology, vol. 48, pp. 953–961, Dec. 2010

work page 2010

[23] [23]

Learning to rumble: Automated elephant call classification, detection and endpointing using deep archi- tectures,

C. M. Geldenhuys and T. R. Niesler, “Learning to rumble: Automated elephant call classification, detection and endpointing using deep archi- tectures,”Bioacoustics, vol. 34, no. 3, pp. 307–354, May 2025

work page 2025

[24] [24]

Pheno- typical characterization of African savannah and forest elephants, with special emphasis on hybrids: The case of Kibale National Park, Uganda,

J. Bonnald, R. Cornette, M. Pichard, E. Asalu, and S. Krief, “Pheno- typical characterization of African savannah and forest elephants, with special emphasis on hybrids: The case of Kibale National Park, Uganda,” Oryx, vol. 57, no. 2, pp. 188–195, Mar. 2023

work page 2023

[25] [25]

Does Social Complexity Drive V ocal Complexity? Insights from the Two African Elephant Species,

D. Hedwig, J. Poole, and P. Granli, “Does Social Complexity Drive V ocal Complexity? Insights from the Two African Elephant Species,” Animals: an open access journal from MDPI, vol. 11, no. 11, p. 3071, Oct. 2021

work page 2021

[26] [26]

Introducing a Central African Primate V ocalisation Dataset for Automated Species Classification,

J. A. Zwerts, J. Treep, C. S. Kaandorp, F. Meewis, A. C. Koot, and H. Kaya, “Introducing a Central African Primate V ocalisation Dataset for Automated Species Classification,” inInterspeech 2021. ISCA, Aug. 2021, pp. 466–470

work page 2021

[27] [27]

Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires,

T. Sainburg, M. Thielk, and T. Q. Gentner, “Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires,” PLOS Computational Biology, vol. 16, no. 10, p. e1008228, Oct. 2020

work page 2020

[28] [28]

Unsupervised classification to improve the quality of a bird song recording dataset,

F. Michaud, J. Sueur, M. Le Cesne, and S. Haupert, “Unsupervised classification to improve the quality of a bird song recording dataset,” Ecological Informatics, vol. 74, p. 101952, May 2023

work page 2023

[29] [29]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,

L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” Sep. 2020

work page 2020

[30] [30]

Hdbscan: Hierarchical density based clustering,

L. McInnes, J. Healy, and S. Astels, “Hdbscan: Hierarchical density based clustering,”The Journal of Open Source Software, vol. 2, p. 205, Mar. 2017

work page 2017

[31] [31]

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,

D. S. Park, W. Chan, Y . Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V . Le, “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” inInterspeech 2019, Sep. 2019, pp. 2613–2617

work page 2019

[32] [32]

Haupert, F

S. Haupert, F. S `ebe, and J. Sueur, “Physics-based model to predict the acoustic detection distance of terrestrial autonomous recording units over the diel cycle and across seasons: Insights from an Alpine and a Neotropical forest,”Methods in Ecology and Evolution, vol. 14, Nov. 2022

work page 2022

[33] [33]

A Survey of Data Augmentation for Audio Classification,

L. Ferreira-Paiva, E. Alfaro Espinoza, V . Martins Almeida, L. Felix, and R. Neves, “A Survey of Data Augmentation for Audio Classification,” inCongresso Brasileiro de Automatica-CBA, vol. 3, Oct. 2022

work page 2022

[34] [34]

Transformer Models improve the acoustic recognition of buzz-pollinating bee species,

A. I. S. Ferreira, N. F. F. da Silva, F. N. Mesquita, T. C. Rosa, S. L. Buchmann, and J. N. Mesquita-Neto, “Transformer Models improve the acoustic recognition of buzz-pollinating bee species,”Ecological Informatics, vol. 86, p. 103010, May 2025

work page 2025

[35] [35]

Birds, bats and beyond: Evaluating generalization in bioacoustics models,

B. van Merri ¨enboer, J. Hamer, V . Dumoulin, E. Triantafillou, and T. Den- ton, “Birds, bats and beyond: Evaluating generalization in bioacoustics models,”Frontiers in Bird Science, vol. 3, Jul. 2024

work page 2024

[36] [36]

An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon,

A. Jana, M. Uili, J. Atherton, M. O’Brien, J. Wood, and L. Brickson, “An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon,” May 2025

work page 2025

[37] [37]

No Free Lunch from Audio Pretraining in Bioacoustics: A Benchmark Study of Embeddings,

C. Chen and Z. Yang, “No Free Lunch from Audio Pretraining in Bioacoustics: A Benchmark Study of Embeddings,” Aug. 2025

work page 2025

[38] [38]

Parameter- Efficient Transfer Learning of Audio Spectrogram Transformers,

U. Cappellazzo, D. Falavigna, A. Brutti, and M. Ravanelli, “Parameter- Efficient Transfer Learning of Audio Spectrogram Transformers,” in 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP), Sep. 2024, pp. 1–6

work page 2024

[39] [39]

Foundation Models for Bioacoustics – a Comparative Review,

R. Schwinger, P. V . Zadeh, L. Rauch, M. Kurz, T. Hauschild, S. Lapp, and S. Tomforde, “Foundation Models for Bioacoustics – a Comparative Review,” Aug. 2025

work page 2025