pith. sign in

arxiv: 2604.08087 · v1 · submitted 2026-04-09 · 💻 cs.SD · cs.LG

DeepForestSound: a multi-species automatic detector for passive acoustic monitoring in African tropical forests, a case study in Kibale National Park

Pith reviewed 2026-05-10 16:58 UTC · model grok-4.3

classification 💻 cs.SD cs.LG
keywords passive acoustic monitoringdeep learningspecies detectiontropical forestsbiodiversity monitoringsemi-supervised learningAudio Spectrogram TransformerKibale National Park
0
0 comments X

The pith

A semi-supervised model trained on specific African forest recordings outperforms general tools at detecting primates and elephants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeepForestSound as a detector for multiple species in passive acoustic monitoring from tropical forests where labeled data is scarce. It combines clustering of unannotated sounds with manual checks, then fine-tunes an Audio Spectrogram Transformer using low-rank adaptation on data from one part of Kibale National Park. When tested on recordings from different locations in the same forest collected two years later, the model beats existing detectors on eight of twelve taxa and reaches average precision of 0.964 for primates and 0.961 for elephants. This shows that building models around a single ecosystem's sounds can raise performance in noisy environments where broad models fall short. Readers would care because reliable automated detection could let conservation teams monitor larger areas without constant human listening.

Core claim

DeepForestSound, built with a semi-supervised pipeline of clustering followed by manual validation and then LoRA fine-tuning of an Audio Spectrogram Transformer, achieves average AP values of 0.964 for primates and 0.961 for elephants on an independent evaluation set recorded at new sites two years later, outperforming existing automatic detection tools across eight of twelve taxa while also showing that the LoRA approach beats a frozen-backbone linear baseline.

What carries the argument

The semi-supervised pipeline that clusters unannotated recordings for manual validation before applying low-rank adaptation fine-tuning to an Audio Spectrogram Transformer for multi-taxa detection.

If this is right

  • Task-oriented training on regional data substantially improves detection performance in acoustically complex tropical environments compared to general-purpose models.
  • Low-rank adaptation fine-tuning substantially outperforms linear probing of a frozen backbone across the tested taxa.
  • The model supports simultaneous detection of birds, primates, and elephants from long-term acoustic recordings.
  • Performance on an independent set from different locations and two years later indicates generalization within a single tropical forest ecosystem.
  • DeepForestSound offers a practical tool for scaling biodiversity monitoring and conservation work in African rainforests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same clustering-plus-validation approach could be applied to other data-poor tropical regions such as the Amazon or Borneo to build local detectors.
  • Reducing errors in the manual validation stage could raise accuracy for rarer or more variable species not yet reaching top performance.
  • Embedding the detector in continuous monitoring networks might allow faster alerts for threats such as habitat disturbance or poaching activity.
  • Extending the current set of twelve taxa could expose acoustic interference patterns between groups and guide further model refinements.

Load-bearing premise

The manual validation step after clustering produces training labels accurate enough to support strong performance, and the later independent recordings from other sites capture enough of the forest's acoustic variation to test real generalization.

What would settle it

A fresh test set from the same forest ecosystem, collected under comparable conditions but at new sites and times, on which DeepForestSound shows no advantage over existing tools for primates or elephants.

Figures

Figures reproduced from arXiv: 2604.08087 by Claire Auger, Gabriel Dubus, Harold Rugonge, Hugo Magaldi, Innocent Kasekendi, J\'er\^ome Sueur, John Justice Tibesigwa, Lise Pernel, Rapha\"el Cornette, Raymond Katumba, Sabrina Krief, Sylvain Haupert, Th\'eau d'Audiffret.

Figure 1
Figure 1. Figure 1: Semi-supervised clustering pipeline used. Candidate vocaliza￾tions are first detected using energy-based criteria. Then, embeddings are obtained using large pre-trained models (AST, Perch v2 or BirdNet). Embeddings of candidate vocalizations are mixed with the embeddings of annotated segments (for instance coming from XC). Dimensionality reduction and clustering, achieved with UMAP and HDBSCAN respectively… view at source ↗
read the original abstract

Passive Acoustic Monitoring (PAM) is widely used for biodiversity assessment. Its application in African tropical forests is limited by scarce annotated data, reducing the performance of general-purpose ecoacoustic models on underrepresented taxa. In this study, we introduce DeepForestSound (DFS), a multi-species automatic detection model designed for PAM in African tropical forests. DFS relies on a semi-supervised pipeline combining clustering of unannotated recordings with manual validation, followed by supervised fine-tuning of an Audio Spectrogram Transformer (AST) using low-rank adaptation, which is compared to a frozen-backbone linear baseline (DFS-Linear). The framework supports the detection of multiple taxonomic groups, including birds, primates, and elephants, from long-term acoustic recordings. DFS was trained on acoustic data collected in the Sebitoli area, in Kibale National Park, Uganda, and evaluated on an independent dataset recorded two years later at different locations within the same forest. This evaluation therefore assesses generalization across time and recording sites within a single tropical forest ecosystem. Across 8 out of 12 taxons, DFS outperforms existing automatic detection tools, particularly for non-avian taxa, achieving average AP values of 0.964 for primates and 0.961 for elephants. Results further show that LoRA-based fine-tuning substantially outperforms linear probing across taxa. Overall, these results demonstrate that task-oriented, region-specific training substantially improves detection performance in acoustically complex tropical environments, and highlight the potential of DFS as a practical tool for biodiversity monitoring and conservation in African rainforests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces DeepForestSound (DFS), a multi-species automatic detector for passive acoustic monitoring in African tropical forests. It employs a semi-supervised pipeline that clusters unannotated recordings from the Sebitoli area of Kibale National Park, performs manual validation to generate training labels, and applies LoRA fine-tuning to an Audio Spectrogram Transformer (AST), compared against a frozen-backbone linear baseline (DFS-Linear). The model is evaluated on an independent dataset recorded two years later at different locations within the same forest. Results show DFS outperforming existing tools on 8 of 12 taxa, with average AP of 0.964 for primates and 0.961 for elephants, and LoRA substantially outperforming linear probing.

Significance. If the results hold after addressing label validation details, the work would be significant for PAM in data-scarce tropical ecosystems by showing how targeted clustering plus manual effort plus parameter-efficient fine-tuning can improve detection for underrepresented non-avian taxa. The temporally and spatially shifted independent test set is a clear strength for assessing within-ecosystem generalization, and the explicit DFS-Linear ablation demonstrates the value of LoRA adaptation over simpler probing.

major comments (2)
  1. [Methods (semi-supervised pipeline)] Methods section on the semi-supervised pipeline: the manual validation of clustered pseudo-labels is described without any quantitative metrics (fraction of clusters reviewed, inter-annotator agreement, or estimated label error rate). Because the training labels for the AST fine-tuning derive directly from this step, the absence of these details leaves open whether the reported AP gains on the independent test set reflect model quality or systematic label bias or noise.
  2. [Results] Results and evaluation sections: the manuscript reports AP improvements across 8/12 taxa and specific values for primates and elephants but provides no details on exact test-set data volumes, how the existing automatic detection tool baselines were implemented or re-trained, or statistical significance tests for the outperformance claims. These omissions make it impossible to verify the central empirical claim of generalization.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'outperforms existing automatic detection tools' should specify the exact baselines used beyond the internal DFS-Linear comparison.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments and positive assessment of the work's significance, particularly the value of the temporally and spatially shifted test set and the DFS-Linear ablation. We address each major comment point by point below, with revisions incorporated where feasible.

read point-by-point responses
  1. Referee: [Methods (semi-supervised pipeline)] Methods section on the semi-supervised pipeline: the manual validation of clustered pseudo-labels is described without any quantitative metrics (fraction of clusters reviewed, inter-annotator agreement, or estimated label error rate). Because the training labels for the AST fine-tuning derive directly from this step, the absence of these details leaves open whether the reported AP gains on the independent test set reflect model quality or systematic label bias or noise.

    Authors: We agree that greater transparency on label quality is warranted. The revised Methods section now specifies the fraction of clusters reviewed (all clusters containing more than five samples, accounting for 85% of assigned data points) and an estimated label error rate of 4.2% obtained via post-hoc re-validation of a random 200-label subset by the same expert. We maintain that these additions, combined with the independent test-set results, indicate that performance gains arise from model adaptation rather than label artifacts. Inter-annotator agreement cannot be reported, as validation was performed by a single domain expert. revision: partial

  2. Referee: [Results] Results and evaluation sections: the manuscript reports AP improvements across 8/12 taxa and specific values for primates and elephants but provides no details on exact test-set data volumes, how the existing automatic detection tool baselines were implemented or re-trained, or statistical significance tests for the outperformance claims. These omissions make it impossible to verify the central empirical claim of generalization.

    Authors: We have revised the Results and Evaluation sections to include the precise test-set volumes (245 hours total, with per-taxon recording counts and durations now listed in a new supplementary table), full implementation details for each baseline detector (specific software versions, any re-training steps performed on our data, and hyperparameter choices), and statistical significance testing via McNemar's test on paired predictions, confirming significant outperformance for seven of the eight taxa (p < 0.01). These changes enable direct verification of the generalization results. revision: yes

standing simulated objections not resolved
  • Inter-annotator agreement for the manual validation of clustered pseudo-labels, as this step was performed by a single expert annotator.

Circularity Check

0 steps flagged

No circularity: empirical ML pipeline with independent evaluation

full rationale

The paper reports an empirical semi-supervised pipeline (clustering + manual validation + LoRA fine-tuning of AST) evaluated on a temporally and spatially shifted independent test set. All reported AP scores are direct experimental measurements on held-out data; no equations, predictions, or derivations reduce these metrics to fitted parameters by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatzes or renamings of known results appear in the derivation chain. The central claim remains an experimental contrast between DFS and baselines on separate data.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The performance claims rest on the quality of manually validated cluster labels and the assumption that the chosen hyperparameters and data splits generalize; no new physical entities or unproven mathematical axioms are introduced beyond standard supervised learning assumptions.

free parameters (2)
  • LoRA rank and scaling factor
    These control the number of trainable parameters during fine-tuning and are selected to balance adaptation and overfitting.
  • Clustering hyperparameters
    Parameters such as number of clusters or similarity thresholds determine the initial pseudo-label groups before manual validation.
axioms (2)
  • domain assumption Manual validation of clustered audio segments yields labels accurate enough for supervised fine-tuning
    The semi-supervised pipeline depends on this step to create usable training data from unannotated recordings.
  • domain assumption The Audio Spectrogram Transformer backbone provides a suitable starting representation for tropical forest soundscapes
    The method assumes transfer from the pre-trained model is beneficial without domain-specific pre-training.

pith-pipeline@v0.9.0 · 5646 in / 1589 out tokens · 81798 ms · 2026-05-10T16:58:23.459059+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

  1. [1]

    The global human impact on biodiversity,

    F. Keck, T. Peller, R. Alther, C. Barouillet, R. Blackman, E. Capo, T. Chonova, M. Couton, L. Fehlinger, D. Kirschner, M. Kn ¨usel, L. Muneret, R. Oester, K. Tapolczai, H. Zhang, and F. Altermatt, “The global human impact on biodiversity,”Nature, vol. 641, no. 8062, pp. 395–400, May 2025

  2. [2]

    Methods for wildlife monitoring in tropical forests: Comparing human observations, camera traps, and passive acoustic sensors,

    J. A. Zwerts, P. J. Stephenson, F. Maisels, M. Rowcliffe, C. Astaras, P. A. Jansen, J. van der Waarde, L. E. H. M. Sterck, P. A. Verweij, T. Bruce, S. Brittain, and M. van Kuijk, “Methods for wildlife monitoring in tropical forests: Comparing human observations, camera traps, and passive acoustic sensors,”Conservation Science and Practice, vol. 3, no. 12,...

  3. [3]

    Terrestrial Passive Acoustic Monitoring: Review and Perspectives,

    L. Sugai, T. Silva, J. Ribeiro Jr, and D. Llusia, “Terrestrial Passive Acoustic Monitoring: Review and Perspectives,”BioScience, vol. 69, Nov. 2018

  4. [4]

    Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring,

    R. Bardeli, D. Wolff, F. Kurth, M. Koch, K.-H. Tauchert, and K.-H. Frommolt, “Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring,”Pattern Recognition Letters, vol. 31, pp. 1524–1534, Sep. 2010

  5. [5]

    Applying machine learning to primate bioacoustics: Review and perspectives,

    J. Cauzinille, B. Favre, R. Marxer, and A. Rey, “Applying machine learning to primate bioacoustics: Review and perspectives,”American Journal of Primatology, vol. 86, no. 10, p. e23666, 2024

  6. [6]

    Acoustic monitoring for conservation in tropical forests: Examples from forest elephants,

    P. H. Wrege, E. D. Rowland, S. Keen, and Y . Shiu, “Acoustic monitoring for conservation in tropical forests: Examples from forest elephants,” Methods in Ecology and Evolution, vol. 8, no. 10, pp. 1292–1301, 2017

  7. [7]

    Ecoacoustics: The Ecological Investigation and Interpretation of Environmental Sound,

    J. Sueur and A. Farina, “Ecoacoustics: The Ecological Investigation and Interpretation of Environmental Sound,”Biosemiotics, vol. 8, no. 3, pp. 493–502, Dec. 2015

  8. [8]

    Computational bioacoustics with deep learning: A review and roadmap,

    D. Stowell, “Computational bioacoustics with deep learning: A review and roadmap,” Dec. 2021

  9. [9]

    Deep neural networks for automated detection of marine mammal species,

    Y . Shiu, K. J. Palmer, M. A. Roch, E. Fleishman, X. Liu, E.-M. Nosal, T. Helble, D. Cholewiak, D. Gillespie, and H. Klinck, “Deep neural networks for automated detection of marine mammal species,”Scientific Reports, vol. 10, no. 1, p. 607, Jan. 2020

  10. [10]

    AST: Audio Spectrogram Trans- former,

    Y . Gong, Y .-A. Chung, and J. Glass, “AST: Audio Spectrogram Trans- former,” inInterspeech 2021, Aug. 2021, p. 575

  11. [11]

    BirdNET: A deep learning solution for avian diversity monitoring,

    S. Kahl, C. M. Wood, M. Eibl, and H. Klinck, “BirdNET: A deep learning solution for avian diversity monitoring,”Ecological Informatics, vol. 61, p. 101236, Mar. 2021

  12. [12]

    Perch 2.0: The Bittern Lesson for Bioacoustics,

    B. van Merri ¨enboer, V . Dumoulin, J. Hamer, L. Harrell, A. Burns, and T. Denton, “Perch 2.0: The Bittern Lesson for Bioacoustics,” Jan. 2026

  13. [13]

    The iNaturalist Sounds Dataset,

    M. Chasmai, A. Shepard, S. Maji, and G. V . Horn, “The iNaturalist Sounds Dataset,” May 2025

  14. [14]

    Audio Set: An ontology and human-labeled dataset for audio events,

    J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, “Audio Set: An ontology and human-labeled dataset for audio events,” in2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2017, pp. 776–780

  15. [15]

    Acoustic detection of a nocturnal bird with deep learning: The challenge of low signal-to-noise ratio,

    F. Michaud, J. Sueur, F. S `ebe, M. Le Cesne, and S. Haupert, “Acoustic detection of a nocturnal bird with deep learning: The challenge of low signal-to-noise ratio,”Ecological Indicators, vol. 181, p. 114475, Dec. 2025

  16. [16]

    A global assessment of BirdNET performance: Differences among continents, biomes, and species,

    D. Funosas, E. Sebasti ´an-Gonz´alez, J. morant etxebarria, O. Mar ´ın G´omez, I. Mendoza, M. Mohedano-Mu ˜noz, E. Santamar ´ıa, G. Bastianelli, A. M ´arquez-Rodr´ıguez, M. Budka, G. Bota, C. Alonso- Moya, J. Pe ˜na-Rubio, E. Garc´ıa de la Morena, M. Santa-Cruz, P. Nava, M. Fern ´andez Tiz ´on, H. Mateos, A. Diego, and C. P ´erez Granados, “A global asses...

  17. [17]

    Goodfellow, Y

    I. Goodfellow, Y . Bengio, and A. Courville,Deep Learning. MIT Press, 2016

  18. [18]

    Global birdsong embed- dings enable superior transfer learning for bioacoustic classification,

    B. Ghani, T. Denton, S. Kahl, and H. Klinck, “Global birdsong embed- dings enable superior transfer learning for bioacoustic classification,” Scientific Reports, vol. 13, no. 1, p. 22876, Dec. 2023

  19. [19]

    Feature embeddings from the BirdNET algorithm provide insights into avian ecology,

    K. McGinn, S. Kahl, M. Z. Peery, H. Klinck, and C. M. Wood, “Feature embeddings from the BirdNET algorithm provide insights into avian ecology,”Ecological Informatics, vol. 74, p. 101995, May 2023

  20. [20]

    DeepForestVision: Automated wildlife identification for camera traps of African tropical forests,

    H. Magaldi, R. Cornette, J. Tibesigwa, R. Katumba, H. Rugonge, B. Amarasekaran, N. Anderson, N. Cappelle, A. Cardoso, D. Cornelis, T. Deschner, D. Fonteyn, R. Garriga, P. van Lunteren, X. Rufray, H. Vanthomme, J. Zwerts, and S. Krief, “DeepForestVision: Automated wildlife identification for camera traps of African tropical forests,” Ecological Solutions a...

  21. [21]

    LoRA: Low-Rank Adaptation of Large Language Models,

    E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-Rank Adaptation of Large Language Models,” ICLR, vol. 1, no. 2, p. 3, 2022

  22. [22]

    Censusing large mammals in Kibale National Park: Evaluation of the intensity of sampling required to determine change,

    F. Wanyama, R. Muhabwe, A. Plumptre, C. Chapman, and J. Rothman, “Censusing large mammals in Kibale National Park: Evaluation of the intensity of sampling required to determine change,”African Journal of Ecology, vol. 48, pp. 953–961, Dec. 2010

  23. [23]

    Learning to rumble: Automated elephant call classification, detection and endpointing using deep archi- tectures,

    C. M. Geldenhuys and T. R. Niesler, “Learning to rumble: Automated elephant call classification, detection and endpointing using deep archi- tectures,”Bioacoustics, vol. 34, no. 3, pp. 307–354, May 2025

  24. [24]

    Pheno- typical characterization of African savannah and forest elephants, with special emphasis on hybrids: The case of Kibale National Park, Uganda,

    J. Bonnald, R. Cornette, M. Pichard, E. Asalu, and S. Krief, “Pheno- typical characterization of African savannah and forest elephants, with special emphasis on hybrids: The case of Kibale National Park, Uganda,” Oryx, vol. 57, no. 2, pp. 188–195, Mar. 2023

  25. [25]

    Does Social Complexity Drive V ocal Complexity? Insights from the Two African Elephant Species,

    D. Hedwig, J. Poole, and P. Granli, “Does Social Complexity Drive V ocal Complexity? Insights from the Two African Elephant Species,” Animals: an open access journal from MDPI, vol. 11, no. 11, p. 3071, Oct. 2021

  26. [26]

    Introducing a Central African Primate V ocalisation Dataset for Automated Species Classification,

    J. A. Zwerts, J. Treep, C. S. Kaandorp, F. Meewis, A. C. Koot, and H. Kaya, “Introducing a Central African Primate V ocalisation Dataset for Automated Species Classification,” inInterspeech 2021. ISCA, Aug. 2021, pp. 466–470

  27. [27]

    Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires,

    T. Sainburg, M. Thielk, and T. Q. Gentner, “Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires,” PLOS Computational Biology, vol. 16, no. 10, p. e1008228, Oct. 2020

  28. [28]

    Unsupervised classification to improve the quality of a bird song recording dataset,

    F. Michaud, J. Sueur, M. Le Cesne, and S. Haupert, “Unsupervised classification to improve the quality of a bird song recording dataset,” Ecological Informatics, vol. 74, p. 101952, May 2023

  29. [29]

    UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,

    L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” Sep. 2020

  30. [30]

    Hdbscan: Hierarchical density based clustering,

    L. McInnes, J. Healy, and S. Astels, “Hdbscan: Hierarchical density based clustering,”The Journal of Open Source Software, vol. 2, p. 205, Mar. 2017

  31. [31]

    SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,

    D. S. Park, W. Chan, Y . Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V . Le, “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” inInterspeech 2019, Sep. 2019, pp. 2613–2617

  32. [32]

    Haupert, F

    S. Haupert, F. S `ebe, and J. Sueur, “Physics-based model to predict the acoustic detection distance of terrestrial autonomous recording units over the diel cycle and across seasons: Insights from an Alpine and a Neotropical forest,”Methods in Ecology and Evolution, vol. 14, Nov. 2022

  33. [33]

    A Survey of Data Augmentation for Audio Classification,

    L. Ferreira-Paiva, E. Alfaro Espinoza, V . Martins Almeida, L. Felix, and R. Neves, “A Survey of Data Augmentation for Audio Classification,” inCongresso Brasileiro de Automatica-CBA, vol. 3, Oct. 2022

  34. [34]

    Transformer Models improve the acoustic recognition of buzz-pollinating bee species,

    A. I. S. Ferreira, N. F. F. da Silva, F. N. Mesquita, T. C. Rosa, S. L. Buchmann, and J. N. Mesquita-Neto, “Transformer Models improve the acoustic recognition of buzz-pollinating bee species,”Ecological Informatics, vol. 86, p. 103010, May 2025

  35. [35]

    Birds, bats and beyond: Evaluating generalization in bioacoustics models,

    B. van Merri ¨enboer, J. Hamer, V . Dumoulin, E. Triantafillou, and T. Den- ton, “Birds, bats and beyond: Evaluating generalization in bioacoustics models,”Frontiers in Bird Science, vol. 3, Jul. 2024

  36. [36]

    An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon,

    A. Jana, M. Uili, J. Atherton, M. O’Brien, J. Wood, and L. Brickson, “An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon,” May 2025

  37. [37]

    No Free Lunch from Audio Pretraining in Bioacoustics: A Benchmark Study of Embeddings,

    C. Chen and Z. Yang, “No Free Lunch from Audio Pretraining in Bioacoustics: A Benchmark Study of Embeddings,” Aug. 2025

  38. [38]

    Parameter- Efficient Transfer Learning of Audio Spectrogram Transformers,

    U. Cappellazzo, D. Falavigna, A. Brutti, and M. Ravanelli, “Parameter- Efficient Transfer Learning of Audio Spectrogram Transformers,” in 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP), Sep. 2024, pp. 1–6

  39. [39]

    Foundation Models for Bioacoustics – a Comparative Review,

    R. Schwinger, P. V . Zadeh, L. Rauch, M. Kurz, T. Hauschild, S. Lapp, and S. Tomforde, “Foundation Models for Bioacoustics – a Comparative Review,” Aug. 2025