
arxiv: 2605.20578 · v2 · pith:HJWAHX34new · submitted 2026-05-20 · 💻 cs.SD · cs.CV

A strongly annotated passive acoustic dataset for tropical bird monitoring

Pith reviewed 2026-05-22 08:57 UTC · model grok-4.3

classification 💻 cs.SD cs.CV
keywords passive acoustic monitoring · Neotropical birds · annotated dataset · bird vocalizations · machine learning · biodiversity · Colombia

The pith

PteroSet supplies 15,372 time-frequency annotations across 73 hours of recordings, 6,702 of them identified to species level across 168 Neotropical bird species.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Passive acoustic monitoring produces large audio datasets that benefit from machine learning for biodiversity tracking, but supervised methods need detailed annotations that are rare in tropical areas. This paper releases PteroSet, built from recordings at two Colombian sites between 2023 and 2025. The collection holds 563 files totaling 73.62 hours and carries 15,372 time-frequency annotations, 6,702 of which are identified to species level across 168 bird species. Annotations are released in a JSON format modeled on COCO to support direct machine learning use, and a baseline detector is included to show both the dataset's value and the hurdles posed by overlapping sounds and variation between sites.

Core claim

We present PteroSet, a curated dataset of strongly annotated Neotropical bird vocalizations recorded in Puerto Asis (Putumayo) and Pivijay (Magdalena), Colombia. The dataset comprises 563 recordings totaling 73.62 hours and 15,372 time-frequency annotations, including 6,702 species-level events across 168 species. Annotations follow a COCO-inspired JSON schema that unifies audio files, taxonomic categories, and labels. PteroSet serves as a benchmark highlighting acoustic co-occurrence and domain shift, and includes a deep learning baseline for binary bird detection.

What carries the argument

PteroSet dataset with its COCO-inspired JSON schema for audio files, taxonomic categories, and machine learning labels.
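
As a rough illustration of what a COCO-inspired layout could look like, the sketch below builds a tiny in-memory document with three top-level lists and indexes the annotations by audio file. Every key name (`audio`, `categories`, `annotations`, `audio_id`, `rank`, and so on), file name, and value is an assumption made for illustration; the released files may be organized differently. Only the (t_min, t_max, f_min, f_max) box convention is taken from the Figure 1 caption.

```python
import json
from collections import defaultdict

# Hypothetical COCO-style document. All keys and values are illustrative
# placeholders, not taken from the released PteroSet files.
doc = {
    "audio": [
        {"id": 1, "file_name": "site_a/rec_0001.wav", "duration_s": 480.0},
        {"id": 2, "file_name": "site_b/rec_0002.wav", "duration_s": 480.0},
    ],
    "categories": [
        {"id": 10, "name": "Aves", "rank": "class"},
        {"id": 11, "name": "placeholder species", "rank": "species"},
    ],
    "annotations": [
        {"id": 100, "audio_id": 1, "category_id": 11,
         "t_min": 2.1, "t_max": 3.4, "f_min": 1800.0, "f_max": 5200.0},
        {"id": 101, "audio_id": 1, "category_id": 10,
         "t_min": 7.0, "t_max": 7.6, "f_min": 900.0, "f_max": 2500.0},
    ],
}


def index_by_audio(document):
    """Group annotation records under the id of the audio file they belong to."""
    grouped = defaultdict(list)
    for ann in document["annotations"]:
        grouped[ann["audio_id"]].append(ann)
    return dict(grouped)


def species_level(document):
    """Return only the annotations whose category has rank 'species'."""
    rank = {c["id"]: c["rank"] for c in document["categories"]}
    return [a for a in document["annotations"]
            if rank[a["category_id"]] == "species"]


# Round-trip through JSON to mimic loading a file from disk.
loaded = json.loads(json.dumps(doc))
by_audio = index_by_audio(loaded)
species_events = species_level(loaded)
```

Keeping files, categories, and annotations in separate lists linked by integer ids is the design choice borrowed from COCO: a species-level subset or a per-file view becomes a simple filter or grouping step rather than a re-parse of per-recording text files.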

If this is right

  • Supervised models can train on exact time labels for bird detection.
  • Dataset shows real challenges from overlapping calls and site differences.
  • Provides public benchmark for tropical acoustic monitoring algorithms.
  • Supports non-invasive biodiversity tracking in Neotropical regions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar datasets for other taxa could extend monitoring to insects or amphibians.
  • Transfer learning from this data may reduce labeling needs at new sites.
  • Strong annotations appear necessary when vocalizations overlap frequently.

Load-bearing premise

Expert annotators can correctly name species and draw exact time-frequency boundaries in dense overlapping tropical soundscapes.

What would settle it

Other experts re-labeling part of the data and showing major mismatches in species names or boundary times.
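
One way such a re-labeling check could be scored is to match boxes from two annotators by time-frequency overlap and then compare species names on the matched pairs. The sketch below is an illustration under stated assumptions: the (box, label) tuple layout, the 0.5 IoU threshold, the greedy matching rule, and the placeholder labels are all choices made here, not procedures described in the paper.

```python
def box_iou(a, b):
    """Intersection over union of two (t_min, t_max, f_min, f_max) boxes."""
    dt = min(a[1], b[1]) - max(a[0], b[0])
    df = min(a[3], b[3]) - max(a[2], b[2])
    if dt <= 0 or df <= 0:
        return 0.0
    inter = dt * df
    area_a = (a[1] - a[0]) * (a[3] - a[2])
    area_b = (b[1] - b[0]) * (b[3] - b[2])
    return inter / (area_a + area_b - inter)


def compare_annotators(first, second, iou_threshold=0.5):
    """Greedily match each box in `first` to its best unused box in `second`,
    then report how many pairs matched and how often their labels agree."""
    unused = list(range(len(second)))
    matched = 0
    agree = 0
    for box, label in first:
        best_j, best_iou = None, 0.0
        for j in unused:
            iou = box_iou(box, second[j][0])
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None and best_iou >= iou_threshold:
            unused.remove(best_j)
            matched += 1
            agree += int(label == second[best_j][1])
    return {
        "matched": matched,
        "label_agreement": agree / matched if matched else None,
        "unmatched_first": len(first) - matched,
        "unmatched_second": len(unused),
    }


# Placeholder annotations: (box, label) pairs with made-up values.
annotator_a = [((0.0, 2.0, 1000.0, 3000.0), "sp1"),
               ((5.0, 6.0, 2000.0, 4000.0), "sp2")]
annotator_b = [((0.2, 2.0, 1000.0, 3000.0), "sp1"),
               ((5.0, 6.0, 2000.0, 4000.0), "sp3"),
               ((8.0, 9.0, 500.0, 900.0), "sp1")]
report = compare_annotators(annotator_a, annotator_b)
```

Low label agreement on matched boxes would point to species-identification disagreement, while many unmatched boxes would point to disagreement about event boundaries or about which sounds to annotate at all.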

Figures

Figures reproduced from arXiv: 2605.20578 by Andrés Hernández, Andrés Sierra-Ricaurte, Angela Mendoza-Henao, Bruno Demuro, Daniela Ruiz, Eliana Barona-Cortés, Juan M. Lavista Ferres, Juan Sebastián Ulloa, Maria Paula Toro-Gómez, Nicolás Betancourt, Pablo Arbeláez, Rahul Dodhia, Sebastián Pérez-Peña, Zhongqi Miao.

Figure 1. Overview of the PteroSet data pipeline and baseline modeling approach. (a) Data Collection: We collect audio data using a time-lapse protocol of 10-second recordings every 30 minutes over 24 hours, yielding 480 seconds of WAV files per day. (b) Data Annotation: We manually annotate each recording in Raven as strong time–frequency events (t_min, t_max, f_min, f_max) and store them as a plain-text file per a…

Figure 2. Overview of recording locations. (a) Map of Colombia showing the two study regions where we conduct passive acoustic monitoring deployments: Pivijay (Magdalena) and Puerto Asís (Putumayo). (b–c) Detailed maps showing the specific locations of autonomous recording device deployments at each study site. (d–g) Representative in situ photographs illustrating the deployment environments and examples of recordin…

Figure 3. Time-lapse spectrogram of a 24-hour acoustic recording with annotations. We create the audio by concatenating the first 10 seconds of each of the 48 recordings collected at 30-minute intervals throughout the day. Frequency (Hz) is shown on the y-axis and time of day on the x-axis. Dashed boxes indicate manually annotated bird vocalizations (125 annotations from 22 different species in this example).

Figure 4. Summary statistics by project. (a) Number of audio recordings per project. (b) Total audio duration (hours) per project. (c) Total annotated duration (hours) per project. (d) Number of class-level annotations per project. (e) Number of species-level annotations per project. (f) Number of unique species identified per project.

Figure 5. Annotation scenarios for binary classification training. Each panel shows a 5-second mel spectrogram window with overlaid annotations (cyan dashed boxes). A window is labeled positive (birds present) if any annotation overlaps it, regardless of the degree of overlap. (a) No annotations overlap the window, yielding a negative label. (b) One annotation falls entirely within the window. (c) An annotation part…

Figure 6. Precision–recall curves for the baseline bird vocalization detector, with the corresponding AUPRC shown in the legend, across the five leave-one-project-out cross-validation folds.

Figure 7. Qualitative examples of model predictions on the PPA4 test set (Fold 4) illustrate representative true positive (TP), true negative (TN), false positive (FP), and false negative (FN) cases.
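
The window-labeling rule stated in the Figure 5 caption (a 5-second window is positive if any annotation overlaps it, however slightly) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the non-overlapping 5-second hop, the example times, and the choice to treat an annotation that merely touches a window edge as non-overlapping are all assumptions.

```python
def window_labels(duration_s, annotations, window_s=5.0, hop_s=5.0):
    """Label each window 1 if any annotation interval overlaps it, else 0.

    annotations: list of (t_min, t_max) pairs in seconds.
    Returns a list of (window_start, window_end, label) tuples.
    """
    labels = []
    start = 0.0
    while start + window_s <= duration_s:
        end = start + window_s
        # Two intervals overlap when each one starts before the other ends.
        positive = any(t_min < end and t_max > start
                       for t_min, t_max in annotations)
        labels.append((start, end, int(positive)))
        start += hop_s
    return labels


# Made-up example: one event inside the first window and one short event
# straddling the boundary between the second and third windows.
example = window_labels(20.0, [(1.0, 2.0), (9.5, 10.2)])
flags = [label for _, _, label in example]
```

Because the degree of overlap is ignored, the short event from 9.5 s to 10.2 s marks both neighbouring windows as positive, which is the behaviour the caption describes for partially overlapping annotations.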
Original abstract

Passive acoustic monitoring enables continuous, non-invasive biodiversity assessment across diverse ecosystems. The scale of these datasets has driven the adoption of machine learning, with supervised approaches showing strong performance. However, supervised methods require time-resolved annotated datasets, which remain scarce, especially in complex tropical soundscapes. We present PteroSet, a curated dataset of strongly annotated Neotropical bird vocalizations recorded in Puerto Asis (Putumayo) and Pivijay (Magdalena), Colombia, between 2023 and 2025. The dataset comprises 563 recordings (73.62 h) and 15,372 time-frequency annotations, including 6,702 events identified to the species level across 168 species. We release the annotations in a COCO-inspired JSON schema that unifies audio files, taxonomic categories, and labels for machine learning workflows. Beyond providing annotated data, PteroSet serves as a realistic benchmark that highlights key characteristics of tropical soundscapes, including acoustic co-occurrence and domain shift across recording sites. We provide a deep learning baseline for binary bird detection, demonstrating PteroSet's usability and the challenges it presents.
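
The Figure 6 caption reports AUPRC over leave-one-project-out folds, an evaluation design that probes the domain shift mentioned in the abstract. A minimal sketch of that design is given below under stated assumptions: the project identifiers `P1` to `P3`, the labels, and the scores are made up, and the step-wise average precision used here is one common estimate of AUPRC, which may differ from how the authors computed theirs.

```python
def leave_one_project_out(project_ids):
    """Yield (held_out, train_idx, test_idx), holding out one project per fold."""
    for held_out in sorted(set(project_ids)):
        train = [i for i, p in enumerate(project_ids) if p != held_out]
        test = [i for i, p in enumerate(project_ids) if p == held_out]
        yield held_out, train, test


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: 0/1 ground truth per window; scores: model confidence per window.
    Ties in score are broken by input order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return float("nan")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision at each recall step
    return ap / total_pos


# Hypothetical project assignment for five windows.
projects = ["P1", "P1", "P2", "P3", "P3"]
folds = list(leave_one_project_out(projects))

# Hypothetical predictions for one held-out project.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Splitting by project rather than by random window keeps every recording from the held-out deployment out of training, so each fold's score reflects performance on an unseen recording context.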

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents PteroSet, a curated dataset of strongly annotated Neotropical bird vocalizations recorded at two sites in Colombia (Puerto Asis and Pivijay) from 2023 to 2025. It comprises 563 recordings (73.62 h total) containing 15,372 time-frequency annotations, of which 6,702 events are identified to species level across 168 species. Annotations are released in a COCO-inspired JSON schema, and the paper supplies a deep-learning baseline for binary bird detection while noting the dataset's value for studying acoustic co-occurrence and domain shift.

Significance. If the species-level labels and time-frequency boundaries can be shown to be reliable, PteroSet would constitute a useful addition to the limited set of strongly annotated tropical PAM datasets, providing a realistic benchmark that incorporates overlapping vocalizations and cross-site variation. The public JSON schema and baseline code further support immediate use in supervised ML workflows.

major comments (2)
  1. [Abstract] Abstract and Dataset section: the central claim that the 6,702 species-level events constitute reliable ground truth rests on the assertion of 'expert manual' annotations, yet no annotation protocol, number of annotators, reference materials, decision rules for ambiguous or overlapping calls, or inter-annotator agreement metric is supplied. Without these elements the headline statistics cannot be treated as verified labels in complex Neotropical soundscapes.
  2. [Dataset Description] Dataset section: the two-site design is presented as capturing domain shift, but no quantitative comparison of acoustic properties, species composition, or recording conditions between Puerto Asis and Pivijay is provided. This leaves the domain-shift claim unsupported by evidence.
minor comments (1)
  1. [Abstract] Abstract: the total duration is given as 73.62 h; confirm that this figure is repeated with the same precision in the main text and tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and describe the revisions we will make to strengthen the paper.

  1. Referee: [Abstract] Abstract and Dataset section: the central claim that the 6,702 species-level events constitute reliable ground truth rests on the assertion of 'expert manual' annotations, yet no annotation protocol, number of annotators, reference materials, decision rules for ambiguous or overlapping calls, or inter-annotator agreement metric is supplied. Without these elements the headline statistics cannot be treated as verified labels in complex Neotropical soundscapes.

    Authors: We agree that a more detailed description of the annotation process is necessary to support the reliability of the species-level labels. In the revised manuscript we will add a new subsection to the Dataset Description that specifies the annotation protocol: annotations were performed sequentially by two expert ornithologists with extensive field experience in Colombian avifauna; reference materials included xeno-canto recordings, local field guides, and spectrogram examples from prior studies; decision rules for overlapping calls prioritized labeling the clearest vocalization while noting co-occurrence; ambiguous cases were resolved through joint review. Although a formal inter-annotator agreement metric was not computed, we will describe the consistency checks employed. These additions will allow readers to evaluate the ground-truth quality directly. revision: yes

  2. Referee: [Dataset Description] Dataset section: the two-site design is presented as capturing domain shift, but no quantitative comparison of acoustic properties, species composition, or recording conditions between Puerto Asis and Pivijay is provided. This leaves the domain-shift claim unsupported by evidence.

    Authors: We accept that quantitative evidence is required to substantiate the domain-shift claim. In the revised manuscript we will insert a new table (and accompanying text) that compares the two sites on species composition (unique species counts and Jaccard overlap), recording conditions (vegetation type, elevation, time-of-day distribution), and basic acoustic properties (mean sound pressure level and dominant frequency range across recordings). This will provide concrete support for the claim that the sites introduce realistic domain variation suitable for benchmarking. revision: yes
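
The Jaccard overlap proposed in the rebuttal for comparing species composition between the two sites is the size of the shared species set divided by the size of the combined set. The sketch below uses placeholder species identifiers rather than values from the dataset, and returning 0.0 when both sets are empty is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of species names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / len(union)


# Placeholder species lists, not taken from PteroSet.
site_one = {"sp1", "sp2", "sp3", "sp4"}
site_two = {"sp3", "sp4", "sp5"}
overlap = jaccard(site_one, site_two)  # 2 shared / 5 total
```

A value near 0 would indicate nearly disjoint species lists and therefore a strong label-distribution component to the domain shift, while a value near 1 would suggest that any shift comes mainly from recording conditions.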

Circularity Check

0 steps flagged

No circularity: data-release paper with no derivations or self-referential predictions


The manuscript is a dataset release describing the collection and annotation of 563 recordings with 15,372 time-frequency boxes, 6,702 of them labeled to species level across 168 species. No equations, fitted parameters, uniqueness theorems, or predictive claims appear that could reduce to author-defined inputs by construction. The central contribution is the curated collection itself, released in a COCO-inspired JSON schema, together with a simple baseline detector. Because there is no derivation chain to inspect, no load-bearing step reduces to a prior choice of the authors. The two-site design and expert-annotation claim are presented as factual descriptions of the data rather than results derived from internal definitions or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is a dataset curation paper; the central contribution is the collected and labeled recordings rather than any derived mathematical result.

axioms (1)
  • domain assumption: Expert manual annotation provides reliable species identification and temporal localization in complex overlapping soundscapes
    The paper depends on the accuracy of 6,702 species-level labels without reporting inter-annotator agreement or validation against independent observers.


