SEABAD: A Tropical Bird Activity Detection Dataset for Passive Acoustic Monitoring
Pith reviewed 2026-05-21 02:11 UTC · model grok-4.3
The pith
A new dataset of 50,000 balanced tropical audio clips supports accurate detection of bird vocalizations in dense Southeast Asian soundscapes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce SEABAD, a dataset of 50,000 curated three-second clips from Southeast Asian soundscapes, evenly balanced between bird-present and bird-absent samples, spanning 1,677 bird species and standardized to 16 kHz mono audio. A dual-branch curation pipeline applies a six-stage positive-label workflow to Xeno-Canto recordings alongside source-specific negative-label extractions from environmental datasets, reducing class imbalance. A manual audit of 1,000 positive clips and baseline experiments with MobileNetV3-Small, which reach 99.57 percent accuracy and 0.9985 AUC, indicate that the dataset supports reliable tropical bird activity detection.
What carries the argument
The dual-branch curation pipeline that combines a six-stage positive-label workflow on bird recordings with source-specific negative extractions from environmental datasets to generate balanced positive and negative samples.
Load-bearing premise
The six-stage positive-label workflow and source-specific negative extractions produce reliable labels for tropical soundscapes that generalize from the audited subset to the full 50,000-clip collection.
What would settle it
An independent collection of Southeast Asian field recordings labeled by the same workflow but showing substantially lower accuracy when classified by models trained on SEABAD would indicate that the labels or soundscape coverage do not hold.
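The test described above comes down to scoring a SEABAD-trained model on an independent labeled set and comparing the result with the reported baseline. The following is a minimal sketch of the two metrics involved, assuming binary labels (1 = bird present) and model scores in [0, 1]. The function names and the example values are hypothetical and do not come from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of clips whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of clips, which is acceptable for a sketch but slow at scale.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative clip")
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical labels and scores, for illustration only.
    field_labels = [1, 1, 0, 0]
    field_scores = [0.9, 0.4, 0.6, 0.1]
    print(accuracy(field_labels, field_scores), roc_auc(field_labels, field_scores))
```

A large gap between these numbers on field recordings and the reported 99.57% accuracy and 0.9985 AUC would be the kind of evidence the paragraph above has in mind.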
Original abstract
Passive acoustic monitoring (PAM) enables large-scale biodiversity assessment, but continuous recording generates large amounts of non-informative audio, creating challenges for storage, power consumption, and long-term edge deployment. Bird audio detection (BAD), which identifies bird vocalizations, can reduce this burden by filtering irrelevant recordings before downstream analysis. However, most BAD systems are trained on temperate datasets despite tropical soundscapes being denser, more species-rich, and acoustically unpredictable. To address this gap, we introduce SEABAD (Southeast Asian Bird Activity Detection), a dataset of 50,000 curated three-second clips from Southeast Asian soundscapes, evenly balanced between bird-present and bird-absent samples. The dataset spans 1,677 bird species and is standardized to 16 kHz mono audio for embedded and low-power inference. We developed a dual-branch curation pipeline: a six-stage positive-label workflow applied to Xeno-Canto recordings, alongside six source-specific negative-label extractions from environmental datasets. These procedures reduced class imbalance by 13.7% (Gini coefficient: 0.601 to 0.519). A manual audit of 1,000 positive clips confirmed 97.8% +/- 0.9% labeling accuracy. Baseline experiments using MobileNetV3-Small achieved 99.57% +/- 0.25% accuracy and 0.9985 +/- 0.0002 AUC across three random seeds. SEABAD and the full curation pipeline are publicly released to support tropical BAD research and energy-efficient acoustic monitoring.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SEABAD, a dataset of 50,000 three-second audio clips from Southeast Asian soundscapes for bird activity detection (BAD) in passive acoustic monitoring (PAM). It is evenly balanced between bird-present and bird-absent classes, spans 1,677 species, and is standardized to 16 kHz mono. The authors describe a dual-branch curation pipeline consisting of a six-stage positive-label workflow on Xeno-Canto recordings and six source-specific negative-label extractions from environmental datasets, which reduced class imbalance (Gini coefficient from 0.601 to 0.519). A manual audit of 1,000 positive clips reports 97.8% +/- 0.9% labeling accuracy. Baseline experiments with MobileNetV3-Small achieve 99.57% +/- 0.25% accuracy and 0.9985 +/- 0.0002 AUC across three seeds. The dataset and pipeline are publicly released to support tropical BAD research and energy-efficient monitoring.
Significance. If the curation process produces clips that faithfully represent the denser, overlapping, and unpredictable acoustics of real tropical PAM deployments, SEABAD would address a clear gap in existing temperate-focused BAD datasets and enable development of models suitable for low-power edge filtering. The public release of both data and the full curation pipeline is a concrete strength that supports reproducibility and extension by other researchers.
major comments (2)
- [§3 (Dataset Curation)] The six-stage positive-label workflow applied to Xeno-Canto recordings (typically focal, high-SNR clips of individual species) paired with source-specific negatives extracted from separate environmental datasets risks producing an artificially separable task. This construction does not demonstrably capture the multi-species overlaps, faint calls, and masking sounds that characterize continuous tropical PAM recordings, which directly weakens the claim that the 99.57% baseline demonstrates utility for the intended real-world filtering application.
- [Manual Audit paragraph] The reported 97.8% +/- 0.9% accuracy is based on an audit of only 1,000 positive clips; no equivalent audit or inter-annotator details are provided for the negative samples, and the audit does not assess whether the selected clips exhibit the dense, unpredictable acoustic properties asserted for tropical soundscapes. This leaves the generalization to the full 50,000-clip set and to realistic PAM conditions unverified.
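The audit figure can be sanity-checked with a binomial confidence interval. The sketch below assumes 978 correct labels out of 1,000 audited clips (inferred from the reported 97.8%, not stated in the source) and a 95% confidence level. The paper does not say how its +/- 0.9% was computed, so the normal-approximation (Wald) interval here is only one plausible reading; the Wilson interval is included because it behaves better for proportions near 1.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval: returns (proportion, half_width)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, half_width


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval: returns (lower, upper)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # 978 of 1,000 is an assumption derived from the reported 97.8%.
    p, half = wald_interval(978, 1000)
    print(f"Wald: {p:.3f} +/- {half:.4f}")
    low, high = wilson_interval(978, 1000)
    print(f"Wilson: [{low:.4f}, {high:.4f}]")
```

Under these assumptions the Wald half-width comes out near 0.009, which is consistent with the reported +/- 0.9%. Either interval describes positive clips only, so it says nothing about label quality in the negative branch.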
minor comments (2)
- [Abstract / Dataset Curation] The reduction of class imbalance by 13.7% via the Gini coefficient (0.601 to 0.519) is stated without the explicit formula or per-source breakdown used in the calculation.
- [Baseline Experiments] No details are given on how the three random seeds were chosen or whether the reported standard deviations reflect only seed variation or also hyperparameter sensitivity in the MobileNetV3-Small baseline.
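Since the first minor comment notes that no formula is given for the Gini coefficient, here is a minimal sketch of the standard definition over sorted counts. It assumes the coefficient is taken over per-species clip counts, which the source does not confirm. The counts in the example are hypothetical; only the two coefficients 0.601 and 0.519 come from the paper.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 means perfectly even).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i running from 1 to n.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive count")
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical per-species clip counts, for illustration only.
    print(round(gini([400, 120, 60, 15, 5]), 3))
    # Coefficients as reported in the abstract.
    print(round(100 * relative_reduction(0.601, 0.519), 1))
```

Applied to the rounded coefficients, the relative reduction is about 13.6%, close to the reported 13.7%; the small difference would be explained if the paper computed the percentage from unrounded values, though the source does not say so.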
Simulated Author's Rebuttal
We are grateful to the referee for their insightful comments, which have helped us improve the clarity and transparency of our work on the SEABAD dataset. We address each major comment in detail below, indicating where revisions have been made to the manuscript.
Point-by-point responses
- Referee: [§3 (Dataset Curation)] The six-stage positive-label workflow applied to Xeno-Canto recordings (typically focal, high-SNR clips of individual species) paired with source-specific negatives extracted from separate environmental datasets risks producing an artificially separable task. This construction does not demonstrably capture the multi-species overlaps, faint calls, and masking sounds that characterize continuous tropical PAM recordings, which directly weakens the claim that the 99.57% baseline demonstrates utility for the intended real-world filtering application.
  Authors: We appreciate this observation and agree that the curation approach, relying on focal Xeno-Canto recordings for positive samples and separate environmental sources for negatives, may not fully replicate the complexities of real-world tropical PAM, such as overlapping calls and masking noise. Our intent was to provide a balanced, publicly available dataset to bootstrap research in this under-represented domain, rather than to claim direct equivalence to continuous recordings. The high baseline accuracy reflects performance on this curated set, which we believe serves as a useful benchmark. To strengthen the manuscript, we have added a dedicated limitations paragraph in Section 5 discussing the differences between curated clips and dense PAM soundscapes, along with suggestions for future work involving raw continuous recordings.
  revision: yes
- Referee: [Manual Audit paragraph] The reported 97.8% +/- 0.9% accuracy is based on an audit of only 1,000 positive clips; no equivalent audit or inter-annotator details are provided for the negative samples, and the audit does not assess whether the selected clips exhibit the dense, unpredictable acoustic properties asserted for tropical soundscapes. This leaves the generalization to the full 50,000-clip set and to realistic PAM conditions unverified.
  Authors: We acknowledge the limitations of the audit scope. The manual audit focused on positive clips to verify the presence of bird vocalizations from the specified species, as negative clips were extracted from sources documented to lack avian activity. We have now included additional details in the revised manuscript about the verification process for negative samples, including source documentation and spot-checks. Regarding the assessment of acoustic density, we note that this was not part of the audit design, as the primary goal was label accuracy. We have revised the text to clarify the audit's purpose and added a statement on the need for further validation in realistic conditions. A comprehensive inter-annotator agreement study across the entire dataset would require substantial additional resources and is planned for future extensions.
  revision: partial
Circularity Check
No circularity: empirical dataset release with direct experimental baselines
The paper introduces SEABAD as a curated dataset of 50,000 clips with a described six-stage positive workflow and source-specific negative extraction, followed by a manual audit and MobileNetV3-Small baseline runs reporting accuracy and AUC. No equations, first-principles derivations, or predictions appear. The reported 99.57% accuracy is an empirical measurement on the released data, not a fitted parameter or self-defined quantity renamed as a prediction. No self-citations are invoked to justify uniqueness or load-bearing premises. The curation process is presented as a transparent pipeline whose outputs are the dataset itself; nothing reduces to its own inputs by construction. This is a standard empirical contribution whose central claims rest on the audit and baseline numbers rather than any circular reduction.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Audio is standardized to 16 kHz mono
- domain assumption: The dual-branch curation pipeline produces reliable positive and negative labels
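To make the first axiom concrete, the sketch below shows what "16 kHz mono, three-second clips" implies in samples. It assumes the audio has already been resampled to 16 kHz and that clips are cut as non-overlapping windows; the paper's actual six-stage workflow is not described in this review and may select clips differently. All names here are hypothetical.

```python
SAMPLE_RATE = 16_000      # Hz, as stated for SEABAD
CLIP_SECONDS = 3          # clip length, as stated for SEABAD
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS


def to_mono(frames):
    """Average the channels of each frame, e.g. (left, right) pairs."""
    return [sum(frame) / len(frame) for frame in frames]


def split_into_clips(samples, clip_samples=CLIP_SAMPLES):
    """Cut a signal into non-overlapping clips, dropping a trailing partial clip.

    Assumption: fixed-stride windows; the real pipeline may pick windows
    around detected vocal activity instead.
    """
    n_full = len(samples) // clip_samples
    return [samples[i * clip_samples:(i + 1) * clip_samples] for i in range(n_full)]


if __name__ == "__main__":
    stereo = [(0.2, 0.4)] * (SAMPLE_RATE * 7)   # 7 s of hypothetical stereo audio
    clips = split_into_clips(to_mono(stereo))
    print(len(clips), len(clips[0]))
```

Each clip holds 48,000 samples at these settings, which is the input size any embedded model consuming raw SEABAD waveforms would need to handle.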