MyGardenBird: A Machine-Learning-Ready Bird Sound Dataset for Twelve Common Malaysian Birds
Pith reviewed 2026-06-27 21:14 UTC · model grok-4.3
The pith
A new dataset of twelve Malaysian bird species supports 92-96 percent convolutional classification accuracy on their sounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that their MyGardenBird dataset of 7200 manually validated three-second clips from twelve Malaysian bird species exhibits strong interspecies separability, as shown by convolutional neural network classification accuracies of 92-96 percent on Mel-spectrograms when partitions are enforced at the source-recording level.
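To make the sampling-rate side of the claim concrete, the sketch below shows how the Nyquist limit bounds the frequency range a Mel-spectrogram can cover. It is a minimal illustration, not the authors' feature pipeline: the band count (`n_mels=64`), the helper names, and the use of the common HTK-style mel formula are all assumptions, since the summary does not give the paper's spectrogram parameters.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumed convention, not taken from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centres(sample_rate_hz, n_mels):
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel scale,
    from 0 Hz up to the Nyquist frequency (half the sampling rate)."""
    top_mel = hz_to_mel(sample_rate_hz / 2)
    step = top_mel / (n_mels + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_mels)]

# Hypothetical band count; the two sampling rates are those of the two releases.
centres_16k = mel_band_centres(16_000, n_mels=64)
centres_44k = mel_band_centres(44_100, n_mels=64)

print(f"16 kHz release: highest band centre {centres_16k[-1]:.0f} Hz (Nyquist 8000 Hz)")
print(f"44.1 kHz release: highest band centre {centres_44k[-1]:.0f} Hz (Nyquist 22050 Hz)")
```

Any vocal energy above 8 kHz is absent from the 16 kHz clips, which is one reason the higher-rate release may matter for some species.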
What carries the argument
The curation pipeline (species-level filtering of Xeno-canto recordings, followed by manual spectrogram segmentation, quality control, and source-recording-level partitioning) produces the balanced, leak-free dataset on which the separability result rests.
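Source-recording-level partitioning can be sketched as follows. This is a minimal illustration under stated assumptions, not the released preprocessing code: the `recording_id` field, the `split_by_recording` helper, the 20 percent test fraction, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict

def split_by_recording(clips, test_fraction=0.2, seed=0):
    """Split clips so that no source recording contributes to both partitions.

    `clips` is a list of dicts carrying a (hypothetical) `recording_id` key.
    """
    by_recording = defaultdict(list)
    for clip in clips:
        by_recording[clip["recording_id"]].append(clip)

    # Shuffle recordings, not clips, so all clips of a recording stay together.
    recording_ids = sorted(by_recording)
    random.Random(seed).shuffle(recording_ids)

    n_test = max(1, round(len(recording_ids) * test_fraction))
    test = [c for rid in recording_ids[:n_test] for c in by_recording[rid]]
    train = [c for rid in recording_ids[n_test:] for c in by_recording[rid]]
    return train, test

# Toy data: 10 recordings with 3 clips each (illustrative values only).
clips = [{"recording_id": f"rec{r}", "clip_index": i}
         for r in range(10) for i in range(3)]
train, test = split_by_recording(clips)

train_ids = {c["recording_id"] for c in train}
test_ids = {c["recording_id"] for c in test}
print(len(train), len(test), train_ids.isdisjoint(test_ids))
```

A clip-level random split would let neighbouring segments of one recording land on both sides and inflate test accuracy. This sketch does not stratify by species, which a balanced dataset would also need.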
If this is right
- Models trained on the dataset can classify the twelve species in new audio recordings at the reported accuracy levels.
- The provided SNR metadata enables direct study of how signal quality influences classification performance.
- The accompanying code allows extension of the same pipeline to additional species or geographic areas.
- The 44.1 kHz version supports experiments that require higher sampling rates than the primary 16 kHz release.
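The SNR point above could be explored by binning test clips by SNR and computing accuracy per bin. The sketch below is illustrative only: the field names (`snr_db`, `label`, `predicted`), the bin edges, and the toy records are assumptions, not the dataset's metadata schema or any reported result.

```python
import bisect
from collections import defaultdict

def accuracy_by_snr(records, bin_edges=(0, 10, 20, 30)):
    """Return {bin_label: accuracy} for clips grouped by SNR in dB."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        # idx counts how many edges are <= the clip's SNR.
        idx = bisect.bisect_right(bin_edges, rec["snr_db"])
        if idx == 0:
            label = f"<{bin_edges[0]} dB"
        elif idx == len(bin_edges):
            label = f">={bin_edges[-1]} dB"
        else:
            label = f"{bin_edges[idx - 1]}-{bin_edges[idx]} dB"
        totals[label] += 1
        hits[label] += rec["label"] == rec["predicted"]
    return {label: hits[label] / totals[label] for label in totals}

# Toy predictions (made-up values, not results from the paper).
records = [
    {"snr_db": 4.2, "label": "A", "predicted": "B"},
    {"snr_db": 8.0, "label": "A", "predicted": "A"},
    {"snr_db": 15.8, "label": "B", "predicted": "B"},
    {"snr_db": 35.0, "label": "B", "predicted": "B"},
]
snr_accuracy = accuracy_by_snr(records)
print(snr_accuracy)
```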
Where Pith is reading between the lines
- The curation approach could be applied to other tropical regions where labeled bird sound data remain scarce.
- Field recordings collected through citizen science could be tested against the dataset for real-world deployment.
- High separability opens the possibility of extending the work to continuous monitoring or multi-species detection tasks.
Load-bearing premise
Single-annotator manual segmentation and quality control produce labels consistent enough to support reliable high-accuracy machine learning.
What would settle it
Independent re-annotation of a subset of clips by multiple experts that reveals frequent species misassignments or inconsistent segment boundaries and drops classification accuracy below 80 percent.
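One way to quantify the re-annotation test described above is Cohen's kappa, which corrects raw agreement between two label sets for agreement expected by chance. The sketch below is a generic implementation with made-up labels; the paper reports no such statistic, and the `cohens_kappa` helper and label lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same clips."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected if both annotators labelled independently
    # according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Made-up labels for four clips: original curator vs. a second expert.
original = ["x", "x", "y", "y"]
second_expert = ["x", "x", "y", "x"]
kappa = cohens_kappa(original, second_expert)
print(f"kappa = {kappa:.2f}")
```

Here raw agreement is 0.75 but chance agreement is 0.5, so kappa is 0.5.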
Original abstract
Bioacoustic datasets from tropical regions remain limited, in part due to the absence of reproducible workflows for aggregating recordings from public archives. We present MyGardenBird, a curated dataset of bird vocalisations representing twelve common species across Peninsular Malaysia and the Indo-Malayan region. Recordings were sourced from Xeno-canto and processed through species-level filtering, manual spectrogram segmentation, and quality control checks. The primary release comprises 7,200 manually validated audio clips (16 kHz, 16-bit PCM mono WAV), balanced at 600 three-second clips per species (6.0 hours total) derived from 1,381 distinct recordings. Metadata includes geospatial coordinates, vocalisation categories, and signal-to-noise ratio (SNR) values (range: 0.83-59.18 dB; mean: 15.80 dB). A supplementary 44.1 kHz version is also provided. To mitigate data leakage, dataset partitions are defined at the source-recording level. Baseline classification experiments using convolutional neural networks on Mel-spectrograms achieved test accuracies of 92-96%, indicating strong interspecies separability. Limitations include reliance on single-annotator curation; however, validation with BirdNET confirmed label consistency. MyGardenBird is openly available at https://doi.org/10.5281/zenodo.20306877 under a CC BY-NC-SA 4.0 licence. Complete preprocessing code accompanies the release to support reproducibility and future expansion.
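The size figures in the abstract are mutually consistent, as the short check below shows. All inputs come from the abstract; the derived quantities (samples per clip, raw PCM size excluding WAV headers, mean clips per source recording) are simple arithmetic rather than values reported by the authors.

```python
SPECIES = 12
CLIPS_PER_SPECIES = 600
CLIP_SECONDS = 3
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2          # 16-bit PCM, mono
SOURCE_RECORDINGS = 1_381

total_clips = SPECIES * CLIPS_PER_SPECIES                      # 7200 clips
total_hours = total_clips * CLIP_SECONDS / 3600                # 6.0 hours
samples_per_clip = CLIP_SECONDS * SAMPLE_RATE_HZ               # 48000 samples
pcm_bytes = total_clips * samples_per_clip * BYTES_PER_SAMPLE  # raw audio payload
clips_per_recording = total_clips / SOURCE_RECORDINGS          # mean, about 5.2

print(total_clips, total_hours, samples_per_clip)
print(f"{pcm_bytes / 1e6:.1f} MB raw PCM, "
      f"{clips_per_recording:.2f} clips per source recording on average")
```

An average of roughly five clips per source recording is why splitting at the recording level, rather than the clip level, matters for leakage.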
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents MyGardenBird, a curated dataset of 7,200 three-second audio clips (600 per species) from 12 common Malaysian bird species sourced from Xeno-canto. It describes species-level filtering, single-annotator manual spectrogram segmentation, quality control, recording-level train/test splits to prevent leakage, and metadata including SNR values. Baseline CNN classification on Mel-spectrograms yields 92-96% test accuracy, presented as evidence of strong interspecies separability. The dataset (16 kHz and 44.1 kHz versions), metadata, and preprocessing code are released openly under CC BY-NC-SA 4.0.
Significance. If label quality holds, the work supplies a balanced, reproducible tropical bioacoustic dataset where such resources are scarce, with explicit recording-level partitioning and full code release as clear strengths for ML reproducibility. The reported baseline accuracies indicate practical utility for classification tasks, and the open Zenodo DOI supports immediate use and extension.
major comments (1)
- [Abstract] The assertion that 'validation with BirdNET confirmed label consistency' provides no quantitative metric (agreement rate, confusion matrix, or error analysis on any subset). This is load-bearing for the separability claim, because the 92-96% CNN accuracies on the recording-level split could be inflated by systematic single-annotator mislabels rather than true acoustic distinctiveness; a concrete agreement statistic is required to substantiate the interpretation.
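The statistics requested in this comment are cheap to compute once curated labels and automatic predictions sit side by side. The sketch below is hypothetical throughout: the `agreement_report` helper, the placeholder species names, and the prediction list are invented for illustration. It calls no BirdNET API and reproduces no figure from the manuscript.

```python
from collections import Counter

def agreement_report(curated, predicted):
    """Return (agreement_rate, confusion), where confusion[(curated, predicted)]
    counts how often each label pair occurs."""
    if len(curated) != len(predicted) or not curated:
        raise ValueError("label lists must be non-empty and of equal length")
    confusion = Counter(zip(curated, predicted))
    agree = sum(n for (c, p), n in confusion.items() if c == p)
    return agree / len(curated), confusion

# Placeholder labels: curator's species label vs. an automatic classifier's top-1.
curated = ["species_a", "species_a", "species_b", "species_b", "species_c"]
automatic = ["species_a", "species_b", "species_b", "species_b", "species_c"]

rate, confusion = agreement_report(curated, automatic)
print(f"agreement rate: {rate:.2f}")
for (c, p), n in sorted(confusion.items()):
    if c != p:
        print(f"curated {c} but predicted {p}: {n} clip(s)")
```

Reporting the off-diagonal pairs, not only the headline rate, would show whether disagreements cluster on particular species pairs.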
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the manuscript. We address the single major comment below and will revise the paper accordingly to improve clarity and substantiation of claims.
- Referee: [Abstract] The assertion that 'validation with BirdNET confirmed label consistency' provides no quantitative metric (agreement rate, confusion matrix, or error analysis on any subset). This is load-bearing for the separability claim, because the 92-96% CNN accuracies on the recording-level split could be inflated by systematic single-annotator mislabels rather than true acoustic distinctiveness; a concrete agreement statistic is required to substantiate the interpretation.
  Authors: We agree that the current wording in the abstract lacks the requested quantitative support. The BirdNET check was performed as a secondary consistency verification on the curated clips rather than a formal inter-annotator study, but no agreement rate, confusion matrix, or subset analysis is reported. In the revised manuscript we will either (a) remove the BirdNET sentence from the abstract and limitations section or (b) add the concrete statistics that were computed during curation (whichever is supported by our internal records) so that readers can properly evaluate label quality independent of the CNN results.
  Revision: yes
Circularity Check
No circularity: dataset curation with empirical baselines only
The manuscript is a data release paper describing sourcing from Xeno-canto, manual segmentation, and release of 7200 clips. Baseline CNN accuracies (92-96%) are reported as empirical results on the released data, not as derivations or predictions from fitted parameters. No equations, self-definitional steps, fitted-input predictions, or load-bearing self-citations appear. External archives and BirdNET provide independent grounding; label curation assumptions affect correctness but do not create circularity by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: recordings from Xeno-canto are representative of the vocalizations of the twelve species.