
arXiv: 2511.12158 · v3 · pith:UZGBOAAC · submitted 2025-11-15 · 💻 cs.LG

Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis

Pith reviewed 2026-05-21 19:27 UTC · model grok-4.3

classification 💻 cs.LG
keywords self-supervised learning · birdsong syllable detection · data-efficient annotation · bioacoustics · Canary song · Bengalese Finch · masked prediction · semi-supervised refinement

The pith

A three-stage self-supervised pipeline produces reliable syllable detectors for complex birdsong from very few labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Residual Multi-Layer Perceptron Recurrent Neural Network (Res-MLP-RNN) for syllable-level birdsong annotation. It describes a three-stage process that begins with self-supervised pretraining on unlabeled audio through masked prediction or online clustering. Supervised training with data augmentation then builds a frame-level detector from limited labels. A final semi-supervised refinement step improves the model using additional unlabeled recordings. The pipeline is demonstrated on the demanding Canary song and validated in a Bengalese Finch case study, lowering annotation demands for bioacoustic research.
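As a rough illustration of the masked-prediction objective in the first stage, the sketch below hides a random subset of spectrogram time frames and scores a reconstruction only on the hidden frames, in the spirit of masked autoencoders. Masking whole time frames, the mask ratio, and the helper names (`mask_frames`, `masked_mse`) are assumptions for illustration, not the paper's implementation.

```python
import random
from typing import List, Sequence, Tuple

Frame = List[float]  # one spectrogram time frame: a list of frequency-bin magnitudes


def mask_frames(spec: Sequence[Frame], mask_ratio: float,
                seed: int = 0) -> Tuple[List[Frame], List[int]]:
    """Zero out a random subset of time frames (hypothetical frame-level masking)."""
    rng = random.Random(seed)
    n_masked = int(round(mask_ratio * len(spec)))
    masked_idx = sorted(rng.sample(range(len(spec)), n_masked))
    hidden = set(masked_idx)
    corrupted = [[0.0] * len(frame) if t in hidden else list(frame)
                 for t, frame in enumerate(spec)]
    return corrupted, masked_idx


def masked_mse(target: Sequence[Frame], prediction: Sequence[Frame],
               masked_idx: Sequence[int]) -> float:
    """Mean squared reconstruction error over the masked frames only."""
    total, count = 0.0, 0
    for t in masked_idx:
        for y, y_hat in zip(target[t], prediction[t]):
            total += (y - y_hat) ** 2
            count += 1
    return total / count if count else 0.0
```

In training, `prediction` would come from the network applied to `corrupted`; because only masked frames contribute to the loss, the model has to infer hidden spectro-temporal content from its context.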

Core claim

The authors establish that self-supervised pretraining on unlabeled Canary and Finch audio, followed by supervised training with augmentation and semi-supervised post-training, yields accurate frame-level syllable detectors even under extreme label scarcity, succeeding on songs marked by rapid vocalizations, brief intervals, broadband sweeps, and spectrally similar syllables that demand fine-grained distinctions.

What carries the argument

The three-stage training pipeline of self-supervised pretraining via masked prediction or online clustering, supervised training with augmentation, and semi-supervised refinement applied to a Residual Multi-Layer Perceptron Recurrent Neural Network.
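For the online-clustering variant, one widely used recipe (SwAV, which appears in the reference list alongside Sinkhorn distances) turns prototype similarity scores into balanced soft cluster assignments with a few Sinkhorn-Knopp iterations. The sketch below shows only that balancing step; whether the paper's Online Syllable Clustering uses exactly this procedure, and the values of `epsilon` and `n_iters`, are assumptions.

```python
import math
from typing import List


def sinkhorn_assignments(scores: List[List[float]], epsilon: float = 0.05,
                         n_iters: int = 3) -> List[List[float]]:
    """SwAV-style Sinkhorn-Knopp: balanced soft assignments of B frames to K prototypes.

    scores[b][k] is the similarity between frame embedding b and prototype k
    (assumed to be a cosine similarity in [-1, 1]). The result is a B x K matrix
    whose rows sum to 1 and whose columns each receive roughly B / K of the mass,
    which keeps all frames from collapsing onto a single prototype.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    peak = max(max(row) for row in scores)  # subtract the max to avoid overflow in exp
    q = [[math.exp((s - peak) / epsilon) for s in row] for row in scores]
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(n_iters):
        # Give every prototype (column) an equal share, 1 / K, of the mass.
        col_sums = [sum(q[b][k] for b in range(n_rows)) for k in range(n_cols)]
        q = [[q[b][k] / (col_sums[k] * n_cols) for k in range(n_cols)]
             for b in range(n_rows)]
        # Give every frame (row) an equal share, 1 / B, of the mass.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[b] * n_rows) for v in q[b]] for b in range(n_rows)]
    # Scale rows back up so each one is a probability distribution over prototypes.
    return [[v * n_rows for v in row] for row in q]
```

In SwAV the balanced assignments act as targets for a cross-entropy loss on the network's own prototype predictions; the balancing is what prevents the trivial solution of putting every syllable frame in one cluster.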

If this is right

  • Accurate syllable-level annotation of individual birds becomes practical with far smaller labeled datasets than previously required.
  • The pipeline supplies a working baseline for annotating songs from additional bird species that share rapid and spectrally complex patterns.
  • Self-supervised embeddings produced in the first stage support both linear probing for specific tasks and fully unsupervised exploration of song structure.
  • Annotation costs drop sharply for studies in bioacoustics, neuroscience, and linguistics that depend on detailed syllable parsing.
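The unsupervised exploration in the third bullet can be made concrete with the majority-vote re-labeling that the Figure 7 caption mentions. A minimal sketch, assuming every syllable already carries a cluster id from some clustering of its SSL embedding and a human label used only for evaluation, maps each cluster to its most frequent label; `majority_vote_relabel` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def majority_vote_relabel(cluster_ids: Sequence[int],
                          labels: Sequence[Hashable]) -> List[Hashable]:
    """Replace each syllable's cluster id with the most common human label in that cluster."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    votes: Dict[int, Counter] = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        votes[cluster][label] += 1
    # Counter.most_common breaks ties in favor of the label encountered first.
    cluster_to_label = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [cluster_to_label[c] for c in cluster_ids]
```

Agreement after re-labeling rises trivially as the number of clusters grows, so chance-corrected clustering scores such as those discussed by Vinh et al. (reference 49) are a useful complement.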

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same staged approach could lower labeling needs when analyzing vocalizations of other animals such as whales or primates.
  • Adaptations of the pipeline might aid low-resource audio classification tasks beyond birdsong.
  • Testing alternative self-supervised objectives could reveal which pretraining method best isolates fine spectral differences.

Load-bearing premise

Self-supervised pretraining on unlabeled audio captures the fine-grained spectro-temporal features needed to tell apart spectrally similar syllables in rapid sequences with brief gaps.

What would settle it

A controlled test on Canary song showing that the three-stage model achieves no higher syllable detection accuracy than a purely supervised baseline trained on the identical small labeled set would falsify the value of the self-supervised stages.
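Running that test requires one shared score on the same held-out frames. Assuming frame-level macro F1 (the metric named in the referee report) over per-frame syllable labels, a minimal version is sketched below; excluding a silence class through `ignore` is an assumption about how such scores are commonly reported, not a detail from the paper.

```python
from typing import Hashable, Iterable, Sequence


def frame_macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                   ignore: Iterable[Hashable] = ()) -> float:
    """Macro-averaged F1 over per-frame labels; classes in `ignore` (e.g. silence) are skipped."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have equal length")
    skip = set(ignore)
    classes = (set(y_true) | set(y_pred)) - skip
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Scoring the three-stage model and the supervised-only baseline with the same function and the same `ignore` set keeps the comparison like-for-like.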

Figures

Figures reproduced from arXiv: 2511.12158 by Houtan Ghaffari, Lukas Rauch, Paul Devos.

Figure 1. An example of syllable prediction using the proposed model and the three-stage training framework.
…repetitions. To be more precise, notice that not all syllable types are present in a single recording or song. Additionally, the recordings have variable durations, ranging from 1 to 40 seconds. This few-shot subset is the minimum number of recording files that ensures each syllable is vocalized at least once…
Figure 2. The proposed Res-MLP-RNN neural network architecture with an input spectrogram example. The model has roughly 10M parameters. The Masked Prediction and Online Clustering heads are used in two separate self-supervised pretraining tasks. The Classifier head is used for supervised and post-training semi-supervised syllable detection tasks. The first linear layer of the first block projects 256 frequency bins…
Figure 3. An example from the birdsong MAE model for the masked prediction task.
The model is used in different ways for experiments and ablation studies. However, the proposed training framework has three clear stages. First, pretrain the model on all available species datasets, three Canaries in this work, using either MAE or OSC. Second, after pretraining, replace the SSL head with a linear classifier for supervi…
Figure 4. Online Syllable Clustering loss.
The model is trained for 200 epochs using the AdamW optimizer [61] with a weight decay of 5e−2, a linear learning rate warmup from 1e−6 to 5e−4 in 20 epochs, then a constant rate for 10 epochs, and finally a cosine decay for the remaining 170 epochs to the minimum learning rate of 1e−6. The experiment is conducted once, and the resulting SSL model is used in all subsequent …
Figure 5. Two syllables from each bird with their true and predicted distribution of duration across training sizes and models. MAE means the model was pretrained by the SSL masked prediction prior to the supervised finetuning. Equivalently, OSC refers to the Online Syllable Clustering pretraining task. The predictions are faithful, even with such small training sizes.
…dimensions using Principal Component Analysis (…
Figure 6. Syllable transition matrices for the llb3 canary across different SSL pretraining tasks and train set sizes. Even with few-shot finetuning, the main structure of the syllable transitions is visible. The True Matrix is the same in both rows, and it is calculated based on human annotation.
…datasets, can be open-sourced for off-the-shelf usage in the future. Then, the expert can perform a similar clustering an…
Figure 7. t-SNE plots of the clustered SSL embeddings of syllables after re-labeling via majority vote. To avoid clutter, the plots show the density contours of each cluster, estimated from at most 2000 random syllables.
…and established the proposed model and three-stage training framework as a suitable method for birdsong analysis. The Application section further expanded on the utility of SSL models. As alre…
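Figure 6 compares transition matrices computed from predictions with one computed from human annotation. A common construction, assumed here because the caption does not state the normalization, counts consecutive syllable pairs within each song and normalizes each row into probabilities; `transition_matrix` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Sequence


def transition_matrix(songs: Sequence[Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Row-normalized first-order transition probabilities between syllable types.

    Each song is a sequence of syllable labels with silences already removed;
    transitions are counted within a song only, never across song boundaries.
    """
    counts: Counter = Counter()
    syllables = sorted({s for song in songs for s in song})
    for song in songs:
        counts.update(zip(song, song[1:]))  # consecutive (from, to) pairs
    matrix: Dict[str, Dict[str, float]] = {}
    for a in syllables:
        row_total = sum(counts[(a, b)] for b in syllables)
        matrix[a] = {b: (counts[(a, b)] / row_total if row_total else 0.0)
                     for b in syllables}
    return matrix
```

Applying the same function to syllable sequences decoded from a model's frame predictions and to the human annotation yields two matrices over the same syllable types that can be compared entry by entry.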
Original abstract

Research in bioacoustics, neuroscience, and linguistics often uses birdsong as a proxy to acquire knowledge across diverse areas. This requires audio models to annotate and parse the birdsong. Developing such models requires precise, syllable-level annotated training data. Therefore, automated methods that reduce annotation costs are in demand. This work presents a data-efficient birdsong annotator called Residual Multi-Layer Perceptron Recurrent Neural Network. It then presents a three-stage training pipeline for developing reliable birdsong syllable detectors with minimal annotation. The first stage is self-supervised learning from unlabeled data. Two of the most successful pretraining paradigms are explored, namely, masked prediction and online clustering. The second stage is supervised training with effective data augmentation to produce a robust frame-level syllable detector for each individual. The third stage is a semi-supervised post-training step that refines each individual's model using unlabeled data. The effectiveness of this approach is demonstrated for the Canary song in extreme label-scarcity scenarios. From a signal-processing perspective, the Canary song exhibits one of the most challenging spectro-temporal patterns for algorithmic time-series annotation: rapid vocalizations, brief inter-syllabic intervals, fast and broadband frequency sweeps, and spectrally similar syllables that require fine-grained features to distinguish. Hence, a successful syllable detection algorithm for Canary also establishes a robust baseline for other birds. This methodological generalization is validated in a case study of Bengalese Finch song annotation. Finally, the potential of self-supervised embeddings is assessed for linear probing and unsupervised birdsong analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Residual MLP-RNN architecture together with a three-stage training pipeline for data-efficient syllable-level birdsong annotation. Stage 1 performs self-supervised pretraining on unlabeled audio via masked prediction or online clustering; Stage 2 trains a frame-level detector with data augmentation; Stage 3 applies semi-supervised refinement on additional unlabeled segments. Effectiveness is shown for Canary song under extreme label scarcity (down to single-digit annotated minutes) and validated on Bengalese Finch, with quantitative metrics including frame-level F1 and syllable boundary error.
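The syllable boundary error mentioned in this summary, and the duration distributions in Figure 5, both start from converting frame-level predictions into syllable segments. The sketch below does that by run-length grouping of identical frame labels; the frame hop `hop_s`, the `"sil"` label, and the nearest-onset rule in `mean_onset_error` are assumptions, not the paper's definitions.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, float, float]  # (label, onset in seconds, offset in seconds)


def frames_to_segments(frame_labels: Sequence[str], hop_s: float,
                       silence: str = "sil") -> List[Segment]:
    """Group runs of identical non-silence frame labels into syllable segments."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # A run ends at the end of the sequence or where the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            label = frame_labels[start]
            if label != silence:
                segments.append((label, start * hop_s, i * hop_s))
            start = i
    return segments


def durations(segments: Sequence[Segment], label: str) -> List[float]:
    """Durations (seconds) of all segments with the given label."""
    return [off - on for lab, on, off in segments if lab == label]


def mean_onset_error(true_segs: Sequence[Segment], pred_segs: Sequence[Segment]) -> float:
    """Mean absolute distance from each true onset to the nearest predicted onset (hypothetical)."""
    if not true_segs or not pred_segs:
        raise ValueError("need at least one true and one predicted segment")
    pred_onsets = [on for _, on, _ in pred_segs]
    return sum(min(abs(on - p) for p in pred_onsets) for _, on, _ in true_segs) / len(true_segs)
```

Histograms of `durations(...)` per syllable type, computed for true and predicted segments, give the kind of comparison shown in Figure 5.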

Significance. If the reported gains hold, the work supplies a practical route to lower annotation costs in bioacoustics and neuroscience while handling one of the most demanding vocalization patterns (rapid sweeps, spectrally similar syllables). Credit is due for the explicit stress-test design on Canary, the provision of exact label counts, augmentation details, and linear-probing results that demonstrate the value of the self-supervised embeddings over random initialization.

major comments (2)
  1. §4 (Results): the manuscript states that the semi-supervised refinement yields measurable improvement on held-out unlabeled segments, yet the precise procedure for generating and filtering pseudo-labels (threshold, selection criterion) is not specified; this detail is load-bearing for assessing whether error propagation is controlled in the extreme-scarcity regime.
  2. Table 2 (Canary scarcity experiments): while comparisons to supervised baselines and prior tools are presented, the exact number of annotated minutes corresponding to each reported F1 score must be stated explicitly in the table caption or a dedicated column so that the 'extreme label-scarcity' claim can be evaluated quantitatively.
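The pseudo-labeling detail requested in the first major comment can be pictured with a common confidence-threshold recipe (as in Pseudo-Label and FixMatch, both in the reference list): keep only frames whose top predicted probability clears a threshold and train on those hard labels. The threshold value and the per-frame granularity below are assumptions; this is not the authors' unreported procedure.

```python
from typing import List, Optional, Sequence


def pseudo_label_frames(probs: Sequence[Sequence[float]],
                        threshold: float = 0.9) -> List[Optional[int]]:
    """Return the argmax class for confident frames and None for frames to ignore.

    probs[t] is a probability distribution over syllable classes for frame t,
    produced by the current (teacher) model on unlabeled audio.
    """
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=p.__getitem__)
        labels.append(best if p[best] >= threshold else None)
    return labels


def kept_fraction(labels: Sequence[Optional[int]]) -> float:
    """Share of frames that survive the filter, a simple diagnostic of how much is pseudo-labeled."""
    return sum(lab is not None for lab in labels) / len(labels) if labels else 0.0
```

Reporting `kept_fraction` alongside the threshold would let readers judge how much of the refinement signal comes from confident, and therefore less error-prone, frames.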
minor comments (2)
  1. Abstract: no numerical results (F1, boundary error, or label counts) are given, which weakens the summary of the central claim.
  2. §3.1: the description of the Residual MLP-RNN would benefit from an explicit equation or diagram showing the residual connections and RNN integration.
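For the minor comment on §3.1, one possible form of such an equation is sketched below. It assumes a pre-norm residual MLP block assembled from components the reference list cites (layer normalization, the Swish/SiLU activation, dropout, residual connections), followed by a bidirectional LSTM and a per-frame softmax classifier. The symbols, the block ordering, and the bidirectionality are hypothetical; only the 256-bin input comes from the Figure 2 caption.

```latex
\begin{aligned}
\mathbf{h}^{(0)}_t &= W_{\text{in}}\,\mathbf{x}_t, \qquad \mathbf{x}_t \in \mathbb{R}^{256},\\
\mathbf{h}^{(\ell+1)}_t &= \mathbf{h}^{(\ell)}_t + \mathrm{Dropout}\!\left(W^{(\ell)}_2\,\mathrm{SiLU}\!\left(W^{(\ell)}_1\,\mathrm{LN}\!\left(\mathbf{h}^{(\ell)}_t\right)\right)\right), \qquad \ell = 0,\dots,L-1,\\
\mathbf{g}_{1:T} &= \mathrm{BiLSTM}\!\left(\mathbf{h}^{(L)}_{1:T}\right), \qquad
\hat{\mathbf{y}}_t = \mathrm{softmax}\!\left(W_{\text{cls}}\,\mathbf{g}_t\right).
\end{aligned}
```

Here $t$ indexes spectrogram frames and $\ell$ indexes residual blocks; the classifier head $W_{\text{cls}}$ would be swapped for a masked-prediction or clustering head during pretraining.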

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments on our manuscript. We address each major comment below and will revise the manuscript accordingly to improve clarity and reproducibility.

  1. Referee: §4 (Results): the manuscript states that the semi-supervised refinement yields measurable improvement on held-out unlabeled segments, yet the precise procedure for generating and filtering pseudo-labels (threshold, selection criterion) is not specified; this detail is load-bearing for assessing whether error propagation is controlled in the extreme-scarcity regime.

    Authors: We agree that the precise procedure for generating and filtering pseudo-labels must be specified to allow proper evaluation of error propagation control. The current manuscript describes Stage 3 at a high level but does not detail the threshold or selection criterion. In the revised version we will add an explicit description of the pseudo-labeling process, including the confidence threshold applied and the filtering rule used to select reliable segments. revision: yes

  2. Referee: Table 2 (Canary scarcity experiments): while comparisons to supervised baselines and prior tools are presented, the exact number of annotated minutes corresponding to each reported F1 score must be stated explicitly in the table caption or a dedicated column so that the 'extreme label-scarcity' claim can be evaluated quantitatively.

    Authors: We acknowledge that placing the exact annotated-minute counts directly in Table 2 would make the scarcity experiments easier to evaluate at a glance. Although these quantities appear in the surrounding text, we will revise the table by adding a dedicated column (or updating the caption) to list the precise number of annotated minutes for each F1 score. revision: yes

Circularity Check

0 steps flagged

No significant circularity; the derivation is a self-contained empirical pipeline.


The paper presents a three-stage empirical pipeline (self-supervised pretraining via masked prediction or online clustering on unlabeled audio, followed by supervised training with augmentation, then semi-supervised refinement) for syllable detection. No equations, fitted parameters, or derivations are described that reduce the reported metrics (frame-level F1, boundary error) to inputs by construction. Claims rest on quantitative comparisons to supervised baselines and prior tools, with explicit label-scarcity experiments and cross-species validation. No self-citation load-bearing steps or ansatz smuggling appear in the provided methods description; the architecture and objectives follow standard ML practices without internal reduction to the target results.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard self-supervised learning assumptions plus domain-specific expectations about birdsong spectro-temporal structure; only limited information is available from the abstract alone.

free parameters (1)
  • Pretraining and fine-tuning hyperparameters
    Learning rates, masking ratios, cluster counts, and augmentation strengths are chosen or fitted during the three stages.
axioms (1)
  • domain assumption: Unlabeled birdsong recordings contain sufficient structure for masked prediction or clustering to learn features useful for syllable discrimination.
    Invoked by the first stage of the pipeline described in the abstract.
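The Figure 4 caption gives a concrete instance of these hyperparameters for SSL pretraining: 200 epochs of AdamW with weight decay 5e−2, a linear warmup from 1e−6 to 5e−4 over 20 epochs, 10 epochs at a constant rate, then cosine decay to 1e−6 over the remaining 170 epochs. The function below reproduces that schedule; evaluating it once per epoch rather than per optimizer step is an assumption, and `pretrain_lr` is a hypothetical name.

```python
import math


def pretrain_lr(epoch: float, base_lr: float = 5e-4, min_lr: float = 1e-6,
                warmup: int = 20, hold: int = 10, total: int = 200) -> float:
    """Warmup -> constant -> cosine-decay learning rate, following the Figure 4 caption."""
    if epoch < warmup:
        # Linear ramp from min_lr at epoch 0 to base_lr at epoch == warmup.
        return min_lr + (base_lr - min_lr) * epoch / warmup
    if epoch < warmup + hold:
        return base_lr
    # Cosine decay over the remaining epochs, reaching min_lr at epoch == total.
    progress = min((epoch - warmup - hold) / (total - warmup - hold), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With PyTorch, one could create `torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=5e-2)` and wrap it in `torch.optim.lr_scheduler.LambdaLR` with `lr_lambda=lambda e: pretrain_lr(e) / 5e-4`, stepping the scheduler once per epoch; `model` is a placeholder.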

pith-pipeline@v0.9.0 · 5807 in / 1169 out tokens · 70529 ms · 2026-05-21T19:27:41.841405+00:00 · methodology



Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 5 internal anchors

  1. [1]

    Berwick, R. C., Okanoya, K., Beckers, G. J. & Bolhuis, J. J. Songs to syntax: the linguistics of birdsong. Trends in cognitive sciences 15, 113–121 (2011)

  2. [2]

    Mets, D. G. & Brainard, M. S. Learning is enhanced by tailoring instruction to individual genetic differences. eLife 8, e47216 (2019)

  3. [3]

    Burkett, Z. D., Day, N. F., Peñagarikano, O., Geschwind, D. H. & White, S. A. VoICE: A semi-automated pipeline for standardizing vocal analysis across models. Sci. reports 5, 10237 (2015)

  4. [4]

    Cohen, Y. et al. Automated annotation of birdsong with a neural network that segments spectrograms. eLife 11, e63853 (2022)

  5. [5]

    Morita, T., Koda, H., Okanoya, K. & Tachibana, R. O. Measuring context dependency in birdsong using artificial neural networks. PLoS computational biology 17, e1009707 (2021)

  6. [6]

    Podos, J. & Webster, M. S. Ecology and evolution of bird sounds. Curr. Biol. 32, R1100–R1104, DOI: https://doi.org/10.1016/j.cub.2022.07.073 (2022)

  7. [7]

    Huetz, C., Del Negro, C., Lehongre, K., Tarroux, P. & Edeline, J.-M. The selectivity of canary HVC neurons for the bird’s own song: Rate coding, temporal coding, or both? J. Physiol. 98, 395–406, DOI: https://doi.org/10.1016/j.jphysparis.2005.09.011 (2004). Decoding and interfacing the brain: from neuronal assemblies to cyborgs

  8. [8]

    Ghaffari, H. & Devos, P. Consistent birdsong syllable segmentation using deep semi-supervised learning. In Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023, 6277–6284, DOI: 10.61782/fa.2023.0897 (2023)

  9. [9]

    Stowell, D. & Plumbley, M. D. Birdsong and C4DM: A survey of UK birdsong and machine recognition for music researchers. Centre for Digit. Music, Queen Mary Univ. London, Tech. Rep. C4DM-TR-09-12 (2010)

  10. [10]

    Sainburg, T., Thielk, M. & Gentner, T. Q. Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires. PLoS computational biology 16, e1008228 (2020)

  11. [11]

    Boncoraglio, G. & Saino, N. Habitat structure and the evolution of bird song: a meta-analysis of the evidence for the acoustic adaptation hypothesis. Funct. Ecol. 134–142 (2007)

  12. [12]

    Morfi, V., Lachlan, R. F. & Stowell, D. Deep perceptual embeddings for unlabelled animal sound events. The J. Acoust. Soc. Am. 150, 2–11 (2021)

  13. [13]

    Ghaffari, H. & Devos, P. Robust weakly supervised bird species detection via peak aggregation and pie. IEEE Transactions on Audio, Speech Lang. Process. 33, 1427–1439, DOI: 10.1109/TASLPRO.2025.3552983 (2025). 15. Rauch, L. et al. Can masked autoencoders also listen to birds? Transactions on Mach. Learn. Res. (2025). 16. Rauch, L. et al. Unmute the patch tokens: Re...

  14. [14]

    Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 4171–4186 (2019)

  15. [15]

    He, K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 16000–16009 (2022). 20. Huang, P.-Y. et al. Masked autoencoders that listen. Adv. Neural Inf. Process. Syst. 35, 28708–28720 (2022). 21. Vaswani, A. et al. Attention is all you need. Adv. neural information processing ...

  16. [16]

    Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021)

  17. [17]

    Caron, M., Bojanowski, P., Joulin, A. & Douze, M. Deep clustering for unsupervised learning of visual features. In Proceedings of the European conference on computer vision (ECCV), 132–149 (2018)

  18. [18]

    Caron, M. et al. Unsupervised learning of visual features by contrasting cluster assignments. Adv. neural information processing systems 33, 9912–9924 (2020)

  19. [19]

    Caron, M. et al. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, 9650–9660 (2021)

  20. [20]

    Asano, Y. M., Rupprecht, C. & Vedaldi, A. Self-labelling via simultaneous clustering and representation learning. In International Conference on Learning Representations (2020)

  21. [21]

    Assran, M. et al. Masked siamese networks for label-efficient learning. In European conference on computer vision, 456–473 (Springer, 2022)

  22. [22]

    Chen, S. et al. BEATs: Audio pre-training with acoustic tokenizers. In Krause, A. et al. (eds.) Proceedings of the 40th International Conference on Machine Learning, vol. 202 of Proceedings of Machine Learning Research, 5178–5193 (PMLR, 2023)

  23. [23]

    Oquab, M. et al. DINOv2: Learning robust visual features without supervision. Transactions on Mach. Learn. Res. (2024). Featured Certification. 30. Murphy, K. P. Probabilistic machine learning: an introduction (MIT press, 2022)

  24. [24]

    Grandvalet, Y. & Bengio, Y. Semi-supervised learning by entropy minimization. Adv. neural information processing systems 17 (2004)

  25. [25]

    Lee, D.-H. et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on challenges in representation learning, ICML, vol. 3, 896 (Atlanta, 2013)

  26. [26]

    Sohn, K. et al. FixMatch: Simplifying semi-supervised learning with consistency and confidence. Adv. neural information processing systems 33, 596–608 (2020)

  27. [27]

    Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

  28. [28]

    Tarvainen, A. & Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv. neural information processing systems 30 (2017)

  29. [29]

    Soma, M. & Garamszegi, L. Z. Rethinking birdsong evolution: meta-analysis of the relationship between song complexity and reproductive success. Behav. Ecol. 22, 363–371 (2011)

  30. [30]

    Terpstra, N. J., Bolhuis, J. J. & den Boer-Visser, A. M. An analysis of the neural representation of birdsong memory. J. Neurosci. 24, 4971–4977 (2004). 38. Williams, H. Birdsong and singing behavior. Annals New York Acad. Sci. 1016, 1–30 (2004)

  31. [31]

    Lipkind, D. & Tchernichovski, O. Quantification of developmental birdsong learning from the subsyllabic scale to cultural evolution. Proc. Natl. Acad. Sci. 108, 15572–15579 (2011)

  32. [32]

    Sober, S. J. & Brainard, M. S. Adult birdsong is actively maintained by error correction. Nat. neuroscience 12, 927–931 (2009)

  33. [33]

    Grill, J.-B. et al. Bootstrap your own latent: a new approach to self-supervised learning. Adv. neural information processing systems 33, 21271–21284 (2020)

  34. [34]

    Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International conference on machine learning, 1597–1607 (PMLR, 2020)

  35. [35]

    Rauch, L. et al. BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics. In International Conference on Learning Representations (ICLR) (2025)

  36. [36]

    Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Adv. neural information processing systems 26 (2013)

  37. [37]

    He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 9729–9738 (2020). 46. Zhou, J. et al. iBOT: Image BERT pre-training with online tokenizer. Int. Conf. on Learn. Represent. (ICLR) (2022). 47. Joulin, A. & ...

  38. [38]

    Daou, A., Johnson, F., Wu, W. & Bertram, R. A computational tool for automated large-scale analysis and measurement of bird-song syntax. J. neuroscience methods 210, 147–160 (2012)

  39. [39]

    Woo, S. et al. ConvNeXt V2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 16133–16142 (2023)

  40. [40]

    Ghaffari, H. & Devos, P. On the role of audio frontends in bird species recognition. Ecol. Informatics 81, 102573, DOI: https://doi.org/10.1016/j.ecoinf.2024.102573 (2024)

  41. [41]

    Ferreira-Paiva, L., Alfaro-Espinoza, E., Almeida, V. M., Felix, L. B. & Neves, R. V. A survey of data augmentation for audio classification. In Congresso Brasileiro de Automática-CBA, vol. 3 (2022)

  42. [42]

    Zollinger, S. A. & Brumm, H. Why birds sing loud songs and why they sometimes don’t. Animal Behav. 105, 289–295, DOI: https://doi.org/10.1016/j.anbehav.2015.03.030 (2015). 53. Park, D. S. et al. SpecAugment: A simple data augmentation method for automatic speech recognition. Interspeech 2019, DOI: 10.21437/interspeech.2019-2680 (2019)

  43. [43]

    Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780, DOI: 10.1162/neco.1997.9.8.1735 (1997). 55. Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)

  44. [44]

    Ramachandran, P., Zoph, B. & Le, Q. V. Swish: a self-gated activation function. arXiv preprint arXiv:1710.05941 7, 5 (2017)

  45. [45]

    Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1929–1958 (2014)

  46. [46]

    He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (2016). 59. Kingma, D. P. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  47. [47]

    Breiman, L., Friedman, J., Olshen, R. A. & Stone, C. J. Classification and regression trees (Chapman and Hall/CRC, 2017)

  48. [48]

    Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (2019)

  49. [49]

    Vinh, N. X., Epps, J. & Bailey, J. Information theoretic measures for clusterings comparison: is a correction for chance necessary? In Proceedings of the 26th annual international conference on machine learning, 1073–1080 (2009)

  50. [50]

    Tchernichovski, O., Eisenberg-Edidin, S. & Jarvis, E. D. Balanced imitation sustains song culture in zebra finches. Nat. communications 12, 2562 (2021)

    Author contributions statement: H.G. conceptualized the work, reviewed the literature, conducted the experiments, analyzed the results, made the figures, validated the results and arguments, and wrote the or...