Data Augmentation for Instrument Classification Robust to Audio Effects

António Ramires; Xavier Serra

Training instrument classifiers with audio-effect augmentation improves accuracy on processed one-shot sounds used in electronic music production.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.

T0 review · grok-4.3

2026-05-24 18:54 UTC pith:BT4CAD3E

load-bearing objection This is a standard empirical check on whether data augmentation with audio effects improves instrument classification robustness for sample packs, with no new methods or surprising results.

arxiv 1907.08520 v1 pith:BT4CAD3E submitted 2019-07-19 cs.SD eess.AS

keywords instrument classification, data augmentation, audio effects, electronic music production, one-shot sounds, robustness, sample packs

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates a state-of-the-art instrument classifier on one-shot sounds that have been processed with common audio effects. It applies data augmentation during training by adding the same effects to the original dataset and measures resulting changes in classification accuracy for each effect. A sympathetic reader would care because automatic classification of sample packs is only practical if the labels remain reliable after producers apply reverb, compression, distortion and similar processing. The work shows that augmentation narrows the performance gap between clean and effected sounds without changing the underlying instrument labels. This directly addresses the mismatch between laboratory training data and real electronic-music workflows.

Core claim

A model trained on a large set of clean one-shot instrumental sounds loses accuracy when the test sounds receive audio effects typical of electronic music production; retraining the same model with those effects included as data augmentation restores most of the lost accuracy, and the paper reports the per-effect contribution to the recovery.
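
The phrase "restores most of the lost accuracy" can be made concrete as a ratio. The sketch below is one possible way to express it, not a metric taken from the paper: `accuracy` and `recovered_fraction` are hypothetical helper names, and the three accuracies they take would have to come from the paper's own tables.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recovered_fraction(acc_clean, acc_fx_baseline, acc_fx_augmented):
    """Share of the clean-to-effected accuracy drop that augmentation wins back.

    acc_clean:        baseline model on unprocessed test sounds
    acc_fx_baseline:  baseline model on effected test sounds
    acc_fx_augmented: augmented model on the same effected test sounds
    """
    gap = acc_clean - acc_fx_baseline
    if gap <= 0:
        raise ValueError("baseline shows no accuracy drop on effected sounds")
    return (acc_fx_augmented - acc_fx_baseline) / gap
```

A value near 1 would mean the augmented model closes the gap almost entirely; a value near 0 would mean augmentation barely helps for that effect.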

What carries the argument

Data augmentation that applies audio effects (reverb, delay, distortion, compression, EQ, etc.) to the training examples while keeping the original instrument label.
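
A minimal sketch of label-preserving augmentation, under stated assumptions: the two effects here (a tanh soft-clip and a single-tap delay) are simple stand-ins rather than the paper's implementations, and the `EFFECTS` registry, the `augment` function, and every parameter range are hypothetical.

```python
import math
import random


def distortion(samples, drive):
    """Soft-clip each sample with tanh; higher drive means heavier clipping."""
    return [math.tanh(drive * s) for s in samples]


def delay(samples, delay_samples, mix):
    """Add one delayed, attenuated copy of the signal (no feedback)."""
    out = list(samples) + [0.0] * delay_samples
    for i, s in enumerate(samples):
        out[i + delay_samples] += mix * s
    return out


# Hypothetical registry: effect name -> (function, parameter sampler).
# The ranges are placeholders, not the values used in the paper.
EFFECTS = {
    "distortion": (distortion, lambda rng: {"drive": rng.uniform(1.0, 10.0)}),
    "delay": (
        delay,
        lambda rng: {"delay_samples": rng.randint(1, 8), "mix": rng.uniform(0.1, 0.5)},
    ),
}


def augment(dataset, effect_name, rng):
    """Return the clean examples plus one effected copy of each, same label."""
    fx, sample_params = EFFECTS[effect_name]
    augmented = list(dataset)
    for samples, label in dataset:
        augmented.append((fx(samples, **sample_params(rng)), label))
    return augmented
```

The key property is in the last line of the loop: the audio changes, the label does not. That is exactly the premise the review flags as load-bearing.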

Load-bearing premise

The chosen audio effects and their parameter ranges are representative of real production processing and never change an instrument's identity enough to require a new label.

What would settle it

Measure classification accuracy on a held-out set of one-shot sounds that have been processed with the same effects but at parameter values never seen during augmentation; if accuracy remains high only when augmentation was used in training, the claim holds.
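
One way to set up such a test is to keep the training and test parameter ranges disjoint. The sketch below assumes a simple scheme in which the top slice of each range is reserved for testing; `split_range`, `sample_params`, and the reverb parameter names and ranges are all hypothetical and are not taken from the paper.

```python
import random


def split_range(low, high, held_out_fraction=0.25):
    """Reserve the top slice of [low, high] for testing; train on the rest."""
    if not 0.0 < held_out_fraction < 1.0:
        raise ValueError("held_out_fraction must be strictly between 0 and 1")
    cut = high - held_out_fraction * (high - low)
    return (low, cut), (cut, high)


def sample_params(ranges, rng):
    """Draw one value per parameter, uniformly from its (low, high) interval."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


# Hypothetical reverb parameter ranges, used only for illustration.
full_ranges = {"decay_seconds": (0.2, 4.0), "wet_mix": (0.1, 0.9)}
train_ranges = {k: split_range(*v)[0] for k, v in full_ranges.items()}
test_ranges = {k: split_range(*v)[1] for k, v in full_ranges.items()}
```

Holding out only the upper slice probes extrapolation toward heavier processing. Holding out a middle slice instead would probe interpolation, which is a weaker test of the claim.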

If this is right

Classifiers trained this way can label large sample-pack libraries without manual correction after common production processing.
The per-effect accuracy tables identify which processing steps (for example heavy distortion) still require additional techniques.
The same augmentation pipeline can be reused for other audio classification tasks that encounter production effects.
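
A per-effect accuracy table of the kind mentioned above could be assembled as follows. This is a sketch with hypothetical names (`per_effect_accuracy`, `weakest_effects`) and an assumed record layout of `(effect, true_label, predicted_label)`; it does not reproduce any of the paper's numbers.

```python
from collections import defaultdict


def per_effect_accuracy(records):
    """Group (effect, true_label, predicted_label) records and score each effect."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for effect, true_label, predicted_label in records:
        totals[effect] += 1
        hits[effect] += int(true_label == predicted_label)
    return {effect: hits[effect] / totals[effect] for effect in totals}


def weakest_effects(table, threshold):
    """Effects whose accuracy falls below the threshold, worst first."""
    return sorted((e for e, acc in table.items() if acc < threshold), key=table.get)
```

Sorting worst-first makes it easy to see which effects would need techniques beyond augmentation.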

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the approach to multi-effect chains or to full music loops would test whether the robustness generalises beyond isolated one-shots.
If the method works, automatic tagging services could offer users the option to train custom models on their own effect chains.
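
A multi-effect chain can be modelled as function composition. The sketch below is illustrative only: `gain`, `hard_clip`, and `chain` are hypothetical toy helpers, not effects or code from the paper.

```python
from functools import partial, reduce


def gain(samples, factor):
    """Scale every sample by a constant factor."""
    return [factor * s for s in samples]


def hard_clip(samples, limit):
    """Clamp every sample to the interval [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


def chain(*effects):
    """Compose single-argument effects left to right into one callable."""
    return lambda samples: reduce(lambda x, fx: fx(x), effects, samples)


# Boost, then clip: a crude stand-in for an overdriven channel strip.
overdrive = chain(partial(gain, factor=2.0), partial(hard_clip, limit=1.0))
```

Because order matters in a chain (clipping before the boost gives a different signal), testing chains is a genuinely different experiment from testing each effect alone.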

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

This is a standard empirical check on whether data augmentation with audio effects improves instrument classification robustness for sample packs, with no new methods or surprising results.

The paper's core move is to take an existing instrument classifier, apply common audio effects as augmentation during training, and measure how accuracy holds up on processed one-shots. That directly targets a practical pain point in electronic music production libraries where sounds rarely stay dry. The setup is sensible: they pick effects that matter in real workflows and check per-effect impact rather than lumping everything together. Credit for focusing on one-shot classification instead of the more common mixed-track setting. The evaluation design itself looks clean and reproducible on paper.

The main limitation is that this is an application of a well-known technique to a known task rather than a new framework or derivation. Without the actual accuracy numbers, baselines, or statistical tests in front of us, it is difficult to judge whether the robustness gains are large enough to matter or just incremental. The assumption that the chosen effects preserve instrument identity is reasonable but would benefit from more explicit validation.

This work is aimed at practitioners building tools for sample navigation and library search. A reader already working on audio ML for music production would find the per-effect breakdown useful as a reference. It is solid enough on its own terms to deserve peer review even though the contribution is modest.

Referee Report

0 major / 2 minor

Summary. The manuscript evaluates the robustness of a state-of-the-art instrument classification model to audio effects commonly used in electronic music production. It trains the model on a large dataset of one-shot instrumental sounds and uses data augmentation with audio effects to assess how each effect influences classification accuracy on processed sounds.

Significance. If the empirical results demonstrate that augmentation with representative effects measurably improves accuracy on processed sounds while preserving performance on clean inputs, the work would offer a practical technique for deploying classifiers on real sample packs. The focus on electronic music production (EMP) workflows addresses a gap between standard instrument classification benchmarks and production use cases.

minor comments (2)

[Abstract] The evaluation plan is described but no quantitative results, dataset sizes, model details, or statistical tests are provided, making it difficult to assess the strength of the robustness claims without the full experimental section.
The weakest assumption (effects preserve instrument identity) is stated but receives no explicit validation or discussion of edge cases where heavy processing might alter perceived timbre enough to warrant relabeling.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments were listed in the provided report.

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical robustness evaluation: a state-of-the-art model is trained on one-shot instrumental sounds and tested after applying audio effects via data augmentation, with accuracy measured per effect. No equations, parameter fits, uniqueness theorems, or self-citation chains are invoked to derive or predict any quantity; the reported accuracies are direct experimental outcomes rather than quantities forced by construction from the training procedure itself. The evaluation design is therefore self-contained against external benchmarks and contains no load-bearing steps that reduce to the inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger is inferred from the stated approach. The work relies on an existing large dataset and state-of-the-art model whose training details are not described here.

axioms (1)

domain assumption Audio effects applied to one-shot sounds preserve the instrument label for classification purposes.
Implicit in the plan to measure classification accuracy on processed versions of the same sounds.

pith-pipeline@v0.9.0 · 5694 in / 1205 out tokens · 24512 ms · 2026-05-24T18:54:58.727635+00:00 · methodology

Original abstract

Reusing recorded sounds (sampling) is a key component in Electronic Music Production (EMP), which has been present since its early days and is at the core of genres like hip-hop or jungle. Commercial and non-commercial services allow users to obtain collections of sounds (sample packs) to reuse in their compositions. Automatic classification of one-shot instrumental sounds allows automatically categorising the sounds contained in these collections, allowing easier navigation and better characterisation. Automatic instrument classification has mostly targeted the classification of unprocessed isolated instrumental sounds or detecting predominant instruments in mixed music tracks. For this classification to be useful in audio databases for EMP, it has to be robust to the audio effects applied to unprocessed sounds. In this paper we evaluate how a state of the art model trained with a large dataset of one-shot instrumental sounds performs when classifying instruments processed with audio effects. In order to evaluate the robustness of the model, we use data augmentation with audio effects and evaluate how each effect influences the classification accuracy.

Figures

Figures reproduced from arXiv: 1907.08520 by António Ramires, Xavier Serra.

**Figure 1.** Single-layer CNN architecture proposed in [9].

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 5 internal anchors

[1] INTRODUCTION: The repurposing of audio material, also known as sampling, has been a key component in Electronic Music Production (EMP) since its early days and became a practice which had a major influence in a large variety of musical genres. The availability of software such as Digital Audio Workstations, together with the audio sharing possibilities ...

[2] RELATED WORK: Automatic instrument classification can be split into two related tasks with a similar goal. The first is the identification of instruments in single instrument recordings (which can be isolated or overlapping notes) while the second is the recognition of the predominant instrument in a mixture of sounds. A thorough description of this task...

[3] METHODOLOGY: In our study we will conduct two experiments. First, we will try to understand how augmenting a dataset with specific effects can improve instrument classification and secondly, we will see if this augmentation can improve the robustness of a model to the selected effect. To investigate this, we process the training, validation and test sets o...

[4] RESULTS: Two experiments were conducted in our study. We firstly evaluated how augmenting the training set of NSynth [14] by applying audio effects to the sounds can improve the automatic classification on the instruments of the unmodified test set. In the second experiment we evaluated how robust a state of the art model for instrument classification is w...

[5] CONCLUSIONS: In this paper we evaluated how a state of the art algorithm for automatic instrument classification performs when classifying the NSynth dataset and how augmenting this dataset with audio effects commonly used in electronic music production influences its accuracy on both the original and processed versions of the audio. We identify that the a...

[6] ACKNOWLEDGMENTS: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068, MIP-Frontiers. We thank Matthew Davies for reviewing a draft of this paper and providing helpful feedback.

[7] Frederic Font, Gerard Roma, and Xavier Serra, “Freesound technical demo,” in ACM International Conference on Multimedia (MM’13), Barcelona, Spain, 2013, ACM, pp. 411–412.

[8] Perfecto Herrera-Boyer, Geoffroy Peeters, and Shlomo Dubnov, “Automatic classification of musical instrument sounds,” Journal of New Music Research, vol. 32, pp. 3–21, 2003.

[9] Masataka Goto, “RWC music database: Popular, classical, and jazz music databases,” in 3rd International Society for Music Information Retrieval Conference (ISMIR), 2002, pp. 287–288.

[10] Juan J Bosch, Jordi Janer, Ferdinand Fuhrmann, and Perfecto Herrera, “A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals,” in 13th International Society for Music Information Retrieval Conference (ISMIR), 2012, pp. 559–564.

[11] Oriol Romani Picas, Hector Parra Rodriguez, Dara Dabiri, Hiroshi Tokuda, Wataru Hariya, Koji Oishi, and Xavier Serra, “A real-time system for measuring sound goodness in instrumental sounds,” in Audio Engineering Society Convention 138, Warsaw, Poland, 2015, p. 9350.

[12] Venkatesh Shenoy Kadandale, “Musical Instrument Recognition in Multi-Instrument Audio Contexts,” MSc thesis, Universitat Pompeu Fabra, Oct. 2018.

[13] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436, 2015.

[14] Luis Perez and Jason Wang, “The effectiveness of data augmentation in image classification using deep learning,” arXiv preprint arXiv:1712.04621, 2017.

[15] Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, and Xavier Serra, “Timbre analysis of music audio signals with convolutional neural networks,” in 25th European Signal Processing Conference (EUSIPCO). IEEE, 2017, pp. 2744–2748.

[16] Yoonchang Han, Jaehun Kim, and Kyogu Lee, “Deep convolutional neural networks for predominant instrument recognition in polyphonic music,” IEEE/ACM Trans. Audio, Speech and Lang. Proc., vol. 25, no. 1, pp. 208–221, Jan. 2017.

[17] Peter Li, Jiyuan Qian, and Tian Wang, “Automatic instrument recognition in polyphonic music using convolutional neural networks,” arXiv preprint arXiv:1511.05520, 2015.

[18] Taejin Park and Taejin Lee, “Musical instrument sound classification with deep convolutional neural network using feature fusion approach,” arXiv preprint arXiv:1512.07370, 2015.

[19] Juhan Nam, Keunwoo Choi, Jongpil Lee, Szu-Yu Chou, and Yi-Hsuan Yang, “Deep learning for audio-based music classification and tagging: Teaching computers to distinguish rock from Bach,” IEEE Signal Processing Magazine, vol. 36, no. 1, pp. 41–51, Jan 2019.

[20] Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Mohammad Norouzi, Douglas Eck, and Karen Simonyan, “Neural audio synthesis of musical notes with wavenet autoencoders,” in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, 2017, pp. 1068–1077.

[21] Brian McFee, Eric J Humphrey, and Juan Pablo Bello, “A software framework for musical data augmentation,” in 16th International Society for Music Information Retrieval Conference (ISMIR), 2015, pp. 248–254.

[22] Justin Salamon and Juan Pablo Bello, “Deep convolutional neural networks and data augmentation for environmental sound classification,” IEEE Signal Processing Letters, vol. 24, no. 3, pp. 279–283, 2017.

[23] Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L Seltzer, and Sanjeev Khudanpur, “A study on data augmentation of reverberant speech for robust speech recognition,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017, pp. 5220–5224.

[24] Udo Zölzer, DAFX: Digital Audio Effects, John Wiley & Sons, 2011.

[25] Joshua D Reiss and Andrew McPherson, Audio effects: theory, implementation and application, CRC Press, 2014.

[26] Sergey Ioffe and Christian Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

[27] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter, “Fast and accurate deep network learning by exponential linear units (elus),” in 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.

[28] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

[29] Brian McFee et al., “librosa/librosa: 0.6.3,” Feb. 2019.

[30] Robert Ratcliffe, “A proposed typology of sampled material within electronic dance music,” Dancecult: Journal of Electronic Dance Music Culture, vol. 6, no. 1, pp. 97–122, 2014.

[31] Shruti Sarika Chakraborty and Ranjan Parekh, Improved Musical Instrument Classification Using Cepstral Coefficients and Neural Networks, pp. 123–138, Springer Singapore, Singapore, 2018.

[32] Rachel M. Bittner, Justin Salamon, Mike Tierney, Matthias Mauch, Chris Cannam, and Juan Pablo Bello, “MedleyDB: A multitrack dataset for annotation-intensive MIR Research,” in the 15th International Society for Music Information Retrieval Conference (ISMIR), 2014.

Proceedings of the 22nd International Conference on Digital Audio Effects (DAFx-19), Birmingham, UK, September 2–6, 2019
