Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

2017 · cs.LG · arXiv 1704.01279

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are realistic and expressive.

representative citing papers

Autoencoding sensory substitution

q-bio.NC · 2019-07-14 · unverdicted · novelty 4.0

Deep recurrent autoencoders convert images to shortened audio signals that incorporate hearing models, enabling above-chance hand posture discrimination and object reaching after a few hours of training instead of months.

Classical Music Prediction and Composition by means of Variational Autoencoders

cs.SD · 2019-06-21 · unverdicted · novelty 3.0

VAEs are trained on classical music to encode pieces into latent space and predict continuations, enabling composition of new music from existing pieces or random starts even with small training sets.
