Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Timbre spaces have been used in music perception to study the perceptual relationships between instruments based on dissimilarity ratings. However, these spaces do not generalize to novel examples and do not provide an invertible mapping, which prevents audio synthesis. In parallel, generative models have aimed to provide methods for synthesizing novel timbres. However, these systems do not provide an understanding of their inner workings and are usually not related to any perceptually relevant information. Here, we show that Variational Auto-Encoders (VAEs) can alleviate all of these limitations by constructing generative timbre spaces. To do so, we adapt VAEs to learn an audio latent space, while using perceptual ratings from timbre studies to regularize the organization of this space. The resulting space allows us to analyze novel instruments and to synthesize audio from any point in the space. We introduce a specific regularization that enforces any given set of similarity distances on these spaces. We show that the resulting spaces exhibit nearly the same distance relationships as timbre spaces. We evaluate several spectral transforms and show that the Non-Stationary Gabor Transform (NSGT) provides the highest correlation to timbre spaces and the best synthesis quality. Furthermore, we show that these spaces generalize to novel instruments and can generate any path between instruments to understand their timbre relationships. As these spaces are continuous, we study how audio descriptors behave along the latent dimensions. We show that even though descriptors have an overall non-linear topology, they follow a locally smooth evolution. Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.
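
The regularization idea in the abstract, pulling latent-space distances toward perceptual dissimilarity ratings, can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function names `pairwise_distances` and `perceptual_regularizer`, the use of Euclidean distance, and the max-normalization are all assumptions made for illustration. In a VAE, a term of this kind would presumably be added to the usual reconstruction and KL losses with some weighting factor.

```python
import numpy as np

def pairwise_distances(z):
    """Euclidean distances between all rows of z (shape: n x d)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def perceptual_regularizer(z, target):
    """Mean squared gap between latent distances and target dissimilarities.

    Hypothetical penalty: both matrices are rescaled to unit maximum so
    that only the relative distance structure is compared.
    """
    d = pairwise_distances(z)
    d = d / (d.max() + 1e-12)
    t = target / (target.max() + 1e-12)
    # Compare each unordered pair once (upper triangle, no diagonal).
    iu = np.triu_indices(len(z), k=1)
    return float(np.mean((d[iu] - t[iu]) ** 2))

# Three latent points, one per (hypothetical) instrument.
z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Ratings that agree with the latent layout up to scale give ~0 penalty.
matching = 3.0 * pairwise_distances(z)
penalty_matching = perceptual_regularizer(z, matching)

# Ratings that contradict the layout give a strictly positive penalty.
conflicting = np.array([[0.0, 1.0, 0.1],
                        [1.0, 0.0, 1.0],
                        [0.1, 1.0, 0.0]])
penalty_conflicting = perceptual_regularizer(z, conflicting)
```

The penalty is invariant to the overall scale of the ratings, which is one plausible design choice when dissimilarity ratings and latent coordinates live on unrelated scales.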

fields

cs.LG 1 cs.SD 1

years

2019 2

verdicts

UNVERDICTED 2

representative citing papers

Universal audio synthesizer control with normalizing flows

cs.LG · 2019-07-01 · unverdicted · novelty 7.0

A VAE+NF model with disentangling flows for unified audio synthesizer control, claiming better parameter inference and reconstruction than baselines while disentangling audio factors into macro-parameters.

citing papers explorer

  • Universal audio synthesizer control with normalizing flows cs.LG · 2019-07-01 · unverdicted · none · ref 13 · internal anchor

    A VAE+NF model with disentangling flows for unified audio synthesizer control, claiming better parameter inference and reconstruction than baselines while disentangling audio factors into macro-parameters.

  • Classical Music Prediction and Composition by means of Variational Autoencoders cs.SD · 2019-06-21 · unverdicted · none · ref 14 · internal anchor

    VAEs are trained on classical music to encode pieces into latent space and predict continuations, enabling composition of new music from existing pieces or random starts even with small training sets.