
arxiv: 1612.07837 · v2 · submitted 2016-12-22 · 💻 cs.SD · cs.AI

Recognition: unknown

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

classification 💻 cs.SD cs.AI
keywords: model, audio, generation, neural, time, unconditional, able, autoregressive

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variation in the temporal sequences over very long time spans, on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.
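
To make the idea of generating one audio sample at a time concrete, the following is a minimal, untrained toy sketch in Python. It is not the authors' implementation. It only illustrates the general pattern the abstract describes: a stateful tier that updates once per frame, and a memory-less tier that predicts a distribution over the next quantized sample. Every name and constant here (`Q`, `FRAME`, `frame_tier`, `sample_tier`, `generate`) is hypothetical, and the hand-written arithmetic stands in for networks that a real model would learn.

```python
import math
import random

Q = 16      # hypothetical number of quantization levels per sample
FRAME = 4   # hypothetical number of samples per frame


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_tier(state, frame):
    """Stateful tier (toy stand-in for a recurrent network).

    Mixes the previous state with a summary of the last frame.
    """
    mean = sum(frame) / (len(frame) * (Q - 1))  # normalized to [0, 1]
    return math.tanh(0.5 * state + mean)


def sample_tier(context, prev_samples):
    """Memory-less tier (toy stand-in for an autoregressive MLP).

    Returns a distribution over the next sample, given the frame-level
    context and the most recent samples. It keeps no state of its own.
    """
    center = 0.5 * prev_samples[-1] + 0.5 * (Q - 1) * (context + 1) / 2
    logits = [-((k - center) ** 2) / 4.0 for k in range(Q)]
    return softmax(logits)


def generate(n, seed=0):
    """Draw n quantized samples one at a time."""
    rng = random.Random(seed)
    samples = [Q // 2] * FRAME  # padding so the first frame has history
    state = 0.0
    for t in range(n):
        if t % FRAME == 0:
            # The slow tier runs once per frame, not once per sample.
            state = frame_tier(state, samples[-FRAME:])
        probs = sample_tier(state, samples[-FRAME:])
        nxt = rng.choices(range(Q), weights=probs)[0]
        samples.append(nxt)  # the new sample feeds the next prediction
    return samples[FRAME:]


if __name__ == "__main__":
    print(generate(16, seed=0))
```

The point of the sketch is the control flow: each new sample is drawn from a distribution conditioned on earlier samples, while a slower, stateful component carries information across frames.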



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Generating Long Sequences with Sparse Transformers

    cs.LG 2019-04 unverdicted novelty 7.0

    Sparse Transformers factorize attention to handle sequences tens of thousands long, achieving new SOTA density modeling on Enwik8, CIFAR-10, and ImageNet-64.

  2. Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation

    eess.AS 2026-04 unverdicted novelty 6.0

    Chain-of-Details (CoD) is a cascaded TTS method that explicitly models temporal coarse-to-fine dynamics with a shared decoder, achieving competitive performance using significantly fewer parameters.