Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

Nicolas Boulanger-Lewandowski (Universite de Montreal); Pascal Vincent (Universite de Montreal); Yoshua Bengio (Universite de Montreal)

arxiv: 1206.6392 · v1 · pith:JHEJ5A7Lnew · submitted 2012-06-27 · cs.LG · cs.SD · stat.ML

keywords: polyphonic · music · sequences · dependencies · high-dimensional · model · modeling · symbolic

Abstract

We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to improve the accuracy of polyphonic transcription.
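
As a rough illustration of the piano-roll representation mentioned in the abstract, the sketch below builds a binary matrix with one row per time step and one column per pitch. The helper name `notes_to_piano_roll`, the `(pitch, onset, offset)` note format, and the MIDI range 21 to 108 are assumptions made for this example; they are not taken from the paper.

```python
# Hypothetical helper (not from the paper): convert a list of
# (pitch, onset_step, offset_step) notes into a binary piano-roll.

def notes_to_piano_roll(notes, n_steps, low=21, high=108):
    """Return n_steps rows, each a list of 0/1 flags for pitches low..high."""
    n_pitches = high - low + 1  # assumed range: the 88 keys of a piano
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for pitch, onset, offset in notes:
        if not low <= pitch <= high:
            continue  # skip notes outside the assumed pitch range
        # Mark the note as active on every step in [onset, offset).
        for t in range(max(onset, 0), min(offset, n_steps)):
            roll[t][pitch - low] = 1
    return roll

# A C major triad (MIDI 60, 64, 67); the top note is held one step longer.
notes = [(60, 0, 2), (64, 0, 2), (67, 0, 3)]
roll = notes_to_piano_roll(notes, n_steps=4)

# List the active MIDI pitches at each time step.
active = [[idx + 21 for idx, on in enumerate(row) if on] for row in roll]
print(active)  # [[60, 64, 67], [60, 64, 67], [67], []]
```

Each row is a high-dimensional binary vector in which several entries can be active at once, which is what makes the sequence polyphonic rather than a one-symbol-per-step melody.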

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

R-Transformer: Recurrent Neural Network Enhanced Transformer
cs.LG · 2019-07 · unverdicted · novelty 6.0

R-Transformer integrates RNNs with multi-head attention to model local and global sequence dependencies without position embeddings and reports large-margin gains over prior methods on diverse tasks.

MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation
eess.AS · 2019-07 · unverdicted · novelty 4.0

MIDI-Sandwich is a hierarchical VAE-GAN architecture that generates structured 136-beat melodies by modeling local bars and global relationships on the Nottingham dataset.