arXiv: 1809.07600 · v1 · pith:JRRGZDMMnew · submitted 2018-09-20 · 💻 cs.SD · cs.LG · eess.AS · stat.ML

MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer

classification: cs.SD, cs.LG, eess.AS, stat.ML
keywords: music, style, dynamics, transfer, midi-vae, create, instrumentation, model

We introduce MIDI-VAE, a neural network model based on Variational Autoencoders that is capable of handling polyphonic music with multiple instrument tracks, as well as modeling the dynamics of music by incorporating note durations and velocities. We show that MIDI-VAE can perform style transfer on symbolic music by automatically changing pitches, dynamics and instruments of a music piece from, e.g., a Classical to a Jazz style. We evaluate the efficacy of the style transfer by training separate style validation classifiers. Our model can also interpolate between short pieces of music, produce medleys and create mixtures of entire songs. The interpolations smoothly change pitches, dynamics and instrumentation to create a harmonic bridge between two music pieces. To the best of our knowledge, this work represents the first successful attempt at applying neural style transfer to complete musical compositions.

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias

    cs.LG · 2026-01 · unverdicted · novelty 5.0

    Smart Embedding reduces parameters by 48.3 percent in polyphonic music models with information-theoretic loss bounds under 0.153 bits and tighter generalization via Rademacher complexity.

  2. Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model

    cs.SD · 2026-05 · unverdicted · novelty 4.0

    The paper introduces Musical Attention, an attention variant that incorporates eight musical features including metadata to generate more coherent and varied music than standard or strided attention baselines.