MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation

Li-Chia Yang; Szu-Yu Chou; Yi-Hsuan Yang

arXiv: 1703.10847 · v2 · pith:NMDGLB7Tnew · submitted 2017-03-31 · cs.SD · cs.AI

keywords: melody, midinet, generate, melodies, model, models, music, network

Most existing neural network models for music generation use recurrent neural networks. However, the recent WaveNet model proposed by DeepMind shows that convolutional neural networks (CNNs) can also generate realistic musical waveforms in the audio domain. In this light, we investigate using CNNs to generate a melody (a series of MIDI notes) one bar after another in the symbolic domain. In addition to the generator, we use a discriminator to learn the distribution of melodies, making the model a generative adversarial network (GAN). Moreover, we propose a novel conditional mechanism to exploit available prior knowledge, so that the model can generate melodies from scratch, by following a chord sequence, or by conditioning on the melody of previous bars (e.g. a priming melody), among other possibilities. The resulting model, named MidiNet, can be expanded to generate music with multiple MIDI channels (i.e. tracks). We conduct a user study to compare eight-bar melodies generated by MidiNet and by Google's MelodyRNN models, each time using the same priming melody. The results show that MidiNet performs comparably with MelodyRNN models in being realistic and pleasant to listen to, yet MidiNet's melodies are reported to be much more interesting.
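The bar-by-bar conditioning described in the abstract can be illustrated with a rough sketch of the control flow only. This is not the authors' architecture: `stub_generator` is a hypothetical stand-in (a seeded random walk, not a CNN, and with no discriminator), and the names `generate_melody`, `STEPS`, and `PITCHES`, the 16-step bar, and the pitch-class encoding of chords are all assumptions made for illustration. It shows only how each new bar might depend on fresh noise, the previous bar, and an optional chord.

```python
import random

PITCHES = 128  # MIDI pitch range 0..127
STEPS = 16     # time steps per bar (an assumption; not stated in the abstract)


def stub_generator(noise, prev_bar, chord):
    """Hypothetical stand-in for a trained generator (NOT a CNN).

    Returns one bar as a list of MIDI pitches, one per time step,
    conditioned on the previous bar and an optional chord.
    """
    rng = random.Random(noise)
    # Continue from the last pitch of the previous bar, or middle C.
    pitch = prev_bar[-1] if prev_bar else 60
    bar = []
    for _ in range(STEPS):
        pitch = min(max(pitch + rng.choice([-2, -1, 0, 1, 2]), 0), PITCHES - 1)
        if chord:
            # Snap to the nearest pitch whose pitch class is in the chord.
            lo, hi = max(0, pitch - 6), min(PITCHES, pitch + 7)
            tones = [p for p in range(lo, hi) if p % 12 in chord]
            if tones:
                pitch = min(tones, key=lambda p: abs(p - pitch))
        bar.append(pitch)
    return bar


def generate_melody(n_bars, priming_bar=None, chords=None, seed=0):
    """Generate bars one after another, each conditioned on the previous one."""
    rng = random.Random(seed)
    bars, prev = [], priming_bar
    for i in range(n_bars):
        chord = chords[i] if chords else None
        noise = rng.getrandbits(32)  # fresh noise for every bar
        prev = stub_generator(noise, prev, chord)
        bars.append(prev)
    return bars


# Eight bars from a priming bar, following a C major chord (pitch classes 0, 4, 7).
melody = generate_melody(8, priming_bar=[60] * STEPS, chords=[{0, 4, 7}] * 8)
```

Passing neither `priming_bar` nor `chords` corresponds to generating from scratch; supplying one or both mirrors the conditional modes the abstract lists.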

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias
cs.LG 2026-01 unverdicted novelty 5.0

Smart Embedding reduces parameters by 48.3 percent in polyphonic music models with information-theoretic loss bounds under 0.153 bits and tighter generalization via Rademacher complexity.

Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model
cs.SD 2026-05 unverdicted novelty 4.0

The paper introduces Musical Attention, an attention variant that incorporates eight musical features including metadata to generate more coherent and varied music than standard or strided attention baselines.

MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation
eess.AS 2019-07 unverdicted novelty 4.0

MIDI-Sandwich is a hierarchical VAE-GAN architecture that generates structured 136-beat melodies by modeling local bars and global relationships on the Nottingham dataset.

Classical Music Prediction and Composition by means of Variational Autoencoders
cs.SD 2019-06 unverdicted novelty 3.0

VAEs are trained on classical music to encode pieces into latent space and predict continuations, enabling composition of new music from existing pieces or random starts even with small training sets.