FLUX that plays music
4 Pith papers cite this work.
citing papers explorer
- MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline
  MIDI-SAG generates consistent long-form singing accompaniments by feeding symbolic MIDI timing, chords, and structure labels into a compositional pipeline built from pre-trained modules.
- SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering
  SonicMaster is a text-conditioned flow-matching generative model for unified music restoration and mastering, trained on a dataset of simulated degradations across equalization, dynamics, reverb, amplitude, and stereo.
- Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods
  The paper introduces the ATTM Grand Challenge with a CC-licensed instrumental subset of MTG-Jamendo, two tracks, and evaluation via FAD, CLAP, and a new Concept Coverage Score to support academic text-to-music research.
- F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
  F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours of multilingual data.