The MUSDB18 corpus for music separation
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
citing papers explorer
- MAGE: Modality-Agnostic Music Generation and Editing
  MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.
- Two-Dimensional Quantization for Geometry-Aware Audio Coding
  Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.
- Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations
  MERT embedding-based MSE and intrusive FAD metrics correlate more strongly with perceptual audio quality ratings than BSS-Eval metrics across stems and models in musical source separation.