MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinement, and editing the generated music remains a significant challenge. This paper introduces a novel approach to editing music generated by such models, enabling the modification of specific attributes, such as genre, mood, and instrument, while leaving other aspects unchanged. Our method transforms text editing into latent space manipulation while adding an extra constraint to enforce consistency. It integrates seamlessly with existing pretrained text-to-music diffusion models without requiring additional training. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations. Additionally, we showcase the practical applicability of our approach in real-world music editing scenarios.
Forward citations
Cited by 2 Pith papers
- AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing
  AnchorSteer couples self-discovered semantic concept vectors with structural anchoring in diffusion models to achieve controllable music editing with preserved structure.
- Not that Groove: Zero-Shot Symbolic Music Editing
  The work formalizes zero-shot symbolic drum editing as LLM reasoning over a drumroll grid notation, evaluates it on a new benchmark with automated symbolic unit tests, and reports a success rate of up to 68% across eight models.