MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

Gus Xia; Marco A. Martínez-Ramírez; Naoki Murata; Simon Dixon; Wei-Hsiang Liao; Yixiao Zhang; Yukara Ikemiya; Yuki Mitsufuji

arxiv: 2402.06178 · v3 · pith:IT7WH7HGnew · submitted 2024-02-09 · 💻 cs.SD · cs.AI · cs.MM · eess.AS

Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

classification 💻 cs.SD cs.AI cs.MM eess.AS

keywords editing models music text-to-music approach diffusion generated generation

Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged. Our method transforms text editing to latent space manipulation while adding an extra constraint to enforce consistency. It seamlessly integrates with existing pretrained text-to-music diffusion models without requiring additional training. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations. Additionally, we showcase the practical applicability of our approach in real-world music editing scenarios.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing
cs.SD 2026-05 unverdicted novelty 6.0

AnchorSteer couples self-discovered semantic concept vectors with structural anchoring in diffusion models to achieve controllable music editing with preserved structure.

Not that Groove: Zero-Shot Symbolic Music Editing
cs.SD 2025-05 unverdicted novelty 6.0

The work formalizes zero-shot symbolic drum editing as LLM reasoning over a drumroll grid notation, evaluates it on a new benchmark with automated symbolic unit tests, and reports up to 68% success across eight models.