MMAE is a new multitask audio editing benchmark showing that leading models achieve under 5% exact match rate, with 0% on complex mixed-modality tasks.
3 Pith papers cite this work.
Years: 2026 (3)
Representative citing papers
Hybrid two-stage diffusion transformer architecture for instruction-guided audio editing via rectified flow that performs joint attention at low resolution then alternates joint and cross-attention at high resolution for improved performance and efficiency.
A survey that presents a unified taxonomy of audio editing tasks, summarizes training-based and training-free foundation model approaches, reviews datasets and evaluation protocols, and identifies future challenges.
citing papers explorer
- MMAE: A Massive Multitask Audio Editing Benchmark
  MMAE is a new multitask audio editing benchmark showing that leading models achieve under 5% exact match rate, with 0% on complex mixed-modality tasks.
- Hybrid Diffusion Transformer for Instruction-Guided Audio Editing via Rectified Flow
  Hybrid two-stage diffusion transformer architecture for instruction-guided audio editing via rectified flow that performs joint attention at low resolution then alternates joint and cross-attention at high resolution for improved performance and efficiency.
- Audio Editing in the Era of Foundation Models: A Survey
  A survey that presents a unified taxonomy of audio editing tasks, summarizes training-based and training-free foundation model approaches, reviews datasets and evaluation protocols, and identifies future challenges.