Mmedit: A unified framework for multi-type audio editing via audio language model
3 Pith papers cite this work.
years
2026: 3
representative citing papers
A hybrid two-stage diffusion transformer for instruction-guided audio editing via rectified flow. It performs joint attention at low resolution, then alternates joint and cross-attention at high resolution to improve performance and efficiency.
UNISON introduces a unified latent diffusion framework with layer-wise LLM fusion and channel-mask task encoding for multiple speech and sound generation and editing tasks.
citing papers explorer
- MMAE: A Massive Multitask Audio Editing Benchmark
  MMAE is a new multitask audio editing benchmark showing that leading models achieve under 5% exact match rate, with 0% on complex mixed-modality tasks.
- Hybrid Diffusion Transformer for Instruction-Guided Audio Editing via Rectified Flow
  A hybrid two-stage diffusion transformer for instruction-guided audio editing via rectified flow. It performs joint attention at low resolution, then alternates joint and cross-attention at high resolution to improve performance and efficiency.