TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: eess.AS (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
SwanSphere introduces a causal autoregressive diffusion transformer architecture with SVAC contrastive learning and ODPO optimization for streaming spatial audio generation from video and text.
citing papers explorer
- MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling
  Proposes MeloDISinger, a flow-matching SVE model with MeloDRP for melody-aware duration-preserving editing and audio infilling, claiming SOTA results.
- Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
  SwanSphere introduces a causal autoregressive diffusion transformer architecture with SVAC contrastive learning and ODPO optimization for streaming spatial audio generation from video and text.