UniSinger unifies speaker-cloned song generation and accompaniment co-generation SVC in one multimodal diffusion transformer model trained with curriculum learning via task-specific modality masking.
Secous- ticodec: Cross-modal aligned streaming single-codecbook speech codec,
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.SD 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
ContextCodec uses a dual-branch encoder with CLIP-style contrastive training on phoneme-aligned context features plus autoregressive refinement to improve quality-intelligibility at bitrates down to 500 bps.
citing papers explorer
-
Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation
UniSinger unifies speaker-cloned song generation and accompaniment co-generation SVC in one multimodal diffusion transformer model trained with curriculum learning via task-specific modality masking.
-
ContextCodec: Content-Focused Context Guidance for Ultra-Low Bitrate Speech Coding
ContextCodec uses a dual-branch encoder with CLIP-style contrastive training on phoneme-aligned context features plus autoregressive refinement to improve quality-intelligibility at bitrates down to 500 bps.