JASTIN is an instruction-driven audio evaluation system that achieves state-of-the-art correlation with human ratings on speech, sound, music, and out-of-domain tasks without task-specific retraining.
Wavlm: Large-scale self-supervised pre- training for full stack speech processing,
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
CoMelSinger introduces a discrete token-based zero-shot SVS framework on MaskGCT with coarse-to-fine contrastive learning and an SVT module to improve melody control and reduce prosody leakage.
Sparse MERIT uses frame-wise sparse mixture-of-experts with task-specific gating on self-supervised speech features to jointly optimize enhancement and emotion recognition, reporting gains over baselines on MSP-Podcast at low SNR.
citing papers explorer
-
JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions
JASTIN is an instruction-driven audio evaluation system that achieves state-of-the-art correlation with human ratings on speech, sound, music, and out-of-domain tasks without task-specific retraining.
-
CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance
CoMelSinger introduces a discrete token-based zero-shot SVS framework on MaskGCT with coarse-to-fine contrastive learning and an SVT module to improve melody control and reduce prosody leakage.
-
Joint Learning using Mixture-of-Expert-Based Representation for Speech Enhancement and Robust Emotion Recognition
Sparse MERIT uses frame-wise sparse mixture-of-experts with task-specific gating on self-supervised speech features to jointly optimize enhancement and emotion recognition, reporting gains over baselines on MSP-Podcast at low SNR.