CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 2
verdicts
UNVERDICTED: 2
representative citing papers
FCaps supplies 19M fine-grained speech style captions on 47k hours of audio via direct grounding, enabling the CLSP model to produce multi-granular representations that improve retrieval, zero-shot classification, and style scoring aligned with human judgments.
citing papers explorer
- NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations
NVBench provides a standardized bilingual benchmark and evaluation protocol for assessing non-verbal vocalization generation, placement, and salience in text-to-speech systems.
- Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
FCaps supplies 19M fine-grained speech style captions on 47k hours of audio via direct grounding, enabling the CLSP model to produce multi-granular representations that improve retrieval, zero-shot classification, and style scoring aligned with human judgments.