Cap- speech: Enabling downstream applications in style-captioned text-to-speech

CapSpeech: enabling downstream applications in style-captioned text-to-speech · 2023 · arXiv 2506.02863

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations

cs.SD · 2026-04-17 · unverdicted · novelty 7.0

NVBench provides a standardized bilingual benchmark and evaluation protocol for assessing non-verbal vocalization generation, placement, and salience in text-to-speech systems.

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

eess.AS · 2026-01-06 · unverdicted · novelty 7.0

FCaps supplies 19M fine-grained speech style captions on 47k hours of audio via direct grounding, enabling the CLSP model to produce multi-granular representations that improve retrieval, zero-shot classification, and style scoring aligned with human judgments.

citing papers explorer

Showing 2 of 2 citing papers.

NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations cs.SD · 2026-04-17 · unverdicted · none · ref 32
NVBench provides a standardized bilingual benchmark and evaluation protocol for assessing non-verbal vocalization generation, placement, and salience in text-to-speech systems.
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training eess.AS · 2026-01-06 · unverdicted · none · ref 8
FCaps supplies 19M fine-grained speech style captions on 47k hours of audio via direct grounding, enabling the CLSP model to produce multi-granular representations that improve retrieval, zero-shot classification, and style scoring aligned with human judgments.

Cap- speech: Enabling downstream applications in style-captioned text-to-speech

fields

years

verdicts

representative citing papers

citing papers explorer