of of the idea that has been the same idea for a thousand years that they believe that—

Helin Wang, Jiarui Hai, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, et al. · 2025 · arXiv 2506.02863

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations

cs.SD · 2026-04-17 · unverdicted · novelty 7.0

NVBench provides a standardized bilingual benchmark and evaluation protocol for assessing non-verbal vocalization generation, placement, and salience in text-to-speech systems.

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

eess.AS · 2026-01-06 · unverdicted · novelty 7.0

FCaps supplies 19M fine-grained speech style captions on 47k hours of audio via direct grounding, enabling the CLSP model to produce multi-granular representations that improve retrieval, zero-shot classification, and style scoring aligned with human judgments.

TRACE: Temporal Relationship-Aware Conversational Entrainment Detection in Dyadic Speech

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

Introduces DyadEE dataset and TRACE window-level framework using sequences of acoustic embeddings for emotional entrainment detection, reporting 97.01% accuracy when context and relationship information are included.

Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation

cs.SD · 2026-06-02 · unverdicted · novelty 6.0

Foley-Omni extends isolated audio synthesis to joint generation of full video soundtracks across speech, effects, and music, with a new V2ST-Bench for evaluation showing competitive single-task results and gains in mixed-track consistency.

VoxCPM2 Technical Report

cs.SD · 2026-06-05 · unverdicted · novelty 5.0

VoxCPM2 scales hierarchical continuous-latent speech modeling to 2B parameters and over 2M hours of multilingual data, unifying voice cloning, style control, and continuation in one backbone with open release.

citing papers explorer

NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations

cs.SD · 2026-04-17 · unverdicted · none · ref 32

NVBench provides a standardized bilingual benchmark and evaluation protocol for assessing non-verbal vocalization generation, placement, and salience in text-to-speech systems.

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

eess.AS · 2026-01-06 · unverdicted · none · ref 8

FCaps supplies 19M fine-grained speech style captions on 47k hours of audio via direct grounding, enabling the CLSP model to produce multi-granular representations that improve retrieval, zero-shot classification, and style scoring aligned with human judgments.

TRACE: Temporal Relationship-Aware Conversational Entrainment Detection in Dyadic Speech

cs.CL · 2026-06-29 · unverdicted · none · ref 19

Introduces DyadEE dataset and TRACE window-level framework using sequences of acoustic embeddings for emotional entrainment detection, reporting 97.01% accuracy when context and relationship information are included.

Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation

cs.SD · 2026-06-02 · unverdicted · none · ref 18

Foley-Omni extends isolated audio synthesis to joint generation of full video soundtracks across speech, effects, and music, with a new V2ST-Bench for evaluation showing competitive single-task results and gains in mixed-track consistency.

VoxCPM2 Technical Report

cs.SD · 2026-06-05 · unverdicted · none · ref 33

VoxCPM2 scales hierarchical continuous-latent speech modeling to 2B parameters and over 2M hours of multilingual data, unifying voice cloning, style control, and continuation in one backbone with open release.
