RIVET enforces an idempotency objective during training of voice attribute editing models to improve robustness to noisy labels, outperforming standard training on controlled noise and the GLOBE dataset.
GLOBE: A high-quality English corpus with global accents for zero-shot speaker adaptive text-to-speech
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2). Verdicts: unverdicted (2).
Representative citing papers
SPARCLE: SPeaker-aware Aligned Representations via Contrastive Language Embeddings
SPARCLE builds speaker-aware grapheme representations by contrastively aligning characters with Wav2Vec2 acoustic embeddings conditioned on speaker identity, replacing G2P for TTS and halving WER in low-resource cases.