Voicebox: Text-guided multilingual universal speech generation at scale. Advances in Neural Information Processing Systems, 36:14005–14034, 2023.
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2). Verdicts: unverdicted (2).
citing papers explorer
- SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
  SemaVoice adds SFM-guided alignment to refine continuous speech representations in autoregressive TTS, reporting 1.71% English WER on Seed-TTS and competitiveness with open-source SOTA.
- Flow-Direct: Feedback-Efficient and Reusable Guidance for Flow Models via Non-Parametric Guidance Field
  Flow-Direct constructs a reusable non-parametric guidance field from the log-density ratio of base and target distributions using all accumulated reward samples for feedback-efficient guidance in flow models.