pith:LTWEYMDD
SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
SemaVoice adds a foundation-model alignment step to continuous speech representations so autoregressive TTS can keep semantic meaning without losing acoustic quality.
arxiv:2605.16964 v1 · 2026-05-16 · eess.AS
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LTWEYMDD7VOZ5W24WPNTU476OP}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
SemaVoice introduces a Speech Foundation Model (SFM) guided alignment mechanism that refines continuous speech representations to better capture both local semantic consistency and global structural relationships. These representations condition a patch-wise diffusion head within the autoregressive framework for high-quality speech synthesis, achieving an English WER of 1.71% on the Seed-TTS benchmark.
The core premise is that an SFM-guided alignment step can resolve the fundamental mismatch between semantic-prosodic modeling and reconstruction-driven continuous representations without introducing new artifacts or error accumulation in autoregressive generation, as stated in the abstract's problem formulation and solution description.
SemaVoice adds SFM-guided alignment to refine continuous speech representations in autoregressive TTS, reporting 1.71% English WER on Seed-TTS and competitiveness with open-source SOTA.
Receipt and verification
| First computed | 2026-05-20T00:03:33.342329Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
5cec4c3063fd5d9edb5cb3db3a73fe73f1cbcd627b5674bacf4ed177bd85dfbd
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LTWEYMDD7VOZ5W24WPNTU476OP \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5cec4c3063fd5d9edb5cb3db3a73fe73f1cbcd627b5674bacf4ed177bd85dfbd
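The one-liner above can also be written as a small standalone script, which may be easier to read and reuse. This is a minimal sketch, assuming the canonical form is exactly what the command produces: JSON with sorted keys, compact separators, non-ASCII characters left unescaped, and UTF-8 encoding. The function names `canonical_sha256` and `matches`, and the toy records `a` and `b`, are hypothetical and exist only for illustration.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record the same way the shell one-liner does, then hash it."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare a record's hash against a published hex digest (case-insensitive)."""
    return canonical_sha256(record) == expected_hex.strip().lower()


# Hypothetical toy records: same content, different key order.
a = {"source": {"kind": "arxiv", "version": 1}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 1, "kind": "arxiv"}}

# Sorting keys makes the serialization, and therefore the hash, order-independent.
print(canonical_sha256(a) == canonical_sha256(b))  # prints True
```

To check a real record, load the `canonical_record` object from the fetched JSON and pass it to `matches` together with the hash listed under "Canonical hash".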
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3a9d39534311fcbfd3d6b0b6c2e8057fd19e3dfb87e0b8a88b4b6fe7d332891d",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "eess.AS",
    "submitted_at": "2026-05-16T12:37:06Z",
    "title_canon_sha256": "c882b5e6ea146a76de6984a63967f3295e0c3c2997f4b632cb8c93c4c5b49977"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16964",
    "kind": "arxiv",
    "version": 1
  }
}