Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding

Jinjiang Liu; Pengfei Hu; Qun Yu; Yang Zhang; Yougen Yuan; Yuming Yan; Zhiyong Wu; Zijian Lin

arxiv: 2505.15380 · v2 · pith:UMHCAPBWnew · submitted 2025-05-21 · cs.SD · cs.AI · eess.AS

Zijian Lin, Yang Zhang, Yougen Yuan, Yuming Yan, Jinjiang Liu, Zhiyong Wu, Pengfei Hu, Qun Yu

classification: cs.SD · cs.AI · eess.AS

keywords: speech, autoregressive, decoding, inference, model, models, synthesis, accelerating

Modern autoregressive speech synthesis models leveraging language models have demonstrated remarkable performance. However, the sequential nature of next-token prediction in these models leads to significant latency, hindering their deployment in scenarios where inference speed is critical. In this work, we propose Speech Speculative Decoding (SSD), a novel framework for accelerating autoregressive speech synthesis. Specifically, our method employs a lightweight draft model to generate candidate token sequences, which are subsequently verified in parallel by the target model using the proposed SSD framework. Experimental results demonstrate that SSD achieves a speedup of 1.4x compared with conventional autoregressive decoding, while maintaining high fidelity and naturalness. Subjective evaluations further validate the effectiveness of SSD in preserving the perceptual quality of the target model while accelerating inference.
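
The draft-then-verify loop described in the abstract can be illustrated with the generic speculative sampling accept/reject rule. The sketch below is not the paper's SSD verification scheme, whose details the abstract does not give. It is a minimal toy under stated assumptions: `speculative_step`, `toy_draft`, and `toy_target` are hypothetical names, the vocabulary has four tokens, and the "models" are hand-written probability tables rather than neural networks.

```python
import random


def speculative_step(draft_probs, target_probs, prefix, k, rng):
    """One draft-then-verify round of generic speculative sampling.

    draft_probs / target_probs are hypothetical callables that map a token
    prefix (list of ints) to a probability list over the vocabulary.
    Returns the tokens emitted in this round (between 1 and k + 1).
    """
    # 1) The lightweight draft model proposes k tokens autoregressively.
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = rng.choices(range(len(q)), weights=q)[0]
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2) The target model scores every drafted position. A real system would
    #    do this in one batched forward pass; the toy simply loops.
    p_dists = [target_probs(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3) Accept each drafted token with probability min(1, p/q). On the first
    #    rejection, resample from the residual max(p - q, 0) and stop.
    emitted = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
            continue
        residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
        emitted.append(rng.choices(range(len(p)), weights=residual)[0])
        return emitted

    # 4) Every draft was accepted: take one extra token from the target.
    p = p_dists[k]
    emitted.append(rng.choices(range(len(p)), weights=p)[0])
    return emitted


def toy_target(ctx):
    # Assumed toy target: strongly prefers (last token + 1) mod 4.
    probs = [0.1, 0.1, 0.1, 0.1]
    probs[((ctx[-1] if ctx else 0) + 1) % 4] = 0.7
    return probs


def toy_draft(ctx):
    # Assumed toy draft: same preference as the target, but less confident.
    probs = [0.15, 0.15, 0.15, 0.15]
    probs[((ctx[-1] if ctx else 0) + 1) % 4] = 0.55
    return probs


rng = random.Random(0)
tokens, rounds = [0], 0
while len(tokens) < 41:
    tokens += speculative_step(toy_draft, toy_target, tokens, k=4, rng=rng)
    rounds += 1
print(f"generated {len(tokens) - 1} tokens in {rounds} target passes")
```

Each round costs one (conceptually parallel) target pass but can emit up to k + 1 tokens, which is where the latency saving comes from. The realized speedup depends on how often the target accepts the draft's proposals and on the relative cost of the draft model.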

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TLDR: Compressing Audio Tokens for Efficient Autoregressive Text-to-Speech
cs.SD 2026-06 unverdicted novelty 6.0

TLDR groups codec tokens into patches for patch-level autoregressive modeling in pretrained TTS systems, yielding 1.8x speedup and 75% KV-cache reduction at patch size 4.

From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models
cs.CL 2026-03 unverdicted novelty 5.0

The paper supplies a unified definition based on data flow and dynamic interaction plus a systematic taxonomy to organize fragmented work on streaming large language models.