Nevertheless, the cascaded vocoder remains a major impediment to pure streaming efficiency.

Implemented causal flow-based generation, explicitly adapting continuous latent variables for real-time output · 2019

1 Pith paper cites this work. Polarity classification is still being indexed.
representative citing papers

An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding

eess.AS · 2026-04-14 · unverdicted · novelty 5.0 · ref 9

A block-wise generation architecture with progressive depth-wise decoding on 32-layer RVQ codes from the Mimi codec delivers 48.99 ms time-to-first-byte latency and improved voicing accuracy over regression-based TTS.