Next-Turn introduces time-to-next-speech-onset prediction for duration-aware streaming endpoint detection, reporting a 25.9% improvement in accuracy within 320 ms.
Voice activity projection: Self-supervised learning of turn-taking events
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Next-Turn: Duration-Aware Streaming Endpoint Detection via Time-to-Next-Speech-Onset Prediction