Adaptive Turn-Taking for Real-time Multi-Party Voice Agents

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Turn-taking in multi-party spoken conversations remains a fundamental challenge for voice-based agents, particularly under dynamic floor competition and varying user expectations. We propose ModeratorLM, a role-playing voice agent that conditions turn-taking behavior on an explicitly assigned role in multi-party settings. The system is built on a speech large language model operating in a chunk-wise streaming manner. We further introduce a reasoning-augmented variant that incorporates chain-of-thought reasoning over conversational context and the assigned role. We construct RolePlayConv, a large-scale synthetic dataset of spoken multi-party conversations with diverse assistant roles. Experiments on real-world meeting data and RolePlayConv show that turn-taking precision improves by over 40% and recall by more than 70%, with substantially fewer false-positive interruptions than non-role-conditioned baselines.
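
The abstract reports precision, recall, and false-positive interruptions but does not define how they are computed. As a rough illustration only, the sketch below assumes each streaming chunk yields a binary decision (speak or stay silent) that is compared against a reference label. The names `TurnTakingScores` and `score_turn_taking` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TurnTakingScores:
    precision: float
    recall: float
    false_positives: int  # chunks where the agent spoke but should not have


def score_turn_taking(predicted: Sequence[bool],
                      reference: Sequence[bool]) -> TurnTakingScores:
    """Score per-chunk speak/stay-silent decisions against reference labels.

    Assumption: one boolean per chunk, True meaning "the agent takes the turn".
    The paper's actual evaluation protocol may differ.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # correct turn-takes
    fp = sum(1 for p, r in pairs if p and not r)    # false-positive interruptions
    fn = sum(1 for p, r in pairs if r and not p)    # missed turns

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return TurnTakingScores(precision, recall, fp)


if __name__ == "__main__":
    # Toy decisions for five chunks; not data from the paper.
    predicted = [True, False, True, True, False]
    reference = [True, False, False, True, True]
    print(score_turn_taking(predicted, reference))
```

Under this framing, a false-positive interruption is a chunk where the agent speaks although the reference says it should stay silent, so lowering that count raises precision directly.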

fields

eess.AS 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Adaptive Turn-Taking for Real-time Multi-Party Voice Agents

eess.AS · 2026-06-11 · unverdicted · novelty 5.0

ModeratorLM conditions a streaming speech LLM on assigned roles for adaptive turn-taking in multi-party settings, reporting over 40% higher precision and more than 70% higher recall than non-role-conditioned baselines on real meeting data and a new synthetic dataset.
