Title resolution pending
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: eess.AS (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
Full-Duplex-Bench-v3 provides a dataset of real human audio with five disfluency types and chained API tasks to benchmark six voice agent systems, revealing GPT-Realtime leads in accuracy while cascaded pipelines suffer highest latency.