EVA-Bench supplies a simulation engine for bot-to-bot voice dialogues plus two composite metrics (EVA-A for accuracy, EVA-X for experience) evaluated on 213 enterprise scenarios, showing no tested system exceeds 0.5 on both pass@1 scores.
If the user hangs up prematurely—for example, providing actionable information and ending the call in the same turn—the agent has no opportunity to execute the required tool calls
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SD 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
No citing papers match the current filters.