Action-based conversations dataset: A corpus for building more in-depth task-oriented dialogue systems

Derek Chen, Howard Chen, Yi Yang, Alex Lin, Zhou Yu · 2021 · arXiv 2104.00783

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

$\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment

cs.AI · 2025-06-09 · unverdicted · novelty 7.0

τ²-bench provides a Dec-POMDP-based telecom domain with compositional task generation and a tool-constrained user simulator to measure agent performance drops in dual-control versus single-control settings.

$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

cs.AI · 2024-06-17 · unverdicted · novelty 7.0

τ-bench shows state-of-the-art agents like GPT-4o succeed on under 50% of tool-using, rule-following tasks and are inconsistent across repeated trials.

citing papers explorer

Showing 2 of 2 citing papers.

$\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment cs.AI · 2025-06-09 · unverdicted · none · ref 5
τ²-bench provides a Dec-POMDP-based telecom domain with compositional task generation and a tool-constrained user simulator to measure agent performance drops in dual-control versus single-control settings.
$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains cs.AI · 2024-06-17 · unverdicted · none · ref 4
τ-bench shows state-of-the-art agents like GPT-4o succeed on under 50% of tool-using, rule-following tasks and are inconsistent across repeated trials.

Action-based conversations dataset: A corpus for building more in-depth task-oriented dialogue systems

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer