Ambig-DS shows that data-science agents degrade on ambiguous tasks by silently committing to wrong framings; a single clarifying question recovers much of the loss, but agents are unable to decide when to ask.
(For target ambiguity this is true essentially by construction; rate fail only if the prompt somehow makes the choice trivial or moot.)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents