Ambig-DS shows that data-science agents degrade on ambiguous tasks by silently adopting wrong framings; a single clarifying question recovers much of the loss, but agents are unable to decide when to ask.
Change only the wording needed to remove the targeted framing cue
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents