Ambig-DS shows that data-science agents degrade on ambiguous tasks by silently adopting wrong framings; a single clarifying question recovers much of the loss, but agents are unable to decide when to ask.
The edited prompt should not introduce new wording that reveals the hidden intended framing
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents