PrepBench is a benchmark showing that state-of-the-art LLMs still struggle with natural-language-driven data preparation involving disambiguation, code generation, and workflow translation.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
New Text-to-Big SQL metrics show that LLM agents must balance accuracy with cost and speed at scale, where GPT-4o trades some accuracy for up to 12x speedup and GPT-5.2 proves more cost-effective than Gemini 3 Pro on large inputs.
Reinforcement learning with a dual recall-precision reward trains models to enumerate valid interpretations and answers for ambiguous inputs using only multiple-answer supervision.
citing papers explorer
-
PrepBench: How Far Are We from Natural-Language-Driven Data Preparation?
PrepBench is a benchmark showing that state-of-the-art LLMs still struggle with natural-language-driven data preparation involving disambiguation, code generation, and workflow translation.
-
Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"?
New Text-to-Big SQL metrics show that LLM agents must balance accuracy with cost and speed at scale, where GPT-4o trades some accuracy for up to 12x speedup and GPT-5.2 proves more cost-effective than Gemini 3 Pro on large inputs.
-
Reasoning about Intent for Ambiguous Requests
Reinforcement learning with a dual recall-precision reward trains models to enumerate valid interpretations and answers for ambiguous inputs using only multiple-answer supervision.