TaxoBench shows deep research agents retrieve 20.92% of expert-cited papers and produce taxonomies with 75.9% sibling overlap, 51.2% MECE violations, and 83.4% imbalance, while LLMs reach only 28-29% semantic path similarity versus 47-58% for human groups.
semantic_coverage
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2026 (1)
verdicts: CONDITIONAL (1)
representative citing papers
Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies