ResearchClawBench supplies 40 grounded tasks and expert rubrics to measure autonomous research agents, with the strongest systems scoring only 21.5 and 20.7 on average.
arXiv preprint arXiv:2506.12958 , year=
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
No citing papers match the current filters.