AutoResearchBench is a new benchmark showing top AI agents achieve under 10% success on complex scientific literature discovery tasks that demand deep comprehension and open-ended search.
In NeurIPS 2025 AI for Science Workshop
3 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1)
Citation polarities: background (1)
Years: 2026 (3)
Representative citing papers:
- AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
- Toward Autonomous Long-Horizon Engineering for ML Research
- Learning to Predict Future-Aligned Research Proposals with Language Models