arXiv preprint arXiv:2511.14366

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning · arXiv 2511.14366

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

ResearchClawBench is a new benchmark that evaluates autonomous AI research agents on 40 tasks grounded in published papers using expert rubrics, finding that top systems score only 20-26 out of 100.

citing papers explorer

Showing 1 of 1 citing paper.

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research cs.LG · 2026-05-28 · unverdicted · none · ref 31
