CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
CoCoReviewBench curates 3,900 conference papers, organized into category subsets with expert discussion annotations, to evaluate AI reviewers on completeness and correctness. The results show that AI reviewers remain limited and prone to hallucination, though reasoning-oriented models perform better.
Agent Laboratory: Using LLM Agents as Research Assistants
Agent Laboratory is an autonomous LLM-based framework that carries out end-to-end research, from idea to report and code. Human feedback improves output quality, and the system reduces research costs by 84% while achieving competitive ML performance.