Alireza Ghafarollahi and Markus J Buehler
8 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: unverdicted 8
Citation roles: background 3
Citation polarities: background 3
citing papers explorer
- NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment
  NovBench is the first large-scale benchmark with 1,684 expert-annotated pairs to evaluate LLMs on assessing academic paper novelty via a four-dimensional framework of Relevance, Correctness, Coverage, and Clarity.
- ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents
  ReviewGrounder decomposes review generation into rubric-guided drafting and tool-integrated grounding stages, outperforming larger baseline models on a new benchmark measuring alignment with human judgments and review quality.
- HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution
  HiRAS introduces hierarchical multi-agent coordination for paper-to-code generation and experiment reproduction, claiming over 10% relative gains over prior state-of-the-art on a refined benchmark with reduced hallucination.
- SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts
  SafeReview trains a Generator to create adversarial prompts and a Defender to detect them via co-evolution with an IR-GAN-inspired loss, claiming better resilience than static defenses for LLM-based peer review.
- Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI
  Peer review reports in AI conferences have grown longer and more standardized after LLMs, with increased emphasis on surface-level clarity and summaries at the expense of deeper critiques on originality and replicability.
- SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research
  SciAtlas builds a large-scale multi-disciplinary academic knowledge graph and a neuro-symbolic retrieval system to support automated scientific research tasks such as literature review and idea positioning.
- AI for Auto-Research: Roadmap & User Guide
  The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.
- Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator
  The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.