Alireza Ghafarollahi and Markus J Buehler

Zhaolin Gao, Kiant · 2024 · arXiv 2402.10886

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment

cs.CL · 2026-04-13 · unverdicted · novelty 8.0

NovBench is the first large-scale benchmark with 1,684 expert-annotated pairs to evaluate LLMs on assessing academic paper novelty via a four-dimensional framework of Relevance, Correctness, Coverage, and Clarity.

ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

cs.CL · 2026-04-15 · unverdicted · novelty 7.0

ReviewGrounder decomposes review generation into rubric-guided drafting and tool-integrated grounding stages, outperforming larger baseline models on a new benchmark measuring alignment with human judgments and review quality.

HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

HiRAS introduces hierarchical multi-agent coordination for paper-to-code generation and experiment reproduction, claiming over 10% relative gains over prior state-of-the-art on a refined benchmark with reduced hallucination.

SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts

cs.CL · 2026-04-29 · unverdicted · novelty 5.0

SafeReview trains a Generator to create adversarial prompts and a Defender to detect them via co-evolution with an IR-GAN-inspired loss, claiming better resilience than static defenses for LLM-based peer review.

Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Peer review reports in AI conferences have grown longer and more standardized after LLMs, with increased emphasis on surface-level clarity and summaries at the expense of deeper critiques on originality and replicability.

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

cs.AI · 2026-05-20 · unverdicted · novelty 4.0

SciAtlas builds a large-scale multi-disciplinary academic knowledge graph and a neuro-symbolic retrieval system to support automated scientific research tasks such as literature review and idea positioning.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator

cs.DL · 2025-07-16 · unverdicted · novelty 4.0

The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.

citing papers explorer

NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment
cs.CL · 2026-04-13 · unverdicted · none · ref 2

ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents
cs.CL · 2026-04-15 · unverdicted · none · ref 1

HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution
cs.CL · 2026-04-20 · unverdicted · none · ref 1

SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts
cs.CL · 2026-04-29 · unverdicted · none · ref 2

Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI
cs.CL · 2026-04-21 · unverdicted · none · ref 40

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research
cs.AI · 2026-05-20 · unverdicted · none · ref 29

AI for Auto-Research: Roadmap & User Guide
cs.AI · 2026-05-18 · unverdicted · none · ref 45

Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator
cs.DL · 2025-07-16 · unverdicted · none · ref 41
