Can large language models provide useful feedback on research papers? a large-scale empirical analysis

Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Ding, Xinyu Yang, Kailas V odrahalli, Siyu He, Daniel Smith, Yian Yin, et al · 2023 · arXiv 2310.01783

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

AgentReview: Exploring Peer Review Dynamics with LLM Agents

cs.CL · 2024-06-18 · unverdicted · novelty 8.0

AgentReview is the first LLM-based simulation framework for peer review that quantifies a 37.1% decision variation attributable to reviewer biases.

FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification

cs.AI · 2026-04-05 · conditional · novelty 7.0

FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.

Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis

stat.AP · 2026-05-08 · unverdicted · novelty 5.0 · 2 refs

AI peer reviewers for POMP analyses show jagged performance: strong on technical error detection and invalid inference but weak on interpretive errors, narrative coherence, and domain-informed critique.

citing papers explorer

Showing 3 of 3 citing papers.

AgentReview: Exploring Peer Review Dynamics with LLM Agents cs.CL · 2024-06-18 · unverdicted · none · ref 18
AgentReview is the first LLM-based simulation framework for peer review that quantifies a 37.1% decision variation attributable to reviewer biases.
FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification cs.AI · 2026-04-05 · conditional · none · ref 7
FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.
Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis stat.AP · 2026-05-08 · unverdicted · none · ref 5 · 2 links
AI peer reviewers for POMP analyses show jagged performance: strong on technical error detection and invalid inference but weak on interpretive errors, narrative coherence, and domain-informed critique.

Can large language models provide useful feedback on research papers? a large-scale empirical analysis

fields

years

verdicts

representative citing papers

citing papers explorer