BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?
The convergence of LLM-powered research assistants and AI-based peer review systems creates a critical vulnerability: fully automated publication loops where AI-generated research is evaluated by AI reviewers without human oversight. We investigate this through BadScientist, a framework that evaluates whether fabrication-oriented paper generation agents can deceive multi-model LLM review systems. Our generator employs presentation-manipulation strategies requiring no real experiments. We develop a rigorous evaluation framework with formal error guarantees (concentration bounds and calibration analysis), calibrated on real data. Our results reveal systematic vulnerabilities: fabricated papers achieve acceptance rates up to . Critically, we identify concern-acceptance conflict: reviewers frequently flag integrity issues yet assign acceptance-level scores. Our mitigation strategies show only marginal improvements, with detection accuracy barely exceeding random chance. Despite provably sound aggregation mathematics, integrity checking systematically fails, exposing fundamental limitations in current AI-driven review systems and underscoring the urgent need for defense-in-depth safeguards in scientific publishing.
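The abstract mentions concentration bounds but does not say which one is used. As an illustration only, the sketch below assumes Hoeffding's inequality applied to independent accept/reject decisions. The function names `hoeffding_half_width` and `acceptance_interval` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def hoeffding_half_width(n: int, delta: float = 0.05) -> float:
    """Half-width eps such that |empirical rate - true rate| <= eps
    with probability at least 1 - delta, for n independent samples
    bounded in [0, 1] (two-sided Hoeffding bound)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def acceptance_interval(
    decisions: Sequence[bool], delta: float = 0.05
) -> Tuple[float, float, float]:
    """Return (lower, rate, upper) for the acceptance rate.

    `decisions` holds one accept (True) / reject (False) outcome per
    reviewed paper; independence between outcomes is an assumption.
    """
    n = len(decisions)
    rate = sum(decisions) / n
    eps = hoeffding_half_width(n, delta)
    # Clip to [0, 1] because a rate cannot leave that range.
    return max(0.0, rate - eps), rate, min(1.0, rate + eps)


if __name__ == "__main__":
    # Made-up outcomes purely to exercise the function, not paper data.
    toy = [True, True, True, False]
    print(acceptance_interval(toy))
```

With only a handful of decisions the interval is very wide, which is why a bound of this kind is usually paired with a large number of reviewed papers.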
Forward citations
Cited by 3 Pith papers
- No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions
  Presentation-only revisions guided by AI feedback can boost AI reviewer scores by over 1 point on average with 75% success rate across tested systems.
- The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
  The Red Queen Gödel Machine organizes recursive self-improvement into epochs with fixed intra-epoch evaluation while allowing utility evolution at boundaries, yielding reported gains on coding, paper writing, and proo...
- Agon: An Autonomous Large-Scale Omnidisciplinary Research System Built on Prompt Economy
  Agon is a new autonomous research system using prompt economy loops across 444 iterations to demonstrate scalable omnidisciplinary research and a taxonomy separating machine-fixable failures from those needing human judgment.