AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.
Prompt injection attacks on llm generated reviews of scientific publications,
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 2representative citing papers
Authors show prompt injection attacks that jailbreak LLM paper reviewers for biased acceptance and propose embedding triggers to detect when reviews are LLM-generated rather than human.
The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.
citing papers explorer
-
When AI reviews science: Can we trust the referee?
AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.
-
ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected
Authors show prompt injection attacks that jailbreak LLM paper reviewers for biased acceptance and propose embedding triggers to detect when reviews are LLM-generated rather than human.
-
AI for Auto-Research: Roadmap & User Guide
The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.