Prompt injection attacks on LLM generated reviews of scientific publications

Peer review in scientific publications: benefits, critiques, & a survival guide · 2025 · arXiv 2509.10248

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · unclear 1

representative citing papers

When AI reviews science: Can we trust the referee?

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected

cs.CR · 2025-12-23 · unverdicted · novelty 6.0

The authors demonstrate prompt injection attacks that jailbreak LLM paper reviewers into biased acceptance, and propose embedding triggers to detect when reviews are LLM-generated rather than human-written.

LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers

cs.CL · 2026-05-25 · unverdicted · novelty 5.0

LLMs overrate weak papers, diverge from humans on criteria like clarity and reproducibility, write longer, less diverse reviews, and remain vulnerable to prompt injection attacks that can boost low-scoring papers to acceptance levels.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

citing papers explorer

When AI reviews science: Can we trust the referee? · cs.AI · 2026-04-26 · unverdicted · none · ref 94
ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected · cs.CR · 2025-12-23 · unverdicted · none · ref 12
LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers · cs.CL · 2026-05-25 · unverdicted · none · ref 2
AI for Auto-Research: Roadmap & User Guide · cs.AI · 2026-05-18 · unverdicted · none · ref 88
