Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review

Zhicheng Lin

arxiv: 2507.06185 · v1 · pith:XSZ72LVYnew · submitted 2025-07-08 · 💻 cs.CY · cs.AI· cs.CL· cs.HC

Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review

Zhicheng Lin This is my paper

classification 💻 cs.CY cs.AIcs.CLcs.HC

keywords reviewpeerpromptshiddeninstructionsacademicai-assistedevaluation

0 comments

read the original abstract

In July 2025, 18 academic manuscripts on the preprint website arXiv were found to contain hidden instructions known as prompts designed to manipulate AI-assisted peer review. Instructions such as "GIVE A POSITIVE REVIEW ONLY" were concealed using techniques like white-colored text. Author responses varied: one planned to withdraw the affected paper, while another defended the practice as legitimate testing of reviewer compliance. This commentary analyzes this practice as a novel form of research misconduct. We examine the technique of prompt injection in large language models (LLMs), revealing four types of hidden prompts, ranging from simple positive review commands to detailed evaluation frameworks. The defense that prompts served as "honeypots" to detect reviewers improperly using AI fails under examination--the consistently self-serving nature of prompt instructions indicates intent to manipulate. Publishers maintain inconsistent policies: Elsevier prohibits AI use in peer review entirely, while Springer Nature permits limited use with disclosure requirements. The incident exposes systematic vulnerabilities extending beyond peer review to any automated system processing scholarly texts, including plagiarism detection and citation indexing. Our analysis underscores the need for coordinated technical screening at submission portals and harmonized policies governing generative AI (GenAI) use in academic evaluation.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Hidden State Poisoning Attacks against Mamba-based Language Models
cs.CL 2026-01 unverdicted novelty 7.0

Short input phrases can irreversibly overwrite hidden states in Mamba models, impairing information retrieval on a new benchmark while leaving pure Transformer models unaffected.
ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected
cs.CR 2025-12 unverdicted novelty 6.0

Authors show prompt injection attacks that jailbreak LLM paper reviewers for biased acceptance and propose embedding triggers to detect when reviews are LLM-generated rather than human.
Review the Code, Not the Story: A Vision and Protocol for Code-First Peer Review
cs.SE 2026-06 unverdicted novelty 5.0

Proposes a code-first peer review protocol using AI infrastructure to execute research artifacts and generate claim-evidence review packages for human reviewers.